Skills catalog
colibri-skills is Colibri’s read-only runtime consumer for Clawdie-AI skill
artifacts. Clawdie-AI authors and reviews the skillpacks; Colibri indexes
them, validates checksums, chunks searchable text, and exposes typed structs to
the daemon, CLI, and TUI. This crate does not author skills.
→ crates/colibri-skills/src/lib.rs
→ docs/COLIBRI-SKILLS-PLAN.md
Decisions
Source of truth stays in Clawdie-AI
Skill artifacts live in the clawdie-ai repository, not in colibri. They are
committed, reviewed directories containing prose, screenshots, transcripts,
scripts, a manifest, and a checksum file. colibri-skills imports these
artifacts into Colibri’s SQLite store at runtime.
This split preserves review discipline: a skill changes through a PR in its home repo, then Colibri re-indexes the checkout.
Read-only, not authoring
The crate deliberately lacks “create skill” or “edit skill” operations. Those belong in Clawdie-AI where human review and media pipelines run. Putting authoring here would duplicate state and split review authority.
The import path is the target for Phase 1: scan the configured Clawdie-AI checkout, parse manifests, verify checksums, and upsert into SQLite. The type scaffold exists today; the importer, chunker, and FTS5 index are planned.
→ docs/COLIBRI-SKILLS-PLAN.md (Phases 1-7)
Manifest-driven identity
Each skill directory contains a run manifest file. From it the importer derives:
- skill_id
- display_name
- source_path within the Clawdie-AI checkout
- pipeline stages and models used
- source media metadata
Any file not listed in the manifest can still be classified and indexed as an artifact, but the manifest is the canonical identity document.
Artifact classification by extension and filename
ArtifactType::from_path classifies files without relying on a sidecar:
- Python or shell files → Script
- paths containing contact_sheet → ContactSheet
- paths containing run_manifest and ending in .json → Manifest
- paths containing sha256 or checksum → Checksum
- paths containing report and ending in .json → Report
- .md → Document
- .jpg / .png / .webp → Image
- .txt transcript files → Transcript
- anything else → Other
This heuristic keeps classification local and fast. Misclassified files can be fixed by renaming within Clawdie-AI.
→ crates/colibri-skills/src/lib.rs (ArtifactType::from_path)
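The rules above can be sketched as a single function. This is a minimal illustration, not the crate's actual code: it assumes the rules are tried in the listed order with the first match winning, that matching is case-insensitive, and that "transcript files" means a .txt path containing "transcript".

```rust
use std::path::Path;

/// Artifact kinds named in the classification list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArtifactType {
    Script,
    ContactSheet,
    Manifest,
    Checksum,
    Report,
    Document,
    Image,
    Transcript,
    Other,
}

impl ArtifactType {
    /// Sketch only. Assumptions: first matching rule wins, in the documented
    /// order, and all comparisons are case-insensitive.
    pub fn from_path(path: &Path) -> Self {
        let lower = path.to_string_lossy().to_lowercase();
        let ext_owned = path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_lowercase())
            .unwrap_or_default();
        let ext = ext_owned.as_str();

        if matches!(ext, "py" | "sh") {
            ArtifactType::Script
        } else if lower.contains("contact_sheet") {
            // Checked before the image extensions so contact sheets keep their own type.
            ArtifactType::ContactSheet
        } else if lower.contains("run_manifest") && ext == "json" {
            ArtifactType::Manifest
        } else if lower.contains("sha256") || lower.contains("checksum") {
            ArtifactType::Checksum
        } else if lower.contains("report") && ext == "json" {
            ArtifactType::Report
        } else {
            match ext {
                "md" => ArtifactType::Document,
                "jpg" | "png" | "webp" => ArtifactType::Image,
                // Assumption: only .txt paths that mention "transcript" qualify.
                "txt" if lower.contains("transcript") => ArtifactType::Transcript,
                _ => ArtifactType::Other,
            }
        }
    }
}
```

Because the substring rules look at the whole path, a directory name can influence the result. That is one reason renaming in Clawdie-AI is the suggested fix for a misclassified file.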
Checksums are validated, then stored
The run manifest is accompanied by a checksum file. At import time the runtime
computes SHA-256 of each artifact and compares it to the committed checksum.
Failures are reported in ImportSummary::checksum_failures and cause
success() to return false.
SQLite stores each artifact's relative path and hash, not its binary content; image and media blobs stay on disk.
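A rough sketch of the comparison step follows. Several things here are assumptions: the checksum file is taken to use the sha256sum line format (hex digest, whitespace, relative path), a missing artifact is counted as a failure, and the artifacts_checked field is hypothetical. Only checksum_failures and success() are named in this document. The SHA-256 computation itself is left to a caller-supplied closure, since the standard library has no hasher for it.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape; only `checksum_failures` and `success()` come from the docs.
#[derive(Debug, Default)]
pub struct ImportSummary {
    pub artifacts_checked: usize,
    pub checksum_failures: Vec<String>,
}

impl ImportSummary {
    /// An import succeeds only when no artifact failed verification.
    pub fn success(&self) -> bool {
        self.checksum_failures.is_empty()
    }
}

/// Parse `<hex digest> <relative path>` lines (assumed sha256sum-style format).
/// Returns a map from relative path to lowercase expected digest.
pub fn parse_checksum_file(text: &str) -> BTreeMap<String, String> {
    let mut expected = BTreeMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((digest, path)) = line.split_once(char::is_whitespace) {
            // sha256sum marks binary mode with a leading '*' before the path.
            let path = path.trim_start().trim_start_matches('*');
            expected.insert(path.to_string(), digest.to_lowercase());
        }
    }
    expected
}

/// Compare expected digests against digests computed by `digest_of`.
/// `digest_of` returns None when the artifact cannot be read (treated as a failure).
pub fn verify_checksums<F>(expected: &BTreeMap<String, String>, digest_of: F) -> ImportSummary
where
    F: Fn(&str) -> Option<String>,
{
    let mut summary = ImportSummary::default();
    for (path, want) in expected {
        summary.artifacts_checked += 1;
        match digest_of(path.as_str()) {
            Some(got) if got.eq_ignore_ascii_case(want) => {}
            _ => summary.checksum_failures.push(path.clone()),
        }
    }
    summary
}
```

Passing the hasher in as a closure keeps the comparison logic testable without touching the filesystem; a real importer would plug in a function that reads the file and hashes it.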
Content is chunked into searchable units
The planned chunker turns skill content into SkillChunk rows:
- Markdown sections by heading
- Command blocks
- Code blocks
- Tables
- Transcript segments
Chunks are the unit of search and the unit shown in TUI or CLI results.
SkillChunk carries line_start/line_end so a hit can point back to the
source artifact.
→ crates/colibri-skills/src/lib.rs (SkillChunk, ChunkType)
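Since the chunker is still planned, the following is only a sketch of the first rule (Markdown sections by heading). The field names follow the entity shape on this page, but their types, the MarkdownSection variant name, the 1-based inclusive line numbering, and the word-count token estimate are all assumptions.

```rust
/// Hypothetical variant name; the real ChunkType lives in the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkType {
    MarkdownSection,
}

/// Field names follow the entity shape; the types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct SkillChunk {
    pub chunk_type: ChunkType,
    pub heading: Option<String>,
    pub content: String,
    pub line_start: usize, // 1-based, inclusive (assumption)
    pub line_end: usize,   // 1-based, inclusive (assumption)
    pub tokens_estimate: usize,
}

/// True for ATX headings: one to six '#' followed by a space or end of line.
fn is_atx_heading(line: &str) -> bool {
    let hashes = line.chars().take_while(|c| *c == '#').count();
    (1..=6).contains(&hashes) && line[hashes..].chars().next().map_or(true, |c| c == ' ')
}

/// Push one chunk unless the collected lines are empty or whitespace-only.
fn flush(chunks: &mut Vec<SkillChunk>, heading: Option<String>, body: &[&str], start: usize, end: usize) {
    if body.iter().all(|l| l.trim().is_empty()) {
        return;
    }
    let content = body.join("\n");
    // Crude estimate: whitespace-separated words stand in for tokens.
    let tokens_estimate = content.split_whitespace().count();
    chunks.push(SkillChunk {
        chunk_type: ChunkType::MarkdownSection,
        heading,
        content,
        line_start: start,
        line_end: end,
        tokens_estimate,
    });
}

/// Split Markdown into one chunk per heading, recording source line ranges.
pub fn chunk_markdown_by_heading(text: &str) -> Vec<SkillChunk> {
    let fence = "`".repeat(3); // code-fence marker
    let mut chunks = Vec::new();
    let mut heading: Option<String> = None;
    let mut body: Vec<&str> = Vec::new();
    let mut start = 1usize;
    let mut in_fence = false;
    let mut last_line = 0usize;

    for (idx, line) in text.lines().enumerate() {
        let line_no = idx + 1;
        last_line = line_no;
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
        }
        // '#' lines inside a fenced block are comments, not headings.
        if !in_fence && is_atx_heading(line) {
            flush(&mut chunks, heading.take(), &body, start, line_no - 1);
            body.clear();
            heading = Some(line.trim_start_matches('#').trim().to_string());
            start = line_no;
        }
        body.push(line);
    }
    flush(&mut chunks, heading, &body, start, last_line);
    chunks
}
```

Each chunk keeps its heading line in content, so a search hit can be displayed without re-reading the artifact, while line_start and line_end still point back into the source file.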
SQLite + FTS5 as the runtime search backend
The target schema keeps three tables:
- system_skills: one row per skill
- system_skill_artifacts: one row per file
- system_skill_chunks: one row per searchable chunk, plus a virtual FTS5 table for ranked text search
This matches the store’s pragmatic relational model. If skill volumes grow beyond tens of thousands of chunks, search can move to PostgreSQL (its built-in full-text search, or pgvector if retrieval becomes embedding-based); until then, SQLite keeps the control plane self-contained.
→ docs/COLIBRI-SKILLS-PLAN.md (SQLite schema target)
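To make the target concrete, here is one possible DDL for those tables, embedded as Rust string constants. The three table names come from the plan; every column, the system_skill_chunks_fts name, and the choice of an external-content FTS5 table are assumptions for illustration, and the authoritative schema is the one in the plan document.

```rust
/// Hypothetical DDL. Table names are from the plan; columns and the FTS
/// table name are assumptions based on the entity shape.
pub const SKILL_CATALOG_SCHEMA: &str = r#"
CREATE TABLE IF NOT EXISTS system_skills (
    skill_id      TEXT PRIMARY KEY,
    display_name  TEXT NOT NULL,
    source_path   TEXT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'active',
    verification  TEXT
);

CREATE TABLE IF NOT EXISTS system_skill_artifacts (
    id             INTEGER PRIMARY KEY,
    skill_id       TEXT NOT NULL REFERENCES system_skills(skill_id),
    artifact_type  TEXT NOT NULL,
    relative_path  TEXT NOT NULL,
    sha256_hash    TEXT NOT NULL,
    UNIQUE (skill_id, relative_path)
);

CREATE TABLE IF NOT EXISTS system_skill_chunks (
    id           INTEGER PRIMARY KEY,
    artifact_id  INTEGER NOT NULL REFERENCES system_skill_artifacts(id),
    chunk_type   TEXT NOT NULL,
    heading      TEXT,
    content      TEXT NOT NULL,
    line_start   INTEGER NOT NULL,
    line_end     INTEGER NOT NULL
);

-- External-content FTS5 table: indexes text stored in system_skill_chunks.
CREATE VIRTUAL TABLE IF NOT EXISTS system_skill_chunks_fts USING fts5(
    heading,
    content,
    content='system_skill_chunks',
    content_rowid='id'
);
"#;

/// Ranked lookup; `?1` is the FTS5 query string. bm25() returns lower
/// values for better matches, so ascending order puts the best hits first.
pub const SEARCH_CHUNKS_SQL: &str = r#"
SELECT c.id, c.heading, c.line_start, c.line_end
FROM system_skill_chunks_fts
JOIN system_skill_chunks AS c ON c.id = system_skill_chunks_fts.rowid
WHERE system_skill_chunks_fts MATCH ?1
ORDER BY bm25(system_skill_chunks_fts)
LIMIT 20;
"#;
```

With an external-content table, SQLite does not keep the index in sync automatically; the importer (or triggers) would have to update system_skill_chunks_fts whenever chunk rows change.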
Status is a lifecycle marker, not a state machine
SkillStatus is active, archived, or superseded. There is no pending
review state because review happens in Clawdie-AI before import. Colibri simply
stops returning archived skills in default searches but keeps them in the store
for audit and explicit lookups.
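The difference between default searches and explicit lookups might look like the sketch below. SkillRow and both function names are hypothetical. The text only says archived skills are hidden by default, so this sketch leaves superseded skills visible; that choice is an assumption.

```rust
/// The three lifecycle markers described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkillStatus {
    Active,
    Archived,
    Superseded,
}

/// Hypothetical minimal row; the real Skill struct carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct SkillRow {
    pub skill_id: String,
    pub status: SkillStatus,
}

/// Default search: archived skills are filtered out.
/// Assumption: superseded skills stay visible, since the docs do not say otherwise.
pub fn default_search_results(rows: &[SkillRow]) -> Vec<&SkillRow> {
    rows.iter()
        .filter(|row| row.status != SkillStatus::Archived)
        .collect()
}

/// Explicit lookup: every stored skill is reachable by id, archived or not.
pub fn lookup_by_id<'a>(rows: &'a [SkillRow], skill_id: &str) -> Option<&'a SkillRow> {
    rows.iter().find(|row| row.skill_id == skill_id)
}
```

Because the status is only a filter input, nothing in the runtime needs to model transitions between the values; a re-import simply overwrites the marker.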
Natural-language verification question
Each skill can carry a verification field like “can the user create and run
an Astro project?”. This is not an executable test; it is the acceptance
criterion used during skill review and later during agent self-verification.
Runtime commands are read-only
The CLI surface is planned as:
- colibri list-skills
- colibri show-skill <id>
- colibri search-skills <query>
- colibri index-skills
- colibri verify-skill <id>
index-skills refreshes the catalog from disk. The remaining commands query the
runtime store. None mutate the Clawdie-AI checkout.
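As an illustration of that surface, the subcommands could map onto an enum like the one below. The subcommand names come from the list above; SkillCommand, its parse function, and writes_runtime_store are hypothetical, and the real CLI may well use an argument-parsing crate instead of hand-rolled matching.

```rust
/// Hypothetical command model for the planned skill subcommands.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SkillCommand {
    List,
    Show { skill_id: String },
    Search { query: String },
    Index,
    Verify { skill_id: String },
}

impl SkillCommand {
    /// Parse the arguments that follow the `colibri` binary name.
    pub fn parse(args: &[&str]) -> Result<Self, String> {
        let (name, rest) = args
            .split_first()
            .ok_or_else(|| "missing subcommand".to_string())?;
        match (*name, rest) {
            ("list-skills", []) => Ok(SkillCommand::List),
            ("index-skills", []) => Ok(SkillCommand::Index),
            ("show-skill", [id]) => Ok(SkillCommand::Show { skill_id: (*id).to_string() }),
            ("verify-skill", [id]) => Ok(SkillCommand::Verify { skill_id: (*id).to_string() }),
            // Assumption: a multi-word query is joined with single spaces.
            ("search-skills", [_, ..]) => Ok(SkillCommand::Search { query: rest.join(" ") }),
            (other, _) => Err(format!("unknown or malformed command: {}", other)),
        }
    }

    /// Only index-skills refreshes the runtime store; no command touches
    /// the Clawdie-AI checkout.
    pub fn writes_runtime_store(&self) -> bool {
        matches!(self, SkillCommand::Index)
    }
}
```

Keeping the write/read distinction on the type makes it straightforward to open the SQLite store read-only for every command except index-skills.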
Entity shape
Skill
├─ skill_id, display_name, source_path, status, verification
├─ SkillManifest
│  ├─ run_id, created, notes
│  ├─ ManifestSource
│  ├─ [PipelineStage]
│  └─ [ModelUsage]
└─ [SkillArtifact]
   ├─ artifact_type, relative_path, file_name, mime_type, size_bytes, sha256_hash
   └─ [SkillChunk]
      ├─ chunk_type, heading, content, line_start, line_end, tokens_estimate
See also
- store-schema: coordination and planned skill catalog tables
- operator-cli: planned skill catalog CLI commands
- task-board: agents will match claimed tasks to skills by capability