Sonde
An MCP-native topic registry for GitHub, arXiv, Hugging Face, RSS, and web monitors. Declare what you're watching, simulate before you collect, and give agents governed access to fresh signal with full lineage.
COLLECTION_FLOW
01 Declare topic intent in YAML
02 Lint, dedupe, simulate
03 Run produces versioned manifest
04 Every artifact traceable to its origin
Not a Scraper. The Layer Above.
Most collection systems hide their actual intent in scattered YAML, scripts, seed URLs, RSS lists, API queries, and watchlists. Sonde turns that hidden layer into a governed topic pack that can live in Git and produce reproducible run manifests.
Every collected artifact answers: which topic and topic version produced this, which config hash, which adapter and source query, when it was collected, and what raw or normalized source record it came from.
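As a rough illustration, the questions above can be pictured as one record attached to each artifact. This is a minimal sketch only: the field names, the `Lineage` class, and the example values are hypothetical and are not taken from Sonde's actual artifact schema.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Lineage:
    """Hypothetical lineage record; one field per question an artifact answers."""
    topic_id: str          # which topic produced this
    topic_version: str     # which topic version
    config_hash: str       # which config hash
    adapter: str           # which adapter
    source_query: str      # which source query
    collected_at: str      # when it was collected (ISO 8601 string)
    source_record_id: str  # which raw or normalized source record it came from


def missing_lineage_fields(artifact: dict) -> list[str]:
    """Return the lineage fields that are absent or empty in an artifact dict."""
    return [f.name for f in fields(Lineage) if not artifact.get(f.name)]


# Illustrative values only.
example = Lineage(
    topic_id="agent_security_model",
    topic_version="1.0.0",
    config_hash="placeholder-hash",
    adapter="github",
    source_query="agent security model",
    collected_at="2024-01-01T00:00:00Z",
    source_record_id="raw/0001",
)
```

A check like `missing_lineage_fields(asdict(example))` returns an empty list when every question has an answer, which is the property the lineage guarantee describes.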
CLI Commands
sonde init      Scaffold a new topic configuration
sonde lint      Validate topic schemas and detect issues
sonde dedupe    Find duplicate or near-duplicate queries
sonde diff      Compare two config versions side-by-side
sonde simulate  Sample expected yield and noise for a topic
sonde run       Execute collection with manifest generation
sonde status    View run history and registry state
sonde export    Package topics as portable topic packs
sonde mcp       Start MCP server for agent access
sonde version   Print version information
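To make "near-duplicate queries" concrete, here is one plausible way such a check could work: token-level Jaccard similarity between query strings. This is a sketch under that assumption, not a description of how `sonde dedupe` is implemented; the function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def tokens(query: str) -> frozenset[str]:
    """Lowercase a query and split it into a set of whitespace-separated tokens."""
    return frozenset(query.lower().split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two queries: shared tokens over all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty queries are treated as identical
    return len(ta & tb) / len(ta | tb)


def near_duplicates(queries: list[str], threshold: float = 0.5):
    """Return (query_a, query_b, score) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(queries, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


queries = [
    "agent security model",
    "AI agent security model",
    "AI agent permissioning",
]
flagged = near_duplicates(queries)
```

With these three queries, only the first two are flagged: they share three of four distinct tokens, while each of the other pairs shares at most two of five.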
Source Adapters
GitHub        Repository search API
arXiv         Atom API
Hugging Face  Hub API
RSS           Public feeds
Local JSONL   Offline fixtures
Declarative Topics
Topics are versioned YAML objects with explicit intent, queries, negative terms, source bindings, schedules, and ownership. Everything that drives collection lives in one inspectable place.
Lint catches schema errors. Dedupe catches query overlap. Diff shows what changed between versions. Simulate shows what would be collected before you commit.
- id: "agent_security_model"
  intent: "Track emerging work on identity,
    permissioning, and threat models
    for AI agents."
  version: "1.0.0"
  priority: high
  queries:
    - "agent security model"
    - "AI agent permissioning"
  negative_terms:
    - "real estate agent"
  schedule:
    interval_minutes: 120
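The interplay between `queries` and `negative_terms` in the topic above can be sketched as a simple filter over candidate titles. This is an illustrative assumption about the semantics (any negative phrase rejects an item, otherwise all words of at least one query must be present); the helper names are hypothetical, and real adapters may push these terms into the source API query instead.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text_words: list[str], phrase: str) -> bool:
    """True if the phrase's words appear consecutively in text_words."""
    target = words(phrase)
    if not target:
        return False
    n = len(target)
    return any(text_words[i:i + n] == target
               for i in range(len(text_words) - n + 1))


def keep(title: str, queries: list[str], negative_terms: list[str]) -> bool:
    """Reject titles containing a negative phrase; keep those matching a query."""
    title_words = words(title)
    if any(contains_phrase(title_words, term) for term in negative_terms):
        return False
    vocabulary = set(title_words)
    return any(set(words(q)) <= vocabulary for q in queries)


queries = ["agent security model", "AI agent permissioning"]
negative_terms = ["real estate agent"]
```

Under these assumptions, a title such as "A security model for agent permissions" is kept, while "Real estate agent security model tips" is rejected by the negative term even though it matches a query.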
MCP Surface
15 tools, 13 resources, and 6 prompts. Agents get governed access to the full topic lifecycle, from drafting and validation to collection and lineage inspection.
Tools
lint_topics               Validate a topic config
dedupe_topics             Find duplicate and overlapping topics
find_semantic_overlap     Detect semantic overlap between topics
diff_topics               Compare two topic configs
simulate_topic            Sample expected yield and noise
estimate_collection_cost  Estimate API requests, artifacts, storage
run_topic_dry_run         Execute a dry run, return manifest
create_topic_draft        Create a new draft topic
update_topic_draft        Modify a topic, returns diff
deprecate_topic           Transition topic to deprecated
promote_topic             Promote a draft to active
rollback_topic_version    Roll back to a previous version
generate_aliases          Generate query aliases from intent
generate_negative_terms   Generate negative terms to reduce noise
summarize_topic_health    Yield, noise, staleness, coverage report
Resources
sonde://topics                 All topics (summary)
sonde://topics/{id}            Full topic definition
sonde://topics/{id}/versions   Version history
sonde://topics/{id}/quality    Quality metrics
sonde://sources                All configured source IDs
sonde://runs                   Recent collection runs
sonde://runs/{id}              Full run manifest
sonde://artifacts/{id}         Single artifact with lineage
sonde://lineage/artifact/{id}  Lineage chain for an artifact
sonde://diffs/{from}/{to}      Diff between topic versions
sonde://schema/topic           Topic JSON schema
sonde://schema/artifact        Artifact JSON schema
Prompts
review_topic_quality        Review yield, noise, overlap, and versioning
create_collection_strategy  Design a collection strategy for a domain
expand_topic_aliases        Expand aliases and negative terms
deprecate_noisy_topic       Draft a deprecation decision
write_signal_report         Summarize recent signal for a topic
recommend_deprecations      Identify candidates for deprecation
Full Lineage
Versioned Manifests
Every run produces a manifest under .sonde/artifacts/manifests/ with config hash, topic version, and timestamps.
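A config hash is only useful for reproducibility if the same topic definition always hashes to the same value. One common way to get that property is to hash a canonical serialization, sketched below. The choice of SHA-256 over sorted-key JSON is an assumption for illustration; the source does not say how Sonde computes its hash.

```python
import hashlib
import json


def config_hash(topic: dict) -> str:
    """Hash a topic config via canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(
        topic,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the hashes should match.
topic_a = {"id": "agent_security_model", "schedule": {"interval_minutes": 120}}
topic_b = {"schedule": {"interval_minutes": 120}, "id": "agent_security_model"}

# A changed schedule should produce a different hash.
topic_c = {"id": "agent_security_model", "schedule": {"interval_minutes": 60}}
```

Hashing the canonical form rather than the raw YAML text means that reordering keys or reformatting the file does not register as a config change, while any change to a value does.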
SQLite Registry
Local database at .sonde/sonde.db tracks all runs, their parameters, and outcomes.
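For a sense of what a run registry in SQLite might look like, the sketch below creates a small `runs` table and queries it. The table name, columns, and helper functions are all hypothetical; the actual schema of .sonde/sonde.db is not described in the source, and the example uses an in-memory database rather than that file.

```python
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open a registry database and ensure the (hypothetical) runs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS runs (
            run_id        TEXT PRIMARY KEY,
            topic_id      TEXT NOT NULL,
            topic_version TEXT NOT NULL,
            config_hash   TEXT NOT NULL,
            started_at    TEXT NOT NULL,  -- ISO 8601, so text order is time order
            outcome       TEXT NOT NULL
        )
        """
    )
    return conn


def record_run(conn: sqlite3.Connection, run: dict) -> None:
    """Insert one run row using named parameters."""
    conn.execute(
        "INSERT INTO runs VALUES (:run_id, :topic_id, :topic_version, "
        ":config_hash, :started_at, :outcome)",
        run,
    )
    conn.commit()


def runs_for_topic(conn: sqlite3.Connection, topic_id: str) -> list[tuple]:
    """Return (run_id, outcome) for a topic, most recent run first."""
    cursor = conn.execute(
        "SELECT run_id, outcome FROM runs WHERE topic_id = ? "
        "ORDER BY started_at DESC",
        (topic_id,),
    )
    return cursor.fetchall()
```

Keeping the topic version and config hash on every run row is what lets a later query tie an outcome back to the exact configuration that produced it.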
Normalized Artifacts
Collected items are appended to .sonde/artifacts/normalized/artifacts.jsonl with full provenance.
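An append-only JSONL file stores one JSON object per line, so each collected item can be written without rewriting earlier ones. The sketch below shows that pattern with the standard library; the helper names and the artifact fields are illustrative assumptions, not Sonde's actual artifact format.

```python
import json
from pathlib import Path


def append_artifact(path: Path, artifact: dict) -> None:
    """Append one artifact as a single JSON line, creating parent dirs if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(artifact, ensure_ascii=False) + "\n")


def read_artifacts(path: Path) -> list[dict]:
    """Read every non-empty line of a JSONL file back into dicts."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical artifact shape: the payload plus the provenance that produced it.
sample_artifact = {
    "title": "A security model for agent permissions",
    "topic_id": "agent_security_model",
    "topic_version": "1.0.0",
    "adapter": "github",
    "source_query": "agent security model",
}
```

Because each line is independent, a reader can stream the file and filter by `topic_id` or `topic_version` without loading a whole run into memory.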
Govern what you collect.
Open source. MIT licensed. Built by Adjective.