Metatopic v3
Universal semantic indexer using budget-constrained interval segmentation for any ordered corpus.
Core Innovation
Transform ordered sequences (text, transcripts, timelines) into hierarchical, gap-free intervals through budget-constrained semantic compression.
Key guarantee: Every unit belongs to exactly one interval at each hierarchy level.
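The key guarantee can be checked mechanically. Here is a minimal sketch, assuming each hierarchy level stores its intervals as half-open `(start, end)` unit indices. `is_partition` and `check_hierarchy` are hypothetical names, not Metatopic's API. The check covers only the per-level guarantee stated above, not nesting between levels.

```python
from collections.abc import Sequence

# Hypothetical representation: an interval is a half-open (start, end) pair of
# unit indices, matching PostgreSQL's canonical '[start,end)' int4range form.
Interval = tuple[int, int]


def is_partition(intervals: Sequence[Interval], n_units: int) -> bool:
    """True if the intervals cover [0, n_units) with no gaps and no overlaps."""
    cursor = 0
    for start, end in sorted(intervals):
        # start > cursor means a gap, start < cursor an overlap,
        # end <= start an empty or inverted interval.
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == n_units


def check_hierarchy(levels: dict[int, Sequence[Interval]], n_units: int) -> bool:
    """Apply the guarantee independently at every hierarchy level."""
    return all(is_partition(ivs, n_units) for ivs in levels.values())


levels = {
    0: [(0, 4), (4, 10)],                   # coarse level
    1: [(0, 2), (2, 4), (4, 7), (7, 10)],   # finer level
}
print(check_hierarchy(levels, 10))          # True
print(is_partition([(0, 4), (5, 10)], 10))  # False: unit 4 is uncovered
print(is_partition([(0, 5), (4, 10)], 10))  # False: unit 4 is covered twice
```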
The Budget Model
See Budget Constraints Drive Emergence for the underlying philosophy.
```python
CREATE_COST = 1.0  # Creating a new vocabulary term
REUSE_COST = 0.1   # Reusing an existing term
```
Budget dynamics:
$$B_{t+1} = B_t - \mathrm{costs}_t + \eta \cdot \mathrm{fidelity}_t \cdot \mathrm{reuse\_rate}_t$$
The 10:1 cost ratio makes creating a term ten times as expensive as reusing one. This pushes the model to reuse its existing vocabulary, and semantic organization emerges from that pressure.
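The update rule can be sketched in Python under stated assumptions. The two constants and the shape of the formula come from above. The helper names `step_costs` and `next_budget`, the default `eta = 0.5`, the [0, 1] fidelity scale, and the definition of reuse rate as reused / (created + reused) are all hypothetical.

```python
# Hypothetical sketch of one budget step; only the two cost constants and
# the shape of the update rule come from the source.
CREATE_COST = 1.0  # creating a new vocabulary term
REUSE_COST = 0.1   # reusing an existing term


def step_costs(created: int, reused: int) -> float:
    """costs_t: total charge for one step, given term creations and reuses."""
    return created * CREATE_COST + reused * REUSE_COST


def next_budget(budget: float, created: int, reused: int,
                fidelity: float, eta: float = 0.5) -> float:
    """B_{t+1} = B_t - costs_t + eta * fidelity_t * reuse_rate_t.

    Assumed, not from the source: fidelity_t is a score in [0, 1],
    reuse_rate_t = reused / (created + reused), and eta defaults to 0.5.
    """
    total = created + reused
    reuse_rate = reused / total if total else 0.0
    return budget - step_costs(created, reused) + eta * fidelity * reuse_rate


# Same 8 term assignments, two strategies:
print(next_budget(10.0, created=8, reused=0, fidelity=0.8))  # 2.0
print(next_budget(10.0, created=2, reused=6, fidelity=0.8))  # ≈ 7.7
```

For the same amount of work, the reuse-heavy step keeps more budget. Reuse is cheaper, and the replenishment term also scales with the reuse rate.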
Technical
- PostgreSQL INT4RANGE columns with EXCLUDE constraints, so stored intervals cannot overlap (backing the gap-free coverage guarantee)
- gpt-oss-120b served via Groq, with structured outputs through the instructor library
- SHA-256-based caching to avoid redundant LLM calls
- Async, type-safe, fully tested
- Latin schema naming (documentum, comparatio, rogatio)
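Below is a minimal sketch of SHA-256 request caching. It assumes the cache key is the hash of the model name plus the prompt and that results are held in memory. `SHA256Cache`, `LLMCall`, and `fake_llm` are hypothetical stand-ins, not the project's actual instructor/Groq wrapper.

```python
import asyncio
import hashlib
import json
from typing import Any, Awaitable, Callable

# Hypothetical async LLM signature: (model, prompt) -> result. The real
# project wraps an instructor/Groq client, which is not reproduced here.
LLMCall = Callable[[str, str], Awaitable[Any]]


class SHA256Cache:
    """In-memory cache keyed by the SHA-256 digest of (model, prompt)."""

    def __init__(self, call: LLMCall) -> None:
        self._call = call
        self._store: dict[str, Any] = {}
        self.misses = 0  # number of requests that reached the LLM

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # Deterministic serialization: identical requests hash identically.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def get(self, model: str, prompt: str) -> Any:
        k = self.key(model, prompt)
        if k not in self._store:
            # Cache miss: call the LLM once and remember the result.
            # Note: two concurrent misses on the same key would both call it.
            self.misses += 1
            self._store[k] = await self._call(model, prompt)
        return self._store[k]


async def fake_llm(model: str, prompt: str) -> str:
    return f"{model}: {prompt.upper()}"


async def main() -> None:
    cache = SHA256Cache(fake_llm)
    await cache.get("gpt-oss-120b", "segment this passage")
    await cache.get("gpt-oss-120b", "segment this passage")  # cache hit
    print(cache.misses)  # 1


asyncio.run(main())
```

Hashing a deterministic serialization, rather than the raw prompt alone, means a change of model yields a different key. A persistent store (for example a database table keyed by the digest) could replace the in-memory dict without changing the key scheme.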
Real Data
Includes examples using real data:
- Homer’s Iliad (790 KB)
- 197 podcast transcripts
Watch vocabulary reuse increase:
- Episode 1: 0% reuse (creates all terms)
- Episode 10: 75% reuse (heavy reuse)
- Cost savings: 50+ budget units
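Using only the two cost constants, the reuse rates above can be translated into an average cost per term assignment. This sketch assumes each assignment is charged exactly one CREATE_COST or REUSE_COST and ignores the budget replenishment term. It therefore does not attempt to reproduce the reported 50+ unit savings, which depends on term counts not stated here. `mean_term_cost` is a hypothetical helper.

```python
CREATE_COST = 1.0  # creating a new vocabulary term
REUSE_COST = 0.1   # reusing an existing term


def mean_term_cost(reuse_rate: float) -> float:
    """Average cost per term assignment at a given reuse rate in [0, 1]."""
    if not 0.0 <= reuse_rate <= 1.0:
        raise ValueError("reuse_rate must be in [0, 1]")
    return (1.0 - reuse_rate) * CREATE_COST + reuse_rate * REUSE_COST


for label, rate in [("Episode 1", 0.0), ("Episode 10", 0.75)]:
    print(f"{label}: {mean_term_cost(rate):.3f} per term")
# Episode 1: 1.000 per term
# Episode 10: 0.325 per term
```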
Meta-Testing
Innovative approach: LLM-based code quality tests, which use an LLM to give qualitative review of the AI-assisted code.
Future
Deploy to scriptorium.online as an MCP server, so that any project can use semantic indexing as a service.
This garden itself could use metatopic for indexing.