growing · updated 2025-10-18 22:03:00

Metatopic v3

Universal semantic indexer using budget-constrained interval segmentation for any ordered corpus.

Core Innovation

Transform ordered sequences (text, transcripts, timelines) into hierarchical, gap-free intervals through budget-constrained semantic compression.

Key guarantee: Every unit belongs to exactly one interval at each hierarchy level.
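
The guarantee can be stated as a small check. The sketch below is illustrative only: it assumes units are numbered from 0 and intervals are half-open `[start, end)` pairs, and the function name `is_exact_cover` is hypothetical, not part of Metatopic.

```python
from typing import List, Tuple

# Half-open [start, end) interval over unit indices (an assumed representation).
Interval = Tuple[int, int]


def is_exact_cover(intervals: List[Interval], total_units: int) -> bool:
    """Return True if the intervals tile [0, total_units) with no gaps or overlaps."""
    expected_start = 0
    for start, end in sorted(intervals):
        # A start past the expected position is a gap; a start before it is an overlap.
        if start != expected_start or end <= start:
            return False
        expected_start = end
    return expected_start == total_units


fine_level = [(0, 4), (4, 9), (9, 12)]   # three intervals over 12 units
coarse_level = [(0, 4), (4, 12)]         # a coarser level over the same units

print(is_exact_cover(fine_level, 12))           # True
print(is_exact_cover(coarse_level, 12))         # True
print(is_exact_cover([(0, 4), (5, 12)], 12))    # False: unit 4 is uncovered
```

Running the same check once per hierarchy level expresses "exactly one interval at each level".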

The Budget Model

See Budget Constraints Drive Emergence for the underlying philosophy.

CREATE_COST = 1.0  # Creating new vocabulary term
REUSE_COST = 0.1   # Reusing existing term

Budget dynamics:
B_{t+1} = B_t - costs_t + η · fidelity_t · reuse_rate_t

Creating a term costs ten times as much as reusing one. This 10:1 ratio pushes the indexer to reuse existing vocabulary wherever it fits, and semantic organization emerges from that pressure.
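
One step of the budget update might look like the sketch below. The two cost constants come from this note; the value of η (`ETA`), the assumption that fidelity lies in [0, 1], and the function names are all illustrative guesses, since the note does not specify them.

```python
CREATE_COST = 1.0  # creating a new vocabulary term (from the note)
REUSE_COST = 0.1   # reusing an existing term (from the note)
ETA = 2.0          # assumed value for η; the note does not give one


def step_cost(new_terms: int, reused_terms: int) -> float:
    """Cost of one step: each new term is 10x the price of a reused one."""
    return new_terms * CREATE_COST + reused_terms * REUSE_COST


def update_budget(budget: float, new_terms: int, reused_terms: int,
                  fidelity: float, eta: float = ETA) -> float:
    """Apply the update: budget - costs + eta * fidelity * reuse_rate."""
    total = new_terms + reused_terms
    reuse_rate = reused_terms / total if total else 0.0
    return budget - step_cost(new_terms, reused_terms) + eta * fidelity * reuse_rate


budget = 10.0
# Step 1: nothing to reuse yet, so all four terms are created and nothing is earned back.
budget = update_budget(budget, new_terms=4, reused_terms=0, fidelity=0.9)
# Step 2: three of four terms are reused, so the step is cheap and partly refunded.
budget = update_budget(budget, new_terms=1, reused_terms=3, fidelity=0.9)
print(round(budget, 2))
```

Under these assumed numbers the second step nearly pays for itself, which is the incentive the cost ratio is meant to create.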

Technical

  • PostgreSQL INT4RANGE columns with EXCLUDE constraints to rule out overlapping intervals, as part of enforcing gap-free coverage
  • Groq LLM (gpt-oss-120b) with structured outputs via instructor
  • SHA256-based caching to avoid redundant LLM calls
  • Async, type-safe, fully tested
  • Latin schema naming (documentum, comparatio, rogatio)
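
The SHA256-based caching item can be sketched as follows. This is a minimal illustration, not the project's code: the in-memory dictionary, the `CachedLLM` class, the `cache_key` helper, and the `fake_llm` stand-in are all hypothetical, and a real deployment would presumably persist the cache and call the actual model.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(model: str, prompt: str) -> str:
    """Derive a stable SHA-256 key from everything that affects the response."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedLLM:
    """Wrap an LLM call so identical requests are answered from the cache."""

    def __init__(self, call_llm: Callable[[str, str], str]) -> None:
        self._call_llm = call_llm
        self._store: Dict[str, str] = {}  # assumed in-memory store
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        key = cache_key(model, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_llm(model, prompt)
        return self._store[key]


def fake_llm(model: str, prompt: str) -> str:
    """Stand-in for a real LLM call, used only for this example."""
    return f"[{model}] {prompt.upper()}"


llm = CachedLLM(fake_llm)
first = llm.complete("gpt-oss-120b", "segment lines 1-40")
second = llm.complete("gpt-oss-120b", "segment lines 1-40")
print(llm.misses)  # 1
```

Including the model name in the hashed payload keeps responses from different models from colliding under the same prompt.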

Real Data

Includes examples using real data:

  • Homer’s Iliad (790KB)
  • 197 podcast transcripts

Watch vocabulary reuse increase:

  • Episode 1: 0% reuse (creates all terms)
  • Episode 10: 75% reuse (heavy reuse)
  • Cost savings: 50+ budget units
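
The reuse percentages above follow from simple bookkeeping, sketched here. The term lists are invented for illustration and are not taken from the Iliad or podcast data; `index_episode` is a hypothetical helper, and only the two cost constants come from this note.

```python
from typing import Iterable, Set, Tuple

CREATE_COST = 1.0  # from the note
REUSE_COST = 0.1   # from the note


def index_episode(terms: Iterable[str], vocabulary: Set[str]) -> Tuple[float, float]:
    """Return (reuse_rate, cost) for one episode, growing the shared vocabulary."""
    created = 0
    reused = 0
    for term in terms:
        if term in vocabulary:
            reused += 1
        else:
            vocabulary.add(term)
            created += 1
    total = created + reused
    reuse_rate = reused / total if total else 0.0
    cost = created * CREATE_COST + reused * REUSE_COST
    return reuse_rate, cost


vocabulary: Set[str] = set()
# Made-up term lists standing in for two episodes.
rate_1, cost_1 = index_episode(["war", "honor", "gods", "fate"], vocabulary)
rate_2, cost_2 = index_episode(["war", "fate", "gods", "ships"], vocabulary)

print(rate_1)  # 0.0
print(rate_2)  # 0.75
```

With an empty vocabulary the first episode must create every term, while the second reuses three of its four terms and costs roughly a third as much.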

Meta-Testing

Code quality tests are LLM-based: an AI model reviews the AI-assisted code and gives a qualitative evaluation of it.

Future

Deploy to scriptorium.online as an MCP server, so that any project can use semantic indexing as a service.

This garden itself could use metatopic for indexing.