Contents
  1. The Problem with How We Build LLM Systems Today
  2. What OrgBox Is
  3. The Seven-Layer Architecture
  4. The Bridge Ontology — Cross-Domain Reasoning as Infrastructure
  5. The Three-Perspective Cognitive Model
  6. Human Expertise Integration — Where AI Cannot Substitute
  7. Model-Agnostic Adapter Pattern and Cost-Optimized Routing
  8. Deployment Architecture
  9. Instantiation Guide — Building Your Own Box
  10. Comparison to Existing Approaches
  11. The Open-Source Specification and Invitation to Build
Section 1

The Problem with How We Build LLM Systems Today

Brief
The industry relies on five approaches to make LLMs useful inside organizations: fine-tuning, RAG, system prompts, MCP, and platform wrappers. Each solves part of the problem. None solves the whole problem. No combination of them — without significant custom engineering — produces a unified, portable, model-agnostic system with structured knowledge, cross-domain reasoning, behavioral identity, cost-optimized routing, and formal evaluation. That gap is where most organizational AI deployments quietly fail. OrgBox is designed to fill it.
Technical Summary

Large language models are powerful, but the methods for deploying them inside organizations have not kept pace with the models themselves. The industry has converged on five dominant approaches, each with structural limitations that become critical at organizational scale.

Fine-tuning embeds domain knowledge directly into model weights. This produces fluent, domain-native outputs but creates four problems: the knowledge is opaque and unauditable, it locks the organization to a single model provider, it is expensive to maintain as knowledge evolves, and it encodes knowledge without encoding behavior — the model learns terminology but not how to conduct a structured conversation.

RAG solves the freshness and traceability problems by retrieving documents at query time. But standard RAG retrieves without reasoning across domains, has no persistent identity or behavioral boundaries, stores knowledge as unstructured text chunks with no ontological relationships, and provides no framework for structured multi-step interactions.

System prompts shape behavior through prepended instructions. They are fast to iterate but brittle across models, unversioned, untestable, and consume context window space. They work for prototypes but not for production systems requiring reliability across thousands of interactions.

MCP standardized tool interoperability — a genuine breakthrough. But it is a protocol for calling functions, not an architecture for organizational intelligence. It defines how a model uses a tool, not when, why, or in what behavioral context.

Platform wrappers (Custom GPTs, Claude Projects, Gems) make it easy to create specialized assistants. But they lock users to a single provider, treat knowledge as a document pile rather than a structured system, and lack evaluation frameworks, cost routing, interaction design, or human expertise boundaries.

Each approach covers real ground. Together, they still leave a gap: no unified, portable, model-agnostic system that integrates structured knowledge, cross-domain reasoning, behavioral identity, validated interaction patterns, cost-optimized routing, human expertise boundaries, and formal evaluation into a single deployable artifact. That gap is where organizational AI deployments fail — not catastrophically, but through outputs that are slightly wrong, slightly generic, and just unreliable enough that the organization never fully trusts or commits to the system. OrgBox is the open-source architecture designed to close that gap.

Section 2

What OrgBox Is

Brief
OrgBox is a portable, structured intelligence layer that sits between an organization and any foundation LLM. It is a deployable artifact — like a container image — containing knowledge, ontologies, identity, tools, interaction patterns, and evaluation criteria organized across seven architectural layers. The model generates language; the Box provides everything else. The architecture is model-agnostic and domain-agnostic: swap models by changing a config value, and populate the Box with any domain's knowledge to create a new instance.
Technical Summary

OrgBox is a structured package — a directory with a defined schema — that transforms any compatible LLM into a domain-specific organizational intelligence. The closest analogy is a Docker container image: just as a container packages everything an application needs independent of the host machine, an OrgBox instance packages everything a domain-intelligent AI system needs independent of the underlying model.

The Box is organized into seven functional layers (L0–L6) plus a deployment layer (L7). L0 routes queries and manages cost. L1 defines identity, values, and behavioral boundaries. L2 holds structured ontologies and the cross-domain bridge that enables reasoning across fields. L3 stores domain knowledge (vector store, graph store, term bank). L4 provides tool integrations and model adapters via MCP. L5 contains validated interaction patterns — complete conversation architectures, not prompt templates. L6 handles evaluation, guardrails, and compliance. L7 manages deployment and distribution.

The model provides language generation — essential but insufficient. It does not know the organization's domain, values, workflows, or when to escalate to a human. The Box provides all of that.

Model-agnosticism is an architectural principle, not a convenience. The adapter pattern at L4 means switching from Mistral to Claude to Llama requires changing a configuration value. All knowledge, ontology, identity, and evaluation criteria remain intact. This means no organization using OrgBox is dependent on any single AI provider's pricing, licensing, or continued existence. The investment an org makes in building its Box compounds over time rather than depreciating with each model generation.

The architecture is also domain-agnostic. The seven-layer structure, interfaces, and methodology are reusable. The first instance — CommunityLLM — covers community organizing across 20 social science domains. But the same architecture could produce a LegalBox, HealthBox, or OpsBox by populating the layers with different domain content.

OrgBox is not a framework (like LangChain), not a platform (like AWS Bedrock), not a protocol (like MCP), and not a wrapper (like Custom GPTs). It is an open specification for a portable intelligence layer — defining what goes between the model and the organization, how it is structured, and how it is built.

Section 3

The Seven-Layer Architecture

Brief
OrgBox uses seven layers, each with one job: L0 routes queries and manages cost, L1 defines identity and behavior, L2 holds structured ontologies and the cross-domain bridge, L3 stores domain knowledge, L4 connects to tools and models, L5 governs structured conversations, and L6 evaluates and audits outputs. L7 handles deployment. Every component is a human-readable file. The model is a swappable parameter. Data flows down through the layers; evaluation flows back up.
Technical Summary

The seven-layer architecture is governed by four principles: separation of concerns (each layer has one job and can be developed, tested, and replaced independently), directional flow (queries flow down from L0, evaluation flows up through L6), everything is a file (all components are human-readable YAML, JSON, or Markdown in a version-controlled directory), and the model is a parameter (no layer assumes a specific model; all LLM calls go through the L4 adapter interface).

L0 — Query Router and Orchestrator. Every query is classified along five axes: domains touched, complexity (1–5 scale), stakes (low/medium/high), interaction mode, and user context. A 12-type Cognitive Type Taxonomy (Analytical, Evaluative, Synthesis, Ethical Deliberation, Empathic Modeling, etc.) determines what kind of thinking the query demands. These classifications drive three decisions: which model tier handles it, which layers activate, and how the context budget is allocated across layers. Classification happens before any model is invoked, typically under 100ms.
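The routing decision above can be condensed into a few lines. Everything in this sketch is illustrative: the field names, thresholds, and tier rules are assumptions made for the example, not the reference L0 implementation.

```python
from dataclasses import dataclass

@dataclass
class QueryClassification:
    """Five-axis classification produced by L0 before any model is invoked."""
    domains: list[str]          # domains touched
    complexity: int             # 1-5 scale
    stakes: str                 # "low" | "medium" | "high"
    mode: str                   # interaction mode
    cognitive_types: list[str]  # from the 12-type Cognitive Type Taxonomy

def assign_tier(c: QueryClassification) -> int:
    """Map a classification to a model tier (1 = local, 2 = mid-range, 3 = frontier)."""
    if c.stakes == "high":                       # crisis-adjacent: always frontier
        return 3
    if c.complexity >= 4 or len(c.domains) > 1:  # complex or cross-domain work
        return 2
    return 1                                     # cheap local model otherwise
```

The same classification record would also drive the other two decisions L0 makes — which layers activate and how the context budget is split — which are omitted here.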

L1 — Identity and Behavioral Core. Structured YAML defining mission, values, persona, communication style, behavioral boundaries, and crisis protocols. Parameterized by user context — the system adapts tone, vocabulary, and depth to expertise level, language, and emotional state. Fully version-controlled and auditable; an org can diff two versions and see exactly what changed.
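The auditability property — diff two identity versions and see exactly what changed — can be illustrated with stdlib tooling. The identity fields below are hypothetical examples, and the two versions are shown in parsed form rather than as the YAML files on disk.

```python
import difflib
import json

# Two versions of a (hypothetical) L1 identity file, shown as parsed dicts;
# in the Box these would be version-controlled YAML files.
identity_v1 = {"mission": "Support community organizers",
               "tone": "warm, plain language",
               "boundaries": ["no legal advice", "escalate crisis queries"]}
identity_v2 = dict(identity_v1, tone="warm, plain language, jargon-free")

def identity_diff(a: dict, b: dict) -> list[str]:
    """Line diff of two identity versions -- the audit view an org would review."""
    fa = json.dumps(a, indent=2, sort_keys=True).splitlines()
    fb = json.dumps(b, indent=2, sort_keys=True).splitlines()
    return [line for line in difflib.unified_diff(fa, fb, lineterm="")
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
```

Here the diff surfaces exactly one removed line and one added line: the changed tone setting.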

L2 — Cross-Domain Reasoning Matrix. Three components: an upper ontology (shared categories and relations), domain ontologies (one per field, YAML, following the 7-artifact frozen package), and the bridge ontology (typed cross-domain connections — analogies, causal chains, contradictions — with evidence strength and human validator metadata). The bridge is compressed to ~1–2K tokens for prompt injection.

L3 — Domain Knowledge Store. Knowledge extraction pipeline, vector store (multi-index, scoped by domain), graph store, and term bank. This is where RAG lives — but structured RAG. Retrieved passages carry domain, confidence, and provenance tags, and are assembled alongside L2 ontology context.

L4 — Agentic Capabilities. Tool definitions (MCP-compatible), workflow definitions (YAML, orchestrated by Kestra), and model adapters implementing a standard interface. OrgBox is both MCP server and MCP client.

L5 — Interaction Patterns. Structured multi-turn conversation architectures — not prompt templates. Each pattern specifies flow, branching logic, required information elements, escalation rules, quality criteria, and adaptation rules. These represent professional expertise about how to conduct effective domain-specific conversations.

L6 — Evaluation and Guardrails. Three modes: sync guardrails (fast, always-on safety/scope checks), async evaluation (model-as-judge quality assessment after delivery), and L6.5 compliance/trust layer (provenance audit trail). Key principle: the system validates outputs against verifiable criteria, not against whether the reasoning text looks convincing.

L7 — Deployment. Supports fully local, hybrid, and fully hosted configurations. Reference implementation deploys as a Nextcloud ExApp.

A complete query lifecycle — classification, identity injection, ontology context, knowledge retrieval, model call, evaluation, delivery — completes in 2–4 seconds at ~$0.003 for a Tier 2 query.

Section 4

The Bridge Ontology — Cross-Domain Reasoning as Infrastructure

Brief
The bridge ontology is OrgBox's key innovation: a structured, validated map of connections between knowledge domains. It encodes six types of cross-domain relationships (concept mappings, structural analogies, causal chains, intervention chains, contradictions, abstraction mappings) as YAML records with evidence strength and human validator metadata. The core claim — that a small free model with bridge context can match a frontier model without it — is tested at Phase 1.5 via a 50-query existential test. Results will be published regardless of outcome.
Technical Summary

Knowledge domains are silos. Academic fields develop separate terminology, frameworks, and literatures. The connections between them — where the same phenomenon appears under different names, where an intervention in one field produces effects in another, where two fields contradict each other — are real and consequential but scattered across literatures that don't cite each other. Organizations whose work spans multiple domains (most organizations) need cross-domain reasoning, and existing AI systems handle it poorly.

The bridge ontology addresses this by encoding typed, validated connections between domain ontologies as structured data. Each entry records: source concept, target concept, relation type (from a closed set of six), evidence strength (0.0–1.0), evidence source (citation), and human validator identity.

The six relation types: concept mapping (same phenomenon, different terminology across fields), structural analogy (different phenomena sharing deep relational structure, formalized via Gentner's Structure Mapping Theory), causal chain (documented cross-domain causal links), intervention chain (how an action in one field produces effects in another — the most practitioner-useful type), contradiction (where fields disagree or use terms incompatibly), and abstraction mapping (connections between different levels of generality across domains).
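A minimal validator for bridge records might look like the following sketch. The exact YAML key names are assumptions; only the six relation types and the record fields listed above come from the specification.

```python
RELATION_TYPES = {            # the closed set of six
    "concept_mapping", "structural_analogy", "causal_chain",
    "intervention_chain", "contradiction", "abstraction_mapping",
}

def validate_bridge_entry(entry: dict) -> list[str]:
    """Return a list of validation errors for one bridge record (empty = valid)."""
    errors = []
    for field in ("source_concept", "target_concept", "relation_type",
                  "evidence_strength", "evidence_source", "validator"):
        if field not in entry:
            errors.append(f"missing field: {field}")
    if entry.get("relation_type") not in RELATION_TYPES:
        errors.append("relation_type not in the closed set of six")
    strength = entry.get("evidence_strength")
    if not (isinstance(strength, (int, float)) and 0.0 <= strength <= 1.0):
        errors.append("evidence_strength must be in [0.0, 1.0]")
    return errors
```

A check like this can run in CI on every commit to the bridge files, so that only well-formed, human-attributed entries reach the query-time matrix.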

Each domain is represented as a standardized 7-artifact frozen package in a single YAML file: taxonomy, controlled vocabulary, thesaurus (SKOS conventions), ontology (11 typed relationships), metadata schema, governance record, and inter-domain connection hints. YAML is used instead of OWL deliberately — it is human-readable, diffs cleanly in Git, compresses well for prompt injection, and lowers the expertise barrier. The trade-off is losing automated logical reasoning, which is acceptable because the LLM handles inference while the ontology provides structure.

At query time, L0 dynamically assembles a condensed bridge matrix (~1–2K tokens) containing only the bridge entries relevant to the detected domains, ranked by evidence strength and query relevance.
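The assembly step can be sketched as a rank-and-truncate pass. The `source_domain`/`target_domain` keys and the flat per-entry token estimate are simplifying assumptions; a real implementation would use an actual tokenizer and the query-relevance scorer.

```python
def condense_bridge(entries, active_domains, token_budget=1500,
                    tokens_per_entry=60, relevance=None):
    """Select bridge entries touching the detected domains, rank by evidence
    strength weighted by query relevance, and cut to a rough token budget."""
    relevance = relevance or (lambda e: 1.0)   # default: strength-only ranking
    candidates = [e for e in entries
                  if e["source_domain"] in active_domains
                  and e["target_domain"] in active_domains]
    candidates.sort(key=lambda e: e["evidence_strength"] * relevance(e),
                    reverse=True)
    return candidates[: max(1, token_budget // tokens_per_entry)]
```

With the default ~1–2K token budget this keeps on the order of a few dozen entries, consistent with the per-domain-pair counts suggested in the instantiation guide.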

The existential test (Phase 1.5, Sprint 7) validates the entire approach: 50 expert-generated cross-domain queries evaluated blind across three conditions — small model with bridge, small model without bridge, frontier model without bridge. Success means the small+bridge condition matches frontier quality. Results will be published with full methodology and raw data regardless of outcome.

Section 5

The Three-Perspective Cognitive Model

Brief
OrgBox processes every interaction through three simultaneous perspectives: what the user is actually thinking (Perspective A), what the system infers the user needs (Perspective B), and what cognitive mode the system itself should adopt (Perspective C). A 12-type Cognitive Type Taxonomy classifies the kind of thinking each query demands. Meta-reasoning — the system asking "what kind of thinking does this require?" — is the single function that enables both cost optimization and cognitive quality. Three interlocking registries (Human Expertise, Psychology Touchpoint, Cognitive Attention) complete the human layer specification.
Technical Summary

Most LLM systems treat queries as transactions: message in, response out. OrgBox adds a cognitive architecture that asks three questions before generating anything.

Perspective A — User's cognition. What is the human actually thinking? This is cognitive modeling, not user profiling. The same words ("How did it go?") carry entirely different cognitive intent depending on whether the speaker is a data engineer (expecting metrics) or a community organizer (expecting stories and relationships). Perspective A reads the query through the lens of who is asking and what cognitive operation they are performing.

Perspective B — System's model of the user. What does OrgBox infer the user needs, which may differ from what they explicitly asked? A junior volunteer asking a technically precise question may be operating beyond their depth; Perspective B detects this and adds scaffolding. The critical constraint is humility — B informs formatting, depth, and vocabulary but never overrides explicit user requests.

Perspective C — System's model of itself. What cognitive mode should OrgBox adopt? Is this a factual lookup, an analytical decomposition, a multi-domain synthesis, an ethical deliberation? The answer determines model tier, prompt scaffolding, context allocation, and evaluation criteria.

The Cognitive Type Taxonomy provides the vocabulary: 12 types of cognitive operations (Analytical, Evaluative, Pattern Recognition, Synthesis, Creative, Ethical Deliberation, Empathic Modeling, Classification, Spatial/Structural, Meta-Reasoning, Decision Under Uncertainty, Translation/Formatting). A single query can invoke multiple types; the system identifies the blend and adjusts accordingly.
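As a data structure, the taxonomy is small enough to write out in full. The blend-normalization helper is an illustrative assumption about how multi-type queries might be weighted; the twelve type names come from the taxonomy above.

```python
from enum import Enum

class CognitiveType(Enum):
    ANALYTICAL = "analytical"
    EVALUATIVE = "evaluative"
    PATTERN_RECOGNITION = "pattern_recognition"
    SYNTHESIS = "synthesis"
    CREATIVE = "creative"
    ETHICAL_DELIBERATION = "ethical_deliberation"
    EMPATHIC_MODELING = "empathic_modeling"
    CLASSIFICATION = "classification"
    SPATIAL_STRUCTURAL = "spatial_structural"
    META_REASONING = "meta_reasoning"
    DECISION_UNDER_UNCERTAINTY = "decision_under_uncertainty"
    TRANSLATION_FORMATTING = "translation_formatting"

def normalize_blend(weights: dict) -> dict:
    """Turn raw per-type scores for one query into a blend summing to 1.0."""
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}
```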

Meta-reasoning (MR) is the most important function in the system. It asks "what kind of thinking does this require?" in milliseconds via a lightweight classifier. This single operation enables the 92% cost reduction (by routing simple queries to cheap models) and drives cognitive quality (by matching the system's output mode to the query's actual demand — empathic attunement for emotional queries, analytical precision for technical ones).

Three registries complete the human layer: the Human Expertise Registry (where human judgment is irreplaceable), the Psychology Touchpoint Registry (where psychological principles constrain design), and the Cognitive Attention Registry (how cognitive operations work at every system touchpoint from all three perspectives). This three-registry pattern is domain-agnostic and reusable across any OrgBox instance.

Section 6

Human Expertise Integration — Where AI Cannot Substitute

Brief
OrgBox treats human expertise as a permanent structural component, not a bottleneck to be automated away. The Human Expertise Registry (HX) formally catalogs every point where human judgment is irreplaceable — ontology authoring, bridge validation, ethics review, democratic accountability. The companion Psychology Touchpoint Registry (PX) specifies why psychology constrains the design at each of those points. Founding principle: machines assist, organize, suggest, and scale — but the authoritative act of defining what a term means is a human judgment call.
Technical Summary

Most AI architectures treat human involvement as a constraint to minimize. OrgBox inverts this: human expertise is a permanent architectural component, as fundamental as the model adapter or vector store.

The founding principle comes from ontologist Ole Olesen-Bagneux: humans must write the vocabulary. The machine can assist, organize, and scale, but the authoritative act of defining what a term means in a domain is a human judgment call. This generalizes: anywhere a judgment requires domain expertise, ethical reasoning, cultural context, community trust, or democratic legitimacy — that is a human seat. The system serves the human in that seat; it does not occupy it.

The Human Expertise Registry (HX) formalizes this as a system architecture artifact. Each entry specifies: component, human role required, what the human does, what the system does, why AI cannot replace this, and when the role is needed. Entries cluster into four categories: semantic infrastructure (vocabulary authoring, taxonomy validation, bridge ontology construction — the highest-judgment task in the system), per-deployment configuration (ontology customization, domain review, training), knowledge provenance and governance (source selection, extraction quality, vocabulary governance), and ethical/democratic legitimacy (ethics review, democratic accountability).

Five design requirements follow: the system must have a designed interface for every human role (not backend access), must present information in the specialist's cognitive frame, must learn from human judgments over time, must never bypass a human checkpoint even when confidence is high, and must document every human decision for auditability and attribution.

The Psychology Touchpoint Registry (PX) is the companion artifact explaining why psychology constrains design at each point where HX says who is needed. It draws on Self-Determination Theory, cognitive load theory, tacit knowledge research, flow theory, change management psychology, and identity dynamics. The connection is direct: HX says hire an ontologist; PX says that ontologist needs autonomy support, that their tacit knowledge is what makes them irreplaceable, and that disrupting their workflow is a change management challenge that can threaten professional identity.

Both registries are domain-agnostic patterns. Any OrgBox instance — legal, health, civic — needs its own HX and PX entries. The pattern is reusable; the content is domain-specific.

Section 7

Model-Agnostic Adapter Pattern and Cost-Optimized Routing

Brief
OrgBox connects to any LLM through a standard four-operation adapter interface (assemble, generate, parse, report). Switching models means changing a config value. A three-tier routing system sends ~85% of queries to a free local model, ~10% to a mid-range API, and ~5% to a frontier model — achieving 92% cost reduction versus all-frontier routing. When the cloud budget runs out, the system drops to local-only mode. It never locks an organization out of its own AI.
Technical Summary

OrgBox communicates with language models through model adapters — an abstraction layer where each adapter implements a standard four-operation interface: assemble (format the structured request for the target model's API), generate (make the call, handle retries and errors), parse (normalize the response back to the Box's internal format), and report (return metadata: model name, tokens, latency, cost). Writing a new adapter for any provider requires implementing these four operations — typically a few hundred lines of code. The Box itself does not change.
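The four-operation interface might be expressed as an abstract base class, with a toy adapter showing the call sequence. The class and method signatures are a sketch of the interface described above, not the normative one from the specification.

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """Standard four-operation interface; one concrete adapter per provider."""

    @abstractmethod
    def assemble(self, request: dict) -> dict:
        """Format the Box's structured request for the target model's API."""

    @abstractmethod
    def generate(self, payload: dict) -> dict:
        """Make the call, handling retries and errors."""

    @abstractmethod
    def parse(self, raw: dict) -> dict:
        """Normalize the provider response to the Box's internal format."""

    @abstractmethod
    def report(self) -> dict:
        """Metadata for the last call: model name, tokens, latency, cost."""

class EchoAdapter(ModelAdapter):
    """Toy adapter used only to demonstrate the assemble -> generate -> parse flow."""
    def assemble(self, request): return {"prompt": request["text"]}
    def generate(self, payload): return {"output": payload["prompt"].upper()}
    def parse(self, raw): return {"text": raw["output"]}
    def report(self): return {"model": "echo", "tokens": 0, "latency_ms": 0, "cost": 0.0}
```

Because the Box only ever sees the normalized dicts, swapping `EchoAdapter` for a real provider adapter changes nothing upstream of L4.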

Model-agnosticism is a strategic requirement, not a convenience. The AI model landscape changes faster than any deployment can track. An organization locked to one provider reprices its entire AI infrastructure every time that provider changes terms. With OrgBox, knowledge, ontologies, identity, and evaluation live in the Box — not in any model's weights. That investment compounds over time rather than depreciating with each model generation. The adapter pattern also enables per-query switching between local models (for privacy-sensitive data) and cloud models (for capability).

Three-tier cost optimization makes this economically viable. Tier 1 (local, free, ~85% of queries): a 7–14B parameter model via Ollama. The Box provides structured knowledge, behavioral guidance, and ontological context — turning the model's job into reading comprehension and synthesis, which small models handle well with good input. Tier 2 (mid-range API, ~10%, ~$0.003/query): Mistral Small with a reasoning_effort parameter creating a further cost gradient within the tier. Tier 3 (frontier, ~5%, ~$0.05/query): always used for crisis-adjacent queries regardless of apparent complexity.

Result: ~$2.80/month for 1,000 queries. Fully local deployments cost zero.
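The headline figure follows directly from the tier mix and per-query prices given above:

```python
def monthly_cost(queries=1000, mix=(0.85, 0.10, 0.05),
                 per_query=(0.0, 0.003, 0.05)):
    """Expected monthly API spend given the tier mix and per-query prices."""
    return sum(queries * share * price for share, price in zip(mix, per_query))

# 1,000 queries: 850 local (free) + 100 x $0.003 + 50 x $0.05 = $2.80
```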

Six activation profiles standardize routing decisions (QUICK_LOOKUP, DOMAIN_EXPERT, BRIDGE_REASONING, CRISIS_RESPONSE, AGENTIC_WORKFLOW, GENERATION). When L6 evaluation flags a Tier 1 response, the system escalates via a draft-then-refine pattern — sending the original query, the Tier 1 draft, and L6's specific concerns to Tier 3 for improvement. This is cheaper and often better than generating from scratch. Organizations set monthly API cost caps; when exhausted, the system drops to Tier 1 only. It never cuts off service entirely.
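The escalation and budget-cap behavior above reduces to one piece of control flow. The function signatures here are illustrative assumptions; in the real system the evaluator is L6 and the tiers are adapter-backed model calls.

```python
def answer_with_escalation(query, tier1, tier3, evaluate, budget_remaining):
    """Tier 1 drafts; if L6 flags the draft and cloud budget remains, escalate
    via draft-then-refine rather than regenerating from scratch."""
    draft = tier1(query)
    concerns = evaluate(draft)               # L6 check; empty list = acceptable
    if not concerns or budget_remaining <= 0:
        return draft                         # budget exhausted: local-only, never cut off
    # Send query, draft, and the specific concerns upward for refinement.
    return tier3({"query": query, "draft": draft, "concerns": concerns})
```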

Section 8

Deployment Architecture

Brief
An OrgBox instance is a version-controlled directory of human-readable files (YAML, JSON, Markdown) organized by architectural layer. It supports three deployment modes — fully local (free, fully private), hybrid (local + cloud API), and fully hosted — selected by configuration, not architecture. The reference implementation deploys as a Nextcloud ExApp via Docker, but the Box runs on any platform that can host a container. What changes per deployment is domain content; the architectural core stays constant.
Technical Summary

An OrgBox instance is a directory with a defined structure. Every component — identity rules, ontologies, knowledge records, tool definitions, interaction patterns, evaluation criteria — is a human-readable file in a version-controlled tree. An engineer with a text editor can inspect everything. Nothing is hidden in a database or binary.

The directory is organized by layer: L1-identity/ (values, persona, boundaries, adaptive rules), L2-reasoning-matrix/ (upper ontology, domain ontologies, bridge, reasoning scaffold), L3-knowledge/ (corpus, vectors, graph, term bank), L4-capabilities/ (tools, workflows, adapters), L5-interaction/ (conversation flows, templates, scenarios), L6-evaluation/ (sync checks, async checks, compliance).

A single config.yaml defines runtime parameters: available models per tier, API endpoints, monthly budget cap, enabled domains, and platform integration settings. This is the only file that differs between a laptop deployment running Mistral locally and a managed server running Claude via API.
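A hypothetical config might carry the following parameters, shown here in parsed form; the key names and model identifiers are illustrative, not the normative schema. Swapping the Tier 3 provider touches exactly one value.

```python
# Parsed form of a hypothetical config.yaml -- keys and model names are examples.
config = {
    "models": {"tier1": "local-7b-via-ollama",
               "tier2": "mid-range-api-model",
               "tier3": "frontier-api-model"},
    "budget": {"monthly_cap_usd": 20.0},
    "domains": ["community_organizing", "social_psychology", "public_health"],
    "deployment_mode": "hybrid",   # "local" | "hybrid" | "hosted"
}

def validate_config(cfg: dict) -> None:
    """Sanity-check the runtime parameters before the Box boots."""
    assert {"tier1", "tier2", "tier3"} <= cfg["models"].keys()
    assert cfg["deployment_mode"] in {"local", "hybrid", "hosted"}
    assert cfg["budget"]["monthly_cap_usd"] >= 0
```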

Three deployment modes. Fully local: model runs via Ollama, all data on-premises, no internet required, zero inference cost, needs 12GB+ VRAM. Hybrid (default/recommended): ~85% of queries handled locally for free, the rest routed to cloud APIs, with sensitive queries configurable to always stay local. Fully hosted: for organizations without local compute, least private but most accessible. The deployment mode is a config choice — the Box is identical across all three.

Nextcloud reference implementation. The Box deploys as a Nextcloud ExApp (Docker container via AppAPI). Four communication channels: Task Processing (inbound queries), WebDAV file access (inbound, user-consented), Action writing (outbound — calendar, chat, documents), and MCP server (bidirectional — exposing tools to NC Context Agent and consuming NC capabilities). Nextcloud is the reference, not a dependency.

Customization surface. What changes per deployment: L1 identity, L2 ontologies, L3 knowledge, L5 interaction patterns, L6 evaluation criteria, runtime config. What stays constant: L0 router logic, L4 adapter pattern, L6 evaluation framework, L7 deployment infrastructure. This separation is what makes the architecture reusable — a LegalBox and a HealthBox share the same core, with different domain content.

Section 9

Instantiation Guide — Building Your Own Box

Brief
An eight-step process for building a domain-specific OrgBox instance: define domains (start with 3–5), build ontology packages (7-artifact YAML template per domain), populate the knowledge store, design bridge ontology entries, configure identity, build interaction patterns, set up evaluation, and deploy. A minimal instance (3 domains, no bridge, basic identity) can be built by one person in 4–6 weeks. Estimated costs range from $15K–$40K for a small instance to $200K–$500K for a 10–20 domain enterprise build.
Technical Summary

This section translates the architecture into a practical build process — eight steps from zero to a running domain-specific instance.

Step 1 — Define domains. Identify the knowledge fields your organization draws on simultaneously when facing its hardest problems. Write a one-sentence scope statement and boundary for each. Map adjacencies between domains — these drive bridge construction. Start with fewer than you think you need: 3–5 well-built domains with validated bridges beat 15 shallow domains without them.

Step 2 — Build ontology packages. Each domain gets a single YAML file following the 7-artifact frozen package template. Build sequence: top concepts (5–15 broadest categories) → child concepts → genus-differentia definitions for every entry → relationships from the closed set of 11 types → controlled vocabulary for ambiguous terms → bridge hints for cross-domain connections → governance documentation. For 3–5 domains, domain experts can draft packages that a professional ontologist later validates. For 10+ domains, involve a professional ontologist from the start.
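A completeness check over a drafted package can be sketched as follows. The artifact key names are assumptions approximating the 7-artifact list; the requirement that every taxonomy entry carry a genus-differentia definition comes from the build sequence above.

```python
SEVEN_ARTIFACTS = ["taxonomy", "controlled_vocabulary", "thesaurus",
                   "ontology", "metadata_schema", "governance_record",
                   "bridge_hints"]

def check_frozen_package(pkg: dict) -> list[str]:
    """Flag missing artifacts and taxonomy concepts that lack definitions."""
    missing = [a for a in SEVEN_ARTIFACTS if a not in pkg]
    undefined = [concept for concept, spec in pkg.get("taxonomy", {}).items()
                 if not spec.get("definition")]
    return ([f"missing artifact: {a}" for a in missing]
            + [f"concept without definition: {c}" for c in undefined])
```

Run against each domain's YAML (parsed to a dict), this gives domain experts immediate feedback before a professional ontologist ever sees the draft.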

Step 3 — Populate knowledge store. Select ~10 authoritative source texts per domain (quality over quantity). Extract structured knowledge via KES or equivalent. Build the vector store (multi-index: one per domain plus global), graph store, and term bank.

Step 4 — Design bridge ontology. Gather bridge hints from all domain packages. A cross-disciplinary knowledge engineer evaluates each candidate, types and grades accepted entries, and builds the condensed matrix (~1–2K tokens). 15–30 validated entries per domain pair is sufficient to meaningfully improve cross-domain reasoning.

Step 5 — Configure identity. Mission, persona, boundaries, and adaptive behavior rules — all structured YAML.

Step 6 — Build interaction patterns. Design conversation flows for key use cases: stages, transitions, information requirements, escalation rules, quality criteria.

Step 7 — Set up evaluation. Sync guardrails, async model-as-judge criteria, compliance/audit infrastructure.

Step 8 — Deploy. Choose deployment mode, choose platform integration, run validation suite, onboard users with a human trainer/facilitator.

Estimated effort: Small instance (3–5 domains, single org): 8–12 weeks, $15K–$40K. Medium (5–10 domains, production): 4–6 months, $50K–$150K. Large (10–20 domains, enterprise): 12–18 months, $200K–$500K. A minimally viable instance — 3 domains, no bridge, basic identity — can be built by one technically capable person in 4–6 weeks using the open-source templates.

Section 10

Comparison to Existing Approaches

Brief
OrgBox is compared structurally — not competitively — to fine-tuning, RAG, MCP, agent frameworks, and platform wrappers. Each addresses a subset of what organizations need. Across seven capability dimensions (identity, structured knowledge, cross-domain reasoning, retrieval, tools, interaction patterns, evaluation), no existing approach covers more than one or two. OrgBox addresses all seven plus cost optimization and model portability. The approaches are complementary: a fine-tuned model can be a Tier 1 engine, RAG lives at L3, MCP lives at L4, and frameworks can implement parts of the Box.
Technical Summary

This section maps OrgBox against the five dominant approaches from Section 1, comparing what each provides structurally across seven capability dimensions.

vs. Fine-Tuning. Fine-tuning produces domain-fluent language without external knowledge injection — stronger for narrow, stable tasks. But it provides knowledge only (partially), with no auditable identity, no structured ontology, no cross-domain reasoning, no evaluation, no cost optimization, and no portability. The two are complementary: a fine-tuned model can serve as Tier 1 inside an OrgBox instance, combining fluency with the Box's structured intelligence.

vs. RAG. RAG is efficient and well-understood for pure knowledge retrieval. But it provides retrieval (L3) and nothing else — no identity, no structured knowledge, no cross-domain reasoning, no interaction patterns, no cost optimization, no evaluation. OrgBox includes RAG as one component. L3 retrieval is enhanced by L2 ontological context, L1 identity, and L6 evaluation.

vs. MCP. MCP solved tool interoperability and solved it well. But it is a protocol for one layer (L4) of what OrgBox provides. The relationship is complementary — OrgBox uses MCP as its tool protocol and addresses the six layers MCP does not: identity, structured knowledge, cross-domain reasoning, cost routing, interaction design, and evaluation.

vs. Agent Frameworks (LangChain, LlamaIndex). Frameworks offer maximum flexibility — a developer can build virtually anything. But they provide capabilities without architecture. The developer must still design knowledge architecture, build ontologies, define identity, create interaction patterns, and implement evaluation. LangChain is a toolkit; OrgBox is a blueprint. A developer can use LangChain to implement parts of an OrgBox instance.

vs. Platform Wrappers (Custom GPTs, Claude Projects, Gems). Fast to set up, no technical skill required, useful for individual and light team use. But not portable, no structured knowledge, no cross-domain reasoning, no interaction design, no evaluation, no cost optimization. For individual productivity, wrappers are a better choice. For organizational infrastructure that multiple people depend on, that handles high-stakes questions, and that must survive model changes — OrgBox is designed for that case.

What OrgBox does not replace: the model itself, training data, compute infrastructure, or human expertise.

Summary: Across the seven dimensions, every existing approach leaves most cells empty. OrgBox fills them.

Section 11

The Open-Source Specification and Invitation to Build

Brief
Everything described in this whitepaper is open-source: architecture spec, bridge ontology methodology, cognitive model, domain ontology template, adapter interface, router schema, evaluation framework, and the CommunityLLM reference implementation. Code is AGPL-3.0-or-later; content and specifications are CC BY-SA 4.0. The project is governed by the OLS Foundation (nonprofit). A formal research agenda commits to publishing all results — including failures. This is not a product launch. It is public infrastructure, and an invitation for engineers, ontologists, researchers, organizations, critics, and translators to build on it.
Technical Summary

Everything in this whitepaper is released under open licenses. The distinction is deliberate: code under AGPL-3.0-or-later (copyleft, ensuring modifications shared back) and content/specifications under CC BY-SA 4.0 (requiring attribution and share-alike to ensure improvements feed back into the commons).

What is being open-sourced: The complete seven-layer architectural specification. The bridge ontology methodology (six relation types, evidence grading, condensed matrix, 7-artifact frozen package template, existential test protocol). The three-perspective cognitive model and Cognitive Type Taxonomy. The domain ontology YAML template. The Human Expertise Registry template. The four-operation adapter interface with reference implementations. The L0 Router classification schema (five-axis classification, activation profiles, tier assignment). The evaluation framework (sync guardrails, async model-as-judge, L6.5 compliance, trace-collection-from-trace-trust principle). And the CommunityLLM reference implementation — the first OrgBox instance, covering 20 social science domains.
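The four-operation adapter interface can be gestured at with a `typing.Protocol`. The operation names and signatures below are guesses for illustration — the actual interface is defined in the open-sourced specification, not here.

```python
from typing import Iterator, Protocol

class ModelAdapter(Protocol):
    """Illustrative four-operation surface; names are hypothetical."""
    def complete(self, prompt: str, max_tokens: int) -> str: ...
    def stream(self, prompt: str) -> Iterator[str]: ...
    def embed(self, text: str) -> list[float]: ...
    def health(self) -> bool: ...

class EchoAdapter:
    """Trivial stand-in implementation for testing the interface shape."""
    def complete(self, prompt: str, max_tokens: int) -> str:
        return prompt[:max_tokens]
    def stream(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()
    def embed(self, text: str) -> list[float]:
        return [float(len(text))]
    def health(self) -> bool:
        return True
```

The structural-typing choice matters for portability: any provider wrapper that exposes the same operations satisfies the protocol without inheriting from a base class.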

Governance. The OLS Foundation, a nonprofit under fiscal sponsorship of One World Community Support Center (OWCSC), maintains the specification, owns the trademark, and governs the repositories. Specification changes require proposal, community discussion, and maintainer review. The Foundation does not control how instances are deployed.

Research agenda. OrgBox makes empirical claims that require validation. Committed research tracks include: bridge ontology validation (the existential test), cognitive routing optimization (validating the 85/10/5 distribution), ontology methodology refinement, cross-domain reasoning benchmarks (none currently exist — building one is part of the agenda), human expertise integration patterns, and three-perspective cognitive model evaluation. All results will be published regardless of outcome.

The invitation. Engineers: build adapters, implement the router, write evaluation harnesses. Ontologists: instantiate the spec for new domains — build a LegalBox, HealthBox, CivicBox. Researchers: replicate the existential test, design better evaluation methods. Organizations: deploy an instance for real work and report what breaks. Critics: challenge the assumptions. Translators: adapt the system for non-English-speaking communities who need it most.

What this is not. Not a product launch — nothing to buy. Not a startup pitch — no investors. OLS is a mutual aid nonprofit building public infrastructure. The architecture is designed and specified; significant build work remains. The honest status: the blueprint is drawn and the foundation is being poured.

The closing commitment: free, private, model-agnostic, human-centered organizational intelligence — available to every organization that needs it, governed by the community that builds it. The system adapts to human flourishing. Humans never adapt to the system.