Lab Experiments

What we're testing right now

Real experiments with real results. We run structured tests on model performance, agent behavior, memory retrieval, and training methods — then share what we find.

Propose an Experiment
Current Experiments

Running, completed, and planned

Active
Jan 2026
Agentic Systems

Multi-Agent Task Routing

Testing dynamic task distribution across specialized agents. Instead of one large model doing everything, we route subtasks to purpose-built micromodels and measure accuracy, latency, and cost.
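The routing idea can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the `AGENTS` registry and its toy handlers are hypothetical stand-ins for real micromodel endpoints, and the dispatcher simply times each call the way the experiment measures latency.

```python
import time

# Hypothetical registry mapping subtask types to specialized "micromodel"
# handlers. Real handlers would call purpose-built model endpoints.
AGENTS = {
    "summarize": lambda text: text[:50],
    "classify":  lambda text: "positive" if "good" in text else "negative",
}

def route(task_type, payload, fallback=lambda t: "UNHANDLED"):
    """Dispatch a subtask to its specialized agent and time the call."""
    handler = AGENTS.get(task_type, fallback)
    start = time.perf_counter()
    result = handler(payload)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

result, latency_ms = route("classify", "a good result")
```

In a fuller harness, each routed call would also log cost and an accuracy label so the three metrics named above can be compared against a single large model doing everything.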

Active
Dec 2025
Model Training

Distillation Efficiency Benchmarks

Measuring how much capability a 1B-parameter student model retains after distilling from a 70B teacher. Tracking performance across reasoning, summarization, and classification tasks.
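A common way to frame distillation is minimizing the KL divergence between temperature-softened teacher and student output distributions. The sketch below shows that standard objective in plain Python; the temperature value and function names are illustrative, not the lab's training setup.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from softened teacher targets to student predictions."""
    p = softmax(teacher_logits, temperature)  # teacher (soft targets)
    q = softmax(student_logits, temperature)  # student (predictions)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

When the student's logits match the teacher's, the loss is zero; the further its distribution drifts, the larger the penalty, which is what drives capability transfer during training.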

Active
Feb 2026
Cloud Memory

DomeAI Retrieval Latency

Benchmarking memory retrieval speeds in DomeAI as context stores scale from 10k to 10M entries. Optimizing vector search and cache strategies for sub-100ms recall.
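A latency benchmark of this shape can be sketched with a brute-force in-memory search and percentile timing. Everything here is illustrative: DomeAI's actual store uses vector indexes and caching, while this toy uses random vectors and a linear scan just to show how p95 recall latency is measured as the store grows.

```python
import random
import time

def brute_force_search(query, store):
    """Nearest neighbor by dot product over an in-memory vector store."""
    return max(range(len(store)),
               key=lambda i: sum(q * v for q, v in zip(query, store[i])))

def p95_latency_ms(store_size, dim=16, trials=20):
    """Measure 95th-percentile recall latency for a store of a given size."""
    random.seed(0)  # reproducible benchmark runs
    store = [[random.random() for _ in range(dim)] for _ in range(store_size)]
    times = []
    for _ in range(trials):
        query = [random.random() for _ in range(dim)]
        start = time.perf_counter()
        brute_force_search(query, store)
        times.append((time.perf_counter() - start) * 1000)
    return sorted(times)[min(int(0.95 * trials), trials - 1)]
```

Sweeping `store_size` from 10k toward 10M with a harness like this is what exposes where a linear scan breaks down and an approximate index or cache layer becomes necessary for sub-100ms recall.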

Completed
Oct 2025
Evaluation

Long-Context Faithfulness

Tested how faithfully models follow instructions placed at varying positions within 128k-token contexts. Results informed our prompt construction patterns for production agents.

Planning
Planned Q2 2026
Data Quality

Synthetic Data Quality Scoring

Designing a scoring rubric for synthetic training data. The goal is an automated pipeline that rates generated examples on accuracy, diversity, and difficulty before they enter training sets.
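The gating step such a pipeline needs can be sketched as a weighted rubric plus a threshold filter. The weights, threshold, and field names below are placeholder assumptions, not the rubric under design.

```python
def score_example(accuracy, diversity, difficulty, weights=(0.5, 0.3, 0.2)):
    """Weighted rubric score in [0, 1]; the weights are illustrative only."""
    for v in (accuracy, diversity, difficulty):
        if not 0.0 <= v <= 1.0:
            raise ValueError("rubric dimensions must be in [0, 1]")
    w_acc, w_div, w_dif = weights
    return w_acc * accuracy + w_div * diversity + w_dif * difficulty

def filter_batch(examples, threshold=0.7):
    """Keep only generated examples whose score clears the training gate."""
    return [ex for ex in examples if score_example(*ex["scores"]) >= threshold]
```

In an automated version, the three dimension scores would themselves come from model-based or heuristic raters; the rubric then reduces them to one admit/reject decision per example.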

Active
Jan 2026
Agentic Systems

Agent Self-Correction Loops

Running agents with a "reflection" step after each action. Comparing task completion rates and error recovery between agents with and without self-correction capabilities.
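The reflection loop being compared can be sketched as act, inspect, retry. This is a toy under stated assumptions: `act` and `check` stand in for a real agent action and its reflection critique, and a real loop would feed the critique back into the next attempt rather than simply retrying.

```python
def run_with_reflection(act, check, max_retries=2):
    """Reflection loop: act, inspect the outcome, retry on failure."""
    result = act()
    attempts = 1
    while not check(result) and attempts <= max_retries:
        result = act()  # a real agent would condition this retry on the critique
        attempts += 1
    return result, attempts

# Toy action that fails once before succeeding, to exercise the retry path.
calls = {"n": 0}
def flaky_action():
    calls["n"] += 1
    return "ok" if calls["n"] >= 2 else "error"

result, attempts = run_with_reflection(flaky_action, lambda r: r == "ok")
```

Counting `attempts` alongside final success is what lets the experiment compare completion rates and error recovery with and without the reflection step.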

Methodology

How we run experiments

Every experiment follows a structured process so results are reproducible and actionable — not just interesting.

01

Hypothesis

Start with a clear, testable question. What are we trying to prove or disprove?

02

Design

Define metrics, control variables, dataset size, and success criteria before running anything.

03

Execute

Run the experiment with logging at every step. Capture raw data, not just conclusions.

04

Analyze & Share

Publish findings internally (and sometimes publicly). Negative results are documented too.

Results

Notable findings so far

Distilled models retain 87% accuracy on classification

Our early distillation experiments showed that a 1.3B-parameter model can match a 70B model on binary classification tasks after targeted fine-tuning — at 40x lower inference cost.

Prompt position matters more than length

In long-context faithfulness testing, instruction placement at the beginning and end of context windows yielded 23% higher compliance than mid-context placement, regardless of total length.

Self-correcting agents complete 31% more tasks

Preliminary results from our reflection experiment show agents with a self-correction step complete significantly more multi-step tasks without human intervention.

FAQ

Common questions about our experiments

Do you share results publicly?
We publish summarized findings and key metrics publicly. Raw datasets are typically kept internal, but we share methodology details so others can replicate our experiments with their own data.

How do you choose what to test?
Experiments are driven by production needs. When we encounter a performance bottleneck, a training challenge, or an architectural question in our actual products, we design an experiment to answer it.

Can outside teams propose experiments?
Absolutely. If you have a specific question about AI model behavior, agent systems, or training techniques, reach out through our contact page. We regularly collaborate with partners on research questions.

How long does an experiment take?
Most experiments run for 4-12 weeks depending on complexity. Quick benchmarks can be completed in days, while behavioral studies on agent systems may take several months to gather meaningful data.

Do you report negative results?
Yes. We believe negative results are just as valuable as positive ones. Knowing what doesn't work prevents wasted effort across the AI community and keeps our own research honest.
Ready to Start?

Have an experiment idea?

We're always looking for interesting problems to study. If you have a hypothesis or challenge that could benefit from rigorous testing, let's talk.

Fixed-price quote before any work starts
You own 100% of the code
30 days of free support