Abstract
We introduce North Star Model, a model that harmonizes high computational efficiency with strong reasoning and agent performance. The key technical contributions of North Star Model are as follows: (1) Pattern Sparse Attention (PSA): We introduce PSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, North Star Model performs comparably to leading frontier models. Notably, our high-compute variant, North Star Model Speciale, surpasses comparable models, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Advanced Context Engineering: We developed context management techniques optimized for production AI agents, focusing on KV-cache optimization, attention manipulation, and robust error recovery. (4) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a synthesis pipeline that systematically generates training data at scale.
**Novel Training Methodology:** North Star Model employs an innovative training approach using simulated video game environments, progressing from simple logical tasks through increasingly complex interactive scenarios. This game-based training methodology enables the model to develop robust reasoning chains, spatial understanding, and multi-step planning capabilities through direct interaction with dynamic virtual environments.
1. Introduction
The release of reasoning models marked a pivotal moment in the evolution of Large Language Models (LLMs), catalyzing a substantial leap in overall performance across verifiable fields. Since this milestone, the capabilities of LLMs have advanced rapidly. However, a distinct divergence has emerged in recent months: while the open-source community continues to make strides, the performance trajectory of closed-source proprietary models has accelerated at a significantly steeper rate.

Through our analysis at Pattern Automation Lab, we identify three critical deficiencies that limit the capability of open-source models on complex tasks. First, architecturally, the predominant reliance on vanilla attention mechanisms severely constrains efficiency for long sequences. Second, regarding resource allocation, open-source models suffer from insufficient computational investment during the post-training phase. Finally, in the context of AI agents, open-source models demonstrate a marked lag in generalization and instruction-following capabilities compared to their proprietary counterparts.

To address these limitations, Pattern Automation Lab developed North Star Model with three key innovations: first, Pattern Sparse Attention (PSA), a highly efficient attention mechanism; second, a stable and scalable RL protocol that allocates over 10% of pre-training cost to post-training; and third, advanced context engineering principles derived from building production AI agent systems.

1.1 Innovative Video Game-Based Training Methodology
A groundbreaking aspect of North Star Model is its training methodology, which leverages progressively complex video game environments to develop reasoning capabilities. Unlike traditional text-only training approaches, our methodology provides the model with interactive, goal-oriented scenarios that require multi-step planning, spatial reasoning, and adaptive problem-solving.

Training Progression:
• **Phase 1: Logical Chain Development** - Initial training focused on establishing fundamental reasoning chains through structured logical problems.
• **Phase 2: Tetris Environment** - Custom Tetris-like environment for spatial reasoning and sequential decision-making.
• **Phase 3: Racing Simulations** - Simple racing games for continuous control and trajectory planning.
• **Phase 4: Chess Mastery** - Chess training for deep strategic thinking and multi-move planning.
• **Phase 5: Minecraft Environment** - Custom Minecraft-like environment for open-ended problem-solving, resource management, and tool usage.
• **Phase 6: Sims-Style Simulation** - Ongoing training in a custom Sims-like environment with simplified graphics, providing complex social reasoning scenarios.
2. North Star Model Architecture
2.1 Pattern Sparse Attention
North Star Model introduces Pattern Sparse Attention (PSA), an efficient attention mechanism developed specifically to handle the computational demands of our game-based training methodology. The architectural modification enables the model to process complex game states and long interaction sequences efficiently.

**Prototype of PSA:** The prototype of PSA primarily consists of two components: a lightning indexer and a fine-grained token selection mechanism. The lightning indexer computes index scores between each query token and the preceding tokens, and the selection mechanism retains only the highest-scoring tokens for attention. This sparse attention pattern was specifically designed to handle the sequential nature of game state representations.
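To make the two components concrete, the following PyTorch sketch shows one possible single-head instantiation of the prototype. The shapes, the dot-product indexer, and the top-k budget are illustrative assumptions for exposition, not the production kernel.

```python
import torch
import torch.nn.functional as F

def psa_prototype(q, k, v, idx_q, idx_k, top_k=64):
    """q, k, v: [seq, d] single-head tensors; idx_q, idx_k: [seq, d_idx]
    low-dimensional projections consumed by the lightning indexer."""
    seq = q.shape[0]

    # Lightning indexer: cheap index scores between each query token and all
    # preceding tokens (future positions are masked out).
    scores = idx_q @ idx_k.T                                        # [seq, seq]
    future = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))

    # Fine-grained token selection: keep only the top-k scored tokens per query.
    k_eff = min(top_k, seq)
    sel = scores.topk(k_eff, dim=-1).indices                        # [seq, k_eff]

    # Dense attention restricted to the selected tokens; early tokens have fewer
    # than k_eff causal candidates, so the mask is re-applied after gathering.
    valid = sel <= torch.arange(seq).unsqueeze(-1)                  # [seq, k_eff]
    logits = torch.einsum("td,tkd->tk", q / q.shape[-1] ** 0.5, k[sel])
    logits = logits.masked_fill(~valid, float("-inf"))
    attn = F.softmax(logits, dim=-1)
    return torch.einsum("tk,tkd->td", attn, v[sel])
```

Because the indexer operates on low-dimensional projections, its cost stays small relative to full attention, while the subsequent attention touches only k tokens per query instead of the entire prefix.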
3. Advanced Context Engineering for Production AI Agents
Pattern Automation Lab’s development of North Star Model required sophisticated context engineering techniques to achieve production-grade performance. Our research team discovered that proper context management is as critical as model architecture for real-world agent deployment. This section presents the fundamental principles of context engineering developed through extensive experimentation with North Star Model.

3.1 The Strategic Choice: In-Context Learning
At the inception of North Star Model development, Pattern Automation Lab faced a fundamental choice: train an end-to-end agentic model using open foundations or build an agent atop the in-context learning capabilities of frontier models. Our experience showed that in-context learning offers crucial advantages:
• Rapid iteration cycles measured in hours rather than weeks
• Orthogonality to base model progress - improvements in foundation models directly benefit our system
• Flexibility to adapt to changing requirements without retraining
This strategic decision shaped North Star Model’s architecture around optimizing in-context learning rather than parameter tuning. If model progress is a rising tide, North Star Model is designed to be a boat, not a pole stuck in the seabed.

3.2 Core Principle 1: Designing Around KV-Cache
KV-cache hit rate is the single most important metric for production AI agents, directly impacting both latency and cost. Pattern Automation Lab’s experiments with North Star Model revealed that proper KV-cache optimization can reduce costs by 10x and dramatically improve response times.

3.2.1 Agent Operational Principles
After receiving user input, North Star Model executes a chain of tool invocations:
• The model selects an action from a predefined action space based on current context
• The action executes in the environment (virtual sandbox)
• The result is added to context as an observation
• The cycle repeats until task completion
Context grows with each step while output (structured function calls) remains short. In North Star Model deployments, the average input-to-output token ratio is 100:1.
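A minimal sketch of this loop is shown below. The `model`, `sandbox`, and action schema are hypothetical stand-ins used only to illustrate the append-only structure of the context.

```python
import json

def run_agent(task, model, sandbox, tools, system_prompt, max_steps=50):
    # Append-only context: a stable prefix (system prompt + tool definitions),
    # followed by an immutable log of actions and observations.
    context = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        action = model.select_action(context, tools)      # structured function call
        if action["name"] == "finish":                    # task declared complete
            return action["arguments"]
        observation = sandbox.execute(action)             # run in the virtual sandbox
        context.append({"role": "assistant",
                        "content": json.dumps(action, sort_keys=True)})
        context.append({"role": "tool", "content": observation})
    return None
```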
3.2.2 Economic Benefits of KV-Cache
Contexts with identical prefixes can utilize the KV-cache, dramatically reducing Time To First Token (TTFT) and inference costs. For example, with leading model providers:
• Cached tokens: $0.30/MTok
• Uncached tokens: $3.00/MTok
• A 10x cost difference
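Using the example prices above, a short back-of-the-envelope calculation shows how strongly the cache hit rate drives per-step cost; the token counts are illustrative.

```python
def step_cost(prompt_tokens: int, cached_fraction: float,
              cached_price: float = 0.30, uncached_price: float = 3.00) -> float:
    """Dollar cost of one agent step given a fraction of cached prompt tokens."""
    cached = prompt_tokens * cached_fraction
    uncached = prompt_tokens - cached
    return (cached * cached_price + uncached * uncached_price) / 1_000_000

# A 100K-token context: ~$0.30 fully uncached vs ~$0.06 at a 90% hit rate.
print(step_cost(100_000, 0.0), step_cost(100_000, 0.9))
```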
3.2.3 Key KV-Cache Optimization Practices
1. Prompt Prefix Stability. Due to the autoregressive nature of LLMs, even a single-token difference invalidates the cache. Pattern Automation Lab developed strict guidelines:
• Avoid timestamps at the beginning of system prompts
• Ensure deterministic serialization of all context elements
• Maintain consistent ordering in JSON serialization
2. Append-Only Context. North Star Model maintains an append-only context structure:
• Never modify previous actions or observations
• Guarantee stable key ordering in JSON serialization
• Treat context as an immutable log of interactions
3. Explicit Cache Break Points.
• Some providers require manual insertion of cache break points
• Account for potential cache expiration in long-running sessions
• Include the end of the system prompt in break points
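A minimal sketch of the first two practices, assuming a simple string-based context log (the helper names are ours), might look like this:

```python
import json

def serialize(obj) -> str:
    # Stable key order and no timestamps or randomness: identical state always
    # produces byte-identical text, so the KV-cache prefix stays valid.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, separators=(",", ":"))

class ContextLog:
    """Append-only context: past entries are never rewritten in place."""

    def __init__(self, system_prompt: str):
        self.entries = [system_prompt]          # stable prefix, no timestamps

    def append(self, role: str, payload) -> None:
        self.entries.append(f"{role}: {serialize(payload)}")

    def render(self) -> str:
        return "\n".join(self.entries)
```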
3.3 Core Principle 2: Tool Masking Instead of Removal
3.3.1 The Action Space Growth Problem
As North Star Model’s capabilities expanded, the number of available tools grew rapidly. The popularity of protocols like MCP (Model Context Protocol) exacerbates this issue. User-provided tools can number in the hundreds, leading to:
• Incorrect action selection
• Inefficient solution paths
• Degraded overall agent performance

3.3.2 Problems with Dynamic Tool Management
Attempts at dynamic tool addition/removal create two critical problems:
**Problem 1: KV-Cache Invalidation** - Tool definitions reside at the beginning of the context, so any changes invalidate the cache for all subsequent actions.
**Problem 2: Model Confusion** - References to removed tools in previous actions lead to schema violations and hallucinations.

3.3.3 Solution: Context-Aware State Machine
Pattern Automation Lab developed a logit-masking approach for North Star Model instead of tool removal. The system maintains three function-calling modes:
• **Auto:** Model may choose to call a function or not
• **Required:** Model must call a function (choice unconstrained)
• **Specified:** Model must call a function from a defined subset
Action name prefixes enable grouping, as shown in the sketch below:
• browser_* - browser tools
• shell_* - command-line tools
• file_* - file system operations
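The following sketch illustrates the idea under an assumed tool-naming scheme; the mode names mirror the three function-calling modes above, and the masking API is simplified to a whitelist of tool names whose logits remain unmasked.

```python
from enum import Enum

class ToolMode(Enum):
    AUTO = "auto"            # may call a tool or reply directly
    REQUIRED = "required"    # must call some tool
    SPECIFIED = "specified"  # must call a tool from an allowed subset

def allowed_tools(all_tools: list[str], mode: ToolMode, prefix: str | None = None):
    """Return the tool names whose logits stay unmasked for the next step.

    Tool definitions never change (keeping the KV-cache valid); only the
    decoding-time mask is adjusted per state.
    """
    if mode is ToolMode.SPECIFIED and prefix:
        # e.g. prefix="browser_" keeps browser_open, browser_click, ...
        return [t for t in all_tools if t.startswith(prefix)]
    return list(all_tools)

tools = ["browser_open", "shell_exec", "file_read", "file_write"]
print(allowed_tools(tools, ToolMode.SPECIFIED, prefix="file_"))  # file_read, file_write
```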
3.4 Core Principle 3: Filesystem as Extended Context
3.4.1 Context Window Limitations
Modern LLMs offer 128K+ token context windows, but in agentic scenarios, problems arise:
• Massive observations when working with unstructured data (web pages, PDFs)
• Performance degradation beyond certain context lengths
• High cost of long inputs even with prefix caching

3.4.2 Recoverable Compression Strategy
Pattern Automation Lab treats the filesystem as the ultimate context for North Star Model:
• Unlimited size
• Persistent by nature
• Direct agent control
North Star Model learns to use the filesystem as structured external memory. Compression strategies are always recoverable: content can be excluded from context as long as access to the source (URL, file path) is preserved.

**Implications for State Space Models:** Efficient file memory may be key for State Space Models (SSMs). Unlike transformers, SSMs lack full attention but can compensate by externalizing long-term state, potentially becoming successors to Neural Turing Machines.
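As a concrete illustration of recoverable compression (the file-naming scheme and the inline-size threshold are our assumptions):

```python
import hashlib
from pathlib import Path

def compress_observation(obs: str, source: str, workdir: Path,
                         max_inline: int = 2000) -> str:
    """Drop a bulky observation from context but keep a recoverable pointer."""
    if len(obs) <= max_inline:
        return obs                                    # small enough to stay inline
    name = hashlib.sha1(source.encode()).hexdigest()[:12]
    path = workdir / f"obs_{name}.txt"
    path.write_text(obs)                              # externalize full content
    # The context keeps a reference plus a short preview, never a lossy summary
    # alone, so the agent can always re-read the original from disk or the URL.
    return f"[full content saved to {path}; source: {source}; preview: {obs[:200]}...]"
```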
3.5 Core Principle 4: Attention Manipulation Through Repetition
3.5.1 The Goal Drift Problem
A typical task in North Star Model requires approximately 50 tool invocations. In long cycles, the agent is vulnerable to:
• Topic drift
• Forgetting early goals
• Lost-in-the-middle context problems

3.5.2 Goal Repetition Strategy
Pattern Automation Lab implemented a systematic approach in which North Star Model continuously updates a todo.md file, repeating its goals at the end of the context (see the sketch below). This approach:
• Places the global plan in the recent attention range
• Avoids lost-in-the-middle problems
• Reduces goal misalignment
• Uses natural language for focus without architectural changes
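A minimal sketch of the recitation step, assuming a simple Markdown checklist format for todo.md:

```python
from pathlib import Path

def recite_goals(todo_path: Path, done: set[str], remaining: list[str]) -> str:
    """Rewrite todo.md and return its text for appending to the context tail."""
    lines = ["# Task plan"]
    lines += [f"- [x] {item}" for item in sorted(done)]       # completed steps
    lines += [f"- [ ] {item}" for item in remaining]          # steps still open
    text = "\n".join(lines)
    todo_path.write_text(text)          # persist the plan on the filesystem
    return text                         # appended at the end of the context
```

Because the plan is re-emitted at every step, the global objective always sits in the model's most recent attention span regardless of how long the trajectory grows.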
3.6 Core Principle 5: Preserving Error Traces
3.6.1 The Importance of Error Recovery
Agents inevitably make mistakes:
• Language model hallucinations
• Runtime errors
• Misbehavior of external tools
• Unexpected edge cases

3.6.2 Learning from Errors Strategy
A common impulse is to hide errors (clear the trace, retry the action), but this removes the evidence the model needs to adapt. Pattern Automation Lab’s research showed that a more effective approach is to leave wrong turns in the context. When North Star Model sees a failed action and the resulting observation, it:
• Implicitly updates its internal beliefs
• Shifts priority away from similar actions
• Reduces the probability of repeating the mistake
Error recovery is one of the clearest indicators of true agentic behavior.
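A sketch of how a failed tool call can be recorded rather than erased (the sandbox API is a hypothetical stand-in consistent with the earlier loop sketch):

```python
def execute_and_record(context: list[dict], sandbox, action: dict) -> None:
    """Run one tool call and append the outcome, success or failure, to context."""
    try:
        result = sandbox.execute(action)              # hypothetical sandbox API
        context.append({"role": "tool", "content": result})
    except Exception as err:
        # Keep the wrong turn visible: the failed call and its error message stay
        # in the log, so the model shifts probability away from repeating it.
        context.append({"role": "tool",
                        "content": f"ERROR while running {action['name']}: {err}"})
```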
3.7 Core Principle 6: Avoiding the Few-Shot Trap
3.7.1 The Over-Imitation Problem
Few-shot prompting can backfire in agentic systems. Language models are excellent imitators, and a context full of similar action-observation pairs leads to:
• Mechanical pattern following
• Drift in repetitive tasks
• Over-generalization
• Hallucinations

3.7.2 Solution: Structured Diversity
Pattern Automation Lab introduces controlled variation in North Star Model, as shown in the sketch below:
• Different serialization templates
• Alternative phrasings
• Minor noise in ordering and formatting
Principle: the more uniform the context, the more brittle the agent becomes.
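One lightweight way to introduce such variation is to sample among equivalent serialization templates; the templates below are illustrative, not the ones used in production.

```python
import random

TEMPLATES = [
    "Action: {name}({args}) -> {result}",
    "Ran {name} with {args}; observed: {result}",
    "[{name}] args={args} | result={result}",
]

def render_step(name: str, args: str, result: str, rng: random.Random) -> str:
    """Render one action-observation pair with slight, controlled formatting noise."""
    return rng.choice(TEMPLATES).format(name=name, args=args, result=result)
```

The variation is deliberately small: the content of each step is unchanged, only its surface form differs, which is enough to break mechanical pattern imitation without confusing the model.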
3.8 Integration with Game-Based Training
These context engineering principles were refined through North Star Model’s game-based training. Video game environments provided natural testing grounds for context management strategies:
• **Minecraft Environment:** Long crafting sequences naturally required append-only context and KV-cache optimization
• **Sims-Like Simulation:** Complex multi-character scenarios validated goal repetition and attention manipulation strategies
• **Chess Training:** Strategic planning validated error preservation and learning from mistakes

4. Evaluation
4.1 Main Results
We evaluate North Star Model on comprehensive benchmarks covering reasoning, coding, mathematics, and agentic capabilities. The model demonstrates strong performance across all evaluated tasks, validating both our game-based training methodology and advanced context engineering.
Key Results:
• **Reasoning Benchmarks:** AIME 2025 (93.1%), HMMT Feb 2025 (92.5%), HMMT Nov 2025 (90.2%)
• **Coding Performance:** LiveCodeBench (83.3%), Codeforces rating 2386
• **Agentic Tasks:** Terminal Bench 2.0 (46.4%), SWE Verified (73.1%)
• **Search Agent:** BrowseComp (67.6% with context management)
