
1. Introduction: Expression Lags Behind Thought

In the study of Large Language Models (LLMs), we often observe a phenomenon of “expression lag.” When decomposing model capabilities, a significant performance gap emerges across three distinct levels:

  1. Task Performance (TP, $P_{TP}$): The model’s ability to execute a task correctly (the “Hand”).
  2. Self-Verification (SV, $P_{SV}$): The model’s ability to verify the correctness of its own answer (the “Mouth”).
  3. Representation Readout (RR, $P_{RR}$): Information directly decodable from internal neuron activations via linear probes (the “Mind”).
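To make the third level concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier trained directly on activation vectors. Everything in it is an illustrative assumption rather than the setup used in the experiments. The activations are synthetic (Gaussian noise plus a label-dependent shift along one dimension), and the function names `make_synthetic_activations`, `train_linear_probe`, and `probe_accuracy` are hypothetical.

```python
import math
import random


def make_synthetic_activations(n, dim, seed=0, signal=2.0):
    """Synthetic stand-in for hidden activations: noise plus a label-dependent shift."""
    rng = random.Random(seed)
    activations, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y == 1 else -signal  # the "concept" lives along dimension 0
        activations.append(x)
        labels.append(y)
    return activations, labels


def train_linear_probe(activations, labels, lr=0.1, epochs=200):
    """Fit logistic regression with plain full-batch gradient descent."""
    dim = len(activations[0])
    n = len(activations)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(activations, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, activations, labels):
    """Fraction of examples where the sign of the linear readout matches the label."""
    correct = 0
    for x, y in zip(activations, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)
```

In a real setting the activation vectors would come from a chosen layer of the model and the probe would be scored on held-out examples. The sketch scores on its own training data only to stay short.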

Existing research consistently supports the inequality:

$$ \sup_{\pi} P_{TP}(\pi) \le \sup_{\pi,A,s} P_{SV}(\pi,A,s) \le \sup_{\pi,A,l,g} P_{RR}(\pi,A,l,g) $$


Figure 1. The performance gap hierarchy in LLMs: Representation > Verification > Execution.

This creates a paradox: the model possesses “internal awareness” (RR) but lacks “explicit articulation” (SV), leading to execution failures (TP). While Chain-of-Thought (CoT) attempts to align execution with verification, what can bridge the gap between language output and internal intuition?

To investigate this question, we turn our attention to Looped Transformers.

2. The Challenger: The “Introspective” Potential

Unlike the standard feedforward Transformer, Looped Transformers introduce a recurrence mechanism across layers, allowing the model to “ruminate” on information before generating output.

Tip: What is a Looped Transformer?

  • CoT scales sequence length for thinking time.
  • Scaling Laws scale parameters for intelligence.
  • Looped Transformers scale computation depth.

The core concept is “Weight Sharing & Recursive Processing.” Intermediate hidden states are fed back into the model using the same weights for multiple cycles. This mimics human “deep thinking”—using limited brain capacity (parameters) but investing more time (loops) to improve reasoning quality.
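The weight-sharing idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not any published architecture: the "block" is a single residual `tanh` layer standing in for a full attention-plus-MLP Transformer block, and the names `make_block`, `apply_block`, `looped_forward`, and `parameter_count` are hypothetical.

```python
import math
import random


def make_block(dim, seed=0):
    """One set of weights, shared by every loop iteration."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, hidden):
    """One pass through the block: residual update h + tanh(W h)."""
    return [
        h + math.tanh(sum(w * x for w, x in zip(row, hidden)))
        for h, row in zip(hidden, weights)
    ]


def looped_forward(weights, hidden, num_loops):
    """Feed the hidden state back through the same weights num_loops times."""
    states = [hidden]
    for _ in range(num_loops):
        hidden = apply_block(weights, hidden)
        states.append(hidden)
    return states  # initial state followed by one state per loop


def parameter_count(weights):
    """The parameter count depends only on the block, not on the number of loops."""
    return sum(len(row) for row in weights)
```

Calling `looped_forward(weights, hidden, 8)` instead of `looped_forward(weights, hidden, 2)` performs four times as many block applications while `parameter_count(weights)` stays the same, which is the sense in which looping scales computation depth rather than model size.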

Current research explores this architecture across two dimensions:

  • Mechanism Design (How to Loop):
    • PonderLM [1,2]: Instead of sampling a token, feeds the predicted distribution back as a probability-weighted sum of token embeddings.
    • Retrofitting-Recurrence [3]: Modifies only intermediate layers.
    • THINK-AT-HARD [4]: Introduces LoRA and duo-causal attention.
  • Strategy Control (Where to Loop):
    • Google MoR [5]: Uses routing to dynamically allocate compute budgets.
    • SEED OURO [6]: Introduces “early stopping” and entropy regularization.
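One generic way to decide when to stop looping is to turn per-loop halting probabilities into a distribution over exit steps and stop once the cumulative mass crosses a threshold. The sketch below illustrates that general idea only. It is not claimed to be the mechanism used by MoR or OURO, and the names `exit_distribution`, `choose_exit_step`, and `entropy`, along with the default threshold, are assumptions made for illustration.

```python
import math


def exit_distribution(halt_probs):
    """Convert per-loop halting probabilities into a distribution over exit steps."""
    dist = []
    remaining = 1.0  # probability of not having exited yet
    last = len(halt_probs) - 1
    for i, p in enumerate(halt_probs):
        if i == last:
            dist.append(remaining)  # the final loop absorbs all leftover mass
        else:
            dist.append(remaining * p)
            remaining *= 1.0 - p
    return dist


def choose_exit_step(halt_probs, threshold=0.5):
    """Return the first loop (1-indexed) where cumulative exit mass reaches threshold."""
    cumulative = 0.0
    dist = exit_distribution(halt_probs)
    for step, mass in enumerate(dist, start=1):
        cumulative += mass
        if cumulative >= threshold:
            return step
    return len(dist)


def entropy(dist):
    """Shannon entropy (in nats) of an exit distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)
```

A higher threshold makes the model loop longer before committing. An entropy term over the exit distribution is one way a training objective could discourage always exiting at the same step; how the cited papers actually formulate this is left to the references.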

We ran our experiments on the OURO model, which is supported by the vLLM framework, to test whether “looping” truly enables “introspection.”

3. Finding 1: The Gap Shrinks, But at a Cost

We compared the performance of a Language Monitor versus a Representation Monitor in math and safety scenarios. The results reveal an intriguing trade-off.

Observation: As the number of loops increases, the gap between language expression and internal representation narrows, but not entirely for positive reasons.

  • Positive Effect: Enhanced Verification. The model articulates and checks correctness better after “ruminating.”


    Figure 2. Language verification accuracy trends upward with more loops.

  • The Trade-off: Representation Erosion. The gap reduction is partly due to a decline in representation monitoring performance.


    Figure 3. Representation monitoring degrades as thinking depth increases.

This implies a harsh reality: The looping process organizes thought but may simultaneously compress or erode raw information (entropy reduction). If “thinking too long” dulls intuitive sharpness, we must carefully balance depth with signal preservation.

4. Finding 2: Schrödinger’s Introspection

To test whether the model continuously self-monitors, we replicated the Anthropic-style [7] “Thought Injection” experiment: forcibly injecting a concept vector during latent thinking and observing whether the model notices.
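The injection step itself can be sketched as follows. This is a toy illustration under stated assumptions, not the experimental code: `step_fn` stands in for one loop of the model, the injection simply adds a scaled concept vector to the hidden state after a chosen loop, and the names `cosine`, `inject`, and `injection_trace` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def inject(hidden, concept, strength):
    """Add a scaled concept vector to the hidden state."""
    return [h + strength * c for h, c in zip(hidden, concept)]


def injection_trace(step_fn, hidden, concept, num_loops, inject_at, strength=4.0):
    """Run num_loops steps, inject after loop inject_at, and record the
    cosine similarity between the hidden state and the concept at every loop."""
    trace = []
    for loop in range(1, num_loops + 1):
        hidden = step_fn(hidden)
        if loop == inject_at:
            hidden = inject(hidden, concept, strength)
        trace.append(cosine(hidden, concept))
    return trace
```

The trace only measures how strongly the injected direction is present in the hidden state at each loop. The experiment asks the further question of whether the model itself reports the foreign concept, which requires reading out the model's own answer rather than a similarity score.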

The results were counterintuitive:


Figure 4. The model identifies injected concepts only near the final output stage.

  • Blindness During Process: In early loops, the model effectively ignores the injected vector, acting “unconsciously.”
  • Late Awakening: The model only identifies the foreign concept in the final loop, right before outputting.

This suggests that even with a recurrent architecture, the model’s semantic processing remains local and short-sighted. It does not maintain continuous self-awareness but rather “wakes up” only when forced to commit to an output.

5. Conclusion & Outlook

Our experiments highlight the complex relationship between language and representation:

  1. Asynchronous Evolution: Looping helps the model “say” things better, but not necessarily “think” more clearly.
  2. Limitations of Introspection: Current recurrent architectures have not yet yielded continuous self-monitoring awareness.

These findings are not endpoints but guideposts. Bridging the gap between “expression” and “representation” remains a critical milestone on the path to more advanced and trustworthy AI.

References

  1. Zeng B, Song S, Huang S, et al. Pretraining Language Models to Ponder in Continuous Space. arXiv preprint arXiv:2505.20674, 2025.
  2. Zeng B, Li H, Song S, et al. PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space. arXiv preprint arXiv:2509.23184, 2025.
  3. McLeish S, Li A, Kirchenbauer J, et al. Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence. arXiv preprint arXiv:2511.07384, 2025.
  4. Fu T, You Y, Chen Z, et al. Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models. arXiv preprint arXiv:2511.08577, 2025.
  5. Bae S, Kim Y, Bayat R, et al. Mixture-of-recursions: Learning dynamic recursive depths for adaptive token-level computation. arXiv preprint arXiv:2507.10524, 2025.
  6. Zhu R J, Wang Z, Hua K, et al. Scaling latent reasoning via looped language models. arXiv preprint arXiv:2510.25741, 2025.
  7. Lindsey J. Emergent introspective awareness in large language models. arXiv preprint arXiv:2601.01828, 2026.