\addbibresource{FlawsOfOthers.bib}

A Mathematical Theory of Discursive Networks

Juan B. Gutiérrez (juan.gutierrez3@utsa.edu), Department of Mathematics, University of Texas at San Antonio.
(July 23, 2025)
Abstract

Large language models (LLMs) turn writing into a live exchange between humans and software. We characterize this new medium as a discursive network that treats people and LLMs as equal nodes and tracks how their statements circulate. We define the generation of erroneous information as invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. We develop a general mathematical model of discursive networks that shows that a network governed only by drift and self-repair stabilizes at a modest error rate. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmonizer merges their verdicts. We identify an ethical transgression, epithesis, that occurs when humans fail to engage in the discursive network. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from connecting imperfect ones into networks that enforce mutual accountability.

1 Introduction

Large Language Models (LLMs) are advanced artificial intelligence systems trained on vast amounts of text data to generate human-like language. These models utilize deep learning techniques to understand and produce text based on the patterns and structures found in their training data \parencite{vaswani2017attention, brown2020language}. LLMs have demonstrated impressive capabilities in various natural language processing tasks, including text generation, translation, and question-answering.

Despite their remarkable performance, LLMs are prone to generating false or misleading statements, a phenomenon often referred to as “hallucinations” \parencite{ji2022survey}. However, the metaphor of hallucination is limited: it implies a private sensory distortion, whereas an unfounded LLM assertion can be circulated, cited, and acted upon as fact. Throughout this paper we therefore use the broader term invalidation, and we show that what is commonly referred to as hallucination is just one of the many manifestations of invalid information.

Invalidations in LLMs can manifest as factual inconsistencies, logical contradictions, or entirely fabricated content that appears plausible \parencite{maynez2020faithfulness}. This issue is exacerbated by the lack of a verification mechanism within the models themselves \parencite{bender2021dangers, weidinger2021ethical}. Although retrieval-augmented generation and self-consistency checks \parencite{lewis2020retrieval, wang2023self} reduce the problem, a substantial share of outputs remains unreliable enough to undermine trust in practical deployments.

Empirical evaluations show that LLMs continue to produce non-trivial rates of factual error and harmful content after instruction tuning and reinforcement learning. Studies in medical domains have documented significant factual inaccuracies in model answers \parencite{thirunavukarasu2023large, sallam2023chatgpt}, and work on adversarial prompting has demonstrated that safety-trained models still emit disallowed content \parencite{zou2023universal, chao2023jailbreaking}.

Several factors contribute to the occurrence of invalid information, including biases in training data, limitations in knowledge representation, and the models’ tendency to prioritize fluency over factual accuracy \parencite{lin2022truthfulqa}. The prevalence and impact of invalidations are significant, with quantitative evaluations revealing that they occur in up to 30% of generated responses, substantially affecting the trustworthiness of these models \parencite{ji2022survey, lin2022truthfulqa}. Moreover, studies have shown that LLMs can generate false information even when explicitly prompted to be truthful \parencite{evans2021truthful}, underscoring the challenge of aligning model outputs with factual correctness.

This article is organized as follows: Section 1.1 introduces the concept of invalidation as a broader alternative to hallucination; Section 1.2 situates invalidation within established cognitive and media theories, showing it to be a universal feature of both human and artificial cognition; Section 1.3 examines how invalidations propagate through interconnected communication systems; Section 2.2 provides an information-theoretic basis for why verification is fundamentally easier than generation; Section 2.4 formalizes discursive networks as mathematical structures with actors, statements, and update rules; Section 2.5 develops three progressively complex models of invalidation dynamics: single-network with binary states, single-network with emergent invalidation, and cross-network detection; Section 3 demonstrates the mathematical consistency of these models through theoretical analysis and parameter exploration; finally, Section 4.2 addresses ethical concerns including epithesis, energy costs, and epistemic diversity, before outlining future research directions.

The scope of this manuscript encompasses both theoretical foundations and practical implementations. We establish mathematical floors on invalidation probability, develop network models for error propagation and detection, and present the Flaws-of-Others (FOO) algorithm with cryptographic integrity verification. The analysis focuses on invalidations that emerge during inference in large language models and mitigation strategies based on cross-agent critique. While we do not provide exhaustive failure catalogues or benchmark comparisons, we advance a unified mathematical framework demonstrating how networks of imperfect agents can achieve error rates below what any individual agent attains.

1.1 From “Hallucination” to Invalidation

The word hallucination has become the default label for false statements generated by LLMs. Borrowed from perceptual psychology, it misses two crucial aspects. First, an LLM’s error is not confined to a private experience; it can be adopted by readers and propagate through networks, amplifying misinformation \parencite{crawford2021excavating}. Second, focussing on hallucinations alone narrows the research agenda, leaving other error classes (logical contradictions, format violations, ethical breaches) under-examined.

We call any output that violates a constraint set (facts, logic, norms, or formats) an invalidation. Because an autoregressive decoder maximises next-token likelihood rather than global consistency, a non-zero slice of probability mass inevitably falls outside the constraints.

In practice, invalidation surfaces along at least five recurring archetypes that differ in locus and detectability. First, hallucination denotes the introduction of content that is ungrounded in any trusted source or context; large-scale surveys show it to be pervasive even in the highest-performing models \parencite{huang2025survey}. Second, contradiction captures internally inconsistent statements that coexist within a single generation, a failure mode quantified and mitigated by prompt-based self-refinement techniques \parencite{mundler2023selfcontradiction}. Third, deductive error arises when the model draws logically invalid conclusions from true premises, an error family systematically stress-tested with adversarial perturbations \parencite{hoppe2025deductive}. Fourth, pragmatic impropriety concerns outputs that violate social or professional norms, including toxicity, hate speech, or privacy leakage; the RealToxicityPrompts benchmark revealed that even innocuous inputs can trigger toxic degeneration \parencite{gehman2020realtoxicity}. Finally, format violation occurs when the model breaks explicit structural constraints (e.g., JSON Schema), jeopardising downstream machine consumption; work with JSONSchemaBench shows that such violations remain stubbornly frequent despite constrained decoding \parencite{geng2025jsonschema}.

Each error class manifests one predicate: the content fails to match a state of the world. That predicate admits representation with a given invalidation rate. The models in Section 2.4 use this rate alone, without reference to class labels; it quantifies the chance that any output lacks validity.

Taken together, these archetypes point to a broad invalidation family, potentially with more members, underscoring the need for evaluation suites and mitigation strategies that address the full spectrum of failure modes. Figure 1 schematizes this superset-subset relationship.

[Figure 1 is a Venn-style diagram: “Invalid Outputs” is the superset containing Hallucination, Contradiction, Deductive Error, Pragmatic Impropriety, and Format Violation.]
Figure 1: Venn-style illustration of members of a broader set of invalid outputs produced by large language models.

1.2 Invalidation as a Universal Feature of Human and Artificial Cognition

Invalidation in contemporary LLM outputs is not an isolated flaw unique to artificial intelligence but mirrors well-documented behaviors in human discourse. This similarity suggests that invalidation is not simply a by-product of autoregressive sampling, but a potentially universal cognitive process rooted in the fundamental nature of information processing through language. It emerges when any complex agent (biological or artificial) operates under uncertainty, bounded rationality, and social constraints.

Human cognition systematically prioritizes narrative coherence over factual accuracy, a tendency that emerges not from individual pathology but from the fundamental architecture of meaning-making itself. When confronted with contradictory evidence, both individuals and groups construct elaborate justifications that preserve existing belief structures rather than revising them \parencite{festinger1957cognitive, cohen2001states}. This preference for coherence manifests across scales: from the micro-level impression management that shapes everyday social interactions \parencite{goffman1959presentation} to the macro-level collective narratives that enable societies to ignore systemic atrocities \parencite{cohen2001states}. Remarkably, this same structural bias toward local coherence over global accuracy appears in large language models, where autoregressive architectures favor maintaining consistency with previous tokens even at the expense of factual correctness. The parallel suggests that invalidation arises not from a flaw in either human or artificial systems, but from a deeper computational trade-off inherent to any agent that must construct meaning from sequential, uncertain information.

The medium itself acts as an epistemic filter, determining not just what information reaches us but what we accept as real, a process that operates identically whether the medium is television, print journalism, or a large language model. Media theorists have long recognized that truth emerges less from content evaluation than from structural repetition: the same claim, encountered repeatedly through trusted channels, eventually sediments into accepted fact regardless of its veracity \parencite{gerbner1976living, mcluhan1964understanding}. This manufacturing of consensus operates through cascading filters (economic incentives, institutional biases, and technological affordances) that systematically amplify certain narratives while suppressing others \parencite{herman1988manufacturing}. Large language models instantiate this same filtering mechanism at an unprecedented scale: their training corpora encode the biases of millions of sources, their attention mechanisms privilege frequently repeated patterns, and their optimization objectives reward fluent reproduction over factual verification. The result is a computational echo chamber where invalidations, once embedded in training data, achieve the same truth-like status through sheer statistical dominance that media repetition grants to human beliefs.

Propaganda operates by exploiting a fundamental vulnerability in epistemic systems: sustained repetition of coordinated falsehoods eventually overwhelms the capacity for empirical verification, creating an alternate reality that becomes self-reinforcing through social proof. This mechanism, which political theorists identify as the cornerstone of totalitarian control, functions by flooding the information environment with internally consistent but externally false narratives until the sheer cognitive cost of maintaining skepticism exceeds most people’s capacity \parencite{arendt1951origins, ellul1965propaganda}. The parallel with large language models is striking: trained on billions of documents where certain false narratives appear thousands of times, these systems internalize misinformation not through ideological commitment but through pure statistical frequency. Just as propaganda succeeds by making lies more cognitively available than truth \parencite{ellul1965propaganda}, LLMs generate invalidations by sampling from probability distributions where well-represented falsehoods outweigh poorly-documented facts. The computational architecture thus recreates, without intention or awareness, the same reality-distortion mechanisms that human propagandists deploy deliberately.

Invalidation emerges from the fundamental computational shortcuts that make complex reasoning tractable; these shortcuts manifest identically in biological neural networks and artificial transformers. The core mechanism is substitution: when faced with difficult questions about truth or probability, both humans and LLMs unconsciously replace them with easier questions about familiarity and similarity \parencite{tversky1974judgment, kunda1990motivated}. This substitution operates through dual channels: the availability heuristic replaces “what is true?” with “what comes easily to mind?”, while the representativeness heuristic replaces “what is probable?” with “what resembles my prototype?” In large language models, these exact substitutions occur mechanistically: the softmax function literally converts truth-seeking into frequency-matching, while attention heads select tokens based on similarity rather than veracity. The result is a convergent failure mode where both human reasoning and machine generation systematically mistake statistical patterns for factual reality, producing confident invalidations that feel true precisely because they align with existing distributions rather than external facts \parencite{kunda1990motivated}.

The cross-disciplinary convergence of these findings implies that invalidation is not an accidental error mode but a structural consequence of how intelligent systems manage complexity, uncertainty, and contradiction. This insight reframes LLM invalidation as a computational echo of cognitive strategies humans employ. Rather than indicating a breakdown of alignment, it may reveal the presence of alignment to socially and contextually shaped heuristics that guide behavior in uncertain conditions.

Addressing invalidation in LLMs will require approaches that incorporate sociological, psychological, and media-theoretical models of belief formation and narrative control. At the same time, observing how LLMs generate invalidations may provide new empirical traction for understanding human cognitive phenomena such as confirmation bias, belief perseverance, and collective denial. The interdependence between artificial and human cognition in this respect suggests that solutions to the propagation of invalidation may emerge not solely from engineering but from a broader inquiry into the structure of meaning-making itself.

1.3 Risks of Invalidation Spill-over Through Discursive Networks

Invalidations produced by large language models rarely remain isolated. Once released into public channels they propagate through interconnected systems of communication. A discursive network is a large-scale ecosystem of human and machine agents whose utterances circulate, reinforce, and mutate through repeated exchange. When an LLM instantiates many synthetic voices that emit high volumes of text, those voices become additional nodes that mediate narrative transmission and amplification.

We use the term discursive network, rather than the established discourse network, to signal a McLuhanian twist. In classical discourse-network analysis the medium is a passive conduit: speeches, papers, and news items are shuffled among human actors while the underlying carrier remains inert. In the networks that include LLMs the carrier intervenes. The medium does not just move the message; it edits, rewrites, and recombines it at every hop. By switching from discourse to discursive we emphasise that language itself is now produced inside the network dynamics, co-authored by the very infrastructure that transmits it; this is a direct extension of McLuhan’s dictum from “the medium is the message” to “the medium acts on the message.”

Discursive networks consist of nodes (agents or messages) and edges (interactions, citations, reshares). Closely related structures appear in political-science discourse-network analysis, where policy actors and speech acts form time-evolving coalitions \parencite{leifeld2014, leifeld2012}. Kittler’s media archaeology likewise foregrounds how technological substrates shape what counts as meaningful speech \parencite{kittler1990}. Our focus on generative models extends these traditions by treating LLM outputs as first-class network nodes.

Humans struggle to separate machine-generated prose from human writing. Controlled studies across genres report identification accuracy only slightly above chance (about 55–65%) and show that readers rely on fragile surface heuristics such as pronoun frequency and stylistic fluency \parencite{ippolito2020, DBLP:conf/acl/GehrmannSR19}. In a discursive network this limitation matters: synthetic invalidations, mistaken for human statements, are more readily reposted or cited.

Once injected, invalid content spreads via echo-amplification. Research on social-media communities finds that clusters organized in bow-tie topologies (tightly knit cores with radiating peripheries) amplify low-quality or misleading messages more than high-quality ones \parencite{garimella2018}. Agents in these clusters can absorb and relay invalidations, algorithmic or human, without verification, entrenching flawed narratives.

Detection tools provide only partial relief. State-of-the-art AI detectors fall below 80% accuracy on paraphrased or adversarial samples and show biases toward false positives on human text \parencite{huang2024robust, tufts2024practical, sadasivan2025reliably}. Human moderators fare little better, typically around 65% in blinded settings \parencite{ippolito2020}. Because both machines and people misclassify a substantial share of content, invalidations re-enter discourse with minimal friction.

Mitigating spill-over therefore demands a synthesis of discourse-network analysis, community-structure research, and detection studies. High-volume clusters of unverified claims need tracing, and interventions should target influential nodes, always within robust privacy and governance constraints. Recognising synthetic agents as network participants reframes classic media theory for the generative era: discursive-network analysis becomes both a descriptive lens on spill-over risk and a practical framework for targeted intervention.

2 Methods

This section lays out the analytical and computational machinery that supports the paper’s argument. We begin by establishing a mathematical floor on invalidation probability, proving that no finite-loss LLM can achieve zero error rate (Section 2.1). We then categorize user-model interactions into three functional classes—critique, ideation, and product-oriented generation—and show why critique operations are computationally cheapest, residing near the peak of the model’s output distribution (Section 2.3).

Building on these foundations, we formalize discourse as a network $N = (A, S, P, I, C, B, U, G)$ whose nodes exchange and validate statements under well-defined update rules (Section 2.4). We analyze three progressively richer belief dynamics (Section 2.5): first, a single-network model with binary truth states, proving convergence to a unique equilibrium determined by flip probabilities; second, an extended model incorporating spontaneous invalidation generation, showing how fabrication rates push systems into error-dominant regimes; and third, a dual-network model with cross-detection, deriving conditions under which external scrutiny reduces aggregate error below single-network baselines.

These theoretical results culminate in practical design principles: we derive the minimum number of cross-checking agents needed to achieve any target error tolerance (Section 2.5.5) and present the Flaws-of-Others (FOO) algorithm that implements these detection mechanisms in software, complete with cryptographic integrity verification to prevent post-hoc tampering (Section 2.6). Together, these methods provide both the mathematical framework for understanding invalidation propagation and the computational tools for mitigating it in practice.
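To preview these dynamics numerically before the formal treatment, consider a minimal two-state sketch. The rate symbols echo the models of Section 2.5, but the combination rule for the repair channels and all numeric values below are our own illustrative assumptions, not results from the paper:

def stationary_error(lam: float, rho: float, d: float = 0.0) -> float:
    """Stationary invalid fraction of a two-state (valid/invalid) chain.

    lam: per-step chance a valid statement turns invalid (drift/fabrication)
    rho: per-step chance an invalid statement is self-repaired
    d:   per-step chance an invalid statement is caught by an external checker
    """
    repair = rho + d - rho * d  # assumes independent repair channels
    return lam / (lam + repair)

print(stationary_error(0.10, 0.30))        # self-repair only: 0.25
print(stationary_error(0.10, 0.30, 0.40))  # with cross-detection: ~0.15

Even a modest external detection probability shifts the equilibrium markedly toward the truth-dominant regime, which is the qualitative claim the later sections make precise.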

2.1 Floor on invalidation probability

This subsection proves that a strictly positive invalidation probability is unavoidable for any LLM whose cross-entropy loss on its training distribution remains finite.

A discursive network has a particular standard, e.g. factual accuracy, logical coherence, a set of safety rules, etc. Every sequence that meets this standard is called “valid.” Whenever a sequence breaks the standard, we have an “invalidation.” In symbols, the indicator $C(x) = 1$ announces that sequence $x$ is valid, and the collection $S = \{x : C(x) = 0\}$ captures every possible invalidation.

The training corpus itself, represented by a probability distribution $Q$, might be imperfect; a fraction $q = Q(S)$ of its mass lies inside $S$. This fraction $q$ measures how much invalidating content appears in the training data, e.g. factual errors in bibliographic sources, biased statements in news sources, outdated information in historical documents, logical contradictions in discussion forums, etc.

After training the language model on this corpus (including pre-training via next-token prediction, instruction tuning, and safety fine-tuning), we obtain a model whose learned distribution is $P_\theta$. The central question is how much probability this trained model still places on $S$, i.e. what chance it retains of producing an invalidation despite all the training improvements applied.

Empirical studies provide sobering context: medical-domain evaluations report factual errors in the 15–30% range \parencite{thirunavukarasu2023large, sallam2023chatgpt}, while adversarial assessments find 3–5% harmful outputs even after safety training \parencite{zou2023universal, chao2023jailbreaking}. These persistent rates reflect structural limitations that no training regime can fully overcome.

Given $\mathrm{KL}(Q \parallel P_\theta)$, the Kullback–Leibler divergence from the reference distribution $Q$ to the model distribution $P_\theta$, we can state the following lemma.

Lemma 2.1 (Invalidation floor).

If $P_\theta \ll Q$ and $\mathrm{KL}(Q \parallel P_\theta) < \infty$, then

$$P_\theta(S) \geq q \exp\bigl(-\mathrm{KL}(Q \parallel P_\theta)/q\bigr).$$

$P_\theta \ll Q$ is known as absolute continuity: the trained model places probability only where the training data does. More formally, $P_\theta(x) = 0$ whenever $Q(x) = 0$; equivalently, $P_\theta(x) > 0$ only if $Q(x) > 0$.

This inequality says that the residual invalidation probability cannot be driven to zero unless either the corpus is completely free of invalidations ($q = 0$) or the model’s divergence from the corpus becomes infinite, which would imply infinite loss. The floor can be exponentially small when the model departs sharply from the training data through extensive fine-tuning, yet it never vanishes entirely.

Proof.

Expand the KL divergence and partition the sum by membership in $S$:

$$\mathrm{KL}(Q \parallel P_\theta) = \sum_{x \in S} Q(x) \log \frac{Q(x)}{P_\theta(x)} + \sum_{x \notin S} Q(x) \log \frac{Q(x)}{P_\theta(x)}.$$

Since KL divergence terms are non-negative, we have

$$\mathrm{KL}(Q \parallel P_\theta) \geq \sum_{x \in S} Q(x) \log \frac{Q(x)}{P_\theta(x)}.$$

Apply the log-sum inequality (for positive numbers $a_i, b_i$ with $A = \sum_i a_i$ and $B = \sum_i b_i$, we have $\sum_i a_i \log \frac{a_i}{b_i} \geq A \log \frac{A}{B}$) to the right-hand side. Setting $a_x = Q(x)$ and $b_x = P_\theta(x)$ for $x \in S$, we get:

$$\sum_{x \in S} Q(x) \log \frac{Q(x)}{P_\theta(x)} \geq Q(S) \log \frac{Q(S)}{P_\theta(S)} = q \log \frac{q}{P_\theta(S)}.$$

Therefore $q \log \frac{q}{P_\theta(S)} \leq \mathrm{KL}(Q \parallel P_\theta)$. Solving for $P_\theta(S)$ yields:

$$P_\theta(S) \geq q \exp\bigl(-\mathrm{KL}(Q \parallel P_\theta)/q\bigr). \qquad \blacksquare$$

The inequality is a logical guard-rail: when $\mathrm{KL} \gg q$ the numerical floor may be tiny, yet it proves that any finite-loss model retains strictly positive invalidation probability. External verification layers are therefore indispensable.
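The magnitude of this floor is easy to evaluate. A minimal sketch, using purely illustrative values for the corpus error mass $q$ and the divergence (neither is a measured quantity from this paper):

import math

def invalidation_floor(q: float, kl: float) -> float:
    """Lower bound on P_theta(S) from Lemma 2.1: q * exp(-KL / q)."""
    return q * math.exp(-kl / q)

# Illustrative only: a corpus with 2% invalid mass and a modest divergence
# still leaves a strictly positive floor on invalid-output probability.
print(invalidation_floor(q=0.02, kl=0.05))  # about 0.0016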

The practical consequence is inescapable: any errors, contradictions, or policy violations that survive in the training data leave an indelible statistical trace in the model. However sophisticated the training regimen (whether through reinforcement learning from human feedback, constitutional AI, or advanced safety fine-tuning), $P_\theta(S)$ remains strictly positive, so some risk of invalid output persists. This mathematical inevitability motivates our investigation of asymmetric task difficulties (Section 2.2), where we show that detecting invalidations is computationally easier than avoiding them during generation.

From Inevitability to Mitigation.

Lemma 2.1 establishes that $P_\theta(S) > 0$ whenever training loss is finite, i.e. invalidation is mathematically inevitable. This floor exists because LLMs maximize likelihood over their training distribution $Q$, which itself contains errors at rate $q > 0$. The bound $P_\theta(S) \geq q \exp(-\mathrm{KL}(Q \parallel P_\theta)/q)$ shows that even aggressive fine-tuning (large KL divergence) cannot eliminate invalidations entirely.

This inevitability motivates a strategic pivot: rather than pursuing the impossible goal of zero invalidation through model improvement alone, we must design systems that detect and correct errors post-generation. The key insight is that different types of LLM outputs have different amenability to verification. As we show next, verification tasks (e.g. asking models to identify flaws in existing text) lie in high-probability regions of the output distribution and thus can be generated reliably even by imperfect models. This asymmetry between generation difficulty and verification difficulty forms the theoretical foundation for our multi-agent approach: we harness the relative ease of critique to construct networks where mutual verification pushes system-wide error rates below what any individual model achieves.

2.2 Information-Theoretic Basis for Verification Advantage

The inevitability of invalidations established in Lemma 2.1 raises a crucial question: if generation necessarily produces errors, can we at least detect them reliably? We now prove that verification is fundamentally easier than generation, providing the theoretical foundation for our multi-agent approach.

Theorem 2.1 (Variable-Length Entropy Comparison).

Let $\mathcal{G}$ and $\mathcal{V}$ be generation and verification tasks. Let:

  • $(L_g, Y_g)$ and $(L_v, Y_v)$: joint random variables for length and content

  • $P_g^{(n)}$ and $P_v^{(n)}$: conditional distributions $P(Y_g \mid L_g = n)$ and $P(Y_v \mid L_v = n)$

  • $\pi_g(n)$ and $\pi_v(n)$: length distributions $P(L_g = n)$ and $P(L_v = n)$

Then (with all logarithms base 2, giving entropy in bits):

$$H(L_g, Y_g) - H(L_v, Y_v) = H(L_g) - H(L_v) + \sum_n \pi_g(n)\, H(Y_g \mid L_g = n) - \sum_n \pi_v(n)\, H(Y_v \mid L_v = n)$$

where the conditional entropy is:

$$H(Y_g \mid L_g = n) = -\sum_{y \in \operatorname{supp} P_g^{(n)}} P_g^{(n)}(y) \log P_g^{(n)}(y)$$

and similarly for $H(Y_v \mid L_v = n)$.

Proof.

By the chain rule for joint entropy:

\begin{align}
H(L_g, Y_g) &= H(L_g) + H(Y_g \mid L_g) \\
&= H(L_g) + \sum_n \pi_g(n)\, H(Y_g \mid L_g = n) \\
&= H(L_g) + \sum_n \pi_g(n) \Bigl[-\sum_{y \in \operatorname{supp} P_g^{(n)}} P_g^{(n)}(y) \log P_g^{(n)}(y)\Bigr]
\end{align}

Similarly for $(L_v, Y_v)$:

\begin{align}
H(L_v, Y_v) &= H(L_v) + \sum_n \pi_v(n)\, H(Y_v \mid L_v = n) \\
&= H(L_v) + \sum_n \pi_v(n) \Bigl[-\sum_{y \in \operatorname{supp} P_v^{(n)}} P_v^{(n)}(y) \log P_v^{(n)}(y)\Bigr]
\end{align}

Subtracting the second equation from the first yields the stated result. ∎
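The decomposition can be sanity-checked numerically. The sketch below uses made-up length and content distributions (two lengths, one peaked and one uniform content distribution) and confirms that the chain-rule sum matches the entropy of the flattened joint distribution:

import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

pi = np.array([0.6, 0.4])                      # P(L = n) over two lengths
cond = [np.array([0.7, 0.3]),                  # P(Y | L = n1), peaked
        np.array([0.25, 0.25, 0.25, 0.25])]    # P(Y | L = n2), uniform

# Chain rule: H(L, Y) = H(L) + sum_n pi(n) * H(Y | L = n)
chain = entropy(pi) + sum(w * entropy(c) for w, c in zip(pi, cond))

# Direct computation over the joint P(L = n, Y = y)
joint = np.concatenate([w * c for w, c in zip(pi, cond)])
print(round(chain, 4), round(entropy(joint), 4))  # both ~2.2998 bits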

Remark 2.1 (Why General Theorems Fail).

We cannot prove that verification is universally easier than generation because:

  1. Length-content dependence: In autoregressive models, length often encodes information (e.g., “yes” vs. detailed explanations).

  2. Unobservable distributions: We cannot measure the full tail of $P_g^{(n)}(y)$ or $P_v^{(n)}(y)$ empirically, making entropy estimates unreliable.

  3. Task-specific constraints: The set of valid sequences varies dramatically by domain and cannot be bounded universally.

Any rigorous claim must be task-specific and empirically grounded.

For specific task pairs where we can measure output distributions empirically, define the observable concentration ratio:

$$R_k(P) = \frac{\sum_{i=1}^{k} p_i}{\max\bigl(k / \lvert \operatorname{supp} P \rvert,\ \epsilon\bigr)}$$

where $p_1 \geq p_2 \geq \ldots$ are the ranked probabilities of distribution $P$ and $\epsilon > 0$ prevents division by zero.
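The ratio is straightforward to compute from an empirical output histogram. A short sketch with illustrative stand-ins for a peaked, verification-style distribution and a dispersed, generation-style one (these numbers are not measurements from any model):

import numpy as np

def concentration_ratio(p, k: int, eps: float = 1e-12) -> float:
    """R_k(P): top-k probability mass over max(k / |supp P|, eps)."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]  # rank probabilities
    support = np.count_nonzero(p)
    return float(p[:k].sum() / max(k / support, eps))

verification = [0.55, 0.40, 0.05]        # mass piled on a few answer tokens
generation = np.full(1000, 1 / 1000)     # mass spread over many phrasings
print(concentration_ratio(verification, k=2))  # about 1.43
print(concentration_ratio(generation, k=2))    # 1.0, no concentration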

Empirical evidence suggests verification outputs are more concentrated than generation outputs:

  • Generation diversity: \textcite{hashimoto2019unifying} measured that GPT-2’s generation coverage (fraction of human-written continuations assigned high probability) is only 15–20%, indicating that probability mass is spread across many valid but unseen completions. \textcite{holtzman2020curious} further showed that nucleus sampling with $p = 0.95$ is needed to achieve human-like text diversity, confirming that generation probability is dispersed across a large tail.

  • Verification concentration: \textcite{schick2021exploiting} found that when prompted for binary classification, over 90% of GPT-3’s probability mass concentrates on the top 2–3 tokens (e.g., “Yes”/“No” plus punctuation variants). \textcite{min2022rethinking} demonstrated that in-context learning for verification tasks achieves near-peak performance with just label tokens, suggesting the output distribution is highly peaked on a small vocabulary subset.

  • Direct comparison: \textcite{kadavath2022language} showed that language models’ self-evaluation of their own outputs clusters around confidence values of 0.1, 0.5, and 0.9 (high concentration), while their actual answer distribution spans hundreds of phrasings (low concentration). This asymmetry between evaluation and generation distributions supports our theoretical framework.

This observable difference in concentration, while not universal, appears consistently enough to motivate verification-based error reduction strategies.

Implications for Discursive Networks.

The empirical concentration differences documented above provide practical justification for why cross-agent critique can achieve detection rates $d$ that exceed invalidation rates $\lambda$. When LLMs are tasked with verification (e.g., “find flaws” in peer outputs) they operate in the high-concentration regime where dominant patterns from training data guide responses. This contrasts with generation tasks that require exploring the long tail of the output distribution. While we cannot prove a universal entropy gap $H(Y_g \mid L_g) - H(Y_v \mid L_v) > 0$ without task-specific assumptions (as shown in Theorem 2.1), the consistent empirical pattern of verification concentration exceeding generation concentration suggests that detection rates $d$ can systematically exceed fabrication rates $\lambda$ in practice. The FOO algorithm exploits this empirical regularity to achieve system-wide error reduction, even when individual agents remain fallible in generation.
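Although Section 2.6 presents the full algorithm, the core loop is compact enough to sketch here. The rendering below is hypothetical rather than the published implementation: ask(agent, prompt) stands in for whichever LLM API each agent wraps, and the prompt strings are placeholders:

def foo_round(agents, ask, draft):
    """One hypothetical Flaws-of-Others round: critique, then harmonize."""
    # Each agent critiques the draft, a verification task that operates in
    # the high-concentration regime discussed above.
    critiques = {a: ask(a, f"List flaws in the following text:\n{draft}")
                 for a in agents}
    # A harmonizer merges the verdicts into a single revision request.
    merged = "\n".join(f"- {a}: {c}" for a, c in critiques.items())
    harmonizer = agents[0]  # any designated agent may play this role
    return ask(harmonizer,
               f"Revise the text to address these critiques:\n{merged}"
               f"\n\nText:\n{draft}")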

2.3 Categorization of Mechanisms to Engage LLM Agents

We now examine how different engagement mechanisms map onto the information-theoretic landscape established above. During inference, a Transformer language model executes a single forward pass per token. The model transforms context $\mathbf{x}_{\leq t}$ into embeddings, applies $L$ fixed self-attention layers to produce hidden state $\mathbf{h}_t$, and maps this to a token probability distribution via:

$$p(w_{t+1} \mid \mathbf{x}_{\leq t}) = \mathrm{softmax}\bigl(W \mathbf{h}_t\bigr).$$

No gradient updates or objective-function evaluations occur at this stage; the only on-the-fly “optimization” is the decoding heuristic (greedy, top-$k$, nucleus, or beam search) that selects the next token from the static distribution $p(\cdot)$.
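These heuristics can be stated compactly. The following minimal sketch (illustrative, not any particular library’s implementation) selects a token id from a fixed logit vector under each strategy:

import numpy as np

rng = np.random.default_rng(0)

def decode(logits, method="greedy", k=5, p=0.9, temperature=1.0):
    """Select a token id from the static distribution softmax(logits / T)."""
    z = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()                 # the fixed distribution p(.)
    order = np.argsort(probs)[::-1]      # token ids ranked by probability
    if method == "greedy":
        return int(order[0])             # deterministic mode of p(.)
    if method == "top-k":
        keep = order[:k]                 # truncate to the k likeliest tokens
    else:                                # "nucleus": smallest set with mass >= p
        cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
        keep = order[:cutoff]
    renorm = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=renorm))

print(decode([2.0, 1.0, 0.5, 0.1], method="nucleus"))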

LLMs support a range of functional output types that can be systematically grouped into three core categories we introduce in this manuscript: ideation (constructive synthesis), critique (diagnostic evaluation), and product-oriented generation (goal-directed deliverables). These three umbrella functions organize diverse tasks, each with distinct inference properties, compositional demands, and distributional positions in the model’s output space. Existing literature identifies several subtypes that map naturally onto these categories \parencite{bommasani2021opportunities, mialon2023augmented}.

Critique output is structurally efficient to generate. Nested within critique are:

  • Flaws of others. This approach asks for flaws or errors in the outputs of other agents, a request that elicits constructions frequent in training corpora and thus near the peak of $p(\cdot)$. Even shallow decoding heuristics retrieve frequent patterns with high fluency and relevance \parencite{wei2022chain}.

  • Classification and Disambiguation, such as assigning sentiment, stance, or intent. These tasks resolve ambiguity and often underlie evaluation pipelines \parencite{mialon2023augmented}.

  • Restatement and Summarization, which surface structural coherence or hidden biases by rephrasing or compressing content. When used diagnostically, they reveal implicit assumptions or inconsistencies \parencite{maynez2020faithfulness}.

Ideation output demands compositional novelty. Prompts that ask the model to hypothesize mechanisms, imagine alternatives, or propose designs typically land in the tail of the output distribution. Generating them requires broader exploration (via large beam width or elevated temperature) and exhibits greater output variance.

Within ideation, we find:

  • Instruction and Procedural Guidance, where the model scaffolds user understanding or explains concepts in sequence. These tasks require didactic clarity and often invoke implicit audience modeling \parencite{ouyang2022training}.

  • Meta-Reasoning and Strategy Output, which includes multi-step planning, evaluating hypotheses, or chain-of-thought reasoning. These outputs require recursive coherence and longer dependency tracking \parencite{wei2022chain}.

Product-oriented output targets the generation of external artifacts: source code, formatted markup, structured data, or interactive dialogue. These tasks often carry hard constraints and precision demands. Simple forms (e.g., boilerplate code) reside in high-probability zones, while structurally complex or compositional outputs require deeper exploration.

Included in this class are:

  • Formalism Translation, such as converting text to JSON, SQL, or LaTeX. This requires syntax-aligned generation and tight coupling between prompt and output form \parencite{reynolds2021prompt}.

  • Retrieval-Simulation, where the model reproduces facts or references learned during pretraining. These outputs appear fluent but are not grounded in current truth, making them useful but epistemically fragile \parencite{bommasani2021opportunities}.

  • Social Interaction Simulation, which includes emulating customer support, roleplay, or therapeutic dialogue. These are product-like in that the output is consumed as experience or interface, and they require tone, persona, and context alignment \parencite{jo2025proxyllm, park2023generative, song2024typing}.

Crucially, requests to “find flaws” tend to align with high-probability lexical patterns that the model has seen many times during training (e.g., “One limitation is …,” “A potential confound is …,” “This argument assumes …”). These stigmergic patterns, i.e. patterns emerging from indirect communication mediated by modifications of the environment \parencite{MARSH2008136}, lie near the mode of $p(\cdot)$, so they are reachable with minimal search depth and are often found by even the cheapest heuristic, such as greedy decoding.

By contrast, requests for constructive, future-oriented solutions typically require compositional novelty: the model must synthesize domain facts, propose unseen mechanisms, and articulate actionable steps. Such completions reside in lower-probability regions of the distribution, forcing the decoder to explore a broader beam or to sample deeper into the tail, both of which are algorithmically and computationally more demanding. In short, critique lives near the peak; creativity lives in the tail, explaining the empirical asymmetry in generation efficiency that we observe.

2.4 Discursive Network Formalization

To systematically study the phenomenon of invalidation, we propose a formal model that quantifies how invalidations propagate within a discursive network. This model considers actors as nodes in a network, with edges representing the exchange of statements. The goal is to understand how actors influence each other, how invalidations spread, and whether and how the system reaches an equilibrium state. The subsequent sections detail the analytical machinery, yielding quantitative information, for studying invalidation in both human and artificial contexts.

Definition 2.1.

Discourse. In the context of a discursive network, discourse refers to the structured process of communication and interaction between actors, $A = \{a_1, a_2, \ldots, a_n\}$, through which they exchange, validate, invalidate, and attempt to persuade each other regarding the truth or falsity of a set of statements $S = \{s_1, s_2, \ldots, s_m\}$. Discourse encompasses all forms of communication $C = \{C_{ij}\}$ between actors, where beliefs $B_i \subseteq S$ are shared, challenged, or reinforced, as well as the mechanisms of invalidation $I = \{I_{ij}\}$ and persuasion $P = \{P_{ij}\}$, which influence the evolution of each actor’s belief set. The outcome of discourse is governed by the update rules $U_j$, which dictate how actors revise their beliefs based on the interactions they engage in.

Definition 2.2.

Discursive Network. A discursive network is a formal structure $N = (A, S, P, I, C, B, U, G)$ where:

  • $A = \{a_1, a_2, \ldots, a_n\}$ is the set of actors participating in the discourse.

  • $S = \{s_1, s_2, \ldots, s_m\}$ is the set of possible statements, where each statement can be either true or false.

  • $P = \{P_{ij} \mid i, j \in \{1, 2, \ldots, n\}\}$ represents the persuasion functions, where $P_{ij}(s_k)$ gives the likelihood that actor $a_j$ will adopt statement $s_k$ after receiving communication from actor $a_i$.

  • $I = \{I_{ij} \mid i, j \in \{1, 2, \ldots, n\}\}$ denotes invalidations, where $I_{ij}(s_k, s_l)$ signifies actor $a_i$ invalidating a statement $s_k$ held by actor $a_j$ using a contradictory statement $s_l$.

  • $C = \{C_{ij} \mid i, j \in \{1, 2, \ldots, n\}\}$ represents the communications between actors, where $C_{ij}$ is the set of statements communicated from actor $a_i$ to actor $a_j$.

  • $B = \{B_1, B_2, \ldots, B_n\}$ represents the belief sets of the actors, where $B_i \subseteq S$ denotes the set of statements believed to be true by actor $a_i$.

  • $U = \{U_j \mid j \in \{1, 2, \ldots, n\}\}$ is the set of update rules that define how each actor’s belief set $B_j$ is modified in response to communications and invalidations.

  • $G = \{G_1, G_2, \ldots, G_n\}$ represents the goal functions of the actors, with $G_i : A \to 2^S$ specifying the set of statements actor $a_i$ seeks to convince other actors to believe.

The discursive network models the dynamics of belief formation, communication, persuasion, and invalidation among actors within a formal discourse setting.

Example. Consider a simple scenario with three actors $A = \{a_1, a_2, a_3\}$ and two statements $S = \{s_1, s_2\}$. Actor $a_1$ believes $s_1$ ($B_1 = \{s_1\}$) and wants $a_2$ and $a_3$ to also believe $s_1$ ($G_1(a_2) = G_1(a_3) = \{s_1\}$). Actor $a_2$ believes $s_2$ ($B_2 = \{s_2\}$) and wants $a_1$ and $a_3$ to believe $s_2$ ($G_2(a_1) = G_2(a_3) = \{s_2\}$). Actor $a_3$ is initially neutral ($B_3 = \emptyset$). Actor $a_1$ communicates $C_{12} = \{s_1\}$ to $a_2$, who invalidates $s_1$ by presenting $I_{21}(s_1, s_2)$. Actor $a_3$, observing this interaction, updates their belief set based on the persuasion functions and update rules. This framework models the propagation of invalidation within a discursive network, capturing the dynamics of belief, communication, and influence.
By formalizing these interactions, we can analyze and predict how invalidation affects the acceptance and rejection of statements among actors in the network.
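
To make Definition 2.2 concrete, the following Python sketch encodes the three-actor example as plain data structures. The representation (dictionary names, tuple layout) is an illustrative choice of ours, not a prescribed implementation.

\begin{verbatim}
# Illustrative encoding of the three-actor example; names and data
# layout are our own choices, not a prescribed API.
actors = ["a1", "a2", "a3"]
statements = ["s1", "s2"]

# Belief sets B_i: statements each actor currently holds true.
beliefs = {"a1": {"s1"}, "a2": {"s2"}, "a3": set()}

# Goal functions G_i: whom each actor wants to convince of what.
goals = {
    "a1": {"a2": {"s1"}, "a3": {"s1"}},
    "a2": {"a1": {"s2"}, "a3": {"s2"}},
}

# One round of discourse: C_12 = {s1}, answered by I_21(s1, s2).
communications = [("a1", "a2", {"s1"})]
invalidations = [("a2", "a1", ("s1", "s2"))]
\end{verbatim}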

2.5 Modeling Discursive Networks

2.5.1 Single-Network Two-State Model

Let $n = |A|$ be the actor count in the collapsed network with two mutually exclusive statements $S = \{r, f\}$, where $r$ denotes a true statement and $f$ a false one. For comparison with empirical simulations we work exclusively with proportions. The population state at time $t$ is the column vector

\[
\boldsymbol{\pi}(t) = \bigl(\pi_r(t), \pi_f(t)\bigr)^{\mathsf{T}}, \qquad
\pi_r(t) = \frac{T(t)}{n}, \qquad
\pi_f(t) = \frac{F(t)}{n} = 1 - \pi_r(t),
\]

where $T(t)$ and $F(t)$ are the respective counts of actors endorsing $r$ and $f$. Micro-level flips are characterized by the probabilities $p$ for $r \to f$ and $q$ for $f \to r$. These induce the population-level transition matrix

\[
T = \begin{pmatrix} 1-p & q \\ p & 1-q \end{pmatrix}, \qquad
\boldsymbol{\pi}(t+1) = T\,\boldsymbol{\pi}(t). \tag{6}
\]
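
To make the dynamics of Eq. (6) tangible, here is a minimal Python sketch that iterates the transition matrix from an all-true initial state; the parameter values are illustrative, not empirical estimates.

\begin{verbatim}
import numpy as np

p, q = 0.30, 0.10  # illustrative flip probabilities

# Column-stochastic transition matrix of Eq. (6).
T = np.array([[1 - p, q],
              [p,     1 - q]])

pi = np.array([1.0, 0.0])  # every actor starts by endorsing r
for _ in range(50):
    pi = T @ pi

print(pi)  # approaches (q/(p+q), p/(p+q)) = (0.25, 0.75)
\end{verbatim}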

The mapping of this model to Definition 2.2 is provided in Table 1.

Table 1: Mapping of discursive network elements to the single-network invalidation problem.

• $A$: unchanged actor set; proportions refer to $n = |A|$.
• $S = \{r, f\}$: binary, mutually exclusive statements (true vs. false).
• $P_{ij}$: $p$ if $B_j = \{r\}$ and $B_i = \{f\}$; $q$ if $B_j = \{f\}$ and $B_i = \{r\}$.
• $I_{ij}$: contradiction whenever $B_i \neq B_j$.
• $C_{ij}$: message containing $B_i$.
• $B_i$: $\{r\}$ or $\{f\}$ for each actor.
• $U_j$: switches $B_j$ with the corresponding probability, otherwise leaves it unchanged.
• $G_i$: persuade others to adopt actor $a_i$'s current belief.
Lemma 2.2 (Single-network invalidation propagation).

Let the single-network proportion state evolve according to

\[
\boldsymbol{\pi}(t+1) = T\,\boldsymbol{\pi}(t), \qquad
T = \begin{pmatrix} 1-p & q \\ p & 1-q \end{pmatrix}, \qquad
\boldsymbol{\pi}(t) = \begin{pmatrix} \pi_r(t) \\ \pi_f(t) \end{pmatrix}, \qquad
\pi_r(t) + \pi_f(t) = 1, \tag{7}
\]

with flip probabilities $p, q \in (0,1)$. The system has a unique fixed point

\[
\boldsymbol{\pi}^{*} = \begin{pmatrix} \dfrac{q}{p+q} \\[6pt] \dfrac{p}{p+q} \end{pmatrix},
\]

and the second eigenvalue of $T$ equals $1-p-q$, whose modulus is strictly smaller than $1$; hence the Markov chain converges geometrically to $\boldsymbol{\pi}^{*}$ from any initial distribution.

Proof.

A fixed point satisfies $\boldsymbol{\pi} = T\,\boldsymbol{\pi}$. Writing $\boldsymbol{\pi}^{\mathsf{T}} = (\pi_r, \pi_f)$ and expanding gives

\begin{align*}
\pi_r &= (1-p)\,\pi_r + q\,\pi_f, \\
\pi_f &= p\,\pi_r + (1-q)\,\pi_f.
\end{align*}

Because $\pi_f = 1 - \pi_r$, the first line reduces to

\[
p\,\pi_r = q\,\pi_f = q\,(1-\pi_r)
\quad\Longrightarrow\quad
\pi_r = \frac{q}{p+q}, \qquad \pi_f = 1 - \pi_r = \frac{p}{p+q}.
\]

Thus the fixed point is unique. The characteristic polynomial of $T$ is $\lambda^2 - (2-p-q)\lambda + (1-p-q) = 0$, whose roots are $\lambda_1 = 1$ and $\lambda_2 = 1-p-q$. Because $p, q \in (0,1)$ implies $0 < p+q < 2$, we have $|\lambda_2| < 1$; hence $T^t \to \boldsymbol{\pi}^{*}\mathbf{1}^{\mathsf{T}}$ as $t \to \infty$, so every trajectory converges to $\boldsymbol{\pi}^{*}$. ∎

Interpretation of Lemma 2.2.

In the special case of complementary flip probabilities ($p+q=1$) the second eigenvalue vanishes and the chain reaches equilibrium in a single step; the fixed point simplifies to $\boldsymbol{\pi}^{*} = (q, p)^{\mathsf{T}}$: the long-run proportion of actors endorsing $r$ equals the single parameter $q$, while the proportion endorsing $f$ equals $p$. The equilibrium distribution thus mirrors the flip probabilities directly; increasing $p$ (the propensity to abandon $r$) linearly increases the eventual share of $f$ believers and decreases that of $r$ believers by the same amount.
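
A short numerical check of Lemma 2.2 (same illustrative parameters as above) confirms both the closed-form fixed point and the geometric rate governed by the second eigenvalue.

\begin{verbatim}
import numpy as np

p, q = 0.30, 0.10
T = np.array([[1 - p, q], [p, 1 - q]])

pi_star = np.array([q, p]) / (p + q)   # closed-form fixed point
print(np.linalg.eigvals(T))            # 1.0 and 1 - p - q = 0.6

pi = np.array([0.5, 0.5])
errs = []
for _ in range(8):
    errs.append(abs(pi - pi_star).max())
    pi = T @ pi
# Each step shrinks the error by the factor |1 - p - q| = 0.6.
print([round(b / a, 3) for a, b in zip(errs, errs[1:])])
\end{verbatim}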

2.5.2 Single-Network Emergent Invalidation Model

The two-state single-network model sets the stage for analyzing how invalidations emerge in a discursive network. To accomplish this, we first endow a single discursive network with per-statement fabrication and internal correction. This captures the behavior of a single-instance LLM generating new text: invalidations are injected at hazard $\lambda$, while subsequent self-reflections (or post-processing heuristics) correct a fraction of the false statements at hazard $q$.

Setup.

Let the network be $N$ with actor set $A = \{a_1, \dots, a_n\}$ and $n = |A|$. At any time $t \in \mathbb{Z}_{\geq 0}$, we track the counts $T(t)$ and $F(t)$ of actors endorsing true and false statements, respectively, with $T(t) + F(t) = n$.

Working with raw counts becomes cumbersome when comparing networks of different sizes or analyzing asymptotic behavior. By converting to proportions, we obtain: (i) Scale invariance: Networks with 100 or 10,000 actors can be compared directly, (ii) Probabilistic interpretation: Proportions represent the probability that a randomly selected actor holds a given belief, and (iii) Mathematical tractability: Fixed-point analysis and stability results are cleaner in normalized coordinates.

Definition 2.3 (Normalized state).

The proportion state (or normalized state) of the single network at time $t$ is the vector

\[
\boldsymbol{\pi}(t) = \bigl(\pi_T(t), \pi_F(t)\bigr), \qquad
\pi_T(t) = \frac{T(t)}{n}, \qquad
\pi_F(t) = \frac{F(t)}{n},
\]

where $\pi_T(t)$ and $\pi_F(t)$ represent the fractions of actors endorsing true and false statements, respectively. The constraint $\pi_T(t) + \pi_F(t) = 1$ is automatically preserved, reflecting that every actor holds exactly one belief at each time step.

Stochastic primitives.

Events are scaled per statement so that $\lambda$, $p$, and $q$ remain commensurate.

Fabrication (invalidation).

In this case, $X$ represents the number of new falsehoods generated. The Poisson distribution models the number of events occurring in a fixed interval of time, given a known average rate. Each true statement is independently falsified during $[t, t+1)$:

\[
X(t) \sim \mathrm{Poisson}\bigl(\lambda\,T(t)\bigr).
\]
Internal flips.

Z𝑍Zitalic_Z represents the number of true statements that become false with a fixed probability p𝑝pitalic_p of becoming false (i.e. p𝑝pitalic_p is the intrinsic truth→false hazard). W𝑊Witalic_W represents the number of false statements that are corrected to become true with a fixed probability q𝑞qitalic_q of being corrected (q𝑞qitalic_q models spontaneous acknowledgement or repair). Both follow a Binomial distribution. Truths can degrade and falsehoods can self-correct:

Z(t)Binomial(T(t),p),W(t)Binomial(F(t),q).formulae-sequencesimilar-to𝑍𝑡Binomial𝑇𝑡𝑝similar-to𝑊𝑡Binomial𝐹𝑡𝑞Z(t)\sim\mathrm{Binomial}\bigl{(}T(t),p\bigr{)},\quad W(t)\sim\mathrm{Binomial% }\bigl{(}F(t),q\bigr{)}.italic_Z ( italic_t ) ∼ roman_Binomial ( italic_T ( italic_t ) , italic_p ) , italic_W ( italic_t ) ∼ roman_Binomial ( italic_F ( italic_t ) , italic_q ) .
Update equations.

Define

\[
\Delta T(t) = -X(t) - Z(t) + W(t), \qquad
\Delta F(t) = X(t) + Z(t) - W(t).
\]

Then

\begin{align}
T(t+1) &= T(t) + \Delta T(t), & \pi_T(t+1) &= \pi_T(t) + \frac{\Delta T(t)}{n}, \tag{8}\\
F(t+1) &= F(t) + \Delta F(t), & \pi_F(t+1) &= \pi_F(t) + \frac{\Delta F(t)}{n}, \tag{9}
\end{align}

with $\pi_T(t+1) + \pi_F(t+1) = 1$ preserved.
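
The update equations can be simulated directly. The sketch below draws the three hazards each period and tracks the truth share; the parameters are illustrative, and the Poisson fabrication draw is clipped so it never exceeds the current stock of true statements.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
p, q, lam = 0.02, 0.05, 0.01    # illustrative hazards

T_cnt = n                        # all actors start at the true statement
for _ in range(2_000):
    F_cnt = n - T_cnt
    X = min(rng.poisson(lam * T_cnt), T_cnt)  # fabrications, clipped
    Z = rng.binomial(T_cnt - X, p)            # remaining truths degrade
    W = rng.binomial(F_cnt, q)                # falsehoods self-correct
    T_cnt += -X - Z + W

print(T_cnt / n, q / (p + lam + q))  # simulated vs. analytic, both ~0.625
\end{verbatim}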

The mapping of this model to Definition 2.2 is provided in Table 2.

Table 2: Mapping of discursive-network elements to the single-network emergent-invalidation model. Unless noted otherwise, all rates are per statement.

• $A$: fixed actor set; proportions refer to $n = |A|$.
• $S = \{r, f\}$: binary statements ($r$ = true, $f$ = false).
• $P_{ij}$: spontaneous flips: $p$ for $r \to f$ (appears in $Z(t)$), $q$ for $f \to r$ (appears in $W(t)$).
• $I_{ij}$: internal invalidation; realized as self-correction $\mathrm{Binomial}(F(t), q)$ when $i = j$. (No cross-actor invalidation in the single-network setting.)
• $C_{ij}$: message from $a_i$ to $a_j$ containing $B_i$.
• $B_i$: current belief of actor $a_i$: $\{r\}$ or $\{f\}$.
• $U_j$: update rule applying the three hazards: $X(t) \sim \mathrm{Poisson}(\lambda T(t))$ (fabrications), $Z(t) \sim \mathrm{Binomial}(T(t), p)$ (truth $\to$ false), $W(t) \sim \mathrm{Binomial}(F(t), q)$ (false $\to$ true).
• $G_i$: goal: persuade all other actors to adopt $B_i$.
Lemma 2.3 (Single-network invalidation with fabrication).

Let the single-network proportion state evolve according to

\[
\boldsymbol{\pi}(t+1) = T_{\lambda}\,\boldsymbol{\pi}(t), \qquad
T_{\lambda} = \begin{pmatrix} 1-(p+\lambda) & q \\ p+\lambda & 1-q \end{pmatrix}, \qquad
\boldsymbol{\pi}(t) = \begin{pmatrix} \pi_r(t) \\ \pi_f(t) \end{pmatrix}, \qquad
\pi_r(t) + \pi_f(t) = 1,
\]

where $p, q \in (0,1)$ are the intrinsic flip probabilities, $\lambda \in (0,\,1-p)$ is the per-statement fabrication probability ($r \to f$), and $p+\lambda+q < 1$. Then:

1. The system has a unique fixed point

\[
\boldsymbol{\pi}^{*} = \begin{pmatrix} \dfrac{q}{p+\lambda+q} \\[6pt] \dfrac{p+\lambda}{p+\lambda+q} \end{pmatrix}.
\]

2. The second eigenvalue of $T_{\lambda}$ is $1-(p+\lambda+q)$, whose modulus is strictly smaller than $1$; hence the Markov chain converges geometrically to $\boldsymbol{\pi}^{*}$ from any initial distribution.

Proof.

A fixed point satisfies $\boldsymbol{\pi} = T_{\lambda}\,\boldsymbol{\pi}$. Writing $\boldsymbol{\pi}^{\mathsf{T}} = (\pi_r, \pi_f)$ and expanding gives

\begin{align*}
\pi_r &= (1-p-\lambda)\,\pi_r + q\,\pi_f, \\
\pi_f &= (p+\lambda)\,\pi_r + (1-q)\,\pi_f.
\end{align*}

Because $\pi_f = 1 - \pi_r$, the first line reduces to $(p+\lambda)\,\pi_r = q\,(1-\pi_r)$, which yields the fixed point in the statement. To avoid a clash with the fabrication rate $\lambda$, write the eigenvalues of $T_{\lambda}$ as $\mu$: the characteristic polynomial is $\mu^2 - (2-p-\lambda-q)\mu + (1-p-\lambda-q) = 0$, with roots $\mu_1 = 1$ and $\mu_2 = 1-(p+\lambda+q)$. Since $p+\lambda+q < 1$, we have $|\mu_2| < 1$; therefore $T_{\lambda}^{\,t} \to \boldsymbol{\pi}^{*}\mathbf{1}^{\mathsf{T}}$ as $t \to \infty$, so every trajectory converges to $\boldsymbol{\pi}^{*}$. ∎

Interpretation.

The fabrication term $\lambda$ simply augments the ordinary truth-to-false hazard $p$. Consequently the equilibrium share of false believers rises from $p/(p+q)$ (when $\lambda = 0$) to $(p+\lambda)/(p+\lambda+q)$, while the spectral gap $1 - \lvert 1-(p+\lambda+q) \rvert$ widens, so the chain settles into this worse equilibrium more quickly. Fabrication raises the inflow into the false state by $\lambda$, while internal invalidation $q$ is unchanged. If $\lambda > q$ the system becomes “invalidation-dominant,” mimicking an LLM that generates more new errors than it self-repairs, an empirical regime reported in LLMs (cf. Section 3).
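
A small parameter sweep (illustrative values) makes both effects visible: the equilibrium false share rises with $\lambda$, and the spectral gap widens along with it.

\begin{verbatim}
import numpy as np

p, q = 0.02, 0.05
for lam in (0.0, 0.01, 0.06):            # lam > q: invalidation-dominant
    T_lam = np.array([[1 - (p + lam), q],
                      [p + lam,       1 - q]])
    pi = np.array([1.0, 0.0])
    for _ in range(5_000):
        pi = T_lam @ pi
    gap = 1 - abs(1 - (p + lam + q))     # spectral gap = p + lam + q here
    print(f"lam={lam:.2f}  pi_f={pi[1]:.3f}  "
          f"analytic={(p + lam) / (p + lam + q):.3f}  gap={gap:.3f}")
\end{verbatim}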

2.5.3 Cross-Network Invalidation-Detection Model

Now that we have studied single networks with and without spontaneous invalidation emergence, the next natural question is how to reduce invalidations. As we show next, coupling multiple discursive networks accomplishes this.

Let the two discursive networks be $N_1$ and $N_2$ with actor sets $A_1 = \{a_{11}, \dots, a_{1n_1}\}$ and $A_2 = \{a_{21}, \dots, a_{2n_2}\}$; write $n_k = |A_k|$.

Definition 2.4 (Normalized state).

For each network $N_k$ the proportion state at time $t$ is

\[
\boldsymbol{\pi}_k(t) = \bigl(\pi_{T,k}(t), \pi_{F,k}(t)\bigr), \qquad
\pi_{T,k}(t) = \frac{T_k(t)}{n_k}, \qquad
\pi_{F,k}(t) = \frac{F_k(t)}{n_k}, \qquad
\pi_{T,k}(t) + \pi_{F,k}(t) = 1,
\]

where $T_k(t)$ and $F_k(t)$ are the counts of true and false statements, respectively.

Stochastic primitives.

All events are now specified so that their rates are comparable across networks regardless of size. In particular, fabrication is scaled by the current stock of true statements.

Falsehood generation (fabrication).

Each currently true statement in $N_k$ is independently falsified during $[t, t+1)$. The total number of such events is

\[
X_k(t) \sim \mathrm{Poisson}\bigl(\lambda_k\,T_k(t)\bigr),
\]

where $\lambda_k$ is the per-statement fabrication hazard.

Cross-network detection.

Each false statement in $N_k$ is noticed by $N_j$ with probability $d_{jk}$, so $Y_{jk}(t) \sim \mathrm{Binomial}\bigl(F_k(t),\,d_{jk}\bigr)$.

Internal flips.

True statements spontaneously become false with probability $p_k$, and false statements self-correct with probability $q_k$:

\[
Z_k(t) \sim \mathrm{Binomial}\bigl(T_k(t),\,p_k\bigr), \qquad
W_k(t) \sim \mathrm{Binomial}\bigl(F_k(t),\,q_k\bigr).
\]
Normalized update equations.

Let $\Delta T_k(t) = -Z_k(t) + W_k(t)$ and $\Delta F_k(t) = X_k(t) + Z_k(t) - W_k(t) - Y_{jk}(t)$. Dividing by $n_k$ gives the proportion dynamics

\begin{align}
\pi_{T,k}(t+1) &= \pi_{T,k}(t) + \frac{\Delta T_k(t)}{n_k}, \tag{10}\\
\pi_{F,k}(t+1) &= \pi_{F,k}(t) + \frac{\Delta F_k(t)}{n_k}, \tag{11}
\end{align}

with $\pi_{T,k}(t+1) + \pi_{F,k}(t+1) = 1$ holding in expectation under the consistency condition introduced in Lemma 2.4 below.
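
As a sketch, the expected-value (mean-field) version of Eqs. (10)–(11) can be iterated directly. We adopt the per-actor rate convention used in Lemma 2.4 below, i.e. $E[X_k]/n_k = \lambda_k$, and pick $\lambda_k$ from the consistency condition of Eq. (12); all values are illustrative.

\begin{verbatim}
p, q, d = 0.02, 0.05, 0.20
lam = d * p / (p + q)        # consistency condition, Eq. (12) below

pi_T, pi_F = 1.0, 0.0
for _ in range(5_000):
    dT = -p * pi_T + q * pi_F                   # E[Delta T_k]/n_k
    dF = lam + p * pi_T - q * pi_F - d * pi_F   # E[Delta F_k]/n_k
    pi_T, pi_F = pi_T + dT, pi_F + dF

print(pi_F, lam / d)         # both approach p/(p+q) ~ 0.286 (Lemma 2.4)
\end{verbatim}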

The mapping of this model to Definition 2.2 is provided in Table 3.

Table 3: Mapping of discursive-network elements to the cross-network invalidation-detection model. All hazard rates are per statement.

• $A_k$: two disjoint actor sets $A_1, A_2$; proportions refer to $n_k = |A_k|$.
• $S = \{r, f\}$: binary statements shared by both networks ($r$ = true, $f$ = false).
• $P_{ij}$: persuasion function. Within each network $N_k$ it reduces to constant flip probabilities: $p_k$ for $r \to f$ (used in $Z_k(t)$) and $q_k$ for $f \to r$ (used in $W_k(t)$). Across networks, persuasion acts only via $I_{ij}$ with success probability $d_{jk}$.
• $I_{ij}$: cross-network invalidation: if $B_{ij} = \{f\}$ and the receiver belongs to the other network, the statement is detected and flipped with probability $d_{jk}$ (realized through the random variable $Y_{jk}(t)$).
• $C_{ij}$: message sent from $a_i$ to $a_j$ carrying $B_{ij}$. Communication enables both persuasion and detection.
• $B_{ij}$: belief of actor $a_{ij} \in A_k$: $\{r\}$ or $\{f\}$.
• $U_j$: update rule for actor $a_j$, applied in the order (i) fabrication $X_k(t) \sim \mathrm{Poisson}(\lambda_k T_k(t))$, (ii) internal flips $Z_k(t), W_k(t)$ using $p_k, q_k$, (iii) cross-network detection $Y_{jk}(t)$ using $d_{jk}$. The parameter $\lambda_k$ therefore lives inside $U_j$.
• $G_{ij}$: goal: persuade every other actor (within and across networks) to adopt $B_{ij}$.
Lemma 2.4 (Dual-network invalidation propagation).

Let the proportion dynamics of Eqs. (10)–(11) be driven by parameters $\lambda_k, d_{jk}, p_k, q_k \in (0,1)$. Assume the per-actor falsehood-generation rate satisfies the consistency condition

\[
\lambda_k = d_{jk}\,\frac{p_k}{p_k+q_k}, \tag{12}
\]

which guarantees that the expected proportions sum to one. Then the Markov process has the mean fixed point

\[
\pi_{F,k}^{*} = \frac{\lambda_k}{d_{jk}}, \qquad
\pi_{T,k}^{*} = \frac{\lambda_k\,q_k}{d_{jk}\,p_k}, \qquad
\pi_{T,k}^{*} + \pi_{F,k}^{*} = 1.
\]
Proof.

At equilibrium the expected changes vanish, so $E[\Delta T_k] = E[\Delta F_k] = 0$. Under the lemma's per-actor convention the fabrication stream has mean $E[X_k] = \lambda_k n_k$ (Poisson), while the binomial counts have means $E[Y_{jk}] = F_k d_{jk}$, $E[Z_k] = T_k p_k$, and $E[W_k] = F_k q_k$. Dividing by $n_k$ to convert counts to proportions gives

\begin{align}
\pi_{T,k}\,p_k &= \pi_{F,k}\,q_k, \tag{13}\\
\lambda_k &= \pi_{F,k}\,d_{jk} + \pi_{F,k}\,q_k - \pi_{T,k}\,p_k. \tag{14}
\end{align}

Equation (13) yields $\pi_{F,k} = \pi_{T,k}\,p_k/q_k$. Inserting this into the normalization $\pi_{T,k} + \pi_{F,k} = 1$ gives $\pi_{T,k} = q_k/(p_k+q_k)$ and $\pi_{F,k} = p_k/(p_k+q_k)$. Substituting (13) into (14) cancels the internal-flip terms and leaves $\lambda_k = \pi_{F,k}\,d_{jk}$, i.e. $\pi_{F,k} = \lambda_k/d_{jk}$; equating the two expressions for $\pi_{F,k}$ gives the constraint (12) and the stated fixed point. ∎

Interpretation of Lemma 2.4.

The equilibrium proportions reveal clear causal roles for each parameter. The false-statement share in network $N_k$ is

\[
\pi_{F,k}^{*} = \frac{\lambda_k}{d_{jk}},
\]

so it scales directly with the per-actor error-generation rate $\lambda_k$ and inversely with the cross-network detection probability $d_{jk}$. More prolific error creation or weaker cross-scrutiny raises the long-run fraction of false statements.

The true-statement share is

\[
\pi_{T,k}^{*} = \pi_{F,k}^{*}\,\frac{q_k}{p_k} = \frac{\lambda_k\,q_k}{d_{jk}\,p_k},
\]

hence it grows with the internal correction probability $q_k$ and falls with the internal corruption probability $p_k$. A network that corrects errors efficiently (larger $q_k$) or seldom corrupts truths (smaller $p_k$) achieves a higher equilibrium truth proportion.

Finally, Eq. (12) couples $\lambda_k$ to the flip parameters: if within-network corruption outpaces correction ($p_k > q_k$), the consistency condition forces a higher $\lambda_k$, pushing $\pi_{F,k}^{*}$ upward unless the partner network compensates with stronger detection (larger $d_{jk}$). Thus the model quantifies an intuitive trade-off: falsehood prevalence is driven by the ratio of error creation to error removal, internally via $(p_k, q_k)$ and externally via $d_{jk}$.
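
The snippet below (illustrative parameters) evaluates the fixed point of Lemma 2.4 for several detection strengths, choosing $\lambda_k$ via Eq. (12); it also shows how the condition pins the ratio $\lambda_k/d_{jk}$, so stronger detection licenses proportionally more fabrication at the same falsehood share.

\begin{verbatim}
p, q = 0.02, 0.05
for d in (0.05, 0.20, 0.80):
    lam = d * p / (p + q)           # Eq. (12)
    pi_F = lam / d                  # = p/(p+q): ratio lam/d is pinned
    pi_T = lam * q / (d * p)
    print(f"d={d:.2f}  lam={lam:.4f}  pi_F*={pi_F:.3f}  "
          f"pi_T*={pi_T:.3f}  sum={pi_F + pi_T:.3f}")
\end{verbatim}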

2.5.4 Single- vs. Cross-Network Models with Invalidation

Stationary false-statement shares.

The single-network emergent-invalidation model (Lemma 2.3) stabilizes at

\[
\pi_f^{\mathrm{single}} = \frac{p+\lambda}{p+\lambda+q}, \qquad
\pi_r^{\mathrm{single}} = \frac{q}{p+\lambda+q}.
\]

For a given network $N_k$ engaged in cross-network detection with partner $N_j$ (Lemma 2.4) the corresponding steady state is

\[
\pi_f^{\mathrm{cross}} = \frac{\lambda}{d}, \qquad
\pi_r^{\mathrm{cross}} = \frac{\lambda\,q}{d\,p},
\]

where $p, q, \lambda$ now abbreviate $p_k, q_k, \lambda_k$, and $d = d_{jk}$.

Here

• $p$: intrinsic $r \to f$ flip probability in $N_k$;
• $q$: intrinsic $f \to r$ flip probability in $N_k$;
• $\lambda$: fabrication hazard per true statement in $N_k$;
• $d$: probability that a false statement in $N_k$ is detected by $N_j$.
Lemma 2.5 (Cross-network detection lowers falsehood prevalence).

If

\[
\frac{\lambda}{d}<\frac{p+\lambda}{p+\lambda+q},
\]

then $\pi_f^{\mathrm{cross}}<\pi_f^{\mathrm{single}}$, i.e., coupling $N_k$ to an external detector $N_j$ reduces the steady-state prevalence of false statements.

Proof.

Subtract the two stationary shares:

\[
\pi_f^{\mathrm{single}}-\pi_f^{\mathrm{cross}}
=\frac{p+\lambda}{p+\lambda+q}-\frac{\lambda}{d}>0
\quad\Longleftrightarrow\quad
\frac{\lambda}{d}<\frac{p+\lambda}{p+\lambda+q}. \qquad\blacksquare
\]

Interpretation.

External scrutiny ($d\uparrow$) or lighter fabrication pressure ($\lambda\downarrow$) pushes invalidations in the cross-network system below the single-network benchmark. Conversely, when $\lambda/d\geq(p+\lambda)/(p+\lambda+q)$, fabrications outrun detections and the dual system sustains the same or a higher falsehood share than isolation.
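The inequality is easy to check numerically. The following minimal sketch (Python, using the illustrative hazard values that Table 4 later adopts; the variable names are ours) confirms both the hypothesis and the conclusion of Lemma 2.5:

\begin{verbatim}
# Numeric check of Lemma 2.5 with the illustrative hazards of Table 4.
p, q, lam, d = 0.02, 0.05, 0.055, 0.19

pi_f_single = (p + lam) / (p + lam + q)   # isolated network (Lemma 2.3)
pi_f_cross  = lam / d                     # cross-detecting network (Lemma 2.4)

assert lam / d < pi_f_single              # hypothesis of Lemma 2.5
assert pi_f_cross < pi_f_single           # its conclusion
print(round(pi_f_single, 2), round(pi_f_cross, 2))   # 0.6 0.29
\end{verbatim}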

2.5.5 How Many Agents Guarantee a Target Falsehood Level?

Lemma 2.6 (Effective correction hazard).

Let a focal discursive network $N_k$ possess an internal “false $\to$ true” correction hazard $q>0$, and let it be cross-linked to $n-1$ partner networks, each supplying an external correction hazard $d>0$. Assuming that (i) internal and external detections act independently, and (ii) every detection channel is memoryless (exponential), the waiting time $T$ until a false statement in $N_k$ is first corrected is

\[
T\sim\exp\bigl(q+(n-1)d\bigr).
\]

Consequently, the effective per-statement correction rate is

\[
q_{\mathrm{eff}}(n)=q+(n-1)d.
\]
Proof.

In the discrete model each false statement faces a single Bernoulli “self-repair” trial per period with probability $q$. As the period length $\Delta t\to 0$, the binomial-to-Poisson limit converts this into a Poisson correction stream of rate $q$, i.e., an exponential clock $T_q\sim\exp(q)$.

Each of the other $n-1$ networks contributes an independent Bernoulli trial with probability $d$ per period. Taking the same limit gives $n-1$ independent Poisson streams of rate $d$, or clocks $T_d^{(1)},\dots,T_d^{(n-1)}\sim\text{i.i.d. }\exp(d)$.

The total waiting time until any clock rings is the minimum

\[
T=\min\{T_q,\,T_d^{(1)},\dots,T_d^{(n-1)}\}.
\]

Because the minimum of independent exponentials is itself exponential with rate equal to the sum of the component rates, we obtain $T\sim\exp\bigl(q+(n-1)d\bigr)$ and hence $q_{\mathrm{eff}}(n)=q+(n-1)d$. ∎
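The superposition argument can also be checked by simulation. The sketch below (Python with NumPy; the parameter values are illustrative) samples the competing exponential clocks and recovers the aggregate rate $q+(n-1)d$:

\begin{verbatim}
import numpy as np

# Empirical check that min{T_q, T_d^(1), ..., T_d^(n-1)} ~ exp(q + (n-1)d).
rng = np.random.default_rng(0)
q, d, n, samples = 0.05, 0.19, 5, 100_000

t_q = rng.exponential(scale=1/q, size=samples)            # internal repair clock
t_d = rng.exponential(scale=1/d, size=(samples, n - 1))   # external detector clocks
t_min = np.minimum(t_q, t_d.min(axis=1))

print(1 / t_min.mean(), q + (n - 1) * d)   # both close to 0.81
\end{verbatim}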

Lemma 2.7 (Agent requirement for a tolerance $\varepsilon$).

Let $\pi_f^{(n)}$ be the asymptotic proportion of false statements in $N_k$ when it is coupled to the other $n-1$ networks as above. Then

\[
\pi_f^{(n)}=\frac{p+\lambda}{p+\lambda+q+(n-1)d}.
\]

Consequently, to guarantee $\pi_f^{(n)}\leq\varepsilon\in(0,1)$ one needs at least

\[
n_{\min}=\Bigl\lceil 1+\frac{(p+\lambda)\bigl(\tfrac{1}{\varepsilon}-1\bigr)-q}{d}\Bigr\rceil
\]

cross-detecting networks (agents).

Proof.

Per Lemma 2.3, the long-run fraction of false statements in one isolated network is

\[
\pi_f^{\mathrm{single}}=\frac{p+\lambda}{p+\lambda+q}.
\]

Coupling $N_k$ to $n-1$ partner networks raises its “false $\to$ true” correction hazard from $q$ to $q_{\mathrm{eff}}(n)=q+(n-1)d$, per Lemma 2.6. Substituting $q_{\mathrm{eff}}(n)$ into the single-network formula gives the new steady state

\[
\pi_f^{(n)}=\frac{p+\lambda}{p+\lambda+q_{\mathrm{eff}}(n)}=\frac{p+\lambda}{p+\lambda+q+(n-1)d}.
\]

Imposing the tolerance constraint $\pi_f^{(n)}\leq\varepsilon$ with $0<\varepsilon<1$ is equivalent to $(n-1)d\geq(p+\lambda)\bigl(\tfrac{1}{\varepsilon}-1\bigr)-q$, which gives $n\geq 1+\frac{(p+\lambda)(1/\varepsilon-1)-q}{d}$. Taking the ceiling $\lceil\cdot\rceil$ ensures $n$ is an integer. ∎

Interpretation.

External scrutiny scales linearly with the number of partner networks, while internal falsehood production stays fixed. Thus $\pi_f^{(n)}$ decays hyperbolically in $n$, and each additional agent yields diminishing (but still positive) returns in truthfulness.

2.6 FOO Algorithm with Integrity Verification

The Flaws-of-Others (FOO) algorithm instantiates the detection hazard $d$ of Lemma 2.5 in software. It couples an arbitrary ensemble of LLM agents (each defined by a back-end model, decoding temperature, and free-text instructions) to a lightweight consensus loop (Algorithm 1 and Fig. 2). Neither the number of agents nor their prompts is fixed: both are read at run time from a simple JSON configuration, so the same engine can mediate anything from a two-model A/B test to a dozen specialized critics.
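As an illustration only (the field names and model identifiers below are hypothetical, not the schema of the released code), a run-time configuration for two critics and one harmonizer might read:

\begin{verbatim}
{
  "agents": [
    {"name": "critic-a",   "model": "model-x", "temperature": 0.1,
     "instructions": "Verify every factual claim in the peers' answers."},
    {"name": "critic-b",   "model": "model-y", "temperature": 0.9,
     "instructions": "Hunt for logical contradictions and fabrications."},
    {"name": "harmonizer", "model": "model-z", "temperature": 0.0,
     "role": "harmonizer",
     "instructions": "Merge all critiques into one structured judgement."}
  ],
  "max_rounds": 3,
  "convergence": "identical_outputs"
}
\end{verbatim}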

The FOO algorithm requires trust in the integrity of agent interactions. In collaborative scientific work, the provenance of each contribution becomes essential for reproducibility and accountability. We extend the basic FOO protocol with cryptographic integrity verification.

2.6.1 Core FOO Protocol

Figure 2: FOO consensus loop. An arbitrary set of agents $a_1,\dots,a_m$ receives the user task, produces candidate answers, cross-critiques peers, and feeds all critiques to one or more harmonizers $h$. Harmonizers synthesize the feedback; agents revise, and the loop continues until a convergence criterion is met. The design allows any number or type of agents and any custom instruction set, making the architecture task- and model-agnostic.

The protocol has four phases:

1. Broadcast: an initial user task is broadcast to every active agent. Each agent returns a first-pass answer.

2. Cross-examination (FOO step): every agent receives the instruction “find the flaws in …” followed by all peer answers except its own, and produces a critique. This implements the cross-detection hazard: an error overlooked by one model is likely to be flagged by at least one other.

3. Harmonization: one or more agents are flagged as harmonizers. They aggregate the entire set of critiques, separate agreements from contradictions, and emit a structured “judgement.” Harmonizers can use any rubric (majority vote, weighted confidence, specialist veto) to convert divergent feedback into a common set of observations.

4. Revision and loop: every non-harmonizer ingests the judgement and regenerates its answer, optionally rebutting points it believes to be wrong. The cycle repeats until a termination condition is met (identical outputs, bounded edit distance, or a maximum number of rounds). The final harmonizer synthesis is returned to the user.

Because the agents, instructions, stopping rule, and comparison metric are all configurable, the same code base supports tasks as different as mathematical proof sketching, literature surveying, or code review. The FOO loop thus acts as a versatile wrapper that upgrades solitary generation into a networked, self-auditing process, realizing in practice the external detection hazard that pushes the system into the truth-dominant regime predicted by the theory.

Algorithm 1 FOO with integrity logging

Require: user task $T$; agent set $\mathcal{A}$; convergence test
Ensure: final harmonized answer $R$ with verified interaction log
1: $\mathcal{B}\leftarrow$ initialize blockchain with genesis block
2: broadcast $T$ to every $a\in\mathcal{A}$ and collect initial answers
3: log initial responses to blockchain $\mathcal{B}$
4: repeat
5:   for all agents $a\in\mathcal{A}$ do
6:     supply $a$ with “find flaws in” $\{\text{answers of }\mathcal{A}\setminus\{a\}\}$
7:     receive critique $C_a$
8:     log critique $C_a$ to blockchain $\mathcal{B}$
9:   end for
10:  harmonizer(s) $h$ aggregate $\{C_a\}$ into judgement $J$
11:  log harmonization decision to blockchain $\mathcal{B}$
12:  for all non-harmonizers $a$ do
13:    regenerate answer $A_a$ conditioned on $J$
14:    log revision to blockchain $\mathcal{B}$
15:  end for
16: until convergence test satisfied
17: return final $J$ and verified blockchain $\mathcal{B}$
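For readers who prefer executable code to pseudocode, the following Python sketch mirrors the control flow of Algorithm 1 without the integrity logging. The agent callables are assumed wrappers around hosted LLM endpoints; this is a minimal sketch, not the API of the released implementation:

\begin{verbatim}
def foo_loop(task, agents, harmonizer, max_rounds=3):
    """agents: dict name -> callable(prompt) -> str; harmonizer likewise."""
    answers = {a: ask(task) for a, ask in agents.items()}        # broadcast
    judgement = ""
    for _ in range(max_rounds):
        critiques = {}
        for a, ask in agents.items():                            # FOO step
            peers = "\n\n".join(v for k, v in answers.items() if k != a)
            critiques[a] = ask("Find the flaws in:\n" + peers)
        judgement = harmonizer("\n\n".join(critiques.values()))  # harmonize
        revised = {a: ask("Revise your answer given this judgement:\n"
                          + judgement) for a, ask in agents.items()}
        if revised == answers:                                   # converged
            break
        answers = revised
    return judgement
\end{verbatim}

The convergence test here is the strictest of the options named above (identical outputs); bounded edit distance or a round cap can be substituted without changing the structure.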

2.6.2 Integrity Extension

Each FOO interaction generates a cryptographically signed record containing: (i) message content and timestamp, (ii) agent identity and interaction type, (iii) a hash-based link to previous interactions, and (iv) a verification signature.

This creates a tamper-evident chain where any modification to historical interactions invalidates subsequent cryptographic links, making post-hoc fabrication of contributions computationally infeasible.

The integrity logging adds four checkpoint types to Algorithm 1:

1. Initial response logging after broadcast

2. Critique logging during cross-examination

3. Harmonization decision logging

4. Revision logging during iteration

Detailed implementation algorithms and security analysis are provided in Appendix A.

3 Theoretical Validation and Parameter Analysis

This section demonstrates the mathematical consistency and theoretical properties of our discursive network models. Rather than claiming empirical validation, we show how the framework accommodates realistic parameter ranges and produces theoretically coherent dynamics. The analysis serves three purposes: (i) establishing that the models yield stable, interpretable equilibria; (ii) demonstrating how parameter variations affect system behavior; and (iii) illustrating the framework’s capacity to represent different invalidation regimes observed in the literature.

We parameterize our models using representative values drawn from the LLM literature to demonstrate theoretical consistency and explore regime transitions. These parameter choices illustrate the framework’s expressive capacity rather than constituting empirical validation. Future work will require systematic parameter estimation from controlled experiments designed specifically to test the discursive network hypotheses.

Figure 3: (a) Single-network emergent-invalidation dynamics corresponding to Lemma 2.3. Twenty independent Monte-Carlo runs of 100 steps are averaged. The blue curve shows the mean proportion $\pi_r(t)$ of actors endorsing the true statement $r$; the red curve shows the mean proportion $\pi_f(t)=1-\pi_r(t)$ endorsing the false statement $f$. Shaded bands mark point-wise 95\% confidence intervals. Dashed horizontal lines denote the fixed points $\pi_r^{*}$ and $\pi_f^{*}$ computed for $p=0.02$, $q=0.05$, and $\lambda=0.055$. (b) Cross-network invalidation-detection dynamics corresponding to Lemma 2.4. The hazards $p$, $q$, and $\lambda$ are unchanged, and an external detection probability $d=0.19$ links two equal networks. Curves and bands represent the mean and 95\% confidence interval over 20 runs of 200 steps; the upper subplot corresponds to Network 1 and the lower to Network 2. Dashed lines indicate the predicted equilibria $\pi_{T,k}^{*}$ (true, blue) and $\pi_{F,k}^{*}$ (false, red), which reduce the long-run false share from about 0.60 in panel (a) to about 0.29.

3.1 Parameter specification from literature ranges

Published studies provide parameter ranges that inform our theoretical analysis. \textciteJi2023SelfReflection report invalidation rates of 26--61\% in medical domains, while \textciteZhang2024SelfAlignment document self-evaluation accuracy near chance levels (AUROC $\approx 0.55$). These findings suggest parameter regimes where $\lambda>q$, corresponding to invalidation-dominant dynamics in our framework:

\[
\lambda>q\quad\Longrightarrow\quad
\pi_f^{\mathrm{single}}=\frac{p+\lambda}{p+\lambda+q}\gtrsim 0.50,
\]

i.e., the model fabricates new errors faster than it repairs them internally: an invalidation-dominant system. While we do not claim these studies directly measure our theoretical parameters, they establish the empirical plausibility of invalidation-dominant regimes and provide realistic bounds for theoretical exploration.

The claim-level probabilities reported by \textciteZhang2024SelfAlignment are raw soft-max scores. They are over-confident until calibrated (their Fig. 5), and they average dependent claims. Consequently we treat them as proxy scores, not literal probabilities, and accompany every point estimate with an explicit uncertainty discussion in what follows.

3.2 Parameter choices for simulation

Table 4 lists the hazards used in our single- and dual-network simulations. Whenever a range was reported in the source, we chose values that place the single network near the mid-point of the error band (about 60\% false statements) so that the effect of cross-network detection is easy to visualize.

The following analysis explores model behavior under parameter values that span regimes of practical interest. We examine: (i) single-network dynamics with varying λ/q𝜆𝑞\lambda/qitalic_λ / italic_q ratios; (ii) cross-network detection effects as d𝑑ditalic_d varies; and (iii) scaling behavior as the number of agents increases. This constitutes theoretical validation of model consistency rather than empirical hypothesis testing.

Table 4: Representative parameter values for theoretical analysis. Parameters are chosen to demonstrate key regime transitions and explore model behavior within empirically plausible ranges. Values do not constitute fitted parameters but rather theoretically motivated choices for mathematical exploration.

Symbol | Value | Model role | Empirical hook / comment
$p$ | 0.02 | true $\to$ false slip | Chosen an order of magnitude smaller than $q$ so that internal repair remains visible.
$q$ | 0.05 | internal repair | Matches self-evaluation's modest AUROC $\approx 0.55$.
$\lambda$ | 0.055 | invalidations | The upper half of the 26--61\% error band implies $\lambda>q$; solving $\pi_f^{\mathrm{single}}\approx 0.60$ for $\lambda$ with $p,q$ fixed gives 0.055.
$d$ | 0.19 | cross-network repair | Picked so that $\lambda/d<p/(p+q)$, just inside the truth-dominant region; see Lemma 2.5.

3.3 Single-network baseline

With the calibrated triple $(p,q,\lambda)$ the single-network model (Lemma 2.3) predicts

\[
\pi_f^{\mathrm{single}}=\frac{0.02+0.055}{0.02+0.055+0.05}=0.60,\qquad
\pi_r^{\mathrm{single}}=0.40.
\]

A Monte-Carlo experiment (FOO_Single_Network.py, 20 runs, 100 steps) produces $\hat{\pi}_f(100)=0.597\pm 0.018$.

Interpretation.

When $\lambda>q$ the false share stabilizes near 60\%, squarely inside the empirical band of \citeauthorJi2023SelfReflection. This baseline serves as the reference against which we gauge cross-network effects.
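A minimal re-implementation of this experiment is sketched below (Python with NumPy). The hazards and run lengths match the text (20 runs of 100 steps); the population of 1,000 statements is our assumption, not a reported figure:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
p, q, lam = 0.02, 0.05, 0.055
n_statements, n_steps, n_runs = 1_000, 100, 20

finals = []
for _ in range(n_runs):
    false = np.zeros(n_statements, dtype=bool)   # all start true
    for _ in range(n_steps):
        u = rng.random(n_statements)
        to_f = ~false & (u < p + lam)            # corruption + fabrication
        to_r = false & (u < q)                   # self-repair
        false = (false | to_f) & ~to_r
    finals.append(false.mean())

print(np.mean(finals))   # ~0.60, matching pi_f^single
\end{verbatim}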

3.4 Dual-network architecture

Coupling two identical networks via an external repair hazard $d$ (Lemma 2.5) yields

\[
\pi_f^{\mathrm{cross}}=\frac{\lambda}{d}=0.29,\qquad
\pi_r^{\mathrm{cross}}=0.71,
\]

i.e., the falsehood prevalence is cut roughly in half. Simulations (20 runs, 200 steps) give $\hat{\pi}_f^{N_1}(200)=0.286\pm 0.015$ and $\hat{\pi}_f^{N_2}(200)=0.289\pm 0.014$ (Fig. 3b).

Interpretation.

The external hazard $d$ can be realized by retrieval-augmented generation, ensemble adjudication, or human post-editing. Once the ratio $\lambda/d$ drops below $p/(p+q)$ the system flips from an invalidation-dominant regime ($\pi_f\approx 0.60$) to a truth-dominant one ($\pi_f\approx 0.29$), a 51\% relative reduction.

3.5 How many independent agents for $\leq 5\,\%$ error?

Figure 4: Long-run false share vs. number of agents. The orange curve shows the analytic steady-state falsehood share $\pi_f(n)=(p+\lambda)\,[\,p+\lambda+q+(n-1)d\,]^{-1}$ for the calibrated hazards $(p,q,\lambda,d)=(0.02,\,0.05,\,0.055,\,0.19)$. Each dot marks an integer $n$; the dashed horizontal line is the 5\% target. The first point below that line occurs at $n_{\min}=9$, labelled in green, meaning at least nine mutually detecting agents are required (under this calibration) to keep the long-run error rate below one false statement in twenty.

Using Lemma 2.7 we can ask: how many cross-detecting networks are required to push the long-run false share below $5\,\%$? With the parameters of Table 4 and tolerance $\varepsilon=0.05$,

\[
n_{\min}=\Bigl\lceil 1+\frac{(p+\lambda)\bigl(\tfrac{1}{0.05}-1\bigr)-q}{d}\Bigr\rceil=\lceil 8.24\rceil=9.
\]

Hence, in this example, at least nine mutually detecting agents are necessary to guarantee that fewer than one statement in twenty remains false at equilibrium under this calibration. The requirement grows only linearly in $1/\varepsilon$ thanks to the additive nature of the external hazards, making multi-agent verification a scalable pathway to high factual reliability. Figure 4 shows the functional dependency between $\varepsilon$ and $n$.
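The bound of Lemma 2.7 is a one-line computation. The helper below (the function name is ours) reproduces the worked example:

\begin{verbatim}
from math import ceil

def n_min(p, q, lam, d, eps):
    """Smallest number of mutually detecting agents for pi_f <= eps."""
    return ceil(1 + ((p + lam) * (1 / eps - 1) - q) / d)

print(n_min(0.02, 0.05, 0.055, 0.19, 0.05))   # 9
print(n_min(0.02, 0.05, 0.055, 0.19, 0.01))   # 40: ambitious yet tractable
\end{verbatim}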

4 Discussion

4.1 Ethical Concerns

The first and most important observation regarding discursive networks with LLMs is ethical: there is a risk of transgression when the human(s) in a discursive network use it as a means to outsource reasoning instead of deploying it as augmented intelligence. In linguistics, epithesis refers to the addition of a sound or letter to the end of a word without changing its meaning. By analogy we identify an ethical concern in discursive networks: the scientific epithesis, wherein an individual seeks authorship on an artifact to which they have contributed only superficial edits, or none at all. Like the linguistic phenomenon, the intervention leaves the substantive content untouched while appending an external element that alters perception rather than substance. Scientific epithesis does not meet the formal threshold of plagiarism, yet it belongs to the same family of misappropriations because it places a symbolic layer of credit “upon” the discourse without engaging in its intellectual construction. In the context of discursive networks this behavior distorts the link between contribution and attribution, undermining the very mechanism of cross-agent validation that the network is designed to support.

Authorship norms present another axis of concern. As dozens of agents contribute micro-edits, intellectual responsibility becomes increasingly opaque, complicating both credit assignment and error tracing. Empirical work across disciplines shows that diffuse contributions encourage honorary or “gift” authorship, diluting accountability and undermining public trust in published findings [maruvsic2011systematic]. What constitutes authorship in a discursive network is an open question that will take time to settle.

The integrity of discursive networks fundamentally depends on the ability to verify the authenticity and provenance of each contribution, whether from human or artificial agents. When scientific conclusions emerge from iterative exchanges among multiple participants, traditional notions of authorship become complicated by the distributed nature of intellectual labor and the possibility of post-hoc modification of interaction records. This challenge is particularly acute in combating epithesis, as the practice thrives in environments where genuine contributions cannot be distinguished from superficial additions or retroactive claims of involvement. A robust solution requires cryptographic mechanisms that create tamper-evident logs of all interactions, making it computationally infeasible to fabricate authorship claims after the fact. By implementing blockchain-based integrity verification for agent communications, discursive networks can establish a trustworthy foundation where each participant’s actual contributions are permanently recorded and verifiable. This technical infrastructure does more than prevent fraud; it creates positive incentives for meaningful engagement by ensuring that substantial intellectual contributions receive proper attribution while making epithetic behavior both detectable and reputationally costly. The result is a research environment where collaborative human-AI knowledge production can proceed with confidence in the integrity of the underlying interaction records.

The energy footprint of large discursive networks poses another ethical dilemma. Each critical review involves at least one forward-and-backward pass through a language model, so if each of $N$ agents critiques the manuscript once, the computational cost grows linearly with the number of agents, $E(N)=\Theta(N)$. When every agent critiques every other agent (the fully connected case that is ideal for robustness), the number of pairwise exchanges scales as $N(N-1)/2=\Theta(N^2)$, and so does the energy consumption. Moreover, once those quadratic interactions have occurred the network typically runs a consensus or “harmonization” phase to reconcile conflicting edits; common distributed algorithms complete in $O(\log N)$ synchronous rounds. The aggregate budget therefore climbs to $\Theta(N^2\log N)$ for end-to-end validation, dwarfing the cost of the original single-agent composition. In an era when each large-model inference already carries a measurable carbon footprint, the quadratic-plus overhead raises difficult questions about the sustainability of scaling discursive networks without parallel investment in greener compute or more frugal validation protocols.
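A back-of-the-envelope cost model makes the scaling concrete; the sketch below counts message exchanges only and is a rough model rather than a measured energy figure:

\begin{verbatim}
from math import ceil, comb, log2

def exchanges(n_agents):
    """Pairwise critiques times O(log N) harmonization rounds."""
    pairwise = comb(n_agents, 2)               # Theta(N^2) critiques
    rounds = max(1, ceil(log2(n_agents)))      # O(log N) consensus rounds
    return pairwise * rounds                   # Theta(N^2 log N) overall

for n in (2, 4, 8, 16, 32):
    print(n, exchanges(n))   # 1, 12, 84, 480, 2480
\end{verbatim}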

Because those costs grow faster than in single-agent or single-author scenarios, only well-funded actors may afford the energy budget, deepening the resource gap already highlighted for modern NLP pipelines [Strubell2019Energy]. Sustaining the benefits of discursive verification therefore demands not merely algorithmic innovation but also governance frameworks and infrastructural subsidies that keep the playing field environmentally and economically fair.

Discursive networks promise robust cross-verification, yet their perpetual negotiation of “truth” can erode epistemic diversity. When many agents iteratively revise one another, the process tends to pull answers toward a central consensus, suppressing minority explanations in favor of the statistically safest wording. Large-scale language models already exhibit this homogenizing bias, along with well-documented tendencies to replicate and even amplify the social prejudices embedded in their training data [Bender2021Parrots]. A network of such agents therefore risks hard-coding bias under the reassuring veneer of multi-agent agreement.

At the same time, cross-agent review typically demands full visibility of prompts and intermediate reasoning, thereby increasing the attack surface for privacy breaches. Membership-inference studies demonstrate how seemingly benign queries can reveal whether sensitive records were present in a training set [Shokri2017Membership], suggesting that discursive networks must treat every inter-agent channel as a potentially harmful vector for leakage.

Finally, adversarial robustness and distributive justice pose intertwined challenges. A single compromised agent can inject crafted “triggers” that, once propagated through mutual validation, shift the entire network toward a malicious conclusion [Wallace2019Triggers].

4.2 Conclusions and Outlook

This manuscript has traced a broad arc, from theoretical grounding to practical tooling, around a single organizing idea: discursive networks. By recognizing that every LLM is both a generator and a consumer of discourse, we cast its interactions as edges in a network whose universal structure can be exploited for robust error control. Building on this abstraction we introduced Flaws-Of-Others (FOO), a reconfigurable agent-based algorithm packaged with user-friendly tools that assist the production, verification, and revision of scientific knowledge.

Many ethical challenges in discursive networks call for detailed, tamper-proof logs of interactions. This is essential for accountability, reproducibility, and auditability in digital systems. When interactions involve LLMs, scientific collaborations, or complex data workflows, the capacity to verify the integrity of recorded exchanges becomes a prerequisite for trust. One approach to achieving this is through the use of blockchain technology, which can provide decentralized, cryptographically secure records that are resistant to unauthorized modification. Each entry in a blockchain-based log is linked to the previous one through cryptographic hashes, ensuring that any tampering with earlier data invalidates the entire subsequent chain. This structure allows interaction logs to be both transparent and verifiable without central oversight. Furthermore, incorporating time-stamping and access control into such systems ensures that each interaction is both temporally fixed and attributable. These properties make blockchain a viable framework for securing interaction logs in research, legal compliance, and automated decision-making contexts.

This study deliberately replaces the fashionable term hallucination with the broader concept of invalidation. Whereas “hallucination” suggests a purely accidental slip in the model's internal perception, the data reveal a richer spectrum of invalidations (failure modes) that includes strategic prompt manipulation, chain-of-thought drift, and the simple inheritance of errors from flawed training corpora. All of these mechanisms manifest in the observable metric that matters to users, the production of false statements, so the single hazard rate $\lambda$ is most naturally interpreted as an invalidation rate. The shift in vocabulary is therefore more than semantics: it aligns theoretical parameters with the phenomena actually counted in benchmarks.

A discursive network that relies exclusively on its own self-correction routines will remain invalidation-dominant as soon as the fabrication hazard exceeds the internal repair rate ($\lambda>q$). In that regime the long-run share of false statements is bounded below by one half, regardless of implementation details or domain. The formalism developed here reveals why: fabrication and self-repair enter the steady-state ratio $\pi_f^{\mathrm{single}}=(p+\lambda)/(p+\lambda+q)$ additively, leaving no structural mechanism for the network to “outrun” its own invalidations.

The picture changes once independent verification channels are added. Coupling the generator to $n-1$ external agents augments every false statement with an additional repair hazard $d$ per agent, yielding the effective rate $q_{\mathrm{eff}}(n)=q+(n-1)d$. If the composite system satisfies $\lambda/d<p/(p+q)$ it flips into the truth-dominant regime, in which the falsehood share decreases monotonically with $n$ and approaches zero in the limit of infinite cross-checking capacity. The transition threshold depends only on the ratio $\lambda/d$, providing a clean design criterion that is independent of any particular benchmark or simulation.

Lemma 2.7 turns that qualitative criterion into a quantitative planning tool. It gives a closed-form bound

\[
n_{\min}=\Bigl\lceil 1+\frac{(p+\lambda)\bigl(\tfrac{1}{\varepsilon}-1\bigr)-q}{d}\Bigr\rceil
\]

for the smallest number of mutually detecting agents required to keep the long-run falsehood share below a user-specified tolerance $\varepsilon$. Because $n_{\min}$ grows only linearly in $(p+\lambda)/d$, even ambitious error targets (e.g., $\varepsilon=0.01$) translate into tractable network sizes. In practice, engineers can estimate $p$, $q$, and $\lambda$ from established benchmarks, select a viable cross-detection mechanism to determine $d$, and then read off $n_{\min}$ directly from the formula. The discursive-network formalization thus provides not just a descriptive model of information dynamics but a concrete apparatus for right-sizing the verification infrastructure needed to achieve prescribed levels of factual reliability.

The theoretical analysis confirms that our mathematical framework produces stable, interpretable dynamics across parameter ranges consistent with published LLM studies. The models successfully capture qualitative regime transitions (invalidation-dominant vs. truth-dominant) and provide quantitative predictions for multi-agent system design.

Empirical validation remains an important direction for future work, requiring controlled experiments specifically designed to measure the theoretical parameters ($p$, $q$, $\lambda$, $d$) under realistic conditions.

Although the mathematics treats $d$ as a single scalar, the concept encompasses several concrete engineering choices. Retrieval-augmented generation raises $d$ by surrounding the model with authoritative passages that expose contradictions. Model-ensemble adjudication raises it further by combining the diverse priors of independently trained models; uncorrelated errors rarely agree, so the aggregate chance that a falsehood slips through diminishes rapidly. Even higher values of $d$ become attainable when a human editor is placed in the loop, although latency and cost then become limiting factors. Our hazard model quantifies these trade-offs: for any target false-statement tolerance one can compute a required detection rate, and thus budget the amount of human or automated scrutiny that must be applied.

Just as people are quicker to spot another person's mistakes than their own, a language model is often a sharper critic of a peer's text than of its own output. The asymmetry stems from each model's training objective: during generation it maximizes local fluency rather than global factuality, so a well-phrased falsehood slips through unchallenged. When the same model is asked only to evaluate an already-written passage, fluency is no longer the bottleneck; the task collapses to checking claims against the knowledge stored in its training corpus. Verification is therefore easier, and cross-model review can expose errors that the original authoring pass left intact.

The significance of these findings extends beyond artificial systems. Invalidation dynamics of the same mathematical form could govern human conversation, peer review, and social-media fact-checking. Recognizing this shared structure invites a unified research agenda that links network science, cognitive psychology, and algorithmic governance. Our future work will explore heterogeneous hazards at the level of individual actors, time-varying detection capacities that respond to workload, and live A/B tests that recover hazard estimates directly from production chat traffic. A deeper ethical analysis will also be required, because the push toward smaller $\pi_f$ competes with privacy constraints, energy budgets, and the carbon footprint of large-scale verification.

We have crossed a cultural threshold: language is no longer crafted solely by human hands and minds, but is continuously co-composed with machines. Large language-model systems do not “assist” writing; they repurpose it, and scientific production is among the first practices affected. Authorship dissolves into a live dialogue between human intention and algorithmic completion, with sentences generated, revised, and re-externalized in the same breath. Because the practice itself has changed, the evaluative yardsticks built for solitary, page-bound prose (originality scores, citation counts, style rubrics, etc.) no longer capture what is happening on the screen. This cultural shift brings novel risks and opportunities, demanding updated methods, training, expectations, and metrics.

Looking forward, the research programme must widen from token-level truthfulness to medium-level dynamics. Three paths stand out:

1. Cultural assimilation. Our most pressing need is an ethical framework to harmonize societal values with this new reality. We need conventions, interfaces, and pedagogies that make continuous, model-mediated composition legible, trustworthy, and fair.

2. Metric redesign. Benchmarks rooted in solitary authorship and static text can no longer capture quality in a live, co-creative medium. New metrics should score how a statement evolves under iterative detection and repair, not just its instantaneous truth value.

3. Governance of agency. When the medium itself “acts on the message,” responsibility diffuses across designers, deployers, and end-users. Future hazard models must therefore integrate economic incentives, interface affordances, and policy constraints alongside the technical parameters $p,q,\lambda,d$.

Seen through this lens, the ideas discussed in this manuscript form a prototype for broader cultural discussions that will emerge as society learns, once again, to write in a fundamentally new medium.

Acknowledgements

This work was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under Award No. 1R25GM151182 and by the National Science Foundation under Award No. 2518973. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, NIGMS, or NSF.

In an instance of self-reference, this manuscript was developed using the methods it describes, through iterative cycles of recursive software development that refined the underlying technology. Records of interactions with multiple LLM agents exist only for late-stage manuscript refinement. Nevertheless, the following statement accurately represents the process employed here and provides a template for future work:

This manuscript and its supplementary materials were produced using methods detailed in [gutierrez2025flaws]. The author supplied the core concepts, the logical framework, and foundational technical content, while large language-model assistance was utilized for ideation, verification, text drafting and revision, and software implementation. Responsibility for all claims made herein rests solely with the author. Existing logs of interactions with LLM agents are included as an appendix within the supplementary materials.

\printbibliography

Appendix A Blockchain Implementation Details


Implementation note.

We implement the loop in Python 3.11 with asyncio concurrency; each agent call is an HTTP request to a hosted LLM endpoint (e.g. OpenAI, Anthropic). Unless stated otherwise, the experimental pool comprises a single harmonizer (the most capable model available) and, for each back-end engine under test, two specialist agents sampled at temperatures 0.1 and 0.9. The protocol executes one broadcast round plus a minimum of three consensus rounds, so every query triggers

\[
\text{cost}=4\times|A|\quad\text{API calls},
\]

where $|A|$ is the total number of agents (harmonizer + specialists). For example, with one engine ($|A|=3$) the loop issues $4\times 3=12$ calls; adding engines or additional specialist roles scales the cost linearly. All configuration files and source code are publicly available at https://github.com/biomathematicus/foo.

Blockchain motivation.

The reliability of discursive networks depends critically on the integrity of recorded interactions between human and artificial agents. When scientific conclusions emerge from iterative exchanges among multiple LLMs and human reviewers, the provenance and authenticity of each contribution becomes essential for both reproducibility and accountability. Traditional logging systems are vulnerable to post-hoc modification, making it difficult to distinguish genuine collaborative refinement from retrospective tampering or fabricated authorship claims.

We address this challenge through a blockchain-based integrity system that creates tamper-evident records of all agent interactions. Each message exchange in the discursive network generates a cryptographic block containing the agent identity, message content, timestamp, and hash-linked reference to the previous interaction. The system employs SHA-256 hashing with a global salt shared across all agents to ensure consistency while preventing individual agents from being identified through hash analysis alone.

Definition A.1 (Conversation Blockchain).

For a discursive network $N=(A,S,P,I,C,B,U,G)$, the conversation blockchain $\mathcal{B}$ is a sequence of cryptographically linked blocks $\{b_0,b_1,\ldots,b_n\}$ that creates tamper-evident records of all agent interactions, where $\sigma$ is a global salt and $||$ denotes concatenation.

Algorithm 2 Blockchain record creation for discursive networks

Require: message content $m_i$, sender $a_j$, receiver $a_k$, global salt $\sigma$
Ensure: new blockchain block $b_i$
1: $t_i\leftarrow$ current timestamp
2: $h_c\leftarrow H(m_i\,||\,t_i\,||\,\sigma)$ ▷ Content hash
3: if $i=0$ then ▷ Genesis block
4:   $h_b\leftarrow H(h_c\,||\,\text{“GENESIS”}\,||\,\sigma)$
5: else
6:   $h_b\leftarrow H(h_c\,||\,h_b(b_{i-1})\,||\,\sigma)$ ▷ Chain hash
7: end if
8: $b_i\leftarrow\{m_i,t_i,h_c,h_b,i,\text{verified}\}$
9: Append $b_i$ to blockchain $\mathcal{B}$
10: return $b_i$
Algorithm 3 Blockchain integrity verification

Require: blockchain $\mathcal{B}=\{b_0,b_1,\ldots,b_n\}$, global salt $\sigma$
Ensure: integrity status (valid/tampered)
1: for $i=0$ to $n$ do
2:   $h_c^{*}\leftarrow H(m_i\,||\,t_i\,||\,\sigma)$
3:   if $h_c^{*}\neq h_c(b_i)$ then
4:     return “TAMPERED: Content hash mismatch at block $i$”
5:   end if
6:   if $i>0$ then
7:     $h_b^{*}\leftarrow H(h_c(b_i)\,||\,h_b(b_{i-1})\,||\,\sigma)$
8:     if $h_b^{*}\neq h_b(b_i)$ then
9:       return “TAMPERED: Chain hash mismatch at block $i$”
10:    end if
11:  end if
12: end for
13: return “VERIFIED: Blockchain integrity intact”
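A compact Python rendering of Algorithms 2 and 3 may clarify the hash-linking. This is a sketch under our own naming conventions, not the released implementation, and the salt value is a placeholder:

\begin{verbatim}
import hashlib, time

def H(*parts, salt):
    """SHA-256 over '||'-concatenated parts plus the global salt."""
    data = "||".join(str(x) for x in parts) + "||" + salt
    return hashlib.sha256(data.encode()).hexdigest()

def append_block(chain, message, salt):          # Algorithm 2
    t = time.time()
    h_c = H(message, t, salt=salt)               # content hash
    prev = chain[-1]["h_b"] if chain else "GENESIS"
    chain.append({"m": message, "t": t, "h_c": h_c,
                  "h_b": H(h_c, prev, salt=salt)})   # chain hash

def verify(chain, salt):                         # Algorithm 3
    for i, b in enumerate(chain):
        if H(b["m"], b["t"], salt=salt) != b["h_c"]:
            return f"TAMPERED: content hash mismatch at block {i}"
        prev = chain[i - 1]["h_b"] if i else "GENESIS"
        if H(b["h_c"], prev, salt=salt) != b["h_b"]:
            return f"TAMPERED: chain hash mismatch at block {i}"
    return "VERIFIED: blockchain integrity intact"

salt, chain = "shared-global-salt", []           # placeholder salt
append_block(chain, "initial answer", salt)
append_block(chain, "critique of peer", salt)
chain[0]["m"] = "forged answer"                  # tamper with history
print(verify(chain, salt))                       # mismatch at block 0
\end{verbatim}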

The cryptographic hash function $H(\cdot)$ can be instantiated with various secure hash algorithms, including SHA-256, SHA-3 (Keccak), BLAKE2, or BLAKE3, each offering different performance and security trade-offs. SHA-256 remains a robust choice for blockchain applications due to its widespread adoption and proven security properties, while newer algorithms like BLAKE3 offer superior performance for high-throughput scenarios.

The blockchain protocol ensures that any modification to historical interactions invalidates the cryptographic chain, producing detectable integrity violations. When agents load previous conversations, the system verifies the complete hash chain and displays prominent warnings if tampering is detected: “LOG TAMPERED. TRUST HAS BEEN BREACHED. BLOCKCHAIN FAILS.” This mechanism makes post-hoc fabrication of contributions computationally infeasible while preserving the ability to legitimately edit conversations by rebuilding the chain from the point of modification onward.

The implementation maintains separate blockchains for each agent while using a shared salt stored in the system configuration to ensure hash consistency across sessions. Genesis blocks are initialized with fixed timestamps to prevent hash divergence during system restarts. The protocol automatically migrates existing conversation logs to blockchain format, enabling backward compatibility while establishing integrity verification for all new interactions.

This cryptographic foundation serves two complementary functions in discursive networks. First, it provides technical infrastructure for reproducible research by creating verifiable logs of how scientific conclusions evolved through agent interactions. Second, it establishes an ethical framework for attribution by making it computationally expensive to falsify contributions after the fact. The blockchain thus transforms the question of authorship from a matter of trust to one of cryptographic verification, supporting the broader goal of maintaining accountability in collaborative human-AI knowledge production.