A Mathematical Theory of Discursive Networks
Abstract
Large language models (LLMs) turn writing into a live exchange between humans and software. We characterize this new medium as a discursive network that treats people and LLMs as equal nodes and tracks how their statements circulate. We define the generation of erroneous information as invalidation (any factual, logical, or structural breach) and show it follows four hazards: drift from truth, self-repair, fresh fabrication, and external detection. We develop a general mathematical model of discursive networks that shows that a network governed only by drift and self-repair stabilizes at a modest error rate. Giving each false claim even a small chance of peer review shifts the system to a truth-dominant state. We operationalize peer review with the open-source Flaws-of-Others (FOO) algorithm: a configurable loop in which any set of agents critique one another while a harmonizer merges their verdicts. We identify an ethical transgression, epithesis, that occurs when humans fail to engage in the discursive network. The takeaway is practical and cultural: reliability in this new medium comes not from perfecting single models but from connecting imperfect ones into networks that enforce mutual accountability.
Contents
- 1 Introduction
- 2 Methods
- 3 Theoretical Validation and Parameter Analysis
- 4 Discussion
- A Blockchain Implementation Details
1 Introduction
Large Language Models (LLMs) are advanced artificial intelligence systems trained on vast amounts of text data to generate human-like language. These models utilize deep learning techniques to understand and produce text based on the patterns and structures found in their training data \parencite{vaswani2017attention, brown2020language}. LLMs have demonstrated impressive capabilities in various natural language processing tasks, including text generation, translation, and question-answering.
Despite their remarkable performance, LLMs are prone to generating false or misleading statements, a phenomenon often referred to as “hallucinations” \parencite{ji2022survey}. However, the metaphor of hallucination is limited: it implies a private sensory distortion, whereas an unfounded LLM assertion can be circulated, cited, and acted upon as fact. Throughout this paper we therefore use the broader term invalidation, and we show that what is commonly referred to as hallucination is just one of the many manifestations of invalid information.
Invalidations in LLMs can manifest as factual inconsistencies, logical contradictions, or entirely fabricated content that appears plausible \parencite{maynez2020faithfulness}. This issue is exacerbated by the lack of a verification mechanism within the models themselves \parencite{bender2021dangers, weidinger2021ethical}. Although retrieval-augmented generation and self-consistency checks \parencite{lewis2020retrieval, wang2023self} reduce the problem, a substantial share of outputs remains unreliable enough to undermine trust in practical deployments.
Empirical evaluations show that LLMs continue to produce non-trivial rates of factual error and harmful content after instruction tuning and reinforcement learning. Studies in medical domains have documented significant factual inaccuracies in model answers \parencite{thirunavukarasu2023large, sallam2023chatgpt}, and work on adversarial prompting has demonstrated that safety-trained models still emit disallowed content \parencite{zou2023universal, chao2023jailbreaking}.
Several factors contribute to the occurrence of invalid information, including biases in training data, limitations in knowledge representation, and the models’ tendency to prioritize fluency over factual accuracy \parencite{lin2022truthfulqa}. The prevalence and impact of invalidations are significant, with quantitative evaluations revealing that they occur in up to 30% of generated responses, substantially affecting the trustworthiness of these models \parencite{ji2022survey, lin2022truthfulqa}. Moreover, studies have shown that LLMs can generate false information even when explicitly prompted to be truthful \parencite{evans2021truthful}, underscoring the challenge of aligning model outputs with factual correctness.
This article is organized as follows: Section 1.1 introduces the concept of invalidation as a broader alternative to hallucination; Section 1.2 situates invalidation within established cognitive and media theories, showing it as a universal feature of both human and artificial cognition; Section 1.3 examines how invalidations propagate through interconnected communication systems; Section 2.2 provides an information-theoretic basis for why verification is fundamentally easier than generation; Section 2.4 formalizes discursive networks as mathematical structures with actors, statements, and update rules; Section 2.5 develops three progressively complex models of invalidation dynamics: single-network with binary states, single-network with emergent invalidation, and cross-network detection; Section 3 demonstrates the mathematical consistency of these models through theoretical analysis and parameter exploration; finally, Section 4.1 addresses ethical concerns including epithesis, energy costs, and epistemic diversity, before outlining future research directions.
The scope of this manuscript encompasses both theoretical foundations and practical implementations. We establish mathematical floors on invalidation probability, develop network models for error propagation and detection, and present the Flaws-of-Others (FOO) algorithm with cryptographic integrity verification. The analysis focuses on invalidations that emerge during inference in large language models and mitigation strategies based on cross-agent critique. While we do not provide exhaustive failure catalogues or benchmark comparisons, we advance a unified mathematical framework demonstrating how networks of imperfect agents can achieve error rates below what any individual agent attains.
1.1 From “Hallucination” to Invalidation
The word hallucination has become the default label for false statements generated by LLMs. Borrowed from perceptual psychology, it misses two crucial aspects. First, an LLM’s error is not confined to a private experience; it can be adopted by readers and propagate through networks, amplifying misinformation \parencite{crawford2021excavating}. Second, focussing on hallucinations alone narrows the research agenda, leaving other error classes (logical contradictions, format violations, ethical breaches) under-examined.
We call any output that violates a constraint set (facts, logic, norms, and formats) an invalidation. Because an autoregressive decoder maximises next-token likelihood rather than global consistency, a non-zero slice of probability mass inevitably falls outside the constraints.
In practice, invalidation surfaces along at least five recurring archetypes that differ in locus and detectability. First, hallucination denotes the introduction of content that is ungrounded in any trusted source or context; large-scale surveys show it to be pervasive even in the highest-performing models \parencite{huang2025survey}. Second, contradiction captures internally inconsistent statements that coexist within a single generation, a failure mode quantified and mitigated by prompt-based self-refinement techniques \parencite{mundler2023selfcontradiction}. Third, deductive error arises when the model draws logically invalid conclusions from true premises, an error family systematically stress-tested with adversarial perturbations \parencite{hoppe2025deductive}. Fourth, pragmatic impropriety concerns outputs that violate social or professional norms, including toxicity, hate speech, or privacy leakage; the RealToxicityPrompts benchmark revealed that even innocuous inputs can trigger toxic degeneration \parencite{gehman2020realtoxicity}. Finally, format violation occurs when the model breaks explicit structural constraints (e.g., JSON Schema), jeopardising downstream machine consumption; work with JSONSchemaBench shows that such violations remain stubbornly frequent despite constrained decoding \parencite{geng2025jsonschema}.
Each error class manifests the same predicate: the content fails to match a required state of the world. That predicate admits representation through a single invalidation rate. The models in Section 2.4 use this rate alone; they do not depend on class labels, and the rate quantifies the chance that any given output lacks validity.
Taken together, these archetypes point to a broad invalidation family, potentially with more members, underscoring the need for evaluation suites and mitigation strategies that address the full spectrum of failure modes. Figure 1 schematizes this superset-subset relationship.
1.2 Invalidation as a Universal Feature of Human and Artificial Cognition
Invalidation in contemporary LLM outputs is not an isolated flaw unique to artificial intelligence but mirrors well-documented behaviors in human discourse. This similarity suggests that invalidation is not simply a by-product of autoregressive sampling, but a potentially universal cognitive process rooted in the fundamental nature of information processing through language. It emerges when any complex agent (biological or artificial) operates under uncertainty, bounded rationality, and social constraints.
Human cognition systematically prioritizes narrative coherence over factual accuracy, a tendency that emerges not from individual pathology but from the fundamental architecture of meaning-making itself. When confronted with contradictory evidence, both individuals and groups construct elaborate justifications that preserve existing belief structures rather than revising them \parencite{festinger1957cognitive, cohen2001states}. This preference for coherence manifests across scales: from the micro-level impression management that shapes everyday social interactions \parencite{goffman1959presentation} to the macro-level collective narratives that enable societies to ignore systemic atrocities \parencite{cohen2001states}. Remarkably, this same structural bias toward local coherence over global accuracy appears in large language models, where autoregressive architectures favor maintaining consistency with previous tokens even at the expense of factual correctness. The parallel suggests that invalidation arises not from a flaw in either human or artificial systems, but from a deeper computational trade-off inherent to any agent that must construct meaning from sequential, uncertain information.
The medium itself acts as an epistemic filter, determining not just what information reaches us but what we accept as real, a process that operates identically whether the medium is television, print journalism, or a large language model. Media theorists have long recognized that truth emerges less from content evaluation than from structural repetition: the same claim, encountered repeatedly through trusted channels, eventually sediments into accepted fact regardless of its veracity \parencite{gerbner1976living, mcluhan1964understanding}. This manufacturing of consensus operates through cascading filters (economic incentives, institutional biases, and technological affordances) that systematically amplify certain narratives while suppressing others \parencite{herman1988manufacturing}. Large language models instantiate this same filtering mechanism at an unprecedented scale: their training corpora encode the biases of millions of sources, their attention mechanisms privilege frequently repeated patterns, and their optimization objectives reward fluent reproduction over factual verification. The result is a computational echo chamber where invalidations, once embedded in training data, achieve the same truth-like status through sheer statistical dominance that media repetition grants to human beliefs.
Propaganda operates by exploiting a fundamental vulnerability in epistemic systems: sustained repetition of coordinated falsehoods eventually overwhelms the capacity for empirical verification, creating an alternate reality that becomes self-reinforcing through social proof. This mechanism, which political theorists identify as the cornerstone of totalitarian control, functions by flooding the information environment with internally consistent but externally false narratives until the sheer cognitive cost of maintaining skepticism exceeds most people’s capacity \parencite{arendt1951origins, ellul1965propaganda}. The parallel with large language models is striking: trained on billions of documents where certain false narratives appear thousands of times, these systems internalize misinformation not through ideological commitment but through pure statistical frequency. Just as propaganda succeeds by making lies more cognitively available than truth \parencite{ellul1965propaganda}, LLMs generate invalidations by sampling from probability distributions where well-represented falsehoods outweigh poorly-documented facts. The computational architecture thus recreates, without intention or awareness, the same reality-distortion mechanisms that human propagandists deploy deliberately.
Invalidation emerges from the fundamental computational shortcuts that make complex reasoning tractable; these shortcuts manifest identically in biological neural networks and artificial transformers. The core mechanism is substitution: when faced with difficult questions about truth or probability, both humans and LLMs unconsciously replace them with easier questions about familiarity and similarity \parencite{tversky1974judgment, kunda1990motivated}. This substitution operates through dual channels: the availability heuristic replaces “what is true?” with “what comes easily to mind?”, while the representativeness heuristic replaces “what is probable?” with “what resembles my prototype?” In large language models, these exact substitutions occur mechanistically—the softmax function literally converts truth-seeking into frequency-matching, while attention heads select tokens based on similarity rather than veracity. The result is a convergent failure mode where both human reasoning and machine generation systematically mistake statistical patterns for factual reality, producing confident invalidations that feel true precisely because they align with existing distributions rather than external facts \parencite{kunda1990motivated}.
The cross-disciplinary convergence of these findings implies that invalidation is not an accidental error mode but a structural consequence of how intelligent systems manage complexity, uncertainty, and contradiction. This insight reframes LLM invalidation as a computational echo of cognitive strategies humans employ. Rather than indicating a breakdown of alignment, it may reveal the presence of alignment to socially and contextually shaped heuristics that guide behavior in uncertain conditions.
Addressing invalidation in LLMs will require approaches that incorporate sociological, psychological, and media-theoretical models of belief formation and narrative control. At the same time, observing how LLMs generate invalidations may provide new empirical traction for understanding human cognitive phenomena such as confirmation bias, belief perseverance, and collective denial. The interdependence between artificial and human cognition in this respect suggests that solutions to the propagation of invalidation may emerge not solely from engineering but from a broader inquiry into the structure of meaning-making itself.
1.3 Risks of Invalidation Spill-over Through Discursive Networks
Invalidations produced by large language models rarely remain isolated. Once released into public channels they propagate through interconnected systems of communication. A discursive network is a large-scale ecosystem of human and machine agents whose utterances circulate, reinforce, and mutate through repeated exchange. When an LLM instantiates many synthetic voices that emit high volumes of text, those voices become additional nodes that mediate narrative transmission and amplification.
We use the term discursive network, rather than the established discourse network, to signal a McLuhanian twist. In classical discourse-network analysis the medium is a passive conduit: speeches, papers, and news items are shuffled among human actors while the underlying carrier remains inert. In the networks that include LLMs the carrier intervenes. The medium does not just move the message; it edits, rewrites, and recombines it at every hop. By switching from discourse to discursive we emphasise that language itself is now produced inside the network dynamics, co-authored by the very infrastructure that transmits it; this is a direct extension of McLuhan’s dictum from “the medium is the message” to “the medium acts on the message.”
Discursive networks consist of nodes (agents or messages) and edges (interactions, citations, reshares). Closely related structures appear in political-science discourse-network analysis, where policy actors and speech acts form time-evolving coalitions \parencite{leifeld2014, leifeld2012}. Kittler’s media archaeology likewise foregrounds how technological substrates shape what counts as meaningful speech \parencite{kittler1990}. Our focus on generative models extends these traditions by treating LLM outputs as first-class network nodes.
Humans struggle to separate machine-generated prose from human writing. Controlled studies across genres report identification accuracy only slightly above chance (about 55-65 %) and show that readers rely on fragile surface heuristics such as pronoun frequency and stylistic fluency \parencite{ippolito2020, DBLP:conf/acl/GehrmannSR19}. In a discursive network this limitation matters: synthetic invalidations, mistaken for human statements, are more readily reposted or cited.
Once injected, invalid content spreads via echo-amplification. Research on social-media communities finds that clusters organized in bow-tie topologies (tightly knit cores with radiating peripheries) amplify low-quality or misleading messages more than high-quality ones \parencite{garimella2018}. Agents in these clusters can absorb and relay invalidations, algorithmic or human, without verification, entrenching flawed narratives.
Detection tools provide only partial relief. State-of-the-art AI detectors fall below 80 % accuracy on paraphrased or adversarial samples and show biases toward false positives on human text \parencite{huang2024robust, tufts2024practical, sadasivan2025reliably}. Human moderators fare little better, typically around 65 % in blinded settings \parencite{ippolito2020}. Because both machines and people misclassify a substantial share of content, invalidations re-enter discourse with minimal friction.
Mitigating spill-over therefore demands a synthesis of discourse-network analysis, community-structure research, and detection studies. High-volume clusters of unverified claims need tracing, and interventions should target influential nodes, always within robust privacy and governance constraints. Recognising synthetic agents as network participants reframes classic media theory for the generative era: discursive-network analysis becomes both a descriptive lens on spill-over risk and a practical framework for targeted intervention.
2 Methods
This section lays out the analytical and computational machinery that supports the paper’s argument. We begin by establishing a mathematical floor on invalidation probability, proving that no finite-loss LLM can achieve zero error rate (Section 2.1), and then show why verification is information-theoretically easier than generation (Section 2.2). We then categorize user-model interactions into three functional classes (critique, ideation, and product-oriented generation) and show why critique operations are computationally cheapest, residing near the peak of the model’s output distribution (Section 2.3).
Building on these foundations, we formalize discourse as a network whose nodes exchange and validate statements under well-defined update rules (Section 2.4). We analyze three progressively richer belief dynamics (Section 2.5): first, a single-network model with binary truth states, proving convergence to a unique equilibrium determined by flip probabilities; second, an extended model incorporating spontaneous invalidation generation, showing how fabrication rates push systems into error-dominant regimes; and third, a dual-network model with cross-detection, deriving conditions under which external scrutiny reduces aggregate error below single-network baselines.
These theoretical results culminate in practical design principles: we derive the minimum number of cross-checking agents needed to achieve any target error tolerance (Section 2.5.5) and present the Flaws-of-Others (FOO) algorithm that implements these detection mechanisms in software, complete with cryptographic integrity verification to prevent post-hoc tampering (Section 2.6). Together, these methods provide both the mathematical framework for understanding invalidation propagation and the computational tools for mitigating it in practice.
2.1 Floor on invalidation probability
This subsection proves that a strictly positive invalidation probability is unavoidable for any LLM whose cross-entropy loss on its training distribution remains finite.
A discursive network has a particular standard, e.g. factual accuracy, logical coherence, a set of safety rules, etc. Every sequence that meets this standard is called “valid.” Whenever a sequence breaks the standard, we have an “invalidation.” In symbols, the indicator $V(x) = 1$ announces that sequence $x$ is valid, and the collection $\mathcal{I} = \{x : V(x) = 0\}$ captures every possible invalidation.
The training corpus itself, represented by a probability distribution $p$, might be imperfect; a fraction $\varepsilon := p(\mathcal{I})$ of its mass lies inside $\mathcal{I}$. This fraction measures how much invalidating content appears in the training data, e.g. factual errors in bibliographic sources, biased statements in news sources, outdated information in historical documents, logical contradictions in discussion forums, etc.
After training the language model on this corpus (including pre-training via next-token prediction, instruction tuning, and safety fine-tuning), we obtain a model whose learned distribution is $q_\theta$. The central question is how much probability this trained model still places on $\mathcal{I}$, i.e. what chance it retains of producing an invalidation despite all the training improvements applied.
Empirical studies provide sobering context: medical-domain evaluations report factual errors in the 15–30 % range \parencite{thirunavukarasu2023large, sallam2023chatgpt}, while adversarial assessments find 3–5 % harmful outputs even after safety training \parencite{zou2023universal, chao2023jailbreaking}. These persistent rates reflect structural limitations that no training regime can fully overcome.
Given $D := D_{\mathrm{KL}}(p \,\|\, q_\theta)$, the Kullback-Leibler divergence from the reference distribution $p$ to the model distribution $q_\theta$, we can state the following lemma.
Lemma 2.1 (Invalidation floor).
If $\varepsilon := p(\mathcal{I}) > 0$ and $D := D_{\mathrm{KL}}(p \,\|\, q_\theta) < \infty$, then
$$q_\theta(\mathcal{I}) \;\ge\; \varepsilon\, e^{-D/\varepsilon}.$$
The condition $p \ll q_\theta$, implicit in the finiteness of $D$, is known as absolute continuity: whenever the trained model assigns zero probability to a sequence, the training data also assigns zero probability to that sequence. More formally, $p(x) = 0$ whenever $q_\theta(x) = 0$, and hence $q_\theta(x) > 0$ whenever $p(x) > 0$.
This inequality says that the residual invalidation probability $q_\theta(\mathcal{I})$ cannot be driven to zero unless either the corpus is completely free of invalidations ($\varepsilon = 0$) or the model’s divergence from the corpus becomes infinite, which would imply infinite loss. The floor can be exponentially small when the model departs sharply from the training data through extensive fine-tuning, yet it never vanishes entirely.
Proof.
Expand the KL divergence and partition the sum by membership in $\mathcal{I}$:
$$D \;=\; \sum_{x \in \mathcal{I}} p(x)\log\frac{p(x)}{q_\theta(x)} \;+\; \sum_{x \notin \mathcal{I}} p(x)\log\frac{p(x)}{q_\theta(x)}.$$
Since KL divergence terms are non-negative, we have
$$D \;\ge\; \sum_{x \in \mathcal{I}} p(x)\log\frac{p(x)}{q_\theta(x)}.$$
Apply the log-sum inequality (for positive numbers $a_i, b_i$ with $a = \sum_i a_i$ and $b = \sum_i b_i$, we have $\sum_i a_i \log\frac{a_i}{b_i} \ge a \log\frac{a}{b}$) to the right-hand side. Setting $a_i = p(x)$ and $b_i = q_\theta(x)$ for $x \in \mathcal{I}$, we get:
$$\sum_{x \in \mathcal{I}} p(x)\log\frac{p(x)}{q_\theta(x)} \;\ge\; \varepsilon \log\frac{\varepsilon}{q_\theta(\mathcal{I})}.$$
Therefore $D \ge \varepsilon \log\frac{\varepsilon}{q_\theta(\mathcal{I})}$. Solving for $q_\theta(\mathcal{I})$ yields:
$$q_\theta(\mathcal{I}) \;\ge\; \varepsilon\, e^{-D/\varepsilon}.$$
∎
The inequality is a logical guard-rail: when $D$ is large the numerical floor may be tiny, yet it proves that any finite-loss model retains strictly positive invalidation probability. External verification layers are therefore indispensable.
The practical consequence is inescapable: any errors, contradictions, or policy violations that survive in the training data leave an indelible statistical trace in the model. However sophisticated the training regimen (whether through reinforcement learning from human feedback, constitutional AI, or advanced safety fine-tuning), $q_\theta(\mathcal{I})$ remains strictly positive, so some risk of invalid output persists. This mathematical inevitability motivates our investigation of asymmetric task difficulties (Section 2.3), where we show that detecting invalidations is computationally easier than avoiding them during generation.
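To make the floor concrete, the following minimal sketch (Python; the contamination rates and divergence budgets are hypothetical illustrations, not measurements of any model) evaluates the bound $\varepsilon\, e^{-D/\varepsilon}$ for a few parameter combinations.

```python
import math

def invalidation_floor(eps: float, kl: float) -> float:
    """Lower bound on q_theta(I) from Lemma 2.1: eps * exp(-KL / eps)."""
    return eps * math.exp(-kl / eps)

# Hypothetical corpus contamination rates (eps) and KL budgets (in nats).
for eps in (0.30, 0.15, 0.05):
    for kl in (0.1, 1.0, 5.0):
        print(f"eps={eps:.2f}  KL={kl:.1f}  floor={invalidation_floor(eps, kl):.2e}")
```

Even the smallest of these illustrative floors is strictly positive, which is the only property the argument relies on.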
From Inevitability to Mitigation.
Lemma 2.1 establishes that $q_\theta(\mathcal{I}) \ge \varepsilon\, e^{-D/\varepsilon} > 0$ whenever the training loss is finite, i.e. invalidation is mathematically inevitable. This floor exists because LLMs maximize likelihood over their training distribution $p$, which itself contains errors at rate $\varepsilon$. The bound shows that even aggressive fine-tuning (large KL divergence) cannot eliminate invalidations entirely.
This inevitability motivates a strategic pivot: rather than pursuing the impossible goal of zero invalidation through model improvement alone, we must design systems that detect and correct errors post-generation. The key insight is that different types of LLM outputs have different amenability to verification. As we show next, verification tasks (e.g. asking models to identify flaws in existing text) lie in high-probability regions of the output distribution and thus can be generated reliably even by imperfect models. This asymmetry between generation difficulty and verification difficulty forms the theoretical foundation for our multi-agent approach: we harness the relative ease of critique to construct networks where mutual verification pushes system-wide error rates below what any individual model achieves.
2.2 Information-Theoretic Basis for Verification Advantage
The inevitability of invalidations established in Lemma 2.1 raises a crucial question: if generation necessarily produces errors, can we at least detect them reliably? We now prove that verification is fundamentally easier than generation, providing the theoretical foundation for our multi-agent approach.
Theorem 2.1 (Variable-Length Entropy Comparison).
Let $G$ and $V$ be generation and verification tasks. Let:

- $(Y_G, L_G)$ and $(Y_V, L_V)$: joint random variables for length and content;
- $P(Y_G \mid L_G)$ and $P(Y_V \mid L_V)$: the conditional content distributions;
- $P(L_G)$ and $P(L_V)$: the length distributions.

Then (with all logarithms base 2, giving entropy in bits):
$$H(Y_G, L_G) - H(Y_V, L_V) \;=\; \bigl[H(L_G) - H(L_V)\bigr] \;+\; \bigl[H(Y_G \mid L_G) - H(Y_V \mid L_V)\bigr],$$
where the conditional entropy is:
$$H(Y_G \mid L_G) \;=\; \sum_{\ell} P(L_G = \ell)\, H(Y_G \mid L_G = \ell),$$
and similarly for $H(Y_V \mid L_V)$.
Proof.
By the chain rule for joint entropy:
$$H(Y_G, L_G) \;=\; H(L_G) + H(Y_G \mid L_G).$$
Similarly for $V$:
$$H(Y_V, L_V) \;=\; H(L_V) + H(Y_V \mid L_V).$$
Subtracting the second equation from the first yields the stated result. ∎
Remark 2.1 (Why General Theorems Fail).
We cannot prove that verification is universally easier than generation because:
1. Length-content dependence: in autoregressive models, length often encodes information (e.g., “yes” vs. detailed explanations).
2. Unobservable distributions: we cannot measure the full tail of the generation or verification output distributions empirically, making entropy estimates unreliable.
3. Task-specific constraints: the set of valid sequences varies dramatically by domain and cannot be bounded universally.
Any rigorous claim must be task-specific and empirically grounded.
For specific task pairs where we can measure output distributions empirically, one can define an observable concentration ratio: the probability mass captured by the top-ranked outputs relative to the remaining tail mass, where $p_{(1)} \ge p_{(2)} \ge \cdots$ are the ranked probabilities of the distribution and a small positive constant in the denominator prevents division by zero.
Empirical evidence suggests verification outputs are more concentrated than generation outputs:
- Generation diversity: \textcite{hashimoto2019unifying} measured that GPT-2’s generation coverage (fraction of human-written continuations assigned high probability) is only 15-20%, indicating that probability mass is spread across many valid but unseen completions. \textcite{holtzman2020curious} further showed that nucleus sampling with a high cumulative-probability threshold is needed to achieve human-like text diversity, confirming that generation probability is dispersed across a large tail.
- Verification concentration: \textcite{schick2021exploiting} found that when prompted for binary classification, over 90% of GPT-3’s probability mass concentrates on the top 2-3 tokens (e.g., “Yes”/“No” plus punctuation variants). \textcite{min2022rethinking} demonstrated that in-context learning for verification tasks achieves near-peak performance with just label tokens, suggesting the output distribution is highly peaked on a small vocabulary subset.
- Direct comparison: \textcite{kadavath2022language} showed that language models’ self-evaluation of their own outputs clusters around confidence values of 0.1, 0.5, and 0.9 (high concentration), while their actual answer distribution spans hundreds of phrasings (low concentration). This asymmetry between evaluation and generation distributions supports our theoretical framework.
This observable difference in concentration, while not universal, appears consistently enough to motivate verification-based error reduction strategies.
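As a toy numerical illustration of this asymmetry (the two distributions below are invented for the example and do not come from any measured model), the following sketch compares the Shannon entropy of a peaked, verification-like output distribution with that of a dispersed, generation-like one.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented output distributions: verification mass piles onto a few label tokens,
# while generation mass spreads over a long tail of plausible continuations.
verification = [0.93, 0.05, 0.02]
generation = [0.02] * 10 + [0.008] * 100   # sums to 1.0 by construction

print(f"verification entropy: {entropy_bits(verification):.2f} bits")
print(f"generation entropy:   {entropy_bits(generation):.2f} bits")
```

The peaked distribution yields well under one bit of entropy, while the dispersed one yields several bits, mirroring the concentration gap reported in the studies above.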
Implications for Discursive Networks.
The empirical concentration differences documented above provide practical justification for why cross-agent critique can achieve detection rates that exceed invalidation rates . When LLMs are tasked with verification—e.g., “find flaws” in peer outputs—they operate in the high-concentration regime where dominant patterns from training data guide responses. This contrasts with generation tasks that require exploring the long tail of the output distribution. While we cannot prove a universal entropy gap without task-specific assumptions (as shown in Theorem 2.1), the consistent empirical pattern of verification concentration exceeding generation concentration suggests that detection rates can systematically exceed fabrication rates in practice. The FOO algorithm exploits this empirical regularity to achieve system-wide error reduction, even when individual agents remain fallible in generation.
2.3 Categorization of Mechanisms to Engage LLM Agents
We now examine how different engagement mechanisms map onto the information-theoretic landscape established above. During inference, a Transformer language model executes a single forward pass per token. The model transforms context into embeddings, applies fixed self-attention layers to produce a hidden state $h_t$, and maps this to a token probability distribution via:
$$P(y_t \mid y_{<t}) \;=\; \operatorname{softmax}(W h_t).$$
No gradient updates or objective-function evaluations occur at this stage; the only on-the-fly “optimization” is the decoding heuristic (greedy, top-$k$, nucleus, or beam search) that selects the next token from the static distribution $P(y_t \mid y_{<t})$.
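The decoding step just described can be sketched as follows; the five-token vocabulary and the logits are made up for illustration, and real systems operate on full-vocabulary score tensors rather than small lists.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

vocab = ["the", "a", "cat", "sat", "<eos>"]    # toy vocabulary
logits = [2.1, 1.3, 0.2, -0.5, -1.0]           # hypothetical scores W h_t for one step
probs = softmax(logits)

greedy = vocab[probs.index(max(probs))]                          # greedy decoding
top_k = sorted(zip(vocab, probs), key=lambda t: -t[1])[:2]       # top-k truncation, k = 2
total = sum(p for _, p in top_k)
sampled = random.choices([w for w, _ in top_k],
                         [p / total for _, p in top_k])[0]       # sample within the truncated set

print("distribution:", dict(zip(vocab, (round(p, 3) for p in probs))))
print("greedy token:", greedy, "| top-k sample:", sampled)
```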
LLMs support a range of functional output types that can be systematically grouped into three core categories we introduce in this manuscript: ideation (constructive synthesis), critique (diagnostic evaluation), and product-oriented generation (goal-directed deliverables). These three umbrella functions organize diverse tasks, each with distinct inference properties, compositional demands, and distributional positions in the model’s output space. Existing literature identifies several subtypes that map naturally onto these categories \parencite{bommasani2021opportunities, mialon2023augmented}.
Critique output is structurally efficient to generate. Nested within critique are:
- Flaws of others. This approach favors the strategy of asking for flaws or errors in the outputs of other agents, resulting in constructions that are frequent in training corpora and thus placed near the peak of $P(y_t \mid y_{<t})$. Even shallow decoding heuristics retrieve frequent patterns with high fluency and relevance \parencite{wei2022chain}.
- Classification and Disambiguation, such as assigning sentiment, stance, or intent. These tasks resolve ambiguity and often underlie evaluation pipelines \parencite{mialon2023augmented}.
- Restatement and Summarization, which surface structural coherence or hidden biases by rephrasing or compressing content. When used diagnostically, they reveal implicit assumptions or inconsistencies \parencite{maynez2020faithfulness}.
Ideation output demands compositional novelty. Prompts that ask the model to hypothesize mechanisms, imagine alternatives, or propose designs typically land in the tail of the output distribution. Generating them requires broader exploration (via large beam width or elevated temperature) and exhibits greater output variance.
Within ideation, we find:
- Instruction and Procedural Guidance, where the model scaffolds user understanding or explains concepts in sequence. These tasks require didactic clarity and often invoke implicit audience modeling \parencite{ouyang2022training}.
- Meta-Reasoning and Strategy Output, which includes multi-step planning, evaluating hypotheses, or chain-of-thought reasoning. These outputs require recursive coherence and longer dependency tracking \parencite{wei2022chain}.
Product-oriented output targets the generation of external artifacts: source code, formatted markup, structured data, or interactive dialogue. These tasks often carry hard constraints and precision demands. Simple forms (e.g., boilerplate code) reside in high-probability zones, while structurally complex or compositional outputs require deeper exploration.
Included in this class are:
- Formalism Translation, such as converting text to JSON, SQL, or LaTeX. This requires syntax-aligned generation and tight coupling between prompt and output form \parencite{reynolds2021prompt}.
- Retrieval-Simulation, where the model reproduces facts or references learned during pretraining. These outputs appear fluent but are not grounded in current truth, making them useful but epistemically fragile \parencite{bommasani2021opportunities}.
- Social Interaction Simulation, which includes emulating customer support, roleplay, or therapeutic dialogue. These are product-like in that the output is consumed as experience or interface, and they require tone, persona, and context alignment \parencite{jo2025proxyllm, park2023generative, song2024typing}.
Crucially, requests to “find flaws” tend to align with high-probability lexical patterns that the model has seen many times during training (e.g., “One limitation is…,” “A potential confound is …,” “This argument assumes …”). These stigmergic patterns, i.e. emerging from indirect communication mediated by modifications of the environment \parencite{MARSH2008136}, lie near the mode of $P(y_t \mid y_{<t})$, so they are reachable with minimal search depth and are often found by even the cheapest heuristic, such as greedy decoding.
By contrast, requests for constructive, future-oriented solutions typically require compositional novelty: the model must synthesize domain facts, propose unseen mechanisms, and articulate actionable steps. Such completions reside in lower-probability regions of the distribution, forcing the decoder to explore a broader beam or to sample deeper into the tail, both of which are algorithmically and computationally more demanding. In short, critique lives near the peak; creativity lives in the tail, explaining the empirical asymmetry in generation efficiency that we observe.
2.4 Discursive Network Formalization
To systematically study the phenomenon of invalidation, we propose a formal model that quantifies how invalidations propagate within a discursive network. This model considers actors as nodes in a network, with edges representing the exchange of statements. The goal is to understand how actors influence each other, how invalidations spread, and whether and how the system reaches an equilibrium state. The subsequent sections detail the analytical machinery, yielding quantitative information, for studying invalidation in both human and artificial contexts.
Definition 2.1.
Discourse. In the context of a discursive network, discourse refers to the structured process of communication and interaction between actors, $A$, through which they exchange, validate, invalidate, and attempt to persuade each other regarding the truth or falsity of a set of statements $S$. Discourse encompasses all forms of communication between actors, where beliefs are shared, challenged, or reinforced, as well as the mechanisms of invalidation $I$ and persuasion $P$, which influence the evolution of each actor’s belief set. The outcome of discourse is governed by the update rules $U$, which dictate how actors revise their beliefs based on the interactions they engage in.
Definition 2.2.
Discursive Network. A discursive network is a formal structure $\mathcal{D} = (A, S, P, I, C, B, U, G)$ where:

- $A$ is the set of actors participating in the discourse.
- $S$ is the set of possible statements, where each statement can be either true or false.
- $P$ represents the persuasion functions, where $P_{ij}(s)$ gives the likelihood that actor $a_j$ will adopt statement $s$ after receiving communication from actor $a_i$.
- $I$ denotes invalidations, where $I_{ij}(s, s')$ signifies actor $a_i$ invalidating a statement $s$ held by actor $a_j$ using a contradictory statement $s'$.
- $C$ represents the communications between actors, where $C_{ij} \subseteq S$ is the set of statements communicated from actor $a_i$ to actor $a_j$.
- $B$ represents the belief sets of the actors, where $B_i \subseteq S$ denotes the set of statements believed to be true by actor $a_i$.
- $U$ is the set of update rules that define how each actor’s belief set is modified in response to communications and invalidations.
- $G$ represents the goal functions of the actors, with $G_i \subseteq S$ specifying the set of statements actor $a_i$ seeks to convince other actors to believe.
The discursive network models the dynamics of belief formation, communication, persuasion, and invalidation among actors within a formal discourse setting.
Example. Consider a simple scenario with three actors $A = \{a_1, a_2, a_3\}$ and two statements $S = \{s, \neg s\}$. Actor $a_1$ believes $s$ ($B_1 = \{s\}$) and wants $a_2$ and $a_3$ to also believe $s$ ($G_1 = \{s\}$). Actor $a_2$ believes $\neg s$ ($B_2 = \{\neg s\}$) and wants $a_1$ and $a_3$ to believe $\neg s$ ($G_2 = \{\neg s\}$). Actor $a_3$ is initially neutral ($B_3 = \emptyset$). Actor $a_1$ communicates $s$ to $a_2$, who invalidates it by presenting $\neg s$. Actor $a_3$, observing this interaction, updates their belief set based on the persuasion functions and update rules. This framework models the propagation of invalidation within a discursive network, capturing the dynamics of belief, communication, and influence. By formalizing these interactions, we can analyze and predict how invalidation affects the acceptance and rejection of statements among actors in the network.
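A minimal computational sketch of this example follows. The class and function names are illustrative, and the update rule shown (adopt a statement on a successful persuasion draw, swap beliefs on invalidation) is a deliberately simplified stand-in for the general rule set $U$.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    beliefs: set = field(default_factory=set)   # B_i
    goals: set = field(default_factory=set)     # G_i

def persuade(sender: Actor, receiver: Actor, statement: str, p_adopt: float) -> None:
    """Communication C_ij followed by the persuasion function P_ij(s)."""
    if random.random() < p_adopt:
        receiver.beliefs.add(statement)

def invalidate(sender: Actor, receiver: Actor, statement: str, counter: str) -> None:
    """Invalidation I_ij(s, s'): replace s with the contradictory s' in the receiver's beliefs."""
    receiver.beliefs.discard(statement)
    receiver.beliefs.add(counter)

# The three-actor example: a1 holds s, a2 holds not-s, a3 is neutral.
a1 = Actor("a1", beliefs={"s"}, goals={"s"})
a2 = Actor("a2", beliefs={"not_s"}, goals={"not_s"})
a3 = Actor("a3")

persuade(a1, a2, "s", p_adopt=0.2)      # a1 communicates s to a2
invalidate(a2, a1, "s", "not_s")        # a2 answers with the contradiction
persuade(a2, a3, "not_s", p_adopt=0.6)  # a3 observes the exchange and may update
print(a1.beliefs, a2.beliefs, a3.beliefs)
```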
2.5 Modeling Discursive Networks
2.5.1 Single-Network Two-State Model
Let $N$ be the actor count in the collapsed network with two mutually exclusive statements $S = \{s_0, s_1\}$, where $s_0$ denotes a true statement and $s_1$ a false one. For comparison with empirical simulations we work exclusively with proportions. The population state at time $t$ is the column vector
$$\pi_t \;=\; \begin{pmatrix} \pi_0(t) \\ \pi_1(t) \end{pmatrix} \;=\; \begin{pmatrix} n_0(t)/N \\ n_1(t)/N \end{pmatrix},$$
where $n_0(t)$ and $n_1(t)$ are the respective counts of actors endorsing $s_0$ and $s_1$. Micro-level flips are characterized by the probabilities $p_{01}$ for $s_0 \to s_1$ and $p_{10}$ for $s_1 \to s_0$. These induce the population-level transition matrix
$$M \;=\; \begin{pmatrix} 1 - p_{01} & p_{10} \\ p_{01} & 1 - p_{10} \end{pmatrix}. \tag{6}$$
The mapping of this model to Definition 2.2 is provided in Table 1.
Element in Definition 2.2 | Instantiation in single-network model
---|---
$A$ | Unchanged actor set; proportions refer to $N = |A|$.
$S = \{s_0, s_1\}$ | Binary, mutually exclusive statements (true vs. false).
$P$ | $p_{01}$ if the receiver holds $s_0$ and the sender holds $s_1$; $p_{10}$ if the receiver holds $s_1$ and the sender holds $s_0$.
$I$ | Contradiction if the communicated statement differs from the receiver’s current belief.
$C$ | Message containing the sender’s current statement.
$B_i$ | $\{s_0\}$ or $\{s_1\}$ for each actor.
$U$ | Switches the belief with the corresponding probability, otherwise leaves it unchanged.
$G_i$ | Persuade others to adopt actor $a_i$’s current belief.
Lemma 2.2 (Single-network invalidation propagation).
Let the single-network proportion state evolve according to
$$\pi_{t+1} \;=\; M\,\pi_t, \tag{7}$$
with flip probabilities $p_{01}, p_{10} \in (0,1)$. The system has a unique fixed point
$$\pi^* \;=\; \left( \frac{p_{10}}{p_{01}+p_{10}},\; \frac{p_{01}}{p_{01}+p_{10}} \right)^{\!T},$$
and the second eigenvalue of $M$ equals $1 - p_{01} - p_{10}$, whose modulus is strictly smaller than $1$; hence the Markov chain converges geometrically to $\pi^*$ from any initial distribution.
Proof.
A fixed point satisfies $M\pi^* = \pi^*$. Writing $\pi^* = (\pi_0^*, \pi_1^*)^T$ and expanding gives
$$(1 - p_{01})\pi_0^* + p_{10}\pi_1^* = \pi_0^*, \qquad p_{01}\pi_0^* + (1 - p_{10})\pi_1^* = \pi_1^*.$$
Because $\pi_0^* + \pi_1^* = 1$, the first line reduces to
$$p_{01}\pi_0^* = p_{10}\pi_1^* \quad\Longrightarrow\quad \pi_0^* = \frac{p_{10}}{p_{01}+p_{10}}, \qquad \pi_1^* = \frac{p_{01}}{p_{01}+p_{10}}.$$
Thus the fixed point is unique. The characteristic polynomial of $M$ is $(1 - p_{01} - \mu)(1 - p_{10} - \mu) - p_{01}p_{10}$, whose roots are $\mu_1 = 1$ and $\mu_2 = 1 - p_{01} - p_{10}$. Because $p_{01}, p_{10} \in (0,1)$, we have $|\mu_2| < 1$; hence $\pi_t \to \pi^*$ as $t \to \infty$, so every trajectory converges to $\pi^*$. ∎
Interpretation of Lemma 2.2.
In this binary model the two flip probabilities satisfy $p_{01} + p_{10} = 1$, meaning every update attempt switches an actor’s belief with probability one. The fixed point then simplifies to $\pi^* = (p_{10}, p_{01})^T$: the long-run proportion of actors endorsing $s_0$ equals the single parameter $p_{10}$, while the proportion endorsing $s_1$ equals $p_{01}$. Thus the equilibrium distribution mirrors the flip probabilities directly; increasing $p_{10}$ (the propensity to abandon $s_1$) linearly increases the eventual share of $s_0$ believers and decreases that of $s_1$ believers by the same amount.
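A short numerical check of Lemma 2.2 follows; the flip probabilities are illustrative placeholders rather than calibrated values. Iterating $\pi_{t+1} = M\pi_t$ reproduces the closed-form fixed point.

```python
def step(pi, p01, p10):
    """One application of the transition matrix M of Eq. (6) to the proportion state (pi0, pi1)."""
    pi0, pi1 = pi
    return ((1 - p01) * pi0 + p10 * pi1,
            p01 * pi0 + (1 - p10) * pi1)

p01, p10 = 0.30, 0.70      # hypothetical flip probabilities
pi = (1.0, 0.0)            # start with every actor endorsing the true statement
for _ in range(200):
    pi = step(pi, p01, p10)

fixed_point = (p10 / (p01 + p10), p01 / (p01 + p10))
print("simulated:", tuple(round(x, 4) for x in pi))
print("predicted:", tuple(round(x, 4) for x in fixed_point))
```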
2.5.2 Single-Network Emergent Invalidation Model
The two-state single-network model sets the stage for the analysis of the emergence of invalidations in a discursive network. To accomplish this, we first endow a single discursive network with per-statement fabrication and internal correction. This captures the behaviour of a single-instance LLM generating new text: invalidations are injected at hazard $\lambda$, while subsequent self-reflections (or post-processing heuristics) invalidate a fraction of the false statements at hazard $p_{10}$.
Setup.
Let the network be $\mathcal{D}$ with actor set $A$ and $N = |A|$. At any time $t$, we track the counts $n_0(t)$ and $n_1(t)$ of actors endorsing true and false statements, respectively, with $n_0(t) + n_1(t) = N$.
Working with raw counts becomes cumbersome when comparing networks of different sizes or analyzing asymptotic behavior. By converting to proportions, we obtain: (i) Scale invariance: Networks with 100 or 10,000 actors can be compared directly, (ii) Probabilistic interpretation: Proportions represent the probability that a randomly selected actor holds a given belief, and (iii) Mathematical tractability: Fixed-point analysis and stability results are cleaner in normalized coordinates.
Definition 2.3 (Normalized state).
The proportion state (or normalized state) of the single network at time $t$ is the vector
$$\pi_t \;=\; \bigl(\pi_0(t),\, \pi_1(t)\bigr)^T \;=\; \bigl(n_0(t)/N,\; n_1(t)/N\bigr)^T,$$
where $\pi_0(t)$ and $\pi_1(t)$ represent the fractions of actors endorsing true and false statements, respectively. The constraint $\pi_0(t) + \pi_1(t) = 1$ is automatically preserved, reflecting that every actor holds exactly one belief at each time step.
Stochastic primitives.
Events are scaled per statement so that $\lambda$, $p_{01}$, and $p_{10}$ remain commensurate.
Fabrication (invalidation).
In this case, $F_t$ represents the number of new falsehoods generated. The Poisson distribution is used to model the number of events occurring in a fixed interval of time, given a known average rate. Each true statement is independently falsified during $(t, t+1]$:
$$F_t \;\sim\; \mathrm{Poisson}\!\bigl(\lambda\, n_0(t)\bigr).$$
Internal flips.
$D_t$ represents the number of true statements that become false with a fixed probability $p_{01}$ of becoming false (i.e. $p_{01}$ is the intrinsic truth→false hazard). $R_t$ represents the number of false statements that are corrected to become true with a fixed probability $p_{10}$ of being corrected ($p_{10}$ models spontaneous acknowledgement or repair). Both follow a Binomial distribution. Truths can degrade and falsehoods can self-correct:
$$D_t \;\sim\; \mathrm{Binomial}\!\bigl(n_0(t),\, p_{01}\bigr), \qquad R_t \;\sim\; \mathrm{Binomial}\!\bigl(n_1(t),\, p_{10}\bigr).$$
Update equations.
Define the scaled increments $f_t = F_t/N$, $d_t = D_t/N$, and $r_t = R_t/N$. Then
$$\pi_0(t+1) \;=\; \pi_0(t) - f_t - d_t + r_t, \tag{8}$$
$$\pi_1(t+1) \;=\; \pi_1(t) + f_t + d_t - r_t, \tag{9}$$
with $\pi_0(t) + \pi_1(t) = 1$ preserved.
Element in Definition 2.2 | Instantiation in emergent-invalidation model
---|---
$A$ | Fixed actor set; proportions refer to $N = |A|$.
$S = \{s_0, s_1\}$ | Binary statements ($s_0$ = true, $s_1$ = false).
$P$ | Spontaneous flips: $p_{01}$ for $s_0 \to s_1$ (appears in $D_t$), $p_{10}$ for $s_1 \to s_0$ (appears in $R_t$).
$I$ | Internal invalidation; realized as self-correction when $s_1 \to s_0$. (No cross-actor invalidation in the single-network setting.)
$C$ | Message from $a_i$ to $a_j$ containing $a_i$’s current statement.
$B_i$ | Current belief of actor $a_i$: $\{s_0\}$ or $\{s_1\}$.
$U$ | Update rule applying the three hazards: $F_t$ (fabrications), $D_t$ (truth → false), $R_t$ (false → true).
$G_i$ | Goal: persuade all other actors to adopt $B_i$.
Lemma 2.3 (Single-network invalidation with fabrication).
Let the single-network proportion state evolve according to
$$\pi_{t+1} \;=\; M_\lambda\,\pi_t,$$
where $p_{01}, p_{10} \in (0,1)$ are the intrinsic flip probabilities, $\lambda$ is the per-statement fabrication probability ($0 \le \lambda$ and $p_{01} + \lambda < 1$), and
$$M_\lambda \;=\; \begin{pmatrix} 1 - p_{01} - \lambda & p_{10} \\ p_{01} + \lambda & 1 - p_{10} \end{pmatrix}.$$
Then:
1. The system has a unique fixed point
$$\pi^* \;=\; \left( \frac{p_{10}}{p_{01} + \lambda + p_{10}},\; \frac{p_{01} + \lambda}{p_{01} + \lambda + p_{10}} \right)^{\!T}.$$
2. The second eigenvalue of $M_\lambda$ is $1 - p_{01} - \lambda - p_{10}$, whose modulus is strictly smaller than $1$; hence the Markov chain converges geometrically to $\pi^*$ from any initial distribution.
Proof.
A fixed point satisfies $M_\lambda\pi^* = \pi^*$. Writing $\pi^* = (\pi_0^*, \pi_1^*)^T$ and expanding gives
$$(1 - p_{01} - \lambda)\pi_0^* + p_{10}\pi_1^* = \pi_0^*, \qquad (p_{01} + \lambda)\pi_0^* + (1 - p_{10})\pi_1^* = \pi_1^*.$$
Because $\pi_0^* + \pi_1^* = 1$, the first line reduces to $(p_{01} + \lambda)\pi_0^* = p_{10}\pi_1^*$, which yields the fixed point in the statement. The characteristic polynomial of $M_\lambda$ is $(1 - p_{01} - \lambda - \mu)(1 - p_{10} - \mu) - (p_{01} + \lambda)p_{10}$, with roots $\mu_1 = 1$ and $\mu_2 = 1 - p_{01} - \lambda - p_{10}$. Since $0 < p_{01} + \lambda + p_{10} < 2$, we have $|\mu_2| < 1$; therefore $\pi_t \to \pi^*$ as $t \to \infty$, so every trajectory converges to $\pi^*$. ∎
Interpretation.
The fabrication term $\lambda$ simply augments the ordinary truth-to-false hazard $p_{01}$. Consequently the equilibrium share of false believers rises from $p_{01}/(p_{01}+p_{10})$ (when $\lambda = 0$) to $(p_{01}+\lambda)/(p_{01}+\lambda+p_{10})$, while the speed of convergence slows as the spectral gap narrows. Fabrication raises the inflow into the false state by $\lambda$, while internal invalidation $p_{10}$ is unchanged. If $p_{01} + \lambda > p_{10}$ the system becomes “invalidation-dominant,” mimicking an LLM that generates more new errors than it self-repairs, an empirical regime reported in LLMs (cf. Section 3).
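The following sketch runs a small Monte-Carlo version of the dynamics in Eqs. (8)-(9) and compares the resulting false share with the closed-form equilibrium of Lemma 2.3. The hazard values are placeholders chosen only to land in the invalidation-dominant regime, and the per-actor Bernoulli draws are a simplified stand-in for the Poisson/Binomial primitives above.

```python
import random

def simulate(N=1000, T=500, p01=0.02, p10=0.2, lam=0.3, seed=0):
    """Monte-Carlo run of fabrication + internal flips; returns the final false share."""
    rng = random.Random(seed)
    n0 = N                      # all actors start endorsing the true statement
    for _ in range(T):
        n1 = N - n0
        F = sum(rng.random() < lam for _ in range(n0))   # fabrications (Bernoulli stand-in for Poisson)
        D = sum(rng.random() < p01 for _ in range(n0))   # true -> false slips
        R = sum(rng.random() < p10 for _ in range(n1))   # false -> true repairs
        n0 = max(0, min(N, n0 - F - D + R))
    return (N - n0) / N

p01, p10, lam = 0.02, 0.2, 0.3
predicted = (p01 + lam) / (p01 + lam + p10)
print("simulated false share:", round(simulate(p01=p01, p10=p10, lam=lam), 3))
print("predicted equilibrium:", round(predicted, 3))
```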
2.5.3 Cross-Network Invalidation-Detection Model
Now that we have studied single networks with and without spontaneous invalidation emergence, the next natural question is how to reduce invalidations. As we will see, using multiple discursive networks reduces invalidations.
Let the two discursive networks be $\mathcal{D}_1$ and $\mathcal{D}_2$ with actor sets $A_1$ and $A_2$; write $N_k = |A_k|$ for $k \in \{1, 2\}$.
Definition 2.4 (Normalized state).
For each network $k \in \{1, 2\}$ the proportion state at time $t$ is
$$\pi^{(k)}_t \;=\; \bigl(\pi^{(k)}_0(t),\, \pi^{(k)}_1(t)\bigr)^T \;=\; \bigl(n^{(k)}_0(t)/N_k,\; n^{(k)}_1(t)/N_k\bigr)^T,$$
where $n^{(k)}_0(t)$ and $n^{(k)}_1(t)$ are the counts of true and false statements, respectively.
Stochastic primitives.
All events are now specified so that their rates are comparable across networks regardless of size. In particular, fabrication is scaled by the current stock of true statements.
Falsehood generation (fabrication).
Each currently true statement in $\mathcal{D}_k$ is independently falsified during $(t, t+1]$. The total number of such events is
$$F^{(k)}_t \;\sim\; \mathrm{Poisson}\!\bigl(\lambda\, n^{(k)}_0(t)\bigr),$$
where $\lambda$ is the per-statement fabrication hazard.
Cross-network detection.
Each false statement in $\mathcal{D}_k$ is noticed by the partner network with probability $\delta$, so
$$X^{(k)}_t \;\sim\; \mathrm{Binomial}\!\bigl(n^{(k)}_1(t),\, \delta\bigr).$$
Internal flips.
True statements spontaneously become false with probability $p_{01}$, and false statements self-correct with probability $p_{10}$:
$$D^{(k)}_t \;\sim\; \mathrm{Binomial}\!\bigl(n^{(k)}_0(t),\, p_{01}\bigr), \qquad R^{(k)}_t \;\sim\; \mathrm{Binomial}\!\bigl(n^{(k)}_1(t),\, p_{10}\bigr).$$
Normalized update equations.
Let $f^{(k)}_t = F^{(k)}_t/N_k$, $d^{(k)}_t = D^{(k)}_t/N_k$, $r^{(k)}_t = R^{(k)}_t/N_k$, and $x^{(k)}_t = X^{(k)}_t/N_k$. Dividing by $N_k$ gives the proportion dynamics
$$\pi^{(k)}_0(t+1) \;=\; \pi^{(k)}_0(t) - f^{(k)}_t - d^{(k)}_t + r^{(k)}_t + x^{(k)}_t, \tag{10}$$
$$\pi^{(k)}_1(t+1) \;=\; \pi^{(k)}_1(t) + f^{(k)}_t + d^{(k)}_t - r^{(k)}_t - x^{(k)}_t, \tag{11}$$
with $\pi^{(k)}_0(t) + \pi^{(k)}_1(t) = 1$ preserved automatically.
Element in Definition 2.2 | Instantiation in the cross-network model
---|---
$A = A_1 \cup A_2$ | Two disjoint actor sets $A_1, A_2$; proportions refer to $N_k = |A_k|$.
$S = \{s_0, s_1\}$ | Binary statements shared by both networks ($s_0$ = true, $s_1$ = false).
$P$ | Persuasion function. Within each network it reduces to constant flip probabilities: $p_{01}$ for $s_0 \to s_1$ (used in $D^{(k)}_t$), $p_{10}$ for $s_1 \to s_0$ (used in $R^{(k)}_t$). Across networks, persuasion acts only via $I$ with success probability $\delta$.
$I$ | Cross-network invalidation: if a statement is false and the receiver belongs to the other network, the statement is detected and flipped with probability $\delta$ (realized through the random variable $X^{(k)}_t$).
$C$ | Message sent from $a_i$ to $a_j$ carrying $a_i$’s current statement. Communication enables both persuasion and detection.
$B_i$ | Belief of actor $a_i$: $\{s_0\}$ or $\{s_1\}$.
$U$ | Update rule for actor $a_i$ that applies, in the following order, (i) fabrication $F^{(k)}_t$, (ii) internal flips using $p_{01}, p_{10}$, (iii) cross-network detection using $\delta$. The parameter $\delta$ therefore lives inside $U$.
$G_i$ | Goal: persuade every other actor (within and across networks) to adopt $B_i$.
Lemma 2.4 (Dual-network invalidation propagation).
Proof.
At equilibrium the expected changes vanish, so . Using the distributional means
and dividing by to convert counts to proportions gives
(13) | ||||
(14) |
Equation (13) yields . Insert this into the normalization to obtain and . Finally, setting from (14) gives the constraint (12) and the stated fixed point. ∎
Interpretation of Lemma 2.4.
The equilibrium proportions reveal clear causal roles for each parameter. The false-statement share in network is
so it scales directly with the per-actor error-generation rate $\lambda$ and inversely with the cross-network detection probability $\delta$. More prolific error creation or weaker cross-scrutiny raises the long-run fraction of false statements.
The true-statement share is
hence it grows with the internal correction probability $p_{10}$ and falls with the internal corruption probability $p_{01}$. A network that corrects errors efficiently (large $p_{10}$) or seldom corrupts truths (small $p_{01}$) achieves a higher equilibrium truth proportion.
Finally, Eq. (12) couples $\delta$ to the flip parameters: if within-network corruption outpaces correction ($p_{01} > p_{10}$), the consistency condition forces a higher fabrication-to-detection ratio, pushing the false share upward unless the partner network compensates with stronger detection (larger $\delta$). Thus the model quantifies an intuitive trade-off: falsehood prevalence is driven by the ratio of error creation to error removal, internally via $p_{10}$ and externally via $\delta$.
2.5.4 Single- vs. Cross-Network Models with Invalidation
Stationary false-statement shares.
The single-network emergent-invalidation model (Lemma 2.3) stabilizes at
$$\pi_1^{\mathrm{single}} \;=\; \frac{p_{01}+\lambda}{p_{01}+\lambda+p_{10}}.$$
For a given network engaged in cross-network detection with partner (Lemma 2.4) the corresponding steady state is
where now abbreviate and .
Here $p_{01}$ is the intrinsic true → false flip probability in $\mathcal{D}_1$; $p_{10}$ is the intrinsic false → true flip probability in $\mathcal{D}_1$; $\lambda$ is the fabrication hazard per true statement in $\mathcal{D}_1$; and $\delta$ is the probability that a false statement in $\mathcal{D}_1$ is detected by $\mathcal{D}_2$.
Lemma 2.5 (Cross-network detection lowers falsehood prevalence).
If
then i.e. coupling to an external detector reduces the steady-state prevalence of false statements.
Proof.
Subtract the two stationary shares:
∎
Interpretation.
External scrutiny () or lighter fabrication pressure () pushes invalidations in the cross-network system below the single-network benchmark. Conversely, when , fabrications outrun detections and the dual system sustains the same or a higher falsehood share than isolation.
2.5.5 How Many Agents Guarantee a Target Falsehood Level?
Lemma 2.6 (Effective correction hazard).
Let a focal discursive network $\mathcal{D}_1$ possess an internal “false → true” correction hazard $p_{10}$, and let it be cross-linked to $n$ partner networks, each supplying an external correction hazard $\delta$. Assuming that (i) internal and external detections act independently, and (ii) every detection channel is memory-less (exponential), the waiting time until a false statement in $\mathcal{D}_1$ is first corrected is
$$T \;=\; \min\{T_0, T_1, \ldots, T_n\}, \qquad T_0 \sim \mathrm{Exp}(p_{10}),\quad T_i \sim \mathrm{Exp}(\delta)\;\; (i = 1, \ldots, n).$$
Consequently, the effective per-statement correction rate is
$$p_{10}^{\mathrm{eff}} \;=\; p_{10} + n\,\delta.$$
Proof.
In the discrete model each false statement faces a single Bernoulli “self-repair” trial per period with probability $p_{10}$. As the period length shrinks to zero, the Binomial → Poisson limit converts this into a Poisson correction stream of rate $p_{10}$, i.e. an exponential clock $T_0 \sim \mathrm{Exp}(p_{10})$.
Each of the $n$ other networks contributes an independent Bernoulli trial with probability $\delta$ per period. Taking the same limit gives $n$ independent Poisson streams of rate $\delta$, or clocks $T_i \sim \mathrm{Exp}(\delta)$, $i = 1, \ldots, n$.
The total waiting time until any clock rings is the minimum
$$T \;=\; \min\{T_0, T_1, \ldots, T_n\}.$$
Because the minimum of independent exponentials is itself exponential with rate equal to the sum of the component rates, we obtain $T \sim \mathrm{Exp}(p_{10} + n\delta)$ and hence $p_{10}^{\mathrm{eff}} = p_{10} + n\delta$. ∎
Lemma 2.7 (Agent requirement for a tolerance ).
Let $\pi_1^{(n)}$ be the asymptotic proportion of false statements in $\mathcal{D}_1$ when it is coupled to the $n$ other networks as above. Then
$$\pi_1^{(n)} \;=\; \frac{p_{01} + \lambda}{p_{01} + \lambda + p_{10} + n\delta}.$$
Consequently, to guarantee $\pi_1^{(n)} \le \tau$ one needs at least
$$n^\star \;=\; \left\lceil \frac{(p_{01} + \lambda)\,\frac{1-\tau}{\tau} \;-\; p_{10}}{\delta} \right\rceil$$
cross-detecting networks (agents).
Proof.
Per Lemma 2.3, for one isolated network the long-run fraction of false statements is
$$\pi_1^{(0)} \;=\; \frac{p_{01}+\lambda}{p_{01}+\lambda+p_{10}}.$$
Coupling $\mathcal{D}_1$ to $n$ partner networks multiplies its “false → true” correction hazard from $p_{10}$ to $p_{10} + n\delta$ per Lemma 2.6. Substituting this effective hazard into the single-network formula gives the new steady state
$$\pi_1^{(n)} \;=\; \frac{p_{01}+\lambda}{p_{01}+\lambda+p_{10}+n\delta}.$$
Imposing the tolerance constraint $\pi_1^{(n)} \le \tau$ and solving the inequality for $n$ yields
$$n \;\ge\; \frac{(p_{01}+\lambda)\,\frac{1-\tau}{\tau} - p_{10}}{\delta}.$$
Taking the ceiling ensures $n$ is an integer. ∎
Interpretation.
External scrutiny scales linearly with the number of partner networks, while internal falsehood production stays fixed. Thus $\pi_1^{(n)}$ decays hyperbolically in $n$, and each additional agent yields diminishing (but still positive) returns in truthfulness.
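The bound of Lemma 2.7 is straightforward to tabulate. The sketch below uses hypothetical hazards (not the calibrated values of Table 4) to show how the required number of cross-detecting agents grows as the tolerance tightens.

```python
import math

def min_agents(p01: float, p10: float, lam: float, delta: float, tau: float) -> int:
    """Smallest n with (p01 + lam) / (p01 + lam + p10 + n*delta) <= tau (Lemma 2.7)."""
    needed = ((p01 + lam) * (1 - tau) / tau - p10) / delta
    return max(0, math.ceil(needed))

# Hypothetical hazards, for illustration only.
p01, p10, lam, delta = 0.02, 0.2, 0.3, 0.6
for tau in (0.20, 0.10, 0.05, 0.01):
    print(f"tau={tau:.2f} -> n* = {min_agents(p01, p10, lam, delta, tau)}")
```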
2.6 FOO Algorithm with Integrity Verification
The Flaws-of-Others (FOO) algorithm instantiates the detection hazard of Lemma 2.5 in software. It couples an arbitrary ensemble of LLM agents (each defined by a back-end model, decoding temperature, and free-text instructions) to a lightweight consensus loop (Algorithm 1 and Fig. 2). Neither the number of agents nor their prompts are fixed: both are read at run time from a simple JSON configuration, so the same engine can mediate anything from a two-model A/B test to a dozen specialized critics.
The FOO algorithm requires trust in the integrity of agent interactions. In collaborative scientific work, the provenance of each contribution becomes essential for reproducibility and accountability. We extend the basic FOO protocol with cryptographic integrity verification.
2.6.1 Core FOO Protocol
The protocol has four phases:
1. Broadcast: an initial user task is broadcast to every active agent. Each agent returns a first-pass answer.
2. Cross-examination (FOO step): every agent receives the instruction “find the flaws in …” followed by all peer answers except its own, and produces a critique. This implements the cross-detection hazard: an error overlooked by one model is likely to be flagged by at least one other.
3. Harmonization: one or more agents are flagged as harmonizers. They aggregate the entire set of critiques, separate agreements from contradictions, and emit a structured “judgement.” Harmonizers can use any rubric (majority vote, weighted confidence, specialist veto) to convert divergent feedback into a common set of observations.
4. Revision and loop: every non-harmonizer ingests the judgement and regenerates its answer, optionally rebutting points it believes to be wrong. The cycle repeats until a termination condition is met (identical outputs, bounded edit distance, or a maximum number of rounds). The final harmonizer synthesis is returned to the user.
Because the agents, instructions, stopping rule, and comparison metric are all configurable, the same code base supports tasks as different as mathematical proof sketching, literature surveying, or code review. The FOO loop thus acts as a versatile wrapper that upgrades solitary generation into a networked, self-auditing process, realising in practice the external detection hazard that pushes the system into the truth-dominant regime predicted by the theory.
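A compressed sketch of the four-phase loop follows. The agent interface, prompt wording, and termination rule are illustrative stand-ins; the released FOO code defines its own classes, JSON configuration schema, and comparison metrics.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]   # an agent maps a prompt to a text response

def foo_round(task: str, agents: Dict[str, Agent], harmonizer: Agent, max_rounds: int = 3) -> str:
    """One FOO consensus loop: broadcast, cross-examine, harmonize, revise."""
    answers = {name: agent(task) for name, agent in agents.items()}          # 1. broadcast
    for _ in range(max_rounds):
        critiques: Dict[str, str] = {}
        for name, agent in agents.items():                                   # 2. cross-examination
            peers = "\n\n".join(a for n, a in answers.items() if n != name)
            critiques[name] = agent(f"Find the flaws in the following answers:\n{peers}")
        judgement = harmonizer(                                              # 3. harmonization
            "Merge these critiques, separating agreements from contradictions:\n"
            + "\n\n".join(critiques.values())
        )
        revised = {name: agent(f"Task: {task}\nJudgement: {judgement}\nRevise your answer.")
                   for name, agent in agents.items()}                        # 4. revision
        if revised == answers:                                               # simple termination rule
            break
        answers = revised
    return harmonizer("Synthesize a final answer from:\n" + "\n\n".join(answers.values()))
```

In practice each Agent would wrap an LLM API call configured at run time from the JSON file; here any callable that maps a prompt string to a response string satisfies the interface.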
2.6.2 Integrity Extension
Each FOO interaction generates a cryptographically signed record containing: (i) Message content and timestamp, (ii) Agent identity and interaction type, (iii) Hash-based link to previous interactions, (iv) Verification signature
This creates a tamper-evident chain where any modification to historical interactions invalidates subsequent cryptographic links, making post-hoc fabrication of contributions computationally infeasible.
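A minimal illustration of such a hash-linked record chain follows (hash-linking only; the actual signature scheme, key management, and record fields are specified in Appendix A and may differ).

```python
import hashlib
import json
import time
from typing import Dict, List

def append_record(chain: List[Dict], agent: str, kind: str, content: str) -> Dict:
    """Append a record whose hash commits to its content and to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"agent": agent, "type": kind, "content": content,
              "timestamp": time.time(), "prev_hash": prev_hash}
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)
    return record

def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edit to an earlier record breaks all later hashes."""
    prev_hash = "0" * 64
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

log: List[Dict] = []
append_record(log, "agent_1", "initial_response", "First-pass answer ...")
append_record(log, "agent_2", "critique", "One limitation is ...")
print(verify_chain(log))          # True
log[0]["content"] = "tampered"    # post-hoc modification
print(verify_chain(log))          # False
```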
The integrity logging adds four checkpoint types to Algorithm 1:
1. Initial response logging after broadcast
2. Critique logging during cross-examination
3. Harmonization decision logging
4. Revision logging during iteration
Detailed implementation algorithms and security analysis are provided in Appendix A.
3 Theoretical Validation and Parameter Analysis
This section demonstrates the mathematical consistency and theoretical properties of our discursive network models. Rather than claiming empirical validation, we show how the framework accommodates realistic parameter ranges and produces theoretically coherent dynamics. The analysis serves three purposes: (i) establishing that the models yield stable, interpretable equilibria; (ii) demonstrating how parameter variations affect system behavior; and (iii) illustrating the framework’s capacity to represent different invalidation regimes observed in the literature.
We parameterize our models using representative values drawn from the LLM literature to demonstrate theoretical consistency and explore regime transitions. These parameter choices illustrate the framework’s expressive capacity rather than constituting empirical validation. Future work will require systematic parameter estimation from controlled experiments designed specifically to test the discursive network hypotheses.
Figure 3(b): Cross-network invalidation-detection dynamics corresponding to Lemma 2.4. The hazards $p_{01}$, $p_{10}$, and $\lambda$ are unchanged, and an external detection probability $\delta$ links two equal networks (Lemma 2.4). Curves and bands represent the mean and confidence interval over repeated simulation runs; the upper subplot corresponds to Network 1 and the lower to Network 2. Dashed lines indicate the predicted equilibria (true, blue; false, red), which reduce the long-run false share of panel (a) by roughly one half.
3.1 Parameter specification from literature ranges
Published studies provide parameter ranges that inform our theoretical analysis. \textcite{Ji2023SelfReflection} report invalidation rates of 26-61% in medical domains, while \textcite{Zhang2024SelfAlignment} document self-evaluation accuracy near chance levels (AUROC 0.55). These findings suggest parameter regimes where
$$p_{01} + \lambda \;>\; p_{10},$$
corresponding to invalidation-dominant dynamics in our framework: the model fabricates new errors faster than it repairs them internally. While we do not claim these studies directly measure our theoretical parameters, they establish the empirical plausibility of invalidation-dominant regimes and provide realistic bounds for theoretical exploration.
The claim-level probabilities reported by \textcite{Zhang2024SelfAlignment} are raw soft-max scores. They are over-confident until calibrated (their Fig. 5), and they average dependent claims. Consequently we treat them as proxy scores, not literal probabilities, and accompany every point estimate with an explicit uncertainty discussion in what follows.
3.2 Parameter choices for simulation
Table 4 lists the hazards used in our single- and dual-network simulations. Whenever a range was reported in the source, we chose values that place the single network near the mid-point of the error band (about 60 % false statements) so that the effect of cross-network detection is easy to visualise.
The following analysis explores model behavior under parameter values that span regimes of practical interest. We examine: (i) single-network dynamics under varying fabrication-to-repair ratios; (ii) cross-network detection effects as the external detection hazard varies; and (iii) scaling behavior as the number of agents increases. This constitutes theoretical validation of model consistency rather than empirical hypothesis testing.
Table 4: Hazard rates used in the single- and dual-network simulations.

Symbol | Value | Model role | Empirical hook / comment
---|---|---|---
 | | true-to-false slip | Chosen to be an order of magnitude smaller so that internal repair remains visible.
 | | internal repair | Matches the modest self-evaluation AUROC (about 0.55) reported by \textciteZhang2024SelfAlignment.
 | | invalidations (fabrication) | Upper half of the 26-61% error band; solving for the fabrication hazard with the repair hazard fixed gives the tabulated value.
 | | cross-network repair | Picked so that the system sits just inside the truth-dominant region; see Lemma 2.5.
3.3 Single-network baseline
With the calibrated triple of hazards, the single-network model (Lemma 2.3) predicts a long-run false share of roughly 60%. A Monte-Carlo experiment (FOO_Single_Network.py, repeated runs over many steps) produces an equilibrium false share consistent with this prediction.
Interpretation.
The false share stabilises near 60%, squarely inside the empirical band of \citeauthorJi2023SelfReflection. This baseline serves as the reference against which we gauge cross-network effects.
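For readers who wish to reproduce the qualitative behaviour, the following minimal sketch simulates a two-state per-statement Markov chain of the kind the Monte-Carlo experiment performs. The hazard values are placeholders chosen so that the stationary false share sits near 60%; they are not the calibrated values of Table 4.

```python
import random

def simulate(p_fab, p_rep, p_ext=0.0, steps=5000, runs=100):
    """Average long-run fraction of time a statement spends in the false state.

    p_fab: per-step probability a true statement becomes false
           (drift and fabrication lumped together for this sketch)
    p_rep: per-step probability a false statement is repaired internally
    p_ext: per-step probability a false statement is caught by an external agent
    All values are illustrative placeholders, not the paper's calibrated hazards.
    """
    false_shares = []
    for _ in range(runs):
        false, false_time = False, 0
        for _ in range(steps):
            if false:
                if random.random() < p_rep + p_ext:   # hazards act additively
                    false = False
            elif random.random() < p_fab:
                false = True
            false_time += false
        false_shares.append(false_time / steps)
    return sum(false_shares) / runs

# Invalidation-dominant baseline vs. the same network with external detection.
print(simulate(p_fab=0.03, p_rep=0.02))               # false share near 0.6
print(simulate(p_fab=0.03, p_rep=0.02, p_ext=0.05))   # false share near 0.3
```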
3.4 Dual-network architecture
Coupling two identical networks via an external repair hazard (Lemma 2.5) cuts the predicted falsehood prevalence roughly in half relative to the single-network baseline. Simulations reproduce this equilibrium in both networks (Fig. 3b).
Interpretation.
The external hazard can be realized by retrieval-augmented generation, ensemble adjudication, or human post-editing. Once the fabrication-to-repair ratio drops below its critical threshold, the system flips from an invalidation-dominant regime to a truth-dominant one; with the calibration of Table 4 this corresponds to a 51% relative reduction in the long-run false share.
3.5 How many independent agents for 5 % error?
Using Proposition 2.7 we can ask: how many cross-detecting networks are required to push the long-run false share below 5%? With the parameters of Table 4 and a tolerance of 0.05, the closed-form bound indicates that at least nine mutually detecting agents are necessary to guarantee that fewer than one statement in twenty remains false at equilibrium under this calibration. The requirement grows only linearly as the tolerance tightens, thanks to the additive nature of the external hazards, making multi-agent verification a scalable pathway to high factual reliability. Figure 4 shows the required number of agents as a function of the tolerance.
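The sketch below illustrates how such a bound can be evaluated numerically, assuming (for illustration only) that external hazards add and that the equilibrium false share takes the simple ratio form used here; Proposition 2.7 supplies the exact expression, and the hazard values are placeholders rather than the calibrated ones.

```python
def min_agents(p_fab, p_rep, p_ext, tol=0.05, n_max=1000):
    """Smallest number of mutually detecting agents keeping the false share
    below `tol`, assuming (for illustration only) additive external hazards
    and an equilibrium share of p_fab / (p_fab + p_rep + (n - 1) * p_ext)."""
    for n in range(1, n_max + 1):
        share = p_fab / (p_fab + p_rep + (n - 1) * p_ext)
        if share < tol:
            return n
    return None   # tolerance unreachable within n_max agents

# Placeholder hazards, not the values of Table 4:
print(min_agents(p_fab=0.03, p_rep=0.02, p_ext=0.05, tol=0.05))
# -> 13 with these placeholder hazards (the paper's own calibration yields nine)
```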
4 Discussion
4.1 Ethical Concerns
The first and most important observation regarding discursive networks with LLMs is ethical: there is a risk of transgression when the human(s) in a discursive network use it as a means to outsource reasoning instead of deploying it as augmented intelligence. In linguistics, epithesis refers to the addition of a sound or letter to the end of a word without changing its meaning. By analogy we identify an ethical concern in discursive networks: scientific epithesis, wherein an individual seeks authorship on an artifact to which they have contributed only superficial edits, or none at all. Like the linguistic phenomenon, the intervention leaves the substantive content untouched while appending an external element that alters perception rather than substance. Scientific epithesis does not meet the formal threshold of plagiarism, yet it belongs to the same family of misappropriations because it places a symbolic layer of credit "upon" the discourse without engaging in its intellectual construction. In the context of discursive networks this behaviour distorts the link between contribution and attribution, undermining the very mechanism of cross-agent validation that the network is designed to support.
Authorship norms present another axis of concern. As dozens of agents contribute micro-edits, intellectual responsibility becomes increasingly opaque, complicating both credit assignment and error tracing. Empirical work across disciplines shows that diffuse contributions encourage honorary or "gift" authorship, diluting accountability and undermining public trust in published findings [maruvsic2011systematic]. What constitutes authorship in a discursive network is an open question that will take time to settle.
The integrity of discursive networks fundamentally depends on the ability to verify the authenticity and provenance of each contribution, whether from human or artificial agents. When scientific conclusions emerge from iterative exchanges among multiple participants, traditional notions of authorship become complicated by the distributed nature of intellectual labor and the possibility of post-hoc modification of interaction records. This challenge is particularly acute in combating epithesis, as the practice thrives in environments where genuine contributions cannot be distinguished from superficial additions or retroactive claims of involvement. A robust solution requires cryptographic mechanisms that create tamper-evident logs of all interactions, making it computationally infeasible to fabricate authorship claims after the fact. By implementing blockchain-based integrity verification for agent communications, discursive networks can establish a trustworthy foundation where each participant’s actual contributions are permanently recorded and verifiable. This technical infrastructure does more than prevent fraud; it creates positive incentives for meaningful engagement by ensuring that substantial intellectual contributions receive proper attribution while making epithetic behavior both detectable and reputationally costly. The result is a research environment where collaborative human-AI knowledge production can proceed with confidence in the integrity of the underlying interaction records.
The energy footprint of large discursive networks poses another ethical dilemma. Each critical review involves at least one forward-and-backward pass through a language model, so when each agent critiques the manuscript once the computational cost grows linearly with the number of agents N. When every agent critiques every other agent (the fully connected case that is ideal for robustness), the number of pairwise exchanges scales as N(N-1), and so does the energy consumption. Moreover, once those quadratic interactions have occurred the network typically runs a consensus or "harmonization" phase to reconcile conflicting edits, and common distributed algorithms require additional synchronous rounds to complete. The aggregate budget for end-to-end validation therefore climbs beyond quadratic in N, dwarfing the cost of the original single-agent composition. In an era when each large-model inference already carries a measurable carbon footprint, this quadratic-plus overhead raises difficult questions about the sustainability of scaling discursive networks without parallel investment in greener compute or more frugal validation protocols.
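A back-of-the-envelope cost model makes the scaling explicit. The counts below are illustrative bookkeeping under the assumptions just stated, not measurements of any deployed system.

```python
def validation_calls(n_agents, consensus_rounds=3, fully_connected=True):
    """Rough count of model calls for one end-to-end validation pass.

    Illustrative only: one critique per agent pair in the fully connected
    case, one critique per agent otherwise, plus consensus rounds in which
    every agent is queried once.
    """
    if fully_connected:
        critiques = n_agents * (n_agents - 1)    # every agent reviews every other
    else:
        critiques = n_agents                      # a single review per agent
    harmonization = consensus_rounds * n_agents   # reconciliation rounds
    return critiques + harmonization

for n in (2, 5, 10, 20):
    print(n, validation_calls(n))   # grows roughly quadratically in n
```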
Because those costs grow much faster than in single-agent or single-author scenarios, only well-funded actors may be able to afford the energy budget, deepening the resource gap already highlighted for modern NLP pipelines [Strubell2019Energy]. Sustaining the benefits of discursive verification therefore demands not merely algorithmic innovation but also governance frameworks and infrastructural subsidies that keep the playing field environmentally and economically fair.
Discursive networks promise robust cross-verification, yet their perpetual negotiation of “truth” can erode epistemic diversity. When many agents iteratively revise one another, the process tends to pull answers toward a central consensus, suppressing minority explanations in favour of the statistically safest wording. Large-scale language models already exhibit this homogenising bias, along with well-documented tendencies to replicate and even amplify the social prejudices embedded in their training data [Bender2021Parrots]. A network of such agents therefore risks hard-coding bias under the reassuring veneer of multi-agent agreement.
At the same time, cross-agent review typically demands full visibility of prompts and intermediate reasoning, thereby increasing the attack surface for privacy breaches. Membership-inference studies demonstrate how seemingly benign queries can reveal whether sensitive records were present in a training set [Shokri2017Membership], suggesting that discursive networks must treat every inter-agent channel as a potentially harmful vector for leakage.
Finally, adversarial robustness and distributive justice pose intertwined challenges. A single compromised agent can inject carefully crafted "triggers" that, once propagated through mutual validation, shift the entire network toward a malicious conclusion [Wallace2019Triggers].
4.2 Conclusions and Outlook
This manuscript has traced a broad arc, from theoretical grounding to practical tooling, around a single organising idea: discursive networks. By recognising that every LLM is both a generator and a consumer of discourse, we cast its interactions as edges in a network whose universal structure can be exploited for robust error control. Building on this abstraction we introduced Flaws-Of-Others (FOO), a reconfigurable agent-based algorithm packaged with user-friendly tools that assist the production, verification, and revision of scientific knowledge.
Many ethical challenges in discursive networks call for detailed, tamper-proof logs of interactions. This is essential for accountability, reproducibility, and auditability in digital systems. When interactions involve LLMs, scientific collaborations, or complex data workflows, the capacity to verify the integrity of recorded exchanges becomes a prerequisite for trust. One approach to achieving this is through the use of blockchain technology, which can provide decentralized, cryptographically secure records that are resistant to unauthorized modification. Each entry in a blockchain-based log is linked to the previous one through cryptographic hashes, ensuring that any tampering with earlier data invalidates the entire subsequent chain. This structure allows interaction logs to be both transparent and verifiable without central oversight. Furthermore, incorporating time-stamping and access control into such systems ensures that each interaction is both temporally fixed and attributable. These properties make blockchain a viable framework for securing interaction logs in research, legal compliance, and automated decision-making contexts.
This study deliberately replaces the fashionable term hallucination with the broader concept of invalidation. Whereas “hallucination” suggests a purely accidental slip in the model’s internal perception, the data reveal a richer spectrum of invalidations (failure modes) that includes strategic prompt manipulation, chain-of-thought drift and the simple inheritance of errors from flawed training corpora. All of these mechanisms manifest in the observable metric that matters to users, the production of false statements, so the single hazard rate is most naturally interpreted as an invalidation rate. The shift in vocabulary is therefore more than semantics: it aligns theoretical parameters with the phenomena actually counted in benchmarks.
A discursive network that relies exclusively on its own self-correction routines will remain invalidation-dominant as soon as the fabrication hazard exceeds the internal repair rate. In that regime the long-run share of false statements is bounded below by one half, regardless of implementation details or domain. The formalism developed here reveals why: fabrication and self-repair enter the steady-state ratio additively, leaving no structural mechanism for the network to "outrun" its own invalidations.
The picture changes once independent verification channels are added. Coupling the generator to external agents augments every false statement with an additional repair hazard per agent, so the effective repair rate grows additively with the number of verifiers. If the composite repair rate exceeds the fabrication hazard, the system flips into the truth-dominant regime, in which the falsehood share decreases monotonically with the number of agents and approaches zero in the limit of infinite cross-checking capacity. The transition threshold depends only on the ratio of fabrication to effective repair, providing a clean design criterion that is independent of any particular benchmark or simulation.
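In illustrative notation (the symbols below are ours, chosen for exposition, not necessarily those defined in Section 2), the criterion can be written as:

```latex
% Illustrative notation only; see Section 2 for the paper's own symbols.
\lambda_{\mathrm{eff}} \;=\; \lambda_{\mathrm{rep}} + (n-1)\,\lambda_{\mathrm{det}},
\qquad
\text{truth-dominant} \;\iff\; \lambda_{\mathrm{eff}} > \lambda_{\mathrm{fab}}
\;\iff\; \frac{\lambda_{\mathrm{fab}}}{\lambda_{\mathrm{rep}} + (n-1)\,\lambda_{\mathrm{det}}} < 1 .
```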
Proposition 2.7 turns that qualitative criterion into a quantitative planning tool: it gives a closed-form bound on the smallest number of mutually detecting agents required to keep the long-run falsehood share below a user-specified tolerance. Because the required number of agents grows only linearly as the tolerance tightens, even ambitious error targets translate into tractable network sizes. In practice, engineers can estimate the drift, repair, and fabrication hazards from established benchmarks, select a viable cross-detection mechanism to determine the external detection rate, and then read off the required network size directly from the formula. The discursive-network formalization thus provides not just a descriptive model of information dynamics but a concrete apparatus for right-sizing the verification infrastructure needed to achieve prescribed levels of factual reliability.
The theoretical analysis confirms that our mathematical framework produces stable, interpretable dynamics across parameter ranges consistent with published LLM studies. The models successfully capture qualitative regime transitions (invalidation-dominant vs. truth-dominant) and provide quantitative predictions for multi-agent system design.
Empirical validation remains an important direction for future work, requiring controlled experiments specifically designed to measure the four theoretical hazards (drift, self-repair, fabrication, and external detection) under realistic conditions.
Although the mathematics treats external detection as a single scalar hazard, the concept encompasses several concrete engineering choices. Retrieval-augmented generation raises it by surrounding the model with authoritative passages that expose contradictions. Model-ensemble adjudication raises it further by combining the diverse priors of independently trained models; uncorrelated errors rarely agree, so the aggregate chance that a falsehood slips through diminishes rapidly. Even higher detection rates become attainable when a human editor is placed in the loop, although latency and cost then become limiting factors. Our hazard model quantifies these trade-offs: for any target false-statement tolerance one can compute a required detection rate, and thus budget the amount of human or automated scrutiny that must be applied.
Just as people are quicker to spot another person’s mistakes than their own, a language model is often a sharper critic of a peer’s text than of its own output. The asymmetry stems from each model’s training objective: during generation it maximises local fluency rather than global factuality, so a well-phrased falsehood slips through unchallenged. When the same model is asked only to evaluate an already-written passage, fluency is no longer the bottleneck; the task collapses to checking claims against the knowledge stored in its training corpus. Verification is therefore easier, and cross-model review can expose errors that the original authoring pass left intact.
The significance of these findings extends beyond artificial systems. Invalidation dynamics of the same mathematical form could govern human conversation, peer review and social-media fact-checking. Recognising this shared structure invites a unified research agenda that links network science, cognitive psychology and algorithmic governance. Our future work will explore heterogeneous hazards at the level of individual actors, time-varying detection capacities that respond to workload, and live A/B tests that recover hazard estimates directly from production chat traffic. A deeper ethical analysis will also be required, because the push toward ever-smaller error tolerances competes with privacy constraints, energy budgets and the carbon footprint of large-scale verification.
We have crossed a cultural threshold that directly affects scientific production: language is no longer crafted solely by human hands and minds, but is continuously co-composed with machines. Large language-model systems do not "assist" writing; they repurpose it. Authorship dissolves into a live dialogue between human intention and algorithmic completion, with sentences generated, revised, and re-externalized in the same breath. Because the practice itself has changed, the evaluative yardsticks built for solitary, page-bound prose (originality scores, citation counts, style rubrics, etc.) no longer capture what is happening on the screen. This cultural shift brings novel risks and opportunities, demanding updated methods, training, expectations, and metrics.
Looking forward, the research programme must widen from token-level truthfulness to medium-level dynamics. Three paths stand out:
1. Cultural assimilation. Our most pressing need is an ethical framework to harmonize societal values with this new reality. We need conventions, interfaces and pedagogies that make continuous, model-mediated composition legible, trustworthy, and fair.
2. Metric redesign. Benchmarks rooted in solitary authorship and static text can no longer capture quality in a live, co-creative medium. New metrics should score how a statement evolves under iterative detection and repair, not just its instantaneous truth value.
3. Governance of agency. When the medium itself "acts on the message," responsibility diffuses across designers, deployers and end-users. Future hazard models must therefore integrate economic incentives, interface affordances and policy constraints alongside the technical hazard parameters.
Seen through this lens, the ideas discussed in this manuscript form a prototype for broader cultural discussions that will emerge as society learns, once again, to write in a fundamentally new medium.
Acknowledgements
This work was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under Award No. 1R25GM151182 and by the National Science Foundation under Award No. 2518973. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, NIGMS, or NSF.
In an instance of self-reference, this manuscript was developed using the methods it describes, through iterative cycles of recursive software development that refined the underlying technology. Records of interactions with multiple LLM agents were kept only for late-stage manuscript refinement. Nevertheless, the following statement accurately represents the process employed here and provides a template for future work:
This manuscript and its supplementary materials were produced using methods detailed in [gutierrez2025flaws]. The author supplied the core concepts, the logical framework, and foundational technical content, while large language-model assistance was utilized for ideation, verification, text drafting and revision, and software implementation. Responsibility for all claims made herein rests solely with the author. Existing logs of interactions with LLM agents are included as an appendix within the supplementary materials.
Appendix A Blockchain Implementation Details
Implementation note.
We implement the loop in Python 3.11 with asyncio concurrency; each agent call is an HTTP request to a hosted LLM endpoint (e.g. OpenAI, Anthropic). Unless stated otherwise, the experimental pool comprises a single harmonizer (the most capable model available) and, for each back-end engine under test, two specialist agents sampled at temperatures 0.1 and 0.9. The protocol executes one broadcast round plus a minimum of three consensus rounds, so every query triggers at least (1 + 3) N = 4N model calls, where N is the total number of agents (harmonizer + specialists). For example, with one engine (N = 3) the loop issues at least 12 calls; adding engines or additional specialist roles scales the cost linearly. All configuration files and source code are publicly available at https://github.com/biomathematicus/foo.
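The round structure can be sketched as follows. Here call_agent is a stand-in for the actual HTTP request to a hosted endpoint, and the agent names are placeholders; only the broadcast-then-consensus shape mirrors the protocol described above.

```python
import asyncio

# Stand-in for the HTTP request to a hosted LLM endpoint.
async def call_agent(agent, prompt):
    await asyncio.sleep(0)                       # placeholder for the network round trip
    return f"{agent}: response to {prompt!r}"

async def foo_round(agents, prompt):
    # All agents are queried concurrently within a single round.
    return await asyncio.gather(*(call_agent(a, prompt) for a in agents))

async def run(agents, query, consensus_rounds=3):
    responses = await foo_round(agents, query)                    # broadcast round
    for _ in range(consensus_rounds):                             # >= 3 consensus rounds
        responses = await foo_round(agents, "critique: " + "; ".join(responses))
    return responses

agents = ["harmonizer", "specialist_t0.1", "specialist_t0.9"]     # one engine, N = 3
print(asyncio.run(run(agents, "Check the derivation in Section 3.")))
```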
Blockchain motivation.
The reliability of discursive networks depends critically on the integrity of recorded interactions between human and artificial agents. When scientific conclusions emerge from iterative exchanges among multiple LLMs and human reviewers, the provenance and authenticity of each contribution becomes essential for both reproducibility and accountability. Traditional logging systems are vulnerable to post-hoc modification, making it difficult to distinguish genuine collaborative refinement from retrospective tampering or fabricated authorship claims.
We address this challenge through a blockchain-based integrity system that creates tamper-evident records of all agent interactions. Each message exchange in the discursive network generates a cryptographic block containing the agent identity, message content, timestamp, and hash-linked reference to the previous interaction. The system employs SHA-256 hashing with a global salt shared across all agents to ensure consistency while preventing individual agents from being identified through hash analysis alone.
Definition A.1 (Conversation Blockchain).
For a discursive network, the conversation blockchain is a sequence of cryptographically linked blocks $B_0, B_1, \ldots$ that creates tamper-evident records of all agent interactions, with each block hash computed as $h_i = H(h_{i-1} \,\|\, a_i \,\|\, c_i \,\|\, t_i \,\|\, s)$ over the previous hash, agent identity $a_i$, message content $c_i$, timestamp $t_i$, and a global salt $s$, where $H$ is a cryptographic hash function and $\|$ denotes concatenation.
The cryptographic hash function can be instantiated with various secure hash algorithms including SHA-256, SHA-3 (Keccak), BLAKE2, or BLAKE3, each offering different performance and security trade-offs. SHA-256 remains a robust choice for blockchain applications due to its widespread adoption and proven security properties, while newer algorithms like BLAKE3 offer superior performance for high-throughput scenarios.
The blockchain protocol ensures that any modification to historical interactions invalidates the cryptographic chain, producing detectable integrity violations. When agents load previous conversations, the system verifies the complete hash chain and displays prominent warnings if tampering is detected: “LOG TAMPERED. TRUST HAS BEEN BREACHED. BLOCKCHAIN FAILS.” This mechanism makes post-hoc fabrication of contributions computationally infeasible while preserving the ability to legitimately edit conversations by rebuilding the chain from the point of modification onward.
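A minimal sketch of this verification step is shown below. Block fields and helper names are illustrative, mirroring the logging sketch in Section 2.6.2 rather than the package's exact schema.

```python
import hashlib, json, time

def block_hash(body, salt):
    # Hash of the block body (everything except the stored hash), salted globally.
    return hashlib.sha256((salt + json.dumps(body, sort_keys=True)).encode()).hexdigest()

def verify_chain(chain, salt):
    prev_hash = "GENESIS"
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or block_hash(body, salt) != block["hash"]:
            return False                     # link broken or content altered
        prev_hash = block["hash"]
    return True

# Build a two-block chain, then tamper with it.
salt = "s3cr3t"
b1 = {"agent": "critic_a", "content": "Claim 3 lacks a citation.",
      "timestamp": time.time(), "prev_hash": "GENESIS"}
b1["hash"] = block_hash(b1, salt)
b2 = {"agent": "harmonizer", "content": "Accepted critique.",
      "timestamp": time.time(), "prev_hash": b1["hash"]}
b2["hash"] = block_hash(b2, salt)
chain = [b1, b2]
assert verify_chain(chain, salt)
chain[0]["content"] = "No flaws found."      # post-hoc modification
if not verify_chain(chain, salt):
    print("LOG TAMPERED. TRUST HAS BEEN BREACHED. BLOCKCHAIN FAILS.")
```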
The implementation maintains separate blockchains for each agent while using a shared salt stored in the system configuration to ensure hash consistency across sessions. Genesis blocks are initialized with fixed timestamps to prevent hash divergence during system restarts. The protocol automatically migrates existing conversation logs to blockchain format, enabling backward compatibility while establishing integrity verification for all new interactions.
This cryptographic foundation serves two complementary functions in discursive networks. First, it provides technical infrastructure for reproducible research by creating verifiable logs of how scientific conclusions evolved through agent interactions. Second, it establishes an ethical framework for attribution by making it computationally expensive to falsify contributions after the fact. The blockchain thus transforms the question of authorship from a matter of trust to one of cryptographic verification, supporting the broader goal of maintaining accountability in collaborative human-AI knowledge production.