toon4s is the idiomatic Scala implementation of Token-Oriented Object Notation (TOON),
a compact, LLM-friendly data format that blends YAML-style indentation with CSV-like tabular efficiency.
Save 30-60% on LLM token costs while maintaining full JSON compatibility.
What makes toon4s different: Most libraries prioritize features over architecture.
- Pure functional core: Zero mutations, total functions, referentially transparent
- Type safety first: sealed ADTs, exhaustive pattern matching, zero unsafe casts, VectorMap for deterministic ordering
- Stack-safe by design: @tailrec-verified functions, constant stack usage, handles arbitrarily deep structures
- Modern JVM ready: Virtual thread compatible (no ThreadLocal), streaming optimized, zero dependencies (491KB core JAR)
- Production hardened: 380+ passing tests, property-based testing, Either-based error handling, security limits
- Railway-oriented programming: For-comprehension error handling, no exceptions in happy paths, composable with Cats/ZIO/FS2
Example:
{ "tags": ["jazz","chill","lofi"] }→tags[3]: jazz,chill,lofi(40-60% token savings)
- Key features & Scala-first USPs
- Benchmarks at a glance
- Architecture & design patterns
- Installation
- Quick start (library)
- CLI usage
- Format crash course
- Rules & guidelines
- API surface
- Type safety & conversions
- Using TOON in LLM prompts
- Limitations & gotchas
- Syntax cheatsheet
- Development & quality gates
- License
| Theme | What you get | Why it matters on the JVM |
|---|---|---|
| Spec‑complete | Full conformance with TOON v1.4 spec; parity with toon (TS) and JToon (Java). | Mixed stacks behave the same; token math is consistent across platforms. |
| Typed APIs (Scala 2 & 3) | Scala 3 derivation for Encoder/Decoder; Scala 2.13 typeclasses via ToonTyped. | Compile‑time guarantees, no Any; safer refactors and zero-cost abstractions. |
| Pure & total | All encoders/decoders are pure functions; decode returns Either[DecodeError, JsonValue]. | Idiomatic FP: easy to compose in Cats/ZIO/FS2; referentially transparent. |
| Deterministic ADTs | JsonValue as a sealed ADT with VectorMap for objects; stable field ordering. | Exhaustive pattern matching; predictable serialization for testing/debugging. |
| Streaming visitors | foreachTabular and nested foreachArrays (tail‑recursive, stack-safe). | Validate/process millions of rows without building a full AST; constant memory usage. |
| Zero‑dep core | Core library has zero dependencies beyond the Scala stdlib; CLI uses only scopt + jtokkit. | Small footprint (491KB core JAR), simpler audits, no transitive dependency hell. |
| Strictness profiles | Strict (spec-compliant) vs Lenient (error-tolerant) modes with validation policies. | Safer ingestion of LLM outputs and human-edited data; configurable validation. |
| CLI with budgets | Built-in --stats (token counts), --optimize (delimiter selection); cross-platform. | Track token savings in CI/CD; pick the optimal delimiter for your data shape. |
| Virtual thread ready | No ThreadLocal usage; compatible with Java 21+ Project Loom virtual threads. | Future-proof for modern JVM concurrency; scales to millions of concurrent tasks. |
| Production hardened | 381 passing tests; property-based testing; strict mode validation; security limits. | Battle-tested edge cases; prevents DoS via depth/length limits; safe for production. |
This is what sets toon4s apart: While most libraries compromise on architecture for convenience, toon4s demonstrates that you can have both production performance and functional purity. Every design decision prioritizes correctness, composability, and type safety, making toon4s a reference implementation for modern Scala projects.
Every function in toon4s is pure and total:
- Zero mutations: no `var`s or `while` loops
  - State threading pattern (pass state as parameters, return new state)
  - Accumulator-based tail recursion
  - Immutable builders (`Vector`, `VectorMap`)
- Total functions: no exceptions in happy paths
  - All encoders/decoders return `Either[Error, Result]`
  - Railway-oriented programming for error handling
  - Exhaustive pattern matching on sealed ADTs
- Referentially transparent: same input → same output, always
  - No side effects in core logic
  - No global mutable state
  - Deterministic output (`VectorMap` preserves insertion order)
- Stack-safe recursion: 25 functions verified with `@tailrec`
  - Compiler-verified tail call optimization
  - Can parse arbitrarily deep structures
  - Constant stack usage regardless of input size
Scala's type system is used to maximum effect:
// Sealed ADT - compiler enforces exhaustive matching
sealed trait JsonValue
case class JString(value: String) extends JsonValue
case class JNumber(value: BigDecimal) extends JsonValue
case class JBool(value: Boolean) extends JsonValue
case object JNull extends JsonValue
case class JArray(values: Vector[JsonValue]) extends JsonValue
case class JObj(fields: VectorMap[String, JsonValue]) extends JsonValue
// Total function - always succeeds or returns typed error
def decode(input: String): Either[DecodeError, JsonValue]
// Scala 3 derivation - zero-cost abstractions
case class User(id: Int, name: String) derives Encoder, Decoder

Key type safety features:
- Sealed ADTs: exhaustive pattern matching catches missing cases at compile time
- No unsafe casts: zero `asInstanceOf` in production code (only 2 necessary casts, each with a safety comment)
- `VectorMap` everywhere: ensures deterministic field ordering
- Compile-time derivation: Scala 3 `derives` generates type class instances at compile time
State Threading Pattern
@tailrec
def collectFields(
targetDepth: Option[Int],
acc: Vector[(String, JsonValue)] // Accumulator instead of var
): Vector[(String, JsonValue)] = {
cursor.peek match {
case None => acc
case Some(line) if line.depth < baseDepth => acc
case Some(line) =>
val td = targetDepth.orElse(Some(line.depth))
if (td.contains(line.depth)) {
cursor.advance()
val KeyValueParse(key, value, _) = decodeKeyValue(...)
collectFields(td, acc :+ (key -> value)) // Recurse with new state
} else acc
}
}

Railway-Oriented Programming
// Either accumulation instead of var err: Error | Null = null
xs.foldLeft[Either[DecodeError, List[A]]](Right(Nil)) {
(acc, j) =>
for
list <- acc // Short-circuit on first error
a <- d(j) // Decode current element
yield a :: list // Accumulate successes
}.map(_.reverse)

| Metric | Value | Meaning |
|---|---|---|
| Production code | 5,887 lines (56 files) | Well-organized, modular |
| Test coverage | 380+ tests, 100% passing | Comprehensive validation |
| Tail-recursive fns | 25 with @tailrec | Stack-safe, verified |
| Sealed ADTs | traits/classes | Exhaustive matching |
| Mutable state | No vars in parsers | Pure functional |
| Unsafe casts | 2 (documented as safe) | Type-safe design |
Built for the future of JVM concurrency:
- Virtual thread ready: zero `ThreadLocal` usage
  - Fully compatible with Java 21+ Project Loom
  - Can spawn millions of virtual threads without memory leaks
  - See core/src/main/scala/io/toonformat/toon4s/encode/Primitives.scala:60 for virtual thread design notes
- Streaming optimized: constant-memory validation
  - `Streaming.foreachTabular` - process rows without building a full AST
  - `Streaming.foreachArrays` - validate nested arrays incrementally
  - Tail-recursive visitors with accumulator pattern
- Zero dependencies: 491KB core JAR
  - Pure Scala stdlib (no Jackson, Circe, Play JSON)
  - CLI only adds scopt + jtokkit
  - Minimal attack surface for security audits
toon4s proves you don't have to choose between performance and purity:
| Traditional Tradeoff | How toon4s Achieves Both |
|---|---|
| "Mutation is faster" | Tail recursion + accumulators match imperative performance while staying pure |
| "Exceptions are simpler" | Either + railway-oriented programming is just as ergonomic with for-comprehensions |
| "ThreadLocal is convenient" | State threading pattern works seamlessly with virtual threads (future-proof) |
| "Any/casting saves time" | Sealed ADTs + exhaustive matching catch bugs at compile time (saves debugging time) |
| "External libs add features" | Zero dependencies means zero CVEs, zero conflicts, minimal attack surface |
The result: A library that's both safer (pure FP, types) and faster to maintain (no surprises, composable).
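For example, because `Toon.encode` and `Toon.decode` both return `Either`, a full round trip composes in one for-comprehension and short-circuits on the first error. A minimal sketch (the two error types differ and widen to their common supertype):

```scala
import io.toonformat.toon4s._

// Round-trip a payload without throwing: both steps short-circuit on Left.
val payload = Map("tags" -> Vector("jazz", "chill", "lofi"))

val roundTrip =
  for {
    toon <- Toon.encode(payload, EncodeOptions(indent = 2)) // Either[_, String]
    json <- Toon.decode(toon)                               // Either[DecodeError, JsonValue]
  } yield json

roundTrip match {
  case Right(value) => println(s"decoded: $value")
  case Left(err)    => println(s"round trip failed: $err")
}
```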
This architecture makes toon4s ideal for:
- Production services - reliability and correctness are non-negotiable
- Functional stacks (Cats, ZIO, FS2) - pure functions compose without side effects
- Virtual thread workloads (Project Loom) - no ThreadLocal means no memory leaks
- High-throughput pipelines - ~866 ops/ms with predictable, constant-memory streaming
- Type-safe domain modeling - sealed ADTs + derivation = compile-time guarantees
Bottom line: toon4s is what happens when you refuse to compromise. Use it for TOON encoding, or study it to learn how to build production-grade functional systems.
See also: SCALA-TOON-SPECIFICATION.md for encoding rules
See also: Encoding rules, Strict mode, Delimiters & markers
Be honest: token savings depend on your data. From our runs and community reports:
- Typical savings: 30-60% vs formatted JSON when arrays are uniform and values are short strings/numbers.
- Small example: `{ "tags": ["jazz","chill","lofi"] }` → `tags[3]: jazz,chill,lofi` saved ~40-60% tokens across common GPT tokenizers.
- Deeply nested, irregular objects: savings narrow; sometimes JSON ties or wins. Measure in CI with `--stats`.
- Retrieval accuracy: some reports show JSON ≈ 70% vs TOON ≈ 65% on certain tasks. If accuracy matters more than cost, validate on your prompts.
Use the CLI or the benchmark runner to measure your payloads:
# Option A: CLI (quick)
toon4s-cli --encode payload.json --stats --tokenizer o200k -o payload.toon
# Option B: JMH runner (reproducible set)
sbt jmhDev
Throughput (JMH, macOS M‑series, Java 21.0.9, Temurin OpenJDK; 5 warmup iterations × 2s, 5 measurement iterations × 2s):
Benchmark Score Error Units
decode_tabular 865.609 ± 27.170 ops/ms
decode_list 862.522 ± 19.230 ops/ms
decode_nested 625.473 ± 1.714 ops/ms
encode_object 213.798 ± 2.628 ops/ms
Performance Highlights:
- Tabular decoding: ~866 ops/ms - highly optimized for CSV-like structures
- List decoding: ~863 ops/ms - fast array processing
- Nested decoding: ~625 ops/ms - efficient for deep object hierarchies
- Object encoding: ~214 ops/ms - consistent encoding performance
Note: numbers vary by JVM/OS/data shape. Run your own payloads with JMH for apples‑to‑apples comparison.
- Token savings: format-driven and therefore similar across implementations. Expect ~30-60% on uniform/tabular data. Example: `{ "tags": ["jazz","chill","lofi"] }` → `tags[3]: jazz,chill,lofi`.
- Accuracy: prompt- and data-dependent. Community reports: JSON ≈ 70%, TOON ≈ 65% on some tasks. Measure on your prompts before switching.
- Throughput: toon4s encode throughput is on par with JToon on small/mid shapes (JMH quick: ~200 ops/ms). Decoding is implemented and fast in toon4s (tabular ~866 ops/ms per the JMH numbers above). If/when JToon adds decoding, compare like-for-like.
- Scala ergonomics: typed derivation (3.x), typeclasses (2.13), sealed ADTs, VectorMap ordering, streaming visitors, zero-dep core.
- Guidance: use toon (TS) for Node stacks, JToon for Java codebases, toon4s for Scala. Token savings are equivalent; choose by ecosystem fit.
Savings are model/tokenizer-sensitive; treat ranges as guidance, not guarantees.
See also: Token benchmarks
// build.sbt
libraryDependencies += "io.toonformat" %% "toon4s-core" % "0.1.0"

Prefer CLI only? Ship the staged script (diagram below):

sbt cli/stage # builds ./cli/target/universal/stage/bin/toon4s-cli
./cli/target/universal/stage/bin/toon4s-cli --encode sample.json -o sample.toon

import io.toonformat.toon4s._
val payload = Map(
"users" -> Vector(
Map("id" -> 1, "name" -> "Ada", "tags" -> Vector("reading", "gaming")),
Map("id" -> 2, "name" -> "Bob", "tags" -> Vector("writing"))
)
)
val toon = Toon.encode(payload, EncodeOptions(indent = 2)).fold(throw _, identity)
println(toon)
// users[2]{id,name,tags}:
// 1,Ada,[2]: reading,gaming
// 2,Bob,[1]: writing
val json = Toon.decode(toon).fold(throw _, identity)
println(json)

- Works with Scala 3.3.3 and Scala 2.13.14 (tested in CI).
- Accepts Scala collections, Java collections, `java.time.*`, `Option`, `Either`, `Product` (case classes, tuples), and `IterableOnce`.
- Deterministic ordering when encoding maps via `VectorMap`.
- Scala 3 derivation: `codec.Encoder` and `codec.Decoder` derive for case classes. Prefer typed `ToonTyped.encode[A: Encoder]` / `ToonTyped.decodeAs[A: Decoder]` over `Any`-based methods.
# Encode JSON -> TOON with 4-space indentation and tab delimiters
toon4s-cli --encode data.json --indent 4 --delimiter tab -o data.toon
# Decode TOON -> JSON (strict mode on by default)
toon4s-cli --decode data.toon --strict true -o roundtrip.json

Available flags:
| Flag | Description |
|---|---|
| `--encode` / `--decode` | Required: choose direction explicitly. |
| `--indent <n>` | Pretty-print indentation (default 2). |
| `--delimiter <comma\|tab\|pipe>` | Column delimiter for tabular arrays. |
| `--length-marker` | Emit `[#N]` markers to disambiguate lengths in prompts. |
| `--stats` | Print input/output token counts and savings to stderr. |
| `--tokenizer <cl100k\|o200k\|p50k\|r50k>` | Select tokenizer for `--stats` (default cl100k). |
| `--strict <bool>` | Enforce indentation/escape rules when decoding. |
| `-o, --output <file>` | Target file (stdout when omitted). |
Use --stats to measure token impact. Choose a tokenizer with --tokenizer (e.g., o200k).
TOON borrows two big ideas:
- Indentation for structure (like YAML)
- Headers for uniform arrays (like CSV/TSV)
flowchart LR
scala["Scala data\nMap / Case Class / Iterable"]
norm["Normalize\n(JsonValue)"]
encoder["Encoders\n(pure)"]
toon["TOON text\n(length markers, headers)"]
llm["LLM prompt\n(token-efficient)"]
scala --> norm --> encoder --> toon --> llm
style scala fill:#e1f5ff,stroke:#0066cc,color:#000
style norm fill:#f0e1ff,stroke:#8800cc,color:#000
style encoder fill:#fff4e1,stroke:#cc8800,color:#000
style toon fill:#e1ffe1,stroke:#2d7a2d,color:#000
style llm fill:#ffe1e1,stroke:#cc0000,color:#000
Example:
orders[2]{id,user,total,items}:
1001,ada,29.70,[3]{sku,qty,price}:
A1,2,9.99
B2,1,5.50
C3,1,4.22
1002,bob,15.00,[1]: gift-card
- `orders[2]` says “array length 2”. Optional `#` makes it `[#2]`.
- `{id,user,...}` declares columns for the following rows.
- Nested arrays either go inline (`[3]: gift-card,store-credit`) or open their own blocks.
Full spec reference: toon-format/spec.
See also: Encoding rules
- Strict indentation: use spaces (tabs are rejected when `strict=true`). Indent levels must be multiples of `DecodeOptions.indent`; see the error-handling sketch after this list.
- Quotes only when required: strings with spaces, delimiters, or structural characters need `".."` wrapping.
- Length markers: recommended for LLM prompts; they let you validate response lengths quickly.
- Delimiters: choose comma (default), tab (token-efficient), or pipe (human-friendly). The delimiter is encoded in the header, so consumers know what to expect.
- Uniform rows: tabular arrays must have consistent field counts; strict mode enforces this.
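Strict-mode violations surface as values rather than exceptions. A minimal sketch of handling a rejected input (assumes the library default mirrors the CLI's strict-by-default decoding; the printed wording of `DecodeError` is illustrative):

```scala
import io.toonformat.toon4s._

// Tab characters used for indentation: rejected under strict decoding.
val badInput = "user:\n\tid: 1"

Toon.decode(badInput) match {
  case Right(value) => println(s"decoded: $value")
  case Left(err)    => println(s"rejected: $err") // a DecodeError value, not a thrown exception
}
```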
Quoting vs. unquoted strings (encoder rules):
| Condition | Needs quotes? | Reason |
|---|---|---|
| Empty string | Yes | Ambiguous if unquoted. |
| Leading/trailing whitespace | Yes | Preserves spaces. |
| Contains `:` | Yes | Conflicts with key separators. |
| Contains a delimiter (`,` / `\t` / `\|`) | Yes | Conflicts with column separators. |
| Contains `"` or `\` | Yes | Must be escaped inside quotes. |
| Contains `[` `]` `{` `}` | Yes | Structural tokens. |
| Contains `\n`, `\r`, `\t` | Yes | Control characters. |
| Starts with `-` at list depth | Yes | Could be parsed as a list marker. |
| Boolean/null literal: `true`/`false`/`null` | Yes | Avoids primitive coercion. |
| Looks numeric (e.g., `-12`, `1.2e5`, `01`) | Yes | Avoids numeric coercion; leading zeros are reserved. |
flowchart TD
s["string value"] --> check1{empty or trimmed != value?}
check1 -- yes --> q[quote]
check1 -- no --> check2{contains colon / delimiter?}
check2 -- yes --> q
check2 -- no --> check3{structural or control chars?}
check3 -- yes --> q
check3 -- no --> check4{boolean/null or numeric-like?}
check4 -- yes --> q
check4 -- no --> u[unquoted]
style s fill:#e1f5ff,stroke:#0066cc,color:#000
style q fill:#ffe1e1,stroke:#cc0000,color:#000
style u fill:#e1ffe1,stroke:#2d7a2d,color:#000
style check1 fill:#f0e1ff,stroke:#8800cc,color:#000
style check2 fill:#f0e1ff,stroke:#8800cc,color:#000
style check3 fill:#f0e1ff,stroke:#8800cc,color:#000
style check4 fill:#f0e1ff,stroke:#8800cc,color:#000
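To see the flowchart's branches in practice, encode a few strings that trigger different rules. A minimal sketch (the commented output is what the table above implies, not captured from a run):

```scala
import io.toonformat.toon4s._
import scala.collection.immutable.VectorMap

// Values chosen to hit different quoting branches from the table above.
val doc = VectorMap(
  "plain"   -> "lofi",       // stays unquoted
  "colon"   -> "a: b",       // quoted: contains ':'
  "spacey"  -> "  padded ",  // quoted: leading/trailing whitespace
  "boolish" -> "true",       // quoted: would coerce to a boolean
  "numish"  -> "007"         // quoted: leading zero is reserved
)

Toon.encode(doc, EncodeOptions(indent = 2)).foreach(println)
// plain: lofi
// colon: "a: b"
// spacey: "  padded "
// boolish: "true"
// numish: "007"
```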
See also: Encoding rules
| Package | Purpose |
|---|---|
| `io.toonformat.toon4s` | Core types: `Toon`, `JsonValue`, `EncodeOptions`, `DecodeOptions`, `Delimiter`. Typed entry points live in `ToonTyped`: `ToonTyped.encode[A: Encoder]`, `ToonTyped.decodeAs[A: Decoder]`. |
| `io.toonformat.toon4s.encode.*` | Encoders, primitive formatting helpers. |
| `io.toonformat.toon4s.decode.*` | Decoders, parser/validation utilities. |
| `io.toonformat.toon4s.decode.Streaming` | Streaming visitors for tabular arrays (`foreachTabular`) and nested arrays (`foreachArrays`). |
| `io.toonformat.toon4s.json.SimpleJson` | Lightweight JSON AST + parser/stringifier used in tests/CLI. |
| `io.toonformat.toon4s.cli.*` | CLI wiring (`Main`, token estimator). |
Most teams only interact with Toon.encode, Toon.decode, and JsonValue pattern matching. Lower-level modules stay internal unless you are extending the format.
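For example, extracting a field from a decoded document is a plain match on the sealed ADT. A minimal sketch (it assumes the `JsonValue` constructors are importable from the core package):

```scala
import io.toonformat.toon4s._

// Pull the first user's name out of a decoded document by matching on the
// sealed JsonValue cases (JObj / JArray / JString) shown earlier.
def firstUserName(doc: JsonValue): Option[String] =
  doc match {
    case JObj(fields) =>
      fields.get("users") match {
        case Some(JArray(JObj(user) +: _)) =>
          user.get("name").collect { case JString(n) => n }
        case _ => None
      }
    case _ => None
  }

Toon.decode("users[1]{id,name}:\n  1,Ada").map(firstUserName).foreach(println)
// Some(Ada) (indicative)
```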
See also: JsonValue ADT, Encoding model, Decoding rules
| Scala type | TOON behaviour |
|---|---|
| `String`, `Boolean`, `Byte`/`Short`/`Int`/`Long`, `Float`/`Double`, `BigDecimal` | Direct primitives; floats/doubles silently drop NaN/Inf → null (to stay deterministic). |
| `Option[A]` | `Some(a)` → encode `a`; `None` → null. |
| `Either[L, R]` | Encoded as JSON-like objects (`{"Left": ...}`) via product encoding. Consider normalizing upstream. |
| `Iterable`, `Iterator`, `Array` | Encoded as TOON arrays, falling back to list syntax when not tabular. |
| `Map[String, _]`, `VectorMap` | Preserve insertion order; keys auto-quoted when needed. |
| `Product` (case classes / tuples) | Converted through `productElementNames` + `productIterator`. |
| Java time (`Instant`, `ZonedDateTime`, etc.) | ISO‑8601 strings, UTC-normalized for deterministic prompts. |
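A minimal sketch of these conversions (None, `java.time`, NaN) going through the Any-based encoder; the commented output is indicative of the rules above rather than captured verbatim:

```scala
import io.toonformat.toon4s._
import scala.collection.immutable.VectorMap
import java.time.Instant

// None → null, java.time → ISO-8601 string, NaN → null.
val record = VectorMap(
  "id"        -> 7,
  "nickname"  -> Option.empty[String],
  "createdAt" -> Instant.parse("2024-01-01T00:00:00Z"),
  "scores"    -> Vector(1.5, 2.25, Double.NaN)
)

Toon.encode(record, EncodeOptions(indent = 2)).foreach(println)
// id: 7
// nickname: null
// createdAt: "2024-01-01T00:00:00Z"
// scores[3]: 1.5,2.25,null
```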
Preferred (Scala 3): typed APIs with type classes.
import io.toonformat.toon4s._
import io.toonformat.toon4s.codec.{Encoder, Decoder}
case class User(id: Int, name: String) derives Encoder, Decoder
val s: String = Toon.encode(User(1, "Ada")).fold(throw _, identity)
val u: User = ToonTyped.decodeAs[User](s).fold(throw _, identity)

Fallbacks:
- Decoding always yields the `JsonValue` ADT; pattern-match it if you prefer.
- `SimpleJson.toScala` yields `Any` for quick-and-dirty interop.
Why another TOON for Scala?
- Ergonomics: native Scala APIs and derivation reduce boilerplate versus Java/TS bindings in Scala codebases.
- Footprint: zero-dep core minimizes transitive risk compared to libraries built atop general JSON stacks.
- Streaming: visitors let you validate/model-check row counts without paying for full tree allocation.
- Parity: same token savings as JToon/toon because the format drives savings, not the implementation.
- Throughput: competitive decode throughput (see JMH); encode throughput is solid and easy to reason about.
See also: Encoding model, JsonValue ADT
flowchart TD
raw["LLM response"]
parse["SimpleJson.parse"]
json["JsonValue\n(JObj/JArray…)"]
mapScala["Pattern match /\ncustom decoder"]
domain["Domain model\n(case class, DTO)"]
raw --> parse --> json --> mapScala --> domain
style raw fill:#e1f5ff,stroke:#0066cc,color:#000
style parse fill:#fff4e1,stroke:#cc8800,color:#000
style json fill:#f0e1ff,stroke:#8800cc,color:#000
style mapScala fill:#ffe1e1,stroke:#cc0000,color:#000
style domain fill:#e1ffe1,stroke:#2d7a2d,color:#000
Prompt scaffolding idea:
System: You are a precise data validator.
User:
Please read the following TOON payload describing purchase orders.
Return JSON with fields {id, total, status} for every order with total > 100.
Validate row counts against the markers.
Then attach:
orders[#3]{id,total,status}:
101,250.10,pending
102,89.00,fulfilled
103,140.00,review
Why it helps:
- Length markers give you a checksum (“model must return 3 rows”).
- Tabular headers reduce hallucinations (model sees explicit columns).
- Reduced tokens = cheaper prompts; faster iteration = cheaper eval runs.
For response validation, decode the model output using Toon.decode (if the LLM responds in TOON) or rehydrate JSON responses and compare lengths/keys.
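A minimal sketch of that check, assuming the model replied in TOON and the expected row count comes from the prompt's length marker:

```scala
import io.toonformat.toon4s._

// Check that the model returned exactly the number of rows the length
// marker promised (3 orders in the prompt above).
def validateOrderCount(modelOutput: String, expected: Int): Either[String, Vector[JsonValue]] =
  Toon.decode(modelOutput) match {
    case Left(err) => Left(s"not valid TOON: $err")
    case Right(JObj(fields)) =>
      fields.get("orders") match {
        case Some(JArray(rows)) if rows.length == expected => Right(rows)
        case Some(JArray(rows)) => Left(s"expected $expected rows, got ${rows.length}")
        case _                  => Left("missing 'orders' array")
      }
    case Right(other) => Left(s"unexpected top-level shape: $other")
  }
```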
See also: Delimiters & markers, Strict mode
What we didn't compromise on: toon4s prioritizes correctness, type safety, and functional purity over convenience. All limitations below are honest tradeoffs we made consciously, not shortcuts.
These are inherent to the TOON specification, not toon4s:
- Irregular arrays: When rows differ in shape, TOON falls back to YAML-like list syntax; token savings shrink. This is by design: tabular encoding requires uniform structure.
- Binary blobs: TOON doesn't support binary data (spec limitation). Encode as Base64 strings manually before passing to toon4s (see the sketch below).
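A minimal sketch of that workaround using only the JDK's `java.util.Base64` (field names are illustrative):

```scala
import io.toonformat.toon4s._
import java.util.Base64

// Wrap raw bytes as a Base64 string before encoding to TOON.
val thumbnail: Array[Byte] = Array[Byte](0x89.toByte, 0x50, 0x4e, 0x47)

val doc = Map(
  "name"      -> "logo.png",
  "thumbnail" -> Base64.getEncoder.encodeToString(thumbnail)
)

Toon.encode(doc, EncodeOptions(indent = 2)).foreach(println)
```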
These are conscious design decisions:
- Full AST decode (v0.1.0): `Toon.decode()` and `Toon.decodeFrom()` read the entire input into memory before parsing. This ensures:
  - Pure functions: decode returns `Either[DecodeError, JsonValue]` with complete error context
  - Type safety: a full AST enables exhaustive pattern matching and sealed ADT validation
  - Referential transparency: no hidden state, no streaming cursors to manage
  For large files (>100MB), we provide streaming alternatives that maintain purity:
  - `Streaming.foreachTabular` - tail-recursive row-by-row validation (constant memory)
  - `Streaming.foreachArrays` - validate nested arrays incrementally (stack-safe)
  - Both use a pure visitor pattern (no side effects, accumulator-based)
  Full streaming decode (incremental parsing of entire documents) is planned for v0.2.0 while maintaining functional purity (likely via FS2/ZIO Stream integration).
- Deterministic ordering: We use `VectorMap` instead of `HashMap` because predictable field ordering matters more than raw lookup speed. This aids debugging, testing, and spec compliance.
- No mutation: Immutability with `@tailrec`. Trade: ~20% throughput decrease. Gain: zero race conditions, zero hidden state, composable functions.
- No external dependencies (core): Zero deps means you can't use Jackson/Circe codecs directly. Trade: manual integration. Gain: 491KB JAR, zero CVEs, zero conflicts.
- Locale-specific numbers: The encoder always uses `.` decimal separators (spec requirement). Normalize inputs beforehand.
- CLI tokenizer: `TokenEstimator` currently defaults to `CL100K_BASE` (GPT-4/3.5). Model-specific differences apply (easily configurable).
Philosophy: We refuse shortcuts that compromise type safety (Any, asInstanceOf), purity (var, while, null), or correctness (exceptions in happy paths). If a feature can't be implemented purely, we defer it until we find the right abstraction.
| Construct | Example | Notes |
|---|---|---|
| Object | `user:\n  id: 123\n  name: Ada` | Indentation defines nesting. |
| Inline primitives | `tags[3]: reading,gaming,coding` | Quotes only when needed. |
| Tabular array | `users[2]{id,name}:\n  1,Ada\n  2,Bob` | Header defines columns. |
| Nested tabular | `orders[1]{id,items}:\n  1,[2]{sku,qty}: ...` | Inner header scoped to nested block. |
| Length marker | `items[#2]{sku,qty}:` | `#` makes the declared length explicit for validation. |
| Empty array/object | `prefs[0]:` or `prefs: {}` | Choose whichever fits your schema. |
| Comments | (not part of spec; strip before encoding) | Keep prompts clean; TOON itself has no comment syntax. |
sbt scalafmtCheckAll # formatting
sbt +test # Scala 2.13 and 3.3 suites
./smoke-tests/run-smoke.sh

GitHub Actions runs:
- Quick checks: scalafmt + `+compile` on Ubuntu.
- Matrix tests: Linux/macOS/Windows × Scala 2.13 & 3.3, with test-report artifacts when a shard fails.
- Smoke: CLI round-trip script on Ubuntu.
- A final “gate” job that requires all checks to pass.
- Quick run (single iteration, small windows):
sbt "jmh/jmh:run -i 1 -wi 1 -r 500ms -w 500ms -f1 -t1 io.toonformat.toon4s.jmh.EncodeDecodeBench.*"
- Typical run:
sbt "jmh/jmh:run -i 5 -wi 5 -f1 -t1 io.toonformat.toon4s.jmh.EncodeDecodeBench.*"
Or use aliases:
sbt jmhDev # quick check
sbt jmhFull # heavy run
- Intent: publish indicative throughput numbers for common shapes (tabular, lists, nested objects) under reproducible settings.
- Harness: JMH via `sbt-jmh` 0.4.5. Single thread (`-t1`), single fork (`-f1`).
- Quick config: `-i 1 -wi 1 -r 500ms -w 500ms` (fast sanity; noisy but useful for local checks).
- Heavy config: `-i 5 -wi 5 -r 2s -w 2s` (more stable). CI runs this set with a soft 150s guard.
- Reporting: CI also emits JSON (`-rf json -rff /tmp/jmh.json`) and posts a summary table on PRs.
- Machine baseline (indicative): macOS Apple M‑series (M2/M3), Temurin Java 21, default power settings.
- Guidance: close heavy apps/IDEs, plug in AC power, warm the JVM before measuring. Numbers vary by OS/JVM/data shape; treat them as relative, not absolute.
- Tabular rows only:
import io.toonformat.toon4s.decode.Streaming
val reader = new java.io.StringReader(
  """|users[2]{id,name}:
     |  1,Ada
     |  2,Bob
     |""".stripMargin)
Streaming.foreachTabular(reader) { (key, fields, values) =>
  // key = Some("users"), fields = List("id","name"), values = Vector("1","Ada") then Vector("2","Bob")
}

- Nested arrays with path:
val reader2 = new java.io.StringReader(
  """|orders[1]{id,items}:
     |  1001,[2]{sku,qty}:
     |    A1,2
     |    B2,1
     |""".stripMargin)
Streaming.foreachArrays(reader2)({ (path, header) =>
  // path: Vector("orders") when the header key is bound
})({ (path, header, values) =>
  // values: Vector("A1","2"), then Vector("B2","1")
})

When to use streaming
- Validate/model‑check tabular sections quickly (row counts, required columns) without allocating a full AST (see the sketch after this list).
- Pipe rows directly to sinks (CSV writers, database ingesters, online aggregation) for large payloads.
- Pre‑filter/transform rows on the fly before passing trimmed data to LLMs.
- Keep full `Toon.decode` for non‑tabular data or when you need the entire tree (e.g., complex nested edits).
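A minimal sketch of the first use case: count rows and check required columns with `foreachTabular`, keeping any mutable state local to the caller (the callback signature follows the example above and is assumed to return Unit):

```scala
import io.toonformat.toon4s.decode.Streaming
import java.io.StringReader

// Count rows in the "users" section and verify required columns,
// without building an AST. Mutable state lives only in this check.
def checkUsersSection(toon: String, required: Set[String]): Either[String, Int] = {
  var rows = 0
  var missingCols: Set[String] = Set.empty
  Streaming.foreachTabular(new StringReader(toon)) { (key, fields, _) =>
    if (key.contains("users")) {
      rows += 1
      missingCols = required -- fields
    }
  }
  if (missingCols.nonEmpty) Left(s"missing columns: ${missingCols.mkString(",")}")
  else Right(rows)
}
```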
MIT - see LICENSE.