Kamal Tufekcic 1d99048c95

initial commit

Signed-off-by: Kamal Tufekcic <kamal@lo.sh>

2026-04-02 23:48:10 +03:00

704 KiB

Raw Permalink Blame History

Soliton Cryptographic Specification

1. Overview

Companion to LO Protocol Specification v1. Specifies all cryptographic protocols for authentication, key agreement, message encryption, signatures, and storage.

1.1 Design Philosophy

Unified key type: Identity = LO composite (X-Wing + Ed25519 + ML-DSA-65, where X-Wing = X25519 + ML-KEM-768). Pre-keys = X-Wing (X25519 + ML-KEM-768).
KEM-native: Key agreement via KEM, not Diffie-Hellman.
Hybrid everything: Classical + post-quantum for encryption and signatures.
Header-bound AAD: All DM ciphertext authentication binds the full message header, preventing header tampering.
Memory-safe C ABI: Rust core library with a stable C ABI (soliton_capi). All language bindings call through this ABI.
Versioned primitives: Crypto version tag on all key material and sessions.

1.2 Primitives (lo-crypto-v1)

Primitive	Algorithm	Reference
Hybrid KEM	X-Wing (X25519 + ML-KEM-768)	draft-connolly-cfrg-xwing-kem-09
Classical KEM	X25519	RFC 7748
Post-quantum KEM	ML-KEM-768	FIPS 203
Classical signature	Ed25519	RFC 8032
Post-quantum signature	ML-DSA-65	FIPS 204
KDF	HKDF-SHA3-256	RFC 5869
Hash	SHA3-256	FIPS 202
Symmetric	XChaCha20-Poly1305	RFC 8439 + HChaCha20 extension
MAC	HMAC-SHA3-256	RFC 2104
Password KDF	Argon2id	RFC 9106
Storage compression	Zstandard (zstd)	RFC 8878

1.3 Backend

The core library is pure Rust with zero C dependencies:

Crate	Algorithms
`curve25519-dalek`	X25519 (RFC 7748)
`ed25519-dalek`	Ed25519 signing/verification (RFC 8032)
`ml-kem`	ML-KEM-768 (FIPS 203)
`ml-dsa`	ML-DSA-65 (FIPS 204)
`chacha20poly1305`	XChaCha20-Poly1305 (RFC 8439 + HChaCha20)
`sha3`	SHA3-256
`hmac`, `hkdf`	HMAC-SHA3-256, HKDF-SHA3-256
`ruzstd`	Zstandard compression/decompression (§11, pure Rust)
`argon2`	Argon2id password-based key derivation (RFC 9106, §10.6)
`getrandom`	CSPRNG (OS entropy: `getrandom(2)`, `ProcessPrng`, `getentropy`, etc.)

All dependencies are exact-pinned. No C toolchain, cmake, or pkg-config required. Compiles with cargo build on any target including wasm32-unknown-unknown.

1.4 Notation

||          Concatenation of byte strings
x[a..b]    Half-open byte range: byte index a inclusive to b exclusive. x[0..32] selects 32 bytes (indices 0-31). Equivalent to x[a:b] in Python/Go, &x[a..b] in Rust, x.slice(a, b) in JavaScript, Arrays.copyOfRange(x, a, b) in Java. Programmers accustomed to inclusive-end notation must treat b as "one past the last index."
len(x)     Length of x in bytes, encoded as 2-byte big-endian (the prefix encodes the byte count of x itself; the 2-byte prefix is not included in the value)
big_endian_32(x)   4-byte big-endian encoding of unsigned 32-bit integer x. Not the same as len(x) — big_endian_32 always writes exactly 4 bytes and does not encode x as a length prefix.
HKDF(salt, ikm, info, len)   HKDF-SHA3-256 extract-and-expand (always both steps: RFC 5869 §2.2 Extract + §2.3 Expand. HKDF-Expand-only is never used.)
XWing.KeyGen()               → (pk, sk)
XWing.Encaps(pk)             → (ciphertext, shared_secret)
XWing.Decaps(sk, ct)         → shared_secret  // re-derives pk_X = X25519(sk_X, G) internally (the decapsulator's
                                               // OWN public key — NOT ct_X from the ciphertext; see §8.2 for the
                                               // most common X-Wing implementation error)
Ed25519.Sign(ed25519_sk, msg)  → sig (64 bytes)
Ed25519.Verify(ed25519_pk, msg, sig) → bool
MLDSA.Sign(sk, msg)          → sig (3309 bytes, hedged mode: FIPS 204 §6.2 Sign_internal with rnd=random(32))
                               // §6.2 is Sign_internal (the deterministic core); §5.2 is the external ML-DSA.Sign wrapper.
                               // sk is the 32-byte seed ξ — re-expanded per §8.5 before use. Not the 4032-byte expanded signing key.
MLDSA.Verify(pk, msg, sig)   → bool (Verify_internal per FIPS 204 §6.3 — see §3.1)
                               // §6.3 is Verify_internal (the deterministic core); §5.3 is the external ML-DSA.Verify wrapper.
HMAC-SHA3-256(key, data)     → 32-byte tag (first argument is always the HMAC key, second is the message)
AEAD(key, nonce, pt, aad)    → ciphertext || tag  (XChaCha20-Poly1305)
random_bytes(n)              → n cryptographically random bytes (OS CSPRNG via getrandom)
encode_session_init(h)       → deterministic binary (§7.4)
encode_ratchet_header(h)     → deterministic binary (§7.4)
SHA3-256(x)                  → 32-byte digest (FIPS 202; not Ethereum's Keccak-256, which uses 0x01 padding — FIPS 202 uses 0x06)
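The two integer encodings above are easy to confuse. The following is a minimal std-only Rust sketch of both. The function names are illustrative and not part of soliton's API. Returning None for inputs longer than 65,535 bytes is an assumption, since this section does not say how oversized values are handled.

```rust
/// len(x): 2-byte big-endian byte count of `x`, followed by `x` itself.
/// The prefix does not count its own two bytes.
fn len_prefixed(x: &[u8]) -> Option<Vec<u8>> {
    // Assumption: inputs longer than 65,535 bytes are rejected, not truncated.
    let n = u16::try_from(x.len()).ok()?;
    let mut out = Vec::with_capacity(2 + x.len());
    out.extend_from_slice(&n.to_be_bytes());
    out.extend_from_slice(x);
    Some(out)
}

/// big_endian_32(x): the integer itself in exactly 4 bytes; not a length prefix.
fn big_endian_32(x: u32) -> [u8; 4] {
    x.to_be_bytes()
}
```

For example, len("abc") writes 00 03, while big_endian_32(3) writes 00 00 00 03.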

Byte comparison convention: All lexicographic comparisons of byte strings throughout this spec (fingerprint sorting in §6.12, §9.2; key sorting in §9.2) use unsigned byte-by-byte comparison. Languages with signed byte types (Java byte, some C char implementations) must cast to unsigned before comparison — signed comparison reverses the ordering for bytes ≥ 0x80, producing different sort results and silently wrong AAD or verification phrases.
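In Rust, u8 slices already compare with exactly this convention: lexicographic, unsigned, and with a strict prefix ordered first. Sorting fingerprints therefore needs no custom comparator. The sketch below contrasts that with a deliberately buggy signed comparison that mimics Java's byte. Both function names are hypothetical.

```rust
use std::cmp::Ordering;

/// Conformant: unsigned, byte-by-byte comparison. Rust's `Ord` for `[u8]`
/// already implements exactly this.
fn compare_unsigned(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Non-conformant (illustration only): reinterpreting each byte as i8,
/// as Java's `byte` does, makes 0x80..=0xFF sort before 0x00..=0x7F.
fn compare_signed_buggy(a: &[u8], b: &[u8]) -> Ordering {
    let sa: Vec<i8> = a.iter().map(|&x| x as i8).collect();
    let sb: Vec<i8> = b.iter().map(|&x| x as i8).collect();
    sa.cmp(&sb)
}
```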

1.5 Channel 2 Scope (Metadata Exposure)

This library fully protects Channel 1 — the content and integrity of transmitted data: message confidentiality, authentication, forward secrecy, and replay prevention. It makes no guarantees about Channel 2 — the structural metadata of communication: who communicates with whom, when, how often, and in what pattern. This is an explicit design boundary, not a gap.

The following information is observable to a passive network adversary (one who can intercept but not modify traffic) and is out of scope for all security properties claimed in this document:

LO-KEX (§5)

Bundle fetch: the bundle relay server learns that party A intends to contact party B before any encryption begins.
Session initialization: the SessionInit message reveals that two specific fingerprints are beginning a session to any observer who can intercept it.
Failed session attempts: a responder that rejects a SessionInit (wrong crypto version, structural error) responds differently from one that never received it. An initiator can probe whether a party is online or running a specific version by observing response presence and timing. Probing resistance requires transport-layer measures outside this library's scope.

LO-Ratchet (§6)

Epoch transitions: pk_s in the cleartext header changes at each KEM ratchet step — a network observer can determine when a direction change occurred and count how many ratchet steps have taken place.
Message position: the counter n in the cleartext header reveals the message's position within the current epoch.
Previous epoch size: pn reveals how many messages were sent in the preceding epoch.
Ciphertext length: approximates plaintext length (compressed plaintext + 17-byte AEAD overhead). When compression is enabled, length leaks plaintext compressibility.

LO-Auth (§4)

Challenge issuance: the challenge ciphertext is sent in cleartext; its issuance reveals that an authentication attempt is in progress between a specific client and server.

Streaming AEAD (§11)

Stream header: base_nonce, version, and flags are transmitted in cleartext — their presence reveals that a stream is being established.
Chunk count: the number of chunks is observable from the ciphertext stream structure.
Chunk sizes: approximate plaintext chunk sizes (compressed size + 17-byte overhead per chunk).

Designing for Channel 2 protection: Applications requiring metadata privacy must add transport-layer measures on top of this library. Uniform message padding (all messages padded to fixed sizes) removes length leakage. Cover traffic removes frequency and timing leakage. Onion routing or a mix network removes connection-graph leakage. An encrypted transport tunnel wrapping LO-Ratchet output removes epoch-transition leakage from the header fields. These concerns are outside the scope of this library.

2. LO Composite Key

2.1 Key Generation

function GenerateIdentity():
    (xwing_pk, xwing_sk) = XWing.KeyGen()
    (mldsa_pk, mldsa_sk_expanded) = MLDSA.KeyGen()
    mldsa_sk = mldsa_sk_expanded.to_seed()    // Extract 32-byte seed ξ — NOT the 4032-byte expanded key (FIPS 204 §7.2, ML-DSA-65 sigKeySize)

    (ed25519_pk, ed25519_sk) = Ed25519.KeyGen()

    pk = xwing_pk || ed25519_pk || mldsa_pk   // 1216 + 32 + 1952 = 3200 bytes
    sk = xwing_sk || ed25519_sk || mldsa_sk   // 2432 (expanded X-Wing sk — see §8.5) + 32 + 32 = 2496 bytes

    fingerprint = hex(SHA3-256(pk))  // 32 bytes = 64 lowercase hex chars (a-f, 0-9)
    return (pk, sk, fingerprint)

MLDSA.KeyGen() returns an expanded key; to_seed() extracts the 32-byte seed before storage: Most ML-DSA library APIs return the fully expanded 4032-byte signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize) as mldsa_sk. soliton stores only the 32-byte seed ξ (§8.5). The to_seed() step therefore extracts ξ from the expanded form before it is assembled into the 2496-byte composite secret key.

Alternative (seed-first) pattern used by the reference implementation: The reference does NOT call MLDSA.KeyGen() followed by to_seed(). Instead, it draws ξ = random_bytes(32) directly from the OS CSPRNG and calls ML-DSA.KeyGen_internal(ξ) (FIPS 204 §6.1, a deterministic function of ξ). This yields both the public key and the signing handle without ever creating and then discarding the expanded 4032-byte key. It is the same pattern recommended below for libraries that do not expose ξ after KeyGen().

Both patterns produce the same stored 32-byte seed and are cryptographically equivalent. The seed-first pattern is cleaner because it avoids the to_seed() extraction step. The pseudocode above shows the KeyGen() + to_seed() pattern for generality; libraries that provide KeyGen_internal(ξ) or from_seed(ξ) constructors should use the seed-first pattern.

Wrong-form detection: A reimplementer who assembles sk = xwing_sk || ed25519_sk || mldsa_sk_expanded produces a 6496-byte secret key (2432 + 32 + 4032). That key fails the 2496-byte size check in IdentitySecretKey::from_bytes(), so this mistake is not silent. However, a reimplementer who writes their own construction without a size check may store the wrong form and only discover the mismatch at signing time.

XWing.KeyGen() uses X25519-first key layout — LO diverges from draft-09: The X-Wing secret key returned by XWing.KeyGen() is stored as sk_X (32 bytes) ‖ dk_M (2400 bytes) — X25519 component first, ML-KEM-768 expanded decapsulation key second. IETF draft-connolly-cfrg-xwing-kem-09 specifies the opposite order: dk_M ‖ sk_X. A reimplementer who follows draft-09's field ordering produces an incompatible 2432-byte secret key layout — ExtractXWingPrivate extracts the wrong bytes, decapsulation silently derives a wrong shared secret, and AEAD fails with no diagnostic pointing to the key-layout swap. The public key ordering is the same in both LO and draft-09 (X25519 public key first, ML-KEM-768 public key second). See §8.1 and §8.5 for the complete layout specification.

ML-KEM-768 KeyGen requires two independent 32-byte entropy draws: XWing.KeyGen() internally calls ML-KEM.KeyGen (FIPS 203 §7.1), which requires two independently-random 32-byte seeds d and z. The soliton reference implementation draws d and z independently from the OS CSPRNG (two separate getrandom calls). A reimplementer who derives both from a single seed (e.g., d = HKDF(seed, "d", 32), z = HKDF(seed, "z", 32)) produces a non-conforming key — the ML-KEM security proof requires that d and z are independently uniform; deriving both from a common secret violates this requirement and may weaken the IND-CCA2 security of the KEM. There is no structural or size-based signal that detects this mistake: the resulting keypair is the correct size, and encapsulation/decapsulation succeed normally. A conformance test MUST verify that d and z are generated by separate CSPRNG calls, not derived from a shared value.

Cross-library seed extraction is NOT portable via API name: to_seed(), signing_key.to_bytes()[0..32], seed(), private_key_bytes(), and similar method names are NOT equivalent across ML-DSA library implementations. In the Rust ml-dsa crate, to_seed() returns exactly ξ (the 32 bytes passed to ML-DSA.KeyGen_internal). In other libraries (BouncyCastle, Go's circl, liboqs), to_bytes()[0..32] may return the first bytes of the expanded signing key or a different internal representation — not the seed. The only portable cross-library verification: extract the candidate 32 bytes, call ML-DSA.KeyGen_internal(candidate) (FIPS 204 §6.1), and compare the resulting public key against the known public key. If they match, the candidate is ξ. Any candidate that does not round-trip to the known public key is not the seed, regardless of the API name used to extract it.
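The round-trip test can be written once, independent of any particular ML-DSA library. In this sketch, keygen_internal is a hypothetical caller-supplied closure that stands in for the library's FIPS 204 §6.1 entry point. It is not a real crate API.

```rust
/// Returns true only if `candidate` is exactly 32 bytes and
/// KeyGen_internal(candidate) reproduces `known_pk`.
/// `keygen_internal` is a hypothetical closure wrapping your library's
/// ML-DSA.KeyGen_internal and returning the encoded public key.
fn is_mldsa_seed<F>(candidate: &[u8], known_pk: &[u8], keygen_internal: F) -> bool
where
    F: Fn(&[u8; 32]) -> Vec<u8>,
{
    // ξ is exactly 32 bytes; a candidate of any other length cannot be the seed.
    let Ok(xi) = <&[u8; 32]>::try_from(candidate) else {
        return false;
    };
    // The API name that produced `candidate` is irrelevant; only the round trip counts.
    keygen_internal(xi) == known_pk
}
```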

If your ML-DSA library does not expose ξ after KeyGen() at all (e.g., liboqs, BouncyCastle, and PQClean C bindings expose only the expanded key form with no seed accessor): generate ξ = random_bytes(32) yourself from the OS CSPRNG, then call ML-DSA.KeyGen_internal(ξ) (FIPS 204 §6.1) directly to obtain the public key. This produces a valid keypair with ξ as the seed, bypassing the library's opaque KeyGen() entirely. See §8.5 for the two-level API pattern (KeyGen() vs KeyGen_internal(ξ)) and for what ML-DSA.KeyGen_internal MUST consume (no CSPRNG input — it is a pure deterministic function of ξ).

The hex-encoded fingerprint is for display and user-facing comparison (§9). All wire-format fields (sender_ik_fingerprint, recipient_ik_fingerprint, local_fp, remote_fp) use the raw 32-byte SHA3-256 digest, not the 64-character hex string.
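For display, the raw digest is hex-encoded in lowercase. A minimal sketch follows, assuming the 32-byte SHA3-256(pk) digest has already been computed elsewhere. The hashing itself is omitted so the example stays within Rust's standard library, and the function name is illustrative.

```rust
/// Display form of an identity fingerprint: 64 lowercase hex characters.
/// The wire format carries the raw 32-byte digest, not this string.
fn fingerprint_hex(digest: &[u8; 32]) -> String {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut s = String::with_capacity(64);
    for &b in digest {
        s.push(HEX[(b >> 4) as usize] as char); // high nibble first
        s.push(HEX[(b & 0x0f) as usize] as char); // then low nibble
    }
    s
}
```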

2.2 Component Extraction

function ExtractX25519Public(pk):     return pk[0..32]
function ExtractMLKEMPublic(pk):      return pk[32..1216]
function ExtractEd25519Public(pk):    return pk[1216..1248]
function ExtractMLDSAPublic(pk):      return pk[1248..3200]
function ExtractXWingPublic(pk):      return pk[0..1216]

function ExtractX25519Private(sk):    return sk[0..32]
function ExtractXWingPrivate(sk):     return sk[0..2432]  // sk_X(32) || dk_M(2400): X25519 scalar + ML-KEM-768 NTT-domain expanded decapsulation key — see §8.5
function ExtractEd25519Private(sk):   return sk[2432..2464] // 32-byte RFC 8032 seed s (RFC 8032 §5.1.5) — the raw random seed.
                                                              // NOT the SHA-512 hash of the seed, NOT the clamped scalar,
                                                              // NOT the 64-byte seed||public_key form (Go/libsodium default).
                                                              // ed25519_dalek::SigningKey::from_bytes() takes this exact form.
function ExtractMLDSAPrivate(sk):     return sk[2464..]     // 32 bytes (seed, NOT the 4032-byte expanded signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize) — see §8.5)
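The offsets above can be checked mechanically. This std-only Rust sketch uses hypothetical struct and function names. It validates the total size first, then borrows each component as a half-open sub-slice. Because the size is checked up front, sk[2464..2496] is equivalent to the spec's sk[2464..].

```rust
const PK_LEN: usize = 3200; // X-Wing pk (1216) || Ed25519 pk (32) || ML-DSA-65 pk (1952)
const SK_LEN: usize = 2496; // X-Wing sk (2432) || Ed25519 seed (32) || ML-DSA seed ξ (32)

struct PublicParts<'a> {
    x25519: &'a [u8],
    mlkem: &'a [u8],
    xwing: &'a [u8],
    ed25519: &'a [u8],
    mldsa: &'a [u8],
}

fn split_public(pk: &[u8]) -> Option<PublicParts<'_>> {
    if pk.len() != PK_LEN {
        return None;
    }
    Some(PublicParts {
        x25519: &pk[0..32],
        mlkem: &pk[32..1216],
        xwing: &pk[0..1216],
        ed25519: &pk[1216..1248],
        mldsa: &pk[1248..3200],
    })
}

struct SecretParts<'a> {
    x25519: &'a [u8],
    xwing: &'a [u8],
    ed25519_seed: &'a [u8],
    mldsa_seed: &'a [u8],
}

fn split_secret(sk: &[u8]) -> Option<SecretParts<'_>> {
    if sk.len() != SK_LEN {
        return None; // e.g. a 6496-byte key carrying the expanded ML-DSA form
    }
    Some(SecretParts {
        x25519: &sk[0..32],            // sk_X first (LO layout)
        xwing: &sk[0..2432],           // sk_X (32) || dk_M (2400)
        ed25519_seed: &sk[2432..2464], // RFC 8032 seed
        mldsa_seed: &sk[2464..2496],   // ξ
    })
}
```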

ML-DSA secret key is a 32-byte seed, not the expanded form: ExtractMLDSAPrivate returns a 32-byte seed (ξ), not the 4032-byte expanded signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize). The full expanded signing key is deterministically re-derived via ML-DSA.KeyGen_internal(ξ) (FIPS 204 §6.1) at signing time (§8.5). A reimplementer who stores the full expanded form produces a 6496-byte secret key (2432 + 32 + 4032). That key is incompatible with soliton's 2496-byte layout (2432 + 32 + 32). ExtractMLDSAPublic(pk) returns the standard 1952-byte FIPS 204 pkEncode public key, so no analogous storage divergence exists on the public side.

ML-KEM-768 sub-key sizes within X-Wing: The X-Wing components extracted by ExtractMLKEMPublic(pk) and ExtractXWingPrivate(sk) have fixed sub-structure (§8.1, §8.5): ML-KEM-768 public key (ek_PKE) = 1184 bytes (bytes 32-1215 of the X-Wing public key); ML-KEM-768 expanded secret key (dk_M) = 2400 bytes (bytes 32-2431 of the X-Wing secret key); ML-KEM-768 ciphertext = 1088 bytes (bytes 32-1119 of the X-Wing ciphertext). A reimplementer hard-coding the wrong ML-KEM-768 sub-key sizes (e.g., 1184 bytes for the secret key, which is the public key size) produces decapsulation keys that fail silently at AEAD — see §8.5 for the full dk_M field layout.

ML-KEM stores the full 2400-byte expanded decapsulation key; ML-DSA stores only the 32-byte seed: §2.1 explains that ML-DSA stores only ξ (32 bytes) and re-expands it at sign time via ML-DSA.KeyGen_internal(ξ). A reimplementer might apply the same reasoning to ML-KEM, storing only a seed and re-deriving the key at decapsulation time. This does NOT produce a compatible key. ML-KEM key generation is not driven by a single 32-byte seed: ML-KEM.KeyGen consumes two independent 32-byte values, d and z (§2.1 above). FIPS 203 does define a deterministic ML-KEM.KeyGen_internal(d, z), but soliton's format does not store (d, z). Instead, its secret key holds the full expanded 2400-byte dk_M (§8.5), returned by ExtractXWingPrivate(sk). A reimplementer who stores only a seed, whether 32 bytes or the 64-byte (d, z) pair, produces a layout-incompatible secret key.

IdentitySecretKey::from_bytes zeroizes the input buffer on the error path: from_bytes wraps the input in Zeroizing immediately on entry, so if InvalidLength is returned (wrong-size input), the caller's buffer is zeroed before the error propagates. This is a side-effect of the Rust Zeroizing wrapper — not a documented caller contract — but reimplementers and binding authors should be aware that a rejected-size buffer is zeroed. Callers who read the input back after a failed from_bytes call will find it zero. This side-effect does not apply to IdentityPublicKey::from_bytes (public keys are not secret; no zeroization on error).

Lazy validation: IdentityPublicKey::from_bytes() validates only the total size (3200 bytes). It does not parse or validate the X-Wing, Ed25519, or ML-DSA sub-key structures. Invalid sub-key bytes are accepted at construction and produce errors only at use time (Encaps, HybridVerify, etc.). For example, a 3200-byte all-zero input is accepted at construction. Encapsulation then fails at use time when ML-KEM rejects the zero key material during matrix expansion. Signature verification fails because the all-zero Ed25519 encoding decodes to a small-order point, which strict verification rejects (§3.2). This is intentional: sub-key validation requires algorithm-specific parsing (ML-KEM coefficient range checks, Ed25519 point decompression, ML-DSA matrix expansion), which is expensive and duplicated by the operations themselves. Reimplementers MUST NOT assume that a successfully constructed IdentityPublicKey contains valid sub-keys. The same applies to IdentitySecretKey::from_bytes(), which also validates the total size only.
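A size-only constructor of the kind described above might look like the following sketch. The type and error names mirror the spec's wording but are illustrative; this is not the reference implementation's code. The all-zero key is accepted, exactly as the text warns.

```rust
#[derive(Debug, PartialEq)]
enum KeyError {
    InvalidLength { expected: usize, got: usize },
}

/// Lazily validated public key: only the total length is checked here.
/// Sub-key validity is left to the operations that consume the key.
struct IdentityPublicKey(Vec<u8>);

impl IdentityPublicKey {
    const LEN: usize = 3200;

    fn from_bytes(bytes: &[u8]) -> Result<Self, KeyError> {
        if bytes.len() != Self::LEN {
            return Err(KeyError::InvalidLength { expected: Self::LEN, got: bytes.len() });
        }
        // No X-Wing, Ed25519 or ML-DSA parsing happens at construction.
        Ok(IdentityPublicKey(bytes.to_vec()))
    }

    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```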

Security note (identity key compromise): The identity secret key contains independent components: an X-Wing secret key (for KEM) and dedicated Ed25519 and ML-DSA-65 secret keys (for signing). A compromise of sk_IK yields both KEM decapsulation and hybrid-signature forgery capability. The X25519 scalar within X-Wing is used solely for KEM; it plays no role in signing.

2.3 X-Wing Operations

Encapsulation and decapsulation use only the X-Wing components (X25519 + ML-KEM-768). The ML-DSA component is not involved.

function Encaps(lo_pk):
    xwing_pk = ExtractXWingPublic(lo_pk)
    return XWing.Encaps(xwing_pk)

function Decaps(lo_sk, ciphertext):
    xwing_sk = ExtractXWingPrivate(lo_sk)
    return XWing.Decaps(xwing_sk, ciphertext)
    // Note: the X-Wing combiner (§8.2) requires pk_X — the decapsulator's own
    // X25519 public key — which is re-derived from sk_X on every call as
    // X25519(sk_X, G). It is NOT taken from the ciphertext. See §8.2.

3. Hybrid Signatures

All signatures in LO use Ed25519 + ML-DSA-65 in parallel. A signature is valid only if both components verify.

3.1 Signing

function HybridSign(lo_sk, message):
    ed25519_sk = ExtractEd25519Private(lo_sk)
    mldsa_sk = ExtractMLDSAPrivate(lo_sk)

    sig_classical = Ed25519.Sign(ed25519_sk, message)   // 64 bytes
    sig_pqc = MLDSA.Sign(mldsa_sk, message)            // 3309 bytes, hedged mode

    return sig_classical || sig_pqc                      // 3373 bytes total

Domain labels are applied by callers, not inside HybridSign: The message parameter passed to HybridSign(lo_sk, message) MUST already contain any domain-separation label. HybridSign performs no label prepending, wrapping, or modification of its own — it signs the raw bytes as provided. Domain labels (e.g., "lo-spk-sig-v1", "lo-kex-init-sig-v1") are concatenated at the call site before invoking HybridSign (examples: §5.3, §5.4 Step 6). A reimplementer who embeds label handling inside HybridSign produces signatures over different bytes than the call-site concatenation: every signature over a labeled message would silently double-apply the label, making all such signatures incompatible with conforming implementations. Concretely: HybridSign(sk, "lo-spk-sig-v1" ‖ payload) is correct; HybridSign(sk, payload) where HybridSign internally prepends "lo-spk-sig-v1" is incorrect.

Byte layout: The 3373-byte composite signature is a raw concatenation with no length prefixes, delimiters, or type markers. Ed25519 occupies bytes 0-63 (fixed 64 bytes per RFC 8032), ML-DSA-65 occupies bytes 64-3372 (fixed 3309 bytes per FIPS 204). A reimplementer who adds length prefixes or uses variable-length Ed25519 encodings (some libraries return r || s || recovery_id) produces incompatible signatures. Split at byte offset 64, unconditionally.
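The following is a minimal sketch of the raw concatenation and the unconditional split at offset 64. It uses hypothetical helper names and std-only Rust; the actual signing calls are out of scope here.

```rust
const ED25519_SIG_LEN: usize = 64;
const MLDSA65_SIG_LEN: usize = 3309;
const HYBRID_SIG_LEN: usize = ED25519_SIG_LEN + MLDSA65_SIG_LEN; // 3373

/// Raw concatenation: no length prefixes, delimiters or type markers.
fn join_hybrid_sig(ed: &[u8; ED25519_SIG_LEN], mldsa: &[u8; MLDSA65_SIG_LEN]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HYBRID_SIG_LEN);
    out.extend_from_slice(ed);
    out.extend_from_slice(mldsa);
    out
}

/// Check the total length first, then split at byte 64 unconditionally.
fn split_hybrid_sig(sig: &[u8]) -> Option<(&[u8], &[u8])> {
    if sig.len() != HYBRID_SIG_LEN {
        return None;
    }
    Some(sig.split_at(ED25519_SIG_LEN))
}
```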

HybridSign output is non-deterministic, so do NOT compare two signatures byte-for-byte: Two calls to HybridSign(sk, same_message) produce the same bytes 0-63, because Ed25519 is deterministic per RFC 8032. With overwhelming probability, however, they produce different bytes 64-3372, because ML-DSA-65 uses hedged signing with fresh 32-byte randomness on each call. Byte-equality comparison of two HybridSign outputs therefore fails in practice, even when both signatures are valid over the same message under the same key. Callers MUST use HybridVerify to check validity, never byte comparison. Systems that cache a signature (e.g., an SPK bundle signature) and later re-sign to compare will see a mismatch.

ML-DSA message is a single contiguous buffer: Sign_internal and Verify_internal receive the full message as a single flat byte string, not as a multi-part or streaming input. In Rust, the ml-dsa crate exposes sign_internal as signing_key.sign_internal(&[message], &rnd). The outer slice is a &[&[u8]] of parts that are absorbed sequentially into the internal SHAKE256 state, and soliton always passes a one-element slice containing the complete message buffer. A reimplementer whose ML-DSA library takes a multi-part interface for signing MUST likewise pass the whole message as a single part. A purely sequential-absorb interface would hash split parts identically to their concatenation. However, a library's multi-part API is not guaranteed to be pure concatenation, and any framing it adds would yield incompatible signatures. Passing exactly one part removes that dependency. The same applies to Verify_internal: if the library exposes a multi-part interface for verification (as some do, e.g., liboqs, BouncyCastle), the message MUST be passed as a single part. The ml-dsa Rust crate's verify path uses a flat &[u8], but other libraries may not.

ML-DSA internal API: Both signing and verification use the internal functions (Sign_internal / Verify_internal per FIPS 204 §6.2/§6.3). The context string and domain separator defined by the external (public) API, ML-DSA.Sign / ML-DSA.Verify (FIPS 204 §5.2/§5.3), are not applied. This is intentional: soliton's domain separation is handled at the protocol level (per-context labels in §3.4, Appendix A). Signatures produced by soliton are therefore not compatible with standalone FIPS 204 verifiers that apply the external API wrapper.

Reimplementer warning: ML-DSA libraries outside Rust (liboqs, PQClean, BouncyCastle, Go's circl) often expose only the external API. That API prepends a domain-separator byte (0x00), a one-byte context length, and the context string to the message before calling the internal functions. Passing an empty context string to the external API is NOT equivalent to calling Sign_internal, because the wrapper still prepends the two bytes 0x00 0x00. Reimplementers must either access the internal functions directly or confirm that their library provides a bypass for the external API's context/domain-separator wrapping. Using the external API produces signatures that are silently incompatible with soliton.

rnd MUST be exactly 32 bytes: FIPS 204 §6.2 defines rnd as a 256-bit string (32 bytes). Libraries that accept rnd as a variable-length slice may not validate its size, so passing 16 or 24 bytes can silently weaken the hedging entropy without any error signal. Implementations MUST generate exactly 32 random bytes and MUST NOT pass a shorter or longer buffer. The reference implementation uses Zeroizing<[u8; 32]> and passes it as a 32-byte slice; binding-layer callers MUST ensure their random-byte generation produces exactly 32 bytes.
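The difference between the external wrapper and soliton's internal-API convention comes down to the bytes handed to Sign_internal. The sketch below uses hypothetical function names. It builds the pure (non-pre-hash) FIPS 204 M' = 0x00 ‖ |ctx| ‖ ctx ‖ M and shows that even an empty context changes the message.

```rust
/// Message that FIPS 204's external pure ML-DSA.Sign / ML-DSA.Verify pass to
/// the internal functions: M' = 0x00 || one-byte |ctx| || ctx || M.
/// Returns None for contexts longer than 255 bytes, which FIPS 204 rejects.
fn fips204_external_message(ctx: &[u8], msg: &[u8]) -> Option<Vec<u8>> {
    let ctx_len = u8::try_from(ctx.len()).ok()?;
    let mut m_prime = Vec::with_capacity(2 + ctx.len() + msg.len());
    m_prime.push(0x00); // domain separator for pure (non-pre-hash) ML-DSA
    m_prime.push(ctx_len); // context length, one byte
    m_prime.extend_from_slice(ctx);
    m_prime.extend_from_slice(msg);
    Some(m_prime)
}

/// soliton's convention: Sign_internal / Verify_internal receive the caller's
/// bytes unchanged; any domain label is already part of `msg`.
fn soliton_internal_message(msg: &[u8]) -> Vec<u8> {
    msg.to_vec()
}
```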

rnd MUST be freshly drawn from the OS CSPRNG for each individual Sign_internal call: Pre-generating rnd once and reusing it across multiple signing calls defeats the hedge entirely. So does batching calls so that several signatures share the same rnd. With a constant rnd, every signature over the same message uses identical internal randomness. This reduces hedged signing to deterministic signing and re-enables the fault-injection attacks the hedge defends against. Each HybridSign call MUST draw 32 fresh bytes independently.

Hedged mode rationale: Hedged signing combines deterministic signing with 32 bytes of fresh randomness (rnd parameter), preventing fault-injection attacks that exploit deterministic nonce generation to extract the signing key. The rnd buffer is ephemeral secret material and MUST be zeroized immediately after Sign_internal returns — leaking it reduces hedged signing to deterministic signing, re-enabling the fault-injection attacks the hedge defends against. In Rust, wrapping in Zeroizing<[u8; 32]> handles this automatically; in C/Go/Python, the caller must explicitly zeroize the buffer.

Transient 4032-byte SigningKey zeroization: Every Sign_internal call re-expands the 32-byte seed ξ into the full 4032-byte signing key (§8.5). That key holds ρ, K, tr and the polynomial vectors s₁, s₂, t₀ as encoded by FIPS 204 skEncode; t₁ belongs to the public key. This transient 4032-byte signing key MUST be zeroized before deallocation. In Rust, the ml-dsa crate's SigningKey implements ZeroizeOnDrop, so the expanded key is automatically zeroized when the local variable goes out of scope after Sign_internal returns. In C/Go/Python implementations that call ML-DSA at a lower level, the caller MUST explicitly call memset_s (C) or an equivalent on the 4032-byte signing key buffer before freeing it. Note the asymmetry with rnd above. rnd (32 bytes) is documented explicitly because it is ephemeral entropy whose leakage breaks a security property. The 4032-byte SigningKey obligation is handled automatically in Rust, but it is equally important for C/Go/Python reimplementers: a leaked expanded signing key permits arbitrary ML-DSA-65 forgery.

Two secrets per Sign_internal call (summary for non-RAII implementations): Each Sign_internal call (one per HybridSign) creates two ML-DSA secret temporaries that must be zeroized. The first is the 32-byte hedged rnd buffer; leaking it re-enables fault-injection attacks. The second is the 4032-byte expanded ML-DSA-65 signing key; leaking it permits arbitrary forgery. In Rust, Zeroizing and ZeroizeOnDrop handle both automatically; C/Go/Python callers must zeroize both explicitly at each call site.
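For implementations without a zeroizing wrapper type, the obligation amounts to a non-elidable overwrite at the end of each call. The following std-only sketch is a simplified illustration of what the zeroize crate provides. The helper and type names are hypothetical. Production Rust code should prefer that crate, and C code should use memset_s or explicit_bzero.

```rust
/// Overwrite a secret buffer with zeros using volatile writes, which the
/// optimizer may not remove even if the buffer is never read again.
fn zeroize_bytes(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to one byte.
        unsafe { std::ptr::write_volatile(b as *mut u8, 0) };
    }
    // Prevent the compiler from reordering memory accesses across this point.
    std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
}

/// Hypothetical RAII holder for one per-call ML-DSA temporary
/// (the 32-byte rnd or the 4032-byte expanded signing key).
struct SecretBuf(Vec<u8>);

impl Drop for SecretBuf {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and unwinding panics.
        zeroize_bytes(&mut self.0);
    }
}
```

The sketch does not address one caveat. If a Vec-backed secret reallocates (for example, by growing after creation), the old allocation is freed without being zeroized. Secret buffers should therefore be allocated at their final size.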

sign_internal vs. verify_internal API asymmetry in the ml-dsa crate: In the ml-dsa Rust crate, sign_internal takes a &[&[u8]] (multi-part message slice), while verify_internal takes a flat &[u8]. Soliton always passes a one-element slice to sign_internal (signing_key.sign_internal(&[full_message], &rnd)), so the API difference is transparent at the call site. Reimplementers using a different ML-DSA library must check whether their library's Sign_internal and Verify_internal each accept a single flat buffer or a multi-part interface. They must also ensure that both call sites pass the full message as a single contiguous input. With a single part, the asymmetry cannot affect correctness. The SHA-3 sponge absorbs input sequentially, so absorbing a and then b yields the same state as absorbing a ‖ b. A multi-part interface given one part therefore hashes exactly the bytes a flat interface would. There is no chunk-boundary hazard when exactly one part is used.

3.2 Verification

function HybridVerify(lo_pk, message, signature):
    if len(signature) != 3373:
        raise InvalidLength   // Must check before slicing — a short input causes
                              // signature[64..3373] to panic or read out-of-bounds.
                              // Returns InvalidLength (not VerificationFailed) because
                              // the error is on a caller-supplied parameter size, not
                              // a cryptographic failure.
                              //
                              // Typed-language note: a reimplementation that uses a
                              // `HybridSignature` wrapper type enforcing the 3373-byte
                              // invariant at construction time (e.g., via `from_bytes`)
                              // satisfies this check at the type level — `hybrid_verify`
                              // itself need not repeat it. An auditor comparing the typed
                              // implementation against this pseudocode should treat the
                              // type-constructor check as conformant with the inline guard
                              // shown here.

    ed25519_pk = ExtractEd25519Public(lo_pk)
    mldsa_pk = ExtractMLDSAPublic(lo_pk)

    sig_classical = signature[0..64]
    sig_pqc = signature[64..3373]

    ok_classical = Ed25519.Verify(ed25519_pk, message, sig_classical)
    ok_pqc = MLDSA.Verify_internal(mldsa_pk, message, sig_pqc)

    // BOTH must pass. Both verifications are evaluated eagerly (no short-circuit).
    // The AND combination MUST be constant-time (e.g., subtle::Choice or equivalent
    // bitwise AND) — a naive boolean && or branch on ok_classical leaks which
    // component failed via timing, enabling targeted forgery of only the weaker component.
    // Eagerness and constant-time AND are JOINT requirements — either alone is insufficient:
    //   - Eager evaluation without CT AND: both calls run, but a branch on the combined result
    //     still leaks whether the result is true or false via timing.
    //   - CT AND without eager evaluation: the bitwise AND is constant-time, but only computing
    //     ok_pqc when ok_classical is true leaks that Ed25519 passed via a timing side-channel.
    // The correct implementation evaluates BOTH verify calls unconditionally, then combines
    // the results with a bitwise AND (or equivalent constant-time operation) and branches only
    // on the combined boolean — not on either individual result.
    return ok_classical AND ok_pqc
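The eager-evaluation and constant-time-AND requirements can be sketched as follows. The two verifier closures are hypothetical stand-ins for Ed25519 verify_strict and ML-DSA Verify_internal, with the key and message already bound. Library errors collapse to false before the AND. This is a std-only, best-effort sketch: the Rust compiler gives no constant-time guarantee, which is why the pseudocode suggests subtle::Choice or an equivalent.

```rust
/// Error cases mirroring the spec's InvalidLength / VerificationFailed.
#[derive(Debug, PartialEq)]
enum VerifyError {
    InvalidLength,
    VerificationFailed,
}

const HYBRID_SIG_LEN: usize = 3373;
const ED25519_SIG_LEN: usize = 64;

/// `verify_ed25519` / `verify_mldsa` receive the two signature halves;
/// `Err(())` models a library error such as a decode failure.
fn hybrid_verify_with<E, P>(sig: &[u8], verify_ed25519: E, verify_mldsa: P) -> Result<(), VerifyError>
where
    E: FnOnce(&[u8]) -> Result<bool, ()>,
    P: FnOnce(&[u8]) -> Result<bool, ()>,
{
    // Size check first: slicing a short input would panic.
    if sig.len() != HYBRID_SIG_LEN {
        return Err(VerifyError::InvalidLength);
    }
    let (sig_classical, sig_pqc) = sig.split_at(ED25519_SIG_LEN);

    // Eager evaluation: both verifiers always run, whatever the first returns.
    // Errors are collapsed to `false` so they cannot form a distinct outcome.
    let ok_classical = verify_ed25519(sig_classical).unwrap_or(false) as u8;
    let ok_pqc = verify_mldsa(sig_pqc).unwrap_or(false) as u8;

    // Bitwise AND (not `&&`); black_box discourages the optimizer from
    // reintroducing a short-circuit. The only branch is on the combined bit.
    let both = std::hint::black_box(ok_classical & ok_pqc);
    if both == 1 {
        Ok(())
    } else {
        Err(VerifyError::VerificationFailed)
    }
}
```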

Signature size validation: Before slicing, callers MUST verify len(signature) == 3373. Passing a shorter input causes the slice signature[64..3373] to panic or read out-of-bounds (language-dependent). More critically: some ML-DSA libraries return an error (not false) when given a wrong-size input. If that error propagates as a distinct failure mode rather than being collapsed to false before the AND combination, it breaks the constant-time AND requirement — the caller can distinguish "wrong size" from "right size but invalid" via timing or exception, leaking a distinguishing oracle. Any library error on a bad-size ML-DSA input MUST be treated as false for the AND combination, not propagated as a distinct exception.

HybridVerify returns InvalidLength for a wrong-size composite signature: When len(signature) ≠ 3373, HybridVerify returns InvalidLength before any slicing or cryptographic operation. This differs from the sub-component failure mappings (Ed25519 key import → VerificationFailed, ML-DSA signature decode → VerificationFailed), which fire during the verification operation itself. The top-level composite size check fires before any verification-layer operation begins and returns InvalidLength — the error is not oracle-exploitable because the attacker crafted the input and already knows whether its length is correct. A reimplementer who collapses this to VerificationFailed for consistency produces a divergent but still secure result; however, binding authors should document which error to expect.

Ed25519 verification strictness: Ed25519.Verify MUST use strict verification per RFC 8032 §5.1.7, rejecting non-canonical S values (S ≥ L), small-order public keys, and non-canonical point encodings. ZIP-215 permissive verification (as used by crypto/ed25519 in Go and some other libraries) is NOT compatible — it accepts signatures that soliton rejects, producing silent interoperability failures on HybridVerify. The implementation uses verify_strict() from ed25519-dalek. Reimplementers MUST verify their Ed25519 library defaults to strict mode or explicitly select it.

Caution ("strict mode" varies by library): Some Ed25519 libraries advertise a "strict" or "batch-compatible" mode that checks only S-canonicity (S < L, i.e., the scalar lies in [0, L−1]) but does NOT reject small-order public keys. Curve25519 has cofactor 8 and eight torsion points (points whose order divides 8). For a small-order public key A, the term [k]A in the verification equation [S]B = R + [k]A takes at most eight values whatever the message, so an attacker who chooses such a key can produce signatures that verify for many messages (for the identity point, every message) without holding any private key. In ed25519-dalek this rejection is part of verify_strict(), which soliton uses: it rejects small-order public keys (and small-order R), whereas key import via VerifyingKey::from_bytes only checks that the bytes decode to a curve point. Reimplementers using other libraries MUST explicitly verify that their "strict" mode includes small-order-key rejection; S-canonicity alone is insufficient.
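The small-order test itself is simple: decode the point, multiply it by the cofactor 8, and check for the identity. The sketch below shows that test in pure Python (stdlib only; Python is used for the examples in this section because soliton ships Python bindings). `is_small_order` and the helper names are hypothetical, the arithmetic is variable-time, and the code illustrates the check rather than replacing a vetted strict verifier.

```python
# Illustrative only: pure Python, NOT constant-time, not a substitute for a vetted library.
P = 2**255 - 19                                   # Curve25519 field prime
D = (-121665 * pow(121666, P - 2, P)) % P         # Edwards25519 curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)                 # a square root of -1 mod P


def decompress_point(enc: bytes):
    """Decode a 32-byte compressed Edwards25519 point (RFC 8032 §5.1.3); None if invalid."""
    if len(enc) != 32:
        return None
    y = int.from_bytes(enc, "little")
    sign = y >> 255                               # top bit carries the sign of x
    y &= (1 << 255) - 1
    if y >= P:
        return None                               # non-canonical y encoding
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)                  # candidate square root of x2
    if (x * x - x2) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P != 0:
        return None                               # x2 has no square root: not on the curve
    if x == 0 and sign == 1:
        return None
    if x & 1 != sign:
        x = P - x
    return (x, y)


def edwards_add(p1, p2):
    """Complete affine addition law for -x^2 + y^2 = 1 + d*x^2*y^2."""
    (x1, y1), (x2, y2) = p1, p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + y1 * x2) * pow(1 + t, P - 2, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, P - 2, P) % P
    return (x3, y3)


def is_small_order(enc: bytes) -> bool:
    """True if the encoded point is one of the eight torsion points, i.e. [8]A is the identity."""
    pt = decompress_point(enc)
    if pt is None:
        raise ValueError("not a valid compressed Edwards25519 point")
    for _ in range(3):                            # three doublings compute [8]A
        pt = edwards_add(pt, pt)
    return pt == (0, 1)                           # (0, 1) is the neutral element
```

A strict verifier applies this rejection to the public key (and to R) in addition to the S < L check, which is why S-canonicity alone does not cover it.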

Ed25519 key import failure during HybridVerify maps to VerificationFailed, not InvalidData: ExtractEd25519Public slices bytes 1216..1248 from the public key and passes them to the Ed25519 library's key import function (VerifyingKey::from_bytes in ed25519-dalek). If those 32 bytes are not a valid compressed Edwards point, the import fails. HybridVerify collapses this failure to VerificationFailed, not InvalidData — the key bytes are structurally the right length and format (the public key size was already validated by IdentityPublicKey::from_bytes), so the failure is a verification-layer rejection, not a parsing failure. A reimplementer whose Ed25519 library propagates import failures as exceptions must catch them before the AND combination and treat them identically to a verification failure. A library that silently accepts invalid compressed points at import and produces incorrect verification results (rather than an error) would diverge silently: an invalid Ed25519 sub-key would appear to "verify" as false when it should have failed on import, which coincidentally produces the same combined VerificationFailed result — but through the wrong code path, and only for signatures that happen to fail the boolean check. A reimplementer must confirm their Ed25519 library errors on invalid point encoding rather than silently accepting it.

ML-DSA public key import failure during HybridVerify maps to VerificationFailed, not InvalidData: ExtractMLDSAPublic slices bytes 1248..3200 from the public key and passes them to the ML-DSA library's key import function. If the library rejects those 1952 bytes at import, HybridVerify collapses the failure to VerificationFailed, not InvalidData, for the same reason as in the Ed25519 case above: the bytes have a valid length, so the rejection belongs to the verification layer. A reimplementer whose ML-DSA library propagates import failures as distinct exceptions must catch them before the AND combination and treat them as false.

ML-DSA infallible decode (third case): soliton's own ML-DSA implementation (the ml-dsa crate) accepts any 1952-byte input as a key; VerifyingKey::from_bytes is infallible for correctly sized inputs. This matches FIPS 204 pkDecode, which unpacks the t1 polynomials as 10-bit values, so every correctly sized input decodes to in-range coefficients. A key that does not belong to the signer simply makes verify_internal return false, which becomes VerificationFailed. This is a third behavior beside "reject at import" and "normalize at import": accept, then fail verification. It is safe for soliton (a bogus key can never validate a forged signature), so a reimplementer who reads "MUST confirm their ML-DSA library rejects invalid coefficient encodings" should not conclude that rejection at import is required. Any behavior that produces VerificationFailed for such a key (an import error or a failing verification) satisfies the security requirement. The remaining concern is libraries that re-encode key material non-canonically on import: a public key that round-trips through such a library's import→export cycle differs byte-for-byte from the original, causing HybridVerify to fail even for an authentic signature (ML-DSA verification hashes the public-key bytes, tr = H(pk), so a changed encoding changes the verified value). Reimplementers importing ML-DSA public keys from external libraries should apply the re-encode cross-check described in §8.5.

ML-DSA signature structural decode failure maps to VerificationFailed, not InvalidData: A 3309-byte ML-DSA signature can pass the size check (len(sig_pqc) == 3309) and still fail at the ML-DSA library's signature decode step, for example when its hint encoding is malformed (FIPS 204 HintBitUnpack returns ⊥ for such input). In the ml-dsa crate, Signature::decode() returns None for such inputs, and soliton maps this None to VerificationFailed, not InvalidData. The rationale: the size is correct, so the structural failure is a property of the signature bytes themselves, not of the API call. Reimplementers whose ML-DSA library exposes a two-step API (decode, then verify) must catch the decode failure explicitly and map it to VerificationFailed. If the decode failure propagates as a distinct exception or error type, it breaks the constant-time AND requirement: the caller can distinguish "decode failed" from "verify returned false" by exception type, even though the combined result is the same. The correct mapping is: any ML-DSA decode failure → ok_pqc = false → combined VerificationFailed. This failure mode is distinct from the wrong-size path (which fails before slicing, and hence before decode) and from the public-key path (import failure), so it requires its own catch in reimplementations.
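Taken together, the size check, the eager evaluation of both components, and the collapse of every component error to false can be sketched as follows. This is a minimal Python sketch under stated assumptions: `ed25519_verify` and `mldsa_verify` are hypothetical callables standing in for a strict Ed25519 verifier and ML-DSA-65 verify_internal, and Python gives no timing guarantees, so the sketch shows only the required control flow (in Rust the combination would use subtle::Choice or an equivalent).

```python
from typing import Callable

HYBRID_SIG_LEN = 3373   # 64-byte Ed25519 signature || 3309-byte ML-DSA-65 signature
ED25519_SIG_LEN = 64

Verifier = Callable[[bytes, bytes, bytes], bool]   # hypothetical: (public_key, message, signature) -> bool


class InvalidLength(Exception):
    pass


class VerificationFailed(Exception):
    pass


def _component_bit(verify: Verifier, pk: bytes, msg: bytes, sig: bytes) -> int:
    """Run one component verifier; key-import or decode errors collapse to 0, exactly like 'false'."""
    try:
        return int(bool(verify(pk, msg, sig)))
    except Exception:
        return 0


def hybrid_verify(ed_pk: bytes, mldsa_pk: bytes, message: bytes, signature: bytes,
                  ed25519_verify: Verifier, mldsa_verify: Verifier) -> None:
    # Composite size check first: InvalidLength, before any slicing or cryptographic work.
    if len(signature) != HYBRID_SIG_LEN:
        raise InvalidLength("composite signature must be 3373 bytes")
    sig_classical = signature[:ED25519_SIG_LEN]
    sig_pqc = signature[ED25519_SIG_LEN:]
    # Both components run unconditionally: no short-circuit on the classical result.
    ok_classical = _component_bit(ed25519_verify, ed_pk, message, sig_classical)
    ok_pqc = _component_bit(mldsa_verify, mldsa_pk, message, sig_pqc)
    # Bitwise AND, then a single branch on the combined result only.
    if (ok_classical & ok_pqc) != 1:
        raise VerificationFailed()
```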

3.3 Security Properties

Classical: Ed25519 provides 128-bit classical security (RFC 8032).
Post-quantum: ML-DSA-65 provides NIST Level 3.
Hybrid guarantee: Forgery requires breaking both simultaneously.
EUF-CMA: The parallel composition ("both must verify") is EUF-CMA secure if either component is (Bindel et al., PQCrypto 2017 — see Appendix D, "Hybrid Constructions").

3.4 Where Signatures Are Used

Signatures are used in two contexts in v1:

Pre-key bundle signing (§5.3): Identity key signs the signed pre-key's public key material.
Session initiation signing (§5.4 Step 6): Alice's identity key signs the encoded SessionInit, proving to Bob that the session was initiated by the holder of sk_IK_A.

Signatures are NOT used for:

Server-side authentication (KEM-based, §4).
Message encryption (symmetric, §7).
Ratchet key agreement (KEM-based, §6).

Header authentication without signatures: Since signatures are not used for message encryption, ratchet message headers are protected against tampering by AEAD AAD binding (§7.3) instead: each message's ciphertext authenticates the full encoded ratchet header (sender_fp || recipient_fp || header_bytes) as associated data. A tampered header (e.g., a modified kem_ct or a wrong n) causes AEAD authentication to fail at decryption. Signatures are therefore unnecessary for per-message header integrity; the AEAD tag provides it.

4. KEM-Based Authentication

4.1 Purpose

Proves possession of the private key corresponding to a claimed public identity. Only the legitimate key holder can decapsulate.

4.2 Protocol

Client                                      Server
  |                                           |
  |  --- lo_pk (3200 B) -------------------->  |
  |                                           |  xwing_pk = ExtractXWingPublic(lo_pk)
  |                                           |  (ct, ss) = XWing.Encaps(xwing_pk)
  |                                           |  token = HMAC-SHA3-256(ss, "lo-auth-v1")
  |                                           |  // ss zeroized immediately
  |                                           |
  |  <--- ct (X-Wing ciphertext) -----------  |
  |                                           |
  |  ss = XWing.Decaps(xwing_sk, ct)          |
  |  proof = HMAC-SHA3-256(ss, "lo-auth-v1")  |
  |  // ss zeroized immediately               |
  |                                           |
  |  --- proof (32 bytes) ----------------->  |
  |                                           |
  |                                           |  constant_time_eq(proof, token)
  |                                           |  // token zeroized after verify
  |                                           |
  |  <--- READY or ERROR -------------------  |

The three protocol steps map one-to-one onto the three CAPI entry points: the server's encapsulate-and-token step is soliton_auth_challenge (it issues the ciphertext and keeps the token), the client's decapsulate-and-proof step is soliton_auth_respond (it consumes the ciphertext and returns the proof), and the server's comparison step is soliton_auth_verify (it checks the proof against the stored token).

The X-Wing ciphertext ct is exactly 1120 bytes (32 bytes ct_X + 1088 bytes ct_M — see Appendix C). The proof value proof / token is 32 bytes (HMAC-SHA3-256 output).

General HMAC encoding rule — raw data, no length prefix: Throughout this protocol, HMAC data arguments are passed as raw bytes with no length prefix. This is the opposite of HKDF info fields (§5.4, §6.12), which use len(x) || x length-prefixed encoding. The distinction: HKDF info uses length prefixes because it concatenates multiple variable-length fields into a single domain-separation string; HMAC data is always a single, fixed-purpose input (a domain label or a counter byte) where no length prefix is needed. A reimplementer who applies the HKDF len(x) || x convention to HMAC data arguments produces a different MAC output with no error signal.

Convention: HMAC-SHA3-256(key, message). The shared secret ss is the HMAC key and the domain label "lo-auth-v1" is the message. ss is the key because it is the high-entropy secret material; the label is the domain separator. HMAC's security requires the key to be the secret: placing ss in the data argument and the label in the key would produce a MAC keyed by a public constant, which anyone who knows the label can compute. Per the encoding rule above, the label is passed as 10 raw ASCII bytes with no length prefix. A reimplementer who applies the §5.4 len(x) || x convention here would prepend a 2-byte BE length (0x00 0x0a) and produce a different token. In C, use strlen("lo-auth-v1") (= 10), not sizeof("lo-auth-v1") (= 11): sizeof includes the NUL terminator, producing an 11-byte input that yields a silently different HMAC token.
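A minimal sketch of the token computation and comparison, using Python's stdlib `hmac` and `hashlib.sha3_256`. The function names are hypothetical and `ss` is a placeholder rather than real X-Wing output. The sketch also computes the three mistakes described above, each of which yields a silently different value.

```python
import hashlib
import hmac

AUTH_LABEL = b"lo-auth-v1"   # 10 raw ASCII bytes: no NUL terminator, no length prefix


def auth_token(ss: bytes) -> bytes:
    """HMAC-SHA3-256 with the X-Wing shared secret as KEY and the label as MESSAGE."""
    if len(ss) != 32:
        raise ValueError("X-Wing shared secret must be 32 bytes")
    return hmac.new(ss, AUTH_LABEL, hashlib.sha3_256).digest()   # full 32 bytes, no truncation


def auth_verify(expected_token: bytes, proof: bytes) -> bool:
    """Compare all 32 bytes in constant time (hmac.compare_digest)."""
    return len(proof) == 32 and hmac.compare_digest(expected_token, proof)


ss = bytes(range(32))   # placeholder shared secret, for illustration only
token = auth_token(ss)

# Common mistakes; each produces a different, incompatible value:
with_nul = hmac.new(ss, AUTH_LABEL + b"\x00", hashlib.sha3_256).digest()        # C sizeof() includes NUL
prefixed = hmac.new(ss, b"\x00\x0a" + AUTH_LABEL, hashlib.sha3_256).digest()    # HKDF-style length prefix
swapped = hmac.new(AUTH_LABEL, ss, hashlib.sha3_256).digest()                   # key and message swapped
```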

4.3 Security Properties

Key possession proof: Only private key holder can produce correct HMAC.
Replay resistance (intra-connection): Fresh randomness per encapsulation prevents stale-proof replay within a connection: a proof computed for an earlier challenge cannot answer a new one, because each proof is an HMAC keyed by the ss of its own encapsulation. Fresh randomness alone does not prevent cross-connection replay: an adversary who captures a valid (ct, proof) pair can replay it against a different server instance that can be induced to issue the same ct (e.g., by replaying the encapsulation step). Cross-connection replay resistance requires the transport-layer session binding documented in §4.4; without it, the 30-second timeout only limits the replay window.
Post-quantum: X-Wing hybrid construction.
No signature required: Pure KEM paradigm.

4.4 Requirements

Server MUST validate that the client's lo_pk is exactly 3200 bytes before beginning authentication. Accepting a short or oversized public key and then slicing into it for ExtractXWingPublic causes out-of-bounds access or reads from the wrong offset, producing a pseudorandom shared secret and a silent HMAC mismatch that is indistinguishable from an authentication failure. Length validation MUST precede encapsulation. A server that defers this check until Encaps incurs the full cost of an X-Wing encapsulation before discovering the malformed key, which is a pre-authentication DoS vector: an unauthenticated client can force repeated expensive encapsulations by sending malformed keys.
Wrong-length lo_pk MUST collapse to a generic authentication failure response: Returning a distinguishable error code for a wrong-length vs. correct-length key creates a length-probing oracle — an adversary can probe response codes to confirm whether a key is the expected size before committing to a full authentication attempt. Any authentication failure (wrong-length key, malformed key, correct-length key with bad cryptographic content) MUST produce the same externally observable outcome: generic authentication failure or connection close. The length check MUST still be enforced internally (to prevent the out-of-bounds access described above), but the error response to the client MUST NOT distinguish length failures from other authentication failures.
Server MUST use fresh randomness per encapsulation, and each (ct, token) pair MUST be delivered to the client at most once: caching and redelivering a previously generated ciphertext is forbidden. Redelivery gives an adversary additional observations of the same pair beyond the 30-second timeout window; an attacker who captures a replayed pair can attempt offline HMAC forgery against the same ss. Fresh randomness prevents entropy reuse, but delivery-uniqueness is a separate, additional requirement.
HMAC comparison MUST be constant-time (subtle::ConstantTimeEq). The comparison uses the full 32-byte HMAC-SHA3-256 output — no truncation. Implementations using a "HMAC-with-length" API parameterized on output length MUST request 32 bytes and compare all 32 bytes. A truncated comparison (e.g., 16 bytes) weakens forgery resistance from 256-bit to 128-bit and produces an incompatible proof token that fails on conforming servers.
Shared secret MUST be zeroized immediately after proof computation.
Auth token (proof HMAC) MUST be zeroized by the server immediately after the constant-time comparison. A server that retains the token in a session cache (e.g., for re-authentication within the 30-second window) enables token replay — an attacker who observes the token can resubmit it on the same connection before expiry. The 30-second timeout bounds but does not eliminate this window. Single-use: one comparison, then zeroize.
Label "lo-auth-v1" is a domain separator preventing cross-protocol attacks.
Transport-layer session binding: The proof token binds no server identity, timestamp, or connection identifier — replay resistance depends entirely on the transport layer binding the issued ciphertext to the specific connection on which it was issued. The server MUST reject a proof token received on any connection other than the one on which it issued the ciphertext. A token that escapes its connection context (e.g., via session hijacking or protocol downgrade) is replayable against any server that would issue the same ciphertext. The 30-second timeout bounds the window but does not replace the connection-binding requirement — without it, the timeout merely limits the replay window rather than preventing replay entirely.
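The server-side requirements above can be sketched as per-connection state. This Python sketch assumes a hypothetical `xwing_encaps` callable (Python's stdlib has no X-Wing) and treats one `AuthServerConnection` object as one transport connection, which is how it models connection binding. Zeroization in Python is best-effort only, because the runtime may keep copies of immutable `bytes`.

```python
import hashlib
import hmac
from typing import Callable, Optional, Tuple

LO_PK_LEN = 3200
XWING_PK_LEN = 1216            # ExtractXWingPublic: bytes 0..1216 of the LO composite key
AUTH_LABEL = b"lo-auth-v1"

Encaps = Callable[[bytes], Tuple[bytes, bytes]]   # hypothetical binding: xwing_pk -> (ct, ss)


class AuthFailed(Exception):
    """The only error a client ever sees, whichever step failed."""


class AuthServerConnection:
    """Per-connection LO-Auth state: one challenge, one single-use token."""

    def __init__(self, xwing_encaps: Encaps):
        self._encaps = xwing_encaps
        self._token: Optional[bytearray] = None

    def challenge(self, lo_pk: bytes) -> bytes:
        # Length check before Encaps: no expensive work for malformed keys, and the
        # same generic failure as every other authentication error.
        if len(lo_pk) != LO_PK_LEN:
            raise AuthFailed()
        ct, ss = self._encaps(lo_pk[:XWING_PK_LEN])   # the binding must use fresh randomness
        ss_buf = bytearray(ss)
        try:
            self._token = bytearray(hmac.new(bytes(ss_buf), AUTH_LABEL, hashlib.sha3_256).digest())
        finally:
            ss_buf[:] = bytes(len(ss_buf))            # best-effort wipe of the shared secret
        return ct

    def verify(self, proof: bytes) -> None:
        token, self._token = self._token, None        # consumed on first use, pass or fail
        if token is None:
            raise AuthFailed()
        try:
            if len(proof) != 32 or not hmac.compare_digest(bytes(token), proof):
                raise AuthFailed()
        finally:
            token[:] = bytes(len(token))              # best-effort wipe of the token
```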

4.5 Error Variants

Function	Error	Condition
`soliton_auth_challenge` (CAPI only)	`InvalidLength`	`lo_pk` not exactly 3200 bytes. Rust API note: the Rust `auth_challenge(client_pk: &IdentityPublicKey)` takes a typed reference, so the size is enforced by the type system and `InvalidLength` cannot be returned; this guard exists only in the CAPI wrapper.
`soliton_auth_respond` (CAPI only)	`InvalidLength`	`ct` not exactly 1120 bytes. Rust API note: the Rust `auth_respond(ct: &xwing::Ciphertext)` takes a typed reference whose size is enforced at construction by the `xwing::Ciphertext` type, so `InvalidLength` cannot be returned; this guard exists only in the CAPI wrapper.
`soliton_auth_verify` (CAPI only)	`InvalidLength`	`expected_token` not exactly 32 bytes. Checked first, before `auth_proof`; the Rust API note on the next row applies.
`soliton_auth_verify` (CAPI only)	`InvalidLength`	`auth_proof` not exactly 32 bytes. Rust API note: the Rust `auth_verify(expected: &[u8; 32], proof: &[u8; 32]) -> bool` takes fixed-size array references, so wrong-size inputs are rejected at compile time and `InvalidLength` cannot be returned. Both length guards exist only in the CAPI wrapper (`soliton_auth_verify`), which receives raw pointers and lengths.
`soliton_auth_verify` (CAPI only)	`VerificationFailed`	Constant-time comparison failed (proof ≠ token). Rust API note: the Rust `auth_verify(expected: &[u8; 32], proof: &[u8; 32]) -> bool` returns `false` on mismatch; `VerificationFailed` is returned only by the CAPI wrapper.

External error collapsing requirement (see also §4.4): Callers MUST map all LO-Auth failures — InvalidLength from any step, VerificationFailed from auth_verify — to the same external authentication-failure response (e.g., connection close or generic error code). Returning a distinguishable error per step (e.g., "wrong key size" vs. "HMAC mismatch") enables an oracle: an attacker can probe which step failed and thereby determine whether the submitted key is the correct length (step 1 passes), whether the ciphertext was accepted (step 2 passes), and whether the HMAC matched (step 3 passes) — progressively confirming each layer of the authentication independently. All failures must be indistinguishable externally, regardless of which step triggered them.

5. LO-KEX: KEM-Based Key Agreement

5.1 Goals

Mutual authentication (both cryptographic: recipient via KEM; initiator via HybridSign over SessionInit — see §5.6).
Forward secrecy via pre-key rotation and single-use OPKs.
Post-quantum security via X-Wing.
Offline initiation via pre-keys.
Multi-key session binding (session key requires compromise of both IK and SPK).

5.2 Key Material

Key	Type	Size (pk)	Lifetime	Purpose
Identity Key (IK)	LO composite	3200 B	Long-term	Auth, signing
Signed Pre-Key (SPK)	X-Wing	1216 B	~weekly	Session initiation
One-Time Pre-Keys (OPK)	X-Wing	1216 B	Single use	Enhanced forward secrecy

Pre-keys are X-Wing only (no ML-DSA) because they need KEM, not signing.

OPK secret key storage format: OPK secret keys use the same expanded 2432-byte X-Wing secret key format as IK and SPK (§8.5): a 32-byte X25519 scalar || a 2400-byte ML-KEM-768 decapsulation key (NTT-domain). The table above shows the public key size (1216 bytes); the stored secret key is 2432 bytes. Storing only a 32-byte seed and re-deriving the expanded key at use is NOT supported; soliton stores the expanded form directly.
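The stored layout can be illustrated with a small Python helper. `split_xwing_sk` is a hypothetical name; the sketch only checks the size and splits the bytes at the 32-byte boundary.

```python
from typing import Tuple

X25519_SK_LEN = 32
MLKEM768_DK_LEN = 2400
XWING_SK_LEN = X25519_SK_LEN + MLKEM768_DK_LEN   # 2432 bytes, same format for IK, SPK and OPK


def split_xwing_sk(sk: bytes) -> Tuple[bytes, bytes]:
    """Split a stored expanded X-Wing secret key into (X25519 scalar, ML-KEM-768 decapsulation key)."""
    if len(sk) != XWING_SK_LEN:
        # A 32-byte seed (or any other length) is not an accepted storage format.
        raise ValueError(f"expected {XWING_SK_LEN}-byte expanded secret key, got {len(sk)}")
    return sk[:X25519_SK_LEN], sk[X25519_SK_LEN:]
```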

SPK private key retention after rotation: Rotating to a new SPK does NOT immediately delete the old SPK private key. The old private key MUST be retained for 30 days after rotation (Appendix B) to allow in-flight sessions that encapsulated to the old SPK to complete. Deleting the private key at rotation time causes silent InvalidData rejections for any SessionInit that arrived after rotation but was encapsulated to the pre-rotation SPK. After the 30-day window, the private key MUST be deleted — retaining it beyond that date extends the forward-secrecy exposure window. See §5.5 Step 4 and §10.2 for the deletion obligation and its security implications.

5.3 Pre-Key Bundle

Published to the user's home DM relay:

PreKeyBundle = {
    IK_pub:         LO composite public key (3200 bytes)
    crypto_version: "lo-crypto-v1"
    SPK_pub:        X-Wing public key (1216 bytes)
    SPK_id:         uint32
    SPK_sig:        Hybrid signature (3373 bytes)
    OPK_pub:        X-Wing public key (1216 bytes) [optional]
    OPK_id:         uint32 [optional]
}

OPK_pub and OPK_id must be both present or both absent.

SPK_id uniqueness obligation: SPK_id MUST be unique per server identity within the 30-day SPK retention window (§10.2). If a new SPK is generated with the same SPK_id as a rotated-out SPK that is still in its grace period, receive_session will silently retrieve the wrong secret key for that ID, producing AeadFailed with no diagnostic. A monotonic counter (incrementing on each SPK rotation) satisfies this constraint. Random 32-bit IDs are also acceptable given the collision probability over typical rotation schedules: with weekly rotation, about five IDs are retained in a 30-day window, so the chance that any two collide is roughly (5·4/2) / 2³² ≈ 2 × 10⁻⁹. Relay implementations MUST NOT reuse an SPK_id until the previous SPK with that ID has been fully deleted from the grace-period store. Note that SPK_id is a server-assigned opaque identifier: the reference implementation does not specify or enforce an allocation policy, so uniqueness is a server-side obligation.
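One way to meet the uniqueness obligation is a wrapping 32-bit monotonic counter that also skips any ID still held in the grace-period store. The Python class below is a hypothetical sketch of that policy, not part of soliton, which leaves allocation to the server.

```python
from typing import Set

ID_SPACE = 2**32   # SPK_id is a uint32


class SpkIdAllocator:
    """Hypothetical monotonic SPK_id allocator that never reuses an ID still retained."""

    def __init__(self, start: int = 0):
        self._next = start % ID_SPACE
        self._retained: Set[int] = set()   # current SPK plus SPKs inside the 30-day window

    def allocate(self) -> int:
        for _ in range(ID_SPACE):
            candidate = self._next
            self._next = (self._next + 1) % ID_SPACE   # wraps after 2^32 - 1
            if candidate not in self._retained:
                self._retained.add(candidate)
                return candidate
        raise RuntimeError("every SPK_id is still retained")

    def release(self, spk_id: int) -> None:
        """Call only after the SPK private key has been deleted at the end of its retention window."""
        self._retained.discard(spk_id)
```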

Wire format: The pre-key bundle is a transport-layer struct, and soliton does not define a normative binary encoding for it (unlike SessionInit, which has encode_session_init in §7.4). The transport protocol serializes the bundle for relay storage and retrieval; field ordering and encoding are protocol-spec concerns. The reference encoding below is provided for relay interoperability, and implementations that adopt it are subject to the decoder-strictness rules that follow it:

encode_prekey_bundle(b) =
    len(b.crypto_version) || b.crypto_version        // UTF-8, 2-byte BE len
 || b.IK_pub                                          // 3200 bytes (fixed, no length prefix)
 || b.SPK_pub                                         // 1216 bytes (fixed, no length prefix)
 || big_endian_32(b.SPK_id)
 || b.SPK_sig                                         // 3373 bytes (fixed, no length prefix)
 || if OPK present: 0x01 || b.OPK_pub || big_endian_32(b.OPK_id)
    else:           0x00

Decoder strictness: A conforming decoder for encode_prekey_bundle MUST reject: (1) any has_opk byte other than 0x00 or 0x01 — values 0x02-0xFF are invalid and MUST return InvalidData; (2) any trailing bytes after the last field — accept only the exact length implied by has_opk. Compare with §7.4's explicit "Trailing bytes after the last field → InvalidData" rule for decode_session_init. A decoder that accepts has_opk = 0x02 as "OPK present" produces the same parsed output as has_opk = 0x01 but allows an attacker to craft bundles that pass decoding with non-canonical bytes, creating format-malleability.
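A Python sketch of the reference encoding and a strict decoder follows. It is illustrative only: the `PreKeyBundle` dataclass and the exception classes are hypothetical stand-ins for soliton's types, and the decoder also applies the 64-byte crypto_version cap from §5.4 Step 1 before any equality check.

```python
import struct
from dataclasses import dataclass
from typing import Optional

IK_LEN, XWING_PK_LEN, HYBRID_SIG_LEN = 3200, 1216, 3373
MAX_VERSION_LEN = 64


class InvalidData(Exception):
    pass


class InvalidLength(Exception):
    pass


@dataclass
class PreKeyBundle:
    crypto_version: str
    ik_pub: bytes
    spk_pub: bytes
    spk_id: int
    spk_sig: bytes
    opk_pub: Optional[bytes] = None
    opk_id: Optional[int] = None


def encode_prekey_bundle(b: PreKeyBundle) -> bytes:
    if (b.opk_pub is None) != (b.opk_id is None):
        raise InvalidData("OPK_pub and OPK_id must be both present or both absent")
    for field, n in ((b.ik_pub, IK_LEN), (b.spk_pub, XWING_PK_LEN), (b.spk_sig, HYBRID_SIG_LEN)):
        if len(field) != n:
            raise InvalidLength("fixed-size field has the wrong length")
    v = b.crypto_version.encode("utf-8")
    if len(v) > MAX_VERSION_LEN:
        raise InvalidLength("crypto_version longer than 64 bytes")
    out = struct.pack(">H", len(v)) + v + b.ik_pub + b.spk_pub + struct.pack(">I", b.spk_id) + b.spk_sig
    if b.opk_pub is None:
        return out + b"\x00"
    if len(b.opk_pub) != XWING_PK_LEN:
        raise InvalidLength("OPK_pub must be 1216 bytes")
    return out + b"\x01" + b.opk_pub + struct.pack(">I", b.opk_id)


def decode_prekey_bundle(data: bytes) -> PreKeyBundle:
    if len(data) < 2:
        raise InvalidData("truncated")
    (vlen,) = struct.unpack_from(">H", data, 0)
    if vlen > MAX_VERSION_LEN:                      # cap enforced before reading the string
        raise InvalidLength("crypto_version longer than 64 bytes")
    pos = 2 + vlen
    if len(data) < pos + IK_LEN + XWING_PK_LEN + 4 + HYBRID_SIG_LEN + 1:
        raise InvalidData("truncated")
    try:
        version = data[2:pos].decode("utf-8")
    except UnicodeDecodeError:
        raise InvalidData("crypto_version is not UTF-8") from None
    ik = data[pos:pos + IK_LEN]
    pos += IK_LEN
    spk = data[pos:pos + XWING_PK_LEN]
    pos += XWING_PK_LEN
    (spk_id,) = struct.unpack_from(">I", data, pos)
    pos += 4
    sig = data[pos:pos + HYBRID_SIG_LEN]
    pos += HYBRID_SIG_LEN
    has_opk = data[pos]
    pos += 1
    opk, opk_id = None, None
    if has_opk == 0x01:
        if len(data) < pos + XWING_PK_LEN + 4:
            raise InvalidData("truncated OPK")
        opk = data[pos:pos + XWING_PK_LEN]
        (opk_id,) = struct.unpack_from(">I", data, pos + XWING_PK_LEN)
        pos += XWING_PK_LEN + 4
    elif has_opk != 0x00:
        raise InvalidData("has_opk must be 0x00 or 0x01")   # 0x02-0xFF are non-canonical
    if pos != len(data):
        raise InvalidData("trailing bytes")
    return PreKeyBundle(version, ik, spk, spk_id, sig, opk, opk_id)
```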

This encoding is not used in any AAD or signature (SPK_sig covers only the raw SPK_pub, not the bundle). It is provided as a reference for interoperable relay implementations. Soliton clients parse the fields individually rather than as a blob, so two relays using different bundle encodings cannot cause a cryptographic failure; a shared canonical encoding nonetheless simplifies relay interop testing. For federated relay-to-relay bundle exchange, relays that exchange bundles as raw blobs must agree on a representation: incompatible encodings produce parsing failures at relay ingestion, with no cryptographic error and no signal to the client. If a relay-level bundle exchange protocol does not independently negotiate encoding, it SHOULD adopt encode_prekey_bundle as normative for that exchange.

SPK_sig is computed over the domain-separated SPK public key (raw concatenation is unambiguous):

SPK_sig = HybridSign(IK_sk, "lo-spk-sig-v1" ‖ SPK_pub)

Raw concatenation — no length prefixes. This is safe because both components are fixed-size: the label is exactly 13 bytes and SPK_pub is exactly 1216 bytes, so no length prefixes are needed for unambiguous parsing. A reimplementer who adds length prefixes "for safety" produces different signed bytes and breaks all SPK signature verification. SPK_pub is the verbatim 1216-byte output of XWing.KeyGen() — no clamping, masking, or normalization is applied to any component (X25519 or ML-KEM) between key generation and signing. The bytes signed and stored must be identical. Some X25519 libraries normalize the public key (clear bit 255, or apply RFC 7748 clamping to the scalar before computing the public point), producing a different 32-byte value than the raw keygen output. If a reimplementer signs the pre-normalization bytes but stores the post-normalization bytes (or vice versa), HybridVerify in §5.4 Step 1 silently fails. The fixed 13-byte label "lo-spk-sig-v1" is a domain separator that prevents cross-context signature reuse — if the identity key is later used to sign other payloads (e.g., profile data or future protocol extensions), signatures from one context cannot be replayed in another. The SPK_id, crypto_version, OPK_pub, and OPK_id are metadata that travel alongside the signed key, not part of the signed message. Omitting crypto_version from the signature is intentional — downgrade protection relies on the hard-fail version policy (§14.14), not on signature binding. Omitting OPK_pub and OPK_id is intentional — OPKs are generated and signed independently, and their presence or absence in a bundle does not affect the authenticity of the SPK. SPK_sig is entirely independent of OPK data: Bob can add or remove OPKs from a bundle without invalidating the SPK signature, and a reimplementer who includes OPK bytes in the SPK signed message produces SPK signatures that fail verification on any bundle where the OPK differs.
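The signed message itself is just the 13-byte label followed by the verbatim SPK_pub. A minimal Python sketch, with a hypothetical function name:

```python
SPK_SIG_LABEL = b"lo-spk-sig-v1"   # 13 raw bytes, no length prefix
XWING_PK_LEN = 1216


def spk_signed_message(spk_pub: bytes) -> bytes:
    """Bytes passed to HybridSign/HybridVerify for SPK_sig: label || verbatim SPK_pub."""
    if len(spk_pub) != XWING_PK_LEN:
        raise ValueError("SPK_pub must be the verbatim 1216-byte X-Wing public key")
    # SPK_id, crypto_version, OPK_pub and OPK_id are deliberately NOT part of the message.
    return SPK_SIG_LABEL + spk_pub
```

Because both parts are fixed-size, the 1229-byte result parses unambiguously without length prefixes.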

Rationale for label-only domain separation: The signed message is a fixed label + raw key bytes, with no variable-length metadata. This keeps the signature verifiable without any metadata parsing ambiguity — the verifier has the label (a compile-time constant), the raw SPK_pub (from the bundle), and the raw IK_pub (from identity lookup). If SPK_id or crypto_version were included in the signed message, both signer and verifier would need to agree on an encoding format for those fields — an unnecessary source of interop bugs.

5.4 Session Initiation (Alice → Bob)

Alice wants to DM Bob. She has Bob's identity key (from community context or out-of-band) and fetches his pre-key bundle from his home relay.

Step 1: Verify Pre-Key Bundle

function VerifyPreKeyBundle(bundle, known_bob_ik):
    assert OPK fields are both present or both absent
    // Structural co-presence check fires FIRST, before any cryptographic operation.
    // Returns InvalidData (not BundleVerificationFailed) — tests format, not content.
    assert bundle.IK_pub == known_bob_ik
    assert bundle.crypto_version == "lo-crypto-v1"
    assert HybridVerify(bundle.IK_pub, "lo-spk-sig-v1" ‖ bundle.SPK_pub, bundle.SPK_sig)
    // Any assertion failure → abort, warn user

verify_bundle error collapse (anti-oracle): All non-structural verification failures — IK_pub mismatch, crypto_version mismatch, HybridVerify failure — return BundleVerificationFailed, not distinct error codes. A crypto_version mismatch returns BundleVerificationFailed, not UnsupportedVersion. Returning UnsupportedVersion for a version mismatch or VerificationFailed for a signature failure would let an attacker iteratively probe bundles to determine which specific field failed without possessing the correct keys — each distinct error response narrows the search space. The structural OPK co-presence check returns InvalidData (not BundleVerificationFailed) because it fires before any cryptographic operation and tests only format, not content. See §5.5 Step 1 for the parallel error-collapse analysis at the recipient side.
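The check ordering and error collapse can be sketched in Python as follows. `hybrid_verify` is a hypothetical callable standing in for HybridVerify, the bundle is a plain dict rather than a parsed struct, and the frozen `VerifiedBundle` dataclass only approximates the Rust newtype (Python cannot stop other code from constructing it).

```python
from dataclasses import dataclass
from typing import Callable, Optional

HybridVerify = Callable[[bytes, bytes, bytes], bool]   # hypothetical: (IK_pub, message, signature) -> bool


class InvalidData(Exception):
    pass


class BundleVerificationFailed(Exception):
    pass


@dataclass(frozen=True)
class VerifiedBundle:
    """Stand-in for the VerifiedBundle newtype: only verify_bundle should construct it."""
    ik_pub: bytes
    spk_pub: bytes
    spk_id: int
    opk_pub: Optional[bytes]
    opk_id: Optional[int]


def verify_bundle(bundle: dict, known_bob_ik: bytes, hybrid_verify: HybridVerify) -> VerifiedBundle:
    # 1. Structural co-presence check first, before any cryptography: InvalidData.
    if (bundle.get("opk_pub") is None) != (bundle.get("opk_id") is None):
        raise InvalidData("OPK_pub and OPK_id must be both present or both absent")
    # 2. Content checks: every failure collapses to the same BundleVerificationFailed.
    if bundle["ik_pub"] != known_bob_ik:
        raise BundleVerificationFailed()
    if bundle["crypto_version"] != "lo-crypto-v1":
        raise BundleVerificationFailed()      # not UnsupportedVersion
    try:
        sig_ok = bool(hybrid_verify(bundle["ik_pub"], b"lo-spk-sig-v1" + bundle["spk_pub"], bundle["spk_sig"]))
    except Exception:
        sig_ok = False                        # any HybridVerify error is reported the same way
    if not sig_ok:
        raise BundleVerificationFailed()
    return VerifiedBundle(bundle["ik_pub"], bundle["spk_pub"], bundle["spk_id"],
                          bundle.get("opk_pub"), bundle.get("opk_id"))
```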

The type system enforces that initiate_session cannot be called with an unverified bundle; verify_bundle returns a VerifiedBundle newtype.

crypto_version maximum length: Parsers MUST reject any crypto_version field longer than 64 bytes with InvalidLength before performing the equality check. The 2-byte BE length prefix in encode_prekey_bundle can represent values up to 65,535 — a crafted bundle with a 65,535-byte version string consumes ~64 KiB before the equality check fires. Since "lo-crypto-v1" is 12 bytes, any field longer than 64 bytes is structurally impossible for a conforming version string, even accounting for hypothetical future versions. The CAPI enforces the broader decode_session_init input cap (64 KiB, §13.4), but a Rust reimplementer or binding author consuming the bundle fields individually MUST apply this length guard explicitly.

Step 2: Generate Ephemeral Key

(EK_pub, EK_sk) = XWing.KeyGen()

The keypair MUST be freshly generated from the OS CSPRNG for each initiate_session call. Reusing EK across sessions causes both sessions to share the same initial send_ratchet_sk — if EK_sk is compromised (e.g., via a side-channel during one session), every session initiated with that EK is also compromised at the initial ratchet epoch.

This ephemeral key serves as Alice's initial ratchet public key in LO-Ratchet (§6). Bob will encapsulate to it when performing the first KEM ratchet step upon replying. EK_sk must be preserved through Steps 3-7 and passed to RatchetState::init_alice (§5.5 / §13.5) as the initial send_ratchet_sk. Discarding EK_sk after constructing the SessionInit — e.g., freeing or zeroizing it once EK_pub has been extracted for the SessionInit struct — leaves Alice without the decapsulation key for Bob's first KEM ratchet step; decapsulation of Bob's first response silently fails (wrong epoch key → AeadFailed). EK_sk MUST NOT be used for any purpose other than this KEM decapsulation: using it for additional DH operations, separate KEMs, or signing creates cross-context key reuse that voids the forward-secrecy guarantee for OPK-less sessions. EK_sk is single-purpose — it decapsulates Bob's first KEM ratchet ciphertext and is then zeroized.

Step 3: KEM Encapsulations

// Encapsulate to Bob's identity key (authentication + defense-in-depth)
(ct_ik,  ss_ik)  = XWing.Encaps(ExtractXWingPublic(Bob.IK_pub))

// Encapsulate to Bob's signed pre-key (session binding)
(ct_spk, ss_spk) = XWing.Encaps(Bob.SPK_pub)

// Encapsulate to Bob's one-time pre-key (enhanced forward secrecy)
if Bob.OPK_pub is available:
    (ct_opk, ss_opk) = XWing.Encaps(Bob.OPK_pub)

Each XWing.Encaps call requires independent fresh randomness: each call draws its own entropy from the OS CSPRNG, namely the X25519 ephemeral secret and the 32-byte ML-KEM encapsulation coins (FIPS 203 §7.2 requires uniformly random per-call coins). Sharing or reusing entropy across two or three calls produces correlated ciphertexts that violate IND-CCA2 for those encapsulations; decapsulation still succeeds and the session key derives normally, so there is no error diagnostic. The three calls are entirely independent invocations of XWing.Encaps and MUST each draw fresh entropy.
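The per-call entropy rule can be made explicit by drawing a fresh seed for each encapsulation. This Python sketch assumes a hypothetical derandomized binding `encaps_derand(pk, eseed)`; the 64-byte seed size (32 bytes for the X25519 ephemeral plus 32 bytes of ML-KEM coins) follows the X-Wing draft's derandomized encapsulation and should be checked against the binding actually used.

```python
import secrets
from typing import Callable, List, Optional, Tuple

XWING_ESEED_LEN = 64   # assumption: 32-byte X25519 ephemeral + 32-byte ML-KEM coins

EncapsDerand = Callable[[bytes, bytes], Tuple[bytes, bytes]]   # hypothetical: (pk, eseed) -> (ct, ss)


def encapsulate_all(encaps_derand: EncapsDerand, ik_xwing_pk: bytes, spk_pub: bytes,
                    opk_pub: Optional[bytes]) -> List[Tuple[bytes, bytes]]:
    """Step 3: one independent encapsulation per recipient key, each with its own fresh seed."""
    targets = [ik_xwing_pk, spk_pub] + ([opk_pub] if opk_pub is not None else [])
    results = []
    for pk in targets:
        eseed = secrets.token_bytes(XWING_ESEED_LEN)   # drawn anew for every call, never shared
        results.append(encaps_derand(pk, eseed))
    return results
```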

Step 4: Derive Session Key

if OPK was used:
    ikm = ss_ik || ss_spk || ss_opk    // 96 bytes (3 × 32-byte X-Wing shared secrets)
else:
    ikm = ss_ik || ss_spk               // 64 bytes (2 × 32-byte X-Wing shared secrets)

info = "lo-kex-v1"                            // raw 9-byte prefix (not length-prefixed)
    || len(crypto_version) || crypto_version // 2-byte BE length + 12 bytes ("lo-crypto-v1")
    || len(Alice.IK_pub) || Alice.IK_pub     // 2-byte BE length + 3200 bytes
    || len(Bob.IK_pub)   || Bob.IK_pub       // 2-byte BE length + 3200 bytes
    || len(EK_pub)       || EK_pub           // 2-byte BE length + 1216 bytes

session_key = HKDF(
    salt = 0x00 * 32,
    ikm  = ikm,
    info = info,
    len  = 64
)

root_key  = session_key[0..32]
epoch_key = session_key[32..64]
zeroize(session_key)           // 64-byte HKDF output: an intermediate buffer holding material
                               // derived from the KEM shared secrets; MUST be zeroized after the
                               // split. In Rust, wrapping it in Zeroizing<[u8; 64]> handles this
                               // automatically on drop. Non-RAII implementations (C, Go, Python)
                               // MUST explicitly zero this buffer after the split and before
                               // returning; otherwise 64 bytes of key-derived material stay in memory.

session_key must be zeroized after the split: The 64-byte HKDF output is an intermediate value containing both root_key and epoch_key. After splitting, the original session_key buffer still holds both secrets in cleartext and must be explicitly zeroized. In Rust, wrapping the buffer in Zeroizing (Zeroizing<[u8; 64]> or Zeroizing<Vec<u8>>) covers this automatically at drop. In C, a manual memset followed by a compiler barrier (or explicit_bzero) is required. In Go, call clear(sessionKey) after the copy. Failing to zeroize leaves 64 bytes in memory that hold both the root key and the epoch key, which is more sensitive than either half alone.

session_key is derived with a single 64-byte HKDF call, then split positionally: The len = 64 HKDF call produces one 64-byte output; root_key and epoch_key are the first and second 32-byte halves respectively. This is NOT two separate HKDF invocations with distinct info labels — both halves come from the same Expand output. A reimplementer familiar with TLS 1.3's derive_secret (which calls HKDF-Expand-Label separately for each derived key with distinct labels and distinct context hashes) must not apply that pattern here. Using two separate HKDF calls with info = "root" and info = "epoch" (or any labeled split) produces different root_key and epoch_key values — both parties derive the same incorrect keys and AeadFailed results with no diagnostic.
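The positional split and the wipe of the intermediate buffer can be sketched with std only. The HKDF call itself is out of scope, so the sketch starts from its 64-byte output. `wipe` and `split_session_key` are hypothetical names. soliton itself uses the zeroize crate, and the returned halves still need their own zeroization once they are no longer needed.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite a secret buffer with zeros in a way the optimizer may not remove.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference into `buf`.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Split the single 64-byte HKDF output positionally, then wipe it:
/// session_key[0..32] -> root_key, session_key[32..64] -> epoch_key.
/// There is no second HKDF call and no per-half label.
fn split_session_key(session_key: &mut [u8; 64]) -> ([u8; 32], [u8; 32]) {
    let mut root_key = [0u8; 32];
    let mut epoch_key = [0u8; 32];
    root_key.copy_from_slice(&session_key[..32]);
    epoch_key.copy_from_slice(&session_key[32..]);
    // Both secrets now live in the returned arrays; clear the combined buffer.
    wipe(session_key);
    (root_key, epoch_key)
}
```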

Why zero salt: The IKM (ss_ik || ss_spk [|| ss_opk]) is already uniformly distributed, high-entropy material. Each shared secret is the output of an X-Wing KEM, which by design produces pseudorandom bytes. HKDF's Extract step concentrates the entropy of the IKM into a pseudorandom key. A salt strengthens that step when the IKM is non-uniform, but a public salt contributes no secret entropy of its own. A non-zero salt derived from session metadata would add complexity and a new parameter without a cryptographic gain. The all-zero salt is the value RFC 5869 §2.2 specifies when no salt is provided.

Zero salt is 32 explicit zero bytes, not empty/null: The salt = 0x00 * 32 is the RFC 5869 §2.2 default for HKDF-SHA3-256 (HashLen = 32), and the derivation is specified with those 32 bytes passed explicitly. With a conforming HMAC, an empty salt happens to give the same result. RFC 2104 zero-pads any key shorter than the hash block size (136 bytes for SHA3-256), so an empty salt and 32 zero bytes yield the same PRK. The interop risk lies in wrappers that reject an empty salt, substitute a different default, or otherwise deviate from RFC 2104 key handling. For that reason, libraries that accept a null or empty salt MUST be verified rather than assumed to substitute the 32-zero-byte default. Go's golang.org/x/crypto/hkdf.New, for example, substitutes HashLen zeros for a nil salt. A reimplementer whose library silently uses a different salt in one language gets interop failure with no diagnostic.

IKM concatenation order is critical: The order ss_ik || ss_spk [|| ss_opk] must be followed exactly — any reordering produces a different session key. Both parties derive IKM in the same order (Alice from encapsulation, Bob from decapsulation). The 64-byte and 96-byte IKM variants are not interchangeable. A reimplementer who zero-pads the absent ss_opk slot (passing 96 bytes with ss_opk = 0x00{32}) produces a different HKDF output than the specified 64-byte IKM — HKDF's Extract step processes the full input length, so ss_ik || ss_spk and ss_ik || ss_spk || 0x00{32} yield different PRKs. This manifests as AeadFailed at decrypt_first_message with no diagnostic.

IKM order is a documentation-only guarantee — no type-level enforcement: The ss_ik ‖ ss_spk [‖ ss_opk] concatenation order is specified above but not enforced at the type level. All three shared secrets have the same type (xwing::SharedSecret), so the encapsulation calls (or decapsulation calls on Bob's side) and the IKM concatenation can be reordered without a compile error. Any such reordering produces a different session key with no error at the HKDF step — the mismatch surfaces only as AeadFailed at decrypt_first_message with no diagnostic pointing to the ordering change. In the Rust implementation, the session_agreement_with_opk integration test is the only runtime guard against an order-breaking refactor. Any change to the encapsulation sequence in §5.4 Step 3, the decapsulation sequence in §5.5 Step 4, or the IKM concatenation in either step MUST verify that test still passes with matching keys on both sides.

Shared secret zeroization after IKM construction: After constructing ikm by concatenating ss_ik, ss_spk, and (optionally) ss_opk, each individual shared secret MUST be zeroized immediately. Copying a secret into a concatenation buffer creates an independent copy, and the original remains on the heap (or stack) until explicitly zeroized. In Rust, Zeroizing<Vec<u8>> covers the concatenated buffer, but .extend_from_slice() does not zeroize the source. In C, memcpy into ikm leaves the originals in their allocations. In Go, append does not zero the source slice. Forgetting this step leaks up to 96 bytes of shared secret material.
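The shipped API gives all three shared secrets the same type. A reimplementation could instead move the ordering guarantee into the type system and fold in the zeroization obligation at the same time. In this std-only sketch, the newtypes `SsIk`, `SsSpk`, and `SsOpk` and the function `build_ikm` are hypothetical. Each newtype wipes itself on drop, and `build_ikm` consumes its inputs so the originals are cleared as soon as they are copied.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference into `buf`.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Distinct newtypes for the three X-Wing shared secrets. Because each has
/// its own type, `build_ikm` cannot be called with the secrets swapped.
pub struct SsIk(pub [u8; 32]);
pub struct SsSpk(pub [u8; 32]);
pub struct SsOpk(pub [u8; 32]);

impl Drop for SsIk {
    fn drop(&mut self) { wipe(&mut self.0) }
}
impl Drop for SsSpk {
    fn drop(&mut self) { wipe(&mut self.0) }
}
impl Drop for SsOpk {
    fn drop(&mut self) { wipe(&mut self.0) }
}

/// Build the HKDF IKM as ss_ik || ss_spk [|| ss_opk].
/// The secrets are taken by value and dropped (wiped) right after copying.
/// Output is 64 bytes without an OPK and 96 with one; the absent OPK slot
/// is omitted, never zero-padded.
pub fn build_ikm(ss_ik: SsIk, ss_spk: SsSpk, ss_opk: Option<SsOpk>) -> Vec<u8> {
    // Reserve the maximum up front so the Vec never reallocates and leaves
    // a stale copy of the secrets in freed memory.
    let mut ikm = Vec::with_capacity(96);
    ikm.extend_from_slice(&ss_ik.0);
    drop(ss_ik);
    ikm.extend_from_slice(&ss_spk.0);
    drop(ss_spk);
    if let Some(ss_opk) = ss_opk {
        ikm.extend_from_slice(&ss_opk.0);
        drop(ss_opk);
    }
    ikm
}
```

Moving a 32-byte array by value can still leave stack copies that the compiler does not track, so this approach is best-effort. The returned IKM is itself secret and must be wiped after the HKDF call.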

All len() values are 2-byte big-endian. The total info length is 7645 bytes (9 + 2 + 12 + 2 + 3200 + 2 + 3200 + 2 + 1216). The crypto_version field (added for cross-version domain separation) appears immediately after the raw prefix, before the identity keys.
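The info layout and its 7645-byte total can be checked with a short std-only builder. `build_kex_info` and `push_len_prefixed` are hypothetical names. The function assumes the caller passes the 3200-byte identity keys and the 1216-byte EK_pub as raw bytes.

```rust
/// Append `field` preceded by its length as a 2-byte big-endian integer.
fn push_len_prefixed(out: &mut Vec<u8>, field: &[u8]) {
    let len = u16::try_from(field.len()).expect("field longer than 65535 bytes");
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(field);
}

/// HKDF info for LO-KEX:
/// "lo-kex-v1" || len||crypto_version || len||IK_A || len||IK_B || len||EK_pub
fn build_kex_info(
    crypto_version: &[u8],
    alice_ik_pub: &[u8],
    bob_ik_pub: &[u8],
    ek_pub: &[u8],
) -> Vec<u8> {
    let mut info = Vec::new();
    info.extend_from_slice(b"lo-kex-v1"); // raw 9-byte prefix, no length
    push_len_prefixed(&mut info, crypto_version);
    push_len_prefixed(&mut info, alice_ik_pub);
    push_len_prefixed(&mut info, bob_ik_pub);
    push_len_prefixed(&mut info, ek_pub);
    info
}
```

Every field after the raw "lo-kex-v1" prefix goes through the same helper, which is what makes the encoding uniform. A field whose length does not fit in two bytes is treated as a programming error here rather than being truncated.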

Why length prefixes are used on fixed-size identity keys in HKDF info: Unlike §7.4 (AAD encoding, where fingerprints and public keys are written bare because their sizes are fixed by definition within a given crypto_version), the HKDF info field here applies len(x) || x encoding uniformly to all post-prefix fields. The rationale has three parts:

(1) Prefix-freeness. Length-prefixed encodings are prefix-free, so no valid info string for one set of inputs is a proper prefix of a valid info string for another set. HKDF's domain separation guarantee relies on this property.

(2) Future version safety. A lo-crypto-v2 with different key sizes would change the field lengths. Unprefixed fields of different sizes would usually still yield distinct info strings, but the uniform encoding makes this hold by construction rather than as a side effect of the sizes.

(3) Consistency. crypto_version is genuinely variable-length, so all fields use the same encoding rule for simplicity.

A reimplementer who omits the length prefixes from the fixed-size fields (treating them as optional "since the size is known") produces a different info string. That string is 6 bytes shorter, with one 2-byte prefix missing from each of Alice.IK_pub, Bob.IK_pub, and EK_pub. The result is a completely different session key, with no diagnostic.

Why IK is both encapsulated to AND in info: IK encapsulation contributes ss_ik to the IKM, meaning Bob's IK private key is required to derive the session key. Binding both identity keys into HKDF info provides mutual authentication — substituting either key yields a different session key. See §5.6 for security analysis.

Why info includes EK_pub: Alice's ephemeral key EK_pub contributes no shared secret to the IKM (it is a KEM public key, not a DH key — shared secrets come from encapsulating to Bob's keys). Including EK_pub in info binds the session key to the specific ephemeral key Alice published. Without this binding, an active attacker could substitute a different sender_ek in the SessionInit while keeping the KEM ciphertexts intact — Bob's decapsulations would still succeed (the ciphertexts are bound to Bob's keys, not Alice's EK), but Bob's first KEM ratchet encapsulation (§6.4) would target the attacker's key instead of Alice's. With EK_pub in info, substituting sender_ek changes the HKDF output, causing decrypt_first_message to fail at AEAD.

Why info excludes SPK, OPK, and ciphertexts: SPK and OPK binding flows through the IKM path — only the holder of sk_SPK can produce ss_spk, and only the holder of sk_OPK can produce ss_opk. Including SPK/OPK public keys or ciphertexts in info would be redundant. For formal models: SPK/OPK binding is an IKM-path property (KEM correctness), not an info-path property (HKDF domain separation).

Step 5: Construct Session Init

SessionInit = {
    crypto_version:           "lo-crypto-v1"
    sender_ik_fingerprint:    SHA3-256(Alice.IK_pub) [32 bytes raw]
    recipient_ik_fingerprint: SHA3-256(Bob.IK_pub)   [32 bytes raw]
    sender_ek:                EK_pub [1216 bytes]
    ct_ik:                    X-Wing ciphertext [1120 bytes]
    ct_spk:                   X-Wing ciphertext [1120 bytes]
    spk_id:                   uint32
    ct_opk:                   X-Wing ciphertext [1120 bytes, optional]
    opk_id:                   uint32 [optional]
}

Encoded size: The encode_session_init output is 3,543 bytes (no OPK) or 4,669 bytes (with OPK). The OPK block adds exactly 1,126 bytes when present: 2 bytes (BE length prefix for ct_opk) + 1,120 bytes (ct_opk) + 4 bytes (opk_id as u32 BE). The 1-byte has_opk flag is always encoded (as part of the 3,543-byte base) — it is not part of the 1,126-byte increment. See §7.4 and Appendix C for the full field-by-field breakdown.

sender_ik_fingerprint lets Bob look up Alice's full identity key. The full 3200-byte IK is not sent, to save bandwidth. Bob resolves it from community context or prior knowledge (§5.5 Step 1).

recipient_ik_fingerprint names the intended recipient explicitly inside the signed payload. Since SessionInit is signed by Alice in Step 6, Bob can derive recipient binding from sender_sig alone — without reasoning about the KEM ciphertexts' implicit binding to Bob's keys. This simplifies formal verification: a Tamarin or ProVerif model can prove recipient binding as a direct property of the signature, rather than as a consequence of KEM decapsulability.

Step 6: Sign Session Init

Alice proves to Bob that she initiated this session (and possesses sk_IK_A):

session_init_bytes = encode_session_init(SessionInit)
sender_sig = HybridSign(Alice.IK_sk, "lo-kex-init-sig-v1" ‖ session_init_bytes)

Raw concatenation — no length prefixes (see Appendix A). sender_sig is a 3373-byte hybrid signature (Ed25519 64 bytes + ML-DSA-65 3309 bytes). The total signed message is "lo-kex-init-sig-v1" (18 bytes) || session_init_bytes (3543 or 4669 bytes) = 3561 bytes (no OPK) or 4687 bytes (with OPK). The label prefix is not length-delimited — it abuts session_init_bytes directly. sender_sig is transmitted alongside SessionInit and the first-message payload. Bob verifies it in §5.5 Step 3 before performing any KEM operations. The domain separator "lo-kex-init-sig-v1" prevents replay into any other signature context (§3.3). Canonical wire order: the three components are assembled as session_init_bytes ‖ sender_sig ‖ encrypted_payload — this order is defined and elaborated at the receiving side (§5.5 Step 3), which is where a receiver parses the three components. Alice MUST produce this order; Bob verifies in this order.
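The size arithmetic for the signed message can be pinned down as constants. This std-only sketch is illustrative. The constant names and `init_sig_message` are hypothetical, and the byte counts are taken from this section: a 3543-byte base, a 1126-byte OPK block, an 18-byte label, and a 3373-byte hybrid signature.

```rust
const SESSION_INIT_BASE_LEN: usize = 3_543; // includes the 1-byte has_opk flag
const OPK_BLOCK_LEN: usize = 2 + 1_120 + 4; // ct_opk length prefix + ct_opk + opk_id
const INIT_SIG_LABEL: &[u8] = b"lo-kex-init-sig-v1"; // 18 bytes, not length-prefixed
const HYBRID_SIG_LEN: usize = 64 + 3_309; // Ed25519 + ML-DSA-65

/// Expected length of encode_session_init output.
const fn session_init_len(has_opk: bool) -> usize {
    if has_opk {
        SESSION_INIT_BASE_LEN + OPK_BLOCK_LEN
    } else {
        SESSION_INIT_BASE_LEN
    }
}

/// Message passed to HybridSign / HybridVerify: label || session_init_bytes,
/// raw concatenation with nothing between the two parts.
fn init_sig_message(session_init_bytes: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(INIT_SIG_LABEL.len() + session_init_bytes.len());
    msg.extend_from_slice(INIT_SIG_LABEL);
    msg.extend_from_slice(session_init_bytes);
    msg
}
```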

Step 7: Encrypt First Message

msg_key = KDF_MsgKey(epoch_key, 0)    // Counter 0 for the first message

nonce = random_bytes(24)     // Random for first message (defense in depth)

// session_init_bytes computed in Step 6 above
aad = "lo-dm-v1"             // 8 bytes
   || Alice.fingerprint_raw (32 bytes)
   || Bob.fingerprint_raw (32 bytes)
   || session_init_bytes

ciphertext = AEAD(
    key   = msg_key,
    nonce = nonce,
    plaintext = message_content,
    aad   = aad
)

// Zeroize msg_key immediately after use — secret material.
zeroize(msg_key)

encrypted_payload = nonce || ciphertext   // nonce prepended for decryption

The AAD binds the full session init structure. The session_init_bytes used here is the same encoding produced in Step 6 — it is computed once and reused verbatim, not re-encoded. Tampering with any session init field (ct_ik, ct_spk, sender_ek, spk_id, etc.) invalidates the AEAD tag. See §7.4 for the deterministic encoding.

No length prefixes in AAD fields: The "lo-dm-v1" AAD is raw concatenation, "lo-dm-v1" || sender_fp || recipient_fp || session_init_bytes, with no BE length prefixes separating the fields. This contrasts with the HKDF info construction in §5.4 Step 4, where each field is length-prefixed to prevent cross-context collisions. In the AAD, collisions are structurally impossible: sender_fp and recipient_fp are both fixed-width (32 bytes each), and session_init_bytes is the remainder. A reimplementer who applies the HKDF info length-prefix rule to the AAD produces different bytes and will see AEAD authentication failure on every first message.

build_first_message_aad / build_first_message_aad_from_encoded rejects empty si_encoded with InvalidData: An empty session_init_bytes input is rejected because an empty AAD suffix would strip the per-session binding — the AAD would degenerate to "lo-dm-v1" || sender_fp || recipient_fp with no session-init bytes, identical to what any first-message AAD would look like for the same two parties. encode_session_init never produces empty output (minimum 3,543 bytes), so this guard fires only on caller bugs — but it is a normative invariant of the AAD construction and MUST be enforced by reimplementers. InvalidData is correct (not InvalidLength) — an empty si_encoded is a structural protocol violation (a legitimate SessionInit has a minimum encoded size), not a buffer-size mismatch.
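The following std-only sketch shows the first-message AAD construction, including the empty-input guard. It mirrors the behavior described for build_first_message_aad, but `first_message_aad` and `AadError` are hypothetical names. The real function's signature and error type may differ.

```rust
#[derive(Debug, PartialEq)]
enum AadError {
    InvalidData,
}

/// First-message AAD: "lo-dm-v1" || sender_fp || recipient_fp || si_encoded,
/// raw concatenation with no length prefixes. An empty si_encoded is
/// rejected because it would drop the per-session binding.
fn first_message_aad(
    sender_fp: &[u8; 32],
    recipient_fp: &[u8; 32],
    si_encoded: &[u8],
) -> Result<Vec<u8>, AadError> {
    if si_encoded.is_empty() {
        return Err(AadError::InvalidData);
    }
    let mut aad = Vec::with_capacity(8 + 32 + 32 + si_encoded.len());
    aad.extend_from_slice(b"lo-dm-v1");
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(si_encoded);
    Ok(aad)
}
```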

Why random nonce for the first message: The message key is unique (derived from a unique epoch key and counter), so a counter-based nonce would be safe. Random provides defense-in-depth: if a bug ever causes key reuse, random nonce prevents the catastrophic AEAD nonce-reuse failure mode.

First-message msg_key zeroization: msg_key is secret key material — it MUST be zeroized immediately after AEAD encryption completes. In Rust, wrapping the output of KDF_MsgKey in Zeroizing handles this automatically via Drop. In C/Go/Python, the caller must explicitly zeroize the key buffer after use. The same obligation applies to Bob's first-message decryption path (§5.5 Step 6).

Epoch key passthrough: The epoch_key (session-derived) becomes the epoch key passed to RatchetState::init_alice. Unlike the previous chain-ratchet design, the epoch key is not advanced by the first-message encryption; it is passed through unchanged. send_count starts at 1 so that counter 0 is not reused by the ratchet (the first message consumed counter 0 with a random nonce).

Name aliases: This value appears under three names across the spec, Rust API, and CAPI:
- epoch_key (this section);
- initial_chain_key (the CAPI field name in SolitonInitiatedSession, and the base of the Rust method name take_initial_chain_key());
- ratchet_init_key (CAPI first-message return).

See §13.5 for the full list.

Counter 0 namespace partition: The send_count = 1 initialization is the sole mechanism preventing counter collision between two paths:
- the first-message path (encrypt_first_message at counter 0 with a random nonce);
- the ratchet path (encrypt() starting at counter 1 with a counter-based nonce).

Both paths derive message keys from the same epoch key via KDF_MsgKey(epoch_key, counter). If a reimplementer initializes send_count = 0, the first encrypt() call produces the same msg_key as encrypt_first_message. The nonces differ (counter-based vs. random), but the AEAD keys are identical. No runtime guard prevents this; the protection is purely structural (the initialization value).
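The counter partition comes down to one initialization value. This std-only sketch models only Alice's send counter. `SendState`, `init_alice`, `next_counter`, and the u32 counter width are assumptions for illustration, not soliton's RatchetState.

```rust
/// Hypothetical sending-side state for Alice right after the first message.
/// Only the counter handling is modelled.
struct SendState {
    send_count: u32,
}

impl SendState {
    /// encrypt_first_message already used counter 0 (with a random nonce),
    /// so the ratchet path starts at 1. Starting at 0 would make the first
    /// encrypt() derive KDF_MsgKey(epoch_key, 0) a second time.
    fn init_alice() -> Self {
        SendState { send_count: 1 }
    }

    /// Counter for the next encrypt() call; advances the state.
    fn next_counter(&mut self) -> u32 {
        let counter = self.send_count;
        debug_assert_ne!(counter, 0, "counter 0 is reserved for the first message");
        self.send_count = counter.checked_add(1).expect("send counter exhausted");
        counter
    }
}
```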

5.5 Session Reception (Bob)

Bob receives the session init (real-time or from offline queue).

Step 1: Resolve Alice's Identity

Bob uses sender_ik_fingerprint to look up Alice's full identity key from local cache, community server context, or prior knowledge. If unknown, Bob's client SHOULD indicate this is an unverified first contact.

This lookup is the sole identity binding. The library verifies that alice_ik_pk is self-consistent with the session (fingerprint matches sender_ik_fingerprint, signature is valid under that key), but it cannot verify that alice_ik_pk actually belongs to the human "Alice." If the caller supplies the wrong key — or an attacker's key — signature verification succeeds (the attacker signed the SessionInit with the corresponding private key), and the session is authenticated to the attacker, not Alice. The caller's key-lookup code is the only thing standing between "authenticated session with Alice" and "authenticated session with an adversary." See Appendix E, Caller Obligation 1.

TOFU key pinning obligation: On first contact (no prior key record for this fingerprint), the caller MUST record the association between sender_ik_fingerprint and alice_ik_pk immediately after a successful receive_session. On subsequent contacts from the same fingerprint, the caller MUST verify that alice_ik_pk matches the previously recorded key — presenting a different key for the same fingerprint MUST trigger a key-change warning. A caller who fails to pin the key after first contact and fails to verify on subsequent contacts accepts TOFU impersonation silently: an attacker controlling the relay can substitute a different key pair on every session and the library will accept each substitution as valid. Key pinning is the caller's responsibility — the library provides the fingerprint but does not maintain a key store.
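Because the library keeps no key store, the pinning obligation falls on caller code along the lines of the sketch below. `KeyPins` and `PinResult` are hypothetical names, and an in-memory HashMap stands in for whatever persistent store the client actually uses. The check should only be called with a key that receive_session has just accepted.

```rust
use std::collections::hash_map::{Entry, HashMap};

#[derive(Debug, PartialEq)]
enum PinResult {
    FirstContact,
    Match,
    KeyChanged,
}

/// Hypothetical caller-side pin store: fingerprint -> identity public key.
struct KeyPins {
    pins: HashMap<[u8; 32], Vec<u8>>,
}

impl KeyPins {
    fn new() -> Self {
        KeyPins { pins: HashMap::new() }
    }

    /// Compare `ik_pub` with the pin for `fp`, recording it on first contact.
    /// A mismatch leaves the existing pin untouched and is reported so the
    /// caller can raise a key-change warning.
    fn check_and_pin(&mut self, fp: [u8; 32], ik_pub: &[u8]) -> PinResult {
        match self.pins.entry(fp) {
            Entry::Vacant(slot) => {
                slot.insert(ik_pub.to_vec());
                PinResult::FirstContact
            }
            Entry::Occupied(pinned) if pinned.get().as_slice() == ik_pub => PinResult::Match,
            Entry::Occupied(_) => PinResult::KeyChanged,
        }
    }
}
```

On KeyChanged the existing pin is left in place. Replacing it is a decision for the user-facing key-change flow, not for this check.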

Bob also validates:

crypto_version == "lo-crypto-v1"
SHA3-256(Alice.IK_pub) == si.sender_ik_fingerprint // MUST be constant-time — see below
SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint // MUST be constant-time — see below

Fingerprint comparisons MUST be constant-time: The checks SHA3-256(Alice.IK_pub) == si.sender_ik_fingerprint and SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint both compare 32-byte digests and must use constant-time equality. These checks precede HybridVerify (Step 3). With a variable-time comparison, an attacker can recover the expected fingerprint value byte by byte by submitting crafted session inits and timing the comparison. None of these attempts pays the HybridVerify cost (~2 ms). Each byte position takes at most 256 candidate values, plus repeated samples to average out timing noise. The full 32-byte value therefore falls after at most 32 × 256 = 8,192 guesses rather than a brute-force search. This allows targeted construction of sessions that pass the fingerprint check while carrying a fraudulent public key. Appendix E's constant-time table documents fingerprint comparison in §6.12 (ratchet); this requirement applies equally in the KEX context. soliton uses subtle::ConstantTimeEq for both comparisons.
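The property required here is that comparison time does not depend on where the first differing byte is. The std-only sketch below shows the usual accumulate-then-test pattern. `fingerprints_equal_ct` is a hypothetical name. `std::hint::black_box` is only a best-effort optimization barrier, which is one reason production code (including soliton) relies on `subtle::ConstantTimeEq` instead.

```rust
use std::hint::black_box;

/// Compare two 32-byte fingerprints without an early exit: every byte pair
/// is XORed and OR-accumulated, and the result is tested once at the end.
fn fingerprints_equal_ct(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= black_box(x ^ y);
    }
    black_box(diff) == 0
}
```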

Why receive_session does not need oracle-collapse: receive_session does NOT collapse errors to a generic failure. This is intentional and safe: the values being checked — crypto_version (cleartext), sender_ik_fingerprint (cleartext, transmitted in the SessionInit), and recipient_ik_fingerprint (cleartext, the receiver's own identity) — are all known to the sender who constructed the SessionInit. A timing leak on any of these checks reveals nothing the attacker did not already supply or know. This contrasts with verify_bundle (which collapses to prevent bundle-content enumeration) and LO-Auth (which collapses to prevent authentication-step enumeration). The comparisons still MUST be constant-time (Appendix E) to prevent reconstruction of the receiver's stored fingerprint value, but the error codes themselves need not be collapsed.

Error collapsing in verify_bundle vs receive_session: The verify_bundle function (§5.3) collapses all non-structural failures (crypto version mismatch, fingerprint mismatch, signature verification failure) to the single BundleVerificationFailed error. Returning distinct errors would create an enumeration oracle — an attacker could iteratively probe which validation step failed, revealing information about the bundle contents. Exception: the OPK structural co-presence check (opk_pub and opk_id must be both present or both absent) returns InvalidData, not BundleVerificationFailed. This check runs before the IK comparison and signature verification — it is a pre-cryptographic structural validation, not a security-sensitive check that requires collapse. Callers pattern-matching on verify_bundle errors must handle both BundleVerificationFailed and InvalidData. receive_session does NOT collapse errors — it returns UnsupportedCryptoVersion for a bad crypto version and InvalidData for fingerprint mismatches. No pre-key bundle is involved in receive_session, so the bundle-level collapse does not apply; the SessionInit fields being checked (crypto version, fingerprints) are already visible to the sender who constructed them.

Step 2: Validate OPK Co-Presence

OPK fields must both be present or both be absent. Failure → abort with InvalidData. This is a structural validation on the parsed SessionInit, not a cryptographic operation — executing it before signature verification avoids unnecessary signature/KEM work on malformed messages.

Two distinct co-presence checks — only one is pre-signature: Step 2 validates the structural co-presence of ct_opk and opk_id within the decoded SessionInit (both fields present or both absent). This check is pre-signature. A separate caller co-presence check — whether the caller supplied opk_sk if and only if ct_opk is present — fires at Step 4, after HybridVerify. The caller check MUST be post-signature: moving it to Step 2 would create an OPK-presence oracle (an attacker could distinguish "OPK present, no opk_sk provided" from "OPK absent" before signature verification, probing the receiver's key state without completing authentication). A reimplementer who consolidates both checks at Step 2 enables this oracle. The §5.5 Step 4 note documents the post-signature placement rationale.

The OPK co-presence check is enforced inside encode_session_init: In the reference implementation, this check fires from within encode_session_init (called at Step 3 to reconstruct the signed message bytes), not as an independent pre-step. A reimplementer who constructs the signed message manually — by concatenating SessionInit fields directly without calling encode_session_init — bypasses this guard entirely and reaches KEM operations (Step 4) with a structurally invalid SessionInit. Any reimplementation MUST perform the OPK co-presence check explicitly before HybridVerify if encode_session_init is not used to reconstruct the signed bytes.

Step 3: Verify Initiator Signature

session_init_bytes = encode_session_init(received_session_init)
HybridVerify(Alice.IK_pub, "lo-kex-init-sig-v1" ‖ session_init_bytes, sender_sig)

session_init_bytes MUST be reconstructed by calling encode_session_init, not extracted from the wire: The signed message "lo-kex-init-sig-v1" ‖ session_init_bytes uses the canonical encoding of the parsed SessionInit struct, that is, the output of encode_session_init. The wire format is session_init_bytes || sender_sig || encrypted_payload (§5.4 Step 6), but the verifier does not treat the leading bytes as an opaque signed blob. Instead, it re-encodes from the parsed struct rather than slicing the wire buffer. A reimplementer building a signature verifier who instead slices session_init_bytes out of the wire (e.g., wire[0..3543], or wire[0..4669] when an OPK is present) must verify that the slice is byte-for-byte identical to the output of encode_session_init. Any normalization of SessionInit fields during parsing that changes the bytes on re-encoding causes VerificationFailed on an authentic session init. Examples include key clamping, padding removal, and normalization of the X25519 component.

Verification failure → abort with VerificationFailed. This provides cryptographic proof that the session was initiated by the holder of sk_IK_A, preventing zero-knowledge impersonation: an adversary who knows only pk_IK_A cannot produce a valid sender_sig without sk_IK_A. This step executes before any KEM operations; a forged or absent signature is rejected immediately, not silently.

Verifier bytes obligation: HybridVerify must receive the raw bytes of sender_ek exactly as stored and transmitted — no normalization, clamping, or bit-masking applied to any sub-key component. If the verifier's X25519 library normalizes public keys on import (e.g., clears bit 255 of byte 31 — see §8.1), the verified bytes differ from the signed bytes and VerificationFailed results even with an authentic session init. The same obligation applies to SPK verification in §5.3. See §8.1 for the X25519 masking hazard that most commonly triggers this.

Validation ordering rationale: The pre-signature checks are the crypto version and the sender and recipient fingerprints (Step 1), plus OPK co-presence (Step 2). All of them are far cheaper than HybridVerify, which performs both an Ed25519 and an ML-DSA-65 verification. Running them first avoids the cost of two signature verifications on messages that would fail a trivial structural check. A reimplementer who moves signature verification ahead of the fingerprint checks wastes CPU on forged messages and gains no security benefit. All of these checks, and the signature verification itself, must pass before any KEM operation, regardless of order.

sender_sig is transmitted alongside the session init and first-message payload; it is not part of the SessionInit struct and is not included in the AAD (it covers the encoded SessionInit).

Canonical wire order: The three components are assembled as session_init_bytes || sender_sig || encrypted_payload. The session init comes first (its size is determined by has_opk), then the signature (fixed 3373 bytes), then the encrypted payload (variable length). The first two components have deterministic sizes. session_init_bytes is 3543 bytes without OPK or 4669 bytes with OPK (§7.4, Appendix F.13), and sender_sig is always 3373 bytes (§3.3). A receiver first reads the 3543-byte fixed prefix. Its last byte (offset 3542) is the has_opk flag, which determines whether a further 1126 bytes of OPK data follow. The receiver then reads exactly 3373 bytes of sender_sig, and the remainder is encrypted_payload. The CAPI returns these as separate fields; callers assembling wire messages MUST use this order.
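A receiver-side split of the three components, driven by the has_opk flag, might look like the std-only sketch below. `split_first_message` and `WireError` are hypothetical names, and the 0x00/0x01 encoding of has_opk is an assumption (§7.4 defines the actual encoding). The function only slices the buffer. The returned session-init slice must still be decoded, then re-encoded with encode_session_init, before HybridVerify.

```rust
const SI_BASE_LEN: usize = 3_543; // fixed prefix, ends with has_opk
const HAS_OPK_OFFSET: usize = SI_BASE_LEN - 1; // 3542
const OPK_BLOCK_LEN: usize = 2 + 1_120 + 4; // ct_opk length prefix + ct_opk + opk_id
const SENDER_SIG_LEN: usize = 64 + 3_309; // Ed25519 + ML-DSA-65 = 3373

#[derive(Debug, PartialEq)]
enum WireError {
    InvalidLength,
    InvalidData,
}

/// Split session_init_bytes || sender_sig || encrypted_payload.
/// Assumes has_opk is encoded as 0x00 (absent) or 0x01 (present).
fn split_first_message(wire: &[u8]) -> Result<(&[u8], &[u8], &[u8]), WireError> {
    if wire.len() < SI_BASE_LEN {
        return Err(WireError::InvalidLength);
    }
    let si_len = match wire[HAS_OPK_OFFSET] {
        0x00 => SI_BASE_LEN,
        0x01 => SI_BASE_LEN + OPK_BLOCK_LEN,
        _ => return Err(WireError::InvalidData),
    };
    if wire.len() < si_len + SENDER_SIG_LEN {
        return Err(WireError::InvalidLength);
    }
    let (session_init, rest) = wire.split_at(si_len);
    let (sender_sig, encrypted_payload) = rest.split_at(SENDER_SIG_LEN);
    Ok((session_init, sender_sig, encrypted_payload))
}
```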

Step 4: Decapsulate

// Decapsulate IK ciphertext
ss_ik = XWing.Decaps(ExtractXWingPrivate(Bob.IK_sk), ct_ik)

// Decapsulate SPK ciphertext
ss_spk = XWing.Decaps(Bob.SPK_sk[spk_id], ct_spk)

// Decapsulate OPK ciphertext (if present)
if ct_opk is present:
    ss_opk = XWing.Decaps(Bob.OPK_sk[opk_id], ct_opk)
    // Delete OPK_sk[opk_id] immediately — single use

Caller co-presence obligation: The caller must provide opk_sk if and only if ct_opk is present in the SessionInit.

soliton does: If ct_opk is present but opk_sk is not provided (e.g., the OPK was already consumed and deleted), receive_session returns InvalidData — but only after signature verification (Step 3), so this check cannot be used as an oracle.
What a broken reimplementation sees instead: A reimplementer who silently skips OPK decapsulation when opk_sk is unavailable (omitting ss_opk from the IKM) does NOT get InvalidData at receive_session — receive_session succeeds. The session key diverges from Alice's, and the error surfaces only as AeadFailed at decrypt_first_message with no diagnostic pointing to the missing OPK decapsulation. The active guard (InvalidData on missing opk_sk) is the only mechanism that surfaces this condition as a clear error; omitting the guard silently accepts a broken session.

The converse is also InvalidData: if ct_opk is absent but opk_sk is provided, the session init contains no OPK ciphertext to decapsulate. Silently ignoring a surplus opk_sk would mask a caller error where the wrong OPK was retrieved.
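The caller co-presence rule reduces to a four-case truth table. In this std-only sketch, `check_caller_opk` and `KexError` are hypothetical names. The important property is less the code itself than where it runs: after HybridVerify and before any decapsulation.

```rust
#[derive(Debug, PartialEq)]
enum KexError {
    InvalidData,
}

/// Caller co-presence check for Step 4: opk_sk must be supplied if and only
/// if the SessionInit carries ct_opk. Call this only after HybridVerify has
/// succeeded, so the result cannot act as a pre-authentication OPK oracle.
fn check_caller_opk(ct_opk: Option<&[u8]>, opk_sk: Option<&[u8]>) -> Result<(), KexError> {
    match (ct_opk, opk_sk) {
        (Some(_), Some(_)) | (None, None) => Ok(()),
        // Ciphertext but no key: never fall back to the 64-byte IKM.
        (Some(_), None) => Err(KexError::InvalidData),
        // Key but no ciphertext: the caller probably fetched the wrong OPK.
        (None, Some(_)) => Err(KexError::InvalidData),
    }
}
```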

OPK deletion is a forward secrecy boundary. While sk_OPK survives, a three-key compromise (sk_IK + sk_SPK + sk_OPK) recovers this session's key. After deletion, only two-key compromise (sk_IK + sk_SPK) suffices — the OPK's contribution to IKM is lost. "Immediately" means before the ratchet state is used for any messaging — not deferred to a background task or garbage collector. Any delay between decapsulation and deletion is a forward secrecy window where the three-key compromise remains viable.

The caller, not the library, performs the OPK deletion. receive_session accepts opk_sk as a shared reference (&xwing::SecretKey); the library decapsulates but holds no handle to persistent storage and cannot remove the OPK key from the caller's keystore. The caller MUST delete the OPK from persistent storage at the call site, immediately after receive_session returns successfully, before passing the resulting ratchet state to any messaging function. A reimplementer who expects the library to delete the OPK automatically, or who defers deletion to a separate "cleanup" pass, retains the key beyond the intended forward-secrecy boundary.

OPK deletion MUST be atomic with receive_session (single DB transaction): A server that completes receive_session and then crashes before deleting the OPK from storage will accept the same session_init again on restart — the OPK is still present, so the co-presence check passes. The second receive_session call succeeds and produces a second ratchet state from the same OPK decapsulation, violating the single-use guarantee. The correct model: execute receive_session and the OPK deletion as a single atomic database transaction. If receive_session succeeds, commit the transaction (which atomically deletes the OPK and persists the ratchet state). If the server crashes before the commit, the transaction rolls back and the OPK remains for the retried session init. If the server crashes after the commit, the OPK is deleted and the ratchet state is persisted — the session init is rejected on retry (ct_opk present but OPK deleted → InvalidData post-verification).
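The commit discipline can be modelled in memory to show the required ordering. Nothing is written unless receive_session succeeds, and the OPK deletion and ratchet persistence are applied together. In this std-only sketch, `Store`, `receive_and_commit`, and the `receive` closure (standing in for receive_session) are hypothetical. A real deployment replaces the two map writes with a single database transaction, which is what provides the crash atomicity described above. The sketch also does not zeroize the removed OPK bytes, which production code should do.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum KexError {
    InvalidData,
}

/// Hypothetical persisted state: OPK secret keys by opk_id, and serialized
/// ratchet states by peer fingerprint.
#[derive(Default)]
struct Store {
    opks: HashMap<u32, Vec<u8>>,
    ratchets: HashMap<[u8; 32], Vec<u8>>,
}

/// Run `receive` and, only if it succeeds, apply "delete OPK" and
/// "persist ratchet state" together. The OPK lookup result is passed through
/// as an Option, so a missing OPK is reported by `receive` after signature
/// verification rather than by this wrapper.
fn receive_and_commit<F>(
    store: &mut Store,
    peer_fp: [u8; 32],
    opk_id: Option<u32>,
    receive: F,
) -> Result<(), KexError>
where
    F: FnOnce(Option<&[u8]>) -> Result<Vec<u8>, KexError>,
{
    let opk_sk = opk_id.and_then(|id| store.opks.get(&id)).map(Vec::as_slice);
    // On error, return before touching the store: nothing is written.
    let ratchet_state = receive(opk_sk)?;
    // Commit point. In a real deployment these two writes form one database
    // transaction, so a crash leaves either both applied or neither.
    if let Some(id) = opk_id {
        store.opks.remove(&id);
    }
    store.ratchets.insert(peer_fp, ratchet_state);
    Ok(())
}
```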

If spk_id does not match any retained SPK (all rotated out or invalid ID), the caller MUST reject the session init with InvalidData — but only after signature verification (Step 3). Checking spk_id before signature verification would create an SPK enumeration oracle: an attacker could probe which SPK IDs are retained without paying the signature cost. After signature verification confirms the session init is authentic, an unrecognized spk_id is safely rejected. Using the wrong SPK key instead of rejecting yields implicit rejection, producing a diverged session key and AEAD failure cryptographically indistinguishable from corruption. Expired SPKs (private key deleted after the 30-day retention window, §10.2) must be handled identically to unknown SPKs — return InvalidData post-signature-verification. Maintaining a separate "expired" vs "unknown" error would reintroduce the enumeration oracle that post-signature ordering is designed to prevent.
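One way to make both the post-signature ordering and the uniform error hard to get wrong is to require proof of verification at the lookup site. In this std-only sketch, `SignatureVerified`, `SpkStore`, `SpkEntry`, and the integer clock are hypothetical. Expiry is treated exactly like an unknown ID.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum KexError {
    InvalidData,
}

/// Hypothetical proof-of-verification token. In a real module its field
/// would be private so that only the HybridVerify wrapper can create it.
pub struct SignatureVerified {
    _private: (),
}

/// A retained SPK secret and the time (hypothetical clock, seconds) after
/// which it must no longer be used.
struct SpkEntry {
    sk: Vec<u8>,
    expires_at: u64,
}

struct SpkStore {
    entries: HashMap<u32, SpkEntry>,
}

impl SpkStore {
    /// Resolve spk_id after signature verification (enforced by requiring
    /// the token). Unknown and expired IDs return the same error, so the
    /// result does not reveal which SPK IDs were ever issued.
    fn lookup(&self, _verified: &SignatureVerified, spk_id: u32, now: u64) -> Result<&[u8], KexError> {
        match self.entries.get(&spk_id) {
            Some(entry) if now < entry.expires_at => Ok(entry.sk.as_slice()),
            _ => Err(KexError::InvalidData),
        }
    }
}
```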

X-Wing implicit rejection (§8.4) applies to all three decapsulations — ct_ik, ct_spk, and ct_opk. Invalid or tampered ciphertexts produce pseudorandom shared secrets rather than errors, and the derived session key diverges silently from Alice's. decrypt_first_message fails at AEAD with no indication of which decapsulation diverged. Reimplementers using ML-KEM libraries with explicit-rejection APIs (that return an error on invalid ciphertexts) MUST suppress those errors and use the implicit-rejection output — propagating DecapsulationFailed would leak which ciphertext was malformed.

If opk_id references an absent OPK (expired or already consumed), the same applies — the pseudorandom shared secret from implicit rejection causes AEAD failure, leaking no information about OPK validity.

ML-KEM key format hazard: soliton's X-Wing (§8.5), via the ml-kem crate, stores the ML-KEM-768 decapsulation key with its 1152-byte dk_PKE field in Number Theoretic Transform (NTT) representation. This is the encoding FIPS 203 itself specifies: K-PKE.KeyGen sets dk_PKE = ByteEncode12(ŝ), where ŝ is already in the NTT domain. Reimplementers sourcing ML-KEM keys from other libraries (liboqs, PQClean, BouncyCastle) or from custom storage formats MUST confirm that dk_PKE uses this NTT-domain encoding, and MUST convert any coefficient-domain representation to the NTT domain before using it with soliton's X-Wing decapsulation. Using the wrong domain produces a pseudorandom shared secret (implicit rejection), causing a silent AeadFailed at decrypt_first_message with no diagnostic pointing to the format mismatch. See §8.5 for the full key layout.

Diagnostic note — correct spk_id with wrong secret key: An unrecognized spk_id is caught explicitly (rejected as InvalidData post-signature-verification). A recognized spk_id paired with the wrong secret key (e.g., a storage corruption that maps a valid ID to a different key) is not caught at this step — ML-KEM implicit rejection produces a pseudorandom ss_spk, receive_session returns success, and the error surfaces only when decrypt_first_message fails with AeadFailed. No diagnostic distinguishes this from ciphertext tampering, transport corruption, or any other decapsulation divergence. This is the hardest SPK storage bug to diagnose. Implementations that maintain an spk_id → sk mapping SHOULD verify the mapping's integrity independently (e.g., by storing a fingerprint of the public key alongside the private key and checking it before decapsulation).

Diagnostic note — correct opk_id with wrong secret key: The same applies to OPK: a recognized opk_id paired with the wrong opk_sk (storage corruption mapping a valid OPK ID to different key material) produces a pseudorandom ss_opk via implicit rejection, receive_session returns success, and the error surfaces only as AeadFailed at decrypt_first_message. Unlike the SPK case, OPK keys are single-use and deleted immediately after decapsulation (§5.5 Step 4), so long-term storage corruption is less likely — but the failure mode is identical. Implementations SHOULD store an OPK public key fingerprint alongside the OPK secret key and verify it before decapsulation, the same as for SPK.
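Both diagnostic notes recommend storing a public-key fingerprint next to each pre-key secret. A minimal sketch, assuming a hypothetical derive_pk function that recomputes the X-Wing public key from the stored secret key. StoredPreKey and KeyStoreCorruption are illustrative names, and the error is meant as a local diagnostic rather than something reported to the peer.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StoredPreKey:
    key_id: int            # spk_id or opk_id
    sk: bytes              # X-Wing secret key
    pk_fingerprint: bytes  # SHA3-256(pk), recorded when the key pair was generated


class KeyStoreCorruption(Exception):
    """Local diagnostic only; never reported to the peer."""


def store_prekey(key_id: int, sk: bytes, pk: bytes) -> StoredPreKey:
    return StoredPreKey(key_id, sk, hashlib.sha3_256(pk).digest())


def load_checked_sk(entry: StoredPreKey,
                    derive_pk: Callable[[bytes], bytes]) -> bytes:
    """Recompute the public key from sk and compare it with the stored fingerprint."""
    fp = hashlib.sha3_256(derive_pk(entry.sk)).digest()
    if not hmac.compare_digest(fp, entry.pk_fingerprint):
        raise KeyStoreCorruption(f"pre-key {entry.key_id}: sk does not match stored pk")
    return entry.sk
```

Running the check before decapsulation turns a storage bug that would otherwise surface as an unexplained AeadFailed into an explicit local error.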

Step 5: Derive Session Key

Identical HKDF as Alice (§5.4 Step 4), using:

ikm: ss_ik || ss_spk [|| ss_opk]
info: Alice's IK_pub, Bob's IK_pub, Alice's EK_pub (from session init)

Produces identical root_key and epoch_key.

IKM zeroization obligation (identical to §5.4 Step 4): After HKDF output is split into root_key and epoch_key, zeroize the IKM (ss_ik || ss_spk [|| ss_opk]) and each component shared secret (ss_ik, ss_spk, ss_opk). These are uniformly distributed 32-byte KEM shared secrets — leaving them in memory after use enables an attacker with post-compromise memory access to recover root_key and epoch_key. See §5.4 Step 4 for the full zeroization rationale and the note on IKM concatenation buffer zeroization (the concatenated buffer holds copies of all shared secrets and must be zeroized independently of the individual components).

Initiator-first ordering, not local-first: Alice's identity key precedes Bob's in the HKDF info on both sides — Alice uses Alice.IK_pub || Bob.IK_pub and Bob also uses Alice.IK_pub || Bob.IK_pub. The ordering is determined by the initiator/responder role, not by which party is doing the computation. A reimplementer who reads "identical HKDF as Alice" as "local key first, remote key second" would swap the order on Bob's side, producing a different session key — both parties succeed at their own computation with no error; the mismatch surfaces only as AeadFailed at decrypt_first_message.
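The role-based ordering, together with the IKM zeroization obligation above, can be sketched with an RFC 5869 HKDF built on Python's hmac and hashlib.sha3_256. The info label, the zero-length salt, and the 64-byte output split into root_key || epoch_key are placeholders for the normative KDF_KEX parameters in §5.4 Step 4, which this sketch does not reproduce. Python bytes objects are immutable, so only the bytearray concatenation buffer is actually wiped; the copies handed to hmac are not.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA3-256 output size


def hkdf_sha3_256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF (Extract, then Expand) instantiated with SHA3-256."""
    prk = hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha3_256).digest()
    okm, block = b"", b""
    for i in range(1, -(-length // HASH_LEN) + 1):  # ceil(length / HASH_LEN) blocks
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha3_256).digest()
        okm += block
    return okm[:length]


def kex_info(initiator_ik: bytes, responder_ik: bytes, sender_ek: bytes) -> bytes:
    # Role-based order: the initiator's (Alice's) IK comes first on BOTH sides.
    # The label is a hypothetical placeholder for build_kex_info (§5.4).
    return b"example-kex-v1" + initiator_ik + responder_ik + sender_ek


def derive_session_keys(shared_secrets, initiator_ik, responder_ik, sender_ek):
    info = kex_info(initiator_ik, responder_ik, sender_ek)
    ikm = bytearray(b"".join(shared_secrets))  # ss_ik || ss_spk [|| ss_opk]
    try:
        okm = hkdf_sha3_256(b"", bytes(ikm), info, 64)
    finally:
        ikm[:] = bytes(len(ikm))  # wipe the concatenation buffer
    return okm[:32], okm[32:]     # (root_key, epoch_key), assumed split
```

Bob calls derive_session_keys with exactly the same argument order as Alice; passing his own IK first yields a different key pair with no error at this step.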

sender_ek (Alice's EK_pub) in HKDF info MUST be the raw bytes from the received session init — no normalization: The "Verifier bytes obligation" in Step 3 covers signature verification; the same no-normalization requirement applies here. Bob's HKDF info computation uses sender_ek (the X-Wing public key Alice transmitted), and it MUST be the raw received bytes, not a library-imported-and-re-exported form. If Bob's X25519 library normalizes the public key at import (e.g., clears bit 255 of the last byte — the high bit is masked in RFC 7748 §5 scalar multiplication), the normalized bytes differ from Alice's transmitted bytes, the HKDF info diverges, and decrypt_first_message fails with AeadFailed with no diagnostic pointing to the normalization. The fix: use the raw session_init.sender_ek bytes directly in the info construction without passing them through a library's key import path. See §8.1 for the X25519 masking hazard. The no-normalization obligation for signatures (Step 3) is explicitly documented there; this is the equally critical, less obvious HKDF-side obligation.
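The normalization failure is easy to reproduce in isolation. In this sketch, normalizing_import is a hypothetical model of a library import path that masks bit 255, info_digest stands in for the HKDF call, and only the X25519 portion of sender_ek is shown. An honestly generated X25519 public key encodes a u-coordinate below 2^255 − 19, so bit 255 is already clear and normalization leaves it unchanged; tests that use only honestly generated keys will therefore not expose this bug.

```python
import hashlib


def normalizing_import(x25519_pk: bytes) -> bytes:
    """Models a library import path that clears bit 255 (RFC 7748 §5 masking)."""
    if len(x25519_pk) != 32:
        raise ValueError("X25519 public key must be 32 bytes")
    return x25519_pk[:31] + bytes([x25519_pk[31] & 0x7F])


def info_digest(info: bytes) -> bytes:
    """Stand-in for HKDF: any byte difference in info changes every derived key."""
    return hashlib.sha3_256(info).digest()


ik_bytes = b"A" * 3200 + b"B" * 3200   # Alice.IK_pub || Bob.IK_pub (dummy values)
received = bytes(31) + b"\x85"         # X25519 part of sender_ek with bit 255 set
alice_side = info_digest(ik_bytes + received)
bob_raw = info_digest(ik_bytes + received)                             # correct
bob_normalized = info_digest(ik_bytes + normalizing_import(received))  # diverges
```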

Step 6: Decrypt First Message

// Guard: reject payloads too short to contain a nonce + Poly1305 tag.
// Minimum valid length is 40 bytes (24-byte nonce + 16-byte tag). Run this
// check before deriving any key material. Without it, slicing [0..24] on a
// buffer shorter than 24 bytes causes out-of-bounds access in C or a panic in Rust.
// Return AeadFailed (not InvalidLength); see §12 oracle-collapse rationale.
if len(encrypted_payload) < 40:
    raise AeadFailed

msg_key = KDF_MsgKey(epoch_key, 0)    // Counter 0 for the first message

// Reconstruct AAD from received session init
session_init_bytes = encode_session_init(received_session_init)
aad = "lo-dm-v1" || Alice.fingerprint_raw || Bob.fingerprint_raw || session_init_bytes

// Extract nonce from payload
nonce = encrypted_payload[0..24]
ciphertext = encrypted_payload[24..]

// msg_key is secret material: zeroize it whether or not decryption succeeds.
try:
    plaintext = AEAD-Decrypt(msg_key, nonce, ciphertext, aad)
finally:
    zeroize(msg_key)
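A Python rendering of the Step 6 pseudocode, with the length guard placed before any key derivation and msg_key wiped on both the success and failure paths. kdf_msg_key and aead_decrypt are hypothetical callbacks standing in for KDF_MsgKey and XChaCha20-Poly1305 decryption (Python's standard library has no XChaCha20-Poly1305); the AAD layout follows the pseudocode.

```python
from typing import Callable


class AeadFailed(Exception):
    """Stand-in for soliton's AeadFailed."""


NONCE_LEN, TAG_LEN = 24, 16
DM_LABEL = b"lo-dm-v1"


def decrypt_first_message(epoch_key: bytes, alice_fp: bytes, bob_fp: bytes,
                          session_init_bytes: bytes, payload: bytes,
                          kdf_msg_key: Callable[[bytes, int], bytes],
                          aead_decrypt: Callable[[bytes, bytes, bytes, bytes], bytes]) -> bytes:
    # Length guard first, so no key is derived for a payload that cannot be valid.
    if len(payload) < NONCE_LEN + TAG_LEN:
        raise AeadFailed()
    aad = DM_LABEL + alice_fp + bob_fp + session_init_bytes
    nonce, ciphertext = payload[:NONCE_LEN], payload[NONCE_LEN:]
    msg_key = bytearray(kdf_msg_key(epoch_key, 0))  # counter 0 for the first message
    try:
        return aead_decrypt(bytes(msg_key), nonce, ciphertext, aad)
    finally:
        msg_key[:] = bytes(len(msg_key))  # zeroize on success and on failure
```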

Bob's encode_session_init(received_session_init) must produce byte-for-byte identical output to Alice's Step 6 encoding — any field transformation during decode (padding trimming, key clamping, normalization) that alters re-encoded bytes causes silent AEAD failure with no diagnostic.

AeadFailed conflation is normative — MUST NOT add distinguishing codes: All AEAD authentication failures in decrypt_first_message — whether caused by a wrong session key (diverged KEM output), a tampered nonce, a modified AAD, a corrupt ciphertext, or a re-encoded session_init_bytes that differs from Alice's original — MUST return AeadFailed with no distinguishing information. Reimplementers MUST NOT return distinct error codes for these cases (e.g., a separate KeyDerivationMismatch or AadMismatch). Adding distinguishing codes creates an oracle: an attacker who can trigger specific errors knows which layer of the construction failed, enabling targeted substitution attacks. The single AeadFailed response forces the attacker to succeed at AEAD authentication — i.e., to know the key — to get any response other than failure. This requirement also applies to receive_session as a whole: VerificationFailed (Step 3) and AeadFailed (Step 6) must remain the only cryptographic-layer failure codes, not be further subdivided.

First-message msg_key zeroization (Bob): msg_key is secret material and MUST be zeroized once AEAD decryption completes, whether it succeeds or fails. In Rust, Zeroizing<[u8; 32]> handles this automatically on drop. In C/Go/Python, explicitly zeroize the key buffer on every exit path after AEAD-Decrypt returns, including the AeadFailed path. The same obligation applies on Alice's encrypt path (§5.4 Step 7).

Step 7: Initialize Ratchet State

Bob initializes LO-Ratchet with:

root_key from key derivation
recv_epoch_key = epoch_key (the session-derived key, now used as the receive epoch key)
recv_ratchet_pk = Alice's EK_pub (from session init)
ratchet_pending = true (Bob must perform a KEM ratchet step before his first send)
recv_count starts at 1 (the session-init message used counter 0). Corollary: a ratchet header with n = 0 will fail the duplicate check (0 < recv_count = 1) and be rejected as DuplicateMessage. Counter 0 is permanently outside the ratchet's receive window — it belongs to decrypt_first_message, not the ratchet. A reimplementer who initializes recv_count = 0 instead of 1 would accept n = 0 as a valid ratchet message, creating a counter alias with the first-message counter and enabling replay of the session-init payload as a ratchet message (AEAD would fail due to AAD mismatch, but the acceptance represents a protocol divergence). This recv_count = 1 invariant is a construction-time guarantee, not enforced at deserialization: the §6.8 guards do not reject a deserialized blob with recv_count = 0. A cross-implementation blob constructed with recv_count = 0 is silently accepted by the reference deserialization. The deserialization path trusts the invariant was maintained during construction. Reimplementers who allow recv_count = 0 at init time (e.g., for testing or partial state reconstruction) produce blobs that the reference accepts, but with state that violates the counter-alias-free guarantee above.
recv_seen = empty (counter 0 was consumed by decrypt_first_message, outside the ratchet)
Bob generates his own ratchet keypair only on first reply (triggered by ratchet_pending)

Session init replay — library boundary: receive_session does not detect or reject replayed session inits. A replayed session init carries a valid signature (it was signed by Alice), valid KEM ciphertexts, and passes all library-layer checks. If the same session_init_bytes is submitted to receive_session a second time (with a still-present OPK), a second ratchet state is created from the same KEM outputs — two live ratchet objects initialized identically, with the same root key and epoch key, in different memory locations. The library has no persistent session registry and cannot distinguish a replay from a legitimate first delivery.

Replay detection is the caller's responsibility. The correct architecture:

The relay MUST deduplicate session inits before delivering them to Bob's device — the natural deduplication key is (sender_ik_fingerprint, recipient_ik_fingerprint, SHA3-256(session_init_bytes)). A relay that delivers the same session init twice creates the duplicate-ratchet-state condition.
Bob's client MUST enforce at-most-once semantics for session establishment with a given peer: if a ratchet session already exists for the (sender_ik_fingerprint, spk_id, ct_ik) combination, the client MUST NOT call receive_session a second time with the same session init.
The OPK single-use delete-in-transaction requirement (§5.5 Step 4) provides a partial backstop: once the OPK is deleted, a replayed ct_opk-bearing session init fails with InvalidData at the co-presence check. However, OPK-less session inits (no ct_opk) have no such backstop and rely entirely on caller-side deduplication.

receive_session exposing this boundary as a caller obligation (rather than adding a session registry inside the library) is intentional — the library has no persistent storage and cannot implement relay-side deduplication. Applications building atop soliton MUST implement the deduplication layer at the relay and client levels described above.
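A minimal sketch of the relay-side deduplication key described above. The class name is hypothetical, and the in-memory set is for illustration only; a relay would need durable storage so that the seen-set survives restarts.

```python
import hashlib
from typing import Set, Tuple


class SessionInitDeduplicator:
    """At-most-once delivery keyed on
    (sender_ik_fingerprint, recipient_ik_fingerprint, SHA3-256(session_init_bytes))."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[bytes, bytes, bytes]] = set()

    def should_deliver(self, sender_fp: bytes, recipient_fp: bytes,
                       session_init_bytes: bytes) -> bool:
        key = (sender_fp, recipient_fp, hashlib.sha3_256(session_init_bytes).digest())
        if key in self._seen:
            return False  # duplicate: do not deliver to the recipient device again
        self._seen.add(key)
        return True
```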

5.6 Security Analysis

Multi-key session binding: The session key requires ALL shared secret components. No single key compromise is sufficient. Note: "IK" in the table below means the X-Wing component only (bytes 0-2431 of the LO composite secret key — see clarification after the table).

Keys compromised	Session key recoverable?
IK (X-Wing component) alone	No — missing ss_spk
SPK alone	No — missing ss_ik
OPK alone	No — missing ss_ik, ss_spk
IK (X-Wing component) + SPK	Yes (same as X3DH / PQXDH)
IK (X-Wing component) + SPK + OPK	Yes

"IK" in this table means the X-Wing component only: Session key recovery via IK requires the X-Wing private key (sk_X || dk_M, bytes 0-2431 of the LO composite secret key) — the component needed to decapsulate ct_ik. The Ed25519 and ML-DSA sub-keys within the LO composite key do not participate in key agreement and are irrelevant to session key recovery. A full LO composite key compromise (sk_IK) trivially yields the X-Wing sub-key, so the security table holds. But an adversary who compromises only the Ed25519 or ML-DSA sub-keys (e.g., through an algorithm-specific attack) gains forgery capability (session initiation, SPK signing) but NOT session key recovery — IK KEM decapsulation is independent of the signing sub-keys.

SPK is the most exposed key (medium-term, stored on relay, retained 30 days after rotation). IK is long-term and device-stored only. Requiring both for session key recovery means the least-protected key is no longer a single point of failure.

Forward secrecy: Forward secrecy comes from SPK rotation and OPK single-use. After SPK private key is deleted, sessions using that SPK are permanently secure — even if IK is later compromised, the attacker lacks ss_spk.

EK_sk forward-secrecy window: Alice's ephemeral key EK_sk (§5.4 Step 2) must remain live until she successfully processes Bob's first KEM ratchet step (§6.6 new-epoch path), at which point it MUST be zeroized. Until zeroization, a device compromise allows an attacker to recover EK_sk and decapsulate Bob's first KEM ratchet ciphertext — recovering ss_spk_ratchet and therefore the initial epoch key. This exposes all messages in Alice's first ratchet epoch (from send_count = 1 through the first KEM ratchet step). This window is bounded and unavoidable: the key must exist until the decapsulation it enables occurs. It does not affect sessions that used an OPK (the OPK provides an additional shared secret layer), and it disappears as soon as Alice processes Bob's first ratchet reply. The EK_sk zeroization obligation is documented in §5.4 Step 2 and §13.5; the forward-secrecy implication is that the window is as long as the round-trip to Bob's first reply.

Post-quantum security: All shared secrets via X-Wing. Both X25519 and ML-KEM-768 must be broken simultaneously.

Mutual authentication: Both identity keys are cryptographically bound into the session. Bob's IK is bound via KEM encapsulation (ct_ik): only the holder of Bob's IK private key can decapsulate and derive the session key. Alice's IK is bound via a HybridSign over the encoded SessionInit (sender_sig, §5.4 Step 6): only the holder of Alice's IK private key can produce a valid signature.

Recipient binding — implicit and explicit: Bob's IK is bound implicitly by the KEM: an attacker lacking Bob's IK private key cannot decapsulate ct_ik and the session key they derive will be garbage. Bob's IK is also bound explicitly via recipient_ik_fingerprint embedded in the signed SessionInit: sender_sig directly names Bob as the intended recipient, independent of KEM decapsulability. Formal verification tools (Tamarin, ProVerif) can derive recipient binding from the signature alone, without modelling KEM implicit binding as a separate lemma.

UKS (Unknown Key Share) resistance: An Unknown Key Share attack would allow Alice to establish a session that Alice believes is with Bob, but Bob believes is with a third party C. LO-KEX prevents this via a three-link chain that must all hold simultaneously: (1) Bob validates SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint (§5.5 Step 1), binding Bob's own key to the session before any KEM operation; (2) Bob verifies Alice's signature over session_init_bytes, which contains recipient_ik_fingerprint as a field, so the signature covers Bob's identity explicitly, not just cryptographic material that implies it; (3) both Alice.IK_pub and Bob.IK_pub are bound into the HKDF info field via build_kex_info, so a session where Alice thinks she is talking to Bob but Bob thinks he is talking to C would require both parties to derive the same session key from different info inputs, which HKDF collision resistance prevents. All three links are required: the fingerprint check alone fails if the attacker can substitute a different key with the same fingerprint (SHA3-256 second-preimage resistance is required); the signature check alone fails if the signature does not name the recipient (it does, via recipient_ik_fingerprint); the HKDF binding alone is not direct authentication (both parties must independently check that they are talking to the expected peer). This argument is documented as A9 in Abstract.md; formal models must verify that all three links hold simultaneously under the relevant security assumptions.

Explicit initiator authentication: Alice's sender_sig is proof-of-possession of sk_IK_A. An adversary who knows only pk_IK_A cannot produce a valid sender_sig without sk_IK_A (HybridSign EUF-CMA, §3.3). Bob verifies the signature before any KEM operations (§5.5 Step 3), so a forged or missing signature is rejected immediately, not silently. Both identity keys are also committed into the HKDF info field — any substitution additionally fails at first-message decryption.

IMPORTANT — First-contact limitation (TOFU): The mutual authentication guarantee holds only when both parties possess authentic copies of each other's identity keys. The signature proves Alice holds sk_IK_A, but does not prove that pk_IK_A actually belongs to the human "Alice" — on first contact, Bob cannot verify the binding between pk_IK_A and a human identity.

A relay controlling the delivery path could substitute a different IK pair (its own pk_IK_X, sk_IK_X), forge a valid sender_sig, and impersonate Alice to Bob — because Bob has no reference key to compare against on first contact. This is trust-on-first-use (TOFU), identical to Signal, SSH, and all systems without centralized PKI. It is inherent, not a bug.

Mitigations:

Verification phrases (§9) for post-hoc verification.
Key pinning after first contact.
Community server context (shared presence provides key distribution).
Multi-path verification (compare keys from multiple independent servers).

KCI resistance: Corrupt(IK, A) enables impersonation of Alice (both signing pre-keys and forging sender_sig). Cannot impersonate Bob to Alice (requires Bob's SPK/OPK private keys, independent of sk_IK_A).

Non-deniability: LO-KEX does not provide deniability. Alice's sender_sig (§5.4 Step 6) is a HybridSign EUF-CMA signature over the encoded SessionInit — Bob can present (session_init_bytes, sender_sig, pk_IK_A) to any third party as cryptographic proof that Alice initiated this specific session. This is a deliberate departure from Signal's X3DH, which achieves deniability through DH's non-binding outputs (both parties can compute the same shared secret, so neither can prove who initiated). Systems requiring deniable authentication should note this property. See Appendix D (Hashimoto, PKC 2024) for post-quantum deniable AKE approaches.

Header integrity: AAD binds session init (§5.4 Step 7) and ratchet headers (§6.5). Header tampering → AEAD failure. See §7.3-7.4.

spk_id cryptographic binding: spk_id is not included in the HKDF info of KDF_KEX (§5.4) — its binding flows through a different path. spk_id is a field of SessionInit, which is encoded by encode_session_init (§7.4) and incorporated into the AEAD AAD for the first message (§5.4 Step 7 and §7.3). AEAD authentication over this AAD provides the cryptographic binding: any attacker who substitutes a different spk_id in transit causes the encode_session_init output to differ, which changes the AAD bytes, which causes AEAD authentication to fail on the responder's side. The binding chain is: spk_id → encode_session_init(session_init) → AEAD AAD → authentication tag. A formal modeler constructing a spk_id-substitution attack lemma should derive binding from this chain rather than from the KDF info path.

Channel 2 surface: LO-KEX exposes the following metadata to a passive network adversary: the bundle fetch event (party A intends to initiate a session with party B), the SessionInit message (reveals both fingerprints and crypto version to any interceptor), and failed initialization responses (a structural rejection is distinguishable from silence, enabling version and presence probing — see §1.5 for the probing implication). All content and authentication guarantees above are unaffected; these are structural metadata leaks outside the Channel 1 scope of this section.

6. LO-Ratchet

After session establishment, ongoing message encryption uses LO-Ratchet.

6.1 Overview

LO-Ratchet combines a KEM ratchet (replacing Double Ratchet's DH ratchet) with counter-mode message key derivation. When the conversation direction changes, the new sender generates a fresh X-Wing keypair, encapsulates to the other party's current ratchet public key, and derives new root and epoch keys. Within an epoch (between KEM ratchet steps), each message key is derived directly from the epoch key and the message counter in O(1), without sequential chain advancement.
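Because each message key depends only on the epoch key and the counter, the receiver can derive the key for message n directly, in any order. A sketch, assuming an HMAC-SHA3-256 construction with a placeholder label; the normative KDF_MsgKey construction is defined elsewhere in this specification and is not reproduced here.

```python
import hashlib
import hmac


def kdf_msg_key(epoch_key: bytes, counter: int) -> bytes:
    """O(1) per-message key: a function of (epoch_key, counter) only.

    The label and the HMAC-SHA3-256 construction are placeholders.
    """
    if not 0 <= counter < 2**32:
        raise ValueError("counter must fit in u32")
    msg = b"example-msg-key" + counter.to_bytes(4, "big")
    return hmac.new(epoch_key, msg, hashlib.sha3_256).digest()
```

There is no chain state to advance, so message 5 can be decrypted before messages 1 through 4 arrive without caching skipped keys.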

Design rationale: The Signal Double Ratchet uses a sequential KDF chain that provides per-message forward secrecy — compromising the chain key at position N reveals only messages N+1, N+2, ... but not messages 0..N-1. LO-Ratchet deliberately trades this for per-epoch forward secrecy: compromising an epoch key reveals all messages in that epoch. This simplification eliminates the skip cache, TTL expiry, purge logic, and O(N) skip cost for out-of-order messages, removing the most error-prone component of the protocol. The practical security impact is minimal — the epoch key shares a memory region with strictly more powerful secrets (root key, ratchet secret key), and any realistic memory compromise that extracts the epoch key also extracts these adjacent secrets, rendering per-message forward secrecy moot. Scope note: this memory-colocation argument holds when all ratchet state resides in a single protected memory region. Architectures where only the epoch key is exported — for example, an HSM-backed ratchet that holds root_key and send_ratchet_sk in hardware but exports send_epoch_key to the application CPU for message key derivation — break the colocation assumption: an attacker who compromises only the exported epoch key does not automatically also hold root_key. In such architectures, the per-epoch vs. per-message trade-off carries real security cost and should be evaluated against the specific deployment threat model.

Channel 2 surface: The ratchet header (pk_s, c_ratchet, n, pn) is transmitted in cleartext and bound into the AEAD AAD but not encrypted. A passive network observer learns: when epoch transitions occur (from pk_s changes), whether a KEM ratchet step is present in this message (c_ratchet), the message's position within the current epoch (n), and the number of messages sent in the previous epoch (pn). Message content, epoch keys, and identity are fully protected; the header fields are structural metadata outside the Channel 1 scope of this section. See §1.5 for the full Channel 2 surface and transport-layer mitigations.

6.2 State

RatchetState = {
    root_key:               32 bytes
    send_epoch_key:         32 bytes
    recv_epoch_key:         32 bytes
    local_fp:               32 bytes    // SHA3-256(full 3200-byte LO composite public key) for local party — NOT a sub-key hash, NOT the hex string
    remote_fp:              32 bytes    // SHA3-256(full 3200-byte LO composite public key) for remote party — same derivation rule
    send_ratchet_sk:        Option<X-Wing secret key>    // None until first send; also serves as decapsulation key for incoming KEM ratchet steps (§6.6) — there is no separate recv_ratchet_sk. Stored as the 2432-byte expanded X-Wing secret key form (NOT the 32-byte seed) — see §6.8 guard 2; storing the seed form produces InvalidData on serialization.
    send_ratchet_pk:        Option<X-Wing public key>    // None until first send. Dual role: (1) local state: the public key corresponding to send_ratchet_sk, included in outgoing message headers; (2) epoch routing anchor for the peer, which stores it as its recv_ratchet_pk and matches each incoming header.ratchet_pk against that value to identify the current epoch (§6.6)
    recv_ratchet_pk:        Option<X-Wing public key>    // None for Alice until first recv
    // Non-Rust reimplementer note: Option fields (recv_ratchet_pk, send_ratchet_sk/pk,
    // prev_recv_epoch_key, prev_recv_ratchet_pk) use Rust's Option<T> type where None
    // is semantically distinct from any byte pattern. In languages without sum types
    // (C, Go), represent None with a separate boolean presence flag — do NOT use an
    // all-zero array as a sentinel. An all-zero X-Wing public key is a valid (degenerate)
    // key that would cause epoch routing in decrypt (§6.6) to match incorrectly.
    prev_recv_epoch_key:    Option<32 bytes>              // Previous epoch key for late messages
    prev_recv_ratchet_pk:   Option<X-Wing public key>    // Previous epoch ratchet public key
    send_count:             u32    // = header.n when sending; starts at 1 for Alice
    recv_count:             u32    // high-water mark: max(n+1) for current recv epoch
    prev_send_count:        u32    // = header.pn when sending
    ratchet_pending:        bool   // set when peer KEM ciphertext received; cleared on next send
    recv_seen:              set of u32    // message counters successfully decrypted in current recv epoch
    prev_recv_seen:         set of u32    // message counters successfully decrypted in previous recv epoch
    epoch:                  u64    // monotonic anti-rollback counter for serialization
}

send_ratchet_sk dual role

send_ratchet_sk serves two distinct purposes: (1) it is the secret half of the local ratchet keypair whose public half, send_ratchet_pk, is advertised in outgoing message headers, and (2) it decapsulates incoming KEM ratchet ciphertexts. There is no separate recv_ratchet_sk. Because the peer encapsulates its next KEM ratchet step to the send_ratchet_pk it last received, the party that most recently sent a message holds the decapsulation key for the peer's next reply. A reimplementer who adds a separate recv_ratchet_sk field diverges from the state model and will fail on the first direction change.

Clarification on counter fields: send_count is the counter included as n in the ratchet header of outgoing messages. recv_count is the high-water mark for the current receive epoch: max(n + 1) across all successfully decrypted messages. prev_send_count is the value of send_count at the moment the KEM ratchet step fires, included as pn in the first message of a new send epoch. This is not the number of messages sent in that epoch — for Alice's first epoch, one message at n=1 advances send_count to 2, so pn=2 when the ratchet fires. These are the same values that appear in the wire format (Protocol Spec §12.9).

ratchet_pending flag: Set to true when a message is received that carries a new peer ratchet public key (triggering recv_ratchet_pk update). Cleared when the next encrypt() call performs the send-side KEM ratchet step. While ratchet_pending is true, any call to encrypt() will perform the ratchet step first. This defers the send-side ratchet until the party actually needs to send, rather than forcing it immediately on receipt. For Bob, ratchet_pending = true at initialization (§5.5 Step 7) — it is not exclusively a runtime transition flag. It means "a KEM ratchet step is required before the next send," which is true immediately after session establishment for the responder.

recv_seen set: Tracks which message counters have been successfully decrypted in the current receive epoch. Used for duplicate detection: a message with n already in recv_seen is rejected as DuplicateMessage. The set is bounded at MAX_RECV_SEEN = 65536 entries as defense-in-depth against memory exhaustion. The set resets on each KEM ratchet step. Required operations: O(1) average-case contains for per-message duplicate detection (called on every decrypt), and sorted ascending iteration at serialization time (§6.8 serializes recv_seen entries in ascending order). The data structure choice is an implementation concern — a hash set provides O(1) contains and requires a sort step at serialization; a sorted B-tree provides O(log n) contains and O(1) sorted iteration. Either satisfies the spec; the performance difference becomes meaningful only near MAX_RECV_SEEN = 65536 entries.
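A hash-set variant of recv_seen, sketched under two assumptions. First, RecvSeenFull is a hypothetical error, since this section does not say what happens at the MAX_RECV_SEEN bound. Second, counters are recorded only after AEAD authentication succeeds, matching the definition of recv_seen as the set of successfully decrypted counters: the duplicate check runs before decryption, the insertion after.

```python
MAX_RECV_SEEN = 65536


class DuplicateMessage(Exception):
    """Stand-in for soliton's DuplicateMessage."""


class RecvSeenFull(Exception):
    """Hypothetical error; the overflow behaviour is not named in this section."""


class RecvSeen:
    """Hash-set variant: O(1) average contains, sort at serialization time."""

    def __init__(self) -> None:
        self._seen = set()

    def check(self, n: int) -> None:
        """Before AEAD: reject counters already decrypted in this epoch."""
        if n in self._seen:
            raise DuplicateMessage(n)

    def record(self, n: int) -> None:
        """After AEAD success only, so forged messages cannot fill the set."""
        if len(self._seen) >= MAX_RECV_SEEN:
            raise RecvSeenFull()
        self._seen.add(n)

    def reset(self) -> None:
        """Called on each KEM ratchet step."""
        self._seen.clear()

    def serialize_order(self) -> list:
        return sorted(self._seen)  # §6.8 serializes entries in ascending order
```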

Previous epoch key: prev_recv_epoch_key holds the epoch key from the immediately preceding receive epoch, allowing decryption of late-arriving messages from that epoch. It is overwritten (and the old value zeroized) by the next KEM ratchet step — only one previous epoch key is retained at any time. prev_recv_ratchet_pk identifies which ratchet public key the previous epoch was associated with, enabling the receiver to route incoming messages to the correct epoch key.
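Epoch routing against the current and previous receive keys can be sketched as follows, using Python's None for absent Option fields as the non-Rust note in the state definition requires. The RecvEpochs type and the string results are illustrative; the full §6.6 rules (including when a header for a new key must also carry a KEM ciphertext) are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecvEpochs:
    # None means "absent"; an all-zero key would be a real (degenerate) value.
    recv_ratchet_pk: Optional[bytes]
    recv_epoch_key: bytes
    prev_recv_ratchet_pk: Optional[bytes] = None
    prev_recv_epoch_key: Optional[bytes] = None


def route(state: RecvEpochs, header_pk: bytes) -> str:
    """Select which receive epoch an incoming header belongs to."""
    if state.recv_ratchet_pk is not None and header_pk == state.recv_ratchet_pk:
        return "current"
    if state.prev_recv_ratchet_pk is not None and header_pk == state.prev_recv_ratchet_pk:
        return "previous"
    return "new"  # unseen key: candidate KEM ratchet step (§6.6)
```

With a zero-filled array used as the "absent" sentinel instead, a header carrying an all-zero public key would be routed to "current" for a party that has not yet received anything.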

Fingerprint immutability: local_fp and remote_fp are fixed at session initialization (init_alice/init_bob) and MUST NOT be modified for the session lifetime. Both values are embedded in every message's AAD — mid-session modification would silently corrupt AAD for all in-flight and future messages, producing permanent AeadFailed without a session reset. The library enforces this by storing the fingerprints inside RatchetState (not caller-supplied per call) and by rejecting mutations via the exclusive-access model (§6.2). In languages where state fields are publicly accessible, implementations MUST treat these fields as read-only after initialization.

Fingerprint derivation: local_fp and remote_fp are SHA3-256 of the full 3200-byte LO composite public key (X-Wing pk (1216 B) || Ed25519 pk (32 B) || ML-DSA-65 pk (1952 B)) — not a hash of any single sub-key, and not the hex string. The CAPI (soliton_ratchet_init_alice, soliton_ratchet_init_bob) accepts pre-computed 32-byte fingerprint bytes; the library cannot verify correct derivation. A mismatch produces AeadFailed on every message with no diagnostic — the fingerprints are embedded in AAD, so a wrong fingerprint fails authentication identically to a tampered ciphertext. Use soliton_identity_fingerprint (§13.4) to compute fingerprints from public key bytes.
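A sketch of the derivation that soliton_identity_fingerprint is described as performing, using Python's hashlib; it is not the CAPI itself. It checks the 3200-byte composite length and returns the raw digest bytes, not hex.

```python
import hashlib

XWING_PK_LEN, ED25519_PK_LEN, MLDSA65_PK_LEN = 1216, 32, 1952
COMPOSITE_PK_LEN = XWING_PK_LEN + ED25519_PK_LEN + MLDSA65_PK_LEN  # 3200


def identity_fingerprint(composite_pk: bytes) -> bytes:
    """Raw 32-byte fingerprint: SHA3-256 over the full composite public key."""
    if len(composite_pk) != COMPOSITE_PK_LEN:
        raise ValueError(f"expected {COMPOSITE_PK_LEN} bytes, got {len(composite_pk)}")
    return hashlib.sha3_256(composite_pk).digest()  # raw bytes, not .hexdigest()
```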

Identity fingerprint invariant: local_fp and remote_fp must be distinct (local_fp ≠ remote_fp) and neither may be all-zero. Equal fingerprints would break AAD asymmetry, allowing a message encrypted by one party to be replayed as if sent by the other. All-zero fingerprints indicate uninitialized state. Both conditions are enforced at init_alice/init_bob (returning InvalidData) and at deserialization (guard 20). A state constructed with a zero fingerprint can encrypt/decrypt (the AEAD doesn't inspect fingerprint values), but fails to round-trip through serialization — enforcing at init prevents this latent inconsistency.

init_alice / init_bob function signatures and error returns:

function init_alice(
    root_key:         [u8; 32],   // from KDF_KEX (§5.4 Step 4), secret material
    epoch_key:        [u8; 32],   // from KDF_KEX (§5.4 Step 4), becomes send_epoch_key
    local_fp:         [u8; 32],   // SHA3-256 of Alice's full 3200-byte public key
    remote_fp:        [u8; 32],   // SHA3-256 of Bob's full 3200-byte public key
    send_ratchet_pk:  X-Wing public key (1216 bytes),  // Alice's EK_pub (§5.4)
    send_ratchet_sk:  X-Wing secret key (2432 bytes),  // Alice's EK_sk (§5.4)
) → RatchetState | InvalidData

function init_bob(
    root_key:         [u8; 32],   // from KDF_KEX (§5.5 Step 4), secret material
    epoch_key:        [u8; 32],   // from KDF_KEX (§5.5 Step 4), becomes recv_epoch_key
    local_fp:         [u8; 32],   // SHA3-256 of Bob's full 3200-byte public key
    remote_fp:        [u8; 32],   // SHA3-256 of Alice's full 3200-byte public key
    recv_ratchet_pk:  X-Wing public key (1216 bytes),  // Alice's EK_pub (from SessionInit)
) → RatchetState | InvalidData

Parameter order note: In both functions, fingerprints follow root_key and chain_key (the CAPI name for the epoch_key parameter above) but precede the ephemeral key parameters: (root_key, chain_key, local_fp, remote_fp, key_params...). The full CAPI signatures are soliton_ratchet_init_alice(root_key, root_key_len, chain_key, chain_key_len, local_fp, local_fp_len, remote_fp, remote_fp_len, ek_pk, ek_pk_len, ek_sk, ek_sk_len, out) and soliton_ratchet_init_bob(root_key, root_key_len, chain_key, chain_key_len, local_fp, local_fp_len, remote_fp, remote_fp_len, peer_ek, peer_ek_len, out). The §13.4 summary abbreviates these for readability. A reimplementer who infers parameter order from the §13.4 abbreviation alone, or who follows an abbreviated listing that omits or reorders fingerprints, silently corrupts every message's AAD: the fingerprints are embedded in the AAD of every message, so a wrong order produces wrong AAD and immediate AEAD failure. For init_alice, send_ratchet_pk appears before send_ratchet_sk (public key before secret key), the reverse of the draft order common in academic specifications. Swapping pk and sk produces a type error in strongly-typed languages but not in C, Go, or Python, where both are plain byte buffers (*const u8 in C).

init_alice and init_bob accept caller-supplied fingerprints — they are inputs, not outputs derived from the key material. The functions cannot verify that the fingerprints match the actual public keys used in the KEM exchange. They return InvalidData for: local_fp == remote_fp (AAD asymmetry violation), either fingerprint all-zero (uninitialized sentinel), root_key all-zero (liveness sentinel), or epoch_key all-zero (degenerate KEX output). On InvalidData, no ratchet handle is allocated.
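The init-time rejections above can be sketched as follows. This is a minimal Python sketch, not the reference implementation: validate_init_inputs and InvalidData are hypothetical names, and the 32-byte length check is an assumption standing in for the CAPI's explicit length parameters. hmac.compare_digest supplies a constant-time comparison; it is only strictly needed for the secret root_key and epoch_key, but is harmless for the fingerprints. Like init_alice/init_bob, the sketch cannot confirm that the fingerprints belong to the keys used in the KEM exchange.

```python
import hmac


class InvalidData(Exception):
    """Hypothetical stand-in for the spec's InvalidData error."""


ZERO32 = bytes(32)


def _is_all_zero(value: bytes) -> bool:
    # Constant-time comparison: timing does not depend on where the bytes differ.
    return hmac.compare_digest(value, ZERO32)


def validate_init_inputs(root_key: bytes, epoch_key: bytes,
                         local_fp: bytes, remote_fp: bytes) -> None:
    """Reject the degenerate inputs listed for init_alice/init_bob (sketch)."""
    for name, value in (("root_key", root_key), ("epoch_key", epoch_key),
                        ("local_fp", local_fp), ("remote_fp", remote_fp)):
        if len(value) != 32:  # assumption: mirrors the CAPI length checks
            raise InvalidData(f"{name} must be 32 bytes")
    if hmac.compare_digest(local_fp, remote_fp):
        raise InvalidData("local_fp == remote_fp (AAD asymmetry violation)")
    if _is_all_zero(local_fp) or _is_all_zero(remote_fp):
        raise InvalidData("all-zero fingerprint (uninitialized sentinel)")
    if _is_all_zero(root_key):
        raise InvalidData("all-zero root_key (liveness sentinel)")
    if _is_all_zero(epoch_key):
        raise InvalidData("all-zero epoch_key (degenerate KEX output)")
```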

Root key and epoch key liveness: The root_key and the input epoch_key parameter (from LO-KEX) must not be all-zero at init time; all-zero values indicate uninitialized or degenerate KEX output. This check applies to the epoch key that becomes the active direction's key: send_epoch_key for Alice, recv_epoch_key for Bob. The other direction's epoch key (Alice's recv_epoch_key, Bob's send_epoch_key) is intentionally set to all-zeros as a placeholder and is not checked at init; the first KEM ratchet step in that direction sets it to a real value. After a session-fatal error (encrypt AEAD failure), root_key is zeroized, and the all-zero liveness check on subsequent encrypt/decrypt calls prevents use of the dead session.

Dual role of root_key: root_key serves two purposes: (1) it is the HKDF salt in KDF_Root (§6.4), providing forward secrecy by mixing fresh KEM shared secrets into the key hierarchy, and (2) it is the liveness sentinel checked at the top of encrypt/decrypt. Because root_key is secret material, the liveness check (§6.5, §6.6) uses a constant-time comparison so that timing side channels cannot leak root_key bytes.

Concurrency model: All operations on a RatchetState require exclusive access. No concurrent or reentrant calls are safe on the same state handle — even read-only queries (epoch, is-pending, etc.) must not race with encrypt/decrypt. The CAPI enforces this via an AtomicBool reentrancy guard (§13.6). Reimplementers wrapping the Rust core directly must provide their own mutual exclusion (e.g., Mutex<RatchetState>, not RwLock — encrypt, decrypt, and serialization may all trigger KEM ratchet steps or mutate counters). Exception: derive_call_keys (§6.12) takes &self and reads root_key without mutating any ratchet state. Multiple concurrent derive_call_keys calls are safe with respect to each other. However, derive_call_keys must still not race with encrypt/decrypt (which may advance root_key via a KEM ratchet step), so a RwLock — where derive_call_keys takes a read lock and encrypt/decrypt take write locks — is a valid alternative to Mutex for reimplementers who need concurrent call key derivation.

to_bytes() requires a write lock, not a read lock: to_bytes() is ownership-consuming (it takes self and nulls the handle on success, §6.8). In a RwLock scenario, to_bytes() must hold a write lock with no outstanding readers. A reimplementer who acquires only a read lock for to_bytes() while a concurrent derive_call_keys also holds a read lock creates a use-after-consume race: both calls access the same handle, but to_bytes() destroys it. Rust's ownership system prevents this at compile time: consuming self moves the value, which the borrow checker rejects while any &self borrow is outstanding. C/Go reimplementers using explicit RwLock primitives MUST acquire a write lock for to_bytes(), which waits for all outstanding derive_call_keys read locks to drain before proceeding.
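One way to honor these rules without a borrow checker is to route every operation, including to_bytes(), through a single exclusive lock and to null the handle inside that lock. The Python sketch below is illustrative only: RatchetHandle, ConsumedError, bump_send_count, and the placeholder state dictionary and encoding are hypothetical and are not the spec's wire format. A RwLock variant would take the shared side only for derive_call_keys.

```python
import threading
from typing import Optional


class ConsumedError(Exception):
    """Raised when a handle is used after to_bytes() consumed it (hypothetical)."""


class RatchetHandle:
    def __init__(self, state: dict) -> None:
        self._lock = threading.Lock()          # exclusive for every operation
        self._state: Optional[dict] = state    # None once consumed

    def _live(self) -> dict:
        if self._state is None:
            raise ConsumedError("handle was consumed by to_bytes()")
        return self._state

    def bump_send_count(self) -> int:
        # Placeholder for encrypt(): mutates counters under the same lock.
        with self._lock:
            state = self._live()
            n = state["send_count"]
            state["send_count"] = n + 1
            return n

    def to_bytes(self) -> bytes:
        # Consuming: runs under the exclusive lock so no reader can overlap.
        with self._lock:
            state = self._live()
            state["epoch"] += 1  # anti-rollback epoch advances on each serialization
            blob = repr(sorted(state.items())).encode()  # placeholder encoding only
            self._state = None
            return blob
```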

Anti-rollback epoch: The epoch counter starts at 0 and is incremented each time the state is serialized via to_bytes. On deserialization, the epoch must be strictly greater than the last-seen epoch for the same session. This prevents storage-layer replay of older blobs. See §6.8.
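The strictly-greater rule can be sketched as a per-session high-water mark. EpochTracker, RollbackRejected, and keying by a session_id are hypothetical. The sketch keeps the mark in memory; where it is persisted matters, because an attacker who can roll back the storage holding both the blobs and the mark defeats the check.

```python
from typing import Dict


class RollbackRejected(Exception):
    """Raised for a blob whose epoch is not strictly newer (hypothetical name)."""


class EpochTracker:
    """Per-session high-water mark of accepted ratchet-blob epochs (sketch)."""

    def __init__(self) -> None:
        self._last_seen: Dict[bytes, int] = {}

    def accept(self, session_id: bytes, blob_epoch: int) -> None:
        last = self._last_seen.get(session_id)
        if last is not None and blob_epoch <= last:
            # Equal epochs are rejected too: the rule is strictly greater.
            raise RollbackRejected(f"epoch {blob_epoch} is not newer than {last}")
        self._last_seen[session_id] = blob_epoch
```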

Initial state after LO-KEX:

For Alice (initiator):

send_ratchet_pk/sk = her EK (ephemeral key from §5.4 Step 2)
recv_ratchet_pk = None (Bob hasn't sent yet)
send_epoch_key = epoch key from encrypt_first_message
recv_epoch_key = all-zeros (unused until Bob sends)
prev_recv_epoch_key = None
send_count = 1 (counter 0 was used by the random-nonce first message)
recv_count = 0
prev_send_count = 0 (initialization default; no KEM ratchet step has fired yet). When Alice's first KEM ratchet step fires, prev_send_count is set to send_count at that moment (§6.4). For example, if Alice sent one further message in her first epoch (n = 1, advancing send_count to 2), the first message of her next send epoch carries pn = 2, not pn = 0.
ratchet_pending = false
recv_seen = empty, prev_recv_seen = empty
epoch = 0
recv_seen is empty (not {0}) because counter 0 was consumed by encrypt_first_message — a structurally separate function from encrypt(). A replayed session init is a protocol-layer concern (deduplicated by the relay), not a ratchet concern. Reimplementers MUST NOT seed recv_seen with {0}.

For Bob (responder):

recv_ratchet_pk = Alice's EK_pub (from session init)
send_ratchet_pk/sk = None (set on first send)
recv_epoch_key = epoch key from decrypt_first_message
send_epoch_key = all-zeros (unused until Bob sends)
prev_recv_epoch_key = None
recv_count = 1 (counter 0 was consumed by the session-init message, outside the ratchet). Derivation: recv_count tracks max(n + 1) across successfully decrypted messages in the current epoch; decrypt_first_message processes the message at counter 0, producing max(0 + 1) = 1. This is a bookkeeping value for serialization consistency (guard 17 requires recv_seen entries to be < recv_count), not a replay guard — Alice's send_count starts at 1, so she will never produce a ratchet message with n = 0 in this epoch. A reimplementer who treats recv_count = 1 as the security control preventing n = 0 replays is relying on a false assumption; the actual protection is Alice's send_count starting at 1 (§5.4 Step 7).
send_count = 0
prev_send_count = 0 (first ratchet message from Bob carries pn = 0 — no prior send epoch exists)
ratchet_pending = true (Bob must ratchet before first send)
recv_seen = empty, prev_recv_seen = empty
epoch = 0
recv_seen is empty (not {0}) for the same reason as Alice: counter 0 was consumed by decrypt_first_message, outside the ratchet. Reimplementers MUST NOT seed recv_seen with {0} — doing so causes the first ratchet-layer message (counter 0 in a new epoch after Bob's KEM ratchet step) to be rejected as DuplicateMessage.

6.3 Counter-Mode Message Key Derivation

function KDF_MsgKey(epoch_key, counter):
    return HMAC-SHA3-256(key=epoch_key, data=0x01 || big_endian_32(counter))

Each counter value produces a unique message key from the static epoch key. The 0x01 prefix byte provides domain separation — no other derivation from the epoch key currently exists, but the prefix reserves the 0x01 domain for message keys, leaving other prefix values available for future epoch-key-derived outputs without risking collision. The epoch key does not advance per message — it is fixed for the duration of an epoch (between KEM ratchet steps).

Counter=0 cannot collide with any chain-advancement derivation: In this counter-mode design there is no chain key advancement step — KDF_Chain does not exist. Counter 0 is simply KDF_MsgKey(epoch_key, 0), and counters 1, 2, … are independent derivations from the same static key. There is no internal computation that uses the same HMAC key and input as counter 0 for any other purpose. The 0x01 domain prefix ensures that even if a hypothetical future protocol extension derived something from the epoch key with a different prefix byte, counter 0 (data = 0x01 || 0x00000000) would not collide with it. A reimplementer coming from a chain-ratchet background (e.g., Signal's Double Ratchet) should note: there is no KDF_Chain(epoch_key) producing (msg_key, next_chain_key) — the epoch key is never used as input to derive another epoch key; that transition happens only via KDF_Root on a KEM ratchet step.

HMAC-SHA3-256 block size is 136 bytes: SHA3-256's rate (block size) is 136 bytes, not the 64-byte SHA-2 block size most developers have internalized. RFC 2104 HMAC pads/hashes the key to the hash's block size. A reimplementer building HMAC from a raw SHA3-256 primitive who assumes 64-byte blocks produces wrong padding and silently incorrect MACs. Standard HMAC libraries handle this automatically — this note exists for anyone implementing HMAC from scratch.
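The block-size hazard can be checked directly: the sketch below builds RFC 2104 HMAC on hashlib.sha3_256 by hand and compares it with the standard hmac module. hmac_sha3_256_from_scratch is a hypothetical helper for illustration only; production code should use a vetted HMAC library. Passing block_size=64 reproduces the SHA-2 assumption and yields a different MAC.

```python
import hashlib
import hmac

SHA3_256_RATE = hashlib.sha3_256().block_size  # 136 bytes, not 64


def hmac_sha3_256_from_scratch(key: bytes, msg: bytes,
                               block_size: int = SHA3_256_RATE) -> bytes:
    """RFC 2104 HMAC built directly on SHA3-256 (illustrative only)."""
    if len(key) > block_size:
        key = hashlib.sha3_256(key).digest()      # long keys are hashed first
    key = key.ljust(block_size, b"\x00")          # pad key to the block size
    inner_key = bytes(b ^ 0x36 for b in key)      # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)      # key XOR opad
    inner = hashlib.sha3_256(inner_key + msg).digest()
    return hashlib.sha3_256(outer_key + inner).digest()
```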

HMAC-SHA3-256 uses FIPS 202 SHA3-256, not Keccak-256: The SHA3-256 here is the NIST-standardized variant (FIPS 202, domain-separation suffix 0x06), not the pre-standardization Keccak-256 used in Ethereum and similar systems (suffix 0x01). The two produce different outputs for the same input. Both have the same 136-byte block size, so the block-size hazard above does not detect this substitution — the HMAC library silently accepts either hash function and produces wrong but plausible-looking output. In Go, use sha3.New256() from golang.org/x/crypto/sha3, not sha3.NewLegacyKeccak256(). In Python, use hashlib.sha3_256, not a pysha3 or pycryptodome Keccak binding. Every message key, root key, and call chain key derivation in this protocol would be wrong throughout if Keccak-256 were substituted here.
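A cheap guard against the Keccak substitution is a known-answer test at startup. The sketch compares the digest of the empty string against the published FIPS 202 SHA3-256 value and the legacy Keccak-256 value. assert_fips202_sha3_256 is a hypothetical helper, and an empty-input check is only a smoke test, not a full conformance suite.

```python
import hashlib

# Known-answer digests of the empty string.
SHA3_256_EMPTY = "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
KECCAK_256_EMPTY = "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"


def assert_fips202_sha3_256(hash_ctor) -> None:
    """Fail fast if hash_ctor is legacy Keccak-256 or otherwise not FIPS 202 SHA3-256."""
    digest = hash_ctor(b"").hexdigest()
    if digest == KECCAK_256_EMPTY:
        raise RuntimeError("legacy Keccak-256 (0x01 padding) detected; use FIPS 202 SHA3-256")
    if digest != SHA3_256_EMPTY:
        raise RuntimeError(f"unexpected SHA3-256('') digest: {digest}")
```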

HMAC input is exactly 5 bytes: no length prefix, no padding. The data field is the literal concatenation 0x01 || big_endian_32(counter) — 1 byte domain prefix followed by 4 bytes counter. Unlike the HKDF info fields in §5.4 (which use 2-byte BE length prefixes), HMAC data here is a fixed-layout input with no framing. A reimplementer who adds a 2-byte length prefix (e.g., 0x00 0x05 || 0x01 || counter) by analogy with HKDF conventions produces a different 32-byte message key with no error or diagnostic.
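A minimal Python sketch of KDF_MsgKey using only the standard library. msg_key_input builds the exact 5-byte data field; length_prefixed_input deliberately reproduces the HKDF-style framing mistake so its effect is visible. All three function names are hypothetical, and the outputs have not been checked against the spec's test vectors.

```python
import hashlib
import hmac
import struct

MSG_KEY_DOMAIN = 0x01


def msg_key_input(counter: int) -> bytes:
    """Exactly 5 bytes: 0x01 || big_endian_32(counter), no framing."""
    if not 0 <= counter <= 0xFFFFFFFF:
        raise ValueError("counter must fit in 32 bits")
    return struct.pack(">BI", MSG_KEY_DOMAIN, counter)  # '>' = big-endian, no padding


def kdf_msg_key(epoch_key: bytes, counter: int) -> bytes:
    """KDF_MsgKey = HMAC-SHA3-256(key=epoch_key, data=0x01 || BE32(counter))."""
    if len(epoch_key) != 32:
        raise ValueError("epoch_key must be 32 bytes")
    return hmac.new(epoch_key, msg_key_input(counter), hashlib.sha3_256).digest()


def length_prefixed_input(counter: int) -> bytes:
    """WRONG: adds an HKDF-style 2-byte length prefix. Shown only for contrast."""
    data = msg_key_input(counter)
    return struct.pack(">H", len(data)) + data
```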

HMAC domain byte allocation (complete registry — protocol extenders MUST NOT reuse allocated values):

Byte	Use	Section
`0x01`	Message key derivation (`KDF_MsgKey`)	§6.3
`0x02`-`0x03`	Reserved	—
`0x04`	Call key_a derivation (`AdvanceCallChain`)	§6.12
`0x05`	Call key_b derivation (`AdvanceCallChain`)	§6.12
`0x06`	Call chain_key derivation (`AdvanceCallChain`)	§6.12

Note: 0x04-0x06 operate on the call chain key (§6.12), not the epoch key. They are listed here for completeness — the domain byte space is global across all single-byte HMAC-SHA3-256 data inputs in the protocol.
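The registry can also be pinned in code so that an accidental reuse fails loudly. In this sketch, HmacDomain, RESERVED_DOMAINS, and is_available_for_extension are hypothetical names; enum.unique makes Python raise ValueError at class-definition time if two members share a byte value. The values mirror the table above.

```python
from enum import IntEnum, unique


@unique  # raises ValueError at definition time if two members share a value
class HmacDomain(IntEnum):
    MSG_KEY = 0x01         # KDF_MsgKey (§6.3), keyed by the epoch key
    CALL_KEY_A = 0x04      # AdvanceCallChain key_a (§6.12), keyed by the call chain key
    CALL_KEY_B = 0x05      # AdvanceCallChain key_b (§6.12)
    CALL_CHAIN_KEY = 0x06  # AdvanceCallChain next chain_key (§6.12)


RESERVED_DOMAINS = frozenset({0x02, 0x03})


def is_available_for_extension(byte: int) -> bool:
    """True if a protocol extension could claim this domain byte without reuse."""
    allocated = {member.value for member in HmacDomain}
    return 0x00 <= byte <= 0xFF and byte not in allocated and byte not in RESERVED_DOMAINS
```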

HMAC is used here (not HKDF) as a PRF — each call is independent with a fixed key and unique counter input. No extract phase is needed because the epoch key is already uniformly distributed (output of HKDF in KDF_Root). For formal models: treat as PRF(ek, 0x01 ‖ BE32(counter)).

Forward secrecy is per-epoch, not per-message. Compromising an epoch key reveals all message keys in that epoch. Forward secrecy across epochs is provided by the KEM ratchet (§6.4), which derives each new epoch key from a fresh KEM shared secret via KDF_Root. See §6.13 for the design rationale.

The output is wrapped in Zeroizing to ensure automatic memory wipe after use. Non-Rust implementations: the 32-byte msg_key returned by KDF_MsgKey is secret key material and MUST be zeroized immediately after AEAD encryption or decryption completes. In C, Go, Java, and Python the caller must wipe it explicitly: memset_s or explicit_bzero (C), Arrays.fill (Java), or an equivalent in-place overwrite. Environments without RAII MUST NOT rely on garbage collection or variable scope for zeroization: the key may remain in memory until a future allocation overwrites it, which can be arbitrarily delayed. Zeroization MUST happen on every exit path, including error paths and early returns; for example, if AEAD fails after the key has been derived, the key must still be zeroized before the error is returned.
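Python cannot guarantee zeroization (immutable bytes objects and interpreter-internal copies cannot be wiped), but the exit-path discipline can still be illustrated. This sketch, using a hypothetical with_msg_key helper, derives the key into a bytearray and overwrites it in a finally block, so the wipe runs whether the AEAD callback returns or raises.

```python
import hashlib
import hmac
import struct
from typing import Callable, TypeVar

T = TypeVar("T")


def with_msg_key(epoch_key: bytes, counter: int, use: Callable[[bytearray], T]) -> T:
    """Derive msg_key into a mutable buffer, run `use`, then wipe it on every exit path."""
    data = b"\x01" + struct.pack(">I", counter)
    # hmac returns immutable bytes; copy into a bytearray that can be overwritten in place.
    # The intermediate bytes object cannot be wiped from Python, so this is best effort.
    key = bytearray(hmac.new(epoch_key, data, hashlib.sha3_256).digest())
    try:
        return use(key)
    finally:
        key[:] = bytes(len(key))  # overwrite in place, even if `use` raised
```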

6.4 KEM Ratchet Step

When sending and send_ratchet_pk is absent or ratchet_pending is true:

function PerformKEMRatchetSend(state):
    peer_pk = state.recv_ratchet_pk   // must be Some; if None → Internal error
        // (structurally unreachable from valid state — deserialization guards 6
        // and 9 in §6.8 prevent this configuration, and init_bob always sets it)

    // Generate new ratchet keypair.
    (new_pk, new_sk) = XWing.KeyGen()

    // Encapsulate to peer's ratchet public key.
    (ct, ss) = XWing.Encaps(peer_pk)

    // Advance root key, derive new send epoch key.
    (state.root_key, state.send_epoch_key) = KDF_Root(state.root_key, ss)

    // Update state.
    state.send_ratchet_sk = new_sk   // old sk auto-zeroized via ZeroizeOnDrop (Rust only).
                                     // Non-Rust reimplementers MUST explicitly zeroize the
                                     // old send_ratchet_sk before overwriting: assigning a new
                                     // pointer or value leaves the old key bytes on the heap.
                                     // When send_ratchet_sk is None (Bob's first send, or Alice's
                                     // state before any send), the zeroization obligation is
                                     // vacuously satisfied — there are no key bytes to zeroize.
                                     // In C: check for null before calling memset; in Go: check
                                     // for nil slice. The obligation applies only when transitioning
                                     // from Some(old_sk) to Some(new_sk).
                                     // Same obligation applies to prev_recv_epoch_key rotation
                                     // in §6.6 (old recv_epoch_key becomes prev_recv_epoch_key;
                                     // if prev_recv_epoch_key is being replaced, zeroize before
                                     // overwriting the slot).
                                     //
                                     // CRITICAL: the old send_ratchet_sk MUST NOT be zeroized
                                     // until after all three preceding operations (KeyGen, Encaps,
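                                     // KDF_Root) have completed successfully. KeyGen and Encaps can
                                     // fail on CSPRNG failure, and KDF_Root can return an error if
                                     // a reimplementation wraps a fallible HKDF API (see the
                                     // KDF_Root note below). If any fails,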
                                     // the ratchet step must abort with no state change — the
                                     // caller must be able to retry with the session in its
                                     // pre-ratchet state. The reference implementation guarantees
                                     // this by performing all state writes (this line and below)
                                     // only after the fallible operations return successfully.
                                     // A reimplementer who "eagerly" zeroizes the old sk
                                     // immediately after keygen (before Encaps) loses the ability
                                     // to roll back on Encaps failure, leaving the session
                                     // permanently without a valid send ratchet key.
    state.send_ratchet_pk = new_pk
    state.prev_send_count = state.send_count   // MUST precede send_count = 0 (see below)
    state.send_count = 0   // Post-ratchet epochs start at n=0; Alice's first epoch
                           // is the exception (send_count=1 from session init, §5.4 Step 7).
                           // Reset (not continuation) is safe: the new epoch_key is independent
                           // (derived from a fresh KEM shared secret via KDF_Root), so counter N
                           // under epoch E₁ and counter N under epoch E₂ produce different
                           // message keys. Continuation would also be correct but reset is
                           // the simpler invariant and matches header.n expectations.
                           // **All seven field writes are atomic — no serialization point may be
                           // introduced between the start of KDF_Root and the post-AEAD send_count
                           // += 1.** This covers the entire sequence: `KDF_Root` (writes root_key
                           // and send_epoch_key), `send_ratchet_sk = new_sk`, `send_ratchet_pk =
                           // new_pk`, `prev_send_count = send_count`, `send_count = 0`,
                           // `ratchet_pending = false`. If serialization occurs after KDF_Root
                           // updates root_key/send_epoch_key but before send_count resets to 0,
                           // the blob encodes new epoch keys with the old send_count. Guard 8 does
                           // NOT catch this intermediate: `send_count > 0` with `send_ratchet_sk`
                           // present is a valid combination (every non-initial send), so the blob
                           // reloads successfully — but the session silently derives nonces using
                           // a desynchronized counter, causing AEAD failure against the peer.
                           // **Guard 8 transient violation (§6.8 guard 8)**: After `send_count = 0`
                           // specifically, the state has send_count == 0 with send_ratchet_sk
                           // present — the exact combination guard 8 rejects at deserialization.
                           // This narrower window is safe only because encrypt() holds exclusive
                           // access (§6.2) and does not yield between this line and send_count += 1.
                           // The full atomicity requirement above is the stronger invariant.
    state.ratchet_pending = false

    zeroize(ss)     // ss MUST be zeroized AFTER KDF_Root completes — it is the IKM input
                    // to HKDF and must survive until KDF_Root returns. Zeroizing ss before
                    // KDF_Root would use all-zero IKM, silently producing a weak, predictable
                    // epoch key. zeroize(ss) MUST be positioned after both state.root_key and
                    // state.send_epoch_key are written. Non-Rust implementations MUST NOT
                    // reorder or "optimize" this zeroization earlier.
    return ct    // Included in message header

prev_send_count = send_count MUST precede send_count = 0 — correctness requirement, not incidental ordering: The two assignments appear in fixed order in the pseudocode, but the ordering is a hard correctness requirement. prev_send_count captures the count of messages sent in the just-completed epoch, which is transmitted as pn in ratchet-step headers so the peer knows how many messages to expect from the old epoch. If send_count = 0 executes first, prev_send_count captures the reset value (0) regardless of how many messages were sent — every subsequent ratchet-step header carries pn = 0. The peer's AEAD succeeds (pn is AAD-bound but both sides compute the same wrong AAD when both use pn = 0), so this failure is silent in same-implementation testing. It only manifests as AeadFailed when a reimplementer's peer uses the correct ordering. A simple field swap in the implementation or a refactoring that moves the reset earlier silently introduces this bug.
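The two-phase structure of PerformKEMRatchetSend (all fallible work into locals, then an infallible commit in the specified order) can be sketched as below. Because the Python standard library has no X-Wing, keygen, encaps, and kdf_root are injected callables; SendSide and perform_kem_ratchet_send are hypothetical names, and any fakes passed in are not X-Wing. The commit writes prev_send_count before resetting send_count, as required above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class InternalError(Exception):
    """Hypothetical stand-in for the spec's Internal error."""


@dataclass
class SendSide:
    """Hypothetical subset of RatchetState touched by the send-side KEM step."""
    root_key: bytes
    send_epoch_key: bytes
    recv_ratchet_pk: Optional[bytes]
    send_ratchet_pk: Optional[bytes]
    send_ratchet_sk: Optional[bytearray]
    send_count: int
    prev_send_count: int
    ratchet_pending: bool


def perform_kem_ratchet_send(
    state: SendSide,
    keygen: Callable[[], Tuple[bytes, bytes]],
    encaps: Callable[[bytes], Tuple[bytes, bytes]],
    kdf_root: Callable[[bytes, bytearray], Tuple[bytes, bytes]],
) -> bytes:
    peer_pk = state.recv_ratchet_pk
    if peer_pk is None:
        raise InternalError("recv_ratchet_pk is None")

    # Fallible phase: results go into locals only; `state` is not written.
    new_pk, new_sk = keygen()
    ct, ss = encaps(peer_pk)
    ss_buf = bytearray(ss)  # best effort: the original `ss` bytes cannot be wiped
    try:
        new_root, new_epoch_key = kdf_root(state.root_key, ss_buf)
    finally:
        ss_buf[:] = bytes(len(ss_buf))  # wipe ss only after KDF_Root has returned

    # Commit phase: no fallible calls below this line.
    old_sk = state.send_ratchet_sk
    state.root_key, state.send_epoch_key = new_root, new_epoch_key
    state.send_ratchet_sk = bytearray(new_sk)
    if old_sk is not None:  # vacuous on the first send: nothing to wipe
        old_sk[:] = bytes(len(old_sk))
    state.send_ratchet_pk = new_pk
    state.prev_send_count = state.send_count  # MUST precede the reset below
    state.send_count = 0
    state.ratchet_pending = False
    return ct
```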

Root KDF:

function KDF_Root(root_key, kem_shared_secret):
    output = HKDF(
        salt = root_key,
        ikm  = kem_shared_secret,  // 32 bytes — the full X-Wing combiner output (SHA3-256 of
                                   // ss_M ‖ ss_X ‖ ct_X ‖ pk_X ‖ XWingLabel, §8.2) — NOT the
                                   // X25519 DH output (ss_X) or ML-KEM shared secret (ss_M) alone
        info = "lo-ratchet-v1",   // raw 13-byte UTF-8 — no length prefix (unlike §5.4 KDF_KEX info which uses len(x)||x per field)
        len  = 64
    )
    return (output[0..32], output[32..64])    // (new_root, new_epoch_key)
                                              // bytes [0..32]  → new root_key (replaces state.root_key)
                                              // bytes [32..64] → new epoch_key (becomes state.send_epoch_key or state.recv_epoch_key)
                                              // Swapping the two halves is a silent wrong-output failure:
                                              // two peers running the same swapped implementation derive
                                              // the same wrong keys, so AEAD succeeds between them, but the
                                              // root key evolves along a different trajectory than the spec,
                                              // breaking interoperability with any correct implementation.
                                              // Appendix F.2 provides labeled vectors for both output halves.

Why root_key is HKDF salt (not IKM): Placing the existing chain state (root_key) as the HKDF salt means the extraction phase is keyed by accumulated entropy from all prior KEM ratchet steps. Even a weak kem_shared_secret (e.g., from a biased KEM or compromised randomness) cannot dominate the extraction — the pre-existing root entropy conditions the PRK. This follows Signal's ratchet KDF design. By contrast, KDF_KEX (§5.4 Step 4) uses a zero salt because there is no prior chain state at session establishment — the zero salt is the RFC 5869 §2.2 default for "no prior keying material."

KDF_Root is infallible in the reference implementation: HKDF-Expand output length is bounded by 255 × HashLen (RFC 5869 §2.3). For SHA3-256 (HashLen = 32), the maximum is 255 × 32 = 8160 bytes. KDF_Root requests exactly 64 bytes, which is well within this limit. The operation therefore cannot fail due to output-length overflow in a correct SHA3-256 HKDF implementation. Reimplementers using a fallible HKDF API (e.g., returning an error for length > max) will never observe that error on this call path; if they do, it indicates an implementation bug and MUST be treated as Internal, not surfaced to callers.
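KDF_Root can be sketched with a hand-rolled RFC 5869 HKDF over HMAC-SHA3-256. This is an illustrative sketch assuming the standard extract-then-expand construction; hkdf_sha3_256 and kdf_root are hypothetical names, and the output has not been checked against Appendix F, which should be the reference before relying on it. Note root_key in the salt position and the explicit 255 × HashLen bound.

```python
import hashlib
import hmac
from typing import Tuple

HASH_LEN = 32                      # SHA3-256 output size
MAX_OKM = 255 * HASH_LEN           # RFC 5869 limit: 8160 bytes
RATCHET_INFO = b"lo-ratchet-v1"    # raw 13 bytes, no length prefix


def hkdf_sha3_256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF (extract then expand) instantiated with HMAC-SHA3-256."""
    if length > MAX_OKM:
        raise ValueError("requested length exceeds 255 * HashLen")
    if not salt:
        salt = bytes(HASH_LEN)     # RFC 5869 default when no salt is supplied
    prk = hmac.new(salt, ikm, hashlib.sha3_256).digest()           # Extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                         # Expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha3_256).digest()
        okm += block
        counter += 1
    return okm[:length]


def kdf_root(root_key: bytes, kem_shared_secret: bytes) -> Tuple[bytes, bytes]:
    """KDF_Root: root_key is the HKDF salt; the 32-byte X-Wing secret is the IKM."""
    out = hkdf_sha3_256(salt=root_key, ikm=kem_shared_secret, info=RATCHET_INFO, length=64)
    return out[:32], out[32:]      # (new_root_key, new_epoch_key), in that order
```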

6.5 Message Encryption

function Encrypt(state, plaintext, sender_fp, recipient_fp):
    // Liveness guard: all-zero root_key indicates a dead (post-reset) session.
    // Constant-time comparison — root_key is secret material.
    if state.root_key == 0x00{32}:
        raise InvalidData

    // Guard against nonce reuse before any mutation.
    // Idempotent: repeated calls with send_count at u32::MAX return ChainExhausted
    // without modifying state — no progressive corruption on retry.
    if state.send_count == u32::MAX:
        raise ChainExhausted
        // Terminal state: when send_count == u32::MAX AND ratchet_pending == true,
        // the pending KEM ratchet step (which would reset send_count to 0) never
        // fires — the ChainExhausted guard blocks before the ratchet_pending check.
        // The session is permanently un-sendable. Full session reset (§6.10) and
        // new LO-KEX exchange required. A reimplementer who assumes the pending
        // ratchet "unblocks" the guard will retry indefinitely without progress.

    // Perform ratchet if needed (no send chain yet, or direction changed).
    // These are two independent conditions, NOT interchangeable:
    //   - send_ratchet_pk is None: Bob's initial state (never sent) — both conditions true
    //   - ratchet_pending: direction changed since last send — send_ratchet_pk is Some
    // After the first KEM ratchet step, send_ratchet_pk is always Some; only
    // ratchet_pending toggles. Collapsing them into a single flag breaks Bob's first send.
    kem_ct = None
    if state.send_ratchet_pk is None or state.ratchet_pending:
        kem_ct = PerformKEMRatchetSend(state)

    msg_key = KDF_MsgKey(state.send_epoch_key, state.send_count)

    nonce = 0x00{20} || big_endian_32(state.send_count)
    // SAME counter: KDF_MsgKey and nonce derivation both use state.send_count.
    // These are NOT separate counters. A reimplementer who uses a separate
    // nonce_counter (drifting from send_count) breaks AEAD authentication: the
    // receiver derives msg_key from header.n but constructs the nonce from header.n
    // as well — both use the same wire value. If the sender's nonce_counter diverges
    // from send_count, the nonce used for encryption differs from what the receiver
    // expects, producing AeadFailed with no diagnostic. If nonce_counter eventually
    // aliases send_count at a prior value, nonce reuse follows.

    header = {
        ratchet_pk:  state.send_ratchet_pk,
        kem_ct:      kem_ct,
        n:           state.send_count,
        pn:          state.prev_send_count
    }

    header_bytes = encode_ratchet_header(header)
    // sender_fp = local party's fingerprint, recipient_fp = remote party's fingerprint.
    // These are reversed on the decrypt side (§6.6) where sender_fp = remote.
    aad = "lo-dm-v1" || sender_fp (32 B) || recipient_fp (32 B) || header_bytes

    ciphertext = AEAD(msg_key, nonce, plaintext, aad)
    if ciphertext is Error:
        // Session-fatal: zeroize all key material to prevent nonce reuse on retry.
        reset(state)
        zeroize(msg_key)
        raise AeadFailed

    state.send_count += 1
    zeroize(msg_key)

    return (header, ciphertext)

The nonce encodes send_count in the last 4 bytes of a 24-byte buffer (bytes 0-19 are zero). Each (msg_key, nonce) pair is unique because the counter is distinct per epoch position. When send_count = 0 (the first message of a post-ratchet epoch — e.g., Bob's very first send after ratchet_pending clears), this produces a 24-byte all-zero nonce. This is safe and expected: the epoch key is fresh from KDF_Root, so the (epoch_key, nonce) pair is unique globally even though the nonce bytes are all zero. Implementations MUST NOT add a defensive guard that rejects all-zero nonces — doing so breaks every first post-ratchet message.
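The nonce layout is small enough to show in full. ratchet_nonce is a hypothetical helper; it takes the same send_count that keys KDF_MsgKey and intentionally has no all-zero rejection.

```python
import struct

NONCE_LEN = 24  # XChaCha20-Poly1305 nonce size


def ratchet_nonce(send_count: int) -> bytes:
    """0x00{20} || big_endian_32(send_count); the same counter that keys KDF_MsgKey."""
    if not 0 <= send_count <= 0xFFFFFFFF:
        raise ValueError("send_count must fit in 32 bits")
    nonce = bytes(20) + struct.pack(">I", send_count)
    # Deliberately no all-zero check: send_count = 0 legitimately yields 24 zero bytes.
    return nonce
```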

Encrypt atomicity: Unlike decrypt (§6.6), encrypt does not use snapshot/rollback. Atomicity is achieved through ordering — all fallible operations (ChainExhausted check, KEM keygen/encapsulate/KDF_Root, KDF_MsgKey, AEAD) execute before send_count is incremented. If any pre-AEAD operation fails, no state has been mutated. The KEM ratchet step (§6.4) is the exception: it mutates root_key, send_epoch_key, send_ratchet_sk/pk, prev_send_count, send_count, and ratchet_pending before AEAD runs. If AEAD fails after a successful KEM ratchet step, these mutations cannot be safely unwound, so the session is zeroized via reset() (see below). A reimplementer MUST NOT mutate send_count optimistically before AEAD succeeds — doing so would consume a counter on failure, eventually causing nonce reuse.

Cooperative multitasking within the atomicity window: The exclusive-access model (§6.2) prevents concurrent calls on the same ratchet state, but it does not prevent the owning coroutine or goroutine from yielding during an encrypt call. In async/await (Rust, Python, JavaScript), goroutines (Go), or green threads (Erlang, Ruby Fiber), a yield between PerformKEMRatchetSend (which mutates root_key, send_epoch_key, etc.) and send_count += 1 lets a to_bytes() call made at that yield point serialize the post-ratchet fields alongside an intermediate send_count. Deserialization guard 8 (§6.8) does not reliably catch this (see the atomicity note in §6.4), so the reloaded session can carry mismatched ratchet state: the new send_epoch_key with a stale send_count, causing nonce reuse on the next post-load encrypt. Mitigation: to_bytes MUST NOT be called within the encrypt call's atomicity window. Callers MUST either hold the ratchet under a mutex that covers the full encrypt call (not just the state mutation), or structure async code so that to_bytes is never called from a concurrent task on the same ratchet state. Rust's &mut self on encrypt prevents this by construction (a mutable reference cannot be aliased); Go, Python, and C callers must manage it explicitly.

PerformKEMRatchetSend ordering is a correctness requirement: The retry guarantee (an Internal CSPRNG failure from encrypt is safe to retry because no state was mutated) holds only if PerformKEMRatchetSend completes all fallible operations (keygen, encapsulate, KDF_Root) before writing any field. The pseudocode preserves this: XWing.KeyGen() and then XWing.Encaps() both complete before any of the seven state fields is written. A reimplementer who interleaves field mutations with fallible operations (for example, storing the new ratchet keypair immediately after keygen but before encapsulate) loses the retry guarantee and must implement explicit snapshot/rollback for the interleaved fields.

Encrypt atomicity is a documentation-only guarantee, not a structural one: The decrypt path enforces rollback integrity via an explicit save_recv_state / restore_recv_state snapshot, making the invariant structurally visible. The encrypt path has no equivalent snapshot mechanism — its safety guarantee is maintained solely by the ordering of operations in the pseudocode and implementation. A future refactor that reorders operations (e.g., moving send_count += 1 earlier, or splitting PerformKEMRatchetSend across multiple steps with interleaved state mutations) would silently break the retry and nonce-reuse guarantees with no compile-time or runtime protection. Security reviewers auditing the implementation should verify the operation order explicitly, and any refactor touching the encrypt path MUST maintain: (1) all XWing.KeyGen()/XWing.Encaps() calls complete before any state field is written; (2) send_count is incremented only after AEAD succeeds; (3) if AEAD fails after any state mutation, reset() is called before returning AeadFailed.
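The three ordering invariants (liveness and ChainExhausted checks first, counter increment only after AEAD succeeds, reset() on AEAD failure) can be sketched with injected primitives. Everything here is hypothetical: EncryptSide, encrypt_skeleton, and the derive_msg_key/aead_seal callables are stand-ins, the KEM ratchet step is omitted, reset() is simplified, and msg_key zeroization is left out for brevity.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Tuple

U32_MAX = 0xFFFFFFFF


class InvalidData(Exception): ...
class ChainExhausted(Exception): ...
class AeadFailed(Exception): ...


@dataclass
class EncryptSide:
    root_key: bytes
    send_epoch_key: bytes
    send_count: int


def reset(state: EncryptSide) -> None:
    # Simplified: the spec's reset() also wipes ratchet keys and fingerprints (§6.10).
    state.root_key = bytes(32)
    state.send_epoch_key = bytes(32)


def encrypt_skeleton(
    state: EncryptSide,
    plaintext: bytes,
    aad: bytes,
    derive_msg_key: Callable[[bytes, int], bytes],
    aead_seal: Callable[[bytes, bytes, bytes, bytes], bytes],
) -> Tuple[int, bytes]:
    if hmac.compare_digest(state.root_key, bytes(32)):  # liveness guard, constant time
        raise InvalidData("dead session")
    if state.send_count == U32_MAX:                      # checked before any mutation
        raise ChainExhausted()
    # A KEM ratchet step (§6.4) would run here when due; omitted in this sketch.
    n = state.send_count
    msg_key = derive_msg_key(state.send_epoch_key, n)
    nonce = bytes(20) + n.to_bytes(4, "big")
    try:
        ciphertext = aead_seal(msg_key, nonce, plaintext, aad)
    except Exception as exc:
        reset(state)                                     # session-fatal
        raise AeadFailed() from exc
    state.send_count = n + 1                             # only after AEAD succeeded
    return n, ciphertext
```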

Internal from encrypt: When a KEM ratchet step is due (ratchet_pending = true, or send_ratchet_pk is None), encrypt() calls XWing.KeyGen() and XWing.Encaps() as part of that step (§6.4). Both operations consume CSPRNG randomness; CSPRNG exhaustion (structurally unreachable on standard OSes, but possible on embedded targets or under sandbox misconfiguration) propagates as Internal. This failure occurs before any state mutation: no KEM ratchet step has been applied, no counter has been consumed, and the session is unchanged. The call is safe to retry. ratchet_pending retains its pre-call value (true), so the next encrypt() call re-enters PerformKEMRatchetSend automatically without caller intervention. The caller MUST NOT manually clear or re-set ratchet_pending after an Internal error.

The documented encrypt error table is: ChainExhausted (counter at limit), AeadFailed (session-fatal, see below), and Internal. Internal has two sources with different retry semantics: (1) CSPRNG failure in XWing.KeyGen() / XWing.Encaps(), which occurs before any state mutation and is retryable; (2) recv_ratchet_pk = None inside PerformKEMRatchetSend. The KEM ratchet step requires the peer's last-received ratchet public key, which is absent only if the ratchet was deserialized from a structurally invalid blob. This variant is not retryable: the session is structurally inconsistent and must be reset. Both sources surface as the same Internal error, so callers who treat every Internal return from encrypt as retryable will loop indefinitely on the second variant. See §6.6 for the analogous decrypt error table.
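Since both Internal sources surface as the same error, one defensive caller policy is to bound the retries and escalate to a session reset. The sketch assumes a hypothetical encrypt callable that raises a hypothetical Internal exception; max_attempts = 3 is an arbitrary illustrative choice, not a protocol value. Other errors such as ChainExhausted and AeadFailed pass through unretried.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Internal(Exception):
    """Hypothetical stand-in for the spec's Internal error."""


class SessionNeedsReset(Exception):
    """Raised when Internal persists; the caller should reset the session (§6.10)."""


def encrypt_with_bounded_retry(encrypt: Callable[[bytes], T], plaintext: bytes,
                               max_attempts: int = 3) -> T:
    """Retry only Internal, and only a bounded number of times (sketch)."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return encrypt(plaintext)
        except Internal as exc:  # any other exception propagates unchanged
            last_error = exc
    raise SessionNeedsReset("Internal persisted across retries") from last_error
```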

Session-fatal encrypt failure: AEAD encryption failure is treated as session-fatal. All session key material (root key, send/receive epoch keys, ratchet keys) is zeroized, making the state permanently unusable. The fingerprints (local_fp, remote_fp) are also zeroized as part of reset(); see §6.10 for the caller obligation to preserve fingerprints independently before reset. The caller must discard the session. The send counter is only incremented on success (after AEAD encryption), so AEAD failure does not consume a counter, and the defense-in-depth zeroization rules out nonce reuse from retry attempts. In practice, XChaCha20-Poly1305 encryption fails only when the plaintext exceeds the cipher's maximum message length (roughly 256 GiB, set by ChaCha20's 32-bit block counter), which does not occur with well-formed input. The liveness guard (§6.2) is how this achieves permanent unusability without a separate is_dead flag: reset() zeros root_key, and subsequent encrypt/decrypt calls detect the all-zero root key via constant-time comparison and return InvalidData.

Pseudocode parameters vs. API: sender_fp and recipient_fp are shown as explicit parameters in the pseudocode for clarity of AAD construction. In the actual API, they are stored in RatchetState at init_alice/init_bob time (as local_fp/remote_fp) and are not caller-supplied per call. The Rust signature is encrypt(&mut self, plaintext: &[u8]) — no fingerprint parameters. The CAPI soliton_ratchet_encrypt similarly takes no fingerprint parameters. A reimplementer who exposes per-call fingerprint parameters allows callers to pass different fingerprints on different calls, weakening the AAD binding guarantee.
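Storing the fingerprints at init time makes the sender/recipient swap between encrypt and decrypt automatic. In this sketch, build_dm_aad and FingerprintBinding are hypothetical names, and header_bytes is treated as an opaque output of encode_ratchet_header. Alice's encrypt-side AAD equals Bob's decrypt-side AAD, which is the property the AEAD relies on.

```python
DM_AAD_LABEL = b"lo-dm-v1"  # 8 bytes


def build_dm_aad(sender_fp: bytes, recipient_fp: bytes, header_bytes: bytes) -> bytes:
    """Concatenate 'lo-dm-v1' || sender_fp (32 B) || recipient_fp (32 B) || header_bytes."""
    if len(sender_fp) != 32 or len(recipient_fp) != 32:
        raise ValueError("fingerprints must be 32 bytes")
    return DM_AAD_LABEL + sender_fp + recipient_fp + header_bytes


class FingerprintBinding:
    """Fingerprints fixed at init; callers cannot supply them per message."""

    def __init__(self, local_fp: bytes, remote_fp: bytes) -> None:
        self._local_fp = local_fp
        self._remote_fp = remote_fp

    def encrypt_aad(self, header_bytes: bytes) -> bytes:
        return build_dm_aad(self._local_fp, self._remote_fp, header_bytes)  # sender = local

    def decrypt_aad(self, header_bytes: bytes) -> bytes:
        return build_dm_aad(self._remote_fp, self._local_fp, header_bytes)  # sender = remote
```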

Dropped encrypt results orphan counter slots: A successful encrypt() call advances send_count irrevocably. If the caller discards the returned (header, ciphertext) without transmitting it (e.g., due to a transport-layer error after AEAD succeeded), the counter slot is permanently consumed. The receiver will never see a message at that counter — counter-mode is tolerant of holes, so no error occurs on the receiver side. However, the caller MUST NOT re-encrypt the same plaintext on transport failure: a second encrypt() call uses the next counter, not the one that was dropped. The Rust API marks encrypt() with #[must_use], producing a compiler warning if the result is discarded. The CAPI soliton_ratchet_encrypt carries __attribute__((warn_unused_result)) in the generated header, providing the same compiler-level signal in C and C++. Languages without this feature (Go, Python) must enforce this obligation via documentation and caller discipline. Retry-loop hazard: a caller who, on transport failure, calls encrypt() again to "retransmit" is producing a new message at a new counter — not a retransmission of the original. The recipient will receive two messages at two different counter values. If the original message is ever delivered, no replay detection fires (both counters are distinct), and both messages are accepted. To retransmit, the caller must buffer and resend the already-encrypted (header, ciphertext) output, not invoke encrypt() again.
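Retransmission therefore means resending stored bytes. A minimal sketch of a sender-side outbox, assuming an application-level message id and treating the header and ciphertext as opaque byte strings (`Sealed`, `Outbox`, and the id scheme are hypothetical, not part of the soliton API):

```rust
use std::collections::HashMap;

/// Output of one successful encrypt(): header and ciphertext bytes,
/// treated here as opaque. Hypothetical type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Sealed {
    pub header: Vec<u8>,
    pub ciphertext: Vec<u8>,
}

/// Sender-side buffer keyed by a hypothetical application message id.
/// A message stays here until the transport confirms delivery.
#[derive(Default)]
pub struct Outbox {
    pending: HashMap<u64, Sealed>,
}

impl Outbox {
    /// Store the sealed output right after encrypt() returns, before the
    /// first transmission attempt, so a transport error cannot lose it.
    pub fn enqueue(&mut self, msg_id: u64, sealed: Sealed) {
        self.pending.insert(msg_id, sealed);
    }

    /// Retransmission returns the identical bytes (same header, hence the
    /// same counter). encrypt() is never called again for this message.
    pub fn retransmit(&self, msg_id: u64) -> Option<&Sealed> {
        self.pending.get(&msg_id)
    }

    /// Delivery confirmed: drop the buffered copy.
    pub fn acknowledge(&mut self, msg_id: u64) -> Option<Sealed> {
        self.pending.remove(&msg_id)
    }
}
```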

Counter gaps are a normal protocol property: The receiver MUST NOT treat counter gaps (missing entries in the n sequence) as errors. Gaps caused by dropped encrypt results are indistinguishable from gaps caused by lost network packets — both appear as missing entries in the counter sequence, and neither leaves any trace in the receiver's state. An application that uses counter gaps for application-layer loss detection, or a reimplementer who adds receiver-side gap-rejection logic, would break the protocol for any transport with unreliable delivery.

Critical: The AAD includes the canonical encoding of the full ratchet header. This prevents an attacker from:

Substituting ratchet_pk (would poison recipient's ratchet state).
Modifying pn (would cause incorrect previous-epoch counter range → state desync or forced message loss).
Injecting a fake kem_ct (would corrupt root key derivation).

6.6 Message Decryption

Helper function definitions used in the pseudocode below:

save_recv_state(state) → snapshot: Captures all receive-side state fields: recv_epoch_key, recv_count, recv_ratchet_pk, prev_recv_epoch_key, prev_recv_ratchet_pk, recv_seen, prev_recv_seen, root_key, and ratchet_pending. recv_count must be included because the new-epoch path sets it to 0 (state.recv_count = 0 in the KEM ratchet step) before AEAD runs; a failed AEAD would leave recv_count zeroed unless the snapshot captures and restores it. It does NOT capture send-side state (send_epoch_key, send_ratchet_sk, send_ratchet_pk, send_count, prev_send_count), which is mutated only by encrypt() and is outside the decrypt rollback scope. epoch is neither in the snapshot nor mutated by decrypt(): it is a serialization counter incremented only by to_bytes() (§6.7) to version the stored blob, plays no role in the cryptographic operations of decrypt(), and appears in no message or AAD. A reimplementer who includes epoch in the snapshot or increments it inside decrypt() would desync the blob version counter from the actual serialize-call count, causing UnsupportedVersion errors on reload after a decrypt that was not followed by a serialize.
restore_recv_state(state, snapshot): Restores all fields captured by save_recv_state, reverting any decrypt-path state mutations. Called on any failure after the snapshot is taken. A no-op in terms of effect if no mutations occurred before the failure.
Epoch identification: The current_epoch and prev_epoch boolean assignments in the pseudocode below ARE the epoch identification step. The reference implementation extracts this routing logic into a private identify_epoch() helper method; the pseudocode inlines it for presentation clarity. Comments in the pseudocode that reference identify_epoch() describe this inline routing block.

function Decrypt(state, header, ciphertext, sender_fp, recipient_fp):
    // Liveness guard: all-zero root_key indicates a dead (post-reset) session.
    // Constant-time comparison — root_key is secret material.
    if state.root_key == 0x00{32}:
        raise InvalidData

    // Counter exhaustion guard — BEFORE any KEM ratchet step.
    // recv_count is updated as max(recv_count, n + 1): with n = u32::MAX,
    // n + 1 wraps to 0 in unsigned arithmetic, silently resetting the
    // high-water mark and making all previously-seen counters appear unseen
    // (replay window collapse). Placing this before epoch-specific logic
    // prevents any cryptographic state mutation. NOTE: in this pseudocode,
    // the snapshot is allocated below (at `save_recv_state`, after
    // `identify_epoch()` and the pre-mutation structural checks). The
    // reference implementation allocates the snapshot before this guard —
    // in Rust the guard fires after the snapshot is already taken. Both
    // orderings are correct since no state mutations precede this guard;
    // rollback is a no-op either way. See Appendix E for the failure table
    // entry that documents both orderings.
    // ChainExhausted (not InvalidData) mirrors the send-side guard (§6.5):
    // the counter space is exhausted, not a structural format error.
    if header.n == u32::MAX:
        raise ChainExhausted

    // Identify which epoch this message belongs to.
    // Epoch routing depends solely on header.ratchet_pk — the presence or
    // absence of header.kem_ct is NOT examined until the new-epoch path is
    // confirmed. The three cases are evaluated in priority order (if/else if/else),
    // not as independent predicates.
    // IMPLEMENTATION REQUIREMENT: Implementations MUST NOT add a pre-AEAD structural
    // check that rejects current-epoch or previous-epoch messages carrying a kem_ct.
    // A message that matches the current or previous epoch MAY carry a kem_ct field
    // (e.g., a retransmitted ratchet-step message replayed with different routing).
    // Rejecting such a message as `InvalidData` before AEAD runs would violate the
    // oracle-collapse requirement (§12): `InvalidData` arrives in nanoseconds while
    // AEAD takes microseconds, creating a timing oracle that distinguishes "has kem_ct
    // in wrong context" from "authentication failed." The kem_ct is simply ignored on
    // non-new-epoch paths; AEAD authentication provides the correct rejection if the
    // message is invalid for any reason. A reimplementer adding a "kem_ct MUST be
    // absent for same-epoch messages" guard MUST ensure it is collapsed to `AeadFailed`
    // (not returned as `InvalidData`) if they choose to add it.
    // CONSTANT-TIME REQUIREMENT: Each comparison that is actually executed MUST
    // use constant-time equality (e.g., subtle::ConstantTimeEq). The risk is NOT
    // leaking which epoch a message belongs to — header.ratchet_pk is cleartext,
    // so the epoch type is already publicly observable. The actual risk is leaking
    // the byte values of the stored recv_ratchet_pk or prev_recv_ratchet_pk via
    // a timing side-channel: a crafted probe message with a hand-crafted ratchet_pk
    // that shares a prefix with the stored key can measure whether a partial match
    // shortens or lengthens the comparison, recovering the stored key byte-by-byte.
    // The pseudocode computes both predicates for presentation; the reference
    // implementation evaluates prev_epoch first and short-circuits (early return)
    // if it matches, reaching current_epoch only when prev_epoch is false. Each
    // comparison that EXECUTES must be constant-time; which comparisons execute
    // depends on state.
    // See Appendix E.
    current_epoch = (state.recv_ratchet_pk is Some AND
                     header.ratchet_pk == state.recv_ratchet_pk)
    prev_epoch = (state.prev_recv_ratchet_pk is Some AND
                  header.ratchet_pk == state.prev_recv_ratchet_pk)

    // When recv_ratchet_pk is None (Alice's initial state, before any receive),
    // current_epoch is always false (None ≠ any public key) and prev_epoch is
    // always false (prev_recv_ratchet_pk is also None). Every incoming message
    // takes the new-epoch KEM ratchet path until the first successful decrypt
    // establishes recv_ratchet_pk. Implementations in languages without native
    // option types (C, Go) must represent None as a distinct sentinel — not
    // all-zeros — and explicitly return false for both epoch checks.
    //
    // **Why `current_epoch` also requires the `is Some` guard**: Both predicates
    // are structurally symmetric. Without the guard, a language using an all-zero
    // sentinel for "absent" public key would evaluate `header.ratchet_pk == 0x00{1216}`
    // as true whenever the header carries an all-zero ratchet_pk — routing the message
    // to the current-epoch path even though no current-epoch key has been established.
    // The guard closes this: if `recv_ratchet_pk is None`, `current_epoch` is false
    // regardless of the header's ratchet_pk value, and the message correctly takes
    // the new-epoch KEM ratchet path. Rust handles this naturally via `Option<T>`;
    // C/Go/Python implementations MUST use a distinct non-zero sentinel or an explicit
    // boolean "is_set" flag, NOT an all-zeros value, to represent the absent state.

    // Structural check: previous-epoch messages require a retained epoch key.
    // prev_epoch takes priority over current_epoch (see below), so
    // current_epoch is not consulted here.
    if prev_epoch AND state.prev_recv_epoch_key is None:
        raise InvalidData

    // Snapshot all receive-side state for rollback on any failure.
    // (see "State rollback on failure" below)
    snapshot = save_recv_state(state)

    // --- Epoch-specific key derivation (priority matching) ---
    // Previous-epoch check takes priority over current-epoch: this is
    // a correctness requirement, not a tie-breaker for a rare edge case.
    // Without this ordering, a message matching both predicates (possible
    // in certain initial-state configurations where prev_recv_ratchet_pk
    // == recv_ratchet_pk) would be routed nondeterministically.
    // This configuration is unreachable through honest operation — the
    // KEM ratchet always rotates old → prev before writing new → current,
    // so the two keys are always distinct. The priority is a correctness
    // invariant against crafted or corrupted blobs, not a common case
    // requiring special handling.
    // See Abstract.md §5.4 for formal justification.
    if prev_epoch:
        msg_key = KDF_MsgKey(state.prev_recv_epoch_key, header.n)
    else if NOT current_epoch:
        // New epoch: KEM ratchet step.
        if header.kem_ct is None:
            raise InvalidData    // new-epoch message without a KEM ciphertext
        kem_ct = header.kem_ct
        // send_ratchet_sk is None when the party has never sent (e.g.,
        // Bob's initial state before his first encrypt). A forged or
        // misrouted message with an unrecognized ratchet_pk triggers
        // the new-epoch path against this state. Without this guard,
        // a reimplementer might unwrap None/null or silently use a
        // zero key instead of returning an error.
        if state.send_ratchet_sk is None:
            raise InvalidData
        // Decapsulate with send_ratchet_sk: the sender encapsulated to our
        // send_ratchet_pk (which we published in our last outgoing header),
        // so the matching secret key is send_ratchet_sk, not recv_ratchet_sk.
        ss = XWing.Decaps(state.send_ratchet_sk, kem_ct)
        // In lo-crypto-v1, XWing.Decaps never fails cryptographically — ML-KEM
        // uses implicit rejection (§8.4) and X25519 always produces a 32-byte
        // result. A structural DecapsulationFailed (wrong kem_ct length) is
        // reachable if the ciphertext is malformed. Both DecapsulationFailed
        // and AeadFailed trigger the same snapshot rollback: restore_recv_state
        // before returning. No state mutations have occurred yet at this point —
        // epoch rotation (saving previous epoch keys, overwriting recv_epoch_key,
        // resetting recv_count, clearing recv_seen) follows below. Rollback is
        // applied unconditionally by the snapshot-and-restore mechanism regardless.
        // DecapsulationFailed on the decrypt path is NOT session-fatal — unlike
        // AeadFailed on the encrypt path (§6.5), which zeroizes all key material.
        // The snapshot rollback restores the session to its pre-call state,
        // and the caller can retry or discard the message. Treating decrypt-side
        // DecapsulationFailed as session-fatal (by analogy with encrypt-side
        // AeadFailed) would incorrectly terminate sessions on malformed messages.

        // Rotate previous epoch: current becomes previous.
        // Only save the previous epoch if recv_ratchet_pk was set (meaningful
        // current epoch exists). On the first KEM ratchet (Alice's init state),
        // recv_ratchet_pk is None — there is no previous epoch to save.
        if state.recv_ratchet_pk is Some:
            state.prev_recv_epoch_key = state.recv_epoch_key
            // ZEROIZATION NOTE: after this assignment, `state.prev_recv_epoch_key`
            // holds the old epoch key and `state.recv_epoch_key` still holds a copy.
            // Each copy must be zeroized at the end of its own lifetime:
            // `prev_recv_epoch_key` when a later receive-side KEM ratchet step
            // overwrites it or sets it to None; `recv_epoch_key` when the
            // `(state.root_key, state.recv_epoch_key) = KDF_Root(state.root_key, ss)`
            // assignment below overwrites it. In Rust, `prev_recv_epoch_key` is
            // `Option<Zeroizing<[u8; 32]>>`, so the overwrite or drop zeroizes the
            // old epoch key it holds. `recv_epoch_key` and `send_epoch_key` are plain
            // `[u8; 32]`, not `Zeroizing` wrappers: `[u8; 32]` is `Copy`, so
            // `Zeroizing::new(val)` would copy rather than move, leaving the original
            // on the stack. Overwriting `recv_epoch_key` with the KDF_Root output
            // therefore triggers no zeroizing drop; responsibility for the old value
            // rests with the code performing that assignment. C/Go/Python
            // implementations must explicitly zeroize the old `recv_epoch_key`
            // before the KDF_Root assignment (see §6.4 for the analogous pattern).
            state.prev_recv_ratchet_pk = state.recv_ratchet_pk
            state.prev_recv_seen = state.recv_seen
            // ORDERING IS CRITICAL: `prev_recv_seen = recv_seen` MUST precede
            // `recv_seen = empty` (in the "Derive new current epoch" block below).
            // If reversed, `empty` is copied into `prev_recv_seen`, silently
            // discarding all current-epoch replay protection history. The error is
            // undetectable: the new epoch proceeds normally, but previous-epoch
            // duplicate detection is disabled. Compare the analogous §6.4 requirement
            // that `prev_send_count = send_count` precede `send_count = 0`.
        else:
            state.prev_recv_epoch_key = None
            state.prev_recv_ratchet_pk = None
            state.prev_recv_seen = empty

        // Derive new current epoch.
        (state.root_key, state.recv_epoch_key) = KDF_Root(state.root_key, ss)
        zeroize(ss)    // ss is no longer needed — zeroize immediately (mirrors §6.4).
                       // Rust's ZeroizeOnDrop covers this automatically; C/Go/Python
                       // reimplementers MUST zeroize explicitly after this line.
        state.recv_ratchet_pk = header.ratchet_pk
        state.recv_count = 0
        state.recv_seen = empty    // MUST follow `prev_recv_seen = recv_seen` above.
        state.ratchet_pending = true

        msg_key = KDF_MsgKey(state.recv_epoch_key, header.n)
    else:
        // Current epoch — the `else` branch of the three-way selector:
        //   if prev_epoch            → previous epoch (above)
        //   else if NOT current_epoch → new epoch / KEM ratchet (above)
        //   else                     → current epoch (here)
        // "Current epoch" means header.ratchet_pk == recv_ratchet_pk.
        msg_key = KDF_MsgKey(state.recv_epoch_key, header.n)

    // AEAD decryption — all epoch types converge here.
    plaintext = DecryptWithKey(msg_key, header, ciphertext, sender_fp, recipient_fp)
    // On AEAD failure: restore_recv_state(state, snapshot), raise AeadFailed.

    // --- Post-AEAD duplicate detection and recv_seen update ---
    // **Security requirement**: Duplicate detection MUST be post-AEAD, not pre-AEAD.
    // Pre-AEAD recv_seen lookup returns in nanoseconds for duplicates vs.
    // microseconds for non-duplicates (key derivation + AEAD), creating a timing
    // oracle that leaks recv_seen set membership. An attacker replaying messages
    // with different counter values can probe which counters have been successfully
    // decrypted. Post-AEAD ordering ensures both duplicate and non-duplicate
    // messages take identical time through key derivation + AEAD.
    // Duplicates succeed AEAD (same key/nonce/ciphertext) but the plaintext is
    // discarded.
    if prev_epoch:
        // recv_count is NOT updated — it tracks the current epoch only.
        // Previous-epoch counters occupy a different sequence space;
        // unconditionally updating recv_count would break the invariant
        // that recv_seen entries are < recv_count (guard 17).
        if header.n in state.prev_recv_seen:
            restore_recv_state(state, snapshot)
            raise DuplicateMessage
        if |state.prev_recv_seen| >= MAX_RECV_SEEN:
            restore_recv_state(state, snapshot)
            raise ChainExhausted
        state.prev_recv_seen.add(header.n)
    else:
        // Current-epoch path. For messages that arrived as "new-epoch" (different
        // ratchet_pk), the KEM ratchet step above has already updated recv_ratchet_pk
        // to the new peer key — so by this point, the message's ratchet_pk matches
        // the current epoch and is handled here, not in the prev_epoch branch.
        // New-epoch messages follow the same post-AEAD state update as current-epoch
        // messages — recv_count is incremented and n is added to recv_seen. This is
        // not a separate case; the merge is intentional. Implementations that treat
        // new-epoch as an independent code path and omit the recv_count update leave
        // recv_count = 0 after the first new-epoch message, causing guard 17 failures
        // on subsequent serialization.
        if header.n in state.recv_seen:
            restore_recv_state(state, snapshot)
            raise DuplicateMessage
        if |state.recv_seen| >= MAX_RECV_SEEN:
            restore_recv_state(state, snapshot)
            raise ChainExhausted
        state.recv_seen.add(header.n)
        state.recv_count = max(state.recv_count, header.n + 1)
        // Off-by-one trap: `recv_count = header.n` (not `+ 1`) would silently fail
        // guard 17 after a new-epoch ratchet. After the new-epoch step sets recv_count = 0,
        // the first arriving message has n = 0 → recv_count = max(0, 0) = 0, but recv_seen
        // now contains {0}. Guard 17 requires all recv_seen entries to be < recv_count —
        // 0 < 0 is false → InvalidData on next serialization. The `+ 1` produces recv_count
        // = 1 with recv_seen = {0}, which satisfies the invariant (0 < 1).

    zeroize(msg_key)
    return plaintext

function DecryptWithKey(msg_key, header, ciphertext, sender_fp, recipient_fp):
    // Minimum ciphertext length: 16 bytes (Poly1305 tag, zero-length plaintext).
    // AEAD libraries that don't gracefully handle sub-tag-length inputs (e.g.,
    // OpenSSL EVP, some Go crypto/cipher implementations) may panic or return
    // non-standard errors. Guard explicitly before calling the AEAD primitive.
    if len(ciphertext) < 16:
        raise AeadFailed
    // No maximum ciphertext length is enforced at the Rust API level.
    // XChaCha20-Poly1305 accepts inputs up to ~256 GiB, so an authenticated peer
    // can supply a ciphertext of any size, causing the receiver to allocate the
    // full buffer before AeadFailed fires. The CAPI imposes a 256 MiB hard limit
    // (§13.2) that returns InvalidLength before this function is reached.
    // Rust-layer callers and non-CAPI reimplementers MUST impose their own upper
    // bound appropriate to their deployment context.
    // IMPORTANT: sender_fp is the REMOTE party's fingerprint (the message sender),
    // and recipient_fp is the LOCAL party's fingerprint (the message recipient).
    // This is the mirror of encrypt, where sender_fp = local and recipient_fp = remote.
    // A reimplementer who always uses (local_fp, remote_fp) for both directions
    // produces mismatched AAD and silent AEAD failures.
    nonce = 0x00{20} || big_endian_32(header.n)
    header_bytes = encode_ratchet_header(header)
    aad = "lo-dm-v1" || sender_fp || recipient_fp || header_bytes
    return AEAD-Decrypt(msg_key, nonce, ciphertext, aad)
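The nonce and AAD layout from DecryptWithKey, together with the direction-dependent fingerprint order, can be sketched as follows. `header_bytes` is assumed to be the already-encoded canonical ratchet header (its encoding is defined elsewhere). The `Direction` enum, `LenError`, and the caller-chosen `max_len` are illustrative names; `InvalidLength` is borrowed from the CAPI's 256 MiB check.

```rust
/// Domain-separation prefix of the DM AAD.
const DM_LABEL: &[u8] = b"lo-dm-v1";

/// 24-byte XChaCha20 nonce: 20 zero bytes, then the counter in big-endian.
pub fn dm_nonce(n: u32) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[20..].copy_from_slice(&n.to_be_bytes());
    nonce
}

/// AAD = "lo-dm-v1" || sender_fp || recipient_fp || header_bytes.
pub fn dm_aad(sender_fp: &[u8], recipient_fp: &[u8], header_bytes: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(
        DM_LABEL.len() + sender_fp.len() + recipient_fp.len() + header_bytes.len(),
    );
    aad.extend_from_slice(DM_LABEL);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(header_bytes);
    aad
}

pub enum Direction {
    Outgoing,
    Incoming,
}

/// The sender's fingerprint always comes first. Both values come from the
/// session state (local_fp / remote_fp), never from per-call arguments.
pub fn aad_for(dir: Direction, local_fp: &[u8], remote_fp: &[u8], header_bytes: &[u8]) -> Vec<u8> {
    match dir {
        Direction::Outgoing => dm_aad(local_fp, remote_fp, header_bytes),
        Direction::Incoming => dm_aad(remote_fp, local_fp, header_bytes),
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LenError {
    AeadFailed,
    InvalidLength,
}

/// Pre-AEAD length checks: shorter than the 16-byte Poly1305 tag is
/// AeadFailed; longer than the caller's own bound is rejected before AEAD.
pub fn check_ciphertext_len(len: usize, max_len: usize) -> Result<(), LenError> {
    if len < 16 {
        return Err(LenError::AeadFailed);
    }
    if len > max_len {
        return Err(LenError::InvalidLength);
    }
    Ok(())
}
```

With these helpers, the AAD Alice computes when sending equals the AAD Bob computes when receiving, since each side swaps its own local/remote pair.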

DuplicateMessage MUST NOT return plaintext: When a message is detected as duplicate (counter already in recv_seen or prev_recv_seen), the decrypted plaintext MUST be zeroized and the function MUST return only the DuplicateMessage error. An API that returns both the plaintext and a duplicate indicator enables application-layer double delivery despite the error signal. The duplicate message successfully decrypts (AEAD is deterministic — same key/nonce/ciphertext produces the same plaintext), but exposing the result defeats the purpose of duplicate detection.

DuplicateMessage plaintext zeroization obligation for non-RAII implementations: The duplicate check runs after AEAD decryption (§6.6 post-AEAD duplicate detection rationale). This means the plaintext output buffer has already been filled by the time DuplicateMessage is raised. Non-RAII implementations (C, Go, Python) that use a "free on error" pattern will free the buffer without zeroing it — leaking plaintext in freed-but-not-overwritten memory. The obligation is: on DuplicateMessage, explicitly zeroize the plaintext output buffer before returning the error (or before freeing the buffer). In Rust, wrapping the output buffer in Zeroizing<Vec<u8>> handles this automatically via Drop. In C: explicit_bzero(buf, len); free(buf); before returning. In Go: clear(buf) (or for i := range buf { buf[i] = 0 }) before discarding. The same obligation applies to AeadFailed — both errors cause the function to return after AEAD has already written into the output buffer.
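A std-only sketch of the post-AEAD duplicate branch, assuming a hypothetical `finish_decrypt` helper that receives the freshly decrypted buffer and the result of the recv_seen / prev_recv_seen lookup. The volatile-write `wipe` stands in for `Zeroizing<Vec<u8>>`; it clears only the initialized bytes, not spare capacity left over from earlier reallocations.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

#[derive(Debug, PartialEq, Eq)]
pub enum DecryptError {
    DuplicateMessage,
    // AeadFailed follows the same wipe-then-error rule when the AEAD
    // primitive has already written into the output buffer.
    AeadFailed,
}

/// Zeroes the initialized bytes of `buf` with volatile writes.
pub fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical post-AEAD step. `already_seen` is the duplicate lookup,
/// performed only after AEAD succeeded.
pub fn finish_decrypt(mut plaintext: Vec<u8>, already_seen: bool) -> Result<Vec<u8>, DecryptError> {
    if already_seen {
        // The buffer holds valid plaintext: wipe it, then return only the
        // error, so no caller can deliver the message twice.
        wipe(&mut plaintext);
        return Err(DecryptError::DuplicateMessage);
    }
    Ok(plaintext)
}
```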

recv_ratchet_pk is stored verbatim without pre-AEAD structural validation: When a new-epoch message arrives, header.ratchet_pk (1216 bytes) is stored as the new recv_ratchet_pk immediately before AEAD decryption runs. No structural validity check (all-zero test, low-order point check, ML-KEM key validation) is performed before storage. Invalid key material surfaces as AeadFailed via X-Wing's implicit rejection (§8.4) when the next KEM ratchet step attempts decapsulation with that key. A reimplementer who adds a pre-AEAD structural check on ratchet_pk — for example, rejecting an all-zero public key before attempting AEAD — creates a timing oracle: the check returns in nanoseconds while AEAD takes microseconds, allowing an attacker to probe key validity without triggering an AEAD attempt.

Receiver does not use pn for key derivation: The pn (previous epoch count) field in the header is authenticated via AAD but the receiver performs no other processing on it. In counter-mode, any message key is directly derivable from the epoch key and counter — there is no skip-cache scanning bounded by pn. Reimplementers from the Signal Double Ratchet ecosystem: pn has no skip-cache role here; tampering with pn causes AEAD failure (§7.3), nothing else. No validation constraint on pn is applied. Values from 0 to u32::MAX are all acceptable on the wire — the receiver MUST NOT add guards on pn relative to any state field (e.g., a "pn must be ≤ peer's send_count" check has no basis in this protocol and would reject valid messages).

Decrypt atomicity: Unlike encrypt (§6.5), decrypt achieves atomicity through snapshot/rollback rather than operation ordering. The encrypt path can guarantee atomicity by ordering: all fallible operations execute before any state mutation, so a failure leaves state unchanged by construction. The decrypt path cannot. On the new-epoch path, KEM decapsulation produces the shared secret ss; the state mutations follow (saving prev_recv_epoch_key, overwriting recv_epoch_key via KDF_Root(root_key, ss), resetting recv_count to 0, clearing recv_seen); and only then do the remaining fallible steps run (AEAD decryption under the new epoch's message key, duplicate detection, and the recv_seen capacity check). Because fallible operations follow state mutations, reordering alone cannot make this path atomic. The only correct mechanism is snapshot/rollback: take a full snapshot before any mutation and restore it on any failure. The Rust reference implementation's save_recv_state / restore_recv_state (see helper definitions above) implement this contract. A reimplementer who attempts ordering-based atomicity for decrypt, placing all mutations after all fallible operations, cannot do so correctly on the new-epoch path and will either fail to perform the KEM ratchet step or silently corrupt state on failure.

State rollback on failure: All receive-side state mutations are rolled back on any failure (decapsulation failure, chain exhaustion, AEAD failure, duplicate detection). The implementation takes a full snapshot of nine fields before any mutation: root_key, recv_epoch_key, recv_count, recv_ratchet_pk, ratchet_pending, recv_seen, prev_recv_epoch_key, prev_recv_ratchet_pk, and prev_recv_seen. On any error, the entire snapshot is restored wholesale. Send-side state is never mutated by decrypt and is not snapshotted.

recv_seen and prev_recv_seen snapshots must be deep copies: Both fields are sets of u32 values that grow during decryption. In Rust, Clone on HashSet<u32> always produces an independent deep copy. In Python, Go, Java, and other languages with reference semantics, a simple variable assignment copies the reference — not the contents. Mutating the live set (e.g., inserting the new counter into recv_seen) would silently corrupt the snapshot, making rollback a no-op (the snapshot points to the same backing storage as the live set). The snapshot MUST be a fully independent set with the same elements — a deep copy whose mutations during decrypt_inner do not affect the snapshot, and whose restoration on error completely replaces the live set's contents.

recv_ratchet_pk and prev_recv_ratchet_pk snapshots also require deep copies: These public-key fields carry the same reference-semantics trap as the recv_seen sets. The new-epoch path executes prev_recv_ratchet_pk = recv_ratchet_pk and then recv_ratchet_pk = header.ratchet_pk. In Rust, public-key structs derive Clone but do not implement Copy (they are non-trivial), so the snapshot's Clone is always an independent value and no aliasing is possible. In Python, Go, or Java, a snapshot that holds only a reference to the live key object stays correct only as long as every update rebinds the field to a different object. If any code path instead writes into the existing key buffer (for example, copying header bytes into an existing Go slice with the built-in copy, or slice assignment into a Python bytearray), the write also lands in the snapshot, and rollback "restores" the rejected header key. After such a rollback, recv_ratchet_pk no longer matches the peer's actual current-epoch key, so the peer's next current-epoch message is misrouted and fails AEAD: a silent session corruption. The fix: deep-copy all public key fields in the snapshot. In Python: snapshot_recv_ratchet_pk = bytes(live_recv_ratchet_pk) (or equivalent). In Go: copy the underlying byte array rather than taking a slice reference.

Snapshot zeroization obligation on the success path: The snapshot copies of root_key, recv_epoch_key, and prev_recv_epoch_key are secret key material. On the success path, the snapshot is discarded rather than restored — but discarding must mean zeroizing, not merely freeing or letting the memory go out of scope. In Rust, Zeroizing<[u8; 32]> zeroizes automatically on drop, so the success path is correct by construction. In C, Go, Python, or other non-RAII languages, the caller who implements this function MUST explicitly zero these three fields in the snapshot before returning from the success path. Failing to do so leaves copies of old key material in freed-but-not-zeroed memory, where they are recoverable via heap-scanning for the duration they remain un-overwritten. The rollback invariant is "on error restore, on success zeroize" — not "on error restore, on success ignore."
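The snapshot contract (nine fields, deep copies, restore on error, zeroize on success) can be sketched as a wrapper around the fallible decrypt body. The `RecvState` struct and `with_rollback` helper are hypothetical; public keys are modeled as `Option<Vec<u8>>` and the seen-sets as `HashSet<u32>`, whose derived `Clone` produces independent copies. Key fields are plain arrays wiped by hand, where production Rust would use `Zeroizing`.

```rust
use std::collections::HashSet;
use std::sync::atomic::{compiler_fence, Ordering};

/// The nine receive-side fields captured by save_recv_state (hypothetical
/// layout). Derived Clone deep-copies the Vecs and HashSets.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RecvState {
    pub root_key: [u8; 32],
    pub recv_epoch_key: [u8; 32],
    pub recv_count: u32,
    pub recv_ratchet_pk: Option<Vec<u8>>,
    pub ratchet_pending: bool,
    pub recv_seen: HashSet<u32>,
    pub prev_recv_epoch_key: Option<[u8; 32]>,
    pub prev_recv_ratchet_pk: Option<Vec<u8>>,
    pub prev_recv_seen: HashSet<u32>,
}

fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Wipes the secret-key fields of a copy that is about to be dropped.
fn wipe_secrets(s: &mut RecvState) {
    wipe(&mut s.root_key);
    wipe(&mut s.recv_epoch_key);
    if let Some(k) = s.prev_recv_epoch_key.as_mut() {
        wipe(k);
    }
}

/// All-or-nothing execution of a fallible decrypt body. On error the
/// snapshot replaces the live state wholesale; on either path the copy
/// that gets discarded has its secret fields wiped first.
pub fn with_rollback<T, E>(
    state: &mut RecvState,
    body: impl FnOnce(&mut RecvState) -> Result<T, E>,
) -> Result<T, E> {
    let mut snapshot = state.clone(); // save_recv_state: deep copy
    let result = body(state);
    if result.is_err() {
        // restore_recv_state: the abandoned state moves into `snapshot`.
        std::mem::swap(state, &mut snapshot);
    }
    // Success: `snapshot` is the pre-call copy. Failure: it is the
    // post-mutation state. Both contain key material.
    wipe_secrets(&mut snapshot);
    result
}
```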

Why ratchet_pending is in the snapshot: A new-epoch decrypt tentatively sets ratchet_pending = true (§6.6 KEM ratchet step) before AEAD runs. If AEAD fails and ratchet_pending is not restored to its pre-decrypt value, the next encrypt() call fires a KEM ratchet step against the (also rolled-back) old recv_ratchet_pk using the rolled-back root_key, producing a ciphertext the peer cannot process — silent session corruption with no error on the sender side. A reimplementer who implements partial rollback (e.g., omits ratchet_pending thinking it is a flag that should remain set after any new-epoch attempt) gets exactly this failure mode.

Previous epoch grace period: When a KEM ratchet step occurs, the current epoch key is preserved as prev_recv_epoch_key. Late-arriving messages from that epoch can still be decrypted using counter-mode derivation from the old epoch key. The previous epoch key is zeroized when a second KEM ratchet step rotates it out. This provides a one-epoch grace period for out-of-order delivery without storing per-message keys. "One epoch" means one receive epoch — one KEM ratchet step in the receive direction. A send-side KEM ratchet step does not rotate prev_recv_epoch_key. Implementations that interpret "epoch" as any KEM ratchet step (send or receive) will incorrectly expect recovery across two direction changes.
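The receive-direction rotation that produces this one-epoch window can be sketched as below. `new_epoch_key` stands in for the KDF_Root output, and the `RecvEpochs` struct is hypothetical; key derivation, ss zeroization, and ratchet_pending are omitted. `std::mem::take` moves the current seen-set into the previous slot and leaves an empty set behind in one step, so the prev_recv_seen ordering requirement cannot be violated.

```rust
use std::collections::HashSet;

/// Hypothetical receive-side epoch fields. A production version would hold
/// the keys in zeroizing wrappers; plain arrays keep the sketch short.
pub struct RecvEpochs {
    pub recv_epoch_key: [u8; 32],
    pub recv_ratchet_pk: Option<Vec<u8>>,
    pub recv_count: u32,
    pub recv_seen: HashSet<u32>,
    pub prev_recv_epoch_key: Option<[u8; 32]>,
    pub prev_recv_ratchet_pk: Option<Vec<u8>>,
    pub prev_recv_seen: HashSet<u32>,
}

impl RecvEpochs {
    /// Receive-direction KEM ratchet step (state bookkeeping only).
    pub fn rotate(&mut self, new_epoch_key: [u8; 32], new_peer_pk: Vec<u8>) {
        if self.recv_ratchet_pk.is_some() {
            // Current epoch becomes previous; the old previous is dropped.
            self.prev_recv_epoch_key = Some(self.recv_epoch_key);
            self.prev_recv_ratchet_pk = self.recv_ratchet_pk.take();
            self.prev_recv_seen = std::mem::take(&mut self.recv_seen);
        } else {
            // First receive epoch (e.g. Alice's initial state): nothing to keep.
            self.prev_recv_epoch_key = None;
            self.prev_recv_ratchet_pk = None;
            self.prev_recv_seen.clear();
            self.recv_seen.clear();
        }
        self.recv_epoch_key = new_epoch_key;
        self.recv_ratchet_pk = Some(new_peer_pk);
        self.recv_count = 0;
    }
}
```

Two successive receive-side rotations are enough to push an epoch out of the window; send-side steps never call `rotate`.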

recv_count asymmetry across epochs: recv_count tracks only the current receive epoch — it is the high-water mark for current-epoch message counters. Previous-epoch messages update prev_recv_seen but do NOT update recv_count. This is critical for guard 17 (§6.8): recv_seen entries must be < recv_count. If previous-epoch messages updated recv_count, a previous-epoch counter (which could be any value in [0, u32::MAX − 1]) would corrupt the high-water mark for the current epoch, invalidating the guard 17 invariant. There is no analogous prev_recv_count — when a receive epoch rotates into previous, its recv_count is not preserved. prev_recv_seen entries are bounded only by guard 14 and guard 15.
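The counter bookkeeping, including the u32::MAX pre-check and guard 17, fits in a few lines. A sketch with hypothetical `Counters` and `RecvError` types; `max_seen` is a parameter because MAX_RECV_SEEN is defined elsewhere in the spec. It also shows that counter gaps are accepted and that previous-epoch counters never touch recv_count.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum RecvError {
    ChainExhausted,
    DuplicateMessage,
}

/// Runs before any epoch logic: n == u32::MAX would make n + 1 wrap to 0.
pub fn check_counter(n: u32) -> Result<(), RecvError> {
    if n == u32::MAX {
        Err(RecvError::ChainExhausted)
    } else {
        Ok(())
    }
}

/// Hypothetical counter-related subset of the receive state.
pub struct Counters {
    pub recv_count: u32,
    pub recv_seen: HashSet<u32>,
    pub prev_recv_seen: HashSet<u32>,
}

impl Counters {
    /// Post-AEAD update. Previous-epoch messages touch only prev_recv_seen;
    /// current-epoch (including just-ratcheted) messages also raise recv_count.
    pub fn record(&mut self, n: u32, prev_epoch: bool, max_seen: usize) -> Result<(), RecvError> {
        let seen = if prev_epoch { &mut self.prev_recv_seen } else { &mut self.recv_seen };
        if seen.contains(&n) {
            return Err(RecvError::DuplicateMessage);
        }
        if seen.len() >= max_seen {
            return Err(RecvError::ChainExhausted);
        }
        seen.insert(n);
        if !prev_epoch {
            // check_counter() guarantees n < u32::MAX, so n + 1 cannot overflow.
            self.recv_count = self.recv_count.max(n + 1);
        }
        Ok(())
    }

    /// Guard 17: every entry in recv_seen is below recv_count.
    pub fn guard_17_holds(&self) -> bool {
        self.recv_seen.iter().all(|&n| n < self.recv_count)
    }
}
```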

Timing asymmetry across epoch paths: New-epoch decrypt performs X-Wing decapsulation (~1ms); current-epoch and previous-epoch paths are O(1) HMAC key derivation (~microseconds). This timing difference is not a side-channel oracle because the epoch type is determined solely by comparing header.ratchet_pk (a cleartext header field) to stored public keys — an observer who can measure timing already knows the epoch type from the public key. The constant-time requirement (Appendix E) applies to the public-key comparisons themselves, not to equalizing path runtimes. Reimplementers MUST NOT add dummy KEM operations to equalize paths — this would waste CPU for no security benefit.
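The routing step and the constant-time comparison it requires can be sketched as follows, with `None` as an explicit absent state rather than an all-zero key. The names are hypothetical, and the XOR-accumulator with `std::hint::black_box` is a best-effort stand-in for `subtle::ConstantTimeEq`; the path runtimes themselves are deliberately left unequal.

```rust
/// Which path a message takes, in the spec's priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EpochRoute {
    Previous,
    New,
    Current,
}

/// Equality with no early exit on the first differing byte. Length is
/// public (X-Wing public keys are 1216 bytes), so a length mismatch may
/// return immediately.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    std::hint::black_box(diff) == 0
}

/// An unset slot never matches, even against an all-zero header key.
fn matches(stored: Option<&[u8]>, header_pk: &[u8]) -> bool {
    match stored {
        Some(pk) => ct_eq(pk, header_pk),
        None => false,
    }
}

/// Previous epoch is checked first (priority), then current; anything else
/// takes the new-epoch KEM ratchet path.
pub fn route(header_pk: &[u8], recv_pk: Option<&[u8]>, prev_recv_pk: Option<&[u8]>) -> EpochRoute {
    if matches(prev_recv_pk, header_pk) {
        EpochRoute::Previous
    } else if matches(recv_pk, header_pk) {
        EpochRoute::Current
    } else {
        EpochRoute::New
    }
}
```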

Decrypt error table: decrypt() / soliton_ratchet_decrypt returns five distinct variants:

InvalidData: returned for four distinct conditions:
- Dead session (all-zero root_key, pre-snapshot): the session has been permanently terminated by a session-fatal encrypt error (§6.5), which zeroized root_key. This InvalidData is not retryable for any message — the session is irrecoverable. Re-establish via LO-KEX.
- Epoch too old (missing prev_recv_epoch_key for a previous-epoch message, pre-snapshot): the session is still live, but the sender's message is from an epoch older than the one-epoch grace period (prev_recv_epoch_key has already been rotated out). This InvalidData is non-retryable for that specific message, but the session remains functional for current-epoch and new-epoch messages.
- kem_ct absent in a new-epoch message (post-snapshot, rollback is a no-op): the header indicates a new epoch (new recv_ratchet_pk) but carries no KEM ciphertext. No state mutations have been applied when this fires. Non-retryable for that message (structurally malformed).
- send_ratchet_sk is None on the new-epoch path (post-snapshot, rollback is a no-op): decapsulation of the peer's new-epoch ciphertext requires the local X-Wing secret key, but the party has never sent (no key was generated yet). Fires before any state mutation. Non-retryable for that message.
Callers who need to distinguish the dead-session condition from the others may inspect root_key for the all-zero sentinel before calling (checking liveness) — there is no error-code distinction at the API level between the four conditions. State is unchanged for all four. The post-snapshot cases (third and fourth) require no rollback because no mutation precedes them, but the unconditional snapshot-and-restore mechanism handles them correctly regardless.
ChainExhausted: recv_seen or prev_recv_seen saturation (transient — resets on next KEM ratchet step), or header.n == u32::MAX (counter exhaustion, not a structural error). State is unchanged. NOT session-fatal — see §12 modes (2).
DuplicateMessage: counter already in recv_seen or prev_recv_seen. State is restored via snapshot rollback on all paths; the snapshot is always taken before epoch-specific processing begins. On the current-epoch and previous-epoch paths, no state has been mutated before duplicate detection fires (key derivation and AEAD precede it, but these are read-only operations with respect to ratchet state), so the snapshot restore is a no-op in practice. On the new-epoch path, KEM ratchet step mutations (epoch rotation, recv_count reset, recv_seen clear) have occurred before duplicate detection, but DuplicateMessage is structurally unreachable there because recv_seen was just cleared; see §6.7. In every reachable case the snapshot restore is correct, even when it changes nothing. Plaintext is zeroized and not returned. Rollback is MANDATORY for DuplicateMessage regardless of whether visible state mutations preceded it: a reimplementer who omits rollback "because the state wasn't mutated yet" silently corrupts the session on edge-case paths.
DecapsulationFailed: X-Wing decapsulation failure on the new-epoch path. It fires at XWing.Decaps(), step 1 of the new-epoch branch, before AEAD and before any state mutation. Epoch rotation (saving previous epoch keys, overwriting recv_epoch_key, resetting recv_count, clearing recv_seen) has not yet occurred when this error fires (see §6.6 pseudocode: decapsulation at the top of the new-epoch branch, epoch rotation below). In practice it is unreachable with valid-length ciphertexts: implicit rejection (§8.4) makes every correctly-sized ciphertext decapsulate successfully and fail at AEAD instead. The snapshot-and-restore mechanism still applies unconditionally (the snapshot is taken before all epoch-specific processing and restored on any error return), so state is rolled back via snapshot even though nothing was mutated. NOT session-fatal for decrypt: the session remains usable.
AeadFailed: authentication tag mismatch. State is rolled back via snapshot. NOT session-fatal for decrypt (contrast: AeadFailed on encrypt IS session-fatal, §6.5). The session can process subsequent messages.
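The session-level consequence of each variant can be encoded in a small sketch. The names `DecryptError` and `session_usable_after` are hypothetical. The function takes the root_key liveness check as an input, because the error code alone does not distinguish a dead session from the other InvalidData conditions.

```rust
/// Decrypt error variants from the table above (hypothetical Rust names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecryptError {
    InvalidData,
    ChainExhausted,
    DuplicateMessage,
    DecapsulationFailed,
    AeadFailed,
}

/// After `err`, can the session still process other messages?
/// `root_key_all_zero` is the caller's liveness check: an all-zero
/// root_key is the dead-session sentinel.
pub fn session_usable_after(err: DecryptError, root_key_all_zero: bool) -> bool {
    match err {
        // Only the dead-session form of InvalidData is irrecoverable.
        DecryptError::InvalidData => !root_key_all_zero,
        // None of the other decrypt errors is session-fatal.
        DecryptError::ChainExhausted
        | DecryptError::DuplicateMessage
        | DecryptError::DecapsulationFailed
        | DecryptError::AeadFailed => true,
    }
}
```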

The pre-snapshot vs. post-snapshot distinction matters for rollback. ChainExhausted and the two pre-snapshot InvalidData conditions (dead session, epoch too old) fire before any cryptographic state mutation, so rollback is a no-op even when a snapshot exists. The two new-epoch-path InvalidData conditions (kem_ct absent, send_ratchet_sk None) fire after the snapshot but also before any state mutation, so rollback is a no-op for them as well. Their distinction from the pre-snapshot conditions still matters: the new-epoch path does apply mutations later (see the §6.6 KEM ratchet steps), so a reimplementer who asks "can this path mutate state?" at the moment InvalidData fires gets a different answer depending on which condition fired. DuplicateMessage and AeadFailed may occur after state mutations have been tentatively applied and require snapshot restoration. DecapsulationFailed occurs before state mutations (decapsulation is the first step on the new-epoch path, preceding epoch rotation; see the §6.6 pseudocode), but the snapshot-and-restore mechanism applies to it regardless. Note: the reference implementation takes the snapshot unconditionally (the line snapshot = save_recv_state(state), after identify_epoch() and before epoch-specific processing). The phrase "pre-mutation" describes the semantic behavior (no state was actually changed), not a conditional snapshot implementation. A reimplementer who omits the snapshot for InvalidData/ChainExhausted paths on the grounds that "no state was mutated yet" is correct only if those errors genuinely fire before any mutation. Conditional snapshotting adds fragility, however: if a future refactor moves a mutation earlier, the conditional snapshot silently stops covering it. The unconditional snapshot is simpler and correct by construction.
A reimplementer who omits rollback for DuplicateMessage or DecapsulationFailed — treating them as "pre-mutation" because they seem like early checks — silently corrupts the session state on those error paths.
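The unconditional snapshot-and-restore pattern can be sketched generically. `with_rollback` is a hypothetical helper in which `S` stands in for the receive-side ratchet state and the closure stands in for the epoch-specific processing. A real implementation would also zeroize the snapshot's key material when it is dropped.

```rust
/// Run one decrypt attempt against `state`, restoring a full snapshot on
/// any error, whether or not `process` actually mutated anything.
pub fn with_rollback<S: Clone, T, E>(
    state: &mut S,
    process: impl FnOnce(&mut S) -> Result<T, E>,
) -> Result<T, E> {
    // Taken unconditionally, before any epoch-specific guard in `process`.
    let snapshot = state.clone();
    match process(state) {
        Ok(value) => Ok(value),
        Err(err) => {
            // Restore even when the failing path changed nothing.
            *state = snapshot;
            Err(err)
        }
    }
}
```

Because the restore does not depend on which guard fired, moving a mutation earlier in `process` during a later refactor cannot silently escape rollback.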

Out-of-order messages: Within the current epoch, messages may arrive in any order. Each message key is derived directly from the epoch key and the message counter — no sequential chain advancement is needed. The recv_seen set prevents duplicate processing. Between epochs, messages from the immediately previous epoch are also handled (see above).

Plaintext zeroization obligation: The decrypted plaintext is secret material. In Rust, decrypt() returns Zeroizing<Vec<u8>>, which automatically zeroizes the buffer when dropped. Languages without RAII-style automatic cleanup (C, Go, Python) must explicitly zeroize the plaintext buffer after use — the library cannot manage the caller's copy. This obligation parallels the ratchet state zeroization documented in §6.10 but is easier to overlook because plaintext feels "transient." A plaintext buffer that survives in freed-but-not-zeroed memory is vulnerable to the same heap-scanning attacks that key material is.
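Outside Rust's RAII model, this obligation amounts to overwriting the buffer in a way the compiler cannot optimize away. The sketch below uses only `std` (`ptr::write_volatile` plus a compiler fence) and approximates what the `zeroize` crate does. The names `zeroize_bytes` and `Plaintext` are hypothetical. Production Rust code should prefer `Zeroizing<Vec<u8>>`, as described above.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite every byte with volatile stores, which the compiler may not
/// elide as dead writes.
pub fn zeroize_bytes(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference into `buf`.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Prevent the compiler from reordering later memory operations before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// RAII wrapper that wipes the plaintext when it is dropped.
pub struct Plaintext(pub Vec<u8>);

impl Drop for Plaintext {
    fn drop(&mut self) {
        // Only the initialized length is wiped; spare capacity and copies
        // left behind by earlier reallocations are not covered.
        zeroize_bytes(&mut self.0);
    }
}
```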

6.7 Duplicate Detection

Duplicate detection uses a recv_seen set (current epoch) and prev_recv_seen set (previous epoch) that track successfully-decrypted message counters. Both sets are bounded at MAX_RECV_SEEN = 65536 entries as defense-in-depth against memory exhaustion.

A message is a duplicate if its counter n is already in the appropriate recv_seen set. Messages from epochs older than the previous epoch are rejected by the KEM ratchet step (the old epoch key no longer exists; AEAD decryption will fail).

Unlike the Signal Double Ratchet's skip cache (which stores 32-byte message keys per skipped position), the recv_seen sets store only 4-byte counters — no secret key material. This eliminates the need for TTL expiry, purge throttling, and the ZeroizeOnDrop concerns associated with HashMap rehashing.

New-epoch path: For messages that trigger a KEM ratchet step (new-epoch), DuplicateMessage is unreachable by construction — the KEM ratchet step clears recv_seen to empty before duplicate detection runs, so the contains() check always returns false. The rollback covers this path only for theoretical completeness.
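Duplicate detection reduces to set membership over 4-byte counters. The following minimal sketch uses hypothetical type names and assumes that the counter is recorded only after AEAD has succeeded. `BTreeSet::insert` returns `false` when the value is already present, so a single call both checks for a duplicate and records the counter.

```rust
use std::collections::BTreeSet;

/// Which seen set a message is checked against (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SeenPath {
    Current,
    Previous,
}

#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateMessage;

/// Counters only: no secret key material is stored.
#[derive(Default)]
pub struct ReceiveSeen {
    pub recv_seen: BTreeSet<u32>,
    pub prev_recv_seen: BTreeSet<u32>,
}

impl ReceiveSeen {
    /// Record counter `n` on the given path, rejecting a repeat.
    pub fn check_and_record(&mut self, path: SeenPath, n: u32) -> Result<(), DuplicateMessage> {
        let set = match path {
            SeenPath::Current => &mut self.recv_seen,
            SeenPath::Previous => &mut self.prev_recv_seen,
        };
        if set.insert(n) {
            Ok(())
        } else {
            Err(DuplicateMessage)
        }
    }
}
```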

6.7.1 Worked Example: Four-Message Exchange

The following walkthrough traces a minimal Alice↔Bob exchange, showing the header values (n, pn, kem_ct) and the recv_count high-water mark for each message. This is the primary checkable reference for reimplementers verifying their counter and ratchet logic. recv_count is updated as max(recv_count, n+1) on each received message and resets to 0 on KEM ratchet (epoch transition).

Initial state (after §5.4/§5.5 session establishment):

Alice:  send_count=1, recv_count=0, prev_send_count=0
        ratchet_pending=false, send_ratchet_pk=Some(EK_pub)

Bob:    send_count=0, recv_count=1, prev_send_count=0
        ratchet_pending=true, send_ratchet_pk=None

Message 1 (A→B): Alice continues her first epoch (no ratchet needed).

n=1, pn=0, kem_ct=None

Alice's send_count advances to 2. Bob decrypts with his recv_epoch_key at counter 1. Bob's recv_count updates: max(1, n+1) = max(1, 2) = 2.

Message 2 (B→A): Bob's first send. ratchet_pending=true fires the KEM ratchet step.

n=0, pn=0, kem_ct=Some(...)

Bob had no previous send epoch (prev_send_count=0), so pn=0. The KEM ciphertext is encapsulated against Alice's send_ratchet_pk (which is EK_pub). Alice decrypts, sees the new ratchet_pk, and sets ratchet_pending=true. Alice's recv_count resets to 0 on epoch transition (KEM ratchet step), then updates: max(0, n+1) = max(0, 1) = 1.

Message 3 (A→B): Alice sends again. ratchet_pending=true (from receiving Bob's KEM ciphertext) fires her KEM ratchet step.

n=0, pn=2, kem_ct=Some(...)

pn=2 is the critical non-obvious value. Alice's previous send epoch had send_count=2 at the moment the ratchet fired (one message sent at n=1, which advanced send_count to 2). A reimplementer who initializes Alice's send_count at 0 instead of 1 would see pn=1 here. Bob's recv_count resets to 0 on epoch transition, then updates: max(0, n+1) = max(0, 1) = 1.

Message 4 (B→A): Bob sends again. ratchet_pending=true (from receiving Alice's KEM ciphertext) fires his KEM ratchet step.

n=0, pn=1, kem_ct=Some(...)

Bob's previous send epoch had send_count=1 (pn=1). Alice's recv_count resets to 0 on epoch transition, then updates: max(0, n+1) = max(0, 1) = 1.

After these four messages (two round trips), three KEM ratchet steps have occurred: Bob's at messages 2 and 4, and Alice's at message 3. Every subsequent direction change triggers a new KEM ratchet step with the expected pn = send_count at the ratchet boundary.
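The walkthrough can be checked mechanically with a counter-only simulation. This sketch rests on two assumptions. First, new-epoch detection is reduced to the presence of `kem_ct`, whereas the spec compares `header.ratchet_pk` against stored keys. Second, `pn` is taken from `prev_send_count` on every message, which matches all four messages above. No keys or AEAD are modeled, and the `Party` and `Header` types are hypothetical.

```rust
/// Header fields tracked by the walkthrough (kem_ct reduced to a flag).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub n: u32,
    pub pn: u32,
    pub kem_ct: bool,
}

/// Counter state of one party (keys and AEAD omitted).
#[derive(Debug)]
pub struct Party {
    pub send_count: u32,
    pub recv_count: u32,
    pub prev_send_count: u32,
    pub ratchet_pending: bool,
}

impl Party {
    pub fn send(&mut self) -> Header {
        let kem_ct = self.ratchet_pending;
        if kem_ct {
            // Send-side KEM ratchet step: the old epoch's send_count becomes pn.
            self.prev_send_count = self.send_count;
            self.send_count = 0;
            self.ratchet_pending = false;
        }
        let header = Header { n: self.send_count, pn: self.prev_send_count, kem_ct };
        self.send_count += 1;
        header
    }

    pub fn receive(&mut self, h: &Header) {
        if h.kem_ct {
            // New epoch: recv_count resets and our next send must ratchet.
            self.recv_count = 0;
            self.ratchet_pending = true;
        }
        // High-water mark: max(recv_count, n + 1).
        self.recv_count = self.recv_count.max(h.n + 1);
    }
}
```

Starting from the initial states above (Alice `send_count=1`, Bob `recv_count=1, ratchet_pending=true`) and running the four sends and receives in order reproduces the headers (n=1, pn=0), (n=0, pn=0), (n=0, pn=2), (n=0, pn=1) and the receiver-side recv_count values 2, 1, 1, 1 from the walkthrough.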

Hard limit on late-arriving messages: The one-epoch grace period (§6.6) means messages from the immediately previous receive epoch can still be decrypted. Messages from any older epoch are permanently undecryptable — the epoch key was zeroized when the second KEM ratchet step rotated it out. This is a protocol-level hard limit, not a buffering opportunity: no amount of caching or reordering at the transport layer can recover a message whose epoch key has been zeroized. Application designers must ensure that transport-layer message ordering keeps latency within one direction change. In practice, epochs in an active conversation are short (1-10 messages between direction changes), so only messages delayed across two or more direction changes are lost.

6.8 Ratchet State Serialization

Ratchet state is serialized for encrypted persistent storage. The caller MUST authenticated-encrypt the output before persisting (e.g., via §11 storage encryption) — the serialized form contains all secret key material.

Epoch increment on serialization: to_bytes increments the epoch counter before writing it to the blob and returns the new epoch as its second return value: (blob, new_epoch). The stored value is epoch + 1, not the pre-serialization epoch. from_bytes loads this value as-is — no increment on load. The idiomatic anti-rollback pattern is to persist both the blob and new_epoch - 1 as the min_epoch for subsequent loads — not new_epoch itself. Using new_epoch directly as min_epoch makes the current blob permanently unloadable: the guard new_epoch > new_epoch is false, so from_bytes_with_min_epoch(blob_N, new_epoch) always returns InvalidData, breaking crash recovery. The correct stored value is new_epoch - 1, ensuring new_epoch > new_epoch - 1. See Caller Obligation 2 for the full crash-safe commit order. A reimplementer who computes epoch + 1 manually instead of using the return value risks off-by-one errors. A reimplementer who increments on load instead of on save produces incompatible blobs and breaks anti-rollback (the stored epoch would be one behind, potentially equal to the last-seen epoch instead of strictly greater).
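The epoch arithmetic and the `new_epoch - 1` convention can be sketched with stand-ins reduced to the epoch field alone. The `to_bytes` and `from_bytes_with_min_epoch` functions here are hypothetical simplifications: they share names with the spec's functions but not their full signatures, and the real blob carries every field listed in the wire format.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    InvalidData,
}

/// Increment first, then write; returns (blob, new_epoch).
pub fn to_bytes(epoch: u64) -> Option<(Vec<u8>, u64)> {
    // epoch == u64::MAX has no successor; the real API reports ChainExhausted.
    let new_epoch = epoch.checked_add(1)?;
    Some((new_epoch.to_be_bytes().to_vec(), new_epoch))
}

/// The stored epoch is used as-is (no increment on load) and must be
/// strictly greater than `min_epoch`.
pub fn from_bytes_with_min_epoch(blob: &[u8], min_epoch: u64) -> Result<u64, LoadError> {
    if blob.len() != 8 {
        return Err(LoadError::InvalidData);
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(blob);
    let epoch = u64::from_be_bytes(bytes);
    if epoch > min_epoch {
        Ok(epoch)
    } else {
        Err(LoadError::InvalidData)
    }
}
```

Persisting `new_epoch - 1` alongside the blob keeps the current blob loadable while rejecting every older one. Persisting `new_epoch` itself would reject the current blob too.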

Ownership-consuming serialization: to_bytes consumes (invalidates) the in-memory ratchet state. After serialization, the original state is zeroized and no longer usable. This prevents ratchet forking: if two copies of the same state existed simultaneously, both could encrypt with the same (epoch_key, send_count) pair, causing catastrophic AEAD nonce reuse. In languages without move semantics (C, Go, Python), implementations MUST explicitly zeroize and disable the state after serialization; failing to do so enables nonce reuse and full plaintext recovery.

Exception for ChainExhausted: when to_bytes returns ChainExhausted (guard 24: epoch at u64::MAX; or a counter at u32::MAX, the guard 5 condition), the state is NOT consumed at the CAPI layer. The in-memory ratchet remains valid and can continue sending and receiving. See guard 24 for recovery semantics. A reimplementer who models to_bytes as always-consuming will destroy a recoverable session on counter exhaustion.

Mechanism, the can_serialize() predicate: in languages with consuming/move semantics (including Rust), can_serialize() MUST be called before to_bytes; it is not optional. Calling to_bytes(self) without a prior can_serialize() check moves the session into to_bytes. If to_bytes then returns ChainExhausted, the session is gone, because self was consumed. The "state not consumed on ChainExhausted" contract above therefore applies only to the CAPI layer, which runs the can_serialize() pre-check before taking ownership. At the Rust API level, the caller is responsible for calling can_serialize() first. The Rust core exposes RatchetState::can_serialize(&self) -> bool, which checks all six conditions on which to_bytes would fail: send_count == u32::MAX, recv_count == u32::MAX, prev_send_count == u32::MAX, epoch == u64::MAX, recv_seen.len() >= MAX_RECV_SEEN, or prev_recv_seen.len() >= MAX_RECV_SEEN. If can_serialize() returns true, to_bytes is guaranteed to succeed (no ChainExhausted). A reimplementer's can_serialize() must cover exactly these six conditions. The CAPI layer calls can_serialize() before taking ownership, which is why the CAPI to_bytes only visibly checks epoch: the other conditions are filtered by the pre-check.

recv_count reachability: Unlike send_count, which is guarded at the encrypt side (§6.5 ChainExhausted fires before send_count reaches u32::MAX), recv_count has no equivalent decrypt-side guard. Rejecting a valid message solely because it would push recv_count to u32::MAX would be incorrect. A message with header.n = u32::MAX - 1 is accepted, producing recv_count = u32::MAX. This is reachable only after ~4.3 billion received messages in a single epoch without a KEM ratchet step, which is implausible but structurally possible. Once recv_count == u32::MAX, can_serialize() returns false and the session is un-serializable until the next KEM ratchet step, which resets recv_count to 0 (§6.6). Recovery is through a direction change (the peer sends, triggering a KEM ratchet). If the session is one-directional with no peer replies, a new LO-KEX exchange is required.
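The check-then-consume pattern can be sketched as follows. The sketch assumes a hypothetical `RatchetCounters` struct that holds only the fields `can_serialize()` inspects. On failure, `persist` hands the still-live state back to the caller instead of moving it into the serializer. The stub `to_bytes` writes only the version byte and the incremented epoch.

```rust
use std::collections::BTreeSet;

pub const MAX_RECV_SEEN: usize = 65536;

/// Only the fields that can_serialize() inspects (hypothetical struct).
#[derive(Default)]
pub struct RatchetCounters {
    pub send_count: u32,
    pub recv_count: u32,
    pub prev_send_count: u32,
    pub epoch: u64,
    pub recv_seen: BTreeSet<u32>,
    pub prev_recv_seen: BTreeSet<u32>,
}

impl RatchetCounters {
    /// Exactly the six conditions listed above.
    pub fn can_serialize(&self) -> bool {
        self.send_count != u32::MAX
            && self.recv_count != u32::MAX
            && self.prev_send_count != u32::MAX
            && self.epoch != u64::MAX
            && self.recv_seen.len() < MAX_RECV_SEEN
            && self.prev_recv_seen.len() < MAX_RECV_SEEN
    }
}

/// Hypothetical consuming serializer: `state` is moved in and gone afterwards.
pub fn to_bytes(state: RatchetCounters) -> Vec<u8> {
    // A real encoder writes every field and zeroizes the state; this stub
    // writes the version byte and the incremented epoch only. The epoch
    // cannot overflow here because can_serialize() was checked first.
    let mut out = vec![0x01];
    out.extend_from_slice(&(state.epoch + 1).to_be_bytes());
    out
}

/// Check first, then consume; on failure the caller gets the live state back.
pub fn persist(state: RatchetCounters) -> Result<Vec<u8>, RatchetCounters> {
    if !state.can_serialize() {
        return Err(state);
    }
    Ok(to_bytes(state))
}
```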

Defense-in-depth conditions: can_serialize() also checks recv_seen.len() >= MAX_RECV_SEEN and prev_recv_seen.len() >= MAX_RECV_SEEN. The runtime cap in decrypt (§6.6) prevents these from firing in practice, but without the pre-check, a future refactor removing the runtime cap would cause to_bytes to fail with InvalidData rather than ChainExhausted, breaking the documented guarantee that can_serialize() == true implies to_bytes succeeds. Error type note: if the recv_seen size cap is somehow bypassed (future refactor, direct state manipulation), to_bytes returns InvalidData — not ChainExhausted — because the overflow is a structural violation, not a counter exhaustion. The can_serialize() predicate exists precisely to unify these different underlying error types into a single boolean: a reimplementer who checks only for ChainExhausted from to_bytes will miss the InvalidData from recv_seen overflow and treat it as a non-recoverable failure when it is actually recoverable via KEM ratchet step (same as counter exhaustion).

InvalidData from to_bytes consumes the session state — asymmetry with ChainExhausted: When to_bytes returns ChainExhausted, the CAPI layer's can_serialize() pre-check ensured the state was never consumed (handle not nulled, session still live). When to_bytes returns InvalidData (the recv_seen overflow path if can_serialize() is bypassed), the state IS consumed: at the Rust API level, self was moved into to_bytes and the session is gone; at the CAPI level, the handle is nulled on any path that takes ownership and then fails. A CAPI caller treating InvalidData from to_bytes as retryable — analogous to ChainExhausted — is holding a dangling handle. The asymmetry: ChainExhausted from to_bytes → state not consumed, retryable (wait for direction change); InvalidData from to_bytes → state consumed, session lost, must re-establish via LO-KEX.

Scope of can_serialize(): The predicate guarantees only that ChainExhausted will not be returned. It does not check liveness conditions like root_key != 0x00{32} (guard 25). to_bytes() succeeds on a dead/reset session — all counters are within bounds, so can_serialize() returns true and to_bytes() produces a blob. The failure is deferred: from_bytes() subsequently rejects the all-zero root_key via guard 25 (InvalidData). can_serialize() == true therefore guarantees to_bytes() success, but does NOT guarantee from_bytes() success on the resulting blob. In practice this is academic: encrypt() and decrypt() both reject dead sessions before any state mutation, so a dead session never accumulates state worth serializing. The full guarantee for a round-trip that survives both to_bytes() and from_bytes() is can_serialize() == true AND the session is alive (initialized via init_alice/init_bob and not subsequently reset()).

Serialization buffer zeroization: The serialization output buffer contains root keys, epoch keys, and ratchet secret keys — all secret material. Implementations SHOULD pre-allocate the buffer to its exact final size before writing any data. If a dynamic array (Vec, list, slice) reallocates during serialization, the abandoned allocation containing partial secret material is freed to the heap allocator without zeroization — only the final allocation is covered by zeroize-on-drop. In Rust, pre-allocation is the actual guarantee; a debug_assert on capacity detects underestimates during testing but is compiled out in release builds — if the pre-computed size is wrong, a release binary silently reallocates and the abandoned partial allocation is freed without zeroization. In C/Go/Python, pre-compute the buffer size and allocate once. The output buffer itself MUST be zeroized after the caller has finished with it (e.g., after passing it to storage encryption).
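Pre-computing the exact size is straightforward because every field has a fixed length once its presence is known. The sketch below derives the size from the version-0x01 field listing in this section. The function names are hypothetical, and the optional fields are assumed to pair as guards 2 and 13 require. The `debug_assert` mirrors the behavior described above: it catches underestimates in tests and is compiled out of release builds.

```rust
/// Optional length-prefixed field: marker, plus 2-byte length and data if present.
fn opt_len(present: bool, data_len: usize) -> usize {
    if present { 1 + 2 + data_len } else { 1 }
}

/// Exact size of a version-0x01 blob, from the wire-format field listing.
pub fn blob_len(
    has_send_keypair: bool, // send_ratchet_sk + send_ratchet_pk (co-present)
    has_recv_pk: bool,
    has_prev_epoch: bool, // prev_recv_epoch_key + prev_recv_ratchet_pk (co-present)
    n_recv_seen: usize,
    n_prev_recv_seen: usize,
) -> usize {
    // version, epoch, root/send/recv epoch keys, local/remote fingerprints
    let fixed = 1 + 8 + 5 * 32;
    let send = opt_len(has_send_keypair, 2432) + opt_len(has_send_keypair, 1216);
    let recv = opt_len(has_recv_pk, 1216);
    // prev_recv_epoch_key is 0x01 + 32 bytes with no length prefix.
    let prev = (if has_prev_epoch { 1 + 32 } else { 1 }) + opt_len(has_prev_epoch, 1216);
    let counters = 3 * 4 + 1; // send/recv/prev_send counts + ratchet_pending
    let seen = 4 + 4 * n_recv_seen + 4 + 4 * n_prev_recv_seen;
    fixed + send + recv + prev + counters + seen
}

/// Append parts into a buffer allocated once, up front.
pub fn write_all_exact(len: usize, parts: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(len);
    let cap = out.capacity();
    for part in parts {
        out.extend_from_slice(part);
    }
    // Debug-only check: a reallocation would have left a stale partial copy.
    debug_assert_eq!(out.capacity(), cap, "buffer reallocated: size estimate too small");
    out
}
```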

MAX_RECV_SEEN cap at runtime: When recv_seen or prev_recv_seen reaches MAX_RECV_SEEN (65536) entries during decrypt, subsequent messages in that epoch return ChainExhausted. This is transient — the cap resets on the next KEM ratchet step (which clears recv_seen). The cap prevents unbounded memory growth from an authenticated peer sending many messages in a single epoch.

Which epoch paths can fire this cap: The recv_seen saturation check (ChainExhausted) fires only on the current-epoch path (messages in the active receive epoch) and the previous-epoch path (messages in the prior epoch, checked against prev_recv_seen). The new-epoch path (which triggers a KEM ratchet step) is immune: it clears recv_seen to empty before the duplicate/cap check, so the cap check on that path is structurally unreachable — a new epoch always starts with an empty recv_seen. A reimplementer testing ChainExhausted from recv_seen saturation MUST use the current-epoch or previous-epoch path, not the new-epoch path.
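The runtime cap on the current-epoch and previous-epoch paths can be sketched with hypothetical names. Two details are assumptions rather than confirmed reference behavior. First, the duplicate check runs before the cap check. Second, the cap keeps each set strictly below MAX_RECV_SEEN, so that deserialization guard 14 and `can_serialize()` both continue to hold.

```rust
use std::collections::BTreeSet;

pub const MAX_RECV_SEEN: usize = 65536;

#[derive(Debug, PartialEq, Eq)]
pub enum RecvError {
    ChainExhausted,
    DuplicateMessage,
}

/// Current-epoch or previous-epoch path only: the new-epoch path clears
/// recv_seen first, so it can never hit this cap.
pub fn record_counter(seen: &mut BTreeSet<u32>, n: u32) -> Result<(), RecvError> {
    if n == u32::MAX {
        return Err(RecvError::ChainExhausted); // counter exhaustion
    }
    if seen.contains(&n) {
        return Err(RecvError::DuplicateMessage);
    }
    if seen.len() + 1 >= MAX_RECV_SEEN {
        return Err(RecvError::ChainExhausted); // transient: cleared by a KEM ratchet step
    }
    seen.insert(n);
    Ok(())
}
```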

prev_recv_seen recovery can require two KEM ratchet steps, not one. When recv_seen saturates, one KEM ratchet step clears it, because the new epoch starts with an empty recv_seen. When prev_recv_seen saturates (the previous-epoch path hits the cap), the first KEM ratchet step replaces it with the current recv_seen. That set may itself be full, so the saturation can persist. The second KEM ratchet step then rotates that set out entirely (§6.6): prev_recv_seen becomes the recv_seen of the epoch that was current between the two steps, which is empty or small if that epoch was short. A caller who expects one direction change to unblock every ChainExhausted error from decrypt() can therefore be wrong in the previous-epoch saturation case, where two direction changes (two KEM ratchet steps) may be required. Qualification: two steps are sufficient only if the second epoch accumulates fewer than MAX_RECV_SEEN messages before the second direction change. If the second epoch also saturates recv_seen, the next KEM step copies that full set into prev_recv_seen, and a third direction change is needed. In the degenerate case where every epoch saturates, each direction change clears recv_seen but may refill prev_recv_seen. The pattern converges only when an epoch is short enough to stay below the cap.
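The rotation behind this multi-step behavior can be sketched on the seen sets alone. The `SeenSets` type is hypothetical, and a small `cap` argument stands in for MAX_RECV_SEEN so that the effect is easy to test. Each receive-side KEM ratchet step replaces `prev_recv_seen` wholesale with the outgoing `recv_seen`.

```rust
use std::collections::BTreeSet;

/// Receive-side seen sets only (hypothetical type).
#[derive(Default)]
pub struct SeenSets {
    pub recv_seen: BTreeSet<u32>,
    pub prev_recv_seen: BTreeSet<u32>,
}

impl SeenSets {
    /// Receive-side KEM ratchet step: recv_seen replaces prev_recv_seen
    /// and the new epoch starts with an empty recv_seen.
    pub fn kem_ratchet_step(&mut self) {
        self.prev_recv_seen = std::mem::take(&mut self.recv_seen);
    }

    /// Is the previous-epoch path blocked by saturation at `cap` entries?
    pub fn prev_path_blocked(&self, cap: usize) -> bool {
        self.prev_recv_seen.len() >= cap
    }
}
```

With both sets full, one step leaves the previous-epoch path blocked because the full current set moves in. A second step, after a short epoch, unblocks it.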

Wire format (version 0x01):

[version: 1 byte = 0x01]
[epoch: u64 BE — anti-rollback monotonic counter]
[root_key: 32 bytes]
[send_epoch_key: 32 bytes]
[recv_epoch_key: 32 bytes]
[local_fp: 32 bytes]
[remote_fp: 32 bytes]
[send_ratchet_sk: optional field]
[send_ratchet_pk: optional field]
[recv_ratchet_pk: optional field]
[prev_recv_epoch_key: optional 32-byte field — EXCEPTION: encoded as 0x01 + 32 bytes (no 2-byte length prefix)]
[prev_recv_ratchet_pk: optional field]
[send_count: u32 BE]
[recv_count: u32 BE]
[prev_send_count: u32 BE]
[ratchet_pending: 1 byte, 0x00=false, 0x01=true]
[num_recv_seen: u32 BE]
[recv_seen entries × num_recv_seen: each u32 BE, sorted ascending]
[num_prev_recv_seen: u32 BE]
[prev_recv_seen entries × num_prev_recv_seen: each u32 BE, sorted ascending]
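The trailing section of the blob (three counters, `ratchet_pending`, and both seen sets) can be encoded as sketched below, using a hypothetical `encode_tail` helper. Storing the sets in a `BTreeSet` makes the required strictly ascending, duplicate-free order fall out of iteration order.

```rust
use std::collections::BTreeSet;

/// Append send_count, recv_count, prev_send_count (u32 BE), the
/// ratchet_pending byte, then each seen set as a u32 BE count followed by
/// its entries as u32 BE in ascending order.
pub fn encode_tail(
    out: &mut Vec<u8>,
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    ratchet_pending: bool,
    recv_seen: &BTreeSet<u32>,
    prev_recv_seen: &BTreeSet<u32>,
) {
    for counter in [send_count, recv_count, prev_send_count] {
        out.extend_from_slice(&counter.to_be_bytes());
    }
    out.push(if ratchet_pending { 0x01 } else { 0x00 });
    for set in [recv_seen, prev_recv_seen] {
        out.extend_from_slice(&(set.len() as u32).to_be_bytes());
        // BTreeSet iterates in ascending order, so the encoding is deterministic.
        for n in set {
            out.extend_from_slice(&n.to_be_bytes());
        }
    }
}
```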

Optional field encoding: Each optional field is prefixed with a marker byte:

0x00 — absent (1 byte total; no data follows)
0x01 — present (1-byte marker + 2-byte BE length + data bytes)

Decoders MUST treat any marker byte other than 0x00 or 0x01 as InvalidData. Do NOT treat arbitrary non-zero values as "present" — doing so creates format malleability (multiple byte values encoding the same logical state) and accepts blobs that no conforming encoder produces. This strictness applies equally to the ratchet_pending boolean (which uses the same 0x00/0x01 encoding).
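A strict decoder for the general optional-field rule might look like the sketch below, which uses a hypothetical `read_optional` helper. It also applies the zero-length rejection and the expected-size check from the field-size table later in this section, and it reports truncation as InvalidData.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

/// Decode one length-prefixed optional field at `*pos`, advancing `*pos`.
/// 0x00 → Ok(None); 0x01 + 2-byte BE length + data → Ok(Some(data)).
pub fn read_optional<'a>(
    buf: &'a [u8],
    pos: &mut usize,
    expected_len: usize,
) -> Result<Option<&'a [u8]>, InvalidData> {
    let marker = *buf.get(*pos).ok_or(InvalidData)?; // truncated → InvalidData
    *pos += 1;
    match marker {
        0x00 => Ok(None),
        0x01 => {
            let len_bytes = buf.get(*pos..*pos + 2).ok_or(InvalidData)?;
            let len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
            // Zero-length bodies and unexpected sizes are both rejected.
            if len == 0 || len != expected_len {
                return Err(InvalidData);
            }
            *pos += 2;
            let data = buf.get(*pos..*pos + len).ok_or(InvalidData)?;
            *pos += len;
            Ok(Some(data))
        }
        // Any other marker is malformed, never "present".
        _ => Err(InvalidData),
    }
}
```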

Exception: prev_recv_epoch_key uses fixed-size encoding (0x01 + 32 bytes, no 2-byte length prefix) since the size is always exactly 32 bytes. Implementers MUST NOT apply the general 0x01 + length + data rule to this field — doing so produces blobs that are not interoperable.

Worked byte sequence for the present case: If prev_recv_epoch_key = [0xAA × 32], the encoded field is: 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa (33 bytes — 1-byte marker followed directly by 32 key bytes). Compare with a general optional field: if send_ratchet_sk were present, its encoding would be 01 09 80 bb bb...bb (1-byte marker + 2-byte BE length 0x0980 = 2432 + 2432 data bytes). For prev_recv_epoch_key, the two length-prefix bytes are absent — the marker 0x01 is followed immediately by the 32 key bytes. A decoder that reads a 2-byte length prefix after the 0x01 marker would interpret key bytes 1-2 as a spurious length, then misalign all subsequent fields by 2 bytes, producing InvalidData with no diagnostic pointing to this specific field.
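The fixed-size exception for prev_recv_epoch_key is sketched below with hypothetical helper names. There is no length prefix to read: the marker is followed directly by 32 key bytes. A real decoder would also zeroize the stack copy of the key.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

/// 0x00 when absent; 0x01 followed directly by the 32 key bytes when present.
pub fn write_prev_recv_epoch_key(out: &mut Vec<u8>, key: Option<&[u8; 32]>) {
    match key {
        None => out.push(0x00),
        Some(k) => {
            out.push(0x01);
            out.extend_from_slice(k); // no 2-byte length prefix
        }
    }
}

pub fn read_prev_recv_epoch_key(buf: &[u8], pos: &mut usize) -> Result<Option<[u8; 32]>, InvalidData> {
    let marker = *buf.get(*pos).ok_or(InvalidData)?;
    *pos += 1;
    match marker {
        0x00 => Ok(None),
        0x01 => {
            let bytes = buf.get(*pos..*pos + 32).ok_or(InvalidData)?;
            let mut key = [0u8; 32];
            key.copy_from_slice(bytes);
            *pos += 32;
            Ok(Some(key))
        }
        _ => Err(InvalidData),
    }
}
```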

A present marker with a zero-length body is rejected as InvalidData.

Expected field sizes for length-prefixed optional fields: Decoders MUST reject blobs where the 2-byte length prefix does not equal the expected size:

Field	Expected size	Type
`send_ratchet_sk`	2432 bytes	Fully expanded X-Wing secret key (§8.5). The 2432-byte size is the expanded form (32 X25519 bytes + 2400 ML-KEM-768 expanded bytes) — NOT the 32-byte seed. Guard 2 rejects any blob where the length prefix is not exactly 2432. Storing the compact 32-byte seed and re-expanding at load time produces a different length and triggers guard 2 with `InvalidData`.
`send_ratchet_pk`	1216 bytes	X-Wing public key
`recv_ratchet_pk`	1216 bytes	X-Wing public key
`prev_recv_epoch_key`	32 bytes	Epoch key (fixed-size encoding, no length prefix — see exception above)
`prev_recv_ratchet_pk`	1216 bytes	X-Wing public key

Version history:

0x01-0x04: previous chain-ratchet designs (not supported).
0x05: counter-mode epoch keys, previous epoch key, recv_seen sets (current).

Forward compatibility: Implementations MUST reject any version byte other than 0x01 with UnsupportedVersion. Do NOT attempt to parse unknown versions using current-version rules — field layout changes between versions are not backwards-compatible.

Annotated byte-offset table: Appendix F contains a byte-offset-annotated layout for Alice's and Bob's initial states (field name, byte range, and size), useful for debugging serialization interoperability. Refer to Appendix F when implementing an encoder — the compact field listing above gives field order but not absolute offsets, which depend on the sizes of preceding optional fields and are easiest to verify against the Appendix F examples.

Deserialization validation (twenty-four active guards, numbered 1-25 with guard 4 removed in v5; implementations may decompose these into more code-level checks — e.g., guard 20 produces two checks (one per fingerprint), guard 19+20 together produce three checks). All 24 active guards apply exclusively to from_bytes / from_bytes_with_min_epoch. The to_bytes path enforces only the six can_serialize() conditions (guards 5 and 24 for counters/epoch, plus the two recv_seen size caps). Guards like 25 (all-zero root_key) are not checked on serialization — serializing a dead session produces a valid blob that fails on reload, which is acceptable for cleanup flows:

Version byte must be 0x01; other values → UnsupportedVersion.
send_ratchet_sk and send_ratchet_pk must be both present or both absent.
recv_count > 0 requires recv_ratchet_pk present.
(Removed in v5.) recv_count == 0 with recv_ratchet_pk present is a valid state. It occurs after a KEM ratchet step in decrypt (§6.6) sets recv_ratchet_pk to the new peer key and resets recv_count to 0, before any message in the new epoch is successfully decrypted. If the triggering message's AEAD fails and the state is rolled back, or if serialization occurs before the next successful decrypt, the blob has recv_count == 0 with recv_ratchet_pk present. Rejecting this combination makes rollback-then-serialize a permanent deserialization failure — sessions become un-deserializable whenever serialization occurs between a KEM ratchet step and the first successful AEAD in the new epoch, a common app-lifecycle event (e.g., the app is backgrounded or killed between receiving a new-epoch header and completing decryption). Reimplementers adding sanity checks MUST NOT treat this combination as invalid.
No counter may equal u32::MAX → InvalidData. For send_count, this is unreachable by construction — the encrypt-side ChainExhausted guard (§6.5) fires at u32::MAX - 1, preventing send_count from reaching u32::MAX. Invariant: send_count ∈ [0, u32::MAX − 1] in any reachable ratchet state (specifically [1, u32::MAX − 1] in Alice's initial epoch, [0, u32::MAX − 1] in post-ratchet epochs). Contrast: recv_count ∈ [0, u32::MAX] — there is no decrypt-side guard preventing u32::MAX. For prev_send_count, the same interlock applies: prev_send_count is set from send_count during a KEM ratchet step, and the encrypt guard prevents send_count from reaching u32::MAX. Invariant: prev_send_count < u32::MAX in any reachable ratchet state. A reimplementer who relaxes the encrypt-side guard (e.g., to send_count >= u32::MAX - 1) would allow prev_send_count to reach u32::MAX, causing this deserialization guard to fire and permanently breaking the session. For recv_count, unlike the send-side counters, u32::MAX is legitimately reachable — a peer who sends message n = u32::MAX − 1 causes recv_count = max(recv_count, n + 1) = u32::MAX (there is no decrypt-side guard analogous to the encrypt-side ChainExhausted). This guard makes the session un-serializable, but the session remains functional in memory. Recovery: the peer triggers a KEM ratchet step (direction change), which resets recv_count to 0 in the new epoch. The error returned SHOULD indicate "un-serializable, recoverable by direction change" rather than "corruption" — the can_serialize() predicate (which checks all three counters) will return false until the KEM ratchet step occurs.

Asymmetry between to_bytes and from_bytes for recv_count = u32::MAX: When recv_count == u32::MAX, can_serialize() returns false and to_bytes returns ChainExhausted — the state is not consumed, and the session remains functional in memory. When a blob with recv_count == u32::MAX is loaded from storage (an unusual scenario — such a blob cannot originate from a correctly-functioning implementation, since can_serialize() prevents to_bytes from ever writing such a blob; it could originate from a different implementation without the can_serialize() pre-check, storage corruption, or a compatibility scenario with a prior implementation version), from_bytes returns InvalidData (this guard). The error semantics differ: ChainExhausted from to_bytes is recoverable (peer triggers direction change); InvalidData from from_bytes is not (the session is permanently broken from the persistence layer's perspective). A caller handling the persistence layer MUST distinguish these two cases — the recovery action differs: for ChainExhausted from to_bytes, wait for the peer to send; for InvalidData from from_bytes, discard the session and re-establish via LO-KEX.
ratchet_pending requires recv_ratchet_pk present.
send_count > 0 requires send_ratchet_sk present.
send_count == 0 with send_ratchet_sk present → InvalidData. This state exists transiently inside encrypt() between PerformKEMRatchetSend setting send_count = 0 and the post-AEAD send_count += 1. The exclusive access model (§6.2) prevents serialization from capturing that window — a correctly-implemented ratchet never produces a blob with this combination.
send_count == 0 && !ratchet_pending && recv_ratchet_pk.is_some() && send_ratchet_sk.is_none() → InvalidData. This state means a peer ratchet key was received but ratchet_pending is false — unreachable because recv_ratchet_pk is only set during session init (where ratchet_pending = true for Bob) or during a KEM ratchet step in decrypt (which always sets ratchet_pending = true).
All-default state → InvalidData. Precise predicate (5 conditions): send_count == 0 && recv_count == 0 && !ratchet_pending && send_ratchet_sk.is_none() && recv_ratchet_pk.is_none(). The remaining fields (prev_send_count, send_ratchet_pk, prev_recv_epoch_key, prev_recv_ratchet_pk, recv_seen, prev_recv_seen) are not checked — they are redundant given other guards: send_ratchet_pk is constrained by guard 2 (co-presence with send_ratchet_sk); prev_recv_epoch_key and prev_recv_ratchet_pk are constrained by guard 13 (co-presence); prev_send_count == 0 is implied by send_count == 0 (prev_send_count is set from send_count at ratchet time, and no ratchet has fired if send_count == 0 and send_ratchet_sk is absent); recv_seen emptiness is implied by recv_count == 0 (guard 17 requires entries < recv_count); prev_recv_seen emptiness is implied by prev_recv_epoch_key.is_none() (guard 18). This guard catches blobs where root_key and fingerprints are non-zero but everything else is at initialization defaults — a state unreachable by construction (init_alice sets send_count = 1, init_bob sets recv_count = 1 and ratchet_pending = true).
Trailing bytes after complete parse → InvalidData.
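The presence and counter predicates in the list above can be expressed as one function over a reduced view of the state. This is a sketch under stated assumptions. `Shape` is a hypothetical struct that records only whether each optional key is present. The co-presence guard for send_ratchet_pk and the trailing-bytes check belong to the parser and are not modeled.

```rust
/// Hypothetical, reduced view of the fields these guards inspect.
pub struct Shape {
    pub send_count: u32,
    pub recv_count: u32,
    pub ratchet_pending: bool,
    pub has_send_ratchet_sk: bool,
    pub has_recv_ratchet_pk: bool,
}

pub fn structurally_valid(s: &Shape) -> bool {
    // ratchet_pending requires recv_ratchet_pk.
    if s.ratchet_pending && !s.has_recv_ratchet_pk {
        return false;
    }
    // send_count > 0 requires send_ratchet_sk.
    if s.send_count > 0 && !s.has_send_ratchet_sk {
        return false;
    }
    // send_count == 0 with send_ratchet_sk present exists only transiently inside encrypt().
    if s.send_count == 0 && s.has_send_ratchet_sk {
        return false;
    }
    // A peer ratchet key was received but no ratchet is pending: unreachable.
    if s.send_count == 0 && !s.ratchet_pending && s.has_recv_ratchet_pk && !s.has_send_ratchet_sk {
        return false;
    }
    // All-default state (the 5-condition predicate).
    if s.send_count == 0
        && s.recv_count == 0
        && !s.ratchet_pending
        && !s.has_send_ratchet_sk
        && !s.has_recv_ratchet_pk
    {
        return false;
    }
    true
}
```

Bob's state right after init_bob (recv_count = 1, ratchet_pending = true, recv_ratchet_pk present) and Alice's (send_count = 1, send_ratchet_sk present) both pass. The all-default and mid-encrypt shapes are rejected.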

Truncated input returns InvalidData, not InvalidLength: If the blob is shorter than expected (the buffer runs out before all required fields are read), the decoder returns InvalidData, not InvalidLength. InvalidLength would leak parser state. An attacker could probe inputs of increasing length and watch the error change from InvalidLength to InvalidData. That transition reveals the byte offset where parsing advanced past the size check and progressively exposes the internal blob layout. Returning InvalidData for truncation collapses this oracle. A reimplementer who returns InvalidLength for short blobs produces a distinguishable error type for the same condition.

12. epoch must be strictly greater than the last-seen epoch for the same session → InvalidData (anti-rollback; prevents storage-layer replay of older blobs that would cause AEAD nonce reuse).
13. prev_recv_epoch_key and prev_recv_ratchet_pk must be both present or both absent.
14. num_recv_seen and num_prev_recv_seen must be strictly less than MAX_RECV_SEEN (65536), consistent with the runtime cap in decrypt, which rejects at the boundary.
15. Each recv_seen entry and each prev_recv_seen entry must not equal u32::MAX, and both sets must be in strictly ascending order (which also rules out duplicates). Non-ascending order or a u32::MAX entry in either set → InvalidData. Rationale for the u32::MAX exclusion: the decrypt-side ChainExhausted guard (§6.6) fires before processing any message with header.n == u32::MAX, so such a message is never successfully decrypted. A counter of u32::MAX can therefore never legitimately appear in recv_seen or prev_recv_seen, and any blob that claims otherwise is malformed. Rationale for sorted order: deterministic serialization. Identical ratchet state must produce byte-for-byte identical blobs for persistent storage recovery and anti-rollback epoch comparison. Both recv_seen and prev_recv_seen must be sorted; sorting only one set breaks blob determinism. The sort is enforced at serialization time (to_bytes). A reimplementer who stores entries in insertion order and sorts lazily on decode violates the identical-blob invariant.
16. prev_recv_epoch_key all-zero → InvalidData (an all-zero key would yield deterministic message keys).
17. Each recv_seen entry must be < recv_count (high-water mark consistency). prev_recv_seen has no analogous prev_recv_count high-water mark, because a receive epoch's recv_count is not preserved when that epoch rotates into the previous slot. The size of prev_recv_seen is bounded by guard 14 (fewer than MAX_RECV_SEEN entries). Its entry values are constrained only by guard 15's u32::MAX exclusion, so any value in [0, u32::MAX − 1] is valid. This asymmetry is intentional and safe. prev_recv_seen is a bounded, append-only set that is discarded on the next KEM ratchet step, and no computation depends on a high-water mark relationship between its entries and any stored counter. A reviewer who notices the asymmetry should not add a prev_recv_count field to close it. Doing so would require persisting the previous epoch's recv_count, which adds wire format complexity for no security benefit.
18. Non-empty prev_recv_seen requires prev_recv_epoch_key present. The converse does not hold: prev_recv_epoch_key present with prev_recv_seen empty is valid. This occurs immediately after the first KEM ratchet step, before any previous-epoch messages arrive.
19. local_fp == remote_fp → InvalidData (self-sessions break AAD symmetry).
20. All-zero local_fp or remote_fp → InvalidData (indicates uninitialized fingerprints).
21. All-zero send_ratchet_sk[0..32] (the X25519 scalar portion, per the §8.1 layout) when present → InvalidData. Only the X25519 scalar (the first 32 bytes) is checked. An all-zero scalar produces an all-zero X25519 DH output during X-Wing decapsulation, which collapses the combiner and eliminates X25519's contribution to the shared secret. The ML-KEM-768 component (bytes 32..2432) has internal structure that the ml-kem crate validates during its own deserialization, so an all-zero ML-KEM key is rejected at the library boundary before this guard runs. Note: X25519 clamping (RFC 7748 §5: clear bits 0, 1, 2 and bit 255, set bit 254) makes an all-zero scalar impossible for honestly generated keys. This guard therefore catches only maliciously crafted or corrupted blobs.
22. recv_count > 0 OR recv_ratchet_pk present, with all-zero recv_epoch_key → InvalidData. This is a liveness check: a post-initial-state session must have a real epoch key, because otherwise HMAC-based message key derivation produces publicly computable keys.
23. send_count > 0 && !ratchet_pending with all-zero send_epoch_key → InvalidData (liveness: messages were sent with a non-functional epoch key). The !ratchet_pending conjunction is defense-in-depth. The state (ratchet_pending = true, send_count > 0, all-zero send_epoch_key) is unreachable by construction, since any send_count > 0 implies a prior KEM ratchet step that produced a non-zero send_epoch_key. The conjunction keeps the guard from rejecting that theoretically possible but implementation-unreachable serialized state.
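Guards 14, 15, 17, 18, 19 and 20 reduce to a few slice checks. A minimal sketch with hypothetical function names: `seen_set_valid` takes `Some(recv_count)` for recv_seen and `None` for prev_recv_seen, reflecting the missing previous-epoch high-water mark described in guard 17.

```rust
/// Exclusive upper bound on each seen-set's size (guard 14).
pub const MAX_RECV_SEEN: usize = 65_536;

/// Guards 14, 15 and 17 for one seen-set.
pub fn seen_set_valid(entries: &[u32], high_water: Option<u32>) -> bool {
    if entries.len() >= MAX_RECV_SEEN {
        return false; // guard 14
    }
    // Guard 15: no u32::MAX entry, and strictly ascending (which also forbids duplicates).
    if entries.iter().any(|&e| e == u32::MAX) || entries.windows(2).any(|w| w[0] >= w[1]) {
        return false;
    }
    // Guard 17: every recv_seen entry lies below recv_count; prev_recv_seen has no such bound.
    match high_water {
        Some(hw) => entries.iter().all(|&e| e < hw),
        None => true,
    }
}

/// Guard 18: non-empty prev_recv_seen needs prev_recv_epoch_key (the converse is not required).
pub fn prev_seen_consistent(prev_recv_seen: &[u32], has_prev_recv_epoch_key: bool) -> bool {
    prev_recv_seen.is_empty() || has_prev_recv_epoch_key
}

/// Guards 19 and 20: distinct, non-zero fingerprints.
pub fn fingerprints_valid(local_fp: &[u8; 32], remote_fp: &[u8; 32]) -> bool {
    local_fp != remote_fp
        && local_fp.iter().any(|&b| b != 0)
        && remote_fp.iter().any(|&b| b != 0)
}
```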

**Bob's initial all-zero `send_epoch_key` is valid**: Bob's initial state has `send_count = 0`, `ratchet_pending = true`, and `send_epoch_key = 0x00{32}` (placeholder). Guards 22 and 23 both skip this state by their conjunctions: guard 22 requires `recv_count > 0` OR `recv_ratchet_pk` present (Bob has `recv_count = 1` and `recv_ratchet_pk` set, but his all-zero key is `send_epoch_key`, not `recv_epoch_key`); guard 23 requires `send_count > 0` (Bob has `send_count = 0`). The all-zero `send_epoch_key` is safe because `ratchet_pending = true` guarantees it will be replaced by `KDF_Root` output during the first `encrypt()` call's KEM ratchet step — it is never used for key derivation. A reimplementer adding their own all-zero-key sanity check must exclude this initial state or Bob's first serialization will be rejected.
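Guards 22 and 23 can be sketched as two predicates, which makes it easy to confirm that Bob's initial state passes both. The function names are hypothetical. The zero test here is a plain fold, which is adequate for illustration but is not presented as the library's constant-time comparison.

```rust
/// True when all 32 bytes are zero (illustrative helper).
fn all_zero(key: &[u8; 32]) -> bool {
    key.iter().fold(0u8, |acc, &b| acc | b) == 0
}

/// Guard 22: once anything was received (or a peer ratchet key is known),
/// recv_epoch_key must be real.
pub fn guard_22_ok(recv_count: u32, has_recv_ratchet_pk: bool, recv_epoch_key: &[u8; 32]) -> bool {
    !((recv_count > 0 || has_recv_ratchet_pk) && all_zero(recv_epoch_key))
}

/// Guard 23: a sender that is not awaiting a ratchet step must hold a real send_epoch_key.
pub fn guard_23_ok(send_count: u32, ratchet_pending: bool, send_epoch_key: &[u8; 32]) -> bool {
    !(send_count > 0 && !ratchet_pending && all_zero(send_epoch_key))
}
```

For Bob after init_bob, guard 22 inspects a real recv_epoch_key. Guard 23 is skipped because send_count = 0, so the zero placeholder in send_epoch_key is never flagged.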

24. epoch at u64::MAX → ChainExhausted (the next serialization would overflow). Unlike ChainExhausted from encrypt() (§6.5), this is not session-fatal. The in-memory ratchet state is NOT consumed; it remains valid and can continue sending and receiving. However, the session is permanently un-serializable. epoch is incremented only by to_bytes() and reset only by reset(); no KEM ratchet step or other operation resets it. The only recovery is reset() followed by a new LO-KEX exchange. (In practice, u64::MAX serializations will never occur; the guard is included for completeness.) This guard fires on both to_bytes and from_bytes: from_bytes also rejects a stored epoch field equal to u64::MAX with ChainExhausted. Loading such a blob would produce a session that can send and receive in memory but returns ChainExhausted on its very next to_bytes call. The session would be permanently un-persistable before it processed a single message. Rejecting at deserialization prevents this "zombie session" state. A reimplementer who checks guard 24 only on the serialization path creates a session that appears to load successfully but can never be saved.
25. All-zero root_key → InvalidData (the session was zeroized/reset, which is not a valid deserializable state). Constant-time comparison via ct_eq.
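Guards 24 and 25 as they apply on the load path might look like the sketch below. The text above names ct_eq, which presumably comes from a constant-time comparison crate. The std-only OR-fold here is an assumption that visits every byte regardless of content, but unlike a dedicated crate it carries no formal guarantee against optimizer short-circuiting. The guard order is also an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidData,
    ChainExhausted,
}

/// Data-independent all-zero test: every byte is OR-ed in before the single comparison.
pub fn is_zero_ct(bytes: &[u8]) -> bool {
    let mut acc = 0u8;
    for &b in bytes {
        acc |= b;
    }
    std::hint::black_box(acc) == 0
}

/// Guards 24 and 25 on a freshly parsed blob.
pub fn check_epoch_and_root(epoch: u64, root_key: &[u8; 32]) -> Result<(), Error> {
    if epoch == u64::MAX {
        return Err(Error::ChainExhausted); // guard 24: would never be re-serializable
    }
    if is_zero_ct(root_key) {
        return Err(Error::InvalidData); // guard 25: reset/zeroized state
    }
    Ok(())
}
```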

Anti-rollback deserialization: The primary deserialization entry point is from_bytes_with_min_epoch(blob, min_epoch), which enforces guard 12: the blob's epoch field must be strictly greater than min_epoch (i.e., epoch > min_epoch, not >=). This prevents storage-layer replay attacks where an adversary substitutes an older serialized blob to rewind the ratchet to a prior state — which would re-derive the same epoch keys and restart counters from their prior values, causing catastrophic AEAD nonce reuse.

The epoch counter is a u64 incremented by to_bytes() each time the state is serialized. It is a persistence-layer counter independent of cryptographic epochs (KEM ratchet steps) — multiple serializations may occur within a single cryptographic epoch. Consequently, multiple valid blobs with different epoch values can encode identical cryptographic state (same keys, counters, and recv_seen sets). The anti-rollback mechanism prevents loading a blob with an older persistence epoch, but does not prevent two blobs encoding the same cryptographic state from coexisting on disk. The protection is "no blob older than the last loaded" — not "no blob with a prior cryptographic state."

Caller obligations:

Store min_epoch with independent integrity. The caller must persist the last-seen epoch value in a location whose integrity is independent of the ratchet blob itself. If an adversary who can substitute the ratchet blob can also substitute min_epoch, the anti-rollback mechanism is defeated. Suitable approaches: a separate authenticated store, a monotonic counter in secure hardware, or a key-value store where each entry is independently authenticated.
Atomic commit of blob + min_epoch. The min_epoch update and the new blob must be committed atomically (or in the correct order) to survive crashes. The safe pattern: after to_bytes() returns (blob_N, epoch_N), persist blob_N first, then update min_epoch to epoch_N - 1. This ensures the current blob is always reloadable: epoch_N > epoch_N - 1 holds. Updating min_epoch to epoch_N (the blob's own epoch) before persisting the next blob is dangerous: if the application crashes before the next to_bytes(), blob_N is no longer loadable (epoch_N > epoch_N is false) and the session is permanently lost. The invariant is: stored_min_epoch < epoch_of_current_blob, so the current blob always passes guard 12.
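First-session bootstrap. For a newly established session (first deserialization ever), the caller should pass min_epoch = 0. The initial to_bytes() call sets epoch = 1, which satisfies 1 > 0. Subsequent deserializations pass the stored min_epoch maintained by the commit pattern above, which is one less than the epoch of the most recently persisted blob. They do not pass that blob's own epoch, which would make the current blob unloadable after a crash.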
Per-session tracking. Each ratchet session requires its own min_epoch value — sessions are independent and their epoch counters are unrelated. The caller must map session identifiers to their respective min_epoch values. The stable session identity is the (local_fp, remote_fp) pair — the two 32-byte fingerprints supplied to init_alice/init_bob. These are invariant across serialization/deserialization cycles and survive restarts; application-layer connection IDs, database row IDs, or in-process object pointers do not survive restarts and MUST NOT be used as per-session min_epoch keys. A global min_epoch shared across sessions enables cross-session substitution: if session A has epoch = 50 and session B has epoch = 30, an attacker who can write to the storage layer can replace session B's blob with session A's blob — the global min_epoch (set to 29 from B's last load) accepts A's epoch 50, and the application now has session A's ratchet state in session B's slot, causing messages to B's peer to be encrypted under A's keys (undecryptable by B's peer, but revealing A's ratchet state to B's storage layer).
Handle ChainExhausted from to_bytes(). This obligation covers the case where to_bytes() returns ChainExhausted because the epoch counter is exhausted. It does not cover the recoverable recv_count == u32::MAX case, which clears after the peer's next KEM ratchet step. In the exhausted-epoch case, the session is permanently un-serializable. The caller must treat this as session-fatal for persistence purposes: discard the session and establish a new one via LO-KEX (§5). The in-memory ratchet remains functional for sending and receiving, but any crash or restart without a persisted blob loses the session state. Callers SHOULD check can_serialize() before long-running operations and proactively re-establish the session rather than waiting for to_bytes() failure. In practice, u64::MAX serializations will never occur.
to_bytes is ownership-consuming — serialize before invalidating. In Rust, to_bytes(self) enforces this at the type level (the ratchet is moved into the function). In languages without ownership semantics (C, Go, Python), the implementation must complete serialization of the entire blob before invalidating the handle. An implementation that nulls the handle pointer (or frees the backing memory) before finishing serialization leaves the caller with neither the original state nor a complete blob — a total session loss on crash. The safe pattern: serialize all fields into the output buffer, then zeroize and free the internal state, then return the blob. The CAPI (soliton_ratchet_to_bytes) follows this pattern: the output buffer is fully written before the handle is freed.
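The min_epoch obligations above (per-session keying, bootstrap at 0, strict comparison, and committing epoch_N - 1 only after blob_N is durable) can be sketched as a small store. `MinEpochStore` is hypothetical and backed by an in-memory `HashMap` purely for illustration. A real deployment needs an independently authenticated, durable store, and the blob write itself is left to the caller.

```rust
use std::collections::HashMap;

/// Stable per-session key: the (local_fp, remote_fp) pair, which survives restarts.
pub type SessionId = ([u8; 32], [u8; 32]);

/// Hypothetical stand-in for an independently authenticated min_epoch store.
#[derive(Default)]
pub struct MinEpochStore {
    min_epoch: HashMap<SessionId, u64>,
}

impl MinEpochStore {
    /// Guard 12 as from_bytes_with_min_epoch applies it: strictly greater, not >=.
    /// Unknown sessions bootstrap with min_epoch = 0.
    pub fn accepts(&self, id: &SessionId, blob_epoch: u64) -> bool {
        let min = self.min_epoch.get(id).copied().unwrap_or(0);
        blob_epoch > min
    }

    /// Call only AFTER blob_N has been durably persisted. Storing epoch_N - 1 keeps
    /// the current blob loadable if the process crashes before the next to_bytes().
    /// The stored value never moves backwards.
    pub fn commit(&mut self, id: SessionId, blob_epoch: u64) {
        let floor = blob_epoch.saturating_sub(1);
        let entry = self.min_epoch.entry(id).or_insert(0);
        if floor > *entry {
            *entry = floor;
        }
    }
}
```

Usage: persist the blob returned by to_bytes(), then call `commit(id, epoch)`, and pass the stored value as min_epoch on the next load.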

from_bytes / from_bytes_with_min_epoch error table (all callers MUST handle all three error types):

Error	Condition	Recovery
`UnsupportedVersion`	Version byte ≠ 0x01 (guard 1)	Reject blob; re-establish via LO-KEX. No migration path — old-version blobs are permanently unreadable. For version bytes > 0x01 (future versions), upgrading to an implementation that supports that version may enable recovery.
`InvalidData`	Any guard 2-23 or 25 fires (guard 12 is the epoch-rollback check); or input too large	Blob is structurally invalid, corrupt, or rolled back; re-establish via LO-KEX
`ChainExhausted`	`epoch == u64::MAX` (guard 24)	Stored epoch at u64::MAX cannot be re-serialized (`to_bytes` overflows on `epoch + 1`). States with stored epoch u64::MAX - 1 are accepted but non-serializable (`can_serialize()` returns false, `to_bytes` returns `ChainExhausted`); re-establish via LO-KEX

Callers matching only InvalidData | UnsupportedVersion will misclassify ChainExhausted: guard 24 fires for epoch == u64::MAX, returning ChainExhausted from from_bytes, not InvalidData. A caller who pattern-matches only the first two variants will treat ChainExhausted as an unhandled/unexpected error, potentially panicking or applying the wrong recovery action. ChainExhausted from from_bytes requires the same recovery as InvalidData (re-establish via LO-KEX), but the caller must handle it explicitly to avoid the default/unmatched case.
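One way to make the three-variant obligation hard to miss is an exhaustive `match` with no wildcard arm. This is a sketch with hypothetical `LoadError` and `Recovery` enums mirroring the table. Within a single crate, adding a variant then becomes a compile error instead of silently falling into a default branch.

```rust
/// Mirrors the from_bytes_with_min_epoch error table (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum LoadError {
    UnsupportedVersion,
    InvalidData,
    ChainExhausted,
}

#[derive(Debug, PartialEq)]
pub enum Recovery {
    /// Discard the session and run a fresh LO-KEX.
    Reestablish,
    /// A newer implementation might read the blob; otherwise re-establish.
    ReestablishOrUpgrade,
}

/// No `_` arm: every variant, including ChainExhausted, is handled explicitly.
pub fn recovery_for(err: &LoadError, version_byte: u8) -> Recovery {
    match err {
        LoadError::UnsupportedVersion if version_byte > 0x01 => Recovery::ReestablishOrUpgrade,
        LoadError::UnsupportedVersion => Recovery::Reestablish,
        LoadError::InvalidData => Recovery::Reestablish,
        LoadError::ChainExhausted => Recovery::Reestablish,
    }
}
```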

Blob size bound: The wire format bounds the maximum valid ratchet blob. Guard 14 rejects num_seen >= MAX_RECV_SEEN (65536), so each set accepts at most 65,535 entries. With both recv_seen sets full, the blob reaches approximately 530 KB. The dominant term is two recv_seen sets × 65,535 entries × 4 bytes per u32 = 524,280 bytes. The remaining fields add about 6 KB: three X-Wing public keys at 1,216 bytes each (send_ratchet_pk, recv_ratchet_pk, prev_recv_ratchet_pk), the X-Wing secret key at 2,432 bytes, the root key and epoch keys at 32 bytes each, fingerprints, and header fields. CAPI implementations apply a 1 MiB cap on from_bytes input as defense-in-depth against oversized inputs. This is tighter than the general 256 MiB CAPI cap because ratchet blobs have a known bounded size. Reimplementers building their own deserialization entry point should apply a similar cap.

Minimum valid blob size is 195 bytes. The fixed mandatory fields total 182 bytes: version 1 B + epoch 8 B + root_key 32 B + send_epoch_key 32 B + recv_epoch_key 32 B + local_fp 32 B + remote_fp 32 B + send_count 4 B + recv_count 4 B + prev_send_count 4 B + ratchet_pending 1 B. Add five 0x00 absence markers for the optional fields (5 bytes) and both recv_seen counts at zero (2 × 4 = 8 bytes): 182 + 5 + 8 = 195 bytes. Any blob shorter than 195 bytes cannot represent a valid ratchet state and MUST be rejected with InvalidData. The reference implementation rejects such blobs during parsing rather than through an upfront length guard: the field reader exhausts the buffer and returns InvalidData mid-parse. Reimplementers who want a fast-reject pre-check SHOULD add an explicit if blob.len() < 195 { return Err(InvalidData) } guard before field-by-field parsing; the reference implementation relies on parser exhaustion for equivalent behavior.

The 195-byte floor is a format floor, not a state floor: no valid ratchet session ever produces a blob this small in practice. Alice's initial state (after ratchet_init_alice) includes send_ratchet_sk (2,432 bytes) and send_ratchet_pk (1,216 bytes), making her minimum approximately 3,849 bytes. Bob's initial state (after ratchet_init_bob) includes recv_ratchet_pk (1,216 bytes). The peer's ephemeral key (peer_ek) has already been absorbed and is not stored, so his minimum is exactly 1,413 bytes (see F.21 for the full field-by-field breakdown). The 195-byte check is a fast-reject sanity guard; implementations should not size parsing buffers based on it.
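The size arithmetic above can be checked with compile-time constants. The constant names and `length_plausible` are hypothetical. The upper bound uses the 1 MiB CAPI cap mentioned above, and treating exactly 1 MiB as acceptable is an assumption.

```rust
/// Fixed mandatory fields, in bytes, in the order listed above.
pub const FIXED: usize = 1 // version
    + 8                    // epoch
    + 32 * 5               // root_key, send/recv epoch keys, local_fp, remote_fp
    + 4 * 3                // send_count, recv_count, prev_send_count
    + 1;                   // ratchet_pending
pub const OPTIONAL_MARKERS: usize = 5; // one 0x00 absence byte per optional field
pub const SEEN_COUNTS: usize = 2 * 4;  // num_recv_seen, num_prev_recv_seen
pub const MIN_BLOB: usize = FIXED + OPTIONAL_MARKERS + SEEN_COUNTS;

/// Largest accepted count per seen-set (guard 14 rejects 65,536).
pub const MAX_SEEN_ENTRIES: usize = 65_535;
pub const SEEN_PAYLOAD_MAX: usize = 2 * MAX_SEEN_ENTRIES * 4;

/// Defense-in-depth input cap (assumed inclusive).
pub const MAX_INPUT: usize = 1 << 20;

/// Optional fast-reject before field-by-field parsing.
pub fn length_plausible(blob: &[u8]) -> bool {
    blob.len() >= MIN_BLOB && blob.len() <= MAX_INPUT
}
```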

The convenience function from_bytes(blob) (without min_epoch) exists for use cases where anti-rollback is managed externally or is inapplicable (e.g., in-memory round-trip during migration). It is equivalent to from_bytes_with_min_epoch(blob, 0) and provides no rollback protection. Implementations that persist ratchet state MUST use from_bytes_with_min_epoch. from_bytes is deprecated at the Rust API level (using it produces a compiler warning). Binding authors SHOULD NOT expose it as a public API — expose only from_bytes_with_min_epoch and let callers pass min_epoch = 0 explicitly when they want no rollback protection.

Anti-rollback failure recovery: When from_bytes_with_min_epoch rejects a blob due to epoch rollback (guard 12), the session is permanently broken — the persisted state has been rewound to a prior epoch, which would cause AEAD nonce reuse if accepted. The application MUST discard the session and initiate a new LO-KEX exchange (§5). Reimplementers MUST NOT retry with an older blob, silently fall back to stale in-memory state, or attempt to "repair" the epoch counter. The only safe recovery path is full session re-establishment.

InvalidData ambiguity from from_bytes_with_min_epoch: Both epoch-rollback rejection (guard 12) and structural blob corruption (guards 2-11, 13-23, 25) return InvalidData. A caller who needs to distinguish "blob is stale, need new KEX" from "blob is corrupted, check backups" cannot do so from the error alone. The recovery action is identical in both cases — discard the session and establish a new one via LO-KEX — so the ambiguity has no practical consequence.

Diagnostic pattern for higher-level APIs: Appendix E and §13.5 note that from_bytes (the no-min-epoch variant) is deprecated and SHOULD NOT be exposed as a public API in higher-level bindings. A reimplementer who needs to distinguish epoch rollback from structural corruption without exposing from_bytes should implement inspect_version(blob: &[u8]) -> Result<u8, Error>. This function reads only the first byte of the blob and returns the version without parsing or validating any other field. It is not a full deserialization and carries no mutation risk. If inspect_version returns 0x01, the blob is current-version, so the failure was not a version mismatch. Epoch rollback can then be confirmed by comparing blob[1..9] (the epoch, a big-endian u64) against min_epoch: a value ≤ min_epoch confirms a rollback. This diagnostic should be implemented at the application layer using the raw blob bytes, not by calling the library's deprecated from_bytes. from_bytes is safe to call for purely diagnostic purposes because it is read-only and never mutates external state. Exposing it publicly, however, encourages callers to use it as a primary deserialization path, which bypasses anti-rollback protection.
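The application-layer diagnostic could look like this sketch. Assumptions: `inspect_version` returns `Option<u8>` here instead of the `Result<u8, Error>` named above, to keep the sketch self-contained. `Diagnosis` and `diagnose` are hypothetical names. The epoch is read as a big-endian u64 at bytes 1..9, per the text.

```rust
#[derive(Debug, PartialEq)]
pub enum Diagnosis {
    Empty,
    OtherVersion(u8),
    TooShortForEpoch,
    /// Current version and stored epoch <= min_epoch: guard 12 (rollback).
    Rollback { epoch: u64 },
    /// Current version and the epoch passes guard 12: a structural guard fired.
    Structural,
}

/// Reads only byte 0 (the version); nothing else is parsed or validated.
pub fn inspect_version(blob: &[u8]) -> Option<u8> {
    blob.first().copied()
}

/// Classifies an InvalidData returned by from_bytes_with_min_epoch using raw bytes only.
pub fn diagnose(blob: &[u8], min_epoch: u64) -> Diagnosis {
    match inspect_version(blob) {
        None => Diagnosis::Empty,
        Some(0x01) => match blob.get(1..9) {
            Some(raw) => {
                let epoch = u64::from_be_bytes(raw.try_into().expect("range is 8 bytes"));
                if epoch <= min_epoch {
                    Diagnosis::Rollback { epoch }
                } else {
                    Diagnosis::Structural
                }
            }
            None => Diagnosis::TooShortForEpoch,
        },
        Some(v) => Diagnosis::OtherVersion(v),
    }
}
```

Either way the recovery is the same (new LO-KEX). The diagnosis only informs logging or backup decisions.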

6.9 Implementation Notes

Requirements for implementers:

Test vectors: Ship comprehensive vectors covering: in-order, out-of-order, KEM ratchet step, previous-epoch message, duplicate detection.
State serialization: Round-trip serialize/deserialize ratchet state and verify continued operation.
Fuzzing: Fuzz the decryption path with random headers. Verify no panics, no state corruption.
Session reset as escape hatch: On unrecoverable decryption failure, fall back to session reset (§6.10). Lost messages are unavoidable but preferable to permanent communication failure.
Defensive validation: Before each operation, check invariants (counters non-negative, expected keys present, epoch keys non-zero). Violation → trigger session reset.

6.10 Session Reset

reset(state): Zeroizes all key material (root_key, send_epoch_key, recv_epoch_key), drops all optional keys (send_ratchet_sk/pk, recv_ratchet_pk, prev_recv_epoch_key, prev_recv_ratchet_pk), resets all counters to 0, clears recv_seen and prev_recv_seen, sets ratchet_pending = false, and resets epoch to 0. The all-zero root_key serves as the liveness sentinel — subsequent encrypt/decrypt calls detect the dead session via constant-time comparison (§6.5, §6.6). The fingerprints (local_fp, remote_fp) are also zeroized to prevent information leakage from the dead state. Caller obligation: after reset(), the ratchet handle no longer identifies which peer this session belonged to — both fingerprints are zero. Applications that need to associate the handle with a peer identity after a reset (e.g., to display the peer name or verify a new LO-KEX exchange) MUST store the fingerprints independently (in application state) before calling reset(). The library cannot preserve them across reset.

Destructor zeroization obligation: Implementations MUST zeroize all key material when the ratchet state object is deallocated (destructor, finalizer, or equivalent), not only on explicit reset() calls. In Rust, this is achieved by implementing Drop to call reset() — any abandoned state (error path, lost reference, scope exit) is automatically zeroized. Non-Rust implementations must arrange equivalent behavior: a C implementation must zeroize in the free function, a Go implementation in a finalizer, a Python implementation in __del__ (with a note that CPython finalizer timing is non-deterministic — consider explicit close() as the primary path). Without destructor zeroization, every error path that drops a session object leaks key material to the process heap.

reset() followed by to_bytes() produces a blob that from_bytes_with_min_epoch rejects for any min_epoch ≥ 1. reset() sets epoch = 0, and to_bytes increments the epoch counter before writing it (§6.8), so a reset state serializes with epoch = 1 in the blob. from_bytes_with_min_epoch(blob, min_epoch) requires blob.epoch > min_epoch (§6.8 guard 12). Suppose the application has previously persisted a blob from an active session with min_epoch = N, where N ≥ 1. Loading the reset-state blob with that same min_epoch fails, because 1 > N is false for any N ≥ 1. This is correct behavior: a reset session is cryptographically equivalent to a new session, not a continuation of the old one, and the old min_epoch correctly rejects it. Independently of min_epoch, guard 25 (all-zero root_key) rejects any blob serialized from a reset state, so such a blob is never loadable at all. Application implication: after calling reset(), the application MUST treat the session as a new session to which no persisted min_epoch applies. The correct recovery path after reset() is to discard the old min_epoch store and establish a new LO-KEX session. It is not to serialize and reload the reset state under the old min_epoch.

reset() MUST NOT acquire locks or reentrancy guards: reset() is called from the Drop implementation (destructor), which executes whenever the ratchet state is deallocated — including on error paths that may occur while holding internal session locks. If reset() itself attempts to acquire a lock or reentrancy guard that is already held by the calling context, it will deadlock. reset() must be callable from any context, including from within a failed encrypt() or decrypt() call. The implementation must ensure that the reset() code path is a simple zeroization sequence with no blocking operations, no mutex acquisitions, and no calls to any function that could itself block or fail. In Rust, the Drop implementation that calls reset() satisfies this by design — Rust's ownership model prevents calling reset() on a borrowed-while-mutating object. In C/Go/Python, where the equivalent "destructor" is called explicitly or via finalizer, implementors must audit the reset() call path for lock acquisitions.
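A minimal sketch of a lock-free reset() wired into `Drop`, assuming a heavily reduced, hypothetical `RatchetState` with only a few of the fields listed above. Zeroization uses `std::ptr::write_volatile` followed by a compiler fence as a std-only stand-in for the zeroize crate. The sequence takes no locks, performs no allocation, and has no fallible steps, so it is safe to run from any drop path.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite with zeros via volatile writes so the stores are not elided as dead.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference to a single byte.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical, reduced ratchet state used only for this sketch.
pub struct RatchetState {
    pub root_key: [u8; 32],
    pub send_epoch_key: [u8; 32],
    pub recv_epoch_key: [u8; 32],
    pub local_fp: [u8; 32],
    pub remote_fp: [u8; 32],
    pub send_ratchet_sk: Option<Vec<u8>>,
    pub send_count: u32,
    pub recv_count: u32,
    pub ratchet_pending: bool,
    pub epoch: u64,
}

impl RatchetState {
    /// Plain zeroization sequence: no locks, nothing that can block or fail.
    pub fn reset(&mut self) {
        wipe(&mut self.root_key); // all-zero root_key doubles as the dead-session sentinel
        wipe(&mut self.send_epoch_key);
        wipe(&mut self.recv_epoch_key);
        wipe(&mut self.local_fp);
        wipe(&mut self.remote_fp);
        if let Some(mut sk) = self.send_ratchet_sk.take() {
            wipe(&mut sk); // zeroize before the Vec's memory is freed
        }
        self.send_count = 0;
        self.recv_count = 0;
        self.ratchet_pending = false;
        self.epoch = 0;
    }

    pub fn is_dead(&self) -> bool {
        self.root_key.iter().fold(0u8, |a, &b| a | b) == 0
    }
}

impl Drop for RatchetState {
    fn drop(&mut self) {
        self.reset(); // every drop path (error, scope exit) zeroizes
    }
}
```

Applications that still need the peer identity must copy local_fp and remote_fp out before calling reset(), since the sketch wipes them too.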

When ratchet state is unrecoverable (Protocol Spec §12.13):

Both parties discard all ratchet state for the peer.
Resetting party fetches fresh pre-keys → performs new LO-KEX.
New session is cryptographically independent (new EK, new shared secrets, new state).
Messages encrypted under the old session that were not yet delivered become permanently undecryptable. This is unavoidable.
Verification phrase (§9) is unchanged (depends only on IKs) — confirms identity continuity.

6.11 Bandwidth

Per same-direction message	Per ratchet step
~1216 B (ratchet_pk always in header)	~2336 B (ratchet_pk + kem_ct; raw KEM fields only; full header = 2,347 B per Appendix C)

The full ratchet public key is included in every message header so the recipient always knows the sender's current ratchet key. This provides an implicit consistency check.

Exact encoded sizes: The approximate values above reflect only the dominant fields. The complete encode_ratchet_header output (§7.4) also includes the has_kem_ct flag (1 byte), the counter n (4 bytes), the previous-epoch count pn (4 bytes), and, when a KEM ciphertext is present, a 2-byte field preceding it. Exact values are 1,225 bytes without KEM ciphertext (1216 + 1 + 4 + 4) and 2,347 bytes with KEM ciphertext (1216 + 1 + 2 + 1120 + 4 + 4); see Appendix C (encode_ratchet_header). The table marks its values with "~" because it counts only the dominant network cost relative to payload size. The actual wire overhead is given by the Appendix C values.
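The exact header sizes follow directly from the field widths. In this sketch the constant names are hypothetical. The 2-byte field before the ciphertext is presumably a length prefix, but that reading is an assumption; only its width comes from the text.

```rust
pub const RATCHET_PK: usize = 1216; // X-Wing public key, always present
pub const KEM_CT: usize = 1120;     // X-Wing ciphertext, ratchet-step messages only
pub const FLAG: usize = 1;          // has_kem_ct
pub const CT_FIELD: usize = 2;      // 2-byte field before kem_ct (assumed length prefix)
pub const COUNTERS: usize = 4 + 4;  // n and pn, u32 each

/// Encoded ratchet header length.
pub const fn header_len(has_kem_ct: bool) -> usize {
    let base = RATCHET_PK + FLAG + COUNTERS;
    if has_kem_ct { base + CT_FIELD + KEM_CT } else { base }
}
```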

6.12 Voice Call Key Derivation

Call encryption key material for E2EE voice calls is derived from the ratchet root key and an ephemeral X-Wing shared secret exchanged during call signaling. The ephemeral KEM provides forward secrecy independent of the ratchet state — if the root key is later compromised (before the next KEM ratchet step), the call content remains confidential.

Group calls use the same mechanism: each participant derives call keys with every other participant via their existing pairwise ratchet sessions. The server acts as an SFU (Selective Forwarding Unit), routing encrypted media packets without decryption. Specifically: the call_id is shared across all participants (the initiator generates it once and distributes it via ratchet-encrypted signaling messages to each participant). The ephemeral KEM exchange is per-pair — each pair of participants performs an independent CallOffer/CallAnswer exchange over their pairwise ratchet session, producing a unique kem_ss per pair. Each pair therefore derives independent call keys from their unique (root_key, kem_ss, call_id) triple. The shared call_id provides a common call identifier for the application layer; it does not weaken key independence because kem_ss differs per pair.

Call Setup Protocol

Initiator generates call_id = random_bytes(16) and an ephemeral X-Wing keypair (ek_pub, ek_sk) = XWing.KeyGen(), then sends CallOffer { call_id, ek_pub } to the peer as a ratchet-encrypted message. call_id MUST be unique per call: reusing a call_id with the same root_key and kem_ss produces identical HKDF inputs and therefore identical call keys. 128-bit random generation makes accidental collisions negligible, since by the birthday bound a collision becomes likely only after about 2⁶⁴ calls. Implementations MUST NOT use predictable or sequential call IDs. ek_pub lifecycle: ek_pub is a non-secret public key transmitted in CallOffer, and it can be retained or discarded after transmission without security consequence. ek_sk is secret. The initiator MUST retain it until CallAnswer arrives, because it is needed for decapsulation. The initiator MUST zeroize it immediately after XWing.Decaps completes, or on call rejection, cancellation, or timeout if no CallAnswer ever arrives. The initiator does not use ek_pub for any cryptographic purpose after transmission. The responder needs it only as the input to XWing.Encaps; it is not an input to the call-key derivation, whose info field contains only the label and fingerprints (see the §6.12 HKDF derivation). Neither party uses ek_pub after derive_call_keys returns.
Responder encapsulates to the ephemeral public key: (ct, kem_ss) = XWing.Encaps(ek_pub). Sends CallAnswer { call_id, ct } as a ratchet-encrypted message. XWing.Encaps consumes CSPRNG randomness (ML-KEM encapsulation draws randomness for the ciphertext); at the CAPI level, CSPRNG failure aborts the process (§13.2). At the Rust API level, CSPRNG failure returns Internal (structurally unreachable on standard OSes — see §6.5). The CallAnswer MUST NOT be sent if encapsulation fails.
Initiator validates the CallAnswer call_id matches the CallOffer call_id before proceeding — mismatched call IDs indicate a confused-deputy or replay attack and MUST be rejected. Then decapsulates: kem_ss = XWing.Decaps(ek_sk, ct). Zeroizes ek_sk immediately. If CallAnswer never arrives (peer rejects, times out, or network failure), ek_sk must still be zeroized — the application layer MUST zeroize the ephemeral secret key on call rejection, timeout, or cancellation, not only on successful decapsulation. Failure to do so leaves a decapsulation key in memory indefinitely, recoverable via memory scanning.
Both parties derive call keys:

fp_lo, fp_hi = sort(local_fp, remote_fp)   // canonical order: lower first
                                            // Equal fingerprints are rejected upstream
                                            // (guard 19 at deserialization, and init_alice/init_bob
                                            // reject local_fp == remote_fp) — no tiebreaker branch
                                            // is needed here.
call_keys = HKDF(
    salt = root_key,
    ikm  = kem_ss || call_id,    // 32 + 16 = 48 bytes, raw concatenation (no length prefixes)
    info = "lo-call-v1" || fp_lo || fp_hi,  // 10 + 32 + 32 = 74 bytes, raw concatenation (no length prefixes)
    len  = 96
)

key_a      = call_keys[0..32]
key_b      = call_keys[32..64]
chain_key  = call_keys[64..96]
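The input construction and output split in the pseudocode above can be sketched without any crypto dependency. HKDF-SHA3-256 itself is omitted because it needs an external crate. `fill_ikm`, `build_info`, and `split_call_keys` are hypothetical helper names. `fill_ikm` writes into a caller-owned buffer so the caller controls, and later zeroizes, the only copy of kem_ss inside the IKM.

```rust
pub const LABEL: &[u8; 10] = b"lo-call-v1";

/// ikm = kem_ss || call_id (48 bytes, no length prefixes), written in place.
pub fn fill_ikm(ikm: &mut [u8; 48], kem_ss: &[u8; 32], call_id: &[u8; 16]) {
    ikm[..32].copy_from_slice(kem_ss);
    ikm[32..].copy_from_slice(call_id);
}

/// info = "lo-call-v1" || fp_lo || fp_hi (74 bytes, no length prefixes).
/// Equal fingerprints are assumed to have been rejected upstream.
pub fn build_info(local_fp: &[u8; 32], remote_fp: &[u8; 32]) -> [u8; 74] {
    // [u8; 32] compares lexicographically as unsigned bytes, left to right.
    let (fp_lo, fp_hi) = if local_fp < remote_fp {
        (local_fp, remote_fp)
    } else {
        (remote_fp, local_fp)
    };
    let mut info = [0u8; 74];
    info[..10].copy_from_slice(LABEL);
    info[10..42].copy_from_slice(fp_lo);
    info[42..].copy_from_slice(fp_hi);
    info
}

/// Splits the 96-byte HKDF output into (key_a, key_b, chain_key).
pub fn split_call_keys(okm: &[u8; 96]) -> ([u8; 32], [u8; 32], [u8; 32]) {
    let mut key_a = [0u8; 32];
    let mut key_b = [0u8; 32];
    let mut chain_key = [0u8; 32];
    key_a.copy_from_slice(&okm[0..32]);
    key_b.copy_from_slice(&okm[32..64]);
    chain_key.copy_from_slice(&okm[64..96]);
    (key_a, key_b, chain_key)
}
```

Because of the sort, both parties build the same info bytes no matter which side calls `build_info`.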

Initial key semantics (step 0): key_a and key_b from derive_call_keys are immediately usable for the first rekeying interval — step_count starts at 0 and the initial keys are the step-0 keys. The first call to AdvanceCallChain produces step-1 keys. Implementations MUST NOT call AdvanceCallChain before using the initial keys — both parties would begin at step 1, producing compatible keys, but a party that calls advance before first use and a party that does not would be off by one with no error or diagnostic.

Role assignment: The party with the lexicographically lower identity fingerprint (unsigned byte-by-byte comparison, left to right) uses key_a as their send key and key_b as their recv key. The other party reverses the assignment.
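Role assignment reduces to one fingerprint comparison. A sketch with a hypothetical `assign_roles` function: each side passes its own fingerprint first. The two calls then yield mirrored (send, recv) pairs, so one party's send key is the other's recv key.

```rust
/// Returns (send_key, recv_key) for the local party. The party with the lower
/// fingerprint (unsigned, byte by byte) sends on key_a; the other sends on key_b.
/// Equal fingerprints are rejected upstream and never reach this function.
pub fn assign_roles(
    local_fp: &[u8; 32],
    remote_fp: &[u8; 32],
    key_a: [u8; 32],
    key_b: [u8; 32],
) -> ([u8; 32], [u8; 32]) {
    if local_fp < remote_fp {
        (key_a, key_b)
    } else {
        (key_b, key_a)
    }
}
```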

call_id is an opaque 16-byte blob (no UUID normalization): call_id is raw bytes concatenated directly into the HKDF IKM. No byte-order conversion, UUID formatting, or canonical representation is applied. Two implementations that both generate UUIDs MUST concatenate the same raw byte representation. For example, .NET Guid.ToByteArray() produces a mixed-endian layout in which the first three fields are little-endian, while RFC 4122 specifies network byte order. If one side uses the .NET layout and the other the RFC 4122 layout for the same UUID, or either side uses any other UUID serialization, the two will silently derive different call keys. The safest approach is to generate call_id = random_bytes(16) and treat those 16 bytes as opaque throughout the call lifecycle, never reinterpreting them as a UUID.

No length prefixes in info or IKM: Neither the info nor the ikm field uses length-prefixed encoding. Unlike KDF_KEX (§5.4), which applies the len(x) || x format to each info component, derive_call_keys concatenates all fields raw. A reimplementer who applies the §5.4 convention to info would prepend \x00\x0a, \x00\x20 and \x00\x20 before the three components. That produces an 80-byte info input instead of 74, and silently incompatible call keys. The fixed sizes of all three info fields ("lo-call-v1" = 10 bytes, each fingerprint = 32 bytes) make length prefixes redundant for disambiguation; their omission is intentional, not an oversight.
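The size difference is easy to check. This sketch builds the correct 74-byte info and, for contrast, the incorrect variant with 2-byte big-endian length prefixes (the \x00\x0a and \x00\x20 prefixes mentioned above). The fingerprint values are placeholders.

```python
import struct

label = b"lo-call-v1"
fp_lo = bytes(32)        # placeholder fingerprints, illustration only
fp_hi = b"\xff" * 32

# Correct: raw concatenation, 10 + 32 + 32 = 74 bytes.
info = label + fp_lo + fp_hi


def len_prefixed(x: bytes) -> bytes:
    """The §5.4-style len(x) || x encoding with a 2-byte big-endian length."""
    return struct.pack(">H", len(x)) + x


# Wrong for derive_call_keys: 6 extra prefix bytes, 80 bytes total.
wrong_info = len_prefixed(label) + len_prefixed(fp_lo) + len_prefixed(fp_hi)
```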

Why call_id is in IKM, not info: call_id goes in IKM (alongside kem_ss) as defense-in-depth against KEM randomness failure. If the KEM's random number generator is compromised or biased, a unique call_id in IKM still introduces variability into the HKDF extraction phase: different calls extract different pseudorandom keys even if kem_ss is identical. Fingerprints go in info because they are public values that provide domain separation; their secrecy is not required. Moving call_id to info would be a subtly weaker construction. With a non-random KEM, all calls between the same pair at the same root_key would share a single extracted key, and only the Expand-stage info would separate their outputs.

kem_ss is zeroized immediately after HKDF. The 48-byte ikm concatenation buffer (kem_ss || call_id) also contains a copy of kem_ss and MUST be zeroized independently; zeroizing kem_ss alone leaves this copy in memory.

In Rust, the ikm buffer is a plain [u8; 48] (a Copy type, not a Zeroizing wrapper) and is explicitly zeroized via ikm.zeroize() immediately after HKDF. Wrapping a Copy array in Zeroizing would move a bitwise copy into the wrapper and leave the original on the stack, so call-site zeroization is required.

C/Go/Python implementations MUST explicitly zeroize the 48-byte concatenation buffer before freeing it (e.g., memset_s(ikm, 48, 0, 48) in C), not just the 32-byte kem_ss. The obligation mirrors the IKM zeroization rationale in §5.4, for the same reason: ikm contains a bitwise copy of kem_ss.

The ratchet state is not modified: root_key is read but not advanced. Multiple concurrent calls can each invoke derive_call_keys independently. Because root_key is read-only, these derivations leave the ratchet unchanged and do not interfere with each other or with ongoing message encryption.

derive_call_keys reads root_key directly from live ratchet state via the RatchetState::derive_call_keys(&self, kem_ss, call_id) method, which does not accept root_key as a parameter. The core library also exports a standalone call::derive_call_keys(root_key, kem_ss, call_id, local_fp, remote_fp) function that accepts root_key explicitly. It exists for CAPI use (where the ratchet handle supplies root_key and the fingerprints internally) and for unit testing. Binding authors and reimplementers SHOULD use the RatchetState method, not the standalone function. The standalone function is safe only when the caller guarantees the §6.12 protocol requirement (no KEM ratchet step between signaling and derivation); the method enforces this implicitly by reading live state at call time.

A reimplementer who designs the function to accept a root_key parameter (allowing the caller to snapshot root_key at CallOffer time and pass it later) creates an epoch-sync hazard. If a KEM ratchet step fires between CallOffer and derive_call_keys, the snapshot holds the pre-ratchet root_key while the peer's live state has already advanced, producing incompatible call keys with no diagnostic. The §6.12 protocol requirement (no KEM ratchet step between signaling and derivation) makes the live read safe; a parameter-passing design would require the caller to enforce the same invariant externally.

Root key epoch sync: Both parties must call derive_call_keys at the same ratchet epoch, i.e., with the same root_key snapshot. A KEM ratchet step between sending CallOffer and calling derive_call_keys (or between receiving CallAnswer and deriving) advances root_key on one side. That produces incompatible call keys with no error or diagnostic: HKDF succeeds, but the other party's derivation uses the old root_key.

Protocol requirement (normative): no KEM ratchet step (triggered by sending or receiving a ratchet message with a new ratchet_pk) may occur between the CallOffer/CallAnswer exchange and the derive_call_keys call on either side. Implementations MUST follow this ordering:

- The initiator MUST NOT process any further ratchet message that would trigger a KEM ratchet step between sending CallOffer and calling derive_call_keys. It MUST call derive_call_keys as soon as CallAnswer arrives and its ct has been decapsulated (kem_ss is not available before then).
- The responder MUST call derive_call_keys immediately after sending CallAnswer and before processing any further ratchet messages.

This is not advisory: violating this order produces silently incompatible call keys with no error or diagnostic on either side.

Enforcement window boundaries and concurrent ratchet-step messages: The enforcement window opens when CallOffer is sent (initiator) or received (responder) and closes when derive_call_keys is called on the same side. The window is bounded by the round-trip time of the signaling exchange — in practice, tens to hundreds of milliseconds. A ratchet message that arrives during this window and triggers a KEM ratchet step (because it carries a new ratchet_pk) MUST be queued and not processed until after derive_call_keys is called. The implementation MUST NOT decrypt the message or advance root_key while the enforcement window is open. The queue depth is bounded by the number of ratchet messages that can arrive during a single round trip — in practice, 0-5 messages. If a ratchet-step message is queued for more than a configurable timeout (LO Protocol defines the signaling timeout; typical value: 5 seconds), the call setup is considered failed and all signaling state is discarded. The definition of "enforcement window" and the specific queueing mechanism are deferred to the LO Protocol Specification. soliton enforces only the invariant that root_key MUST NOT advance between CallOffer/CallAnswer and derive_call_keys; the protocol layer is responsible for implementing the queueing and timeout behavior.
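The queueing mechanism belongs to the LO Protocol Specification. The Python sketch below only illustrates the invariant that root_key must not advance inside the window, under stated assumptions. CallSetupWindow, its methods, the has_new_ratchet_pk flag, and the choice to release held messages after a timeout are all hypothetical. The 5-second default mirrors the typical value quoted above.

```python
import time
from collections import deque


class CallSetupWindow:
    """Hypothetical per-call guard: hold back ratchet messages that would
    trigger a KEM ratchet step until derive_call_keys has run."""

    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s      # illustrative; LO Protocol defines the real value
        self.opened_at = None           # monotonic timestamp while the window is open
        self.held = deque()             # queued KEM-ratchet-step messages, arrival order

    def open(self) -> None:
        # Initiator: when CallOffer is sent. Responder: when CallOffer is received.
        self.opened_at = time.monotonic()

    def is_open(self) -> bool:
        return self.opened_at is not None

    def timed_out(self) -> bool:
        # If True, the caller treats call setup as failed and discards signaling state.
        return self.is_open() and time.monotonic() - self.opened_at > self.timeout_s

    def submit(self, msg, has_new_ratchet_pk: bool, process) -> None:
        # Messages that would advance root_key wait; same-epoch messages proceed.
        if self.is_open() and has_new_ratchet_pk:
            self.held.append(msg)
        else:
            process(msg)

    def close(self, process) -> None:
        # Call right after derive_call_keys returns (or after abandoning setup on
        # timeout, an assumption here), then release held messages in order.
        self.opened_at = None
        while self.held:
            process(self.held.popleft())
```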

Signaling Messages

All signaling messages are encrypted via the existing LO-Ratchet session:

CallOffer { call_id, ek_pub } — initiator → peer
CallAnswer { call_id, ct } — peer → initiator
CallHangup { call_id } — either direction
CallReject { call_id } — peer declines

These are application-layer message types. soliton provides only the key derivation for signaling; signaling message encoding and transport are application concerns.

Frame encryption is also application-layer: soliton delivers two raw 32-byte symmetric keys (key_a and key_b) and does not define a frame cipher, nonce scheme, or frame AAD structure. The application is responsible for choosing an AEAD algorithm for media frames, constructing per-frame nonces (e.g., from frame sequence numbers), defining what AAD (if any) to include in frame authentication, and handling frame loss or reordering. A common approach is XChaCha20-Poly1305 with a 64-bit monotonically increasing frame counter as the nonce (zero-padded to 24 bytes); however, soliton does not mandate this. The keys produced by derive_call_keys are suitable inputs to any 256-bit AEAD; the choice of frame AEAD is outside the scope of this specification.
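Because the frame cipher is application-defined, the following shows only one possible nonce layout for the XChaCha20-Poly1305 option mentioned above. The assumption, which soliton does not mandate, is that the 64-bit frame counter is written big-endian into the last 8 bytes of a zeroed 24-byte buffer. The function name frame_nonce is hypothetical. The AEAD call itself is omitted because Python's standard library has no XChaCha20-Poly1305.

```python
import struct


def frame_nonce(frame_counter: int) -> bytes:
    """Hypothetical 24-byte media-frame nonce: 16 zero bytes, then the 64-bit
    frame counter big-endian (an application choice, not part of soliton)."""
    if not 0 <= frame_counter < 2**64:
        raise OverflowError("frame counter out of range; end the call or rekey")
    nonce = bytearray(24)                              # all 24 bytes start at zero
    struct.pack_into(">Q", nonce, 16, frame_counter)   # counter in bytes 16..23
    return bytes(nonce)
```

Each direction has its own key (the local send key is the peer's recv key), so each sender can keep an independent counter. A counter value must still never repeat under the same key.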

Intra-Call Rekeying

The call chain key supports periodic rekeying for forward secrecy within the call:

function AdvanceCallChain(chain_key):
    // step_count is checked before derivations (guard fires when step_count >= 2²⁴).
    // All three HMAC derivations execute only when the guard does not fire.
    key_a'     = HMAC-SHA3-256(chain_key, [0x04])    // single-byte data
    key_b'     = HMAC-SHA3-256(chain_key, [0x05])    // single-byte data
    chain_key' = HMAC-SHA3-256(chain_key, [0x06])    // single-byte data

    // Zeroize old chain_key
    step_count += 1    // incremented AFTER all three derivations, on the success path only;
                       // not incremented on the exhaustion path (step_count stays at 2²⁴
                       // on every post-exhaustion call — see exhaustion pseudocode below).
    // Role assignment (key_a' → send or recv) is preserved from initial derivation.
    // Mechanism: derive_call_keys returns a lower_role: bool computed from fingerprint
    // comparison (§6.12 step 5). The caller stores this bool and passes it to every
    // AdvanceCallChain call: lower_role=true → key_a' is the send key, key_b' is recv;
    // lower_role=false → key_b' is send, key_a' is recv. A reimplementer who does not
    // persist lower_role gets swapped send/recv keys on every advance, with no error.
    return (key_a', key_b', chain_key')

On exhaustion (step_count >= 2²⁴), all three key fields are zeroized before returning:

    Zeroize key_a
    Zeroize key_b
    Zeroize chain_key
    return ChainExhausted

On exhaustion, all three key fields (key_a, key_b, chain_key) are zeroized — not just chain_key.
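The advance and exhaustion logic above can be sketched in Python as follows, including the guard ordering and the persisted lower_role flag. CallChain, ChainExhausted and MAX_STEPS are illustrative names. Rebinding attributes to zero bytes only models zeroization, because Python offers no guaranteed memory wiping.

```python
import hashlib
import hmac

MAX_STEPS = 2**24


class ChainExhausted(Exception):
    pass


def _hmac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha3_256).digest()


class CallChain:
    """Illustrative model of call chain state (not the soliton API)."""

    def __init__(self, key_a: bytes, key_b: bytes, chain_key: bytes, lower_role: bool):
        self.key_a, self.key_b, self.chain_key = key_a, key_b, chain_key
        self.lower_role = lower_role   # persisted from derive_call_keys
        self.step_count = 0            # step 0: the initial keys are used as-is

    def send_recv(self):
        return (self.key_a, self.key_b) if self.lower_role else (self.key_b, self.key_a)

    def advance(self):
        if self.step_count >= MAX_STEPS:          # guard runs before any derivation
            # Unconditional "zeroization", even if already exhausted.
            self.key_a = self.key_b = self.chain_key = bytes(32)
            raise ChainExhausted()                # step_count is NOT incremented
        ck = self.chain_key
        self.key_a = _hmac(ck, b"\x04")           # step_count is not an HMAC input
        self.key_b = _hmac(ck, b"\x05")
        self.chain_key = _hmac(ck, b"\x06")       # old chain key is dropped
        self.step_count += 1                      # success path only
        return self.send_recv()
```

The guard precedes every derivation and the increment sits only on the success path. After exhaustion, step_count therefore stays at 2²⁴, and every further call re-zeroizes and raises.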

Each advance produces fresh call encryption keys and a new chain key. The old chain key and call keys are zeroized. Compromise of a later call key does not reveal earlier media segments.

step_count is not an HMAC input: AdvanceCallChain takes only chain_key as input; step_count does not feed into the HMAC derivation. It is a pure exhaustion counter — tracking how many advances have occurred to enforce the 2²⁴ limit. Why this is safe: chain_key itself advances monotonically via HMAC one-way function at each step, providing implicit domain separation — step N's output is independent of step N−1's because it is derived from a different key (the previous step's output). In contrast, KDF_MsgKey (§6.3) reuses the same epoch_key for every message in an epoch, making the counter essential to distinguish per-message derivations. AdvanceCallChain does not reuse the key: each step produces a fresh chain_key' that is HMAC-derived from the prior chain_key, so including step_count in the data argument would be redundant. A reimplementer familiar with KDF_MsgKey must not apply the same pattern here: including step_count in the data argument would produce a different chain key at every step N > 0, making every derived call key incompatible with the reference implementation despite producing no error.

The rekey interval is an application-layer decision (e.g., every 30 seconds or every N encrypted frames). soliton provides the advance() primitive; the application controls when to call it. Both parties MUST advance the chain in lockstep — mismatched step_count values produce incompatible keys with no error or diagnostic. The synchronization mechanism is application-layer (e.g., include step_count in encrypted media frame headers; the receiver advances to match before decrypting). When the receiver's step_count is behind the sender's, it must call advance() sequentially N times to catch up — there is no shortcut (each step requires the previous chain key as input). A reasonable fast-forward tolerance is application-specific, but implementations SHOULD cap the maximum gap (e.g., 1000 steps) and treat larger gaps as session corruption rather than attempting a potentially expensive sequential catch-up. Once desynchronization is detected (receiver cannot decrypt at any plausible step_count offset within the tolerance window), the call's key material is irrecoverable — a new call must be established via the §6.12 setup protocol.
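The following sketches sequential catch-up with a capped gap, assuming the sender's step_count travels in a hypothetical encrypted frame header. The names catch_up and advance_once are illustrative. The 1000-step cap is the example value suggested above, not a fixed protocol constant.

```python
import hashlib
import hmac

MAX_STEPS = 2**24
MAX_GAP = 1000   # example tolerance; the spec leaves the exact cap to the application


def advance_once(chain_key: bytes):
    """One AdvanceCallChain step: returns (key_a', key_b', chain_key')."""
    def prf(label: bytes) -> bytes:
        return hmac.new(chain_key, label, hashlib.sha3_256).digest()
    return prf(b"\x04"), prf(b"\x05"), prf(b"\x06")


def catch_up(chain_key, key_a, key_b, local_step, target_step, max_gap=MAX_GAP):
    """Advance sequentially from local_step to target_step; returns (key_a, key_b, chain_key)."""
    if target_step < local_step:
        raise ValueError("cannot go backwards: earlier step keys were zeroized")
    if target_step - local_step > max_gap or target_step > MAX_STEPS:
        raise ValueError("gap too large: treat as desynchronized and set up a new call")
    for _ in range(target_step - local_step):  # no shortcut: each step needs the previous chain key
        key_a, key_b, chain_key = advance_once(chain_key)
    return key_a, key_b, chain_key
```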

Call chain exhaustion: AdvanceCallChain has a hard limit of 2²⁴ (16,777,216) advances. The internal step_count starts at 0 and is checked (step_count >= 2²⁴) before the HMAC derivations and before any increment. The last permitted advance occurs when step_count = 2²⁴ − 1, after which step_count increments to 2²⁴ and the next call returns ChainExhausted.

Fencepost note: at step_count = 2²⁴ − 1, the guard (2²⁴ − 1) >= 2²⁴ is false, so all three HMAC derivations execute and step_count increments to 2²⁴. At step_count = 2²⁴, the guard 2²⁴ >= 2²⁴ is true, so the guard fires first: no derivations run, step_count is NOT incremented further, and all key material is zeroized before returning ChainExhausted.

Two reimplementation errors are possible:

- Placing the increment before the guard (step_count += 1; if step_count >= 2²⁴) exhausts one step earlier, with the last advance at step_count = 2²⁴ − 2.
- Incrementing after the derivations but on a path shared with the exhaustion branch. The guard-fires path (no increment) and the derivations-succeed path (increment) must stay separate. If the exhaustion path also increments, step_count climbs to 2²⁴ + 1, 2²⁴ + 2, ... on every post-exhaustion call instead of holding at 2²⁴, and given enough calls a fixed-width counter would eventually wrap back below the guard. The reference implementation's u32 cannot wrap because its guard path never increments.

On exhaustion, all call key material (key_a, key_b, chain_key) is zeroized, and the call's forward-secrecy chain is permanently terminated. A new call must be established via §6.12's setup protocol. The limit prevents counter overflow in the internal chain advancement and bounds the total key material derivable from a single call chain. At a 30-second rekey interval, 2²⁴ advances correspond to ~16 years of continuous call; the limit is not reachable in practice but is enforced as defense-in-depth.

step_count MUST be stored as a u32 or wider: A narrower type (u8, u16, u24) wraps before reaching 2²⁴, silently disabling the exhaustion guard. For example, a u24 implementation wraps to 0 after 16,777,216 advances — subsequent calls pass the guard 0 >= 2²⁴ = false and continue deriving keys indefinitely. The reference implementation uses a u32 (which can represent values up to ~4.3 × 10⁹, well above 2²⁴ = 16,777,216). Reimplementers MUST use a type that can represent 2²⁴ without wrapping.

ChainExhausted from advance() — exhausted handle is NOT auto-freed; caller MUST free it: Key material is zeroized on exhaustion, but the CallKeys allocation is NOT deallocated. The handle remains allocated and must be explicitly freed by the caller (soliton_call_keys_free in the CAPI; Rust's Drop via the normal ownership path). Failing to free the handle leaks the (now-zeroed) allocation. soliton.h OWNERSHIP note conflict: The generated header may carry a comment stating that after ChainExhausted, "all keys are zeroized — handle is dead." The phrase "handle is dead" means the handle is no longer usable for advance() — it does NOT mean the handle was auto-freed. Specification.md is normative: the handle is live, key material is zeroed, and the caller is responsible for freeing it. A binding author who reads "handle is dead" as "already freed" will double-free the handle. The correct action after ChainExhausted from advance() is: (1) record that the call session is exhausted, (2) free the handle via the standard free function, (3) establish a new call via derive_call_keys().

CAPI handle lifetime: Zeroization of key material does not deallocate the call chain handle. CAPI callers MUST still call the handle's destroy/free function after receiving ChainExhausted — failure to do so leaks the (now-zeroed) allocation. The handle remains valid for destruction but invalid for further advance() calls.

Post-exhaustion idempotency: After the first ChainExhausted, every subsequent call to advance() also returns ChainExhausted and unconditionally zeroizes key material. Because the keys were already zeroed on the first exhaustion, the re-zeroization is a no-op, but implementations MUST NOT guard against re-zeroization ("skip if already exhausted") — the unconditional behavior ensures that a caller who ignores the first ChainExhausted cannot obtain stale key material from a later call. step_count is not reset; it stays at 2²⁴. step_count is an internal counter — it is NOT exposed via the Rust API or CAPI. The only externally observable signal of exhaustion is the ChainExhausted return value from advance(). Callers MUST check the return value; there is no way to query exhaustion state without calling advance().

Security Properties

Input validation: derive_call_keys rejects all-zero root_key (dead session — liveness sentinel, constant-time check), all-zero kem_ss (degenerate KEM output — cryptographically implausible but structurally guarded, constant-time check), all-zero call_id (uninitialized identifier, variable-time — call_id is non-secret), and local_fp == remote_fp (self-call — collapses role assignment, variable-time — fingerprints are public). All four checks return InvalidData. The equal-fingerprint check is critical: with equal fingerprints, the strict < comparison in role assignment evaluates to false for both parties, so both assign key_a as recv and key_b as send — symmetric key confusion where each party encrypts with the key the other expects to decrypt with. This is distinct from the ratchet's local_fp ≠ remote_fp guard (§6.8 guard 19), which protects AAD symmetry.
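The four checks can be sketched with the standard library. hmac.compare_digest gives a constant-time comparison for the two secret inputs when the arguments have equal length. The public call_id and fingerprints use ordinary equality. InvalidData and validate_call_inputs are illustrative names.

```python
import hmac


class InvalidData(Exception):
    pass


def validate_call_inputs(root_key: bytes, kem_ss: bytes, call_id: bytes,
                         local_fp: bytes, remote_fp: bytes) -> None:
    # Secret values: constant-time comparison against an all-zero buffer.
    if hmac.compare_digest(root_key, bytes(32)):
        raise InvalidData("all-zero root_key (dead session)")
    if hmac.compare_digest(kem_ss, bytes(32)):
        raise InvalidData("all-zero kem_ss")
    # Public values: variable-time comparison is acceptable.
    if call_id == bytes(16):
        raise InvalidData("all-zero call_id")
    if local_fp == remote_fp:
        raise InvalidData("self-call: local_fp == remote_fp")
```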

Call key secrecy: Requires both root_key and kem_ss. The root key is bound via HKDF salt; the ephemeral KEM shared secret is bound via IKM.

Epoch binding via root_key as HKDF salt: Including root_key as the HKDF salt binds call keys to the current ratchet epoch. Call keys derived at ratchet epoch E are independent of those at epoch E+1 for the same (kem_ss, call_id) pair. Advancing the ratchet between two calls changes root_key, producing completely different call keys even if the same ephemeral KEM exchange is reused. A reimplementer who uses a fixed salt (e.g., an empty salt or a static label) instead of root_key removes this epoch isolation. All calls between the same pair with the same kem_ss and call_id then derive identical keys regardless of ratchet epoch, making past call keys recoverable from any future epoch compromise.

Forward secrecy (ephemeral KEM): The ephemeral keypair is generated per call and zeroized after derivation. Later compromise of root_key does not reveal call content — kem_ss is no longer recoverable.

Defense-in-depth (post-quantum): root_key in the HKDF salt carries the ratchet's accumulated post-quantum security. If the ephemeral KEM is broken by a quantum computer, root_key still protects. If root_key is compromised classically, the ephemeral KEM still protects.

Intra-call forward secrecy: AdvanceCallChain is one-way (HMAC-based PRF). Old chain keys are zeroized.

No ratchet state mutation: The ratchet operates independently during calls. Text messages advance the ratchet as normal.

CallKeys is intentionally ephemeral — no serialization path: There is no to_bytes/from_bytes API for CallKeys. Call key material is not designed to survive process restarts or be persisted to storage. If a call is interrupted (network failure, OS suspend, process crash), the CallKeys handle is lost and the call's key material is unrecoverable. The correct response is to re-establish the call via the §6.12 setup protocol (derive_call_keys on the current ratchet state after a new CallOffer/CallAnswer exchange) — the resulting new call keys will be independent of the interrupted call's keys, providing per-call forward secrecy. A reimplementer who adds a serialization path for CallKeys (to survive restarts) undermines this forward-secrecy property — a leaked blob of serialized call key material recovers the call's media encryption keys without any KEM secrets.

6.13 Design Rationale: Per-Epoch vs Per-Message Forward Secrecy

LO-Ratchet provides forward secrecy at epoch granularity (per KEM ratchet step), not per message. This is a deliberate departure from the Signal Double Ratchet, which provides per-message forward secrecy via a sequential KDF chain.

What per-message forward secrecy protects against: An attacker who compromises the chain key at position N can derive message keys N+1, N+2, ... but not 0, 1, ..., N-1. This matters only if the attacker obtains the chain key but not the root key or ratchet secret key.

Why this threat model is unrealistic: The RatchetState struct contains the epoch key, root key, and ratchet secret key at adjacent memory addresses. The root key is strictly more powerful (it derives all future epoch keys). The ratchet secret key enables decapsulating all future KEM ratchets. Any memory compromise that extracts the epoch key — buffer overread, memory dump, side-channel attack — extracts these adjacent secrets with overwhelming probability. Per-message forward secrecy protects against an attacker who can surgically extract exactly 32 bytes from a known offset and nothing else. This is not a realistic attack.

What we gain by dropping it:

O(1) out-of-order handling: Any message key is derivable directly from the epoch key and counter. No skip cache, no TTL expiry, no purge throttling.
~300 fewer lines of code: The skip cache was the most error-prone component (§6.9 in prior versions explicitly warned about this).
Simpler serialization: No variable-length skip cache in the wire format.
Reduced memory: No HashMap of 32-byte message keys (up to 3000 entries / ~96 KB). Duplicate detection uses 4-byte counters.
No TooManySkipped error: The skip-amplification DoS vector is eliminated entirely.

What we keep: Forward secrecy across epochs (KEM ratchet), break-in recovery (fresh KEM randomness), unique per-message keys (AEAD security), and a one-epoch grace period for late-arriving messages (via prev_recv_epoch_key).

Post-compromise security (PCS): PCS holds at epoch granularity. After an attacker compromises ratchet state at time t₁, a subsequent KEM ratchet step at time t₂ > t₁ with fresh (uncompromised) encapsulation randomness re-establishes confidentiality for the new epoch. The attacker's knowledge of state at t₁ does not yield epoch keys derived after the KEM ratchet step, because the fresh KEM shared secret is unknown to the attacker. Recovery depends on the sender's (encapsulator's) randomness being honest — the receiver contributes no fresh randomness to the KEM ratchet step (§14.11). For formal models: Corrupt(state, t₁) does not imply knowledge of epoch keys from a KEM ratchet step at t₂ > t₁, provided the encapsulator's CSPRNG is uncompromised at t₂.

PCS healing boundary — decapsulation time, not encapsulation time: PCS healing completes when the decapsulator successfully decrypts the first message in the new epoch, not when the encapsulator generates the KEM ciphertext. The encapsulator's fresh kem_ss is produced at t₂ (encapsulation), but recv_epoch_key on the decapsulator's side does not update until the first new-epoch message is successfully decrypted (at t₃ ≥ t₂). An adversary who compromises state at t₁, observes the KEM ratchet ciphertext at t₂, but suppresses delivery until t₃ can still decrypt all messages in the new epoch that are delivered before t₃ — because the decapsulator continues using the compromised recv_epoch_key until t₃. After t₃, the decapsulator holds the healed recv_epoch_key derived from the attacker-unknown kem_ss. For formal models: use Healed(session, t₃) where t₃ is the time of the first successful new-epoch decryption, not t₂.

Two KEM ratchet steps for complete PCS healing: A single KEM ratchet step derives a new recv_epoch_key unknown to the attacker, so new-epoch messages are immediately protected. However, prev_recv_epoch_key (the compromised epoch key) persists through the first step, retained for the one-epoch late-message grace period. An adversary who compromises state immediately after the first KEM ratchet step can still read previous-epoch messages via prev_recv_epoch_key.

Full PCS healing, where the adversary retains no access to any compromised key material, requires two KEM ratchet steps:

1. The first step rotates the compromised key into prev_recv_epoch_key.
2. The second step overwrites prev_recv_epoch_key with the then-current epoch's key, permanently discarding the originally compromised material.

For formal models: define FullyHealed(session, t₄), where t₄ is the time of the first successful decryption in the epoch created by the second KEM ratchet step (t₃ marks the corresponding decryption after the first step). The §14.17 cross-reference uses Healed(session, t₃) to mean "new epoch is healed"; formal models that also require the previous epoch's key to be gone MUST use FullyHealed(session, t₄).

PCS does NOT cover:

Already-compromised epochs: Messages encrypted before the healing KEM ratchet step remain compromised. PCS is forward-looking — it restores confidentiality for future epochs, not retroactively for past ones.
Compromised encapsulator randomness: If the attacker controls the sender's CSPRNG at the time of the KEM ratchet step, the fresh kem_ss is known to the attacker and the step does not heal. Recovery requires at least one KEM ratchet step with honest randomness (§14.11).
Active attacker participating in the KEM exchange: If the attacker can substitute ratchet_pk in a message header (man-in-the-middle on the message transport), the KEM encapsulation targets the attacker's key rather than the peer's. AEAD authentication prevents this in normal operation (the header is bound into the AAD), but a full state compromise at t₁ gives the attacker enough material to forge headers until the next honest KEM ratchet step.
One-directional sessions: PCS requires a direction change (the peer must send a message triggering a KEM ratchet step). A one-directional stream of messages never triggers a KEM ratchet and therefore never heals.

7. Symmetric Encryption

7.1 XChaCha20-Poly1305

Algorithm: XChaCha20-Poly1305 — the 24-byte-nonce variant of ChaCha20-Poly1305. This is NOT ChaCha20-Poly1305 (RFC 8439), which uses a 12-byte nonce. Go's golang.org/x/crypto/chacha20poly1305 package exposes both under similar names: chacha20poly1305.New constructs the 12-byte (RFC 8439) variant; chacha20poly1305.NewX constructs the 24-byte XChaCha20 variant. A reimplementer who uses New instead of NewX produces incompatible ciphertext silently — both accept any 256-bit key, and the error surfaces only as AeadFailed on the receiver. Always use the 24-byte nonce (XChaCha20) variant throughout soliton.

Key: 256-bit message key from KDF_MsgKey.
Tag: 128 bits (16 bytes), appended to ciphertext.
Minimum valid ratchet ciphertext: 16 bytes (Poly1305 tag only, zero-length plaintext). Ciphertexts shorter than 16 bytes are rejected as AeadFailed (not InvalidLength — see §12 error collapse). First-message encrypted payloads have a 40-byte minimum (24-byte nonce + 16-byte tag, §5.5 Step 6). First-message minimum enforcement: decrypt_first_message also returns AeadFailed (not InvalidLength) for payloads shorter than 40 bytes — this collapses "too short to contain a valid nonce + tag" with "authentication failed" into a single error variant, preventing a distinguishing oracle: an attacker who could observe InvalidLength vs AeadFailed would learn whether the authentication attempt even ran (and at what byte offset parsing failed). A reimplementer who returns InvalidLength for sub-40-byte first-message payloads breaks this oracle-collapse guarantee.
aead_encrypt failure — AeadFailed on usize overflow: XChaCha20-Poly1305 encryption can return AeadFailed only when the plaintext length overflows internal length calculations (approximately plaintext.len() ≈ usize::MAX). This cannot occur with well-formed input bounded by the CAPI size cap (§13.4) or the storage/streaming chunk sizes. In practice, aead_encrypt is infallible for any input that passes the upstream size guards. An AeadFailed from aead_encrypt in production code indicates an integer overflow in the calling layer, not a cryptographic failure.
Constant-time by construction: ARX-based (add-rotate-xor); no table lookups, no data-dependent branches. No hardware acceleration required.
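The length rules above amount to a pre-check that fails with the same error as authentication. This is a minimal sketch with hypothetical helper names. The actual XChaCha20-Poly1305 open call is omitted because Python's standard library does not provide that AEAD.

```python
class AeadFailed(Exception):
    pass


NONCE_LEN = 24
TAG_LEN = 16


def split_first_message_payload(payload: bytes):
    """Return (nonce, ciphertext_with_tag). Anything shorter than nonce + tag
    raises AeadFailed, the same error a failed tag check produces."""
    if len(payload) < NONCE_LEN + TAG_LEN:
        raise AeadFailed()
    return payload[:NONCE_LEN], payload[NONCE_LEN:]


def precheck_ratchet_ciphertext(ciphertext: bytes) -> bytes:
    """Ratchet ciphertexts carry no nonce, so the minimum is the 16-byte tag."""
    if len(ciphertext) < TAG_LEN:
        raise AeadFailed()
    return ciphertext
```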

7.2 Nonce Construction

First message of a session (LO-KEX session init):

nonce = random_bytes(24)    // Prepended to ciphertext payload

All subsequent messages (LO-Ratchet):

nonce[0..24]  = 0x00{24}       // MUST zero-initialize the entire buffer first
nonce[20..24] = big_endian_32(header.n)

Implementations MUST zero-initialize the entire 24-byte nonce buffer before writing the counter bytes into positions 20-23. Language-specific pitfalls:

- C: a stack-allocated uint8_t nonce[24] contains indeterminate (garbage) bytes unless explicitly zeroed, so memset(nonce, 0, sizeof(nonce)) MUST precede the counter copy.
- Go: both var nonce [24]byte and make([]byte, 24) zero-initialize by language specification, but a buffer reused from a sync.Pool or resliced from a larger scratch buffer may hold stale bytes.
- Rust: let mut nonce = [0u8; 24] is zero-initialized by its initializer expression.

A reimplementer who writes only nonce[20..24] = BE32(n) without zeroing positions 0-19 produces a garbage-contaminated nonce. The receiver derives its own nonce from header.n, so the two sides will generally construct different nonces, and key-nonce uniqueness depends on whatever was left in memory. The result is non-deterministic AEAD failures on the receiver.

The counter nonce is not transmitted — recipient derives from header.n. Safe because each (msg_key, n) pair is unique: msg_key is derived from a unique epoch key and counter, and the epoch key changes on every KEM ratchet step.

An all-zero nonce (counter=0) is valid: The first message of every epoch uses header.n = 0, producing a 24-byte all-zero nonce [0x00 × 24]. This is intentional and correct. XChaCha20-Poly1305 places no restrictions on nonce content, and neither does AES-GCM, which likewise accepts an all-zero nonce. The security guarantee comes from msg_key uniqueness (unique per (epoch_key, counter) pair), not from the nonce being non-zero. Implementations MUST NOT reject or guard against an all-zero nonce. Some AEAD libraries include a "null nonce protection" heuristic that rejects all-zero nonces as likely initialization failures; such protections MUST be disabled or bypassed for XChaCha20-Poly1305 ratchet encryption. A library that returns an error for a zero nonce would silently break decryption of every epoch's first message (n=0).

The counter occupies the last 4 bytes (20-23) of the 24-byte nonce, leaving bytes 0-19 as zero.
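The following is a Python sketch of the counter nonce. bytearray(24) provides the mandatory zero initialization, and struct.pack_into writes big_endian_32(header.n) into bytes 20-23. The function name ratchet_nonce is illustrative.

```python
import struct


def ratchet_nonce(n: int) -> bytes:
    """24-byte ratchet nonce: bytes 0-19 zero, bytes 20-23 = big_endian_32(n)."""
    if not 0 <= n < 2**32:
        raise ValueError("header.n must fit in 32 bits")
    nonce = bytearray(24)                 # entire buffer zeroed first
    struct.pack_into(">I", nonce, 20, n)  # counter into positions 20..23
    return bytes(nonce)
```

For n = 0 this yields the all-zero nonce, which must be accepted.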

7.3 AAD Construction

First message (session init):

aad = "lo-dm-v1"                           // 8 bytes UTF-8
   || sender_fingerprint_raw               // 32 bytes (raw SHA3-256, not hex)
   || recipient_fingerprint_raw            // 32 bytes
   || encode_session_init(session_init)    // variable, see §7.4

encode_session_init re-encoding obligation: encode_session_init MUST be called to reconstruct the canonical bytes from the parsed struct — using raw wire bytes directly is an error. Bob's obligation to re-encode is documented at §5.5 Step 3 and §13.4. The output MUST be byte-for-byte identical to Alice's encoding; any field normalization during decode that alters re-encoding causes silent AeadFailed.

Ratchet messages:

aad = "lo-dm-v1"
   || sender_fingerprint_raw
   || recipient_fingerprint_raw
   || encode_ratchet_header(ratchet_header)

This binds ALL header fields to the AEAD tag. Tampering invalidates the tag.

"lo-dm-v1" is concatenated bare — no length prefix: The 8-byte label "lo-dm-v1" is written directly into the AAD with || (byte concatenation), NOT as a length-prefixed len(x) || x field. Contrast with §5.4 HKDF info construction, where "lo-kex-v1" is also bare but explicitly noted as "raw 9-byte prefix (not length-prefixed)." The || operator in this spec always means raw byte concatenation; length prefixes are written explicitly as len(x) || x. A reimplementer who applies the len(x) || x convention from §5.4 info fields to the AAD label — prepending a 2-byte length (0x00 0x08) before "lo-dm-v1" — produces a different 10-byte prefix and silently broken AEAD on every message. The confirmed encoding: aad = b"lo-dm-v1" || sender_fp || recipient_fp || header_bytes — total prefix is 8 raw bytes.

Sender/recipient orientation: sender_fingerprint_raw is the fingerprint of the party calling Encrypt (local party); recipient_fingerprint_raw is the fingerprint of the remote party. On the decrypt side, these roles are reversed — the decryptor reconstructs AAD using the remote party's fingerprint as sender_fingerprint_raw and its own fingerprint as recipient_fingerprint_raw. Both fingerprints are stored in RatchetState as local_fp and remote_fp at init time; encrypt uses (local_fp, remote_fp), decrypt uses (remote_fp, local_fp) to reconstruct the correct AAD order.
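The orientation rule can be captured by building the ratchet-message AAD in one place and swapping the fingerprint arguments on the decrypt side. In this sketch, header_bytes stands for the output of encode_ratchet_header and is treated as opaque. The function names are illustrative.

```python
LABEL = b"lo-dm-v1"   # 8 raw bytes, no length prefix


def ratchet_aad(sender_fp: bytes, recipient_fp: bytes, header_bytes: bytes) -> bytes:
    if len(sender_fp) != 32 or len(recipient_fp) != 32:
        raise ValueError("fingerprints must be 32 raw bytes")
    return LABEL + sender_fp + recipient_fp + header_bytes


def encrypt_aad(local_fp: bytes, remote_fp: bytes, header_bytes: bytes) -> bytes:
    # The local party is the sender.
    return ratchet_aad(local_fp, remote_fp, header_bytes)


def decrypt_aad(local_fp: bytes, remote_fp: bytes, header_bytes: bytes) -> bytes:
    # The remote party was the sender, so the order is reversed.
    return ratchet_aad(remote_fp, local_fp, header_bytes)
```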

Note: for the first message, recipient_fingerprint_raw (Bob's IK fingerprint) appears twice in aad: once as the standalone prefix field, and again inside encode_session_init(session_init) as si.recipient_ik_fingerprint. Both occurrences are intentional — the prefix provides fast lookup without parsing the encoded blob, and the embedded copy ties the fingerprint directly into the signed SessionInit. Bob does not need an explicit equality check between the two occurrences — AEAD authentication enforces consistency transitively: if an attacker substitutes a different value in either location, the AAD bytes change and the AEAD tag fails. A reimplementer who adds an explicit prefix_fp == session_init_fp check before AEAD is not wrong (it detects a specific tampering pattern), but it is redundant — the AEAD check subsumes it and adding a distinct error for the mismatch would create an error-type oracle.
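A minimal Python sketch of ratchet-message AAD assembly, showing the bare label and the sender/recipient swap on the decrypt side. The function names (dm_aad, aad_for_encrypt, aad_for_decrypt) are hypothetical, and header_bytes is assumed to be the output of encode_ratchet_header; this is an illustration, not soliton's API.

```python
DM_LABEL = b"lo-dm-v1"  # 8 raw bytes, concatenated bare (no length prefix)
FP_SIZE = 32            # SHA3-256 identity fingerprint


def dm_aad(sender_fp: bytes, recipient_fp: bytes, header_bytes: bytes) -> bytes:
    """aad = "lo-dm-v1" || sender_fp || recipient_fp || encoded ratchet header."""
    if len(sender_fp) != FP_SIZE or len(recipient_fp) != FP_SIZE:
        raise ValueError("fingerprints must be 32 bytes")
    return DM_LABEL + sender_fp + recipient_fp + header_bytes


def aad_for_encrypt(local_fp: bytes, remote_fp: bytes, header_bytes: bytes) -> bytes:
    # The encrypting (local) party is the sender: (local_fp, remote_fp).
    return dm_aad(local_fp, remote_fp, header_bytes)


def aad_for_decrypt(local_fp: bytes, remote_fp: bytes, header_bytes: bytes) -> bytes:
    # The decrypting party rebuilds the sender's view: (remote_fp, local_fp).
    return dm_aad(remote_fp, local_fp, header_bytes)
```

If the decrypt side passed (local_fp, remote_fp) instead, bytes 8..72 of its AAD would differ from the sender's and the tag check would fail.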

7.4 Deterministic Header Encoding

AAD must be computed identically by sender and recipient. JSON is not suitable (field ordering, whitespace, encoding ambiguity). Headers are encoded as length-prefixed binary.

Length-prefix rule: Identity fingerprints (32 bytes) and X-Wing public keys (1216 bytes) are written bare. Their sizes are fixed by definition and do not change across crypto versions, so the decoder always knows the exact size and needs no length prefix to parse them unambiguously. KEM ciphertexts (1120 bytes in lo-crypto-v1) are length-prefixed even though they are fixed-size in the current version: forward compatibility requires the decoder to handle ciphertexts of a different size from future crypto versions (a lo-crypto-v2 could adopt a different KEM with a different ciphertext size). The crypto_version string is length-prefixed because it is genuinely variable-length. A reimplementer MUST NOT pattern-match "fixed-size → no prefix": ciphertexts are the exception because their size is algorithm-determined, not definitionally invariant.

Exception to the "fixed-size fields bare" rule: DM queue AAD (§11.4.2) uses len(recipient_fp) || recipient_fp, a length-prefixed encoding for the recipient fingerprint even though it is a fixed 32-byte field. This deviation is intentional; see §11.4.2 for the design rationale. A reimplementer who applies the "bare" rule from this section to every fixed-size field will produce wrong AAD in the DM queue context.

encode_session_init(si):

encode_session_init(si) =
    len(si.crypto_version)             || si.crypto_version        // UTF-8, 2-byte BE len
 || si.sender_ik_fingerprint_raw                                   // 32 bytes (fixed, no length prefix —
                                                                   // fingerprints are SHA3-256 digests with
                                                                   // definitionally invariant size; no future
                                                                   // lo-crypto version will change them)
 || si.recipient_ik_fingerprint_raw                                // 32 bytes (fixed, no length prefix — same rationale)
 || si.sender_ek                                                   // 1216 bytes (fixed, no length prefix:
                                                                   // sender_ek is an ephemeral X-Wing public
                                                                   // key, whose size is definitionally fixed;
                                                                   // see the Length-prefix rule above)
 || len(si.ct_ik)                      || si.ct_ik                 // 1120 bytes, 2-byte BE len
                                                                   // (length-prefixed despite being fixed-size in
                                                                   // lo-crypto-v1: KEM ciphertext size is
                                                                   // algorithm-determined, not definitionally
                                                                   // invariant — a future lo-crypto-v2 could
                                                                   // select a different KEM with a different
                                                                   // ciphertext size; a decoder that hard-codes
                                                                   // 1120 bytes would misparse future session inits)
 || len(si.ct_spk)                     || si.ct_spk                // 1120 bytes, 2-byte BE len (same rationale as ct_ik)
 || big_endian_32(si.spk_id)
 || si.has_opk (1 byte: 0x01 or 0x00)
 || if has_opk: len(si.ct_opk)        || si.ct_opk
                || big_endian_32(si.opk_id)
                // When has_opk = 0x00: encoding terminates immediately here.
                // No ct_opk or opk_id bytes are written. The total encoded length
                // with has_opk = 0x00 is exactly 3,543 bytes. A reimplementer who
                // writes zero-filled placeholders (e.g., 2 bytes len + 1120 zero
                // bytes + 4 zero bytes) after the 0x00 flag produces a malformed
                // encoding that fails the strict trailing-bytes check on decode
                // (§7.4 "Trailing bytes after the last field → InvalidData").

All callers of encode_session_init MUST use a single shared implementation: Three separate callers use encode_session_init output: (1) §5.4 Step 6 (Alice signs the encoded bytes); (2) §5.4 Step 7 (Alice uses the encoded bytes as AEAD AAD); (3) §5.5 Step 3 (Bob re-encodes the received SessionInit to verify Alice's signature). All three MUST produce byte-for-byte identical output. Any divergence between the signer (1) and the verifier (3) causes VerificationFailed; any divergence between the signer (1) and the AAD builder (2) causes AeadFailed at decrypt_first_message. A reimplementer who inlines encode_session_init at each call site and introduces any encoding difference — field ordering, padding, prefix conventions — gets a silent failure with no diagnostic pointing to the encoding divergence. The correct pattern: a single encoding function called identically from all three sites.
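A minimal Python sketch of a single shared encoder that all three call sites could use. The SessionInit dataclass and helper names are hypothetical stand-ins for a binding's own struct; field order, widths, and the has_opk termination rule follow the layout above.

```python
import struct
from dataclasses import dataclass
from typing import Optional

FP_SIZE = 32
XWING_PK_SIZE = 1216


@dataclass
class SessionInit:
    crypto_version: str
    sender_ik_fingerprint_raw: bytes
    recipient_ik_fingerprint_raw: bytes
    sender_ek: bytes
    ct_ik: bytes
    ct_spk: bytes
    spk_id: int
    ct_opk: Optional[bytes] = None  # None means has_opk = 0x00
    opk_id: Optional[int] = None


def _len_prefixed(data: bytes) -> bytes:
    """2-byte big-endian length followed by the raw bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("length does not fit in a u16 prefix")
    return struct.pack(">H", len(data)) + data


def encode_session_init(si: SessionInit) -> bytes:
    """Shared encoder for signing (Step 6), AAD (Step 7), and Bob's re-encode (§5.5 Step 3)."""
    if len(si.sender_ik_fingerprint_raw) != FP_SIZE or len(si.recipient_ik_fingerprint_raw) != FP_SIZE:
        raise ValueError("fingerprints must be 32 bytes")
    if len(si.sender_ek) != XWING_PK_SIZE:
        raise ValueError("sender_ek must be 1216 bytes")
    out = bytearray()
    out += _len_prefixed(si.crypto_version.encode("utf-8"))
    out += si.sender_ik_fingerprint_raw      # bare
    out += si.recipient_ik_fingerprint_raw   # bare
    out += si.sender_ek                      # bare: no length prefix
    out += _len_prefixed(si.ct_ik)           # ciphertexts are length-prefixed
    out += _len_prefixed(si.ct_spk)
    out += struct.pack(">I", si.spk_id)      # always 4 bytes, even for 0
    if si.ct_opk is None:
        out += b"\x00"                       # encoding ends here: no placeholders
    else:
        if si.opk_id is None:
            raise ValueError("opk_id is required when ct_opk is present")
        out += b"\x01"
        out += _len_prefixed(si.ct_opk)
        out += struct.pack(">I", si.opk_id)
    return bytes(out)
```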

encode_ratchet_header(rh):

encode_ratchet_header(rh) =
    rh.ratchet_pk                                                  // 1216 bytes (fixed, no length prefix —
                                                                   // ratchet_pk is an X-Wing public key with
                                                                   // a definitionally fixed 1216-byte size
                                                                   // within lo-crypto-v1; fixed-size fields
                                                                   // are written bare per the Length-prefix
                                                                   // rule above)
 || rh.has_kem_ct (1 byte: 0x01 or 0x00)
 || if has_kem_ct: len(rh.kem_ct)     || rh.kem_ct                // 1120 bytes total: X25519_eph_pk (32) || ML-KEM-768_ct (1088), LO X25519-first encoding (§8.1); 2-byte BE len
                                                                   // (length-prefixed despite being fixed-size —
                                                                   // KEM ciphertext size is algorithm-determined,
                                                                   // not definitionally invariant; a future
                                                                   // lo-crypto-v2 could select a different KEM
                                                                   // with a different ciphertext size; the
                                                                   // decoder must not hard-code 1120 bytes)
 || big_endian_32(rh.n)                                        // always present — not conditional on has_kem_ct
 || big_endian_32(rh.pn)                                       // always present — not conditional on has_kem_ct
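The ratchet header layout can be sketched in Python the same way. The function signature below is hypothetical; as in the spec, the encoder only refuses a kem_ct whose length cannot fit the u16 prefix (here via struct.error) and does not check for 1120 bytes.

```python
import struct
from typing import Optional

XWING_PK_SIZE = 1216


def encode_ratchet_header(ratchet_pk: bytes, kem_ct: Optional[bytes], n: int, pn: int) -> bytes:
    """ratchet_pk (bare) || has_kem_ct || [len(kem_ct) || kem_ct] || n || pn."""
    if len(ratchet_pk) != XWING_PK_SIZE:
        raise ValueError("ratchet_pk must be 1216 bytes")
    out = bytearray(ratchet_pk)                         # bare, offsets 0..1215
    if kem_ct is None:
        out += b"\x00"                                  # has_kem_ct at offset 1216
    else:
        out += b"\x01"
        out += struct.pack(">H", len(kem_ct)) + kem_ct  # struct.error if len > 65535
    out += struct.pack(">II", n, pn)                    # always present
    return bytes(out)
```

With these widths a header is 1,225 bytes without a KEM ciphertext and 2,347 bytes with one.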

Signing context: When encode_session_init output is used as the signed message (§5.4 Step 6), the label "lo-kex-init-sig-v1" (18 raw bytes, no length prefix) is prepended: HybridSign(sk, "lo-kex-init-sig-v1" || encode_session_init(si)). When the same output is used as the AAD component (§7.3), no prefix is added — the encoded bytes are embedded directly in the AAD alongside fingerprints and the DM label. A reimplementer reading this section as the encoding reference for what gets signed must include the label prefix; omitting it produces a valid encoding but an invalid signature.

All len() values are 2-byte big-endian. Fixed-size fields (fingerprints at 32 bytes, X-Wing public keys at 1216 bytes) omit length prefixes. The variable-length crypto_version field and all KEM ciphertexts carry 2-byte BE length prefixes; the ciphertexts are fixed-size in lo-crypto-v1 but prefixed for forward compatibility, as explained in the Length-prefix rule above. This encoding is unambiguous, deterministic, and trivial to implement.

Decode validation: On decode, each ciphertext length prefix MUST equal XWING_CIPHERTEXT_SIZE (1120 bytes); any other value → InvalidData (not InvalidLength — this is a wire-format field violation, not a caller-supplied parameter mismatch; see §12 error semantics). A decoder that trusts the u16 prefix without validation would accept malformed blobs with truncated or oversized ciphertexts, leading to incorrect decapsulation inputs. The crypto_version field is validated as "lo-crypto-v1" (exact match); other values → UnsupportedCryptoVersion.

Encode error behavior for wrong-size kem_ct: encode_ratchet_header's only hard constraint on kem_ct is that its length fits in the u16 length prefix. If ct.len() > 65535, the function returns Internal, because the prefix cannot represent the value. If ct.len() <= 65535 but is not 1120, the encoder writes the actual length into the 2-byte prefix, writes all the ciphertext bytes, and returns no error. The wrong-size ciphertext is caught only on the decode path: the decoder requires the length prefix to equal XWING_CIPHERTEXT_SIZE (1120 bytes) and returns InvalidData for any other value (see "Decode validation" above). A reimplementer who expects the encoder to return Internal for every wrong-size ciphertext, not just the > 65535 case, will wrongly treat the encoder as a safety net for incorrect X-Wing usage upstream. The correct invariant: the encode path does not validate ciphertext sizes; the decode path does.

sender_ek is bare while ciphertexts are length-prefixed (encoding boundary hazard): In encode_session_init, sender_ek (1216 bytes, starting at offset 78, after the 14-byte crypto_version field and the two 32-byte fingerprints) is written with no length prefix, but the immediately following ct_ik carries a 2-byte BE length prefix. A reimplementer who reads the format as "keys and ciphertexts both have prefixes" and adds a 2-byte prefix to sender_ek shifts every subsequent field by 2 bytes. A conforming decoder rejects such a blob as InvalidData, since it is 2 bytes longer than the 3,543- or 4,669-byte totals; conversely, a decoder with the same bug, parsing a conforming blob, reads the first two bytes of sender_ek as len(sender_ek) and desynchronizes every later field. No prefix is added to sender_ek because its size is definitionally invariant (X-Wing public keys are always 1216 bytes); the length prefix on ct_ik (and on all three ciphertexts) exists specifically because KEM ciphertext sizes are algorithm-determined and may change across future crypto_version values. The asymmetry is intentional; see the Length-prefix rule above.

Total encoded sizes for encode_session_init: Without OPK (has_opk = 0x00): 3,543 bytes total (2 + 12 + 32 + 32 + 1216 + 2 + 1120 + 2 + 1120 + 4 + 1 = 3,543). With OPK (has_opk = 0x01): 4,669 bytes total (3,543 + 2 + 1120 + 4 = 4,669). Decoders can use these totals as a quick-reject check before field-by-field parsing: any input not equal to 3,543 or 4,669 bytes MUST be rejected as InvalidData without further parsing. A decoder that accepts inputs of any size and relies solely on field-by-field parsing would accept truncated inputs that parse successfully up to a short point (e.g., a blob of 14 bytes matches the len(crypto_version) || crypto_version prefix), masking truncation bugs in test environments.

Progressive parsing note: The has_opk flag is at offset 3542 — the last byte of the fixed-size prefix. A streaming/progressive parser cannot determine the total session_init_bytes length until it has consumed all 3543 bytes. The usual trick of reading a length prefix from the first few bytes does not work here; the format is self-delimiting only after the fixed prefix is fully consumed. For encode_ratchet_header, the has_kem_ct flag is at offset 1216 (immediately after ratchet_pk, which occupies bytes 0-1215), similarly requiring the full fixed prefix.
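The totals and offsets above follow mechanically from the field widths. One way to keep a decoder's quick-reject check in sync with the layout is to derive them in code rather than hard-coding them; a Python sketch (all names hypothetical, widths valid for lo-crypto-v1's 12-byte version string only):

```python
# Field widths in wire order, taken from the encode_* layouts in this section.
SESSION_INIT_FIELDS = [
    ("len(crypto_version)", 2),
    ("crypto_version", len(b"lo-crypto-v1")),  # 12 bytes
    ("sender_ik_fingerprint_raw", 32),
    ("recipient_ik_fingerprint_raw", 32),
    ("sender_ek", 1216),
    ("len(ct_ik)", 2), ("ct_ik", 1120),
    ("len(ct_spk)", 2), ("ct_spk", 1120),
    ("spk_id", 4),
    ("has_opk", 1),
]
OPK_TAIL = [("len(ct_opk)", 2), ("ct_opk", 1120), ("opk_id", 4)]
RATCHET_HEADER_FIELDS = [("ratchet_pk", 1216), ("has_kem_ct", 1)]


def layout(fields):
    """Return ({field: start offset}, total length) for fields laid out back to back."""
    offsets, pos = {}, 0
    for name, width in fields:
        offsets[name] = pos
        pos += width
    return offsets, pos


SI_OFFSETS, SI_FIXED_LEN = layout(SESSION_INIT_FIELDS)
SI_VALID_LENGTHS = {SI_FIXED_LEN, SI_FIXED_LEN + sum(w for _, w in OPK_TAIL)}
RH_OFFSETS, RH_FIXED_LEN = layout(RATCHET_HEADER_FIELDS)


def session_init_quick_reject(blob: bytes) -> bool:
    """True if the blob cannot be a lo-crypto-v1 SessionInit (reject as InvalidData)."""
    return len(blob) not in SI_VALID_LENGTHS
```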

Boolean marker byte strictness: The has_opk and has_kem_ct fields accept only 0x00 (absent) or 0x01 (present). Any other value → InvalidData. Decoders MUST NOT treat arbitrary non-zero values as "present" — doing so accepts malformed blobs and creates format malleability (multiple byte values encode the same logical state). Trailing bytes after the last field → InvalidData (strict parsing, same rationale as §6.8 guard 11).

Fixed-width integer re-encoding MUST be lossless: All fixed-width integer fields — spk_id, opk_id, n, pn (u32, 4 bytes each) — MUST re-encode at their full fixed width as big-endian, regardless of value. A field containing 0 MUST produce four zero bytes (0x00 0x00 0x00 0x00), not an empty field or a variable-length encoding. A Python reimplementer who parses spk_id = 0 into a Python int and re-encodes with a variable-length BE encoder (which might produce b'' for zero) produces different bytes from the original encoding, yielding a different AAD and permanent AeadFailed with no diagnostic. The "byte-for-byte identical" guarantee in §7.3 includes fixed-width integers, not only variable-length fields.
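In Python the pitfall looks like this: struct.pack with a fixed ">I" format always emits four bytes, while a minimal-length int.to_bytes encoding (shown only as the anti-pattern; both helper names are hypothetical) drops leading zero bytes and turns 0 into an empty string.

```python
import struct


def encode_u32_fixed(value: int) -> bytes:
    """Correct for spk_id, opk_id, n, pn: always 4 big-endian bytes."""
    return struct.pack(">I", value)  # raises struct.error outside 0..2**32-1


def encode_int_minimal(value: int) -> bytes:
    """WRONG for this format: minimal-length big-endian, so 0 becomes b''."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")
```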

Truncated input: If the input is too short to contain all required fields, the decoder returns InvalidData (not InvalidLength). This includes the case where a length prefix claims more bytes than remain in the buffer — the decoder must not read past the end. Using InvalidLength would leak parser state — an attacker could probe incrementally longer inputs and observe the error transition from InvalidLength to InvalidData, revealing the byte offset where parsing progressed past the size check.
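The decode rules in this section (ciphertext prefix must equal 1120, strict 0x00/0x01 markers, no trailing bytes, every short read reported as InvalidData) can be collected in one bounds-checked reader. A minimal Python sketch for the ratchet header; the InvalidData exception and helper names are hypothetical Python stand-ins for the spec's error variants. This is also where a wrong-size kem_ct that the encoder let through gets rejected.

```python
import struct
from typing import Optional, Tuple

XWING_PK_SIZE = 1216
XWING_CIPHERTEXT_SIZE = 1120


class InvalidData(Exception):
    """Stand-in for the spec's InvalidData error."""


class _Reader:
    """Bounds-checked cursor: a short read is InvalidData, never InvalidLength."""

    def __init__(self, buf: bytes):
        self.buf, self.pos = buf, 0

    def take(self, n: int) -> bytes:
        if len(self.buf) - self.pos < n:
            raise InvalidData("truncated input")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def flag(self) -> bool:
        b = self.take(1)[0]
        if b not in (0x00, 0x01):  # no "any non-zero means present"
            raise InvalidData("boolean marker must be 0x00 or 0x01")
        return b == 0x01

    def finish(self) -> None:
        if self.pos != len(self.buf):
            raise InvalidData("trailing bytes")


def decode_ratchet_header(buf: bytes) -> Tuple[bytes, Optional[bytes], int, int]:
    r = _Reader(buf)
    ratchet_pk = r.take(XWING_PK_SIZE)
    kem_ct = None
    if r.flag():
        (ct_len,) = struct.unpack(">H", r.take(2))
        if ct_len != XWING_CIPHERTEXT_SIZE:
            raise InvalidData("kem_ct length prefix must be 1120")
        kem_ct = r.take(ct_len)
    n, pn = struct.unpack(">II", r.take(8))
    r.finish()
    return ratchet_pk, kem_ct, n, pn
```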

8. X-Wing KEM Details

8.1 Encoding (LO-specific)

LO uses X25519-first encoding (diverges from draft-09 which uses ML-KEM-first):

// LO encoding (X25519-first):
X-Wing public key (1216 B):  X25519_pk (32) || ML-KEM-768_pk (1184)
X-Wing secret key (2432 B):  X25519_sk (32) || ML-KEM-768_sk (2400)
X-Wing ciphertext (1120 B):  X25519_eph_pk (32) || ML-KEM-768_ct (1088)

// draft-09 encoding (ML-KEM-first) — for contrast only; LO does NOT use this:
//   public key:  ML-KEM-768_pk (1184) || X25519_pk (32)
//   secret key:  ML-KEM-768_sk (2400) || X25519_sk (32)
//   ciphertext:  ML-KEM-768_ct (1088) || X25519_eph_pk (32)

Interoperability consequence: A reimplementer who uses draft-09's ML-KEM-first layout instead of LO's X25519-first layout produces public keys, ciphertexts, and secret keys whose component order is swapped. Encapsulation "succeeds" (the total lengths match, so there is no length error), but the X-Wing code then slices the wrong bytes: pk_X and ct_X are taken from the first 32 bytes of the ML-KEM-768 components, and the ML-KEM-768 half operates on a misaligned window, producing a wrong shared secret. The mismatch surfaces only as AeadFailed at the AEAD layer, with no indication of which byte offset was misinterpreted. If interoperating with a draft-09-compatible library, both parties must explicitly reorder the concatenation; LO's test suite includes a KAT that reorders a draft-09 vector into LO's X25519-first layout before decapsulation.

This encoding difference is internal only: no external interop with draft-09 implementations is required. Combiner inputs are extracted from the correct offsets for LO's layout, so the combined shared secret is identical to what draft-09 computes for the same component keys.

Canonical byte representation: The X25519 component of an X-Wing public key is the raw 32-byte little-endian u-coordinate, with no bit masking applied to the public key bytes. Some X25519 libraries clear bit 255 (the top bit of byte 31) of the public key; such a library produces a different 32-byte public key than soliton's, causing silent SPK signature verification failure (the signed bytes differ from the stored/verified bytes). Reimplementers MUST verify that their X25519 library does not mask public key bits.

Clamping at use time, not storage time: Only the private scalar is clamped (RFC 7748 §5), and clamping is applied at use time, inside the X25519 scalar-multiplication operation (§8.2). The secret key bytes in the 2432-byte X-Wing blob are unclamped raw random bytes. A reimplementer who pre-clamps the scalar at storage time and clamps again at use still computes the correct result, because clamping is idempotent: after the first clamp, bits 0-2 of byte 0 are already 0, bit 7 of byte 31 is already 0, and bit 6 of byte 31 is already 1. The stored bytes, however, differ from soliton's unclamped format, causing silent key import failures when round-tripping through the 2432-byte serialization.
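A minimal Python sketch of RFC 7748 §5 clamping applied to a copy of the stored scalar at use time; the function name is illustrative. The stored 32 bytes are never modified, which is the raw-storage rule above, and applying the function twice gives the same result as applying it once.

```python
def clamp(scalar: bytes) -> bytes:
    """RFC 7748 §5 clamping; returns a clamped copy and leaves the stored bytes untouched."""
    if len(scalar) != 32:
        raise ValueError("X25519 scalars are 32 bytes")
    k = bytearray(scalar)
    k[0] &= 0xF8   # clear bits 0-2
    k[31] &= 0x7F  # clear bit 255 (bit 7 of byte 31)
    k[31] |= 0x40  # set bit 254 (bit 6 of byte 31)
    return bytes(k)
```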

ML-KEM-768 public key coefficient reduction happens inside Encaps, not at from_bytes: EncapsulationKey::from_bytes is a size check only; it does not normalize coefficients. Coefficient reduction (FIPS 203 §7.2 ByteDecode_12, which silently reduces any coefficient ≥ 3329 modulo q) occurs inside ML-KEM-768.Encaps() when the encapsulation key bytes are imported for use. The practical implication: a round-trip byte-comparison test (from_bytes then to_bytes, compared to the original) will NOT detect normalization incompatibilities, because from_bytes returns the stored bytes verbatim. Only a shared-secret KAT (encapsulate, decapsulate, compare shared secrets) detects normalization divergence. A foreign library that stores unreduced coefficients produces encapsulation keys that yield a different shared secret after Encaps; this surfaces as AeadFailed at the AEAD layer with no indication that the key was modified. Reimplementers importing ML-KEM-768 encapsulation keys from external libraries MUST verify via KAT, not via byte comparison. The cross-check from §8.5 also applies: re-derive the public key from the decapsulation key and compare the ek_PKE bytes; a mismatch indicates encoding-domain divergence (NTT vs. coefficient domain).

The same divergence affects SPK signature verification. The IK signature over the SPK (produced by HybridSign in §5.3, §10.2) covers the raw 1,216-byte SPK public key bytes as stored. If a reimplementer's ML-KEM library normalizes coefficients on import (modifying the byte representation), the bytes the reimplementer signs or verifies differ from the raw bytes stored and transmitted by the reference implementation. HybridVerify (§5.5 Step 3 / §5.3) then returns VerificationFailed even when the SPK is cryptographically valid, because the signed bytes and the verified bytes are different normalized representations of the same underlying key. Normalization divergence therefore causes bundle authentication failure even when no tampering occurred.
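To see why a non-canonical key changes once it is decoded, here is a pure-Python illustration of FIPS 203's 12-bit packing (ByteDecode_12 / ByteEncode_12), the format of the first 1,152 bytes of an ML-KEM-768 encapsulation key (3 polynomials × 384 bytes; the final 32 bytes are the seed ρ). It is a sketch for understanding the normalization, with hypothetical helper names; it is not an ML-KEM implementation and does not replace the shared-secret KAT required above.

```python
Q = 3329  # ML-KEM modulus q


def byte_decode_12(data: bytes) -> list:
    """FIPS 203 ByteDecode_12: unpack little-endian 12-bit values, reducing each mod q."""
    if len(data) % 3:
        raise ValueError("12-bit packing uses whole 3-byte groups")
    coeffs = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        coeffs.append((b0 | ((b1 & 0x0F) << 8)) % Q)
        coeffs.append(((b1 >> 4) | (b2 << 4)) % Q)
    return coeffs


def byte_encode_12(coeffs: list) -> bytes:
    """FIPS 203 ByteEncode_12 for coefficients already in [0, q)."""
    out = bytearray()
    for c0, c1 in zip(coeffs[0::2], coeffs[1::2]):
        out += bytes([c0 & 0xFF, (c0 >> 8) | ((c1 & 0x0F) << 4), c1 >> 4])
    return bytes(out)


def is_canonical_12(data: bytes) -> bool:
    """True iff decoding then re-encoding reproduces the input bytes exactly."""
    return byte_encode_12(byte_decode_12(data)) == data
```

For example, the bytes ff ff ff pack two 12-bit values of 4095, which decode to 766 (4095 mod 3329) and re-encode to different bytes. A size-check-only import keeps the original bytes, so only decoding exposes the change.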

8.2 Combiner (draft-09 §5.3)

Version pinning: lo-crypto-v1 is pinned to draft-connolly-cfrg-xwing-kem-09. Any future revision that alters the combiner construction, including a published RFC that differs from draft-09, requires a new crypto_version string (e.g., "lo-crypto-v2"). The XWingLabel bytes and the ss_M ‖ ss_X ‖ ct_X ‖ pk_X argument order are draft-09-specific; a reimplementer using a different draft or the final RFC MUST verify that these values match before relying on this spec.

function XWing.Combine(ss_M, ss_X, ct_X, pk_X):
    return SHA3-256(ss_M || ss_X || ct_X || pk_X || XWingLabel)

XWingLabel = 0x5c 0x2e 0x2f 0x2f 0x5e 0x5c   // ASCII: \.//^\  (6 bytes, label goes LAST)

ss_M = ML-KEM-768 shared secret (32 bytes)
ss_X = X25519 DH output (32 bytes)
ct_X = ephemeral X25519 public key (32 bytes) — ciphertext[0..32] in LO's X25519-first encoding (§8.1)
pk_X = recipient X25519 public key (32 bytes) — public_key[0..32] in LO's encoding
c_M = ML-KEM-768 ciphertext (1088 bytes) — ciphertext[32..1120]. Not in the combiner formula — c_M is bound inside ss_M via ML-KEM's implicit rejection (the pseudorandom SS depends on the ciphertext).
pk_M = ML-KEM-768 public key (1184 bytes) — public_key[32..1216]. Not in the combiner formula — pk_M is bound inside ss_M on both sides: on the decapsulator side, ss_M is derived from the decapsulation key, which embeds pk_M in its ek_PKE field (§8.5); on the encapsulator side, ss_M = ML-KEM-768.Encaps(pk_M, randomness) directly consumes pk_M as an input — encapsulation is a function of the public key, so pk_M is bound to ss_M there as well. This follows draft-09 §5.3.
Hash: SHA3-256
Label position: last (changed from draft-06 which had label first)
ss_M ‖ ss_X argument order: The ss_M ‖ ss_X order is fixed by draft-09 §5.3 — it is not a local choice. Swapping them produces a different SHA3-256 output with no error signal.
Total SHA3-256 input length: 134 bytes (32 + 32 + 32 + 32 + 6 = 134). SHA3-256's rate is 136 bytes, so all 134 bytes plus padding are absorbed in a single Keccak block. A reimplementer who miscounts the input (for example, by including pk_M or c_M, which the entries above exclude from the combiner) produces a different hash output with no error at the hash primitive layer.

pk_X during decapsulation: The combiner requires pk_X (the recipient's X25519 public key), but the decapsulation key contains only sk_X. The decapsulator re-derives pk_X via X25519(sk_X, G) (scalar-basepoint multiplication) each time — no separate public key storage or input is needed. G is the X25519 base point defined in RFC 7748 §6.1: the u-coordinate value 9 encoded as a 32-byte little-endian integer (09 00 00 00 ... 00). X25519 libraries expose this as their scalarmult_base operation — use the library's base-point function rather than encoding G manually. This matches soliton's secret key layout (§8.5): only the X25519 scalar and ML-KEM expanded key are stored.

The label bytes decode as: \ (0x5c), . (0x2e), / (0x2f), / (0x2f), ^ (0x5e), \ (0x5c) = \.//^\.

SHA3-256 input must be one concatenated byte string, with no separators or length prefixes between fields: The five combiner inputs (ss_M, ss_X, ct_X, pk_X, XWingLabel) are concatenated as raw bytes with no separators, no length prefixes, and no delimiter bytes between them. The total input is exactly 134 bytes (32 + 32 + 32 + 32 + 6). Some hash APIs accept multiple buffers via repeated update() calls or a variadic array. With a plain streaming SHA3-256 context, repeated update() calls are equivalent to a single call on the concatenation, because the sponge absorbs the same byte stream regardless of chunk boundaries. However, some framework wrappers and structured hash APIs (tree-hash modes, protocol-framing helpers, "typed" hash APIs) insert domain-separation bytes or length prefixes between update() calls; using such an API produces a different SHA3-256 output even when every individual field is correct. The correct call is either: (a) concatenate all five fields into a 134-byte buffer and call SHA3-256 once, or (b) use a streaming SHA3-256 context with raw update() calls and no wrappers or domain separation. Verification: compute SHA3-256(ss_M || ss_X || ct_X || pk_X || label) on the known test inputs from Appendix F.3 and compare against the expected output.
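The difference between options (a), (b), and a framing wrapper is easy to check with Python's hashlib. In this sketch sha3_framed_wrong is a hypothetical anti-pattern that length-prefixes each update(), modelling the kind of wrapper described above; the other two helpers correspond to options (a) and (b).

```python
import hashlib
import struct


def sha3_one_shot(parts) -> bytes:
    # Option (a): concatenate everything, hash once.
    return hashlib.sha3_256(b"".join(parts)).digest()


def sha3_streaming(parts) -> bytes:
    # Option (b): raw update() calls; chunk boundaries do not affect the digest.
    h = hashlib.sha3_256()
    for p in parts:
        h.update(p)
    return h.digest()


def sha3_framed_wrong(parts) -> bytes:
    # Anti-pattern: a framing helper that length-prefixes each update().
    h = hashlib.sha3_256()
    for p in parts:
        h.update(struct.pack(">H", len(p)))
        h.update(p)
    return h.digest()
```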

Compile-time assertion verifies all six label bytes.

XWing.Combine MUST be internal-only — not a public API: The combiner takes raw ss_M and ss_X values as inputs and returns a SHA3-256 hash — it performs no key validation, no randomness checks, and no binding to a specific encapsulation operation. Exposing it as a callable public function allows a caller to supply arbitrary ss_X values (e.g., all-zero, repeated from a prior session, or attacker-controlled), which breaks the IND-CCA2 security guarantee of the combined scheme. The X-Wing security proof assumes ss_X is the genuine DH output from encapsulation or decapsulation — not a caller-supplied value. Implementations MUST call the combiner exclusively from within XWing.Encapsulate and XWing.Decapsulate, where ss_X is derived from a fresh ephemeral scalar (encapsulation) or from the peer's ephemeral public key and the stored secret key (decapsulation). The CAPI does NOT expose soliton_xwing_combine; the Rust API exposes XWing.Combine as pub(crate) only. Binding authors MUST NOT promote this to a public function.

The SharedSecret returned by XWing.Encapsulate / XWing.Decapsulate MUST be consumed immediately by KDF_Root and zeroized: The 32-byte shared secret (ss) is the output of XWing.Combine and is secret key material of the same sensitivity as the other inputs to KDF_Root. A binding author who returns the raw ss to callers for "custom KDF use" creates the same risk as exposing XWing.Combine directly: the caller can feed ss into arbitrary downstream operations, bypassing the key hierarchy defined in §5.4 / §6.4. In Rust, xwing::SharedSecret is neither Clone nor Copy; it cannot be extracted without a deliberate as_bytes() call, and soliton never calls as_bytes() on the combined output outside KDF_Root. Binding authors MUST ensure that ss is passed directly into KDF_Root at the call site and zeroized before the encapsulation/decapsulation function returns. Do not buffer, log, or expose it through any intermediate field.

End-to-end encapsulation and decapsulation pseudocode:

function XWing.Encapsulate(pk):
    // pk layout (§8.1): pk_X(32) || pk_M(1184)
    pk_X = pk[0..32]
    pk_M = pk[32..1216]

    // X25519 half: generate ephemeral scalar, compute shared secret and ephemeral pk
    eph_sk = random_bytes(32)                      // 32 raw CSPRNG bytes; do NOT pre-clamp — clamping is applied
                                                   // internally by the X25519 call per §8.5 (stored raw).
                                                   // `random_bytes(32)` is used (not `random_scalar()`) to
                                                   // avoid library functions that return pre-clamped scalars;
                                                   // storing a pre-clamped scalar violates the raw-bytes
                                                   // storage requirement (§8.5).
    ct_X   = X25519(eph_sk, G)                    // ephemeral public key (32 bytes); G = RFC 7748 §6.1 base point (u-coordinate 9)
    ss_X   = X25519(eph_sk, pk_X)                 // DH output (32 bytes); if the library rejects
                                                   // all-zero output (low-order pk_X), substitute
                                                   // [0u8; 32] — same rule as Decapsulate (§8.3)

    // ML-KEM-768 half
    // FIPS 203 §7.2 draws m ← B^32 (32 random bytes) internally before encapsulation.
    // Deterministic-API callers (e.g., ml-kem crate's `encapsulate_deterministic`) must
    // supply this m explicitly via `random_b32()`. Passing a fixed, zero, or reused m
    // produces structurally valid ciphertexts but silently breaks IND-CCA2 — an attacker
    // who can predict m can recover the shared secret. Each Encaps call MUST use a fresh
    // 32-byte CSPRNG value; reuse across calls is not detectable at the API level.
    (ct_M, ss_M) = ML-KEM-768.Encaps(pk_M)       // ct_M = 1088 bytes, ss_M = 32 bytes

    // Assemble ciphertext: X25519-first (§8.1)
    ct = ct_X || ct_M                              // 32 + 1088 = 1120 bytes

    // Combine
    ss = XWing.Combine(ss_M, ss_X, ct_X, pk_X)   // 32 bytes

    zeroize(eph_sk, ss_X, ss_M)
    return (ct, ss)

function XWing.Decapsulate(sk, ct):
    // sk layout (§8.1, §8.5): sk_X(32) || dk_M(2400)
    sk_X = sk[0..32]
    dk_M = sk[32..2432]

    // ct layout (§8.1): ct_X(32) || ct_M(1088)
    ct_X = ct[0..32]
    ct_M = ct[32..1120]

    // X25519 half: re-derive pk_X from sk_X (no stored copy needed)
    pk_X = X25519(sk_X, G)                        // scalar-basepoint multiplication
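    ss_X = X25519(sk_X, ct_X)                     // DH output (32 bytes); if the library rejects
                                                   // all-zero output (low-order ct_X), substitute
                                                   // [0u8; 32], same rule as Encapsulate (§8.3)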

    // ML-KEM-768 half (implicit rejection: always returns a shared secret, never fails)
    ss_M = ML-KEM-768.Decaps(dk_M, ct_M)         // 32 bytes

    // Combine — uses re-derived pk_X, not any value from the ciphertext
    ss = XWing.Combine(ss_M, ss_X, ct_X, pk_X)   // 32 bytes

    zeroize(ss_X, ss_M)
    return ss

pk_X re-derivation in decapsulation: The combiner requires pk_X (the decapsulator's own X25519 public key), but soliton stores only the X25519 scalar sk_X in the secret key (§8.5 — no separate public key is stored). The decapsulator computes pk_X = X25519(sk_X, G) (scalar-basepoint multiplication) on every decapsulation call. A reimplementer who passes ct_X (the encapsulator's ephemeral key from the ciphertext) as pk_X to the combiner produces a wrong shared secret — ct_X is the ephemeral encapsulator key, not the decapsulator public key. This is the most common error in X-Wing implementations.

Clamping requirement for pk_X re-derivation in non-auto-clamping libraries: The re-derivation pk_X = X25519(sk_X, G) requires RFC 7748 clamping applied to sk_X before the scalar multiply. Libraries that apply clamping automatically at every X25519 call (such as x25519-dalek) handle this transparently — the raw stored scalar is passed in and clamped internally. Libraries that require explicit pre-clamping (raw Montgomery ladders, some low-level crypto primitives) MUST have clamping applied before each X25519 call per §8.5's "Portability note." A non-clamping library computing X25519(raw_sk_X, G) without clamping produces a different public key for scalars where the low 3 bits or bits 254/255 differ from their clamped values — specifically, scalars where any of bits 0, 1, 2, 255 are set or bit 254 is clear. The mismatch between the re-derived pk_X and the actual public key causes the combiner to produce a wrong shared secret, and the AEAD fails silently. To verify clamping correctness: X25519(sk_X, G) with your library MUST equal public_key[0..32] (the stored X25519 public key). See §8.5 for the full portability note.
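To make the re-derivation and clamping points concrete, here is a pure-Python RFC 7748 Montgomery ladder with an apply_clamp switch that models a non-auto-clamping library, plus the X25519 half of encapsulation and decapsulation. This is a sketch only: it is not constant-time, it omits the ML-KEM half and the combiner, and all function names are hypothetical.

```python
P = 2**255 - 19
A24 = 121665                             # (486662 - 2) / 4, RFC 7748 §5
BASE_POINT = (9).to_bytes(32, "little")  # G: u-coordinate 9


def clamp(k: bytes) -> bytes:
    b = bytearray(k)
    b[0] &= 0xF8
    b[31] &= 0x7F
    b[31] |= 0x40
    return bytes(b)


def x25519(k: bytes, u: bytes, apply_clamp: bool = True) -> bytes:
    """RFC 7748 ladder. apply_clamp=False models a library that never clamps. NOT constant-time."""
    scalar = int.from_bytes(clamp(k) if apply_clamp else k, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)  # decodeUCoordinate masks input bit 255
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def x25519_half_encap(eph_sk: bytes, pk_x: bytes):
    """Encapsulator side: returns (ct_X, ss_X)."""
    return x25519(eph_sk, BASE_POINT), x25519(eph_sk, pk_x)


def x25519_half_decap(sk_x: bytes, ct_x: bytes):
    """Decapsulator side: re-derives pk_X from sk_X; returns (pk_X, ss_X)."""
    return x25519(sk_x, BASE_POINT), x25519(sk_x, ct_x)
```

The value the combiner needs from the decapsulator is the re-derived pk_X, which differs from ct_X; and a raw, unclamped scalar generally yields a different public key than the clamped one.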

8.3 Low-Order X25519 Points

X25519 DH with a low-order public key produces an all-zeros output. LO uses the all-zeros value rather than rejecting the point. The all-zero check MUST be constant-time: the DH output is secret material before the check executes, so a variable-time comparison leaks one bit of information about the relationship between the ephemeral private key and the recipient's public key. The reference uses subtle::ConstantTimeEq against [0u8; 32]; see also the Constant-Time Requirements table in Appendix E. The SHA3-256 combiner absorbs this result alongside the ML-KEM shared secret and label; the full combiner output is secure regardless. Rejecting low-order points would allow an attacker with a malicious pre-key bundle to force session initiation to fail without providing any security benefit.

Why the all-zero check is sufficient for all low-order points: Curve25519 has 8 points of order dividing 8 (the cofactor), corresponding to points of order 1, 2, 4, or 8. RFC 7748 §5 clamps the scalar by clearing its three low bits (making it a multiple of 8). Multiplying any of these 8 low-order torsion points by a multiple-of-8 scalar produces the group identity. On Curve25519 in Montgomery form, the identity element has u-coordinate 0 — represented as the all-zero 32-byte string. Therefore, for any of the 8 low-order input points, the clamped scalar multiplication produces [0u8; 32]. The all-zero check is both necessary and sufficient: any low-order public key → all-zero output, and (with overwhelming probability) all-zero output → the input was a low-order point. A reimplementer who checks for a specific set of known low-order points by value (e.g., maintaining a hardcoded list of the 8 torsion points) is over-engineering — the all-zero output test is the complete check, since clamping makes it impossible for any non-torsion point to produce an all-zero DH output.

Implementation mechanism (error-catch-and-replace, not pre-filtering): The correct implementation calls the X25519 DH function normally and does NOT pre-filter low-order input points. If the DH function signals an error or returns all-zero output, the caller uses [0u8; 32] as the X25519 shared secret. In soliton's Rust implementation, x25519::dh() rejects all-zero output by returning Err(DecapsulationFailed), and the X-Wing encapsulate/decapsulate layer catches that error and substitutes [0u8; 32] via .unwrap_or([0u8; 32]). The underlying x25519_dalek crate does not itself reject low-order points; soliton's wrapper adds the check. Other X25519 libraries behave differently on low-order input: (1) some return an error: catch it and substitute [0u8; 32]; (2) some panic: catch the panic where the language allows it and substitute [0u8; 32]; (3) some silently return [0u8; 32]: no substitution is needed because the value is already correct, and adding a check that turns this silent success into an error is wrong. Catching the library's error or panic, and otherwise passing the output through unchanged, handles all three behaviors. An additional all-zero-output check that substitutes [0u8; 32] is harmless, because replacing zeros with zeros is a no-op. The substitution is always the X-Wing layer's responsibility, not the X25519 primitive's.
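The two-layer pattern can be sketched in Python as follows. The `raw` callable is a hypothetical stand-in for whatever X25519 function your platform provides, assumed here to behave like x25519-dalek (silently returning 32 zero bytes for low-order input); if your library raises its own error instead, map that error to `DecapsulationFailed` inside `x25519_dh`. `hmac.compare_digest` is used for the zero check because the output is secret at that point, although Python offers no hard constant-time guarantee.

```python
import hmac
from typing import Callable

ZERO32 = bytes(32)
# Hypothetical library call: (secret scalar, peer public key) -> 32-byte output
RawX25519 = Callable[[bytes, bytes], bytes]


class DecapsulationFailed(Exception):
    """Stand-in for the primitive-level error described in the text."""


def x25519_dh(raw: RawX25519, sk: bytes, pk: bytes) -> bytes:
    """Primitive layer: reject an all-zero output (mirrors x25519::dh())."""
    out = raw(sk, pk)
    # `out` is secret until this check finishes, so avoid a short-circuiting ==.
    if hmac.compare_digest(out, ZERO32):
        raise DecapsulationFailed("all-zero X25519 output")
    return out


def xwing_x25519_secret(raw: RawX25519, sk: bytes, pk: bytes) -> bytes:
    """X-Wing layer: no input pre-filtering; turn the rejection into zeros."""
    try:
        return x25519_dh(raw, sk, pk)
    except DecapsulationFailed:
        return ZERO32  # equivalent of .unwrap_or([0u8; 32])
```

The rejection lives in the primitive and the substitution in the X-Wing layer, so a caller that uses `x25519_dh` directly still sees an error, while the KEM never does.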

Scope: This no-rejection policy applies exclusively inside X-Wing encapsulate/decapsulate (§8.2). LO-Auth (§4) is covered because it uses the full X-Wing KEM rather than standalone X25519 DH; the ML-KEM-768 component provides security even if X25519 produces an all-zero output. The standalone x25519::dh() function is used only internally within X-Wing and is not exposed as a protocol-level primitive. It DOES reject all-zero output by returning an error, and the X-Wing layer converts that error into [0u8; 32]. A reimplementer who adds a zero-output rejection inside X-Wing's internal X25519 step and propagates the error, rather than substituting zeros, breaks interop with degenerate-but-secure key exchanges.

8.4 ML-KEM Implicit Rejection

ML-KEM-768 decapsulation implements FIPS 203 implicit rejection: invalid ciphertexts produce a pseudorandom shared secret rather than an error. Authentication failure surfaces only at the AEAD layer (wrong shared secret → wrong session/epoch key → AEAD tag mismatch). Implementations must not add explicit ciphertext validation that could create a timing oracle distinguishing valid from invalid ML-KEM ciphertexts.

X25519 does not provide implicit rejection on its own: it produces a 32-byte output for any 32-byte input, including malformed or attacker-chosen public keys. The combined X-Wing shared secret is still pseudorandom on invalid ciphertext because ML-KEM's implicit rejection (FIPS 203 §7.3), which returns a pseudorandom key derived from the secret rejection seed z and the ciphertext, is absorbed by the SHA3-256 combiner. A reimplementer constructing a non-standard X-Wing variant that removes or replaces the ML-KEM component would silently break this property: an X25519-only combiner output would be attacker-influenced rather than pseudorandom.

8.5 Secret Key Storage

LO stores the fully expanded X-Wing secret key (2432 bytes) rather than the 32-byte seed format specified in draft-09. This avoids re-running SHAKE256 key expansion on every decapsulate. This applies to all X-Wing secret keys in soliton — identity IK, signed pre-keys (SPK), one-time pre-keys (OPK), and ratchet keys are all stored in expanded 2432-byte form. The 32-byte seed form is never used for storage regardless of key type. The extra 2400 bytes per key is negligible given the 2496-byte composite key.

X25519 scalar sk_X is stored as raw bytes; clamping is applied at use time: The 32-byte sk_X field in the stored secret key is the raw random scalar from key generation (or from SHAKE256 seed expansion), before RFC 7748 clamping. Clamping (clear bit 255, set bit 254, clear the three low bits of byte 0) is applied by the underlying library at the time of each X25519 operation (X25519(sk_X, G) and X25519(sk_X, peer_pk)). The stored bytes are NOT pre-clamped. A reimplementer who clamps sk_X before writing it into the secret key blob gets identical X25519 results, because clamping is idempotent and re-clamping at use time changes nothing, but the serialized secret key differs byte-for-byte from the reference: stored-blob comparisons and test vectors built from the reference's bytes fail.

Portability note — libraries that require explicit pre-clamping: Some X25519 libraries do not apply clamping internally and require the caller to pre-clamp the scalar before each use (e.g., a raw Montgomery ladder that takes the scalar as-is). When integrating such a library: (1) do NOT clamp before storage — store the raw random bytes as specified; (2) DO clamp before each X25519 call — read the 32 stored bytes, apply RFC 7748 clamping in a temporary variable, pass the clamped bytes to the library, and zeroize the temporary immediately after. This "clamp at use time" pattern matches the reference implementation's semantics even when the library doesn't clamp automatically. Libraries that clamp automatically (including x25519-dalek used by the reference) make this transparent — the stored raw scalar is passed directly and clamped internally. A reimplementer who passes the raw stored scalar to a non-clamping library produces the wrong DH output (the unclamped scalar may have the low bits set, changing the scalar value and thus the DH result), causing silent AeadFailed at the AEAD layer. To verify: compute X25519(sk_X, G) using the library and compare against public_key[0..32] — a mismatch indicates clamping divergence.
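The "clamp at use time" pattern can be illustrated with a pure-Python RFC 7748 Montgomery ladder that, like the low-level libraries described above, does not clamp on its own. This is a teaching sketch only: it is not constant-time and should not be used for real keys. The function names are hypothetical; the stored `sk_x_stored` bytes stay raw, a clamped temporary is built per call and wiped afterwards, and `stored_key_matches` performs the `public_key[0..32]` check.

```python
P = 2**255 - 19                       # Curve25519 field prime
A24 = 121665                          # (486662 - 2) / 4, per RFC 7748
BASEPOINT = (9).to_bytes(32, "little")


def clamp_copy(sk: bytes) -> bytearray:
    """Return a mutable, RFC 7748-clamped copy; the input bytes are untouched."""
    k = bytearray(sk)
    k[0] &= 248   # clear bits 0, 1, 2
    k[31] &= 127  # clear bit 255
    k[31] |= 64   # set bit 254
    return k


def x25519_noclamp(k_bytes, u_bytes) -> bytes:
    """RFC 7748 ladder that uses the scalar exactly as given. Not constant-time."""
    k = int.from_bytes(k_bytes, "little")
    u = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)  # mask top bit of u
    x1, x2, z2, x3, z3, swap = u, 1, 0, u, 1, 0
    for t in reversed(range(255)):
        kt = (k >> t) & 1
        swap ^= kt
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = kt
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def x25519_clamp_at_use(sk_x_stored: bytes, u_bytes: bytes) -> bytes:
    """Clamp into a temporary, call the non-clamping ladder, wipe the temporary."""
    tmp = clamp_copy(sk_x_stored)
    try:
        return x25519_noclamp(tmp, u_bytes)
    finally:
        tmp[:] = bytes(len(tmp))  # best effort; Python may hold other copies


def stored_key_matches(sk_x_stored: bytes, public_key: bytes) -> bool:
    """Re-derive pk_X and compare with public_key[0..32] (public data, so == is fine)."""
    return x25519_clamp_at_use(sk_x_stored, BASEPOINT) == public_key[:32]
```

Passing the raw stored scalar straight to `x25519_noclamp` is exactly the failure mode described above: for a scalar whose bit 254 is clear or whose low bits are set, it yields a public key that does not match.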

Production keygen draws three independent OS CSPRNG values, not a SHAKE256-expanded seed: XWing.KeyGen() in production draws sk_X (32 bytes), d (32 bytes), and z (32 bytes) as three separate independent OS CSPRNG values — it does NOT draw a single 32-byte seed and expand it via SHAKE256(seed, 96). The three values are passed directly: sk_X to X25519 for the X25519 component, and d + z to ML-KEM.KeyGen_internal(d, z) for the ML-KEM component. No seed is stored or exposed. Both production key generation paths are conformant: drawing three independent OS CSPRNG values (reference path) and SHAKE256 seed expansion (alternative path) both produce interoperable key pairs. The two paths differ only in how the three random inputs (sk_X, d, z) are generated — the downstream X25519 and ML-KEM operations are identical. The reference implementation uses the three-draw path; a reimplementer who uses SHAKE256 seed expansion for production keygen produces keys that interoperate fully with the reference. The deviation SHOULD be documented, as it changes the security analysis: with seed expansion, the three components are no longer independent random values — their joint distribution is determined by SHAKE256 applied to a single seed. The SHAKE256 seed expansion path is the natural choice for test vectors and KAT reproduction (where deterministic derivation from a known seed is required), but it is also a valid production path. A reimplementer MUST NOT use a non-CSPRNG seed (e.g., a counter or a fixed value) for production keygen on either path — the seed or the three independent draws must come from the OS CSPRNG.

Seed-to-expanded-key derivation: The X-Wing 32-byte seed produces the expanded key via SHAKE256(seed, 96) → 96 bytes, split as d(32) || z(32) || sk_X(32) (draft-09 §3.2). d and z are the ML-KEM-768 seeds passed to ML-KEM.KeyGen_internal(d, z) (FIPS 203 §6.1), which produces the 2400-byte expanded decapsulation key. These are two separate arguments: ML-KEM.KeyGen_internal is NOT called with a single 64-byte d‖z concatenation, and passing a concatenation or reversing the argument order as (z, d) produces different key material with no error. sk_X is the X25519 scalar. LO's storage order is sk_X(32) || dk_M(2400): X25519 scalar first, then the ML-KEM-768 expanded key. This is the reverse of draft-09's wire order (which places ML-KEM first). A reimplementer deriving from the seed MUST use this expansion and storage order; using draft-09's wire order produces a valid-looking 2432-byte key that silently fails at decapsulation (the X25519 and ML-KEM components are swapped, producing wrong shared secrets in both sub-KEMs).
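Both key generation paths and LO's storage order can be sketched as follows. `hashlib.shake_256` and `secrets.token_bytes` are standard-library calls; `keygen_internal` is a hypothetical binding to ML-KEM.KeyGen_internal(d, z), assumed to return the pair (ek, dk) as FIPS 203 does. The sketch shows only how the three 32-byte inputs are obtained and how the 2432-byte storage form is assembled.

```python
import hashlib
import secrets
from typing import Callable, Tuple

# Hypothetical binding: ML-KEM-768 KeyGen_internal(d, z) -> (ek: 1184 B, dk: 2400 B)
MlKemKeyGenInternal = Callable[[bytes, bytes], Tuple[bytes, bytes]]


def expand_xwing_seed(seed: bytes) -> Tuple[bytes, bytes, bytes]:
    """Seed path: SHAKE256(seed, 96) -> (d, z, sk_X), 32 bytes each."""
    if len(seed) != 32:
        raise ValueError("X-Wing seed must be 32 bytes")
    out = hashlib.shake_256(seed).digest(96)
    return out[0:32], out[32:64], out[64:96]


def three_draw_inputs() -> Tuple[bytes, bytes, bytes]:
    """Reference production path: three independent OS CSPRNG draws (d, z, sk_X)."""
    return secrets.token_bytes(32), secrets.token_bytes(32), secrets.token_bytes(32)


def lo_xwing_secret_key(d: bytes, z: bytes, sk_x: bytes,
                        keygen_internal: MlKemKeyGenInternal) -> bytes:
    """Assemble LO's 2432-byte storage form: sk_X(32) || dk_M(2400)."""
    _ek, dk_m = keygen_internal(d, z)   # two separate arguments, in (d, z) order
    if len(dk_m) != 2400:
        raise ValueError("unexpected ML-KEM-768 decapsulation key length")
    return sk_x + dk_m                  # X25519 scalar first, unlike draft-09's wire order
```

Either source of (d, z, sk_X) feeds the same assembly step, which is why the two paths interoperate.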

ML-KEM-768 expanded key format (2400 bytes): The 2400-byte ML-KEM secret key is the ml-kem Rust crate's DecapsulationKey serialization, laid out as:

Offset	Size	Field	Description
0	1152	`dk_PKE`	NTT-domain decryption key (k = 3 polynomials × 256 coefficients × 12 bits/coeff / 8)
1152	1184	`ek_PKE`	Encapsulation key (coefficient-domain encoding, identical to public key bytes)
2336	32	`H(ek_PKE)`	SHA3-256 hash of the encapsulation key
2368	32	`z`	Implicit-rejection seed (random, used for FO decapsulation)

This is not the FIPS 203 64-byte seed form (d || z), nor the FIPS 203 standardized 2400-byte dk_PKE || ek_PKE || H(ek_PKE) || z expansion (which uses coefficient-domain for dk_PKE, not NTT-domain). The field sizes and order match FIPS 203 §6.1 ML-KEM.KeyGen_internal, but the dk_PKE encoding differs: FIPS 203 specifies coefficient-domain via ByteEncode_12, while the ml-kem crate serializes in NTT-domain. Byte-for-byte comparison with FIPS 203 output is therefore invalid for the first 1152 bytes (dk_PKE) only. The remaining three fields, ek_PKE (bytes 1152-2335), H(ek_PKE) (bytes 2336-2367), and z (bytes 2368-2399), use standard FIPS 203 encoding and are byte-for-byte identical to FIPS 203 output. Only dk_PKE diverges; a reimplementer who distrusts all four fields and converts or reorders all of them produces a wrong key. Other ML-KEM libraries (liboqs, PQClean, BouncyCastle) use different serialization formats. Reimplementers must verify that their library's DecapsulationKey serialization matches soliton's byte layout, or perform format conversion at the deserialization boundary. Silent failure mode: the 2400-byte key carries no format magic, so a wrong-format key is accepted by deserialization and only manifests as AEAD failures after decapsulation (the shared secret diverges silently). Cross-library key import therefore requires an explicit check: re-derive the public key from the decapsulation key and compare it to the known public key; a mismatch indicates a format incompatibility. Concretely, for the ML-KEM-768 component, compare bytes 1152-2335 of the decapsulation key (the ek_PKE field, 1184 bytes) against public_key[32..1216] (the ML-KEM-768 portion of the X-Wing public key). For the X25519 component, compute pk_X = X25519(sk_X, basepoint) from the first 32 bytes of the secret key and compare against public_key[0..32]. Both comparisons must pass; a mismatch in either indicates an incompatible key format or encoding.
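The ML-KEM half of that import check can be sketched as below, using the offsets from the table above and LO's layouts (secret key sk_X || dk_M, public key pk_X || ek_M). The function and exception names are hypothetical. As an extra consistency test it also confirms that the H(ek_PKE) field equals SHA3-256 of the embedded ek_PKE, since FIPS 203 defines H as SHA3-256. It cannot detect a dk_PKE encoding mismatch: only a decapsulation round-trip reveals that. The X25519 half, X25519(sk_X, G) == public_key[0..32], needs an X25519 implementation and is left to the caller.

```python
import hashlib

# Field offsets inside the 2400-byte ML-KEM-768 decapsulation key.
DK_PKE = slice(0, 1152)
EK_PKE = slice(1152, 2336)
H_EK = slice(2336, 2368)
Z_SEED = slice(2368, 2400)


class InvalidData(ValueError):
    pass


def check_mlkem_component(xwing_sk: bytes, xwing_pk: bytes) -> None:
    """Raise InvalidData if the ML-KEM part of an imported X-Wing key is inconsistent.

    Both values compared here are public, so ordinary == is acceptable.
    """
    if len(xwing_sk) != 2432 or len(xwing_pk) != 1216:
        raise ValueError("InvalidLength")
    dk_m = xwing_sk[32:]            # skip the 32-byte X25519 scalar
    ek = dk_m[EK_PKE]
    if ek != xwing_pk[32:1216]:
        raise InvalidData("ek_PKE inside dk does not match public_key[32..1216]")
    if dk_m[H_EK] != hashlib.sha3_256(ek).digest():
        raise InvalidData("H(ek_PKE) field is not SHA3-256(ek_PKE)")
```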

ML-DSA-65 secret keys are stored as the 32-byte seed (FIPS 204 §6.1 ξ), not as the 4032-byte expanded signing key (the ML-DSA-65 private-key size in FIPS 204). The signing key is deterministically re-expanded from the seed on each sign operation via ML-DSA.KeyGen_internal(ξ) (FIPS 204 §6.1), which produces the full expanded signing key on each call. Re-expansion is fully deterministic: ML-DSA.KeyGen_internal consumes no CSPRNG input (FIPS 204 §6.1 defines it as a pure function of ξ). With libraries that expose a two-level API (a public KeyGen() that draws OS randomness alongside an internal KeyGen_internal(ξ) that does not), re-expansion must use the internal variant. Calling the public variant would succeed structurally but generate an unrelated keypair on every call, so the resulting signatures would not verify against the stored public key. Implementations using ML-DSA libraries that only accept the expanded form must perform this expansion explicitly; passing the 32-byte seed directly as the signing key to such a library produces wrong signatures (the seed is not the signing key). Libraries that accept a seed-form input (e.g., via a from_seed(ξ) constructor) call KeyGen_internal internally; check the library's API.

ML-DSA seed expansion for low-level library APIs: FIPS 204 §6.1 (Algorithm 6, ML-DSA.KeyGen_internal) first expands ξ as (ρ, ρ', K) = SHAKE256(ξ ‖ k ‖ ℓ, 128), where k = 6 and ℓ = 5 for ML-DSA-65, each encoded as a single byte, and then samples polynomials from ρ and ρ'. Some ML-DSA libraries expose this two-step structure through lower-level entry points with separately named inputs (e.g., keygen_internal(d, z)); these are not the d and z of ML-KEM's seed expansion, since ML-DSA's intermediate state is (ρ, ρ', K). A reimplementer whose library requires explicit seed expansion must run the whole of Algorithm 6 before constructing the signing key. The initial hash split is only the first step; the polynomial sampling that follows means there is no shortcut analogous to ML-KEM's SHAKE256(seed, 96) → d ‖ z ‖ sk_X. In practice, FIPS 204-compliant libraries targeted at this use case provide a KeyPair::from_seed(ξ) or equivalent entry point that performs this expansion internally; verify that the entry point accepts the 32-byte seed directly and runs ML-DSA.KeyGen_internal, rather than accepting already-expanded polynomial state. ML-DSA-65 public keys (1952 bytes) use the standard FIPS 204 pkEncode format and are compatible with compliant implementations (liboqs, PQClean, BouncyCastle) without format conversion; unlike ML-KEM-768 (see above), there is no NTT-domain encoding divergence.
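The first step of that expansion is easy to reproduce with the standard library, which is useful for checking what a low-level API expects as input. The sketch below covers only the (ρ, ρ', K) derivation for ML-DSA-65 (k = 6, ℓ = 5; ρ and K are 32 bytes, ρ' is 64 bytes). The matrix and secret-vector sampling that FIPS 204 performs afterwards (ExpandA, ExpandS and the rest of KeyGen_internal) is deliberately not shown, so this is not a signing key.

```python
import hashlib

ML_DSA_65_K, ML_DSA_65_L = 6, 5   # matrix dimensions (k, l) for ML-DSA-65


def mldsa65_expand_seed(xi: bytes):
    """First step of ML-DSA.KeyGen_internal only: derive (rho, rho', K) from xi.

    Polynomial sampling is NOT performed here; a library entry point that
    accepts the 32-byte seed must still run the rest of the algorithm.
    """
    if len(xi) != 32:
        raise ValueError("ML-DSA seed must be 32 bytes")
    # xi || IntegerToBytes(k, 1) || IntegerToBytes(l, 1), squeezed to 128 bytes
    out = hashlib.shake_256(xi + bytes([ML_DSA_65_K, ML_DSA_65_L])).digest(128)
    rho, rho_prime, key = out[:32], out[32:96], out[96:]
    return rho, rho_prime, key
```

Omitting the two parameter bytes (the older, pre-final derivation) produces entirely different values, which is one way a "seed-compatible" library can still disagree.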

ML-DSA-65 cross-check requirement for cross-library import: Importing a 32-byte ML-DSA-65 seed from an external source produces no immediate error. from_seed(ξ) succeeds for any 32-byte input, and re-expanding the same seed always yields the same public key. If the imported bytes are not the seed of the intended keypair (wrong format, wrong byte order, corruption), the resulting signatures verify against the public key derived from those bytes but not against the ML-DSA public key stored in the identity blob. The size check passes, expansion succeeds, and signing appears to work; the mismatch becomes visible only when a peer verifies against the stored public key. Cross-library key import therefore requires explicit verification: call ML-DSA.KeyGen_internal(candidate_seed) (FIPS 204 §6.1), take the resulting public key, and compare it against the known ML-DSA-65 public key (composite_pk[1248..3200], 1952 bytes). A mismatch indicates an incompatible seed format and MUST be treated as InvalidData; proceeding would produce signatures that the receiver cannot verify.
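A minimal sketch of that import check follows. `keygen_internal` is a hypothetical binding to ML-DSA-65 KeyGen_internal, assumed to return (public key, signing key); the slice follows the composite layout X-Wing 1216 ‖ Ed25519 32 ‖ ML-DSA 1952.

```python
from typing import Callable, Tuple

# Hypothetical binding: ML-DSA-65 KeyGen_internal(xi) -> (pk: 1952 B, signing key)
MlDsaKeyGenInternal = Callable[[bytes], Tuple[bytes, object]]

MLDSA_PK = slice(1248, 3200)  # ML-DSA-65 public key inside the 3200-byte composite key


class InvalidData(ValueError):
    pass


def check_imported_mldsa_seed(candidate_seed: bytes, composite_pk: bytes,
                              keygen_internal: MlDsaKeyGenInternal) -> None:
    """Reject an imported seed whose derived public key differs from the stored one."""
    if len(candidate_seed) != 32 or len(composite_pk) != 3200:
        raise ValueError("InvalidLength")
    derived_pk, _signing_key = keygen_internal(candidate_seed)
    # Public keys are non-secret, so a plain == comparison is fine here.
    if derived_pk != composite_pk[MLDSA_PK]:
        raise InvalidData("imported ML-DSA seed does not match the stored public key")
```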

Ed25519 secret keys are stored as the 32-byte seed (RFC 8032 §5.1.5), not the 64-byte expanded form (SHA-512 of seed). The signing key is deterministically expanded from the seed on each sign operation. The 32-byte ed25519_sk field in the composite secret key (bytes 2432-2464, §2.2) is this seed. Interop note: Libraries that represent Ed25519 private keys as 64-byte seed || public_key (libsodium, Go crypto/ed25519, PyNaCl) must extract only the first 32 bytes as the seed. The trailing 32 bytes (public key copy) are not part of soliton's secret key layout — passing the full 64-byte representation to key extraction produces corrupted output.
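The conversion from the common 64-byte `seed || public_key` representation is a simple slice, sketched below with hypothetical helper names. The optional comparison of the embedded public-key copy is an extra sanity check, not something the text requires. The composite-key helper uses only the bytes 2432-2464 offset stated above.

```python
from typing import Optional


def ed25519_seed_from_64(sk64: bytes, expected_pk: Optional[bytes] = None) -> bytes:
    """Convert a 64-byte seed || public_key private key to soliton's 32-byte seed."""
    if len(sk64) != 64:
        raise ValueError("expected the 64-byte seed || public_key form")
    seed, pk_copy = sk64[:32], sk64[32:]
    if expected_pk is not None and pk_copy != expected_pk:
        raise ValueError("embedded public key does not match the expected key")
    return seed  # only these 32 bytes belong in the composite secret key


def ed25519_seed_from_composite(composite_sk: bytes) -> bytes:
    """Read the ed25519_sk field (bytes 2432..2464) of a composite secret key."""
    if len(composite_sk) != 2496:
        raise ValueError("InvalidLength")
    return composite_sk[2432:2464]
```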

9. Verification Phrases

9.1 Purpose

A short human-readable phrase derived from both parties' identity public keys. Both parties can compare the phrase out-of-band (voice call, in-person) to verify identity key authenticity. The phrase is independent of session state — it depends only on the two identity keys and remains stable across session resets.

9.2 Algorithm

function VerificationPhrase(pk_a, pk_b):
    // Both pk_a, pk_b must be full LO composite identity public keys per §2.2
    // (3200 bytes: X-Wing 1216 + Ed25519 32 + ML-DSA 1952). Passing only
    // a sub-key component (e.g., the 1216-byte X-Wing key) returns InvalidLength;
    // passing a different 3200-byte value (e.g., padded or truncated) silently
    // produces a different phrase.
    // pk_a == pk_b → InvalidData (self-verification produces a valid phrase
    // that gives a false sense of security — the user verified against their own key).

    // Step 1: Sort keys lexicographically ascending — smaller key first.
    // The comparison is over the full 3200-byte raw public key bytes, NOT over
    // a fingerprint, hash, or any sub-key component. The fingerprint
    // (SHA3-256 of the key) is used as a canonical identifier throughout the
    // rest of the protocol — a reimplementer who sorts by SHA3-256(key) instead
    // of by the key itself produces a different ordering silently (the sort
    // succeeds, the phrase is wrong, no error is returned).
    // "Ascending" means the key that is lexicographically smaller (byte-by-byte,
    // left to right, unsigned comparison) is placed first. If pk_a <= pk_b,
    // first = pk_a, second = pk_b; otherwise first = pk_b, second = pk_a.
    // Descending order (larger key first) produces a different hash — silently
    // incompatible phrases with no runtime error.
    (first, second) = sort_lexicographic_ascending(pk_a, pk_b)

    // Step 2: Concatenate with domain separation label and hash.
    hash = SHA3-256("lo-verification-v1" || first || second)  // label = 18 bytes
    // Total SHA3-256 input: 6,418 bytes (18-byte label + 3,200-byte first + 3,200-byte second).
    // The full 3200-byte composite public key is hashed — NOT the 32-byte fingerprint
    // (SHA3-256 of the key). Using fingerprints instead silently produces different phrases
    // with no error signal and reduces the preimage size from 3200 bytes to 32 bytes.

    // Step 3: Map hash bytes to word indices.
    // Consume 2-byte chunks (u16, big-endian). The read cursor advances by 2 bytes
    // for each sample regardless of acceptance or rejection — rejected values are
    // discarded, not retried. Accept only values in [0, 62208)
    // (floor(65536 / 7776) × 7776 = 8 × 7776 = 62208) to eliminate modular bias.
    // Rejection rate: (65536 − 62208) / 65536 ≈ 5.1% per sample.
    // On exhaustion of 32 bytes, rehash: hash = SHA3-256("lo-phrase-expand-v1" || round || hash).
    // Total input: 52 bytes, concatenated in this order: 19-byte label,
    // 1-byte round (u8), 32-byte previous hash.
    // The read cursor resets to byte 0 of the new hash; nothing carries over from the previous hash.
    // Round counter starts at 1 (first rehash uses round = 0x01, range 1..=19).
    // Starting at 0 vs 1 produces different hash outputs — this is interop-critical.
    // Maximum round count is 19. Reaching round 20 → Internal error (structurally
    // unreachable at probability < 2^-150; implementations MUST treat as fatal and
    // return Internal — this indicates CSPRNG failure or a broken hash function,
    // not a recoverable condition. Do NOT retry or fall back to fewer words).
    // 16 initial samples + 19 rehash rounds × 16 = 320 total samples; termination probability < 2^-150.
    // "16 initial samples" means 16 candidate u16 values extracted from the 32-byte SHA3-256 output
    // (32 bytes / 2 bytes per u16 = 16 candidates). Each candidate is an attempt that may be accepted
    // (if < 62,208) or rejected (if ≥ 62,208). These are NOT 16 output words — 7 words are needed;
    // each hash round provides at most 16 candidate slots, typically more than enough for 7 words.
    words = []
    while len(words) < 7:
        val = next_accepted_u16(hash)   // bias-free, rejection sampling
        words.append(EFF_WORDLIST[val % 7776])

    return " ".join(words)

Canonical output format: The returned string is the seven words joined by single ASCII space (0x20) characters, with no leading or trailing whitespace. Words are lowercase as they appear in the EFF large wordlist — the wordlist is already lowercase; no case transformation is applied. Programmatic comparison of two phrases MUST use exact byte equality on the canonical string — case-folding or whitespace normalization before comparison is incorrect and masks implementation divergence.
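The algorithm above can be sketched in runnable Python using `hashlib.sha3_256`. The wordlist is passed in by the caller (loading and checksumming the canonical EFF file is out of scope here), and the exception classes are hypothetical stand-ins for the error codes in §9.2.1. The sample generator advances the cursor on every u16, accepted or not, and rehashes with round bytes 1 through 19 before failing with `Internal`.

```python
import hashlib
from typing import Iterator, Sequence

PK_LEN = 3200
WORDLIST_LEN = 7776
ACCEPT_BELOW = 8 * WORDLIST_LEN       # 62208: largest multiple of 7776 <= 65536
MAX_ROUND = 19


class InvalidLength(ValueError):
    pass


class InvalidData(ValueError):
    pass


class Internal(RuntimeError):
    pass


def _u16_samples(h: bytes) -> Iterator[int]:
    """Yield every big-endian u16; the cursor advances on accept AND reject."""
    for rnd in range(1, MAX_ROUND + 2):
        for i in range(0, 32, 2):
            yield int.from_bytes(h[i:i + 2], "big")
        if rnd > MAX_ROUND:
            raise Internal("rehash round 20 reached")
        # 19-byte label || 1-byte round || 32-byte previous hash
        h = hashlib.sha3_256(b"lo-phrase-expand-v1" + bytes([rnd]) + h).digest()


def verification_phrase(pk_a: bytes, pk_b: bytes, wordlist: Sequence[str]) -> str:
    if len(wordlist) != WORDLIST_LEN:
        raise ValueError("wordlist must have exactly 7776 entries")
    if len(pk_a) != PK_LEN or len(pk_b) != PK_LEN:
        raise InvalidLength("composite identity keys are 3200 bytes")
    if pk_a == pk_b:                      # public data: plain == is fine
        raise InvalidData("self-verification")
    # Python compares bytes lexicographically by unsigned byte value.
    first, second = (pk_a, pk_b) if pk_a <= pk_b else (pk_b, pk_a)
    h = hashlib.sha3_256(b"lo-verification-v1" + first + second).digest()
    words = []
    for val in _u16_samples(h):
        if val < ACCEPT_BELOW:            # rejected values are skipped, not retried
            words.append(wordlist[val % WORDLIST_LEN])
            if len(words) == 7:
                break
    return " ".join(words)
```

Because the sort happens before hashing, `verification_phrase(a, b, wl) == verification_phrase(b, a, wl)` for any two distinct keys.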

fingerprint_hex() produces lowercase hex, the same constraint as §2.1: The fingerprint_hex() function used for display returns 64 lowercase hexadecimal characters (digits 0-9 and lowercase letters a-f), matching the §2.1 specification ("64 lowercase hex chars"). An implementation that produces uppercase hex fingerprints (e.g., using %X format in C or strings.ToUpper in Go) diverges from the canonical form: displayed fingerprints and any string comparison of fingerprints will not match across implementations even though the underlying keys are identical. Verification phrases are unaffected, since they hash the raw public keys rather than the hex string.
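In Python the canonical form falls out of `hashlib` directly, since `hexdigest()` always emits lowercase a-f. The validator below is a hypothetical helper for rejecting non-canonical fingerprint strings received from elsewhere.

```python
import hashlib

_LOWER_HEX = set("0123456789abcdef")


def fingerprint_hex(public_key: bytes) -> str:
    """SHA3-256 of the full public key as 64 lowercase hex characters."""
    return hashlib.sha3_256(public_key).hexdigest()   # already lowercase


def is_canonical_fingerprint(s: str) -> bool:
    """True only for exactly 64 characters drawn from 0-9 and lowercase a-f."""
    return len(s) == 64 and all(c in _LOWER_HEX for c in s)
```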

The word index is computed as val % 7776. The rejection threshold 62,208 (= 8 × 7,776) ensures uniform distribution — values ≥ 62,208 are rejected to eliminate modular bias.

Cursor advance on rejection is mandatory for interoperability: The read cursor advances by 2 bytes for every u16 sample, whether the sample is accepted or rejected. An implementation that handles rejection any other way (for example, advancing only on acceptance and retrying the same 2-byte position) diverges on every input where a rejection occurs among the samples it consumes; each sample is rejected with probability about 5.1%. Only inputs whose first seven samples are all accepted, roughly 0.949^7 ≈ 69% of them, produce the same phrase in both implementations; the remaining ~31% of key pairs get different phrases. Reimplementers MUST advance the cursor unconditionally: a rejected sample is discarded, not retried. Test vector F.9 exercises this behavior explicitly (see Appendix F).

The EFF large wordlist contains exactly 7776 words, 0-indexed (entry 0 = "abacus", entry 7775 = "zoom"). Each word carries log2(7776) ≈ 12.92 bits of entropy, so seven words provide 7 × log2(7776) ≈ 90.5 bits. A 1-indexed implementation maps every index to a different word, a silent interop failure. Implementations MUST verify that their embedded wordlist matches SHA3-256 a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5 (full reference in Appendix D). Mismatched wordlist copies (different versions, incomplete dice-prefix stripping, trailing whitespace differences) produce silently incompatible phrases with no error indicator.

Canonical byte sequence for the checksum: The EFF large wordlist source file ships with dice-prefix columns (11111\tabacus, etc.). To produce the canonical form: (1) strip the dice prefix and tab from each line, leaving only the word; (2) use LF (\n, 0x0a) line endings — no CRLF; (3) include a trailing LF after the last word. The resulting file is 7776 lowercase words, one per line with a trailing newline, totalling 43,186 bytes. The SHA3-256 of this byte sequence is a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5. A wordlist with CRLF endings, no trailing newline, or retained dice prefixes produces a different hash — verify independently before embedding.
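The three canonicalization steps can be sketched as below; the function names are hypothetical. It accepts LF or CRLF input, strips the `<dice>\t` prefix from each line, and re-emits LF-terminated lines with a trailing LF. The checksum test then compares against the SHA3-256 value given above.

```python
import hashlib

EXPECTED_SHA3 = "a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5"


def canonical_wordlist_bytes(eff_source: str) -> bytes:
    """Strip dice prefixes, use LF line endings, and end with a trailing LF."""
    words = []
    for line in eff_source.splitlines():          # accepts LF or CRLF input
        if not line.strip():
            continue                              # ignore blank lines
        dice, tab, word = line.partition("\t")
        if not tab or not dice.isdigit() or not word.strip():
            raise ValueError(f"unexpected wordlist line: {line!r}")
        words.append(word.strip())
    return ("\n".join(words) + "\n").encode("ascii")


def wordlist_is_canonical(data: bytes) -> bool:
    """7776 LF-terminated words whose SHA3-256 matches the reference value."""
    return data.count(b"\n") == 7776 and hashlib.sha3_256(data).hexdigest() == EXPECTED_SHA3
```

Run `wordlist_is_canonical(canonical_wordlist_bytes(source_text))` once on the downloaded file before embedding it; the function never silently accepts a near-miss.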

9.2.1 Error Summary

Error	Condition
`InvalidLength`	Either `pk_a` or `pk_b` is not exactly 3200 bytes. The 3200-byte requirement reflects the full LO composite identity public key (X-Wing 1216 + Ed25519 32 + ML-DSA 1952 — §2.2). Passing only a sub-key component (e.g., the 1216-byte X-Wing portion) returns `InvalidLength`.
`InvalidData`	`pk_a == pk_b` (self-verification). A phrase computed from a key paired with itself gives a false sense of security — both parties see the same phrase regardless of the other's key, providing no authentication signal. Public keys are non-secret material — variable-time comparison (`==`) is used, not `ct_eq`.
`Internal`	Rehash round counter reached 20 — structurally unreachable at probability < 2⁻¹⁵⁰. Indicates CSPRNG failure or a broken hash function. Must NOT be retried.

9.3 Properties

Order-independent: VerificationPhrase(A, B) == VerificationPhrase(B, A).
Deterministic: Given the same two identity keys, always produces the same phrase.
Unbiased: Rejection sampling eliminates modular bias in word selection.
Session-independent: Depends only on long-term identity keys, not session state.
Wordlist: EFF large wordlist, 7776 words, compile-time length assertion.

9.4 Security Analysis

Second-preimage resistance: ~90.5 bits. An attacker trying to find a key that produces the same phrase when paired with a specific victim key must brute-force ~2^90 SHA3-256 hashes.

Birthday collision resistance: ~45 bits. An attacker who freely generates their own identity keys needs on the order of 2^45 of them (about 2^45.5 for even odds) before, by the birthday paradox, two produce the same verification phrase when paired with a given victim key. No CSPRNG access or key-generation control over the victim is required: the attacker generates their own keys until a collision is found. The attacker registers one key, establishes a legitimate verification phrase, then substitutes the colliding key; the victim's out-of-band check passes despite the key swap.

2^45 SHA3-256 operations (~35 trillion hashes) is expensive but within reach of well-resourced state-level attackers. For most threat models, where the attacker lacks the resources to generate keys at that scale, the ~90.5-bit second-preimage bound is the relevant security parameter. Applications with state-level threat models should supplement verification phrases with full fingerprint comparison.
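The bit counts above follow from a few lines of arithmetic. The "even odds" figure uses the standard birthday approximation n ≈ sqrt(2 ln 2 · N) for a space of N equally likely phrases.

```python
import math

WORDS, PHRASE_LEN = 7776, 7

phrase_bits = PHRASE_LEN * math.log2(WORDS)     # second-preimage work (~90.47 bits)
birthday_bits = phrase_bits / 2                  # generic collision work (~45.24 bits)
# Keys needed for a 50% chance of a collision: sqrt(2 ln 2 * N), N = 2**phrase_bits
half_odds_bits = birthday_bits + math.log2(math.sqrt(2 * math.log(2)))

print(f"{phrase_bits:.2f} {birthday_bits:.2f} {half_odds_bits:.2f}")
print(f"2^45 = {2**45 / 1e12:.1f} trillion")
```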

10. Key Management

10.1 Identity Key

Generated once, stored permanently.
MUST be encrypted at rest (passphrase, device key, or platform secure storage).
Loss = loss of identity. No recovery by design.
Used for: auth (X-Wing component, §4), hybrid signing of pre-keys (§3), session initiation signing (§5.4 Step 6), KEM decapsulation in LO-KEX (§5.5 Step 4), and HKDF info binding in LO-KEX (§5). IK compromise alone is insufficient to derive session keys — also requires SPK private key (§5.6).

10.2 Signed Pre-Key Rotation

Rotate every 7 days (recommended).
Retain the full SPK keypair (secret key and SPK ID) for 30 days after rotation (grace period for delayed session inits). Incoming session_init blobs carry an spk_id used to look up the correct decapsulation key; storing only the key bytes without the ID makes this lookup impossible. The 30-day clock starts when the replacement SPK is uploaded and the old SPK leaves the active bundle — not when the old SPK was originally generated. An SPK that was active for 7 days and then rotated is retained for 30 additional days (37 days total from generation).
After grace period, delete old SPK private key.
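The retention rule can be captured in a small record type, sketched below with hypothetical names. The key point it encodes is that the 30-day grace clock starts at retirement (when the replacement SPK is uploaded), not at generation, and that the SPK ID is stored alongside the secret key so incoming session_init blobs can be matched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

ROTATION_INTERVAL = timedelta(days=7)   # recommended rotation period
GRACE_PERIOD = timedelta(days=30)       # retention after rotation


@dataclass
class RetainedSpk:
    spk_id: int                          # matched against session_init's spk_id
    secret_key: bytes                    # full X-Wing secret key
    generated_at: datetime
    retired_at: Optional[datetime] = None  # when the replacement SPK was uploaded

    def delete_after(self) -> Optional[datetime]:
        """The grace clock starts at retirement; an active SPK has no deadline."""
        return None if self.retired_at is None else self.retired_at + GRACE_PERIOD

    def may_delete(self, now: datetime) -> bool:
        deadline = self.delete_after()
        return deadline is not None and now >= deadline
```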

10.3 One-Time Pre-Key Management

Upload batches of 100.
Replenish on DM_PREKEY_LOW (remaining < 10).
Delete private key immediately after single use.
OPKs have no time-based expiry — an unconsumed OPK private key remains valid indefinitely until consumed or explicitly deleted by application policy. Unlike SPKs (which have a 30-day retention window after rotation, §10.2), OPKs are not rotated on a schedule.
Protocol functions without OPK (reduced initial forward secrecy only). When the OPK pool is empty, servers MUST return a bundle with has_opk = false rather than rejecting the bundle request. Refusing to serve a bundle when no OPKs are available prevents session initiation entirely and is a denial-of-service hazard. An empty pool is a temporary operational condition, not a protocol error — the session proceeds without OPK and Alice and Bob acknowledge the reduced forward-secrecy guarantee.
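A server-side sketch of these rules follows; the `Bundle` shape and function name are hypothetical. Each OPK is handed out at most once, an empty pool yields a bundle with `has_opk = False` instead of an error, and the second return value indicates when a DM_PREKEY_LOW notification is due (remaining < 10).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

LOW_WATERMARK = 10   # DM_PREKEY_LOW when fewer than 10 OPKs remain


@dataclass
class Bundle:
    spk_id: int
    spk_pub: bytes
    has_opk: bool
    opk_id: Optional[int] = None
    opk_pub: Optional[bytes] = None


def serve_bundle(spk_id: int, spk_pub: bytes,
                 opk_pool: Deque[Tuple[int, bytes]]) -> Tuple[Bundle, bool]:
    """Return (bundle, prekey_low). Never refuses service when the pool is empty."""
    if opk_pool:
        opk_id, opk_pub = opk_pool.popleft()      # remove: single use only
        bundle = Bundle(spk_id, spk_pub, True, opk_id, opk_pub)
    else:
        bundle = Bundle(spk_id, spk_pub, False)   # reduced initial forward secrecy
    return bundle, len(opk_pool) < LOW_WATERMARK
```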

10.4 Ratchet Key Lifecycle

Generated per KEM ratchet step.
Previous ratchet private key deleted (zeroized via ZeroizeOnDrop) after step completes.
Previous receive epoch key retained for one epoch (late message grace period), then zeroized.

10.5 Memory Hygiene

Zeroize all sensitive material immediately after use:

Shared secrets after key derivation.
Message keys after encrypt/decrypt.
Old ratchet private keys.
Auth shared secrets.
Intermediate KDF outputs.
Streaming AEAD key (caller copy): After calling stream_encrypt_init or stream_decrypt_init, the library holds an internal copy of the key in the encryptor/decryptor handle. The caller's original key buffer is NOT zeroed by the library — the caller MUST zeroize their copy immediately after init returns. See §15.1 for the key lifecycle. The handle's internal copy is zeroized automatically when the handle is freed (stream_encrypt_free / stream_decrypt_free).

Use Zeroizing<T> wrappers and ZeroizeOnDrop trait (Rust zeroize crate). [u8; N] is Copy — after Zeroizing::new(val), explicitly .zeroize() the source copy.

AEAD output buffer pre-allocation: When encrypting into a growable buffer, pre-allocate the full output capacity (plaintext.len() + 16 for Poly1305 tag) before writing any data. If the buffer reallocates mid-write, the abandoned heap region containing partial plaintext is freed without zeroization — the allocator does not zero freed memory. This applies to any language with growable buffers (Rust Vec, Go slices, Python bytearray). Pre-computing the output size eliminates reallocation entirely.

Decompression intermediate buffers: During storage decryption (§11.3) and streaming decryption (§15.5), the zstd decoder allocates internal buffers that are freed without zeroization — only the final output is wrapped in Zeroizing. The same applies to compression during encryption. These intermediate buffers may contain plaintext fragments on the heap. Implementations that require stronger guarantees should use a custom allocator that zeros on deallocation, or accept that decompression intermediates are a residual exposure window.

10.6 Passphrase-Based Key Derivation (Argon2id)

Identity keys stored on-device MUST be encrypted at rest. For passphrase protection use primitives::argon2::argon2id (RFC 9106, Argon2id variant, version 0x13 / v1.3). The version MUST be 0x13: RFC 9106 standardizes v1.3, but the earlier Argon2 v1.0 (version 0x10) survives in some libraries, and some default to it. Using the wrong version produces different output and silently incompatible key derivation.

Presets:

Preset	m_cost	t_cost	p_cost	Use case
`OWASP_MIN`	19 MiB (19456 KiB)	2	1	Interactive auth, latency < 1 s
`RECOMMENDED`	64 MiB (65536 KiB)	3	4	Stored keypair protection, ~0.1-2 s (hardware-dependent; modern multi-core hardware typically 0.1-1 s)
`WASM_DEFAULT`	16 MiB (16384 KiB)	3	1	WASM targets (single-threaded, constrained memory) — `p_cost = 1` because WASM runtimes are single-threaded: `p_cost > 1` serializes lane execution without achieving any parallelism, multiplying wall-clock time with zero additional security benefit

Requirements:

Salt: at least 8 bytes; 16 or 32 random bytes recommended. Use primitives::random::random_array::<16>().
Output: caller-allocated; 32 bytes for a 256-bit symmetric key; any positive length accepted. Not length-extensible: requesting 32 bytes and requesting 64 bytes produce outputs whose first 32 bytes are completely different, because the requested output length (the tag length T) is an input to Argon2's initial hash H0 and to its final variable-length hash H'. A reimplementer who requests a larger output and slices it to 32 bytes will silently produce an incompatible key.
Zeroize output with zeroize::Zeroize::zeroize(&mut out) or wrap in Zeroizing after use.
Error-path zeroization: On any error return (invalid parameters, Argon2 library failure), the output buffer is explicitly zeroized by the implementation before returning. Callers are NOT required to zeroize the output on the error path — but reimplementers MUST apply the same zeroization on failure; omitting it leaves partial key material in a caller-visible buffer with no obligation or documentation to clean it up.
The salt MUST be stored alongside the encrypted key material — it is not secret and may be stored in plaintext. Without the original salt, the Argon2id derivation cannot be reproduced and the protected key is permanently unrecoverable.

Validation bounds:

Parameter	Min	Max
`m_cost`	8 × `p_cost` KiB (RFC 9106: at least 8 blocks of 1 KiB per lane; 8 KiB when `p_cost` = 1)	4 GiB (4194304 KiB)
`t_cost`	1	256
`p_cost`	1	256
output length	1 byte (inclusive)	4096 bytes (soliton-imposed; RFC 9106 allows up to 2³²−1 bytes)
salt length	8 bytes	268,435,456 bytes (256 MiB) — CAPI limit; RFC 9106 allows up to 2³²−1 bytes, but `soliton_argon2id` caps salt input at the general CAPI buffer limit of 256 MiB to prevent allocation exhaustion
password length	0 bytes	268,435,456 bytes (256 MiB) — CAPI limit. Zero-length password is accepted: an empty password (`password = NULL` or `password_len = 0`) is valid input; `soliton_argon2id` passes zero bytes to Argon2id without error. Callers MUST validate non-empty passwords at the application layer if their threat model requires it — the primitive does not enforce this.

`m_cost` is in KiB, not bytes: passing `65536` means 64 MiB (correct for `RECOMMENDED`), not 65536 bytes (which would be only 64 KiB and silently produce a different key). Argon2 accepts any `m_cost` at or above the minimum without error, so a factor-of-1024 unit mistake causes silent incompatibility.

Argon2 library failure → Internal: Parameter-validation failures (violations of the bounds table above) return InvalidData. An Argon2 library-internal failure during the hash operation itself (OOM, BLAKE2 internal error — structurally unreachable on correct parameters under normal OS conditions) returns Internal. Binding authors who need to enumerate all possible return codes must include Internal (-12) for this path. On any error (including Internal), the output buffer is zeroed before returning.

The 4096-byte output cap is soliton-specific — it is not mandated by RFC 9106 (which allows outputs up to 2³²−1 bytes). The cap prevents allocation-exhaustion attacks in server contexts where output length comes from untrusted input (e.g., a malicious client requesting a 4 GiB KDF output). A reimplementer who removes the cap to "follow the standard" reintroduces this vector.

The t_cost = 256 cap bounds iteration-time exhaustion. t_cost controls the number of passes over the memory block, and each pass takes time proportional to m_cost. An adversary supplying untrusted t_cost in a request can therefore force arbitrarily long computation (e.g., a client sending its KDF parameters to a server that re-derives the key). The cap of 256 limits total work to at most 256 passes over m_cost KiB of memory (on the order of 256 × m_cost block computations). This bounds server-side CPU cost to a predictable maximum regardless of client-supplied input. The p_cost = 256 and output-length caps share the same defense-in-depth motivation. All three parameters are capped to prevent resource exhaustion when Argon2id parameters originate from untrusted input rather than the application's own configuration.

Error types: Salt too short (< 8 bytes) or output length violations (0 bytes or > 4096 bytes) return InvalidLength. Cost parameter violations (m_cost, t_cost, p_cost out of bounds or below argon2 library minimums) return InvalidData. Coupled constraint: RFC 9106 §3.1 additionally requires m_cost >= 8 × p_cost — each parallel lane requires at least 8 KiB of memory. Combinations where both m_cost and p_cost are individually within bounds but violate this coupling (e.g., m_cost=100, p_cost=100) return InvalidData. The individual upper-bound caps (m_cost > M_COST_MAX, t_cost > T_COST_MAX, p_cost > P_COST_MAX) are checked in soliton code before the library call. The coupled m_cost >= 8 × p_cost constraint is enforced by the argon2 library's parameter constructor (Params::new) and mapped to InvalidData via .map_err — it is not a soliton-level pre-check. A reimplementer who adds their own pre-checks must include this coupled constraint explicitly; checking only the individual caps will miss it.

InvalidLength.expected for zero-length output: When output_len = 0, InvalidLength is returned with expected = 1 (the minimum valid output length), NOT expected = 4096. The expected field reflects the bound violated: for output_len < 1, it signals the minimum; for output_len > 4096, it signals the maximum. A caller who inspects the expected field programmatically to build a diagnostic message should not assume that expected = 4096 means "value too small" — the value that appears in expected is always the valid-range bound that was violated, not a fixed error code. This is consistent with InvalidLength semantics throughout soliton (see §12 error semantics).
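As an illustration of the bounds and error mapping above, the following Rust sketch shows a pre-check that a reimplementer might run before calling an Argon2id library. It is hypothetical: `check_argon2id_params`, `KdfParamError`, and the order of checks are assumptions, not the soliton_argon2id implementation. As noted above, the reference leaves the coupled `m_cost ≥ 8 × p_cost` check to the argon2 crate. This sketch includes that constraint explicitly and reports the violated bound in `expected`.

```rust
// Hypothetical reimplementer-side pre-check; names and check order are assumptions,
// not the soliton_argon2id API.
#[derive(Debug, PartialEq, Eq)]
pub enum KdfParamError {
    /// `expected` is the bound that was violated (minimum or maximum).
    InvalidLength { expected: usize },
    InvalidData,
}

pub const M_COST_MAX_KIB: u32 = 4_194_304; // 4 GiB expressed in KiB
pub const T_COST_MAX: u32 = 256;
pub const P_COST_MAX: u32 = 256;
pub const OUTPUT_MIN: usize = 1;
pub const OUTPUT_MAX: usize = 4096; // soliton-imposed cap
pub const SALT_MIN: usize = 8;
pub const CAPI_BUF_MAX: usize = 268_435_456; // 256 MiB CAPI buffer limit

pub fn check_argon2id_params(
    m_cost_kib: u32,
    t_cost: u32,
    p_cost: u32,
    salt_len: usize,
    password_len: usize,
    output_len: usize,
) -> Result<(), KdfParamError> {
    // Length violations map to InvalidLength, carrying the violated bound.
    if salt_len < SALT_MIN {
        return Err(KdfParamError::InvalidLength { expected: SALT_MIN });
    }
    if salt_len > CAPI_BUF_MAX || password_len > CAPI_BUF_MAX {
        return Err(KdfParamError::InvalidLength { expected: CAPI_BUF_MAX });
    }
    if output_len < OUTPUT_MIN {
        return Err(KdfParamError::InvalidLength { expected: OUTPUT_MIN });
    }
    if output_len > OUTPUT_MAX {
        return Err(KdfParamError::InvalidLength { expected: OUTPUT_MAX });
    }

    // Cost violations map to InvalidData. p_cost is bounded first so that
    // 8 * p_cost below cannot overflow a u32.
    if !(1..=T_COST_MAX).contains(&t_cost) || !(1..=P_COST_MAX).contains(&p_cost) {
        return Err(KdfParamError::InvalidData);
    }
    // Coupled RFC 9106 constraint: at least 8 KiB (8 blocks) per lane.
    if m_cost_kib < 8 * p_cost || m_cost_kib > M_COST_MAX_KIB {
        return Err(KdfParamError::InvalidData);
    }
    Ok(())
}
```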

Usage: Argon2id is a building-block primitive — it derives a symmetric key from a passphrase, but does not define an encrypted blob format. The application is responsible for using the derived key with an AEAD to encrypt identity key material, and for defining the on-disk format. soliton does not export a combined "encrypt identity with passphrase" function; the CAPI exposes soliton_argon2id as the KDF and the application composes AEAD encryption separately.

Recommended composition: For passphrase-protected identity keys, use XChaCha20-Poly1305, the same AEAD as the rest of the protocol. Use a 24-byte random nonce (random_bytes(24)), and use the Argon2id-derived 32-byte key directly as the AEAD key, with no secondary KDF or key-expansion step. The 32-byte Argon2id output is already a uniformly distributed key of the correct size for XChaCha20-Poly1305. Adding an extra HKDF step (e.g., HKDF(argon2_output, salt=nonce, info="...")) produces incompatible ciphertext with no error at encryption time. The mismatch surfaces only as AeadFailed at decryption.

The on-disk format is salt (16 bytes) ‖ nonce (24 bytes) ‖ AEAD ciphertext (plaintext_len + 16-byte tag). Minimum parseable blob size: with the recommended format, a blob must be at least 56 bytes (16 salt + 24 nonce + 16 Poly1305 tag with empty plaintext). Decoders MUST reject blobs shorter than 56 bytes before attempting AEAD. Slicing a sub-24-byte remainder for the nonce causes out-of-bounds access in C or a panic in Rust. Return InvalidLength (not AeadFailed) for blobs shorter than 56 bytes. This is a pre-AEAD framing check on publicly observable data, not an authentication failure. The storage blob format (§11.1) enforces a 42-byte minimum analogously, and passphrase-protected key blobs need the same pre-AEAD guard.

No AAD is required. The salt and nonce are integrity-protected through their roles in key derivation and encryption, so changing either produces decryption failure. This composition is not normative; applications may use a different AEAD or format. Following it, however, ensures interoperability between independent implementations of passphrase-protected key storage. An application that chooses a different AEAD (e.g., AES-256-GCM) produces encrypted keys that are incompatible with applications following this recommendation.
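A minimal Rust sketch of the pre-AEAD framing check for the recommended basic format, assuming the blob is held in memory as a byte slice. `split_basic_blob` and its types are hypothetical. The actual XChaCha20-Poly1305 decryption with the Argon2id-derived key would follow and is not shown.

```rust
use std::convert::TryInto;

pub const SALT_LEN: usize = 16;
pub const NONCE_LEN: usize = 24;
pub const TAG_LEN: usize = 16;
pub const BASIC_MIN_LEN: usize = SALT_LEN + NONCE_LEN + TAG_LEN; // 56

/// Hypothetical error mirroring InvalidLength; `expected` is the violated minimum.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidLength {
    pub expected: usize,
}

pub struct BasicBlob<'a> {
    pub salt: &'a [u8; SALT_LEN],
    pub nonce: &'a [u8; NONCE_LEN],
    /// AEAD ciphertext including the trailing 16-byte Poly1305 tag.
    pub ciphertext: &'a [u8],
}

/// Splits salt ‖ nonce ‖ ciphertext, rejecting short blobs before any AEAD call.
pub fn split_basic_blob(blob: &[u8]) -> Result<BasicBlob<'_>, InvalidLength> {
    if blob.len() < BASIC_MIN_LEN {
        return Err(InvalidLength { expected: BASIC_MIN_LEN });
    }
    let (salt, rest) = blob.split_at(SALT_LEN);
    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
    Ok(BasicBlob {
        salt: salt.try_into().expect("length checked above"),
        nonce: nonce.try_into().expect("length checked above"),
        ciphertext,
    })
}
```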

Parameter flexibility limitation and extended blob format: The recommended salt(16) ‖ nonce(24) ‖ ciphertext format does not encode Argon2id cost parameters. If an application later upgrades from OWASP_MIN to RECOMMENDED parameters, existing blobs become permanently undecryptable without out-of-band knowledge of which parameters were used. Applications that may need to change parameters should use the extended format: m_cost (4 bytes, BE u32) ‖ t_cost (1 byte) ‖ p_cost (1 byte) ‖ salt (16 bytes) ‖ nonce (24 bytes) ‖ ciphertext.

Extended format serialization constraint: t_cost and p_cost are stored as single bytes (u8, values 0x01-0xFF = 1-255). The bounds table above allows t_cost and p_cost up to 256, but 256 does not fit in a u8. The value 0x100 truncates to 0x00, which is invalid because the minimum is 1. Applications using the extended format MUST restrict t_cost and p_cost to 1-255. The m_cost field is stored as a 4-byte BE u32, which accommodates the full 4 GiB maximum without constraint.

Important: soliton does not implement a passphrase-blob encoder or decoder. The CAPI exposes only soliton_argon2id as the KDF primitive; there is no soliton_passphrase_encrypt or soliton_passphrase_decrypt function. Both the basic and extended layouts are APPLICATION-LAYER conventions for applications to implement. The magic-byte discriminator described below is a RECOMMENDED convention for cross-implementation interoperability, not a normative value enforced by the soliton library.

A decoder that needs to support both formats MUST distinguish them using a magic prefix byte, NOT by a length check alone. The recommended values for interoperability are 0x00 for basic and 0x01 for extended. Without a discriminator, the basic body is at least 56 bytes and the extended body at least 62 bytes, so every extended-format length is also a valid basic-format length. A length check can unambiguously identify only basic blobs whose plaintext is shorter than 6 bytes (total blob ≤ 61 bytes). For any real-world content, a length-based discriminator silently misparses ambiguous blobs. A reimplementer who uses a length check may find that it works on test vectors with empty or short plaintexts but fails in production. The magic-byte scheme is the only reliable approach for cross-implementation interoperability.

The extended format minimum parseable size is 62 bytes (4 + 1 + 1 + 16 + 24 + 16 tag with empty plaintext). Magic-byte interaction with minimum blob sizes: The 56-byte and 62-byte minimums are for the format bodies, i.e. the payload after the magic discriminator byte. With the discriminator prepended at offset 0, the minimum sizes become 57 bytes (basic: 0x00 ‖ salt ‖ nonce ‖ tag) and 63 bytes (extended: 0x01 ‖ m_cost ‖ t_cost ‖ p_cost ‖ salt ‖ nonce ‖ tag). Decoders using the magic-byte scheme MUST apply the discriminator-inclusive minimum (57 or 63 bytes) as their pre-AEAD size check, not the body minimum (56 or 62).

Test vector F.29 uses the basic format, with parameters supplied out of band. No test vector covers the extended format (0x01 ‖ m_cost ‖ t_cost ‖ p_cost ‖ salt ‖ nonce ‖ ciphertext). Its encoding is verified only through the extended-format decoder in the reference implementation's test code (tests/compute_vectors.rs): m_cost as a 4-byte BE u32, and t_cost and p_cost as single bytes. Implementors adding extended-format support should verify their encoder output against that decoder by running the reference integration test.
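The magic-byte scheme can be sketched in Rust as follows. This is a hypothetical application-layer decoder; soliton ships none. The 0x00/0x01 discriminators are the recommended values above, and the 57/63-byte minimums include the discriminator. Returning InvalidData for an unknown discriminator or a zero t_cost/p_cost is an assumption. The encoder rejects a t_cost or p_cost of 256 rather than truncating it to 0x00.

```rust
use std::convert::TryInto;

const SALT_LEN: usize = 16;
const NONCE_LEN: usize = 24;
const TAG_LEN: usize = 16;
const BODY_MIN: usize = SALT_LEN + NONCE_LEN + TAG_LEN; // 56

#[derive(Debug, PartialEq, Eq)]
pub enum PassBlobError {
    InvalidLength { expected: usize },
    InvalidData,
}

#[derive(Debug, PartialEq, Eq)]
pub struct CostParams {
    pub m_cost_kib: u32,
    pub t_cost: u8,
    pub p_cost: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ParsedBlob<'a> {
    /// None for the basic format: parameters are supplied out of band.
    pub params: Option<CostParams>,
    pub salt: &'a [u8],
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
}

pub fn parse_pass_blob(blob: &[u8]) -> Result<ParsedBlob<'_>, PassBlobError> {
    let (params, rest) = match blob.first().copied() {
        Some(0x00) => {
            if blob.len() < 1 + BODY_MIN {
                return Err(PassBlobError::InvalidLength { expected: 1 + BODY_MIN }); // 57
            }
            (None, &blob[1..])
        }
        Some(0x01) => {
            if blob.len() < 1 + 6 + BODY_MIN {
                return Err(PassBlobError::InvalidLength { expected: 1 + 6 + BODY_MIN }); // 63
            }
            let m_cost_kib = u32::from_be_bytes(blob[1..5].try_into().unwrap());
            let (t_cost, p_cost) = (blob[5], blob[6]);
            if t_cost == 0 || p_cost == 0 {
                return Err(PassBlobError::InvalidData);
            }
            (Some(CostParams { m_cost_kib, t_cost, p_cost }), &blob[7..])
        }
        Some(_) => return Err(PassBlobError::InvalidData), // unknown discriminator
        None => return Err(PassBlobError::InvalidLength { expected: 1 + BODY_MIN }),
    };
    let (salt, rest) = rest.split_at(SALT_LEN);
    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
    Ok(ParsedBlob { params, salt, nonce, ciphertext })
}

/// Encodes the 7-byte extended header (discriminator + m_cost + t_cost + p_cost).
pub fn encode_ext_header(m_cost_kib: u32, t_cost: u32, p_cost: u32) -> Result<[u8; 7], PassBlobError> {
    let t: u8 = t_cost.try_into().map_err(|_| PassBlobError::InvalidData)?; // 256 does not fit
    let p: u8 = p_cost.try_into().map_err(|_| PassBlobError::InvalidData)?;
    if t == 0 || p == 0 {
        return Err(PassBlobError::InvalidData);
    }
    let m = m_cost_kib.to_be_bytes();
    Ok([0x01, m[0], m[1], m[2], m[3], t, p])
}
```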

Unicode normalization: No Unicode normalization is applied; raw UTF-8 bytes are passed directly. Multi-platform applications MUST normalize passwords to NFC before calling. Keyboards, input methods, and text APIs on different platforms (iOS, Android, desktop) can produce different normalization forms for the same visible text. For example, "café" may arrive as precomposed U+00E9 (NFC, 5 UTF-8 bytes) or as "e" followed by combining U+0301 (NFD, 6 UTF-8 bytes). The two forms produce different keys from the same apparent passphrase.

Invalid UTF-8 passthrough: Argon2id (RFC 9106) accepts arbitrary byte strings — it is not a Unicode function. The soliton argon2id primitive does NOT validate that the password bytes are well-formed UTF-8. Invalid UTF-8 byte sequences (e.g., 0xFF, lone continuation bytes) are passed through to Argon2id unchanged and produce a deterministic key. A reimplementer who adds a UTF-8 validation step at their API boundary (rejecting invalid UTF-8 with InvalidData) produces a stricter interface than the reference — callers passing arbitrary byte strings (not just UTF-8 passwords) to the CAPI will receive InvalidData from the reimplementation but succeed with the reference. The CAPI soliton_argon2id accepts *const u8, usize and passes the bytes through without UTF-8 checking.

Notes:

Only the Argon2id variant is supported (hybrid of Argon2i and Argon2d; recommended by RFC 9106 §4 for on-disk key material).
Not used internally by the protocol KDFs (HKDF-SHA3-256); provided for application-layer key protection.
The argon2 crate zeroizes internal memory blocks on drop.
ad (associated data) is always empty: soliton passes an empty byte string as the Argon2id ad parameter unconditionally. RFC 9106 defines ad as an optional context distinguisher (analogous to HKDF info), but the soliton KDF does not use it. Reimplementers MUST pass empty ad; a non-empty ad produces a different derived key with no error signal. Per-language notes:
- C callers using argon2_ctx directly MUST set .ad = NULL, .adlen = 0.
- Python's argon2-cffi has no ad parameter (always empty, correct by construction).
- Go's argon2.IDKey has no ad parameter (always empty, correct by construction).
- In Rust's argon2 crate, Params::new leaves the associated data empty unless it is set explicitly.

secret (pepper) is always empty: soliton passes an empty byte string as the Argon2id secret parameter. The secret input provides a server-side pepper, but soliton does not use it, because the pepper would require secure server-side key management outside the protocol scope. Reimplementers MUST pass empty secret; a non-empty secret produces a different derived key with no error signal. Per-language notes:
- C callers using argon2_ctx directly MUST set .secret = NULL, .secretlen = 0.
- Python's argon2-cffi has no pepper parameter (always empty, correct by construction).
- Go's argon2.IDKey has no secret parameter (always empty, correct by construction).
- In Rust's argon2 crate, a secret is supplied only through Argon2::new_with_secret, so a hasher constructed with Argon2::new uses an empty secret.

See Appendix B for the full parameter table.

11. Server-Side Encryption at Rest

11.1 Storage Blob Format

Messages are batched, compressed, then encrypted. Blob format:

[version: 1 byte, offset 0] [flags: 1 byte, offset 1] [nonce: 24 bytes, offsets 2-25] [ciphertext + tag, offset 26+]

Version byte (1-255): Storage encryption key version. Value 0 is reserved and rejected.

Version 0 on the decrypt path: Version 0 is rejected at key creation (StorageKey::new) and never enters the keyring. A blob header with version 0 therefore produces a keyring lookup miss, returning AeadFailed — not an early InvalidData. Implementations MUST NOT add a pre-AEAD version-0 check on the decrypt path; doing so returns a different error variant and creates an error-type oracle. This guarantee depends entirely on the keyring construction invariant: the decrypt path itself performs no version-0 check — it relies on StorageKey::new having enforced version ≠ 0 at key creation time, so no version-0 key can ever be in the keyring to produce a lookup hit. An implementation that allows version-0 keys to be added via a debug path, test fixture, or internal bypass silently breaks this guarantee — version-0 blobs would then decrypt successfully against a version-0 key, producing correct plaintext where the spec mandates AeadFailed. Implementations MUST enforce version ≠ 0 at key construction unconditionally; the decrypt path's correctness is derived from it.

Flags byte (bitfield):

Bit 0: compression. 0 = none, 1 = zstd.
Bits 1-7: reserved (must be 0; blobs with reserved bits set are rejected as AeadFailed — not UnsupportedFlags — to prevent error oracles that distinguish pre-AEAD validation failures from authentication failures).

Nonce: 24 bytes generated from the OS CSPRNG (random_bytes(24)) per encryption call. Any single pair of random 192-bit nonces collides with probability 2⁻¹⁹². The birthday bound, where a collision among all nonces under one key becomes likely, is roughly 2⁹⁶ encryptions, far beyond realistic encryption volumes per key version. For context: at one billion encryptions per day (10⁹/day), the expected time to a first nonce collision under the same key bytes is on the order of 10¹⁷ years. Reaching the 2⁹⁶ birthday bound is therefore not a realistic threat. Key rotation MUST be driven by key-material compromise concerns and organizational policy, not by nonce-collision probability. The 24-byte nonce space is large enough that nonce exhaustion is structurally irrelevant; rotate keys on a schedule appropriate for the sensitivity of the protected data. The birthday bound is per key_bytes, not per version number. Assigning a new version number to the same key bytes does not reset the nonce pool: all blobs encrypted under any version that maps to the same key bytes share a single nonce space. Operators MUST use fresh, independently generated key bytes for each new key version. Reusing key material across version numbers provides no cryptographic isolation.
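The collision figures above can be checked with a short Rust calculation using the standard birthday approximations. The expected number of draws to a first collision is about √(π/2 · 2^bits), and the union bound on any collision among n draws is C(n, 2)/2^bits. The function names are illustrative only.

```rust
use std::f64::consts::PI;

/// Expected number of uniformly random `bits`-bit nonces drawn before the first
/// collision: approximately sqrt(pi/2 * 2^bits).
pub fn expected_draws_to_collision(bits: i32) -> f64 {
    (PI / 2.0 * 2f64.powi(bits)).sqrt()
}

/// Union-bound upper limit on the probability that any two of `n` random
/// `bits`-bit nonces collide: C(n, 2) / 2^bits.
pub fn collision_probability_bound(n: f64, bits: i32) -> f64 {
    n * (n - 1.0) / 2.0 / 2f64.powi(bits)
}

/// Years until the expected first collision at `per_day` encryptions per day.
pub fn years_to_expected_collision(bits: i32, per_day: f64) -> f64 {
    expected_draws_to_collision(bits) / per_day / 365.25
}
```

For 192-bit nonces at 10⁹ encryptions per day, `years_to_expected_collision(192, 1e9)` comes out near 2.7 × 10¹⁷.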

Minimum blob length: 42 bytes (1 + 1 + 24 + 16-byte Poly1305 tag with empty ciphertext).
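A minimal Rust sketch of header parsing on the decrypt path, assuming the whole blob is in memory. The names are hypothetical. The error variant for blobs under 42 bytes is left to the caller; this sketch returns `None` for them. Reserved flag bits collapse to `AeadFailed`. Version 0 is deliberately not checked here, because it must fall through to the keyring lookup miss.

```rust
use std::convert::TryInto;

pub const HEADER_LEN: usize = 26; // version(1) + flags(1) + nonce(24)
pub const TAG_LEN: usize = 16;
pub const MIN_BLOB_LEN: usize = HEADER_LEN + TAG_LEN; // 42
pub const FLAG_ZSTD: u8 = 0x01;

#[derive(Debug, PartialEq, Eq)]
pub struct AeadFailed;

#[derive(Debug)]
pub struct BlobHeader<'a> {
    pub version: u8, // not checked for 0 here: a version-0 lookup simply misses
    pub flags: u8,
    pub nonce: [u8; 24],
    pub ciphertext: &'a [u8], // ciphertext followed by the Poly1305 tag
}

/// Splits the fixed header from the body; returns None below the 42-byte minimum.
pub fn parse_blob(blob: &[u8]) -> Option<BlobHeader<'_>> {
    if blob.len() < MIN_BLOB_LEN {
        return None;
    }
    Some(BlobHeader {
        version: blob[0],
        flags: blob[1],
        nonce: blob[2..HEADER_LEN].try_into().ok()?,
        ciphertext: &blob[HEADER_LEN..],
    })
}

/// Interprets the flags byte. Any reserved bit (1-7) is reported as AeadFailed,
/// never as a distinct "unsupported flags" error.
pub fn is_compressed(flags: u8) -> Result<bool, AeadFailed> {
    if (flags & !FLAG_ZSTD) != 0 {
        return Err(AeadFailed);
    }
    Ok((flags & FLAG_ZSTD) != 0)
}
```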

Version and flags are AEAD-authenticated via AAD (§11.4): both fields are included verbatim in the AAD that is passed to XChaCha20-Poly1305 encryption. A reimplementer who reads §11.1 and implements the encrypt path without reaching §11.4 may omit version and flags from the AAD — the AEAD succeeds, but the resulting blob is malleable: an attacker can flip the compression bit or substitute a different version byte without detection. The AAD construction in §11.4 is not optional.

11.2 Pipeline

Write: batch → serialize → compress (zstd if enabled) → construct AAD → encrypt (XChaCha20-Poly1305) → prepend version + flags + nonce → write.

Read: fetch → parse version + flags + nonce → reject reserved flag bits as AeadFailed (not UnsupportedFlags — §11.1 oracle collapse) → look up key by version → reconstruct AAD → decrypt → decompress (if flag set) → deserialize.

Compression before encryption is mandatory ordering (encrypted data is incompressible).

11.3 Compression

encrypt_blob(compress=true) always compresses and always sets flags=0x01, regardless of whether zstd output is larger than the input. There is no expansion-ratio skip. A reimplementer who skips compression when it would expand must set flags=0x00 (not flags=0x01) — see the "Zstd level 1 expansion" note below. Empty plaintext with compress=true: when the plaintext is empty (0 bytes) and compress=true, the encoder MUST still call zstd on the empty input and store the resulting non-empty zstd frame. An empty zstd input produces a minimal valid zstd frame (~4-12 bytes, not 0 bytes — informational only; implementations MUST NOT add a minimum frame size check based on this range: adding a "frame must be ≥ 4 bytes" pre-AEAD guard would reject AEAD-authenticated blobs from future-compatible encoders that use a different zstd version or configuration, and re-create the oracle-collapse problem by returning a distinct error before attempting AEAD). The encoder MUST NOT skip compression for empty plaintext and produce a 0-byte body — doing so creates a blob that decrypt_blob can decode successfully (AEAD passes on the correct key, decompression of a 0-byte body produces 0 bytes), but which is not conformant to this spec and is not byte-compatible with the reference implementation's encrypt output. A reimplementer who tests the empty-plaintext case using compress=false and assumes the same behavior applies to compress=true will miss this divergence.

Zstd (RFC 8878). On by default. The current implementation uses ruzstd's Fastest level (~zstd level 1); higher levels are not yet available in ruzstd 0.8.x. The compression level is not configurable — all blobs are compressed at the same level. Interop note: the compression level is not part of the wire format. Any valid zstd frame is acceptable on decompression regardless of the compression level used to produce it. A reimplementer using a higher compression level (e.g., zstd level 3) produces interoperable blobs — the decompressor does not know or care what level was used.

Size limit: 256 MiB maximum on native targets; 16 MiB on wasm32 targets (cfg(target_arch = "wasm32")). This limit is enforced on both the encrypt and decrypt paths. On encrypt, the core library returns InvalidData (not InvalidLength; oversized plaintext is a protocol-level size policy violation, not a type-level buffer size mismatch) when plaintext exceeds the platform's limit. On decrypt, decompressed output exceeding the limit triggers AeadFailed (not DecompressionFailed) — all post-AEAD errors are collapsed to prevent a 1-bit oracle that would reveal successful authentication (see §12 error-oracle collapse). The decrypt-side limit prevents OOM from maliciously crafted zstd payloads ("zip bomb" attacks). Enforced via decoder.take(MAX_DECOMPRESSED_SIZE + 1) followed by length check. Cross-platform note: a blob encrypted on native with plaintext between 16-256 MiB is permanently undecryptable on WASM (exceeds the WASM limit). WASM encryptors are capped at 16 MiB so they cannot create such blobs, but mixed-platform deployments must enforce the lower limit on the encryption side.
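The `take(MAX_DECOMPRESSED_SIZE + 1)` pattern can be sketched using only `std::io::Read`. The decompressing reader is abstracted as any `Read` implementation; in practice it would be a zstd decoder, but no particular zstd crate API is assumed. The function names are hypothetical.

```rust
use std::io::Read;

/// Decrypt-side decompression limit: 256 MiB native, 16 MiB on wasm32.
#[cfg(not(target_arch = "wasm32"))]
pub const MAX_DECOMPRESSED_SIZE: u64 = 256 * 1024 * 1024;
#[cfg(target_arch = "wasm32")]
pub const MAX_DECOMPRESSED_SIZE: u64 = 16 * 1024 * 1024;

/// Collapsed post-AEAD error.
#[derive(Debug, PartialEq, Eq)]
pub struct AeadFailed;

/// Reads at most `limit + 1` bytes from a decompressing reader. Reading one byte
/// past the limit distinguishes "exactly at the limit" from "over it".
pub fn read_bounded<R: Read>(decoder: R, limit: u64) -> Result<Vec<u8>, AeadFailed> {
    let mut out = Vec::new();
    decoder
        .take(limit + 1)
        .read_to_end(&mut out)
        .map_err(|_| AeadFailed)?; // decoder errors also collapse to AeadFailed
    if out.len() as u64 > limit {
        return Err(AeadFailed);
    }
    Ok(out)
}

/// Convenience wrapper using the platform limit.
pub fn read_decompressed<R: Read>(decoder: R) -> Result<Vec<u8>, AeadFailed> {
    read_bounded(decoder, MAX_DECOMPRESSED_SIZE)
}
```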

CAPI error on oversized plaintext — platform-dependent: The CAPI applies a blanket 256 MiB cap on all input buffers (§13.4), returning InvalidLength for any buffer exceeding that limit. On native targets, this CAPI cap fires before the core library's 256 MiB InvalidData check — so CAPI callers on native see InvalidLength for oversized plaintext, not InvalidData. On WASM targets, the core library's 16 MiB limit is smaller than the CAPI's 256 MiB cap, so the core check fires first and returns InvalidData. A reimplementer building a compatible CAPI should apply the general InvalidLength cap first, then let the core InvalidData check enforce the platform-specific limit.

Zstd level 1 expansion and conditional compression skip — AAD binding hazard: At compression level 1 (Fastest), zstd occasionally expands incompressible data (the compressed output is larger than the input). A reimplementer who skips compression when it would expand (i.e., uses the uncompressed plaintext if compressed_size >= original_size) MUST set flags = 0x00 in both the blob header and the AAD. If they set flags = 0x01 in the AAD (because they "attempted" compression) but store uncompressed plaintext in the ciphertext, AEAD authentication will succeed at decrypt but decompression of the uncompressed content will fail or produce garbage. Equivalently, if they set flags = 0x01 in the header and flags = 0x00 in the AAD, AEAD authentication fails immediately. The invariant: flags in the AAD MUST equal flags in the blob header, and BOTH must accurately reflect whether the encrypted content is zstd-compressed. There is no mechanism to "correct" the flags after AEAD seals the blob — the flags are bound to the ciphertext at encryption time.

Decompression is flag-driven, not content-sniffing: The decryptor checks flags bit 0 to determine whether to decompress — it does NOT inspect the plaintext for zstd magic bytes (0x28 0xB5 0x2F 0xFD) and attempt decompression speculatively. A reimplementer who sniffs content and decompresses any output that begins with zstd magic bytes diverges from the specification: an uncompressed blob whose plaintext happens to begin with those bytes would be incorrectly decompressed, producing garbage or a decompression error. The flags byte is the sole authority on whether decompression applies.

An empty compressed payload decompresses to an empty plaintext (decrypt side only — special-cased after AEAD decryption by checking whether the decrypted content is empty before calling zstd, which would otherwise reject the empty frame). The empty check fires on the post-AEAD plaintext bytes, not on the raw ciphertext body: flags=0x01 with a zero-byte post-AEAD payload is accepted; a reimplementer who checks for emptiness pre-AEAD (on the ciphertext) and rejects would silently refuse a class of valid blobs. On the encrypt side, zstd produces a non-empty frame even for empty input (zstd frame headers are always present), so an empty compressed payload can only appear in a blob produced outside the standard encrypt path.
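The post-AEAD decompression rules (flag-driven, empty-payload special case, collapsed errors) can be sketched in Rust as below. `finish_decrypt` is hypothetical, and the `decompress` closure stands in for a zstd decoder. The sketch assumes reserved flag bits were already rejected during header parsing.

```rust
pub const FLAG_ZSTD: u8 = 0x01;

#[derive(Debug, PartialEq, Eq)]
pub struct AeadFailed;

/// `flags` is the authenticated header byte; `payload` is the AEAD plaintext.
pub fn finish_decrypt<F>(flags: u8, payload: Vec<u8>, decompress: F) -> Result<Vec<u8>, AeadFailed>
where
    F: FnOnce(&[u8]) -> Result<Vec<u8>, ()>,
{
    if (flags & FLAG_ZSTD) == 0 {
        // Flag-driven: no content sniffing, even if the payload starts with
        // the zstd magic bytes 0x28 0xB5 0x2F 0xFD.
        return Ok(payload);
    }
    if payload.is_empty() {
        // Empty post-AEAD payload with the compression flag set: empty plaintext,
        // without calling the decoder (which would reject an empty frame).
        return Ok(Vec::new());
    }
    // Any decompression failure collapses to AeadFailed.
    decompress(&payload).map_err(|_| AeadFailed)
}
```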

encrypt_blob zstd expansion asymmetry with streaming: encrypt_blob does not enforce a zstd expansion guard — if zstd produces output larger than the input, the larger output is stored. This is asymmetric with stream_encrypt_chunk (§15.11), which returns Internal if zstd output exceeds plaintext.len() + STREAM_ZSTD_OVERHEAD. Implementations MUST NOT add the streaming guard to encrypt_blob for consistency — doing so returns Internal for incompressible inputs instead of storing them, breaking interoperability.

Compression oracle (CRIME/BREACH): Any API that compresses plaintext that an attacker can partially control and then reports the ciphertext size creates a compression oracle. If the caller can observe the size of the encrypted blob (e.g., as returned by encrypt_blob) and inject chosen text adjacent to a secret in the same compression context (e.g., the same blob or channel), the attacker can recover the secret byte-by-byte by correlating size changes with injected guesses. This is the CRIME/BREACH attack family. For encrypt_blob, the entire plaintext is compressed as a single unit before AEAD — if the plaintext mixes attacker-controlled data with secrets (e.g., a JSON blob with a user-controlled field alongside an authentication token), the compressed size reveals information about the secret. Callers MUST NOT include attacker-controlled data and secrets in the same encrypt_blob call without either disabling compression (compress=false) or ensuring the compression contexts are isolated. For community channel storage where all subscribers share the same channel key, this concern applies to any blob that mixes channel content from multiple trust levels. See §15.5 for the same analysis applied to the streaming layer, where the concern is more acute due to sequential chunk compression.

11.4 Storage AAD

11.4.1 Community Storage AAD

aad = "lo-storage-v1"                    // 13 bytes
   || version                          // 1 byte (key version)
   || flags                            // 1 byte (bit 0: compressed)
   || len(channel_id) || channel_id    // UTF-8, 2-byte BE len
   || len(segment_id) || segment_id    // UTF-8, 2-byte BE len

Total AAD size: 15 + 2 + len(channel_id) + 2 + len(segment_id) = 19 + len(channel_id) + len(segment_id) bytes. The fixed overhead is 19 bytes: 13 (label) + 1 (version) + 1 (flags) + 2 (channel_id length prefix) + 2 (segment_id length prefix). Quick-check: for 8-byte channel_id and 12-byte segment_id, the AAD is 39 bytes total.
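A minimal Rust sketch of the §11.4.1 AAD layout, useful for checking the 19-byte fixed overhead. `storage_aad` is a hypothetical helper, not the reference `build_storage_aad` signature. It returns `InvalidData` for identifiers over 65,535 bytes and accepts zero-length identifiers, matching the rules later in this section.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

/// Appends a 2-byte big-endian byte-length prefix followed by the field bytes.
fn push_len_prefixed(aad: &mut Vec<u8>, field: &[u8]) -> Result<(), InvalidData> {
    let len = u16::try_from(field.len()).map_err(|_| InvalidData)?; // > 65,535 bytes rejected
    aad.extend_from_slice(&len.to_be_bytes());
    aad.extend_from_slice(field);
    Ok(())
}

/// Hypothetical helper building the community storage AAD.
pub fn storage_aad(version: u8, flags: u8, channel_id: &str, segment_id: &str) -> Result<Vec<u8>, InvalidData> {
    let mut aad = Vec::with_capacity(19 + channel_id.len() + segment_id.len());
    aad.extend_from_slice(b"lo-storage-v1"); // 13-byte label, no length prefix
    aad.push(version);
    aad.push(flags);
    push_len_prefixed(&mut aad, channel_id.as_bytes())?;
    push_len_prefixed(&mut aad, segment_id.as_bytes())?;
    Ok(aad)
}
```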

Why version and flags are in the AAD: Binding version prevents an attacker from substituting a different key version's ciphertext (key confusion). Binding flags prevents flipping the compression bit after encryption — without this, an attacker could set the compression flag on uncompressed ciphertext, causing the decryptor to run zstd decompression on raw plaintext (producing garbage or a decompression bomb). The AAD authenticates the processing pipeline, not just the plaintext.

Why community storage AAD omits identity fingerprints: Community storage is channel-keyed, not user-keyed. Blobs are shared channel content encrypted under the channel's storage key, not under any individual user's key. Binding sender/recipient fingerprints would require knowing the author's identity at both encrypt and decrypt time, which is not always available (e.g., bulk channel export, server-side re-encryption). The channel_id and segment_id provide the binding that matters for anti-relocation: blobs cannot be moved to a different channel or segment position.

Why DM queue AAD binds recipient but not sender: DM queue blobs are server-held per-recipient caches. The recipient's identity is always known at both encrypt and decrypt time, because the recipient is the key owner. The sender's identity is not reliably available at decrypt time: a server processing a DM queue does not necessarily know which sender produced each blob. Binding only the recipient prevents a relay from substituting one recipient's queued messages into another recipient's slot, without requiring sender identity tracking.

A reimplementer who adds sender fingerprints to community storage AAD or DM queue AAD produces blobs that the standard implementation cannot decrypt. The AAD mismatch causes silent AeadFailed.

No Unicode normalization: All string identifiers (channel_id, segment_id, batch_id) are raw UTF-8 bytes with no normalization applied. recipient_fp is raw binary, not a string (§11.4.2). "café" encoded as NFC (U+00E9) and as NFD (U+0065 U+0301) produces different AAD and thus decryption failure. Code on platforms that treat canonically equivalent strings as interchangeable must preserve and pass the original byte representation. For example, Swift's String compares by canonical equivalence, so NFC and NFD forms are == even though their UTF-8 bytes differ, and macOS filesystem APIs may return decomposed names.

len() is the byte length of the UTF-8 encoding: The 2-byte BE length prefix encodes the byte count of the UTF-8-encoded string, not the character count, UTF-16 code-unit count, or code-point count. A 4-byte emoji (U+1F600) has byte-length 4, character-count 1, and UTF-16-unit-length 2. The built-in string length (s.length() in Java, s.Length in C#, s.length in JavaScript) returns the UTF-16 code-unit count. Implementations MUST convert it to a byte count before writing the prefix, e.g. s.getBytes(StandardCharsets.UTF_8).length in Java, Encoding.UTF8.GetByteCount(s) in C#, or Buffer.byteLength(s, 'utf8') in Node.js. Passing the wrong count shifts all subsequent fields and produces permanent AEAD failure with no diagnostic.
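In Rust, the three counts from the emoji example can be compared directly. Only `str::len()` (UTF-8 bytes) is correct for the prefix. The helper names are illustrative.

```rust
use std::convert::TryFrom;

/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values) for `s`.
/// Only the first value belongs in the AAD length prefix.
pub fn length_views(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

/// The 2-byte big-endian prefix for `s`, or None if it exceeds 65,535 bytes.
pub fn aad_len_prefix(s: &str) -> Option<[u8; 2]> {
    u16::try_from(s.len()).ok().map(u16::to_be_bytes)
}
```

The NFC and NFD spellings of "café" differ in every count: `length_views("caf\u{E9}")` gives (5, 4, 4), while `length_views("cafe\u{301}")` gives (6, 5, 5).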

UTF-8 validation: All string fields (channel_id, segment_id, batch_id) MUST be validated as well-formed UTF-8 before inclusion in the AAD. Invalid UTF-8 → InvalidData. In Rust, channel_id: &str (and likewise segment_id: &str, batch_id: &str) guarantees valid UTF-8 at the type level — no explicit from_utf8() check is required or present in the reference implementation; the Rust type system enforces it. In C, Go, and other language bindings, the caller must perform an explicit check — Go's string([]byte{0xFF}) accepts arbitrary bytes and does NOT validate UTF-8; C has no built-in UTF-8 validation. Without this check, two callers passing the same logical string through different byte-level representations (one with an invalid continuation byte silently substituted) produce different AAD and silent AEAD failure on decrypt. CAPI callers and non-Rust bindings MUST validate UTF-8 before passing string fields to the library.
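For bindings that receive identifiers as raw bytes, a UTF-8 check before AAD construction might look like the following Rust sketch. `identifier_from_bytes` is hypothetical and relies only on `std::str::from_utf8`.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

/// Validates an identifier that arrived as raw bytes (for example across an FFI
/// boundary) before it is used in an AAD. A Rust &str already guarantees this.
pub fn identifier_from_bytes(raw: &[u8]) -> Result<&str, InvalidData> {
    std::str::from_utf8(raw).map_err(|_| InvalidData)
}
```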

Oversized identifier error asymmetry: When channel_id or segment_id exceeds 65,535 bytes (the maximum representable value of the 2-byte BE length prefix), build_storage_aad returns InvalidData. On the encrypt path this propagates directly as InvalidData. On the decrypt path it is remapped to AeadFailed — returning InvalidData from a decrypt call would leak that the rejection occurred at AAD construction before AEAD was attempted, revealing that the identifier exceeded the length limit rather than that authentication failed. Callers must not interpret AeadFailed on decrypt as proof that the ciphertext was structurally valid — an oversized identifier produces the same error as a tampered blob.

Oversized batch_id error asymmetry (§11.4.2): The same encrypt→InvalidData / decrypt→AeadFailed asymmetry documented above for channel_id and segment_id applies equally to batch_id. When batch_id exceeds 65,535 bytes (the maximum representable value of the 2-byte BE length prefix in the DM queue AAD construction), InvalidData is returned on the encrypt path and AeadFailed on the decrypt path. The rationale is identical to the community storage asymmetry above.

Zero-length channel_id and segment_id are valid: The len(x) || x encoding permits zero-length strings — a zero-length channel_id encodes as 0x0000 (2-byte BE prefix with no subsequent bytes). The library accepts zero-length IDs on both encrypt and decrypt paths. Reimplementers MUST NOT add a non-empty guard on these fields; doing so produces InvalidData on encrypt and AeadFailed on decrypt for blobs where an empty string was used as the identifier, creating a silent interoperability break with any implementation following this spec.

11.4.2 DM Queue AAD

aad = "lo-dm-queue-v1"                     // 14 bytes
   || version                            // 1 byte (key version)
   || flags                              // 1 byte (bit 0: compressed)
   || len(recipient_fp) || recipient_fp  // 32 bytes, recipient identity fingerprint
   || len(batch_id) || batch_id          // UTF-8, 2-byte BE len

recipient_fp is length-prefixed despite being fixed-size. A reimplementer who uses bare encoding, by analogy with §7.4's fixed-size fields, produces different AAD bytes and silent AEAD failure on every message.

Total AAD size: 14 + 1 + 1 + 2 + 32 + 2 + len(batch_id) = 52 + len(batch_id) bytes. For a UUID4 batch_id (36 ASCII characters), the total AAD is 88 bytes.
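A minimal Rust sketch of the DM queue AAD, showing the 2-byte prefix 0x0020 written in front of the 32-byte recipient_fp. `dm_queue_aad` and its signature are hypothetical.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

pub fn dm_queue_aad(version: u8, flags: u8, recipient_fp: &[u8; 32], batch_id: &str) -> Result<Vec<u8>, InvalidData> {
    // batch_id over 65,535 bytes cannot be length-prefixed.
    let batch_len = u16::try_from(batch_id.len()).map_err(|_| InvalidData)?;
    let mut aad = Vec::with_capacity(52 + batch_id.len());
    aad.extend_from_slice(b"lo-dm-queue-v1"); // 14-byte label, no length prefix
    aad.push(version);
    aad.push(flags);
    // recipient_fp: raw 32-byte SHA3-256 digest, still length-prefixed (0x0020).
    aad.extend_from_slice(&32u16.to_be_bytes());
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(&batch_len.to_be_bytes());
    aad.extend_from_slice(batch_id.as_bytes());
    Ok(aad)
}
```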

batch_id MUST be unique per (recipient_fp, key_version) pair: The batch_id is the sole per-batch domain separator in the DM queue AAD. Two blobs with the same recipient_fp, key_version (the version byte in the header), and batch_id share the same AAD. Because each blob carries its own nonce, an attacker with write access to the store can substitute blob A (header, nonce, and ciphertext) for blob B. The recipient reconstructs the same AAD for either slot, so the AEAD check passes. In effect, the tag provides no binding between a ciphertext and the specific batch it was written for. Splicing one blob's ciphertext and tag under another blob's nonce would still fail, barring a negligible nonce collision. Distinct batch_id values prevent this by making each batch's AAD unique, so a ciphertext produced for batch A cannot authenticate as batch B. batch_id SHOULD be a value that is unique per batch: a UUID4 (random_bytes(16) formatted as a UUID), a monotonic counter, or a timestamp with sufficient resolution. The reference implementation does not generate batch_id automatically; it is caller-supplied. A server that reuses batch_id across different message batches to the same recipient under the same key version creates colliding AADs.
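One way to produce a UUID4 batch_id from random_bytes(16) is to set the version and variant bits and format the bytes as hex, as sketched below in Rust. The function is hypothetical. The caller is assumed to supply CSPRNG bytes, since the Rust standard library has no random source. The bit layout follows RFC 9562 (formerly RFC 4122).

```rust
/// Formats 16 caller-supplied random bytes as a version-4 UUID string.
pub fn uuid4_from_bytes(mut b: [u8; 16]) -> String {
    b[6] = (b[6] & 0x0F) | 0x40; // version nibble = 4
    b[8] = (b[8] & 0x3F) | 0x80; // variant bits = 10xx
    let hex: String = b.iter().map(|x| format!("{:02x}", x)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```

The resulting 36-byte ASCII string gives the 88-byte DM queue AAD computed above.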

recipient_fp is a raw 32-byte binary value: recipient_fp is the raw SHA3-256 digest of the recipient's LO composite public key (32 bytes of binary data) — not a hex string, not a UTF-8 encoding, not a Base64 value. It is concatenated directly into the AAD with no encoding step and no additional length delimiter beyond the len() prefix shown above. Unlike batch_id (which is a UTF-8 string requiring byte-length encoding) and unlike the display fingerprint (a 64-character hex string returned by GenerateIdentity), recipient_fp is always 32 raw bytes. The surrounding §11.4 discussion of UTF-8 validation and byte-length semantics applies only to string fields (channel_id, segment_id, batch_id) — it does not apply to recipient_fp.

DM queue blob size cap: DM queue blobs (soliton_dm_queue_encrypt / soliton_dm_queue_decrypt) inherit the CAPI input size cap from §13.2: 256 MiB (268,435,456 bytes) on standard (non-WASM) targets; 16 MiB (16,777,216 bytes) on WASM targets (where allocation constraints are tighter). Inputs exceeding the applicable cap return InvalidLength. The WASM cap is enforced by the WASM binding layer, not the core library — the core library applies only the 256 MiB cap. Implementations targeting WASM MUST apply the 16 MiB cap before calling into the core library. In practice, DM queue blobs are bounded by the application-layer message size limit (LO Protocol §15.1 mandates padding to a fixed maximum message size), which is well below either cap.
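A binding layer that must enforce the WASM cap itself could run a pre-flight check like the one below before calling into the core library. The constant names, the InvalidLength struct, and check_input_len are all hypothetical. Only the two byte limits come from the text above.

```rust
/// Hypothetical constants mirroring the caps described above.
pub const NATIVE_INPUT_CAP: usize = 256 * 1024 * 1024; // 268,435,456 bytes
pub const WASM_INPUT_CAP: usize = 16 * 1024 * 1024; // 16,777,216 bytes

#[derive(Debug, PartialEq)]
pub struct InvalidLength {
    pub max: usize,
    pub got: usize,
}

/// Rejects inputs larger than the cap for the current target. A WASM binding
/// would call this itself, because the core library only applies the 256 MiB cap.
pub fn check_input_len(len: usize, wasm_target: bool) -> Result<(), InvalidLength> {
    let max = if wasm_target { WASM_INPUT_CAP } else { NATIVE_INPUT_CAP };
    if len > max {
        Err(InvalidLength { max, got: len })
    } else {
        Ok(())
    }
}
```

A binding compiled for both targets could pass cfg!(target_arch = "wasm32") as wasm_target.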

11.5 Storage Layout

<backend>/<channel_id>/<yyyy-mm-dd>/segment-<N>.blob

Partitioned by channel and date for efficient retention purging.
Segments are append-only encrypted chunks, numbered sequentially within a day.
A new segment is started once the current segment holds batch_size messages. batch_size is an application-defined threshold (not a library constant): the maximum number of messages stored in a single segment file before rolling over to a new one. A typical value is 1,000 messages. Larger values increase the cost of reading old messages, because the entire segment must be decrypted to retrieve any single message within it.
On S3 backends, this layout yields shallow, predictable key prefixes.
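The rollover rule can be modelled with a small cursor that counts the messages in the current segment. SegmentCursor is a hypothetical type. It omits the <backend> prefix and day rollover, treats batch_size as the per-segment maximum, and starts numbering from whatever segment value the caller supplies.

```rust
/// Hypothetical cursor over <channel_id>/<yyyy-mm-dd>/segment-<N>.blob.
pub struct SegmentCursor {
    pub channel_id: String,
    pub date: String, // already formatted as yyyy-mm-dd
    pub segment: u32, // current N
    pub messages_in_segment: usize,
    pub batch_size: usize, // application-defined maximum, e.g. 1,000
}

impl SegmentCursor {
    /// Accounts for one appended message and returns the relative path of the
    /// segment it belongs to, rolling over to N + 1 when the current segment
    /// already holds batch_size messages.
    pub fn append(&mut self) -> String {
        if self.messages_in_segment >= self.batch_size {
            self.segment += 1;
            self.messages_in_segment = 0;
        }
        self.messages_in_segment += 1;
        format!("{}/{}/segment-{}.blob", self.channel_id, self.date, self.segment)
    }
}
```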

segment_id AAD value: The segment_id field in community storage AAD (§11.4.1) is the <yyyy-mm-dd> directory name for channels with at most one segment per day (e.g., "2024-03-15", matching the test vector in Appendix F.8). The date MUST be ISO 8601 with zero-padded month and day: "2024-03-05" for March 5th, NOT "2024-3-5". Two implementations using different date formatters (one zero-padded, one not) produce different AAD bytes and a silent AeadFailed on every cross-implementation decrypt. For channels with multiple segments per day, segment_id MUST include both the date and the sequence number to prevent AAD collisions. Examples are "2024-03-15/segment-42" (date and segment name separated by /) or any other unambiguous application-defined format, provided both encrypt and decrypt use the same convention. Using the bare date on multi-segment days gives every blob from that day the same segment_id and therefore the same AAD. Different key versions (the version byte) still prevent key confusion, but the AEAD no longer binds a blob to its position within the day. An attacker with write access to the store could swap two of that day's segments that share the same AAD without triggering AeadFailed. The channel_id AAD value is the bare channel identifier string, not any path prefix.
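Zero-padding is easiest to get right when a single helper produces segment_id for both the encrypt and decrypt paths. The sketch below uses the hypothetical name segment_id. It relies on Rust's width formatting and the illustrative "date/segment-N" multi-segment form, and it does not validate month or day ranges.

```rust
/// Hypothetical helper that builds the segment_id AAD string.
pub fn segment_id(year: u16, month: u8, day: u8, segment: Option<u32>) -> String {
    // {:04}/{:02} zero-pad, so (2024, 3, 5) becomes "2024-03-05", never "2024-3-5".
    let date = format!("{:04}-{:02}-{:02}", year, month, day);
    match segment {
        // At most one segment per day: the bare date.
        None => date,
        // Several segments per day: illustrative "date/segment-N" form.
        Some(n) => format!("{}/segment-{}", date, n),
    }
}
```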

segment_id is external caller-supplied context — not stored in the blob: The segment_id value is never encoded inside the encrypted blob. It must be reproduced identically at decrypt time from external metadata (e.g., the file path, directory name, or database record identifying this segment). A reimplementer who assumes segment_id is derivable from the ciphertext will produce a wrong AAD at decrypt time and receive AeadFailed with no diagnostic. Both the encrypt and decrypt calls MUST supply the same segment_id string — the AEAD tag authenticates it as part of AAD, so any mismatch causes authentication failure.

No canonical multi-segment segment_id format: This specification does not standardize a multi-segment segment_id format. The example "2024-03-15/segment-42" is illustrative, not normative. Any format that uniquely identifies the date and segment position within a channel is acceptable, provided it is applied consistently on the encrypt and decrypt sides. Deployments that define their own format (e.g., "2024-03-15_42", "20240315-042") are specification-conformant as long as the same convention is used throughout.

11.6 Key Rotation

Multiple decryption keys can be loaded at the same time, each identified by its version byte (1-255).

LO_STORAGE_KEY_V1 = <64 hex chars>    # 256-bit key — MUST be generated from the OS CSPRNG
LO_STORAGE_KEY_V2 = <64 hex chars>
# Generate: openssl rand -hex 32

Storage key MUST be generated from the OS CSPRNG: the per-blob nonce is specified in §11.1 as random_bytes(24), but the storage key is supplied by the caller. It is a long-lived 256-bit symmetric key used to encrypt every blob under that version, so it MUST be generated with the OS CSPRNG (openssl rand -hex 32, random_bytes(32), or equivalent). A key derived from a deterministic schedule, a counter, a password without KDF stretching, or any other non-CSPRNG source is only as strong as the entropy of that source. A KDF-derived key (e.g., HKDF from a master key) is acceptable only if the master key itself was CSPRNG-generated and the derivation is documented. openssl rand -hex 32 is the recommended generation command; any OS-level CSPRNG interface (/dev/urandom, getrandom(2), CryptGenRandom, SecRandomCopyBytes) is equivalent.
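A loader for the LO_STORAGE_KEY_V<n> variables might parse and sanity-check the hex value before passing the bytes to StorageKey::new. parse_storage_key_hex is a hypothetical helper, and the library performs its own all-zero check regardless.

```rust
/// Hypothetical loader for a LO_STORAGE_KEY_V<n> value: exactly 64 hex
/// characters (surrounding whitespace ignored) that decode to a non-zero key.
pub fn parse_storage_key_hex(s: &str) -> Result<[u8; 32], &'static str> {
    let s = s.trim();
    // Checking every byte first also rules out a '+' sign, which
    // u8::from_str_radix would otherwise accept.
    if s.len() != 64 || !s.bytes().all(|c| c.is_ascii_hexdigit()) {
        return Err("expected exactly 64 hex characters");
    }
    let mut key = [0u8; 32];
    for (i, out) in key.iter_mut().enumerate() {
        // Slicing is safe: every byte was checked to be ASCII above.
        *out = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).map_err(|_| "invalid hex")?;
    }
    // Mirrors the library's rejection of all-zero key material.
    if key.iter().fold(0u8, |acc, &b| acc | b) == 0 {
        return Err("all-zero key rejected");
    }
    Ok(key)
}
```

In practice, the input would come from std::env::var("LO_STORAGE_KEY_V1"). The returned array is a Copy type, so the caller's copy still needs zeroizing once the keyring has taken its own.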

Procedure:

1. Generate new key for version N+1.
2. Provide at boot: LO_STORAGE_KEY_V{N+1} environment variable.
3. Update config: active_version = N+1 (live reload).
4. New writes tagged V(N+1). Old reads use version byte to select key.
5. Optional: re-encrypt old segments (read with old key, write with new).
6. After migration: remove old key on next restart. Warning: if step 5 is skipped, removing the old key makes all blobs encrypted under that version permanently undecryptable. There is no grace period, soft delete, or recovery mechanism. The version byte in each blob identifies which key decrypts it; without that key in the keyring, decrypt_blob returns AeadFailed. Callers MUST either complete re-encryption before key removal or accept permanent data loss for unrewritten blobs.
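Before step 6, an operator can confirm that no stored blob still needs the old key. The sketch below is hypothetical and assumes the key-version byte is the first byte of each storage blob. Check §11.1 for the actual header layout before relying on that offset.

```rust
/// Hypothetical migration check. ASSUMPTION: the key-version byte is the
/// first byte of every storage blob (verify against the §11.1 header layout).
/// Returns the indices of blobs still encrypted under `old_version`.
pub fn blobs_on_version(blobs: &[Vec<u8>], old_version: u8) -> Vec<usize> {
    blobs
        .iter()
        .enumerate()
        .filter(|(_, blob)| blob.first() == Some(&old_version))
        .map(|(i, _)| i)
        .collect()
}

/// Step 6 guard: removing the old key is only safe once no blob references it.
pub fn safe_to_remove_key(blobs: &[Vec<u8>], old_version: u8) -> bool {
    blobs_on_version(blobs, old_version).is_empty()
}
```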

Key validation: StorageKey::new(version, key_bytes) rejects version 0 with UnsupportedVersion. Version 0 is reserved and is never a valid key version, whether passed to StorageKey::new or found in a blob's version byte. It also rejects all-zero key material with InvalidData, using a constant-time comparison (ct_eq). An all-zero key is never legitimate: the key is then known to any observer, so the XChaCha20 keystream is publicly computable and every encrypted blob is trivially decryptable. (Nonces are CSPRNG-generated independently of the key value; the rejection is not about nonce determinism.)

Caller zeroization obligation: key_bytes is [u8; 32], a Copy type. StorageKey::new receives a bitwise copy of the caller's array and zeroizes its own copy on rejection paths. The caller's original array is a separate copy that the library does NOT zeroize. After calling StorageKey::new, the caller MUST explicitly zeroize their copy of key_bytes (Rust: key_bytes.zeroize(); C: soliton_zeroize(key_ptr, 32)). The general [u8; N] Copy zeroization pattern is described in §10.5; StorageKey::new is one instance of it.
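The two StorageKey::new rejections and the caller-side zeroization obligation can be sketched with the standard library alone. validate_storage_key and zero_bytes are hypothetical helpers. zero_bytes uses volatile writes as a stand-in for the zeroize crate, and the OR-fold is a best-effort stand-in for the library's ct_eq comparison.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

#[derive(Debug, PartialEq)]
pub enum KeyError {
    UnsupportedVersion,
    InvalidData,
}

/// Hypothetical restatement of the StorageKey::new checks (not the library code).
pub fn validate_storage_key(version: u8, key: &[u8; 32]) -> Result<(), KeyError> {
    if version == 0 {
        return Err(KeyError::UnsupportedVersion); // version 0 is reserved
    }
    // OR every byte together with no early exit, so the loop does not stop
    // at the first non-zero byte. The library itself uses ct_eq here.
    let acc = key.iter().fold(0u8, |acc, &b| acc | b);
    if acc == 0 {
        return Err(KeyError::InvalidData); // all-zero key
    }
    Ok(())
}

/// Zeroizes the caller's own copy of key_bytes. Volatile writes keep the
/// compiler from eliding the stores; real code should prefer the zeroize crate.
pub fn zero_bytes(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```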

add_key atomicity — active_version update and map insert: The add_key function sets active_version = version and inserts the key into the map. The Rust reference implementation assigns active_version BEFORE the map insert — this is safe in Rust because HashMap::insert is infallible (it cannot throw or return an error). Go's map[V]K insert and CPython's dict insert are also infallible — the assign-before-insert ordering is safe in both and requires no reordering. Reimplementers in languages where map insert CAN throw or fail (Java HashMap, C# Dictionary, custom MutableMapping subclasses in Python) MUST use the opposite ordering: assign active_version only AFTER the insert returns success. In these languages, if the map insert throws after the active_version assignment, active_version points to a missing key, breaking the active_key() never-None invariant — every subsequent encrypt_blob call returns InvalidData with no diagnostic. The safe rule: assign active_version after insert in any language where the map insert can raise an exception or return an error; assign before insert only when insert is unconditionally infallible.
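Rust's HashMap::try_reserve can make the fallible part of an insert explicit, which gives a concrete picture of the safe ordering described above. Everything that can fail happens first, then the insert, and only after that is active_version repointed. Ring and add_key_active are hypothetical names, and this fragment does not zeroize replaced material.

```rust
use std::collections::{HashMap, TryReserveError};

/// Hypothetical keyring fragment (not the library type).
pub struct Ring {
    keys: HashMap<u8, [u8; 32]>,
    active_version: u8,
}

impl Ring {
    pub fn new(version: u8, key: [u8; 32]) -> Self {
        Ring { keys: HashMap::from([(version, key)]), active_version: version }
    }

    pub fn active_version(&self) -> u8 {
        self.active_version
    }

    /// Adds `key` at `version` and makes it active. Returns Ok(true) if
    /// material at that version was replaced (a real keyring would also
    /// zeroize the replaced material).
    pub fn add_key_active(&mut self, version: u8, key: [u8; 32]) -> Result<bool, TryReserveError> {
        // 1. The only fallible step (allocation) runs first. If it fails,
        //    nothing has changed and active_version still names a present key.
        self.keys.try_reserve(1)?;
        // 2. With capacity reserved, this insert does not need to grow.
        let replaced = self.keys.insert(version, key).is_some();
        // 3. Repoint active_version last, once the key is in the map.
        self.active_version = version;
        Ok(replaced)
    }
}
```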

Keys are held in process memory for the lifetime of the StorageKeyRing object and are zeroized on drop. When the StorageKeyRing is dropped (end of scope in Rust; explicit destruction in the CAPI via soliton_keyring_free), all key material in the keyring is zeroized before deallocation. CAPI callers MUST call soliton_keyring_free after use. A handle that is never freed keeps its key bytes resident in process memory, unzeroed, until the process exits. Keys are not persisted automatically; the caller is responsible for reloading keys (e.g., from environment variables or a secrets manager) at startup. Environment variables should be cleared from the process environment immediately after reading.

Security risk of retaining old keys: Step 6 of the rotation procedure (removing the old key) is a security obligation, not only a hygiene concern. An old key that remains in the keyring — even after all blobs have been re-encrypted under the new key — provides an attacker who later compromises that old key with the ability to substitute old blobs back into storage. If an attacker has write access to the blob store and possesses a compromised old key, they can replace new-version blobs with old-version blobs encrypted under the compromised key; the keyring will decrypt them successfully (the version byte routes to the retained old key). Retaining old keys long after migration extends the window during which a key compromise enables replay of stale content. The recommended pattern: remove old keys promptly after re-encryption is complete, and treat the Ok(false) return from remove_key (key was already absent) as confirmation, not an error.

remove_key behavior: Removes a key version from the keyring's in-memory list and returns Ok(true) if the key was present, Ok(false) if it was absent (idempotent — removing a non-existent version is a no-op, not an error). remove_key(0) returns UnsupportedVersion (version 0 is reserved — same as StorageKey::new(0)). The active version cannot be removed (returns InvalidData). This is immediate in-memory removal, not deferred deletion — subsequent decrypt calls using that version will return AeadFailed (not UnsupportedVersion, to prevent error-oracle attacks that could distinguish "removed key" from "corrupted ciphertext"). Version tracking is the caller's responsibility; the keyring does not persist removal history.

add_key return value: add_key(version, key_bytes, make_active) returns Ok(true) if a key at that version already existed and was replaced, or Ok(false) if the version was new. On replacement the prior material is destroyed, and any blobs encrypted under it at that version are permanently undecryptable. make_active=false with a version matching the current active version returns InvalidData (see §13.2 for the CAPI note). Ok(true) reports a silent overwrite only after the fact: by the time the caller sees it, the old material is already gone. Callers who must not overwrite existing material should confirm that the version is absent before calling add_key.

add_key(make_active=true) with the current active version replaces key material in-place: When make_active=true and version equals the current active version, add_key succeeds (Ok(true)) and replaces the active key's material with the new bytes. The make_active=false-with-active-version guard does NOT fire in this case (that guard prevents the specific inconsistency of changing the active pointer without updating the material). The result: every subsequent encrypt_blob call uses the new key material, and any blobs previously encrypted with the old material at that version are permanently undecryptable — the old material is destroyed in-place with no grace period. This operation is only safe when all blobs at that version have already been migrated (re-encrypted) under a different version. The recommended rotation pattern (§11.6 step 3) avoids this hazard by always using a new version number when adding a replacement key.
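The add_key and remove_key return conventions above, together with the active_key() invariant, fit in a small model. KeyRing and RingError are hypothetical stand-ins, not the library's types. Rejecting version 0 and all-zero material in add_key is an assumption carried over from StorageKey::new, and the model does not zeroize replaced or removed keys.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum RingError {
    UnsupportedVersion,
    InvalidData,
}

/// Hypothetical model of the documented keyring semantics.
pub struct KeyRing {
    keys: HashMap<u8, [u8; 32]>,
    active: u8,
}

fn check(version: u8, key: &[u8; 32]) -> Result<(), RingError> {
    if version == 0 {
        return Err(RingError::UnsupportedVersion); // version 0 is reserved
    }
    if key.iter().fold(0u8, |acc, &b| acc | b) == 0 {
        return Err(RingError::InvalidData); // all-zero key
    }
    Ok(())
}

impl KeyRing {
    /// Construction requires a key, so active_key() always has something to return.
    pub fn new(version: u8, key: [u8; 32]) -> Result<Self, RingError> {
        check(version, &key)?;
        Ok(KeyRing { keys: HashMap::from([(version, key)]), active: version })
    }

    /// Ok(true): material at `version` existed and was overwritten.
    pub fn add_key(&mut self, version: u8, key: [u8; 32], make_active: bool) -> Result<bool, RingError> {
        check(version, &key)?;
        if !make_active && version == self.active {
            return Err(RingError::InvalidData);
        }
        let replaced = self.keys.insert(version, key).is_some();
        if make_active {
            self.active = version;
        }
        Ok(replaced)
    }

    /// Ok(false): the version was already absent (idempotent).
    pub fn remove_key(&mut self, version: u8) -> Result<bool, RingError> {
        if version == 0 {
            return Err(RingError::UnsupportedVersion);
        }
        if version == self.active {
            return Err(RingError::InvalidData); // the active key cannot be removed
        }
        Ok(self.keys.remove(&version).is_some())
    }

    /// Never missing: new() inserts the key and remove_key() refuses to drop it.
    pub fn active_key(&self) -> (u8, &[u8; 32]) {
        (self.active, &self.keys[&self.active])
    }
}
```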

StorageKeyRing thread safety model: StorageKeyRing is automatically Send + Sync (all of its fields are Send + Sync), but it is not designed for concurrent access. There is no internal Mutex: the struct is a plain HashMap<u8, StorageKey> with an active_version: u8 field. Mutating operations (add_key, remove_key) take &mut self, so concurrent mutation requires exclusive access enforced by the caller, not the library. encrypt_blob and decrypt_blob do NOT take a &StorageKeyRing reference. Instead, the caller retrieves the active key via ring.active_key() and passes the resulting &StorageKey directly to encrypt_blob. The CAPI SolitonKeyRing wrapper adds an AtomicBool reentrancy guard that returns ConcurrentAccess (-18) on re-entrant calls from a single thread. This is a single-thread reentrancy guard, not a multi-thread Mutex. Correct concurrent model for reimplementers: wrap StorageKeyRing in a caller-owned Mutex<StorageKeyRing>. Encrypt/decrypt and key management all require exclusive lock acquisition, since getting the key reference (active_key()) and passing it to encrypt_blob are two separate operations that must not be split by a concurrent add_key/remove_key. A RwLock where encrypt/decrypt acquire read locks and key management acquires a write lock is unsafe because active_key() returns a reference into the inner HashMap, and that reference is invalidated if any write-lock operation (add_key/remove_key) rehashes the map.
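The recommended caller-owned Mutex model can be sketched as follows. The lookup and the use of the active key happen inside a single lock acquisition, and key management takes the same lock. Ring, with_active_key and add_and_activate are hypothetical names, and the closure is where a call such as encrypt_blob would go.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for StorageKeyRing (not the library type).
pub struct Ring {
    keys: HashMap<u8, [u8; 32]>,
    active: u8,
}

/// Looks up the active key and runs `use_key` (e.g. an encrypt_blob call)
/// under one lock acquisition, so no add/remove can interleave between the
/// lookup and the use.
pub fn with_active_key<R>(ring: &Mutex<Ring>, use_key: impl FnOnce(u8, &[u8; 32]) -> R) -> R {
    let guard = ring.lock().expect("keyring mutex poisoned");
    let active = guard.active;
    let key = &guard.keys[&active];
    use_key(active, key)
}

/// Key management takes the same exclusive lock.
pub fn add_and_activate(ring: &Mutex<Ring>, version: u8, key: [u8; 32]) {
    let mut guard = ring.lock().expect("keyring mutex poisoned");
    guard.keys.insert(version, key);
    guard.active = version;
}
```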

active_key() never-None invariant: The keyring is constructed with an initial key (new(version, key_bytes)), remove_key rejects removal of the active version, and add_key(make_active: true) atomically replaces the active version. These three constraints ensure active_key() always returns Some. In Rust, the Option<&StorageKey> return type is never None by construction. Binding authors implementing a keyring outside Rust's type system MUST maintain this invariant — if violated, encrypt_blob has no key to encrypt with and returns InvalidData with no diagnostic pointing to the empty keyring. The invariant should be checked at construction time (reject an empty or zero-version keyring) rather than at each encrypt_blob call.

12. Error Types

All soliton operations return a Result<T, Error>. The following error variants are defined:

Variant	C code	Meaning	Recoverability
`InvalidLength`	-1	Input has wrong size for the expected key or buffer type. Rust struct variant: `InvalidLength { expected: usize, got: usize }` — NOT a unit variant. Pattern-matching must bind both fields (or use `..`); constructing it requires both fields. The error message is `"invalid length: expected {expected}, got {got}"`. Internal truncation errors that would expose parser offset information use `InvalidData` instead, not `InvalidLength` — `InvalidLength` is reserved for caller-supplied parameters that don't match a known fixed size.	Caller bug
`DecapsulationFailed`	-2	KEM decapsulation failed — unreachable in lo-crypto-v1 (see note below). Exists for forward compatibility with explicit-rejection KEMs.	Retry-safe (decrypt)
`VerificationFailed`	-3	Signature verification failed	Retry-safe
`AeadFailed`	-4	AEAD decryption failed (wrong key, tampered ciphertext, or wrong AAD)	Session-fatal (encrypt); retry-safe (decrypt)
`BundleVerificationFailed`	-5	Pre-key bundle IK mismatch or signature invalid	Retry-safe (fetch new bundle)
`TooManySkipped`	-6	(reserved — was skip cache overflow, removed in counter-mode redesign)	—
`DuplicateMessage`	-7	Message counter already in recv_seen (already decrypted). Caller guidance: silently discard the duplicate — no application-level notification, no retry. MUST NOT surface this error to the message sender: an attacker who can distinguish `DuplicateMessage` from `AeadFailed` gains a membership-oracle on the receiver's `recv_seen` set, enabling byte-by-byte probing of which counters have been decrypted. Treat as an opaque "already delivered" signal at the transport layer.	Retry-safe (state unchanged)
(reserved)	-8	Was `SkippedKeyNotFound`, removed	—
`AlgorithmDisabled`	-9	(reserved — intended for platform-specific algorithm availability; currently unused)	—
`UnsupportedVersion`	-10	Serialized blob has unknown version byte. Source functions: `from_bytes`/`from_bytes_with_min_epoch` (ratchet blob version ≠ 0x01); `stream_decrypt_init` (stream header version ≠ 0x01); `StorageKey::new` (key version = 0 is reserved). Not returned from `decrypt_blob` for unknown blob version — that case collapses to `AeadFailed` (version-enumeration oracle).	Permanent
`DecompressionFailed`	-11	Zstandard decompression failed, or decompressed size exceeds 256 MiB (collapsed to `AeadFailed` at trust boundaries — see note below)	Retry-safe
`Internal`	-12	Structurally unreachable internal error. Also returned from `stream_encrypt_chunk` if zstd produces expansion beyond `plaintext.len() + STREAM_ZSTD_OVERHEAD` (§15.11). The overhead ceiling is additive over the actual plaintext length, not over `CHUNK_SIZE`: a 100-byte final chunk that compresses to more than 356 bytes (100 + 256) triggers `Internal`, because the ceiling is 356 bytes, not 1 MiB + 256. Encrypt-side only (no oracle concern), not session-fatal. Because `compress` is fixed at `stream_encrypt_init` (there is no per-chunk `compress` parameter to toggle), retrying the same chunk on the same encryptor fails deterministically with the same result (the zstd output for that plaintext is fixed). Recovery requires a full stream restart: abandon the current encryptor, create a new one via `stream_encrypt_init` with `compress = false`, and re-encrypt the stream from the beginning. `encrypt()` does not return `Internal` on CSPRNG failure: `random_bytes()` panics on OS CSPRNG unavailability, and `panic = "abort"` converts that panic into process termination, so no error is propagated to the caller. CSPRNG failure is treated as non-recoverable by design: there is no safe fallback from an unusable entropy source, and aborting is preferable to silently deriving keys from predictable "random" bytes. Also returned from `soliton_argon2id` if the underlying argon2 library returns an unexpected error not mappable to any other variant; this is structurally unreachable with a correct argon2 implementation and valid parameters.	Context-dependent (see notes)
`NullPointer`	-13	CAPI null pointer argument (C ABI only)	Caller bug
`UnsupportedFlags`	-14	Reserved for storage blobs with reserved flag bits set. Never constructed in the current implementation: reserved-flag rejections are collapsed directly to `AeadFailed` without producing this variant (see oracle-collapse note below). Retained solely for ABI stability: error code -14 MUST NOT be reassigned.	Permanent
`ChainExhausted`	-15	Five distinct recoverability modes: (1) Encrypt-side (send_count at u32::MAX): session-fatal for the send direction; no more messages can be sent. Source: encrypt() / soliton_ratchet_encrypt only. (2) Decrypt-side recv_seen saturation (§6.8): transient. The recv_seen or prev_recv_seen set is full (65536 entries), and the cap resets on the next KEM ratchet step (when the peer triggers a direction change). A caller who treats all ChainExhausted from decrypt() as session-fatal will terminate a recoverable session. Source: decrypt() / soliton_ratchet_decrypt only. (3) Serialization epoch overflow (to_bytes at epoch u64::MAX, §6.8 guard 24): persistence-fatal. The in-memory session remains functional for send/receive but can never be serialized again. Source: to_bytes() / soliton_ratchet_to_bytes, and also from_bytes() / soliton_ratchet_from_bytes_with_min_epoch when the deserialized epoch equals u64::MAX (guard 24: the session cannot be serialized again; §6.8). (4) Call chain advance limit (§6.12): CallKeys::advance() returns ChainExhausted after 2²⁴ steps; the call session is permanently exhausted and a new derive_call_keys() call is required. Unrelated to ratchet message counters. Source: CallKeys::advance() / soliton_call_keys_advance only. (5) Streaming chunk index exhaustion (§15.9): returned by encrypt_chunk or decrypt_chunk (sequential) when next_index == u64::MAX. Not session-fatal: the handle is still valid and can be freed normally. Distinct from the ratchet modes; a streaming ChainExhausted does NOT indicate any ratchet problem. Source: soliton_stream_encrypt_chunk / soliton_stream_decrypt_chunk only; soliton_stream_decrypt_chunk_at never returns this.	See per-mode description
`UnsupportedCryptoVersion`	-16	crypto_version field in a session init is not "lo-crypto-v1". Source functions: decode_session_init and receive_session. receive_session (§5.5 Step 1) performs its own crypto_version check against the parsed SessionInit before signature verification and returns UnsupportedCryptoVersion directly (not collapsed, because §5.5's checked values are public; see the error-collapsing note in §5.5). decode_session_init returns it during wire-format parsing. Not returned by verify_bundle: a wrong crypto_version in a pre-key bundle is collapsed to BundleVerificationFailed (§5.3 error-collapsing paragraph) along with fingerprint mismatches and signature failures, to prevent enumeration of which check failed. Not returned by the ratchet or storage layers. A binding author who pattern-matches for UnsupportedCryptoVersion from decode_session_init but not from receive_session will miss the second source.	Permanent
`InvalidData`	-17	Structural violation in serialized data or caller protocol misuse. Covers bad marker bytes, co-presence errors, and implausible values in deserialized blobs (ratchet, session-init). Also covers caller misuse of the streaming API: calling encrypt_chunk or decrypt_chunk after finalization, passing a wrong-size non-final chunk (uncompressed), or passing an oversized final chunk plaintext. Binding authors MUST NOT assume this error always indicates corrupt received data; it may indicate a caller-side state machine bug.	Retry-safe
`ConcurrentAccess`	-18	Opaque handle is being freed while another thread holds a reference (CAPI-only: not present in the core Error enum; exists only as a CAPI error code)	Caller bug

Error oracle collapse (defense-in-depth): In the decrypt path, DecompressionFailed and UnsupportedFlags are collapsed to AeadFailed before returning to the caller. Distinct error codes for post-AEAD parsing steps would let an attacker distinguish "AEAD passed but decompression failed" from "AEAD failed," leaking a decryption oracle. The CAPI maps both to SOLITON_ERR_AEAD (-4). The distinct codes above are retained solely for ABI stability and are never exposed across trust boundaries. UnsupportedFlags is never constructed in the current implementation; reserved-flag rejections are mapped directly to AeadFailed.

Consolidated collapse table:

Internal Error	Exposed As	Context	Reason
`DecompressionFailed`	`AeadFailed`	Storage (§11.3)	Post-AEAD parsing oracle
`UnsupportedFlags` (reserved bits)	`AeadFailed`	Storage (§11.3)	Reserved-bit oracle on pre-AEAD header field (the flags byte is parsed before AEAD; a distinct error leaks that the rejection was structural, not cryptographic)
`DecompressionFailed`	`AeadFailed`	Streaming (§15.7)	Post-AEAD decompression oracle
Reserved flag bits (stream header)	`AeadFailed`	Streaming (§15.8)	Header field oracle (attacker-controlled)
Size mismatch after decompress	`AeadFailed`	Streaming (§15.7)	Post-AEAD size oracle
Key version not in keyring at decrypt time	`AeadFailed`	Storage decrypt (§11.3)	Version-enumeration oracle: returning `UnsupportedVersion` for an unregistered version byte would let an attacker distinguish "key not loaded" from "wrong ciphertext"
Undersize ciphertext (< 16 bytes)	`AeadFailed`	AEAD decrypt (§7.1)	Too-short-vs-bad-tag oracle: using `InvalidLength` would let an attacker distinguish "ciphertext shorter than Poly1305 tag" from "valid-length but wrong tag"
Storage blob shorter than 42 bytes	`AeadFailed`	Storage decrypt (§11.1)	Pre-AEAD framing oracle: 42 bytes is the minimum valid blob (26-byte header + 16-byte Poly1305 tag). Using `InvalidLength` or `InvalidData` would let an attacker distinguish "blob too short to contain valid ciphertext" from "plausible-length blob with wrong key/tag"
`ChainExhausted` from `from_bytes`	`ChainExhausted` (not `InvalidData`)	Deserialization (§6.8)	The blob's stored epoch is `u64::MAX`, so the resulting state cannot be re-serialized (`to_bytes` would overflow on `epoch + 1`). States with stored epoch `u64::MAX - 1` are accepted, but `can_serialize()` returns false, preventing `to_bytes` from producing a zombie blob. This is a counter-exhaustion condition, not a format error.
Streaming chunk input shorter than `STREAM_CHUNK_OVERHEAD` = 17 bytes	`AeadFailed`	Streaming decrypt (§15)	Pre-AEAD framing oracle: the 17-byte minimum is `tag_byte (1) + Poly1305 tag (16)`, the smallest structurally valid chunk with zero-length plaintext. Returning `InvalidLength` or `InvalidData` for a sub-17-byte chunk would let an attacker distinguish "chunk too short to attempt AEAD" from "plausible-length chunk with wrong key/tag." This parallels the "Undersize ciphertext (< 16 bytes) → AeadFailed" rule for raw AEAD, with the streaming layer adding 1 byte for the tag_byte. Note: §15.7 describes the oracle-collapse scope as "post-authentication errors". This pre-AEAD check is not a post-auth error but the streaming-layer analogue of the raw AEAD undersize collapse; both collapse to `AeadFailed` for the same oracle-prevention reason.
`DuplicateMessage`	`AeadFailed` (toward sender)	Ratchet decrypt (§6.6)	Replay-detection oracle: an attacker who can distinguish `DuplicateMessage` from `AeadFailed` gains a membership oracle on the receiver's recv_seen set, enabling byte-by-byte probing of which counters have been decrypted. `DuplicateMessage` MUST NOT be surfaced to the message sender; the transport layer MUST treat it as an opaque "already delivered" signal and silently discard the duplicate.

Not collapsed (checked on public/pre-AEAD data): UnsupportedVersion (version byte is cleartext), InvalidData for pre-AEAD framing checks (chunk wire length is observable).

DecapsulationFailed is unreachable in lo-crypto-v1 — two blocked paths: This variant is structurally unreachable because both sites that could produce it are blocked:

encode_ratchet_header / decode_ratchet_header: Any KEM ciphertext (kem_ct) with the wrong size (not exactly 1120 bytes) is rejected as InvalidData during header parsing, before X-Wing decapsulation is attempted. A malformed ciphertext never reaches xwing::decapsulate.
xwing::decapsulate itself: X-Wing uses implicit rejection (§8.4). When ML-KEM decapsulation detects an invalid ciphertext (the re-encrypted ciphertext c′ does not match the received ciphertext c), FIPS 203 returns the pseudorandom rejection key K̄ = J(z ‖ c) instead of an error. Here J is SHAKE256 with a 32-byte output and z is the secret implicit-rejection seed. X25519 always produces a result (an all-zero output is handled separately by the low-order point check). xwing::decapsulate therefore always returns Ok(shared_secret), never Err(DecapsulationFailed).

The variant is retained in the enum for ABI stability (-2 is reserved) and for future explicit-rejection KEMs. Binding authors may safely treat DecapsulationFailed from the current library as Internal — it indicates a logic error, not a recoverable condition.

Error is #[non_exhaustive]: The Error enum is marked #[non_exhaustive] in Rust, so code outside the crate must include a catch-all arm (_ => ...) when matching on it. Binding authors and application code MUST NOT exhaustively match on error codes, because a future version may add new variants without incrementing the library's major version. New variants MUST get new numeric codes (not reuse reserved slots). An enum-level #[non_exhaustive] attribute restricts matching only. It does not by itself stop downstream code from constructing Error variants, so bindings should obtain Error values from the library's entry points rather than constructing them directly.

Recoverability key: Retry-safe — the operation can be retried or the message dropped; ratchet state is unchanged on error. Session-fatal — the session (or encrypt direction) is permanently broken; AeadFailed on encrypt triggers full key zeroization as defense-in-depth, making the session irrecoverable. Permanent — the error reflects a capability or format gap, not a transient condition. Caller bug — indicates a programming error in the calling code.
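ChainExhausted (-15) means different things depending on which call returned it, so a binding can branch on the call site rather than on the code alone. The enums and function below are a hypothetical sketch of that dispatch, following the five modes listed in the error table.

```rust
/// Hypothetical: the call that returned ChainExhausted (-15).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    RatchetEncrypt,   // encrypt() / soliton_ratchet_encrypt
    RatchetDecrypt,   // decrypt() / soliton_ratchet_decrypt
    RatchetToBytes,   // to_bytes() / soliton_ratchet_to_bytes
    RatchetFromBytes, // from_bytes() / soliton_ratchet_from_bytes_with_min_epoch
    CallKeysAdvance,  // CallKeys::advance() / soliton_call_keys_advance
    StreamChunk,      // soliton_stream_encrypt_chunk / soliton_stream_decrypt_chunk
}

/// Hypothetical caller reaction for each mode.
#[derive(Debug, PartialEq)]
pub enum Action {
    StopSending,       // mode 1: the send direction is permanently exhausted
    KeepSessionWait,   // mode 2: transient, clears after the next KEM ratchet step
    StopPersisting,    // mode 3: session works in memory but cannot be saved again
    BlobNotLoadable,   // mode 3 via from_bytes: stored epoch is u64::MAX
    DeriveNewCallKeys, // mode 4: call chain exhausted
    FreeStreamHandle,  // mode 5: stream index exhausted, ratchet unaffected
}

pub fn on_chain_exhausted(source: Source) -> Action {
    match source {
        Source::RatchetEncrypt => Action::StopSending,
        Source::RatchetDecrypt => Action::KeepSessionWait,
        Source::RatchetToBytes => Action::StopPersisting,
        Source::RatchetFromBytes => Action::BlobNotLoadable,
        Source::CallKeysAdvance => Action::DeriveNewCallKeys,
        Source::StreamChunk => Action::FreeStreamHandle,
    }
}
```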

InvalidLength is for type-level size mismatches (wrong key size, wrong ciphertext size). InvalidData is for structural content violations (bad format, co-presence invariant broken, implausible values). The distinction matters for diagnostics.

InvalidData from _free functions means wrong handle type, not blob corruption: When soliton_ratchet_free, soliton_keyring_free, soliton_call_keys_free, soliton_stream_encrypt_free, or soliton_stream_decrypt_free return InvalidData (-17), it indicates the opaque handle's internal type discriminant is wrong — the handle pointer belongs to a different handle type (e.g., a SolitonKeyRing* was passed to soliton_ratchet_free). This is distinct from InvalidData returned by decryption or deserialization functions, where it means structurally invalid content. Binding authors writing diagnostic or error-handling code for _free functions should map InvalidData to "handle type mismatch" rather than "corrupted data."

Error code ABI stability: Once a numeric error code is assigned (e.g., -6 for TooManySkipped), that code is reserved forever — even if the error variant is removed or renamed. Binding authors hardcode these values in constants, switch statements, and documentation. Reassigning a code to a different error would silently change the meaning of existing bindings without compilation or test failures. Removed codes are marked "reserved" in the table above and must never be reused.
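Because codes are reserved forever and new ones may appear, a C binding can pin the values it knows at compile time and still route unrecognized codes through a default branch. A minimal sketch follows. The enumerator and function names are hypothetical binding-local identifiers (soliton.h may spell them differently); only the numeric values come from this specification, and AeadFailed is omitted because its code is not listed in this section. The second helper applies the _free-specific meaning of InvalidData described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical binding-local names; only the numeric values come from the spec. */
enum {
    SOLITON_OK                        = 0,
    SOLITON_ERR_INVALID_LENGTH        = -1,
    SOLITON_ERR_DECAPSULATION_FAILED  = -2,  /* reserved, not emitted today */
    SOLITON_ERR_VERIFICATION_FAILED   = -3,
    SOLITON_ERR_BUNDLE_VERIFICATION   = -5,
    SOLITON_ERR_TOO_MANY_SKIPPED      = -6,
    SOLITON_ERR_UNSUPPORTED_VERSION   = -10,
    SOLITON_ERR_NULL_POINTER          = -13,
    SOLITON_ERR_CHAIN_EXHAUSTED       = -15,
    SOLITON_ERR_UNSUPPORTED_CRYPTO    = -16,
    SOLITON_ERR_INVALID_DATA          = -17,
    SOLITON_ERR_CONCURRENT_ACCESS     = -18
};

/* Codes never change meaning: fail the build if a constant is ever mistyped. */
static_assert(SOLITON_ERR_NULL_POINTER == -13, "NullPointer is -13");
static_assert(SOLITON_ERR_INVALID_DATA == -17, "InvalidData is -17");

/* Diagnostic name for a return code. The default arm is mandatory:
 * a newer library may return codes this binding has never seen. */
const char *soliton_rc_name(int32_t rc) {
    switch (rc) {
    case SOLITON_OK:                       return "Ok";
    case SOLITON_ERR_INVALID_LENGTH:       return "InvalidLength";
    case SOLITON_ERR_DECAPSULATION_FAILED: return "DecapsulationFailed";
    case SOLITON_ERR_VERIFICATION_FAILED:  return "VerificationFailed";
    case SOLITON_ERR_BUNDLE_VERIFICATION:  return "BundleVerificationFailed";
    case SOLITON_ERR_TOO_MANY_SKIPPED:     return "TooManySkipped";
    case SOLITON_ERR_UNSUPPORTED_VERSION:  return "UnsupportedVersion";
    case SOLITON_ERR_NULL_POINTER:         return "NullPointer";
    case SOLITON_ERR_CHAIN_EXHAUSTED:      return "ChainExhausted";
    case SOLITON_ERR_UNSUPPORTED_CRYPTO:   return "UnsupportedCryptoVersion";
    case SOLITON_ERR_INVALID_DATA:         return "InvalidData";
    case SOLITON_ERR_CONCURRENT_ACCESS:    return "ConcurrentAccess";
    default:                               return "Unknown";
    }
}

/* On the *_free paths, InvalidData means the handle has the wrong type tag. */
const char *soliton_free_rc_name(int32_t rc) {
    return rc == SOLITON_ERR_INVALID_DATA ? "HandleTypeMismatch" : soliton_rc_name(rc);
}
```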

13. C ABI (soliton_capi)

13.1 Overview

soliton_capi exposes the core library as a stable C ABI (extern "C" functions). Direct consumers: Go (cgo), C# (P/Invoke), Dart (dart:ffi), C/C++. Swift and Kotlin consume the CAPI indirectly via UniFFI-generated wrappers. Node.js uses napi-rs (a Rust-native Node add-on API that does not call through the C ABI).

The generated header is soliton.h. It is produced by cbindgen and must not be edited manually.

13.2 Conventions

Return codes: 0 = success, negative = error (see §12 for codes).
Caller-allocated output buffers: Used when output size is a fixed compile-time constant (e.g., 32-byte fingerprints, 32-byte shared secrets). The caller passes a pre-allocated buffer; on error the buffer is zeroed.
Library-allocated buffers (SolitonBuf): Used for variable-length outputs. Must be freed with soliton_buf_free. Never call free() directly on ptr.
Opaque heap objects (SolitonRatchet*, SolitonKeyRing*): Allocated by the library, freed with their respective _free functions.
CSPRNG failure aborts the process: All keygen and encapsulation operations that consume OS entropy (getrandom(2), ProcessPrng, getentropy, etc.) abort the process on CSPRNG failure rather than returning an error code — there is no safe cryptographic fallback when randomness is unavailable. This behavior is by design and is not configurable. Binding authors: do NOT wrap CAPI calls in a catch-all exception handler or POSIX signal handler expecting to recover from abort — the abort is deliberate and the process state after a failed CSPRNG call is not safely continuable. C++ callers: extern "C" functions MUST NOT propagate exceptions across the FFI boundary (undefined behavior per the C++ standard); the abort-on-CSPRNG-failure guarantee depends on no exception reaching the FFI boundary from within the library. A C++ wrapper that installs a std::terminate handler or catches SIGABRT will mis-handle this.
All pointer arguments must be non-null unless documented otherwise. Null pointers return NullPointer (-13). The documented exceptions are:
Exception (empty plaintext, encrypt only): soliton_stream_encrypt_chunk accepts plaintext = NULL with plaintext_len = 0; this is the mechanism for producing an empty final chunk (a valid empty-file stream). soliton_stream_decrypt_chunk does NOT share this exception: its ciphertext input is named chunk (not plaintext), and a null chunk is always rejected with NullPointer, even with chunk_len = 0. Binding wrappers that add blanket non-null guards on the encrypt-side plaintext pointer silently break the empty-file use case (the null check fires before the zero-length check, returning NullPointer where the empty chunk would succeed). Wrappers that apply the same exception to the decrypt-side chunk pointer diverge from the reference, which returns NullPointer for a null chunk unconditionally.
Exception (empty AAD): soliton_stream_encrypt_init and soliton_stream_decrypt_init accept caller_aad = NULL with aad_len = 0; this is the mechanism for streams with no additional authenticated data. The AAD defaults to empty, and HMAC domain separation is provided by the stream key and base nonce. Binding wrappers that add blanket non-null guards on caller_aad return NullPointer for valid empty-AAD calls.
Exception (zero-length primitive inputs): soliton_hmac_sha3_256, soliton_hkdf_sha3_256, and soliton_argon2id accept a NULL pointer for any input whose corresponding length field is 0. Specifically: key = NULL with key_len = 0 (HMAC with empty key), data = NULL with data_len = 0 (HMAC/HKDF with empty data/IKM), salt = NULL with salt_len = 0 (HKDF/Argon2id with empty salt), password = NULL with password_len = 0 (Argon2id with empty password), and info = NULL with info_len = 0 (HKDF with empty info). These are valid degenerate inputs to the underlying primitives: HMAC(key=∅, data), HKDF with empty IKM or salt, and Argon2id with empty password are all well-defined by their respective RFCs. A null pointer combined with a nonzero length still returns NullPointer. Binding wrappers that add blanket non-null guards on these input pointers break the empty-input case for primitive APIs where the caller explicitly wants to derive from an empty string. This exception does NOT apply to output buffers, key parameters with implicit fixed sizes (e.g., the key in soliton_aead_encrypt), or any parameters not enumerated here.
Zero-length byte arrays: Most CAPI functions reject non-null pointers with zero length as InvalidLength. Exception (zero-length ciphertext to decrypt operations): soliton_ratchet_decrypt, soliton_ratchet_decrypt_first, soliton_stream_decrypt_chunk, and soliton_stream_decrypt_chunk_at return AeadFailed (not InvalidLength) for inputs shorter than their respective AEAD minimums (16 bytes for ratchet, 40 bytes for first-message, 17 bytes for streaming) — collapsing to AeadFailed prevents an oracle distinguishing "too short to attempt AEAD" from "wrong key." See §12 collapse table. soliton_stream_decrypt_chunk_at shares the same collapse because it calls the same underlying decrypt_chunk_inner path as soliton_stream_decrypt_chunk. Binding wrappers that add zero-length short-circuit guards on these ciphertext inputs may return InvalidLength where the library returns AeadFailed, breaking the oracle-collapse guarantee. soliton_aead_decrypt with zero-length ciphertext is NOT in this exception: soliton_aead_decrypt with ciphertext_len = 0 returns InvalidLength (the CAPI zero-length guard fires before the core AEAD minimum check). ciphertext_len values 1-15 return AeadFailed (too short to contain the 16-byte Poly1305 tag, but non-zero length passes the CAPI guard). A reimplementer who applies the ratchet/stream pattern to soliton_aead_decrypt and returns AeadFailed for len = 0 diverges from the reference.
Input size cap: All CAPI functions reject any single input buffer exceeding 256 MiB (268,435,456 bytes) with InvalidLength. This is a defense-in-depth limit — no legitimate cryptographic input approaches this size, and rejecting oversized buffers early prevents downstream integer overflow or allocation-exhaustion issues in binding languages with unchecked size casts. Exception — streaming chunk functions: soliton_stream_decrypt_chunk and soliton_stream_decrypt_chunk_at do not apply a 256 MiB pre-check on the chunk input — chunk size is bounded structurally by STREAM_CHUNK_STRIDE (1,048,593 bytes) and the AEAD layer rejects any oversized input. A reimplementer who adds an explicit 256 MiB InvalidLength guard to streaming chunk functions introduces an observable divergence: the reference implementation returns AeadFailed for oversized chunks, not InvalidLength.
crypto_version string (null vs. empty vs. non-UTF-8 produce different errors): applies to soliton_kex_verify_bundle, soliton_kex_initiate, soliton_kex_decode_session_init, and soliton_kex_receive, the only four CAPI functions that accept a crypto_version parameter. crypto_version is passed as a null-terminated C string (const char *), not as a (ptr, len) pair. Three outcomes: (1) a null pointer returns NullPointer (-13); (2) a non-null pointer to a valid UTF-8 string other than "lo-crypto-v1" (including the empty string "") is a version mismatch, reported as UnsupportedCryptoVersion (-16) by soliton_kex_decode_session_init and soliton_kex_receive, and collapsed into BundleVerificationFailed (-5) by soliton_kex_verify_bundle and soliton_kex_initiate (see their entries in §13.4); (3) a non-null pointer whose bytes are not valid UTF-8 returns InvalidData (-17), because the CAPI's CStr::from_ptr → to_str() conversion fails before the version comparison can run, and that failure maps to InvalidData, not to the version-mismatch code. A reimplementer who pattern-matches on the version-mismatch code to detect all "wrong version" inputs will miss the non-UTF-8 case, which surfaces as the unrelated-seeming InvalidData. This third outcome matters for bindings from runtimes whose string types may not be UTF-8 (Latin-1 in older Java contexts, arbitrary bytes in C char arrays). The null/empty distinction matters for binding languages that represent "absent" and "empty" differently (Python None vs "", Java null vs "", Swift nil vs ""): the binding's null-to-C mapping must pass a null pointer for "absent" and a pointer to a NUL byte for "empty." Bindings that convert None/nil/null to an empty C string (pointer-to-'\0') instead of a null pointer get the version-mismatch code where NullPointer is expected, and vice versa.
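The guard ordering in the conventions above (null check first, then the zero-length rule, then the size cap, plus the three-way crypto_version split) can be modeled as plain C guards, which is useful for binding test suites. This is a hypothetical sketch, not the library's code: check_input, is_valid_utf8, and check_crypto_version are invented names, allow_empty stands in for the enumerated null/empty exceptions, and the sketch assumes those exceptions also accept a non-null pointer with length 0. check_crypto_version follows the decode_session_init/receive behavior; verify_bundle and initiate report the mismatch case as BundleVerificationFailed instead. The AeadFailed collapse for short ciphertexts is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_INVALID_LENGTH         (-1)
#define ERR_NULL_POINTER           (-13)
#define ERR_UNSUPPORTED_CRYPTO_VER (-16)
#define ERR_INVALID_DATA           (-17)
#define MAX_INPUT_LEN ((size_t)256 * 1024 * 1024)   /* 256 MiB */

/* (ptr, len) guard: null check first, then the zero-length rule, then the cap. */
int32_t check_input(const void *ptr, size_t len, int allow_empty) {
    if (ptr == NULL && !(allow_empty && len == 0)) {
        return ERR_NULL_POINTER;
    }
    if (len == 0) {
        return allow_empty ? 0 : ERR_INVALID_LENGTH;
    }
    return len > MAX_INPUT_LEN ? ERR_INVALID_LENGTH : 0;
}

/* Strict UTF-8 check: rejects overlong forms, surrogates, and values above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s) {
    while (*s != 0) {
        unsigned char c = *s;
        int n;
        uint32_t cp, min;
        if (c < 0x80) { s++; continue; }
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                                 /* stray continuation or bad lead */
        for (int i = 1; i <= n; i++) {
            if ((s[i] & 0xC0) != 0x80) return 0;       /* also stops at the terminator */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        s += n + 1;
    }
    return 1;
}

/* crypto_version: NULL, then UTF-8 validity, then the version comparison. */
int32_t check_crypto_version(const char *v) {
    if (v == NULL) return ERR_NULL_POINTER;
    if (!is_valid_utf8((const unsigned char *)v)) return ERR_INVALID_DATA;
    if (strcmp(v, "lo-crypto-v1") != 0) return ERR_UNSUPPORTED_CRYPTO_VER;
    return 0;
}
```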

Concurrency safety (stateless functions vs. opaque handles): All primitive functions that take no opaque handles are safe to call concurrently from multiple threads without synchronization: soliton_sha3_256, soliton_hmac_sha3_256, soliton_hmac_sha3_256_verify, soliton_hkdf_sha3_256, soliton_aead_encrypt, soliton_aead_decrypt, soliton_xwing_keygen, soliton_xwing_encapsulate, soliton_xwing_decapsulate, soliton_identity_sign, soliton_identity_verify, soliton_verification_phrase, soliton_random_bytes, soliton_argon2id, soliton_zeroize. These functions have no internal mutable state, so each call is fully independent; the caller remains responsible for not sharing a writable output buffer between concurrent calls. Opaque-handle functions (soliton_ratchet_*, soliton_call_keys_*, soliton_keyring_*, soliton_stream_*, soliton_kex_*) require exclusive access per handle; concurrent calls on the same handle are detected by the reentrancy guard and return ConcurrentAccess (-18).
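A fail-fast reentrancy guard of the kind described above can be built from a C11 atomic_flag: the first caller claims the handle, and any overlapping caller gets ConcurrentAccess immediately instead of blocking. This is an illustrative sketch of the technique with a hypothetical handle struct (guarded_handle, handle_enter, and handle_exit are invented names), not the library's internal layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ERR_CONCURRENT_ACCESS (-18)

/* Hypothetical handle: only the guard field is shown. */
typedef struct {
    atomic_flag busy;
    /* ...session state... */
} guarded_handle;

/* Claim exclusive access. An overlapping caller fails fast instead of waiting. */
int32_t handle_enter(guarded_handle *h) {
    assert(h != NULL);
    if (atomic_flag_test_and_set_explicit(&h->busy, memory_order_acquire)) {
        return ERR_CONCURRENT_ACCESS;
    }
    return 0;
}

/* Release access; pairs with a successful handle_enter. */
void handle_exit(guarded_handle *h) {
    atomic_flag_clear_explicit(&h->busy, memory_order_release);
}
```

Because such a guard rejects rather than waits, a binding that wants blocking semantics has to serialize calls on each handle itself (for example with a per-handle mutex) rather than spinning on ConcurrentAccess.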

13.3 Buffer Management

```c
typedef struct SolitonBuf {
    uint8_t *ptr;
    uintptr_t len;
} SolitonBuf;

// Free and zeroize a library-allocated buffer.
// Sets ptr = null, len = 0 after free. Double-free is safe (no-op).
void soliton_buf_free(SolitonBuf *buf);
```

All library-allocated buffers are zeroized before freeing. The ptr and len fields are zeroed after free, making double-free a safe no-op. The ptr field MUST NOT be modified by the caller: soliton_buf_free passes the stored ptr value directly to free(). If the caller advances ptr (e.g., buf.ptr += n to read from an offset), soliton_buf_free frees the advanced pointer — not the original allocation — causing heap corruption in C or undefined behavior in C++. Use a separate local variable for reading: const uint8_t *p = buf.ptr; while (remaining > 0) { ... p++; remaining--; } — do not modify buf.ptr. The len field may be read but also MUST NOT be modified before the free call; modifying len does not affect soliton_buf_free (which does not use len during deallocation), but doing so breaks the "zeroed after free" invariant and may confuse callers who check len to detect freed state.
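The read-cursor pattern described above, written out as a complete function. A minimal sketch: the SolitonBuf layout is copied from the header declaration in this section, while count_records and its one-byte length-prefixed record format are hypothetical, chosen only to show reading at advancing offsets without touching buf.ptr or buf.len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as SolitonBuf in soliton.h; real code should include the header. */
typedef struct SolitonBuf {
    uint8_t *ptr;
    uintptr_t len;
} SolitonBuf;

/* Count records of the form [len: 1 byte][len bytes]. The record format is
 * illustrative. buf->ptr and buf->len are only read; advancing happens on copies. */
size_t count_records(const SolitonBuf *buf) {
    assert(buf != NULL);
    const uint8_t *p = buf->ptr;       /* read cursor: advance this, never buf->ptr */
    uintptr_t remaining = buf->len;
    size_t records = 0;
    while (remaining > 0) {
        uintptr_t need = (uintptr_t)p[0] + 1;   /* length byte plus payload */
        if (need > remaining) {
            break;                              /* truncated trailing record */
        }
        p += need;
        remaining -= need;
        records++;
    }
    return records;
}
```

After parsing, the untouched struct can be passed to soliton_buf_free(&buf), which then receives the original allocation pointer.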

All CAPI functions with output buffer parameters zero the output upfront (after null-pointer guard) before any computation, so outputs are always in a defined state even on error. Exception — streaming chunk functions: soliton_stream_decrypt_chunk, soliton_stream_decrypt_chunk_at, and soliton_stream_encrypt_chunk zero the output buffer on error paths only — on success, bytes in the output buffer beyond the written bytes (out_written..out_len) are NOT zeroed. The rationale: the output is ciphertext or plaintext (not secret material requiring zeroization), and the buffer may be as large as CHUNK_SIZE / STREAM_ENCRYPT_MAX (≈1 MiB); zeroing on success would waste cycles per chunk. Reimplementers MUST NOT rely on post-success-write bytes being zero — read out_written to determine the valid range.

Caller-side zeroization: For caller-owned buffers that held secret material (e.g., chain keys copied out of soliton_ratchet_encrypt_first), use soliton_zeroize(ptr, len), a volatile-write zeroing function guaranteed not to be optimized out by the compiler. Standard C memset may be elided if the buffer is not read afterward. Managed-runtime caveat: In garbage-collected languages (Go, Python, C#, Dart), the runtime may relocate heap objects between the last use of the buffer and the soliton_zeroize call, leaving a copy of the secret material at the old address. Callers in managed runtimes MUST pin the buffer before writing secrets into it (e.g., GCHandle.Alloc with GCHandleType.Pinned in .NET, runtime.Pinner in Go 1.21+, or explicitly allocated ctypes C buffers in Python). Alternatively, allocate secret buffers via malloc/calloc (outside the GC's control) and free them after zeroization. Volatile writes to a GC-relocated address zeroize the new location but leave the old location intact: the secret survives in memory with nothing left referencing it.
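For native C callers, the volatile-write idea behind soliton_zeroize looks roughly like the function below. It only illustrates why a volatile store survives optimization where a trailing memset may not; zeroize_volatile is a hypothetical name, and bindings should still call soliton_zeroize itself (or a platform primitive such as C23 memset_explicit, explicit_bzero, or SecureZeroMemory). It does nothing about copies a moving GC has already left behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each store goes through a volatile lvalue, so the compiler may not drop it
 * as a dead store the way it may drop a final memset on an unused buffer. */
void zeroize_volatile(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    volatile uint8_t *p = (volatile uint8_t *)buf;
    while (len > 0) {
        *p++ = 0;
        len--;
    }
}
```

In practice the pattern is: receive a 32-byte secret (for example ss_out from soliton_identity_decapsulate) into a caller-allocated buffer that is pinned or malloc-backed in managed runtimes, use it, then call soliton_zeroize(ss_out, 32) before the buffer goes out of scope.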

13.4 Key Functions

Identity:

soliton_identity_generate(pk_out, sk_out, fingerprint_hex_out) — generate LO composite keypair
soliton_identity_fingerprint(pk, pk_len, out) — compute raw SHA3-256 fingerprint
soliton_identity_sign(sk, sk_len, message, message_len, sig_out) — hybrid sign
soliton_identity_verify(pk, pk_len, message, message_len, sig, sig_len) — hybrid verify
soliton_identity_encapsulate(pk, pk_len, ct_out, ss_out) — encapsulate to IK X-Wing component. ss_out receives a 32-byte shared secret into a caller-allocated buffer. The caller MUST zeroize ss_out after use — use soliton_zeroize(ss_out, 32).
soliton_identity_decapsulate(sk, sk_len, ct, ct_len, ss_out) — decapsulate. ss_out receives a 32-byte shared secret into a caller-allocated buffer. The caller MUST zeroize ss_out after use — use soliton_zeroize(ss_out, 32).

Authentication:

soliton_auth_challenge(client_pk, client_pk_len, ct_out, token_out) — server: generate challenge
soliton_auth_respond(client_sk, client_sk_len, ct, ct_len, proof_out) — client: generate proof
soliton_auth_verify(expected_token, proof) — server: constant-time verification

LO-KEX:

soliton_kex_verify_bundle(bundle_ik_pk, ..., spk_pub, ..., spk_sig, ...) — verify pre-key bundle. Error codes: BundleVerificationFailed (-5) for all non-structural failures — IK mismatch (bundle_ik_pk ≠ known_ik_pk), invalid SPK signature, or crypto_version ≠ "lo-crypto-v1". All three collapse to a single error code to prevent iterative oracle probing (an attacker cannot determine which check failed — see §5.3 and §5.5 error-collapsing rationale). InvalidData (-17) on the one structural failure: OPK co-presence violation (opk_pub and opk_id must both be present or both absent) — this check precedes cryptography and is not security-sensitive. InvalidLength (-1) on wrong key/signature sizes. Note: VerificationFailed (-3) is NOT returned by this function — that code is for non-bundle signature operations (identity verification, auth). The collapse to BundleVerificationFailed is intentional; binding authors who pattern-match for VerificationFailed on the bundle verification path will silently miss all bundle-authentication failures.
soliton_kex_initiate(alice_ik_pk, ..., bob_ik_pk, ..., bob_spk_pub, ..., ...) — initiate session (returns SolitonInitiatedSession). Error codes: InvalidLength (-1) if any key or signature has the wrong size. InvalidData (-17) on structural corruption or co-presence violation. BundleVerificationFailed (-5) for all non-structural bundle failures (IK mismatch, unsupported crypto version, invalid SPK signature) — soliton_kex_initiate calls verify_bundle internally and the same oracle-collapse applies as for soliton_kex_verify_bundle above. SPK signature is re-verified internally even if the caller already called soliton_kex_verify_bundle — this is defense-in-depth; binding authors should not attempt to skip the pre-call to verify_bundle. Note: VerificationFailed (-3) and UnsupportedCryptoVersion (-16) are NOT returned by this function — both conditions collapse to BundleVerificationFailed (-5).
soliton_kex_receive(bob_ik_pk, ..., bob_ik_sk, ..., alice_ik_pk, ..., ...) — receive session init
soliton_kex_encode_session_init(...) — encode a parsed SessionInit back to canonical bytes (§7.4). Bob's tool, not Alice's: Alice never calls this directly — soliton_kex_initiate handles encoding internally. Bob calls soliton_kex_encode_session_init after individually parsing or validating the received fields, to reconstruct Alice's canonical encoding for use in first-message AAD construction. The output must be byte-for-byte identical to Alice's internal encoding; any normalization of individual fields during decode (key clamping, padding removal) that alters re-encoding causes silent first-message AEAD failure.
soliton_kex_build_first_message_aad(...) — build first-message AAD
soliton_kex_sign_prekey(ik_sk, ..., spk_pub, ..., sig_out) — sign a pre-key
soliton_kex_initiated_session_free(session) — free SolitonInitiatedSession. Safety model: null-safe (null session is a no-op). Returns void — not int32_t like opaque-handle free functions (soliton_ratchet_free, soliton_call_keys_free). SolitonInitiatedSession is a flat #[repr(C)] struct, not an opaque pointer — there is no type-tag field and no type-discriminant check. Callers MUST NOT pass a handle from a different free function (e.g., a ratchet handle) — doing so will zeroize and free incorrect memory without any error or diagnostic.

Ratchet:

soliton_ratchet_init_alice(root_key, ..., chain_key, ..., local_fp, ..., remote_fp, ..., ek_pk, ..., ek_sk, ..., out) — init Alice state; fingerprints follow root_key/chain_key but precede the ephemeral key params (§6.2 parameter order note). Parameter name is chain_key in the header — see §13.5 for the full name-alias table (epoch_key / chain_key / initial_chain_key).
soliton_ratchet_init_bob(root_key, ..., chain_key, ..., local_fp, ..., remote_fp, ..., peer_ek, ..., out) — init Bob state; fingerprints follow root_key/chain_key but precede the ephemeral key params (§6.2 parameter order note). Same chain_key alias — see §13.5.
soliton_ratchet_encrypt(ratchet, plaintext, ..., out) — encrypt (fingerprints are stored in the ratchet state, not passed per call)
soliton_ratchet_decrypt(ratchet, ratchet_pk, ..., kem_ct, ..., n, pn, ciphertext, ..., plaintext_out) — decrypt. Pass kem_ct = NULL and kem_ct_len = 0 when the header contains no KEM ciphertext (same-chain message). Pass n and pn exactly as received from the wire header — both are included in AAD regardless of epoch type; a caller who passes pn = 0 for every message gets AEAD failure whenever the wire pn ≠ 0.
soliton_ratchet_encrypt_first(epoch_key, plaintext, ..., aad, ..., payload_out, ratchet_init_key_out) — first message
soliton_ratchet_decrypt_first(epoch_key, payload, ..., aad, ..., plaintext_out, ratchet_init_key_out) — first message decrypt
soliton_ratchet_to_bytes(ratchet, data_out, epoch_out) — serialize state. Ownership-consuming: takes *mut *mut SolitonRatchet and nulls the caller's handle on success to prevent post-serialization use; epoch_out (nullable) receives the new epoch for anti-rollback tracking. Only a successful return irreversibly transfers ownership and nulls *ratchet. Three error returns leave *ratchet untouched and the handle live: NullPointer (-13, e.g., data_out is null), because the call is rejected before ownership transfer begins; ConcurrentAccess (-18); and ChainExhausted (-15), returned when the epoch is at u64::MAX. The epoch is the only counter the CAPI to_bytes wrapper visibly checks, because can_serialize() pre-filters send_count/recv_count/prev_send_count at u32::MAX before the CAPI takes ownership (the Rust to_bytes itself checks all four counters). NullPointer and ConcurrentAccess are retryable after fixing the caller bug or waiting for the concurrent operation to finish; ChainExhausted is permanent for that session, but the handle still belongs to the caller and must eventually be freed. A binding that frees the handle on any non-zero return code will double-free a live session whenever a null-pointer caller bug triggers NullPointer. Callers who check only for null after failure will lose the handle. Maintainer note: The "NOT nulled on ChainExhausted" guarantee depends on can_serialize() (see §6.8) pre-validating all conditions before the CAPI layer takes ownership of the handle. If a future to_bytes refactor introduces a new error condition not covered by can_serialize(), the handle will be nulled on that error with no recovery path; can_serialize() and to_bytes must check identical conditions. epoch_out sentinel on error: When epoch_out is non-null, the CAPI sets *epoch_out = 0 immediately at entry so that error paths never leave stale values from a previous call. Epoch 0 is never a valid serialized epoch (the initial to_bytes produces epoch 1), so 0 acts as a sentinel meaning "no epoch written." Callers that store *epoch_out as their min_epoch for anti-rollback MUST check the return code first and MUST NOT update their stored min_epoch on any error return: storing the sentinel value 0 as min_epoch silently disables anti-rollback protection for all subsequent from_bytes_with_min_epoch calls.
soliton_ratchet_from_bytes(data, data_len, out) — deserialize state (deprecated — use from_bytes_with_min_epoch; see §6.8). Error codes: InvalidData (-17) on structural blob corruption (guards 1-25, §6.8), ChainExhausted (-15) when the blob encodes epoch == u64::MAX (guard 24 — the session is structurally valid but permanently un-re-serializable; see §12 collapse table). InvalidLength (-1) if the input exceeds the 1 MiB CAPI cap. A binding author who catches only InvalidData and propagates all other errors as "corruption" will silently lose a recoverable serialization-exhausted session.
soliton_ratchet_from_bytes_with_min_epoch(data, data_len, min_epoch, out) — deserialize with anti-rollback check (epoch must be > min_epoch). Same error codes as from_bytes, plus InvalidData for epoch-rollback rejection (guard 12 — indistinguishable from structural corruption at the API level; see §6.8).
soliton_ratchet_epoch(ratchet, out) — query current epoch counter non-destructively (since to_bytes is ownership-consuming, use epoch() to read the epoch without committing to serialization — e.g., to check consistency with a stored min_epoch before calling to_bytes, or to initialize a min_epoch store when migrating existing sessions)
soliton_ratchet_reset(ratchet) — reset ratchet state to initial (zeroizes all epoch keys). Returns int32_t: 0 on success, ConcurrentAccess (-18) if the handle is in use, InvalidData (-17) if the handle's type discriminant is wrong (handle was not created by soliton_ratchet_init_*; see §13.6 type tagging). On ConcurrentAccess, the state is NOT reset — the caller must retry after the concurrent operation completes. On InvalidData, the state is also NOT reset — the type-discriminant check fires before any reset logic, so the handle (if it is a valid ratchet handle accidentally passed to the wrong operation) is unmodified and safe to continue using.
soliton_ratchet_free(ratchet) — free opaque ratchet. Returns int32_t: 0 on success, ConcurrentAccess (-18) if in use, InvalidData (-17) if the type discriminant is wrong. Null outer/inner pointer is a safe no-op (returns 0)
soliton_encrypted_message_free(msg) — free SolitonEncryptedMessage buffer fields (header.ratchet_pk, header.kem_ct, ciphertext). Does NOT free the struct itself — SolitonEncryptedMessage is a caller-owned value type, not an opaque heap handle. After calling this function, the caller is responsible for freeing the struct allocation (e.g., free(msg) in C). Contrast with soliton_ratchet_free, which frees the opaque handle allocation.
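The ownership and epoch_out rules for soliton_ratchet_to_bytes reduce to a small caller-side decision: only a zero return consumes the handle, and only a zero return with a nonzero epoch_out may update stored epoch state. A minimal sketch follows; epoch_record and after_to_bytes are hypothetical binding helpers, and how the recorded epoch becomes the min_epoch argument to soliton_ratchet_from_bytes_with_min_epoch is governed by §6.8, not by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caller-side record of the last successfully serialized epoch. */
typedef struct {
    uint64_t last_epoch;
} epoch_record;

/* Apply the result of soliton_ratchet_to_bytes(&handle, &data, &epoch_out).
 * Returns true if the library consumed (and nulled) the handle. */
bool after_to_bytes(epoch_record *rec, int32_t rc, uint64_t epoch_out) {
    assert(rec != NULL);
    if (rc != 0) {
        /* Error (e.g. NullPointer, ChainExhausted, ConcurrentAccess): the handle is
         * still live, so do not free it; epoch_out holds the 0 sentinel, so skip it. */
        return false;
    }
    if (epoch_out != 0) {      /* 0 is never a valid serialized epoch */
        rec->last_epoch = epoch_out;
    }
    return true;
}
```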

Call:

soliton_ratchet_derive_call_keys(ratchet, kem_ss, kem_ss_len, call_id, call_id_len, out) — derive call keys. kem_ss_len MUST be exactly 32 and call_id_len MUST be exactly 16; any other value → InvalidLength. These are the only two fixed-size input parameters in the call group with explicit length validation — unlike local_fp and remote_fp (taken from ratchet state internally), kem_ss and call_id are caller-supplied buffers with strict size contracts.
soliton_call_keys_send_key(keys, out, out_len) — copy current send key. out_len must be exactly 32; any other value returns InvalidLength. The caller MUST zeroize out after use — use soliton_zeroize(out, 32). The copied key is live session key material for media encryption.
soliton_call_keys_recv_key(keys, out, out_len) — copy current recv key. out_len must be exactly 32; any other value returns InvalidLength. The caller MUST zeroize out after use — use soliton_zeroize(out, 32). The copied key is live session key material for media encryption.
soliton_call_keys_advance(keys) — advance call chain (rekey). Returns ChainExhausted (-15) after 2²⁴ steps. On exhaustion, all call key material (key_a, key_b, chain_key) is immediately zeroized — the handle is dead: soliton_call_keys_send_key and soliton_call_keys_recv_key will return zeroed material after exhaustion, with no error or diagnostic. The handle is NOT auto-freed on ChainExhausted; callers MUST free it via soliton_call_keys_free and establish a new call via soliton_ratchet_derive_call_keys. See §6.12.
soliton_call_keys_free(keys) — free opaque call keys (zeroizes). Returns int32_t: 0 on success, ConcurrentAccess (-18) if in use, InvalidData (-17) if type discriminant wrong. Null outer/inner is safe no-op (returns 0)

Storage:

soliton_storage_encrypt(keyring, plaintext, ..., channel_id, segment_id, compress, out) — encrypt blob
soliton_storage_decrypt(keyring, blob, ..., channel_id, segment_id, out) — decrypt blob
soliton_dm_queue_encrypt(keyring, plaintext, ..., recipient_fp, batch_id, compress, out) — encrypt DM queue blob (§11.4.2 AAD)
soliton_dm_queue_decrypt(keyring, blob, ..., recipient_fp, batch_id, out) — decrypt DM queue blob
soliton_keyring_new(key, key_len, version, out) — create keyring (key is fixed 32 bytes). Error codes: NullPointer (-13) if out is null; InvalidLength (-1) if key_len ≠ 32; UnsupportedVersion (-10) if version == 0 (version 0 is reserved — §11.1); InvalidData (-17) if the key is all-zero bytes (§11.2 guard — all-zero is an invalid key). Returns 0 on success.
soliton_keyring_add_key(keyring, key, key_len, version, make_active) — add key (key is fixed 32 bytes). encrypt_blob always uses the active version's key. make_active=true atomically updates the active version to the newly added key. make_active=false registers the key for decryption only (lookup by version byte); the active version for new encryptions does not change. make_active=false with a version equal to the current active version returns InvalidData. The request is incoherent: the caller asks for the new key material to stay inactive, but the version byte already identifies the active slot, so honoring it would silently replace the active version's key material without activating anything new, making blobs previously encrypted under the old material undecryptable. The function rejects it with InvalidData rather than silently updating the key material.
soliton_keyring_remove_key(keyring, version) — remove key. Returns int32_t: 0 if key was present and removed, 0 if key was absent (idempotent), InvalidData (-17) if version is the current active version (active key cannot be removed — §10 invariant), UnsupportedVersion (-10) if version == 0, NullPointer (-13) if keyring is null, InvalidData (-17) if type discriminant wrong. Design note — both Ok outcomes return 0: The core Rust remove_key returns Ok(true) (was present, removed) or Ok(false) (was absent). The CAPI collapses both to return code 0 — the distinction is informational and has no security consequence; the idempotency is the externally visible contract. Binding authors who need to distinguish the two cases must track key versions independently or use the Rust API directly.
soliton_keyring_free(keyring) — free keyring. Returns int32_t: 0 on success, ConcurrentAccess (-18) if in use, InvalidData (-17) if type discriminant wrong. Null outer/inner is safe no-op (returns 0)
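The keyring return-code rules above can be modeled in a few lines, which is handy for testing a binding's error mapping without touching real key material. This toy model is hypothetical (toy_keyring and its functions are invented names): it keeps one 32-byte slot per version byte, models only the rules stated in these entries, assumes add_key rejects version 0 the same way soliton_keyring_new does, and omits handles, NullPointer, and type-discriminant checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_INVALID_LENGTH      (-1)
#define ERR_UNSUPPORTED_VERSION (-10)
#define ERR_INVALID_DATA        (-17)

typedef struct {
    uint8_t key[256][32];
    bool present[256];
    uint8_t active;                  /* 0 only before toy_keyring_new succeeds */
} toy_keyring;

static bool is_all_zero(const uint8_t *k, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++) acc |= k[i];
    return acc == 0;
}

int32_t toy_keyring_new(toy_keyring *kr, const uint8_t *key, size_t key_len, uint8_t version) {
    if (key_len != 32) return ERR_INVALID_LENGTH;
    if (version == 0) return ERR_UNSUPPORTED_VERSION;   /* version 0 is reserved */
    if (is_all_zero(key, 32)) return ERR_INVALID_DATA;  /* all-zero key rejected */
    memset(kr, 0, sizeof *kr);
    memcpy(kr->key[version], key, 32);
    kr->present[version] = true;
    kr->active = version;
    return 0;
}

int32_t toy_keyring_add_key(toy_keyring *kr, const uint8_t *key, size_t key_len,
                            uint8_t version, bool make_active) {
    if (key_len != 32) return ERR_INVALID_LENGTH;
    if (version == 0) return ERR_UNSUPPORTED_VERSION;   /* assumed, as for _new */
    if (!make_active && version == kr->active) return ERR_INVALID_DATA;
    memcpy(kr->key[version], key, 32);
    kr->present[version] = true;
    if (make_active) kr->active = version;
    return 0;
}

int32_t toy_keyring_remove_key(toy_keyring *kr, uint8_t version) {
    if (version == 0) return ERR_UNSUPPORTED_VERSION;
    if (version == kr->active) return ERR_INVALID_DATA; /* active key cannot be removed */
    if (kr->present[version]) {
        memset(kr->key[version], 0, 32);  /* a real keyring needs a non-elidable wipe */
        kr->present[version] = false;
    }
    return 0;                             /* removed and already-absent both return 0 */
}
```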

Streaming AEAD:

soliton_stream_encrypt_init(key, key_len, caller_aad, aad_len, compress, out) — init encryptor (generates random base nonce). key_len MUST be exactly 32; any other value returns InvalidLength. Unlike the out_len of soliton_stream_encrypt_header (lenient: extra buffer space is accepted), key_len is strict, because the key is always exactly 32 bytes for XChaCha20-Poly1305.
soliton_stream_encrypt_header(enc, out, out_len) — copy 26-byte header into caller-allocated buffer; out_len MUST be ≥ 26 (lenient: extra buffer space is accepted)
soliton_stream_encrypt_chunk(enc, plaintext, ..., is_last, out) — encrypt one chunk; out_len MUST be ≥ STREAM_ENCRYPT_MAX (1,048,849 bytes) — returns InvalidLength for smaller buffers (parallel to the out_len < STREAM_CHUNK_SIZE → InvalidLength rule for decrypt chunk)
soliton_stream_encrypt_chunk_at(enc: *const, index, plaintext, ..., is_last, out) — encrypt at explicit index (stateless, random-access); uses *const SolitonStreamEncryptor (not *mut) to reflect the &self Rust contract — see §15.11 for the *const caveat. Same out_len ≥ STREAM_ENCRYPT_MAX requirement as the sequential variant. index MUST be unique per call — calling with the same index and different plaintexts produces nonce reuse (§15.12). Does not advance next_index. Not interchangeable with the sequential variant; see §15.11 for mixed-mode use. Absent from soliton.h: this function is implemented and exported (#[unsafe(no_mangle)]) but has no declaration in the C header — its decrypt counterpart soliton_stream_decrypt_chunk_at is declared in the header. Binding authors (C, C++, Go cgo, C#, Dart) must supply a manual extern declaration matching the signature above until the header is updated.
soliton_stream_encrypt_is_finalized(enc, out: *mut bool) — write finalized state to out
soliton_stream_encrypt_free(enc) — free encryptor (zeroizes key). Returns int32_t: 0 on success, NullPointer (-13) if outer pointer null, 0 (safe no-op) if inner pointer null (null inner pointer means the handle was already freed or never initialized — matches the double-free behavior of soliton_ratchet_free / soliton_keyring_free; does NOT return NullPointer for inner-null), ConcurrentAccess (-18) if in use, InvalidData (-17) if type discriminant wrong

soliton_stream_encrypt_chunk output buffer (only out_written bytes are valid): On a successful return from soliton_stream_encrypt_chunk, only the first *out_written bytes of out contain ciphertext. The output buffer must be at least STREAM_ENCRYPT_MAX (1,048,849 bytes) to accommodate any valid chunk, but a non-final uncompressed chunk writes exactly CHUNK_SIZE + CHUNK_OVERHEAD = 1,048,593 bytes, leaving the last 256 bytes of a minimum-sized buffer unwritten by the call; a short final chunk leaves almost the entire buffer unwritten. A binding author who copies out[0..STREAM_ENCRYPT_MAX] to a transport (instead of out[0..*out_written]) therefore transmits at least 256 bytes of stale buffer content per full chunk, and nearly the whole buffer for a short final chunk. Ciphertext is not secret, but the stale bytes may contain earlier key material or other sensitive data from the process heap. Always use *out_written to determine the valid range. This mirrors the behavior documented for soliton_stream_decrypt_chunk and soliton_stream_decrypt_chunk_at.
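On the sending side, the fix is to treat *out_written, not the buffer size, as the chunk length. A minimal sketch, assuming a hypothetical length-prefixed transport framing ([u32 big-endian length][chunk bytes]) that is not part of soliton; frame_chunk is an invented helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SOLITON_STREAM_ENCRYPT_MAX 1048849UL

/* Frame one encrypted chunk as [u32 big-endian length][chunk bytes].
 * Copies only out[0..out_written); the rest of the STREAM_ENCRYPT_MAX-sized
 * output buffer is never read. Returns the framed size, or 0 if dst is too small. */
size_t frame_chunk(uint8_t *dst, size_t dst_cap, const uint8_t *out, size_t out_written) {
    assert(out_written <= SOLITON_STREAM_ENCRYPT_MAX);
    if (dst_cap < 4 || out_written > dst_cap - 4) {
        return 0;
    }
    dst[0] = (uint8_t)(out_written >> 24);
    dst[1] = (uint8_t)(out_written >> 16);
    dst[2] = (uint8_t)(out_written >> 8);
    dst[3] = (uint8_t)out_written;
    memcpy(dst + 4, out, out_written);
    return out_written + 4;
}
```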

No soliton_stream_encrypt_next_index function: After encrypt_chunk(is_last=true) succeeds, the chunk count equals the encryptor's internal next_index — but this is not exposed via CAPI. §15.12 describes how to track chunk count: callers must count encrypt_chunk calls manually, or use is_finalized() to confirm the stream is complete. The decrypt-side soliton_stream_decrypt_expected_index has no symmetric encrypt-side counterpart — this asymmetry is intentional.
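Counting chunks on the encrypt side therefore falls to the caller. A minimal sketch of such a tracker: enc_progress and enc_progress_note are hypothetical binding names, only sequential soliton_stream_encrypt_chunk calls are counted (the _at variant does not advance next_index), and the sketch assumes a failed call does not consume a chunk index, which callers should confirm against §15.12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caller-side progress tracker for one encryptor. */
typedef struct {
    uint64_t chunks;     /* successful sequential encrypt_chunk calls */
    bool finalized;      /* an is_last=true chunk has succeeded */
} enc_progress;

/* Record the outcome of one soliton_stream_encrypt_chunk call.
 * Assumes a failed call does not consume a chunk index. */
void enc_progress_note(enc_progress *p, int32_t rc, bool is_last) {
    assert(p != NULL);
    if (rc != 0) {
        return;
    }
    p->chunks++;
    if (is_last) {
        p->finalized = true;
    }
}
```

The finalized flag can be cross-checked against soliton_stream_encrypt_is_finalized once the last chunk has been written.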

soliton_stream_decrypt_init(key, key_len, header, header_len, caller_aad, aad_len, out) — init decryptor from header; key_len MUST be exactly 32 (strict: any other length returns InvalidLength, same as encrypt_init); header_len MUST be exactly 26 (strict: any other length returns InvalidLength)
soliton_stream_decrypt_chunk(dec, chunk, chunk_len, out, out_len, out_written, is_last: *mut bool) — decrypt sequential chunk; out_len MUST be ≥ STREAM_CHUNK_SIZE (1,048,576 bytes) — returns InvalidLength for smaller buffers (see note below); is_last is a required non-null out-parameter — returns NullPointer if null
soliton_stream_decrypt_chunk_at(dec, index, chunk, chunk_len, out, out_len, out_written, is_last: *mut bool) — decrypt at explicit index (stateless); same out_len ≥ STREAM_CHUNK_SIZE requirement as above; is_last is a required non-null out-parameter — returns NullPointer if null
soliton_stream_decrypt_is_finalized(dec, out: *mut bool) — write finalized state to out
soliton_stream_decrypt_expected_index(dec, out: *mut u64) — write next expected sequential index to out
soliton_stream_decrypt_free(dec) — free decryptor (zeroizes key). Returns int32_t: 0 on success, NullPointer (-13) if outer pointer null, 0 (safe no-op) if inner pointer null (null inner pointer means already-freed or never-initialized — does NOT return NullPointer for inner-null, consistent with soliton_ratchet_free / soliton_keyring_free), ConcurrentAccess (-18) if in use, InvalidData (-17) if type discriminant wrong

SOLITON_STREAM_ENCRYPT_MAX and SOLITON_STREAM_CHUNK_SIZE are NOT defined as #define constants in soliton.h: The header references these names in documentation comments but does not provide #define or constexpr entries. Binding authors who write out_len = SOLITON_STREAM_ENCRYPT_MAX get a compile error. The values must be embedded as integer literals in bindings: STREAM_ENCRYPT_MAX = 1,048,849 (encrypt output buffer, see Appendix A) and STREAM_CHUNK_SIZE = 1,048,576 (decrypt output buffer, see Appendix A). Language-idiomatic constant definitions are recommended:

// C/C++ — add to binding wrapper or generated header
#define SOLITON_STREAM_ENCRYPT_MAX  1048849UL
#define SOLITON_STREAM_CHUNK_SIZE   1048576UL

These values are stable and will not change without a major version bump.
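The relationships between these literals can also be pinned at compile time, so a typo in a hand-written binding fails the build instead of surfacing as InvalidLength at run time. A sketch in C11: the 1,048,593-byte non-final chunk size comes from the encrypt-buffer note earlier in this section, STREAM_NONFINAL_CT_LEN and STREAM_CHUNK_OVERHEAD are illustrative names that do not exist in soliton.h, and the #ifndef guards let the block coexist with a wrapper that already defines the two constants.

```c
#ifndef SOLITON_STREAM_ENCRYPT_MAX
#define SOLITON_STREAM_ENCRYPT_MAX  1048849UL  /* encrypt output buffer minimum */
#endif
#ifndef SOLITON_STREAM_CHUNK_SIZE
#define SOLITON_STREAM_CHUNK_SIZE   1048576UL  /* decrypt output buffer minimum */
#endif

/* Illustrative names: length of a non-final uncompressed ciphertext chunk and
 * the per-chunk overhead it implies (CHUNK_SIZE + CHUNK_OVERHEAD = 1,048,593). */
#define STREAM_NONFINAL_CT_LEN  1048593UL
#define STREAM_CHUNK_OVERHEAD   (STREAM_NONFINAL_CT_LEN - SOLITON_STREAM_CHUNK_SIZE)

_Static_assert(SOLITON_STREAM_CHUNK_SIZE == 1024UL * 1024UL,
               "plaintext chunks are exactly 1 MiB");
_Static_assert(STREAM_CHUNK_OVERHEAD == 17UL,
               "non-final uncompressed chunks carry 17 bytes of overhead");
_Static_assert(SOLITON_STREAM_ENCRYPT_MAX - STREAM_NONFINAL_CT_LEN == 256UL,
               "a minimum-sized encrypt buffer has 256 bytes of slack");
_Static_assert(SOLITON_STREAM_ENCRYPT_MAX > SOLITON_STREAM_CHUNK_SIZE,
               "encrypt buffer minimum exceeds decrypt buffer minimum");
```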

Streaming decrypt output buffer minimum (STREAM_CHUNK_SIZE, 1,048,576 bytes): Both soliton_stream_decrypt_chunk and soliton_stream_decrypt_chunk_at require the output buffer to be at least STREAM_CHUNK_SIZE bytes regardless of the expected plaintext size. The library enforces one worst-case bound instead of a per-chunk one: for compressed streams the decompressed size is known only after AEAD verification and decompression, and for uncompressed streams the plaintext length (ciphertext length minus CHUNK_OVERHEAD) is only confirmed once the chunk has been parsed. Every valid decrypted chunk fits in STREAM_CHUNK_SIZE bytes. This is asymmetric with the encrypt side: the encrypt output buffer uses STREAM_ENCRYPT_MAX (1,048,849 bytes), which is larger than STREAM_CHUNK_SIZE to accommodate the tag_byte, the AEAD tag, and compression overhead. The decrypt minimum is the raw STREAM_CHUNK_SIZE because decrypt outputs plaintext only (no tag_byte, no AEAD tag, no compression overhead). Binding authors who size the output buffer to the expected plaintext of a small final chunk (e.g., a 100-byte buffer for a 100-byte final chunk) will receive InvalidLength, and the error message gives no indication that buffer size is the cause. See Appendix A for the constant value.

Streaming header buffer size asymmetry: soliton_stream_encrypt_header accepts any out_len ≥ 26 (lenient — a 32-byte output buffer is fine). soliton_stream_decrypt_init requires header_len == 26 exactly (strict — any other length returns InvalidLength). This asymmetry is intentional: the encryptor writes into a caller-owned buffer and the caller controls the buffer size; the decryptor parses an input buffer where any size other than exactly 26 indicates a framing error. A caller who stores the 26-byte header in a 32-byte buffer can encrypt successfully but must pass exactly header_len = 26 to decrypt_init — passing the full buffer length (32) returns InvalidLength.
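A binding can mirror the length rules above as pre-checks and raise a specific message (for example "decrypt output buffer must be at least 1 MiB") before calling into the library. A minimal sketch, assuming the error-code values quoted in this section; the function names are hypothetical, and the library still performs its own checks.

```c
#include <stddef.h>
#include <stdint.h>

#define STREAM_HEADER_LEN       26UL
#define STREAM_KEY_LEN          32UL
#define STREAM_CHUNK_SIZE       1048576UL

#define SOLITON_OK              0
#define SOLITON_INVALID_LENGTH  (-1)

/* Encrypt side: the caller owns the output buffer, so any size >= 26 is fine. */
static int32_t check_encrypt_header_out_len(size_t out_len) {
    return out_len >= STREAM_HEADER_LEN ? SOLITON_OK : SOLITON_INVALID_LENGTH;
}

/* Decrypt side: the header is parsed input, so anything but exactly 26 bytes
 * is treated as a framing error; the key must be exactly 32 bytes. */
static int32_t check_decrypt_init_lens(size_t key_len, size_t header_len) {
    if (key_len != STREAM_KEY_LEN || header_len != STREAM_HEADER_LEN) {
        return SOLITON_INVALID_LENGTH;
    }
    return SOLITON_OK;
}

/* decrypt_chunk / decrypt_chunk_at: worst-case buffer, independent of chunk_len. */
static int32_t check_decrypt_out_len(size_t out_len) {
    return out_len >= STREAM_CHUNK_SIZE ? SOLITON_OK : SOLITON_INVALID_LENGTH;
}
```

In practice this means a binding that stores the header in a larger buffer keeps the exact value 26 alongside it and passes that, not the buffer size, to decrypt_init.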

Primitives:

soliton_hmac_sha3_256(key, key_len, data, data_len, out, out_len) — HMAC-SHA3-256. out_len must be exactly 32 (the HMAC-SHA3-256 output size); any other value returns InvalidLength. Unlike most output-length parameters in the CAPI (which express a caller-allocated buffer size), out_len here is a strict size-check: the function does not produce a variable-length output.
soliton_hkdf_sha3_256(salt, salt_len, ikm, ikm_len, info, info_len, out, out_len) — HKDF-SHA3-256. out_len constraint: must be in the range 1-8160 bytes. The upper bound is the RFC 5869 §2.3 HKDF-Expand maximum: 255 × HashLen = 255 × 32 = 8160 bytes for SHA3-256. A zero out_len or out_len > 8160 returns InvalidLength.
soliton_sha3_256(data, data_len, out, out_len) — SHA3-256. out_len must be exactly 32; any other value returns InvalidLength.
soliton_xwing_keygen(pk_out, sk_out) — X-Wing key generation
soliton_xwing_encapsulate(pk, pk_len, ct_out, ss_out) — X-Wing encapsulate. ss_out receives a 32-byte shared secret into a caller-allocated buffer. The caller MUST zeroize ss_out after use — use soliton_zeroize(ss_out, 32).
soliton_xwing_decapsulate(sk, sk_len, ct, ct_len, ss_out) — X-Wing decapsulate. ss_out receives a 32-byte shared secret into a caller-allocated buffer. The caller MUST zeroize ss_out after use — use soliton_zeroize(ss_out, 32).
soliton_aead_encrypt(key, key_len, nonce, nonce_len, plaintext, ..., aad, ..., out) — raw XChaCha20-Poly1305 encrypt. key_len MUST be exactly 32, because XChaCha20-Poly1305 uses a 256-bit key (a 16-byte AES-128-style key is rejected); any other value returns InvalidLength. nonce_len MUST be exactly 24: XChaCha20 uses a 192-bit nonce, so passing a 12-byte ChaCha20 nonce returns InvalidLength. This is a common caller error when migrating from ChaCha20-Poly1305 (12-byte nonce) to XChaCha20-Poly1305 (24-byte nonce).
soliton_aead_decrypt(key, key_len, nonce, nonce_len, ciphertext, ..., aad, ..., out) — raw XChaCha20-Poly1305 decrypt. Same key and nonce length constraints as soliton_aead_encrypt: key_len must be 32 and nonce_len must be 24; any other value returns InvalidLength.
soliton_hmac_sha3_256_verify(tag_a, tag_a_len, tag_b, tag_b_len) — constant-time 32-byte tag comparison. Returns 0 if equal, VerificationFailed (-3) if tags differ, InvalidLength (-1) if either length ≠ 32. Constant-time is a security requirement — comparison time must be independent of tag contents to prevent timing attacks on authentication tokens (§4). Do NOT substitute memcmp() or any early-exit comparison.
soliton_argon2id(password, ..., salt, ..., m_cost, t_cost, p_cost, out, out_len) — Argon2id KDF (§10.6). out_len constraint: 1-4096 bytes; zero or > 4096 returns InvalidLength (see §10.6 cap rationale).
soliton_verification_phrase(pk_a, pk_a_len, pk_b, pk_b_len, out) — compute the verification phrase (§9) for a pair of identity public keys
soliton_random_bytes(buf, len) — fill buf with len cryptographically random bytes from the OS CSPRNG. Output cap: len must be ≤ 256 MiB (268,435,456 bytes) — requests exceeding this return InvalidLength. CSPRNG failure aborts: like keygen and encapsulation (§13.2), soliton_random_bytes aborts the process on OS entropy failure rather than returning an error code. Binding authors MUST NOT expect an error return on CSPRNG failure for this function.
soliton_zeroize(ptr, len) — volatile-write zeroing, guaranteed not to be optimized out; use it for caller-owned secret buffers. It returns void (C) / () (Rust): unlike all other CAPI functions, there is no int32_t return code to check. Null-safe and zero-length-safe: if ptr is NULL or len == 0, the function returns immediately without error and without writing any memory. This diverges from the general §13.2 convention, where null pointers return NullPointer (-13). Because the silent no-op makes a null-pointer mistake invisible at the call site, callers relying on soliton_zeroize to confirm that a buffer was zeroed MUST verify ptr != NULL && len > 0 before calling.
soliton_version() — return version string as *const c_char. Static lifetime, do NOT free: the returned pointer refers to a string embedded in the library binary (a 'static string slice in Rust, exposed as a C string literal). It is never null and remains valid for as long as the library stays loaded, which for a statically linked or never-unloaded library means the lifetime of the process. It MUST NOT be passed to free() or soliton_buf_free(); calling free() on a static pointer is undefined behavior (heap corruption). Binding authors who follow the "every library allocation must be freed" convention from §13.2 must add an exception for soliton_version(). This is the sole CAPI function that returns a raw C string pointer rather than a SolitonBuf; all other variable-length string outputs use SolitonBuf and are heap-allocated.
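soliton_hmac_sha3_256_verify and soliton_zeroize rest on two standard C techniques, sketched below so binding authors can recognize them (or reproduce them in a test harness). These are illustrative stand-ins, not the library's code; production bindings should call the CAPI functions. The comparison visits every byte and reveals only the final equal/not-equal result, never the position of the first difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Constant-time equality for two 32-byte tags: all 32 bytes are always
 * visited and differences are OR-accumulated, so timing does not depend on
 * where the tags differ. Returns 1 if equal, 0 otherwise. */
static int ct_tag_equal_32(const uint8_t a[32], const uint8_t b[32]) {
    uint8_t diff = 0;
    for (size_t i = 0; i < 32; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0;
}

/* Volatile-write zeroization: stores through a volatile pointer are
 * observable side effects, so the compiler may not drop them as dead stores.
 * Matches the soliton_zeroize contract for NULL / zero length: silent no-op. */
static void zeroize_sketch(void *ptr, size_t len) {
    if (ptr == NULL || len == 0) {
        return;
    }
    volatile uint8_t *p = (volatile uint8_t *)ptr;
    for (size_t i = 0; i < len; i++) {
        p[i] = 0;
    }
}
```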

KEX (additional):

soliton_kex_decode_session_init(data, data_len, out) — decode SessionInit from bytes. Input cap: 64 KiB (65,536 bytes). Inputs exceeding 64 KiB return InvalidLength. This is tighter than the general 256 MiB CAPI cap (§13.2) — the maximum valid SessionInit is 4,669 bytes (with OPK; §7.4), so 64 KiB is a safe conservative bound that prevents allocation-exhaustion from oversized buffers.
soliton_decoded_session_init_free(session) — free decoded SessionInit: frees the crypto_version SolitonBuf. No zeroization is performed — SolitonDecodedSessionInit contains no secret material (§13.6). Null session is a safe no-op. Must be called on every successful soliton_kex_decode_session_init output.
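soliton_kex_received_session_free(session) — free received session: zeroizes the inline root_key and chain_key arrays and frees the peer_ek SolitonBuf (§13.6). Must be called on every successful soliton_kex_receive output.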

soliton_kex_build_first_message_aad input cap: This function constructs the first-message AAD from a SolitonInitiatedSession and returns it as a SolitonBuf. It applies an 8 KiB (8,192 bytes) internal cap on the combined size of the SessionInit encoding and ancillary fields. Inputs exceeding 8 KiB return InvalidLength. In practice the SessionInit encoding is at most 4,669 bytes (§7.4), so this cap is never reached with well-formed inputs. Binding authors who synthesize oversized mock SolitonInitiatedSession structs for testing may encounter this limit.

opk_sk co-presence error codes at the CAPI level: The two directions of OPK co-presence violation produce different error codes at the CAPI level. (1) When ct_opk is non-null (OPK ciphertext present) but opk_sk is null (the OPK secret key pointer is null): the CAPI's null-pointer guard for opk_sk fires first and returns NullPointer (-13), before the co-presence check runs. (2) When ct_opk is null (no OPK ciphertext) but opk_sk is non-null (a secret key pointer was passed): the co-presence check fires and returns InvalidData (-17), because the OPK secret key was supplied for an absent OPK ciphertext. Binding authors pattern-matching on errors from soliton_kex_receive MUST handle both: NullPointer for "OPK ciphertext present, OPK secret key missing" and InvalidData for "OPK ciphertext absent, OPK secret key present."

opk_id co-presence constraint: When ct_opk is null (no OPK ciphertext), opk_id MUST be 0. Passing a non-zero opk_id with a null ct_opk returns InvalidData. This constraint is enforced by soliton_kex_receive and soliton_kex_decode_session_init. The opk_id field is meaningful only when ct_opk is present; a non-zero opk_id with absent ct_opk indicates a malformed SessionInit (the OPK key lookup would use the non-zero ID to look up an OPK that the protocol says is not being used). A reimplementer who initializes opk_id to a non-zero default when building a no-OPK SessionInit will receive InvalidData on the receiving side.

opk_id = 0 is a valid OPK ID when has_opk = true / ct_opk is present: A server can assign OPK ID 0 to the first uploaded one-time pre-key. When a SessionInit arrives with has_opk = 0x01 (or ct_opk non-null) and opk_id = 0, this means OPK ID 0 was used; has_opk is the sole authority for whether an OPK was included. opk_id = 0 does NOT act as a sentinel for "no OPK present" in the case where has_opk = true. A reimplementer who treats opk_id == 0 as "no OPK" and ignores has_opk will discard valid SessionInits that used OPK ID 0, silently ignoring the OPK ciphertext and producing wrong decapsulation output. In SolitonDecodedSessionInit, has_opk is the canonical field to check; opk_id must only be used when has_opk == 1.
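The check order and interpretation rules for the OPK fields can be modelled as follows. A minimal sketch: check_opk_args and opk_id_to_look_up are hypothetical helpers, the error-code values are the ones quoted in this section, and the sketch assumes decoding has already rejected has_opk values other than 0 and 1.

```c
#include <stddef.h>
#include <stdint.h>

#define SOLITON_OK             0
#define SOLITON_NULL_POINTER (-13)
#define SOLITON_INVALID_DATA (-17)

/* Models the described order of checks on soliton_kex_receive's OPK inputs. */
static int32_t check_opk_args(const void *ct_opk, const void *opk_sk, uint32_t opk_id) {
    if (ct_opk != NULL) {
        /* The opk_sk null guard fires before any co-presence check. */
        if (opk_sk == NULL) return SOLITON_NULL_POINTER;
        return SOLITON_OK;                 /* opk_id may be any value, including 0 */
    }
    if (opk_sk != NULL) return SOLITON_INVALID_DATA;  /* secret key for an absent OPK */
    if (opk_id != 0)    return SOLITON_INVALID_DATA;  /* opk_id must be 0 without OPK */
    return SOLITON_OK;
}

/* Receiver-side lookup on a decoded SessionInit: has_opk decides, never opk_id.
 * Returns 1 and writes *out_id if an OPK must be looked up, 0 otherwise. */
static int opk_id_to_look_up(uint8_t has_opk, uint32_t opk_id, uint32_t *out_id) {
    if (has_opk != 1) return 0;   /* no OPK was used; ignore opk_id entirely */
    *out_id = opk_id;             /* 0 is a legitimate OPK ID here */
    return 1;
}
```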

13.5 Key Usage Order for Session Initiation

The epoch key flows through several steps and the right value must be passed to each:

// Alice (initiator):
soliton_kex_initiate(...)           → SolitonInitiatedSession { initial_epoch_key, ... }
soliton_ratchet_encrypt_first(initial_epoch_key, plaintext, aad, ...)  → (payload, ratchet_init_key)
soliton_ratchet_init_alice(root_key, ratchet_init_key, ek_pk, ek_sk, ...)

// Bob (responder):
soliton_kex_receive(...)            → (root_key, initial_epoch_key, peer_ek)
soliton_ratchet_decrypt_first(initial_epoch_key, payload, aad, ...)  → (plaintext, ratchet_init_key)
soliton_ratchet_init_bob(root_key, ratchet_init_key, peer_ek, ...)

aad parameter for encrypt_first / decrypt_first: The aad parameter is the first-message AAD constructed by build_first_message_aad / soliton_kex_build_first_message_aad. Its value is (§7.3 / §5.4 Step 7): "lo-dm-v1" || sender_fingerprint_raw || recipient_fingerprint_raw || encode_session_init(session_init). Both Alice and Bob must pass byte-for-byte identical aad bytes; any divergence (wrong label, wrong fingerprint order, non-canonical encode_session_init output) makes Bob's decrypt_first fail with AeadFailed, with no diagnostic pointing to the AAD. The per-function parameter descriptions in this section do not spell out this content; it is defined in §5.4 Step 7 (Alice's side) and §5.5 Step 6 (Bob's side). The easiest correct implementation calls soliton_kex_build_first_message_aad (§13.4) to produce this value rather than constructing it manually.
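For reimplementers who must construct the AAD without soliton_kex_build_first_message_aad, the byte layout can be sketched as below. Assumptions: the label is the 8 ASCII bytes of "lo-dm-v1" with no NUL terminator, both fingerprints are the raw 32-byte values, encoded_si is already the canonical encode_session_init output, and the 8 KiB cap described earlier in this section is applied to the whole AAD. Because the first message is Alice's, both peers pass Alice's fingerprint as sender_fp and Bob's as recipient_fp. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FP_LEN         32u
#define AAD_LABEL      "lo-dm-v1"
#define AAD_LABEL_LEN  (sizeof(AAD_LABEL) - 1)   /* 8 bytes; NUL not included */
#define FIRST_AAD_CAP  8192u                      /* 8 KiB cap */

/* aad = "lo-dm-v1" || sender_fp || recipient_fp || encoded_si
 * Returns the number of bytes written, or 0 if the cap or out_cap is exceeded. */
static size_t build_first_message_aad_sketch(const uint8_t sender_fp[FP_LEN],
                                             const uint8_t recipient_fp[FP_LEN],
                                             const uint8_t *encoded_si, size_t si_len,
                                             uint8_t *out, size_t out_cap)
{
    if (si_len > FIRST_AAD_CAP) {
        return 0;                     /* also keeps the sum below from overflowing */
    }
    size_t total = AAD_LABEL_LEN + 2 * FP_LEN + si_len;
    if (total > FIRST_AAD_CAP || total > out_cap) {
        return 0;
    }
    uint8_t *p = out;
    memcpy(p, AAD_LABEL, AAD_LABEL_LEN); p += AAD_LABEL_LEN;
    memcpy(p, sender_fp, FP_LEN);        p += FP_LEN;
    memcpy(p, recipient_fp, FP_LEN);     p += FP_LEN;
    if (si_len > 0) {
        memcpy(p, encoded_si, si_len);
    }
    return total;
}
```

With the largest valid SessionInit (4,669 bytes) the AAD is 8 + 32 + 32 + 4,669 = 4,741 bytes, well under the cap.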

encrypt_first_message / decrypt_first_message are pre-RatchetState standalone operations: These functions take an initial_epoch_key parameter directly — they do NOT require or use a SolitonRatchet handle. They are stateless AEAD operations called before the ratchet is initialized (ratchet_init_alice / ratchet_init_bob). A reimplementer who constructs a SolitonRatchet first and then tries to pass it to the first-message functions has misread the call sequence — the first-message functions consume the initial epoch key and return ratchet_init_key, which is then passed to ratchet init.

ratchet_init_key is the epoch key returned unchanged by encrypt_first_message / decrypt_first_message — it is the input initial_epoch_key passed through (counter-mode does not advance the epoch key). It is passed to ratchet_init_alice / ratchet_init_bob as the initial epoch key. It is not a separate derived value.

Name equivalence (epoch key): The same 32-byte value (session_key[32..64] from §5.4 Step 4) appears under four names across the spec, Rust API, and CAPI: epoch_key (§5.4 protocol pseudocode), initial_epoch_key (CAPI SolitonInitiatedSession / soliton_kex_receive output in the §13.5 pseudocode), initial_chain_key (Rust InitiatedSession::take_initial_chain_key() — historical name from the pre-counter-mode chain design), and ratchet_init_key (CAPI return from encrypt_first / decrypt_first). All four are the same value at different points in the key flow. SolitonReceivedSession struct field name: In the SolitonReceivedSession C struct (§13.6), Bob's copy of this value is named chain_key — not initial_epoch_key. The §13.5 pseudocode uses initial_epoch_key as the return-value label for soliton_kex_receive; the §13.6 struct layout names the same field chain_key. A binding author laying out SolitonReceivedSession manually must use the field name chain_key, not initial_epoch_key.

Name equivalence (ephemeral public key): Alice's ephemeral X-Wing public key (EK_pub, 1216 bytes) also appears under three names: sender_ek (the SessionInit struct field transmitted in §5.4 Step 5), ek_pk (the SolitonInitiatedSession field returned by soliton_kex_initiate and stored in Alice's ratchet handle via soliton_ratchet_init_alice), and send_ratchet_pk (the RatchetState field after init_alice — Alice's initial send ratchet public key, which Bob will encapsulate to on his first send). Getting this mapping wrong means Bob's first KEM ratchet step encapsulates to a different key than Alice expects — the resulting kem_ss diverges, the new epoch key diverges, and every subsequent message fails with AeadFailed with no diagnostic pointing to the mismatched key.

Passing initial_epoch_key directly to ratchet init (skipping the first-message step) produces no immediate error — AEAD encryption succeeds with any 32-byte key — but decryption at the remote end will fail.

soliton_kex_receive wrong-key-ID silent failure: If a recognized spk_id is paired with the wrong secret key (e.g., storage corruption maps a valid ID to different key material), soliton_kex_receive succeeds and returns a valid-looking SolitonReceivedSession — but X-Wing implicit rejection produces a pseudorandom ss_spk, so root_key and initial_epoch_key diverge from Alice's. The error surfaces only when decrypt_first_message fails with AeadFailed, with no diagnostic distinguishing this from ciphertext tampering or transport corruption. This is the same category of silent failure as passing initial_epoch_key directly to ratchet init. Bob's spk_id → sk mapping MUST be verified for integrity independently (e.g., by storing a fingerprint of the SPK public key alongside the private key and checking it before decapsulation) — see §5.5 Step 4.

Single-use key extraction: InitiatedSession and ReceivedSession enforce single-use extraction of root_key and initial_epoch_key. The first call to take_root_key() / take_initial_epoch_key() returns the value and replaces the internal copy with zeros. A second call returns all-zeros, which ratchet init rejects (all-zero root_key is invalid). Reimplementers providing accessor methods (get_root_key()) instead of consuming methods risk accidental key reuse — extracting the same root key twice and initializing two ratchets produces two sessions with identical state, causing nonce reuse on the first message.
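The difference between a consuming accessor and a plain getter is small in code but decisive for safety. A sketch in C, assuming a hypothetical session_sketch struct standing in for InitiatedSession / ReceivedSession: the first take copies the key out and wipes the internal copy, so a second take yields all-zeros, which the ratchet-init guard described above then rejects.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for InitiatedSession / ReceivedSession. */
typedef struct {
    uint8_t root_key[32];
} session_sketch;

static int is_all_zero_32(const uint8_t k[32]) {
    uint8_t acc = 0;
    for (int i = 0; i < 32; i++) acc |= k[i];
    return acc == 0;
}

/* Consuming accessor: copy the key out, then wipe the internal copy with
 * volatile writes so the wipe cannot be optimized away. */
static void take_root_key_sketch(session_sketch *s, uint8_t out[32]) {
    memcpy(out, s->root_key, 32);
    volatile uint8_t *p = s->root_key;
    for (int i = 0; i < 32; i++) p[i] = 0;
}

/* Guard from the text: ratchet init rejects an all-zero root key. */
static int ratchet_init_accepts(const uint8_t root_key[32]) {
    return !is_all_zero_32(root_key);
}
```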

ek_sk is also single-use: The ek_sk field (X-Wing ephemeral secret key, 2432 bytes) in SolitonInitiatedSession MUST be passed to exactly one soliton_ratchet_init_alice call. ek_sk is the X-Wing decapsulation key that Alice will use to decapsulate Bob's first KEM ratchet ciphertext — passing it to two init_alice calls creates two ratchet instances with identical send_ratchet_sk. When Bob sends his first ratchet message, he encapsulates to Alice's ek_pk once; only one of the two Alice instances can derive the correct epoch key from the resulting KEM ciphertext. The other instance has the same send_ratchet_sk but decapsulates against a mismatched ciphertext, producing a wrong kem_ss, a wrong recv_epoch_key, and silent AeadFailed on the first message with no diagnostic pointing to the duplicated ek_sk. Unlike root_key and initial_epoch_key, ek_sk is not enforced as single-use by a consuming wrapper at the CAPI level (it is passed as a *const SolitonBuf raw pointer); callers MUST NOT reuse it. After soliton_ratchet_init_alice returns, the ek_sk buffer should be freed via soliton_kex_initiated_session_free — do not pass it to any further init_alice calls.

13.6 Opaque Structs

SolitonRatchet, SolitonKeyRing, SolitonCallKeys, SolitonStreamEncryptor, and SolitonStreamDecryptor are heap-allocated opaque structs. Their internal layout is not part of the ABI. They must be freed with soliton_ratchet_free, soliton_keyring_free, soliton_call_keys_free, soliton_stream_encrypt_free, and soliton_stream_decrypt_free respectively.

SolitonInitiatedSession is a flat C struct with both inline fields (zeroed by soliton_kex_initiated_session_free) and SolitonBuf fields. The ek_sk field must be freed via soliton_kex_initiated_session_free; do NOT call soliton_buf_free on ek_sk directly. soliton_buf_free frees the heap allocation and nulls the SolitonBuf fields, but the inline root_key and initial_chain_key arrays (32 bytes each, embedded directly in the struct, not SolitonBuf fields) are left unzeroized. The dedicated free function zeroizes both inline arrays and then frees the SolitonBuf fields. Calling soliton_buf_free on ek_sk followed by the dedicated free is safe (the null-after-free guarantee makes the second free of ek_sk a no-op), but calling only soliton_buf_free leaves 64 bytes of secret material unzeroized in memory.

GC language hazard: SolitonInitiatedSession contains inline root_key and initial_chain_key (32 bytes each), secret material embedded directly in the struct. In GC languages (C#, Go, Python), the GC may relocate (compact) a managed-heap struct, leaving unzeroized copies of these keys at the old address. Binding authors MUST allocate this struct in pinned/unmanaged memory (Marshal.AllocHGlobal, C.malloc, ctypes.create_string_buffer) and call soliton_kex_initiated_session_free immediately after extracting both keys to minimize the pinned lifetime.

GC language hazard — SolitonReceivedSession: The identical hazard applies to SolitonReceivedSession (Bob's side). SolitonReceivedSession contains inline root_key ([u8; 32]) and chain_key ([u8; 32]) — secret material embedded directly in the struct alongside SolitonBuf fields for peer_ek. GC relocation at any point between soliton_kex_receive returning and soliton_kex_received_session_free executing leaves unzeroized copies at the old address. Binding authors MUST apply the same pinned/unmanaged-memory allocation to SolitonReceivedSession as to SolitonInitiatedSession. The mitigation pattern is: allocate SolitonReceivedSession in pinned memory → call soliton_kex_receive → extract root_key and chain_key into pinned buffers → call soliton_kex_received_session_free → unpin. This struct is Bob's counterpart to Alice's SolitonInitiatedSession and carries the same category of secret material.

Alignment padding in flat structs: SolitonInitiatedSession has two implicit padding gaps that binding authors laying out the struct manually (Go struct, C# StructLayout, Python ctypes.Structure) MUST include explicitly. (1) spk_id → ct_opk (4-byte gap): spk_id (uint32_t, 4 bytes) ends at offset 212, but ct_opk (SolitonBuf) requires 8-byte pointer alignment; the next 8-aligned boundary is offset 216, so 4 bytes of implicit padding appear at offsets 212-215. A binding author who places ct_opk at offset 212 corrupts all subsequent fields. (2) has_opk → sender_sig (3-byte gap): has_opk (uint8_t) is followed by 3 bytes of implicit alignment padding before the next pointer-aligned SolitonBuf field. The generated soliton.h header handles both gaps automatically via C's natural alignment rules. SolitonDecodedSessionInit also has a 3-byte gap, but in a different place: its has_opk is followed by ct_opk, an inline byte array with 1-byte alignment, so the 3 padding bytes fall after ct_opk and immediately before the 4-byte-aligned opk_id (offsets 4661-4663 on LP64).

SolitonDecodedSessionInit contains no secret material, so no zeroization is required: all fields are wire-transmitted public or semi-public values (ciphertexts, public keys, fingerprints, version string). None require zeroization or privileged memory treatment. After soliton_decoded_session_init_free has released the library-allocated crypto_version buffer, callers MAY discard the struct storage itself normally: free() in C, garbage collection in managed languages, an ordinary drop in Rust. Contrast with SolitonInitiatedSession and SolitonReceivedSession, which contain secret key material (root_key, epoch keys) derived from KEM operations and MUST be freed exclusively via soliton_kex_initiated_session_free / soliton_kex_received_session_free, which zeroize their contents before deallocation. soliton_decoded_session_init_free performs no zeroization, and none is needed.

SolitonDecodedSessionInit is large (4,672 bytes on LP64) — avoid stack allocation in constrained environments: This struct contains the full decoded fields of a SessionInit including ct_ik (1,120 bytes), ct_spk (1,120 bytes), ct_opk (1,120 bytes), sender_ek (1,216 bytes), and one SolitonBuf field (crypto_version, 16 bytes on LP64: ptr + len). The exact #[repr(C)] size on LP64 is 4,672 bytes: crypto_version(16) + sender_fp(32) + recipient_fp(32) + sender_ek(1216) + ct_ik(1120) + ct_spk(1120) + spk_id(4) + has_opk(1) + ct_opk(1120) + 3 bytes alignment padding + opk_id(4) + 4 bytes trailing struct padding to align to 8 bytes = 4,672. Binding authors doing manual struct layout (Go struct, C# StructLayout, Python ctypes.Structure) must include both the 3-byte padding before opk_id and the 4-byte trailing padding. On Go goroutines (initial stack 8 KiB, fragmented by other locals) and .NET async state machines (stack budget shared with awaiter frames), placing this struct on the stack risks non-deterministic stack overflow. Binding authors MUST heap-allocate this struct: C.malloc in Go, Marshal.AllocHGlobal in .NET, ctypes.create_string_buffer(ctypes.sizeof(...)) in Python, or equivalent. In Rust, the core library's SolitonDecodedSessionInit is behind a Box<>; C/Go/Python bindings must ensure the same. A binding that allocates this struct on the frame stack passes tests on machines with ample stack space but crashes non-deterministically in production under deep call chains.
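For manual layout, the size breakdown above translates into the following C mirror, with the 64-bit offsets checked at compile time. A sketch under stated assumptions: DecodedSessionInitMirror and SolitonBufMirror are illustrative names, the field order is the one used in the size breakdown, and the generated soliton.h remains the authority. The struct is allocated on the heap with calloc, never on the stack.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {                       /* ptr + len, 16 bytes on LP64 */
    uint8_t *ptr;
    size_t   len;
} SolitonBufMirror;

typedef struct {
    SolitonBufMirror crypto_version;   /* offset 0    */
    uint8_t  sender_fp[32];            /* offset 16   */
    uint8_t  recipient_fp[32];         /* offset 48   */
    uint8_t  sender_ek[1216];          /* offset 80   */
    uint8_t  ct_ik[1120];              /* offset 1296 */
    uint8_t  ct_spk[1120];             /* offset 2416 */
    uint32_t spk_id;                   /* offset 3536 */
    uint8_t  has_opk;                  /* offset 3540 */
    uint8_t  ct_opk[1120];             /* offset 3541 (byte array, 1-byte aligned) */
    /* 3 bytes implicit padding (4661-4663) */
    uint32_t opk_id;                   /* offset 4664 */
    /* 4 bytes trailing padding to a multiple of 8 */
} DecodedSessionInitMirror;

#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu /* 64-bit targets only */
_Static_assert(offsetof(DecodedSessionInitMirror, spk_id) == 3536, "spk_id");
_Static_assert(offsetof(DecodedSessionInitMirror, ct_opk) == 3541, "ct_opk");
_Static_assert(offsetof(DecodedSessionInitMirror, opk_id) == 4664, "opk_id");
_Static_assert(sizeof(DecodedSessionInitMirror) == 4672, "total size");
#endif

/* Heap allocation, zero-filled so has_opk and opk_id start at 0. */
static DecodedSessionInitMirror *alloc_decoded_session_init(void) {
    return calloc(1, sizeof(DecodedSessionInitMirror));
}
```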

SolitonDecodedSessionInit.crypto_version SolitonBuf length includes null terminator: The crypto_version field of SolitonDecodedSessionInit is a SolitonBuf whose len is 13 for the current version — 12 bytes for the string "lo-crypto-v1" plus one trailing null byte (\0). The null byte is included to make the buffer directly usable as a C string without an additional copy. Binding authors who read len and compare it to 12 (the character count of "lo-crypto-v1") will find len == 13 and may incorrectly conclude the version string is malformed. The correct validation pattern is: buf.len == 13 && buf.ptr[0..12] == b"lo-crypto-v1" && buf.ptr[12] == 0. Do NOT pass buf.len as the length of a cryptographic comparison (e.g., to a constant-time compare function) expecting 12 — the extra null byte would cause a mismatch against a 12-byte reference string.
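The validation pattern can be expressed as a single 13-byte comparison, because sizeof on a C string literal already counts the trailing NUL. A sketch; SolitonBufView is a hypothetical mirror of SolitonBuf. A plain memcmp is acceptable here because the version string is public.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t *ptr; size_t len; } SolitonBufView; /* hypothetical */

/* True only for "lo-crypto-v1" followed by its NUL byte (len == 13). */
static int crypto_version_is_v1(SolitonBufView v) {
    static const char expected[] = "lo-crypto-v1";   /* sizeof(expected) == 13 */
    if (v.ptr == NULL || v.len != sizeof expected) {
        return 0;
    }
    /* Compares bytes 0..11 (the characters) and byte 12 (the NUL) together. */
    return memcmp(v.ptr, expected, sizeof expected) == 0;
}
```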

SolitonRatchetHeader and SolitonEncryptedMessage layouts (flat value types — binding authors must lay out manually): These are #[repr(C)] flat structs returned from soliton_ratchet_encrypt and passed to soliton_ratchet_decrypt. Unlike the opaque handle types above, binding authors in Go, C#, and Python must lay out these structs explicitly.

SolitonRatchetHeader (40 bytes on LP64):

Offset	Size	Field	Description
0	16	`ratchet_pk`	`SolitonBuf` — sender's ratchet public key (library-allocated; ptr + len, each 8 bytes on LP64)
16	16	`kem_ct`	`SolitonBuf` — KEM ciphertext, if present; `ptr` is null and `len` is 0 if absent (same-epoch message)
32	4	`n`	`uint32_t` — message number within current send chain
36	4	`pn`	`uint32_t` — length of the previous send chain

SolitonEncryptedMessage (56 bytes on LP64):

Offset	Size	Field	Description
0	40	`header`	`SolitonRatchetHeader` (inline, not a pointer)
40	16	`ciphertext`	`SolitonBuf` — AEAD-encrypted message (library-allocated)
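The two tables map onto the following C mirrors; the static assertions fail the build if a binding's layout drifts from the documented 64-bit offsets. The type names are illustrative; the generated soliton.h defines the real ones.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {                  /* mirror of SolitonBuf: ptr + len */
    uint8_t *ptr;
    size_t   len;
} SolitonBufMirror;

typedef struct {
    SolitonBufMirror ratchet_pk;  /* offset 0  */
    SolitonBufMirror kem_ct;      /* offset 16; {NULL, 0} when absent */
    uint32_t n;                   /* offset 32 */
    uint32_t pn;                  /* offset 36 */
} RatchetHeaderMirror;

typedef struct {
    RatchetHeaderMirror header;   /* offset 0, inline (not a pointer) */
    SolitonBufMirror ciphertext;  /* offset 40 */
} EncryptedMessageMirror;

#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu   /* 64-bit targets only */
_Static_assert(offsetof(RatchetHeaderMirror, kem_ct) == 16, "kem_ct offset");
_Static_assert(offsetof(RatchetHeaderMirror, n) == 32, "n offset");
_Static_assert(offsetof(RatchetHeaderMirror, pn) == 36, "pn offset");
_Static_assert(sizeof(RatchetHeaderMirror) == 40, "header size");
_Static_assert(offsetof(EncryptedMessageMirror, ciphertext) == 40, "ciphertext offset");
_Static_assert(sizeof(EncryptedMessageMirror) == 56, "message size");
#endif
```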

kem_ct.ptr == NULL with kem_ct.len == 0 signals absence of a KEM ciphertext. Do NOT signal absence with a non-null pointer to zero-filled bytes; absence is expressed by the null pointer and zero length, not by the buffer contents. On success, soliton_ratchet_encrypt zeroes the entire SolitonEncryptedMessage output before writing, so an absent kem_ct is reported as a null pointer with zero length. On error, the output is also zeroed, which makes soliton_encrypted_message_free safe to call on error paths (it is a no-op on a zero-initialized struct). Binding authors MUST pass kem_ct.ptr as NULL and kem_ct.len as 0 to soliton_ratchet_decrypt when the header contains no KEM ciphertext.
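A binding can classify the kem_ct field from its pointer and length alone, never from the bytes it points to. A minimal sketch; the enum and function names are hypothetical, and treating a half-set buffer (NULL with a non-zero length, or a non-NULL pointer with length 0) as malformed is an assumption of this sketch rather than documented library behavior.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { KEM_CT_ABSENT, KEM_CT_PRESENT, KEM_CT_MALFORMED } kem_ct_state;

/* Presence is decided by (ptr, len); a zero-filled buffer is still "present". */
static kem_ct_state classify_kem_ct(const uint8_t *ptr, size_t len) {
    if (ptr == NULL && len == 0) return KEM_CT_ABSENT;
    if (ptr != NULL && len > 0)  return KEM_CT_PRESENT;
    return KEM_CT_MALFORMED;     /* assumption: half-set buffers are rejected */
}
```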

Type-tagging: Each opaque handle type embeds a 4-byte magic discriminant as its first field. The _free functions validate this discriminant before operating on the pointer. Passing a handle to the wrong free function (e.g., soliton_ratchet_free on a SolitonKeyRing*) is detected and returns InvalidData rather than corrupting memory. The discriminant values are internal and not part of the ABI.
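The discriminant check, combined with the outer/inner pointer rules listed for soliton_stream_decrypt_free, can be sketched as below. The magic values and the handle_sketch type are invented for illustration (the real discriminants are internal), and the ConcurrentAccess path is omitted. A tag check only catches a live handle of the wrong type; it cannot make a dangling pointer safe to pass.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SOLITON_OK             0
#define SOLITON_NULL_POINTER (-13)
#define SOLITON_INVALID_DATA (-17)

/* Hypothetical discriminants; the library's real values are not ABI. */
#define MAGIC_RATCHET 0x52415443u
#define MAGIC_KEYRING 0x4B455952u

typedef struct {
    uint32_t magic;        /* first field: type discriminant */
    uint8_t  secret[32];   /* stand-in for key material */
} handle_sketch;

static handle_sketch *handle_new(uint32_t magic) {
    handle_sketch *h = calloc(1, sizeof *h);
    if (h != NULL) h->magic = magic;
    return h;
}

/* NULL outer pointer -> NullPointer; NULL inner pointer -> 0 (no-op);
 * wrong discriminant -> InvalidData, handle left untouched;
 * otherwise zeroize, free, and null the caller's pointer. */
static int32_t handle_free_checked(handle_sketch **pp, uint32_t expected_magic) {
    if (pp == NULL) return SOLITON_NULL_POINTER;
    if (*pp == NULL) return SOLITON_OK;
    if ((*pp)->magic != expected_magic) return SOLITON_INVALID_DATA;
    volatile uint8_t *p = (volatile uint8_t *)*pp;
    for (size_t i = 0; i < sizeof **pp; i++) p[i] = 0;
    free(*pp);
    *pp = NULL;
    return SOLITON_OK;
}
```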

soliton.h OWNERSHIP comment says cross-type free is "undefined behavior" — Specification.md is normative: The generated soliton.h header may carry an OWNERSHIP comment stating that passing the wrong handle type to a _free function is "undefined behavior." This contradicts §13.6's normative claim that the type discriminant check catches this and returns InvalidData. Specification.md is normative; the header comment is documentation only. Binding authors reading soliton.h who see "undefined behavior" and add their own UB-protection wrappers (null-checking the outer pointer, refusing to call _free when uncertain of the handle type) may inadvertently mask the InvalidData return code. The correct model: the discriminant check is implemented; cross-type free returns InvalidData (-17); no memory is corrupted; binding wrappers should propagate InvalidData as a type-mismatch error, not treat cross-type free as safe to elide.

Pointer aliasing: Opaque handles must not be aliased. Copying a handle pointer via memcpy and then using both copies produces undefined behavior — specifically, two encrypt calls on aliased SolitonRatchet handles will use the same nonce, causing catastrophic AEAD nonce reuse. The CAPI does not enforce single-ownership at the API level; this is a caller obligation. If a binding language needs to share a handle across threads, it must serialize access (e.g., mutex).

14. Security Analysis

14.1 Compromised Community Server

Impact: Reads group plaintext, observes connected users, could modify or inject messages, present fake keys. Mitigations: Group chat visibility accepted by design (§11 — community storage is channel-keyed, not user-keyed). DMs E2E encrypted (§5-§6). Fake key presentation mitigated by verification phrases (§9) + key pinning + key change warnings.

14.2 Compromised DM Relay

Impact: Metadata (sender/recipient/timing), stored ciphertext, could substitute pre-keys. Mitigations: Content E2E encrypted (§5-§6). Pre-key substitution → hybrid signature verification fails (requires breaking both Ed25519 and ML-DSA; §3.2, §5.3).

14.3 Harvest-Now-Decrypt-Later

Impact: Recorded ciphertext held for future quantum computer. Mitigations: X-Wing ML-KEM-768 protects session keys (§8). ML-DSA-65 protects signature integrity (§3).

What a CRQC breaking X25519 alone cannot do: X-Wing combines X25519 and ML-KEM-768 with a SHA3-256 combiner (§8). Breaking X25519 yields ss_X but not ss_M. The X-Wing shared secret is SHA3-256(ss_M || ss_X || ct_X || pk_X || label), so an attacker who knows ss_X but not ss_M cannot recover it. A cryptographically relevant quantum computer (CRQC) running Shor's algorithm against X25519 therefore gains nothing unless ML-KEM-768 is also broken. The harvest-now-decrypt-later threat is neutralized for session keys as long as ML-KEM-768 remains secure. What is at risk: pre-key bundle signatures (if ML-DSA-65 is broken) and the X25519 component of initial session key material, which contributes to the hybrid combiner's IND-CCA2 security claim but not to security when ML-KEM-768 is intact.

14.4 Identity Key Compromise

Can: Impersonate user, sign fake pre-keys (§5.3), authenticate as user (§4). Cannot: Recover past or current session keys with IK alone (this also requires the SPK private key; see §5.6). Decrypt ongoing session traffic (this needs the ratchet keys; §6). Impersonate others to the compromised user. Recovery: New identity keypair; contacts re-verify phrases (§9).

IK + SPK capability window is bounded by the SPK retention policy: A combined IK-and-SPK compromise recovers session keys only while sk_SPK is retained. SPKs are rotated every 7 days and the private key is deleted 30 days after rotation (§10.2, Appendix B). After sk_SPK deletion, even combined IK + SPK capability cannot recover that session's key — ss_spk is no longer computable. The attacker's window is at most 37 days from SPK generation (7-day rotation interval + 30-day retention window). Sessions established more than 37 days before the compromise with no SPK re-use are retrospectively safe.
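The 37-day bound is simple date arithmetic. A sketch, assuming on-schedule rotation (no SPK re-use), day-granularity timestamps, and that sk_SPK is gone from the start of day generation + 37; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPK_ROTATION_DAYS   7   /* SPK rotation interval */
#define SPK_RETENTION_DAYS 30   /* sk_SPK kept this long after rotation */

/* Day on which sk_SPK is deleted, counted from the SPK's generation day. */
static int64_t spk_sk_deletion_day(int64_t generation_day) {
    return generation_day + SPK_ROTATION_DAYS + SPK_RETENTION_DAYS;
}

/* Can an IK + SPK compromise on compromise_day still use this SPK's secret? */
static bool spk_sk_exposed(int64_t generation_day, int64_t compromise_day) {
    return compromise_day < spk_sk_deletion_day(generation_day);
}

/* Sessions using this SPK start on or after generation_day, so a gap of more
 * than 37 days between session setup and compromise implies sk_SPK is gone. */
static bool session_retro_safe(int64_t session_day, int64_t compromise_day) {
    return compromise_day - session_day > SPK_ROTATION_DAYS + SPK_RETENTION_DAYS;
}
```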

14.5 First Contact (TOFU)

On first contact without prior keys, mutual auth not guaranteed. Same as Signal/SSH. See §5.6.

Verification phrase birthday resistance (~2^45): Verification phrases (§9) provide a partial mitigation for TOFU key substitution, but their birthday resistance is limited to approximately 2^45 SHA3-256 operations (§9.4). A well-resourced attacker who controls key generation at scale can generate ~2^45 identity keys and, by the birthday paradox, find two that produce the same verification phrase when paired with a given victim key — substituting the colliding key passes the out-of-band check. For most threat models this is out of reach, but the limitation is relevant for environments with state-level adversaries. Applications with high-threat requirements SHOULD supplement verification phrase comparison with full 32-byte fingerprint comparison (64 lowercase hex characters, §2.1), which provides ~256-bit second-preimage resistance against key substitution. See §9.4 for the full collision analysis.
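Full-fingerprint comparison needs the same canonical rendering on both devices. A sketch of the 64-character lowercase hex form; fingerprint_to_hex is a hypothetical helper name, and the caller supplies a 65-byte buffer for the NUL-terminated result.

```c
#include <stddef.h>
#include <stdint.h>

/* Render a 32-byte fingerprint as 64 lowercase hex characters plus NUL. */
static void fingerprint_to_hex(const uint8_t fp[32], char out[65]) {
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < 32; i++) {
        out[2 * i]     = digits[fp[i] >> 4];     /* high nibble first */
        out[2 * i + 1] = digits[fp[i] & 0x0F];   /* then low nibble */
    }
    out[64] = '\0';
}
```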

14.6 Ratchet State Desynchronization

Counter-mode derivation eliminates stateful chain advancement, the primary historical source of desynchronization. Session reset (§6.10) recovers at the cost of in-flight messages.

14.7 Header Tampering

All header fields bound into AEAD AAD (§7.3-7.4). Tampering → AEAD failure. Prevents state poisoning.

14.8 Storage Blob Relocation

Channel and segment IDs in storage AAD (§11.4). Blobs cannot be moved.

14.8a Ratchet State Blob Substitution

Impact: An attacker with write access to persisted ratchet state blobs can substitute an older blob (replay attack) or a blob from a different session (session confusion), potentially recovering old messages or inducing key reuse.

Substitution of an older blob: Reloading a stale ratchet blob rolls back send_count, causing nonce reuse: the next encrypted message reuses a counter that was already used in the current epoch, producing a ciphertext under the same (key, nonce) pair as a previously sent message. AEAD nonce reuse under the same key reveals the XOR of the two plaintexts, a catastrophic confidentiality failure; for XChaCha20-Poly1305 it also allows recovery of the one-time Poly1305 key for that nonce, enabling forgeries. Mitigation: per §6.8 Caller Obligation 2, callers MUST store the last-known epoch (new_epoch - 1) and pass it to from_bytes_with_min_epoch on reload. Any blob with epoch ≤ min_epoch is rejected with InvalidData. Callers who use from_bytes (no min_epoch) instead of from_bytes_with_min_epoch, or who store the min_epoch value in the same write-accessible store as the blob, have no protection against blob rollback.

Substitution of a different session's blob: A blob from a different session fails immediately at AAD reconstruction — the sender_fp and recipient_fp embedded in the ratchet state's AAD scheme (§6.8) will not match the expected values for this session, causing AeadFailed before any ratchet state is loaded. No cross-session confusion is possible without breaking the ratchet AEAD.

Countermeasures are documented in §6.8 (anti-rollback epoch guard, Caller Obligation 2) but are not automatically enforced — they require explicit caller action. Application authors and binding authors MUST implement the epoch-store pattern. See §6.8 for the full caller obligation list.
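The epoch-store pattern can be sketched with a mock blob. Everything here is hypothetical: the blob layout (a big-endian epoch in the first 8 bytes), the types, and the function names stand in for soliton's from_bytes_with_min_epoch and its InvalidData error, and exist only to show the order of operations and where the counter must live.

```rust
/// Hypothetical error mirroring the InvalidData rejection.
#[derive(Debug, PartialEq)]
enum LoadError {
    InvalidData,
}

/// Stand-in for a deserialized ratchet state; only the epoch matters here.
#[derive(Debug)]
struct MockRatchetState {
    epoch: u64,
}

/// Mock loader with the anti-rollback rule: reject any blob whose epoch <= min_epoch.
/// HYPOTHETICAL layout: the first 8 bytes of the blob are a big-endian epoch.
fn load_with_min_epoch(blob: &[u8], min_epoch: u64) -> Result<MockRatchetState, LoadError> {
    if blob.len() < 8 {
        return Err(LoadError::InvalidData);
    }
    let mut head = [0u8; 8];
    head.copy_from_slice(&blob[..8]);
    let epoch = u64::from_be_bytes(head);
    if epoch <= min_epoch {
        return Err(LoadError::InvalidData); // stale or replayed blob
    }
    Ok(MockRatchetState { epoch })
}

/// Anti-rollback counter. In a real deployment this must live in storage the
/// blob attacker cannot write (not next to the blob itself).
struct EpochStore {
    min_epoch: u64,
}

impl EpochStore {
    /// Record a successful save that produced `new_epoch`.
    fn record_save(&mut self, new_epoch: u64) {
        self.min_epoch = new_epoch.saturating_sub(1);
    }
}
```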

14.9 Pre-Key Exhaustion

Per-source-per-target rate limiting. Sessions without an OPK remain secure, with reduced initial forward secrecy (FS).

Reduced initial FS, concrete window: When no OPK is used, the session's initial forward secrecy is bounded by the SPK's lifetime. SPKs are rotated weekly and retained for 30 days after rotation (Appendix B). Therefore, for an OPK-absent session, an attacker who later obtains the SPK private key (before it is deleted 30 days after rotation) can recover the session's initial shared secrets. Measured from session initiation, the forward secrecy window is up to 37 days (about 30 days for a session initiated just before the SPK rotated, 37 days for one initiated right after the SPK was generated; see §14.4), not the one-time, delete-on-use guarantee that an OPK provides. For OPK-present sessions, deleting sk_OPK immediately after receive_session terminates the forward secrecy vulnerability window at that point, independent of SPK lifetime. Implementers calibrating OPK replenishment thresholds and the rate-limiting policy should note that OPK exhaustion degrades forward secrecy from "delete-on-use" to "bounded by SPK retention (at most 37 days)," not to "no forward secrecy": the SPK still provides forward secrecy after its private key is deleted.

14.10 Metadata

Relay knows sender/recipient/timing. No IP logging. Tor/VPN for elevated threats. DM padding is mandatory (Protocol §15.1); community padding is optional (Protocol §15.2).

14.11 Forced Session Reset

What an adversary gains from a forced reset: Denial of service — the session's in-flight messages become permanently undecryptable (the ratchet state is zeroized) and both parties must establish a new session via LO-KEX. The adversary learns nothing new: the post-reset state is all-zeros with no key material remaining.

Forward secrecy after reset: Reset does NOT provide retroactive forward secrecy for pre-reset messages. Messages encrypted before the reset remain at risk if the pre-reset epoch was already compromised. Reset terminates an active session; it does not erase the adversary's copy of previously captured ciphertext.

How an attacker forces a reset: Forcing a reset requires the attacker to produce a ratchet state inconsistency that triggers §6.9 recommendation 4 (unrecoverable decryption failure → call reset()). An attacker who can inject malformed ciphertexts can trigger repeated resets, denying service (each reset permanently loses all in-flight messages). This is no worse than the baseline capability of dropping messages, since message suppression already prevents delivery, but repeated resets additionally force LO-KEX re-establishment overhead.

Mutual reset prerequisite: A reset by one party does not automatically reset the other's state. For the conversation to resume, both parties must independently detect the desynchronization (e.g., via application-layer re-key request) and perform new LO-KEX exchanges. An asymmetric reset — one party resets, the other does not — produces permanent desynchronization with no error distinguishable from transport loss.

14.12 KEM Ratchet — Single-Sided Randomness

LO-Ratchet's KEM ratchet differs from the Double Ratchet's DH ratchet in a fundamental way: only the encapsulator (new sender) contributes fresh randomness per step. In a DH ratchet, both parties contribute private key material to the shared secret. In a KEM ratchet, the decapsulator's contribution is their existing ratchet public key from a previous step.

Implication: If the encapsulator's RNG is compromised during a KEM ratchet step, that step does not advance forward secrecy. However, an RNG failure is catastrophic regardless — ephemeral keys, nonces, and all security-critical random values generated during the failure window are equally compromised. The ratchet's single-sided randomness is the least of the problems. Mitigating it would require bidirectional KEM per ratchet step (mandatory round-trip, doubled ciphertext) — costs that address a scenario already catastrophic for independent reasons.

This property is inherent to all KEM-based ratchets, not specific to LO-Ratchet. Mitigation: use the OS CSPRNG exclusively (getrandom).

14.13 Header Size Side Channel

KEM-ratchet-step headers are observably larger than same-chain headers. A ratchet header that includes a KEM ciphertext (has_kem_ct = 0x01) encodes to exactly 2,347 bytes on the wire (§7.4, Appendix C). A same-chain header (no KEM ciphertext, has_kem_ct = 0x00) encodes to exactly 1,225 bytes. The exact 1,122-byte difference is directly observable by any passive network adversary, regardless of transport encryption (the header is inside the encrypted channel, but the message size is observable as a traffic feature).

What this leaks: A passive adversary observing message sizes can infer when a party changes send direction (the encapsulating party's header grows by exactly 1,122 bytes). In a typical DM exchange, this reveals the alternating communication pattern, i.e. who initiated each new "round" of the conversation. It does not reveal message content or the timing of individual messages within a round, but it does reveal the coarse structure of who speaks next after a silence.

Normative position: This is accepted leakage. LO's threat model (§14.10) acknowledges that relay operators observe communication metadata (sender, recipient, timing). Header size is a metadata feature visible at the same layer as message timing and count. Mitigating it would require padding all headers to a fixed size (2,347 bytes), adding 1,122 bytes to every same-chain header (1,225 → 2,347 bytes) and roughly doubling header overhead for typical high-frequency exchanges. The security benefit (hiding direction changes) is low: direction changes are correlated with reply events, which are already inferable from timing alone. Implementations MUST NOT treat this as a bug; this is a documented, accepted property.
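The leak and the rejected mitigation reduce to simple arithmetic on the two header sizes stated above. A minimal sketch; the function names are illustrative, and the classifier assumes an observer that can isolate header length from total message size, which transport padding may prevent.

```rust
/// Encoded LO-Ratchet header sizes stated in this section.
const HEADER_WITH_KEM_CT: usize = 2_347; // has_kem_ct = 0x01 (KEM ratchet step)
const HEADER_SAME_CHAIN: usize = 1_225; // has_kem_ct = 0x00

/// What a passive observer infers from a header length alone.
fn looks_like_kem_step(header_len: usize) -> bool {
    header_len == HEADER_WITH_KEM_CT
}

/// Extra bytes the rejected mitigation (pad every header to the
/// KEM-step size) would add to a header of the given length.
fn fixed_size_padding_overhead(header_len: usize) -> usize {
    HEADER_WITH_KEM_CT.saturating_sub(header_len)
}
```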

Transport-layer mitigation (optional): Transports that pad traffic to fixed-size cells (e.g., QUIC with datagram padding, LO's Protocol §15.1 DM padding) may partially obscure this difference. This is a transport-layer concern, not a cryptographic one.

14.14 Version Downgrade Policy

LO uses a hard-fail version policy. verify_bundle rejects any crypto_version other than the currently supported version ("lo-crypto-v1"). There is no version negotiation, no silent fallback, and no "choose best supported" logic. An unrecognized version is treated as a malformed bundle — the session is aborted and the user is warned.

This eliminates downgrade attacks by design: an attacker who modifies crypto_version in a relayed bundle causes rejection, not degraded security. The crypto_version field is not signed (§5.3), but tampering with it produces the same outcome as dropping the bundle entirely — the attacker's best outcome is message suppression (denial of service), not weakened cryptography, which is no worse than the Dolev-Yao baseline capability of dropping messages.
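The hard-fail policy amounts to an exact-match gate with no fallback branch. A minimal sketch; the error type and function name are hypothetical and do not reflect verify_bundle's actual signature.

```rust
/// Hypothetical error: an unknown crypto_version is a malformed bundle.
#[derive(Debug, PartialEq)]
enum BundleError {
    UnsupportedVersion,
}

/// The single version accepted outside a migration window.
const SUPPORTED_CRYPTO_VERSION: &str = "lo-crypto-v1";

/// Hard-fail version gate: exact match or rejection. There is deliberately
/// no "closest match", prefix match, case folding, or fallback branch.
fn check_crypto_version(version: &str) -> Result<(), BundleError> {
    if version == SUPPORTED_CRYPTO_VERSION {
        Ok(())
    } else {
        Err(BundleError::UnsupportedVersion)
    }
}
```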

Migration window downgrade risk (forward-looking — no v2 is currently defined; these are design requirements for a future migration window): The above guarantee holds only when a single version is in operation. During a v1→v2 migration window (where both versions are accepted), the crypto_version field is not signed and a network attacker relaying a bundle can substitute "lo-crypto-v2" with "lo-crypto-v1" — causing a v2-capable initiator connecting to a v2 peer to silently negotiate v1 instead. Unlike the single-version case, this substitution does NOT cause rejection (v1 is still accepted during the window), so the attacker's outcome is not message suppression but downgrade. Requirement for v2 deployment: The v2 pre-key bundle MUST sign the crypto_version field to prevent this substitution. This is a known gap in the current single-version design (§5.3 explicitly excludes crypto_version from the SPK signature); it MUST be corrected before deploying a migration window. Alternatively, bundle integrity can be protected at the transport layer (e.g., QUIC with server-authenticated certificate + pinning), ensuring that relay substitution is detectable. The v2 pre-key bundle format — including the signing message structure, wire layout, and migration mechanism — is out of scope for this specification. Implementers MUST NOT attempt to deploy a v2 migration window based solely on this note. The v2 format will be defined in a future version of this spec; v1 and any future v2 are disjoint wire formats with no backward-compatible relationship defined here.

(The following describes the intended migration mechanism — no v2 protocol is currently defined; no negotiation infrastructure need be built until v2 is specified.) Future version transitions (e.g., lo-crypto-v2) will support exactly two versions during a migration window. The older version will be removed in a subsequent release. At no point will more than two versions be accepted concurrently. Migration mechanism: The initiator reads crypto_version from the recipient's pre-key bundle and uses the highest mutually supported version. There is no separate negotiation handshake — version selection is implicit in the bundle. During a v1→v2 migration window, a v2-capable initiator connecting to a v1 peer (whose bundle advertises "lo-crypto-v1") uses v1. A v2-capable initiator connecting to a v2 peer uses v2. A v1-only initiator connecting to a v2-only peer fails at verify_bundle (unrecognized version). The recipient's bundle is the sole version signal; the initiator MUST NOT exceed the recipient's advertised version.
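The implicit selection rule for a future two-version window could look like the sketch below. It is hypothetical (no v2 exists), and it is only safe once crypto_version is authenticated, since otherwise a relay can rewrite the advertised value as described above.

```rust
/// Picks the session version from the recipient's bundle during a
/// hypothetical two-version migration window. The bundle advertises exactly
/// one version; the initiator uses it if supported locally and never exceeds it.
/// There is no fallback: an unsupported advertised version yields None,
/// which the caller must treat as a hard failure.
fn select_version<'a>(local_supported: &[&'a str], advertised: &str) -> Option<&'a str> {
    local_supported.iter().copied().find(|v| *v == advertised)
}
```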

Dual role of crypto_version: The "lo-crypto-v1" string serves two independent purposes: (1) a wire field in pre-key bundles and session init, triggering hard-fail version rejection in verify_bundle and decode_session_init; and (2) a KDF domain separator embedded in the HKDF info (§5.4 Step 4), binding the session key derivation to the protocol version. A v2 migration requires changing both, for different reasons — the wire field for compatibility gating, the KDF label for cryptographic domain separation. Changing only the wire field would produce a version that hard-fails bundle verification but would derive the same session keys as v1 if it somehow bypassed the check. Changing only the KDF label would silently produce incompatible keys while the wire field still accepts the old version. Both must change atomically.

KDF label mismatch is undetectable at session establishment. When a version mismatch causes the initiator and responder to derive session keys using different crypto_version strings in the HKDF info, receive_session succeeds and init_alice/init_bob initialize without error — the divergent keys are not compared. The first observable symptom is AeadFailed at decrypt_first_message, with no diagnostic distinguishing "mismatched crypto_version in KDF" from "corrupted ciphertext" or "wrong session keys." An implementer of the migration window who mismatches the KDF label while matching the wire field will see what appears to be random AEAD failures with no obvious cause. The safe verification: after receive_session, compare the negotiated crypto_version from the SessionInit against both parties' KDF labels before proceeding.

Independent versioning axes: crypto_version (e.g., "lo-crypto-v1") governs session establishment (§5) — it determines KEM algorithms, HKDF labels, and wire formats for the key exchange. The ratchet blob version byte (§6.8, currently 0x01) governs ratchet state serialization and is independent. A new optional field in the ratchet blob increments only the blob version, not the protocol version. A lo-crypto-v2 transition would require, at minimum, a new crypto_version string and updated HKDF labels; the blob version can remain 0x01 if the ratchet format is unchanged. Similarly, the streaming AEAD header version (§15.2, currently 0x01) and storage blob format are separate versioning axes.

14.15 Non-Deniability

LO-KEX and LO-Ratchet do not provide deniability. Any message ciphertext and ratchet header can be cryptographically attributed to the sender: the sender_sig in §5.4 Step 6 is a hybrid signature (Ed25519 + ML-DSA-65) binding Alice's long-term identity key to the session init, and the AAD scheme (§6.5) binds each ratchet message to both parties' fingerprints. An adversary who obtains a session transcript can verify that the session was established with Alice's identity key. This is intentional — LO's threat model prioritizes verifiable authenticated channels over offline deniability (Signal's approach). Applications requiring offline deniability (e.g., protection against coerced evidence disclosure) MUST use an additional repudiability layer — such as omitting long-term signatures from stored transcripts, using per-session ephemeral signing keys without long-term key binding, or employing a deniable symmetric-key scheme for message bodies — and should not rely on soliton alone.

The §5.6 brief paragraph on deniability refers specifically to the short-term deniability window provided by the ephemeral KEM ciphertext: during session establishment, an observer who does not hold Alice's identity key cannot confirm authorship of the session init ciphertext (since Alice's EK_sk is ephemeral). However, once sender_sig is verified and the session is established, deniability is lost — the long-term identity binding is irrevocable.

14.16 Streaming AEAD and Ratchet Key Exposure

Per-stream random keys (§15.1) limit the blast radius of a ratchet epoch compromise. An adversary who compromises a ratchet epoch key can recover stream keys that transited that epoch — the stream key is transmitted inside a ratchet-encrypted message alongside stream metadata, so the epoch key decrypts the message and recovers the stream key. However, only streams whose keys were transmitted during the compromised epoch are affected; streams whose keys transited different epochs remain protected.

Batching multiple stream keys in a single ratchet message multiplies exposure — a single epoch compromise recovers all batched keys. The recommended pattern: one stream key per ratchet message. For streams spanning many chunks (large file transfers), the long-lived stream key's exposure window equals the ratchet epoch during which it was transmitted, regardless of the stream's total duration or chunk count.

Random-access-only callers of decrypt_chunk_at have no replay protection: Applications that use decrypt_chunk_at exclusively — never the sequential decrypt_chunk — have zero anti-replay protection. In sequential mode, presenting a previously-decrypted chunk at the same index incidentally fails because next_index has already advanced past that position (AEAD runs against the wrong index-derived nonce). In random-access mode, presenting the same (index, chunk) pair a second time succeeds identically — decrypt_chunk_at is stateless and has no memory of prior decryptions. The only cryptographic freshness binding is the per-stream CSPRNG-unique key (§15.1); within a single stream, any valid (key, index, chunk) triple is always decryptable. Applications building file delivery, random-access video streaming, or any repeat-query-capable API on top of decrypt_chunk_at MUST track successfully-decrypted indices at the application layer or arrange for single-use key material (§15.1). This threat does not require breaking the ratchet — it only requires access to the (key, chunk) material already held by the application. See §15.12 "Chunk replay" for the behavioral details.
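One way to meet the index-tracking obligation is a thin wrapper that refuses to decrypt an index twice. A minimal sketch: the decrypt_at closure stands in for decrypt_chunk_at, whose real signature is not reproduced here, and the wrapper and error names are hypothetical. The index is recorded only after AEAD succeeds, so a forged chunk cannot burn an index and block the genuine one.

```rust
use std::collections::HashSet;

/// Hypothetical error type for the wrapper.
#[derive(Debug, PartialEq)]
enum ChunkError {
    Replayed,
    AeadFailed,
}

/// Application-layer replay guard for one stream (one key).
struct ReplayGuard {
    seen: HashSet<u64>,
}

impl ReplayGuard {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Decrypts `chunk` at `index` at most once per guard.
    fn decrypt_once<F>(&mut self, index: u64, chunk: &[u8], decrypt_at: F) -> Result<Vec<u8>, ChunkError>
    where
        F: FnOnce(u64, &[u8]) -> Result<Vec<u8>, ChunkError>,
    {
        if self.seen.contains(&index) {
            return Err(ChunkError::Replayed);
        }
        // Only mark the index as used once authentication has succeeded.
        let plaintext = decrypt_at(index, chunk)?;
        self.seen.insert(index);
        Ok(plaintext)
    }
}
```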

Out-of-band key delivery (distinct threat from epoch compromise): If a stream key is delivered via an unencrypted or weakly-authenticated channel — for example, via plaintext HTTP, an unauthenticated metadata API, or a push notification service with no end-to-end encryption — the stream is protected only by transport security on the key delivery path, not by the ratchet. An adversary who intercepts the key delivery (e.g., via MITM on the key delivery channel, server compromise, or push notification interception) can decrypt all stream chunks even if the ratchet session itself is fully uncompromised. This is a distinct threat from the ratchet epoch compromise scenario above: epoch compromise requires breaking the ratchet's cryptographic properties; out-of-band key exposure requires only access to the unprotected delivery channel. The mitigation is the same pattern specified in §15.1: always deliver stream keys inside ratchet-encrypted messages, never via separate channels. Any deviation from this pattern removes the ratchet's protection for the affected streams.

Streaming AEAD format version bumps use the version byte, not a new label: The streaming header version byte (0x01) is included in every per-chunk AAD (§15.4), which provides cryptographic domain separation between format versions. A v2 streaming format MUST increment the version byte (a v1 reader sees 0x02 and returns UnsupportedVersion, or negotiates accordingly). Adding a new label string (e.g., "lo-stream-v2") would be redundant, since the version byte in AAD already provides the domain separation. Conversely, a format change that does NOT increment the version byte but uses a different label produces ciphertexts that are indistinguishable at the header level from v1, causing opaque AEAD failures rather than clean UnsupportedVersion errors.

14.17 Post-Compromise Security Healing Boundary

LO-Ratchet provides post-compromise security (PCS) — recovery of message confidentiality after a transient key compromise, without requiring a new session. The healing mechanics and boundary conditions are specified in §6.13. Key points for formal models:

Healing event (initial step): PCS healing begins at the KEM decapsulation step on the previously-compromised party — specifically, when the compromised party receives and successfully decapsulates a message containing a new ratchet_pk from the uncompromised peer. After this step, new-epoch messages are immediately protected by a recv_epoch_key unknown to the attacker. The uncompromised party's encapsulation step (which sends the new KEM ciphertext) does not itself heal the compromised party. Full healing requires a second KEM ratchet step — after the first decapsulation, prev_recv_epoch_key still holds the compromised epoch key (now as the previous-epoch backup). An attacker with the compromised key can still decrypt late-arriving previous-epoch messages until the second KEM ratchet step discards prev_recv_epoch_key. See §6.13 for FullyHealed(session, t₄) — the formal two-step healing definition.
Why decapsulation is the boundary: Before decapsulation, the compromised party still holds old epoch keys derivable from known state. After decapsulation, the new root_key and send_epoch_key derive from a KEM shared secret that was never exposed — the attacker who held the old epoch key cannot reproduce the new epoch keys. The healing epoch therefore begins at the decapsulation event on the previously-compromised side.
One-directional streams do not heal: If the compromised party never receives a message from the uncompromised peer (and therefore never decapsulates a new KEM ciphertext), no KEM ratchet step occurs and PCS healing never happens. See §6.13 for the full list of PCS boundary conditions and exclusions.
Known weakening — prev_recv_epoch_key survives the first KEM ratchet step: Corrupt(RatchetState) at any point after the first KEM ratchet step but before the second still exposes prev_recv_epoch_key — the previous epoch's key is retained for one grace period (§6.6). An attacker who compromises the session state between the first and second KEM ratchet steps can decrypt all messages from the prior epoch using prev_recv_epoch_key, even though new-epoch messages (protected by the freshly-derived recv_epoch_key) are safe. This corresponds to Abstract.md Theorem 4 / Lemma 4b: FullyHealed(session, t) requires t to be after the second KEM ratchet step, not the first. This is a deliberate design tradeoff — retaining prev_recv_epoch_key for one step enables decryption of late-arriving previous-epoch messages without storing per-message keys. The two_kem_ratchets_expire_old_epoch integration test is the empirical evidence that this two-step behavior is intentional and tested, not an oversight. Formal models of the PCS property MUST use the FullyHealed predicate from §6.13 (which captures the two-step boundary), not a simpler "healed after one ratchet step" approximation.
Formal modelers: A PCS lemma derived from §14 without consulting §6.13 risks placing the healing event at the wrong point in the protocol transcript. The normative PCS specification is §6.13; §14 provides the threat-model framing only.
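The two-step boundary can be illustrated with a toy model of the receive-side epoch keys, where keys are opaque labels and no cryptography is performed. The struct and method names are hypothetical; the field names follow the text. It shows why one KEM ratchet step is not enough: after the first step the compromised key survives as prev_recv_epoch_key, and only the second step removes it.

```rust
/// Toy model of the receive-side epoch keys relevant to PCS healing.
/// Keys are opaque u64 labels; no cryptography is performed.
#[derive(Clone, Debug)]
struct RecvKeys {
    recv_epoch_key: u64,
    prev_recv_epoch_key: Option<u64>,
}

impl RecvKeys {
    /// One KEM ratchet step on the receive side: the current key becomes the
    /// previous-epoch backup and a fresh key (unknown to the attacker) is installed.
    fn kem_ratchet_step(&mut self, fresh_key: u64) {
        self.prev_recv_epoch_key = Some(self.recv_epoch_key);
        self.recv_epoch_key = fresh_key;
    }

    /// Whether a snapshot of this state still contains a key the attacker knows.
    fn exposes(&self, compromised_key: u64) -> bool {
        self.recv_epoch_key == compromised_key || self.prev_recv_epoch_key == Some(compromised_key)
    }
}
```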

14.18 New-Epoch Path as Unauthenticated KEM Decapsulation Oracle

Every incoming ratchet message whose header.ratchet_pk does not match recv_ratchet_pk or prev_recv_ratchet_pk takes the new-epoch path and triggers a full X-Wing decapsulation (dominated by ML-KEM-768). ML-KEM implicit rejection (§8.4) means this operation never returns an error for invalid inputs — a mismatched or maliciously crafted ratchet_pk produces a pseudorandom shared secret, which derives a wrong recv_epoch_key, which causes AEAD failure, which triggers snapshot rollback. The session is unharmed, but the decapsulation was performed unconditionally.

Performance context: In the pure-Rust implementation on modern 64-bit hardware, a full new-epoch path execution (X-Wing decapsulation + KDF + AEAD failure + rollback) completes well under 10 µs per message. This was measured by the soliton fuzzer sustaining over 190,000 executions per second per core (about 5.3 µs per execution). At this rate, a sustained injection of 190,000 crafted messages per second consumes at most one CPU core, no more than any other high-throughput CPU-bound workload. On 64-bit hardware this is not a meaningful denial-of-service vector.

Accepted tradeoff: The epoch routing decision (§6.6) relies solely on comparing header.ratchet_pk — a cleartext field — against stored public keys. No authentication occurs before decapsulation. Deferring decapsulation to post-authentication would require knowing the correct epoch key before AEAD runs, creating a circular dependency. This means the KEM decapsulation step is inherently unauthenticated. The cleartext ratchet_pk field already reveals epoch transitions to any observer, so the timing of the new-epoch path is not a novel information leak — it is observable from the public key value alone.
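The routing decision can be sketched as a pure comparison on cleartext public keys. This is a hypothetical illustration, not soliton's internal code; it makes visible that anything not matching a stored key falls through to the new-epoch path, and therefore to decapsulation, before any AEAD check runs.

```rust
/// Outcome of the cleartext epoch-routing decision.
#[derive(Debug, PartialEq)]
enum EpochRoute {
    Current,
    Previous,
    /// Leads to an unconditional X-Wing decapsulation before any AEAD check.
    NewEpoch,
}

/// Routes an incoming header purely by comparing its cleartext ratchet_pk
/// against stored public keys. No authentication happens at this point.
fn route_epoch(header_pk: &[u8], recv_pk: &[u8], prev_recv_pk: Option<&[u8]>) -> EpochRoute {
    if header_pk == recv_pk {
        EpochRoute::Current
    } else if prev_recv_pk == Some(header_pk) {
        EpochRoute::Previous
    } else {
        EpochRoute::NewEpoch
    }
}
```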

Residual concern: The performance characterization above applies to modern 64-bit hardware. Deployments on severely resource-constrained targets (e.g., hobby-grade 32-bit microcontrollers) where ML-KEM-768 decapsulation is orders of magnitude slower should evaluate whether transport-layer sender authentication is appropriate before messages reach the ratchet layer. soliton does not target 32-bit platforms and offers no performance guarantees for them.

§15 Streaming AEAD

Chunked authenticated encryption for large payloads (file transfer, attachments). Enables disk-to-disk encryption in fixed-size chunks without holding the full payload in memory. Inspired by the STREAM construction (Hoang, Reyhanitabar, Rogaway, Vizár, 2015) but uses counter-based nonce derivation for random-access decryption rather than ciphertext chaining.

15.1 Construction

Each stream uses a single caller-provided 32-byte key and a random 24-byte base nonce (generated from the OS CSPRNG). The key MUST be freshly generated from the OS CSPRNG for each stream — reusing a key across streams is catastrophic (see §15.12). The key is not managed by the library — key wrapping is the caller's responsibility (the standard pattern: generate a random 32-byte key, encrypt the stream, then encrypt the key in a ratchet message alongside the stream metadata). Deriving the streaming key deterministically from ratchet material is unsafe: ratchet compromise would propagate to all streaming keys derived from the compromised epoch, defeating the per-stream isolation that fresh randomness provides. Plaintext is split into 1 MiB chunks, each independently encrypted with XChaCha20-Poly1305 using a per-chunk nonce derived from the base nonce and chunk index.

Security model: The stream header (including base_nonce) is not secret — an adversary who knows the header and all ciphertexts but not the key cannot decrypt any chunk. The key is the sole secret; it is not contained in or recoverable from the header. Losing the key makes the stream permanently undecryptable.

No KDF step: The caller-provided key is used directly as the XChaCha20-Poly1305 key — there is no HKDF or other derivation step between the input key and the AEAD key. No KDF is needed because the caller is required to supply a fresh 256-bit CSPRNG key (§15.1 "key MUST be freshly generated from the OS CSPRNG"): a uniformly distributed 256-bit value already saturates XChaCha20-Poly1305's key entropy, so HKDF's extract phase adds no security benefit. A reimplementer who adds a KDF step (e.g., HKDF-SHA3-256(key, base_nonce, "stream")) produces incompatible ciphertext that the reference implementation cannot decrypt.

All-zero key policy: Unlike the storage keyring (§11.6), which explicitly rejects all-zero keys via constant-time check, the streaming layer does not validate that the caller-provided key is non-zero. This is a caller obligation. The storage layer's active guard exists because keys are long-lived and stored in a keyring managed by the library; the streaming layer's keys are ephemeral, caller-provided, and used once — validating them would shift a caller responsibility into a layer that cannot meaningfully enforce it (the caller could pass any weak key, not just all-zeros). A reimplementer who adds an all-zero guard to the streaming layer for "consistency" with storage creates a behavioral divergence from the specification.

Caller key zeroization: The library copies the key into the opaque encryptor/decryptor handle on initialization. The caller's original key buffer is not zeroed by the library. After calling soliton_stream_encrypt_init or soliton_stream_decrypt_init, the caller MUST zeroize their copy of the key via soliton_zeroize (CAPI) or Zeroizing wrapper (Rust). The handle's internal copy is zeroized automatically when the handle is freed.
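The caller-side ordering (initialize the handle, then wipe the caller's copy) is sketched below using only the standard library. The mock encryptor is hypothetical, and zeroize_key is a stand-in for soliton_zeroize or the zeroize crate's Zeroizing, which are the better choice in practice; volatile writes followed by a compiler fence are a common std-only way to keep the compiler from eliding the wipe.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for an encryptor handle that copies the key on init.
/// (The real handle zeroizes its internal copy when freed; this mock does not.)
struct MockEncryptor {
    key: [u8; 32],
}

fn mock_encrypt_init(key: &[u8; 32]) -> MockEncryptor {
    MockEncryptor { key: *key }
}

/// Overwrites the caller's key buffer with volatile writes so the stores
/// are not optimized away as dead.
fn zeroize_key(key: &mut [u8; 32]) {
    for byte in key.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive pointer into `key`.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```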

15.2 Wire Format

Buffer-allocation quick reference: All streaming sizes (STREAM_HEADER_SIZE, CHUNK_SIZE, STREAM_CHUNK_OVERHEAD, STREAM_ZSTD_OVERHEAD, STREAM_ENCRYPT_MAX, STREAM_CHUNK_STRIDE) are defined with their derivations in Appendix A. A consolidated buffer-sizing summary table for binding authors is in Appendix B.

Stream = Header || Chunk₀ || Chunk₁ || ... || ChunkN

Header (26 bytes):
  version       (1)     — stream format version (0x01)
  flags         (1)     — bit 0: compression (0 = none, 1 = zstd), bits 1-7: reserved (must be zero)
  base_nonce    (24)    — random, unique per stream

Chunk:
  tag_byte      (1)     — 0x00 = non-final, 0x01 = final
  ciphertext    (variable)   — AEAD output: encrypted plaintext + 16-byte Poly1305 tag
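A reader can validate the 26-byte header before touching any chunk. A minimal sketch with hypothetical type and error names (the library's actual error variants may differ); it checks the version byte first, so a future format surfaces as UnsupportedVersion rather than an opaque AEAD failure, and enforces that the reserved flag bits are zero.

```rust
const STREAM_HEADER_SIZE: usize = 26;
const STREAM_VERSION: u8 = 0x01;
const FLAG_COMPRESSED: u8 = 0x01; // bit 0; bits 1-7 reserved

#[derive(Debug, PartialEq)]
enum HeaderError {
    Truncated,
    UnsupportedVersion,
    ReservedFlags,
}

#[derive(Debug, PartialEq)]
struct StreamHeader {
    version: u8,
    compressed: bool,
    base_nonce: [u8; 24],
}

fn parse_stream_header(buf: &[u8]) -> Result<StreamHeader, HeaderError> {
    if buf.len() < STREAM_HEADER_SIZE {
        return Err(HeaderError::Truncated);
    }
    let version = buf[0];
    let flags = buf[1];
    if version != STREAM_VERSION {
        return Err(HeaderError::UnsupportedVersion);
    }
    // Reserved bits 1-7 must be zero.
    if (flags & !FLAG_COMPRESSED) != 0 {
        return Err(HeaderError::ReservedFlags);
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&buf[2..STREAM_HEADER_SIZE]);
    Ok(StreamHeader { version, compressed: (flags & FLAG_COMPRESSED) != 0, base_nonce })
}
```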

tag_byte interpretation: On decrypt, only the value 0x01 is treated as final. Any other value (including 0x00 and hypothetical future values) is treated as non-final. Implementations MUST NOT reject unknown tag_byte values pre-AEAD — the tag_byte is authenticated via inclusion in both the nonce (§15.3) and the AAD (§15.4), so a chunk with a hypothetical future tag_byte 0x02 from a newer writer would fail AEAD on any older reader (the nonce and AAD would differ from what the encryptor used). The "lenient decoding" means: don't add a pre-AEAD guard that rejects non-0x00/0x01 values, because AEAD already provides the rejection. On encrypt, only 0x00 and 0x01 are produced.
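The lenient decoding rule can be sketched as a split that never rejects on the tag value. A hypothetical helper; it assumes the chunk has already been delimited by the transport, and it leaves rejection of unknown values to AEAD, which authenticates the tag_byte through the nonce and AAD.

```rust
/// Splits an already-delimited chunk into (is_final, ciphertext).
/// Only 0x01 marks the final chunk; every other value, including unknown
/// future values, is treated as non-final and left for AEAD to reject.
fn split_chunk(chunk: &[u8]) -> Option<(bool, &[u8])> {
    let (&tag_byte, ciphertext) = chunk.split_first()?;
    Some((tag_byte == 0x01, ciphertext))
}
```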

The library does not embed length prefixes between chunks. Chunk delimitation is a transport/storage concern — different transports (QUIC, WebSocket, HTTP/2) and storage backends (object stores, flat files) have different framing mechanisms.

Compressed stream chunk framing is NOT specified as an interoperability format: This spec does not define a normative chunk-length framing format for compressed streams. The compressed streaming feature (flags & 0x01 == 1) is a single-implementation feature: it is designed to be written and read by the same soliton implementation (or a reimplementation that derives its own chunk framing scheme from this spec). The wire format specifies the AEAD construction, the nonce derivation, and the header layout — but NOT how the variable-length compressed ciphertext chunks are delimited on the wire when transported across a byte stream. A reimplementer who builds an independent implementation targeting cross-implementation interoperability for compressed streams MUST define and negotiate a chunk framing mechanism out-of-band (e.g., HTTP chunked encoding, a length-prefix layer, or an out-of-band chunk index). Without this, two independent compressed-stream implementations will fail to interoperate at the transport level even though their AEAD layer is identical.

Compressed streams are NOT self-delimiting: For a compressed stream (flags & 0x01 == 1), non-final chunks have variable ciphertext size (from 17 bytes, a 1-byte compressed payload plus the 16-byte Poly1305 tag, up to CHUNK_SIZE + STREAM_ZSTD_OVERHEAD + 16 bytes, depending on content and compression ratio). There is no fixed stride: the transport MUST provide per-chunk lengths (e.g., HTTP chunked encoding, a length-prefix framing layer, or an index built during encryption). A reimplementer who applies the 1,048,593-byte fixed-stride read algorithm to a compressed stream will misalign at the first chunk boundary, causing all subsequent AEAD decryptions to fail.

Recommended framing for compressed streams: When transporting a compressed stream over a raw byte channel (file, TCP socket, UNIX pipe), implementers SHOULD prefix each chunk's ciphertext with a 4-byte big-endian u32 length field giving the ciphertext byte count (not including the tag_byte or the length prefix itself). The on-wire layout per chunk becomes tag_byte (1) || ciphertext_len (4, BE u32) || ciphertext (ciphertext_len bytes). ciphertext_len is the AEAD output byte count, len(compressed_plaintext) + 16: the 16-byte Poly1305 authentication tag is part of the ciphertext and is included in ciphertext_len, not a separate field. For an empty final chunk (0 bytes of compressed plaintext), ciphertext_len = 16. A reimplementer who excludes the 16-byte Poly1305 tag from ciphertext_len (treating it as overhead outside the count) produces length values 16 bytes short per chunk, causing the reader's framing to misalign immediately after the first chunk. This framing is simple, adds only the 4-byte length prefix per chunk (the ciphertext already carries its Poly1305 tag, so no further fields are needed), and lets a reader allocate exactly the right buffer for each chunk without look-ahead. Implementations that deviate from this framing for compressed streams will be silently incompatible with conforming implementations at the transport layer even though their AEAD output is identical. Uncompressed streams do not need this framing: the fixed stride already provides delimitation (§15.2 above).
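The recommended framing can be sketched with std::io alone. The function names are hypothetical; max_len is a caller-supplied bound (for example CHUNK_SIZE + STREAM_ZSTD_OVERHEAD + 16, using the Appendix A constants, whose values are not repeated here) that is checked before allocating. Note that ciphertext_len counts the full AEAD output, tag included.

```rust
use std::io::{self, Read, Write};

/// Writes tag_byte || ciphertext_len (u32 BE) || ciphertext.
/// `ciphertext` is the full AEAD output, so its length already includes the
/// 16-byte Poly1305 tag (an empty final chunk has ciphertext_len = 16).
fn write_framed_chunk<W: Write>(w: &mut W, tag_byte: u8, ciphertext: &[u8]) -> io::Result<()> {
    let len = u32::try_from(ciphertext.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "chunk too large"))?;
    w.write_all(&[tag_byte])?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(ciphertext)
}

/// Reads one framed chunk. Returns Ok(None) on a clean EOF before the tag byte.
fn read_framed_chunk<R: Read>(r: &mut R, max_len: u32) -> io::Result<Option<(u8, Vec<u8>)>> {
    let mut tag = [0u8; 1];
    match r.read_exact(&mut tag) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let len = u32::from_be_bytes(len_bytes);
    // Every AEAD output is at least the 16-byte tag; bound the allocation.
    if len < 16 || len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad ciphertext_len"));
    }
    let mut ciphertext = vec![0u8; len as usize];
    r.read_exact(&mut ciphertext)?;
    Ok(Some((tag[0], ciphertext)))
}
```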

Uncompressed stream sequential read algorithm: For an uncompressed stream (flags & 0x01 == 0), every non-final chunk is exactly 1,048,593 bytes on the wire (1 tag_byte + 1,048,576 plaintext bytes encrypted to 1,048,576 + 16 AEAD ciphertext bytes = 1,048,593 total). A sequential reader reads fixed-size chunks until it encounters a chunk with tag_byte = 0x01 (final). Because the wire size is fully determined by CHUNK_SIZE (1,048,576 bytes, see §A Constants), a streaming implementation can read exactly 1,048,593 bytes per non-final chunk without a length prefix — no look-ahead required. This derivation combines §15.2 (wire format) and §15.6 (chunk sizing); it is stated here to spare streaming-layer implementers from reconstructing it.
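The fixed-stride arithmetic can be written down as constants. This sketch also relies on §15.6's statement that a stream is the 26-byte header followed by its chunks, which yields the byte offset of any chunk in an uncompressed stream. `uncompressed_chunk_offset` and every constant name other than CHUNK_SIZE are illustrative, not taken from the library.

```rust
/// Plaintext bytes per non-final chunk (1 MiB), per §A Constants.
const CHUNK_SIZE: usize = 1_048_576;
/// Poly1305 tag appended by XChaCha20-Poly1305.
const AEAD_TAG_LEN: usize = 16;
/// tag_byte (1) + AEAD tag (16).
const CHUNK_OVERHEAD: usize = 1 + AEAD_TAG_LEN;
/// Wire size of every non-final chunk in an uncompressed stream.
const UNCOMPRESSED_STRIDE: usize = CHUNK_SIZE + CHUNK_OVERHEAD;
/// version (1) + flags (1) + base_nonce (24).
const HEADER_LEN: usize = 26;

// Compile-time check of the arithmetic stated in the text.
const _: () = assert!(UNCOMPRESSED_STRIDE == 1_048_593);

/// Byte offset of chunk `index` in an uncompressed stream that begins with the
/// 26-byte header. Valid up to and including the final chunk, because every
/// chunk before it is a full stride. Returns None on u64 overflow.
fn uncompressed_chunk_offset(index: u64) -> Option<u64> {
    index
        .checked_mul(UNCOMPRESSED_STRIDE as u64)?
        .checked_add(HEADER_LEN as u64)
}
```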

Short final chunk from an unframed transport: The final chunk has variable wire size, from 17 bytes (1 tag_byte + 16-byte Poly1305 tag for empty plaintext) up to 1,048,593 bytes. When reading from an unframed byte stream (raw TCP, file read), the algorithm is: attempt to read 1,048,593 bytes; if the transport reaches end-of-stream first, the bytes received constitute the final chunk. The size shortfall is not an error; it is the signal that the final chunk has been received. A reimplementer who requires exactly 1,048,593 bytes for every chunk (including the final) will reject every stream whose final chunk carries fewer than CHUNK_SIZE plaintext bytes. Note: pre-framed transports (WebSocket messages, HTTP chunked encoding, or a QUIC stream carrying exactly one chunk) deliver chunks with explicit boundaries and do not exhibit this ambiguity.

Minimum final chunk size and transport accumulation obligation: A final chunk delivered with fewer than 17 bytes returns AeadFailed — the 16-byte Poly1305 tag plus 1 tag_byte is the irreducible minimum (encrypting zero plaintext bytes produces a 16-byte tag with no ciphertext). Transport implementations MUST accumulate bytes until either 1,048,593 bytes are in hand (a full non-final stride) or a clean stream EOF signal before presenting a chunk to the decryption layer. Presenting a partial chunk (e.g., 8 bytes of a truncated stream) to decrypt_chunk returns AeadFailed with no indication of whether the data was truncated in transit or the key/nonce was wrong — the AEAD cannot distinguish these cases.
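One way to meet the accumulation obligation when reading an uncompressed stream from an unframed source, sketched with `std::io::Read`. The helper name is hypothetical. It only delimits chunks; authentication, including the AeadFailed result for a final chunk shorter than 17 bytes, is left to decrypt_chunk.

```rust
use std::io::{self, Read};

/// 1 tag_byte + 1,048,576 plaintext bytes + 16-byte Poly1305 tag.
const UNCOMPRESSED_STRIDE: usize = 1_048_593;

/// Returns the next chunk of an uncompressed stream. Accumulates bytes until a
/// full stride is in hand or the source reports a clean EOF (read returns 0).
/// A short result is the final chunk. Returns Ok(None) if EOF arrives before
/// any byte of a new chunk.
fn read_uncompressed_chunk<R: Read>(src: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut buf = vec![0u8; UNCOMPRESSED_STRIDE];
    let mut filled = 0;
    while filled < UNCOMPRESSED_STRIDE {
        match src.read(&mut buf[filled..]) {
            Ok(0) => break,       // clean EOF: whatever is buffered is the last chunk
            Ok(n) => filled += n, // partial reads are normal on pipes and sockets
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    if filled == 0 {
        return Ok(None);
    }
    buf.truncate(filled);
    Ok(Some(buf))
}
```

A full-stride result may still be the final chunk (a final chunk carrying exactly CHUNK_SIZE bytes); the caller distinguishes it by its tag_byte, not by its length.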

15.3 Nonce Derivation

Per-chunk nonce is derived by XORing a 24-byte mask into the base nonce:

mask = chunk_index (8 bytes, big-endian u64)
    || tag_byte   (1 byte: 0x00 = non-final, 0x01 = final)
    || 0x00 * 15  (15 zero bytes, padding)

chunk_nonce = base_nonce XOR mask

Bytes	Mask content	Purpose
0-7	`chunk_index` (u64 BE)	Distinct nonce per chunk position
8	`tag_byte`	Distinct nonce for final vs non-final at same index
9-23	`0x00`	No effect on base nonce (XOR with zero is identity)

Bytes 9-23 of the mask MUST be zero. A reimplementer who places additional data in these bytes produces nonces incompatible with any conforming implementation.

Injectivity: For two chunks (i₁, t₁) and (i₂, t₂), mask₁ = mask₂ iff i₁ = i₂ and t₁ = t₂. Since XOR with a constant is a bijection, distinct (index, tag_byte) pairs always produce distinct nonces.
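The mask construction as a small Rust function (the name `chunk_nonce` is illustrative). One consequence worth noticing: for chunk 0 with a non-final tag the mask is all zero, so that chunk's nonce equals base_nonce.

```rust
/// Derives the per-chunk nonce: base_nonce XOR mask, where
/// mask = chunk_index (8 bytes, BE u64) || tag_byte (1) || 15 zero bytes.
fn chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes()); // bytes 0-7
    mask[8] = tag_byte; // byte 8: 0x00 non-final, 0x01 final
    // bytes 9-23 stay zero, so base_nonce[9..] passes through unchanged
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= *m;
    }
    nonce
}
```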

15.4 AAD Construction

aad = "lo-stream-v1"              // 12 bytes, domain label
   || version                      // 1 byte
   || flags                        // 1 byte
   || base_nonce                   // 24 bytes
   || chunk_index                  // 8 bytes, big-endian u64
   || tag_byte                     // 1 byte (0x00 or 0x01)
   || caller_aad                   // variable, caller-supplied context

Total AAD: 47 + len(caller_aad) bytes. caller_aad is optional application-level context (file ID, channel ID) provided once at stream init and constant across all chunks. caller_aad is not treated as secret material. The library stores it in a plain buffer without Zeroizing and does not zeroize it on handle destruction. Callers MUST NOT pass sensitive values (private paths, internal batch IDs, authentication tokens) as caller_aad — use only public or non-sensitive identifiers. It is the terminal field with no length prefix — the first 47 bytes have a fixed layout, so caller_aad is unambiguously everything after byte 46. Omitting the length prefix is intentional, not an oversight. A reimplementer who adds a 2-byte BE length prefix for consistency with other length-prefixed fields in the protocol (e.g., §7.4's session init encoding) produces different AAD bytes and AEAD authentication failure. The implementation captures caller_aad at init time and reuses the same bytes for every chunk. A reimplementer constructing per-chunk AAD manually MUST use identical caller_aad bytes for every chunk in the stream — varying the caller portion produces AEAD authentication failure on decrypt with no diagnostic indicating which field changed.
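A sketch of the per-chunk AAD layout (the helper `chunk_aad` is hypothetical). It shows the fixed 47-byte prefix followed by caller_aad appended verbatim, with no length prefix.

```rust
/// 12-byte domain label at the start of every stream chunk's AAD.
const STREAM_AAD_LABEL: &[u8; 12] = b"lo-stream-v1";

/// Builds the AAD for one chunk. Offsets: label 0..12, version 12, flags 13,
/// base_nonce 14..38, chunk_index 38..46, tag_byte 46, caller_aad 47...
fn chunk_aad(
    version: u8,
    flags: u8,
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(STREAM_AAD_LABEL);
    aad.push(version);
    aad.push(flags); // stream-level constant, identical for every chunk
    aad.extend_from_slice(base_nonce);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad.push(tag_byte);
    aad.extend_from_slice(caller_aad); // terminal field: no length prefix
    aad
}
```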

caller_aad is a raw byte string — C callers MUST NOT use strlen() to derive its length: caller_aad may contain null bytes (e.g., a binary UUID, a binary file identifier, an all-zero channel ID). C binding authors who pass strlen(aad) as aad_len silently truncate caller_aad at the first null byte, producing wrong AAD and AeadFailed on every decrypt_chunk call. Always pass the explicit byte count: soliton_stream_decrypt_init(key, key_len, header, header_len, aad, aad_len, out) where aad_len = sizeof(aad_array) or a separately-tracked length, never strlen(aad).

caller_aad mismatch is not detected at stream_decrypt_init: stream_decrypt_init accepts any caller_aad bytes without checking them against the stream header (the cleartext header contains only version, flags, and base_nonce; there is no stored hash or commitment of caller_aad). A mismatch between the encrypt-side and decrypt-side caller_aad values first manifests as AeadFailed on the first decrypt_chunk call, never as an error from stream_decrypt_init, and nothing at init time indicates that the context binding is wrong.

AAD component	Prevents
`version`	Version downgrade
`flags`	Flag flipping (e.g., compression flag → skip decompression)
`chunk_index`	Chunk reordering
`base_nonce`	Cross-stream splicing
`tag_byte`	Truncation (stripping final marker)
`caller_aad`	Context confusion (file from channel X served as channel Y)

caller_aad size recommendation: caller_aad is semantically a file ID, channel ID, or similar context identifier — typically a few bytes to a few hundred. There is no protocol-level size limit (the CAPI 256 MiB general input cap applies), but large values produce multiplicative work: every chunk's AEAD runs Poly1305 over 47 + len(caller_aad) bytes of AAD. With a 256 MiB caller_aad and thousands of chunks, the aggregate AAD processing dominates total encryption time. Recommended maximum: 4096 bytes. Applications needing to bind larger context should hash it first (e.g., SHA3-256(full_context)) and pass the 32-byte digest as caller_aad.

15.5 Compression

Per-chunk zstd compression (Zstandard, RFC 8878), controlled by flags bit 0. When enabled, each chunk's plaintext is independently compressed before encryption. Empty plaintext (0-byte final chunk) bypasses compression regardless of the flag.

"Non-empty" check applies to post-AEAD plaintext, not ciphertext: The bypass condition for empty final chunks is evaluated on the AEAD output (the bytes that would be handed to zstd), not on the raw ciphertext length. A 16-byte ciphertext (Poly1305 tag only) decrypts to a 0-byte AEAD output and is empty by this definition, so decompression is skipped. A reimplementer who checks ciphertext.len() == 0 (before decryption) instead of plaintext.len() == 0 (after decryption) never matches, because every ciphertext carries at least the 16-byte tag; the implementation then attempts zstd decompression on the 0-byte AEAD output, and the resulting decompression error collapses to AeadFailed (§15.7) with no diagnostic pointing to the misplaced check.

flags is a stream-level constant, not a per-chunk value. The flags byte is set once at stream initialization and appears identically in every chunk's AAD (§15.4) — including the final chunk, even when that chunk is empty and compression is bypassed. A reimplementer who interprets flags as "was this specific chunk compressed" and writes 0x00 for the empty final chunk when compress = true produces an AAD mismatch and AEAD failure on decrypt. The flags byte records the stream's compression configuration, not the per-chunk compression outcome.

Pipeline:

Encrypt (compression enabled, non-empty): plaintext → zstd compress → AEAD encrypt → prepend tag_byte.
Encrypt (compression disabled, or empty): plaintext → AEAD encrypt → prepend tag_byte.
Decrypt: read tag_byte → AEAD decrypt → (if compressed and non-empty) zstd decompress → plaintext.
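The decrypt-side post-AEAD step described in §15.5 through §15.7 can be sketched as follows. `decompress` is a caller-supplied stand-in for a zstd decoder (the implementation uses ruzstd, whose API is not modelled here), and a real decoder should itself stop at CHUNK_SIZE output bytes. The emptiness test is on the AEAD output, `flags` is the stream-level value, and every failure after authentication collapses to AeadFailed. `finish_chunk` is a hypothetical name.

```rust
const CHUNK_SIZE: usize = 1_048_576;
const FLAG_COMPRESSED: u8 = 0x01;

#[derive(Debug, PartialEq)]
enum StreamError {
    AeadFailed,
}

/// Post-AEAD step for one chunk. `aead_output` is what the AEAD returned,
/// `flags` is the stream-level flags byte from the header, and `is_last` comes
/// from the chunk's tag_byte. `decompress` returns None on any decoding failure.
fn finish_chunk<F>(aead_output: Vec<u8>, flags: u8, is_last: bool, decompress: F) -> Result<Vec<u8>, StreamError>
where
    F: FnOnce(&[u8]) -> Option<Vec<u8>>,
{
    let compressed_stream = flags & FLAG_COMPRESSED != 0;
    // The bypass test is on the AEAD output: an empty final chunk decrypts to
    // zero bytes and is never handed to the decoder.
    let plaintext = if compressed_stream && !aead_output.is_empty() {
        decompress(&aead_output).ok_or(StreamError::AeadFailed)? // oracle collapse
    } else {
        aead_output
    };
    if compressed_stream {
        // Non-final chunks must decompress to exactly CHUNK_SIZE, final chunks
        // to at most CHUNK_SIZE; violations look identical to an AEAD failure.
        let size_ok = if is_last { plaintext.len() <= CHUNK_SIZE } else { plaintext.len() == CHUNK_SIZE };
        if !size_ok {
            return Err(StreamError::AeadFailed);
        }
    }
    // Uncompressed non-final lengths were already enforced pre-AEAD (InvalidData).
    Ok(plaintext)
}
```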

Caller-visible buffer layout: encrypt_chunk produces a single output buffer containing tag_byte (1) || AEAD_ciphertext (len(payload) + 16, where payload is the plaintext, or its zstd-compressed form when compression applies). The tag_byte is prepended and returned as part of the output; callers do NOT prepend it separately. decrypt_chunk expects the same layout as input: tag_byte (1) || AEAD_ciphertext. A reimplementer who returns only the AEAD ciphertext (without tag_byte) from encrypt_chunk, expecting the caller to prepend it, produces an API that is incompatible with the standard wire format and with CAPI callers who use the output buffer directly.

Compression level: Fastest (~1), matching encrypt_blob. Pure Rust via ruzstd. No dictionary (per-chunk independent, required for random access). Max decompressed size per chunk: CHUNK_SIZE (1 MiB).

Compression oracle (CRIME/BREACH): When attacker-controlled content is mixed with secret data in the same chunk, the per-chunk compressed size leaks information about the secret via adaptive chosen-plaintext. An attacker who can influence the plaintext and observe chunk wire sizes can iteratively extract secrets by measuring compression ratios. Since chunks compress independently with no cross-chunk dictionary, this oracle is bounded to within a single chunk — an attacker who places controlled content in chunk 0 cannot learn anything about secrets in chunk 5. Callers who separate attacker-influenced data from secrets across chunk boundaries do not need to disable compression for the entire stream. Use compress = false only when attacker-influenced data and secrets coexist within the same chunk (e.g., a single chunk containing both a user-supplied filename and session metadata).

15.6 Chunk Sizing

Non-final chunks: plaintext MUST be exactly CHUNK_SIZE (1 MiB). Enforced on both encrypt and decrypt sides. The timing of the size check differs by compression mode, and this asymmetry is security-relevant:

Uncompressed: The on-wire chunk is tag_byte (1) || AEAD_ciphertext (CHUNK_SIZE + 16), a total wire size of CHUNK_SIZE + 17 (= CHUNK_SIZE + CHUNK_OVERHEAD). Once the tag_byte has been read, the AEAD ciphertext size is deterministic (CHUNK_SIZE + 16). The decryptor checks the AEAD ciphertext length pre-AEAD (framing check, InvalidData) before attempting decryption. "Chunk wire length" in this context means the AEAD ciphertext portion (not counting the already-read tag_byte). A reimplementer who defers this check to post-AEAD wastes cycles decrypting malformed chunks.
Compressed: ciphertext size is non-deterministic (compression ratio varies), so the plaintext-size check occurs post-AEAD after decrypt + decompress. The decompressed output must be exactly CHUNK_SIZE (not merely ≤ CHUNK_SIZE) — both undersized and oversized decompressed non-final chunks are rejected as AeadFailed (post-auth error collapse per §15.7). Returning a distinct error (e.g., InvalidData or DecompressionFailed) for either size mismatch would create a post-AEAD size oracle. A reimplementer who checks compressed chunk sizes pre-AEAD creates an oracle: rejecting a chunk before authentication reveals that the size check (not the AEAD) failed, leaking information about the expected plaintext size. No pre-AEAD ciphertext cap is applied for compressed chunks at the streaming layer — the CAPI 256 MiB input cap (§13.2) provides the outer bound. A legitimate compressed chunk is at most STREAM_ENCRYPT_MAX (= CHUNK_SIZE + ZSTD_OVERHEAD + CHUNK_OVERHEAD = 1,048,849 bytes — the maximum CAPI output buffer for one encrypted chunk); without a tighter cap, a peer can force AEAD attempt on up to 256 MiB of ciphertext before authentication fails. This is intentional — any tighter pre-AEAD cap would create the same oracle it is designed to prevent. Exception: a cap of exactly STREAM_ENCRYPT_MAX (1,048,849 bytes) is safe — it eliminates only inputs no conforming encryptor could produce (the reference encryptor never outputs a compressed chunk exceeding STREAM_ENCRYPT_MAX bytes) and does not create an oracle about the compression ratio or the expected size of valid ciphertext. The oracle concern applies only to caps tighter than the maximum conforming encryptor output.

Normative cap statement for compressed non-final chunk pre-AEAD: Implementations MAY apply a pre-AEAD ciphertext size cap of exactly STREAM_ENCRYPT_MAX (1,048,849 bytes). Implementations MUST NOT apply a pre-AEAD cap below STREAM_ENCRYPT_MAX — a cap tighter than the maximum conforming encryptor output creates the oracle it is designed to prevent (it would reject valid ciphertexts from a conforming peer, causing AeadFailed for valid data and allowing timing-based oracle inference). The reference implementation applies no tighter cap than the outer 256 MiB CAPI bound. An implementation applying the optional STREAM_ENCRYPT_MAX cap MUST return InvalidLength for ciphertext inputs exceeding that cap — not AeadFailed. The pre-AEAD size check fires before any AEAD operation, so InvalidLength is the correct variant (the input exceeds the size constraint, not the authentication check). This is an acceptable, documented divergence from the reference: the reference returns AeadFailed for oversized inputs (the 256 MiB CAPI cap returns InvalidLength, but inputs between STREAM_ENCRYPT_MAX and 256 MiB proceed to AEAD which then fails). Callers testing against both implementations MUST handle either InvalidLength or AeadFailed for inputs in the STREAM_ENCRYPT_MAX + 1 to 256 MiB range.
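The decrypt-side pre-AEAD length checks, sketched under the assumption that `chunk` is the full tag_byte || AEAD_ciphertext buffer. `apply_optional_cap` models the permitted STREAM_ENCRYPT_MAX cap (the reference implementation corresponds to `false`). Validation of tag_byte values other than 0x00/0x01 and the outer 256 MiB CAPI cap are not modelled, and `pre_aead_check` is a hypothetical name.

```rust
const CHUNK_SIZE: usize = 1_048_576;
const AEAD_TAG_LEN: usize = 16;
const MIN_CHUNK_LEN: usize = 1 + AEAD_TAG_LEN; // 17: tag_byte + tag, empty plaintext
const STREAM_ENCRYPT_MAX: usize = 1_048_849;

#[derive(Debug, PartialEq)]
enum StreamError {
    AeadFailed,
    InvalidData,
    InvalidLength,
}

fn pre_aead_check(chunk: &[u8], compressed: bool, apply_optional_cap: bool) -> Result<(), StreamError> {
    if chunk.len() < MIN_CHUNK_LEN {
        return Err(StreamError::AeadFailed); // oracle collapse (§15.7), not InvalidData
    }
    let (tag_byte, aead_ct) = (chunk[0], &chunk[1..]);
    if !compressed {
        // Public framing check: an uncompressed non-final chunk has a fixed size.
        if tag_byte == 0x00 && aead_ct.len() != CHUNK_SIZE + AEAD_TAG_LEN {
            return Err(StreamError::InvalidData);
        }
    } else if apply_optional_cap && chunk.len() > STREAM_ENCRYPT_MAX {
        // Only a cap of exactly STREAM_ENCRYPT_MAX is allowed; anything tighter is an oracle.
        return Err(StreamError::InvalidLength);
    }
    Ok(())
}
```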

Accepting undersized non-final chunks in either mode would allow malformed streams where chunk boundaries are shifted, corrupting random-access offset calculations.

Encrypt-side non-final wrong-size is a caller bug — no library-level enforcement beyond the error return: The encrypt_chunk function returns InvalidData when a non-final chunk's plaintext length ≠ CHUNK_SIZE. There is no additional internal guard that prevents the caller from ignoring the error and continuing to encrypt subsequent chunks — the error is informational. The streaming state is unchanged on InvalidData from wrong chunk size (§15.11 atomicity), so a caller who ignores the error and re-calls encrypt_chunk with a different size produces a stream with inconsistent chunk sizes. This is a caller programming error; the library cannot enforce correct behavior beyond the error return for the offending call. The distinct error (InvalidData not AeadFailed) ensures this is diagnosable — it fires before AEAD, so it is safe to expose the distinction without creating an oracle.

Final chunk: plaintext may be 0..=CHUNK_SIZE bytes. A final chunk exceeding CHUNK_SIZE is rejected as InvalidData. It is not InvalidLength, because the input type (bytes of plaintext) is correct and only the value violates the chunk-size structural constraint; and it is not AeadFailed, because this is a pre-AEAD framing check on the plaintext length, not a post-authentication error. An empty file produces one final chunk (tag_byte + 16-byte AEAD tag = 17 bytes). Every valid stream has exactly one chunk with tag_byte=0x01.
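The encrypt-side plaintext length rules for non-final and final chunks fit in a hypothetical helper that runs before any compression or AEAD work, which is why its distinct InvalidData result is safe to expose:

```rust
const CHUNK_SIZE: usize = 1_048_576;

#[derive(Debug, PartialEq)]
enum StreamError {
    InvalidData,
}

/// Non-final chunks must be exactly CHUNK_SIZE; the final chunk may be
/// anywhere in 0..=CHUNK_SIZE. Checked before compression and AEAD.
fn check_encrypt_plaintext_len(len: usize, is_last: bool) -> Result<(), StreamError> {
    let ok = if is_last { len <= CHUNK_SIZE } else { len == CHUNK_SIZE };
    if ok {
        Ok(())
    } else {
        Err(StreamError::InvalidData)
    }
}
```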

Compressed final chunk decompressing beyond CHUNK_SIZE: A compressed final chunk that decompresses to more than CHUNK_SIZE bytes returns AeadFailed — the post-AEAD error collapse (§15.7) applies to all decompression-side size violations, including the final chunk. Reimplementers MUST NOT return a distinct error (DecompressionFailed, InvalidData) for this case — doing so creates an oracle distinguishing "AEAD passed, decompression size check failed" from "AEAD failed."

Minimum valid stream: 26 (header) + 17 (empty final chunk) = 43 bytes.

15.7 Error Oracle Collapse

Two categories of errors are collapsed to AeadFailed for oracle prevention, for different reasons:

Post-authentication errors (decompression failure, size mismatch): collapsed to prevent a 1-bit oracle distinguishing "authentication succeeded but post-processing failed" from "authentication failed." These checks fire after AEAD succeeds, so distinguishing them from AeadFailed would confirm that authentication passed — leaking information about key correctness.
Pre-authentication header errors (reserved flag bits): collapsed to prevent a 1-bit oracle distinguishing "unsupported flag combination" from "wrong key." Reserved-bit checks fire at stream_decrypt_init, before any chunk AEAD, so returning a distinct error would allow an attacker to distinguish "correct key with malformed header" from "wrong key" by probing the flag byte. This is a different oracle than the post-AEAD case but equally undesirable.

Pre-authentication checks on publicly visible fields (UnsupportedVersion for version byte, InvalidData for uncompressed chunk framing) do not create oracles because the checked values are visible to anyone who observes the header or chunk.

Error origin table for stream initialization and decryption:

Error	Returned from	Phase
`UnsupportedVersion`	`stream_decrypt_init`	Header parsing — version byte checked at init, before any chunk
`AeadFailed` (reserved flag bits)	`stream_decrypt_init`	Header parsing — flag byte checked at init, before any chunk
`AeadFailed` (authentication failure)	`stream_decrypt_chunk` / `stream_decrypt_chunk_at`	Per-chunk AEAD
`AeadFailed` (decompression failure)	`stream_decrypt_chunk` / `stream_decrypt_chunk_at`	Post-AEAD (oracle collapse)
`AeadFailed` (size mismatch post-decompress)	`stream_decrypt_chunk` / `stream_decrypt_chunk_at`	Post-AEAD (oracle collapse)
`InvalidData` (wrong non-final chunk size, uncompressed)	`stream_decrypt_chunk` / `stream_decrypt_chunk_at`	Pre-AEAD framing (not oracle — checked value is public)
`AeadFailed` (chunk shorter than 17 bytes)	`stream_decrypt_chunk` / `stream_decrypt_chunk_at`	Pre-AEAD oracle collapse — 17 bytes is the minimum valid chunk (1 `tag_byte` + 16-byte Poly1305 tag with zero plaintext). Returning `InvalidData` for chunks shorter than 17 bytes would allow an attacker to distinguish "chunk too short to attempt AEAD" from "valid-length but wrong tag." The same oracle-collapse rationale as §12's undersize-ciphertext row applies here for the streaming layer. Reimplementers who add a pre-AEAD `if len(chunk) < 17: return InvalidData` guard violate this requirement.
`InvalidData` (oversized final chunk plaintext)	`stream_encrypt_chunk` only	Pre-AEAD framing (encrypt side only — plaintext size is known before AEAD on the encrypt path; the decrypt path has no pre-AEAD plaintext size check for the final chunk, because the decrypted size is unknown until AEAD succeeds)
`InvalidData` (post-finalization call)	`stream_decrypt_chunk`	State guard
`ChainExhausted`	`stream_decrypt_chunk`	Counter guard (sequential only)

A reimplementer who places the version-byte check in the per-chunk path (returning InvalidData on each chunk that encounters an unexpected version) instead of in stream_decrypt_init will diverge from the specified error ordering — callers who check the return code of stream_decrypt_init expect to detect version mismatches before processing any chunks. The version byte appears only in the header, not per-chunk — a reimplementer checking version per-chunk is also structurally wrong (the version byte is not re-read from each chunk's data).

15.8 Version and Flags Handling

Version byte 0x01 is accepted; all other values rejected with UnsupportedVersion at init time (stream_decrypt_init), before any chunk is processed. Reserved flag bits (1-7) must be zero; non-zero reserved bits are also rejected at init time with AeadFailed (oracle collapse — attacker-controlled header field). Both checks fire during header parsing, not during the first chunk decrypt. A reimplementer who defers the reserved-bits check to per-chunk AEAD will observe different error ordering (the error appears on the first decrypt_chunk call rather than on stream_decrypt_init), producing divergent behavior in error-ordering tests.

Asymmetry rationale — why version gets UnsupportedVersion but flags get AeadFailed: The version byte is a public implementation-capability indicator. Returning UnsupportedVersion for an unknown version enables the caller to distinguish "library version too old, upgrade required" from "authentication failure" without any oracle risk — an attacker who knows the version byte (which is in the cleartext header) gains no information about key correctness by learning that the version is unsupported. The flags byte is security-relevant: an attacker who controls the flags byte and can observe the error response gains a key-verification oracle — if the correct key is loaded and only the flag is wrong, a distinct error would confirm key correctness. Collapsing flags errors to AeadFailed removes this distinguisher. In short: unknown version → caller needs to upgrade, expose clearly; unknown flags → potential adversarial probe, collapse to prevent oracle.
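A sketch of header parsing at stream_decrypt_init, assuming the 26-byte header is laid out as version (1) || flags (1) || base_nonce (24), the order given in §15.10. Checking version before flags is an assumption of this sketch; the text above only requires that both checks fire at init. All names are hypothetical.

```rust
const SUPPORTED_VERSION: u8 = 0x01;
const FLAG_COMPRESSED: u8 = 0x01; // bit 0
const RESERVED_FLAG_BITS: u8 = !FLAG_COMPRESSED; // bits 1-7 must be zero

#[derive(Debug, PartialEq)]
enum StreamError {
    UnsupportedVersion,
    AeadFailed,
}

#[derive(Debug, PartialEq)]
struct StreamHeader {
    version: u8,
    flags: u8,
    base_nonce: [u8; 24],
}

fn parse_stream_header(header: &[u8; 26]) -> Result<StreamHeader, StreamError> {
    let (version, flags) = (header[0], header[1]);
    // Public capability indicator: a distinct error is safe to expose.
    if version != SUPPORTED_VERSION {
        return Err(StreamError::UnsupportedVersion);
    }
    // Reserved bits collapse to AeadFailed so the flag byte cannot act as a key oracle.
    if flags & RESERVED_FLAG_BITS != 0 {
        return Err(StreamError::AeadFailed);
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&header[2..26]);
    Ok(StreamHeader { version, flags, base_nonce })
}
```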

15.9 Chunk Index Exhaustion

The sequential encryptor and decryptor maintain a next_index: u64 counter (initially 0). Before each chunk operation, if next_index == u64::MAX, the operation returns ChainExhausted without encrypting or decrypting. This prevents next_index + 1 from wrapping to 0, which would reuse the chunk 0 nonce, a catastrophic failure for AEAD security. The random-access decrypt_chunk_at does not maintain a sequential counter and accepts any u64 index directly, so exhaustion does not apply. Passing u64::MAX as the index is not guarded: the function computes a valid nonce and attempts AEAD decryption, which returns AeadFailed for any chunk that was not actually encrypted at that index (the sequential encryptor can never reach it because of its exhaustion guard). The nonce for index u64::MAX with a non-final tag byte is computed as base_nonce XOR (0xFFFFFFFFFFFFFFFF || 0x00 || 0x00{15}), i.e., the first 8 bytes of the mask (bytes 0-7, the chunk_index field encoded as a big-endian u64) are all 0xFF. This is a structurally valid XChaCha20-Poly1305 nonce; the AEAD runs normally and fails authentication. Reimplementers MUST NOT add a ChainExhausted guard to decrypt_chunk_at: the function is stateless and cannot know whether the index is "valid."

expected_index() value after ChainExhausted: When a sequential encryptor or decryptor returns ChainExhausted (at next_index == u64::MAX), expected_index() / soliton_stream_decrypt_expected_index returns u64::MAX. The counter is not cleared, reset, or advanced on the exhaustion guard — it retains the value that triggered the guard. A reimplementer who advances or resets next_index on ChainExhausted will return the wrong value from expected_index() and break callers who inspect the counter after exhaustion to determine how many chunks were processed.

ChainExhausted boundary: The last index the sequential API can process is u64::MAX - 1, so a sequential stream holds at most u64::MAX (18,446,744,073,709,551,615) chunks, at indices 0 through u64::MAX - 1. After the chunk at index u64::MAX - 1 is processed, next_index advances to u64::MAX, and the next call to encrypt_chunk or decrypt_chunk (at next_index == u64::MAX) returns ChainExhausted. Reimplementers testing this boundary MUST use the sequential API, not decrypt_chunk_at (which is stateless and does not check the counter).
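The exhaustion guard and the expected_index() behavior, sketched as a hypothetical counter wrapper. `op` stands in for the per-chunk encrypt or decrypt step. Following the atomicity rule referenced in §15.10 and §15.11, this sketch advances the counter only when `op` succeeds.

```rust
#[derive(Debug, PartialEq)]
enum StreamError {
    ChainExhausted,
    AeadFailed,
}

struct ChunkCounter {
    next_index: u64,
}

impl ChunkCounter {
    fn expected_index(&self) -> u64 {
        self.next_index
    }

    /// Runs `op` at the current index. u64::MAX is never handed to `op`, so the
    /// increment below cannot wrap to 0 and reuse chunk 0's nonce.
    fn process<T>(&mut self, op: impl FnOnce(u64) -> Result<T, StreamError>) -> Result<T, StreamError> {
        if self.next_index == u64::MAX {
            return Err(StreamError::ChainExhausted); // counter left at u64::MAX
        }
        let out = op(self.next_index)?; // on failure the counter is unchanged
        self.next_index += 1;
        Ok(out)
    }
}
```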

decrypt_chunk_at remains usable after sequential exhaustion. When a sequential decryptor returns ChainExhausted (at next_index == u64::MAX), decrypt_chunk_at is unaffected — it reads no sequential state and can still decrypt any chunk by explicit index. This enables a valid use pattern: sequentially process all chunks up to the exhaustion boundary, then use decrypt_chunk_at for any remaining chunks. A reimplementer who adds a terminal-exhausted flag that also blocks decrypt_chunk_at breaks this pattern.

Compressed non-final chunk size validation applies to decrypt_chunk_at: The compressed non-final chunk size check (§15.6 — decompressed output MUST be exactly CHUNK_SIZE) applies to decrypt_chunk_at exactly as it does to sequential decryption. A reimplementer treating random-access decryption as "bare AEAD + decompress" without the size check accepts malformed streams where a non-final chunk decompresses to the wrong size. The check is post-AEAD (§15.6) and therefore safe — it does not create an oracle. The stateless nature of decrypt_chunk_at does not exempt it from content validation.

Empty-final-chunk compression bypass applies to decrypt_chunk_at: The §15.5 compression bypass for empty plaintext (a 0-byte final chunk is stored uncompressed regardless of the compress flag) applies identically to decrypt_chunk_at. When the decrypted AEAD output is zero bytes, the decompression step is skipped — attempting zstd_decompress([]) on an empty AEAD output would reject a structurally valid empty final chunk, collapsing to AeadFailed. A reimplementer who unconditionally decompresses the AEAD output in decrypt_chunk_at (rather than conditioning on !decrypted.is_empty()) breaks empty-file stream support.

15.10 Finalization State Machine

Both the encryptor and sequential decryptor maintain a finalized boolean (initially false) that enforces stream integrity:

Encrypt: Successfully encrypting a chunk with is_last = true sets finalized = true. A failed encrypt_chunk(is_last=true) call (e.g., Internal from zstd expansion) does NOT set finalized; the stream is not sealed and the call is retryable (§15.11 atomicity). Once a final chunk has been encrypted successfully, every subsequent call to encrypt_chunk (regardless of is_last) returns InvalidData: the stream is sealed. A reimplementer who allows post-final writes would permit appending chunks to a supposedly-complete stream, breaking the exactly-one-final-chunk invariant (§15.6).
Decrypt (sequential): Successfully decrypting a chunk with tag_byte = 0x01 sets finalized = true. Subsequent calls to decrypt_chunk return InvalidData. This prevents callers from feeding additional chunks after the stream is complete. AEAD failure on the final chunk does NOT set finalized: if decrypt_chunk returns AeadFailed for a chunk whose tag_byte would have been 0x01, finalized remains false. The caller may retry the chunk (e.g., after re-fetching from a corrupted transport) without hitting the post-finalization guard. A reimplementer who sets finalized = true on any tag_byte = 0x01 attempt (including failed ones) prevents retry of a legitimately corrupted final chunk.
Encrypt (random access): encrypt_chunk_at does NOT read or set finalized, and does NOT read or advance next_index. It can be called before, during, or after sequential finalization — the finalized guard that encrypt_chunk enforces is absent. A reimplementer who adds a post-finalization guard to encrypt_chunk_at breaks the mixed sequential/random-access pattern and prevents callers from using parallel encryption alongside a sequential stream. Calling encrypt_chunk_at(is_last=true) after sequential finalization succeeds silently: the library emits a second final chunk (tag_byte = 0x01) with no error. The resulting stream violates the exactly-one-final invariant (§15.6) — a sequential decryptor seals at the first tag_byte = 0x01 and returns InvalidData for all subsequent chunks, including the second final marker. Tracking whether a final chunk has already been emitted is a caller obligation.
Decrypt (random access): decrypt_chunk_at does NOT read or set finalized. It can be called in any order, including after finalized = true, and including on the final chunk. The caller owns completion tracking when using random-access mode. The return value is (plaintext, is_last) where is_last reflects the decoded tag_byte (true if tag_byte == 0x01, false otherwise) — this is a pure read of the chunk's tag byte with no connection to the finalized flag. A reimplementer who omits is_last from the return value or always returns false prevents callers from detecting the final chunk in random-access mode.

The finalized flag is queryable via is_finalized() on both encryptors and decryptors.
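The sequential finalization rules reduce to a small state machine. In this hypothetical sketch `op` stands in for the AEAD step, and `is_last` is known before it runs (the caller's flag on encrypt, the decoded tag_byte on decrypt). The random-access encrypt_chunk_at and decrypt_chunk_at would bypass this state entirely.

```rust
#[derive(Debug, PartialEq)]
enum StreamError {
    InvalidData,
    AeadFailed,
}

#[derive(Default)]
struct SequentialState {
    finalized: bool,
}

impl SequentialState {
    fn is_finalized(&self) -> bool {
        self.finalized
    }

    fn process<T>(&mut self, is_last: bool, op: impl FnOnce() -> Result<T, StreamError>) -> Result<T, StreamError> {
        if self.finalized {
            return Err(StreamError::InvalidData); // stream already sealed
        }
        // A failed final chunk returns here with finalized still false,
        // so the caller can retry it.
        let out = op()?;
        if is_last {
            self.finalized = true;
        }
        Ok(out)
    }
}
```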

Silent truncation when freeing an unfinalized encryptor: Calling soliton_stream_encrypt_free on an encryptor where finalized = false silently destroys the handle and zeroizes the key. The library does NOT return InvalidData or any other error for freeing an unfinalized encryptor; the free operation always succeeds. The resulting stream has no final chunk (tag_byte = 0x01 was never emitted), so any sequential decryptor reading the output will eventually reach EOF without seeing tag_byte = 0x01 and detect truncation. On the encrypting side, however, nothing signals the problem: the free call reports success, so a caller who discards or ships the partially-written output never learns that it is truncated. Callers MUST call is_finalized() before freeing an encryptor and treat a non-finalized free as a programming error. The library cannot emit the final chunk automatically on free: the final chunk carries the actual last plaintext data, which the library does not buffer. An auto-emitted empty final chunk on free would produce a spurious 17-byte trailing chunk that the caller did not request and whose plaintext (empty) may be incorrect for the application. Callers who want to guarantee a final chunk MUST call encrypt_chunk(..., is_last=true) explicitly before freeing.

Freeing an unfinalized sequential decryptor also always succeeds: Calling soliton_stream_decrypt_free on a decryptor where finalized = false (i.e., the stream was only partially consumed — the tag_byte = 0x01 final chunk was never decrypted) silently destroys the handle and zeroizes the key without error. The library does NOT return InvalidData or any error for freeing an unfinalized decryptor. Whether the absent finalization reflects a truncated stream, a partial read, or a transport failure is a caller concern; the library imposes no constraint on consuming the full stream before freeing the handle.

header() is valid immediately after stream_encrypt_init, before the first chunk. The 26-byte header (version + flags + base_nonce) is written once at construction time and never changes. The canonical usage is: init → header() → encrypt_chunk(...) × N. A reimplementer who adds a "not-yet-started" guard (returning an error from header() before the first encrypt_chunk call) breaks protocols that transmit the header before beginning chunk production (e.g., streaming pipelines that open the output channel, write the header, and then encrypt chunks as they arrive). header() is equally valid before the first chunk, between chunks, and after the final chunk, for as long as the handle has not been freed. It is not subject to any state guard: the finalized flag, the next_index counter, and the per-chunk error states are irrelevant. A reimplementer who adds a post-finalization guard to header() also breaks this pattern (retrieving the header after the final chunk is emitted is a common pattern for container formats).

15.11 Random Access

Counter-based nonce derivation enables both encryption and decryption of any chunk without processing preceding chunks.

encrypt_chunk_at (random-access encryption): The symmetric counterpart to decrypt_chunk_at. Encrypts one chunk at an explicit index using the same nonce and AAD construction as encrypt_chunk (§15.3, §15.4), but does not advance next_index or set finalized. The primary use case is parallel encryption: the caller splits the plaintext into chunks, assigns each chunk an index, and dispatches encrypt_chunk_at calls concurrently. Each chunk's nonce and AAD are fully determined by its index and is_last flag together with stream-level constants (version, flags, base nonce, caller_aad), and the key is fixed for the stream. None of these change during parallel execution, so the encrypted chunks can be computed independently and assembled in index order without synchronization. The caller is responsible for:

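Assigning each chunk a unique index. Calling encrypt_chunk_at twice with the same (index, is_last, plaintext) triple produces byte-identical output; this reveals nothing beyond the repetition itself, but it is still a duplicate use of the index (see §15.12 index uniqueness). Calling it twice with the same index and is_last flag but different plaintexts is genuine nonce reuse under one key: the XOR of the two ciphertexts reveals the XOR of the two plaintexts, and the reused Poly1305 one-time key undermines authentication. The library cannot detect either case.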
Marking exactly one chunk as is_last = true. The final-chunk invariant (§15.6 exactly-one-final) is a caller obligation when using encrypt_chunk_at. The library enforces nothing: a caller who marks two chunks as final produces a stream where decrypt_chunk accepts the first tag_byte = 0x01 it encounters and returns InvalidData for all subsequent chunks (§15.10 decrypt sequential finalization guard).
Knowing the total chunk count before encryption begins (to identify which chunk is final). The sequential encrypt_chunk does not require this — is_last is provided per call. For encrypt_chunk_at, the caller must know the chunk count in advance to set is_last correctly on the last chunk.
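The caller-side planning step above can be sketched as follows. This is a hypothetical helper, not part of the library; it assumes every non-final chunk carries exactly CHUNK_SIZE plaintext bytes, that the final chunk may be full-size or shorter, and that an empty plaintext is still encrypted as one empty final chunk, so exactly one job is always marked is_last.

```rust
use std::ops::Range;

/// Plaintext bytes per non-final chunk (CHUNK_SIZE, Appendix A).
const CHUNK_SIZE: usize = 1_048_576;

/// One unit of work for a hypothetical parallel encrypt_chunk_at dispatch.
#[derive(Debug, PartialEq)]
struct ChunkJob {
    index: u64,          // chunk index, starting at 0
    is_last: bool,       // true for exactly one job: the highest index
    range: Range<usize>, // this chunk's bytes within the plaintext
}

/// Splits `total_len` bytes into `chunk_size`-byte chunks; the last one is marked final.
/// Assumes an empty plaintext is still sent as one empty final chunk.
fn plan_chunks(total_len: usize, chunk_size: usize) -> Vec<ChunkJob> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // Ceiling division, but never fewer than one chunk.
    let full = total_len / chunk_size;
    let count = (full + usize::from(total_len % chunk_size != 0)).max(1);
    (0..count)
        .map(|i| {
            let start = i * chunk_size;
            let end = (start + chunk_size).min(total_len);
            ChunkJob { index: i as u64, is_last: i + 1 == count, range: start..end }
        })
        .collect()
}
```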

encrypt_chunk_at takes &self in Rust (immutable borrow), so multiple concurrent calls from safe Rust code (e.g., via rayon::par_iter) are permitted without unsafe — the borrow checker enforces that no mutable state is shared. The CAPI soliton_stream_encrypt_chunk_at uses *const SolitonStreamEncryptor for the same reason: the function does not mutate handle state. The CAPI reentrancy guard (§13.6) still fires on concurrent calls to the same handle, so parallel encryption through the CAPI requires one encryptor handle per thread, all initialized from the same key, AAD, and base nonce. The Rust API has no such restriction.
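The sketch below uses std::thread::scope in place of rayon to stay dependency-free. encrypt_at is a hypothetical closure standing in for an encrypt_chunk_at call on a shared StreamEncryptor; the real method signature is not reproduced here. Workers only call it through a shared reference, so it must be Sync but never needs mutable access, and each worker writes to a disjoint slice of output slots, so results come back in index order without locks.

```rust
use std::thread;

/// Encrypts `chunks` in parallel through a shared, immutable encrypt function and
/// returns the outputs in index order. `encrypt_at(index, is_last, plaintext)` is a
/// hypothetical stand-in for an encrypt_chunk_at-style call taking &self.
fn encrypt_parallel<F>(chunks: &[&[u8]], workers: usize, encrypt_at: &F) -> Vec<Vec<u8>>
where
    F: Fn(u64, bool, &[u8]) -> Vec<u8> + Sync,
{
    let n = chunks.len();
    if n == 0 {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Contiguous block of slots per worker (at least one slot per block).
    let per = ((n + workers - 1) / workers).max(1);
    let mut out: Vec<Vec<u8>> = vec![Vec::new(); n];
    thread::scope(|s| {
        for (w, slots) in out.chunks_mut(per).enumerate() {
            let base = w * per;
            s.spawn(move || {
                for (k, slot) in slots.iter_mut().enumerate() {
                    let i = base + k;
                    // Only the highest index is final.
                    *slot = encrypt_at(i as u64, i + 1 == n, chunks[i]);
                }
            });
        }
    });
    out
}
```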

*const SolitonStreamEncryptor for soliton_stream_encrypt_chunk_at: The CAPI uses *const to reflect the &self Rust contract. The same caveat from soliton_stream_decrypt_chunk_at applies: in C, const T* does NOT mean concurrent calls are safe — the reentrancy guard enforces single-caller access at runtime. Parallel encryption through CAPI always requires separate handles.

Sequential and random-access encryption can be mixed on the same handle: encrypt_chunk_at never modifies next_index or finalized, so it does not disturb interleaved sequential encrypt_chunk calls. For example, a caller can encrypt the first chunks sequentially via encrypt_chunk and produce the remaining indices via encrypt_chunk_at; the sequential counter stays where encrypt_chunk left it, so the caller must not resume sequential encryption afterward. Mixing does not relax index uniqueness (§15.12): an index produced by one API must not be encrypted again with different plaintext by the other, so encrypt_chunk_at cannot be used to patch a chunk's contents in place (that is nonce reuse). "Mixed" means interleaved calls within a single thread in the Rust API; CAPI mixed access requires sequential calls on the same handle due to the reentrancy guard.

Decryption of random-access-encrypted streams: A stream encrypted entirely via encrypt_chunk_at is wire-format identical to one encrypted via encrypt_chunk. The wire format (§15.2) depends only on the key, the header, the caller AAD, and each chunk's (index, is_last, plaintext), not on which encrypt API was used. It can be decrypted via decrypt_chunk (sequential), decrypt_chunk_at (random access), or a mix.

For decryption, locating chunk N's bytes depends on whether the stream is compressed:

No compression: chunk byte offsets are deterministic: STREAM_HEADER_SIZE + N × (CHUNK_SIZE + STREAM_CHUNK_OVERHEAD), using the exact names from Appendix A, where STREAM_CHUNK_OVERHEAD = 17 (1 tag_byte + 16-byte Poly1305 authentication tag). Full expansion: 26 + N × (1,048,576 + 17) = 26 + N × 1,048,593. The tag_byte occupies the first byte of each chunk at this offset; the AEAD ciphertext (the bytes passed to XChaCha20-Poly1305) starts one byte later, at STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE + 1. A seek-and-decrypt implementation that passes the tag_byte to the AEAD primitive as its first input byte gets AeadFailed with no obvious diagnostic. Uncompressed mode is recommended for random-access use cases (video seeking, resumable downloads).
With compression: chunk sizes are content-dependent. The caller must build a chunk-offset index during encryption (accumulate per-chunk output sizes).
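The uncompressed offset arithmetic can be sketched in Rust with the Appendix A constants redefined locally (STREAM_CHUNK_STRIDE is not exported, so it is computed here). The helper names are hypothetical. locate_uncompressed separates the tag_byte offset from the first AEAD byte, and uncompressed_chunk_count rounds up because the final chunk is normally shorter than a full stride.

```rust
/// Appendix A constants (local copies for this sketch).
const STREAM_HEADER_SIZE: u64 = 26;
const CHUNK_SIZE: u64 = 1_048_576;
const STREAM_CHUNK_OVERHEAD: u64 = 17; // tag_byte (1) + Poly1305 tag (16)
const STREAM_CHUNK_STRIDE: u64 = CHUNK_SIZE + STREAM_CHUNK_OVERHEAD; // 1_048_593

/// Byte positions of one chunk inside an uncompressed stream.
#[derive(Debug, PartialEq)]
struct ChunkLocation {
    tag_offset: u64,  // start of the chunk: its tag_byte
    aead_offset: u64, // first byte handed to XChaCha20-Poly1305
    end: u64,         // exclusive end of the chunk
}

/// Locates chunk `n` in a stream of `stream_len` bytes; None if it is past the end
/// or too short to hold tag_byte + Poly1305 tag.
fn locate_uncompressed(stream_len: u64, n: u64) -> Option<ChunkLocation> {
    let tag_offset = n.checked_mul(STREAM_CHUNK_STRIDE)?.checked_add(STREAM_HEADER_SIZE)?;
    let end = tag_offset.checked_add(STREAM_CHUNK_STRIDE)?.min(stream_len);
    if end < tag_offset.checked_add(STREAM_CHUNK_OVERHEAD)? {
        return None;
    }
    Some(ChunkLocation { tag_offset, aead_offset: tag_offset + 1, end })
}

/// Chunk count derived from the byte length of an uncompressed stream.
fn uncompressed_chunk_count(stream_len: u64) -> Option<u64> {
    let body = stream_len.checked_sub(STREAM_HEADER_SIZE)?;
    let full = body / STREAM_CHUNK_STRIDE;
    match body % STREAM_CHUNK_STRIDE {
        0 if full > 0 => Some(full),                   // final chunk is full-size
        0 => None,                                     // header only: no final chunk
        r if r >= STREAM_CHUNK_OVERHEAD => Some(full + 1), // shorter final chunk
        _ => None,                                     // fragment too short to be a chunk
    }
}
```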

Index integrity: A tampered chunk-offset index (pointing to the wrong byte range for a given chunk index) causes AEAD failure, not silent wrong plaintext — both the per-chunk nonce and AAD include the chunk index, so presenting chunk N's ciphertext at index M fails authentication. This holds for both sequential and random-access modes.

decrypt_chunk_at takes an extracted chunk, not the stream tail: The chunk parameter is exactly one encrypted chunk's bytes, starting at its tag_byte. For an uncompressed stream that is the range from offset 26 + N × 1,048,593 up to the start of the next chunk at 26 + (N+1) × 1,048,593, or up to the end of the stream for the final chunk. It is NOT the remaining stream bytes starting at that offset. Passing the stream tail (everything from the chunk's start byte to the end of the stream) does not decrypt correctly: the function expects exactly one chunk, so the trailing bytes cause a length-validation failure or, when the tail happens to fit within the maximum chunk size, an AEAD failure. The caller is responsible for extracting the correct byte range before calling decrypt_chunk_at. For uncompressed streams, the offset formula above gives the exact byte range; for compressed streams, the caller must use the chunk-offset index built during encryption (§15.11 "With compression").
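For compressed streams, the chunk-offset index can be as simple as a running sum of encrypted chunk lengths. A hypothetical caller-side sketch (not a library type): push records each encrypted chunk's length in index order, and chunk returns exactly one chunk's bytes, from its tag_byte up to the next chunk's start or the stream end, never the tail.

```rust
/// Hypothetical chunk-offset index built during encryption.
struct ChunkOffsetIndex {
    starts: Vec<usize>, // starts[i] = offset of chunk i's tag_byte in the stream
    end: usize,         // running stream length (header + chunks pushed so far)
}

impl ChunkOffsetIndex {
    /// Chunk 0 begins right after the 26-byte stream header.
    fn new() -> Self {
        ChunkOffsetIndex { starts: Vec::new(), end: 26 }
    }

    /// Records one encrypted chunk of `len` bytes (tag_byte included), in index order.
    fn push(&mut self, len: usize) {
        self.starts.push(self.end);
        self.end += len;
    }

    /// Exactly one chunk: [start of chunk `index`, start of the next chunk or stream end).
    fn chunk<'a>(&self, stream: &'a [u8], index: usize) -> Option<&'a [u8]> {
        let start = *self.starts.get(index)?;
        let stop = self.starts.get(index + 1).copied().unwrap_or(self.end);
        stream.get(start..stop)
    }
}
```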

The decrypt_chunk_at API accepts a chunk index directly, does not advance the sequential counter, does not set the finalized flag regardless of tag_byte, and can be called on an immutable (&self) reference. Sequential and random-access decryption can be mixed on the same decryptor handle — decrypt_chunk_at never modifies next_index or finalized, so it does not interfere with a sequential pass (e.g., random-access retry of a failed chunk during an otherwise sequential download). "Mixed" means interleaved calls within a single thread, not concurrent multi-threaded access. Parallel chunk decryption requires separate decryptor handles initialized from the same key and header bytes; the CAPI reentrancy guard (§13.6) prevents concurrent calls on the same handle.

*const SolitonStreamDecryptor means "no observable state mutation," not "thread-safe": The CAPI signature uses *const SolitonStreamDecryptor for soliton_stream_decrypt_chunk_at to reflect that the function takes &self in Rust (no state mutation). In C, const T* conventionally signals "safe for concurrent reads," but this guarantee does NOT hold here — the CAPI reentrancy guard (§13.6) fires on any concurrent call regardless of whether the call is read-only. A C binding author who interprets *const as "concurrent calls are safe" and dispatches decrypt_chunk_at from multiple threads on the same handle receives ConcurrentAccess (-18) with no indication that const was the source of confusion. Parallel chunk decryption always requires separate handles, even when all calls are read-side decrypt_chunk_at. The const qualifier signals the Rust-level API contract (immutable borrow), not a C-level concurrency guarantee.

Atomicity: On encryption or decryption failure (AEAD rejection, ChainExhausted, decompression failure, post-finalization guard, Internal from compression expansion check), the encryptor/decryptor state is unchanged — next_index is not advanced and finalized is not set. The operation is retryable (the same chunk can be re-submitted after correcting the input). Unlike ratchet encrypt() (§6.5), per-chunk failures are NOT session-fatal and the streaming key is NOT zeroized on error — retryability requires the key to survive failed calls. The key is zeroized exclusively on handle destruction (soliton_stream_encrypt_free / soliton_stream_decrypt_free). Reimplementers MUST NOT zeroize the streaming key on per-chunk AEAD failure.
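The retry guarantee comes from mutating state only after a chunk fully succeeds. A minimal model of a sequential decryptor's bookkeeping, with the tag_byte parsing and AEAD step abstracted as a closure; SequentialState and ChunkError are hypothetical names that mirror the spec's terms, not the library's types. The model also shows next_index starting at 0 and the post-finalization guard.

```rust
/// Error names mirror the spec; this is not the library's error enum.
#[derive(Debug, PartialEq)]
enum ChunkError {
    AeadFailed,
    ChainExhausted,
    InvalidData,
}

/// Hypothetical bookkeeping of a sequential decryptor. A real decryptor also
/// holds the streaming key here; a failed call must not zeroize it.
struct SequentialState {
    next_index: u64, // zero-based: the first chunk has index 0
    finalized: bool,
}

impl SequentialState {
    fn new() -> Self {
        SequentialState { next_index: 0, finalized: false }
    }

    /// `open(index)` stands in for decrypting one chunk at `index`, returning
    /// (plaintext, is_last). State changes only after it succeeds.
    fn decrypt_chunk<F>(&mut self, open: F) -> Result<Vec<u8>, ChunkError>
    where
        F: FnOnce(u64) -> Result<(Vec<u8>, bool), ChunkError>,
    {
        if self.finalized {
            return Err(ChunkError::InvalidData); // post-finalization guard
        }
        let next = self.next_index.checked_add(1).ok_or(ChunkError::ChainExhausted)?;
        let (plaintext, is_last) = open(self.next_index)?; // on error: nothing mutated
        self.next_index = next;
        self.finalized = is_last;
        Ok(plaintext)
    }
}
```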

Output parameters on error: On any error return from soliton_stream_decrypt_chunk or soliton_stream_decrypt_chunk_at, the output parameters are set to defined values: *out_written = 0 and *is_last = false. Callers MUST check the return code before reading out_written or is_last — on error, these values are sentinels, not results. A caller who reads out_written or is_last without first checking the return code gets 0 and false, which is safe (no buffer overflow, no false finalization signal), but the defined-value guarantee is part of the CAPI contract so reimplementers must provide it. Note: soliton_stream_encrypt_chunk sets only *out_written = 0 on error; there is no is_last output parameter on the encrypt side.
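A reimplementer can satisfy the defined-values contract by writing the sentinels before doing any work. This sketch models the output parameters as &mut references rather than raw C pointers; the function name, the closure, and the convention that 0 means success are assumptions for illustration, and the error code in the test is a placeholder, not a real soliton error code.

```rust
/// Hypothetical wrapper showing defined-on-error outputs for a
/// decrypt_chunk-style call. `inner` stands in for the real decryption and
/// returns (bytes written, is_last) or a negative error code.
fn decrypt_chunk_outputs<F>(out_written: &mut usize, is_last: &mut bool, inner: F) -> i32
where
    F: FnOnce() -> Result<(usize, bool), i32>,
{
    // Sentinels first, so every error path leaves defined values behind.
    *out_written = 0;
    *is_last = false;
    match inner() {
        Ok((n, last)) => {
            *out_written = n;
            *is_last = last;
            0 // assumed success code
        }
        Err(code) => code, // sentinels stay 0 / false
    }
}
```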

15.12 Stream-Level Security Analysis

Cross-stream splicing: Each stream has a unique random base nonce (24 bytes from CSPRNG). Moving a chunk from stream A into stream B at the same index fails AEAD authentication — the per-chunk nonce is derived from the base nonce, so the chunk decrypts under a different nonce in stream B. Moving a chunk to a different index within the same stream also fails — different chunk index produces a different nonce.

Chunk reordering: Per-chunk nonces are deterministic from (base_nonce, index). Swapping chunks i and j fails AEAD because each chunk authenticates under its own index-derived nonce. The sequential decryptor also detects reordering because next_index advances monotonically. next_index starts at 0 (§15.9): the first chunk has index 0. A stream encrypted with N chunks has chunk indices 0 through N−1, and next_index equals N after the final chunk is encrypted. As with the ratchet send_count, which also starts at 0, the index is a zero-based counter: chunk 0 is the first chunk. Reimplementers who initialize next_index = 1 (treating it as a 1-based sequence number) derive the wrong nonce for every chunk from the first one onward.

Truncation: The is_final tag byte (§15.10) detects truncation: a sequential decryptor that reaches EOF without seeing tag_byte = 0x01 knows the stream was truncated. The detection mechanism is is_finalized() == false after transport EOF, not a library error. The library returns errors only per chunk (e.g., AeadFailed for a partial chunk, ChainExhausted for index overflow), so truncation between whole chunks (the transport closes cleanly without delivering the final chunk) produces no library error on any call. The caller detects it by checking is_finalized() after transport EOF: false means the final chunk (tag_byte = 0x01) was never delivered. A reimplementer who adds a check_complete() or flush() API that returns InvalidData for non-finalized state creates an incompatible API; no such call exists in the reference implementation.

Random-access decryptors do not check finalization and cannot detect truncation; callers using random-access mode must verify completeness externally (e.g., via a known chunk count in the stream metadata). For uncompressed streams, the chunk count is derivable from byte length, rounding up because the final chunk is usually shorter than a full stride: ceil((total_bytes − STREAM_HEADER_SIZE) / STREAM_CHUNK_STRIDE). For compressed streams, chunk sizes are content-dependent, so the count is not derivable from byte length and must be stored in the enclosing metadata. This metadata must itself be authenticated, i.e. covered by a ratchet AEAD, a detached signature, or another integrity mechanism. An adversary who controls the metadata channel can substitute a smaller chunk count, making a truncated stream appear complete to a random-access caller that decrypts only the first N chunks. Storing the chunk count in an unauthenticated plaintext field (e.g., a JSON wrapper, an HTTP header) defeats this completeness check entirely.

Standard authenticated placement: include the chunk count in the same ratchet message body that delivers the stream key (§15.1). The ratchet AEAD authenticates the entire message body, so the chunk count inherits authentication without a separate integrity mechanism. This is the recommended pattern; alternatives (detached signature, AEAD-authenticated sidecar) are valid but add complexity for no benefit in the standard composition.
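The sequential EOF rule described under Truncation can be sketched caller-side. read_to_eof and ReadOutcome are hypothetical; decrypt_chunk stands in for sequential decryption of one delivered chunk and reports is_last. Per-chunk errors propagate unchanged, while a clean EOF without a final chunk is reported as Truncated by the caller, not by the library. The trailing-data case mirrors the post-finalization guard.

```rust
/// Caller-side result of reading a stream to transport EOF.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Complete(Vec<u8>),  // final chunk seen, nothing after it
    Truncated,          // EOF before any chunk with is_last = true
    TrailingAfterFinal, // data arrived after the final chunk
}

/// Drains the delivered chunks through `decrypt_chunk`, a hypothetical stand-in for
/// sequential decryption that returns (plaintext, is_last) or a per-chunk error.
fn read_to_eof<I, F, E>(chunks: I, mut decrypt_chunk: F) -> Result<ReadOutcome, E>
where
    I: IntoIterator<Item = Vec<u8>>,
    F: FnMut(&[u8]) -> Result<(Vec<u8>, bool), E>,
{
    let mut plaintext = Vec::new();
    let mut iter = chunks.into_iter();
    while let Some(chunk) = iter.next() {
        let (pt, is_last) = decrypt_chunk(&chunk)?; // per-chunk errors propagate as-is
        plaintext.extend_from_slice(&pt);
        if is_last {
            // Nothing may follow the final chunk.
            return Ok(match iter.next() {
                Some(_) => ReadOutcome::TrailingAfterFinal,
                None => ReadOutcome::Complete(plaintext),
            });
        }
    }
    // Clean EOF while still not finalized: no library error was raised,
    // so the caller must treat the stream as truncated.
    Ok(ReadOutcome::Truncated)
}
```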

Definition of "chunk count": The chunk count is the total number of chunks produced by the encryptor — equivalently, final_chunk_index + 1, where final_chunk_index is the 0-based index of the final chunk (the chunk with tag_byte = 0x01). For a stream with N non-final chunks followed by one final chunk, the chunk count is N + 1. This is also the value of the encryptor's next_index counter immediately after calling encrypt_chunk with is_last = true. An off-by-one (storing final_chunk_index instead of final_chunk_index + 1) silently accepts a stream truncated before the last chunk: a random-access caller decrypting chunks 0 through count − 1 would stop one chunk before the final one, never seeing is_last = true and incorrectly treating the stream as complete.
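The completeness check a random-access caller runs against the authenticated chunk count can be sketched as below. verify_complete is hypothetical, and is_last_at stands in for decrypt_chunk_at at a given index, returning the chunk's is_last flag or None on any error. Requiring is_last exactly at index chunk_count − 1 rejects both the off-by-one count and a count larger than the stream.

```rust
/// Verifies random-access completeness against an authenticated chunk count.
/// `is_last_at(i)` is a hypothetical stand-in for decrypt_chunk_at at index i,
/// returning the chunk's is_last flag, or None on any error.
fn verify_complete<F>(chunk_count: u64, mut is_last_at: F) -> bool
where
    F: FnMut(u64) -> Option<bool>,
{
    if chunk_count == 0 {
        return false; // every stream has exactly one final chunk
    }
    for i in 0..chunk_count {
        match is_last_at(i) {
            // Only the chunk at index chunk_count - 1 may (and must) be final.
            Some(last) if last == (i == chunk_count - 1) => {}
            _ => return false,
        }
    }
    true
}
```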

next_index and chunk count after a failed final call: next_index equals the chunk count only once is_finalized() is true. If the final encrypt_chunk(is_last = true) call fails (e.g., InvalidData for oversized plaintext), next_index is not incremented and the state is unchanged (§15.11 atomicity). Because next_index is internal, callers track the equivalent value themselves, and a caller-managed counter must likewise be incremented only after a successful call: counting attempts rather than successes after a failed final call is off by one.

No CAPI function retrieves the chunk count post-finalization: The CAPI provides soliton_stream_decrypt_expected_index (read the decryptor's sequential counter) but no corresponding soliton_stream_encrypt_expected_index. To implement the recommended metadata pattern (§15.12 authenticated chunk count), CAPI callers must track the chunk count themselves — increment a caller-managed counter on each soliton_stream_encrypt_chunk call. soliton_stream_decrypt_expected_index cannot substitute for a caller-managed counter: this function reads the decryptor's own sequential next_index (the number of chunks it has successfully decrypted), not the encryptor's state. A paired decryptor that has not yet decrypted any chunks returns 0 — it has no visibility into how many chunks the encryptor has produced. Do not use soliton_stream_decrypt_expected_index as a proxy for the encryptor's chunk count. The simplest workaround: maintain an application-level chunk_count variable initialized to 0, increment it after each successful soliton_stream_encrypt_chunk, and embed the final value in the ratchet message alongside the stream key. Neither the Rust API nor the CAPI exposes the encryptor's internal index counter — StreamEncryptor provides only header() and is_finalized() (the symmetric expected_index() exists on StreamDecryptor only, as an asymmetric design choice). CAPI callers and Rust callers alike must implement equivalent counter tracking in application code.
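A minimal sketch of the application-level counter, assuming the caller passes each encrypt call's result to it; ChunkCounter is a hypothetical helper, not a library type. Only successful calls are counted, so a failed final call followed by a successful retry still yields the correct value, and no count is reported until the final chunk has been encrypted.

```rust
/// Hypothetical application-level counter (neither API exposes next_index).
#[derive(Default)]
struct ChunkCounter {
    chunk_count: u64, // successful encrypt calls so far
    finalized: bool,  // set once an is_last = true call succeeded
}

impl ChunkCounter {
    /// Call with the result of each encrypt_chunk call; failures change nothing.
    fn record<T, E>(&mut self, result: &Result<T, E>, is_last: bool) {
        if result.is_ok() {
            self.chunk_count += 1;
            self.finalized |= is_last;
        }
    }

    /// Value to embed next to the stream key; None until the final chunk succeeded.
    fn final_count(&self) -> Option<u64> {
        self.finalized.then_some(self.chunk_count)
    }
}
```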

Chunk deletion (middle): Deleting chunk i causes chunk i+1 to be presented at index i during sequential decryption — AEAD fails (wrong nonce for that ciphertext). Random-access at the original index returns AeadFailed (no ciphertext at that offset, or wrong ciphertext).

encrypt_chunk_at index uniqueness: Calling encrypt_chunk_at twice with the same index and is_last flag on the same encryptor handle reuses that index's nonce, because nonce and AAD are deterministic from the index. With the same plaintext, both calls yield byte-identical ciphertext, which reveals nothing new; with different plaintexts, the two ciphertexts share a nonce under one key, the catastrophic case described under key reuse below. Either way, encrypt_chunk_at offers no write-once enforcement: a caller who accidentally encrypts the same index twice gets no error return. A stream assembled from such duplicate calls contains two ciphertexts for one position; which one the decryptor sees depends on how the caller assembles the stream. Callers using encrypt_chunk_at for parallel encryption MUST ensure each chunk index is used exactly once. The library cannot enforce this, since enforcing it would require shared mutable state, which contradicts the &self contract.
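Since the library cannot enforce uniqueness, a caller can check the job list before dispatch. A hypothetical pre-flight check over (index, is_last) pairs: it rejects out-of-range or duplicate indices and requires exactly one final chunk at the highest index, which together guarantee the indices are exactly 0 through n − 1.

```rust
use std::collections::HashSet;

/// Hypothetical caller-side check for encrypt_chunk_at jobs given as (index, is_last).
fn validate_plan(jobs: &[(u64, bool)]) -> Result<(), String> {
    let n = jobs.len() as u64;
    let mut seen = HashSet::new();
    let mut finals = 0;
    for &(index, is_last) in jobs {
        if index >= n {
            return Err(format!("index {index} out of range 0..{n}"));
        }
        if !seen.insert(index) {
            return Err(format!("index {index} used twice (nonce reuse risk)"));
        }
        if is_last {
            if index != n - 1 {
                return Err(format!("final flag on index {index}, expected {}", n - 1));
            }
            finals += 1;
        }
    }
    // n unique indices, all below n, means exactly 0..n; now require one final chunk.
    if finals != 1 {
        return Err(format!("expected exactly one final chunk, found {finals}"));
    }
    Ok(())
}
```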

Chunk replay: The streaming layer does not provide replay protection — this is mode-dependent. In sequential mode, replaying a chunk that was already successfully decrypted fails incidentally: the sequential decryptor has already advanced next_index past that chunk's position, so presenting the chunk again decrypts it at a different (wrong) index, producing AeadFailed. This is incidental, not by design — the sequential counter provides freshness as a side effect of monotonic advance, not via explicit replay tracking. In random-access mode, decrypting the same (index, chunk) pair twice succeeds both times — decrypt_chunk_at is stateless and has no memory of prior decryptions. A formal modeler asking whether the streaming layer provides authenticated-channel replay resistance gets different answers depending on the mode. Reimplementers MUST NOT add stateful replay tracking to decrypt_chunk_at — its stateless contract is required for mixed sequential/random-access operation (§15.11) and for parallel chunk decryption across multiple handles.

Cross-session replay and key freshness: The in-session counter provides no protection against cross-session replay — presenting an entire stream (header + all chunks) to a fresh decryptor initialized with the same key succeeds without error. The stream has a unique base_nonce, but the decryptor has no memory of prior base nonces. Protection against cross-session replay relies entirely on the key being freshly generated from the OS CSPRNG for each stream (§15.1). The probability that two streams share the same key is negligible with a properly seeded CSPRNG. A reimplementer who derives stream keys deterministically (e.g., from a counter or from fixed material) or who reuses stream keys across sessions loses this protection entirely — cross-session replay becomes trivially possible.

Key reuse across streams: Catastrophic — two streams with the same key and base nonce produce identical per-chunk nonces, enabling XOR of plaintexts. The base nonce is 192 bits from CSPRNG, making accidental collision negligible (~2^-96 birthday bound for 2^48 streams). Callers MUST NOT reuse keys across streams; generate a fresh random key per stream. caller_aad does not substitute for key freshness: using distinct caller_aad values with the same key does not prevent nonce reuse — nonces are derived from the base nonce and chunk index, not from the AAD. Two streams with the same key and same base nonce (birthday collision) produce identical nonces regardless of caller_aad differences. The isolation primitive is the per-stream random key and base nonce, not the AAD.
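The quoted birthday bound can be checked directly. With n streams each drawing an independent, uniform 192-bit base nonce, the probability that any two coincide is at most:

```latex
\Pr[\text{collision}] \le \binom{n}{2}\, 2^{-192} < \frac{n^{2}}{2}\, 2^{-192},
\qquad
n = 2^{48} \;\Rightarrow\; \Pr[\text{collision}] < 2^{96 - 1 - 192} = 2^{-97}
```

This is consistent with, and slightly tighter than, the ~2^-96 figure above. A base-nonce collision only matters when a key is also reused, which is why a fresh per-stream key remains the primary defense.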

Appendix A: Constants

All domain labels and AAD prefixes are raw UTF-8 byte strings — no null terminators, no length prefixes. Concatenation with other fields (fingerprints, header bytes, etc.) is raw byte concatenation unless explicitly annotated otherwise (e.g., KEX info uses length-prefixed fields per §5.4).

AUTH_HMAC_LABEL     = b"lo-auth-v1"          // 10 bytes
KEX_HKDF_INFO_PFX  = b"lo-kex-v1"            // 9 bytes
SPK_SIG_LABEL       = b"lo-spk-sig-v1"        // 13 bytes
INITIATOR_SIG_LABEL = b"lo-kex-init-sig-v1"  // 18 bytes
RATCHET_HKDF_INFO  = b"lo-ratchet-v1"        // 13 bytes
DM_AAD              = b"lo-dm-v1"             // 8 bytes — shared by first-message (§5.4)
                                               // and ratchet-message (§6.5) AAD. Context
                                               // disambiguation is provided by the suffix
                                               // (session-init-bytes vs. ratchet-header-bytes),
                                               // not the label. Cross-context confusion
                                               // (feeding a first-message ciphertext as a
                                               // ratchet message or vice versa) is rejected
                                               // by AEAD: encode_session_init begins with a
                                               // 2-byte BE length prefix (~0x000C) while
                                               // encode_ratchet_header begins with 1216 bytes
                                               // of public key material — the AAD mismatch
                                               // causes tag verification to fail. Future
                                               // message formats needing distinct AEAD contexts
                                               // MUST use a new label.
STORAGE_AAD         = b"lo-storage-v1"        // 13 bytes
DM_QUEUE_AAD        = b"lo-dm-queue-v1"       // 14 bytes — separate label (not DM_AAD with a suffix)
                                               // because the DM queue context has no fixed structural
                                               // suffix to provide disambiguation. DM_AAD is shared
                                               // between first-message and ratchet-message contexts
                                               // because those contexts have structurally distinct
                                               // suffixes (session-init bytes vs. ratchet-header bytes)
                                               // that make cross-context confusion impossible. DM queue
                                               // AAD has no such suffix — using DM_AAD with a queue-
                                               // specific suffix would require a separately-standardized
                                               // encoding convention with the same collision-prevention
                                               // burden as a distinct label. A distinct label is simpler.
CALL_HKDF_INFO      = b"lo-call-v1"           // 10 bytes
PHRASE_HASH_LABEL   = b"lo-verification-v1"  // 18 bytes
PHRASE_EXPAND_LABEL = b"lo-phrase-expand-v1"  // 19 bytes
MSG_KEY_DOMAIN_BYTE = 0x01                    // HMAC domain byte for KDF_MsgKey (§6.3)
                                               // 0x02 reserved — gap buffer between 0x01 (message key
                                               //   derivation) and 0x03; reserved for hypothetical future
                                               //   epoch-key-derived outputs to maintain a consistent gap
                                               //   and prevent contiguous assignment with 0x01.
                                               // 0x03 reserved (prevents collision with call chain bytes 0x04-0x06)
CALL_KEY_A_BYTE     = 0x04                    // HMAC data byte for first call key
CALL_KEY_B_BYTE     = 0x05                    // HMAC data byte for second call key
CALL_CHAIN_ADV_BYTE = 0x06                    // HMAC data byte for next call chain key
MAX_CALL_ADVANCE    = 2²⁴                     // Maximum advance_call_chain steps per call session
                                               // (16,777,216 rekeys). Exceeding this limit returns
                                               // ChainExhausted. Also listed in Appendix B.
                                               // NOT an exported pub const — this is a private
                                               // const in call.rs; importing MAX_CALL_ADVANCE by
                                               // name will fail at link/import time. Binding authors
                                               // must embed the literal value (16_777_216 / 0x100_0000).
CALL_ID_SIZE        = 16                       // 128-bit random call identifier
XWING_CIPHERTEXT_SIZE = 1120                  // X-Wing KEM ciphertext bytes: X25519_eph_pk (32) ||
                                               // ML-KEM-768_ct (1088), LO X25519-first order (§8.1).
                                               // Fixed in lo-crypto-v1; length-prefixed in wire format
                                               // (§7.4) for forward-compat across crypto versions.
HMAC_SHA3_256_BLOCK_SIZE = 136             // NOT an exported pub const — binding authors must
                                               // embed the value 136 directly; importing this name
                                               // will fail at link/import time.
                                               // SHA3-256's Keccak rate (block size) in bytes.
                                               // RFC 2104 HMAC pads/truncates keys to the hash's
                                               // block size — 136 bytes for SHA3-256, NOT the 64
                                               // bytes of SHA-2. A reimplementer using a SHA-2-
                                               // configured HMAC library or hardcoding 64 as the
                                               // block size produces wrong output on every KDF_MsgKey,
                                               // KDF_Root, KDF_Call, and AdvanceCallChain call.
                                               // Standard HMAC libraries handle this automatically
                                               // when SHA3-256 is selected — this constant exists
                                               // for reimplementers building HMAC from primitives
                                               // and for interoperability test vectors (F.25 / T3).
XWING_SEED_SHAKE_OUTPUT = 96              // NOT an exported pub const — binding authors must
                                               // embed the value 96 directly; importing this name
                                               // will fail at link/import time.
                                               // SHAKE256 output length (bytes) for X-Wing seed expansion
                                               // (§8.5, draft-09 §3.2): SHAKE256(seed_32, 96) → d(32)
                                               // || z(32) || sk_X(32). Not used in production keygen
                                               // (which draws three independent CSPRNG values) — used
                                               // exclusively in deterministic test environments and KAT
                                               // reproduction. A reimplementer using SHAKE256(seed, 64)
                                               // would derive only d and z, missing sk_X.
HKDF_ZERO_SALT      = [0x00] × 32   // 32 zero bytes (sequence notation — not integer multiplication)
MAX_RECV_SEEN       = 65536                    // max entries in recv_seen duplicate tracking set
RATCHET_BLOB_VERSION = 0x01                    // current ratchet state serialization version (§6.8).
                                               // `from_bytes` returns `UnsupportedVersion` for any
                                               // version ≠ 0x01. No migration path for unknown versions.
STREAM_HEADER_VERSION = 0x01                   // current streaming AEAD header version (§15.2) — Rust source: STREAM_VERSION
CRYPTO_VERSION      = "lo-crypto-v1"
XWING_LABEL         = 0x5c 0x2e 0x2f 0x2f 0x5e 0x5c  // \.//^\  (label goes LAST in combiner)
STREAM_AAD          = b"lo-stream-v1"          // 12 bytes
STREAM_TAG_NONFINAL = 0x00                     // non-final chunk tag byte. Three roles:
                                               // (1) XOR component in nonce derivation (§15.3) —
                                               //   XORed into mask byte 8, producing a nonce that
                                               //   is distinct from the final-chunk nonce at the
                                               //   same index (0x00 vs 0x01 in byte 8).
                                               // (2) Final-chunk signal — value 0x00 means there
                                               //   are more chunks to follow; the sequential
                                               //   decryptor does not set finalized=true.
                                               // (3) Reader termination — sequential decryptors
                                               //   continue reading chunks as long as tag_byte ≠ 0x01.
STREAM_TAG_FINAL    = 0x01                     // final chunk tag byte. Three roles:
                                               // (1) XOR component in nonce derivation (§15.3) —
                                               //   XORed into mask byte 8, producing a nonce that
                                               //   differs from the non-final nonce at the same index.
                                               //   This prevents the final-chunk ciphertext from being
                                               //   presentable as a valid non-final chunk (the nonces
                                               //   differ, so AEAD would fail if the tag_byte were flipped).
                                               // (2) Final-chunk signal — exactly one chunk per stream
                                               //   has tag_byte=0x01; its presence terminates the stream.
                                               // (3) Reader termination — sequential decryptors set
                                               //   finalized=true and reject any subsequent decrypt_chunk
                                               //   calls when a chunk with tag_byte=0x01 is successfully
                                               //   decrypted.
CHUNK_SIZE          = 1_048_576               // plaintext bytes per non-final chunk (1 MiB).
                                               // Also the minimum output buffer size for
                                               // soliton_stream_decrypt_chunk /
                                               // soliton_stream_decrypt_chunk_at (see Appendix B).
                                               // Rust source: STREAM_CHUNK_SIZE (the exported pub
                                               // const is named STREAM_CHUNK_SIZE, not CHUNK_SIZE;
                                               // this spec uses CHUNK_SIZE as the canonical name).
FLAG_COMPRESSED     = 0x01                    // bits 1-7 reserved (MUST be zero on write,
                                               // collapse to AeadFailed on read per §15.7).
                                               // This flag appears in: §11.1 storage blob header (flags byte),
                                               // §11.2 DM queue blob, §15.2 streaming AEAD header (flags byte),
                                               // §15.5 streaming AAD. In all contexts, bit 0 = compression
                                               // (0 = none, 1 = zstd). Binding authors using the flag
                                               // value directly should define this constant locally.
STREAM_HEADER_SIZE  = 26                      // bytes in the streaming AEAD header (§15.2):
                                               // version (1) + flags (1) + base_nonce (24).
                                               // Used in the random-access offset formula
                                               // (§15.11): offset = STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE.
STREAM_CHUNK_OVERHEAD = 17                    // bytes added per chunk beyond plaintext:
                                               // tag_byte (1) + Poly1305 tag (16).
                                               // An encrypted chunk is: tag_byte (1) ||
                                               // XChaCha20-Poly1305 output (plaintext + 16).
STREAM_CHUNK_STRIDE = 1_048_593               // fixed byte stride between uncompressed chunk
                                               // boundaries: CHUNK_SIZE + STREAM_CHUNK_OVERHEAD
                                               // = 1_048_576 + 17 = 1_048_593.
                                               // Used in the §15.11 random-access offset formula:
                                               // byte_offset(N) = STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE.
                                               // Only valid for uncompressed streams; compressed
                                               // chunk sizes are content-dependent (§15.11).
                                               // NOT an exported pub const — binding authors must
                                               // compute this as CHUNK_SIZE + STREAM_CHUNK_OVERHEAD;
                                               // importing STREAM_CHUNK_STRIDE by name will fail
                                               // at link/import time.
STREAM_ZSTD_OVERHEAD = 256                    // zstd expansion guard for streaming encrypt_chunk (§15.11).
                                               // If zstd output exceeds plaintext.len() + 256, encrypt_chunk
                                               // returns Internal (retryable with compress=false). The value
                                               // is a conservative margin: zstd's worst-case expansion on
                                               // incompressible 1 MiB data is ~50 bytes (frame + block headers);
                                               // 256 provides ~5× headroom. Used in STREAM_ENCRYPT_MAX below.
STREAM_ENCRYPT_MAX  = 1_048_849               // max bytes of CAPI output buffer for one chunk:
                                               // CHUNK_SIZE (1_048_576) + ZSTD_OVERHEAD (256) +
                                               // CHUNK_OVERHEAD (17). Binding authors MUST
                                               // allocate at least this many bytes for the output
                                               // buffer passed to soliton_stream_encrypt_chunk;
                                               // smaller buffers return InvalidLength.
                                              // NOTE: this is the ceiling for the full-CHUNK_SIZE
                                              // case. For a short final chunk (e.g., 100 bytes
                                              // of plaintext), the Internal guard fires if zstd
                                              // expands that chunk beyond 100 + ZSTD_OVERHEAD
                                              // (= 356 bytes), not beyond STREAM_ENCRYPT_MAX.
                                              // The guard is per-actual-plaintext-length, not
                                              // per-CHUNK_SIZE. Such a chunk's encrypted output
                                              // is therefore at most 356 + CHUNK_OVERHEAD
                                              // (= 373 bytes), but the caller MUST still
                                              // allocate at least STREAM_ENCRYPT_MAX to satisfy
                                              // the output-buffer length guard.
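The arithmetic behind these streaming constants can be checked with a short sketch. It computes `STREAM_CHUNK_STRIDE` locally, because the library does not export it. The helper names `chunk_offset` and `max_encrypted_chunk` are hypothetical and are not part of the soliton API.

```rust
// Streaming constants from Appendix A (bytes).
const CHUNK_SIZE: u64 = 1_048_576;
const STREAM_HEADER_SIZE: u64 = 26; // version (1) + flags (1) + base_nonce (24)
const STREAM_CHUNK_OVERHEAD: u64 = 17; // tag_byte (1) + Poly1305 tag (16)
const STREAM_ZSTD_OVERHEAD: u64 = 256;
// Not exported by the library: binding authors derive it themselves.
const STREAM_CHUNK_STRIDE: u64 = CHUNK_SIZE + STREAM_CHUNK_OVERHEAD;
const STREAM_ENCRYPT_MAX: u64 = CHUNK_SIZE + STREAM_ZSTD_OVERHEAD + STREAM_CHUNK_OVERHEAD;

/// Byte offset of chunk `n` in an UNCOMPRESSED stream (§15.11), or None if
/// the offset does not fit in a u64. Compressed streams have no fixed stride.
fn chunk_offset(n: u64) -> Option<u64> {
    n.checked_mul(STREAM_CHUNK_STRIDE)?
        .checked_add(STREAM_HEADER_SIZE)
}

/// Upper bound on the encrypted size of one chunk holding `plaintext_len`
/// bytes: the Internal guard caps zstd output at plaintext_len + 256, and the
/// chunk adds tag_byte + Poly1305 tag on top.
fn max_encrypted_chunk(plaintext_len: u64) -> u64 {
    plaintext_len + STREAM_ZSTD_OVERHEAD + STREAM_CHUNK_OVERHEAD
}
```

For a full chunk, `max_encrypted_chunk(CHUNK_SIZE)` equals `STREAM_ENCRYPT_MAX`. A smaller bound for short chunks does not relax the CAPI buffer requirement.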

Appendix B: Parameters

Parameter	Value
OPK batch size	100
Pre-key low threshold	10
SPK rotation	7 days
Old SPK retention	30 days (from rotation, not generation — §10.2)
Auth challenge timeout	30 seconds (§4.4)
Max recv_seen entries	65536 per epoch
Max epoch length	2^32 - 1 messages
Storage key versions	1-255
Verification phrase	7 words / EFF large wordlist (7,776 words)
Verification phrase entropy	~90.5 bits (7 × log2(7776) ≈ 90.47)
Zstd compression level	Fastest (~1); `ruzstd` 0.8.x limitation
Max plaintext per blob (encrypt)	256 MiB on native; 16 MiB on WASM — `encrypt_blob` returns `InvalidData` if plaintext exceeds the platform limit before compression. This is the caller-provided pre-compression plaintext size, not the post-compression ciphertext size.
Max decompressed blob	256 MiB
Call ID size	16 bytes (128-bit random)
Argon2id version	0x13 (decimal 19 = v1.3, the only version produced and accepted; §10.6)
Argon2id m_cost	8 KiB - 4,194,304 KiB (4 GiB); must be ≥ 8 × p_cost (RFC 9106 §3.1)
Argon2id t_cost	1 - 256
Argon2id p_cost	1 - 256
Argon2id output length	1 - 4,096 bytes
Argon2id salt minimum	8 bytes
Argon2id `secret` (pepper)	Empty (0 bytes) — soliton does not use the Argon2id pepper input; reimplementers MUST pass empty `secret`
Argon2id `ad` (associated data)	Empty (0 bytes) — soliton does not use the Argon2id associated data input; reimplementers MUST pass empty `ad`
Stream chunk size	1 MiB (1,048,576 bytes)
Stream header size	26 bytes (version + flags + nonce)
Stream chunk overhead	17 bytes (tag_byte + Poly1305 tag)
Stream zstd overhead	256 bytes (~5× worst-case margin; zstd worst-case expansion on incompressible data is ~50 bytes for a 1 MiB input: frame header + block headers)
Stream max encrypted chunk	1,048,849 bytes (CHUNK_SIZE + ZSTD_OVERHEAD + CHUNK_OVERHEAD)
Stream decrypt output buffer minimum	1,048,576 bytes (CHUNK_SIZE) — binding authors MUST allocate at least this many bytes for the output buffer passed to `soliton_stream_decrypt_chunk` / `soliton_stream_decrypt_chunk_at`; smaller buffers return `InvalidLength` regardless of actual plaintext size
Stream max chunk index (sequential)	u64::MAX − 1 (guard fires at next_index == u64::MAX)
Stream max chunk index (random-access)	u64::MAX (no guard — any u64 accepted, returns AeadFailed for indices no encryptor produced; see §15.9)
Argon2id `OWASP_MIN` preset	m=19 MiB (19456 KiB), t=2, p=1 — interactive auth
Argon2id `RECOMMENDED` preset	m=64 MiB (65536 KiB), t=3, p=4 — stored keypair
Argon2id `WASM_DEFAULT` preset	m=16 MiB (16384 KiB), t=3, p=1 — WASM targets
Call chain advance limit	2²⁴ steps (16,777,216 rekeys per call session, §6.12). `step_count` starts at 0; the initial keys from `derive_call_keys` are step-0 keys. `ChainExhausted` fires when `step_count` reaches 2²⁴ (i.e., after 16,777,216 `advance()` calls).
WASM decompressed blob limit	16 MiB (WASM targets use a lower limit than the general 256 MiB; §11.3)
Max ratchet serialization epoch	u64::MAX − 1 (epoch u64::MAX triggers ChainExhausted from to_bytes, §6.8)
Ratchet blob deserialization cap	1 MiB (1,048,576 bytes) — CAPI `soliton_ratchet_from_bytes` / `from_bytes_with_min_epoch` reject inputs exceeding this size with `InvalidLength` (-1). Tighter than the general 256 MiB cap; the maximum valid blob is ~530 KB (§6.8). Reimplementers building their own deserialization entry point SHOULD apply an equivalent cap.
`decode_session_init` input cap	64 KiB (65,536 bytes) — `soliton_kex_decode_session_init` rejects inputs exceeding this size with `InvalidLength` (-1). Tighter than the general 256 MiB CAPI cap; the maximum valid session init blob is 4,669 bytes (with OPK; per Appendix C / §7.4).
`build_first_message_aad` input cap	8 KiB (8,192 bytes) — `soliton_kex_build_first_message_aad` rejects `session_init_encoded` inputs exceeding this size with `InvalidLength` (-1). The cap is never reached in practice — the maximum valid `session_init_encoded` blob is 4,669 bytes (with OPK; §7.4 / Appendix C). There is no `associated_data` parameter on this function. Tighter than the general 256 MiB CAPI cap.
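The Argon2id rows above define a set of range checks that a reimplementer's own entry point might apply. The sketch below is a minimal version of those checks. `validate_argon2id` and `ParamError` are hypothetical names, not soliton's API. The sketch covers numeric ranges only: the version (0x13), the empty `secret`, and the empty `ad` are fixed inputs to the Argon2id call itself and are not modeled.

```rust
/// Hypothetical error type for this sketch; not soliton's error enum.
#[derive(Debug, PartialEq)]
enum ParamError {
    MCost,
    TCost,
    PCost,
    OutputLen,
    SaltLen,
}

/// Check Argon2id parameters against the Appendix B ranges (m_cost in KiB).
fn validate_argon2id(
    m_cost_kib: u32,
    t_cost: u32,
    p_cost: u32,
    output_len: usize,
    salt_len: usize,
) -> Result<(), ParamError> {
    if !(1..=256).contains(&t_cost) {
        return Err(ParamError::TCost);
    }
    if !(1..=256).contains(&p_cost) {
        return Err(ParamError::PCost);
    }
    // 8 KiB ..= 4 GiB, and at least 8 KiB per lane (m >= 8 * p, RFC 9106 §3.1).
    // p_cost <= 256 was checked above, so 8 * p_cost cannot overflow.
    if !(8..=4_194_304).contains(&m_cost_kib) || m_cost_kib < 8 * p_cost {
        return Err(ParamError::MCost);
    }
    if !(1..=4096).contains(&output_len) {
        return Err(ParamError::OutputLen);
    }
    if salt_len < 8 {
        return Err(ParamError::SaltLen);
    }
    Ok(())
}

/// Appendix B presets as (m_cost_kib, t_cost, p_cost).
const OWASP_MIN: (u32, u32, u32) = (19_456, 2, 1);
const RECOMMENDED: (u32, u32, u32) = (65_536, 3, 4);
const WASM_DEFAULT: (u32, u32, u32) = (16_384, 3, 1);
```

All three presets satisfy the m ≥ 8 × p rule by a wide margin. The rule matters mainly for hand-tuned parameters with a high lane count.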

HKDF Usage Summary

The three HKDF invocations do not share a uniform salt convention: KDF_KEX uses an all-zero salt, while KDF_Root and KDF_Call both use root_key. Implementers must use the exact salt specified for each KDF.

KDF_Root and KDF_Call both use root_key as the HKDF salt, which an auditor might flag as salt reuse. It is safe here because domain separation is maintained by distinct IKM values (kem_shared_secret for KDF_Root vs kem_ss ‖ call_id for KDF_Call) and distinct info strings ("lo-ratchet-v1" vs "lo-call-v1" ‖ fp_lo ‖ fp_hi). With a shared salt, HKDF-Extract yields distinct PRKs whenever the IKMs differ (with overwhelming probability, since HMAC keyed by the salt is modeled as a PRF), and the info strings add further domain separation in the Expand step. Same-salt reuse therefore introduces no cross-context weakness.

KDF	Salt	IKM	Info	Output
KDF_KEX (§5.4)	`0x00 × 32` (zero salt)	Combined pre-key shared secrets: 64 B without OPK (`ss_ik ‖ ss_spk`); 96 B with OPK (`ss_ik ‖ ss_spk ‖ ss_opk`). The two IKM variants are not interchangeable — a 64 B IKM and a zero-padded 96 B IKM produce different HKDF outputs.	Length-prefixed composite (§5.4) — exception: the `"lo-kex-v1"` (9 B) domain prefix is raw, no length prefix (see §5.4 and Appendix A); only the per-field entries that follow it use `len(x)‖x` encoding	64 B → (rk, ek)
KDF_Root (§6.4)	`root_key`	X-Wing shared secret	`"lo-ratchet-v1"` (raw, 13 B)	64 B → (rk′, ek′)
KDF_Call (§6.12)	`root_key`	`kem_ss ‖ call_id` (raw, 48 B)	`"lo-call-v1" ‖ fp_lo ‖ fp_hi` (raw, 74 B)	96 B → (key_a, key_b, ck)
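The raw KDF_Call inputs in the table can be assembled as shown in this sketch. It assumes a 32-byte `kem_ss`, 32-byte fingerprints, and a 16-byte `call_id`. The HKDF-SHA3-256 computation itself is omitted because it needs a SHA-3 crate; `okm` stands for its 96-byte output. Two further points are assumptions of the sketch: splitting `okm` into three consecutive 32-byte keys in the listed order, and taking `fp_lo`/`fp_hi` as already ordered by the caller per §6.12. The helper names are hypothetical.

```rust
/// IKM for KDF_Call: kem_ss || call_id, raw concatenation (48 bytes).
fn kdf_call_ikm(kem_ss: &[u8; 32], call_id: &[u8; 16]) -> Vec<u8> {
    let mut ikm = Vec::with_capacity(48);
    ikm.extend_from_slice(kem_ss);
    ikm.extend_from_slice(call_id);
    ikm
}

/// Info for KDF_Call: "lo-call-v1" || fp_lo || fp_hi, raw, no length
/// prefixes (10 + 32 + 32 = 74 bytes).
fn kdf_call_info(fp_lo: &[u8; 32], fp_hi: &[u8; 32]) -> Vec<u8> {
    let mut info = Vec::with_capacity(74);
    info.extend_from_slice(b"lo-call-v1");
    info.extend_from_slice(fp_lo);
    info.extend_from_slice(fp_hi);
    info
}

/// Split a 96-byte HKDF output into (key_a, key_b, ck), 32 bytes each.
fn split_call_okm(okm: &[u8; 96]) -> ([u8; 32], [u8; 32], [u8; 32]) {
    let mut key_a = [0u8; 32];
    let mut key_b = [0u8; 32];
    let mut ck = [0u8; 32];
    key_a.copy_from_slice(&okm[0..32]);
    key_b.copy_from_slice(&okm[32..64]);
    ck.copy_from_slice(&okm[64..96]);
    (key_a, key_b, ck)
}
```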

Appendix C: Sizes

Component	Bytes
LO composite public key	3200 — field layout: X-Wing pk (bytes 0-1215) ‖ Ed25519 pk (bytes 1216-1247) ‖ ML-DSA-65 pk (bytes 1248-3199)
LO composite secret key	2496 — field layout: X-Wing sk (bytes 0-2431) ‖ Ed25519 seed (bytes 2432-2463) ‖ ML-DSA-65 seed `ξ` (bytes 2464-2495)
X-Wing public key	1216
X-Wing secret key	2432
X-Wing ciphertext	1120
X-Wing shared secret	32
X25519 scalar (sk)	32
X25519 public key	32
ML-KEM-768 public key (`ek_PKE`)	1184
ML-KEM-768 secret key (expanded, `dk_M`)	2400 (see §8.5 for field layout)
ML-KEM-768 ciphertext	1088
ML-KEM-768 shared secret	32
ML-DSA-65 public key	1952
ML-DSA-65 secret key seed (`ξ`, stored form)	32
ML-DSA-65 expanded signing key (`sk_expanded`, not stored — re-derived from seed at signing time per §8.5)	4032 (FIPS 204 §7.2, ML-DSA-65 sigKeySize)
Auth proof / token (HMAC-SHA3-256 output of LO-Auth)	32
Ed25519 public key	32
Ed25519 secret key seed (stored form)	32
Fingerprint (raw)	32
Fingerprint (hex)	64 chars
Verification phrase	7 words (~90 bits entropy)
Ed25519 signature	64
ML-DSA-65 signature	3309
Hybrid signature	3373
AEAD tag	16
AEAD nonce	24
Storage blob header	26 (version + flags + nonce)
Storage blob minimum	42 (header + Poly1305 tag)
Ratchet blob minimum	195 bytes (§6.8) — any blob shorter MUST be rejected with `InvalidData` without parsing; see §6.8 for field breakdown
Passphrase blob minimum (basic, no prefix)	56 bytes: salt(16) + nonce(24) + tag(16) for empty plaintext (§10.6)
Passphrase blob minimum (basic, with magic prefix)	57 bytes: 0x00 magic(1) + salt(16) + nonce(24) + tag(16) (§10.6)
Passphrase blob minimum (extended, no prefix)	62 bytes: m_cost(4) + t_cost(1) + p_cost(1) + salt(16) + nonce(24) + tag(16) (§10.6)
Passphrase blob minimum (extended, with magic prefix)	63 bytes: 0x01 magic(1) + m_cost(4) + t_cost(1) + p_cost(1) + salt(16) + nonce(24) + tag(16) (§10.6)
Ratchet serialization version	1 byte (0x01 current)
Call ID	16
Call HKDF output	96 (send_key + recv_key + chain_key)
Call encryption key	32
Stream header	26 (version + flags + nonce)
Stream chunk overhead	17 (tag_byte + Poly1305 tag)
Stream min valid stream	43 (header + empty final chunk)
Stream max encrypted chunk	1,048,849 (with zstd overhead) — applies to any chunk, including the final
Stream max final chunk plaintext (decrypt output)	1,048,576 (`CHUNK_SIZE`) — a final chunk's plaintext is `0..=CHUNK_SIZE` bytes; the decrypt output buffer must be at least this size regardless of expected plaintext (§15.6)
Stream uncompressed chunk wire stride	1,048,593 (CHUNK_SIZE + CHUNK_OVERHEAD = 1,048,576 + 17) — the fixed byte stride between chunk boundaries in an uncompressed stream; used by §15.11 random-access offset formula: `offset = 26 + N × 1,048,593`
encode_session_init (no OPK)	3,543 — field breakdown (§7.4): 14 (2 len + 12 `"lo-crypto-v1"`) + 32 (sender_ik_fp) + 32 (recipient_ik_fp) + 1216 (sender_ek / X-Wing pk) + 1122 (2 len + 1120 ct_ik) + 1122 (2 len + 1120 ct_spk) + 4 (spk_id u32 BE) + 1 (has_opk=0x00) = 3,543
encode_session_init (with OPK)	4,669 — adds 1122 (2 len + 1120 ct_opk) + 4 (opk_id u32 BE) = 3,543 + 1,126 = 4,669 (§7.4)
encode_ratchet_header (no KEM ct)	1,225
encode_ratchet_header (with KEM ct)	2,347
encode_prekey_bundle (no OPK)	7,808 — 14 (2 len + 12 `"lo-crypto-v1"`) + 3200 (IK_pub) + 1216 (SPK_pub) + 4 (spk_id u32 BE) + 3373 (SPK_sig) + 1 (has_opk=0x00) = 7,808 (§5.3)
encode_prekey_bundle (with OPK)	9,028 (§5.3): the has_opk byte becomes 0x01 (same 1-byte field), followed by OPK_pub (1216) and opk_id u32 BE (4), so 7,808 + 1,216 + 4 = 9,028
First-message AAD (no OPK)	3,615
First-message AAD (with OPK)	4,741
Ratchet message AAD (no KEM ct)	1,297
Ratchet message AAD (with KEM ct)	2,419
First-message wire prefix, no OPK (encode + sig)	6,916 (3,543 + 3,373)
First-message wire prefix, with OPK (encode + sig)	8,042 (4,669 + 3,373)
First-message encrypted payload minimum	40 (nonce + tag)
Ratchet ciphertext minimum	16 (Poly1305 tag only)
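The composite sizes in this table follow from their component sizes. The sketch below recomputes several of them as Rust constants, with compile-time assertions so that a layout mismatch fails the build. The constant names are illustrative, not exported by soliton. Field order follows the breakdowns in the table.

```rust
// Component sizes (Appendix C), in bytes.
const VERSION_TAG: usize = 2 + 12; // u16 length prefix + "lo-crypto-v1"
const FINGERPRINT: usize = 32;
const XWING_PK: usize = 1184 + 32; // ML-KEM-768 ek_PKE + X25519 pk
const XWING_CT: usize = 1088 + 32; // ML-KEM-768 ct + X25519 ephemeral pk
const LEN_PREFIXED_CT: usize = 2 + XWING_CT; // u16 length prefix + ct
const LO_PK: usize = XWING_PK + 32 + 1952; // X-Wing + Ed25519 + ML-DSA-65
const HYBRID_SIG: usize = 64 + 3309; // Ed25519 + ML-DSA-65
const KEY_ID: usize = 4; // spk_id / opk_id, u32 big-endian
const HAS_OPK: usize = 1;

const SESSION_INIT_NO_OPK: usize =
    VERSION_TAG + 2 * FINGERPRINT + XWING_PK + 2 * LEN_PREFIXED_CT + KEY_ID + HAS_OPK;
// With an OPK: one more length-prefixed ciphertext plus opk_id.
const SESSION_INIT_OPK: usize = SESSION_INIT_NO_OPK + LEN_PREFIXED_CT + KEY_ID;

const BUNDLE_NO_OPK: usize = VERSION_TAG + LO_PK + XWING_PK + KEY_ID + HYBRID_SIG + HAS_OPK;
// has_opk flips to 0x01 (no size change); OPK_pub and opk_id are appended.
const BUNDLE_OPK: usize = BUNDLE_NO_OPK + XWING_PK + KEY_ID;

// Compile-time checks against the table.
const _: () = assert!(SESSION_INIT_NO_OPK == 3_543);
const _: () = assert!(BUNDLE_OPK == 9_028);
```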

ML-KEM-768 expanded secret key sub-field layout: The 2400-byte dk_M field has four sub-fields whose offsets matter for cross-library interoperability (the dk_PKE sub-field uses NTT-domain encoding, diverging from FIPS 203 coefficient-domain). See §8.5 for the full offset table (dk_PKE at 0, ek_PKE at 1152, H(ek_PKE) at 2336, z at 2368) and the byte-for-byte comparison procedure for detecting encoding incompatibilities.

Appendix D: References

Key Agreement and KEM Protocols

X3DH: Marlinspike, M. and Perrin, T. "The X3DH Key Agreement Protocol." Signal, 2016. https://signal.org/docs/specifications/x3dh/ — Basis for LO-KEX's asynchronous key agreement design.
PQXDH: Kret, E. and Schmidt, R. "The PQXDH Key Agreement Protocol." Signal, 2023. https://signal.org/docs/specifications/pqxdh/ — Signal's PQ extension of X3DH. LO-KEX uses X-Wing as the sole KEM rather than adding PQ KEM alongside DH.
Formal Analysis of Signal: Cohn-Gordon, K., Cremers, C., Dowling, B., Garratt, L., and Stebila, D. "A Formal Security Analysis of the Signal Messaging Protocol." Journal of Cryptology, 2020. https://eprint.iacr.org/2016/1013 — The formal analysis LO-KEX should aspire to.
Modular Double Ratchet: Alwen, J., Coretti, S., and Dodis, Y. "The Double Ratchet: Security Notions, Proofs, and Modularization for the Signal Protocol." EUROCRYPT 2019. https://eprint.iacr.org/2018/1037 — Formal treatment of the Double Ratchet as a composition of CKA and symmetric ratchet. Relevant to LO-Ratchet's KEM-based CKA adaptation.
CKA Extension: Alwen, J., Coretti, S., Dodis, Y., and Tselekounis, Y. "Security Analysis and Improvements for the IETF MLS Standard for Group Messaging." CRYPTO 2020. https://eprint.iacr.org/2019/1189 — Extends the CKA framework. Relevant to understanding what security properties a KEM-based CKA (like LO-Ratchet's) must satisfy.
KEM-based X3DH: Brendel, J., Fischlin, M., Günther, F., Janson, C., and Stebila, D. "Towards Post-Quantum Security for Signal's X3DH Handshake." SAC 2020. https://eprint.iacr.org/2020/1353 — Analyzes replacing DH with KEM in X3DH, including authentication asymmetry and IK encapsulation trade-offs.
Formal Verification of KEM-based AKE: Cremers, C., Jacomme, C., and Lukert, P. "Subgroup-Based Key Agreement Protocols and the Security of KEM-based AKE." CRYPTO 2024. https://eprint.iacr.org/2024/1186 — Recent formal verification methodology for KEM-based key agreement. Relevant approach for future Tamarin/ProVerif analysis of LO-KEX.
PQ Asynchronous Key Exchange: Hashimoto, K. "Post-Quantum Asynchronous Deniable Key Exchange and the Signal Handshake." PKC 2024. https://eprint.iacr.org/2023/1720 — Deniability and authentication in PQ adaptations of Signal's handshake.

Hybrid Constructions

Hybrid AKE: Bindel, N., Brendel, J., Fischlin, M., Goncalves, B., and Stebila, D. "Hybrid Key Encapsulation Mechanisms and Authenticated Key Exchange." PQCrypto 2019. https://eprint.iacr.org/2018/903 — Formal treatment of hybrid KEM/AKE, applicable to X-Wing and LO's hybrid signatures.
Hybrid Signatures: Bindel, N., Herath, U., McKague, M., and Stebila, D. "Transitioning to a Quantum-Resistant Public Key Infrastructure." PQCrypto 2017. https://eprint.iacr.org/2017/460 — Parallel "both must verify" composition (as in LO's Ed25519 + ML-DSA-65) is EUF-CMA secure if either component is.
KEM Combiners: Giacon, F., Heuer, F., and Poettering, B. "KEM Combiners." PKC 2018. https://eprint.iacr.org/2018/024 — Formal analysis of concatenate-then-KDF for multiple KEMs (relevant to ss_ik || ss_spk || ss_opk derivation).

Component Algorithms

X-Wing KEM: Connolly, D. et al. draft-connolly-cfrg-xwing-kem-09. https://eprint.iacr.org/2024/039
X25519 (Diffie-Hellman on Curve25519): Langley, A., Hamburg, M., and Turner, S. "Elliptic Curves for Security." RFC 7748, 2016. https://doi.org/10.17487/RFC7748 — Defines Curve25519 Diffie-Hellman (X25519) as used in the X-Wing classical sub-component (§8). Note §5 of RFC 7748: X25519 implicitly clamps the scalar (bits 0-2 of byte 0 cleared, bit 7 of byte 31 cleared, bit 6 of byte 31 set); the reference implementation relies on this clamping behavior and does not apply it separately. Low-order point handling is described in §6.1 and §8.3.
ML-KEM: NIST FIPS 203, 2024. https://doi.org/10.6028/NIST.FIPS.203 — The NTT-domain encoding used for the dk_PKE sub-field of the ML-KEM expanded secret key (§8.5) is defined by FIPS 203 §4.3 (the NTT) together with §4.2.1 (ByteEncode/ByteDecode). Reimplementers investigating the NTT-vs-coefficient divergence should consult these subsections specifically. §5.1 (K-PKE.KeyGen) defines the key generation procedure, which samples the secret vector in the coefficient domain before ByteEncode is applied.
ML-DSA: NIST FIPS 204, 2024. https://doi.org/10.6028/NIST.FIPS.204
Ed25519: Josefsson, S. and Liusvaara, I. "Edwards-Curve Digital Signature Algorithm (EdDSA)." RFC 8032, 2017. https://doi.org/10.17487/RFC8032
Double Ratchet: Perrin, T. and Marlinspike, M. "The Double Ratchet Algorithm." Signal, 2016. https://signal.org/docs/specifications/doubleratchet/
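The RFC 7748 entry above notes that X25519 clamps the scalar implicitly. For readers checking that behavior, the sketch below shows the clamping step from RFC 7748 §5 (decodeScalar25519). It is for reference only: per the entry, soliton relies on the X25519 implementation to do this and does not clamp separately, and `clamp_x25519_scalar` is a hypothetical name.

```rust
/// RFC 7748 §5 scalar clamping, applied to a little-endian 32-byte scalar.
fn clamp_x25519_scalar(mut k: [u8; 32]) -> [u8; 32] {
    k[0] &= 248; // clear bits 0-2 of byte 0 (multiple of the cofactor 8)
    k[31] &= 127; // clear bit 7 of byte 31
    k[31] |= 64; // set bit 6 of byte 31
    k
}
```

Clamping is idempotent, so applying it a second time (once in caller code and once inside the X25519 function) is harmless but redundant.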

Symmetric Primitives

SHA3-256 and SHAKE256: Dworkin, M. "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions." NIST FIPS 202, 2015. https://doi.org/10.6028/NIST.FIPS.202 — Defines the Keccak-based SHA3-256 hash function and the SHAKE256 extendable-output function (XOF). SHA3-256 is used for identity fingerprints, X-Wing combining (§8), HMAC, and HKDF; SHAKE256 is used in X-Wing's ML-KEM-768 seed expansion step (§8.5 — SHAKE256(seed, 96) expands the 32-byte seed to 96 bytes: d(32) || z(32) || sk_X(32), the ML-KEM-768 generation randomness plus the X25519 secret key). A reimplementer who uses SHAKE256(seed, 64) derives only d and z, missing sk_X — the X25519 component. The correct length is given in Appendix A (XWING_SEED_SHAKE_OUTPUT = 96). Note the 136-byte block size (rate) of SHA3-256 vs SHA-2's 64-byte block size, relevant for raw HMAC implementation.
HMAC: Krawczyk, H., Bellare, M., and Canetti, R. "HMAC: Keyed-Hashing for Message Authentication." RFC 2104, 1997. https://doi.org/10.17487/RFC2104 — Defines the HMAC construction. For HMAC-SHA3-256, block size is 136 bytes (SHA3-256 rate), not the SHA-2 value of 64 bytes.
HKDF: Krawczyk, H. and Eronen, P. "HMAC-based Extract-and-Expand Key Derivation Function (HKDF)." RFC 5869, 2010. https://doi.org/10.17487/RFC5869
ChaCha20-Poly1305: Nir, Y. and Langley, A. "ChaCha20 and Poly1305 for IETF Protocols." RFC 8439, 2018. https://doi.org/10.17487/RFC8439
XChaCha20-Poly1305 (HChaCha20 extension): Arciszewski, S. "XChaCha20-Poly1305 Construction." draft-irtf-cfrg-xchacha-03, 2020. https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-xchacha-03 — Defines HChaCha20 (the PRF that extends ChaCha20's 8-byte nonce to 24 bytes). RFC 8439 alone does not define HChaCha20 or XChaCha20; this document is the specification for the 24-byte nonce construction used throughout soliton.
Nonce Reuse: Joux, A. "Authentication Failures in NIST version of GCM." 2006. — Why AEAD nonce reuse is catastrophic (applies to Poly1305 as well as GCM); motivates LO's defense-in-depth random nonce for first messages.
Argon2id: Biryukov, A., Dinu, D., Khovratovich, D., and Josefsson, S. "Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work Applications." RFC 9106, 2021. https://doi.org/10.17487/RFC9106 — Password-based key derivation used in §10.6. The Argon2id variant (hybrid of Argon2i and Argon2d) is specified; do not substitute Argon2i or Argon2d.
Zstandard: Collet, Y. and Kucherawy, M. "Zstandard Compression and the application/zstd Media Type." RFC 8878, 2021. https://doi.org/10.17487/RFC8878 — Compression format used for storage blobs (§11.3) and streaming chunks (§15.5). Pure Rust implementation via the ruzstd crate; no dependency on the reference C library.
STREAM: Hoang, V.T., Reyhanitabar, R., Rogaway, P., and Vizár, D. "Online Authenticated-Encryption and its Nonce-Reuse Misuse-Resistance." CRYPTO 2015. https://eprint.iacr.org/2015/189 — Streaming AEAD construction. LO's streaming API uses counter-based nonce derivation (for random access) rather than STREAM's ciphertext chaining.
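Two byte-level details from the SHA-3 and HMAC entries are easy to get wrong. The first is the 96-byte SHAKE256 expansion split into d, z, and sk_X. The second is HMAC-SHA3-256 padding to the 136-byte rate rather than SHA-2's 64 bytes. The sketch below shows both without implementing SHA-3 itself: it assumes the SHAKE256 output and the final hash calls come from a separate SHA-3 implementation. The function names are hypothetical.

```rust
/// SHA3-256 rate (HMAC block size) in bytes; SHA-256's block is 64.
const SHA3_256_RATE: usize = 136;
/// Appendix A: SHAKE256 output length for X-Wing seed expansion.
const XWING_SEED_SHAKE_OUTPUT: usize = 96;

/// Split SHAKE256(seed, 96) into (d, z, sk_X): d and z seed ML-KEM-768
/// keygen, sk_X is the X25519 secret key. A 64-byte expansion has no sk_X.
fn split_xwing_expansion(e: &[u8; XWING_SEED_SHAKE_OUTPUT]) -> (&[u8], &[u8], &[u8]) {
    (&e[0..32], &e[32..64], &e[64..96])
}

/// RFC 2104 inner/outer pads for HMAC-SHA3-256, sized to the 136-byte rate.
/// Keys longer than the rate must first be hashed with SHA3-256; that needs
/// a SHA-3 implementation, so this sketch returns None for them.
/// HMAC(K, m) = SHA3-256(opad_key || SHA3-256(ipad_key || m)).
fn hmac_sha3_256_pads(key: &[u8]) -> Option<([u8; SHA3_256_RATE], [u8; SHA3_256_RATE])> {
    if key.len() > SHA3_256_RATE {
        return None;
    }
    // Zero-padding the key to the block size is implicit: bytes past the key
    // length keep the bare 0x36 / 0x5c pad values.
    let mut ipad = [0x36u8; SHA3_256_RATE];
    let mut opad = [0x5cu8; SHA3_256_RATE];
    for (i, &b) in key.iter().enumerate() {
        ipad[i] ^= b;
        opad[i] ^= b;
    }
    Some((ipad, opad))
}
```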

General

SoK: Secure Messaging: Unger, N. et al. IEEE S&P 2015. https://doi.org/10.1109/SP.2015.22 — Covers TOFU, forward secrecy, deniability. Useful for positioning LO's design choices.
Post-Quantum Key Exchange / OQS: Stebila, D. and Mosca, M. "Post-Quantum Key Exchange for the Internet and the Open Quantum Safe Project." SAC 2016. https://doi.org/10.1007/978-3-319-69453-5_2 — Background on post-quantum key exchange design; liboqs originates from this project.
NIST PQC Standardization: https://csrc.nist.gov/projects/post-quantum-cryptography
EFF Wordlist: Electronic Frontier Foundation large wordlist for passphrase generation (7,776 words). https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases — The embedded copy is the July 2016 version (108,800 bytes, 7776 lines, LF line endings, dice-number prefix stripped). The hash is computed over the file's raw bytes with LF (\n) line endings; CRLF-normalized copies have different byte lengths and a different hash. SHA3-256 of the raw file: a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5. Reimplementers MUST verify that their wordlist matches this hash, because different versions or copies of the "EFF large wordlist" produce different phrases for the same indices.

Windows CRLF trap: On Windows with core.autocrlf=true, Git converts LF to CRLF on checkout. After dice-prefix stripping, each line becomes WORD\r, so every word gains a trailing carriage return (0x0D). The wordlist hash detects this when it is verified on the embedded bytes (the embedded CRLF copy produces a different hash). If stripping and embedding instead happen at runtime from a file, rather than at build time with a compile-time assertion, the \r appears silently in every word: phrases differ from conforming implementations with no error indicator. Implementations that load the wordlist from a file MUST strip any trailing \r (0x0D) from each line before use, in addition to stripping the dice prefix.

Word lookup is case-insensitive; canonical form is lowercase: All words in the embedded wordlist are lowercase ASCII, and lowercase is both the canonical stored form and the form used for index derivation. When looking up a user-entered word (e.g., during phrase verification), comparisons MUST be case-insensitive: "Abacus", "ABACUS", and "abacus" all resolve to the same word. Implementations MUST normalize user input to lowercase before lookup rather than expect the user to type in exact case. A case-sensitive comparison would reject correctly entered phrases from users who capitalize the first word or type in all caps.

Raw file format and stripping step: Each line in the original EFF file has the format DDDDD\tWORD\n: a 5-digit decimal dice number (e.g., "11111"), a literal tab character (\t), the word (e.g., "abacus"), and a LF newline. Soliton strips the prefix by discarding every character up to and including the first tab on each line, retaining only the word. The resulting embedded wordlist is one word per line with no dice prefix, no tab, and no trailing whitespace. A reimplementer who splits on whitespace and takes the last token produces the same words, but must still verify against the hash. A reimplementer who strips only the digits leaves a leading tab on every word, and one who takes the first token instead of the last gets the dice number, not the word; both are silent interop failures.
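The stripping, CRLF, and case rules above fit in a few lines. The sketch below is a minimal version assuming the wordlist is available as text. `strip_eff_line`, `word_index`, and `phrase_entropy_bits` are hypothetical helpers, not soliton functions. The entropy helper gives the Appendix B figure under the assumption that word indices are uniformly distributed.

```rust
/// Reduce one raw EFF line ("DDDDD\tWORD", optionally ending in "\n" or
/// "\r\n") to WORD: drop everything through the first tab, then drop any
/// trailing CR/LF so a CRLF checkout cannot leak "\r" into the words.
fn strip_eff_line(line: &str) -> Option<&str> {
    let (_dice, word) = line.split_once('\t')?;
    Some(word.trim_end_matches(|c: char| c == '\r' || c == '\n'))
}

/// Case-insensitive lookup: the list is lowercase ASCII, so the user's
/// input is lowercased before an exact comparison.
fn word_index(wordlist: &[&str], input: &str) -> Option<usize> {
    let needle = input.to_ascii_lowercase();
    wordlist.iter().position(|w| *w == needle)
}

/// Entropy in bits of `words` words drawn uniformly from `list_len` words.
fn phrase_entropy_bits(words: u32, list_len: u32) -> f64 {
    f64::from(words) * f64::from(list_len).log2()
}
```

For example, `word_index(&list, "ABACUS")` and `word_index(&list, "abacus")` return the same index, and `phrase_entropy_bits(7, 7776)` is about 90.47.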

Appendix E: Implementor's Guide

This appendix consolidates security-critical requirements scattered throughout the specification into a single reference for binding authors and application developers.

RNG Requirements

All randomness must come from the OS CSPRNG (getrandom on Linux, CryptGenRandom on Windows, SecRandomCopyBytes on macOS/iOS). There is no fallback mechanism — RNG failure is fatal.

The following operations consume randomness:

Operation	Randomness consumed	Section
`generate_identity`	Ed25519 keygen, X25519 keygen, ML-KEM-768 keygen, ML-DSA-65 seed	§2.1, §3.1
`xwing::keygen`	X25519 keygen, ML-KEM-768 keygen	§2.3
`xwing::encapsulate`	X25519 ephemeral scalar, ML-KEM-768 encap coins	§2.3
`HybridSign`	ML-DSA-65 hedged `rnd` (32 bytes, ephemeral, zeroized after `Sign_internal` returns — §3.1)	§3.1
`encrypt_first_message`	192-bit random nonce	§5.4
KEM ratchet step (send)	`xwing::keygen` + `xwing::encapsulate`	§6.4
`auth_challenge`	`xwing::encapsulate`	§4.2
Call ephemeral KEM	`xwing::keygen` + `xwing::encapsulate`	§6.12
`stream_encrypt_init`	192-bit random base nonce	§15.1
Stream key (caller)	256-bit random key (one per stream, MUST NOT be derived from ratchet material)	§15.1

Failure Semantics

Operation	Error	Rollback	State after	Retryable?
`encrypt()`	AEAD failure	Session-fatal. All session keys zeroized as defense-in-depth — a transient AEAD failure followed by retry could produce valid encryption with compromised internal state. Send counter is not incremented (§6.5), but the session is irrecoverable after key zeroization.	Permanently unusable — discard session.	No — new session required.
`decrypt()`	`InvalidData` (dead session: zeroed `root_key`)	Returns before snapshot — no state mutation occurs.	Unchanged.	No — session is permanently dead (post-reset or deserialized from zeroed state). New session required.
`decrypt()`	`InvalidData` (missing `prev_recv_epoch_key`)	Returns before snapshot — no state mutation occurs.	Unchanged.	No — structural error in the message/state combination.
`decrypt()`	`ChainExhausted` (header `n == u32::MAX`)	Returns after snapshot but before any state mutation — rollback is a no-op. The guard is the first operation inside the inner decryption function, before epoch routing, KEM ratchet, or key derivation. The §6.6 pseudocode shows it before the snapshot for presentational clarity; both orderings are correct since no mutations precede the guard.	Unchanged.	No — counter value is inherent to the message.
`decrypt()`	`AeadFailed`	Full snapshot/rollback (§6.6). State is restored to pre-decrypt values.	Unchanged.	Yes — caller may retry with different messages.
`decrypt()`	`DuplicateMessage`	Full snapshot/rollback (§6.6). Rollback is a no-op: `DuplicateMessage` can only occur in previous-epoch or current-epoch paths where no state fields are modified before the duplicate check.	Unchanged.	No — message was already processed.
`decrypt()`	`ChainExhausted` (`recv_seen` cap)	Full snapshot/rollback (§6.6).	Unchanged.	No — epoch's `recv_seen` set is full (65536 entries). For current-epoch saturation: requires peer to send from a new epoch (one KEM ratchet step = one direction change). For `prev_recv_seen` saturation: the next KEM ratchet step copies the current `recv_seen` into `prev_recv_seen` — if the current set is also full, `prev_recv_seen` remains saturated after the step. Full recovery from `prev_recv_seen` saturation may require two direction changes (the first rotates current into previous; the second discards the saturated previous).
`decrypt()`	`InvalidData` (`send_ratchet_sk` is None in new-epoch path)	Returns after snapshot but before any state mutation — rollback is a no-op. The new-epoch path checks `send_ratchet_sk` presence before performing the KEM ratchet receive step (the `else if NOT current_epoch:` block in §6.6).	Unchanged.	No — same message will fail again on any ratchet state (structurally malformed: new-epoch message requires decapsulation with local secret key).
`decrypt()`	`InvalidData` (`kem_ct` absent in new-epoch message)	Returns after snapshot but before any state mutation — rollback is a no-op.	Unchanged.	No — same message will fail again (structurally malformed: new-epoch header lacks KEM ciphertext).
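The snapshot/rollback behavior in the table can be illustrated with a generic helper. The following is a sketch under stated assumptions: `with_rollback` is a hypothetical name, not soliton's internal code. A real ratchet state holds secret keys, so its snapshot would also need to be zeroized on drop (e.g., through a `ZeroizeOnDrop` type), which this std-only sketch does not do.

```rust
/// Run a fallible mutation with snapshot/rollback semantics: on any error
/// the state is restored to exactly its pre-call value; on success the
/// mutations are kept and the snapshot is discarded.
fn with_rollback<S: Clone, T, E>(
    state: &mut S,
    op: impl FnOnce(&mut S) -> Result<T, E>,
) -> Result<T, E> {
    let snapshot = state.clone();
    let result = op(state);
    if result.is_err() {
        *state = snapshot;
    }
    result
}
```

With this shape, an error raised after partial mutation (e.g., `AeadFailed` after a KEM ratchet step) leaves the state unchanged, which matches the "Unchanged" column above.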

Streaming AEAD Failure Semantics

Key differences from ratchet encrypt/decrypt:

Operation	Error	State after	Retryable?
`encrypt_chunk`	`AeadFailed`	Unchanged — `next_index` not advanced, `finalized` not set.	Yes — retry with the same plaintext. Note: in practice, `AeadFailed` from `encrypt_chunk` is structurally unreachable — XChaCha20-Poly1305 encrypt can only fail on usize overflow (§7.1), which cannot occur with chunk sizes bounded by `CHUNK_SIZE` (1,048,576 bytes). If encountered, it indicates an unexpected integer overflow in the AEAD layer, not a transient condition.
`encrypt_chunk`	`ChainExhausted` (`next_index == u64::MAX`)	Unchanged — guard fires before any state mutation.	No — `next_index` cannot advance further. The handle is not freed; call `soliton_stream_encrypt_free`.
`encrypt_chunk`	`Internal` (zstd expansion > `STREAM_ZSTD_OVERHEAD`)	Unchanged — guard fires before AEAD.	Yes — retry the same chunk with `compress = false`. This is not session-fatal; the streaming key is not zeroized.
`encrypt_chunk`	`InvalidData` (post-finalization or bad chunk size)	Unchanged.	No — structural caller error.
`decrypt_chunk`	`AeadFailed`	Unchanged — `next_index` not advanced, `finalized` not set.	Yes — retry or skip; the decryptor survives.
`decrypt_chunk`	`ChainExhausted`	Unchanged.	No — same semantics as encrypt-side.
`decrypt_chunk`	`InvalidData` (post-finalization, framing, or version mismatch)	Unchanged.	No.

Critical differences from ratchet:

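No streaming failure destroys the key: the streaming key is NEVER zeroized on a per-chunk error (unlike ratchet encrypt(), where AeadFailed zeroizes all keys and makes the session permanently unusable), and handle state is unchanged after every error in the table above. Retryable errors succeed when retried with correct input (AeadFailed on decrypt with an untampered chunk; Internal on encrypt with compress = false). ChainExhausted and InvalidData are not retryable, but they leave the handle intact.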
Internal from compression expansion is retryable — pass compress = false for the affected chunk. This is an encode-side-only path (no oracle concern); no session state is affected.
ChainExhausted from a streaming handle does not affect the ratchet state in any way — the two are independent.
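Per the streaming failure table, Internal on encrypt is the one error a caller can recover from mechanically: it re-sends the same chunk with compression disabled. The sketch below shows that control flow. `ChunkEncryptor`, `StreamError`, and the mock are hypothetical stand-ins, not the soliton Rust or CAPI signatures.

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum StreamError {
    AeadFailed,
    ChainExhausted,
    Internal,
    InvalidData,
}

/// Hypothetical stand-in for a streaming encryptor binding.
trait ChunkEncryptor {
    fn encrypt_chunk(&mut self, plaintext: &[u8], compress: bool) -> Result<Vec<u8>, StreamError>;
}

/// Try compressed first. On Internal (zstd expanded past the
/// STREAM_ZSTD_OVERHEAD guard), retry the same chunk uncompressed. Other
/// errors are returned as-is: the handle stays usable, but ChainExhausted
/// and InvalidData will not succeed on a plain retry.
fn encrypt_chunk_with_fallback<E: ChunkEncryptor>(
    enc: &mut E,
    plaintext: &[u8],
) -> Result<Vec<u8>, StreamError> {
    match enc.encrypt_chunk(plaintext, true) {
        Err(StreamError::Internal) => enc.encrypt_chunk(plaintext, false),
        other => other,
    }
}

/// Test double that behaves like incompressible input: compressed attempts
/// report Internal; uncompressed attempts succeed.
struct IncompressibleMock {
    calls: Vec<bool>,
}

impl ChunkEncryptor for IncompressibleMock {
    fn encrypt_chunk(&mut self, plaintext: &[u8], compress: bool) -> Result<Vec<u8>, StreamError> {
        self.calls.push(compress);
        if compress {
            Err(StreamError::Internal)
        } else {
            Ok(plaintext.to_vec()) // placeholder for real ciphertext
        }
    }
}
```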

stream_encrypt_free / stream_decrypt_free outer-null behavior differs from the ratchet, keyring, and call-keys free functions. soliton_ratchet_free, soliton_keyring_free, and soliton_call_keys_free treat an outer-null pointer as a safe no-op (return 0), whereas soliton_stream_encrypt_free and soliton_stream_decrypt_free return NullPointer (-13). The rationale: soliton_stream_encrypt_init and soliton_stream_decrypt_init always write a non-null handle on success, so a null outer pointer does not arise from normal use (if init succeeded, the handle is valid; if init failed, the output is left unchanged). An outer-null pointer passed to a stream free function therefore signals a caller bug, such as passing an uninitialized pointer or the wrong variable. A null outer pointer passed to the ratchet, keyring, or call-keys free functions, by contrast, more plausibly comes from defensive cleanup patterns. A reimplementer who makes all free functions return 0 for outer-null will diverge silently. Conversely, a binding author who tests for NullPointer on outer-null will find that the same check never fires for the non-streaming free functions, which return 0.

Caller Obligations

Fingerprint → key resolution: The caller is responsible for mapping identity fingerprints to authentic public keys. Incorrect resolution causes §5.5 Step 1 (fingerprint mismatch) or Step 3 (signature verification failure) to fail; the session does not establish silently with a wrong key. The library provides fingerprint_hex() and verification phrases (§9) but does not manage identity stores, TOFU, or key pinning.

receive_session fingerprint mismatch returns InvalidData, not BundleVerificationFailed: receive_session is called with a parsed SessionInit (not a bundle), so BundleVerificationFailed is not applicable. A fingerprint mismatch (sender or recipient fingerprint does not match expected values) in receive_session returns InvalidData. BundleVerificationFailed applies only to verify_bundle (§5.3), which also collapses crypto-version mismatches and signature failures to BundleVerificationFailed to prevent an enumeration oracle. Callers who pattern-match on the error from receive_session expecting BundleVerificationFailed will never match it — all fingerprint validation failures from receive_session arrive as InvalidData.
OPK deletion: Must happen before the ratchet state is used for messaging (§5.5 Step 4, §10.3). Failure to delete allows an attacker who later compromises the OPK to recover the session key.
SPK rotation: 7-day cycle with 30-day grace period for old secret keys (§10.2). Stale SPKs reduce forward secrecy.
Secret key zeroization: IdentitySecretKey, xwing::SecretKey, and shared secrets implement ZeroizeOnDrop in Rust. CAPI callers must free library-allocated buffers via soliton_buf_free and zeroize caller-owned copies of secret material via soliton_zeroize — standard C memset may be optimized out. Failing to do either leaks key material.
Concurrency: All opaque CAPI handles (SolitonRatchet, SolitonKeyRing, SolitonCallKeys, SolitonStreamEncryptor, SolitonStreamDecryptor) are not thread-safe. Each handle embeds an AtomicBool reentrancy guard — concurrent calls on the same handle return ConcurrentAccess (-18) rather than corrupting state. This is a diagnostic for caller threading bugs, not a retriable error. Callers must serialize access per handle (e.g., mutex). Concurrent use of different handles is safe. SolitonKeyRing is particularly deceptive: a server encrypting storage blobs for multiple users might naturally share a single keyring across threads, but this will trigger ConcurrentAccess. Create one keyring per thread instead.
Storage decompression: The 256 MiB decompression limit (§11.3) is enforced internally. Callers need not guard against zip bombs.
Stream key freshness: Each stream key MUST be freshly generated from the OS CSPRNG (random_bytes(32)). Do not derive stream keys from ratchet material (epoch key, root key, call key) — a ratchet epoch compromise would propagate to all streams whose keys were derived from the compromised epoch, defeating per-stream isolation. The standard composition: generate a random key, encrypt the stream, then transmit the key inside a ratchet-encrypted message alongside stream metadata (§15.1).
Auth shared-secret zeroization: The shared secret returned by auth_respond (§4.2) must be zeroized immediately after use. In Rust, auth_respond returns Zeroizing<[u8; 32]> (automatic). CAPI callers receive the shared secret in a caller-provided buffer and must call soliton_zeroize on it after consuming the value. Failure to zeroize leaves the authentication shared secret on the heap, recoverable via memory scanning.
Argon2id parameter coupling: m_cost must be ≥ 8 × p_cost (RFC 9106 §3.1). This constraint is enforced at the library level — soliton_argon2id returns InvalidData for combinations where m_cost < 8 × p_cost (e.g., m_cost=100, p_cost=100). Per-parameter range checks (m_cost ∈ [8, 4,194,304], p_cost ∈ [1, 256]) pass individually for such combinations, so a binding author who validates parameters against per-field bounds only will not discover the error until the library call returns InvalidData. The coupling check must be performed in addition to the individual range checks (§10.6 / Appendix B parameter limits).
Old SPK secret key zeroization: After the 30-day SPK retention window (§10.2), the old SPK secret key MUST be zeroized and discarded. Retaining old SPK private keys beyond the retention window allows an attacker who later compromises those keys to decrypt sessions established with the corresponding SPK, retroactively breaking forward secrecy for the retention period. The library does not manage SPK lifecycle or trigger zeroization automatically — this is the caller's responsibility. See §10.2 for the rotation schedule.
Ephemeral ek_sk zeroization after ratchet init: The X-Wing ephemeral secret key (ek_sk, 2432 bytes) in SolitonInitiatedSession is passed to soliton_ratchet_init_alice and must be zeroized and freed immediately after. soliton_kex_initiated_session_free performs both the zeroization and deallocation. Do not retain ek_sk after init_alice returns — the ratchet has absorbed the public key counterpart (ek_pk); the private key is no longer needed and its continued presence in memory extends the window for side-channel or memory-dump attacks. The same obligation applies if ratchet_init_alice returns an error: zeroize and free ek_sk before retrying or cleaning up. See §13.5 for the single-use enforcement note.
Persistent session deserialization MUST use from_bytes_with_min_epoch, not bare from_bytes: When deserializing a persisted ratchet state (§6.8), callers MUST use soliton_ratchet_from_bytes_with_min_epoch (passing the epoch value stored on the last successful to_bytes call as min_epoch). Using bare from_bytes / soliton_ratchet_from_bytes silently accepts a rolled-back epoch, permanently disabling anti-rollback protection — an attacker who can substitute an earlier blob snapshot will cause message-key replay. Bare from_bytes is provided for initial deserialization only (when no prior epoch exists — specifically, when the application has never successfully completed a to_bytes call on this session and therefore has no persisted min_epoch value to supply), not for routine restore-from-disk operations. Binding-layer obligation: soliton_ratchet_from_bytes in soliton.h has no deprecation marker in the C header — the deprecation exists only at the Rust API level. Binding authors MUST add a language-level deprecation annotation when wrapping this function (e.g., @deprecated in Java/Kotlin, [Obsolete] in C#, #[deprecated] in Rust re-exports, a deprecation warning in Python docstrings) so that callers of the binding receive the same guidance as callers of the Rust API. A session that has been serialized at least once always has a min_epoch to supply; that value MUST be stored persistently alongside the encrypted ratchet blob. Losing the min_epoch store (e.g., due to a crash or storage error) does NOT qualify as "no prior epoch" — the correct response is to treat the session as potentially compromised (reset it and re-establish via LO-KEX), not to fall back to bare from_bytes. Callers who always use bare from_bytes will never observe an error from anti-rollback rejection, even when a rollback attack is in progress.
recv_seen saturation recovery requires a peer KEM ratchet step, not local state manipulation: When decrypt() returns ChainExhausted due to recv_seen saturation (65536 entries — §6.8), the correct recovery path is to wait for the peer to send a new message from a new epoch (a KEM ratchet step, triggered by sending in a direction that requires a new ephemeral key). The recv_seen set resets automatically on the next KEM ratchet step — no explicit caller action is needed. Callers MUST NOT attempt to clear or manipulate the recv_seen set directly; the ratchet state provides no API for this, and the set's integrity is essential for replay protection. An application that interprets ChainExhausted from decrypt() as session-fatal will incorrectly terminate a recoverable session — see §12 for the full per-mode ChainExhausted breakdown.
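Two of the obligations above reduce to small decision rules that a binding layer can enforce before calling into the library. The sketch below is illustrative only: validate_argon2id_params, RestorePath, and choose_restore_path are hypothetical helpers, not part of soliton_capi, and the persisted min_epoch is assumed to be a u64 stored next to the encrypted ratchet blob.

```rust
/// Hypothetical pre-flight check mirroring the Argon2id limits above:
/// per-field ranges first, then the RFC 9106 coupling m_cost >= 8 * p_cost.
pub fn validate_argon2id_params(m_cost: u32, p_cost: u32) -> Result<(), &'static str> {
    if !(8..=4_194_304).contains(&m_cost) {
        return Err("m_cost out of range");
    }
    if !(1..=256).contains(&p_cost) {
        return Err("p_cost out of range");
    }
    // Widen before multiplying so the product cannot overflow.
    if u64::from(m_cost) < 8 * u64::from(p_cost) {
        return Err("m_cost must be at least 8 * p_cost");
    }
    Ok(())
}

/// Which deserialization path to take when restoring a persisted ratchet.
#[derive(Debug, PartialEq)]
pub enum RestorePath {
    /// No to_bytes call has ever succeeded for this session, so bare from_bytes is allowed.
    FromBytesInitial,
    /// Routine restore: pass the persisted epoch to from_bytes_with_min_epoch.
    WithMinEpoch(u64),
    /// Serialized before, but the min_epoch record is gone: reset and re-run LO-KEX.
    ResetAndReestablish,
}

/// Hypothetical policy helper; both inputs come from the application's own storage.
pub fn choose_restore_path(ever_serialized: bool, stored_min_epoch: Option<u64>) -> RestorePath {
    match (ever_serialized, stored_min_epoch) {
        (_, Some(epoch)) => RestorePath::WithMinEpoch(epoch),
        (false, None) => RestorePath::FromBytesInitial,
        // A lost epoch store never justifies falling back to bare from_bytes.
        (true, None) => RestorePath::ResetAndReestablish,
    }
}
```

For example, m_cost = 100 with p_cost = 100 passes both per-field range checks but fails the coupling check, which is the combination the library reports as InvalidData.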

Constant-Time Requirements

Operation	Requirement	Implementation
`auth_verify`	Constant-time comparison	`subtle::ConstantTimeEq`
AEAD tag verification	Constant-time	Handled by `chacha20poly1305` crate
`hybrid_verify`	Constant-time AND combination of Ed25519 + ML-DSA results	`subtle::Choice` bitwise AND (§3.2) — short-circuit `&&` leaks which component failed, enabling targeted forgery. `Err` returns must not cause early exit before both verifications complete: libraries that return `Err` (rather than `Ok(false)`) on verification failure (e.g., for malformed-length signatures) must be wrapped so that both components are evaluated before any error is returned — an early `?` propagation on the first component's `Err` skips the second component entirely, leaking which half failed. The reference wraps both calls to produce `subtle::Choice` values before combining; a reimplementer must apply the same pattern when the underlying library uses error returns rather than boolean-valued verification results.
Epoch identification (§6.6)	Constant-time public key comparison	`subtle::ConstantTimeEq` on `ratchet_pk` vs `recv_ratchet_pk` / `prev_recv_ratchet_pk` — variable-time comparison would leak which epoch a message belongs to (current, previous, or new), revealing ratchet state to a timing attacker
Root key liveness check (§6.5, §6.6)	Constant-time all-zero comparison	`subtle::ConstantTimeEq` — defense-in-depth against partial-zero leakage
`derive_call_keys` secret input checks (§6.12)	Constant-time all-zero comparison for `root_key` and `kem_ss`	`subtle::ConstantTimeEq` — both are secret material; `call_id` and fingerprint equality use variable-time `==` (non-secret public values)
`verify_bundle` IK_pub comparison (§5.3)	Constant-time comparison	`subtle::ConstantTimeEq` on the full 3200-byte stored identity key vs. `bundle.IK_pub`. A variable-time comparison leaks the stored key byte-by-byte through response timing: each byte can be determined by timing crafted bundles, and constructing a crafted bundle is far cheaper than paying for a `HybridVerify` per probe. `verify_bundle` collapses all failures to `BundleVerificationFailed`, but that does not prevent timing measurements on the common return path.
`receive_session` fingerprint checks (§5.5 Step 1)	Constant-time comparison	`subtle::ConstantTimeEq` on untrusted wire fingerprints before signature verification — variable-time comparison would let an attacker probe the expected fingerprint byte-by-byte via timing
X25519 DH all-zero output check (§8.3)	Constant-time comparison	`subtle::ConstantTimeEq` against `[0u8; 32]` — the DH output is secret material before the check fires. A variable-time comparison leaks whether the low-order-point substitution path was taken, revealing a bit of information about the relationship between the ephemeral key and the recipient's public key
`StorageKey::new` all-zero key rejection (§11.6)	Constant-time comparison	`subtle::ConstantTimeEq` against `[0u8; 32]` — the key is secret material; variable-time comparison leaks partial information about the key value during the liveness check
Streaming layer key quality (§15.1)	No all-zero check; this is a deliberate asymmetry with the storage layer. `stream_encrypt_init` and `stream_decrypt_init` do NOT validate that the caller-provided key is non-zero. Storage keys are library-managed, long-lived, and validated at construction time (`StorageKey::new`); streaming keys are ephemeral, caller-provided, and used once, so their quality is the caller's responsibility. See §15.1 "All-zero key policy" for the full rationale. A reimplementer who adds an all-zero guard to streaming init for consistency with the storage layer diverges from the specification.	N/A (streaming init does not inspect key quality)
Stream header version/flags byte comparisons (§15.8)	No constant-time requirement — public values	The version byte and flags byte in the stream header are cleartext, attacker-visible values. Variable-time comparison leaks no information beyond what is already observable from the header bytes themselves. Constant-time comparison (e.g., `subtle::ConstantTimeEq`) would provide no security benefit here and would add unnecessary complexity. This is in contrast to AEAD tag verification (always CT) and ratchet epoch identification (CT to prevent timing leakage of ratchet state).
All other operations	No constant-time requirement at the protocol level	—
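The comparison rows and the `hybrid_verify` row come down to two patterns: accumulate differences without exiting early, and evaluate both signature components before combining them with a non-short-circuit AND. The std-only sketch below shows the shape of both. ct_eq and hybrid_verify_both are hypothetical names, the closures stand in for real Ed25519 and ML-DSA-65 verifiers, and plain Rust gives no constant-time guarantee (the optimizer may reintroduce branches), which is why the reference uses `subtle::ConstantTimeEq` and `subtle::Choice` instead.

```rust
use std::hint::black_box;

/// Equality over secret byte strings with no early exit on the first mismatch.
/// Illustration only; production code should use subtle::ConstantTimeEq.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // lengths of keys and tags are public
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without a per-byte branch
    }
    black_box(diff) == 0
}

/// Run both signature checks before combining the results.
/// An Err from the first component must not skip the second.
pub fn hybrid_verify_both<E, M>(ed25519: E, ml_dsa: M) -> bool
where
    E: FnOnce() -> Result<bool, ()>,
    M: FnOnce() -> Result<bool, ()>,
{
    let ed_result = ed25519(); // no `?` here: keep going even on Err
    let ml_result = ml_dsa();
    let ed_bit = u8::from(matches!(ed_result, Ok(true)));
    let ml_bit = u8::from(matches!(ml_result, Ok(true)));
    black_box(ed_bit & ml_bit) == 1 // bitwise AND, not short-circuit &&
}
```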

Appendix F: Test Vectors

All values are hex-encoded. These vectors enable a reimplementer to verify their primitive constructions before attempting full protocol integration.

F.1 KDF_MsgKey (§6.5)

epoch_key:  4242424242424242424242424242424242424242424242424242424242424242
counter:    7
HMAC input: 0100000007   (0x01 || BE32(7))
msg_key:    cac256e53d0b0abc468331210d63c50f15ec875c3badfef6bfe53e1137165610

Construction: HMAC-SHA3-256(key=epoch_key, data=0x01 || counter_BE32)

Counter = 0 (first message in Bob's initial epoch and in every post-KEM-ratchet epoch for both parties). Alice's first ratchet send uses counter=1, not 0 — her send_count starts at 1 after session initiation (§6.2). See §6.7.1 for a worked example where Alice's first message has n=1.

epoch_key:  4242424242424242424242424242424242424242424242424242424242424242
counter:    0
HMAC input: 0100000000   (0x01 || BE32(0))
msg_key:    5ac7a1b8dd3103a3ef7bab0af995570a087b6a92b34d93bc8c88f3485e96054d
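The HMAC input framing is where counter-encoding bugs hide, so it is the part sketched here; the HMAC-SHA3-256 call itself needs an external implementation (for example the RustCrypto hmac and sha3 crates), which is assumed and not shown. msg_key_hmac_input is a hypothetical helper name.

```rust
/// Build the 5-byte KDF_MsgKey HMAC input: 0x01 || BE32(counter).
pub fn msg_key_hmac_input(counter: u32) -> [u8; 5] {
    let mut input = [0u8; 5];
    input[0] = 0x01; // fixed leading byte
    input[1..].copy_from_slice(&counter.to_be_bytes()); // big-endian counter
    input
}
```

For counter = 7 this yields 01 00 00 00 07, the HMAC input shown in the first vector.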

F.2 KDF_Root (§6.4)

root_key (salt):  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
kem_ss (ikm):     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
info:             "lo-ratchet-v1"
new_root_key:     db7be3c198f86c5e044d6f5c39d526eaf72a651a4cd6b7d32b1adb6b6754d587
new_epoch_key:    71ceff4de7d184f3c97821177dc5afcc2abc334707301c0b9267a3f4b0aa0ff9

Construction: HKDF-SHA3-256(salt=root_key, ikm=kem_ss, info="lo-ratchet-v1", len=64). First 32 bytes = new root key, last 32 bytes = new epoch key.

F.3 X-Wing Combiner (§8.2)

ss_M:     1111111111111111111111111111111111111111111111111111111111111111
ss_X:     2222222222222222222222222222222222222222222222222222222222222222
ct_X:     3333333333333333333333333333333333333333333333333333333333333333
pk_X:     4444444444444444444444444444444444444444444444444444444444444444
label:    5c2e2f2f5e5c   (ASCII "\.//^\")
output:   40ad7dbc0dd87305287bd9a9104f5dc064db038a8ac3da443fe3a090a272e2d5

Construction: SHA3-256(ss_M || ss_X || ct_X || pk_X || label). Label goes last (draft-09 §5.3).
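Assembling the combiner input is pure concatenation. The sketch below builds the 134-byte preimage (four 32-byte values plus the 6-byte label) so that it can be hashed with any SHA3-256 implementation and compared against the output above; xwing_combiner_input and XWING_LABEL are hypothetical names, and the hash call itself is not shown.

```rust
/// X-Wing combiner label, ASCII "\.//^\" (hex 5c2e2f2f5e5c).
pub const XWING_LABEL: [u8; 6] = [0x5c, 0x2e, 0x2f, 0x2f, 0x5e, 0x5c];

/// Build the SHA3-256 preimage ss_M || ss_X || ct_X || pk_X || label (134 bytes).
pub fn xwing_combiner_input(
    ss_m: &[u8; 32],
    ss_x: &[u8; 32],
    ct_x: &[u8; 32],
    pk_x: &[u8; 32],
) -> Vec<u8> {
    let mut input = Vec::with_capacity(4 * 32 + XWING_LABEL.len());
    input.extend_from_slice(ss_m);
    input.extend_from_slice(ss_x);
    input.extend_from_slice(ct_x);
    input.extend_from_slice(pk_x);
    input.extend_from_slice(&XWING_LABEL); // label goes last
    input
}
```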

pk_X in this vector is a fabricated test value: The pk_X = 0x44...44 input above is not derived from any real X25519 scalar; it is a fixed constant used to verify the SHA3-256 combiner construction independently of X25519 arithmetic.

In the actual protocol, pk_X is re-derived during decapsulation as X25519(sk_X, G) (§8.2 and §8.5), where G is the standard X25519 base point: the 32-byte little-endian encoding of the integer 9, i.e., 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 (RFC 7748 §6.1). This is the one call to X25519 that is NOT a Diffie-Hellman step; it is a public-key rederivation, X25519(scalar, basepoint). A reimplementer who accidentally uses ct_X (the 32-byte ephemeral key from the ciphertext) instead of G (the fixed base point) silently produces a wrong pk_X (X25519(sk_X, ct_X) is in fact the DH output ss_X), so the combiner runs on a wrong input, AEAD fails, and no diagnostic points to the wrong base point.

No fabricated X25519 key-derivation vector is provided here. Use the RFC 7748 §6.1 test values to validate X25519(scalar, basepoint) separately, then combine them with this combiner vector to build confidence in the full XWing.Decaps pipeline.

Clamping divergence is the primary risk: an implementation that omits RFC 7748 clamping (clear bits 0, 1, 2, and 255; set bit 254) before the scalar multiply produces a different output without any error signal, because clamped and unclamped scalars both yield valid curve points, just different ones. The RFC 7748 §6.1 example (Alice's private key 77076d0a...1db92c2a with the base point u = 9) exercises the clamped path: that key has bits 0-2 set and bit 254 clear, so clamping changes it. Compare your X25519(sk_X, G) output for that key against Alice's public key in §6.1 to confirm your library applies clamping before the multiply. §8.5 documents the per-use clamping requirement in detail.
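RFC 7748 clamping is three bit operations on the little-endian scalar, and the base point is a one-byte constant. The sketch shows only those two pieces; clamp_x25519_scalar and x25519_basepoint are hypothetical names, and the scalar multiplication itself must come from a vetted X25519 implementation, which is not shown.

```rust
/// RFC 7748 clamping of a 32-byte little-endian X25519 scalar.
pub fn clamp_x25519_scalar(mut k: [u8; 32]) -> [u8; 32] {
    k[0] &= 0b1111_1000; // clear bits 0, 1, 2
    k[31] &= 0b0111_1111; // clear bit 255
    k[31] |= 0b0100_0000; // set bit 254
    k
}

/// The X25519 base point G: u = 9, encoded as 32 little-endian bytes.
pub fn x25519_basepoint() -> [u8; 32] {
    let mut g = [0u8; 32];
    g[0] = 9;
    g
}
```

Clamping is idempotent, so applying it to an already clamped scalar is harmless; what matters is that it happens before the multiply, which the §6.1 example checks.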

Using draft-09 X-Wing KAT for full pipeline verification: The IETF draft-connolly-cfrg-xwing-kem-09 provides a full X-Wing decapsulation KAT in its Appendix C (SHAKE-256 seed → key generation → encapsulation → decapsulation → shared secret). Applying it to LO requires three adaptations:

Key ordering: LO uses X25519-first storage (sk_X (32) || dk_M (2400) for secret key, pk_X (32) || pk_M (1184) for public key); draft-09 uses ML-KEM-first. Extract and reorder components before using draft-09 vectors.
Seed expansion: LO expands the X-Wing seed via SHAKE-256(seed, 96) → d (32) || z (32) || sk_X (32). The d and z values feed ML-KEM key generation; sk_X is the X25519 scalar. This matches draft-09 §6.2; verify your SHAKE-256 implementation produces the same intermediate values.
Ciphertext ordering: LO's ciphertext is ct_X (32) || ct_M (1088) (X25519-first); draft-09's ciphertext is ct_M (1088) || ct_X (32). Swap when extracting from draft-09 test vectors. The combiner inputs remain identical once extracted: SHA3-256(ss_M || ss_X || ct_X || pk_X || label).
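The public-key and ciphertext adaptations above are the same operation: move the trailing 32-byte X25519 component of a draft-09 encoding to the front. A minimal sketch follows (draft09_to_lo_order is a hypothetical helper); the secret-key adaptation is not shown because it also depends on the seed expansion in item 2.

```rust
/// Length of the X25519 component (pk_X or ct_X).
const X25519_LEN: usize = 32;

/// Convert a draft-09 ML-KEM-first encoding (pk_M || pk_X or ct_M || ct_X)
/// into LO's X25519-first order (pk_X || pk_M or ct_X || ct_M).
pub fn draft09_to_lo_order(ml_kem_first: &[u8]) -> Vec<u8> {
    assert!(ml_kem_first.len() > X25519_LEN, "missing ML-KEM component");
    let mut out = ml_kem_first.to_vec();
    out.rotate_right(X25519_LEN); // the trailing 32 bytes move to the front
    out
}
```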

The source contains an xwing_draft09_decap_kat test that performs exactly this adaptation — use it as a reference for the above steps.

Byte-order swap produces silent wrong output via ML-KEM implicit rejection: If the ciphertext byte order is not adapted (i.e., a draft-09 ML-KEM-first ciphertext ct_M(1088) || ct_X(32) is passed to LO's X25519-first decapsulator as-is), LO extracts the first 32 bytes as ct_X (these are actually the first 32 bytes of ct_M) and the remaining 1088 bytes as ct_M (these are the last 1056 bytes of ct_M concatenated with the actual ct_X). ML-KEM-768 decapsulation of the malformed 1088-byte input does not fail with an error — FIPS 203 defines implicit rejection: when decapsulation fails (ciphertext does not re-encrypt to itself), the function returns a pseudorandom output derived from a pre-keyed hash rather than reporting an error. So ss_M becomes a pseudorandom value, ss_X is computed from wrong bytes, the combiner produces a wrong but plausible-looking 32-byte output, and the AEAD fails with no diagnostic pointing to the byte-order swap. A byte-order-swap bug in a reimplementation is therefore completely invisible until AEAD fails; no intermediate value signals the error. The xwing_draft09_decap_kat test catches this by comparing against the expected shared secret after decapsulation — any byte-order mistake causes a test-vector mismatch at that comparison point.

F.4 Identity Fingerprint (§2.1)

pk:           5555...55  (3200 bytes of 0x55)
fingerprint:  6197102522f51ba35cf4e2e721ffcc5a1ae8e9dc14442b093bc0388696569a4d

Construction: SHA3-256(pk)

F.5 Auth HMAC (§4)

shared_secret:  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
label:          "lo-auth-v1"
hmac_output:    b12569ef76edbe2f1215b876d89db5f067bdbf35bd99c6d0bcd47733609f02cf

Construction: HMAC-SHA3-256(key=shared_secret, data="lo-auth-v1"). This vector covers only the HMAC step. The shared secret is the X-Wing combined output (§8.2) from the X-Wing encapsulation/decapsulation in §4.2 — see F.3 for the combiner KAT and §8.2 for the full encap/decap pseudocode.

F.6 Verification Phrase Hash (§9.2)

pk_a:   0101...01  (3200 bytes of 0x01)
pk_b:   0202...02  (3200 bytes of 0x02)
sorted: pk_a first (lexicographic)
hash:   94488b955db55587ef0e0b1721a6db95b62b6f2c61ba158a557a2e007c7638b9

Construction: SHA3-256("lo-verification-v1" || sorted_first || sorted_second)

Word output (from rejection sampling on the hash above):

Samples (u16 big-endian from hash bytes):
  [0..2]  0x9448 = 37960  → accepted, 37960 % 7776 = 6856  → "triangle"
  [2..4]  0x8b95 = 35733  → accepted, 35733 % 7776 = 4629  → "phobia"
  [4..6]  0x5db5 = 23989  → accepted, 23989 % 7776 = 661   → "breeder"
  [6..8]  0x5587 = 21895  → accepted, 21895 % 7776 = 6343  → "sterile"
  [8..10] 0xef0e = 61198  → accepted, 61198 % 7776 = 6766  → "tibia"
  [10..12] 0x0b17 = 2839  → accepted, 2839 % 7776 = 2839   → "gerbil"
  [12..14] 0x21a6 = 8614  → accepted, 8614 % 7776 = 838    → "caption"

Phrase: "triangle phobia breeder sterile tibia gerbil caption"

All 7 samples accepted with no rejections (no rehash needed). This vector verifies the full pipeline: hash → u16 extraction → rejection sampling → modular index → EFF wordlist lookup. Note: Neither this vector nor the "with rejection" vector below exercises the rehash path (§9.2: when all 16 u16 samples from a 32-byte hash are exhausted before producing 7 accepted words, compute SHA3-256("lo-phrase-expand-v1" || round || hash) and continue sampling from the new hash). Both vectors complete within the first hash. The rehash path is tested by unit tests with adversarial inputs; a KAT vector is impractical because no naturally-occurring fingerprint pair is known to require rehashing. Exhausting the first hash requires at least 10 of its 16 samples to be rejected; with a per-sample rejection probability of 3328/65536 ≈ 0.051, that happens for roughly 7 × 10⁻¹⁰ of fingerprint pairs.

With rejection (exercises cursor-advance-on-reject behavior):

pk_a:   0808...08  (3200 bytes of 0x08)
pk_b:   0101...01  (3200 bytes of 0x01)
sorted: pk_b first (lexicographic: 0x01 < 0x08)
hash:   9ea8205db7552a2a0679fbe6760b49fb59b46559ea3a44708ec7feb19b1c8d85

Samples (u16 big-endian from hash bytes):
  [0..2]   0x9ea8 = 40616  → accepted, 40616 % 7776 = 1736  → "despise"
  [2..4]   0x205d = 8285   → accepted, 8285 % 7776 = 509    → "barrier"
  [4..6]   0xb755 = 46933  → accepted, 46933 % 7776 = 277   → "approve"
  [6..8]   0x2a2a = 10794  → accepted, 10794 % 7776 = 3018  → "grinch"
  [8..10]  0x0679 = 1657   → accepted, 1657 % 7776 = 1657   → "degrading"
  [10..12] 0xfbe6 = 64486  → REJECTED (≥ 62208), cursor advances to [12]
  [12..14] 0x760b = 30219  → accepted, 30219 % 7776 = 6891  → "tropical"
  [14..16] 0x49fb = 18939  → accepted, 18939 % 7776 = 3387  → "implosive"

Phrase: "despise barrier approve grinch degrading tropical implosive"

This vector exercises the critical cursor-advance-on-reject behavior: sample [10..12] (0xfbe6 = 64486) fails the < 62208 acceptance check and is discarded. The cursor advances past it to [12..14]; it does NOT retry at [10..12]. A reimplementer who skips the acceptance check and reduces every sample modulo 7776 would take [10..12] as the 6th word (64486 % 7776 = 2278) and produce a different phrase, and one who advances the cursor only on accepted samples would re-read the rejected sample at [10..12] indefinitely. The rejection also means 8 u16 samples are consumed for 7 words (16 bytes of the 32-byte hash).
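The sampling loop can be sketched as follows, assuming the 32-byte hash has already been computed (SHA3-256 is not in Rust's standard library). phrase_indices and decode_hex32 are hypothetical helpers. The sketch returns word indices rather than words because the EFF wordlist is not reproduced here, and it returns None at the point where the specification would enter the rehash path instead of implementing that path.

```rust
const WORDLIST_LEN: u32 = 7776; // EFF large wordlist size
const ACCEPT_BELOW: u32 = 8 * WORDLIST_LEN; // 62208: largest multiple of 7776 not above 65536

/// Derive 7 word indices from one 32-byte hash by rejection sampling.
/// Returns None if the hash runs out first; the specification then rehashes (not shown).
pub fn phrase_indices(hash: &[u8; 32]) -> Option<Vec<u16>> {
    let mut words = Vec::with_capacity(7);
    // chunks_exact(2) moves the cursor forward on every sample, accepted or rejected.
    for pair in hash.chunks_exact(2) {
        let sample = u32::from(u16::from_be_bytes([pair[0], pair[1]]));
        if sample < ACCEPT_BELOW {
            words.push((sample % WORDLIST_LEN) as u16);
            if words.len() == 7 {
                return Some(words);
            }
        }
    }
    None
}

/// Parse a 64-character hex string into 32 bytes (panics on malformed input).
pub fn decode_hex32(s: &str) -> [u8; 32] {
    assert_eq!(s.len(), 64, "expected 64 hex characters");
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).expect("invalid hex");
    }
    out
}
```

Applied to the two hashes above, this yields the index sequences 6856, 4629, 661, 6343, 6766, 2839, 838 and 1736, 509, 277, 3018, 1657, 6891, 3387.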

F.7 Ratchet Nonce from Counter (§6.5)

counter:  42
nonce:    00000000000000000000000000000000000000000000002a  (24 bytes: zeros with BE32(42) in bytes 20..24)

Counter occupies the last 4 bytes of a 24-byte nonce buffer. Bytes 0-19 are zero.

Counter = 0 (first message of Bob's initial epoch and of every post-KEM-ratchet epoch; Alice's initial epoch starts at counter 1, see F.1):

counter:  0
nonce:    000000000000000000000000000000000000000000000000  (24 bytes: all zero)

An all-zero nonce is valid; see §7.2 for rationale. A reimplementer who guards against all-zero nonces as a "sanity check" would incorrectly reject every counter-0 message, that is, the first message of Bob's initial epoch and of every post-KEM-ratchet epoch.

Counter = 1 (the message after counter 0, and Alice's first message in her initial epoch; validates that counter increments appear at the correct byte positions):

counter:  1
nonce:    000000000000000000000000000000000000000000000001  (24 bytes: zeros with BE32(1) in bytes 20..24)

The difference between counter=0 and counter=1 is a single bit flip at byte 23 (the least-significant byte of the 4-byte big-endian counter). A reimplementer who places the counter in bytes 0-3 (the wrong end) instead of bytes 20-23 would produce 00000001 followed by 40 zero hex digits for counter=1; the counter=1 vector catches this.
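The ratchet nonce is a four-byte big-endian write into an otherwise zero 24-byte buffer. A minimal sketch (ratchet_nonce is a hypothetical helper):

```rust
/// 24-byte ratchet nonce: 20 zero bytes followed by BE32(counter).
pub fn ratchet_nonce(counter: u32) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[20..].copy_from_slice(&counter.to_be_bytes()); // bytes 20..24
    nonce
}
```

Because bytes 0-19 are never written, counter = 0 yields the all-zero nonce, which must be accepted rather than treated as an error.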

F.8 Storage AAD Construction (§11.4.1)

version:    01
flags:      00 (uncompressed)
channel_id: "general"     (67656e6572616c)
segment_id: "2024-03-15"  (323032342d30332d3135)

aad: 6c6f2d73746f726167652d76310100000767656e6572616c000a323032342d30332d3135
     (36 bytes)

Construction: "lo-storage-v1" || version || flags || len(channel_id) || channel_id || len(segment_id) || segment_id. Length prefixes are 2-byte big-endian.
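A sketch of the storage AAD builder under the layout above: the label is written raw, and only channel_id and segment_id carry 2-byte big-endian length prefixes. storage_aad, put_lp, and to_hex are hypothetical helper names.

```rust
/// Append a 2-byte big-endian length prefix followed by the field bytes.
fn put_lp(out: &mut Vec<u8>, field: &[u8]) {
    let len = u16::try_from(field.len()).expect("field exceeds u16::MAX bytes");
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(field);
}

/// "lo-storage-v1" || version || flags || lp(channel_id) || lp(segment_id)
pub fn storage_aad(version: u8, flags: u8, channel_id: &[u8], segment_id: &[u8]) -> Vec<u8> {
    let mut aad = Vec::new();
    aad.extend_from_slice(b"lo-storage-v1"); // 13-byte label, no length prefix
    aad.push(version);
    aad.push(flags);
    put_lp(&mut aad, channel_id);
    put_lp(&mut aad, segment_id);
    aad
}

/// Lowercase hex, for comparing against the vectors in this appendix.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```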

F.9 Streaming Nonce Derivation (§15.3)

base_nonce:   101112131415161718191a1b1c1d1e1f2021222324252627

chunk_index=0, tag_byte=0x00 (non-final):
  mask:         000000000000000000000000000000000000000000000000
  chunk_nonce:  101112131415161718191a1b1c1d1e1f2021222324252627

chunk_index=2, tag_byte=0x00 (non-final):
  mask:         000000000000000200000000000000000000000000000000
  chunk_nonce:  101112131415161518191a1b1c1d1e1f2021222324252627

chunk_index=0, tag_byte=0x01 (final — single-chunk stream):
  mask:         000000000000000001000000000000000000000000000000
  chunk_nonce:  101112131415161719191a1b1c1d1e1f2021222324252627

chunk_index=2, tag_byte=0x01 (final):
  mask:         000000000000000201000000000000000000000000000000
  chunk_nonce:  101112131415161519191a1b1c1d1e1f2021222324252627

Construction: mask = chunk_index(u64 BE) || tag_byte || 0x00*15, chunk_nonce = base_nonce XOR mask.

The pair (chunk_index=2, tag_byte=0x00) and (chunk_index=2, tag_byte=0x01) verifies that tag_byte participates in the XOR: the two nonces differ only at byte 8 (0x18 vs. 0x19), which is exactly where tag_byte sits in the mask. An implementation that omits tag_byte from the mask (or passes it as a constant) would compute the same nonce for both entries — the distinct expected values catch this silently.

The (chunk_index=0, tag_byte=0x01) entry covers the single-final-chunk case, the most common real-world usage for small files. At chunk_index=0 the tag byte is the only non-zero byte in the mask, so this entry isolates tag-byte placement from chunk-index encoding: an implementation that XORs tag_byte at the wrong byte position still produces the correct nonce for (0, 0x00), since the mask is all zeros regardless, but fails (0, 0x01), and that failure points unambiguously at tag_byte handling rather than at the chunk-index encoding.

Debugging property at chunk_index=0, tag_byte=0x00: The mask is all-zeros, so chunk_nonce == base_nonce exactly. If decryption fails for index-0, the bug is in header parsing or base_nonce extraction — not in the XOR step (which is a no-op at this index). If index-0 succeeds but a subsequent index fails, the bug is in the mask construction or XOR logic.

chunk_index = u64::MAX boundary vector: Detects 32-bit-truncation bugs — implementations using a u32 counter silently compute a different nonce for any chunk index above u32::MAX = 0xFFFFFFFF. At u64::MAX, bytes 0-7 of the mask are all 0xFF (vs. only bytes 4-7 for u32::MAX), so the two implementations produce nonces that differ in bytes 0-3.

chunk_index=u64::MAX (0xFFFFFFFFFFFFFFFF), tag_byte=0x00 (non-final):
  mask:         ffffffffffffffff00000000000000000000000000000000
  chunk_nonce:  efeeedecebeae9e818191a1b1c1d1e1f2021222324252627

chunk_index=u64::MAX (0xFFFFFFFFFFFFFFFF), tag_byte=0x01 (final):
  mask:         ffffffffffffffff01000000000000000000000000000000
  chunk_nonce:  efeeedecebeae9e819191a1b1c1d1e1f2021222324252627

Computed using the same base_nonce = 101112...27 as above: bytes 0-7 XOR 0xFF each; byte 8 XOR tag_byte; bytes 9-23 unchanged.
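A sketch of the mask-and-XOR derivation, using a u64 chunk index so that the u64::MAX vectors are reachable without truncation; stream_chunk_nonce and to_hex are hypothetical helpers.

```rust
/// chunk_nonce = base_nonce XOR (BE64(chunk_index) || tag_byte || 0x00 * 15)
pub fn stream_chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes()); // full 64-bit index
    mask[8] = tag_byte; // 0x00 non-final, 0x01 final
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= m;
    }
    nonce
}

/// Lowercase hex, for comparing against the vectors above.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```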

F.10 Streaming AAD Construction (§15.4)

version:    01
flags:      00 (uncompressed)
base_nonce: 101112131415161718191a1b1c1d1e1f2021222324252627
caller_aad: "" (empty)

chunk_index=0, tag_byte=0x00:
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f2021222324252627000000000000000000
       (47 bytes)

chunk_index=0, tag_byte=0x00, caller_aad="file-abc-123":
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f202122232425262700000000000000000066696c652d6162632d313233
       (59 bytes)

chunk_index=2, tag_byte=0x01, caller_aad="file-abc-123":
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f202122232425262700000000000000020166696c652d6162632d313233
       (59 bytes)

Construction: "lo-stream-v1" || version || flags || base_nonce || chunk_index(u64 BE) || tag_byte || caller_aad.

Why three entries: The first entry (non-final, empty AAD) and the third entry (final, non-empty AAD) do not cover the combination non-final + non-empty AAD. A reimplementer who appends caller_aad only when tag_byte == 0x01 (final) passes entry 1 (where caller_aad is empty, so omitting it changes nothing) and entry 3, but fails entry 2. Entry 2 catches this bug.

Note: F.9 and F.10 cover only flags=0x00 (uncompressed). A compressed vector (flags=0x01) exercising the zstd-before-AEAD encrypt path and post-AEAD-decompress decrypt path is needed for complete cross-implementation validation of the compression integration. The AAD construction is identical (only the flags byte differs); the key difference is the plaintext/ciphertext relationship (compressed vs. raw).
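A sketch of the per-chunk AAD builder, assuming the byte layout in the construction line above; stream_chunk_aad is a hypothetical name. caller_aad is appended raw, with no length prefix, on every chunk whether final or not, which is exactly what entry 2 checks.

```rust
/// "lo-stream-v1" || version || flags || base_nonce || BE64(chunk_index) || tag_byte || caller_aad
pub fn stream_chunk_aad(
    version: u8,
    flags: u8,
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(b"lo-stream-v1"); // 12-byte label
    aad.push(version);
    aad.push(flags);
    aad.extend_from_slice(base_nonce);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad.push(tag_byte);
    aad.extend_from_slice(caller_aad); // raw, present on every chunk
    aad
}
```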

F.11 KDF_KEX Session Key Derivation (§5.4 Step 4)

ss_ik:            1111111111111111111111111111111111111111111111111111111111111111
ss_spk:           2222222222222222222222222222222222222222222222222222222222222222
(no OPK)
ikm:              11111111111111111111111111111111111111111111111111111111111111112222222222222222222222222222222222222222222222222222222222222222
                  (64 bytes: ss_ik || ss_spk)

salt:             0000000000000000000000000000000000000000000000000000000000000000

alice_ik_pub:     0xAA repeated × 3200 bytes
bob_ik_pub:       0xBB repeated × 3200 bytes
ek_pub:           0xCC repeated × 1216 bytes
crypto_version:   "lo-crypto-v1" (6c6f2d63727970746f2d7631)

info (7645 bytes): "lo-kex-v1"                           // 9 bytes raw
                || 000c 6c6f2d63727970746f2d7631         // len(cv) + crypto_version
                || 0c80 [AA × 3200]                      // len(alice_ik) + alice_ik_pub
                || 0c80 [BB × 3200]                      // len(bob_ik) + bob_ik_pub
                || 04c0 [CC × 1216]                      // len(ek) + ek_pub

root_key:         5067b4b2c0b33aafa8be7805a7b1a136c32e7769624b8e78cc762c6194a3322c
epoch_key:        4ee99ff8ff9588a8c1df8819cb0bd49bd39277412f668c6be4ea0850220e8000

Construction: HKDF-SHA3-256(salt=0x00{32}, ikm=ss_ik||ss_spk, info=info, len=64). First 32 bytes = root_key, last 32 bytes = epoch_key. The info field uses 2-byte BE length prefixes for all components. The "lo-kex-v1" prefix is raw (not length-prefixed).

Missing intermediate checkpoint: No SHA3-256 hash of the assembled 7645-byte info field is provided. The info field mixes raw and length-prefixed fields in a specific order; a field-encoding error (wrong prefix size, swapped field order, missing length prefix on "lo-kex-v1") shifts all subsequent bytes and produces a final root_key/epoch_key mismatch with no intermediate signal. A reimplementer can verify their info assembly independently: compute SHA3-256 of the assembled info bytes before passing to HKDF and compare against a trusted build — this check is not provided in-document but is the diagnostic step to run when F.11/F.12 root_key or epoch_key diverge.
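When F.11 or F.12 diverge, dumping the assembled info bytes is the first diagnostic. A hypothetical assembler for the layout above follows; HKDF-SHA3-256 is assumed to come from an external implementation and is not shown. Hashing the returned bytes gives the intermediate checkpoint described above.

```rust
/// Append a 2-byte big-endian length prefix followed by the field bytes.
fn put_lp(out: &mut Vec<u8>, field: &[u8]) {
    let len = u16::try_from(field.len()).expect("field exceeds u16::MAX bytes");
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(field);
}

/// KDF_KEX info: raw "lo-kex-v1", then length-prefixed crypto_version, IKs, and ek.
pub fn kex_info(crypto_version: &[u8], alice_ik_pub: &[u8], bob_ik_pub: &[u8], ek_pub: &[u8]) -> Vec<u8> {
    let mut info = Vec::new();
    info.extend_from_slice(b"lo-kex-v1"); // raw, no length prefix
    put_lp(&mut info, crypto_version);
    put_lp(&mut info, alice_ik_pub);
    put_lp(&mut info, bob_ik_pub);
    put_lp(&mut info, ek_pub);
    info
}
```

With the F.11 inputs the length prefixes land at offsets 9, 23, 3225, and 6427, which is a quick way to spot a missing or misplaced prefix.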

F.12 KDF_KEX with OPK (§5.4 Step 4)

ss_ik:            1111111111111111111111111111111111111111111111111111111111111111
ss_spk:           2222222222222222222222222222222222222222222222222222222222222222
ss_opk:           3333333333333333333333333333333333333333333333333333333333333333
ikm:              ss_ik || ss_spk || ss_opk  (96 bytes)

salt, info:       identical to F.11

root_key:         c308b84238e8b73424b88d5e24ac6e4e0e5a0bfe047b5620fc9811f368ec0be1
epoch_key:        35d3ddd0b464faa3663e92041cebf2bcd8db593b5b0ebae75e7f02a24631ea2c

The IKM concatenation order (ss_ik || ss_spk || ss_opk) is critical — any reordering produces a different session key.
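The ordering rule can be encoded once and reused for both the OPK and no-OPK cases; kex_ikm is a hypothetical helper.

```rust
/// IKM = ss_ik || ss_spk [|| ss_opk], in exactly this order.
pub fn kex_ikm(ss_ik: &[u8; 32], ss_spk: &[u8; 32], ss_opk: Option<&[u8; 32]>) -> Vec<u8> {
    let mut ikm = Vec::with_capacity(96);
    ikm.extend_from_slice(ss_ik);
    ikm.extend_from_slice(ss_spk);
    if let Some(opk) = ss_opk {
        ikm.extend_from_slice(opk); // OPK secret always goes last
    }
    ikm
}
```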

F.13 encode_session_init (§7.4)

crypto_version:         "lo-crypto-v1"  (6c6f2d63727970746f2d7631)
sender_ik_fingerprint:  0xAA × 32
recipient_ik_fingerprint: 0xBB × 32
sender_ek:              0xCC × 1216
ct_ik:                  0x11 × 1120
ct_spk:                 0x22 × 1120
spk_id:                 0x000000DD (u32 big-endian)
has_opk:                0x00 (absent)

Encoded layout (3543 bytes):
  [0..2]        000c                    len(crypto_version)
  [2..14]       6c6f2d63727970746f2d7631  crypto_version
  [14..46]      AA × 32                 sender_ik_fingerprint (no length prefix — fixed size)
  [46..78]      BB × 32                 recipient_ik_fingerprint (no length prefix — fixed size)
  [78..1294]    CC × 1216               sender_ek (no length prefix — fixed size)
  [1294..1296]  0460                    len(ct_ik) = 1120
  [1296..2416]  11 × 1120              ct_ik
  [2416..2418]  0460                    len(ct_spk) = 1120
  [2418..3538]  22 × 1120              ct_spk
  [3538..3542]  000000dd               spk_id (u32 big-endian, no length prefix)
  [3542]        00                      has_opk

SHA3-256(encoded): e45e05fb2d4218d1cd2f660491cd026ceec187ea7e3048908aa0f37681c36a9c

The SHA3-256 hash provides a quick verification that the encoding is correct — compute the hash of your serialized output and compare before attempting signature or AAD construction. Note: spk_id is a 4-byte big-endian u32 (not 32 bytes), and it follows ct_spk (not sender_ek). The has_kem_ct field does not exist in encode_session_init — it is part of encode_ratchet_header only.

With OPK (4669 bytes):

(Same fields as above through spk_id)
ct_opk:       0x33 × 1120
opk_id:       0x000000EE (u32 big-endian)

Encoded layout (4669 bytes):
  [0..3542]     (identical to no-OPK variant through spk_id)
  [3542]        01                      has_opk
  [3543..3545]  0460                    len(ct_opk) = 1120
  [3545..4665]  33 × 1120              ct_opk
  [4665..4669]  000000ee               opk_id (u32 big-endian)

SHA3-256(encoded): 230d711bebc95875ee9d7e3bd4a56c0cf7e5f34a52a453ec498326b489af7dcc

The OPK block format is: 0x01 || len(ct_opk) || ct_opk || opk_id. Note that opk_id follows ct_opk (not spk_id) — the two key IDs are not adjacent. A reimplementer who places opk_id immediately after spk_id (before ct_opk) will produce an incompatible encoding.
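The F.13 layout can be reproduced with a small encoder. The sketch below is a minimal Rust illustration that assumes the field order shown in the two tables above. The `SessionInit` struct, `put_u16_prefixed`, and the `encode_session_init` signature are hypothetical and are not soliton's actual types. Fixed-size fields are written raw, variable-size ciphertexts get a u16 big-endian length prefix, and the OPK block places opk_id after ct_opk.

```rust
/// Hypothetical container for the F.13 fields; soliton's real type may differ.
struct SessionInit<'a> {
    crypto_version: &'a [u8],
    sender_ik_fp: [u8; 32],
    recipient_ik_fp: [u8; 32],
    sender_ek: &'a [u8],          // X-Wing public key, 1216 bytes
    ct_ik: &'a [u8],              // X-Wing ciphertext, 1120 bytes
    ct_spk: &'a [u8],             // X-Wing ciphertext, 1120 bytes
    spk_id: u32,
    opk: Option<(&'a [u8], u32)>, // (ct_opk, opk_id)
}

/// Writes a u16 big-endian length prefix, then the bytes themselves.
fn put_u16_prefixed(out: &mut Vec<u8>, data: &[u8]) {
    assert!(data.len() <= u16::MAX as usize);
    out.extend_from_slice(&(data.len() as u16).to_be_bytes());
    out.extend_from_slice(data);
}

fn encode_session_init(si: &SessionInit) -> Vec<u8> {
    let mut out = Vec::new();
    put_u16_prefixed(&mut out, si.crypto_version);
    out.extend_from_slice(&si.sender_ik_fp); // fixed size: no prefix
    out.extend_from_slice(&si.recipient_ik_fp); // fixed size: no prefix
    out.extend_from_slice(si.sender_ek); // fixed size: no prefix
    put_u16_prefixed(&mut out, si.ct_ik);
    put_u16_prefixed(&mut out, si.ct_spk);
    out.extend_from_slice(&si.spk_id.to_be_bytes()); // 4 bytes, directly after ct_spk
    match si.opk {
        None => out.push(0x00),
        Some((ct_opk, opk_id)) => {
            out.push(0x01);
            put_u16_prefixed(&mut out, ct_opk);
            out.extend_from_slice(&opk_id.to_be_bytes()); // after ct_opk, not after spk_id
        }
    }
    out
}
```

With the F.13 inputs this should give 3543 bytes without an OPK and 4669 with one. Hashing the output with SHA3-256 and comparing against the hashes above is the natural next check.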

Missing rejection vectors for decode_session_init: No negative-case KATs are provided for decode_session_init. The following inputs MUST return InvalidData and are the primary decoder-strictness checks: (1) has_opk = 0x02 (any byte other than 0x00/0x01 for the OPK flag); (2) trailing bytes after the last field (a conforming no-OPK blob with one extra byte appended); (3) mid-ciphertext truncation (e.g., the first 3537 bytes of the no-OPK blob, which ends one byte before the end of ct_spk). A decoder that accepts (1) admits format malleability; one that accepts (2) creates a decoding oracle; one that silently succeeds on (3) misreads the ct_spk/spk_id boundary. These rejection behaviors are normative requirements in §7.4 but are not covered by existing test vectors.
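A strict decoder that rejects all three cases above can be sketched as follows. This is an illustrative Rust reader, not soliton's decoder: it returns only spk_id and opk_id to stay short, skips any crypto_version check, and assumes every X-Wing ciphertext must be exactly 1120 bytes. Each read is bounds-checked, the OPK flag accepts only 0x00 or 0x01, and the final read position must equal the input length.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidData,
}

/// Cursor over the input; every read is bounds-checked.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], DecodeError> {
        let end = self.pos.checked_add(n).ok_or(DecodeError::InvalidData)?;
        let out = self.buf.get(self.pos..end).ok_or(DecodeError::InvalidData)?;
        self.pos = end;
        Ok(out)
    }

    fn u16_prefixed(&mut self) -> Result<&'a [u8], DecodeError> {
        let len = self.take(2)?;
        self.take(u16::from_be_bytes([len[0], len[1]]) as usize)
    }

    fn u32_be(&mut self) -> Result<u32, DecodeError> {
        let b = self.take(4)?;
        Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

const XWING_CT_LEN: usize = 1120; // assumption: ciphertexts are always full X-Wing size

/// Returns (spk_id, opk_id); a full decoder would also return the other fields.
fn decode_session_init(buf: &[u8]) -> Result<(u32, Option<u32>), DecodeError> {
    let mut r = Reader { buf, pos: 0 };
    let _crypto_version = r.u16_prefixed()?;
    let _sender_fp = r.take(32)?;
    let _recipient_fp = r.take(32)?;
    let _sender_ek = r.take(1216)?;
    let ct_ik = r.u16_prefixed()?;
    let ct_spk = r.u16_prefixed()?; // truncation inside ct_spk fails here
    if ct_ik.len() != XWING_CT_LEN || ct_spk.len() != XWING_CT_LEN {
        return Err(DecodeError::InvalidData);
    }
    let spk_id = r.u32_be()?;
    let opk_id = match r.take(1)?[0] {
        0x00 => None,
        0x01 => {
            if r.u16_prefixed()?.len() != XWING_CT_LEN {
                return Err(DecodeError::InvalidData);
            }
            Some(r.u32_be()?)
        }
        _ => return Err(DecodeError::InvalidData), // e.g. has_opk = 0x02
    };
    if r.pos != buf.len() {
        return Err(DecodeError::InvalidData); // trailing bytes
    }
    Ok((spk_id, opk_id))
}
```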

F.14 encode_ratchet_header (§7.4)

Without KEM ciphertext (same-chain message, 1225 bytes):

ratchet_pk:   0xAA × 1216
has_kem_ct:   0x00 (absent)
n:            42 (0x0000002A)
pn:           10 (0x0000000A)

Encoded layout:
  [0..1216]    AA × 1216               ratchet_pk (no length prefix — fixed size)
  [1216]       00                      has_kem_ct
  [1217..1221] 0000002a               n (u32 big-endian)
  [1221..1225] 0000000a               pn (u32 big-endian)

SHA3-256(encoded): 71d0bf62f50a1fff7b27b0825426e3ae29b52e2e335940caeb46a485ec73e1bf

With KEM ciphertext (new-epoch message, 2347 bytes):

ratchet_pk:   0xAA × 1216
has_kem_ct:   0x01 (present)
kem_ct:       0xBB × 1120
n:            42 (0x0000002A)
pn:           10 (0x0000000A)

Encoded layout:
  [0..1216]    AA × 1216               ratchet_pk (no length prefix — fixed size)
  [1216]       01                      has_kem_ct
  [1217..1219] 0460                    len(kem_ct) = 1120
  [1219..2339] BB × 1120              kem_ct
  [2339..2343] 0000002a               n (u32 big-endian)
  [2343..2347] 0000000a               pn (u32 big-endian)

SHA3-256(encoded): 99588b3b8b7539dc864443b16741f642a963207b66eb59058fe5f1729b180ed2
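The two F.14 variants differ only in the optional KEM ciphertext block. A minimal Rust sketch with a hypothetical function signature: ratchet_pk is written without a prefix, kem_ct is preceded by has_kem_ct and a u16 big-endian length, and n and pn follow as u32 big-endian.

```rust
fn encode_ratchet_header(ratchet_pk: &[u8; 1216], kem_ct: Option<&[u8]>, n: u32, pn: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(2347);
    out.extend_from_slice(ratchet_pk); // fixed size: no length prefix
    match kem_ct {
        None => out.push(0x00), // has_kem_ct = absent
        Some(ct) => {
            out.push(0x01); // has_kem_ct = present
            out.extend_from_slice(&(ct.len() as u16).to_be_bytes()); // 0x0460 for 1120 bytes
            out.extend_from_slice(ct);
        }
    }
    out.extend_from_slice(&n.to_be_bytes());
    out.extend_from_slice(&pn.to_be_bytes());
    out
}
```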

F.15 KDF_Call (§6.12)

root_key:   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
kem_ss:     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
call_id:    cccccccccccccccccccccccccccccccc
local_fp:   1111111111111111111111111111111111111111111111111111111111111111
remote_fp:  2222222222222222222222222222222222222222222222222222222222222222

ikm:        kem_ss || call_id  (48 bytes, no length prefixes)
info:       "lo-call-v1" || fp_lo || fp_hi  (74 bytes)
            fp_lo = local_fp (0x11... < 0x22...), fp_hi = remote_fp

key_a:      ed75d812373c9b3bf6bddd394a631950520503f103b492fb908621eb712b5970
key_b:      c3e5171534e0d1f922ea4ebf318357b990eafb0fff45d8cf430639a1fe2bb1e4
chain_key:  1427dde311aaa195b116cc98c870753179297981446d3b53e00a4a92a0d34aeb

Construction: HKDF-SHA3-256(salt=root_key, ikm=kem_ss||call_id, info="lo-call-v1"||fp_lo||fp_hi, len=96). Fingerprints sorted by unsigned byte-by-byte lexicographic comparison (lower first). Output: first 32 bytes = key_a, next 32 = key_b, last 32 = chain_key.

Reversed-order coverage: In this vector, local_fp (0x11...) < remote_fp (0x22...) so local_fp is fp_lo and remote_fp is fp_hi. To test the reversed sort branch (where local_fp > remote_fp), swap the two fingerprints: set local_fp = 0x22... and remote_fp = 0x11.... The info field must be identical (sorting produces the same fp_lo = 0x11..., fp_hi = 0x22...), so key_a, key_b, and chain_key are the same as above — but the role assignment reverses: the party whose local_fp = 0x22... now uses key_b as send key (not key_a), because they are the lexicographically higher party. A reimplementer who tests only the non-reversed case passes this vector and misses a sort-direction bug that would produce incompatible role assignments.
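The sort and the role assignment can be tested in isolation from the HKDF call. The sketch below is hypothetical Rust that omits the HKDF-SHA3-256 step, because Rust's standard library has no SHA3. It relies on the fact that comparing `[u8; 32]` values in Rust is unsigned byte-by-byte lexicographic, matching the ordering described above. The receive-key choice follows from the text: the lower party sends with key_a, so it receives with key_b.

```rust
/// Builds the KDF_Call info string and reports whether the local party is the lower one.
fn call_info(local_fp: &[u8; 32], remote_fp: &[u8; 32]) -> (Vec<u8>, bool) {
    // Array comparison is unsigned, byte-by-byte lexicographic.
    let local_is_lo = local_fp < remote_fp;
    let (lo, hi) = if local_is_lo { (local_fp, remote_fp) } else { (remote_fp, local_fp) };
    let mut info = b"lo-call-v1".to_vec();
    info.extend_from_slice(lo);
    info.extend_from_slice(hi);
    (info, local_is_lo)
}

/// Splits the 96-byte HKDF output into (send_key, recv_key, chain_key) for this party.
fn assign_call_keys(okm: &[u8; 96], local_is_lo: bool) -> ([u8; 32], [u8; 32], [u8; 32]) {
    let mut key_a = [0u8; 32];
    let mut key_b = [0u8; 32];
    let mut chain_key = [0u8; 32];
    key_a.copy_from_slice(&okm[0..32]);
    key_b.copy_from_slice(&okm[32..64]);
    chain_key.copy_from_slice(&okm[64..96]);
    // Lower party sends with key_a; higher party sends with key_b.
    if local_is_lo { (key_a, key_b, chain_key) } else { (key_b, key_a, chain_key) }
}
```

Both parties compute the same info string, so one party's send key is always the other's receive key.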

F.16 AdvanceCallChain (§6.12)

chain_key:  1427dde311aaa195b116cc98c870753179297981446d3b53e00a4a92a0d34aeb

key_a':     9cf3129c6bb7ad86cb12ffc534517a4c06a472fbcddbe295a501c79aa49800e1
key_b':     f24cd7822fd611159a6e6d809c6ac148fd7b9bad65d8b4f85745869634b2dd1e
chain_key': d3ae610c39cd9f7f8dce990b5c91634092ad0621fc01b44b24b2cb9f3638d0f2

Construction: key_a' = HMAC-SHA3-256(chain_key, 0x04), key_b' = HMAC-SHA3-256(chain_key, 0x05), chain_key' = HMAC-SHA3-256(chain_key, 0x06). Each HMAC input is a single byte.
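The three single-byte labels and the output order are easy to get wrong. In the sketch below (Rust, hypothetical names), `mac` is a caller-supplied stand-in for HMAC-SHA3-256(key, data), since the standard library provides no SHA3; only the label bytes and the output order are being illustrated.

```rust
/// `mac(key, data)` must be HMAC-SHA3-256 in a real implementation.
fn advance_call_chain<F>(chain_key: &[u8; 32], mac: F) -> ([u8; 32], [u8; 32], [u8; 32])
where
    F: Fn(&[u8; 32], &[u8]) -> [u8; 32],
{
    let key_a = mac(chain_key, &[0x04u8]); // single-byte input, no label string
    let key_b = mac(chain_key, &[0x05u8]);
    let next_chain_key = mac(chain_key, &[0x06u8]);
    (key_a, key_b, next_chain_key)
}
```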

F.17 DM Queue AAD (§11.4.2)

version:       01
flags:         00 (uncompressed)
recipient_fp:  0xAA × 32
batch_id:      "batch-001"  (62617463682d303031)

aad: 6c6f2d646d2d71756575652d763101000020aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa000962617463682d303031
     (61 bytes)

Construction: "lo-dm-queue-v1" || version || flags || len(recipient_fp) || recipient_fp || len(batch_id) || batch_id. Note: len(recipient_fp) = 0020 (2-byte BE for 32 bytes) — recipient_fp is length-prefixed despite being fixed-size, for wire format consistency with community AAD (§11.4.1).
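A minimal Rust sketch of this AAD, assuming the field order above; the function name is hypothetical. Note the 0x0020 length prefix on the fixed-size recipient_fp.

```rust
fn dm_queue_aad(version: u8, flags: u8, recipient_fp: &[u8; 32], batch_id: &[u8]) -> Vec<u8> {
    assert!(batch_id.len() <= u16::MAX as usize);
    let mut aad = b"lo-dm-queue-v1".to_vec(); // 14-byte label, no prefix
    aad.push(version);
    aad.push(flags);
    // Fixed-size, but still length-prefixed (0x0020) for consistency with community AAD.
    aad.extend_from_slice(&(recipient_fp.len() as u16).to_be_bytes());
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(&(batch_id.len() as u16).to_be_bytes());
    aad.extend_from_slice(batch_id);
    aad
}
```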

F.18 First-Message AAD (§5.4 Step 7)

sender_fp:    0xAA × 32  (raw SHA3-256, not hex)
recipient_fp: 0xBB × 32
session_init: F.13 no-OPK encoding (3543 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_session_init(si)
      (3615 bytes: 8 + 32 + 32 + 3543)

SHA3-256(aad): 091a81dbff776e4a81d34ce22f7cd7efeaf225cd40bbf5f9f49825fd5c462ac7

The sender and recipient fingerprints are the raw 32-byte SHA3-256 digests (§2.1), not hex strings. On the decrypt side (§5.5 Step 6), the AAD is assembled as "lo-dm-v1" || Alice.fingerprint_raw || Bob.fingerprint_raw || session_init_bytes — the same order. This is the most common first integration failure: passing hex-encoded fingerprints (64 bytes) instead of raw (32 bytes), or swapping sender/recipient positions.

F.19 Ratchet Message AAD (§6.5)

sender_fp:      0xAA × 32  (raw SHA3-256, not hex)
recipient_fp:   0xBB × 32
ratchet_header: F.14 no-KEM-ct encoding (1225 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_ratchet_header(h)
      (1297 bytes: 8 + 32 + 32 + 1225)

SHA3-256(aad): eaec65b7ac6d8e3912bacf1ed40429ab5005f33550c1d6e0231844fecac6a93e

Direction asymmetry: When Alice encrypts, sender_fp = Alice.fingerprint_raw and recipient_fp = Bob.fingerprint_raw. When Bob decrypts the same message, he assembles AAD with sender_fp = Alice.fingerprint_raw (the message author) and recipient_fp = Bob.fingerprint_raw (himself) — the same order. A reimplementer who uses (local_fp, remote_fp) for both encrypt and decrypt gets the wrong AAD on one side.
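Typing the fingerprints as `[u8; 32]` rules out the F.18 hex-string mistake at compile time, and a pair of wrappers makes the direction explicit. The Rust below is an illustrative sketch with hypothetical names. `encoded` is whichever of encode_session_init or encode_ratchet_header applies.

```rust
const DM_LABEL: &[u8] = b"lo-dm-v1";

/// Raw 32-byte fingerprints only: a 64-character hex string cannot be passed.
fn dm_aad(sender_fp: &[u8; 32], recipient_fp: &[u8; 32], encoded: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(DM_LABEL.len() + 64 + encoded.len());
    aad.extend_from_slice(DM_LABEL);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(encoded);
    aad
}

/// Encrypting: the local party is the message author.
fn aad_for_encrypt(local_fp: &[u8; 32], remote_fp: &[u8; 32], encoded: &[u8]) -> Vec<u8> {
    dm_aad(local_fp, remote_fp, encoded)
}

/// Decrypting: the remote party is the author, so the fingerprint order swaps.
fn aad_for_decrypt(local_fp: &[u8; 32], remote_fp: &[u8; 32], encoded: &[u8]) -> Vec<u8> {
    dm_aad(remote_fp, local_fp, encoded)
}
```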

F.19b Ratchet Message AAD — New Epoch (§6.5)

sender_fp:      0xAA × 32  (raw SHA3-256, not hex)
recipient_fp:   0xBB × 32
ratchet_header: F.14 with-KEM-ct encoding (2347 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_ratchet_header(h)
      (2419 bytes: 8 + 32 + 32 + 2347)

SHA3-256(aad): 25e46f405c91fb21aef5f7cd719d19b36d3edc030edaedf488f5624c02e4c854

This is a common reimplementation failure path: the with-KEM-ct header (2347 bytes) appears in every new-epoch message, and an incorrect KEM ciphertext length prefix (omitted, little-endian, or computed for the wrong ciphertext size) silently produces a different AAD, which surfaces only as an AEAD failure with no diagnostic.

F.20 Argon2id (§10.6)

password (21 bytes): 746573742d70617373776f72642d736f6c69746f6e  ("test-password-soliton")
salt     (16 bytes): 736f6c69746f6e2d73616c742d766563              ("soliton-salt-vec")
m_cost:  65536 (KiB, = Argon2Params::RECOMMENDED)
t_cost:  3
p_cost:  4
version: 0x13 (19, Argon2id v1.3)
output_len: 32

output (32 bytes): 79f1dce60c8371a21f849470848c40dc1589deb5119cd3c4f26298c3f17ac3cf

Verified against the reference C implementation (argon2 CLI, libargon2 20190702) and the argon2 Rust crate used by soliton. Key implementation pitfall: m_cost is in KiB, not bytes. A wrapper whose memory parameter is in bytes, when given 65536, runs with 64 KiB instead of 64 MiB and silently produces a different output, because Argon2 accepts any m_cost ≥ 8 × p_cost. The p_cost parameter specifies lanes (degree of parallelism); some wrappers accept a separate "threads" parameter, and for this vector lanes = threads = 4.
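The unit arithmetic behind this pitfall, as a small Rust sketch with hypothetical helper names. The last helper encodes the RFC 9106 lower bound, which is why the mistaken 64 KiB configuration is still accepted and the error goes unnoticed.

```rust
/// Argon2id's m_cost counts KiB (RFC 9106), not bytes.
fn m_cost_kib_to_bytes(m_cost_kib: u32) -> u64 {
    u64::from(m_cost_kib) * 1024
}

/// What a bytes-based wrapper actually allocates, in KiB, for a given byte count.
fn bytes_to_kib(bytes: u64) -> u64 {
    bytes / 1024
}

/// RFC 9106 lower bound: m_cost must be at least 8 * p_cost KiB.
fn m_cost_is_valid(m_cost_kib: u64, p_cost: u32) -> bool {
    m_cost_kib >= 8 * u64::from(p_cost)
}
```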

F.21 Ratchet Blob Layout (§6.8)

The ratchet blob is too large for a full hex dump (3,847 bytes for Alice's minimal initial state due to X-Wing keys) but the structural layout is the primary reimplementation hazard. This annotated offset table describes Alice's initial state after init_alice + one to_bytes() call, with no OPK, no previous epoch, and no recv_seen entries:

Offset  Size    Field                          Value (this vector)
------  ------  -----------------------------  -------------------
0       1       version                        0x01
1       8       epoch (u64 BE)                 1
9       32      root_key                       [session-dependent]
41      32      send_epoch_key                 [session-dependent]
73      32      recv_epoch_key                 0x00 * 32 (always all-zero for Alice initial — hard-coded in init_alice; not session-dependent)
105     32      local_fp                       SHA3-256(Alice.IK_pub)
137     32      remote_fp                      SHA3-256(Bob.IK_pub)

--- Optional: send_ratchet_sk (present) ---
169     1       present flag                   0x01
170     2       length (u16 BE)                0x0980 (2432)
172     2432    X-Wing secret key              [session-dependent]

--- Optional: send_ratchet_pk (present) ---
2604    1       present flag                   0x01
2605    2       length (u16 BE)                0x04C0 (1216)
2607    1216    X-Wing public key              [session-dependent]

--- Optional: recv_ratchet_pk (absent in Alice initial) ---
3823    1       present flag                   0x00

--- Optional: prev_recv_epoch_key (absent) ---
3824    1       present flag                   0x00

--- Optional: prev_recv_ratchet_pk (absent) ---
3825    1       present flag                   0x00

--- Counters ---
3826    4       send_count (u32 BE)            0x00000001 (1)
3830    4       recv_count (u32 BE)            0x00000000 (0)
3834    4       prev_send_count (u32 BE)       0x00000000 (0)

--- Flags ---
3838    1       ratchet_pending                0x00

--- recv_seen set ---
3839    4       num_recv_seen (u32 BE)         0x00000000 (0)
                (entries would follow as u32 BE, sorted ascending)

--- prev_recv_seen set ---
3843    4       num_prev_recv_seen (u32 BE)    0x00000000 (0)
                (entries would follow as u32 BE, sorted ascending)

Total: 3847 bytes

Key reimplementation hazards:

Optional field encoding: present fields use 0x01 + u16_BE_length + data; absent fields use a single 0x00 byte. Exception: prev_recv_epoch_key uses 0x01 + 32_bytes (no length prefix) since the size is always exactly 32 bytes. Present-case byte sequence: 01 XX XX...XX (33 bytes, where XX × 32 is the key). A decoder that inserts a 2-byte length prefix after the 0x01 marker misaligns all subsequent fields by 2 bytes. See §6.8 for a full worked example.
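X-Wing key sizes: secret key = 2432 bytes (32 X25519 + 2400 ML-KEM-768), public key = 1216 bytes (32 X25519 + 1184 ML-KEM-768). These encodings are not interchangeable with draft-09: the draft uses a different secret-key representation (a 32-byte seed) and orders the public key ML-KEM-first, so even the equal-length 1216-byte public key has a different byte layout.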
recv_seen entries: sorted strictly ascending, each u32 BE. No entry may equal u32::MAX. Each entry must be < recv_count. No test vector exercises a non-empty recv_seen or prev_recv_seen — both F.21 vectors show the empty case (num_recv_seen = 0). To independently verify the encoding: a recv_seen set containing {0x00000001, 0x00000003} would serialize as 00 00 00 02 (count=2) followed by 00 00 00 01 00 00 00 03 (two u32 BE values in ascending order). A {0x00000003, 0x00000001} insertion order MUST produce the same ascending-sorted bytes — the sort is by value, not insertion order.
send_count = 1 in Alice initial: init_alice sets send_count = 1 directly (§6.2) because counter 0 was consumed by encrypt_first_message. The first ratchet encrypt() call uses counter 1 and increments send_count to 2. A reimplementer who initializes send_count = 0 and expects the first encrypt() to set it to 1 produces a nonce collision with the first-message counter.
Absent recv_ratchet_pk with recv_count = 0: Valid for Alice's initial state (hasn't received anything). Guard 3 prevents recv_count > 0 with absent recv_ratchet_pk.
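A `BTreeSet<u32>` iterates in ascending order regardless of insertion order, which makes it a convenient model for recv_seen. The Rust sketch below is illustrative only, and its names are hypothetical. The decoder enforces the strictly-ascending, below-recv_count, and not-u32::MAX rules listed above, and it returns the number of bytes consumed so a caller can continue with prev_recv_seen.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum BlobError {
    InvalidData,
}

/// u32 BE count, then each entry as u32 BE in ascending order.
fn encode_recv_seen(seen: &BTreeSet<u32>, recv_count: u32) -> Result<Vec<u8>, BlobError> {
    let mut out = Vec::with_capacity(4 + 4 * seen.len());
    out.extend_from_slice(&(seen.len() as u32).to_be_bytes());
    for &n in seen {
        // BTreeSet yields ascending values, independent of insertion order.
        if n == u32::MAX || n >= recv_count {
            return Err(BlobError::InvalidData);
        }
        out.extend_from_slice(&n.to_be_bytes());
    }
    Ok(out)
}

/// Returns the decoded set and the number of bytes consumed.
fn decode_recv_seen(buf: &[u8], recv_count: u32) -> Result<(BTreeSet<u32>, usize), BlobError> {
    let head = buf.get(..4).ok_or(BlobError::InvalidData)?;
    let count = u32::from_be_bytes([head[0], head[1], head[2], head[3]]) as usize;
    let end = count
        .checked_mul(4)
        .and_then(|n| n.checked_add(4))
        .ok_or(BlobError::InvalidData)?;
    let body = buf.get(4..end).ok_or(BlobError::InvalidData)?;
    let mut set = BTreeSet::new();
    let mut prev: Option<u32> = None;
    for c in body.chunks_exact(4) {
        let n = u32::from_be_bytes([c[0], c[1], c[2], c[3]]);
        // Strictly ascending, never u32::MAX, always below recv_count.
        if prev.map_or(false, |p| n <= p) || n == u32::MAX || n >= recv_count {
            return Err(BlobError::InvalidData);
        }
        set.insert(n);
        prev = Some(n);
    }
    Ok((set, end))
}
```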

Bob's initial state (after init_bob + one to_bytes() call, i.e., after receiving Alice's session init and calling decrypt_first_message):

Offset  Size    Field                          Value (this vector)
------  ------  -----------------------------  -------------------
0       1       version                        0x01
1       8       epoch (u64 BE)                 1
9       32      root_key                       [session-dependent]
41      32      send_epoch_key                 0x00 * 32 (all-zero placeholder; replaced on first KEM ratchet step)
73      32      recv_epoch_key                 [session-dependent]
105     32      local_fp                       SHA3-256(Bob.IK_pub)
137     32      remote_fp                      SHA3-256(Alice.IK_pub)

--- Optional: send_ratchet_sk (absent — Bob hasn't sent yet) ---
169     1       present flag                   0x00

--- Optional: send_ratchet_pk (absent) ---
170     1       present flag                   0x00

--- Optional: recv_ratchet_pk (present = Alice's EK_pub from SessionInit) ---
171     1       present flag                   0x01
172     2       length (u16 BE)                0x04C0 (1216)
174     1216    X-Wing public key              [session-dependent = Alice's EK_pub]

--- Optional: prev_recv_epoch_key (absent) ---
1390    1       present flag                   0x00

--- Optional: prev_recv_ratchet_pk (absent) ---
1391    1       present flag                   0x00

--- Counters ---
1392    4       send_count (u32 BE)            0x00000000 (0)
1396    4       recv_count (u32 BE)            0x00000001 (1)
1400    4       prev_send_count (u32 BE)       0x00000000 (0)

--- Flags ---
1404    1       ratchet_pending                0x01 (Bob's first send triggers a KEM ratchet step)

--- recv_seen set ---
1405    4       num_recv_seen (u32 BE)         0x00000000 (0)
                (empty — counter 0 was consumed by decrypt_first_message,
                 outside the ratchet; do NOT seed with {0})

--- prev_recv_seen set ---
1409    4       num_prev_recv_seen (u32 BE)    0x00000000 (0)

Total: 1413 bytes (vs Alice's 3847)

The 2,434-byte difference comes from which optional fields are present. Bob has not sent a ratchet message yet and therefore has no send-side ratchet key, so send_ratchet_sk (3 + 2432 bytes when present) and send_ratchet_pk (3 + 1216 bytes) are each encoded as a single 0x00 byte. Bob does carry recv_ratchet_pk (3 + 1216 bytes), which Alice lacks, so the two public-key blocks cancel and the net difference equals the send_ratchet_sk block minus its 1-byte absent marker. Reimplementers whose first test is "Bob receives and serializes" will see a much smaller blob than Alice's layout; this is correct, not a bug. Note also that send_epoch_key is all-zero in Bob's initial state (a placeholder for his sending direction, replaced by the first KEM ratchet step when Bob sends his first message), while recv_epoch_key is the actual key Bob received during decrypt_first_message.
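Both layouts can be round-tripped against their offsets with a serializer that mirrors the F.21 field order. This is a hypothetical Rust sketch, not soliton's to_bytes(). Variable-size optional fields use 0x01 + u16 BE length + data, prev_recv_epoch_key uses 0x01 + 32 bytes, absent fields use 0x00, and the recv_seen vectors are assumed to be sorted already.

```rust
/// Illustrative mirror of the F.21 field order; not soliton's actual struct.
struct RatchetState {
    epoch: u64,
    root_key: [u8; 32],
    send_epoch_key: [u8; 32],
    recv_epoch_key: [u8; 32],
    local_fp: [u8; 32],
    remote_fp: [u8; 32],
    send_ratchet_sk: Option<Vec<u8>>,      // X-Wing SK, 2432 bytes
    send_ratchet_pk: Option<Vec<u8>>,      // X-Wing PK, 1216 bytes
    recv_ratchet_pk: Option<Vec<u8>>,      // X-Wing PK, 1216 bytes
    prev_recv_epoch_key: Option<[u8; 32]>, // fixed size: no length prefix
    prev_recv_ratchet_pk: Option<Vec<u8>>,
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    ratchet_pending: bool,
    recv_seen: Vec<u32>,      // assumed sorted ascending
    prev_recv_seen: Vec<u32>, // assumed sorted ascending
}

/// Present: 0x01 + u16 BE length + data. Absent: 0x00.
fn put_opt_var(out: &mut Vec<u8>, v: &Option<Vec<u8>>) {
    match v {
        Some(bytes) => {
            out.push(0x01);
            out.extend_from_slice(&(bytes.len() as u16).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        None => out.push(0x00),
    }
}

/// u32 BE count followed by u32 BE entries.
fn put_set(out: &mut Vec<u8>, set: &[u32]) {
    out.extend_from_slice(&(set.len() as u32).to_be_bytes());
    for n in set {
        out.extend_from_slice(&n.to_be_bytes());
    }
}

fn to_bytes(s: &RatchetState) -> Vec<u8> {
    let mut out = vec![0x01u8]; // version
    out.extend_from_slice(&s.epoch.to_be_bytes());
    for k in [&s.root_key, &s.send_epoch_key, &s.recv_epoch_key, &s.local_fp, &s.remote_fp] {
        out.extend_from_slice(k);
    }
    put_opt_var(&mut out, &s.send_ratchet_sk);
    put_opt_var(&mut out, &s.send_ratchet_pk);
    put_opt_var(&mut out, &s.recv_ratchet_pk);
    match &s.prev_recv_epoch_key {
        Some(k) => {
            out.push(0x01); // no length prefix: always 32 bytes
            out.extend_from_slice(k);
        }
        None => out.push(0x00),
    }
    put_opt_var(&mut out, &s.prev_recv_ratchet_pk);
    for c in [s.send_count, s.recv_count, s.prev_send_count] {
        out.extend_from_slice(&c.to_be_bytes());
    }
    out.push(s.ratchet_pending as u8);
    put_set(&mut out, &s.recv_seen);
    put_set(&mut out, &s.prev_recv_seen);
    out
}
```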

A reimplementer should round-trip their own serialization/deserialization against these offsets before attempting a cross-implementation session.

Test path for from_bytes → ChainExhausted: To test the ChainExhausted error path from deserialization (guard 24, §6.8), take any valid ratchet blob and overwrite the 8-byte epoch field with 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF (u64::MAX in big-endian). In the F.21 layout the epoch occupies offsets 1-8, immediately after the 1-byte version tag at offset 0; there is no reserved field in between. Pass the modified blob to soliton_ratchet_from_bytes (or from_bytes_with_min_epoch with any min_epoch). The function MUST return ChainExhausted (-15), NOT InvalidData (-17). A deserializer that returns InvalidData for this input misclassifies an exhausted but otherwise valid state as corrupted data, causing the caller to permanently discard a session that §12 mode (3) handles differently.
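The patch itself is a single overwrite. A Rust sketch with a hypothetical helper name follows; the patched blob would then be passed to soliton_ratchet_from_bytes, which per the text above must return ChainExhausted (-15).

```rust
/// Byte 0 is the version tag; bytes 1..9 hold the u64 BE epoch (F.21).
const EPOCH_RANGE: std::ops::Range<usize> = 1..9;

/// Returns a copy of `blob` with the epoch forced to u64::MAX.
fn with_exhausted_epoch(blob: &[u8]) -> Vec<u8> {
    assert!(blob.len() >= EPOCH_RANGE.end, "blob too short to hold an epoch");
    let mut patched = blob.to_vec();
    patched[EPOCH_RANGE].copy_from_slice(&u64::MAX.to_be_bytes());
    patched
}
```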

F.22 Streaming AEAD with Compression (§15)

Note: A byte-exact compressed streaming vector is not provided because zstd output is not guaranteed to be identical across implementations, compression levels, or library versions for the same input. The frame format (RFC 8878) is standardized, but the encoder's block-splitting, match-finding, and entropy coding decisions vary.

To validate the compression integration path:

Encrypt a known plaintext chunk with compress=true using your zstd encoder.
Verify the stream header has flags=0x01 (bit 0 set).
Decrypt using your zstd decoder — the recovered plaintext must match the original.
Cross-validate the AAD: identical to the uncompressed case (F.10) except flags=0x01. The flags byte appears in both the stream header (byte 1 of the 26-byte header) and in the per-chunk AAD (byte 13, immediately after the 12-byte "lo-stream-v1" label and the 1-byte version field). Using the same base nonce as F.10 and flags=0x01, the compressed chunk_index=0, tag_byte=0x00 AAD is:
```
6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f2021222324252627000000000000000000
(47 bytes — identical to F.10 except byte 13 is 01 instead of 00)
```
Byte 13 is the flags byte; bytes 14-37 are the base_nonce; bytes 38-45 are the chunk index; byte 46 is the tag_byte. A reimplementer who propagates flags=0x01 to the stream header but uses flags=0x00 in the per-chunk AAD (or vice versa) will produce a ciphertext that their own implementation cannot decrypt — the AAD mismatch causes AEAD failure immediately, with no diagnostic pointing to the flag inconsistency.
Verify that a chunk compressed with flags=0x01 is rejected when decrypted with flags=0x00 (wrong AAD), and vice versa.

Compressed + non-empty caller_aad: F.10 covers uncompressed + non-empty caller_aad; F.22 above covers compressed + empty caller_aad. The combination compressed + non-empty caller_aad is exercised by substituting flags=0x01 into the F.10 non-empty-caller_aad entry. Using the same inputs as F.10 (chunk_index=2, tag_byte=0x01, caller_aad="file-abc-123") but with flags=0x01:

aad: 6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f202122232425262700000000000000020166696c652d6162632d313233
     (59 bytes — identical to F.10 chunk_index=2 entry except byte 13 is 01 instead of 00)

A reimplementer whose compressed path assembles the AAD separately from the uncompressed path (for example, one that adds a spurious length prefix to a non-empty caller_aad only when flags=0x01, a plausible mistake given that all other AAD fields are fixed-size) would pass all F.10 and F.22 vectors and fail only here, which shows up as an AEAD mismatch in cross-implementation testing.

Empty final chunk with compress=true — AAD still uses flags=0x01: A stream initialized with compress=true that ends with an empty final chunk (is_last=true, 0-byte plaintext) bypasses compression per §15.5, but the per-chunk AAD still uses flags=0x01 — the stream's compression configuration, not the per-chunk outcome. Using the same base nonce as F.10 and flags=0x01, the empty final chunk (chunk_index=0, tag_byte=0x01) AAD is:

6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f2021222324252627000000000000000001
(47 bytes — identical to F.10 step 4 except byte 13 is 01 (flags=compressed) and byte 46 is 01 (tag_byte=final))

A reimplementer who writes flags=0x00 for this chunk ("compression was bypassed, so this chunk's flag should be 0") produces an AAD mismatch and immediate AEAD failure on decrypt, with no diagnostic pointing to the flag inconsistency.
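A per-chunk AAD builder covering the three cases above, as a hypothetical Rust sketch derived from the byte positions given in this section: label at 0-11, version at 12, flags at 13, base_nonce at 14-37, chunk index at 38-45, tag_byte at 46, then caller_aad with no length prefix. The flags argument is the stream's configuration, so it stays 0x01 for every chunk of a compressed stream, including an empty final chunk.

```rust
const STREAM_LABEL: &[u8; 12] = b"lo-stream-v1";

fn stream_chunk_aad(
    version: u8,
    flags: u8, // stream-wide setting, not the per-chunk compression outcome
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(STREAM_LABEL);
    aad.push(version);
    aad.push(flags);
    aad.extend_from_slice(base_nonce);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad.push(tag_byte);
    aad.extend_from_slice(caller_aad); // appended raw, never length-prefixed
    aad
}
```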

The critical interop property is not the compressed byte sequence but the encrypt-then-decrypt round-trip and the AAD binding of the flags byte.

F.23 Storage Blob Wire Format (§11.1)

The storage blob has a fixed-layout header followed by an AEAD-protected body. No fabricated ciphertext bytes are included — the test for this section is structural validation of the header and the AAD construction, not a known-answer ciphertext output.

Wire layout:

Offset  Size  Field
------  ----  -----
0       1     version (u8) — storage key version, 1-255; 0 is reserved (AeadFailed via keyring miss — NOT InvalidData; §11.1 prohibits a pre-AEAD version-0 check as an oracle)
1       1     flags (u8)   — bit 0 = FLAG_COMPRESSED (0x01); bits 1-7 reserved (AeadFailed if set)
2       24    nonce        — 192-bit random; unique per encrypted blob
26      ≥16   ciphertext + Poly1305 tag — XChaCha20-Poly1305 AEAD output

Minimum blob: 42 bytes (1 + 1 + 24 + 16). A valid blob with zero plaintext bytes still carries a full 16-byte Poly1305 tag.

AAD binding (§11.4): Both version and flags are included in the AAD passed to AEAD. The AEAD operation covers bytes [26..] with AAD constructed from version, flags, channel_id, and segment_id. Neither version nor flags is inside the ciphertext, so an implementation that omits them from the AAD produces a malleable blob (see §11.1).

Decryption read path:

1. Assert len(blob) >= 42; else AeadFailed (per the §12 oracle-collapse table, a sub-42-byte blob returns AeadFailed, not InvalidData or InvalidLength, to prevent an oracle distinguishing "too short to contain valid ciphertext" from "plausible-length blob with wrong key/tag").
2. Read version = blob[0], flags = blob[1], nonce = blob[2..26].
3. Reject unknown flag bits as AeadFailed. (version == 0 is not a separate early reject: version 0 is never loaded into the keyring, so step 4's key-not-found lookup returns AeadFailed, the same outcome as any other unrecognized version.)
4. Look up encryption key by version; if absent → AeadFailed (not UnsupportedVersion, per the §11.3 oracle rule).
5. Reconstruct AAD from version, flags, channel_id, segment_id (§11.4.1 or §11.4.2).
6. Decrypt blob[26..] with XChaCha20-Poly1305 using nonce and AAD.
7. If compressed (flags & 0x01), decompress with zstd.
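Steps 1 through 4 of the read path can be sketched as a single header parser in which every rejection collapses to AeadFailed. The Rust below is illustrative: the error enum, `ParsedHeader`, and the `has_key` closure standing in for the keyring lookup are all hypothetical. Version 0 needs no dedicated check because it is never in the keyring.

```rust
#[derive(Debug, PartialEq)]
enum StorageError {
    AeadFailed,
}

const FLAG_COMPRESSED: u8 = 0x01;
const MIN_BLOB_LEN: usize = 1 + 1 + 24 + 16; // version + flags + nonce + Poly1305 tag

struct ParsedHeader<'a> {
    version: u8,
    compressed: bool,
    nonce: &'a [u8],      // 24 bytes
    ciphertext: &'a [u8], // ciphertext || tag, at least 16 bytes
}

fn parse_storage_blob<'a>(
    blob: &'a [u8],
    has_key: impl Fn(u8) -> bool,
) -> Result<ParsedHeader<'a>, StorageError> {
    if blob.len() < MIN_BLOB_LEN {
        return Err(StorageError::AeadFailed); // not InvalidLength: oracle collapse
    }
    let (version, flags) = (blob[0], blob[1]);
    if flags & !FLAG_COMPRESSED != 0 {
        return Err(StorageError::AeadFailed); // reserved bits 1-7 set
    }
    if !has_key(version) {
        return Err(StorageError::AeadFailed); // not UnsupportedVersion; covers version 0
    }
    Ok(ParsedHeader {
        version,
        compressed: flags & FLAG_COMPRESSED != 0,
        nonce: &blob[2..26],
        ciphertext: &blob[26..],
    })
}
```

Steps 5 through 7 (AAD reconstruction, AEAD decryption, decompression) would consume the returned slices.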

F.8 covers the AAD construction with a full worked example. Combine F.8 with the wire layout above to validate a complete storage encrypt/decrypt round-trip.

F.24 ML-DSA-65 Seed-to-Public-Key (§2.1 Cross-Library Verification)

This vector verifies the §2.1 portable cross-library verification procedure: extract candidate 32 bytes, call ML-DSA.KeyGen_internal(candidate), compare the resulting public key against the known public key.

Input: seed ξ = [0xAA] × 32 (32 bytes, all 0xAA)

seed (32 bytes, hex):
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Expected output: ML-DSA-65 public key (1952 bytes). Computed via MlDsa65::from_seed(ξ).verifying_key().encode() using the ml-dsa Rust crate (the version used by soliton; see Cargo.lock). ML-DSA.KeyGen_internal(ξ) runs FIPS 204 Algorithm 6 deterministically with no CSPRNG input.

public_key (1952 bytes, hex):
2a3cd553791045a9363393c3f720866028e048bf598a099e8f81043491fb7095
71fe64caa83e93bac8c1c931d1a148aa8d04a37d42c6cfdaebc0638c7fff12b7
0ab0d76c209239bc4bbdfb36d2a676098792a1f9a5fd388292300e416e693633
1026274889fc21b80701d39a2c564f11417a20c2be4dae84ab0743071bb97bf8
c65c4520a6fd5c4e48e759bfac3e857a5fc23de915dee91d3fe6e83b5230ec28
d77b478c4831d19bf26e697abf5c890f527e6fef6f0499b69a490af5dc6e5e43
3b1c168ba9e9e51aab125d0927d1aaa5cc17cd649b6a5ca83418b163d9dd487c
fcbdcebb7d6386ed26ff22f4cdd329dfce0de2667d1809401a649cdcf4c4232b
06abcaa82c2d8277a12622045de61d224ac6913b488d885822b2c1a7e5c1be41
61eed1c5a79da7d86738e4d77740591090216554246cc12aa89ebd9c024e054a
1a9fc28d18b6263dd95cd9e5e50a28a615742f1c43a1326bb004f9fe0856672a
7e7873226d222f949032c17f3a13e7a9a1812f496cfa88d1261bde89a7d8117f
cd1e7fa50cc26072d516613cb75f457b7f7681f9b5c58c0fa13be6fc56ec446f
5c1347b62cde77c950d368fc63329a35f6584bfd74ae769fafdb7982601be1ad
d10816d57b85a647b7bf772a21d56453303b67825c58a9f71b0fb644b6ce6351
2dcd054ae0dc5995abed531098a1235c757ceada7e643004530173eccb2e2d3f
114a578cd7cca8304a4cab4fe39e206a089193f2566ff811da3eca6e634594e9
b330b85f10fb50304b997d387189aa121746aba38897c1691ca2fe590e2f12e1
bfb84106b043dcb8e8ec7009d8247cab028b90e792b9d186f20ed3e6ec0dd419
b54f953572ade2e144c3bade312cbe92d52e7c8ed350af61c24848dfa4686f30
fecb15b25e5618797e78add739e542b725f517fdd0ab4084a5d4da81bfe6e226
72b5f8817be017a28674e97d0f7f8410e7bb7257ab5131e1b56ca21cc7c57b75
c4a5b05992971f46d2648675b829ed71bfb49b5dfba39c071ac95cbe42d9dfc1
1bb81ad316e7656b55dba3f8a5786c050607d355791d5406c9e21c99a6ba2763
44eb1e755c8d83344a344ad5ea149051b91729c7cafdf5252d5a766ede05ad9e
1ef06e5dbf7de24486155caa2e92275d54f8c2df4e85f29605b975a9c2bbe775
f33761fc05d894a0834f96f5355cb63b83f0e90e5b5111bdca71611c93df96db
72a1db4723fdf4184c7f62f1e3efca954a772667effc9b553b7ab91c644cdbfb
a15c5f5c9e6b38e4df2ade1dd0b098739c47b39f5520eb2f584d0e353f90eddd
20320800545eec44c51f6c41618dc1451041ecb958351e2ef04a5fc13a7195c9
bc0a397944da82bdfa7ac46aeb05bee813944b25e66b311263f9d0f3d9bb6f5e
242a53b2ab9322eacf70388bd5be0ad4990ae9d7e3abaca428ce50c6eb35c9ee
0ced604ac17db0443b2ad1fe6d9bd9397457f2ce0f5e8665d9acd96b924344ad
3bc45cc0cf392d48b2e4dabbb07da0e2ba5561d346a952bd20054d035a4ff378
a4108dc0092ee25b40be9056a235aaf9aa314874351c99ec0bcbfa7e9bb6b1bd
e74fa506f863008058482be9fcb67449b2c2566b6d011985c4311fc5f1551bfc
a25699123a68e2a790cf2388120f28ff411836a3e95a97c6f60633b7cc27bea6
5f9323abb9731b222e28db4765748a3bfcdb962f0e290025e3b28b5642d891cd
dc09f1489aea45eee7f57ff231be716204192fbcf5fb574bae1d7e1e6e039bfe
f7dcd93162c093c11a80c2ab5b127a4f214ebb03dae20dcc38afa246320ee8e4
4a2ca97fd265fc7813ddd5efbacb981f401fbaa895e60a58a6e7d44bc1a17873
d523bc2659256e0e73eec555a6f7c799c74902e0ddb593c8d76937623feb0bcc
ee816d42841cf7935383435ba4fc4d4b6d36235da237fc2dcdd6d0e774157616
503a174417cd325e2c3ebe0b520d94f6f4c18a8d01a2daa087ce0fce85e61aa4
f126664d073773f8a927ed8c4d4d9724449af637a8e3a8bc15ddf2ec19f9f5b2
d0ad84dbf2a59fce072b132ba38bf5d985a966cabce4cc5f31a87101b56eb7e6
2b37e3fd0091afb9c8063cba5184234ceb4938313440678d4dd6c9eafa88d986
6e56fe75fa063396a66be4833ebef1e0b25ced5d5f5bceb26b7cb4775e792926
58c81b2405d2b488a0a881a2589749acbd0af912308aff5450d87dde8a0ee25b
7fee2a7d3b76e5237bfe6fc890f009438e539e1719864958c2bf3b63e43fae41
e591b53a8fbac2ec3f37d5b74a6d2e83b9a1e050ffc082e415e39288d51fabbb
1c791c2ccccef44e6f9c2886c506e561ad372cc20b691aba206d14d007518c3f
4b7aeabfe836f5c55fd65fb85d09948b652e1156983678ab6dffe739ca888614
3fe630b2a10741aa81d6a79fdfd3d144f2ce43d39ad5cd55e42d4f6deef1f406
2fd03aa83676a0a945dbd702e5f8c111c84d74c3d5a53d72a426c8ca5bc3f4f2
a4226d2efe1c1a476fd65d69a2d85216213108b45e5567bcba7f9ac9c73d7173
21561d56589eafa13f49fbeaa1ad47a3f2cb4f4f64f8b2055ea5968035a12b34
f0735981e2f3ebf50c51ba7e1f3e1f9b0ab892eb90e04d3a4d5e924b4280fb5f
e9e018f9d0bec7b53a097986abc61f6b3c9ef97dc30e97b4841cd1d64303646b
15b4fc5f99e97af00bc205a5c53097f572f0914fbc706c7164e1564396bfaa7d
ac3531d2c109a62c16ef9e81b49dbd91d7669bf5cf2ff875539b2ee691215114

How to use: A reimplementer who cannot call to_seed() on their ML-DSA library (e.g., because the library does not expose the 32-byte seed) uses the procedure in §2.1: extract the candidate 32 bytes from whatever API the library provides, call ML-DSA.KeyGen_internal(candidate) (FIPS 204 §6.1; deterministic, no CSPRNG input), and compare the resulting 1952-byte public key against the known public key. If the result matches this vector for ξ = [0xAA] × 32, the candidate extraction is correct. Note: soliton signs and verifies with Sign_internal / Verify_internal rather than the standard FIPS 204 domain-separated API (see §2.1), but that choice does not affect this vector, which exercises key generation only. The public key encoding is standard FIPS 204 pkEncode format and is byte-for-byte comparable with any compliant ML-DSA-65 implementation.

F.25 Standalone HKDF-SHA3-256 Primitive (§5.4, §6.4)

Purpose: Validates HKDF-SHA3-256 in isolation before any composed operation. The primary interop trap: the RFC 5869 test vectors use HMAC-SHA-256 (64-byte block size) and HMAC-SHA-1, not HMAC-SHA3-256, whose block size is 136 bytes (the SHA3-256 rate, §4.3). A reimplementer who pads the HMAC key to 64 bytes instead of 136 produces wrong output in every derived key, with no diagnostic.

salt (32 bytes): 0000000000000000000000000000000000000000000000000000000000000000
ikm  (64 bytes): 0101010101010101010101010101010101010101010101010101010101010101
                 0101010101010101010101010101010101010101010101010101010101010101
info (15 bytes): "lo-test-hkdf-v1"  (raw UTF-8, no length prefix)
len:             64

output (64 bytes):
  4a694c255636bd5a472c807cf1400a05f78a4a3e93b7f663dd6825c9d496904c
  6224e025169b8c67e62ed3b10129da39c546d6e84c84920f69232fd8e76e7cf0

Verified by tests/compute_vectors.rs::f25_hkdf_sha3_256 in the reference implementation.
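Because the block-size trap lives inside HMAC, it can help to see HKDF-SHA3-256 assembled from scratch. The sketch below is a self-contained Rust illustration using only the standard library: a compact Keccak-f[1600] permutation, SHA3-256 with the 0x06 domain padding, HMAC with a 136-byte block, and RFC 5869 extract-then-expand. It is meant for cross-checking a real implementation, not for production use (no constant-time guarantees, no zeroization).

```rust
// Keccak-f[1600] round constants, rho rotation offsets, and pi lane order.
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
const ROTC: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];

fn keccak_f(st: &mut [u64; 25]) {
    for &rc in RC.iter() {
        let mut bc = [0u64; 5]; // theta
        for i in 0..5 { bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20]; }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) { st[j + i] ^= t; }
        }
        let mut t = st[1]; // rho and pi
        for i in 0..24 {
            let tmp = st[PILN[i]];
            st[PILN[i]] = t.rotate_left(ROTC[i]);
            t = tmp;
        }
        for j in (0..25).step_by(5) { // chi
            let row = [st[j], st[j + 1], st[j + 2], st[j + 3], st[j + 4]];
            for i in 0..5 { st[j + i] ^= !row[(i + 1) % 5] & row[(i + 2) % 5]; }
        }
        st[0] ^= rc; // iota
    }
}

fn sha3_256(msg: &[u8]) -> [u8; 32] {
    const RATE: usize = 136;
    let mut padded = msg.to_vec();
    padded.push(0x06); // SHA-3 domain bits + first pad bit
    while padded.len() % RATE != 0 { padded.push(0x00); }
    *padded.last_mut().unwrap() |= 0x80; // final pad bit
    let mut st = [0u64; 25];
    for block in padded.chunks(RATE) {
        for (i, lane) in block.chunks(8).enumerate() {
            let mut b = [0u8; 8];
            b.copy_from_slice(lane);
            st[i] ^= u64::from_le_bytes(b);
        }
        keccak_f(&mut st);
    }
    let mut out = [0u8; 32];
    for i in 0..4 { out[8 * i..8 * i + 8].copy_from_slice(&st[i].to_le_bytes()); }
    out
}

/// HMAC-SHA3-256: the key is padded to the 136-byte SHA3-256 block, not 64.
fn hmac_sha3_256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    const BLOCK: usize = 136;
    let mut k = [0u8; BLOCK];
    if key.len() > BLOCK { k[..32].copy_from_slice(&sha3_256(key)); } else { k[..key.len()].copy_from_slice(key); }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha3_256(&inner));
    sha3_256(&outer)
}

/// RFC 5869 extract-then-expand with HashLen = 32.
fn hkdf_sha3_256(salt: &[u8], ikm: &[u8], info: &[u8], len: usize) -> Vec<u8> {
    assert!(len <= 255 * 32, "HKDF output is limited to 255 blocks");
    let prk = hmac_sha3_256(salt, ikm);
    let (mut okm, mut t) = (Vec::with_capacity(len), Vec::new());
    for counter in 1..=((len + 31) / 32) as u8 {
        let mut input = t.clone();
        input.extend_from_slice(info);
        input.push(counter);
        t = hmac_sha3_256(&prk, &input).to_vec();
        okm.extend_from_slice(&t);
    }
    okm.truncate(len);
    okm
}
```

Calling `hkdf_sha3_256(&[0; 32], &[0x01; 64], b"lo-test-hkdf-v1", 64)` should reproduce the F.25 output above if the implementation is correct; changing `BLOCK` to 64 is a quick way to watch the trap produce a different result.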

F.26 Standalone XChaCha20-Poly1305 Primitive (§3, §6.5)

Purpose: Validates the AEAD primitive directly: (key, nonce, plaintext, aad) → ciphertext || tag. Isolates AEAD bugs — wrong key/nonce byte ordering, wrong AD binding, wrong tag placement — before full session integration.

key       (32 bytes): 0202020202020202020202020202020202020202020202020202020202020202
nonce     (24 bytes): 030303030303030303030303030303030303030303030303
plaintext (11 bytes): "hello world"  (raw UTF-8)
aad       (15 bytes): "lo-test-aead-v1"  (raw UTF-8)

ciphertext (11 bytes): 356c4d3352734de8f25fe3
tag        (16 bytes): 91c8f97e537cf5c7d3f07d2b03388f77

wire output (27 bytes, ciphertext || tag):
  356c4d3352734de8f25fe391c8f97e537cf5c7d3f07d2b03388f77

Verified by tests/compute_vectors.rs::f26_xchacha20_poly1305 in the reference implementation.

soliton_aead_decrypt error boundary — InvalidLength vs AeadFailed for undersized ciphertext: The standalone AEAD function has an asymmetry compared to ratchet/stream decrypt; this is not shown in a success vector but is documented here for completeness.

`ciphertext_len`	`soliton_aead_decrypt` return
0	`InvalidLength` (-1) — the CAPI zero-length guard fires before any AEAD operation
1-15	`AeadFailed` (-4) — non-zero length passes the CAPI guard; too short to contain a 16-byte Poly1305 tag, so AEAD authentication fails
≥ 16, wrong key/tag	`AeadFailed` (-4) — AEAD authentication failure
≥ 16, correct	0 (success)

Contrast with ratchet/stream: soliton_ratchet_decrypt and soliton_stream_decrypt_chunk return AeadFailed for ALL undersized inputs including len = 0 (oracle-collapse requirement, §12). soliton_aead_decrypt with len = 0 returns InvalidLength — the CAPI zero-length guard fires first. A binding author who tests only the success path and then applies the ratchet/stream AeadFailed pattern to soliton_aead_decrypt diverges from the reference for the len = 0 case.
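For binding tests, the table above can be encoded as an expected-result oracle. The Rust model below is hypothetical and does not call soliton. It captures only the documented ordering: the CAPI zero-length guard for soliton_aead_decrypt fires first, inputs too short to hold a 16-byte tag fail authentication, and `tag_ok` stands in for the real Poly1305 check.

```rust
/// Return codes named in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Success = 0,
    InvalidLength = -1,
    AeadFailed = -4,
}

/// Which entry point is being modelled.
#[derive(Debug, Clone, Copy)]
enum Entry {
    AeadDecrypt,     // soliton_aead_decrypt
    RatchetOrStream, // soliton_ratchet_decrypt / soliton_stream_decrypt_chunk
}

fn expected_status(entry: Entry, ciphertext_len: usize, tag_ok: bool) -> Status {
    match entry {
        // Only the standalone AEAD call has the zero-length CAPI guard.
        Entry::AeadDecrypt if ciphertext_len == 0 => Status::InvalidLength,
        // Too short to contain a Poly1305 tag: authentication cannot succeed.
        _ if ciphertext_len < 16 => Status::AeadFailed,
        _ if tag_ok => Status::Success,
        _ => Status::AeadFailed,
    }
}
```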

F.27 HybridSign / HybridVerify (§3.1)

Purpose: Validates the composite signature layout (Ed25519 || ML-DSA-65 concatenation) and the Sign_internal / Verify_internal internal-API requirement (not the FIPS 204 public API). Non-determinism is eliminated by pinning ML-DSA rnd = [0x00] × 32 — this is test-only; production signing uses fresh getrandom entropy.

Identity secret key sub-components (used to construct 2496-byte SK):
  X-Wing sk (bytes    0..2432): [0x01] × 2432  (not used for signing)
  Ed25519 seed (bytes 2432..2464): [0x02] × 32
  ML-DSA-65 seed (bytes 2464..2496): [0x03] × 32

message (15 bytes): "lo-test-sign-v1"  (raw UTF-8)
ML-DSA rnd (32 bytes, test-only): [0x00] × 32

Ed25519 signature (bytes 0..64):
  21aafa2d66a4774e163064717412a2694527c84cdc57e93370ba05738940bdd0
  facc5cb6330088ce849635ac41a0099842a40ef82cb0046f6978eeb7196be00f

ML-DSA-65 signature (bytes 64..3373): [3309 bytes — see reference test for full hex]

composite signature (3373 bytes total):
  Ed25519 component: 21aafa2d66a4774e163064717412a2694527c84cdc57e93370ba05738940bdd0
                     facc5cb6330088ce849635ac41a0099842a40ef82cb0046f6978eeb7196be00f
  ML-DSA-65 component starts: 1b47e0e18a96f465b42396b24a77f72f...
  ML-DSA-65 component ends:   ...000000000000000000000000000000000000000000000000000a1316161b1e

Full 6746-char hex is in EXPECTED_F27 in tests/compute_vectors.rs. Verified by tests/compute_vectors.rs::f27_hybrid_sign_verify which also runs hybrid_verify against the assembled composite. In-document verification limitation: The 3309-byte ML-DSA-65 component is too large to embed in full; no SHA3-256 hash of the ML-DSA-65 component is provided inline. Standalone verifiers who cannot access the repository must compute SHA3-256(composite[64..3373]) from their own implementation output and cross-check against a trusted build, rather than comparing against an in-document hash.

Missing partial-failure vectors for HybridVerify: No vectors are provided for the two partial-failure cases: (1) Ed25519 component valid + ML-DSA-65 component corrupted (e.g., byte 64 flipped); (2) ML-DSA-65 component valid + Ed25519 component corrupted (e.g., byte 0 flipped). §3.2 requires that HybridVerify evaluates BOTH components in constant time before combining the results; a && short-circuit that returns early on the first failure leaks which component failed. Partial-failure vectors would confirm that the function returns VerificationFailed in both cases, although a return-value check alone cannot detect a short-circuit, which shows up only as a timing difference. Reimplementers MUST verify that their HybridVerify does not short-circuit: evaluate Ed25519.Verify() AND MLDSA.Verify() independently, then combine with & (not &&).
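The composite layout (sig[0..64] Ed25519, sig[64..3373] ML-DSA-65) and the non-short-circuit combine can be sketched in Rust as follows. hybrid_verify_sketch is a hypothetical name, and the two closures are hypothetical stand-ins for the real component verifiers. The InvalidLength variant for a wrong-sized signature is an assumption; this section does not specify it. A fully constant-time implementation would also keep the combined result in a constant-time type (for example, subtle::Choice) rather than a plain bool.

```rust
pub const ED25519_SIG_LEN: usize = 64;
pub const MLDSA65_SIG_LEN: usize = 3309;
pub const HYBRID_SIG_LEN: usize = ED25519_SIG_LEN + MLDSA65_SIG_LEN; // 3373

#[derive(Debug, PartialEq, Eq)]
pub enum VerifyError {
    InvalidLength,      // assumption: wrong composite size
    VerificationFailed, // either component rejected
}

/// Hypothetical sketch: split the composite, run BOTH verifiers, then combine.
pub fn hybrid_verify_sketch<E, M>(sig: &[u8], ed_verify: E, mldsa_verify: M) -> Result<(), VerifyError>
where
    E: FnOnce(&[u8]) -> bool,
    M: FnOnce(&[u8]) -> bool,
{
    if sig.len() != HYBRID_SIG_LEN {
        return Err(VerifyError::InvalidLength);
    }
    let (ed_sig, mldsa_sig) = sig.split_at(ED25519_SIG_LEN);
    // Both verifications run unconditionally, before either result is inspected.
    let ed_ok = ed_verify(ed_sig);
    let mldsa_ok = mldsa_verify(mldsa_sig);
    // `&` on bool evaluates both operands; `&&` would skip the second on failure.
    if ed_ok & mldsa_ok {
        Ok(())
    } else {
        Err(VerifyError::VerificationFailed)
    }
}
```

A test that counts verifier calls (as below) catches the short-circuit bug even without timing measurements.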

F.28 Streaming AEAD End-to-End Wire Vector (§15)

Purpose: Provides complete wire bytes for a two-chunk stream. The primary interop trap: tag_byte is outside the AEAD call (used in AAD and XORed into the nonce), while the 16-byte Poly1305 tag is inside. A reimplementer who passes tag_byte as plaintext produces a different wire format — passes all nonce/AAD checks but fails AEAD on the receiver.

Non-final chunk size note: This vector uses 16-byte plaintext for the non-final chunk (chunk 0) for compactness. In production, soliton_stream_encrypt_chunk with is_last=false requires the plaintext to be exactly CHUNK_SIZE (1,048,576 bytes) — a non-final chunk whose plaintext is not exactly CHUNK_SIZE is rejected with InvalidData (-17) by the core library (not InvalidLength — InvalidLength fires only when the output buffer is too small; the constraint on plaintext size is a semantic content check, which maps to InvalidData). The F.28 vector was computed at the primitive level (direct AEAD calls with the correct nonce and AAD, bypassing the CAPI chunk-size guard), so the wire bytes are protocol-correct. A reimplementer building streaming AEAD MUST enforce the same constraint: non-final chunks must be full size (1 MiB), and only the final chunk may be smaller. The AEAD itself does not enforce this — it will accept any size — so the guard must be in the framing layer.
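The framing-layer size rule described above fits in a single check. The following is a minimal Rust sketch. check_chunk_plaintext_len and FramingError are hypothetical names. The sketch assumes the final chunk may be any length from 0 up to CHUNK_SIZE; the note above only says it "may be smaller".

```rust
pub const CHUNK_SIZE: usize = 1_048_576; // 1 MiB

#[derive(Debug, PartialEq, Eq)]
pub enum FramingError {
    InvalidData, // -17 in the CAPI
}

/// Plaintext-size rule enforced by the framing layer (the AEAD accepts any size).
/// Assumption: a final chunk may be anywhere from 0 to CHUNK_SIZE bytes.
pub fn check_chunk_plaintext_len(len: usize, is_last: bool) -> Result<(), FramingError> {
    let ok = if is_last { len <= CHUNK_SIZE } else { len == CHUNK_SIZE };
    if ok {
        Ok(())
    } else {
        Err(FramingError::InvalidData)
    }
}
```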

key        (32 bytes): 0404040404040404040404040404040404040404040404040404040404040404
base_nonce (24 bytes): 050505050505050505050505050505050505050505050505
flags:       0x00  (no compression)
caller_aad:  empty

header (26 bytes):
  01 00 050505050505050505050505050505050505050505050505
  hex: 0100050505050505050505050505050505050505050505050505

chunk 0 — non-final (tag_byte=0x00), plaintext=[0x41]×16:
  nonce mask = chunk_index(8B) || tag_byte(1B) || 0x00×15(15B) = 24 bytes total
             = 0000000000000000 || 00 || 000000000000000000000000000000
  nonce = base_nonce XOR (all-zero mask) = base_nonce (unchanged)
  aad   = "lo-stream-v1" || 0x01 || 0x00 || base_nonce || 0000000000000000 || 0x00
  wire (33 bytes): 00d5425e7085cc776bc8c608ad84c41cc37eefb10d2b859ebddf8c1187c616c0c4
    tag_byte:   00
    ciphertext: d5425e7085cc776bc8c608ad84c41cc3
    tag:        7eefb10d2b859ebddf8c1187c616c0c4

chunk 1 — final (tag_byte=0x01), plaintext=[0x42]×8:
  nonce mask = chunk_index(8B) || tag_byte(1B) || 0x00×15(15B) = 24 bytes total
             = 0000000000000001 || 01 || 000000000000000000000000000000
  nonce = base_nonce XOR mask
        = 0505050505050504 || 04 || 050505050505050505050505050505
        (8 B)               (1 B) (15 B — 30 hex chars)
        flat (48 hex chars): 050505050505050404050505050505050505050505050505
  aad   = "lo-stream-v1" || 0x01 || 0x00 || base_nonce || 0000000000000001 || 0x01
  wire (25 bytes): 01aac61cb7b722895cb246433e7ebc081e92150081150d345d
    tag_byte:   01
    ciphertext: aac61cb7b722895c  (8 bytes — matches 8-byte plaintext)
    tag:        b246433e7ebc081e92150081150d345d  (16 bytes — Poly1305 tag)

full stream hex (84 bytes):
  010005050505050505050505050505050505050505050505050500d5425e7085cc776bc8c608ad84c41cc37eefb10d2b859ebddf8c1187c616c0c401aac61cb7b722895cb246433e7ebc081e92150081150d345d

Verified by tests/compute_vectors.rs::f28_streaming_aead_wire in the reference implementation.
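The per-chunk nonce, AAD and wire framing above can be reproduced without any crypto dependency. The following is a Rust sketch of those three layouts only; the XChaCha20-Poly1305 call itself is omitted. chunk_nonce, chunk_aad and chunk_wire are hypothetical names. Appending caller_aad at the end of the AAD is an assumption, because this vector uses an empty caller_aad and does not show its position.

```rust
pub const STREAM_AAD_LABEL: &[u8] = b"lo-stream-v1"; // 12 bytes

/// nonce = base_nonce XOR (chunk_index u64 BE || tag_byte || 15 zero bytes).
pub fn chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes());
    mask[8] = tag_byte;
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= *m;
    }
    nonce
}

/// aad = label || version || flags || base_nonce || chunk_index (u64 BE) || tag_byte
/// || caller_aad (position of caller_aad is an assumption; it is empty in F.28).
pub fn chunk_aad(version: u8, flags: u8, base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8, caller_aad: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(STREAM_AAD_LABEL.len() + 2 + 24 + 8 + 1 + caller_aad.len());
    aad.extend_from_slice(STREAM_AAD_LABEL);
    aad.push(version);
    aad.push(flags);
    aad.extend_from_slice(base_nonce);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad.push(tag_byte);
    aad.extend_from_slice(caller_aad);
    aad
}

/// Wire chunk = tag_byte (never encrypted) || ciphertext || 16-byte Poly1305 tag.
pub fn chunk_wire(tag_byte: u8, ciphertext_and_tag: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + ciphertext_and_tag.len());
    out.push(tag_byte);
    out.extend_from_slice(ciphertext_and_tag);
    out
}

/// Lowercase hex helper for comparing against the vector.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Keeping tag_byte out of the AEAD plaintext is the point of chunk_wire. It is prepended after encryption, while chunk_nonce and chunk_aad bind it cryptographically.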

F.29 Argon2id + XChaCha20-Poly1305 Passphrase-Protected Key Blob (§10.6)

Purpose: End-to-end vector for the §10.6 recommended composition: salt(16) || nonce(24) || AEAD_ciphertext. F.20 covers Argon2id output in isolation; this covers the full assembly where the derived key feeds directly into XChaCha20-Poly1305 with the identity fingerprint as AAD. The easy mistake: using the wrong AAD (empty, or the salt, or something other than the identity fingerprint) or inserting an extra HKDF step between Argon2id output and the AEAD key — both produce incompatible ciphertext with no error at encryption time.

password (18 bytes): "lo-test-passphrase"  (raw UTF-8)
argon2_salt (16 bytes): 06060606060606060606060606060606
argon2 params: OWASP_MIN (m=19456 KiB, t=2, p=1)

aad / fingerprint (32 bytes): SHA3-256([0x00] × 3200)
  = 1fc29a619ef720eaf2966023f1d22c797a31a7ad6c9fd94b7fb28dfff94c5e4b

derived_key (32 bytes, Argon2id output):
  2058fdb73306ec7271061be269fccaf39756b8666248172d6923976e377f5d30

aead_nonce (24 bytes): 070707070707070707070707070707070707070707070707
plaintext  (17 bytes): "test-key-material"  (raw UTF-8)

ciphertext || tag (33 bytes):
  f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede

blob (73 bytes, salt || nonce || ciphertext || tag):
  06060606060606060606060606060606
  070707070707070707070707070707070707070707070707
  f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede

hex: 06060606060606060606060606060606070707070707070707070707070707070707070707070707f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede

Verified by tests/compute_vectors.rs::f29_passphrase_key_blob in the reference implementation.
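The blob framing can be checked independently of Argon2id and the AEAD. The following is a Rust sketch covering only the split and assembly of salt(16) || nonce(24) || ciphertext||tag. On decryption, the salt feeds Argon2id (OWASP_MIN). The 32-byte output is used directly as the XChaCha20-Poly1305 key, with no HKDF step in between, and the identity fingerprint serves as AAD. KeyBlob, split_key_blob, assemble_key_blob and from_hex are hypothetical names.

```rust
pub const SALT_LEN: usize = 16;
pub const NONCE_LEN: usize = 24;
pub const TAG_LEN: usize = 16;

/// Borrowed view of salt(16) || nonce(24) || ciphertext || tag(16).
pub struct KeyBlob<'a> {
    pub salt: &'a [u8],
    pub nonce: &'a [u8],
    pub ciphertext_and_tag: &'a [u8],
}

/// Split a blob; None if it cannot hold salt, nonce and at least a tag.
pub fn split_key_blob(blob: &[u8]) -> Option<KeyBlob<'_>> {
    if blob.len() < SALT_LEN + NONCE_LEN + TAG_LEN {
        return None;
    }
    let (salt, rest) = blob.split_at(SALT_LEN);
    let (nonce, ciphertext_and_tag) = rest.split_at(NONCE_LEN);
    Some(KeyBlob { salt, nonce, ciphertext_and_tag })
}

/// Assemble the blob; the AEAD output already carries its 16-byte tag.
pub fn assemble_key_blob(salt: &[u8; SALT_LEN], nonce: &[u8; NONCE_LEN], ciphertext_and_tag: &[u8]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(SALT_LEN + NONCE_LEN + ciphertext_and_tag.len());
    blob.extend_from_slice(salt);
    blob.extend_from_slice(nonce);
    blob.extend_from_slice(ciphertext_and_tag);
    blob
}

/// Decode an even-length lowercase hex string (panics on malformed input).
fn from_hex(s: &str) -> Vec<u8> {
    (0..s.len()).step_by(2).map(|i| u8::from_str_radix(&s[i..i + 2], 16).unwrap()).collect()
}
```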

F.30 `from_bytes_with_min_epoch` Rejection Boundary (§6.8)

Purpose: Verify the strict > boundary in anti-rollback deserialization. The condition is epoch > min_epoch — equal epoch is rejected. This is the boundary that prevents replaying the current epoch's blob (not only older ones).

Test procedure: Obtain any valid ratchet blob. Read the epoch value from bytes 1-8 (u64 big-endian, immediately after the 1-byte version tag — see F.21 layout). Let N be the deserialized epoch.

blob epoch N

from_bytes_with_min_epoch(blob, N - 1) → Ok     ← epoch N > min_epoch N-1 ✓
from_bytes_with_min_epoch(blob, N)     → InvalidData (-17)  ← epoch N ≤ min_epoch N (equal, not strictly greater)
from_bytes_with_min_epoch(blob, N + 1) → InvalidData (-17)  ← epoch N < min_epoch N+1

Patching for boundary testing: To produce a blob with a specific epoch without running a full session, take any valid blob and overwrite bytes 1-8 with the desired epoch as u64 big-endian. Epoch = 1 → 00 00 00 00 00 00 00 01. The blob must otherwise be valid (pass from_bytes guards) — patch only the epoch field.

Off-by-one hazard: A reimplementer who uses >= instead of > accepts the current epoch's blob, defeating rollback protection for the common case where the adversary replays the most recent serialized state.

Error code: InvalidData (-17) on rejection. Not InvalidLength, UnsupportedVersion, or any other variant — those would let the caller misclassify a rollback attempt as a format error.
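The boundary test and the patching trick can be expressed in a few lines. The following is a Rust sketch that models only the epoch read and the strict comparison; the real from_bytes_with_min_epoch also runs the full from_bytes validation. blob_epoch, check_min_epoch, patch_epoch and BlobError are hypothetical names. Mapping a too-short blob to InvalidData is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BlobError {
    InvalidData, // -17 in the CAPI
}

/// Epoch is bytes 1..=8 (u64 BE), right after the 1-byte version tag.
pub fn blob_epoch(blob: &[u8]) -> Result<u64, BlobError> {
    let field = blob.get(1..9).ok_or(BlobError::InvalidData)?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(field);
    Ok(u64::from_be_bytes(bytes))
}

/// Strictly greater: `>=` would accept a replay of the current epoch's blob.
pub fn check_min_epoch(blob: &[u8], min_epoch: u64) -> Result<u64, BlobError> {
    let epoch = blob_epoch(blob)?;
    if epoch > min_epoch {
        Ok(epoch)
    } else {
        Err(BlobError::InvalidData)
    }
}

/// Test helper: overwrite only the epoch field, leaving the rest of the blob intact.
pub fn patch_epoch(blob: &mut [u8], epoch: u64) {
    blob[1..9].copy_from_slice(&epoch.to_be_bytes());
}
```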

F.31 Stream Header — Compressed Stream (§15.2)

Purpose: Pin the 26-byte stream header wire encoding for flags=0x01 (compressed). The flags byte distinguishes compressed from uncompressed streams and is also bound into every per-chunk AAD (F.10, F.22) — a reimplementer who misplaces or omits it in either the header or the AAD produces unreadable ciphertext.

Using the F.10 base_nonce (101112131415161718191a1b1c1d1e1f2021222324252627):

version (1 byte):   01
flags   (1 byte):   01  (bit 0 = compressed)
base_nonce (24 bytes): 101112131415161718191a1b1c1d1e1f2021222324252627

header (26 bytes): 0101101112131415161718191a1b1c1d1e1f2021222324252627

The header is a concatenation with no length prefix, no delimiter. Any reimplementer who inserts a 1-byte or 2-byte length prefix before version, or who encodes flags as a 2-byte field, shifts all subsequent byte offsets and causes a parse failure at the first chunk.
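The header encoding is a fixed 26-byte array with no framing. The following is a Rust sketch in which encode_stream_header and the constant names are hypothetical. It shows bit 0 of flags set for compression, with everything else written at fixed offsets.

```rust
pub const STREAM_VERSION: u8 = 0x01;
pub const FLAG_COMPRESSED: u8 = 0x01; // bit 0
pub const STREAM_HEADER_LEN: usize = 26;

/// version(1) || flags(1) || base_nonce(24): plain concatenation, no length prefix.
pub fn encode_stream_header(compressed: bool, base_nonce: &[u8; 24]) -> [u8; STREAM_HEADER_LEN] {
    let mut h = [0u8; STREAM_HEADER_LEN];
    h[0] = STREAM_VERSION;
    h[1] = if compressed { FLAG_COMPRESSED } else { 0x00 };
    h[2..].copy_from_slice(base_nonce);
    h
}

/// Lowercase hex helper for comparing against the vector.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```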

F.32 Streaming AEAD Random-Access Byte Offset (§15.3)

Purpose: Confirm the byte_offset(N) formula for random-access decryption. A reimplementer computing the wrong chunk stride cannot seek correctly and will either read garbage or fail AEAD authentication on every chunk beyond the first.

Parameters: chunk_size = 1 MiB = 1,048,576 bytes.

chunk wire size = 1 (tag_byte) + 1,048,576 (ciphertext) + 16 (AEAD tag)
               = 1,048,593 bytes

byte_offset(N) = 26 + N × 1,048,593

byte_offset(0) =          26  ← first chunk starts immediately after 26-byte header
byte_offset(1) =   1,048,619  ← 26 + 1,048,593
byte_offset(2) =   2,097,212  ← 26 + 2 × 1,048,593
byte_offset(N) =  26 + N × 1,048,593

The 26 addend is the fixed stream header size (version 1 byte + flags 1 byte + base_nonce 24 bytes). A reimplementer who omits the header from the offset (using N × 1,048,593 directly) seeks 26 bytes too early for every chunk. For chunk 0, that lands on the header itself. Every such seek fails AEAD authentication.

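The final chunk may be shorter than 1 MiB. The formula gives the start of chunk N only if every chunk before N is full-size. Because the framing layer requires every non-final chunk to be exactly CHUNK_SIZE (see the F.28 note), this holds for every chunk of a conforming stream, including the final one. A stream whose non-final chunks vary in size does not conform to this framing, and random access into it would require an index. The stride formula applies exclusively to fixed-1-MiB-chunk streams.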
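The offset arithmetic, with overflow checked, looks like this in Rust. This is a sketch assuming the 26-byte header and 1 MiB CHUNK_SIZE from this vector; byte_offset is a hypothetical helper that returns None when N × stride would overflow u64.

```rust
pub const STREAM_HEADER_LEN: u64 = 26;
pub const CHUNK_SIZE: u64 = 1_048_576;
pub const TAG_LEN: u64 = 16;
/// tag_byte(1) + ciphertext(CHUNK_SIZE) + Poly1305 tag(16) = 1,048,593.
pub const CHUNK_WIRE_LEN: u64 = 1 + CHUNK_SIZE + TAG_LEN;

/// Start of chunk `n`: header plus n full-size chunks (valid for conforming streams).
pub fn byte_offset(n: u64) -> Option<u64> {
    n.checked_mul(CHUNK_WIRE_LEN)?.checked_add(STREAM_HEADER_LEN)
}
```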

F.33 HMAC-SHA3-256 with Long Key (§3.2)

Purpose: Discriminates SHA-2 and SHA3-256 HMAC implementations at the block-size boundary. SHA3-256's HMAC block size is 136 bytes; SHA-256's is 64 bytes. A 100-byte key falls above the SHA-256 threshold (forcing key hashing to 32 bytes in a SHA-2 implementation) but below the SHA3-256 threshold (padding the key to 136 bytes without hashing). All existing HMAC vectors use 32-byte keys, which lie below both thresholds and cannot expose this mismatch.

key   (100 bytes): AB × 100
data  (10 bytes):  "lo-hmac-v1"  (ASCII)

MAC (32 bytes): aa5575019f7aade135d379d92699d13d62cded9208869f9c9898d687d93ae293

An HMAC that applies the SHA-256 threshold (64 bytes) hashes the 100-byte key to 32 bytes before XOR-padding. A correct HMAC-SHA3-256 zero-pads the 100 bytes to 136 bytes with no preliminary hashing. The two paths produce different MAC values at this key length. A reimplementation that matches every existing 32-byte-key vector but fails this one is therefore hashing the key at the wrong block-size threshold.

Verified by tests/compute_vectors.rs::f33_hmac_sha3_256_long_key.
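The threshold in question is the RFC 2104 key-preparation step: a key longer than the block size B is hashed first, and the result (or the original key) is then zero-padded to B bytes. The following Rust sketch shows only that step. hmac_key_block is a hypothetical name, and the hash parameter is a stand-in for SHA3-256, so the sketch needs no SHA-3 dependency.

```rust
/// SHA3-256 rate (1088 bits), which is its HMAC block size.
pub const SHA3_256_BLOCK: usize = 136;
/// SHA-256 block size, shown for contrast.
pub const SHA256_BLOCK: usize = 64;

/// RFC 2104 key preparation: hash keys longer than `block_size`,
/// then zero-pad to exactly `block_size` bytes.
pub fn hmac_key_block<H>(key: &[u8], block_size: usize, hash: H) -> Vec<u8>
where
    H: Fn(&[u8]) -> [u8; 32],
{
    let mut k = if key.len() > block_size {
        hash(key).to_vec()
    } else {
        key.to_vec()
    };
    k.resize(block_size, 0x00);
    k
}
```

With the F.33 key, the SHA3-256 block size leaves the 100 key bytes in place, while the SHA-256 threshold replaces them with a digest. That substitution is the divergence this vector is designed to expose.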

F.34 SHA3-256 of First-Message AAD with OPK (§5.4 Step 7)

Purpose: Hashing the full AAD to a 32-byte value provides a fixed-size discriminator for the 4,741-byte with-OPK AAD structure. A reimplementer who omits ct_opk from the encoding, reverses the ct_opk/opk_id order, or omits the "lo-dm-v1" label prefix produces a different hash.

Inputs (synthetic, fixed):

sender_fingerprint   (32 bytes): AA × 32
recipient_fingerprint (32 bytes): BB × 32
crypto_version: "lo-crypto-v1"
sender_ek   (1216 bytes): CC × 1216
ct_ik       (1120 bytes): DD × 1120
ct_spk      (1120 bytes): EE × 1120
spk_id: 42 (0x0000002A)
ct_opk      (1120 bytes): FF × 1120
opk_id: 7  (0x00000007)

AAD wire layout: "lo-dm-v1"(8) || sender_fp(32) || recipient_fp(32) || encode_session_init(4669) = 4741 bytes total.

SHA3-256(first_message_aad with OPK):
  ba8e4c4ffb1330f47e5ca95a63671970036a1f3d07934836548efa0403e84815

Verified by tests/compute_vectors.rs::f34_first_message_aad_with_opk.
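The outer AAD layout can be assembled as follows. This is a Rust sketch in which first_message_aad is a hypothetical name. encode_session_init is treated as opaque bytes because its internal layout is specified elsewhere. The ordering of ct_opk and opk_id inside that encoding is therefore not modelled here.

```rust
pub const DM_AAD_LABEL: &[u8] = b"lo-dm-v1"; // 8 bytes

/// "lo-dm-v1" || sender_fp(32) || recipient_fp(32) || encode_session_init(si).
pub fn first_message_aad(sender_fp: &[u8; 32], recipient_fp: &[u8; 32], session_init_encoded: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(DM_AAD_LABEL.len() + 64 + session_init_encoded.len());
    aad.extend_from_slice(DM_AAD_LABEL);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(session_init_encoded);
    aad
}
```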

F.35 HybridSign over SPK Message (§5.3)

Purpose: Domain label vector for SPK signing. The signed message is "lo-spk-sig-v1" || SPK_pub. A reimplementer who uses the wrong label (e.g., "lo-kex-init-sig-v1") or signs only SPK_pub without the label produces a composite signature that hybrid_verify rejects.

Inputs (same synthetic identity key as F.27):

Ed25519 seed:   02 × 32
ML-DSA-65 seed: 03 × 32
ML-DSA-65 rnd:  00 × 32  (test only — production uses getrandom)
message: "lo-spk-sig-v1" (13 bytes) || CC × 1216  (total 1229 bytes)

Output: Ed25519 sig (64 bytes) || ML-DSA-65 sig (3309 bytes) = 3373 bytes total.

composite[0..64]   (Ed25519): 2856bb008aa260e6b541ead779730ad350d97feb39db4829cb4ef5520979f3c3820bda50d51fec0e16ae1b7bb2cba8016ab389222c51b46af1fa223914ad8a01
composite[64..3373] (ML-DSA-65): 93759b7f59dd...  (see EXPECTED_F35 in compute_vectors.rs)

Full 6746-character hex in tests/compute_vectors.rs::EXPECTED_F35. Verified by tests/compute_vectors.rs::f35_hybrid_sign_spk. In-document verification limitation: The ML-DSA-65 component is truncated; no inline SHA3-256 hash is provided. Standalone verifiers must compare SHA3-256(composite[64..3373]) from their implementation against a trusted build.

F.36 HybridSign over `encode_session_init` (§5.4 Step 6)

Purpose: Domain label vector for session-init signing. The signed message is "lo-kex-init-sig-v1" || encode_session_init(si). A reimplementer who signs the raw SessionInit fields directly (instead of the encoded form) or who uses the SPK label produces a different composite signature.

Inputs (same synthetic identity key as F.27; synthetic SessionInit without OPK):

Ed25519 seed:   02 × 32
ML-DSA-65 seed: 03 × 32
ML-DSA-65 rnd:  00 × 32  (test only)
SessionInit: crypto_version="lo-crypto-v1", sender_fp=AA×32, recipient_fp=BB×32,
             sender_ek=CC×1216, ct_ik=DD×1120, ct_spk=EE×1120, spk_id=42,
             ct_opk=None, opk_id=None

encode_session_init output: 3543 bytes (no-OPK path)
message: "lo-kex-init-sig-v1" (18 bytes) || si_encoded (3543 bytes) = 3561 bytes

Output: Ed25519 sig (64 bytes) || ML-DSA-65 sig (3309 bytes) = 3373 bytes total.

composite[0..64] (Ed25519): c53f65e56414c595257a2e7233b91b5c52f2da83edc9c6245c63091dc83815c4c72fc53db16e5bd658826641c15e5dc33397e85b4447bff11213eb4273376c03
composite[64..3373] (ML-DSA-65): 397e85b4...  (see EXPECTED_F36 in compute_vectors.rs)

Full 6746-character hex in tests/compute_vectors.rs::EXPECTED_F36. Verified by tests/compute_vectors.rs::f36_hybrid_sign_session_init. In-document verification limitation: The ML-DSA-65 component is truncated; no inline SHA3-256 hash is provided. Standalone verifiers must compare SHA3-256(composite[64..3373]) from their implementation against a trusted build.
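F.35 and F.36 share one pattern: the signed message is a domain label concatenated directly with the body, with no length prefix or separator. The following Rust sketch builds both messages; signed_message and the constant names are hypothetical. The resulting bytes are what would be passed to HybridSign.

```rust
pub const SPK_SIG_LABEL: &[u8] = b"lo-spk-sig-v1"; // 13 bytes (F.35)
pub const KEX_INIT_SIG_LABEL: &[u8] = b"lo-kex-init-sig-v1"; // 18 bytes (F.36)

/// label || body, plain concatenation.
pub fn signed_message(label: &[u8], body: &[u8]) -> Vec<u8> {
    let mut m = Vec::with_capacity(label.len() + body.len());
    m.extend_from_slice(label);
    m.extend_from_slice(body);
    m
}
```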

F.37 LO-Auth HMAC Token Derivation (§4)

Purpose: The LO-Auth proof token is HMAC-SHA3-256(shared_secret, "lo-auth-v1"). This vector isolates the HMAC step from the KEM by using a synthetic shared secret. The KEM round-trip is covered by X-Wing unit tests; the label and key/data order are the reimplementation risks addressed here.

shared_secret (32 bytes): 08 × 32
label (10 bytes): "lo-auth-v1"

token = HMAC-SHA3-256(key=shared_secret, data=label):
  4e14e7ab92b70dd587a558e208cbcd98fd933048a2b2bf90e188e1d9b04f6e2a

A reimplementer who swaps key and data (HMAC(key=label, data=ss)) or who uses a different label (e.g., "lo-auth" without the version suffix) produces a different 32-byte value. The token must be compared constant-time using hmac_sha3_256_verify_raw — never with ==.

Verified by tests/compute_vectors.rs::f37_lo_auth_hmac.
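The constant-time comparison is the step most often replaced by ==. The following Rust sketch shows what such a comparison does: it accumulates differences over every byte instead of exiting at the first mismatch. ct_eq_32 is a hypothetical illustration only. Rust compilers give no hard guarantee that this loop stays branch-free, so production code should call hmac_sha3_256_verify_raw (or use a vetted crate such as subtle).

```rust
/// Compare two 32-byte tokens without an early exit:
/// OR together the XOR of every byte pair, then test once at the end.
pub fn ct_eq_32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}
```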

F.38 Streaming AEAD `UnsupportedVersion` Rejection (§15.8)

Purpose: Verify that stream_decrypt_init (and soliton_stream_decrypt_init at the CAPI) returns UnsupportedVersion when the stream header's version byte is not 0x01. This is a rejection-boundary test with no success output.

The 26-byte stream header format is version(1B) || flags(1B) || base_nonce(24B) (§15.2 / F.31). Version 0x01 is the only currently defined version; any other byte triggers UnsupportedVersion.

Test inputs (any valid key and a header with a non-0x01 version byte):

key (32 bytes):                0404040404040404040404040404040404040404040404040404040404040404
header — version=0x00 (26 bytes): 0000050505050505050505050505050505050505050505050505
  version byte: 00  (not 0x01)
  flags:        00
  base_nonce:   050505050505050505050505050505050505050505050505

header — version=0x02 (26 bytes): 0200050505050505050505050505050505050505050505050505
  version byte: 02  (not 0x01)
  flags:        00
  base_nonce:   050505050505050505050505050505050505050505050505

Expected result for both inputs: stream_decrypt_init returns UnsupportedVersion (-10). No decryptor object is created.

Additional rejection inputs: every version byte other than 0x01 (that is, 0x00 and 0x02 through 0xFF) must produce UnsupportedVersion. Version 0x01 with any valid flags byte and nonce produces Ok.

Reimplementation check: A reimplementer who validates only that the version byte is non-zero (instead of exactly 0x01) will accept version 0x02 silently. A reimplementer who skips version validation entirely will attempt to parse future-format streams with current-version rules, producing wrong decryption output with no error.

Verified by the decrypt_init_wrong_version (version=0x00) and decrypt_init_version_0x02 (version=0x02) tests in soliton/soliton/src/streaming.rs #[cfg(test)].
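The version check should be an exact match against 0x01. The following Rust sketch parses the header. parse_stream_header, StreamHeader and HeaderError are hypothetical names. The WrongLength variant, and checking length before version, are assumptions that this section does not specify. Validation of the flags byte is not modelled.

```rust
pub const STREAM_VERSION: u8 = 0x01;
pub const STREAM_HEADER_LEN: usize = 26;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    UnsupportedVersion, // -10 in the CAPI
    WrongLength,        // assumption: header is not exactly 26 bytes
}

pub struct StreamHeader {
    pub flags: u8,
    pub base_nonce: [u8; 24],
}

pub fn parse_stream_header(h: &[u8]) -> Result<StreamHeader, HeaderError> {
    if h.len() != STREAM_HEADER_LEN {
        return Err(HeaderError::WrongLength);
    }
    // Exact match: a `!= 0` or `>= 1` test would silently accept 0x02.
    if h[0] != STREAM_VERSION {
        return Err(HeaderError::UnsupportedVersion);
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&h[2..26]);
    Ok(StreamHeader { flags: h[1], base_nonce })
}
```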

F.39 Missing Vectors — Acknowledged Gaps

The following vectors are not provided in-document. Each represents an integration failure mode that existing vectors do not cover. Reimplementers SHOULD add these as integration tests against the reference implementation.

F.39.1 First-Message Encrypt/Decrypt End-to-End KAT (§5.4 Step 7, §5.5 Step 5)

No vector combines F.11's epoch_key + a 24-byte random nonce + F.18's first-message AAD + a known plaintext into a complete encrypted first message with expected ciphertext. The primary integration failure mode — Alice and Bob deriving different AAD values — produces AeadFailed on Bob's side with no diagnostic pointing to the AAD divergence. To add this vector: run encrypt_first_message(epoch_key, plaintext, aad) with a pinned nonce and record the 24-byte nonce + ciphertext output; the corresponding decrypt_first_message call with the same inputs must reproduce the plaintext.

F.39.2 encode_prekey_bundle (§5.3)

No encode_prekey_bundle KAT is provided (with or without OPK). F.13 covers encode_session_init; the bundle format is structurally different (no sender fingerprints, no KEM ciphertexts). Field ordering and the absence of length prefixes on IK_pub, SPK_pub, and SPK_sig are the primary reimplementation hazards. To add these vectors: call encode_prekey_bundle with known key material and record the SHA3-256 of the encoded output for both the OPK-present and OPK-absent cases.
