# Soliton Cryptographic Specification

## 1. Overview

Companion to LO Protocol Specification v1. Specifies all cryptographic protocols for authentication, key agreement, message encryption, signatures, and storage.

### 1.1 Design Philosophy

- **Unified key type**: Identity = LO composite (X25519 + ML-KEM-768 + ML-DSA-65). Pre-keys = X-Wing (X25519 + ML-KEM-768).
- **KEM-native**: Key agreement via KEM, not Diffie-Hellman.
- **Hybrid everything**: Classical + post-quantum for encryption and signatures.
- **Header-bound AAD**: All DM ciphertext authentication binds the full message header, preventing header tampering.
- **Memory-safe C ABI**: Rust core library with a stable C ABI (`soliton_capi`). All language bindings call through this ABI.
- **Versioned primitives**: Crypto version tag on all key material and sessions.

### 1.2 Primitives (lo-crypto-v1)

| Primitive | Algorithm | Reference |
|-----------|-----------|-----------|
| Hybrid KEM | X-Wing (X25519 + ML-KEM-768) | draft-connolly-cfrg-xwing-kem-09 |
| Classical KEM | X25519 | RFC 7748 |
| Post-quantum KEM | ML-KEM-768 | FIPS 203 |
| Classical signature | Ed25519 | RFC 8032 |
| Post-quantum signature | ML-DSA-65 | FIPS 204 |
| KDF | HKDF-SHA3-256 | RFC 5869 |
| Hash | SHA3-256 | FIPS 202 |
| Symmetric | XChaCha20-Poly1305 | RFC 8439 + HChaCha20 extension |
| MAC | HMAC-SHA3-256 | RFC 2104 |
| Password KDF | Argon2id | RFC 9106 |
| Storage compression | Zstandard (zstd) | RFC 8878 |

### 1.3 Backend

The core library is pure Rust with zero C dependencies:

| Crate | Algorithms |
|-------|-----------|
| `curve25519-dalek` | X25519 (RFC 7748) |
| `ed25519-dalek` | Ed25519 signing/verification (RFC 8032) |
| `ml-kem` | ML-KEM-768 (FIPS 203) |
| `ml-dsa` | ML-DSA-65 (FIPS 204) |
| `chacha20poly1305` | XChaCha20-Poly1305 (RFC 8439 + HChaCha20) |
| `sha3` | SHA3-256 |
| `hmac`, `hkdf` | HMAC-SHA3-256, HKDF-SHA3-256 |
| `ruzstd` | Zstandard compression/decompression (§11, pure Rust) |
| `argon2` | Argon2id password-based key derivation (RFC 9106, §10.6) |
| `getrandom` | CSPRNG (OS entropy: `getrandom(2)`, `ProcessPrng`, `getentropy`, etc.) |

All dependencies are exact-pinned. No C toolchain, cmake, or pkg-config required. Compiles with `cargo build` on any target including `wasm32-unknown-unknown`.

### 1.4 Notation

```
||          Concatenation of byte strings
x[a..b]    Half-open byte range: byte index a inclusive to b exclusive. x[0..32] selects 32 bytes (indices 0-31). Equivalent to x[a:b] in Python/Go, &x[a..b] in Rust, x.slice(a, b) in JavaScript, Arrays.copyOfRange(x, a, b) in Java. Programmers accustomed to inclusive-end notation must treat b as "one past the last index."
len(x)     Length of x in bytes, encoded as 2-byte big-endian (the prefix encodes the byte count of x itself; the 2-byte prefix is not included in the value)
big_endian_32(x)   4-byte big-endian encoding of unsigned 32-bit integer x. Not the same as len(x) — big_endian_32 always writes exactly 4 bytes and does not encode x as a length prefix.
HKDF(salt, ikm, info, len)   HKDF-SHA3-256 extract-and-expand (always both steps: RFC 5869 §2.2 Extract + §2.3 Expand. HKDF-Expand-only is never used.)
XWing.KeyGen()               → (pk, sk)
XWing.Encaps(pk)             → (ciphertext, shared_secret)
XWing.Decaps(sk, ct)         → shared_secret  // re-derives pk_X = X25519(sk_X, G) internally (the decapsulator's
                                               // OWN public key — NOT ct_X from the ciphertext; see §8.2 for the
                                               // most common X-Wing implementation error)
Ed25519.Sign(ed25519_sk, msg)  → sig (64 bytes)
Ed25519.Verify(ed25519_pk, msg, sig) → bool
MLDSA.Sign(sk, msg)          → sig (3309 bytes, hedged mode: FIPS 204 §6.2 Sign_internal with rnd=random(32))
                               // §6.2 is Sign_internal (the deterministic core); §5.2 is the external ML-DSA.Sign wrapper.
                               // sk is the 32-byte seed ξ — re-expanded per §8.5 before use. Not the 4032-byte expanded signing key.
MLDSA.Verify(pk, msg, sig)   → bool (Verify_internal per FIPS 204 §6.3 — see §3.1)
                               // §6.3 is Verify_internal (the deterministic core); §5.3 is the external ML-DSA.Verify wrapper.
HMAC-SHA3-256(key, data)     → 32-byte tag (first argument is always the HMAC key, second is the message)
AEAD(key, nonce, pt, aad)    → ciphertext || tag  (XChaCha20-Poly1305)
random_bytes(n)              → n cryptographically random bytes (OS CSPRNG via getrandom)
encode_session_init(h)       → deterministic binary (§7.4)
encode_ratchet_header(h)     → deterministic binary (§7.4)
SHA3-256(x)                  → 32-byte digest (FIPS 202; not Ethereum's Keccak-256, which uses 0x01 padding — FIPS 202 uses 0x06)
```
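The two fixed-width encodings above are easy to confuse, so here is a minimal Rust sketch using only `std`. The names `len_prefixed` and `big_endian_32` are illustrative, not soliton's API. The spec does not say how an input longer than 65535 bytes is handled; this sketch simply refuses it, which is an assumption.

```rust
/// len(x) || x: a 2-byte big-endian prefix holding the byte count of x only
/// (the prefix does not count itself). Returns None if x cannot fit in u16.
fn len_prefixed(x: &[u8]) -> Option<Vec<u8>> {
    let n = u16::try_from(x.len()).ok()?;
    let mut out = Vec::with_capacity(2 + x.len());
    out.extend_from_slice(&n.to_be_bytes());
    out.extend_from_slice(x);
    Some(out)
}

/// big_endian_32(x): always exactly 4 bytes, never used as a length prefix.
fn big_endian_32(x: u32) -> [u8; 4] {
    x.to_be_bytes()
}
```

Half-open slicing is native in Rust: `&x[1..3]` on `[10, 11, 12, 13]` yields `[11, 12]`.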

**Byte comparison convention**: All lexicographic comparisons of byte strings throughout this spec (fingerprint sorting in §6.12, §9.2; key sorting in §9.2) use unsigned byte-by-byte comparison. Languages with signed byte types (Java `byte`, some C `char` implementations) must cast to unsigned before comparison — signed comparison reverses the ordering for bytes ≥ 0x80, producing different sort results and silently wrong AAD or verification phrases.
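The ordering hazard can be demonstrated directly. In this Rust sketch (std only, function names hypothetical), `sort_unsigned` is the required order, since Rust's `u8` is unsigned and slice comparison is already byte-by-byte lexicographic. `sort_signed_buggy` emulates a language with signed bytes by reinterpreting each byte as `i8`.

```rust
/// Correct: unsigned, byte-by-byte lexicographic order.
fn sort_unsigned(mut items: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    items.sort();
    items
}

/// Emulates the signed-byte bug (e.g. Java `byte`): 0x80..=0xFF become
/// negative and therefore sort before 0x00..=0x7F.
fn sort_signed_buggy(mut items: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    items.sort_by(|a, b| {
        let sa: Vec<i8> = a.iter().map(|&v| v as i8).collect();
        let sb: Vec<i8> = b.iter().map(|&v| v as i8).collect();
        sa.cmp(&sb)
    });
    items
}
```

With inputs `[0x80]` and `[0x7f]`, the unsigned sort puts `[0x7f]` first and the signed emulation puts `[0x80]` first: the reversal described above.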

### 1.5 Channel 2 Scope (Metadata Exposure)

This library fully protects **Channel 1** — the content and integrity of transmitted data: message confidentiality, authentication, forward secrecy, and replay prevention. It makes no guarantees about **Channel 2** — the structural metadata of communication: who communicates with whom, when, how often, and in what pattern. This is an explicit design boundary, not a gap.

The following information is observable to a passive network adversary (one who can intercept but not modify traffic) or, where noted, to the relay server or a probing peer, and is out of scope for all security properties claimed in this document:

**LO-KEX (§5)**
- **Bundle fetch**: the bundle relay server learns that party A intends to contact party B before any encryption begins.
- **Session initialization**: the `SessionInit` message reveals that two specific fingerprints are beginning a session to any observer who can intercept it.
- **Failed session attempts**: a responder that rejects a `SessionInit` (wrong crypto version, structural error) responds differently from one that never received it. An initiator can probe whether a party is online or running a specific version by observing response presence and timing. Probing resistance requires transport-layer measures outside this library's scope.

**LO-Ratchet (§6)**
- **Epoch transitions**: `pk_s` in the cleartext header changes at each KEM ratchet step — a network observer can determine when a direction change occurred and count how many ratchet steps have taken place.
- **Message position**: the counter `n` in the cleartext header reveals the message's position within the current epoch.
- **Previous epoch size**: `pn` reveals how many messages were sent in the preceding epoch.
- **Ciphertext length**: approximates plaintext length (compressed plaintext + 17-byte AEAD overhead). When compression is enabled, length leaks plaintext compressibility.

**LO-Auth (§4)**
- **Challenge issuance**: the challenge ciphertext is sent in cleartext; its issuance reveals that an authentication attempt is in progress between a specific client and server.

**Streaming AEAD (§11)**
- **Stream header**: `base_nonce`, `version`, and `flags` are transmitted in cleartext — their presence reveals that a stream is being established.
- **Chunk count**: the number of chunks is observable from the ciphertext stream structure.
- **Chunk sizes**: approximate plaintext chunk sizes (compressed size + 17-byte overhead per chunk).

**Designing for Channel 2 protection**: Applications requiring metadata privacy must add transport-layer measures on top of this library. Uniform message padding (all messages padded to fixed sizes) removes length leakage. Cover traffic removes frequency and timing leakage. Onion routing or a mix network removes connection-graph leakage. An encrypted transport tunnel wrapping LO-Ratchet output removes epoch-transition leakage from the header fields. These concerns are outside the scope of this library.

---

## 2. LO Composite Key

### 2.1 Key Generation

```
function GenerateIdentity():
    (xwing_pk, xwing_sk) = XWing.KeyGen()
    (mldsa_pk, mldsa_sk_expanded) = MLDSA.KeyGen()
    mldsa_sk = mldsa_sk_expanded.to_seed()    // Extract 32-byte seed ξ — NOT the 4032-byte expanded key (FIPS 204 §7.2, ML-DSA-65 sigKeySize)

    (ed25519_pk, ed25519_sk) = Ed25519.KeyGen()

    pk = xwing_pk || ed25519_pk || mldsa_pk   // 1216 + 32 + 1952 = 3200 bytes
    sk = xwing_sk || ed25519_sk || mldsa_sk   // 2432 (expanded X-Wing sk — see §8.5) + 32 + 32 = 2496 bytes

    fingerprint = hex(SHA3-256(pk))  // 32 bytes = 64 lowercase hex chars (a-f, 0-9)
    return (pk, sk, fingerprint)
```

**`MLDSA.KeyGen()` returns an expanded key; `to_seed()` extracts the 32-byte seed before storage**: Most ML-DSA library APIs return the fully expanded 4032-byte signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize) as `mldsa_sk`. soliton stores only the 32-byte seed `ξ` (§8.5). The `to_seed()` step extracts `ξ` from the expanded form before it is assembled into the 2496-byte composite secret key. A reimplementer who assembles `sk = xwing_sk || ed25519_sk || mldsa_sk_expanded` produces a 6496-byte secret key (2432 + 32 + 4032), which will not parse as a valid identity secret key because the 2496-byte size check fails. The check in `IdentitySecretKey::from_bytes()` catches this immediately, so the failure is not silent. However, a reimplementer who writes their own construction without a size check may store the wrong form and only discover the mismatch at signing time.

**Alternative (seed-first) pattern used by the reference implementation**: The reference does NOT call `MLDSA.KeyGen()` followed by `to_seed()`. Instead it draws `ξ = random_bytes(32)` directly from the OS CSPRNG and calls `ML-DSA.KeyGen_internal(ξ)` (FIPS 204 §6.1, a deterministic function of `ξ`) to obtain both the public key and the signing handle, without ever creating or discarding the expanded 4032-byte key. This is the same pattern described in the "If your ML-DSA library does not expose ξ after `KeyGen()` at all" paragraph of this section. Both patterns store a 32-byte seed of the same form and are cryptographically equivalent; the seed-first pattern is cleaner and avoids the `to_seed()` extraction step. The pseudocode above shows the `KeyGen() + to_seed()` pattern for generality; libraries that provide `KeyGen_internal(ξ)` or `from_seed(ξ)` constructors should use the seed-first pattern.
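The size arithmetic above can be enforced mechanically. This is a minimal Rust sketch using only `std`; `assemble_identity_sk` and `AssembleError` are hypothetical names, not soliton's API. Checking each component (not only the total) rejects the 4032-byte expanded ML-DSA key at the point where it would be stored, with an error that names the offending part.

```rust
// Component sizes from §2.1, in bytes.
const XWING_PK: usize = 1216;
const ED25519_PK: usize = 32;
const MLDSA_PK: usize = 1952;
const XWING_SK: usize = 2432;
const ED25519_SK: usize = 32;
const MLDSA_SEED: usize = 32;
const MLDSA_SK_EXPANDED: usize = 4032;

const IDENTITY_PK_LEN: usize = XWING_PK + ED25519_PK + MLDSA_PK; // 3200
const IDENTITY_SK_LEN: usize = XWING_SK + ED25519_SK + MLDSA_SEED; // 2496

#[derive(Debug, PartialEq)]
enum AssembleError {
    InvalidLength { component: &'static str, got: usize },
}

/// Hypothetical helper: sk = xwing_sk || ed25519_sk || mldsa_seed,
/// refusing any component of the wrong size (e.g. the expanded ML-DSA key).
fn assemble_identity_sk(
    xwing_sk: &[u8],
    ed25519_sk: &[u8],
    mldsa_seed: &[u8],
) -> Result<Vec<u8>, AssembleError> {
    for (name, part, want) in [
        ("xwing_sk", xwing_sk, XWING_SK),
        ("ed25519_sk", ed25519_sk, ED25519_SK),
        ("mldsa_seed", mldsa_seed, MLDSA_SEED),
    ] {
        if part.len() != want {
            return Err(AssembleError::InvalidLength { component: name, got: part.len() });
        }
    }
    let mut sk = Vec::with_capacity(IDENTITY_SK_LEN);
    sk.extend_from_slice(xwing_sk);
    sk.extend_from_slice(ed25519_sk);
    sk.extend_from_slice(mldsa_seed);
    Ok(sk)
}
```

Passing a 4032-byte `mldsa_seed` returns `InvalidLength { component: "mldsa_seed", got: 4032 }` instead of silently producing a 6496-byte key.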

**`XWing.KeyGen()` uses X25519-first key layout — LO diverges from draft-09**: The X-Wing secret key returned by `XWing.KeyGen()` is stored as `sk_X (32 bytes) ‖ dk_M (2400 bytes)` — X25519 component first, ML-KEM-768 expanded decapsulation key second. IETF draft-connolly-cfrg-xwing-kem-09 specifies the opposite order: `dk_M ‖ sk_X`. A reimplementer who follows draft-09's field ordering produces an incompatible 2432-byte secret key layout — `ExtractXWingPrivate` extracts the wrong bytes, decapsulation silently derives a wrong shared secret, and AEAD fails with no diagnostic pointing to the key-layout swap. The public key ordering is the same in both LO and draft-09 (X25519 public key first, ML-KEM-768 public key second). See §8.1 and §8.5 for the complete layout specification.

**ML-KEM-768 KeyGen requires two independent 32-byte entropy draws**: `XWing.KeyGen()` internally calls `ML-KEM.KeyGen` (FIPS 203 §7.1), which requires two independently-random 32-byte seeds `d` and `z`. The soliton reference implementation draws `d` and `z` independently from the OS CSPRNG (two separate `getrandom` calls). A reimplementer who derives both from a single seed (e.g., `d = HKDF(seed, "d", 32)`, `z = HKDF(seed, "z", 32)`) produces a non-conforming key — the ML-KEM security proof requires that `d` and `z` are independently uniform; deriving both from a common secret violates this requirement and may weaken the IND-CCA2 security of the KEM. There is no structural or size-based signal that detects this mistake: the resulting keypair is the correct size, and encapsulation/decapsulation succeed normally. A conformance test MUST verify that `d` and `z` are generated by separate CSPRNG calls, not derived from a shared value.

**Cross-library seed extraction is NOT portable via API name**: `to_seed()`, `signing_key.to_bytes()[0..32]`, `seed()`, `private_key_bytes()`, and similar method names are NOT equivalent across ML-DSA library implementations. In the Rust `ml-dsa` crate, `to_seed()` returns exactly `ξ` (the 32 bytes passed to `ML-DSA.KeyGen_internal`). In other libraries (BouncyCastle, Go's `circl`, liboqs), `to_bytes()[0..32]` may return the first bytes of the expanded signing key or a different internal representation — not the seed. **The only portable cross-library verification**: extract the candidate 32 bytes, call `ML-DSA.KeyGen_internal(candidate)` (FIPS 204 §6.1), and compare the resulting public key against the known public key. If they match, the candidate is `ξ`. Any candidate that does not round-trip to the known public key is not the seed, regardless of the API name used to extract it.
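The round-trip test can be written once, independent of any particular ML-DSA library. In this Rust sketch the `keygen_internal` parameter is a placeholder for whatever deterministic `ML-DSA.KeyGen_internal(ξ)` entry point your library exposes; it is an assumption, not a real crate API.

```rust
/// A 32-byte candidate is the ML-DSA seed ξ only if
/// KeyGen_internal(candidate) reproduces the known public key.
fn is_mldsa_seed<F>(candidate: &[u8], known_pk: &[u8], keygen_internal: F) -> bool
where
    F: Fn(&[u8; 32]) -> Vec<u8>,
{
    // Anything that is not exactly 32 bytes cannot be ξ.
    let Ok(seed) = <&[u8; 32]>::try_from(candidate) else {
        return false;
    };
    // Public keys are not secret, so an ordinary comparison is fine here.
    keygen_internal(seed) == known_pk
}
```

For a unit test, any deterministic stand-in function works (for example, one that reverses the seed bytes); in production the closure wraps the library's real `KeyGen_internal`.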

**If your ML-DSA library does not expose ξ after `KeyGen()` at all** (e.g., liboqs, BouncyCastle, and PQClean C bindings expose only the expanded key form with no seed accessor): generate `ξ = random_bytes(32)` yourself from the OS CSPRNG, then call `ML-DSA.KeyGen_internal(ξ)` (FIPS 204 §6.1) directly to obtain the public key. This produces a valid keypair with `ξ` as the seed, bypassing the library's opaque `KeyGen()` entirely. See §8.5 for the two-level API pattern (`KeyGen()` vs `KeyGen_internal(ξ)`) and for what `ML-DSA.KeyGen_internal` MUST consume (no CSPRNG input — it is a pure deterministic function of `ξ`).

The hex-encoded fingerprint is for display and user-facing comparison (§9). All wire-format fields (`sender_ik_fingerprint`, `recipient_ik_fingerprint`, `local_fp`, `remote_fp`) use the raw 32-byte SHA3-256 digest, not the 64-character hex string.
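A minimal Rust sketch of the display form, assuming the 32-byte SHA3-256 digest has already been computed by a SHA3 implementation (not shown). The function name `fingerprint_hex` is hypothetical.

```rust
/// Display form of a fingerprint: 64 lowercase hex characters.
/// Wire-format fields carry the raw 32-byte digest, never this string.
fn fingerprint_hex(digest: &[u8; 32]) -> String {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    let mut s = String::with_capacity(64);
    for &b in digest {
        s.push(HEX[(b >> 4) as usize] as char);
        s.push(HEX[(b & 0x0f) as usize] as char);
    }
    s
}
```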

### 2.2 Component Extraction

```
function ExtractX25519Public(pk):     return pk[0..32]
function ExtractMLKEMPublic(pk):      return pk[32..1216]
function ExtractEd25519Public(pk):    return pk[1216..1248]
function ExtractMLDSAPublic(pk):      return pk[1248..3200]
function ExtractXWingPublic(pk):      return pk[0..1216]

function ExtractX25519Private(sk):    return sk[0..32]
function ExtractXWingPrivate(sk):     return sk[0..2432]  // sk_X(32) || dk_M(2400): X25519 scalar + ML-KEM-768 NTT-domain expanded decapsulation key — see §8.5
function ExtractEd25519Private(sk):   return sk[2432..2464] // 32-byte RFC 8032 seed s (RFC 8032 §5.1.5) — the raw random seed.
                                                              // NOT the SHA-512 hash of the seed, NOT the clamped scalar,
                                                              // NOT the 64-byte seed||public_key form (Go/libsodium default).
                                                              // ed25519_dalek::SigningKey::from_bytes() takes this exact form.
function ExtractMLDSAPrivate(sk):     return sk[2464..]     // 32 bytes: the seed ξ, NOT the 4032-byte expanded signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize); see §8.5
```
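The same offsets, written as half-open ranges (§1.4), in a Rust sketch using only `std`. The helper `component` is hypothetical; it checks the total length first so that a short key yields `None` instead of an out-of-bounds slice panic.

```rust
use std::ops::Range;

const PK_LEN: usize = 3200;
const SK_LEN: usize = 2496;

// Public key: X-Wing (X25519 || ML-KEM-768) || Ed25519 || ML-DSA-65.
const PK_X25519: Range<usize> = 0..32;
const PK_MLKEM: Range<usize> = 32..1216;
const PK_XWING: Range<usize> = 0..1216;
const PK_ED25519: Range<usize> = 1216..1248;
const PK_MLDSA: Range<usize> = 1248..3200;

// Secret key: X-Wing (sk_X || dk_M) || Ed25519 seed || ML-DSA seed ξ.
const SK_X25519: Range<usize> = 0..32;
const SK_XWING: Range<usize> = 0..2432;
const SK_ED25519: Range<usize> = 2432..2464;
const SK_MLDSA: Range<usize> = 2464..2496;

/// Returns the requested component, or None if the key has the wrong total size.
fn component(key: &[u8], expected_len: usize, range: Range<usize>) -> Option<&[u8]> {
    if key.len() != expected_len {
        return None;
    }
    key.get(range)
}
```

The range lengths reproduce the sub-key sizes listed below: `PK_MLKEM` spans 1184 bytes, and `SK_XWING` minus `SK_X25519` leaves the 2400-byte `dk_M`.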

**ML-DSA secret key is a 32-byte seed, not the expanded form**: `ExtractMLDSAPrivate` returns a 32-byte seed (`ξ`), not the 4032-byte expanded signing key (FIPS 204 §7.2, ML-DSA-65 sigKeySize). The full expanded signing key is deterministically re-derived via `ML-DSA.KeyGen_internal(ξ)` (FIPS 204 §6.1) at signing time (§8.5). A reimplementer who stores the full expanded form produces a 6496-byte secret key (2432 + 32 + 4032) that is incompatible with soliton's 2496-byte layout (2432 + 32 + 32). `ExtractMLDSAPublic(pk)` returns the standard 1952-byte FIPS 204 `pkEncode` public key; no analogous storage divergence exists on the public side.

**ML-KEM-768 sub-key sizes within X-Wing**: The X-Wing components extracted by `ExtractMLKEMPublic(pk)` and `ExtractXWingPrivate(sk)` have fixed sub-structure (§8.1, §8.5): ML-KEM-768 public key (`ek_PKE`) = **1184 bytes** (bytes 32-1215 of the X-Wing public key); ML-KEM-768 expanded secret key (`dk_M`) = **2400 bytes** (bytes 32-2431 of the X-Wing secret key); ML-KEM-768 ciphertext = **1088 bytes** (bytes 32-1119 of the X-Wing ciphertext). A reimplementer hard-coding the wrong ML-KEM-768 sub-key sizes (e.g., 1184 bytes for the secret key, which is the public key size) produces decapsulation keys that fail silently at AEAD — see §8.5 for the full `dk_M` field layout.

**ML-KEM stores the full 2400-byte expanded decapsulation key; ML-DSA stores only the 32-byte seed**: §2.1 explains that ML-DSA stores only `ξ` (32 bytes) and re-expands at signing time via `ML-DSA.KeyGen_internal(ξ)`. A reimplementer might apply the same reasoning to ML-KEM, storing only a seed and re-deriving the decapsulation key at decapsulation time. soliton does NOT do this. FIPS 203 has no single-32-byte-seed analogue of FIPS 204's `KeyGen_internal(ξ)`: `ML-KEM.KeyGen_internal(d, z)` (FIPS 203 §6.1) takes two independent 32-byte values `d` and `z` (§2.1 above), and the expanded 2400-byte decapsulation key contains the derived key material together with `z`, but not `d` (§8.5). Although `dk_M` could in principle be regenerated from the 64-byte pair `(d, z)`, soliton's layout stores the full expanded `dk_M` (2400 bytes) in `ExtractXWingPrivate(sk)`. A reimplementer who stores only a seed (32 bytes, or the 64-byte `(d, z)` pair) produces a layout-incompatible secret key.

**`IdentitySecretKey::from_bytes` zeroizes the input buffer on the error path**: `from_bytes` wraps the input in `Zeroizing` immediately on entry, so if `InvalidLength` is returned (wrong-size input), the caller's buffer is zeroed before the error propagates. This is a side-effect of the Rust `Zeroizing` wrapper — not a documented caller contract — but reimplementers and binding authors should be aware that a rejected-size buffer is zeroed. Callers who read the input back after a failed `from_bytes` call will find it zero. This side-effect does not apply to `IdentityPublicKey::from_bytes` (public keys are not secret; no zeroization on error).
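Binding authors who reproduce this behaviour without Rust's `Zeroizing` wrapper need an explicit wipe on the error path. The following Rust sketch (std only) is hypothetical: `identity_sk_from_bytes` only checks the size and does not model soliton's success path. Volatile writes followed by a compiler fence are a common best-effort pattern to stop the compiler from eliding the stores; they do not remove copies the program made elsewhere.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites the buffer with zeros using volatile writes so the
/// stores are not optimized away as dead code.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

const IDENTITY_SK_LEN: usize = 2496;

/// Hypothetical size check: on a wrong-size input the caller's buffer
/// is wiped before the error is returned.
fn identity_sk_from_bytes(buf: &mut [u8]) -> Result<(), &'static str> {
    if buf.len() != IDENTITY_SK_LEN {
        wipe(buf);
        return Err("InvalidLength");
    }
    Ok(())
}
```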

**Lazy validation**: `IdentityPublicKey::from_bytes()` validates only the total size (3200 bytes). It does not parse or validate the X-Wing, Ed25519, or ML-DSA sub-key structures; invalid sub-key bytes are accepted at construction and produce errors only at use time (`Encaps`, `HybridVerify`, etc.), if at all. For example, a 3200-byte all-zero input is accepted at construction. Signature verification then fails at use time: the all-zero Ed25519 encoding decodes to a small-order point, which strict verification (§3.2) rejects. Some malformed sub-keys are not detected even at use time: an all-zero ML-KEM-768 component has every coefficient in range and passes the FIPS 203 encapsulation-key check, so encapsulation to it may succeed. Skipping validation at construction is intentional: sub-key validation requires algorithm-specific parsing (ML-KEM coefficient range checks, Ed25519 point decompression, ML-DSA matrix expansion), which is expensive and duplicated by the operations themselves. Reimplementers MUST NOT assume that a successfully constructed `IdentityPublicKey` contains valid sub-keys. The same applies to `IdentitySecretKey::from_bytes()` (validates total size only).

**Security note — identity key compromise**: The identity secret key contains independent components: an X-Wing secret key (for KEM) and a dedicated Ed25519 secret key (for signing). A compromise of `sk_IK` yields both KEM decapsulation and signature forgery capability. The X25519 scalar within X-Wing is used solely for KEM; it plays no role in signing.

### 2.3 X-Wing Operations

Encapsulation and decapsulation use only the X-Wing components (X25519 + ML-KEM-768). The ML-DSA component is not involved.

```
function Encaps(lo_pk):
    xwing_pk = ExtractXWingPublic(lo_pk)
    return XWing.Encaps(xwing_pk)

function Decaps(lo_sk, ciphertext):
    xwing_sk = ExtractXWingPrivate(lo_sk)
    return XWing.Decaps(xwing_sk, ciphertext)
    // Note: the X-Wing combiner (§8.2) requires pk_X — the decapsulator's own
    // X25519 public key — which is re-derived from sk_X on every call as
    // X25519(sk_X, G). It is NOT taken from the ciphertext. See §8.2.
```

---

## 3. Hybrid Signatures

All signatures in LO use Ed25519 + ML-DSA-65 in parallel. A signature is valid only if **both** components verify.

### 3.1 Signing

```
function HybridSign(lo_sk, message):
    ed25519_sk = ExtractEd25519Private(lo_sk)
    mldsa_sk = ExtractMLDSAPrivate(lo_sk)

    sig_classical = Ed25519.Sign(ed25519_sk, message)   // 64 bytes
    sig_pqc = MLDSA.Sign(mldsa_sk, message)            // 3309 bytes, hedged mode

    return sig_classical || sig_pqc                      // 3373 bytes total
```

**Domain labels are applied by callers, not inside HybridSign**: The `message` parameter passed to `HybridSign(lo_sk, message)` MUST already contain any domain-separation label. HybridSign performs no label prepending, wrapping, or modification of its own — it signs the raw bytes as provided. Domain labels (e.g., `"lo-spk-sig-v1"`, `"lo-kex-init-sig-v1"`) are concatenated at the call site before invoking HybridSign (examples: §5.3, §5.4 Step 6). A reimplementer who embeds label handling inside HybridSign produces signatures over different bytes than the call-site concatenation: every signature over a labeled message would silently double-apply the label, making all such signatures incompatible with conforming implementations. Concretely: `HybridSign(sk, "lo-spk-sig-v1" ‖ payload)` is correct; `HybridSign(sk, payload)` where HybridSign internally prepends `"lo-spk-sig-v1"` is incorrect.
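The call-site concatenation is trivial but worth pinning down. This Rust sketch assumes the plain `label ‖ payload` concatenation shown in the example above; the exact payload for each context is defined at its call site (§5.3, §5.4), and `labeled_message` is a hypothetical helper.

```rust
/// Builds the exact byte string passed to HybridSign: label || payload,
/// with no separator or length prefix. HybridSign itself adds nothing.
fn labeled_message(label: &[u8], payload: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(label.len() + payload.len());
    msg.extend_from_slice(label);
    msg.extend_from_slice(payload);
    msg
}
```

Usage: `HybridSign(sk, &labeled_message(b"lo-spk-sig-v1", payload))`, with the verifier rebuilding the identical buffer before calling `HybridVerify`.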

**Byte layout**: The 3373-byte composite signature is a raw concatenation with no length prefixes, delimiters, or type markers. Ed25519 occupies bytes 0-63 (fixed 64 bytes per RFC 8032), ML-DSA-65 occupies bytes 64-3372 (fixed 3309 bytes per FIPS 204). A reimplementer who adds length prefixes or uses variable-length Ed25519 encodings (some libraries return `r || s || recovery_id`) produces incompatible signatures. Split at byte offset 64, unconditionally.
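A minimal Rust sketch of the unconditional split, using only `std`; `split_hybrid_sig` is a hypothetical helper. It checks the total length before slicing, matching the guard in §3.2.

```rust
const ED25519_SIG_LEN: usize = 64;
const MLDSA65_SIG_LEN: usize = 3309;
const HYBRID_SIG_LEN: usize = ED25519_SIG_LEN + MLDSA65_SIG_LEN; // 3373

/// Splits a composite signature into (Ed25519, ML-DSA-65) parts at byte 64.
/// No prefixes or markers are parsed.
fn split_hybrid_sig(sig: &[u8]) -> Option<(&[u8; 64], &[u8])> {
    if sig.len() != HYBRID_SIG_LEN {
        return None; // HybridVerify reports this case as InvalidLength (§3.2)
    }
    let (classical, pqc) = sig.split_at(ED25519_SIG_LEN);
    let classical: &[u8; 64] = classical.try_into().ok()?;
    Some((classical, pqc))
}
```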

**`HybridSign` output is non-deterministic — do NOT compare two signatures byte-for-byte**: Two calls to `HybridSign(sk, same_message)` produce the same bytes 0-63 (Ed25519 is deterministic per RFC 8032), but **always different bytes 64-3372** (ML-DSA-65 uses hedged signing with fresh 32-byte randomness on each call). Byte-equality comparison of two `HybridSign` outputs is therefore always `false` for bytes 64-3372, even when both signatures are valid over the same message with the same key. Callers MUST use `HybridVerify` to check validity — never byte comparison. Systems that cache a signature (e.g., an SPK bundle signature) and later re-sign to compare will always see a mismatch.

**ML-DSA message is a single contiguous buffer**: `Sign_internal` and `Verify_internal` receive the full message as a single flat byte string, not a multi-part or streaming input. In Rust, the `ml-dsa` crate exposes `sign_internal` as `signing_key.sign_internal(&[message], &rnd)`, where the outer slice is a `&[&[u8]]` of parts absorbed sequentially into the internal SHA3 state; soliton always passes a one-element slice containing the complete message buffer. A reimplementer whose ML-DSA library takes a multi-part interface for signing MUST pass the whole message as a single part. Splitting is harmless only if the library absorbs the parts as a plain concatenation; a library that frames, length-prefixes, or otherwise separates parts computes a different internal SHA3 hash and produces incompatible signatures. The same applies to `Verify_internal`: if the library exposes a multi-part interface for verification (as some do, e.g., liboqs, BouncyCastle), the message MUST be passed as a single part. The `ml-dsa` Rust crate's verify path takes a flat `&[u8]`, but other libraries may not.

**ML-DSA internal API**: Both signing and verification use the internal functions (`Sign_internal` / `Verify_internal` per FIPS 204 §6.2/§6.3). The context string and domain separator applied by the external API (`ML-DSA.Sign` / `ML-DSA.Verify`, FIPS 204 §5.2/§5.3) are not applied. This is intentional: soliton's domain separation is handled at the protocol level (per-context labels in §3.4, Appendix A). Signatures produced by soliton are therefore not compatible with standalone FIPS 204 verifiers that apply the external API wrapper.

**Reimplementer warning**: ML-DSA libraries outside Rust (liboqs, PQClean, BouncyCastle, Go's `circl`) often expose only the external API (`ML-DSA.Sign` / `ML-DSA.Verify` per FIPS 204 §5.2/§5.3), which prepends a domain separator byte (`0x00`), a context-length byte, and the context string before calling the internal functions. Passing an empty context string to the external API is NOT equivalent to calling `Sign_internal`: the external API still prepends the `0x00` domain separator byte and a zero context-length byte. Reimplementers must either access the internal functions directly or confirm that their library provides a bypass for the external API's context/domain-separator wrapping. Using the external API produces signatures that are silently incompatible with soliton.

**`rnd` MUST be exactly 32 bytes**: FIPS 204 §6.2 defines `rnd` as a 256-bit string (32 bytes). Libraries that accept `rnd` as a variable-length slice may not validate its size, so passing 16 or 24 bytes can silently weaken the hedging entropy without any error signal. Implementations MUST generate exactly 32 random bytes and MUST NOT pass a shorter or longer buffer. The reference implementation uses `Zeroizing<[u8; 32]>` and passes it as a 32-byte slice; binding-layer callers MUST ensure their random-byte generation produces exactly 32 bytes.

**`rnd` MUST be freshly drawn from the OS CSPRNG for each individual `Sign_internal` call**: Pre-generating `rnd` once and reusing it across multiple signing calls — or batching calls so multiple signatures share the same `rnd` — defeats the hedge entirely: repeated `(message, rnd)` pairs with constant `rnd` produce the same internal randomness on all calls, reducing hedged signing to deterministic signing and re-enabling the fault-injection attacks the hedge defends against. Each `HybridSign` call MUST draw 32 fresh bytes independently.

**Hedged mode rationale**: Hedged signing combines deterministic signing with 32 bytes of fresh randomness (`rnd` parameter), preventing fault-injection attacks that exploit deterministic nonce generation to extract the signing key. The `rnd` buffer is ephemeral secret material and MUST be zeroized immediately after `Sign_internal` returns — leaking it reduces hedged signing to deterministic signing, re-enabling the fault-injection attacks the hedge defends against. In Rust, wrapping in `Zeroizing<[u8; 32]>` handles this automatically; in C/Go/Python, the caller must explicitly zeroize the buffer.

**Transient 4032-byte SigningKey zeroization**: Every `Sign_internal` call re-expands the 32-byte seed `ξ` into the full 4032-byte signing key (s₁, s₂, t₀, t₁ polynomials — §8.5). This transient 4032-byte signing key MUST be zeroized before deallocation. In Rust, the `ml-dsa` crate's `SigningKey` implements `ZeroizeOnDrop` — the expanded key is automatically zeroized when the local variable goes out of scope after `Sign_internal` returns. In C/Go/Python implementations that call ML-DSA at a lower level, the caller MUST explicitly call `memset_s` (C) or equivalent on the 4032-byte signing key buffer before freeing it. Note the asymmetry with `rnd` above: `rnd` (32 bytes) is documented explicitly because it is ephemeral entropy whose leakage restores a broken security property; the 4032-byte SigningKey obligation is handled automatically in Rust but is equally important for C/Go/Python reimplementers — a leaked expanded signing key permits arbitrary ML-DSA-65 forgery.

**Two secrets per `Sign_internal` call — summary for non-RAII implementations**: Each call to `HybridSign` produces exactly two secret temporaries that must be zeroized: (1) the 32-byte hedged `rnd` buffer (described above — leaking it re-enables fault-injection attacks); and (2) the 4032-byte expanded ML-DSA-65 signing key (described above — leaking it permits arbitrary forgery). Rust's `ZeroizeOnDrop` handles both automatically; C/Go/Python callers must zeroize both explicitly at each call site.

**`sign_internal` vs. `verify_internal` API asymmetry in the `ml-dsa` crate**: In the `ml-dsa` Rust crate, `sign_internal` takes a `&[&[u8]]` (multi-part message slice), while `verify_internal` takes a flat `&[u8]`. Soliton always passes a one-element slice to `sign_internal` (`signing_key.sign_internal(&[full_message], &rnd)`), so the API difference is transparent at the call site. Reimplementers using a different ML-DSA library must confirm whether their library's `Sign_internal` and `Verify_internal` both accept a single flat buffer or use multi-part interfaces — and ensure both call sites pass the full message as a single contiguous input. This asymmetry does not affect correctness when both are used with a single part: SHA3's sponge construction absorbs input sequentially, so `H(a)` equals `H_sponge.absorb(a).finalize()` regardless of how the buffer is chunked. The sponge invariant means that a multi-part signing interface receiving a single part produces the same internal hash as a flat interface receiving the same bytes — there is no chunk-boundary hazard when exactly one part is used.

### 3.2 Verification

```
function HybridVerify(lo_pk, message, signature):
    if len(signature) != 3373:
        raise InvalidLength   // Must check before slicing — a short input causes
                              // signature[64..3373] to panic or read out-of-bounds.
                              // Returns InvalidLength (not VerificationFailed) because
                              // the error is on a caller-supplied parameter size, not
                              // a cryptographic failure.
                              //
                              // Typed-language note: a reimplementation that uses a
                              // `HybridSignature` wrapper type enforcing the 3373-byte
                              // invariant at construction time (e.g., via `from_bytes`)
                              // satisfies this check at the type level — `hybrid_verify`
                              // itself need not repeat it. An auditor comparing the typed
                              // implementation against this pseudocode should treat the
                              // type-constructor check as conformant with the inline guard
                              // shown here.

    ed25519_pk = ExtractEd25519Public(lo_pk)
    mldsa_pk = ExtractMLDSAPublic(lo_pk)

    sig_classical = signature[0..64]
    sig_pqc = signature[64..3373]

    ok_classical = Ed25519.Verify(ed25519_pk, message, sig_classical)
    ok_pqc = MLDSA.Verify_internal(mldsa_pk, message, sig_pqc)

    // BOTH must pass. Both verifications are evaluated eagerly (no short-circuit).
    // The AND combination MUST be constant-time (e.g., subtle::Choice or equivalent
    // bitwise AND) — a naive boolean && or branch on ok_classical leaks which
    // component failed via timing, enabling targeted forgery of only the weaker component.
    // Eagerness and constant-time AND are JOINT requirements — either alone is insufficient:
    //   - Eager evaluation without CT AND: both calls run, but a branch on the combined result
    //     still leaks whether the result is true or false via timing.
    //   - CT AND without eager evaluation: the bitwise AND is constant-time, but only computing
    //     ok_pqc when ok_classical is true leaks that Ed25519 passed via a timing side-channel.
    // The correct implementation evaluates BOTH verify calls unconditionally, then combines
    // the results with a bitwise AND (or equivalent constant-time operation) and branches only
    // on the combined boolean — not on either individual result.
    return ok_classical AND ok_pqc
```

**Signature size validation**: Before slicing, callers MUST verify `len(signature) == 3373`. Passing a shorter input causes the slice `signature[64..3373]` to panic or read out-of-bounds (language-dependent). More critically: some ML-DSA libraries return an error (not `false`) when given a wrong-size input. If that error propagates as a distinct failure mode rather than being collapsed to `false` before the AND combination, it breaks the constant-time AND requirement — the caller can distinguish "wrong size" from "right size but invalid" via timing or exception, leaking a distinguishing oracle. Any library error on a bad-size ML-DSA input MUST be treated as `false` for the AND combination, not propagated as a distinct exception.

**`HybridVerify` returns `InvalidLength` for a wrong-size composite signature**: When `len(signature) ≠ 3373`, `HybridVerify` returns `InvalidLength` before any slicing or cryptographic operation. This differs from the sub-component failure mappings (Ed25519 key import → `VerificationFailed`, ML-DSA signature decode → `VerificationFailed`), which fire during the verification operation itself. The top-level composite size check fires before any verification-layer operation begins and returns `InvalidLength` — the error is not oracle-exploitable because the attacker crafted the input and already knows whether its length is correct. A reimplementer who collapses this to `VerificationFailed` for consistency produces a divergent but still secure result; however, binding authors should document which error to expect.

**Ed25519 verification strictness**: `Ed25519.Verify` MUST use strict verification: the RFC 8032 §5.1.7 checks (rejecting non-canonical S values, S ≥ L, and non-canonical point encodings) plus rejection of small-order public keys. ZIP-215 permissive verification (as used by `crypto/ed25519` in Go and some other libraries) is NOT compatible: it accepts signatures that soliton rejects, producing silent interoperability failures on `HybridVerify`. The implementation uses `verify_strict()` from `ed25519-dalek`. Reimplementers MUST verify that their Ed25519 library defaults to strict mode, or explicitly select it.

**Caution: "strict mode" varies by library**: Some Ed25519 libraries advertise a "strict" or "batch-compatible" mode that checks only S-canonicity (S < L, i.e., the scalar is in the range [0, L−1]) but does NOT reject small-order public keys. Curve25519 has cofactor 8, and its eight torsion points (the points whose order divides 8) are all small-order. For a small-order public key `A`, the term `[k]A` in the verification equation `[S]B = R + [k]A` takes at most eight values regardless of the message, so an attacker who chooses `A` can produce signatures that verify without knowing any corresponding private key (with `A` set to the identity point, any `R = [S]B` verifies for every message). The ed25519-dalek `verify_strict()` method (used by soliton) rejects small-order public keys; `VerifyingKey::from_bytes` on its own only checks that the bytes decode to a curve point. Reimplementers using other libraries MUST explicitly verify that their "strict" mode includes small-order-key rejection; S-canonicity alone is insufficient.

**Ed25519 key import failure during HybridVerify maps to VerificationFailed, not InvalidData**: `ExtractEd25519Public` slices bytes `1216..1248` from the public key and passes them to the Ed25519 library's key import function (`VerifyingKey::from_bytes` in ed25519-dalek). If those 32 bytes are not a valid compressed Edwards point, the import fails. `HybridVerify` collapses this failure to `VerificationFailed`, not `InvalidData`: the key bytes already have the right length and format (the public key size was validated by `IdentityPublicKey::from_bytes`), so the failure is a verification-layer rejection, not a parsing failure. A reimplementer whose Ed25519 library propagates import failures as exceptions must catch them before the AND combination and treat them identically to a verification failure. A library that instead silently accepts an invalid compressed point at import and returns an incorrect verification result diverges silently: such an invalid sub-key yields `VerificationFailed` only for signatures that happen to fail the boolean check, and even then through the wrong code path. A reimplementer must confirm that their Ed25519 library errors on invalid point encodings rather than silently accepting them.

**ML-DSA public key import failure during HybridVerify maps to VerificationFailed, not InvalidData**: `ExtractMLDSAPublic` slices bytes `1248..3200` from the public key and passes them to the ML-DSA library's key import function. If those 1952 bytes are structurally invalid (e.g., polynomial coefficients outside `[0, q−1]` rejected by `pkDecode`, FIPS 204 §7.1), the import fails. `HybridVerify` collapses this failure to `VerificationFailed`, not `InvalidData`, for the same reason as the Ed25519 case above: the bytes have a valid length, so the failure is a verification-layer rejection. A reimplementer whose ML-DSA library propagates import failures as distinct exceptions must catch them before the AND combination and treat them as `false`.

**ML-DSA infallible decode (a third behavior)**: soliton's own ML-DSA implementation (the `ml-dsa` crate) does not check coefficients at import: `VerifyingKey::from_bytes` is infallible for correctly sized inputs. Out-of-range coefficients instead produce wrong polynomial arithmetic in `verify_internal`, which returns `false` → `VerificationFailed`. This is neither "reject" nor "normalize" but a third behavior, **accept-and-produce-wrong-result**: a bad public key "verifies" as false, reaching the correct final result through a different code path. For soliton's purposes this is safe, because wrong coefficients always make verification fail and the forged signature is rejected. A reimplementer who reads "MUST confirm their ML-DSA library rejects invalid coefficient encodings" should therefore not conclude that rejection at import is required. It is not: any behavior that produces `VerificationFailed` for a key with out-of-range coefficients (an import error, or an implicit wrong result at verification) satisfies the security requirement.

**Normalizing libraries**: The real concern is libraries that **normalize** out-of-range coefficients modulo q at import and then produce wrong-but-consistent verification results. Unlike Ed25519, where invalid points produce errors at import, such a library silently produces a different public key than the original bytes represent. A public key that round-trips through its import→export cycle differs byte-for-byte from the original, causing `HybridVerify` to fail even for an authentic signature (the verified message is computed against the original bytes, but the coefficients used for verification differ after normalization). Reimplementers importing ML-DSA public keys from external libraries should apply the re-encode cross-check described in §8.5, which catches this case.

**ML-DSA signature structural decode failure maps to VerificationFailed, not InvalidData**: A 3309-byte ML-DSA signature with polynomial coefficients outside `[0, q−1]` will pass the size check (`len(sig_pqc) == 3309`) but fail at the ML-DSA library's signature decode step. In the `ml-dsa` crate, `Signature::decode()` returns `None` for such inputs. soliton maps this `None` to `VerificationFailed` — not `InvalidData`. The rationale: the size is correct; the structural failure is a property of the signature bytes themselves, not of the API call. Reimplementers whose ML-DSA library exposes a two-step API (decode then verify) must catch the decode failure explicitly and map it to `VerificationFailed`. If the decode failure propagates as a distinct exception or error type, it breaks the constant-time AND requirement (the caller can distinguish "decode failed" from "verify returned false" via exception type, even if the combined result is the same). The correct mapping: any ML-DSA decode failure → treat as `ok_pqc = false` → combined `VerificationFailed`. This is a third failure mode not covered by the wrong-size path (which fails at slicing before decode) or the public-key path (import failure) — it requires a separate catch in reimplementations.
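The combination rules above (composite size check before slicing, eager evaluation of both components, any library error collapsed to `false`, and a non-branching AND) can be sketched with the Rust standard library alone. `hybrid_verify_sketch` and `HybridVerifyError` are hypothetical names, not soliton's API, and the two closures are placeholders for real Ed25519 and ML-DSA calls. `std::hint::black_box` is only a best-effort optimizer barrier; a production implementation would combine the results with `subtle::Choice` as required above.

```rust
pub const HYBRID_SIG_LEN: usize = 3373;
pub const ED25519_SIG_LEN: usize = 64;

#[derive(Debug, PartialEq, Eq)]
pub enum HybridVerifyError {
    InvalidLength,
    VerificationFailed,
}

/// Hypothetical combinator for the two component checks. `ed25519_verify`
/// and `mldsa_verify` stand in for real library calls; each returns
/// Ok(true)/Ok(false) or a library error (key import, signature decode, ...).
pub fn hybrid_verify_sketch<E1, E2, F1, F2>(
    signature: &[u8],
    ed25519_verify: F1,
    mldsa_verify: F2,
) -> Result<(), HybridVerifyError>
where
    F1: FnOnce(&[u8]) -> Result<bool, E1>,
    F2: FnOnce(&[u8]) -> Result<bool, E2>,
{
    // Composite size check fires before any slicing or cryptographic work.
    if signature.len() != HYBRID_SIG_LEN {
        return Err(HybridVerifyError::InvalidLength);
    }
    let (sig_classical, sig_pqc) = signature.split_at(ED25519_SIG_LEN);

    // Eager evaluation: both verifiers always run. Any library error
    // collapses to `false` here, before the AND, and never propagates separately.
    let ok_classical = ed25519_verify(sig_classical).unwrap_or(false);
    let ok_pqc = mldsa_verify(sig_pqc).unwrap_or(false);

    // Bitwise AND of 0/1 values (no short-circuit). `black_box` is a
    // best-effort optimizer barrier, not a constant-time guarantee.
    let combined = std::hint::black_box(ok_classical as u8) & std::hint::black_box(ok_pqc as u8);

    // Branch only on the combined result, never on either component.
    if combined == 1 {
        Ok(())
    } else {
        Err(HybridVerifyError::VerificationFailed)
    }
}
```

Because an ML-DSA decode error and an ML-DSA `false` both become `ok_pqc = false`, the caller sees a single `VerificationFailed` for every content failure, while the wrong-size composite input still returns `InvalidLength` as described above.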

### 3.3 Security Properties

- **Classical**: Ed25519 provides 128-bit classical security (RFC 8032).
- **Post-quantum**: ML-DSA-65 provides NIST Level 3.
- **Hybrid guarantee**: Forgery requires breaking both simultaneously.
- **EUF-CMA**: The parallel composition ("both must verify") is EUF-CMA secure if either component is (Bindel et al., PQCrypto 2017 — see Appendix D, "Hybrid Constructions").

### 3.4 Where Signatures Are Used

Signatures are used in two contexts in v1:

1. **Pre-key bundle signing** (§5.3): Identity key signs the signed pre-key's public key material.
2. **Session initiation signing** (§5.4 Step 6): Alice's identity key signs the encoded SessionInit, proving to Bob that the session was initiated by the holder of sk_IK_A.

Signatures are NOT used for:
- Server-side authentication (KEM-based, §4).
- Message encryption (symmetric, §7).
- Ratchet key agreement (KEM-based, §6).

**Header authentication without signatures**: The "not used for message encryption" item above raises the question of how ratchet message headers are protected against tampering. The answer is AEAD AAD binding (§7.3): each message's ciphertext authenticates the full encoded ratchet header (`sender_fp || recipient_fp || header_bytes`) as additional associated data. A tampered header (e.g., a modified `kem_ct` or a wrong `n`) causes AEAD authentication to fail at decryption. Signatures are therefore unnecessary for per-message header integrity; the AEAD tag provides it.

---

## 4. KEM-Based Authentication

### 4.1 Purpose

Proves possession of the private key corresponding to a claimed public identity. Only the legitimate key holder can decapsulate.

### 4.2 Protocol

```
Client                                    Server
  |                                         |
  |  --- lo_pk (3200 B) ----------->        |
  |                                         |
  |        xwing_pk = ExtractXWingPublic(lo_pk)
  |        (ct, ss) = XWing.Encaps(xwing_pk)|
  |        token = HMAC-SHA3-256(ss, "lo-auth-v1")
  |        // ss zeroized immediately
  |                                         |
  |  <--- ct (X-Wing ciphertext) ---        |
  |                                         |
  |  ss = XWing.Decaps(xwing_sk, ct)        |
  |  proof = HMAC-SHA3-256(ss, "lo-auth-v1")|
  |  // ss zeroized immediately             |
  |                                         |
  |  --- proof (32 bytes) ---------->       |
  |                                         |
  |        constant_time_eq(proof, token)   |
  |        // token zeroized after verify   |
  |                                         |
  |  <--- READY or ERROR -----------        |
```

The three protocol steps correspond directly to the three CAPI entry points: the server's encapsulate-and-token step is `soliton_auth_challenge`; the client's decapsulate-and-proof step is `soliton_auth_respond`; the server's comparison step is `soliton_auth_verify`. Each CAPI function implements exactly one arrow in the diagram above — `auth_challenge` issues the ciphertext, `auth_respond` consumes it and returns the proof, `auth_verify` checks the proof against the stored token.

The X-Wing ciphertext `ct` is exactly **1120 bytes** (32 bytes `ct_X` + 1088 bytes `ct_M` — see Appendix C). The proof value `proof` / `token` is 32 bytes (HMAC-SHA3-256 output).

**General HMAC encoding rule — raw data, no length prefix**: Throughout this protocol, HMAC data arguments are passed as raw bytes with no length prefix. This is the opposite of HKDF `info` fields (§5.4, §6.12), which use `len(x) || x` length-prefixed encoding. The distinction: HKDF info uses length prefixes because it concatenates multiple variable-length fields into a single domain-separation string; HMAC data is always a single, fixed-purpose input (a domain label or a counter byte) where no length prefix is needed. A reimplementer who applies the HKDF `len(x) || x` convention to HMAC data arguments produces a different MAC output with no error signal.

Convention: `HMAC-SHA3-256(key, message)` — the shared secret `ss` is the HMAC key, and the domain label `"lo-auth-v1"` is the message. **`ss` is the key because it is the high-entropy secret material; the label is the data/domain separator.** HMAC's security requires the key to be the secret — placing `ss` as the data argument and the label as the key would produce a MAC keyed by a public constant, which is trivially forgeable by anyone who knows the label. **The label is 10 raw ASCII bytes with no length prefix** — unlike the HKDF `info` fields in §5.4 which use `len(x) || x` format, HMAC in §4.2 passes the label directly as the HMAC data argument. **C**: use `strlen("lo-auth-v1")` (= 10), not `sizeof("lo-auth-v1")` (= 11) — `sizeof` includes the NUL terminator, producing an 11-byte input that yields a silently different HMAC token. A reimplementer who applies the §5.4 convention here would prepend a 2-byte BE length (`0x00 0x0a`) before the label, producing a different token.
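A minimal Rust sketch of the byte-level distinction, assuming a placeholder `hmac_sha3_256(key, data)` function (not a real library call, so it appears only in comments). The label constant is exactly the 10 raw ASCII bytes; the HKDF-style `len(x) || x` helper exists only to show the 12-byte input that a misapplied convention would feed to HMAC.

```rust
use std::convert::TryFrom;

/// Domain label used as the HMAC *data* argument: 10 raw ASCII bytes.
pub const AUTH_LABEL: &[u8] = b"lo-auth-v1";

/// HKDF-`info`-style `len(x) || x` encoding with a 2-byte big-endian length.
/// Shown only for contrast: applying it to the auth label is a bug.
pub fn hkdf_style_len_prefixed(x: &[u8]) -> Vec<u8> {
    let len = u16::try_from(x.len()).expect("field longer than 65535 bytes");
    let mut out = Vec::with_capacity(2 + x.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(x);
    out
}

// Correct call shape (`hmac_sha3_256` is a hypothetical placeholder):
//     token = hmac_sha3_256(/* key */ &ss, /* data */ AUTH_LABEL)   // 10-byte data
// Shapes that silently yield a different token:
//     hmac_sha3_256(AUTH_LABEL, &ss)                                // key and data swapped
//     hmac_sha3_256(&ss, &hkdf_style_len_prefixed(AUTH_LABEL))      // 12-byte data
//     C: HMAC over sizeof("lo-auth-v1") bytes                       // 11 bytes, includes NUL
```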

### 4.3 Security Properties

- **Key possession proof**: Only private key holder can produce correct HMAC.
- **Replay resistance (intra-connection)**: Fresh randomness per encapsulation prevents stale-proof replay within the same connection — an old `(ct, proof)` pair cannot be reused with a new `ct` challenge because the proof HMAC is bound to the specific `ss` from that encapsulation. **Cross-connection replay is not prevented by fresh randomness alone** — an adversary who captures a valid `(ct, proof)` pair can replay it against a different server instance that issues the same `ct` (e.g., via a replay of the encapsulation step). Cross-connection replay resistance requires the transport-layer session binding documented in §4.4; without it, the 30-second timeout only limits the replay window.
- **Post-quantum**: X-Wing hybrid construction.
- **No signature required**: Pure KEM paradigm.

### 4.4 Requirements

- Server MUST validate that the client's `lo_pk` is exactly 3200 bytes before beginning authentication. Accepting a short or oversized public key and then slicing into it for `ExtractXWingPublic` causes out-of-bounds access or reads from the wrong offset, producing a pseudorandom shared secret and a silent HMAC mismatch that is indistinguishable from an authentication failure. Length validation MUST precede encapsulation. A server that defers this check to `Encaps` incurs the full cost of an X-Wing encapsulation before discovering the malformed key. This is a pre-association DoS vector: an unauthenticated client can force repeated expensive encapsulations by sending malformed keys.
- **Wrong-length `lo_pk` MUST collapse to a generic authentication failure response**: Returning a distinguishable error code for a wrong-length vs. correct-length key creates a length-probing oracle — an adversary can probe response codes to confirm whether a key is the expected size before committing to a full authentication attempt. Any authentication failure (wrong-length key, malformed key, correct-length key with bad cryptographic content) MUST produce the same externally observable outcome: generic authentication failure or connection close. The length check MUST still be enforced internally (to prevent the out-of-bounds access described above), but the error response to the client MUST NOT distinguish length failures from other authentication failures.
- Server MUST use fresh randomness per encapsulation. **Each `(ct, token)` pair MUST be delivered to the client at most once**: caching and redelivering a previously generated pair is forbidden, and every challenge requires a new encapsulation. Redelivering a pair gives an adversary an additional observation opportunity beyond the 30-second timeout window: an attacker who captures a replayed pair can attempt offline HMAC forgery against the same `ss`. Fresh randomness prevents entropy reuse, but delivery uniqueness is a separate, additional requirement.
- HMAC comparison MUST be constant-time (`subtle::ConstantTimeEq`). The comparison uses the full 32-byte HMAC-SHA3-256 output — no truncation. Implementations using a "HMAC-with-length" API parameterized on output length MUST request 32 bytes and compare all 32 bytes. A truncated comparison (e.g., 16 bytes) weakens forgery resistance from 256-bit to 128-bit and produces an incompatible proof token that fails on conforming servers.
- Shared secret MUST be zeroized immediately after proof computation.
- The auth token (the server's expected HMAC value) MUST be zeroized by the server immediately after the constant-time comparison. A server that retains the token in a session cache (e.g., for re-authentication within the 30-second window) enables token replay: an attacker who observes the token can resubmit it on the same connection before expiry. The 30-second timeout bounds this window but does not eliminate it. Single use: one comparison, then zeroize.
- Label `"lo-auth-v1"` is a domain separator preventing cross-protocol attacks.
- **Transport-layer session binding**: The proof token binds no server identity, timestamp, or connection identifier — replay resistance depends entirely on the transport layer binding the issued ciphertext to the specific connection on which it was issued. The server MUST reject a proof token received on any connection other than the one on which it issued the ciphertext. A token that escapes its connection context (e.g., via session hijacking or protocol downgrade) is replayable against any server that would issue the same ciphertext. The 30-second timeout bounds the window but does not replace the connection-binding requirement — without it, the timeout merely limits the replay window rather than preventing replay entirely.
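The comparison and single-use requirements can be sketched in std-only Rust. `auth_verify_and_consume` is a hypothetical helper that approximates `subtle::ConstantTimeEq` (XOR-accumulate over all 32 bytes, no early exit) and `zeroize` (volatile writes). It illustrates the required shape under those assumptions and is not a substitute for the audited crates the requirements name.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

pub const AUTH_TOKEN_LEN: usize = 32;

/// Hypothetical server-side check: compares all 32 bytes of the client's
/// proof with the stored token, then wipes the token regardless of outcome.
pub fn auth_verify_and_consume(
    expected: &mut [u8; AUTH_TOKEN_LEN],
    proof: &[u8; AUTH_TOKEN_LEN],
) -> bool {
    // OR together the XOR of every byte pair: no early exit, no truncation.
    let mut diff: u8 = 0;
    for (e, p) in expected.iter().zip(proof.iter()) {
        diff |= e ^ p;
    }

    // Single use: zeroize the stored token. Volatile writes keep the compiler
    // from eliding the wipe (the `zeroize` crate does this more robustly).
    for byte in expected.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference to a u8.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);

    // Branch only on the accumulated result.
    std::hint::black_box(diff) == 0
}
```

Taking the token by `&mut` makes "one comparison, then zeroize" part of the function contract: after the call there is no token left to compare against.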

### 4.5 Error Variants

| Function | Error | Condition |
|----------|-------|-----------|
| `soliton_auth_challenge` (CAPI only) | `InvalidLength` | `lo_pk` not exactly 3200 bytes. **Rust API note**: the Rust `auth_challenge(client_pk: &IdentityPublicKey)` takes a typed reference; the size is enforced by the type system and `InvalidLength` cannot be returned. This guard exists only in the CAPI wrapper. |
| `soliton_auth_respond` (CAPI only) | `InvalidLength` | `ct` not exactly 1120 bytes. **Rust API note**: the Rust `auth_respond(ct: &xwing::Ciphertext)` takes a typed reference; the size is enforced at construction by the `xwing::Ciphertext` type and `InvalidLength` cannot be returned. This guard exists only in the CAPI wrapper. |
| `soliton_auth_verify` (CAPI only) | `InvalidLength` | `expected_token` not exactly 32 bytes; checked first (before `auth_proof`). The Rust API note in the next row applies. |
| `soliton_auth_verify` (CAPI only) | `InvalidLength` | `auth_proof` not exactly 32 bytes. **Rust API note**: the Rust `auth_verify(expected: &[u8; 32], proof: &[u8; 32]) -> bool` takes fixed-size array references; wrong-size inputs are rejected at compile time by the type system and `InvalidLength` cannot be returned. These guards exist only in the CAPI wrapper (`soliton_auth_verify`), which receives raw pointers and lengths. |
| `soliton_auth_verify` (CAPI only) | `VerificationFailed` | Constant-time comparison failed (proof ≠ token). **Rust API note**: the Rust `auth_verify(expected: &[u8; 32], proof: &[u8; 32]) -> bool` returns `false` on mismatch; `VerificationFailed` is returned only by the CAPI wrapper. |

**External error collapsing requirement** (see also §4.4): Callers MUST map all LO-Auth failures — `InvalidLength` from any step, `VerificationFailed` from `auth_verify` — to the same external authentication-failure response (e.g., connection close or generic error code). Returning a distinguishable error per step (e.g., "wrong key size" vs. "HMAC mismatch") enables an oracle: an attacker can probe which step failed and thereby determine whether the submitted key is the correct length (step 1 passes), whether the ciphertext was accepted (step 2 passes), and whether the HMAC matched (step 3 passes) — progressively confirming each layer of the authentication independently. All failures must be indistinguishable externally, regardless of which step triggered them.
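One way to enforce the collapsing rule is to keep step-specific errors internal and expose only a two-valued response type. The enum and function names below are hypothetical; the sketch shows the length gate running before any encapsulation work and every failure mapping to the same external outcome.

```rust
pub const LO_PK_LEN: usize = 3200;

/// Step-specific failures: useful for server-side logs, never sent to clients.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthStepError {
    InvalidLength,
    VerificationFailed,
}

/// Everything a client is ever allowed to observe.
#[derive(Debug, PartialEq, Eq)]
pub enum ExternalAuthResponse {
    Ready,
    AuthFailed,
}

/// Length gate: must run before any encapsulation work is done for `lo_pk`.
pub fn check_client_pk_len(lo_pk: &[u8]) -> Result<(), AuthStepError> {
    if lo_pk.len() == LO_PK_LEN {
        Ok(())
    } else {
        Err(AuthStepError::InvalidLength)
    }
}

/// Collapses every internal failure into the same external response.
pub fn external_response(result: Result<(), AuthStepError>) -> ExternalAuthResponse {
    match result {
        Ok(()) => ExternalAuthResponse::Ready,
        // Wrong length, rejected ciphertext, HMAC mismatch: all identical.
        Err(_) => ExternalAuthResponse::AuthFailed,
    }
}
```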

---

## 5. LO-KEX: KEM-Based Key Agreement

### 5.1 Goals

1. **Mutual authentication** (both cryptographic: recipient via KEM; initiator via HybridSign over SessionInit — see §5.6).
2. **Forward secrecy** via pre-key rotation and single-use OPKs.
3. **Post-quantum security** via X-Wing.
4. **Offline initiation** via pre-keys.
5. **Multi-key session binding** (session key requires compromise of both IK and SPK).

### 5.2 Key Material

| Key | Type | Size (pk) | Lifetime | Purpose |
|-----|------|-----------|----------|---------|
| Identity Key (IK) | LO composite | 3200 B | Long-term | Auth, signing |
| Signed Pre-Key (SPK) | X-Wing | 1216 B | ~weekly | Session initiation |
| One-Time Pre-Keys (OPK) | X-Wing | 1216 B | Single use | Enhanced forward secrecy |

Pre-keys are X-Wing only (no ML-DSA) because they need KEM, not signing.

**OPK secret key storage format**: OPK secret keys use the same expanded 2432-byte X-Wing secret key format as IK and SPK (§8.5): 32-byte X25519 scalar || 2400-byte ML-KEM-768 decapsulation key (NTT-domain). The table above shows the public key size (1216 bytes); the stored secret key is 2432 bytes. Storing only a 32-byte seed and re-deriving the X25519 and ML-KEM components at use is NOT supported; soliton stores the expanded form directly.
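The expanded layout can be checked with a small parser. `split_xwing_secret` and `ExpandedXWingSecret` are hypothetical names; the sketch only enforces the 2432 = 32 + 2400 split described above and rejects any other length, including a bare 32-byte seed.

```rust
use std::convert::TryInto;

pub const XWING_SK_LEN: usize = 2432;
pub const X25519_SK_LEN: usize = 32;
pub const MLKEM768_DK_LEN: usize = 2400;

// The stored layout is exactly scalar || decapsulation key.
const _: () = assert!(X25519_SK_LEN + MLKEM768_DK_LEN == XWING_SK_LEN);

/// Borrowed view of a stored (expanded) X-Wing secret key.
pub struct ExpandedXWingSecret<'a> {
    pub x25519_scalar: &'a [u8; X25519_SK_LEN],
    pub mlkem_dk: &'a [u8; MLKEM768_DK_LEN],
}

/// Accepts only the 2432-byte expanded form; a 32-byte seed (or any other
/// length) is rejected rather than re-derived.
pub fn split_xwing_secret<'a>(stored: &'a [u8]) -> Option<ExpandedXWingSecret<'a>> {
    if stored.len() != XWING_SK_LEN {
        return None;
    }
    let (x25519, mlkem) = stored.split_at(X25519_SK_LEN);
    Some(ExpandedXWingSecret {
        x25519_scalar: x25519.try_into().ok()?,
        mlkem_dk: mlkem.try_into().ok()?,
    })
}
```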

**SPK private key retention after rotation**: Rotating to a new SPK does NOT immediately delete the old SPK private key. The old private key MUST be retained for 30 days after rotation (Appendix B) to allow in-flight sessions that encapsulated to the old SPK to complete. Deleting the private key at rotation time causes silent `InvalidData` rejections for any `SessionInit` that arrived after rotation but was encapsulated to the pre-rotation SPK. After the 30-day window, the private key MUST be deleted — retaining it beyond that date extends the forward-secrecy exposure window. See §5.5 Step 4 and §10.2 for the deletion obligation and its security implications.
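The retention rule reduces to a time comparison per retired key. This sketch assumes the rotation timestamp is stored alongside each retired SPK and treats the exact 30-day boundary as the deletion point; that boundary choice is an assumption, since the text does not specify it. All names are hypothetical.

```rust
use std::time::{Duration, SystemTime};

/// Grace period for a rotated-out SPK private key: 30 days.
pub const SPK_RETENTION: Duration = Duration::from_secs(30 * 24 * 60 * 60);

#[derive(Debug, PartialEq, Eq)]
pub enum RetiredSpkAction {
    /// Inside the window: keep it so late SessionInits still decapsulate.
    Retain,
    /// Window elapsed: delete it to bound forward-secrecy exposure.
    Delete,
}

/// Hypothetical policy check for one retired SPK.
pub fn retired_spk_action(rotated_at: SystemTime, now: SystemTime) -> RetiredSpkAction {
    match now.duration_since(rotated_at) {
        // Assumed boundary: delete at exactly 30 days.
        Ok(elapsed) if elapsed >= SPK_RETENTION => RetiredSpkAction::Delete,
        // Still inside the window, or the clock reads earlier than the rotation.
        _ => RetiredSpkAction::Retain,
    }
}
```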

### 5.3 Pre-Key Bundle

Published to the user's home DM relay:

```
PreKeyBundle = {
    IK_pub:         LO composite public key (3200 bytes)
    crypto_version: "lo-crypto-v1"
    SPK_pub:        X-Wing public key (1216 bytes)
    SPK_id:         uint32
    SPK_sig:        Hybrid signature (3373 bytes)
    OPK_pub:        X-Wing public key (1216 bytes) [optional]
    OPK_id:         uint32 [optional]
}
```

OPK_pub and OPK_id must be both present or both absent.

**`SPK_id` uniqueness obligation**: `SPK_id` MUST be unique per server identity within the 30-day SPK retention window (§10.2). If a new SPK is generated with the same `SPK_id` as a rotated-out SPK whose private key is still retained in its grace period, `receive_session` will silently retrieve the wrong secret key for that ID, producing `AeadFailed` with no diagnostic. A monotonic counter (incrementing on each SPK rotation) satisfies this constraint. Random 32-bit IDs are also acceptable given the collision probability over typical rotation schedules (~3 × 10⁻⁸ for a 30-day window with weekly rotation). Relay implementations MUST NOT reuse an `SPK_id` until the previous SPK with that ID has been fully deleted from the grace-period store. Note that `SPK_id` is a server-assigned opaque identifier: the reference implementation does not specify or enforce an allocation policy, so uniqueness is a server-side obligation.
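A monotonic allocator that also refuses IDs still present in the grace-period store might look like the following. `SpkIdAllocator` is hypothetical; the caller is assumed to call `release` only once a retired SPK's private key has actually been deleted.

```rust
use std::collections::HashSet;

/// Hypothetical allocator: a monotonic counter that never hands out an ID
/// whose SPK private key is still held (current or in its grace period).
pub struct SpkIdAllocator {
    next: u32,
    retained: HashSet<u32>,
}

impl SpkIdAllocator {
    pub fn new(start: u32) -> Self {
        SpkIdAllocator { next: start, retained: HashSet::new() }
    }

    /// Returns a fresh SPK_id and marks it as retained.
    pub fn allocate(&mut self) -> Option<u32> {
        // At most `retained.len()` candidates can be taken, so one more
        // attempt than that always finds a free ID (unless all 2^32 are held).
        for _ in 0..=self.retained.len() {
            let id = self.next;
            self.next = self.next.wrapping_add(1);
            // Skips IDs still in the grace-period store (matters after wrap-around).
            if self.retained.insert(id) {
                return Some(id);
            }
        }
        None
    }

    /// Call once the retired SPK's private key has been deleted.
    pub fn release(&mut self, id: u32) {
        self.retained.remove(&id);
    }
}
```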

**Wire format**: The pre-key bundle is a transport-layer struct, and soliton does not define a canonical binary encoding for it (unlike SessionInit, which has `encode_session_init` in §7.4). The transport protocol serializes the bundle for relay storage and retrieval; field ordering and encoding are protocol-spec concerns. Whatever wire format is used, the field sizes below are fixed; the following reference encoding makes them explicit:

```
encode_prekey_bundle(b) =
    len(b.crypto_version) || b.crypto_version        // UTF-8, 2-byte BE len
 || b.IK_pub                                          // 3200 bytes (fixed, no length prefix)
 || b.SPK_pub                                         // 1216 bytes (fixed, no length prefix)
 || big_endian_32(b.SPK_id)
 || b.SPK_sig                                         // 3373 bytes (fixed, no length prefix)
 || if OPK present: 0x01 || b.OPK_pub || big_endian_32(b.OPK_id)
    else:           0x00
```

**Decoder strictness**: A conforming decoder for `encode_prekey_bundle` MUST reject: (1) any `has_opk` byte other than `0x00` or `0x01` — values `0x02`-`0xFF` are invalid and MUST return `InvalidData`; (2) any trailing bytes after the last field — accept only the exact length implied by `has_opk`. Compare with §7.4's explicit "Trailing bytes after the last field → InvalidData" rule for `decode_session_init`. A decoder that accepts `has_opk = 0x02` as "OPK present" produces the same parsed output as `has_opk = 0x01` but allows an attacker to craft bundles that pass decoding with non-canonical bytes, creating format-malleability.
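The reference encoding and the decoder-strictness rules can be sketched in Rust as follows. `PreKeyBundle`, `encode_prekey_bundle`, and `decode_prekey_bundle` here are illustrative, not soliton API (soliton defines no bundle codec). The decoder rejects `has_opk` values other than `0x00`/`0x01`, truncated input, and trailing bytes with `InvalidData`. It does not apply the 64-byte `crypto_version` cap required in §5.4 Step 1.

```rust
use std::convert::TryFrom;

pub const IK_PUB_LEN: usize = 3200;
pub const XWING_PUB_LEN: usize = 1216;
pub const HYBRID_SIG_LEN: usize = 3373;

/// Hypothetical in-memory bundle; byte fields are owned copies.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PreKeyBundle {
    pub crypto_version: Vec<u8>,
    pub ik_pub: Vec<u8>,             // 3200 bytes
    pub spk_pub: Vec<u8>,            // 1216 bytes
    pub spk_id: u32,
    pub spk_sig: Vec<u8>,            // 3373 bytes
    pub opk: Option<(Vec<u8>, u32)>, // (OPK_pub, OPK_id): both or neither
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

pub fn encode_prekey_bundle(b: &PreKeyBundle) -> Vec<u8> {
    let vlen = u16::try_from(b.crypto_version.len()).expect("crypto_version too long");
    let mut out = Vec::new();
    out.extend_from_slice(&vlen.to_be_bytes());
    out.extend_from_slice(&b.crypto_version);
    out.extend_from_slice(&b.ik_pub); // fixed size, no length prefix
    out.extend_from_slice(&b.spk_pub); // fixed size, no length prefix
    out.extend_from_slice(&b.spk_id.to_be_bytes());
    out.extend_from_slice(&b.spk_sig); // fixed size, no length prefix
    match &b.opk {
        Some((pk, id)) => {
            out.push(0x01);
            out.extend_from_slice(pk);
            out.extend_from_slice(&id.to_be_bytes());
        }
        None => out.push(0x00),
    }
    out
}

/// Splits exactly `n` bytes off the front of `input`.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Result<&'a [u8], InvalidData> {
    let slice: &'a [u8] = *input;
    if slice.len() < n {
        return Err(InvalidData);
    }
    let (head, rest) = slice.split_at(n);
    *input = rest;
    Ok(head)
}

fn take_u32(input: &mut &[u8]) -> Result<u32, InvalidData> {
    let b = take(input, 4)?;
    Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn decode_prekey_bundle(mut input: &[u8]) -> Result<PreKeyBundle, InvalidData> {
    let len = take(&mut input, 2)?;
    let vlen = u16::from_be_bytes([len[0], len[1]]) as usize;
    let crypto_version = take(&mut input, vlen)?.to_vec();
    let ik_pub = take(&mut input, IK_PUB_LEN)?.to_vec();
    let spk_pub = take(&mut input, XWING_PUB_LEN)?.to_vec();
    let spk_id = take_u32(&mut input)?;
    let spk_sig = take(&mut input, HYBRID_SIG_LEN)?.to_vec();
    let has_opk = take(&mut input, 1)?[0];
    let opk = match has_opk {
        0x00 => None,
        0x01 => {
            let pk = take(&mut input, XWING_PUB_LEN)?.to_vec();
            Some((pk, take_u32(&mut input)?))
        }
        _ => return Err(InvalidData), // 0x02..=0xFF: non-canonical flag
    };
    if !input.is_empty() {
        return Err(InvalidData); // trailing bytes after the last field
    }
    Ok(PreKeyBundle { crypto_version, ik_pub, spk_pub, spk_id, spk_sig, opk })
}
```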

This encoding is not used in any AAD or signature (SPK_sig covers only the raw SPK_pub, not the bundle). It is provided as a reference for interoperable relay implementations. Two relays using different bundle encodings will not cause cryptographic failure — the fields are parsed individually, not as a blob — but a canonical encoding simplifies relay interop testing. **For federated relay-to-relay bundle exchange, this encoding SHOULD be adopted as the shared convention**: while the encoding is advisory for soliton clients (which parse individual fields), relays exchanging bundles in raw-blob form must agree on a representation. Two relays with incompatible bundle encodings produce parsing failures at relay ingestion without any cryptographic failure — the error is silent from the client's perspective. If a relay-level bundle exchange protocol does not independently negotiate encoding, it SHOULD adopt `encode_prekey_bundle` as normative for that exchange.

**SPK_sig is computed over the domain-separated SPK public key (raw concatenation is unambiguous):**

```
SPK_sig = HybridSign(IK_sk, "lo-spk-sig-v1" ‖ SPK_pub)
```

Raw concatenation, no length prefixes. This is safe because both components are fixed-size: the label is exactly 13 bytes and `SPK_pub` is exactly 1216 bytes, so no length prefixes are needed for unambiguous parsing. A reimplementer who adds length prefixes "for safety" produces different signed bytes and breaks all SPK signature verification.

`SPK_pub` is the verbatim 1216-byte output of `XWing.KeyGen()`: no clamping, masking, or normalization is applied to any component (X25519 or ML-KEM) between key generation and signing. The bytes signed and stored must be identical. Some X25519 libraries normalize the public key (clear bit 255, or apply RFC 7748 clamping to the scalar before computing the public point), producing a different 32-byte value than the raw keygen output. If a reimplementer signs the pre-normalization bytes but stores the post-normalization bytes (or vice versa), `HybridVerify` in §5.4 Step 1 silently fails.

The fixed 13-byte label `"lo-spk-sig-v1"` is a domain separator that prevents cross-context signature reuse: if the identity key is later used to sign other payloads (e.g., profile data or future protocol extensions), signatures from one context cannot be replayed in another.

The `SPK_id`, `crypto_version`, `OPK_pub`, and `OPK_id` are metadata that travel alongside the signed key, not part of the signed message. Omitting `crypto_version` from the signature is intentional: downgrade protection relies on the hard-fail version policy (§14.14), not on signature binding. Omitting `OPK_pub` and `OPK_id` is also intentional: OPKs are generated and signed independently, and their presence or absence in a bundle does not affect the authenticity of the SPK. `SPK_sig` is entirely independent of OPK data: Bob can add or remove OPKs from a bundle without invalidating the SPK signature, and a reimplementer who includes OPK bytes in the SPK signed message produces SPK signatures that fail verification on any bundle where the OPK differs.
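The signed message is simple enough to build as a fixed-size array, which turns the no-length-prefix rule into a compile-time property: the byte-string literal is typed as exactly 13 bytes, and the output is exactly 1229. `spk_signed_message` is a hypothetical helper; the `HybridSign` call that consumes its output is not shown.

```rust
pub const SPK_SIG_LABEL: &[u8; 13] = b"lo-spk-sig-v1";
pub const XWING_PUB_LEN: usize = 1216;
pub const SPK_SIGNED_MSG_LEN: usize = 13 + XWING_PUB_LEN; // 1229 bytes

/// Bytes covered by SPK_sig: label || SPK_pub, raw, with no length prefixes.
/// `spk_pub` must be the verbatim XWing.KeyGen() output (no clamping or
/// normalization between key generation, signing, and storage).
pub fn spk_signed_message(spk_pub: &[u8; XWING_PUB_LEN]) -> [u8; SPK_SIGNED_MSG_LEN] {
    let mut msg = [0u8; SPK_SIGNED_MSG_LEN];
    msg[..SPK_SIG_LABEL.len()].copy_from_slice(SPK_SIG_LABEL);
    msg[SPK_SIG_LABEL.len()..].copy_from_slice(spk_pub);
    msg
}
```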

**Rationale for label-only domain separation**: The signed message is a fixed label + raw key bytes, with no variable-length metadata. This keeps the signature verifiable without any metadata parsing ambiguity — the verifier has the label (a compile-time constant), the raw SPK_pub (from the bundle), and the raw IK_pub (from identity lookup). If `SPK_id` or `crypto_version` were included in the signed message, both signer and verifier would need to agree on an encoding format for those fields — an unnecessary source of interop bugs.

### 5.4 Session Initiation (Alice → Bob)

Alice wants to DM Bob. She has Bob's identity key (from community context or out-of-band) and fetches his pre-key bundle from his home relay.

#### Step 1: Verify Pre-Key Bundle

```
function VerifyPreKeyBundle(bundle, known_bob_ik):
    assert OPK fields are both present or both absent
    // Structural co-presence check fires FIRST, before any cryptographic operation.
    // Returns InvalidData (not BundleVerificationFailed) — tests format, not content.
    assert bundle.IK_pub == known_bob_ik
    assert bundle.crypto_version == "lo-crypto-v1"
    assert HybridVerify(bundle.IK_pub, "lo-spk-sig-v1" ‖ bundle.SPK_pub, bundle.SPK_sig)
    // Any assertion failure → abort, warn user
```

**`verify_bundle` error collapse (anti-oracle)**: All non-structural verification failures — `IK_pub` mismatch, `crypto_version` mismatch, `HybridVerify` failure — return `BundleVerificationFailed`, not distinct error codes. A `crypto_version` mismatch returns `BundleVerificationFailed`, not `UnsupportedVersion`. Returning `UnsupportedVersion` for a version mismatch or `VerificationFailed` for a signature failure would let an attacker iteratively probe bundles to determine which specific field failed without possessing the correct keys — each distinct error response narrows the search space. The structural OPK co-presence check returns `InvalidData` (not `BundleVerificationFailed`) because it fires before any cryptographic operation and tests only format, not content. See §5.5 Step 1 for the parallel error-collapse analysis at the recipient side.
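A sketch of the check ordering and error collapse in Rust, following the pseudocode's order. `verify_bundle_sketch`, `BundleFields`, and `BundleError` are hypothetical, and `hybrid_verify` is a placeholder closure taking `(public key, message, signature)`. The structural OPK check returns `InvalidData`; every content check returns the same `BundleVerificationFailed`, and only a passing bundle yields the `VerifiedBundle` newtype.

```rust
pub const CRYPTO_VERSION: &[u8] = b"lo-crypto-v1";
pub const SPK_SIG_LABEL: &[u8] = b"lo-spk-sig-v1";

#[derive(Debug, PartialEq, Eq)]
pub enum BundleError {
    /// Structural: OPK_pub and OPK_id not both present or both absent.
    InvalidData,
    /// Every content failure: IK mismatch, version mismatch, bad signature.
    BundleVerificationFailed,
}

/// Hypothetical borrowed view of a received bundle's fields.
pub struct BundleFields<'a> {
    pub ik_pub: &'a [u8],
    pub crypto_version: &'a [u8],
    pub spk_pub: &'a [u8],
    pub spk_sig: &'a [u8],
    pub opk_pub: Option<&'a [u8]>,
    pub opk_id: Option<u32>,
}

/// Newtype that only the verifier below constructs.
pub struct VerifiedBundle<'a>(BundleFields<'a>);

impl<'a> VerifiedBundle<'a> {
    pub fn spk_pub(&self) -> &'a [u8] {
        self.0.spk_pub
    }
}

pub fn verify_bundle_sketch<'a, F>(
    b: BundleFields<'a>,
    known_bob_ik: &[u8],
    hybrid_verify: F,
) -> Result<VerifiedBundle<'a>, BundleError>
where
    F: FnOnce(&[u8], &[u8], &[u8]) -> bool, // (public key, message, signature)
{
    // 1. Structural co-presence check, before any cryptographic operation.
    if b.opk_pub.is_some() != b.opk_id.is_some() {
        return Err(BundleError::InvalidData);
    }
    // 2. Content checks in pseudocode order; all share one error code.
    if b.ik_pub != known_bob_ik || b.crypto_version != CRYPTO_VERSION {
        return Err(BundleError::BundleVerificationFailed);
    }
    let mut msg = Vec::with_capacity(SPK_SIG_LABEL.len() + b.spk_pub.len());
    msg.extend_from_slice(SPK_SIG_LABEL);
    msg.extend_from_slice(b.spk_pub);
    if !hybrid_verify(b.ik_pub, &msg[..], b.spk_sig) {
        return Err(BundleError::BundleVerificationFailed);
    }
    Ok(VerifiedBundle(b))
}
```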

The type system enforces that `initiate_session` cannot be called with an unverified bundle; `verify_bundle` returns a `VerifiedBundle` newtype.

**`crypto_version` maximum length**: Parsers MUST reject any `crypto_version` field longer than 64 bytes with `InvalidLength` before performing the equality check. The 2-byte BE length prefix in `encode_prekey_bundle` can represent values up to 65,535 — a crafted bundle with a 65,535-byte version string consumes ~64 KiB before the equality check fires. Since `"lo-crypto-v1"` is 12 bytes, any field longer than 64 bytes is structurally impossible for a conforming version string, even accounting for hypothetical future versions. The CAPI enforces the broader `decode_session_init` input cap (64 KiB, §13.4), but a Rust reimplementer or binding author consuming the bundle fields individually MUST apply this length guard explicitly.
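The guard can be applied to the declared length in the 2-byte prefix, before the payload is read, copied, or compared. `read_crypto_version` is a hypothetical helper; returning `InvalidData` for a truncated field is an assumption, since the text specifies only the `InvalidLength` case.

```rust
pub const MAX_CRYPTO_VERSION_LEN: usize = 64;

#[derive(Debug, PartialEq, Eq)]
pub enum VersionFieldError {
    /// Declared length exceeds 64 bytes (rejected before reading the payload).
    InvalidLength,
    /// Input ends before the declared field does (assumed error mapping).
    InvalidData,
}

/// Reads a `len(x) || x` crypto_version field (2-byte big-endian length).
/// Returns the field and the remaining input.
pub fn read_crypto_version(input: &[u8]) -> Result<(&[u8], &[u8]), VersionFieldError> {
    if input.len() < 2 {
        return Err(VersionFieldError::InvalidData);
    }
    let declared = u16::from_be_bytes([input[0], input[1]]) as usize;
    // Guard on the declared length, before any copy or equality check.
    if declared > MAX_CRYPTO_VERSION_LEN {
        return Err(VersionFieldError::InvalidLength);
    }
    let rest = &input[2..];
    if rest.len() < declared {
        return Err(VersionFieldError::InvalidData);
    }
    Ok(rest.split_at(declared))
}
```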

#### Step 2: Generate Ephemeral Key

```
(EK_pub, EK_sk) = XWing.KeyGen()
```

The keypair MUST be freshly generated from the OS CSPRNG for each `initiate_session` call. Reusing EK across sessions causes both sessions to share the same initial `send_ratchet_sk` — if EK_sk is compromised (e.g., via a side-channel during one session), every session initiated with that EK is also compromised at the initial ratchet epoch.

This ephemeral key serves as **Alice's initial ratchet public key** in LO-Ratchet (§6). Bob will encapsulate to it when performing the first KEM ratchet step upon replying. **EK_sk must be preserved** through Steps 3-7 and passed to `RatchetState::init_alice` (§5.5 / §13.5) as the initial `send_ratchet_sk`. Discarding EK_sk after constructing the SessionInit — e.g., freeing or zeroizing it once EK_pub has been extracted for the SessionInit struct — leaves Alice without the decapsulation key for Bob's first KEM ratchet step; decapsulation of Bob's first response silently fails (wrong epoch key → `AeadFailed`). **EK_sk MUST NOT be used for any purpose other than this KEM decapsulation**: using it for additional DH operations, separate KEMs, or signing creates cross-context key reuse that voids the forward-secrecy guarantee for OPK-less sessions. EK_sk is single-purpose — it decapsulates Bob's first KEM ratchet ciphertext and is then zeroized.

#### Step 3: KEM Encapsulations

```
// Encapsulate to Bob's identity key (authentication + defense-in-depth)
(ct_ik,  ss_ik)  = XWing.Encaps(ExtractXWingPublic(Bob.IK_pub))

// Encapsulate to Bob's signed pre-key (session binding)
(ct_spk, ss_spk) = XWing.Encaps(Bob.SPK_pub)

// Encapsulate to Bob's one-time pre-key (enhanced forward secrecy)
if Bob.OPK_pub is available:
    (ct_opk, ss_opk) = XWing.Encaps(Bob.OPK_pub)
```

**Each `XWing.Encaps` call requires independent fresh randomness**: Each call draws its own encapsulation seed from the OS CSPRNG. In draft-connolly-cfrg-xwing-kem this seed is 64 bytes: 32 bytes of ML-KEM-768 encapsulation coins (FIPS 203 §7.2 requires uniformly random per-call coins) and a 32-byte ephemeral X25519 secret. Sharing or reusing the same entropy across two or three calls produces correlated ciphertexts that violate IND-CCA2 for those encapsulations. Decapsulation still succeeds and the session key derives normally, so there is no error diagnostic. The three calls are entirely independent invocations of `XWing.Encaps` and MUST each draw fresh entropy.

#### Step 4: Derive Session Key

```
if OPK was used:
    ikm = ss_ik || ss_spk || ss_opk    // 96 bytes (3 × 32-byte X-Wing shared secrets)
else:
    ikm = ss_ik || ss_spk               // 64 bytes (2 × 32-byte X-Wing shared secrets)

info = "lo-kex-v1"                            // raw 9-byte prefix (not length-prefixed)
    || len(crypto_version) || crypto_version // 2-byte BE length + 12 bytes ("lo-crypto-v1")
    || len(Alice.IK_pub) || Alice.IK_pub     // 2-byte BE length + 3200 bytes
    || len(Bob.IK_pub)   || Bob.IK_pub       // 2-byte BE length + 3200 bytes
    || len(EK_pub)       || EK_pub           // 2-byte BE length + 1216 bytes

session_key = HKDF(
    salt = 0x00 * 32,
    ikm  = ikm,
    info = info,
    len  = 64
)

root_key  = session_key[0..32]
epoch_key = session_key[32..64]
zeroize(session_key)           // 64-byte HKDF output holding both root_key and epoch_key;
                               // MUST be zeroized after the split. In Rust, wrapping it in
                               // Zeroizing<[u8; 64]> handles this automatically on drop.
                               // Non-RAII implementations (C, Go, Python) MUST explicitly
                               // zero this buffer after the split and before returning;
                               // otherwise 64 bytes of key material remain in memory.
```

**`session_key` must be zeroized after the split**: The 64-byte HKDF output is an intermediate value containing both `root_key` and `epoch_key`. After splitting, the original `session_key` buffer still holds both secrets in cleartext and must be explicitly zeroized. In Rust, wrapping the buffer in `Zeroizing<Vec<u8>>` covers this automatically at drop. In C, a manual `memset` + compiler barrier (or `explicit_bzero`) is required. In Go, `clear(sessionKey)` after the copy. Failing to zeroize leaves a 64-byte window containing both the root key and the epoch key — more sensitive than either half alone.

**`session_key` is derived with a single 64-byte HKDF call, then split positionally**: The `len = 64` HKDF call produces one 64-byte output; `root_key` and `epoch_key` are its first and second 32-byte halves respectively. This is NOT two separate HKDF invocations with distinct `info` labels: both halves come from the same Expand output. A reimplementer familiar with TLS 1.3's `Derive-Secret` (which calls `HKDF-Expand-Label` separately for each derived key, with a distinct label per key) must not apply that pattern here. Using two separate HKDF calls with `info = "root"` and `info = "epoch"` (or any labeled split) produces `root_key` and `epoch_key` values that differ from those of a conforming peer, so the first AEAD decryption between the two fails with `AeadFailed` and no diagnostic.
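A minimal sketch of the single-call derivation and positional split, assuming RFC 5869 HKDF instantiated with Python's standard `hmac` and `hashlib.sha3_256`. The helper names are hypothetical. Python cannot guarantee that no other copy of the key survives (immutable `bytes`, garbage collection), so the in-place overwrite of the `bytearray` is only a best-effort illustration of the zeroization step.

```python
import hashlib
import hmac


def hkdf_sha3_256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytearray:
    """RFC 5869 HKDF instantiated with HMAC-SHA3-256."""
    prk = hmac.new(salt, ikm, hashlib.sha3_256).digest()  # Extract
    okm = bytearray()
    block = b""
    counter = 1
    while len(okm) < length:  # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha3_256).digest()
        okm += block
        counter += 1
    del okm[length:]
    return okm


def derive_root_and_epoch(ikm: bytes, info: bytes):
    """One 64-byte HKDF call, split positionally, then a best-effort wipe."""
    session_key = hkdf_sha3_256(bytes(32), ikm, info, 64)
    root_key = bytes(session_key[0:32])
    epoch_key = bytes(session_key[32:64])
    session_key[:] = bytes(len(session_key))  # overwrite the 64-byte buffer in place
    return root_key, epoch_key
```

Because HKDF-Expand output is a prefix stream, `root_key` equals a 32-byte expansion with the same `info`, while a TLS-style call with `info = "root"` yields an unrelated value.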

**Why zero salt**: The IKM (`ss_ik || ss_spk [|| ss_opk]`) is already uniformly distributed, high-entropy material: each shared secret is the output of an X-Wing KEM, which by design produces pseudorandom bytes. HKDF's Extract step (HMAC keyed by the salt) condenses the IKM into a pseudorandom key; the salt contributes no entropy of its own, and when the IKM is already uniform a non-secret salt (like zeros) provides no additional benefit. A non-zero salt derived from session metadata would add complexity and a new parameter without a cryptographic gain. The zero salt is the value RFC 5869 §2.2 specifies when no salt is provided (a string of HashLen zeros).

**Zero salt is 32 explicit zero bytes**: The `salt = 0x00 * 32` is the RFC 5869 §2.2 default for HKDF-SHA3-256 (HashLen = 32). Because HMAC (RFC 2104) zero-pads any key shorter than the hash block size (136 bytes for SHA3-256), an empty salt and a 32-byte zero salt produce the same PRK in a conforming HMAC implementation. Passing the explicit 32-byte value avoids depending on how a library treats null or empty salts: a reimplementer whose HKDF library accepts a null or empty salt MUST confirm that it either passes the empty key to HMAC unchanged or substitutes the HashLen-zero default, rather than rejecting it or substituting some other value. Go's `golang.org/x/crypto/hkdf`, for example, substitutes HashLen zeros for a `nil` salt and passes an explicit `[]byte{}` to HMAC as-is, which pads it to the same key. Any library that deviates from this yields interop failure with no diagnostic.
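The following sketch checks how an empty salt and the 32-byte zero salt behave with Python's standard `hmac` module, which implements RFC 2104 key padding. The `hkdf_extract` helper is hypothetical. A library whose output differs between the first two cases is not performing plain HMAC key handling and is the situation the paragraph above warns about.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract with HMAC-SHA3-256: PRK = HMAC(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha3_256).digest()


ikm = bytes(range(64))
prk_explicit = hkdf_extract(bytes(32), ikm)   # salt = 0x00 * 32 (the spec's value)
prk_empty = hkdf_extract(b"", ikm)            # empty salt
prk_other = hkdf_extract(b"\x01" * 32, ikm)   # any non-zero salt

# HMAC zero-pads keys shorter than the SHA3-256 block size (136 bytes), so the
# empty salt and the 32-byte zero salt become the same padded HMAC key.
block_size = hashlib.sha3_256().block_size
```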

**IKM concatenation order is critical**: The order `ss_ik || ss_spk [|| ss_opk]` must be followed exactly — any reordering produces a different session key. Both parties derive IKM in the same order (Alice from encapsulation, Bob from decapsulation). **The 64-byte and 96-byte IKM variants are not interchangeable.** A reimplementer who zero-pads the absent `ss_opk` slot (passing 96 bytes with `ss_opk = 0x00{32}`) produces a different HKDF output than the specified 64-byte IKM — HKDF's Extract step processes the full input length, so `ss_ik || ss_spk` and `ss_ik || ss_spk || 0x00{32}` yield different PRKs. This manifests as `AeadFailed` at `decrypt_first_message` with no diagnostic.
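A minimal sketch showing both failure modes with fixed stand-in shared secrets (illustrative constants, not real KEM outputs). Unlike the salt, the IKM is the HMAC *message* in the Extract step, so trailing zero bytes are not absorbed by padding and change the result, and so does any reordering.

```python
import hashlib
import hmac


def hkdf_sha3_256(salt, ikm, info, length):
    prk = hmac.new(salt, ikm, hashlib.sha3_256).digest()  # Extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # Expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha3_256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Stand-in 32-byte X-Wing shared secrets (fixed values, for illustration only).
ss_ik, ss_spk = b"\x11" * 32, b"\x22" * 32
info = b"lo-kex-v1"
salt = bytes(32)

correct = hkdf_sha3_256(salt, ss_ik + ss_spk, info, 64)                  # 64-byte IKM, no OPK
zero_padded = hkdf_sha3_256(salt, ss_ik + ss_spk + bytes(32), info, 64)  # absent OPK as zeros
reordered = hkdf_sha3_256(salt, ss_spk + ss_ik, info, 64)                # swapped order
```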

**IKM order is a documentation-only guarantee — no type-level enforcement**: The `ss_ik ‖ ss_spk [‖ ss_opk]` concatenation order is specified above but not enforced at the type level. All three shared secrets have the same type (`xwing::SharedSecret`), so the encapsulation calls (or decapsulation calls on Bob's side) and the IKM concatenation can be reordered without a compile error. Any such reordering produces a different session key with no error at the HKDF step — the mismatch surfaces only as `AeadFailed` at `decrypt_first_message` with no diagnostic pointing to the ordering change. In the Rust implementation, the `session_agreement_with_opk` integration test is the only runtime guard against an order-breaking refactor. Any change to the encapsulation sequence in §5.4 Step 3, the decapsulation sequence in §5.5 Step 4, or the IKM concatenation in either step MUST verify that test still passes with matching keys on both sides.

**Shared secret zeroization after IKM construction**: After constructing `ikm` by concatenating `ss_ik`, `ss_spk`, and (optionally) `ss_opk`, each individual shared secret MUST be zeroized immediately. Copying a secret into a concatenation buffer creates an independent copy — the original remains on the heap (or stack) until explicitly zeroized. In Rust, `Zeroizing<Vec>` covers the concatenated buffer but `.extend_from_slice()` does not zeroize the source. In C, `memcpy` into `ikm` leaves the originals in their allocations. In Go, slice append does not zero the source. Forgetting this step leaks up to 96 bytes of shared secret material.
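The copy-then-wipe obligation can be sketched with mutable `bytearray` buffers; the function name is hypothetical. Python offers no hard guarantee that the interpreter holds no other copy, so this only illustrates the ordering of concatenation and in-place overwrite that a Rust, C or Go implementation must follow.

```python
from typing import Optional


def build_ikm(ss_ik: bytearray, ss_spk: bytearray, ss_opk: Optional[bytearray] = None) -> bytearray:
    """Concatenate shared secrets in the fixed order ss_ik || ss_spk [|| ss_opk],
    then overwrite each source buffer, since concatenation copies rather than moves."""
    parts = [ss_ik, ss_spk] if ss_opk is None else [ss_ik, ss_spk, ss_opk]
    for part in parts:
        if len(part) != 32:
            raise ValueError("X-Wing shared secrets are 32 bytes")
    ikm = bytearray()
    for part in parts:
        ikm += part  # copies the bytes; the source still holds the secret
    for part in parts:
        part[:] = bytes(len(part))  # in-place overwrite of the original buffer
    return ikm
```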

All `len()` values are 2-byte big-endian. The total `info` length is 7645 bytes (9 + 2 + 12 + 2 + 3200 + 2 + 3200 + 2 + 1216). The `crypto_version` field (added for cross-version domain separation) appears immediately after the raw prefix, before the identity keys.

**Why length prefixes are used on fixed-size identity keys in HKDF `info`**: Unlike §7.4 (AAD encoding, where fingerprints and public keys are written bare because their sizes are fixed by definition within a given `crypto_version`), the HKDF `info` field here applies `len(x) || x` encoding uniformly to all post-prefix fields. The rationale: (1) **unambiguous encoding**: length-prefixed encodings are prefix-free, so no valid `info` for one set of inputs is a proper prefix of a valid `info` for another set, and distinct inputs always map to distinct `info` strings, which HKDF's domain separation relies on; (2) **future version safety**: a `lo-crypto-v2` with different key sizes would change the field lengths, and the uniform encoding keeps `info` strings from different versions unambiguous by construction rather than relying on the size differences themselves; (3) **consistency**: `crypto_version` is genuinely variable-length, so all fields use the same encoding rule for simplicity. A reimplementer who omits the length prefixes from the three fixed-size fields (treating them as optional "since the size is known") drops 6 bytes (three 2-byte prefixes) from `info`, shifting every following byte and producing a completely different session key with no diagnostic.
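A minimal sketch of the `info` construction from Step 4, with dummy keys of the specified sizes; `lp` and `build_kex_info` are hypothetical helper names. It also builds the incorrect variant that drops the prefixes from the fixed-size fields, for comparison.

```python
import struct

KEX_INFO_PREFIX = b"lo-kex-v1"  # raw, not length-prefixed


def lp(field: bytes) -> bytes:
    """2-byte big-endian length prefix followed by the field."""
    return struct.pack(">H", len(field)) + field


def build_kex_info(crypto_version: bytes, alice_ik_pub: bytes, bob_ik_pub: bytes, ek_pub: bytes) -> bytes:
    return (
        KEX_INFO_PREFIX
        + lp(crypto_version)
        + lp(alice_ik_pub)
        + lp(bob_ik_pub)
        + lp(ek_pub)
    )


alice_ik, bob_ik, ek = b"\x0a" * 3200, b"\x0b" * 3200, b"\x0c" * 1216
info = build_kex_info(b"lo-crypto-v1", alice_ik, bob_ik, ek)
# Incorrect: length prefixes omitted on the three fixed-size fields.
wrong = KEX_INFO_PREFIX + lp(b"lo-crypto-v1") + alice_ik + bob_ik + ek
```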

**Why IK is both encapsulated to AND in `info`**: IK encapsulation contributes `ss_ik` to the IKM, meaning Bob's IK private key is required to derive the session key. Binding both identity keys into HKDF `info` provides mutual authentication — substituting either key yields a different session key. See §5.6 for security analysis.

**Why `info` includes EK_pub**: Alice's ephemeral key `EK_pub` contributes no shared secret to the IKM (it is a KEM public key, not a DH key — shared secrets come from encapsulating *to* Bob's keys). Including `EK_pub` in `info` binds the session key to the specific ephemeral key Alice published. Without this binding, an active attacker could substitute a different `sender_ek` in the `SessionInit` while keeping the KEM ciphertexts intact — Bob's decapsulations would still succeed (the ciphertexts are bound to Bob's keys, not Alice's EK), but Bob's first KEM ratchet encapsulation (§6.4) would target the attacker's key instead of Alice's. With `EK_pub` in `info`, substituting `sender_ek` changes the HKDF output, causing `decrypt_first_message` to fail at AEAD.

**Why `info` excludes SPK, OPK, and ciphertexts**: SPK and OPK binding flows through the IKM path — only the holder of `sk_SPK` can produce `ss_spk`, and only the holder of `sk_OPK` can produce `ss_opk`. Including SPK/OPK public keys or ciphertexts in `info` would be redundant. For formal models: SPK/OPK binding is an IKM-path property (KEM correctness), not an info-path property (HKDF domain separation).

#### Step 5: Construct Session Init

```
SessionInit = {
    crypto_version:           "lo-crypto-v1"
    sender_ik_fingerprint:    SHA3-256(Alice.IK_pub) [32 bytes raw]
    recipient_ik_fingerprint: SHA3-256(Bob.IK_pub)   [32 bytes raw]
    sender_ek:                EK_pub [1216 bytes]
    ct_ik:                    X-Wing ciphertext [1120 bytes]
    ct_spk:                   X-Wing ciphertext [1120 bytes]
    spk_id:                   uint32
    ct_opk:                   X-Wing ciphertext [1120 bytes, optional]
    opk_id:                   uint32 [optional]
}
```

**Encoded size**: The `encode_session_init` output is 3,543 bytes (no OPK) or 4,669 bytes (with OPK). The OPK block adds exactly 1,126 bytes when present: 2 bytes (BE length prefix for `ct_opk`) + 1,120 bytes (`ct_opk`) + 4 bytes (`opk_id` as u32 BE). The 1-byte `has_opk` flag is always encoded (as part of the 3,543-byte base) — it is not part of the 1,126-byte increment. See §7.4 and Appendix C for the full field-by-field breakdown.

`sender_ik_fingerprint` lets Bob look up Alice's full identity key. Full IK not sent (bandwidth); Bob resolves from community context or prior knowledge.

`recipient_ik_fingerprint` names the intended recipient explicitly inside the signed payload. Since `SessionInit` is signed by Alice in Step 6, Bob can derive recipient binding from `sender_sig` alone — without reasoning about the KEM ciphertexts' implicit binding to Bob's keys. This simplifies formal verification: a Tamarin or ProVerif model can prove recipient binding as a direct property of the signature, rather than as a consequence of KEM decapsulability.

#### Step 6: Sign Session Init

Alice proves to Bob that she initiated this session (and possesses sk_IK_A):

```
session_init_bytes = encode_session_init(SessionInit)
sender_sig = HybridSign(Alice.IK_sk, "lo-kex-init-sig-v1" ‖ session_init_bytes)
```

Raw concatenation — no length prefixes (see Appendix A). `sender_sig` is a 3373-byte hybrid signature (Ed25519 64 bytes + ML-DSA-65 3309 bytes). The total signed message is `"lo-kex-init-sig-v1" (18 bytes) || session_init_bytes (3543 or 4669 bytes)` = **3561 bytes** (no OPK) or **4687 bytes** (with OPK). The label prefix is not length-delimited — it abuts `session_init_bytes` directly. `sender_sig` is transmitted alongside `SessionInit` and the first-message payload. Bob verifies it in §5.5 Step 3 before performing any KEM operations. The domain separator `"lo-kex-init-sig-v1"` prevents replay into any other signature context (§3.3). **Canonical wire order**: the three components are assembled as `session_init_bytes ‖ sender_sig ‖ encrypted_payload` — this order is defined and elaborated at the receiving side (§5.5 Step 3), which is where a receiver parses the three components. Alice MUST produce this order; Bob verifies in this order.
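The size arithmetic and canonical order above can be checked with a short sketch. The constants restate figures from this section, and `signed_message` / `assemble_first_message` are hypothetical helpers rather than soliton APIs.

```python
SIG_LABEL = b"lo-kex-init-sig-v1"               # 18 bytes, raw (no length prefix)
HYBRID_SIG_LEN = 64 + 3309                       # Ed25519 + ML-DSA-65 = 3373
SI_LEN_NO_OPK = 3543
OPK_BLOCK_LEN = 2 + 1120 + 4                     # ct_opk prefix + ct_opk + opk_id = 1126
SI_LEN_WITH_OPK = SI_LEN_NO_OPK + OPK_BLOCK_LEN  # 4669


def signed_message(session_init_bytes: bytes) -> bytes:
    """The byte string passed to HybridSign / HybridVerify."""
    return SIG_LABEL + session_init_bytes


def assemble_first_message(session_init_bytes: bytes, sender_sig: bytes, encrypted_payload: bytes) -> bytes:
    """Canonical wire order: session_init_bytes || sender_sig || encrypted_payload."""
    if len(session_init_bytes) not in (SI_LEN_NO_OPK, SI_LEN_WITH_OPK):
        raise ValueError("unexpected session init size")
    if len(sender_sig) != HYBRID_SIG_LEN:
        raise ValueError("sender_sig must be 3373 bytes")
    return session_init_bytes + sender_sig + encrypted_payload
```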

#### Step 7: Encrypt First Message

```
msg_key = KDF_MsgKey(epoch_key, 0)    // Counter 0 for the first message

nonce = random_bytes(24)     // Random for first message (defense in depth)

// session_init_bytes computed in Step 6 above
aad = "lo-dm-v1"             // 8 bytes
   || Alice.fingerprint_raw (32 bytes)
   || Bob.fingerprint_raw (32 bytes)
   || session_init_bytes

ciphertext = AEAD(
    key   = msg_key,
    nonce = nonce,
    plaintext = message_content,
    aad   = aad
)

// Zeroize msg_key immediately after use — secret material.
zeroize(msg_key)

encrypted_payload = nonce || ciphertext   // nonce prepended for decryption
```

**The AAD binds the full session init structure.** The `session_init_bytes` used here is the same encoding produced in Step 6 — it is computed once and reused verbatim, not re-encoded. Tampering with any session init field (ct_ik, ct_spk, sender_ek, spk_id, etc.) invalidates the AEAD tag. See §7.4 for the deterministic encoding.

**No length prefixes in AAD fields**: The `"lo-dm-v1"` AAD is raw concatenation, `"lo-dm-v1" || sender_fp || recipient_fp || session_init_bytes`, with no BE length prefixes separating the fields. This contrasts with the HKDF `info` construction in §5.4 Step 4, where each field is length-prefixed to prevent cross-context collisions. In the AAD, collisions are structurally impossible: `sender_fp` and `recipient_fp` are both fixed-width (32 bytes each), and `session_init_bytes` is the remainder. A reimplementer who applies the HKDF `info` length-prefix rule to the AAD produces different bytes and will see AEAD authentication failure on every first message.

**`build_first_message_aad` / `build_first_message_aad_from_encoded` rejects empty `si_encoded` with `InvalidData`**: An empty `session_init_bytes` input is rejected because an empty AAD suffix would strip the per-session binding — the AAD would degenerate to `"lo-dm-v1" || sender_fp || recipient_fp` with no session-init bytes, identical to what any first-message AAD would look like for the same two parties. `encode_session_init` never produces empty output (minimum 3,543 bytes), so this guard fires only on caller bugs — but it is a normative invariant of the AAD construction and MUST be enforced by reimplementers. `InvalidData` is correct (not `InvalidLength`) — an empty `si_encoded` is a structural protocol violation (a legitimate SessionInit has a minimum encoded size), not a buffer-size mismatch.
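A minimal sketch of the first-message AAD builder with the empty-input guard. Raising `ValueError` for a wrong-size fingerprint is an assumption of this sketch; the section specifies only the `InvalidData` rejection of an empty `si_encoded`.

```python
DM_AAD_LABEL = b"lo-dm-v1"  # 8 bytes, raw


class InvalidData(Exception):
    pass


def build_first_message_aad(sender_fp: bytes, recipient_fp: bytes, si_encoded: bytes) -> bytes:
    """Raw concatenation: label || sender_fp || recipient_fp || session_init_bytes.

    No length prefixes: both fingerprints are fixed at 32 bytes and the
    session-init encoding is simply the remainder.
    """
    if len(sender_fp) != 32 or len(recipient_fp) != 32:
        raise ValueError("fingerprints are 32-byte SHA3-256 digests")  # sketch assumption
    if not si_encoded:
        raise InvalidData("empty session init would drop the per-session binding")
    return DM_AAD_LABEL + sender_fp + recipient_fp + si_encoded
```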

**Why random nonce for the first message**: The message key is unique (derived from a unique epoch key and counter), so a counter-based nonce would be safe. A random nonce adds defense in depth: if a bug ever causes key reuse, an independent random 24-byte XChaCha20 nonce makes a nonce collision negligibly likely, avoiding the catastrophic AEAD nonce-reuse failure mode.

**First-message `msg_key` zeroization**: `msg_key` is secret key material — it MUST be zeroized immediately after AEAD encryption completes. In Rust, wrapping the output of `KDF_MsgKey` in `Zeroizing` handles this automatically via `Drop`. In C/Go/Python, the caller must explicitly zeroize the key buffer after use. The same obligation applies to Bob's first-message decryption path (§5.5 Step 6).

**Epoch key passthrough**: The `epoch_key` (session-derived) becomes the epoch key passed to `RatchetState::init_alice`. Unlike the previous chain-ratchet design, the epoch key is not advanced by the first-message encryption — it is passed through unchanged. `send_count` starts at 1 so that counter 0 is not reused by the ratchet (the first message consumed counter 0 with a random nonce).

**Name aliases**: this value appears under three names across the spec, Rust API, and CAPI — `epoch_key` (this section), `initial_chain_key` (CAPI field name in `SolitonInitiatedSession` and Rust `take_initial_chain_key()` method name — both use the same `initial_chain_key` base name), and `ratchet_init_key` (CAPI first-message return). See §13.5 for the full list.

**Counter 0 namespace partition**: The `send_count = 1` initialization is the sole mechanism preventing counter collision between the first-message path (`encrypt_first_message` at counter 0 with a random nonce) and the ratchet path (`encrypt()` starting at counter 1 with a counter-based nonce). Both paths derive message keys from the same epoch key via `KDF_MsgKey(epoch_key, counter)` — if a reimplementer initializes `send_count = 0`, the first `encrypt()` call produces the same `msg_key` as `encrypt_first_message`, with different nonces (counter-based vs. random) but identical AEAD keys. No runtime guard prevents this — the protection is purely structural (initialization value).
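The counter partition reduces to one initialization value, sketched below as a toy model of the sender's counter bookkeeping within the initial epoch. The class is hypothetical and omits key derivation and nonce construction entirely.

```python
FIRST_MESSAGE_COUNTER = 0  # consumed by encrypt_first_message (random nonce)


class SendCounter:
    """Toy model of the sending side's counter bookkeeping in the initial epoch."""

    def __init__(self) -> None:
        # Start at 1: counter 0 under this epoch key already produced the
        # first-message msg_key. Starting at 0 would re-derive that same key.
        self.send_count = 1

    def next(self) -> int:
        counter = self.send_count
        self.send_count += 1
        return counter
```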

### 5.5 Session Reception (Bob)

Bob receives the session init (real-time or from offline queue).

#### Step 1: Resolve Alice's Identity

Bob uses `sender_ik_fingerprint` to look up Alice's full identity key from local cache, community server context, or prior knowledge. If unknown, Bob's client SHOULD indicate this is an unverified first contact.

**This lookup is the sole identity binding.** The library verifies that `alice_ik_pk` is self-consistent with the session (fingerprint matches `sender_ik_fingerprint`, signature is valid under that key), but it cannot verify that `alice_ik_pk` actually belongs to the human "Alice." If the caller supplies the wrong key — or an attacker's key — signature verification succeeds (the attacker signed the SessionInit with the corresponding private key), and the session is authenticated to the attacker, not Alice. The caller's key-lookup code is the only thing standing between "authenticated session with Alice" and "authenticated session with an adversary." See Appendix E, Caller Obligation 1.

**TOFU key pinning obligation**: On first contact (no prior key record for this peer), the caller MUST record the association between the peer's identity in the application (for example, an account or contact record) and `alice_ik_pk` (equivalently, its fingerprint `sender_ik_fingerprint`) immediately after a successful `receive_session`. On subsequent contacts claiming the same peer identity, the caller MUST verify that the presented key matches the previously recorded one; a different key (and therefore a different fingerprint) for the same peer MUST trigger a key-change warning. Because `receive_session` already requires `SHA3-256(alice_ik_pk) == sender_ik_fingerprint`, a different key can never appear under the same fingerprint, so the pin that matters links the peer's identity to the fingerprint. A caller who fails to pin the key after first contact and fails to verify on subsequent contacts accepts TOFU impersonation silently: an attacker controlling the relay can substitute a different key pair on every session and the library will accept each substitution as valid. Key pinning is the caller's responsibility; the library provides the fingerprint but does not maintain a key store.
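A minimal in-memory pin store illustrates the caller-side obligation. It is keyed by a hypothetical `peer_id` (for example, an account handle) and stores the SHA3-256 fingerprint of the identity key. `PinStore` and `KeyChangeWarning` are hypothetical names, and a real client would persist the pins and run `check_and_pin` only after `receive_session` succeeds.

```python
import hashlib
import hmac


class KeyChangeWarning(Exception):
    """Raised when a known peer presents a different identity key."""


class PinStore:
    def __init__(self) -> None:
        self._pins = {}  # peer_id -> SHA3-256 fingerprint of the pinned IK

    def check_and_pin(self, peer_id: str, ik_pub: bytes) -> bool:
        """Return True on first contact (pin recorded), False if the pin matches."""
        fingerprint = hashlib.sha3_256(ik_pub).digest()
        pinned = self._pins.get(peer_id)
        if pinned is None:
            self._pins[peer_id] = fingerprint
            return True
        if not hmac.compare_digest(pinned, fingerprint):
            raise KeyChangeWarning(peer_id)
        return False
```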

Bob also validates:
- `crypto_version == "lo-crypto-v1"`
- `SHA3-256(Alice.IK_pub) == si.sender_ik_fingerprint`  // MUST be constant-time — see below
- `SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint` // MUST be constant-time — see below

**Fingerprint comparisons MUST be constant-time**: Both `SHA3-256(Alice.IK_pub) == si.sender_ik_fingerprint` and `SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint` compare 32-byte digests and must use constant-time equality. These checks precede `HybridVerify` (Step 3). A variable-time comparison here lets an attacker recover the expected fingerprint byte by byte by submitting crafted session inits and timing the comparison, without paying the `HybridVerify` cost (~2 ms) on any probe. Each byte takes at most 256 guesses (each repeated enough times to average out timing noise), so the full 32-byte value falls within at most 32 × 256 = 8,192 guesses. This allows targeted construction of sessions that pass the fingerprint check while carrying a fraudulent public key. Appendix E's constant-time table documents fingerprint comparison in §6.12 (ratchet); this requirement applies equally in the KEX context. soliton uses `subtle::ConstantTimeEq` for both comparisons.
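A minimal sketch of both checks using `hmac.compare_digest`, which Python documents as avoiding content-based short-circuiting. The function name is hypothetical, and Python's timing guarantees are weaker than those of `subtle::ConstantTimeEq`, so this illustrates the structure rather than a hardened implementation.

```python
import hashlib
import hmac


class InvalidData(Exception):
    pass


def check_session_init_fingerprints(alice_ik_pub: bytes, bob_ik_pub: bytes,
                                    sender_fp: bytes, recipient_fp: bytes) -> None:
    # Evaluate both comparisons before branching so neither result
    # short-circuits the other; each comparison avoids early exit on content.
    sender_ok = hmac.compare_digest(hashlib.sha3_256(alice_ik_pub).digest(), sender_fp)
    recipient_ok = hmac.compare_digest(hashlib.sha3_256(bob_ik_pub).digest(), recipient_fp)
    if not (sender_ok and recipient_ok):
        raise InvalidData("fingerprint mismatch")
```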

**Why `receive_session` does not need oracle-collapse**: `receive_session` does NOT collapse errors to a generic failure. This is intentional and safe: the values being checked (`crypto_version` (cleartext), `sender_ik_fingerprint` (cleartext, transmitted in the SessionInit), and `recipient_ik_fingerprint` (cleartext, the receiver's own identity)) are all known to the sender who constructed the SessionInit. Distinct error codes for these checks reveal nothing the attacker did not already supply or know. This contrasts with `verify_bundle` (which collapses to prevent bundle-content enumeration) and LO-Auth (which collapses to prevent authentication-step enumeration). The comparisons still MUST be constant-time (Appendix E) to prevent reconstruction of the *receiver's stored* fingerprint value, but the error codes themselves need not be collapsed.

**Error collapsing in `verify_bundle` vs `receive_session`**: The `verify_bundle` function (§5.3) collapses all non-structural failures (crypto version mismatch, fingerprint mismatch, signature verification failure) to the single `BundleVerificationFailed` error. Returning distinct errors would create an enumeration oracle — an attacker could iteratively probe which validation step failed, revealing information about the bundle contents. **Exception**: the OPK structural co-presence check (`opk_pub` and `opk_id` must be both present or both absent) returns `InvalidData`, not `BundleVerificationFailed`. This check runs before the IK comparison and signature verification — it is a pre-cryptographic structural validation, not a security-sensitive check that requires collapse. Callers pattern-matching on `verify_bundle` errors must handle both `BundleVerificationFailed` and `InvalidData`. `receive_session` does NOT collapse errors — it returns `UnsupportedCryptoVersion` for a bad crypto version and `InvalidData` for fingerprint mismatches. No pre-key bundle is involved in receive_session, so the bundle-level collapse does not apply; the SessionInit fields being checked (crypto version, fingerprints) are already visible to the sender who constructed them.

#### Step 2: Validate OPK Co-Presence

OPK fields must both be present or both be absent. Failure → abort with `InvalidData`. This is a structural validation on the parsed SessionInit, not a cryptographic operation — executing it before signature verification avoids unnecessary signature/KEM work on malformed messages.

**Two distinct co-presence checks — only one is pre-signature**: Step 2 validates the *structural* co-presence of `ct_opk` and `opk_id` within the decoded SessionInit (both fields present or both absent). This check is pre-signature. A separate *caller* co-presence check — whether the caller supplied `opk_sk` if and only if `ct_opk` is present — fires at Step 4, after `HybridVerify`. The caller check MUST be post-signature: moving it to Step 2 would create an OPK-presence oracle (an attacker could distinguish "OPK present, no opk_sk provided" from "OPK absent" before signature verification, probing the receiver's key state without completing authentication). A reimplementer who consolidates both checks at Step 2 enables this oracle. The §5.5 Step 4 note documents the post-signature placement rationale.

**The OPK co-presence check is enforced inside `encode_session_init`**: In the reference implementation, this check fires from within `encode_session_init` (called at Step 3 to reconstruct the signed message bytes), not as an independent pre-step. A reimplementer who constructs the signed message manually — by concatenating SessionInit fields directly without calling `encode_session_init` — bypasses this guard entirely and reaches KEM operations (Step 4) with a structurally invalid SessionInit. Any reimplementation MUST perform the OPK co-presence check explicitly before `HybridVerify` if `encode_session_init` is not used to reconstruct the signed bytes.
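A reimplementation that does not route through `encode_session_init` can perform the structural check explicitly, as in this sketch. The function name is hypothetical, and `None` stands for an absent field.

```python
class InvalidData(Exception):
    pass


def check_opk_copresence(ct_opk, opk_id) -> bool:
    """Pre-signature structural check on the decoded SessionInit.

    Returns True if the OPK fields are present, False if both are absent.
    """
    has_ct = ct_opk is not None
    has_id = opk_id is not None
    if has_ct != has_id:
        raise InvalidData("ct_opk and opk_id must be both present or both absent")
    return has_ct
```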

#### Step 3: Verify Initiator Signature

```
session_init_bytes = encode_session_init(received_session_init)
HybridVerify(Alice.IK_pub, "lo-kex-init-sig-v1" ‖ session_init_bytes, sender_sig)
```

**`session_init_bytes` MUST be reconstructed by calling `encode_session_init`, not extracted from the wire**: The signed message `"lo-kex-init-sig-v1" ‖ session_init_bytes` uses the canonical encoding of the parsed SessionInit struct, i.e. the output of `encode_session_init`. Although the wire format is `session_init_bytes || sender_sig || encrypted_payload` (§5.4 Step 6), the verifier does not pass the received bytes to `HybridVerify` directly; it parses the SessionInit and re-encodes it. A reimplementer who instead verifies against a slice of the wire buffer (e.g., `wire[0..3543]`) must ensure that the slice is byte-for-byte identical to `encode_session_init` applied to the parsed struct. Any normalization of SessionInit fields during parsing (key clamping, padding removal, normalization of the X25519 component) that changes the bytes on re-encoding causes `VerificationFailed` on an authentic session init.

Verification failure → abort with `VerificationFailed`. This provides cryptographic proof that the session was initiated by the holder of sk_IK_A, preventing zero-knowledge impersonation: an adversary who knows only pk_IK_A cannot produce a valid `sender_sig` without sk_IK_A. This step executes before any KEM operations; a forged or absent signature is rejected immediately, not silently.

**Verifier bytes obligation**: `HybridVerify` must receive the raw bytes of `sender_ek` exactly as stored and transmitted — no normalization, clamping, or bit-masking applied to any sub-key component. If the verifier's X25519 library normalizes public keys on import (e.g., clears bit 255 of byte 31 — see §8.1), the verified bytes differ from the signed bytes and `VerificationFailed` results even with an authentic session init. The same obligation applies to SPK verification in §5.3. See §8.1 for the X25519 masking hazard that most commonly triggers this.

**Validation ordering rationale**: The pre-signature checks (crypto version, sender fingerprint and recipient fingerprint in Step 1; OPK co-presence in Step 2) are cheaper than `HybridVerify`, which performs both Ed25519 and ML-DSA-65 verification. Running them first avoids two signature verifications on messages that would fail a trivial structural check. A reimplementer who moves signature verification ahead of these checks wastes CPU on forged messages and gains no security benefit: all four checks are required before proceeding to KEM operations regardless of order.

`sender_sig` is transmitted alongside the session init and first-message payload; it is not part of the SessionInit struct and is not included in the AAD (it covers the encoded SessionInit).

**Canonical wire order**: The three components are assembled as `session_init_bytes || sender_sig || encrypted_payload`: session init first (size determined by `has_opk`), then the signature (fixed 3373 bytes), then the encrypted payload (variable length). The first two have deterministic sizes: `session_init_bytes` is 3543 bytes without OPK or 4669 bytes with OPK (§7.4, Appendix F.13), and `sender_sig` is always 3373 bytes (§3.3). A receiver reads the 3543-byte fixed prefix, whose last byte (offset 3542) is the `has_opk` flag; the flag determines whether 1126 further bytes of OPK data follow. The receiver then reads exactly 3373 bytes of `sender_sig`, and the remainder is `encrypted_payload`. The CAPI returns these as separate fields; callers assembling wire messages MUST use this order.
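A framing-only sketch of the receiver's split follows. It assumes `has_opk` is encoded as `0x00`/`0x01`, which this section does not state. The split does not replace parsing: the SessionInit must still be decoded and re-encoded with `encode_session_init` before `HybridVerify`. `split_first_message` is a hypothetical helper.

```python
SI_FIXED_LEN = 3543
HAS_OPK_OFFSET = SI_FIXED_LEN - 1   # 3542: last byte of the fixed prefix
OPK_BLOCK_LEN = 1126
SENDER_SIG_LEN = 3373


class InvalidData(Exception):
    pass


def split_first_message(wire: bytes):
    """Split session_init_bytes || sender_sig || encrypted_payload (framing only)."""
    if len(wire) < SI_FIXED_LEN:
        raise InvalidData("truncated session init")
    flag = wire[HAS_OPK_OFFSET]
    if flag == 0x00:    # assumed encoding: no OPK block follows
        si_len = SI_FIXED_LEN
    elif flag == 0x01:  # assumed encoding: 1126-byte OPK block follows
        si_len = SI_FIXED_LEN + OPK_BLOCK_LEN
    else:
        raise InvalidData("has_opk must be 0x00 or 0x01")
    if len(wire) < si_len + SENDER_SIG_LEN:
        raise InvalidData("truncated first message")
    session_init_bytes = wire[:si_len]
    sender_sig = wire[si_len:si_len + SENDER_SIG_LEN]
    encrypted_payload = wire[si_len + SENDER_SIG_LEN:]
    return session_init_bytes, sender_sig, encrypted_payload
```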

#### Step 4: Decapsulate

```
// Decapsulate IK ciphertext
ss_ik = XWing.Decaps(ExtractXWingPrivate(Bob.IK_sk), ct_ik)

// Decapsulate SPK ciphertext
ss_spk = XWing.Decaps(Bob.SPK_sk[spk_id], ct_spk)

// Decapsulate OPK ciphertext (if present)
if ct_opk is present:
    ss_opk = XWing.Decaps(Bob.OPK_sk[opk_id], ct_opk)
    // Delete OPK_sk[opk_id] immediately — single use
```

**Caller co-presence obligation**: The caller must provide `opk_sk` if and only if `ct_opk` is present in the SessionInit.

- **soliton does**: If `ct_opk` is present but `opk_sk` is not provided (e.g., the OPK was already consumed and deleted), `receive_session` returns `InvalidData` — but only after signature verification (Step 3), so this check cannot be used as an oracle.
- **What a broken reimplementation sees instead**: A reimplementer who silently skips OPK decapsulation when `opk_sk` is unavailable (omitting `ss_opk` from the IKM) does NOT get `InvalidData` at `receive_session` — `receive_session` succeeds. The session key diverges from Alice's, and the error surfaces only as `AeadFailed` at `decrypt_first_message` with no diagnostic pointing to the missing OPK decapsulation. The active guard (`InvalidData` on missing `opk_sk`) is the only mechanism that surfaces this condition as a clear error; omitting the guard silently accepts a broken session.

The converse is also `InvalidData`: if `ct_opk` is absent but `opk_sk` is provided, the session init contains no OPK ciphertext to decapsulate. Silently ignoring a surplus `opk_sk` would mask a caller error where the wrong OPK was retrieved.
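The ordering requirement can be sketched as a pipeline in which a forged SessionInit always fails at the signature step, whatever the receiver's OPK state. The `sig_ok` flag stands in for `HybridVerify`, and both function names are hypothetical.

```python
class InvalidData(Exception):
    pass


class VerificationFailed(Exception):
    pass


def check_caller_opk(ct_opk, opk_sk) -> None:
    """Caller co-presence: opk_sk supplied if and only if ct_opk is present."""
    if (ct_opk is None) != (opk_sk is None):
        raise InvalidData("opk_sk must be supplied exactly when ct_opk is present")


def receive_session_prefix(si: dict, sig_ok: bool, opk_sk) -> str:
    """Ordering sketch of receive_session up to decapsulation."""
    # Step 2: structural co-presence on the SessionInit itself (pre-signature).
    if (si.get("ct_opk") is None) != (si.get("opk_id") is None):
        raise InvalidData("ct_opk/opk_id co-presence")
    # Step 3: signature verification gates every caller-specific check.
    if not sig_ok:
        raise VerificationFailed()
    # Step 4: only now compare against the caller's key state.
    check_caller_opk(si.get("ct_opk"), opk_sk)
    return "decapsulate"
```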

**OPK deletion is a forward secrecy boundary.** While `sk_OPK` survives, compromising all three keys (`sk_IK` + `sk_SPK` + `sk_OPK`) recovers this session's key. After deletion, even compromising both `sk_IK` and `sk_SPK` no longer suffices: the OPK's contribution to the IKM (`ss_opk`) cannot be recomputed. "Immediately" means before the ratchet state is used for any messaging, not deferred to a background task or garbage collector. Any delay between decapsulation and deletion is a forward secrecy window in which the three-key compromise remains viable.

**The caller, not the library, performs the OPK deletion.** `receive_session` accepts `opk_sk` as a shared reference (`&xwing::SecretKey`); the library decapsulates but holds no handle to persistent storage and cannot remove the OPK key from the caller's keystore. The caller MUST delete the OPK from persistent storage at the call site, immediately after `receive_session` returns successfully, before passing the resulting ratchet state to any messaging function. A reimplementer who expects the library to delete the OPK automatically, or who defers deletion to a separate "cleanup" pass, retains the key beyond the intended forward-secrecy boundary.

**OPK deletion MUST be atomic with `receive_session` (single DB transaction)**: A server that completes `receive_session` and then crashes before deleting the OPK from storage will accept the same `session_init` again on restart — the OPK is still present, so the co-presence check passes. The second `receive_session` call succeeds and produces a second ratchet state from the same OPK decapsulation, violating the single-use guarantee. **The correct model**: execute `receive_session` and the OPK deletion as a single atomic database transaction. If `receive_session` succeeds, commit the transaction (which atomically deletes the OPK and persists the ratchet state). If the server crashes before the commit, the transaction rolls back and the OPK remains for the retried session init. If the server crashes after the commit, the OPK is deleted and the ratchet state is persisted — the session init is rejected on retry (`ct_opk` present but OPK deleted → `InvalidData` post-verification).
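
A minimal in-memory sketch of the commit ordering follows. `Store` and `receive_with_opk` are hypothetical. The closure stands in for `receive_session` and receives `None` once the OPK is gone, so that it can return `InvalidData` after signature verification. An in-memory map cannot model crash recovery. With a real database, the OPK delete and the ratchet-state insert belong inside one transaction that commits only after the closure returns `Ok`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    InvalidData,
    VerificationFailed,
}

/// Hypothetical keystore: OPK secret keys by ID plus persisted ratchet blobs.
#[derive(Default)]
pub struct Store {
    pub opks: HashMap<u32, Vec<u8>>,
    pub sessions: Vec<Vec<u8>>,
}

impl Store {
    /// `receive` stands in for `receive_session`. It gets `Some(opk_sk)` while
    /// the OPK is stored and `None` once it has been consumed.
    pub fn receive_with_opk<F>(&mut self, opk_id: u32, receive: F) -> Result<(), Error>
    where
        F: FnOnce(Option<&[u8]>) -> Result<Vec<u8>, Error>,
    {
        let opk_sk = self.opks.get(&opk_id).map(|sk| sk.as_slice());
        // Nothing is mutated before receive_session succeeds, so a failure
        // (or a crash at this point) leaves the OPK in place for a retry.
        let ratchet_blob = receive(opk_sk)?;
        // "Commit": delete the OPK and persist the ratchet state together.
        // With a real database these two writes share one transaction.
        self.opks.remove(&opk_id);
        self.sessions.push(ratchet_blob);
        Ok(())
    }
}
```

The three outcomes match the model above: a failed receive leaves the OPK available, a success removes it and persists state in one step, and a replay after commit is rejected.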

If `spk_id` does not match any retained SPK (all rotated out or invalid ID), the caller MUST reject the session init with `InvalidData` — but only after signature verification (Step 3). Checking `spk_id` before signature verification would create an SPK enumeration oracle: an attacker could probe which SPK IDs are retained without paying the signature cost. After signature verification confirms the session init is authentic, an unrecognized `spk_id` is safely rejected. Using the wrong SPK key instead of rejecting yields implicit rejection, producing a diverged session key and AEAD failure cryptographically indistinguishable from corruption. **Expired SPKs** (private key deleted after the 30-day retention window, §10.2) must be handled identically to unknown SPKs — return `InvalidData` post-signature-verification. Maintaining a separate "expired" vs "unknown" error would reintroduce the enumeration oracle that post-signature ordering is designed to prevent.
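
The ordering can be sketched as follows. The sketch assumes that `sig_valid` carries the result of the Step 3 HybridSign verification and that SPK IDs are `u32`; both are assumptions for illustration, and `select_spk` is a hypothetical name. Expired SPKs are deleted from the retained map, so they fall into the same `InvalidData` branch as IDs that never existed.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    VerificationFailed,
    InvalidData,
}

/// SPK lookup performed strictly after signature verification.
pub fn select_spk<'a>(
    sig_valid: bool,
    spk_id: u32,
    retained: &'a HashMap<u32, Vec<u8>>,
) -> Result<&'a [u8], Error> {
    // Step 3 first: an unauthenticated session init learns nothing about spk_id.
    if !sig_valid {
        return Err(Error::VerificationFailed);
    }
    // Unknown and expired IDs share one error; never fall back to another SPK.
    retained
        .get(&spk_id)
        .map(|sk| sk.as_slice())
        .ok_or(Error::InvalidData)
}
```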

X-Wing implicit rejection (§8.4) applies to all three decapsulations — `ct_ik`, `ct_spk`, and `ct_opk`. Invalid or tampered ciphertexts produce pseudorandom shared secrets rather than errors, and the derived session key diverges silently from Alice's. `decrypt_first_message` fails at AEAD with no indication of which decapsulation diverged. Reimplementers using ML-KEM libraries with explicit-rejection APIs (that return an error on invalid ciphertexts) MUST suppress those errors and use the implicit-rejection output — propagating `DecapsulationFailed` would leak which ciphertext was malformed.

If `opk_id` references an absent OPK (expired or already consumed), the same applies — the pseudorandom shared secret from implicit rejection causes AEAD failure, leaking no information about OPK validity.

**ML-KEM key format hazard**: The `ml-kem` crate (and soliton's X-Wing §8.5) stores ML-KEM-768 decapsulation keys in NTT-domain encoding — the 1152-byte `dk_PKE` field contains polynomials in Number Theoretic Transform representation, not the coefficient-domain encoding specified in FIPS 203 §7.3 `DecapsKeyGen`. Reimplementers sourcing ML-KEM keys from other libraries (liboqs, PQClean, BouncyCastle) that use FIPS 203's coefficient-domain format MUST convert to NTT-domain before using them with soliton's X-Wing decapsulation. Using the wrong domain produces a pseudorandom shared secret (implicit rejection), causing silent `AeadFailed` at `decrypt_first_message` with no diagnostic pointing to the format mismatch. See §8.5 for the full key layout.

**Diagnostic note — correct `spk_id` with wrong secret key**: An unrecognized `spk_id` is caught explicitly (rejected as `InvalidData` post-signature-verification). A *recognized* `spk_id` paired with the wrong secret key (e.g., a storage corruption that maps a valid ID to a different key) is not caught at this step — ML-KEM implicit rejection produces a pseudorandom `ss_spk`, `receive_session` returns success, and the error surfaces only when `decrypt_first_message` fails with `AeadFailed`. No diagnostic distinguishes this from ciphertext tampering, transport corruption, or any other decapsulation divergence. This is the hardest SPK storage bug to diagnose. Implementations that maintain an `spk_id → sk` mapping SHOULD verify the mapping's integrity independently (e.g., by storing a fingerprint of the public key alongside the private key and checking it before decapsulation).

**Diagnostic note — correct `opk_id` with wrong secret key**: The same applies to OPK: a recognized `opk_id` paired with the wrong `opk_sk` (storage corruption mapping a valid OPK ID to different key material) produces a pseudorandom `ss_opk` via implicit rejection, `receive_session` returns success, and the error surfaces only as `AeadFailed` at `decrypt_first_message`. Unlike the SPK case, OPK keys are single-use and deleted immediately after decapsulation (§5.5 Step 4), so long-term storage corruption is less likely — but the failure mode is identical. Implementations SHOULD store an OPK public key fingerprint alongside the OPK secret key and verify it before decapsulation, the same as for SPK.
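
A minimal sketch of the suggested integrity check, assuming a caller-side record that stores a 32-byte public-key fingerprint next to each SPK or OPK secret key. `derive_pk` and `fingerprint` are caller-supplied stand-ins (for example, X-Wing public-key derivation and SHA3-256). `StoredPreKey`, `checked_sk`, and the `KeyStoreCorrupt` error are hypothetical and not part of soliton.

```rust
/// Caller-side error for this sketch (not a soliton error code).
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    KeyStoreCorrupt,
}

/// Hypothetical pre-key record: secret key plus the fingerprint of the
/// matching public key, recorded when the keypair was generated.
pub struct StoredPreKey {
    pub sk: Vec<u8>,
    pub pk_fingerprint: [u8; 32],
}

/// Re-derives the public key and compares fingerprints before decapsulation.
pub fn checked_sk<'a>(
    rec: &'a StoredPreKey,
    derive_pk: impl Fn(&[u8]) -> Vec<u8>,
    fingerprint: impl Fn(&[u8]) -> [u8; 32],
) -> Result<&'a [u8], Error> {
    let pk = derive_pk(rec.sk.as_slice());
    if fingerprint(pk.as_slice()) == rec.pk_fingerprint {
        Ok(rec.sk.as_slice())
    } else {
        // Fail loudly here instead of letting implicit rejection surface
        // later as an undiagnosable AeadFailed.
        Err(Error::KeyStoreCorrupt)
    }
}
```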

#### Step 5: Derive Session Key

Identical HKDF as Alice (§5.4 Step 4), using:
- `ikm`: `ss_ik || ss_spk [|| ss_opk]`
- `info`: Alice's IK_pub, Bob's IK_pub, Alice's EK_pub (from session init)

Produces identical `root_key` and `epoch_key`.

**IKM zeroization obligation (identical to §5.4 Step 4)**: After HKDF output is split into `root_key` and `epoch_key`, zeroize the IKM (`ss_ik || ss_spk [|| ss_opk]`) and each component shared secret (`ss_ik`, `ss_spk`, `ss_opk`). These are uniformly distributed 32-byte KEM shared secrets — leaving them in memory after use enables an attacker with post-compromise memory access to recover `root_key` and `epoch_key`. See §5.4 Step 4 for the full zeroization rationale and the note on IKM concatenation buffer zeroization (the concatenated buffer holds copies of all shared secrets and must be zeroized independently of the individual components).
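
The Rust core relies on `Zeroizing` wrappers. The std-only sketch below makes the same obligation explicit, including the separate wipe of the concatenated IKM buffer. `wipe` and `derive_and_wipe` are hypothetical helpers, and `kdf` stands in for the HKDF call. Pre-sizing the buffer to 96 bytes avoids a reallocation that would leave an unwiped copy of the secrets in freed memory.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Volatile zero-fill, so the writes cannot be removed as dead stores.
pub fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to one byte.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Builds IKM = ss_ik || ss_spk [|| ss_opk], passes it to `kdf`, then wipes
/// the concatenation and every component shared secret.
pub fn derive_and_wipe<R>(
    ss_ik: &mut [u8; 32],
    ss_spk: &mut [u8; 32],
    ss_opk: Option<&mut [u8; 32]>,
    kdf: impl FnOnce(&[u8]) -> R,
) -> R {
    // Largest possible IKM reserved up front: the Vec never reallocates.
    let mut ikm = Vec::with_capacity(96);
    ikm.extend_from_slice(&ss_ik[..]);
    ikm.extend_from_slice(&ss_spk[..]);
    if let Some(opk) = ss_opk.as_deref() {
        ikm.extend_from_slice(&opk[..]);
    }
    let out = kdf(ikm.as_slice());
    wipe(ikm.as_mut_slice()); // the concatenated copy is wiped independently
    wipe(&mut ss_ik[..]);
    wipe(&mut ss_spk[..]);
    if let Some(opk) = ss_opk {
        wipe(&mut opk[..]);
    }
    out
}
```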

**Initiator-first ordering, not local-first**: Alice's identity key precedes Bob's in the HKDF `info` on both sides — Alice uses `Alice.IK_pub || Bob.IK_pub` and Bob also uses `Alice.IK_pub || Bob.IK_pub`. The ordering is determined by the initiator/responder role, not by which party is doing the computation. A reimplementer who reads "identical HKDF as Alice" as "local key first, remote key second" would swap the order on Bob's side, producing a different session key — both parties succeed at their own computation with no error; the mismatch surfaces only as `AeadFailed` at `decrypt_first_message`.
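
The role-based ordering can be sketched as below. This is an illustration only: it assumes plain concatenation of `Alice.IK_pub || Bob.IK_pub || Alice.EK_pub`, whereas the actual `build_kex_info` layout (including any label or length prefixes) is the one defined in §5.4 Step 4. `Role` and `kex_info` are hypothetical names.

```rust
/// Protocol role, fixed for the lifetime of the session.
#[derive(Clone, Copy)]
pub enum Role {
    Initiator,
    Responder,
}

/// Orders the identity keys by role (initiator first), never local-first.
pub fn kex_info(role: Role, local_ik: &[u8], remote_ik: &[u8], alice_ek: &[u8]) -> Vec<u8> {
    let (alice_ik, bob_ik) = match role {
        Role::Initiator => (local_ik, remote_ik),
        Role::Responder => (remote_ik, local_ik),
    };
    [alice_ik, bob_ik, alice_ek].concat()
}
```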

**`sender_ek` (Alice's EK_pub) in HKDF `info` MUST be the raw bytes from the received session init — no normalization**: The "Verifier bytes obligation" in Step 3 covers signature verification; the same no-normalization requirement applies here. Bob's HKDF `info` computation uses `sender_ek` (the X-Wing public key Alice transmitted), and it MUST be the raw received bytes, not a library-imported-and-re-exported form. If Bob's X25519 library normalizes the public key at import (e.g., clears bit 255 of the last byte — the high bit is masked in RFC 7748 §5 scalar multiplication), the normalized bytes differ from Alice's transmitted bytes, the HKDF `info` diverges, and `decrypt_first_message` fails with `AeadFailed` with no diagnostic pointing to the normalization. The fix: use the raw `session_init.sender_ek` bytes directly in the `info` construction without passing them through a library's key import path. See §8.1 for the X25519 masking hazard. The no-normalization obligation for signatures (Step 3) is explicitly documented there; this is the equally critical, less obvious HKDF-side obligation.
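
The hazard can be reproduced with a one-line model of an import path that masks bit 255. `normalized_x25519` and `ek_bytes_for_info` are hypothetical; the former stands in for any library import/re-export round trip. An honestly generated X25519 public key is a u-coordinate below 2^255 - 19, so its top bit is already clear. The divergence therefore appears only for received keys with that bit set, which makes it easy to miss in testing.

```rust
/// Models an X25519 import path that masks the most significant bit of the
/// final byte, as RFC 7748 §5 decoding does.
pub fn normalized_x25519(pk: &[u8; 32]) -> [u8; 32] {
    let mut out = *pk;
    out[31] &= 0x7f;
    out
}

/// Bytes fed into the HKDF `info`: the received value, verbatim.
pub fn ek_bytes_for_info(received_sender_ek: &[u8]) -> &[u8] {
    received_sender_ek // no import/re-export round trip
}
```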

#### Step 6: Decrypt First Message

```
// Guard: reject payloads too short to contain a nonce + Poly1305 tag.
// Minimum valid length is 40 bytes (24-byte nonce + 16-byte tag). The guard
// runs before any slicing: slicing [0..24] on a sub-24-byte buffer is an
// out-of-bounds read in C or a panic in Rust. Running it before key
// derivation also means no msg_key exists on this early-exit path.
// Return AeadFailed (not InvalidLength); see §12 oracle-collapse rationale.
if len(encrypted_payload) < 40:
    raise AeadFailed

msg_key = KDF_MsgKey(epoch_key, 0)    // Counter 0 for the first message

// Reconstruct AAD from received session init
session_init_bytes = encode_session_init(received_session_init)
aad = "lo-dm-v1" || Alice.fingerprint_raw || Bob.fingerprint_raw || session_init_bytes

// Extract nonce from payload; the remainder is ciphertext || 16-byte tag
nonce = encrypted_payload[0..24]
ciphertext = encrypted_payload[24..]

// msg_key is secret: zeroize it on both the success and the failure path.
result = AEAD-Decrypt(msg_key, nonce, ciphertext, aad)
zeroize(msg_key)
if result is failure:
    raise AeadFailed
plaintext = result
```
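
The framing steps of the pseudocode (length guard, nonce split, and AAD construction) can be sketched in Rust as below; the AEAD call itself is omitted. `split_payload` and `first_message_aad` are hypothetical helpers, and the sketch assumes the `"lo-dm-v1"` label is concatenated as raw ASCII bytes with no length prefix.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    AeadFailed,
}

pub const NONCE_LEN: usize = 24; // XChaCha20-Poly1305 nonce
pub const TAG_LEN: usize = 16; // Poly1305 tag

/// Splits `nonce || ciphertext_with_tag`, rejecting short payloads with
/// AeadFailed (not a length error) before any slicing happens.
pub fn split_payload(payload: &[u8]) -> Result<([u8; NONCE_LEN], &[u8]), Error> {
    if payload.len() < NONCE_LEN + TAG_LEN {
        return Err(Error::AeadFailed);
    }
    let (nonce_bytes, ciphertext) = payload.split_at(NONCE_LEN);
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(nonce_bytes);
    Ok((nonce, ciphertext))
}

/// "lo-dm-v1" || Alice.fingerprint_raw || Bob.fingerprint_raw || session_init_bytes
pub fn first_message_aad(alice_fp: &[u8; 32], bob_fp: &[u8; 32], session_init_bytes: &[u8]) -> Vec<u8> {
    let label = b"lo-dm-v1";
    let mut aad = Vec::with_capacity(label.len() + 64 + session_init_bytes.len());
    aad.extend_from_slice(label);
    aad.extend_from_slice(alice_fp);
    aad.extend_from_slice(bob_fp);
    aad.extend_from_slice(session_init_bytes);
    aad
}
```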

Bob's `encode_session_init(received_session_init)` must produce byte-for-byte identical output to Alice's Step 6 encoding — any field transformation during decode (padding trimming, key clamping, normalization) that alters re-encoded bytes causes silent AEAD failure with no diagnostic.

**AeadFailed conflation is normative — MUST NOT add distinguishing codes**: All AEAD authentication failures in `decrypt_first_message` — whether caused by a wrong session key (diverged KEM output), a tampered nonce, a modified AAD, a corrupt ciphertext, or a re-encoded `session_init_bytes` that differs from Alice's original — MUST return `AeadFailed` with no distinguishing information. Reimplementers MUST NOT return distinct error codes for these cases (e.g., a separate `KeyDerivationMismatch` or `AadMismatch`). Adding distinguishing codes creates an oracle: an attacker who can trigger specific errors knows which layer of the construction failed, enabling targeted substitution attacks. The single `AeadFailed` response forces the attacker to succeed at AEAD authentication — i.e., to know the key — to get any response other than failure. This requirement also applies to `receive_session` as a whole: `VerificationFailed` (Step 3) and `AeadFailed` (Step 6) must remain the only cryptographic-layer failure codes, not be further subdivided.

**First-message `msg_key` zeroization (Bob)**: `msg_key` MUST be zeroized after AEAD decryption completes — it is secret material. In Rust, `Zeroizing<[u8; 32]>` handles this automatically. In C/Go/Python, explicitly zeroize the key buffer after `AEAD-Decrypt` returns. The same obligation applies on Alice's encrypt path (§5.4 Step 7).

#### Step 7: Initialize Ratchet State

Bob initializes LO-Ratchet with:
- `root_key` from key derivation
- `recv_epoch_key` = `epoch_key` (the session-derived key, now used as the receive epoch key)
- `recv_ratchet_pk` = Alice's `EK_pub` (from session init)
- `ratchet_pending = true` (Bob must perform a KEM ratchet step before his first send)
- `recv_count` starts at 1 (the session-init message used counter 0). **Corollary**: a ratchet header with `n = 0` will fail the duplicate check (`0 < recv_count = 1`) and be rejected as `DuplicateMessage`. Counter 0 is permanently outside the ratchet's receive window — it belongs to `decrypt_first_message`, not the ratchet. A reimplementer who initializes `recv_count = 0` instead of 1 would accept `n = 0` as a valid ratchet message, creating a counter alias with the first-message counter and enabling replay of the session-init payload as a ratchet message (AEAD would fail due to AAD mismatch, but the acceptance represents a protocol divergence). **This `recv_count = 1` invariant is a construction-time guarantee, not enforced at deserialization**: the §6.8 guards do not reject a deserialized blob with `recv_count = 0`. A cross-implementation blob constructed with `recv_count = 0` is silently accepted by the reference deserialization. The deserialization path trusts the invariant was maintained during construction. Reimplementers who allow `recv_count = 0` at init time (e.g., for testing or partial state reconstruction) produce blobs that the reference accepts, but with state that violates the counter-alias-free guarantee above.
- `recv_seen` = empty (counter 0 was consumed by `decrypt_first_message`, outside the ratchet)
- Bob generates his own ratchet keypair only on first reply (triggered by `ratchet_pending`)

**Session init replay — library boundary**: `receive_session` does not detect or reject replayed session inits. A replayed session init carries a valid signature (it was signed by Alice), valid KEM ciphertexts, and passes all library-layer checks. If the same `session_init_bytes` is submitted to `receive_session` a second time (with a still-present OPK), a second ratchet state is created from the same KEM outputs — two live ratchet objects initialized identically, with the same root key and epoch key, in different memory locations. The library has no persistent session registry and cannot distinguish a replay from a legitimate first delivery.

Replay detection is the caller's responsibility. The correct architecture:
- The relay MUST deduplicate session inits before delivering them to Bob's device — the natural deduplication key is `(sender_ik_fingerprint, recipient_ik_fingerprint, SHA3-256(session_init_bytes))`. A relay that delivers the same session init twice creates the duplicate-ratchet-state condition.
- Bob's client MUST enforce at-most-once semantics for session establishment with a given peer: if a ratchet session already exists for the `(sender_ik_fingerprint, spk_id, ct_ik)` combination, the client MUST NOT call `receive_session` a second time with the same session init.
- The OPK single-use delete-in-transaction requirement (§5.5 Step 4) provides a partial backstop: once the OPK is deleted, a replayed `ct_opk`-bearing session init fails with `InvalidData` at the co-presence check. However, OPK-less session inits (no `ct_opk`) have no such backstop and rely entirely on caller-side deduplication.
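
A minimal client-side at-most-once registry keyed on `(sender_ik_fingerprint, spk_id, ct_ik)` might look like the following. `SessionInitRegistry` is hypothetical and `spk_id` as `u32` is an assumption. A combination should be recorded only after `receive_session` succeeds; in a persistent client the record belongs in the same transaction as the new ratchet state, so a crash cannot leave one without the other.

```rust
use std::collections::HashSet;

/// Hypothetical registry of session inits already turned into ratchet state.
#[derive(Default)]
pub struct SessionInitRegistry {
    seen: HashSet<([u8; 32], u32, Vec<u8>)>,
}

impl SessionInitRegistry {
    /// Check before calling receive_session; true means "do not call it".
    pub fn already_established(&self, sender_fp: [u8; 32], spk_id: u32, ct_ik: &[u8]) -> bool {
        self.seen.contains(&(sender_fp, spk_id, ct_ik.to_vec()))
    }

    /// Record only after receive_session has succeeded.
    pub fn record(&mut self, sender_fp: [u8; 32], spk_id: u32, ct_ik: &[u8]) {
        self.seen.insert((sender_fp, spk_id, ct_ik.to_vec()));
    }
}
```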

Leaving replay detection to the caller, rather than adding a session registry inside the library, is intentional: the library has no persistent storage and cannot implement relay-side deduplication. Applications building atop soliton MUST implement the deduplication layer at the relay and client levels described above.

### 5.6 Security Analysis

**Multi-key session binding**: The session key requires ALL shared secret components. No single key compromise is sufficient. Note: **"IK" in the table below means the X-Wing component only** (bytes 0-2431 of the LO composite secret key — see clarification after the table).

| Keys compromised | Session key recoverable? |
|---|---|
| IK (X-Wing component) alone | No — missing ss_spk |
| SPK alone | No — missing ss_ik |
| OPK alone | No — missing ss_ik, ss_spk |
| IK (X-Wing component) + SPK | Yes (same as X3DH / PQXDH) |
| IK (X-Wing component) + SPK + OPK | Yes |

**"IK" in this table means the X-Wing component only**: Session key recovery via IK requires the X-Wing private key (`sk_X || dk_M`, bytes 0-2431 of the LO composite secret key) — the component needed to decapsulate `ct_ik`. The Ed25519 and ML-DSA sub-keys within the LO composite key do not participate in key agreement and are irrelevant to session key recovery. A full LO composite key compromise (`sk_IK`) trivially yields the X-Wing sub-key, so the security table holds. But an adversary who compromises only the Ed25519 or ML-DSA sub-keys (e.g., through an algorithm-specific attack) gains forgery capability (session initiation, SPK signing) but NOT session key recovery — IK KEM decapsulation is independent of the signing sub-keys.

SPK is the most exposed key (medium-term, stored on relay, retained 30 days after rotation). IK is long-term and device-stored only. Requiring both for session key recovery means the least-protected key is no longer a single point of failure.

**Forward secrecy**: Forward secrecy comes from SPK rotation and OPK single-use. After SPK private key is deleted, sessions using that SPK are permanently secure — even if IK is later compromised, the attacker lacks ss_spk.

**EK_sk forward-secrecy window**: Alice's ephemeral key `EK_sk` (§5.4 Step 2) must remain live until she successfully processes Bob's first KEM ratchet step (§6.6 new-epoch path), at which point it MUST be zeroized. Until zeroization, a device compromise allows an attacker to recover `EK_sk` and decapsulate Bob's first KEM ratchet ciphertext — recovering `ss_spk_ratchet` and therefore the initial epoch key. This exposes all messages in Alice's first ratchet epoch (from `send_count = 1` through the first KEM ratchet step). This window is bounded and unavoidable: the key must exist until the decapsulation it enables occurs. It does not affect sessions that used an OPK (the OPK provides an additional shared secret layer), and it disappears as soon as Alice processes Bob's first ratchet reply. The `EK_sk` zeroization obligation is documented in §5.4 Step 2 and §13.5; the forward-secrecy implication is that the window is as long as the round-trip to Bob's first reply.

**Post-quantum security**: All shared secrets via X-Wing. Both X25519 and ML-KEM-768 must be broken simultaneously.

**Mutual authentication**: Both identity keys are cryptographically bound into the session. Bob's IK is bound via KEM encapsulation (`ct_ik`): only the holder of Bob's IK private key can decapsulate and derive the session key. Alice's IK is bound via a HybridSign over the encoded SessionInit (`sender_sig`, §5.4 Step 6): only the holder of Alice's IK private key can produce a valid signature.

**Recipient binding — implicit and explicit**: Bob's IK is bound *implicitly* by the KEM: an attacker lacking Bob's IK private key cannot decapsulate `ct_ik` and the session key they derive will be garbage. Bob's IK is also bound *explicitly* via `recipient_ik_fingerprint` embedded in the signed `SessionInit`: `sender_sig` directly names Bob as the intended recipient, independent of KEM decapsulability. Formal verification tools (Tamarin, ProVerif) can derive recipient binding from the signature alone, without modelling KEM implicit binding as a separate lemma.

**UKS (Unknown Key Share) resistance**: An Unknown Key Share attack would allow Alice to establish a session that Alice believes is with Bob, but Bob believes is with a third party C. LO-KEX prevents this via a three-link chain that must all hold simultaneously: (1) Bob validates `SHA3-256(Bob.IK_pub) == si.recipient_ik_fingerprint` (§5.5 Step 1), which binds Bob's own key to the session before any KEM operation; (2) Bob verifies Alice's signature over `session_init_bytes`, which contains `recipient_ik_fingerprint` as a field, so the signature covers Bob's identity explicitly, not just cryptographic material that implies it; (3) both `Alice.IK_pub` and `Bob.IK_pub` are bound into the HKDF `info` field via `build_kex_info`, so a session where Alice thinks she's talking to Bob but Bob thinks he's talking to C would require both parties to derive the same session key from different `info` inputs, which HKDF collision resistance prevents. All three links are required: the fingerprint check alone fails if the attacker can substitute a different key with the same fingerprint (SHA3-256 second-preimage resistance is required); the signature check alone fails if the signature doesn't name the recipient (it does, via `recipient_ik_fingerprint`); the HKDF binding alone is not direct authentication (both parties must independently check they are talking to the expected peer). This argument is documented as A9 in Abstract.md; formal models must verify all three links hold simultaneously under the relevant security assumptions.

**Explicit initiator authentication**: Alice's `sender_sig` is proof-of-possession of sk_IK_A. An adversary who knows only pk_IK_A cannot produce a valid `sender_sig` without sk_IK_A (HybridSign EUF-CMA, §3.3). Bob verifies the signature before any KEM operations (§5.5 Step 3), so a forged or missing signature is rejected immediately, not silently. Both identity keys are also committed into the HKDF info field — any substitution additionally fails at first-message decryption.

**IMPORTANT — First-contact limitation (TOFU)**: The mutual authentication guarantee holds only when both parties possess authentic copies of each other's identity keys. The signature proves Alice holds sk_IK_A, but does not prove that pk_IK_A actually belongs to the human "Alice" — on first contact, Bob cannot verify the binding between pk_IK_A and a human identity.

A relay controlling the delivery path could substitute a different IK pair (its own pk_IK_X, sk_IK_X), produce a valid `sender_sig` under sk_IK_X, and impersonate Alice to Bob, because Bob has no reference key to compare against on first contact. This is trust-on-first-use (TOFU), identical to Signal, SSH, and all systems without centralized PKI. It is inherent, not a bug.

Mitigations:
- Verification phrases (§9) for post-hoc verification.
- Key pinning after first contact.
- Community server context (shared presence provides key distribution).
- Multi-path verification (compare keys from multiple independent servers).

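**KCI resistance**: Corrupt(IK, A) enables impersonation of Alice (signing pre-keys and producing `sender_sig` as Alice). It does not enable impersonating Bob to Alice: deriving the session key as Bob requires Bob's private keys (the X-Wing component of Bob's IK and his SPK, plus the OPK if one was used), all of which are independent of sk_IK_A.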

**Non-deniability**: LO-KEX does not provide deniability. Alice's `sender_sig` (§5.4 Step 6) is a HybridSign EUF-CMA signature over the encoded SessionInit — Bob can present `(session_init_bytes, sender_sig, pk_IK_A)` to any third party as cryptographic proof that Alice initiated this specific session. This is a deliberate departure from Signal's X3DH, which achieves deniability through DH's non-binding outputs (both parties can compute the same shared secret, so neither can prove who initiated). Systems requiring deniable authentication should note this property. See Appendix D (Hashimoto, PKC 2024) for post-quantum deniable AKE approaches.

**Header integrity**: AAD binds session init (§5.4 Step 7) and ratchet headers (§6.5). Header tampering → AEAD failure. See §7.3-7.4.

**spk_id cryptographic binding**: `spk_id` is not included in the HKDF `info` of `KDF_KEX` (§5.4) — its binding flows through a different path. `spk_id` is a field of `SessionInit`, which is encoded by `encode_session_init` (§7.4) and incorporated into the AEAD AAD for the first message (§5.4 Step 7 and §7.3). AEAD authentication over this AAD provides the cryptographic binding: any attacker who substitutes a different `spk_id` in transit causes the `encode_session_init` output to differ, which changes the AAD bytes, which causes AEAD authentication to fail on the responder's side. The binding chain is: `spk_id` → `encode_session_init(session_init)` → AEAD AAD → authentication tag. A formal modeler constructing a spk_id-substitution attack lemma should derive binding from this chain rather than from the KDF info path.

**Channel 2 surface**: LO-KEX exposes the following metadata to a passive network adversary: the bundle fetch event (party A intends to initiate a session with party B), the `SessionInit` message (reveals both fingerprints and crypto version to any interceptor), and failed initialization responses (a structural rejection is distinguishable from silence, enabling version and presence probing — see §1.5 for the probing implication). All content and authentication guarantees above are unaffected; these are structural metadata leaks outside the Channel 1 scope of this section.

---

## 6. LO-Ratchet

After session establishment, ongoing message encryption uses LO-Ratchet.

### 6.1 Overview

LO-Ratchet combines a **KEM ratchet** (replacing Double Ratchet's DH ratchet) with **counter-mode message key derivation**. When the conversation direction changes, the new sender generates a fresh X-Wing keypair, encapsulates to the other party's current ratchet public key, and derives new root and epoch keys. Within an epoch (between KEM ratchet steps), each message key is derived directly from the epoch key and the message counter in O(1), without sequential chain advancement.

**Design rationale**: The Signal Double Ratchet uses a sequential KDF chain that provides per-message forward secrecy — compromising the chain key at position N reveals only messages N+1, N+2, ... but not messages 0..N-1. LO-Ratchet deliberately trades this for per-epoch forward secrecy: compromising an epoch key reveals all messages in that epoch. This simplification eliminates the skip cache, TTL expiry, purge logic, and O(N) skip cost for out-of-order messages, removing the most error-prone component of the protocol. The practical security impact is minimal — the epoch key shares a memory region with strictly more powerful secrets (root key, ratchet secret key), and any realistic memory compromise that extracts the epoch key also extracts these adjacent secrets, rendering per-message forward secrecy moot. **Scope note**: this memory-colocation argument holds when all ratchet state resides in a single protected memory region. Architectures where only the epoch key is exported — for example, an HSM-backed ratchet that holds `root_key` and `send_ratchet_sk` in hardware but exports `send_epoch_key` to the application CPU for message key derivation — break the colocation assumption: an attacker who compromises only the exported epoch key does not automatically also hold `root_key`. In such architectures, the per-epoch vs. per-message trade-off carries real security cost and should be evaluated against the specific deployment threat model.

**Channel 2 surface**: The ratchet header (`pk_s`, `c_ratchet`, `n`, `pn`) is transmitted in cleartext and bound into the AEAD AAD but not encrypted. A passive network observer learns: when epoch transitions occur (from `pk_s` changes), whether a KEM ratchet step is present in this message (`c_ratchet`), the message's position within the current epoch (`n`), and the number of messages sent in the previous epoch (`pn`). Message content, epoch keys, and identity are fully protected; the header fields are structural metadata outside the Channel 1 scope of this section. See §1.5 for the full Channel 2 surface and transport-layer mitigations.

### 6.2 State

```
RatchetState = {
    root_key:               32 bytes
    send_epoch_key:         32 bytes
    recv_epoch_key:         32 bytes
    local_fp:               32 bytes    // SHA3-256(full 3200-byte LO composite public key) for local party — NOT a sub-key hash, NOT the hex string
    remote_fp:              32 bytes    // SHA3-256(full 3200-byte LO composite public key) for remote party — same derivation rule
    send_ratchet_sk:        Option<X-Wing secret key>    // None until first send; also serves as decapsulation key for incoming KEM ratchet steps (§6.6) — there is no separate recv_ratchet_sk. Stored as the 2432-byte expanded X-Wing secret key form (NOT the 32-byte seed) — see §6.8 guard 2; storing the seed form produces InvalidData on serialization.
    send_ratchet_pk:        Option<X-Wing public key>    // None until first send. Dual role: (1) local state: the public key corresponding to send_ratchet_sk, included in outgoing message headers; (2) epoch routing anchor for the peer: the peer stores it as its recv_ratchet_pk and matches each incoming header.ratchet_pk against recv_ratchet_pk (or prev_recv_ratchet_pk) to identify the epoch (§6.6)
    recv_ratchet_pk:        Option<X-Wing public key>    // None for Alice until first recv
    // Non-Rust reimplementer note: Option fields (recv_ratchet_pk, send_ratchet_sk/pk,
    // prev_recv_epoch_key, prev_recv_ratchet_pk) use Rust's Option<T> type where None
    // is semantically distinct from any byte pattern. In languages without sum types
    // (C, Go), represent None with a separate boolean presence flag — do NOT use an
    // all-zero array as a sentinel. An all-zero X-Wing public key is a valid (degenerate)
    // key that would cause epoch routing in decrypt (§6.6) to match incorrectly.
    prev_recv_epoch_key:    Option<32 bytes>              // Previous epoch key for late messages
    prev_recv_ratchet_pk:   Option<X-Wing public key>    // Previous epoch ratchet public key
    send_count:             u32    // = header.n when sending; starts at 1 for Alice
    recv_count:             u32    // high-water mark: max(n+1) for current recv epoch
    prev_send_count:        u32    // = header.pn when sending
    ratchet_pending:        bool   // set when peer KEM ciphertext received; cleared on next send
    recv_seen:              set of u32    // message counters successfully decrypted in current recv epoch
    prev_recv_seen:         set of u32    // message counters successfully decrypted in previous recv epoch
    epoch:                  u64    // monotonic anti-rollback counter for serialization
}
```

#### send_ratchet_sk dual role

`send_ratchet_sk` serves two distinct purposes: (1) it is the secret half of the keypair whose public key (`send_ratchet_pk`) this party advertises in outgoing message headers; and (2) it decapsulates incoming KEM ratchet ciphertexts. There is no separate `recv_ratchet_sk`. Because the peer encapsulates its next KEM ratchet step to the `send_ratchet_pk` it last received, the party who most recently sent a message holds the decapsulation key for the peer's next reply. A reimplementer who adds a separate `recv_ratchet_sk` field diverges from the state model and will fail on the first direction change.

**Clarification on counter fields**: `send_count` is the counter included as `n` in the ratchet header of outgoing messages. `recv_count` is the high-water mark for the current receive epoch: `max(n + 1)` across all successfully decrypted messages. `prev_send_count` is the value of `send_count` at the moment the KEM ratchet step fires, included as `pn` in the first message of a new send epoch. This is not the number of messages sent in that epoch — for Alice's first epoch, one message at `n=1` advances `send_count` to 2, so `pn=2` when the ratchet fires. These are the same values that appear in the wire format (Protocol Spec §12.9).

**`ratchet_pending` flag**: Set to `true` when a message is received that carries a new peer ratchet public key (triggering `recv_ratchet_pk` update). Cleared when the next `encrypt()` call performs the send-side KEM ratchet step. While `ratchet_pending` is true, any call to `encrypt()` will perform the ratchet step first. This defers the send-side ratchet until the party actually needs to send, rather than forcing it immediately on receipt. For Bob, `ratchet_pending = true` at initialization (§5.5 Step 7) — it is not exclusively a runtime transition flag. It means "a KEM ratchet step is required before the next send," which is true immediately after session establishment for the responder.

**`recv_seen` set**: Tracks which message counters have been successfully decrypted in the current receive epoch. Used for duplicate detection: a message with `n` already in `recv_seen` is rejected as `DuplicateMessage`. The set is bounded at `MAX_RECV_SEEN = 65536` entries as defense-in-depth against memory exhaustion. The set resets on each KEM ratchet step. **Required operations**: an efficient `contains` for per-message duplicate detection (called on every decrypt), and ascending iteration at serialization time (§6.8 serializes `recv_seen` entries in ascending order). The data structure is an implementation choice: a hash set provides O(1) average-case `contains` but needs a sort step at serialization; a B-tree set provides O(log n) `contains` and iterates in ascending order with no separate sort. Either satisfies the spec; the performance difference becomes meaningful only near `MAX_RECV_SEEN` entries.
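
A sketch of the B-tree option, assuming a hypothetical `RecvSeen` wrapper: the duplicate check runs before decryption, the counter is recorded only after decryption succeeds, and serialization reads the set in ascending order with no sort. The `Full` variant is an assumption; this section states the bound but not the error returned when it is reached.

```rust
use std::collections::BTreeSet;

pub const MAX_RECV_SEEN: usize = 65536;

#[derive(Debug, PartialEq, Eq)]
pub enum SeenError {
    Duplicate, // corresponds to DuplicateMessage
    Full,      // hypothetical: error for a full set is not specified here
}

pub struct RecvSeen {
    seen: BTreeSet<u32>,
}

impl RecvSeen {
    pub fn new() -> Self {
        RecvSeen { seen: BTreeSet::new() }
    }

    /// Pre-decryption check: reject duplicates and enforce the size bound.
    pub fn check(&self, n: u32) -> Result<(), SeenError> {
        if self.seen.contains(&n) {
            return Err(SeenError::Duplicate);
        }
        if self.seen.len() >= MAX_RECV_SEEN {
            return Err(SeenError::Full);
        }
        Ok(())
    }

    /// Record `n` only after AEAD decryption has succeeded.
    pub fn record(&mut self, n: u32) {
        self.seen.insert(n);
    }

    /// Entries in ascending order, as §6.8 serialization requires.
    pub fn serialized_order(&self) -> Vec<u32> {
        self.seen.iter().copied().collect()
    }
}
```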

**Previous epoch key**: `prev_recv_epoch_key` holds the epoch key from the immediately preceding receive epoch, allowing decryption of late-arriving messages from that epoch. It is overwritten (and the old value zeroized) by the next KEM ratchet step — only one previous epoch key is retained at any time. `prev_recv_ratchet_pk` identifies which ratchet public key the previous epoch was associated with, enabling the receiver to route incoming messages to the correct epoch key.

**Fingerprint immutability**: `local_fp` and `remote_fp` are fixed at session initialization (`init_alice`/`init_bob`) and MUST NOT be modified for the session lifetime. Both values are embedded in every message's AAD — mid-session modification would silently corrupt AAD for all in-flight and future messages, producing permanent `AeadFailed` without a session reset. The library enforces this by storing the fingerprints inside `RatchetState` (not caller-supplied per call) and by rejecting mutations via the exclusive-access model (§6.2). In languages where state fields are publicly accessible, implementations MUST treat these fields as read-only after initialization.

**Fingerprint derivation**: `local_fp` and `remote_fp` are SHA3-256 of the full 3200-byte LO composite public key (`X-Wing pk (1216 B) || Ed25519 pk (32 B) || ML-DSA-65 pk (1952 B)`) — not a hash of any single sub-key, and not the hex string. The CAPI (`soliton_ratchet_init_alice`, `soliton_ratchet_init_bob`) accepts pre-computed 32-byte fingerprint bytes; the library cannot verify correct derivation. A mismatch produces `AeadFailed` on every message with no diagnostic — the fingerprints are embedded in AAD, so a wrong fingerprint fails authentication identically to a tampered ciphertext. Use `soliton_identity_fingerprint` (§13.4) to compute fingerprints from public key bytes.
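As an illustration of the derivation (not the `soliton_identity_fingerprint` implementation), the sketch below hashes the three sub-keys in the stated order with the standard-library `hashlib.sha3_256`. The function name `lo_fingerprint` and its per-sub-key length check are assumptions.

```python
import hashlib

XWING_PK_LEN = 1216    # X-Wing (X25519 + ML-KEM-768) public key
ED25519_PK_LEN = 32
MLDSA65_PK_LEN = 1952
LO_PK_LEN = XWING_PK_LEN + ED25519_PK_LEN + MLDSA65_PK_LEN  # 3200


def lo_fingerprint(xwing_pk: bytes, ed25519_pk: bytes, mldsa_pk: bytes) -> bytes:
    """SHA3-256 over X-Wing pk || Ed25519 pk || ML-DSA-65 pk (raw bytes, never hex)."""
    lengths = (len(xwing_pk), len(ed25519_pk), len(mldsa_pk))
    if lengths != (XWING_PK_LEN, ED25519_PK_LEN, MLDSA65_PK_LEN):
        raise ValueError("wrong sub-key length for an LO composite public key")
    # Hash the full 3200-byte composite, not any single sub-key.
    return hashlib.sha3_256(xwing_pk + ed25519_pk + mldsa_pk).digest()
```

Both parties compute `local_fp` from their own composite key and `remote_fp` from the peer's, then pass the 32 raw bytes to `init_alice`/`init_bob`.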

**Identity fingerprint invariant**: `local_fp` and `remote_fp` must be distinct (`local_fp ≠ remote_fp`) and neither may be all-zero. Equal fingerprints would break AAD asymmetry, allowing a message encrypted by one party to be replayed as if sent by the other. All-zero fingerprints indicate uninitialized state. Both conditions are enforced at `init_alice`/`init_bob` (returning `InvalidData`) and at deserialization (guard 20). A state constructed with a zero fingerprint can encrypt/decrypt (the AEAD doesn't inspect fingerprint values), but fails to round-trip through serialization — enforcing at init prevents this latent inconsistency.

**`init_alice` / `init_bob` function signatures and error returns**:

```
function init_alice(
    root_key:         [u8; 32],   // from KDF_KEX (§5.4 Step 4), secret material
    epoch_key:        [u8; 32],   // from KDF_KEX (§5.4 Step 4), becomes send_epoch_key
    local_fp:         [u8; 32],   // SHA3-256 of Alice's full 3200-byte public key
    remote_fp:        [u8; 32],   // SHA3-256 of Bob's full 3200-byte public key
    send_ratchet_pk:  X-Wing public key (1216 bytes),  // Alice's EK_pub (§5.4)
    send_ratchet_sk:  X-Wing secret key (2432 bytes),  // Alice's EK_sk (§5.4)
) → RatchetState | InvalidData

function init_bob(
    root_key:         [u8; 32],   // from KDF_KEX (§5.5 Step 4), secret material
    epoch_key:        [u8; 32],   // from KDF_KEX (§5.5 Step 4), becomes recv_epoch_key
    local_fp:         [u8; 32],   // SHA3-256 of Bob's full 3200-byte public key
    remote_fp:        [u8; 32],   // SHA3-256 of Alice's full 3200-byte public key
    recv_ratchet_pk:  X-Wing public key (1216 bytes),  // Alice's EK_pub (from SessionInit)
) → RatchetState | InvalidData
```

**Parameter order note**: In both functions the fingerprints follow `root_key` and `epoch_key` (named `chain_key` in the CAPI) and precede the ephemeral key parameters: `(root_key, epoch_key, local_fp, remote_fp, key_params...)`. The full CAPI signatures are `soliton_ratchet_init_alice(root_key, root_key_len, chain_key, chain_key_len, local_fp, local_fp_len, remote_fp, remote_fp_len, ek_pk, ek_pk_len, ek_sk, ek_sk_len, out)` and `soliton_ratchet_init_bob(root_key, root_key_len, chain_key, chain_key_len, local_fp, local_fp_len, remote_fp, remote_fp_len, peer_ek, peer_ek_len, out)`. The §13.4 summary abbreviates these for readability. A reimplementer who infers parameter order from that abbreviation alone, or who follows any listing that omits or reorders the fingerprints, binds the wrong values into every message's AAD, so every message fails with `AeadFailed` and no diagnostic. For `init_alice`, `send_ratchet_pk` precedes `send_ratchet_sk` (public key before secret key), the reverse of the `(sk, pk)` order common in academic specifications. Swapping pk and sk is a type error in strongly-typed languages but passes silently in C, Go, and Python, where both are plain byte buffers (`*const u8` in C).

`init_alice` and `init_bob` accept caller-supplied fingerprints — they are inputs, not outputs derived from the key material. The functions cannot verify that the fingerprints match the actual public keys used in the KEM exchange. They return `InvalidData` for: `local_fp == remote_fp` (AAD asymmetry violation), either fingerprint all-zero (uninitialized sentinel), `root_key` all-zero (liveness sentinel), or `epoch_key` all-zero (degenerate KEX output). On `InvalidData`, no ratchet handle is allocated.
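The four `InvalidData` conditions can be sketched as a pre-allocation check. `validate_init_inputs` is a hypothetical helper, not part of the CAPI; it assumes the 32-byte lengths are already enforced by the parameter types and uses `hmac.compare_digest` so comparisons on secret inputs do not exit early.

```python
import hmac

ZERO32 = bytes(32)


class InvalidData(Exception):
    pass


def _eq(a: bytes, b: bytes) -> bool:
    # Constant-time comparison: root_key and epoch_key are secret material.
    return hmac.compare_digest(a, b)


def validate_init_inputs(root_key: bytes, epoch_key: bytes,
                         local_fp: bytes, remote_fp: bytes) -> None:
    """Hypothetical checks run before any ratchet handle is allocated."""
    if _eq(local_fp, remote_fp):
        raise InvalidData("local_fp == remote_fp (AAD asymmetry violation)")
    if _eq(local_fp, ZERO32) or _eq(remote_fp, ZERO32):
        raise InvalidData("all-zero fingerprint (uninitialized sentinel)")
    if _eq(root_key, ZERO32):
        raise InvalidData("all-zero root_key (liveness sentinel)")
    if _eq(epoch_key, ZERO32):
        raise InvalidData("all-zero epoch_key (degenerate KEX output)")
```

Nothing here can detect a fingerprint that is well-formed but derived from the wrong key; that remains the caller's responsibility.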

**Root key and epoch key liveness**: The `root_key` and the *input* `epoch_key` parameter (from LO-KEX) must not be all-zero at init time; all-zero values indicate uninitialized or degenerate KEX output. This check applies to the epoch key that becomes the active direction's key: for Alice, `send_epoch_key`; for Bob, `recv_epoch_key`. The *other* direction's epoch key is intentionally set to all-zeros as a placeholder (Alice's `recv_epoch_key`, Bob's `send_epoch_key`) and is not checked at init; it will be set to a real value by the first KEM ratchet step in that direction. After a session-fatal error (encrypt AEAD failure), `reset()` zeroizes `root_key`, so the all-zero liveness check on every subsequent encrypt/decrypt call rejects the dead session. **Dual role of root_key**: `root_key` serves two purposes: (1) it is the HKDF salt in `KDF_Root` (§6.4), providing forward secrecy by mixing fresh KEM shared secrets into the key hierarchy, and (2) it is the liveness sentinel checked at the top of encrypt/decrypt. Both uses require `root_key` to be treated as secret material, so the liveness check (§6.5, §6.6) uses a constant-time comparison that cannot leak `root_key` bytes through timing.

**Concurrency model**: All operations on a `RatchetState` require exclusive access. No concurrent or reentrant calls are safe on the same state handle; even read-only queries (epoch, is-pending, etc.) must not race with encrypt/decrypt. The CAPI enforces this via an `AtomicBool` reentrancy guard (§13.6). Reimplementers wrapping the Rust core directly must provide their own mutual exclusion, for example `Mutex<RatchetState>`. Encrypt, decrypt, and serialization must never run under a shared (read) lock: encrypt and decrypt may perform KEM ratchet steps and mutate counters, and serialization increments `epoch` and consumes the handle (§6.8). **Exception**: `derive_call_keys` (§6.12) takes `&self` and reads `root_key` without mutating any ratchet state, so multiple concurrent `derive_call_keys` calls are safe with respect to each other. Because `derive_call_keys` must still not race with encrypt/decrypt (which may advance `root_key` via a KEM ratchet step), a `RwLock` in which `derive_call_keys` takes a read lock and every other operation takes the write lock is a valid alternative to `Mutex` for reimplementers who need concurrent call key derivation.

**`to_bytes()` requires a write lock, not a read lock**: `to_bytes()` is ownership-consuming (it takes `self` and nulls the handle on success, §6.8). In a `RwLock` scenario, `to_bytes()` requires a write lock with no outstanding readers. A reimplementer who acquires only a read lock for `to_bytes()` while a concurrent `derive_call_keys` also holds a read lock creates a use-after-consume race: both calls access the same handle, but `to_bytes()` destroys it. Rust's ownership system prevents this at compile time: consuming `self` moves the value, which the borrow checker rejects while any `&self` borrow is outstanding. C/Go reimplementers using explicit `RwLock` primitives MUST acquire a write lock for `to_bytes()` and wait for all outstanding `derive_call_keys` read locks to drain before proceeding.
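A sketch of the simpler `Mutex` option from the concurrency model above, in Python (whose standard library has no reader-writer lock). `RatchetHandle` and its `call`/`to_bytes` methods are hypothetical; the state and the serializer are passed in as opaque values. The point it shows is that `to_bytes` holds the same exclusive lock for its whole duration and nulls the handle only after serialization succeeds.

```python
import threading


class RatchetHandle:
    """Hypothetical wrapper: one mutex covers every operation; to_bytes() consumes."""

    def __init__(self, state) -> None:
        self._lock = threading.Lock()
        self._state = state  # becomes None once consumed by to_bytes()

    def _live(self):
        if self._state is None:
            raise RuntimeError("handle already consumed by to_bytes()")
        return self._state

    def call(self, op):
        # encrypt, decrypt, derive_call_keys, and queries all take the same lock.
        with self._lock:
            return op(self._live())

    def to_bytes(self, serialize) -> bytes:
        # Exclusive for the whole operation: no caller can observe a half-consumed handle.
        with self._lock:
            blob = serialize(self._live())
            self._state = None  # null the handle only after serialization succeeded
            return blob
```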

**Anti-rollback epoch**: The `epoch` counter starts at 0 and is incremented each time the state is serialized via `to_bytes`. On deserialization, the epoch must be strictly greater than the last-seen epoch for the same session. This prevents storage-layer replay of older blobs. See §6.8.
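A minimal sketch of the load-time check, assuming the caller keeps the last-seen epoch per session in trusted storage. `EpochRegistry`, `session_id`, and `RollbackDetected` are hypothetical names; the rule they encode is the strict `>` comparison described above, under which an equal epoch is rejected as a replay.

```python
from typing import Dict


class RollbackDetected(Exception):
    pass


class EpochRegistry:
    """Hypothetical trusted store of the last accepted epoch per session."""

    def __init__(self) -> None:
        self._last_seen: Dict[bytes, int] = {}

    def accept(self, session_id: bytes, blob_epoch: int) -> None:
        last = self._last_seen.get(session_id)
        # Strictly greater: an equal epoch means the same blob is being replayed.
        if last is not None and blob_epoch <= last:
            raise RollbackDetected(f"epoch {blob_epoch} <= last seen {last}")
        self._last_seen[session_id] = blob_epoch  # record the newly accepted epoch
```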

**Initial state after LO-KEX:**

For Alice (initiator):
- `send_ratchet_pk/sk` = her EK (ephemeral key from §5.4 Step 2)
- `recv_ratchet_pk` = `None` (Bob hasn't sent yet)
- `send_epoch_key` = epoch key from `encrypt_first_message`
- `recv_epoch_key` = all-zeros (unused until Bob sends)
- `prev_recv_epoch_key` = `None`
- `send_count` = 1 (counter 0 was used by the random-nonce first message)
- `recv_count` = 0
- `prev_send_count` = 0 (initialization default; no KEM ratchet step has fired yet). When Alice's first KEM ratchet step fires, `prev_send_count` is set to `send_count` at that moment (§6.4). If Alice sent one ratchet message before that step (`n=1`, advancing `send_count` to 2), the first message of her new send epoch carries `pn = 2`, not `pn = 0`.
- `ratchet_pending` = false
- `recv_seen` = empty, `prev_recv_seen` = empty
- `epoch` = 0
- `recv_seen` is empty (not `{0}`) because counter 0 was consumed by `encrypt_first_message` — a structurally separate function from `encrypt()`. A replayed session init is a protocol-layer concern (deduplicated by the relay), not a ratchet concern. Reimplementers MUST NOT seed `recv_seen` with `{0}`.

For Bob (responder):
- `recv_ratchet_pk` = Alice's EK_pub (from session init)
- `send_ratchet_pk/sk` = `None` (set on first send)
- `recv_epoch_key` = epoch key from `decrypt_first_message`
- `send_epoch_key` = all-zeros (unused until Bob sends)
- `prev_recv_epoch_key` = `None`
- `recv_count` = 1 (counter 0 was consumed by the session-init message, outside the ratchet). Derivation: `recv_count` tracks `max(n + 1)` across successfully decrypted messages in the current epoch; `decrypt_first_message` processes the message at counter 0, producing `max(0 + 1) = 1`. This is a bookkeeping value for serialization consistency (guard 17 requires `recv_seen` entries to be `< recv_count`), not a replay guard — Alice's `send_count` starts at 1, so she will never produce a ratchet message with `n = 0` in this epoch. A reimplementer who treats `recv_count = 1` as the security control preventing `n = 0` replays is relying on a false assumption; the actual protection is Alice's `send_count` starting at 1 (§5.4 Step 7).
- `send_count` = 0
- `prev_send_count` = 0 (first ratchet message from Bob carries `pn = 0` — no prior send epoch exists)
- `ratchet_pending` = true (Bob must ratchet before first send)
- `recv_seen` = empty, `prev_recv_seen` = empty
- `epoch` = 0
- `recv_seen` is empty (not `{0}`) for the same reason as Alice: counter 0 was consumed by `decrypt_first_message`, outside the ratchet. Reimplementers MUST NOT seed `recv_seen` with `{0}` — doing so causes the first ratchet-layer message (counter 0 in a new epoch after Bob's KEM ratchet step) to be rejected as `DuplicateMessage`.
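The two initial states above can be summarized as constructors. This is a sketch assuming a plain dataclass; field names follow this section, `prev_recv_ratchet_pk = None` at init is an assumption, and the `InvalidData` checks performed by `init_alice`/`init_bob` are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

ZERO32 = bytes(32)


@dataclass
class RatchetState:
    root_key: bytes
    local_fp: bytes
    remote_fp: bytes
    send_epoch_key: bytes
    recv_epoch_key: bytes
    send_ratchet_pk: Optional[bytes]
    send_ratchet_sk: Optional[bytes]
    recv_ratchet_pk: Optional[bytes]
    send_count: int
    recv_count: int
    ratchet_pending: bool
    prev_send_count: int = 0
    prev_recv_epoch_key: Optional[bytes] = None
    prev_recv_ratchet_pk: Optional[bytes] = None         # assumed None at init
    recv_seen: Set[int] = field(default_factory=set)     # empty, never {0}
    prev_recv_seen: Set[int] = field(default_factory=set)
    epoch: int = 0


def alice_initial(root_key, epoch_key, local_fp, remote_fp, ek_pk, ek_sk):
    return RatchetState(
        root_key=root_key, local_fp=local_fp, remote_fp=remote_fp,
        send_epoch_key=epoch_key,     # active direction
        recv_epoch_key=ZERO32,        # placeholder until the first receive-side step
        send_ratchet_pk=ek_pk, send_ratchet_sk=ek_sk,
        recv_ratchet_pk=None,         # Bob has not sent yet
        send_count=1,                 # counter 0 used by encrypt_first_message
        recv_count=0,
        ratchet_pending=False,
    )


def bob_initial(root_key, epoch_key, local_fp, remote_fp, peer_ek):
    return RatchetState(
        root_key=root_key, local_fp=local_fp, remote_fp=remote_fp,
        send_epoch_key=ZERO32,        # placeholder until Bob's first send-side step
        recv_epoch_key=epoch_key,     # active direction
        send_ratchet_pk=None, send_ratchet_sk=None,
        recv_ratchet_pk=peer_ek,      # Alice's EK_pub from SessionInit
        send_count=0,
        recv_count=1,                 # counter 0 consumed by decrypt_first_message
        ratchet_pending=True,         # KEM ratchet step required before the first send
    )
```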

### 6.3 Counter-Mode Message Key Derivation

```
function KDF_MsgKey(epoch_key, counter):
    return HMAC-SHA3-256(key=epoch_key, data=0x01 || big_endian_32(counter))
```

Each counter value produces a unique message key from the static epoch key. The `0x01` prefix byte provides domain separation — no other derivation from the epoch key currently exists, but the prefix reserves the `0x01` domain for message keys, leaving other prefix values available for future epoch-key-derived outputs without risking collision. The epoch key does not advance per message — it is fixed for the duration of an epoch (between KEM ratchet steps).

**Counter=0 cannot collide with any chain-advancement derivation**: In this counter-mode design there is no chain key advancement step — `KDF_Chain` does not exist. Counter 0 is simply `KDF_MsgKey(epoch_key, 0)`, and counters 1, 2, … are independent derivations from the same static key. There is no internal computation that uses the same HMAC key and input as counter 0 for any other purpose. The 0x01 domain prefix ensures that even if a hypothetical future protocol extension derived something from the epoch key with a different prefix byte, counter 0 (data = `0x01 || 0x00000000`) would not collide with it. A reimplementer coming from a chain-ratchet background (e.g., Signal's Double Ratchet) should note: there is no `KDF_Chain(epoch_key)` producing `(msg_key, next_chain_key)` — the epoch key is never used as input to derive another epoch key; that transition happens only via `KDF_Root` on a KEM ratchet step.

**HMAC-SHA3-256 block size is 136 bytes**: SHA3-256's rate (block size) is 136 bytes, not the 64-byte SHA-2 block size most developers have internalized. RFC 2104 HMAC pads/hashes the key to the hash's block size. A reimplementer building HMAC from a raw SHA3-256 primitive who assumes 64-byte blocks produces wrong padding and silently incorrect MACs. Standard HMAC libraries handle this automatically — this note exists for anyone implementing HMAC from scratch.
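For anyone building HMAC by hand, the sketch below follows RFC 2104 on top of Python's `hashlib.sha3_256` and checks itself against the standard-library `hmac` module. The function name is illustrative, and the `block` parameter exists only to demonstrate the hazard: with `block=64` the output silently diverges.

```python
import hashlib
import hmac

SHA3_256_BLOCK = hashlib.sha3_256().block_size  # 136: the SHA3-256 sponge rate


def hmac_sha3_256_manual(key: bytes, data: bytes, block: int = SHA3_256_BLOCK) -> bytes:
    """RFC 2104 HMAC built from raw SHA3-256."""
    if len(key) > block:
        key = hashlib.sha3_256(key).digest()  # keys longer than a block are hashed first
    key = key.ljust(block, b"\x00")           # then zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha3_256(ipad + data).digest()
    return hashlib.sha3_256(opad + inner).digest()
```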

**HMAC-SHA3-256 uses FIPS 202 SHA3-256, not Keccak-256**: The SHA3-256 here is the NIST-standardized variant (FIPS 202, domain-separation suffix `0x06`), not the pre-standardization Keccak-256 used in Ethereum and similar systems (suffix `0x01`). The two produce different outputs for the same input. Both have the same 136-byte block size, so the block-size hazard above does not detect this substitution: the HMAC library silently accepts either hash function and produces wrong but plausible-looking output. In Go, use `sha3.New256()` from `golang.org/x/crypto/sha3`, not `sha3.NewLegacyKeccak256()`. In Python, use `hashlib.sha3_256`, not `keccak_256` from `pysha3` or the `Crypto.Hash.keccak` module from `pycryptodome`. If Keccak-256 were substituted here, every message key, root key, and call chain key derivation in this protocol would be wrong.
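One cheap defense against the substitution is a known-answer self-test at startup. The sketch below compares the digest of the empty message against the published SHA3-256 and Keccak-256 values; `check_sha3_variant` is a hypothetical helper name.

```python
import hashlib

# Known answers for the empty message.
SHA3_256_EMPTY = "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
KECCAK_256_EMPTY = "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"


def check_sha3_variant(hash_factory=hashlib.sha3_256) -> None:
    """Raise unless hash_factory behaves as FIPS 202 SHA3-256 on the empty input."""
    digest = hash_factory(b"").hexdigest()
    if digest == KECCAK_256_EMPTY:
        raise RuntimeError("pre-standard Keccak-256 detected; FIPS 202 SHA3-256 required")
    if digest != SHA3_256_EMPTY:
        raise RuntimeError("hash function is not SHA3-256")
```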

**HMAC input is exactly 5 bytes: no length prefix, no padding.** The data field is the literal concatenation `0x01 || big_endian_32(counter)` — 1 byte domain prefix followed by 4 bytes counter. Unlike the HKDF `info` fields in §5.4 (which use 2-byte BE length prefixes), HMAC data here is a fixed-layout input with no framing. A reimplementer who adds a 2-byte length prefix (e.g., `0x00 0x05 || 0x01 || counter`) by analogy with HKDF conventions produces a different 32-byte message key with no error or diagnostic.
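A minimal Python sketch of `KDF_MsgKey` using only the standard library. The function name is an assumption; the 5-byte data layout is exactly the one specified above.

```python
import hashlib
import hmac
import struct


def kdf_msg_key(epoch_key: bytes, counter: int) -> bytes:
    """HMAC-SHA3-256(key=epoch_key, data=0x01 || big_endian_32(counter))."""
    # Exactly 5 bytes: domain byte 0x01, then the u32 counter big-endian, no framing.
    # struct.pack raises struct.error if counter is outside 0..2**32 - 1.
    data = b"\x01" + struct.pack(">I", counter)
    return hmac.new(epoch_key, data, hashlib.sha3_256).digest()
```

Adding a length prefix, as the paragraph above warns, yields a different but equally plausible 32-byte key.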

**HMAC domain byte allocation** (complete registry — protocol extenders MUST NOT reuse allocated values):

| Byte | Use | Section |
|------|-----|---------|
| `0x01` | Message key derivation (`KDF_MsgKey`) | §6.3 |
| `0x02`-`0x03` | Reserved | — |
| `0x04` | Call key_a derivation (`AdvanceCallChain`) | §6.12 |
| `0x05` | Call key_b derivation (`AdvanceCallChain`) | §6.12 |
| `0x06` | Call chain_key derivation (`AdvanceCallChain`) | §6.12 |

Note: `0x04`-`0x06` operate on the call chain key (§6.12), not the epoch key. They are listed here for completeness — the domain byte space is global across all single-byte HMAC-SHA3-256 data inputs in the protocol.

HMAC is used here (not HKDF) as a PRF — each call is independent with a fixed key and unique counter input. No extract phase is needed because the epoch key is already uniformly distributed (output of HKDF in `KDF_Root`). For formal models: treat as `PRF(ek, 0x01 ‖ BE32(counter))`.

**Forward secrecy is per-epoch, not per-message.** Compromising an epoch key reveals all message keys in that epoch. Forward secrecy across epochs is provided by the KEM ratchet (§6.4), which derives each new epoch key from a fresh KEM shared secret via `KDF_Root`. See §6.13 for the design rationale.

The output is wrapped in `Zeroizing` to ensure automatic memory wipe after use. **Non-Rust implementations**: the 32-byte `msg_key` returned by `KDF_MsgKey` is secret key material and MUST be zeroized immediately after AEAD encryption or decryption completes. In C/Go/Java/Python, the caller must do this explicitly: `memset_s` (C11 Annex K) or `explicit_bzero` (C), `Arrays.fill` (Java), or an equivalent wipe that the compiler cannot elide. Environments without RAII MUST NOT rely on garbage collection or variable scope for zeroization: the key may remain in memory until a future allocation overwrites it, which can be arbitrarily delayed. Zeroization MUST also cover every error path and early return (e.g., if AEAD fails, the derived key must still be zeroized before the error is returned).
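The control-flow requirement (wipe on every exit path) maps onto `try`/`finally` in languages without RAII. This Python sketch shows only that structure and is not a secure implementation: Python cannot wipe the immutable `bytes` object that `digest()` returns, so a real binding should keep `msg_key` inside native code. `with_msg_key` and `zeroize` are hypothetical helpers.

```python
import hashlib
import hmac


def zeroize(buf: bytearray) -> None:
    # Overwrite in place; only mutable buffers can be wiped this way.
    for i in range(len(buf)):
        buf[i] = 0


def with_msg_key(epoch_key: bytes, counter: int, use):
    data = b"\x01" + counter.to_bytes(4, "big")
    msg_key = bytearray(hmac.new(epoch_key, data, hashlib.sha3_256).digest())
    try:
        return use(msg_key)  # e.g. AEAD encrypt/decrypt, which may raise
    finally:
        zeroize(msg_key)     # runs on success, on error, and on early return
```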

### 6.4 KEM Ratchet Step

When sending and `send_ratchet_pk` is absent or `ratchet_pending` is true:

```
function PerformKEMRatchetSend(state):
    peer_pk = state.recv_ratchet_pk   // must be Some; if None → Internal error
        // (structurally unreachable from valid state — deserialization guards 6
        // and 9 in §6.8 prevent this configuration, and init_bob always sets it)

    // Generate new ratchet keypair.
    (new_pk, new_sk) = XWing.KeyGen()

    // Encapsulate to peer's ratchet public key.
    (ct, ss) = XWing.Encaps(peer_pk)

    // Advance root key, derive new send epoch key.
    (state.root_key, state.send_epoch_key) = KDF_Root(state.root_key, ss)

    // Update state.
    state.send_ratchet_sk = new_sk   // old sk auto-zeroized via ZeroizeOnDrop (Rust only).
                                     // Non-Rust reimplementers MUST explicitly zeroize the
                                     // old send_ratchet_sk before overwriting: assigning a new
                                     // pointer or value leaves the old key bytes on the heap.
                                     // When send_ratchet_sk is None (Bob's first send; Alice
                                     // always holds her EK secret key from init), the
                                     // zeroization obligation is vacuously satisfied: there are
                                     // no key bytes to zeroize.
                                     // In C: check for null before calling memset; in Go: check
                                     // for nil slice. The obligation applies only when transitioning
                                     // from Some(old_sk) to Some(new_sk).
                                     // Same obligation applies to prev_recv_epoch_key rotation
                                     // in §6.6 (old recv_epoch_key becomes prev_recv_epoch_key;
                                     // if prev_recv_epoch_key is being replaced, zeroize before
                                     // overwriting the slot).
                                     //
                                     // CRITICAL: the old send_ratchet_sk MUST NOT be zeroized
                                     // until all three preceding operations (KeyGen, Encaps,
                                     // KDF_Root) have completed successfully. KeyGen and Encaps
                                     // consume CSPRNG output and can fail; KDF_Root cannot fail
                                     // for its fixed 64-byte output (see "KDF_Root is infallible"
                                     // below), but a fallible HKDF API must still be checked. If
                                     // any of them fails, the ratchet step must abort with no
                                     // state change, so that the caller can retry with the
                                     // session in its pre-ratchet state. The reference
                                     // implementation guarantees this by making the KDF_Root
                                     // assignment above the first state write and performing
                                     // every other write (this line and below) only after the
                                     // fallible operations return successfully.
                                     // A reimplementer who "eagerly" zeroizes the old sk
                                     // immediately after keygen (before Encaps) loses the ability
                                     // to roll back on Encaps failure, leaving the session
                                     // permanently without a valid send ratchet key.
    state.send_ratchet_pk = new_pk
    state.prev_send_count = state.send_count   // MUST precede send_count = 0 (see below)
    state.send_count = 0   // Post-ratchet epochs start at n=0; Alice's first epoch
                           // is the exception (send_count=1 from session init, §5.4 Step 7).
                           // Reset (not continuation) is safe: the new epoch_key is independent
                           // (derived from a fresh KEM shared secret via KDF_Root), so counter N
                           // under epoch E₁ and counter N under epoch E₂ produce different
                           // message keys. Continuation would also be correct but reset is
                           // the simpler invariant and matches header.n expectations.
                           // **All seven field writes are atomic — no serialization point may be
                           // introduced between the start of KDF_Root and the post-AEAD send_count
                           // += 1.** This covers the entire sequence: `KDF_Root` (writes root_key
                           // and send_epoch_key), `send_ratchet_sk = new_sk`, `send_ratchet_pk =
                           // new_pk`, `prev_send_count = send_count`, `send_count = 0`,
                           // `ratchet_pending = false`. If serialization occurs after KDF_Root
                           // updates root_key/send_epoch_key but before send_count resets to 0,
                           // the blob encodes new epoch keys with the old send_count. Guard 8 does
                           // NOT catch this intermediate: `send_count > 0` with `send_ratchet_sk`
                           // present is a valid combination (every non-initial send), so the blob
                           // reloads successfully — but the session silently derives nonces using
                           // a desynchronized counter, causing AEAD failure against the peer.
                           // **Guard 8 transient violation (§6.8 guard 8)**: After `send_count = 0`
                           // specifically, the state has send_count == 0 with send_ratchet_sk
                           // present — the exact combination guard 8 rejects at deserialization.
                           // This narrower window is safe only because encrypt() holds exclusive
                           // access (§6.2) and does not yield between this line and send_count += 1.
                           // The full atomicity requirement above is the stronger invariant.
    state.ratchet_pending = false

    zeroize(ss)     // ss MUST be zeroized AFTER KDF_Root completes — it is the IKM input
                    // to HKDF and must survive until KDF_Root returns. Zeroizing ss before
                    // KDF_Root would use all-zero IKM, silently producing a weak, predictable
                    // epoch key. zeroize(ss) MUST be positioned after both state.root_key and
                    // state.send_epoch_key are written. Non-Rust implementations MUST NOT
                    // reorder or "optimize" this zeroization earlier.
    return ct    // Included in message header
```

**`prev_send_count = send_count` MUST precede `send_count = 0` — correctness requirement, not incidental ordering**: The two assignments appear in fixed order in the pseudocode, but the ordering is a hard correctness requirement. `prev_send_count` captures the count of messages sent in the just-completed epoch, which is transmitted as `pn` in ratchet-step headers so the peer knows how many messages to expect from the old epoch. If `send_count = 0` executes first, `prev_send_count` captures the reset value (0) regardless of how many messages were sent — every subsequent ratchet-step header carries `pn = 0`. The peer's AEAD succeeds (pn is AAD-bound but both sides compute the same wrong AAD when both use `pn = 0`), so **this failure is silent in same-implementation testing**. It only manifests as `AeadFailed` when a reimplementer's peer uses the correct ordering. A simple field swap in the implementation or a refactoring that moves the reset earlier silently introduces this bug.
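The ordering rules of this section (fallible operations first, then the commit, with `prev_send_count` captured before the reset) can be sketched in Python. X-Wing is not in the standard library, so `keygen`, `encaps`, and `kdf_root` are injected callables (hypothetical parameters), and `SendState` models only the fields this step touches.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class InternalError(Exception):
    pass


@dataclass
class SendState:
    root_key: bytes
    send_epoch_key: bytes
    send_ratchet_pk: Optional[bytes]
    send_ratchet_sk: Optional[bytearray]
    recv_ratchet_pk: Optional[bytes]
    send_count: int
    prev_send_count: int
    ratchet_pending: bool


def _wipe(buf: bytearray) -> None:
    buf[:] = bytes(len(buf))


def perform_kem_ratchet_send(
    state: SendState,
    keygen: Callable[[], Tuple[bytes, bytearray]],
    encaps: Callable[[bytes], Tuple[bytes, bytearray]],
    kdf_root: Callable[[bytes, bytearray], Tuple[bytes, bytes]],
) -> bytes:
    peer_pk = state.recv_ratchet_pk
    if peer_pk is None:
        raise InternalError("recv_ratchet_pk missing: structurally invalid state")

    # Phase 1: every fallible step works on locals; `state` is untouched.
    new_pk, new_sk = keygen()
    ct, ss = encaps(peer_pk)
    try:
        new_root, new_epoch_key = kdf_root(state.root_key, ss)
    finally:
        _wipe(ss)  # ss is only the IKM for KDF_Root; wipe once it has returned

    # Phase 2: commit all seven fields, in the pseudocode's order.
    state.root_key, state.send_epoch_key = new_root, new_epoch_key
    if state.send_ratchet_sk is not None:      # None only on Bob's first send
        _wipe(state.send_ratchet_sk)           # zeroize the old sk before replacing it
    state.send_ratchet_sk = new_sk
    state.send_ratchet_pk = new_pk
    state.prev_send_count = state.send_count   # MUST precede the reset below
    state.send_count = 0
    state.ratchet_pending = False
    return ct
```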

**Root KDF:**

```
function KDF_Root(root_key, kem_shared_secret):
    output = HKDF(
        salt = root_key,
        ikm  = kem_shared_secret,  // 32 bytes — the full X-Wing combiner output (SHA3-256 of
                                   // ss_M ‖ ss_X ‖ ct_X ‖ pk_X ‖ XWingLabel, §8.2) — NOT the
                                   // X25519 DH output (ss_X) or ML-KEM shared secret (ss_M) alone
        info = "lo-ratchet-v1",   // raw 13-byte UTF-8 — no length prefix (unlike §5.4 KDF_KEX info which uses len(x)||x per field)
        len  = 64
    )
    return (output[0..32], output[32..64])    // (new_root, new_epoch_key)
                                              // bytes [0..32]  → new root_key (replaces state.root_key)
                                              // bytes [32..64] → new epoch_key (becomes state.send_epoch_key or state.recv_epoch_key)
                                              // Swapping the two halves is a silent wrong-output failure:
                                              // AEAD still succeeds between two peers that both swap them
                                              // (they derive the same wrong keys), but the root key evolves
                                              // along a different trajectory than the spec, breaking
                                              // interoperability with any correct implementation. F.2
                                              // (§Appendix F) provides labeled vectors for both output halves.
```

**Why `root_key` is HKDF salt (not IKM)**: Placing the existing chain state (`root_key`) as the HKDF salt means the extraction phase is keyed by accumulated entropy from all prior KEM ratchet steps. Even a weak `kem_shared_secret` (e.g., from a biased KEM or compromised randomness) cannot dominate the extraction — the pre-existing root entropy conditions the PRK. This follows Signal's ratchet KDF design. By contrast, `KDF_KEX` (§5.4 Step 4) uses a zero salt because there is no prior chain state at session establishment — the zero salt is the RFC 5869 §2.2 default for "no prior keying material."

**`KDF_Root` is infallible in the reference implementation**: HKDF-Expand output length is bounded by 255 × HashLen (RFC 5869 §2.3). For SHA3-256 (HashLen = 32), the maximum is 255 × 32 = 8160 bytes. `KDF_Root` requests exactly 64 bytes, which is well within this limit. The operation therefore cannot fail due to output-length overflow in a correct SHA3-256 HKDF implementation. Reimplementers using a fallible HKDF API (e.g., returning an error for length > max) will never observe that error on this call path; if they do, it indicates an implementation bug and MUST be treated as `Internal`, not surfaced to callers.
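A standard-library sketch of `KDF_Root` on top of an RFC 5869 HKDF built from HMAC-SHA3-256. The helper names are assumptions. The length guard corresponds to the 255 × HashLen bound discussed above and can never trigger for the 64-byte request.

```python
import hashlib
import hmac
from typing import Tuple

HASH_LEN = 32                    # SHA3-256 output size
RATCHET_INFO = b"lo-ratchet-v1"  # raw 13 bytes, no length prefix


def hkdf_sha3_256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF instantiated with HMAC-SHA3-256."""
    if length > 255 * HASH_LEN:  # 8160 bytes; unreachable for KDF_Root's 64
        raise ValueError("HKDF output length exceeds 255 * HashLen")
    prk = hmac.new(salt, ikm, hashlib.sha3_256).digest()   # Extract
    okm, block = b"", b""
    for i in range(1, -(-length // HASH_LEN) + 1):         # Expand: T(1), T(2), ...
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha3_256).digest()
        okm += block
    return okm[:length]


def kdf_root(root_key: bytes, kem_shared_secret: bytes) -> Tuple[bytes, bytes]:
    """Return (new_root_key, new_epoch_key); the order must not be swapped."""
    out = hkdf_sha3_256(salt=root_key, ikm=kem_shared_secret,
                        info=RATCHET_INFO, length=64)
    return out[:32], out[32:]
```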

### 6.5 Message Encryption

```
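function Encrypt(state, plaintext):
    // Fingerprints come from the state (fixed at init, §6.2), never from the caller.
    sender_fp    = state.local_fp
    recipient_fp = state.remote_fp

    // Liveness guard: all-zero root_key indicates a dead (post-reset) session.
    // Constant-time comparison: root_key is secret material.
    if state.root_key == 0x00{32}:
        raise InvalidData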

    // Guard against nonce reuse before any mutation.
    // Idempotent: repeated calls with send_count at u32::MAX return ChainExhausted
    // without modifying state — no progressive corruption on retry.
    if state.send_count == u32::MAX:
        raise ChainExhausted
        // Terminal state: when send_count == u32::MAX AND ratchet_pending == true,
        // the pending KEM ratchet step (which would reset send_count to 0) never
        // fires — the ChainExhausted guard blocks before the ratchet_pending check.
        // The session is permanently un-sendable. Full session reset (§6.10) and
        // new LO-KEX exchange required. A reimplementer who assumes the pending
        // ratchet "unblocks" the guard will deadlock silently.

    // Perform ratchet if needed (no send chain yet, or direction changed).
    // These are two independent conditions, NOT interchangeable:
    //   - send_ratchet_pk is None: Bob's initial state (never sent) — both conditions true
    //   - ratchet_pending: direction changed since last send — send_ratchet_pk is Some
    // After the first KEM ratchet step, send_ratchet_pk is always Some; only
    // ratchet_pending toggles. Collapsing them into a single flag breaks Bob's first send.
    kem_ct = None
    if state.send_ratchet_pk is None or state.ratchet_pending:
        kem_ct = PerformKEMRatchetSend(state)

    msg_key = KDF_MsgKey(state.send_epoch_key, state.send_count)

    nonce = 0x00{20} || big_endian_32(state.send_count)
    // SAME counter: KDF_MsgKey and nonce derivation both use state.send_count.
    // These are NOT separate counters. A reimplementer who uses a separate
    // nonce_counter (drifting from send_count) breaks AEAD authentication: the
    // receiver derives msg_key from header.n but constructs the nonce from header.n
    // as well — both use the same wire value. If the sender's nonce_counter diverges
    // from send_count, the nonce used for encryption differs from what the receiver
    // expects, producing AeadFailed with no diagnostic. If nonce_counter eventually
    // aliases send_count at a prior value, nonce reuse follows.

    header = {
        ratchet_pk:  state.send_ratchet_pk,
        kem_ct:      kem_ct,
        n:           state.send_count,
        pn:          state.prev_send_count
    }

    header_bytes = encode_ratchet_header(header)
    // sender_fp = local party's fingerprint, recipient_fp = remote party's fingerprint.
    // These are reversed on the decrypt side (§6.6) where sender_fp = remote.
    aad = "lo-dm-v1" || sender_fp (32 B) || recipient_fp (32 B) || header_bytes

    ciphertext = AEAD(msg_key, nonce, plaintext, aad)
    if ciphertext is Error:
        // Session-fatal: zeroize all key material to prevent nonce reuse on retry.
        reset(state)
        zeroize(msg_key)
        raise AeadFailed

    state.send_count += 1
    zeroize(msg_key)

    return (header, ciphertext)
```

The nonce encodes `send_count` in the last 4 bytes of a 24-byte buffer (bytes 0-19 are zero). Each `(msg_key, nonce)` pair is unique because the counter is distinct per epoch position. **When `send_count = 0` (the first message of a post-ratchet epoch — e.g., Bob's very first send after `ratchet_pending` clears), this produces a 24-byte all-zero nonce. This is safe and expected: the epoch key is fresh from `KDF_Root`, so the `(epoch_key, nonce)` pair is unique globally even though the nonce bytes are all zero. Implementations MUST NOT add a defensive guard that rejects all-zero nonces — doing so breaks every first post-ratchet message.**
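The nonce and AAD layouts can be sketched as byte-level helpers. `message_nonce` and `dm_aad` are hypothetical names, and `header_bytes` is treated as the opaque output of `encode_ratchet_header`, whose encoding is not covered here.

```python
NONCE_LEN = 24
AAD_LABEL = b"lo-dm-v1"


def message_nonce(send_count: int) -> bytes:
    """0x00{20} || big_endian_32(send_count): the same counter that feeds KDF_MsgKey."""
    # int.to_bytes raises OverflowError if send_count does not fit in a u32.
    return bytes(20) + send_count.to_bytes(4, "big")


def dm_aad(sender_fp: bytes, recipient_fp: bytes, header_bytes: bytes) -> bytes:
    """'lo-dm-v1' || sender_fp || recipient_fp || encoded ratchet header."""
    if len(sender_fp) != 32 or len(recipient_fp) != 32:
        raise ValueError("fingerprints must be 32 bytes")
    return AAD_LABEL + sender_fp + recipient_fp + header_bytes
```

For `send_count = 0`, `message_nonce` returns 24 zero bytes, which is the expected and safe value for the first message of an epoch.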

**Encrypt atomicity**: Unlike decrypt (§6.6), encrypt does not use snapshot/rollback. Atomicity is achieved through ordering — all fallible operations (ChainExhausted check, KEM keygen/encapsulate/KDF_Root, KDF_MsgKey, AEAD) execute before `send_count` is incremented. If any pre-AEAD operation fails, no state has been mutated. The KEM ratchet step (§6.4) is the exception: it mutates `root_key`, `send_epoch_key`, `send_ratchet_sk/pk`, `prev_send_count`, `send_count`, and `ratchet_pending` before AEAD runs. If AEAD fails after a successful KEM ratchet step, these mutations cannot be safely unwound, so the session is zeroized via `reset()` (see below). A reimplementer MUST NOT mutate `send_count` optimistically before AEAD succeeds — doing so would consume a counter on failure, eventually causing nonce reuse.

**Cooperative multitasking within the atomicity window**: The exclusive-access model (§6.2) prevents concurrent calls on the same ratchet state, but it does not prevent the owning coroutine or goroutine from yielding during an encrypt call. In async/await (Rust, Python, JavaScript), goroutines (Go), or green threads (Erlang, Ruby Fiber), a yield between `PerformKEMRatchetSend` (which mutates `root_key`, `send_epoch_key`, etc.) and `send_count += 1` exposes a half-updated state to any `to_bytes()` call made at that yield point. Deserialization guards do not reliably catch this. Guard 8 (§6.8) rejects only the combination `send_count == 0` with `send_ratchet_sk` present. A snapshot taken partway through `PerformKEMRatchetSend`, before `send_count` is reset, passes guard 8 yet pairs the new `send_epoch_key` with the old `send_count`; a snapshot taken after AEAD but before `send_count += 1` on an ordinary send reloads with a counter whose nonce has already been used, so the next post-load encrypt reuses that nonce. The mitigation: `to_bytes` MUST NOT be called within the encrypt call's atomicity window. Callers MUST either hold the ratchet under a mutex that covers the full encrypt call (not just the state mutation), or structure async code so that `to_bytes` is never called in a concurrent task on the same ratchet state. Rust's `&mut self` on `encrypt` prevents this by construction (a mutable reference cannot be aliased); Go/Python/C callers must manage this explicitly.

**PerformKEMRatchetSend ordering is a correctness requirement**: The retry guarantee, namely that an `Internal` error caused by CSPRNG failure is safe to retry because no state was mutated, holds only if `PerformKEMRatchetSend` completes all fallible operations (keygen, encapsulate, KDF_Root) before writing any field. The pseudocode preserves this: `XWing.KeyGen()` and `XWing.Encaps()` both complete before any of the seven fields are written (first `(new_pk, new_sk) = XWing.KeyGen()`, then `(ct, ss) = XWing.Encaps()`, then all state writes). A reimplementer who interleaves field mutations with fallible operations (for example, storing the new ratchet keypair immediately after keygen but before encapsulation) loses the retry guarantee and must implement explicit snapshot/rollback for the interleaved fields.
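A two-phase shape makes the ordering requirement explicit: phase one performs every fallible call into locals, and phase two writes all seven fields and cannot fail. The sketch below assumes hypothetical closures standing in for `XWing.KeyGen()`, `XWing.Encaps()` (to the peer's `recv_ratchet_pk`), and `KDF_Root`; real key types, sizes, and the zeroization of `ss` are elided.

```rust
/// Hypothetical error for this sketch: CSPRNG or KDF failure.
#[derive(Debug)]
enum RatchetError {
    Internal,
}

/// The seven send-side fields written by the KEM ratchet send step.
struct SendSide {
    root_key: [u8; 32],
    send_epoch_key: [u8; 32],
    send_ratchet_sk: Option<Vec<u8>>,
    send_ratchet_pk: Option<Vec<u8>>,
    prev_send_count: u32,
    send_count: u32,
    ratchet_pending: bool,
}

/// Two-phase KEM ratchet send step; returns the KEM ciphertext for the header.
fn perform_kem_ratchet_send(
    st: &mut SendSide,
    keygen: impl FnOnce() -> Result<(Vec<u8>, Vec<u8>), RatchetError>,
    encaps: impl FnOnce() -> Result<(Vec<u8>, [u8; 32]), RatchetError>,
    kdf_root: impl FnOnce(&[u8; 32], &[u8; 32]) -> Result<([u8; 32], [u8; 32]), RatchetError>,
) -> Result<Vec<u8>, RatchetError> {
    // Phase 1: every fallible call, results held in locals. Each `?` returns
    // early with `st` untouched, so ratchet_pending stays true for a retry.
    let (new_pk, new_sk) = keygen()?;
    let (kem_ct, ss) = encaps()?;
    let (new_root, new_epoch_key) = kdf_root(&st.root_key, &ss)?;
    // (A real implementation would zeroize `ss` here.)

    // Phase 2: infallible commit of all seven fields.
    st.prev_send_count = st.send_count; // MUST precede the reset below
    st.send_count = 0;
    st.root_key = new_root;
    st.send_epoch_key = new_epoch_key;
    st.send_ratchet_pk = Some(new_pk);
    st.send_ratchet_sk = Some(new_sk);
    st.ratchet_pending = false;
    Ok(kem_ct)
}
```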

**Encrypt atomicity is a documentation-only guarantee, not a structural one**: The decrypt path enforces rollback integrity via an explicit `save_recv_state` / `restore_recv_state` snapshot, making the invariant structurally visible. The encrypt path has no equivalent snapshot mechanism — its safety guarantee is maintained solely by the ordering of operations in the pseudocode and implementation. A future refactor that reorders operations (e.g., moving `send_count += 1` earlier, or splitting `PerformKEMRatchetSend` across multiple steps with interleaved state mutations) would silently break the retry and nonce-reuse guarantees with no compile-time or runtime protection. Security reviewers auditing the implementation should verify the operation order explicitly, and any refactor touching the encrypt path MUST maintain: (1) all `XWing.KeyGen()`/`XWing.Encaps()` calls complete before any state field is written; (2) `send_count` is incremented only after `AEAD` succeeds; (3) if AEAD fails after any state mutation, `reset()` is called before returning `AeadFailed`.

**`Internal` from encrypt**: When `ratchet_pending = true`, `encrypt()` calls `XWing.KeyGen()` and `XWing.Encaps()` as part of the KEM ratchet step (§6.4). Both operations consume CSPRNG randomness; CSPRNG exhaustion (structurally unreachable on standard OSes, but possible on embedded targets or under sandbox misconfiguration) propagates as `Internal`. This failure occurs before any state mutation — no KEM ratchet step has been applied, no counter has been consumed, and the session is unchanged. The call is safe to retry. `ratchet_pending` retains its pre-call value (`true`) — the next `encrypt()` call re-enters `PerformKEMRatchetSend` automatically without any caller intervention. The caller MUST NOT manually clear or re-set `ratchet_pending` after an `Internal` error. The documented encrypt error table is: `ChainExhausted` (counter at limit), `AeadFailed` (session-fatal, see below), and `Internal`. `Internal` has two sources with different retry semantics: (1) CSPRNG failure in `XWing.KeyGen()` / `XWing.Encaps()` — occurs before any state mutation, **retryable**; (2) `recv_ratchet_pk = None` inside `PerformKEMRatchetSend` — the KEM ratchet step requires the peer's last-received ratchet public key, which is absent only if the ratchet was deserialized from a structurally invalid blob. This variant is **not retryable** — the session is structurally inconsistent and must be reset. Callers who treat all `Internal` returns from encrypt as retryable will loop indefinitely on the second variant. See §6.6 for the analogous decrypt error table.
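The two `Internal` sources call for different caller behavior. The sketch below assumes a hypothetical error type that distinguishes them (the actual API may expose only a single `Internal` variant) and adds a bounded retry helper, which stays safe even when the causes cannot be told apart: a structurally broken session fails after at most `max_attempts` calls instead of looping forever.

```rust
/// Hypothetical split of `Internal` by cause.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InternalCause {
    /// CSPRNG failure in XWing.KeyGen/Encaps: raised before any state mutation.
    Csprng,
    /// recv_ratchet_pk missing inside PerformKEMRatchetSend: broken session.
    MissingRecvRatchetPk,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EncryptError {
    ChainExhausted,
    AeadFailed,
    Internal(InternalCause),
}

impl EncryptError {
    /// Only the CSPRNG variant of `Internal` is safe to retry.
    fn is_retryable(self) -> bool {
        matches!(self, EncryptError::Internal(InternalCause::Csprng))
    }
}

/// Bounded retry: gives up after `max_attempts`, and immediately on any
/// non-retryable error.
fn encrypt_with_retry<T>(
    max_attempts: u32,
    mut attempt: impl FnMut() -> Result<T, EncryptError>,
) -> Result<T, EncryptError> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut last_err = EncryptError::Internal(InternalCause::Csprng);
    for _ in 0..max_attempts {
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) if e.is_retryable() => last_err = e,
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}
```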

**Session-fatal encrypt failure**: AEAD encryption failure is treated as session-fatal: all session key material (root key, send/receive epoch keys, ratchet keys) is zeroized, making the state permanently unusable. The fingerprints (`local_fp`, `remote_fp`) are also zeroized as part of `reset()`; see §6.10 for the caller obligation to preserve fingerprints independently before reset. The caller must discard the session. The send counter is only incremented on success (after AEAD encryption), so AEAD failure does not consume a counter, and the defense-in-depth zeroization rules out nonce reuse from retry attempts. In practice, XChaCha20-Poly1305 encryption fails only for plaintexts beyond the cipher's maximum input length (about 256 GiB, the same bound noted in §6.6), which does not occur with well-formed input. Permanent unusability comes from the liveness guard (§6.2) rather than a separate `is_dead` flag: `reset()` zeros `root_key`, and subsequent encrypt/decrypt calls detect the all-zero root key via constant-time comparison and return `InvalidData`.
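A sketch of the liveness guard and the `reset()` that arms it, assuming a hypothetical `Session` reduced to its `root_key`. The all-zero test ORs every byte rather than returning early, and the wipe uses volatile writes so it is not optimized away. Neither carries a formal constant-time or zeroization guarantee from the Rust standard library, which is why production code would typically rely on vetted crates such as `subtle` and `zeroize`.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// True iff all 32 bytes are zero; ORs every byte, no early exit.
fn is_all_zero(key: &[u8; 32]) -> bool {
    let acc = key.iter().fold(0u8, |acc, &b| acc | b);
    acc == 0
}

/// Overwrite with volatile writes so the compiler cannot elide the wipe.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference to a single byte.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

#[derive(Debug, PartialEq)]
enum SessionError {
    InvalidData,
}

/// Hypothetical session holding only the field the liveness guard reads.
struct Session {
    root_key: [u8; 32],
}

impl Session {
    /// Liveness guard run at the top of encrypt() and decrypt().
    fn check_alive(&self) -> Result<(), SessionError> {
        if is_all_zero(&self.root_key) {
            Err(SessionError::InvalidData)
        } else {
            Ok(())
        }
    }

    /// Session-fatal reset: zeroing root_key is what marks the session dead.
    fn reset(&mut self) {
        wipe(&mut self.root_key);
    }
}
```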

**Pseudocode parameters vs. API**: `sender_fp` and `recipient_fp` are shown as explicit parameters in the pseudocode for clarity of AAD construction. In the actual API, they are stored in `RatchetState` at `init_alice`/`init_bob` time (as `local_fp`/`remote_fp`) and are not caller-supplied per call. The Rust signature is `encrypt(&mut self, plaintext: &[u8])` — no fingerprint parameters. The CAPI `soliton_ratchet_encrypt` similarly takes no fingerprint parameters. A reimplementer who exposes per-call fingerprint parameters allows callers to pass different fingerprints on different calls, weakening the AAD binding guarantee.

**Dropped encrypt results orphan counter slots**: A successful `encrypt()` call advances `send_count` irrevocably. If the caller discards the returned `(header, ciphertext)` without transmitting it (e.g., due to a transport-layer error after AEAD succeeded), the counter slot is permanently consumed. The receiver will never see a message at that counter — counter-mode is tolerant of holes, so no error occurs on the receiver side. However, the caller MUST NOT re-encrypt the same plaintext on transport failure: a second `encrypt()` call uses the *next* counter, not the one that was dropped. The Rust API marks `encrypt()` with `#[must_use]`, producing a compiler warning if the result is discarded. **The CAPI `soliton_ratchet_encrypt` carries `__attribute__((warn_unused_result))` in the generated header**, providing the same compiler-level signal in C and C++. Languages without this feature (Go, Python) must enforce this obligation via documentation and caller discipline. **Retry-loop hazard**: a caller who, on transport failure, calls `encrypt()` again to "retransmit" is producing a new message at a new counter — not a retransmission of the original. The recipient will receive two messages at two different counter values. If the original message is ever delivered, no replay detection fires (both counters are distinct), and both messages are accepted. To retransmit, the caller must buffer and resend the *already-encrypted* `(header, ciphertext)` output, not invoke `encrypt()` again.
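To retransmit safely, the caller keeps the encrypted output and resends those exact bytes. The hypothetical `Outbox` below (not part of the library) queues each `(header, ciphertext)` pair as soon as `encrypt()` returns and retries delivery from the queue, so a transport failure never leads to a second `encrypt()` call for the same plaintext.

```rust
use std::collections::VecDeque;

/// One already-encrypted message: the exact output of a successful encrypt().
/// `counter` mirrors header.n and is kept only for logging in this sketch.
#[derive(Clone, Debug, PartialEq)]
struct Sealed {
    counter: u32,
    header: Vec<u8>,
    ciphertext: Vec<u8>,
}

/// Hypothetical outbound queue: encrypt once, resend the same bytes until delivered.
struct Outbox {
    pending: VecDeque<Sealed>,
}

impl Outbox {
    fn new() -> Self {
        Outbox { pending: VecDeque::new() }
    }

    /// Queue the result of one encrypt() call before any transport I/O.
    fn push(&mut self, msg: Sealed) {
        self.pending.push_back(msg);
    }

    /// Try to deliver the oldest message. On transport error it stays queued and
    /// the next call resends the identical bytes; encrypt() is never re-invoked.
    /// Returns Ok(false) when nothing is pending.
    fn flush_one<E>(&mut self, send: impl FnOnce(&Sealed) -> Result<(), E>) -> Result<bool, E> {
        match self.pending.front() {
            None => Ok(false),
            Some(front) => {
                send(front)?;
                self.pending.pop_front();
                Ok(true)
            }
        }
    }
}
```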

**Counter gaps are a normal protocol property**: The receiver MUST NOT treat counter gaps (missing entries in the `n` sequence) as errors. Gaps caused by dropped encrypt results are indistinguishable from gaps caused by lost network packets — both appear as missing entries in the counter sequence, and neither leaves any trace in the receiver's state. An application that uses counter gaps for application-layer loss detection, or a reimplementer who adds receiver-side gap-rejection logic, would break the protocol for any transport with unreliable delivery.

**Critical**: The AAD includes the canonical encoding of the full ratchet header. This prevents an attacker from:
- Substituting `ratchet_pk` (would poison recipient's ratchet state).
- Modifying `pn` (the receiver derives nothing from `pn` in this protocol, see §6.6, so binding it is purely an integrity measure: a tampered value fails authentication rather than being silently accepted).
- Injecting a fake `kem_ct` (would corrupt root key derivation).

### 6.6 Message Decryption

**Helper function definitions used in the pseudocode below:**

- **`save_recv_state(state) → snapshot`**: Captures all receive-side state fields: `recv_epoch_key`, `recv_count`, `recv_ratchet_pk`, `prev_recv_epoch_key`, `prev_recv_ratchet_pk`, `recv_seen`, `prev_recv_seen`, `root_key`, and `ratchet_pending`. `recv_count` must be included because the new-epoch path sets it to 0 (`state.recv_count = 0` in the new-epoch branch of `Decrypt` below) before AEAD runs; a failed AEAD would leave `recv_count` zeroed unless the snapshot captures and restores it. Does NOT capture send-side state (`send_epoch_key`, `send_ratchet_sk`, `send_ratchet_pk`, `send_count`, `prev_send_count`); those are mutated only by `encrypt()` and are not part of the decrypt rollback scope. **`epoch` is NOT in the snapshot and is NOT mutated by `decrypt()`**: `epoch` is a serialization counter incremented only by `to_bytes()` (§6.7) to version the stored blob; it plays no role in the cryptographic operations of `decrypt()` and does not appear in any message or AAD. A reimplementer who includes `epoch` in the snapshot or who increments `epoch` inside `decrypt()` would desync the blob version counter from the actual serialize-call count, causing `UnsupportedVersion` errors on reload after a decrypt that was not followed by a serialize.

- **`restore_recv_state(state, snapshot)`**: Restores all fields captured by `save_recv_state`, reverting any decrypt-path state mutations. Called on any failure after the snapshot is taken. A no-op in terms of effect if no mutations occurred before the failure.

- **Epoch identification**: The `current_epoch` and `prev_epoch` boolean assignments in the pseudocode below ARE the epoch identification step. The reference implementation extracts this routing logic into a private `identify_epoch()` helper method; the pseudocode inlines it for presentation clarity. Comments in the pseudocode that reference `identify_epoch()` describe this inline routing block.
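The routing step can be sketched as a standalone function. This is an illustrative stand-in for the private `identify_epoch()` helper, not its actual signature: absent keys are modeled with `Option` (never an all-zero sentinel), the previous epoch is checked first, and the byte comparison avoids early exit. The hand-rolled `ct_eq` is an assumption made for self-containment; a vetted crate such as `subtle` is the usual choice.

```rust
/// Routing outcome of the epoch-identification step.
#[derive(Debug, PartialEq, Eq)]
enum Epoch {
    Previous,
    Current,
    New,
}

/// Equality that inspects every byte instead of stopping at the first
/// mismatch. Lengths are public (ratchet keys are a fixed 1216 bytes).
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let diff = a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y));
    diff == 0
}

/// Hypothetical stand-in for identify_epoch(). `None` keys can never match,
/// and the previous epoch wins if both stored keys would match.
fn identify_epoch(header_pk: &[u8], recv_pk: Option<&[u8]>, prev_recv_pk: Option<&[u8]>) -> Epoch {
    if let Some(prev) = prev_recv_pk {
        if ct_eq(header_pk, prev) {
            return Epoch::Previous;
        }
    }
    match recv_pk {
        Some(cur) if ct_eq(header_pk, cur) => Epoch::Current,
        _ => Epoch::New,
    }
}
```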

```
function Decrypt(state, header, ciphertext, sender_fp, recipient_fp):
    // Liveness guard: all-zero root_key indicates a dead (post-reset) session.
    // Constant-time comparison — root_key is secret material.
    if root_key == 0x00{32}:
        raise InvalidData

    // Counter exhaustion guard — BEFORE any KEM ratchet step.
    // recv_count is updated as max(recv_count, n + 1): with n = u32::MAX,
    // n + 1 wraps to 0 in unsigned arithmetic, silently resetting the
    // high-water mark and making all previously-seen counters appear unseen
    // (replay window collapse). Placing this before epoch-specific logic
    // prevents any cryptographic state mutation. NOTE: in this pseudocode,
    // the snapshot is allocated below (at `save_recv_state`, after
    // `identify_epoch()` and the pre-mutation structural checks). The
    // reference implementation allocates the snapshot before this guard —
    // in Rust the guard fires after the snapshot is already taken. Both
    // orderings are correct since no state mutations precede this guard;
    // rollback is a no-op either way. See Appendix E for the failure table
    // entry that documents both orderings.
    // ChainExhausted (not InvalidData) mirrors the send-side guard (§6.5):
    // the counter space is exhausted, not a structural format error.
    if header.n == u32::MAX:
        raise ChainExhausted

    // Identify which epoch this message belongs to.
    // Epoch routing depends solely on header.ratchet_pk — the presence or
    // absence of header.kem_ct is NOT examined until the new-epoch path is
    // confirmed. The three cases are evaluated in priority order (if/else if/else),
    // not as independent predicates.
    // IMPLEMENTATION REQUIREMENT: Implementations MUST NOT add a pre-AEAD structural
    // check that rejects current-epoch or previous-epoch messages carrying a kem_ct.
    // A message that matches the current or previous epoch MAY carry a kem_ct field
    // (e.g., a retransmitted ratchet-step message replayed with different routing).
    // Rejecting such a message as `InvalidData` before AEAD runs would violate the
    // oracle-collapse requirement (§12): `InvalidData` arrives in nanoseconds while
    // AEAD takes microseconds, creating a timing oracle that distinguishes "has kem_ct
    // in wrong context" from "authentication failed." The kem_ct is simply ignored on
    // non-new-epoch paths; AEAD authentication provides the correct rejection if the
    // message is invalid for any reason. A reimplementer adding a "kem_ct MUST be
    // absent for same-epoch messages" guard MUST ensure it is collapsed to `AeadFailed`
    // (not returned as `InvalidData`) if they choose to add it.
    // CONSTANT-TIME REQUIREMENT: Each comparison that is actually executed MUST
    // use constant-time equality (e.g., subtle::ConstantTimeEq). The risk is NOT
    // leaking which epoch a message belongs to — header.ratchet_pk is cleartext,
    // so the epoch type is already publicly observable. The actual risk is leaking
    // the byte values of the stored recv_ratchet_pk or prev_recv_ratchet_pk via
    // a timing side-channel: a crafted probe message with a hand-crafted ratchet_pk
    // that shares a prefix with the stored key can measure whether a partial match
    // shortens or lengthens the comparison, recovering the stored key byte-by-byte.
    // "Both comparisons" is an approximation — the implementation evaluates
    // prev_epoch first and short-circuits (early return) if it matches; current_epoch
    // is only reached if prev_epoch is false. Each comparison that EXECUTES must be
    // constant-time; which comparisons execute depends on state.
    // See Appendix E.
    current_epoch = (state.recv_ratchet_pk is Some AND
                     header.ratchet_pk == state.recv_ratchet_pk)
    prev_epoch = (state.prev_recv_ratchet_pk is Some AND
                  header.ratchet_pk == state.prev_recv_ratchet_pk)

    // When recv_ratchet_pk is None (Alice's initial state, before any receive),
    // current_epoch is always false (None ≠ any public key) and prev_epoch is
    // always false (prev_recv_ratchet_pk is also None). Every incoming message
    // takes the new-epoch KEM ratchet path until the first successful decrypt
    // establishes recv_ratchet_pk. Implementations in languages without native
    // option types (C, Go) must represent None as a distinct sentinel — not
    // all-zeros — and explicitly return false for both epoch checks.
    //
    // **Why `current_epoch` also requires the `is Some` guard**: Both predicates
    // are structurally symmetric. Without the guard, a language using an all-zero
    // sentinel for "absent" public key would evaluate `header.ratchet_pk == 0x00{1216}`
    // as true whenever the header carries an all-zero ratchet_pk — routing the message
    // to the current-epoch path even though no current-epoch key has been established.
    // The guard closes this: if `recv_ratchet_pk is None`, `current_epoch` is false
    // regardless of the header's ratchet_pk value, and the message correctly takes
    // the new-epoch KEM ratchet path. Rust handles this naturally via `Option<T>`;
    // C/Go/Python implementations MUST use a distinct non-zero sentinel or an explicit
    // boolean "is_set" flag, NOT an all-zeros value, to represent the absent state.

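    // Structural check: previous-epoch messages require a retained epoch key.
    // Conditioned on prev_epoch alone, matching the branch priority below
    // (prev_epoch wins even if current_epoch is also true).
    if prev_epoch AND state.prev_recv_epoch_key is None:
        raise InvalidData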

    // Snapshot all receive-side state for rollback on any failure.
    // (see "State rollback on failure" below)
    snapshot = save_recv_state(state)

    // --- Epoch-specific key derivation (priority matching) ---
    // Previous-epoch check takes priority over current-epoch: this is
    // a correctness requirement, not a tie-breaker for a rare edge case.
    // Without this ordering, a message matching both predicates (possible
    // in certain initial-state configurations where prev_recv_ratchet_pk
    // == recv_ratchet_pk) would be routed nondeterministically.
    // This configuration is unreachable through honest operation — the
    // KEM ratchet always rotates old → prev before writing new → current,
    // so the two keys are always distinct. The priority is a correctness
    // invariant against crafted or corrupted blobs, not a common case
    // requiring special handling.
    // See Abstract.md §5.4 for formal justification.
    if prev_epoch:
        msg_key = KDF_MsgKey(state.prev_recv_epoch_key, header.n)
    else if NOT current_epoch:
        // New epoch: KEM ratchet step.
        if header.kem_ct is None:
            raise InvalidData
        kem_ct = header.kem_ct
        // send_ratchet_sk is None when the party has never sent (e.g.,
        // Bob's initial state before his first encrypt). A forged or
        // misrouted message with an unrecognized ratchet_pk triggers
        // the new-epoch path against this state. Without this guard,
        // a reimplementer might unwrap None/null or silently use a
        // zero key instead of returning an error.
        if state.send_ratchet_sk is None:
            raise InvalidData
        // Decapsulate with send_ratchet_sk: the sender encapsulated to our
        // send_ratchet_pk (which we published in our last outgoing header),
        // so the matching secret key is send_ratchet_sk, not recv_ratchet_sk.
        ss = XWing.Decaps(state.send_ratchet_sk, kem_ct)
        // In lo-crypto-v1, XWing.Decaps never fails cryptographically — ML-KEM
        // uses implicit rejection (§8.4) and X25519 always produces a 32-byte
        // result. A structural DecapsulationFailed (wrong kem_ct length) is
        // reachable if the ciphertext is malformed. Both DecapsulationFailed
        // and AeadFailed trigger the same snapshot rollback: restore_recv_state
        // before returning. No state mutations have occurred yet at this point —
        // epoch rotation (saving previous epoch keys, overwriting recv_epoch_key,
        // resetting recv_count, clearing recv_seen) follows below. Rollback is
        // applied unconditionally by the snapshot-and-restore mechanism regardless.
        // DecapsulationFailed on the decrypt path is NOT session-fatal — unlike
        // AeadFailed on the encrypt path (§6.5), which zeroizes all key material.
        // The snapshot rollback restores the session to its pre-call state,
        // and the caller can retry or discard the message. Treating decrypt-side
        // DecapsulationFailed as session-fatal (by analogy with encrypt-side
        // AeadFailed) would incorrectly terminate sessions on malformed messages.

        // Rotate previous epoch: current becomes previous.
        // Only save the previous epoch if recv_ratchet_pk was set (meaningful
        // current epoch exists). On the first KEM ratchet (Alice's init state),
        // recv_ratchet_pk is None — there is no previous epoch to save.
        if state.recv_ratchet_pk is Some:
            state.prev_recv_epoch_key = state.recv_epoch_key
            // ZEROIZATION NOTE: after this assignment, `state.prev_recv_epoch_key`
            // holds the old epoch key and `state.recv_epoch_key` still holds a copy.
            // Both copies must be zeroized at the end of their respective lifetimes:
            // `prev_recv_epoch_key` when a later KEM ratchet step overwrites it (or
            // sets it to None); `recv_epoch_key` when the `KDF_Root` assignment in
            // "Derive new current epoch" below overwrites it. In Rust,
            // `prev_recv_epoch_key` is `Option<Zeroizing<[u8; 32]>>`, so the overwrite
            // or drop zeroizes the old epoch key it holds. `recv_epoch_key` and
            // `send_epoch_key` are plain `[u8; 32]` (not `Zeroizing` wrappers:
            // `[u8; 32]` is `Copy`, so `Zeroizing::new(val)` would copy rather than
            // move, leaving the original on the stack). For these plain fields the
            // zeroization responsibility lies with the code performing the `KDF_Root`
            // overwrite, not with a wrapper type. C/Go/Python implementations must
            // explicitly zeroize the old `recv_epoch_key` (and any temporaries holding
            // it) before the `KDF_Root` assignment; see §6.4 for the analogous pattern.
            state.prev_recv_ratchet_pk = state.recv_ratchet_pk
            state.prev_recv_seen = state.recv_seen
            // ORDERING IS CRITICAL: `prev_recv_seen = recv_seen` MUST precede
            // `recv_seen = empty` (in "Derive new current epoch" below). If reversed, `empty` is copied
            // into `prev_recv_seen`, silently discarding all current-epoch replay
            // protection history. The error is undetectable — the new epoch proceeds
            // normally, but previous-epoch duplicate detection is disabled. Compare
            // §6.4's analogous `prev_send_count = send_count` MUST precede
            // `send_count = 0` ordering requirement.
        else:
            state.prev_recv_epoch_key = None
            state.prev_recv_ratchet_pk = None
            state.prev_recv_seen = empty

        // Derive new current epoch.
        (state.root_key, state.recv_epoch_key) = KDF_Root(state.root_key, ss)
        zeroize(ss)    // ss is no longer needed — zeroize immediately (mirrors §6.4).
                       // Rust's ZeroizeOnDrop covers this automatically; C/Go/Python
                       // reimplementers MUST zeroize explicitly after this line.
        state.recv_ratchet_pk = header.ratchet_pk
        state.recv_count = 0
        state.recv_seen = empty    // MUST follow `prev_recv_seen = recv_seen` above.
        state.ratchet_pending = true

        msg_key = KDF_MsgKey(state.recv_epoch_key, header.n)
    else:
        // Current epoch — the `else` branch of the three-way selector:
        //   if prev_epoch            → previous epoch (above)
        //   else if NOT current_epoch → new epoch / KEM ratchet (above)
        //   else                     → current epoch (here)
        // "Current epoch" means header.ratchet_pk == recv_ratchet_pk.
        msg_key = KDF_MsgKey(state.recv_epoch_key, header.n)

    // AEAD decryption — all epoch types converge here.
    plaintext = DecryptWithKey(msg_key, header, ciphertext, sender_fp, recipient_fp)
    // On AEAD failure: restore_recv_state(state, snapshot), raise AeadFailed.

    // --- Post-AEAD duplicate detection and recv_seen update ---
    // **Security requirement**: Duplicate detection MUST be post-AEAD, not pre-AEAD.
    // Pre-AEAD recv_seen lookup returns in nanoseconds for duplicates vs.
    // microseconds for non-duplicates (key derivation + AEAD), creating a timing
    // oracle that leaks recv_seen set membership. An attacker replaying messages
    // with different counter values can probe which counters have been successfully
    // decrypted. Post-AEAD ordering ensures both duplicate and non-duplicate
    // messages take identical time through key derivation + AEAD.
    // Duplicates succeed AEAD (same key/nonce/ciphertext) but the plaintext is
    // discarded.
    // Same priority as key derivation above: prev_epoch wins over current_epoch.
    if prev_epoch:
        // recv_count is NOT updated — it tracks the current epoch only.
        // Previous-epoch counters occupy a different sequence space;
        // unconditionally updating recv_count would break the invariant
        // that recv_seen entries are < recv_count (guard 17).
        if header.n in state.prev_recv_seen:
            restore_recv_state(state, snapshot)
            raise DuplicateMessage
        if |state.prev_recv_seen| >= MAX_RECV_SEEN:
            restore_recv_state(state, snapshot)
            raise ChainExhausted
        state.prev_recv_seen.add(header.n)
    else:
        // Current-epoch path. For messages that arrived as "new-epoch" (different
        // ratchet_pk), the KEM ratchet step above has already updated recv_ratchet_pk
        // to the new peer key — so by this point, the message's ratchet_pk matches
        // the current epoch and is handled here, not in the prev_epoch branch.
        // New-epoch messages follow the same post-AEAD state update as current-epoch
        // messages — recv_count is incremented and n is added to recv_seen. This is
        // not a separate case; the merge is intentional. Implementations that treat
        // new-epoch as an independent code path and omit the recv_count update leave
        // recv_count = 0 after the first new-epoch message, causing guard 17 failures
        // on subsequent serialization.
        if header.n in state.recv_seen:
            restore_recv_state(state, snapshot)
            raise DuplicateMessage
        if |state.recv_seen| >= MAX_RECV_SEEN:
            restore_recv_state(state, snapshot)
            raise ChainExhausted
        state.recv_seen.add(header.n)
        state.recv_count = max(state.recv_count, header.n + 1)
        // Off-by-one trap: `recv_count = header.n` (not `+ 1`) would silently fail
        // guard 17 after a new-epoch ratchet. After the new-epoch step sets recv_count = 0,
        // the first arriving message has n = 0 → recv_count = max(0, 0) = 0, but recv_seen
        // now contains {0}. Guard 17 requires all recv_seen entries to be < recv_count —
        // 0 < 0 is false → InvalidData on next serialization. The `+ 1` produces recv_count
        // = 1 with recv_seen = {0}, which satisfies the invariant (0 < 1).

    zeroize(msg_key)
    return plaintext

function DecryptWithKey(msg_key, header, ciphertext, sender_fp, recipient_fp):
    // Minimum ciphertext length: 16 bytes (Poly1305 tag, zero-length plaintext).
    // AEAD libraries that don't gracefully handle sub-tag-length inputs (e.g.,
    // OpenSSL EVP, some Go crypto/cipher implementations) may panic or return
    // non-standard errors. Guard explicitly before calling the AEAD primitive.
    if len(ciphertext) < 16:
        raise AeadFailed
    // No maximum ciphertext length is enforced at the Rust API level.
    // XChaCha20-Poly1305 accepts inputs up to ~256 GiB, so any sender (the
    // ciphertext is not authenticated until AEAD completes) can supply a
    // ciphertext of arbitrary size, causing the receiver to allocate the full
    // buffer before AeadFailed fires. The CAPI imposes a 256 MiB hard limit
    // (§13.2) that returns InvalidLength before this function is reached.
    // Rust-layer callers and non-CAPI reimplementers MUST impose their own upper
    // bound appropriate to their deployment context.
    // IMPORTANT: sender_fp is the REMOTE party's fingerprint (the message sender),
    // and recipient_fp is the LOCAL party's fingerprint (the message recipient).
    // This is the mirror of encrypt, where sender_fp = local and recipient_fp = remote.
    // A reimplementer who always uses (local_fp, remote_fp) for both directions
    // produces mismatched AAD and silent AEAD failures.
    nonce = 0x00{20} || big_endian_32(header.n)
    header_bytes = encode_ratchet_header(header)
    aad = "lo-dm-v1" || sender_fp || recipient_fp || header_bytes
    return AEAD-Decrypt(msg_key, nonce, ciphertext, aad)
```
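The nonce and AAD layout used by `DecryptWithKey` can be expressed directly. In the sketch below, `header_bytes` is assumed to already be the output of `encode_ratchet_header` (not shown), no fingerprint length is assumed, and the helper names are hypothetical. The direction-specific wrappers show why the receiver swaps roles: Alice's encrypt AAD and Bob's decrypt AAD are the same bytes only when each side passes its own `(local, remote)` pair.

```rust
/// Poly1305 tag length: the minimum valid ciphertext length.
const TAG_LEN: usize = 16;

/// 24-byte XChaCha20 nonce: 20 zero bytes followed by big-endian `header.n`.
fn dm_nonce(n: u32) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[20..].copy_from_slice(&n.to_be_bytes());
    nonce
}

/// AAD = "lo-dm-v1" || sender_fp || recipient_fp || header_bytes.
fn dm_aad(sender_fp: &[u8], recipient_fp: &[u8], header_bytes: &[u8]) -> Vec<u8> {
    let label = b"lo-dm-v1";
    let mut aad =
        Vec::with_capacity(label.len() + sender_fp.len() + recipient_fp.len() + header_bytes.len());
    aad.extend_from_slice(label);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(header_bytes);
    aad
}

/// Encrypt side: the LOCAL party is the sender.
fn encrypt_aad(local_fp: &[u8], remote_fp: &[u8], header_bytes: &[u8]) -> Vec<u8> {
    dm_aad(local_fp, remote_fp, header_bytes)
}

/// Decrypt side: the REMOTE party sent the message, so it goes first.
fn decrypt_aad(local_fp: &[u8], remote_fp: &[u8], header_bytes: &[u8]) -> Vec<u8> {
    dm_aad(remote_fp, local_fp, header_bytes)
}

/// Explicit pre-AEAD length guard from DecryptWithKey.
fn meets_min_len(ciphertext: &[u8]) -> bool {
    ciphertext.len() >= TAG_LEN
}
```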

**DuplicateMessage MUST NOT return plaintext**: When a message is detected as duplicate (counter already in `recv_seen` or `prev_recv_seen`), the decrypted plaintext MUST be zeroized and the function MUST return only the `DuplicateMessage` error. An API that returns both the plaintext and a duplicate indicator enables application-layer double delivery despite the error signal. The duplicate message successfully decrypts (AEAD is deterministic — same key/nonce/ciphertext produces the same plaintext), but exposing the result defeats the purpose of duplicate detection.
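The post-AEAD bookkeeping for current- and new-epoch messages can be sketched as follows, assuming a hypothetical `CurrentEpochRecv` struct; `max_seen` stands in for `MAX_RECV_SEEN`, whose value is defined elsewhere in the specification. The method's result carries no plaintext in either arm: the caller decides from this verdict alone whether the already-decrypted buffer may be released.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum DecryptError {
    DuplicateMessage,
    ChainExhausted,
}

/// Hypothetical current-epoch receive bookkeeping.
struct CurrentEpochRecv {
    recv_count: u32,
    recv_seen: HashSet<u32>,
}

impl CurrentEpochRecv {
    /// Runs only after AEAD succeeded. On Err the caller restores its snapshot
    /// and must discard (zeroize) the plaintext instead of returning it.
    fn record(&mut self, n: u32, max_seen: usize) -> Result<(), DecryptError> {
        if self.recv_seen.contains(&n) {
            return Err(DecryptError::DuplicateMessage);
        }
        if self.recv_seen.len() >= max_seen {
            return Err(DecryptError::ChainExhausted);
        }
        self.recv_seen.insert(n);
        // `n + 1` keeps guard 17 (every recv_seen entry < recv_count).
        // Cannot overflow: n == u32::MAX was rejected before key derivation.
        self.recv_count = self.recv_count.max(n + 1);
        Ok(())
    }
}
```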

**DuplicateMessage plaintext zeroization obligation for non-RAII implementations**: The duplicate check runs **after** AEAD decryption (§6.6 post-AEAD duplicate detection rationale). This means the plaintext output buffer has already been filled by the time `DuplicateMessage` is raised. Non-RAII implementations (C, Go, Python) that use a "free on error" pattern will free the buffer without zeroing it — leaking plaintext in freed-but-not-overwritten memory. The obligation is: on `DuplicateMessage`, explicitly zeroize the plaintext output buffer before returning the error (or before freeing the buffer). In Rust, wrapping the output buffer in `Zeroizing<Vec<u8>>` handles this automatically via `Drop`. In C: `explicit_bzero(buf, len); free(buf);` before returning. In Go: `clear(buf)` (or `for i := range buf { buf[i] = 0 }`) before discarding. The same obligation applies to `AeadFailed` — both errors cause the function to return after AEAD has already written into the output buffer.
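A minimal sketch of the release step, written in Rust with a hypothetical `release_plaintext` helper: the AEAD output is handed back only on a clean post-AEAD verdict and is wiped with volatile writes on any error. It covers only the initialized bytes of the buffer; copies left behind by earlier reallocations are out of its reach, which is one reason to allocate the output buffer at its final size up front.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

#[derive(Debug, PartialEq)]
enum DecryptError {
    DuplicateMessage,
    AeadFailed,
}

/// Volatile overwrite of the initialized bytes. Spare capacity, and copies
/// left behind by earlier reallocations, are NOT covered.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, exclusive reference to one byte.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// `plaintext` has already been written by AEAD. Only a clean post-AEAD
/// verdict releases it; any error wipes it before it is freed.
fn release_plaintext(
    mut plaintext: Vec<u8>,
    verdict: Result<(), DecryptError>,
) -> Result<Vec<u8>, DecryptError> {
    match verdict {
        Ok(()) => Ok(plaintext),
        Err(e) => {
            wipe(&mut plaintext);
            Err(e) // `plaintext` is freed on return, already zeroed
        }
    }
}
```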

**`recv_ratchet_pk` is stored verbatim without pre-AEAD structural validation**: When a new-epoch message arrives, `header.ratchet_pk` (1216 bytes) is stored as the new `recv_ratchet_pk` immediately before AEAD decryption runs (and is rolled back by the snapshot if AEAD fails). No structural validity check (all-zero test, low-order point check, ML-KEM key validation) is performed before storage. Invalid key material surfaces later: this party's next KEM ratchet step encapsulates to that key, and X-Wing's implicit rejection (§8.4) on the peer's decapsulation turns the mismatch into `AeadFailed`. A reimplementer who adds a pre-AEAD structural check on `ratchet_pk` (for example, rejecting an all-zero public key before attempting AEAD) creates a timing oracle: the check returns in nanoseconds while AEAD takes microseconds, allowing an attacker to probe key validity without triggering an AEAD attempt.

**Receiver does not use `pn` for key derivation**: The `pn` (previous epoch count) field in the header is authenticated via AAD but the receiver performs no other processing on it. In counter-mode, any message key is directly derivable from the epoch key and counter — there is no skip-cache scanning bounded by `pn`. Reimplementers from the Signal Double Ratchet ecosystem: `pn` has no skip-cache role here; tampering with `pn` causes AEAD failure (§7.3), nothing else. **No validation constraint on `pn` is applied.** Values from 0 to `u32::MAX` are all acceptable on the wire — the receiver MUST NOT add guards on `pn` relative to any state field (e.g., a "pn must be ≤ peer's send_count" check has no basis in this protocol and would reject valid messages).

**Decrypt atomicity**: Unlike encrypt (§6.5), decrypt achieves atomicity through snapshot/rollback rather than operation ordering. The encrypt path relies on ordering: its fallible operations run before the writes they would otherwise need to undo, and the one exception (AEAD failure after a KEM ratchet step) is handled by `reset()` rather than rollback. The decrypt path, as specified, does not have that shape. On the new-epoch path, decapsulation and `KDF_Root` run first, but the resulting state mutations (`prev_recv_epoch_key` save, `recv_epoch_key` overwrite, `recv_count` reset to 0, `recv_seen` cleared) are applied before AEAD decryption and before the post-AEAD duplicate and capacity checks, all of which can still fail. Because fallible operations follow state mutations on this path, the specified atomicity mechanism is snapshot/rollback: take a full snapshot before any mutation and restore it on any failure. The Rust reference implementation's `save_recv_state` / `restore_recv_state` (see helper definitions above) implement this contract. A reimplementer who prefers ordering-based atomicity would have to stage every new-epoch value in temporaries and commit only after the last check passes; merely reordering the existing mutations, with neither staging nor a snapshot, silently corrupts state on failure.

**State rollback on failure**: All receive-side state mutations are rolled back on any failure (decapsulation failure, chain exhaustion, AEAD failure, duplicate detection). The implementation takes a full snapshot of nine fields before any mutation: `root_key`, `recv_epoch_key`, `recv_count`, `recv_ratchet_pk`, `ratchet_pending`, `recv_seen`, `prev_recv_epoch_key`, `prev_recv_ratchet_pk`, and `prev_recv_seen`. On any error, the entire snapshot is restored wholesale. Send-side state is never mutated by decrypt and is not snapshotted.
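The snapshot contract can be sketched as a generic rollback wrapper. The struct layout and names are hypothetical (key material is simplified to `Vec<u8>` and fixed arrays); what the sketch shows is that the nine fields are captured as one value before `body` runs and restored wholesale on any error, while the send side is left alone.

```rust
use std::collections::HashSet;

/// The nine receive-side fields captured by save_recv_state.
/// `derive(Clone)` copies every field by value, Vec/HashSet contents included.
#[derive(Clone, Default)]
struct RecvState {
    root_key: [u8; 32],
    recv_epoch_key: [u8; 32],
    recv_count: u32,
    recv_ratchet_pk: Option<Vec<u8>>,
    ratchet_pending: bool,
    recv_seen: HashSet<u32>,
    prev_recv_epoch_key: Option<[u8; 32]>,
    prev_recv_ratchet_pk: Option<Vec<u8>>,
    prev_recv_seen: HashSet<u32>,
}

/// Hypothetical ratchet: receive side grouped, send side reduced to one field.
struct RatchetState {
    recv: RecvState,
    send_count: u32,
}

/// Runs `body`; on any error, restores all nine receive-side fields.
/// Send-side state is outside the snapshot; `body` must not mutate it.
fn with_recv_rollback<T, E>(
    state: &mut RatchetState,
    body: impl FnOnce(&mut RatchetState) -> Result<T, E>,
) -> Result<T, E> {
    let snapshot = state.recv.clone(); // save_recv_state
    match body(state) {
        Ok(v) => Ok(v), // snapshot dropped; its key copies still need zeroizing
        Err(e) => {
            state.recv = snapshot; // restore_recv_state
            Err(e)
        }
    }
}
```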

**`recv_seen` and `prev_recv_seen` snapshots must be deep copies**: Both fields are sets of u32 values that grow during decryption. In Rust, `Clone` on `HashSet<u32>` always produces an independent deep copy. In Python, Go, Java, and other languages with reference semantics, a simple variable assignment copies the reference — not the contents. Mutating the live set (e.g., inserting the new counter into `recv_seen`) would silently corrupt the snapshot, making rollback a no-op (the snapshot points to the same backing storage as the live set). The snapshot MUST be a fully independent set with the same elements — a deep copy whose mutations during decrypt_inner do not affect the snapshot, and whose restoration on error completely replaces the live set's contents.

**`recv_ratchet_pk` and `prev_recv_ratchet_pk` snapshots also require deep copies**: These public-key fields carry the same reference-semantics trap as the `recv_seen` sets whenever a key lives in a mutable buffer (a Go slice, a Java `byte[]`, a Python `bytearray`). The new-epoch path executes `prev_recv_ratchet_pk = recv_ratchet_pk` and then `recv_ratchet_pk = header.ratchet_pk`. In Rust, public-key structs derive `Clone` but not `Copy` (they are non-trivial), so the snapshot's `clone()` is always an independent value copy with no alias. In Python/Go/Java, if the snapshot holds a reference to the live buffer and the second step is performed as an in-place overwrite (for example, a Go `copy()` into the existing slice), the write lands in the snapshot as well and, after the reference assignment in the first step, in the live `prev_recv_ratchet_pk` too. Rollback then restores the new key instead of the original one. The next current-epoch message, whose `ratchet_pk` no longer matches any stored key, is routed through the new-epoch path and fails AEAD, appearing as a silent session corruption. The fix: deep-copy all public-key fields in the snapshot. In Python: `snapshot_recv_ratchet_pk = bytes(live_recv_ratchet_pk)` (or equivalent). In Go: copy the underlying byte array rather than taking a slice reference.

**Snapshot zeroization obligation on the success path**: The snapshot copies of `root_key`, `recv_epoch_key`, and `prev_recv_epoch_key` are secret key material. On the success path, the snapshot is discarded rather than restored, and discarding must mean zeroizing, not merely freeing or letting the memory go out of scope. In Rust, `Zeroizing<[u8; 32]>` zeroizes automatically on drop, so the success path is correct by construction. In C, Go, Python, or other non-RAII languages, the implementation MUST explicitly zero these three snapshot fields before returning on the success path. Failing to do so leaves copies of old key material in freed-but-not-zeroed memory, where heap scanning can recover them until the memory is reused. The rollback invariant is "on error restore, on success zeroize", not "on error restore, on success ignore."

**Why `ratchet_pending` is in the snapshot**: A new-epoch decrypt tentatively sets `ratchet_pending = true` (§6.6 KEM ratchet step) before AEAD runs. If AEAD fails and `ratchet_pending` is not restored to its pre-decrypt value, the next `encrypt()` call fires a KEM ratchet step against the (also rolled-back) old `recv_ratchet_pk` using the rolled-back `root_key`, producing a ciphertext the peer cannot process — silent session corruption with no error on the sender side. A reimplementer who implements partial rollback (e.g., omits `ratchet_pending` thinking it is a flag that should remain set after any new-epoch attempt) gets exactly this failure mode.
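
The "on error restore, on success zeroize" contract can be sketched in Rust as below. This is a minimal illustration, not the reference implementation: `RecvState` is a hypothetical struct holding only a subset of the nine snapshotted fields, `with_rollback` is a hypothetical helper standing in for `save_recv_state` / `restore_recv_state`, and the volatile-write zeroization is a `std`-only substitute for the `Zeroizing` wrappers the reference code uses.

```rust
use std::collections::HashSet;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical subset of the receive-side state (field names follow §6.6).
#[derive(Clone, Debug, PartialEq)]
struct RecvState {
    root_key: [u8; 32],
    recv_epoch_key: [u8; 32],
    recv_count: u32,
    recv_seen: HashSet<u32>,
    ratchet_pending: bool,
}

/// Overwrites a key with zeros using volatile writes so the compiler
/// cannot elide them as dead stores.
fn zeroize_key(buf: &mut [u8; 32]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

fn zeroize_secrets(state: &mut RecvState) {
    zeroize_key(&mut state.root_key);
    zeroize_key(&mut state.recv_epoch_key);
}

/// Runs `op` atomically: restore the snapshot on error, zeroize it on success.
fn with_rollback<T, E>(
    state: &mut RecvState,
    op: impl FnOnce(&mut RecvState) -> Result<T, E>,
) -> Result<T, E> {
    // `clone()` is a deep copy: the HashSet contents are duplicated, not aliased.
    let mut snapshot = state.clone();
    match op(state) {
        Ok(value) => {
            // Success: the snapshot is discarded, so its key copies are wiped.
            zeroize_secrets(&mut snapshot);
            Ok(value)
        }
        Err(err) => {
            // Failure: restore wholesale, then wipe the partially mutated state.
            let mut discarded = std::mem::replace(state, snapshot);
            zeroize_secrets(&mut discarded);
            Err(err)
        }
    }
}
```

The closure stands in for the epoch-specific decrypt body. Any `Err` it returns, including one produced after it has already rotated keys, cleared `recv_seen`, and set `ratchet_pending`, leaves `state` exactly as it was before the call.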

**Previous epoch grace period**: When a KEM ratchet step occurs, the current epoch key is preserved as `prev_recv_epoch_key`. Late-arriving messages from that epoch can still be decrypted using counter-mode derivation from the old epoch key. The previous epoch key is zeroized when a second KEM ratchet step rotates it out. This provides a one-epoch grace period for out-of-order delivery without storing per-message keys. "One epoch" means one receive epoch — one KEM ratchet step in the receive direction. A send-side KEM ratchet step does not rotate `prev_recv_epoch_key`. Implementations that interpret "epoch" as any KEM ratchet step (send or receive) will incorrectly expect recovery across two direction changes.

**`recv_count` asymmetry across epochs**: `recv_count` tracks only the current receive epoch — it is the high-water mark for current-epoch message counters. Previous-epoch messages update `prev_recv_seen` but do NOT update `recv_count`. This is critical for guard 17 (§6.8): `recv_seen` entries must be `< recv_count`. If previous-epoch messages updated `recv_count`, a previous-epoch counter (which could be any value in `[0, u32::MAX − 1]`) would corrupt the high-water mark for the current epoch, invalidating the guard 17 invariant. There is no analogous `prev_recv_count` — when a receive epoch rotates into previous, its `recv_count` is not preserved. `prev_recv_seen` entries are bounded only by guard 14 and guard 15.

**Timing asymmetry across epoch paths**: New-epoch decrypt performs X-Wing decapsulation (~1ms); current-epoch and previous-epoch paths are O(1) HMAC key derivation (~microseconds). This timing difference is not a side-channel oracle because the epoch type is determined solely by comparing `header.ratchet_pk` (a cleartext header field) to stored public keys — an observer who can measure timing already knows the epoch type from the public key. The constant-time requirement (Appendix E) applies to the public-key comparisons themselves, not to equalizing path runtimes. Reimplementers MUST NOT add dummy KEM operations to equalize paths — this would waste CPU for no security benefit.

**Decrypt error table**: `decrypt()` / `soliton_ratchet_decrypt` returns five distinct variants:
- `InvalidData`: four distinct conditions, all returning `InvalidData`:
  - **Dead session** (all-zero `root_key`, pre-snapshot): the session has been permanently terminated by a session-fatal encrypt error (§6.5), which zeroized `root_key`. This `InvalidData` is **not retryable** for any message — the session is irrecoverable. Re-establish via LO-KEX.
  - **Epoch too old** (missing `prev_recv_epoch_key` for a previous-epoch message, pre-snapshot): the session is still live, but the sender's message is from an epoch older than the one-epoch grace period (`prev_recv_epoch_key` has already been rotated out). This `InvalidData` is non-retryable for that specific message, but the session remains functional for current-epoch and new-epoch messages.
  - **`kem_ct` absent in a new-epoch message** (post-snapshot, rollback is a no-op): the header indicates a new epoch (new `recv_ratchet_pk`) but carries no KEM ciphertext. No state mutations have been applied when this fires. Non-retryable for that message (structurally malformed).
  - **`send_ratchet_sk` is `None` on the new-epoch path** (post-snapshot, rollback is a no-op): decapsulation of the peer's new-epoch ciphertext requires the local X-Wing secret key, but the party has never sent (no key was generated yet). Fires before any state mutation. Non-retryable for that message.

  Callers who need to distinguish the dead-session condition from the others may inspect `root_key` for the all-zero sentinel before calling (checking liveness) — there is no error-code distinction at the API level between the four conditions. State is unchanged for all four. The post-snapshot cases (third and fourth) require no rollback because no mutation precedes them, but the unconditional snapshot-and-restore mechanism handles them correctly regardless.
- `ChainExhausted`: `recv_seen` or `prev_recv_seen` saturation (transient: cleared by the next KEM ratchet step for `recv_seen`, or by the second one for `prev_recv_seen`; see §6.8), or `header.n == u32::MAX` (counter exhaustion, not a structural error). State is unchanged. NOT session-fatal; see §12 modes (2).
- `DuplicateMessage`: counter already in `recv_seen` or `prev_recv_seen`. State is restored via snapshot rollback on all paths; the snapshot is always taken before epoch-specific processing begins. On the current-epoch and previous-epoch paths, no state has been mutated before duplicate detection fires (key derivation and AEAD precede it, but these are read-only with respect to ratchet state), so the restore is a no-op in practice. On the new-epoch path, KEM ratchet step mutations (epoch rotation, `recv_count` reset, `recv_seen` clear) have occurred before duplicate detection, but `DuplicateMessage` is structurally unreachable there because `recv_seen` was just cleared; see §6.7. Plaintext is zeroized and not returned. **Rollback is MANDATORY for DuplicateMessage even though no reachable path mutates state before it fires**: a reimplementer who omits rollback "because the state wasn't mutated yet" leaves the session open to silent corruption as soon as a refactor moves a mutation ahead of the duplicate check.
- `DecapsulationFailed`: X-Wing decapsulation failure on the new-epoch path. It fires at `XWing.Decaps()`, the first operation of the new-epoch branch, before AEAD and **before any state mutation**: epoch rotation (saving previous epoch keys, overwriting `recv_epoch_key`, resetting `recv_count`, clearing `recv_seen`) follows decapsulation in the §6.6 pseudocode. In practice it is unreachable with valid-length ciphertexts, because implicit rejection (§8.4) makes every correctly-sized ciphertext decapsulate successfully and fail at AEAD instead. The snapshot-and-restore mechanism still applies unconditionally (the snapshot is taken before all epoch-specific processing and restored on any error return), so state is rolled back via snapshot even though nothing was mutated. NOT session-fatal for decrypt; the session remains usable.
- `AeadFailed`: authentication tag mismatch. State is rolled back via snapshot. NOT session-fatal for decrypt (contrast: `AeadFailed` on encrypt IS session-fatal, §6.5). The session can process subsequent messages.

The pre-snapshot vs. post-snapshot distinction matters for rollback. `ChainExhausted` and the two pre-snapshot `InvalidData` conditions (dead session, epoch too old) fire before any cryptographic state mutation, so rollback is a no-op even when the snapshot exists. The two new-epoch-path `InvalidData` conditions (`kem_ct` absent, `send_ratchet_sk` None) fire after the snapshot but also before any state mutation, so rollback is a no-op for these as well. Their distinction from the pre-snapshot conditions still matters: the new-epoch path does apply mutations later (see §6.6 KEM ratchet steps), so a reimplementer who asks "can this path mutate state?" when `InvalidData` fires gets a different answer depending on which condition fired. `AeadFailed` can occur after state mutations have been tentatively applied (on the new-epoch path) and requires snapshot restoration. `DuplicateMessage` and `DecapsulationFailed` fire before any mutation on every reachable path (see their entries above), but the snapshot-and-restore mechanism applies to them unconditionally as well.

**Note**: the reference implementation takes the snapshot unconditionally (line `snapshot = save_recv_state(state)`, after `identify_epoch()` and before epoch-specific processing); only the pre-snapshot guards precede it. The phrase "pre-mutation" describes the semantic behavior (no state was actually changed), not a conditional snapshot implementation. A reimplementer who omits the snapshot for the `InvalidData`/`ChainExhausted` paths on the grounds that "no state was mutated yet" is correct only while those errors genuinely fire before any mutation, and conditional snapshotting is fragile: if a future refactor moves a mutation earlier, the conditional snapshot silently stops covering it. The unconditional snapshot is simpler and correct by construction. The same applies to a reimplementer who omits rollback for `DuplicateMessage` or `DecapsulationFailed` because they look like early checks: the session stays correct only by accident and is silently corrupted once a mutation moves ahead of either check.

**Out-of-order messages**: Within the current epoch, messages may arrive in any order. Each message key is derived directly from the epoch key and the message counter — no sequential chain advancement is needed. The `recv_seen` set prevents duplicate processing. Between epochs, messages from the immediately previous epoch are also handled (see above).

**Plaintext zeroization obligation**: The decrypted plaintext is secret material. In Rust, `decrypt()` returns `Zeroizing<Vec<u8>>`, which automatically zeroizes the buffer when dropped. Languages without RAII-style automatic cleanup (C, Go, Python) must explicitly zeroize the plaintext buffer after use — the library cannot manage the caller's copy. This obligation parallels the ratchet state zeroization documented in §6.10 but is easier to overlook because plaintext feels "transient." A plaintext buffer that survives in freed-but-not-zeroed memory is vulnerable to the same heap-scanning attacks that key material is.

### 6.7 Duplicate Detection

Duplicate detection uses a `recv_seen` set (current epoch) and `prev_recv_seen` set (previous epoch) that track successfully-decrypted message counters. Both sets are bounded at `MAX_RECV_SEEN = 65536` entries as defense-in-depth against memory exhaustion.

A message is a duplicate if its counter `n` is already in the appropriate `recv_seen` set. Messages from epochs older than the previous epoch cannot be decrypted: their epoch key was zeroized by the KEM ratchet step that rotated it out, so AEAD decryption fails.

Unlike the Signal Double Ratchet's skip cache (which stores 32-byte message keys per skipped position), the `recv_seen` sets store only 4-byte counters — no secret key material. This eliminates the need for TTL expiry, purge throttling, and the `ZeroizeOnDrop` concerns associated with HashMap rehashing.

**New-epoch path**: For messages that trigger a KEM ratchet step (new-epoch), `DuplicateMessage` is unreachable by construction — the KEM ratchet step clears `recv_seen` to empty before duplicate detection runs, so the `contains()` check always returns false. The rollback covers this path only for theoretical completeness.
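
A minimal Rust sketch of the bookkeeping that runs after AEAD succeeds, using hypothetical names (`SeenSets`, `record`): the duplicate and cap checks against the appropriate set, the `header.n == u32::MAX` rejection, and the `recv_count` high-water-mark update that only the current-epoch path performs. Checking for duplicates before the cap is an assumption of this sketch; the text does not fix that order. In a real implementation this runs inside the snapshot/rollback described in §6.6.

```rust
use std::collections::HashSet;

/// Cap on each seen set (§6.7).
const MAX_RECV_SEEN: usize = 65536;

#[derive(Debug, PartialEq)]
enum RecvError {
    ChainExhausted,
    DuplicateMessage,
}

/// Hypothetical container for the two seen sets and the high-water mark.
struct SeenSets {
    recv_seen: HashSet<u32>,
    prev_recv_seen: HashSet<u32>,
    recv_count: u32,
}

/// Shared duplicate/cap check; records `n` only if every check passes.
fn record(seen: &mut HashSet<u32>, n: u32) -> Result<(), RecvError> {
    if n == u32::MAX {
        return Err(RecvError::ChainExhausted); // counter exhaustion
    }
    if seen.contains(&n) {
        return Err(RecvError::DuplicateMessage);
    }
    if seen.len() >= MAX_RECV_SEEN {
        return Err(RecvError::ChainExhausted); // transient saturation
    }
    seen.insert(n);
    Ok(())
}

impl SeenSets {
    /// Current-epoch path: record `n`, then raise the high-water mark.
    fn accept_current(&mut self, n: u32) -> Result<(), RecvError> {
        record(&mut self.recv_seen, n)?;
        // `record` rejected u32::MAX, so `n + 1` cannot overflow.
        self.recv_count = self.recv_count.max(n + 1);
        Ok(())
    }

    /// Previous-epoch path: `recv_count` is deliberately left untouched.
    fn accept_previous(&mut self, n: u32) -> Result<(), RecvError> {
        record(&mut self.prev_recv_seen, n)
    }
}
```

Only 4-byte counters are stored, so nothing in either set needs zeroization.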

### 6.7.1 Worked Example: Four-Message Exchange

The following walkthrough traces a minimal Alice↔Bob exchange, showing the header values (`n`, `pn`, `kem_ct`) and the `recv_count` high-water mark for each message. This is the primary checkable reference for reimplementers verifying their counter and ratchet logic. `recv_count` is updated as `max(recv_count, n+1)` on each received message and resets to 0 on KEM ratchet (epoch transition).

**Initial state** (after §5.4/§5.5 session establishment):
```
Alice:  send_count=1, recv_count=0, prev_send_count=0
        ratchet_pending=false, send_ratchet_pk=Some(EK_pub)

Bob:    send_count=0, recv_count=1, prev_send_count=0
        ratchet_pending=true, send_ratchet_pk=None
```

**Message 1 (A→B)**: Alice continues her first epoch (no ratchet needed).
```
n=1, pn=0, kem_ct=None
```
Alice's `send_count` advances to 2. Bob decrypts with his `recv_epoch_key` at counter 1. Bob's `recv_count` updates: `max(1, n+1) = max(1, 2) = 2`.

**Message 2 (B→A)**: Bob's first send. `ratchet_pending=true` fires the KEM ratchet step.
```
n=0, pn=0, kem_ct=Some(...)
```
Bob has not sent anything before this point (`send_count=0` at the ratchet boundary), so `pn=0`. The KEM ciphertext is encapsulated against Alice's `send_ratchet_pk` (which is `EK_pub`). Alice decrypts, sees the new `ratchet_pk`, and sets `ratchet_pending=true`. Alice's `recv_count` resets to 0 on epoch transition (KEM ratchet step), then updates: `max(0, n+1) = max(0, 1) = 1`.

**Message 3 (A→B)**: Alice sends again. `ratchet_pending=true` (from receiving Bob's KEM ciphertext) fires her KEM ratchet step.
```
n=0, pn=2, kem_ct=Some(...)
```
**`pn=2` is the critical non-obvious value.** Alice's previous send epoch had `send_count=2` at the moment the ratchet fired: counter `n=0` was consumed during session establishment (hence the initial `send_count=1`), and message 1 used `n=1`, advancing `send_count` to 2. A reimplementer who initializes Alice's `send_count` at 0 instead of 1 would see `pn=1` here. Bob's `recv_count` resets to 0 on epoch transition, then updates: `max(0, n+1) = max(0, 1) = 1`.

**Message 4 (B→A)**: Bob sends again. `ratchet_pending=true` (from receiving Alice's KEM ciphertext) fires his KEM ratchet step.
```
n=0, pn=1, kem_ct=Some(...)
```
Bob's previous send epoch had `send_count=1` (`pn=1`). Alice's `recv_count` resets to 0 on epoch transition, then updates: `max(0, n+1) = max(0, 1) = 1`.

After these four messages, three KEM ratchet steps have occurred (Bob's at messages 2 and 4, Alice's at message 3). Every subsequent direction change triggers a new KEM ratchet step whose `pn` equals the sender's `send_count` at the ratchet boundary.
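
The counter logic of this walkthrough can be replayed mechanically. The sketch below is a counter-only model with hypothetical `Party` and `Header` types; it makes two assumptions beyond the text: `kem_ct` is modeled as a boolean that also stands in for "new `ratchet_pk`" (new-epoch detection), and a non-ratcheting message carries `pn = prev_send_count` (consistent with message 1's `pn=0`). Keys and ciphertexts are not modeled.

```rust
/// Header fields traced in the worked example (`kem_ct` reduced to presence).
#[derive(Debug, PartialEq)]
struct Header {
    n: u32,
    pn: u32,
    kem_ct: bool,
}

/// Hypothetical counter-only model of one party's ratchet state.
struct Party {
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    ratchet_pending: bool,
}

impl Party {
    fn send(&mut self) -> Header {
        let kem_ct = self.ratchet_pending;
        if kem_ct {
            // KEM ratchet step: pn records send_count at the ratchet boundary.
            self.prev_send_count = self.send_count;
            self.send_count = 0;
            self.ratchet_pending = false;
        }
        let header = Header { n: self.send_count, pn: self.prev_send_count, kem_ct };
        self.send_count += 1;
        header
    }

    fn receive(&mut self, h: &Header) {
        if h.kem_ct {
            // New receive epoch: reset the high-water mark; our next send ratchets.
            self.recv_count = 0;
            self.ratchet_pending = true;
        }
        self.recv_count = self.recv_count.max(h.n + 1);
    }
}
```

Starting from the initial state above (Alice `send_count=1`, Bob `recv_count=1`, Bob `ratchet_pending=true`) and alternating `send`/`receive`, the model reproduces the four headers and `recv_count` values of the walkthrough, including `pn=2` on message 3.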

**Hard limit on late-arriving messages**: The one-epoch grace period (§6.6) means messages from the immediately previous receive epoch can still be decrypted. Messages from any older epoch are permanently undecryptable — the epoch key was zeroized when the second KEM ratchet step rotated it out. This is a protocol-level hard limit, not a buffering opportunity: no amount of caching or reordering at the transport layer can recover a message whose epoch key has been zeroized. Application designers must ensure that transport-layer message ordering keeps latency within one direction change. In practice, epochs in an active conversation are short (1-10 messages between direction changes), so only messages delayed across two or more direction changes are lost.

### 6.8 Ratchet State Serialization

Ratchet state is serialized for encrypted persistent storage. The caller MUST encrypt the output with an authenticated cipher before persisting it (e.g., via §11 storage encryption), because the serialized form contains all secret key material.

**Epoch increment on serialization**: `to_bytes` increments the epoch counter *before* writing it to the blob and returns the new epoch as its second return value: `(blob, new_epoch)`. The stored value is `epoch + 1`, not the pre-serialization epoch. `from_bytes` loads this value as-is — no increment on load. The idiomatic anti-rollback pattern is to persist both the blob and `new_epoch - 1` as the `min_epoch` for subsequent loads — **not** `new_epoch` itself. Using `new_epoch` directly as `min_epoch` makes the current blob permanently unloadable: the guard `new_epoch > new_epoch` is false, so `from_bytes_with_min_epoch(blob_N, new_epoch)` always returns `InvalidData`, breaking crash recovery. The correct stored value is `new_epoch - 1`, ensuring `new_epoch > new_epoch - 1`. See Caller Obligation 2 for the full crash-safe commit order. A reimplementer who computes `epoch + 1` manually instead of using the return value risks off-by-one errors. A reimplementer who increments on load instead of on save produces incompatible blobs and breaks anti-rollback (the stored epoch would be one behind, potentially equal to the last-seen epoch instead of strictly greater).
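
The `min_epoch` off-by-one can be checked with a small model. The helpers below (`epoch_written_by_to_bytes`, `loads_with_min_epoch`, `min_epoch_to_persist`) are hypothetical and model only the epoch arithmetic described above; they are not part of the library's API.

```rust
/// Models the epoch bump in `to_bytes`: the blob stores `epoch + 1`.
/// `None` stands in for the `ChainExhausted` returned at `u64::MAX` (guard 24).
fn epoch_written_by_to_bytes(current_epoch: u64) -> Option<u64> {
    current_epoch.checked_add(1)
}

/// Models the anti-rollback check of `from_bytes_with_min_epoch`:
/// a blob loads only if its stored epoch is strictly greater than `min_epoch`.
fn loads_with_min_epoch(blob_epoch: u64, min_epoch: u64) -> bool {
    blob_epoch > min_epoch
}

/// The value to persist as `min_epoch` next to the blob: `new_epoch - 1`.
/// `new_epoch >= 1` always holds because it comes from `checked_add(1)`.
fn min_epoch_to_persist(new_epoch: u64) -> u64 {
    new_epoch - 1
}
```

With a state at epoch 7, `to_bytes` writes 8; persisting `min_epoch = 7` lets the new blob reload while rejecting the previous blob (epoch 7), whereas persisting 8 would reject the new blob itself.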

**Ownership-consuming serialization**: `to_bytes` consumes (invalidates) the in-memory ratchet state. After serialization, the original state is zeroized and no longer usable. This prevents ratchet forking: if two copies of the same state existed simultaneously, both could encrypt with the same `(epoch_key, send_count)` pair, causing catastrophic AEAD nonce reuse. In languages without move semantics (C, Go, Python), implementations MUST explicitly zeroize and disable the state after serialization; failing to do so enables nonce reuse and full plaintext recovery.

**Exception: `ChainExhausted`**: When `to_bytes` returns `ChainExhausted` (guard 24 for an epoch at `u64::MAX`, or one of the counter-saturation conditions listed below), the state is NOT consumed: the in-memory ratchet remains valid and can continue sending and receiving. See guard 24 for recovery semantics. A reimplementer who models `to_bytes` as always-consuming will destroy a recoverable session on counter exhaustion.

**Mechanism: the `can_serialize()` predicate**: In languages with consuming/move semantics (including Rust), calling `can_serialize()` before `to_bytes` is mandatory, not optional. Calling `to_bytes(self)` without a prior `can_serialize()` check moves the session into `to_bytes`; if it then returns `ChainExhausted`, the session is gone because `self` was consumed. The "state not consumed on `ChainExhausted`" contract above therefore holds only at the CAPI layer, which runs the `can_serialize()` pre-check before taking ownership. At the Rust API level, the caller is responsible for calling `can_serialize()` first.

The Rust core exposes `RatchetState::can_serialize(&self) -> bool`, which returns false if any of the six conditions on which `to_bytes` would fail holds: `send_count == u32::MAX`, `recv_count == u32::MAX`, `prev_send_count == u32::MAX`, `epoch == u64::MAX`, `recv_seen.len() >= MAX_RECV_SEEN`, or `prev_recv_seen.len() >= MAX_RECV_SEEN`. If `can_serialize()` returns true, `to_bytes` is guaranteed to succeed (no `ChainExhausted`). A reimplementer's `can_serialize()` must cover exactly these six conditions. The CAPI layer calls `can_serialize()` before taking ownership, which is why the CAPI `to_bytes` only visibly checks the epoch: the other conditions are filtered out by the pre-check.

**`recv_count` reachability**: Unlike `send_count`, which is guarded on the encrypt side (§6.5 `ChainExhausted` fires before `send_count` reaches `u32::MAX`), `recv_count` has no equivalent decrypt-side guard, because rejecting a valid message solely for pushing `recv_count` to `u32::MAX` would be incorrect. A message with `header.n = u32::MAX - 1` is accepted, producing `recv_count = u32::MAX`. With an honest sender this takes ~4.3 billion messages in a single epoch without a KEM ratchet step, which is implausible; because `recv_count` is a high-water mark (`max(recv_count, n+1)`), an authenticated peer can also reach it with a single message carrying that counter. Once `recv_count == u32::MAX`, `can_serialize()` returns false and the session cannot be serialized until the next KEM ratchet step resets `recv_count` to 0 (§6.6). Recovery is through a direction change (the peer sends, triggering a KEM ratchet). If the session is one-directional with no peer replies, a new LO-KEX exchange is required.
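
The six conditions and the call-before-consume pattern can be sketched as follows. `SerializableCounters` is a hypothetical struct carrying only the counter-bearing fields (not the reference `RatchetState`), and `serialize_checked` is a hypothetical wrapper whose `to_bytes` closure stands in for the real serializer; the point is that the state is handed back, not moved, when serialization would fail.

```rust
use std::collections::HashSet;

const MAX_RECV_SEEN: usize = 65536;

/// Hypothetical counter-bearing subset of the ratchet state.
struct SerializableCounters {
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    epoch: u64,
    recv_seen: HashSet<u32>,
    prev_recv_seen: HashSet<u32>,
}

impl SerializableCounters {
    /// True iff none of the six `to_bytes` failure conditions holds.
    fn can_serialize(&self) -> bool {
        self.send_count != u32::MAX
            && self.recv_count != u32::MAX
            && self.prev_send_count != u32::MAX
            && self.epoch != u64::MAX
            && self.recv_seen.len() < MAX_RECV_SEEN
            && self.prev_recv_seen.len() < MAX_RECV_SEEN
    }
}

/// Checks before consuming: on failure the caller gets the live state back.
fn serialize_checked(
    state: SerializableCounters,
    to_bytes: impl FnOnce(SerializableCounters) -> Vec<u8>,
) -> Result<Vec<u8>, SerializableCounters> {
    if !state.can_serialize() {
        // Hand the state back untouched: the session stays usable.
        return Err(state);
    }
    Ok(to_bytes(state))
}
```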

**Defense-in-depth conditions**: `can_serialize()` also checks `recv_seen.len() >= MAX_RECV_SEEN` and `prev_recv_seen.len() >= MAX_RECV_SEEN`. The runtime cap in `decrypt` (§6.6) prevents these from firing in practice, but without the pre-check, a future refactor removing the runtime cap would cause `to_bytes` to fail with `InvalidData` rather than `ChainExhausted`, breaking the documented guarantee that `can_serialize() == true` implies `to_bytes` succeeds. **Error type note**: if the `recv_seen` size cap is somehow bypassed (future refactor, direct state manipulation), `to_bytes` returns `InvalidData`, not `ChainExhausted`, because the overflow is a structural violation rather than a counter exhaustion. The `can_serialize()` predicate exists precisely to unify these different underlying error types into a single boolean. The underlying condition is recoverable via a KEM ratchet step (same as counter exhaustion), but only if it is caught before ownership is transferred: a reimplementer who skips `can_serialize()` and checks only for `ChainExhausted` from `to_bytes` receives `InvalidData` for this condition instead, and that `InvalidData` path consumes the session (see the next paragraph).

**`InvalidData` from `to_bytes` consumes the session state — asymmetry with `ChainExhausted`**: When `to_bytes` returns `ChainExhausted`, the CAPI layer's `can_serialize()` pre-check ensured the state was never consumed (handle not nulled, session still live). When `to_bytes` returns `InvalidData` (the `recv_seen` overflow path if `can_serialize()` is bypassed), the state IS consumed: at the Rust API level, `self` was moved into `to_bytes` and the session is gone; at the CAPI level, the handle is nulled on any path that takes ownership and then fails. A CAPI caller treating `InvalidData` from `to_bytes` as retryable — analogous to `ChainExhausted` — is holding a dangling handle. The asymmetry: `ChainExhausted` from `to_bytes` → state not consumed, retryable (wait for direction change); `InvalidData` from `to_bytes` → state consumed, session lost, must re-establish via LO-KEX.

**Scope of `can_serialize()`**: The predicate guarantees only that `ChainExhausted` will not be returned. It does not check liveness conditions like `root_key != 0x00{32}` (guard 25). **`to_bytes()` succeeds on a dead/reset session** — all counters are within bounds, so `can_serialize()` returns true and `to_bytes()` produces a blob. The failure is deferred: `from_bytes()` subsequently rejects the all-zero root_key via guard 25 (`InvalidData`). `can_serialize() == true` therefore guarantees `to_bytes()` success, but does NOT guarantee `from_bytes()` success on the resulting blob. In practice this is academic: `encrypt()` and `decrypt()` both reject dead sessions before any state mutation, so a dead session never accumulates state worth serializing. The full guarantee for a round-trip that survives both `to_bytes()` and `from_bytes()` is `can_serialize() == true` AND the session is alive (initialized via `init_alice`/`init_bob` and not subsequently `reset()`).

**Serialization buffer zeroization**: The serialization output buffer contains root keys, epoch keys, and ratchet secret keys — all secret material. Implementations SHOULD pre-allocate the buffer to its exact final size before writing any data. If a dynamic array (Vec, list, slice) reallocates during serialization, the abandoned allocation containing partial secret material is freed to the heap allocator without zeroization — only the final allocation is covered by zeroize-on-drop. In Rust, pre-allocation is the actual guarantee; a `debug_assert` on capacity detects underestimates during testing but is compiled out in release builds — if the pre-computed size is wrong, a release binary silently reallocates and the abandoned partial allocation is freed without zeroization. In C/Go/Python, pre-compute the buffer size and allocate once. The output buffer itself MUST be zeroized after the caller has finished with it (e.g., after passing it to storage encryption).

**MAX_RECV_SEEN cap at runtime**: When `recv_seen` or `prev_recv_seen` reaches `MAX_RECV_SEEN` (65536) entries during decrypt, subsequent messages in that epoch return `ChainExhausted`. This is transient — the cap resets on the next KEM ratchet step (which clears `recv_seen`). The cap prevents unbounded memory growth from an authenticated peer sending many messages in a single epoch.

**Which epoch paths can fire this cap**: The `recv_seen` saturation check (`ChainExhausted`) fires only on the **current-epoch path** (messages in the active receive epoch) and the **previous-epoch path** (messages in the prior epoch, checked against `prev_recv_seen`). The **new-epoch path** (which triggers a KEM ratchet step) is immune: it clears `recv_seen` to empty before the duplicate/cap check, so the cap check on that path is structurally unreachable — a new epoch always starts with an empty `recv_seen`. A reimplementer testing `ChainExhausted` from `recv_seen` saturation MUST use the current-epoch or previous-epoch path, not the new-epoch path.

**`prev_recv_seen` recovery requires two KEM ratchet steps, not one.** When `recv_seen` saturates, one KEM ratchet step clears it: the new epoch starts with an empty `recv_seen`. When `prev_recv_seen` saturates (the previous-epoch path hits the cap), the first KEM ratchet step replaces it with the then-current `recv_seen`, which may itself be full. Only the second KEM ratchet step replaces `prev_recv_seen` with a set accumulated entirely after the saturation: the `recv_seen` of the epoch that began at the first step, which is empty or small if that epoch was short (§6.6). A caller who expects one direction change to unblock all `ChainExhausted` errors from `decrypt()` will be wrong for the previous-epoch saturation case: two direction changes (two KEM ratchet steps) are required. **Qualification**: two steps suffice only if that intervening epoch accumulates fewer than `MAX_RECV_SEEN` messages before the second direction change. If it also saturates `recv_seen`, the second KEM step copies that full set into `prev_recv_seen`, and a third direction change is needed. In the degenerate case where every epoch saturates, each direction change clears `recv_seen` but refills `prev_recv_seen`; the pattern converges only when an epoch stays below the cap.
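
The two-step recovery follows directly from how the seen sets rotate. This sketch models only that rotation, with a hypothetical `SeenState` type and the assumption that a receive-side KEM ratchet step moves the current set into `prev_recv_seen` and starts the new epoch empty; keys and counters are omitted.

```rust
use std::collections::HashSet;

const MAX_RECV_SEEN: usize = 65536;

/// Hypothetical model holding only the two seen sets.
struct SeenState {
    recv_seen: HashSet<u32>,
    prev_recv_seen: HashSet<u32>,
}

impl SeenState {
    /// Receive-side KEM ratchet step, restricted to the seen sets: the current
    /// set rotates into `prev_recv_seen` (dropping the old one) and the new
    /// epoch starts empty.
    fn kem_ratchet_step(&mut self) {
        self.prev_recv_seen = std::mem::take(&mut self.recv_seen);
    }

    fn prev_saturated(&self) -> bool {
        self.prev_recv_seen.len() >= MAX_RECV_SEEN
    }
}
```

If both sets are full, the first step leaves `prev_recv_seen` full (it inherits the old `recv_seen`); only after a short intervening epoch does the second step bring it below the cap.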

**Wire format (version 0x01):**

```
[version: 1 byte = 0x01]
[epoch: u64 BE — anti-rollback monotonic counter]
[root_key: 32 bytes]
[send_epoch_key: 32 bytes]
[recv_epoch_key: 32 bytes]
[local_fp: 32 bytes]
[remote_fp: 32 bytes]
[send_ratchet_sk: optional field]
[send_ratchet_pk: optional field]
[recv_ratchet_pk: optional field]
[prev_recv_epoch_key: optional 32-byte field — EXCEPTION: encoded as 0x01 + 32 bytes (no 2-byte length prefix)]
[prev_recv_ratchet_pk: optional field]
[send_count: u32 BE]
[recv_count: u32 BE]
[prev_send_count: u32 BE]
[ratchet_pending: 1 byte, 0x00=false, 0x01=true]
[num_recv_seen: u32 BE]
[recv_seen entries × num_recv_seen: each u32 BE, sorted ascending]
[num_prev_recv_seen: u32 BE]
[prev_recv_seen entries × num_prev_recv_seen: each u32 BE, sorted ascending]
```

**Optional field encoding**: Each optional field is prefixed with a marker byte:
- `0x00` — absent (1 byte total; no data follows)
- `0x01` — present (1-byte marker + 2-byte BE length + data bytes)

Decoders MUST treat any marker byte other than `0x00` or `0x01` as `InvalidData`. Do NOT treat arbitrary non-zero values as "present" — doing so creates format malleability (multiple byte values encoding the same logical state) and accepts blobs that no conforming encoder produces. This strictness applies equally to the `ratchet_pending` boolean (which uses the same `0x00`/`0x01` encoding).

**Exception**: `prev_recv_epoch_key` uses fixed-size encoding (`0x01` + 32 bytes, **no** 2-byte length prefix) since the size is always exactly 32 bytes. Implementers MUST NOT apply the general `0x01 + length + data` rule to this field — doing so produces blobs that are not interoperable.

**Worked byte sequence for the present case**: If `prev_recv_epoch_key = [0xAA × 32]`, the encoded field is: `01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa` (33 bytes — 1-byte marker followed directly by 32 key bytes). Compare with a general optional field: if `send_ratchet_sk` were present, its encoding would be `01 09 80 bb bb...bb` (1-byte marker + 2-byte BE length `0x0980` = 2432 + 2432 data bytes). For `prev_recv_epoch_key`, the two length-prefix bytes are absent — the marker `0x01` is followed immediately by the 32 key bytes. A decoder that reads a 2-byte length prefix after the `0x01` marker would interpret key bytes 1-2 as a spurious length, then misalign all subsequent fields by 2 bytes, producing `InvalidData` with no diagnostic pointing to this specific field.

A present marker with a zero-length body is rejected as `InvalidData`.

**Expected field sizes for length-prefixed optional fields**: Decoders MUST reject blobs where the 2-byte length prefix does not equal the expected size:

| Field | Expected size | Type |
|-------|--------------|------|
| `send_ratchet_sk` | 2432 bytes | Fully expanded X-Wing secret key (§8.5): 32 X25519 bytes + 2400 ML-KEM-768 expanded bytes, NOT the 32-byte seed. Any length prefix other than exactly 2432 is rejected with `InvalidData`, so storing the compact 32-byte seed and re-expanding it at load time produces an unloadable blob. |
| `send_ratchet_pk` | 1216 bytes | X-Wing public key |
| `recv_ratchet_pk` | 1216 bytes | X-Wing public key |
| `prev_recv_epoch_key` | 32 bytes | Epoch key (fixed-size encoding, **no length prefix** — see exception above) |
| `prev_recv_ratchet_pk` | 1216 bytes | X-Wing public key |
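
The two optional-field encodings and the decoder checks described above (strict `0x00`/`0x01` marker, zero-length rejection, exact expected length, fixed-size exception for `prev_recv_epoch_key`) can be sketched in Rust as follows. The function names and the single-variant `DecodeError` are hypothetical; a real decoder reports the library's own `InvalidData`.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidData,
}

/// General optional field: 0x00, or 0x01 + u16 BE length + data.
fn encode_optional(out: &mut Vec<u8>, field: Option<&[u8]>) {
    match field {
        None => out.push(0x00),
        Some(data) => {
            assert!(data.len() <= u16::MAX as usize, "field too long");
            out.push(0x01);
            out.extend_from_slice(&(data.len() as u16).to_be_bytes());
            out.extend_from_slice(data);
        }
    }
}

/// Exception for `prev_recv_epoch_key`: 0x00, or 0x01 + exactly 32 bytes, no length.
fn encode_prev_epoch_key(out: &mut Vec<u8>, key: Option<&[u8; 32]>) {
    match key {
        None => out.push(0x00),
        Some(k) => {
            out.push(0x01);
            out.extend_from_slice(k);
        }
    }
}

/// Decodes a general optional field; returns the body and the remaining input.
fn decode_optional(buf: &[u8], expected: usize) -> Result<(Option<&[u8]>, &[u8]), DecodeError> {
    let (&marker, rest) = buf.split_first().ok_or(DecodeError::InvalidData)?;
    match marker {
        0x00 => Ok((None, rest)),
        0x01 => {
            if rest.len() < 2 {
                return Err(DecodeError::InvalidData);
            }
            let len = u16::from_be_bytes([rest[0], rest[1]]) as usize;
            // Rejects both the zero-length body and any unexpected size.
            if len == 0 || len != expected {
                return Err(DecodeError::InvalidData);
            }
            let body = rest.get(2..2 + len).ok_or(DecodeError::InvalidData)?;
            Ok((Some(body), &rest[2 + len..]))
        }
        _ => Err(DecodeError::InvalidData), // no "any non-zero means present"
    }
}

/// Decodes the fixed-size `prev_recv_epoch_key` field.
fn decode_prev_epoch_key(buf: &[u8]) -> Result<(Option<[u8; 32]>, &[u8]), DecodeError> {
    let (&marker, rest) = buf.split_first().ok_or(DecodeError::InvalidData)?;
    match marker {
        0x00 => Ok((None, rest)),
        0x01 => {
            let bytes = rest.get(..32).ok_or(DecodeError::InvalidData)?;
            let mut key = [0u8; 32];
            key.copy_from_slice(bytes);
            Ok((Some(key), &rest[32..]))
        }
        _ => Err(DecodeError::InvalidData),
    }
}
```

Decoding the `prev_recv_epoch_key` bytes with `decode_optional` instead of `decode_prev_epoch_key` reads the first two key bytes (`aa aa`) as a length of 43690 and fails, which is the misalignment described above surfacing as `InvalidData`.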

**Version history**:
- `0x01`-`0x04`: previous chain-ratchet designs (not supported).
- `0x05`: counter-mode epoch keys, previous epoch key, recv_seen sets (current).

**Forward compatibility**: Implementations MUST reject any version byte other than `0x01` with `UnsupportedVersion`. Do NOT attempt to parse unknown versions using current-version rules — field layout changes between versions are not backwards-compatible.

**Annotated byte-offset table**: Appendix F contains a byte-offset-annotated layout for Alice's and Bob's initial states (field name, byte range, and size), useful for debugging serialization interoperability. Refer to Appendix F when implementing an encoder — the compact field listing above gives field order but not absolute offsets, which depend on the sizes of preceding optional fields and are easiest to verify against the Appendix F examples.

**Deserialization validation** (twenty-four active guards, numbered 1-25 with guard 4 removed in v5; implementations may decompose these into more code-level checks, e.g., guard 20 produces two checks (one per fingerprint), so guards 19 and 20 together produce three checks). All 24 active guards apply exclusively to `from_bytes` / `from_bytes_with_min_epoch`. The `to_bytes` path enforces only the six `can_serialize()` conditions: the three counter checks of guard 5, the epoch check of guard 24, and the two `recv_seen` size caps of guard 14. Guards such as 25 (all-zero `root_key`) are not checked on serialization: serializing a dead session produces a well-formed blob that fails on reload, which is acceptable for cleanup flows:
1. Version byte must be `0x01`; other values → `UnsupportedVersion`.
2. `send_ratchet_sk` and `send_ratchet_pk` must be both present or both absent.
3. `recv_count > 0` requires `recv_ratchet_pk` present.
4. _(Removed in v5.)_ `recv_count == 0` with `recv_ratchet_pk` present is a valid state. It occurs after a KEM ratchet step in decrypt (§6.6) sets `recv_ratchet_pk` to the new peer key and resets `recv_count` to 0, before any message in the new epoch is successfully decrypted. If the triggering message's AEAD fails and the state is rolled back, or if serialization occurs before the next successful decrypt, the blob has `recv_count == 0` with `recv_ratchet_pk` present. Rejecting this combination makes rollback-then-serialize a permanent deserialization failure — sessions become un-deserializable whenever serialization occurs between a KEM ratchet step and the first successful AEAD in the new epoch, a common app-lifecycle event (e.g., the app is backgrounded or killed between receiving a new-epoch header and completing decryption). Reimplementers adding sanity checks MUST NOT treat this combination as invalid.
5. No counter may equal `u32::MAX` → `InvalidData`.

   **`send_count`**: this value is unreachable by construction. The encrypt-side `ChainExhausted` guard (§6.5) fires at `u32::MAX - 1`, preventing `send_count` from reaching `u32::MAX`. **Invariant: `send_count ∈ [0, u32::MAX − 1]` in any reachable ratchet state** (specifically `[1, u32::MAX − 1]` in Alice's initial epoch and `[0, u32::MAX − 1]` in post-ratchet epochs). Contrast: `recv_count ∈ [0, u32::MAX]`, because no decrypt-side guard prevents it from reaching `u32::MAX`.

   **`prev_send_count`**: the same interlock applies. `prev_send_count` is set from `send_count` during a KEM ratchet step, and the encrypt guard prevents `send_count` from reaching `u32::MAX`. **Invariant: `prev_send_count < u32::MAX` in any reachable ratchet state.** A reimplementer who relaxes the encrypt-side guard (e.g., so that it fires only when `send_count == u32::MAX`) would allow `prev_send_count` to reach `u32::MAX`, causing this deserialization guard to fire and permanently breaking the session.

   **`recv_count`**: unlike the send-side counters, `u32::MAX` is *legitimately reachable*. A peer who sends message `n = u32::MAX − 1` causes `recv_count = max(recv_count, n + 1) = u32::MAX` (there is no decrypt-side guard analogous to the encrypt-side `ChainExhausted`). This guard makes the session un-serializable, but the session remains functional in memory. Recovery: the peer triggers a KEM ratchet step (direction change), which resets `recv_count` to 0 in the new epoch. The error returned SHOULD indicate "un-serializable, recoverable by direction change" rather than "corruption"; the `can_serialize()` predicate (which checks all three counters) returns `false` until the KEM ratchet step occurs.

   **Asymmetry between `to_bytes` and `from_bytes` for `recv_count = u32::MAX`**: When `recv_count == u32::MAX`, `can_serialize()` returns `false` and `to_bytes` returns `ChainExhausted` — the state is not consumed, and the session remains functional in memory. When a blob with `recv_count == u32::MAX` is loaded from storage (an unusual scenario — such a blob cannot originate from a correctly-functioning implementation, since `can_serialize()` prevents `to_bytes` from ever writing such a blob; it could originate from a different implementation without the `can_serialize()` pre-check, storage corruption, or a compatibility scenario with a prior implementation version), `from_bytes` returns `InvalidData` (this guard). The error semantics differ: `ChainExhausted` from `to_bytes` is recoverable (peer triggers direction change); `InvalidData` from `from_bytes` is not (the session is permanently broken from the persistence layer's perspective). A caller handling the persistence layer MUST distinguish these two cases — the recovery action differs: for `ChainExhausted` from `to_bytes`, wait for the peer to send; for `InvalidData` from `from_bytes`, discard the session and re-establish via LO-KEX.
6. `ratchet_pending` requires `recv_ratchet_pk` present.
7. `send_count > 0` requires `send_ratchet_sk` present.
8. `send_count == 0` with `send_ratchet_sk` present → `InvalidData`. This state exists transiently inside `encrypt()` between `PerformKEMRatchetSend` setting `send_count = 0` and the post-AEAD `send_count += 1`. The exclusive access model (§6.2) prevents serialization from capturing that window — a correctly-implemented ratchet never produces a blob with this combination.
9. `send_count == 0 && !ratchet_pending && recv_ratchet_pk.is_some() && send_ratchet_sk.is_none()` → `InvalidData`. This state means a peer ratchet key was received but `ratchet_pending` is false — unreachable because `recv_ratchet_pk` is only set during session init (where `ratchet_pending = true` for Bob) or during a KEM ratchet step in decrypt (which always sets `ratchet_pending = true`).
10. All-default state → `InvalidData`. Precise predicate (5 conditions): `send_count == 0 && recv_count == 0 && !ratchet_pending && send_ratchet_sk.is_none() && recv_ratchet_pk.is_none()`. The remaining fields (`prev_send_count`, `send_ratchet_pk`, `prev_recv_epoch_key`, `prev_recv_ratchet_pk`, `recv_seen`, `prev_recv_seen`) are not checked — they are redundant given other guards: `send_ratchet_pk` is constrained by guard 2 (co-presence with `send_ratchet_sk`); `prev_recv_epoch_key` and `prev_recv_ratchet_pk` are constrained by guard 13 (co-presence); `prev_send_count == 0` is implied by `send_count == 0` (prev_send_count is set from send_count at ratchet time, and no ratchet has fired if send_count == 0 and send_ratchet_sk is absent); `recv_seen` emptiness is implied by `recv_count == 0` (guard 17 requires entries < recv_count); `prev_recv_seen` emptiness is implied by `prev_recv_epoch_key.is_none()` (guard 18). This guard catches blobs where root_key and fingerprints are non-zero but everything else is at initialization defaults — a state unreachable by construction (init_alice sets send_count = 1, init_bob sets recv_count = 1 and ratchet_pending = true).
11. Trailing bytes after complete parse → `InvalidData`.

    **Truncated input returns `InvalidData`, not `InvalidLength`**: If the blob is shorter than expected (buffer runs out before all required fields are read), the decoder returns `InvalidData` — not `InvalidLength`. `InvalidLength` would leak parser state: an attacker could probe inputs of increasing length and observe the error transition from `InvalidLength` to `InvalidData`, revealing the byte offset where parsing advanced past the size check, progressively exposing the internal blob layout. Using `InvalidData` for truncation collapses this oracle. A reimplementer who returns `InvalidLength` for short blobs produces a distinguishable error type for the same condition.
12. `epoch` must be strictly greater than the last-seen epoch for the same session → `InvalidData` (anti-rollback; prevents storage-layer replay of older blobs that would cause AEAD nonce reuse).
13. `prev_recv_epoch_key` and `prev_recv_ratchet_pk` must be both present or both absent.
14. `num_recv_seen` and `num_prev_recv_seen` must be strictly less than `MAX_RECV_SEEN` (65536) — consistent with the runtime cap in decrypt that rejects at the boundary.
15. Each `recv_seen` entry and each `prev_recv_seen` entry must not equal `u32::MAX`. Both sets must be in strictly ascending order (which also enforces no duplicates). Non-ascending order or a `u32::MAX` entry in either set → `InvalidData`. **Rationale for `u32::MAX` exclusion**: the decrypt-side `ChainExhausted` guard (§6.6) fires before processing any message with `header.n == u32::MAX`, preventing such a message from ever being successfully decrypted. A counter of `u32::MAX` can therefore never legitimately appear in `recv_seen` or `prev_recv_seen` — any blob that claims otherwise is malformed. **Rationale for sorted order**: deterministic serialization — identical ratchet state must produce byte-for-byte identical blobs for persistent storage recovery and anti-rollback epoch comparison. Both `recv_seen` and `prev_recv_seen` must be sorted; sorting only one set breaks blob determinism. The sort is enforced at serialization time (`to_bytes`); a reimplementer who stores entries in insertion order and sorts lazily on decode violates the identical-blob invariant.
16. `prev_recv_epoch_key` all-zero → `InvalidData` (message keys derived from an all-zero epoch key are deterministic and publicly computable; cf. guard 22).
17. Each `recv_seen` entry must be `< recv_count` (high-water mark consistency). There is no analogous `prev_recv_count` high-water mark for `prev_recv_seen`: when a receive epoch rotates into the previous slot, its `recv_count` is not preserved. `prev_recv_seen` is constrained only by guard 14 (set size `< MAX_RECV_SEEN`) and guard 15 (strictly ascending, no `u32::MAX` entry); any entry value in `[0, u32::MAX − 1]` is valid. **This asymmetry is intentional and safe**: `prev_recv_seen` is a bounded, append-only set that is discarded on the next KEM ratchet step, and no computation depends on a high-water-mark relationship between its entries and any stored counter. A reviewer noticing the asymmetry should not add a `prev_recv_count` field to close it; doing so would require persisting the previous epoch's `recv_count`, adding wire-format complexity for no security benefit.
18. Non-empty `prev_recv_seen` requires `prev_recv_epoch_key` present. The converse does not hold — `prev_recv_epoch_key` present with `prev_recv_seen` empty is valid (occurs immediately after the first KEM ratchet step, before any previous-epoch messages arrive).
19. `local_fp == remote_fp` → `InvalidData` (self-sessions break AAD symmetry).
20. All-zero `local_fp` or `remote_fp` → `InvalidData` (indicates uninitialized fingerprints).
21. All-zero `send_ratchet_sk[0..32]` (the X25519 scalar portion, per the §8.1 layout) when present → `InvalidData`. Only the X25519 scalar (first 32 bytes) is checked: an all-zero scalar produces an all-zero X25519 DH output during X-Wing decapsulation, collapsing the combiner and eliminating X25519's contribution to the shared secret. The ML-KEM-768 component (bytes 32..2432) has internal structure validated by the ml-kem crate's own deserialization; an all-zero ML-KEM key is rejected at the library boundary before this guard runs. Note: X25519 clamping (RFC 7748 §5: clear bits 0, 1, and 2, clear bit 255, set bit 254) makes an all-zero scalar impossible for honestly generated keys, so this guard catches only maliciously crafted or corrupted blobs.
22. `recv_count > 0` OR `recv_ratchet_pk` present, with all-zero `recv_epoch_key` → `InvalidData` (liveness: a post-initial-state session must have a real epoch key — otherwise HMAC-based message key derivation produces publicly computable keys).
23. `send_count > 0 && !ratchet_pending` with all-zero `send_epoch_key` → `InvalidData` (liveness: messages were sent with a non-functional epoch key). The `!ratchet_pending` conjunction is defense-in-depth: the state (`ratchet_pending=true`, `send_count > 0`, all-zero `send_epoch_key`) is unreachable by construction — any `send_count > 0` implies a prior KEM ratchet step that produced a non-zero `send_epoch_key`. The condition prevents incorrectly rejecting a theoretically-possible-but-implementation-unreachable serialized state.

    **Bob's initial all-zero `send_epoch_key` is valid**: Bob's initial state has `send_count = 0`, `ratchet_pending = true`, and `send_epoch_key = 0x00{32}` (placeholder). Neither guard 22 nor guard 23 rejects it. Bob does satisfy guard 22's precondition (`recv_count = 1` and `recv_ratchet_pk` set), but guard 22 inspects only `recv_epoch_key`, which is non-zero; guard 23 requires `send_count > 0`, and Bob has `send_count = 0`. The all-zero `send_epoch_key` is safe because `ratchet_pending = true` guarantees it is replaced by `KDF_Root` output during the first `encrypt()` call's KEM ratchet step, so it is never used for key derivation. A reimplementer adding their own all-zero-key sanity check must exclude this initial state, or Bob's first serialized blob will fail to load.
24. `epoch` at `u64::MAX` → `ChainExhausted` (the next serialization would overflow). Unlike `ChainExhausted` from `encrypt()` (§6.5), this is not session-fatal: the in-memory ratchet state is NOT consumed; it remains valid and can continue sending and receiving. However, the session is permanently un-serializable: `epoch` is incremented only by `to_bytes()` and reset only by `reset()`, and no KEM ratchet step or other operation resets it. The only recovery is `reset()` followed by a new LO-KEX exchange. (In practice, performing `u64::MAX` serializations is infeasible; the guard is included for completeness.) **This guard fires on both `to_bytes` and `from_bytes`**: `from_bytes` also rejects a stored `epoch` field equal to `u64::MAX` with `ChainExhausted`. Loading such a blob would produce a session that can send and receive in memory but returns `ChainExhausted` on the very next `to_bytes` call, making it permanently un-persistable before a single message is processed. Rejecting at deserialization prevents this "zombie session" state. A reimplementer who checks guard 24 only on the serialization path creates a session that appears to load successfully but can never be saved.
25. All-zero `root_key` → `InvalidData` (session was zeroized/reset — not a valid deserializable state). Constant-time comparison via `ct_eq`.
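
The cross-field guards can be collected into a single check over an already-parsed state. This is a sketch under stated assumptions: `DecodedState` and `check_guards` are hypothetical names; parsing (guards 1 and 11), the anti-rollback comparison (guard 12), and the epoch check (guard 24) are left out; and the all-zero test is an OR-fold without early exit rather than a vetted `ct_eq` primitive, so it carries no constant-time guarantee.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    InvalidData,
}

const MAX_RECV_SEEN: usize = 65536;

/// Hypothetical in-memory view of a parsed blob (field names follow §6.8).
struct DecodedState {
    root_key: [u8; 32],
    send_epoch_key: [u8; 32],
    recv_epoch_key: [u8; 32],
    local_fp: [u8; 32],
    remote_fp: [u8; 32],
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    ratchet_pending: bool,
    send_ratchet_sk: Option<Vec<u8>>,
    send_ratchet_pk: Option<Vec<u8>>,
    recv_ratchet_pk: Option<Vec<u8>>,
    prev_recv_epoch_key: Option<[u8; 32]>,
    prev_recv_ratchet_pk: Option<Vec<u8>>,
    recv_seen: Vec<u32>,
    prev_recv_seen: Vec<u32>,
}

/// OR-fold over every byte (no early exit, but no constant-time guarantee).
fn is_all_zero(bytes: &[u8]) -> bool {
    bytes.iter().fold(0u8, |acc, b| acc | b) == 0
}

/// Guard 15: strictly ascending (so no duplicates) and no u32::MAX entry.
fn ascending_without_max(set: &[u32]) -> bool {
    set.iter().all(|&v| v != u32::MAX) && set.windows(2).all(|w| w[0] < w[1])
}

fn check_guards(s: &DecodedState) -> Result<(), Error> {
    let violations = [
        // Guards 2 and 13: co-presence of paired fields.
        s.send_ratchet_sk.is_some() != s.send_ratchet_pk.is_some(),
        s.prev_recv_epoch_key.is_some() != s.prev_recv_ratchet_pk.is_some(),
        // Guards 3, 6, 7: counters and flags need the matching keys.
        s.recv_count > 0 && s.recv_ratchet_pk.is_none(),
        s.ratchet_pending && s.recv_ratchet_pk.is_none(),
        s.send_count > 0 && s.send_ratchet_sk.is_none(),
        // Guard 5: no counter may equal u32::MAX.
        [s.send_count, s.recv_count, s.prev_send_count].contains(&u32::MAX),
        // Guard 8: the transient in-encrypt state is never persisted.
        s.send_count == 0 && s.send_ratchet_sk.is_some(),
        // Guard 9: peer key received but ratchet_pending is false.
        s.send_count == 0 && !s.ratchet_pending && s.recv_ratchet_pk.is_some()
            && s.send_ratchet_sk.is_none(),
        // Guard 10: all-default state.
        s.send_count == 0 && s.recv_count == 0 && !s.ratchet_pending
            && s.send_ratchet_sk.is_none() && s.recv_ratchet_pk.is_none(),
        // Guards 14 and 15: set-size caps and ordering.
        s.recv_seen.len() >= MAX_RECV_SEEN || s.prev_recv_seen.len() >= MAX_RECV_SEEN,
        !ascending_without_max(&s.recv_seen) || !ascending_without_max(&s.prev_recv_seen),
        // Guard 16: all-zero previous epoch key.
        s.prev_recv_epoch_key.map_or(false, |k| is_all_zero(&k)),
        // Guard 17: recv_seen entries below the high-water mark.
        s.recv_seen.iter().any(|&n| n >= s.recv_count),
        // Guard 18: prev_recv_seen needs prev_recv_epoch_key.
        !s.prev_recv_seen.is_empty() && s.prev_recv_epoch_key.is_none(),
        // Guards 19 and 20: distinct, non-zero fingerprints.
        s.local_fp == s.remote_fp || is_all_zero(&s.local_fp) || is_all_zero(&s.remote_fp),
        // Guard 21: all-zero X25519 scalar portion of send_ratchet_sk.
        s.send_ratchet_sk.as_deref().map_or(false, |sk| sk.len() >= 32 && is_all_zero(&sk[..32])),
        // Guards 22 and 23: live sessions need real epoch keys.
        (s.recv_count > 0 || s.recv_ratchet_pk.is_some()) && is_all_zero(&s.recv_epoch_key),
        s.send_count > 0 && !s.ratchet_pending && is_all_zero(&s.send_epoch_key),
        // Guard 25: an all-zero root_key means the session was reset.
        is_all_zero(&s.root_key),
    ];
    if violations.contains(&true) {
        Err(Error::InvalidData)
    } else {
        Ok(())
    }
}
```

Evaluating every condition before deciding keeps the sketch readable; a production decoder may short-circuit, as long as every path returns the same error variant.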

**Anti-rollback deserialization**: The primary deserialization entry point is `from_bytes_with_min_epoch(blob, min_epoch)`, which enforces guard 12: the blob's `epoch` field must be strictly greater than `min_epoch` (i.e., `epoch > min_epoch`, not `>=`). This prevents storage-layer replay attacks where an adversary substitutes an older serialized blob to rewind the ratchet to a prior state — which would re-derive the same epoch keys and restart counters from their prior values, causing catastrophic AEAD nonce reuse.

The `epoch` counter is a u64 incremented by `to_bytes()` each time the state is serialized. It is a persistence-layer counter independent of cryptographic epochs (KEM ratchet steps) — multiple serializations may occur within a single cryptographic epoch. Consequently, multiple valid blobs with different `epoch` values can encode identical cryptographic state (same keys, counters, and `recv_seen` sets). The anti-rollback mechanism prevents loading a blob with an older persistence epoch, but does not prevent two blobs encoding the same cryptographic state from coexisting on disk. The protection is "no blob older than the last loaded" — not "no blob with a prior cryptographic state."
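
The serialization-side gating can be sketched as follows. `PersistState`, `can_serialize`, and `begin_serialize` are hypothetical names; the sketch takes `&mut self` and only computes the epoch for the next blob, so it does not model the ownership-consuming `to_bytes(self)` described under the caller obligations. Two details are assumptions: the epoch condition is written so that the epoch stored in the blob (`epoch + 1`) never reaches `u64::MAX`, matching the `ChainExhausted` row of the error table below, and every failed condition is reported as `ChainExhausted`, which the text specifies for the counter and epoch cases but not for the `recv_seen` caps.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    ChainExhausted,
}

const MAX_RECV_SEEN: usize = 65536;

/// Hypothetical subset of ratchet state relevant to serialization gating.
struct PersistState {
    epoch: u64,
    send_count: u32,
    recv_count: u32,
    prev_send_count: u32,
    recv_seen_len: usize,
    prev_recv_seen_len: usize,
}

impl PersistState {
    /// The six conditions: three counters, the epoch, two recv_seen caps.
    fn can_serialize(&self) -> bool {
        self.send_count != u32::MAX
            && self.recv_count != u32::MAX
            && self.prev_send_count != u32::MAX
            // Assumption: the epoch written (epoch + 1) must stay below u64::MAX.
            && self.epoch < u64::MAX - 1
            && self.recv_seen_len < MAX_RECV_SEEN
            && self.prev_recv_seen_len < MAX_RECV_SEEN
    }

    /// Returns the persistence epoch the next blob would carry. On failure
    /// nothing is mutated, so the in-memory session keeps working.
    fn begin_serialize(&mut self) -> Result<u64, Error> {
        if !self.can_serialize() {
            return Err(Error::ChainExhausted);
        }
        self.epoch += 1; // incremented before being written into the blob
        Ok(self.epoch)
    }
}
```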

**Caller obligations**:

1. **Store `min_epoch` with independent integrity.** The caller must persist the last-seen epoch value in a location whose integrity is independent of the ratchet blob itself. If an adversary who can substitute the ratchet blob can also substitute `min_epoch`, the anti-rollback mechanism is defeated. Suitable approaches: a separate authenticated store, a monotonic counter in secure hardware, or a key-value store where each entry is independently authenticated.

2. **Atomic commit of blob + min_epoch.** The `min_epoch` update and the new blob must be committed atomically (or in the correct order) to survive crashes. The safe pattern: after `to_bytes()` returns `(blob_N, epoch_N)`, persist `blob_N` first, then update `min_epoch` to `epoch_N - 1`. This ensures the current blob is always reloadable: `epoch_N > epoch_N - 1` holds. Updating `min_epoch` to `epoch_N` (the blob's own epoch) before persisting the next blob is dangerous: if the application crashes before the next `to_bytes()`, `blob_N` is no longer loadable (`epoch_N > epoch_N` is false) and the session is permanently lost. The invariant is: `stored_min_epoch < epoch_of_current_blob`, so the current blob always passes guard 12.

3. **First-session bootstrap.** For a newly established session (first deserialization ever), the caller should pass `min_epoch = 0`. The initial `to_bytes()` call sets `epoch = 1`, which satisfies `1 > 0`. Subsequent deserializations pass the stored `min_epoch` maintained under obligation 2 (`epoch_N - 1` for the most recently persisted blob), not the epoch of that blob itself, which would fail guard 12.

4. **Per-session tracking.** Each ratchet session requires its own `min_epoch` value — sessions are independent and their epoch counters are unrelated. The caller must map session identifiers to their respective min_epoch values. The stable session identity is the `(local_fp, remote_fp)` pair — the two 32-byte fingerprints supplied to `init_alice`/`init_bob`. These are invariant across serialization/deserialization cycles and survive restarts; application-layer connection IDs, database row IDs, or in-process object pointers do not survive restarts and MUST NOT be used as per-session min_epoch keys. A global `min_epoch` shared across sessions enables cross-session substitution: if session A has `epoch = 50` and session B has `epoch = 30`, an attacker who can write to the storage layer can replace session B's blob with session A's blob — the global `min_epoch` (set to 29 from B's last load) accepts A's epoch 50, and the application now has session A's ratchet state in session B's slot, causing messages to B's peer to be encrypted under A's keys (undecryptable by B's peer, but revealing A's ratchet state to B's storage layer).

5. **Handle `ChainExhausted` from `to_bytes()`.** If `to_bytes()` returns `ChainExhausted` because the epoch is at `u64::MAX` (as opposed to the recoverable `recv_count == u32::MAX` case described under guard 5), the session is permanently un-serializable. The caller must treat this as session-fatal for persistence purposes: discard the session and establish a new one via LO-KEX (§5). The in-memory ratchet remains functional for sending and receiving, but any crash or restart without a persisted blob loses the session state. Callers SHOULD check `can_serialize()` before long-running operations and proactively re-establish the session rather than waiting for `to_bytes()` failure. In practice, reaching `u64::MAX` serializations is infeasible.

6. **`to_bytes` is ownership-consuming — serialize before invalidating.** In Rust, `to_bytes(self)` enforces this at the type level (the ratchet is moved into the function). In languages without ownership semantics (C, Go, Python), the implementation must complete serialization of the entire blob before invalidating the handle. An implementation that nulls the handle pointer (or frees the backing memory) before finishing serialization leaves the caller with neither the original state nor a complete blob — a total session loss on crash. The safe pattern: serialize all fields into the output buffer, then zeroize and free the internal state, then return the blob. The CAPI (`soliton_ratchet_to_bytes`) follows this pattern: the output buffer is fully written before the handle is freed.
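
Obligations 1 through 4 can be sketched as a small per-session store. `SessionStore`, `SessionId`, `commit`, and `check_load` are hypothetical, and the in-memory `HashMap`s stand in for storage in which the `min_epoch` entries would need integrity independent of the blobs (obligation 1). The key is the `(local_fp, remote_fp)` pair, and `commit` writes the blob before recording `epoch_N - 1`.

```rust
use std::collections::HashMap;

/// Stable session identity: (local_fp, remote_fp).
type SessionId = ([u8; 32], [u8; 32]);

#[derive(Debug, PartialEq)]
enum LoadError {
    InvalidData,
}

/// Hypothetical store. In production, `min_epochs` needs integrity
/// independent of `blobs` (obligation 1); plain maps stand in here.
#[derive(Default)]
struct SessionStore {
    blobs: HashMap<SessionId, Vec<u8>>,
    min_epochs: HashMap<SessionId, u64>,
}

impl SessionStore {
    /// Obligation 2: persist blob_N first, then record epoch_N - 1,
    /// so the blob just written always remains loadable.
    fn commit(&mut self, id: SessionId, epoch_n: u64, blob: Vec<u8>) {
        let min = epoch_n.checked_sub(1).expect("to_bytes never emits epoch 0");
        self.blobs.insert(id, blob);
        self.min_epochs.insert(id, min);
    }

    fn blob(&self, id: &SessionId) -> Option<&[u8]> {
        self.blobs.get(id).map(|b| b.as_slice())
    }

    /// Obligations 3 and 4: per-session lookup, 0 for a new session.
    fn min_epoch(&self, id: &SessionId) -> u64 {
        self.min_epochs.get(id).copied().unwrap_or(0)
    }

    /// Guard 12 from the caller's side: require epoch > min_epoch.
    fn check_load(&self, id: &SessionId, blob_epoch: u64) -> Result<(), LoadError> {
        if blob_epoch > self.min_epoch(id) {
            Ok(())
        } else {
            Err(LoadError::InvalidData)
        }
    }
}
```

Because each session has its own entry, a blob copied from session A into session B's slot is checked against B's `min_epoch`, not against a shared global value.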

**`from_bytes` / `from_bytes_with_min_epoch` error table** (all callers MUST handle all three error types):

| Error | Condition | Recovery |
|-------|-----------|----------|
| `UnsupportedVersion` | Version byte ≠ 0x01 (guard 1) | Reject blob; re-establish via LO-KEX. No migration path — old-version blobs are permanently unreadable. For version bytes > 0x01 (future versions), upgrading to an implementation that supports that version may enable recovery. |
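| `InvalidData` | Any active guard in 2-23 or guard 25 fires (guard 12, epoch rollback, applies only to `from_bytes_with_min_epoch`); or the input is truncated or too large | Blob is structurally invalid, corrupt, or rolled back; re-establish via LO-KEX |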
| `ChainExhausted` | `epoch == u64::MAX` (guard 24) | Stored epoch at u64::MAX cannot be re-serialized (`to_bytes` overflows on `epoch + 1`). States with stored epoch u64::MAX - 1 are accepted but non-serializable (`can_serialize()` returns false, `to_bytes` returns `ChainExhausted`); re-establish via LO-KEX |

**Callers matching only `InvalidData | UnsupportedVersion` will misclassify `ChainExhausted`**: guard 24 fires for `epoch == u64::MAX`, returning `ChainExhausted` from `from_bytes`, not `InvalidData`. A caller who pattern-matches only the first two variants will treat `ChainExhausted` as an unhandled/unexpected error, potentially panicking or applying the wrong recovery action. `ChainExhausted` from `from_bytes` requires the same recovery as `InvalidData` (re-establish via LO-KEX), but the caller must handle it explicitly to avoid the default/unmatched case.
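
In Rust, an exhaustive `match` with no wildcard arm over the error type turns this misclassification into a compile error. The `LoadError` and `Recovery` enums below are hypothetical stand-ins for whatever a binding exposes; the `version_byte` argument assumes the caller has read the blob's first byte, as in the diagnostic pattern described later in this section.

```rust
/// Hypothetical mirror of the three errors `from_bytes*` can return.
#[derive(Debug, PartialEq)]
enum LoadError {
    UnsupportedVersion,
    InvalidData,
    ChainExhausted,
}

#[derive(Debug, PartialEq)]
enum Recovery {
    /// Discard the session and establish a new one via LO-KEX (§5).
    Rekey,
    /// Newer blob version: upgrading the implementation may recover it;
    /// otherwise re-key.
    UpgradeOrRekey,
}

/// No wildcard arm over `LoadError`: a missing variant fails to compile.
fn recovery_for(err: &LoadError, version_byte: Option<u8>) -> Recovery {
    match err {
        LoadError::UnsupportedVersion => match version_byte {
            Some(v) if v > 0x01 => Recovery::UpgradeOrRekey,
            _ => Recovery::Rekey,
        },
        LoadError::InvalidData => Recovery::Rekey,
        // Guard 24 (stored epoch == u64::MAX): same recovery as InvalidData,
        // but matched explicitly.
        LoadError::ChainExhausted => Recovery::Rekey,
    }
}
```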

**Blob size bound**: The maximum valid ratchet blob is bounded by the wire format. Guard 14 rejects `num_seen >= MAX_RECV_SEEN` (65536), so each `recv_seen` set holds at most 65,535 entries, and the blob reaches approximately 530 KB (530,595 bytes with every optional field present). The dominant term is two sets × 65,535 entries × 4 bytes per u32 = 524,280 bytes; the remaining fields (three X-Wing public keys at 1,216 bytes each, the 2,432-byte X-Wing secret key, four 32-byte keys, fingerprints, length prefixes, markers, counters, and header fields) add approximately 6.3 KB. CAPI implementations apply a 1 MiB cap on `from_bytes` input as defense-in-depth against oversized inputs (tighter than the general 256 MiB CAPI cap, since ratchet blobs have a known bounded size). Reimplementers building their own deserialization entry point should apply a similar cap.

**Minimum valid blob size is 195 bytes**: all optional fields absent (`0x00` markers, 5 bytes), both `recv_seen` counts zero (2 × 4 = 8 bytes), and the fixed mandatory fields (version 1 B + epoch 8 B + root_key 32 B + send_epoch_key 32 B + recv_epoch_key 32 B + local_fp 32 B + remote_fp 32 B + send_count 4 B + recv_count 4 B + prev_send_count 4 B + ratchet_pending 1 B = 182 bytes): 182 + 5 + 8 = 195 bytes. Any blob shorter than 195 bytes cannot represent a valid ratchet state and MUST be rejected with `InvalidData`. The reference implementation rejects such blobs during parsing: the field reader exhausts the buffer and returns `InvalidData` mid-parse rather than via an upfront length guard. Reimplementers who want a fast-reject pre-check SHOULD add an explicit `if blob.len() < 195 { return Err(InvalidData) }` guard before beginning field-by-field parsing.

**The 195-byte floor is a format floor, not a state floor**: no valid ratchet session produces a blob this small, because guards 3, 6, 7, and 10 together reject every state with all optional fields absent. Alice's initial state (after `ratchet_init_alice`) includes `send_ratchet_sk` (2,432 bytes) and `send_ratchet_pk` (1,216 bytes), each with a 2-byte length prefix, making her minimum 3,847 bytes. Bob's initial state (after `ratchet_init_bob`) includes only `recv_ratchet_pk` (1,216 bytes plus its 2-byte length prefix; `peer_ek` is already absorbed and not stored separately), making his minimum exactly 1,413 bytes (see F.21 for the full field-by-field breakdown). The 195-byte check is a fast-reject sanity guard; implementations should not size parsing buffers based on it.
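
The size arithmetic in this subsection can be checked mechanically. The constants mirror the field sizes listed above; `blob_size` is a hypothetical helper, and it assumes the encoding rules of this section: a marker for every optional field, a 2-byte length prefix on every optional field except `prev_recv_epoch_key`, and a 4-byte count per `recv_seen` set.

```rust
/// version + epoch + (root_key, send/recv epoch keys, local/remote fp)
/// + three u32 counters + ratchet_pending.
const FIXED: usize = 1 + 8 + 5 * 32 + 3 * 4 + 1;
/// One presence marker per optional field (five fields).
const MARKERS: usize = 5;
/// u32 entry counts for recv_seen and prev_recv_seen.
const SEEN_COUNTS: usize = 2 * 4;
const MIN_BLOB: usize = FIXED + MARKERS + SEEN_COUNTS;

const XWING_PK: usize = 1216;
const XWING_SK: usize = 2432;

/// A present length-prefixed field adds a 2-byte length plus its data
/// (its marker is already counted in MIN_BLOB).
const fn with_len_prefix(n: usize) -> usize {
    2 + n
}

/// Hypothetical helper: encoded size for a given combination of fields.
fn blob_size(send_keys: bool, recv_pk: bool, prev_epoch: bool, seen: usize, prev_seen: usize) -> usize {
    let mut size = MIN_BLOB;
    if send_keys {
        size += with_len_prefix(XWING_SK) + with_len_prefix(XWING_PK);
    }
    if recv_pk {
        size += with_len_prefix(XWING_PK);
    }
    if prev_epoch {
        // prev_recv_epoch_key has no length prefix; prev_recv_ratchet_pk does.
        size += 32 + with_len_prefix(XWING_PK);
    }
    size + 4 * (seen + prev_seen)
}
```

Under these assumptions the helper gives 195 bytes for the format floor, 1,413 bytes for Bob's initial state, 3,847 bytes for Alice's, and 530,595 bytes for a blob with every optional field present and both sets at 65,535 entries.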

The convenience function `from_bytes(blob)` (without min_epoch) exists for use cases where anti-rollback is managed externally or is inapplicable (e.g., in-memory round-trip during migration). It is equivalent to `from_bytes_with_min_epoch(blob, 0)` and provides no rollback protection. Implementations that persist ratchet state MUST use `from_bytes_with_min_epoch`. `from_bytes` is deprecated at the Rust API level (using it produces a compiler warning). Binding authors SHOULD NOT expose it as a public API — expose only `from_bytes_with_min_epoch` and let callers pass `min_epoch = 0` explicitly when they want no rollback protection.

**Anti-rollback failure recovery**: When `from_bytes_with_min_epoch` rejects a blob due to epoch rollback (guard 12), the session is permanently broken — the persisted state has been rewound to a prior epoch, which would cause AEAD nonce reuse if accepted. The application MUST discard the session and initiate a new LO-KEX exchange (§5). Reimplementers MUST NOT retry with an older blob, silently fall back to stale in-memory state, or attempt to "repair" the epoch counter. The only safe recovery path is full session re-establishment.

**`InvalidData` ambiguity from `from_bytes_with_min_epoch`**: Both epoch-rollback rejection (guard 12) and structural blob corruption (guards 2-11, 13-23, 25) return `InvalidData`. A caller who needs to distinguish "blob is stale, need new KEX" from "blob is corrupted, check backups" cannot do so from the error alone. The recovery action is identical in both cases — discard the session and establish a new one via LO-KEX — so the ambiguity has no practical consequence.

**Diagnostic pattern for higher-level APIs**: Appendix E and §13.5 note that `from_bytes` (the no-min-epoch variant) is deprecated and SHOULD NOT be exposed as a public API in higher-level bindings. A reimplementer who needs to distinguish epoch rollback from structural corruption without exposing `from_bytes` should implement `inspect_version(blob: &[u8]) -> Result<u8, Error>`, a function that reads only the first byte of the blob and returns the version without parsing or validating any other field. This is not a full deserialization and carries no mutation risk. If `inspect_version` returns `0x01`, the blob is current-version (so the rejection was not a version mismatch), and epoch rollback can then be confirmed by comparing `blob[1..9]` (the epoch, a big-endian u64) against `min_epoch`: an epoch `<= min_epoch` means guard 12 rejects the blob. This diagnostic should be implemented at the application layer using the raw blob bytes, not by calling the library's deprecated `from_bytes`. `from_bytes` is safe to call for purely diagnostic purposes (it is read-only and never mutates external state), but exposing it publicly encourages callers to use it as a primary deserialization path, bypassing anti-rollback protection.
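
A minimal application-layer version of this diagnostic is sketched below. `inspect_version`, `diagnose_rejected_blob`, and the `Diagnosis` enum are hypothetical; the sketch assumes the blob has already been rejected with `InvalidData` and relies only on the layout stated in this section (version byte at offset 0, big-endian epoch at offsets 1..9).

```rust
#[derive(Debug, PartialEq)]
enum Error {
    InvalidData,
}

/// Current blob version byte (guard 1).
const CURRENT_VERSION: u8 = 0x01;

/// Hypothetical helper: read only the version byte, validating nothing else.
fn inspect_version(blob: &[u8]) -> Result<u8, Error> {
    blob.first().copied().ok_or(Error::InvalidData)
}

#[derive(Debug, PartialEq)]
enum Diagnosis {
    /// Version byte other than the current one.
    VersionMismatch(u8),
    /// Stored epoch <= min_epoch: guard 12 rejects this blob.
    Rollback { blob_epoch: u64 },
    /// Anything else: treat as structural corruption.
    Corrupt,
}

/// Classify a blob that `from_bytes_with_min_epoch` already rejected.
/// Reads only bytes 0..9; never parses the rest of the blob.
fn diagnose_rejected_blob(blob: &[u8], min_epoch: u64) -> Diagnosis {
    match inspect_version(blob) {
        Ok(CURRENT_VERSION) => {}
        Ok(other) => return Diagnosis::VersionMismatch(other),
        Err(_) => return Diagnosis::Corrupt,
    }
    let mut epoch_bytes = [0u8; 8];
    match blob.get(1..9) {
        Some(bytes) => epoch_bytes.copy_from_slice(bytes),
        None => return Diagnosis::Corrupt,
    }
    let blob_epoch = u64::from_be_bytes(epoch_bytes);
    if blob_epoch <= min_epoch {
        Diagnosis::Rollback { blob_epoch }
    } else {
        Diagnosis::Corrupt
    }
}
```

Either diagnosis leads to the same recovery (a new LO-KEX session); the distinction is only useful for logging or for deciding whether to inspect backups.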

### 6.9 Implementation Notes

**Requirements for implementers:**
1. **Test vectors**: Ship comprehensive vectors covering: in-order, out-of-order, KEM ratchet step, previous-epoch message, duplicate detection.
2. **State serialization**: Round-trip serialize/deserialize ratchet state and verify continued operation.
3. **Fuzzing**: Fuzz the decryption path with random headers. Verify no panics, no state corruption.
4. **Session reset as escape hatch**: On unrecoverable decryption failure, fall back to session reset (§6.10). Lost messages are unavoidable but preferable to permanent communication failure.
5. **Defensive validation**: Before each operation, check invariants (counters non-negative, expected keys present, epoch keys non-zero). Violation → trigger session reset.

### 6.10 Session Reset

**`reset(state)`**: Zeroizes all key material (root_key, send_epoch_key, recv_epoch_key), drops all optional keys (send_ratchet_sk/pk, recv_ratchet_pk, prev_recv_epoch_key, prev_recv_ratchet_pk), resets all counters to 0, clears recv_seen and prev_recv_seen, sets ratchet_pending = false, and resets epoch to 0. The all-zero root_key serves as the liveness sentinel — subsequent encrypt/decrypt calls detect the dead session via constant-time comparison (§6.5, §6.6). The fingerprints (`local_fp`, `remote_fp`) are also zeroized to prevent information leakage from the dead state. **Caller obligation**: after `reset()`, the ratchet handle no longer identifies which peer this session belonged to — both fingerprints are zero. Applications that need to associate the handle with a peer identity after a reset (e.g., to display the peer name or verify a new LO-KEX exchange) MUST store the fingerprints independently (in application state) before calling `reset()`. The library cannot preserve them across reset.

**Destructor zeroization obligation**: Implementations MUST zeroize all key material when the ratchet state object is deallocated (destructor, finalizer, or equivalent), not only on explicit `reset()` calls. In Rust, this is achieved by implementing `Drop` to call `reset()` — any abandoned state (error path, lost reference, scope exit) is automatically zeroized. Non-Rust implementations must arrange equivalent behavior: a C implementation must zeroize in the free function, a Go implementation in a finalizer, a Python implementation in `__del__` (with a note that CPython finalizer timing is non-deterministic — consider explicit `close()` as the primary path). Without destructor zeroization, every error path that drops a session object leaks key material to the process heap.
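
A reduced sketch of this obligation in Rust, using only the standard library: volatile writes followed by a compiler fence keep the zeroing stores from being optimized away, and `Drop` routes every deallocation through `reset()`. `RatchetState` here is a hypothetical, trimmed-down type rather than the reference struct, and a production implementation would more likely use a dedicated zeroization crate. `reset()` in the sketch takes no locks and cannot fail.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite with zeros via volatile writes so the stores are not elided.
/// Note: only `buf` is wiped, not any spare Vec capacity beyond its length.
fn zeroize(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical, heavily reduced ratchet state (not the reference type).
struct RatchetState {
    root_key: [u8; 32],
    send_epoch_key: [u8; 32],
    recv_epoch_key: [u8; 32],
    local_fp: [u8; 32],
    remote_fp: [u8; 32],
    send_ratchet_sk: Option<Vec<u8>>,
    send_count: u32,
    recv_count: u32,
    ratchet_pending: bool,
    epoch: u64,
}

impl RatchetState {
    /// Plain zeroization sequence: no locks, no allocation, cannot fail.
    fn reset(&mut self) {
        zeroize(&mut self.root_key);
        zeroize(&mut self.send_epoch_key);
        zeroize(&mut self.recv_epoch_key);
        zeroize(&mut self.local_fp);
        zeroize(&mut self.remote_fp);
        if let Some(sk) = self.send_ratchet_sk.as_mut() {
            zeroize(sk); // wipe before the allocation is released
        }
        self.send_ratchet_sk = None;
        self.send_count = 0;
        self.recv_count = 0;
        self.ratchet_pending = false;
        self.epoch = 0;
    }

    /// Liveness sentinel: an all-zero root_key marks a dead session.
    fn is_dead(&self) -> bool {
        self.root_key.iter().fold(0u8, |acc, b| acc | b) == 0
    }
}

impl Drop for RatchetState {
    fn drop(&mut self) {
        self.reset(); // every deallocation path zeroizes key material
    }
}
```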

**`reset()` followed by `to_bytes()` produces a blob that `from_bytes_with_min_epoch` rejects for any `min_epoch ≥ 1`**: `reset()` sets `epoch = 0`. `to_bytes` increments the epoch counter before writing it (§6.8), so a reset state serializes with `epoch = 1` in the blob. `from_bytes_with_min_epoch(blob, min_epoch)` requires `blob.epoch > min_epoch` (§6.8 guard 12). If the application has previously persisted a blob from an active session with `min_epoch = N` (where N ≥ 1), loading the reset-state blob with that same `min_epoch` fails: `1 > N` is false for any N ≥ 1. This is correct behavior — a reset session is cryptographically equivalent to a new session, not a continuation of the old one; the old `min_epoch` correctly rejects it. **Application implication**: after calling `reset()`, the application MUST treat the session as a new session (no persisted `min_epoch` applies). If the application stores the old `min_epoch` persistently and uses it for all subsequent `from_bytes_with_min_epoch` calls, the reset session can never be loaded after its first successful `to_bytes`. The correct recovery path after `reset()` is to discard the old `min_epoch` store and establish a new LO-KEX session, not to attempt to serialize and reload the reset state under the old `min_epoch`.

**`reset()` MUST NOT acquire locks or reentrancy guards**: `reset()` is called from the `Drop` implementation (destructor), which executes whenever the ratchet state is deallocated — including on error paths that may occur while holding internal session locks. If `reset()` itself attempts to acquire a lock or reentrancy guard that is already held by the calling context, it will deadlock. `reset()` must be callable from any context, including from within a failed `encrypt()` or `decrypt()` call. The implementation must ensure that the `reset()` code path is a simple zeroization sequence with no blocking operations, no mutex acquisitions, and no calls to any function that could itself block or fail. In Rust, the `Drop` implementation that calls `reset()` satisfies this by design — Rust's ownership model prevents calling `reset()` on a borrowed-while-mutating object. In C/Go/Python, where the equivalent "destructor" is called explicitly or via finalizer, implementors must audit the `reset()` call path for lock acquisitions.

When ratchet state is unrecoverable (Protocol Spec §12.13):

1. Both parties discard all ratchet state for the peer.
2. Resetting party fetches fresh pre-keys → performs new LO-KEX.
3. New session is cryptographically independent (new EK, new shared secrets, new state).
4. Messages encrypted under the old session that were not yet delivered become permanently undecryptable. This is unavoidable.
5. Verification phrase (§9) is unchanged (depends only on IKs) — confirms identity continuity.

### 6.11 Bandwidth

| Header case | Dominant fields (approx.) | Exact encoded header (Appendix C) |
|-------------|---------------------------|-----------------------------------|
| Same-direction message | ~1216 B (`ratchet_pk`, always in header) | 1,225 B |
| Ratchet step | ~2336 B (`ratchet_pk` + `kem_ct`, raw KEM fields only) | 2,347 B |

The full ratchet public key is included in every message header so the recipient always knows the sender's current ratchet key. This provides an implicit consistency check.

**Exact encoded sizes**: The approximate values above reflect only the dominant fields. The complete `encode_ratchet_header` output (§7.4) also includes the `has_kem_ct` flag (1 byte), counter `n` (4 bytes), and previous-epoch count `pn` (4 bytes), plus 2 bytes of framing before the KEM ciphertext when one is present. Exact values: **1,225 bytes** without KEM ciphertext (1216 + 1 + 4 + 4) and **2,347 bytes** with KEM ciphertext (1216 + 1 + 2 + 1120 + 4 + 4); see Appendix C (`encode_ratchet_header`). The "~" values indicate the dominant network cost relative to payload size; the actual wire overhead is given by the Appendix C values.
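The size arithmetic can be checked mechanically. In this sketch the constant names are hypothetical; the X-Wing lengths are the sums of their ML-KEM-768 and X25519 components, and the 2-byte ciphertext framing is taken from the Appendix C sum above without assuming its exact encoding. The summation order is not meant to describe wire order.

```rust
// X-Wing public key = ML-KEM-768 encapsulation key (1184 B) + X25519 key (32 B).
pub const XWING_PK_LEN: usize = 1184 + 32; // 1216
// X-Wing ciphertext = ML-KEM-768 ciphertext (1088 B) + X25519 ephemeral (32 B).
pub const XWING_CT_LEN: usize = 1088 + 32; // 1120
pub const HAS_KEM_CT_FLAG_LEN: usize = 1;
pub const KEM_CT_FRAMING_LEN: usize = 2; // the "+ 2" in the Appendix C sum
pub const COUNTER_LEN: usize = 4; // n
pub const PREV_COUNT_LEN: usize = 4; // pn

/// Encoded length of an `encode_ratchet_header` output, per the sums above.
pub const fn ratchet_header_len(has_kem_ct: bool) -> usize {
    let base = XWING_PK_LEN + HAS_KEM_CT_FLAG_LEN + COUNTER_LEN + PREV_COUNT_LEN;
    if has_kem_ct {
        base + KEM_CT_FRAMING_LEN + XWING_CT_LEN
    } else {
        base
    }
}
```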

### 6.12 Voice Call Key Derivation

Call encryption key material for E2EE voice calls is derived from the ratchet root key and an ephemeral X-Wing shared secret exchanged during call signaling. The ephemeral KEM provides forward secrecy independent of the ratchet state — if the root key is later compromised (before the next KEM ratchet step), the call content remains confidential.

Group calls use the same mechanism: each participant derives call keys with every other participant via their existing pairwise ratchet sessions. The server acts as an SFU (Selective Forwarding Unit), routing encrypted media packets without decryption. Specifically: the `call_id` is shared across all participants (the initiator generates it once and distributes it via ratchet-encrypted signaling messages to each participant). The ephemeral KEM exchange is **per-pair** — each pair of participants performs an independent `CallOffer`/`CallAnswer` exchange over their pairwise ratchet session, producing a unique `kem_ss` per pair. Each pair therefore derives independent call keys from their unique `(root_key, kem_ss, call_id)` triple. The shared `call_id` provides a common call identifier for the application layer; it does not weaken key independence because `kem_ss` differs per pair.

#### Call Setup Protocol

1. **Initiator** generates `call_id = random_bytes(16)` and an ephemeral X-Wing keypair `(ek_pub, ek_sk) = XWing.KeyGen()`. Sends `CallOffer { call_id, ek_pub }` to the peer as a ratchet-encrypted message. **call_id MUST be unique per call**: reusing a `call_id` with the same `root_key` and `kem_ss` produces identical HKDF inputs and therefore identical call keys. Random 128-bit generation makes accidental collisions negligible (by the birthday bound, a collision becomes likely only after roughly 2⁶⁴ call IDs), but implementations MUST NOT use predictable or sequential call IDs. **`ek_pub` lifecycle**: `ek_pub` is a public key; it is non-secret, transmitted in `CallOffer`, and can be retained or discarded after transmission without security consequence. `ek_sk` is secret and MUST be retained by the initiator until `CallAnswer` arrives (it is needed for decapsulation), and MUST be zeroized immediately after `XWing.Decaps` completes, or on call rejection, cancellation, or timeout if no `CallAnswer` ever arrives. The initiator does not use `ek_pub` for any cryptographic purpose after transmission; the responder needs it only as the encapsulation target in step 2, and `ek_pub` is not an input to the call-key HKDF (whose `info` contains only the label and the two fingerprints). Neither party uses `ek_pub` after `derive_call_keys` returns.

2. **Responder** encapsulates to the ephemeral public key: `(ct, kem_ss) = XWing.Encaps(ek_pub)`. Sends `CallAnswer { call_id, ct }` as a ratchet-encrypted message. `XWing.Encaps` consumes CSPRNG randomness (ML-KEM encapsulation draws randomness for the ciphertext); at the CAPI level, CSPRNG failure aborts the process (§13.2). At the Rust API level, CSPRNG failure returns `Internal` (structurally unreachable on standard OSes — see §6.5). The `CallAnswer` MUST NOT be sent if encapsulation fails.

3. **Initiator** validates the `CallAnswer` `call_id` matches the `CallOffer` `call_id` before proceeding — mismatched call IDs indicate a confused-deputy or replay attack and MUST be rejected. Then decapsulates: `kem_ss = XWing.Decaps(ek_sk, ct)`. Zeroizes `ek_sk` immediately. **If `CallAnswer` never arrives** (peer rejects, times out, or network failure), `ek_sk` must still be zeroized — the application layer MUST zeroize the ephemeral secret key on call rejection, timeout, or cancellation, not only on successful decapsulation. Failure to do so leaves a decapsulation key in memory indefinitely, recoverable via memory scanning.

4. **Both parties** derive call keys:

```
fp_lo, fp_hi = sort(local_fp, remote_fp)   // canonical order: lower first
                                            // Equal fingerprints are rejected upstream
                                            // (guard 19 at deserialization, and init_alice/init_bob
                                            // reject local_fp == remote_fp) — no tiebreaker branch
                                            // is needed here.
call_keys = HKDF(
    salt = root_key,
    ikm  = kem_ss || call_id,    // 32 + 16 = 48 bytes, raw concatenation (no length prefixes)
    info = "lo-call-v1" || fp_lo || fp_hi,  // 10 + 32 + 32 = 74 bytes, raw concatenation (no length prefixes)
    len  = 96
)

key_a      = call_keys[0..32]
key_b      = call_keys[32..64]
chain_key  = call_keys[64..96]
```

**Initial key semantics (step 0)**: `key_a` and `key_b` from `derive_call_keys` are immediately usable for the first rekeying interval: `step_count` starts at 0 and the initial keys are the step-0 keys. The first call to `AdvanceCallChain` produces step-1 keys. Implementations MUST NOT call `AdvanceCallChain` before using the initial keys. If both parties did so, they would merely start at step 1 with compatible keys; but if only one party advances before first use, the two sides are off by one step, with no error or diagnostic.

5. **Role assignment**: The party with the lexicographically lower identity fingerprint (unsigned byte-by-byte comparison, left to right) uses `key_a` as their send key and `key_b` as their recv key. The other party reverses the assignment.
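Steps 4 and 5 both depend on a single unsigned, left-to-right comparison of the two fingerprints. The sketch below assumes 32-byte fingerprints; `canonical_order`, `lower_role`, and `assign_keys` are hypothetical helper names. Rust compares byte arrays lexicographically as unsigned values, which matches the comparison rule in step 5.

```rust
/// Identity fingerprints are 32-byte values (see the 74-byte info layout).
pub type Fingerprint = [u8; 32];

/// Returns (fp_lo, fp_hi) in canonical order, or None for a self-call.
/// Equal fingerprints are rejected upstream, so there is no tiebreaker.
pub fn canonical_order(local: &Fingerprint, remote: &Fingerprint) -> Option<(Fingerprint, Fingerprint)> {
    if local == remote {
        return None;
    }
    if local < remote {
        Some((*local, *remote))
    } else {
        Some((*remote, *local))
    }
}

/// True for the party whose fingerprint sorts first (unsigned, byte by byte).
pub fn lower_role(local: &Fingerprint, remote: &Fingerprint) -> bool {
    local < remote
}

/// Maps (key_a, key_b) to (send, recv) using the stored role bit.
pub fn assign_keys(lower_role: bool, key_a: [u8; 32], key_b: [u8; 32]) -> ([u8; 32], [u8; 32]) {
    if lower_role {
        (key_a, key_b)
    } else {
        (key_b, key_a)
    }
}
```

A language whose default byte type is signed (for example, Java's `byte`) needs an explicit unsigned comparison to reproduce this ordering.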

**`call_id` is an opaque 16-byte blob, with no UUID normalization**: `call_id` is raw bytes concatenated directly into the HKDF IKM. No byte-order conversion, UUID formatting, or canonical representation is applied. Two implementations that both generate UUIDs MUST concatenate the same raw byte representation. For example, .NET `Guid.ToByteArray()` writes the first three UUID fields in little-endian order, whereas the RFC 4122 binary layout is big-endian throughout; if one peer uses one layout and the other peer uses the other, the same UUID yields different 16-byte inputs and the two sides silently derive different call keys. The safest approach: generate `call_id = random_bytes(16)` and treat those 16 bytes as opaque throughout the call lifecycle, never reinterpreting them as a UUID.
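To illustrate the hazard, the sketch below converts RFC 4122 (big-endian) UUID bytes into the mixed-endian layout that .NET `Guid.ToByteArray()` produces. `rfc4122_to_dotnet` is a hypothetical helper; the point is only that one UUID maps to two different 16-byte `call_id` values depending on the serialization.

```rust
/// Reorders RFC 4122 (big-endian) UUID bytes into the layout produced by
/// .NET `Guid.ToByteArray()`: the first three fields (4, 2, and 2 bytes)
/// are byte-swapped; the final 8 bytes are unchanged.
pub fn rfc4122_to_dotnet(bytes: [u8; 16]) -> [u8; 16] {
    let mut out = bytes;
    out[0..4].reverse();
    out[4..6].reverse();
    out[6..8].reverse();
    out
}
```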

**No length prefixes in info or IKM**: Neither the `info` nor the `ikm` fields use length-prefixed encoding. Unlike `KDF_KEX` (§5.4), which applies `len(x) || x` format to each info component, `derive_call_keys` concatenates all fields raw. A reimplementer who applies the §5.4 convention to `info` would prepend `\x00\x0a`, `\x00\x20`, `\x00\x20` before the three components, producing an 80-byte info input instead of 74 bytes and silently incompatible call keys. The fixed sizes of all three info fields (`"lo-call-v1"` = 10 bytes, each fingerprint = 32 bytes) make length prefixes redundant for disambiguation; their omission is intentional, not an oversight.
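A sketch of the two raw concatenations, next to the incorrect length-prefixed variant for contrast. The function names are hypothetical, and the 2-byte big-endian prefix in the wrong variant is inferred from the `\x00\x0a` / `\x00\x20` bytes quoted above.

```rust
pub const CALL_INFO_LABEL: &[u8; 10] = b"lo-call-v1";

/// Raw concatenation `kem_ss || call_id`: 32 + 16 = 48 bytes, no prefixes.
pub fn call_ikm(kem_ss: &[u8; 32], call_id: &[u8; 16]) -> [u8; 48] {
    let mut ikm = [0u8; 48];
    ikm[..32].copy_from_slice(kem_ss);
    ikm[32..].copy_from_slice(call_id);
    ikm
}

/// Raw concatenation `"lo-call-v1" || fp_lo || fp_hi`: 10 + 32 + 32 = 74 bytes.
pub fn call_info(fp_lo: &[u8; 32], fp_hi: &[u8; 32]) -> [u8; 74] {
    let mut info = [0u8; 74];
    info[..10].copy_from_slice(CALL_INFO_LABEL);
    info[10..42].copy_from_slice(fp_lo);
    info[42..].copy_from_slice(fp_hi);
    info
}

/// INCORRECT for call keys: applies a §5.4-style 2-byte length prefix to each
/// component. Shown only to illustrate the 80-byte mismatch.
pub fn call_info_with_prefixes_wrong(fp_lo: &[u8; 32], fp_hi: &[u8; 32]) -> Vec<u8> {
    let mut v = Vec::with_capacity(80);
    for part in [&CALL_INFO_LABEL[..], &fp_lo[..], &fp_hi[..]] {
        v.extend_from_slice(&(part.len() as u16).to_be_bytes());
        v.extend_from_slice(part);
    }
    v
}
```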

**Why `call_id` is in IKM, not info**: `call_id` goes in IKM (alongside `kem_ss`) as defense-in-depth against KEM randomness failure. If the KEM's random number generator is compromised or biased, a unique `call_id` in IKM introduces variability into the HKDF extraction phase: different calls produce different extracted keys even if `kem_ss` is identical. Fingerprints go in `info` because their secrecy is not required (they are public values providing domain separation). Moving `call_id` to `info` would produce a subtly weaker construction: with a non-random KEM, all calls between the same pair at the same ratchet epoch would share a single extracted pseudorandom key, leaving only the HKDF-Expand step to separate them by `call_id`.

`kem_ss` is zeroized immediately after HKDF. The 48-byte `ikm` concatenation buffer (`kem_ss || call_id`) also contains a copy of `kem_ss` and MUST be zeroized independently; zeroizing `kem_ss` alone leaves this copy in memory. In Rust, the `ikm` buffer is a plain `[u8; 48]` (a `Copy` type, not a `Zeroizing` wrapper) and is explicitly zeroized via `ikm.zeroize()` immediately after HKDF. Wrapping a `Copy` array in `Zeroizing` would receive a bitwise copy and leave the original on the stack, so the call-site zeroization is required. C/Go/Python implementations MUST explicitly zeroize the 48-byte concatenation buffer before freeing it (e.g., `memset_s(ikm, 48, 0, 48)` in C), not just the 32-byte `kem_ss`. Compare §5.4's IKM zeroization rationale: the same obligation applies here because `ikm` contains a bitwise copy of `kem_ss`.

The ratchet state is not modified: `root_key` is read but not advanced. Multiple concurrent calls can each invoke `derive_call_keys` independently; since `root_key` is read-only, these derivations are idempotent with respect to the ratchet and do not interfere with each other or with ongoing message encryption.

**`derive_call_keys` reads `root_key` directly from live ratchet state** via the `RatchetState::derive_call_keys(&self, kem_ss, call_id)` method, which does not accept `root_key` as a parameter. The core library also exports a standalone `call::derive_call_keys(root_key, kem_ss, call_id, local_fp, remote_fp)` function that accepts `root_key` explicitly; it exists for CAPI use (where the ratchet handle provides `root_key` and fingerprints internally) and for unit testing. Binding authors and reimplementers SHOULD use the `RatchetState` method, not the standalone function. The standalone function is safe only when the caller guarantees the §6.12 protocol requirement (no KEM ratchet step between signaling and derivation); the method avoids the stale-snapshot hazard by reading live state at call time.

A reimplementer who designs the function to accept a `root_key` parameter (allowing the caller to snapshot `root_key` at `CallOffer` time and pass it later) creates an epoch-sync hazard: if a KEM ratchet step fires between `CallOffer` and `derive_call_keys`, the snapshot holds the pre-ratchet `root_key` while the peer's live state has already advanced, producing incompatible call keys with no diagnostic. The §6.12 protocol requirement (no KEM ratchet step between signaling and derivation) makes the live read safe; a parameter-passing design would require the caller to enforce the same invariant externally.
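The double zeroization obligation can be sketched as follows. `derive_and_wipe` is a hypothetical helper, and the `derive` closure stands in for the HKDF call (no real HKDF is performed here). Volatile writes keep the compiler from eliding the stores; they do not remove copies the compiler may have spilled elsewhere, which is one reason vetted crates such as `zeroize` are preferable in production.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Volatile zeroization so the stores cannot be optimized away as dead.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is an exclusive, aligned reference into `buf`.
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Builds `kem_ss || call_id`, passes it to `derive` (a stand-in for HKDF),
/// then wipes BOTH the 48-byte concatenation and the original `kem_ss`.
/// Wiping only `kem_ss` would leave its copy inside `ikm`.
pub fn derive_and_wipe<F>(kem_ss: &mut [u8; 32], call_id: &[u8; 16], derive: F) -> [u8; 96]
where
    F: FnOnce(&[u8; 48]) -> [u8; 96],
{
    let mut ikm = [0u8; 48];
    ikm[..32].copy_from_slice(&kem_ss[..]);
    ikm[32..].copy_from_slice(call_id);
    let out = derive(&ikm);
    wipe(&mut ikm); // the copy of kem_ss inside the concatenation
    wipe(&mut kem_ss[..]); // the original kem_ss
    out
}
```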

**Root key epoch sync**: Both parties must call `derive_call_keys` at the same ratchet epoch, i.e., with the same `root_key`. A KEM ratchet step between sending `CallOffer` and calling `derive_call_keys` (or between receiving `CallAnswer` and deriving) advances `root_key` on one side, producing incompatible call keys with no error or diagnostic (HKDF succeeds, but the other party's derivation uses the old `root_key`). **Protocol requirement (normative)**: no KEM ratchet step (triggered by sending or receiving a ratchet message with a new `ratchet_pk`) may occur between the `CallOffer`/`CallAnswer` exchange and the `derive_call_keys` call on either side. Implementations MUST follow this ordering: the initiator MUST NOT process any ratchet message other than the `CallAnswer` between sending `CallOffer` and calling `derive_call_keys`, which it does immediately after decapsulating the `CallAnswer` (step 3); the responder MUST call `derive_call_keys` immediately after sending `CallAnswer` and before processing any further ratchet messages. This is not advisory: violating this order produces silently incompatible call keys with no error or diagnostic on either side.

**Enforcement window boundaries and concurrent ratchet-step messages**: The enforcement window opens when `CallOffer` is sent (initiator) or received (responder) and closes when `derive_call_keys` is called on the same side. The window is bounded by the round-trip time of the signaling exchange, in practice tens to hundreds of milliseconds. A ratchet message that arrives during this window and would trigger a KEM ratchet step (because it carries a new `ratchet_pk`) MUST be queued and not processed until after `derive_call_keys` is called. The implementation MUST NOT decrypt the message or advance `root_key` while the enforcement window is open. The queue depth is bounded by the number of ratchet messages that can arrive during a single round trip, in practice 0-5 messages. If a ratchet-step message remains queued longer than a configurable timeout (the LO Protocol defines the signaling timeout; a typical value is 5 seconds), call setup is considered failed and all signaling state is discarded. **The normative definition of the enforcement window and the specific queueing mechanism are deferred to the LO Protocol Specification.** soliton specifies only the invariant that `root_key` MUST NOT advance between `CallOffer`/`CallAnswer` and `derive_call_keys`; the protocol layer is responsible for implementing the queueing and timeout behavior.
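soliton leaves the queueing mechanism to the LO Protocol layer; the sketch below shows one possible shape under stated assumptions. `CallWindowGate`, `Inbound`, and `Admit` are hypothetical, and the sketch assumes the message header (and therefore `ratchet_pk`) can be inspected before decryption. A message is deferred only if the window is open and its `ratchet_pk` differs from the peer's current one.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical inbound message: only the fields the gate needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Inbound {
    pub ratchet_pk: Vec<u8>,
    pub body: Vec<u8>,
}

pub enum Admit {
    /// Safe to decrypt now: no KEM ratchet step would be triggered.
    ProcessNow(Inbound),
    /// Deferred until `derive_call_keys` has run on this side.
    Queued,
}

/// Hypothetical per-peer gate; not part of the soliton API.
pub struct CallWindowGate {
    window_open: bool,
    timeout: Duration,
    queued: VecDeque<(Instant, Inbound)>,
}

impl CallWindowGate {
    pub fn new(timeout: Duration) -> Self {
        Self { window_open: false, timeout, queued: VecDeque::new() }
    }

    /// Call when CallOffer is sent (initiator) or received (responder).
    pub fn open(&mut self) {
        self.window_open = true;
    }

    pub fn admit(&mut self, msg: Inbound, current_peer_pk: &[u8], now: Instant) -> Admit {
        let steps_ratchet = msg.ratchet_pk.as_slice() != current_peer_pk;
        if self.window_open && steps_ratchet {
            self.queued.push_back((now, msg));
            Admit::Queued
        } else {
            Admit::ProcessNow(msg)
        }
    }

    /// Call right after `derive_call_keys` (or on setup failure): closes the
    /// window and releases deferred messages in arrival order.
    pub fn close(&mut self) -> Vec<Inbound> {
        self.window_open = false;
        self.queued.drain(..).map(|(_, m)| m).collect()
    }

    /// True if the oldest queued ratchet-step message has waited too long.
    pub fn timed_out(&self, now: Instant) -> bool {
        self.queued
            .front()
            .map_or(false, |(queued_at, _)| now.duration_since(*queued_at) > self.timeout)
    }
}
```

When `timed_out` returns true the caller abandons call setup; in either case it calls `close()` and then processes the released messages in arrival order.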

#### Signaling Messages

All signaling messages are encrypted via the existing LO-Ratchet session:

- `CallOffer { call_id, ek_pub }` — initiator → peer
- `CallAnswer { call_id, ct }` — peer → initiator
- `CallHangup { call_id }` — either direction
- `CallReject { call_id }` — peer declines

These are application-layer message types. soliton provides only the call key derivation; signaling message encoding and transport are application concerns.

**Frame encryption is also application-layer**: soliton delivers two raw 32-byte symmetric keys (`key_a` and `key_b`) and does not define a frame cipher, nonce scheme, or frame AAD structure. The application is responsible for choosing an AEAD algorithm for media frames, constructing per-frame nonces (e.g., from frame sequence numbers), defining what AAD (if any) to include in frame authentication, and handling frame loss or reordering. A common approach is XChaCha20-Poly1305 with a 64-bit monotonically increasing frame counter as the nonce (zero-padded to 24 bytes); however, soliton does not mandate this. The keys produced by `derive_call_keys` are suitable inputs to any 256-bit AEAD; the choice of frame AEAD is outside the scope of this specification.
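One way to realize the counter-nonce approach mentioned above, as a sketch only. The layout (counter little-endian in the first 8 bytes, remaining 16 bytes zero) is an assumption both peers would need to agree on; `frame_nonce` and `FrameSender` are hypothetical. Because `key_a` and `key_b` are distinct per direction, each direction can keep its own counter, but a counter value must never repeat under the same key.

```rust
/// Builds a 24-byte XChaCha20 nonce from a 64-bit frame counter.
/// ASSUMED layout: little-endian counter in bytes 0..8, bytes 8..24 zero.
pub fn frame_nonce(counter: u64) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[..8].copy_from_slice(&counter.to_le_bytes());
    nonce
}

/// Hypothetical per-direction sender state: one monotonic counter per send key.
pub struct FrameSender {
    next: u64,
}

impl FrameSender {
    pub fn new() -> Self {
        Self { next: 0 }
    }

    /// Returns the nonce for the next frame, or None once the counter space is
    /// exhausted (the caller must rekey rather than wrap and reuse a nonce).
    pub fn next_nonce(&mut self) -> Option<[u8; 24]> {
        let n = self.next;
        self.next = self.next.checked_add(1)?;
        Some(frame_nonce(n))
    }
}
```

Resetting the counter after `advance()` is safe only because the advance also replaces the key.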

#### Intra-Call Rekeying

The call chain key supports periodic rekeying for forward secrecy within the call:

```
function AdvanceCallChain(chain_key):
    // step_count is per-chain internal state, not an input to the HMACs.
    // Exhaustion guard runs BEFORE any derivation and BEFORE any increment.
    if step_count >= 2²⁴:
        return ChainExhausted    // after zeroizing key_a, key_b, chain_key (see below)

    key_a'     = HMAC-SHA3-256(chain_key, [0x04])    // single-byte data
    key_b'     = HMAC-SHA3-256(chain_key, [0x05])    // single-byte data
    chain_key' = HMAC-SHA3-256(chain_key, [0x06])    // single-byte data

    Zeroize old chain_key, key_a, key_b
    step_count += 1    // incremented AFTER all three derivations, on the success path only;
                       // not incremented on the exhaustion path (step_count stays at 2²⁴
                       // on every post-exhaustion call; see exhaustion pseudocode below).
    // Role assignment (key_a' → send or recv) is preserved from initial derivation.
    // Mechanism: derive_call_keys returns a lower_role: bool computed from fingerprint
    // comparison (§6.12 step 5). The caller stores this bool and applies it after every
    // AdvanceCallChain call: lower_role=true → key_a' is the send key, key_b' is recv;
    // lower_role=false → key_b' is send, key_a' is recv. A reimplementer who does not
    // persist lower_role gets swapped send/recv keys on every advance, with no error.
    return (key_a', key_b', chain_key')
```

On exhaustion (`step_count >= 2²⁴`), all three key fields (`key_a`, `key_b`, `chain_key`) are zeroized before returning, not just `chain_key`:

```
    Zeroize key_a
    Zeroize key_b
    Zeroize chain_key
    return ChainExhausted
```

Each advance produces fresh call encryption keys and a new chain key. The old chain key and call keys are zeroized. Compromise of a later call key does not reveal earlier media segments.

**`step_count` is not an HMAC input**: `AdvanceCallChain` takes only `chain_key` as input; `step_count` does not feed into the HMAC derivation. It is a pure exhaustion counter that tracks how many advances have occurred in order to enforce the 2²⁴ limit. **Why this is safe**: `chain_key` itself advances monotonically via the HMAC one-way function at each step, providing implicit domain separation: step N's output is independent of step N−1's because it is derived from a different key (the previous step's output). In contrast, `KDF_MsgKey` (§6.3) reuses the same `epoch_key` for every message in an epoch, making the counter essential to distinguish per-message derivations. `AdvanceCallChain` does not reuse the key: each step produces a fresh `chain_key'` that is HMAC-derived from the prior `chain_key`, so including `step_count` in the data argument would be redundant. **A reimplementer familiar with `KDF_MsgKey` must not apply the same pattern here**: including `step_count` in the data argument changes the HMAC input at every step, so every derived call key is incompatible with the reference implementation even though no error is raised.

The rekey interval is an application-layer decision (e.g., every 30 seconds or every N encrypted frames). soliton provides the `advance()` primitive; the application controls when to call it. Both parties MUST advance the chain in lockstep — mismatched `step_count` values produce incompatible keys with no error or diagnostic. The synchronization mechanism is application-layer (e.g., include `step_count` in encrypted media frame headers; the receiver advances to match before decrypting). When the receiver's `step_count` is behind the sender's, it must call `advance()` sequentially N times to catch up — there is no shortcut (each step requires the previous chain key as input). A reasonable fast-forward tolerance is application-specific, but implementations SHOULD cap the maximum gap (e.g., 1000 steps) and treat larger gaps as session corruption rather than attempting a potentially expensive sequential catch-up. Once desynchronization is detected (receiver cannot decrypt at any plausible `step_count` offset within the tolerance window), the call's key material is irrecoverable — a new call must be established via the §6.12 setup protocol.
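A sketch of bounded sequential catch-up. `catch_up` and `CatchUpError` are hypothetical; `advance` stands in for one call to the chain's `advance()` primitive, and the 1000-step cap mirrors the example tolerance above. The sketch rejects a peer whose step count is behind the local one, since old keys are zeroized and a chain cannot be rewound; that policy choice is an assumption, not part of this specification.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CatchUpError {
    /// The sender is behind us; zeroized keys cannot be recovered.
    PeerBehind,
    /// The gap exceeds the application's tolerance: treat as desynchronized.
    GapTooLarge,
    /// The underlying advance() reported ChainExhausted.
    Exhausted,
}

/// Advances the local chain one step at a time until `local_step == target`.
/// There is no shortcut: each step needs the previous chain key.
pub fn catch_up<F>(
    local_step: &mut u32,
    target: u32,
    max_gap: u32,
    mut advance: F,
) -> Result<(), CatchUpError>
where
    F: FnMut() -> Result<(), ()>,
{
    if target < *local_step {
        return Err(CatchUpError::PeerBehind);
    }
    if target - *local_step > max_gap {
        return Err(CatchUpError::GapTooLarge);
    }
    while *local_step < target {
        advance().map_err(|_| CatchUpError::Exhausted)?;
        *local_step += 1;
    }
    Ok(())
}
```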

**Call chain exhaustion**: `AdvanceCallChain` has a hard limit of 2²⁴ (16,777,216) advances. The internal `step_count` starts at 0 and is checked (`step_count >= 2²⁴`) **before** the HMAC derivations and **before** any increment. The last permitted advance occurs when `step_count = 2²⁴ − 1`, after which `step_count` increments to 2²⁴ and the next call returns `ChainExhausted`. **Fencepost note**: at `step_count = 2²⁴ − 1`, the guard `(2²⁴ − 1) >= 2²⁴` is false, so all three HMAC derivations execute and `step_count` increments to 2²⁴. At `step_count = 2²⁴`, the guard `2²⁴ >= 2²⁴` is true, so the guard fires first: no derivations run, `step_count` is NOT incremented further, and all key material is zeroized before returning `ChainExhausted`. A reimplementer who places the increment before the guard with the same comparison (`step_count += 1; if step_count >= 2²⁴`) exhausts one step earlier (last advance at `step_count = 2²⁴ − 2`, for 2²⁴ − 1 advances in total). A reimplementer who also increments on the exhaustion path (i.e., does not keep "guard fires, no increment" and "derivations succeed, increment" on separate code paths) pushes `step_count` past 2²⁴ on every post-exhaustion call; in a u32 it eventually wraps to 0 after about 4.3 × 10⁹ further calls, at which point the guard passes again and derivation resumes from the zeroed chain key. The reference implementation uses a u32 and never increments on the exhaustion path, so it cannot wrap. On exhaustion, all call key material (`key_a`, `key_b`, `chain_key`) is zeroized, permanently terminating the call's forward-secrecy chain. A new call must be established via §6.12's setup protocol. The limit prevents counter overflow in the internal chain advancement and bounds the total key material derivable from a single call chain. At a 30-second rekey interval, 2²⁴ advances corresponds to ~16 years of continuous calling; the limit is not reachable in practice but is enforced as defense-in-depth.
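The guard ordering can be sketched as a small state machine. `CallChain` and `ChainExhausted` here are hypothetical names, and `prf` is an injected stand-in for HMAC-SHA3-256(chain_key, [label]) so the control flow can be shown without a crypto dependency. Role mapping via `lower_role` is omitted, and plain assignment is used for zeroization for brevity; real code would use volatile or `zeroize`-style writes.

```rust
/// Exhaustion limit: 2^24 advances.
pub const MAX_CALL_STEPS: u32 = 1 << 24;

#[derive(Debug, PartialEq, Eq)]
pub struct ChainExhausted;

/// Hypothetical call-chain state. `prf(key, label)` stands in for
/// HMAC-SHA3-256(key, [label]); it is NOT a real HMAC here.
pub struct CallChain<F: Fn(&[u8; 32], u8) -> [u8; 32]> {
    pub key_a: [u8; 32],
    pub key_b: [u8; 32],
    chain_key: [u8; 32],
    step_count: u32, // u32 can represent 2^24 without wrapping
    prf: F,
}

impl<F: Fn(&[u8; 32], u8) -> [u8; 32]> CallChain<F> {
    /// Step 0: the keys from derive_call_keys are used as-is.
    pub fn new(key_a: [u8; 32], key_b: [u8; 32], chain_key: [u8; 32], prf: F) -> Self {
        Self { key_a, key_b, chain_key, step_count: 0, prf }
    }

    pub fn advance(&mut self) -> Result<(), ChainExhausted> {
        // Guard first: once exhausted, no derivation and no increment.
        if self.step_count >= MAX_CALL_STEPS {
            // Unconditional on every exhausted call; no "already zeroed" shortcut.
            self.key_a = [0; 32];
            self.key_b = [0; 32];
            self.chain_key = [0; 32];
            return Err(ChainExhausted);
        }
        let next_a = (self.prf)(&self.chain_key, 0x04);
        let next_b = (self.prf)(&self.chain_key, 0x05);
        let next_ck = (self.prf)(&self.chain_key, 0x06);
        // Overwriting replaces the old key material.
        self.key_a = next_a;
        self.key_b = next_b;
        self.chain_key = next_ck;
        self.step_count += 1; // success path only
        Ok(())
    }
}
```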

**`step_count` MUST be stored as a u32 or wider**: A narrower type (u8, u16, u24) wraps before reaching 2²⁴, silently disabling the exhaustion guard. For example, a u24 implementation wraps to 0 after 16,777,216 advances — subsequent calls pass the guard `0 >= 2²⁴ = false` and continue deriving keys indefinitely. The reference implementation uses a u32 (which can represent values up to ~4.3 × 10⁹, well above 2²⁴ = 16,777,216). Reimplementers MUST use a type that can represent 2²⁴ without wrapping.

**`ChainExhausted` from `advance()` — exhausted handle is NOT auto-freed; caller MUST free it**: Key material is zeroized on exhaustion, but the `CallKeys` allocation is NOT deallocated. The handle remains allocated and must be explicitly freed by the caller (`soliton_call_keys_free` in the CAPI; Rust's `Drop` via the normal ownership path). Failing to free the handle leaks the (now-zeroed) allocation. **soliton.h OWNERSHIP note conflict**: The generated header may carry a comment stating that after `ChainExhausted`, "all keys are zeroized — handle is dead." The phrase "handle is dead" means the handle is no longer usable for `advance()` — it does NOT mean the handle was auto-freed. Specification.md is normative: the handle is live, key material is zeroed, and the caller is responsible for freeing it. A binding author who reads "handle is dead" as "already freed" will double-free the handle. The correct action after `ChainExhausted` from `advance()` is: (1) record that the call session is exhausted, (2) free the handle via the standard free function, (3) establish a new call via `derive_call_keys()`.

**CAPI handle lifetime**: Zeroization of key material does not deallocate the call chain handle. CAPI callers MUST still call the handle's destroy/free function after receiving `ChainExhausted` — failure to do so leaks the (now-zeroed) allocation. The handle remains valid for destruction but invalid for further `advance()` calls.

**Post-exhaustion idempotency**: After the first `ChainExhausted`, every subsequent call to `advance()` also returns `ChainExhausted` and unconditionally zeroizes key material. Because the keys were already zeroed on the first exhaustion, the re-zeroization is a no-op, but implementations MUST NOT guard against re-zeroization ("skip if already exhausted") — the unconditional behavior ensures that a caller who ignores the first `ChainExhausted` cannot obtain stale key material from a later call. `step_count` is not reset; it stays at 2²⁴. `step_count` is an internal counter — it is NOT exposed via the Rust API or CAPI. The only externally observable signal of exhaustion is the `ChainExhausted` return value from `advance()`. Callers MUST check the return value; there is no way to query exhaustion state without calling `advance()`.

#### Security Properties

**Input validation**: `derive_call_keys` rejects all-zero `root_key` (dead session — liveness sentinel, constant-time check), all-zero `kem_ss` (degenerate KEM output — cryptographically implausible but structurally guarded, constant-time check), all-zero `call_id` (uninitialized identifier, variable-time — `call_id` is non-secret), and `local_fp == remote_fp` (self-call — collapses role assignment, variable-time — fingerprints are public). All four checks return `InvalidData`. The equal-fingerprint check is critical: with equal fingerprints, the strict `<` comparison in role assignment evaluates to `false` for both parties, so both assign `key_a` as recv and `key_b` as send — symmetric key confusion where each party encrypts with the key the other expects to decrypt with. This is distinct from the ratchet's `local_fp ≠ remote_fp` guard (§6.8 guard 19), which protects AAD symmetry.
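A sketch of the four input checks. `validate_call_inputs` and `InvalidData` are hypothetical names. The secret-input check ORs all bytes together and inspects the result once, and both secret checks are evaluated with the non-short-circuiting `|`; `black_box` is only an optimization hint, so production code should prefer a vetted constant-time crate such as `subtle`. The public inputs use ordinary comparisons, as the text permits.

```rust
use std::hint::black_box;

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidData;

/// All-zero test without data-dependent branches inside the loop:
/// OR every byte into an accumulator and inspect it once at the end.
fn ct_is_all_zero(bytes: &[u8]) -> bool {
    let mut acc = 0u8;
    for &b in bytes {
        acc |= b;
    }
    black_box(acc) == 0
}

pub fn validate_call_inputs(
    root_key: &[u8; 32],
    kem_ss: &[u8; 32],
    call_id: &[u8; 16],
    local_fp: &[u8; 32],
    remote_fp: &[u8; 32],
) -> Result<(), InvalidData> {
    // Secret inputs: `|` evaluates both checks regardless of the first result.
    let secret_bad = ct_is_all_zero(root_key) | ct_is_all_zero(kem_ss);
    // Public inputs: variable-time comparisons are acceptable.
    let public_bad = call_id.iter().all(|&b| b == 0) || local_fp == remote_fp;
    if secret_bad || public_bad {
        Err(InvalidData)
    } else {
        Ok(())
    }
}
```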

**Call key secrecy**: Requires both `root_key` and `kem_ss`. The root key is bound via HKDF salt; the ephemeral KEM shared secret is bound via IKM.

**Epoch binding via `root_key` as HKDF salt**: Including `root_key` as the HKDF salt binds call keys to the current ratchet epoch. Call keys derived at ratchet epoch E are independent of those at epoch E+1 for the same `(kem_ss, call_id)` pair: advancing the ratchet between two calls changes `root_key`, producing completely different call keys even if the same ephemeral KEM exchange is reused. A reimplementer who uses a fixed salt (e.g., an empty salt or a static label) instead of `root_key` removes this epoch isolation: all calls between the same pair with the same `kem_ss` and `call_id` derive identical keys regardless of ratchet epoch, making past call keys recoverable from any future epoch compromise.

**Forward secrecy (ephemeral KEM)**: The ephemeral keypair is generated per call and zeroized after derivation. Later compromise of `root_key` does not reveal call content — `kem_ss` is no longer recoverable.

**Defense-in-depth (post-quantum)**: `root_key` in the HKDF salt carries the ratchet's accumulated post-quantum security. If the ephemeral KEM is broken by a quantum computer, `root_key` still protects. If `root_key` is compromised classically, the ephemeral KEM still protects.

**Intra-call forward secrecy**: `AdvanceCallChain` is one-way (HMAC-based PRF). Old chain keys are zeroized.

**No ratchet state mutation**: The ratchet operates independently during calls. Text messages advance the ratchet as normal.

**`CallKeys` is intentionally ephemeral — no serialization path**: There is no `to_bytes`/`from_bytes` API for `CallKeys`. Call key material is not designed to survive process restarts or be persisted to storage. If a call is interrupted (network failure, OS suspend, process crash), the `CallKeys` handle is lost and the call's key material is unrecoverable. The correct response is to re-establish the call via the §6.12 setup protocol (`derive_call_keys` on the current ratchet state after a new `CallOffer`/`CallAnswer` exchange) — the resulting new call keys will be independent of the interrupted call's keys, providing per-call forward secrecy. A reimplementer who adds a serialization path for `CallKeys` (to survive restarts) undermines this forward-secrecy property — a leaked blob of serialized call key material recovers the call's media encryption keys without any KEM secrets.

### 6.13 Design Rationale: Per-Epoch vs Per-Message Forward Secrecy

LO-Ratchet provides forward secrecy at epoch granularity (per KEM ratchet step), not per message. This is a deliberate departure from the Signal Double Ratchet, which provides per-message forward secrecy via a sequential KDF chain.

**What per-message forward secrecy protects against**: An attacker who compromises the chain key at position N can derive message keys N+1, N+2, ... but not 0, 1, ..., N-1. This matters only if the attacker obtains the chain key but not the root key or ratchet secret key.

**Why this threat model is unrealistic**: The `RatchetState` struct contains the epoch key, root key, and ratchet secret key at adjacent memory addresses. The root key is strictly more powerful (it derives all future epoch keys). The ratchet secret key enables decapsulating all future KEM ratchets. Any memory compromise that extracts the epoch key — buffer overread, memory dump, side-channel attack — extracts these adjacent secrets with overwhelming probability. Per-message forward secrecy protects against an attacker who can surgically extract exactly 32 bytes from a known offset and nothing else. This is not a realistic attack.

**What we gain by dropping it**:
- **O(1) out-of-order handling**: Any message key is derivable directly from the epoch key and counter. No skip cache, no TTL expiry, no purge throttling.
- **~300 fewer lines of code**: The skip cache was the most error-prone component (§6.9 in prior versions explicitly warned about this).
- **Simpler serialization**: No variable-length skip cache in the wire format.
- **Reduced memory**: No HashMap of 32-byte message keys (up to 3000 entries / ~96 KB). Duplicate detection uses 4-byte counters.
- **No `TooManySkipped` error**: The skip-amplification DoS vector is eliminated entirely.

**What we keep**: Forward secrecy across epochs (KEM ratchet), break-in recovery (fresh KEM randomness), unique per-message keys (AEAD security), and a one-epoch grace period for late-arriving messages (via `prev_recv_epoch_key`).

**Post-compromise security (PCS)**: PCS holds at epoch granularity. After an attacker compromises ratchet state at time t₁, a subsequent KEM ratchet step at time t₂ > t₁ with fresh (uncompromised) encapsulation randomness re-establishes confidentiality for the new epoch. The attacker's knowledge of state at t₁ does not yield epoch keys derived after the KEM ratchet step, because the fresh KEM shared secret is unknown to the attacker. Recovery depends on the sender's (encapsulator's) randomness being honest — the receiver contributes no fresh randomness to the KEM ratchet step (§14.11). For formal models: `Corrupt(state, t₁)` does not imply knowledge of epoch keys from a KEM ratchet step at t₂ > t₁, provided the encapsulator's CSPRNG is uncompromised at t₂.

**PCS healing boundary — decapsulation time, not encapsulation time**: PCS healing completes when the decapsulator successfully decrypts the first message in the new epoch, not when the encapsulator generates the KEM ciphertext. The encapsulator's fresh `kem_ss` is produced at t₂ (encapsulation), but `recv_epoch_key` on the decapsulator's side does not update until the first new-epoch message is successfully decrypted (at t₃ ≥ t₂). An adversary who compromises state at t₁, observes the KEM ratchet ciphertext at t₂, but suppresses delivery until t₃ can still decrypt all messages in the new epoch that are delivered before t₃ — because the decapsulator continues using the compromised `recv_epoch_key` until t₃. After t₃, the decapsulator holds the healed `recv_epoch_key` derived from the attacker-unknown `kem_ss`. For formal models: use `Healed(session, t₃)` where t₃ is the time of the first successful new-epoch decryption, not t₂.

**Two KEM ratchet steps for complete PCS healing**: A single KEM ratchet step derives a new `recv_epoch_key` unknown to the attacker, so new-epoch messages are immediately protected. However, `prev_recv_epoch_key` (the compromised epoch key) persists through the first step, retained for the one-epoch late-message grace period. An adversary who compromises state immediately after the first KEM ratchet step can still read previous-epoch messages (via `prev_recv_epoch_key`). Full PCS healing (where the adversary retains no access to any compromised key material) requires two KEM ratchet steps: the first step rotates the compromised key into `prev_recv_epoch_key`; the second step overwrites `prev_recv_epoch_key` with the then-current epoch's key, permanently discarding the originally compromised material. For formal models: define `FullyHealed(session, t₄)` where `t₄` is the time of the first successful decryption in the second new epoch (not merely the first new-epoch decryption at t₃). The §14.17 cross-reference uses `Healed(session, t₃)` to mean "new epoch is healed"; formal models that also require the previous epoch's key to be gone MUST use `FullyHealed(session, t₄)`.
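The two-step argument concerns only which keys remain in the receive state, and can be modeled in a few lines. `RecvKeys` is a hypothetical struct, and `kem_ratchet_step` models only the key-slot rotation, not the KEM or HKDF that produce the fresh key; whether that key is actually unknown to the attacker still depends on honest encapsulator randomness.

```rust
/// Hypothetical model of the receive-side epoch keys relevant to PCS.
#[derive(Debug, Clone)]
pub struct RecvKeys {
    pub recv_epoch_key: [u8; 32],
    pub prev_recv_epoch_key: Option<[u8; 32]>,
}

impl RecvKeys {
    /// Receive-side effect of one KEM ratchet step: the current key moves into
    /// the one-epoch grace slot, `fresh` becomes current, and whatever was in
    /// the grace slot is discarded (a real implementation would zeroize it).
    pub fn kem_ratchet_step(&mut self, fresh: [u8; 32]) {
        self.prev_recv_epoch_key = Some(self.recv_epoch_key);
        self.recv_epoch_key = fresh;
    }

    /// True if `leaked` can still decrypt anything under this state.
    pub fn still_exposes(&self, leaked: &[u8; 32]) -> bool {
        &self.recv_epoch_key == leaked || self.prev_recv_epoch_key.as_ref() == Some(leaked)
    }
}
```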

**PCS does NOT cover:**
- **Already-compromised epochs**: Messages encrypted before the healing KEM ratchet step remain compromised. PCS is forward-looking — it restores confidentiality for future epochs, not retroactively for past ones.
- **Compromised encapsulator randomness**: If the attacker controls the sender's CSPRNG at the time of the KEM ratchet step, the fresh `kem_ss` is known to the attacker and the step does not heal. Recovery requires at least one KEM ratchet step with honest randomness (§14.11).
- **Active attacker participating in the KEM exchange**: If the attacker can substitute `ratchet_pk` in a message header (man-in-the-middle on the message transport), the KEM encapsulation targets the attacker's key rather than the peer's. AEAD authentication prevents this in normal operation (the header is bound into the AAD), but a full state compromise at t₁ gives the attacker enough material to forge headers until the next honest KEM ratchet step.
- **One-directional sessions**: PCS requires a direction change (the peer must send a message triggering a KEM ratchet step). A one-directional stream of messages never triggers a KEM ratchet and therefore never heals.

---

## 7. Symmetric Encryption

### 7.1 XChaCha20-Poly1305

**Algorithm**: XChaCha20-Poly1305 — the 24-byte-nonce variant of ChaCha20-Poly1305. This is NOT ChaCha20-Poly1305 (RFC 8439), which uses a 12-byte nonce. Go's `golang.org/x/crypto/chacha20poly1305` package exposes both under similar names: `chacha20poly1305.New` constructs the 12-byte (RFC 8439) variant; `chacha20poly1305.NewX` constructs the 24-byte XChaCha20 variant. A reimplementer who uses `New` instead of `NewX` produces incompatible ciphertext silently — both accept any 256-bit key, and the error surfaces only as `AeadFailed` on the receiver. **Always use the 24-byte nonce (XChaCha20) variant throughout soliton.**

- **Key**: 256-bit message key from KDF_MsgKey.
- **Tag**: 128 bits (16 bytes), appended to ciphertext.
- **Minimum valid ratchet ciphertext**: 16 bytes (Poly1305 tag only, zero-length plaintext). Ciphertexts shorter than 16 bytes are rejected as `AeadFailed` (not `InvalidLength` — see §12 error collapse). First-message encrypted payloads have a 40-byte minimum (24-byte nonce + 16-byte tag, §5.5 Step 6). **First-message minimum enforcement**: `decrypt_first_message` also returns `AeadFailed` (not `InvalidLength`) for payloads shorter than 40 bytes — this collapses "too short to contain a valid nonce + tag" with "authentication failed" into a single error variant, preventing a distinguishing oracle: an attacker who could observe `InvalidLength` vs `AeadFailed` would learn whether the authentication attempt even ran (and at what byte offset parsing failed). A reimplementer who returns `InvalidLength` for sub-40-byte first-message payloads breaks this oracle-collapse guarantee.
- **`aead_encrypt` failure (`AeadFailed` only on length overflow)**: XChaCha20-Poly1305 encryption can return `AeadFailed` only when the plaintext exceeds the cipher's internal length limit (the 32-bit ChaCha20 block counter caps a single message at roughly 2³² 64-byte blocks, about 256 GiB) or when internal length arithmetic overflows. Neither can occur with well-formed input bounded by the CAPI size cap (§13.4) or the storage/streaming chunk sizes, so in practice `aead_encrypt` is infallible for any input that passes the upstream size guards. An `AeadFailed` from `aead_encrypt` in production code indicates a missing or broken size guard in the calling layer, not a cryptographic failure.
- **Constant-time by construction**: ARX-based (add-rotate-xor); no table lookups, no data-dependent branches. No hardware acceleration required.
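The length floors and the error collapse above can be sketched as a pre-decryption check. This minimal std-only Rust sketch assumes a hypothetical `Error` enum and helper names (`split_first_message`, `check_ratchet_ciphertext`); the actual AEAD call, for example RustCrypto's `chacha20poly1305::XChaCha20Poly1305`, is omitted so the block needs no external crates.

```rust
/// Hypothetical error type; only the variant this sketch needs.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    AeadFailed,
}

const NONCE_LEN: usize = 24; // XChaCha20 nonce
const TAG_LEN: usize = 16; // Poly1305 tag

/// Splits a first-message payload into (nonce, ciphertext || tag).
/// Anything shorter than nonce + tag returns AeadFailed, the same variant a
/// failed tag check returns, so "too short to parse" and "authentication
/// failed" are indistinguishable to the caller.
fn split_first_message(payload: &[u8]) -> Result<([u8; NONCE_LEN], &[u8]), Error> {
    if payload.len() < NONCE_LEN + TAG_LEN {
        return Err(Error::AeadFailed);
    }
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(&payload[..NONCE_LEN]);
    Ok((nonce, &payload[NONCE_LEN..]))
}

/// Ratchet ciphertexts carry no nonce, so the 16-byte tag is the floor.
fn check_ratchet_ciphertext(ct: &[u8]) -> Result<(), Error> {
    if ct.len() < TAG_LEN {
        Err(Error::AeadFailed)
    } else {
        Ok(())
    }
}
```

A 39-byte first-message payload and a 15-byte ratchet ciphertext both yield `AeadFailed`; there is deliberately no `InvalidLength` path.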

### 7.2 Nonce Construction

**First message of a session** (LO-KEX session init):
```
nonce = random_bytes(24)    // Prepended to ciphertext payload
```

**All subsequent messages** (LO-Ratchet):
```
nonce[0..24]  = 0x00{24}       // MUST zero-initialize the entire buffer first
nonce[20..24] = big_endian_32(header.n)
```

Implementations MUST zero-initialize the entire 24-byte nonce buffer before writing the counter into bytes 20-23. In C, a stack-allocated `uint8_t nonce[24]` holds indeterminate (garbage) bytes unless explicitly zeroed, so `memset(nonce, 0, sizeof(nonce))` MUST precede the counter copy. In Go, both `var nonce [24]byte` and `make([]byte, 24)` return zeroed memory, but a buffer reused from a `sync.Pool` or other free list retains whatever its previous user wrote. In Rust, `let mut nonce = [0u8; 24]` is zero-initialized by construction. A reimplementer who writes only `nonce[20..24] = BE32(n)` without zeroing bytes 0-19 produces a garbage-contaminated nonce: the receiver reconstructs a zeroed nonce and rejects the message, and key-nonce uniqueness across messages now depends on stale memory contents. The result is non-deterministic `AeadFailed` on the receiver.
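The counter-nonce layout maps directly onto a fixed-size array in Rust. A minimal sketch follows; `ratchet_nonce` is a hypothetical helper name, not necessarily soliton's.

```rust
/// Builds the 24-byte XChaCha20 nonce for a ratchet message.
/// The array literal zero-initializes all 24 bytes; only bytes 20..24
/// are then overwritten with the big-endian message counter `header.n`.
fn ratchet_nonce(n: u32) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[20..24].copy_from_slice(&n.to_be_bytes());
    nonce
}
```

For `n = 0` this returns the all-zero nonce used by the first message of every epoch, which is valid and must be accepted; for `n = 0x01020304` the last four bytes are `01 02 03 04` and bytes 0-19 stay zero.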

The counter nonce is not transmitted — recipient derives from `header.n`. Safe because each `(msg_key, n)` pair is unique: `msg_key` is derived from a unique epoch key and counter, and the epoch key changes on every KEM ratchet step.

**An all-zero nonce (counter = 0) is valid**: The first message of every epoch uses `header.n = 0`, producing a 24-byte all-zero nonce `[0x00 × 24]`. This is intentional and correct: XChaCha20-Poly1305 places no restrictions on nonce content (AES-GCM likewise accepts all-zero nonces). The security guarantee comes from `msg_key` uniqueness (one key per (epoch_key, counter) pair), not from nonces being non-zero. Implementations MUST NOT reject or guard against an all-zero nonce. Some AEAD libraries include a "null nonce protection" heuristic that rejects all-zero nonces as likely initialization failures; such protections MUST be disabled or bypassed for XChaCha20-Poly1305 ratchet encryption. A library that errors on a zero nonce silently breaks decryption of the first message (n = 0) of every epoch.

The counter occupies the last 4 bytes (20-23) of the 24-byte nonce, leaving bytes 0-19 as zero.

### 7.3 AAD Construction

**First message (session init):**

```
aad = "lo-dm-v1"                           // 8 bytes UTF-8
   || sender_fingerprint_raw               // 32 bytes (raw SHA3-256, not hex)
   || recipient_fingerprint_raw            // 32 bytes
   || encode_session_init(session_init)    // variable, see §7.4
```

**`encode_session_init` re-encoding obligation**: `encode_session_init` MUST be called to reconstruct the canonical bytes from the parsed struct — using raw wire bytes directly is an error. Bob's obligation to re-encode is documented at §5.5 Step 3 and §13.4. The output MUST be byte-for-byte identical to Alice's encoding; any field normalization during decode that alters re-encoding causes silent `AeadFailed`.

**Ratchet messages:**

```
aad = "lo-dm-v1"
   || sender_fingerprint_raw
   || recipient_fingerprint_raw
   || encode_ratchet_header(ratchet_header)
```

This binds ALL header fields to the AEAD tag. Tampering invalidates the tag.

**`"lo-dm-v1"` is concatenated bare, with no length prefix**: The 8-byte label `"lo-dm-v1"` is written directly into the AAD with `||` (byte concatenation), NOT as a length-prefixed `len(x) || x` field. Compare the §5.4 HKDF info construction, where `"lo-kex-v1"` is likewise bare and explicitly noted as a "raw 9-byte prefix (not length-prefixed)." Throughout this spec, `||` means raw byte concatenation; length prefixes are always written explicitly as `len(x) || x`. A reimplementer who carries the `len(x) || x` convention of the §5.4 info fields over to the AAD label, prepending a 2-byte length (`0x00 0x08`) before `"lo-dm-v1"`, produces a different 10-byte prefix and silently breaks AEAD on every message. The confirmed encoding is `aad = b"lo-dm-v1" || sender_fp || recipient_fp || header_bytes`, whose label prefix is exactly 8 raw bytes.

**Sender/recipient orientation**: `sender_fingerprint_raw` is the fingerprint of the party calling `Encrypt` (local party); `recipient_fingerprint_raw` is the fingerprint of the remote party. On the decrypt side, these roles are reversed — the decryptor reconstructs AAD using the remote party's fingerprint as `sender_fingerprint_raw` and its own fingerprint as `recipient_fingerprint_raw`. Both fingerprints are stored in `RatchetState` as `local_fp` and `remote_fp` at init time; encrypt uses `(local_fp, remote_fp)`, decrypt uses `(remote_fp, local_fp)` to reconstruct the correct AAD order.
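The AAD layout and the encrypt/decrypt role swap can be sketched together. This is a minimal std-only Rust sketch; `dm_aad` and the `Fingerprints` struct are hypothetical stand-ins for the `local_fp` / `remote_fp` fields held in `RatchetState`, and `header_bytes` is assumed to be the already-encoded header.

```rust
const DM_LABEL: &[u8; 8] = b"lo-dm-v1"; // raw 8 bytes, no length prefix

/// aad = "lo-dm-v1" || sender_fp || recipient_fp || header_bytes
fn dm_aad(sender_fp: &[u8; 32], recipient_fp: &[u8; 32], header_bytes: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(8 + 32 + 32 + header_bytes.len());
    aad.extend_from_slice(DM_LABEL);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(header_bytes);
    aad
}

/// Hypothetical holder for the two fingerprints captured at session init.
struct Fingerprints {
    local_fp: [u8; 32],
    remote_fp: [u8; 32],
}

impl Fingerprints {
    /// Encrypting party is the sender: (local, remote).
    fn encrypt_aad(&self, header: &[u8]) -> Vec<u8> {
        dm_aad(&self.local_fp, &self.remote_fp, header)
    }
    /// Decrypting party reconstructs the sender's view: (remote, local).
    fn decrypt_aad(&self, header: &[u8]) -> Vec<u8> {
        dm_aad(&self.remote_fp, &self.local_fp, header)
    }
}
```

With Alice's `local_fp` equal to Bob's `remote_fp` and vice versa, `alice.encrypt_aad(h)` and `bob.decrypt_aad(h)` produce identical bytes, and both begin with the 8 raw label bytes.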

Note: for the first message, `recipient_fingerprint_raw` (Bob's IK fingerprint) appears twice in `aad`: once as the standalone prefix field, and again inside `encode_session_init(session_init)` as `si.recipient_ik_fingerprint`. Both occurrences are intentional — the prefix provides fast lookup without parsing the encoded blob, and the embedded copy ties the fingerprint directly into the signed `SessionInit`. **Bob does not need an explicit equality check between the two occurrences** — AEAD authentication enforces consistency transitively: if an attacker substitutes a different value in either location, the AAD bytes change and the AEAD tag fails. A reimplementer who adds an explicit `prefix_fp == session_init_fp` check before AEAD is not wrong (it detects a specific tampering pattern), but it is redundant — the AEAD check subsumes it and adding a distinct error for the mismatch would create an error-type oracle.

### 7.4 Deterministic Header Encoding

AAD must be computed identically by sender and recipient. JSON is not suitable (field ordering, whitespace, encoding ambiguity). Headers are encoded as length-prefixed binary.

**Length-prefix rule**: Identity fingerprints (32 bytes) and public keys (1216 bytes) are written bare. Their sizes are fixed by definition and cannot change across crypto versions, so the decoder always knows the exact size and needs no length prefix to parse them unambiguously. KEM ciphertexts (1120 bytes in lo-crypto-v1) are length-prefixed even though they are fixed-size in the current version: forward compatibility requires the decoder to handle different ciphertext sizes from future crypto versions (a `lo-crypto-v2` could adopt a different KEM with a different ciphertext size). The `crypto_version` string is length-prefixed because it is genuinely variable-length. A reimplementer MUST NOT pattern-match "fixed-size → no prefix": ciphertexts are the exception because their size is algorithm-determined and may change across crypto versions, rather than definitionally invariant.

**Exception to the "fixed-size fields bare" rule**: DM queue AAD (§11.4.2) uses `len(recipient_fp) || recipient_fp`, a length-prefixed encoding for the recipient fingerprint even though it is a fixed 32-byte field. This deviation from the general rule is intentional; see §11.4.2 for the design rationale. A reimplementer who applies the "bare" rule from this section to every fixed-size field will produce wrong AAD in the DM queue context.

**encode_session_init(si):**

```
encode_session_init(si) =
    len(si.crypto_version)             || si.crypto_version        // UTF-8, 2-byte BE len
 || si.sender_ik_fingerprint_raw                                   // 32 bytes (fixed, no length prefix —
                                                                   // fingerprints are SHA3-256 digests with
                                                                   // definitionally invariant size; no future
                                                                   // lo-crypto version will change them)
 || si.recipient_ik_fingerprint_raw                                // 32 bytes (fixed, no length prefix — same rationale)
 || si.sender_ek                                                   // 1216 bytes (fixed, no length prefix:
                                                                   // sender_ek is an X-Wing public key whose
                                                                   // size is definitionally fixed within lo-crypto-v1;
                                                                   // public key sizes do not change across versions)
 || len(si.ct_ik)                      || si.ct_ik                 // 1120 bytes, 2-byte BE len
                                                                   // (length-prefixed despite being fixed-size in
                                                                   // lo-crypto-v1: KEM ciphertext size is
                                                                   // algorithm-determined, not definitionally
                                                                   // invariant — a future lo-crypto-v2 could
                                                                   // select a different KEM with a different
                                                                   // ciphertext size; a decoder that hard-codes
                                                                   // 1120 bytes would misparse future session inits)
 || len(si.ct_spk)                     || si.ct_spk                // 1120 bytes, 2-byte BE len (same rationale as ct_ik)
 || big_endian_32(si.spk_id)
 || si.has_opk (1 byte: 0x01 or 0x00)
 || if has_opk: len(si.ct_opk)        || si.ct_opk
                || big_endian_32(si.opk_id)
                // When has_opk = 0x00: encoding terminates immediately here.
                // No ct_opk or opk_id bytes are written. The total encoded length
                // with has_opk = 0x00 is exactly 3,543 bytes. A reimplementer who
                // writes zero-filled placeholders (e.g., 2 bytes len + 1120 zero
                // bytes + 4 zero bytes) after the 0x00 flag produces a malformed
                // encoding that fails the strict trailing-bytes check on decode
                // (§7.4 "Trailing bytes after the last field → InvalidData").
```

**All callers of `encode_session_init` MUST use a single shared implementation**: Three separate callers use `encode_session_init` output: (1) §5.4 Step 6 (Alice signs the encoded bytes); (2) §5.4 Step 7 (Alice uses the encoded bytes as AEAD AAD); (3) §5.5 Step 3 (Bob re-encodes the received SessionInit to verify Alice's signature). All three MUST produce byte-for-byte identical output. Any divergence between the signer (1) and the verifier (3) causes `VerificationFailed`; any divergence between the signer (1) and the AAD builder (2) causes `AeadFailed` at `decrypt_first_message`. A reimplementer who inlines `encode_session_init` at each call site and introduces any encoding difference — field ordering, padding, prefix conventions — gets a silent failure with no diagnostic pointing to the encoding divergence. The correct pattern: a single encoding function called identically from all three sites.
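A single shared encoder, as required above, might look like the following minimal std-only Rust sketch. The `SessionInit` struct, `put_len_prefixed`, and `encode_session_init` here are hypothetical illustrations of the §7.4 layout, not soliton's actual types; field sizes are assumed to be valid lo-crypto-v1 sizes.

```rust
/// Hypothetical in-memory SessionInit; field sizes follow lo-crypto-v1.
struct SessionInit {
    crypto_version: String,
    sender_ik_fingerprint: [u8; 32],
    recipient_ik_fingerprint: [u8; 32],
    sender_ek: Vec<u8>,          // 1216 bytes, written bare
    ct_ik: Vec<u8>,              // 1120 bytes, length-prefixed
    ct_spk: Vec<u8>,             // 1120 bytes, length-prefixed
    spk_id: u32,
    opk: Option<(Vec<u8>, u32)>, // (ct_opk, opk_id) when has_opk = 0x01
}

/// Appends a 2-byte big-endian length followed by the raw bytes.
fn put_len_prefixed(out: &mut Vec<u8>, bytes: &[u8]) {
    assert!(bytes.len() <= u16::MAX as usize, "field exceeds u16 length prefix");
    out.extend_from_slice(&(bytes.len() as u16).to_be_bytes());
    out.extend_from_slice(bytes);
}

/// The one encoder that signing, AAD construction, and verification all call.
fn encode_session_init(si: &SessionInit) -> Vec<u8> {
    let mut out = Vec::with_capacity(4669);
    put_len_prefixed(&mut out, si.crypto_version.as_bytes());
    out.extend_from_slice(&si.sender_ik_fingerprint);
    out.extend_from_slice(&si.recipient_ik_fingerprint);
    out.extend_from_slice(&si.sender_ek);
    put_len_prefixed(&mut out, &si.ct_ik);
    put_len_prefixed(&mut out, &si.ct_spk);
    out.extend_from_slice(&si.spk_id.to_be_bytes());
    match &si.opk {
        Some((ct_opk, opk_id)) => {
            out.push(0x01);
            put_len_prefixed(&mut out, ct_opk);
            out.extend_from_slice(&opk_id.to_be_bytes());
        }
        None => out.push(0x00), // nothing follows the flag
    }
    out
}
```

With valid lo-crypto-v1 field sizes the output is 3,543 bytes without an OPK and 4,669 bytes with one, with `has_opk` at offset 3542.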

**encode_ratchet_header(rh):**

```
encode_ratchet_header(rh) =
    rh.ratchet_pk                                                  // 1216 bytes (fixed, no length prefix —
                                                                   // ratchet_pk is an X-Wing public key with
                                                                   // a definitionally fixed 1216-byte size
                                                                   // within lo-crypto-v1; fixed-size fields
                                                                   // are written bare per the Length-prefix
                                                                   // rule above)
 || rh.has_kem_ct (1 byte: 0x01 or 0x00)
 || if has_kem_ct: len(rh.kem_ct)     || rh.kem_ct                // 1120 bytes total: X25519_eph_pk (32) || ML-KEM-768_ct (1088), LO X25519-first encoding (§8.1); 2-byte BE len
                                                                   // (length-prefixed despite being fixed-size —
                                                                   // KEM ciphertext size is algorithm-determined,
                                                                   // not definitionally invariant; a future
                                                                   // lo-crypto-v2 could select a different KEM
                                                                   // with a different ciphertext size; the
                                                                   // decoder must not hard-code 1120 bytes)
 || big_endian_32(rh.n)                                        // always present — not conditional on has_kem_ct
 || big_endian_32(rh.pn)                                       // always present — not conditional on has_kem_ct
```

**Signing context**: When `encode_session_init` output is used as the signed message (§5.4 Step 6), the label `"lo-kex-init-sig-v1"` (18 raw bytes, no length prefix) is prepended: `HybridSign(sk, "lo-kex-init-sig-v1" || encode_session_init(si))`. When the same output is used as the AAD component (§7.3), no prefix is added — the encoded bytes are embedded directly in the AAD alongside fingerprints and the DM label. A reimplementer reading this section as the encoding reference for what gets signed must include the label prefix; omitting it produces a valid encoding but an invalid signature.
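The difference between the signed message and the AAD component comes down to one prefix. A minimal std-only Rust sketch follows; `session_init_signed_message` is a hypothetical helper, and the `HybridSign` / `HybridVerify` calls it would feed are not shown.

```rust
const INIT_SIG_LABEL: &[u8] = b"lo-kex-init-sig-v1"; // 18 raw bytes, no length prefix

/// Message passed to HybridSign / HybridVerify for a SessionInit.
/// The AAD path embeds `encoded_si` directly, without this label.
fn session_init_signed_message(encoded_si: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(INIT_SIG_LABEL.len() + encoded_si.len());
    msg.extend_from_slice(INIT_SIG_LABEL);
    msg.extend_from_slice(encoded_si);
    msg
}
```

For a 3,543-byte encoding, the signed message is 3,561 bytes: the 18-byte label followed by the unchanged encoding.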

All `len()` values are 2-byte big-endian. Fixed-size fields (fingerprints at 32 bytes, public keys at 1216 bytes) and the fixed-width u32 integers omit length prefixes. The `crypto_version` string and the ciphertexts carry 2-byte BE length prefixes: `crypto_version` because it is variable-length, the ciphertexts for forward compatibility (they are fixed-size in lo-crypto-v1, but a future `crypto_version` may select a KEM with a different ciphertext size, and a decoder that assumed a fixed ciphertext length would misparse such session inits). The resulting encoding is unambiguous, deterministic, and simple to implement.

**Decode validation**: On decode, each ciphertext length prefix MUST equal `XWING_CIPHERTEXT_SIZE` (1120 bytes); any other value → `InvalidData` (not `InvalidLength` — this is a wire-format field violation, not a caller-supplied parameter mismatch; see §12 error semantics). A decoder that trusts the u16 prefix without validation would accept malformed blobs with truncated or oversized ciphertexts, leading to incorrect decapsulation inputs. The `crypto_version` field is validated as `"lo-crypto-v1"` (exact match); other values → `UnsupportedCryptoVersion`.

**Encode error behavior for wrong-size `kem_ct`**: `encode_ratchet_header` handles a `kem_ct` of the wrong length as follows. If `ct.len() > 65535` (does not fit in a u16), the function returns `Internal`, because the length prefix cannot represent the value. If `ct.len() <= 65535` but is not 1120, the function **silently encodes the actual length**: the 2-byte prefix receives the actual (non-1120) length and all bytes are written to the buffer without error. This is not a size-validation step; the encoder's only hard constraint is that the length fits in a u16. The wrong-size ciphertext is caught on the **decode path**, where the decoder requires the length prefix to equal `XWING_CIPHERTEXT_SIZE` (1120 bytes) and returns `InvalidData` otherwise (see "Decode validation" above). A reimplementer who expects the encoder to return `Internal` for every wrong-size ciphertext (not just the `> 65535` case) will wrongly treat the encoder as a size validator and let malformed ciphertexts reach the wire unflagged. The correct invariant: the encode path is not responsible for validating ciphertext sizes; the decode path is.
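The encode-side behavior described above (reject only lengths that overflow the u16 prefix, pass everything else through) can be sketched as follows. This is a minimal std-only Rust sketch; `RatchetHeader` and the single-variant `Error` are hypothetical stand-ins, not soliton's types.

```rust
/// Hypothetical error type; only the variant this sketch needs.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    Internal,
}

/// Hypothetical in-memory ratchet header.
struct RatchetHeader {
    ratchet_pk: [u8; 1216],  // written bare
    kem_ct: Option<Vec<u8>>, // length-prefixed when present
    n: u32,
    pn: u32,
}

fn encode_ratchet_header(rh: &RatchetHeader) -> Result<Vec<u8>, Error> {
    let mut out = Vec::with_capacity(1216 + 1 + 2 + 1120 + 8);
    out.extend_from_slice(&rh.ratchet_pk);
    match &rh.kem_ct {
        Some(ct) => {
            // Only hard constraint on encode: the length must fit the u16 prefix.
            // A wrong-but-representable length (e.g. 1119) is encoded as-is and
            // left for the decoder to reject as InvalidData.
            if ct.len() > u16::MAX as usize {
                return Err(Error::Internal);
            }
            out.push(0x01);
            out.extend_from_slice(&(ct.len() as u16).to_be_bytes());
            out.extend_from_slice(ct);
        }
        None => out.push(0x00),
    }
    out.extend_from_slice(&rh.n.to_be_bytes()); // always present
    out.extend_from_slice(&rh.pn.to_be_bytes()); // always present
    Ok(out)
}
```

A well-formed header encodes to 2,347 bytes with a KEM ciphertext (prefix `0x04 0x60` = 1120) and 1,225 bytes without one; a 1,119-byte ciphertext still encodes successfully, while a 70,000-byte one returns `Internal`.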

**`sender_ek` is bare while ciphertexts are length-prefixed (encoding boundary hazard)**: In `encode_session_init`, `sender_ek` (1216 bytes, offsets 78-1293) is written with no length prefix, but the immediately following `ct_ik` carries a 2-byte BE length prefix. A reimplementer who reads the format as "keys have prefixes, ciphertexts have prefixes" and adds a 2-byte prefix to `sender_ek` shifts every subsequent field by 2 bytes. A conforming decoder then consumes the bogus prefix plus the first 1,214 key bytes as `sender_ek` and reads the key's last two bytes (offsets 1294-1295) as `len(ct_ik)`. The blob is rejected as `InvalidData` (the misaligned prefix almost never equals 1120, and the total length becomes 3,545 or 4,671 bytes), but nothing in the error points at `sender_ek` as the cause. No prefix is added to `sender_ek` because its size is definitionally invariant (X-Wing public keys are always 1216 bytes); the length prefix on `ct_ik` (and on every ciphertext) exists specifically because KEM ciphertext sizes are algorithm-determined and may change across future `crypto_version` values. The asymmetry is intentional; see the Length-prefix rule above.

**Total encoded sizes for `encode_session_init`**: Without OPK (`has_opk = 0x00`): **3,543 bytes** total (2 + 12 + 32 + 32 + 1216 + 2 + 1120 + 2 + 1120 + 4 + 1 = 3,543). With OPK (`has_opk = 0x01`): **4,669 bytes** total (3,543 + 2 + 1120 + 4 = 4,669). Decoders can use these totals as a quick-reject check before field-by-field parsing: any input not equal to 3,543 or 4,669 bytes MUST be rejected as `InvalidData` without further parsing. A decoder that accepts inputs of any size and relies solely on field-by-field parsing would accept truncated inputs that parse successfully up to a short point (e.g., a blob of 14 bytes matches the `len(crypto_version) || crypto_version` prefix), masking truncation bugs in test environments.

**Progressive parsing note**: The `has_opk` flag is at offset 3542 — the last byte of the fixed-size prefix. A streaming/progressive parser cannot determine the total `session_init_bytes` length until it has consumed all 3543 bytes. The usual trick of reading a length prefix from the first few bytes does not work here; the format is self-delimiting only after the fixed prefix is fully consumed. For `encode_ratchet_header`, the `has_kem_ct` flag is at offset 1216 (immediately after `ratchet_pk`, which occupies bytes 0-1215), similarly requiring the full fixed prefix.
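The quick-reject totals and the progressive-parsing constraint can be sketched as two small helpers. This minimal std-only Rust sketch assumes hypothetical names (`quick_reject`, `expected_session_init_len`). The streaming helper also rejects a non-canonical `has_opk` byte, consistent with the boolean-marker rule that follows.

```rust
/// Hypothetical error type; only the variant this sketch needs.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    InvalidData,
}

const SI_LEN_NO_OPK: usize = 3543;
const SI_LEN_WITH_OPK: usize = 4669;
const HAS_OPK_OFFSET: usize = 3542; // last byte of the fixed prefix

/// Whole-blob quick reject, applied before field-by-field parsing.
fn quick_reject(blob: &[u8]) -> Result<(), Error> {
    match blob.len() {
        SI_LEN_NO_OPK | SI_LEN_WITH_OPK => Ok(()),
        _ => Err(Error::InvalidData),
    }
}

/// Streaming helper: once the first 3,543 bytes have arrived, the has_opk
/// flag tells the parser how many bytes the whole blob must contain.
/// Returns Ok(None) while the total is still unknown.
fn expected_session_init_len(prefix: &[u8]) -> Result<Option<usize>, Error> {
    if prefix.len() < SI_LEN_NO_OPK {
        return Ok(None);
    }
    match prefix[HAS_OPK_OFFSET] {
        0x00 => Ok(Some(SI_LEN_NO_OPK)),
        0x01 => Ok(Some(SI_LEN_WITH_OPK)),
        _ => Err(Error::InvalidData), // only 0x00 / 0x01 are valid markers
    }
}
```

A 14-byte blob that happens to parse as `len(crypto_version) || crypto_version` is rejected immediately by `quick_reject`, and a streaming parser that has only 100 bytes gets `Ok(None)` rather than a guessed total.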

**Boolean marker byte strictness**: The `has_opk` and `has_kem_ct` fields accept only `0x00` (absent) or `0x01` (present). Any other value → `InvalidData`. Decoders MUST NOT treat arbitrary non-zero values as "present" — doing so accepts malformed blobs and creates format malleability (multiple byte values encode the same logical state). Trailing bytes after the last field → `InvalidData` (strict parsing, same rationale as §6.8 guard 11).

**Fixed-width integer re-encoding MUST be lossless**: All fixed-width integer fields — `spk_id`, `opk_id`, `n`, `pn` (u32, 4 bytes each) — MUST re-encode at their full fixed width as big-endian, regardless of value. A field containing `0` MUST produce four zero bytes (`0x00 0x00 0x00 0x00`), not an empty field or a variable-length encoding. A Python reimplementer who parses `spk_id = 0` into a Python int and re-encodes with a variable-length BE encoder (which might produce `b''` for zero) produces different bytes from the original encoding, yielding a different AAD and permanent `AeadFailed` with no diagnostic. The "byte-for-byte identical" guarantee in §7.3 includes fixed-width integers, not only variable-length fields.
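In Rust the lossless path is the default, because `u32::to_be_bytes` always returns exactly four bytes. The minimal sketch below contrasts it with the lossy minimal-length encoding the paragraph warns about. Both helper names are hypothetical, and `minimal_be` exists only to show the bug.

```rust
/// Parses a u32 from exactly four big-endian bytes and re-encodes it.
/// `to_be_bytes` always yields 4 bytes, so an id of 0 round-trips to 00 00 00 00.
fn reencode_u32_field(field: [u8; 4]) -> [u8; 4] {
    let value = u32::from_be_bytes(field);
    value.to_be_bytes()
}

/// The lossy pattern (for contrast only): a minimal-length big-endian
/// encoding drops leading zero bytes, so 0 becomes an empty byte string.
fn minimal_be(value: u32) -> Vec<u8> {
    value.to_be_bytes().iter().copied().skip_while(|&b| b == 0).collect()
}
```

`reencode_u32_field([0, 0, 0, 0])` returns the same four zero bytes, whereas `minimal_be(0)` is empty and `minimal_be(256)` is `[1, 0]`; feeding either lossy result into the AAD changes its bytes.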

**Truncated input**: If the input is too short to contain all required fields, the decoder returns `InvalidData` (not `InvalidLength`). This includes the case where a length prefix claims more bytes than remain in the buffer — the decoder must not read past the end. Using `InvalidLength` would leak parser state — an attacker could probe incrementally longer inputs and observe the error transition from `InvalidLength` to `InvalidData`, revealing the byte offset where parsing progressed past the size check.
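The decode rules in this section (ciphertext length prefix must equal 1120, strict boolean markers, no reads past the end, no trailing bytes, every failure reported as `InvalidData`) can be combined into one bounds-checked decoder. This is a minimal std-only Rust sketch for the ratchet header only; `Reader`, `RatchetHeader`, and `decode_ratchet_header` are hypothetical names, not soliton's API.

```rust
/// Hypothetical error type; every wire-format failure maps to InvalidData.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    InvalidData,
}

const XWING_PK_SIZE: usize = 1216;
const XWING_CIPHERTEXT_SIZE: usize = 1120;

#[derive(Debug, PartialEq, Eq)]
struct RatchetHeader {
    ratchet_pk: Vec<u8>,
    kem_ct: Option<Vec<u8>>,
    n: u32,
    pn: u32,
}

/// Bounds-checked cursor: a short read is InvalidData (never InvalidLength)
/// and never touches bytes past the end of the buffer.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, len: usize) -> Result<&'a [u8], Error> {
        let end = self.pos.checked_add(len).ok_or(Error::InvalidData)?;
        let bytes = self.buf.get(self.pos..end).ok_or(Error::InvalidData)?;
        self.pos = end;
        Ok(bytes)
    }
    fn u16_be(&mut self) -> Result<u16, Error> {
        let b = self.take(2)?;
        Ok(u16::from_be_bytes([b[0], b[1]]))
    }
    fn u32_be(&mut self) -> Result<u32, Error> {
        let b = self.take(4)?;
        Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

fn decode_ratchet_header(input: &[u8]) -> Result<RatchetHeader, Error> {
    let mut r = Reader { buf: input, pos: 0 };
    let ratchet_pk = r.take(XWING_PK_SIZE)?.to_vec();
    let kem_ct = match r.take(1)?[0] {
        0x00 => None,
        0x01 => {
            let len = r.u16_be()? as usize;
            if len != XWING_CIPHERTEXT_SIZE {
                return Err(Error::InvalidData); // wire-format field violation
            }
            Some(r.take(len)?.to_vec())
        }
        _ => return Err(Error::InvalidData), // strict boolean marker
    };
    let n = r.u32_be()?;
    let pn = r.u32_be()?;
    if r.pos != input.len() {
        return Err(Error::InvalidData); // trailing bytes
    }
    Ok(RatchetHeader { ratchet_pk, kem_ct, n, pn })
}
```

A correct 2,347-byte header decodes; appending one byte, dropping one byte, setting the marker to `0x02`, or changing the length prefix to 1119 all produce the same `InvalidData`.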

---

## 8. X-Wing KEM Details

### 8.1 Encoding (LO-specific)

LO uses X25519-first encoding, diverging from draft-09, which uses ML-KEM-first:

```
// LO encoding (X25519-first):
X-Wing public key (1216 B):  X25519_pk (32) || ML-KEM-768_pk (1184)
X-Wing secret key (2432 B):  X25519_sk (32) || ML-KEM-768_sk (2400)
X-Wing ciphertext (1120 B):  X25519_eph_pk (32) || ML-KEM-768_ct (1088)

// draft-09 encoding (ML-KEM-first) — for contrast only; LO does NOT use this:
//   public key:  ML-KEM-768_pk (1184) || X25519_pk (32)
//   secret key:  ML-KEM-768_sk (2400) || X25519_sk (32)
//   ciphertext:  ML-KEM-768_ct (1088) || X25519_eph_pk (32)
```

**Interoperability consequence**: A reimplementer who uses draft-09's ML-KEM-first layout instead of LO's X25519-first layout produces public keys, ciphertexts, and secret keys whose byte order is inverted. Encapsulation "succeeds" (no length error), but the combiner receives reversed sub-components: `pk_X` from the ML-KEM portion and `pk_M` from the X25519 portion, producing a wrong shared secret. The mismatch surfaces only as `AeadFailed` at the AEAD layer with no indication of which byte offset was misinterpreted. If interoperating with a draft-09-compatible library, both parties must explicitly reorder the concatenation — LO's test suite includes a KAT that reorders a draft-09 vector into LO's X25519-first layout before decapsulation.
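The reordering step mentioned above is a split-and-swap at a known boundary. This is a minimal std-only Rust sketch; the `*_SPLIT` constants, `draft09_to_lo`, and `split_lo_public_key` are hypothetical helpers, and the sizes are the lo-crypto-v1 component sizes from the layout table.

```rust
/// Component sizes for lo-crypto-v1: (X25519 part, ML-KEM-768 part).
const PK_SPLIT: (usize, usize) = (32, 1184);
const SK_SPLIT: (usize, usize) = (32, 2400);
const CT_SPLIT: (usize, usize) = (32, 1088);

/// Converts a draft-09 (ML-KEM-first) blob into LO's X25519-first layout.
/// Returns None if the length does not equal x_len + m_len.
fn draft09_to_lo(blob: &[u8], (x_len, m_len): (usize, usize)) -> Option<Vec<u8>> {
    if blob.len() != x_len + m_len {
        return None;
    }
    let (mlkem_part, x25519_part) = blob.split_at(m_len);
    let mut out = Vec::with_capacity(blob.len());
    out.extend_from_slice(x25519_part);
    out.extend_from_slice(mlkem_part);
    Some(out)
}

/// Splits an LO-layout X-Wing public key into (pk_X, pk_M).
fn split_lo_public_key(pk: &[u8]) -> Option<(&[u8], &[u8])> {
    if pk.len() != PK_SPLIT.0 + PK_SPLIT.1 {
        return None;
    }
    Some(pk.split_at(PK_SPLIT.0))
}
```

Applying `draft09_to_lo` with the matching split before decapsulation is the kind of reordering the KAT performs; the length check guards against passing, say, a ciphertext with the public-key split.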

This encoding difference is internal only; no external interop with draft-09 implementations is required. Combiner inputs are extracted correctly regardless of encoding order, so the cryptographic output is identical.

**Canonical byte representation**: The X25519 component of an X-Wing public key is the raw 32-byte little-endian u-coordinate, with no bit masking applied to the public key bytes. Only the private scalar is clamped (RFC 7748 §5), and clamping happens at *use time*, inside the X25519 scalar-multiplication operation (§8.2); the secret key bytes in the 2432-byte X-Wing blob are stored as unclamped raw random bytes. A reimplementer who pre-clamps the scalar at storage time and clamps again at use still computes the correct result, because clamping is idempotent (after the first clamp, bits 0-2 of byte 0 and bit 7 of byte 31 are already 0, and bit 6 of byte 31 is already 1). The stored bytes, however, differ from soliton's unclamped format, causing silent key import failures when round-tripping through the 2432-byte serialization. Some X25519 libraries clear bit 255 (the top bit of byte 31) of the public key; such a library produces a different 32-byte public key than soliton's, causing silent SPK signature verification failure (the signed bytes differ from the stored/verified bytes). Reimplementers MUST verify that their X25519 library does not mask public key bits.
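Use-time clamping can be expressed as a function over a copy of the stored scalar. This is a minimal std-only Rust sketch of the RFC 7748 §5 bit operations; `clamp` is a hypothetical helper name, and the scalar multiplication itself is left to an X25519 library.

```rust
/// RFC 7748 §5 clamping, applied to a copy of the stored scalar at use time.
/// The stored 32 bytes stay unclamped, matching the 2432-byte key format.
fn clamp(mut k: [u8; 32]) -> [u8; 32] {
    k[0] &= 248; // clear bits 0-2
    k[31] &= 127; // clear bit 7 of the last byte (bit 255)
    k[31] |= 64; // set bit 6 of the last byte (bit 254)
    k
}
```

Starting from stored bytes of `0xFF`, `clamp` yields `0xF8` in byte 0 and `0x7F` in byte 31, and clamping that result again changes nothing. That idempotence is why double clamping computes correctly, while the stored bytes still differ from the unclamped original.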

**ML-KEM-768 public key coefficient reduction — happens inside `Encaps`, not at `from_bytes`**: `EncapsulationKey::from_bytes` is a size check only — it does not normalize coefficients. Coefficient reduction (FIPS 203 §7.2 `ByteDecode_12`, which silently reduces any coefficient ≥ 3329 modulo q) occurs inside `ML-KEM-768.Encaps()` when the encapsulation key bytes are imported for use. The practical implication: a round-trip byte-comparison test (`from_bytes` then `to_bytes`, compare to original) will NOT detect normalization incompatibilities — the stored bytes are returned verbatim because `from_bytes` is a pure size check. Only a shared-secret KAT (encapsulate, decapsulate, compare shared secrets) detects normalization divergence. A foreign library that stores unreduced coefficients produces encapsulation keys that produce a different shared secret after `Encaps` — this surfaces as `AeadFailed` at the AEAD layer with no indication the key was modified. Reimplementers importing ML-KEM-768 encapsulation keys from external libraries MUST verify via KAT, not via byte-comparison. The cross-check from §8.5 also applies: re-derive the public key from the decapsulation key and compare `ek_PKE` bytes — a mismatch indicates encoding-domain divergence (NTT vs. coefficient-domain). **This normalization divergence also affects SPK signature verification**: the IK signature over the SPK (produced by `HybridSign` in §5.3, §10.2) covers the raw 1,216-byte SPK public key bytes as stored. If a reimplementer's ML-KEM library normalizes coefficients on import (modifying the byte representation), the bytes the reimplementer would sign or verify over differ from the raw bytes stored and transmitted by the reference implementation. 
The SPK signature verification step (`HybridVerify` at §5.5 Step 3 / §5.3) would then return `VerificationFailed` even when the SPK is cryptographically valid — because the signed bytes and the verified bytes are different normalized representations of the same underlying key. The normalization divergence therefore causes bundle authentication failure even when no tampering occurred.
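The normalization issue can be observed directly with the FIPS 203 12-bit packing. This minimal std-only Rust sketch implements `ByteDecode_12` (two little-endian 12-bit coefficients per 3 bytes, each reduced mod q = 3329) and `ByteEncode_12`. The helper names are hypothetical. FIPS 203 §7.2 also describes an encapsulation-key modulus check built on this same decode/encode round trip, and whether a given library applies that check is worth confirming. The round trip shows why a shared-secret KAT, rather than a raw-byte comparison, is needed to catch non-canonical keys.

```rust
const Q: u16 = 3329;

/// FIPS 203 ByteDecode_12: every 3 bytes hold two little-endian 12-bit
/// coefficients, each reduced mod q.
fn byte_decode_12(bytes: &[u8]) -> Vec<u16> {
    let mut coeffs = Vec::with_capacity(bytes.len() / 3 * 2);
    for c in bytes.chunks_exact(3) {
        let (b0, b1, b2) = (c[0] as u16, c[1] as u16, c[2] as u16);
        coeffs.push((b0 | ((b1 & 0x0F) << 8)) % Q);
        coeffs.push(((b1 >> 4) | (b2 << 4)) % Q);
    }
    coeffs
}

/// FIPS 203 ByteEncode_12: inverse packing (inputs assumed < 4096).
fn byte_encode_12(coeffs: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(coeffs.len() / 2 * 3);
    for p in coeffs.chunks_exact(2) {
        out.push((p[0] & 0xFF) as u8);
        out.push(((p[0] >> 8) | ((p[1] & 0x0F) << 4)) as u8);
        out.push((p[1] >> 4) as u8);
    }
    out
}

/// True when decoding leaves the bytes unchanged (all coefficients < q).
fn is_canonical(packed: &[u8]) -> bool {
    byte_encode_12(&byte_decode_12(packed)) == packed
}
```

The bytes `00 0D 00` pack the coefficient 3328 and round-trip unchanged. The bytes `01 0D 00` pack 3329, which decodes to 0 and re-encodes as `00 00 00`, so the same "key" yields different bytes after import.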

### 8.2 Combiner (draft-09 §5.3)

**Version pinning**: `lo-crypto-v1` is pinned to `draft-connolly-cfrg-xwing-kem-09`. Any future revision that alters the combiner construction, including a published RFC that differs from draft-09, requires a new `crypto_version` string (e.g., `"lo-crypto-v2"`). The `XWingLabel` bytes and the `ss_M ‖ ss_X ‖ ct_X ‖ pk_X` argument order are draft-09-specific; a reimplementer working from a different draft or the final RFC MUST verify that these values match before relying on this spec.

```
function XWing.Combine(ss_M, ss_X, ct_X, pk_X):
    return SHA3-256(ss_M || ss_X || ct_X || pk_X || XWingLabel)

XWingLabel = 0x5c 0x2e 0x2f 0x2f 0x5e 0x5c   // ASCII: \.//^\  (6 bytes, label goes LAST)
```

- `ss_M` = ML-KEM-768 shared secret (32 bytes)
- `ss_X` = X25519 DH output (32 bytes)
- `ct_X` = ephemeral X25519 public key (32 bytes) — `ciphertext[0..32]` in LO's X25519-first encoding (§8.1)
- `pk_X` = recipient X25519 public key (32 bytes) — `public_key[0..32]` in LO's encoding
- `c_M` = ML-KEM-768 ciphertext (1088 bytes) — `ciphertext[32..1120]`. Not in the combiner formula — `c_M` is bound inside `ss_M` via ML-KEM's implicit rejection (the pseudorandom SS depends on the ciphertext).
- `pk_M` = ML-KEM-768 public key (1184 bytes) — `public_key[32..1216]`. Not in the combiner formula — `pk_M` is bound inside `ss_M` on both sides: on the **decapsulator** side, `ss_M` is derived from the decapsulation key, which embeds `pk_M` in its `ek_PKE` field (§8.5); on the **encapsulator** side, `ss_M = ML-KEM-768.Encaps(pk_M, randomness)` directly consumes `pk_M` as an input — encapsulation is a function of the public key, so `pk_M` is bound to `ss_M` there as well. This follows draft-09 §5.3.
- Hash: SHA3-256
- Label position: **last** (changed from draft-06 which had label first)
- **`ss_M ‖ ss_X` argument order**: The `ss_M ‖ ss_X` order is fixed by draft-09 §5.3 — it is not a local choice. Swapping them produces a different SHA3-256 output with no error signal.
- **Total SHA3-256 input length: 134 bytes** (32 + 32 + 32 + 32 + 6 = 134). SHA3-256's rate is 136 bytes, so all 134 bytes are absorbed in a single Keccak block with no second block. A reimplementer who miscounts the input (e.g., by appending `pk_M` or `c_M`, which the bullets above exclude from the combiner) gets a different hash output with no error at the hash primitive layer.
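The combiner input is a fixed 134-byte buffer, which the following minimal std-only Rust sketch assembles. `combiner_input` is a hypothetical helper. The SHA3-256 call is deliberately omitted so the block needs no crates; the result would be passed to one SHA3-256 invocation. `ct_x` and `pk_x` are assumed to be `ciphertext[0..32]` and `public_key[0..32]` in LO's X25519-first layout.

```rust
/// XWingLabel, "\.//^\" as raw bytes; it goes last.
const XWING_LABEL: [u8; 6] = [0x5c, 0x2e, 0x2f, 0x2f, 0x5e, 0x5c];

/// Builds the exact SHA3-256 input for XWing.Combine:
/// ss_M || ss_X || ct_X || pk_X || XWingLabel, with no separators.
fn combiner_input(
    ss_m: &[u8; 32],
    ss_x: &[u8; 32],
    ct_x: &[u8; 32],
    pk_x: &[u8; 32],
) -> [u8; 134] {
    let mut buf = [0u8; 134];
    buf[0..32].copy_from_slice(ss_m);
    buf[32..64].copy_from_slice(ss_x);
    buf[64..96].copy_from_slice(ct_x);
    buf[96..128].copy_from_slice(pk_x);
    buf[128..134].copy_from_slice(&XWING_LABEL);
    buf
}
```

The fixed array type makes the 134-byte total a compile-time fact: appending `pk_M` or `c_M` by mistake would not fit in the buffer.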

**pk_X during decapsulation**: The combiner requires `pk_X` (the recipient's X25519 public key), but the decapsulation key contains only `sk_X`. The decapsulator re-derives `pk_X` via `X25519(sk_X, G)` (scalar-basepoint multiplication) each time — no separate public key storage or input is needed. `G` is the X25519 base point defined in RFC 7748 §6.1: the u-coordinate value 9 encoded as a 32-byte little-endian integer (`09 00 00 00 ... 00`). X25519 libraries expose this as their `scalarmult_base` operation — use the library's base-point function rather than encoding G manually. This matches soliton's secret key layout (§8.5): only the X25519 scalar and ML-KEM expanded key are stored.

The label bytes decode as: `\` (0x5c), `.` (0x2e), `/` (0x2f), `/` (0x2f), `^` (0x5e), `\` (0x5c) = `\.//^\`.

**SHA3-256 input must be one concatenated byte string, with no separators or length prefixes between fields**: The five combiner inputs (`ss_M`, `ss_X`, `ct_X`, `pk_X`, `XWingLabel`) are concatenated as raw bytes with no separators, length prefixes, or delimiter bytes between them. The total input is exactly 134 bytes (32 + 32 + 32 + 32 + 6). Some hash APIs accept multiple buffers via repeated `update()` calls or a variadic array. With a plain SHA3-256 context these are equivalent to hashing the concatenation, because the sponge absorbs a single byte stream regardless of where the chunk boundaries fall. However, certain framework wrappers and tree-hash APIs (Merkle-tree SHA3, protocol-framing helpers, "typed" hash APIs) insert domain-separation bytes or length prefixes between `update()` calls; such an API produces a different SHA3-256 output even when every field is correct. The correct call is either (a) concatenate all five fields into a 134-byte buffer and call SHA3-256 once, or (b) feed a streaming SHA3-256 context through raw `update()` calls, with no extra arguments, wrappers, or domain separation. **Verification**: compute `SHA3-256(ss_M || ss_X || ct_X || pk_X || label)` over the known test inputs from Appendix F.3 and compare against the expected output.

The reference implementation checks all six label bytes with a compile-time assertion.
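The following is a minimal std-only Rust sketch of the exact combiner input. It assumes the caller hashes the returned buffer with a plain SHA3-256, which is not shown here; the RustCrypto `sha3` crate is one option. `XWING_LABEL` and `combiner_input` are illustrative names, not soliton's API. The `const _` block shows one way to express the compile-time label check. Because `XWing.Combine` must stay internal, a real version would be crate-private and would zeroize the buffer after hashing.

```rust
/// X-Wing combiner label `\.//^\` (bytes 5c 2e 2f 2f 5e 5c).
pub const XWING_LABEL: [u8; 6] = *b"\\.//^\\";

// Compile-time check of every label byte: a mistyped literal fails the build.
const _: () = {
    let expected: [u8; 6] = [0x5c, 0x2e, 0x2f, 0x2f, 0x5e, 0x5c];
    let mut i = 0;
    while i < 6 {
        assert!(XWING_LABEL[i] == expected[i]);
        i += 1;
    }
};

/// Builds the 134-byte SHA3-256 input ss_M || ss_X || ct_X || pk_X || label,
/// with no separators or length prefixes. Hash it with ONE plain SHA3-256 call
/// (or raw streaming updates), then zeroize the buffer: it holds ss_M and ss_X.
pub fn combiner_input(
    ss_m: &[u8; 32],
    ss_x: &[u8; 32],
    ct_x: &[u8; 32],
    pk_x: &[u8; 32],
) -> [u8; 134] {
    let mut buf = [0u8; 134];
    buf[0..32].copy_from_slice(ss_m);
    buf[32..64].copy_from_slice(ss_x);
    buf[64..96].copy_from_slice(ct_x);
    buf[96..128].copy_from_slice(pk_x);
    buf[128..134].copy_from_slice(&XWING_LABEL);
    buf
}
```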

**`XWing.Combine` MUST be internal-only — not a public API**: The combiner takes raw `ss_M` and `ss_X` values as inputs and returns a SHA3-256 hash — it performs no key validation, no randomness checks, and no binding to a specific encapsulation operation. Exposing it as a callable public function allows a caller to supply arbitrary `ss_X` values (e.g., all-zero, repeated from a prior session, or attacker-controlled), which breaks the IND-CCA2 security guarantee of the combined scheme. The X-Wing security proof assumes `ss_X` is the genuine DH output from encapsulation or decapsulation — not a caller-supplied value. Implementations MUST call the combiner exclusively from within `XWing.Encapsulate` and `XWing.Decapsulate`, where `ss_X` is derived from a fresh ephemeral scalar (encapsulation) or from the peer's ephemeral public key and the stored secret key (decapsulation). The CAPI does NOT expose `soliton_xwing_combine`; the Rust API exposes `XWing.Combine` as `pub(crate)` only. Binding authors MUST NOT promote this to a public function.

**The SharedSecret returned by `XWing.Encapsulate` / `XWing.Decapsulate` MUST be consumed immediately by `KDF_Root` and zeroized**: The 32-byte shared secret (`ss`) is the output of `XWing.Combine`. It is secret key material with the same sensitivity as the inputs to `KDF_Root`.

A binding that returns the raw `ss` to callers for "custom KDF use" creates the same risk as exposing `XWing.Combine` directly. The caller can feed `ss` into arbitrary downstream operations, bypassing the key hierarchy defined in §5.4 / §6.4.

In Rust, `xwing::SharedSecret` is neither `Clone` nor `Copy`. Its bytes are reachable only through a deliberate `as_bytes()` call, and soliton never calls `as_bytes()` on the combined output outside `KDF_Root`. Binding authors MUST pass `ss` directly into `KDF_Root` at the call site and zeroize it before the binding's own encapsulation/decapsulation wrapper returns. Do not buffer, log, or expose it through any intermediate field.
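One way to get these ownership properties in Rust is a wrapper type with three traits. It is neither `Clone` nor `Copy`, it exposes its bytes only through `as_bytes()`, and it overwrites itself on drop.

The sketch below is std-only and uses `std::ptr::write_volatile` as a best-effort wipe; the `zeroize` crate is the more common production choice. `kdf_root_consume` is a hypothetical stand-in for `KDF_Root`. It takes the secret by value so that the secret is wiped as soon as the derivation returns. This is not soliton's actual `xwing::SharedSecret`.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Not Clone, not Copy, no Debug: the secret cannot be duplicated or `{:?}`-logged.
pub struct SharedSecret([u8; 32]);

impl SharedSecret {
    pub(crate) fn new(bytes: [u8; 32]) -> Self {
        SharedSecret(bytes)
    }

    /// Deliberate, greppable accessor; intended to be called only inside KDF_Root.
    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.0
    }
}

impl Drop for SharedSecret {
    fn drop(&mut self) {
        for b in self.0.iter_mut() {
            // Volatile writes discourage the compiler from eliding the wipe.
            unsafe { ptr::write_volatile(b, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

/// Hypothetical KDF_Root call site: `ss` is moved in, used once, and dropped
/// (wiped) when this function returns.
pub fn kdf_root_consume<F, T>(ss: SharedSecret, kdf: F) -> T
where
    F: FnOnce(&[u8; 32]) -> T,
{
    kdf(ss.as_bytes())
}
```

Because `SharedSecret` does not derive `Debug`, an accidental `println!("{:?}", ss)` is a compile error rather than a log leak.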

**End-to-end encapsulation and decapsulation pseudocode:**

```
function XWing.Encapsulate(pk):
    // pk layout (§8.1): pk_X(32) || pk_M(1184)
    pk_X = pk[0..32]
    pk_M = pk[32..1216]

    // X25519 half: generate ephemeral scalar, compute shared secret and ephemeral pk
    eph_sk = random_bytes(32)                      // 32 raw CSPRNG bytes; do NOT pre-clamp — clamping is applied
                                                   // internally by the X25519 call per §8.5 (stored raw).
                                                   // `random_bytes(32)` is used (not `random_scalar()`) to
                                                   // avoid library functions that return pre-clamped scalars;
                                                   // storing a pre-clamped scalar violates the raw-bytes
                                                   // storage requirement (§8.5).
    ct_X   = X25519(eph_sk, G)                    // ephemeral public key (32 bytes); G = RFC 7748 §6.1 base point (u-coordinate 9)
    ss_X   = X25519(eph_sk, pk_X)                 // DH output (32 bytes); if the library rejects
                                                   // all-zero output (low-order pk_X), substitute
                                                   // [0u8; 32] — same rule as Decapsulate (§8.3)

    // ML-KEM-768 half
    // FIPS 203 §7.2 draws m ← B^32 (32 random bytes) internally before encapsulation.
    // Deterministic-API callers (e.g., ml-kem crate's `encapsulate_deterministic`) must
    // supply this m explicitly via `random_b32()`. Passing a fixed, zero, or reused m
    // produces structurally valid ciphertexts but silently breaks IND-CCA2 — an attacker
    // who can predict m can recover the shared secret. Each Encaps call MUST use a fresh
    // 32-byte CSPRNG value; reuse across calls is not detectable at the API level.
    (ct_M, ss_M) = ML-KEM-768.Encaps(pk_M)       // ct_M = 1088 bytes, ss_M = 32 bytes

    // Assemble ciphertext: X25519-first (§8.1)
    ct = ct_X || ct_M                              // 32 + 1088 = 1120 bytes

    // Combine
    ss = XWing.Combine(ss_M, ss_X, ct_X, pk_X)   // 32 bytes

    zeroize(eph_sk, ss_X, ss_M)
    return (ct, ss)

function XWing.Decapsulate(sk, ct):
    // sk layout (§8.1, §8.5): sk_X(32) || dk_M(2400)
    sk_X = sk[0..32]
    dk_M = sk[32..2432]

    // ct layout (§8.1): ct_X(32) || ct_M(1088)
    ct_X = ct[0..32]
    ct_M = ct[32..1120]

    // X25519 half: re-derive pk_X from sk_X (no stored copy needed)
    pk_X = X25519(sk_X, G)                        // scalar-basepoint multiplication
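    ss_X = X25519(sk_X, ct_X)                     // DH output (32 bytes); if the library rejects
                                                   // all-zero output (low-order ct_X), substitute
                                                   // [0u8; 32], same rule as Encapsulate (§8.3)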

    // ML-KEM-768 half (implicit rejection: always returns a shared secret, never fails)
    ss_M = ML-KEM-768.Decaps(dk_M, ct_M)         // 32 bytes

    // Combine — uses re-derived pk_X, not any value from the ciphertext
    ss = XWing.Combine(ss_M, ss_X, ct_X, pk_X)   // 32 bytes

    zeroize(ss_X, ss_M)
    return ss
```
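The byte offsets in the pseudocode can be pinned to named constants, so that a wrong slice length is caught by the type checker rather than showing up as a failed decapsulation. The following is a std-only Rust sketch. The constant and function names are illustrative and do not come from soliton.

```rust
pub const PK_X_LEN: usize = 32;
pub const PK_M_LEN: usize = 1184;
pub const CT_M_LEN: usize = 1088;
pub const DK_M_LEN: usize = 2400;
pub const XWING_PK_LEN: usize = PK_X_LEN + PK_M_LEN; // 1216
pub const XWING_CT_LEN: usize = 32 + CT_M_LEN; // 1120
pub const XWING_SK_LEN: usize = 32 + DK_M_LEN; // 2432

/// pk = pk_X(32) || pk_M(1184), X25519 first (§8.1).
pub fn split_pk(pk: &[u8; XWING_PK_LEN]) -> (&[u8], &[u8]) {
    pk.split_at(PK_X_LEN)
}

/// ct = ct_X(32) || ct_M(1088).
pub fn split_ct(ct: &[u8; XWING_CT_LEN]) -> (&[u8], &[u8]) {
    ct.split_at(32)
}

/// sk = sk_X(32) || dk_M(2400) (§8.5).
pub fn split_sk(sk: &[u8; XWING_SK_LEN]) -> (&[u8], &[u8]) {
    sk.split_at(32)
}

/// Encapsulation output assembly: ct = ct_X || ct_M.
pub fn assemble_ct(ct_x: &[u8; 32], ct_m: &[u8; CT_M_LEN]) -> [u8; XWING_CT_LEN] {
    let mut ct = [0u8; XWING_CT_LEN];
    ct[..32].copy_from_slice(ct_x);
    ct[32..].copy_from_slice(ct_m);
    ct
}
```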

**pk_X re-derivation in decapsulation**: Soliton stores only the X25519 scalar `sk_X` (§8.5), so the decapsulator computes its own `pk_X = X25519(sk_X, G)` on every decapsulation call, as the pseudocode above shows. A reimplementer who instead passes `ct_X` to the combiner as `pk_X` produces a wrong shared secret. `ct_X` is the *encapsulator's* ephemeral public key, taken from the ciphertext; it is not the *decapsulator's* public key. This is the most common error in X-Wing implementations.

**Clamping requirement for pk_X re-derivation in non-auto-clamping libraries**: The re-derivation `pk_X = X25519(sk_X, G)` requires RFC 7748 clamping applied to `sk_X` before the scalar multiply. Libraries that apply clamping automatically at every X25519 call (such as `x25519-dalek`) handle this transparently — the raw stored scalar is passed in and clamped internally. Libraries that require explicit pre-clamping (raw Montgomery ladders, some low-level crypto primitives) MUST have clamping applied before each `X25519` call per §8.5's "Portability note." A non-clamping library computing `X25519(raw_sk_X, G)` without clamping produces a different public key for scalars where the low 3 bits or bits 254/255 differ from their clamped values — specifically, scalars where any of bits 0, 1, 2, 255 are set or bit 254 is clear. The mismatch between the re-derived `pk_X` and the actual public key causes the combiner to produce a wrong shared secret, and the AEAD fails silently. To verify clamping correctness: `X25519(sk_X, G)` with your library MUST equal `public_key[0..32]` (the stored X25519 public key). See §8.5 for the full portability note.
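The following std-only Rust sketch is a small diagnostic for the bit condition described above. It reports whether RFC 7748 clamping would change a raw stored scalar, which is exactly when a non-clamping library would compute a different `X25519(sk_X, G)`. `clamping_changes` is a hypothetical test helper. The authoritative check remains comparing `X25519(sk_X, G)` with `public_key[0..32]`.

```rust
/// True if RFC 7748 clamping would alter this raw scalar.
/// Bit numbering is little-endian, as in RFC 7748:
/// bit 0 is the LSB of byte 0, and bit 255 is the MSB of byte 31.
pub fn clamping_changes(raw: &[u8; 32]) -> bool {
    let low_bits_set = (raw[0] & 0b0000_0111) != 0; // any of bits 0, 1, 2
    let bit255_set = (raw[31] & 0b1000_0000) != 0;
    let bit254_clear = (raw[31] & 0b0100_0000) == 0;
    low_bits_set || bit255_set || bit254_clear
}
```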

### 8.3 Low-Order X25519 Points

X25519 DH with a low-order public key produces an all-zero output. LO uses the all-zero value rather than rejecting the point.

**The all-zero check MUST be constant-time**: the DH output is secret material before the check executes. A variable-time comparison therefore leaks one bit of information about the relationship between the ephemeral private key and the recipient's public key. The reference uses `subtle::ConstantTimeEq` against `[0u8; 32]`; see also the Constant-Time Requirements table in Appendix E.

The SHA3-256 combiner absorbs this result alongside the ML-KEM shared secret and the label. The combiner output therefore remains secure as long as the ML-KEM-768 component is secure. Rejecting low-order points would let an attacker with a malicious pre-key bundle force session initiation to fail, with no security benefit.
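The reference relies on `subtle::ConstantTimeEq` for this check. As a dependency-free illustration of the same idea, the Rust sketch below reads every byte and ORs it into an accumulator instead of returning at the first non-zero byte. `is_all_zero_ct` is a hypothetical name. `std::hint::black_box` is only a best-effort optimization barrier, so this sketch does not replace an audited constant-time crate.

```rust
use std::hint::black_box;

/// Constant-time-style test for an all-zero 32-byte DH output:
/// all 32 bytes are always read, with no early exit.
pub fn is_all_zero_ct(dh_out: &[u8; 32]) -> bool {
    let mut acc: u8 = 0;
    for &b in dh_out.iter() {
        acc |= black_box(b);
    }
    // The single comparison at the end reveals only the result itself.
    black_box(acc) == 0
}
```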

**Why the all-zero check is sufficient for all low-order points**: Curve25519 has 8 points of order dividing 8 (the cofactor), namely points of order 1, 2, 4, or 8.

- **Low-order inputs always give zero.** RFC 7748 §5 clamping clears the scalar's three low bits, which makes it a multiple of 8. Multiplying any of these torsion points by a clamped scalar therefore produces the group identity. In X25519's Montgomery u-coordinate output, the identity is encoded as u = 0, the all-zero 32-byte string. So every low-order input produces `[0u8; 32]`.
- **Zero output implies a low-order input.** A clamped scalar is never a multiple of the large prime order of Curve25519's main subgroup, or of its twist. An input with a large-prime-order component therefore never yields the all-zero output.

The all-zero output test is both necessary and sufficient. A reimplementer who instead checks inputs against a hardcoded list of known low-order encodings (e.g., the 8 torsion points) is over-engineering. Such lists are also easy to get incomplete, because non-canonical encodings and low-order points on the quadratic twist yield an all-zero output too.

**Implementation mechanism: error-catch-and-replace, not pre-filtering**: The correct implementation calls the X25519 DH function normally and does NOT pre-filter low-order input points. If the DH function returns an error or an all-zero output, the caller uses `[0u8; 32]` explicitly.

In soliton's Rust implementation, `x25519::dh()` rejects all-zero output by returning `Err(DecapsulationFailed)`. The X-Wing encapsulate/decapsulate layer catches that error and substitutes `[0u8; 32]` via `.unwrap_or([0u8; 32])`. The underlying `x25519_dalek` crate's `PublicKey` type does not itself reject low-order points; soliton's wrapper adds the check.

Other X25519 libraries behave differently on low-order input:
1. Some return an error. Catch it and substitute `[0u8; 32]`.
2. Some panic. Catch the panic and substitute `[0u8; 32]`.
3. Some silently return `[0u8; 32]` with no error signal. No substitution is needed, because the all-zero value is already correct. Adding a check that treats this silent all-zero result as an error is wrong.

A wrapper that substitutes `[0u8; 32]` on an error (or panic) and otherwise passes the output through unchanged handles all three behaviors. An extra "if all-zero, substitute `[0u8; 32]`" step is a harmless no-op. The substitution is always the X-Wing layer's responsibility, not the X25519 primitive's.
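The three library behaviors can be folded into one X-Wing-layer wrapper. In this Rust sketch, `DhError` and `xwing_x25519` are hypothetical names, and the closure stands in for whatever X25519 call the chosen library provides. `std::panic::catch_unwind` handles the panicking case only under the default `panic = "unwind"` strategy. With `panic = "abort"`, a panicking library cannot be handled this way.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Stand-in for whatever error type the chosen X25519 library returns.
#[derive(Debug)]
pub struct DhError;

/// X-Wing-layer wrapper: error or panic -> [0u8; 32]; any returned value,
/// including a silently returned all-zero output, passes through unchanged.
pub fn xwing_x25519<F>(dh: F) -> [u8; 32]
where
    F: FnOnce() -> Result<[u8; 32], DhError>,
{
    match panic::catch_unwind(AssertUnwindSafe(dh)) {
        Ok(Ok(out)) => out,
        Ok(Err(_)) | Err(_) => [0u8; 32],
    }
}
```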

**Scope**: This no-rejection policy applies exclusively inside X-Wing encapsulate/decapsulate (§8.2). LO-Auth (§4) uses the full X-Wing KEM rather than standalone X25519 DH, so the policy covers it as well. There, the ML-KEM-768 component provides security even if X25519 produces an all-zero output.

The standalone `x25519::dh()` function is used only internally within X-Wing and is not exposed as a protocol-level primitive. It DOES reject all-zero output by returning an error, and the X-Wing layer catches that error and substitutes `[0u8; 32]` explicitly. A reimplementer who adds a zero-output rejection inside X-Wing's internal X25519 step, and propagates the error instead of substituting zeros, breaks interoperability with degenerate-but-secure key exchanges.

### 8.4 ML-KEM Implicit Rejection

ML-KEM-768 decapsulation implements FIPS 203 implicit rejection: invalid ciphertexts produce a pseudorandom shared secret rather than an error. Authentication failure surfaces only at the AEAD layer (wrong shared secret → wrong session/epoch key → AEAD tag mismatch). Implementations must not add explicit ciphertext validation that could create a timing oracle distinguishing valid from invalid ML-KEM ciphertexts.

**X25519 does not independently provide implicit rejection**: it returns a 32-byte output for any 32-byte input, including malformed or attacker-chosen public keys. For example, a low-order input yields the all-zero output (§8.3).

The combined X-Wing shared secret is still pseudorandom on an invalid ciphertext. ML-KEM-768's implicit rejection (FIPS 203 §7.3) returns a pseudorandom key derived from the secret seed `z`, and SHA3-256 absorbs that key into the combiner output.

A reimplementer who builds a non-standard X-Wing variant and removes or replaces the ML-KEM component silently breaks this property. An X25519-only combiner output would be attacker-influenced rather than pseudorandom.

### 8.5 Secret Key Storage

LO stores the fully expanded X-Wing secret key (2432 bytes) rather than the 32-byte seed format specified in draft-09. This avoids re-running the SHAKE256 expansion and ML-KEM-768 key generation on every decapsulation.

This applies to all X-Wing secret keys in soliton: identity IK, signed pre-keys (SPK), one-time pre-keys (OPK), and ratchet keys are all stored in expanded 2432-byte form. The 32-byte seed form is never used for storage, regardless of key type. The extra 2400 bytes per key are considered negligible given the 2496-byte composite key.

**X25519 scalar `sk_X` is stored as raw bytes; clamping is applied at use time**: The 32-byte `sk_X` field in the stored secret key is the raw random scalar from key generation (or from SHAKE256 seed expansion), before RFC 7748 clamping. The underlying library applies clamping at each X25519 operation (`X25519(sk_X, G)` and `X25519(sk_X, peer_pk)`): it clears bit 255, sets bit 254, and clears the three low bits of byte 0. The stored bytes are NOT pre-clamped.

A reimplementer who clamps `sk_X` before writing it into the secret key blob produces a different stored encoding. Because clamping is idempotent, X25519 operations that re-clamp at use time still compute the same DH results. However, the stored bytes no longer match what the reference implementation writes. Byte-level checks against the reference, such as secret-key test vectors and blob round-trip comparisons, therefore fail.

**Portability note — libraries that require explicit pre-clamping**: Some X25519 libraries do not apply clamping internally and require the caller to pre-clamp the scalar before each use (e.g., a raw Montgomery ladder that takes the scalar as-is). When integrating such a library: (1) do NOT clamp before storage — store the raw random bytes as specified; (2) DO clamp before each X25519 call — read the 32 stored bytes, apply RFC 7748 clamping in a temporary variable, pass the clamped bytes to the library, and zeroize the temporary immediately after. This "clamp at use time" pattern matches the reference implementation's semantics even when the library doesn't clamp automatically. Libraries that clamp automatically (including `x25519-dalek` used by the reference) make this transparent — the stored raw scalar is passed directly and clamped internally. A reimplementer who passes the raw stored scalar to a non-clamping library produces the wrong DH output (the unclamped scalar may have the low bits set, changing the scalar value and thus the DH result), causing silent `AeadFailed` at the AEAD layer. To verify: compute `X25519(sk_X, G)` using the library and compare against `public_key[0..32]` — a mismatch indicates clamping divergence.
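The following std-only Rust sketch shows the "clamp at use time" pattern:

1. Copy the stored raw scalar into a temporary.
2. Clamp the temporary.
3. Pass it to the primitive.
4. Wipe the temporary.

`raw_ladder` is a hypothetical closure standing in for a non-clamping X25519 routine. The volatile-write wipe is best effort; the `zeroize` crate is the usual production choice. The compiler may still leave copies of the scalar in registers or on the stack.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// RFC 7748 clamping applied to a copy; the caller's stored bytes are untouched.
fn clamp(raw: &[u8; 32]) -> [u8; 32] {
    let mut k = *raw;
    k[0] &= 248; // clear bits 0, 1, 2
    k[31] &= 127; // clear bit 255
    k[31] |= 64; // set bit 254
    k
}

/// Best-effort wipe of a temporary scalar.
fn wipe(buf: &mut [u8; 32]) {
    for b in buf.iter_mut() {
        unsafe { ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Clamp-at-use-time wrapper around a (hypothetical) non-clamping X25519 primitive.
pub fn x25519_with_clamp<F>(stored_sk_x: &[u8; 32], u: &[u8; 32], raw_ladder: F) -> [u8; 32]
where
    F: FnOnce(&[u8; 32], &[u8; 32]) -> [u8; 32],
{
    let mut clamped = clamp(stored_sk_x); // temporary copy only
    let out = raw_ladder(&clamped, u);
    wipe(&mut clamped); // zeroize immediately after use
    out
}
```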

**Production keygen draws three independent OS CSPRNG values, not a SHAKE256-expanded seed**: In the reference implementation, `XWing.KeyGen()` draws three separate, independent OS CSPRNG values:

- `sk_X` (32 bytes), passed to `X25519` for the X25519 component;
- `d` (32 bytes) and `z` (32 bytes), passed to `ML-KEM.KeyGen_internal(d, z)` for the ML-KEM component.

It does NOT draw a single 32-byte seed and expand it via `SHAKE256(seed, 96)`. No seed is stored or exposed.

**Both paths are conformant for production key generation**: the three-draw path (used by the reference) and the SHAKE256 seed-expansion path differ only in how `sk_X`, `d`, and `z` are produced. The downstream X25519 and ML-KEM operations are identical, so keys from either path interoperate fully with the reference.

A reimplementer who uses seed expansion SHOULD document the deviation, because it changes the security analysis. With seed expansion, the three components are no longer independent random values; their joint distribution is determined by SHAKE256 applied to a single seed. Seed expansion is the natural choice for test vectors and KAT reproduction, where deterministic derivation from a known seed is required, but it is also a valid production path.

On either path, a reimplementer MUST NOT use a non-CSPRNG source (e.g., a counter or a fixed value) for production keygen. The seed or the three independent draws must come from the OS CSPRNG.

**Seed-to-expanded-key derivation**: The X-Wing 32-byte seed produces the expanded key via `SHAKE256(seed, 96)`. The 96-byte output is split as `d(32) || z(32) || sk_X(32)` (draft-09 §3.2).

- `d` and `z` are the ML-KEM-768 seeds passed to `ML-KEM.KeyGen_internal(d, z)` (FIPS 203 §6.1), which produces the 2400-byte expanded decapsulation key. These are two separate arguments. `ML-KEM.KeyGen_internal` is NOT called with a single 64-byte `d‖z` concatenation, and passing a concatenation or reversing the argument order as `(z, d)` produces different key material with no error.
- `sk_X` is the X25519 scalar.

LO's storage order is `sk_X(32) || dk_M(2400)`: X25519 scalar first, then the ML-KEM-768 expanded key. This is the reverse of draft-09's wire order, which places ML-KEM first. A reimplementer deriving from the seed MUST use this expansion and storage order. Using draft-09's order produces a valid-looking 2432-byte key that silently fails at decapsulation: the X25519 and ML-KEM components are swapped, producing wrong shared secrets in both sub-KEMs.
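The following Rust sketch shows the expansion split and the storage order. Computing `SHAKE256(seed, 96)` is out of scope: std has no SHAKE (the RustCrypto `sha3` crate provides one), so the function takes the 96-byte output directly. `keygen_internal` is a hypothetical closure standing in for the library's `ML-KEM.KeyGen_internal(d, z)`. Intermediate secrets are not zeroized here, which a real implementation should do.

```rust
/// Split SHAKE256(seed, 96) per draft-09: d(32) || z(32) || sk_X(32).
pub fn split_expanded_seed(expanded: &[u8; 96]) -> ([u8; 32], [u8; 32], [u8; 32]) {
    let mut d = [0u8; 32];
    let mut z = [0u8; 32];
    let mut sk_x = [0u8; 32];
    d.copy_from_slice(&expanded[0..32]);
    z.copy_from_slice(&expanded[32..64]);
    sk_x.copy_from_slice(&expanded[64..96]);
    (d, z, sk_x)
}

/// Build the 2432-byte stored key sk_X(32) || dk_M(2400).
/// `keygen_internal` takes d and z as two separate arguments, in that order.
pub fn build_stored_sk<F>(expanded: &[u8; 96], keygen_internal: F) -> Vec<u8>
where
    F: FnOnce(&[u8; 32], &[u8; 32]) -> [u8; 2400],
{
    let (d, z, sk_x) = split_expanded_seed(expanded);
    let dk_m = keygen_internal(&d, &z);
    let mut sk = Vec::with_capacity(2432);
    sk.extend_from_slice(&sk_x); // X25519 scalar first (reverse of draft-09 order)
    sk.extend_from_slice(&dk_m);
    sk
}
```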

**ML-KEM-768 expanded key format (2400 bytes)**: The 2400-byte ML-KEM secret key is the `ml-kem` Rust crate's `DecapsulationKey` serialization, laid out as:

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 1152 | `dk_PKE` | NTT-domain decryption key (k = 3 polynomials × 256 coefficients × 12 bits/coeff / 8) |
| 1152 | 1184 | `ek_PKE` | Encapsulation key (coefficient-domain encoding, identical to public key bytes) |
| 2336 | 32 | `H(ek_PKE)` | SHA3-256 hash of the encapsulation key |
| 2368 | 32 | `z` | Implicit-rejection seed (random, used for FO decapsulation) |

This is **not** the FIPS 203 32-byte seed form (`d || z`). It is also not the FIPS 203 standardized 2400-byte `dk_PKE || ek_PKE || H(ek_PKE) || z` expansion, which uses coefficient-domain for `dk_PKE`, not NTT-domain. The field sizes and order match FIPS 203 §6.1 `ML-KEM.KeyGen_internal`, but the `dk_PKE` encoding differs: FIPS 203 specifies coefficient-domain via `ByteEncode_12`, while the `ml-kem` crate serializes in NTT-domain.

Byte-for-byte comparison with FIPS 203 output is therefore invalid for the first 1152 bytes (`dk_PKE`) only. The remaining three fields use standard FIPS 203 encoding and are byte-for-byte identical to FIPS 203 output:

- `ek_PKE` (bytes 1152-2335)
- `H(ek_PKE)` (bytes 2336-2367)
- `z` (bytes 2368-2399)

A reimplementer who distrusts all four fields and converts or reorders all of them produces a wrong key.

Other ML-KEM libraries (liboqs, PQClean, BouncyCastle) use different serialization formats. Reimplementers must verify that their library's `DecapsulationKey` serialization matches soliton's byte layout, or convert formats at the deserialization boundary.

**Silent failure mode**: the 2400-byte key carries no format magic. A wrong-format key is accepted by deserialization and shows up only as AEAD failures after decapsulation, because the shared secret diverges silently. Cross-library key import therefore requires an explicit check: re-derive the public key from the decapsulation key and compare it to the known public key. A mismatch indicates a format incompatibility.

**Concrete comparison**:

- **ML-KEM-768 component**: compare bytes 1152-2335 of the decapsulation key (the `ek_PKE` field, 1184 bytes) against `public_key[32..1216]`, the ML-KEM-768 portion of the X-Wing public key.
- **X25519 component**: compute `pk_X = X25519(sk_X, basepoint)` from the first 32 bytes of the secret key and compare it against `public_key[0..32]`.

Both comparisons must pass. A mismatch in either indicates an incompatible key format or encoding.
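The following std-only Rust sketch shows the two import comparisons. `dk_fields`, `check_imported_xwing_key`, and `ImportError` are hypothetical names. `x25519_base` is a closure standing in for the library's scalar-basepoint function, which must apply clamping (§8.5). Only public values are compared, so ordinary slice equality is used.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ImportError {
    MlKemMismatch,
    X25519Mismatch,
}

/// Views of the 2400-byte ML-KEM-768 key per the layout table:
/// (dk_PKE, ek_PKE, H(ek_PKE), z).
pub fn dk_fields(dk_m: &[u8]) -> (&[u8], &[u8], &[u8], &[u8]) {
    assert_eq!(dk_m.len(), 2400, "dk_M must be 2400 bytes");
    (&dk_m[0..1152], &dk_m[1152..2336], &dk_m[2336..2368], &dk_m[2368..2400])
}

/// Import check for sk = sk_X || dk_M (2432 bytes) against pk = pk_X || pk_M (1216 bytes).
pub fn check_imported_xwing_key<F>(
    sk: &[u8; 2432],
    public_key: &[u8; 1216],
    x25519_base: F,
) -> Result<(), ImportError>
where
    F: FnOnce(&[u8]) -> [u8; 32],
{
    let (sk_x, dk_m) = sk.split_at(32);
    let (_dk_pke, ek_pke, _h_ek, _z) = dk_fields(dk_m);
    if ek_pke != &public_key[32..1216] {
        return Err(ImportError::MlKemMismatch);
    }
    let derived_pk_x = x25519_base(sk_x);
    if &derived_pk_x[..] != &public_key[0..32] {
        return Err(ImportError::X25519Mismatch);
    }
    Ok(())
}
```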

ML-DSA-65 secret keys are stored as the 32-byte seed `ξ`, the input to `ML-DSA.KeyGen_internal` (FIPS 204 §6.1). They are not stored in the 4032-byte expanded ML-DSA-65 private-key encoding (FIPS 204 §7.2). The signing key is deterministically re-expanded from the seed on each sign operation via `ML-DSA.KeyGen_internal(ξ)`, which produces the full expanded signing key on each call.

**Re-expansion is fully deterministic: `ML-DSA.KeyGen_internal` consumes no CSPRNG input.** FIPS 204 §6.1 defines it as a pure function of `ξ`. Some libraries expose a two-level API: a public `KeyGen()` that draws OS randomness, and an internal `KeyGen_internal(ξ)` that does not. Callers must use the internal variant. Calling the public variant for key re-expansion would succeed structurally but produce a different expanded key on every call, making signing non-reproducible.

Implementations using ML-DSA libraries that accept only the expanded form must perform this expansion explicitly. Passing the 32-byte seed directly as the signing key to such a library produces wrong signatures, because the seed is not the signing key. Libraries that accept a seed-form input (e.g., via a `from_seed(ξ)` constructor) call `KeyGen_internal` internally; check the library's API.

**ML-DSA seed expansion for low-level library APIs**: FIPS 204 Algorithm 6 (`ML-DSA.KeyGen_internal`, §6.1), which Algorithm 1 (`ML-DSA.KeyGen`) calls on a freshly drawn `ξ`, begins by expanding the seed: `(ρ, ρ', K) = SHAKE256(ξ ‖ k ‖ ℓ, 128)`. For ML-DSA-65, `k = 6` and `ℓ = 5`, each encoded as a single byte. `ρ`, `ρ'`, and `K` are 32, 64, and 32 bytes. These parameters are consistent with the 1952-byte public key: 32 + 320·k = 32 + 320·6. Polynomial sampling from `ρ` and `ρ'` follows.

Some ML-DSA libraries expose this two-step structure with separate `keygen_internal(d, z)` parameters. **These are not the same `d` and `z` as ML-KEM's seed expansion**: ML-DSA's intermediate state is `(ρ, ρ', K)`, derived differently.

A reimplementer whose library requires explicit seed expansion must run Algorithm 6 in full before constructing the signing key. Unlike X-Wing's `SHAKE256(seed, 96) → d ‖ z ‖ sk_X`, the hash split is only the first step, and the signing key also depends on the sampled polynomials. In practice, FIPS 204-compliant libraries targeted at this use case provide a `KeyPair::from_seed(ξ)` or equivalent entry point that performs the expansion internally. Verify that the entry point accepts the 32-byte seed directly and runs Algorithm 6, rather than accepting already-expanded polynomial state.

ML-DSA-65 public keys (1952 bytes) use the standard FIPS 204 `pkEncode` format. They are compatible with compliant implementations (liboqs, PQClean, BouncyCastle) without format conversion. Unlike ML-KEM-768 (see above), there is no NTT-domain encoding divergence.
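The first step of `ML-DSA.KeyGen_internal` for ML-DSA-65 can be illustrated in Rust. The sketch only builds the 34-byte SHAKE256 input and splits a 128-byte output into `(ρ, ρ', K)`. The SHAKE256 call and the polynomial sampling are omitted, so this is a layout check for anyone wiring up a low-level library, not a key-generation routine. The names are illustrative.

```rust
/// ML-DSA-65 parameters (FIPS 204): k = 6, l = 5.
pub const MLDSA65_K: u8 = 6;
pub const MLDSA65_L: u8 = 5;

// Sanity check against the 1952-byte public key: 32 + 320 * k.
const _: () = assert!(32 + 320 * (MLDSA65_K as usize) == 1952);

/// SHAKE256 input for the seed expansion: xi || k || l (k and l as single bytes).
pub fn mldsa65_expand_input(xi: &[u8; 32]) -> [u8; 34] {
    let mut input = [0u8; 34];
    input[..32].copy_from_slice(xi);
    input[32] = MLDSA65_K;
    input[33] = MLDSA65_L;
    input
}

/// Split the 128-byte SHAKE256 output into (rho, rho_prime, K) = 32 + 64 + 32 bytes.
pub fn split_mldsa_seed_output(h: &[u8; 128]) -> (&[u8], &[u8], &[u8]) {
    (&h[0..32], &h[32..96], &h[96..128])
}
```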

**ML-DSA-65 cross-check requirement for cross-library import**: Importing a 32-byte ML-DSA-65 seed from an external source produces no immediate error. `from_seed(ξ)` succeeds for any 32-byte input, and the public key re-derived from that seed is always consistent with it.

If the seed bytes do not correspond to the intended keypair (wrong format, wrong byte order, or corruption), the failure is hidden. Signatures made with the imported seed verify against the *re-derived* public key, but not against the *original* ML-DSA public key stored in the identity blob. The size check passes, expansion succeeds, and signatures look valid locally. The mismatch only surfaces when it is compared against the stored ML-DSA public key, for example when a peer verifies a signature.

**Cross-library key import requires explicit verification**:

1. Run `ML-DSA.KeyGen_internal(candidate_seed)` (FIPS 204 §6.1).
2. Derive the public key from it.
3. Compare it against the known ML-DSA-65 public key (`composite_pk[1248..3200]`, 1952 bytes).

A mismatch indicates an incompatible seed format and MUST be treated as `InvalidData`. Proceeding would produce signatures that the receiver cannot verify.
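The following Rust sketch shows the import check. `derive_pk` is a hypothetical closure standing in for the library's `ML-DSA.KeyGen_internal(ξ)` followed by public-key encoding. `KeyError` mirrors the `InvalidData` outcome required above. The slice `1248..3200` follows the composite layout X-Wing (1216) ‖ Ed25519 (32) ‖ ML-DSA-65 (1952).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum KeyError {
    InvalidData,
}

/// Accept an imported ML-DSA-65 seed only if the public key it derives equals the
/// ML-DSA-65 public key already stored in the 3200-byte composite identity key.
pub fn verify_imported_mldsa_seed<F>(
    candidate_seed: &[u8; 32],
    composite_pk: &[u8; 3200],
    derive_pk: F,
) -> Result<(), KeyError>
where
    F: FnOnce(&[u8; 32]) -> Vec<u8>,
{
    let derived = derive_pk(candidate_seed);
    // Public keys are not secret: ordinary comparison is fine. A wrong length also fails.
    if derived.as_slice() == &composite_pk[1248..3200] {
        Ok(())
    } else {
        Err(KeyError::InvalidData)
    }
}
```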

Ed25519 secret keys are stored as the 32-byte seed (RFC 8032 §5.1.5), not the 64-byte expanded form (SHA-512 of seed). The signing key is deterministically expanded from the seed on each sign operation. The 32-byte `ed25519_sk` field in the composite secret key (bytes 2432-2464, §2.2) is this seed. **Interop note**: Libraries that represent Ed25519 private keys as 64-byte `seed || public_key` (libsodium, Go `crypto/ed25519`, PyNaCl) must extract only the first 32 bytes as the seed. The trailing 32 bytes (public key copy) are not part of soliton's secret key layout — passing the full 64-byte representation to key extraction produces corrupted output.
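The following Rust sketch imports a 64-byte `seed || public_key` Ed25519 private key into soliton's layout. The helper names are hypothetical. The sketch compares the trailing 32 bytes with the Ed25519 public key at `composite_pk[1216..1248]`, which follows the 1216-byte X-Wing key (§2.2). That comparison is an optional sanity check added here, not a requirement of this section.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct PublicKeyMismatch;

/// Take only the first 32 bytes (the seed) of a 64-byte seed || public_key key.
pub fn ed25519_seed_from_64(
    sk64: &[u8; 64],
    composite_pk: &[u8; 3200],
) -> Result<[u8; 32], PublicKeyMismatch> {
    let (seed_part, pk_part) = sk64.split_at(32);
    // Optional: the trailing copy should equal the stored Ed25519 public key.
    if pk_part != &composite_pk[1216..1248] {
        return Err(PublicKeyMismatch);
    }
    let mut seed = [0u8; 32];
    seed.copy_from_slice(seed_part);
    Ok(seed)
}

/// Write the seed into its slot (bytes 2432..2464) of the 2496-byte composite secret key.
pub fn store_ed25519_seed(composite_sk: &mut [u8; 2496], seed: &[u8; 32]) {
    composite_sk[2432..2464].copy_from_slice(seed);
}
```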

---

## 9. Verification Phrases

### 9.1 Purpose

A short human-readable phrase derived from both parties' identity public keys. Both parties can compare the phrase out-of-band (voice call, in-person) to verify identity key authenticity. The phrase is independent of session state — it depends only on the two identity keys and remains stable across session resets.

### 9.2 Algorithm

```
function VerificationPhrase(pk_a, pk_b):
    // Both pk_a, pk_b must be full LO composite identity public keys per §2.2
    // (3200 bytes: X-Wing 1216 + Ed25519 32 + ML-DSA 1952). Passing only
    // a sub-key component (e.g., the 1216-byte X-Wing key) returns InvalidLength;
    // passing a different 3200-byte value (e.g., padded or truncated) silently
    // produces a different phrase.
    // pk_a == pk_b → InvalidData (self-verification produces a valid phrase
    // that gives a false sense of security — the user verified against their own key).

    // Step 1: Sort keys lexicographically ascending — smaller key first.
    // The comparison is over the full 3200-byte raw public key bytes, NOT over
    // a fingerprint, hash, or any sub-key component. The fingerprint
    // (SHA3-256 of the key) is used as a canonical identifier throughout the
    // rest of the protocol — a reimplementer who sorts by SHA3-256(key) instead
    // of by the key itself produces a different ordering silently (the sort
    // succeeds, the phrase is wrong, no error is returned).
    // "Ascending" means the key that is lexicographically smaller (byte-by-byte,
    // left to right, unsigned comparison) is placed first. If pk_a <= pk_b,
    // first = pk_a, second = pk_b; otherwise first = pk_b, second = pk_a.
    // Descending order (larger key first) produces a different hash — silently
    // incompatible phrases with no runtime error.
    (first, second) = sort_lexicographic_ascending(pk_a, pk_b)

    // Step 2: Concatenate with domain separation label and hash.
    hash = SHA3-256("lo-verification-v1" || first || second)  // label = 18 bytes
    // Total SHA3-256 input: 6,418 bytes (18-byte label + 3,200-byte first + 3,200-byte second).
    // The full 3200-byte composite public key is hashed — NOT the 32-byte fingerprint
    // (SHA3-256 of the key). Using fingerprints instead silently produces different phrases
    // with no error signal and reduces the preimage size from 3200 bytes to 32 bytes.

    // Step 3: Map hash bytes to word indices.
    // Consume 2-byte chunks (u16, big-endian). The read cursor advances by 2 bytes
    // for every sample, accepted or rejected; rejected values are discarded, not
    // retried. Accept only values in [0, 62208)
    // (floor(65536 / 7776) × 7776 = 8 × 7776 = 62208) to eliminate modular bias.
    // Rejection rate: (65536 − 62208) / 65536 ≈ 5.1% per sample.
    // Each 32-byte hash yields 16 candidate u16 values (32 bytes / 2 bytes per u16).
    // These are candidates, not output words: each is accepted (< 62,208) or
    // rejected (≥ 62,208), and only 7 accepted values are needed, so one hash
    // usually suffices.
    // On exhaustion of the 32 bytes, rehash:
    //     hash = SHA3-256("lo-phrase-expand-v1" || round || hash)
    // Input: 52 bytes = 19-byte label, then 1-byte round (u8), then 32-byte previous hash.
    // The read cursor resets to byte 0 of the new hash; nothing carries over.
    // The round counter starts at 1 (first rehash uses round = 0x01; range 1..=19).
    // Starting at 0 instead of 1 produces different hash outputs: interop-critical.
    // A 20th rehash round is never performed: needing it → Internal error.
    // 16 initial samples + 19 rehash rounds × 16 = 320 samples, so this happens with
    // probability < 2^-150 and indicates a broken hash implementation, not a
    // recoverable condition. Implementations MUST treat it as fatal and return
    // Internal; do NOT retry or fall back to fewer words.
    words = []
    while len(words) < 7:
        val = next_accepted_u16(hash)   // bias-free, rejection sampling
        words.append(EFF_WORDLIST[val % 7776])

    return " ".join(words)
```
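The pseudocode above can be rendered as a std-only Rust sketch. The SHA3-256 function is injected as a closure; a production build would pass a real SHA3-256, for example a thin wrapper around the RustCrypto `sha3` crate. The wordlist is passed in rather than embedded. `verification_phrase` and `PhraseError` are illustrative names, not soliton's API.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PhraseError {
    InvalidLength,
    InvalidData,
    Internal,
}

const PK_LEN: usize = 3200;
const WORDLIST_LEN: usize = 7776;
const ACCEPT_BELOW: u16 = 62_208; // 8 × 7776
const MAX_ROUND: u8 = 19;

pub fn verification_phrase<H>(
    pk_a: &[u8],
    pk_b: &[u8],
    wordlist: &[&str],
    sha3_256: H,
) -> Result<String, PhraseError>
where
    H: Fn(&[u8]) -> [u8; 32],
{
    if pk_a.len() != PK_LEN || pk_b.len() != PK_LEN {
        return Err(PhraseError::InvalidLength);
    }
    if pk_a == pk_b {
        return Err(PhraseError::InvalidData); // public data: plain == is fine
    }
    assert_eq!(wordlist.len(), WORDLIST_LEN, "wordlist must have 7776 entries");

    // Step 1: unsigned byte-wise lexicographic order over the full raw keys.
    let (first, second) = if pk_a <= pk_b { (pk_a, pk_b) } else { (pk_b, pk_a) };

    // Step 2: SHA3-256("lo-verification-v1" || first || second), 6418 input bytes.
    let mut input = Vec::with_capacity(18 + 2 * PK_LEN);
    input.extend_from_slice(b"lo-verification-v1");
    input.extend_from_slice(first);
    input.extend_from_slice(second);
    let mut hash = sha3_256(input.as_slice());

    // Step 3: rejection sampling over big-endian u16 chunks.
    let mut words: Vec<&str> = Vec::with_capacity(7);
    let mut cursor = 0usize;
    let mut round: u8 = 0;
    while words.len() < 7 {
        if cursor == hash.len() {
            round += 1; // first rehash uses round = 1
            if round > MAX_ROUND {
                return Err(PhraseError::Internal); // fatal; never retried
            }
            let mut next = Vec::with_capacity(52);
            next.extend_from_slice(b"lo-phrase-expand-v1");
            next.push(round);
            next.extend_from_slice(&hash);
            hash = sha3_256(next.as_slice());
            cursor = 0; // restart at byte 0 of the new hash
        }
        let val = u16::from_be_bytes([hash[cursor], hash[cursor + 1]]);
        cursor += 2; // advance on every sample, accepted or rejected
        if val < ACCEPT_BELOW {
            words.push(wordlist[usize::from(val) % WORDLIST_LEN]);
        }
    }
    Ok(words.join(" "))
}
```

Injecting the hash keeps the sampling logic testable. A hash whose first sample is `0xFFFF` shows the cursor skipping a rejected value. A hash that always returns `0xFF` bytes exercises the round-20 `Internal` path.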

**Canonical output format**: The returned string is the seven words joined by **single ASCII space (0x20) characters**, with no leading or trailing whitespace. Words are **lowercase** as they appear in the EFF large wordlist — the wordlist is already lowercase; no case transformation is applied. Programmatic comparison of two phrases MUST use exact byte equality on the canonical string — case-folding or whitespace normalization before comparison is incorrect and masks implementation divergence.

**`fingerprint_hex()` produces lowercase hex, the same constraint as §2.1**: The `fingerprint_hex()` function used for display returns 64 lowercase hexadecimal characters (digits `0-9` and lowercase letters `a-f`). This matches the §2.1 specification ("64 lowercase hex chars").

An implementation that produces uppercase hex fingerprints (e.g., using `%X` format in C or `strings.ToUpper` in Go) diverges from the canonical form. Displayed fingerprints, and any comparison of fingerprint strings, will not match across implementations even when the underlying keys are identical. The verification phrase itself is computed from raw key bytes and is unaffected.
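The following std-only Rust sketch shows the display formatting. Computing the SHA3-256 fingerprint itself is out of scope, so the function takes the 32-byte digest as input. `fingerprint_hex` here is an illustrative stand-in, not the reference's implementation.

```rust
use std::fmt::Write;

/// Render a 32-byte fingerprint as 64 lowercase hex characters.
pub fn fingerprint_hex(fingerprint: &[u8; 32]) -> String {
    let mut s = String::with_capacity(64);
    for b in fingerprint {
        // `{:02x}` is lowercase; `{:02X}` would give the non-canonical uppercase form.
        write!(s, "{:02x}", b).expect("writing to a String cannot fail");
    }
    s
}
```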

The word index is computed as `val % 7776`. The rejection threshold 62,208 (= 8 × 7,776) ensures uniform distribution — values ≥ 62,208 are rejected to eliminate modular bias.

**Cursor advance on rejection is mandatory for interoperability**: The read cursor advances by 2 bytes for every u16 sample, whether the sample is accepted or rejected. About 5.1% of samples are rejected. An implementation that handles rejection any other way (for example, by not advancing past a rejected sample) diverges from the reference whenever a rejection occurs among the first seven samples.

The two behaviors agree only when the first seven samples are all accepted. That happens for roughly 0.949^7 ≈ 69% of hashes, so about 31% of key pairs would yield mismatched phrases. **Reimplementers MUST advance the cursor unconditionally: a rejected sample is skipped (discarded), not retried at the same position.** Test vector F.9 exercises this behavior explicitly (see Appendix F).

The EFF large wordlist contains exactly 7776 words, 0-indexed (entry 0 = "abacus", entry 7775 = "zoom"). Each word carries log2(7776) ≈ 12.92 bits of entropy, so seven words provide ≈ 90.5 bits. A 1-indexed implementation maps every index to a different word, which is a silent interop failure.

Implementations MUST verify that their embedded wordlist matches SHA3-256 `a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5` (full reference in Appendix D). Mismatched wordlist copies produce silently incompatible phrases with no error indicator. Common causes are different versions, incomplete dice-prefix stripping, and trailing-whitespace differences.

**Canonical byte sequence for the checksum**: The EFF large wordlist source file ships with dice-prefix columns (`11111\tabacus`, etc.). To produce the canonical form: (1) strip the dice prefix and tab from each line, leaving only the word; (2) use LF (`\n`, 0x0a) line endings — no CRLF; (3) include a trailing LF after the last word. The resulting file is 7776 lowercase words, one per line with a trailing newline, totalling 43,186 bytes. The SHA3-256 of this byte sequence is `a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5`. A wordlist with CRLF endings, no trailing newline, or retained dice prefixes produces a different hash — verify independently before embedding.
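The following Rust sketch shows the canonicalization steps, assuming the source file uses `DICE<TAB>word` lines. `canonical_wordlist` is a hypothetical helper. It returns an error on a line without a tab rather than guessing. It leaves two checks to the caller: the SHA3-256 comparison against the published digest, and the 7776-line count.

```rust
/// Strip dice prefixes, normalize to LF, and end every word (including the last) with LF.
pub fn canonical_wordlist(source: &str) -> Result<String, String> {
    let mut out = String::with_capacity(source.len());
    // `lines()` splits on "\n" and strips a trailing "\r", so CRLF input is normalized.
    for line in source.lines() {
        match line.split_once('\t') {
            Some((_dice, word)) => {
                out.push_str(word);
                out.push('\n');
            }
            None => return Err(format!("line without dice prefix: {:?}", line)),
        }
    }
    Ok(out)
}
```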

### 9.2.1 Error Summary

| Error | Condition |
|-------|-----------|
| `InvalidLength` | Either `pk_a` or `pk_b` is not exactly 3200 bytes. The 3200-byte requirement reflects the full LO composite identity public key (X-Wing 1216 + Ed25519 32 + ML-DSA 1952 — §2.2). Passing only a sub-key component (e.g., the 1216-byte X-Wing portion) returns `InvalidLength`. |
| `InvalidData` | `pk_a == pk_b` (self-verification). A phrase computed from a key paired with itself gives a false sense of security — both parties see the same phrase regardless of the other's key, providing no authentication signal. Public keys are non-secret material — variable-time comparison (`==`) is used, not `ct_eq`. |
| `Internal` | Rehash round counter reached 20. Unreachable in practice (probability < 2⁻¹⁵⁰). Indicates a broken hash function or an implementation bug, since the phrase computation is deterministic and draws no randomness. Must NOT be retried. |
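
The two input checks in the table reduce to a few lines. In this sketch:

- `PhraseError` and `check_phrase_inputs` are hypothetical stand-ins for soliton's error type and entry point.
- The order of the two checks is an assumption.

```rust
/// Size of a full LO composite identity public key:
/// X-Wing (1216) + Ed25519 (32) + ML-DSA-65 (1952).
const IDENTITY_PK_LEN: usize = 1216 + 32 + 1952;

#[derive(Debug, PartialEq)]
enum PhraseError {
    InvalidLength,
    InvalidData,
}

/// Input validation performed before any hashing.
fn check_phrase_inputs(pk_a: &[u8], pk_b: &[u8]) -> Result<(), PhraseError> {
    // A bare sub-key (e.g. only the 1216-byte X-Wing part) is rejected here.
    if pk_a.len() != IDENTITY_PK_LEN || pk_b.len() != IDENTITY_PK_LEN {
        return Err(PhraseError::InvalidLength);
    }
    // Public keys are not secret, so an ordinary (variable-time) comparison is fine.
    if pk_a == pk_b {
        return Err(PhraseError::InvalidData);
    }
    Ok(())
}
```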

### 9.3 Properties

- **Order-independent**: `VerificationPhrase(A, B) == VerificationPhrase(B, A)`.
- **Deterministic**: Given the same two identity keys, always produces the same phrase.
- **Unbiased**: Rejection sampling eliminates modular bias in word selection.
- **Session-independent**: Depends only on long-term identity keys, not session state.
- **Wordlist**: EFF large wordlist, 7776 words, compile-time length assertion.

### 9.4 Security Analysis

**Second-preimage resistance:** ~90.5 bits. An attacker trying to find a key that produces the same phrase when paired with a specific victim key must brute-force ~2^90 SHA3-256 hashes.

**Birthday collision resistance:** ~45 bits. An attacker who freely generates their own identity keys can generate ~2^45 keys; by the birthday paradox, with >50% probability two will produce the same verification phrase when paired with a given victim key. No CSPRNG access or key-generation control over the victim is required — the attacker generates their own keys until a collision is found. The attacker registers one key, establishes a legitimate verification phrase, then substitutes the colliding key — the victim's out-of-band check passes despite the key swap.

2^45 SHA3-256 operations (~35 trillion hashes) is expensive but within reach of well-resourced state-level attackers. For most threat models (where the attacker does not control key generation at scale), the ~90.5-bit second-preimage bound is the relevant security parameter. Applications with state-level threat models should supplement verification phrases with full fingerprint comparison.
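
The figures above can be checked with a short calculation. It assumes the seven words are uniformly distributed, which the rejection sampling provides. It also uses the standard approximation that √(2 ln 2 · N) attempts give a 50% chance of a birthday collision among N equally likely phrases:

```latex
\begin{aligned}
N &= 7776^{7} = 6^{35}, \qquad \log_2 N = 35 \log_2 6 \approx 90.47 \\
q_{50\%} &\approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{45.24} \approx 2^{45.5} \\
2^{45} &\approx 3.5 \times 10^{13}
\end{aligned}
```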

---

## 10. Key Management

### 10.1 Identity Key

- Generated once, stored permanently.
- MUST be encrypted at rest (passphrase, device key, or platform secure storage).
- Loss = loss of identity. No recovery by design.
- Used for: auth (X-Wing component, §4), hybrid signing of pre-keys (§3), session initiation signing (§5.4 Step 6), KEM decapsulation in LO-KEX (§5.5 Step 4), and HKDF info binding in LO-KEX (§5). IK compromise alone is insufficient to derive session keys — also requires SPK private key (§5.6).

### 10.2 Signed Pre-Key Rotation

- Rotate every 7 days (recommended).
- Retain the full SPK keypair (secret key **and** SPK ID) for 30 days after rotation (grace period for delayed session inits). Incoming `session_init` blobs carry an `spk_id` used to look up the correct decapsulation key; storing only the key bytes without the ID makes this lookup impossible. The 30-day clock starts when the replacement SPK is uploaded and the old SPK leaves the active bundle — not when the old SPK was originally generated. An SPK that was active for 7 days and then rotated is retained for 30 additional days (37 days total from generation).
- After grace period, delete old SPK private key.
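
The retention rule can be sketched as a small key store indexed by SPK ID. Everything here is hypothetical:

- The `SpkStore` type, the `u32` SPK ID, and the Unix-seconds clock are illustrative assumptions.
- The secret key is a plain byte vector. A real implementation would hold a zeroizing type.

```rust
use std::collections::HashMap;

/// 30-day retention window, counted from rotation (not generation).
const GRACE_SECS: u64 = 30 * 24 * 60 * 60;

struct SpkEntry {
    secret_key: Vec<u8>,
    /// Set when the SPK leaves the active bundle; None while it is active.
    retired_at: Option<u64>,
}

#[derive(Default)]
struct SpkStore {
    entries: HashMap<u32, SpkEntry>,
    active: Option<u32>,
}

impl SpkStore {
    /// Installs a new active SPK; the previous one starts its grace clock now.
    fn rotate(&mut self, new_id: u32, secret_key: Vec<u8>, now: u64) {
        if let Some(old) = self.active {
            if let Some(entry) = self.entries.get_mut(&old) {
                entry.retired_at = Some(now);
            }
        }
        self.entries.insert(new_id, SpkEntry { secret_key, retired_at: None });
        self.active = Some(new_id);
    }

    /// Finds the decapsulation key named by an incoming session_init's `spk_id`.
    fn lookup(&self, spk_id: u32) -> Option<&[u8]> {
        self.entries.get(&spk_id).map(|e| e.secret_key.as_slice())
    }

    /// Drops retired SPKs whose grace period has elapsed
    /// (a real store would zeroize the key bytes before dropping them).
    fn purge_expired(&mut self, now: u64) {
        self.entries.retain(|_, e| match e.retired_at {
            Some(t) => now.saturating_sub(t) < GRACE_SECS,
            None => true,
        });
    }
}
```

In the test below, the SPK generated on day 0 and rotated out on day 7 is still available on day 36 and is removed on day 37. This matches the 37-day total described above.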

### 10.3 One-Time Pre-Key Management

- Upload batches of 100.
- Replenish on DM_PREKEY_LOW (remaining < 10).
- Delete private key immediately after single use.
- OPKs have no time-based expiry — an unconsumed OPK private key remains valid indefinitely until consumed or explicitly deleted by application policy. Unlike SPKs (which have a 30-day retention window after rotation, §10.2), OPKs are not rotated on a schedule.
- Protocol functions without OPK (reduced initial forward secrecy only). **When the OPK pool is empty, servers MUST return a bundle with `has_opk = false` rather than rejecting the bundle request.** Refusing to serve a bundle when no OPKs are available prevents session initiation entirely and is a denial-of-service hazard. An empty pool is a temporary operational condition, not a protocol error — the session proceeds without OPK and Alice and Bob acknowledge the reduced forward-secrecy guarantee.
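
On the server side, the empty-pool rule means returning a bundle with the OPK fields absent instead of an error. The sketch below is hypothetical throughout:

- `Bundle`, `OpkPool`, the `u32` OPK ID, and the byte-vector key are illustrative.
- The identity key and SPK fields of a real bundle are omitted.
- The low-water threshold of 10 comes from the list above.

```rust
use std::collections::VecDeque;

/// Below this many remaining OPKs, the owner should be sent DM_PREKEY_LOW.
const PREKEY_LOW_WATER: usize = 10;

struct Opk {
    id: u32,
    public_key: Vec<u8>,
}

struct Bundle {
    has_opk: bool,
    opk: Option<Opk>,
    prekey_low: bool,
}

struct OpkPool {
    keys: VecDeque<Opk>,
}

impl OpkPool {
    /// Hands out at most one OPK per request and never fails on an empty pool.
    fn serve_bundle(&mut self) -> Bundle {
        // Each OPK is served once and removed from the pool.
        let opk = self.keys.pop_front();
        Bundle {
            has_opk: opk.is_some(),
            opk,
            prekey_low: self.keys.len() < PREKEY_LOW_WATER,
        }
    }
}
```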

### 10.4 Ratchet Key Lifecycle

- Generated per KEM ratchet step.
- Previous ratchet private key deleted (zeroized via `ZeroizeOnDrop`) after step completes.
- Previous receive epoch key retained for one epoch (late message grace period), then zeroized.

### 10.5 Memory Hygiene

Zeroize all sensitive material immediately after use:
- Shared secrets after key derivation.
- Message keys after encrypt/decrypt.
- Old ratchet private keys.
- Auth shared secrets.
- Intermediate KDF outputs.
- **Streaming AEAD key (caller copy)**: After calling `stream_encrypt_init` or `stream_decrypt_init`, the library holds an internal copy of the key in the encryptor/decryptor handle. The caller's original key buffer is NOT zeroed by the library — the caller MUST zeroize their copy immediately after init returns. See §15.1 for the key lifecycle. The handle's internal copy is zeroized automatically when the handle is freed (`stream_encrypt_free` / `stream_decrypt_free`).

Use `Zeroizing<T>` wrappers and the `ZeroizeOnDrop` trait (Rust `zeroize` crate). Because `[u8; N]` is `Copy`, `Zeroizing::new(val)` wraps a copy and leaves `val` intact, so explicitly `.zeroize()` the source copy afterward.
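
The `Copy` pitfall is easy to miss, because putting a fixed-size array into a wrapper silently leaves the original bytes in place. The std-only sketch below shows the pattern under these assumptions:

- `wipe` is a hypothetical helper built on `std::ptr::write_volatile`, standing in for `zeroize::Zeroize::zeroize`.
- `SecretKey` is a hypothetical stand-in for `Zeroizing`.
- Real code should use the `zeroize` crate.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `buf` with zeros through volatile writes so the compiler cannot
/// drop the stores as dead (a simplified stand-in for `zeroize`).
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, exclusive reference to a single u8.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical key wrapper that wipes its bytes on drop.
struct SecretKey([u8; 32]);

impl Drop for SecretKey {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}

/// Wraps a key and then clears the caller's source array. Because `[u8; 32]`
/// is `Copy`, `SecretKey(*raw)` copies the bytes and leaves `raw` untouched.
fn wrap_key(raw: &mut [u8; 32]) -> SecretKey {
    let key = SecretKey(*raw);
    wipe(raw);
    key
}
```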

**AEAD output buffer pre-allocation**: When encrypting into a growable buffer, pre-allocate the full output capacity (`plaintext.len() + 16` for Poly1305 tag) before writing any data. If the buffer reallocates mid-write, the abandoned heap region containing partial plaintext is freed without zeroization — the allocator does not zero freed memory. This applies to any language with growable buffers (Rust `Vec`, Go slices, Python `bytearray`). Pre-computing the output size eliminates reallocation entirely.
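
A Rust sketch of the pre-allocation pattern follows. It makes two assumptions:

- The AEAD has an in-place API that appends the 16-byte tag to the same `Vec<u8>`. The `chacha20poly1305` crate's in-place encryption works this way, for example.
- To stay dependency-free, the hypothetical `append_fake_tag` only appends placeholder bytes and encrypts nothing.

```rust
/// Poly1305 tag length.
const TAG_LEN: usize = 16;

/// Copies `plaintext` into a buffer sized for ciphertext + tag up front, so
/// the buffer never reallocates (and never frees a region holding plaintext).
fn sealed_buffer(plaintext: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(plaintext.len() + TAG_LEN);
    buf.extend_from_slice(plaintext);
    buf
}

/// Hypothetical stand-in for an in-place AEAD: encrypts nothing, but grows
/// the buffer by one tag the way an in-place API would.
fn append_fake_tag(buf: &mut Vec<u8>) {
    buf.extend_from_slice(&[0u8; TAG_LEN]);
}
```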

**Decompression intermediate buffers**: During storage decryption (§11.3) and streaming decryption (§15.5), the zstd decoder allocates internal buffers that are freed without zeroization — only the final output is wrapped in `Zeroizing`. The same applies to compression during encryption. These intermediate buffers may contain plaintext fragments on the heap. Implementations that require stronger guarantees should use a custom allocator that zeros on deallocation, or accept that decompression intermediates are a residual exposure window.
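
In Rust, one way to get the stronger guarantee is a global allocator that wipes every block before returning it to the system allocator. This is a sketch, not something soliton ships:

- `ZeroOnFree` is a hypothetical name.
- The wipe adds a cost proportional to every byte the process frees.
- It does not cover copies that never go through the heap allocator, such as stack temporaries.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical global allocator that zeroes every block before freeing it.
struct ZeroOnFree;

unsafe impl GlobalAlloc for ZeroOnFree {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Volatile writes so the wipe of soon-to-be-freed memory is not optimized out.
        for i in 0..layout.size() {
            unsafe { std::ptr::write_volatile(ptr.add(i), 0) };
        }
        unsafe { System.dealloc(ptr, layout) };
    }
    // `realloc` keeps its default implementation, which allocates a new block,
    // copies, and then calls `dealloc` above, so the old block is wiped too.
}

#[global_allocator]
static GLOBAL: ZeroOnFree = ZeroOnFree;
```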

### 10.6 Passphrase-Based Key Derivation (Argon2id)

Identity keys stored on-device MUST be encrypted at rest. For passphrase protection use `primitives::argon2::argon2id` (RFC 9106, Argon2id variant, version 0x13 / v1.3). The version MUST be 0x13. RFC 9106 specifies only version 0x13, but the older Argon2 v1.0 (version 0x10) predates the RFC and remains available in some libraries, some of which default to it. Using the wrong version produces different output and silently incompatible key derivation.

**Presets:**

| Preset | m\_cost | t\_cost | p\_cost | Use case |
|--------|---------|---------|---------|----------|
| `OWASP_MIN` | 19 MiB (19456 KiB) | 2 | 1 | Interactive auth, latency < 1 s |
| `RECOMMENDED` | 64 MiB (65536 KiB) | 3 | 4 | Stored keypair protection, ~0.1-2 s (hardware-dependent; modern multi-core hardware typically 0.1-1 s) |
| `WASM_DEFAULT` | 16 MiB (16384 KiB) | 3 | 1 | WASM targets (single-threaded, constrained memory) — `p_cost = 1` because WASM runtimes are single-threaded: `p_cost > 1` serializes lane execution without achieving any parallelism, multiplying wall-clock time with zero additional security benefit |

**Requirements:**
- Salt: at least 8 bytes; 16 or 32 random bytes recommended. Use `primitives::random::random_array::<16>()`.
- Output: caller-allocated; 32 bytes for a 256-bit symmetric key; any positive length accepted. **Not length-extensible**: requesting 32 bytes and requesting 64 bytes produce outputs whose first 32 bytes are completely different. This is because the requested output length is hashed into Argon2id's initial value H0 and into its variable-length hash H' (RFC 9106 §3). A reimplementer who requests a larger output and slices it to 32 bytes will silently produce an incompatible key.
- Zeroize output with `zeroize::Zeroize::zeroize(&mut out)` or wrap in `Zeroizing` after use.
- **Error-path zeroization**: On any error return (invalid parameters, Argon2 library failure), the implementation explicitly zeroizes the output buffer before returning. Callers are NOT required to zeroize the output on the error path. Reimplementers MUST apply the same zeroization on failure. Omitting it leaves partial key material in a caller-visible buffer, and callers have no documented reason to clean that buffer up.
- The salt MUST be stored alongside the encrypted key material — it is not secret and may be stored in plaintext. Without the original salt, the Argon2id derivation cannot be reproduced and the protected key is permanently unrecoverable.

**Validation bounds:**

| Parameter | Min | Max |
|-----------|-----|-----|
| `m_cost` | 8 × `p_cost` KiB (see notes below) | 4 GiB (4194304 KiB) |
| `t_cost` | 1 | 256 |
| `p_cost` | 1 | 256 |
| output length | 1 byte (inclusive) | 4096 bytes (soliton-imposed; RFC 9106 allows up to 2³²−1 bytes) |
| salt length | 8 bytes | 268,435,456 bytes (256 MiB), CAPI limit |
| password length | 0 bytes | 268,435,456 bytes (256 MiB), CAPI limit |

- **`m_cost` minimum**: RFC 9106 §3.1 requires at least 8 blocks per lane, so `p_cost` lanes need at least 8 × `p_cost` blocks (one block = 1 KiB). The standalone 8-block minimum is the `p_cost = 1` case, so max(8, 8 × `p_cost`) always equals 8 × `p_cost`.
- **`m_cost` is in KiB, not bytes**: passing `65536` means 64 MiB (correct for `RECOMMENDED`), not 65536 bytes (only 64 KiB). Argon2 accepts any `m_cost` at or above the minimum without error, so a factor-of-1024 unit mistake silently produces a different key.
- **Salt cap**: RFC 9106 allows salts up to 2³²−1 bytes. `soliton_argon2id` nevertheless caps salt input at the general CAPI buffer limit of 256 MiB to prevent allocation exhaustion. The password is subject to the same CAPI limit.
- **Zero-length password is accepted**: an empty password (`password = NULL` or `password_len = 0`) is valid input, and `soliton_argon2id` passes zero bytes to Argon2id without error. Callers MUST validate non-empty passwords at the application layer if their threat model requires it, because the primitive does not enforce this.

**Argon2 library failure → `Internal`**: Parameter-validation failures (violations of the bounds table above) return `InvalidData`. An Argon2 library-internal failure during the hash operation itself (OOM, BLAKE2 internal error — structurally unreachable on correct parameters under normal OS conditions) returns `Internal`. Binding authors who need to enumerate all possible return codes must include `Internal` (-12) for this path. On any error (including `Internal`), the output buffer is zeroed before returning.

**The 4096-byte output cap is soliton-specific** — it is not mandated by RFC 9106 (which allows outputs up to 2³²−1 bytes). The cap prevents allocation-exhaustion attacks in server contexts where output length comes from untrusted input (e.g., a malicious client requesting a 4 GiB KDF output). A reimplementer who removes the cap to "follow the standard" reintroduces this vector.

**The `t_cost = 256` cap bounds iteration-time exhaustion**: `t_cost` controls the number of passes over the memory. Each pass takes time proportional to `m_cost`. Without a cap, an adversary who supplies `t_cost` in a request (e.g., a client sending its KDF parameters to a server that re-derives the key) could force arbitrarily long computation. The cap limits total work to 256 passes over `m_cost` blocks (256 × `m_cost` block computations), bounding server-side CPU cost to a predictable maximum regardless of client-supplied input. The `p_cost = 256` and output-length caps share this defense-in-depth motivation: all three parameters are capped to prevent resource exhaustion when Argon2id parameters come from untrusted input rather than from the application's own configuration.

**Error types**: Salt too short (< 8 bytes) or output length violations (0 bytes or > 4096 bytes) return `InvalidLength`. Cost parameter violations (`m_cost`, `t_cost`, `p_cost` out of bounds or below argon2 library minimums) return `InvalidData`. **Coupled constraint**: RFC 9106 §3.1 additionally requires `m_cost >= 8 × p_cost` — each parallel lane requires at least 8 KiB of memory. Combinations where both `m_cost` and `p_cost` are individually within bounds but violate this coupling (e.g., `m_cost=100`, `p_cost=100`) return `InvalidData`. The individual upper-bound caps (`m_cost > M_COST_MAX`, `t_cost > T_COST_MAX`, `p_cost > P_COST_MAX`) are checked in soliton code before the library call. The coupled `m_cost >= 8 × p_cost` constraint is enforced by the argon2 library's parameter constructor (`Params::new`) and mapped to `InvalidData` via `.map_err` — it is not a soliton-level pre-check. A reimplementer who adds their own pre-checks must include this coupled constraint explicitly; checking only the individual caps will miss it.

**`InvalidLength.expected` for zero-length output**: When `output_len = 0`, `InvalidLength` is returned with `expected = 1` (the minimum valid output length), NOT `expected = 4096`. The `expected` field reports the bound that was violated: `1` when `output_len < 1`, and `4096` when `output_len > 4096`. A caller that builds a diagnostic message from `expected` must compare it with the supplied length to tell "too short" from "too long". `expected` is not a fixed value for this error. This is consistent with `InvalidLength` semantics throughout soliton (see §12 error semantics).
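
The bounds and error mapping above can be collected into one pre-check. This is a sketch under stated assumptions, not soliton's code:

- The `KdfError` enum and the function name are hypothetical.
- The `InvalidLength` payload for salt errors is assumed to report the violated bound, as in the output-length case.
- The coupled `m_cost >= 8 * p_cost` rule is checked explicitly here, as recommended for reimplementers. Soliton itself leaves this rule to the `argon2` crate's `Params::new`.

```rust
#[derive(Debug, PartialEq)]
enum KdfError {
    /// `expected` is the bound that was violated (a minimum or a maximum).
    InvalidLength { expected: usize, got: usize },
    InvalidData,
}

const M_COST_MAX: u32 = 4 * 1024 * 1024; // KiB, i.e. 4 GiB
const T_COST_MAX: u32 = 256;
const P_COST_MAX: u32 = 256;
const OUTPUT_MAX: usize = 4096;
const SALT_MIN: usize = 8;
const CAPI_BUF_MAX: usize = 256 * 1024 * 1024; // 256 MiB

fn check_argon2id_params(
    m_cost_kib: u32,
    t_cost: u32,
    p_cost: u32,
    salt_len: usize,
    output_len: usize,
) -> Result<(), KdfError> {
    // Length violations map to InvalidLength and report the violated bound.
    if salt_len < SALT_MIN {
        return Err(KdfError::InvalidLength { expected: SALT_MIN, got: salt_len });
    }
    if salt_len > CAPI_BUF_MAX {
        return Err(KdfError::InvalidLength { expected: CAPI_BUF_MAX, got: salt_len });
    }
    if output_len < 1 {
        return Err(KdfError::InvalidLength { expected: 1, got: output_len });
    }
    if output_len > OUTPUT_MAX {
        return Err(KdfError::InvalidLength { expected: OUTPUT_MAX, got: output_len });
    }
    // Cost violations map to InvalidData.
    if !(1..=T_COST_MAX).contains(&t_cost) || !(1..=P_COST_MAX).contains(&p_cost) {
        return Err(KdfError::InvalidData);
    }
    // Coupled RFC 9106 rule: at least 8 one-KiB blocks per lane
    // (p_cost <= 256 here, so 8 * p_cost cannot overflow).
    if m_cost_kib > M_COST_MAX || m_cost_kib < 8 * p_cost {
        return Err(KdfError::InvalidData);
    }
    Ok(())
}
```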

**Usage**: Argon2id is a building-block primitive — it derives a symmetric key from a passphrase, but does not define an encrypted blob format. The application is responsible for using the derived key with an AEAD to encrypt identity key material, and for defining the on-disk format. soliton does not export a combined "encrypt identity with passphrase" function; the CAPI exposes `soliton_argon2id` as the KDF and the application composes AEAD encryption separately.

**Recommended composition**: For passphrase-protected identity keys, use XChaCha20-Poly1305 (the same AEAD as the rest of the protocol) with a 24-byte random nonce (`random_bytes(24)`). Use the Argon2id-derived 32-byte key directly as the AEAD key, with no secondary KDF or key-expansion step. The 32-byte Argon2id output is already a uniformly distributed key of the correct size for XChaCha20-Poly1305. A reimplementer who adds an extra HKDF step (e.g., `HKDF(argon2_output, salt=nonce, info="...")`) produces incompatible ciphertext with no error at encryption time. The mismatch surfaces only as `AeadFailed` at decryption. The on-disk format is `salt (16 bytes) ‖ nonce (24 bytes) ‖ AEAD ciphertext (plaintext_len + 16 tag)`.

**Minimum parseable blob size**: With the recommended format, a blob must be at least **56 bytes** (16 salt + 24 nonce + 16 Poly1305 tag with empty plaintext). Decoders MUST reject blobs shorter than 56 bytes before attempting AEAD. Slicing a nonce out of a remainder shorter than 24 bytes causes out-of-bounds access in C or a panic in Rust. Return `InvalidLength` (not `AeadFailed`) for blobs shorter than 56 bytes. This is a pre-AEAD framing check on publicly observable data, not an authentication failure. The storage blob format (§11.1) enforces an analogous 42-byte minimum, and passphrase-protected key blobs need the same pre-AEAD guard.

No AAD is required. The salt and nonce are integrity-protected by their role in the derivation and encryption, so changing either produces a decryption failure. This composition is not normative (applications may use a different AEAD or format), but following it ensures interoperability between independent implementations of passphrase-protected key storage. An application that chooses a different AEAD (e.g., AES-256-GCM) produces encrypted keys that are incompatible with applications following this recommendation.
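
The framing check for the recommended basic layout can be sketched as follows. Note the scope of the sketch:

- `PassphraseBlob` and `parse_basic_blob` are hypothetical application-layer code. Soliton ships no such decoder.
- The AEAD step with the Argon2id-derived key is not shown.

```rust
use std::convert::TryInto;

const SALT_LEN: usize = 16;
const NONCE_LEN: usize = 24;
const TAG_LEN: usize = 16;
/// 16 salt + 24 nonce + 16 tag (empty plaintext) = 56 bytes.
const BASIC_MIN_LEN: usize = SALT_LEN + NONCE_LEN + TAG_LEN;

#[derive(Debug, PartialEq)]
enum BlobError {
    InvalidLength,
}

struct PassphraseBlob<'a> {
    salt: &'a [u8; SALT_LEN],
    nonce: &'a [u8; NONCE_LEN],
    /// AEAD ciphertext including the trailing Poly1305 tag.
    ciphertext: &'a [u8],
}

/// Splits `salt ‖ nonce ‖ ciphertext` after a pre-AEAD length check.
fn parse_basic_blob(blob: &[u8]) -> Result<PassphraseBlob<'_>, BlobError> {
    if blob.len() < BASIC_MIN_LEN {
        // Framing error on public data: InvalidLength, not AeadFailed.
        return Err(BlobError::InvalidLength);
    }
    let (salt, rest) = blob.split_at(SALT_LEN);
    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
    Ok(PassphraseBlob {
        salt: salt.try_into().expect("length checked above"),
        nonce: nonce.try_into().expect("length checked above"),
        ciphertext,
    })
}
```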

**Parameter flexibility limitation and extended blob format**: The recommended `salt(16) ‖ nonce(24) ‖ ciphertext` format does not encode Argon2id cost parameters. If an application later upgrades from `OWASP_MIN` to `RECOMMENDED` parameters, existing blobs become permanently undecryptable without out-of-band knowledge of which parameters were used. Applications that may need to change parameters should use the extended format: `m_cost (4 bytes, BE u32) ‖ t_cost (1 byte) ‖ p_cost (1 byte) ‖ salt (16 bytes) ‖ nonce (24 bytes) ‖ ciphertext`.

**Extended format serialization constraint**: `t_cost` and `p_cost` are stored as single bytes (u8, values 0x01-0xFF = 1-255). The bounds table above allows `t_cost` and `p_cost` up to 256, but 256 does not fit in a u8. The value 0x100 truncates to 0x00, which is invalid because the minimum is 1. Applications using the extended format MUST restrict `t_cost` and `p_cost` to 1-255. The `m_cost` field is a 4-byte BE u32, which accommodates the full 4 GiB maximum without constraint.

**Important: soliton does not implement a passphrase-blob encoder or decoder.** The CAPI exposes only `soliton_argon2id` as the KDF primitive. There is no `soliton_passphrase_encrypt` or `soliton_passphrase_decrypt` function. Both the basic and extended layouts are APPLICATION-LAYER conventions for applications to implement. The magic-byte discriminator described below is a RECOMMENDED convention for cross-implementation interoperability, not a normative value enforced by the soliton library.

A decoder that needs to support both formats MUST distinguish them using a magic prefix byte, NOT by a length check alone. The recommended values for interoperability are `0x00` for basic and `0x01` for extended. Length alone is unambiguous only for basic-format blobs whose plaintext is shorter than 6 bytes (total 56-61 bytes, below the 62-byte extended minimum). Every length of 62 bytes or more is a valid size for both formats, so a length-based discriminator silently misparses every blob in that range. A reimplementer who uses a length check will find that it works on test vectors with empty or short plaintexts but fails in production. The magic-byte scheme is the only reliable approach for cross-implementation interoperability. The extended format minimum parseable size is **62 bytes** (4 + 1 + 1 + 16 + 24 + 16 tag with empty plaintext).

**Magic-byte interaction with minimum blob sizes**: The 56-byte and 62-byte minimums above apply to the format bodies, i.e. the payload after the magic discriminator byte. With the discriminator prepended at offset 0, the minimum sizes become **57 bytes** (basic: `0x00 ‖ salt ‖ nonce ‖ tag`) and **63 bytes** (extended: `0x01 ‖ m_cost ‖ t_cost ‖ p_cost ‖ salt ‖ nonce ‖ tag`). Decoders using the magic-byte scheme MUST apply the discriminator-inclusive minimum (57 or 63 bytes) as their pre-AEAD size check, not the body minimums (56 or 62).

Test vector F.29 uses the basic format (parameters supplied out-of-band). No test vector covers the extended format (`0x01 ‖ m_cost ‖ t_cost ‖ p_cost ‖ salt ‖ nonce ‖ ciphertext`). The encoding of `m_cost` as a 4-byte BE u32 and of `t_cost`/`p_cost` as single bytes is verified only through the extended-format decoder in the reference implementation (`tests/compute_vectors.rs`). Implementors adding extended-format support should check their encoder output against the reference decoder by running the reference integration test.
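
A decoder for both layouts that dispatches on the recommended magic byte might look like the following. Its assumptions and scope:

- All names are hypothetical.
- The `0x00`/`0x01` magic values follow the recommended convention described above. The library does not enforce them.
- AEAD decryption is out of scope. The sketch covers only the framing, the u8 cost fields, and the 57-byte and 63-byte minimums that include the discriminator.

```rust
use std::convert::{TryFrom, TryInto};

const BODY_MIN: usize = 16 + 24 + 16; // salt + nonce + tag with empty plaintext
const PARAMS_LEN: usize = 4 + 1 + 1; // m_cost (BE u32) + t_cost + p_cost

#[derive(Debug, PartialEq)]
enum BlobError {
    InvalidLength,
    InvalidData,
    UnknownFormat,
}

#[derive(Debug, PartialEq)]
struct KdfParams {
    m_cost_kib: u32,
    t_cost: u8,
    p_cost: u8,
}

/// Builds the 7-byte extended prefix `0x01 ‖ m_cost ‖ t_cost ‖ p_cost`.
/// `t_cost` and `p_cost` must be 1..=255 to fit their single-byte fields.
fn extended_prefix(m_cost_kib: u32, t_cost: u32, p_cost: u32) -> Result<[u8; 7], BlobError> {
    let t = u8::try_from(t_cost).ok().filter(|&v| v != 0).ok_or(BlobError::InvalidData)?;
    let p = u8::try_from(p_cost).ok().filter(|&v| v != 0).ok_or(BlobError::InvalidData)?;
    let m = m_cost_kib.to_be_bytes();
    Ok([0x01, m[0], m[1], m[2], m[3], t, p])
}

/// Dispatches on the magic byte and applies the discriminator-inclusive
/// minimum (57 bytes basic, 63 bytes extended) before any AEAD work.
/// Returns the embedded parameters (None for basic) and `salt ‖ nonce ‖ ciphertext`.
fn split_magic_blob(blob: &[u8]) -> Result<(Option<KdfParams>, &[u8]), BlobError> {
    let (&magic, rest) = blob.split_first().ok_or(BlobError::InvalidLength)?;
    match magic {
        0x00 => {
            if rest.len() < BODY_MIN {
                return Err(BlobError::InvalidLength);
            }
            Ok((None, rest))
        }
        0x01 => {
            if rest.len() < PARAMS_LEN + BODY_MIN {
                return Err(BlobError::InvalidLength);
            }
            let m_cost_kib = u32::from_be_bytes(rest[0..4].try_into().expect("4 bytes"));
            let (t_cost, p_cost) = (rest[4], rest[5]);
            if t_cost == 0 || p_cost == 0 {
                return Err(BlobError::InvalidData);
            }
            Ok((Some(KdfParams { m_cost_kib, t_cost, p_cost }), &rest[PARAMS_LEN..]))
        }
        _ => Err(BlobError::UnknownFormat),
    }
}
```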

**Unicode normalization**: No Unicode normalization is applied — raw UTF-8 bytes are passed directly. Multi-platform applications MUST normalize passwords to NFC before calling, because iOS (`NSString`), Android (`String`), and Rust (`str`) use different internal representations; `"café"` may encode differently across platforms, producing different keys from the same apparent passphrase.
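
The hazard can be seen without any platform code. The two standard encodings of "café" are different byte strings, so Argon2id would receive two different passwords. The sketch below uses only std. Actual NFC normalization needs a Unicode library (in Rust, for example, the `unicode-normalization` crate) and is not shown.

```rust
/// "café" with a precomposed é (U+00E9): the NFC form.
const CAFE_NFC: &str = "caf\u{e9}";
/// "café" as 'e' plus a combining acute accent (U+0301): the NFD form.
const CAFE_NFD: &str = "cafe\u{301}";
```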

**Invalid UTF-8 passthrough**: Argon2id (RFC 9106) accepts arbitrary byte strings — it is not a Unicode function. The soliton `argon2id` primitive does NOT validate that the password bytes are well-formed UTF-8. Invalid UTF-8 byte sequences (e.g., `0xFF`, lone continuation bytes) are passed through to Argon2id unchanged and produce a deterministic key. A reimplementer who adds a UTF-8 validation step at their API boundary (rejecting invalid UTF-8 with `InvalidData`) produces a stricter interface than the reference — callers passing arbitrary byte strings (not just UTF-8 passwords) to the CAPI will receive `InvalidData` from the reimplementation but succeed with the reference. The CAPI `soliton_argon2id` accepts `*const u8, usize` and passes the bytes through without UTF-8 checking.
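
At the C boundary, the passthrough rule comes down to two things:

- Convert `(*const u8, usize)` into a byte slice with no UTF-8 validation.
- Treat a zero length as an empty password, even when the pointer is `NULL`.

The helper below is a hypothetical sketch of that conversion. This section does not specify how the real `soliton_argon2id` reports a `NULL` pointer with a non-zero length, so the sketch simply returns `None` in that case.

```rust
/// Borrows the caller's password bytes exactly as given: no UTF-8 check and
/// no normalization. `len == 0` is an empty password, even if `ptr` is NULL.
///
/// # Safety
/// If `len > 0` and `ptr` is non-null, `ptr` must point to `len` readable
/// bytes that stay valid for the returned lifetime.
unsafe fn password_bytes<'a>(ptr: *const u8, len: usize) -> Option<&'a [u8]> {
    if len == 0 {
        return Some(&[]);
    }
    if ptr.is_null() {
        return None; // the error reported for this case is an assumption
    }
    Some(unsafe { std::slice::from_raw_parts(ptr, len) })
}
```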

**Notes:**
- Only the Argon2id variant is supported. It is a hybrid of Argon2i and Argon2d, and RFC 9106 §4 recommends it as the default choice for all environments.
- Not used internally by the protocol KDFs (HKDF-SHA3-256); provided for application-layer key protection.
- The `argon2` crate zeroizes its internal memory blocks on drop when it is built with its `zeroize` feature.
- **`ad` (associated data) is always empty**: soliton passes an empty byte string as the Argon2id `ad` parameter unconditionally. RFC 9106 defines `ad` as an optional context distinguisher (analogous to HKDF `info`), but the soliton KDF does not use it. Reimplementers MUST pass empty `ad`, because a non-empty `ad` produces a different derived key with no error signal. **Per-language**:
  - C callers using `argon2_ctx` directly MUST set `.ad = NULL, .adlen = 0`.
  - Python's `argon2-cffi` has no `ad` parameter (always empty, correct by construction).
  - Go's `argon2.IDKey` has no `ad` parameter (always empty, correct by construction).
  - Rust's `argon2` crate leaves `ad` empty unless it is supplied explicitly through `ParamsBuilder`. Neither `Params::new` nor `Params::default()` sets it.
- **`secret` (pepper) is always empty**: soliton passes an empty byte string as the Argon2id `secret` parameter. The `secret` input provides a server-side pepper, but soliton does not use it, since the pepper would require secure server-side key management outside the protocol scope. Reimplementers MUST pass empty `secret`, because a non-empty `secret` produces a different derived key with no error signal. **Per-language**:
  - C callers using `argon2_ctx` directly MUST set `.secret = NULL, .secretlen = 0`.
  - Python's `argon2-cffi` exposes no pepper input, so the pepper is always empty by construction. Its `argon2.low_level.hash_secret_raw` does have an argument named `secret`, but that argument is the password, not the Argon2 `secret`.
  - Go's `argon2.IDKey` has no `secret` parameter (always empty, correct by construction).
  - Rust's `argon2` crate applies a `secret` only when constructed with `Argon2::new_with_secret`. `Argon2::new` uses none.

  See Appendix B for the full parameter table.

---

## 11. Server-Side Encryption at Rest

### 11.1 Storage Blob Format

Messages are batched, compressed, then encrypted. Blob format:

```
[version: 1 byte, offset 0] [flags: 1 byte, offset 1] [nonce: 24 bytes, offsets 2-25] [ciphertext + tag, offset 26+]
```

**Version byte** (1-255): Storage encryption key version. Value 0 is reserved and rejected.

**Version 0 on the decrypt path**: Version 0 is rejected at key creation (`StorageKey::new`) and never enters the keyring. A blob header with version 0 therefore produces a keyring lookup miss, returning `AeadFailed` — not an early `InvalidData`. Implementations MUST NOT add a pre-AEAD version-0 check on the decrypt path; doing so returns a different error variant and creates an error-type oracle. **This guarantee depends entirely on the keyring construction invariant**: the decrypt path itself performs no version-0 check — it relies on `StorageKey::new` having enforced `version ≠ 0` at key creation time, so no version-0 key can ever be in the keyring to produce a lookup hit. An implementation that allows version-0 keys to be added via a debug path, test fixture, or internal bypass silently breaks this guarantee — version-0 blobs would then decrypt successfully against a version-0 key, producing correct plaintext where the spec mandates `AeadFailed`. Implementations MUST enforce `version ≠ 0` at key construction unconditionally; the decrypt path's correctness is derived from it.

**Flags byte** (bitfield):
- Bit 0: compression. 0 = none, 1 = zstd.
- Bits 1-7: reserved (must be 0; blobs with reserved bits set are rejected as `AeadFailed` — not `UnsupportedFlags` — to prevent error oracles that distinguish pre-AEAD validation failures from authentication failures).

**Nonce**: 24 bytes generated from the OS CSPRNG (`random_bytes(24)`) per encryption call. Any two random 192-bit nonces collide with probability 2⁻¹⁹². By the birthday bound, a collision becomes likely only after ~2⁹⁶ encryptions under the same key bytes, far beyond realistic encryption volumes per key version. For context: at one billion encryptions per day (10⁹/day), the expected time to a first nonce collision under the same key bytes is on the order of 10¹⁷ years, so reaching the 2⁹⁶ birthday bound is not a realistic threat.

**Key rotation MUST be driven by key-material compromise concerns and organizational policy, not by nonce-collision probability.** The 24-byte nonce space is large enough that nonce exhaustion is structurally irrelevant. Rotate keys on a schedule appropriate for the sensitivity of the protected data.

**The birthday bound is per key_bytes, not per version number**: assigning a new version number to the same key bytes does not reset the nonce pool. All blobs encrypted under any version that maps to the same key bytes share a single nonce space. Operators MUST use fresh, independently generated key bytes for each new key version. Reusing key material across version numbers provides no cryptographic isolation.
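
The order of magnitude quoted above can be checked with the usual birthday estimate. The expected number of random 192-bit nonces drawn before the first repeat is about √(π/2 · 2¹⁹²) ≈ 1.25 · 2⁹⁶. The sketch below is a back-of-the-envelope `f64` calculation, not part of any implementation.

```rust
/// Expected number of random `bits`-bit nonces drawn before the first repeat
/// (birthday approximation sqrt(pi/2 * 2^bits)).
fn expected_draws_to_collision(bits: i32) -> f64 {
    (std::f64::consts::PI / 2.0 * 2f64.powi(bits)).sqrt()
}

/// Years until that point at a constant encryption rate.
fn years_to_first_collision(bits: i32, encryptions_per_day: f64) -> f64 {
    expected_draws_to_collision(bits) / encryptions_per_day / 365.25
}
```

At 10⁹ encryptions per day this evaluates to roughly 2.7 × 10¹⁷ years.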

Minimum blob length: 42 bytes (1 + 1 + 24 + 16-byte Poly1305 tag with empty ciphertext).

**Version and flags are AEAD-authenticated via AAD** (§11.4): both fields are included verbatim in the AAD that is passed to XChaCha20-Poly1305 encryption. A reimplementer who reads §11.1 and implements the encrypt path without reaching §11.4 may omit version and flags from the AAD — the AEAD succeeds, but the resulting blob is malleable: an attacker can flip the compression bit or substitute a different version byte without detection. The AAD construction in §11.4 is not optional.

### 11.2 Pipeline

**Write**: batch → serialize → compress (zstd if enabled) → construct AAD → encrypt (XChaCha20-Poly1305) → prepend version + flags + nonce → write.

**Read**: fetch → parse version + flags + nonce → reject reserved flag bits as `AeadFailed` (not `UnsupportedFlags` — §11.1 oracle collapse) → look up key by version → reconstruct AAD → decrypt → decompress (if flag set) → deserialize.
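
The first steps of the read path, up to key selection, can be sketched as follows. The sketch makes these assumptions:

- The names are hypothetical.
- The keyring is modeled as a `HashMap` from version byte to 32-byte key.
- Blobs under 42 bytes return `InvalidLength`, by analogy with the pre-AEAD framing checks elsewhere in this document.
- AAD reconstruction (§11.4), decryption, and decompression are not shown.

```rust
use std::collections::HashMap;

const MIN_BLOB_LEN: usize = 1 + 1 + 24 + 16; // version + flags + nonce + tag
const FLAG_ZSTD: u8 = 0x01;

#[derive(Debug, PartialEq)]
enum StorageError {
    InvalidLength,
    AeadFailed,
}

struct ParsedBlob<'a> {
    key: &'a [u8; 32],
    version: u8,
    flags: u8,
    nonce: &'a [u8],
    ciphertext: &'a [u8],
}

fn parse_storage_blob<'a>(
    blob: &'a [u8],
    keyring: &'a HashMap<u8, [u8; 32]>,
) -> Result<ParsedBlob<'a>, StorageError> {
    if blob.len() < MIN_BLOB_LEN {
        return Err(StorageError::InvalidLength);
    }
    let (version, flags) = (blob[0], blob[1]);
    // Reserved bits 1-7 set: report AeadFailed, indistinguishable from a bad tag.
    if flags & !FLAG_ZSTD != 0 {
        return Err(StorageError::AeadFailed);
    }
    // No version-0 special case: a version-0 key can never be in the keyring,
    // so version 0 falls out as an ordinary lookup miss.
    let key = keyring.get(&version).ok_or(StorageError::AeadFailed)?;
    Ok(ParsedBlob { key, version, flags, nonce: &blob[2..26], ciphertext: &blob[26..] })
}
```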

Compression before encryption is mandatory ordering (encrypted data is incompressible).

### 11.3 Compression

**`encrypt_blob(compress=true)` always compresses and always sets `flags=0x01`**, regardless of whether the zstd output is larger than the input. There is no expansion-ratio skip. A reimplementer who skips compression when it would expand must set `flags=0x00` (not `flags=0x01`); see the "Zstd level 1 expansion" note below.

**Empty plaintext with `compress=true`**: when the plaintext is empty (0 bytes) and `compress=true`, the encoder MUST still call zstd on the empty input and store the resulting non-empty zstd frame. An empty zstd input produces a minimal valid zstd frame of roughly 4-12 bytes, not 0 bytes. This range is **informational only; implementations MUST NOT add a minimum frame size check based on it**. A "frame must be ≥ 4 bytes" pre-AEAD guard would reject AEAD-authenticated blobs from future-compatible encoders that use a different zstd version or configuration. It would also re-create the oracle-collapse problem by returning a distinct error before attempting AEAD.

The encoder MUST NOT skip compression for empty plaintext and produce a 0-byte body. Such a blob decodes successfully with `decrypt_blob`: AEAD passes on the correct key, and decompression of a 0-byte body produces 0 bytes. It is nevertheless not conformant to this spec and is not byte-compatible with the reference implementation's encrypt output. A reimplementer who tests the empty-plaintext case only with `compress=false` and assumes the same behavior applies to `compress=true` will miss this divergence.

**Zstd** (RFC 8878). On by default. The current implementation uses `ruzstd`'s `Fastest` level (~zstd level 1); higher levels are not yet available in `ruzstd` 0.8.x. The compression level is not configurable — all blobs are compressed at the same level. **Interop note**: the compression level is not part of the wire format. Any valid zstd frame is acceptable on decompression regardless of the compression level used to produce it. A reimplementer using a higher compression level (e.g., zstd level 3) produces interoperable blobs — the decompressor does not know or care what level was used.

**Size limit**: 256 MiB maximum on native targets; 16 MiB on `wasm32` targets (`cfg(target_arch = "wasm32")`). This limit is enforced on both the encrypt and decrypt paths. On encrypt, the core library returns `InvalidData` (not `InvalidLength`; oversized plaintext is a protocol-level size policy violation, not a type-level buffer size mismatch) when plaintext exceeds the platform's limit. On decrypt, decompressed output exceeding the limit triggers `AeadFailed` (not `DecompressionFailed`) — all post-AEAD errors are collapsed to prevent a 1-bit oracle that would reveal successful authentication (see §12 error-oracle collapse). The decrypt-side limit prevents OOM from maliciously crafted zstd payloads ("zip bomb" attacks). Enforced via `decoder.take(MAX_DECOMPRESSED_SIZE + 1)` followed by length check. **Cross-platform note**: a blob encrypted on native with plaintext between 16-256 MiB is permanently undecryptable on WASM (exceeds the WASM limit). WASM encryptors are capped at 16 MiB so they cannot create such blobs, but mixed-platform deployments must enforce the lower limit on the encryption side.
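
The `take(MAX + 1)` idiom works with any `std::io::Read` decoder, so it can be shown without a zstd dependency. In the sketch below:

- The decoder is a generic reader. In practice it would be a zstd streaming decoder, and which type soliton wraps is not shown here.
- `std::io::repeat` plays the role of a zip bomb that never ends on its own.
- The names are hypothetical.

```rust
use std::io::Read;

#[cfg(target_arch = "wasm32")]
const MAX_DECOMPRESSED_SIZE: usize = 16 * 1024 * 1024;
#[cfg(not(target_arch = "wasm32"))]
const MAX_DECOMPRESSED_SIZE: usize = 256 * 1024 * 1024;

#[derive(Debug, PartialEq)]
enum StorageError {
    AeadFailed,
}

/// Reads at most `limit + 1` bytes and rejects anything longer than `limit`.
/// Every failure maps to AeadFailed so it is indistinguishable from a bad tag.
fn read_bounded<R: Read>(decoder: R, limit: usize) -> Result<Vec<u8>, StorageError> {
    let mut out = Vec::new();
    decoder
        .take(limit as u64 + 1)
        .read_to_end(&mut out)
        .map_err(|_| StorageError::AeadFailed)?;
    if out.len() > limit {
        return Err(StorageError::AeadFailed);
    }
    Ok(out)
}

/// Applies the platform's decompression limit.
fn decompress_bounded<R: Read>(decoder: R) -> Result<Vec<u8>, StorageError> {
    read_bounded(decoder, MAX_DECOMPRESSED_SIZE)
}
```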

**CAPI error on oversized plaintext — platform-dependent**: The CAPI applies a blanket 256 MiB cap on all input buffers (§13.4), returning `InvalidLength` for any buffer exceeding that limit. On native targets, this CAPI cap fires before the core library's 256 MiB `InvalidData` check — so CAPI callers on native see `InvalidLength` for oversized plaintext, not `InvalidData`. On WASM targets, the core library's 16 MiB limit is smaller than the CAPI's 256 MiB cap, so the core check fires first and returns `InvalidData`. A reimplementer building a compatible CAPI should apply the general `InvalidLength` cap first, then let the core `InvalidData` check enforce the platform-specific limit.

**Zstd level 1 expansion and conditional compression skip — AAD binding hazard**: At compression level 1 (`Fastest`), zstd occasionally expands incompressible data (the compressed output is larger than the input). A reimplementer who skips compression when it would expand (i.e., uses the uncompressed plaintext if `compressed_size >= original_size`) MUST set `flags = 0x00` in both the blob header and the AAD. If they set `flags = 0x01` in the AAD (because they "attempted" compression) but store uncompressed plaintext in the ciphertext, AEAD authentication will succeed at decrypt but decompression of the uncompressed content will fail or produce garbage. Equivalently, if they set `flags = 0x01` in the header and `flags = 0x00` in the AAD, AEAD authentication fails immediately. The invariant: `flags` in the AAD MUST equal `flags` in the blob header, and BOTH must accurately reflect whether the encrypted content is zstd-compressed. There is no mechanism to "correct" the flags after AEAD seals the blob — the flags are bound to the ciphertext at encryption time.
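
One way to make the invariant hard to violate is to compute the flags value once and feed that same value to both the header and the AAD. The sketch below is illustrative:

- The `zstd` closure stands in for a zstd encoder.
- `prefer_smaller` models the optional skip-on-expansion variant discussed above. The reference `encrypt_blob` always compresses when asked, which corresponds to `prefer_smaller = false`.
- The AAD layout itself (§11.4) is not reproduced. The sketch only produces the flags value that feeds it.

```rust
const FLAG_ZSTD: u8 = 0x01;

/// Chooses the body to encrypt and the single flags byte that must appear
/// both in the blob header and in the AAD.
fn prepare_body(
    plaintext: &[u8],
    compress: bool,
    prefer_smaller: bool,
    zstd: impl Fn(&[u8]) -> Vec<u8>,
) -> (u8, Vec<u8>) {
    if !compress {
        return (0x00, plaintext.to_vec());
    }
    // Always run the compressor when asked, even for empty plaintext.
    let compressed = zstd(plaintext);
    if prefer_smaller && compressed.len() >= plaintext.len() {
        // Storing uncompressed content: the flags byte must say so everywhere.
        (0x00, plaintext.to_vec())
    } else {
        (FLAG_ZSTD, compressed)
    }
}
```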

**Decompression is flag-driven, not content-sniffing**: The decryptor checks flags bit 0 to determine whether to decompress — it does NOT inspect the plaintext for zstd magic bytes (0x28 0xB5 0x2F 0xFD) and attempt decompression speculatively. A reimplementer who sniffs content and decompresses any output that begins with zstd magic bytes diverges from the specification: an uncompressed blob whose plaintext happens to begin with those bytes would be incorrectly decompressed, producing garbage or a decompression error. The flags byte is the sole authority on whether decompression applies.

An empty compressed payload decompresses to an empty plaintext. This is **decrypt side only** and is special-cased **after AEAD decryption**: the decryptor checks whether the decrypted content is empty before calling zstd, which would otherwise reject the empty input as a malformed frame. The empty check fires on the post-AEAD plaintext bytes, not on the raw ciphertext body: `flags=0x01` with a zero-byte post-AEAD payload is accepted, so a reimplementer who checks for emptiness pre-AEAD (on the ciphertext) and rejects would silently refuse a class of valid blobs. On the encrypt side, zstd produces a non-empty frame even for empty input (zstd frame headers are always present), so an empty compressed payload can only appear in a blob produced outside the standard encrypt path.
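
A minimal sketch of the post-AEAD step, covering both rules above: the flags byte alone decides whether to decompress, and an empty compressed payload short-circuits to empty output. `finish_decrypt` is a hypothetical name, the `decompress` closure stands in for a real zstd decoder, and reserved-flag rejection (which happens before AEAD) is not shown.

```rust
pub const FLAG_COMPRESSED: u8 = 0x01;

#[derive(Debug, PartialEq)]
pub enum Error {
    AeadFailed,
}

/// Hypothetical post-AEAD handler. `plaintext` is the authenticated AEAD
/// output; `decompress` stands in for a zstd decoder (None = failure).
pub fn finish_decrypt<F>(flags: u8, plaintext: Vec<u8>, decompress: F) -> Result<Vec<u8>, Error>
where
    F: FnOnce(&[u8]) -> Option<Vec<u8>>,
{
    if flags & FLAG_COMPRESSED == 0 {
        // Flag clear: return as-is. No sniffing for the zstd magic
        // 0x28 0xB5 0x2F 0xFD, even if the plaintext starts with it.
        return Ok(plaintext);
    }
    if plaintext.is_empty() {
        // Compressed flag with an empty post-AEAD payload decodes to empty.
        return Ok(Vec::new());
    }
    // Decompression failures leave as AeadFailed (oracle collapse, §12).
    decompress(&plaintext).ok_or(Error::AeadFailed)
}
```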

**`encrypt_blob` zstd expansion asymmetry with streaming**: `encrypt_blob` does not enforce a zstd expansion guard: if zstd produces output larger than the input, the larger output is stored. This is asymmetric with `stream_encrypt_chunk` (§15.11), which returns `Internal` if zstd output exceeds `plaintext.len() + STREAM_ZSTD_OVERHEAD`. Implementations MUST NOT add the streaming guard to `encrypt_blob` in the name of consistency: doing so returns `Internal` for incompressible inputs instead of storing them, breaking interoperability.
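
The streaming guard is a simple additive bound over the actual plaintext length. A sketch, assuming `STREAM_ZSTD_OVERHEAD = 256` as implied by the 100 + 256 example in §12; `stream_check_expansion` is a hypothetical name. Per the paragraph above, the `encrypt_blob` path has no counterpart to this check.

```rust
/// Additive zstd overhead allowed per streaming chunk (value assumed from §12).
pub const STREAM_ZSTD_OVERHEAD: usize = 256;

#[derive(Debug, PartialEq)]
pub enum Error {
    Internal,
}

/// Hypothetical streaming-only check; never applied to encrypt_blob.
pub fn stream_check_expansion(plaintext_len: usize, compressed_len: usize) -> Result<(), Error> {
    // saturating_add avoids overflow for absurdly large lengths.
    if compressed_len > plaintext_len.saturating_add(STREAM_ZSTD_OVERHEAD) {
        Err(Error::Internal)
    } else {
        Ok(())
    }
}
```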

**Compression oracle (CRIME/BREACH)**: Any API that compresses plaintext that an attacker can partially control and then reports the ciphertext size creates a compression oracle. If an attacker can observe the size of the encrypted blob (e.g., as returned by `encrypt_blob`) and inject chosen text adjacent to a secret in the same compression context (e.g., the same blob or channel), they can recover the secret byte-by-byte by correlating size changes with injected guesses. This is the CRIME/BREACH attack family. For `encrypt_blob`, the entire plaintext is compressed as a single unit before AEAD, so if the plaintext mixes attacker-controlled data with secrets (e.g., a JSON blob with a user-controlled field alongside an authentication token), the compressed size reveals information about the secret. Callers MUST NOT include attacker-controlled data and secrets in the same `encrypt_blob` call without either disabling compression (`compress=false`) or ensuring the compression contexts are isolated. For community channel storage where all subscribers share the same channel key, this concern applies to any blob that mixes channel content from multiple trust levels. See §15.5 for the same analysis applied to the streaming layer, where the concern is more acute due to sequential chunk compression.

### 11.4 Storage AAD

#### 11.4.1 Community Storage AAD
```
aad = "lo-storage-v1"                      // 13 bytes
   || version                              // 1 byte (key version)
   || flags                                // 1 byte (bit 0: compressed)
   || len(channel_id) || channel_id        // UTF-8, 2-byte BE len
   || len(segment_id) || segment_id        // UTF-8, 2-byte BE len
```

**Total AAD size**: `15 + 2 + len(channel_id) + 2 + len(segment_id)` = `19 + len(channel_id) + len(segment_id)` bytes. The fixed overhead is 19 bytes: 13 (label) + 1 (version) + 1 (flags) + 2 (channel_id length prefix) + 2 (segment_id length prefix). Quick-check: for 8-byte `channel_id` and 12-byte `segment_id`, the AAD is 39 bytes total.
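
A sketch of the §11.4.1 layout, assuming a function shaped like the `build_storage_aad` named later in this section (the signature and the one-variant `Error` here are hypothetical). It uses `str::len()` for the prefix because that is the UTF-8 byte count, rejects fields that do not fit a `u16`, and encodes zero-length identifiers as a bare `0x0000` prefix.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidData,
}

/// Appends `len(bytes) || bytes` with a 2-byte big-endian length prefix.
fn put_prefixed(out: &mut Vec<u8>, bytes: &[u8]) -> Result<(), Error> {
    // Fields longer than 65,535 bytes cannot be encoded: reject them.
    let len = u16::try_from(bytes.len()).map_err(|_| Error::InvalidData)?;
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(bytes);
    Ok(())
}

/// Hypothetical signature for the community storage AAD builder.
pub fn build_storage_aad(
    version: u8,
    flags: u8,
    channel_id: &str,
    segment_id: &str,
) -> Result<Vec<u8>, Error> {
    let mut aad = Vec::with_capacity(19 + channel_id.len() + segment_id.len());
    aad.extend_from_slice(b"lo-storage-v1"); // 13-byte label, no terminator
    aad.push(version);
    aad.push(flags);
    put_prefixed(&mut aad, channel_id.as_bytes())?; // raw UTF-8, no normalization
    put_prefixed(&mut aad, segment_id.as_bytes())?;
    Ok(aad)
}
```

With an 8-byte `channel_id` and a 12-byte `segment_id` this yields the 39-byte quick-check value above.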

**Why version and flags are in the AAD**: Binding `version` prevents an attacker from substituting a different key version's ciphertext (key confusion). Binding `flags` prevents flipping the compression bit after encryption — without this, an attacker could set the compression flag on uncompressed ciphertext, causing the decryptor to run zstd decompression on raw plaintext (producing garbage or a decompression bomb). The AAD authenticates the processing pipeline, not just the plaintext.

**Why community storage AAD omits identity fingerprints**: Community storage is channel-keyed, not user-keyed: blobs are shared channel content encrypted under the channel's storage key, not under any individual user's key. Binding sender/recipient fingerprints would require knowing the author's identity at both encrypt and decrypt time, which is not always available (e.g., bulk channel export, server-side re-encryption). The `channel_id` and `segment_id` provide the binding that matters for anti-relocation: blobs cannot be moved to a different channel or segment position.

**Why DM queue AAD binds recipient but not sender**: DM queue blobs are server-held per-recipient caches. The recipient's identity is always known at both encrypt and decrypt time (it is the key owner). The sender's identity is not reliably available at decrypt time: a server processing a DM queue does not necessarily know which sender produced each blob. Binding only the recipient prevents a relay from substituting one recipient's queued messages into another recipient's slot, without requiring sender identity tracking. A reimplementer who adds sender fingerprints to community storage AAD or DM queue AAD produces blobs that cannot be decrypted by the standard implementation; the AAD mismatch causes silent `AeadFailed`.

**No Unicode normalization**: All string identifiers (`channel_id`, `segment_id`, `batch_id`) are raw UTF-8 bytes with no normalization applied. `"café"` encoded as NFC (U+00E9) and as NFD (U+0065 U+0301) produces different AAD and thus decryption failure. Environments that treat canonically equivalent strings as interchangeable must preserve the original byte representation: Swift `String` compares by Unicode canonical equivalence (so the NFC and NFD forms compare equal even though their UTF-8 bytes differ), and macOS filesystem APIs may normalize paths.
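
The NFC/NFD example can be checked directly: the two spellings of `"café"` differ in their UTF-8 bytes, so they produce different AAD. A minimal sketch; `aad_field` is a hypothetical helper that, like the library, applies no normalization.

```rust
/// "café" with a precomposed é (NFC: U+00E9).
pub const CAFE_NFC: &str = "caf\u{e9}";
/// "café" with e + combining acute accent (NFD: U+0065 U+0301).
pub const CAFE_NFD: &str = "cafe\u{301}";

/// Hypothetical helper: the bytes that enter the AAD are the raw UTF-8
/// bytes of the identifier, with no normalization step.
pub fn aad_field(id: &str) -> &[u8] {
    id.as_bytes()
}
```

The NFC form is 5 bytes (`63 61 66 C3 A9`) and the NFD form is 6 bytes (`63 61 66 65 CC 81`).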

**`len()` is byte length of the UTF-8 encoding**: The 2-byte BE length prefix encodes the byte count of the UTF-8-encoded string, not the character count, UTF-16 code-unit count, or code-point count. A 4-byte emoji (U+1F600) has byte-length 4, character-count 1, and UTF-16-unit-length 2. `String.length()` in Java, `string.Length` in C#, and `str.length` in JavaScript all return the UTF-16 unit count; implementations MUST convert to the byte count (e.g., `s.getBytes(StandardCharsets.UTF_8).length` in Java, `Encoding.UTF8.GetByteCount(s)` in C#, `Buffer.byteLength(s, 'utf8')` in Node.js) before writing the prefix. Passing the wrong count shifts all subsequent fields and produces permanent AEAD failure with no diagnostic.
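
In Rust, `str::len()` already returns the UTF-8 byte count, so the prefix can be taken from it directly; `chars().count()` and `encode_utf16().count()` give the other two counts mentioned above. `length_prefix` is a hypothetical helper.

```rust
/// Hypothetical helper: 2-byte big-endian prefix from the UTF-8 *byte* length.
/// Returns None if the string is longer than 65,535 bytes.
pub fn length_prefix(s: &str) -> Option<[u8; 2]> {
    // str::len() counts UTF-8 bytes, which is what the prefix must carry.
    u16::try_from(s.len()).ok().map(u16::to_be_bytes)
}
```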

**UTF-8 validation**: All string fields (`channel_id`, `segment_id`, `batch_id`) MUST be validated as well-formed UTF-8 before inclusion in the AAD. Invalid UTF-8 → `InvalidData`. **In Rust**, `channel_id: &str` (and likewise `segment_id: &str`, `batch_id: &str`) guarantees valid UTF-8 at the type level; no explicit `from_utf8()` check is required or present in the reference implementation. **In C, Go, and other language bindings**, the caller must perform an explicit check: Go's `string([]byte{0xFF})` accepts arbitrary bytes and does NOT validate UTF-8, and C has no built-in UTF-8 validation. Without this check, two callers passing the same logical string through different byte-level representations (for example, one binding that silently replaces an invalid byte with U+FFFD and one that passes it through unchanged) produce different AAD and silent AEAD failure on decrypt. CAPI callers and non-Rust bindings MUST validate UTF-8 before passing string fields to the library.
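
For a binding that receives raw bytes (for example, across the C ABI), the check can be a strict decode that rejects rather than repairs. A sketch; `validate_id` is a hypothetical name. A lossy decoder such as `String::from_utf8_lossy` is the wrong tool here, since it substitutes U+FFFD and changes the bytes that would enter the AAD.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidData,
}

/// Hypothetical binding-side check: strict UTF-8 validation of a raw buffer.
/// Invalid sequences are rejected, never replaced.
pub fn validate_id(raw: &[u8]) -> Result<&str, Error> {
    std::str::from_utf8(raw).map_err(|_| Error::InvalidData)
}
```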

**Oversized identifier error asymmetry**: When `channel_id` or `segment_id` exceeds 65,535 bytes (the maximum representable value of the 2-byte BE length prefix), `build_storage_aad` returns `InvalidData`. On the **encrypt** path this propagates directly as `InvalidData`. On the **decrypt** path it is remapped to `AeadFailed` — returning `InvalidData` from a decrypt call would leak that the rejection occurred at AAD construction before AEAD was attempted, revealing that the identifier exceeded the length limit rather than that authentication failed. Callers must not interpret `AeadFailed` on decrypt as proof that the ciphertext was structurally valid — an oversized identifier produces the same error as a tampered blob.

**Oversized `batch_id` error asymmetry (§11.4.2)**: The same encrypt→`InvalidData` / decrypt→`AeadFailed` asymmetry documented above for `channel_id` and `segment_id` applies equally to `batch_id`. When `batch_id` exceeds 65,535 bytes (the maximum representable value of the 2-byte BE length prefix in the DM queue AAD construction), `InvalidData` is returned on the encrypt path and `AeadFailed` on the decrypt path. The rationale is identical to the community storage asymmetry above.
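
The encrypt/decrypt asymmetry amounts to a single `map_err` on the decrypt path. A sketch with hypothetical helper names (`encode_id`, `encode_id_for_decrypt`), applicable equally to `channel_id`, `segment_id`, and `batch_id`.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidData,
    AeadFailed,
}

/// Hypothetical encrypt-path encoder for `len(id) || id`.
/// Identifiers over 65,535 bytes cannot be encoded and yield InvalidData.
pub fn encode_id(id: &str) -> Result<Vec<u8>, Error> {
    let len = u16::try_from(id.len()).map_err(|_| Error::InvalidData)?;
    let mut out = Vec::with_capacity(2 + id.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(id.as_bytes());
    Ok(out)
}

/// Decrypt path: the same failure is reported as AeadFailed so the caller
/// cannot tell an oversized identifier from a failed authentication.
pub fn encode_id_for_decrypt(id: &str) -> Result<Vec<u8>, Error> {
    encode_id(id).map_err(|_| Error::AeadFailed)
}
```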

**Zero-length `channel_id` and `segment_id` are valid**: The `len(x) || x` encoding permits zero-length strings — a zero-length `channel_id` encodes as `0x0000` (2-byte BE prefix with no subsequent bytes). The library accepts zero-length IDs on both encrypt and decrypt paths. Reimplementers MUST NOT add a non-empty guard on these fields; doing so produces `InvalidData` on encrypt and `AeadFailed` on decrypt for blobs where an empty string was used as the identifier, creating a silent interoperability break with any implementation following this spec.

#### 11.4.2 DM Queue AAD
```
aad = "lo-dm-queue-v1"                     // 14 bytes
   || version                              // 1 byte (key version)
   || flags                                // 1 byte (bit 0: compressed)
   || len(recipient_fp) || recipient_fp    // 2-byte BE len + 32-byte recipient identity fingerprint
   || len(batch_id) || batch_id            // UTF-8, 2-byte BE len
```

**`recipient_fp` is length-prefixed despite being fixed-size**: a reimplementer who uses bare encoding, by analogy with §7.4's fixed-size fields, produces different AAD bytes and silent AEAD failure on every message.

**Total AAD size**: `14 + 1 + 1 + 2 + 32 + 2 + len(batch_id)` = `52 + len(batch_id)` bytes. For a UUID4 `batch_id` (36 ASCII characters), the total AAD is 88 bytes.
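
A sketch of the DM queue layout, assuming a hypothetical `build_dm_queue_aad` signature. The fixed-size fingerprint still gets its 2-byte prefix (`0x00 0x20`), written at byte offsets 16..18 after the 14-byte label and the two 1-byte fields.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidData,
}

/// Hypothetical signature for the DM queue AAD builder.
pub fn build_dm_queue_aad(
    version: u8,
    flags: u8,
    recipient_fp: &[u8; 32], // raw SHA3-256 digest, not hex
    batch_id: &str,
) -> Result<Vec<u8>, Error> {
    let mut aad = Vec::with_capacity(52 + batch_id.len());
    aad.extend_from_slice(b"lo-dm-queue-v1"); // 14-byte label
    aad.push(version);
    aad.push(flags);
    // Length-prefixed even though it is always 32 bytes.
    aad.extend_from_slice(&32u16.to_be_bytes());
    aad.extend_from_slice(recipient_fp);
    let len = u16::try_from(batch_id.len()).map_err(|_| Error::InvalidData)?;
    aad.extend_from_slice(&len.to_be_bytes());
    aad.extend_from_slice(batch_id.as_bytes());
    Ok(aad)
}
```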

**`batch_id` MUST be unique per `(recipient_fp, key_version)` pair**: The `batch_id` is the sole per-batch domain separator in the DM queue AAD. Two blobs with the same `recipient_fp`, `key_version` (the `version` byte in the header), and `batch_id` but different content share the same AAD, and an attacker who can observe or influence both blobs can exploit the colliding AAD to mount an **integrity substitution attack**. AEAD authentication binds the ciphertext to its key, nonce, and AAD; reusing a tag across two different ciphertexts would require a nonce collision, which is negligible with per-blob random nonces. Colliding AADs, however, mean the tag provides no binding between a ciphertext and *which specific batch* it belongs to: an attacker with write access to the store can substitute blob A for blob B, and the recipient's AEAD will accept it. Distinct `batch_id` values prevent this by making each batch's AAD unique, so a ciphertext produced for batch A cannot authenticate as batch B. Suitable values include a UUID4 (`random_bytes(16)` formatted as a UUID), a monotonic counter, or a timestamp with sufficient resolution. The reference implementation does not generate `batch_id` automatically; it is caller-supplied. A server that reuses `batch_id` across different message batches to the same recipient under the same key version creates colliding AADs.
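
If a UUID4 is used for `batch_id`, the 16 random bytes only need their version and variant bits set before formatting. A sketch assuming the caller supplies the bytes from the OS CSPRNG (the Rust standard library has no CSPRNG); `uuid4_from_bytes` is a hypothetical name and follows the RFC 4122 layout.

```rust
/// Hypothetical formatter: turns 16 CSPRNG bytes into a version-4 UUID string.
/// The caller MUST obtain `bytes` from the OS CSPRNG; this only formats them.
pub fn uuid4_from_bytes(mut bytes: [u8; 16]) -> String {
    bytes[6] = (bytes[6] & 0x0F) | 0x40; // version nibble = 4
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // variant bits = 10xx
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    // 8-4-4-4-12 grouping: 36 ASCII characters in total.
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}
```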

**`recipient_fp` is a raw 32-byte binary value**: `recipient_fp` is the raw SHA3-256 digest of the recipient's LO composite public key (32 bytes of binary data) — not a hex string, not a UTF-8 encoding, not a Base64 value. It is concatenated directly into the AAD with no encoding step and no additional length delimiter beyond the `len()` prefix shown above. Unlike `batch_id` (which is a UTF-8 string requiring byte-length encoding) and unlike the display fingerprint (a 64-character hex string returned by `GenerateIdentity`), `recipient_fp` is always 32 raw bytes. The surrounding §11.4 discussion of UTF-8 validation and byte-length semantics applies only to string fields (`channel_id`, `segment_id`, `batch_id`) — it does not apply to `recipient_fp`.

**DM queue blob size cap**: DM queue blobs (`soliton_dm_queue_encrypt` / `soliton_dm_queue_decrypt`) inherit the CAPI input size cap from §13.2: **256 MiB (268,435,456 bytes)** on standard (non-WASM) targets; **16 MiB (16,777,216 bytes)** on WASM targets (where allocation constraints are tighter). Inputs exceeding the applicable cap return `InvalidLength`. The WASM cap is enforced by the WASM binding layer, not the core library — the core library applies only the 256 MiB cap. Implementations targeting WASM MUST apply the 16 MiB cap before calling into the core library. In practice, DM queue blobs are bounded by the application-layer message size limit (LO Protocol §15.1 mandates padding to a fixed maximum message size), which is well below either cap.

### 11.5 Storage Layout

```
<backend>/<channel_id>/<yyyy-mm-dd>/segment-<N>.blob
```

- Partitioned by channel and date for efficient retention purging.
- Segments are append-only encrypted chunks, numbered sequentially within a day.
- New segment when current exceeds `batch_size`. `batch_size` is an application-defined threshold (not a library constant) — the maximum number of messages stored in a single segment file before rolling over to a new one. A typical value is 1,000 messages; larger values increase the decryption cost when accessing old messages (the entire segment must be decrypted to retrieve any message within it).
- S3: shallow, predictable prefixes.

**segment_id AAD value**: The `segment_id` field in community storage AAD (§11.4.1) is the `<yyyy-mm-dd>` directory name for channels with at most one segment per day (e.g., `"2024-03-15"`, matching the test vector in Appendix F.8). **The date MUST be ISO 8601 with zero-padded month and day**: `"2024-03-05"` for March 5th, NOT `"2024-3-5"`. Two implementations using different date formatters (one zero-padded, one not) produce different AAD bytes and silent `AeadFailed` on every cross-implementation decrypt. For channels with multiple segments per day, `segment_id` MUST include both the date and the sequence number to prevent AAD collisions, e.g., `"2024-03-15/segment-42"` (separating date and filename with `/`) or another unambiguous application-defined format, provided both encrypt and decrypt use the same convention. Using the bare date for multi-segment days means all blobs on that day share the same `segment_id` and therefore the same AAD: the `version` byte still prevents key confusion, but segments from the same day under the same key version can be swapped for one another without detection. The `channel_id` AAD value is the bare channel identifier string, not any path prefix.

**`segment_id` is external caller-supplied context — not stored in the blob**: The `segment_id` value is never encoded inside the encrypted blob. It must be reproduced identically at decrypt time from external metadata (e.g., the file path, directory name, or database record identifying this segment). A reimplementer who assumes `segment_id` is derivable from the ciphertext will produce a wrong AAD at decrypt time and receive `AeadFailed` with no diagnostic. Both the encrypt and decrypt calls MUST supply the same `segment_id` string — the AEAD tag authenticates it as part of AAD, so any mismatch causes authentication failure.

**No canonical multi-segment segment_id format**: This specification does not standardize a multi-segment `segment_id` format. The example `"2024-03-15/segment-42"` is illustrative, not normative. Any format that uniquely identifies the date and segment position within a channel is acceptable, provided it is applied consistently on the encrypt and decrypt sides. Deployments that define their own format (e.g., `"2024-03-15_42"`, `"20240315-042"`) are specification-conformant as long as the same convention is used throughout.
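
A sketch of one conforming convention: zero-padded ISO 8601 dates for single-segment days, the illustrative `"<date>/segment-<N>"` form for multi-segment days, and the §11.5 storage path. All three function names are hypothetical, and calendar validation of the date components is omitted.

```rust
/// Single-segment day: ISO 8601 date with zero-padded month and day.
pub fn segment_id_for_day(year: u16, month: u8, day: u8) -> String {
    format!("{:04}-{:02}-{:02}", year, month, day)
}

/// Multi-segment day: one illustrative (non-normative) convention.
pub fn segment_id_multi(year: u16, month: u8, day: u8, seq: u32) -> String {
    format!("{}/segment-{}", segment_id_for_day(year, month, day), seq)
}

/// §11.5 layout: <backend>/<channel_id>/<yyyy-mm-dd>/segment-<N>.blob
/// The AAD still uses the bare channel_id, never this path.
pub fn segment_path(backend: &str, channel_id: &str, year: u16, month: u8, day: u8, seq: u32) -> String {
    format!(
        "{}/{}/{}/segment-{}.blob",
        backend,
        channel_id,
        segment_id_for_day(year, month, day),
        seq
    )
}
```

Whatever convention is chosen, the same functions should feed both the encrypt and decrypt calls, since `segment_id` is never recoverable from the blob itself.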

### 11.6 Key Rotation

Multiple decryption keys active simultaneously, identified by version byte (1-255).

```
LO_STORAGE_KEY_V1 = <64 hex chars>    # 256-bit key — MUST be generated from the OS CSPRNG
LO_STORAGE_KEY_V2 = <64 hex chars>
# Generate: openssl rand -hex 32
```

**Storage key MUST be generated from the OS CSPRNG**: Unlike the per-blob nonce, which the library generates itself (§11.1, `random_bytes(24)`), the storage key is supplied by the caller. It is a long-lived 256-bit symmetric key used to encrypt every blob under that version. The key MUST be generated using the OS CSPRNG (`openssl rand -hex 32`, `random_bytes(32)`, or equivalent). A key derived from a deterministic schedule, a counter, a password without KDF stretching, or any non-CSPRNG source reduces security to the entropy of that source. A KDF-derived key (e.g., HKDF from a master key) is acceptable only if the master key itself was CSPRNG-generated and the derivation is documented. `openssl rand -hex 32` is the recommended generation command; any OS-level CSPRNG interface (`/dev/urandom`, `getrandom(2)`, `CryptGenRandom`, `SecRandomCopyBytes`) is equivalent.
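
Loading a key generated with `openssl rand -hex 32` reduces to a strict 64-character hex decode. A sketch of a hypothetical caller-side loader (the library itself takes raw key bytes via `StorageKey::new`). It only reads the variable; clearing the environment and zeroizing the intermediate `String` are left to the caller.

```rust
use std::env;

/// Decodes one ASCII hex digit; anything else is rejected.
fn nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical: decodes exactly 64 hex characters into a 32-byte key.
pub fn decode_key_hex(hex: &str) -> Option<[u8; 32]> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let mut key = [0u8; 32];
    for (i, pair) in bytes.chunks_exact(2).enumerate() {
        key[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Some(key)
}

/// Hypothetical loader for a variable such as LO_STORAGE_KEY_V1.
/// Surrounding whitespace (e.g., a trailing newline) is trimmed.
pub fn load_storage_key(var_name: &str) -> Option<[u8; 32]> {
    let hex = env::var(var_name).ok()?;
    decode_key_hex(hex.trim())
}
```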

**Procedure:**
1. Generate new key for version N+1.
2. Provide at boot: `LO_STORAGE_KEY_V{N+1}` environment variable.
3. Update config: `active_version = N+1` (live reload).
4. New writes tagged V(N+1). Old reads use version byte to select key.
5. Optional: re-encrypt old segments (read with old key, write with new).
6. After migration: remove old key on next restart. **Warning**: if step 5 is skipped, removing the old key makes all blobs encrypted under that version permanently undecryptable — there is no grace period, soft delete, or recovery mechanism. The version byte in each blob identifies which key decrypts it; without that key in the keyring, `decrypt_blob` returns `AeadFailed`. Callers MUST either complete re-encryption before key removal or accept permanent data loss for unrewritten blobs.

**Key validation**: `StorageKey::new(version, key_bytes)` rejects version 0 with `UnsupportedVersion` (version 0 is reserved — consistent with encountering version 0 in a blob's version byte during decryption) and rejects all-zero key material with `InvalidData` via constant-time comparison (`ct_eq`). An all-zero key is never legitimate — with the key known to any observer, the XChaCha20 keystream is publicly computable and all encrypted blobs are trivially decryptable. (Nonces are CSPRNG-generated independently of the key value; the rejection is not about nonce determinism.) **Caller zeroization obligation**: `key_bytes` is `[u8; 32]` — a `Copy` type. `StorageKey::new` receives a bitwise copy of the caller's array and zeroizes its own copy on rejection paths. The caller's original array is a separate copy that is NOT zeroized by the library. After calling `StorageKey::new`, the caller MUST explicitly zeroize their copy of `key_bytes` (Rust: `key_bytes.zeroize()`; C: `soliton_zeroize(key_ptr, 32)`). The general `[u8; N]` Copy zeroization pattern is described in §10.5; `StorageKey::new` is a specific instance of it.
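
A sketch of the two `StorageKey::new` checks plus the caller-side wipe. The reference implementation uses `ct_eq` for the zero check; here an OR-accumulation over all 32 bytes stands in for it (no early exit, though the standard library makes no constant-time guarantee). `wipe` uses volatile writes so the compiler cannot drop the zeroing as a dead store. All names are hypothetical.

```rust
use std::sync::atomic::{compiler_fence, Ordering};

#[derive(Debug, PartialEq)]
pub enum Error {
    UnsupportedVersion,
    InvalidData,
}

/// ORs every byte together without early exit; zero only if all bytes are zero.
fn is_all_zero(key: &[u8; 32]) -> bool {
    key.iter().fold(0u8, |acc, &b| acc | b) == 0
}

/// Hypothetical mirror of the StorageKey::new validation rules.
pub fn validate_storage_key(version: u8, key: &[u8; 32]) -> Result<(), Error> {
    if version == 0 {
        return Err(Error::UnsupportedVersion); // version 0 is reserved
    }
    if is_all_zero(key) {
        return Err(Error::InvalidData); // all-zero key is never legitimate
    }
    Ok(())
}

/// Zeroizes the caller's own copy of the key bytes after handing them over.
pub fn wipe(buf: &mut [u8; 32]) {
    for b in buf.iter_mut() {
        // Volatile writes are not elided by the optimizer.
        unsafe { std::ptr::write_volatile(b, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}
```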

**`add_key` atomicity — active_version update and map insert**: The `add_key` function sets `active_version = version` and inserts the key into the map. **The Rust reference implementation assigns `active_version` BEFORE the map insert** — this is safe in Rust because `HashMap::insert` is infallible (it cannot throw or return an error). **Go's `map[V]K` insert and CPython's `dict` insert are also infallible** — the assign-before-insert ordering is safe in both and requires no reordering. **Reimplementers in languages where map insert CAN throw or fail (Java `HashMap`, C# `Dictionary`, custom `MutableMapping` subclasses in Python) MUST use the opposite ordering**: assign `active_version` only AFTER the insert returns success. In these languages, if the map insert throws after the `active_version` assignment, `active_version` points to a missing key, breaking the `active_key()` never-`None` invariant — every subsequent `encrypt_blob` call returns `InvalidData` with no diagnostic. The safe rule: assign `active_version` after insert in any language where the map insert can raise an exception or return an error; assign before insert only when insert is unconditionally infallible.

Keys are held in process memory for the lifetime of the `StorageKeyRing` object and are zeroized on drop: when the `StorageKeyRing` is dropped (end of scope in Rust; explicit destruction in CAPI via the free function), all key material in the key list is zeroized before deallocation. CAPI callers MUST call the keyring free function after use; failing to do so leaks key material (the keyring is never released through the zeroizing path, so the key bytes remain in process memory un-zeroed). Keys are not persisted automatically; the caller is responsible for reloading keys (e.g., from environment variables or a secrets manager) at startup. Environment variables should be cleared from the process environment immediately after reading.

**Security risk of retaining old keys**: Step 6 of the rotation procedure (removing the old key) is a security obligation, not only a hygiene concern. An old key that remains in the keyring — even after all blobs have been re-encrypted under the new key — provides an attacker who later compromises that old key with the ability to substitute old blobs back into storage. If an attacker has write access to the blob store and possesses a compromised old key, they can replace new-version blobs with old-version blobs encrypted under the compromised key; the keyring will decrypt them successfully (the version byte routes to the retained old key). Retaining old keys long after migration extends the window during which a key compromise enables replay of stale content. The recommended pattern: remove old keys promptly after re-encryption is complete, and treat the `Ok(false)` return from `remove_key` (key was already absent) as confirmation, not an error.

**`remove_key` behavior**: Removes a key version from the keyring's in-memory list and returns `Ok(true)` if the key was present, `Ok(false)` if it was absent (idempotent — removing a non-existent version is a no-op, not an error). `remove_key(0)` returns `UnsupportedVersion` (version 0 is reserved — same as `StorageKey::new(0)`). The active version cannot be removed (returns `InvalidData`). This is immediate in-memory removal, not deferred deletion — subsequent decrypt calls using that version will return `AeadFailed` (not `UnsupportedVersion`, to prevent error-oracle attacks that could distinguish "removed key" from "corrupted ciphertext"). Version tracking is the caller's responsibility; the keyring does not persist removal history.

**`add_key` return value**: `add_key(version, key_bytes, make_active)` returns `Ok(true)` if a key at that version already existed and was replaced (the prior material is destroyed, so any blobs encrypted under it are permanently undecryptable), and `Ok(false)` if the version was new. `make_active=false` with a `version` matching the current active version returns `InvalidData` (see §13.2 for the CAPI note). The `Ok(true)` case is a silent overwrite of key material, and the return value reports it only after the old material is gone: callers who must not overwrite an existing key should confirm that the version is unused before calling `add_key`.
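
The `add_key` and `remove_key` rules above can be sketched with a plain `HashMap`. This is a hypothetical stand-in, not the `StorageKeyRing` API: it stores raw arrays instead of `StorageKey`, assumes `add_key` applies the same version-0 and all-zero checks as `StorageKey::new`, and omits zeroization of replaced or removed material. It inserts before updating `active_version`, the ordering that is safe in every language.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Error {
    UnsupportedVersion,
    InvalidData,
}

/// Hypothetical keyring stand-in (zeroization omitted for brevity).
pub struct KeyRing {
    keys: HashMap<u8, [u8; 32]>,
    active_version: u8,
}

fn check(version: u8, key: &[u8; 32]) -> Result<(), Error> {
    if version == 0 {
        return Err(Error::UnsupportedVersion);
    }
    if key.iter().fold(0u8, |a, &b| a | b) == 0 {
        return Err(Error::InvalidData);
    }
    Ok(())
}

impl KeyRing {
    /// Constructed with an initial key, so active_key() is never None.
    pub fn new(version: u8, key: [u8; 32]) -> Result<Self, Error> {
        check(version, &key)?;
        let mut keys = HashMap::new();
        keys.insert(version, key);
        Ok(KeyRing { keys, active_version: version })
    }

    /// Ok(true) if an existing key at `version` was replaced, Ok(false) if new.
    pub fn add_key(&mut self, version: u8, key: [u8; 32], make_active: bool) -> Result<bool, Error> {
        check(version, &key)?;
        if !make_active && version == self.active_version {
            return Err(Error::InvalidData);
        }
        // Insert first, then move the active pointer.
        let replaced = self.keys.insert(version, key).is_some();
        if make_active {
            self.active_version = version;
        }
        Ok(replaced)
    }

    /// Ok(true) if removed, Ok(false) if absent; the active version is protected.
    pub fn remove_key(&mut self, version: u8) -> Result<bool, Error> {
        if version == 0 {
            return Err(Error::UnsupportedVersion);
        }
        if version == self.active_version {
            return Err(Error::InvalidData);
        }
        Ok(self.keys.remove(&version).is_some())
    }

    pub fn active_key(&self) -> Option<(u8, &[u8; 32])> {
        self.keys.get(&self.active_version).map(|k| (self.active_version, k))
    }
}
```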

**`add_key(make_active=true)` with the current active version replaces key material in-place**: When `make_active=true` and `version` equals the current active version, `add_key` succeeds (`Ok(true)`) and replaces the active key's material with the new bytes. The `make_active=false`-with-active-version guard does NOT fire in this case (that guard prevents the specific inconsistency of changing the active pointer without updating the material). The result: every subsequent `encrypt_blob` call uses the new key material, and any blobs previously encrypted with the old material at that version are permanently undecryptable — the old material is destroyed in-place with no grace period. This operation is only safe when all blobs at that version have already been migrated (re-encrypted) under a different version. The recommended rotation pattern (§11.6 step 3) avoids this hazard by always using a new version number when adding a replacement key.

**`StorageKeyRing` thread safety model**: `StorageKeyRing` auto-derives `Send + Sync` (all fields are `Send + Sync`) but is **not designed for concurrent access**. There is no internal `Mutex` — the struct is a plain `HashMap<u8, StorageKey>` with an `active_version: u8` field. Mutating operations (`add_key`, `remove_key`) take `&mut self`; concurrent mutation requires exclusive access enforced by the caller, not the library. `encrypt_blob` and `decrypt_blob` do NOT take a `&StorageKeyRing` reference — the caller retrieves the active key via `ring.active_key()` and passes the resulting `&StorageKey` directly to `encrypt_blob`. The CAPI `SolitonKeyRing` wrapper adds an `AtomicBool` reentrancy guard that returns `ConcurrentAccess` (-18) on re-entrant calls from a single thread — this is a single-thread reentrancy guard, not a multi-thread Mutex. **Correct concurrent model for reimplementers**: wrap `StorageKeyRing` in a caller-owned `Mutex<StorageKeyRing>`. Encrypt/decrypt and key management all require exclusive lock acquisition, since getting the key reference (`active_key()`) and passing it to `encrypt_blob` are two separate operations that must not be split by a concurrent `add_key`/`remove_key`. A `RwLock` where encrypt/decrypt acquire read locks and key management acquires a write lock is unsafe because `active_key()` returns a reference into the inner `HashMap` — the reference is invalidated if any write-lock operation (`add_key`/`remove_key`) rehashes the map.
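
The recommended model holds one lock across the key lookup and the encryption. A sketch using `std::sync::Mutex`; `KeyRing`, `encrypt_with_active`, and `rotate` are hypothetical names, and the `encrypt` closure stands in for a call to `encrypt_blob`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Minimal hypothetical stand-in for StorageKeyRing.
pub struct KeyRing {
    pub keys: HashMap<u8, [u8; 32]>,
    pub active_version: u8,
}

/// Looks up the active key and runs `encrypt` under a single lock
/// acquisition, so no key-management call can run between the two steps.
pub fn encrypt_with_active<T>(
    ring: &Mutex<KeyRing>,
    encrypt: impl FnOnce(u8, &[u8; 32]) -> T,
) -> Option<T> {
    let guard = ring.lock().ok()?;
    let key = guard.keys.get(&guard.active_version)?;
    Some(encrypt(guard.active_version, key))
    // The guard is released here, after encryption has finished.
}

/// Key management takes the same lock (insert, then switch active version).
pub fn rotate(ring: &Mutex<KeyRing>, version: u8, key: [u8; 32]) -> bool {
    match ring.lock() {
        Ok(mut guard) => {
            guard.keys.insert(version, key);
            guard.active_version = version;
            true
        }
        Err(_) => false,
    }
}
```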

**`active_key()` never-None invariant**: The keyring is constructed with an initial key (`new(version, key_bytes)`), `remove_key` rejects removal of the active version, and `add_key(make_active: true)` atomically replaces the active version. These three constraints ensure `active_key()` always returns `Some`. In Rust, the `Option<&StorageKey>` return type is never `None` by construction. Binding authors implementing a keyring outside Rust's type system MUST maintain this invariant — if violated, `encrypt_blob` has no key to encrypt with and returns `InvalidData` with no diagnostic pointing to the empty keyring. The invariant should be checked at construction time (reject an empty or zero-version keyring) rather than at each `encrypt_blob` call.

---

## 12. Error Types

All soliton operations return a `Result<T, Error>`. The following error variants are defined:

| Variant | C code | Meaning | Recoverability |
|---------|--------|---------|----------------|
| `InvalidLength` | -1 | Input has wrong size for the expected key or buffer type. **Rust struct variant**: `InvalidLength { expected: usize, got: usize }` — NOT a unit variant. Pattern-matching must bind both fields (or use `..`); constructing it requires both fields. The error message is `"invalid length: expected {expected}, got {got}"`. Internal truncation errors that would expose parser offset information use `InvalidData` instead, not `InvalidLength` — `InvalidLength` is reserved for caller-supplied parameters that don't match a known fixed size. | Caller bug |
| `DecapsulationFailed` | -2 | KEM decapsulation failed — unreachable in lo-crypto-v1 (see note below). Exists for forward compatibility with explicit-rejection KEMs. | Retry-safe (decrypt) |
| `VerificationFailed` | -3 | Signature verification failed | Retry-safe |
| `AeadFailed` | -4 | AEAD decryption failed (wrong key, tampered ciphertext, or wrong AAD) | Session-fatal (encrypt); retry-safe (decrypt) |
| `BundleVerificationFailed` | -5 | Pre-key bundle IK mismatch or signature invalid | Retry-safe (fetch new bundle) |
| `TooManySkipped` | -6 | *(reserved — was skip cache overflow, removed in counter-mode redesign)* | — |
| `DuplicateMessage` | -7 | Message counter already in recv_seen (already decrypted). **Caller guidance**: silently discard the duplicate; no application-level notification, no retry. **MUST NOT** surface this error to the message sender: an attacker who can distinguish `DuplicateMessage` from `AeadFailed` gains a membership oracle on the receiver's `recv_seen` set, enabling counter-by-counter probing of which messages have been decrypted. Treat as an opaque "already delivered" signal at the transport layer. | Retry-safe (state unchanged) |
| *(reserved)* | -8 | Was `SkippedKeyNotFound`, removed | — |
| `AlgorithmDisabled` | -9 | *(reserved — intended for platform-specific algorithm availability; currently unused)* | — |
| `UnsupportedVersion` | -10 | Serialized blob has unknown version byte. **Source functions**: `from_bytes`/`from_bytes_with_min_epoch` (ratchet blob version ≠ 0x01); `stream_decrypt_init` (stream header version ≠ 0x01); `StorageKey::new` (key version = 0 is reserved). **Not** returned from `decrypt_blob` for unknown blob version — that case collapses to `AeadFailed` (version-enumeration oracle). | Permanent |
| `DecompressionFailed` | -11 | Zstandard decompression failed, or decompressed size exceeds 256 MiB (collapsed to `AeadFailed` at trust boundaries — see note below) | Retry-safe |
| `Internal` | -12 | Structurally unreachable internal error. Also returned from `stream_encrypt_chunk` if zstd output exceeds `plaintext.len() + STREAM_ZSTD_OVERHEAD` (§15.11). The overhead ceiling is additive over the actual plaintext length, not over `CHUNK_SIZE`: a 100-byte final chunk that compresses to more than 356 bytes (100 + 256) triggers `Internal`; its ceiling is not 1 MiB + 256. Encrypt-side only (no oracle concern), not session-fatal. **Recovery requires a full stream restart**: `compress` is fixed at `stream_encrypt_init`, and there is no per-chunk `compress` parameter to toggle. Retrying the same chunk on the same encryptor fails deterministically (the zstd output for that plaintext is fixed). Recovery requires abandoning the current encryptor, creating a new one via `stream_encrypt_init` with `compress = false`, and re-encrypting the stream from the beginning. `encrypt()` does **not** return `Internal` on CSPRNG failure: `random_bytes()` panics when the OS CSPRNG is unavailable, and `panic = "abort"` turns that panic into process termination, so no error is propagated to the caller. CSPRNG failure is treated as non-recoverable by design: there is no safe fallback from an unusable entropy source, and aborting is preferable to silently deriving keys from predictable "random" bytes. Also returned from `soliton_argon2id` if the underlying argon2 library returns an unexpected error not mappable to any other variant; this is structurally unreachable with a correct argon2 implementation and valid parameters. | Context-dependent (see notes) |
| `NullPointer` | -13 | CAPI null pointer argument (C ABI only) | Caller bug |
| `UnsupportedFlags` | -14 | Reserved for storage blobs with reserved flag bits set. **Never constructed in the current implementation** — reserved-flag rejections are collapsed directly to `AeadFailed` without producing this variant (see oracle-collapse note below). Retained solely for ABI stability: error code -14 MUST NOT be reassigned. | Permanent |
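
A sketch of the code mapping for the variants in the table above, including the struct-variant form of `InvalidLength` and its message format. It is hypothetical and partial: reserved codes (-6, -8, -9) are omitted and must never be reassigned, and variants defined elsewhere in this specification are not shown.

```rust
use std::fmt;

/// Hypothetical, partial mirror of the error table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidLength { expected: usize, got: usize },
    DecapsulationFailed,
    VerificationFailed,
    AeadFailed,
    BundleVerificationFailed,
    DuplicateMessage,
    UnsupportedVersion,
    DecompressionFailed,
    Internal,
    NullPointer,
    UnsupportedFlags,
}

impl Error {
    /// C ABI code for each variant.
    pub fn code(self) -> i32 {
        match self {
            Error::InvalidLength { .. } => -1,
            Error::DecapsulationFailed => -2,
            Error::VerificationFailed => -3,
            Error::AeadFailed => -4,
            Error::BundleVerificationFailed => -5,
            // -6, -8 and -9 are reserved and never reassigned.
            Error::DuplicateMessage => -7,
            Error::UnsupportedVersion => -10,
            Error::DecompressionFailed => -11,
            Error::Internal => -12,
            Error::NullPointer => -13,
            Error::UnsupportedFlags => -14,
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidLength { expected, got } => {
                write!(f, "invalid length: expected {}, got {}", expected, got)
            }
            other => write!(f, "{:?}", other),
        }
    }
}
```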

> **Error oracle collapse (defense-in-depth):** In the decrypt path, `DecompressionFailed` and `UnsupportedFlags` are collapsed to `AeadFailed` before returning to the caller. Distinct error codes for post-AEAD parsing steps would let an attacker distinguish "AEAD passed but decompression failed" from "AEAD failed," leaking a decryption oracle. The CAPI maps both to `SOLITON_ERR_AEAD (-4)`. The distinct codes above are retained solely for ABI stability and are never exposed across trust boundaries. `UnsupportedFlags` is never constructed in the current implementation — reserved-flag rejections are mapped directly to `AeadFailed`.
>
> **Consolidated collapse table:**
>
> | Internal Error | Exposed As | Context | Reason |
> |---|---|---|---|
> | `DecompressionFailed` | `AeadFailed` | Storage (§11.3) | Post-AEAD parsing oracle |
> | `UnsupportedFlags` (reserved bits) | `AeadFailed` | Storage (§11.3) | Reserved-bit oracle on pre-AEAD header field (flags byte is parsed before AEAD; a distinct error leaks that the rejection was structural, not cryptographic) |
> | `DecompressionFailed` | `AeadFailed` | Streaming (§15.7) | Post-AEAD decompression oracle |
> | Reserved flag bits (stream header) | `AeadFailed` | Streaming (§15.8) | Header field oracle (attacker-controlled) |
> | Size mismatch after decompress | `AeadFailed` | Streaming (§15.7) | Post-AEAD size oracle |
> | Key version not in keyring at decrypt time | `AeadFailed` | Storage decrypt (§11.3) | Version-enumeration oracle — returning `UnsupportedVersion` for an unregistered version byte would let an attacker distinguish "key not loaded" from "wrong ciphertext" |
> | Undersize ciphertext (< 16 bytes) | `AeadFailed` | AEAD decrypt (§7.1) | Too-short-vs-bad-tag oracle — using `InvalidLength` would let an attacker distinguish "ciphertext shorter than Poly1305 tag" from "valid-length but wrong tag" |
> | Storage blob shorter than 42 bytes | `AeadFailed` | Storage decrypt (§11.1) | Pre-AEAD framing oracle — 42 bytes is the minimum valid blob (26-byte header + 16-byte Poly1305 tag); using `InvalidLength` or `InvalidData` would let an attacker distinguish "blob too short to contain valid ciphertext" from "plausible-length blob with wrong key/tag" |
> | `ChainExhausted` from `from_bytes` | `ChainExhausted` (not `InvalidData`) | Deserialization (§6.8) | The blob's stored epoch is `u64::MAX` — the resulting state cannot be re-serialized (`to_bytes` would overflow on `epoch + 1`). States with stored epoch `u64::MAX - 1` are accepted but `can_serialize()` returns false, preventing `to_bytes` from producing a zombie blob. This is a counter-exhaustion condition, not a format error. |
> | Streaming chunk input shorter than `STREAM_CHUNK_OVERHEAD` = 17 bytes | `AeadFailed` | Streaming decrypt (§15) | Pre-AEAD framing oracle — a 17-byte minimum is `tag_byte (1) + Poly1305 tag (16)`, the smallest structurally valid chunk with zero-length plaintext. Returning `InvalidLength` or `InvalidData` for a sub-17-byte chunk would let an attacker distinguish "chunk too short to attempt AEAD" from "plausible-length chunk with wrong key/tag." This parallels the "Undersize ciphertext (< 16 bytes) → AeadFailed" rule for raw AEAD, with the streaming layer adding 1 byte for the tag_byte. **Note**: §15.7 describes the oracle-collapse scope as "post-authentication errors" — this pre-AEAD check is the streaming-layer analogue of the raw AEAD undersize collapse, not a post-auth error; both collapse to `AeadFailed` for the same oracle-prevention reason. |
>
> | `DuplicateMessage` | `AeadFailed` (toward sender) | Ratchet decrypt (§6.6) | Replay-detection oracle — an attacker who can distinguish `DuplicateMessage` from `AeadFailed` gains a membership oracle on the receiver's `recv_seen` set, enabling byte-by-byte probing of which counters have been decrypted. `DuplicateMessage` MUST NOT be surfaced to the message sender; the transport layer MUST treat it as an opaque "already delivered" signal and silently discard the duplicate. |
>
> **Not collapsed** (checked on public/pre-AEAD data): `UnsupportedVersion` (version byte is cleartext), `InvalidData` for pre-AEAD framing checks (chunk wire length is observable).
| `ChainExhausted` | -15 | Five distinct recoverability modes: **(1) Encrypt-side** (send_count at u32::MAX): session-fatal for the send direction — no more messages can be sent. **Source: `encrypt()` / `soliton_ratchet_encrypt` only.** **(2) Decrypt-side recv_seen saturation** (§6.8): transient — the `recv_seen` or `prev_recv_seen` set is full (65536 entries); the cap resets on the next KEM ratchet step (peer triggers direction change). A caller who treats all `ChainExhausted` from `decrypt()` as session-fatal will terminate a recoverable session. **Source: `decrypt()` / `soliton_ratchet_decrypt` only.** **(3) Serialization epoch overflow** (`to_bytes` at epoch u64::MAX, §6.8 guard 24): persistence-fatal — the in-memory session remains functional for send/receive but can never be serialized again. **Source: `to_bytes()` / `soliton_ratchet_to_bytes` only.** Also returned by `from_bytes()` / `soliton_ratchet_from_bytes_with_min_epoch` when the deserialized epoch equals `u64::MAX` (guard 24 — the session cannot be serialized again; §6.8). **(4) Call chain advance limit** (§6.12): `CallKeys::advance()` returns `ChainExhausted` after 2²⁴ steps; the call session is permanently exhausted and a new `derive_call_keys()` call is required. Unrelated to ratchet message counters. **Source: `CallKeys::advance()` / `soliton_call_keys_advance` only.** **(5) Streaming chunk index exhaustion** (§15.9): returned by `encrypt_chunk` or `decrypt_chunk` (sequential) when `next_index == u64::MAX`. Not session-fatal — the handle is still valid and can be freed normally. Distinct from the ratchet modes: a streaming `ChainExhausted` does NOT indicate any ratchet problem. **Source: `soliton_stream_encrypt_chunk` / `soliton_stream_decrypt_chunk` only; `soliton_stream_decrypt_chunk_at` never returns this.** | See per-mode description |
| `UnsupportedCryptoVersion` | -16 | `crypto_version` field in a session init is not "lo-crypto-v1". **Source functions**: `decode_session_init` and `receive_session`. `receive_session` (§5.5 Step 1) performs its own `crypto_version` check against the parsed `SessionInit` before signature verification — it returns `UnsupportedCryptoVersion` directly (not collapsed, because §5.5's checked values are public; see the error-collapsing note in §5.5). `decode_session_init` returns it during wire-format parsing. Not returned by `verify_bundle` — a wrong `crypto_version` in a pre-key bundle is collapsed to `BundleVerificationFailed` (§5.3 error-collapsing paragraph) along with fingerprint mismatches and signature failures, to prevent enumeration of which check failed. Not returned by the ratchet or storage layers. A binding author who pattern-matches for `UnsupportedCryptoVersion` from `decode_session_init` but not from `receive_session` will miss the second source. | Permanent |
| `InvalidData` | -17 | Structural violation in serialized data or caller protocol misuse. Covers: bad marker bytes, co-presence errors, implausible values in deserialized blobs (ratchet, session-init); and caller misuse on the streaming API — calling `encrypt_chunk` or `decrypt_chunk` after finalization, passing a wrong-size non-final chunk (uncompressed), or passing an oversized final chunk plaintext. Binding authors MUST NOT assume this error always indicates corrupt received data; it may indicate a caller-side state machine bug. | Retry-safe |
| `ConcurrentAccess` | -18 | Opaque handle is being freed while another thread holds a reference (CAPI-only — not present in the core `Error` enum; exists only as a CAPI error code) | Caller bug |
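
The collapse rules above can be pictured as a single mapping applied at the public boundary. The sketch below is a minimal illustration, not the library's code: the `decrypt_failure` enum, the helper names, and the split of length checks into separate functions are assumptions. Only the numeric codes `-4` (`AeadFailed`) and `-17` (`InvalidData`) and the 42-byte and 17-byte minimums are taken from the tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Public codes used here (values from the error table). */
#define ERR_AEAD_FAILED   (-4)
#define ERR_INVALID_DATA (-17)

/* Minimum sizes from the collapse table. */
#define STORAGE_MIN_BLOB      42u /* 26-byte header + 16-byte Poly1305 tag */
#define STREAM_CHUNK_OVERHEAD 17u /* tag_byte + 16-byte Poly1305 tag */

/* Hypothetical internal failure reasons on a storage/streaming decrypt path. */
typedef enum {
    FAIL_NONE = 0,
    FAIL_BLOB_TOO_SHORT,      /* storage blob < 42 bytes */
    FAIL_CHUNK_TOO_SHORT,     /* streaming chunk < 17 bytes */
    FAIL_RESERVED_FLAGS,      /* reserved flag bits set in a header */
    FAIL_UNKNOWN_KEY_VERSION, /* key version not in the keyring */
    FAIL_TAG_MISMATCH,        /* Poly1305 verification failed */
    FAIL_DECOMPRESSION,       /* zstd failed after AEAD succeeded */
    FAIL_SIZE_MISMATCH,       /* decompressed size differs from expected */
    FAIL_PUBLIC_FRAMING       /* framing fact an observer already sees (not collapsed) */
} decrypt_failure;

decrypt_failure check_storage_blob_len(size_t blob_len) {
    return blob_len < STORAGE_MIN_BLOB ? FAIL_BLOB_TOO_SHORT : FAIL_NONE;
}

decrypt_failure check_stream_chunk_len(size_t chunk_len) {
    return chunk_len < STREAM_CHUNK_OVERHEAD ? FAIL_CHUNK_TOO_SHORT : FAIL_NONE;
}

/* Every oracle-sensitive reason leaves the boundary as the same public code. */
int32_t collapse_decrypt_error(decrypt_failure f) {
    switch (f) {
    case FAIL_NONE:
        return 0;
    case FAIL_PUBLIC_FRAMING:
        return ERR_INVALID_DATA;
    default:
        return ERR_AEAD_FAILED;
    }
}
```

Collapsing in one function has a useful default: a reason added later falls into the `default` branch and is reported as `AeadFailed` unless someone deliberately classifies it as public.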

**`DecapsulationFailed` is unreachable in lo-crypto-v1**: both sites that could produce this variant are blocked:

1. **`encode_ratchet_header` / `decode_ratchet_header`**: Any KEM ciphertext (`kem_ct`) with the wrong size (not exactly 1120 bytes) is rejected as `InvalidData` during header parsing, before X-Wing decapsulation is attempted. A malformed ciphertext never reaches `xwing::decapsulate`.

2. **`xwing::decapsulate` itself**: X-Wing inherits ML-KEM's implicit rejection (§8.4). If the Fujisaki-Okamoto re-encryption check fails (re-encrypting the decrypted message does not reproduce the received ciphertext), ML-KEM decapsulation returns the pseudo-random rejection key K̄ = J(z ‖ c) = SHAKE256(z ‖ c, 32) instead of an error, where z is the secret rejection seed and c is the ciphertext (FIPS 203). X25519 always produces a result (an all-zero output is handled separately by the low-order point check). `xwing::decapsulate` therefore always returns `Ok(shared_secret)`, never `Err(DecapsulationFailed)`.
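
The last step of ML-KEM decapsulation is what makes this path infallible: both candidate keys exist, and one is chosen without a data-dependent branch. The sketch below shows only that selection. It assumes the caller has already produced the re-encrypted ciphertext, the candidate key K' and the rejection key K̄ = SHAKE256(z ‖ c, 32); the function names are hypothetical, and this is not the code of X-Wing or the `ml-kem` crate.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a and b are equal, 0 otherwise, without exiting early on a mismatch. */
uint8_t ct_bytes_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    /* diff == 0: (0 - 1) wraps and bit 8 is set -> 1; diff in 1..255: bit 8 clear -> 0 */
    return (uint8_t)((((uint32_t)diff - 1u) >> 8) & 1u);
}

/* Implicit rejection: out = k_prime if re-encryption matched, else k_bar. Never fails. */
void implicit_rejection_select(uint8_t out[32],
                               const uint8_t *ct, const uint8_t *ct_reencrypted,
                               size_t ct_len,
                               const uint8_t k_prime[32], const uint8_t k_bar[32]) {
    uint8_t match = ct_bytes_equal(ct, ct_reencrypted, ct_len);
    uint8_t mask = (uint8_t)(0u - (unsigned)match); /* 0xFF on match, 0x00 otherwise */
    for (size_t i = 0; i < 32; i++)
        out[i] = (uint8_t)((k_prime[i] & mask) | (k_bar[i] & (uint8_t)~mask));
}
```

Because the function has no error return, the only observable result of a malformed ciphertext is an unrelated shared secret, which later surfaces as an AEAD failure. Production implementations also guard against compilers reintroducing branches in such code; this sketch does not.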

The variant is retained in the enum for ABI stability (-2 is reserved) and for future explicit-rejection KEMs. Binding authors may safely treat `DecapsulationFailed` from the current library as `Internal` — it indicates a logic error, not a recoverable condition.

**`Error` is `#[non_exhaustive]`**: The `Error` enum is marked `#[non_exhaustive]` in Rust, so match arms outside the defining crate must include a catch-all (`_ => ...`). Binding authors and application code MUST NOT exhaustively match on error codes either: a future version may add new variants without incrementing the library's major version. New variants MUST get new numeric codes (never reuse reserved slots). An enum-level `#[non_exhaustive]` only forbids exhaustive matching; it does not stop downstream crates from constructing `Error` values, so bindings should obtain errors from the library's entry points rather than synthesizing them.

**Recoverability key**: *Retry-safe* — the operation can be retried or the message dropped; ratchet state is unchanged on error. *Session-fatal* — the session (or encrypt direction) is permanently broken; `AeadFailed` on encrypt triggers full key zeroization as defense-in-depth, making the session irrecoverable. *Permanent* — the error reflects a capability or format gap, not a transient condition. *Caller bug* — indicates a programming error in the calling code.
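
Because new codes may appear in any release, a binding's error classifier needs an explicit catch-all. A minimal sketch, assuming a hypothetical `recoverability` enum owned by the binding: it covers only the codes whose recoverability is given in this part of §12 (treating `DecapsulationFailed` like `Internal`, as recommended above) and routes everything else, including future codes, to an explicit unknown bucket.

```c
#include <stdint.h>

/* Hypothetical binding-side classification; names are not part of the CAPI. */
typedef enum {
    RECOVER_SUCCESS,
    RECOVER_RETRY_SAFE,
    RECOVER_SESSION_FATAL,
    RECOVER_PERMANENT,
    RECOVER_CALLER_BUG,
    RECOVER_CONTEXT_DEPENDENT, /* caller must consider which function returned it */
    RECOVER_UNKNOWN            /* code not known to this binding version */
} recoverability;

recoverability classify_error(int32_t rc) {
    switch (rc) {
    case 0:   return RECOVER_SUCCESS;
    case -2:  return RECOVER_CONTEXT_DEPENDENT; /* DecapsulationFailed: treat as Internal */
    case -12: return RECOVER_CONTEXT_DEPENDENT; /* Internal */
    case -13: return RECOVER_CALLER_BUG;        /* NullPointer */
    case -14: return RECOVER_PERMANENT;         /* UnsupportedFlags (never constructed) */
    case -15: return RECOVER_CONTEXT_DEPENDENT; /* ChainExhausted: five modes */
    case -16: return RECOVER_PERMANENT;         /* UnsupportedCryptoVersion */
    case -17: return RECOVER_RETRY_SAFE;        /* InvalidData */
    case -18: return RECOVER_CALLER_BUG;        /* ConcurrentAccess */
    default:  return RECOVER_UNKNOWN;           /* never assume the list is complete */
    }
}
```

The context-dependent bucket exists because `ChainExhausted` and `Internal` cannot be classified from the code alone; the same value means different things from `encrypt()`, `decrypt()`, `to_bytes()`, or a streaming call.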

`InvalidLength` is for type-level size mismatches (wrong key size, wrong ciphertext size). `InvalidData` is for structural content violations (bad format, co-presence invariant broken, implausible values). The distinction matters for diagnostics.

**`InvalidData` from `_free` functions means wrong handle type, not blob corruption**: When `soliton_ratchet_free`, `soliton_keyring_free`, `soliton_call_keys_free`, `soliton_stream_encrypt_free`, or `soliton_stream_decrypt_free` return `InvalidData` (-17), it indicates the opaque handle's internal type discriminant is wrong — the handle pointer belongs to a different handle type (e.g., a `SolitonKeyRing*` was passed to `soliton_ratchet_free`). This is distinct from `InvalidData` returned by decryption or deserialization functions, where it means structurally invalid content. Binding authors writing diagnostic or error-handling code for `_free` functions should map `InvalidData` to "handle type mismatch" rather than "corrupted data."

**Error code ABI stability**: Once a numeric error code is assigned (e.g., `-6` for `TooManySkipped`), that code is reserved forever — even if the error variant is removed or renamed. Binding authors hardcode these values in constants, switch statements, and documentation. Reassigning a code to a different error would silently change the meaning of existing bindings without compilation or test failures. Removed codes are marked "reserved" in the table above and must never be reused.

---

## 13. C ABI (soliton_capi)

### 13.1 Overview

`soliton_capi` exposes the core library as a stable C ABI (`extern "C"` functions). Direct consumers: Go (cgo), C# (P/Invoke), Dart (dart:ffi), C/C++. Swift and Kotlin consume the CAPI indirectly via UniFFI-generated wrappers. Node.js uses napi-rs (a Rust-native Node add-on API that does not call through the C ABI).

The generated header is `soliton.h`. It is produced by `cbindgen` and must not be edited manually.

### 13.2 Conventions

- **Return codes**: `0` = success, negative = error (see §12 for codes).
- **Caller-allocated output buffers**: Used when output size is a fixed compile-time constant (e.g., 32-byte fingerprints, 32-byte shared secrets). The caller passes a pre-allocated buffer; on error the buffer is zeroed.
- **Library-allocated buffers** (`SolitonBuf`): Used for variable-length outputs. Must be freed with `soliton_buf_free`. Never call `free()` directly on `ptr`.
- **Opaque heap objects** (`SolitonRatchet*`, `SolitonKeyRing*`): Allocated by the library, freed with their respective `_free` functions.
- **CSPRNG failure aborts the process**: All keygen and encapsulation operations that consume OS entropy (`getrandom(2)`, `ProcessPrng`, `getentropy`, etc.) abort the process on CSPRNG failure rather than returning an error code; there is no safe cryptographic fallback when randomness is unavailable. This behavior is by design and is not configurable. **Binding authors**: do NOT wrap CAPI calls in a catch-all exception handler or POSIX signal handler expecting to recover from the abort; it is deliberate, and the process state after a failed CSPRNG call is not safely continuable. **C++ callers**: no exception ever propagates out of these `extern "C"` functions (unwinding across the FFI boundary would be undefined behavior), so CSPRNG failure surfaces only as a process abort. A C++ wrapper that expects to intercept it with `try`/`catch`, a `std::terminate` handler, or a SIGABRT handler will mis-handle this.
- **All pointer arguments** must be non-null unless documented otherwise; null pointers return `NullPointer` (-13). The documented exceptions are:
  - **Empty plaintext (encrypt only)**: `soliton_stream_encrypt_chunk` accepts `plaintext = NULL` with `plaintext_len = 0`; this is the mechanism for producing an empty final chunk (a valid empty-file stream). `soliton_stream_decrypt_chunk` does NOT share this exception: its ciphertext input is named `chunk` (not `plaintext`), and a null `chunk` is always rejected with `NullPointer`, even with `chunk_len = 0`. Binding wrappers that add blanket non-null guards on the encrypt-side plaintext pointer silently break the empty-file use case (the null check fires before the zero-length check, returning `NullPointer` where the empty chunk would succeed). Wrappers that apply the same exception to the decrypt-side `chunk` pointer diverge from the reference, which returns `NullPointer` for a null chunk unconditionally.
  - **Empty AAD**: `soliton_stream_encrypt_init` and `soliton_stream_decrypt_init` accept `caller_aad = NULL` with `aad_len = 0`; this is the mechanism for streams with no additional authenticated data. The AAD then defaults to empty, and HMAC domain separation is provided by the stream key and base nonce. Binding wrappers that add blanket non-null guards on `caller_aad` return `NullPointer` for valid empty-AAD calls.
  - **Zero-length primitive inputs**: `soliton_hmac_sha3_256`, `soliton_hkdf_sha3_256`, and `soliton_argon2id` accept a NULL pointer for any input whose corresponding length field is 0: `key = NULL` with `key_len = 0` (HMAC with empty key), `data = NULL` with `data_len = 0` (HMAC/HKDF with empty data/IKM), `salt = NULL` with `salt_len = 0` (HKDF/Argon2id with empty salt), `password = NULL` with `password_len = 0` (Argon2id with empty password), and `info = NULL` with `info_len = 0` (HKDF with empty info). These are valid degenerate inputs to the underlying primitives: HMAC(key=∅, data), HKDF with empty IKM or salt, and Argon2id with an empty password are all well-defined by their respective RFCs. A null pointer with a nonzero length still returns `NullPointer`. Binding wrappers that add blanket non-null guards on these input pointers break primitive calls where the caller explicitly wants to derive from an empty string. This exception does NOT apply to output buffers, key parameters with implicit fixed sizes (e.g., the `key` in `soliton_aead_encrypt`), or any parameter not enumerated here.
- **Zero-length byte arrays**: Most CAPI functions reject non-null pointers with zero length as `InvalidLength`. **Exception (zero-length ciphertext to decrypt operations)**: `soliton_ratchet_decrypt`, `soliton_ratchet_decrypt_first`, `soliton_stream_decrypt_chunk`, and `soliton_stream_decrypt_chunk_at` return `AeadFailed` (not `InvalidLength`) for inputs shorter than their respective AEAD minimums (16 bytes for ratchet, 40 bytes for first-message, 17 bytes for streaming) — collapsing to `AeadFailed` prevents an oracle distinguishing "too short to attempt AEAD" from "wrong key." See §12 collapse table. `soliton_stream_decrypt_chunk_at` shares the same collapse because it calls the same underlying `decrypt_chunk_inner` path as `soliton_stream_decrypt_chunk`. Binding wrappers that add zero-length short-circuit guards on these ciphertext inputs may return `InvalidLength` where the library returns `AeadFailed`, breaking the oracle-collapse guarantee. **`soliton_aead_decrypt` with zero-length ciphertext is NOT in this exception**: `soliton_aead_decrypt` with `ciphertext_len = 0` returns `InvalidLength` (the CAPI zero-length guard fires before the core AEAD minimum check). `ciphertext_len` values 1-15 return `AeadFailed` (too short to contain the 16-byte Poly1305 tag, but non-zero length passes the CAPI guard). A reimplementer who applies the ratchet/stream pattern to `soliton_aead_decrypt` and returns `AeadFailed` for `len = 0` diverges from the reference.
- **Input size cap**: All CAPI functions reject any single input buffer exceeding 256 MiB (268,435,456 bytes) with `InvalidLength`. This is a defense-in-depth limit — no legitimate cryptographic input approaches this size, and rejecting oversized buffers early prevents downstream integer overflow or allocation-exhaustion issues in binding languages with unchecked size casts. **Exception — streaming chunk functions**: `soliton_stream_decrypt_chunk` and `soliton_stream_decrypt_chunk_at` do not apply a 256 MiB pre-check on the `chunk` input — chunk size is bounded structurally by `STREAM_CHUNK_STRIDE` (1,048,593 bytes) and the AEAD layer rejects any oversized input. A reimplementer who adds an explicit 256 MiB `InvalidLength` guard to streaming chunk functions introduces an observable divergence: the reference implementation returns `AeadFailed` for oversized chunks, not `InvalidLength`.
- **`crypto_version` string: null, empty, and non-UTF-8 produce different errors.** This applies to `soliton_kex_verify_bundle`, `soliton_kex_initiate`, `soliton_kex_decode_session_init`, and `soliton_kex_receive`, the only four CAPI functions that take a `crypto_version` parameter. `crypto_version` is passed as a null-terminated C string (`const char *`), not as a `(ptr, len)` pair. Three outcomes: (1) a null pointer returns `NullPointer` (-13); (2) a non-null pointer to a valid UTF-8 string other than `"lo-crypto-v1"` (including the empty string `""`) returns `UnsupportedCryptoVersion` (-16) from `soliton_kex_decode_session_init` and `soliton_kex_receive`, while `soliton_kex_verify_bundle` and `soliton_kex_initiate` collapse this case to `BundleVerificationFailed` (-5) as described in §13.4; (3) a non-null pointer whose bytes are not valid UTF-8 returns `InvalidData` (-17), because the CAPI's `CStr::from_ptr` → `to_str()` conversion fails before the version comparison runs, and that failure maps to `InvalidData`, not `UnsupportedCryptoVersion`. A reimplementer who pattern-matches on `UnsupportedCryptoVersion` to detect every "wrong version" input will miss the non-UTF-8 case, which surfaces as the unrelated-seeming `InvalidData`. This matters for bindings from runtimes whose string types may not be UTF-8 (Latin-1 in older Java contexts, arbitrary bytes in C `char` arrays). The null/empty distinction matters for binding languages that represent "absent" and "empty" differently (Python `None` vs `""`, Java `null` vs `""`, Swift `nil` vs `""`): the binding must pass a null pointer for "absent" and a pointer to a NUL byte for "empty." Bindings that convert `None`/`nil`/`null` to an empty C string (pointer to `'\0'`) will return `UnsupportedCryptoVersion` where `NullPointer` is expected, and vice versa.
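
The pointer and string rules above can be condensed into two validation helpers. This is a sketch under stated assumptions, not the CAPI source: `check_input` and `check_crypto_version` are hypothetical names, the `allow_null_if_empty` flag stands for the documented NULL-with-zero-length exceptions, a non-null zero-length input is assumed acceptable wherever that exception applies, and the streaming-chunk and decrypt-ciphertext special cases are not modeled. The version check follows the outcome order for `soliton_kex_decode_session_init` and `soliton_kex_receive`, and the UTF-8 test mirrors what Rust's `str::from_utf8` accepts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_INVALID_LENGTH              (-1)
#define ERR_NULL_POINTER               (-13)
#define ERR_UNSUPPORTED_CRYPTO_VERSION (-16)
#define ERR_INVALID_DATA               (-17)
#define MAX_INPUT_LEN ((size_t)256 * 1024 * 1024) /* 256 MiB cap */

/* Generic byte-input rule; allow_null_if_empty marks a documented exception. */
int32_t check_input(const uint8_t *ptr, size_t len, int allow_null_if_empty) {
    if (ptr == NULL)
        return (allow_null_if_empty && len == 0) ? 0 : ERR_NULL_POINTER;
    if (len == 0 && !allow_null_if_empty)
        return ERR_INVALID_LENGTH;
    if (len > MAX_INPUT_LEN)
        return ERR_INVALID_LENGTH;
    return 0;
}

/* Strict UTF-8 validation: rejects overlong forms, surrogates, and > U+10FFFF. */
int is_valid_utf8(const unsigned char *s) {
    while (*s) {
        unsigned char c = *s;
        size_t n;
        uint32_t cp;
        if (c < 0x80) { s++; continue; }
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;
        for (size_t i = 1; i <= n; i++) {
            if ((s[i] & 0xC0) != 0x80) /* also stops safely at the terminating NUL */
                return 0;
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        s += n + 1;
    }
    return 1;
}

/* Outcome order: NULL -> -13, bad UTF-8 -> -17, any other string (incl. "") -> -16. */
int32_t check_crypto_version(const char *crypto_version) {
    if (crypto_version == NULL)
        return ERR_NULL_POINTER;
    if (!is_valid_utf8((const unsigned char *)crypto_version))
        return ERR_INVALID_DATA;
    if (strcmp(crypto_version, "lo-crypto-v1") != 0)
        return ERR_UNSUPPORTED_CRYPTO_VERSION;
    return 0;
}
```

Checking UTF-8 before comparing is what makes a Latin-1 or raw-byte string surface as `InvalidData` rather than `UnsupportedCryptoVersion`.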

**Concurrency safety — stateless functions vs. opaque handles**: All primitive functions that take no opaque handles are safe to call concurrently from multiple threads without synchronization: `soliton_sha3_256`, `soliton_hmac_sha3_256`, `soliton_hmac_sha3_256_verify`, `soliton_hkdf_sha3_256`, `soliton_aead_encrypt`, `soliton_aead_decrypt`, `soliton_xwing_keygen`, `soliton_xwing_encapsulate`, `soliton_xwing_decapsulate`, `soliton_identity_sign`, `soliton_identity_verify`, `soliton_verification_phrase`, `soliton_random_bytes`, `soliton_argon2id`, `soliton_zeroize`. These functions have no internal mutable state — each call is fully independent. Opaque-handle functions (`soliton_ratchet_*`, `soliton_keyring_*`, `soliton_stream_*`, `soliton_kex_*`) require exclusive access per-handle; concurrent calls on the same handle are detected by the reentrancy guard and return `ConcurrentAccess` (-18).
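
The per-handle reentrancy guard can be modeled as a non-blocking try-lock. A sketch using C11 atomics, assuming a hypothetical `guarded_handle` layout (the real handles are opaque and the library's guard is internal): a second caller gets `ConcurrentAccess` (-18) immediately instead of waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define ERR_CONCURRENT_ACCESS (-18)

/* Hypothetical handle layout; real CAPI handles are opaque. */
typedef struct {
    atomic_bool in_use;
    uint64_t counter; /* stands in for the handle's mutable session state */
} guarded_handle;

void handle_init(guarded_handle *h) {
    atomic_init(&h->in_use, false);
    h->counter = 0;
}

/* Try to claim the handle; returns -18 instead of blocking if it is already held. */
int32_t handle_enter(guarded_handle *h) {
    bool expected = false;
    if (!atomic_compare_exchange_strong_explicit(&h->in_use, &expected, true,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        return ERR_CONCURRENT_ACCESS;
    return 0;
}

void handle_exit(guarded_handle *h) {
    atomic_store_explicit(&h->in_use, false, memory_order_release);
}

/* A guarded operation: mutate state only while the handle is held. */
int32_t handle_op(guarded_handle *h) {
    int32_t rc = handle_enter(h);
    if (rc != 0)
        return rc;
    h->counter++;
    handle_exit(h);
    return 0;
}
```

Because a guard of this shape fails instead of waiting, a binding that wants blocking semantics needs its own per-handle mutex around CAPI calls.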

### 13.3 Buffer Management

```c
typedef struct SolitonBuf {
    uint8_t *ptr;
    uintptr_t len;
} SolitonBuf;

// Free and zeroize a library-allocated buffer.
// Sets ptr = null, len = 0 after free. Double-free is safe (no-op).
void soliton_buf_free(SolitonBuf *buf);
```

All library-allocated buffers are zeroized before freeing, and the `ptr` and `len` fields are zeroed afterward, making double-free a safe no-op. **The `ptr` field MUST NOT be modified by the caller**: `soliton_buf_free` passes the stored `ptr` value directly to `free()`. If the caller advances `ptr` (e.g., `buf.ptr += n` to read from an offset), `soliton_buf_free` frees the advanced pointer instead of the original allocation, which is undefined behavior and typically corrupts the heap. Read through a separate local variable instead: `const uint8_t *p = buf.ptr; while (remaining > 0) { ... p++; remaining--; }`. The `len` field may be read but likewise MUST NOT be modified before the free call: `soliton_buf_free` does not use `len` during deallocation, but changing it breaks the "zeroed after free" invariant and may confuse callers who check `len` to detect freed state.
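
The buffer rules can be exercised without the library. The sketch below uses `demo_buf_free`, a hypothetical stand-in that follows the documented contract of `soliton_buf_free` (zeroize, free, then null `ptr` and zero `len`); real code must call `soliton_buf_free`. The stand-in relies on `len` to know how many bytes to zeroize, which may differ from the library's internal bookkeeping. The reader walks a separate cursor so `buf.ptr` is never moved.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct SolitonBuf {
    uint8_t *ptr;
    uintptr_t len;
} SolitonBuf;

/* Hypothetical stand-in for soliton_buf_free: zeroize, free, null out. */
void demo_buf_free(SolitonBuf *buf) {
    if (buf == NULL || buf->ptr == NULL)
        return; /* double free is a no-op */
    volatile uint8_t *v = buf->ptr; /* volatile writes are not elided */
    for (uintptr_t i = 0; i < buf->len; i++)
        v[i] = 0;
    free(buf->ptr); /* the original, unmodified pointer */
    buf->ptr = NULL;
    buf->len = 0;
}

/* Reads through a separate cursor so buf->ptr stays exactly as allocated. */
uint32_t sum_buf(const SolitonBuf *buf) {
    const uint8_t *p = buf->ptr;
    uintptr_t remaining = buf->len;
    uint32_t sum = 0;
    while (remaining > 0) {
        sum += *p;
        p++;
        remaining--;
    }
    return sum;
}
```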

All CAPI functions with output buffer parameters zero the output upfront (after null-pointer guard) before any computation, so outputs are always in a defined state even on error. **Exception — streaming chunk functions**: `soliton_stream_decrypt_chunk`, `soliton_stream_decrypt_chunk_at`, and `soliton_stream_encrypt_chunk` zero the output buffer on error paths only — on success, bytes in the output buffer beyond the written bytes (`out_written..out_len`) are NOT zeroed. The rationale: the output is ciphertext or plaintext (not secret material requiring zeroization), and the buffer may be as large as `CHUNK_SIZE` / `STREAM_ENCRYPT_MAX` (≈1 MiB); zeroing on success would waste cycles per chunk. Reimplementers MUST NOT rely on post-success-write bytes being zero — read `out_written` to determine the valid range.

**Caller-side zeroization**: For caller-owned buffers that held secret material (e.g., chain keys copied out of `soliton_ratchet_encrypt_first`), use `soliton_zeroize(ptr, len)`, a volatile-write zeroing function guaranteed not to be optimized out by the compiler. Standard C `memset` may be elided if the buffer is not read afterward. **Managed-runtime caveat**: In languages with garbage collection (Go, Python, C#, Dart), the runtime may relocate heap objects between the last use of the buffer and the `soliton_zeroize` call, leaving a copy of the secret material at the old address. Volatile writes to the relocated buffer zeroize only the new location; the old copy survives in memory with no reference left to find it. Callers in managed runtimes MUST pin the buffer (e.g., `GCHandle.Alloc` with `GCHandleType.Pinned` in .NET, `runtime.Pinner` in Go 1.21+, `ctypes` with explicitly allocated C buffers in Python) before writing secrets into it. Alternatively, allocate secret buffers via `malloc`/`calloc` (outside the GC's control) and free them after zeroization.

### 13.4 Key Functions

**Identity:**
- `soliton_identity_generate(pk_out, sk_out, fingerprint_hex_out)` — generate LO composite keypair
- `soliton_identity_fingerprint(pk, pk_len, out)` — compute raw SHA3-256 fingerprint
- `soliton_identity_sign(sk, sk_len, message, message_len, sig_out)` — hybrid sign
- `soliton_identity_verify(pk, pk_len, message, message_len, sig, sig_len)` — hybrid verify
- `soliton_identity_encapsulate(pk, pk_len, ct_out, ss_out)` — encapsulate to IK X-Wing component. `ss_out` receives a 32-byte shared secret into a caller-allocated buffer. **The caller MUST zeroize `ss_out` after use** — use `soliton_zeroize(ss_out, 32)`.
- `soliton_identity_decapsulate(sk, sk_len, ct, ct_len, ss_out)` — decapsulate. `ss_out` receives a 32-byte shared secret into a caller-allocated buffer. **The caller MUST zeroize `ss_out` after use** — use `soliton_zeroize(ss_out, 32)`.

**Authentication:**
- `soliton_auth_challenge(client_pk, client_pk_len, ct_out, token_out)` — server: generate challenge
- `soliton_auth_respond(client_sk, client_sk_len, ct, ct_len, proof_out)` — client: generate proof
- `soliton_auth_verify(expected_token, proof)` — server: constant-time verification

**LO-KEX:**
- `soliton_kex_verify_bundle(bundle_ik_pk, ..., spk_pub, ..., spk_sig, ...)` — verify pre-key bundle. **Error codes**: `BundleVerificationFailed` (-5) for all non-structural failures — IK mismatch (bundle_ik_pk ≠ known_ik_pk), invalid SPK signature, or `crypto_version ≠ "lo-crypto-v1"`. All three collapse to a single error code to prevent iterative oracle probing (an attacker cannot determine which check failed — see §5.3 and §5.5 error-collapsing rationale). `InvalidData` (-17) on the one structural failure: OPK co-presence violation (opk_pub and opk_id must both be present or both absent) — this check precedes cryptography and is not security-sensitive. `InvalidLength` (-1) on wrong key/signature sizes. **Note**: `VerificationFailed` (-3) is NOT returned by this function — that code is for non-bundle signature operations (identity verification, auth). The collapse to `BundleVerificationFailed` is intentional; binding authors who pattern-match for `VerificationFailed` on the bundle verification path will silently miss all bundle-authentication failures.
- `soliton_kex_initiate(alice_ik_pk, ..., bob_ik_pk, ..., bob_spk_pub, ..., ...)` — initiate session (returns `SolitonInitiatedSession`). **Error codes**: `InvalidLength` (-1) if any key or signature has the wrong size. `InvalidData` (-17) on structural corruption or co-presence violation. `BundleVerificationFailed` (-5) for all non-structural bundle failures (IK mismatch, unsupported crypto version, invalid SPK signature) — `soliton_kex_initiate` calls `verify_bundle` internally and the same oracle-collapse applies as for `soliton_kex_verify_bundle` above. **SPK signature is re-verified internally** even if the caller already called `soliton_kex_verify_bundle` — this is defense-in-depth; binding authors should not attempt to skip the pre-call to `verify_bundle`. **Note**: `VerificationFailed` (-3) and `UnsupportedCryptoVersion` (-16) are NOT returned by this function — both conditions collapse to `BundleVerificationFailed` (-5).
- `soliton_kex_receive(bob_ik_pk, ..., bob_ik_sk, ..., alice_ik_pk, ..., ...)` — receive session init
- `soliton_kex_encode_session_init(...)` — encode a parsed SessionInit back to canonical bytes (§7.4). **Bob's tool, not Alice's**: Alice never calls this directly — `soliton_kex_initiate` handles encoding internally. Bob calls `soliton_kex_encode_session_init` after individually parsing or validating the received fields, to reconstruct Alice's canonical encoding for use in first-message AAD construction. The output must be byte-for-byte identical to Alice's internal encoding; any normalization of individual fields during decode (key clamping, padding removal) that alters re-encoding causes silent first-message AEAD failure.
- `soliton_kex_build_first_message_aad(...)` — build first-message AAD
- `soliton_kex_sign_prekey(ik_sk, ..., spk_pub, ..., sig_out)` — sign a pre-key
- `soliton_kex_initiated_session_free(session)` — free `SolitonInitiatedSession`. **Safety model**: null-safe (null `session` is a no-op). Returns `void` — not `int32_t` like opaque-handle free functions (`soliton_ratchet_free`, `soliton_call_keys_free`). `SolitonInitiatedSession` is a flat `#[repr(C)]` struct, not an opaque pointer — there is no type-tag field and no type-discriminant check. Callers MUST NOT pass a handle from a different free function (e.g., a ratchet handle) — doing so will zeroize and free incorrect memory without any error or diagnostic.
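
Two binding-side checks follow from the `soliton_kex_verify_bundle` entry: the OPK co-presence rule and the mapping of its error codes. In this sketch `check_opk_copresence` and `classify_bundle_result` are hypothetical helpers, and representing an absent `opk_id` as a NULL pointer is an assumption about how a binding models the optional field.

```c
#include <stddef.h>
#include <stdint.h>

#define ERR_INVALID_LENGTH              (-1)
#define ERR_BUNDLE_VERIFICATION_FAILED  (-5)
#define ERR_INVALID_DATA               (-17)

/* OPK public key and OPK id must be both present or both absent. */
int32_t check_opk_copresence(const uint8_t *opk_pub, const uint32_t *opk_id) {
    int has_pub = (opk_pub != NULL);
    int has_id = (opk_id != NULL);
    return (has_pub == has_id) ? 0 : ERR_INVALID_DATA;
}

typedef enum {
    BUNDLE_OK,
    BUNDLE_REJECTED,   /* IK mismatch, bad SPK signature, or wrong crypto_version */
    BUNDLE_MALFORMED,  /* wrong sizes or OPK co-presence violation */
    BUNDLE_UNEXPECTED  /* anything else, including VerificationFailed (-3) */
} bundle_result;

bundle_result classify_bundle_result(int32_t rc) {
    switch (rc) {
    case 0:
        return BUNDLE_OK;
    case ERR_BUNDLE_VERIFICATION_FAILED:
        return BUNDLE_REJECTED;
    case ERR_INVALID_LENGTH:
    case ERR_INVALID_DATA:
        return BUNDLE_MALFORMED;
    default:
        return BUNDLE_UNEXPECTED;
    }
}
```

Keying the authentication-failure branch on -5 rather than -3 is the point of the second helper: a wrapper that waits for `VerificationFailed` here never sees a rejected bundle.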

**Ratchet:**
- `soliton_ratchet_init_alice(root_key, ..., chain_key, ..., local_fp, ..., remote_fp, ..., ek_pk, ..., ek_sk, ..., out)` — init Alice state; fingerprints follow root_key/chain_key but precede the ephemeral key params (§6.2 parameter order note). Parameter name is `chain_key` in the header — see §13.5 for the full name-alias table (`epoch_key` / `chain_key` / `initial_chain_key`).
- `soliton_ratchet_init_bob(root_key, ..., chain_key, ..., local_fp, ..., remote_fp, ..., peer_ek, ..., out)` — init Bob state; fingerprints follow root_key/chain_key but precede the ephemeral key params (§6.2 parameter order note). Same `chain_key` alias — see §13.5.
- `soliton_ratchet_encrypt(ratchet, plaintext, ..., out)` — encrypt (fingerprints are stored in the ratchet state, not passed per call)
- `soliton_ratchet_decrypt(ratchet, ratchet_pk, ..., kem_ct, ..., n, pn, ciphertext, ..., plaintext_out)` — decrypt. Pass `kem_ct = NULL` and `kem_ct_len = 0` when the header contains no KEM ciphertext (same-chain message). Pass `n` and `pn` exactly as received from the wire header — both are included in AAD regardless of epoch type; a caller who passes `pn = 0` for every message gets AEAD failure whenever the wire `pn ≠ 0`.
- `soliton_ratchet_encrypt_first(epoch_key, plaintext, ..., aad, ..., payload_out, ratchet_init_key_out)` — first message
- `soliton_ratchet_decrypt_first(epoch_key, payload, ..., aad, ..., plaintext_out, ratchet_init_key_out)` — first message decrypt
- `soliton_ratchet_to_bytes(ratchet, data_out, epoch_out)` — serialize state. Ownership-consuming: takes `*mut *mut SolitonRatchet` and nulls the caller's handle on success to prevent post-serialization use. `epoch_out` (nullable) receives the new epoch for anti-rollback tracking.
  - **Errors that leave the handle intact**: on `NullPointer` (-13, e.g., `data_out` is null), `ChainExhausted` (-15), or `ConcurrentAccess` (-18), `*ratchet` is NOT nulled and the handle remains valid. `NullPointer` is rejected before ownership transfer begins and can be retried once the caller bug is fixed; `ConcurrentAccess` can be retried after the concurrent operation completes. `ChainExhausted` (epoch at u64::MAX) is persistence-fatal rather than retryable: the in-memory session keeps working but can never be serialized again (§12). Only a successful return irreversibly transfers ownership and nulls the handle. A binding that frees the handle on any non-zero return code will double-free a live session whenever a caller bug triggers `NullPointer`. Callers who check only for null after failure will lose the handle.
  - **Why epoch is the only visible counter check**: epoch is the only counter the CAPI `to_bytes` wrapper visibly checks, because `can_serialize()` pre-filters `send_count`/`recv_count`/`prev_send_count` at u32::MAX before the CAPI takes ownership; the Rust `to_bytes` itself checks all four counters.
  - **Maintainer note**: The "NOT nulled on `ChainExhausted`" guarantee depends on `can_serialize()` (see §6.8) pre-validating all conditions before the CAPI layer takes ownership of the handle. If a future `to_bytes` refactor introduces a new error condition not covered by `can_serialize()`, the handle will be nulled on that error with no recovery path, so `can_serialize()` and `to_bytes` must check identical conditions.
  - **`epoch_out` sentinel on error**: when `epoch_out` is non-null, the CAPI sets `*epoch_out = 0` immediately at entry, so error paths never leave a stale value from a previous call. Epoch 0 is never a valid serialized epoch (the initial `to_bytes` produces epoch 1), so 0 means "no epoch written." Callers that store `*epoch_out` as their `min_epoch` for anti-rollback MUST check the return code first and MUST NOT update their stored `min_epoch` on any error return; storing the sentinel 0 as `min_epoch` silently disables anti-rollback protection for all subsequent `from_bytes_with_min_epoch` calls.
- `soliton_ratchet_from_bytes(data, data_len, out)` — deserialize state **(deprecated** — use `from_bytes_with_min_epoch`; see §6.8). **Error codes**: `InvalidData` (-17) on structural blob corruption (guards 1-25, §6.8), `ChainExhausted` (-15) when the blob encodes epoch == u64::MAX (guard 24 — the session is structurally valid but permanently un-re-serializable; see §12 collapse table). `InvalidLength` (-1) if the input exceeds the 1 MiB CAPI cap. A binding author who catches only `InvalidData` and propagates all other errors as "corruption" will silently lose a recoverable serialization-exhausted session.
- `soliton_ratchet_from_bytes_with_min_epoch(data, data_len, min_epoch, out)` — deserialize with anti-rollback check (epoch must be > min_epoch). Same error codes as `from_bytes`, plus `InvalidData` for epoch-rollback rejection (guard 12 — indistinguishable from structural corruption at the API level; see §6.8).
- `soliton_ratchet_epoch(ratchet, out)` — query current epoch counter non-destructively (since `to_bytes` is ownership-consuming, use `epoch()` to read the epoch without committing to serialization — e.g., to check consistency with a stored `min_epoch` before calling `to_bytes`, or to initialize a `min_epoch` store when migrating existing sessions)
- `soliton_ratchet_reset(ratchet)` — reset ratchet state to initial (zeroizes all epoch keys). Returns `int32_t`: 0 on success, `ConcurrentAccess` (-18) if the handle is in use, `InvalidData` (-17) if the handle's type discriminant is wrong (handle was not created by `soliton_ratchet_init_*`; see §13.6 type tagging). On `ConcurrentAccess`, the state is NOT reset — the caller must retry after the concurrent operation completes. On `InvalidData`, the state is also NOT reset — the type-discriminant check fires before any reset logic, so the handle (if it is a valid ratchet handle accidentally passed to the wrong operation) is unmodified and safe to continue using.
- `soliton_ratchet_free(ratchet)` — free opaque ratchet. Returns `int32_t`: 0 on success, `ConcurrentAccess` (-18) if in use, `InvalidData` (-17) if the type discriminant is wrong. A null outer or inner pointer is a safe no-op (returns 0).
- `soliton_encrypted_message_free(msg)` — free `SolitonEncryptedMessage` buffer fields (`header.ratchet_pk`, `header.kem_ct`, `ciphertext`). Does NOT free the struct itself: `SolitonEncryptedMessage` is a caller-owned value type, not an opaque heap handle. If the caller heap-allocated the struct, it must release that allocation itself after this call (e.g., `free(msg)` in C); a stack-allocated struct needs no further action. Contrast with `soliton_ratchet_free`, which frees the opaque handle allocation.
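
The `to_bytes` / `from_bytes` error rules above can be captured in two small binding-side helpers. This is a minimal sketch under stated assumptions: the `SOLITON_ERR_*` constants, `commit_min_epoch`, and `classify_load` are hypothetical binding names (only the numeric codes come from this section), and the real CAPI calls are omitted so the example stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical binding-side names; numeric values as quoted in this section. */
enum {
    SOLITON_ERR_INVALID_LENGTH  = -1,
    SOLITON_ERR_CHAIN_EXHAUSTED = -15,
    SOLITON_ERR_INVALID_DATA    = -17,
};

/* Update the stored anti-rollback epoch only after a successful
 * soliton_ratchet_to_bytes. On any error the CAPI leaves the sentinel 0 in
 * *epoch_out; committing it would disable anti-rollback checks. */
static bool commit_min_epoch(int32_t rc, uint64_t epoch_out, uint64_t *stored_min_epoch) {
    if (rc != 0 || epoch_out == 0) {
        return false; /* keep the previously stored value untouched */
    }
    *stored_min_epoch = epoch_out;
    return true;
}

typedef enum {
    LOAD_OK,
    LOAD_CORRUPT_OR_ROLLBACK, /* InvalidData: corruption and rollback look identical */
    LOAD_EPOCH_EXHAUSTED,     /* ChainExhausted: structurally valid, not re-serializable */
    LOAD_TOO_LARGE,           /* InvalidLength: input exceeds the CAPI cap */
    LOAD_OTHER_ERROR
} load_outcome;

/* Map a from_bytes / from_bytes_with_min_epoch return code to a distinct
 * outcome instead of collapsing every failure into "corruption". */
static load_outcome classify_load(int32_t rc) {
    switch (rc) {
    case 0:                           return LOAD_OK;
    case SOLITON_ERR_INVALID_DATA:    return LOAD_CORRUPT_OR_ROLLBACK;
    case SOLITON_ERR_CHAIN_EXHAUSTED: return LOAD_EPOCH_EXHAUSTED;
    case SOLITON_ERR_INVALID_LENGTH:  return LOAD_TOO_LARGE;
    default:                          return LOAD_OTHER_ERROR;
    }
}
```

A binding would pass the return code and `*epoch_out` of `soliton_ratchet_to_bytes` to `commit_min_epoch`, and route `LOAD_EPOCH_EXHAUSTED` to a different recovery path than `LOAD_CORRUPT_OR_ROLLBACK`.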

**Call:**
- `soliton_ratchet_derive_call_keys(ratchet, kem_ss, kem_ss_len, call_id, call_id_len, out)` — derive call keys. `kem_ss_len` MUST be exactly 32 and `call_id_len` MUST be exactly 16; any other value → `InvalidLength`. These are the only two fixed-size input parameters in the call group with explicit length validation — unlike `local_fp` and `remote_fp` (taken from ratchet state internally), `kem_ss` and `call_id` are caller-supplied buffers with strict size contracts.
- `soliton_call_keys_send_key(keys, out, out_len)` — copy current send key. **`out_len` must be exactly 32**; any other value returns `InvalidLength`. **The caller MUST zeroize `out` after use** — use `soliton_zeroize(out, 32)`. The copied key is live session key material for media encryption.
- `soliton_call_keys_recv_key(keys, out, out_len)` — copy current recv key. **`out_len` must be exactly 32**; any other value returns `InvalidLength`. **The caller MUST zeroize `out` after use** — use `soliton_zeroize(out, 32)`. The copied key is live session key material for media encryption.
- `soliton_call_keys_advance(keys)` — advance call chain (rekey). Returns `ChainExhausted` (-15) after 2²⁴ steps. On exhaustion, all call key material (`key_a`, `key_b`, `chain_key`) is immediately zeroized — **the handle is dead**: `soliton_call_keys_send_key` and `soliton_call_keys_recv_key` will return zeroed material after exhaustion, with no error or diagnostic. The handle is NOT auto-freed on `ChainExhausted`; callers MUST free it via `soliton_call_keys_free` and establish a new call via `soliton_ratchet_derive_call_keys`. See §6.12.
- `soliton_call_keys_free(keys)` — free opaque call keys (zeroizes). Returns `int32_t`: 0 on success, `ConcurrentAccess` (-18) if in use, `InvalidData` (-17) if the type discriminant is wrong. A null outer or inner pointer is a safe no-op (returns 0).
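
Because an exhausted `SolitonCallKeys` handle keeps returning zeroed keys with no error, a binding may want a defensive check on every copied key, paired with wiping its local copy. The sketch below is illustrative, not library code: `key_is_all_zero` and `wipe_key` are hypothetical helpers, and `wipe_key` stands in for `soliton_zeroize` only so the example compiles on its own; production code should call `soliton_zeroize`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_KEY_LEN 32

/* True if every byte is zero. ORs all bytes together instead of returning
 * early, so the running time does not depend on the key contents. */
static bool key_is_all_zero(const uint8_t *key, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= key[i];
    }
    return acc == 0;
}

/* Stand-in for soliton_zeroize: volatile stores discourage the compiler from
 * eliding the wipe. Null or zero length is a no-op, mirroring the CAPI. */
static void wipe_key(void *ptr, size_t len) {
    if (ptr == NULL || len == 0) return;
    volatile uint8_t *p = (volatile uint8_t *)ptr;
    while (len--) {
        *p++ = 0;
    }
}
```

After copying a key with `soliton_call_keys_send_key`, a binding could treat `key_is_all_zero(key, CALL_KEY_LEN)` as "handle exhausted": wipe the copy, free the handle, and derive fresh call keys. A uniformly random 32-byte key is all-zero with probability 2⁻²⁵⁶, so false positives are not a practical concern.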

**Storage:**
- `soliton_storage_encrypt(keyring, plaintext, ..., channel_id, segment_id, compress, out)` — encrypt blob
- `soliton_storage_decrypt(keyring, blob, ..., channel_id, segment_id, out)` — decrypt blob
- `soliton_dm_queue_encrypt(keyring, plaintext, ..., recipient_fp, batch_id, compress, out)` — encrypt DM queue blob (§11.4.2 AAD)
- `soliton_dm_queue_decrypt(keyring, blob, ..., recipient_fp, batch_id, out)` — decrypt DM queue blob
- `soliton_keyring_new(key, key_len, version, out)` — create keyring (key is fixed 32 bytes). **Error codes**: `NullPointer` (-13) if `out` is null; `InvalidLength` (-1) if `key_len ≠ 32`; `UnsupportedVersion` (-10) if `version == 0` (version 0 is reserved — §11.1); `InvalidData` (-17) if the key is all-zero bytes (§11.2 guard — all-zero is an invalid key). Returns 0 on success.
- `soliton_keyring_add_key(keyring, key, key_len, version, make_active)` — add key (key is fixed 32 bytes). `encrypt_blob` always uses the active version's key. `make_active=true` atomically updates the active version to the newly added key. `make_active=false` registers the key for decryption only (looked up by version byte); the active version for new encryptions does not change. **`make_active=false` with a `version` equal to the current active version returns `InvalidData`**: the caller asks for the new key material to stay inactive, but that version byte already names the active slot, so the request is incoherent. Honoring it would silently replace the active version's key material without an explicit activation, leaving blobs previously encrypted under the old material undecryptable. The function rejects the call with `InvalidData` rather than silently updating the key material.
- `soliton_keyring_remove_key(keyring, version)` — remove key. Returns `int32_t`: 0 if key was present and removed, 0 if key was absent (idempotent), `InvalidData` (-17) if `version` is the current active version (active key cannot be removed — §10 invariant), `UnsupportedVersion` (-10) if `version == 0`, `NullPointer` (-13) if keyring is null, `InvalidData` (-17) if type discriminant wrong. **Design note — both Ok outcomes return 0**: The core Rust `remove_key` returns `Ok(true)` (was present, removed) or `Ok(false)` (was absent). The CAPI collapses both to return code 0 — the distinction is informational and has no security consequence; the idempotency is the externally visible contract. Binding authors who need to distinguish the two cases must track key versions independently or use the Rust API directly.
- `soliton_keyring_free(keyring)` — free keyring. Returns `int32_t`: 0 on success, `ConcurrentAccess` (-18) if in use, `InvalidData` (-17) if the type discriminant is wrong. A null outer or inner pointer is a safe no-op (returns 0).
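
To report keyring misuse with a descriptive error before crossing the FFI boundary, a binding could model the documented rules as pure functions. This sketch assumes `uint8_t` version bytes and uses hypothetical names; it covers only the `make_active` and removal rules stated above and omits handle, length, and all-zero-key checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; numeric values as quoted in this section. */
enum {
    KR_OK = 0,
    KR_ERR_UNSUPPORTED_VERSION = -10,
    KR_ERR_INVALID_DATA = -17,
};

/* Documented add_key rule: reusing the active version byte with
 * make_active=false is rejected as incoherent. Other checks omitted. */
static int32_t predict_add_key(uint8_t active_version, uint8_t version, bool make_active) {
    if (version == active_version && !make_active) return KR_ERR_INVALID_DATA;
    return KR_OK;
}

/* Documented remove_key outcomes (handle checks omitted). "Removed" and
 * "was absent" both collapse to 0 at the CAPI. */
static int32_t predict_remove_key(uint8_t active_version, uint8_t version) {
    if (version == 0) return KR_ERR_UNSUPPORTED_VERSION; /* version 0 is reserved */
    if (version == active_version) return KR_ERR_INVALID_DATA; /* active key cannot be removed */
    return KR_OK;
}
```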

**Streaming AEAD:**
- `soliton_stream_encrypt_init(key, key_len, caller_aad, aad_len, compress, out)` — init encryptor (generates random base nonce). `key_len` MUST be exactly 32; any other value returns `InvalidLength`. Unlike the `out_len` of `soliton_stream_encrypt_header` (lenient: extra buffer space is accepted), `key_len` is strict, because the XChaCha20-Poly1305 key is always exactly 32 bytes.
- `soliton_stream_encrypt_header(enc, out, out_len)` — copy 26-byte header into caller-allocated buffer; `out_len` MUST be ≥ 26 (lenient: extra buffer space is accepted)
- `soliton_stream_encrypt_chunk(enc, plaintext, ..., is_last, out)` — encrypt one chunk; `out_len` MUST be ≥ `STREAM_ENCRYPT_MAX` (1,048,849 bytes) — returns `InvalidLength` for smaller buffers (parallel to the `out_len < STREAM_CHUNK_SIZE → InvalidLength` rule for decrypt chunk)
- `soliton_stream_encrypt_chunk_at(enc: *const, index, plaintext, ..., is_last, out)` — encrypt at explicit index (stateless, random-access). It takes `*const SolitonStreamEncryptor` (not `*mut`) to reflect the `&self` Rust contract; see §15.11 for the `*const` caveat. The same `out_len` ≥ `STREAM_ENCRYPT_MAX` requirement as the sequential variant applies. `index` MUST be unique per call: calling with the same `index` and different plaintexts produces nonce reuse (§15.12). Does not advance `next_index`. Not interchangeable with the sequential variant; see §15.11 for mixed-mode use. **Absent from `soliton.h`**: this function is implemented and exported (`#[unsafe(no_mangle)]`) but has no declaration in the C header, whereas its decrypt counterpart `soliton_stream_decrypt_chunk_at` is declared. Until the header is updated, binding authors must declare it manually with the signature above: an `extern` prototype in C, C++, and Go cgo, or the equivalent P/Invoke or FFI declaration in C# and Dart.
- `soliton_stream_encrypt_is_finalized(enc, out: *mut bool)` — write finalized state to `out`
- `soliton_stream_encrypt_free(enc)` — free encryptor (zeroizes key). Returns `int32_t`: 0 on success, `NullPointer` (-13) if outer pointer null, 0 (safe no-op) if inner pointer null (null inner pointer means the handle was already freed or never initialized — matches the double-free behavior of `soliton_ratchet_free` / `soliton_keyring_free`; does NOT return `NullPointer` for inner-null), `ConcurrentAccess` (-18) if in use, `InvalidData` (-17) if type discriminant wrong

**`soliton_stream_encrypt_chunk` output buffer (only `out_written` bytes are valid)**: On a successful return from `soliton_stream_encrypt_chunk`, only the first `*out_written` bytes of `out` contain ciphertext. The output buffer must be at least `STREAM_ENCRYPT_MAX` (1,048,849 bytes) to accommodate any valid chunk, but a non-final uncompressed chunk writes exactly `CHUNK_SIZE + CHUNK_OVERHEAD` = 1,048,593 bytes, leaving the last 256 bytes of a minimum-sized buffer unwritten; a short final chunk or a well-compressed chunk leaves far more. A binding author who copies `out[0..STREAM_ENCRYPT_MAX]` to a transport (instead of `out[0..*out_written]`) transmits whatever the unwritten tail held alongside the ciphertext. The ciphertext itself is not secret, but the stale bytes may contain earlier key material or other sensitive data from the process heap. Always use `*out_written` to determine the valid range. This mirrors the behavior documented for `soliton_stream_decrypt_chunk` and `soliton_stream_decrypt_chunk_at`.
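
A binding's transport path can make the `*out_written` rule structural by only ever appending the valid prefix of the output buffer. This sketch uses a hypothetical growable `wire_buf` and `wire_append_chunk` helper; in real use, `out` and `out_written` would come from a successful `soliton_stream_encrypt_chunk` call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable transport buffer used only for this illustration. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} wire_buf;

/* Append exactly out[0..out_written). The unwritten tail of the (at least
 * 1,048,849-byte) output buffer is never copied. Returns 0 on success,
 * -1 on allocation failure or size overflow. */
static int wire_append_chunk(wire_buf *w, const uint8_t *out, size_t out_written) {
    if (out_written == 0) return 0;
    if (out_written > SIZE_MAX - w->len) return -1;
    size_t need = w->len + out_written;
    if (need > w->cap) {
        size_t new_cap = w->cap ? w->cap : 64;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
            new_cap *= 2;
        }
        uint8_t *p = realloc(w->data, new_cap);
        if (p == NULL) return -1;
        w->data = p;
        w->cap = new_cap;
    }
    memcpy(w->data + w->len, out, out_written);
    w->len = need;
    return 0;
}
```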

**No `soliton_stream_encrypt_next_index` function**: After `encrypt_chunk(is_last=true)` succeeds, the chunk count equals the encryptor's internal `next_index` — but this is not exposed via CAPI. §15.12 describes how to track chunk count: callers must count `encrypt_chunk` calls manually, or use `is_finalized()` to confirm the stream is complete. The decrypt-side `soliton_stream_decrypt_expected_index` has no symmetric encrypt-side counterpart — this asymmetry is intentional.
- `soliton_stream_decrypt_init(key, key_len, header, header_len, caller_aad, aad_len, out)` — init decryptor from header; `key_len` MUST be exactly 32 (strict: any other length returns `InvalidLength`, same as encrypt_init); `header_len` MUST be exactly 26 (strict: any other length returns `InvalidLength`)
- `soliton_stream_decrypt_chunk(dec, chunk, chunk_len, out, out_len, out_written, is_last: *mut bool)` — decrypt sequential chunk; `out_len` MUST be ≥ `STREAM_CHUNK_SIZE` (1,048,576 bytes) — returns `InvalidLength` for smaller buffers (see note below); `is_last` is a required non-null out-parameter — returns `NullPointer` if null
- `soliton_stream_decrypt_chunk_at(dec, index, chunk, chunk_len, out, out_len, out_written, is_last: *mut bool)` — decrypt at explicit index (stateless); same `out_len` ≥ `STREAM_CHUNK_SIZE` requirement as above; `is_last` is a required non-null out-parameter — returns `NullPointer` if null
- `soliton_stream_decrypt_is_finalized(dec, out: *mut bool)` — write finalized state to `out`
- `soliton_stream_decrypt_expected_index(dec, out: *mut u64)` — write next expected sequential index to `out`
- `soliton_stream_decrypt_free(dec)` — free decryptor (zeroizes key). Returns `int32_t`: 0 on success, `NullPointer` (-13) if outer pointer null, 0 (safe no-op) if inner pointer null (null inner pointer means already-freed or never-initialized — does NOT return `NullPointer` for inner-null, consistent with `soliton_ratchet_free` / `soliton_keyring_free`), `ConcurrentAccess` (-18) if in use, `InvalidData` (-17) if type discriminant wrong

**`SOLITON_STREAM_ENCRYPT_MAX` and `SOLITON_STREAM_CHUNK_SIZE` are NOT defined as `#define` constants in `soliton.h`**: The header references these names in documentation comments but does not provide `#define` or `constexpr` entries. Binding authors who write `out_len = SOLITON_STREAM_ENCRYPT_MAX` get a compile error. The values must be embedded as integer literals in bindings: `STREAM_ENCRYPT_MAX = 1,048,849` (encrypt output buffer, see Appendix A) and `STREAM_CHUNK_SIZE = 1,048,576` (decrypt output buffer, see Appendix A). Language-idiomatic constant definitions are recommended:
```c
// C/C++ — add to binding wrapper or generated header
#define SOLITON_STREAM_ENCRYPT_MAX  1048849UL
#define SOLITON_STREAM_CHUNK_SIZE   1048576UL
```
These values are stable and will not change without a major version bump.

**Streaming decrypt output buffer minimum: `STREAM_CHUNK_SIZE` (1,048,576 bytes)**: Both `soliton_stream_decrypt_chunk` and `soliton_stream_decrypt_chunk_at` require the output buffer to be at least `STREAM_CHUNK_SIZE` bytes regardless of the expected plaintext size. For compressed streams, the decompressed size is variable and is known only after AEAD verification and decompression; for uncompressed streams, the plaintext is the chunk minus its per-chunk overhead. Rather than size the buffer chunk by chunk, the library mandates a single worst-case buffer that can hold any valid decrypted chunk. **This is asymmetric with the encrypt side**: the encrypt output buffer uses `STREAM_ENCRYPT_MAX` (1,048,849 bytes), which is larger than `STREAM_CHUNK_SIZE` to accommodate compression overhead and the tag_byte. The decrypt minimum is the raw `STREAM_CHUNK_SIZE` because decrypt outputs plaintext (no tag_byte, no compression overhead). Binding authors who size the output buffer to the expected plaintext of a small final chunk (e.g., a 100-byte output buffer for a 100-byte final chunk) will receive `InvalidLength`, and nothing in the error indicates that buffer size is the cause. See Appendix A for the constant value.

**Streaming header buffer size asymmetry**: `soliton_stream_encrypt_header` accepts any `out_len ≥ 26` (lenient — a 32-byte output buffer is fine). `soliton_stream_decrypt_init` requires `header_len == 26` exactly (strict — any other length returns `InvalidLength`). This asymmetry is intentional: the encryptor writes into a caller-owned buffer and the caller controls the buffer size; the decryptor parses an input buffer where any size other than exactly 26 indicates a framing error. A caller who stores the 26-byte header in a 32-byte buffer can encrypt successfully but must pass exactly `header_len = 26` to `decrypt_init` — passing the full buffer length (32) returns `InvalidLength`.
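
Because all of these size failures surface as an undifferentiated `InvalidLength`, a binding can check sizes itself and raise a more specific error first. The predicates below restate the documented rules; their names and the `SOLITON_STREAM_HEADER_LEN` macro are hypothetical, and the two buffer constants are the Appendix A values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values from Appendix A; soliton.h does not define these as macros. */
#define SOLITON_STREAM_ENCRYPT_MAX 1048849UL
#define SOLITON_STREAM_CHUNK_SIZE  1048576UL
#define SOLITON_STREAM_HEADER_LEN  26UL /* hypothetical name for the 26-byte header size */

/* Pre-flight checks mirroring the documented InvalidLength rules. */
static bool encrypt_chunk_out_ok(size_t out_len)   { return out_len >= SOLITON_STREAM_ENCRYPT_MAX; }
static bool decrypt_chunk_out_ok(size_t out_len)   { return out_len >= SOLITON_STREAM_CHUNK_SIZE; }
static bool encrypt_header_out_ok(size_t out_len)  { return out_len >= SOLITON_STREAM_HEADER_LEN; } /* lenient */
static bool decrypt_init_header_ok(size_t hdr_len) { return hdr_len == SOLITON_STREAM_HEADER_LEN; } /* strict */
```

Note how a 26-byte header stored in a 32-byte buffer passes `encrypt_header_out_ok(32)` but must still be handed to `soliton_stream_decrypt_init` with `header_len = 26`.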

**Primitives:**
- `soliton_hmac_sha3_256(key, key_len, data, data_len, out, out_len)` — HMAC-SHA3-256. **`out_len` must be exactly 32** (the HMAC-SHA3-256 output size); any other value returns `InvalidLength`. Unlike most output-length parameters in the CAPI (which express a caller-allocated buffer size), `out_len` here is a strict size-check: the function does not produce a variable-length output.
- `soliton_hkdf_sha3_256(salt, salt_len, ikm, ikm_len, info, info_len, out, out_len)` — HKDF-SHA3-256. **`out_len` constraint**: must be in the range 1-8160 bytes. The upper bound is the RFC 5869 §2.3 HKDF-Expand maximum: 255 × HashLen = 255 × 32 = 8160 bytes for SHA3-256. A zero `out_len` or `out_len > 8160` returns `InvalidLength`.
- `soliton_sha3_256(data, data_len, out, out_len)` — SHA3-256. **`out_len` must be exactly 32**; any other value returns `InvalidLength`.
- `soliton_xwing_keygen(pk_out, sk_out)` — X-Wing key generation
- `soliton_xwing_encapsulate(pk, pk_len, ct_out, ss_out)` — X-Wing encapsulate. `ss_out` receives a 32-byte shared secret into a caller-allocated buffer. **The caller MUST zeroize `ss_out` after use** — use `soliton_zeroize(ss_out, 32)`.
- `soliton_xwing_decapsulate(sk, sk_len, ct, ct_len, ss_out)` — X-Wing decapsulate. `ss_out` receives a 32-byte shared secret into a caller-allocated buffer. **The caller MUST zeroize `ss_out` after use** — use `soliton_zeroize(ss_out, 32)`.
- `soliton_aead_encrypt(key, key_len, nonce, nonce_len, plaintext, ..., aad, ..., out)` — raw XChaCha20-Poly1305 encrypt. **`key_len` MUST be exactly 32** (XChaCha20-Poly1305 uses a 256-bit key); any other value returns `InvalidLength`. **`nonce_len` MUST be exactly 24**: XChaCha20 uses a 192-bit nonce, so passing a 12-byte ChaCha20 nonce returns `InvalidLength`. This is a common caller error when migrating from `chacha20poly1305` (12-byte nonce) to `xchacha20poly1305` (24-byte nonce).
- `soliton_aead_decrypt(key, key_len, nonce, nonce_len, ciphertext, ..., aad, ..., out)` — raw XChaCha20-Poly1305 decrypt. Same key and nonce length constraints as `soliton_aead_encrypt`: `key_len` must be 32 and `nonce_len` must be 24; any other value returns `InvalidLength`.
- `soliton_hmac_sha3_256_verify(tag_a, tag_a_len, tag_b, tag_b_len)` — constant-time 32-byte tag comparison. Returns 0 if equal, `VerificationFailed` (-3) if tags differ, `InvalidLength` (-1) if either length ≠ 32. **Constant-time is a security requirement** — comparison time must be independent of tag contents to prevent timing attacks on authentication tokens (§4). Do NOT substitute `memcmp()` or any early-exit comparison.
- `soliton_argon2id(password, ..., salt, ..., m_cost, t_cost, p_cost, out, out_len)` — Argon2id KDF (§10.6). **`out_len` constraint**: 1-4096 bytes; zero or `> 4096` returns `InvalidLength` (see §10.6 cap rationale).
- `soliton_verification_phrase(pk_a, pk_a_len, pk_b, pk_b_len, out)` — verification phrase
- `soliton_random_bytes(buf, len)` — fill `buf` with `len` cryptographically random bytes from the OS CSPRNG. **Output cap**: `len` must be ≤ 256 MiB (268,435,456 bytes) — requests exceeding this return `InvalidLength`. **CSPRNG failure aborts**: like keygen and encapsulation (§13.2), `soliton_random_bytes` aborts the process on OS entropy failure rather than returning an error code. Binding authors MUST NOT expect an error return on CSPRNG failure for this function.
- `soliton_zeroize(ptr, len)` — volatile-write zeroing (guaranteed not to be optimized out; use for caller-owned secret buffers). **Null-safe and zero-length-safe**: if `ptr` is NULL or `len == 0`, the function is a silent no-op (returns immediately without error and without writing any memory). This diverges from the general §13.2 convention where null pointers return `NullPointer` (-13). **`soliton_zeroize` has no return value**: it returns `void` (C) / `()` (Rust), so unlike every other CAPI function there is no `int32_t` return code to check; the function either performs the volatile writes or silently does nothing. Callers that depend on a buffer actually being zeroed MUST therefore verify `ptr != NULL && len > 0` themselves before calling, because a failed null check is invisible at the call site.
- `soliton_version()` — return version string as `*const c_char`. **Static lifetime, do NOT free**: The returned pointer refers to a string embedded in the library binary (a `'static` string slice in Rust, exposed as a NUL-terminated C string). It is never null and MUST NOT be passed to `free()` or `soliton_buf_free()`; calling `free()` on a static pointer is undefined behavior (heap corruption). The pointer remains valid for as long as the library is loaded, which is the lifetime of the process unless the library is explicitly unloaded (e.g., with `dlclose` or `FreeLibrary`). Binding authors who follow the "every library allocation must be freed" convention from §13.2 must add an exception for `soliton_version()`. It is the sole CAPI function that returns a raw C string pointer rather than a `SolitonBuf`; all other variable-length string outputs use `SolitonBuf` and are heap-allocated.
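
The constant-time requirement on `soliton_hmac_sha3_256_verify` and the HKDF output bound can be illustrated in plain C. This sketch is for explanation only: bindings should call `soliton_hmac_sha3_256_verify` rather than reimplement it, since compilers may still transform a hand-written loop, and `ct_equal` / `hkdf_out_len_ok` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHA3_256_LEN 32u
#define HKDF_SHA3_256_MAX_OUT (255u * SHA3_256_LEN) /* RFC 5869: 255 * HashLen = 8160 */

/* Constant-time equality for equal-length tags: XOR every byte pair and OR
 * the differences together with no early exit. memcmp() may stop at the
 * first differing byte, leaking the mismatch position through timing. */
static bool ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0;
}

/* Mirrors the documented out_len rule for soliton_hkdf_sha3_256. */
static bool hkdf_out_len_ok(size_t out_len) {
    return out_len >= 1 && out_len <= HKDF_SHA3_256_MAX_OUT;
}
```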

**KEX (additional):**
- `soliton_kex_decode_session_init(data, data_len, out)` — decode SessionInit from bytes. Input cap: 64 KiB (65,536 bytes). Inputs exceeding 64 KiB return `InvalidLength`. This is tighter than the general 256 MiB CAPI cap (§13.2) — the maximum valid SessionInit is 4,669 bytes (with OPK; §7.4), so 64 KiB is a safe conservative bound that prevents allocation-exhaustion from oversized buffers.
- `soliton_decoded_session_init_free(session)` — free decoded SessionInit: frees the `crypto_version` `SolitonBuf`. No zeroization is performed — `SolitonDecodedSessionInit` contains no secret material (§13.6). Null `session` is a safe no-op. Must be called on every successful `soliton_kex_decode_session_init` output.
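- `soliton_kex_received_session_free(session)` — free a `SolitonReceivedSession`: zeroizes the inline secret fields (`root_key`, `chain_key`) and frees its `SolitonBuf` fields (`peer_ek`). Use it instead of `soliton_buf_free` or `free()` alone; see §13.6.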

**`soliton_kex_build_first_message_aad` input cap**: This function constructs the first-message AAD from a `SolitonInitiatedSession` and returns it as a `SolitonBuf`. It applies an 8 KiB (8,192 bytes) internal cap on the combined size of the SessionInit encoding and ancillary fields. Inputs exceeding 8 KiB return `InvalidLength`. In practice the SessionInit encoding is at most 4,669 bytes (§7.4), so this cap is never reached with well-formed inputs. Binding authors who synthesize oversized mock `SolitonInitiatedSession` structs for testing may encounter this limit.

**`opk_sk` co-presence error codes at the CAPI level**: The two directions of OPK co-presence violation produce different error codes at the CAPI level. (1) When `ct_opk` is non-null (OPK ciphertext present) but `opk_sk` is null (the OPK secret key pointer is null): the CAPI's null-pointer guard for `opk_sk` fires first and returns `NullPointer` (-13), before the co-presence check runs. (2) When `ct_opk` is null (no OPK ciphertext) but `opk_sk` is non-null (a secret key pointer was passed): the co-presence check fires and returns `InvalidData` (-17), because the OPK secret key was supplied for an absent OPK ciphertext. Binding authors pattern-matching on errors from `soliton_kex_receive` MUST handle both: `NullPointer` for "OPK ciphertext present, OPK secret key missing" and `InvalidData` for "OPK ciphertext absent, OPK secret key present."

**`opk_id` co-presence constraint**: When `ct_opk` is null (no OPK ciphertext), `opk_id` MUST be 0. Passing a non-zero `opk_id` with a null `ct_opk` returns `InvalidData`. This constraint is enforced by `soliton_kex_receive` and `soliton_kex_decode_session_init`. The `opk_id` field is meaningful only when `ct_opk` is present; a non-zero `opk_id` with absent `ct_opk` indicates a malformed SessionInit (the OPK key lookup would use the non-zero ID to look up an OPK that the protocol says is not being used). A reimplementer who initializes `opk_id` to a non-zero default when building a no-OPK SessionInit will receive `InvalidData` on the receiving side. **`opk_id = 0` is a valid OPK ID when `has_opk = true` / `ct_opk` is present**: A server can assign OPK ID 0 to the first uploaded one-time pre-key. When a `SessionInit` arrives with `has_opk = 0x01` (or `ct_opk` non-null) and `opk_id = 0`, this means OPK ID 0 was used — `has_opk` is the sole authority for whether an OPK was included. `opk_id = 0` does NOT act as a sentinel for "no OPK present" in the case where `has_opk = true`. A reimplementer who treats `opk_id == 0` as "no OPK" and ignores `has_opk` will discard valid SessionInits that used OPK ID 0, silently ignoring the OPK ciphertext and producing wrong decapsulation output. In `SolitonDecodedSessionInit`, `has_opk` is the canonical field to check; `opk_id` must only be used when `has_opk == 1`.
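
The two co-presence rules above reduce to a small decision table. As a hedged sketch, the hypothetical `predict_opk_args` below models only the documented OPK outcomes of `soliton_kex_receive`, with booleans standing in for the null/non-null state of `ct_opk` and `opk_sk`; the `uint32_t` type for `opk_id` is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; numeric values as quoted in this section. */
enum { OPK_OK = 0, OPK_ERR_NULL_POINTER = -13, OPK_ERR_INVALID_DATA = -17 };

/* Predicts the documented OPK co-presence outcome. Other argument checks
 * performed by soliton_kex_receive are not modeled. */
static int32_t predict_opk_args(bool has_ct_opk, bool has_opk_sk, uint32_t opk_id) {
    if (has_ct_opk) {
        /* OPK ciphertext present: the null guard on opk_sk fires first.
         * opk_id == 0 is a valid OPK ID in this branch. */
        return has_opk_sk ? OPK_OK : OPK_ERR_NULL_POINTER;
    }
    /* No OPK ciphertext: a supplied secret key or a non-zero ID is malformed. */
    if (has_opk_sk || opk_id != 0) return OPK_ERR_INVALID_DATA;
    return OPK_OK;
}
```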

### 13.5 Key Usage Order for Session Initiation

The epoch key flows through several steps and the right value must be passed to each:

```
// Alice (initiator):
soliton_kex_initiate(...)           → SolitonInitiatedSession { initial_epoch_key, ... }
soliton_ratchet_encrypt_first(initial_epoch_key, plaintext, aad, ...)  → (payload, ratchet_init_key)
soliton_ratchet_init_alice(root_key, ratchet_init_key, ek_pk, ek_sk, ...)

// Bob (responder):
soliton_kex_receive(...)            → (root_key, initial_epoch_key, peer_ek)
soliton_ratchet_decrypt_first(initial_epoch_key, payload, aad, ...)  → (plaintext, ratchet_init_key)
soliton_ratchet_init_bob(root_key, ratchet_init_key, peer_ek, ...)
```

**`aad` parameter for `encrypt_first` / `decrypt_first`**: The `aad` parameter is the first-message AAD constructed by `build_first_message_aad` / `soliton_kex_build_first_message_aad`. Its value is (§7.3 / §5.4 Step 7): `"lo-dm-v1" || sender_fingerprint_raw || recipient_fingerprint_raw || encode_session_init(session_init)`. Both Alice and Bob must pass byte-for-byte identical `aad` bytes; any divergence (wrong label, wrong fingerprint order, non-canonical `encode_session_init` output) produces `AeadFailed` on Bob's `decrypt_first` with no diagnostic pointing to the AAD. A reimplementer reading §13.5 in isolation who constructs `aad` from the per-function parameter description only will not find the required content — it must be sourced from §5.4 Step 7 (Alice's side) and §5.5 Step 6 (Bob's side). The easiest correct implementation calls `soliton_kex_build_first_message_aad` (§13.4) to produce this value rather than constructing it manually.
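
If a binding must construct the first-message AAD itself rather than call `soliton_kex_build_first_message_aad`, the concatenation can be sketched as below. Assumptions: the label is the 8 ASCII bytes `lo-dm-v1` with no terminator, fingerprint and SessionInit lengths are passed in rather than assumed, `build_first_message_aad` is a hypothetical name, and all input pointers are non-null. Since both sides must produce identical bytes, both pass Alice's (the initiator's) fingerprint as the sender and Bob's as the recipient.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the label is these 8 ASCII bytes, with no NUL terminator. */
static const uint8_t FIRST_MSG_LABEL[] = { 'l', 'o', '-', 'd', 'm', '-', 'v', '1' };

/* Builds label || sender_fp || recipient_fp || session_init into a new
 * malloc'd buffer. Returns NULL on overflow or allocation failure; the
 * caller frees the result. */
static uint8_t *build_first_message_aad(const uint8_t *sender_fp, size_t sender_fp_len,
                                        const uint8_t *recipient_fp, size_t recipient_fp_len,
                                        const uint8_t *session_init, size_t session_init_len,
                                        size_t *out_len) {
    const size_t label_len = sizeof FIRST_MSG_LABEL;
    size_t total = label_len;
    /* Guard each addition against size_t overflow. */
    if (sender_fp_len > SIZE_MAX - total) return NULL;
    total += sender_fp_len;
    if (recipient_fp_len > SIZE_MAX - total) return NULL;
    total += recipient_fp_len;
    if (session_init_len > SIZE_MAX - total) return NULL;
    total += session_init_len;

    uint8_t *aad = malloc(total);
    if (aad == NULL) return NULL;
    uint8_t *p = aad;
    memcpy(p, FIRST_MSG_LABEL, label_len);     p += label_len;
    memcpy(p, sender_fp, sender_fp_len);       p += sender_fp_len;    /* initiator (Alice) */
    memcpy(p, recipient_fp, recipient_fp_len); p += recipient_fp_len; /* responder (Bob) */
    memcpy(p, session_init, session_init_len); /* canonical encode_session_init output */
    *out_len = total;
    return aad;
}
```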

**`encrypt_first_message` / `decrypt_first_message` are pre-RatchetState standalone operations**: These functions take an `initial_epoch_key` parameter directly — they do NOT require or use a `SolitonRatchet` handle. They are stateless AEAD operations called before the ratchet is initialized (`ratchet_init_alice` / `ratchet_init_bob`). A reimplementer who constructs a `SolitonRatchet` first and then tries to pass it to the first-message functions has misread the call sequence — the first-message functions consume the initial epoch key and return `ratchet_init_key`, which is then passed to ratchet init.

`ratchet_init_key` is the epoch key returned unchanged by `encrypt_first_message` / `decrypt_first_message` — it is the input `initial_epoch_key` passed through (counter-mode does not advance the epoch key). It is passed to `ratchet_init_alice` / `ratchet_init_bob` as the initial epoch key. It is not a separate derived value.

**Name equivalence (epoch key)**: The same 32-byte value (`session_key[32..64]` from §5.4 Step 4) appears under four names across the spec, Rust API, and CAPI: `epoch_key` (§5.4 protocol pseudocode), `initial_epoch_key` (CAPI `SolitonInitiatedSession` / `soliton_kex_receive` output in the §13.5 pseudocode), `initial_chain_key` (Rust `InitiatedSession::take_initial_chain_key()` — historical name from the pre-counter-mode chain design), and `ratchet_init_key` (CAPI return from `encrypt_first` / `decrypt_first`). All four are the same value at different points in the key flow. **`SolitonReceivedSession` struct field name**: In the `SolitonReceivedSession` C struct (§13.6), Bob's copy of this value is named **`chain_key`** — not `initial_epoch_key`. The §13.5 pseudocode uses `initial_epoch_key` as the return-value label for `soliton_kex_receive`; the §13.6 struct layout names the same field `chain_key`. A binding author laying out `SolitonReceivedSession` manually must use the field name `chain_key`, not `initial_epoch_key`.

**Name equivalence (ephemeral public key)**: Alice's ephemeral X-Wing public key (`EK_pub`, 1216 bytes) also appears under three names: `sender_ek` (the `SessionInit` struct field transmitted in §5.4 Step 5), `ek_pk` (the `SolitonInitiatedSession` field returned by `soliton_kex_initiate` and stored in Alice's ratchet handle via `soliton_ratchet_init_alice`), and `send_ratchet_pk` (the `RatchetState` field after `init_alice` — Alice's initial send ratchet public key, which Bob will encapsulate to on his first send). Getting this mapping wrong means Bob's first KEM ratchet step encapsulates to a different key than Alice expects — the resulting `kem_ss` diverges, the new epoch key diverges, and every subsequent message fails with `AeadFailed` with no diagnostic pointing to the mismatched key.

Passing `initial_epoch_key` directly to ratchet init (skipping the first-message step) produces no immediate error — AEAD encryption succeeds with any 32-byte key — but decryption at the remote end will fail.

**`soliton_kex_receive` wrong-key-ID silent failure**: If a recognized `spk_id` is paired with the wrong secret key (e.g., storage corruption maps a valid ID to different key material), `soliton_kex_receive` succeeds and returns a valid-looking `SolitonReceivedSession` — but X-Wing implicit rejection produces a pseudorandom `ss_spk`, so `root_key` and `initial_epoch_key` diverge from Alice's. The error surfaces only when `decrypt_first_message` fails with `AeadFailed`, with no diagnostic distinguishing this from ciphertext tampering or transport corruption. This is the same category of silent failure as passing `initial_epoch_key` directly to ratchet init. Bob's `spk_id → sk` mapping MUST be verified for integrity independently (e.g., by storing a fingerprint of the SPK public key alongside the private key and checking it before decapsulation) — see §5.5 Step 4.

**Single-use key extraction**: `InitiatedSession` and `ReceivedSession` enforce single-use extraction of `root_key` and `initial_epoch_key`. The first call to `take_root_key()` / `take_initial_epoch_key()` returns the value and replaces the internal copy with zeros. A second call returns all-zeros, which ratchet init rejects (all-zero root_key is invalid). Reimplementers providing accessor methods (`get_root_key()`) instead of consuming methods risk accidental key reuse — extracting the same root key twice and initializing two ratchets produces two sessions with identical state, causing nonce reuse on the first message.
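
The consuming-accessor pattern can be sketched in C with a hypothetical `demo_session` holder (not the real struct layout): the first `take_root_key` copies the key out and overwrites the internal copy, so a second call yields zeros, which the documented all-zero check in ratchet init would reject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32

/* Hypothetical holder mimicking single-use extraction; NOT the real
 * InitiatedSession/ReceivedSession layout. */
typedef struct {
    uint8_t root_key[KEY_LEN];
} demo_session;

/* Consuming accessor: copy the key out, then zero the internal copy with
 * volatile stores so a second take yields all-zero bytes. */
static void take_root_key(demo_session *s, uint8_t out[KEY_LEN]) {
    memcpy(out, s->root_key, KEY_LEN);
    volatile uint8_t *p = s->root_key;
    for (size_t i = 0; i < KEY_LEN; i++) {
        p[i] = 0;
    }
}

/* Mirrors the documented ratchet-init rule that an all-zero root key is invalid. */
static bool root_key_acceptable(const uint8_t key[KEY_LEN]) {
    uint8_t acc = 0;
    for (size_t i = 0; i < KEY_LEN; i++) {
        acc |= key[i];
    }
    return acc != 0;
}
```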

**`ek_sk` is also single-use**: The `ek_sk` field (X-Wing ephemeral secret key, 2432 bytes) in `SolitonInitiatedSession` MUST be passed to exactly one `soliton_ratchet_init_alice` call. `ek_sk` is the X-Wing decapsulation key that Alice will use to decapsulate Bob's first KEM ratchet ciphertext — passing it to two `init_alice` calls creates two ratchet instances with identical `send_ratchet_sk`. When Bob sends his first ratchet message, he encapsulates to Alice's `ek_pk` once; only one of the two Alice instances can derive the correct epoch key from the resulting KEM ciphertext. The other instance has the same `send_ratchet_sk` but decapsulates against a mismatched ciphertext, producing a wrong `kem_ss`, a wrong `recv_epoch_key`, and silent `AeadFailed` on the first message with no diagnostic pointing to the duplicated `ek_sk`. Unlike `root_key` and `initial_epoch_key`, `ek_sk` is not enforced as single-use by a consuming wrapper at the CAPI level (it is passed as a `*const SolitonBuf` raw pointer); callers MUST NOT reuse it. After `soliton_ratchet_init_alice` returns, the `ek_sk` buffer should be freed via `soliton_kex_initiated_session_free` — do not pass it to any further `init_alice` calls.

### 13.6 Opaque Structs

`SolitonRatchet`, `SolitonKeyRing`, `SolitonCallKeys`, `SolitonStreamEncryptor`, and `SolitonStreamDecryptor` are heap-allocated opaque structs. Their internal layout is not part of the ABI. They must be freed with `soliton_ratchet_free`, `soliton_keyring_free`, `soliton_call_keys_free`, `soliton_stream_encrypt_free`, and `soliton_stream_decrypt_free` respectively.

`SolitonInitiatedSession` is a flat C struct with both inline fields (zeroed by `soliton_kex_initiated_session_free`) and `SolitonBuf` fields. Release it, including its `ek_sk` field, with `soliton_kex_initiated_session_free`; do NOT rely on `soliton_buf_free` on `ek_sk` as the cleanup. `soliton_buf_free` frees the heap allocation and nulls the `SolitonBuf` fields, but it leaves the inline `root_key` and `initial_chain_key` arrays (32 bytes each, embedded directly in the struct rather than held in `SolitonBuf` fields) unzeroized. The dedicated free function zeroizes both inline arrays and then frees the `SolitonBuf` fields. Calling `soliton_buf_free` on `ek_sk` followed by the dedicated free is harmless (the null-after-free guarantee makes the second free of `ek_sk` a no-op), but calling only `soliton_buf_free` leaves 64 bytes of secret material unzeroized in memory.

**GC language hazard**: `SolitonInitiatedSession` contains inline `root_key` and `initial_chain_key` (32 bytes each), secret material embedded directly in the struct. In GC languages (C#, Go, Python), the GC may relocate (compact) a managed-heap struct, leaving unzeroized copies of these keys at the old address. Binding authors MUST allocate this struct in pinned/unmanaged memory (`Marshal.AllocHGlobal`, `C.malloc`, `ctypes.create_string_buffer`) and call `soliton_kex_initiated_session_free` immediately after extracting both keys to minimize the pinned lifetime.

**GC language hazard — `SolitonReceivedSession`**: The identical hazard applies to `SolitonReceivedSession` (Bob's side). `SolitonReceivedSession` contains inline `root_key` ([u8; 32]) and `chain_key` ([u8; 32]) — secret material embedded directly in the struct alongside `SolitonBuf` fields for `peer_ek`. GC relocation at any point between `soliton_kex_receive` returning and `soliton_kex_received_session_free` executing leaves unzeroized copies at the old address. Binding authors MUST apply the same pinned/unmanaged-memory allocation to `SolitonReceivedSession` as to `SolitonInitiatedSession`. The mitigation pattern is: allocate `SolitonReceivedSession` in pinned memory → call `soliton_kex_receive` → extract `root_key` and `chain_key` into pinned buffers → call `soliton_kex_received_session_free` → unpin. This struct is Bob's counterpart to Alice's `SolitonInitiatedSession` and carries the same category of secret material.
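
The extraction step leaves copies of `root_key` and `chain_key` in the binding's own pinned buffers. `soliton_kex_received_session_free` zeroizes only the struct itself, so if the binding keeps those copies, it must also wipe them once the ratchet has been initialized. Below is a minimal sketch of such a wipe, assuming the binding has a C shim. `secure_wipe` is a hypothetical helper, not a soliton function. On platforms that provide `explicit_bzero` or `memset_s`, those can be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overwrites n bytes at p with zeros. Writing through a volatile pointer
 * keeps the compiler from treating the wipe as a dead store and removing it. */
static void secure_wipe(void *p, size_t n) {
    assert(p != NULL || n == 0);
    volatile uint8_t *v = (volatile uint8_t *)p;
    while (n--) {
        *v++ = 0;
    }
}
```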

**Alignment padding in flat structs**: `SolitonInitiatedSession` has two implicit padding gaps. Binding authors who lay out the struct manually (Go `struct`, C# `StructLayout`, Python `ctypes.Structure`) MUST include both explicitly.

(1) **`spk_id` → `ct_opk` (4-byte gap)**: `spk_id` (`uint32_t`, 4 bytes) ends at offset 212, but `ct_opk` (`SolitonBuf`) requires 8-byte pointer alignment. The next 8-aligned boundary is offset 216, so 4 bytes of implicit padding occupy offsets 212-215. A binding author who places `ct_opk` at offset 212 corrupts all subsequent fields.

(2) **`has_opk` → `sender_sig` (3-byte gap)**: `has_opk` (`uint8_t`) is followed by 3 bytes of implicit alignment padding before the next pointer-aligned `SolitonBuf` field.

The generated `soliton.h` header handles both gaps automatically via C's natural alignment rules. `SolitonDecodedSessionInit` also has a 3-byte gap, but not immediately after its `has_opk` field. There, `has_opk` is followed directly by the byte array `ct_opk`, and the 3 bytes of padding sit between the end of `ct_opk` and the 4-byte-aligned `opk_id`.
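
Gap (1) can be checked mechanically with `offsetof` rather than computed by hand. The sketch below assumes LP64 and includes a stand-in field: `preceding` is a hypothetical 208-byte placeholder for whatever fields come before `spk_id`, sized only so that `spk_id` ends at offset 212 as described. The authoritative layout is the generated `soliton.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *ptr;
    size_t len;
} SolitonBuf;  /* ptr + len: 16 bytes, 8-byte aligned on LP64 */

/* Hypothetical reduced layout: only the spk_id -> ct_opk boundary is real. */
typedef struct {
    uint8_t preceding[208];  /* placeholder for the earlier fields */
    uint32_t spk_id;         /* offsets 208-211 */
    SolitonBuf ct_opk;       /* pushed to 216, not 212 */
} SpkIdGapDemo;

static_assert(offsetof(SpkIdGapDemo, spk_id) == 208, "spk_id starts at 208");
static_assert(offsetof(SpkIdGapDemo, ct_opk) == 216, "ct_opk is 8-aligned");
```

Assertions like these, compiled against the real header, catch a hand-written Go, C#, or Python layout that drifted from the C compiler's layout.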

**`SolitonDecodedSessionInit` contains no secret material — no zeroization required**: All fields are wire-transmitted public or semi-public values (ciphertexts, public keys, fingerprints, version string). None require zeroization or privileged memory treatment. Callers MAY discard this struct normally after use — `free()` in C, garbage collection in managed languages, stack deallocation in Rust. Contrast with `SolitonInitiatedSession` and `SolitonReceivedSession`, which contain secret key material (`root_key`, epoch keys) derived from KEM operations and MUST be freed exclusively via `soliton_kex_initiated_session_free` / `soliton_kex_received_session_free`, which zeroize their contents before deallocation. `SolitonDecodedSessionInit` does not have and does not need a zeroizing free function.

**`SolitonDecodedSessionInit` is large (4,672 bytes on LP64); avoid stack allocation in constrained environments**: This struct contains the full decoded fields of a SessionInit. These include `ct_ik` (1,120 bytes), `ct_spk` (1,120 bytes), `ct_opk` (1,120 bytes), `sender_ek` (1,216 bytes), and one `SolitonBuf` field (`crypto_version`, 16 bytes on LP64: `ptr` + `len`). The exact `#[repr(C)]` size on LP64 is 4,672 bytes: `crypto_version`(16) + `sender_fp`(32) + `recipient_fp`(32) + `sender_ek`(1216) + `ct_ik`(1120) + `ct_spk`(1120) + `spk_id`(4) + `has_opk`(1) + `ct_opk`(1120) + 3 bytes alignment padding + `opk_id`(4) + 4 bytes trailing struct padding to align to 8 bytes = 4,672. Binding authors doing manual struct layout (Go `struct`, C# `StructLayout`, Python `ctypes.Structure`) must include both the 3-byte padding before `opk_id` and the 4-byte trailing padding.

Placing this struct on the stack risks non-deterministic stack overflow in two environments. Go goroutines start with small stacks (as small as 2 KiB in current Go runtimes) that are shared with other locals. .NET async state machines share their stack budget with awaiter frames. **Binding authors MUST heap-allocate this struct**: use `C.malloc` in Go, `Marshal.AllocHGlobal` in .NET, `ctypes.create_string_buffer(ctypes.sizeof(...))` in Python, or an equivalent. In Rust, the core library's `SolitonDecodedSessionInit` is behind a `Box<>`, and C/Go/Python bindings must ensure the same. A binding that allocates this struct on the frame stack will pass tests on machines with ample stack space but crash non-deterministically in production under deep call chains.
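
The byte arithmetic above can be pinned down at compile time. The sketch below declares a mirror struct in the field order listed in this paragraph and adds `static_assert`s that fail the build if the padding or total size drifts. It also includes a heap-allocation helper. It assumes LP64. The names `DecodedSessionInitMirror` and `decoded_session_init_alloc` are hypothetical, and binding authors should still check the field order against `soliton.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const uint8_t *ptr;
    size_t len;
} SolitonBuf;  /* 16 bytes on LP64 */

typedef struct {
    SolitonBuf crypto_version;  /* offset 0 */
    uint8_t sender_fp[32];      /* 16 */
    uint8_t recipient_fp[32];   /* 48 */
    uint8_t sender_ek[1216];    /* 80 */
    uint8_t ct_ik[1120];        /* 1296 */
    uint8_t ct_spk[1120];       /* 2416 */
    uint32_t spk_id;            /* 3536 */
    uint8_t has_opk;            /* 3540 */
    uint8_t ct_opk[1120];       /* 3541, ends at 4661 */
                                /* 3 bytes implicit padding */
    uint32_t opk_id;            /* 4664 */
                                /* 4 bytes trailing padding */
} DecodedSessionInitMirror;

static_assert(offsetof(DecodedSessionInitMirror, ct_opk) == 3541, "ct_opk follows has_opk directly");
static_assert(offsetof(DecodedSessionInitMirror, opk_id) == 4664, "3 bytes of padding before opk_id");
static_assert(sizeof(DecodedSessionInitMirror) == 4672, "4 bytes of trailing padding");

/* Heap allocation keeps the ~4.6 KB struct off small or shared stacks. */
static DecodedSessionInitMirror *decoded_session_init_alloc(void) {
    return calloc(1, sizeof(DecodedSessionInitMirror));
}
```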

**`SolitonDecodedSessionInit.crypto_version` SolitonBuf length includes null terminator**: The `crypto_version` field of `SolitonDecodedSessionInit` is a `SolitonBuf` whose `len` is **13** for the current version — 12 bytes for the string `"lo-crypto-v1"` plus one trailing null byte (`\0`). The null byte is included to make the buffer directly usable as a C string without an additional copy. Binding authors who read `len` and compare it to 12 (the character count of `"lo-crypto-v1"`) will find `len == 13` and may incorrectly conclude the version string is malformed. The correct validation pattern is: `buf.len == 13 && buf.ptr[0..12] == b"lo-crypto-v1" && buf.ptr[12] == 0`. Do NOT pass `buf.len` as the length of a cryptographic comparison (e.g., to a constant-time compare function) expecting 12 — the extra null byte would cause a mismatch against a 12-byte reference string.
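
Below is a sketch of the validation pattern in C, assuming `SolitonBuf` is the `ptr` + `len` pair described above. The comparison uses `sizeof expected`, which is 13, because a C string literal's array includes its terminating NUL. The trailing null byte is therefore checked along with the 12 characters. `crypto_version_is_v1` is a hypothetical helper name. A plain `memcmp` is acceptable here because the version string is public, so timing leaks nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *ptr;
    size_t len;
} SolitonBuf;

/* Accepts exactly "lo-crypto-v1" followed by one NUL byte (len == 13). */
static bool crypto_version_is_v1(const SolitonBuf *buf) {
    static const char expected[] = "lo-crypto-v1";  /* sizeof == 13, NUL included */
    assert(buf != NULL);
    if (buf->ptr == NULL || buf->len != sizeof expected) {
        return false;
    }
    /* Compares all 13 bytes, so buf->ptr[12] must be 0 as well. */
    return memcmp(buf->ptr, expected, sizeof expected) == 0;
}
```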

**`SolitonRatchetHeader` and `SolitonEncryptedMessage` layouts (flat value types — binding authors must lay out manually)**: These are `#[repr(C)]` flat structs returned from `soliton_ratchet_encrypt` and passed to `soliton_ratchet_decrypt`. Unlike the opaque handle types above, binding authors in Go, C#, and Python must lay out these structs explicitly.

`SolitonRatchetHeader` (40 bytes on LP64):

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 16 | `ratchet_pk` | `SolitonBuf` — sender's ratchet public key (library-allocated; ptr + len, each 8 bytes on LP64) |
| 16 | 16 | `kem_ct` | `SolitonBuf` — KEM ciphertext, if present; `ptr` is null and `len` is 0 if absent (same-epoch message) |
| 32 | 4 | `n` | `uint32_t` — message number within current send chain |
| 36 | 4 | `pn` | `uint32_t` — length of the previous send chain |

`SolitonEncryptedMessage` (56 bytes on LP64):

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0 | 40 | `header` | `SolitonRatchetHeader` (inline, not a pointer) |
| 40 | 16 | `ciphertext` | `SolitonBuf` — AEAD-encrypted message (library-allocated) |
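
The two tables translate directly into C declarations. The sketch below adds `static_assert` checks on the documented offsets and assumes LP64. The `...Mirror` type names are hypothetical, and the generated `soliton.h` remains authoritative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;
    size_t len;
} SolitonBuf;  /* 16 bytes on LP64 */

typedef struct {
    SolitonBuf ratchet_pk;  /* 0 */
    SolitonBuf kem_ct;      /* 16: ptr == NULL and len == 0 when absent */
    uint32_t n;             /* 32 */
    uint32_t pn;            /* 36 */
} RatchetHeaderMirror;

typedef struct {
    RatchetHeaderMirror header;  /* 0, stored inline */
    SolitonBuf ciphertext;       /* 40 */
} EncryptedMessageMirror;

static_assert(sizeof(RatchetHeaderMirror) == 40, "header is 40 bytes");
static_assert(offsetof(RatchetHeaderMirror, n) == 32, "n at 32");
static_assert(offsetof(RatchetHeaderMirror, pn) == 36, "pn at 36");
static_assert(offsetof(EncryptedMessageMirror, ciphertext) == 40, "ciphertext at 40");
static_assert(sizeof(EncryptedMessageMirror) == 56, "message is 56 bytes");
```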

**`kem_ct.ptr == NULL` (with `len == 0`) signals the absence of a KEM ciphertext.** Do NOT signal absence with a non-null pointer to a zero-filled ciphertext buffer. Presence is determined by the pointer, not by the buffer contents. `soliton_ratchet_encrypt` zeroes the entire `SolitonEncryptedMessage` output before writing, so on success an absent `kem_ct` is left as `{NULL, 0}`. On error, the output is also zeroed. This makes `soliton_encrypted_message_free` safe to call on error paths, since it is a no-op on a zero-initialized struct. Binding authors MUST pass `kem_ct.ptr` as `NULL` and `kem_ct.len` as `0` to `soliton_ratchet_decrypt` when the header contains no KEM ciphertext.
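
Below is a sketch of how a binding might map its own optional type onto this convention, assuming `SolitonBuf` is the `ptr` + `len` pair above. Both helper names are hypothetical. Normalizing an absent value to `{NULL, 0}` also discards any stale length left over from a reused buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *ptr;
    size_t len;
} SolitonBuf;

/* Presence is decided by the pointer alone, never by inspecting bytes. */
static bool kem_ct_present(const SolitonBuf *kem_ct) {
    assert(kem_ct != NULL);
    return kem_ct->ptr != NULL;
}

/* Builds the kem_ct value for decrypt from a binding-side optional:
 * data == NULL means "absent" and always yields {NULL, 0}. */
static SolitonBuf kem_ct_from_optional(const uint8_t *data, size_t len) {
    SolitonBuf out = { NULL, 0 };
    if (data != NULL) {
        out.ptr = data;
        out.len = len;
    }
    return out;
}
```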

**Type-tagging**: Each opaque handle type embeds a 4-byte magic discriminant as its first field. The `_free` functions validate this discriminant before operating on the pointer. Passing a handle to the wrong free function (e.g., `soliton_ratchet_free` on a `SolitonKeyRing*`) is detected and returns `InvalidData` rather than corrupting memory. The discriminant values are internal and not part of the ABI.
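
The discriminant check can be illustrated with a toy handle. This is a sketch only. The magic values, the `...Demo` structs, and `ratchet_free_demo` are all hypothetical, because the real discriminants are internal and not part of the ABI. Only the `-17` return value for `InvalidData` comes from this section. Treating a NULL handle as a no-op is an additional assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SOLITON_INVALID_DATA (-17)           /* InvalidData, per this section */
#define MAGIC_RATCHET UINT32_C(0x52415443)  /* hypothetical tag values */
#define MAGIC_KEYRING UINT32_C(0x4B455952)

typedef struct { uint32_t magic; int state; } RatchetDemo;
typedef struct { uint32_t magic; int state; } KeyRingDemo;

/* Every handle type begins with a 4-byte tag, so the free function can
 * read it before assuming anything else about the pointee. */
static int ratchet_free_demo(void *handle) {
    if (handle == NULL) {
        return 0;  /* assumed: freeing NULL is a no-op */
    }
    uint32_t tag;
    memcpy(&tag, handle, sizeof tag);
    if (tag != MAGIC_RATCHET) {
        return SOLITON_INVALID_DATA;  /* wrong handle type: touch nothing */
    }
    free(handle);
    return 0;
}
```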

**`soliton.h` OWNERSHIP comment says cross-type free is "undefined behavior" — Specification.md is normative**: The generated `soliton.h` header may carry an OWNERSHIP comment stating that passing the wrong handle type to a `_free` function is "undefined behavior." This contradicts §13.6's normative claim that the type discriminant check catches this and returns `InvalidData`. **Specification.md is normative; the header comment is documentation only.** Binding authors reading `soliton.h` who see "undefined behavior" and add their own UB-protection wrappers (null-checking the outer pointer, refusing to call `_free` when uncertain of the handle type) may inadvertently mask the `InvalidData` return code. The correct model: the discriminant check is implemented; cross-type free returns `InvalidData` (-17); no memory is corrupted; binding wrappers should propagate `InvalidData` as a type-mismatch error, not treat cross-type free as safe to elide.

**Pointer aliasing**: Opaque handles must not be aliased. Copying a handle pointer via `memcpy` and then using both copies produces undefined behavior. Freeing through one copy leaves the other dangling, which leads to use-after-free or double free. Two concurrent encrypt calls through aliased `SolitonRatchet` handles can also race on the send counter and use the same nonce, causing catastrophic AEAD nonce reuse. The CAPI does not enforce single ownership at the API level, so this is a caller obligation. If a binding language needs to share a handle across threads, it must serialize access (e.g., with a mutex).
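
Below is a minimal sketch of the serialization requirement, assuming a binding with a C shim and POSIX threads. `SharedRatchet` and its functions are hypothetical. `send_count` stands in for the ratchet's per-epoch counter, from which the nonce is derived. In a real binding, the lock would be held around the entire `soliton_ratchet_encrypt` call on the wrapped handle, not just around a counter increment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;  /* one lock per underlying SolitonRatchet handle */
    uint32_t send_count;   /* stand-in for the nonce-determining counter */
} SharedRatchet;

static void shared_ratchet_init(SharedRatchet *r) {
    int rc = pthread_mutex_init(&r->lock, NULL);
    assert(rc == 0);
    (void)rc;
    r->send_count = 0;
}

/* The read-and-advance happens entirely under the lock, so no two
 * callers can obtain the same counter value. */
static uint32_t shared_ratchet_next_counter(SharedRatchet *r) {
    pthread_mutex_lock(&r->lock);
    uint32_t n = r->send_count++;
    pthread_mutex_unlock(&r->lock);
    return n;
}

static void shared_ratchet_destroy(SharedRatchet *r) {
    pthread_mutex_destroy(&r->lock);
}
```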

---

## 14. Security Analysis

### 14.1 Compromised Community Server

**Impact**: Reads group plaintext, observes connected users, could modify or inject messages, present fake keys.
**Mitigations**: Group chat visibility accepted by design (§11 — community storage is channel-keyed, not user-keyed). DMs E2E encrypted (§5-§6). Fake key presentation mitigated by verification phrases (§9) + key pinning + key change warnings.

### 14.2 Compromised DM Relay

**Impact**: Metadata (sender/recipient/timing), stored ciphertext, could substitute pre-keys.
**Mitigations**: Content E2E encrypted (§5-§6). Pre-key substitution → hybrid signature verification fails (requires breaking both Ed25519 and ML-DSA; §3.2, §5.3).

### 14.3 Harvest-Now-Decrypt-Later

**Impact**: Recorded ciphertext held for future quantum computer.
**Mitigations**: X-Wing ML-KEM-768 protects session keys (§8). ML-DSA-65 protects signature integrity (§3).

**What a CRQC breaking X25519 alone cannot do**: X-Wing combines X25519 and ML-KEM-768 with a SHA3-256 combiner (§8). Breaking X25519 yields `ss_X` but not `ss_M`. The session key is `SHA3-256(ss_M || ss_X || ct_X || pk_X || label)`, so an attacker who knows `ss_X` but not `ss_M` cannot recover it. A cryptographically relevant quantum computer (CRQC) running Shor's algorithm against X25519 therefore gains nothing unless ML-KEM-768 is also broken. The harvest-now-decrypt-later threat is neutralized for session keys as long as ML-KEM-768 remains secure.

**What is at risk**: Two things remain exposed. The first is pre-key bundle signatures, which become forgeable only if ML-DSA-65 is broken as well, since a CRQC already breaks Ed25519. The second is the X25519 component of initial session key material. That component backs the hybrid combiner's IND-CCA2 claim as a fallback in case ML-KEM-768 falls. While ML-KEM-768 is intact, losing it costs no security.

### 14.4 Identity Key Compromise

**Can**: Impersonate user, sign fake pre-keys (§5.3), authenticate as user (§4).
**Cannot**: Decrypt past sessions with IK alone, because session establishment also requires the SPK private key (§5.6). Decrypt current sessions, which requires ratchet keys (§6). Impersonate others to the compromised user.
**Recovery**: New identity keypair. Contacts re-verify phrases (§9).

**IK + SPK capability window is bounded by the SPK retention policy**: A combined IK-and-SPK compromise recovers session keys only while `sk_SPK` is retained. SPKs are rotated every 7 days and the private key is deleted 30 days after rotation (§10.2, Appendix B). After `sk_SPK` deletion, even combined IK + SPK capability cannot recover that session's key — `ss_spk` is no longer computable. The attacker's window is at most 37 days from SPK generation (7-day rotation interval + 30-day retention window). Sessions established more than 37 days before the compromise with no SPK re-use are retrospectively safe.
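
The 37-day bound is simple date arithmetic. The sketch below treats days as integers and takes the 7-day rotation and 30-day retention values from this section. The function names are hypothetical. Treating deletion as taking effect at the start of the deletion day is an assumption about how the boundary is handled.

```c
#include <assert.h>
#include <stdbool.h>

#define SPK_ROTATION_DAYS 7    /* SPK rotated every 7 days */
#define SPK_RETENTION_DAYS 30  /* sk_SPK deleted 30 days after rotation */

static int spk_deletion_day(int spk_generated_day) {
    return spk_generated_day + SPK_ROTATION_DAYS + SPK_RETENTION_DAYS;
}

/* True if a combined IK + SPK compromise on compromise_day can still
 * recompute ss_spk for sessions that used the SPK generated on
 * spk_generated_day (deletion assumed effective at the start of its day). */
static bool ik_spk_compromise_recovers(int spk_generated_day, int compromise_day) {
    assert(compromise_day >= spk_generated_day);
    return compromise_day < spk_deletion_day(spk_generated_day);
}
```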

### 14.5 First Contact (TOFU)

On first contact without prior keys, mutual auth not guaranteed. Same as Signal/SSH. See §5.6.

**Verification phrase birthday resistance (~2^45)**: Verification phrases (§9) provide a partial mitigation for TOFU key substitution, but their birthday resistance is limited to approximately 2^45 SHA3-256 operations (§9.4). A well-resourced attacker who controls key generation at scale can generate ~2^45 identity keys and, by the birthday paradox, find two that produce the same verification phrase when paired with a given victim key — substituting the colliding key passes the out-of-band check. For most threat models this is out of reach, but the limitation is relevant for environments with state-level adversaries. Applications with high-threat requirements SHOULD supplement verification phrase comparison with full 32-byte fingerprint comparison (64 lowercase hex characters, §2.1), which provides ~256-bit second-preimage resistance against key substitution. See §9.4 for the full collision analysis.

### 14.6 Ratchet State Desynchronization

Counter-mode derivation eliminates stateful chain advancement, the primary historical source of desynchronization. Session reset (§6.10) recovers at cost of in-flight messages.

### 14.7 Header Tampering

All header fields bound into AEAD AAD (§7.3-7.4). Tampering → AEAD failure. Prevents state poisoning.

### 14.8 Storage Blob Relocation

Channel and segment IDs in storage AAD (§11.4). Blobs cannot be moved.

### 14.8a Ratchet State Blob Substitution

**Impact**: An attacker with write access to persisted ratchet state blobs can substitute an older blob (replay attack) or a blob from a different session (session confusion), potentially recovering old messages or inducing key reuse.

**Substitution of an older blob**: Reloading a stale ratchet blob rolls back `send_count`, which causes nonce reuse. The next encrypted message reuses a counter that was already used in the current epoch, producing a ciphertext under the same `(key, nonce)` pair as a previously sent message. AEAD nonce reuse with the same key reveals the XOR of the two plaintexts, which is a catastrophic confidentiality failure. For XChaCha20-Poly1305 it also repeats the one-time Poly1305 key, which enables forgeries.

Mitigations: per §6.8 Caller Obligation 2, callers MUST store the last-known epoch (`new_epoch - 1`) and pass it to `from_bytes_with_min_epoch` on reload. Any blob with `epoch ≤ min_epoch` is rejected with `InvalidData`. Callers have no protection against blob rollback if they use `from_bytes` (no min_epoch) instead of `from_bytes_with_min_epoch`, or if they store the min_epoch value in the same write-accessible store as the blob.
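
Below is a sketch of the caller's side of Caller Obligation 2. It assumes epochs fit in a `uint64_t`; the actual epoch type is defined in §6.8, not here. `check_blob_epoch` and `min_epoch_after` are hypothetical names that only restate the rule, and the real check lives in `from_bytes_with_min_epoch`. One requirement the sketch cannot express is storage: the persisted `min_epoch` must live somewhere an attacker who can write the blob cannot also write.

```c
#include <assert.h>
#include <stdint.h>

#define SOLITON_INVALID_DATA (-17)

/* Mirrors the documented rule: reject any blob whose epoch <= min_epoch. */
static int check_blob_epoch(uint64_t blob_epoch, uint64_t min_epoch) {
    return blob_epoch <= min_epoch ? SOLITON_INVALID_DATA : 0;
}

/* The value to persist after the ratchet advances to new_epoch. */
static uint64_t min_epoch_after(uint64_t new_epoch) {
    assert(new_epoch > 0);
    return new_epoch - 1;
}
```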

**Substitution of a different session's blob**: A blob from a different session fails immediately at AAD reconstruction — the `sender_fp` and `recipient_fp` embedded in the ratchet state's AAD scheme (§6.8) will not match the expected values for this session, causing `AeadFailed` before any ratchet state is loaded. No cross-session confusion is possible without breaking the ratchet AEAD.

**Countermeasures are documented in §6.8** (anti-rollback epoch guard, Caller Obligation 2) but are not automatically enforced — they require explicit caller action. Application authors and binding authors MUST implement the epoch-store pattern. See §6.8 for the full caller obligation list.

### 14.9 Pre-Key Exhaustion

Per-source-per-target rate limiting. Sessions without OPK secure with reduced initial FS.

**Reduced initial FS: concrete window**: When no OPK is used, the session's initial forward secrecy is bounded by the SPK's lifetime. SPKs are rotated weekly, and their private keys are retained for 30 days after rotation (Appendix B). For an OPK-absent session, an attacker who later obtains the SPK private key before it is deleted can therefore recover the session's initial shared secrets. A session may be initiated as early as the day its SPK was generated. The window therefore extends up to 37 days from session initiation (at most 7 days until rotation plus 30 days of retention). This falls short of the one-time, delete-on-use guarantee that an OPK provides. For OPK-present sessions, deleting `sk_OPK` immediately after `receive_session` closes the forward secrecy vulnerability window at that point, independent of SPK lifetime. Implementers calibrating OPK replenishment thresholds and the rate-limiting policy should note what OPK exhaustion actually costs. It degrades forward secrecy from "delete-on-use" to a bounded window of at most 37 days, not to "no forward secrecy"; the SPK still provides forward secrecy once its private key is deleted.

### 14.10 Metadata

Relay knows sender/recipient/timing. No IP logging. Tor/VPN for elevated threats. DM padding is mandatory (Protocol §15.1); community padding is optional (Protocol §15.2).

### 14.11 Forced Session Reset

**What an adversary gains from a forced reset**: Denial of service — the session's in-flight messages become permanently undecryptable (the ratchet state is zeroized) and both parties must establish a new session via LO-KEX. The adversary learns nothing new: the post-reset state is all-zeros with no key material remaining.

**Forward secrecy after reset**: Reset does NOT provide retroactive forward secrecy for pre-reset messages. Messages encrypted before the reset remain at risk if the pre-reset epoch was already compromised. Reset terminates an active session; it does not erase the adversary's copy of previously captured ciphertext.

**How an attacker forces a reset**: Forcing a reset requires the attacker to produce a ratchet state inconsistency that triggers §6.9 recommendation 4 (unrecoverable decryption failure → call `reset()`). An attacker who can inject malformed ciphertexts can trigger repeated resets, denying service, and each reset permanently loses all in-flight messages. This is no worse than the baseline capability of dropping messages, since message suppression already prevents delivery. Repeated resets do, however, additionally force the overhead of LO-KEX re-establishment.

**Mutual reset prerequisite**: A reset by one party does not automatically reset the other's state. For the conversation to resume, both parties must independently detect the desynchronization (e.g., via application-layer re-key request) and perform new LO-KEX exchanges. An asymmetric reset — one party resets, the other does not — produces permanent desynchronization with no error distinguishable from transport loss.

### 14.12 KEM Ratchet — Single-Sided Randomness

LO-Ratchet's KEM ratchet differs from the Double Ratchet's DH ratchet in a fundamental way: only the encapsulator (new sender) contributes fresh randomness per step. In a DH ratchet, both parties contribute private key material to the shared secret. In a KEM ratchet, the decapsulator's contribution is their existing ratchet public key from a previous step.

**Implication**: If the encapsulator's RNG is compromised during a KEM ratchet step, that step does not advance forward secrecy. However, an RNG failure is catastrophic regardless — ephemeral keys, nonces, and all security-critical random values generated during the failure window are equally compromised. The ratchet's single-sided randomness is the least of the problems. Mitigating it would require bidirectional KEM per ratchet step (mandatory round-trip, doubled ciphertext) — costs that address a scenario already catastrophic for independent reasons.

This property is inherent to all KEM-based ratchets, not specific to LO-Ratchet. Mitigation: use the OS CSPRNG exclusively (`getrandom`).

### 14.13 Header Size Side Channel

**KEM-ratchet-step headers are observably larger than same-chain headers.** A ratchet header that includes a KEM ciphertext (`has_kem_ct = 0x01`) encodes to exactly 2,347 bytes on the wire (§7.4, Appendix C). A same-chain header (no KEM ciphertext, `has_kem_ct = 0x00`) encodes to exactly 1,225 bytes. The exact 1,122-byte difference is directly observable by any passive network adversary, regardless of transport encryption (the header is inside the encrypted channel, but the message size is observable as a traffic feature).
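
The arithmetic below is a worked check of one field breakdown that is consistent with both totals. The X-Wing sizes are standard: a 1,216-byte public key and a 1,120-byte ciphertext. The other fields are assumptions chosen to match the stated totals, not taken from §7.4 or Appendix C, which remain authoritative. Those assumed fields are the 1-byte flag, the two 4-byte counters, and a 2-byte length prefix on the ciphertext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XWING_PK_LEN 1216u  /* ML-KEM-768 public key (1184) + X25519 (32) */
#define XWING_CT_LEN 1120u  /* ML-KEM-768 ciphertext (1088) + X25519 (32) */

/* Assumed breakdown: ratchet_pk, 1-byte has_kem_ct flag, 4-byte n, 4-byte pn,
 * plus a 2-byte length prefix and the ciphertext only when the flag is set. */
static size_t ratchet_header_wire_len(bool has_kem_ct) {
    size_t len = XWING_PK_LEN + 1 + 4 + 4;
    if (has_kem_ct) {
        len += 2 + XWING_CT_LEN;
    }
    return len;
}

static_assert(XWING_PK_LEN + 1 + 4 + 4 == 1225, "same-chain total in 14.13");
static_assert(XWING_PK_LEN + 1 + 4 + 4 + 2 + XWING_CT_LEN == 2347, "KEM-step total in 14.13");
```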

**What this leaks**: A passive adversary observing message sizes can infer when a party changes send direction (the encapsulating party's header grows by ~1,122 bytes). In a typical DM exchange, this reveals the alternating communication pattern — who initiated each new "round" of the conversation. This does not reveal message content or timing of individual messages within a round, but it does reveal the coarse structure of who speaks next after a silence.

**Normative position**: This is accepted leakage. LO's threat model (§14.10) acknowledges that relay operators observe communication metadata (sender, recipient, timing). Header size is a metadata feature visible at the same layer as message timing and count. Mitigating it would require padding all headers to a fixed size (2,347 bytes), adding ~1,122 bytes of overhead to every non-ratchet-step message — approximately doubling header overhead for typical high-frequency exchanges. The security benefit (hiding direction changes) is low: direction changes are correlated with reply events, which are already inferable from timing alone. Implementations MUST NOT treat this as a bug; this is a documented, accepted property.

**Transport-layer mitigation (optional)**: Transports that pad traffic to fixed-size cells (e.g., QUIC with datagram padding, LO's Protocol §15.1 DM padding) may partially obscure this difference. This is a transport-layer concern, not a cryptographic one.

### 14.14 Version Downgrade Policy

LO uses a hard-fail version policy. `verify_bundle` rejects any `crypto_version` other than the currently supported version (`"lo-crypto-v1"`). There is no version negotiation, no silent fallback, and no "choose best supported" logic. An unrecognized version is treated as a malformed bundle — the session is aborted and the user is warned.

This eliminates downgrade attacks by design: an attacker who modifies `crypto_version` in a relayed bundle causes rejection, not degraded security. The `crypto_version` field is not signed (§5.3), but tampering with it produces the same outcome as dropping the bundle entirely — the attacker's best outcome is message suppression (denial of service), not weakened cryptography, which is no worse than the Dolev-Yao baseline capability of dropping messages.

**Migration window downgrade risk** *(forward-looking: no v2 is currently defined; these are design requirements for a future migration window)*: The above guarantee holds only when a single version is in operation. Consider a v1→v2 migration window, in which both versions are accepted. Because the `crypto_version` field is unsigned, a network attacker relaying a bundle can replace `"lo-crypto-v2"` with `"lo-crypto-v1"`. A v2-capable initiator connecting to a v2 peer then silently negotiates v1 instead. Unlike the single-version case, this substitution does NOT cause rejection, because v1 is still accepted during the window. The attacker's outcome is therefore downgrade rather than message suppression.

**Requirement for v2 deployment**: The v2 pre-key bundle MUST sign the `crypto_version` field to prevent this substitution. This is a known gap in the current single-version design: §5.3 explicitly excludes `crypto_version` from the SPK signature. It MUST be corrected before deploying a migration window. Alternatively, bundle integrity can be protected at the transport layer (e.g., QUIC with server-authenticated certificate + pinning), ensuring that relay substitution is detectable. **The v2 pre-key bundle format, including the signing message structure, wire layout, and migration mechanism, is out of scope for this specification.** Implementers MUST NOT attempt to deploy a v2 migration window based solely on this note. The v2 format will be defined in a future version of this spec. v1 and any future v2 are disjoint wire formats with no backward-compatible relationship defined here.

*(The following describes the intended migration mechanism — no v2 protocol is currently defined; no negotiation infrastructure need be built until v2 is specified.)* Future version transitions (e.g., `lo-crypto-v2`) will support exactly two versions during a migration window. The older version will be removed in a subsequent release. At no point will more than two versions be accepted concurrently. **Migration mechanism**: The initiator reads `crypto_version` from the recipient's pre-key bundle and uses the highest mutually supported version. There is no separate negotiation handshake — version selection is implicit in the bundle. During a v1→v2 migration window, a v2-capable initiator connecting to a v1 peer (whose bundle advertises `"lo-crypto-v1"`) uses v1. A v2-capable initiator connecting to a v2 peer uses v2. A v1-only initiator connecting to a v2-only peer fails at `verify_bundle` (unrecognized version). The recipient's bundle is the sole version signal; the initiator MUST NOT exceed the recipient's advertised version.
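
Because the bundle advertises a single version and the initiator may not exceed it, the selection rule reduces to an exact-match lookup. The sketch below is illustrative only. No v2 exists, so the `"lo-crypto-v2"` entry in the supported list is hypothetical, as is `select_crypto_version`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: versions this initiator implements. Only v1 exists today. */
static const char *const SUPPORTED[] = { "lo-crypto-v2", "lo-crypto-v1" };

/* Returns the version to use, or NULL (hard fail) if the bundle's version
 * is unsupported. The bundle is the sole version signal: the result is
 * always exactly what the recipient advertised, never higher. */
static const char *select_crypto_version(const char *bundle_version) {
    assert(bundle_version != NULL);
    for (size_t i = 0; i < sizeof SUPPORTED / sizeof SUPPORTED[0]; i++) {
        if (strcmp(bundle_version, SUPPORTED[i]) == 0) {
            return SUPPORTED[i];
        }
    }
    return NULL;
}
```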

**Dual role of `crypto_version`**: The `"lo-crypto-v1"` string serves two independent purposes: (1) a wire field in pre-key bundles and session init, triggering hard-fail version rejection in `verify_bundle` and `decode_session_init`; and (2) a KDF domain separator embedded in the HKDF `info` (§5.4 Step 4), binding the session key derivation to the protocol version. A v2 migration requires changing both, for different reasons — the wire field for compatibility gating, the KDF label for cryptographic domain separation. Changing only the wire field would produce a version that hard-fails bundle verification but would derive the same session keys as v1 if it somehow bypassed the check. Changing only the KDF label would silently produce incompatible keys while the wire field still accepts the old version. Both must change atomically.

**KDF label mismatch is undetectable at session establishment.** When a version mismatch causes the initiator and responder to derive session keys using different `crypto_version` strings in the HKDF `info`, `receive_session` succeeds and `init_alice`/`init_bob` initialize without error — the divergent keys are not compared. The first observable symptom is `AeadFailed` at `decrypt_first_message`, with no diagnostic distinguishing "mismatched crypto_version in KDF" from "corrupted ciphertext" or "wrong session keys." An implementer of the migration window who mismatches the KDF label while matching the wire field will see what appears to be random AEAD failures with no obvious cause. The safe verification: after `receive_session`, compare the negotiated `crypto_version` from the `SessionInit` against both parties' KDF labels before proceeding.

**Independent versioning axes**: `crypto_version` (e.g., `"lo-crypto-v1"`) governs session establishment (§5) — it determines KEM algorithms, HKDF labels, and wire formats for the key exchange. The ratchet blob version byte (§6.8, currently `0x01`) governs ratchet state serialization and is independent. A new optional field in the ratchet blob increments only the blob version, not the protocol version. A `lo-crypto-v2` transition would require, at minimum, a new `crypto_version` string and updated HKDF labels; the blob version can remain `0x01` if the ratchet format is unchanged. Similarly, the streaming AEAD header version (§15.2, currently `0x01`) and storage blob format are separate versioning axes.

### 14.15 Non-Deniability

LO-KEX and LO-Ratchet do not provide deniability. Any message ciphertext and ratchet header can be cryptographically attributed to the sender. The `sender_sig` in §5.4 Step 6 is a hybrid signature (Ed25519 + ML-DSA-65) binding Alice's long-term identity key to the session init, and the AAD scheme (§6.5) binds each ratchet message to both parties' fingerprints. An adversary who obtains a session transcript can verify that the session was established with Alice's identity key. This is intentional: LO's threat model prioritizes verifiable authenticated channels over the offline deniability that Signal's protocol aims to provide. Applications requiring offline deniability (e.g., protection against coerced evidence disclosure) MUST use an additional repudiability layer and should not rely on soliton alone. Possible layers include omitting long-term signatures from stored transcripts, using per-session ephemeral signing keys without long-term key binding, or employing a deniable symmetric-key scheme for message bodies.

The brief deniability paragraph in §5.6 refers specifically to the short-term deniability window provided by the ephemeral KEM ciphertext. During session establishment, an observer who does not hold Alice's identity key cannot confirm authorship of the session init ciphertext, since Alice's `EK_sk` is ephemeral. Once `sender_sig` is verified and the session is established, however, deniability is lost: the long-term identity binding is irrevocable.

### 14.16 Streaming AEAD and Ratchet Key Exposure

Per-stream random keys (§15.1) limit the blast radius of a ratchet epoch compromise. An adversary who compromises a ratchet epoch key can recover stream keys that transited that epoch — the stream key is transmitted inside a ratchet-encrypted message alongside stream metadata, so the epoch key decrypts the message and recovers the stream key. However, only streams whose keys were transmitted during the compromised epoch are affected; streams whose keys transited different epochs remain protected.

Batching multiple stream keys in a single ratchet message multiplies exposure — a single epoch compromise recovers all batched keys. The recommended pattern: one stream key per ratchet message. For streams spanning many chunks (large file transfers), the long-lived stream key's exposure window equals the ratchet epoch during which it was transmitted, regardless of the stream's total duration or chunk count.

**Random-access-only callers of `decrypt_chunk_at` have no replay protection**: Applications that use `decrypt_chunk_at` exclusively — never the sequential `decrypt_chunk` — have zero anti-replay protection. In sequential mode, presenting a previously-decrypted chunk at the same index incidentally fails because `next_index` has already advanced past that position (AEAD runs against the wrong index-derived nonce). In random-access mode, presenting the same `(index, chunk)` pair a second time succeeds identically — `decrypt_chunk_at` is stateless and has no memory of prior decryptions. The only cryptographic freshness binding is the per-stream CSPRNG-unique key (§15.1); within a single stream, any valid `(key, index, chunk)` triple is always decryptable. Applications building file delivery, random-access video streaming, or any repeat-query-capable API on top of `decrypt_chunk_at` MUST track successfully-decrypted indices at the application layer or arrange for single-use key material (§15.1). This threat does not require breaking the ratchet — it only requires access to the `(key, chunk)` material already held by the application. See §15.12 "Chunk replay" for the behavioral details.
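
Below is a minimal sketch of application-layer index tracking for one stream, using one bit per chunk index. `ChunkReplayTracker` and its functions are hypothetical and not part of soliton. The sketch assumes the application knows an upper bound on the stream's chunk count. The intended order is to call `decrypt_chunk_at` first and call `tracker_accept` only if decryption succeeds, discarding the plaintext when it returns false. Marking only authenticated chunks keeps a forged chunk from poisoning an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *bits;        /* one bit per chunk index of a single stream */
    uint64_t max_chunks;  /* assumed known upper bound */
} ChunkReplayTracker;

static bool tracker_init(ChunkReplayTracker *t, uint64_t max_chunks) {
    assert(t != NULL && max_chunks > 0);
    t->max_chunks = max_chunks;
    t->bits = calloc((size_t)((max_chunks + 7) / 8), 1);
    return t->bits != NULL;
}

/* Call only after decrypt_chunk_at has authenticated the chunk. Returns true
 * the first time an index is seen; false on replay or out-of-range index. */
static bool tracker_accept(ChunkReplayTracker *t, uint64_t index) {
    assert(t != NULL && t->bits != NULL);
    if (index >= t->max_chunks) {
        return false;
    }
    uint8_t mask = (uint8_t)(1u << (index % 8));
    if (t->bits[index / 8] & mask) {
        return false;
    }
    t->bits[index / 8] |= mask;
    return true;
}

static void tracker_free(ChunkReplayTracker *t) {
    free(t->bits);
    t->bits = NULL;
}
```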

**Out-of-band key delivery (distinct threat from epoch compromise)**: If a stream key is delivered via an unencrypted or weakly-authenticated channel — for example, via plaintext HTTP, an unauthenticated metadata API, or a push notification service with no end-to-end encryption — the stream is protected only by transport security on the key delivery path, not by the ratchet. An adversary who intercepts the key delivery (e.g., via MITM on the key delivery channel, server compromise, or push notification interception) can decrypt all stream chunks even if the ratchet session itself is fully uncompromised. This is a distinct threat from the ratchet epoch compromise scenario above: epoch compromise requires breaking the ratchet's cryptographic properties; out-of-band key exposure requires only access to the unprotected delivery channel. The mitigation is the same pattern specified in §15.1: always deliver stream keys inside ratchet-encrypted messages, never via separate channels. Any deviation from this pattern removes the ratchet's protection for the affected streams.

**Streaming AEAD format version bumps use the version byte, not a new label**: The streaming header version byte (`0x01`) is included in every per-chunk AAD (§15.4), which provides cryptographic domain separation between format versions. A v2 streaming format MUST increment the version byte. A reader that does not support v2 then sees `0x02` and returns `UnsupportedVersion`, while a reader that does support it can handle it accordingly. Adding a new label string (e.g., `"lo-stream-v2"`) would be redundant, since the version byte in AAD already provides the domain separation. Conversely, a format change that keeps the version byte but uses a different label produces ciphertexts that are indistinguishable from v1 at the header level. Such a change causes opaque AEAD failures rather than clean `UnsupportedVersion` errors.

### 14.17 Post-Compromise Security Healing Boundary

LO-Ratchet provides post-compromise security (PCS) — recovery of message confidentiality after a transient key compromise, without requiring a new session. The healing mechanics and boundary conditions are specified in §6.13. Key points for formal models:

- **Healing event (initial step)**: PCS healing begins at the KEM *decapsulation* step on the previously-compromised party — specifically, when the compromised party receives and successfully decapsulates a message containing a new `ratchet_pk` from the uncompromised peer. After this step, new-epoch messages are immediately protected by a `recv_epoch_key` unknown to the attacker. The uncompromised party's encapsulation step (which sends the new KEM ciphertext) does not itself heal the compromised party. **Full healing requires a second KEM ratchet step** — after the first decapsulation, `prev_recv_epoch_key` still holds the compromised epoch key (now as the previous-epoch backup). An attacker with the compromised key can still decrypt late-arriving previous-epoch messages until the second KEM ratchet step discards `prev_recv_epoch_key`. See §6.13 for `FullyHealed(session, t₄)` — the formal two-step healing definition.
- **Why decapsulation is the boundary**: Before decapsulation, the compromised party still holds old epoch keys derivable from known state. After decapsulation, the new `root_key` and `send_epoch_key` derive from a KEM shared secret that was never exposed — the attacker who held the old epoch key cannot reproduce the new epoch keys. The healing epoch therefore begins at the decapsulation event on the previously-compromised side.
- **One-directional streams do not heal**: If the compromised party never receives a message from the uncompromised peer (and therefore never decapsulates a new KEM ciphertext), no KEM ratchet step occurs and PCS healing never happens. See §6.13 for the full list of PCS boundary conditions and exclusions.
- **Known weakening — `prev_recv_epoch_key` survives the first KEM ratchet step**: `Corrupt(RatchetState)` at any point after the *first* KEM ratchet step but before the *second* still exposes `prev_recv_epoch_key` — the previous epoch's key is retained for one grace period (§6.6). An attacker who compromises the session state between the first and second KEM ratchet steps can decrypt all messages from the prior epoch using `prev_recv_epoch_key`, even though new-epoch messages (protected by the freshly-derived `recv_epoch_key`) are safe. This corresponds to Abstract.md Theorem 4 / Lemma 4b: `FullyHealed(session, t)` requires `t` to be after the *second* KEM ratchet step, not the first. This is a deliberate design tradeoff — retaining `prev_recv_epoch_key` for one step enables decryption of late-arriving previous-epoch messages without storing per-message keys. The `two_kem_ratchets_expire_old_epoch` integration test is the empirical evidence that this two-step behavior is intentional and tested, not an oversight. Formal models of the PCS property MUST use the `FullyHealed` predicate from §6.13 (which captures the two-step boundary), not a simpler "healed after one ratchet step" approximation.
- **Formal modelers**: A PCS lemma derived from §14 without consulting §6.13 risks placing the healing event at the wrong point in the protocol transcript. The normative PCS specification is §6.13; §14 provides the threat-model framing only.
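The two-step boundary is easiest to see by looking at the receive-side key slots alone. The following toy Rust model assumes only what is stated above: each KEM ratchet step installs a fresh `recv_epoch_key` and demotes the old one to `prev_recv_epoch_key` for one grace period. Keys are opaque `u64` labels and no cryptography is modelled. `RecvSlots` is a hypothetical name, not the soliton `RatchetState` layout.

```rust
/// Toy model of the two receive-key slots. Epoch keys are opaque labels;
/// this models retention only, not key derivation.
#[derive(Debug, Clone)]
pub struct RecvSlots {
    pub recv_epoch_key: u64,
    pub prev_recv_epoch_key: Option<u64>,
}

impl RecvSlots {
    /// One KEM ratchet step on the receiving side. The freshly decapsulated
    /// epoch key becomes current, the old current key is kept as the
    /// one-step grace backup, and the older backup is discarded.
    pub fn kem_ratchet_step(&mut self, fresh_key: u64) {
        self.prev_recv_epoch_key = Some(self.recv_epoch_key);
        self.recv_epoch_key = fresh_key;
    }

    /// True if a `Corrupt(RatchetState)` at this point would expose `key`.
    pub fn exposes(&self, key: u64) -> bool {
        self.recv_epoch_key == key || self.prev_recv_epoch_key == Some(key)
    }
}
```

In this model, a corruption between the two steps still yields the compromised epoch key through the backup slot. That is why `FullyHealed` is anchored at the second step.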

### 14.18 New-Epoch Path as Unauthenticated KEM Decapsulation Oracle

Every incoming ratchet message whose `header.ratchet_pk` does not match `recv_ratchet_pk` or `prev_recv_ratchet_pk` takes the new-epoch path and triggers a full X-Wing decapsulation (dominated by ML-KEM-768). ML-KEM implicit rejection (§8.4) means this operation never returns an error for invalid inputs — a mismatched or maliciously crafted `ratchet_pk` produces a pseudorandom shared secret, which derives a wrong `recv_epoch_key`, which causes AEAD failure, which triggers snapshot rollback. The session is unharmed, but the decapsulation was performed unconditionally.

**Performance context**: In the pure-Rust implementation on modern 64-bit hardware, a full new-epoch path execution (X-Wing decapsulation + KDF + AEAD failure + rollback) completes in well under 10 µs per message. This figure comes from the soliton fuzzer, which sustains over 190,000 executions per second per core (an average of roughly 5.3 µs per execution). At this rate, a sustained injection of 190,000 crafted messages per second consumes at most one CPU core, no more than any other high-throughput CPU-bound workload. On 64-bit hardware this is not a meaningful denial-of-service vector.

**Accepted tradeoff**: The epoch routing decision (§6.6) relies solely on comparing `header.ratchet_pk` — a cleartext field — against stored public keys. No authentication occurs before decapsulation. Deferring decapsulation to post-authentication would require knowing the correct epoch key before AEAD runs, creating a circular dependency. This means the KEM decapsulation step is inherently unauthenticated. The cleartext `ratchet_pk` field already reveals epoch transitions to any observer, so the timing of the new-epoch path is not a novel information leak — it is observable from the public key value alone.
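The routing rule can be sketched as a pure function over cleartext bytes. Written this way, it is clear that nothing has been authenticated when `NewEpoch` is chosen. This is an illustrative Rust sketch. `EpochRoute` and `route_epoch` are hypothetical names. Modelling `prev_recv_ratchet_pk` as optional (absent before the first KEM ratchet step) is an assumption.

```rust
/// Which decryption path an incoming header takes (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum EpochRoute {
    Current,
    Previous,
    NewEpoch,
}

/// Routes purely on the cleartext `header.ratchet_pk`. Plain `==` is fine
/// because every operand is a public key. `NewEpoch` means "run X-Wing
/// decapsulation on unauthenticated input": a forged key yields a
/// pseudorandom secret and, later, an AEAD failure.
pub fn route_epoch(
    header_ratchet_pk: &[u8],
    recv_ratchet_pk: &[u8],
    prev_recv_ratchet_pk: Option<&[u8]>,
) -> EpochRoute {
    if header_ratchet_pk == recv_ratchet_pk {
        EpochRoute::Current
    } else if prev_recv_ratchet_pk == Some(header_ratchet_pk) {
        EpochRoute::Previous
    } else {
        EpochRoute::NewEpoch
    }
}
```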

**Residual concern**: The performance characterization above applies to modern 64-bit hardware. Deployments on severely resource-constrained targets (e.g., hobby-grade 32-bit microcontrollers) where ML-KEM-768 decapsulation is orders of magnitude slower should evaluate whether transport-layer sender authentication is appropriate before messages reach the ratchet layer. soliton does not target 32-bit platforms and offers no performance guarantees for them.

---

## §15 Streaming AEAD

Chunked authenticated encryption for large payloads (file transfer, attachments). Enables disk-to-disk encryption in fixed-size chunks without holding the full payload in memory. Inspired by the STREAM construction (Hoang, Reyhanitabar, Rogaway, Vizár, 2015) but uses counter-based nonce derivation for random-access decryption rather than ciphertext chaining.

### 15.1 Construction

Each stream uses a single caller-provided 32-byte key and a random 24-byte base nonce (generated from the OS CSPRNG). The key MUST be freshly generated from the OS CSPRNG for each stream — reusing a key across streams is catastrophic (see §15.12). The key is not managed by the library — key wrapping is the caller's responsibility (the standard pattern: generate a random 32-byte key, encrypt the stream, then encrypt the key in a ratchet message alongside the stream metadata). Deriving the streaming key deterministically from ratchet material is unsafe: ratchet compromise would propagate to all streaming keys derived from the compromised epoch, defeating the per-stream isolation that fresh randomness provides. Plaintext is split into 1 MiB chunks, each independently encrypted with XChaCha20-Poly1305 using a per-chunk nonce derived from the base nonce and chunk index.

**Security model**: The stream header (including `base_nonce`) is not secret — an adversary who knows the header and all ciphertexts but not the key cannot decrypt any chunk. The key is the sole secret; it is not contained in or recoverable from the header. Losing the key makes the stream permanently undecryptable.

**No KDF step**: The caller-provided key is used directly as the XChaCha20-Poly1305 key — there is no HKDF or other derivation step between the input key and the AEAD key. No KDF is needed because the caller is required to supply a fresh 256-bit CSPRNG key (§15.1 "key MUST be freshly generated from the OS CSPRNG"): a uniformly distributed 256-bit value already saturates XChaCha20-Poly1305's key entropy, so HKDF's extract phase adds no security benefit. A reimplementer who adds a KDF step (e.g., `HKDF-SHA3-256(key, base_nonce, "stream")`) produces incompatible ciphertext that the reference implementation cannot decrypt.

**All-zero key policy**: Unlike the storage keyring (§11.6), which explicitly rejects all-zero keys via constant-time check, the streaming layer does **not** validate that the caller-provided key is non-zero. This is a caller obligation. The storage layer's active guard exists because keys are long-lived and stored in a keyring managed by the library; the streaming layer's keys are ephemeral, caller-provided, and used once — validating them would shift a caller responsibility into a layer that cannot meaningfully enforce it (the caller could pass any weak key, not just all-zeros). A reimplementer who adds an all-zero guard to the streaming layer for "consistency" with storage creates a behavioral divergence from the specification.

**Caller key zeroization**: The library copies the key into the opaque encryptor/decryptor handle on initialization. The caller's original key buffer is not zeroed by the library. After calling `soliton_stream_encrypt_init` or `soliton_stream_decrypt_init`, the caller MUST zeroize their copy of the key via `soliton_zeroize` (CAPI) or `Zeroizing` wrapper (Rust). The handle's internal copy is zeroized automatically when the handle is freed.

### 15.2 Wire Format

> **Buffer-allocation quick reference**: All streaming sizes (`STREAM_HEADER_SIZE`, `CHUNK_SIZE`, `STREAM_CHUNK_OVERHEAD`, `STREAM_ZSTD_OVERHEAD`, `STREAM_ENCRYPT_MAX`, `STREAM_CHUNK_STRIDE`) are defined with their derivations in Appendix A. A consolidated buffer-sizing summary table for binding authors is in Appendix B.

```
Stream = Header || Chunk₀ || Chunk₁ || ... || Chunkₙ

Header (26 bytes):
  version       (1)     — stream format version (0x01)
  flags         (1)     — bit 0: compression (0 = none, 1 = zstd), bits 1-7: reserved (must be zero)
  base_nonce    (24)    — random, unique per stream

Chunk:
  tag_byte      (1)     — 0x00 = non-final, 0x01 = final
  ciphertext    (variable)   — AEAD output: encrypted plaintext + 16-byte Poly1305 tag
```

**tag_byte interpretation**: On decrypt, only the value `0x01` is treated as final. Any other value (including `0x00` and hypothetical future values) is treated as non-final. Implementations MUST NOT reject unknown tag_byte values pre-AEAD — the tag_byte is authenticated via inclusion in both the nonce (§15.3) and the AAD (§15.4), so a chunk with a hypothetical future tag_byte `0x02` from a newer writer would fail AEAD on any older reader (the nonce and AAD would differ from what the encryptor used). The "lenient decoding" means: don't add a pre-AEAD guard that rejects non-0x00/0x01 values, because AEAD already provides the rejection. On encrypt, only `0x00` and `0x01` are produced.
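Below is a Rust sketch that splits the 26-byte header into its fields and applies the decrypt-side `tag_byte` rule. `StreamHeader`, `split_header`, and `is_final` are hypothetical names. Validation of the version byte and the reserved flag bits is deliberately left out, because it happens at decrypt init and its error mapping is governed by §15.7.

```rust
pub const STREAM_HEADER_SIZE: usize = 26;

/// Fields of the 26-byte stream header (no validation performed here).
#[derive(Debug, PartialEq, Eq)]
pub struct StreamHeader {
    pub version: u8,
    pub flags: u8,
    pub base_nonce: [u8; 24],
}

/// Splits a header into version (byte 0), flags (byte 1), and
/// base_nonce (bytes 2..26). Returns `None` if the length is wrong.
pub fn split_header(bytes: &[u8]) -> Option<StreamHeader> {
    if bytes.len() != STREAM_HEADER_SIZE {
        return None;
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&bytes[2..26]);
    Some(StreamHeader { version: bytes[0], flags: bytes[1], base_nonce })
}

/// Decrypt-side rule: only 0x01 is final. Every other value is treated as
/// non-final and left for the AEAD to reject, because the tag_byte feeds
/// both the nonce and the AAD.
pub fn is_final(tag_byte: u8) -> bool {
    tag_byte == 0x01
}
```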

The library does not embed length prefixes between chunks. Chunk delimitation is a transport/storage concern — different transports (QUIC, WebSocket, HTTP/2) and storage backends (object stores, flat files) have different framing mechanisms.

**Compressed stream chunk framing is NOT specified as an interoperability format**: This spec does not define a normative chunk-length framing format for compressed streams. The compressed streaming feature (`flags & 0x01 == 1`) is a **single-implementation feature**: it is designed to be written and read by the same soliton implementation (or a reimplementation that derives its own chunk framing scheme from this spec). The wire format specifies the AEAD construction, the nonce derivation, and the header layout — but NOT how the variable-length compressed ciphertext chunks are delimited on the wire when transported across a byte stream. A reimplementer who builds an independent implementation targeting cross-implementation interoperability for compressed streams MUST define and negotiate a chunk framing mechanism out-of-band (e.g., HTTP chunked encoding, a length-prefix layer, or an out-of-band chunk index). Without this, two independent compressed-stream implementations will fail to interoperate at the transport level even though their AEAD layer is identical.

**Compressed streams are NOT self-delimiting**: For a compressed stream (`flags & 0x01 == 1`), non-final chunks have variable ciphertext size (at most `CHUNK_SIZE + STREAM_ZSTD_OVERHEAD + 16` bytes; the exact size depends on content and compression ratio). There is no fixed stride, so the transport MUST provide per-chunk lengths (e.g., HTTP chunked encoding, a length-prefix framing layer, or an index built during encryption). A reimplementer who applies the 1,048,593-byte fixed-stride read algorithm to a compressed stream will misalign at the first chunk boundary, causing all subsequent AEAD decryptions to fail.

**Recommended framing for compressed streams**: When transporting a compressed stream over a raw byte channel (file, TCP socket, UNIX pipe), implementers SHOULD prefix each chunk's ciphertext with a 4-byte big-endian `u32` length field giving the ciphertext byte count (not including the `tag_byte` or the length prefix itself). The on-wire layout per chunk becomes `tag_byte (1) || ciphertext_len (4, BE u32) || ciphertext (ciphertext_len bytes)`. `ciphertext_len` is the AEAD output byte count, `len(compressed_plaintext) + 16`. The 16-byte Poly1305 authentication tag is part of the ciphertext and is included in `ciphertext_len`; it is not a separate field. For an empty final chunk (0 bytes of plaintext), `ciphertext_len = 16`. A reimplementer who excludes the 16-byte Poly1305 tag from `ciphertext_len` (treating it as overhead outside the count) produces length values that are 16 bytes short per chunk. The reader's framing then misaligns immediately after the first chunk.

This framing is simple and adds only the 4-byte length field per chunk. The Poly1305 tag already sits inside the counted ciphertext, so nothing is duplicated. A reader can allocate exactly the right buffer for each chunk without look-ahead. Implementations that deviate from this framing for compressed streams will be silently incompatible with conforming implementations at the transport layer, even though their AEAD output is identical. Uncompressed streams do not need this framing, because the fixed stride already provides delimitation (see the uncompressed sequential read algorithm below).
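The recommended framing can be sketched as a pair of Rust helpers. `frame_chunk` and `unframe_chunk` are hypothetical names. The input to `frame_chunk` is assumed to be the `tag_byte || AEAD_ciphertext` buffer that `encrypt_chunk` returns (§15.5). `unframe_chunk` rebuilds that same layout for `decrypt_chunk`.

```rust
use std::convert::{TryFrom, TryInto};

/// Width of the big-endian u32 length field.
const LEN_FIELD: usize = 4;

/// Frames one compressed-stream chunk as
/// `tag_byte (1) || ciphertext_len (4, BE u32) || ciphertext`.
/// `chunk` is `tag_byte || AEAD ciphertext`, and the counted length
/// includes the 16-byte Poly1305 tag. Returns `None` for inputs shorter
/// than the 17-byte minimum chunk or too long for a u32 length.
pub fn frame_chunk(chunk: &[u8]) -> Option<Vec<u8>> {
    if chunk.len() < 17 {
        return None;
    }
    let (tag_byte, ciphertext) = (chunk[0], &chunk[1..]);
    let len = u32::try_from(ciphertext.len()).ok()?;
    let mut out = Vec::with_capacity(1 + LEN_FIELD + ciphertext.len());
    out.push(tag_byte);
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(ciphertext);
    Some(out)
}

/// Parses one frame from the front of `buf`. Returns the chunk rebuilt as
/// `tag_byte || ciphertext` plus the number of bytes consumed, or `None`
/// if `buf` does not yet hold a complete frame.
pub fn unframe_chunk(buf: &[u8]) -> Option<(Vec<u8>, usize)> {
    let header_len = 1 + LEN_FIELD;
    if buf.len() < header_len {
        return None;
    }
    let len = u32::from_be_bytes(buf[1..header_len].try_into().ok()?) as usize;
    let end = header_len.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    let mut chunk = Vec::with_capacity(1 + len);
    chunk.push(buf[0]);
    chunk.extend_from_slice(&buf[header_len..end]);
    Some((chunk, end))
}
```

`unframe_chunk` returns `None` for an incomplete frame, so a caller reading from a socket can keep accumulating bytes. A production reader would also bound `ciphertext_len` before allocating. Any such bound must respect the pre-AEAD cap rules in §15.6.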

**Uncompressed stream sequential read algorithm**: For an uncompressed stream (`flags & 0x01 == 0`), every non-final chunk is exactly 1,048,593 bytes on the wire (1 `tag_byte` + 1,048,576 plaintext bytes encrypted to 1,048,576 + 16 AEAD ciphertext bytes = 1,048,593 total). A sequential reader reads fixed-size chunks until it encounters a chunk with `tag_byte = 0x01` (final). The wire size is fully determined by `CHUNK_SIZE` (1,048,576 bytes, see Appendix A). A streaming implementation can therefore read exactly 1,048,593 bytes per non-final chunk without a length prefix and without look-ahead. This derivation combines §15.2 (wire format) and §15.6 (chunk sizing); it is stated here to spare streaming-layer implementers from reconstructing it.

**Short final chunk from an unframed transport**: The final chunk has a variable wire size, from 17 bytes (1 `tag_byte` + the 16-byte Poly1305 tag, for empty plaintext) up to 1,048,593 bytes. When reading from an unframed byte stream (raw TCP, file read), the algorithm is: attempt to read 1,048,593 bytes; if the transport delivers fewer bytes because it reached EOF or end-of-stream, those bytes constitute the final chunk. The size shortfall is not an error; it is the signal that the final chunk has been received. A reimplementer who requires exactly 1,048,593 bytes for every chunk (including the final one) will reject every stream whose plaintext length is not a multiple of 1 MiB. Pre-framed transports (QUIC streams, WebSocket messages, HTTP chunked encoding) deliver chunks with explicit boundaries and do not have this ambiguity.

**Minimum final chunk size and transport accumulation obligation**: A final chunk delivered with fewer than 17 bytes returns `AeadFailed` — the 16-byte Poly1305 tag plus 1 `tag_byte` is the irreducible minimum (encrypting zero plaintext bytes produces a 16-byte tag with no ciphertext). Transport implementations MUST accumulate bytes until either 1,048,593 bytes are in hand (a full non-final stride) or a clean stream EOF signal before presenting a chunk to the decryption layer. Presenting a partial chunk (e.g., 8 bytes of a truncated stream) to `decrypt_chunk` returns `AeadFailed` with no indication of whether the data was truncated in transit or the key/nonce was wrong — the AEAD cannot distinguish these cases.
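Below is a sketch of the uncompressed sequential read loop over `std::io::Read`, combining the fixed stride with the accumulation obligation above. `read_uncompressed_chunk` is a hypothetical helper. `CHUNK_SIZE` and `STREAM_CHUNK_STRIDE` are redefined locally with the values given in this section.

```rust
use std::io::{self, Read};

pub const CHUNK_SIZE: usize = 1_048_576;
/// tag_byte (1) + full plaintext + Poly1305 tag (16) = 1,048,593 bytes.
pub const STREAM_CHUNK_STRIDE: usize = CHUNK_SIZE + 17;

/// Reads the next chunk of an *uncompressed* stream from an unframed byte
/// source. Keeps reading until a full stride is buffered or the source
/// reports EOF, so short reads from sockets or pipes are not mistaken for
/// the final chunk. Returns `Ok(None)` once the source is exhausted.
pub fn read_uncompressed_chunk<R: Read>(src: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut buf = vec![0u8; STREAM_CHUNK_STRIDE];
    let mut filled = 0;
    while filled < STREAM_CHUNK_STRIDE {
        match src.read(&mut buf[filled..]) {
            Ok(0) => break, // clean EOF: whatever is buffered is the final chunk
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    if filled == 0 {
        return Ok(None);
    }
    buf.truncate(filled);
    // A chunk shorter than 17 bytes is still handed on: decrypt_chunk
    // rejects it with AeadFailed, so the reader adds no distinct error.
    Ok(Some(buf))
}
```

The reader does not inspect `tag_byte`. The caller stops after a chunk with `tag_byte = 0x01` decrypts successfully. A final chunk of exactly 1,048,593 bytes comes back as a full stride and is recognized as final by its `tag_byte`, not by its length.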

### 15.3 Nonce Derivation

Per-chunk nonce is derived by XORing a 24-byte mask into the base nonce:

```
mask = chunk_index (8 bytes, big-endian u64)
    || tag_byte   (1 byte: 0x00 = non-final, 0x01 = final)
    || 0x00 * 15  (15 zero bytes, padding)

chunk_nonce = base_nonce XOR mask
```

| Bytes | Mask content | Purpose |
|-------|-------------|---------|
| 0-7 | `chunk_index` (u64 BE) | Distinct nonce per chunk position |
| 8 | `tag_byte` | Distinct nonce for final vs non-final at same index |
| 9-23 | `0x00` | No effect on base nonce (XOR with zero is identity) |

Bytes 9-23 of the mask MUST be zero. A reimplementer who places additional data in these bytes produces nonces that are incompatible with any conforming implementation.

**Injectivity**: For two chunks `(i₁, t₁)` and `(i₂, t₂)`, `mask₁ = mask₂` iff `i₁ = i₂` and `t₁ = t₂`. Since XOR with a constant is a bijection, distinct `(index, tag_byte)` pairs always produce distinct nonces.
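The mask-and-XOR construction translates directly into Rust. `chunk_nonce` is a hypothetical helper name and does not necessarily reflect how the reference implementation structures this code.

```rust
/// Per-chunk nonce:
/// base_nonce XOR (chunk_index as BE u64 || tag_byte || 15 zero bytes).
pub fn chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes());
    mask[8] = tag_byte;
    // mask[9..24] stays zero, so those nonce bytes pass through unchanged.
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= *m;
    }
    nonce
}
```

For example, `chunk_nonce(&[0u8; 24], 1, 0x00)` and `chunk_nonce(&[0u8; 24], 1, 0x01)` differ only in byte 8. That difference is what lets a final and a non-final chunk share an index without nonce reuse.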

### 15.4 AAD Construction

```
aad = "lo-stream-v1"              // 12 bytes, domain label
   || version                      // 1 byte
   || flags                        // 1 byte
   || base_nonce                   // 24 bytes
   || chunk_index                  // 8 bytes, big-endian u64
   || tag_byte                     // 1 byte (0x00 or 0x01)
   || caller_aad                   // variable, caller-supplied context
```

Total AAD: `47 + len(caller_aad)` bytes. `caller_aad` is optional application-level context (file ID, channel ID) provided once at stream init and constant across all chunks.

**`caller_aad` is not treated as secret material.** The library stores it in a plain buffer without `Zeroizing` and does not zeroize it on handle destruction. Callers MUST NOT pass sensitive values (private paths, internal batch IDs, authentication tokens) as `caller_aad`; use only public or non-sensitive identifiers.

`caller_aad` is the terminal field and has no length prefix. The first 47 bytes have a fixed layout, so `caller_aad` is unambiguously everything after byte 46. Omitting the length prefix is intentional, not an oversight. A reimplementer who adds a 2-byte BE length prefix for consistency with other length-prefixed fields in the protocol (e.g., §7.4's session init encoding) produces different AAD bytes and AEAD authentication failure. The implementation captures `caller_aad` at init time and reuses the same bytes for every chunk. A reimplementer constructing per-chunk AAD manually MUST use identical `caller_aad` bytes for every chunk in the stream. Varying the caller portion produces AEAD authentication failure on decrypt, with no diagnostic indicating which field changed.
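Below is a Rust sketch of the AAD layout, assuming the field order and widths given above. `chunk_aad` is a hypothetical helper. `caller_aad` is appended as-is, with no length prefix.

```rust
/// Domain label, 12 bytes.
pub const STREAM_LABEL: &[u8] = b"lo-stream-v1";

/// Builds the per-chunk AAD:
/// label || version || flags || base_nonce || chunk_index (BE u64)
/// || tag_byte || caller_aad.
/// `caller_aad` must be byte-identical for every chunk of a stream.
pub fn chunk_aad(
    version: u8,
    flags: u8,
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(STREAM_LABEL); // bytes 0..12
    aad.push(version); // byte 12
    aad.push(flags); // byte 13
    aad.extend_from_slice(base_nonce); // bytes 14..38
    aad.extend_from_slice(&chunk_index.to_be_bytes()); // bytes 38..46
    aad.push(tag_byte); // byte 46
    aad.extend_from_slice(caller_aad); // bytes 47.., no length prefix
    aad
}
```

With `caller_aad = b"file-42"`, the result is 47 + 7 = 54 bytes long.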

**`caller_aad` is a raw byte string — C callers MUST NOT use `strlen()` to derive its length**: `caller_aad` may contain null bytes (e.g., a binary UUID, a binary file identifier, an all-zero channel ID). C binding authors who pass `strlen(aad)` as `aad_len` silently truncate `caller_aad` at the first null byte, producing wrong AAD and `AeadFailed` on every `decrypt_chunk` call. Always pass the explicit byte count: `soliton_stream_decrypt_init(key, key_len, header, header_len, aad, aad_len, out)` where `aad_len = sizeof(aad_array)` or a separately-tracked length, never `strlen(aad)`.

**`caller_aad` mismatch is not detected at `stream_decrypt_init`**: `stream_decrypt_init` accepts any `caller_aad` bytes without checking them against the stream header. The header is not encrypted and contains only version, flags, and base_nonce; there is no stored hash or commitment of `caller_aad`. A mismatch between the encrypt-side and decrypt-side `caller_aad` values first manifests as `AeadFailed` on the first `decrypt_chunk` call. Callers who supply a wrong `caller_aad` to `stream_decrypt_init` will always receive `AeadFailed` from `decrypt_chunk`, not from `decrypt_init`, with no indication at init time that the context binding is wrong.

| AAD component | Prevents |
|---------------|----------|
| `version` | Version downgrade |
| `flags` | Flag flipping (e.g., compression flag → skip decompression) |
| `chunk_index` | Chunk reordering |
| `base_nonce` | Cross-stream splicing |
| `tag_byte` | Truncation (stripping final marker) |
| `caller_aad` | Context confusion (file from channel X served as channel Y) |

**`caller_aad` size recommendation**: `caller_aad` is semantically a file ID, channel ID, or similar context identifier — typically a few bytes to a few hundred. There is no protocol-level size limit (the CAPI 256 MiB general input cap applies), but large values produce multiplicative work: every chunk's AEAD runs Poly1305 over `47 + len(caller_aad)` bytes of AAD. With a 256 MiB `caller_aad` and thousands of chunks, the aggregate AAD processing dominates total encryption time. Recommended maximum: 4096 bytes. Applications needing to bind larger context should hash it first (e.g., `SHA3-256(full_context)`) and pass the 32-byte digest as `caller_aad`.

### 15.5 Compression

Per-chunk zstd compression (Zstandard, RFC 8878), controlled by flags bit 0. When enabled, each chunk's plaintext is independently compressed before encryption. Empty plaintext (0-byte final chunk) bypasses compression regardless of the flag.

**"Non-empty" check applies to post-AEAD plaintext, not ciphertext**: The bypass condition for empty final chunks is checked on the plaintext after AEAD decryption, not on the raw ciphertext length. A 16-byte ciphertext (Poly1305 tag only) decrypts to 0 bytes of plaintext, so it is empty by this definition and decompression is skipped. A reimplementer who checks `ciphertext.len() == 0` (before decryption) instead of `plaintext.len() == 0` (after decryption) will never see an empty ciphertext, because the tag is always present. That implementation therefore attempts zstd decompression on the 0-byte AEAD output. The resulting decompression error collapses to `AeadFailed` (§15.7) with no diagnostic pointing to the misplaced check.

**`flags` is a stream-level constant, not a per-chunk value.** The `flags` byte is set once at stream initialization and appears identically in every chunk's AAD (§15.4) — including the final chunk, even when that chunk is empty and compression is bypassed. A reimplementer who interprets `flags` as "was this specific chunk compressed" and writes `0x00` for the empty final chunk when `compress = true` produces an AAD mismatch and AEAD failure on decrypt. The `flags` byte records the stream's compression *configuration*, not the per-chunk compression *outcome*.

Pipeline:
- **Encrypt** (compression enabled, non-empty): plaintext → zstd compress → AEAD encrypt → prepend tag_byte.
- **Encrypt** (compression disabled, or empty): plaintext → AEAD encrypt → prepend tag_byte.
- **Decrypt**: read tag_byte → AEAD decrypt → (if compressed and non-empty) zstd decompress → plaintext.
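The decrypt pipeline can be sketched in Rust with the cryptographic steps injected as closures. The sketch includes the emptiness rule above and the §15.7 error collapse. `decrypt_pipeline` is a hypothetical name. `aead_open` and `decompress` are placeholders: the first stands for XChaCha20-Poly1305 decryption with the nonce and AAD derived from `tag_byte` and the chunk index, and the second for a zstd decoder bounded by `CHUNK_SIZE`. Neither placeholder is a real soliton or `ruzstd` API.

```rust
pub const CHUNK_SIZE: usize = 1_048_576;

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    AeadFailed,
}

/// Decrypt-side pipeline for one chunk (`tag_byte || AEAD ciphertext`).
/// `compressed` is the stream-level flag (flags & 0x01), identical for
/// every chunk, including an empty final chunk.
pub fn decrypt_pipeline(
    chunk: &[u8],
    compressed: bool,
    aead_open: impl FnOnce(u8, &[u8]) -> Option<Vec<u8>>,
    decompress: impl FnOnce(&[u8], usize) -> Option<Vec<u8>>,
) -> Result<Vec<u8>, StreamError> {
    let (&tag_byte, ciphertext) = chunk.split_first().ok_or(StreamError::AeadFailed)?;
    let plaintext = aead_open(tag_byte, ciphertext).ok_or(StreamError::AeadFailed)?;
    // Emptiness is judged on the AEAD output, never on the ciphertext length.
    if compressed && !plaintext.is_empty() {
        // Any decompression failure collapses to AeadFailed (§15.7).
        decompress(&plaintext, CHUNK_SIZE).ok_or(StreamError::AeadFailed)
    } else {
        Ok(plaintext)
    }
}
```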

**Caller-visible buffer layout**: `encrypt_chunk` produces a single output buffer containing `tag_byte (1) || AEAD_ciphertext (plaintext_len + 16)`. The `tag_byte` is prepended and returned as part of the output — callers do NOT append it separately. `decrypt_chunk` expects the same layout as input: `tag_byte (1) || AEAD_ciphertext`. A reimplementer who returns only the AEAD ciphertext (without `tag_byte`) from `encrypt_chunk`, expecting the caller to prepend it, produces an API that is incompatible with the standard wire format and with CAPI callers who use the output buffer directly.

Compression level: Fastest (~1), matching `encrypt_blob`. Pure Rust via `ruzstd`. No dictionary (per-chunk independent, required for random access). Max decompressed size per chunk: `CHUNK_SIZE` (1 MiB).

**Compression oracle (CRIME/BREACH)**: When attacker-controlled content is mixed with secret data in the same chunk, the per-chunk compressed size leaks information about the secret via adaptive chosen-plaintext. An attacker who can influence the plaintext and observe chunk wire sizes can iteratively extract secrets by measuring compression ratios. Since chunks compress independently with no cross-chunk dictionary, this oracle is bounded to within a single chunk — an attacker who places controlled content in chunk 0 cannot learn anything about secrets in chunk 5. Callers who separate attacker-influenced data from secrets across chunk boundaries do not need to disable compression for the entire stream. Use `compress = false` only when attacker-influenced data and secrets coexist within the same chunk (e.g., a single chunk containing both a user-supplied filename and session metadata).

### 15.6 Chunk Sizing

**Non-final chunks**: plaintext MUST be exactly `CHUNK_SIZE` (1 MiB). Enforced on both encrypt and decrypt sides. The timing of the size check differs by compression mode, and this asymmetry is security-relevant:

- **Uncompressed**: The on-wire chunk is `tag_byte (1) || AEAD_ciphertext (CHUNK_SIZE + 16)` — total wire size `CHUNK_SIZE + 17` (= `CHUNK_SIZE + CHUNK_OVERHEAD`). After reading the `tag_byte` byte, the AEAD ciphertext size is deterministic (`CHUNK_SIZE + 16`). The decryptor checks the AEAD ciphertext length **pre-AEAD** (framing check, `InvalidData`) before attempting decryption. "Chunk wire length" in this context means the AEAD ciphertext portion (not counting the already-read `tag_byte`). A reimplementer who defers this check to post-AEAD wastes cycles decrypting malformed chunks.
- **Compressed**: ciphertext size is non-deterministic (compression ratio varies), so the plaintext-size check occurs **post-AEAD** after decrypt + decompress. The decompressed output must be exactly `CHUNK_SIZE` (not merely `≤ CHUNK_SIZE`) — both undersized and oversized decompressed non-final chunks are rejected as `AeadFailed` (post-auth error collapse per §15.7). Returning a distinct error (e.g., `InvalidData` or `DecompressionFailed`) for either size mismatch would create a post-AEAD size oracle. A reimplementer who checks compressed chunk sizes pre-AEAD creates an oracle: rejecting a chunk before authentication reveals that the size check (not the AEAD) failed, leaking information about the expected plaintext size. **No pre-AEAD ciphertext cap is applied** for compressed chunks at the streaming layer — the CAPI 256 MiB input cap (§13.2) provides the outer bound. A legitimate compressed chunk is at most `STREAM_ENCRYPT_MAX` (= `CHUNK_SIZE + ZSTD_OVERHEAD + CHUNK_OVERHEAD` = 1,048,849 bytes — the maximum CAPI output buffer for one encrypted chunk); without a tighter cap, a peer can force AEAD attempt on up to 256 MiB of ciphertext before authentication fails. This is intentional — any tighter pre-AEAD cap would create the same oracle it is designed to prevent. **Exception**: a cap of exactly `STREAM_ENCRYPT_MAX` (1,048,849 bytes) is safe — it eliminates only inputs no conforming encryptor could produce (the reference encryptor never outputs a compressed chunk exceeding `STREAM_ENCRYPT_MAX` bytes) and does not create an oracle about the compression ratio or the expected size of valid ciphertext. The oracle concern applies only to caps tighter than the maximum conforming encryptor output.

**Normative cap statement for compressed non-final chunk pre-AEAD**: Implementations MAY apply a pre-AEAD ciphertext size cap of exactly `STREAM_ENCRYPT_MAX` (1,048,849 bytes). Implementations MUST NOT apply a pre-AEAD cap below `STREAM_ENCRYPT_MAX` — a cap tighter than the maximum conforming encryptor output creates the oracle it is designed to prevent (it would reject valid ciphertexts from a conforming peer, causing `AeadFailed` for valid data and allowing timing-based oracle inference). The reference implementation applies no tighter cap than the outer 256 MiB CAPI bound. **An implementation applying the optional `STREAM_ENCRYPT_MAX` cap MUST return `InvalidLength` for ciphertext inputs exceeding that cap** — not `AeadFailed`. The pre-AEAD size check fires before any AEAD operation, so `InvalidLength` is the correct variant (the input exceeds the size constraint, not the authentication check). This is an acceptable, documented divergence from the reference: the reference returns `AeadFailed` for oversized inputs (the 256 MiB CAPI cap returns `InvalidLength`, but inputs between `STREAM_ENCRYPT_MAX` and 256 MiB proceed to AEAD which then fails). Callers testing against both implementations MUST handle either `InvalidLength` or `AeadFailed` for inputs in the `STREAM_ENCRYPT_MAX + 1` to `256 MiB` range.
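The decrypt-side pre-AEAD size checks can be summarized as one Rust function. `pre_aead_checks` and its `apply_optional_cap` switch are hypothetical. The switch models the MAY-level `STREAM_ENCRYPT_MAX` cap, which the reference implementation does not apply. Two further assumptions apply. First, the cap is compared against the whole chunk including `tag_byte`, because `STREAM_ENCRYPT_MAX` includes the 17-byte chunk overhead. Second, the under-17-byte check runs first, so such chunks always yield `AeadFailed` as listed in §15.7.

```rust
pub const CHUNK_SIZE: usize = 1_048_576;
pub const STREAM_ENCRYPT_MAX: usize = 1_048_849;
/// tag_byte (1) + Poly1305 tag (16): the smallest valid chunk.
pub const MIN_CHUNK: usize = 17;

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    AeadFailed,
    InvalidData,
    InvalidLength,
}

/// Size checks that may run before AEAD. `chunk` is `tag_byte || AEAD ciphertext`.
pub fn pre_aead_checks(
    chunk: &[u8],
    compressed: bool,
    apply_optional_cap: bool,
) -> Result<(), StreamError> {
    if chunk.len() < MIN_CHUNK {
        return Err(StreamError::AeadFailed); // oracle collapse, not InvalidData
    }
    let is_final = chunk[0] == 0x01;
    let ciphertext_len = chunk.len() - 1;
    if !compressed {
        // Public framing check: a non-final uncompressed chunk has a fixed size.
        if !is_final && ciphertext_len != CHUNK_SIZE + 16 {
            return Err(StreamError::InvalidData);
        }
    } else if apply_optional_cap && !is_final && chunk.len() > STREAM_ENCRYPT_MAX {
        // Optional cap: never tighter than the largest conforming output.
        return Err(StreamError::InvalidLength);
    }
    Ok(())
}
```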

Accepting undersized non-final chunks in either mode would allow malformed streams where chunk boundaries are shifted, corrupting random-access offset calculations.

**Encrypt-side non-final wrong-size is a caller bug — no library-level enforcement beyond the error return**: The `encrypt_chunk` function returns `InvalidData` when a non-final chunk's plaintext length ≠ `CHUNK_SIZE`. There is no additional internal guard that prevents the caller from ignoring the error and continuing to encrypt subsequent chunks — the error is informational. The streaming state is unchanged on `InvalidData` from wrong chunk size (§15.11 atomicity), so a caller who ignores the error and re-calls `encrypt_chunk` with a different size produces a stream with inconsistent chunk sizes. This is a caller programming error; the library cannot enforce correct behavior beyond the error return for the offending call. The distinct error (`InvalidData` not `AeadFailed`) ensures this is diagnosable — it fires before AEAD, so it is safe to expose the distinction without creating an oracle.

**Final chunk**: plaintext may be `0..=CHUNK_SIZE`. A final chunk exceeding `CHUNK_SIZE` is rejected as `InvalidData`. The error is not `InvalidLength`, because the input type is correct (bytes of plaintext) but its value violates the chunk-size structural constraint. Nor is it `AeadFailed`, because this is a pre-AEAD framing check on the plaintext length, not a post-authentication error. An empty file produces one final chunk (tag_byte + 16-byte AEAD tag = 17 bytes). Every valid stream has exactly one chunk with `tag_byte=0x01`.

**Compressed final chunk decompressing beyond CHUNK_SIZE**: A compressed final chunk that decompresses to more than `CHUNK_SIZE` bytes returns `AeadFailed` — the post-AEAD error collapse (§15.7) applies to all decompression-side size violations, including the final chunk. Reimplementers MUST NOT return a distinct error (`DecompressionFailed`, `InvalidData`) for this case — doing so creates an oracle distinguishing "AEAD passed, decompression size check failed" from "AEAD failed."
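Both sides enforce the same plaintext-size rule, but they report violations with different error variants depending on whether AEAD has run. Below is a minimal Rust sketch of that asymmetry. The function names are hypothetical.

```rust
pub const CHUNK_SIZE: usize = 1_048_576;

#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    AeadFailed,
    InvalidData,
}

/// Shared rule: non-final chunks carry exactly CHUNK_SIZE plaintext bytes;
/// the final chunk carries 0..=CHUNK_SIZE.
fn plaintext_len_ok(len: usize, is_final: bool) -> bool {
    if is_final { len <= CHUNK_SIZE } else { len == CHUNK_SIZE }
}

/// Encrypt side: checked before AEAD, so a distinct InvalidData is safe.
pub fn check_encrypt_len(len: usize, is_final: bool) -> Result<(), StreamError> {
    if plaintext_len_ok(len, is_final) { Ok(()) } else { Err(StreamError::InvalidData) }
}

/// Decrypt side, compressed streams: checked after AEAD and decompression,
/// so a violation must be indistinguishable from an authentication failure.
pub fn check_decompressed_len(len: usize, is_final: bool) -> Result<(), StreamError> {
    if plaintext_len_ok(len, is_final) { Ok(()) } else { Err(StreamError::AeadFailed) }
}
```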

**Minimum valid stream**: 26 (header) + 17 (empty final chunk) = 43 bytes.
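The sizes used throughout this section follow from a handful of constants, restated below in Rust with compile-time checks. The names follow the `STREAM_*` spelling used in the buffer-allocation note. The 256-byte zstd overhead is inferred from the `STREAM_ENCRYPT_MAX = 1,048,849` figure in §15.6 rather than copied from Appendix A, which remains the normative source.

```rust
/// version (1) + flags (1) + base_nonce (24)
pub const STREAM_HEADER_SIZE: usize = 1 + 1 + 24;
/// Plaintext bytes in every non-final chunk: 1 MiB.
pub const CHUNK_SIZE: usize = 1 << 20;
/// Poly1305 authentication tag appended by XChaCha20-Poly1305.
pub const POLY1305_TAG_SIZE: usize = 16;
/// Per-chunk wire overhead: tag_byte (1) + Poly1305 tag (16).
pub const STREAM_CHUNK_OVERHEAD: usize = 1 + POLY1305_TAG_SIZE;
/// Wire size of a full uncompressed chunk (the fixed read stride).
pub const STREAM_CHUNK_STRIDE: usize = CHUNK_SIZE + STREAM_CHUNK_OVERHEAD;
/// Largest chunk a conforming encryptor emits (figure from §15.6).
pub const STREAM_ENCRYPT_MAX: usize = 1_048_849;
/// Inferred from STREAM_ENCRYPT_MAX = CHUNK_SIZE + ZSTD_OVERHEAD + CHUNK_OVERHEAD.
pub const STREAM_ZSTD_OVERHEAD: usize = STREAM_ENCRYPT_MAX - CHUNK_SIZE - STREAM_CHUNK_OVERHEAD;
/// Header plus one empty final chunk.
pub const MIN_STREAM_SIZE: usize = STREAM_HEADER_SIZE + STREAM_CHUNK_OVERHEAD;

// Compile-time checks against the figures quoted in the text.
const _: () = assert!(STREAM_HEADER_SIZE == 26);
const _: () = assert!(STREAM_CHUNK_STRIDE == 1_048_593);
const _: () = assert!(STREAM_ZSTD_OVERHEAD == 256);
const _: () = assert!(MIN_STREAM_SIZE == 43);
```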

### 15.7 Error Oracle Collapse

Two categories of errors are collapsed to `AeadFailed` for oracle prevention, for different reasons:

- **Post-authentication errors** (decompression failure, size mismatch): collapsed to prevent a 1-bit oracle distinguishing "authentication succeeded but post-processing failed" from "authentication failed." These checks fire after AEAD succeeds, so distinguishing them from `AeadFailed` would confirm that authentication passed — leaking information about key correctness.

- **Pre-authentication header errors** (reserved flag bits): collapsed to prevent a 1-bit oracle distinguishing "unsupported flag combination" from "wrong key." Reserved-bit checks fire at `stream_decrypt_init`, before any chunk AEAD, so returning a distinct error would allow an attacker to distinguish "correct key with malformed header" from "wrong key" by probing the flag byte. This is a different oracle than the post-AEAD case but equally undesirable.

Pre-authentication checks on publicly visible fields (`UnsupportedVersion` for version byte, `InvalidData` for uncompressed chunk framing) do not create oracles because the checked values are visible to anyone who observes the header or chunk.

**Error origin table for stream initialization and decryption:**

| Error | Returned from | Phase |
|-------|--------------|-------|
| `UnsupportedVersion` | `stream_decrypt_init` | Header parsing — version byte checked at init, before any chunk |
| `AeadFailed` (reserved flag bits) | `stream_decrypt_init` | Header parsing — flag byte checked at init, before any chunk |
| `AeadFailed` (authentication failure) | `stream_decrypt_chunk` / `stream_decrypt_chunk_at` | Per-chunk AEAD |
| `AeadFailed` (decompression failure) | `stream_decrypt_chunk` / `stream_decrypt_chunk_at` | Post-AEAD (oracle collapse) |
| `AeadFailed` (size mismatch post-decompress) | `stream_decrypt_chunk` / `stream_decrypt_chunk_at` | Post-AEAD (oracle collapse) |
| `InvalidData` (wrong non-final chunk size, uncompressed) | `stream_decrypt_chunk` / `stream_decrypt_chunk_at` | Pre-AEAD framing (not oracle — checked value is public) |
| `AeadFailed` (chunk shorter than 17 bytes) | `stream_decrypt_chunk` / `stream_decrypt_chunk_at` | Pre-AEAD oracle collapse — 17 bytes is the minimum valid chunk (1 `tag_byte` + 16-byte Poly1305 tag with zero plaintext). Returning `InvalidData` for chunks shorter than 17 bytes would allow an attacker to distinguish "chunk too short to attempt AEAD" from "valid-length but wrong tag." The same oracle-collapse rationale as §12's undersize-ciphertext row applies here for the streaming layer. Reimplementers who add a pre-AEAD `if len(chunk) < 17: return InvalidData` guard violate this requirement. |
| `InvalidData` (oversized final chunk plaintext) | `stream_encrypt_chunk` only | Pre-AEAD framing (encrypt side only — plaintext size is known before AEAD on the encrypt path; the decrypt path has no pre-AEAD plaintext size check for the final chunk, because the decrypted size is unknown until AEAD succeeds) |
| `InvalidData` (post-finalization call) | `stream_decrypt_chunk` | State guard |
| `ChainExhausted` | `stream_decrypt_chunk` | Counter guard (sequential only) |

A reimplementer who places the version-byte check in the per-chunk path (returning `InvalidData` on each chunk that encounters an unexpected version) instead of in `stream_decrypt_init` will diverge from the specified error ordering — callers who check the return code of `stream_decrypt_init` expect to detect version mismatches before processing any chunks. **The version byte appears only in the header, not per-chunk** — a reimplementer checking version per-chunk is also structurally wrong (the version byte is not re-read from each chunk's data).
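The per-chunk check order in the table can be sketched as one function. This is a hypothetical illustration, not the reference implementation. `open_chunk`, `StreamError`, and the `aead_open` / `decompress` closures are stand-ins invented here, and the real nonce and AAD construction (§15.3, §15.4) is elided. The sketch shows which failures surface as `InvalidData` (public framing) and which collapse to `AeadFailed`.

```rust
/// Hypothetical subset of the error names used in this section.
#[derive(Debug, PartialEq, Eq)]
pub enum StreamError {
    AeadFailed,
    InvalidData,
}

pub const CHUNK_SIZE: usize = 1_048_576;
/// 1-byte tag_byte + 16-byte Poly1305 tag.
pub const STREAM_CHUNK_OVERHEAD: usize = 17;

/// Sketch of the check order inside a sequential or random-access chunk decrypt.
/// `aead_open(tag_byte, ciphertext)` and `decompress(bytes)` are hypothetical
/// stand-ins; `None` from either one means failure.
pub fn open_chunk<A, D>(
    chunk: &[u8],
    compressed: bool,
    aead_open: A,
    decompress: D,
) -> Result<(Vec<u8>, bool), StreamError>
where
    A: Fn(u8, &[u8]) -> Option<Vec<u8>>,
    D: Fn(&[u8]) -> Option<Vec<u8>>,
{
    // Pre-AEAD but collapsed: a too-short chunk must look like a bad tag.
    if chunk.len() < STREAM_CHUNK_OVERHEAD {
        return Err(StreamError::AeadFailed);
    }
    let tag_byte = chunk[0];
    let is_last = tag_byte == 0x01;

    // Pre-AEAD framing on a public value: an uncompressed non-final chunk is
    // always CHUNK_SIZE + 17 bytes, so InvalidData reveals nothing about the key.
    if !compressed && !is_last && chunk.len() != CHUNK_SIZE + STREAM_CHUNK_OVERHEAD {
        return Err(StreamError::InvalidData);
    }

    // AEAD over the bytes after tag_byte.
    let plaintext = aead_open(tag_byte, &chunk[1..]).ok_or(StreamError::AeadFailed)?;
    if !compressed {
        return Ok((plaintext, is_last));
    }

    // Empty AEAD output is stored uncompressed (§15.5): skip decompression.
    let output = if plaintext.is_empty() {
        plaintext
    } else {
        // Decompression runs after AEAD succeeded, so its failure is collapsed.
        decompress(&plaintext).ok_or(StreamError::AeadFailed)?
    };

    // Post-AEAD size checks (§15.6), also collapsed to AeadFailed.
    let size_ok = if is_last {
        output.len() <= CHUNK_SIZE
    } else {
        output.len() == CHUNK_SIZE
    };
    if !size_ok {
        return Err(StreamError::AeadFailed);
    }
    Ok((output, is_last))
}
```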

### 15.8 Version and Flags Handling

Version byte `0x01` is accepted; all other values rejected with `UnsupportedVersion` **at init time** (`stream_decrypt_init`), before any chunk is processed. Reserved flag bits (1-7) must be zero; non-zero reserved bits are also rejected **at init time** with `AeadFailed` (oracle collapse — attacker-controlled header field). Both checks fire during header parsing, not during the first chunk decrypt. A reimplementer who defers the reserved-bits check to per-chunk AEAD will observe different error ordering (the error appears on the first `decrypt_chunk` call rather than on `stream_decrypt_init`), producing divergent behavior in error-ordering tests.

**Asymmetry rationale — why version gets `UnsupportedVersion` but flags get `AeadFailed`**: The version byte is a public implementation-capability indicator. Returning `UnsupportedVersion` for an unknown version enables the caller to distinguish "library version too old, upgrade required" from "authentication failure" without any oracle risk — an attacker who knows the version byte (which is in the cleartext header) gains no information about key correctness by learning that the version is unsupported. The flags byte is security-relevant: an attacker who controls the flags byte and can observe the error response gains a key-verification oracle — if the correct key is loaded and only the flag is wrong, a distinct error would confirm key correctness. Collapsing flags errors to `AeadFailed` removes this distinguisher. In short: unknown version → caller needs to upgrade, expose clearly; unknown flags → potential adversarial probe, collapse to prevent oracle.
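The init-time header checks can be sketched as follows. The sketch assumes the byte order version (1 byte) || flags (1 byte) || base_nonce (24 bytes), matching the order in which §15.10 lists the header fields, and assumes flag bit 0 is the compression flag. Neither assumption is fixed by this section. `parse_header` and `HeaderError` are hypothetical names, and running the version check before the flags check is also an assumption.

```rust
/// Hypothetical init-time errors.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    UnsupportedVersion,
    AeadFailed,
}

pub const STREAM_HEADER_SIZE: usize = 26;

#[derive(Debug, PartialEq, Eq)]
pub struct StreamHeader {
    pub compressed: bool,
    pub base_nonce: [u8; 24],
}

/// Assumed layout: version (byte 0) || flags (byte 1) || base_nonce (bytes 2..26).
pub fn parse_header(h: &[u8; STREAM_HEADER_SIZE]) -> Result<StreamHeader, HeaderError> {
    // Public capability indicator: report it clearly.
    if h[0] != 0x01 {
        return Err(HeaderError::UnsupportedVersion);
    }
    // Reserved bits 1-7 must be zero; collapse to AeadFailed (§15.7).
    if h[1] & 0xFE != 0 {
        return Err(HeaderError::AeadFailed);
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&h[2..]);
    Ok(StreamHeader {
        compressed: h[1] & 0x01 != 0, // assumed: bit 0 = compression
        base_nonce,
    })
}
```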

### 15.9 Chunk Index Exhaustion

The sequential encryptor and decryptor maintain a `next_index: u64` counter (initially 0). Before each chunk operation, if `next_index == u64::MAX`, the operation returns `ChainExhausted` without encrypting or decrypting. This prevents `next_index + 1` from wrapping to 0, which would reuse the chunk 0 nonce — catastrophic for AEAD security. The random-access `decrypt_chunk_at` does not maintain a sequential counter and accepts any `u64` index directly, so exhaustion does not apply. Passing `u64::MAX` as the index is not guarded — it computes a valid nonce and attempts AEAD decryption, which will return `AeadFailed` (no encryptor could have produced a chunk at that index due to the sequential exhaustion guard). The nonce for index `u64::MAX` with a non-final tag byte is computed as `base_nonce XOR (0xFFFFFFFFFFFFFFFF || 0x00 || 0x00{15})`, i.e., the first 8 bytes of the mask (bytes 0-7, the chunk_index field encoded as a big-endian u64) are all `0xFF`. This is a structurally valid XChaCha20-Poly1305 nonce — the AEAD proceeds, finds no matching ciphertext, and returns `AeadFailed`. Reimplementers MUST NOT add a `ChainExhausted` guard to `decrypt_chunk_at` — the function is stateless and cannot know whether the index is "valid."
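The nonce construction in the `u64::MAX` example can be written out directly. The mask layout (big-endian index in bytes 0-7, tag byte in byte 8, zeros in bytes 9-23) is inferred from that example, and §15.3 remains normative. `chunk_nonce` is a hypothetical helper name.

```rust
/// Per-chunk nonce sketch: base_nonce XOR (index_be(8) || tag_byte(1) || zeros(15)).
/// The layout is inferred from the u64::MAX example; §15.3 is normative.
pub fn chunk_nonce(base_nonce: &[u8; 24], index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&index.to_be_bytes()); // chunk_index, big-endian
    mask[8] = tag_byte; // 0x00 non-final, 0x01 final
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= m;
    }
    nonce
}
```

Every `u64` index yields a well-formed 24-byte nonce, which is why `decrypt_chunk_at(u64::MAX)` simply proceeds to AEAD.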

**`expected_index()` value after `ChainExhausted`**: When a sequential encryptor or decryptor returns `ChainExhausted` (at `next_index == u64::MAX`), `expected_index()` / `soliton_stream_decrypt_expected_index` returns `u64::MAX`. The counter is not cleared, reset, or advanced on the exhaustion guard — it retains the value that triggered the guard. A reimplementer who advances or resets `next_index` on `ChainExhausted` will return the wrong value from `expected_index()` and break callers who inspect the counter after exhaustion to determine how many chunks were processed.

**`ChainExhausted` boundary**: The longest possible sequential stream has `u64::MAX` chunks, at indices 0 through `u64::MAX - 1` (18,446,744,073,709,551,614). Its final chunk is processed at index `u64::MAX - 1`, after which `next_index` advances from `u64::MAX - 1` to `u64::MAX`. The next call to `encrypt_chunk` or `decrypt_chunk` (at `next_index == u64::MAX`) returns `ChainExhausted`. Reimplementers testing this boundary MUST use the sequential API, not `decrypt_chunk_at` (which is stateless and does not check the counter).
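The counter rules above can be modeled in a few lines. The guard fires before the operation, the counter advances only on success, and `ChainExhausted` leaves the counter at `u64::MAX`. `SequentialCounter`, `step`, and `starting_at` are hypothetical. `starting_at` exists only so the sketch can exercise the boundary without processing 2^64 chunks.

```rust
/// Error from a guarded sequential step.
#[derive(Debug, PartialEq, Eq)]
pub enum StepError<E> {
    ChainExhausted,
    Op(E),
}

/// Hypothetical model of the sequential `next_index` counter (§15.9, §15.11).
pub struct SequentialCounter {
    next_index: u64,
}

impl SequentialCounter {
    pub fn new() -> Self {
        Self { next_index: 0 }
    }

    /// Hypothetical constructor, for boundary tests only.
    pub fn starting_at(next_index: u64) -> Self {
        Self { next_index }
    }

    /// Mirrors `expected_index()`: the index the next chunk will use.
    pub fn expected_index(&self) -> u64 {
        self.next_index
    }

    /// Runs `op` at the current index. The exhaustion guard fires before `op`
    /// and leaves the counter untouched; the counter advances only when `op`
    /// succeeds, so failures are retryable.
    pub fn step<T, E>(
        &mut self,
        op: impl FnOnce(u64) -> Result<T, E>,
    ) -> Result<T, StepError<E>> {
        if self.next_index == u64::MAX {
            return Err(StepError::ChainExhausted);
        }
        let out = op(self.next_index).map_err(StepError::Op)?;
        // Cannot overflow: next_index < u64::MAX was checked above.
        self.next_index += 1;
        Ok(out)
    }
}
```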

**`decrypt_chunk_at` remains usable after sequential exhaustion.** When a sequential decryptor returns `ChainExhausted` (at `next_index == u64::MAX`), `decrypt_chunk_at` is unaffected — it reads no sequential state and can still decrypt any chunk by explicit index. This enables a valid use pattern: sequentially process all chunks up to the exhaustion boundary, then use `decrypt_chunk_at` for any remaining chunks. A reimplementer who adds a terminal-exhausted flag that also blocks `decrypt_chunk_at` breaks this pattern.

**Compressed non-final chunk size validation applies to `decrypt_chunk_at`**: The compressed non-final chunk size check (§15.6: decompressed output MUST be exactly `CHUNK_SIZE`) applies to `decrypt_chunk_at` exactly as it does to sequential decryption. A reimplementer treating random-access decryption as "bare AEAD + decompress" without the size check accepts malformed streams where a non-final chunk decompresses to the wrong size. The check runs only after AEAD succeeds, and its failure collapses to `AeadFailed` (§15.7), so it creates no oracle. The stateless nature of `decrypt_chunk_at` does not exempt it from content validation.

**Empty-final-chunk compression bypass applies to `decrypt_chunk_at`**: The §15.5 compression bypass for empty plaintext (a 0-byte final chunk is stored uncompressed regardless of the compress flag) applies identically to `decrypt_chunk_at`. When the decrypted AEAD output is zero bytes, the decompression step is skipped — attempting `zstd_decompress([])` on an empty AEAD output would reject a structurally valid empty final chunk, collapsing to `AeadFailed`. A reimplementer who unconditionally decompresses the AEAD output in `decrypt_chunk_at` (rather than conditioning on `!decrypted.is_empty()`) breaks empty-file stream support.

### 15.10 Finalization State Machine

Both the encryptor and sequential decryptor maintain a `finalized` boolean (initially false) that enforces stream integrity:

- **Encrypt**: Successfully encrypting a chunk with `is_last = true` sets `finalized = true`. A failed `encrypt_chunk(is_last=true)` call (e.g., `Internal` from zstd expansion) does NOT set `finalized`. The stream is not sealed and the call is retryable (§15.11 atomicity). Once the stream is finalized, every subsequent call to `encrypt_chunk` (regardless of `is_last`) returns `InvalidData`, because the stream is sealed. A reimplementer who allows post-final writes would permit appending chunks to a supposedly-complete stream, breaking the exactly-one-final-chunk invariant (§15.6).

- **Decrypt (sequential)**: Successfully decrypting a chunk with `tag_byte = 0x01` sets `finalized = true`. Subsequent calls to `decrypt_chunk` return `InvalidData`. This prevents callers from feeding additional chunks after the stream is complete. **AEAD failure on the final chunk does NOT set `finalized`**: if `decrypt_chunk` returns `AeadFailed` for a chunk whose `tag_byte` would have been `0x01`, `finalized` remains `false`. The caller may retry the chunk (e.g., after re-fetching from a corrupted transport) without hitting the post-finalization guard. A reimplementer who sets `finalized = true` on any `tag_byte = 0x01` attempt (including failed ones) prevents retry of a legitimately corrupted final chunk.

- **Encrypt (random access)**: `encrypt_chunk_at` does NOT read or set `finalized`, and does NOT read or advance `next_index`. It can be called before, during, or after sequential finalization — the `finalized` guard that `encrypt_chunk` enforces is absent. A reimplementer who adds a post-finalization guard to `encrypt_chunk_at` breaks the mixed sequential/random-access pattern and prevents callers from using parallel encryption alongside a sequential stream. **Calling `encrypt_chunk_at(is_last=true)` after sequential finalization succeeds silently**: the library emits a second final chunk (`tag_byte = 0x01`) with no error. The resulting stream violates the exactly-one-final invariant (§15.6) — a sequential decryptor seals at the first `tag_byte = 0x01` and returns `InvalidData` for all subsequent chunks, including the second final marker. Tracking whether a final chunk has already been emitted is a caller obligation.

- **Decrypt (random access)**: `decrypt_chunk_at` does NOT read or set `finalized`. It can be called in any order, including after `finalized = true`, and including on the final chunk. The caller owns completion tracking when using random-access mode. The return value is `(plaintext, is_last)` where `is_last` reflects the decoded `tag_byte` (`true` if `tag_byte == 0x01`, `false` otherwise) — this is a pure read of the chunk's tag byte with no connection to the `finalized` flag. A reimplementer who omits `is_last` from the return value or always returns `false` prevents callers from detecting the final chunk in random-access mode.

The `finalized` flag is queryable via `is_finalized()` on both encryptors and decryptors.
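The encrypt-side rules can be sketched as a small state machine. `EncState` and the `seal` closure, which stands in for compression plus AEAD, are hypothetical. The §15.9 `ChainExhausted` guard is omitted to keep the focus on `finalized`. The sequential decrypt side follows the same shape, with `AeadFailed` on a `tag_byte = 0x01` chunk leaving `finalized` unset.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EncError {
    InvalidData,
    Internal,
}

/// Hypothetical model of the sequential encryptor's state.
pub struct EncState {
    next_index: u64,
    finalized: bool,
}

impl EncState {
    pub fn new() -> Self {
        Self { next_index: 0, finalized: false }
    }

    pub fn is_finalized(&self) -> bool {
        self.finalized
    }

    /// `seal(index, is_last)` stands in for compression + AEAD of one chunk.
    pub fn encrypt_chunk<F>(&mut self, is_last: bool, seal: F) -> Result<Vec<u8>, EncError>
    where
        F: FnOnce(u64, bool) -> Result<Vec<u8>, EncError>,
    {
        // A sealed stream rejects every further call, whatever `is_last` says.
        if self.finalized {
            return Err(EncError::InvalidData);
        }
        // State changes only after `seal` succeeds, so a failed final call
        // leaves the stream open and retryable.
        let out = seal(self.next_index, is_last)?;
        self.next_index += 1;
        if is_last {
            self.finalized = true;
        }
        Ok(out)
    }
}
```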

**Silent truncation when freeing an unfinalized encryptor**: Calling `soliton_stream_encrypt_free` on an encryptor where `finalized = false` silently destroys the handle and zeroizes the key. The library does NOT return `InvalidData` or any other error for freeing an unfinalized encryptor, because the free operation always succeeds. The resulting stream has no final chunk (`tag_byte = 0x01` was never emitted), so a sequential decryptor reading the output will reach EOF without seeing `tag_byte = 0x01` and detect truncation. On the encrypting side, however, nothing signals the problem. The free call reports success, so a caller who does not check finalization itself never learns that the stream it wrote is incomplete. **Callers MUST call `is_finalized()` before freeing an encryptor** and treat a non-finalized free as a programming error. The library cannot emit the final chunk automatically on free, because the final chunk carries the actual last plaintext data, which the library does not buffer. An auto-emitted empty final chunk on free would produce a spurious 17-byte trailing chunk that the caller did not request and whose (empty) plaintext may be incorrect for the application. Callers who want to guarantee a final chunk MUST call `encrypt_chunk(..., is_last=true)` explicitly before freeing.

**Freeing an unfinalized sequential decryptor also always succeeds**: Calling `soliton_stream_decrypt_free` on a decryptor where `finalized = false` (i.e., the stream was only partially consumed — the `tag_byte = 0x01` final chunk was never decrypted) silently destroys the handle and zeroizes the key without error. The library does NOT return `InvalidData` or any error for freeing an unfinalized decryptor. Whether the absent finalization reflects a truncated stream, a partial read, or a transport failure is a caller concern; the library imposes no constraint on consuming the full stream before freeing the handle.

**`header()` is valid immediately after `stream_encrypt_init`, before the first chunk.** The 26-byte header (version + flags + base_nonce) is written once at construction time and never changes. The canonical usage is: init → `header()` → `encrypt_chunk(...)` × N. A reimplementer who adds a "not-yet-started" guard, returning an error from `header()` before the first `encrypt_chunk` call, breaks protocols that transmit the header before beginning chunk production (e.g., streaming pipelines that open the output channel, write the header, and then encrypt chunks as they arrive). `header()` is equally valid before the first chunk, between chunks, and after finalization, for as long as the handle has not been freed. It is not subject to any state guard: the `finalized` flag, the `next_index` counter, and the per-chunk error states are irrelevant. A reimplementer who adds a post-finalization guard to `header()` also breaks this pattern (retrieving the header after the final chunk is emitted is a common pattern for container formats).

### 15.11 Random Access

Counter-based nonce derivation enables both encryption and decryption of any chunk without processing preceding chunks.

**`encrypt_chunk_at` — random-access encryption**: The symmetric counterpart to `decrypt_chunk_at`. Encrypts one chunk at an explicit index using the same nonce and AAD construction as `encrypt_chunk` (§15.3, §15.4), but does not advance `next_index` or set `finalized`. The primary use case is parallel encryption: the caller splits the plaintext into chunks, assigns each chunk an index, and dispatches `encrypt_chunk_at` calls concurrently. Because each chunk's nonce and AAD are fully determined by the chunk index, compression flag, base nonce, and key — none of which change during parallel execution — the encrypted chunks can be computed independently and assembled in index order without synchronization. The caller is responsible for:

- Assigning each chunk a unique index. Calling `encrypt_chunk_at` twice with the same `(index, is_last, plaintext)` triple produces identical output, because nonce and AAD are deterministic (see §15.12 index uniqueness). Calling it twice with the same index and `is_last` but different plaintexts reuses the same nonce under the same key. Anyone who sees both ciphertexts learns the XOR of the two plaintexts, and the library has no way to detect or reject the second call.
- Marking exactly one chunk as `is_last = true`. The final-chunk invariant (§15.6 exactly-one-final) is a caller obligation when using `encrypt_chunk_at`. The library enforces nothing: a caller who marks two chunks as final produces a stream where `decrypt_chunk` accepts the first `tag_byte = 0x01` it encounters and returns `InvalidData` for all subsequent chunks (§15.10 decrypt sequential finalization guard).
- Knowing the total chunk count before encryption begins (to identify which chunk is final). The sequential `encrypt_chunk` does not require this — `is_last` is provided per call. For `encrypt_chunk_at`, the caller must know the chunk count in advance to set `is_last` correctly on the last chunk.

`encrypt_chunk_at` takes `&self` in Rust (immutable borrow), so multiple concurrent calls from safe Rust code (e.g., via `rayon::par_iter`) are permitted without unsafe — the borrow checker enforces that no mutable state is shared. The CAPI `soliton_stream_encrypt_chunk_at` uses `*const SolitonStreamEncryptor` for the same reason: the function does not mutate handle state. The CAPI reentrancy guard (§13.6) still fires on concurrent calls to the same handle, so parallel encryption through the CAPI requires one encryptor handle per thread, all initialized from the same key, AAD, and base nonce. The Rust API has no such restriction.
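The parallel pattern can be sketched with `std::thread::scope` in place of `rayon`, so no external crate is needed. `ToyEncryptor` is a hypothetical stand-in with the same `&self` shape as `encrypt_chunk_at`. Its XOR transform is a placeholder, not XChaCha20-Poly1305. The point is the caller's bookkeeping: one unique index per chunk, exactly one `is_last = true`, and reassembly in index order.

```rust
use std::thread;

/// Hypothetical stand-in with the `&self` shape of `encrypt_chunk_at`.
/// Its transform is a placeholder XOR, NOT real encryption.
pub struct ToyEncryptor {
    key: u8,
}

impl ToyEncryptor {
    pub fn new(key: u8) -> Self {
        Self { key }
    }

    pub fn encrypt_chunk_at(&self, index: u64, is_last: bool, plaintext: &[u8]) -> Vec<u8> {
        let tag_byte = if is_last { 0x01 } else { 0x00 };
        let mut out = vec![tag_byte];
        out.extend(plaintext.iter().map(|b| b ^ self.key ^ index as u8));
        out
    }
}

/// Splits `data` into `chunk_size`-byte pieces (the last may be shorter or
/// empty), encrypts each at its own index on its own thread, and returns the
/// chunks in index order. `chunk_size` must be non-zero.
pub fn encrypt_parallel(enc: &ToyEncryptor, data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    // An empty input still needs one (empty) final chunk.
    let pieces: Vec<&[u8]> = if data.is_empty() {
        vec![data]
    } else {
        data.chunks(chunk_size).collect()
    };
    // The caller, not the library, decides which index is final.
    let last = pieces.len() - 1;
    thread::scope(|s| {
        let handles: Vec<_> = pieces
            .iter()
            .enumerate()
            .map(|(i, piece)| s.spawn(move || enc.encrypt_chunk_at(i as u64, i == last, piece)))
            .collect();
        // Joining in spawn order reassembles the chunks in index order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Through the CAPI, each thread would instead own a separate encryptor handle initialized from the same key, AAD, and base nonce, as required by the reentrancy guard.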

**`*const SolitonStreamEncryptor` for `soliton_stream_encrypt_chunk_at`**: The CAPI uses `*const` to reflect the `&self` Rust contract. The same caveat from `soliton_stream_decrypt_chunk_at` applies: in C, `const T*` does NOT mean concurrent calls are safe — the reentrancy guard enforces single-caller access at runtime. Parallel encryption through CAPI always requires separate handles.

**Sequential and random-access encryption can be mixed on the same handle**: `encrypt_chunk_at` never modifies `next_index` or `finalized`, so it does not interfere with an interleaved or subsequent sequential `encrypt_chunk` pass. For example, a caller can encrypt the bulk of a stream sequentially via `encrypt_chunk`, then re-encrypt a specific chunk at a known index via `encrypt_chunk_at` to patch it, and the sequential counter is unaffected. A patched chunk with different plaintext is encrypted under the same nonce and key as the chunk it replaces, so the original ciphertext for that index must never be exposed alongside the patched one (the XOR of the two ciphertexts reveals the XOR of the two plaintexts). "Mixed" means interleaved calls within a single thread in the Rust API; CAPI mixed access requires sequential calls on the same handle due to the reentrancy guard.

**Decryption of random-access-encrypted streams**: A stream encrypted entirely via `encrypt_chunk_at` is wire-format identical to one encrypted via `encrypt_chunk`. The wire format (§15.2) depends only on the inputs (key, header, caller AAD, and each chunk's index, `is_last` flag, and plaintext), not on which encrypt API was used. It can be decrypted via `decrypt_chunk` (sequential), `decrypt_chunk_at` (random access), or a mix.

Because nonces are counter-based, a chunk can be decrypted without processing the chunks before it. How the caller locates that chunk's bytes depends on whether the stream is compressed:

- **No compression**: chunk byte offsets are deterministic. Chunk N starts at `STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE`, where `STREAM_CHUNK_STRIDE` denotes `CHUNK_SIZE + STREAM_CHUNK_OVERHEAD` and `STREAM_CHUNK_OVERHEAD = 17` (1 `tag_byte` + 16-byte Poly1305 authentication tag), using the constant names from Appendix A. Expanded: `26 + N × (1,048,576 + 17)` = `26 + N × 1,048,593`. The `tag_byte` occupies the **first byte** of each chunk at this offset. The AEAD ciphertext (the bytes passed to XChaCha20-Poly1305) starts one byte later, at `STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE + 1`. A seek-and-decrypt implementation that passes the `tag_byte` to the AEAD primitive as the first ciphertext byte gets `AeadFailed` with no obvious diagnostic. Uncompressed streams are recommended for random-access use cases (video seeking, resumable downloads).
- **With compression**: chunk sizes are content-dependent. The caller must build a chunk-offset index during encryption (accumulate per-chunk output sizes).
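The offset arithmetic for both cases can be sketched as follows. Constant values are taken from this section, and `uncompressed_chunk_range` and `ChunkOffsetIndex` are hypothetical helpers. The returned range is the exact byte span to hand to `decrypt_chunk_at`. The AEAD ciphertext itself begins one byte after `range.start`, past the `tag_byte`.

```rust
use std::ops::Range;

pub const STREAM_HEADER_SIZE: u64 = 26;
pub const CHUNK_SIZE: u64 = 1_048_576;
pub const STREAM_CHUNK_OVERHEAD: u64 = 17; // tag_byte + Poly1305 tag
pub const STREAM_CHUNK_STRIDE: u64 = CHUNK_SIZE + STREAM_CHUNK_OVERHEAD; // 1,048,593

/// Uncompressed stream: byte range of chunk `n` in a stream of `total_len`
/// bytes. Every chunk except the final one spans one full stride; the final
/// chunk ends at EOF. Returns None if chunk `n` cannot exist at this length.
pub fn uncompressed_chunk_range(total_len: u64, n: u64) -> Option<Range<u64>> {
    let start = n.checked_mul(STREAM_CHUNK_STRIDE)?.checked_add(STREAM_HEADER_SIZE)?;
    // Even an empty final chunk needs tag_byte + tag (17 bytes).
    if start.checked_add(STREAM_CHUNK_OVERHEAD)? > total_len {
        return None;
    }
    let end = start.saturating_add(STREAM_CHUNK_STRIDE).min(total_len);
    Some(start..end)
}

/// Compressed stream: offsets recorded during encryption, one entry per
/// encrypted chunk in index order (hypothetical helper).
pub struct ChunkOffsetIndex {
    starts: Vec<u64>,
    end: u64,
}

impl ChunkOffsetIndex {
    pub fn new() -> Self {
        Self { starts: Vec::new(), end: STREAM_HEADER_SIZE }
    }

    /// Call with each encrypted chunk's length, in order, as it is emitted.
    pub fn record(&mut self, encrypted_chunk_len: u64) {
        self.starts.push(self.end);
        self.end += encrypted_chunk_len;
    }

    /// Byte range of chunk `n`, or None if no such chunk was recorded.
    pub fn range(&self, n: usize) -> Option<Range<u64>> {
        let start = *self.starts.get(n)?;
        let end = self.starts.get(n + 1).copied().unwrap_or(self.end);
        Some(start..end)
    }
}
```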

**Index integrity**: A tampered chunk-offset index (pointing to the wrong byte range for a given chunk index) causes AEAD failure, not silent wrong plaintext — both the per-chunk nonce and AAD include the chunk index, so presenting chunk N's ciphertext at index M fails authentication. This holds for both sequential and random-access modes.

**`decrypt_chunk_at` takes an extracted chunk, not the stream tail**: The `chunk` parameter is exactly one encrypted chunk's bytes. For an uncompressed stream, chunk N spans from offset `26 + N × 1,048,593` to the start of the next chunk (`26 + (N+1) × 1,048,593`), or to the end of the stream if N is the final chunk; the stream header is never included. It is NOT the remaining stream bytes starting at that offset. Passing the stream tail (everything from the chunk's start byte to the end of the stream) does not decrypt correctly: the function expects exactly one chunk and treats trailing bytes as an oversized input that fails length validation. The caller is responsible for extracting the correct byte range before calling `decrypt_chunk_at`. For uncompressed streams, the offset formula above gives the exact byte range; for compressed streams, the caller must use the chunk-offset index built during encryption (§15.11 "With compression").

The `decrypt_chunk_at` API accepts a chunk index directly, does not advance the sequential counter, does not set the `finalized` flag regardless of `tag_byte`, and can be called on an immutable (`&self`) reference. Sequential and random-access decryption can be mixed on the same decryptor handle — `decrypt_chunk_at` never modifies `next_index` or `finalized`, so it does not interfere with a sequential pass (e.g., random-access retry of a failed chunk during an otherwise sequential download). "Mixed" means interleaved calls within a single thread, not concurrent multi-threaded access. Parallel chunk decryption requires separate decryptor handles initialized from the same key and header bytes; the CAPI reentrancy guard (§13.6) prevents concurrent calls on the same handle.

**`*const SolitonStreamDecryptor` means "no observable state mutation," not "thread-safe"**: The CAPI signature uses `*const SolitonStreamDecryptor` for `soliton_stream_decrypt_chunk_at` to reflect that the function takes `&self` in Rust (no state mutation). In C, `const T*` conventionally signals "safe for concurrent reads," but this guarantee does NOT hold here — the CAPI reentrancy guard (§13.6) fires on any concurrent call regardless of whether the call is read-only. A C binding author who interprets `*const` as "concurrent calls are safe" and dispatches `decrypt_chunk_at` from multiple threads on the same handle receives `ConcurrentAccess` (-18) with no indication that const was the source of confusion. Parallel chunk decryption always requires separate handles, even when all calls are read-side `decrypt_chunk_at`. The `const` qualifier signals the Rust-level API contract (immutable borrow), not a C-level concurrency guarantee.

**Atomicity**: On encryption or decryption failure (AEAD rejection, ChainExhausted, decompression failure, post-finalization guard, Internal from compression expansion check), the encryptor/decryptor state is unchanged — `next_index` is not advanced and `finalized` is not set. The operation is retryable (the same chunk can be re-submitted after correcting the input). Unlike ratchet `encrypt()` (§6.5), per-chunk failures are NOT session-fatal and the streaming key is NOT zeroized on error — retryability requires the key to survive failed calls. The key is zeroized exclusively on handle destruction (`soliton_stream_encrypt_free` / `soliton_stream_decrypt_free`). Reimplementers MUST NOT zeroize the streaming key on per-chunk AEAD failure.

**Output parameters on error**: On any error return from `soliton_stream_decrypt_chunk` or `soliton_stream_decrypt_chunk_at`, the output parameters are set to defined values: `*out_written = 0` and `*is_last = false`. Callers MUST check the return code before reading `out_written` or `is_last` — on error, these values are sentinels, not results. A caller who reads `out_written` or `is_last` without first checking the return code gets 0 and false, which is safe (no buffer overflow, no false finalization signal), but the defined-value guarantee is part of the CAPI contract so reimplementers must provide it. Note: `soliton_stream_encrypt_chunk` sets only `*out_written = 0` on error; there is no `is_last` output parameter on the encrypt side.

### 15.12 Stream-Level Security Analysis

**Cross-stream splicing**: Each stream has a unique random base nonce (24 bytes from CSPRNG). Moving a chunk from stream A into stream B at the same index fails AEAD authentication — the per-chunk nonce is derived from the base nonce, so the chunk decrypts under a different nonce in stream B. Moving a chunk to a different index within the same stream also fails — different chunk index produces a different nonce.

**Chunk reordering**: Per-chunk nonces are deterministic from (base_nonce, index). Swapping chunks i and j fails AEAD because each chunk authenticates under its own index-derived nonce. The sequential decryptor also detects reordering via the monotonic advance of `next_index`. `next_index` starts at 0 (§15.9), so the first chunk has index 0. A stream encrypted with N chunks has chunk indices 0 through N−1, and `next_index` equals N after the final chunk is encrypted. The index is a zero-based position counter: chunk 0 is simply the first chunk. Ratchet `send_count` also starts at 0. Reimplementers who nonetheless initialize `next_index = 1` (for example, by treating it as a one-based message number) will misalign every chunk's nonce from the first chunk onward.

**Truncation**: The final-chunk marker (`tag_byte = 0x01`, §15.10) detects truncation: a sequential decryptor that reaches EOF without seeing `tag_byte = 0x01` knows the stream was truncated. **The detection mechanism is `is_finalized() == false` after transport EOF, not a library error.** The library does not return an error for a non-finalized stream at EOF. It returns errors only per chunk (e.g., `AeadFailed` for a partial chunk, `ChainExhausted` for index overflow). Truncation between whole chunks, where the transport closes cleanly without delivering the final chunk, produces no library error on any call. The caller detects this by checking `is_finalized()` after transport EOF: `false` means the final chunk (`tag_byte = 0x01`) was never delivered. A reimplementer who adds a `check_complete()` or `flush()` API that returns `InvalidData` for a non-finalized state creates an incompatible API, since no such call exists in the reference implementation.

Random-access decryptors do not check finalization and cannot detect truncation. Callers using random-access mode must verify completeness externally (e.g., via a known chunk count in the stream metadata). **For compressed streams, the chunk count is not derivable from byte length.** Uncompressed streams differ: every chunk except the final one occupies exactly `STREAM_CHUNK_STRIDE` (`CHUNK_SIZE + STREAM_CHUNK_OVERHEAD` = 1,048,593) bytes and the final chunk occupies between 17 and `STREAM_CHUNK_STRIDE` bytes, so the chunk count of a complete uncompressed stream is exactly `ceil((total_bytes - STREAM_HEADER_SIZE) / STREAM_CHUNK_STRIDE)`. For compressed streams, the chunk count must be stored in the enclosing metadata. **This metadata must itself be authenticated**: it must be covered by a ratchet AEAD, a detached signature, or another integrity mechanism. An adversary who controls the metadata channel can substitute a smaller chunk count, making a truncated stream appear complete to a random-access caller that decrypts only the first N chunks. Storing the chunk count in an unauthenticated plaintext field (e.g., a JSON wrapper, an HTTP header) defeats this completeness check entirely.

**Standard authenticated placement**: include the chunk count in the same ratchet message body that delivers the stream key (§15.1). The ratchet AEAD authenticates the entire message body, so the chunk count inherits authentication without a separate integrity mechanism. This is the recommended pattern; alternatives (detached signature, AEAD-authenticated sidecar) are valid but add complexity for no benefit in the standard composition.
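For uncompressed streams, the length-derived count can be computed as below. This is a sketch, and `uncompressed_chunk_count` is a hypothetical helper. Length alone does not prove completeness, because a stream truncated at a chunk boundary still yields a plausible count. A random-access caller should therefore also confirm that chunk `count - 1` decrypts with `is_last = true`. Compressed streams have no such formula and need an authenticated count in the metadata.

```rust
pub const STREAM_HEADER_SIZE: u64 = 26;
pub const STREAM_CHUNK_OVERHEAD: u64 = 17;
pub const STREAM_CHUNK_STRIDE: u64 = 1_048_576 + STREAM_CHUNK_OVERHEAD;

/// Chunk count of a complete uncompressed stream of `total_len` bytes:
/// ceil((total_len - STREAM_HEADER_SIZE) / STREAM_CHUNK_STRIDE).
/// Returns None for lengths that no valid stream can have.
pub fn uncompressed_chunk_count(total_len: u64) -> Option<u64> {
    let body = total_len.checked_sub(STREAM_HEADER_SIZE)?;
    // Minimum valid stream: one empty final chunk (43 bytes total).
    if body < STREAM_CHUNK_OVERHEAD {
        return None;
    }
    let full = body / STREAM_CHUNK_STRIDE;
    let tail = body % STREAM_CHUNK_STRIDE;
    match tail {
        // The last chunk is a full-size final chunk.
        0 => Some(full),
        // A trailing fragment shorter than 17 bytes cannot be a chunk.
        t if t < STREAM_CHUNK_OVERHEAD => None,
        // A short final chunk follows `full` full-size chunks.
        _ => Some(full + 1),
    }
}
```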

**Definition of "chunk count"**: The chunk count is the total number of chunks produced by the encryptor — equivalently, `final_chunk_index + 1`, where `final_chunk_index` is the 0-based index of the final chunk (the chunk with `tag_byte = 0x01`). For a stream with N non-final chunks followed by one final chunk, the chunk count is N + 1. This is also the value of the encryptor's `next_index` counter immediately after calling `encrypt_chunk` with `is_last = true`. An off-by-one (storing `final_chunk_index` instead of `final_chunk_index + 1`) silently accepts a stream truncated before the last chunk: a random-access caller decrypting chunks 0 through `count − 1` would stop one chunk before the final one, never seeing `is_last = true` and incorrectly treating the stream as complete.

**`next_index` reliability after a failed `encrypt_chunk`**: The encryptor's `next_index` equals the chunk count only once `is_finalized()` is true. If the final `encrypt_chunk(is_last=true)` call fails (e.g., `InvalidData` for oversized plaintext), `next_index` is not incremented and the state is unchanged (§15.11 atomicity). Any caller-side counter that mirrors `next_index` must follow the same rule: increment only after a successful call, and treat the value as the chunk count only after `is_finalized()` returns true. Otherwise it will be off by one.

**No CAPI function retrieves the chunk count post-finalization**: The CAPI provides `soliton_stream_decrypt_expected_index` (read the decryptor's sequential counter) but no corresponding `soliton_stream_encrypt_expected_index`. To implement the recommended metadata pattern (§15.12 authenticated chunk count), CAPI callers must track the chunk count themselves by incrementing a caller-managed counter on each successful `soliton_stream_encrypt_chunk` call. **`soliton_stream_decrypt_expected_index` cannot substitute for a caller-managed counter.** This function reads the decryptor's own sequential `next_index` (the number of chunks it has successfully decrypted), not the encryptor's state. A paired decryptor that has not yet decrypted any chunks returns 0, because it has no visibility into how many chunks the encryptor has produced. Do not use `soliton_stream_decrypt_expected_index` as a proxy for the encryptor's chunk count. The simplest workaround: maintain an application-level `chunk_count` variable initialized to 0, increment it after each successful `soliton_stream_encrypt_chunk`, and embed the final value in the ratchet message alongside the stream key. Neither the Rust API nor the CAPI exposes the encryptor's internal index counter: `StreamEncryptor` provides only `header()` and `is_finalized()` (`expected_index()` exists only on `StreamDecryptor`, a deliberate asymmetry). CAPI callers and Rust callers alike must implement equivalent counter tracking in application code.
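The following sketches a caller-managed counter. `ChunkEncryptor` is a hypothetical trait standing in for whichever binding is in use (the Rust API or a CAPI wrapper), and its method names and error type are assumptions. The wrapper counts only successful calls and withholds the count until `is_finalized()` is true. This also gives the caller a natural place for the check-before-free rule from §15.10.

```rust
/// Hypothetical trait standing in for a stream encryptor handle.
pub trait ChunkEncryptor {
    type Error;
    fn encrypt_chunk(&mut self, plaintext: &[u8], is_last: bool) -> Result<Vec<u8>, Self::Error>;
    fn is_finalized(&self) -> bool;
}

/// Wraps an encryptor and counts only successful calls (§15.11 atomicity).
pub struct CountingEncryptor<E: ChunkEncryptor> {
    inner: E,
    chunk_count: u64,
}

impl<E: ChunkEncryptor> CountingEncryptor<E> {
    pub fn new(inner: E) -> Self {
        Self { inner, chunk_count: 0 }
    }

    pub fn encrypt_chunk(&mut self, plaintext: &[u8], is_last: bool) -> Result<Vec<u8>, E::Error> {
        let out = self.inner.encrypt_chunk(plaintext, is_last)?;
        self.chunk_count += 1; // reached only on success
        Ok(out)
    }

    /// The value to embed in the authenticated ratchet message. None until
    /// the final chunk has been emitted, so an unfinalized stream is never
    /// reported with a count.
    pub fn final_chunk_count(&self) -> Option<u64> {
        self.inner.is_finalized().then_some(self.chunk_count)
    }
}
```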

**Chunk deletion (middle)**: Deleting chunk i causes chunk i+1 to be presented at index i during sequential decryption — AEAD fails (wrong nonce for that ciphertext). Random-access at the original index returns `AeadFailed` (no ciphertext at that offset, or wrong ciphertext).

**`encrypt_chunk_at` index uniqueness**: Calling `encrypt_chunk_at` twice with the same `index` on the same encryptor handle produces identical ciphertext both times — nonce and AAD are deterministic from the index, so the same plaintext at the same index always yields the same encrypted output. This is not a security vulnerability within a single stream (unique indices across chunks prevent nonce reuse between chunks), but it means `encrypt_chunk_at` offers no write-once enforcement: a caller who accidentally encrypts the same index twice will silently produce a redundant chunk with no error return. The stream assembled from such duplicate calls contains two ciphertexts at the same position; which one the decryptor sees depends on how the caller assembles the stream. Callers using `encrypt_chunk_at` for parallel encryption MUST ensure each chunk index is used exactly once. The library cannot enforce this — enforcing it would require shared mutable state, which contradicts the `&self` contract.

**Chunk replay**: The streaming layer does not provide replay protection — this is mode-dependent. In sequential mode, replaying a chunk that was already successfully decrypted fails incidentally: the sequential decryptor has already advanced `next_index` past that chunk's position, so presenting the chunk again decrypts it at a different (wrong) index, producing `AeadFailed`. This is incidental, not by design — the sequential counter provides freshness as a side effect of monotonic advance, not via explicit replay tracking. In random-access mode, decrypting the same `(index, chunk)` pair twice succeeds both times — `decrypt_chunk_at` is stateless and has no memory of prior decryptions. A formal modeler asking whether the streaming layer provides authenticated-channel replay resistance gets different answers depending on the mode. Reimplementers MUST NOT add stateful replay tracking to `decrypt_chunk_at` — its stateless contract is required for mixed sequential/random-access operation (§15.11) and for parallel chunk decryption across multiple handles.

**Cross-session replay and key freshness**: The in-session counter provides no protection against cross-session replay — presenting an entire stream (header + all chunks) to a fresh decryptor initialized with the same key succeeds without error. The stream has a unique `base_nonce`, but the decryptor has no memory of prior base nonces. Protection against cross-session replay relies entirely on the key being freshly generated from the OS CSPRNG for each stream (§15.1). The probability that two streams share the same key is negligible with a properly seeded CSPRNG. A reimplementer who derives stream keys deterministically (e.g., from a counter or from fixed material) or who reuses stream keys across sessions loses this protection entirely — cross-session replay becomes trivially possible.

**Key reuse across streams**: Catastrophic when the base nonce also repeats. Two streams encrypted under the same key with the same base nonce produce identical per-chunk nonces, which leaks the XOR of their plaintexts and reuses the one-time Poly1305 keys. The base nonce is 192 bits from the CSPRNG, so an accidental collision is negligible (below 2^-96 even across 2^48 streams), but that margin matters only once a key is shared: with a fresh key per stream, even a base-nonce collision is harmless, because the colliding nonces are used under different keys. Callers MUST NOT reuse keys across streams; generate a fresh random key per stream. **`caller_aad` does not substitute for key freshness**: distinct `caller_aad` values under the same key do not prevent nonce reuse, because nonces are derived from the base nonce and chunk index, not from the AAD. Two streams with the same key and the same base nonce (a birthday collision) produce identical nonces regardless of `caller_aad`. The isolation primitive is the per-stream random key and base nonce, not the AAD.
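
The collision figure follows from the birthday bound: among n uniformly random b-bit values, the probability that any two coincide is at most n(n - 1)/2 · 2^-b. A small helper that works in log2 form, so 192-bit probabilities do not underflow (the function name is illustrative):

```rust
/// Birthday bound in log2 form: the probability that any two of `n`
/// uniformly random `bits`-bit values collide is at most n(n-1)/2 * 2^-bits.
fn log2_collision_bound(n: f64, bits: f64) -> f64 {
    (n * (n - 1.0) / 2.0).log2() - bits
}
```

For n = 2^48 and b = 192 this evaluates to about -97, a collision probability near 2^-97; every doubling of the stream count raises it by a factor of four.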

---

## Appendix A: Constants

All domain labels and AAD prefixes are raw UTF-8 byte strings — no null terminators, no length prefixes. Concatenation with other fields (fingerprints, header bytes, etc.) is raw byte concatenation unless explicitly annotated otherwise (e.g., KEX info uses length-prefixed fields per §5.4).

```
AUTH_HMAC_LABEL     = b"lo-auth-v1"          // 10 bytes
KEX_HKDF_INFO_PFX  = b"lo-kex-v1"            // 9 bytes
SPK_SIG_LABEL       = b"lo-spk-sig-v1"        // 13 bytes
INITIATOR_SIG_LABEL = b"lo-kex-init-sig-v1"  // 18 bytes
RATCHET_HKDF_INFO  = b"lo-ratchet-v1"        // 13 bytes
DM_AAD              = b"lo-dm-v1"             // 8 bytes — shared by first-message (§5.4)
                                               // and ratchet-message (§6.5) AAD. Context
                                               // disambiguation is provided by the suffix
                                               // (session-init-bytes vs. ratchet-header-bytes),
                                               // not the label. Cross-context confusion
                                               // (feeding a first-message ciphertext as a
                                               // ratchet message or vice versa) is rejected
                                               // by AEAD: encode_session_init begins with a
                                               // 2-byte BE length prefix (0x000C in v1) while
                                               // encode_ratchet_header begins with 1216 bytes
                                               // of public key material — the AAD mismatch
                                               // causes tag verification to fail. Future
                                               // message formats needing distinct AEAD contexts
                                               // MUST use a new label.
STORAGE_AAD         = b"lo-storage-v1"        // 13 bytes
DM_QUEUE_AAD        = b"lo-dm-queue-v1"       // 14 bytes — separate label (not DM_AAD with a suffix)
                                               // because the DM queue context has no fixed structural
                                               // suffix to provide disambiguation. DM_AAD is shared
                                               // between first-message and ratchet-message contexts
                                               // because those contexts have structurally distinct
                                               // suffixes (session-init bytes vs. ratchet-header bytes)
                                               // that make cross-context confusion impossible. DM queue
                                               // AAD has no such suffix — using DM_AAD with a queue-
                                               // specific suffix would require a separately-standardized
                                               // encoding convention with the same collision-prevention
                                               // burden as a distinct label. A distinct label is simpler.
CALL_HKDF_INFO      = b"lo-call-v1"           // 10 bytes
PHRASE_HASH_LABEL   = b"lo-verification-v1"  // 18 bytes
PHRASE_EXPAND_LABEL = b"lo-phrase-expand-v1"  // 19 bytes
MSG_KEY_DOMAIN_BYTE = 0x01                    // HMAC domain byte for KDF_MsgKey (§6.3)
                                               // 0x02 reserved: deliberately left unassigned as a
                                               //   gap after 0x01 (message key derivation); held
                                               //   for possible future epoch-key-derived outputs.
                                               // 0x03 reserved: keeps a one-value gap before the
                                               //   call chain bytes 0x04-0x06.
CALL_KEY_A_BYTE     = 0x04                    // HMAC data byte for first call key
CALL_KEY_B_BYTE     = 0x05                    // HMAC data byte for second call key
CALL_CHAIN_ADV_BYTE = 0x06                    // HMAC data byte for next call chain key
MAX_CALL_ADVANCE    = 2²⁴                     // Maximum advance_call_chain steps per call session
                                               // (16,777,216 rekeys). Exceeding this limit returns
                                               // ChainExhausted. Also listed in Appendix B.
                                               // NOT an exported pub const — this is a private
                                               // const in call.rs; importing MAX_CALL_ADVANCE by
                                               // name will fail at link/import time. Binding authors
                                               // must embed the literal value (16_777_216 / 0x100_0000).
CALL_ID_SIZE        = 16                       // 128-bit random call identifier
XWING_CIPHERTEXT_SIZE = 1120                  // X-Wing KEM ciphertext bytes: X25519_eph_pk (32) ||
                                               // ML-KEM-768_ct (1088), LO X25519-first order (§8.1).
                                               // Fixed in lo-crypto-v1; length-prefixed in wire format
                                               // (§7.4) for forward-compat across crypto versions.
HMAC_SHA3_256_BLOCK_SIZE = 136             // NOT an exported pub const — binding authors must
                                               // embed the value 136 directly; importing this name
                                               // will fail at link/import time.
                                               // SHA3-256's Keccak rate (block size) in bytes.
                                               // RFC 2104 HMAC zero-pads the key to the hash's
                                               // block size (a key longer than one block is hashed
                                               // first): 136 bytes for SHA3-256, NOT the 64 bytes
                                               // of SHA-256. A reimplementer using a SHA-256-
                                               // configured HMAC library or hardcoding 64 as the
                                               // block size produces wrong output on every KDF_MsgKey,
                                               // KDF_Root, KDF_Call, and AdvanceCallChain call.
                                               // Standard HMAC libraries handle this automatically
                                               // when SHA3-256 is selected — this constant exists
                                               // for reimplementers building HMAC from primitives
                                               // and for interoperability test vectors (F.25 / T3).
XWING_SEED_SHAKE_OUTPUT = 96              // NOT an exported pub const — binding authors must
                                               // embed the value 96 directly; importing this name
                                               // will fail at link/import time.
                                               // SHAKE256 output length (bytes) for X-Wing seed expansion
                                               // (§8.5, draft-09 §3.2): SHAKE256(seed_32, 96) → d(32)
                                               // || z(32) || sk_X(32). Not used in production keygen
                                               // (which draws three independent CSPRNG values) — used
                                               // exclusively in deterministic test environments and KAT
                                               // reproduction. A reimplementer using SHAKE256(seed, 64)
                                               // would derive only d and z, missing sk_X.
HKDF_ZERO_SALT      = [0x00] × 32   // 32 zero bytes (sequence notation — not integer multiplication)
MAX_RECV_SEEN       = 65536                    // max entries in recv_seen duplicate tracking set
RATCHET_BLOB_VERSION = 0x01                    // current ratchet state serialization version (§6.8).
                                               // `from_bytes` returns `UnsupportedVersion` for any
                                               // version ≠ 0x01. No migration path for unknown versions.
STREAM_HEADER_VERSION = 0x01                   // current streaming AEAD header version (§15.2) — Rust source: STREAM_VERSION
CRYPTO_VERSION      = "lo-crypto-v1"
XWING_LABEL         = 0x5c 0x2e 0x2f 0x2f 0x5e 0x5c  // \.//^\  (label goes LAST in combiner)
STREAM_AAD          = b"lo-stream-v1"          // 12 bytes
STREAM_TAG_NONFINAL = 0x00                     // non-final chunk tag byte. Three roles:
                                               // (1) XOR component in nonce derivation (§15.3) —
                                               //   XORed into mask byte 8, producing a nonce that
                                               //   is distinct from the final-chunk nonce at the
                                               //   same index (0x00 vs 0x01 in byte 8).
                                               // (2) Final-chunk signal — value 0x00 means there
                                               //   are more chunks to follow; the sequential
                                               //   decryptor does not set finalized=true.
                                               // (3) Reader termination — sequential decryptors
                                               //   continue reading chunks as long as tag_byte ≠ 0x01.
STREAM_TAG_FINAL    = 0x01                     // final chunk tag byte. Three roles:
                                               // (1) XOR component in nonce derivation (§15.3) —
                                               //   XORed into mask byte 8, producing a nonce that
                                               //   differs from the non-final nonce at the same index.
                                               //   This prevents the final-chunk ciphertext from being
                                               //   presentable as a valid non-final chunk (the nonces
                                               //   differ, so AEAD would fail if the tag_byte were flipped).
                                               // (2) Final-chunk signal — exactly one chunk per stream
                                               //   has tag_byte=0x01; its presence terminates the stream.
                                               // (3) Reader termination — sequential decryptors set
                                               //   finalized=true and reject any subsequent decrypt_chunk
                                               //   calls when a chunk with tag_byte=0x01 is successfully
                                               //   decrypted.
CHUNK_SIZE          = 1_048_576               // plaintext bytes per non-final chunk (1 MiB).
                                               // Also the minimum output buffer size for
                                               // soliton_stream_decrypt_chunk /
                                               // soliton_stream_decrypt_chunk_at (see Appendix B).
                                               // Rust source: STREAM_CHUNK_SIZE (the exported pub
                                               // const is named STREAM_CHUNK_SIZE, not CHUNK_SIZE;
                                               // this spec uses CHUNK_SIZE as the canonical name).
FLAG_COMPRESSED     = 0x01                    // bits 1-7 reserved (MUST be zero on write,
                                               // collapse to AeadFailed on read per §15.7).
                                               // This flag appears in: §11.1 storage blob header (flags byte),
                                               // §11.2 DM queue blob, §15.2 streaming AEAD header (flags byte),
                                               // §15.5 streaming AAD. In all contexts, bit 0 = compression
                                               // (0 = none, 1 = zstd). Binding authors using the flag
                                               // value directly should define this constant locally.
STREAM_HEADER_SIZE  = 26                      // bytes in the streaming AEAD header (§15.2):
                                               // version (1) + flags (1) + base_nonce (24).
                                               // Used in the random-access offset formula
                                               // (§15.11): offset = STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE.
STREAM_CHUNK_OVERHEAD = 17                    // bytes added per chunk beyond plaintext:
                                               // tag_byte (1) + Poly1305 tag (16).
                                               // An encrypted chunk is: tag_byte (1) ||
                                               // XChaCha20-Poly1305 output (plaintext + 16).
STREAM_CHUNK_STRIDE = 1_048_593               // fixed byte stride between uncompressed chunk
                                               // boundaries: CHUNK_SIZE + STREAM_CHUNK_OVERHEAD
                                               // = 1_048_576 + 17 = 1_048_593.
                                               // Used in the §15.11 random-access offset formula:
                                               // byte_offset(N) = STREAM_HEADER_SIZE + N × STREAM_CHUNK_STRIDE.
                                               // Only valid for uncompressed streams; compressed
                                               // chunk sizes are content-dependent (§15.11).
                                               // NOT an exported pub const — binding authors must
                                               // compute this as CHUNK_SIZE + STREAM_CHUNK_OVERHEAD;
                                               // importing STREAM_CHUNK_STRIDE by name will fail
                                               // at link/import time.
STREAM_ZSTD_OVERHEAD = 256                    // zstd expansion guard for streaming encrypt_chunk (§15.11).
                                               // If zstd output exceeds plaintext.len() + 256, encrypt_chunk
                                               // returns Internal (retryable with compress=false). The value
                                               // is a conservative margin: zstd's worst-case expansion on
                                               // incompressible 1 MiB data is ~50 bytes (frame + block headers);
                                               // 256 provides ~5× headroom. Used in STREAM_ENCRYPT_MAX below.
STREAM_ENCRYPT_MAX  = 1_048_849               // max bytes of CAPI output buffer for one chunk:
                                               // CHUNK_SIZE (1_048_576) + ZSTD_OVERHEAD (256) +
                                               // CHUNK_OVERHEAD (17). Binding authors MUST
                                               // allocate at least this many bytes for the output
                                               // buffer passed to soliton_stream_encrypt_chunk;
                                               // smaller buffers return InvalidLength.
                                               // NOTE: this is the ceiling for the full-CHUNK_SIZE
                                               // case. For a short final chunk (e.g., 100 bytes
                                               // of plaintext), the Internal guard fires if zstd
                                               // expands that chunk beyond 100 + ZSTD_OVERHEAD
                                               // (= 356 bytes), not beyond STREAM_ENCRYPT_MAX.
                                               // The guard is per-actual-plaintext-length, not
                                               // per-CHUNK_SIZE. For a 100-byte final chunk the
                                               // actual output is at most 356 + CHUNK_OVERHEAD
                                               // (17) = 373 bytes, but the CAPI output buffer
                                               // MUST still be at least STREAM_ENCRYPT_MAX bytes
                                               // to satisfy the length guard.
```
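
Several of these values are not exported (`MAX_CALL_ADVANCE`, `HMAC_SHA3_256_BLOCK_SIZE`, `XWING_SEED_SHAKE_OUTPUT`, `STREAM_CHUNK_STRIDE`), so a Rust binding has to embed them. The sketch below shows one way to do that, with the derived values checked at compile time. The constant names follow this appendix; `chunk_offset` and `max_encrypt_output` are hypothetical helpers, not library API.

```rust
// Local copies of Appendix A values for a binding (not the library's own consts).
pub const CHUNK_SIZE: u64 = 1_048_576;
pub const STREAM_HEADER_SIZE: u64 = 26; // version (1) + flags (1) + base_nonce (24)
pub const STREAM_CHUNK_OVERHEAD: u64 = 17; // tag_byte (1) + Poly1305 tag (16)
pub const STREAM_ZSTD_OVERHEAD: u64 = 256;
pub const STREAM_CHUNK_STRIDE: u64 = CHUNK_SIZE + STREAM_CHUNK_OVERHEAD;
pub const STREAM_ENCRYPT_MAX: u64 = CHUNK_SIZE + STREAM_ZSTD_OVERHEAD + STREAM_CHUNK_OVERHEAD;
pub const MAX_CALL_ADVANCE: u64 = 1 << 24; // 16_777_216; private in call.rs
pub const HMAC_SHA3_256_BLOCK_SIZE: usize = 136; // SHA3-256 rate, not 64
pub const XWING_SEED_SHAKE_OUTPUT: usize = 96; // d(32) || z(32) || sk_X(32)

// Compile-time checks against the literal values listed above.
const _: () = assert!(STREAM_CHUNK_STRIDE == 1_048_593);
const _: () = assert!(STREAM_ENCRYPT_MAX == 1_048_849);
const _: () = assert!(b"lo-dm-v1".len() == 8);
const _: () = assert!(b"lo-stream-v1".len() == 12);

/// Byte offset of chunk `n` in an uncompressed stream (§15.11).
/// Returns None instead of wrapping if the offset overflows u64.
pub fn chunk_offset(n: u64) -> Option<u64> {
    n.checked_mul(STREAM_CHUNK_STRIDE)?.checked_add(STREAM_HEADER_SIZE)
}

/// Upper bound on encrypt_chunk output for `plaintext_len` bytes of input
/// (expects plaintext_len <= CHUNK_SIZE): the zstd guard allows at most
/// plaintext_len + 256 compressed bytes, plus tag_byte and Poly1305 tag.
pub fn max_encrypt_output(plaintext_len: u64) -> u64 {
    plaintext_len + STREAM_ZSTD_OVERHEAD + STREAM_CHUNK_OVERHEAD
}
```

`chunk_offset` applies only to uncompressed streams; with compression, chunk sizes are content-dependent and offsets cannot be computed this way (§15.11). For a 100-byte final chunk, `max_encrypt_output(100)` gives 100 + 256 + 17 = 373.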

## Appendix B: Parameters

| Parameter | Value |
|-----------|-------|
| OPK batch size | 100 |
| Pre-key low threshold | 10 |
| SPK rotation | 7 days |
| Old SPK retention | 30 days (from rotation, not generation — §10.2) |
| Auth challenge timeout | 30 seconds (§4.4) |
| Max recv_seen entries | 65536 per epoch |
| Max epoch length | 2^32 - 1 messages |
| Storage key versions | 1-255 |
| Verification phrase | 7 words / EFF large wordlist (7,776 words) |
| Verification phrase entropy | ~90.5 bits (7 × log2(7776) ≈ 7 × 12.925) |
| Zstd compression level | Fastest (~1); `ruzstd` 0.8.x limitation |
| Max plaintext per blob (encrypt) | 256 MiB on native; 16 MiB on WASM — `encrypt_blob` returns `InvalidData` if plaintext exceeds the platform limit before compression. This is the caller-provided pre-compression plaintext size, not the post-compression ciphertext size. |
| Max decompressed blob | 256 MiB |
| Call ID size | 16 bytes (128-bit random) |
| Argon2id version | 0x13 (decimal 19 = v1.3, the only version produced and accepted; §10.6) |
| Argon2id m_cost | 8 KiB - 4,194,304 KiB (4 GiB); must be ≥ 8 × p_cost (RFC 9106 §3.1) |
| Argon2id t_cost | 1 - 256 |
| Argon2id p_cost | 1 - 256 |
| Argon2id output length | 1 - 4,096 bytes |
| Argon2id salt minimum | 8 bytes |
| Argon2id `secret` (pepper) | Empty (0 bytes) — soliton does not use the Argon2id pepper input; reimplementers MUST pass empty `secret` |
| Argon2id `ad` (associated data) | Empty (0 bytes) — soliton does not use the Argon2id associated data input; reimplementers MUST pass empty `ad` |
| Stream chunk size | 1 MiB (1,048,576 bytes) |
| Stream header size | 26 bytes (version + flags + nonce) |
| Stream chunk overhead | 17 bytes (tag_byte + Poly1305 tag) |
| Stream zstd overhead | 256 bytes (~5× worst-case margin; zstd worst-case expansion on incompressible data is ~50 bytes for a 1 MiB input: frame header + block headers) |
| Stream max encrypted chunk | 1,048,849 bytes (CHUNK_SIZE + ZSTD_OVERHEAD + CHUNK_OVERHEAD) |
| Stream decrypt output buffer minimum | 1,048,576 bytes (CHUNK_SIZE) — binding authors MUST allocate at least this many bytes for the output buffer passed to `soliton_stream_decrypt_chunk` / `soliton_stream_decrypt_chunk_at`; smaller buffers return `InvalidLength` regardless of actual plaintext size |
| Stream max chunk index (sequential) | u64::MAX − 1 (guard fires at next_index == u64::MAX) |
| Stream max chunk index (random-access) | u64::MAX (no guard — any u64 accepted, returns AeadFailed for indices no encryptor produced; see §15.9) |
| Argon2id `OWASP_MIN` preset | m=19 MiB (19456 KiB), t=2, p=1 — interactive auth |
| Argon2id `RECOMMENDED` preset | m=64 MiB (65536 KiB), t=3, p=4 — stored keypair |
| Argon2id `WASM_DEFAULT` preset | m=16 MiB (16384 KiB), t=3, p=1 — WASM targets |
| Call chain advance limit | 2²⁴ steps (16,777,216 rekeys per call session, §6.12). `step_count` starts at 0; the initial keys from `derive_call_keys` are step-0 keys. `ChainExhausted` fires when `step_count` reaches 2²⁴ (i.e., after 16,777,216 `advance()` calls). |
| WASM decompressed blob limit | 16 MiB (WASM targets use a lower limit than the general 256 MiB; §11.3) |
| Max ratchet serialization epoch | u64::MAX − 1 (epoch u64::MAX triggers ChainExhausted from to_bytes, §6.8) |
| Ratchet blob deserialization cap | 1 MiB (1,048,576 bytes) — CAPI `soliton_ratchet_from_bytes` / `from_bytes_with_min_epoch` reject inputs exceeding this size with `InvalidLength` (-1). Tighter than the general 256 MiB cap; the maximum valid blob is ~530 KB (§6.8). Reimplementers building their own deserialization entry point SHOULD apply an equivalent cap. |
| `decode_session_init` input cap | 64 KiB (65,536 bytes) — `soliton_kex_decode_session_init` rejects inputs exceeding this size with `InvalidLength` (-1). Tighter than the general 256 MiB CAPI cap; the maximum valid session init blob is 4,669 bytes (with OPK; per Appendix C / §7.4). |
| `build_first_message_aad` input cap | 8 KiB (8,192 bytes) — `soliton_kex_build_first_message_aad` rejects `session_init_encoded` inputs exceeding this size with `InvalidLength` (-1). The cap is never reached in practice — the maximum valid `session_init_encoded` blob is 4,669 bytes (with OPK; §7.4 / Appendix C). There is no `associated_data` parameter on this function. Tighter than the general 256 MiB CAPI cap. |
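
The Argon2id rows above reduce to a handful of range checks. A sketch under these assumptions: `Argon2Cost` and `validate` are hypothetical names, the error is a plain string rather than soliton's error type, and the check order is arbitrary. The three presets are copied from the table.

```rust
/// Hypothetical cost bundle mirroring the Appendix B Argon2id rows.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Argon2Cost {
    pub m_cost_kib: u32,
    pub t_cost: u32,
    pub p_cost: u32,
}

pub const OWASP_MIN: Argon2Cost = Argon2Cost { m_cost_kib: 19_456, t_cost: 2, p_cost: 1 };
pub const RECOMMENDED: Argon2Cost = Argon2Cost { m_cost_kib: 65_536, t_cost: 3, p_cost: 4 };
pub const WASM_DEFAULT: Argon2Cost = Argon2Cost { m_cost_kib: 16_384, t_cost: 3, p_cost: 1 };

/// Checks the Appendix B ranges; returns a short reason on failure.
pub fn validate(cost: &Argon2Cost, salt_len: usize, output_len: usize) -> Result<(), &'static str> {
    if !(1..=256).contains(&cost.t_cost) {
        return Err("t_cost must be 1..=256");
    }
    if !(1..=256).contains(&cost.p_cost) {
        return Err("p_cost must be 1..=256");
    }
    if !(8..=4_194_304).contains(&cost.m_cost_kib) {
        return Err("m_cost must be 8..=4_194_304 KiB");
    }
    // RFC 9106 §3.1: at least 8 KiB per lane. p_cost <= 256 at this point,
    // so 8 * p_cost cannot overflow u32.
    if cost.m_cost_kib < 8 * cost.p_cost {
        return Err("m_cost must be >= 8 * p_cost");
    }
    if salt_len < 8 {
        return Err("salt must be at least 8 bytes");
    }
    if !(1..=4096).contains(&output_len) {
        return Err("output length must be 1..=4096 bytes");
    }
    Ok(())
}
```

Whatever Argon2id implementation is called after this check would also be given an empty `secret` and an empty `ad`, per the table.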

### HKDF Usage Summary

The three HKDF invocations do not share a single salt convention: KDF_KEX uses a 32-byte zero salt, while KDF_Root and KDF_Call both use `root_key`. Implementers must use the exact salt specified for each KDF; do not assume uniformity.

**KDF_Root and KDF_Call share `root_key` as the HKDF salt, and this is safe**: an auditor might flag the shared salt as salt reuse, but domain separation is maintained by distinct IKM values (`kem_shared_secret`, 32 B, for KDF_Root vs `kem_ss ‖ call_id`, 48 B, for KDF_Call) and distinct info strings (`"lo-ratchet-v1"` vs `"lo-call-v1" ‖ fp_lo ‖ fp_hi`). HKDF-Extract computes PRK = HMAC(salt, IKM), so under a shared salt, different IKM inputs yield different PRKs except with negligible probability (an HMAC-SHA3-256 collision). The info strings then add domain separation in the Expand step. Sharing the salt introduces no cross-context weakness here.

| KDF | Salt | IKM | Info | Output |
|-----|------|-----|------|--------|
| KDF_KEX (§5.4) | `0x00 × 32` (zero salt) | Combined pre-key shared secrets: **64 B without OPK** (`ss_ik ‖ ss_spk`); **96 B with OPK** (`ss_ik ‖ ss_spk ‖ ss_opk`). The two IKM variants are not interchangeable — a 64 B IKM and a zero-padded 96 B IKM produce different HKDF outputs. | Length-prefixed composite (§5.4) — **exception: the `"lo-kex-v1"` (9 B) domain prefix is raw, no length prefix** (see §5.4 and Appendix A); only the per-field entries that follow it use `len(x)‖x` encoding | 64 B → (rk, ek) |
| KDF_Root (§6.4) | `root_key` | X-Wing shared secret | `"lo-ratchet-v1"` (raw, 13 B) | 64 B → (rk′, ek′) |
| KDF_Call (§6.12) | `root_key` | `kem_ss ‖ call_id` (raw, 48 B) | `"lo-call-v1" ‖ fp_lo ‖ fp_hi` (raw, 74 B) | 96 B → (key_a, key_b, ck) |
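
The KDF_Call row fixes the exact bytes that go into HKDF. A sketch of assembling them, with one labeled assumption: that `fp_lo`/`fp_hi` means the two identity fingerprints ordered lexicographically, smaller first (§6.12 is authoritative on the ordering). `kdf_call_inputs` is a hypothetical helper, and the HKDF call itself is left out.

```rust
/// Builds the (ikm, info) byte strings for KDF_Call per the table above:
///   ikm  = kem_ss (32) || call_id (16)                   -> 48 bytes
///   info = "lo-call-v1" (10) || fp_lo (32) || fp_hi (32) -> 74 bytes
/// ASSUMPTION: fp_lo is the lexicographically smaller fingerprint.
pub fn kdf_call_inputs(
    kem_ss: &[u8; 32],
    call_id: &[u8; 16],
    fp_a: &[u8; 32],
    fp_b: &[u8; 32],
) -> (Vec<u8>, Vec<u8>) {
    let mut ikm = Vec::with_capacity(48);
    ikm.extend_from_slice(kem_ss);
    ikm.extend_from_slice(call_id);

    // Order the fingerprints so both parties build identical info bytes.
    let (fp_lo, fp_hi) = if fp_a <= fp_b { (fp_a, fp_b) } else { (fp_b, fp_a) };
    let mut info = Vec::with_capacity(74);
    info.extend_from_slice(b"lo-call-v1");
    info.extend_from_slice(fp_lo);
    info.extend_from_slice(fp_hi);

    (ikm, info)
}
```

The two byte strings would then go to HKDF-SHA3-256 with `root_key` as the salt and a 96-byte output, split in table order into three 32-byte values `key_a`, `key_b`, and `ck`. Sorting the fingerprints makes the info string independent of which party runs the derivation.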

## Appendix C: Sizes

| Component | Bytes |
|-----------|-------|
| LO composite public key | 3200 — field layout: X-Wing pk (bytes 0-1215) ‖ Ed25519 pk (bytes 1216-1247) ‖ ML-DSA-65 pk (bytes 1248-3199) |
| LO composite secret key | 2496 — field layout: X-Wing sk (bytes 0-2431) ‖ Ed25519 seed (bytes 2432-2463) ‖ ML-DSA-65 seed `ξ` (bytes 2464-2495) |
| X-Wing public key | 1216 |
| X-Wing secret key | 2432 |
| X-Wing ciphertext | 1120 |
| X-Wing shared secret | 32 |
| X25519 scalar (sk) | 32 |
| X25519 public key | 32 |
| ML-KEM-768 public key (`ek_PKE`) | 1184 |
| ML-KEM-768 secret key (expanded, `dk_M`) | 2400 (see §8.5 for field layout) |
| ML-KEM-768 ciphertext | 1088 |
| ML-KEM-768 shared secret | 32 |
| ML-DSA-65 public key | 1952 |
| ML-DSA-65 secret key seed (`ξ`, stored form) | 32 |
| ML-DSA-65 expanded signing key (`sk_expanded`, not stored — re-derived from seed at signing time per §8.5) | 4032 (FIPS 204 §7.2, ML-DSA-65 sigKeySize) |
| Auth proof / token (HMAC-SHA3-256 output of LO-Auth) | 32 |
| Ed25519 public key | 32 |
| Ed25519 secret key seed (stored form) | 32 |
| Fingerprint (raw) | 32 |
| Fingerprint (hex) | 64 chars |
| Verification phrase | 7 words (~90 bits entropy) |
| Ed25519 signature | 64 |
| ML-DSA-65 signature | 3309 |
| Hybrid signature | 3373 |
| AEAD tag | 16 |
| AEAD nonce | 24 |
| Storage blob header | 26 (version + flags + nonce) |
| Storage blob minimum | 42 (header + Poly1305 tag) |
| Ratchet blob minimum | 195 bytes (§6.8) — any blob shorter MUST be rejected with `InvalidData` without parsing; see §6.8 for field breakdown |
| Passphrase blob minimum (basic, no prefix) | 56 bytes: salt(16) + nonce(24) + tag(16) for empty plaintext (§10.6) |
| Passphrase blob minimum (basic, with magic prefix) | 57 bytes: 0x00 magic(1) + salt(16) + nonce(24) + tag(16) (§10.6) |
| Passphrase blob minimum (extended, no prefix) | 62 bytes: m_cost(4) + t_cost(1) + p_cost(1) + salt(16) + nonce(24) + tag(16) (§10.6) |
| Passphrase blob minimum (extended, with magic prefix) | 63 bytes: 0x01 magic(1) + m_cost(4) + t_cost(1) + p_cost(1) + salt(16) + nonce(24) + tag(16) (§10.6) |
| Ratchet serialization version | 1 byte (0x01 current) |
| Call ID | 16 |
| Call HKDF output | 96 (send_key + recv_key + chain_key) |
| Call encryption key | 32 |
| Stream header | 26 (version + flags + nonce) |
| Stream chunk overhead | 17 (tag_byte + Poly1305 tag) |
| Stream min valid stream | 43 (header + empty final chunk) |
| Stream max encrypted chunk | 1,048,849 (with zstd overhead) — applies to any chunk, including the final |
| Stream max final chunk plaintext (decrypt output) | 1,048,576 (`CHUNK_SIZE`) — a final chunk's plaintext is `0..=CHUNK_SIZE` bytes; the decrypt output buffer must be at least this size regardless of expected plaintext (§15.6) |
| Stream uncompressed chunk wire stride | 1,048,593 (CHUNK_SIZE + CHUNK_OVERHEAD = 1,048,576 + 17) — the fixed byte stride between chunk boundaries in an uncompressed stream; used by §15.11 random-access offset formula: `offset = 26 + N × 1,048,593` |
| encode_session_init (no OPK) | 3,543 — field breakdown (§7.4): 14 (2 len + 12 `"lo-crypto-v1"`) + 32 (sender_ik_fp) + 32 (recipient_ik_fp) + 1216 (sender_ek / X-Wing pk) + 1122 (2 len + 1120 ct_ik) + 1122 (2 len + 1120 ct_spk) + 4 (spk_id u32 BE) + 1 (has_opk=0x00) = 3,543 |
| encode_session_init (with OPK) | 4,669 — adds 1122 (2 len + 1120 ct_opk) + 4 (opk_id u32 BE) = 3,543 + 1,126 = 4,669 (§7.4) |
| encode_ratchet_header (no KEM ct) | 1,225 |
| encode_ratchet_header (with KEM ct) | 2,347 |
| encode_prekey_bundle (no OPK) | 7,808 — 14 (2 len + 12 `"lo-crypto-v1"`) + 3200 (IK_pub) + 1216 (SPK_pub) + 4 (spk_id u32 BE) + 3373 (SPK_sig) + 1 (has_opk=0x00) = 7,808 (§5.3) |
| encode_prekey_bundle (with OPK) | 9,028: has_opk becomes 0x01 (still 1 byte) and OPK_pub (1216) + opk_id u32 BE (4) are added, so 7,808 + 1,220 = 9,028 (§5.3) |
| First-message AAD (no OPK) | 3,615 |
| First-message AAD (with OPK) | 4,741 |
| Ratchet message AAD (no KEM ct) | 1,297 |
| Ratchet message AAD (with KEM ct) | 2,419 |
| First-message wire prefix, no OPK (encode + sig) | 6,916 (3,543 + 3,373) |
| First-message wire prefix, with OPK (encode + sig) | 8,042 (4,669 + 3,373) |
| First-message encrypted payload minimum | 40 (nonce + tag) |
| Ratchet ciphertext minimum | 16 (Poly1305 tag only) |
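
Most composite sizes in this table are sums of the component rows, and a binding can check them at compile time. The sketch below reproduces a few of them. The constant and function names are hypothetical, and the 72-byte AAD prefix is inferred from the table (3,615 - 3,543) and assumed to be the 8-byte `lo-dm-v1` label plus two 32-byte fingerprints.

```rust
// Component sizes from the table above (bytes).
const VERSION_FIELD: usize = 2 + 12; // u16 BE length + "lo-crypto-v1"
const FINGERPRINT: usize = 32;
const XWING_PK: usize = 1216;
const XWING_CT_FIELD: usize = 2 + 1120; // u16 BE length + X-Wing ciphertext
const KEY_ID: usize = 4; // spk_id / opk_id, u32 BE
const HAS_OPK: usize = 1;
const HYBRID_SIG: usize = 64 + 3309; // Ed25519 || ML-DSA-65
const LO_COMPOSITE_PK: usize = 1216 + 32 + 1952;
// ASSUMPTION: first-message AAD = "lo-dm-v1" (8) || two fingerprints || encoding.
const DM_AAD_PREFIX: usize = 8 + 2 * FINGERPRINT;

const fn session_init_len(with_opk: bool) -> usize {
    let base = VERSION_FIELD + 2 * FINGERPRINT + XWING_PK + 2 * XWING_CT_FIELD + KEY_ID + HAS_OPK;
    if with_opk { base + XWING_CT_FIELD + KEY_ID } else { base }
}

const fn prekey_bundle_len(with_opk: bool) -> usize {
    let base = VERSION_FIELD + LO_COMPOSITE_PK + XWING_PK + KEY_ID + HYBRID_SIG + HAS_OPK;
    if with_opk { base + XWING_PK + KEY_ID } else { base }
}

const fn first_message_aad_len(with_opk: bool) -> usize {
    DM_AAD_PREFIX + session_init_len(with_opk)
}

const fn first_message_wire_prefix_len(with_opk: bool) -> usize {
    session_init_len(with_opk) + HYBRID_SIG
}

// Compile-time checks against the table.
const _: () = assert!(session_init_len(false) == 3_543);
const _: () = assert!(session_init_len(true) == 4_669);
const _: () = assert!(prekey_bundle_len(false) == 7_808);
const _: () = assert!(prekey_bundle_len(true) == 9_028);
```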

**ML-KEM-768 expanded secret key sub-field layout**: The 2400-byte `dk_M` field has four sub-fields whose offsets matter for cross-library interoperability (the `dk_PKE` sub-field uses NTT-domain encoding, diverging from FIPS 203 coefficient-domain). See §8.5 for the full offset table (`dk_PKE` at 0, `ek_PKE` at 1152, `H(ek_PKE)` at 2336, `z` at 2368) and the byte-for-byte comparison procedure for detecting encoding incompatibilities.
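
When comparing two libraries' expanded keys byte for byte, splitting `dk_M` at these offsets shows which sub-field diverges. A sketch; the `DkParts` struct and both functions are hypothetical names.

```rust
/// Borrowed views of the four dk_M sub-fields at the §8.5 offsets.
pub struct DkParts<'a> {
    pub dk_pke: &'a [u8], // bytes 0..1152
    pub ek_pke: &'a [u8], // bytes 1152..2336 (ML-KEM-768 public key)
    pub h_ek: &'a [u8],   // bytes 2336..2368, H(ek_PKE)
    pub z: &'a [u8],      // bytes 2368..2400, implicit-rejection value
}

/// Splits a 2400-byte ML-KEM-768 expanded secret key; None on wrong length.
pub fn split_dk_m(dk_m: &[u8]) -> Option<DkParts<'_>> {
    if dk_m.len() != 2400 {
        return None;
    }
    Some(DkParts {
        dk_pke: &dk_m[..1152],
        ek_pke: &dk_m[1152..2336],
        h_ek: &dk_m[2336..2368],
        z: &dk_m[2368..],
    })
}

/// Names of the sub-fields whose bytes differ between two split keys.
pub fn differing_fields(a: &DkParts<'_>, b: &DkParts<'_>) -> Vec<&'static str> {
    let pairs = [
        ("dk_PKE", a.dk_pke, b.dk_pke),
        ("ek_PKE", a.ek_pke, b.ek_pke),
        ("H(ek_PKE)", a.h_ek, b.h_ek),
        ("z", a.z, b.z),
    ];
    pairs
        .iter()
        .filter(|(_, x, y)| x != y)
        .map(|(name, _, _)| *name)
        .collect()
}
```

For two keys generated from the same seed, a difference confined to `dk_PKE`, with `ek_PKE`, `H(ek_PKE)`, and `z` identical, most likely points at an encoding difference in that sub-field rather than at different key material.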

## Appendix D: References

### Key Agreement and KEM Protocols

- **X3DH**: Marlinspike, M. and Perrin, T. "The X3DH Key Agreement Protocol." Signal, 2016. https://signal.org/docs/specifications/x3dh/ — Basis for LO-KEX's asynchronous key agreement design.
- **PQXDH**: Ehren, S., Gershuni, S., and Perrin, T. "The PQXDH Key Agreement Protocol." Signal, 2023. https://signal.org/docs/specifications/pqxdh/ — Signal's PQ extension of X3DH. LO-KEX uses X-Wing as the sole KEM rather than adding PQ KEM alongside DH.
- **Formal Analysis of Signal**: Cohn-Gordon, K., Cremers, C., Dowling, B., Garratt, L., and Stebila, D. "A Formal Security Analysis of the Signal Messaging Protocol." Journal of Cryptology, 2020. https://eprint.iacr.org/2016/1013 — The formal analysis LO-KEX should aspire to.
- **Modular Double Ratchet**: Alwen, J., Coretti, S., and Dodis, Y. "The Double Ratchet: Security Notions, Proofs, and Modularization for the Signal Protocol." EUROCRYPT 2019. https://eprint.iacr.org/2018/1037 — Formal treatment of the Double Ratchet as a composition of CKA and symmetric ratchet. Relevant to LO-Ratchet's KEM-based CKA adaptation.
- **CKA Extension**: Alwen, J., Coretti, S., Dodis, Y., and Tselekounis, Y. "Security Analysis and Improvements for the IETF MLS Standard for Group Messaging." CRYPTO 2020. https://eprint.iacr.org/2019/1189 (extends the CKA framework; relevant to understanding what security properties a KEM-based CKA, like LO-Ratchet's, must satisfy).
- **KEM-based X3DH**: Brendel, J., Fischlin, M., Günther, F., Janson, C., and Stebila, D. "Towards Post-Quantum Security for Signal's X3DH Handshake." SAC 2020. https://eprint.iacr.org/2020/1353 — Analyzes replacing DH with KEM in X3DH, including authentication asymmetry and IK encapsulation trade-offs.
- **Formal Verification of KEM-based AKE**: Cremers, C., Jacomme, C., and Lukert, P. "Subgroup-Based Key Agreement Protocols and the Security of KEM-based AKE." CRYPTO 2024. https://eprint.iacr.org/2024/1186 — Recent formal verification methodology for KEM-based key agreement. Relevant approach for future Tamarin/ProVerif analysis of LO-KEX.
- **PQ Asynchronous Key Exchange**: Hashimoto, K. "Post-Quantum Asynchronous Deniable Key Exchange and the Signal Handshake." PKC 2024. https://eprint.iacr.org/2023/1720 — Deniability and authentication in PQ adaptations of Signal's handshake.

### Hybrid Constructions

- **Hybrid AKE**: Bindel, N., Brendel, J., Fischlin, M., Goncalves, B., and Stebila, D. "Hybrid Key Encapsulation Mechanisms and Authenticated Key Exchange." PQCrypto 2019. https://eprint.iacr.org/2018/903 — Formal treatment of hybrid KEM/AKE, applicable to X-Wing and LO's hybrid signatures.
- **Hybrid Signatures**: Bindel, N., Herath, U., McKague, M., and Stebila, D. "Transitioning to a Quantum-Resistant Public Key Infrastructure." PQCrypto 2017. https://eprint.iacr.org/2017/460 — Parallel "both must verify" composition (as in LO's Ed25519 + ML-DSA-65) is EUF-CMA secure if either component is.
- **KEM Combiners**: Giacon, F., Heuer, F., and Poettering, B. "KEM Combiners." PKC 2018. https://eprint.iacr.org/2018/024 — Formal analysis of concatenate-then-KDF for multiple KEMs (relevant to ss_ik || ss_spk || ss_opk derivation).

### Component Algorithms

- **X-Wing KEM**: Connolly, D. et al. draft-connolly-cfrg-xwing-kem-09. https://eprint.iacr.org/2024/039
- **X25519 (Diffie-Hellman on Curve25519)**: Langley, A., Hamburg, M., and Turner, S. "Elliptic Curves for Security." RFC 7748, 2016. https://doi.org/10.17487/RFC7748 — Defines Curve25519 Diffie-Hellman (X25519) as used in the X-Wing classical sub-component (§8). Note §5 of RFC 7748: X25519 implicitly clamps the scalar (bits 0-2 of byte 0 cleared, bit 7 of byte 31 cleared, bit 6 of byte 31 set); the reference implementation relies on this clamping behavior and does not apply it separately. Low-order point handling is described in §6.1 and §8.3.
- **ML-KEM**: NIST FIPS 203, 2024. https://doi.org/10.6028/NIST.FIPS.203 — The NTT-domain encoding used for the `dk_PKE` sub-field of the ML-KEM expanded secret key (§8.5) comes from FIPS 203 §4.3 (the `NTT` function) and §4.2.1 (`ByteEncode`/`ByteDecode`). Reimplementers investigating the NTT-vs-coefficient divergence should consult these subsections specifically, together with `K-PKE.KeyGen` (§5.1), which samples the secret vector in coefficient form, transforms it with `NTT`, and only then applies `ByteEncode` to produce `dk_PKE`.
- **ML-DSA**: NIST FIPS 204, 2024. https://doi.org/10.6028/NIST.FIPS.204
- **Ed25519**: Josefsson, S. and Liusvaara, I. "Edwards-Curve Digital Signature Algorithm (EdDSA)." RFC 8032, 2017. https://doi.org/10.17487/RFC8032
- **Double Ratchet**: Perrin, T. and Marlinspike, M. "The Double Ratchet Algorithm." Signal, 2016. https://signal.org/docs/specifications/doubleratchet/

### Symmetric Primitives

- **SHA3-256 and SHAKE256**: Dworkin, M. "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions." NIST FIPS 202, 2015. https://doi.org/10.6028/NIST.FIPS.202 — Defines the Keccak-based SHA3-256 hash function and the SHAKE256 extendable-output function (XOF). SHA3-256 is used for identity fingerprints, X-Wing combining (§8), HMAC, and HKDF. SHAKE256 is used in X-Wing's seed expansion step (§8.5): `SHAKE256(seed, 96)` expands the 32-byte seed to 96 bytes, `d(32) || z(32) || sk_X(32)`, i.e., the ML-KEM-768 key-generation randomness plus the X25519 secret key. A reimplementer who uses `SHAKE256(seed, 64)` derives only `d` and `z` and misses `sk_X`, the X25519 component. The correct length is given in Appendix A (`XWING_SEED_SHAKE_OUTPUT = 96`). Note that SHA3-256 has a 136-byte block size (its rate), versus SHA-256's 64-byte block size; this matters for raw HMAC implementations.
- **HMAC**: Krawczyk, H., Bellare, M., and Canetti, R. "HMAC: Keyed-Hashing for Message Authentication." RFC 2104, 1997. https://doi.org/10.17487/RFC2104 — Defines the HMAC construction. For HMAC-SHA3-256, the block size is 136 bytes (the SHA3-256 rate), not SHA-256's 64 bytes.
- **HKDF**: Krawczyk, H. and Eronen, P. RFC 5869, 2010. https://doi.org/10.17487/RFC5869
- **ChaCha20-Poly1305**: Nir, Y. and Langley, A. "ChaCha20 and Poly1305 for IETF Protocols." RFC 8439, 2018. https://doi.org/10.17487/RFC8439
- **XChaCha20-Poly1305 (HChaCha20 extension)**: Arciszewski, S. "XChaCha20-Poly1305 Construction." draft-irtf-cfrg-xchacha-03, 2020. https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-xchacha-03 — Defines HChaCha20, the PRF that derives a subkey from the key and the first 16 nonce bytes, and XChaCha20, which uses that subkey to extend the nonce to 24 bytes (RFC 8439's ChaCha20 takes a 12-byte nonce). RFC 8439 alone does not define HChaCha20 or XChaCha20; this document is the specification for the 24-byte nonce construction used throughout soliton.
- **Nonce Reuse**: Joux, A. "Authentication Failures in NIST version of GCM." 2006. — Why AEAD nonce reuse is catastrophic (applies to Poly1305 as well as GCM); motivates LO's defense-in-depth random nonce for first messages.
- **Argon2id**: Biryukov, A., Dinu, D., Khovratovich, D., and Josefsson, S. "Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work Applications." RFC 9106, 2021. https://doi.org/10.17487/RFC9106 — Password-based key derivation used in §10.6. The Argon2id variant (hybrid of Argon2i and Argon2d) is specified; do not substitute Argon2i or Argon2d.
- **Zstandard**: Collet, Y. and Kucherawy, M. "Zstandard Compression and the application/zstd Media Type." RFC 8878, 2021. https://doi.org/10.17487/RFC8878 — Compression format used for storage blobs (§11.3) and streaming chunks (§15.5). Pure Rust implementation via the `ruzstd` crate; no dependency on the reference C library.
- **STREAM**: Hoang, V.T., Reyhanitabar, R., Rogaway, P., and Vizár, D. "Online Authenticated-Encryption and its Nonce-Reuse Misuse-Resistance." CRYPTO 2015. https://eprint.iacr.org/2015/189 — Streaming AEAD constructions (STREAM and CHAIN). LO's streaming API derives each chunk's nonce from a counter, which permits random access, rather than chaining state from one chunk to the next as the paper's CHAIN construction does.
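
The fixed offsets of the 96-byte SHAKE256 expansion described in the SHA3-256/SHAKE256 entry can be sketched as follows. This is a minimal std-only illustration that assumes the caller has already computed `SHAKE256(seed, 96)` with a FIPS 202 implementation; `ExpandedSeed` and `split_expanded_seed` are hypothetical names, not the library's API.

```rust
/// Length of SHAKE256 output for a 32-byte X-Wing seed
/// (`XWING_SEED_SHAKE_OUTPUT` in Appendix A).
const XWING_SEED_SHAKE_OUTPUT: usize = 96;

/// Hypothetical container for the three values derived from the seed.
#[derive(Debug, PartialEq)]
struct ExpandedSeed {
    d: [u8; 32],    // ML-KEM-768 key-generation randomness `d`
    z: [u8; 32],    // ML-KEM-768 implicit-rejection value `z`
    sk_x: [u8; 32], // X25519 secret key
}

/// Split `SHAKE256(seed, 96)` into `d || z || sk_X`.
/// Requiring a 96-byte array turns the "only 64 bytes" mistake into a type error.
fn split_expanded_seed(expanded: &[u8; XWING_SEED_SHAKE_OUTPUT]) -> ExpandedSeed {
    let mut d = [0u8; 32];
    let mut z = [0u8; 32];
    let mut sk_x = [0u8; 32];
    d.copy_from_slice(&expanded[0..32]);
    z.copy_from_slice(&expanded[32..64]);
    sk_x.copy_from_slice(&expanded[64..96]);
    ExpandedSeed { d, z, sk_x }
}
```

A production version would keep these values in zeroizing containers, since all three are secret.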

### General

- **SoK: Secure Messaging**: Unger, N. et al. IEEE S&P 2015. https://doi.org/10.1109/SP.2015.22 — Covers TOFU, forward secrecy, deniability. Useful for positioning LO's design choices.
- **Post-Quantum Key Exchange / OQS**: Stebila, D. and Mosca, M. "Post-Quantum Key Exchange for the Internet and the Open Quantum Safe Project." SAC 2016. https://doi.org/10.1007/978-3-319-69453-5_2 — Background on post-quantum key exchange design; liboqs originates from this project.
- **NIST PQC Standardization**: https://csrc.nist.gov/projects/post-quantum-cryptography
- **EFF Wordlist**: Electronic Frontier Foundation large wordlist for passphrase generation (7,776 words). https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases — The embedded copy is the July 2016 version (108,800 bytes, 7776 lines, LF line endings, dice-number prefix stripped). The hash is computed over the file's raw bytes with LF (`\n`) line endings; CRLF-normalized copies have different byte lengths and a different hash. SHA3-256 of the raw file: `a1e90a00ec269fc42a5f335b244cf6badcf94b62e331fa1639b49cce488c95c5`. Reimplementers MUST verify their wordlist matches this hash: different versions or copies of the "EFF large wordlist" produce different phrases for the same indices.

  **Windows CRLF trap**: On Windows with `core.autocrlf=true`, Git converts LF to CRLF on checkout. After dice-prefix stripping, each line becomes `WORD\r`, so every word gains a trailing carriage return (0x0D). The wordlist hash detects this if it is verified on the embedded bytes (the CRLF copy produces a different hash). But if stripping and embedding happen at runtime from a file (rather than at build time with a compile-time assertion), the `\r` appears silently in every word: phrases differ from conforming implementations with no error indicator. Implementations that load the wordlist from a file MUST strip any trailing `\r` (0x0D) from each line before use, in addition to stripping the dice prefix.

  **Word lookup is case-insensitive; canonical form is lowercase**: All words in the embedded wordlist are lowercase ASCII. When looking up a user-entered word (e.g., during phrase verification), comparisons MUST be case-insensitive: `"Abacus"`, `"ABACUS"`, and `"abacus"` all resolve to the same word. The canonical stored form, and the form used for index derivation, is lowercase. Implementations MUST normalize user input to lowercase before lookup rather than expect the user to type in exact case. A case-sensitive comparison would reject correctly entered phrases from users who capitalize the first word or type in all caps.

**Raw file format and stripping step**: Each line in the original EFF file has the format `DDDDD\tWORD\n`: a 5-digit decimal dice number (e.g., `"11111"`), a literal tab character (`\t`), the word (e.g., `"abacus"`), and an LF newline. Soliton strips the prefix by discarding every character up to and including the first tab on each line, retaining only the word. The resulting embedded wordlist is one word per line with no dice prefix, no tab, and no trailing whitespace. A reimplementer who strips only the digits (leaving the tab) produces words with a leading tab character, not the same words. One who splits on whitespace and takes the last token produces the same words but must still verify against the hash. A reimplementer who takes the first token instead of the last gets the dice number, not the word: a silent interop failure.
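
The stripping and lookup rules above can be sketched in std-only Rust. The function names are hypothetical, the sample lines in the usage check are illustrative only, and a conforming implementation must still verify the wordlist hash, which this sketch does not compute.

```rust
/// Strip one raw EFF line (`DDDDD\tWORD`, optionally ending in `\r`) down to
/// the bare word. Returns `None` for a line without a tab.
fn strip_line(line: &str) -> Option<&str> {
    // Discard everything up to and including the first tab (the dice number).
    let (_dice, word) = line.split_once('\t')?;
    // Drop a trailing carriage return left behind by a CRLF checkout.
    Some(word.strip_suffix('\r').unwrap_or(word))
}

/// Parse raw file contents (LF or CRLF line endings) into the word list.
fn parse_wordlist(raw: &str) -> Vec<String> {
    raw.split('\n')
        .filter(|line| !line.is_empty())
        .filter_map(|line| strip_line(line))
        .map(|word| word.to_owned())
        .collect()
}

/// Case-insensitive lookup: normalize the user's input to lowercase, then
/// compare against the canonical lowercase words.
fn word_index(words: &[String], input: &str) -> Option<usize> {
    let needle = input.to_ascii_lowercase();
    words.iter().position(|w| *w == needle)
}
```

The linear `position` search keeps the sketch short; an implementation with many lookups might build a map from word to index instead.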

## Appendix E: Implementor's Guide

This appendix consolidates security-critical requirements scattered throughout the specification into a single reference for binding authors and application developers.

### RNG Requirements

All randomness must come from the OS CSPRNG (`getrandom` on Linux, `CryptGenRandom` on Windows, `SecRandomCopyBytes` on macOS/iOS). There is no fallback mechanism — RNG failure is fatal.

The following operations consume randomness:

| Operation | Randomness consumed | Section |
|---|---|---|
| `generate_identity` | Ed25519 keygen, X25519 keygen, ML-KEM-768 keygen, ML-DSA-65 seed | §2.1, §3.1 |
| `xwing::keygen` | X25519 keygen, ML-KEM-768 keygen | §2.3 |
| `xwing::encapsulate` | X25519 ephemeral scalar, ML-KEM-768 encap coins | §2.3 |
| `HybridSign` | ML-DSA-65 hedged `rnd` (32 bytes, ephemeral, zeroized after `Sign_internal` returns — §3.1) | §3.1 |
| `encrypt_first_message` | 192-bit random nonce | §5.4 |
| KEM ratchet step (send) | `xwing::keygen` + `xwing::encapsulate` | §6.4 |
| `auth_challenge` | `xwing::encapsulate` | §4.2 |
| Call ephemeral KEM | `xwing::keygen` + `xwing::encapsulate` | §6.12 |
| `stream_encrypt_init` | 192-bit random base nonce | §15.1 |
| Stream key (caller) | 256-bit random key (one per stream, MUST NOT be derived from ratchet material) | §15.1 |
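
The no-fallback rule and the stream-key requirement can be sketched together in std-only Rust. This is a stand-in under stated assumptions: it reads `/dev/urandom` on a Unix-like system instead of calling the platform CSPRNG as the reference does, and `random_bytes` and `fresh_stream_key` are hypothetical names.

```rust
use std::fs::File;
use std::io::Read;

/// Fill `buf` from the OS CSPRNG. Stand-in: reads /dev/urandom (Unix only).
/// Errors are propagated; there is deliberately no fallback source.
fn random_bytes(buf: &mut [u8]) -> std::io::Result<()> {
    File::open("/dev/urandom")?.read_exact(buf)
}

/// Generate a fresh 256-bit stream key, never derived from ratchet material.
/// RNG failure is fatal: abort instead of continuing with weak key material.
fn fresh_stream_key() -> [u8; 32] {
    let mut key = [0u8; 32];
    if random_bytes(&mut key).is_err() {
        // No fallback: never substitute a time-based or counter-based key.
        std::process::abort();
    }
    key
}
```

Aborting is deliberately blunt: any fallback path (a timestamp, a counter, or a user-space PRNG seeded from either) would produce keys that look valid but are predictable.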

### Failure Semantics

| Operation | Error | Rollback | State after | Retryable? |
|---|---|---|---|---|
| `encrypt()` | AEAD failure | **Session-fatal.** All session keys zeroized as defense-in-depth — a transient AEAD failure followed by retry could produce valid encryption with compromised internal state. Send counter is *not* incremented (§6.5), but the session is irrecoverable after key zeroization. | Permanently unusable — discard session. | No — new session required. |
| `decrypt()` | `InvalidData` (dead session: zeroed `root_key`) | Returns before snapshot — no state mutation occurs. | Unchanged. | No — session is permanently dead (post-reset or deserialized from zeroed state). New session required. |
| `decrypt()` | `InvalidData` (missing `prev_recv_epoch_key`) | Returns before snapshot — no state mutation occurs. | Unchanged. | No — structural error in the message/state combination. |
| `decrypt()` | `ChainExhausted` (header `n == u32::MAX`) | Returns after snapshot but before any state mutation — rollback is a no-op. The guard is the first operation inside the inner decryption function, before epoch routing, KEM ratchet, or key derivation. The §6.6 pseudocode shows it before the snapshot for presentational clarity; both orderings are correct since no mutations precede the guard. | Unchanged. | No — counter value is inherent to the message. |
| `decrypt()` | `AeadFailed` | Full snapshot/rollback (§6.6). State is restored to pre-decrypt values. | Unchanged. | Yes — caller may retry with different messages. |
| `decrypt()` | `DuplicateMessage` | Full snapshot/rollback (§6.6). Rollback is a no-op: `DuplicateMessage` can only occur in previous-epoch or current-epoch paths where no state fields are modified before the duplicate check. | Unchanged. | No — message was already processed. |
| `decrypt()` | `ChainExhausted` (`recv_seen` cap) | Full snapshot/rollback (§6.6). | Unchanged. | No — epoch's `recv_seen` set is full (65536 entries). For current-epoch saturation: requires peer to send from a new epoch (one KEM ratchet step = one direction change). For `prev_recv_seen` saturation: the next KEM ratchet step copies the current `recv_seen` into `prev_recv_seen` — if the current set is also full, `prev_recv_seen` remains saturated after the step. Full recovery from `prev_recv_seen` saturation may require two direction changes (the first rotates current into previous; the second discards the saturated previous). |
| `decrypt()` | `InvalidData` (`send_ratchet_sk` is None in new-epoch path) | Returns after snapshot but before any state mutation — rollback is a no-op. The new-epoch path checks `send_ratchet_sk` presence before performing the KEM ratchet receive step (the `else if NOT current_epoch:` block in §6.6). | Unchanged. | No — same message will fail again on any ratchet state (structurally malformed: new-epoch message requires decapsulation with local secret key). |
| `decrypt()` | `InvalidData` (`kem_ct` absent in new-epoch message) | Returns after snapshot but before any state mutation — rollback is a no-op. | Unchanged. | No — same message will fail again (structurally malformed: new-epoch header lacks KEM ciphertext). |
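
The snapshot/rollback discipline behind the "Unchanged" entries can be sketched as a generic helper. This is an illustration under assumptions: `with_rollback`, `ToyRatchet`, and `toy_decrypt` are hypothetical, the state is a toy subset of the real ratchet fields, and a production version holding key material must also zeroize the snapshot when it is dropped.

```rust
/// Run `op` against `state`; if it fails, restore the pre-call snapshot so
/// every error path leaves the caller's state exactly as it was.
fn with_rollback<S: Clone, T, E>(
    state: &mut S,
    op: impl FnOnce(&mut S) -> Result<T, E>,
) -> Result<T, E> {
    let snapshot = state.clone();
    let result = op(state);
    if result.is_err() {
        *state = snapshot;
    }
    result
}

/// Hypothetical toy subset of ratchet state.
#[derive(Clone, Debug, PartialEq)]
struct ToyRatchet {
    recv_count: u32,
    epoch_key: [u8; 32],
}

#[derive(Debug, PartialEq)]
enum DecryptError {
    AeadFailed,
}

/// Toy decrypt: mutates state before the (simulated) tag check, relying on
/// `with_rollback` to undo those mutations if the check fails.
fn toy_decrypt(state: &mut ToyRatchet, tag_ok: bool) -> Result<u32, DecryptError> {
    with_rollback(state, |s| {
        s.recv_count += 1;
        s.epoch_key[0] ^= 0xff;
        if tag_ok {
            Ok(s.recv_count)
        } else {
            Err(DecryptError::AeadFailed)
        }
    })
}
```

With this shape, a failed AEAD check after partial state updates still leaves the session retryable, which is what makes `AeadFailed` safe to retry with a different message.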

### Streaming AEAD Failure Semantics

Key differences from ratchet encrypt/decrypt:

| Operation | Error | State after | Retryable? |
|---|---|---|---|
| `encrypt_chunk` | `AeadFailed` | Unchanged — `next_index` not advanced, `finalized` not set. | Yes — retry with the same plaintext. Note: in practice, `AeadFailed` from `encrypt_chunk` is structurally unreachable — XChaCha20-Poly1305 encrypt can only fail on usize overflow (§7.1), which cannot occur with chunk sizes bounded by `CHUNK_SIZE` (1,048,576 bytes). If encountered, it indicates an unexpected integer overflow in the AEAD layer, not a transient condition. |
| `encrypt_chunk` | `ChainExhausted` (`next_index == u64::MAX`) | Unchanged — guard fires before any state mutation. | No — `next_index` cannot advance further. The handle is not freed; call `soliton_stream_encrypt_free`. |
| `encrypt_chunk` | `Internal` (zstd expansion > `STREAM_ZSTD_OVERHEAD`) | Unchanged — guard fires before AEAD. | Yes — retry the same chunk with `compress = false`. This is not session-fatal; the streaming key is not zeroized. |
| `encrypt_chunk` | `InvalidData` (post-finalization or bad chunk size) | Unchanged. | No — structural caller error. |
| `decrypt_chunk` | `AeadFailed` | Unchanged — `next_index` not advanced, `finalized` not set. | Yes — retry or skip; the decryptor survives. |
| `decrypt_chunk` | `ChainExhausted` | Unchanged. | No — same semantics as encrypt-side. |
| `decrypt_chunk` | `InvalidData` (post-finalization, framing, or version mismatch) | Unchanged. | No. |

**Critical differences from ratchet:**
- All streaming failures are retry-safe — the streaming key is NEVER zeroized on per-chunk error (unlike ratchet `encrypt()`, where `AeadFailed` zeroizes all keys and makes the session permanently unusable). Retrying a failed chunk with the correct input will succeed.
- `Internal` from compression expansion is retryable — pass `compress = false` for the affected chunk. This is an encode-side-only path (no oracle concern); no session state is affected.
- `ChainExhausted` from a streaming handle does not affect the ratchet state in any way — the two are independent.
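
The compression fallback can be wrapped as a retry helper. This is a sketch under assumptions: `StreamError` and `encrypt_with_fallback` are hypothetical, the real `encrypt_chunk` call is abstracted as a closure taking the `compress` flag, and every `Internal` from a compressed attempt is treated as the zstd-expansion case.

```rust
/// Hypothetical error codes mirroring the streaming rows of the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StreamError {
    AeadFailed,
    ChainExhausted,
    Internal,
    InvalidData,
}

/// Encrypt one chunk, retrying once without compression if zstd expanded it
/// beyond `STREAM_ZSTD_OVERHEAD` (reported as `Internal`).
fn encrypt_with_fallback<F>(mut encrypt_chunk: F) -> Result<Vec<u8>, StreamError>
where
    F: FnMut(bool) -> Result<Vec<u8>, StreamError>,
{
    match encrypt_chunk(true) {
        // Retry-safe: the key is intact and `next_index` did not advance,
        // so the retry encrypts the same chunk at the same index.
        Err(StreamError::Internal) => encrypt_chunk(false),
        // Success and non-retryable errors pass through unchanged.
        other => other,
    }
}
```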

**`stream_encrypt_free` / `stream_decrypt_free` outer-null behavior differs from ratchet/keyring/callkeys free**: `soliton_ratchet_free`, `soliton_keyring_free`, and `soliton_call_keys_free` treat outer-null as a safe no-op (return 0). `soliton_stream_encrypt_free` and `soliton_stream_decrypt_free` return `NullPointer` (-13) for outer-null. The rationale: `soliton_stream_encrypt_init` and `soliton_stream_decrypt_init` always write a non-null handle on success, so a null outer pointer cannot arise from normal use (if init succeeded, it produced a valid handle; if init failed, it left the output unchanged). An outer-null pointer passed to a stream free function therefore signals a caller bug (an uninitialized pointer or the wrong variable), whereas a null outer pointer passed to ratchet/keyring/callkeys free more plausibly arises from defensive cleanup patterns. A reimplementer who makes every free function return 0 for outer-null diverges silently on the stream functions. Conversely, a binding test that checks for `NullPointer` on the stream free functions says nothing about the non-streaming free functions, which must return 0 for the same input.

### Caller Obligations

1. **Fingerprint → key resolution**: The caller is responsible for mapping identity fingerprints to authentic public keys. Incorrect resolution causes §5.5 Step 1 (fingerprint mismatch) or Step 3 (signature verification failure) to fail; the session does not establish silently with a wrong key. The library provides `fingerprint_hex()` and verification phrases (§9) but does not manage identity stores, TOFU, or key pinning.

   **`receive_session` fingerprint mismatch returns `InvalidData`, not `BundleVerificationFailed`**: `receive_session` is called with a parsed `SessionInit` (not a bundle), so `BundleVerificationFailed` is not applicable. A fingerprint mismatch (sender or recipient fingerprint does not match expected values) in `receive_session` returns `InvalidData`. `BundleVerificationFailed` applies only to `verify_bundle` (§5.3), which also collapses crypto-version mismatches and signature failures to `BundleVerificationFailed` to prevent an enumeration oracle. Callers who pattern-match on the error from `receive_session` expecting `BundleVerificationFailed` will never match it — all fingerprint validation failures from `receive_session` arrive as `InvalidData`.

2. **OPK deletion**: Must happen before the ratchet state is used for messaging (§5.5 Step 4, §10.3). Failure to delete allows an attacker who later compromises the OPK to recover the session key.

3. **SPK rotation**: 7-day cycle with 30-day grace period for old secret keys (§10.2). Stale SPKs reduce forward secrecy.

4. **Secret key zeroization**: `IdentitySecretKey`, `xwing::SecretKey`, and shared secrets implement `ZeroizeOnDrop` in Rust. CAPI callers must free library-allocated buffers via `soliton_buf_free` and zeroize caller-owned copies of secret material via `soliton_zeroize` — standard C `memset` may be optimized out. Failing to do either leaks key material.

5. **Concurrency**: All opaque CAPI handles (`SolitonRatchet`, `SolitonKeyRing`, `SolitonCallKeys`, `SolitonStreamEncryptor`, `SolitonStreamDecryptor`) are **not** thread-safe. Each handle embeds an `AtomicBool` reentrancy guard — concurrent calls on the same handle return `ConcurrentAccess` (-18) rather than corrupting state. This is a diagnostic for caller threading bugs, not a retriable error. Callers must serialize access per handle (e.g., mutex). Concurrent use of *different* handles is safe. `SolitonKeyRing` is particularly deceptive: a server encrypting storage blobs for multiple users might naturally share a single keyring across threads, but this will trigger `ConcurrentAccess`. Create one keyring per thread instead.

6. **Storage decompression**: The 256 MiB decompression limit (§11.3) is enforced internally. Callers need not guard against zip bombs.

7. **Stream key freshness**: Each stream key MUST be freshly generated from the OS CSPRNG (`random_bytes(32)`). Do not derive stream keys from ratchet material (epoch key, root key, call key) — a ratchet epoch compromise would propagate to all streams whose keys were derived from the compromised epoch, defeating per-stream isolation. The standard composition: generate a random key, encrypt the stream, then transmit the key inside a ratchet-encrypted message alongside stream metadata (§15.1).

8. **Auth shared-secret zeroization**: The shared secret returned by `auth_respond` (§4.2) must be zeroized immediately after use. In Rust, `auth_respond` returns `Zeroizing<[u8; 32]>` (automatic). CAPI callers receive the shared secret in a caller-provided buffer and must call `soliton_zeroize` on it after consuming the value. Failure to zeroize leaves the authentication shared secret on the heap, recoverable via memory scanning.

9. **Argon2id parameter coupling**: `m_cost` must be ≥ 8 × `p_cost` (RFC 9106 §3.1). This constraint is enforced at the library level — `soliton_argon2id` returns `InvalidData` for combinations where `m_cost < 8 × p_cost` (e.g., `m_cost=100, p_cost=100`). Per-parameter range checks (`m_cost ∈ [8, 4,194,304]`, `p_cost ∈ [1, 256]`) pass individually for such combinations, so a binding author who validates parameters against per-field bounds only will not discover the error until the library call returns `InvalidData`. The coupling check must be performed in addition to the individual range checks (§10.6 / Appendix B parameter limits).

10. **Old SPK secret key zeroization**: After the 30-day SPK retention window (§10.2), the old SPK secret key MUST be zeroized and discarded. Retaining old SPK private keys beyond the retention window allows an attacker who later compromises those keys to decrypt sessions established with the corresponding SPK, retroactively breaking forward secrecy for the retention period. The library does not manage SPK lifecycle or trigger zeroization automatically — this is the caller's responsibility. See §10.2 for the rotation schedule.

11. **Ephemeral `ek_sk` zeroization after ratchet init**: The X-Wing ephemeral secret key (`ek_sk`, 2432 bytes) in `SolitonInitiatedSession` is passed to `soliton_ratchet_init_alice` and must be zeroized and freed immediately after. `soliton_kex_initiated_session_free` performs both the zeroization and deallocation. Do not retain `ek_sk` after `init_alice` returns — the ratchet has absorbed the public key counterpart (`ek_pk`); the private key is no longer needed and its continued presence in memory extends the window for side-channel or memory-dump attacks. The same obligation applies if `ratchet_init_alice` returns an error: zeroize and free `ek_sk` before retrying or cleaning up. See §13.5 for the single-use enforcement note.

12. **Persistent session deserialization MUST use `from_bytes_with_min_epoch`, not bare `from_bytes`**: When deserializing a persisted ratchet state (§6.8), callers MUST use `soliton_ratchet_from_bytes_with_min_epoch` (passing the epoch value stored on the last successful `to_bytes` call as `min_epoch`). Using bare `from_bytes` / `soliton_ratchet_from_bytes` silently accepts a rolled-back epoch, permanently disabling anti-rollback protection — an attacker who can substitute an earlier blob snapshot will cause message-key replay. Bare `from_bytes` is provided for initial deserialization only (when no prior epoch exists — specifically, when the application has never successfully completed a `to_bytes` call on this session and therefore has no persisted `min_epoch` value to supply), not for routine restore-from-disk operations. **Binding-layer obligation**: `soliton_ratchet_from_bytes` in `soliton.h` has no deprecation marker in the C header — the deprecation exists only at the Rust API level. Binding authors MUST add a language-level deprecation annotation when wrapping this function (e.g., `@deprecated` in Java/Kotlin, `[Obsolete]` in C#, `#[deprecated]` in Rust re-exports, a deprecation warning in Python docstrings) so that callers of the binding receive the same guidance as callers of the Rust API. A session that has been serialized at least once always has a `min_epoch` to supply; that value MUST be stored persistently alongside the encrypted ratchet blob. Losing the `min_epoch` store (e.g., due to a crash or storage error) does NOT qualify as "no prior epoch" — the correct response is to treat the session as potentially compromised (reset it and re-establish via LO-KEX), not to fall back to bare `from_bytes`. Callers who always use bare `from_bytes` will never observe an error from anti-rollback rejection, even when a rollback attack is in progress.

13. **`recv_seen` saturation recovery requires a peer KEM ratchet step, not local state manipulation**: When `decrypt()` returns `ChainExhausted` due to `recv_seen` saturation (65536 entries — §6.8), the correct recovery path is to wait for the peer to send a new message from a new epoch (a KEM ratchet step, triggered by sending in a direction that requires a new ephemeral key). The `recv_seen` set resets automatically on the next KEM ratchet step — no explicit caller action is needed. Callers MUST NOT attempt to clear or manipulate the `recv_seen` set directly; the ratchet state provides no API for this, and the set's integrity is essential for replay protection. An application that interprets `ChainExhausted` from `decrypt()` as session-fatal will incorrectly terminate a recoverable session — see §12 for the full per-mode `ChainExhausted` breakdown.
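
The combined Argon2id check from obligation 9 can be sketched as follows. `validate_argon2id` and `ParamError` are hypothetical names that mirror the library behavior described above; `m_cost` is in KiB, as in RFC 9106.

```rust
/// Hypothetical mirror of the library's `InvalidData` result.
#[derive(Debug, PartialEq)]
enum ParamError {
    InvalidData,
}

const M_COST_MIN: u32 = 8;
const M_COST_MAX: u32 = 4_194_304;
const P_COST_MIN: u32 = 1;
const P_COST_MAX: u32 = 256;

/// Per-field range checks first, then the RFC 9106 coupling m_cost >= 8 * p_cost.
fn validate_argon2id(m_cost: u32, p_cost: u32) -> Result<(), ParamError> {
    if !(M_COST_MIN..=M_COST_MAX).contains(&m_cost)
        || !(P_COST_MIN..=P_COST_MAX).contains(&p_cost)
    {
        return Err(ParamError::InvalidData);
    }
    // p_cost <= 256 here, so 8 * p_cost cannot overflow a u32.
    if m_cost < 8 * p_cost {
        return Err(ParamError::InvalidData);
    }
    Ok(())
}
```

The `m_cost=100, p_cost=100` example from obligation 9 passes both range checks and is rejected only by the coupling check.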

### Constant-Time Requirements

| Operation | Requirement | Implementation |
|---|---|---|
| `auth_verify` | Constant-time comparison | `subtle::ConstantTimeEq` |
| AEAD tag verification | Constant-time | Handled by `chacha20poly1305` crate |
| `hybrid_verify` | Constant-time AND combination of Ed25519 + ML-DSA results | `subtle::Choice` bitwise AND (§3.2) — short-circuit `&&` leaks which component failed, enabling targeted forgery. **`Err` returns must not cause early exit before both verifications complete**: libraries that return `Err` (rather than `Ok(false)`) on verification failure (e.g., for malformed-length signatures) must be wrapped so that both components are evaluated before any error is returned — an early `?` propagation on the first component's `Err` skips the second component entirely, leaking which half failed. The reference wraps both calls to produce `subtle::Choice` values before combining; a reimplementer must apply the same pattern when the underlying library uses error returns rather than boolean-valued verification results. |
| Epoch identification (§6.6) | Constant-time public key comparison | `subtle::ConstantTimeEq` on `ratchet_pk` vs `recv_ratchet_pk` / `prev_recv_ratchet_pk` — variable-time comparison would leak which epoch a message belongs to (current, previous, or new), revealing ratchet state to a timing attacker |
| Root key liveness check (§6.5, §6.6) | Constant-time all-zero comparison | `subtle::ConstantTimeEq` — defense-in-depth against partial-zero leakage |
| `derive_call_keys` secret input checks (§6.12) | Constant-time all-zero comparison for `root_key` and `kem_ss` | `subtle::ConstantTimeEq` — both are secret material; `call_id` and fingerprint equality use variable-time `==` (non-secret public values) |
| `verify_bundle` IK_pub comparison (§5.3) | Constant-time comparison | `subtle::ConstantTimeEq` on the full 3200-byte stored identity key vs. `bundle.IK_pub`. A variable-time comparison leaks the stored key byte-by-byte via response timing; each probe costs only the construction of a crafted bundle, far cheaper than paying `HybridVerify` per probe. `verify_bundle` collapses all failures to `BundleVerificationFailed`, but that does not prevent timing measurements on the common return path. |
| `receive_session` fingerprint checks (§5.5 Step 1) | Constant-time comparison | `subtle::ConstantTimeEq` on untrusted wire fingerprints before signature verification — variable-time comparison would let an attacker probe the expected fingerprint byte-by-byte via timing |
| X25519 DH all-zero output check (§8.3) | Constant-time comparison | `subtle::ConstantTimeEq` against `[0u8; 32]` — the DH output is secret material before the check fires. A variable-time comparison leaks whether the low-order-point substitution path was taken, revealing a bit of information about the relationship between the ephemeral key and the recipient's public key |
| `StorageKey::new` all-zero key rejection (§11.6) | Constant-time comparison | `subtle::ConstantTimeEq` against `[0u8; 32]` — the key is secret material; variable-time comparison leaks partial information about the key value during the liveness check |
| Streaming layer key quality (§15.1) | **No all-zero check: deliberate asymmetry with the storage layer**. `stream_encrypt_init` and `stream_decrypt_init` do NOT validate that the caller-provided key is non-zero. Storage keys are library-managed, long-lived, and validated at construction time (`StorageKey::new`); streaming keys are ephemeral, caller-provided, and used once, so their quality is the caller's responsibility rather than the library's. See §15.1 "All-zero key policy" for the full rationale. A reimplementer who adds an all-zero guard to streaming init for consistency with the storage layer diverges from the specification. | N/A — streaming init does not inspect key quality |
| Stream header version/flags byte comparisons (§15.8) | No constant-time requirement — public values | The version byte and flags byte in the stream header are cleartext, attacker-visible values. Variable-time comparison leaks no information beyond what is already observable from the header bytes themselves. Constant-time comparison (e.g., `subtle::ConstantTimeEq`) would provide no security benefit here and would add unnecessary complexity. This is in contrast to AEAD tag verification (always CT) and ratchet epoch identification (CT to prevent timing leakage of ratchet state). |
| All other operations | No constant-time requirement at the protocol level | — |
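
Two patterns from the table, accumulate-then-decide comparison and non-short-circuit hybrid verification, can be sketched with the standard library alone. This is illustrative only: the reference uses `subtle`, whose `Choice` type is designed to resist compiler optimizations, whereas `std::hint::black_box` is documented as best-effort. `ct_eq`, `ct_is_zero`, and `hybrid_verify` are hypothetical names, and the verifier closures stand in for real Ed25519 and ML-DSA-65 calls.

```rust
use std::hint::black_box;

/// Equality over equal-length slices: every byte is visited and differences
/// are OR-accumulated, with no early return on the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // lengths are public (fixed-size keys, fingerprints)
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    black_box(diff) == 0
}

/// All-zero check (root-key liveness, X25519 output, storage key).
fn ct_is_zero(a: &[u8]) -> bool {
    let mut acc = 0u8;
    for x in a {
        acc |= *x;
    }
    black_box(acc) == 0
}

/// Hybrid verification: both components always run, `Err` results become
/// 0/1 flags before combining, and the flags are joined with bitwise `&`
/// (not `&&` and not `?`), so a failing first half never skips the second.
fn hybrid_verify<E1, E2>(
    ed25519: impl FnOnce() -> Result<(), E1>,
    ml_dsa: impl FnOnce() -> Result<(), E2>,
) -> bool {
    let ed_ok = ed25519().is_ok() as u8;
    let ml_ok = ml_dsa().is_ok() as u8;
    black_box(ed_ok & ml_ok) == 1
}
```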

## Appendix F: Test Vectors

All byte values are hex-encoded; counters are decimal and `info` strings are shown as quoted ASCII. These vectors enable a reimplementer to verify their primitive constructions before attempting full protocol integration.

### F.1 KDF_MsgKey (§6.5)

```
epoch_key:  4242424242424242424242424242424242424242424242424242424242424242
counter:    7
HMAC input: 0100000007   (0x01 || BE32(7))
msg_key:    cac256e53d0b0abc468331210d63c50f15ec875c3badfef6bfe53e1137165610
```

Construction: `HMAC-SHA3-256(key=epoch_key, data=0x01 || counter_BE32)`

**Counter = 0** (first message in Bob's initial epoch and in every post-KEM-ratchet epoch for both parties). Alice's first ratchet send uses counter=1, not 0 — her `send_count` starts at 1 after session initiation (§6.2). See §6.7.1 for a worked example where Alice's first message has `n=1`.

```
epoch_key:  4242424242424242424242424242424242424242424242424242424242424242
counter:    0
HMAC input: 0100000000   (0x01 || BE32(0))
msg_key:    5ac7a1b8dd3103a3ef7bab0af995570a087b6a92b34d93bc8c88f3485e96054d
```
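
To check an HMAC-SHA3-256 stack against the F.1 vectors without external crates, the following self-contained sketch implements Keccak-f[1600], SHA3-256, and HMAC with the 136-byte block size. It favors clarity over speed and side-channel hardening, and it is not the reference implementation (which uses the `sha3` and `hmac` crates); `kdf_msg_key` is a hypothetical name for the F.1 construction.

```rust
/// SHA3-256 rate in bytes, which is also the HMAC-SHA3-256 block size (not 64).
const RATE: usize = 136;
/// Keccak-f[1600] round constants (iota step).
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
/// Rotation offsets and target lanes, in the order the combined rho-pi walk visits them.
const RHO: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PI: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];

/// Keccak-f[1600]; lane (x, y) is stored at a[x + 5 * y].
fn keccak_f(a: &mut [u64; 25]) {
    for &rc in RC.iter() {
        // Theta: XOR each lane with the parities of two neighbouring columns.
        let mut c = [0u64; 5];
        for x in 0..5 {
            c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
        }
        for x in 0..5 {
            let d = c[(x + 4) % 5] ^ c[(x + 1) % 5].rotate_left(1);
            for y in 0..5 {
                a[x + 5 * y] ^= d;
            }
        }
        // Rho and pi: rotate each lane and move it to its new position.
        let mut last = a[1];
        for i in 0..24 {
            let tmp = a[PI[i]];
            a[PI[i]] = last.rotate_left(RHO[i]);
            last = tmp;
        }
        // Chi: non-linear mixing within each row.
        for y in 0..5 {
            let row = [a[5 * y], a[5 * y + 1], a[5 * y + 2], a[5 * y + 3], a[5 * y + 4]];
            for x in 0..5 {
                a[5 * y + x] = row[x] ^ (!row[(x + 1) % 5] & row[(x + 2) % 5]);
            }
        }
        // Iota: XOR in the round constant.
        a[0] ^= rc;
    }
}

/// SHA3-256: pad with 0x06 ... 0x80, absorb 136-byte blocks, squeeze 32 bytes.
fn sha3_256(msg: &[u8]) -> [u8; 32] {
    let mut padded = msg.to_vec();
    padded.push(0x06);
    while padded.len() % RATE != 0 {
        padded.push(0x00);
    }
    *padded.last_mut().unwrap() |= 0x80;
    let mut a = [0u64; 25];
    for block in padded.chunks(RATE) {
        for (i, lane) in block.chunks(8).enumerate() {
            let mut b = [0u8; 8];
            b.copy_from_slice(lane);
            a[i] ^= u64::from_le_bytes(b);
        }
        keccak_f(&mut a);
    }
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        chunk.copy_from_slice(&a[i].to_le_bytes());
    }
    out
}

/// HMAC-SHA3-256 (RFC 2104) with a 136-byte block.
fn hmac_sha3_256(key: &[u8], data: &[u8]) -> [u8; 32] {
    let mut k = [0u8; RATE];
    if key.len() > RATE {
        k[..32].copy_from_slice(&sha3_256(key)); // long keys are hashed first
    } else {
        k[..key.len()].copy_from_slice(key); // short keys are zero-padded
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(data);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha3_256(&inner));
    sha3_256(&outer)
}

/// F.1: HMAC-SHA3-256(key = epoch_key, data = 0x01 || BE32(counter)).
fn kdf_msg_key(epoch_key: &[u8; 32], counter: u32) -> [u8; 32] {
    let mut input = [0u8; 5];
    input[0] = 0x01;
    input[1..].copy_from_slice(&counter.to_be_bytes());
    hmac_sha3_256(epoch_key, &input)
}
```

Calling `kdf_msg_key(&[0x42; 32], 7)` should reproduce the first F.1 `msg_key`. An HMAC built with a 64-byte block produces a different value with no error, which is why the block size is called out in the reference entries.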

### F.2 KDF_Root (§6.4)

```
root_key (salt):  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
kem_ss (ikm):     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
info:             "lo-ratchet-v1"
new_root_key:     db7be3c198f86c5e044d6f5c39d526eaf72a651a4cd6b7d32b1adb6b6754d587
new_epoch_key:    71ceff4de7d184f3c97821177dc5afcc2abc334707301c0b9267a3f4b0aa0ff9
```

Construction: `HKDF-SHA3-256(salt=root_key, ikm=kem_ss, info="lo-ratchet-v1", len=64)`. First 32 bytes = new root key, last 32 bytes = new epoch key.

### F.3 X-Wing Combiner (§8.2)

```
ss_M:     1111111111111111111111111111111111111111111111111111111111111111
ss_X:     2222222222222222222222222222222222222222222222222222222222222222
ct_X:     3333333333333333333333333333333333333333333333333333333333333333
pk_X:     4444444444444444444444444444444444444444444444444444444444444444
label:    5c2e2f2f5e5c
output:   40ad7dbc0dd87305287bd9a9104f5dc064db038a8ac3da443fe3a090a272e2d5
```

Construction: `SHA3-256(ss_M || ss_X || ct_X || pk_X || label)`. Label goes last (draft-09 §5.3).
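
The byte layout of the combiner input can be sketched without a hash library. `combiner_input` is a hypothetical helper; feeding its 134-byte output to a SHA3-256 implementation should reproduce the `output` line above. The input fits within a single 136-byte SHA3-256 block.

```rust
/// X-Wing combiner label `\.//^\` (hex 5c2e2f2f5e5c).
const XWING_LABEL: [u8; 6] = *b"\\.//^\\";

/// Assemble `ss_M || ss_X || ct_X || pk_X || label` (label last, 134 bytes).
fn combiner_input(
    ss_m: &[u8; 32],
    ss_x: &[u8; 32],
    ct_x: &[u8; 32],
    pk_x: &[u8; 32],
) -> [u8; 134] {
    let mut buf = [0u8; 134];
    buf[0..32].copy_from_slice(ss_m);
    buf[32..64].copy_from_slice(ss_x);
    buf[64..96].copy_from_slice(ct_x);
    buf[96..128].copy_from_slice(pk_x);
    buf[128..134].copy_from_slice(&XWING_LABEL);
    buf
}
```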

**`pk_X` in this vector is a fabricated test value**: The `pk_X = 0x44...44` input above is not derived from any real X25519 scalar; it is a fixed constant used to verify the SHA3-256 combiner construction independently of X25519 arithmetic. In the actual protocol, `pk_X` is re-derived during decapsulation as `X25519(sk_X, G)` (§8.2 and §8.5), where `G` is the standard X25519 base point: the 32-byte little-endian encoding of the integer 9, i.e., `09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00` (RFC 7748 §6.1). This is the one call to X25519 that is NOT a Diffie-Hellman step; it is a public-key rederivation, `X25519(scalar, basepoint)`. A reimplementer who accidentally uses `ct_X` (the 32-byte ephemeral key from the ciphertext) instead of `G` (the fixed base point) computes `X25519(sk_X, ct_X)`, which is `ss_X`, in place of `pk_X`. Nothing fails at that point: the combiner runs on the wrong input, the AEAD fails, and no diagnostic points to the wrong base point.

No fabricated X25519 key-derivation vector is provided here. Use the RFC 7748 §6.1 KAT vectors to validate `X25519(scalar, basepoint)` separately, then combine them with this combiner vector to build confidence in the full `XWing.Decaps` pipeline.

**Clamping divergence is the primary risk**: an implementation that omits RFC 7748 clamping (clear bits 0, 1, 2, and 255; set bit 254) before the scalar multiply produces a different output without any error signal, since clamped and unclamped scalars both yield valid curve points, just different ones. The RFC 7748 §6.1 Alice key pair (base point u = 9, private key `77...`) exercises the clamped path; compare your `X25519(sk_X, G)` output against that vector to confirm your library applies clamping before the multiply. §8.5 documents the per-use clamping requirement in detail.
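
The clamping rule can be written out as a small helper for inspecting scalars or for a custom scalar-multiply path. This is an illustration, not part of the reference, which relies on the X25519 library's internal clamping as noted in the X25519 reference entry; `clamp_scalar` is a hypothetical name.

```rust
/// RFC 7748 §5 scalar clamping (decodeScalar25519): clear bits 0-2 and
/// bit 255, set bit 254. Byte 0 holds bits 0-7; byte 31 holds bits 248-255.
fn clamp_scalar(mut k: [u8; 32]) -> [u8; 32] {
    k[0] &= 0b1111_1000; // clear bits 0, 1, 2
    k[31] &= 0b0111_1111; // clear bit 255
    k[31] |= 0b0100_0000; // set bit 254
    k
}
```

Because clamping is idempotent, applying it before calling a library that also clamps is harmless; omitting it in a hand-written scalar multiply is what causes the silent divergence.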

**Using draft-09 X-Wing KAT for full pipeline verification**: The IETF draft-connolly-cfrg-xwing-kem-09 provides a full X-Wing decapsulation KAT in its Appendix C (SHAKE-256 seed → key generation → encapsulation → decapsulation → shared secret). Applying it to LO requires three adaptations:
1. **Key ordering**: LO uses X25519-first storage (`sk_X (32) || dk_M (2400)` for secret key, `pk_X (32) || pk_M (1184)` for public key); draft-09 uses ML-KEM-first. Extract and reorder components before using draft-09 vectors.
2. **Seed expansion**: LO expands the X-Wing seed via `SHAKE-256(seed, 96)` → `d (32) || z (32) || sk_X (32)`. The `d` and `z` values feed ML-KEM key generation; `sk_X` is the X25519 scalar. This matches draft-09 §6.2; verify your SHAKE-256 implementation produces the same intermediate values.
3. **Ciphertext ordering**: LO's ciphertext is `ct_X (32) || ct_M (1088)` (X25519-first); draft-09's ciphertext is `ct_M (1088) || ct_X (32)`. Swap when extracting from draft-09 test vectors. The combiner inputs remain identical once extracted: `SHA3-256(ss_M || ss_X || ct_X || pk_X || label)`.

The source contains an `xwing_draft09_decap_kat` test that performs exactly this adaptation — use it as a reference for the above steps.
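Adaptations 1 and 3 amount to swapping the two components of a byte string. A minimal std-only sketch for public keys and ciphertexts follows; the function names are hypothetical, and secret-key reordering is left out because it depends on how the draft-09 vectors encode the decapsulation key.

```rust
const X25519_LEN: usize = 32;
const ML_KEM_768_PK_LEN: usize = 1184;
const ML_KEM_768_CT_LEN: usize = 1088;

/// Turn an ML-KEM-first buffer (`m || x`) into LO's X25519-first
/// layout (`x || m`). Returns None if the length is wrong.
fn ml_kem_first_to_x25519_first(buf: &[u8], ml_kem_len: usize) -> Option<Vec<u8>> {
    if buf.len() != ml_kem_len + X25519_LEN {
        return None;
    }
    let (m, x) = buf.split_at(ml_kem_len);
    Some([x, m].concat())
}

/// draft-09 ciphertext `ct_M (1088) || ct_X (32)` -> LO `ct_X || ct_M`.
fn draft09_ct_to_lo(ct: &[u8]) -> Option<Vec<u8>> {
    ml_kem_first_to_x25519_first(ct, ML_KEM_768_CT_LEN)
}

/// draft-09 public key `pk_M (1184) || pk_X (32)` -> LO `pk_X || pk_M`.
fn draft09_pk_to_lo(pk: &[u8]) -> Option<Vec<u8>> {
    ml_kem_first_to_x25519_first(pk, ML_KEM_768_PK_LEN)
}
```

Rejecting wrong lengths up front matters here: as the next paragraph explains, a mis-ordered ciphertext would otherwise flow into ML-KEM implicit rejection without any error.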

**Byte-order swap produces silent wrong output via ML-KEM implicit rejection**: If the ciphertext byte order is not adapted (i.e., a draft-09 ML-KEM-first ciphertext `ct_M(1088) || ct_X(32)` is passed to LO's X25519-first decapsulator as-is), LO extracts the first 32 bytes as `ct_X` (these are actually the first 32 bytes of `ct_M`) and the remaining 1088 bytes as `ct_M` (these are the last 1056 bytes of `ct_M` concatenated with the actual `ct_X`). ML-KEM-768 decapsulation of the malformed 1088-byte input does not fail with an error — FIPS 203 defines implicit rejection: when decapsulation fails (ciphertext does not re-encrypt to itself), the function returns a pseudorandom output derived from a pre-keyed hash rather than reporting an error. So `ss_M` becomes a pseudorandom value, `ss_X` is computed from wrong bytes, the combiner produces a wrong but plausible-looking 32-byte output, and the AEAD fails with no diagnostic pointing to the byte-order swap. A byte-order-swap bug in a reimplementation is therefore completely invisible until AEAD fails; no intermediate value signals the error. The `xwing_draft09_decap_kat` test catches this by comparing against the expected shared secret after decapsulation — any byte-order mistake causes a test-vector mismatch at that comparison point.

### F.4 Identity Fingerprint (§2.1)

```
pk:           5555...55  (3200 bytes of 0x55)
fingerprint:  6197102522f51ba35cf4e2e721ffcc5a1ae8e9dc14442b093bc0388696569a4d
```

Construction: `SHA3-256(pk)`

### F.5 Auth HMAC (§4)

```
shared_secret:  cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
label:          "lo-auth-v1"
hmac_output:    b12569ef76edbe2f1215b876d89db5f067bdbf35bd99c6d0bcd47733609f02cf
```

Construction: `HMAC-SHA3-256(key=shared_secret, data="lo-auth-v1")`. This vector covers only the HMAC step. The shared secret is the X-Wing combined output (§8.2) from the X-Wing encapsulation/decapsulation in §4.2 — see F.3 for the combiner KAT and §8.2 for the full encap/decap pseudocode.

### F.6 Verification Phrase Hash (§9.2)

```
pk_a:   0101...01  (3200 bytes of 0x01)
pk_b:   0202...02  (3200 bytes of 0x02)
sorted: pk_a first (lexicographic)
hash:   94488b955db55587ef0e0b1721a6db95b62b6f2c61ba158a557a2e007c7638b9
```

Construction: `SHA3-256("lo-verification-v1" || sorted_first || sorted_second)`

**Word output** (from rejection sampling on the hash above):

```
Samples (u16 big-endian from hash bytes):
  [0..2]  0x9448 = 37960  → accepted, 37960 % 7776 = 6856  → "triangle"
  [2..4]  0x8b95 = 35733  → accepted, 35733 % 7776 = 4629  → "phobia"
  [4..6]  0x5db5 = 23989  → accepted, 23989 % 7776 = 661   → "breeder"
  [6..8]  0x5587 = 21895  → accepted, 21895 % 7776 = 6343  → "sterile"
  [8..10] 0xef0e = 61198  → accepted, 61198 % 7776 = 6766  → "tibia"
  [10..12] 0x0b17 = 2839  → accepted, 2839 % 7776 = 2839   → "gerbil"
  [12..14] 0x21a6 = 8614  → accepted, 8614 % 7776 = 838    → "caption"

Phrase: "triangle phobia breeder sterile tibia gerbil caption"
```

All 7 samples are accepted with no rejections (no rehash needed). This vector verifies the full pipeline: hash → u16 extraction → rejection sampling → modular index → EFF wordlist lookup. **Note**: Neither this vector nor the "with rejection" vector below exercises the rehash path (§9.2: when all 16 u16 samples from a 32-byte hash are exhausted before producing 7 accepted words, compute `SHA3-256("lo-phrase-expand-v1" || round || hash)` and continue sampling from the new hash). Both vectors complete within the first hash. The rehash path is tested by unit tests with adversarial inputs; a KAT vector is impractical because no naturally occurring fingerprint pair is known to require rehashing. Rehashing requires at least 10 of the 16 samples to be rejected; with a per-sample rejection probability of 3328/65536 ≈ 0.051, this happens with probability ≈ C(16,10) · 0.051¹⁰ · 0.949⁶ ≈ 7 × 10⁻¹⁰.

**With rejection** (exercises cursor-advance-on-reject behavior):

```
pk_a:   0808...08  (3200 bytes of 0x08)
pk_b:   0101...01  (3200 bytes of 0x01)
sorted: pk_b first (lexicographic: 0x01 < 0x08)
hash:   9ea8205db7552a2a0679fbe6760b49fb59b46559ea3a44708ec7feb19b1c8d85

Samples (u16 big-endian from hash bytes):
  [0..2]   0x9ea8 = 40616  → accepted, 40616 % 7776 = 1736  → "despise"
  [2..4]   0x205d = 8285   → accepted, 8285 % 7776 = 509    → "barrier"
  [4..6]   0xb755 = 46933  → accepted, 46933 % 7776 = 277   → "approve"
  [6..8]   0x2a2a = 10794  → accepted, 10794 % 7776 = 3018  → "grinch"
  [8..10]  0x0679 = 1657   → accepted, 1657 % 7776 = 1657   → "degrading"
  [10..12] 0xfbe6 = 64486  → REJECTED (≥ 62208), cursor advances to [12]
  [12..14] 0x760b = 30219  → accepted, 30219 % 7776 = 6891  → "tropical"
  [14..16] 0x49fb = 18939  → accepted, 18939 % 7776 = 3387  → "implosive"

Phrase: "despise barrier approve grinch degrading tropical implosive"
```

This vector exercises the critical cursor-advance-on-reject behavior: sample [10..12] (0xfbe6 = 64486) fails the `< 62208` acceptance check and is discarded. The cursor advances past it to [12..14] — it does NOT retry at [10..12]. A reimplementer who advances the cursor only on accepted samples would read [10..12] as the 6th accepted sample and produce a different phrase. The rejection also means 8 u16 samples are consumed for 7 words (16 bytes of the 32-byte hash).
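The sampling loop that both vectors exercise can be sketched as follows. This assumes the 7776-word EFF list is indexed from 0 and that the caller handles the §9.2 rehash path, which the sketch only signals by returning `None`; the function names, including the `hex32` test helper, are hypothetical.

```rust
/// Size of the EFF large wordlist (6^5).
const WORDLIST_LEN: u32 = 7776;
/// Largest multiple of 7776 below 2^16: 8 * 7776 = 62208.
const ACCEPT_BOUND: u32 = 62208;
const PHRASE_WORDS: usize = 7;

/// Draw up to 7 wordlist indices from one 32-byte hash.
/// Returns None if the 16 samples run out first (caller must rehash).
fn sample_word_indices(hash: &[u8; 32]) -> Option<Vec<u16>> {
    let mut out = Vec::with_capacity(PHRASE_WORDS);
    // One big-endian u16 per 2-byte chunk. The loop advances past every
    // sample, accepted or rejected (cursor-advance-on-reject).
    for pair in hash.chunks_exact(2) {
        let v = u16::from_be_bytes([pair[0], pair[1]]) as u32;
        if v < ACCEPT_BOUND {
            out.push((v % WORDLIST_LEN) as u16);
            if out.len() == PHRASE_WORDS {
                return Some(out);
            }
        }
    }
    None
}

/// Test helper: decode 64 hex characters into 32 bytes.
fn hex32(s: &str) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in out.iter_mut().enumerate() {
        *b = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).expect("valid hex");
    }
    out
}
```

Run on the two hashes above, this yields indices 6856, 4629, 661, 6343, 6766, 2839, 838 and 1736, 509, 277, 3018, 1657, 6891, 3387 respectively, matching the sample tables before the wordlist lookup.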

### F.7 Ratchet Nonce from Counter (§6.5)

```
counter:  42
nonce:    00000000000000000000000000000000000000000000002a  (24 bytes: zeros with BE32(42) in bytes 20..24)
```

Counter occupies the last 4 bytes of a 24-byte nonce buffer. Bytes 0-19 are zero.

**Counter = 0** (first message of each epoch and of every post-KEM-ratchet epoch):
```
counter:  0
nonce:    000000000000000000000000000000000000000000000000  (24 bytes: all zero)
```

An all-zero nonce is valid — see §7.2 for rationale. A reimplementer who guards against all-zero nonces as a "sanity check" would incorrectly reject every epoch's first message.

**Counter = 1** (second message, validates that counter increments appear at the correct byte positions):
```
counter:  1
nonce:    000000000000000000000000000000000000000000000001  (24 bytes: zeros with BE32(1) in bytes 20..24)
```

The difference between counter=0 and counter=1 is a single bit flip at byte 23 (the least-significant byte of the 4-byte big-endian counter). A reimplementer who writes the big-endian counter into bytes 0-3 (the wrong end) instead of bytes 20-23 produces `00000001` followed by 20 zero bytes for counter=1, and one who also uses little-endian produces `01000000...00`; the counter=1 vector catches both.
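A minimal sketch of the counter-to-nonce mapping, assuming the counter is a `u32` as the `BE32` notation suggests; `ratchet_nonce` is a hypothetical name.

```rust
/// 24-byte XChaCha20-Poly1305 nonce for a ratchet message counter:
/// bytes 0..20 are zero, bytes 20..24 hold the counter as big-endian u32.
/// Counter 0 yields an all-zero nonce, which is valid (see §7.2).
fn ratchet_nonce(counter: u32) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce[20..24].copy_from_slice(&counter.to_be_bytes());
    nonce
}
```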

### F.8 Storage AAD Construction (§11.4.1)

```
version:    01
flags:      00 (uncompressed)
channel_id: "general"     (67656e6572616c)
segment_id: "2024-03-15"  (323032342d30332d3135)

aad: 6c6f2d73746f726167652d76310100000767656e6572616c000a323032342d30332d3135
     (36 bytes)
```

Construction: `"lo-storage-v1" || version || flags || len(channel_id) || channel_id || len(segment_id) || segment_id`. Length prefixes are 2-byte big-endian.
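A sketch of the storage AAD builder, assuming byte-string inputs. The names are hypothetical, and rejecting fields longer than 65535 bytes (returning `None`) is this sketch's choice, since a 2-byte prefix cannot represent them.

```rust
/// Append `data` behind a 2-byte big-endian length prefix.
fn push_len_prefixed(out: &mut Vec<u8>, data: &[u8]) -> Option<()> {
    if data.len() > u16::MAX as usize {
        return None;
    }
    out.extend_from_slice(&(data.len() as u16).to_be_bytes());
    out.extend_from_slice(data);
    Some(())
}

/// "lo-storage-v1" || version || flags
///   || len(channel_id) || channel_id || len(segment_id) || segment_id
fn storage_aad(version: u8, flags: u8, channel_id: &[u8], segment_id: &[u8]) -> Option<Vec<u8>> {
    let mut aad = Vec::new();
    aad.extend_from_slice(b"lo-storage-v1"); // raw label, no length prefix
    aad.push(version);
    aad.push(flags);
    push_len_prefixed(&mut aad, channel_id)?;
    push_len_prefixed(&mut aad, segment_id)?;
    Some(aad)
}

/// Lowercase hex, for comparing against the vector above.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```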

### F.9 Streaming Nonce Derivation (§15.3)

```
base_nonce:   101112131415161718191a1b1c1d1e1f2021222324252627

chunk_index=0, tag_byte=0x00 (non-final):
  mask:         000000000000000000000000000000000000000000000000
  chunk_nonce:  101112131415161718191a1b1c1d1e1f2021222324252627

chunk_index=2, tag_byte=0x00 (non-final):
  mask:         000000000000000200000000000000000000000000000000
  chunk_nonce:  101112131415161518191a1b1c1d1e1f2021222324252627

chunk_index=0, tag_byte=0x01 (final — single-chunk stream):
  mask:         000000000000000001000000000000000000000000000000
  chunk_nonce:  101112131415161719191a1b1c1d1e1f2021222324252627

chunk_index=2, tag_byte=0x01 (final):
  mask:         000000000000000201000000000000000000000000000000
  chunk_nonce:  101112131415161519191a1b1c1d1e1f2021222324252627
```

Construction: `mask = chunk_index(u64 BE) || tag_byte || 0x00*15`, `chunk_nonce = base_nonce XOR mask`.

The pair `(chunk_index=2, tag_byte=0x00)` and `(chunk_index=2, tag_byte=0x01)` verifies that `tag_byte` participates in the XOR: the two nonces differ only at byte 8 (`0x18` vs. `0x19`), which is exactly where `tag_byte` sits in the mask. An implementation that omits `tag_byte` from the mask (or passes it as a constant) computes the same nonce for both entries, so the distinct expected values catch the bug.

The `(chunk_index=0, tag_byte=0x01)` entry covers the single-final-chunk case, the most common real-world usage for small files. With a zero index, the tag byte is the only nonzero byte in the mask, so this entry checks `tag_byte` placement in isolation from the index encoding: an implementation that XORs `tag_byte` at the wrong byte position still produces the correct nonce for `(0, 0x00)` (the mask is all-zero regardless) but the wrong nonce for `(0, 0x01)`.

**Debugging property at `chunk_index=0, tag_byte=0x00`**: The mask is all-zeros, so `chunk_nonce == base_nonce` exactly. If decryption fails for index-0, the bug is in header parsing or base_nonce extraction — not in the XOR step (which is a no-op at this index). If index-0 succeeds but a subsequent index fails, the bug is in the mask construction or XOR logic.

**`chunk_index = u64::MAX` boundary vector**: Detects 32-bit-truncation bugs — implementations using a `u32` counter silently compute a different nonce for any chunk index above `u32::MAX = 0xFFFFFFFF`. At `u64::MAX`, bytes 0-7 of the mask are all `0xFF` (vs. only bytes 4-7 for `u32::MAX`), so the two implementations produce nonces that differ in bytes 0-3.

```
chunk_index=u64::MAX (0xFFFFFFFFFFFFFFFF), tag_byte=0x00 (non-final):
  mask:         ffffffffffffffff00000000000000000000000000000000
  chunk_nonce:  efeeedecebeae9e818191a1b1c1d1e1f2021222324252627

chunk_index=u64::MAX (0xFFFFFFFFFFFFFFFF), tag_byte=0x01 (final):
  mask:         ffffffffffffffff01000000000000000000000000000000
  chunk_nonce:  efeeedecebeae9e819191a1b1c1d1e1f2021222324252627
```

Computed using the same `base_nonce = 101112...27` as above: bytes 0-7 XOR `0xFF` each; byte 8 XOR `tag_byte`; bytes 9-23 unchanged.
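The mask-and-XOR construction can be sketched as below (hypothetical function name). Taking `chunk_index` as a `u64` rules out the 32-bit truncation bug described above by construction.

```rust
/// chunk_nonce = base_nonce XOR (chunk_index as u64 BE || tag_byte || 0x00 * 15)
fn chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes()); // bytes 0..8
    mask[8] = tag_byte; // byte 8; bytes 9..24 stay zero
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= *m;
    }
    nonce
}
```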

### F.10 Streaming AAD Construction (§15.4)

```
version:    01
flags:      00 (uncompressed)
base_nonce: 101112131415161718191a1b1c1d1e1f2021222324252627
caller_aad: "" (empty)

chunk_index=0, tag_byte=0x00:
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f2021222324252627000000000000000000
       (47 bytes)

chunk_index=0, tag_byte=0x00, caller_aad="file-abc-123":
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f202122232425262700000000000000000066696c652d6162632d313233
       (59 bytes)

chunk_index=2, tag_byte=0x01, caller_aad="file-abc-123":
  aad: 6c6f2d73747265616d2d76310100101112131415161718191a1b1c1d1e1f202122232425262700000000000000020166696c652d6162632d313233
       (59 bytes)
```

Construction: `"lo-stream-v1" || version || flags || base_nonce || chunk_index(u64 BE) || tag_byte || caller_aad`.

**Why three entries**: The first entry (non-final, empty AAD) and third entry (final, non-empty AAD) do not cover the combination non-final + non-empty AAD. A reimplementer who appends `caller_aad` only to the final chunk (`tag_byte == 0x01`) passes entry 1 (whose `caller_aad` is empty anyway) and entry 3, but fails entry 2. Entry 2 catches this bug.
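A sketch of the streaming AAD builder (hypothetical name). It appends `caller_aad` raw and unconditionally, which is the property entry 2 checks. The offsets used in the checks follow from the layout: 12-byte label, version at 12, flags at 13, `base_nonce` at 14..38, `chunk_index` at 38..46, `tag_byte` at 46, `caller_aad` from 47.

```rust
/// "lo-stream-v1" || version || flags || base_nonce
///   || chunk_index (u64 BE) || tag_byte || caller_aad
/// caller_aad is appended raw (no length prefix) on every chunk.
fn stream_aad(
    version: u8,
    flags: u8,
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(b"lo-stream-v1");
    aad.push(version);
    aad.push(flags);
    aad.extend_from_slice(base_nonce);
    aad.extend_from_slice(&chunk_index.to_be_bytes());
    aad.push(tag_byte);
    aad.extend_from_slice(caller_aad);
    aad
}
```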

**Note**: F.9 and F.10 cover only `flags=0x00` (uncompressed). A compressed vector (`flags=0x01`) exercising the zstd-before-AEAD encrypt path and post-AEAD-decompress decrypt path is needed for complete cross-implementation validation of the compression integration. The AAD construction is identical (only the flags byte differs); the key difference is the plaintext/ciphertext relationship (compressed vs. raw).

### F.11 KDF_KEX Session Key Derivation (§5.4 Step 4)

```
ss_ik:            1111111111111111111111111111111111111111111111111111111111111111
ss_spk:           2222222222222222222222222222222222222222222222222222222222222222
(no OPK)
ikm:              11111111111111111111111111111111111111111111111111111111111111112222222222222222222222222222222222222222222222222222222222222222
                  (64 bytes: ss_ik || ss_spk)

salt:             0000000000000000000000000000000000000000000000000000000000000000

alice_ik_pub:     0xAA repeated × 3200 bytes
bob_ik_pub:       0xBB repeated × 3200 bytes
ek_pub:           0xCC repeated × 1216 bytes
crypto_version:   "lo-crypto-v1" (6c6f2d63727970746f2d7631)

info (7645 bytes): "lo-kex-v1"                           // 9 bytes raw
                || 000c 6c6f2d63727970746f2d7631         // len(cv) + crypto_version
                || 0c80 [AA × 3200]                      // len(alice_ik) + alice_ik_pub
                || 0c80 [BB × 3200]                      // len(bob_ik) + bob_ik_pub
                || 04c0 [CC × 1216]                      // len(ek) + ek_pub

root_key:         5067b4b2c0b33aafa8be7805a7b1a136c32e7769624b8e78cc762c6194a3322c
epoch_key:        4ee99ff8ff9588a8c1df8819cb0bd49bd39277412f668c6be4ea0850220e8000
```

Construction: `HKDF-SHA3-256(salt=0x00{32}, ikm=ss_ik||ss_spk, info=info, len=64)`. First 32 bytes = root_key, last 32 bytes = epoch_key. The `info` field uses 2-byte BE length prefixes for all components. The `"lo-kex-v1"` prefix is raw (not length-prefixed).

**Missing intermediate checkpoint**: No SHA3-256 hash of the assembled 7645-byte `info` field is provided. The `info` field mixes raw and length-prefixed fields in a specific order; a field-encoding error (wrong prefix size, swapped field order, a spurious length prefix on "lo-kex-v1") shifts all subsequent bytes and produces a final root_key/epoch_key mismatch with no intermediate signal. When the F.11/F.12 root_key or epoch_key diverge, the first diagnostic step is to compute SHA3-256 of the assembled `info` bytes before they are passed to HKDF and compare it against a trusted build; no reference hash for this check is given in this document.
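Assembling `info` and `ikm` needs no cryptography, so it can be checked in isolation before HKDF is involved. A sketch under the F.11 layout, with hypothetical function names:

```rust
/// Append `data` behind a 2-byte big-endian length prefix.
fn push_u16_prefixed(out: &mut Vec<u8>, data: &[u8]) {
    assert!(data.len() <= u16::MAX as usize, "field too long for a u16 prefix");
    out.extend_from_slice(&(data.len() as u16).to_be_bytes());
    out.extend_from_slice(data);
}

/// info = "lo-kex-v1" (raw) || lp(crypto_version) || lp(alice_ik_pub)
///        || lp(bob_ik_pub) || lp(ek_pub)
fn kdf_kex_info(crypto_version: &[u8], alice_ik_pub: &[u8], bob_ik_pub: &[u8], ek_pub: &[u8]) -> Vec<u8> {
    let mut info = Vec::new();
    info.extend_from_slice(b"lo-kex-v1"); // raw, NOT length-prefixed
    push_u16_prefixed(&mut info, crypto_version);
    push_u16_prefixed(&mut info, alice_ik_pub);
    push_u16_prefixed(&mut info, bob_ik_pub);
    push_u16_prefixed(&mut info, ek_pub);
    info
}

/// ikm = ss_ik || ss_spk [|| ss_opk], in exactly this order.
fn kdf_kex_ikm(ss_ik: &[u8; 32], ss_spk: &[u8; 32], ss_opk: Option<&[u8; 32]>) -> Vec<u8> {
    let mut ikm = Vec::with_capacity(96);
    ikm.extend_from_slice(ss_ik);
    ikm.extend_from_slice(ss_spk);
    if let Some(opk) = ss_opk {
        ikm.extend_from_slice(opk);
    }
    ikm
}
```

With the F.11 inputs, the `info` length comes out to 9 + (2 + 12) + (2 + 3200) + (2 + 3200) + (2 + 1216) = 7645 bytes.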

### F.12 KDF_KEX with OPK (§5.4 Step 4)

```
ss_ik:            1111111111111111111111111111111111111111111111111111111111111111
ss_spk:           2222222222222222222222222222222222222222222222222222222222222222
ss_opk:           3333333333333333333333333333333333333333333333333333333333333333
ikm:              ss_ik || ss_spk || ss_opk  (96 bytes)

salt, info:       identical to F.11

root_key:         c308b84238e8b73424b88d5e24ac6e4e0e5a0bfe047b5620fc9811f368ec0be1
epoch_key:        35d3ddd0b464faa3663e92041cebf2bcd8db593b5b0ebae75e7f02a24631ea2c
```

The IKM concatenation order (`ss_ik || ss_spk || ss_opk`) is critical — any reordering produces a different session key.

### F.13 encode_session_init (§7.4)

```
crypto_version:         "lo-crypto-v1"  (6c6f2d63727970746f2d7631)
sender_ik_fingerprint:  0xAA × 32
recipient_ik_fingerprint: 0xBB × 32
sender_ek:              0xCC × 1216
ct_ik:                  0x11 × 1120
ct_spk:                 0x22 × 1120
spk_id:                 0x000000DD (u32 big-endian)
has_opk:                0x00 (absent)

Encoded layout (3543 bytes):
  [0..2]        000c                    len(crypto_version)
  [2..14]       6c6f2d63727970746f2d7631  crypto_version
  [14..46]      AA × 32                 sender_ik_fingerprint (no length prefix — fixed size)
  [46..78]      BB × 32                 recipient_ik_fingerprint (no length prefix — fixed size)
  [78..1294]    CC × 1216               sender_ek (no length prefix — fixed size)
  [1294..1296]  0460                    len(ct_ik) = 1120
  [1296..2416]  11 × 1120              ct_ik
  [2416..2418]  0460                    len(ct_spk) = 1120
  [2418..3538]  22 × 1120              ct_spk
  [3538..3542]  000000dd               spk_id (u32 big-endian, no length prefix)
  [3542]        00                      has_opk

SHA3-256(encoded): e45e05fb2d4218d1cd2f660491cd026ceec187ea7e3048908aa0f37681c36a9c
```

The SHA3-256 hash provides a quick verification that the encoding is correct — compute the hash of your serialized output and compare before attempting signature or AAD construction. Note: `spk_id` is a 4-byte big-endian u32 (not 32 bytes), and it follows ct_spk (not sender_ek). The `has_kem_ct` field does not exist in encode_session_init — it is part of encode_ratchet_header only.

**With OPK** (4669 bytes):

```
(Same fields as above through spk_id)
ct_opk:       0x33 × 1120
opk_id:       0x000000EE (u32 big-endian)

Encoded layout (4669 bytes):
  [0..3542]     (identical to no-OPK variant through spk_id)
  [3542]        01                      has_opk
  [3543..3545]  0460                    len(ct_opk) = 1120
  [3545..4665]  33 × 1120              ct_opk
  [4665..4669]  000000ee               opk_id (u32 big-endian)

SHA3-256(encoded): 230d711bebc95875ee9d7e3bd4a56c0cf7e5f34a52a453ec498326b489af7dcc
```

The OPK block format is: `0x01 || len(ct_opk) || ct_opk || opk_id`. Note that `opk_id` follows `ct_opk` (not `spk_id`) — the two key IDs are not adjacent. A reimplementer who places `opk_id` immediately after `spk_id` (before `ct_opk`) will produce an incompatible encoding.
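An encoder sketch following the F.13 layout. The struct and function names are hypothetical; the sketch assumes callers pass correctly sized keys and ciphertexts and only panics if a length-prefixed field exceeds 65535 bytes.

```rust
/// SessionInit fields in wire order (F.13).
struct SessionInit<'a> {
    crypto_version: &'a [u8],
    sender_ik_fingerprint: [u8; 32],
    recipient_ik_fingerprint: [u8; 32],
    sender_ek: &'a [u8], // 1216-byte X-Wing public key
    ct_ik: &'a [u8],     // 1120-byte X-Wing ciphertext
    ct_spk: &'a [u8],
    spk_id: u32,
    opk: Option<(&'a [u8], u32)>, // (ct_opk, opk_id)
}

/// Append `data` behind a 2-byte big-endian length prefix.
fn push_lp(out: &mut Vec<u8>, data: &[u8]) {
    assert!(data.len() <= u16::MAX as usize, "field too long for a u16 prefix");
    out.extend_from_slice(&(data.len() as u16).to_be_bytes());
    out.extend_from_slice(data);
}

fn encode_session_init(si: &SessionInit) -> Vec<u8> {
    let mut out = Vec::new();
    push_lp(&mut out, si.crypto_version);
    out.extend_from_slice(&si.sender_ik_fingerprint); // fixed size: no prefix
    out.extend_from_slice(&si.recipient_ik_fingerprint); // fixed size: no prefix
    out.extend_from_slice(si.sender_ek); // fixed size: no prefix
    push_lp(&mut out, si.ct_ik);
    push_lp(&mut out, si.ct_spk);
    out.extend_from_slice(&si.spk_id.to_be_bytes()); // u32 BE, after ct_spk
    match si.opk {
        None => out.push(0x00),
        Some((ct_opk, opk_id)) => {
            out.push(0x01);
            push_lp(&mut out, ct_opk);
            out.extend_from_slice(&opk_id.to_be_bytes()); // after ct_opk
        }
    }
    out
}
```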

**Missing rejection vectors for `decode_session_init`**: No negative-case KATs are provided for `decode_session_init`. The following inputs MUST return `InvalidData` and are the primary decoder-strictness checks: (1) `has_opk = 0x02` (any byte other than `0x00`/`0x01` for the OPK flag); (2) trailing bytes after the last field (a conforming no-OPK blob with one extra byte appended); (3) mid-ciphertext truncation (e.g., the first 3537 bytes of the no-OPK blob, which end one byte before the end of `ct_spk`). A decoder that accepts (1) admits format malleability; one that accepts (2) creates a decoding oracle; one that silently succeeds on (3) produces a `ct_spk`/`spk_id` parsing shift. These rejection behaviors are normative requirements in §7.4 but are not covered by existing test vectors.
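A strictness-check sketch that rejects the three inputs listed above. The names are hypothetical; it checks structure only (it does not validate `crypto_version`), and requiring every ciphertext length prefix to equal 1120 is an assumption based on the fixed X-Wing ciphertext size.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidData,
}

/// Minimal cursor: every read fails with InvalidData on truncation.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], DecodeError> {
        let buf: &'a [u8] = self.buf;
        let end = self.pos.checked_add(n).ok_or(DecodeError::InvalidData)?;
        let s = buf.get(self.pos..end).ok_or(DecodeError::InvalidData)?;
        self.pos = end;
        Ok(s)
    }
    fn byte(&mut self) -> Result<u8, DecodeError> {
        Ok(self.take(1)?[0])
    }
    fn u32_be(&mut self) -> Result<u32, DecodeError> {
        let b = self.take(4)?;
        Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
    /// 2-byte BE length prefix followed by exactly `expected` bytes.
    fn lp_exact(&mut self, expected: usize) -> Result<&'a [u8], DecodeError> {
        let b = self.take(2)?;
        let n = u16::from_be_bytes([b[0], b[1]]) as usize;
        if n != expected {
            return Err(DecodeError::InvalidData);
        }
        self.take(n)
    }
}

const XWING_CT_LEN: usize = 1120;

/// Structural check of an encoded SessionInit; returns (spk_id, opk_id).
fn check_session_init(buf: &[u8]) -> Result<(u32, Option<u32>), DecodeError> {
    let mut r = Reader { buf, pos: 0 };
    let cv_len = r.take(2)?;
    r.take(u16::from_be_bytes([cv_len[0], cv_len[1]]) as usize)?; // crypto_version
    r.take(32 + 32 + 1216)?; // both fingerprints + sender_ek, fixed size
    r.lp_exact(XWING_CT_LEN)?; // ct_ik
    r.lp_exact(XWING_CT_LEN)?; // ct_spk: case (3) truncation fails here
    let spk_id = r.u32_be()?;
    let opk_id = match r.byte()? {
        0x00 => None,
        0x01 => {
            r.lp_exact(XWING_CT_LEN)?; // ct_opk
            Some(r.u32_be()?)
        }
        _ => return Err(DecodeError::InvalidData), // case (1): bad has_opk
    };
    if r.pos != buf.len() {
        return Err(DecodeError::InvalidData); // case (2): trailing bytes
    }
    Ok((spk_id, opk_id))
}
```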

### F.14 encode_ratchet_header (§7.4)

**Without KEM ciphertext** (same-chain message, 1225 bytes):

```
ratchet_pk:   0xAA × 1216
has_kem_ct:   0x00 (absent)
n:            42 (0x0000002A)
pn:           10 (0x0000000A)

Encoded layout:
  [0..1216]    AA × 1216               ratchet_pk (no length prefix — fixed size)
  [1216]       00                      has_kem_ct
  [1217..1221] 0000002a               n (u32 big-endian)
  [1221..1225] 0000000a               pn (u32 big-endian)

SHA3-256(encoded): 71d0bf62f50a1fff7b27b0825426e3ae29b52e2e335940caeb46a485ec73e1bf
```

**With KEM ciphertext** (new-epoch message, 2347 bytes):

```
ratchet_pk:   0xAA × 1216
has_kem_ct:   0x01 (present)
kem_ct:       0xBB × 1120
n:            42 (0x0000002A)
pn:           10 (0x0000000A)

Encoded layout:
  [0..1216]    AA × 1216               ratchet_pk (no length prefix — fixed size)
  [1216]       01                      has_kem_ct
  [1217..1219] 0460                    len(kem_ct) = 1120
  [1219..2339] BB × 1120              kem_ct
  [2339..2343] 0000002a               n (u32 big-endian)
  [2343..2347] 0000000a               pn (u32 big-endian)

SHA3-256(encoded): 99588b3b8b7539dc864443b16741f642a963207b66eb59058fe5f1729b180ed2
```
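An encoder sketch for both header variants (hypothetical names). Note the asymmetry: the fixed-size `ratchet_pk` carries no length prefix, while `kem_ct` does.

```rust
/// encode_ratchet_header following the F.14 layouts.
fn encode_ratchet_header(ratchet_pk: &[u8; 1216], kem_ct: Option<&[u8]>, n: u32, pn: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(2347);
    out.extend_from_slice(ratchet_pk); // fixed size: no length prefix
    match kem_ct {
        None => out.push(0x00),
        Some(ct) => {
            out.push(0x01);
            assert!(ct.len() <= u16::MAX as usize, "kem_ct too long for a u16 prefix");
            out.extend_from_slice(&(ct.len() as u16).to_be_bytes()); // 2-byte BE
            out.extend_from_slice(ct);
        }
    }
    out.extend_from_slice(&n.to_be_bytes()); // u32 BE
    out.extend_from_slice(&pn.to_be_bytes()); // u32 BE
    out
}
```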

### F.15 KDF_Call (§6.12)

```
root_key:   aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
kem_ss:     bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
call_id:    cccccccccccccccccccccccccccccccc
local_fp:   1111111111111111111111111111111111111111111111111111111111111111
remote_fp:  2222222222222222222222222222222222222222222222222222222222222222

ikm:        kem_ss || call_id  (48 bytes, no length prefixes)
info:       "lo-call-v1" || fp_lo || fp_hi  (74 bytes)
            fp_lo = local_fp (0x11... < 0x22...), fp_hi = remote_fp

key_a:      ed75d812373c9b3bf6bddd394a631950520503f103b492fb908621eb712b5970
key_b:      c3e5171534e0d1f922ea4ebf318357b990eafb0fff45d8cf430639a1fe2bb1e4
chain_key:  1427dde311aaa195b116cc98c870753179297981446d3b53e00a4a92a0d34aeb
```

Construction: `HKDF-SHA3-256(salt=root_key, ikm=kem_ss||call_id, info="lo-call-v1"||fp_lo||fp_hi, len=96)`. Fingerprints sorted by unsigned byte-by-byte lexicographic comparison (lower first). Output: first 32 bytes = key_a, next 32 = key_b, last 32 = chain_key.

**Reversed-order coverage**: In this vector, `local_fp` (0x11...) < `remote_fp` (0x22...), so `local_fp` is `fp_lo` and `remote_fp` is `fp_hi`. To test the reversed sort branch (where `local_fp > remote_fp`), swap the two fingerprints: set `local_fp = 0x22...` and `remote_fp = 0x11...`. The `info` field is unchanged (sorting still yields `fp_lo = 0x11...`, `fp_hi = 0x22...`), so `key_a`, `key_b`, and `chain_key` are the same as above, but the role assignment reverses: the party whose `local_fp = 0x22...` now uses `key_b` as its send key (not `key_a`), because it is the lexicographically higher party. A reimplementer who tests only the non-reversed case passes this vector and misses a sort-direction bug that would produce incompatible role assignments.
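The sort and the role rule can be sketched as follows. The mapping (lower fingerprint sends with `key_a`, higher with `key_b`) is inferred from the reversed-order description above; the names are hypothetical, and the degenerate case of equal fingerprints is simply treated as "local is higher". Rust's ordering on `[u8; 32]` is the required unsigned byte-by-byte lexicographic comparison.

```rust
/// Order two fingerprints as (fp_lo, fp_hi) by unsigned lexicographic bytes.
fn sort_fps<'a>(a: &'a [u8; 32], b: &'a [u8; 32]) -> (&'a [u8; 32], &'a [u8; 32]) {
    if a <= b { (a, b) } else { (b, a) }
}

/// info = "lo-call-v1" || fp_lo || fp_hi (74 bytes), identical on both sides.
fn call_info(local_fp: &[u8; 32], remote_fp: &[u8; 32]) -> Vec<u8> {
    let (lo, hi) = sort_fps(local_fp, remote_fp);
    let mut info = Vec::with_capacity(74);
    info.extend_from_slice(b"lo-call-v1");
    info.extend_from_slice(lo);
    info.extend_from_slice(hi);
    info
}

/// Index of the 32-byte HKDF output block this party sends with:
/// 0 = key_a (lower fingerprint), 1 = key_b (higher fingerprint).
fn send_key_index(local_fp: &[u8; 32], remote_fp: &[u8; 32]) -> usize {
    if local_fp < remote_fp { 0 } else { 1 }
}
```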

### F.16 AdvanceCallChain (§6.12)

```
chain_key:  1427dde311aaa195b116cc98c870753179297981446d3b53e00a4a92a0d34aeb

key_a':     9cf3129c6bb7ad86cb12ffc534517a4c06a472fbcddbe295a501c79aa49800e1
key_b':     f24cd7822fd611159a6e6d809c6ac148fd7b9bad65d8b4f85745869634b2dd1e
chain_key': d3ae610c39cd9f7f8dce990b5c91634092ad0621fc01b44b24b2cb9f3638d0f2
```

Construction: `key_a' = HMAC-SHA3-256(chain_key, 0x04)`, `key_b' = HMAC-SHA3-256(chain_key, 0x05)`, `chain_key' = HMAC-SHA3-256(chain_key, 0x06)`. Each HMAC input is a single byte.

### F.17 DM Queue AAD (§11.4.2)

```
version:       01
flags:         00 (uncompressed)
recipient_fp:  0xAA × 32
batch_id:      "batch-001"  (62617463682d303031)

aad: 6c6f2d646d2d71756575652d763101000020aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa000962617463682d303031
     (61 bytes)
```

Construction: `"lo-dm-queue-v1" || version || flags || len(recipient_fp) || recipient_fp || len(batch_id) || batch_id`. Note: `len(recipient_fp)` = `0020` (2-byte BE for 32 bytes) — recipient_fp is length-prefixed despite being fixed-size, for wire format consistency with community AAD (§11.4.1).

### F.18 First-Message AAD (§5.4 Step 7)

```
sender_fp:    0xAA × 32  (raw SHA3-256, not hex)
recipient_fp: 0xBB × 32
session_init: F.13 no-OPK encoding (3543 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_session_init(si)
      (3615 bytes: 8 + 32 + 32 + 3543)

SHA3-256(aad): 091a81dbff776e4a81d34ce22f7cd7efeaf225cd40bbf5f9f49825fd5c462ac7
```

The sender and recipient fingerprints are the raw 32-byte SHA3-256 digests (§2.1), not hex strings. On the decrypt side (§5.5 Step 6), the AAD is assembled as `"lo-dm-v1" || Alice.fingerprint_raw || Bob.fingerprint_raw || session_init_bytes` — the same order. This is the most common first integration failure: passing hex-encoded fingerprints (64 bytes) instead of raw (32 bytes), or swapping sender/recipient positions.

### F.19 Ratchet Message AAD (§6.5)

```
sender_fp:      0xAA × 32  (raw SHA3-256, not hex)
recipient_fp:   0xBB × 32
ratchet_header: F.14 no-KEM-ct encoding (1225 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_ratchet_header(h)
      (1297 bytes: 8 + 32 + 32 + 1225)

SHA3-256(aad): eaec65b7ac6d8e3912bacf1ed40429ab5005f33550c1d6e0231844fecac6a93e
```

**Direction asymmetry**: When Alice encrypts, `sender_fp = Alice.fingerprint_raw` and `recipient_fp = Bob.fingerprint_raw`. When Bob decrypts the same message, he assembles AAD with `sender_fp = Alice.fingerprint_raw` (the message author) and `recipient_fp = Bob.fingerprint_raw` (himself) — the same order. A reimplementer who uses `(local_fp, remote_fp)` for both encrypt and decrypt gets the wrong AAD on one side.
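One way to avoid the `(local_fp, remote_fp)` trap is to derive the AAD fingerprints from the message direction rather than from local/remote roles. A sketch with hypothetical names:

```rust
/// "lo-dm-v1" || sender_fp || recipient_fp || encoded_header, where the
/// header is encode_session_init(si) or encode_ratchet_header(h).
fn dm_aad(sender_fp: &[u8; 32], recipient_fp: &[u8; 32], encoded_header: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(8 + 32 + 32 + encoded_header.len());
    aad.extend_from_slice(b"lo-dm-v1");
    aad.extend_from_slice(sender_fp); // raw 32-byte digest, not hex
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(encoded_header);
    aad
}

enum Direction {
    Outgoing, // this party wrote the message
    Incoming, // the peer wrote the message
}

/// (sender_fp, recipient_fp) from the message author's point of view,
/// so encryptor and decryptor build byte-identical AAD.
fn aad_fps<'a>(dir: Direction, local_fp: &'a [u8; 32], remote_fp: &'a [u8; 32]) -> (&'a [u8; 32], &'a [u8; 32]) {
    match dir {
        Direction::Outgoing => (local_fp, remote_fp),
        Direction::Incoming => (remote_fp, local_fp),
    }
}
```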

### F.19b Ratchet Message AAD — New Epoch (§6.5)

```
sender_fp:      0xAA × 32  (raw SHA3-256, not hex)
recipient_fp:   0xBB × 32
ratchet_header: F.14 with-KEM-ct encoding (2347 bytes)

aad = "lo-dm-v1" || sender_fp || recipient_fp || encode_ratchet_header(h)
      (2419 bytes: 8 + 32 + 32 + 2347)

SHA3-256(aad): 25e46f405c91fb21aef5f7cd719d19b36d3edc030edaedf488f5624c02e4c854
```

This is the most common reimplementation failure path — the with-KEM-ct header (2347 bytes) appears in every new-epoch message, and incorrect KEM ciphertext length-prefix encoding (e.g., omitting it, using little-endian, or using the wrong ciphertext size) silently produces a different AAD hash with no diagnostic.

### F.20 Argon2id (§10.6)

```
password (21 bytes): 746573742d70617373776f72642d736f6c69746f6e  ("test-password-soliton")
salt     (16 bytes): 736f6c69746f6e2d73616c742d766563              ("soliton-salt-vec")
m_cost:  65536 (KiB, = Argon2Params::RECOMMENDED)
t_cost:  3
p_cost:  4
version: 0x13 (19, Argon2id v1.3)
output_len: 32

output (32 bytes): 79f1dce60c8371a21f849470848c40dc1589deb5119cd3c4f26298c3f17ac3cf
```

Verified against the reference C implementation (`argon2` CLI, libargon2 20190702) and the `argon2` Rust crate used by soliton. Key implementation pitfall: `m_cost` is in KiB (not bytes); passing 65536 bytes instead of 65536 KiB produces a different output silently (Argon2 accepts any m_cost ≥ 8 × p_cost). The `p_cost` parameter specifies lanes (degree of parallelism); some wrappers accept a separate "threads" parameter — for this vector, lanes = threads = 4.

### F.21 Ratchet Blob Layout (§6.8)

The ratchet blob is too large for a full hex dump (3,847 bytes for Alice's minimal initial state, due to the X-Wing keys), but the structural layout is the primary reimplementation hazard. This annotated offset table describes Alice's initial state after `init_alice` + one `to_bytes()` call, with no OPK, no previous epoch, and no recv_seen entries:

```
Offset  Size    Field                          Value (this vector)
------  ------  -----------------------------  -------------------
0       1       version                        0x01
1       8       epoch (u64 BE)                 1
9       32      root_key                       [session-dependent]
41      32      send_epoch_key                 [session-dependent]
73      32      recv_epoch_key                 0x00 * 32 (always all-zero for Alice initial — hard-coded in init_alice; not session-dependent)
105     32      local_fp                       SHA3-256(Alice.IK_pub)
137     32      remote_fp                      SHA3-256(Bob.IK_pub)

--- Optional: send_ratchet_sk (present) ---
169     1       present flag                   0x01
170     2       length (u16 BE)                0x0980 (2432)
172     2432    X-Wing secret key              [session-dependent]

--- Optional: send_ratchet_pk (present) ---
2604    1       present flag                   0x01
2605    2       length (u16 BE)                0x04C0 (1216)
2607    1216    X-Wing public key              [session-dependent]

--- Optional: recv_ratchet_pk (absent in Alice initial) ---
3823    1       present flag                   0x00

--- Optional: prev_recv_epoch_key (absent) ---
3824    1       present flag                   0x00

--- Optional: prev_recv_ratchet_pk (absent) ---
3825    1       present flag                   0x00

--- Counters ---
3826    4       send_count (u32 BE)            0x00000001 (1)
3830    4       recv_count (u32 BE)            0x00000000 (0)
3834    4       prev_send_count (u32 BE)       0x00000000 (0)

--- Flags ---
3838    1       ratchet_pending                0x00

--- recv_seen set ---
3839    4       num_recv_seen (u32 BE)         0x00000000 (0)
                (entries would follow as u32 BE, sorted ascending)

--- prev_recv_seen set ---
3843    4       num_prev_recv_seen (u32 BE)    0x00000000 (0)
                (entries would follow as u32 BE, sorted ascending)

Total: 3847 bytes
```

**Key reimplementation hazards:**
- **Optional field encoding**: present fields use `0x01 + u16_BE_length + data`; absent fields use a single `0x00` byte. Exception: `prev_recv_epoch_key` uses `0x01 + 32_bytes` (no length prefix), since its size is always exactly 32 bytes. Present-case byte sequence: `01 XX XX...XX` (33 bytes, where `XX × 32` is the key). A decoder that expects a 2-byte length prefix after the `0x01` marker, as for the other optional fields, misaligns all subsequent fields by 2 bytes. See §6.8 for a full worked example.
- **X-Wing key sizes**: secret key = 2432 bytes (32 X25519 + 2400 ML-KEM-768), public key = 1216 bytes (32 X25519 + 1184 ML-KEM-768). These layouts differ from draft-09, which uses ML-KEM-first ordering; the public-key size (1216 bytes) is the same, but do not assume the secret-key representation matches draft-09's.
- **recv_seen entries**: sorted strictly ascending, each u32 BE. No entry may equal `u32::MAX`. Each entry must be `< recv_count`. **No test vector exercises a non-empty `recv_seen` or `prev_recv_seen`** — both F.21 vectors show the empty case (`num_recv_seen = 0`). To independently verify the encoding: a `recv_seen` set containing `{0x00000001, 0x00000003}` would serialize as `00 00 00 02` (count=2) followed by `00 00 00 01 00 00 00 03` (two u32 BE values in ascending order). A `{0x00000003, 0x00000001}` insertion order MUST produce the same ascending-sorted bytes — the sort is by value, not insertion order.
- **send_count = 1 in Alice initial**: `init_alice` sets `send_count = 1` directly (§6.2) because counter 0 was consumed by `encrypt_first_message`. The first ratchet `encrypt()` call uses counter 1 and increments `send_count` to 2. A reimplementer who initializes `send_count = 0` and expects the first `encrypt()` to set it to 1 produces a nonce collision with the first-message counter.
- **Absent recv_ratchet_pk with recv_count = 0**: Valid for Alice's initial state (hasn't received anything). Guard 3 prevents `recv_count > 0` with absent `recv_ratchet_pk`.
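The optional-field and seen-set encodings described in these bullets can be sketched as follows (names hypothetical). A `BTreeSet` makes the ascending-order requirement hold regardless of insertion order; range checks such as `entry < recv_count` and `entry != u32::MAX` are left to the caller.

```rust
use std::collections::BTreeSet;

/// Variable-size optional field: absent = 0x00; present = 0x01 || u16 BE len || data.
fn push_optional_lp(out: &mut Vec<u8>, field: Option<&[u8]>) {
    match field {
        None => out.push(0x00),
        Some(data) => {
            assert!(data.len() <= u16::MAX as usize, "field too long for a u16 prefix");
            out.push(0x01);
            out.extend_from_slice(&(data.len() as u16).to_be_bytes());
            out.extend_from_slice(data);
        }
    }
}

/// prev_recv_epoch_key exception: absent = 0x00; present = 0x01 || 32 bytes, no length.
fn push_optional_key32(out: &mut Vec<u8>, key: Option<&[u8; 32]>) {
    match key {
        None => out.push(0x00),
        Some(k) => {
            out.push(0x01);
            out.extend_from_slice(k);
        }
    }
}

/// recv_seen / prev_recv_seen: u32 BE count, then u32 BE entries ascending.
/// BTreeSet iterates in ascending value order, not insertion order.
fn push_seen_set(out: &mut Vec<u8>, set: &BTreeSet<u32>) {
    out.extend_from_slice(&(set.len() as u32).to_be_bytes());
    for v in set {
        out.extend_from_slice(&v.to_be_bytes());
    }
}
```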

**Bob's initial state** (after `init_bob` + one `to_bytes()` call, i.e., after receiving Alice's session init and calling `decrypt_first_message`):

```
Offset  Size    Field                          Value (this vector)
------  ------  -----------------------------  -------------------
0       1       version                        0x01
1       8       epoch (u64 BE)                 1
9       32      root_key                       [session-dependent]
41      32      send_epoch_key                 0x00 * 32 (all-zero placeholder; replaced on first KEM ratchet step)
73      32      recv_epoch_key                 [session-dependent]
105     32      local_fp                       SHA3-256(Bob.IK_pub)
137     32      remote_fp                      SHA3-256(Alice.IK_pub)

--- Optional: send_ratchet_sk (absent — Bob hasn't sent yet) ---
169     1       present flag                   0x00

--- Optional: send_ratchet_pk (absent) ---
170     1       present flag                   0x00

--- Optional: recv_ratchet_pk (present = Alice's EK_pub from SessionInit) ---
171     1       present flag                   0x01
172     2       length (u16 BE)                0x04C0 (1216)
174     1216    X-Wing public key              [session-dependent = Alice's EK_pub]

--- Optional: prev_recv_epoch_key (absent) ---
1390    1       present flag                   0x00

--- Optional: prev_recv_ratchet_pk (absent) ---
1391    1       present flag                   0x00

--- Counters ---
1392    4       send_count (u32 BE)            0x00000000 (0)
1396    4       recv_count (u32 BE)            0x00000001 (1)
1400    4       prev_send_count (u32 BE)       0x00000000 (0)

--- Flags ---
1404    1       ratchet_pending                0x01 (Bob's first send triggers a KEM ratchet step)

--- recv_seen set ---
1405    4       num_recv_seen (u32 BE)         0x00000000 (0)
                (empty — counter 0 was consumed by decrypt_first_message,
                 outside the ratchet; do NOT seed with {0})

--- prev_recv_seen set ---
1409    4       num_prev_recv_seen (u32 BE)    0x00000000 (0)

Total: 1413 bytes (vs Alice's 3847)
```

The 2,434-byte difference comes from `send_ratchet_sk`. Alice's initial state carries it (2432 bytes plus its 2-byte length prefix). Bob has not sent a ratchet message yet, so he has no send-side ratchet key. `send_ratchet_pk` (1216 bytes) is also absent on Bob's side. That saving is cancelled by Bob's present `recv_ratchet_pk` (Alice's 1216-byte `EK_pub`), which is absent in Alice's initial state.

Reimplementers whose first test is "Bob receives and serializes" will see a much smaller blob than Alice's layout. This is correct, not a bug.

Note also that `send_epoch_key` is all-zero in Bob's initial state. It is a placeholder for Bob's sending direction, replaced with a real key by the first KEM ratchet step when Bob sends his first message. `recv_epoch_key`, by contrast, is the actual key Bob obtained during `decrypt_first_message`.
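Both totals can be cross-checked with a small length helper. This sketch makes one assumption: `send_ratchet_sk` and `send_ratchet_pk` use the same encoding as the `recv_ratchet_pk` entry above (present flag, u16 BE length, payload). Under that assumption it reproduces both 1413 and 3847. It models only initial states, where both `prev_*` fields are absent and both seen-sets are empty. The names are illustrative.

```rust
/// version(1) + epoch(8) + root_key, send_epoch_key, recv_epoch_key, local_fp, remote_fp (32 each)
const FIXED_PREFIX: usize = 1 + 8 + 32 * 5; // 169
const XWING_SK_LEN: usize = 2432;
const XWING_PK_LEN: usize = 1216;

/// Encoded size of an optional field: a 1-byte present flag, plus a
/// u16 BE length and the payload when present (assumed uniform encoding).
fn optional_field_len(payload: Option<usize>) -> usize {
    match payload {
        Some(n) => 1 + 2 + n,
        None => 1,
    }
}

/// Serialized size of an initial ratchet state (prev_* fields absent, seen-sets empty).
fn initial_blob_len(send_sk: bool, send_pk: bool, recv_pk: bool) -> usize {
    FIXED_PREFIX
        + optional_field_len(send_sk.then_some(XWING_SK_LEN))
        + optional_field_len(send_pk.then_some(XWING_PK_LEN))
        + optional_field_len(recv_pk.then_some(XWING_PK_LEN))
        + 1 // prev_recv_epoch_key present flag (absent)
        + 1 // prev_recv_ratchet_pk present flag (absent)
        + 3 * 4 // send_count, recv_count, prev_send_count (u32 BE)
        + 1 // ratchet_pending
        + 4 // num_recv_seen = 0
        + 4 // num_prev_recv_seen = 0
}
```

With this helper, Bob is `initial_blob_len(false, false, true)` and Alice is `initial_blob_len(true, true, false)`.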

A reimplementer should round-trip their own serialization/deserialization against these offsets before attempting a cross-implementation session.

**Test path for `from_bytes` → `ChainExhausted`**: This test exercises the `ChainExhausted` error path in deserialization (guard 24, §6.8).

1. Take any valid ratchet blob.
2. Overwrite the 8-byte `epoch` field with `0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF` (u64::MAX, big-endian). The epoch occupies bytes 1-8, immediately after the 1-byte version tag at offset 0. There is no reserved field in between (see the F.21 layout).
3. Pass the modified blob to `soliton_ratchet_from_bytes`, or to `from_bytes_with_min_epoch` with any `min_epoch`.

The function MUST return `ChainExhausted` (-15), NOT `InvalidData` (-17). A deserializer that returns `InvalidData` here misclassifies a recoverable, epoch-exhausted state as corrupted data. The caller then permanently discards a session that could have been handled differently (§12 mode (3)).
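A patch helper for this test might look like the sketch below. The names are hypothetical. The helper is bounds-checked, so a truncated blob is reported instead of causing a panic.

```rust
/// Overwrite the epoch field (u64 BE at bytes 1..9) of a serialized ratchet blob.
/// Returns false if the blob is too short to contain the field.
fn patch_epoch(blob: &mut [u8], epoch: u64) -> bool {
    match blob.get_mut(1..9) {
        Some(field) => {
            field.copy_from_slice(&epoch.to_be_bytes());
            true
        }
        None => false,
    }
}

/// Read the epoch field back, or None if the blob is too short.
fn read_epoch(blob: &[u8]) -> Option<u64> {
    let mut be = [0u8; 8];
    be.copy_from_slice(blob.get(1..9)?);
    Some(u64::from_be_bytes(be))
}
```

Calling `patch_epoch(&mut blob, u64::MAX)` produces the input for the `ChainExhausted` test. The version byte at offset 0 and every byte after offset 8 are left untouched.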

### F.22 Streaming AEAD with Compression (§15)

**Note**: A byte-exact compressed streaming vector is not provided because zstd output is not guaranteed to be identical across implementations, compression levels, or library versions for the same input. The frame format (RFC 8878) is standardized, but the encoder's block-splitting, match-finding, and entropy coding decisions vary.

To validate the compression integration path:

1. **Encrypt** a known plaintext chunk with `compress=true` using your zstd encoder.
2. Verify the stream header has `flags=0x01` (bit 0 set).
3. **Decrypt** using your zstd decoder — the recovered plaintext must match the original.
4. Cross-validate the AAD: identical to the uncompressed case (F.10) except `flags=0x01`. The flags byte appears in both the stream header (byte 1 of the 26-byte header) and in the per-chunk AAD (byte 13, immediately after the 12-byte `"lo-stream-v1"` label and the 1-byte version field). Using the same base nonce as F.10 and `flags=0x01`, the compressed chunk_index=0, tag_byte=0x00 AAD is:
   ```
   6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f2021222324252627000000000000000000
   (47 bytes — identical to F.10 except byte 13 is 01 instead of 00)
   ```
   Byte 13 is the flags byte; bytes 14-37 are the base_nonce; bytes 38-45 are the chunk index; byte 46 is the tag_byte. A reimplementer who propagates `flags=0x01` to the stream header but uses `flags=0x00` in the per-chunk AAD (or vice versa) will produce a ciphertext that their own implementation cannot decrypt — the AAD mismatch causes AEAD failure immediately, with no diagnostic pointing to the flag inconsistency.
5. Verify that a chunk compressed with `flags=0x01` is rejected when decrypted with `flags=0x00` (wrong AAD), and vice versa.
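The per-chunk AAD layout used in these vectors fits in a small builder. This sketch uses a hypothetical function name and hard-codes version 0x01. It takes `flags` from the stream configuration, so every chunk gets the same value, including an empty final chunk. `caller_aad` is appended raw, with no length prefix.

```rust
const STREAM_LABEL: &[u8] = b"lo-stream-v1"; // 12 bytes
const STREAM_VERSION: u8 = 0x01;

/// label(12) || version(1) || flags(1) || base_nonce(24)
/// || chunk_index (u64 BE, 8) || tag_byte(1) || caller_aad (raw)
fn stream_chunk_aad(
    flags: u8,
    base_nonce: &[u8; 24],
    chunk_index: u64,
    tag_byte: u8,
    caller_aad: &[u8],
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(47 + caller_aad.len());
    aad.extend_from_slice(STREAM_LABEL);
    aad.push(STREAM_VERSION);
    aad.push(flags); // byte 13: the stream's configuration, not a per-chunk outcome
    aad.extend_from_slice(base_nonce); // bytes 14-37
    aad.extend_from_slice(&chunk_index.to_be_bytes()); // bytes 38-45
    aad.push(tag_byte); // byte 46
    aad.extend_from_slice(caller_aad); // no length prefix, even when non-empty
    aad
}
```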

**Compressed + non-empty `caller_aad`**: F.10 covers uncompressed + non-empty `caller_aad`; F.22 above covers compressed + empty `caller_aad`. The combination compressed + non-empty `caller_aad` is exercised by substituting `flags=0x01` into the F.10 non-empty-`caller_aad` entry. Using the same inputs as F.10 (`chunk_index=2, tag_byte=0x01, caller_aad="file-abc-123"`) but with `flags=0x01`:
```
aad: 6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f202122232425262700000000000000020166696c652d6162632d313233
     (59 bytes — identical to F.10 chunk_index=2 entry except byte 13 is 01 instead of 00)
```
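This entry targets a reimplementer whose compressed and uncompressed code paths assemble the AAD separately. One example is a compressed path that adds a spurious length prefix to a non-empty `caller_aad`, a plausible mistake given that every other AAD field is fixed-size. Such an implementation passes all F.10 vectors (uncompressed path) and all F.22 vectors above (empty `caller_aad`). It fails only on this entry, where the failure shows up as an AEAD mismatch in cross-implementation testing.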

**Empty final chunk with `compress=true` — AAD still uses `flags=0x01`**: A stream initialized with `compress=true` that ends with an empty final chunk (`is_last=true`, 0-byte plaintext) bypasses compression per §15.5, but the per-chunk AAD still uses `flags=0x01` — the stream's compression *configuration*, not the per-chunk *outcome*. Using the same base nonce as F.10 and `flags=0x01`, the empty final chunk (chunk_index=0, tag_byte=0x01) AAD is:
```
6c6f2d73747265616d2d76310101101112131415161718191a1b1c1d1e1f2021222324252627000000000000000001
(47 bytes — identical to F.10 step 4 except byte 13 is 01 (flags=compressed) and byte 46 is 01 (tag_byte=final))
```
A reimplementer who writes `flags=0x00` for this chunk ("compression was bypassed, so this chunk's flag should be 0") produces an AAD mismatch and immediate AEAD failure on decrypt, with no diagnostic pointing to the flag inconsistency.

The critical interop property is not the compressed byte sequence but the encrypt-then-decrypt round-trip and the AAD binding of the flags byte.

### F.23 Storage Blob Wire Format (§11.1)

The storage blob has a fixed-layout header followed by an AEAD-protected body. No fabricated ciphertext bytes are included — the test for this section is structural validation of the header and the AAD construction, not a known-answer ciphertext output.

**Wire layout:**

```
Offset  Size  Field
------  ----  -----
0       1     version (u8) — storage key version, 1-255; 0 is reserved (AeadFailed via keyring miss — NOT InvalidData; §11.1 prohibits a pre-AEAD version-0 check as an oracle)
1       1     flags (u8)   — bit 0 = FLAG_COMPRESSED (0x01); bits 1-7 reserved (AeadFailed if set)
2       24    nonce        — 192-bit random; unique per encrypted blob
26      ≥16   ciphertext + Poly1305 tag — XChaCha20-Poly1305 AEAD output
```

Minimum blob: 42 bytes (`1 + 1 + 24 + 16`). A valid blob with zero plaintext bytes still carries a full 16-byte Poly1305 tag.

**AAD binding (§11.4):** Both `version` and `flags` are included in the AAD passed to AEAD. The AEAD operation covers bytes `[26..]` with AAD constructed from `version`, `flags`, `channel_id`, and `segment_id`. Neither `version` nor `flags` is inside the ciphertext, so an implementation that omits them from the AAD produces a malleable blob (see §11.1).

**Decryption read path:**
1. Assert `len(blob) >= 42`; else `AeadFailed` (per §12 oracle-collapse table — a sub-42-byte blob returns `AeadFailed`, not `InvalidData` or `InvalidLength`, to prevent an oracle distinguishing "too short to contain valid ciphertext" from "plausible-length blob with wrong key/tag").
2. Read `version = blob[0]`, `flags = blob[1]`, `nonce = blob[2..26]`.
3. Reject unknown flag bits as `AeadFailed`. (`version == 0` is not a separate early reject: version 0 is never loaded into the keyring, so step 4's key-not-found lookup returns `AeadFailed` — the same outcome as any other unrecognized version.)
4. Look up encryption key by `version`; if absent → `AeadFailed` (not `UnsupportedVersion` — §11.3 oracle).
5. Reconstruct AAD from `version`, `flags`, `channel_id`, `segment_id` (§11.4.1 or §11.4.2).
6. Decrypt `blob[26..]` with XChaCha20-Poly1305 using `nonce` and AAD.
7. If compressed (`flags & 0x01`), decompress with zstd.
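Steps 1 through 4 can be sketched as a header parser in which every pre-AEAD failure collapses into `AeadFailed`. The names (`parse_storage_blob`, `StorageError`, the keyring closure) are hypothetical. Steps 5 to 7 are omitted because they need external crates: AAD reconstruction per F.8, XChaCha20-Poly1305 decryption, and zstd decompression.

```rust
const MIN_BLOB_LEN: usize = 1 + 1 + 24 + 16; // version + flags + nonce + Poly1305 tag = 42
const FLAG_COMPRESSED: u8 = 0x01;

#[derive(Debug, PartialEq)]
enum StorageError {
    AeadFailed, // every pre-AEAD rejection collapses to this variant
}

struct ParsedBlob<'a> {
    version: u8,
    flags: u8,
    nonce: [u8; 24],
    body: &'a [u8], // ciphertext || tag, i.e. blob[26..]
    key: [u8; 32],
}

/// Steps 1-4 of the read path. `key_for_version` models the keyring lookup.
fn parse_storage_blob(
    blob: &[u8],
    key_for_version: impl Fn(u8) -> Option<[u8; 32]>,
) -> Result<ParsedBlob<'_>, StorageError> {
    if blob.len() < MIN_BLOB_LEN {
        return Err(StorageError::AeadFailed); // step 1: too short
    }
    let (version, flags) = (blob[0], blob[1]); // step 2
    let mut nonce = [0u8; 24];
    nonce.copy_from_slice(&blob[2..26]);
    if flags & !FLAG_COMPRESSED != 0 {
        return Err(StorageError::AeadFailed); // step 3: a reserved bit (1-7) is set
    }
    // Step 4: version 0 is never in the keyring, so it fails here like any unknown version.
    let key = key_for_version(version).ok_or(StorageError::AeadFailed)?;
    Ok(ParsedBlob { version, flags, nonce, body: &blob[26..], key })
}
```

No branch above returns a distinct error, which matches the oracle-collapse rule. There is deliberately no separate early check for `version == 0`.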

**F.8 covers the AAD construction** with a full worked example. Combine F.8 with the wire layout above to validate a complete storage encrypt/decrypt round-trip.

### F.24 ML-DSA-65 Seed-to-Public-Key (§2.1 Cross-Library Verification)

This vector verifies the §2.1 portable cross-library verification procedure: extract candidate 32 bytes, call `ML-DSA.KeyGen_internal(candidate)`, compare the resulting public key against the known public key.

**Input**: seed `ξ` = `[0xAA] × 32` (32 bytes, all 0xAA)

```
seed (32 bytes, hex):
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
```

**Expected output**: ML-DSA-65 public key (1952 bytes). Computed via `MlDsa65::from_seed(ξ).verifying_key().encode()` using the `ml-dsa` Rust crate (the version pinned in soliton's `Cargo.lock`). `ML-DSA.KeyGen_internal(ξ)` is FIPS 204 Algorithm 6, and it runs deterministically with no CSPRNG input.

```
public_key (1952 bytes, hex):
2a3cd553791045a9363393c3f720866028e048bf598a099e8f81043491fb7095
71fe64caa83e93bac8c1c931d1a148aa8d04a37d42c6cfdaebc0638c7fff12b7
0ab0d76c209239bc4bbdfb36d2a676098792a1f9a5fd388292300e416e693633
1026274889fc21b80701d39a2c564f11417a20c2be4dae84ab0743071bb97bf8
c65c4520a6fd5c4e48e759bfac3e857a5fc23de915dee91d3fe6e83b5230ec28
d77b478c4831d19bf26e697abf5c890f527e6fef6f0499b69a490af5dc6e5e43
3b1c168ba9e9e51aab125d0927d1aaa5cc17cd649b6a5ca83418b163d9dd487c
fcbdcebb7d6386ed26ff22f4cdd329dfce0de2667d1809401a649cdcf4c4232b
06abcaa82c2d8277a12622045de61d224ac6913b488d885822b2c1a7e5c1be41
61eed1c5a79da7d86738e4d77740591090216554246cc12aa89ebd9c024e054a
1a9fc28d18b6263dd95cd9e5e50a28a615742f1c43a1326bb004f9fe0856672a
7e7873226d222f949032c17f3a13e7a9a1812f496cfa88d1261bde89a7d8117f
cd1e7fa50cc26072d516613cb75f457b7f7681f9b5c58c0fa13be6fc56ec446f
5c1347b62cde77c950d368fc63329a35f6584bfd74ae769fafdb7982601be1ad
d10816d57b85a647b7bf772a21d56453303b67825c58a9f71b0fb644b6ce6351
2dcd054ae0dc5995abed531098a1235c757ceada7e643004530173eccb2e2d3f
114a578cd7cca8304a4cab4fe39e206a089193f2566ff811da3eca6e634594e9
b330b85f10fb50304b997d387189aa121746aba38897c1691ca2fe590e2f12e1
bfb84106b043dcb8e8ec7009d8247cab028b90e792b9d186f20ed3e6ec0dd419
b54f953572ade2e144c3bade312cbe92d52e7c8ed350af61c24848dfa4686f30
fecb15b25e5618797e78add739e542b725f517fdd0ab4084a5d4da81bfe6e226
72b5f8817be017a28674e97d0f7f8410e7bb7257ab5131e1b56ca21cc7c57b75
c4a5b05992971f46d2648675b829ed71bfb49b5dfba39c071ac95cbe42d9dfc1
1bb81ad316e7656b55dba3f8a5786c050607d355791d5406c9e21c99a6ba2763
44eb1e755c8d83344a344ad5ea149051b91729c7cafdf5252d5a766ede05ad9e
1ef06e5dbf7de24486155caa2e92275d54f8c2df4e85f29605b975a9c2bbe775
f33761fc05d894a0834f96f5355cb63b83f0e90e5b5111bdca71611c93df96db
72a1db4723fdf4184c7f62f1e3efca954a772667effc9b553b7ab91c644cdbfb
a15c5f5c9e6b38e4df2ade1dd0b098739c47b39f5520eb2f584d0e353f90eddd
20320800545eec44c51f6c41618dc1451041ecb958351e2ef04a5fc13a7195c9
bc0a397944da82bdfa7ac46aeb05bee813944b25e66b311263f9d0f3d9bb6f5e
242a53b2ab9322eacf70388bd5be0ad4990ae9d7e3abaca428ce50c6eb35c9ee
0ced604ac17db0443b2ad1fe6d9bd9397457f2ce0f5e8665d9acd96b924344ad
3bc45cc0cf392d48b2e4dabbb07da0e2ba5561d346a952bd20054d035a4ff378
a4108dc0092ee25b40be9056a235aaf9aa314874351c99ec0bcbfa7e9bb6b1bd
e74fa506f863008058482be9fcb67449b2c2566b6d011985c4311fc5f1551bfc
a25699123a68e2a790cf2388120f28ff411836a3e95a97c6f60633b7cc27bea6
5f9323abb9731b222e28db4765748a3bfcdb962f0e290025e3b28b5642d891cd
dc09f1489aea45eee7f57ff231be716204192fbcf5fb574bae1d7e1e6e039bfe
f7dcd93162c093c11a80c2ab5b127a4f214ebb03dae20dcc38afa246320ee8e4
4a2ca97fd265fc7813ddd5efbacb981f401fbaa895e60a58a6e7d44bc1a17873
d523bc2659256e0e73eec555a6f7c799c74902e0ddb593c8d76937623feb0bcc
ee816d42841cf7935383435ba4fc4d4b6d36235da237fc2dcdd6d0e774157616
503a174417cd325e2c3ebe0b520d94f6f4c18a8d01a2daa087ce0fce85e61aa4
f126664d073773f8a927ed8c4d4d9724449af637a8e3a8bc15ddf2ec19f9f5b2
d0ad84dbf2a59fce072b132ba38bf5d985a966cabce4cc5f31a87101b56eb7e6
2b37e3fd0091afb9c8063cba5184234ceb4938313440678d4dd6c9eafa88d986
6e56fe75fa063396a66be4833ebef1e0b25ced5d5f5bceb26b7cb4775e792926
58c81b2405d2b488a0a881a2589749acbd0af912308aff5450d87dde8a0ee25b
7fee2a7d3b76e5237bfe6fc890f009438e539e1719864958c2bf3b63e43fae41
e591b53a8fbac2ec3f37d5b74a6d2e83b9a1e050ffc082e415e39288d51fabbb
1c791c2ccccef44e6f9c2886c506e561ad372cc20b691aba206d14d007518c3f
4b7aeabfe836f5c55fd65fb85d09948b652e1156983678ab6dffe739ca888614
3fe630b2a10741aa81d6a79fdfd3d144f2ce43d39ad5cd55e42d4f6deef1f406
2fd03aa83676a0a945dbd702e5f8c111c84d74c3d5a53d72a426c8ca5bc3f4f2
a4226d2efe1c1a476fd65d69a2d85216213108b45e5567bcba7f9ac9c73d7173
21561d56589eafa13f49fbeaa1ad47a3f2cb4f4f64f8b2055ea5968035a12b34
f0735981e2f3ebf50c51ba7e1f3e1f9b0ab892eb90e04d3a4d5e924b4280fb5f
e9e018f9d0bec7b53a097986abc61f6b3c9ef97dc30e97b4841cd1d64303646b
15b4fc5f99e97af00bc205a5c53097f572f0914fbc706c7164e1564396bfaa7d
ac3531d2c109a62c16ef9e81b49dbd91d7669bf5cf2ff875539b2ee691215114
```

**How to use**: This vector is for a reimplementer who cannot call `to_seed()` on their ML-DSA library, for example because the library does not expose the 32-byte seed. Follow the procedure in §2.1:

1. Extract the candidate 32 bytes from whatever API the library provides.
2. Call `ML-DSA.KeyGen_internal(candidate)` (FIPS 204 §6.1; deterministic, no CSPRNG input).
3. Compare the resulting 1952-byte public key against the known public key.

If the result matches this vector for `ξ = [0xAA] × 32`, the candidate extraction is correct.

**Note**: soliton signs and verifies with `Sign_internal` / `Verify_internal` rather than the standard domain-separated FIPS 204 API (see the §2.1 module doc). That choice does not affect key generation. The public key above is in the standard FIPS 204 `pkEncode` format and is byte-for-byte comparable with any compliant ML-DSA-65 implementation.
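The §2.1 procedure amounts to a search over candidate seeds. In this sketch, `keygen_internal` is a placeholder closure for whatever FIPS 204 Algorithm 6 entry point your library exposes; its real name and signature vary by library. `find_seed` is a hypothetical helper.

```rust
const MLDSA65_PK_LEN: usize = 1952;

/// Return the first candidate whose deterministic key generation reproduces
/// `known_pk`. `keygen_internal` stands in for ML-DSA.KeyGen_internal + pkEncode.
fn find_seed<'a, F>(candidates: &'a [[u8; 32]], known_pk: &[u8], keygen_internal: F) -> Option<&'a [u8; 32]>
where
    F: Fn(&[u8; 32]) -> Vec<u8>,
{
    if known_pk.len() != MLDSA65_PK_LEN {
        return None; // not an encoded ML-DSA-65 public key
    }
    // Public keys are not secret, so an ordinary (non-constant-time) comparison is fine.
    candidates
        .iter()
        .find(|seed| keygen_internal(*seed).as_slice() == known_pk)
}
```

In practice, `candidates` might hold the 32-byte windows of whatever secret-key export the library offers. The hex above serves as `known_pk` for `ξ = [0xAA] × 32`.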

---

### F.25 Standalone HKDF-SHA3-256 Primitive (§5.4, §6.4)

**Purpose**: Validates HKDF-SHA3-256 in isolation before any composed operation. The primary interop trap is the HMAC block size. RFC 5869 test vectors use HMAC-SHA-256 (64-byte block), not HMAC-SHA3-256 (136-byte block, §4.3). A reimplementer who pads the HMAC key to 64 bytes instead of 136 produces wrong output in every derived key, with no diagnostic.

```
salt (32 bytes): 0000000000000000000000000000000000000000000000000000000000000000
ikm  (64 bytes): 0101010101010101010101010101010101010101010101010101010101010101
                 0101010101010101010101010101010101010101010101010101010101010101
info (15 bytes): "lo-test-hkdf-v1"  (raw UTF-8, no length prefix)
len:             64

output (64 bytes):
  4a694c255636bd5a472c807cf1400a05f78a4a3e93b7f663dd6825c9d496904c
  6224e025169b8c67e62ed3b10129da39c546d6e84c84920f69232fd8e76e7cf0
```

Verified by `tests/compute_vectors.rs::f25_hkdf_sha3_256` in the reference implementation.
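For cross-checking without pulling in a crate, the whole chain fits in a short dependency-free sketch. It is a straightforward, not constant-time, implementation of four pieces:

- Keccak-f[1600].
- SHA3-256 with the 136-byte rate.
- HMAC per RFC 2104 with a 136-byte block.
- HKDF extract/expand per RFC 5869.

It is meant for testing only and does not replace the `sha3`/`hmac`/`hkdf` crates the reference implementation uses.

```rust
// Dependency-free SHA3-256, HMAC-SHA3-256 and HKDF-SHA3-256 for cross-checking
// test vectors. Not constant-time; use audited crates in production.
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rho rotation amounts and pi lane order, in the order the pi cycle visits lanes.
const ROTC: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];
/// SHA3-256 rate in bytes, which is also the HMAC-SHA3-256 block size.
const RATE: usize = 136;

fn keccak_f(st: &mut [u64; 25]) {
    for &rc in RC.iter() {
        let mut bc = [0u64; 5]; // theta: column parities
        for i in 0..5 { bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20]; }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) { st[j + i] ^= t; }
        }
        let mut t = st[1]; // rho + pi: walk the lane permutation cycle
        for i in 0..24 {
            let next = st[PILN[i]];
            st[PILN[i]] = t.rotate_left(ROTC[i]);
            t = next;
        }
        for j in (0..25).step_by(5) { // chi: nonlinear row mixing
            let row = [st[j], st[j + 1], st[j + 2], st[j + 3], st[j + 4]];
            for i in 0..5 { st[j + i] ^= !row[(i + 1) % 5] & row[(i + 2) % 5]; }
        }
        st[0] ^= rc; // iota
    }
}

/// XOR one rate-sized block into the state (lanes are little-endian), then permute.
fn absorb(st: &mut [u64; 25], block: &[u8]) {
    for (i, &b) in block.iter().enumerate() { st[i / 8] ^= (b as u64) << (8 * (i % 8)); }
    keccak_f(st);
}

fn sha3_256(data: &[u8]) -> [u8; 32] {
    let mut st = [0u64; 25];
    let mut blocks = data.chunks_exact(RATE);
    for block in &mut blocks { absorb(&mut st, block); }
    let rem = blocks.remainder();
    let mut last = [0u8; RATE];
    last[..rem.len()].copy_from_slice(rem);
    last[rem.len()] ^= 0x06; // SHA-3 domain bits + first pad bit
    last[RATE - 1] ^= 0x80; // final pad bit
    absorb(&mut st, &last);
    let mut out = [0u8; 32];
    for (i, o) in out.iter_mut().enumerate() { *o = (st[i / 8] >> (8 * (i % 8))) as u8; }
    out
}

fn hmac_sha3_256(key: &[u8], data: &[u8]) -> [u8; 32] {
    let mut k0 = [0u8; RATE]; // keys up to 136 bytes are zero-padded, not hashed
    if key.len() > RATE {
        k0[..32].copy_from_slice(&sha3_256(key));
    } else {
        k0[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k0.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(data);
    let mut outer: Vec<u8> = k0.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha3_256(&inner));
    sha3_256(&outer)
}

fn hkdf_sha3_256(salt: &[u8], ikm: &[u8], info: &[u8], len: usize) -> Vec<u8> {
    assert!(len <= 255 * 32, "HKDF output is limited to 255 hash blocks");
    let prk = hmac_sha3_256(salt, ikm); // extract
    let (mut okm, mut t): (Vec<u8>, Vec<u8>) = (Vec::with_capacity(len), Vec::new());
    let mut counter = 1u8;
    while okm.len() < len {
        // expand: T(i) = HMAC(PRK, T(i-1) || info || i), with T(0) empty
        t.extend_from_slice(info);
        t.push(counter);
        t = hmac_sha3_256(&prk, &t).to_vec();
        okm.extend_from_slice(&t);
        counter = counter.wrapping_add(1);
    }
    okm.truncate(len);
    okm
}
```

If both this sketch and the vector are correct, the call below should reproduce the F.25 output:

```rust
hkdf_sha3_256(&[0u8; 32], &[0x01u8; 64], b"lo-test-hkdf-v1", 64)
```

The same `hmac_sha3_256` applies to the long-key case in F.33.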

### F.26 Standalone XChaCha20-Poly1305 Primitive (§3, §6.5)

**Purpose**: Validates the AEAD primitive directly: `(key, nonce, plaintext, aad) → ciphertext || tag`. Isolates AEAD bugs — wrong key/nonce byte ordering, wrong AD binding, wrong tag placement — before full session integration.

```
key       (32 bytes): 0202020202020202020202020202020202020202020202020202020202020202
nonce     (24 bytes): 030303030303030303030303030303030303030303030303
plaintext (11 bytes): "hello world"  (raw UTF-8)
aad       (15 bytes): "lo-test-aead-v1"  (raw UTF-8)

ciphertext (11 bytes): 356c4d3352734de8f25fe3
tag        (16 bytes): 91c8f97e537cf5c7d3f07d2b03388f77

wire output (27 bytes, ciphertext || tag):
  356c4d3352734de8f25fe391c8f97e537cf5c7d3f07d2b03388f77
```

Verified by `tests/compute_vectors.rs::f26_xchacha20_poly1305` in the reference implementation.

**`soliton_aead_decrypt` error boundary — `InvalidLength` vs `AeadFailed` for undersized ciphertext**: The standalone AEAD function has an asymmetry compared to ratchet/stream decrypt; this is not shown in a success vector but is documented here for completeness.

| `ciphertext_len` | `soliton_aead_decrypt` return |
|------------------|-------------------------------|
| 0 | `InvalidLength` (-1) — the CAPI zero-length guard fires before any AEAD operation |
| 1-15 | `AeadFailed` (-4) — non-zero length passes the CAPI guard; too short to contain a 16-byte Poly1305 tag, so AEAD authentication fails |
| ≥ 16, wrong key/tag | `AeadFailed` (-4) — AEAD authentication failure |
| ≥ 16, correct | 0 (success) |

**Contrast with ratchet/stream**: `soliton_ratchet_decrypt` and `soliton_stream_decrypt_chunk` return `AeadFailed` for ALL undersized inputs including `len = 0` (oracle-collapse requirement, §12). `soliton_aead_decrypt` with `len = 0` returns `InvalidLength` — the CAPI zero-length guard fires first. A binding author who tests only the success path and then applies the ratchet/stream AeadFailed pattern to `soliton_aead_decrypt` diverges from the reference for the `len = 0` case.
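A binding can keep this asymmetry explicit by routing error selection through one pure function. This sketch uses hypothetical names (`DecryptApi`, `undersized_ciphertext_error`). It treats "undersized" as shorter than the 16-byte Poly1305 tag. The ratchet and stream paths may apply larger minimums of their own, which this does not model.

```rust
const INVALID_LENGTH: i32 = -1;
const AEAD_FAILED: i32 = -4;
const POLY1305_TAG_LEN: usize = 16;

#[derive(Clone, Copy)]
enum DecryptApi {
    StandaloneAead,  // soliton_aead_decrypt
    RatchetOrStream, // soliton_ratchet_decrypt, soliton_stream_decrypt_chunk
}

/// Error code for an undersized ciphertext, or None if the length is large
/// enough to attempt AEAD authentication.
fn undersized_ciphertext_error(api: DecryptApi, ciphertext_len: usize) -> Option<i32> {
    match (api, ciphertext_len) {
        (DecryptApi::StandaloneAead, 0) => Some(INVALID_LENGTH), // CAPI zero-length guard
        (_, n) if n < POLY1305_TAG_LEN => Some(AEAD_FAILED), // oracle collapse
        _ => None,
    }
}
```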

### F.27 HybridSign / HybridVerify (§3.1)

**Purpose**: Validates the composite signature layout (Ed25519 || ML-DSA-65 concatenation) and the `Sign_internal` / `Verify_internal` internal-API requirement (not the FIPS 204 public API). Non-determinism is eliminated by pinning ML-DSA `rnd = [0x00] × 32` — this is test-only; production signing uses fresh `getrandom` entropy.

```
Identity secret key sub-components (used to construct 2496-byte SK):
  X-Wing sk (bytes    0..2432): [0x01] × 2432  (not used for signing)
  Ed25519 seed (bytes 2432..2464): [0x02] × 32
  ML-DSA-65 seed (bytes 2464..2496): [0x03] × 32

message (15 bytes): "lo-test-sign-v1"  (raw UTF-8)
ML-DSA rnd (32 bytes, test-only): [0x00] × 32

Ed25519 signature (bytes 0..64):
  21aafa2d66a4774e163064717412a2694527c84cdc57e93370ba05738940bdd0
  facc5cb6330088ce849635ac41a0099842a40ef82cb0046f6978eeb7196be00f

ML-DSA-65 signature (bytes 64..3373): [3309 bytes — see reference test for full hex]

composite signature (3373 bytes total):
  Ed25519 component: 21aafa2d66a4774e163064717412a2694527c84cdc57e93370ba05738940bdd0
                     facc5cb6330088ce849635ac41a0099842a40ef82cb0046f6978eeb7196be00f
  ML-DSA-65 component starts: 1b47e0e18a96f465b42396b24a77f72f...
  ML-DSA-65 component ends:   ...000000000000000000000000000000000000000000000000000a1316161b1e
```

Full 6746-char hex is in `EXPECTED_F27` in `tests/compute_vectors.rs`. Verified by `tests/compute_vectors.rs::f27_hybrid_sign_verify` which also runs `hybrid_verify` against the assembled composite. **In-document verification limitation**: The 3309-byte ML-DSA-65 component is too large to embed in full; no SHA3-256 hash of the ML-DSA-65 component is provided inline. Standalone verifiers who cannot access the repository must compute `SHA3-256(composite[64..3373])` from their own implementation output and cross-check against a trusted build, rather than comparing against an in-document hash.

**Missing partial-failure vectors for `HybridVerify`**: No vectors are provided for the two partial-failure cases:

1. Ed25519 component valid and ML-DSA-65 component corrupted (e.g., byte 64 flipped).
2. ML-DSA-65 component valid and Ed25519 component corrupted (e.g., byte 0 flipped).

Both cases MUST return `VerificationFailed`. §3.2 additionally requires that `HybridVerify` evaluate BOTH components, in constant time, before combining the results. A `&&` short-circuit that returns early on the first failure leaks, through timing, which component failed.

A known-answer vector can check only the return value, not this timing property. Reimplementers MUST therefore check the structure of their `HybridVerify` directly: evaluate `Ed25519.Verify()` AND `MLDSA.Verify()` independently, then combine with `&` (not `&&`).
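The non-short-circuit requirement takes only a few lines to express. In this sketch, `ed25519_verify` and `mldsa65_verify` are caller-supplied closures standing in for real verifiers; the ML-DSA one would call `Verify_internal`. The function name and closure-based signature are illustrative, not soliton's API. Rust's `&` on `bool` evaluates both operands, unlike `&&`. This guarantees that both checks run, but it does not by itself make the surrounding code constant-time.

```rust
const ED25519_SIG_LEN: usize = 64;
const MLDSA65_SIG_LEN: usize = 3309;
const COMPOSITE_SIG_LEN: usize = ED25519_SIG_LEN + MLDSA65_SIG_LEN; // 3373

/// Sketch of the HybridVerify combining step over an Ed25519 || ML-DSA-65 composite.
fn hybrid_verify<E, M>(composite: &[u8], ed25519_verify: E, mldsa65_verify: M) -> bool
where
    E: FnOnce(&[u8]) -> bool,
    M: FnOnce(&[u8]) -> bool,
{
    if composite.len() != COMPOSITE_SIG_LEN {
        return false; // length is public; this early exit reveals nothing about either component
    }
    let (ed_sig, mldsa_sig) = composite.split_at(ED25519_SIG_LEN);
    // Non-short-circuiting `&`: both verifiers run, left to right, even if the first fails.
    ed25519_verify(ed_sig) & mldsa65_verify(mldsa_sig)
}
```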

### F.28 Streaming AEAD End-to-End Wire Vector (§15)

**Purpose**: Provides complete wire bytes for a two-chunk stream. The primary interop trap: `tag_byte` is outside the AEAD call (used in AAD and XORed into the nonce), while the 16-byte Poly1305 tag is inside. A reimplementer who passes `tag_byte` as plaintext produces a different wire format — passes all nonce/AAD checks but fails AEAD on the receiver.

**Non-final chunk size note**: This vector uses 16-byte plaintext for the non-final chunk (chunk 0) for compactness. In production, `soliton_stream_encrypt_chunk` with `is_last=false` requires the plaintext to be exactly `CHUNK_SIZE` (1,048,576 bytes) — a non-final chunk whose plaintext is not exactly `CHUNK_SIZE` is rejected with `InvalidData` (-17) by the core library (not `InvalidLength` — `InvalidLength` fires only when the output buffer is too small; the constraint on plaintext size is a semantic content check, which maps to `InvalidData`). The F.28 vector was computed at the primitive level (direct AEAD calls with the correct nonce and AAD, bypassing the CAPI chunk-size guard), so the wire bytes are protocol-correct. A reimplementer building streaming AEAD MUST enforce the same constraint: non-final chunks must be full size (1 MiB), and only the final chunk may be smaller. The AEAD itself does not enforce this — it will accept any size — so the guard must be in the framing layer.

```
key        (32 bytes): 0404040404040404040404040404040404040404040404040404040404040404
base_nonce (24 bytes): 050505050505050505050505050505050505050505050505
flags:       0x00  (no compression)
caller_aad:  empty

header (26 bytes):
  01 00 050505050505050505050505050505050505050505050505
  hex: 0100050505050505050505050505050505050505050505050505

chunk 0 — non-final (tag_byte=0x00), plaintext=[0x41]×16:
  nonce mask = chunk_index(8B) || tag_byte(1B) || 0x00×15(15B) = 24 bytes total
             = 0000000000000000 || 00 || 000000000000000000000000000000
  nonce = base_nonce XOR (all-zero mask) = base_nonce (unchanged)
  aad   = "lo-stream-v1" || 0x01 || 0x00 || base_nonce || 0000000000000000 || 0x00
  wire (33 bytes): 00d5425e7085cc776bc8c608ad84c41cc37eefb10d2b859ebddf8c1187c616c0c4
    tag_byte:   00
    ciphertext: d5425e7085cc776bc8c608ad84c41cc3
    tag:        7eefb10d2b859ebddf8c1187c616c0c4

chunk 1 — final (tag_byte=0x01), plaintext=[0x42]×8:
  nonce mask = chunk_index(8B) || tag_byte(1B) || 0x00×15(15B) = 24 bytes total
             = 0000000000000001 || 01 || 000000000000000000000000000000
  nonce = base_nonce XOR mask
        = 0505050505050504 || 04 || 050505050505050505050505050505
        (8 B)               (1 B) (15 B — 30 hex chars)
        flat (48 hex chars): 050505050505050404050505050505050505050505050505
  aad   = "lo-stream-v1" || 0x01 || 0x00 || base_nonce || 0000000000000001 || 0x01
  wire (25 bytes): 01aac61cb7b722895cb246433e7ebc081e92150081150d345d
    tag_byte:   01
    ciphertext: aac61cb7b722895c  (8 bytes — matches 8-byte plaintext)
    tag:        b246433e7ebc081e92150081150d345d  (16 bytes — Poly1305 tag)

full stream hex (84 bytes):
  010005050505050505050505050505050505050505050505050500d5425e7085cc776bc8c608ad84c41cc37eefb10d2b859ebddf8c1187c616c0c401aac61cb7b722895cb246433e7ebc081e92150081150d345d
```

Verified by `tests/compute_vectors.rs::f28_streaming_aead_wire` in the reference implementation.
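The nonce mask and chunk framing from this vector can be sketched as follows. The helper names are hypothetical. The AEAD call itself is left out, since it would come from an XChaCha20-Poly1305 library, so `chunk_wire` takes the AEAD output (ciphertext || tag) as input. The length guard mirrors the framing-layer rule stated earlier in this section and returns `InvalidData` (-17) as a plain `i32`.

```rust
const CHUNK_SIZE: usize = 1_048_576; // 1 MiB
const INVALID_DATA: i32 = -17;

/// nonce = base_nonce XOR (chunk_index as u64 BE || tag_byte || 0x00 * 15)
fn chunk_nonce(base_nonce: &[u8; 24], chunk_index: u64, tag_byte: u8) -> [u8; 24] {
    let mut mask = [0u8; 24];
    mask[..8].copy_from_slice(&chunk_index.to_be_bytes());
    mask[8] = tag_byte;
    let mut nonce = *base_nonce;
    for (n, m) in nonce.iter_mut().zip(mask.iter()) {
        *n ^= m;
    }
    nonce
}

/// Wire form of one chunk: tag_byte (outside the AEAD) || ciphertext || 16-byte tag.
fn chunk_wire(tag_byte: u8, aead_output: &[u8]) -> Vec<u8> {
    let mut wire = Vec::with_capacity(1 + aead_output.len());
    wire.push(tag_byte);
    wire.extend_from_slice(aead_output);
    wire
}

/// Framing-layer guard: non-final chunks must be exactly CHUNK_SIZE.
/// (Any upper bound on the final chunk is not modeled here.)
fn check_chunk_len(plaintext_len: usize, is_last: bool) -> Result<(), i32> {
    if !is_last && plaintext_len != CHUNK_SIZE {
        Err(INVALID_DATA)
    } else {
        Ok(())
    }
}
```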

### F.29 Argon2id + XChaCha20-Poly1305 Passphrase-Protected Key Blob (§10.6)

**Purpose**: End-to-end vector for the §10.6 recommended composition: `salt(16) || nonce(24) || AEAD_ciphertext`. F.20 covers Argon2id output in isolation; this covers the full assembly where the derived key feeds directly into XChaCha20-Poly1305 with the identity fingerprint as AAD. The easy mistake: using the wrong AAD (empty, or the salt, or something other than the identity fingerprint) or inserting an extra HKDF step between Argon2id output and the AEAD key — both produce incompatible ciphertext with no error at encryption time.

```
password (18 bytes): "lo-test-passphrase"  (raw UTF-8)
argon2_salt (16 bytes): 06060606060606060606060606060606
argon2 params: OWASP_MIN (m=19456 KiB, t=2, p=1)

aad / fingerprint (32 bytes): SHA3-256([0x00] × 3200)
  = 1fc29a619ef720eaf2966023f1d22c797a31a7ad6c9fd94b7fb28dfff94c5e4b

derived_key (32 bytes, Argon2id output):
  2058fdb73306ec7271061be269fccaf39756b8666248172d6923976e377f5d30

aead_nonce (24 bytes): 070707070707070707070707070707070707070707070707
plaintext  (17 bytes): "test-key-material"  (raw UTF-8)

ciphertext || tag (33 bytes):
  f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede

blob (73 bytes, salt || nonce || ciphertext || tag):
  06060606060606060606060606060606
  070707070707070707070707070707070707070707070707
  f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede

hex: 06060606060606060606060606060606070707070707070707070707070707070707070707070707f90394fa7144500a63da86ca3ff6d900f855314f4c9030ab88b060a0ab41b9eede
```

Verified by `tests/compute_vectors.rs::f29_passphrase_key_blob` in the reference implementation.
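Parsing the blob is a fixed split. This sketch (hypothetical name `split_key_blob`) only separates the fields and enforces a 56-byte minimum: a 16-byte salt, a 24-byte nonce, and a 16-byte tag. The Argon2id call and the AEAD decryption would come from external crates and are left to the caller. Per the composition above, the Argon2id output is the AEAD key directly (no HKDF step), and the identity fingerprint is the AAD.

```rust
const SALT_LEN: usize = 16;
const NONCE_LEN: usize = 24;
const TAG_LEN: usize = 16;

/// Split salt(16) || nonce(24) || ciphertext || tag into its three parts.
fn split_key_blob(blob: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    if blob.len() < SALT_LEN + NONCE_LEN + TAG_LEN {
        return None; // cannot hold even an empty plaintext's tag
    }
    let (salt, rest) = blob.split_at(SALT_LEN);
    let (nonce, sealed) = rest.split_at(NONCE_LEN);
    Some((salt, nonce, sealed))
}
```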

### F.30 `from_bytes_with_min_epoch` Rejection Boundary (§6.8)

**Purpose**: Verify the strict `>` boundary in anti-rollback deserialization. The condition is `epoch > min_epoch` — equal epoch is rejected. This is the boundary that prevents replaying the *current* epoch's blob (not only older ones).

**Test procedure**: Obtain any valid ratchet blob. Read the epoch value from bytes 1-8 (u64 big-endian, immediately after the 1-byte version tag — see F.21 layout). Let N be the deserialized epoch.

```
blob epoch N

from_bytes_with_min_epoch(blob, N - 1) → Ok     ← epoch N > min_epoch N-1 ✓
from_bytes_with_min_epoch(blob, N)     → InvalidData (-17)  ← epoch N ≤ min_epoch N (equal, not strictly greater)
from_bytes_with_min_epoch(blob, N + 1) → InvalidData (-17)  ← epoch N < min_epoch N+1
```

**Patching for boundary testing**: To produce a blob with a specific epoch without running a full session, take any valid blob and overwrite bytes 1-8 with the desired epoch as u64 big-endian. Epoch = 1 → `00 00 00 00 00 00 00 01`. The blob must otherwise be valid (pass `from_bytes` guards) — patch only the epoch field.

**Off-by-one hazard**: A reimplementer who uses `>=` instead of `>` accepts the current epoch's blob, defeating rollback protection for the common case where the adversary replays the most recent serialized state.

**Error code**: `InvalidData (-17)` on rejection. Not `InvalidLength`, `UnsupportedVersion`, or any other variant — those would let the caller misclassify a rollback attempt as a format error.
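The boundary reduces to one strict comparison. This sketch assumes the error is surfaced as the raw CAPI code, and `blob_epoch` and `check_min_epoch` are illustrative names. When testing with `N - 1`, avoid `N = 0`, where the subtraction underflows.

```rust
const INVALID_DATA: i32 = -17;

/// Epoch field of a serialized ratchet blob: u64 BE at bytes 1..9.
fn blob_epoch(blob: &[u8]) -> Option<u64> {
    let mut be = [0u8; 8];
    be.copy_from_slice(blob.get(1..9)?);
    Some(u64::from_be_bytes(be))
}

/// Anti-rollback check: accept only if the blob's epoch is STRICTLY greater.
fn check_min_epoch(epoch: u64, min_epoch: u64) -> Result<(), i32> {
    if epoch > min_epoch { Ok(()) } else { Err(INVALID_DATA) } // `>=` here would be the bug
}
```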

### F.31 Stream Header — Compressed Stream (§15.2)

**Purpose**: Pin the 26-byte stream header wire encoding for `flags=0x01` (compressed). The flags byte distinguishes compressed from uncompressed streams and is also bound into every per-chunk AAD (F.10, F.22) — a reimplementer who misplaces or omits it in either the header or the AAD produces unreadable ciphertext.

Using the F.10 base_nonce (`101112131415161718191a1b1c1d1e1f2021222324252627`):

```
version (1 byte):   01
flags   (1 byte):   01  (bit 0 = compressed)
base_nonce (24 bytes): 101112131415161718191a1b1c1d1e1f2021222324252627

header (26 bytes): 0101101112131415161718191a1b1c1d1e1f2021222324252627
```

The header is a concatenation with no length prefix, no delimiter. Any reimplementer who inserts a 1-byte or 2-byte length prefix before `version`, or who encodes `flags` as a 2-byte field, shifts all subsequent byte offsets and causes a parse failure at the first chunk.
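A header builder makes the fixed layout explicit. The helper name is hypothetical, and the version is hard-coded to 0x01 as in the vectors.

```rust
/// version(1) || flags(1) || base_nonce(24): no length prefix, no delimiter.
fn stream_header(flags: u8, base_nonce: &[u8; 24]) -> [u8; 26] {
    let mut h = [0u8; 26];
    h[0] = 0x01; // version
    h[1] = flags; // bit 0 = compressed; also bound into every per-chunk AAD
    h[2..].copy_from_slice(base_nonce);
    h
}
```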

### F.32 Streaming AEAD Random-Access Byte Offset (§15.3)

**Purpose**: Confirm the `byte_offset(N)` formula for random-access decryption. A reimplementer computing the wrong chunk stride cannot seek correctly and will either read garbage or fail AEAD authentication on every chunk beyond the first.

Parameters: chunk_size = 1 MiB = 1,048,576 bytes.

```
chunk wire size = 1 (tag_byte) + 1,048,576 (ciphertext) + 16 (AEAD tag)
               = 1,048,593 bytes

byte_offset(N) = 26 + N × 1,048,593

byte_offset(0) =          26  ← first chunk starts immediately after 26-byte header
byte_offset(1) =   1,048,619  ← 26 + 1,048,593
byte_offset(2) =   2,097,212  ← 26 + 2 × 1,048,593
byte_offset(N) =  26 + N × 1,048,593
```

The `26` addend is the fixed stream header size (version=1, flags=1, base_nonce=24). A reimplementer who omits the header from the offset (using `N × 1,048,593` directly) seeks 26 bytes too early for every chunk, producing AEAD failure. This includes chunk 0, where the read would start inside the header.

Only the final chunk may be shorter than 1 MiB. Every non-final chunk must be exactly `CHUNK_SIZE` (see F.28), so in any valid stream all chunks before chunk N are full-size. `byte_offset(N)` therefore locates every chunk, including the final one; only the final chunk's wire length (`1 + len + 16`) varies. A stream with variable-size non-final chunks violates the §15 framing rules and cannot be seeked with this formula.
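The offset formula with overflow checking looks like the sketch below. Returning `None` on overflow is a choice made for this example, not a documented API.

```rust
const HEADER_LEN: u64 = 26; // version(1) + flags(1) + base_nonce(24)
const CHUNK_SIZE: u64 = 1_048_576;
const CHUNK_WIRE_LEN: u64 = 1 + CHUNK_SIZE + 16; // tag_byte + ciphertext + Poly1305 tag

/// Start of chunk `n` in the stream, or None if the offset overflows u64.
fn byte_offset(n: u64) -> Option<u64> {
    n.checked_mul(CHUNK_WIRE_LEN)?.checked_add(HEADER_LEN)
}
```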

### F.33 HMAC-SHA3-256 with Long Key (§3.2)

**Purpose**: Exercises the long-key branch of HMAC at the block-size boundary. SHA3-256's HMAC block size is 136 bytes; SHA-256's is 64 bytes. A 100-byte key lies on opposite sides of the two thresholds:

- Above the SHA-256 threshold, where HMAC would first hash the key to 32 bytes.
- Below the SHA3-256 threshold, where the key is zero-padded to 136 bytes without hashing.

All other HMAC vectors use 32-byte keys, which lie below both thresholds and cannot expose a wrong long-key threshold.

```
key   (100 bytes): AB × 100
data  (10 bytes):  "lo-hmac-v1"  (ASCII)

MAC (32 bytes): aa5575019f7aade135d379d92699d13d62cded9208869f9c9898d687d93ae293
```

A correct HMAC-SHA3-256 zero-pads the 100-byte key to 136 bytes, with no preliminary hashing. Consider an implementation that pads to 136 bytes but applies a 64-byte long-key threshold, for example one copied from SHA-256 code. It hashes this key to 32 bytes first and produces a different MAC. Padding to the wrong length would already break every 32-byte-key vector. A reimplementer who matches all of those but fails here has therefore almost certainly compared the key length against the wrong block size.

Verified by `tests/compute_vectors.rs::f33_hmac_sha3_256_long_key`.
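The key-preparation step is where the block size matters, so this sketch isolates it. The hash is passed in as a closure, a placeholder where SHA3-256 would go; the helper name `hmac_k0` is hypothetical. With `block_size = 136`, the 100-byte key is zero-padded without hashing. Any path that hashes this key diverges from the vector above.

```rust
/// RFC 2104 key preparation: keys longer than the block size are hashed first,
/// then the result (or the original key) is zero-padded to the block size.
fn hmac_k0(key: &[u8], block_size: usize, hash: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let mut k0 = if key.len() > block_size { hash(key) } else { key.to_vec() };
    k0.resize(block_size, 0);
    k0
}
```

Calling it with `block_size = 64` shows the other branch: the same 100-byte key is routed through the hash.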

### F.34 SHA3-256 of First-Message AAD with OPK (§5.4 Step 7)

**Purpose**: Hashing the full AAD to a 32-byte value provides a fixed-size discriminator for the 4,741-byte with-OPK AAD structure. A reimplementer who omits `ct_opk` from the encoding, reverses the `ct_opk`/`opk_id` order, or omits the `"lo-dm-v1"` label prefix produces a different hash.

Inputs (synthetic, fixed):

```
sender_fingerprint   (32 bytes): AA × 32
recipient_fingerprint (32 bytes): BB × 32
crypto_version: "lo-crypto-v1"
sender_ek   (1216 bytes): CC × 1216
ct_ik       (1120 bytes): DD × 1120
ct_spk      (1120 bytes): EE × 1120
spk_id: 42 (0x0000002A)
ct_opk      (1120 bytes): FF × 1120
opk_id: 7  (0x00000007)
```

AAD wire layout: `"lo-dm-v1"(8) || sender_fp(32) || recipient_fp(32) || encode_session_init(4669)` = 4741 bytes total.

```
SHA3-256(first_message_aad with OPK):
  ba8e4c4ffb1330f47e5ca95a63671970036a1f3d07934836548efa0403e84815
```

Verified by `tests/compute_vectors.rs::f34_first_message_aad_with_opk`.
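The AAD is a plain concatenation, as the sketch below shows. It treats the output of `encode_session_init` as opaque bytes from the reimplementer's own encoder, whose internal layout is not reproduced here. The function name is hypothetical.

```rust
const DM_LABEL: &[u8] = b"lo-dm-v1"; // 8 bytes

/// "lo-dm-v1" || sender_fp || recipient_fp || encode_session_init(si)
fn first_message_aad(sender_fp: &[u8; 32], recipient_fp: &[u8; 32], encoded_session_init: &[u8]) -> Vec<u8> {
    let mut aad = Vec::with_capacity(DM_LABEL.len() + 64 + encoded_session_init.len());
    aad.extend_from_slice(DM_LABEL);
    aad.extend_from_slice(sender_fp);
    aad.extend_from_slice(recipient_fp);
    aad.extend_from_slice(encoded_session_init);
    aad
}
```

With a 4669-byte with-OPK encoding, the result is 4741 bytes. That result is what gets hashed for the vector above.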

### F.35 HybridSign over SPK Message (§5.3)

**Purpose**: Domain label vector for SPK signing. The signed message is `"lo-spk-sig-v1" || SPK_pub`. A reimplementer who uses the wrong label (e.g., `"lo-kex-init-sig-v1"`) or signs only `SPK_pub` without the label produces a composite signature that `hybrid_verify` rejects.

Inputs (same synthetic identity key as F.27):

```
Ed25519 seed:   02 × 32
ML-DSA-65 seed: 03 × 32
ML-DSA-65 rnd:  00 × 32  (test only — production uses getrandom)
message: "lo-spk-sig-v1" (13 bytes) || CC × 1216  (total 1229 bytes)
```

Output: Ed25519 sig (64 bytes) || ML-DSA-65 sig (3309 bytes) = 3373 bytes total.

```
composite[0..64]   (Ed25519): 2856bb008aa260e6b541ead779730ad350d97feb39db4829cb4ef5520979f3c3820bda50d51fec0e16ae1b7bb2cba8016ab389222c51b46af1fa223914ad8a01
composite[64..3373] (ML-DSA-65): 93759b7f59dd...  (see EXPECTED_F35 in compute_vectors.rs)
```

Full 6746-character hex in `tests/compute_vectors.rs::EXPECTED_F35`. Verified by `tests/compute_vectors.rs::f35_hybrid_sign_spk`. **In-document verification limitation**: The ML-DSA-65 component is truncated; no inline SHA3-256 hash is provided. Standalone verifiers must compare `SHA3-256(composite[64..3373])` from their implementation against a trusted build.

### F.36 HybridSign over `encode_session_init` (§5.4 Step 6)

**Purpose**: Domain label vector for session-init signing. The signed message is `"lo-kex-init-sig-v1" || encode_session_init(si)`. A reimplementer who signs the raw `SessionInit` fields directly (instead of the encoded form) or who uses the SPK label produces a different composite signature.

Inputs (same synthetic identity key as F.27; synthetic SessionInit without OPK):

```
Ed25519 seed:   02 × 32
ML-DSA-65 seed: 03 × 32
ML-DSA-65 rnd:  00 × 32  (test only)
SessionInit: crypto_version="lo-crypto-v1", sender_fp=AA×32, recipient_fp=BB×32,
             sender_ek=CC×1216, ct_ik=DD×1120, ct_spk=EE×1120, spk_id=42,
             ct_opk=None, opk_id=None

encode_session_init output: 3543 bytes (no-OPK path)
message: "lo-kex-init-sig-v1" (18 bytes) || si_encoded (3543 bytes) = 3561 bytes
```

Output: Ed25519 sig (64 bytes) || ML-DSA-65 sig (3309 bytes) = 3373 bytes total.

```
composite[0..64] (Ed25519): c53f65e56414c595257a2e7233b91b5c52f2da83edc9c6245c63091dc83815c4c72fc53db16e5bd658826641c15e5dc33397e85b4447bff11213eb4273376c03
composite[64..3373] (ML-DSA-65): 397e85b4...  (see EXPECTED_F36 in compute_vectors.rs)
```

Full 6746-character hex in `tests/compute_vectors.rs::EXPECTED_F36`. Verified by `tests/compute_vectors.rs::f36_hybrid_sign_session_init`. **In-document verification limitation**: The ML-DSA-65 component is truncated; no inline SHA3-256 hash is provided. Standalone verifiers must compare `SHA3-256(composite[64..3373])` from their implementation against a trusted build.
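The session-init signing message differs from the SPK case only in its label and payload. A minimal sketch, assuming `si_encoded` is the exact `encode_session_init` output (the helper name is hypothetical):

```rust
/// Domain label for session-init signatures (18 bytes).
const KEX_INIT_SIG_LABEL: &[u8] = b"lo-kex-init-sig-v1";

/// Hypothetical helper: label || encode_session_init(si).
/// `si_encoded` must be the encoded form, not a concatenation of the raw
/// SessionInit fields, or the signature will not match F.36.
fn session_init_signing_message(si_encoded: &[u8]) -> Vec<u8> {
    [KEX_INIT_SIG_LABEL, si_encoded].concat()
}
```

For the no-OPK synthetic input, this yields 18 + 3543 = 3561 bytes, and the leading label bytes differ from `"lo-spk-sig-v1"`, which is what keeps an SPK signature from verifying as a session-init signature.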

### F.37 LO-Auth HMAC Token Derivation (§4)

**Purpose**: The LO-Auth proof token is `HMAC-SHA3-256(shared_secret, "lo-auth-v1")`. This vector isolates the HMAC step from the KEM by using a synthetic shared secret. The KEM round-trip is covered by X-Wing unit tests; the label and key/data order are the reimplementation risks addressed here.

```
shared_secret (32 bytes): 08 × 32
label (10 bytes): "lo-auth-v1"

token = HMAC-SHA3-256(key=shared_secret, data=label):
  4e14e7ab92b70dd587a558e208cbcd98fd933048a2b2bf90e188e1d9b04f6e2a
```

A reimplementer who swaps key and data (`HMAC(key=label, data=ss)`) or who uses a different label (e.g., `"lo-auth"` without the version suffix) produces a different 32-byte value. The token must be compared in constant time using `hmac_sha3_256_verify_raw`, never with `==`.

Verified by `tests/compute_vectors.rs::f37_lo_auth_hmac`.
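The reason to avoid `==` is that a general-purpose equality check may return as soon as it finds a differing byte, so its running time can reveal how long the matching prefix is. The sketch below illustrates the constant-time technique only; it is not a substitute for `hmac_sha3_256_verify_raw`, and `ct_eq_32` is a hypothetical name. `std::hint::black_box` is only a best-effort optimizer barrier, which is one reason to rely on the library routine in real code.

```rust
use std::hint::black_box;

/// Hypothetical illustration: compares two 32-byte tokens without early exit.
/// Every byte pair is XORed and OR-accumulated, so the loop does the same
/// work whether the first byte, the last byte, or no byte differs.
fn ct_eq_32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= black_box(x ^ y);
    }
    black_box(diff) == 0
}
```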

### F.38 Streaming AEAD `UnsupportedVersion` Rejection (§15.8)

**Purpose**: Verify that `stream_decrypt_init` (and `soliton_stream_decrypt_init` at the CAPI) returns `UnsupportedVersion` when the stream header's version byte is not `0x01`. This is a rejection-boundary test: the expected result is an error, not an output value.

The 26-byte stream header format is `version(1B) || flags(1B) || base_nonce(24B)` (§15.2 / F.31). Version `0x01` is the only currently defined version; any other byte triggers `UnsupportedVersion`.

**Test inputs** (any valid key and a header with a non-0x01 version byte):

```
key (32 bytes):                0404040404040404040404040404040404040404040404040404040404040404
header — version=0x00 (26 bytes): 0000050505050505050505050505050505050505050505050505
  version byte: 00  (not 0x01)
  flags:        00
  base_nonce:   050505050505050505050505050505050505050505050505

header — version=0x02 (26 bytes): 0200050505050505050505050505050505050505050505050505
  version byte: 02  (not 0x01)
  flags:        00
  base_nonce:   050505050505050505050505050505050505050505050505
```

**Expected result for both inputs**: `stream_decrypt_init` returns `UnsupportedVersion` (-10). No decryptor object is created.

**Additional rejection inputs**: any header whose version byte is not `0x01` (that is, `0x00` or `0x02` through `0xFF`) must produce `UnsupportedVersion`. Version `0x01` with any valid flags byte and nonce produces `Ok`.

**Reimplementation check**: A reimplementer who validates only that the version byte is non-zero (instead of exactly `0x01`) will accept version `0x02` silently. A reimplementer who skips version validation entirely will attempt to parse future-format streams with current-version rules, producing wrong decryption output with no error.

Verified by the `decrypt_init_wrong_version` (version=0x00) and `decrypt_init_version_0x02` (version=0x02) tests in the `#[cfg(test)]` section of `soliton/soliton/src/streaming.rs`.
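The exact-match rule can be sketched as a header parser. This illustrates the check under stated assumptions and is not the reference `stream_decrypt_init`: the `HeaderError` and `StreamHeader` types are hypothetical, the length check and its `InvalidLength` variant are assumptions of this sketch, and flag validation is omitted because valid flag values are defined in §15.2 rather than here.

```rust
/// Stream header: version(1) || flags(1) || base_nonce(24) = 26 bytes.
const STREAM_HEADER_LEN: usize = 26;
/// The only currently defined stream version.
const STREAM_VERSION_V1: u8 = 0x01;

#[derive(Debug, PartialEq)]
enum HeaderError {
    /// Header is not exactly 26 bytes (hypothetical variant for this sketch).
    InvalidLength,
    /// Version byte is anything other than 0x01.
    UnsupportedVersion,
}

#[derive(Debug)]
struct StreamHeader {
    flags: u8,
    base_nonce: [u8; 24],
}

/// Hypothetical parser: accepts only version 0x01; flags are passed through unchecked.
fn parse_stream_header(header: &[u8]) -> Result<StreamHeader, HeaderError> {
    if header.len() != STREAM_HEADER_LEN {
        return Err(HeaderError::InvalidLength);
    }
    // Exact equality, not `!= 0`: 0x02 is an unknown future version and must be rejected.
    if header[0] != STREAM_VERSION_V1 {
        return Err(HeaderError::UnsupportedVersion);
    }
    let mut base_nonce = [0u8; 24];
    base_nonce.copy_from_slice(&header[2..26]);
    Ok(StreamHeader { flags: header[1], base_nonce })
}
```

Returning a distinct error for an unknown version, instead of continuing with v1 rules, is what prevents a future-format stream from being parsed as if it were version `0x01`.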

### F.39 Missing Vectors — Acknowledged Gaps

The following vectors are not provided in-document. Each represents an integration failure mode that existing vectors do not cover. Reimplementers SHOULD add these as integration tests against the reference implementation.

**F.39.1 First-Message Encrypt/Decrypt End-to-End KAT (§5.4 Step 7, §5.5 Step 5)**

No vector combines F.11's `epoch_key`, a 24-byte nonce, F.18's first-message AAD, and a known plaintext into a complete encrypted first message with its expected ciphertext. The primary integration failure mode, Alice and Bob deriving different AAD values, produces `AeadFailed` on Bob's side with no diagnostic pointing to the AAD divergence. To add this vector: run `encrypt_first_message(epoch_key, plaintext, aad)` with a pinned nonce and record the 24-byte nonce and the ciphertext output; a `decrypt_first_message` call with the same inputs must then reproduce the plaintext.

**F.39.2 encode_prekey_bundle (§5.3)**

No `encode_prekey_bundle` KAT is provided (with or without OPK). F.13 covers `encode_session_init`; the bundle format is structurally different (no sender fingerprints, no KEM ciphertexts). Field ordering and the absence of length prefixes on `IK_pub`, `SPK_pub`, and `SPK_sig` are the primary reimplementation hazards. To add these vectors: call `encode_prekey_bundle` with known key material and record the SHA3-256 of the encoded output for both the OPK-present and OPK-absent cases.