# LAC Wire Format

Normative specification of the LAC bitstream. This document is the authority on byte layout, field semantics, and encoder/decoder constraints.

## 1. Conventions

- All multi-byte integer fields are **big-endian**.
- Bit streams are **MSB-first**: the first bit written occupies bit 7 of its byte, subsequent bits fill lower positions, and a new byte begins once eight bits have been emitted.
- Samples are **signed integers** passed as `i32` with magnitude bounded by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` must be a consistent sign extension of the 24-bit-magnitude value. Narrower source formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly — they trivially satisfy the magnitude bound — and compress at the bit cost of their actual values, not a 24-bit ceiling. The codec does not carry bit-depth metadata; the container or application layer is responsible for remembering the source format.
- Sample rate is **not** part of the bitstream. The container or transport carries it.
- A frame encodes **one channel**. Stereo is two independent streams of frames, one per channel.

## 2. Frame Layout

A frame is a contiguous byte sequence:

```text
+--------+--------------------+
| header | rice_bitstream     |
+--------+--------------------+
```

The `header` is fixed-structure (variable length because the coefficient array depends on `prediction_order`). The `rice_bitstream` is a bit-packed payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in the low bits of the last byte.

Decoder input is the complete frame. There is no intra-frame continuation or fragmentation — the transport layer handles that.

## 3. Frame Header

```text
Offset  Size  Field                 Type      Constraint
------  ----  --------------------  --------  ----------------------------------
0-1     2     sync_word             u16 BE    == 0x1ACC
2       1     prediction_order      u8        ∈ [0, 32]
3       1     partition_order       u8        ∈ [0, 7]
4       1     coefficient_shift     u8        ∈ [0, 5]
5-6     2     frame_sample_count    u16 BE    ≥ 1, % (1 << partition_order) == 0
7+      2·p   lpc_coefficients      i16 BE[]  length = prediction_order = p
```

Total header length: `7 + 2 · prediction_order` bytes.

### 3.1 `sync_word`

Fixed value `0x1ACC`. Present to identify a LAC frame on lightly framed transports and to reject foreign payloads at the first check. Decoders **must** reject any frame whose first two bytes are not `0x1ACC`.

### 3.2 `prediction_order`

Integer order of the LPC analysis filter used to produce residuals.

- Value `0` is **verbatim mode**: no prediction, residuals equal the samples, and the `lpc_coefficients` array is empty (zero bytes).
- Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries exactly that many predictor coefficients, interpreted in the Q-format determined by `coefficient_shift` (§3.4).

Decoders **must** reject values above 32.

### 3.3 `partition_order`

Controls how the residual stream is split for Rice coding.

- `partition_count = 1 << partition_order`.
- The residual stream is divided into `partition_count` equal partitions of `frame_sample_count / partition_count` samples each.

Decoders **must** reject values above 7 and **must** reject frames where `frame_sample_count` is not a multiple of `partition_count`.

### 3.4 `coefficient_shift`

Controls the fixed-point scale of the stored Q-format LPC predictor coefficients.
Coefficients are stored as 16-bit integers interpreted as `Q(15 − coefficient_shift)`:

| shift | Q-format | Real-value range | Use case                                          |
|-------|----------|------------------|---------------------------------------------------|
| 0     | Q15      | `[−1, 1)`        | Coefficients with magnitude < 1 (most orders > 1) |
| 1     | Q14      | `[−2, 2)`        | Low-frequency content, `a[1]` near −2             |
| 2     | Q13      | `[−4, 4)`        | Extreme bass / narrow resonances                  |
| 3     | Q12      | `[−8, 8)`        | Pathological transients                           |
| 4     | Q11      | `[−16, 16)`      | Reserved for synthetic signals                    |
| 5     | Q10      | `[−32, 32)`      | Upper bound; decoder rejects larger values        |

The encoder **must** select the smallest `coefficient_shift` at which no coefficient's real value exceeds the representable range for that shift — i.e., the smallest scale that does not clamp. Smaller shifts give finer precision and thus smaller residuals when no clamping is required.

If no `coefficient_shift ∈ [0, 5]` suffices (the real coefficient magnitude exceeds the Q10 range at `shift = 5`), the encoder saturates each offending coefficient independently to the i16 range `[−32768, 32767]`. Bit-exact round-trip is preserved because the decoder applies the synthesis formula to whatever 16-bit values the wire carries; the cost of saturation is compression — the predictor no longer matches the encoder's ideal coefficients, so residuals grow. Real audio at the input-magnitude contract (§1) rarely reaches this case; synthetic or adversarial inputs can force it.

Decoders **must** reject values above 5. The shift applies uniformly to every coefficient in `lpc_coefficients`; there is no per-coefficient scale.

When `prediction_order == 0` (verbatim frame), `coefficient_shift` **must** be `0`. The shift only modifies how stored coefficients are interpreted, and a verbatim frame stores none.
Decoders **must** reject frames with `prediction_order == 0` and `coefficient_shift != 0` as malformed; this rule closes the space of legal but meaningless headers so two implementations agree bit-for-bit on which inputs round-trip.

### 3.5 `frame_sample_count`

Number of audio samples produced by this frame (also the number of residuals in the Rice bitstream). The value **must** be in `[1, 65535]`.

Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame trivially satisfies the partition-divisibility check below (`0 mod n == 0` for any `n`) but carries no audio and has no legal Rice payload.

For compliance with `partition_order`, the value **must** satisfy `frame_sample_count mod (1 << partition_order) == 0`. Decoders **must** reject frames where this does not hold.

### 3.6 `lpc_coefficients`

Array of `prediction_order` predictor coefficients, each a 16-bit big-endian signed integer interpreted in `Q(15 − coefficient_shift)` format — see §3.4 for the shift semantics.

The wire format does not distinguish coefficients by derivation. The synthesis formula below applies identically whether the encoder obtained the values from Levinson-Durbin analysis, from a fixed coefficient template (e.g. FLAC-style integer predictors), from a trained model, or from any other strategy. What goes on the wire is just `prediction_order` 16-bit integers; how the encoder chose them is encoder-internal and not observable to the decoder.

Synthesis formula (applied in the decoder):

```text
s = 15 − coefficient_shift
bias = 1 << (s − 1)
predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
sample[i] = residual[i] + predict[i]
```

where `terms = min(i, prediction_order)`. The `+ bias` term implements round-to-nearest for the right shift and is **required** for bit-exact decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.
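A hedged Rust sketch of this synthesis loop — illustrative, not the reference code — using an `i64` accumulator, the `+ bias` pre-add, native arithmetic `>>`, and a wrapping final add, all of which the rest of this section spells out as requirements:

```rust
// Illustrative decoder-side synthesis (§3.6). `coeffs` are the wire i16
// values, `shift` is coefficient_shift. Names are not the reference API.
fn synthesize(residuals: &[i32], coeffs: &[i16], shift: u8) -> Vec<i32> {
    let s = 15 - shift as u32;      // Q-format scale
    let bias = 1i64 << (s - 1);     // round-to-nearest pre-add
    let mut out: Vec<i32> = Vec::with_capacity(residuals.len());
    for (i, &r) in residuals.iter().enumerate() {
        let terms = i.min(coeffs.len());
        let predict: i32 = if terms == 0 {
            0 // warm-up: empty sum, the shift formula is not applied
        } else {
            // signed accumulator of at least 49 bits; i64 satisfies this
            let mut acc: i64 = bias;
            for j in 0..terms {
                acc += coeffs[j] as i64 * out[i - j - 1] as i64;
            }
            (acc >> s) as i32 // arithmetic shift == floor division by 2^s
        };
        out.push(r.wrapping_add(predict)); // wrapping i32 add (§3.6)
    }
    out
}
```

With `coeffs = [16384]` at `shift = 1` (real value 1.0, the order-1 fixed predictor), residuals `[5, 3]` synthesize to `[5, 8]`; an empty coefficient array reproduces the residuals verbatim.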
The `>> s` operator **must** be an **arithmetic right shift** on signed integers — equivalent to floor division by `2^s`. Combined with the `+ bias` pre-add, this implements **round-half toward +∞**: on a value whose scaled form is exactly `k + 0.5`, the result is `k + 1` for both positive and negative `k`.

Implementations using truncating integer division (C's `/` on signed integers, which rounds toward zero) **will diverge** from this on any `sum + bias` that is negative and not evenly divisible by `2^s`: arithmetic shift rounds further from zero, truncating division rounds toward zero. Concrete example: at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives `(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`. Decoders in languages whose native integer division does not floor **must** emulate arithmetic right shift explicitly on the signed accumulator.

#### Accumulator width

The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in a signed integer accumulator of at least **49 bits** (in practice: an `i64` or wider). Worst-case bounds at `prediction_order = 32`, `coefficient_shift = 5` (Q10), and full-scale samples give a product of magnitude `(2¹⁵) · (2²³ − 1) ≈ 2³⁸` per term, summed over 32 terms for a maximum of `~2⁴³`. Adding the bias keeps the result below `2⁴⁴`. A 32-bit accumulator overflows at orders ≥ 16 with full-scale inputs — implementations that reach for `int32_t` because samples are 32-bit will silently corrupt high-order frames.

JavaScript / TypeScript implementers should note that `Number` is an IEEE 754 double, not a signed integer: its 53-bit safe-integer range covers in-contract accumulator values, but adversarial bitstreams (see §6.2) can produce out-of-contract samples whose synthesis arithmetic lands in the 2⁴⁹–2⁵¹ range and beyond, where `Number` silently loses low bits to float rounding.
For bit-exact spec compliance in JS/TS, **use `BigInt` for the accumulator** — it has the integer semantics the spec requires; `Number` does not.

#### Warm-up (`terms == 0`)

When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is empty and `predict[0] = 0` — the `(0 + bias) >> s` formula is **not** applied. Stating this explicitly avoids an implementation that mechanically applies the formula in the warm-up case and produces `predict[0] = bias >> s`; with this format's `bias = 1 << (s − 1)` that expression happens to be zero for every legal `s`, but only by coincidence of the chosen bias, and the empty-sum rule keeps warm-up behaviour independent of that coincidence.

For `0 < i < prediction_order`, the sum truncates to the available `i` predecessors (`terms = i`). The formula applies as stated.

#### Sign convention for stored coefficients

The synthesis formula uses `+Σ`. Classical Levinson-Durbin implementations that derive LPC from the error-prediction AR model

```text
x[n] = −Σ a[j] · x[n-j] + e[n]      (error convention)
```

produce coefficients `a[j]` whose sign is the **opposite** of what the synthesis formula expects; those encoders **must** negate before quantisation so the wire value is `coeff[j-1] = −a[j]`. Implementations using the predictor convention

```text
x̂[n] = +Σ c[j] · x[n-j]             (predictor convention)
```

store `c[j]` directly. Both conventions are common in DSP literature. Encoders **must** verify that the coefficients emitted on the wire, when substituted into the synthesis formula above, reproduce the encoder's own prediction. The reference implementation uses the error convention and negates at quantisation time.

#### Overflow semantics of the final add

The `residual[i] + predict[i]` add is specified as a **wrapping i32 add** (two's complement, modulo `2³²`; in languages without native signed-overflow semantics, compute `(residual + predict) & 0xFFFFFFFF` and then re-interpret as a signed 32-bit integer via sign-extension of bit 31).
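As an illustrative sketch, the native wrapping form alongside the mask-and-sign-extend emulation the parenthetical describes (function names are hypothetical):

```rust
// Native two's-complement wrap: the §3.6 final add as specified.
fn wrapping_add_i32(residual: i32, predict: i32) -> i32 {
    residual.wrapping_add(predict)
}

// Emulation for environments without native i32 overflow semantics:
// mask the wide sum to 32 bits, then sign-extend bit 31.
fn wrapping_add_emulated(residual: i64, predict: i64) -> i32 {
    let masked = (residual + predict) & 0xFFFF_FFFF;
    if masked & 0x8000_0000 != 0 {
        (masked - 0x1_0000_0000) as i32 // bit 31 set: value is negative
    } else {
        masked as i32
    }
}
```

Both forms agree on every input pair; the emulated form is the shape a BigInt-based JS/TS decoder would take.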
On well-formed bitstreams — those produced by a compliant encoder from in-contract samples (§1) — the result stays inside the sample-magnitude contract and the wrap is never observable. Adversarial bitstreams with crafted coefficients and residuals **may** produce any `i32` value; the decoder **must not** panic, abort, or reject on the basis of this add's result. The consequences of out-of-contract decoder output are addressed in §6.2.

## 4. Rice Bitstream

Immediately follows the header. Flat MSB-first bitstream structured as consecutive partition payloads:

```text
+---------+---------+-----+-----------+
| part. 0 | part. 1 | ... | part. P-1 |
+---------+---------+-----+-----------+
```

where `P = 1 << partition_order`. Each partition has the same structure:

```text
+-------+-------------+-------------+-----+--------------+
| k (5) | codeword 0  | codeword 1  | ... | codeword M-1 |
+-------+-------------+-------------+-----+--------------+
```

where `M = frame_sample_count / P` is the per-partition residual count.

Partitions are **bit-contiguous**: the 5-bit `k` field of partition `i + 1` begins at the bit immediately following the last codeword of partition `i`. There is no byte alignment between partitions. Only the final trailing padding described in §4.3 is byte-aligned. Within a partition, the bit cursor likewise advances continuously: codeword 0 begins at the bit immediately following the 5-bit `k` field, codeword 1 immediately after codeword 0's remainder bit, and so on.

A conformant decoder maintains a single bit-read position across the entire Rice bitstream — from the `k` of partition 0 through the last codeword of partition P−1 — and never realigns to a byte or bit boundary between fields. This is implicit in the byte-stream decoder design (a bit reader that consumes bits sequentially needs no special handling at field boundaries) but stated here so second-team implementations do not introduce a spurious alignment.
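A minimal MSB-first bit reader of the kind that design implies — one absolute bit cursor over the whole payload, no realignment between fields — might look like this (illustrative sketch, not the reference implementation):

```rust
// Single-cursor MSB-first bit reader over the Rice payload (illustrative).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // absolute bit index from the start of the payload
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }

    /// Returns the next bit, or None when the buffer is exhausted.
    fn read_bit(&mut self) -> Option<u32> {
        let byte = *self.data.get(self.pos / 8)?;
        // MSB-first: bit 7 of each byte is consumed first
        let bit = (byte >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        Some(bit as u32)
    }

    /// Reads n <= 32 bits MSB-first; fields like the 5-bit `k` use this.
    fn read_bits(&mut self, n: u32) -> Option<u32> {
        let mut v = 0u32;
        for _ in 0..n {
            v = (v << 1) | self.read_bit()?;
        }
        Some(v)
    }
}
```

Reading a partition's `k` is then just `read_bits(5)`, and the cursor is already positioned at codeword 0 — no alignment step exists anywhere.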
### 4.1 Per-Partition Parameter `k`

Five-bit unsigned integer, MSB-first, immediately before the partition's codewords. `k` is the Rice parameter for this partition and must be in `[0, 23]`. Decoders **must** reject values above 23 as malformed.

### 4.2 Codeword

Each residual is encoded by:

1. **Zigzag mapping** from signed to unsigned:

   ```text
   z = (r << 1) ^ (r >> 31)    interpreted as u32
   ```

   where `(r >> 31)` is an **arithmetic right shift** on the i32 residual, sign-extending the sign bit to all 32 bit positions — `0` for non-negative `r`, `-1` (all ones) for negative `r`. The entire expression is **masked to 32 bits** before being interpreted as `u32`; in languages with arbitrary-precision integers (e.g. Python) or where native bitwise ops return signed 32-bit (e.g. JavaScript / TypeScript `Number`, where `(x ^ y) >>> 0` coerces to u32), this mask is explicit (`& 0xFFFFFFFF` or `>>> 0`) and **required** — without it, negative residuals produce zigzag values with extra high bits set. Implementations in languages whose native right shift is logical on unsigned types **must** coerce `r` to a signed 32-bit type first; implementations in languages where `>>` on signed types is implementation-defined (e.g. pre-C++20 C/C++) **must** emulate the arithmetic shift explicitly.

   The `r << 1` factor is always safe on i32. Residuals on the encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2 · 2²³ = 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without overflow or undefined behaviour — even for `r` at the most negative value the encoder can ever produce from in-contract input (§1). Decoder-side code does not perform this shift; the inverse uses `z >> 1` on a u32, which is always defined.

   The mapping sends

   ```text
   {0, −1, 1, −2, 2, −3, 3, …} → {0, 1, 2, 3, 4, 5, 6, …}
   ```

   so small magnitudes of either sign map to small unsigned values.
   The decoder's inverse is

   ```text
   r = ((z >> 1) as i32) ^ −((z & 1) as i32)
   ```

   where `z >> 1` is a logical shift on the u32 and `−(z & 1)` is integer negation of the low bit, producing the all-ones sign mask for odd `z`. Stating this inverse explicitly removes any ambiguity about how an implementation must invert the zigzag.

2. **Rice code** at parameter `k`:

   - **Unary part**: `q = z >> k` zero-bits followed by a single terminating 1-bit.
   - **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first. (The remainder part is absent when `k == 0`.)

   Total codeword length: `q + 1 + k` bits.

#### Decoder-side unary-run bound

Decoders **must** reject any codeword whose unary run length satisfies

```text
q > (2³² − 1) >> k      (equivalently, q > u32::MAX >> k)
```

A valid codeword reconstructs `z = (q << k) | remainder` as a u32. `q > u32::MAX >> k` implies `q << k ≥ 2³²`, which either overflows u32 silently (a critical decoder bug class — corrupt output with no error) or indicates a malformed stream. Either way the frame **must** be discarded with `InvalidParameter` or equivalent rejection class (§6). The bound varies with `k`: at `k = 23` it is `511`, at `k = 0` it is `u32::MAX` (no practical constraint).

This rule also caps the CPU cost of unary scanning on adversarial input: without the cap, a decoder could be forced to scan an arbitrarily long run of zero bits before reaching either a `1` or the buffer end.

### 4.3 Byte Padding

After the last codeword of the last partition, any unused bits of the final byte are zero-padded on the LSB side. The encoder writes `0` for all padding bits; the decoder ignores them.

### 4.4 Bitstream Length

The Rice bitstream's total bit length is the sum of codeword bit lengths across every partition, plus 5 bits per partition for the `k` fields:

```text
total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
```

This depends on every residual's quotient and cannot be computed from the frame header alone.
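Taken together, §4.2 inverts as: read one codeword (enforcing the unary-run cap), then unzigzag. A hedged sketch in Rust — helper names are illustrative, not the reference API:

```rust
// Illustrative decode of one residual from an MSB-first bit position.
fn read_bit(data: &[u8], pos: &mut usize) -> Option<u32> {
    let byte = *data.get(*pos / 8)?;
    let bit = (byte >> (7 - *pos % 8)) & 1; // MSB-first
    *pos += 1;
    Some(bit as u32)
}

fn decode_codeword(data: &[u8], pos: &mut usize, k: u32) -> Result<u32, &'static str> {
    let cap = u32::MAX >> k; // any longer unary run implies q << k >= 2^32
    let mut q: u32 = 0;
    loop {
        match read_bit(data, pos).ok_or("Truncated")? {
            1 => break, // unary terminator
            _ => {
                q += 1;
                if q > cap {
                    return Err("InvalidParameter"); // §4.2 unary-run bound
                }
            }
        }
    }
    let mut rem = 0u32; // k remainder bits, MSB-first (absent when k == 0)
    for _ in 0..k {
        rem = (rem << 1) | read_bit(data, pos).ok_or("Truncated")?;
    }
    Ok((q << k) | rem)
}

fn unzigzag(z: u32) -> i32 {
    // logical u32 shift, then negate the low bit to rebuild the sign mask
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}
```

For example, the byte `0b0101_0000` at `k = 2` decodes as `q = 1`, remainder `0b01`, so `z = 5` and `unzigzag(5) = -3`, consuming exactly `q + 1 + k = 4` bits.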
A decoder **streams-decodes** until it has produced exactly `frame_sample_count` samples, then stops. Any bits remaining inside the last byte are padding (§4.3) and carry no information. A decoder **must not** require the Rice bitstream's byte length to be signalled out-of-band. The header plus the zero-padded byte-aligned tail fully determines the frame boundary; `parse_header`'s `bytes_consumed` return plus streaming Rice decode locates the end of the frame in the input buffer.

## 5. Degenerate Cases

### 5.1 All-Zero Frame

For an all-zero sample vector, the encoder **must** use `prediction_order = 0` because the Levinson-Durbin recursion is undefined at `R[0] = 0`. Residuals equal the input (all zeros). Partition-order and per-partition `k` selection remain at the encoder's discretion; any legal `(partition_order, k)` combination produces a bit-exact-decodable frame. The minimum-cost choice — `partition_order = 0`, `k = 0` — produces a Rice payload of exactly `5 + frame_sample_count` bits. Compliant encoders are **not** required to pick this minimum.

### 5.2 Single-Sample Frame

`frame_sample_count = 1` is valid but forces `partition_order = 0` (the only value that divides 1 evenly). The single sample is Rice-coded directly because no predecessors exist for any LPC order.

## 6. Error Recovery

Decoders **must** detect and reject every frame that violates the constraints elsewhere in this document. The exhaustive list of rejection classes, each of which is a distinct error condition so callers can distinguish them in telemetry, is:

1. **Sync word mismatch** — bytes `0-1` differ from `0x1ACC` (§3.1).
2. **`prediction_order` out of range** — value > `32` (§3.2).
3. **`partition_order` out of range** — value > `7` (§3.3).
4. **`coefficient_shift` out of range** — value > `5` (§3.4).
5. **Verbatim frame with non-zero shift** — `prediction_order == 0` and `coefficient_shift != 0` (§3.4).
6. **`frame_sample_count == 0`** (§3.5).
7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
8. **Buffer truncated** — fewer bytes than the header plus coefficient array requires, or fewer bits than the Rice bitstream demands during streaming decode. This class is intentionally coarse-grained: a single `Truncated` variant covers header truncation, missing `k` fields, mid-codeword exhaustion, and every other "buffer ends early" shape the decoder can encounter. Sub-categorising these provides no caller benefit — the recovery action (discard and substitute silence, see §6.1) is identical regardless of where truncation happened.
9. **Per-partition `k` out of range** — value > `23` (§4.1).
10. **Unary-run cap exceeded** — any codeword with `q > u32::MAX >> k` (§4.2).

On any of these, the decoder **must** discard the frame, produce no output samples, and signal the error to the caller. **No partial state may propagate** to the next frame's decode — frames are independent (§2), so subsequent frames decode cleanly regardless.

### 6.1 Caller-side silence substitution

On rejection, the caller substitutes `frame_sample_count` zeros (silence) for the frame period. The count is obtained as follows:

- **Post-header rejections** (classes 8-10 above — `Truncated` in the Rice bitstream, `InvalidParameter` during Rice decode): the frame header parsed successfully before the failure, so the count is recoverable. The caller re-parses just the header on the same buffer (reference API: `parse_header(data)`) and reads `frame_sample_count` from the resulting `AudioFrameHeader`.
- **Pre-header rejections** (classes 1-7 above): the header itself failed; the frame length is not recoverable from the bitstream. The caller **must** fall back to a session-level default frame size carried out-of-band by the container or transport (WebRTC and QUIC audio sessions typically negotiate this at session setup).
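A hedged sketch of this caller-side policy. Everything here is illustrative: the error split, `reparse_sample_count` (a stand-in for re-reading the header via the reference's `parse_header`), and the function names are not the reference API.

```rust
// Illustrative caller-side recovery for §6.1.
enum FrameError {
    HeaderRejected,  // classes 1-7: count unrecoverable from the bitstream
    PayloadRejected, // classes 8-10: header was valid, re-read the count
}

/// Minimal re-parse of just `frame_sample_count` (header offset 5-6, u16 BE).
/// Assumes the header passed its §3 checks during the failed decode; returns
/// None defensively if even the sync word or length is off.
fn reparse_sample_count(frame: &[u8]) -> Option<usize> {
    if frame.len() < 7 || u16::from_be_bytes([frame[0], frame[1]]) != 0x1ACC {
        return None;
    }
    Some(u16::from_be_bytes([frame[5], frame[6]]) as usize)
}

fn substitute_silence(frame: &[u8], err: FrameError, session_default: usize) -> Vec<i32> {
    let n = match err {
        // post-header rejection: the count is recoverable from the header
        FrameError::PayloadRejected => {
            reparse_sample_count(frame).unwrap_or(session_default)
        }
        // pre-header rejection: fall back to the session-negotiated size
        FrameError::HeaderRejected => session_default,
    };
    vec![0i32; n] // silence for the frame period
}
```

The `unwrap_or(session_default)` fallback also covers the header-truncation corner of class 8, where even offset 5-6 may be missing.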
This asymmetry is inherent to the wire format: `frame_sample_count` lives inside the header at offset 5, so any rejection that happens while parsing bytes 0-4 precedes its discovery.

### 6.2 Decoder output magnitude

On well-formed bitstreams produced by a compliant encoder from in-contract samples (§1), decoder output satisfies `|sample| ≤ 2²³ − 1`.

Adversarial bitstreams — those with hand-crafted coefficients and residuals that pass every rejection check in this section yet produce arithmetic results outside the sample-magnitude contract — **may** produce output samples of any `i32` value, including values that exceed `2²³ − 1`. The decoder **must not** panic or reject on this basis: the wrapping-add semantics of §3.6 are precisely what makes every bit sequence produce a defined output, which is the ground of the "no partial state propagates" contract at the top of this section.

Callers that re-feed decoder output into LAC's encoder (for example, an MCU decode → PCM mix → re-encode pipeline) **should** validate or clamp to the input magnitude contract before re-encoding. A compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1` and is not required to re-validate.

## 7. Encoder Guidance (non-normative)

The reference encoder's search has three phases:

```text
# Phase 0: all-zero short-circuit
R[0] = Σ sample[i]²
if R[0] == 0:
    emit frame with prediction_order = 0, any legal partition_order (§5.1)
    return

# Phase 1: sparse LPC order grid with stop-when-stale early-out
for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
    coeffs_q31 = levinson_durbin(samples, prediction_order)   # cached
    shift = smallest s such that every |coeff_real| < 2^s
    coeffs_stored = quantize(coeffs_q31, to: Q(15 - shift))
    residuals = compute_residuals(samples, coeffs_stored, shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total = header_bits(prediction_order) + rice_bits
        track minimum over (prediction_order, partition_order)
    if no improvement for 2 consecutive grid entries: break

# Phase 2: fixed-predictor post-pass
for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
    residuals = compute_residuals(samples, fp_coeffs, fp_shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total = header_bits(fp_order) + rice_bits
        track minimum over (fp_order, partition_order)

emit frame with the (order, partition_order) that minimised `total`
```

The sparse grid + early-out is a speed/compression trade-off; a compliant encoder may still exhaustively search every integer order `0..=32` for marginal gains at higher cost. The produced bitstreams are interchangeable.

The fixed-predictor post-pass tries FLAC-style integer predictors (orders 1-4 with a small static coefficient table) after the LPC grid. These evaluate quickly and occasionally beat the Levinson-Durbin winner on content where a low-order integer polynomial fits better than the statistically-optimal LPC fit — silent-plus-DC, very smooth tones, polynomial-ish sensor data.
Running them second avoids tripping the stop-when-stale heuristic in the LPC phase.

The reference encoder's `FIXED_PREDICTORS` table, materialising the FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer predictors at the smallest `coefficient_shift` that represents each coefficient without clamping:

| `prediction_order` | `lpc_coefficients` (Q-format integers) | `coefficient_shift` | Real-value interpretation |
|-------------------:|----------------------------------------|--------------------:|---------------------------|
| 1 | `[16384]`                       | 1 (Q14) | `[1]`             |
| 2 | `[16384, −8192]`                | 2 (Q13) | `[2, −1]`         |
| 3 | `[24576, −24576, 8192]`         | 2 (Q13) | `[3, −3, 1]`      |
| 4 | `[16384, −24576, 16384, −4096]` | 3 (Q12) | `[4, −6, 4, −1]`  |

These are the exact wire-format bytes a second-team encoder would emit to match the reference's fixed-predictor outputs bit-for-bit. Compliant encoders MAY use a different set (or none), since §3.6 treats the coefficient field as opaque — decoders apply the synthesis formula identically regardless of source.

The `R[0] == 0` short-circuit is both a correctness requirement (§5.1 — Levinson-Durbin is undefined at zero autocorrelation) and an encoder-cost optimisation: on digital silence, the sparse grid and fixed-predictor pass produce identical zero residuals and order 0 wins on header size alone.

Levinson-Durbin runs once to order 32 with all intermediate orders saved into a flat buffer (one recursion pass yields all orders 1..32 at `O(order²)` cost), so the outer loop fetch is free and order selection is effectively `O(orders_tried × N)`. `shift` is determined per order by the coefficient magnitudes — there is no shift search, as smaller shifts are always at least as good as larger ones when they don't clamp (saturation, §3.4, is the fallback). Rice cost at a given `partition_order` is exact and closed-form given the per-partition `k`, so the inner search introduces no estimation error.
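The table's `(shift, coefficients)` pairs fall out of the §3.4 selection rule. A hedged sketch of that rule — quantising from `f64` reals for clarity, whereas the reference quantises from Q31 and its round-half-up tie-break differs from `f64::round` on exact ties:

```rust
// Illustrative §3.4 shift selection: smallest shift whose Q(15 - shift)
// quantisation clamps nothing, with per-coefficient saturation as fallback.
fn choose_shift_and_quantize(real: &[f64]) -> (u8, Vec<i16>) {
    for shift in 0u8..=5 {
        let scale = (1i32 << (15 - shift)) as f64;
        let q: Vec<f64> = real.iter().map(|c| (c * scale).round()).collect();
        // smallest shift at which every value fits in i16 without clamping
        if q.iter().all(|&v| v >= i16::MIN as f64 && v <= i16::MAX as f64) {
            return (shift, q.iter().map(|&v| v as i16).collect());
        }
    }
    // no shift in [0, 5] suffices: saturate each coefficient at shift 5 (Q10)
    let scale = (1i32 << 10) as f64;
    (5, real.iter()
        .map(|c| (c * scale).round().clamp(i16::MIN as f64, i16::MAX as f64) as i16)
        .collect())
}
```

Run against the fixed-predictor reals, this reproduces the table above exactly (e.g. `[4.0, -6.0, 4.0, -1.0]` selects shift 3 and stores `[16384, -24576, 16384, -4096]`); an out-of-range input such as `[40.0]` falls through to saturation.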
#### Levinson-Durbin numerical choices (reference, non-normative)

The reference encoder runs Levinson-Durbin with i64 autocorrelation accumulators, Q31 working coefficients, and widens to i128 for reflection-coefficient intermediates at orders where Q31 would lose precision (typically above order ~12). Rounding on the Q31→Q(15−shift) quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`, with `bias = 1 << (14 + shift)` — the direct analogue of the synthesis formula's rounding (§3.6), chosen so analysis and synthesis agree on tie-break direction.

None of these choices are normative. Two encoders making different precision or rounding choices will produce different coefficient bytes on the same input, but both bitstreams decode correctly under §3.6 so long as the coefficients they emit faithfully represent their own LPC decision. §3.6's "wire format doesn't distinguish coefficients by derivation" clause is exactly what permits this freedom.

#### Rice `k` selection (reference, non-normative)

Cost is closed-form for a given `(partition, k)`: `bit_cost(k) = N · (1 + k) + Σ (v >> k)` where `v` ranges over the zigzag-mapped residuals of the partition. Exhaustive search over `k ∈ [0, 23]` is always acceptable and is the simplest compliant choice.

The reference encoder uses convex descent: seed `k_seed = ⌊log₂(mean(v))⌋`, then walk either direction. Ties break **toward smaller `k`** on descent (condition `cost ≤ best_cost`, so equal costs at `k−1` overwrite `k`) and **strictly larger costs only** on ascent (condition `cost < best_cost`, so equal costs at `k+1` don't override `k`). This yields a unique `k` per partition matching an exhaustive search's first-wins tie-break.

On the reference corpus, the sparse grid's compression matches exhaustive-search output within ~0.2 percentage points (measured); the differential test caps the acceptable excess at 0.5%.
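Because the cost is closed-form, the "simplest compliant choice" is a few lines. An illustrative sketch of the exhaustive search with the same smaller-`k` tie-break (names hypothetical):

```rust
// Closed-form Rice cost for one partition at parameter k:
// N·(1 + k) quotient-terminator and remainder bits, plus Σ (v >> k) unary zeros.
fn bit_cost(zigzagged: &[u32], k: u32) -> u64 {
    zigzagged.len() as u64 * (1 + k as u64)
        + zigzagged.iter().map(|&v| (v >> k) as u64).sum::<u64>()
}

// Exhaustive k ∈ [0, 23]; ties break toward smaller k (first-wins).
fn best_k(zigzagged: &[u32]) -> (u32, u64) {
    (0..=23)
        .map(|k| (k, bit_cost(zigzagged, k)))
        .min_by_key(|&(k, cost)| (cost, k))
        .unwrap()
}
```

Note `bit_cost(&[5], 2) = 4`, matching §4.2's per-codeword length `q + 1 + k = 1 + 1 + 2`; the 5-bit `k` field is accounted separately in §4.4's `total_bits`.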
Implementations that prefer tighter compression at higher encode cost can extend the grid without wire-format consequences — decoders do not care which orders an encoder tried.

### 7.1 Frame size (non-normative)

The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger frames amortise the 7-byte fixed header and the LPC coefficient vector over more samples, and generally compress better. Smaller frames give tighter latency and finer partition-order granularity on transient content. Recommended defaults by use case:

- **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at 16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used by Opus / WebRTC.
- **Real-time full-band** (game audio, music conferencing): 1024-2048 at 48 kHz (21-43 ms).
- **Offline / archival**: 4096-8192 at 48 kHz; compression gains flatten past this.

Power-of-two frame sizes expose every `partition_order ∈ [0, 7]` to the encoder's search. Non-power-of-two sizes restrict the search to partition counts that divide evenly; a prime frame size forces `partition_order = 0`. Encoders SHOULD prefer frame sizes with several small-prime factors when free to choose.

## 8. Versioning

This document specifies **LAC version 1**, identified on the wire by `sync_word = 0x1ACC`. No per-frame version byte is carried; the sync word uniquely identifies the wire format.

Future revisions of the format **must** use a distinct `sync_word`. The recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making casual grep / hex-dump inspection robust. A revision whose wire format cannot be made compatible with v1 at the header level **must** pick a sync word outside this cluster.

This approach is exhaustive by construction:

- A v1 decoder that encounters a v2 frame sees an unrecognised sync word on the first check (§3.1) and rejects cleanly — the same error path as foreign or corrupted payloads.
- A v2-aware decoder dispatches on the sync word before reading any further field, so it can fall back to v1 parsing when appropriate or decode v2 frames natively.

Because every field in §3 has a strict range with no reserved high-bit patterns, in-place extension (flag bits inside existing fields) is **not** a supported evolution path. New features go into a new `sync_word`, not into reinterpreting existing field values.

Transports that multiplex LAC frames with other formats should frame each LAC payload explicitly (length prefix or stream separator); the sync word alone is not a framing delimiter, only a format identifier.

## 9. Implementation notes (non-normative)

### 9.1 GPU offload is out of scope

LAC is a scalar integer codec. The reference implementation, and any conforming implementation this document anticipates, runs on a CPU. GPU offload is deliberately not a goal:

- **Levinson-Durbin** is serial by construction (each iteration depends on the previous) and its intermediate accumulator needs more than 64 bits of precision at higher orders — a poor fit for WGSL or SPIR-V compute shaders, which have no native 128-bit integer arithmetic.
- **Rice decode** uses a data-dependent unary run for every residual; on GPU execution models this diverges warps badly and its sequential bit-cursor progression fights SIMD lane packing.
- **LPC synthesis** has a tight per-sample feedback loop (sample `i` depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is inherently serial.
- **The one plausibly GPU-parallel phase** — residual computation inside the encoder's order search — is also the phase where the CPU's autovectorized implementation is already well-served by SIMD on any modern target. At the measured encode latencies (P99 under 50 µs on x86 for a 20 ms frame period, >400× headroom), there is no motivation to offload it.
A hypothetical future revision whose hot path genuinely benefited from GPU execution (large-batch archival encoding across many channels at once, for instance) would need to change the wire format to carry enough shape metadata for batched kernels — i.e., a new sync word under the versioning rules in §8, not a retrofit.

## 10. Conformance test vectors

The reference repository's `tests/conformance.rs` holds the canonical test-vector set for this specification:

- **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte level. A conformant decoder **must** produce the `samples` array when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a second-team encoder's bytes for the same samples may differ; the decoder direction is the normative one. Coverage includes the smallest valid frames (single-sample verbatim, 4- and 8-sample silence), single-sample polarity boundaries (±1, ±(2²³−1)), DC offset, alternating-polarity Nyquist-like content, smooth polynomial (fixed-predictor territory), and a 16-sample growing-amplitude pattern exercising partition search.
- **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to their expected rejection variants. Covers every class in §6 (1-10): bad sync, each field-range violation, verbatim + non-zero shift, `frame_sample_count == 0`, non-divisible partition count, header / coefficient / Rice-bitstream truncation, per-partition `k > 23`.
- **`reject_unary_run_above_cap`** — a programmatic test for §6 class 10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75 bytes of mostly zeros; construction logic is in the test, not a const fixture.

Second-team implementations should port the decode fixtures byte-for-byte and the reject fixtures byte-and-variant-for-variant. `encode_matches_fixtures` in the same file is reference-specific (it asserts the reference encoder's exact bytes) and is **not** a conformance requirement — see §3.6's encoder-latitude clause.
### 10.1 Reference encoder exemplars (non-normative)

The same `(samples, bytes)` pairs, read in the encoder direction, serve as **reference-encoder validation targets** for implementations that want to match this project's reference byte-for-byte — a common goal for porting work even though the spec does not require it. The fixture set is deliberately chosen to pin every encoder-discretion axis from §7:

- `single_zero`, `single_pos_one`, `single_neg_one` — single-sample frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime where the encoder's order choice is nearly arbitrary; pinning the bytes fixes this project's choice (order 0 with the minimum-cost Rice encoding).
- `silence_4`, `silence_8` — force `partition_order` tie-breaks on an all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces identical cost; the reference picks the smallest via its convex-descent tie-break).
- `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim decision. DC content favours a low-order LPC fit with small residuals; alternating content favours order 1 with `a = −1` (approximated at the closest Q-format). Pinning the bytes fixes the reference's decision boundary.
- `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both sides and fix the zigzag-of-extremum output.
- `linear_ramp_8` — smooth polynomial content, fixed-predictor territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
- `lfsr_noise_16` — exercises partition search on a frame large enough for `partition_order > 0` to be competitive.

A second-team encoder that produces the same bytes for every entry here is **likely** (not guaranteed) to produce matching bytes on wider inputs, since the tie-break axes are the ones most sensitive to encoder discretion. An encoder that produces different bytes is still compliant so long as its own bytes round-trip — see §3.6, §7.