Add Specification

2026-04-23 12:36:05 +00:00 · 2026-04-23 12:36:05 +00:00 · 5310d749b1
commit 5310d749b1
parent d869cfdac6
1 changed file with 784 additions and 0 deletions
--- a/Specification.md
+++ b/Specification.md
@@ -0,0 +1,784 @@
 # LAC Wire Format
 Normative specification of the LAC bitstream. This document is the authority
 on byte layout, field semantics, and encoder/decoder constraints.
 ## 1. Conventions
 - All multi-byte integer fields are **big-endian**.
 - Bit streams are **MSB-first**: the first bit written occupies bit 7 of its
  byte, subsequent bits fill lower positions, and a new byte begins once
  eight bits have been emitted.
 - Samples are **signed integers** passed as `i32` with magnitude bounded
  by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` must be a
  consistent sign extension of the 24-bit two's-complement value. Narrower source
  formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly
  — they trivially satisfy the magnitude bound — and compress at the bit
  cost of their actual values, not a 24-bit ceiling. The codec does not
  carry bit-depth metadata; the container or application layer is
  responsible for remembering the source format.
 - Sample rate is **not** part of the bitstream. The container or transport
  carries it.
 - A frame encodes **one channel**. Stereo is two independent streams of
  frames, one per channel.
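The two bit-level conventions above can be sketched in Rust, the language of the reference repository's tests. `BitWriter` and `in_contract` are hypothetical helpers for illustration, not the reference API: the writer packs bits MSB-first into zero-initialised bytes (so trailing padding is already zero), and `in_contract` checks the sample-magnitude bound.

```rust
/// Hypothetical MSB-first bit writer: the first bit lands in bit 7 of byte 0.
pub struct BitWriter {
    bytes: Vec<u8>,
    nbits: usize, // total number of bits written so far
}

impl BitWriter {
    pub fn new() -> Self {
        BitWriter { bytes: Vec::new(), nbits: 0 }
    }

    /// Appends one bit, starting a fresh zeroed byte every eight bits.
    pub fn put_bit(&mut self, bit: bool) {
        if self.nbits % 8 == 0 {
            self.bytes.push(0);
        }
        if bit {
            let pos = 7 - (self.nbits % 8); // bit 7 first, then lower positions
            *self.bytes.last_mut().unwrap() |= 1u8 << pos;
        }
        self.nbits += 1;
    }

    /// Appends the low `width` bits of `value` (width <= 32), MSB first.
    pub fn put_bits(&mut self, value: u32, width: u32) {
        for i in (0..width).rev() {
            self.put_bit((value >> i) & 1 == 1);
        }
    }

    /// Returns the packed bytes; unused low bits of the last byte stay zero.
    pub fn finish(self) -> Vec<u8> {
        self.bytes
    }
}

/// True if `s` meets the sample contract |s| <= 2^23 - 1.
pub fn in_contract(s: i32) -> bool {
    const MAX: i32 = (1 << 23) - 1;
    // The range check already implies that bits 31..23 all equal the sign
    // bit; the round trip through a 24-bit field states that rule directly.
    (-MAX..=MAX).contains(&s) && ((s << 8) >> 8) == s
}
```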
 ## 2. Frame Layout
 A frame is a contiguous byte sequence:
 ```text
 +--------+--------------------+
 | header | rice_bitstream     |
 +--------+--------------------+
 ```
 The `header` has a fixed structure but a variable length, because the
 length of the coefficient array depends on `prediction_order`. The `rice_bitstream` is a bit-packed
 payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in
 the low bits of the last byte.
 Decoder input is the complete frame. There is no intra-frame continuation or
 fragmentation — the transport layer handles that.
 ## 3. Frame Header
 ```text
 Offset  Size  Field                 Type     Constraint
 ------  ----  --------------------  -------  ----------------------------
   0-1   2     sync_word             u16 BE   == 0x1ACC
   2     1     prediction_order      u8       ∈ [0, 32]
   3     1     partition_order       u8       ∈ [0, 7]
   4     1     coefficient_shift     u8       ∈ [0, 5]
   5-6   2     frame_sample_count    u16 BE   ≥ 1, % (1 << partition_order) == 0
   7+    2·p   lpc_coefficients      i16 BE[] length = prediction_order = p
 ```
 Total header length: `7 + 2 · prediction_order` bytes.
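A minimal sketch of serialising the header table above, assuming a hypothetical `Header` struct whose fields mirror the table. It only lays out bytes in the stated order and endianness; range validation is left to the caller.

```rust
/// Field values for one frame header (illustrative struct, not the reference type).
pub struct Header {
    pub prediction_order: u8,
    pub partition_order: u8,
    pub coefficient_shift: u8,
    pub frame_sample_count: u16,
    pub lpc_coefficients: Vec<i16>, // length must equal prediction_order
}

/// Serialises a header in the §3 layout; multi-byte fields are big-endian.
pub fn write_header(h: &Header) -> Vec<u8> {
    assert_eq!(h.lpc_coefficients.len(), h.prediction_order as usize);
    let mut out = Vec::with_capacity(7 + 2 * h.lpc_coefficients.len());
    out.extend_from_slice(&0x1ACCu16.to_be_bytes()); // bytes 0-1: sync_word
    out.push(h.prediction_order); // byte 2
    out.push(h.partition_order); // byte 3
    out.push(h.coefficient_shift); // byte 4
    out.extend_from_slice(&h.frame_sample_count.to_be_bytes()); // bytes 5-6
    for c in &h.lpc_coefficients {
        out.extend_from_slice(&c.to_be_bytes()); // bytes 7+: one i16 BE each
    }
    out
}
```

For an order-2 predictor over 16 samples with coefficients `[16384, −8192]` at `coefficient_shift = 2`, this yields the 11 bytes `1A CC 02 00 02 00 10 40 00 E0 00`.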
 ### 3.1 `sync_word`
 Fixed value `0x1ACC`. Present to identify a LAC frame on lightly framed
 transports and to reject foreign payloads at the first check. Decoders
 **must** reject any frame whose first two bytes are not `0x1ACC`.
 ### 3.2 `prediction_order`
 Integer order of the LPC analysis filter used to produce residuals.
 - Value `0` is **verbatim mode**: no prediction, residuals equal the samples,
  and the `lpc_coefficients` array is empty (zero bytes).
 - Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries
  exactly that many predictor coefficients, interpreted in the Q-format
  determined by `coefficient_shift` (§3.4).
 Decoders **must** reject values above 32.
 ### 3.3 `partition_order`
 Controls how the residual stream is split for Rice coding.
 - `partition_count = 1 << partition_order`.
 - The residual stream is divided into `partition_count` equal partitions of
  `frame_sample_count / partition_count` samples each.
 Decoders **must** reject values above 7 and **must** reject frames where
 `frame_sample_count` is not a multiple of `partition_count`.
 ### 3.4 `coefficient_shift`
 Controls the fixed-point scale of the stored Q-format LPC predictor
 coefficients. Coefficients are stored as 16-bit integers interpreted as
 `Q(15 − coefficient_shift)`:
 | shift | Q-format | Real-value range  | Use case                                       |
 |-------|----------|-------------------|------------------------------------------------|
 | 0     | Q15      | `[−1, 1)`         | Coefficients with magnitude < 1 (most orders > 1) |
 | 1     | Q14      | `[−2, 2)`         | Low-frequency content, `a[1]` near −2         |
 | 2     | Q13      | `[−4, 4)`         | Extreme bass / narrow resonances               |
 | 3     | Q12      | `[−8, 8)`         | Pathological transients                        |
 | 4     | Q11      | `[−16, 16)`       | Reserved for synthetic signals                 |
 | 5     | Q10      | `[−32, 32)`       | Upper bound; decoder rejects larger values     |
 The encoder **must** select the smallest `coefficient_shift` at which
 every coefficient's real value lies inside the representable range for
 that shift, i.e. the smallest scale that does not clamp. Smaller shifts
 give finer precision and thus typically smaller residuals when no
 clamping is required.
 If no `coefficient_shift ∈ [0, 5]` suffices (some real coefficient
 falls outside the Q10 range at `shift = 5`), the encoder uses
 `coefficient_shift = 5` and saturates each offending coefficient
 independently to the i16 range
 `[−32768, 32767]`. Bit-exact round-trip is preserved because the
 decoder applies the synthesis formula to whatever 16-bit values the
 wire carries; the cost of saturation is compression — the predictor
 no longer matches the encoder's ideal coefficients, so residuals
 grow. Real audio at the input-magnitude contract (§1) rarely reaches
 this case; synthetic or adversarial inputs can force it.
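The selection and saturation rules can be sketched as follows, assuming the encoder holds its ideal coefficients as `f64` real values (the reference encoder works from Q31 integers instead, per §7). `select_shift` and `quantize` are hypothetical names, and `f64::round` rounds halves away from zero, which may differ from a particular encoder's tie-break.

```rust
/// Smallest coefficient_shift in 0..=5 whose Q(15 - shift) range
/// [-2^shift, 2^shift) holds every real coefficient; 5 if none does.
pub fn select_shift(coeffs: &[f64]) -> u8 {
    for s in 0..=5u8 {
        let limit = f64::from(1u32 << s);
        if coeffs.iter().all(|&c| c >= -limit && c < limit) {
            return s;
        }
    }
    5 // no shift suffices: keep Q10 and saturate in `quantize`
}

/// Quantises real coefficients to Q(15 - shift), saturating each value
/// independently to the i16 range [-32768, 32767].
pub fn quantize(coeffs: &[f64], shift: u8) -> Vec<i16> {
    let scale = f64::from(1u32 << (15 - shift));
    coeffs
        .iter()
        .map(|&c| (c * scale).round().clamp(-32768.0, 32767.0) as i16)
        .collect()
}
```

For example, `[4.0, −6.0, 4.0, −1.0]` needs `coefficient_shift = 3`, while a coefficient of `40.0` exceeds even the Q10 range and saturates to `32767`.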
 Decoders **must** reject values above 5. The shift applies uniformly to
 every coefficient in `lpc_coefficients`; there is no per-coefficient
 scale.
 When `prediction_order == 0` (verbatim frame), `coefficient_shift`
 **must** be `0`. The shift only modifies how stored coefficients are
 interpreted, and a verbatim frame stores none. Decoders **must**
 reject frames with `prediction_order == 0` and `coefficient_shift != 0`
 as malformed; this rule closes the space of legal but meaningless
 headers so two implementations agree bit-for-bit on which inputs
 round-trip.
 ### 3.5 `frame_sample_count`
 Number of audio samples produced by this frame (also the number of residuals
 in the Rice bitstream). The value **must** be in `[1, 65535]`.
 Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame
 trivially satisfies the partition-divisibility check below
 (`0 mod n == 0` for any `n`) but carries no audio and has no legal
 Rice payload.
 For compliance with `partition_order`, the value **must** satisfy
 `frame_sample_count mod (1 << partition_order) == 0`. Decoders **must**
 reject frames where this does not hold.
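A sketch of a validating header parser covering the header-level checks of §3.1 to §3.5 plus truncation. The `Header` and `HeaderError` types are hypothetical; the document names a reference `parse_header(data)` returning an `AudioFrameHeader`, whose exact signature is not given here, so this only approximates it by returning the header together with the bytes consumed.

```rust
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    SyncMismatch,
    PredictionOrder,
    PartitionOrder,
    CoefficientShift,
    VerbatimWithShift,
    ZeroSampleCount,
    NotDivisible,
    Truncated,
}

#[derive(Debug, PartialEq)]
pub struct Header {
    pub prediction_order: u8,
    pub partition_order: u8,
    pub coefficient_shift: u8,
    pub frame_sample_count: u16,
    pub lpc_coefficients: Vec<i16>,
}

/// Parses and validates a header; returns it with the number of bytes consumed.
pub fn parse_header(data: &[u8]) -> Result<(Header, usize), HeaderError> {
    use HeaderError::*;
    if data.len() < 2 {
        return Err(Truncated);
    }
    // Check the sync word first so short foreign payloads are still reported as foreign.
    if u16::from_be_bytes([data[0], data[1]]) != 0x1ACC {
        return Err(SyncMismatch);
    }
    if data.len() < 7 {
        return Err(Truncated);
    }
    let (order, part, shift) = (data[2], data[3], data[4]);
    let count = u16::from_be_bytes([data[5], data[6]]);
    if order > 32 { return Err(PredictionOrder); }
    if part > 7 { return Err(PartitionOrder); }
    if shift > 5 { return Err(CoefficientShift); }
    if order == 0 && shift != 0 { return Err(VerbatimWithShift); }
    if count == 0 { return Err(ZeroSampleCount); }
    if count % (1u16 << part) != 0 { return Err(NotDivisible); }
    let len = 7 + 2 * usize::from(order);
    if data.len() < len {
        return Err(Truncated);
    }
    // Coefficients: `order` big-endian i16 values starting at byte 7.
    let lpc_coefficients = data[7..len]
        .chunks_exact(2)
        .map(|b| i16::from_be_bytes([b[0], b[1]]))
        .collect();
    let header = Header {
        prediction_order: order,
        partition_order: part,
        coefficient_shift: shift,
        frame_sample_count: count,
        lpc_coefficients,
    };
    Ok((header, len))
}
```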
 ### 3.6 `lpc_coefficients`
 Array of `prediction_order` predictor coefficients, each a 16-bit big-endian
 signed integer interpreted in `Q(15 − coefficient_shift)` format — see §3.4
 for the shift semantics.
 The wire format does not distinguish coefficients by derivation. The
 synthesis formula below applies identically whether the encoder
 obtained the values from Levinson-Durbin analysis, from a fixed
 coefficient template (e.g. FLAC-style integer predictors), from a
 trained model, or from any other strategy. What goes on the wire is
 just `prediction_order` 16-bit integers; how the encoder chose them
 is encoder-internal and not observable to the decoder.
 Synthesis formula (applied in the decoder):
 ```text
 s           = 15 − coefficient_shift
 bias        = 1 << (s − 1)
 predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
 sample[i]  = residual[i] + predict[i]
 ```
 where `terms = min(i, prediction_order)`. The `+ bias` term implements
 round-to-nearest for the right shift and is **required** for bit-exact
 decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.
 The `>> s` operator **must** be an **arithmetic right shift** on
 signed integers — equivalent to floor division by `2^s`. Combined with
 the `+ bias` pre-add, this implements **round-half toward +∞**: on a
 value whose scaled form is exactly `k + 0.5`, the result is `k + 1`
 for both positive and negative `k`. Implementations using truncating
 integer division (C's `/` on signed integers, which rounds toward
 zero) **will diverge** from this on any `sum + bias` that is negative
 and not evenly divisible by `2^s`: arithmetic shift rounds further
 from zero, truncating division rounds toward zero. Concrete example:
 at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives
 `(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`.
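 Decoders that implement `>> s` with division, or that are written in
 languages where right shift on signed integers is not guaranteed to be
 arithmetic, **must** emulate the arithmetic right shift (floor
 division by `2^s`) explicitly on the signed accumulator.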
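The rounding rule is easy to get wrong where `/` truncates. This Rust sketch (function names are illustrative) contrasts the required arithmetic shift with truncating division on the worked example above, and checks the round-half-toward-+∞ behaviour on exact ties.

```rust
/// Turns a Q(15 - shift) accumulator into an integer prediction as §3.6
/// requires: add half an output LSB, then shift right arithmetically.
pub fn round_prediction(acc: i64, coefficient_shift: u8) -> i64 {
    let s = 15 - u32::from(coefficient_shift);
    let bias = 1i64 << (s - 1);
    // `>>` on a signed Rust integer is an arithmetic shift (floor division).
    (acc + bias) >> s
}

/// Non-conformant variant for comparison: `/` truncates toward zero.
pub fn round_prediction_truncating(acc: i64, coefficient_shift: u8) -> i64 {
    let s = 15 - u32::from(coefficient_shift);
    let bias = 1i64 << (s - 1);
    (acc + bias) / (1i64 << s)
}
```

Where an arithmetic shift is unavailable, floor division by the positive divisor `2^s` gives the same result; in Rust that is `i64::div_euclid`, e.g. `(-16385i64).div_euclid(32768)` is `-1`.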
 #### Accumulator width
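 The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in
 a signed integer accumulator of at least **53 bits** (in practice: an
 `i64` or wider). With in-contract samples (§1), the worst case at
 `prediction_order = 32` is a product of magnitude at most
 `2¹⁵ · (2²³ − 1) < 2³⁸` per term (stored coefficients are i16 at every
 `coefficient_shift`), summed over 32 terms for a maximum below `2⁴³`;
 adding the bias keeps the result below `2⁴⁴`. Adversarial bitstreams
 (§6.2) can feed back decoded samples anywhere in the `i32` range,
 raising the per-term bound to `2¹⁵ · 2³¹ = 2⁴⁶` and the sum to just
 above `2⁵¹`, which needs 53 signed bits; because every bitstream must
 decode to a defined, bit-exact output, the accumulator is sized for
 that case. A 32-bit accumulator can overflow on a single full-scale
 term at any order; implementations that reach for `int32_t` because
 samples are 32-bit will silently corrupt frames.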
 JavaScript / TypeScript implementers should note that `Number` is an
 IEEE 754 double, not a signed integer. Its 53-bit safe-integer range
 does cover every sum this formula can produce (the worst case,
 reachable only on adversarial bitstreams per §6.2, is just above 2⁵¹),
 but the bitwise operators, `>>` included, first convert their operands
 to 32-bit integers, so `(sum + bias) >> s` on a `Number` silently
 discards high bits whenever the sum leaves the `i32` range, which a
 single full-scale 24-bit term already does. For bit-exact spec
 compliance in JS/TS, **use `BigInt` for the accumulator**: its `>>` is
 an arithmetic (flooring) shift on arbitrary-precision integers, which
 is the integer semantics the spec requires; `Number` does not provide
 them.
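The accumulator bounds can be checked mechanically. This sketch (hypothetical helper names) evaluates an upper bound on the accumulator magnitude at order 32 in `i128`, so the check itself cannot overflow, and reports how many two's-complement bits that magnitude needs, both for in-contract samples and for samples anywhere in the `i32` range.

```rust
/// Bits a two's-complement integer needs to hold every value in
/// [-max_abs, max_abs]: the magnitude's bit length plus one sign bit.
pub fn signed_bits(max_abs: i128) -> u32 {
    128 - (max_abs as u128).leading_zeros() + 1
}

/// Upper bound on |Σ coeff·sample + bias| at prediction_order = 32.
/// Stored coefficients are i16 (|coeff| <= 2^15) at every coefficient_shift;
/// the bias is largest (2^14) at coefficient_shift = 0.
pub fn worst_case_accumulator(max_abs_sample: i128) -> i128 {
    32 * (1i128 << 15) * max_abs_sample + (1i128 << 14)
}
```

It reports 44 bits for in-contract input and 53 bits when samples can take any `i32` value; both fit comfortably in an `i64`.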
 #### Warm-up (`terms == 0`)
 When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is
 empty and `predict[0] = 0`; the `(0 + bias) >> s` formula is **not**
 applied. An implementation that mechanically applies the formula in
 the warm-up case computes `predict[0] = bias >> s = 2^(s−1) >> s`,
 which is `0` for every legal `s ∈ [10, 15]`; the rule is stated
 explicitly so that conformance does not depend on that property of
 the current `(bias, s)` parametrisation.
 For `0 < i < prediction_order`, the sum truncates to the available
 `i` predecessors (`terms = i`). The formula applies as stated.
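A sketch of the per-sample prediction with the warm-up rule written out explicitly. `predict` is a hypothetical helper, and `history[..i]` is assumed to hold the samples already decoded.

```rust
/// Prediction for sample `i` per §3.6; `history[..i]` holds decoded samples.
pub fn predict(history: &[i32], i: usize, coeffs: &[i16], coefficient_shift: u8) -> i64 {
    let terms = i.min(coeffs.len());
    if terms == 0 {
        // Warm-up (i == 0, or a verbatim frame): the sum is empty and the
        // rounding formula is deliberately not applied.
        return 0;
    }
    let s = 15 - u32::from(coefficient_shift);
    let bias = 1i64 << (s - 1);
    // For 0 < i < prediction_order only the `i` available predecessors are used.
    let acc: i64 = (0..terms)
        .map(|j| i64::from(coeffs[j]) * i64::from(history[i - j - 1]))
        .sum();
    (acc + bias) >> s
}
```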
 #### Sign convention for stored coefficients
 The synthesis formula uses `+Σ`. Classical Levinson-Durbin
 implementations that derive LPC from the error-prediction AR model
 ```text
 x[n] = −Σ a[j] · x[n-j] + e[n]      (error convention)
 ```
 produce coefficients `a[j]` whose sign is the **opposite** of what
 the synthesis formula expects; those encoders **must** negate before
 quantisation so the wire value is `coeff[j-1] = −a[j]`.
 Implementations using the predictor convention
 ```text
 x̂[n] = +Σ c[j] · x[n-j]             (predictor convention)
 ```
 store `c[j]` directly (`coeff[j-1] = c[j]`).
 Both conventions are common in DSP literature. Encoders **must**
 verify that the coefficients emitted on the wire, when substituted
 into the synthesis formula above, reproduce the encoder's own
 prediction. The reference implementation uses the error convention
 and negates at quantisation time.
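A small sketch of the negation step for an error-convention encoder, assuming real-valued coefficients and `coefficient_shift = 0` (Q15); the function names are hypothetical. For the first-order model `x[n] = 0.5 · x[n-1] + e[n]`, the error convention gives `a[1] = −0.5`, so the wire must carry `+0.5`, i.e. `16384` in Q15.

```rust
/// Converts error-convention coefficients (x[n] = -Σ a[j]·x[n-j] + e[n]),
/// given as a[1..=p] in slice order, to the predictor-convention values
/// the synthesis formula expects: coeff[j-1] = -a[j].
pub fn error_to_predictor(a: &[f64]) -> Vec<f64> {
    a.iter().map(|&aj| -aj).collect()
}

/// Quantises predictor-convention values to Q15 (coefficient_shift = 0),
/// assuming each already lies in [-1, 1); out-of-range values saturate.
pub fn to_q15(c: &[f64]) -> Vec<i16> {
    c.iter()
        .map(|&x| (x * 32768.0).round().clamp(-32768.0, 32767.0) as i16)
        .collect()
}
```

Substituting the wire value into the synthesis formula for a previous sample of `1000` gives `(16384 · 1000 + 16384) >> 15 = 500`, matching the encoder's own prediction of `0.5 · 1000`.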
 #### Overflow semantics of the final add
 The `residual[i] + predict[i]` add is specified as a **wrapping i32
 add** (two's complement, modulo `2³²`; in languages without native
 wrapping signed arithmetic, compute `(residual + predict) & 0xFFFFFFFF`
 and then re-interpret as a signed 32-bit integer via sign-extension
 of bit 31). On well-formed bitstreams — those produced by a compliant
 encoder from in-contract samples (§1) — the result stays inside the
 sample-magnitude contract and the wrap is never observable.
 Adversarial bitstreams with crafted coefficients and residuals **may**
 produce any `i32` value; the decoder **must not** panic, abort, or
 reject on the basis of this add's result. The consequences of
 out-of-contract decoder output are addressed in §6.2.
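The complete decoder-side synthesis loop, combining the formula, the warm-up rule, the wide accumulator and the wrapping final add, might look like this. It is a sketch with a hypothetical `synthesize` function, not the reference decoder.

```rust
/// Reconstructs samples from residuals per §3.6 (sketch; inputs unchecked).
pub fn synthesize(residuals: &[i32], coeffs: &[i16], coefficient_shift: u8) -> Vec<i32> {
    let s = 15 - u32::from(coefficient_shift);
    let bias = 1i64 << (s - 1);
    let mut out: Vec<i32> = Vec::with_capacity(residuals.len());
    for (i, &r) in residuals.iter().enumerate() {
        let terms = i.min(coeffs.len());
        let predict: i64 = if terms == 0 {
            0 // warm-up: empty sum, no rounding applied
        } else {
            // i64 accumulator: wide enough even when earlier outputs span
            // the whole i32 range.
            let acc: i64 = (0..terms)
                .map(|j| i64::from(coeffs[j]) * i64::from(out[i - j - 1]))
                .sum();
            (acc + bias) >> s
        };
        // `as i32` keeps the low 32 bits, so wrapping_add yields the sum
        // modulo 2^32 even if an adversarial `predict` exceeds i32.
        out.push(r.wrapping_add(predict as i32));
    }
    out
}
```

With the order-1 coefficient `[16384]` at `coefficient_shift = 1` (real value 1), residuals `[5, 1, 1, 1]` decode to `[5, 6, 7, 8]`.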
 ## 4. Rice Bitstream
 Immediately follows the header. Flat MSB-first bitstream structured as
 consecutive partition payloads:
 ```text
 +---------+---------+-----+-----------+
 | part. 0 | part. 1 | ... | part. P-1 |
 +---------+---------+-----+-----------+
 ```
 where `P = 1 << partition_order`. Each partition has the same structure:
 ```text
 +-------+-------------+-------------+-----+-------------+
 | k (5) | codeword 0  | codeword 1  | ... | codeword M-1|
 +-------+-------------+-------------+-----+-------------+
 ```
 where `M = frame_sample_count / P` is the per-partition residual count.
 Partitions are **bit-contiguous**: the 5-bit `k` field of partition
 `i + 1` begins at the bit immediately following the last codeword of
 partition `i`. There is no byte alignment between partitions. Only
 the final trailing padding described in §4.3 is byte-aligned.
 Within a partition, the bit cursor likewise advances continuously:
 codeword 0 begins at the bit immediately following the 5-bit `k`
 field, codeword 1 immediately after the last bit of codeword 0, and so
 on. A conformant decoder maintains a single bit-read position across
 the entire Rice bitstream — from the `k` of partition 0 through the
 last codeword of partition P−1 — and never realigns to a byte or bit
 boundary between fields. This is implicit in the byte-stream decoder
 design (a bit reader that consumes bits sequentially needs no
 special handling at field boundaries) but stated here so second-team
 implementations do not introduce a spurious alignment.
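A single-cursor reader, sketched with a hypothetical `BitReader` type, shows why no special handling is needed at field boundaries: the `k` fields, unary runs and remainders all advance the same absolute bit position.

```rust
/// Hypothetical MSB-first bit reader with one cursor for the whole payload.
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // absolute bit position; never realigned between fields
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }

    /// Next bit, or None once the buffer is exhausted (the `Truncated` case).
    pub fn read_bit(&mut self) -> Option<bool> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1 == 1;
        self.pos += 1;
        Some(bit)
    }

    /// Reads `n` bits (n <= 32) MSB-first into the low bits of a u32.
    pub fn read_bits(&mut self, n: u32) -> Option<u32> {
        let mut v: u32 = 0;
        for _ in 0..n {
            v = (v << 1) | u32::from(self.read_bit()?);
        }
        Some(v)
    }
}
```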
 ### 4.1 Per-Partition Parameter `k`
 Five-bit unsigned integer, MSB-first, immediately before the partition's
 codewords. `k` is the Rice parameter for this partition and **must** be in
 `[0, 23]`. Decoders **must** reject values above 23 as malformed.
 ### 4.2 Codeword
 Each residual is encoded by:
 1. **Zigzag mapping** from signed to unsigned:
   ```text
   z = (r << 1) ^ (r >> 31)            interpreted as u32
   ```
   where `(r >> 31)` is an **arithmetic right shift** on the i32
   residual, sign-extending the sign bit to all 32 bit positions — `0`
   for non-negative `r`, `-1` (all ones) for negative `r`. The entire
   expression is **masked to 32 bits** before being interpreted as
   `u32`. In languages with arbitrary-precision integers (e.g. Python)
   or where native bitwise ops return signed 32-bit values (e.g.
   JavaScript / TypeScript `Number`), this mask is explicit
   (`& 0xFFFFFFFF` or `>>> 0`) and **required**: without it the result
   is not guaranteed to be a `u32` (in JS/TS an unmasked result reads
   as negative whenever bit 31 is set). Implementations in languages whose native right
   shift is logical on unsigned types **must** coerce `r` to a signed
   32-bit type first; implementations in languages where `>>` on
   signed types is implementation-defined (e.g. pre-C++20 C/C++)
   **must** emulate the arithmetic shift explicitly.
   The `r << 1` factor is always safe on i32. Residuals on the
   encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2·2²³
   ≈ 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without
   overflow or undefined behaviour — even for `r` at the most
   negative value the encoder can ever produce from in-contract
   input (§1). Decoder-side code does not perform this shift; the
   inverse uses `z >> 1` on a u32, which is always defined.
   The mapping sends
   ```text
   {0, −1, 1, −2, 2, −3, 3, …}  →  {0, 1, 2, 3, 4, 5, 6, …}
   ```
   so small magnitudes of either sign map to small unsigned values.
   The decoder's inverse is
   ```text
   r = ((z >> 1) as i32) ^ −((z & 1) as i32)
   ```
   where `z >> 1` is a logical shift on the u32 and the second operand
   is ordinary integer negation, which yields `0` or `−1` (all ones).
   Stating this inverse explicitly removes any ambiguity about how an
   implementation must invert the zigzag.
 2. **Rice code** at parameter `k`:
   - **Unary part**: `q = z >> k` zero-bits followed by a single
     terminating 1-bit.
   - **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first.
     (The remainder part is absent when `k == 0`.)
 Total codeword length: `q + 1 + k` bits.
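Both steps of the codeword construction can be sketched in Rust, where `>>` on `i32` is already arithmetic and `as u32` reinterprets the bits, so no explicit mask is needed. The function names are hypothetical, and the encoder appends to a `Vec<bool>` for clarity rather than packing bytes.

```rust
/// Signed -> unsigned zigzag map. In Rust, `>>` on i32 is arithmetic and
/// `<<` discards the top bit without panicking; `as u32` reinterprets bits.
pub fn zigzag(r: i32) -> u32 {
    ((r << 1) ^ (r >> 31)) as u32
}

/// Inverse map: logical shift on the u32, then XOR with 0 or -1 (all ones).
pub fn unzigzag(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}

/// Appends the Rice codeword for `z` at parameter `k` (0..=23):
/// q = z >> k zero bits, a terminating 1, then the low k bits MSB-first.
pub fn rice_encode(z: u32, k: u32, out: &mut Vec<bool>) {
    let q = z >> k;
    out.extend(std::iter::repeat(false).take(q as usize));
    out.push(true);
    for i in (0..k).rev() {
        out.push((z >> i) & 1 == 1); // no iterations when k == 0
    }
}
```

For instance, `z = 13` at `k = 2` has `q = 3` and remainder `01`, giving the 6-bit codeword `000101`.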
 #### Decoder-side unary-run bound
 Decoders **must** reject any codeword whose unary run length satisfies
 ```text
 q > (2³² − 1) >> k         (equivalently, q > u32::MAX >> k)
 ```
 A valid codeword reconstructs `z = (q << k) | remainder` as a u32.
 `q > u32::MAX >> k` implies `q << k ≥ 2³²`, so `z` cannot be a u32 and
 the stream is malformed; a decoder that skips the check would instead
 overflow u32 silently (a critical decoder bug class: corrupt output
 with no error). The frame **must** be discarded with
 `InvalidParameter` or an equivalent rejection class (§6). The bound
 varies with `k`: at `k = 23` it is `511`, at `k = 0` it is `u32::MAX`
 (no practical constraint).
 This rule also caps the CPU cost of unary scanning on adversarial
 input: without the cap, a decoder could be forced to scan an
 arbitrarily long run of zero bits before reaching either a `1` or
 the buffer end.
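A sketch of single-codeword decoding that enforces the unary-run cap while scanning, so rejection happens as soon as the run exceeds `u32::MAX >> k` rather than after it ends. `rice_decode` and `RiceError` are hypothetical names; the bit source is any iterator of MSB-first bits.

```rust
#[derive(Debug, PartialEq)]
pub enum RiceError {
    Truncated,
    InvalidParameter, // unary run exceeds u32::MAX >> k
}

/// Decodes one codeword at parameter k (0..=23) from an MSB-first bit source.
pub fn rice_decode<I: Iterator<Item = bool>>(bits: &mut I, k: u32) -> Result<u32, RiceError> {
    let cap = u32::MAX >> k;
    let mut q: u32 = 0;
    loop {
        match bits.next() {
            None => return Err(RiceError::Truncated),
            Some(true) => break, // terminating 1-bit
            Some(false) => {
                // Stop reading as soon as the run would exceed the cap.
                if q == cap {
                    return Err(RiceError::InvalidParameter);
                }
                q += 1;
            }
        }
    }
    let mut rem: u32 = 0;
    for _ in 0..k {
        let b = bits.next().ok_or(RiceError::Truncated)?;
        rem = (rem << 1) | u32::from(b);
    }
    // q <= u32::MAX >> k, so q << k cannot lose bits.
    Ok((q << k) | rem)
}
```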
 ### 4.3 Byte Padding
 After the last codeword of the last partition, any unused bits of the final
 byte are zero-padded on the LSB side. The encoder writes `0` for all padding
 bits; the decoder ignores them.
 ### 4.4 Bitstream Length
 The Rice bitstream's total bit length is the sum of codeword bit
 lengths across every partition, plus 5 bits per partition for the `k`
 fields:
 ```text
 total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
 ```
 This depends on every residual's quotient and cannot be computed
 from the frame header alone. A decoder **stream-decodes** until it
 has produced exactly `frame_sample_count` samples, then stops.
 Any bits remaining inside the last byte are padding (§4.3) and
 carry no information.
 A decoder **must not** require the Rice bitstream's byte length to be
 signalled out-of-band. The header plus the zero-padded byte-aligned
 tail fully determines the frame boundary; `parse_header`'s
 `bytes_consumed` return plus streaming Rice decode locates the end of
 the frame in the input buffer.
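The bit-count formula can be evaluated once every residual has been Rice-coded. This hypothetical helper takes each partition's `k` together with its zigzag-mapped residuals and returns both the bit total and the zero-padded byte length; adding the header length `7 + 2 · prediction_order` then gives the frame's full size.

```rust
/// Rice payload size: 5 bits per partition for `k`, plus q + 1 + k bits
/// per residual. Each entry pairs a partition's k with its zigzag values.
/// Returns (total_bits, payload_bytes).
pub fn rice_payload_size(partitions: &[(u32, Vec<u32>)]) -> (u64, u64) {
    let mut total_bits: u64 = 0;
    for (k, zs) in partitions {
        let k = *k;
        total_bits += 5; // the partition's k field
        for &z in zs {
            total_bits += u64::from(z >> k) + 1 + u64::from(k);
        }
    }
    (total_bits, (total_bits + 7) / 8) // ceil(total_bits / 8)
}
```

For a 16-sample all-zero frame at `partition_order = 0` and `k = 0`, this gives 21 bits, i.e. 3 payload bytes.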
 ## 5. Degenerate Cases
 ### 5.1 All-Zero Frame
 For an all-zero sample vector, the encoder **must** use
 `prediction_order = 0` because the Levinson-Durbin recursion is
 undefined at `R[0] = 0`. Residuals equal the input (all zeros).
 Partition-order and per-partition `k` selection remain at the
 encoder's discretion; any legal `(partition_order, k)` combination
 produces a bit-exact-decodable frame. The minimum-cost choice —
 `partition_order = 0`, `k = 0` — produces a Rice payload of exactly
 `5 + frame_sample_count` bits. Compliant encoders are **not**
 required to pick this minimum.
 ### 5.2 Single-Sample Frame
 `frame_sample_count = 1` is valid but forces `partition_order = 0` (the only
 value that divides 1 evenly). The single sample is Rice-coded directly
 because no predecessors exist for any LPC order.
 ## 6. Error Recovery
 Decoders **must** detect and reject every frame that violates the
 constraints elsewhere in this document. The exhaustive list of
 rejection classes, each of which is a distinct error condition so
 callers can distinguish them in telemetry, is:
 1. **Sync word mismatch** — bytes `0-1` differ from `0x1ACC` (§3.1).
 2. **`prediction_order` out of range** — value > `32` (§3.2).
 3. **`partition_order` out of range** — value > `7` (§3.3).
 4. **`coefficient_shift` out of range** — value > `5` (§3.4).
 5. **Verbatim frame with non-zero shift** — `prediction_order == 0` and
   `coefficient_shift != 0` (§3.4).
 6. **`frame_sample_count == 0`** (§3.5).
 7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
 8. **Buffer truncated** — fewer bytes than the header plus coefficient
   array requires, or fewer bits than the Rice bitstream demands
   during streaming decode. This class is intentionally coarse-grained:
   a single `Truncated` variant covers header truncation, missing `k`
   fields, mid-codeword exhaustion, and every other "buffer ends early"
   shape the decoder can encounter. Sub-categorising these provides no
   caller benefit — the recovery action (discard and substitute
   silence, see §6.1) is identical regardless of where truncation
   happened.
 9. **Per-partition `k` out of range** — value > `23` (§4.1).
 10. **Unary-run cap exceeded** — any codeword with `q > u32::MAX >> k`
    (§4.2).
 On any of these, the decoder **must** discard the frame, produce no
 output samples, and signal the error to the caller. **No partial
 state may propagate** to the next frame's decode — frames are
 independent (§2), so subsequent frames decode cleanly regardless.
 ### 6.1 Caller-side silence substitution
 On rejection, the caller substitutes `frame_sample_count` zeros
 (silence) for the frame period. The count is obtained as follows:
 - **Post-header rejections** (classes 9-10 above, raised as
  `InvalidParameter` during Rice decode, and class 8 when the truncation
  falls inside the Rice bitstream): the frame header parsed successfully
  before the failure, so the count is recoverable. The caller re-parses
  just the header on the same buffer (reference API: `parse_header(data)`)
  and reads `frame_sample_count` from the resulting `AudioFrameHeader`.
 - **Pre-header rejections** (classes 1-7 above, and class 8 when the
  fixed header or its coefficient array is truncated): the header itself
  failed, so no trustworthy frame length is available from the
  bitstream. The caller **must** fall back to a session-level default
  frame size carried out-of-band by the container or transport (WebRTC
  and QUIC audio sessions typically negotiate this at session setup).
 This asymmetry is inherent to the wire format: `frame_sample_count`
 lives inside the header at offset 5, so any rejection that happens
 while parsing bytes 0-4 precedes its discovery, and a count read from
 a header that then fails validation (classes 6-7, or a truncated
 coefficient array) cannot be trusted.
 ### 6.2 Decoder output magnitude
 On well-formed bitstreams produced by a compliant encoder from
 in-contract samples (§1), decoder output satisfies
 `|sample| ≤ 2²³ − 1`.
 Adversarial bitstreams — those with hand-crafted coefficients and
 residuals that pass every rejection check in this section yet
 produce arithmetic results outside the sample-magnitude contract —
 **may** produce output samples of any `i32` value, including values
 that exceed `2²³ − 1`. The decoder **must not** panic or reject on
 this basis: the wrapping-add semantics of §3.6 are precisely what
 makes every bit sequence produce a defined output, which underpins the
 "no partial state propagates" contract at the top of this section.
 Callers that re-feed decoder output into LAC's encoder (for example,
 an MCU decode → PCM mix → re-encode pipeline) **should** validate or
 clamp to the input magnitude contract before re-encoding. A
 compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1`
 and is not required to re-validate.
 ## 7. Encoder Guidance (non-normative)
 The reference encoder's search has three phases:
 ```text
 # Phase 0: all-zero short-circuit
 R[0] = Σ sample[i]²
 if R[0] == 0:
    emit frame with prediction_order = 0, any legal partition_order (§5.1)
    return
 # Phase 1: sparse LPC order grid with stop-when-stale early-out
 for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
    coeffs_q31     = levinson_durbin(samples, prediction_order)  # cached
    shift          = smallest s ≤ 5 with every coeff_real in [-2^s, 2^s), else 5 (saturate, §3.4)
    coeffs_stored  = quantize(coeffs_q31, to: Q(15 - shift))
    residuals      = compute_residuals(samples, coeffs_stored, shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(prediction_order) + rice_bits
        track minimum over (prediction_order, partition_order)
    if no improvement for 2 consecutive grid entries: break
 # Phase 2: fixed-predictor post-pass
 for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
    residuals = compute_residuals(samples, fp_coeffs, fp_shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(fp_order) + rice_bits
        track minimum over (fp_order, partition_order)
 emit frame with the (order, partition_order) that minimised `total`
 ```
 The sparse grid + early-out is a speed/compression trade-off; a
 compliant encoder may still exhaustively search every integer order
 `0..=32` for marginal gains at higher cost. The produced bitstreams
 are interchangeable.
 The fixed-predictor post-pass tries FLAC-style integer predictors
 (orders 1-4 with a small static coefficient table) after the LPC
 grid. These evaluate quickly and occasionally beat the Levinson-
 Durbin winner on content where a low-order integer polynomial fits
 better than the statistically-optimal LPC fit — silent-plus-DC, very
 smooth tones, polynomial-ish sensor data. Running them second avoids
 tripping the stop-when-stale heuristic in the LPC phase.
 The reference encoder's `FIXED_PREDICTORS` table, materialising the
 FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer
 predictors at the smallest `coefficient_shift` that represents each
 coefficient without clamping:
 | `prediction_order` | `lpc_coefficients` (Q-format integers)      | `coefficient_shift` | Real-value interpretation    |
 |-------------------:|---------------------------------------------|--------------------:|------------------------------|
 | 1                  | `[16384]`                                    | 1 (Q14)             | `[1]`                        |
 | 2                  | `[16384, −8192]`                             | 2 (Q13)             | `[2, −1]`                    |
 | 3                  | `[24576, −24576, 8192]`                      | 2 (Q13)             | `[3, −3, 1]`                 |
 | 4                  | `[16384, −24576, 16384, −4096]`              | 3 (Q12)             | `[4, −6, 4, −1]`             |
 These are the exact coefficient values (serialised as big-endian i16,
 §3.6) a second-team encoder would emit to match the reference's
 fixed-predictor outputs bit-for-bit. Compliant encoders **may** use a
 different set (or none), since §3.6 treats the
 coefficient field as opaque — decoders apply the synthesis formula
 identically regardless of source.
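The table can be transcribed as a Rust constant and checked against its real-value column by dividing each stored integer by `2^(15 − coefficient_shift)`. The constant's shape and the `real_value` helper are illustrations, not the reference encoder's actual declarations.

```rust
/// FIXED_PREDICTORS as (prediction_order, lpc_coefficients, coefficient_shift).
pub const FIXED_PREDICTORS: [(u8, &[i16], u8); 4] = [
    (1, &[16384], 1),                       // [1]            in Q14
    (2, &[16384, -8192], 2),                // [2, -1]        in Q13
    (3, &[24576, -24576, 8192], 2),         // [3, -3, 1]     in Q13
    (4, &[16384, -24576, 16384, -4096], 3), // [4, -6, 4, -1] in Q12
];

/// Real value of a stored coefficient: coeff / 2^(15 - shift).
pub fn real_value(coeff: i16, shift: u8) -> f64 {
    f64::from(coeff) / f64::from(1u32 << (15 - shift))
}
```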
 The `R[0] == 0` short-circuit is both a correctness requirement (§5.1
 — Levinson-Durbin is undefined at zero autocorrelation) and an
 encoder-cost optimisation: on digital silence, the sparse grid and
 fixed-predictor pass produce identical zero residuals and order 0
 wins on header size alone.
 Levinson-Durbin runs once to order 32 with all intermediate orders saved
 into a flat buffer (one recursion pass yields all orders 1..32 at
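 `O(order²)` cost), so fetching each order's coefficients in the outer
 loop is free, and the cost of order selection is dominated by residual
 computation and Rice costing over the `N` samples for each order tried.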
 `shift` is determined per order by the coefficient magnitudes — there
 is no shift search, as smaller shifts give finer coefficient precision
 whenever they don't clamp (saturation, §3.4, is the
 fallback). Rice cost at a given `partition_order` is exact and
 closed-form given the per-partition `k`, so the inner search
 introduces no estimation error.
 #### Levinson-Durbin numerical choices (reference, non-normative)
 The reference encoder runs Levinson-Durbin with i64 autocorrelation
 accumulators, Q31 working coefficients, and widens to i128 for
 reflection-coefficient intermediates at orders where Q31 would lose
 precision (typically above order ~12). Rounding on the Q31→Q(15−shift)
 quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`,
 with `shift_amt = 16 + shift` and `bias = 1 << (shift_amt − 1)`, i.e.
 `1 << (15 + shift)`: the direct analogue of the synthesis formula's
 rounding (§3.6), chosen so analysis and synthesis agree on tie-break
 direction.
 None of these choices are normative. Two encoders making different
 precision or rounding choices will produce different coefficient
 bytes on the same input, but both bitstreams decode correctly under
 §3.6 so long as the coefficients they emit faithfully represent their
 own LPC decision. §3.6's "wire format doesn't distinguish coefficients
 by derivation" clause is exactly what permits this freedom.
 #### Rice `k` selection (reference, non-normative)
 Cost is closed-form for a given `(partition, k)`: `bit_cost(k) =
 N · (1 + k) + Σ (v >> k)` where `v` ranges over the zigzag-mapped
 residuals of the partition. Exhaustive search over `k ∈ [0, 23]` is
 always acceptable and is the simplest compliant choice.
 The reference encoder uses convex descent: seed `k_seed =
 ⌊log₂(mean(v))⌋`, then walk either direction. Ties break **toward
 smaller `k`** on descent (condition `cost ≤ best_cost`, so equal
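 costs at `k−1` overwrite `k`), while ascent accepts **only strictly
 smaller costs** (condition `cost < best_cost`, so equal costs at `k+1`
 don't override `k`). Because `bit_cost(k)` is convex in `k` (the
 per-step saving `Σ ((v >> k) − (v >> (k+1))) − N` never increases as
 `k` grows), this yields a unique `k` per partition matching an
 exhaustive search's first-wins tie-break.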
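Both `k` strategies can be sketched and compared; the function names are hypothetical. `k_descent` follows the seed and tie-break rules described above, treating a zero mean as seed `0` and clamping the seed to 23 (both assumptions, since the text does not cover those edges). Because the cost is convex in `k`, it agrees with the exhaustive first-wins search.

```rust
/// Closed-form Rice cost of one partition at parameter k:
/// N · (1 + k) + Σ (v >> k), excluding the 5-bit k field itself.
pub fn bit_cost(v: &[u32], k: u32) -> u64 {
    v.iter().map(|&x| 1 + u64::from(k) + u64::from(x >> k)).sum()
}

/// Exhaustive search over 0..=23; `min_by_key` keeps the first minimum,
/// i.e. the smallest k among equal costs.
pub fn k_exhaustive(v: &[u32]) -> u32 {
    (0..=23u32).min_by_key(|&k| bit_cost(v, k)).unwrap()
}

/// Convex descent from a floor(log2(mean)) seed with the text's tie-breaks.
pub fn k_descent(v: &[u32]) -> u32 {
    let n = v.len().max(1) as u64;
    let mean = v.iter().map(|&x| u64::from(x)).sum::<u64>() / n;
    // Assumption: a zero mean seeds k = 0, and the seed is clamped to 23.
    let seed = if mean == 0 { 0 } else { (63 - mean.leading_zeros()).min(23) };
    let (mut best_k, mut best) = (seed, bit_cost(v, seed));
    // Descent: `<=` lets an equal cost at k - 1 replace k.
    while best_k > 0 {
        let c = bit_cost(v, best_k - 1);
        if c > best { break; }
        best = c;
        best_k -= 1;
    }
    // Ascent: `<` keeps k when k + 1 only ties. If descent moved, the cost
    // at best_k + 1 is already >= best, so this loop exits immediately.
    while best_k < 23 {
        let c = bit_cost(v, best_k + 1);
        if c >= best { break; }
        best = c;
        best_k += 1;
    }
    best_k
}
```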
 On the reference corpus, the sparse grid's compression matches
 exhaustive-search output within ~0.2 percentage points (measured);
 the differential test caps the acceptable excess at 0.5%.
 Implementations that prefer tighter compression at higher encode
 cost can extend the grid without wire-format consequences — decoders
 do not care which orders an encoder tried.
 ### 7.1 Frame size (non-normative)
 The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger
 frames amortise the 7-byte fixed header and the LPC coefficient vector
 over more samples, and generally compress better. Smaller frames give
 tighter latency and finer partition-order granularity on transient
 content.
 Recommended defaults by use case:
 - **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at
  16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used
  by Opus / WebRTC.
 - **Real-time full-band** (game audio, music conferencing): 1024-2048
  at 48 kHz (21-43 ms).
 - **Offline / archival**: 4096-8192 at 48 kHz; compression gains
  flatten past this.
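 Power-of-two frame sizes of at least 128 samples expose every
 `partition_order ∈ [0, 7]` to the encoder's search. Because
 `partition_count` is always a power of two, only the factors of two in
 `frame_sample_count` matter: the largest usable `partition_order` is
 the number of times 2 divides the frame size (capped at 7), and any
 odd frame size forces `partition_order = 0`. Encoders **should**
 prefer frame sizes with a large power-of-two factor (ideally a
 multiple of 128) when free to choose.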
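Since `partition_count = 1 << partition_order` is always a power of two, the largest legal `partition_order` for a frame size depends only on how many times 2 divides it. A hypothetical helper:

```rust
/// Largest partition_order a frame size admits: 2^order must divide
/// frame_sample_count, and the format caps the order at 7.
pub fn max_partition_order(frame_sample_count: u16) -> u8 {
    assert!(frame_sample_count > 0, "frame_sample_count must be >= 1");
    frame_sample_count.trailing_zeros().min(7) as u8
}
```

It gives 6 for 960 and 320, 5 for 480 and 160, 7 for 1024 and 4096, and 0 for any odd size.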
 ## 8. Versioning
 This document specifies **LAC version 1**, identified on the wire by
 `sync_word = 0x1ACC`. No per-frame version byte is carried; the sync
 word uniquely identifies the wire format.
 Future revisions of the format **must** use a distinct `sync_word`. The
 recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on
 inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making
 casual grep / hex-dump inspection robust. A revision whose wire format
 cannot be made compatible with v1 at the header level **must** pick a
 sync word outside this cluster.
 This approach is exhaustive by construction:
 - A v1 decoder that encounters a v2 frame sees an unrecognised sync
  word on the first check (§3.1) and rejects cleanly — the same error
  path as foreign or corrupted payloads.
 - A v2-aware decoder dispatches on the sync word before reading any
  further field, so it can fall back to v1 parsing when appropriate
  or decode v2 frames natively.
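The dispatch rule can be sketched as a classification on the first two bytes, made before any other field is read. The `Format` enum and `classify` function are hypothetical; a v1-only decoder would reject everything except `LacV1`.

```rust
/// Result of dispatching on the sync word (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Format {
    LacV1,                    // 0x1ACC
    ReservedLacRevision(u16), // 0x1ACD..=0x1ACF, recommended for v2..v4
    Unrecognised,             // anything else, or fewer than two bytes
}

/// Classifies a payload by its first two bytes, before reading any other field.
pub fn classify(data: &[u8]) -> Format {
    match data {
        [hi, lo, ..] => match u16::from_be_bytes([*hi, *lo]) {
            0x1ACC => Format::LacV1,
            w @ 0x1ACD..=0x1ACF => Format::ReservedLacRevision(w),
            _ => Format::Unrecognised,
        },
        _ => Format::Unrecognised,
    }
}
```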
 Because every field in §3 has a strict range with no reserved
 high-bit patterns, in-place extension (flag bits inside existing
 fields) is **not** a supported evolution path. New features go into a
 new `sync_word`, not into reinterpreting existing field values.
 Transports that multiplex LAC frames with other formats should frame
 each LAC payload explicitly (length prefix or stream separator); the
 sync word alone is not a framing delimiter, only a format identifier.
 ## 9. Implementation notes (non-normative)
 ### 9.1 GPU offload is out of scope
 LAC is a scalar integer codec. The reference implementation, and any
 conforming implementation this document anticipates, runs on a CPU.
 GPU offload is deliberately not a goal:
 - **Levinson-Durbin** is serial by construction (each iteration depends
  on the previous) and its intermediate accumulator needs more than 64
  bits of precision at higher orders — a poor fit for WGSL or SPIR-V
  compute shaders, which have no native 128-bit integer arithmetic.
 - **Rice decode** uses a data-dependent unary run for every residual.
  On GPU execution models this causes heavy warp divergence, and its
  sequential bit-cursor progression defeats SIMD lane packing.
 - **LPC synthesis** has a tight per-sample feedback loop (sample `i`
  depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is
  inherently serial.
 - **The one plausibly GPU-parallel phase**, residual computation inside
  the encoder's order search, is also the phase whose CPU implementation
  already autovectorizes well onto SIMD on any modern target. At the
  measured encode latencies (P99 under 50 µs per frame on x86, against a
  20 ms frame period, i.e. more than 400× headroom), there is no
  motivation to offload it.
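 The feedback dependency in LPC synthesis can be made concrete with a generic integer synthesis loop. This sketch makes two assumptions. First, the prediction is the sum of `coeffs[j] · x[i−1−j]`, arithmetically shifted right by `shift`. Second, the first `order` samples are carried verbatim as warm-up. That is a common fixed-point convention, not a transcription of LAC's normative reconstruction rules. `lpc_synthesize` is a hypothetical name.
 ```rust
 /// Generic integer LPC synthesis (illustrative conventions, see lead-in):
 /// x[i] = residual + ((sum_j coeffs[j] * x[i-1-j]) >> shift).
 fn lpc_synthesize(warmup: &[i32], coeffs: &[i32], shift: u32, residuals: &[i32]) -> Vec<i32> {
     let order = coeffs.len();
     assert_eq!(warmup.len(), order, "one verbatim warm-up sample per coefficient");
     let mut out = Vec::with_capacity(order + residuals.len());
     out.extend_from_slice(warmup);
     for &residual in residuals {
         let i = out.len();
         // The prediction reads the `order` most recently reconstructed
         // samples, so this iteration cannot start before i - 1 finishes.
         let acc: i64 = coeffs
             .iter()
             .enumerate()
             .map(|(j, &c)| i64::from(c) * i64::from(out[i - 1 - j]))
             .sum();
         // Arithmetic right shift on i64 rounds toward negative infinity.
         out.push(residual + (acc >> shift) as i32);
     }
     out
 }
 ```
 With `coeffs = [2, -1]` and `shift = 0`, the loop behaves as a second-order fixed predictor: warm-up `[0, 1]` plus zero residuals yields a linear ramp. Every step still waits on the one before it.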
 A hypothetical future revision whose hot path genuinely benefited from
 GPU execution (large-batch archival encoding across many channels at
 once, for instance) would need to change the wire format to carry
 enough shape metadata for batched kernels — i.e., a new sync word
 under the versioning rules in §8, not a retrofit.
 ## 10. Conformance test vectors
 The reference repository's `tests/conformance.rs` holds the canonical
 test-vector set for this specification:
 - **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte
  level. A conformant decoder **must** produce the `samples` array
  when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a
  second-team encoder's bytes for the same samples may differ; the
  decoder direction is the normative one. Coverage includes the
  smallest valid frames (single-sample verbatim, 4- and 8-sample
  silence), single-sample polarity boundaries (±1, ±(2²³−1)), DC
  offset, alternating-polarity Nyquist-like content, smooth polynomial
  (fixed-predictor territory), and a 16-sample growing-amplitude
  pattern exercising partition search.
 - **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to
  their expected rejection variants: bad sync, each field-range
  violation, verbatim + non-zero shift, `frame_sample_count == 0`,
  non-divisible partition count, header / coefficient / Rice-bitstream
  truncation, and per-partition `k > 23`. Together with
  `reject_unary_run_above_cap` below, this covers every class in §6
  (1–10).
 - **`reject_unary_run_above_cap`** — a programmatic test for §6 class
  10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75
  bytes of mostly zeros; construction logic is in the test, not a
  const fixture.
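 For orientation, the sketch below shows roughly what the class 10 check guards. It reads one Rice-coded residual from an MSB-first bit stream (§1). It assumes the unary quotient is a run of zero bits ended by a one, which is consistent with the mostly-zero triggering payload described above. It also assumes a standard zigzag mapping from unsigned to signed. `BitReader`, `RiceError`, and `read_rice` are hypothetical names.
 ```rust
 /// MSB-first bit reader over a byte slice (bit order from §1).
 struct BitReader<'a> {
     data: &'a [u8],
     bit_pos: usize,
 }
 
 #[derive(Debug, PartialEq)]
 enum RiceError {
     Truncated,
     UnaryRunAboveCap,
 }
 
 impl<'a> BitReader<'a> {
     fn new(data: &'a [u8]) -> Self {
         BitReader { data, bit_pos: 0 }
     }
 
     fn read_bit(&mut self) -> Result<u32, RiceError> {
         let byte = *self.data.get(self.bit_pos / 8).ok_or(RiceError::Truncated)?;
         let bit = (byte >> (7 - self.bit_pos % 8)) & 1;
         self.bit_pos += 1;
         Ok(u32::from(bit))
     }
 
     fn read_bits(&mut self, n: u32) -> Result<u32, RiceError> {
         let mut v = 0;
         for _ in 0..n {
             v = (v << 1) | self.read_bit()?;
         }
         Ok(v)
     }
 }
 
 /// Decode one residual with Rice parameter `k` (callers reject k > 23).
 fn read_rice(r: &mut BitReader, k: u32) -> Result<i32, RiceError> {
     let cap = u32::MAX >> k;
     let mut q: u32 = 0;
     // Data-dependent unary run: count 0-bits until the terminating 1-bit,
     // rejecting as soon as the quotient would exceed the class 10 cap.
     while r.read_bit()? == 0 {
         if q == cap {
             return Err(RiceError::UnaryRunAboveCap);
         }
         q += 1;
     }
     let u = (q << k) | r.read_bits(k)?;
     // Zigzag decode: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
     Ok(((u >> 1) as i32) ^ -((u & 1) as i32))
 }
 ```
 At `k = 23`, the largest parameter that survives the `k > 23` check, the cap is `u32::MAX >> 23 = 511`. In this sketch, 512 consecutive zero bits (64 zero bytes) are therefore enough to trip it.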
 Second-team implementations should port the decode fixtures
 byte-for-byte. They should also port the reject fixtures byte-for-byte,
 checking that each input is rejected with its expected variant.
 `encode_matches_fixtures` in the same file is reference-specific (it
 asserts the reference encoder's exact bytes) and is **not** a
 conformance requirement; see §3.6's encoder-latitude clause.
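 A port of both fixture sets can share one small harness. This sketch is generic over the decoder under test. The fixture shapes only mirror the `(samples, bytes)` and bytes-to-variant pairings described above. `check_conformance` is a hypothetical helper, not code from `tests/conformance.rs`.
 ```rust
 use std::fmt::Debug;
 
 /// Hypothetical harness: `decode` is the implementation under test.
 /// Decode fixtures must reproduce `samples` exactly; reject fixtures must
 /// fail with exactly the expected error variant.
 fn check_conformance<E: PartialEq + Debug>(
     decode: impl Fn(&[u8]) -> Result<Vec<i32>, E>,
     decode_fixtures: &[(&[i32], &[u8])],
     reject_fixtures: &[(&[u8], E)],
 ) -> Result<(), String> {
     for (i, &(samples, bytes)) in decode_fixtures.iter().enumerate() {
         match decode(bytes) {
             Ok(got) if got.as_slice() == samples => {}
             other => return Err(format!("decode fixture {i}: got {other:?}")),
         }
     }
     for (i, (bytes, want)) in reject_fixtures.iter().enumerate() {
         match decode(*bytes) {
             // Matching the variant, not just "some error", is the point.
             Err(ref e) if e == want => {}
             other => return Err(format!("reject fixture {i}: got {other:?}")),
         }
     }
     Ok(())
 }
 ```
 Encoder output never enters this harness, which matches the rule that only the decoder direction is normative.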
 ### 10.1 Reference encoder exemplars (non-normative)
 The same `(samples, bytes)` pairs, read in the encoder direction,
 serve as **reference-encoder validation targets** for implementations
 that want to match this project's reference byte-for-byte — a common
 goal for porting work even though the spec does not require it. The
 fixture set is deliberately chosen to pin every encoder-discretion
 axis from §7:
 - `single_zero`, `single_pos_one`, `single_neg_one` — single-sample
  frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime
  where the encoder's order choice is nearly arbitrary; pinning the
  bytes fixes this project's choice (order 0 with the minimum-cost
  Rice encoding).
 - `silence_4`, `silence_8` — force `partition_order` tie-breaks on an
  all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces
  identical cost; the reference picks the smallest via its
  convex-descent tie-break).
 - `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim
  decision. DC content favours a low-order LPC fit with small
  residuals; alternating content favours order 1 with `a = −1`
  (approximated by the nearest representable Q-format value). Pinning
  the bytes fixes the reference's decision boundary.
 - `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude
  single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both
  sides and pin the zigzag mapping of the extreme values.
 - `linear_ramp_8` — smooth polynomial content, fixed-predictor
  territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
 - `lfsr_noise_16` — exercises partition search on a frame large
  enough for `partition_order > 0` to be competitive.
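 The full-scale pair is easy to check by hand. This sketch assumes the standard zigzag mapping, which this section names but does not define. `zigzag`, `unzigzag`, and `MAX_MAGNITUDE` are illustrative names. Under that assumption the two extreme samples map to 2²⁴ − 2 and 2²⁴ − 3.
 ```rust
 /// Standard zigzag mapping, ASSUMED here: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
 fn zigzag(v: i32) -> u32 {
     // (v << 1) doubles v; (v >> 31) is all ones for negatives, so the
     // XOR turns -n into 2n - 1 and leaves non-negative 2n unchanged.
     ((v << 1) ^ (v >> 31)) as u32
 }
 
 /// Inverse mapping back to a signed sample.
 fn unzigzag(u: u32) -> i32 {
     ((u >> 1) as i32) ^ -((u & 1) as i32)
 }
 
 /// Largest legal sample magnitude, 2^23 - 1 (§1).
 const MAX_MAGNITUDE: i32 = (1 << 23) - 1;
 ```
 Both mapped values need all 24 bits. Even at the maximum Rice parameter `k = 23`, each therefore carries a unary quotient of 1.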
 A second-team encoder that produces the same bytes for every entry
 here is **likely** (not guaranteed) to produce matching bytes on
 wider inputs, since the tie-break axes are the ones most sensitive
 to encoder discretion. An encoder that produces different bytes is
 still compliant so long as its own bytes round-trip — see §3.6, §7.