diff --git a/Specification.md b/Specification.md
new file mode 100644
index 0000000..eedae43
--- /dev/null
+++ b/Specification.md
@@ -0,0 +1,784 @@
+# LAC Wire Format
+
+Normative specification of the LAC bitstream. This document is the authority
+on byte layout, field semantics, and encoder/decoder constraints.
+
+## 1. Conventions
+
+- All multi-byte integer fields are **big-endian**.
+- Bit streams are **MSB-first**: the first bit written occupies bit 7 of its
+  byte, subsequent bits fill lower positions, and a new byte begins once
+  eight bits have been emitted.
+- Samples are **signed integers** passed as `i32` with magnitude bounded
+  by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` must be a
+  consistent sign extension of the 24-bit-magnitude value. Narrower source
+  formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly
+  — they trivially satisfy the magnitude bound — and compress at the bit
+  cost of their actual values, not a 24-bit ceiling. The codec does not
+  carry bit-depth metadata; the container or application layer is
+  responsible for remembering the source format.
+- Sample rate is **not** part of the bitstream. The container or transport
+  carries it.
+- A frame encodes **one channel**. Stereo is two independent streams of
+  frames, one per channel.
+
+## 2. Frame Layout
+
+A frame is a contiguous byte sequence:
+
+```text
++--------+--------------------+
+| header | rice_bitstream     |
++--------+--------------------+
+```
+
+The `header` is fixed-structure (variable length because the coefficient
+array depends on `prediction_order`). The `rice_bitstream` is a bit-packed
+payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in
+the low bits of the last byte.
+
+Decoder input is the complete frame. There is no intra-frame continuation or
+fragmentation — the transport layer handles that.
+
+## 3. Frame Header
+
+```text
+ Offset  Size  Field                 Type     Constraint
+ ------  ----  --------------------  -------  ----------------------------
+   0-1   2     sync_word             u16 BE   == 0x1ACC
+   2     1     prediction_order      u8       ∈ [0, 32]
+   3     1     partition_order       u8       ∈ [0, 7]
+   4     1     coefficient_shift     u8       ∈ [0, 5]
+   5-6   2     frame_sample_count    u16 BE   ≥ 1, % (1 << partition_order) == 0
+   7+    2·p   lpc_coefficients      i16 BE[] length = prediction_order = p
+```
+
+Total header length: `7 + 2 · prediction_order` bytes.
+
+### 3.1 `sync_word`
+
+Fixed value `0x1ACC`. Present to identify a LAC frame on lightly framed
+transports and to reject foreign payloads at the first check. Decoders
+**must** reject any frame whose first two bytes are not `0x1ACC`.
+
+### 3.2 `prediction_order`
+
+Integer order of the LPC analysis filter used to produce residuals.
+
+- Value `0` is **verbatim mode**: no prediction, residuals equal the samples,
+  and the `lpc_coefficients` array is empty (zero bytes).
+- Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries
+  exactly that many predictor coefficients, interpreted in the Q-format
+  determined by `coefficient_shift` (§3.4).
+
+Decoders **must** reject values above 32.
+
+### 3.3 `partition_order`
+
+Controls how the residual stream is split for Rice coding.
+
+- `partition_count = 1 << partition_order`.
+- The residual stream is divided into `partition_count` equal partitions of
+  `frame_sample_count / partition_count` samples each.
+
+Decoders **must** reject values above 7 and **must** reject frames where
+`frame_sample_count` is not a multiple of `partition_count`.
+
+### 3.4 `coefficient_shift`
+
+Controls the fixed-point scale of the stored Q-format LPC predictor
+coefficients. Coefficients are stored as 16-bit integers interpreted as
+`Q(15 − coefficient_shift)`:
+
+| shift | Q-format | Real-value range  | Use case                                       |
+|-------|----------|-------------------|------------------------------------------------|
+| 0     | Q15      | `[−1, 1)`         | Coefficients with magnitude < 1 (most orders > 1) |
+| 1     | Q14      | `[−2, 2)`         | Low-frequency content, `a[1]` near −2         |
+| 2     | Q13      | `[−4, 4)`         | Extreme bass / narrow resonances               |
+| 3     | Q12      | `[−8, 8)`         | Pathological transients                        |
+| 4     | Q11      | `[−16, 16)`       | Reserved for synthetic signals                 |
+| 5     | Q10      | `[−32, 32)`       | Upper bound; decoder rejects larger values     |
+
+The encoder **must** select the smallest `coefficient_shift` at which no
+coefficient's real value exceeds the representable range for that shift —
+i.e., the smallest scale that does not clamp. Smaller shifts give finer
+precision and thus smaller residuals when no clamping is required.
+
+If no `coefficient_shift ∈ [0, 5]` suffices (the real coefficient
+magnitude exceeds the Q10 range at `shift = 5`), the encoder
+saturates each offending coefficient independently to the i16 range
+`[−32768, 32767]`. Bit-exact round-trip is preserved because the
+decoder applies the synthesis formula to whatever 16-bit values the
+wire carries; the cost of saturation is compression — the predictor
+no longer matches the encoder's ideal coefficients, so residuals
+grow. Real audio at the input-magnitude contract (§1) rarely reaches
+this case; synthetic or adversarial inputs can force it.
+
+Decoders **must** reject values above 5. The shift applies uniformly to
+every coefficient in `lpc_coefficients`; there is no per-coefficient
+scale.
+
+When `prediction_order == 0` (verbatim frame), `coefficient_shift`
+**must** be `0`. The shift only modifies how stored coefficients are
+interpreted, and a verbatim frame stores none. Decoders **must**
+reject frames with `prediction_order == 0` and `coefficient_shift != 0`
+as malformed; this rule closes the space of legal but meaningless
+headers so two implementations agree bit-for-bit on which inputs
+round-trip.
+
+### 3.5 `frame_sample_count`
+
+Number of audio samples produced by this frame (also the number of residuals
+in the Rice bitstream). The value **must** be in `[1, 65535]`.
+
+Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame
+trivially satisfies the partition-divisibility check below
+(`0 mod n == 0` for any `n`) but carries no audio and has no legal
+Rice payload.
+
+For compliance with `partition_order`, the value **must** satisfy
+`frame_sample_count mod (1 << partition_order) == 0`. Decoders **must**
+reject frames where this does not hold.
+
+### 3.6 `lpc_coefficients`
+
+Array of `prediction_order` predictor coefficients, each a 16-bit big-endian
+signed integer interpreted in `Q(15 − coefficient_shift)` format — see §3.4
+for the shift semantics.
+
+The wire format does not distinguish coefficients by derivation. The
+synthesis formula below applies identically whether the encoder
+obtained the values from Levinson-Durbin analysis, from a fixed
+coefficient template (e.g. FLAC-style integer predictors), from a
+trained model, or from any other strategy. What goes on the wire is
+just `prediction_order` 16-bit integers; how the encoder chose them
+is encoder-internal and not observable to the decoder.
+
+Synthesis formula (applied in the decoder):
+
+```text
+s           = 15 − coefficient_shift
+bias        = 1 << (s − 1)
+predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
+sample[i]  = residual[i] + predict[i]
+```
+
+where `terms = min(i, prediction_order)`. The `+ bias` term implements
+round-to-nearest for the right shift and is **required** for bit-exact
+decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.
+
+The `>> s` operator **must** be an **arithmetic right shift** on
+signed integers — equivalent to floor division by `2^s`. Combined with
+the `+ bias` pre-add, this implements **round-half toward +∞**: on a
+value whose scaled form is exactly `k + 0.5`, the result is `k + 1`
+for both positive and negative `k`. Implementations using truncating
+integer division (C's `/` on signed integers, which rounds toward
+zero) **will diverge** from this on any `sum + bias` that is negative
+and not evenly divisible by `2^s`: arithmetic shift rounds further
+from zero, truncating division rounds toward zero. Concrete example:
+at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives
+`(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`.
+Decoders in languages whose native integer division does not floor
+**must** emulate arithmetic right shift explicitly on the signed
+accumulator.
+
+#### Accumulator width
+
+The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in
+a signed integer accumulator of at least **49 bits** (equivalently: an
+`i64` or wider). Worst-case bounds at `prediction_order = 32`,
+`coefficient_shift = 5` (Q10), and full-scale samples give a product
+of magnitude `(2¹⁵) · (2²³ − 1) ≈ 2³⁸` per term, summed over 32 terms
+for a maximum of `~2⁴³`. Adding the bias keeps the result below `2⁴⁴`.
+A 32-bit accumulator overflows at orders ≥ 16 with full-scale inputs —
+implementations that reach for `int32_t` because samples are 32-bit
+will silently corrupt high-order frames.
+
+JavaScript / TypeScript implementers should note that `Number` is an
+IEEE 754 double, not a signed integer: its 53-bit safe-integer range
+covers in-contract accumulator values, but adversarial bitstreams (see
+§6.2) can produce out-of-contract samples whose synthesis arithmetic
+lands in the 2⁴⁹–2⁵¹ range and beyond, where `Number` silently loses
+low bits to float rounding. For bit-exact spec compliance in JS/TS,
+**use `BigInt` for the accumulator** — it has the integer semantics
+the spec requires; `Number` does not.
+
+#### Warm-up (`terms == 0`)
+
+When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is
+empty and `predict[0] = 0` — the `(0 + bias) >> s` formula is **not**
+applied. Stating this explicitly avoids an implementation that
+mechanically applies the formula in the warm-up case and produces
+`predict[0] = bias >> s`, which is zero only in specific
+`(bias, s)` parametrisations and surprising in any other.
+
+For `0 < i < prediction_order`, the sum truncates to the available
+`i` predecessors (`terms = i`). The formula applies as stated.
+
+#### Sign convention for stored coefficients
+
+The synthesis formula uses `+Σ`. Classical Levinson-Durbin
+implementations that derive LPC from the error-prediction AR model
+
+```text
+x[n] = −Σ a[j] · x[n-j] + e[n]      (error convention)
+```
+
+produce coefficients `a[j]` whose sign is the **opposite** of what
+the synthesis formula expects; those encoders **must** negate before
+quantisation so the wire value is `coeff[j-1] = −a[j]`.
+Implementations using the predictor convention
+
+```text
+x̂[n] = +Σ c[j] · x[n-j]             (predictor convention)
+```
+
+store `c[j]` directly.
+
+Both conventions are common in DSP literature. Encoders **must**
+verify that the coefficients emitted on the wire, when substituted
+into the synthesis formula above, reproduce the encoder's own
+prediction. The reference implementation uses the error convention
+and negates at quantisation time.
+
+#### Overflow semantics of the final add
+
+The `residual[i] + predict[i]` add is specified as a **wrapping i32
+add** (two's complement, modulo `2³²`; in languages without native
+signed-overflow semantics, compute `(residual + predict) & 0xFFFFFFFF`
+and then re-interpret as a signed 32-bit integer via sign-extension
+of bit 31). On well-formed bitstreams — those produced by a compliant
+encoder from in-contract samples (§1) — the result stays inside the
+sample-magnitude contract and the wrap is never observable.
+Adversarial bitstreams with crafted coefficients and residuals **may**
+produce any `i32` value; the decoder **must not** panic, abort, or
+reject on the basis of this add's result. The consequences of
+out-of-contract decoder output are addressed in §6.2.
+
+## 4. Rice Bitstream
+
+Immediately follows the header. Flat MSB-first bitstream structured as
+consecutive partition payloads:
+
+```text
++---------+---------+-----+-----------+
+| part. 0 | part. 1 | ... | part. P-1 |
++---------+---------+-----+-----------+
+```
+
+where `P = 1 << partition_order`. Each partition has the same structure:
+
+```text
++-------+-------------+-------------+-----+-------------+
+| k (5) | codeword 0  | codeword 1  | ... | codeword M-1|
++-------+-------------+-------------+-----+-------------+
+```
+
+where `M = frame_sample_count / P` is the per-partition residual count.
+
+Partitions are **bit-contiguous**: the 5-bit `k` field of partition
+`i + 1` begins at the bit immediately following the last codeword of
+partition `i`. There is no byte alignment between partitions. Only
+the final trailing padding described in §4.3 is byte-aligned.
+
+Within a partition, the bit cursor likewise advances continuously:
+codeword 0 begins at the bit immediately following the 5-bit `k`
+field, codeword 1 immediately after codeword 0's remainder bit, and so
+on. A conformant decoder maintains a single bit-read position across
+the entire Rice bitstream — from the `k` of partition 0 through the
+last codeword of partition P−1 — and never realigns to a byte or bit
+boundary between fields. This is implicit in the byte-stream decoder
+design (a bit reader that consumes bits sequentially needs no
+special handling at field boundaries) but stated here so second-team
+implementations do not introduce a spurious alignment.
+
+### 4.1 Per-Partition Parameter `k`
+
+Five-bit unsigned integer, MSB-first, immediately before the partition's
+codewords. `k` is the Rice parameter for this partition and must be in
+`[0, 23]`. Decoders **must** reject values above 23 as malformed.
+
+### 4.2 Codeword
+
+Each residual is encoded by:
+
+1. **Zigzag mapping** from signed to unsigned:
+
+   ```text
+   z = (r << 1) ^ (r >> 31)            interpreted as u32
+   ```
+
+   where `(r >> 31)` is an **arithmetic right shift** on the i32
+   residual, sign-extending the sign bit to all 32 bit positions — `0`
+   for non-negative `r`, `-1` (all ones) for negative `r`. The entire
+   expression is **masked to 32 bits** before being interpreted as
+   `u32`; in languages with arbitrary-precision integers (e.g. Python)
+   or where native bitwise ops return signed 32-bit (e.g. JavaScript /
+   TypeScript `Number`, where `(x ^ y) >>> 0` coerces to u32), this
+   mask is explicit (`& 0xFFFFFFFF` or `>>> 0`) and **required** —
+   without it, negative residuals produce zigzag values with extra
+   high bits set. Implementations in languages whose native right
+   shift is logical on unsigned types **must** coerce `r` to a signed
+   32-bit type first; implementations in languages where `>>` on
+   signed types is implementation-defined (e.g. pre-C++20 C/C++)
+   **must** emulate the arithmetic shift explicitly.
+
+   The `r << 1` factor is always safe on i32. Residuals on the
+   encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2·2²³
+   ≈ 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without
+   overflow or undefined behaviour — even for `r` at the most
+   negative value the encoder can ever produce from in-contract
+   input (§1). Decoder-side code does not perform this shift; the
+   inverse uses `z >> 1` on a u32, which is always defined.
+
+   The mapping sends
+
+   ```text
+   {0, −1, 1, −2, 2, −3, 3, …}  →  {0, 1, 2, 3, 4, 5, 6, …}
+   ```
+
+   so small magnitudes of either sign map to small unsigned values.
+
+   The decoder's inverse is
+
+   ```text
+   r = ((z >> 1) as i32) ^ −((z & 1) as i32)
+   ```
+
+   where both shifts here are natural (unsigned-u32 logical for the
+   first, integer negation for the second). Stating this inverse
+   explicitly removes any ambiguity about how an implementation must
+   invert the zigzag.
+
+2. **Rice code** at parameter `k`:
+   - **Unary part**: `q = z >> k` zero-bits followed by a single
+     terminating 1-bit.
+   - **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first.
+     (The remainder part is absent when `k == 0`.)
+
+Total codeword length: `q + 1 + k` bits.
+
+#### Decoder-side unary-run bound
+
+Decoders **must** reject any codeword whose unary run length satisfies
+
+```text
+q > (2³² − 1) >> k         (equivalently, q > u32::MAX >> k)
+```
+
+A valid codeword reconstructs `z = (q << k) | remainder` as a u32.
+`q > u32::MAX >> k` implies `q << k ≥ 2³²`, which either overflows
+u32 silently (a critical decoder bug class — corrupt output with no
+error) or indicates a malformed stream. Either way the frame **must**
+be discarded with `InvalidParameter` or equivalent rejection class
+(§6). The bound varies with `k`: at `k = 23` it is `511`, at `k = 0`
+it is `u32::MAX` (no practical constraint).
+
+This rule also caps the CPU cost of unary scanning on adversarial
+input: without the cap, a decoder could be forced to scan an
+arbitrarily long run of zero bits before reaching either a `1` or
+the buffer end.
+
+### 4.3 Byte Padding
+
+After the last codeword of the last partition, any unused bits of the final
+byte are zero-padded on the LSB side. The encoder writes `0` for all padding
+bits; the decoder ignores them.
+
+### 4.4 Bitstream Length
+
+The Rice bitstream's total bit length is the sum of codeword bit
+lengths across every partition, plus 5 bits per partition for the `k`
+fields:
+
+```text
+total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
+```
+
+This depends on every residual's quotient and cannot be computed
+from the frame header alone. A decoder **streams-decodes** until it
+has produced exactly `frame_sample_count` samples, then stops.
+Any bits remaining inside the last byte are padding (§4.3) and
+carry no information.
+
+A decoder **must not** require the Rice bitstream's byte length to be
+signalled out-of-band. The header plus the zero-padded byte-aligned
+tail fully determines the frame boundary; `parse_header`'s
+`bytes_consumed` return plus streaming Rice decode locates the end of
+the frame in the input buffer.
+
+## 5. Degenerate Cases
+
+### 5.1 All-Zero Frame
+
+For an all-zero sample vector, the encoder **must** use
+`prediction_order = 0` because the Levinson-Durbin recursion is
+undefined at `R[0] = 0`. Residuals equal the input (all zeros).
+
+Partition-order and per-partition `k` selection remain at the
+encoder's discretion; any legal `(partition_order, k)` combination
+produces a bit-exact-decodable frame. The minimum-cost choice —
+`partition_order = 0`, `k = 0` — produces a Rice payload of exactly
+`5 + frame_sample_count` bits. Compliant encoders are **not**
+required to pick this minimum.
+
+### 5.2 Single-Sample Frame
+
+`frame_sample_count = 1` is valid but forces `partition_order = 0` (the only
+value that divides 1 evenly). The single sample is Rice-coded directly
+because no predecessors exist for any LPC order.
+
+## 6. Error Recovery
+
+Decoders **must** detect and reject every frame that violates the
+constraints elsewhere in this document. The exhaustive list of
+rejection classes, each of which is a distinct error condition so
+callers can distinguish them in telemetry, is:
+
+1. **Sync word mismatch** — bytes `0-1` differ from `0x1ACC` (§3.1).
+2. **`prediction_order` out of range** — value > `32` (§3.2).
+3. **`partition_order` out of range** — value > `7` (§3.3).
+4. **`coefficient_shift` out of range** — value > `5` (§3.4).
+5. **Verbatim frame with non-zero shift** — `prediction_order == 0` and
+   `coefficient_shift != 0` (§3.4).
+6. **`frame_sample_count == 0`** (§3.5).
+7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
+8. **Buffer truncated** — fewer bytes than the header plus coefficient
+   array requires, or fewer bits than the Rice bitstream demands
+   during streaming decode. This class is intentionally coarse-grained:
+   a single `Truncated` variant covers header truncation, missing `k`
+   fields, mid-codeword exhaustion, and every other "buffer ends early"
+   shape the decoder can encounter. Sub-categorising these provides no
+   caller benefit — the recovery action (discard and substitute
+   silence, see §6.1) is identical regardless of where truncation
+   happened.
+9. **Per-partition `k` out of range** — value > `23` (§4.1).
+10. **Unary-run cap exceeded** — any codeword with `q > u32::MAX >> k`
+    (§4.2).
+
+On any of these, the decoder **must** discard the frame, produce no
+output samples, and signal the error to the caller. **No partial
+state may propagate** to the next frame's decode — frames are
+independent (§2), so subsequent frames decode cleanly regardless.
+
+### 6.1 Caller-side silence substitution
+
+On rejection, the caller substitutes `frame_sample_count` zeros
+(silence) for the frame period. The count is obtained as follows:
+
+- **Post-header rejections** (classes 8-10 above — `Truncated` in the
+  Rice bitstream, `InvalidParameter` during Rice decode): the frame
+  header parsed successfully before the failure, so the count is
+  recoverable. The caller re-parses just the header on the same buffer
+  (reference API: `parse_header(data)`) and reads `frame_sample_count`
+  from the resulting `AudioFrameHeader`.
+- **Pre-header rejections** (classes 1-7 above): the header itself
+  failed; the frame length is not recoverable from the bitstream. The
+  caller **must** fall back to a session-level default frame size
+  carried out-of-band by the container or transport (WebRTC and QUIC
+  audio sessions typically negotiate this at session setup).
+
+This asymmetry is inherent to the wire format: `frame_sample_count`
+lives inside the header at offset 5, so any rejection that happens
+while parsing bytes 0-4 precedes its discovery.
+
+### 6.2 Decoder output magnitude
+
+On well-formed bitstreams produced by a compliant encoder from
+in-contract samples (§1), decoder output satisfies
+`|sample| ≤ 2²³ − 1`.
+
+Adversarial bitstreams — those with hand-crafted coefficients and
+residuals that pass every rejection check in this section yet
+produce arithmetic results outside the sample-magnitude contract —
+**may** produce output samples of any `i32` value, including values
+that exceed `2²³ − 1`. The decoder **must not** panic or reject on
+this basis: the wrapping-add semantics of §3.6 are precisely what
+makes every bit sequence produce a defined output, which is the
+ground of the "no partial state propagates" contract at the top of
+this section.
+
+Callers that re-feed decoder output into LAC's encoder (for example,
+an MCU decode → PCM mix → re-encode pipeline) **should** validate or
+clamp to the input magnitude contract before re-encoding. A
+compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1`
+and is not required to re-validate.
+
+## 7. Encoder Guidance (non-normative)
+
+The reference encoder's search has three phases:
+
+```text
+# Phase 0: all-zero short-circuit
+R[0] = Σ sample[i]²
+if R[0] == 0:
+    emit frame with prediction_order = 0, any legal partition_order (§5.1)
+    return
+
+# Phase 1: sparse LPC order grid with stop-when-stale early-out
+for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
+    coeffs_q31     = levinson_durbin(samples, prediction_order)  # cached
+    shift          = smallest s such that every |coeff_real| < 2^s
+    coeffs_stored  = quantize(coeffs_q31, to: Q(15 - shift))
+    residuals      = compute_residuals(samples, coeffs_stored, shift)
+    for partition_order in 0..=7:
+        if frame_sample_count % (1 << partition_order) != 0: continue
+        rice_bits = estimate_cost(residuals, partition_order)
+        total     = header_bits(prediction_order) + rice_bits
+        track minimum over (prediction_order, partition_order)
+    if no improvement for 2 consecutive grid entries: break
+
+# Phase 2: fixed-predictor post-pass
+for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
+    residuals = compute_residuals(samples, fp_coeffs, fp_shift)
+    for partition_order in 0..=7:
+        if frame_sample_count % (1 << partition_order) != 0: continue
+        rice_bits = estimate_cost(residuals, partition_order)
+        total     = header_bits(fp_order) + rice_bits
+        track minimum over (fp_order, partition_order)
+
+emit frame with the (order, partition_order) that minimised `total`
+```
+
+The sparse grid + early-out is a speed/compression trade-off; a
+compliant encoder may still exhaustively search every integer order
+`0..=32` for marginal gains at higher cost. The produced bitstreams
+are interchangeable.
+
+The fixed-predictor post-pass tries FLAC-style integer predictors
+(orders 1-4 with a small static coefficient table) after the LPC
+grid. These evaluate quickly and occasionally beat the Levinson-
+Durbin winner on content where a low-order integer polynomial fits
+better than the statistically-optimal LPC fit — silent-plus-DC, very
+smooth tones, polynomial-ish sensor data. Running them second avoids
+tripping the stop-when-stale heuristic in the LPC phase.
+
+The reference encoder's `FIXED_PREDICTORS` table, materialising the
+FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer
+predictors at the smallest `coefficient_shift` that represents each
+coefficient without clamping:
+
+| `prediction_order` | `lpc_coefficients` (Q-format integers)      | `coefficient_shift` | Real-value interpretation    |
+|-------------------:|---------------------------------------------|--------------------:|------------------------------|
+| 1                  | `[16384]`                                    | 1 (Q14)             | `[1]`                        |
+| 2                  | `[16384, −8192]`                             | 2 (Q13)             | `[2, −1]`                    |
+| 3                  | `[24576, −24576, 8192]`                      | 2 (Q13)             | `[3, −3, 1]`                 |
+| 4                  | `[16384, −24576, 16384, −4096]`              | 3 (Q12)             | `[4, −6, 4, −1]`             |
+
+These are the exact wire-format bytes a second-team encoder would emit
+to match the reference's fixed-predictor outputs bit-for-bit. Compliant
+encoders MAY use a different set (or none), since §3.6 treats the
+coefficient field as opaque — decoders apply the synthesis formula
+identically regardless of source.
+
+The `R[0] == 0` short-circuit is both a correctness requirement (§5.1
+— Levinson-Durbin is undefined at zero autocorrelation) and an
+encoder-cost optimisation: on digital silence, the sparse grid and
+fixed-predictor pass produce identical zero residuals and order 0
+wins on header size alone.
+
+Levinson-Durbin runs once to order 32 with all intermediate orders saved
+into a flat buffer (one recursion pass yields all orders 1..32 at
+`O(order²)` cost), so the outer loop fetch is free and order selection
+is effectively `O(orders_tried × N)`.
+
+`shift` is determined per order by the coefficient magnitudes — there
+is no shift search, as smaller shifts are always at least as good as
+larger ones when they don't clamp (saturation, §3.4, is the
+fallback). Rice cost at a given `partition_order` is exact and
+closed-form given the per-partition `k`, so the inner search
+introduces no estimation error.
+
+#### Levinson-Durbin numerical choices (reference, non-normative)
+
+The reference encoder runs Levinson-Durbin with i64 autocorrelation
+accumulators, Q31 working coefficients, and widens to i128 for
+reflection-coefficient intermediates at orders where Q31 would lose
+precision (typically above order ~12). Rounding on the Q31→Q(15−shift)
+quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`,
+with `bias = 1 << (14 + shift)` — the direct analogue of the synthesis
+formula's rounding (§3.6), chosen so analysis and synthesis agree on
+tie-break direction.
+
+None of these choices are normative. Two encoders making different
+precision or rounding choices will produce different coefficient
+bytes on the same input, but both bitstreams decode correctly under
+§3.6 so long as the coefficients they emit faithfully represent their
+own LPC decision. §3.6's "wire format doesn't distinguish coefficients
+by derivation" clause is exactly what permits this freedom.
+
+#### Rice `k` selection (reference, non-normative)
+
+Cost is closed-form for a given `(partition, k)`: `bit_cost(k) =
+N · (1 + k) + Σ (v >> k)` where `v` ranges over the zigzag-mapped
+residuals of the partition. Exhaustive search over `k ∈ [0, 23]` is
+always acceptable and is the simplest compliant choice.
+
+The reference encoder uses convex descent: seed `k_seed =
+⌊log₂(mean(v))⌋`, then walk either direction. Ties break **toward
+smaller `k`** on descent (condition `cost ≤ best_cost`, so equal
+costs at `k−1` overwrite `k`) and **strictly larger costs only** on
+ascent (condition `cost < best_cost`, so equal costs at `k+1` don't
+override `k`). This yields a unique `k` per partition matching an
+exhaustive search's first-wins tie-break.
+
+On the reference corpus, the sparse grid's compression matches
+exhaustive-search output within ~0.2 percentage points (measured);
+the differential test caps the acceptable excess at 0.5%.
+Implementations that prefer tighter compression at higher encode
+cost can extend the grid without wire-format consequences — decoders
+do not care which orders an encoder tried.
+
+### 7.1 Frame size (non-normative)
+
+The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger
+frames amortise the 7-byte fixed header and the LPC coefficient vector
+over more samples, and generally compress better. Smaller frames give
+tighter latency and finer partition-order granularity on transient
+content.
+
+Recommended defaults by use case:
+
+- **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at
+  16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used
+  by Opus / WebRTC.
+- **Real-time full-band** (game audio, music conferencing): 1024-2048
+  at 48 kHz (21-43 ms).
+- **Offline / archival**: 4096-8192 at 48 kHz; compression gains
+  flatten past this.
+
+Power-of-two frame sizes expose every `partition_order ∈ [0, 7]` to
+the encoder's search. Non-power-of-two sizes restrict the search to
+partition counts that divide evenly; a prime frame size forces
+`partition_order = 0`. Encoders SHOULD prefer frame sizes with several
+small-prime factors when free to choose.
+
+## 8. Versioning
+
+This document specifies **LAC version 1**, identified on the wire by
+`sync_word = 0x1ACC`. No per-frame version byte is carried; the sync
+word uniquely identifies the wire format.
+
+Future revisions of the format **must** use a distinct `sync_word`. The
+recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on
+inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making
+casual grep / hex-dump inspection robust. A revision whose wire format
+cannot be made compatible with v1 at the header level **must** pick a
+sync word outside this cluster.
+
+This approach is exhaustive by construction:
+
+- A v1 decoder that encounters a v2 frame sees an unrecognised sync
+  word on the first check (§3.1) and rejects cleanly — the same error
+  path as foreign or corrupted payloads.
+- A v2-aware decoder dispatches on the sync word before reading any
+  further field, so it can fall back to v1 parsing when appropriate
+  or decode v2 frames natively.
+
+Because every field in §3 has a strict range with no reserved
+high-bit patterns, in-place extension (flag bits inside existing
+fields) is **not** a supported evolution path. New features go into a
+new `sync_word`, not into reinterpreting existing field values.
+
+Transports that multiplex LAC frames with other formats should frame
+each LAC payload explicitly (length prefix or stream separator); the
+sync word alone is not a framing delimiter, only a format identifier.
+
+## 9. Implementation notes (non-normative)
+
+### 9.1 GPU offload is out of scope
+
+LAC is a scalar integer codec. The reference implementation, and any
+conforming implementation this document anticipates, runs on a CPU.
+GPU offload is deliberately not a goal:
+
+- **Levinson-Durbin** is serial by construction (each iteration depends
+  on the previous) and its intermediate accumulator needs more than 64
+  bits of precision at higher orders — a poor fit for WGSL or SPIR-V
+  compute shaders, which have no native 128-bit integer arithmetic.
+- **Rice decode** uses a data-dependent unary run for every residual;
+  on GPU execution models this diverges warps badly and its
+  sequential bit-cursor progression fights SIMD lane packing.
+- **LPC synthesis** has a tight per-sample feedback loop (sample `i`
+  depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is
+  inherently serial.
+- **The one plausibly GPU-parallel phase** — residual computation
+  inside the encoder's order search — is also the phase where the
+  CPU's autovectorized implementation is already well-served by SIMD
+  on any modern target. At the measured encode latencies (P99 under
+  50 µs on x86 for a 20 ms frame period, >400× headroom), there is no
+  motivation to offload it.
+
+A hypothetical future revision whose hot path genuinely benefited from
+GPU execution (large-batch archival encoding across many channels at
+once, for instance) would need to change the wire format to carry
+enough shape metadata for batched kernels — i.e., a new sync word
+under the versioning rules in §8, not a retrofit.
+
+## 10. Conformance test vectors
+
+The reference repository's `tests/conformance.rs` holds the canonical
+test-vector set for this specification:
+
+- **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte
+  level. A conformant decoder **must** produce the `samples` array
+  when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a
+  second-team encoder's bytes for the same samples may differ; the
+  decoder direction is the normative one. Coverage includes the
+  smallest valid frames (single-sample verbatim, 4- and 8-sample
+  silence), single-sample polarity boundaries (±1, ±(2²³−1)), DC
+  offset, alternating-polarity Nyquist-like content, smooth polynomial
+  (fixed-predictor territory), and a 16-sample growing-amplitude
+  pattern exercising partition search.
+- **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to
+  their expected rejection variants. Covers every class in §6 (1-10):
+  bad sync, each field-range violation, verbatim + non-zero shift,
+  `frame_sample_count == 0`, non-divisible partition count, header /
+  coefficient / Rice-bitstream truncation, per-partition `k > 23`.
+- **`reject_unary_run_above_cap`** — a programmatic test for §6 class
+  10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75
+  bytes of mostly zeros; construction logic is in the test, not a
+  const fixture.
+
+Second-team implementations should port the decode fixtures
+byte-for-byte and the reject fixtures byte-and-variant-for-variant.
+`encode_matches_fixtures` in the same file is reference-specific (it
+asserts the reference encoder's exact bytes) and is **not** a
+conformance requirement — see §3.6's encoder-latitude clause.
+
+### 10.1 Reference encoder exemplars (non-normative)
+
+The same `(samples, bytes)` pairs, read in the encoder direction,
+serve as **reference-encoder validation targets** for implementations
+that want to match this project's reference byte-for-byte — a common
+goal for porting work even though the spec does not require it. The
+fixture set is deliberately chosen to pin every encoder-discretion
+axis from §7:
+
+- `single_zero`, `single_pos_one`, `single_neg_one` — single-sample
+  frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime
+  where the encoder's order choice is nearly arbitrary; pinning the
+  bytes fixes this project's choice (order 0 with the minimum-cost
+  Rice encoding).
+- `silence_4`, `silence_8` — force `partition_order` tie-breaks on an
+  all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces
+  identical cost; the reference picks the smallest via its convex-
+  descent tie-break).
+- `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim
+  decision. DC content favours a low-order LPC fit with small
+  residuals; alternating content favours order 1 with `a = −1`
+  (approximated at the closest Q-format). Pinning the bytes fixes
+  the reference's decision boundary.
+- `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude
+  single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both
+  sides and fix the zigzag-of-extremum output.
+- `linear_ramp_8` — smooth polynomial content, fixed-predictor
+  territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
+- `lfsr_noise_16` — exercises partition search on a frame large
+  enough for `partition_order > 0` to be competitive.
+
+A second-team encoder that produces the same bytes for every entry
+here is **likely** (not guaranteed) to produce matching bytes on
+wider inputs, since the tie-break axes are the ones most sensitive
+to encoder discretion. An encoder that produces different bytes is
+still compliant so long as its own bytes round-trip — see §3.6, §7.
\ No newline at end of file