784 lines
36 KiB
Markdown
784 lines
36 KiB
Markdown
# LAC Wire Format
|
||
|
||
Normative specification of the LAC bitstream. This document is the authority
|
||
on byte layout, field semantics, and encoder/decoder constraints.
|
||
|
||
## 1. Conventions
|
||
|
||
- All multi-byte integer fields are **big-endian**.
|
||
- Bit streams are **MSB-first**: the first bit written occupies bit 7 of its
|
||
byte, subsequent bits fill lower positions, and a new byte begins once
|
||
eight bits have been emitted.
|
||
- Samples are **signed integers** passed as `i32` with magnitude bounded
|
||
by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` must be a
|
||
consistent sign extension of the 24-bit-magnitude value. Narrower source
|
||
formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly
|
||
— they trivially satisfy the magnitude bound — and compress at the bit
|
||
cost of their actual values, not a 24-bit ceiling. The codec does not
|
||
carry bit-depth metadata; the container or application layer is
|
||
responsible for remembering the source format.
|
||
- Sample rate is **not** part of the bitstream. The container or transport
|
||
carries it.
|
||
- A frame encodes **one channel**. Stereo is two independent streams of
|
||
frames, one per channel.
|
||
|
||
## 2. Frame Layout
|
||
|
||
A frame is a contiguous byte sequence:
|
||
|
||
```text
|
||
+--------+--------------------+
|
||
| header | rice_bitstream |
|
||
+--------+--------------------+
|
||
```
|
||
|
||
The `header` is fixed-structure (variable length because the coefficient
|
||
array depends on `prediction_order`). The `rice_bitstream` is a bit-packed
|
||
payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in
|
||
the low bits of the last byte.
|
||
|
||
Decoder input is the complete frame. There is no intra-frame continuation or
|
||
fragmentation — the transport layer handles that.
|
||
|
||
## 3. Frame Header
|
||
|
||
```text
|
||
Offset Size Field Type Constraint
|
||
------ ---- -------------------- ------- ----------------------------
|
||
0-1 2 sync_word u16 BE == 0x1ACC
|
||
2 1 prediction_order u8 ∈ [0, 32]
|
||
3 1 partition_order u8 ∈ [0, 7]
|
||
4 1 coefficient_shift u8 ∈ [0, 5]
|
||
5-6 2 frame_sample_count u16 BE ≥ 1, % (1 << partition_order) == 0
|
||
7+ 2·p lpc_coefficients i16 BE[] length = prediction_order = p
|
||
```
|
||
|
||
Total header length: `7 + 2 · prediction_order` bytes.
|
||
|
||
### 3.1 `sync_word`
|
||
|
||
Fixed value `0x1ACC`. Present to identify a LAC frame on lightly framed
|
||
transports and to reject foreign payloads at the first check. Decoders
|
||
**must** reject any frame whose first two bytes are not `0x1ACC`.
|
||
|
||
### 3.2 `prediction_order`
|
||
|
||
Integer order of the LPC analysis filter used to produce residuals.
|
||
|
||
- Value `0` is **verbatim mode**: no prediction, residuals equal the samples,
|
||
and the `lpc_coefficients` array is empty (zero bytes).
|
||
- Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries
|
||
exactly that many predictor coefficients, interpreted in the Q-format
|
||
determined by `coefficient_shift` (§3.4).
|
||
|
||
Decoders **must** reject values above 32.
|
||
|
||
### 3.3 `partition_order`
|
||
|
||
Controls how the residual stream is split for Rice coding.
|
||
|
||
- `partition_count = 1 << partition_order`.
|
||
- The residual stream is divided into `partition_count` equal partitions of
|
||
`frame_sample_count / partition_count` samples each.
|
||
|
||
Decoders **must** reject values above 7 and **must** reject frames where
|
||
`frame_sample_count` is not a multiple of `partition_count`.
|
||
|
||
### 3.4 `coefficient_shift`
|
||
|
||
Controls the fixed-point scale of the stored Q-format LPC predictor
|
||
coefficients. Coefficients are stored as 16-bit integers interpreted as
|
||
`Q(15 − coefficient_shift)`:
|
||
|
||
| shift | Q-format | Real-value range | Use case |
|
||
|-------|----------|-------------------|------------------------------------------------|
|
||
| 0 | Q15 | `[−1, 1)` | Coefficients with magnitude < 1 (most orders > 1) |
|
||
| 1 | Q14 | `[−2, 2)` | Low-frequency content, `a[1]` near −2 |
|
||
| 2 | Q13 | `[−4, 4)` | Extreme bass / narrow resonances |
|
||
| 3 | Q12 | `[−8, 8)` | Pathological transients |
|
||
| 4 | Q11 | `[−16, 16)` | Reserved for synthetic signals |
|
||
| 5 | Q10 | `[−32, 32)` | Upper bound; decoder rejects larger values |
|
||
|
||
The encoder **must** select the smallest `coefficient_shift` at which no
|
||
coefficient's real value exceeds the representable range for that shift —
|
||
i.e., the smallest scale that does not clamp. Smaller shifts give finer
|
||
precision and thus smaller residuals when no clamping is required.
|
||
|
||
If no `coefficient_shift ∈ [0, 5]` suffices (the real coefficient
|
||
magnitude exceeds the Q10 range at `shift = 5`), the encoder
|
||
saturates each offending coefficient independently to the i16 range
|
||
`[−32768, 32767]`. Bit-exact round-trip is preserved because the
|
||
decoder applies the synthesis formula to whatever 16-bit values the
|
||
wire carries; the cost of saturation is compression — the predictor
|
||
no longer matches the encoder's ideal coefficients, so residuals
|
||
grow. Real audio at the input-magnitude contract (§1) rarely reaches
|
||
this case; synthetic or adversarial inputs can force it.
|
||
|
||
Decoders **must** reject values above 5. The shift applies uniformly to
|
||
every coefficient in `lpc_coefficients`; there is no per-coefficient
|
||
scale.
|
||
|
||
When `prediction_order == 0` (verbatim frame), `coefficient_shift`
|
||
**must** be `0`. The shift only modifies how stored coefficients are
|
||
interpreted, and a verbatim frame stores none. Decoders **must**
|
||
reject frames with `prediction_order == 0` and `coefficient_shift != 0`
|
||
as malformed; this rule closes the space of legal but meaningless
|
||
headers so two implementations agree bit-for-bit on which inputs
|
||
round-trip.
|
||
|
||
### 3.5 `frame_sample_count`
|
||
|
||
Number of audio samples produced by this frame (also the number of residuals
|
||
in the Rice bitstream). The value **must** be in `[1, 65535]`.
|
||
|
||
Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame
|
||
trivially satisfies the partition-divisibility check below
|
||
(`0 mod n == 0` for any `n`) but carries no audio and has no legal
|
||
Rice payload.
|
||
|
||
For compliance with `partition_order`, the value **must** satisfy
|
||
`frame_sample_count mod (1 << partition_order) == 0`. Decoders **must**
|
||
reject frames where this does not hold.
|
||
|
||
### 3.6 `lpc_coefficients`
|
||
|
||
Array of `prediction_order` predictor coefficients, each a 16-bit big-endian
|
||
signed integer interpreted in `Q(15 − coefficient_shift)` format — see §3.4
|
||
for the shift semantics.
|
||
|
||
The wire format does not distinguish coefficients by derivation. The
|
||
synthesis formula below applies identically whether the encoder
|
||
obtained the values from Levinson-Durbin analysis, from a fixed
|
||
coefficient template (e.g. FLAC-style integer predictors), from a
|
||
trained model, or from any other strategy. What goes on the wire is
|
||
just `prediction_order` 16-bit integers; how the encoder chose them
|
||
is encoder-internal and not observable to the decoder.
|
||
|
||
Synthesis formula (applied in the decoder):
|
||
|
||
```text
|
||
s = 15 − coefficient_shift
|
||
bias = 1 << (s − 1)
|
||
predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
|
||
sample[i] = residual[i] + predict[i]
|
||
```
|
||
|
||
where `terms = min(i, prediction_order)`. The `+ bias` term implements
|
||
round-to-nearest for the right shift and is **required** for bit-exact
|
||
decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.
|
||
|
||
The `>> s` operator **must** be an **arithmetic right shift** on
|
||
signed integers — equivalent to floor division by `2^s`. Combined with
|
||
the `+ bias` pre-add, this implements **round-half toward +∞**: on a
|
||
value whose scaled form is exactly `k + 0.5`, the result is `k + 1`
|
||
for both positive and negative `k`. Implementations using truncating
|
||
integer division (C's `/` on signed integers, which rounds toward
|
||
zero) **will diverge** from this on any `sum + bias` that is negative
|
||
and not evenly divisible by `2^s`: arithmetic shift rounds further
|
||
from zero, truncating division rounds toward zero. Concrete example:
|
||
at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives
|
||
`(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`.
|
||
Decoders in languages whose native integer division does not floor
|
||
**must** emulate arithmetic right shift explicitly on the signed
|
||
accumulator.
|
||
|
||
#### Accumulator width
|
||
|
||
The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in
|
||
a signed integer accumulator of at least **49 bits** (equivalently: an
|
||
`i64` or wider). Worst-case bounds at `prediction_order = 32`,
|
||
`coefficient_shift = 5` (Q10), and full-scale samples give a product
|
||
of magnitude `(2¹⁵) · (2²³ − 1) ≈ 2³⁸` per term, summed over 32 terms
|
||
for a maximum of `~2⁴³`. Adding the bias keeps the result below `2⁴⁴`.
|
||
A 32-bit accumulator overflows at orders ≥ 16 with full-scale inputs —
|
||
implementations that reach for `int32_t` because samples are 32-bit
|
||
will silently corrupt high-order frames.
|
||
|
||
JavaScript / TypeScript implementers should note that `Number` is an
|
||
IEEE 754 double, not a signed integer: its 53-bit safe-integer range
|
||
covers in-contract accumulator values, but adversarial bitstreams (see
|
||
§6.2) can produce out-of-contract samples whose synthesis arithmetic
|
||
lands in the 2⁴⁹–2⁵¹ range and beyond, where `Number` silently loses
|
||
low bits to float rounding. For bit-exact spec compliance in JS/TS,
|
||
**use `BigInt` for the accumulator** — it has the integer semantics
|
||
the spec requires; `Number` does not.
|
||
|
||
#### Warm-up (`terms == 0`)
|
||
|
||
When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is
|
||
empty and `predict[0] = 0` — the `(0 + bias) >> s` formula is **not**
|
||
applied. Stating this explicitly avoids an implementation that
|
||
mechanically applies the formula in the warm-up case and produces
|
||
`predict[0] = bias >> s`, which is zero only in specific
|
||
`(bias, s)` parametrisations and surprising in any other.
|
||
|
||
For `0 < i < prediction_order`, the sum truncates to the available
|
||
`i` predecessors (`terms = i`). The formula applies as stated.
|
||
|
||
#### Sign convention for stored coefficients
|
||
|
||
The synthesis formula uses `+Σ`. Classical Levinson-Durbin
|
||
implementations that derive LPC from the error-prediction AR model
|
||
|
||
```text
|
||
x[n] = −Σ a[j] · x[n-j] + e[n] (error convention)
|
||
```
|
||
|
||
produce coefficients `a[j]` whose sign is the **opposite** of what
|
||
the synthesis formula expects; those encoders **must** negate before
|
||
quantisation so the wire value is `coeff[j-1] = −a[j]`.
|
||
Implementations using the predictor convention
|
||
|
||
```text
|
||
x̂[n] = +Σ c[j] · x[n-j] (predictor convention)
|
||
```
|
||
|
||
store `c[j]` directly.
|
||
|
||
Both conventions are common in DSP literature. Encoders **must**
|
||
verify that the coefficients emitted on the wire, when substituted
|
||
into the synthesis formula above, reproduce the encoder's own
|
||
prediction. The reference implementation uses the error convention
|
||
and negates at quantisation time.
|
||
|
||
#### Overflow semantics of the final add
|
||
|
||
The `residual[i] + predict[i]` add is specified as a **wrapping i32
|
||
add** (two's complement, modulo `2³²`; in languages without native
|
||
signed-overflow semantics, compute `(residual + predict) & 0xFFFFFFFF`
|
||
and then re-interpret as a signed 32-bit integer via sign-extension
|
||
of bit 31). On well-formed bitstreams — those produced by a compliant
|
||
encoder from in-contract samples (§1) — the result stays inside the
|
||
sample-magnitude contract and the wrap is never observable.
|
||
Adversarial bitstreams with crafted coefficients and residuals **may**
|
||
produce any `i32` value; the decoder **must not** panic, abort, or
|
||
reject on the basis of this add's result. The consequences of
|
||
out-of-contract decoder output are addressed in §6.2.
|
||
|
||
## 4. Rice Bitstream
|
||
|
||
Immediately follows the header. Flat MSB-first bitstream structured as
|
||
consecutive partition payloads:
|
||
|
||
```text
|
||
+---------+---------+-----+-----------+
|
||
| part. 0 | part. 1 | ... | part. P-1 |
|
||
+---------+---------+-----+-----------+
|
||
```
|
||
|
||
where `P = 1 << partition_order`. Each partition has the same structure:
|
||
|
||
```text
|
||
+-------+-------------+-------------+-----+-------------+
|
||
| k (5) | codeword 0 | codeword 1 | ... | codeword M-1|
|
||
+-------+-------------+-------------+-----+-------------+
|
||
```
|
||
|
||
where `M = frame_sample_count / P` is the per-partition residual count.
|
||
|
||
Partitions are **bit-contiguous**: the 5-bit `k` field of partition
|
||
`i + 1` begins at the bit immediately following the last codeword of
|
||
partition `i`. There is no byte alignment between partitions. Only
|
||
the final trailing padding described in §4.3 is byte-aligned.
|
||
|
||
Within a partition, the bit cursor likewise advances continuously:
|
||
codeword 0 begins at the bit immediately following the 5-bit `k`
|
||
field, codeword 1 immediately after codeword 0's remainder bit, and so
|
||
on. A conformant decoder maintains a single bit-read position across
|
||
the entire Rice bitstream — from the `k` of partition 0 through the
|
||
last codeword of partition P−1 — and never realigns to a byte or bit
|
||
boundary between fields. This is implicit in the byte-stream decoder
|
||
design (a bit reader that consumes bits sequentially needs no
|
||
special handling at field boundaries) but stated here so second-team
|
||
implementations do not introduce a spurious alignment.
|
||
|
||
### 4.1 Per-Partition Parameter `k`
|
||
|
||
Five-bit unsigned integer, MSB-first, immediately before the partition's
|
||
codewords. `k` is the Rice parameter for this partition and must be in
|
||
`[0, 23]`. Decoders **must** reject values above 23 as malformed.
|
||
|
||
### 4.2 Codeword
|
||
|
||
Each residual is encoded by:
|
||
|
||
1. **Zigzag mapping** from signed to unsigned:
|
||
|
||
```text
|
||
z = (r << 1) ^ (r >> 31) interpreted as u32
|
||
```
|
||
|
||
where `(r >> 31)` is an **arithmetic right shift** on the i32
|
||
residual, sign-extending the sign bit to all 32 bit positions — `0`
|
||
for non-negative `r`, `-1` (all ones) for negative `r`. The entire
|
||
expression is **masked to 32 bits** before being interpreted as
|
||
`u32`; in languages with arbitrary-precision integers (e.g. Python)
|
||
or where native bitwise ops return signed 32-bit (e.g. JavaScript /
|
||
TypeScript `Number`, where `(x ^ y) >>> 0` coerces to u32), this
|
||
mask is explicit (`& 0xFFFFFFFF` or `>>> 0`) and **required** —
|
||
without it, negative residuals produce zigzag values with extra
|
||
high bits set. Implementations in languages whose native right
|
||
shift is logical on unsigned types **must** coerce `r` to a signed
|
||
32-bit type first; implementations in languages where `>>` on
|
||
signed types is implementation-defined (e.g. pre-C++20 C/C++)
|
||
**must** emulate the arithmetic shift explicitly.
|
||
|
||
The `r << 1` factor is always safe on i32. Residuals on the
|
||
encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2·2²³
|
||
≈ 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without
|
||
overflow or undefined behaviour — even for `r` at the most
|
||
negative value the encoder can ever produce from in-contract
|
||
input (§1). Decoder-side code does not perform this shift; the
|
||
inverse uses `z >> 1` on a u32, which is always defined.
|
||
|
||
The mapping sends
|
||
|
||
```text
|
||
{0, −1, 1, −2, 2, −3, 3, …} → {0, 1, 2, 3, 4, 5, 6, …}
|
||
```
|
||
|
||
so small magnitudes of either sign map to small unsigned values.
|
||
|
||
The decoder's inverse is
|
||
|
||
```text
|
||
r = ((z >> 1) as i32) ^ −((z & 1) as i32)
|
||
```
|
||
|
||
where both shifts here are natural (unsigned-u32 logical for the
|
||
first, integer negation for the second). Stating this inverse
|
||
explicitly removes any ambiguity about how an implementation must
|
||
invert the zigzag.
|
||
|
||
2. **Rice code** at parameter `k`:
|
||
- **Unary part**: `q = z >> k` zero-bits followed by a single
|
||
terminating 1-bit.
|
||
- **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first.
|
||
(The remainder part is absent when `k == 0`.)
|
||
|
||
Total codeword length: `q + 1 + k` bits.
|
||
|
||
#### Decoder-side unary-run bound
|
||
|
||
Decoders **must** reject any codeword whose unary run length satisfies
|
||
|
||
```text
|
||
q > (2³² − 1) >> k (equivalently, q > u32::MAX >> k)
|
||
```
|
||
|
||
A valid codeword reconstructs `z = (q << k) | remainder` as a u32.
|
||
`q > u32::MAX >> k` implies `q << k ≥ 2³²`, which either overflows
|
||
u32 silently (a critical decoder bug class — corrupt output with no
|
||
error) or indicates a malformed stream. Either way the frame **must**
|
||
be discarded with `InvalidParameter` or equivalent rejection class
|
||
(§6). The bound varies with `k`: at `k = 23` it is `511`, at `k = 0`
|
||
it is `u32::MAX` (no practical constraint).
|
||
|
||
This rule also caps the CPU cost of unary scanning on adversarial
|
||
input: without the cap, a decoder could be forced to scan an
|
||
arbitrarily long run of zero bits before reaching either a `1` or
|
||
the buffer end.
|
||
|
||
### 4.3 Byte Padding
|
||
|
||
After the last codeword of the last partition, any unused bits of the final
|
||
byte are zero-padded on the LSB side. The encoder writes `0` for all padding
|
||
bits; the decoder ignores them.
|
||
|
||
### 4.4 Bitstream Length
|
||
|
||
The Rice bitstream's total bit length is the sum of codeword bit
|
||
lengths across every partition, plus 5 bits per partition for the `k`
|
||
fields:
|
||
|
||
```text
|
||
total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
|
||
```
|
||
|
||
This depends on every residual's quotient and cannot be computed
|
||
from the frame header alone. A decoder **streams-decodes** until it
|
||
has produced exactly `frame_sample_count` samples, then stops.
|
||
Any bits remaining inside the last byte are padding (§4.3) and
|
||
carry no information.
|
||
|
||
A decoder **must not** require the Rice bitstream's byte length to be
|
||
signalled out-of-band. The header plus the zero-padded byte-aligned
|
||
tail fully determines the frame boundary; `parse_header`'s
|
||
`bytes_consumed` return plus streaming Rice decode locates the end of
|
||
the frame in the input buffer.
|
||
|
||
## 5. Degenerate Cases
|
||
|
||
### 5.1 All-Zero Frame
|
||
|
||
For an all-zero sample vector, the encoder **must** use
|
||
`prediction_order = 0` because the Levinson-Durbin recursion is
|
||
undefined at `R[0] = 0`. Residuals equal the input (all zeros).
|
||
|
||
Partition-order and per-partition `k` selection remain at the
|
||
encoder's discretion; any legal `(partition_order, k)` combination
|
||
produces a bit-exact-decodable frame. The minimum-cost choice —
|
||
`partition_order = 0`, `k = 0` — produces a Rice payload of exactly
|
||
`5 + frame_sample_count` bits. Compliant encoders are **not**
|
||
required to pick this minimum.
|
||
|
||
### 5.2 Single-Sample Frame
|
||
|
||
`frame_sample_count = 1` is valid but forces `partition_order = 0` (the only
|
||
value that divides 1 evenly). The single sample is Rice-coded directly
|
||
because no predecessors exist for any LPC order.
|
||
|
||
## 6. Error Recovery
|
||
|
||
Decoders **must** detect and reject every frame that violates the
|
||
constraints elsewhere in this document. The exhaustive list of
|
||
rejection classes, each of which is a distinct error condition so
|
||
callers can distinguish them in telemetry, is:
|
||
|
||
1. **Sync word mismatch** — bytes `0-1` differ from `0x1ACC` (§3.1).
|
||
2. **`prediction_order` out of range** — value > `32` (§3.2).
|
||
3. **`partition_order` out of range** — value > `7` (§3.3).
|
||
4. **`coefficient_shift` out of range** — value > `5` (§3.4).
|
||
5. **Verbatim frame with non-zero shift** — `prediction_order == 0` and
|
||
`coefficient_shift != 0` (§3.4).
|
||
6. **`frame_sample_count == 0`** (§3.5).
|
||
7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
|
||
8. **Buffer truncated** — fewer bytes than the header plus coefficient
|
||
array requires, or fewer bits than the Rice bitstream demands
|
||
during streaming decode. This class is intentionally coarse-grained:
|
||
a single `Truncated` variant covers header truncation, missing `k`
|
||
fields, mid-codeword exhaustion, and every other "buffer ends early"
|
||
shape the decoder can encounter. Sub-categorising these provides no
|
||
caller benefit — the recovery action (discard and substitute
|
||
silence, see §6.1) is identical regardless of where truncation
|
||
happened.
|
||
9. **Per-partition `k` out of range** — value > `23` (§4.1).
|
||
10. **Unary-run cap exceeded** — any codeword with `q > u32::MAX >> k`
|
||
(§4.2).
|
||
|
||
On any of these, the decoder **must** discard the frame, produce no
|
||
output samples, and signal the error to the caller. **No partial
|
||
state may propagate** to the next frame's decode — frames are
|
||
independent (§2), so subsequent frames decode cleanly regardless.
|
||
|
||
### 6.1 Caller-side silence substitution
|
||
|
||
On rejection, the caller substitutes `frame_sample_count` zeros
|
||
(silence) for the frame period. The count is obtained as follows:
|
||
|
||
- **Post-header rejections** (classes 8-10 above — `Truncated` in the
|
||
Rice bitstream, `InvalidParameter` during Rice decode): the frame
|
||
header parsed successfully before the failure, so the count is
|
||
recoverable. The caller re-parses just the header on the same buffer
|
||
(reference API: `parse_header(data)`) and reads `frame_sample_count`
|
||
from the resulting `AudioFrameHeader`.
|
||
- **Pre-header rejections** (classes 1-7 above): the header itself
|
||
failed; the frame length is not recoverable from the bitstream. The
|
||
caller **must** fall back to a session-level default frame size
|
||
carried out-of-band by the container or transport (WebRTC and QUIC
|
||
audio sessions typically negotiate this at session setup).
|
||
|
||
This asymmetry is inherent to the wire format: `frame_sample_count`
|
||
lives inside the header at offset 5, so any rejection that happens
|
||
while parsing bytes 0-4 precedes its discovery.
|
||
|
||
### 6.2 Decoder output magnitude
|
||
|
||
On well-formed bitstreams produced by a compliant encoder from
|
||
in-contract samples (§1), decoder output satisfies
|
||
`|sample| ≤ 2²³ − 1`.
|
||
|
||
Adversarial bitstreams — those with hand-crafted coefficients and
|
||
residuals that pass every rejection check in this section yet
|
||
produce arithmetic results outside the sample-magnitude contract —
|
||
**may** produce output samples of any `i32` value, including values
|
||
that exceed `2²³ − 1`. The decoder **must not** panic or reject on
|
||
this basis: the wrapping-add semantics of §3.6 are precisely what
|
||
makes every bit sequence produce a defined output, which is the
|
||
ground of the "no partial state propagates" contract at the top of
|
||
this section.
|
||
|
||
Callers that re-feed decoder output into LAC's encoder (for example,
|
||
an MCU decode → PCM mix → re-encode pipeline) **should** validate or
|
||
clamp to the input magnitude contract before re-encoding. A
|
||
compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1`
|
||
and is not required to re-validate.
|
||
|
||
## 7. Encoder Guidance (non-normative)
|
||
|
||
The reference encoder's search has three phases:
|
||
|
||
```text
|
||
# Phase 0: all-zero short-circuit
|
||
R[0] = Σ sample[i]²
|
||
if R[0] == 0:
|
||
emit frame with prediction_order = 0, any legal partition_order (§5.1)
|
||
return
|
||
|
||
# Phase 1: sparse LPC order grid with stop-when-stale early-out
|
||
for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
|
||
coeffs_q31 = levinson_durbin(samples, prediction_order) # cached
|
||
shift = smallest s such that every |coeff_real| < 2^s
|
||
coeffs_stored = quantize(coeffs_q31, to: Q(15 - shift))
|
||
residuals = compute_residuals(samples, coeffs_stored, shift)
|
||
for partition_order in 0..=7:
|
||
if frame_sample_count % (1 << partition_order) != 0: continue
|
||
rice_bits = estimate_cost(residuals, partition_order)
|
||
total = header_bits(prediction_order) + rice_bits
|
||
track minimum over (prediction_order, partition_order)
|
||
if no improvement for 2 consecutive grid entries: break
|
||
|
||
# Phase 2: fixed-predictor post-pass
|
||
for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
|
||
residuals = compute_residuals(samples, fp_coeffs, fp_shift)
|
||
for partition_order in 0..=7:
|
||
if frame_sample_count % (1 << partition_order) != 0: continue
|
||
rice_bits = estimate_cost(residuals, partition_order)
|
||
total = header_bits(fp_order) + rice_bits
|
||
track minimum over (fp_order, partition_order)
|
||
|
||
emit frame with the (order, partition_order) that minimised `total`
|
||
```
|
||
|
||
The sparse grid + early-out is a speed/compression trade-off; a
|
||
compliant encoder may still exhaustively search every integer order
|
||
`0..=32` for marginal gains at higher cost. The produced bitstreams
|
||
are interchangeable.
|
||
|
||
The fixed-predictor post-pass tries FLAC-style integer predictors
|
||
(orders 1-4 with a small static coefficient table) after the LPC
|
||
grid. These evaluate quickly and occasionally beat the Levinson-
|
||
Durbin winner on content where a low-order integer polynomial fits
|
||
better than the statistically-optimal LPC fit — silent-plus-DC, very
|
||
smooth tones, polynomial-ish sensor data. Running them second avoids
|
||
tripping the stop-when-stale heuristic in the LPC phase.
|
||
|
||
The reference encoder's `FIXED_PREDICTORS` table, materialising the
|
||
FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer
|
||
predictors at the smallest `coefficient_shift` that represents each
|
||
coefficient without clamping:
|
||
|
||
| `prediction_order` | `lpc_coefficients` (Q-format integers) | `coefficient_shift` | Real-value interpretation |
|
||
|-------------------:|---------------------------------------------|--------------------:|------------------------------|
|
||
| 1 | `[16384]` | 1 (Q14) | `[1]` |
|
||
| 2 | `[16384, −8192]` | 2 (Q13) | `[2, −1]` |
|
||
| 3 | `[24576, −24576, 8192]` | 2 (Q13) | `[3, −3, 1]` |
|
||
| 4 | `[16384, −24576, 16384, −4096]` | 3 (Q12) | `[4, −6, 4, −1]` |
|
||
|
||
These are the exact wire-format bytes a second-team encoder would emit
|
||
to match the reference's fixed-predictor outputs bit-for-bit. Compliant
|
||
encoders MAY use a different set (or none), since §3.6 treats the
|
||
coefficient field as opaque — decoders apply the synthesis formula
|
||
identically regardless of source.
|
||
|
||
The `R[0] == 0` short-circuit is both a correctness requirement (§5.1
|
||
— Levinson-Durbin is undefined at zero autocorrelation) and an
|
||
encoder-cost optimisation: on digital silence, the sparse grid and
|
||
fixed-predictor pass produce identical zero residuals and order 0
|
||
wins on header size alone.
|
||
|
||
Levinson-Durbin runs once to order 32 with all intermediate orders saved
|
||
into a flat buffer (one recursion pass yields all orders 1..32 at
|
||
`O(order²)` cost), so the outer loop fetch is free and order selection
|
||
is effectively `O(orders_tried × N)`.
|
||
|
||
`shift` is determined per order by the coefficient magnitudes — there
|
||
is no shift search, as smaller shifts are always at least as good as
|
||
larger ones when they don't clamp (saturation, §3.4, is the
|
||
fallback). Rice cost at a given `partition_order` is exact and
|
||
closed-form given the per-partition `k`, so the inner search
|
||
introduces no estimation error.
|
||
|
||
#### Levinson-Durbin numerical choices (reference, non-normative)
|
||
|
||
The reference encoder runs Levinson-Durbin with i64 autocorrelation
|
||
accumulators, Q31 working coefficients, and widens to i128 for
|
||
reflection-coefficient intermediates at orders where Q31 would lose
|
||
precision (typically above order ~12). Rounding on the Q31→Q(15−shift)
|
||
quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`,
|
||
with `bias = 1 << (14 + shift)` — the direct analogue of the synthesis
|
||
formula's rounding (§3.6), chosen so analysis and synthesis agree on
|
||
tie-break direction.
|
||
|
||
None of these choices are normative. Two encoders making different
|
||
precision or rounding choices will produce different coefficient
|
||
bytes on the same input, but both bitstreams decode correctly under
|
||
§3.6 so long as the coefficients they emit faithfully represent their
|
||
own LPC decision. §3.6's "wire format doesn't distinguish coefficients
|
||
by derivation" clause is exactly what permits this freedom.
|
||
|
||
#### Rice `k` selection (reference, non-normative)
|
||
|
||
Cost is closed-form for a given `(partition, k)`: `bit_cost(k) =
|
||
N · (1 + k) + Σ (v >> k)` where `v` ranges over the zigzag-mapped
|
||
residuals of the partition. Exhaustive search over `k ∈ [0, 23]` is
|
||
always acceptable and is the simplest compliant choice.
|
||
|
||
The reference encoder uses convex descent: seed `k_seed =
|
||
⌊log₂(mean(v))⌋`, then walk either direction. Ties break **toward
|
||
smaller `k`** on descent (condition `cost ≤ best_cost`, so equal
|
||
costs at `k−1` overwrite `k`) and **strictly larger costs only** on
|
||
ascent (condition `cost < best_cost`, so equal costs at `k+1` don't
|
||
override `k`). This yields a unique `k` per partition matching an
|
||
exhaustive search's first-wins tie-break.
|
||
|
||
On the reference corpus, the sparse grid's compression matches
|
||
exhaustive-search output within ~0.2 percentage points (measured);
|
||
the differential test caps the acceptable excess at 0.5%.
|
||
Implementations that prefer tighter compression at higher encode
|
||
cost can extend the grid without wire-format consequences — decoders
|
||
do not care which orders an encoder tried.
|
||
|
||
### 7.1 Frame size (non-normative)
|
||
|
||
The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger
|
||
frames amortise the 7-byte fixed header and the LPC coefficient vector
|
||
over more samples, and generally compress better. Smaller frames give
|
||
tighter latency and finer partition-order granularity on transient
|
||
content.
|
||
|
||
Recommended defaults by use case:
|
||
|
||
- **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at
|
||
16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used
|
||
by Opus / WebRTC.
|
||
- **Real-time full-band** (game audio, music conferencing): 1024-2048
|
||
at 48 kHz (21-43 ms).
|
||
- **Offline / archival**: 4096-8192 at 48 kHz; compression gains
|
||
flatten past this.
|
||
|
||
Power-of-two frame sizes expose every `partition_order ∈ [0, 7]` to
|
||
the encoder's search. Non-power-of-two sizes restrict the search to
|
||
partition counts that divide evenly; a prime frame size forces
|
||
`partition_order = 0`. Encoders SHOULD prefer frame sizes with several
|
||
small-prime factors when free to choose.
|
||
|
||
## 8. Versioning
|
||
|
||
This document specifies **LAC version 1**, identified on the wire by
|
||
`sync_word = 0x1ACC`. No per-frame version byte is carried; the sync
|
||
word uniquely identifies the wire format.
|
||
|
||
Future revisions of the format **must** use a distinct `sync_word`. The
|
||
recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on
|
||
inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making
|
||
casual grep / hex-dump inspection robust. A revision whose wire format
|
||
cannot be made compatible with v1 at the header level **must** pick a
|
||
sync word outside this cluster.
|
||
|
||
This approach is exhaustive by construction:
|
||
|
||
- A v1 decoder that encounters a v2 frame sees an unrecognised sync
|
||
word on the first check (§3.1) and rejects cleanly — the same error
|
||
path as foreign or corrupted payloads.
|
||
- A v2-aware decoder dispatches on the sync word before reading any
|
||
further field, so it can fall back to v1 parsing when appropriate
|
||
or decode v2 frames natively.
|
||
|
||
Because every field in §3 has a strict range with no reserved
|
||
high-bit patterns, in-place extension (flag bits inside existing
|
||
fields) is **not** a supported evolution path. New features go into a
|
||
new `sync_word`, not into reinterpreting existing field values.
|
||
|
||
Transports that multiplex LAC frames with other formats should frame
|
||
each LAC payload explicitly (length prefix or stream separator); the
|
||
sync word alone is not a framing delimiter, only a format identifier.
|
||
|
||
## 9. Implementation notes (non-normative)
|
||
|
||
### 9.1 GPU offload is out of scope
|
||
|
||
LAC is a scalar integer codec. The reference implementation, and any
|
||
conforming implementation this document anticipates, runs on a CPU.
|
||
GPU offload is deliberately not a goal:
|
||
|
||
- **Levinson-Durbin** is serial by construction (each iteration depends
|
||
on the previous) and its intermediate accumulator needs more than 64
|
||
bits of precision at higher orders — a poor fit for WGSL or SPIR-V
|
||
compute shaders, which have no native 128-bit integer arithmetic.
|
||
- **Rice decode** uses a data-dependent unary run for every residual;
|
||
on GPU execution models this diverges warps badly and its
|
||
sequential bit-cursor progression fights SIMD lane packing.
|
||
- **LPC synthesis** has a tight per-sample feedback loop (sample `i`
|
||
depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is
|
||
inherently serial.
|
||
- **The one plausibly GPU-parallel phase** — residual computation
|
||
inside the encoder's order search — is also the phase where the
|
||
CPU's autovectorized implementation is already well-served by SIMD
|
||
on any modern target. At the measured encode latencies (P99 under
|
||
50 µs on x86 for a 20 ms frame period, >400× headroom), there is no
|
||
motivation to offload it.
|
||
|
||
A hypothetical future revision whose hot path genuinely benefited from
|
||
GPU execution (large-batch archival encoding across many channels at
|
||
once, for instance) would need to change the wire format to carry
|
||
enough shape metadata for batched kernels — i.e., a new sync word
|
||
under the versioning rules in §8, not a retrofit.
|
||
|
||
## 10. Conformance test vectors
|
||
|
||
The reference repository's `tests/conformance.rs` holds the canonical
|
||
test-vector set for this specification:
|
||
|
||
- **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte
|
||
level. A conformant decoder **must** produce the `samples` array
|
||
when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a
|
||
second-team encoder's bytes for the same samples may differ; the
|
||
decoder direction is the normative one. Coverage includes the
|
||
smallest valid frames (single-sample verbatim, 4- and 8-sample
|
||
silence), single-sample polarity boundaries (±1, ±(2²³−1)), DC
|
||
offset, alternating-polarity Nyquist-like content, smooth polynomial
|
||
(fixed-predictor territory), and a 16-sample growing-amplitude
|
||
pattern exercising partition search.
|
||
- **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to
|
||
their expected rejection variants. Covers every class in §6 (1-10):
|
||
bad sync, each field-range violation, verbatim + non-zero shift,
|
||
`frame_sample_count == 0`, non-divisible partition count, header /
|
||
coefficient / Rice-bitstream truncation, per-partition `k > 23`.
|
||
- **`reject_unary_run_above_cap`** — a programmatic test for §6 class
|
||
10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75
|
||
bytes of mostly zeros; construction logic is in the test, not a
|
||
const fixture.
|
||
|
||
Second-team implementations should port the decode fixtures
|
||
byte-for-byte and the reject fixtures byte-and-variant-for-variant.
|
||
`encode_matches_fixtures` in the same file is reference-specific (it
|
||
asserts the reference encoder's exact bytes) and is **not** a
|
||
conformance requirement — see §3.6's encoder-latitude clause.
|
||
|
||
### 10.1 Reference encoder exemplars (non-normative)
|
||
|
||
The same `(samples, bytes)` pairs, read in the encoder direction,
|
||
serve as **reference-encoder validation targets** for implementations
|
||
that want to match this project's reference byte-for-byte — a common
|
||
goal for porting work even though the spec does not require it. The
|
||
fixture set is deliberately chosen to pin every encoder-discretion
|
||
axis from §7:
|
||
|
||
- `single_zero`, `single_pos_one`, `single_neg_one` — single-sample
|
||
frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime
|
||
where the encoder's order choice is nearly arbitrary; pinning the
|
||
bytes fixes this project's choice (order 0 with the minimum-cost
|
||
Rice encoding).
|
||
- `silence_4`, `silence_8` — force `partition_order` tie-breaks on an
|
||
all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces
|
||
identical cost; the reference picks the smallest via its convex-
|
||
descent tie-break).
|
||
- `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim
|
||
decision. DC content favours a low-order LPC fit with small
|
||
residuals; alternating content favours order 1 with `a = −1`
|
||
(approximated at the closest Q-format). Pinning the bytes fixes
|
||
the reference's decision boundary.
|
||
- `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude
|
||
single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both
|
||
sides and fix the zigzag-of-extremum output.
|
||
- `linear_ramp_8` — smooth polynomial content, fixed-predictor
|
||
territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
|
||
- `lfsr_noise_16` — exercises partition search on a frame large
|
||
enough for `partition_order > 0` to be competitive.
|
||
|
||
A second-team encoder that produces the same bytes for every entry
|
||
here is **likely** (not guaranteed) to produce matching bytes on
|
||
wider inputs, since the tie-break axes are the ones most sensitive
|
||
to encoder discretion. An encoder that produces different bytes is
|
||
still compliant so long as its own bytes round-trip — see §3.6, §7.
|