Add Specification
parent
d869cfdac6
commit
5310d749b1
1 changed files with 784 additions and 0 deletions
Specification.md
Normal file
@ -0,0 +1,784 @@
# LAC Wire Format

Normative specification of the LAC bitstream. This document is the authority
on byte layout, field semantics, and encoder/decoder constraints.

## 1. Conventions

- All multi-byte integer fields are **big-endian**.
- Bit streams are **MSB-first**: the first bit written occupies bit 7 of its
  byte, subsequent bits fill lower positions, and a new byte begins once
  eight bits have been emitted.
- Samples are **signed integers** passed as `i32` with magnitude bounded
  by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` (bits 31..23) must
  be a consistent sign extension of the 24-bit signed value. Narrower source
  formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly
  (they trivially satisfy the magnitude bound) and compress at the bit
  cost of their actual values, not a 24-bit ceiling. The codec does not
  carry bit-depth metadata; the container or application layer is
  responsible for remembering the source format.
- Sample rate is **not** part of the bitstream. The container or transport
  carries it.
- A frame encodes **one channel**. Stereo is two independent streams of
  frames, one per channel.
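
The MSB-first packing rule above can be sketched as a small bit writer. This is an illustrative sketch only: `BitWriter` and its method names are hypothetical and are not part of the specification or of any reference API.

```rust
/// Hypothetical MSB-first bit writer (names are illustrative, not from the spec).
pub struct BitWriter {
    bytes: Vec<u8>,
    bit_len: usize, // total number of bits written so far
}

impl BitWriter {
    pub fn new() -> Self {
        BitWriter { bytes: Vec::new(), bit_len: 0 }
    }

    /// Append one bit; the first bit of each byte lands in bit 7.
    pub fn push_bit(&mut self, bit: bool) {
        if self.bit_len % 8 == 0 {
            // Start a new byte. Unwritten low bits stay zero, which doubles
            // as the zero padding of the final byte.
            self.bytes.push(0);
        }
        if bit {
            let shift = 7 - (self.bit_len % 8);
            *self.bytes.last_mut().unwrap() |= 1u8 << shift;
        }
        self.bit_len += 1;
    }

    /// Append the low `count` bits of `value` (count <= 32), most significant first.
    pub fn push_bits(&mut self, value: u32, count: u32) {
        for i in (0..count).rev() {
            self.push_bit((value >> i) & 1 == 1);
        }
    }

    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}
```

Writing the bit `1` followed by the 3-bit value `0b101` yields the single byte `0b1101_0000`: the first bit sits in bit 7 and the unused low bits are zero.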

## 2. Frame Layout

A frame is a contiguous byte sequence:

```text
+--------+--------------------+
| header | rice_bitstream     |
+--------+--------------------+
```

The `header` has a fixed structure but variable length, because the coefficient
array length depends on `prediction_order`. The `rice_bitstream` is a bit-packed
payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in
the low bits of the last byte.

Decoder input is the complete frame. There is no intra-frame continuation or
fragmentation; the transport layer handles that.

## 3. Frame Header

```text
Offset  Size  Field               Type      Constraint
------  ----  ------------------  --------  ----------------------------------
0-1     2     sync_word           u16 BE    == 0x1ACC
2       1     prediction_order    u8        ∈ [0, 32]
3       1     partition_order     u8        ∈ [0, 7]
4       1     coefficient_shift   u8        ∈ [0, 5]
5-6     2     frame_sample_count  u16 BE    ≥ 1, % (1 << partition_order) == 0
7+      2·p   lpc_coefficients    i16 BE[]  length = prediction_order = p
```

Total header length: `7 + 2 · prediction_order` bytes.
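
A minimal sketch of header parsing against the table above, including the verbatim-frame rule from §3.4. The type and function names (`Header`, `HeaderError`, `parse_header_sketch`) are hypothetical, the three error variants are a simplification of the rejection classes in §6, and the order of the checks is an arbitrary choice of this sketch.

```rust
/// Hypothetical header type; field names follow the table above.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub prediction_order: u8,
    pub partition_order: u8,
    pub coefficient_shift: u8,
    pub frame_sample_count: u16,
    pub lpc_coefficients: Vec<i16>,
}

/// Simplified error set for illustration; §6 lists finer-grained classes.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadSync,
    BadField,
}

/// Returns the parsed header and the number of header bytes consumed.
pub fn parse_header_sketch(data: &[u8]) -> Result<(Header, usize), HeaderError> {
    if data.len() < 7 {
        return Err(HeaderError::Truncated);
    }
    if u16::from_be_bytes([data[0], data[1]]) != 0x1ACC {
        return Err(HeaderError::BadSync);
    }
    let (order, part, shift) = (data[2], data[3], data[4]);
    let count = u16::from_be_bytes([data[5], data[6]]);
    if order > 32 || part > 7 || shift > 5 {
        return Err(HeaderError::BadField);
    }
    // Verbatim frames must carry coefficient_shift == 0 (§3.4).
    if order == 0 && shift != 0 {
        return Err(HeaderError::BadField);
    }
    if count == 0 || count % (1u16 << part) != 0 {
        return Err(HeaderError::BadField);
    }
    let len = 7 + 2 * order as usize;
    if data.len() < len {
        return Err(HeaderError::Truncated);
    }
    let lpc_coefficients: Vec<i16> = data[7..len]
        .chunks_exact(2)
        .map(|c| i16::from_be_bytes([c[0], c[1]]))
        .collect();
    let header = Header {
        prediction_order: order,
        partition_order: part,
        coefficient_shift: shift,
        frame_sample_count: count,
        lpc_coefficients,
    };
    Ok((header, len))
}
```

For an order-1 frame the function consumes `7 + 2 · 1 = 9` bytes and leaves the caller positioned at the first byte of the Rice bitstream.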

### 3.1 `sync_word`

Fixed value `0x1ACC` (bytes `0x1A`, `0xCC` on the wire). Present to identify a
LAC frame on lightly framed transports and to reject foreign payloads at the
first check. Decoders **must** reject any frame whose first two bytes are not
`0x1ACC`.

### 3.2 `prediction_order`

Integer order of the LPC analysis filter used to produce residuals.

- Value `0` is **verbatim mode**: no prediction, residuals equal the samples,
  and the `lpc_coefficients` array is empty (zero bytes).
- Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries
  exactly that many predictor coefficients, interpreted in the Q-format
  determined by `coefficient_shift` (§3.4).

Decoders **must** reject values above 32.

### 3.3 `partition_order`

Controls how the residual stream is split for Rice coding.

- `partition_count = 1 << partition_order`.
- The residual stream is divided into `partition_count` equal partitions of
  `frame_sample_count / partition_count` samples each.

Decoders **must** reject values above 7 and **must** reject frames where
`frame_sample_count` is not a multiple of `partition_count`.

### 3.4 `coefficient_shift`

Controls the fixed-point scale of the stored Q-format LPC predictor
coefficients. Coefficients are stored as 16-bit integers interpreted as
`Q(15 − coefficient_shift)`:

| shift | Q-format | Real-value range | Use case                                          |
|-------|----------|------------------|---------------------------------------------------|
| 0     | Q15      | `[−1, 1)`        | Coefficients with magnitude < 1 (most orders > 1) |
| 1     | Q14      | `[−2, 2)`        | Low-frequency content, `a[1]` near −2             |
| 2     | Q13      | `[−4, 4)`        | Extreme bass / narrow resonances                  |
| 3     | Q12      | `[−8, 8)`        | Pathological transients                           |
| 4     | Q11      | `[−16, 16)`      | Reserved for synthetic signals                    |
| 5     | Q10      | `[−32, 32)`      | Upper bound; decoder rejects larger values        |

The encoder **must** select the smallest `coefficient_shift` at which no
coefficient's real value exceeds the representable range for that shift,
i.e., the smallest scale that does not clamp. Smaller shifts give finer
precision and thus smaller residuals when no clamping is required.

If no `coefficient_shift ∈ [0, 5]` suffices (the real coefficient
magnitude exceeds the Q10 range at `shift = 5`), the encoder
saturates each offending coefficient independently to the i16 range
`[−32768, 32767]`. Bit-exact round-trip is preserved because the
decoder applies the synthesis formula to whatever 16-bit values the
wire carries; the cost of saturation is compression: the predictor
no longer matches the encoder's ideal coefficients, so residuals
grow. Real audio at the input-magnitude contract (§1) rarely reaches
this case; synthetic or adversarial inputs can force it.

Decoders **must** reject values above 5. The shift applies uniformly to
every coefficient in `lpc_coefficients`; there is no per-coefficient
scale.

When `prediction_order == 0` (verbatim frame), `coefficient_shift`
**must** be `0`. The shift only modifies how stored coefficients are
interpreted, and a verbatim frame stores none. Decoders **must**
reject frames with `prediction_order == 0` and `coefficient_shift != 0`
as malformed; this rule closes the space of legal but meaningless
headers so two implementations agree bit-for-bit on which inputs
round-trip.
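
The shift-selection and saturation rules of this section can be sketched as follows. This is an assumption-laden illustration, not the reference encoder: it starts from `f64` coefficients and uses `f64::round` (ties away from zero), whereas the reference encoder quantises from Q31 integers; the function names are hypothetical.

```rust
/// Smallest shift in 0..=5 at which every coefficient quantises into the i16
/// range without clamping. Falls back to 5 when no shift suffices, in which
/// case `quantize` saturates the offending coefficients.
pub fn choose_shift(real_coeffs: &[f64]) -> u8 {
    for shift in 0u8..=5 {
        let scale = (1i64 << (15 - shift)) as f64;
        let fits = real_coeffs.iter().all(|&c| {
            let q = (c * scale).round();
            q >= i16::MIN as f64 && q <= i16::MAX as f64
        });
        if fits {
            return shift;
        }
    }
    5
}

/// Quantise to Q(15 - shift), saturating each coefficient independently.
pub fn quantize(real_coeffs: &[f64], shift: u8) -> Vec<i16> {
    let scale = (1i64 << (15 - shift)) as f64;
    real_coeffs
        .iter()
        .map(|&c| (c * scale).round().clamp(i16::MIN as f64, i16::MAX as f64) as i16)
        .collect()
}
```

A coefficient of exactly `1.0` does not fit Q15 (it would need `32768`), so the sketch picks shift 1 and stores `16384`. An empty coefficient slice (a verbatim frame) yields shift 0, consistent with the rule above.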

### 3.5 `frame_sample_count`

Number of audio samples produced by this frame (also the number of residuals
in the Rice bitstream). The value **must** be in `[1, 65535]`.

Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame
trivially satisfies the partition-divisibility check below
(`0 mod n == 0` for any `n`) but carries no audio and has no legal
Rice payload.

For compliance with `partition_order`, the value **must** satisfy
`frame_sample_count mod (1 << partition_order) == 0`. Decoders **must**
reject frames where this does not hold.

### 3.6 `lpc_coefficients`

Array of `prediction_order` predictor coefficients, each a 16-bit big-endian
signed integer interpreted in `Q(15 − coefficient_shift)` format; see §3.4
for the shift semantics.

The wire format does not distinguish coefficients by derivation. The
synthesis formula below applies identically whether the encoder
obtained the values from Levinson-Durbin analysis, from a fixed
coefficient template (e.g. FLAC-style integer predictors), from a
trained model, or from any other strategy. What goes on the wire is
just `prediction_order` 16-bit integers; how the encoder chose them
is encoder-internal and not observable to the decoder.

Synthesis formula (applied in the decoder):

```text
s          = 15 − coefficient_shift
bias       = 1 << (s − 1)
predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
sample[i]  = residual[i] + predict[i]
```

where `terms = min(i, prediction_order)`. The `+ bias` term implements
round-to-nearest for the right shift and is **required** for bit-exact
decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.

The `>> s` operator **must** be an **arithmetic right shift** on
signed integers, equivalent to floor division by `2^s`. Combined with
the `+ bias` pre-add, this implements **round-half toward +∞**: on a
value whose scaled form is exactly `k + 0.5`, the result is `k + 1`
for both positive and negative `k`. Implementations using truncating
integer division (C's `/` on signed integers, which rounds toward
zero) **will diverge** from this on any `sum + bias` that is negative
and not evenly divisible by `2^s`: arithmetic shift rounds toward −∞
(away from zero for negative values), truncating division rounds
toward zero. Concrete example:
at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives
`(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`.
Decoders in languages whose native integer division does not floor
**must** emulate arithmetic right shift explicitly on the signed
accumulator.
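
The difference between the required rounding shift and truncating division can be checked directly. The sketch below relies on Rust's `>>` being an arithmetic shift on signed integers; the function names are illustrative only.

```rust
/// Rounding shift from the synthesis formula: (sum + bias) >> s.
/// On i64, Rust's `>>` is an arithmetic (flooring) shift.
pub fn round_shift(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) >> s
}

/// Non-conformant variant for comparison: `/` truncates toward zero.
pub fn round_div_truncating(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) / (1i64 << s)
}

/// Portable emulation for settings without an arithmetic shift:
/// Euclidean division by a positive power of two floors the quotient.
pub fn round_shift_emulated(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias).div_euclid(1i64 << s)
}
```

With `s = 15` and `sum = -32769`, `round_shift` returns `-1` while `round_div_truncating` returns `0`, reproducing the concrete example above. `round_shift_emulated` agrees with `round_shift` on every input because the divisor is positive.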

#### Accumulator width

The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in
a signed integer accumulator of at least **49 bits** (in practice: an
`i64` or wider). Worst-case bounds at `prediction_order = 32`,
`coefficient_shift = 5` (Q10), and full-scale samples give a product
of magnitude `(2¹⁵) · (2²³ − 1) ≈ 2³⁸` per term, summed over 32 terms
for a maximum of `~2⁴³`. Adding the bias keeps the result below `2⁴⁴`.
A 32-bit accumulator is never sufficient under these bounds: a single
worst-case term (`≈ 2³⁸`) already exceeds the `i32` range, so
implementations that reach for `int32_t` because samples are 32-bit
will silently corrupt frames.

JavaScript / TypeScript implementers should note that `Number` is an
IEEE 754 double, not a signed integer: its 53-bit safe-integer range
covers in-contract accumulator values, but adversarial bitstreams (see
§6.2) can produce out-of-contract samples whose synthesis arithmetic
lands in the 2⁴⁹–2⁵¹ range and beyond, where `Number` silently loses
low bits to float rounding. For bit-exact spec compliance in JS/TS,
**use `BigInt` for the accumulator**: it has the integer semantics
the spec requires; `Number` does not.

#### Warm-up (`terms == 0`)

When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is
empty and `predict[0] = 0`; the `(0 + bias) >> s` formula is **not**
applied. Stating this explicitly avoids an implementation that
mechanically applies the formula in the warm-up case and produces
`predict[0] = bias >> s`, which is zero only in specific
`(bias, s)` parametrisations and surprising in any other.

For `0 < i < prediction_order`, the sum truncates to the available
`i` predecessors (`terms = i`). The formula applies as stated.

#### Sign convention for stored coefficients

The synthesis formula uses `+Σ`. Classical Levinson-Durbin
implementations that derive LPC from the error-prediction AR model

```text
x[n] = −Σ a[j] · x[n-j] + e[n]      (error convention)
```

produce coefficients `a[j]` whose sign is the **opposite** of what
the synthesis formula expects; those encoders **must** negate before
quantisation so the wire value is `coeff[j-1] = −a[j]`.
Implementations using the predictor convention

```text
x̂[n] = +Σ c[j] · x[n-j]             (predictor convention)
```

store `c[j]` directly.

Both conventions are common in DSP literature. Encoders **must**
verify that the coefficients emitted on the wire, when substituted
into the synthesis formula above, reproduce the encoder's own
prediction. The reference implementation uses the error convention
and negates at quantisation time.

#### Overflow semantics of the final add

The `residual[i] + predict[i]` add is specified as a **wrapping i32
add** (two's complement, modulo `2³²`; in languages without native
signed-overflow semantics, compute `(residual + predict) & 0xFFFFFFFF`
and then re-interpret as a signed 32-bit integer via sign-extension
of bit 31). On well-formed bitstreams (those produced by a compliant
encoder from in-contract samples, §1) the result stays inside the
sample-magnitude contract and the wrap is never observable.
Adversarial bitstreams with crafted coefficients and residuals **may**
produce any `i32` value; the decoder **must not** panic, abort, or
reject on the basis of this add's result. The consequences of
out-of-contract decoder output are addressed in §6.2.
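
Putting the pieces of §3.6 together (truncated sum during warm-up, `i64` accumulator, rounding shift, wrapping final add), a decoder-side synthesis loop might look like the sketch below. The function name and signature are hypothetical, and the caller is assumed to have validated the header so that `coefficient_shift ≤ 5` and `coeffs.len() ≤ 32`.

```rust
/// Reconstruct samples from residuals. Assumes a validated header:
/// coefficient_shift <= 5 and coeffs.len() == prediction_order <= 32.
pub fn synthesize(residuals: &[i32], coeffs: &[i16], coefficient_shift: u8) -> Vec<i32> {
    let s = 15 - coefficient_shift as u32;
    let bias = 1i64 << (s - 1);
    let mut samples: Vec<i32> = Vec::with_capacity(residuals.len());
    for (i, &residual) in residuals.iter().enumerate() {
        let terms = i.min(coeffs.len());
        let predict: i64 = if terms == 0 {
            0 // warm-up (and verbatim frames): the rounding formula is not applied
        } else {
            let mut acc: i64 = 0; // 64-bit accumulator, see "Accumulator width"
            for j in 0..terms {
                acc += coeffs[j] as i64 * samples[i - j - 1] as i64;
            }
            (acc + bias) >> s // arithmetic shift on i64
        };
        // Wrapping i32 add: truncating `predict` to 32 bits and then wrapping
        // is the same as reducing (residual + predict) modulo 2^32.
        samples.push(residual.wrapping_add(predict as i32));
    }
    samples
}
```

With the order-2 predictor `[2, −1]` stored in Q13 as `[16384, −8192]`, the residuals `[1, 1, 0, 0]` reconstruct the ramp `[1, 3, 5, 7]`: after the two warm-up samples the predictor extrapolates the line exactly.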

## 4. Rice Bitstream

Immediately follows the header. Flat MSB-first bitstream structured as
consecutive partition payloads:

```text
+---------+---------+-----+-----------+
| part. 0 | part. 1 | ... | part. P-1 |
+---------+---------+-----+-----------+
```

where `P = 1 << partition_order`. Each partition has the same structure:

```text
+-------+-------------+-------------+-----+--------------+
| k (5) | codeword 0  | codeword 1  | ... | codeword M-1 |
+-------+-------------+-------------+-----+--------------+
```

where `M = frame_sample_count / P` is the per-partition residual count.

Partitions are **bit-contiguous**: the 5-bit `k` field of partition
`i + 1` begins at the bit immediately following the last codeword of
partition `i`. There is no byte alignment between partitions. Only
the final trailing padding described in §4.3 is byte-aligned.

Within a partition, the bit cursor likewise advances continuously:
codeword 0 begins at the bit immediately following the 5-bit `k`
field, codeword 1 immediately after codeword 0's last bit, and so
on. A conformant decoder maintains a single bit-read position across
the entire Rice bitstream, from the `k` of partition 0 through the
last codeword of partition P−1, and never realigns to a byte
boundary between fields. This is implicit in the byte-stream decoder
design (a bit reader that consumes bits sequentially needs no
special handling at field boundaries) but stated here so second-team
implementations do not introduce a spurious alignment.

### 4.1 Per-Partition Parameter `k`

Five-bit unsigned integer, MSB-first, immediately before the partition's
codewords. `k` is the Rice parameter for this partition and must be in
`[0, 23]`. Decoders **must** reject values above 23 as malformed.

### 4.2 Codeword

Each residual is encoded by:

1. **Zigzag mapping** from signed to unsigned:

   ```text
   z = (r << 1) ^ (r >> 31)        interpreted as u32
   ```

   where `(r >> 31)` is an **arithmetic right shift** on the i32
   residual, sign-extending the sign bit to all 32 bit positions: `0`
   for non-negative `r`, `-1` (all ones) for negative `r`. The entire
   expression is **masked to 32 bits** before being interpreted as
   `u32`; in languages with arbitrary-precision integers (e.g. Python)
   or where native bitwise ops return signed 32-bit (e.g. JavaScript /
   TypeScript `Number`, where `(x ^ y) >>> 0` coerces to u32), this
   mask is explicit (`& 0xFFFFFFFF` or `>>> 0`) and **required**:
   without it, negative residuals produce zigzag values with extra
   high bits set. Implementations in languages whose native right
   shift is logical on unsigned types **must** coerce `r` to a signed
   32-bit type first; implementations in languages where `>>` on
   signed types is implementation-defined (e.g. pre-C++20 C/C++)
   **must** emulate the arithmetic shift explicitly.

   The `r << 1` factor is always safe on i32. Residuals on the
   encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2·2²³
   = 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without
   overflow or undefined behaviour, even for `r` at the most
   negative value the encoder can ever produce from in-contract
   input (§1). Decoder-side code does not perform this shift; the
   inverse uses `z >> 1` on a u32, which is always defined.

   The mapping sends

   ```text
   {0, −1, 1, −2, 2, −3, 3, …}  →  {0, 1, 2, 3, 4, 5, 6, …}
   ```

   so small magnitudes of either sign map to small unsigned values.

   The decoder's inverse is

   ```text
   r = ((z >> 1) as i32) ^ −((z & 1) as i32)
   ```

   where `z >> 1` is a logical shift on the unsigned u32 and the
   leading `−` is ordinary two's-complement integer negation. Stating
   this inverse explicitly removes any ambiguity about how an
   implementation must invert the zigzag.

2. **Rice code** at parameter `k`:
   - **Unary part**: `q = z >> k` zero-bits followed by a single
     terminating 1-bit.
   - **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first.
     (The remainder part is absent when `k == 0`.)

Total codeword length: `q + 1 + k` bits.
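
The two steps above translate almost literally into Rust, where `>>` on `i32` is arithmetic and `<<` discards bits shifted past bit 31. The following is a sketch with hypothetical function names; it returns the codeword as a vector of bits for readability rather than packing it into bytes.

```rust
/// Zigzag map: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
pub fn zigzag(r: i32) -> u32 {
    ((r << 1) ^ (r >> 31)) as u32
}

/// Inverse zigzag, as stated for the decoder.
pub fn unzigzag(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}

/// Rice codeword for residual `r` at parameter `k` (0..=23), as bits in
/// wire order (true = 1). Intended for small quotients only: the unary
/// run is materialised in memory.
pub fn rice_codeword(r: i32, k: u32) -> Vec<bool> {
    let z = zigzag(r);
    let q = z >> k;
    let mut bits = vec![false; q as usize]; // unary part: q zero bits
    bits.push(true); // terminating 1-bit
    for i in (0..k).rev() {
        bits.push((z >> i) & 1 == 1); // remainder, MSB-first
    }
    bits
}
```

For `r = −3` and `k = 1`: `z = 5`, `q = 2`, remainder `1`, so the codeword is the four bits `0 0 1 1`, matching the length formula `q + 1 + k = 4`.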

#### Decoder-side unary-run bound

Decoders **must** reject any codeword whose unary run length satisfies

```text
q > (2³² − 1) >> k          (equivalently, q > u32::MAX >> k)
```

A valid codeword reconstructs `z = (q << k) | remainder` as a u32.
`q > u32::MAX >> k` implies `q << k ≥ 2³²`, which either overflows
u32 silently (a critical decoder bug class: corrupt output with no
error) or indicates a malformed stream. Either way the frame **must**
be discarded with `InvalidParameter` or equivalent rejection class
(§6). The bound varies with `k`: at `k = 23` it is `511`, at `k = 0`
it is `u32::MAX` (no practical constraint).

This rule also caps the CPU cost of unary scanning on adversarial
input: without the cap, a decoder could be forced to scan an
arbitrarily long run of zero bits before reaching either a `1` or
the buffer end.
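
A decoder-side sketch of the unary-run bound, using a hypothetical MSB-first `BitReader`. The error names mirror the `Truncated` and `InvalidParameter` classes mentioned in this document, but the types themselves are illustrative and not the reference API.

```rust
#[derive(Debug, PartialEq)]
pub enum RiceError {
    Truncated,
    InvalidParameter,
}

/// Hypothetical MSB-first bit reader over a byte slice.
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // bit position from the start of `data`
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }

    pub fn read_bit(&mut self) -> Result<u32, RiceError> {
        let byte = *self.data.get(self.pos / 8).ok_or(RiceError::Truncated)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        Ok(bit as u32)
    }
}

/// Decode one zigzag value at Rice parameter `k` (already checked to be <= 23).
pub fn read_rice(reader: &mut BitReader, k: u32) -> Result<u32, RiceError> {
    let max_q = u32::MAX >> k;
    let mut q: u32 = 0;
    while reader.read_bit()? == 0 {
        if q == max_q {
            // One more zero would make q > u32::MAX >> k.
            return Err(RiceError::InvalidParameter);
        }
        q += 1;
    }
    let mut remainder = 0u32;
    for _ in 0..k {
        remainder = (remainder << 1) | reader.read_bit()?;
    }
    Ok((q << k) | remainder) // cannot overflow because q <= max_q
}
```

The check runs before `q` is incremented, so the counter itself never overflows even at `k = 0`, where the bound is `u32::MAX` and the buffer end is what stops the scan in practice.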

### 4.3 Byte Padding

After the last codeword of the last partition, any unused bits of the final
byte are zero-padded on the LSB side. The encoder writes `0` for all padding
bits; the decoder ignores them.

### 4.4 Bitstream Length

The Rice bitstream's total bit length is the sum of codeword bit
lengths across every partition, plus 5 bits per partition for the `k`
fields:

```text
total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
```

This depends on every residual's quotient and cannot be computed
from the frame header alone. A decoder **stream-decodes** until it
has produced exactly `frame_sample_count` samples, then stops.
Any bits remaining inside the last byte are padding (§4.3) and
carry no information.

A decoder **must not** require the Rice bitstream's byte length to be
signalled out-of-band. The header plus the zero-padded byte-aligned
tail fully determines the frame boundary; `parse_header`'s
`bytes_consumed` return plus streaming Rice decode locates the end of
the frame in the input buffer.
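
The length formula of §4.4 and the `ceil(total_rice_bits / 8)` byte length from §2 can be written down directly. This sketch assumes each partition is supplied as a `(k, zigzag values)` pair; that representation and the function names are choices made for illustration only.

```rust
/// Total Rice payload bits: 5 bits of `k` per partition plus
/// (q + 1 + k) bits per codeword, where q = z >> k.
pub fn total_rice_bits(partitions: &[(u32, Vec<u32>)]) -> u64 {
    partitions
        .iter()
        .map(|(k, zs)| {
            let codewords: u64 = zs
                .iter()
                .map(|&z| (z >> *k) as u64 + 1 + *k as u64)
                .sum();
            5 + codewords
        })
        .sum()
}

/// Byte length of the Rice bitstream including the zero padding of §4.3.
pub fn rice_byte_len(total_bits: u64) -> u64 {
    (total_bits + 7) / 8
}
```

A 16-sample frame coded as one partition of zeros at `k = 0` costs `5 + 16 = 21` bits, which rounds up to 3 bytes with 3 bits of trailing padding.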

## 5. Degenerate Cases

### 5.1 All-Zero Frame

For an all-zero sample vector, the encoder **must** use
`prediction_order = 0` because the Levinson-Durbin recursion is
undefined at `R[0] = 0`. Residuals equal the input (all zeros).

Partition-order and per-partition `k` selection remain at the
encoder's discretion; any legal `(partition_order, k)` combination
produces a bit-exact-decodable frame. The minimum-cost choice,
`partition_order = 0` with `k = 0`, produces a Rice payload of exactly
`5 + frame_sample_count` bits. Compliant encoders are **not**
required to pick this minimum.

### 5.2 Single-Sample Frame

`frame_sample_count = 1` is valid but forces `partition_order = 0` (the only
value whose partition count divides 1 evenly). The single sample is Rice-coded
directly because no predecessors exist for any LPC order.

## 6. Error Recovery

Decoders **must** detect and reject every frame that violates the
constraints elsewhere in this document. The exhaustive list of
rejection classes, each of which is a distinct error condition so
callers can distinguish them in telemetry, is:

1. **Sync word mismatch**: bytes `0-1` differ from `0x1ACC` (§3.1).
2. **`prediction_order` out of range**: value > `32` (§3.2).
3. **`partition_order` out of range**: value > `7` (§3.3).
4. **`coefficient_shift` out of range**: value > `5` (§3.4).
5. **Verbatim frame with non-zero shift**: `prediction_order == 0` and
   `coefficient_shift != 0` (§3.4).
6. **`frame_sample_count == 0`** (§3.5).
7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
8. **Buffer truncated**: fewer bytes than the header plus coefficient
   array requires, or fewer bits than the Rice bitstream demands
   during streaming decode. This class is intentionally coarse-grained:
   a single `Truncated` variant covers header truncation, missing `k`
   fields, mid-codeword exhaustion, and every other "buffer ends early"
   shape the decoder can encounter. Sub-categorising these provides no
   caller benefit, because the recovery action (discard and substitute
   silence, see §6.1) is identical regardless of where truncation
   happened.
9. **Per-partition `k` out of range**: value > `23` (§4.1).
10. **Unary-run cap exceeded**: any codeword with `q > u32::MAX >> k`
    (§4.2).

On any of these, the decoder **must** discard the frame, produce no
output samples, and signal the error to the caller. **No partial
state may propagate** to the next frame's decode: frames are
independent (§2), so subsequent frames decode cleanly regardless.

### 6.1 Caller-side silence substitution

On rejection, the caller substitutes `frame_sample_count` zeros
(silence) for the frame period. The count is obtained as follows:

- **Post-header rejections** (classes 8-10 above: `Truncated` in the
  Rice bitstream, `InvalidParameter` during Rice decode): the frame
  header parsed successfully before the failure, so the count is
  recoverable. The caller re-parses just the header on the same buffer
  (reference API: `parse_header(data)`) and reads `frame_sample_count`
  from the resulting `AudioFrameHeader`.
- **Pre-header rejections** (classes 1-7 above, and class 8 when the
  truncation falls inside the header): the header itself
  failed; the frame length is not recoverable from the bitstream. The
  caller **must** fall back to a session-level default frame size
  carried out-of-band by the container or transport (WebRTC and QUIC
  audio sessions typically negotiate this at session setup).

This asymmetry is inherent to the wire format: `frame_sample_count`
lives inside the header at offset 5, so any rejection that happens
while parsing bytes 0-4 precedes its discovery.
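
The two recovery paths can be sketched on the caller side as follows. `Rejection` and `substitute_silence` are hypothetical names introduced for this illustration; the reference API's error types may be shaped differently.

```rust
/// Hypothetical caller-side view of a rejected frame.
pub enum Rejection {
    /// The header itself failed; no trustworthy count is available.
    PreHeader,
    /// The header parsed, then Rice decode failed; the count was
    /// recovered by re-parsing the header.
    PostHeader { frame_sample_count: u16 },
}

/// Zeros to play in place of the rejected frame. `session_default` is the
/// out-of-band frame size negotiated by the container or transport.
pub fn substitute_silence(rejection: &Rejection, session_default: u16) -> Vec<i32> {
    let count = match rejection {
        Rejection::PreHeader => session_default,
        Rejection::PostHeader { frame_sample_count } => *frame_sample_count,
    };
    vec![0; count as usize]
}
```

A post-header rejection of a 4-sample frame yields 4 zeros regardless of the session default; a pre-header rejection falls back to the default.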

### 6.2 Decoder output magnitude

On well-formed bitstreams produced by a compliant encoder from
in-contract samples (§1), decoder output satisfies
`|sample| ≤ 2²³ − 1`.

Adversarial bitstreams (those with hand-crafted coefficients and
residuals that pass every rejection check in this section yet
produce arithmetic results outside the sample-magnitude contract)
**may** produce output samples of any `i32` value, including values
that exceed `2²³ − 1`. The decoder **must not** panic or reject on
this basis: the wrapping-add semantics of §3.6 are precisely what
makes every bit sequence produce a defined output, which is the
ground of the "no partial state propagates" contract at the top of
this section.

Callers that re-feed decoder output into LAC's encoder (for example,
an MCU decode → PCM mix → re-encode pipeline) **should** validate or
clamp to the input magnitude contract before re-encoding. A
compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1`
and is not required to re-validate.
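
One possible form of the clamp recommended above, as a sketch. The constant and function names are illustrative, and clamping (rather than rejecting the buffer) is only one of the two options the text allows.

```rust
/// Largest magnitude allowed by the input contract of §1: 2^23 - 1.
pub const MAX_SAMPLE: i32 = (1 << 23) - 1;

/// Clamp samples in place to the symmetric range [-MAX_SAMPLE, MAX_SAMPLE].
/// Returns how many samples were out of contract, for telemetry.
pub fn clamp_to_contract(samples: &mut [i32]) -> usize {
    let mut clamped = 0;
    for s in samples.iter_mut() {
        let c = (*s).clamp(-MAX_SAMPLE, MAX_SAMPLE);
        if c != *s {
            *s = c;
            clamped += 1;
        }
    }
    clamped
}
```

The range is symmetric because the contract bounds the magnitude: `−2²³` itself is out of contract even though it fits in 24 bits.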

## 7. Encoder Guidance (non-normative)

The reference encoder's search has three phases:

```text
# Phase 0: all-zero short-circuit
R[0] = Σ sample[i]²
if R[0] == 0:
    emit frame with prediction_order = 0, any legal partition_order (§5.1)
    return

# Phase 1: sparse LPC order grid with stop-when-stale early-out
for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
    coeffs_q31    = levinson_durbin(samples, prediction_order)   # cached
    shift         = smallest s such that every |coeff_real| < 2^s
    coeffs_stored = quantize(coeffs_q31, to: Q(15 - shift))
    residuals     = compute_residuals(samples, coeffs_stored, shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(prediction_order) + rice_bits
        track minimum over (prediction_order, partition_order)
    if no improvement for 2 consecutive grid entries: break

# Phase 2: fixed-predictor post-pass
for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
    residuals = compute_residuals(samples, fp_coeffs, fp_shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(fp_order) + rice_bits
        track minimum over (fp_order, partition_order)

emit frame with the (order, partition_order) that minimised `total`
```

The sparse grid + early-out is a speed/compression trade-off; a
compliant encoder may still exhaustively search every integer order
`0..=32` for marginal gains at higher cost. The produced bitstreams
are interchangeable.

The fixed-predictor post-pass tries FLAC-style integer predictors
(orders 1-4 with a small static coefficient table) after the LPC
grid. These evaluate quickly and occasionally beat the Levinson-Durbin
winner on content where a low-order integer polynomial fits
better than the statistically-optimal LPC fit: silent-plus-DC, very
smooth tones, polynomial-ish sensor data. Running them second avoids
tripping the stop-when-stale heuristic in the LPC phase.

The reference encoder's `FIXED_PREDICTORS` table, materialising the
FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer
predictors at the smallest `coefficient_shift` that represents each
coefficient without clamping:

| `prediction_order` | `lpc_coefficients` (Q-format integers) | `coefficient_shift` | Real-value interpretation |
|-------------------:|----------------------------------------|--------------------:|---------------------------|
| 1                  | `[16384]`                              | 1 (Q14)             | `[1]`                     |
| 2                  | `[16384, −8192]`                       | 2 (Q13)             | `[2, −1]`                 |
| 3                  | `[24576, −24576, 8192]`                | 2 (Q13)             | `[3, −3, 1]`              |
| 4                  | `[16384, −24576, 16384, −4096]`        | 3 (Q12)             | `[4, −6, 4, −1]`          |

These are the exact wire-format coefficient values a second-team encoder
would emit to match the reference's fixed-predictor outputs bit-for-bit.
Compliant encoders MAY use a different set (or none), since §3.6 treats the
coefficient field as opaque: decoders apply the synthesis formula
identically regardless of source.

The `R[0] == 0` short-circuit is both a correctness requirement (§5.1:
Levinson-Durbin is undefined at zero autocorrelation) and an
encoder-cost optimisation: on digital silence, the sparse grid and
fixed-predictor pass produce identical zero residuals and order 0
wins on header size alone.

Levinson-Durbin runs once to order 32 with all intermediate orders saved
into a flat buffer (one recursion pass yields all orders 1..32 at
`O(order²)` cost), so the outer loop fetch is free and order selection
is effectively `O(orders_tried × N)`.

`shift` is determined per order by the coefficient magnitudes; there
is no shift search, as smaller shifts are always at least as good as
larger ones when they don't clamp (saturation, §3.4, is the
fallback). Rice cost at a given `partition_order` is exact and
closed-form given the per-partition `k`, so the inner search
introduces no estimation error.
|
||||||
|
|
||||||
|
#### Levinson-Durbin numerical choices (reference, non-normative)
|
||||||
|
|
||||||
|
The reference encoder runs Levinson-Durbin with i64 autocorrelation
accumulators, Q31 working coefficients, and widens to i128 for
reflection-coefficient intermediates at orders where Q31 would lose
precision (typically above order ~12). Rounding on the Q31→Q(15−shift)
quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`,
with `bias = 1 << (14 + shift)` — the direct analogue of the synthesis
formula's rounding (§3.6), chosen so analysis and synthesis agree on
tie-break direction.

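The tie-break direction can be checked in isolation. The following is a generic sketch of a biased arithmetic right shift, not the reference's code; `round_half_up_shift` is a hypothetical name, and the bias is taken as half of the discarded range.

```rust
/// Round-half-up right shift: add half of the discarded range, then
/// floor via arithmetic shift. Exact ties move toward +infinity for
/// both positive and negative inputs.
/// Assumption: 1 <= shift_amt <= 62 and `value + bias` does not overflow.
pub fn round_half_up_shift(value: i64, shift_amt: u32) -> i64 {
    let bias = 1i64 << (shift_amt - 1);
    (value + bias) >> shift_amt
}
```

For example, `3 / 2 = 1.5` rounds to `2` and `−3 / 2 = −1.5` rounds to `−1`, so a tie never rounds away from zero on the negative side.
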
None of these choices are normative. Two encoders making different
precision or rounding choices will produce different coefficient
bytes on the same input, but both bitstreams decode correctly under
§3.6 so long as the coefficients they emit faithfully represent their
own LPC decision. §3.6's "wire format doesn't distinguish coefficients
by derivation" clause is exactly what permits this freedom.

#### Rice `k` selection (reference, non-normative)

Cost is closed-form for a given `(partition, k)`:
`bit_cost(k) = N · (1 + k) + Σ (v >> k)`, where `N` is the number of
residuals in the partition and `v` ranges over the partition's
zigzag-mapped residuals. Exhaustive search over `k ∈ [0, 23]` is
always acceptable and is the simplest compliant choice.

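The cost formula and the exhaustive search translate directly into code. A minimal sketch, assuming the residuals have already been zigzag-mapped to `u32`; the function names are hypothetical.

```rust
/// Exact Rice bit cost of one partition at parameter `k`:
/// N * (1 + k) + sum(v >> k), with `v` already zigzag-mapped.
pub fn bit_cost(zigzag: &[u32], k: u32) -> u64 {
    let n = zigzag.len() as u64;
    let quotient_bits: u64 = zigzag.iter().map(|&v| (v >> k) as u64).sum();
    n * (1 + k as u64) + quotient_bits
}

/// Exhaustive search over k in [0, 23]. The strict `<` keeps the first
/// (smallest) k when several values tie on cost.
pub fn best_k_exhaustive(zigzag: &[u32]) -> (u32, u64) {
    let mut best = (0u32, bit_cost(zigzag, 0));
    for k in 1..=23 {
        let cost = bit_cost(zigzag, k);
        if cost < best.1 {
            best = (k, cost);
        }
    }
    best
}
```

For the partition `[8, 9, 10, 11]` the costs at `k = 2, 3, 4` are all 20 bits, and the search returns `k = 2`.
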
The reference encoder uses convex descent: seed
`k_seed = ⌊log₂(mean(v))⌋`, then walk in either direction. On descent,
ties break **toward smaller `k`** (condition `cost ≤ best_cost`, so equal
costs at `k−1` overwrite `k`). On ascent, a larger `k` is accepted
**only on strictly lower cost** (condition `cost < best_cost`, so equal
costs at `k+1` don't override `k`). This yields a unique `k` per
partition matching an exhaustive search's first-wins tie-break.

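The walk described above can be sketched as follows. This is an illustration of the stated conditions, not the reference source; the names, the clamp of the seed to 23, and the handling of an empty or all-zero partition are assumptions.

```rust
/// Rice cost of a partition at `k`; `v` holds zigzag-mapped residuals.
pub fn partition_cost(v: &[u32], k: u32) -> u64 {
    v.len() as u64 * (1 + k as u64) + v.iter().map(|&x| (x >> k) as u64).sum::<u64>()
}

/// Convex descent from k_seed = floor(log2(mean(v))).
pub fn best_k_convex(v: &[u32]) -> u32 {
    if v.is_empty() {
        return 0; // assumption: nothing to code, any k would do
    }
    let mean = v.iter().map(|&x| x as u64).sum::<u64>() / v.len() as u64;
    // floor(log2(mean)); a zero mean seeds at k = 0. Clamp to the k range.
    let seed = if mean == 0 { 0 } else { (63 - mean.leading_zeros()).min(23) };
    let mut k = seed;
    let mut best = partition_cost(v, k);
    // Descent: `<=` lets an equal cost at k - 1 replace k.
    while k > 0 {
        let c = partition_cost(v, k - 1);
        if c <= best {
            k -= 1;
            best = c;
        } else {
            break;
        }
    }
    // Ascent: strict `<` so an equal cost at k + 1 never replaces k.
    while k < 23 {
        let c = partition_cost(v, k + 1);
        if c < best {
            k += 1;
            best = c;
        } else {
            break;
        }
    }
    k
}
```

The two walks find the global minimum because the cost is convex in `k`: the `N · (1 + k)` term is linear and each `v >> k` term has non-decreasing successive differences.
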
On the reference corpus, the sparse grid's compression matches
exhaustive-search output within ~0.2 percentage points (measured);
the differential test caps the acceptable excess at 0.5%.
Implementations that prefer tighter compression at higher encode
cost can extend the grid without wire-format consequences — decoders
do not care which orders an encoder tried.

### 7.1 Frame size (non-normative)

The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger
frames amortise the 7-byte fixed header and the LPC coefficient vector
over more samples, and generally compress better. Smaller frames give
tighter latency and finer partition-order granularity on transient
content.

Recommended defaults by use case:

- **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at
  16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used
  by Opus / WebRTC.
- **Real-time full-band** (game audio, music conferencing): 1024-2048
  at 48 kHz (21-43 ms).
- **Offline / archival**: 4096-8192 at 48 kHz; compression gains
  flatten past this.

Power-of-two frame sizes of at least 128 samples expose every
`partition_order ∈ [0, 7]` to the encoder's search. Other sizes restrict
the search to partition counts that divide the frame evenly; an odd
frame size (any prime above 2, for instance) forces
`partition_order = 0`. Encoders SHOULD prefer frame sizes divisible by a
large power of two when free to choose.

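The divisibility rule can be made concrete with a small helper. This sketch assumes that a frame is split into `2^partition_order` equal partitions and that even division is the only constraint on the search; both function names are hypothetical.

```rust
/// Largest partition_order (capped at 7) whose partition count
/// 2^partition_order divides the frame evenly.
/// Assumption: even division is the only constraint on the search.
pub fn max_partition_order(frame_sample_count: u16) -> u32 {
    assert!(frame_sample_count >= 1, "frame_sample_count must be in [1, 65535]");
    frame_sample_count.trailing_zeros().min(7)
}

/// Frame period in milliseconds for a given sample rate.
pub fn frame_period_ms(frame_sample_count: u16, sample_rate_hz: u32) -> f64 {
    frame_sample_count as f64 * 1000.0 / sample_rate_hz as f64
}
```

Under these assumptions a 960-sample frame (`2⁶ · 15`) reaches `partition_order = 6`, a 4096-sample frame reaches the cap of 7, and a 997-sample frame is limited to 0.
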
## 8. Versioning

This document specifies **LAC version 1**, identified on the wire by
`sync_word = 0x1ACC`. No per-frame version byte is carried; the sync
word uniquely identifies the wire format.

Future revisions of the format **must** use a distinct `sync_word`. The
recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on
inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making
casual grep / hex-dump inspection robust. A revision whose wire format
cannot be made compatible with v1 at the header level **must** pick a
sync word outside this cluster.

This approach is exhaustive by construction:

- A v1 decoder that encounters a v2 frame sees an unrecognised sync
  word on the first check (§3.1) and rejects cleanly — the same error
  path as foreign or corrupted payloads.
- A v2-aware decoder dispatches on the sync word before reading any
  further field, so it can fall back to v1 parsing when appropriate
  or decode v2 frames natively.

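The dispatch step might look like the sketch below. It assumes the sync word is a 16-bit big-endian field at byte offset 0 of the frame; the type and function names are hypothetical, not the reference decoder's.

```rust
/// LAC version 1 sync word.
pub const SYNC_V1: u16 = 0x1ACC;

/// Hypothetical result types for this sketch.
#[derive(Debug, PartialEq)]
pub enum WireVersion {
    V1,
}

#[derive(Debug, PartialEq)]
pub enum SyncError {
    Truncated,
    UnknownSyncWord(u16),
}

/// Decide how to parse a frame by looking at the sync word only.
/// Assumption: the sync word occupies the first two bytes, big-endian.
pub fn dispatch_on_sync(frame: &[u8]) -> Result<WireVersion, SyncError> {
    let head = frame.get(..2).ok_or(SyncError::Truncated)?;
    match u16::from_be_bytes([head[0], head[1]]) {
        SYNC_V1 => Ok(WireVersion::V1),
        // A v2-aware decoder would add an arm for its own sync word here.
        other => Err(SyncError::UnknownSyncWord(other)),
    }
}
```

A v1-only decoder takes the same `UnknownSyncWord` path for a future `0x1ACD` frame as for foreign or corrupted data.
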
Because every field in §3 has a strict range with no reserved
high-bit patterns, in-place extension (flag bits inside existing
fields) is **not** a supported evolution path. New features go into a
new `sync_word`, not into reinterpreting existing field values.

Transports that multiplex LAC frames with other formats should frame
each LAC payload explicitly (length prefix or stream separator); the
sync word alone is not a framing delimiter, only a format identifier.

## 9. Implementation notes (non-normative)

### 9.1 GPU offload is out of scope

LAC is a scalar integer codec. The reference implementation, and any
conforming implementation this document anticipates, runs on a CPU.
GPU offload is deliberately not a goal:

- **Levinson-Durbin** is serial by construction (each iteration depends
  on the previous) and its intermediate accumulator needs more than 64
  bits of precision at higher orders — a poor fit for WGSL or SPIR-V
  compute shaders, which have no native 128-bit integer arithmetic.
- **Rice decode** uses a data-dependent unary run for every residual;
  on GPU execution models this diverges warps badly and its
  sequential bit-cursor progression fights SIMD lane packing.
- **LPC synthesis** has a tight per-sample feedback loop (sample `i`
  depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is
  inherently serial.
- **The one plausibly GPU-parallel phase** — residual computation
  inside the encoder's order search — is also the phase where the
  CPU's autovectorized implementation is already well-served by SIMD
  on any modern target. At the measured encode latencies (P99 under
  50 µs on x86 for a 20 ms frame period, >400× headroom), there is no
  motivation to offload it.

A hypothetical future revision whose hot path genuinely benefited from
GPU execution (large-batch archival encoding across many channels at
once, for instance) would need to change the wire format to carry
enough shape metadata for batched kernels — i.e., a new sync word
under the versioning rules in §8, not a retrofit.

## 10. Conformance test vectors

The reference repository's `tests/conformance.rs` holds the canonical
test-vector set for this specification:

- **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte
  level. A conformant decoder **must** produce the `samples` array
  when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a
  second-team encoder's bytes for the same samples may differ; the
  decoder direction is the normative one. Coverage includes the
  smallest valid frames (single-sample verbatim, 4- and 8-sample
  silence), single-sample polarity boundaries (±1, ±(2²³−1)), DC
  offset, alternating-polarity Nyquist-like content, smooth polynomial
  (fixed-predictor territory), and a 16-sample growing-amplitude
  pattern exercising partition search.
- **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to
  their expected rejection variants. Covers every class in §6 (1-10):
  bad sync, each field-range violation, verbatim + non-zero shift,
  `frame_sample_count == 0`, non-divisible partition count, header /
  coefficient / Rice-bitstream truncation, per-partition `k > 23`.
- **`reject_unary_run_above_cap`** — a programmatic test for §6 class
  10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75
  bytes of mostly zeros; construction logic is in the test, not a
  const fixture.

Second-team implementations should port the decode fixtures
byte-for-byte and the reject fixtures byte-and-variant-for-variant.
`encode_matches_fixtures` in the same file is reference-specific (it
asserts the reference encoder's exact bytes) and is **not** a
conformance requirement — see §3.6's encoder-latitude clause.

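A ported decode-fixture check could be structured like the sketch below. Everything here is hypothetical scaffolding: the `DecodeFixture` struct (including its `name` field), the `failing_fixtures` function, and the decoder signature are assumptions for illustration, and the layout of the fixtures in `tests/conformance.rs` may differ.

```rust
/// One decode fixture: the exact frame bytes and the samples they must
/// decode to. The `name` field is an addition for readable reports.
pub struct DecodeFixture<'a> {
    pub name: &'a str,
    pub samples: &'a [i32],
    pub bytes: &'a [u8],
}

/// Run every fixture through `decode` and collect the names that fail.
/// `decode` stands in for the implementation under test; a decode error
/// or any sample mismatch counts as a failure.
pub fn failing_fixtures<'a, E>(
    fixtures: &[DecodeFixture<'a>],
    decode: impl Fn(&[u8]) -> Result<Vec<i32>, E>,
) -> Vec<&'a str> {
    fixtures
        .iter()
        .filter(|f| match decode(f.bytes) {
            Ok(samples) => samples.as_slice() != f.samples,
            Err(_) => true,
        })
        .map(|f| f.name)
        .collect()
}
```

A conformant decoder yields an empty list for the whole decode set. Only this decode direction is normative; the encoder is not required to reproduce `bytes`.
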
### 10.1 Reference encoder exemplars (non-normative)

The same `(samples, bytes)` pairs, read in the encoder direction,
serve as **reference-encoder validation targets** for implementations
that want to match this project's reference byte-for-byte — a common
goal for porting work even though the spec does not require it. The
fixture set is deliberately chosen to pin every encoder-discretion
axis from §7:

- `single_zero`, `single_pos_one`, `single_neg_one` — single-sample
  frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime
  where the encoder's order choice is nearly arbitrary; pinning the
  bytes fixes this project's choice (order 0 with the minimum-cost
  Rice encoding).
- `silence_4`, `silence_8` — force `partition_order` tie-breaks on an
  all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces
  identical cost; the reference picks the smallest via its
  convex-descent tie-break).
- `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim
  decision. DC content favours a low-order LPC fit with small
  residuals; alternating content favours order 1 with `a = −1`
  (approximated at the closest Q-format). Pinning the bytes fixes
  the reference's decision boundary.
- `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude
  single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both
  sides and fix the zigzag-of-extremum output.
- `linear_ramp_8` — smooth polynomial content, fixed-predictor
  territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
- `lfsr_noise_16` — exercises partition search on a frame large
  enough for `partition_order > 0` to be competitive.

A second-team encoder that produces the same bytes for every entry
here is **likely** (not guaranteed) to produce matching bytes on
wider inputs, since the tie-break axes are the ones most sensitive
to encoder discretion. An encoder that produces different bytes is
still compliant so long as its own bytes round-trip — see §3.6, §7.