lac/Specification.md
Kamal Tufekcic 7862cb1d9d
initial commit
Signed-off-by: Kamal Tufekcic <kamal@lo.sh>
2026-04-23 14:58:32 +03:00


LAC Wire Format

Normative specification of the LAC bitstream. This document is the authority on byte layout, field semantics, and encoder/decoder constraints.

1. Conventions

  • All multi-byte integer fields are big-endian.
  • Bit streams are MSB-first: the first bit written occupies bit 7 of its byte, subsequent bits fill lower positions, and a new byte begins once eight bits have been emitted.
  • Samples are signed integers passed as i32 with magnitude bounded by |sample| ≤ 2²³ − 1. The upper 9 bits of each i32 must be a consistent sign extension of the 24-bit-magnitude value. Narrower source formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly, since they trivially satisfy the magnitude bound, and compress at the bit cost of their actual values, not a 24-bit ceiling. The codec does not carry bit-depth metadata; the container or application layer is responsible for remembering the source format.
  • Sample rate is not part of the bitstream. The container or transport carries it.
  • A frame encodes one channel. Stereo is two independent streams of frames, one per channel.
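The MSB-first packing rule can be illustrated with a minimal bit writer. This is a sketch only: the `BitWriter` type and its method names are hypothetical and are not taken from the reference implementation.

```rust
/// Hypothetical MSB-first bit writer; name and API are illustrative only.
pub struct BitWriter {
    bytes: Vec<u8>,
    bit_len: usize, // total number of bits written so far
}

impl BitWriter {
    pub fn new() -> Self {
        BitWriter { bytes: Vec::new(), bit_len: 0 }
    }

    /// Append one bit. The first bit of each byte lands in bit 7.
    pub fn push_bit(&mut self, bit: bool) {
        let pos = self.bit_len % 8;
        if pos == 0 {
            self.bytes.push(0); // a new byte begins after every eight bits
        }
        if bit {
            let last = self.bytes.len() - 1;
            self.bytes[last] |= 0x80 >> pos;
        }
        self.bit_len += 1;
    }

    /// Append the low `count` bits of `value`, most significant bit first.
    pub fn push_bits(&mut self, value: u32, count: u32) {
        for i in (0..count).rev() {
            self.push_bit((value >> i) & 1 == 1);
        }
    }

    /// Unused low bits of the final byte are already zero.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}
```

Writing the three bits 1, 0, 1 yields the single byte 0b1010_0000: the first bit occupies bit 7 and the unused low bits stay zero.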

2. Frame Layout

A frame is a contiguous byte sequence:

+--------+--------------------+
| header | rice_bitstream     |
+--------+--------------------+

The header is fixed-structure (variable length because the coefficient array depends on prediction_order). The rice_bitstream is a bit-packed payload; its byte length is ceil(total_rice_bits / 8) with zero padding in the low bits of the last byte.

Decoder input is the complete frame. There is no intra-frame continuation or fragmentation — the transport layer handles that.

3. Frame Header

 Offset  Size  Field                 Type     Constraint
 ------  ----  --------------------  -------  ----------------------------
   0-1   2     sync_word             u16 BE   == 0x1ACC
   2     1     prediction_order      u8       ∈ [0, 32]
   3     1     partition_order       u8       ∈ [0, 7]
   4     1     coefficient_shift     u8       ∈ [0, 5]
   5-6   2     frame_sample_count    u16 BE   ≥ 1, % (1 << partition_order) == 0
   7+    2·p   lpc_coefficients      i16 BE[] length = prediction_order = p

Total header length: 7 + 2 · prediction_order bytes.

3.1 sync_word

Fixed value 0x1ACC. Present to identify a LAC frame on lightly framed transports and to reject foreign payloads at the first check. Decoders must reject any frame whose first two bytes are not 0x1ACC.

3.2 prediction_order

Integer order of the LPC analysis filter used to produce residuals.

  • Value 0 is verbatim mode: no prediction, residuals equal the samples, and the lpc_coefficients array is empty (zero bytes).
  • Values 1 through 32 are standard LPC orders; lpc_coefficients carries exactly that many predictor coefficients, interpreted in the Q-format determined by coefficient_shift (§3.4).

Decoders must reject values above 32.

3.3 partition_order

Controls how the residual stream is split for Rice coding.

  • partition_count = 1 << partition_order.
  • The residual stream is divided into partition_count equal partitions of frame_sample_count / partition_count samples each.

Decoders must reject values above 7 and must reject frames where frame_sample_count is not a multiple of partition_count.

3.4 coefficient_shift

Controls the fixed-point scale of the stored Q-format LPC predictor coefficients. Coefficients are stored as 16-bit integers interpreted as Q(15 − coefficient_shift):

 shift  Q-format  Real-value range  Use case
 -----  --------  ----------------  -------------------------------------------------
   0    Q15       [−1, 1)           Coefficients with magnitude < 1 (most orders > 1)
   1    Q14       [−2, 2)           Low-frequency content, a[1] near 2
   2    Q13       [−4, 4)           Extreme bass / narrow resonances
   3    Q12       [−8, 8)           Pathological transients
   4    Q11       [−16, 16)         Reserved for synthetic signals
   5    Q10       [−32, 32)         Upper bound; decoder rejects larger values

The encoder must select the smallest coefficient_shift at which no coefficient's real value exceeds the representable range for that shift — i.e., the smallest scale that does not clamp. Smaller shifts give finer precision and thus smaller residuals when no clamping is required.

If no coefficient_shift ∈ [0, 5] suffices (the real coefficient magnitude exceeds the Q10 range at shift = 5), the encoder saturates each offending coefficient independently to the i16 range [−32768, 32767]. Bit-exact round-trip is preserved because the decoder applies the synthesis formula to whatever 16-bit values the wire carries; the cost of saturation is compression: the predictor no longer matches the encoder's ideal coefficients, so residuals grow. Real audio at the input-magnitude contract (§1) rarely reaches this case; synthetic or adversarial inputs can force it.

Decoders must reject values above 5. The shift applies uniformly to every coefficient in lpc_coefficients; there is no per-coefficient scale.

When prediction_order == 0 (verbatim frame), coefficient_shift must be 0. The shift only modifies how stored coefficients are interpreted, and a verbatim frame stores none. Decoders must reject frames with prediction_order == 0 and coefficient_shift != 0 as malformed; this rule closes the space of legal but meaningless headers so two implementations agree bit-for-bit on which inputs round-trip.
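As an illustration of the shift-selection rule, the sketch below quantises real-valued coefficients at the smallest shift that needs no clamping and saturates at shift 5 otherwise. The function name, the f64 inputs, and the choice to apply the range check after rounding are assumptions of this sketch; the reference encoder works from Q31 integers (§7) and may differ in detail.

```rust
/// Quantise real-valued predictor coefficients to Q(15 - shift) at the
/// smallest shift in 0..=5 that needs no clamping; saturate at shift 5
/// otherwise. Returns (coefficient_shift, stored i16 values).
/// Name and f64 inputs are assumptions made for illustration.
pub fn quantize_coefficients(real: &[f64]) -> (u8, Vec<i16>) {
    for shift in 0u8..=5 {
        let scale = f64::from(1u32 << (15 - shift));
        // Round half up, then check that every value fits in i16 unclamped.
        let scaled: Vec<f64> = real.iter().map(|c| (c * scale + 0.5).floor()).collect();
        if scaled.iter().all(|v| (-32768.0..=32767.0).contains(v)) {
            return (shift, scaled.iter().map(|&v| v as i16).collect());
        }
    }
    // No shift suffices: saturate each offending coefficient independently.
    let scale = f64::from(1u32 << 10);
    let stored = real
        .iter()
        .map(|c| (c * scale + 0.5).floor().clamp(-32768.0, 32767.0) as i16)
        .collect();
    (5, stored)
}
```

For the real coefficients [2, −1] this selects shift 2 (Q13) and stores [16384, −8192]; an empty coefficient list (verbatim frame) yields shift 0, consistent with the rule above.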

3.5 frame_sample_count

Number of audio samples produced by this frame (also the number of residuals in the Rice bitstream). The value must be in [1, 65535].

Decoders must reject frame_sample_count == 0: a zero-sample frame trivially satisfies the partition-divisibility check below (0 mod n == 0 for any n) but carries no audio and has no legal Rice payload.

For compliance with partition_order, the value must satisfy frame_sample_count mod (1 << partition_order) == 0. Decoders must reject frames where this does not hold.
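The field constraints of §3.1 through §3.5 can be checked in a single pass over the seven fixed header bytes. The sketch below is a minimal illustration: the `HeaderError` and `FixedHeader` types, the function name, and the order in which the checks run are assumptions, not the reference API.

```rust
/// Hypothetical error and header types; the reference API may differ.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    Truncated,
    BadSync,
    PredictionOrder,
    PartitionOrder,
    CoefficientShift,
    VerbatimShift,
    ZeroSamples,
    NotDivisible,
}

#[derive(Debug, PartialEq)]
pub struct FixedHeader {
    pub prediction_order: u8,
    pub partition_order: u8,
    pub coefficient_shift: u8,
    pub frame_sample_count: u16,
}

/// Validate the seven fixed header bytes and the coefficient array length.
pub fn parse_fixed_header(data: &[u8]) -> Result<FixedHeader, HeaderError> {
    if data.len() < 7 {
        return Err(HeaderError::Truncated);
    }
    if u16::from_be_bytes([data[0], data[1]]) != 0x1ACC {
        return Err(HeaderError::BadSync);
    }
    let (prediction_order, partition_order, coefficient_shift) = (data[2], data[3], data[4]);
    let frame_sample_count = u16::from_be_bytes([data[5], data[6]]);
    if prediction_order > 32 {
        return Err(HeaderError::PredictionOrder);
    }
    if partition_order > 7 {
        return Err(HeaderError::PartitionOrder);
    }
    if coefficient_shift > 5 {
        return Err(HeaderError::CoefficientShift);
    }
    if prediction_order == 0 && coefficient_shift != 0 {
        return Err(HeaderError::VerbatimShift);
    }
    if frame_sample_count == 0 {
        return Err(HeaderError::ZeroSamples);
    }
    if frame_sample_count % (1u16 << partition_order) != 0 {
        return Err(HeaderError::NotDivisible);
    }
    // The coefficient array (2 bytes per order) must also be present.
    if data.len() < 7 + 2 * usize::from(prediction_order) {
        return Err(HeaderError::Truncated);
    }
    Ok(FixedHeader { prediction_order, partition_order, coefficient_shift, frame_sample_count })
}
```

The zero-sample check runs before the divisibility check because 0 is divisible by every partition count and would otherwise slip through.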

3.6 lpc_coefficients

Array of prediction_order predictor coefficients, each a 16-bit big-endian signed integer interpreted in Q(15 − coefficient_shift) format; see §3.4 for the shift semantics.

The wire format does not distinguish coefficients by derivation. The synthesis formula below applies identically whether the encoder obtained the values from Levinson-Durbin analysis, from a fixed coefficient template (e.g. FLAC-style integer predictors), from a trained model, or from any other strategy. What goes on the wire is just prediction_order 16-bit integers; how the encoder chose them is encoder-internal and not observable to the decoder.

Synthesis formula (applied in the decoder):

s           = 15 − coefficient_shift
bias        = 1 << (s − 1)
predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
sample[i]  = residual[i] + predict[i]

where terms = min(i, prediction_order). The + bias term implements round-to-nearest for the right shift and is required for bit-exact decoding. For the default coefficient_shift = 0: s = 15, bias = 16384.

The >> s operator must be an arithmetic right shift on signed integers — equivalent to floor division by 2^s. Combined with the + bias pre-add, this implements round-half toward +∞: on a value whose scaled form is exactly k + 0.5, the result is k + 1 for both positive and negative k. Implementations using truncating integer division (C's / on signed integers, which rounds toward zero) will diverge from this on any sum + bias that is negative and not evenly divisible by 2^s: arithmetic shift rounds further from zero, truncating division rounds toward zero. Concrete example: at s = 15, sum = -32769, bias = 16384, arithmetic shift gives (-16385) >> 15 = -1, truncating division gives -16385 / 32768 = 0. Decoders in languages whose native integer division does not floor must emulate arithmetic right shift explicitly on the signed accumulator.
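The difference between the two rounding behaviours can be checked directly. In the sketch below both function names are hypothetical; the second function exists only to show the divergent result and must not be used in a decoder.

```rust
/// Round-half-toward-+infinity division by 2^s, as the synthesis formula
/// requires. In Rust, `>>` on i64 is an arithmetic (sign-extending) shift.
pub fn round_shift(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) >> s
}

/// The divergent variant, for comparison only: `/` truncates toward zero.
pub fn round_div_truncating(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) / (1i64 << s)
}
```

With s = 15 and sum = -32769, `round_shift` returns -1 while `round_div_truncating` returns 0, matching the concrete example above.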

Accumulator width

The inner sum Σ coeff[j] · sample[i − j − 1] must be computed in a signed integer accumulator of at least 49 bits (equivalently: an i64 or wider). Worst-case bounds at prediction_order = 32, coefficient_shift = 5 (Q10), and full-scale samples give a product of magnitude (2¹⁵) · (2²³ − 1) ≈ 2³⁸ per term, summed over 32 terms for a maximum of ~2⁴³. Adding the bias keeps the result below 2⁴⁴. A 32-bit accumulator is far too narrow: a single worst-case term (≈ 2³⁸) already exceeds the i32 range, so implementations that reach for int32_t because samples are 32-bit will silently corrupt frames.

JavaScript / TypeScript implementers should note that Number is an IEEE 754 double, not a signed integer: its 53-bit safe-integer range covers in-contract accumulator values, but adversarial bitstreams (see §6.2) can produce out-of-contract samples whose synthesis arithmetic lands in the 2⁴⁹ to 2⁵¹ range and beyond, where Number silently loses low bits to float rounding. For bit-exact spec compliance in JS/TS, use BigInt for the accumulator: it has the integer semantics the spec requires; Number does not.

Warm-up (terms == 0)

When i == 0, terms = min(0, prediction_order) = 0. The sum is empty and predict[0] = 0 — the (0 + bias) >> s formula is not applied. Stating this explicitly avoids an implementation that mechanically applies the formula in the warm-up case and produces predict[0] = bias >> s, which is zero only in specific (bias, s) parametrisations and surprising in any other.

For 0 < i < prediction_order, the sum truncates to the available i predecessors (terms = i). The formula applies as stated.

Sign convention for stored coefficients

The synthesis formula uses the predictor convention: the prediction is the positive sum +Σ coeff[j] · sample[i − j − 1]. Classical Levinson-Durbin implementations that derive LPC from the error-prediction AR model

x[n] = −Σ a[j] · x[n-j] + e[n]      (error convention)

produce coefficients a[j] whose sign is the opposite of what the synthesis formula expects; those encoders must negate before quantisation so the wire value is coeff[j-1] = −a[j]. Implementations using the predictor convention

x̂[n] = +Σ c[j] · x[n-j]             (predictor convention)

store c[j] directly.

Both conventions are common in DSP literature. Encoders must verify that the coefficients emitted on the wire, when substituted into the synthesis formula above, reproduce the encoder's own prediction. The reference implementation uses the error convention and negates at quantisation time.

Overflow semantics of the final add

The residual[i] + predict[i] add is specified as a wrapping i32 add (two's complement, modulo 2³²; in languages without native signed-overflow semantics, compute (residual + predict) & 0xFFFFFFFF and then re-interpret as a signed 32-bit integer via sign-extension of bit 31). On well-formed bitstreams — those produced by a compliant encoder from in-contract samples (§1) — the result stays inside the sample-magnitude contract and the wrap is never observable. Adversarial bitstreams with crafted coefficients and residuals may produce any i32 value; the decoder must not panic, abort, or reject on the basis of this add's result. The consequences of out-of-contract decoder output are addressed in §6.2.
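Putting the pieces of this section together, a decoder-side synthesis loop might look like the following sketch. The function name and signature are hypothetical; the i64 accumulator, the warm-up special case, the arithmetic shift, and the wrapping add follow the rules stated above.

```rust
/// Reconstruct samples from residuals. `coeffs` are the stored i16 values and
/// `shift` is coefficient_shift (0..=5). The function name is hypothetical.
pub fn synthesize(residuals: &[i32], coeffs: &[i16], shift: u32) -> Vec<i32> {
    let s = 15 - shift;
    let bias = 1i64 << (s - 1);
    let mut samples: Vec<i32> = Vec::with_capacity(residuals.len());
    for (i, &residual) in residuals.iter().enumerate() {
        let terms = i.min(coeffs.len());
        let predict = if terms == 0 {
            0i64 // warm-up: the sum is empty and the formula is not applied
        } else {
            let mut acc = 0i64; // wide accumulator, never i32
            for j in 0..terms {
                acc += i64::from(coeffs[j]) * i64::from(samples[i - j - 1]);
            }
            (acc + bias) >> s // arithmetic shift on a signed value
        };
        // Wrapping i32 add: truncate the prediction to 32 bits, then wrap.
        samples.push(residual.wrapping_add(predict as i32));
    }
    samples
}
```

Feeding residuals [10, 0, 0, 0] with stored coefficients [16384, −8192] at shift 2 (real values [2, −1]) reconstructs the linear ramp [10, 20, 30, 40].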

4. Rice Bitstream

Immediately follows the header. Flat MSB-first bitstream structured as consecutive partition payloads:

+---------+---------+-----+-----------+
| part. 0 | part. 1 | ... | part. P-1 |
+---------+---------+-----+-----------+

where P = 1 << partition_order. Each partition has the same structure:

+-------+-------------+-------------+-----+-------------+
| k (5) | codeword 0  | codeword 1  | ... | codeword M-1|
+-------+-------------+-------------+-----+-------------+

where M = frame_sample_count / P is the per-partition residual count.

Partitions are bit-contiguous: the 5-bit k field of partition i + 1 begins at the bit immediately following the last codeword of partition i. There is no byte alignment between partitions. Only the final trailing padding described in §4.3 is byte-aligned.

Within a partition, the bit cursor likewise advances continuously: codeword 0 begins at the bit immediately following the 5-bit k field, codeword 1 immediately after codeword 0's remainder bit, and so on. A conformant decoder maintains a single bit-read position across the entire Rice bitstream, from the k of partition 0 through the last codeword of partition P − 1, and never realigns to a byte or bit boundary between fields. This is implicit in the byte-stream decoder design (a bit reader that consumes bits sequentially needs no special handling at field boundaries) but stated here so second-team implementations do not introduce a spurious alignment.

4.1 Per-Partition Parameter k

Five-bit unsigned integer, MSB-first, immediately before the partition's codewords. k is the Rice parameter for this partition and must be in [0, 23]. Decoders must reject values above 23 as malformed.

4.2 Codeword

Each residual is encoded by:

  1. Zigzag mapping from signed to unsigned:

    z = (r << 1) ^ (r >> 31)            interpreted as u32
    

    where (r >> 31) is an arithmetic right shift on the i32 residual, sign-extending the sign bit to all 32 bit positions — 0 for non-negative r, -1 (all ones) for negative r. The entire expression is masked to 32 bits before being interpreted as u32; in languages with arbitrary-precision integers (e.g. Python) or where native bitwise ops return signed 32-bit (e.g. JavaScript / TypeScript Number, where (x ^ y) >>> 0 coerces to u32), this mask is explicit (& 0xFFFFFFFF or >>> 0) and required — without it, negative residuals produce zigzag values with extra high bits set. Implementations in languages whose native right shift is logical on unsigned types must coerce r to a signed 32-bit type first; implementations in languages where >> on signed types is implementation-defined (e.g. pre-C++20 C/C++) must emulate the arithmetic shift explicitly.

    The r << 1 factor is always safe on i32. Residuals on the encoder side are bounded by |r| ≤ |sample| + |predict| ≤ 2·2²³ ≈ 2²⁴, so r << 1 has magnitude ≤ 2²⁵ and fits in i32 without overflow or undefined behaviour — even for r at the most negative value the encoder can ever produce from in-contract input (§1). Decoder-side code does not perform this shift; the inverse uses z >> 1 on a u32, which is always defined.

    The mapping sends

    {0, −1, 1, −2, 2, −3, 3, …}  →  {0, 1, 2, 3, 4, 5, 6, …}
    

    so small magnitudes of either sign map to small unsigned values.

    The decoder's inverse is

    r = ((z >> 1) as i32) ^ -((z & 1) as i32)
    

    where the first term is a logical right shift on the u32 and the second is the integer negation of the low bit (0, or −1 with all bits set). Stating this inverse explicitly removes any ambiguity about how an implementation must invert the zigzag.

  2. Rice code at parameter k:

    • Unary part: q = z >> k zero-bits followed by a single terminating 1-bit.
    • Remainder part: k bits of z & ((1 << k) − 1), MSB-first. (The remainder part is absent when k == 0.)

Total codeword length: q + 1 + k bits.
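The two encoding steps can be sketched as follows. The function names are hypothetical, and the codeword is returned as a string of '0' and '1' characters purely for readability; a real encoder would write into a bit buffer and would not build strings.

```rust
/// Zigzag-map a signed residual to u32 (small magnitudes map to small values).
/// In Rust, `>>` on i32 is an arithmetic shift and `<<` discards the top bit.
pub fn zigzag(r: i32) -> u32 {
    ((r << 1) ^ (r >> 31)) as u32
}

/// Inverse mapping used by the decoder.
pub fn unzigzag(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}

/// Rice-code one residual at parameter k (0..=23), as a '0'/'1' string.
/// Intended for small illustrative values only.
pub fn rice_codeword(r: i32, k: u32) -> String {
    let z = zigzag(r);
    let q = z >> k;
    let mut bits = "0".repeat(q as usize); // unary part: q zero bits
    bits.push('1'); // single terminating 1-bit
    for i in (0..k).rev() {
        // remainder: the low k bits of z, most significant first
        bits.push(if (z >> i) & 1 == 1 { '1' } else { '0' });
    }
    bits
}
```

For example, the residual −3 maps to z = 5; at k = 1 the quotient is 2 and the remainder bit is 1, giving the four-bit codeword 0011 (q + 1 + k = 4 bits).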

Decoder-side unary-run bound

Decoders must reject any codeword whose unary run length satisfies

q > (2³² − 1) >> k         (equivalently, q > u32::MAX >> k)

A valid codeword reconstructs z = (q << k) | remainder as a u32. q > u32::MAX >> k implies q << k ≥ 2³², which either overflows u32 silently (a critical decoder bug class — corrupt output with no error) or indicates a malformed stream. Either way the frame must be discarded with InvalidParameter or equivalent rejection class (§6). The bound varies with k: at k = 23 it is 511, at k = 0 it is u32::MAX (no practical constraint).

This rule also caps the CPU cost of unary scanning on adversarial input: without the cap, a decoder could be forced to scan an arbitrarily long run of zero bits before reaching either a 1 or the buffer end.
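A decoder-side sketch of one codeword read with the unary-run cap enforced is shown below. The `BitReader` and `RiceError` types and the `read_rice` function are hypothetical; the variant names Truncated and InvalidParameter are borrowed from the rejection classes this document mentions.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum RiceError {
    Truncated,
    InvalidParameter,
}

/// Minimal MSB-first bit reader over a byte slice.
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // absolute bit position, never realigned
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }

    pub fn read_bit(&mut self) -> Result<u32, RiceError> {
        let byte = *self.data.get(self.pos / 8).ok_or(RiceError::Truncated)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        Ok(u32::from(bit))
    }
}

/// Decode one zigzag value at Rice parameter k (0..=23), enforcing the cap.
pub fn read_rice(reader: &mut BitReader<'_>, k: u32) -> Result<u32, RiceError> {
    let cap = u32::MAX >> k;
    let mut q: u32 = 0;
    while reader.read_bit()? == 0 {
        if q == cap {
            // One more zero bit would make q exceed u32::MAX >> k.
            return Err(RiceError::InvalidParameter);
        }
        q += 1;
    }
    let mut remainder = 0u32;
    for _ in 0..k {
        remainder = (remainder << 1) | reader.read_bit()?;
    }
    Ok((q << k) | remainder) // cannot overflow because q <= cap
}
```

At k = 23 the cap is 511, so a run of 512 zero bits (64 zero bytes) is rejected before the scan reaches the end of a longer buffer.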

4.3 Byte Padding

After the last codeword of the last partition, any unused bits of the final byte are zero-padded on the LSB side. The encoder writes 0 for all padding bits; the decoder ignores them.

4.4 Bitstream Length

The Rice bitstream's total bit length is the sum of codeword bit lengths across every partition, plus 5 bits per partition for the k fields:

total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)

This depends on every residual's quotient and cannot be computed from the frame header alone. A decoder stream-decodes until it has produced exactly frame_sample_count samples, then stops. Any bits remaining inside the last byte are padding (§4.3) and carry no information.

A decoder must not require the Rice bitstream's byte length to be signalled out-of-band. The header plus the zero-padded byte-aligned tail fully determines the frame boundary; parse_header's bytes_consumed return plus streaming Rice decode locates the end of the frame in the input buffer.

5. Degenerate Cases

5.1 All-Zero Frame

For an all-zero sample vector, the encoder must use prediction_order = 0 because the Levinson-Durbin recursion is undefined at R[0] = 0. Residuals equal the input (all zeros).

Partition-order and per-partition k selection remain at the encoder's discretion; any legal (partition_order, k) combination produces a bit-exact-decodable frame. The minimum-cost choice — partition_order = 0, k = 0 — produces a Rice payload of exactly 5 + frame_sample_count bits. Compliant encoders are not required to pick this minimum.
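The minimum-cost choice can be written out byte for byte. The sketch below builds such a frame from the layout rules in §3 and §4; the function name is hypothetical, and the output is one legal encoding of silence, not necessarily the bytes the reference encoder emits.

```rust
/// Build the minimum-cost all-zero frame: verbatim header, partition_order 0,
/// k = 0, then one terminating 1-bit per sample. The name is hypothetical.
pub fn encode_silence(frame_sample_count: u16) -> Vec<u8> {
    assert!(frame_sample_count >= 1, "zero-sample frames are illegal");
    // sync_word, prediction_order = 0, partition_order = 0, coefficient_shift = 0
    let mut out = vec![0x1A, 0xCC, 0, 0, 0];
    out.extend_from_slice(&frame_sample_count.to_be_bytes());
    let total_bits = 5 + usize::from(frame_sample_count);
    let mut payload = vec![0u8; (total_bits + 7) / 8];
    // Bits 0..5 are the k field (all zero). Each following bit is the codeword
    // for a zero residual at k = 0: an empty unary run plus the terminator.
    for bit in 5..total_bits {
        payload[bit / 8] |= 0x80 >> (bit % 8);
    }
    out.extend_from_slice(&payload);
    out
}
```

For four samples the Rice payload is the nine bits 00000 1111, which pad out to the two bytes 0x07 0x80 after the seven-byte header.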

5.2 Single-Sample Frame

frame_sample_count = 1 is valid but forces partition_order = 0 (the only value that divides 1 evenly). The single sample is Rice-coded directly because no predecessors exist for any LPC order.

6. Error Recovery

Decoders must detect and reject every frame that violates the constraints elsewhere in this document. The exhaustive list of rejection classes, each of which is a distinct error condition so callers can distinguish them in telemetry, is:

  1. Sync word mismatch — bytes 0-1 differ from 0x1ACC (§3.1).
  2. prediction_order out of range — value > 32 (§3.2).
  3. partition_order out of range — value > 7 (§3.3).
  4. coefficient_shift out of range — value > 5 (§3.4).
  5. Verbatim frame with non-zero shift: prediction_order == 0 and coefficient_shift != 0 (§3.4).
  6. frame_sample_count == 0 (§3.5).
  7. frame_sample_count not divisible by partition_count (§3.3).
  8. Buffer truncated — fewer bytes than the header plus coefficient array requires, or fewer bits than the Rice bitstream demands during streaming decode. This class is intentionally coarse-grained: a single Truncated variant covers header truncation, missing k fields, mid-codeword exhaustion, and every other "buffer ends early" shape the decoder can encounter. Sub-categorising these provides no caller benefit — the recovery action (discard and substitute silence, see §6.1) is identical regardless of where truncation happened.
  9. Per-partition k out of range — value > 23 (§4.1).
  10. Unary-run cap exceeded — any codeword with q > u32::MAX >> k (§4.2).

On any of these, the decoder must discard the frame, produce no output samples, and signal the error to the caller. No partial state may propagate to the next frame's decode — frames are independent (§2), so subsequent frames decode cleanly regardless.

6.1 Caller-side silence substitution

On rejection, the caller substitutes frame_sample_count zeros (silence) for the frame period. The count is obtained as follows:

  • Post-header rejections (classes 8-10 above — Truncated in the Rice bitstream, InvalidParameter during Rice decode): the frame header parsed successfully before the failure, so the count is recoverable. The caller re-parses just the header on the same buffer (reference API: parse_header(data)) and reads frame_sample_count from the resulting AudioFrameHeader.
  • Pre-header rejections (classes 1-7 above): the header itself failed; the frame length is not recoverable from the bitstream. The caller must fall back to a session-level default frame size carried out-of-band by the container or transport (WebRTC and QUIC audio sessions typically negotiate this at session setup).

This asymmetry is inherent to the wire format: frame_sample_count lives inside the header at offset 5, so any rejection that happens while parsing bytes 0-4 precedes its discovery.

6.2 Decoder output magnitude

On well-formed bitstreams produced by a compliant encoder from in-contract samples (§1), decoder output satisfies |sample| ≤ 2²³ − 1.

Adversarial bitstreams (those with hand-crafted coefficients and residuals that pass every rejection check in this section yet produce arithmetic results outside the sample-magnitude contract) may produce output samples of any i32 value, including values whose magnitude exceeds 2²³ − 1. The decoder must not panic or reject on this basis: the wrapping-add semantics of §3.6 are precisely what makes every bit sequence produce a defined output, which is the ground of the "no partial state propagates" contract at the top of this section.

Callers that re-feed decoder output into LAC's encoder (for example, an MCU decode → PCM mix → re-encode pipeline) should validate or clamp to the input magnitude contract before re-encoding. A compliant encoder assumes its input satisfies |sample| ≤ 2²³ − 1 and is not required to re-validate.
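A caller-side clamp ahead of re-encoding could be as small as the sketch below. The constant and function names are hypothetical, and clamping (rather than rejecting the buffer) is just one of the two options mentioned above.

```rust
/// Largest legal sample magnitude under the input contract: 2^23 - 1.
pub const MAX_MAGNITUDE: i32 = (1 << 23) - 1;

/// Clamp decoder output in place so it satisfies the encoder's input contract.
pub fn clamp_to_contract(samples: &mut [i32]) {
    for s in samples.iter_mut() {
        *s = (*s).clamp(-MAX_MAGNITUDE, MAX_MAGNITUDE);
    }
}
```

In-contract samples pass through unchanged, so the clamp is a no-op on output decoded from well-formed bitstreams.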

7. Encoder Guidance (non-normative)

The reference encoder's search has three phases:

# Phase 0: all-zero short-circuit
R[0] = Σ sample[i]²
if R[0] == 0:
    emit frame with prediction_order = 0, any legal partition_order (§5.1)
    return

# Phase 1: sparse LPC order grid with stop-when-stale early-out
for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
    coeffs_q31     = levinson_durbin(samples, prediction_order)  # cached
    shift          = smallest s such that every |coeff_real| < 2^s
    coeffs_stored  = quantize(coeffs_q31, to: Q(15 - shift))
    residuals      = compute_residuals(samples, coeffs_stored, shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(prediction_order) + rice_bits
        track minimum over (prediction_order, partition_order)
    if no improvement for 2 consecutive grid entries: break

# Phase 2: fixed-predictor post-pass
for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
    residuals = compute_residuals(samples, fp_coeffs, fp_shift)
    for partition_order in 0..=7:
        if frame_sample_count % (1 << partition_order) != 0: continue
        rice_bits = estimate_cost(residuals, partition_order)
        total     = header_bits(fp_order) + rice_bits
        track minimum over (fp_order, partition_order)

emit frame with the (order, partition_order) that minimised `total`

The sparse grid + early-out is a speed/compression trade-off; a compliant encoder may still exhaustively search every integer order 0..=32 for marginal gains at higher cost. The produced bitstreams are interchangeable.

The fixed-predictor post-pass tries FLAC-style integer predictors (orders 1-4 with a small static coefficient table) after the LPC grid. These evaluate quickly and occasionally beat the Levinson-Durbin winner on content where a low-order integer polynomial fits better than the statistically-optimal LPC fit: silent-plus-DC, very smooth tones, polynomial-ish sensor data. Running them second avoids tripping the stop-when-stale heuristic in the LPC phase.

The reference encoder's FIXED_PREDICTORS table, materialising the FLAC-style [1], [2, −1], [3, −3, 1], [4, −6, 4, −1] integer predictors at the smallest coefficient_shift that represents each coefficient without clamping:

 prediction_order  lpc_coefficients (Q-format integers)  coefficient_shift  Real-value interpretation
 ----------------  ------------------------------------  -----------------  -------------------------
   1               [16384]                               1 (Q14)            [1]
   2               [16384, −8192]                        2 (Q13)            [2, −1]
   3               [24576, −24576, 8192]                 2 (Q13)            [3, −3, 1]
   4               [16384, −24576, 16384, −4096]         3 (Q12)            [4, −6, 4, −1]

These are the exact wire-format bytes a second-team encoder would emit to match the reference's fixed-predictor outputs bit-for-bit. Compliant encoders MAY use a different set (or none), since §3.6 treats the coefficient field as opaque — decoders apply the synthesis formula identically regardless of source.

The R[0] == 0 short-circuit is both a correctness requirement (§5.1 — Levinson-Durbin is undefined at zero autocorrelation) and an encoder-cost optimisation: on digital silence, the sparse grid and fixed-predictor pass produce identical zero residuals and order 0 wins on header size alone.

Levinson-Durbin runs once to order 32 with all intermediate orders saved into a flat buffer (one recursion pass yields all orders 1..32 at O(order²) cost), so the outer loop fetch is free and order selection is effectively O(orders_tried × N).

shift is determined per order by the coefficient magnitudes — there is no shift search, as smaller shifts are always at least as good as larger ones when they don't clamp (saturation, §3.4, is the fallback). Rice cost at a given partition_order is exact and closed-form given the per-partition k, so the inner search introduces no estimation error.

Levinson-Durbin numerical choices (reference, non-normative)

The reference encoder runs Levinson-Durbin with i64 autocorrelation accumulators, Q31 working coefficients, and widens to i128 for reflection-coefficient intermediates at orders where Q31 would lose precision (typically above order ~12). Rounding on the Q31→Q(15 − shift) quantisation step is round-half-up via (a_q31 + bias) >> shift_amt, with bias = 1 << (14 + shift), the direct analogue of the synthesis formula's rounding (§3.6), chosen so analysis and synthesis agree on tie-break direction.

None of these choices are normative. Two encoders making different precision or rounding choices will produce different coefficient bytes on the same input, but both bitstreams decode correctly under §3.6 so long as the coefficients they emit faithfully represent their own LPC decision. §3.6's "wire format doesn't distinguish coefficients by derivation" clause is exactly what permits this freedom.

Rice k selection (reference, non-normative)

Cost is closed-form for a given (partition, k): bit_cost(k) = N · (1 + k) + Σ (v >> k) where v ranges over the zigzag-mapped residuals of the partition. Exhaustive search over k ∈ [0, 23] is always acceptable and is the simplest compliant choice.
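The closed-form cost and the exhaustive search translate directly into code. In this sketch the function names are hypothetical, and the 5-bit k field is left out of the cost because it is the same for every candidate k.

```rust
/// Exact bit cost of coding the zigzag values `v` at parameter k
/// (the 5-bit k field itself is not included).
pub fn bit_cost(v: &[u32], k: u32) -> u64 {
    let n = v.len() as u64;
    n * u64::from(1 + k) + v.iter().map(|&x| u64::from(x >> k)).sum::<u64>()
}

/// Exhaustive search over k in 0..=23; the smallest k wins ties.
pub fn best_k(v: &[u32]) -> u32 {
    let mut best = 0;
    let mut best_cost = bit_cost(v, 0);
    for k in 1..=23 {
        let cost = bit_cost(v, k);
        if cost < best_cost {
            // strict comparison keeps the first (smallest) k on equal cost
            best = k;
            best_cost = cost;
        }
    }
    best
}
```

For the zigzag values [5, 10] the costs at k = 0 through 4 are 17, 11, 9, 9 and 10 bits, so the search settles on k = 2, the smaller of the two tied parameters.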

The reference encoder uses convex descent: seed k_seed = ⌊log₂(mean(v))⌋, then walk either direction. On descent, ties break toward smaller k (condition cost ≤ best_cost, so an equal cost at k − 1 overwrites k); on ascent, only a strictly lower cost replaces the current best (condition cost < best_cost, so an equal cost at k + 1 does not override k). This yields a unique k per partition matching an exhaustive search's first-wins tie-break.

On the reference corpus, the sparse grid's compression matches exhaustive-search output within ~0.2 percentage points (measured); the differential test caps the acceptable excess at 0.5%. Implementations that prefer tighter compression at higher encode cost can extend the grid without wire-format consequences — decoders do not care which orders an encoder tried.

7.1 Frame size (non-normative)

The codec accepts any frame_sample_count in [1, 65535]. Larger frames amortise the 7-byte fixed header and the LPC coefficient vector over more samples, and generally compress better. Smaller frames give tighter latency and finer partition-order granularity on transient content.

Recommended defaults by use case:

  • Real-time voice (QUIC streaming, MCU mix): 160-320 samples at 16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used by Opus / WebRTC.
  • Real-time full-band (game audio, music conferencing): 1024-2048 at 48 kHz (21-43 ms).
  • Offline / archival: 4096-8192 at 48 kHz; compression gains flatten past this.

Power-of-two frame sizes expose every partition_order ∈ [0, 7] to the encoder's search. Non-power-of-two sizes restrict the search to partition counts that divide evenly; a prime frame size forces partition_order = 0. Encoders SHOULD prefer frame sizes with several small-prime factors when free to choose.
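The divisibility constraint on the search space can be enumerated with a short helper. The function name is hypothetical, and the sketch assumes frame_sample_count ≥ 1 as §3.5 requires.

```rust
/// List every partition_order in 0..=7 whose partition count divides the
/// frame size evenly. Assumes frame_sample_count >= 1.
pub fn legal_partition_orders(frame_sample_count: u16) -> Vec<u8> {
    (0u8..=7)
        .filter(|&p| frame_sample_count % (1u16 << p) == 0)
        .collect()
}
```

A 1024-sample frame exposes all eight orders, a 960-sample frame (960 = 64 · 15) exposes orders 0 through 6, and any odd frame size leaves only partition_order = 0.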

8. Versioning

This document specifies LAC version 1, identified on the wire by sync_word = 0x1ACC. No per-frame version byte is carried; the sync word uniquely identifies the wire format.

Future revisions of the format must use a distinct sync_word. The recommended allocation is 0x1ACD for v2, 0x1ACE for v3, and so on inside the 0x1ACC..0x1ACF cluster, with the cluster boundary making casual grep / hex-dump inspection robust. A revision whose wire format cannot be made compatible with v1 at the header level must pick a sync word outside this cluster.

This approach is exhaustive by construction:

  • A v1 decoder that encounters a v2 frame sees an unrecognised sync word on the first check (§3.1) and rejects cleanly — the same error path as foreign or corrupted payloads.
  • A v2-aware decoder dispatches on the sync word before reading any further field, so it can fall back to v1 parsing when appropriate or decode v2 frames natively.

Because every field in §3 has a strict range with no reserved high-bit patterns, in-place extension (flag bits inside existing fields) is not a supported evolution path. New features go into a new sync_word, not into reinterpreting existing field values.

Transports that multiplex LAC frames with other formats should frame each LAC payload explicitly (length prefix or stream separator); the sync word alone is not a framing delimiter, only a format identifier.

9. Implementation notes (non-normative)

9.1 GPU offload is out of scope

LAC is a scalar integer codec. The reference implementation, and any conforming implementation this document anticipates, runs on a CPU. GPU offload is deliberately not a goal:

  • Levinson-Durbin is serial by construction (each iteration depends on the previous) and its intermediate accumulator needs more than 64 bits of precision at higher orders — a poor fit for WGSL or SPIR-V compute shaders, which have no native 128-bit integer arithmetic.
  • Rice decode uses a data-dependent unary run for every residual; on GPU execution models this diverges warps badly and its sequential bit-cursor progression fights SIMD lane packing.
  • LPC synthesis has a tight per-sample feedback loop (sample i depends on samples i-1, i-2, …, i-order), so each channel is inherently serial.
  • The one plausibly GPU-parallel phase — residual computation inside the encoder's order search — is also the phase where the CPU's autovectorized implementation is already well-served by SIMD on any modern target. At the measured encode latencies (P99 under 50 µs on x86 for a 20 ms frame period, >400× headroom), there is no motivation to offload it.

A hypothetical future revision whose hot path genuinely benefited from GPU execution (large-batch archival encoding across many channels at once, for instance) would need to change the wire format to carry enough shape metadata for batched kernels — i.e., a new sync word under the versioning rules in §8, not a retrofit.

10. Conformance test vectors

The reference repository's tests/conformance.rs holds the canonical test-vector set for this specification:

  • DECODE_FIXTURES: (samples, bytes) pairs pinned at the byte level. A conformant decoder must produce the samples array when fed the bytes array. Encoders have latitude (§3.6, §7), so a second-team encoder's bytes for the same samples may differ; the decoder direction is the normative one. Coverage includes the smallest valid frames (single-sample verbatim, 4- and 8-sample silence), single-sample polarity boundaries (±1, ±(2²³ − 1)), DC offset, alternating-polarity Nyquist-like content, smooth polynomial (fixed-predictor territory), and a 16-sample growing-amplitude pattern exercising partition search.
  • REJECT_FIXTURES — hand-constructed malformed inputs mapped to their expected rejection variants. Covers every class in §6 (1-10): bad sync, each field-range violation, verbatim + non-zero shift, frame_sample_count == 0, non-divisible partition count, header / coefficient / Rice-bitstream truncation, per-partition k > 23.
  • reject_unary_run_above_cap — a programmatic test for §6 class 10 (q > u32::MAX >> k). The minimal triggering payload is ~75 bytes of mostly zeros; construction logic is in the test, not a const fixture.

Second-team implementations should port the decode fixtures byte-for-byte and the reject fixtures byte-and-variant-for-variant. encode_matches_fixtures in the same file is reference-specific (it asserts the reference encoder's exact bytes) and is not a conformance requirement — see §3.6's encoder-latitude clause.

10.1 Reference encoder exemplars (non-normative)

The same (samples, bytes) pairs, read in the encoder direction, serve as reference-encoder validation targets for implementations that want to match this project's reference byte-for-byte — a common goal for porting work even though the spec does not require it. The fixture set is deliberately chosen to pin every encoder-discretion axis from §7:

  • single_zero, single_pos_one, single_neg_one — single-sample frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime where the encoder's order choice is nearly arbitrary; pinning the bytes fixes this project's choice (order 0 with the minimum-cost Rice encoding).
  • silence_4, silence_8: force partition_order tie-breaks on an all-zero frame (every partition_order ∈ [0, log₂(N)] produces identical cost; the reference picks the smallest via its convex-descent tie-break).
  • dc_100_4, alternating_small_4 — exercise the order-vs-verbatim decision. DC content favours a low-order LPC fit with small residuals; alternating content favours order 1 with a = 1 (approximated at the closest Q-format). Pinning the bytes fixes the reference's decision boundary.
  • single_full_scale_pos, single_full_scale_neg: maximum-magnitude single samples. Exercise the |sample| ≤ 2²³ − 1 boundary on both sides and fix the zigzag-of-extremum output.
  • linear_ramp_8 — smooth polynomial content, fixed-predictor territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
  • lfsr_noise_16 — exercises partition search on a frame large enough for partition_order > 0 to be competitive.

A second-team encoder that produces the same bytes for every entry here is likely (not guaranteed) to produce matching bytes on wider inputs, since the tie-break axes are the ones most sensitive to encoder discretion. An encoder that produces different bytes is still compliant so long as its own bytes round-trip — see §3.6, §7.