initial commit
All checks were successful
CI / lint (push) Successful in 5s
CI / fuzz-regression (push) Successful in 14s
CI / build (push) Successful in 4s
CI / test (push) Successful in 6m54s
CI / publish (push) Successful in 8s

Signed-off-by: Kamal Tufekcic <kamal@lo.sh>
Kamal Tufekcic 2026-04-23 14:58:32 +03:00
commit 7862cb1d9d
2884 changed files with 16797 additions and 0 deletions

71
.forgejo/workflows/ci.yml Normal file

@@ -0,0 +1,71 @@
name: CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:

env:
  CARGO_TERM_COLOR: always
  RUSTFLAGS: "-D warnings"

jobs:
  lint:
    runs-on: lo-runner
    steps:
      - uses: actions/checkout@v6.0.2
      - name: cargo fmt
        run: cargo +nightly fmt --all --check
      - name: cargo clippy
        run: cargo +nightly clippy --all-targets --all-features --message-format=short
      - name: cargo doc (no deps)
        run: cargo +nightly doc --no-deps --document-private-items
        env:
          RUSTDOCFLAGS: "-D warnings"

  test:
    runs-on: lo-runner
    steps:
      - uses: actions/checkout@v6.0.2
        with:
          lfs: true
      - name: cargo test
        run: cargo test

  fuzz-regression:
    runs-on: lo-runner
    steps:
      - uses: actions/checkout@v6.0.2
      - name: Fuzz regression (decode_arbitrary)
        working-directory: fuzz
        run: cargo +nightly fuzz run decode_arbitrary -- -runs=0
      - name: Fuzz regression (roundtrip_arbitrary)
        working-directory: fuzz
        run: cargo +nightly fuzz run roundtrip_arbitrary -- -runs=0

  build:
    runs-on: lo-runner
    steps:
      - uses: actions/checkout@v6.0.2
      - name: cargo build --release
        run: cargo build --release

  publish:
    needs: [lint, test, fuzz-regression, build]
    runs-on: lo-runner
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v6.0.2
      - name: Publish lac
        run: cargo publish
        env:
          CARGO_REGISTRY_TOKEN: ${{ secrets.CRATES_IO_TOKEN }}

2
.gitattributes vendored Normal file

@@ -0,0 +1,2 @@
*.wav filter=lfs diff=lfs merge=lfs -text
fuzz/corpus/** filter=lfs diff=lfs merge=lfs -text

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
target/
*.log

16
Cargo.lock generated Normal file

@@ -0,0 +1,16 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4

[[package]]
name = "hound"
version = "3.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "62adaabb884c94955b19907d60019f4e145d091c75345379e70d1ee696f7854f"

[[package]]
name = "lac"
version = "0.1.0"
dependencies = [
 "hound",
]

38
Cargo.toml Normal file

@@ -0,0 +1,38 @@
[package]
name = "lac"
version = "0.1.0"
edition = "2024"
rust-version = "1.87"
license = "AGPL-3.0-only"
description = "Lo Audio Codec — lossless audio codec with LPC + partitioned Rice coding."
repository = "https://git.lo.sh/lo/lac"
homepage = "https://git.lo.sh/lo/lac/wiki"
authors = ["LO Contributors"]
readme = "README.md"
categories = ["compression", "multimedia::audio", "multimedia::encoding", "no-std"]
keywords = ["audio", "codec", "lossless", "lpc", "rice"]
exclude = [
    "corpus/*",
    "fuzz/*",
]

[lib]
name = "lac"

[features]
# Internal-only feature for exposing crate-private kernels to the
# benchmark harness in `benches/codec.rs`. The `__` prefix signals
# instability — no semver guarantees; contents may change or disappear
# between versions. Every `[[bench]]` block that needs a kernel entry
# point lists this under `required-features`, so Cargo skips it unless the
# feature is enabled (`--features __internal-for-bench`); normal builds are untouched.
__internal-for-bench = []

[dependencies]

[dev-dependencies]
hound = "3"

[[bench]]
name = "codec"
required-features = ["__internal-for-bench"]
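
A feature such as `__internal-for-bench` is typically consumed on the Rust side through a `#[cfg(feature = ...)]` gate. The sketch below shows one way that could look. The kernel `sum_abs_residuals` and the module `bench_api` are hypothetical names chosen for illustration; they are not taken from the lac source, which this commit view does not show. Note that in Cargo, `required-features` controls whether a target is built rather than switching the feature on, so a bench declared this way would usually be run as `cargo bench --features __internal-for-bench`.

```rust
// Sketch only: `sum_abs_residuals` and `bench_api` are hypothetical names,
// not identifiers from the lac crate.

/// Crate-private kernel. Without the feature below, nothing outside the
/// crate can reach it.
fn sum_abs_residuals(residuals: &[i32]) -> u64 {
    // `unsigned_abs` avoids the overflow that `abs` would hit on `i32::MIN`.
    residuals.iter().map(|r| u64::from(r.unsigned_abs())).sum()
}

/// Compiled only when the `__internal-for-bench` feature is enabled.
/// Normal builds never see this module, so the public API stays unchanged.
#[cfg(feature = "__internal-for-bench")]
pub mod bench_api {
    /// Thin public wrapper so a bench target can call the private kernel.
    pub fn sum_abs_residuals(residuals: &[i32]) -> u64 {
        super::sum_abs_residuals(residuals)
    }
}
```

Keeping the wrapper in a separate gated module, rather than marking the kernel itself `pub`, means the unstable surface disappears entirely when the feature is off.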

661
LICENSE.md Normal file

@@ -0,0 +1,661 @@
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU Affero General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Remote Network Interaction; Use with the GNU General Public License.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<https://www.gnu.org/licenses/>.

609
README.md Normal file

@ -0,0 +1,609 @@
# LAC — Lo Audio Codec
Lossless audio codec for internal use. Target compression is FLAC-class
(~50% of raw). Integer-only, bit-exact, streaming-oriented.
## Scope
- **Input**: signed integer PCM passed as `i32` with `|sample| ≤ 2²³ − 1`.
8-bit, 16-bit, 20-bit, and 24-bit sources are all valid without
conversion — they compress at the bit cost of their actual values, not
a 24-bit ceiling.
- **Sample rate**: caller-specified; not encoded in the stream. The container
or transport carries it.
- **Channels**: mono per encoded stream. Stereo is two independent mono
streams — for example, two QUIC streams over a shared connection, one per
channel. No cross-channel joint coding.
- **Frames**: independently decodable. No cross-frame state; a lost or corrupt
frame never affects subsequent decodes.
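The input bound in the first bullet can be checked before a frame reaches the encoder. A minimal sketch, assuming hypothetical helpers named `in_lac_range` and `frame_in_range`; neither is part of the `lac` API:

```rust
/// Largest magnitude the Scope section allows: 2^23 - 1.
const MAX_MAGNITUDE: u32 = (1 << 23) - 1;

/// Hypothetical helper (not part of the `lac` crate): true when a sample
/// satisfies |sample| <= 2^23 - 1.
fn in_lac_range(sample: i32) -> bool {
    sample.unsigned_abs() <= MAX_MAGNITUDE
}

/// True when every sample in the frame is within range.
fn frame_in_range(samples: &[i32]) -> bool {
    samples.iter().all(|&s| in_lac_range(s))
}
```

Any widened 8-bit or 16-bit value passes this check, which is why narrower sources need no scaling.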
## Pipeline
```text
samples → LPC analysis → residuals → partitioned Rice → frame bytes
(inverse for decode)
```
Three encoder-side choices, searched per frame:
- **LPC order**: the reference encoder tries a sparse grid
`{0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32}` with a 2-order early-out
once cost stops improving. Order 0 is verbatim (residuals equal the raw
samples). The wire format permits any order in `[0, 32]`.
- **Coefficient shift** `∈ [0, 5]`: widens the Q-format of the stored
predictor coefficients from Q15 (range `[−1, 1)`) out to Q10 (range
`[−32, 32)`) so low-frequency / narrow-resonance content doesn't clamp
`|a[1]|` near 2. Chosen deterministically per order as the smallest
shift that avoids clamping.
- **Rice partition order** `∈ [0, 7]`: splits the residual stream into
`2^partition_order` equal partitions, each with its own Rice parameter
`k ∈ [0, 23]` chosen by convex descent.
Levinson-Durbin runs once up to order 32 into a flat stack-allocated
buffer (`LpcLevels`) and the per-order coefficients are consulted by
slice reference; the order search itself does no heap allocation.
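To make the Rice parameter choice concrete, here is a rough sketch of the per-partition cost being minimised. It assumes the conventional zigzag mapping and the usual unary-quotient-plus-`k`-bit-remainder codeword; `Specification.md` remains the authority on the real bit layout, and the function names are illustrative only. The search is brute force over `k`, not the convex descent the reference encoder uses.

```rust
/// Conventional zigzag map: 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ...
/// (Assumed here; the wire format's exact mapping is defined in the spec.)
fn zigzag(residual: i32) -> u32 {
    ((residual << 1) ^ (residual >> 31)) as u32
}

/// Bits for one Rice codeword with parameter k: unary quotient, one
/// terminator bit, then k remainder bits.
fn rice_bits(value: u32, k: u32) -> u64 {
    u64::from(value >> k) + 1 + u64::from(k)
}

/// Exhaustive search for the cheapest k in [0, 23] over one partition.
/// Returns (k, total bits); ties resolve to the smallest k.
fn best_k(residuals: &[i32]) -> (u32, u64) {
    (0..=23u32)
        .map(|k| {
            let bits: u64 = residuals.iter().map(|&r| rice_bits(zigzag(r), k)).sum();
            (k, bits)
        })
        .min_by_key(|&(_, bits)| bits)
        .expect("the k range is never empty")
}
```

Running `best_k` once per partition and summing the results gives the payload cost for one candidate partition order, which is the quantity the partition-order search compares.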
## Intended use
- **QUIC streaming** — one reliable stream per audio channel. Frames fit
the per-stream framing (length-prefixed or datagram-mapped) without
modification.
- **Offline file playback** — a container pairs the channel streams by
timestamp; each stream decodes independently.
## Frame size guidance
Frame size is a latency-vs-compression knob chosen at the application
layer. The codec accepts any `frame_sample_count` in `[1, 65535]`, but
the LPC/Rice search amortises better on larger frames (shared header,
more samples per fitted coefficient vector). Concrete defaults:
| Use case | Frame size | Frame period | Notes |
|---|---|---:|---|
| Real-time voice, tight latency | 160 @ 16 kHz | 10 ms | matches WebRTC/Opus 10 ms mode |
| Real-time voice, balanced | **320 @ 16 kHz** | 20 ms | default for MCU workload in `tests/mcu_mix.rs` |
| Game/conf streaming | **960 @ 48 kHz** | 20 ms | one QUIC datagram per frame fits typical MTUs |
| Music streaming | **2048 @ 48 kHz** | 43 ms | compression benefit flattens past this |
| Offline archival | **4096 @ 48 kHz** | 85 ms | tightest LPC fit; default in `tests/corpus.rs`, matches FLAC's default blocksize for apples-to-apples compression comparison |
Partition orders that evenly divide the frame size dominate the search
cost. Power-of-two frame sizes (256, 512, 1024, 2048, 4096) unlock every
`partition_order ∈ [0, 7]`; 960 and 2880 (common WebRTC frame sizes) are
both divisible by 2⁶ but not 2⁷, so they allow orders up to 6; prime
sizes like 137 collapse to `partition_order = 0`. Prefer power-of-two
frame sizes unless a container format constrains the choice.
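The divisibility rule can be evaluated up front when picking a frame size. A small sketch, assuming that even division by `2^partition_order` and the cap of 7 are the only constraints (the specification may impose others); `max_partition_order` is a hypothetical helper, not a `lac` function:

```rust
/// Highest partition order usable for a frame of `frame_len` samples,
/// assuming the only constraints are that 2^order divides the frame
/// length evenly and that the order is capped at 7.
fn max_partition_order(frame_len: u16) -> u32 {
    assert!(frame_len > 0, "frame_sample_count is in [1, 65535]");
    // The number of trailing zero bits is the largest power of two
    // that divides the length.
    frame_len.trailing_zeros().min(7)
}
```

Any odd frame length, prime or not, yields 0 here, so the collapse to a single partition is not specific to primes.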
## Structure
```text
lac/
├── Cargo.toml
├── README.md                         ← you are here
├── Specification.md                  ← wire format specification
├── corpus/                           ← test WAVs (speech + music), LFS-tracked via .gitattributes
├── src/
│   ├── lib.rs                        ← public API and project-wide constants
│   ├── bit_io.rs                     ← MSB-first bit reader/writer
│   ├── lpc.rs                        ← Levinson-Durbin, LpcLevels flat buffer, residuals/synthesis
│   ├── rice.rs                       ← zigzag + partitioned Rice coding, convex-descent k
│   ├── frame.rs                      ← frame header, encode_frame, decode_frame
│   └── test_signals.rs               ← integer-only sine LUT for float-free test inputs
├── tests/
│   ├── corpus.rs                     ← compression ratio + FLAC comparison on real audio
│   ├── synthetic.rs                  ← bit-depth + pathological-content round-trips, no corpus needed
│   ├── latency.rs                    ← P50/P95/P99/max encode+decode latency, peak heap, alloc count
│   └── mcu_mix.rs                    ← end-to-end MCU workload (decode → mix → re-encode)
├── benches/
│   ├── codec.rs                      ← nightly #[bench] harness (encode, decode, compute_residuals)
│   └── compare-flac.sh               ← diagnostic shell script: wall-clock flac encode across corpus
└── fuzz/
    ├── fuzz_targets/
    │   ├── decode_arbitrary.rs       ← decoder robustness under arbitrary bytes
    │   └── roundtrip_arbitrary.rs    ← encoder/decoder self-consistency
    └── dict/
        ├── decode_arbitrary.dict     ← libFuzzer dict: sync word + field boundary constants
        └── roundtrip_arbitrary.dict  ← libFuzzer dict: sample-value boundaries
```
See `Specification.md` for the normative wire format.
## Public API
Every sample is an `i32` with magnitude bounded by `2²³ − 1`. Narrower
integer sources go through unchanged:
```rust
use lac::{encode_frame, decode_frame};
// 16-bit microphone PCM → just widen with `i32::from`. Do NOT shift
// left by 8 to "align" to 24-bit: that multiplies residual magnitudes
// by 256 and costs 8 extra bits per residual in the Rice payload. The
// codec compresses at the bit cost of the actual sample magnitudes,
// not a 24-bit ceiling.
let pcm_16: Vec<i16> = /* from microphone */ Vec::new();
let samples: Vec<i32> = pcm_16.iter().map(|&s| i32::from(s)).collect();
let bytes = encode_frame(&samples);
let recovered: Vec<i32> = decode_frame(&bytes)?;
assert_eq!(recovered, samples);
# Ok::<(), lac::DecodeError>(())
```
For 24-bit PCM, samples are already in range, so pass them through directly.
For 8-bit PCM, widen with `i32::from(s as i8)` when the bytes are signed, or
subtract the bias first (`i32::from(s) - 128`) when the source is unsigned
offset-128.
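The widening steps for the less common source layouts might look like the sketch below. The helper names are hypothetical, and the packed little-endian 24-bit layout is an assumption about the source (it is what many WAV files use), not something the codec prescribes.

```rust
/// Unsigned 8-bit PCM (offset-128): remove the bias, then widen.
fn widen_u8(sample: u8) -> i32 {
    i32::from(sample) - 128
}

/// Packed signed 24-bit little-endian PCM (an assumed source layout):
/// place the three bytes in the top 24 bits of an i32, then shift right
/// arithmetically by 8 to sign-extend.
fn widen_s24le(bytes: [u8; 3]) -> i32 {
    i32::from_le_bytes([0, bytes[0], bytes[1], bytes[2]]) >> 8
}
```

One edge worth checking in your own pipeline: a 24-bit source can contain −2²³, which is one past the `2²³ − 1` magnitude bound stated above. This sketch does not clamp it.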
Round-trip is bit-exact: `decode_frame(encode_frame(s)) == s` for every
valid `s`.
### Buffer-reusing API for hot loops
For the MCU re-encode fanout and QUIC senders that own a per-channel
scratch buffer, use [`encode_frame_into`] / [`decode_frame_into`] to
target a caller-owned `Vec<u8>` / `Vec<i32>` instead of allocating
fresh on each call:
```rust
use lac::{encode_frame_into, decode_frame_into};
let mut encoded = Vec::new(); // one buffer per channel, reused across frames
let mut decoded = Vec::new();
for frame_samples in frames_iter() {
    encode_frame_into(&frame_samples, &mut encoded);
    // … send `encoded`
}
for incoming_bytes in incoming_iter() {
    decode_frame_into(&incoming_bytes, &mut decoded)?;
    // … consume `decoded`
}
# fn frames_iter() -> impl Iterator<Item = Vec<i32>> { std::iter::empty() }
# fn incoming_iter() -> impl Iterator<Item = Vec<u8>> { std::iter::empty() }
# Ok::<(), lac::DecodeError>(())
```
Both `_into` variants clear the destination at entry and retain its
capacity, so steady-state usage makes zero allocations past the first
frame.
### Output size expectations
For realistic audio (speech, music, ambient), compressed frames land
around **15-55 %** of raw sample bytes (speech near the low end, music
near the high end). Callers reusing a scratch buffer can safely
preallocate to 1× raw and take the extension cost only on the rare
adversarial frame.
For untrusted input — payloads where residuals might be crafted to
maximise Rice output — the worst-case expansion bound is ~17× raw: at
the Rice `k = 23` ceiling, each codeword is up to 535 bits (511 unary
zeros + terminator + 23 remainder), or ~67 bytes per residual. A
pipeline that must pre-size a bounded output buffer for arbitrary
input can use `samples.len() * 68` bytes as a loose upper bound. The
encoder never exceeds this.
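The arithmetic behind the loose bound can be written down directly. A sketch with illustrative names (none of these constants are exported by `lac`):

```rust
/// Worst-case bits in one Rice codeword at k = 23, per the figures above:
/// 511 unary zeros, one terminator bit, 23 remainder bits.
const MAX_CODEWORD_BITS: usize = 511 + 1 + 23;

/// Loose per-sample byte bound quoted above.
const BYTES_PER_SAMPLE_BOUND: usize = 68;

// Compile-time check that the byte bound really covers one codeword.
const _: () = assert!((MAX_CODEWORD_BITS + 7) / 8 <= BYTES_PER_SAMPLE_BOUND);

/// Capacity to reserve for the encoded output when the input is untrusted.
fn worst_case_capacity(sample_count: usize) -> usize {
    sample_count.saturating_mul(BYTES_PER_SAMPLE_BOUND)
}
```

For a 320-sample voice frame this reserves about 21 KB, which is cheap enough to allocate once per channel and reuse.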
### Error recovery
On decode failure the caller substitutes `frame_sample_count` zeros
(silence) for the frame period. The count is recoverable from the
frame itself as long as the *header* parsed, even if the bitstream
body then failed — call [`parse_header`] on the same buffer:
```rust
use lac::{decode_frame, parse_header};
const SESSION_DEFAULT_FRAME: usize = 320; // negotiated at session start
let bytes = Vec::<u8>::new();
let samples = match decode_frame(&bytes) {
    Ok(s) => s,
    Err(_) => {
        let count = parse_header(&bytes)
            .map(|(h, _)| h.frame_sample_count as usize)
            .unwrap_or(SESSION_DEFAULT_FRAME);
        vec![0i32; count]
    }
};
```
When the header itself fails (`BadSyncWord`, `InvalidPredictionOrder`,
`InvalidPartitionOrder`, `InvalidCoefficientShift`, or `Truncated`
below 7 bytes), the frame length is unknowable and the caller must
fall back to a session-level default.
[`encode_frame_into`]: https://docs.rs/lac/latest/lac/fn.encode_frame_into.html
[`decode_frame_into`]: https://docs.rs/lac/latest/lac/fn.decode_frame_into.html
[`parse_header`]: https://docs.rs/lac/latest/lac/fn.parse_header.html
## Concurrency
LAC's encode and decode APIs are pure functions with no shared state —
no globals, no internal `Mutex`, no `unsafe`. All public types are
`Send + Sync`. Calls on different threads never contend with each
other, and each call's scratch buffers are owned (stack or the
caller-supplied `Vec`).
The intended deployment shape for multi-channel and multi-stream
workloads is **one thread or task per channel**. The codec itself does
no threading: scheduling is left to the application so it can pick
whichever executor fits (tokio for async servers, rayon for data-
parallel workloads, `std::thread` for straight-ahead concurrency).
MCU re-encode fanout with stdlib primitives only:
```rust
use std::thread;
use lac::encode_frame;
let mixes: Vec<Vec<i32>> = Vec::new();
let outgoing: Vec<Vec<u8>> = thread::scope(|s| {
    let handles: Vec<_> = mixes
        .iter()
        .map(|mix| s.spawn(move || encode_frame(mix)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
});
```
Or with rayon, if the project already pulls it in:
```rust
// use rayon::prelude::*;
// let outgoing: Vec<Vec<u8>> = mixes.par_iter().map(|m| encode_frame(m)).collect();
```
The allocator you link against sets the ceiling on multi-core
scaling: glibc `malloc` has measurable lock contention at tens of
cores, whereas mimalloc / jemalloc keep per-thread caches and scale
further. The codec itself doesn't care which one you pick: it allocates
through the global allocator like any other Rust library.
### Input-size caps on untrusted channels
Applications accepting LAC frames from untrusted peers should cap the
per-frame input size at the application layer. The decoder's
per-codeword unary-run bound (spec §4.2) prevents any single codeword
from consuming unbounded CPU, but total decode cost scales with
buffer length; an attacker handed an unbounded payload can force
proportional scan work. Typical real frames are sub-kilobyte; **a cap
of 64 KB per frame is comfortably above any legitimate LAC payload
and cheap to enforce at the framing layer** (QUIC stream length
field, length-prefixed framing, etc.). The `Truncated` error fires
naturally when a payload is cut, so a hard cap doesn't break legal
traffic — it just bounds pathological work.
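A cap like this is a one-line gate in front of the decoder. The sketch below is illustrative: `admit_frame` and `MAX_FRAME_BYTES` are hypothetical names, and reading "64 KB" as 64 × 1024 bytes is an assumption.

```rust
/// Application-level cap suggested above; taking "64 KB" as 64 * 1024
/// bytes is an assumption.
const MAX_FRAME_BYTES: usize = 64 * 1024;

/// Hypothetical gate to run before handing bytes to the decoder.
/// Returns the payload unchanged when it is within the cap, `None` otherwise.
fn admit_frame(payload: &[u8]) -> Option<&[u8]> {
    (payload.len() <= MAX_FRAME_BYTES).then_some(payload)
}
```

A rejected payload can be treated like any other decode failure and routed to the concealment path.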
### Silence-substitution amplification
Spec §6.1 mandates that callers substitute `frame_sample_count` zeros
on decode failure. An attacker can craft a tiny frame (~10-byte
header with `frame_sample_count = 65535`) whose Rice payload is
malformed; the decoder rejects, the caller dutifully emits 65 535
output samples of silence. At 48 kHz mono `i32`, that's **~256 KB of
zeros per ~10-byte input frame — a ~25 000× amplification**.
The output is silence, not attacker-chosen data, so this is a
downstream-resource-exhaustion vector (memory, bandwidth,
re-encode work at an MCU) rather than a data-injection vector.
Mitigation is at the application layer: **cap `frame_sample_count`
to the session's negotiated frame size** before invoking the silence
substitution. QUIC / WebRTC sessions already negotiate a frame size
at setup; using that as a hard upper bound on the silence-fill
length collapses the amplification ratio to 1×. An MCU that reads
`parse_header(&data).frame_sample_count` without validating it
against the session cap inherits the amplification unchanged.
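The mitigation amounts to clamping the header's count before allocating the fill. A minimal sketch, assuming the caller has already extracted `frame_sample_count` (or failed to); `silence_fill_len` is a hypothetical helper:

```rust
/// Length of the silence fill for a failed frame. `header_count` is the
/// `frame_sample_count` recovered from the header, or `None` when the
/// header itself did not parse. The result never exceeds the session's
/// negotiated frame size.
fn silence_fill_len(header_count: Option<u16>, session_frame: usize) -> usize {
    header_count
        .map(usize::from)
        .unwrap_or(session_frame)
        .min(session_frame)
}
```

With a 320-sample session, a hostile header claiming 65535 samples produces 320 zeros instead of 65535.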
## Packet loss & concealment
Frames are independently decodable: losing one frame never corrupts
another, regardless of which concealment strategy the application
picks. This is a genuine deployment asset on lossy transports (QUIC
datagrams, UDP), and the section below walks the plausible strategies
in increasing quality order.
### Strategy 1: silence substitution (the default)
The baseline `decode_frame` returns `Err` on structural failure; the
application substitutes `frame_sample_count` zeros for the lost frame
period (see `parse_header` recovery pattern under *Public API →
Error recovery*). Fast, deterministic, audible as a brief cut —
acceptable for voice up to ~20 ms of loss, jarring beyond that.
### Strategy 2: sample-and-hold
Repeat the last successfully decoded sample for the frame period.
Zero-cost on the decoder side, preserves DC level so the click at
the drop boundary is softer than silence. Quality at 20 ms of loss
is better than silence for voice, slightly worse for music (DC hold
on a non-stationary signal adds a small transient when the next
frame arrives).
```rust
// After a successful decode, store the last sample for reuse on loss.
// On loss: fill the gap with that value.
# fn last_decoded_sample() -> i32 { 0 }
# const N: usize = 320;
let conceal = vec![last_decoded_sample(); N];
```
### Strategy 3: linear fade
Interpolate from the last valid sample down to zero over the lost
frame period. Removes the DC-hold transient and the "cut to silence"
click both. Costs N integer adds per lost frame. Recommended baseline
for any application that can afford 2-5 lines of PLC code.
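One way to write the fade is shown below. It is a sketch rather than a tuned implementation: for clarity it uses a multiply and a divide per sample instead of the add-only accumulator form, and `linear_fade` is an illustrative name, not a library function.

```rust
/// Ramp linearly from (almost) `last` down to exactly zero over `n`
/// samples. The intermediate product is computed in i64 so that a
/// full-scale 24-bit sample times a 65535-sample frame cannot overflow.
fn linear_fade(last: i32, n: usize) -> Vec<i32> {
    let len = n as i64;
    (0..len)
        .map(|i| (i64::from(last) * (len - 1 - i) / len) as i32)
        .collect()
}
```

The final sample is always zero, so the next frame starts from silence instead of from a held DC level.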
### Strategy 4: LPC-coefficient extrapolation
The last successfully decoded frame's [`AudioFrameHeader`] carries
the LPC coefficients the encoder chose — available from
[`parse_header`] at no extra cost — and the LPC filter is locally
stationary over a 20-40 ms horizon. Run the synthesis formula (§3.6
of `Specification.md`) forward from the last decoded samples to *predict*
the missing frame. Quality is best on pitched content (voiced
speech, sustained notes); on transients it degrades gracefully
because the predictor's autoregressive behaviour damps toward zero
over the frame.
Not built into the library — the math is straightforward and the
"right" tuning varies by deployment (how much damping, whether to
blend with sample-and-hold on transients, etc.). See `src/lpc.rs`'s
`lpc_synthesize_into` for the integer synthesis routine that a
PLC implementation would call.
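As a rough illustration of the idea only, the sketch below runs a zero-excitation autoregressive predictor forward. It does not reproduce the synthesis formula in `Specification.md`: plain Q15 coefficients, a floor-rounding right shift by 15, coefficient order (index 0 multiplies the most recent sample), and the function name `extrapolate` are all assumptions made for the example. A real implementation would follow the spec's rounding and `coefficient_shift` rules, or call `lpc_synthesize_into`.

```rust
/// Clamp bound matching the codec's sample range.
const MAX_SAMPLE: i64 = (1 << 23) - 1;

/// Predict `count` samples past the end of `history` with zero residual.
/// `coeffs_q15[0]` is assumed to weight the most recent sample.
fn extrapolate(history: &[i32], coeffs_q15: &[i32], count: usize) -> Vec<i32> {
    let order = coeffs_q15.len();
    // Seed the working buffer with the last `order` decoded samples.
    let seed_start = history.len().saturating_sub(order);
    let mut buf: Vec<i32> = history[seed_start..].to_vec();
    let seed_len = buf.len();
    for _ in 0..count {
        // Pair each coefficient with successively older samples; history
        // that does not exist is treated as zero because zip stops early.
        let acc: i64 = coeffs_q15
            .iter()
            .zip(buf.iter().rev())
            .map(|(&c, &s)| i64::from(c) * i64::from(s))
            .sum();
        let predicted = (acc >> 15).clamp(-MAX_SAMPLE, MAX_SAMPLE);
        buf.push(predicted as i32);
    }
    // Return only the predicted samples, not the seed.
    buf.split_off(seed_len)
}
```

With a single coefficient of 0.5 the output halves each step, which is the damping behaviour described above in its simplest form.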
### Multi-frame loss guidance
The strategies above are only useful up to a handful of consecutive
lost frames. Rough thresholds at 20 ms frame periods:
| Consecutive lost frames | Effective loss | Verdict |
|---|---|---|
| 1 | 20 ms | Inaudible with fade or LPC extrapolation; brief click with silence or sample-and-hold |
| 2-3 | 40-60 ms | Noticeable glitch; LPC extrapolation minimises but cannot hide it |
| 4-10 | 80-200 ms | Audible dropout. PLC keeps the audio from sounding "broken" but doesn't restore content |
| > 10 | > 200 ms | Treat the stream as broken; reset the receiver's concealment state to avoid droning artifacts, and if possible ask the transport to signal "resync" upstream |
Mid-stream resync on a datagram transport uses the sync word
(`0x1ACC`) as an alignment anchor: on a string of bad frames,
search the next `N` bytes of the buffer for the big-endian sequence
`\x1a\xcc` and retry `parse_header` from each candidate offset
until one succeeds. The search is O(N); on a 20 ms frame at 48 kHz
there are at most ~180 bytes per frame to scan, so amortised cost
is negligible.
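The scan itself is short. A sketch, with `sync_candidates` as an illustrative name; it only finds candidate offsets and leaves the `parse_header` retry to the caller:

```rust
/// Big-endian byte image of the 0x1ACC sync word.
const SYNC: [u8; 2] = [0x1a, 0xcc];

/// Offsets in `buf` where the sync word appears, in ascending order.
/// Each is only a candidate: the pattern can also occur by chance inside
/// a Rice payload, so the caller still has to try `parse_header` there.
fn sync_candidates(buf: &[u8]) -> Vec<usize> {
    buf.windows(SYNC.len())
        .enumerate()
        .filter(|(_, w)| *w == SYNC)
        .map(|(i, _)| i)
        .collect()
}
```

Trying the candidates in order and stopping at the first offset where the header parses keeps the work linear in the buffer length.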
[`AudioFrameHeader`]: https://docs.rs/lac/latest/lac/struct.AudioFrameHeader.html
## Testing
```sh
cargo test # unit tests
cargo test --test corpus --release -- --nocapture # compression vs FLAC, lac_enc_ms
cargo test --test synthetic --release -- --nocapture # bit-depth + pathological content
cargo test --test latency --release -- --nocapture --test-threads=1 # p50/p95/p99 + alloc count
cargo test --test mcu_mix --release -- --nocapture --test-threads=1 # MCU throughput
cargo test --test conformance --release -- --nocapture # byte-level spec conformance
cargo test --test determinism --release # encode byte-equality on repeat
cargo fuzz run decode_arbitrary -- -dict=dict/decode_arbitrary.dict
cargo fuzz run roundtrip_arbitrary -- -dict=dict/roundtrip_arbitrary.dict
cargo bench # nightly bench
benches/compare-flac.sh # flac side of the speed table
```
**Published-crate caveat.** `Cargo.toml` excludes `corpus/*` and
`fuzz/*` from the published tarball — they'd blow up crate size and
the audio isn't redistributable under crates.io's constraints anyway.
A user running `cargo test` against a `cargo add lac`'d dependency
sees every corpus test *pass* because the `require_corpus!` macro
skips missing files silently; the compression-ratio assertions,
FLAC comparisons, latency P99 checks, and MCU throughput checks all
go unrun. The full regression suite requires the git repository
(with LFS pulled). The synthetic, conformance, determinism, and
unit tests run unchanged from either source.
Coverage at a glance:
- **Unit** — round-trips for every LPC order 0-32 and every partition order
0-7, prime frame lengths that force `partition_order = 0`, all-zero
frames, full-scale sample magnitudes, malformed-header rejection for
every field (`sync_word`, `prediction_order`, `partition_order`,
`coefficient_shift`), truncated bitstreams, and a convex-descent vs
exhaustive-search `select_k` differential.
- **Corpus** — round-trip + compression-ratio + FLAC subprocess comparison
on a mixed speech and music corpus; asserts ratio ceilings so a codec
regression fails CI; prints LAC encode wall-clock for correlation
against `benches/compare-flac.sh`.
- **Synthetic** — deterministic LFSR-driven round-trips at 8/16/20/24-bit
source widths and pathological content (all-zero, DC offset, Nyquist
square, silence + click, full-scale constant, prime-length frame). No
corpus dependency so the tests run on every CI checkout.
- **Latency** — per-frame encode/decode timing on real speech with a
custom tracking allocator for peak-heap *and* per-frame allocation-count
numbers; reports P50/P95/P99/max and asserts P99 < frame period so a
real-time regression fails CI.
- **MCU** — decode → PCM mix → re-encode simulation on real speech for
2/3/5/8/16 participants (continuous speech) plus an 8-participant
rotating dominant-speaker variant; asserts MCU egress ≤ SFU-fanout egress.
- **Fuzz** — libFuzzer targets for decoder robustness and
encoder/decoder self-consistency on arbitrary bytes, seeded with
dictionaries of the wire-format constants (sync word, field boundaries)
and sample-magnitude boundaries (8/16/20/24-bit ceilings).
## Measurements
### Reference hardware
| Short name | CPU | ISA highlights |
|---|---|---|
| **7840HS** | AMD Ryzen 7 7840HS (laptop, 8c/16t, up to 5.1 GHz) | AVX-512 (F/BW/CD/DQ/VL/VNNI/VBMI), BMI2, FMA |
| **RPi5** | Raspberry Pi 5 (Cortex-A76 quad, 2.4 GHz) | NEON |
| **VF2** | StarFive VisionFive 2 (SiFive U74 quad, 1.5 GHz) | RVV 0.7 (not targeted by mainline libFLAC or LLVM autovec; both codecs run scalar) |
Numbers below are measured at default `cargo build --release` (no
`target-cpu=native`, no project-level `RUSTFLAGS`). Empty cells are
awaiting measurement on the listed hardware. FLAC comparison uses both
`-5` (the CLI default, what production pipelines typically use) and
`-8` (`--best`, the compression upper bound).
### Corpus attribution
The measurements are taken on two publicly-licensed audio corpora
checked into `corpus/`:
- **Speech**: the [AMI Meeting Corpus](https://groups.inf.ed.ac.uk/ami/corpus/)
(files named `ES2002a.*`), recorded by the AMI Consortium (University
of Edinburgh, IDIAP, TNO, Brno University of Technology, University
of Sheffield, and partners). Distributed under
[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
- **Music**: Kimiko Ishizaka's recording of J.S. Bach's *Goldberg
Variations, BWV 988* (files named `Kimiko Ishizaka - …`), from the
[Open Goldberg Variations project](https://opengoldbergvariations.org/)
(Robert Douglass, producer). Released under
[CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) —
public domain dedication, no attribution legally required, credited
here as a courtesy.
Both corpora are used unmodified apart from the file selection
described in the tables below.
### Compression (hardware-independent, bit-exact across targets)
LAC ratio = LAC encoded / raw PCM. Both codecs use the same 4096-sample
block size on this corpus — LAC's `tests/corpus.rs` sets
`FRAME_SIZE = 4096`, which matches FLAC's default blocksize at `-5`
and `-8` for ≤ 48 kHz content, so header and coefficient overhead is
amortised identically on both sides.
| Corpus file | Class | LAC | FLAC -5 | FLAC -8 | LAC / -5 | LAC / -8 |
|---|---|---:|---:|---:|---:|---:|
| `ES2002a.Headset-0.wav` | headset speech, 16 kHz / 16-bit | 0.178 | 0.187 | 0.186 | 0.954 | 0.958 |
| `ES2002a.Mix-Headset.wav` | mixed meeting, 16 kHz / 16-bit | 0.292 | 0.300 | 0.297 | 0.975 | 0.984 |
| `ES2002a.Array1-01.wav` | array speech, 16 kHz / 16-bit | 0.375 | 0.378 | 0.377 | 0.989 | 0.994 |
| Goldberg Aria (01) | solo piano, 96 kHz / 24-bit | 0.483 | 0.458 | 0.457 | 1.053 | 1.056 |
| Goldberg Variatio 4 (05, fughetta) | solo piano, 96 kHz / 24-bit | 0.514 | 0.483 | 0.481 | 1.065 | 1.067 |
| Goldberg Variatio 16 (17, Ouverture) | solo piano, 96 kHz / 24-bit | 0.512 | 0.479 | 0.478 | 1.068 | 1.070 |
Speech reliably beats FLAC at both levels by a small margin; music
trails by 5-7 % (the Q-format gap at low frequencies, mitigated but
not eliminated by `coefficient_shift`). FLAC's jump from `-5` to `-8`
buys essentially nothing on this corpus (≤ 0.3 pp of ratio), so the
realistic LAC-vs-FLAC comparison in practice is against `-5`. Numbers
are byte-identical regardless of hardware because LAC's output is
specified bit-exactly.
### Encode wall-clock (ms, full file)
One table per hardware target; each has LAC alongside both FLAC levels
so the speed cost of each quality point is visible. The `-5` column is
the most representative real-world comparison.
**7840HS** (AMD Ryzen 7 7840HS):
| Corpus file | Duration | LAC | FLAC -5 | FLAC -8 |
|---|---|---:|---:|---:|
| `ES2002a.Headset-0.wav` | ~42 min, 16 kHz / 16-bit | 1158 | 221 | 436 |
| `ES2002a.Array1-01.wav` | ~42 min, 16 kHz / 16-bit | 1292 | 226 | 447 |
| `ES2002a.Mix-Headset.wav` | ~42 min, 16 kHz / 16-bit | 1367 | 223 | 469 |
| Goldberg Variatio 4 (05) | ~68 s, 96 kHz / 24-bit stereo | 809 | 272 | 647 |
| Goldberg Variatio 16 (17) | ~188 s, 96 kHz / 24-bit stereo | 2126 | 754 | 1741 |
| Goldberg Aria (01) | ~300 s, 96 kHz / 24-bit stereo | 3521 | 1166 | 2703 |
**RPi5** (Raspberry Pi 5, Cortex-A76 @ 2.4 GHz):
| Corpus file | Duration | LAC | FLAC -5 | FLAC -8 |
|---|---|---:|---:|---:|
| `ES2002a.Headset-0.wav` | ~42 min, 16 kHz / 16-bit | 2856 | 477 | 959 |
| `ES2002a.Array1-01.wav` | ~42 min, 16 kHz / 16-bit | 3249 | 495 | 1096 |
| `ES2002a.Mix-Headset.wav` | ~42 min, 16 kHz / 16-bit | 3363 | 505 | 1132 |
| Goldberg Variatio 4 (05) | ~68 s, 96 kHz / 24-bit stereo | 1904 | 606 | 1570 |
| Goldberg Variatio 16 (17) | ~188 s, 96 kHz / 24-bit stereo | 5201 | 1627 | 4324 |
| Goldberg Aria (01) | ~300 s, 96 kHz / 24-bit stereo | 9015 | 2572 | 6832 |
**VF2** (StarFive VisionFive 2, SiFive U74 quad @ 1.5 GHz):
| Corpus file | Duration | LAC | FLAC -5 | FLAC -8 |
|---|---|---:|---:|---:|
| `ES2002a.Headset-0.wav` | ~42 min, 16 kHz / 16-bit | 29385 | 2355 | 5614 |
| `ES2002a.Array1-01.wav` | ~42 min, 16 kHz / 16-bit | 33231 | 2502 | 6688 |
| `ES2002a.Mix-Headset.wav` | ~42 min, 16 kHz / 16-bit | 34899 | 2548 | 6878 |
| Goldberg Variatio 4 (05) | ~68 s, 96 kHz / 24-bit stereo | 18185 | 3184 | 9278 |
| Goldberg Variatio 16 (17) | ~188 s, 96 kHz / 24-bit stereo | 49811 | 8535 | 25454 |
| Goldberg Aria (01) | ~300 s, 96 kHz / 24-bit stereo | 88208 | 13544 | 40650 |
On x86, LAC is ~5-6× slower than FLAC `-5` and just under 3× slower than
FLAC `--best` on speech; on music the gaps are ~3× and ~1.2-1.3×. libFLAC
ships hand-tuned SSE intrinsics for its autocorrelation kernel, whereas LAC
relies on LLVM autovectorization.
End-to-end perf barely changes with `target-cpu=native`: the kernel does
pick up AVX-512 zmm dot-products, but the frame encode is bottlenecked
elsewhere (Rice k-search and bitstream assembly dominate the remaining
time).
On RPi5 (ARM Cortex-A76, NEON) LAC runs ~2.5× slower than on 7840HS in
absolute terms, but the *ratio* against FLAC shifts noticeably: LAC is
~6-7× slower than FLAC `-5` on speech (wider gap, libFLAC's NEON path
is well-tuned for the 16 kHz / 16-bit content) but only ~3× slower on
96 kHz / 24-bit music (narrower gap — 24-bit content gives libFLAC's
specialization less leverage). Against FLAC `--best` the music gap
narrows further to ~1.2-1.3×. The 7840HS-vs-RPi5 delta in the LAC
column shows scalar autovec quality is broadly comparable across x86
and ARM backends; the delta in the FLAC columns shows where hand-tuned
intrinsics disappear on a different ISA.
On VF2 (RISC-V SiFive U74, RVV 0.7 — not supported by mainline libFLAC
or LLVM autovec yet) LAC runs ~10× slower than on RPi5. Both codecs
fall back to pure scalar execution; the gap between them *widens* to
~12-13× on speech and ~6× on music vs FLAC `-5`, or ~5× / ~2× vs
FLAC `--best`. Two factors compound: the U74 is a dual-issue
in-order core vs the Cortex-A76's 4-wide out-of-order design (base IPC is
~2× lower at the ISA-agnostic level), and LLVM's scalar Rust codegen
for RISC-V is less mature than its x86/ARM output — tighter inner
loops in libFLAC's hand-written C survive this better than LAC's
Rust does. The absolute numbers are still useful: even at 88 s to
encode 5 minutes of 96/24 stereo audio, LAC comfortably meets
realtime for streaming use (see the P99 latency table below).
### Per-frame encode latency P99 (µs)
All rows use real AMI speech samples. Frame sample count sets the
real-time deadline; P99 must stay below that period for the frame to
ship inside its own playback slot.
| Test | Frame | Period | 7840HS P99 | RPi5 P99 | VF2 P99 |
|---|---|---:|---:|---:|---:|
| `latency_headset_speech_160` | 160 @ 16 kHz | 10 ms | 20 | 38 | 235 |
| `latency_headset_speech_320` | 320 @ 16 kHz | 20 ms | 36 | 76 | 499 |
| `latency_headset_speech_480` | 480 @ 16 kHz | 30 ms | 37 | 81 | 635 |
| `latency_headset_speech_prime` | 503 @ 16 kHz | 31 ms | 23 | 52 | 387 |
| `latency_array_speech_320` | 320 @ 16 kHz | 20 ms | 42 | 77 | 506 |
| `latency_mixed_meeting_320` | 320 @ 16 kHz | 20 ms | 43 | 84 | 551 |
P99 headroom is ~400-1300× on 7840HS, ~240-600× on RPi5, and
~36-81× on VF2. Every row on every platform stays comfortably inside
the realtime deadline — even VF2's worst case (`mixed_meeting_320` at
551 µs on a 20 ms frame) has 36× margin. LAC meets its streaming
contract on every target tested.
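
The headroom figures follow from the table by dividing each frame period by its measured P99. A minimal sketch of that arithmetic, where `headroom` is an illustrative helper and not part of the benchmark harness:

```rust
/// Real-time headroom: frame period divided by P99 encode latency.
/// `headroom` is a hypothetical helper used only for illustration.
pub fn headroom(frame_samples: u32, sample_rate_hz: u32, p99_us: f64) -> f64 {
    // Frame period in microseconds.
    let period_us = f64::from(frame_samples) * 1_000_000.0 / f64::from(sample_rate_hz);
    period_us / p99_us
}
```

For the VF2 worst case, `headroom(320, 16_000, 551.0)` comes out at roughly 36.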
### MCU throughput (× realtime on one core)
Realtime multiplier = audio-ms processed per wall-clock-ms, per core.
"`20×` realtime" means one core sustains twenty simultaneous meetings
of the listed configuration.
| Test | Activity | 7840HS | RPi5 | VF2 |
|---|---|---:|---:|---:|
| `mcu_mix_1on1_voice` (P=2) | continuous | 279× | 145× | 22× |
| `mcu_mix_3people_voice` (P=3) | continuous | 193× | 95× | 14× |
| `mcu_mix_5people_voice` (P=5) | continuous | 120× | 57× | 9× |
| `mcu_mix_8people_voice` (P=8) | continuous | 77× | 35× | 5× |
| `mcu_mix_8people_dominant_speaker` (P=8) | rotating speaker | 106× | 43× | 6× |
| `mcu_mix_16people_voice` (P=16) | continuous | 39× | 17× | 2.5× |
MCU egress byte count as a fraction of SFU fanout egress on 7840HS:
1.00 (P=2, trivially equal), 0.60 (P=3), 0.36 (P=5), 0.22 (P=8
continuous), 0.35 (P=8 dominant-speaker), 0.10 (P=16). The continuous
case is the lower bound — SFU fanout scales quadratically in
participant count while MCU mix egress scales linearly, so the
relative savings grow as the meeting does. The dominant-speaker case
inverts that trend slightly: SFU fanout of N-1 near-silent streams is
almost free, so the SFU baseline falls faster than the MCU mix cost
does. These numbers are byte-accounting, not wall-clock.

784
Specification.md Normal file

@ -0,0 +1,784 @@
# LAC Wire Format
Normative specification of the LAC bitstream. This document is the authority
on byte layout, field semantics, and encoder/decoder constraints.
## 1. Conventions
- All multi-byte integer fields are **big-endian**.
- Bit streams are **MSB-first**: the first bit written occupies bit 7 of its
byte, subsequent bits fill lower positions, and a new byte begins once
eight bits have been emitted.
- Samples are **signed integers** passed as `i32` with magnitude bounded
by `|sample| ≤ 2²³ − 1`. The upper 9 bits of each `i32` must be a
consistent sign extension of the 24-bit-magnitude value. Narrower source
formats (8-bit, 16-bit, 20-bit integer PCM) are passed through directly
— they trivially satisfy the magnitude bound — and compress at the bit
cost of their actual values, not a 24-bit ceiling. The codec does not
carry bit-depth metadata; the container or application layer is
responsible for remembering the source format.
- Sample rate is **not** part of the bitstream. The container or transport
carries it.
- A frame encodes **one channel**. Stereo is two independent streams of
frames, one per channel.
## 2. Frame Layout
A frame is a contiguous byte sequence:
```text
+--------+--------------------+
| header | rice_bitstream |
+--------+--------------------+
```
The `header` is fixed-structure (variable length because the coefficient
array depends on `prediction_order`). The `rice_bitstream` is a bit-packed
payload; its byte length is `ceil(total_rice_bits / 8)` with zero padding in
the low bits of the last byte.
Decoder input is the complete frame. There is no intra-frame continuation or
fragmentation — the transport layer handles that.
## 3. Frame Header
```text
Offset Size Field Type Constraint
------ ---- -------------------- ------- ----------------------------
0-1 2 sync_word u16 BE == 0x1ACC
2 1 prediction_order u8 ∈ [0, 32]
3 1 partition_order u8 ∈ [0, 7]
4 1 coefficient_shift u8 ∈ [0, 5]
5-6 2 frame_sample_count u16 BE ≥ 1, % (1 << partition_order) == 0
7+ 2·p lpc_coefficients i16 BE[] length = prediction_order = p
```
Total header length: `7 + 2 · prediction_order` bytes.
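
As a sketch of the layout above, the following serialises a header field by field. `write_header` is a hypothetical helper, not the reference API, and it skips the range checks a real encoder would apply.

```rust
/// Serialises a LAC frame header as laid out in the table above.
/// Hypothetical helper for illustration; field validation is omitted.
pub fn write_header(
    prediction_order: u8,
    partition_order: u8,
    coefficient_shift: u8,
    frame_sample_count: u16,
    coeffs: &[i16],
) -> Vec<u8> {
    assert_eq!(coeffs.len(), usize::from(prediction_order));
    let mut out = Vec::with_capacity(7 + 2 * coeffs.len());
    out.extend_from_slice(&0x1ACCu16.to_be_bytes()); // sync_word, big-endian
    out.push(prediction_order);
    out.push(partition_order);
    out.push(coefficient_shift);
    out.extend_from_slice(&frame_sample_count.to_be_bytes());
    for c in coeffs {
        out.extend_from_slice(&c.to_be_bytes()); // i16, big-endian
    }
    out
}
```

For a second-order predictor with stored coefficients `16384` and `-8192` at `coefficient_shift = 2` and a 4-sample frame, this yields 11 bytes: `1A CC 02 00 02 00 04 40 00 E0 00`.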
### 3.1 `sync_word`
Fixed value `0x1ACC`. Present to identify a LAC frame on lightly framed
transports and to reject foreign payloads at the first check. Decoders
**must** reject any frame whose first two bytes are not `0x1ACC`.
### 3.2 `prediction_order`
Integer order of the LPC analysis filter used to produce residuals.
- Value `0` is **verbatim mode**: no prediction, residuals equal the samples,
and the `lpc_coefficients` array is empty (zero bytes).
- Values `1` through `32` are standard LPC orders; `lpc_coefficients` carries
exactly that many predictor coefficients, interpreted in the Q-format
determined by `coefficient_shift` (§3.4).
Decoders **must** reject values above 32.
### 3.3 `partition_order`
Controls how the residual stream is split for Rice coding.
- `partition_count = 1 << partition_order`.
- The residual stream is divided into `partition_count` equal partitions of
`frame_sample_count / partition_count` samples each.
Decoders **must** reject values above 7 and **must** reject frames where
`frame_sample_count` is not a multiple of `partition_count`.
### 3.4 `coefficient_shift`
Controls the fixed-point scale of the stored Q-format LPC predictor
coefficients. Coefficients are stored as 16-bit integers interpreted as
`Q(15 − coefficient_shift)`:
| shift | Q-format | Real-value range | Use case |
|-------|----------|------------------|----------|
| 0 | Q15 | `[−1, 1)` | Coefficients with magnitude < 1 (most orders > 1) |
| 1 | Q14 | `[−2, 2)` | Low-frequency content, `a[1]` near 2 |
| 2 | Q13 | `[−4, 4)` | Extreme bass / narrow resonances |
| 3 | Q12 | `[−8, 8)` | Pathological transients |
| 4 | Q11 | `[−16, 16)` | Reserved for synthetic signals |
| 5 | Q10 | `[−32, 32)` | Upper bound; decoder rejects larger values |
The encoder **must** select the smallest `coefficient_shift` at which no
coefficient's real value exceeds the representable range for that shift —
i.e., the smallest scale that does not clamp. Smaller shifts give finer
precision and thus smaller residuals when no clamping is required.
If no `coefficient_shift ∈ [0, 5]` suffices (the real coefficient
magnitude exceeds the Q10 range at `shift = 5`), the encoder
saturates each offending coefficient independently to the i16 range
`[−32768, 32767]`. Bit-exact round-trip is preserved because the
decoder applies the synthesis formula to whatever 16-bit values the
wire carries; the cost of saturation is compression — the predictor
no longer matches the encoder's ideal coefficients, so residuals
grow. Real audio at the input-magnitude contract (§1) rarely reaches
this case; synthetic or adversarial inputs can force it.
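
The selection and saturation rules can be sketched as follows. This is a sketch under assumptions: coefficients arrive as `f64` real values, quantisation uses `f64::round` (the reference encoder works in Q31 integers instead), and `select_shift` and `quantize` are illustrative names.

```rust
/// Scales a real-valued coefficient to Q(15 - shift) without clamping.
fn scale(c: f64, shift: u8) -> i64 {
    let one = f64::from(1u32 << (15 - u32::from(shift)));
    (c * one).round() as i64
}

/// Smallest shift in 0..=5 at which every coefficient fits in i16,
/// or 5 if none does (the saturation fallback).
pub fn select_shift(coeffs: &[f64]) -> u8 {
    (0..=5u8)
        .find(|&s| {
            coeffs
                .iter()
                .all(|&c| (-32768..=32767).contains(&scale(c, s)))
        })
        .unwrap_or(5)
}

/// Returns the chosen shift and the stored 16-bit coefficients,
/// saturating each offending coefficient independently.
pub fn quantize(coeffs: &[f64]) -> (u8, Vec<i16>) {
    let s = select_shift(coeffs);
    let stored = coeffs
        .iter()
        .map(|&c| scale(c, s).clamp(-32768, 32767) as i16)
        .collect();
    (s, stored)
}
```

With these assumptions a coefficient of `1.0` lands at shift 1 as `16384`, while an out-of-range value such as `100.0` falls through to shift 5 and saturates to `32767`.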
Decoders **must** reject values above 5. The shift applies uniformly to
every coefficient in `lpc_coefficients`; there is no per-coefficient
scale.
When `prediction_order == 0` (verbatim frame), `coefficient_shift`
**must** be `0`. The shift only modifies how stored coefficients are
interpreted, and a verbatim frame stores none. Decoders **must**
reject frames with `prediction_order == 0` and `coefficient_shift != 0`
as malformed; this rule closes the space of legal but meaningless
headers so two implementations agree bit-for-bit on which inputs
round-trip.
### 3.5 `frame_sample_count`
Number of audio samples produced by this frame (also the number of residuals
in the Rice bitstream). The value **must** be in `[1, 65535]`.
Decoders **must** reject `frame_sample_count == 0`: a zero-sample frame
trivially satisfies the partition-divisibility check below
(`0 mod n == 0` for any `n`) but carries no audio and has no legal
Rice payload.
For compliance with `partition_order`, the value **must** satisfy
`frame_sample_count mod (1 << partition_order) == 0`. Decoders **must**
reject frames where this does not hold.
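
The header rules in §3.1 to §3.5 can be collected into one validating parser. The sketch below is illustrative only: `HeaderSketch`, `HeaderError`, and `parse_header_sketch` are hypothetical names rather than the reference API, and the order in which simultaneous violations are reported is this sketch's own choice.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadSync,
    PredictionOrder,
    PartitionOrder,
    CoefficientShift,
    VerbatimShift,
    ZeroSamples,
    NotDivisible,
}

#[derive(Debug, PartialEq, Eq)]
pub struct HeaderSketch {
    pub prediction_order: u8,
    pub partition_order: u8,
    pub coefficient_shift: u8,
    pub frame_sample_count: u16,
    pub coeffs: Vec<i16>,
}

/// Parses and validates a header; returns it plus the bytes consumed.
pub fn parse_header_sketch(data: &[u8]) -> Result<(HeaderSketch, usize), HeaderError> {
    if data.len() < 7 {
        return Err(HeaderError::Truncated);
    }
    if u16::from_be_bytes([data[0], data[1]]) != 0x1ACC {
        return Err(HeaderError::BadSync);
    }
    let (p, part, shift) = (data[2], data[3], data[4]);
    if p > 32 {
        return Err(HeaderError::PredictionOrder);
    }
    if part > 7 {
        return Err(HeaderError::PartitionOrder);
    }
    if shift > 5 {
        return Err(HeaderError::CoefficientShift);
    }
    if p == 0 && shift != 0 {
        return Err(HeaderError::VerbatimShift);
    }
    let n = u16::from_be_bytes([data[5], data[6]]);
    if n == 0 {
        return Err(HeaderError::ZeroSamples);
    }
    if n % (1u16 << part) != 0 {
        return Err(HeaderError::NotDivisible);
    }
    let end = 7 + 2 * usize::from(p);
    if data.len() < end {
        return Err(HeaderError::Truncated);
    }
    let coeffs = data[7..end]
        .chunks_exact(2)
        .map(|b| i16::from_be_bytes([b[0], b[1]]))
        .collect();
    let header = HeaderSketch {
        prediction_order: p,
        partition_order: part,
        coefficient_shift: shift,
        frame_sample_count: n,
        coeffs,
    };
    Ok((header, end))
}
```

The returned byte count is where the Rice bitstream begins in the same buffer.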
### 3.6 `lpc_coefficients`
Array of `prediction_order` predictor coefficients, each a 16-bit big-endian
signed integer interpreted in `Q(15 − coefficient_shift)` format — see §3.4
for the shift semantics.
The wire format does not distinguish coefficients by derivation. The
synthesis formula below applies identically whether the encoder
obtained the values from Levinson-Durbin analysis, from a fixed
coefficient template (e.g. FLAC-style integer predictors), from a
trained model, or from any other strategy. What goes on the wire is
just `prediction_order` 16-bit integers; how the encoder chose them
is encoder-internal and not observable to the decoder.
Synthesis formula (applied in the decoder):
```text
s = 15 − coefficient_shift
bias = 1 << (s − 1)
predict[i] = (Σ_{j=0..terms-1} coeff[j] · sample[i − j − 1] + bias) >> s
sample[i] = residual[i] + predict[i]
```
where `terms = min(i, prediction_order)`. The `+ bias` term implements
round-to-nearest for the right shift and is **required** for bit-exact
decoding. For the default `coefficient_shift = 0`: `s = 15`, `bias = 16384`.
The `>> s` operator **must** be an **arithmetic right shift** on
signed integers — equivalent to floor division by `2^s`. Combined with
the `+ bias` pre-add, this implements **round-half toward +∞**: on a
value whose scaled form is exactly `k + 0.5`, the result is `k + 1`
for both positive and negative `k`. Implementations using truncating
integer division (C's `/` on signed integers, which rounds toward
zero) **will diverge** from this on any `sum + bias` that is negative
and not evenly divisible by `2^s`: arithmetic shift rounds further
from zero, truncating division rounds toward zero. Concrete example:
at `s = 15`, `sum = -32769`, `bias = 16384`, arithmetic shift gives
`(-16385) >> 15 = -1`, truncating division gives `-16385 / 32768 = 0`.
Decoders in languages whose native integer division does not floor
**must** emulate arithmetic right shift explicitly on the signed
accumulator.
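
The divergence is easy to reproduce. In the sketch below (helper names are illustrative), `round_shift` is the required behaviour, `round_div_truncating` is what a naive port using `/` would compute, and `round_div_floor` shows that floor division matches the shift. Rust's `>>` on `i64` is already an arithmetic shift.

```rust
/// Required behaviour: bias pre-add, then arithmetic right shift.
pub fn round_shift(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) >> s
}

/// What a port using truncating division would compute (diverges
/// for negative values that are not multiples of 2^s).
pub fn round_div_truncating(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias) / (1i64 << s)
}

/// Floor division by a positive divisor agrees with the shift.
pub fn round_div_floor(sum: i64, s: u32) -> i64 {
    let bias = 1i64 << (s - 1);
    (sum + bias).div_euclid(1i64 << s)
}
```

With `sum = -32769` and `s = 15`, the shift and the floor division both return `-1`, while the truncating division returns `0`.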
#### Accumulator width
The inner sum `Σ coeff[j] · sample[i − j − 1]` **must** be computed in
a signed integer accumulator of at least **49 bits** (equivalently: an
`i64` or wider). Worst-case bounds at `prediction_order = 32`,
`coefficient_shift = 5` (Q10), and full-scale samples give a product
of magnitude `(2¹⁵) · (2²³ − 1) ≈ 2³⁸` per term, summed over 32 terms
for a maximum of `~2⁴³`. Adding the bias keeps the result below `2⁴⁴`.
A 32-bit accumulator overflows at orders ≥ 16 with full-scale inputs —
implementations that reach for `int32_t` because samples are 32-bit
will silently corrupt high-order frames.
JavaScript / TypeScript implementers should note that `Number` is an
IEEE 754 double, not a signed integer: its 53-bit safe-integer range
covers in-contract accumulator values, but adversarial bitstreams (see
§6.2) can produce out-of-contract samples whose synthesis arithmetic
lands in the 2⁴⁹ to 2⁵¹ range and beyond, where `Number` silently loses
low bits to float rounding. For bit-exact spec compliance in JS/TS,
**use `BigInt` for the accumulator** — it has the integer semantics
the spec requires; `Number` does not.
#### Warm-up (`terms == 0`)
When `i == 0`, `terms = min(0, prediction_order) = 0`. The sum is
empty and `predict[0] = 0` — the `(0 + bias) >> s` formula is **not**
applied. Stating this explicitly avoids an implementation that
mechanically applies the formula in the warm-up case and produces
`predict[0] = bias >> s`, which is zero only in specific
`(bias, s)` parametrisations and surprising in any other.
For `0 < i < prediction_order`, the sum truncates to the available
`i` predecessors (`terms = i`). The formula applies as stated.
#### Sign convention for stored coefficients
The synthesis formula uses `+Σ`. Classical Levinson-Durbin
implementations that derive LPC from the error-prediction AR model
```text
x[n] = −Σ a[j] · x[n-j] + e[n] (error convention)
```
produce coefficients `a[j]` whose sign is the **opposite** of what
the synthesis formula expects; those encoders **must** negate before
quantisation so the wire value is `coeff[j-1] = −a[j]`.
Implementations using the predictor convention
```text
x̂[n] = +Σ c[j] · x[n-j] (predictor convention)
```
store `c[j]` directly.
Both conventions are common in DSP literature. Encoders **must**
verify that the coefficients emitted on the wire, when substituted
into the synthesis formula above, reproduce the encoder's own
prediction. The reference implementation uses the error convention
and negates at quantisation time.
#### Overflow semantics of the final add
The `residual[i] + predict[i]` add is specified as a **wrapping i32
add** (two's complement, modulo `2³²`; in languages without native
signed-overflow semantics, compute `(residual + predict) & 0xFFFFFFFF`
and then re-interpret as a signed 32-bit integer via sign-extension
of bit 31). On well-formed bitstreams — those produced by a compliant
encoder from in-contract samples (§1) — the result stays inside the
sample-magnitude contract and the wrap is never observable.
Adversarial bitstreams with crafted coefficients and residuals **may**
produce any `i32` value; the decoder **must not** panic, abort, or
reject on the basis of this add's result. The consequences of
out-of-contract decoder output are addressed in §6.2.
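
Putting the pieces of this section together, a decoder-side synthesis loop might look like the sketch below. It assumes the header has already been validated (so `shift ≤ 5`) and that `residuals` holds the Rice-decoded values; `synthesize` is an illustrative name, not the reference API.

```rust
/// Reconstructs samples from residuals using the synthesis formula:
/// i64 accumulator, bias pre-add, arithmetic shift, wrapping i32 add.
pub fn synthesize(coeffs: &[i16], shift: u8, residuals: &[i32]) -> Vec<i32> {
    let s = 15 - u32::from(shift);
    let bias = 1i64 << (s - 1);
    let mut out: Vec<i32> = Vec::with_capacity(residuals.len());
    for (i, &res) in residuals.iter().enumerate() {
        // Warm-up: only the predecessors that exist take part.
        let terms = i.min(coeffs.len());
        let predict = if terms == 0 {
            0i64 // empty sum: the rounding formula is not applied
        } else {
            let mut acc = 0i64;
            for j in 0..terms {
                acc += i64::from(coeffs[j]) * i64::from(out[i - j - 1]);
            }
            (acc + bias) >> s // arithmetic shift on i64
        };
        // Narrowing `as i32` plus wrapping_add is the sum modulo 2^32.
        out.push(res.wrapping_add(predict as i32));
    }
    out
}
```

A verbatim frame (`coeffs` empty) returns the residuals unchanged, and an out-of-contract sum wraps instead of panicking.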
## 4. Rice Bitstream
Immediately follows the header. Flat MSB-first bitstream structured as
consecutive partition payloads:
```text
+---------+---------+-----+-----------+
| part. 0 | part. 1 | ... | part. P-1 |
+---------+---------+-----+-----------+
```
where `P = 1 << partition_order`. Each partition has the same structure:
```text
+-------+-------------+-------------+-----+-------------+
| k (5) | codeword 0 | codeword 1 | ... | codeword M-1|
+-------+-------------+-------------+-----+-------------+
```
where `M = frame_sample_count / P` is the per-partition residual count.
Partitions are **bit-contiguous**: the 5-bit `k` field of partition
`i + 1` begins at the bit immediately following the last codeword of
partition `i`. There is no byte alignment between partitions. Only
the final trailing padding described in §4.3 is byte-aligned.
Within a partition, the bit cursor likewise advances continuously:
codeword 0 begins at the bit immediately following the 5-bit `k`
field, codeword 1 immediately after codeword 0's remainder bit, and so
on. A conformant decoder maintains a single bit-read position across
the entire Rice bitstream — from the `k` of partition 0 through the
last codeword of partition P−1 — and never realigns to a byte or bit
boundary between fields. This is implicit in the byte-stream decoder
design (a bit reader that consumes bits sequentially needs no
special handling at field boundaries) but stated here so second-team
implementations do not introduce a spurious alignment.
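
A single sequential bit cursor is enough to satisfy this. The sketch below is one possible MSB-first reader; `BitReader` and its methods are illustrative names, and `read_bits` assumes `n ≤ 32`.

```rust
/// MSB-first bit reader with one cursor for the whole Rice bitstream.
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // absolute bit position from the start of `data`
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Reads one bit, or returns None when the buffer is exhausted.
    pub fn read_bit(&mut self) -> Option<bool> {
        let byte = *self.data.get(self.pos / 8)?;
        let bit = (byte >> (7 - self.pos % 8)) & 1;
        self.pos += 1;
        Some(bit == 1)
    }

    /// Reads `n` bits (n <= 32) MSB-first into the low bits of a u32.
    pub fn read_bits(&mut self, n: u32) -> Option<u32> {
        let mut v = 0u32;
        for _ in 0..n {
            v = (v << 1) | u32::from(self.read_bit()?);
        }
        Some(v)
    }

    pub fn bit_position(&self) -> usize {
        self.pos
    }
}
```

Reading a 5-bit `k` with `read_bits(5)` and then the codewords with `read_bit` never touches a byte boundary explicitly; the cursor simply advances.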
### 4.1 Per-Partition Parameter `k`
Five-bit unsigned integer, MSB-first, immediately before the partition's
codewords. `k` is the Rice parameter for this partition and must be in
`[0, 23]`. Decoders **must** reject values above 23 as malformed.
### 4.2 Codeword
Each residual is encoded by:
1. **Zigzag mapping** from signed to unsigned:
```text
z = (r << 1) ^ (r >> 31) interpreted as u32
```
where `(r >> 31)` is an **arithmetic right shift** on the i32
residual, sign-extending the sign bit to all 32 bit positions — `0`
for non-negative `r`, `-1` (all ones) for negative `r`. The entire
expression is **masked to 32 bits** before being interpreted as
`u32`; in languages with arbitrary-precision integers (e.g. Python)
or where native bitwise ops return signed 32-bit (e.g. JavaScript /
TypeScript `Number`, where `(x ^ y) >>> 0` coerces to u32), this
mask is explicit (`& 0xFFFFFFFF` or `>>> 0`) and **required**;
without it, negative residuals produce zigzag values with extra
high bits set. Implementations in languages whose native right
shift is logical on unsigned types **must** coerce `r` to a signed
32-bit type first; implementations in languages where `>>` on
signed types is implementation-defined (e.g. pre-C++20 C/C++)
**must** emulate the arithmetic shift explicitly.
The `r << 1` factor is always safe on i32. Residuals on the
encoder side are bounded by `|r| ≤ |sample| + |predict| ≤ 2·2²³
≈ 2²⁴`, so `r << 1` has magnitude `≤ 2²⁵` and fits in i32 without
overflow or undefined behaviour — even for `r` at the most
negative value the encoder can ever produce from in-contract
input (§1). Decoder-side code does not perform this shift; the
inverse uses `z >> 1` on a u32, which is always defined.
The mapping sends
```text
{0, −1, 1, −2, 2, −3, 3, …} → {0, 1, 2, 3, 4, 5, 6, …}
```
so small magnitudes of either sign map to small unsigned values.
The decoder's inverse is
```text
r = ((z >> 1) as i32) ^ -((z & 1) as i32)
```
where `z >> 1` is a logical right shift on the u32 and the second
operand is the integer negation of `z & 1` (`0` or `-1`). Stating this inverse
explicitly removes any ambiguity about how an implementation must
invert the zigzag.
2. **Rice code** at parameter `k`:
- **Unary part**: `q = z >> k` zero-bits followed by a single
terminating 1-bit.
- **Remainder part**: `k` bits of `z & ((1 << k) − 1)`, MSB-first.
(The remainder part is absent when `k == 0`.)
Total codeword length: `q + 1 + k` bits.
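
The two steps above can be sketched in a few lines. The function names are illustrative, and `rice_codeword` returns the bits as a `Vec<bool>` purely for readability; a real encoder would write them straight into a packed buffer.

```rust
/// Zigzag map from signed residual to unsigned value.
/// `>>` on i32 is an arithmetic shift in Rust.
pub fn zigzag(r: i32) -> u32 {
    ((r << 1) ^ (r >> 31)) as u32
}

/// Inverse zigzag: logical shift on u32, XOR with 0 or -1.
pub fn unzigzag(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}

/// Bits of one Rice codeword at parameter `k`, in stream order:
/// q zero-bits, a terminating one-bit, then k remainder bits MSB-first.
pub fn rice_codeword(r: i32, k: u32) -> Vec<bool> {
    let z = zigzag(r);
    let q = z >> k;
    let mut bits = vec![false; q as usize];
    bits.push(true);
    for i in (0..k).rev() {
        bits.push((z >> i) & 1 == 1);
    }
    bits
}
```

For example, `r = -3` maps to `z = 5`; at `k = 2` that gives `q = 1` and remainder `01`, so the codeword is `0 1 0 1`, which is `q + 1 + k = 4` bits.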
#### Decoder-side unary-run bound
Decoders **must** reject any codeword whose unary run length satisfies
```text
q > (2³² − 1) >> k (equivalently, q > u32::MAX >> k)
```
A valid codeword reconstructs `z = (q << k) | remainder` as a u32.
`q > u32::MAX >> k` implies `q << k ≥ 2³²`, which either overflows
u32 silently (a critical decoder bug class — corrupt output with no
error) or indicates a malformed stream. Either way the frame **must**
be discarded with `InvalidParameter` or equivalent rejection class
(§6). The bound varies with `k`: at `k = 23` it is `511`, at `k = 0`
it is `u32::MAX` (no practical constraint).
This rule also caps the CPU cost of unary scanning on adversarial
input: without the cap, a decoder could be forced to scan an
arbitrarily long run of zero bits before reaching either a `1` or
the buffer end.
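
A decoder can enforce the cap while it scans the unary run, so it never counts past the bound. The sketch below takes any iterator of bits to stay self-contained; `RiceError` and `decode_codeword` are hypothetical names that only mirror the rejection classes named in this document.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RiceError {
    Truncated,
    InvalidParameter,
}

/// Decodes one codeword at parameter `k` and returns the residual.
pub fn decode_codeword(
    bits: &mut impl Iterator<Item = bool>,
    k: u32,
) -> Result<i32, RiceError> {
    if k > 23 {
        return Err(RiceError::InvalidParameter);
    }
    let cap = u32::MAX >> k;
    let mut q: u32 = 0;
    loop {
        match bits.next() {
            None => return Err(RiceError::Truncated),
            Some(true) => break, // terminating 1-bit
            Some(false) => {
                // One more zero would make q exceed the cap.
                if q == cap {
                    return Err(RiceError::InvalidParameter);
                }
                q += 1;
            }
        }
    }
    let mut rem = 0u32;
    for _ in 0..k {
        let b = bits.next().ok_or(RiceError::Truncated)?;
        rem = (rem << 1) | u32::from(b);
    }
    let z = (q << k) | rem; // cannot overflow because q <= cap
    Ok(((z >> 1) as i32) ^ -((z & 1) as i32))
}
```

At `k = 23` the scan gives up on the 512th consecutive zero bit, so the work per codeword is bounded regardless of the input.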
### 4.3 Byte Padding
After the last codeword of the last partition, any unused bits of the final
byte are zero-padded on the LSB side. The encoder writes `0` for all padding
bits; the decoder ignores them.
### 4.4 Bitstream Length
The Rice bitstream's total bit length is the sum of codeword bit
lengths across every partition, plus 5 bits per partition for the `k`
fields:
```text
total_bits = P · 5 + Σ_{all residuals} (q + 1 + k_partition)
```
This depends on every residual's quotient and cannot be computed
from the frame header alone. A decoder **stream-decodes** until it
has produced exactly `frame_sample_count` samples, then stops.
Any bits remaining inside the last byte are padding (§4.3) and
carry no information.
A decoder **must not** require the Rice bitstream's byte length to be
signalled out-of-band. The header plus the zero-padded byte-aligned
tail fully determines the frame boundary; `parse_header`'s
`bytes_consumed` return plus streaming Rice decode locates the end of
the frame in the input buffer.
## 5. Degenerate Cases
### 5.1 All-Zero Frame
For an all-zero sample vector, the encoder **must** use
`prediction_order = 0` because the Levinson-Durbin recursion is
undefined at `R[0] = 0`. Residuals equal the input (all zeros).
Partition-order and per-partition `k` selection remain at the
encoder's discretion; any legal `(partition_order, k)` combination
produces a bit-exact-decodable frame. The minimum-cost choice —
`partition_order = 0`, `k = 0` — produces a Rice payload of exactly
`5 + frame_sample_count` bits. Compliant encoders are **not**
required to pick this minimum.
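
As a worked example of the minimum-cost choice, the sketch below builds a complete all-zero frame by hand. `encode_silence` is an illustrative helper; the bytes it produces are one legal encoding and are not claimed to match any particular fixture.

```rust
/// Builds an all-zero frame with prediction_order = 0,
/// partition_order = 0, coefficient_shift = 0 and k = 0.
pub fn encode_silence(frame_sample_count: u16) -> Vec<u8> {
    assert!(frame_sample_count >= 1);
    // Header: sync word, three zero bytes, then the sample count.
    let mut out = vec![0x1A, 0xCC, 0, 0, 0];
    out.extend_from_slice(&frame_sample_count.to_be_bytes());
    // Payload: 5 zero bits for k, then one 1-bit per zero residual.
    let total_bits = 5 + usize::from(frame_sample_count);
    let mut payload = vec![0u8; (total_bits + 7) / 8];
    for bit in 5..total_bits {
        payload[bit / 8] |= 0x80 >> (bit % 8); // MSB-first packing
    }
    out.extend_from_slice(&payload);
    out
}
```

For 8 samples the payload is 13 bits, `00000 11111111` plus three padding zeros, so the frame is the 7 header bytes followed by `07 F8`.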
### 5.2 Single-Sample Frame
`frame_sample_count = 1` is valid but forces `partition_order = 0` (the only
value that divides 1 evenly). The single sample is Rice-coded directly
because no predecessors exist for any LPC order.
## 6. Error Recovery
Decoders **must** detect and reject every frame that violates the
constraints elsewhere in this document. The exhaustive list of
rejection classes, each of which is a distinct error condition so
callers can distinguish them in telemetry, is:
1. **Sync word mismatch** — bytes `0-1` differ from `0x1ACC` (§3.1).
2. **`prediction_order` out of range** — value > `32` (§3.2).
3. **`partition_order` out of range** — value > `7` (§3.3).
4. **`coefficient_shift` out of range** — value > `5` (§3.4).
5. **Verbatim frame with non-zero shift** — `prediction_order == 0` and
`coefficient_shift != 0` (§3.4).
6. **`frame_sample_count == 0`** (§3.5).
7. **`frame_sample_count` not divisible by `partition_count`** (§3.3).
8. **Buffer truncated** — fewer bytes than the header plus coefficient
array requires, or fewer bits than the Rice bitstream demands
during streaming decode. This class is intentionally coarse-grained:
a single `Truncated` variant covers header truncation, missing `k`
fields, mid-codeword exhaustion, and every other "buffer ends early"
shape the decoder can encounter. Sub-categorising these provides no
caller benefit — the recovery action (discard and substitute
silence, see §6.1) is identical regardless of where truncation
happened.
9. **Per-partition `k` out of range** — value > `23` (§4.1).
10. **Unary-run cap exceeded** — any codeword with `q > u32::MAX >> k`
(§4.2).
On any of these, the decoder **must** discard the frame, produce no
output samples, and signal the error to the caller. **No partial
state may propagate** to the next frame's decode — frames are
independent (§2), so subsequent frames decode cleanly regardless.
### 6.1 Caller-side silence substitution
On rejection, the caller substitutes `frame_sample_count` zeros
(silence) for the frame period. The count is obtained as follows:
- **Post-header rejections** (classes 8-10 above — `Truncated` in the
Rice bitstream, `InvalidParameter` during Rice decode): the frame
header parsed successfully before the failure, so the count is
recoverable. The caller re-parses just the header on the same buffer
(reference API: `parse_header(data)`) and reads `frame_sample_count`
from the resulting `AudioFrameHeader`.
- **Pre-header rejections** (classes 1-7 above): the header itself
failed; the frame length is not recoverable from the bitstream. The
caller **must** fall back to a session-level default frame size
carried out-of-band by the container or transport (WebRTC and QUIC
audio sessions typically negotiate this at session setup).
This asymmetry is inherent to the wire format: `frame_sample_count`
lives inside the header at offset 5, so any rejection that happens
while parsing bytes 0-4 precedes its discovery.
### 6.2 Decoder output magnitude
On well-formed bitstreams produced by a compliant encoder from
in-contract samples (§1), decoder output satisfies
`|sample| ≤ 2²³ − 1`.
Adversarial bitstreams — those with hand-crafted coefficients and
residuals that pass every rejection check in this section yet
produce arithmetic results outside the sample-magnitude contract —
**may** produce output samples of any `i32` value, including values
that exceed `2²³ − 1`. The decoder **must not** panic or reject on
this basis: the wrapping-add semantics of §3.6 are precisely what
makes every bit sequence produce a defined output, which is the
ground of the "no partial state propagates" contract at the top of
this section.
Callers that re-feed decoder output into LAC's encoder (for example,
an MCU decode → PCM mix → re-encode pipeline) **should** validate or
clamp to the input magnitude contract before re-encoding. A
compliant encoder assumes its input satisfies `|sample| ≤ 2²³ − 1`
and is not required to re-validate.
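
A caller-side guard for that pipeline can be as small as the sketch below. `clamp_to_contract` is an illustrative name, and clamping (rather than rejecting the frame) is one of the two options the text allows.

```rust
/// Largest magnitude allowed by the input contract: 2^23 - 1.
pub const MAX_MAGNITUDE: i32 = (1 << 23) - 1;

/// Clamps every sample into [-(2^23 - 1), 2^23 - 1] in place.
pub fn clamp_to_contract(samples: &mut [i32]) {
    for s in samples.iter_mut() {
        *s = (*s).clamp(-MAX_MAGNITUDE, MAX_MAGNITUDE);
    }
}
```

Running this over mixed PCM before re-encoding keeps the encoder's input assumption intact even if an upstream decoder was fed an adversarial frame.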
## 7. Encoder Guidance (non-normative)
The reference encoder's search has three phases:
```text
# Phase 0: all-zero short-circuit
R[0] = Σ sample[i]²
if R[0] == 0:
emit frame with prediction_order = 0, any legal partition_order (§5.1)
return
# Phase 1: sparse LPC order grid with stop-when-stale early-out
for prediction_order in [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32]:
coeffs_q31 = levinson_durbin(samples, prediction_order) # cached
shift = smallest s such that every |coeff_real| < 2^s
coeffs_stored = quantize(coeffs_q31, to: Q(15 - shift))
residuals = compute_residuals(samples, coeffs_stored, shift)
for partition_order in 0..=7:
if frame_sample_count % (1 << partition_order) != 0: continue
rice_bits = estimate_cost(residuals, partition_order)
total = header_bits(prediction_order) + rice_bits
track minimum over (prediction_order, partition_order)
if no improvement for 2 consecutive grid entries: break
# Phase 2: fixed-predictor post-pass
for (fp_order, fp_coeffs, fp_shift) in FIXED_PREDICTORS:
residuals = compute_residuals(samples, fp_coeffs, fp_shift)
for partition_order in 0..=7:
if frame_sample_count % (1 << partition_order) != 0: continue
rice_bits = estimate_cost(residuals, partition_order)
total = header_bits(fp_order) + rice_bits
track minimum over (fp_order, partition_order)
emit frame with the (order, partition_order) that minimised `total`
```
The sparse grid + early-out is a speed/compression trade-off; a
compliant encoder may still exhaustively search every integer order
`0..=32` for marginal gains at higher cost. The produced bitstreams
are interchangeable.
The fixed-predictor post-pass tries FLAC-style integer predictors
(orders 1-4 with a small static coefficient table) after the LPC
grid. These evaluate quickly and occasionally beat the Levinson-
Durbin winner on content where a low-order integer polynomial fits
better than the statistically-optimal LPC fit — silent-plus-DC, very
smooth tones, polynomial-ish sensor data. Running them second avoids
tripping the stop-when-stale heuristic in the LPC phase.
The reference encoder's `FIXED_PREDICTORS` table, materialising the
FLAC-style `[1]`, `[2, −1]`, `[3, −3, 1]`, `[4, −6, 4, −1]` integer
predictors at the smallest `coefficient_shift` that represents each
coefficient without clamping:
| `prediction_order` | `lpc_coefficients` (Q-format integers) | `coefficient_shift` | Real-value interpretation |
|-------------------:|----------------------------------------|--------------------:|---------------------------|
| 1 | `[16384]` | 1 (Q14) | `[1]` |
| 2 | `[16384, −8192]` | 2 (Q13) | `[2, −1]` |
| 3 | `[24576, −24576, 8192]` | 2 (Q13) | `[3, −3, 1]` |
| 4 | `[16384, −24576, 16384, −4096]` | 3 (Q12) | `[4, −6, 4, −1]` |
These are the exact wire-format bytes a second-team encoder would emit
to match the reference's fixed-predictor outputs bit-for-bit. Compliant
encoders MAY use a different set (or none), since §3.6 treats the
coefficient field as opaque — decoders apply the synthesis formula
identically regardless of source.
The `R[0] == 0` short-circuit is both a correctness requirement (§5.1
— Levinson-Durbin is undefined at zero autocorrelation) and an
encoder-cost optimisation: on digital silence, the sparse grid and
fixed-predictor pass produce identical zero residuals and order 0
wins on header size alone.
Levinson-Durbin runs once to order 32 with all intermediate orders saved
into a flat buffer (one recursion pass yields all orders 1..32 at
`O(order²)` cost), so the outer loop fetch is free and order selection
is effectively `O(orders_tried × N)`.
`shift` is determined per order by the coefficient magnitudes — there
is no shift search, as smaller shifts are always at least as good as
larger ones when they don't clamp (saturation, §3.4, is the
fallback). Rice cost at a given `partition_order` is exact and
closed-form given the per-partition `k`, so the inner search
introduces no estimation error.
#### Levinson-Durbin numerical choices (reference, non-normative)
The reference encoder runs Levinson-Durbin with i64 autocorrelation
accumulators, Q31 working coefficients, and widens to i128 for
reflection-coefficient intermediates at orders where Q31 would lose
precision (typically above order ~12). Rounding on the Q31→Q(15 − shift)
quantisation step is round-half-up via `(a_q31 + bias) >> shift_amt`,
with `bias = 1 << (14 + shift)` — the direct analogue of the synthesis
formula's rounding (§3.6), chosen so analysis and synthesis agree on
tie-break direction.
None of these choices are normative. Two encoders making different
precision or rounding choices will produce different coefficient
bytes on the same input, but both bitstreams decode correctly under
§3.6 so long as the coefficients they emit faithfully represent their
own LPC decision. §3.6's "wire format doesn't distinguish coefficients
by derivation" clause is exactly what permits this freedom.
#### Rice `k` selection (reference, non-normative)
Cost is closed-form for a given `(partition, k)`: `bit_cost(k) =
N · (1 + k) + Σ (v >> k)` where `v` ranges over the zigzag-mapped
residuals of the partition. Exhaustive search over `k ∈ [0, 23]` is
always acceptable and is the simplest compliant choice.
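
A sketch of that exhaustive search, using the closed-form cost directly. `bit_cost` and `best_k` are illustrative names; the input is the partition's residuals after zigzag mapping, and the cost excludes the 5-bit `k` field because it is the same for every candidate.

```rust
/// Closed-form Rice cost in bits for one partition at parameter `k`.
pub fn bit_cost(zigzagged: &[u32], k: u32) -> u64 {
    let n = zigzagged.len() as u64;
    let unary: u64 = zigzagged.iter().map(|&v| u64::from(v >> k)).sum();
    n * (1 + u64::from(k)) + unary
}

/// Exhaustive search over k in 0..=23; the first minimum wins,
/// so ties resolve toward the smaller k.
pub fn best_k(zigzagged: &[u32]) -> (u32, u64) {
    let mut best = (0u32, bit_cost(zigzagged, 0));
    for k in 1..=23 {
        let cost = bit_cost(zigzagged, k);
        if cost < best.1 {
            best = (k, cost);
        }
    }
    best
}
```

For a single zigzag value of `5`, the costs at `k = 1, 2, 3` are all 4 bits, and the strict comparison keeps `k = 1`.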
The reference encoder uses convex descent: seed `k_seed =
⌊log₂(mean(v))⌋`, then walk either direction. Ties break **toward
smaller `k`** on descent (condition `cost ≤ best_cost`, so equal
costs at `k−1` overwrite `k`) and **strictly larger costs only** on
ascent (condition `cost < best_cost`, so equal costs at `k+1` don't
override `k`). This yields a unique `k` per partition matching an
exhaustive search's first-wins tie-break.
On the reference corpus, the sparse grid's compression matches
exhaustive-search output within ~0.2 percentage points (measured);
the differential test caps the acceptable excess at 0.5%.
Implementations that prefer tighter compression at higher encode
cost can extend the grid without wire-format consequences — decoders
do not care which orders an encoder tried.
### 7.1 Frame size (non-normative)
The codec accepts any `frame_sample_count` in `[1, 65535]`. Larger
frames amortise the 7-byte fixed header and the LPC coefficient vector
over more samples, and generally compress better. Smaller frames give
tighter latency and finer partition-order granularity on transient
content.
Recommended defaults by use case:
- **Real-time voice** (QUIC streaming, MCU mix): 160-320 samples at
16 kHz, or 480-960 at 48 kHz — matches 10-20 ms frame periods used
by Opus / WebRTC.
- **Real-time full-band** (game audio, music conferencing): 1024-2048
at 48 kHz (21-43 ms).
- **Offline / archival**: 4096-8192 at 48 kHz; compression gains
flatten past this.
Frame sizes that are a multiple of 128 (including every power of two
from 128 up) expose every `partition_order ∈ [0, 7]` to the encoder's
search. Other sizes restrict the search to partition counts that
divide evenly; any odd frame size forces `partition_order = 0`.
Encoders SHOULD prefer frame sizes divisible by a high power of two
when free to choose, since only the factors of two in
`frame_sample_count` unlock additional partition orders.
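
The divisibility rule from §3.3 makes it straightforward to list which partition orders a given frame size leaves open. A minimal sketch, with `legal_partition_orders` as an illustrative name:

```rust
/// Partition orders in 0..=7 whose partition count divides the frame.
pub fn legal_partition_orders(frame_sample_count: u16) -> Vec<u8> {
    (0..=7u8)
        .filter(|&p| frame_sample_count != 0 && frame_sample_count % (1u16 << p) == 0)
        .collect()
}
```

A 503-sample frame allows only order 0, 160 samples (32 · 5) allow orders 0 through 5, 320 samples allow 0 through 6, and 1024 samples allow all eight.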
## 8. Versioning
This document specifies **LAC version 1**, identified on the wire by
`sync_word = 0x1ACC`. No per-frame version byte is carried; the sync
word uniquely identifies the wire format.
Future revisions of the format **must** use a distinct `sync_word`. The
recommended allocation is `0x1ACD` for v2, `0x1ACE` for v3, and so on
inside the `0x1ACC..0x1ACF` cluster, with the cluster boundary making
casual grep / hex-dump inspection robust. A revision whose wire format
cannot be made compatible with v1 at the header level **must** pick a
sync word outside this cluster.
This approach is exhaustive by construction:
- A v1 decoder that encounters a v2 frame sees an unrecognised sync
word on the first check (§3.1) and rejects cleanly — the same error
path as foreign or corrupted payloads.
- A v2-aware decoder dispatches on the sync word before reading any
further field, so it can fall back to v1 parsing when appropriate
or decode v2 frames natively.
Because every field in §3 has a strict range with no reserved
high-bit patterns, in-place extension (flag bits inside existing
fields) is **not** a supported evolution path. New features go into a
new `sync_word`, not into reinterpreting existing field values.
Transports that multiplex LAC frames with other formats should frame
each LAC payload explicitly (length prefix or stream separator); the
sync word alone is not a framing delimiter, only a format identifier.
## 9. Implementation notes (non-normative)
### 9.1 GPU offload is out of scope
LAC is a scalar integer codec. The reference implementation, and any
conforming implementation this document anticipates, runs on a CPU.
GPU offload is deliberately not a goal:
- **Levinson-Durbin** is serial by construction (each iteration depends
on the previous) and its intermediate accumulator needs more than 64
bits of precision at higher orders — a poor fit for WGSL or SPIR-V
compute shaders, which have no native 128-bit integer arithmetic.
- **Rice decode** uses a data-dependent unary run for every residual;
on GPU execution models this diverges warps badly and its
sequential bit-cursor progression fights SIMD lane packing.
- **LPC synthesis** has a tight per-sample feedback loop (sample `i`
depends on samples `i-1`, `i-2`, …, `i-order`), so each channel is
inherently serial.
- **The one plausibly GPU-parallel phase** — residual computation
inside the encoder's order search — is also the phase where the
CPU's autovectorized implementation is already well-served by SIMD
on any modern target. At the measured encode latencies (P99 under
50 µs on x86 for a 20 ms frame period, >400× headroom), there is no
motivation to offload it.
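
The serial dependency in LPC synthesis is easiest to see in code. The following is a generic integer LPC reconstruction loop, not the normative arithmetic of this specification: the coefficient ordering, the `sum >> shift` scaling, and the name `lpc_synthesize` are assumptions made for the sketch.

```rust
/// Rebuild samples from residuals with a generic integer LPC predictor.
///
/// Assumptions of this sketch: `coeffs[j]` weights the sample `j + 1`
/// positions back, and the prediction is `sum >> shift`.
pub fn lpc_synthesize(
    warmup: &[i32],
    residuals: &[i32],
    coeffs: &[i16],
    shift: u32,
) -> Vec<i32> {
    assert_eq!(warmup.len(), coeffs.len(), "one warm-up sample per coefficient");
    let mut out = Vec::with_capacity(warmup.len() + residuals.len());
    out.extend_from_slice(warmup);
    for &r in residuals {
        let i = out.len();
        // The dot product for one sample could be vectorised...
        let acc: i64 = coeffs
            .iter()
            .enumerate()
            .map(|(j, &c)| i64::from(c) * i64::from(out[i - 1 - j]))
            .sum();
        // ...but sample `i` must be written before sample `i + 1` can
        // even start, which is the feedback loop described above.
        out.push(((acc >> shift) as i32).wrapping_add(r));
    }
    out
}
```

Each iteration reads values the previous iteration just wrote, so the only parallelism available is the short dot product inside one sample.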
A hypothetical future revision whose hot path genuinely benefited from
GPU execution (large-batch archival encoding across many channels at
once, for instance) would need to change the wire format to carry
enough shape metadata for batched kernels — i.e., a new sync word
under the versioning rules in §8, not a retrofit.
## 10. Conformance test vectors
The reference repository's `tests/conformance.rs` holds the canonical
test-vector set for this specification:
- **`DECODE_FIXTURES`** — `(samples, bytes)` pairs pinned at the byte
level. A conformant decoder **must** produce the `samples` array
when fed the `bytes` array. Encoders have latitude (§3.6, §7), so a
second-team encoder's bytes for the same samples may differ; the
decoder direction is the normative one. Coverage includes the
smallest valid frames (single-sample verbatim, 4- and 8-sample
silence), single-sample polarity boundaries (±1, ±(2²³ − 1)), DC
offset, alternating-polarity Nyquist-like content, smooth polynomial
(fixed-predictor territory), and a 16-sample growing-amplitude
pattern exercising partition search.
- **`REJECT_FIXTURES`** — hand-constructed malformed inputs mapped to
their expected rejection variants. Covers every class in §6 (1-10):
bad sync, each field-range violation, verbatim + non-zero shift,
`frame_sample_count == 0`, non-divisible partition count, header /
coefficient / Rice-bitstream truncation, per-partition `k > 23`.
- **`reject_unary_run_above_cap`** — a programmatic test for §6 class
10 (`q > u32::MAX >> k`). The minimal triggering payload is ~75
bytes of mostly zeros; construction logic is in the test, not a
const fixture.
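
The unary-run cap can be illustrated with a short check. This is a sketch under assumptions: `rice_combine` is a hypothetical helper, and returning `Option` stands in for whatever rejection variant a real decoder reports. Only the two bounds (`k > 23` and `q > u32::MAX >> k`) are taken from the text above.

```rust
/// Combine a Rice quotient `q` (the unary run length) with a `k`-bit
/// remainder `r` into the unsigned value `(q << k) | r`.
///
/// Returns `None` for the conditions a decoder would reject:
/// `k > 23`, a quotient above the cap, or a remainder wider than `k` bits.
pub fn rice_combine(q: u32, r: u32, k: u32) -> Option<u32> {
    if k > 23 {
        return None;
    }
    // Above this cap `q << k` would no longer fit in a `u32`.
    if q > (u32::MAX >> k) {
        return None;
    }
    if (r >> k) != 0 {
        return None;
    }
    Some((q << k) | r)
}
```

With `k = 23` the cap works out to `u32::MAX >> 23 = 511`, so a violating run needs at least 512 unary bits (64 bytes), which is in line with the roughly 75-byte payload mentioned above.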
Second-team implementations should port the decode fixtures
byte-for-byte and the reject fixtures byte-and-variant-for-variant.
`encode_matches_fixtures` in the same file is reference-specific (it
asserts the reference encoder's exact bytes) and is **not** a
conformance requirement — see §3.6's encoder-latitude clause.
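
A second-team port of the decode direction might be driven by a harness along these lines. The `DecodeFixture` struct, its `name` field, and `failing_fixtures` are hypothetical; the actual layout of `DECODE_FIXTURES` in `tests/conformance.rs` may differ, so treat this as a shape to adapt rather than a copy of the reference.

```rust
/// One pinned decode vector: decoding `bytes` must yield `samples`.
/// (Hypothetical layout; the reference file may store these differently.)
pub struct DecodeFixture {
    pub name: &'static str,
    pub samples: &'static [i32],
    pub bytes: &'static [u8],
}

/// Run every fixture through `decode` and return the names that fail,
/// either by erroring or by producing different samples.
pub fn failing_fixtures<E>(
    fixtures: &[DecodeFixture],
    decode: impl Fn(&[u8]) -> Result<Vec<i32>, E>,
) -> Vec<&'static str> {
    fixtures
        .iter()
        .filter(|f| decode(f.bytes).ok().as_deref() != Some(f.samples))
        .map(|f| f.name)
        .collect()
}
```

Passing the decoder as a closure keeps the harness independent of any particular error type, so the same loop can be pointed at the implementation under test.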
### 10.1 Reference encoder exemplars (non-normative)
The same `(samples, bytes)` pairs, read in the encoder direction,
serve as **reference-encoder validation targets** for implementations
that want to match this project's reference byte-for-byte — a common
goal for porting work even though the spec does not require it. The
fixture set is deliberately chosen to pin every encoder-discretion
axis from §7:
- `single_zero`, `single_pos_one`, `single_neg_one` — single-sample
frames. All three fall in the §5.2 "warm-up-is-whole-frame" regime
where the encoder's order choice is nearly arbitrary; pinning the
bytes fixes this project's choice (order 0 with the minimum-cost
Rice encoding).
- `silence_4`, `silence_8` — force `partition_order` tie-breaks on an
all-zero frame (every `partition_order ∈ [0, log₂(N)]` produces
identical cost; the reference picks the smallest via its convex-
descent tie-break).
- `dc_100_4`, `alternating_small_4` — exercise the order-vs-verbatim
decision. DC content favours a low-order LPC fit with small
residuals; alternating content favours order 1 with `a = −1`
(approximated at the closest Q-format). Pinning the bytes fixes
the reference's decision boundary.
- `single_full_scale_pos`, `single_full_scale_neg` — maximum-magnitude
single samples. Exercise the `|sample| ≤ 2²³ − 1` boundary on both
sides and fix the zigzag-of-extremum output.
- `linear_ramp_8` — smooth polynomial content, fixed-predictor
territory. Pins the reference's fixed-predictor-vs-LPC tie-break.
- `lfsr_noise_16` — exercises partition search on a frame large
enough for `partition_order > 0` to be competitive.
A second-team encoder that produces the same bytes for every entry
here is **likely** (not guaranteed) to produce matching bytes on
wider inputs, since the tie-break axes are the ones most sensitive
to encoder discretion. An encoder that produces different bytes is
still compliant so long as its own bytes round-trip — see §3.6, §7.
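
For the full-scale exemplars, the zigzag step can be sketched as follows. This assumes the conventional zigzag mapping (0, −1, 1, −2, … to 0, 1, 2, 3, …); the function names are illustrative, and the normative mapping is whatever the residual-coding section of this specification defines.

```rust
/// Conventional zigzag mapping from signed to unsigned:
/// 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
/// (Assumed here; check against the spec's own definition.)
pub fn zigzag(n: i32) -> u32 {
    ((n << 1) ^ (n >> 31)) as u32
}

/// Inverse of `zigzag`.
pub fn unzigzag(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}
```

Under this mapping the two extremes `±(2²³ − 1)` land on adjacent codes, 16 777 214 for the positive value and 16 777 213 for the negative one.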

205
benches/codec.rs Normal file

@ -0,0 +1,205 @@
//! Encode/decode throughput benchmarks.
//!
//! Uses the nightly `test` crate harness (`#[bench]`), not criterion. Run
//! with `cargo bench`. Results are wall-clock nanoseconds per iteration; to
//! convert to samples-per-second, divide the frame sample count by the
//! reported ns/iter and multiply by 10⁹.
//!
//! Representative sizes exercise the encoder's exhaustive search behaviour
//! at different `partition_order` ceilings — 256 and 1024 are power-of-two
//! frames (all seven partition orders available); 960 and 2880 mimic
//! Opus/WebRTC frame sizes (only some partition orders divide evenly, so
//! the inner search is sparser).
#![feature(test)]
extern crate test;
use lac::{decode_frame, encode_frame};
use test::Bencher;
// ── Synthetic-signal benches ────────────────────────────────────────────────
//
// These benches drive the encoder without any WAV I/O overhead, so the
// measurement is pure codec work. Four signal shapes cover the space:
//
// - `silence`: order-0 short-circuit path (skips Levinson entirely).
// - `multi_sine`: maximally LPC-friendly — order search converges fast.
// - `pseudo_speech`: AR(2) resonance on LFSR excitation, which is the
// textbook model of the vocal tract. Residuals are near-Laplacian,
// exercising the Rice k-search at realistic speech statistics.
// - `filtered_noise`: LFSR through a one-pole low-pass. Broad-spectrum
// content with no strong tonal structure — the hard case for LPC and
// a reasonable proxy for music-like workload.
fn silence(n: usize) -> Vec<i32> {
    vec![0i32; n]
}
fn multi_sine(n: usize) -> Vec<i32> {
    // Three superimposed sinusoids with incommensurate frequencies, scaled
    // to the 24-bit range. Picks up realistic LPC workload without needing
    // to read a WAV file in the bench body.
    (0..n)
        .map(|i| {
            let t = i as f64;
            let a = (t * 0.11).sin() * 3_000_000.0;
            let b = (t * 0.27).sin() * 1_500_000.0;
            let c = (t * 0.43).sin() * 750_000.0;
            (a + b + c) as i32
        })
        .collect()
}
/// 32-bit Galois LFSR. Deterministic (seeded) pseudo-random i32 sequence
/// in approximately `±2^19` (one-sixteenth of full 24-bit scale), chosen
/// to leave headroom for AR(2) resonance gain without clipping.
fn lfsr_noise(n: usize, seed: u32) -> Vec<i32> {
    // Non-zero seed required: the LFSR would otherwise lock at zero.
    let mut state = if seed == 0 { 0xACE1_ACE1 } else { seed };
    (0..n)
        .map(|_| {
            // Maximal-length 32-bit Galois polynomial x^32 + x^22 + x^2 + x + 1
            // (tap mask 0x8020_0003). Period = 2^32 − 1, which is comfortably
            // larger than any bench frame size.
            let lsb = state & 1;
            state >>= 1;
            if lsb != 0 {
                state ^= 0x8020_0003;
            }
            // Sign-extend via `as i32`, arithmetic shift narrows to ~±2^19.
            (state as i32) >> 12
        })
        .collect()
}
/// AR(2) pseudo-speech: LFSR excitation filtered through a single formant
/// resonance at ~700 Hz / 16 kHz with bandwidth ~100 Hz. The pole pair
/// gives speech-shaped spectral envelope; residuals are near-Laplacian,
/// matching the statistical profile of real vowel segments. This is the
/// content class LPC + Rice is designed for, so it stresses the inner
/// encoder loops more realistically than multi_sine (which converges
/// instantly) or white noise (which doesn't benefit from LPC).
fn pseudo_speech(n: usize) -> Vec<i32> {
    // Q14 coefficients for a pole at r·e^{±jθ} with r = 0.9806, θ = 2π·700/16000.
    // a1 = 2r·cos(θ) · 2^14 ≈ 30933
    // a2 = −r² · 2^14 ≈ −15751
    // The pole pair implied by these coefficients lies inside the unit
    // circle, so the recursion is stable for unbounded input lengths.
    const A1_Q14: i64 = 30_933;
    const A2_Q14: i64 = -15_751;
    let excitation = lfsr_noise(n, 0x5EED);
    let mut out = Vec::with_capacity(n);
    let mut y1: i64 = 0;
    let mut y2: i64 = 0;
    for &e in &excitation {
        // Resonance gain at the peak is ~1/(1−r) ≈ 50, so y magnitudes
        // reach ~2^19 · 50 ≈ 2^25. i64 intermediates prevent the
        // multiply-accumulate from overflowing; the final clamp keeps
        // the output inside the codec's 24-bit input contract.
        let sum = A1_Q14 * y1 + A2_Q14 * y2;
        // Round-to-nearest for the Q14 → integer demotion.
        let ar = (sum + (1 << 13)) >> 14;
        let y = ar + e as i64;
        let clamped = y.clamp(-((1 << 23) - 1), (1 << 23) - 1);
        out.push(clamped as i32);
        y2 = y1;
        y1 = clamped;
    }
    out
}
/// Broadband low-passed noise — LFSR excitation through a simple
/// single-pole IIR low-pass (pole at 0.9 in Q15). Covers the content
/// class where LPC cannot predict efficiently: residuals retain most
/// of the source entropy, so the Rice coder dominates the output bit
/// budget.
fn filtered_noise(n: usize) -> Vec<i32> {
    const POLE_Q15: i64 = 29_491; // round(0.9 · 2^15)
    // (1 − pole) in Q15 is the DC gain correction keeping output
    // magnitude near the excitation range.
    const ONE_MINUS_POLE_Q15: i64 = (1 << 15) - POLE_Q15;
    let excitation = lfsr_noise(n, 0xFEED);
    let mut out = Vec::with_capacity(n);
    let mut y: i64 = 0;
    for &e in &excitation {
        // y[n] = pole·y[n−1] + (1−pole)·x[n], all in Q15.
        let sum = POLE_Q15 * y + ONE_MINUS_POLE_Q15 * e as i64;
        y = (sum + (1 << 14)) >> 15;
        out.push(y.clamp(-((1 << 23) - 1), (1 << 23) - 1) as i32);
    }
    out
}
macro_rules! encode_bench {
    ($name:ident, $signal:ident, $size:expr) => {
        #[bench]
        fn $name(b: &mut Bencher) {
            let samples = $signal($size);
            b.iter(|| encode_frame(test::black_box(&samples)));
        }
    };
}
macro_rules! decode_bench {
    ($name:ident, $signal:ident, $size:expr) => {
        #[bench]
        fn $name(b: &mut Bencher) {
            let samples = $signal($size);
            let encoded = encode_frame(&samples);
            b.iter(|| decode_frame(test::black_box(&encoded)).unwrap());
        }
    };
}
encode_bench!(encode_silence_960, silence, 960);
encode_bench!(encode_silence_4096, silence, 4096);
encode_bench!(encode_sine_256, multi_sine, 256);
encode_bench!(encode_sine_960, multi_sine, 960);
encode_bench!(encode_sine_1024, multi_sine, 1024);
encode_bench!(encode_sine_2048, multi_sine, 2048);
encode_bench!(encode_sine_2880, multi_sine, 2880);
encode_bench!(encode_sine_4096, multi_sine, 4096);
encode_bench!(encode_speech_320, pseudo_speech, 320);
encode_bench!(encode_speech_960, pseudo_speech, 960);
encode_bench!(encode_speech_2048, pseudo_speech, 2048);
encode_bench!(encode_music_960, filtered_noise, 960);
encode_bench!(encode_music_2048, filtered_noise, 2048);
encode_bench!(encode_music_4096, filtered_noise, 4096);
decode_bench!(decode_silence_4096, silence, 4096);
decode_bench!(decode_sine_960, multi_sine, 960);
decode_bench!(decode_sine_4096, multi_sine, 4096);
decode_bench!(decode_speech_960, pseudo_speech, 960);
decode_bench!(decode_music_2048, filtered_noise, 2048);
// ── SIMD dot-product kernel isolation benches ───────────────────────────────
//
// The encode benches above measure end-to-end encode cost, which is
// dominated by per-frame allocations, Rice k-search, and the order
// loop. When evaluating a SIMD kernel change these confound the signal.
// These benches exercise just the kernel at realistic LPC orders so a
// change to `compute_residuals` shows up directly.
macro_rules! compute_residuals_bench {
    ($name:ident, $order:expr, $len:expr) => {
        #[bench]
        fn $name(b: &mut Bencher) {
            let samples = multi_sine($len);
            let coeffs: Vec<i16> = (0..$order)
                .map(|i| ((i as i16) * 711).wrapping_sub(100))
                .collect();
            b.iter(|| {
                lac::compute_residuals(test::black_box(&samples), test::black_box(&coeffs), 1)
            });
        }
    };
}
compute_residuals_bench!(compute_residuals_order_4_n320, 4, 320);
compute_residuals_bench!(compute_residuals_order_8_n320, 8, 320);
compute_residuals_bench!(compute_residuals_order_16_n320, 16, 320);
compute_residuals_bench!(compute_residuals_order_32_n320, 32, 320);
compute_residuals_bench!(compute_residuals_order_8_n960, 8, 960);
compute_residuals_bench!(compute_residuals_order_32_n960, 32, 960);
compute_residuals_bench!(compute_residuals_order_32_n4096, 32, 4096);

69
benches/compare-flac.sh Executable file

@ -0,0 +1,69 @@
#!/usr/bin/env bash
#
# Wall-clock + compressed-size comparison between FLAC and LAC on the
# local corpus directory. Diagnostic only — not part of CI.
#
# Usage:
# benches/compare-flac.sh [corpus_dir]
#
# Output columns (tab-separated, stable for piping into column -t):
#
#   file  flac_d_ms  flac_d_bytes  flac_b_ms  flac_b_bytes
#
# where `_d` is FLAC's default level (`-5`) and `_b` is `--best` (`-8`).
#
# For the LAC side, run `cargo test --test corpus --release -- --nocapture`
# and correlate `lac_enc_ms` / `lac=<bytes>` values against the filenames
# printed here. Two separate invocations because automating the join is
# more fragile than eyeballing it for a corpus this small.
set -euo pipefail
CORPUS_DIR="${1:-corpus}"
if ! command -v flac > /dev/null 2>&1; then
    echo "flac CLI not found in PATH; install the flac package" >&2
    exit 1
fi
if [[ ! -d "$CORPUS_DIR" ]]; then
    echo "corpus directory not found: $CORPUS_DIR" >&2
    exit 1
fi
# Header. Columns cover both FLAC modes — default (`-5`, what most
# production pipelines actually use) and `--best` (`-8`, the ceiling
# `tests/corpus.rs` asserts its ratios against). `column -t` on the
# output aligns to the printf format below.
printf "%-50s\t%12s\t%14s\t%12s\t%14s\n" \
    "file" "flac_d_ms" "flac_d_bytes" "flac_b_ms" "flac_b_bytes"
# Shell globs are unordered across filesystems; sort for stable output.
shopt -s nullglob
files=("$CORPUS_DIR"/*.wav)
IFS=$'\n' files=($(sort <<< "${files[*]}"))
unset IFS
# Warm-up invocation against the first file: the very first `flac` exec
# in a shell session pays dynamic-linker + page-fault costs that aren't
# representative of steady-state. Subsequent runs don't repay them.
if [[ ${#files[@]} -gt 0 ]]; then
    flac --stdout --best --silent "${files[0]}" > /dev/null 2>&1 || true
fi
for f in "${files[@]}"; do
    # `date +%s%N` gives nanoseconds since epoch on GNU coreutils. Not
    # portable to BSD `date`, but this script is Linux-only by design
    # (matches the CI runner environment).
    #
    # Two invocations per file: default (`-5`) first, then `--best`.
    # Ordering is deliberate: the default pass also warms the OS file
    # cache, so `--best` sees warm-cache I/O and its time reflects the
    # compute cost, not disk read.
    start_ns=$(date +%s%N)
    flac_d_bytes=$(flac --stdout --silent "$f" 2> /dev/null | wc -c)
    mid_ns=$(date +%s%N)
    flac_b_bytes=$(flac --stdout --best --silent "$f" 2> /dev/null | wc -c)
    end_ns=$(date +%s%N)
    flac_d_ms=$(( (mid_ns - start_ns) / 1000000 ))
    flac_b_ms=$(( (end_ns - mid_ns) / 1000000 ))
    printf "%-50s\t%12d\t%14d\t%12d\t%14d\n" \
        "$(basename "$f")" "$flac_d_ms" "$flac_d_bytes" "$flac_b_ms" "$flac_b_bytes"
done

BIN
corpus/ES2002a.Array1-01.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-02.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-03.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-04.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-05.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-06.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-07.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Array1-08.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Headset-0.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Headset-1.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Headset-2.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Headset-3.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Lapel-0.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Lapel-1.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Lapel-2.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Lapel-3.wav (Stored with Git LFS) Normal file

Binary file not shown.

BIN
corpus/ES2002a.Mix-Headset.wav (Stored with Git LFS) Normal file

Binary file not shown.

110
fuzz/Cargo.lock generated Normal file

@ -0,0 +1,110 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4
[[package]]
name = "arbitrary"
version = "1.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3d036a3c4ab069c7b410a2ce876bd74808d2d0888a82667669f8e783a898bf1"
[[package]]
name = "cc"
version = "1.2.60"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43c5703da9466b66a946814e1adf53ea2c90f10063b86290cc9eb67ce3478a20"
dependencies = [
"find-msvc-tools",
"jobserver",
"libc",
"shlex",
]
[[package]]
name = "cfg-if"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "find-msvc-tools"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5baebc0774151f905a1a2cc41989300b1e6fbb29aff0ceffa1064fdd3088d582"
[[package]]
name = "getrandom"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd"
dependencies = [
"cfg-if",
"libc",
"r-efi",
"wasip2",
]
[[package]]
name = "jobserver"
version = "0.1.34"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9afb3de4395d6b3e67a780b6de64b51c978ecf11cb9a462c66be7d4ca9039d33"
dependencies = [
"getrandom",
"libc",
]
[[package]]
name = "lac"
version = "0.1.0"
[[package]]
name = "lac-fuzz"
version = "0.0.0"
dependencies = [
"lac",
"libfuzzer-sys",
]
[[package]]
name = "libc"
version = "0.2.185"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52ff2c0fe9bc6cb6b14a0592c2ff4fa9ceb83eea9db979b0487cd054946a2b8f"
[[package]]
name = "libfuzzer-sys"
version = "0.4.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f12a681b7dd8ce12bff52488013ba614b869148d54dd79836ab85aafdd53f08d"
dependencies = [
"arbitrary",
"cc",
]
[[package]]
name = "r-efi"
version = "5.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f"
[[package]]
name = "shlex"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "wasip2"
version = "1.0.3+wasi-0.2.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20064672db26d7cdc89c7798c48a0fdfac8213434a1186e5ef29fd560ae223d6"
dependencies = [
"wit-bindgen",
]
[[package]]
name = "wit-bindgen"
version = "0.57.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ebf944e87a7c253233ad6766e082e3cd714b5d03812acc24c318f549614536e"

28
fuzz/Cargo.toml Normal file

@ -0,0 +1,28 @@
[package]
name = "lac-fuzz"
version = "0.0.0"
publish = false
edition = "2024"
[package.metadata]
cargo-fuzz = true
[dependencies]
libfuzzer-sys = "0.4"
lac = { path = ".." }
# Required by cargo-fuzz: prevents libtest from running the fuzz targets
# as regular integration tests.
[[bin]]
name = "decode_arbitrary"
path = "fuzz_targets/decode_arbitrary.rs"
test = false
doc = false
bench = false
[[bin]]
name = "roundtrip_arbitrary"
path = "fuzz_targets/roundtrip_arbitrary.rs"
test = false
doc = false
bench = false

Binary file not shown.

Some files were not shown because too many files have changed in this diff