A Bug and a Fix
Today, a short tale of fun mystery-solving. I’ve been working on a project that involves a server dynamically generating audio files and streaming them to a client via a WebRTC session.
The dynamic audio generation process works like this: first run a program that generates a .wav file, then compress the .wav to an .ogg containing an Opus audio stream, and deliver it over the network.
The WebRTC portion is handled by the awesome Pion library, a pure Go implementation that makes customizing WebRTC (for example, by streaming a dynamically generated audio file) super easy.
So here was the bug: certain audio files were delivered successfully over the wire and played seamlessly in the browser. Other files weren’t. I knew there wasn’t anything fundamentally wrong with the files that wouldn’t transmit, because I could listen to them with any old audio player (Google Chrome, for example). All of the files were Ogg + Opus sampled at 48 kHz.
Since I could open the files locally, I figured there was probably an issue with the network. Chrome has a handy tool at chrome://webrtc-internals for inspecting WebRTC sessions. Sure enough, this tool revealed that the client never actually received any bytes of the problematic Ogg files. But why?
Ogg is a container format which can hold multiple logical data streams, each with its own respective encoding. In this case, the Ogg files held a single logical Opus stream. Mozilla maintains a useful tool called opusinfo in the opus-tools package that inspects Opus streams. Here’s what the opusinfo output looks like for one of the files which transmitted successfully:
$ opusinfo good_sound.ogg
Processing file "good_sound.ogg"...
New logical stream (#1, serial: 1fe69032): type opus
Encoded with Lavf58.45.100
User comments section follows...
encoder=Lavc58.91.100 libopus
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Total data length: 20863 bytes (overhead: 12.8%)
Playback length: 0m:01.779s
Average bitrate: 93.79 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended
And here’s the output for one which didn’t transmit:
$ opusinfo bad_sound.ogg
Processing file "bad_sound.ogg"...
New logical stream (#1, serial: 9f763f54): type opus
Encoded with Lavf58.45.100
User comments section follows...
encoder=Lavc58.91.100 libopus
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1000.0ms (max), 900.0ms (avg), 800.0ms (min)
Total data length: 18487 bytes (overhead: 1.6%)
Playback length: 0m:01.779s
Average bitrate: 83.11 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended
Pretty similar! But one obvious difference: the bad files had a much longer page duration than the good ones.
Could this make a difference? It could! When the Ogg files are streamed over the WebRTC media channel, they are sent via RTP, a protocol that runs over UDP. Each RTP datagram contains one page of the Ogg file. This meant that the WebRTC server was attempting to send ~10 kB datagrams. Too big! (The RTP MTU is 1200 bytes.)
Why did the bad files have such large pages? They were generated by compressing a .wav with ffmpeg, invoked like so:
ffmpeg -i file.wav -c:a libopus -ac 2 file.ogg
But the ffmpeg Ogg muxer has a -page_duration setting to specify how to slice up the pages. I hadn’t known about this setting and wasn’t using it. The default: 1000 ms. And so, the 19-character fix for my bug:
# page_duration unit is microseconds
ffmpeg -i file.wav -c:a libopus -page_duration 2000 -ac 2 file.ogg
And all my files streamed happily ever after.