In our previous post, we demystified the structure of H.264 bitstreams, focusing on NAL units and slices as the foundational building blocks. We saw how NAL units encapsulate everything from parameter sets (SPS/PPS) to coded slices, separated by start codes, and how slices divide frames for resilience and performance.
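Now, let's get hands-on. This sequel post walks you through parsing an H.264 bitstream in code. We'll implement a simple parser that reads an Annex B file into memory, locates each start code, decodes the one-byte NAL header that follows it, and prints every NAL unit's offset, type, and reference flag.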
This is an educational example — not production-grade — but it gives you a solid foundation for deeper work, such as extracting SPS/PPS or integrating with decoders like FFmpeg.
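Before you start, you'll need:

- A sample H.264 file (a .264 or .h264 file in Annex B format). You can find free samples online, like from the JVT test suite or FFmpeg's test streams.
- Python 3, which ships with the struct module for byte-level reading.

As a quick recap of the format:

- Each NAL unit is preceded by a start code, either 0x000001 or 0x00000001.
- The first byte after the start code is the NAL header: forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits).

Here's a complete, runnable Python script that parses the bitstream and reports each NAL unit.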
```python
import sys
import struct  # not used below; handy later for unpacking multi-byte fields

# NAL unit type names for readability
NAL_TYPES = {
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "Supplemental enhancement information (SEI)",
    7: "Sequence parameter set (SPS)",
    8: "Picture parameter set (PPS)",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
    13: "Sequence parameter set extension",
    14: "Prefix NAL unit",
    15: "Subset sequence parameter set",
    19: "Coded slice of an auxiliary coded picture without partitioning",
    20: "Coded slice extension",
}


def find_start_code(data, pos):
    """Find the next start code (0x000001 or 0x00000001) at or after pos.

    Returns (offset, length), or (None, None) if no start code remains.
    """
    # Stop at len(data) - 2 so a 3-byte start code at the very end is still seen.
    while pos < len(data) - 2:
        if data[pos:pos+3] == b'\x00\x00\x01':
            return pos, 3
        if data[pos:pos+4] == b'\x00\x00\x00\x01':
            return pos, 4
        pos += 1
    return None, None


def parse_nal_header(byte):
    """Parse the 1-byte NAL header."""
    forbidden_zero = (byte >> 7) & 0x01  # must be 0 in a valid stream
    nal_ref_idc = (byte >> 5) & 0x03     # 0 means "not used for reference"
    nal_unit_type = byte & 0x1F          # index into NAL_TYPES
    return forbidden_zero, nal_ref_idc, nal_unit_type


def main(filename):
    try:
        with open(filename, 'rb') as f:
            data = f.read()
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return

    print(f"Parsing H.264 bitstream: {filename}")
    print("Offset\tStart Code Len\tNAL Type\tRef\tDescription")

    pos = 0
    nal_count = 0
    while pos < len(data):
        start_pos, sc_len = find_start_code(data, pos)
        if start_pos is None:
            break

        # Move past the start code
        pos = start_pos + sc_len

        # Read the NAL header (next byte)
        if pos >= len(data):
            break
        nal_header = data[pos]
        pos += 1

        forbidden, ref_idc, unit_type = parse_nal_header(nal_header)
        if forbidden != 0:
            print(f"Warning: Forbidden zero bit set at offset {start_pos}")

        # Find the next start code to determine where this NAL unit ends
        next_start, _ = find_start_code(data, pos)
        nal_end = next_start if next_start is not None else len(data)
        nal_length = nal_end - start_pos - sc_len  # header + payload; not printed here

        description = NAL_TYPES.get(unit_type, f"Reserved/Unknown ({unit_type})")
        ref_str = "Yes" if ref_idc > 0 else "No"
        print(f"{start_pos:08X}\t{sc_len}\t\t{unit_type:02d}\t\t{ref_str}\t{description}")
        nal_count += 1

        # Resume at the next start code instead of rescanning this unit's payload
        pos = nal_end

    print(f"\nTotal NAL units found: {nal_count}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python h264_parser.py <filename>")
        sys.exit(1)
    main(sys.argv[1])
```
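To make the header layout concrete, the same bit operations can be checked against a few header bytes that commonly appear in real streams. This is a standalone sketch: `decode_header` is a hypothetical helper name, and the sample bytes are chosen purely for illustration.

```python
# Standalone sketch: decode_header is a hypothetical helper that mirrors the
# layout forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits).
def decode_header(byte):
    forbidden_zero_bit = (byte >> 7) & 0x01  # top bit
    nal_ref_idc = (byte >> 5) & 0x03         # next two bits
    nal_unit_type = byte & 0x1F              # low five bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


# Header bytes picked for illustration only.
for byte in (0x67, 0x68, 0x65, 0x41, 0x09):
    forbidden, ref_idc, unit_type = decode_header(byte)
    print(f"0x{byte:02X} = {byte:08b} -> "
          f"forbidden={forbidden} ref_idc={ref_idc} type={unit_type}")

# 0x67 = 01100111 -> forbidden=0 ref_idc=3 type=7
# 0x68 = 01101000 -> forbidden=0 ref_idc=3 type=8
# 0x65 = 01100101 -> forbidden=0 ref_idc=3 type=5
# 0x41 = 01000001 -> forbidden=0 ref_idc=2 type=1
# 0x09 = 00001001 -> forbidden=0 ref_idc=0 type=9
```

Reading the binary column left to right shows the split directly: one bit, then two, then five.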
To run it:

1. Save the script as h264_parser.py.
2. Grab a sample file (e.g., test.264).
3. Run it: python h264_parser.py test.264

You should see output along these lines:

```
Parsing H.264 bitstream: test.264
Offset    Start Code Len  NAL Type  Ref  Description
00000000  4               07        Yes  Sequence parameter set (SPS)
0000001A  4               08        Yes  Picture parameter set (PPS)
00000028  4               09        No   Access unit delimiter
0000002C  4               05        Yes  Coded slice of an IDR picture
...
Total NAL units found: 120
```
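If you don't have a sample file handy, a tiny synthetic stream is enough to exercise the parser. The sketch below is an illustration under stated assumptions: the header bytes are valid H.264 values, but the payload bytes are placeholders, so the result will not decode as video. The names `UNITS`, `build_synthetic_stream`, and the output filename `synthetic.264` are all hypothetical.

```python
# Each entry: (start code, NAL header byte, placeholder payload).
# Payload bytes are dummies chosen to avoid accidental 00 00 01 patterns.
UNITS = [
    (b'\x00\x00\x00\x01', 0x67, b'\xAA\xAA\xAA'),      # SPS header, 4-byte start code
    (b'\x00\x00\x00\x01', 0x68, b'\xBB\xBB'),          # PPS header, 4-byte start code
    (b'\x00\x00\x01',     0x65, b'\xCC\xCC\xCC\xCC'),  # IDR slice header, 3-byte start code
    (b'\x00\x00\x01',     0x41, b'\xDD\xDD\xDD\xDD'),  # non-IDR slice header
]


def build_synthetic_stream(units=UNITS):
    """Concatenate start code + header + payload for every unit."""
    stream = bytearray()
    for start_code, header, payload in units:
        stream += start_code
        stream.append(header)
        stream += payload
    return bytes(stream)


if __name__ == "__main__":
    # Write the stream to disk so it can be passed to the parser.
    with open("synthetic.264", "wb") as f:
        f.write(build_synthetic_stream())
```

Running the parser on synthetic.264 should report four NAL units of types 7, 8, 5, and 1 at offsets 0x00, 0x08, 0x0F, and 0x17, which covers both start code lengths.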
This basic version is a great starting point. You can expand it to:
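- Handle emulation prevention bytes (a 0x03 inserted after 0x0000 whenever the next byte would be 0x00, 0x01, 0x02, or 0x03), which have to be stripped before you parse a NAL unit's payload.
- Integrate with libavcodec in C for full decoding, or PyAV in Python.

For example, to remove emulation prevention bytes: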
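```python
def remove_emulation_prevention(data):
    """Strip emulation prevention bytes (00 00 03 -> 00 00) from a NAL unit."""
    out = bytearray()
    zeros = 0  # length of the current run of zero bytes
    for byte in data:
        if zeros >= 2 and byte == 0x03:
            zeros = 0  # drop the 0x03; the zeros before it cannot start a new match
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)
```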
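One way to gain confidence in a stripping routine is to pair it with its inverse and check that the two round-trip. The sketch below is a simplified illustration: `insert_epb` and `strip_epb` are hypothetical names, and the insertion side ignores the special case of a payload that ends in a zero byte. The test inputs are made-up byte strings, not data from a real stream.

```python
def insert_epb(rbsp: bytes) -> bytes:
    """Encoder side: insert 0x03 after two zeros when the next byte is <= 0x03."""
    out = bytearray()
    zeros = 0  # length of the current run of zero bytes in the output
    for byte in rbsp:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)  # break up the forbidden 00 00 0x pattern
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def strip_epb(nal: bytes) -> bytes:
    """Decoder side: drop a 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for byte in nal:
        if zeros >= 2 and byte == 0x03:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


# Made-up payloads that contain patterns needing protection.
for raw in (b'\x00\x00\x01', b'\x00\x00\x00\x03', b'\x00\x00\x03', b'\x12\x00\x34'):
    escaped = insert_epb(raw)
    assert strip_epb(escaped) == raw
    print(raw.hex(' '), '->', escaped.hex(' '))
```

The second input is the interesting one: it escapes to 00 00 03 00 03, and a stripper that reuses already-matched zeros when looking for the next 00 00 03 would wrongly delete the final 0x03. Resetting the zero count after each removal avoids that.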
Parsing the bitstream yourself gives you insight into how H.264 data is organized — invaluable for debugging, custom streaming servers, or building tools like bitstream analyzers.
If you're serious about video coding, tools like Elecard StreamEye, FFmpeg's ffprobe, or Bitstream Analyzer from the H.264 reference software provide deeper inspection. But nothing beats rolling your own parser to truly understand the format.
Try it out with your own streams! Have questions about extending the code or tackling specific NAL types? Let me know in the comments.