Magic Bytes and File Signatures
How computers identify file types by their header bytes, with a complete reference table of common magic numbers.
What Are Magic Bytes?
Magic bytes (also called magic numbers or file signatures) are a fixed sequence of bytes, usually at the very beginning of a file, that identifies its format. A program that needs to know what a file really is does not have to rely on the file extension. It can read the first few bytes and compare them against a database of known signatures, which is what tools such as the Unix file command do. This mechanism is more reliable than extensions, which can be renamed, removed, or forged.
The term "magic number" is closely tied to the Unix file command, which uses a "magic file" database (/usr/share/misc/magic) to match byte patterns to file types. Content-based identification is an old idea: Unix has never required file extensions, and the kernel has always recognized executables by their leading bytes rather than by their names.
Why Magic Bytes Matter for Developers
Understanding magic bytes is essential for several practical scenarios:
- File upload validation: Never trust the file extension or MIME type sent by the client. Check the magic bytes on the server to verify the file is actually the type it claims to be. This prevents attacks like uploading a PHP script disguised as a JPEG.
- Content-type detection: When serving user-uploaded files, determine the correct Content-Type header from the magic bytes rather than the extension.
- Data recovery: When files are deleted, their directory entries are removed but the data often remains on disk. File carving tools scan raw disk sectors for magic bytes to find and reconstruct files.
- Forensic analysis: Examining magic bytes reveals hidden files, steganography, and files with deliberately wrong extensions.
- Format conversion: Before converting a file, check its actual format. A file named photo.png might actually be a JPEG if someone simply renamed it.
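The upload-validation and format-conversion cases above both come down to comparing what a file claims to be with what its first bytes say. Here is a minimal sketch, assuming the file's leading bytes are already available as a Uint8Array. The `EXPECTED_SIGNATURES` map and the `checkClaimedExtension` helper are hypothetical names used for illustration, and the map covers only a few formats from the tables in this article.

```javascript
// Hypothetical extension-to-signature map; values come from the reference tables in this article.
const EXPECTED_SIGNATURES = new Map([
  ['png', [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]],
  ['jpg', [0xff, 0xd8, 0xff]],
  ['jpeg', [0xff, 0xd8, 0xff]],
  ['gif', [0x47, 0x49, 0x46, 0x38]], // shared prefix of GIF87a and GIF89a
]);

// True when `bytes` begins with every value in `signature`.
function startsWithBytes(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every((value, i) => bytes[i] === value);
}

// Compare the extension in `filename` with the signature found in `bytes`.
function checkClaimedExtension(filename, bytes) {
  const dot = filename.lastIndexOf('.');
  const ext = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase();
  const expected = EXPECTED_SIGNATURES.get(ext);
  if (!expected) return { ok: false, reason: 'unsupported extension' };
  if (!startsWithBytes(bytes, expected)) return { ok: false, reason: 'signature mismatch' };
  return { ok: true, reason: 'signature matches extension' };
}
```

Rejecting unknown extensions by default keeps the check an allowlist: a format is accepted only when it has been explicitly listed.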
Common File Signatures Reference
Images
| Format | Magic Bytes (Hex) | ASCII |
|---|---|---|
| PNG | 89 50 4E 47 0D 0A 1A 0A | .PNG.... |
| JPEG | FF D8 FF | ... |
| GIF87a | 47 49 46 38 37 61 | GIF87a |
| GIF89a | 47 49 46 38 39 61 | GIF89a |
| BMP | 42 4D | BM |
| WebP | 52 49 46 46 ... 57 45 42 50 | RIFF...WEBP |
| TIFF (LE) | 49 49 2A 00 | II*. |
| TIFF (BE) | 4D 4D 00 2A | MM.* |
| SVG | 3C 73 76 67 | <svg (text format; may follow an XML declaration) |
| ICO | 00 00 01 00 | .... |
The PNG signature is particularly clever. The first byte (89) is non-ASCII, which immediately distinguishes PNG from text files. Bytes 2-4 spell "PNG" in ASCII for human recognition. Bytes 5-6 are a DOS-style line ending (0D 0A) that detects incorrect text-mode transfers. Byte 7 (1A) is the DOS end-of-file character, which prevents the file from being accidentally displayed as text. Byte 8 (0A) is a Unix line ending that detects the opposite transfer corruption. Every byte serves a defensive purpose.
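To make the defensive design concrete, the following sketch checks all eight bytes and, when they do not match, looks for the two line-ending corruptions described above. The function name `diagnosePngSignature` and the returned strings are illustrative assumptions. The "LF converted to CR LF" case covers both a converter that leaves existing CR LF pairs alone and one that blindly expands every LF.

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Classify the first bytes of a file as a valid PNG signature or a known corruption of it.
function diagnosePngSignature(bytes) {
  const startsWith = (expected) =>
    bytes.length >= expected.length && expected.every((value, i) => bytes[i] === value);

  if (startsWith(PNG_SIGNATURE)) return 'valid';

  // A CR LF -> LF conversion drops the 0D at byte 5.
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0a, 0x1a, 0x0a])) {
    return 'CR LF converted to LF';
  }

  // An LF -> CR LF conversion turns the final 0A into 0D 0A.
  // The second pattern is a converter that also expanded the LF inside 0D 0A.
  if (
    startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0d, 0x0a]) ||
    startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0d, 0x0a, 0x1a, 0x0d, 0x0a])
  ) {
    return 'LF converted to CR LF';
  }

  return 'not a PNG signature';
}
```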
Archives
| Format | Magic Bytes (Hex) | ASCII |
|---|---|---|
| ZIP | 50 4B 03 04 | PK.. |
| GZIP | 1F 8B | .. |
| RAR | 52 61 72 21 1A 07 | Rar!.. |
| 7-Zip | 37 7A BC AF 27 1C | 7z.... |
| BZIP2 | 42 5A 68 | BZh |
| XZ | FD 37 7A 58 5A 00 | .7zXZ. |
| Zstandard | 28 B5 2F FD | (./. |
Note that ZIP and DOCX/XLSX/PPTX share the same magic bytes (50 4B 03 04 or "PK"), because Microsoft Office Open XML files are ZIP archives containing XML documents. To distinguish them, you need to look at the contents of the archive, not just the header bytes.
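One way to look inside the archive without a ZIP library is to read the entry names from the central directory and check for the folders that Office packages conventionally use. The sketch below is a heuristic under stated assumptions: `listZipEntryNames` and `classifyZip` are hypothetical helpers, ZIP64 and multi-disk archives are not handled, and the `word/`, `xl/`, and `ppt/` prefixes are naming conventions rather than a guarantee.

```javascript
// Read entry names from a ZIP central directory. Returns null if the structure cannot be read.
function listZipEntryNames(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The end-of-central-directory record (50 4B 05 06) is 22 bytes plus an optional
  // comment of up to 65535 bytes, so search backwards from the end of the file.
  let eocd = -1;
  const lowest = Math.max(0, bytes.length - 22 - 0xffff);
  for (let i = bytes.length - 22; i >= lowest; i--) {
    if (view.getUint32(i, true) === 0x06054b50) { eocd = i; break; }
  }
  if (eocd === -1) return null;

  const entryCount = view.getUint16(eocd + 10, true);
  let offset = view.getUint32(eocd + 16, true); // start of the central directory
  const decoder = new TextDecoder();
  const names = [];
  for (let n = 0; n < entryCount; n++) {
    // Each central directory header starts with 50 4B 01 02 and has 46 fixed bytes.
    if (offset + 46 > bytes.length || view.getUint32(offset, true) !== 0x02014b50) return null;
    const nameLength = view.getUint16(offset + 28, true);
    const extraLength = view.getUint16(offset + 30, true);
    const commentLength = view.getUint16(offset + 32, true);
    names.push(decoder.decode(bytes.subarray(offset + 46, offset + 46 + nameLength)));
    offset += 46 + nameLength + extraLength + commentLength;
  }
  return names;
}

// Heuristic classification based on conventional Office Open XML part names.
function classifyZip(bytes) {
  const names = listZipEntryNames(bytes);
  if (names === null) return 'not a readable ZIP';
  if (!names.includes('[Content_Types].xml')) return 'ZIP';
  if (names.some((name) => name.startsWith('word/'))) return 'DOCX';
  if (names.some((name) => name.startsWith('xl/'))) return 'XLSX';
  if (names.some((name) => name.startsWith('ppt/'))) return 'PPTX';
  return 'ZIP (other Open Packaging Conventions package)';
}
```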
Executables
| Format | Magic Bytes (Hex) | Notes |
|---|---|---|
| Windows PE | 4D 5A | MZ header (Mark Zbikowski) |
| ELF (Linux) | 7F 45 4C 46 | .ELF |
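| Mach-O 64-bit | FE ED FA CF | macOS executables (appears on disk as CF FA ED FE on little-endian x86-64 and arm64) |
| Mach-O 32-bit | FE ED FA CE | Legacy macOS (appears as CE FA ED FE on little-endian CPUs) |
| Java .class | CA FE BA BE | Java bytecode (the same four bytes also start Mach-O universal binaries) |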
| WebAssembly | 00 61 73 6D | .asm (\0asm) |
The Windows PE (Portable Executable) format starts with 4D 5A ("MZ"), the initials of Mark Zbikowski, a Microsoft engineer who designed the original DOS executable format. Every .exe and .dll file starts with these two bytes, a tradition that has survived four decades of Windows evolution. The Java class file signature CA FE BA BE ("CAFEBABE") is one of the most memorable magic numbers in computing. It was chosen as a playful hexadecimal word, and James Gosling has attributed it to a nickname the team used for a local café.
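Because the MZ prefix dates back to DOS, the two bytes alone do not prove a file is a modern PE image. In the PE layout, the 4-byte little-endian value at offset 0x3C points to a second signature, "PE\0\0". The sketch below follows that pointer. `classifyMz` and its return strings are hypothetical, and the caller is assumed to have read enough of the file to include the PE header offset.

```javascript
// Distinguish a PE image from a plain DOS MZ executable by following the offset at 0x3C.
function classifyMz(bytes) {
  if (bytes.length < 2 || bytes[0] !== 0x4d || bytes[1] !== 0x5a) return 'not MZ';
  if (bytes.length < 0x40) return 'MZ (DOS executable or truncated)';

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const peOffset = view.getUint32(0x3c, true); // little-endian file offset of the PE signature
  if (peOffset + 4 > bytes.length) return 'MZ (PE header offset out of range)';

  const isPe =
    bytes[peOffset] === 0x50 &&     // 'P'
    bytes[peOffset + 1] === 0x45 && // 'E'
    bytes[peOffset + 2] === 0x00 &&
    bytes[peOffset + 3] === 0x00;
  return isPe ? 'PE' : 'MZ (DOS executable)';
}
```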
Documents
| Format | Magic Bytes (Hex) | ASCII |
|---|---|---|
| PDF | 25 50 44 46 | %PDF |
| OLE2 (DOC/XLS) | D0 CF 11 E0 A1 B1 1A E1 | ........ |
| SQLite | 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 | SQLite format 3. |
Implementing Magic Byte Detection
Here is a minimal implementation in JavaScript for detecting common file types:
```javascript
async function detectFileType(file) {
  // Read the first 16 bytes
  const slice = file.slice(0, 16);
  const buffer = await slice.arrayBuffer();
  const bytes = new Uint8Array(buffer);

  // Convert to a lowercase hex string for matching
  const hex = Array.from(bytes)
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');

  // Check signatures. None of these prefixes overlap, so the order does not matter here;
  // if overlapping signatures are added, test the longer one first.
  if (hex.startsWith('89504e47')) return 'PNG';
  if (hex.startsWith('ffd8ff')) return 'JPEG';
  if (hex.startsWith('25504446')) return 'PDF';
  if (hex.startsWith('504b0304')) return 'ZIP';
  if (hex.startsWith('7f454c46')) return 'ELF';
  if (hex.startsWith('4d5a')) return 'PE (EXE/DLL)';
  if (hex.startsWith('cafebabe')) return 'Java Class';
  return 'Unknown';
}
```

In production, use a library with a comprehensive signature database. Our File Viewer uses a database of 40+ signatures covering images, archives, documents, executables, audio, video, fonts, and databases.
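A chain of prefix checks cannot express signatures such as WebP, where the identifying bytes sit at offset 8 after the RIFF header and a 4-byte size field. One possible generalization is a table in which each format lists one or more byte runs and the offsets where they must appear. The names `SIGNATURE_TABLE` and `detectType` are hypothetical, and the table below is a small excerpt of the reference tables, not a complete database.

```javascript
// Each entry names a format and lists the byte runs that must all be present at fixed offsets.
const SIGNATURE_TABLE = [
  { type: 'SQLite', parts: [{ offset: 0, bytes: [0x53, 0x51, 0x4c, 0x69, 0x74, 0x65, 0x20, 0x66,
                                                 0x6f, 0x72, 0x6d, 0x61, 0x74, 0x20, 0x33, 0x00] }] },
  { type: 'PNG',    parts: [{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }] },
  { type: 'WebP',   parts: [{ offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] },   // "RIFF"
                            { offset: 8, bytes: [0x57, 0x45, 0x42, 0x50] }] }, // "WEBP"
  { type: 'GIF',    parts: [{ offset: 0, bytes: [0x47, 0x49, 0x46, 0x38] }] },
  { type: 'ZIP',    parts: [{ offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] }] },
  { type: 'JPEG',   parts: [{ offset: 0, bytes: [0xff, 0xd8, 0xff] }] },
  { type: 'GZIP',   parts: [{ offset: 0, bytes: [0x1f, 0x8b] }] },
  { type: 'BMP',    parts: [{ offset: 0, bytes: [0x42, 0x4d] }] },
];

// Return the first table entry whose parts all match, or 'Unknown'.
function detectType(bytes) {
  const partMatches = ({ offset, bytes: expected }) =>
    bytes.length >= offset + expected.length &&
    expected.every((value, i) => bytes[offset + i] === value);
  const hit = SIGNATURE_TABLE.find((entry) => entry.parts.every(partMatches));
  return hit ? hit.type : 'Unknown';
}
```

Keeping longer and more specific signatures earlier in the table means short two-byte signatures such as BMP are only reached when nothing more specific matches.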
Security Considerations
Magic bytes are not a security boundary. An attacker can construct a file that has valid magic bytes for one format but contains a payload for another. Polyglot files (files that are valid in multiple formats simultaneously) are a well-known attack technique. For example, a file can be both a valid JPEG and a valid JavaScript file.
This means magic byte checking should be one layer of a defense-in-depth strategy, not the only validation. For file uploads, also:
- Validate the full file structure (not just the header)
- Re-encode images through a trusted library (which strips embedded code)
- Set strict Content-Type and Content-Disposition headers when serving
- Store uploaded files outside the web root
- Limit file sizes and types to what your application actually needs
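As an illustration of validating structure beyond the header, the sketch below walks the chunk sequence of a PNG: each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. It checks that the first chunk is IHDR, that every chunk fits inside the file, and that nothing follows IEND, which is where appended payloads tend to hide. `validatePngStructure` is a hypothetical helper, and it deliberately skips CRC verification and decoding of the image data, so it complements re-encoding and does not replace it.

```javascript
// Walk PNG chunks and report structural problems. CRCs and pixel data are not verified.
function validatePngStructure(bytes) {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 8 || !signature.every((value, i) => bytes[i] === value)) {
    return { ok: false, reason: 'bad signature' };
  }

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 8;
  let isFirstChunk = true;
  while (offset + 12 <= bytes.length) {
    const length = view.getUint32(offset); // big-endian length of the data field
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    const next = offset + 12 + length;     // 4 length + 4 type + data + 4 CRC
    if (next > bytes.length) return { ok: false, reason: 'truncated chunk ' + type };
    if (isFirstChunk && type !== 'IHDR') return { ok: false, reason: 'first chunk is not IHDR' };
    isFirstChunk = false;
    if (type === 'IEND') {
      return next === bytes.length
        ? { ok: true, reason: 'well-formed chunk sequence' }
        : { ok: false, reason: 'trailing data after IEND' };
    }
    offset = next;
  }
  return { ok: false, reason: 'missing IEND' };
}
```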
The Unix file Command
The file command is the standard tool for magic byte detection on Unix systems. It consults the magic database at /usr/share/misc/magic (or /usr/share/file/magic) and reports the detected type:
```
$ file photo.jpg
photo.jpg: JPEG image data, JFIF standard 1.01
$ file mysterious.bin
mysterious.bin: ELF 64-bit LSB executable, x86-64
$ file renamed.png
renamed.png: PNG image data, 800 x 600, 8-bit/color RGBA
```

Notice that file ignores the extension entirely. Even if you rename a PNG to .txt, it correctly identifies the format from the magic bytes.
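Details such as "64-bit LSB" in the ELF output come from fields that sit right after the four magic bytes. In the ELF identification block, byte 4 encodes the class (1 for 32-bit, 2 for 64-bit) and byte 5 the data encoding (1 for little-endian, 2 for big-endian). The sketch below reads just those two fields. `describeElfIdent` is a hypothetical helper, and it only illustrates the idea; it is not how file itself is implemented.

```javascript
// Describe the class and byte order of an ELF file from its first six bytes.
function describeElfIdent(bytes) {
  const isElf =
    bytes.length >= 6 &&
    bytes[0] === 0x7f && bytes[1] === 0x45 && bytes[2] === 0x4c && bytes[3] === 0x46;
  if (!isElf) return null;

  const wordSize = { 1: '32-bit', 2: '64-bit' }[bytes[4]] ?? 'unknown class';
  const byteOrder = { 1: 'LSB', 2: 'MSB' }[bytes[5]] ?? 'unknown byte order';
  return `ELF ${wordSize} ${byteOrder}`;
}
```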
Try It Yourself
Drop any file into our File Hex Viewer to see its magic bytes detected automatically. The viewer shows the file type, MIME type, extension, and magic byte sequence. Use the hex pattern search to find specific byte sequences within the file, or browse the Hex Dump tab to inspect text data byte by byte.