What is Base64 Encoding?
A complete guide to understanding Base64: how it works, why it exists, and when you should use it.
Introduction to Base64
Base64 is a binary-to-text encoding scheme that converts arbitrary binary data into a string of printable ASCII characters. The name "Base64" refers to the fact that the encoding uses a set of exactly 64 characters to represent data, plus an optional padding character. It was designed to safely transmit binary data through systems that only reliably handle text, such as email servers, URLs, and XML documents.
If you have ever looked at the source of an HTML email, inspected a JWT token, or embedded a small image directly in CSS, you have encountered Base64 encoding. It is one of the most ubiquitous encoding schemes in modern computing, appearing in protocols, file formats, and APIs across every major platform.
How Base64 Encoding Works
At its core, Base64 encoding transforms every group of 3 input bytes (24 bits total) into 4 output characters, each representing 6 bits of data. Since 2 to the power of 6 equals 64, each output character maps to one of 64 possible values in the Base64 alphabet.
The Encoding Process Step by Step
Let us walk through encoding the text "Man" to understand exactly what happens at the binary level:
- Convert characters to ASCII values: "M" = 77, "a" = 97, "n" = 110
- Convert to binary (8 bits each): 01001101 01100001 01101110
- Regroup into 6-bit chunks: 010011 010110 000101 101110
- Map each 6-bit value to the Base64 alphabet: 19=T, 22=W, 5=F, 46=u
The result: "Man" encodes to "TWFu" in Base64.
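The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name encode_group is made up for this example, and it handles a single full 3-byte group with no padding logic.

```python
# Standard Base64 alphabet: A-Z, a-z, 0-9, then + and /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Pack the 3 bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Slice off four 6-bit values, most significant first.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Map each 6-bit value to its character in the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))  # TWFu
```

In practice you would call a library function instead; the sketch only makes the regrouping from 8-bit bytes to 6-bit values visible.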
Visual: 3 Bytes Become 4 Characters
The Base64 Alphabet
The standard Base64 alphabet consists of 64 characters plus an optional padding character:
- Uppercase letters: A-Z (indices 0-25)
- Lowercase letters: a-z (indices 26-51)
- Digits: 0-9 (indices 52-61)
- Special characters: + (index 62) and / (index 63)
- Padding: = (used to pad output to a multiple of 4 characters)
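This alphabet was chosen because its 64 characters are printable ASCII characters that are represented the same way in virtually every text encoding and pass through email systems unchanged. Two of them, + and /, do have special meaning in URLs and file names, which is why the Base64URL variant substitutes - and _ for them.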
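As a quick check of the index ranges listed above, the alphabet can be assembled from Python's string constants. The helper name index_of is an assumption made for this sketch.

```python
import string

# A-Z (0-25), a-z (26-51), 0-9 (52-61), then + (62) and / (63)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def index_of(char: str) -> int:
    """Return the 6-bit value a Base64 character stands for (hypothetical helper)."""
    return ALPHABET.index(char)


# The four characters of "TWFu" map back to the values from the "Man" example.
print([index_of(c) for c in "TWFu"])  # [19, 22, 5, 46]
```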
Understanding Padding
Base64 encoding always processes input in groups of 3 bytes. But what happens when the input length is not evenly divisible by 3? This is where padding comes in. The = character is appended to the output to make its length a multiple of 4.
- Input is a multiple of 3 bytes: No padding needed. For example, "Man" (3 bytes) encodes to TWFu (4 characters, no padding).
- Input has 1 remaining byte: Two padding characters are added. For example, "M" (1 byte) encodes to TQ== (2 data characters + 2 padding).
- Input has 2 remaining bytes: One padding character is added. For example, "Ma" (2 bytes) encodes to TWE= (3 data characters + 1 padding).
Padding makes every encoded string a whole number of 4-character groups, so a decoder can tell exactly how many bytes the final group holds. Some variants, such as Base64URL as used in JWTs, omit padding entirely, since the number of missing bytes can be inferred from the encoded length.
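The relationship between input length, padding, and the inferred byte count can be sketched as follows. The function names padding_count and decoded_length are illustrative and not part of any library.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters appended for an input of n_bytes."""
    return (3 - n_bytes % 3) % 3


def decoded_length(unpadded: str) -> int:
    """Infer the original byte count from Base64 text with '=' stripped."""
    if len(unpadded) % 4 == 1:
        # One leftover character carries only 6 bits, less than a byte.
        raise ValueError("a single leftover character cannot encode a byte")
    return len(unpadded) * 6 // 8


for data in (b"Man", b"Ma", b"M"):
    encoded = base64.b64encode(data).decode("ascii")
    stripped = encoded.rstrip("=")
    print(data, encoded, padding_count(len(data)), decoded_length(stripped))
# b'Man' TWFu 0 3
# b'Ma' TWE= 1 2
# b'M' TQ== 2 1
```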
The Size Overhead
Because 3 input bytes become 4 output characters, Base64 encoding increases data size by approximately 33%. More precisely, the encoded size is ceil(n / 3) * 4 bytes, where n is the input size. This overhead is the price of text-safe representation and is generally acceptable for small to medium-sized data.
For large data like images or files, this 33% increase can be significant. A 1 MB image becomes approximately 1.33 MB in Base64, or about 1.37 MB once MIME-style line breaks are added every 76 characters. This is why Base64-encoded data URIs are best reserved for small assets like icons, favicons, or inline SVGs where the overhead is outweighed by eliminating an extra HTTP request.
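The size formula can be checked against a real encoder. This is a small sketch; encoded_length is a name chosen for the example, and it covers padded output without line breaks.

```python
import base64


def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes: ceil(n / 3) * 4."""
    return (n + 2) // 3 * 4  # integer form of ceil(n / 3) * 4


# The formula agrees with the standard library for a range of sizes.
for n in (0, 1, 2, 3, 4, 100, 1000):
    assert encoded_length(n) == len(base64.b64encode(bytes(n)))

print(encoded_length(1_000_000))  # 1333336
```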
Common Use Cases
Email Attachments (MIME)
Base64 encoding was originally designed for email. SMTP was built to carry 7-bit ASCII text, so binary attachments like images, PDFs, or compressed files could not be sent reliably as raw bytes. Base64 solves this by converting binary data into text that email servers can safely transmit. The MIME standard (Multipurpose Internet Mail Extensions) specifies Base64 as one of its content transfer encodings, and it remains the usual encoding for binary attachments in email today.
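To see MIME's use of Base64 in practice, here is a minimal sketch using Python's standard email package. The addresses and the file name are placeholders, and build_message is a hypothetical helper written for this example.

```python
from email.message import EmailMessage


def build_message(payload: bytes) -> EmailMessage:
    """Build a message with one binary attachment (placeholder headers)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address
    msg["Subject"] = "Binary attachment"
    msg.set_content("See the attached file.")
    # Bytes content is transfer-encoded as Base64 by default.
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename="blob.bin",                # hypothetical file name
    )
    return msg


payload = bytes(range(256))
msg = build_message(payload)
attachment = next(msg.iter_attachments())
print(attachment["Content-Transfer-Encoding"])  # base64
assert attachment.get_content() == payload      # decoding restores the bytes
```

Printing msg.as_string() shows the attachment body as lines of Base64 text between the MIME boundaries.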
Data URIs in HTML and CSS
Data URIs allow you to embed small files directly in HTML or CSS using the format data:[mediatype];base64,[data]. Instead of referencing an external file, the Base64-encoded content is inlined. This eliminates an HTTP request, which can improve performance for small assets. Common uses include inline SVG icons, small background images, and custom fonts embedded in stylesheets. Learn more in our complete guide to data URIs.
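A data URI in the format described above can be assembled with a short helper. This is a sketch; to_data_uri is an illustrative name, and the SVG markup is a stand-in for a real asset.

```python
import base64


def to_data_uri(data: bytes, mediatype: str) -> str:
    """Build a data:[mediatype];base64,[data] URI (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mediatype};base64,{encoded}"


svg = b'<svg xmlns="http://www.w3.org/2000/svg"/>'  # stand-in asset
uri = to_data_uri(svg, "image/svg+xml")
print(uri)

# The part after the comma decodes back to the original bytes.
assert base64.b64decode(uri.split(",", 1)[1]) == svg
```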
API Payloads and JSON
When APIs need to transmit binary data within JSON payloads, Base64 encoding is the standard approach. JSON is a text format that cannot natively represent arbitrary bytes, so fields containing images, certificates, encryption keys, or other binary data are typically Base64-encoded. For example, the Google Cloud APIs use Base64 extensively for transmitting file content, and most REST APIs follow the same pattern.
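A minimal sketch of the pattern follows. The field names "filename" and "content" are assumptions chosen for illustration and do not come from any specific API.

```python
import base64
import json


def to_payload(filename: str, content: bytes) -> str:
    """Serialize binary content into a JSON document (hypothetical schema)."""
    return json.dumps({
        "filename": filename,
        # JSON strings cannot hold raw bytes, so encode them as Base64 text.
        "content": base64.b64encode(content).decode("ascii"),
    })


def from_payload(text: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(text)["content"])


raw = bytes([0x00, 0xFF, 0x10, 0x80])  # bytes that are not valid UTF-8 text
payload = to_payload("blob.bin", raw)
assert from_payload(payload) == raw
```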
JSON Web Tokens (JWT)
JWTs use a variant called Base64URL encoding (which replaces + with - and / with _, and drops the = padding) to encode the header, payload, and signature segments. This ensures the token can be safely used in URLs, HTTP headers, and cookies without special escaping. You can decode JWTs with our JWT Decoder tool. For a deeper comparison, see our article on Base64 vs Base64URL.
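The following sketch shows unpadded Base64URL encoding of a JWT-style header. The helper names b64url_encode and b64url_decode are made up for this example, and it does not sign or verify anything.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64URL-encode and strip the '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


header = {"alg": "HS256", "typ": "JWT"}
segment = b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8"))
print(segment)  # eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
assert json.loads(b64url_decode(segment)) == header

# The same two bytes in the standard and the URL-safe alphabet:
print(base64.b64encode(b"\xfb\xff").decode("ascii"))  # +/8=
print(b64url_encode(b"\xfb\xff"))                     # -_8
```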
Storing Binary Data in Text Formats
Configuration files (YAML, TOML, INI), environment variables, database text columns, and XML documents all work with text. When you need to store binary data like encryption keys, certificates, or small binary blobs in these formats, Base64 provides a reliable encoding that preserves the data through any text processing step.
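As an illustration, here is a sketch that round-trips a random key through an environment variable. The variable name APP_SECRET_KEY_B64 and both helper functions are hypothetical.

```python
import base64
import os

ENV_NAME = "APP_SECRET_KEY_B64"  # hypothetical variable name


def store_key(key: bytes) -> None:
    """Store binary key material as Base64 text in the environment."""
    os.environ[ENV_NAME] = base64.b64encode(key).decode("ascii")


def load_key() -> bytes:
    """Read the key back; validate=True rejects non-alphabet characters."""
    return base64.b64decode(os.environ[ENV_NAME], validate=True)


key = os.urandom(32)  # 32 random bytes, most of them not printable
store_key(key)
assert load_key() == key
```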
When NOT to Use Base64
Despite its versatility, Base64 is not always the right choice:
- Large files: The 33% size increase becomes prohibitive for large images, videos, or archives. Use proper binary transfer mechanisms instead.
- Encryption or security: Base64 is not encryption. It is trivially reversible and provides zero security. Never use Base64 to "hide" sensitive data like passwords or API keys.
- URL parameters for large data: URLs have practical length limits. Around 2,000 characters is a commonly cited safe ceiling across browsers, servers, and proxies, and Base64-encoded data can quickly exceed it.
- Performance-critical paths: Encoding and decoding add CPU overhead. For high-throughput systems processing millions of operations, consider binary protocols like Protocol Buffers or MessagePack.
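To make the point about security concrete, this short sketch shows that anyone can reverse Base64 without a key. The sample password is made up.

```python
import base64

secret = b"password123"  # made-up example value
encoded = base64.b64encode(secret).decode("ascii")
print(encoded)

# No key or secret is needed to recover the original bytes.
print(base64.b64decode(encoded))  # b'password123'
```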
Base64 Encoding in Programming Languages
Most modern programming languages provide Base64 support in their standard library:
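// JavaScript (browsers and recent Node.js; btoa/atob only accept Latin-1 strings)
btoa('Hello World') // Encode: "SGVsbG8gV29ybGQ="
atob('SGVsbG8gV29ybGQ=') // Decode: "Hello World"
# Python
import base64
base64.b64encode(b'Hello World') # b'SGVsbG8gV29ybGQ='
base64.b64decode(b'SGVsbG8gV29ybGQ=') # b'Hello World'
// Java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
Base64.getEncoder().encodeToString("Hello World".getBytes(StandardCharsets.UTF_8));
// "SGVsbG8gV29ybGQ="
// Go
import "encoding/base64"
base64.StdEncoding.EncodeToString([]byte("Hello World"))
// "SGVsbG8gV29ybGQ="
# Ruby
require "base64"
Base64.encode64("Hello World") # "SGVsbG8gV29ybGQ=\n" (encode64 appends a newline)
Base64.strict_encode64("Hello World") # "SGVsbG8gV29ybGQ="
// C# / .NET
using System;
using System.Text;
Convert.ToBase64String(Encoding.UTF8.GetBytes("Hello World"));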
// "SGVsbG8gV29ybGQ="
Try It Yourself
The best way to understand Base64 is to experiment with it. Try encoding different inputs with our Base64 Encoder/Decoder tool and observe how the output changes. Try encoding single characters, then pairs, then triples to see how padding works. Encode binary data from files using the File tab, and analyze existing Base64 strings with the Analyze tab.
Further Reading
- RFC 4648 — Base Encodings
The definitive IETF standard for Base64 and related encodings.
- Base64 — Wikipedia
Overview of Base64 encoding history, algorithm, and character tables.
- MDN btoa()
JavaScript API for encoding binary strings to Base64.
- RFC 2045 — MIME
The original MIME specification where Base64 content-transfer-encoding is defined.