Base Encoding Guide
Base encoding converts binary data into text using a limited set of characters. This guide covers the most common encoding formats, their algorithms, and practical applications.
Try the interactive Base Encoding Tool →
What is Base Encoding?
Base encoding represents binary data as text using printable ASCII characters. It is not encryption and provides no security. Use it for data transmission and compatibility, not confidentiality.
When to Use Base Encoding
- Data transmission - Send binary data through text-only channels (email, JSON, XML)
- URL compatibility - Encode data safely in URLs and filenames
- Human readability - Create identifiers that avoid ambiguous characters
- Legacy systems - Interface with systems that only support ASCII
Security Warning
Base encoding is not encryption. Anyone can decode it instantly. For security:
- Passwords - Use proper hashing (bcrypt, Argon2, scrypt)
- Sensitive data - Use encryption (AES, RSA, ChaCha20)
- Data integrity - Use HMAC or digital signatures
Format Comparison
| Format | Alphabet Size | Padding | Case Sensitive | Overhead | Best For |
|---|---|---|---|---|---|
| Base64 | 64 chars | Yes (=) | Yes | ~33% | General purpose, MIME, data URIs |
| Base64 URL | 64 chars | No | Yes | ~33% | URLs, filenames, JWT tokens |
| Base32 | 32 chars | Yes (=) | No | ~60% | Human input, TOTP secrets, Tor |
| Base58 | 58 chars | No | Yes | ~37% | Cryptocurrency, IPFS, short codes |
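The overhead figures in the table follow from how many bits each character carries. A minimal sketch of that arithmetic (the function name `overhead` is illustrative, and padding is ignored):

```python
import math

def overhead(alphabet_size: int) -> float:
    """Approximate size increase (as a fraction) when bytes are encoded in this base."""
    bits_per_char = math.log2(alphabet_size)   # e.g. 6 bits for a 64-character alphabet
    chars_per_byte = 8 / bits_per_char         # characters needed per input byte
    return chars_per_byte - 1

for name, size in [("Base64", 64), ("Base32", 32), ("Base58", 58)]:
    print(f"{name}: ~{overhead(size):.0%} overhead")
```

For short inputs the real overhead is higher, because padding and rounding up to whole characters add a few extra symbols.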
Base64 (Standard)
Overview
Base64 is the most widely used encoding format. It converts every 3 bytes (24 bits) into 4 Base64 characters (6 bits each).
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
Padding: Uses = when input length isn’t divisible by 3
How it Works
- Convert input text to binary (UTF-8)
- Split binary into 6-bit groups
- Map each 6-bit value (0-63) to a character
- Add padding if needed
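The steps above can be sketched in a few lines of Python. This is an illustrative, unoptimized version (the name `b64_encode` is an assumption for the example); in practice the standard library's `base64.b64encode` does the same job.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    # Step 1: turn the bytes into one long string of bits
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the last group so the bit count is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    # Steps 2-3: map each 6-bit group to its alphabet character
    chars = [B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Step 4: pad the output with "=" to a multiple of 4 characters
    return "".join(chars) + "=" * (-len(chars) % 4)

text = "Hi!".encode("utf-8")
print(b64_encode(text))
print(b64_encode(text) == base64.b64encode(text).decode("ascii"))
```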
Example Encoding
Text:    "Hi!"
UTF-8:   0x48     0x69     0x21
Binary:  01001000 01101001 00100001
Grouped: 010010 000110 100100 100001
Decimal: 18     6      36     33
Base64:  S      G      k      h
Result:  "SGkh"
Example with Padding
Text:    "Hi"
UTF-8:   0x48     0x69
Binary:  01001000 01101001
Grouped: 010010 000110 1001[00]   (2 bits short, pad with 0s)
Decimal: 18     6      36
Base64:  S      G      k
Padding: Need 1 more char to make multiple of 4
Result:  "SGk="
Use Cases
| Application | Why Base64 |
|---|---|
| Email attachments (MIME) | SMTP is text-only, can’t send binary |
| Data URIs (data:image/png;base64,...) | Embed images/fonts in HTML/CSS |
| HTTP Basic Auth | Authorization: Basic base64(user:pass) |
| JSON/XML binary data | Both formats are text-based |
| Certificates (PEM format) | -----BEGIN CERTIFICATE----- |
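Two of these use cases are easy to reproduce with Python's standard `base64` module. The helper names `basic_auth_header` and `data_uri` below are hypothetical, chosen only for this sketch:

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    # HTTP Basic Auth encodes "user:pass"; this is encoding, not protection
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def data_uri(payload: bytes, mime_type: str) -> str:
    # Data URIs carry the Base64 text after the ";base64," marker
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

print(basic_auth_header("user", "pass"))
print(data_uri(b"Hi!", "text/plain"))
```

Because the credentials can be decoded by anyone who sees the header, Basic Auth should only travel over an encrypted connection.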
Advantages & Disadvantages
Advantages:
- Universal support across platforms
- Simple algorithm
- Minimal overhead (~33%)
Disadvantages:
- Not URL-safe (contains + and /)
- Case-sensitive
- Padding complicates parsing
Base64 URL-safe
Overview
A Base64 variant designed for URLs and filenames. Replaces problematic characters and removes padding.
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
Padding: None (removed)
Changes from Standard Base64:
- + → - (minus)
- / → _ (underscore)
- No = padding
How it Works
- Encode using standard Base64
- Replace + with -
- Replace / with _
- Remove trailing = padding
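A minimal Python sketch of these steps, plus the reverse direction. The function names are assumptions for the example; decoding has to restore the stripped padding first, because `base64.urlsafe_b64decode` expects it.

```python
import base64

def b64url_encode(data: bytes) -> str:
    standard = base64.b64encode(data).decode("ascii")
    # Swap the two URL-unsafe characters, then drop trailing padding
    return standard.replace("+", "-").replace("/", "_").rstrip("=")

def b64url_decode(text: str) -> bytes:
    # Re-add the "=" characters that were stripped during encoding
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

print(b64url_encode(b">>?"))
print(b64url_decode(b64url_encode(b"Hi")))
```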
Example
Text:            "Hi!"
Standard Base64: "SGkh"
URL-safe:        "SGkh"   (no changes needed in this case)

Text:            ">>?"
Standard Base64: "Pj4/"
URL-safe:        "Pj4_"   (/ becomes _)
Use Cases
| Application | Why Base64 URL |
|---|---|
| JWT tokens | Tokens often passed in URL parameters |
| URL parameters | No encoding needed for - and _ |
| Filenames | Safe across all filesystems |
| URL shorteners | Compact and clean URLs |
| OAuth state parameters | Passed as URL query params |
Advantages
- Safe in URLs without percent-encoding
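- Uses only characters that are valid in filenames on common operating systems (on case-insensitive filesystems, names that differ only in case can still collide)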
- More compact (no padding)
- Double-click selectable
Base32
Overview
Base32 uses uppercase letters and digits 2-7, making it case-insensitive and human-friendly. It converts every 5 bytes (40 bits) into 8 Base32 characters (5 bits each).
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
Padding: Uses = when input length isn’t divisible by 5
Case: Insensitive (decoding accepts lowercase)
How it Works
- Convert input to binary (UTF-8)
- Split binary into 5-bit groups
- Map each 5-bit value (0-31) to a character
- Add padding to make length a multiple of 8
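The same bit-grouping idea, sketched in Python with 5-bit groups. `b32_encode` and `b32_decode_lenient` are illustrative names; the decoder leans on the standard library's `casefold=True` option to accept lowercase input.

```python
import base64

B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32_encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the last group so the bit count is a multiple of 5
    bits += "0" * (-len(bits) % 5)
    chars = [B32_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)]
    # Pad the output with "=" to a multiple of 8 characters
    return "".join(chars) + "=" * (-len(chars) % 8)

def b32_decode_lenient(text: str) -> bytes:
    # casefold=True lets the standard decoder accept lowercase letters
    return base64.b32decode(text, casefold=True)

print(b32_encode("Hi".encode("utf-8")))
print(b32_decode_lenient("jbuq===="))
```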
Example Encoding
Text:    "Hi"
UTF-8:   0x48     0x69
Binary:  01001000 01101001
Grouped: 01001 00001 10100 1[0000]   (pad to 5 bits)
Decimal: 9     1     20    16
Base32:  J     B     U     Q
Padding: Need 4 more chars to make 8
Result:  "JBUQ===="
Character Choice
Why A-Z and 2-7?
| Excluded | Reason |
|---|---|
| 0 (zero) | Looks like O (letter) |
| 1 (one) | Looks like I or l |
| 8, 9 | Not needed: 26 letters plus the six digits 2-7 already give 32 symbols |
This makes Base32 ideal for:
- Verbal communication
- Manual entry
- Case-insensitive systems
- OCR and handwriting
Use Cases
| Application | Why Base32 |
|---|---|
| TOTP/HOTP secrets (2FA) | Users manually enter seeds, case-insensitive helps |
| Tor hidden services | .onion addresses use Base32 |
| Recovery codes | Easy to read and type correctly |
| License keys | Unambiguous characters reduce support requests |
| DNS labels | Case-insensitive, RFC 4648 standard |
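For the TOTP case, user-typed secrets usually arrive in lowercase, with spaces, and without padding. A hedged sketch of cleaning one up before decoding (the function name and the exact cleanup rules are assumptions, not part of any standard):

```python
import base64

def decode_totp_secret(user_input: str) -> bytes:
    # Drop spaces and hyphens that users or apps insert for readability
    cleaned = user_input.replace(" ", "").replace("-", "").upper().rstrip("=")
    # Restore padding so the length is a multiple of 8
    padded = cleaned + "=" * (-len(cleaned) % 8)
    return base64.b32decode(padded)

print(decode_totp_secret("jbsw y3dp ehpk 3pxp").hex())
```

`base64.b32decode` raises an error for characters outside the alphabet, so typos such as 0 or 1 are rejected rather than silently accepted.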
Advantages & Disadvantages
Advantages:
- Case-insensitive
- No ambiguous characters
- Good for human input
Disadvantages:
- Higher overhead (~60%)
- Longer output than Base64
- Less common support
Base32 Alphabet Variants
RFC 4648 defines the standard Base32 alphabet, but several variants exist for specific use cases:
RFC 4648 Standard
Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
The most common variant, used in:
- TOTP/HOTP (Google Authenticator, Authy)
- Most RFC-compliant implementations
- General-purpose applications
RFC 4648 Extended Hex
Alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUV
Uses digits first, then letters:
- Preserves sort order: encoded strings sort the same way as the underlying bytes
- Useful when transitioning from hex encoding
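The sort-order property can be checked with a small experiment. The sketch below derives Extended Hex output by translating the standard alphabet position by position (an illustrative shortcut; `b32hex_encode` is a name chosen for this example) and compares the two variants on equal-length inputs.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
EXTENDED_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = str.maketrans(STANDARD, EXTENDED_HEX)

def b32hex_encode(data: bytes) -> str:
    # Same 5-bit groups as standard Base32, only the symbols differ
    return base64.b32encode(data).decode("ascii").translate(TO_HEX)

# Four 5-byte values, already in ascending byte order
values = [bytes([n]) * 5 for n in (0x00, 0x7F, 0x80, 0xFF)]
hex_encoded = [b32hex_encode(v) for v in values]
std_encoded = [base64.b32encode(v).decode("ascii") for v in values]

print(hex_encoded == sorted(hex_encoded))  # order preserved
print(std_encoded == sorted(std_encoded))  # order lost: digits sort before letters
```

The standard alphabet loses the ordering because 2-7 represent the highest values but sort before A-Z in ASCII.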
z-base-32
Alphabet: ybndrfg8ejkmcpqxot1uwisza345h769
Designed for human usability:
- Characters chosen to avoid confusion in all fonts
- Optimized for verbal communication
- Used in Tahoe-LAFS distributed filesystem
Crockford’s Base32
Alphabet: 0123456789ABCDEFGHJKMNPQRSTVWXYZ
Created by Douglas Crockford for human-readable IDs:
- Excludes I, L, and O (easily confused with 1 and 0) and U (to avoid accidental obscenities)
- Defines an optional check symbol for error detection
- Case-insensitive (decoding accepts lowercase)
- Used for short identifiers and database keys
Special features:
- Letters I, i, L, l decode as 1
- Letter O, o decodes as 0
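These decoding rules are simple to apply before the alphabet lookup. A minimal sketch that decodes a Crockford string to an integer (the function name is an assumption, and the optional check symbol is not handled here):

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Fold the look-alike letters onto the digits they are mistaken for
ALIASES = str.maketrans("ILO", "110")

def crockford_decode_int(text: str) -> int:
    normalized = text.upper().translate(ALIASES)
    value = 0
    for ch in normalized:
        # str.index raises ValueError for symbols outside the alphabet, such as U
        value = value * 32 + CROCKFORD.index(ch)
    return value

print(crockford_decode_int("1O"))  # read as "10"
print(crockford_decode_int("il"))  # read as "11"
```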
Bech32
Alphabet: qpzry9x8gf2tvdw0s3jn54khce6mua7l
Designed for Bitcoin SegWit addresses:
- Optimized for error detection
- Single case only: a string is all lowercase or all uppercase, never mixed
- Works well with QR codes
- Note: The tool implements only the alphabet, not the full Bech32 checksum
Key design choices:
- No b, i, o (visually ambiguous)
- No 1 (used as separator in full Bech32)
- Optimized for BCH error correction codes
Choosing a Base32 Variant
| Variant | Best For | Key Feature |
|---|---|---|
| RFC 4648 | TOTP, general use | Standard, widest support |
| Extended Hex | Hex-compatible systems | Preserves sort order |
| z-base-32 | Verbal communication | Maximum clarity |
| Crockford | Short IDs, human input | Error tolerance |
| Bech32 | Crypto addresses | Error detection |
Base58
Overview
Base58 excludes visually ambiguous characters, making it ideal for manual entry and copy-paste. No padding needed.
Alphabet: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
Padding: None
Case: Sensitive
Character Choice
Excluded Characters:
| Character | Reason |
|---|---|
| 0 (zero) | Looks like O (capital o) |
| O (capital o) | Looks like 0 (zero) |
| I (capital i) | Looks like l (lowercase L) or 1 |
| l (lowercase L) | Looks like I or 1 |
Advantages of this alphabet:
- Double-click selects entire string
- No confusion in similar fonts
- Easier to read aloud
- Better for QR codes
How it Works
Unlike Base64/Base32, Base58 treats input as a large number:
- Convert input bytes to a big integer
- Repeatedly divide by 58
- Map remainders to alphabet characters
- Preserve leading zero bytes by emitting one '1' (the alphabet's zero digit) for each
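A compact Python sketch of the big-integer approach (the name `b58_encode` is illustrative). Python's arbitrary-precision integers make the division loop straightforward; other languages typically need a big-integer library.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58_encode(data: bytes) -> str:
    # Treat the whole input as one big-endian integer
    number = int.from_bytes(data, "big")
    chars = []
    while number > 0:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    # Leading zero bytes vanish in the integer, so re-add them as '1'
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    # Remainders come out least-significant first, so reverse them
    return "1" * leading_zeros + "".join(reversed(chars))

print(b58_encode("Hi".encode("utf-8")))
print(b58_encode(b"\x00\x00" + "Hi".encode("utf-8")))
```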
Example Encoding
Text:    "Hi"
UTF-8:   0x48 0x69
Integer: 0x4869 = 18537 (decimal)

18537 ÷ 58 = 319 remainder 35 → 'c'
  319 ÷ 58 =   5 remainder 29 → 'W'
    5 ÷ 58 =   0 remainder  5 → '6'

Result: "6Wc" (reversed remainders)
Base58Check (Bitcoin variant)
Bitcoin uses Base58Check, which prepends a version byte (identifying the address type) and appends a checksum (the first 4 bytes of SHA256(SHA256(version + payload))) for error detection.
Example Bitcoin Address Structure
Version + Payload + Checksum
1 byte    20 bytes  4 bytes
   ↓         ↓         ↓
 [00]  [pubkey hash] [checksum]
             ↓
   Encoded as Base58Check
             ↓
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
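The structure above can be sketched with Python's `hashlib`. The helper names are assumptions for this example, and the code covers only the version-byte and checksum layout described here, not full address derivation from a public key.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _b58_encode(data: bytes) -> str:
    number = int.from_bytes(data, "big")
    chars = []
    while number > 0:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(chars))

def _b58_decode(text: str) -> bytes:
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)
    ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * ones + body

def _checksum(data: bytes) -> bytes:
    # First 4 bytes of a double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]

def b58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    return _b58_encode(data + _checksum(data))

def b58check_is_valid(text: str) -> bool:
    raw = _b58_decode(text)
    data, checksum = raw[:-4], raw[-4:]
    return len(raw) > 4 and _checksum(data) == checksum

print(b58check_is_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(b58check_is_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))
```

A single mistyped character changes the decoded bytes, so the recomputed checksum no longer matches and the string is rejected.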
Use Cases
| Application | Why Base58 |
|---|---|
| Bitcoin/crypto addresses | Unambiguous, includes checksum variant |
| IPFS content IDs (v0) | Human-readable content addressing |
| Blockchain explorers | Short, URL-safe identifiers |
| Vanity addresses | Users can “mine” custom prefixes |
| Short codes | More compact than Base32 |
Advantages & Disadvantages
Advantages:
- No ambiguous characters
- No special characters (URL-safe)
- Double-click selectable
- More compact than Base32
Disadvantages:
- Not as compact as Base64
- Limited library support
- Complex algorithm (big integer math)
- Variable-length encoding
Custom Alphabets
When to Use Custom Alphabets
Create a custom alphabet when:
- Domain-specific constraints - Your system only accepts certain characters (e.g., alphanumeric-only filenames)
- Human readability - Remove confusing characters for your audience (e.g., no vowels to avoid profanity)
- Legacy compatibility - Match an existing encoding scheme
- Aesthetic requirements - Use characters appropriate for your context
Not for security. Custom alphabets provide obscurity, not cryptographic protection.
How to Design a Custom Alphabet
1. Choose your character set
Consider what constraints you have:
- URL-safe? Avoid special characters
- Case-sensitive system? Can use upper and lowercase
- Human input? Avoid ambiguous characters (0/O, 1/I/l)
- Voice communication? Use distinct sounds
2. Pick the alphabet size
Larger alphabets = more compact encoding:
- Base16 (Hex) - 2 characters per byte, easy to read
- Base32 - Good balance, case-insensitive options
- Base58 - Compact, unambiguous
- Base64 - Most compact, but has special chars
3. Order matters for sorting
If you need encoded values to sort the same as the original data, put characters in sort order:
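- 0123456789ABCDEF (hex order)
- 0123456789ABCDEFGHIJKLMNOPQRSTUV (RFC 4648 Extended Hex Base32)

The standard RFC 4648 Base32 alphabet (A-Z, then 2-7) does not have this property, because the digits sort before the letters in ASCII.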
Design Checklist
- All characters are unique (no duplicates)
- 2-256 characters in your alphabet
- Works in your target environment (URLs, databases, filesystems)
- Documented for future maintainers
- Tested with edge cases (empty input, all zeros, large inputs)
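Parts of this checklist can be automated. The sketch below checks uniqueness and size directly; the whitespace check is an assumption standing in for "works in your target environment", which really depends on where the strings will be used.

```python
def validate_alphabet(alphabet: str) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if not 2 <= len(alphabet) <= 256:
        problems.append(f"expected 2-256 characters, got {len(alphabet)}")
    duplicates = sorted({ch for ch in alphabet if alphabet.count(ch) > 1})
    if duplicates:
        problems.append(f"duplicate characters: {''.join(duplicates)}")
    # Assumption: whitespace and non-printable characters rarely survive
    # URLs, databases, or filenames intact
    if any(ch.isspace() or not ch.isprintable() for ch in alphabet):
        problems.append("contains whitespace or non-printable characters")
    return problems

print(validate_alphabet("0123456789abcdef"))
print(validate_alphabet("AAB"))
```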
Common Examples
Alphanumeric (Base36) - Case-insensitive, no special chars:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
No vowels (Base29: 8 digits plus 21 consonants) - Avoid accidental words:
23456789BCDFGHJKLMNPQRSTVWXYZ
DNA encoding (Base4):
ACGT
Lowercase hex (Base16):
0123456789abcdef
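When the alphabet size is a power of two, as with the four-letter DNA alphabet, each byte splits cleanly into fixed-width groups and no big-integer math is needed. A minimal sketch (function names are illustrative) using 2 bits per letter:

```python
DNA_ALPHABET = "ACGT"  # A=0, C=1, G=2, T=3

def dna_encode(data: bytes) -> str:
    out = []
    for byte in data:
        # Four 2-bit groups per byte, most significant first
        for shift in (6, 4, 2, 0):
            out.append(DNA_ALPHABET[(byte >> shift) & 0b11])
    return "".join(out)

def dna_decode(text: str) -> bytes:
    if len(text) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        byte = 0
        for ch in text[i:i + 4]:
            byte = (byte << 2) | DNA_ALPHABET.index(ch)
        out.append(byte)
    return bytes(out)

print(dna_encode("Hi".encode("utf-8")))
print(dna_decode(dna_encode(b"Hi")))
```

Alphabets whose size is not a power of two, such as Base36 or the no-vowels set, need the divide-by-base approach used for Base58 instead.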
Learn More
For specifications, see RFC 4648 (Base64, Base32, Base16) and Bitcoin Base58Check.