Base Encoding Guide

Published Nov 5, 2025 | Updated Nov 20, 2025 | 7 min read

Base encoding converts binary data into text using a limited set of characters. This guide covers the most common encoding formats, their algorithms, and practical applications.

Try the interactive Base Encoding Tool →

What is Base Encoding?

Base encoding represents binary data as text using printable ASCII characters. It’s not encryption—it provides no security. Use it for data transmission and compatibility, not confidentiality.

When to Use Base Encoding

Data transmission - Send binary data through text-only channels (email, JSON, XML)
URL compatibility - Encode data safely in URLs and filenames
Human readability - Create identifiers that avoid ambiguous characters
Legacy systems - Interface with systems that only support ASCII

Security Warning

Base encoding is not encryption. Anyone can decode it instantly. For security:

Passwords - Use proper hashing (bcrypt, Argon2, scrypt)
Sensitive data - Use encryption (AES, RSA, ChaCha20)
Data integrity - Use HMAC or digital signatures

Format Comparison

Format	Alphabet Size	Padding	Case Sensitive	Overhead	Best For
Base64	64 chars	Yes (`=`)	Yes	~33%	General purpose, MIME, data URIs
Base64 URL	64 chars	No	Yes	~33%	URLs, filenames, JWT tokens
Base32	32 chars	Yes (`=`)	No	~60%	Human input, TOTP secrets, Tor
Base58	58 chars	No	Yes	~37%	Cryptocurrency, IPFS, short codes
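The overhead figures follow from how much information each character carries: an alphabet of N symbols holds log2(N) bits per character, so one 8-bit byte needs 8 / log2(N) characters. A minimal Python sketch of that arithmetic, ignoring padding (the function name `expansion_ratio` is illustrative, not part of any library):

```python
import math

def expansion_ratio(alphabet_size: int) -> float:
    """Output characters needed per input byte, ignoring padding."""
    bits_per_char = math.log2(alphabet_size)
    return 8 / bits_per_char

for name, size in [("Base64", 64), ("Base32", 32), ("Base58", 58)]:
    overhead = (expansion_ratio(size) - 1) * 100
    print(f"{name}: about {overhead:.0f}% larger")
```

Padding adds a few extra characters on short inputs, so real outputs can be slightly larger than these ratios suggest.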

Base64 (Standard)

Overview

Base64 is the most widely used encoding format. It converts every 3 bytes (24 bits) into 4 Base64 characters (6 bits each).

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Padding: Uses = when input length isn’t divisible by 3

How it Works

Convert input text to binary (UTF-8)
Split binary into 6-bit groups
Map each 6-bit value (0-63) to a character
Add padding if needed

Example Encoding

Text:     "Hi!"
UTF-8:    0x48 0x69 0x21
Binary:   01001000 01101001 00100001
Grouped:  010010 000110 100100 100001
Decimal:  18     6      36     33
Base64:   S      G      k      h
Result:   "SGkh"

Example with Padding

Text:     "Hi"
UTF-8:    0x48 0x69
Binary:   01001000 01101001
Grouped:  010010 000110 1001[00] (2 bits short, pad with 0s)
Decimal:  18     6      36
Base64:   S      G      k
Padding:  Need 1 more char to make multiple of 4
Result:   "SGk="
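The four steps can be sketched directly in Python. This is a teaching version that builds a bit string for clarity rather than speed, and `b64encode_manual` is a hypothetical name; in real code the standard library's `base64.b64encode` is the better choice, and it is used here only to cross-check the result.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data: bytes) -> str:
    # Step 1: turn the bytes into one long bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Step 2: zero-fill the tail so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Step 3: map each 6-bit group (0-63) to its alphabet character.
    chars = [B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Step 4: pad with "=" so the output length is a multiple of 4.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

for text in ("Hi!", "Hi"):
    raw = text.encode("utf-8")
    # Cross-check the sketch against the standard library.
    assert b64encode_manual(raw) == base64.b64encode(raw).decode("ascii")
    print(text, "->", b64encode_manual(raw))
```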

Use Cases

Application	Why Base64
Email attachments (MIME)	SMTP was designed for 7-bit text and can't reliably carry raw binary
Data URIs (`data:image/png;base64,...`)	Embed images/fonts in HTML/CSS
HTTP Basic Auth	`Authorization: Basic base64(user:pass)`
JSON/XML binary data	Both formats are text-based
Certificates (PEM format)	`-----BEGIN CERTIFICATE-----`
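As an illustration of the HTTP Basic Auth row, the sketch below builds the header value and then reverses it. The function names are hypothetical, and the second function is there to show why Base64 offers no confidentiality: decoding needs no key.

```python
import base64
from typing import Tuple

def basic_auth_header(username: str, password: str) -> str:
    # Join with ":" and encode as UTF-8 before Base64-encoding.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    # Anyone who sees the header can reverse it; no key is involved.
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("user", "pass")
print(header)                      # Basic dXNlcjpwYXNz
print(decode_basic_auth(header))   # ('user', 'pass')
```

This is why Basic Auth should only travel over TLS.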

Advantages & Disadvantages

Advantages:

Universal support across platforms
Simple algorithm
Minimal overhead (~33%)

Disadvantages:

Not URL-safe (contains + and /)
Case-sensitive
Padding complicates parsing

Base64 URL-safe

Overview

A Base64 variant designed for URLs and filenames. Replaces problematic characters and removes padding.

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

Padding: None (removed)

Changes from Standard Base64:

+ → - (minus)
/ → _ (underscore)
No = padding

How it Works

Encode using standard Base64
Replace + with -
Replace / with _
Remove trailing = padding

Example

Text:           "Hi!"
Standard Base64: "SGkh"
URL-safe:       "SGkh" (no changes needed in this case)

Text:           ">>?"
Standard Base64: "Pj4/"
URL-safe:       "Pj4_" (/ becomes _)
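In Python, the standard library already handles the character swap, so a sketch only needs to strip the padding on the way out and restore it on the way in. The wrapper names `b64url_encode` and `b64url_decode` are illustrative.

```python
import base64

def b64url_encode(data: bytes) -> str:
    # urlsafe_b64encode already swaps "+" for "-" and "/" for "_".
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")

def b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped so the length is a multiple of 4.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)

print(b64url_encode(b">>?"))   # Pj4_
print(b64url_decode("SGk"))    # b'Hi'
```

Restoring the padding before decoding matters because Python's decoder rejects input whose length is not a multiple of 4.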

Use Cases

Application	Why Base64 URL
JWT tokens	Tokens often passed in URL parameters
URL parameters	No encoding needed for - and _
Filenames	Avoids `/`, the path separator (names can still collide on case-insensitive filesystems)
URL shorteners	Compact and clean URLs
OAuth state parameters	Passed as URL query params

Advantages

Safe in URLs without percent-encoding
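Works in filenames on all major operating systems, though case-insensitive filesystems can treat names that differ only by case as the same file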
More compact (no padding)
Double-click selectable

Base32

Overview

Base32 uses uppercase letters and digits 2-7, making it case-insensitive and human-friendly. It converts every 5 bytes (40 bits) into 8 Base32 characters (5 bits each).

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

Padding: Uses = when input length isn’t divisible by 5

Case: Insensitive (decoding accepts lowercase)

How it Works

Convert input to binary (UTF-8)
Split binary into 5-bit groups
Map each 5-bit value (0-31) to a character
Add padding to make length a multiple of 8

Example Encoding

Text:     "Hi"
UTF-8:    0x48 0x69
Binary:   01001000 01101001
Grouped:  01001 00001 10100 1[0000] (pad to 5 bits)
Decimal:  9     1     20    16
Base32:   J     B     U     Q
Padding:  Need 4 more chars to make 8
Result:   "JBUQ===="
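Python's standard library reproduces this result, and its decoder can be made forgiving about how people type codes. The sketch below assumes users may enter lowercase letters, add spaces or hyphens between groups, and omit the padding; `decode_typed_base32` is a hypothetical helper name.

```python
import base64

def decode_typed_base32(user_input: str) -> bytes:
    # Drop spaces and hyphens that people add when copying groups by hand.
    cleaned = user_input.replace(" ", "").replace("-", "").rstrip("=")
    # Restore "=" padding so the length is a multiple of 8.
    padded = cleaned + "=" * (-len(cleaned) % 8)
    # casefold=True lets the decoder accept lowercase letters.
    return base64.b32decode(padded, casefold=True)

print(base64.b32encode(b"Hi").decode("ascii"))   # JBUQ====
print(decode_typed_base32("jb uq"))              # b'Hi'
```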

Character Choice

Why A-Z and 2-7?

Excluded	Reason
0 (zero)	Looks like O (letter)
1 (one)	Looks like I or l
8, 9	Not needed: 26 letters plus the six digits 2-7 already give 32 symbols

This makes Base32 ideal for:

Verbal communication
Manual entry
Case-insensitive systems
OCR and handwriting

Use Cases

Application	Why Base32
TOTP/HOTP secrets (2FA)	Users type secrets by hand, so case-insensitivity helps
Tor hidden services	`.onion` addresses use Base32
Recovery codes	Easy to read and type correctly
License keys	Unambiguous characters reduce support requests
DNS labels	Case-insensitive, RFC 4648 standard

Advantages & Disadvantages

Advantages:

Case-insensitive
No ambiguous characters
Good for human input

Disadvantages:

Higher overhead (~60%)
Longer output than Base64
Less common support

Base32 Alphabet Variants

RFC 4648 defines the standard Base32 alphabet, but several variants exist for specific use cases:

RFC 4648 Standard

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

The most common variant, used in:

TOTP/HOTP (Google Authenticator, Authy)
Most RFC-compliant implementations
General-purpose applications

RFC 4648 Extended Hex

Alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUV

Uses digits first, then letters:

Preserves sort order: encoded values compare in the same order as the raw bytes, as hexadecimal does
Useful when transitioning from hex encoding
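The sort-order property can be checked with a small Python sketch. It derives the Extended Hex form by translating the standard alphabet, which keeps it independent of Python version (newer versions also ship `base64.b32hexencode`). The comparison uses equal-length inputs, since `=` padding can disturb ordering between inputs of different lengths.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
EXTENDED_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = str.maketrans(STANDARD, EXTENDED_HEX)

def b32hex_encode(data: bytes) -> str:
    # Same 5-bit values as standard Base32, drawn from a different alphabet.
    return base64.b32encode(data).decode("ascii").translate(TO_HEX)

keys = [bytes([n]) for n in (0x00, 0x7F, 0xD0, 0xFF)]   # already in byte order

by_standard = sorted(keys, key=lambda k: base64.b32encode(k))
by_hex = sorted(keys, key=b32hex_encode)

print(by_hex == keys)        # True: encoded order matches byte order
print(by_standard == keys)   # False: digits sort before letters in ASCII
```

The standard alphabet fails the check because its highest values (26-31) are written as the digits 2-7, which sort before every letter.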

z-base-32

Alphabet: ybndrfg8ejkmcpqxot1uwisza345h769

Designed for human usability:

Characters chosen to avoid confusion in all fonts
Optimized for verbal communication
Used in Tahoe-LAFS distributed filesystem

Crockford’s Base32

Alphabet: 0123456789ABCDEFGHJKMNPQRSTVWXYZ

Created by Douglas Crockford for human-readable IDs:

Excludes I, L, and O (easily confused with 1 and 0) and U (to avoid accidental obscenities)
Supports an optional check symbol for error detection
Case-insensitive (decoding accepts lowercase)
Used for short identifiers and database keys

Special features:

Letters I, i, L, and l decode as 1
Letters O and o decode as 0
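A minimal Python sketch of this error tolerance, encoding and decoding non-negative integers. The function names are illustrative, the optional check symbol is left out, and hyphens are ignored on input because Crockford's scheme permits them as visual separators.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Build the decode table, then add the forgiving aliases.
DECODE = {ch: value for value, ch in enumerate(CROCKFORD)}
DECODE.update({"I": 1, "L": 1, "O": 0})

def crockford_encode(number: int) -> str:
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        number, remainder = divmod(number, 32)
        digits.append(CROCKFORD[remainder])
        if number == 0:
            break
    return "".join(reversed(digits))

def crockford_decode(text: str) -> int:
    value = 0
    # Uppercase first (case-insensitive) and ignore hyphens used for grouping.
    for ch in text.upper().replace("-", ""):
        value = value * 32 + DECODE[ch]   # KeyError on symbols such as "U"
    return value

print(crockford_encode(1234))    # 16J
print(crockford_decode("l6j"))   # 1234
```

The second call shows the tolerance at work: a lowercase "l" typed in place of "1" still decodes to the original number.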

Bech32

Alphabet: qpzry9x8gf2tvdw0s3jn54khce6mua7l

Designed for Bitcoin SegWit addresses:

Optimized for error detection
Single case only: a string is all lowercase or all uppercase, never mixed
Works well with QR codes
Note: The tool implements only the alphabet, not the full Bech32 checksum

Key design choices:

No b, i, o (visually ambiguous)
No 1 (used as separator in full Bech32)
Optimized for BCH error correction codes

Choosing a Base32 Variant

Variant	Best For	Key Feature
RFC 4648	TOTP, general use	Standard, widest support
Extended Hex	Hex-compatible systems	Preserves sort order
z-base-32	Verbal communication	Maximum clarity
Crockford	Short IDs, human input	Error tolerance
Bech32	Crypto addresses	Error detection

Base58

Overview

Base58 excludes visually ambiguous characters, making it ideal for manual entry and copy-paste. No padding needed.

Alphabet: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

Padding: None

Case: Sensitive

Character Choice

Excluded Characters:

Character	Reason
0 (zero)	Looks like O (capital o)
O (capital o)	Looks like 0 (zero)
I (capital i)	Looks like l (lowercase L) or 1
l (lowercase L)	Looks like I or 1
+ and /	Not alphanumeric; they break double-click selection and need escaping in URLs

Advantages of this alphabet:

Double-click selects entire string
No confusion in similar fonts
Easier to read aloud
Better for QR codes

How it Works

Unlike Base64/Base32, Base58 treats input as a large number:

Convert input bytes to a big integer
Repeatedly divide by 58
Map remainders to alphabet characters
Preserve leading zero bytes

Example Encoding

Text:     "Hi"
UTF-8:    0x48 0x69
Integer:  0x4869 = 18537 (decimal)

18537 ÷ 58 = 319 remainder 35 → 'c'
319 ÷ 58 = 5 remainder 29   → 'W'
5 ÷ 58 = 0 remainder 5     → '6'

Result: "6Wc" (remainders read in reverse order)
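The same steps in Python, as a sketch rather than a production encoder. Python's built-in integers are arbitrary precision, so no big-integer library is needed. The function names are illustrative, and the alphabet is the Bitcoin one listed above.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    # Treat the whole input as one big-endian integer.
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58_ALPHABET[remainder])
    # Each leading zero byte would vanish in the integer, so emit "1" for it.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))

def b58decode(text: str) -> bytes:
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)   # ValueError on bad chars
    leading_ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body

sample = b"\x00\x00Hi"
assert b58decode(b58encode(sample)) == sample   # leading zeros survive
```

Because the output length depends on the numeric value rather than on a fixed block size, there is no padding and inputs of the same length can produce outputs of different lengths.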

Base58Check (Bitcoin variant)

Bitcoin uses Base58Check, which prepends a version byte (identifying the address type) and appends a checksum (the first 4 bytes of SHA256(SHA256(version + payload))) for error detection.

Example Bitcoin Address Structure

Version + Payload + Checksum
   1 byte   20 bytes   4 bytes
     ↓         ↓         ↓
    [00] [pubkey hash] [checksum]
         ↓
    Encoded as Base58Check
         ↓
    1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
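A Python sketch of this structure, using only `hashlib` from the standard library. The function names are illustrative and the code is meant for study, not for handling real funds. It decodes the address shown above, verifies its checksum, and re-encodes it.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _checksum(data: bytes) -> bytes:
    # First 4 bytes of SHA256(SHA256(data)).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]

def base58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    data += _checksum(data)
    number = int.from_bytes(data, "big")
    chars = []
    while number > 0:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    zeros = len(data) - len(data.lstrip(b"\x00"))   # leading zero bytes become "1"
    return "1" * zeros + "".join(reversed(chars))

def base58check_decode(text: str) -> tuple:
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)
    zeros = len(text) - len(text.lstrip("1"))
    data = b"\x00" * zeros + number.to_bytes((number.bit_length() + 7) // 8, "big")
    body, checksum = data[:-4], data[-4:]
    if _checksum(body) != checksum:
        raise ValueError("checksum mismatch: address was mistyped or corrupted")
    return body[0], body[1:]   # (version byte, payload)

version, payload = base58check_decode("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(version, len(payload))   # 0 20
assert base58check_encode(version, payload) == "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
```

The leading "1" in the address is the encoded 0x00 version byte, which is why addresses of this type always start with that character.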

Use Cases

Application	Why Base58
Bitcoin/crypto addresses	Unambiguous, includes checksum variant
IPFS content IDs (v0)	Human-readable content addressing
Blockchain explorers	Short, URL-safe identifiers
Vanity addresses	Users can “mine” custom prefixes
Short codes	More compact than Base32

Advantages & Disadvantages

Advantages:

No ambiguous characters
No special characters (URL-safe)
Double-click selectable
More compact than Base32

Disadvantages:

Not as compact as Base64
Limited library support
Complex algorithm (big integer math)
Variable-length encoding

Custom Alphabets

When to Use Custom Alphabets

Create a custom alphabet when:

Domain-specific constraints - Your system only accepts certain characters (e.g., alphanumeric-only filenames)
Human readability - Remove confusing characters for your audience (e.g., no vowels to avoid profanity)
Legacy compatibility - Match an existing encoding scheme
Aesthetic requirements - Use characters appropriate for your context

Not for security. Custom alphabets provide obscurity, not cryptographic protection.