
Parts of a URL


This guide describes the parts of a URL as they are named in the browser standard and in related specs.

URL vocabulary is split across three sources, each authoritative in its domain: the WHATWG URL Standard, RFC 3986, and the Public Suffix List.

The breakdown below shows how an example URL is parsed, using the WHATWG name for each part:

URL anatomy

Example: https://user:pass@www.example.co.uk:8080/path/to/resource?query=string#section

  • protocol: https:
  • username: user
  • password: pass
  • hostname: www.example.co.uk
  • port: 8080
  • host: www.example.co.uk:8080
  • pathname: /path/to/resource
  • search: ?query=string
  • hash: #section
  • origin: https://www.example.co.uk:8080

The parts of a URL

Components below use WHATWG names. The Rosetta table maps each to its RFC 3986 equivalent.

  • protocol: Identifies the URL scheme. https:, mailto:, urn:, data:.
  • username, password: Credentials embedded in the URL. Most fetch APIs strip them. Browsers block them on top-level navigation.
  • hostname: The server name or IP literal. www.example.co.uk, [::1], 127.0.0.1.
    • port: TCP/UDP port number. Omitted when it matches the scheme’s default: https://example.com:443/ serializes as https://example.com/.
    • host: hostname and port together. example.com:8443.
  • pathname: Slash-delimited path on the host. /books/123, /, /index.html. For opaque-path schemes (mailto:, urn:), it’s everything after the first :.
  • search: Query string. ?q=urls&page=2. Conventionally key=value pairs joined by &. Applications may define their own format.
  • hash: Fragment identifier. #section-2. Resolved client-side and never sent to the server.
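
As a rough illustration of the list above, the sketch below reads each component from the global `URL` class that browsers and Node.js expose. The example URL and the variable names (`anatomyUrl`, `anatomyParts`, `defaultPortUrl`) are chosen for illustration only.

```typescript
// Parse an example URL with the WHATWG URL class (a global in browsers and Node.js).
const anatomyUrl = new URL(
  "https://user:pass@www.example.co.uk:8080/path/to/resource?query=string#section",
);

// Each property name matches a component from the list above.
const anatomyParts = {
  protocol: anatomyUrl.protocol, // "https:"
  username: anatomyUrl.username, // "user"
  password: anatomyUrl.password, // "pass"
  hostname: anatomyUrl.hostname, // "www.example.co.uk"
  port: anatomyUrl.port,         // "8080"
  host: anatomyUrl.host,         // "www.example.co.uk:8080"
  pathname: anatomyUrl.pathname, // "/path/to/resource"
  search: anatomyUrl.search,     // "?query=string"
  hash: anatomyUrl.hash,         // "#section"
};
console.log(anatomyParts);

// A port equal to the scheme's default is dropped during parsing.
const defaultPortUrl = new URL("https://example.com:443/");
console.log(defaultPortUrl.port); // ""
console.log(defaultPortUrl.href); // "https://example.com/"
```

Note that `protocol`, `search`, and `hash` keep their delimiters (`:`, `?`, `#`), while the other properties do not.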

Rosetta table

Concept | WHATWG | RFC 3986 | Note
--- | --- | --- | ---
Scheme | protocol | scheme | WHATWG includes the trailing :.
Bare host name / IP | hostname | host |
Hostname plus port | host | (none) | WHATWG host includes the port; RFC host does not.
Port | port | port |
Credentials | username + password | userinfo |
Path | pathname | path |
Query string | search | query | WHATWG includes the leading ?.
Fragment | hash | fragment | WHATWG includes the leading #.
Origin | origin | (none) | scheme + :// + host + port, excluding userinfo. Null for file:// and opaque-path schemes.
Subdomain | (none) | (none) | From the Public Suffix List, not either URL spec. The labels left of the registrable domain.
Public suffix | (none) | (none) | From the PSL. Longest suffix under which a registrar will let anyone register.
Registrable domain | (none) | (none) | From the PSL. Public suffix plus one more label. The unit that cookies and same-site policies care about.
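
The name mapping in the table can be sketched as a small conversion function. This is only a renaming of the fields the WHATWG parser already produced, not an RFC 3986 parser; the `Rfc3986Parts` interface and the `toRfc3986Parts` helper are hypothetical names introduced for this example.

```typescript
// Hypothetical shape for the RFC 3986 view of a URL; not part of any standard API.
interface Rfc3986Parts {
  scheme: string;
  userinfo: string | null;
  host: string;
  port: string | null;
  path: string;
  query: string | null;
  fragment: string | null;
}

// Rename WHATWG fields to their RFC 3986 equivalents and strip the
// delimiters that WHATWG keeps (trailing ":", leading "?" and "#").
function toRfc3986Parts(input: string): Rfc3986Parts {
  const url = new URL(input);
  const userinfo =
    url.password !== "" ? `${url.username}:${url.password}` : url.username;
  return {
    scheme: url.protocol.slice(0, -1),
    userinfo: userinfo === "" ? null : userinfo,
    host: url.hostname, // RFC host excludes the port, like WHATWG hostname
    port: url.port === "" ? null : url.port,
    path: url.pathname,
    // Assumption: an empty search is reported as "no query", because the
    // WHATWG getter returns "" for both a missing query and a bare "?".
    query: url.search === "" ? null : url.search.slice(1),
    fragment: url.hash === "" ? null : url.hash.slice(1),
  };
}

console.log(
  toRfc3986Parts(
    "https://user:pass@www.example.co.uk:8080/path/to/resource?query=string#section",
  ),
);
```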

URI, URL, URN

  • URI · Uniform Resource Identifier · RFC 3986

    Any string that identifies a resource. URL and URN are both kinds of URI.

    • URL · Uniform Resource Locator

      Identifies a resource by its location.

      e.g. https://example.com/books/123

    • URN · Uniform Resource Name · RFC 8141

      Identifies a resource by name, independent of location.

      e.g. urn:isbn:0-486-27557-4

The Public Suffix List

Cookies and same-site checks group hostnames by registrable domain, not by TLD. (CORS is stricter: it compares full origins.) For kai.github.io, the public suffix is github.io and the registrable domain is kai.github.io. github.io isn’t a TLD, but it acts like one for registration, so a plain TLD list misses it. Browsers consult the Public Suffix List (PSL) instead: a community-maintained list of public suffixes, including com, co.uk, github.io, and vercel.app. The PSL introduces three new terms:

  • subdomain: Labels to the left of the registrable domain.
  • public suffix: The longest suffix under which anyone can register a domain.
  • registrable domain: Public suffix plus one more label.
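
The three terms can be illustrated with a minimal longest-suffix match. This is a sketch under strong assumptions: `SAMPLE_SUFFIXES` is a tiny hand-picked set rather than the real list, the `splitHostname` helper is hypothetical, and the real PSL also has wildcard and exception rules that are ignored here.

```typescript
// Tiny hand-picked sample of public suffixes for illustration only.
// The real Public Suffix List is far larger and has wildcard/exception rules.
const SAMPLE_SUFFIXES = new Set<string>([
  "com", "uk", "co.uk", "io", "github.io", "app", "vercel.app",
]);

interface DomainSplit {
  subdomain: string;         // labels left of the registrable domain ("" if none)
  registrableDomain: string; // public suffix plus one more label
  publicSuffix: string;      // longest matching suffix
}

function splitHostname(
  hostname: string,
  suffixes: Set<string> = SAMPLE_SUFFIXES,
): DomainSplit | null {
  const labels = hostname.toLowerCase().split(".");
  // Try candidates from longest to shortest, so the first hit is the longest suffix.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (!suffixes.has(candidate)) continue;
    // The hostname is itself a public suffix: there is no registrable domain.
    if (i === 0) return null;
    return {
      subdomain: labels.slice(0, i - 1).join("."),
      registrableDomain: labels.slice(i - 1).join("."),
      publicSuffix: candidate,
    };
  }
  return null; // no suffix in the sample matches
}

console.log(splitHostname("www.example.co.uk"));
console.log(splitHostname("kai.github.io"));
```

For www.example.co.uk this yields subdomain www, registrable domain example.co.uk, and public suffix co.uk. For kai.github.io the subdomain is empty, because github.io is itself a public suffix.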

Edge cases

A few URL shapes break the usual scheme://host/path mental model.

Parsing oddities

Schemes without // authority, like mailto:, urn:, data:, javascript:, have an opaque path, where everything after the scheme is treated as a single string.

  • mailto:foo@bar.com: The @ is part of the pathname, not a userinfo separator. There’s no ://, so no authority section to put credentials in.
  • urn:isbn:0-486-27557-4: Multiple : after the scheme. Everything after the first one is pathname.
  • file:///etc/hosts: Three slashes, because the host between // and / is empty. file: is the only special (web-facing) scheme that allows an empty host. WHATWG gives it an opaque origin.
  • //host/path: No scheme at all. This is a scheme-relative reference: it inherits the scheme of the base URL it is resolved against, typically the current page.
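
The four cases above can be checked against the WHATWG `URL` class. The `summarize` helper and the base URL https://example.com/page are assumptions made for this sketch.

```typescript
interface UrlSummary {
  href: string;
  hostname: string;
  username: string;
  pathname: string;
}

// Hypothetical helper: parse `input` (optionally against `base`) and keep a few fields.
function summarize(input: string, base?: string): UrlSummary {
  const url = new URL(input, base);
  return {
    href: url.href,
    hostname: url.hostname,
    username: url.username,
    pathname: url.pathname,
  };
}

// Opaque path: the "@" stays in pathname and username is empty.
const mailtoSummary = summarize("mailto:foo@bar.com");
console.log(mailtoSummary.pathname); // "foo@bar.com"
console.log(mailtoSummary.username); // ""

// Everything after the first ":" is the pathname.
const urnSummary = summarize("urn:isbn:0-486-27557-4");
console.log(urnSummary.pathname); // "isbn:0-486-27557-4"

// Empty host between "//" and "/".
const fileSummary = summarize("file:///etc/hosts");
console.log(fileSummary.hostname); // ""
console.log(fileSummary.pathname); // "/etc/hosts"

// Scheme-relative reference: the scheme comes from the base URL.
const schemeRelative = summarize("//host/path", "https://example.com/page");
console.log(schemeRelative.href); // "https://host/path"
```

Without a base URL, `new URL("//host/path")` throws, since there is no scheme to inherit.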

Spec disagreements

IPv6 zone identifiers. Given https://[fe80::1%25eth0]/:

  • RFC 6874: Permits zone IDs inside the brackets, with % written as %25.
  • WHATWG: No zone-ID grammar. The URL is rejected.

Browsers follow WHATWG, so zone IDs in URLs don’t work in practice.
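
A quick way to observe the WHATWG behavior is to wrap the `URL` constructor, which throws on invalid input. The `tryParseUrl` helper is a hypothetical name used only in this sketch.

```typescript
// Hypothetical helper: return the parsed URL, or null when the parser rejects the input.
function tryParseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// Zone ID inside the brackets: rejected by the WHATWG parser.
const withZone = tryParseUrl("https://[fe80::1%25eth0]/");
console.log(withZone); // null

// The same address without a zone ID parses normally.
const withoutZone = tryParseUrl("https://[fe80::1]/");
console.log(withoutZone?.hostname); // "[fe80::1]"
```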

Unencoded @ in userinfo. Given https://a@b@example.com:

  • RFC 3986: The first @ ends userinfo. userinfo = a, host = b@example.com. That host fails the host grammar, so the URL is rejected as malformed.
  • WHATWG: The last @ before the next /, ?, or # ends userinfo. userinfo = a@b, host = example.com. Earlier @ characters get percent-encoded to %40.

Browsers follow WHATWG. To stay compatible with both specs, percent-encode @ as %40 inside userinfo.
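
The WHATWG behavior, and the pre-encoding workaround, can be sketched as follows. The variable names and the use of `encodeURIComponent` to encode the username are choices made for this example.

```typescript
// WHATWG parsing: the last "@" ends userinfo, and the earlier one becomes %40.
const doubleAtUrl = new URL("https://a@b@example.com");
console.log(doubleAtUrl.username); // "a%40b"
console.log(doubleAtUrl.hostname); // "example.com"
console.log(doubleAtUrl.href);     // "https://a%40b@example.com/"

// Pre-encoding the "@" gives a URL that is valid under both specs.
const rawUsername = "a@b";
const preEncodedUrl = new URL(
  `https://${encodeURIComponent(rawUsername)}@example.com`,
);
console.log(preEncodedUrl.href); // "https://a%40b@example.com/"
```

Both inputs serialize to the same href, but only the pre-encoded form is also acceptable to a strict RFC 3986 parser.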

References