Parts of a URL
This guide describes every part of a URL, as named by the browser standard and the other specs that define URL vocabulary.
URL vocabulary is split across three sources, each authoritative in its domain:
- WHATWG URL Standard: used by browsers.
- RFC 3986: used by HTTP, OAuth, JWT, SIP, and most non-browser tooling.
- Public Suffix List: used by browsers to determine the registrable domain.
The example below shows how a single URL breaks down into its parts, using the WHATWG names. The URL https://user:pass@www.example.co.uk:8080/path/to/resource?query=string#section is parsed as:
- protocol: https:
- username: user
- password: pass
- hostname: www.example.co.uk
- port: 8080
- host: www.example.co.uk:8080
- pathname: /path/to/resource
- search: ?query=string
- hash: #section
- origin: https://www.example.co.uk:8080
The parts of a URL
Components below use WHATWG names. The Rosetta table maps each to its RFC 3986 equivalent.
- protocol: Identifies the URL scheme. https:, mailto:, urn:, data:.
- username, password: Credentials embedded in the URL. Most fetch APIs strip them. Browsers block them on top-level navigation.
- hostname: The server name or IP literal. www.example.co.uk, [::1], 127.0.0.1.
- port: TCP/UDP port number. Omitted when it matches the scheme’s default. e.g. https://example.com:443/ serializes as https://example.com/.
- host: hostname and port together. example.com:8443.
- pathname: Slash-delimited path on the host. /books/123, /, /index.html. For opaque-path schemes (mailto:, urn:), it’s everything after the first :.
- search: Query string. ?q=urls&page=2. Conventionally key=value pairs joined by &. Applications may define their own format.
- hash: Fragment identifier. #section-2. Resolved client-side and never sent to the server.
Rosetta table
| Concept | WHATWG | RFC 3986 | Note |
|---|---|---|---|
| Scheme | protocol | scheme | WHATWG includes the trailing :. |
| Bare host name / IP | hostname | host | |
| Hostname plus port | host | — | WHATWG host includes the port; RFC host does not. |
| Port | port | port | |
| Credentials | username + password | userinfo | |
| Path | pathname | path | |
| Query string | search | query | WHATWG includes the leading ?. |
| Fragment | hash | fragment | WHATWG includes the leading #. |
| Origin | origin | — | scheme + :// + hostname + port (the port only when it is not the scheme default), excluding userinfo. Serializes as null for file:// and opaque-path schemes. |
| Subdomain | — | — | From the Public Suffix List, not either URL spec. The labels left of the registrable domain. |
| Public suffix | — | — | From the PSL. Longest suffix under which a registrar will let anyone register. |
| Registrable domain | — | — | From the PSL. Public suffix plus one more label. The unit cookies and same-site policies care about. |
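The mapping in the table can be sketched in code. The example below is a rough approximation, not a WHATWG parser: it takes the RFC-style fields returned by Python's standard `urllib.parse.urlsplit` and renames them to the WHATWG vocabulary. The function name `whatwg_style_parts` and the `DEFAULT_PORTS` table are hypothetical names introduced for illustration, and the port table covers only a handful of schemes.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative subset of scheme defaults, not a complete list.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443, "ftp": 21}


def whatwg_style_parts(url: str) -> dict:
    """Rename urlsplit()'s RFC-style fields to WHATWG-style names (approximate)."""
    parts = urlsplit(url)
    has_tuple_origin = parts.scheme in DEFAULT_PORTS

    hostname = parts.hostname or ""
    # urlsplit() strips the brackets from IPv6 literals; WHATWG keeps them.
    if ":" in hostname:
        hostname = f"[{hostname}]"

    # The port is dropped when it matches the scheme's default.
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(parts.scheme) == port:
        port = None
    port_str = "" if port is None else str(port)

    # WHATWG host = hostname plus port; RFC 3986 host = hostname only.
    host = f"{hostname}:{port_str}" if port_str else hostname

    # For these schemes an empty path serializes as "/".
    path = parts.path
    if has_tuple_origin and path == "":
        path = "/"

    return {
        "protocol": parts.scheme + ":",   # trailing ":" included
        "username": parts.username or "",
        "password": parts.password or "",
        "hostname": hostname,
        "port": port_str,
        "host": host,
        "pathname": path,
        "search": "?" + parts.query if parts.query else "",      # leading "?"
        "hash": "#" + parts.fragment if parts.fragment else "",  # leading "#"
        "origin": f"{parts.scheme}://{host}" if has_tuple_origin else "null",
    }


example = whatwg_style_parts(
    "https://user:pass@www.example.co.uk:8080/path/to/resource?query=string#section"
)
for name, value in example.items():
    print(f"{name}: {value}")
```

Edge cases such as percent-encoding normalization, IDNA host processing, and blob: origins are deliberately left out; a real WHATWG implementation handles all of them.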
URI, URL, URN
- URI · Uniform Resource Identifier · RFC 3986: Any string that identifies a resource. URL and URN are both kinds of URI.
- URL · Uniform Resource Locator: Identifies a resource by its location. e.g. https://example.com/books/123
- URN · Uniform Resource Name · RFC 8141: Identifies a resource by name, independent of location. e.g. urn:isbn:0-486-27557-4
The Public Suffix List
Cookies and same-site checks compare hostnames by registrable domain. (CORS is stricter and compares full origins.) For kai.github.io, the registrable domain is kai.github.io and the public suffix is github.io. github.io isn’t a TLD, but it acts like one for registration. A TLD list misses this. Browsers consult the Public Suffix List instead: a community-maintained list of every public suffix, including com, co.uk, github.io, and vercel.app. The PSL introduces three new terms:
- subdomain: Labels to the left of the registrable domain.
- public suffix: The longest suffix under which anyone can register a domain.
- registrable domain: Public suffix plus one more label.
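A minimal sketch of how the three terms relate, assuming a tiny hand-picked set of suffixes in place of the real list. `SAMPLE_SUFFIXES` and `split_hostname` are hypothetical names for this illustration. The actual PSL has thousands of entries plus wildcard and exception rules that this sketch ignores, so production code should use a maintained PSL library instead.

```python
# Assumption: a hand-picked sample, NOT the real Public Suffix List.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "io", "github.io", "app", "vercel.app"}


def split_hostname(hostname: str) -> tuple:
    """Return (subdomain, registrable_domain, public_suffix) for a hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate to the shortest so the longest suffix wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SAMPLE_SUFFIXES:
            if i == 0:
                # The hostname is itself a public suffix: nothing is registrable.
                return "", "", candidate
            registrable = ".".join(labels[i - 1:])  # suffix plus one more label
            subdomain = ".".join(labels[:i - 1])    # everything to the left
            return subdomain, registrable, candidate
    # Suffix not in the sample; the real PSL has a fallback rule, omitted here.
    return "", "", ""


print(split_hostname("www.example.co.uk"))
print(split_hostname("kai.github.io"))
```

With this sample set, www.example.co.uk splits into subdomain www, registrable domain example.co.uk, and public suffix co.uk, while kai.github.io has no subdomain at all because github.io is itself a public suffix.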
Edge cases
A few URL shapes break the usual scheme://host/path mental model.
Parsing oddities
Schemes without // authority, like mailto:, urn:, data:, javascript:, have an opaque path, where everything after the scheme is treated as a single string.
- mailto:foo@bar.com: The @ is part of the pathname, not a userinfo separator. There’s no ://, so no authority section to put credentials in.
- urn:isbn:0-486-27557-4: Multiple : after the scheme. Everything after the first one is pathname.
- file:///etc/hosts: Three slashes, because the host between // and / is empty. Among the schemes browsers treat specially (http, https, ws, wss, ftp, file), file: is the only one that allows this. WHATWG gives it an opaque origin.
- //host/path: No scheme at all. Inherits the scheme of the resolving page.
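These shapes can be checked with any URL parser. The sketch below uses Python's standard `urllib.parse`, which follows RFC-style rules rather than WHATWG, so it illustrates the authority/path split and not browser behavior. `authority_and_path` is a hypothetical helper name, and the base URL passed to `urljoin` is an arbitrary example.

```python
from urllib.parse import urljoin, urlsplit


def authority_and_path(url: str) -> tuple:
    """Return (scheme, authority, path) as seen by urlsplit()."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


for url in [
    "mailto:foo@bar.com",       # no "//", so "@" stays in the path
    "urn:isbn:0-486-27557-4",   # later ":" characters belong to the path
    "file:///etc/hosts",        # empty authority between "//" and "/"
    "//host/path",              # scheme-relative: empty scheme
]:
    print(url, "->", authority_and_path(url))

# A scheme-relative reference inherits the scheme of the base it resolves against.
print(urljoin("https://example.com/a", "//host/path"))
```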
Spec disagreements
IPv6 zone identifiers. Given https://[fe80::1%25eth0]/:
- RFC 6874: Permits zone IDs inside the brackets, with % written as %25.
- WHATWG: No zone-ID grammar. The URL is rejected.
Browsers follow WHATWG, so zone IDs in URLs don’t work in practice.
Unencoded @ in userinfo. Given https://a@b@example.com:
- RFC 3986: The first @ ends userinfo. userinfo = a, host = b@example.com, which fails the host grammar, so the URL is rejected as malformed.
- WHATWG: The last @ before the next /?# ends userinfo. userinfo = a@b, host = example.com. Earlier @ characters get percent-encoded to %40.
Browsers follow WHATWG. To stay compatible with both specs, percent-encode @ as %40 inside userinfo.
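One way to follow that advice is to percent-encode the credentials before assembling the URL. The sketch below uses `urllib.parse.quote` from Python's standard library. `url_with_credentials` is a hypothetical helper written for this example, and it assumes the host and path arguments are already valid.

```python
from urllib.parse import quote, urlsplit


def url_with_credentials(scheme: str, username: str, password: str,
                         host: str, path: str = "/") -> str:
    """Build a URL whose userinfo is safe under both RFC 3986 and WHATWG."""
    # safe="" makes quote() encode every reserved character,
    # including "@", ":" and "/".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}{path}"


url = url_with_credentials("https", "a@b", "p:w", "example.com")
print(url)

# With "@" encoded, only one literal "@" remains, so the host is unambiguous.
print(urlsplit(url).hostname)
```

Encoding the : in the password as well keeps the username/password boundary unambiguous, since the first literal : in userinfo separates the two.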