HTML Entities Encoder / Decoder

HTML entities are the escape syntax used to represent special characters in HTML source. When you want a literal less-than sign < in your rendered page instead of the start of a tag, you write &lt; in the source. When you want a copyright symbol, you can write &copy; (named), &#169; (decimal numeric), or &#xA9; (hex numeric); all three render as ©. This tool converts between plain text and any of those entity forms, in both directions.

Type or paste a string into the input, pick an action (encode or decode), and for encoding, pick a mode:

Basic — escape only the five XML-unsafe characters (&, <, >, ", '). The minimum needed for safe insertion into HTML.
Named — escape the XML-unsafe characters plus any other character that has a recognizable HTML named entity (©, €, —, “, etc.).
All non-ASCII — escape every character above ASCII 127 as a decimal numeric entity. The result is pure 7-bit ASCII and will survive any encoding pipeline.
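The three modes above can be sketched in a few lines of Python. This is an illustration only, not the tool's source: the function name `encode`, the mode strings, the small `NAMED` map, and the choice of `&#39;` for the apostrophe are all assumptions, as is escaping the five XML-unsafe characters in every mode.

```python
# Hypothetical sketch of the three encode modes; not the tool's actual code.

# The five XML-unsafe characters (the apostrophe form is an assumption).
BASIC = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}

# A tiny illustrative subset of a curated named-entity map (assumed contents).
NAMED = {
    "\u00a9": "&copy;",   # copyright sign
    "\u20ac": "&euro;",   # euro sign
    "\u2014": "&mdash;",  # em dash
    "\u201c": "&ldquo;",  # left double quotation mark
    "\u201d": "&rdquo;",  # right double quotation mark
    "\u2019": "&rsquo;",  # right single quotation mark
}


def encode(text: str, mode: str = "basic") -> str:
    """Encode text in 'basic', 'named', or 'all' (all non-ASCII) mode."""
    out = []
    for ch in text:
        if ch in BASIC:
            # Always escaped, whatever the mode.
            out.append(BASIC[ch])
        elif mode == "named" and ch in NAMED:
            out.append(NAMED[ch])
        elif mode == "all" and ord(ch) > 127:
            # Decimal numeric entity for every non-ASCII code point.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, `encode("\u00a9 2026", "named")` gives `&copy; 2026`, while the same call in `"all"` mode gives `&#169; 2026`.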

The decoder handles all three forms at once — named, decimal numeric, and hex numeric — and is lenient about a missing trailing semicolon, so messy real-world input decodes correctly. Unknown named entities are passed through unchanged so you can see exactly what the tool couldn’t handle.
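A regex-based decoder with the behavior described above might look roughly like the following. It is a sketch under assumptions: the `NAMED_ENTITIES` map is a made-up subset, and the pattern and function name `decode` are hypothetical rather than taken from the tool.

```python
import re

# Assumed subset of the curated named-entity list, for illustration only.
NAMED_ENTITIES = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'",
    "copy": "\u00a9", "euro": "\u20ac", "mdash": "\u2014",
}

# One pattern covers hex, decimal, and named forms; the semicolon is optional.
ENTITY = re.compile(r"&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));?")


def decode(text: str) -> str:
    """Decode named, decimal, and hex entities in a single pass."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if name is not None:
            # Unknown names are passed through unchanged.
            return NAMED_ENTITIES.get(name, match.group(0))
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: leave the text as it was.
            return match.group(0)

    return ENTITY.sub(replace, text)
```

One consequence of making the semicolon optional in this sketch is that the name match is greedy, so `&copy2026` is read as the unknown name `copy2026` and left alone rather than decoded as `©2026`.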

When to use each encode mode

Basic mode is what you need when you’re inserting untrusted text into an HTML text node or a quoted attribute value and want to prevent XSS. The five XML-unsafe characters are the only ones that can break out of those contexts into tag or attribute syntax, so escaping just those is enough there. It is not enough for unquoted attributes, URL-valued attributes such as href, or content inside <script> and <style>, which need their own handling. This is what modern templating engines do by default (React’s {variable}, Astro’s {variable}, Vue’s {{ variable }}), and it’s what you should do if you’re ever constructing HTML strings by hand.

Named mode is for hand-edited HTML where readability matters. A &copy; in the source is easier to recognize than &#169;, and both render identically. Named mode escapes the XML-unsafe characters plus every other character that has a recognizable named entity in the tool’s curated list: copyright, trademark, currency symbols, smart quotes, em dashes, and so on. It produces more human-readable output than the numeric form at the cost of not covering every possible character.

All-non-ASCII mode is for transport safety. If you have a string containing characters outside the ASCII range and you need to push it through a system that might re-encode, misencode, or strip non-ASCII bytes (some older email systems, certain databases with wrong column encodings, legacy file formats), encoding everything above 127 as a numeric entity gives you a pure 7-bit ASCII representation that will always survive. The downside is verbosity — a string of emoji becomes a much longer string of numeric entities.

Example: escaping user-submitted content

A user submits the comment <script>alert("hi")</script> — thanks!. You want to display it verbatim on the page without executing the script. Encode with basic mode:

&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; — thanks!

This is safe to insert into an HTML page because none of the output can be interpreted as markup. The em dash is left alone because basic mode doesn’t touch characters that aren’t XML-unsafe; your HTML file’s UTF-8 encoding handles the dash natively.
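For comparison, the same escaping can be reproduced with Python's standard library. This is not how the tool itself works (it runs in the browser); it is only a cross-check that the output above is what a conventional escaper produces. The `<p>` wrapper is an illustrative assumption.

```python
import html

# The user-submitted comment from the example (\u2014 is the em dash).
comment = '<script>alert("hi")</script> \u2014 thanks!'

# html.escape replaces & first, then < and >; quote=True also covers " and '.
escaped = html.escape(comment, quote=True)

# The escaped text can now sit inside a text node without becoming markup.
page_fragment = f"<p>{escaped}</p>"
```

The ampersand has to be escaped before the other characters, otherwise the `&` in a freshly written `&lt;` would itself be escaped a second time. Note that `html.escape` writes the apostrophe as `&#x27;`; other escapers use `&#39;` or `&apos;`, and all three decode to the same character.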

Example: legacy system round-trip

You have a string with smart quotes and an em dash that needs to survive a legacy email pipeline. Encode with “all non-ASCII” mode:

input:  “Don’t forget — it’s important”
output: &#8220;Don&#8217;t forget &#8212; it&#8217;s important&#8221;

In “all” mode, every non-ASCII character becomes a decimal numeric entity, so the smart quotes become &#8220; and &#8221;. In “named” mode they become &ldquo; and &rdquo; instead. The two outputs decode to the same string, but the numeric form is slightly more portable because it doesn’t depend on the receiving system knowing the named entity.
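The all-non-ASCII encoding of the smart-quote string, and the round trip back, can be sketched as follows. The helper name `encode_all_non_ascii` is hypothetical, and running `html.escape` first is an assumption about how such a mode would treat the five XML-unsafe characters.

```python
import html


def encode_all_non_ascii(text: str) -> str:
    """Escape markup characters, then every code point above 127 as decimal."""
    escaped = html.escape(text, quote=True)
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in escaped)


# Smart quotes, a curly apostrophe, and an em dash, written as escapes.
original = "\u201cDon\u2019t forget \u2014 it\u2019s important\u201d"

encoded = encode_all_non_ascii(original)
print(encoded)  # pure ASCII, safe to print on any console

# Decoding the ASCII form gives back the original string.
round_tripped = html.unescape(encoded)
```

Python iterates strings by code point, so a character outside the Basic Multilingual Plane, such as an emoji, becomes a single numeric entity rather than a surrogate pair.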

Example: decoding mixed input

A scraper gives you a string mixing all three entity forms:

input: Price: &euro;100 &mdash; see &#167; 3.2 at &#x2022; point 5
output: Price: €100 — see § 3.2 at • point 5

The decoder handles all three forms without needing mode hints: named (&euro;, &mdash;), decimal numeric (&#167;), and hex numeric (&#x2022;). Unknown named entities are passed through so you can see what failed.

What this tool does not do

It does not parse HTML — the decoder is a simple regex-based entity processor, not a DOM parser. If you paste a full HTML document in, it’ll decode the entities in the text but won’t extract content or handle tags. For that, use a real HTML parser.

It does not include the full HTML5 named entity list (~2200 entries). The curated list covers common punctuation, symbols, and currency that appear in real-world content. If you need an obscure entity like &forall; or &beth;, use the “all non-ASCII” mode, which produces correct numeric entities for any Unicode character. For URL-safe character escaping instead, the URL encoder / decoder handles percent-encoding; for stripping tags entirely from an HTML snippet, the strip HTML tool does that pass.
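The numeric fallback works because a numeric entity needs only the code point, not a name. A minimal sketch, using Python's `html.unescape` (which does know the full HTML5 list) to confirm that the named and numeric forms agree; `to_numeric` is a hypothetical helper:

```python
import html


def to_numeric(ch: str) -> str:
    """Decimal numeric entity for a single character."""
    return f"&#{ord(ch)};"


forall = "\u2200"  # the character behind &forall;
beth = "\u2136"    # the character behind &beth;

numeric_forall = to_numeric(forall)  # "&#8704;"
numeric_beth = to_numeric(beth)      # "&#8502;"

# A decoder with the full HTML5 table maps both spellings to one character.
same_forall = html.unescape("&forall;") == html.unescape(numeric_forall)
same_beth = html.unescape("&beth;") == html.unescape(numeric_beth)
```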

Frequently asked questions

When do I need to HTML-encode a string?

Whenever you're inserting untrusted text into HTML content and can't use a templating engine that does it for you. The rule is: never concatenate raw strings into HTML — that's how cross-site scripting happens. If you absolutely must, encode the five XML-unsafe characters first: ampersand, less-than, greater-than, double-quote, and apostrophe. Those are the only characters that can break out of a safe text context in HTML, so escaping them is enough to stop injection in most cases.

What's the difference between basic, named, and all-non-ASCII modes?

Basic only escapes the five XML-unsafe characters — the minimum needed for safety. Named escapes those plus every other character that has a recognizable HTML entity name (copyright, registered, euro, smart quotes, dashes). All-non-ASCII escapes everything outside the ASCII range as numeric entities, producing a string that's guaranteed to survive any encoding pipeline. Use basic for safety, named for readability in hand-edited HTML, and all-non-ASCII when you need bulletproof transport through systems that might mishandle UTF-8.

Why are some characters left alone in named mode?

Because the tool ships with a small curated map of named entities rather than the full HTML5 spec's ~2200 entries. Common characters are covered — copyright, trademark, currency symbols, punctuation, smart quotes — but rarer ones like 'alpha' or 'forall' are not. If you need those, use all-non-ASCII mode instead, which handles every non-ASCII character without needing to know its name.

What is the difference between decimal and hex numeric entities?

Both represent the same Unicode code point, just in different bases. &#169; is decimal 169, which is © (copyright). &#xA9; is hex A9, also 169, also ©. They decode to the same thing. Hex is traditionally used for reference work because Unicode code points are often quoted in hex (U+00A9), but decimal is more common in auto-generated output. The decoder accepts both interchangeably.
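The equivalence is plain base conversion, which a few lines of Python can confirm. The variable names are illustrative only.

```python
import html

code_point = ord("\u00a9")  # 169, the copyright sign

decimal_entity = f"&#{code_point};"    # base 10
hex_entity = f"&#x{code_point:X};"     # base 16, uppercase digits

# Both spellings name the same number, so they decode to the same character.
same_number = int("169") == int("A9", 16)
same_char = html.unescape(decimal_entity) == html.unescape(hex_entity)
```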

Does the decoder handle entities without a trailing semicolon?

Yes. The decoder is deliberately lenient: a trailing semicolon is optional, so both '&amp;' and '&amp' decode to '&'. This matters because hand-written HTML sometimes omits semicolons, and some systems strip them. Strict decoders reject those inputs; this one accepts them. Encoded output always includes the semicolon, so round-tripping works regardless.
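As a point of comparison, Python's `html.unescape` follows the HTML5 parsing rules, which sit between strict and fully lenient: numeric entities and a fixed set of legacy names decode without a semicolon, while newer names do not. The snippet below shows that behavior; it says nothing about the tool's internals beyond what the answer above states.

```python
import html

# Legacy name without a semicolon: decoded.
legacy_named = html.unescape("&amp")

# Numeric entity without a semicolon: decoded.
numeric = html.unescape("&#169")

# Non-legacy name without a semicolon: left as-is by the standard library.
modern_named = html.unescape("&mdash")
```

A decoder that treats the semicolon as optional for every known name, as described above, would decode the third case too.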