
Strip HTML Tags


There are a dozen reasons to strip HTML from text: extracting content from a scraped page, cleaning up an email signature, preparing snippets for plain-text email, generating SEO-friendly text from rich content. The fundamental operation is the same — peel off the tags, keep the text — but the details (entity decoding, line breaks, embedded scripts) determine whether the output is usable or garbage.

This tool handles all of these details in your browser, with no upload and no intermediate service.

How it works

The stripper runs in three passes:

  1. Drop dangerous blocks. <script>...</script>, <style>...</style>, and <!-- ... --> are removed entirely, including their content. JavaScript and CSS are not text the user wants to see.
  2. Promote paragraph structure. When “preserve paragraph breaks” is on, block-level tags (<p>, <div>, <h1>-<h6>, <li>, <br>, <tr>, etc.) become newlines before the next pass — so the output keeps its document structure as line breaks.
  3. Strip everything else. Anything between < and > is removed, with a count of how many tags were dropped.

After stripping, named and numeric entities (&amp;, &#65;, &#x1F600;) are decoded to their actual characters. Then optional whitespace normalisation collapses runs of spaces and excess blank lines.
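The passes described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the regular expressions, the function name strip_html, and the preserve_breaks parameter are all assumptions made for the example, and the count only covers tags removed in the third pass.

```python
import html
import re
from typing import Tuple

# Hypothetical patterns; the tool's real expressions are not shown in the text.
BLOCKS = re.compile(
    r"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)
BLOCK_TAGS = re.compile(r"</?(?:p|div|h[1-6]|li|br|tr)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]*>")


def strip_html(source: str, preserve_breaks: bool = True) -> Tuple[str, int]:
    """Return (plain_text, tags_removed_in_pass_3) for an HTML fragment."""
    # Pass 1: drop script, style, and comment blocks together with their content.
    text = BLOCKS.sub("", source)
    # Pass 2: turn block-level tags into newlines when requested.
    if preserve_breaks:
        text = BLOCK_TAGS.sub("\n", text)
    # Pass 3: remove every remaining tag and count the removals.
    text, removed = ANY_TAG.subn("", text)
    # Decode entities only after the tags are gone.
    text = html.unescape(text)
    # Whitespace normalisation: trim each line, collapse spaces and blank-line runs.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    return text, removed


sample = "<p>Tom &amp; Jerry</p><script>x()</script><p>The <b>end</b></p>"
print(strip_html(sample))  # ('Tom & Jerry\n\nThe end', 2)
```

Decoding entities after the tag pass matters: if &lt; were decoded first, the resulting < would be treated as the start of a tag in pass 3.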

Example: cleaning an email signature

You receive an email with a verbose HTML signature: a logo image, a disclaimer in a <table>, social media icons in <div>s. You want just the human-written text. Paste it in, leave defaults on, and the tags vanish — the signature collapses to the disclaimer text and the sender’s name. Copy and reply.

Example: extracting article body from scraped HTML

You ran a scraper and got back a page of HTML you actually want as text — for example, an article body wrapped in <article><p>...</p></article> with sidebar <nav>s and <script> tracking blobs. Strip removes the wrappers, drops the scripts entirely, and gives you paragraph-separated body text. Faster than writing a per-site parser when you just need the text.

Example: prepping plain-text email

You’re sending the same content as both HTML and plain text. Write it once in HTML (or paste rich text into a converter), then strip it to produce the plain-text part. The strip tool’s default settings (preserve paragraphs, decode entities, normalise whitespace, trim each line) produce email-friendly output.

Common mistakes

Expecting Markdown output. This tool produces plain text, not Markdown. Headings lose their #s, lists lose their -s, links drop their URLs. If you need formatting hints preserved, use the HTML to Markdown converter instead.

Pasting HTML with literal < and > characters. A snippet like 1 < 2 and 3 > 2 is technically invalid HTML; proper HTML would encode it as 1 &lt; 2 and 3 &gt; 2. The stripper sees < and looks for the next >, so < 2 and 3 > gets treated as a tag and stripped. If your input is hand-written and might contain raw less-than/greater-than characters, encode it first or check the result.
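The pitfall is easy to reproduce with a naive "anything between < and >" pattern. The sketch below assumes such a pattern (the tool's own expression is not shown here) and uses Python's html.escape as a stand-in for an entity encoder.

```python
import html
import re

ANY_TAG = re.compile(r"<[^>]*>")  # assumed: the naive rule described in the text

raw = "1 < 2 and 3 > 2"

# Stripping directly: "< 2 and 3 >" looks like one tag and disappears.
naive = ANY_TAG.sub("", raw)

# Encoding first turns < and > into &lt; and &gt;, so nothing looks like a tag.
encoded = html.escape(raw, quote=False)
safe = html.unescape(ANY_TAG.sub("", encoded))

print(repr(naive))  # '1  2'
print(repr(safe))   # '1 < 2 and 3 > 2'
```

Encode-then-strip only makes sense for input that is meant to be plain text. Applied to real HTML it would encode the genuine tags too, and nothing would be stripped.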

Forgetting that stripping doesn’t sanitise. Strip removes tags from text. It does not protect a downstream system from injection — if you’re saving HTML to a database or rendering user input back into a page, you need a real sanitiser (DOMPurify, server-side library) that understands attribute values, JavaScript URLs, and other attack vectors. Stripping is a display tool, not a security tool.

What this tool does not do

It doesn’t preserve formatting (bold, italic, links). Those are HTML constructs; the output is plain text by definition.

It doesn’t extract links or images separately. If you want a list of all hrefs or srcs from the input, that’s a different operation — and a different tool.
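For reference, that different operation is usually done with a parser rather than a strip pass. Here is a rough sketch using Python's standard html.parser module; the class and function names are made up for the example and have nothing to do with this tool.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href and src attribute values in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def extract_urls(source: str) -> List[str]:
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return collector.urls


print(extract_urls('<a href="/home">Home</a><img src="logo.png">'))
# ['/home', 'logo.png']
```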

It doesn’t sanitise input for safe re-rendering. Output is plain text, not “safe HTML”. Don’t strip and then re-insert into innerHTML expecting safety.
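One way to see why: if the input carries entity-encoded markup, there is nothing for the strip pass to remove, and entity decoding afterwards turns it back into something that looks exactly like a tag. The sketch below assumes the naive tag pattern and decode-after-strip order described earlier; strip_and_decode is a hypothetical helper name.

```python
import html
import re

ANY_TAG = re.compile(r"<[^>]*>")  # assumed naive tag pattern


def strip_and_decode(source: str) -> str:
    # Strip tags first, then decode entities, mirroring the order described above.
    return html.unescape(ANY_TAG.sub("", source))


# The markup is entity-encoded, so the strip pass finds no tags at all.
source = "&lt;img src=x onerror=alert(1)&gt;"
print(strip_and_decode(source))  # <img src=x onerror=alert(1)>
```

As plain text that output is harmless. Assigned to innerHTML, a browser would parse it as an element, which is why plain text should go into the page through textContent or a proper sanitiser.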

It doesn’t handle non-HTML markup. XML with custom tags works coincidentally because it shares the <tag> syntax, but anything more exotic (RTF, BBCode, Markdown) needs its own parser.

To turn the stripped plain text into a URL slug, pipe it through the slugify tool.

Frequently asked questions

Why does the result still contain text from a script tag I forgot to remove?

It shouldn't — the tool drops <script>...</script> blocks entirely, including their JavaScript content. Same with <style>...</style> and <!-- comments -->. If you're seeing leftover script content, the tag may be malformed (unclosed, missing the </script>) or use unusual casing the regex didn't catch. Paste the source somewhere visible and look for an unbalanced tag.
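To illustrate the unclosed-tag case, here is a small sketch with an assumed script-block pattern (the tool's actual regex is not shown). A block without its closing tag never matches the pattern, so only the opening tag is stripped and the JavaScript survives as text. The script_tags_balanced helper is a hypothetical quick check for that situation.

```python
import re

# Assumed patterns, for illustration only.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]*>")


def strip(source: str) -> str:
    # Drop complete script blocks, then remove any remaining tags.
    return ANY_TAG.sub("", SCRIPT_BLOCK.sub("", source))


def script_tags_balanced(source: str) -> bool:
    # Compare the number of opening and closing script tags.
    opens = len(re.findall(r"<script\b", source, re.IGNORECASE))
    closes = len(re.findall(r"</script\s*>", source, re.IGNORECASE))
    return opens == closes


closed = "<p>Hi</p><script>track()</script>"
unclosed = "<p>Hi</p><script>track()"

print(strip(closed))                   # Hi
print(strip(unclosed))                 # Hitrack()
print(script_tags_balanced(unclosed))  # False
```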

What's the difference between 'preserve paragraph breaks' on and off?

On: block-level tags like <p>, </div>, </h1>, <br> are translated into newlines before stripping, so the output reads as separate paragraphs. Off: every tag is replaced with nothing, so <p>foo</p><p>bar</p> becomes 'foobar' jammed together. Leave it on unless you want a single line of text — for example, when feeding the output to a single-line input field.

Why does '1 < 2 and 3 > 2' get partially stripped?

Naked < and > characters in source HTML are technically invalid; real HTML would write '1 &lt; 2 and 3 &gt; 2'. The stripper treats anything between < and > as a tag, so it eats the '< 2 and 3 >' substring as one 'tag'. To strip text that contains literal <, >, & characters safely, encode them first (use the HTML entity encoder), then strip; or just don't trust HTML output that wasn't encoded properly upstream.

Does this also handle Markdown?

No — Markdown is a different syntax (#, *, [], etc.) and a strip pass for it would be a different tool. If your input is HTML produced from Markdown, this tool works fine. For raw Markdown, you'd want a Markdown-to-text tool that runs the Markdown parser and pulls the plain text.

What entities does the decoder support?

All numeric entities (&#65; and hex &#x41; covering the full Unicode range, including emoji) and the most common named entities: amp, lt, gt, quot, apos, nbsp, copy, reg, trade, hellip, mdash, ndash, lsquo, rsquo, ldquo, rdquo, laquo, raquo, middot, bull, deg. Less-common named entities (over a thousand exist in HTML5) fall back to the literal '&name;' so they're at least visible in the output rather than silently broken.
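A decoder with this behaviour can be sketched as a single regex pass with a lookup table. The table below covers the named entities listed above; the pattern and the function name decode_entities are assumptions for the example, not the tool's source.

```python
import re

# Named entities from the list above, mapped to their characters.
NAMED = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'",
    "nbsp": "\u00a0", "copy": "\u00a9", "reg": "\u00ae", "trade": "\u2122",
    "hellip": "\u2026", "mdash": "\u2014", "ndash": "\u2013",
    "lsquo": "\u2018", "rsquo": "\u2019", "ldquo": "\u201c", "rdquo": "\u201d",
    "laquo": "\u00ab", "raquo": "\u00bb", "middot": "\u00b7",
    "bull": "\u2022", "deg": "\u00b0",
}

# Hex numeric, decimal numeric, or named entity, each ending with a semicolon.
ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text: str) -> str:
    def replace(match):
        body = match.group(1)
        try:
            if body[:2].lower() == "#x":
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                return chr(int(body[1:]))
        except (ValueError, OverflowError):
            return match.group(0)  # outside the Unicode range: leave untouched
        # Unknown names fall back to the literal '&name;'.
        return NAMED.get(body, match.group(0))

    return ENTITY.sub(replace, text)


print(decode_entities("&#65;&#x41; &amp; &copy; &zeta;"))  # AA & © &zeta;
```

Because the substitution runs once over the input, '&amp;lt;' decodes to the literal text '&lt;' rather than being decoded a second time into '<'.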