There are a dozen reasons to strip HTML from text: extracting content from a scraped page, cleaning up an email signature, preparing snippets for plain-text email, generating SEO-friendly text from rich content. The fundamental operation is the same — peel off the tags, keep the text — but the details (entity decoding, line breaks, embedded scripts) determine whether the output is usable or garbage.
This tool handles all of them in your browser, with no upload and no service in between.
How it works
The stripper runs in three passes:
- Drop dangerous blocks. <script>...</script>, <style>...</style>, and <!-- ... --> are removed entirely, including their content. JavaScript and CSS are not text the user wants to see.
- Promote paragraph structure. When “preserve paragraph breaks” is on, block-level tags (<p>, <div>, <h1>-<h6>, <li>, <br>, <tr>, etc.) become newlines before the next pass, so the output keeps its document structure as line breaks.
- Strip everything else. Anything between < and > is removed, with a count of how many tags were dropped.
After stripping, named and numeric entities (for example &amp;amp;, &amp;#65;, and &amp;#x1F600;) are decoded to their actual characters (&, A, and 😀). Then optional whitespace normalisation collapses runs of spaces and excess blank lines.
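The passes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual source: the function name `strip_html`, its parameters, and the exact regular expressions are hypothetical. The sketch also assumes the tag count covers only the final "strip everything else" pass.

```python
import html
import re

# Pass 1: blocks whose content is dropped along with the tags.
DANGEROUS = re.compile(
    r"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)
# Pass 2: block-level tags (opening or closing) that become a newline.
BLOCK = re.compile(r"</?(?:p|div|h[1-6]|li|br|tr)\b[^>]*>", re.IGNORECASE)
# Pass 3: anything else between < and >.
ANY_TAG = re.compile(r"<[^>]*>")


def strip_html(source, preserve_breaks=True, normalise=True):
    """Return (plain_text, tags_dropped_in_pass_3). Hypothetical helper."""
    text = DANGEROUS.sub("", source)
    if preserve_breaks:
        text = BLOCK.sub("\n", text)
    text, dropped = ANY_TAG.subn("", text)
    text = html.unescape(text)  # decode named and numeric entities
    if normalise:
        text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces
        text = "\n".join(line.strip() for line in text.split("\n"))
        text = re.sub(r"\n{3,}", "\n\n", text).strip()  # excess blank lines
    return text, dropped


sample = (
    "<article><h1>Title</h1><p>Fish &amp; chips &#65;</p>"
    "<script>track()</script></article>"
)
text, dropped = strip_html(sample)
```

For this sample, `text` is `"Title\n\nFish & chips A"` and `dropped` is 2 (the opening and closing `<article>` tags). The script block disappears in pass 1 and the heading and paragraph tags become line breaks in pass 2, so neither counts towards pass 3.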
Example: cleaning an email signature
You receive an email with a verbose HTML signature: a logo image, a disclaimer in a <table>, social media icons in <div>s. You want just the human-written text. Paste it in, leave defaults on, and the tags vanish — the signature collapses to the disclaimer text and the sender’s name. Copy and reply.
Example: extracting article body from scraped HTML
You ran a scraper and got back a page of HTML you actually want as text — for example, an article body wrapped in <article><p>...</p></article> with sidebar <nav>s and <script> tracking blobs. Strip removes the wrappers, drops the scripts entirely, and gives you paragraph-separated body text. Faster than writing a per-site parser when you just need the text.
Example: prepping plain-text email
You’re sending the same content as both HTML and plain text. Write it once in HTML (or paste rich text into a converter), then strip it to produce the plain-text part. The strip tool’s default settings (preserve paragraphs, decode entities, normalise whitespace, trim each line) produce email-friendly output.
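If you assemble the message in code rather than by copy and paste, the same idea looks roughly like this with Python's standard `email` package. The helper names `to_plain` and `build_message` are hypothetical, and `to_plain` is only a small stand-in for the strip tool's defaults, not its real implementation.

```python
import html
import re
from email.message import EmailMessage


def to_plain(source):
    """Stand-in for the strip tool: block tags to newlines, tags out, entities decoded."""
    text = re.sub(r"</?(?:p|div|br|li|h[1-6])\b[^>]*>", "\n", source,
                  flags=re.IGNORECASE)
    text = re.sub(r"<[^>]*>", "", text)
    text = html.unescape(text)
    lines = [line.strip() for line in text.split("\n")]  # trim each line
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def build_message(html_body, subject):
    """Build a multipart/alternative message from one HTML source."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(to_plain(html_body))            # text/plain part
    msg.add_alternative(html_body, subtype="html")  # text/html part
    return msg


msg = build_message("<p>Hello <b>Ana</b>,</p><p>Tea &amp; cake at 4.</p>", "Invite")
plain_part = msg.get_body(preferencelist=("plain",)).get_content()
```

Mail clients that cannot or will not render HTML fall back to the plain part, so it is worth reading `plain_part` once before sending.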
Common mistakes
Expecting Markdown output. This tool produces plain text, not Markdown. Headings get no # markers, list items get no - bullets, and links drop their URLs. If you need formatting hints preserved, use the HTML to Markdown converter instead.
Pasting HTML with literal < and > characters. A snippet like 1 < 2 is technically invalid HTML; proper HTML would encode it as 1 &amp;lt; 2. The stripper sees < and looks for the next >, so in 1 < 2 and 3 > 1 the span < 2 and 3 > gets treated as a tag and stripped. If your input is hand-written and might contain raw less-than/greater-than characters, encode it first or check the result.
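A short sketch makes the failure visible. The `TAG` pattern is an assumed stand-in for the tool's final pass, not its real code. Note that `html.escape` encodes every angle bracket it sees, so it should only be applied to text you know contains no real tags.

```python
import html
import re

TAG = re.compile(r"<[^>]*>")  # assumed stand-in for the "strip everything else" pass

raw = "1 < 2 and 3 > 1"
mangled = TAG.sub("", raw)  # "< 2 and 3 >" is mistaken for a tag

# Encoding the text first keeps the comparison operators intact.
encoded = html.escape(raw, quote=False)
restored = html.unescape(TAG.sub("", encoded))
```

Here `mangled` comes out as `"1  1"` (with two spaces), while `restored` is identical to `raw`.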
Forgetting that stripping doesn’t sanitise. Strip removes tags from text. It does not protect a downstream system from injection — if you’re saving HTML to a database or rendering user input back into a page, you need a real sanitiser (DOMPurify, server-side library) that understands attribute values, JavaScript URLs, and other attack vectors. Stripping is a display tool, not a security tool.
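To see why stripped output is not automatically safe, consider input in which the markup arrives as entities rather than as tags. The sketch below uses a hypothetical `strip_and_decode` helper as a stand-in for the tool. It shows the problem and the minimal fix for one case only, which is embedding the result as text. It does not replace a sanitiser when you need to keep some HTML.

```python
import html
import re


def strip_and_decode(source):
    """Assumed stand-in for the tool: remove tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]*>", "", source))


payload = "Hi &lt;img src=x onerror=alert(1)&gt;"
plain = strip_and_decode(payload)

# There were no tags to strip, so decoding re-creates the markup as text.
# Escape it again before placing it inside an HTML page.
safe_fragment = html.escape(plain)
```

After this runs, `plain` is `"Hi <img src=x onerror=alert(1)>"`, which a browser would parse as a live element if it were written into a page as HTML. `safe_fragment` is the escaped form, which renders as visible text.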
What this tool does not do
It doesn’t preserve formatting (bold, italic, links). Those are HTML constructs; the output is plain text by definition.
It doesn’t extract links or images separately. If you want a list of all hrefs or srcs from the input, that’s a different operation — and a different tool.
It doesn’t sanitise input for safe re-rendering. Output is plain text, not “safe HTML”. Don’t strip and then re-insert into innerHTML expecting safety.
It doesn’t handle non-HTML markup. XML with custom tags works coincidentally because it shares the <tag> syntax, but anything more exotic (RTF, BBCode, Markdown) needs its own parser.
To turn the stripped plain text into a URL slug, pipe it through the slugify tool.