Slugify


A slug is the URL-friendly part of a page address: in https://toolsnug.com/text/transform/slugify, the bits separated by slashes are slugs. Generating one from a title is one of those tasks that looks trivial until you realise how many edge cases hide in plain text — accented letters, ligatures, characters from non-Latin scripts, punctuation, runs of whitespace, capitalisation conventions.

This tool produces clean, predictable slugs from any input, using the same general approach many static-site generators and CMSs use under the hood.

How it works

Six steps:

  1. Apply special-case map. A handful of characters that Unicode normalisation doesn’t help with (German ß, Polish ł, Norwegian ø, the æ/œ ligatures, Icelandic þ/ð) get manually mapped to ASCII equivalents.
  2. Unicode NFKD normalisation. Decomposes accented characters into base letter + combining mark. Café becomes Cafe + combining acute.
  3. Strip combining marks. The combining marks from step 2 (and any others) get removed via the regex \p{M}+.
  4. Replace non-alphanumerics with separator. Anything that isn’t [A-Za-z0-9] becomes the chosen separator (hyphen by default).
  5. Collapse separator runs and trim. Multiple separators in a row become one; leading and trailing separators get stripped.
  6. Lowercase. Optional but on by default.
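
For readers implementing this themselves, here is a minimal TypeScript sketch of the six steps. It is an illustration under stated assumptions rather than this tool's source: the exact contents of SPECIAL_CASES, the SlugOptions names (separator, lowercase), and the split-and-filter approach to step 5 are all choices made for the sketch.

```typescript
// Sketch of the six-step slug pipeline. SPECIAL_CASES and the option names
// are illustrative assumptions, not the tool's actual source.
const SPECIAL_CASES: Record<string, string> = {
  "ß": "ss",
  "ł": "l", "Ł": "L",
  "ø": "o", "Ø": "O",
  "æ": "ae", "Æ": "AE",
  "œ": "oe", "Œ": "OE",
  "þ": "th", "Þ": "TH",
  "ð": "d", "Ð": "D",
};

interface SlugOptions {
  separator?: string; // hyphen by default
  lowercase?: boolean; // on by default
}

function slugify(input: string, { separator = "-", lowercase = true }: SlugOptions = {}): string {
  // 1. Map letters that NFKD leaves as-is (ß, ł, ø, ...) to ASCII.
  let s = Array.from(input, (ch) => SPECIAL_CASES[ch] ?? ch).join("");
  // 2. NFKD: split accented letters into base letter + combining mark.
  s = s.normalize("NFKD");
  // 3. Remove all combining marks (Unicode general category M).
  s = s.replace(/\p{M}+/gu, "");
  // 4. Every character outside [A-Za-z0-9] becomes the separator.
  s = s.replace(/[^A-Za-z0-9]/g, separator);
  // 5. Collapse separator runs and trim the ends: splitting on the separator
  //    yields empty pieces for runs and for leading/trailing separators.
  s = s.split(separator).filter((part) => part !== "").join(separator);
  // 6. Lowercase (only ASCII letters remain at this point).
  return lowercase ? s.toLowerCase() : s;
}

console.log(slugify("My New Blog Post About Café & Résumé")); // my-new-blog-post-about-cafe-resume
console.log(slugify("Łódź København weiß")); // lodz-kobenhavn-weiss
console.log(slugify("Fix the Sign-Up Form Bug", { separator: "_" })); // fix_the_sign_up_form_bug
```

Splitting on the separator and dropping empty pieces handles both halves of step 5 (collapsing runs, trimming the ends) without building a regular expression from the separator, so a dot separator needs no escaping.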

Optional extras: stop-word removal (drops English connector words like ‘the’, ‘and’, ‘of’) and max-length truncation (cuts at the last word boundary before the limit, where there is one).
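
The two optional extras can be sketched as post-processing on a finished, lowercased slug. Both the stop-word list and the function names are assumptions for illustration; the tool's actual list isn't reproduced here.

```typescript
// Illustrative stop-word list (an assumption; the tool's real list may differ).
// It includes "my" and "about" so it matches the blog post example on this page.
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "my", "about"]);

// Drop stop words from an already-lowercased slug.
function removeStopWords(slug: string, separator = "-"): string {
  return slug
    .split(separator)
    .filter((word) => word !== "" && !STOP_WORDS.has(word))
    .join(separator);
}

// Cut to maxLength, backing off to the last word boundary when the cut
// lands mid-word; fall back to a hard cut if there is no boundary.
function truncateSlug(slug: string, maxLength: number, separator = "-"): string {
  if (slug.length <= maxLength) return slug;
  const cut = slug.slice(0, maxLength);
  // The cut already ends on a word boundary if the separator comes next.
  if (slug.startsWith(separator, maxLength)) return cut;
  const lastSep = cut.lastIndexOf(separator);
  return lastSep > 0 ? cut.slice(0, lastSep) : cut;
}

const shorter = removeStopWords("my-new-blog-post-about-cafe-resume");
console.log(shorter); // new-blog-post-cafe-resume
console.log(truncateSlug(shorter, 16)); // new-blog-post
```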

Example: blog post title

My New Blog Post About Café & Résumé becomes my-new-blog-post-about-cafe-resume. The accented characters lose their accents (NFKD + strip-combining), the punctuation collapses to a single hyphen, and the result is a clean URL slug.

With remove stop words on, the same input becomes new-blog-post-cafe-resume: my and about get dropped as English stop words.

Example: branch name

Git branch names benefit from the same treatment: Fix the Sign-Up Form Bug becomes fix-the-sign-up-form-bug. Some teams prefer underscores in branch names; switch the separator to get fix_the_sign_up_form_bug. The output is safe for any Git remote, with no special escaping needed.

Example: non-English titles

Łódź København weiß becomes lodz-kobenhavn-weiss. The special-case map handles ł, ø, and ß explicitly because those don’t decompose helpfully via NFKD alone; ó and ź decompose normally and simply lose their accents.

hello мир world becomes hello-world. Cyrillic doesn’t decompose to ASCII, so the Russian word gets dropped (replaced with the separator and collapsed). For non-Latin source languages, transliterate to ASCII first with a locale-aware tool, then slugify.

Common mistakes

Expecting Cyrillic / CJK / Arabic to transliterate. They don’t, and shouldn’t: generic Unicode normalisation can’t know whether a Cyrillic “Х” should map to “Kh” (the usual English convention for Russian) or “H” (the Bulgarian convention) or something else. Use a locale-specific transliteration tool first.

Slugifying after URL-encoding. Slugify before encoding, not after. If your input is already URL-encoded (Caf%C3%A9 instead of Café), decode it first, then slugify. Otherwise the percent signs become separators, the hex digits survive as junk, and the é is lost: Caf%C3%A9 comes out as caf-c3-a9 instead of cafe.
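
To see the difference in order, here is a TypeScript sketch using a condensed version of the pipeline (no special-case map) and the standard decodeURIComponent. The slugify and safeDecode names are illustrative, not the tool's code.

```typescript
// Condensed stand-in for the slug pipeline (special-case map omitted).
function slugify(s: string): string {
  return s
    .normalize("NFKD")
    .replace(/\p{M}+/gu, "")
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
}

// decodeURIComponent throws URIError on malformed input such as "100%",
// so fall back to the raw text in that case.
function safeDecode(s: string): string {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

const encoded = "Caf%C3%A9";
console.log(slugify(encoded)); // caf-c3-a9  (wrong order: hex bytes leak in)
console.log(slugify(safeDecode(encoded))); // cafe  (decode first, then slugify)
```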

Forgetting that slug uniqueness is your job. Slugify is deterministic: the same input always gives the same slug, and different inputs can produce the same slug. If two posts have near-identical titles (“My Post” and “My Post!”), they’ll both want my-post. Adding the disambiguator (date, ID, suffix) is the calling system’s job.
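
The disambiguation itself might look something like this hypothetical uniqueSlug helper, which appends -2, -3, and so on until it finds a slug not already in use. The helper and its numeric-suffix strategy are assumptions; the slug tool does not do this for you.

```typescript
// Hypothetical helper: returns `base` if unused, otherwise base-2, base-3, ...
function uniqueSlug(base: string, taken: ReadonlySet<string>, separator = "-"): string {
  if (!taken.has(base)) return base;
  let n = 2;
  while (taken.has(`${base}${separator}${n}`)) n++;
  return `${base}${separator}${n}`;
}

const existing = new Set(["my-post", "my-post-2"]);
console.log(uniqueSlug("my-post", existing)); // my-post-3
console.log(uniqueSlug("other-post", existing)); // other-post
```

Numeric suffixes can clash with titles that already end in a number (My Post 2 slugifies to my-post-2), which is one reason some systems prefer a date or ID as the disambiguator.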

Slugifying user-supplied raw HTML. If your input might contain HTML tags, strip them first with the strip HTML tool. Otherwise the angle brackets become hyphens and tag names such as p, strong, or a leak into the slug as words.
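
A rough sketch of why stripping first matters, using a naive regex-based tag stripper as a stand-in for the strip HTML tool. The stripTags helper and the condensed slugify are illustrative assumptions, not that tool's implementation.

```typescript
// Condensed stand-in for the slug pipeline (illustrative only).
function slugify(s: string): string {
  return s
    .normalize("NFKD")
    .replace(/\p{M}+/gu, "")
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
}

// Naive tag stripper: replaces anything between < and > with a space, so
// adjacent blocks like <p>Hello</p><p>World</p> don't fuse into one word.
// Not a real HTML parser: entities (&amp; survives as "amp"), comments,
// and script/style contents are not handled.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ");
}

const html = "<p>Hello <b>World</b></p>";
console.log(slugify(html)); // p-hello-b-world-b-p
console.log(slugify(stripTags(html))); // hello-world
```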

What this tool does not do

It doesn’t transliterate non-Latin scripts. Real transliteration needs locale knowledge.

It doesn’t enforce uniqueness against an existing set. The output is deterministic; uniqueness logic lives in the application that consumes the slug.

It doesn’t lowercase in a locale-aware way (Turkish dotted/undotted i, etc.). Lowercasing uses JavaScript’s default toLowerCase, which works for most languages but has the well-known Turkish i problem: capital I always becomes dotted i, never the Turkish dotless ı. For non-URL case transformations (camelCase, snake_case, CONSTANT_CASE), use the sibling case converter tool.
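
The Turkish case is easy to see directly in TypeScript. This assumes a runtime with full ICU case-mapping data, as current Node.js builds ship by default.

```typescript
// Default lowercasing ignores locale: capital I always becomes dotted i.
const plain = "I".toLowerCase();
// Turkish rules map capital I to dotless ı (U+0131).
const turkish = "I".toLocaleLowerCase("tr");
// Capital dotted İ (U+0130) lowercases by default to "i" plus a combining
// dot above (U+0307): two code units, not one.
const dotted = "\u0130".toLowerCase();

console.log(plain, turkish, dotted.length); // i ı 2
```

Given the step order described earlier, lowercasing only runs after step 4 has removed everything outside [A-Za-z0-9], so in this pipeline the practical effect is simply that a Turkish I always becomes i.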

Frequently asked questions

How does slugify handle accented characters?

It normalises the input via Unicode NFKD (compatibility decomposition), which splits characters like é into base letter + combining acute. The combining mark gets stripped, leaving the plain e. So Café becomes cafe. Variants of this approach are common in slug libraries, static-site generators, and CMSs. The advantage over a fixed transliteration table is that NFKD covers every accented Latin letter that Unicode defines as a base letter plus diacritic, not just the ones a hand-written table happens to list.
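
A quick check of the decompose-and-strip step in TypeScript. It also shows one practical effect of choosing NFKD over canonical NFD: compatibility characters like the ﬁ ligature and superscript digits fold to plain ASCII instead of turning into separators. The fold helper is illustrative.

```typescript
// NFKD-decompose, then strip combining marks (Unicode category M).
const fold = (s: string): string => s.normalize("NFKD").replace(/\p{M}+/gu, "");

const cafe = "Caf\u00e9"; // precomposed é (U+00E9)
console.log(cafe.length); // 4
console.log(cafe.normalize("NFKD").length); // 5: e + U+0301 combining acute
console.log(fold(cafe)); // Cafe

// The "K" (compatibility) part also folds presentation forms that
// canonical NFD leaves alone.
console.log("\ufb01le".normalize("NFD")); // ﬁle (ligature kept)
console.log(fold("\ufb01le")); // file
console.log(fold("x\u00b2")); // x2
```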

What about ß, æ, œ, ł — characters that don't decompose?

Some characters are considered separate base letters by Unicode rather than accented variants — German ß doesn't decompose to ss via NFKD, Polish ł doesn't decompose to l, Norwegian ø doesn't decompose to o. The slugify tool has a small special-case map for these (~13 characters covering the common European-language extras) so weiß becomes weiss, Łódź becomes lodz, København becomes kobenhavn.
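
One way to find which characters need a special-case entry is to run each through NFKD plus mark-stripping and see whether anything non-ASCII is left. The needsSpecialCase helper below is a hypothetical sketch, and the candidate list is just a sample.

```typescript
// True if NFKD + mark-stripping still leaves a non-ASCII character,
// i.e. the character needs an explicit entry in the special-case map.
function needsSpecialCase(ch: string): boolean {
  const folded = ch.normalize("NFKD").replace(/\p{M}+/gu, "");
  return /[^\x00-\x7F]/.test(folded);
}

const candidates = ["é", "ó", "ź", "å", "ß", "ł", "ø", "æ", "œ", "þ", "ð"];
console.log(candidates.filter(needsSpecialCase).join(", ")); // ß, ł, ø, æ, œ, þ, ð
```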

Why does my Cyrillic text disappear from the slug?

Cyrillic, Greek, Arabic, CJK, and other non-Latin scripts don't decompose into ASCII via Unicode normalisation, so each of their characters is replaced by the separator. Separator runs then collapse and get trimmed, so a non-Latin word between Latin words leaves a single hyphen, and a title written entirely in Cyrillic produces an empty slug. Real transliteration is locale-specific (Russian, Bulgarian, and Mongolian each use different rules for Cyrillic to Latin), and doing it badly is worse than dropping the text. For a non-Latin source language, transliterate first with a tool that knows the language, then slugify.
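
The transliterate-then-slugify order can be sketched with a pluggable transliteration function. DEMO_RU is a hypothetical, deliberately incomplete table covering only the letters in this demo, not a real Russian scheme; in practice you would plug in a proper locale-aware transliterator.

```typescript
type Transliterator = (text: string) => string;

// Condensed stand-in for the slug pipeline (illustrative only).
function slugify(s: string): string {
  return s
    .normalize("NFKD")
    .replace(/\p{M}+/gu, "")
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
}

// Hypothetical demo table: covers only м, и, р. Not a transliteration standard.
const DEMO_RU: Record<string, string> = { "м": "m", "и": "i", "р": "r" };
const demoRussian: Transliterator = (text) =>
  Array.from(text, (ch) => DEMO_RU[ch] ?? ch).join("");

// Transliterate first, then slugify.
function slugifyWith(text: string, transliterate: Transliterator): string {
  return slugify(transliterate(text));
}

console.log(slugify("hello мир world")); // hello-world
console.log(JSON.stringify(slugify("мир"))); // "" (nothing survives)
console.log(slugifyWith("hello мир world", demoRussian)); // hello-mir-world
```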

Should I remove stop words?

For SEO-friendly URLs, removing stop words (a, the, and, of, etc.) often gives shorter slugs that read better and stay under length limits. 'My New Post About the Weather' becomes 'new-post-weather' instead of 'my-new-post-about-the-weather'. But if the stop word carries meaning (a 'the' that distinguishes 'The Beatles' from 'Beatles', for example), keep it. The toggle defaults to off because the safer behaviour is to keep what the author wrote.

What's the difference between hyphens, underscores, and dots?

Hyphens (kebab-case) are the convention for URL slugs: major search engines treat hyphenated words as separate words for indexing. Underscores work too but are sometimes treated as joining words into a single token. Dots are unusual but show up in some legacy systems. Pick the one your destination expects; the default hyphen is what the large majority of slug-using systems use.
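
One implementation detail worth knowing if you build the collapse-and-trim step from a regular expression: the dot is a regex metacharacter, so a dot separator must be escaped or it matches every character. A TypeScript sketch; escapeRegExp and collapseAndTrim are illustrative helper names.

```typescript
// Escape regex metacharacters so the separator is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collapse runs of `separator`, then trim one from each end.
function collapseAndTrim(s: string, separator: string): string {
  const sep = escapeRegExp(separator);
  return s
    .replace(new RegExp(`(?:${sep})+`, "g"), separator)
    .replace(new RegExp(`^(?:${sep})|(?:${sep})$`, "g"), "");
}

// Unescaped, "." matches any character, so the whole string is replaced.
console.log("hello..world".replace(/.+/g, ".")); // .
console.log(collapseAndTrim("-hello--world-", "-")); // hello-world
console.log(collapseAndTrim("_hello__world_", "_")); // hello_world
console.log(collapseAndTrim(".hello..world.", ".")); // hello.world
```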