
Keyword Density Analyzer


Summary

32 words · 21 unique

Word        Count  %
brown       3      9.4%
lazy        3      9.4%
quick       3      9.4%
dog         2      6.3%
dogs        2      6.3%
famously    2      6.3%
fox         2      6.3%
foxes       2      6.3%
along       1      3.1%
fast        1      3.1%
food        1      3.1%
get         1      3.1%
jumps       1      3.1%
looks       1      3.1%
rarely      1      3.1%
runners     1      3.1%
sleepers    1      3.1%
sleeps      1      3.1%
slow        1      3.1%
sun         1      3.1%

Bigram            Count  %
quick brown       3      9.7%
brown fox         2      6.5%
lazy dog          2      6.5%
brown foxes       1      3.2%
dog lazy          1      3.2%
dog sleeps        1      3.2%
dogs famously     1      3.2%
dogs rarely       1      3.2%
famously fast     1      3.2%
famously slow     1      3.2%

Trigram                   Count  %
quick brown fox           2      6.7%
brown fox jumps           1      3.3%
brown fox looks           1      3.3%
brown foxes famously      1      3.3%
dog lazy dog              1      3.3%
dog sleeps sun            1      3.3%
dogs famously slow        1      3.3%
dogs rarely get           1      3.3%
famously fast runners     1      3.3%
famously slow sleepers    1      3.3%


Count word frequencies and 2/3-word phrase frequencies in any text. Useful for SEO editing (check whether your target keyword actually appears often enough), content analysis (what is this document actually about), and editing (spot repeated phrases that signal weak writing).

Paste text into the box, choose whether to filter English stopwords (on by default), and set a minimum-occurrence threshold. The tool then shows three result tables (top words, top bigrams, and top trigrams), each with a raw count and a percentage of the total.
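The pipeline described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's actual code: the tokeniser (lowercase, then runs of letters, digits, and apostrophes), the small STOPWORDS set, and the helper names tokenize, ngram_table, and analyze are all hypothetical. Ties are broken alphabetically, which matches the ordering of the sample output, and the table limits (20 words, 10 phrases) mirror its row counts.

```python
import re
from collections import Counter

# Hypothetical stopword subset; the tool's own ~100-word list is not reproduced here.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "over", "for", "while", "is", "was", "to"}


def tokenize(text: str) -> list[str]:
    # Assumed tokenisation: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_table(tokens: list[str], n: int, min_count: int, limit: int) -> list[tuple[str, int, float]]:
    """Rows of (phrase, count, percent); the denominator is the number of n-grams, len(tokens) - n + 1."""
    total = len(tokens) - n + 1
    if total <= 0:
        return []
    counts = Counter(" ".join(tokens[i:i + n]) for i in range(total))
    # Sort by count descending, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(g, c, 100 * c / total) for g, c in ranked if c >= min_count][:limit]


def analyze(text: str, filter_stopwords: bool = True, min_count: int = 1) -> dict:
    tokens = tokenize(text)
    if filter_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return {
        "words": len(tokens),  # post-filter token count
        "unique": len(set(tokens)),
        "top_words": ngram_table(tokens, 1, min_count, limit=20),
        "top_bigrams": ngram_table(tokens, 2, min_count, limit=10),
        "top_trigrams": ngram_table(tokens, 3, min_count, limit=10),
    }


sample = ("The quick brown fox jumps over the lazy dog. "
          "The lazy dog sleeps in the sun while the quick brown fox looks for food.")
report = analyze(sample, min_count=2)
print([(g, c, round(p, 1)) for g, c, p in report["top_trigrams"]])  # [('quick brown fox', 2, 15.4)]
```

Applying the minimum-occurrence threshold before truncating to the table limit means a high threshold can leave a table shorter than its limit, or empty.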

What the percentages mean

Word percentage is a word’s count divided by the total number of (post-filter) tokens. If a 100-word text has 50 content words left after stopword removal and “meteor” appears 5 times, its density is 5 / 50 = 10%. Turning off the stopword filter changes the denominator to the full token count (100 here), so the same 5 occurrences drop to 5%; every word’s density is diluted in the same proportion.
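The denominator effect can be checked with a small Python sketch. The word_density helper and the synthetic 100-token list (50 copies of “the” standing in for function words, 45 filler content words, 5 × “meteor”) are illustrative assumptions, not the tool's code.

```python
def word_density(tokens: list[str], word: str, stopwords: frozenset[str] = frozenset()) -> float:
    """Percentage of post-filter tokens that equal `word`."""
    kept = [t for t in tokens if t not in stopwords]
    if not kept:
        return 0.0
    return 100 * kept.count(word) / len(kept)


# Synthetic 100-token text: 50 function words, 50 content words, 5 of them "meteor".
tokens = ["the"] * 50 + ["meteor"] * 5 + ["sky"] * 45
print(word_density(tokens, "meteor", frozenset({"the"})))  # filter on: 10.0
print(word_density(tokens, "meteor"))                      # filter off: 5.0
```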

Bigram percentage is a bigram’s count divided by the total number of bigrams. For n tokens there are n−1 bigrams (every token except the last starts one), so a bigram that appears 3 times in a text of 100 filtered tokens has density 3 / 99 ≈ 3.0%.

Trigram percentage works the same way: 3-word phrases, denominator n−2.
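Both phrase denominators follow one rule: n tokens yield n − k + 1 phrases of length k. A minimal Python sketch, where ngram_percentages is a hypothetical helper and the 100-token list is synthetic:

```python
from collections import Counter


def ngram_percentages(tokens: list[str], n: int) -> dict[tuple[str, ...], float]:
    """Map each n-gram to its share (in percent) of all len(tokens) - n + 1 n-grams."""
    total = len(tokens) - n + 1  # n-1 bigrams, n-2 trigrams
    if total <= 0:
        return {}
    grams = Counter(tuple(tokens[i:i + n]) for i in range(total))
    return {g: 100 * c / total for g, c in grams.items()}


# 100 filtered tokens in which the bigram ("meteor", "shower") occurs 3 times.
tokens = ["meteor", "shower"] * 3 + [f"w{i}" for i in range(94)]
bigrams = ngram_percentages(tokens, 2)
print(round(bigrams[("meteor", "shower")], 1))  # 3.0
```

Because every phrase shares the same denominator, the percentages for a given length always sum to 100%.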

Example: content self-check

Paste an article you’re optimising for a target keyword. If that keyword doesn’t appear in the top 10 of the single-word table, either the article isn’t really about it or you’re using heavy synonym variation (which can be fine for modern SEO, but means naive density matching won’t find the topic).

If the target keyword dominates at more than 5% density, you’re probably keyword stuffing; modern Google penalises this, and it reads awkwardly to humans. A density of 1–3% is typical for naturally written content on a focused topic; much above that starts to feel repetitive.
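Both checks (is the keyword in the top 10, and which density range does it fall in) fit in one small Python function. This is a sketch of the rule of thumb, not a feature of the tool: keyword_report is a hypothetical name, the input is assumed to be an already filtered token list, and the “sparse” label for anything under 1% is an assumption, since the guidance only names the 1–3%, above-3%, and above-5% ranges.

```python
from collections import Counter


def keyword_report(tokens: list[str], keyword: str, top_n: int = 10) -> tuple[float, bool, str]:
    """Return (density %, whether keyword is among the top_n words, heuristic band)."""
    counts = Counter(tokens)
    density = 100 * counts[keyword] / len(tokens) if tokens else 0.0
    in_top = keyword in {w for w, _ in counts.most_common(top_n)}
    # Bands follow the rule of thumb: 1-3% typical, above 3% repetitive, above 5% likely stuffing.
    # Labelling <1% as "sparse" is an assumption.
    if density > 5:
        band = "likely stuffing"
    elif density > 3:
        band = "repetitive"
    elif density >= 1:
        band = "typical"
    else:
        band = "sparse"
    return density, in_top, band


draft = ["meteor"] * 2 + [f"w{i}" for i in range(98)]
print(keyword_report(draft, "meteor"))  # (2.0, True, 'typical')
```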

Example: phrase detection

“The quick brown fox jumps over the lazy dog. The lazy dog sleeps in the sun while the quick brown fox looks for food.”

With stopwords removed, the token stream becomes: quick brown fox jumps lazy dog lazy dog sleeps sun quick brown fox looks food. The top bigrams are “quick brown” (2), “brown fox” (2), and “lazy dog” (2), and the top trigram is “quick brown fox” (2). These recurring two- and three-word phrases capture the subject matter better than single-word frequency does.
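The same token stream can be reproduced in Python. This sketch assumes a tokeniser that lowercases and drops punctuation, plus a hypothetical five-word stopword set that happens to cover this sentence. It also shows that phrases are paired on the filtered stream, so they can bridge removed words and sentence boundaries; the “dog lazy” row in the sample output suggests the tool behaves the same way, though that is an inference.

```python
import re

# Hypothetical stopword subset; it only needs to cover the function words in this sentence.
STOPWORDS = {"the", "over", "in", "while", "for"}

text = ("The quick brown fox jumps over the lazy dog. "
        "The lazy dog sleeps in the sun while the quick brown fox looks for food.")

# Lowercase, strip punctuation, drop stopwords (assumed tokenisation).
tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]
bigrams = list(zip(tokens, tokens[1:]))
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))

print(" ".join(tokens))
# "jumps lazy" bridges the removed "over the"; "dog lazy" crosses the sentence
# boundary, because the period is discarded before tokens are paired.
print(("jumps", "lazy") in bigrams, ("dog", "lazy") in bigrams)  # True True
print(trigrams.count(("quick", "brown", "fox")))  # 2
```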

Example: editing for repetition

Writers often repeat certain phrases unconsciously. Paste your draft, look at the bigram and trigram tables, and anything with a count above 2 is a candidate for varied phrasing. The tool doesn’t judge — some repetition is deliberate (refrain, parallelism) — but it’s hard to see repetition in your own draft without a frequency count.
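The “count above 2” rule of thumb for repetition can be sketched as a small Python filter over an already tokenised draft. repeated_phrases is a hypothetical helper, and the threshold is a parameter so you can loosen or tighten it.

```python
from collections import Counter


def repeated_phrases(tokens: list[str], threshold: int = 2) -> list[tuple[str, int]]:
    """Bigrams and trigrams that occur more than `threshold` times, most frequent first."""
    found = Counter()
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            found[" ".join(tokens[i:i + n])] += 1
    return sorted(((p, c) for p, c in found.items() if c > threshold), key=lambda pc: (-pc[1], pc[0]))


draft = "key takeaway results matter key takeaway teams ship key takeaway users stay".split()
print(repeated_phrases(draft))  # [('key takeaway', 3)]
```

The output is only a list of candidates; whether a flagged phrase is a deliberate refrain or an accidental tic is still an editorial call.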

Stopwords

The default filter excludes the ~100 most common English function words (the, and, of, is, was, etc.). This is the right default for content analysis because function words dominate every English text and bury the content-carrying words. Turn the filter off if you’re doing stylometrics (function word patterns are a fingerprint for authorship), or if you’re checking a specific phrase like “as soon as” where stopwords matter.

The stopword list is intentionally minimal. Bigger lists exist (NLTK has ~180 English stopwords; some academic lists go to 400+), but the marginal benefit drops off quickly after the first 50 or so words.
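The effect of the toggle on a phrase like “as soon as” is easy to see in a short Python sketch. The STOPWORDS set here is a hypothetical six-word subset, not the tool's list, and trigram_counts is an illustrative helper.

```python
import re
from collections import Counter

# Hypothetical stopword subset for illustration only.
STOPWORDS = {"as", "you", "can", "and", "it", "the"}


def trigram_counts(text: str, filter_stopwords: bool) -> Counter:
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if filter_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


text = "Reply as soon as you can, and ship as soon as it builds."
print(trigram_counts(text, filter_stopwords=False)[("as", "soon", "as")])  # 2
print(trigram_counts(text, filter_stopwords=True)[("as", "soon", "as")])   # 0
```

With the filter on, the stream collapses to “reply soon ship soon builds”, so the phrase you were looking for can no longer be counted.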

What this tool does not do

It doesn’t do morphological or semantic analysis: it counts exact string matches, so “running”, “runs”, and “ran” are three separate words (just as “dog” and “dogs” are separate rows in the sample output). For stemming or lemmatisation you need a dedicated NLP tool.

It doesn’t compute TF-IDF (term frequency–inverse document frequency), which weights each word by how rare it is across a corpus. TF-IDF requires a reference corpus; this tool only analyses a single document.
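To show why a corpus is needed, here is a minimal Python sketch of the plain, unsmoothed textbook variant: tf = count / document length, idf = ln(N / df), where df is the number of documents containing the term. Library implementations often add smoothing. The tf_idf helper and the three tiny documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """Plain TF-IDF scores for `doc`, which is assumed to be one of the documents in `corpus`."""
    n_docs = len(corpus)
    doc_sets = [set(d) for d in corpus]
    scores = {}
    for term, count in Counter(doc).items():
        df = sum(1 for s in doc_sets if term in s)  # documents containing the term (>= 1)
        scores[term] = (count / len(doc)) * math.log(n_docs / df)
    return scores


corpus = [
    ["meteor", "shower", "peak", "tonight"],
    ["tonight", "weather", "forecast"],
    ["tonight", "game", "score"],
]
scores = tf_idf(corpus[0], corpus)
print(scores["tonight"])           # 0.0 (appears in every document)
print(round(scores["meteor"], 3))  # 0.275
```

A word present in every document gets an idf of zero no matter how often it appears, which is something a single-document density count cannot express.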

It doesn’t detect named entities (people, places, organisations). That requires a named-entity recognition model; here, a capitalised word is treated like any other. For total word count and reading time on the same draft, use the word counter; for sentence-level complexity and grade-level scores, use the readability checker.

Frequently asked questions

What is keyword density and does Google care?

Keyword density is the percentage of words in a page that are a specific keyword. Historically it was a straightforward SEO signal — pages with 2–3% density for the target keyword tended to rank well. Modern Google ignores raw keyword density and uses more sophisticated topic modelling, so the metric is no longer a direct ranking factor. But it's still useful as a writing tool: if your top keyword is barely present, the page isn't about what you think it's about; if it's dominating, you may be stuffing. Use density for self-diagnosis, not for ranking manipulation.

Why exclude stopwords by default?

Because without filtering, 'the' and 'of' dominate every English text — the top 10 words are almost always function words, which tells you nothing about the topic. Removing stopwords lifts content words (nouns, verbs, adjectives) to the top so you can see what the text is actually about. Turn the filter off if you care about stylometrics or speaking-style analysis, where function-word frequency is meaningful.

What are bigrams and trigrams for?

A bigram is a 2-word phrase; a trigram is a 3-word phrase. Single-word frequency tells you which topics dominate; phrase frequency tells you which specific claims or entities recur. A page about 'meteor showers' uses both words a lot, but it is the bigram 'meteor showers' that shows the topic is specifically meteor showers, not 'meteor' and 'showers' in unrelated contexts. Trigrams pick up even more specific multi-word patterns, such as named promotions ('black friday sale') and fixed terms ('natural language processing').

Why does 'the quick brown fox' produce the bigram 'the quick' with stopwords off, but not with stopwords on?

Because bigrams are computed on the filtered token list. When stopwords are removed, 'the quick brown fox' becomes 'quick brown fox' in the token stream, and bigrams are taken from that. The phrase 'the quick' literally no longer exists in the filtered stream. This is deliberate — it makes the bigram output focus on content-carrying phrases rather than noise like 'of the' and 'in the'.

What does the percentage column mean for bigrams and trigrams?

It is the percentage of all phrases of that length. For a text with 100 tokens left after stopword removal, there are 99 bigrams (every token except the last starts one) and 98 trigrams, so a bigram that occurs 3 times has a percentage of 3 / 99 ≈ 3.0%. The denominator isn't the word count; it's the number of phrases of that length, which for n tokens is n−1 bigrams and n−2 trigrams.