Count word frequencies and two- and three-word phrase frequencies in any text. Useful for SEO editing (check whether your target keyword appears often enough), content analysis (see what a document is really about), and editing (spot repeated phrases that signal weak writing).
Paste text into the box, toggle whether to filter English stopwords (on by default), set a minimum-occurrence threshold, and the tool shows three result tables: top words, top bigrams, and top trigrams — each with raw count and percentage of the total.
What the percentages mean
Word percentage is the fraction of total (post-filter) tokens. If a 100-word text with stopwords removed has 50 content words and “meteor” appears 5 times, the density is 5/50 = 10%. When you turn off the stopword filter, the denominator changes — it becomes the full token count instead — which dilutes every word’s density.
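The effect of the filter on the denominator can be sketched in a few lines of Python. This is an illustration only, not the tool’s own code: the tiny `STOPWORDS` set, the `tokenize` regex, and the `word_density` helper are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; the tool's real list is larger.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "was", "in", "over", "for", "while"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_density(text, word, filter_stopwords=True):
    """Return the share of tokens equal to `word`, as a percentage."""
    tokens = tokenize(text)
    if filter_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if not tokens:
        return 0.0
    return 100.0 * Counter(tokens)[word] / len(tokens)

sample = "The meteor was a meteor of the sky"
filtered = word_density(sample, "meteor")            # 2 of 3 content tokens
unfiltered = word_density(sample, "meteor", False)   # 2 of 8 tokens
```

The count of “meteor” is the same in both calls; only the denominator moves, which is why the unfiltered figure is so much lower.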
Bigram percentage is the fraction of all bigrams in the token stream. For n tokens, there are n−1 bigrams (each token except the last starts one). A bigram that appears 3 times in a text with 100 tokens after filtering has density 3 / 99 ≈ 3.0%.
Trigram percentage works the same way: 3-word phrases, denominator n−2.
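The bigram and trigram denominators follow from how the phrases are enumerated. A minimal sketch, assuming the tokens are already filtered and that `ngrams` and `ngram_density` are names chosen here for illustration:

```python
from collections import Counter

def ngrams(tokens, k):
    """Return every run of k consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

def ngram_density(tokens, k):
    """Map each k-gram to (count, percentage of all k-grams in the stream)."""
    grams = ngrams(tokens, k)
    total = len(grams)  # n - k + 1: n-1 for bigrams, n-2 for trigrams
    return {g: (c, 100.0 * c / total) for g, c in Counter(grams).items()}

# 100 tokens in which the pair ("a", "b") occurs exactly 3 times.
tokens = ["a", "b"] * 3 + [f"w{i}" for i in range(94)]
count, pct = ngram_density(tokens, 2)[("a", "b")]
```

With 100 tokens the sketch enumerates 99 bigrams and 98 trigrams, so a bigram seen 3 times comes out at 3 / 99.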
Example: content self-check
Paste an article you’re editing for a keyword. If the target keyword doesn’t appear in the top 10 single-word list, either the article isn’t actually about that keyword or you’re using heavy synonym variation (which can be fine for modern SEO but means naive density matching won’t find it).
If the target keyword dominates at more than 5% density, you’re probably stuffing: Google’s spam policies treat keyword stuffing as a violation, and it reads awkwardly to humans. As a rule of thumb, 1–3% is typical for naturally written content on a focused topic; much above that starts to feel repetitive.
Example: phrase detection
“The quick brown fox jumps over the lazy dog. The lazy dog sleeps in the sun while the quick brown fox looks for food.”
With stopwords removed, the token stream becomes: quick brown fox jumps lazy dog lazy dog sleeps sun quick brown fox looks food. The top bigrams are “quick brown” (2), “brown fox” (2), “lazy dog” (2). The top trigram is “quick brown fox” (2). These are the specific two- and three-word phrases recurring in the text — better than single-word frequency at capturing the actual subject matter.
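The counts above can be reproduced with a short Python sketch. The `STOPWORDS` set here is an assumption containing just the five words needed for this sentence pair, not the tool’s actual list, and the regex tokeniser is likewise illustrative.

```python
import re
from collections import Counter

# Assumed stopwords: just enough to reproduce the example, not the tool's list.
STOPWORDS = {"the", "over", "in", "while", "for"}

text = ("The quick brown fox jumps over the lazy dog. "
        "The lazy dog sleeps in the sun while the quick brown fox looks for food.")

# Lowercase, split into words, and drop stopwords.
tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

# Pair each token with its successor(s) to form bigrams and trigrams.
bigrams = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

repeated_bigrams = sorted(g for g, c in bigrams.items() if c >= 2)
repeated_trigrams = sorted(g for g, c in trigrams.items() if c >= 2)
```

Like the token stream shown above, this sketch runs phrases straight across the sentence boundary and across removed stopwords, so pairs such as “jumps lazy” and “dog lazy” are counted once each even though they never appear side by side in the original text.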
Example: editing for repetition
Writers often repeat certain phrases unconsciously. Paste your draft, look at the bigram and trigram tables, and anything with a count above 2 is a candidate for varied phrasing. The tool doesn’t judge — some repetition is deliberate (refrain, parallelism) — but it’s hard to see repetition in your own draft without a frequency count.
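The “count above 2” check is easy to approximate outside the tool. A rough sketch, assuming a plain whitespace split with no stopword filtering (repeated filler phrases are often made of stopwords) and a hypothetical helper named `repeated_phrases`:

```python
from collections import Counter

def repeated_phrases(tokens, k, threshold=2):
    """Return (phrase, count) pairs for k-word phrases seen more than `threshold` times."""
    grams = Counter(" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return [(p, c) for p, c in grams.most_common() if c > threshold]

draft = "at the end of the day we ship and at the end of the day we rest at the end".split()

flagged_trigrams = repeated_phrases(draft, 3)
flagged_bigrams = repeated_phrases(draft, 2)
```

Here “at the end” is the only trigram that crosses the threshold; “of the day” appears twice and is left alone. Raising or lowering `threshold` plays the same role as the tool’s minimum-occurrence setting.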
Stopwords
The default filter excludes the ~100 most common English function words (the, and, of, is, was, etc.). This is the right default for content analysis because function words dominate every English text and bury the content-carrying words. Turn the filter off if you’re doing stylometrics (function word patterns are a fingerprint for authorship), or if you’re checking a specific phrase like “as soon as” where stopwords matter.
The stopword list is intentionally minimal — bigger lists exist (NLTK has ~180 English stopwords; some academic lists go to 400+) but the marginal utility drops fast after the first 50 words.
What this tool does not do
It doesn’t do semantic or morphological analysis: it counts exact string matches, so “running”, “runs”, and “ran” are three separate words. For stemming or lemmatisation you need a dedicated NLP tool.
It doesn’t compute TF-IDF (term frequency–inverse document frequency), which weights words by how unusual they are across a corpus. TF-IDF requires a reference corpus; this tool only analyses a single document.
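To see why a reference corpus is needed, here is a small Python sketch of one common, unsmoothed TF-IDF variant. It is not something this tool does; the `tf_idf` function and the three-document toy corpus are assumptions for illustration, and real libraries usually apply smoothing that changes the exact numbers.

```python
import math
from collections import Counter

def tf_idf(doc_tokens, corpus):
    """Score each word by term frequency times inverse document frequency.

    `corpus` is a list of token lists and is assumed to include `doc_tokens`,
    so every word occurs in at least one document.
    """
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(doc_tokens).items():
        docs_with_word = sum(1 for d in corpus if word in d)
        idf = math.log(n_docs / docs_with_word)  # 0 when the word is in every document
        scores[word] = (count / len(doc_tokens)) * idf
    return scores

corpus = [
    "the meteor fell".split(),
    "the rain fell".split(),
    "the sun rose".split(),
]
scores = tf_idf(corpus[0], corpus)
```

In the first document all three words have the same raw frequency, but “the” scores zero because it appears everywhere, and “meteor” outranks “fell” because it appears in only one document. None of that can be computed from a single text.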
It doesn’t detect named entities (people, places, organisations). Those require a named-entity recogniser; here, a capitalised word is treated like any other.
For total word count and reading time on the same draft, the word counter runs alongside; for sentence-level complexity and grade-level scores, the readability checker is the matching pass.