Find the longest shared text between two inputs using the classic dynamic-programming algorithms for longest common substring (contiguous match) and longest common subsequence (non-contiguous). Paste two texts, pick which kind of match you want, and the result highlights the matching text in both inputs.
The two modes
Longest common substring finds the longest run of characters that appears the same in both strings, contiguously. The shared text must be an unbroken sequence in both inputs. “The quick brown fox” and “The slow brown fox” share the substring ” brown fox” (10 characters including the leading space) because those characters appear consecutively in both.
Longest common subsequence finds the longest sequence of characters that appears in both strings in the same order but not necessarily contiguous. “abcXdef” and “aYbZcWdef” share the subsequence “abcdef” (6 characters) because you can pick those letters in the right order from both strings, even though they’re interrupted by other characters.
The substring is always a subset of the subsequence — the longest substring is never longer than the longest subsequence for the same input pair. They become equal only when the common text really is contiguous in both strings.
The algorithms
Both use a 2-dimensional dynamic-programming table of size $(m+1) \times (n+1)$, where $m$ and $n$ are the lengths of the two input strings. Each cell $(i, j)$ in the table stores a piece of the answer for a prefix of the inputs.
For substring, cell $(i, j)$ holds the length of the longest common substring ending at position $i$ in string A and position $j$ in string B:
Whenever characters match, you extend the run; whenever they don’t, the run breaks and the cell is zero. Track the global maximum as you fill the table, then read off the match by backing up from the cell with the maximum value.
For subsequence, cell $(i, j)$ holds the length of the longest common subsequence of the first $i$ characters of A and the first $j$ of B:
Whenever characters match, you inherit the diagonal value plus one; otherwise you take the best of dropping a character from either string. The final answer lives in cell $(m, n)$, and you reconstruct the actual subsequence by walking back through the table.
Both algorithms run in $O(m \cdot n)$ time and memory.
Example: plagiarism detection
Two student essays share the sentence “the industrial revolution fundamentally transformed western economies by mechanising production and concentrating capital.” The substring mode finds that exact 102-character match if both essays contain it. In real plagiarism work, a 50+ character shared contiguous substring is strong evidence that one author saw the other’s text — genuinely independent prose rarely produces such long coincidental overlaps.
Example: code diff
Given two versions of a function, the subsequence mode finds the longest sequence of lines (or characters, depending on how you tokenise) that are unchanged between the versions. Everything not in the LCS is either an insertion, deletion, or modification. This is the core of how git diff and most other diff tools compute what changed between two files — the LCS tells you what stayed the same, and the delta is everything else.
Example: similar but not identical titles
“The Lord of the Rings: The Fellowship of the Ring” and “The Lord of the Rings: The Two Towers” share the substring “The Lord of the Rings: The ” (27 characters). The subsequence would go further — adding any common-ordered characters from the divergent tails — but for “is this the same franchise?” questions, the substring is more discriminating.
What this tool does not do
It doesn’t compute a similarity percentage — for that, use the similarity percentage checker, which normalises the match length against the input sizes across four algorithms.
It doesn’t handle fuzzy or approximate matching (Levenshtein distance, edit distance). Those are different problems with different algorithms; this tool finds exact character matches only.
It doesn’t produce a full diff output (inserted/deleted/unchanged line counts). The subsequence length tells you what’s shared, but producing a full diff requires tracking the non-matching edits too — for that, the text diff tool gives the line-by-line changed/unchanged split.