OCR (optical character recognition) turns a picture of text into actual selectable, copyable text. Classic use cases: photographing a page of a book to quote a passage, extracting the text from a screenshot so you can search it, reading the price from a menu photo. This tool runs Tesseract.js, a port of the open-source Tesseract engine used by many desktop OCR utilities, entirely in your browser.
How it works
Tesseract.js is the original Tesseract OCR engine compiled to WebAssembly. It loads on first interaction (~2-3 MB) along with the language-specific training data file (~10 MB per language). The training data is cached in your browser’s IndexedDB so subsequent recognitions in the same language don’t re-download it.
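As a rough illustration of what the caching saves, the sketch below estimates the download for a recognition run. The sizes are the approximate figures quoted above (real sizes vary by version and language), and `estimateDownloadMB` is a hypothetical helper written for this example, not part of Tesseract.js.

```typescript
// Approximate sizes from the description above; treat them as ballpark values.
const ENGINE_MB = 2.5; // midpoint of the ~2-3 MB engine download
const LANGUAGE_DATA_MB = 10; // ~10 MB of training data per language

// Estimate how many megabytes a recognition run still has to download.
// `cachedLanguages` holds languages whose training data is already cached.
// `engineLoaded` says whether the WASM engine is already loaded in this page.
function estimateDownloadMB(
  requestedLanguages: string[],
  cachedLanguages: ReadonlySet<string>,
  engineLoaded: boolean,
): number {
  // A Set removes duplicates, so each missing language is counted once.
  const missing = new Set<string>(
    requestedLanguages.filter((lang) => !cachedLanguages.has(lang)),
  );
  return (engineLoaded ? 0 : ENGINE_MB) + missing.size * LANGUAGE_DATA_MB;
}

// First ever run in English: engine plus one language file.
const firstRun = estimateDownloadMB(["eng"], new Set<string>(), false);
// Same language again: nothing left to fetch.
const repeatRun = estimateDownloadMB(["eng"], new Set<string>(["eng"]), true);
// Adding Spanish later: only the new language file.
const addSpanish = estimateDownloadMB(["eng", "spa"], new Set<string>(["eng"]), true);
```

With these assumed sizes, the first run costs about 12.5 MB, a repeat run in the same language costs nothing, and each extra language adds about 10 MB once.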
Recognition itself is single-threaded and CPU-intensive — Tesseract walks the image looking for character shapes, matches them against a trained model for the chosen language, and returns the recognised text plus a per-character confidence score. The overall confidence in the result panel is the average across all detected characters.
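The averaging described above can be sketched in a few lines. The `RecognisedSymbol` shape (a character plus a 0-100 confidence) is an assumption made for illustration; it is not the exact result type that Tesseract.js returns.

```typescript
// Hypothetical shape for one recognised character and its confidence (0-100).
interface RecognisedSymbol {
  text: string;
  confidence: number;
}

// Overall confidence: the plain average across all detected characters.
function overallConfidence(symbols: RecognisedSymbol[]): number {
  if (symbols.length === 0) return 0; // nothing detected, nothing to average
  const total = symbols.reduce((sum, s) => sum + s.confidence, 0);
  return total / symbols.length;
}

// Characters worth a second look: those scoring below a chosen threshold.
function lowConfidenceSymbols(
  symbols: RecognisedSymbol[],
  threshold: number,
): RecognisedSymbol[] {
  return symbols.filter((s) => s.confidence < threshold);
}

const sampleSymbols: RecognisedSymbol[] = [
  { text: "c", confidence: 96 },
  { text: "a", confidence: 90 },
  { text: "t", confidence: 84 },
];
const averageScore = overallConfidence(sampleSymbols); // (96 + 90 + 84) / 3 = 90
const doubtful = lowConfidenceSymbols(sampleSymbols, 85); // only the "t"
```

Because it is an average, a high overall score can still hide a few badly misread characters, which is why filtering by a threshold is a useful companion to the single number.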
What works well
- Screenshots of UI text. Crisp pixel-aligned anti-aliased text is exactly what Tesseract was trained on. Expect 95%+ accuracy.
- High-resolution scans of printed pages. 300+ DPI, dark text on light background, no shadow. The classic OCR scenario.
- Phone photos of menus, signs, business cards. Held steady, well-lit, text fills most of the frame. Accuracy 85-95%.
What doesn’t work well
- Handwriting. Tesseract is trained on machine-printed text. Handwriting accuracy is poor and varies enormously. For handwritten input, dedicated services (Apple’s iOS Live Text, Google Lens) do better.
- Mixed scripts. Tesseract treats the chosen language's character set as a strong prior. For a page that mixes English and Chinese, load both languages: recognition will work, but accuracy drops compared to single-language input.
- Low-contrast or low-resolution photos. Text less than ~20 px tall in the source image is rarely read accurately.
- Heavy rotation. Tesseract handles minor skew (a few degrees) but not 90° or 180° rotation. Rotate before OCR.
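For the rotation case, turning an image by quarter turns before OCR is only an index remapping. The sketch below works on a single-channel, row-major pixel buffer so it stays self-contained; `GrayImage`, `rotate90Clockwise` and `rotateQuarterTurns` are names invented for this example. In a browser you would more likely rotate with a canvas, and deciding which way the image needs to turn is still up to you.

```typescript
// Hypothetical minimal image: one byte per pixel, stored row by row.
interface GrayImage {
  pixels: Uint8Array;
  width: number;
  height: number;
}

// Rotate 90 degrees clockwise. The output is `height` wide and `width` tall.
function rotate90Clockwise(image: GrayImage): GrayImage {
  const { pixels, width, height } = image;
  const out = new Uint8Array(pixels.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Source (x, y) lands in column (height - 1 - y), row x of the output.
      const newX = height - 1 - y;
      const newY = x;
      out[newY * height + newX] = pixels[y * width + x];
    }
  }
  return { pixels: out, width: height, height: width };
}

// Apply any number of clockwise quarter turns (negative = counter-clockwise).
function rotateQuarterTurns(image: GrayImage, turns: number): GrayImage {
  const normalised = ((turns % 4) + 4) % 4; // map to 0..3
  let result = image;
  for (let i = 0; i < normalised; i++) {
    result = rotate90Clockwise(result);
  }
  return result;
}

// A 3x2 test image:  1 2 3
//                    4 5 6
const tiny: GrayImage = {
  pixels: Uint8Array.from([1, 2, 3, 4, 5, 6]),
  width: 3,
  height: 2,
};
const sideways = rotateQuarterTurns(tiny, 1); // rows: 4 1 / 5 2 / 6 3
const upsideDown = rotateQuarterTurns(tiny, 2); // rows: 6 5 4 / 3 2 1
```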
Example: extracting code from a screenshot
A screenshot of a code block in a web tutorial. Drop, English, Extract text. The code text appears in the output box, with original line breaks preserved. Confidence is typically high (90+). Copy and paste into your editor. Faster than retyping.
Example: capturing a printed quote
A phone photo of a page from a book. Drop, English (or whatever language the book is in), Extract text. Confidence drops to medium (70-85) because of phone-camera artefacts (slight blur, perspective, page curvature). The text is usable, but proofread it for OCR substitutions such as “rn” read as “m” or “0” read as “O”.
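Part of that proofreading can be semi-automated. The sketch below is a heuristic, not a spell checker: it assumes you supply a small vocabulary of expected words (the `vocabulary` parameter is hypothetical), and it only tries to undo the two substitutions mentioned above.

```typescript
// Suggest a fix for one word, or null if the word looks fine or nothing fits.
function suggestFix(word: string, vocabulary: ReadonlySet<string>): string | null {
  if (vocabulary.has(word.toLowerCase())) return null; // known word, leave it

  // Digits mixed with capital O: probably a number whose zeros were misread.
  if (/^[0-9O]+$/.test(word) && /[0-9]/.test(word) && word.includes("O")) {
    return word.replace(/O/g, "0");
  }

  // Try undoing "rn" read as "m", one "m" at a time.
  for (let i = 0; i < word.length; i++) {
    if (word[i] === "m") {
      const candidate = word.slice(0, i) + "rn" + word.slice(i + 1);
      if (vocabulary.has(candidate.toLowerCase())) return candidate;
    }
  }
  return null;
}

// Run the check over whitespace-separated words and collect suggestions.
function proofread(
  text: string,
  vocabulary: ReadonlySet<string>,
): Array<{ word: string; suggestion: string }> {
  const suggestions: Array<{ word: string; suggestion: string }> = [];
  for (const word of text.split(/\s+/)) {
    if (word.length === 0) continue;
    const suggestion = suggestFix(word, vocabulary);
    if (suggestion !== null) suggestions.push({ word, suggestion });
  }
  return suggestions;
}

const knownWords = new Set<string>(["the", "corner", "shop", "opened", "in"]);
const fixes = proofread("The comer shop opened in 2O19", knownWords);
// fixes: "comer" -> "corner", "2O19" -> "2019"
```

Treat the output as suggestions to review rather than corrections to apply blindly: "comer" could be a real word in another context, and a small vocabulary will miss plenty.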
Example: foreign-language menu
A phone photo of a menu in Spanish. Drop, Spanish (critical — using English would produce garbage on accented characters), Extract text. Use a separate translator afterwards if needed.
Common mistakes
Wrong language. The single biggest accuracy killer. Tesseract uses the chosen language as a strong prior, so French text run through the English model comes back with its accented characters misread. Confidence scores drop, yet the output can still look superficially plausible, which makes the error easy to miss. Always pick the right language first.
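Under the hood, Tesseract identifies each language's training data by a short code. The codes in the sketch below (`eng`, `spa`, `fra`, `deu`, `chi_sim`) are standard Tesseract names, and joining several with `+` follows the convention of Tesseract's command-line `-l` option. The display names and the `toTesseractLang` helper are assumptions for illustration, as is the idea that this tool passes languages in that form.

```typescript
// Display name -> Tesseract training data code. Only a small sample.
const LANGUAGE_CODES: Record<string, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  "Chinese (Simplified)": "chi_sim",
};

// Turn the chosen display names into a single Tesseract language string.
// Fails loudly rather than silently falling back to English, because the
// wrong language produces plausible-looking but wrong output.
function toTesseractLang(names: string[]): string {
  if (names.length === 0) throw new Error("Pick at least one language");
  const codes = names.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (code === undefined) {
      throw new Error(`No training data mapped for "${name}"`);
    }
    return code;
  });
  return codes.join("+");
}

const menuLang = toTesseractLang(["Spanish"]); // "spa"
const mixedLang = toTesseractLang(["English", "Chinese (Simplified)"]); // "eng+chi_sim"
```

Throwing on an unknown language is a deliberate choice here: a silent default to English is exactly the mistake this section warns about.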
Expecting layout preservation. Tesseract’s output is the recognised characters with line breaks roughly where they appeared in the source. Tables, multi-column layouts, and complex page geometry are flattened — the reading order is best-effort. For structured extraction, a layout-aware pipeline is better.
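To see why columns get flattened, consider a deliberately naive way of turning positioned words into plain text: group words into lines by vertical position, then read each line left to right. This is not how Tesseract orders text internally; `WordBox` and `toPlainText` are hypothetical names, and the sketch only illustrates what is lost once layout is reduced to lines of text.

```typescript
// Hypothetical word with the top-left corner and height of its bounding box.
interface WordBox {
  text: string;
  x: number;
  y: number;
  height: number;
}

function toPlainText(words: WordBox[]): string {
  // Work top to bottom on a copy so the caller's array is left untouched.
  const sorted = [...words].sort((a, b) => a.y - b.y);
  const lines: WordBox[][] = [];
  for (const word of sorted) {
    const current = lines[lines.length - 1];
    // Same line if the top edge is within half a text height of the line's first word.
    if (current !== undefined && Math.abs(word.y - current[0].y) < current[0].height / 2) {
      current.push(word);
    } else {
      lines.push([word]);
    }
  }
  // Within each line, read left to right.
  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((w) => w.text).join(" "))
    .join("\n");
}

// Two columns of two lines each, listed column by column.
const twoColumns: WordBox[] = [
  { text: "Left-one", x: 0, y: 0, height: 20 },
  { text: "Left-two", x: 0, y: 30, height: 20 },
  { text: "Right-one", x: 200, y: 0, height: 20 },
  { text: "Right-two", x: 200, y: 30, height: 20 },
];
const flattened = toPlainText(twoColumns);
// "Left-one Right-one\nLeft-two Right-two": the columns are interleaved.
```

For a price list that interleaving is what you want; for two-column prose it scrambles the sentences, which is the kind of result a layout-aware pipeline is meant to avoid.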
Photographing text that’s too small. A page of text shot from across a room: the text might be readable to your eye but each character is only 5-10 pixels in the source, below Tesseract’s accuracy threshold. Get closer or zoom in.
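The arithmetic behind "get closer or zoom in" is simple. Taking the rough ~20 px figure from earlier as the target (an approximate rule of thumb, not a hard limit), the sketch below computes how much larger the text needs to appear. `requiredZoom` is a hypothetical helper for this example.

```typescript
const MIN_TEXT_HEIGHT_PX = 20; // rough threshold quoted earlier on this page

// How many times larger must the text appear to reach the target height?
// Returns 1 when the text is already tall enough.
function requiredZoom(
  currentHeightPx: number,
  minHeightPx: number = MIN_TEXT_HEIGHT_PX,
): number {
  if (currentHeightPx <= 0) {
    throw new RangeError("Text height must be positive");
  }
  return Math.max(1, minHeightPx / currentHeightPx);
}

const fromAcrossTheRoom = requiredZoom(5); // 4: text must appear 4x larger
const slightlySmall = requiredZoom(10); // 2
const alreadyFine = requiredZoom(32); // 1: no change needed
```

Apparent size scales inversely with distance, so a factor of 4 means standing at roughly a quarter of the distance or using 4x optical zoom. Enlarging the photo afterwards does not add detail that the camera never captured.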
Forgetting that recognition is CPU-bound. Recognition can take 5-30 seconds on a moderate-density page. The progress bar moves during the long phases. Don’t navigate away from the tab — recognition stops if the page is closed.
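A progress bar like this is typically driven by status messages from the OCR worker. The sketch below assumes each message carries a `status` string and a `progress` fraction between 0 and 1, which resembles what Tesseract.js passes to its logger callback; the exact fields this tool uses are an assumption, and `formatProgress` is a hypothetical helper.

```typescript
// Assumed message shape: a phase name and a completion fraction (0 to 1).
interface ProgressMessage {
  status: string;
  progress: number;
}

// Turn a progress message into a label such as "recognizing text: 42%".
function formatProgress(message: ProgressMessage): string {
  // Clamp defensively in case a message falls slightly outside 0..1.
  const clamped = Math.min(1, Math.max(0, message.progress));
  return `${message.status}: ${Math.round(clamped * 100)}%`;
}

const midway = formatProgress({ status: "recognizing text", progress: 0.42 });
const finished = formatProgress({ status: "recognizing text", progress: 1 });
```

The percentage restarts for each phase (loading the engine, loading training data, recognising), so a bar that jumps back to zero between phases is expected rather than a sign of failure.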
What this tool does not do
It doesn’t upload anything. All recognition runs in your browser via Tesseract.js + WASM. No telemetry, no server calls.
It doesn’t handle PDFs directly. Render the PDF to images first with the PDF to images converter, then OCR each page.
It doesn’t preserve complex layout (tables, columns). Output is plain text in best-effort reading order.
It doesn’t recognise handwriting reliably. Use a dedicated handwriting recogniser (iOS Live Text, Google Lens) for handwriting.
It doesn’t translate. Output is the text in its source language; translation is a separate step.