
Image OCR


OCR runs in your browser via Tesseract.js. The first time you pick a language, about 10 MB of training data downloads; it is cached after that.


OCR (optical character recognition) turns a picture of text into actual selectable, copyable text. Classic use cases: photographing a page of a book to quote a passage, extracting the text from a screenshot so you can search it, or reading the prices off a photo of a menu. This tool runs Tesseract.js, a WebAssembly build of the open-source Tesseract engine that also powers many desktop OCR utilities, entirely in your browser.

How it works

Tesseract.js is the original Tesseract OCR engine compiled to WebAssembly. It loads on first interaction (~2-3 MB) along with the language-specific training data file (~10 MB per language). The training data is cached in your browser’s IndexedDB so subsequent recognitions in the same language don’t re-download it.
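The cache-first behaviour described above can be sketched as follows. This is an assumption about the general pattern, not Tesseract.js's own code: the hypothetical `MemoryCache` stands in for IndexedDB and `fetchData` stands in for the network, so the sketch runs outside a browser.

```typescript
// Hypothetical types: a byte fetcher (stand-in for the network) and a
// key-value store (stand-in for IndexedDB).
type Fetcher = (lang: string) => Promise<Uint8Array>;

interface ByteCache {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, value: Uint8Array): Promise<void>;
}

// In-memory stand-in for IndexedDB so the sketch runs anywhere.
class MemoryCache implements ByteCache {
  private store = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.store.get(key);
  }

  async set(key: string, value: Uint8Array): Promise<void> {
    this.store.set(key, value);
  }
}

// Cache-first load: only hit the network when this language isn't cached yet.
async function loadTrainedData(
  lang: string,
  cache: ByteCache,
  fetchData: Fetcher,
): Promise<Uint8Array> {
  const key = `${lang}.traineddata`;
  const cached = await cache.get(key);
  if (cached !== undefined) return cached;
  const data = await fetchData(lang);
  await cache.set(key, data);
  return data;
}
```

Because the cache key includes the language code, each language is downloaded once, and switching back to a language used earlier is served from the cache.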

Recognition itself is CPU-intensive and runs on a single thread. Tesseract first analyses the page layout to find blocks and lines of text, then runs each line through a neural-network (LSTM) model trained for the chosen language, and returns the recognised text together with confidence scores for each word and character. The overall confidence in the result panel is the average across all detected characters.
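As a minimal sketch of that last step, assume each recognised character arrives as a `{ text, confidence }` pair. That is a simplified, hypothetical shape; the real Tesseract.js result object is richer. The overall score is then a plain mean:

```typescript
// Simplified, hypothetical shape for one recognised character; the real
// Tesseract.js result object carries much more than this.
interface RecognisedSymbol {
  text: string;
  confidence: number; // 0-100
}

// Overall confidence as the plain mean of per-character confidences.
function overallConfidence(symbols: RecognisedSymbol[]): number {
  if (symbols.length === 0) return 0;
  const total = symbols.reduce((sum, s) => sum + s.confidence, 0);
  return total / symbols.length;
}

const sample: RecognisedSymbol[] = [
  { text: "O", confidence: 96 },
  { text: "C", confidence: 92 },
  { text: "R", confidence: 61 },
];
console.log(overallConfidence(sample).toFixed(1)); // prints "83.0"
```

One consequence of a plain mean: on a long page, a single badly read character barely moves the overall score, so the per-character scores are the better guide to where the errors are.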

What works well

  • Screenshots of UI text. Crisp, pixel-aligned, anti-aliased text is close to the cleanly rendered text Tesseract was trained on. Expect 95%+ accuracy.
  • High-resolution scans of printed pages. 300+ DPI, dark text on light background, no shadow. The classic OCR scenario.
  • Phone photos of menus, signs, business cards. Held steady, well-lit, text fills most of the frame. Accuracy 85-95%.

What doesn’t work well

  • Handwriting. Tesseract is trained on machine-printed text. Handwriting accuracy is poor and varies enormously. For handwritten input, dedicated services (Apple’s iOS Live Text, Google Lens) do better.
  • Mixed scripts. Tesseract treats the chosen language’s character set as a strong prior. For English mixed with Chinese, download both languages and recognition will work, but accuracy drops compared with single-language input.
  • Low-contrast or low-resolution photos. Text less than about 20 pixels tall in the source image rarely reads accurately.
  • Heavy rotation. Tesseract handles minor skew (a few degrees) but not rotation by 90 or 180 degrees. Rotate the image upright before OCR.
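Rotating an image before OCR is a simple pixel remap. Here is a minimal sketch that assumes a hypothetical `GrayImage` type holding one grey byte per pixel in row-major order. In a browser you would more likely draw the image onto a canvas with a rotation applied.

```typescript
// Hypothetical minimal image type: one grey byte per pixel, row-major.
interface GrayImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Rotate 90 degrees clockwise: source (x, y) lands at (height - 1 - y, x).
function rotate90Clockwise(img: GrayImage): GrayImage {
  const { width, height, data } = img;
  const out = new Uint8Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const dstX = height - 1 - y;
      const dstY = x;
      // The rotated image is `height` pixels wide.
      out[dstY * height + dstX] = data[y * width + x];
    }
  }
  return { width: height, height: width, data: out };
}

// A 3x2 image:  1 2 3   rotates to  4 1
//               4 5 6               5 2
//                                   6 3
const rotated = rotate90Clockwise({ width: 3, height: 2, data: new Uint8Array([1, 2, 3, 4, 5, 6]) });
console.log(Array.from(rotated.data).join(",")); // prints "4,1,5,2,6,3"
```

A 180-degree turn is two calls, and a 90-degree counter-clockwise turn is three.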

Example: extracting code from a screenshot

A screenshot of a code block in a web tutorial. Drop, English, Extract text. The code text appears in the output box, with original line breaks preserved. Confidence is typically high (90+). Copy and paste into your editor. Faster than retyping.

Example: capturing a printed quote

A phone photo of a page from a book. Drop, English (or whatever language the book is in), Extract text. Confidence drops to medium (70-85) because of phone-camera artefacts (slight blur, perspective, page curvature). The text is usable, but proofread it for OCR substitutions like “rn” → “m” or “0” → “O”.
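Proofreading can be partly mechanised. The sketch below uses two illustrative heuristics; neither is built into Tesseract. The first flags tokens that mix letters and digits, a typical symptom of 0/O confusion. The second checks whether replacing a single “rn” with “m” turns an unknown word into a known one. The `dictionary` is a hypothetical word list you would supply.

```typescript
// Heuristic 1: tokens that mix ASCII letters and digits, e.g. "B00K"
// (zeros read in place of Os).
function flagMixedTokens(text: string): string[] {
  return text
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .filter((t) => /[A-Za-z]/.test(t) && /[0-9]/.test(t));
}

// Heuristic 2: for a word missing from the (hypothetical) dictionary, try
// replacing each single "rn" with "m" and keep the replacements that are words.
function rnCandidates(word: string, dictionary: Set<string>): string[] {
  if (dictionary.has(word)) return [];
  const candidates: string[] = [];
  let i = word.indexOf("rn");
  while (i !== -1) {
    const candidate = word.slice(0, i) + "m" + word.slice(i + 2);
    if (dictionary.has(candidate)) candidates.push(candidate);
    i = word.indexOf("rn", i + 1);
  }
  return candidates;
}

console.log(flagMixedTokens("The B00K costs 12 euros").join(", ")); // prints "B00K"
console.log(rnCandidates("rnodern", new Set(["modern"])).join(", ")); // prints "modern"
```

Expect false positives. Legitimate tokens such as “MP3” or “A4” also mix letters and digits, so treat the output as a list to check by eye, not as automatic corrections.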

Example: foreign-language menu

A phone photo of a menu in Spanish. Drop, Spanish (critical — using English would produce garbage on accented characters), Extract text. Use a separate translator afterwards if needed.

Common mistakes

Wrong language. The single biggest accuracy killer. Tesseract uses the chosen language as a strong prior, so French text run through the English model comes out with its accented characters misread. Confidence scores plummet, yet the output looks superficially plausible. Always pick the right language first.

Expecting layout preservation. Tesseract’s output is the recognised characters with line breaks roughly where they appeared in the source. Tables, multi-column layouts, and complex page geometry are flattened — the reading order is best-effort. For structured extraction, a layout-aware pipeline is better.
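To see why reading order goes wrong, consider ordering words purely by position. This toy is not Tesseract’s algorithm, since its page-layout analysis does try to detect columns. It shows what happens when line geometry alone decides the order. `WordBox` is a hypothetical shape holding a word’s text and its top-left corner in pixels.

```typescript
// Hypothetical word box: recognised text plus its top-left corner in pixels.
interface WordBox {
  text: string;
  x: number;
  y: number;
}

// Toy reading order (not Tesseract's): group words into lines by similar y,
// then read each line left to right across the whole page width.
function naiveReadingOrder(words: WordBox[], lineTolerance = 5): string {
  const sorted = [...words].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: WordBox[][] = [];
  for (const w of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - w.y) <= lineTolerance) {
      current.push(w);
    } else {
      lines.push([w]);
    }
  }
  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((w) => w.text).join(" "))
    .join("\n");
}

// Two columns: "Cats purr." on the left, "Dogs bark." on the right.
const page: WordBox[] = [
  { text: "Cats", x: 0, y: 0 },
  { text: "purr.", x: 0, y: 20 },
  { text: "Dogs", x: 300, y: 0 },
  { text: "bark.", x: 300, y: 20 },
];
console.log(naiveReadingOrder(page));
// prints:
// Cats Dogs
// purr. bark.
```

The correct order is “Cats purr.” followed by “Dogs bark.”, but the naive order reads straight across both columns. That is the kind of interleaving a flattened multi-column page can show.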

Photographing text that’s too small. A page of text shot from across a room may be readable to your eye, but each character is only 5-10 pixels tall in the source image, below what Tesseract can read accurately. Get closer or zoom in.
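The arithmetic behind “too small” is simple, since a point is 1/72 inch. The sketch below is rough and rests on two assumptions. First, the factor of 0.5 for x-height (the height of a lowercase “x”) relative to the em is only approximate and varies by typeface. Second, the photo example assumes a 4000-pixel-wide frame.

```typescript
// A point is 1/72 inch, so the em height in pixels is pt / 72 * dpi.
function emHeightPx(pointSize: number, dpi: number): number {
  return (pointSize / 72) * dpi;
}

// Rough x-height: assumed to be about half the em (varies by typeface).
function approxXHeightPx(pointSize: number, dpi: number): number {
  return emHeightPx(pointSize, dpi) * 0.5;
}

// Effective DPI of a photo: pixels across the frame divided by the width,
// in inches, of the real scene the frame covers.
function effectiveDpi(imageWidthPx: number, sceneWidthInches: number): number {
  return imageWidthPx / sceneWidthInches;
}

// 10 pt text scanned at 300 DPI.
console.log(emHeightPx(10, 300).toFixed(1)); // prints "41.7"
console.log(approxXHeightPx(10, 300).toFixed(1)); // prints "20.8"

// Assumed 4000 px wide photo: framing an 8.5 in page vs a 100 in wide wall.
const closeUp = effectiveDpi(4000, 8.5);
const acrossRoom = effectiveDpi(4000, 100); // 40 DPI
console.log(emHeightPx(10, closeUp).toFixed(1)); // prints "65.4"
console.log(emHeightPx(10, acrossRoom).toFixed(1)); // prints "5.6"
```

By this arithmetic, the across-the-room shot leaves each character in the 5-10 pixel range described above, while framing the page alone gives roughly ten times as many pixels per character.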

Forgetting that recognition is CPU-bound. Recognition can take 5-30 seconds on a moderate-density page, and the progress bar advances during the longer phases. Keep the tab open until it finishes: closing the page or navigating away stops recognition.
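If you script Tesseract.js yourself, its `logger` option receives progress messages that include `status` and `progress` (0 to 1) fields. Where that option goes depends on the library version. Below is a minimal sketch of a handler that only re-renders when the displayed status or percentage changes. The `ProgressMessage` type is a simplified assumption, not the library’s own type.

```typescript
// Simplified, assumed shape of a Tesseract.js logger message; real messages
// carry extra fields (e.g. job and worker ids).
interface ProgressMessage {
  status: string;
  progress: number; // 0 to 1
}

// e.g. { status: "recognizing text", progress: 0.42 } -> "recognizing text: 42%"
function formatProgress(m: ProgressMessage): string {
  const clamped = Math.min(Math.max(m.progress, 0), 1);
  return `${m.status}: ${Math.round(clamped * 100)}%`;
}

// Wrap a render callback so it only fires when the visible text changes,
// which keeps a fast stream of progress messages from flooding the UI.
function makeProgressHandler(render: (line: string) => void): (m: ProgressMessage) => void {
  let last = "";
  return (m: ProgressMessage) => {
    const line = formatProgress(m);
    if (line !== last) {
      last = line;
      render(line);
    }
  };
}

const shown: string[] = [];
const onProgress = makeProgressHandler((line) => shown.push(line));
onProgress({ status: "recognizing text", progress: 0.501 });
onProgress({ status: "recognizing text", progress: 0.502 });
onProgress({ status: "recognizing text", progress: 0.6 });
console.log(shown.join(" | ")); // prints "recognizing text: 50% | recognizing text: 60%"
```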

What this tool does not do

It doesn’t upload anything. All recognition runs in your browser via Tesseract.js and WebAssembly. Apart from the one-time download of the engine and language data, there is no telemetry and there are no server calls.

It doesn’t handle PDFs directly. Render the PDF to images first with the PDF to images converter, then OCR each page.
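Once the pages are rendered to images, the per-page loop is straightforward. This sketch rests on stated assumptions. `recognise` is a hypothetical function that would, in practice, wrap a Tesseract.js worker. Processing pages one at a time is a design choice for the sketch: it keeps CPU and memory use predictable for a CPU-bound job.

```typescript
// Hypothetical recogniser: one page image in, its text out. In practice this
// would wrap a Tesseract.js worker; here it is just a function type.
type Recognise<Img> = (page: Img) => Promise<string>;

// OCR rendered PDF pages one after another and join the results with page
// markers so each block of text can be traced back to its source page.
async function ocrPages<Img>(pages: Img[], recognise: Recognise<Img>): Promise<string> {
  const parts: string[] = [];
  for (let i = 0; i < pages.length; i++) {
    // Sequential on purpose: each call is CPU-heavy, so one page at a time.
    const text = await recognise(pages[i]);
    parts.push(`--- page ${i + 1} ---\n${text.trim()}`);
  }
  return parts.join("\n\n");
}
```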

It doesn’t preserve complex layout (tables, columns). Output is plain text in best-effort reading order.

It doesn’t recognise handwriting reliably. Use a dedicated handwriting recogniser (iOS Live Text, Google Lens) for handwriting.

It doesn’t translate. Output is the text in its source language; translation is a separate step.

Frequently asked questions

Why does the first OCR take so long?

Two one-time downloads happen on the first run: the Tesseract.js engine itself (~2-3 MB of WASM and JS) and the language-specific training data (~10 MB for English, similar for other languages). After the first run, both are cached — Tesseract.js stores the language data in IndexedDB, so switching back to a previously-used language is instant. Subsequent recognitions are CPU-bound (typically 2-15 seconds depending on image size and density).

How accurate is browser-based OCR?

Comparable to desktop Tesseract — same core engine, just compiled to WASM. For clean printed text on a high-resolution scan, expect 95%+ character accuracy. Photographs of printed text (signs, menus, book pages held up to a phone camera) typically land at 85-95%. Handwriting, low-contrast text, mixed scripts, or text at extreme angles is much harder — Tesseract was trained on machine-printed text and handwriting accuracy varies wildly. The confidence score in the result tells you which bucket your input fell into.
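The buckets can be made explicit. The thresholds below are assumptions read loosely off the examples on this page: 90+ for clean screenshots and roughly 70-85 for phone photos. `WordResult` is a simplified, hypothetical shape. Listing the low-confidence words is one way to decide what to proofread.

```typescript
type ConfidenceBucket = "high" | "medium" | "low";

// Assumed thresholds, loosely based on the examples on this page.
function confidenceBucket(confidence: number): ConfidenceBucket {
  if (confidence >= 90) return "high";
  if (confidence >= 70) return "medium";
  return "low";
}

// Simplified, hypothetical per-word result.
interface WordResult {
  text: string;
  confidence: number; // 0-100
}

// Words below the threshold are the ones worth checking by eye.
function wordsToProofread(words: WordResult[], threshold = 70): string[] {
  return words.filter((w) => w.confidence < threshold).map((w) => w.text);
}

const menu: WordResult[] = [
  { text: "Paella", confidence: 94 },
  { text: "rnixta", confidence: 48 },
  { text: "Tortilla", confidence: 81 },
];
console.log(confidenceBucket(81)); // prints "medium"
console.log(wordsToProofread(menu).join(", ")); // prints "rnixta"
```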

Why is my OCR result garbled or full of repeated characters?

Almost always: language mismatch. Tesseract strongly favours character shapes from its trained language. Running French text through the English model produces letter substitutions on accented characters; running Cyrillic through English produces near-random Latin output. Switch to the right language and re-run. If the input mixes multiple languages (English plus Chinese, say), download both and Tesseract will use them together, but accuracy drops noticeably compared with single-language input.
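For mixed-language input, Tesseract combines models by joining language codes with a “+”, as in “eng+chi_sim”. The sketch below maps hypothetical UI labels to Tesseract’s language codes. The codes themselves are Tesseract’s own; the labels and the four languages chosen are illustrative.

```typescript
// Hypothetical UI labels mapped to Tesseract language codes. The codes are
// Tesseract's; which languages appear here is purely illustrative.
const TESSERACT_CODES: Record<string, string> = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  "Chinese (Simplified)": "chi_sim",
};

// Build a combined language string such as "eng+chi_sim", skipping duplicates
// and failing loudly on a label with no known code.
function languageSpec(labels: string[]): string {
  const codes = labels.map((label) => {
    const code: string | undefined = TESSERACT_CODES[label];
    if (code === undefined) throw new Error(`No Tesseract code for "${label}"`);
    return code;
  });
  return codes.filter((c, i) => codes.indexOf(c) === i).join("+");
}

console.log(languageSpec(["English", "Chinese (Simplified)"])); // prints "eng+chi_sim"
```

Each code in the combined string corresponds to a separate training-data download, which is why mixed-language input costs more on the first run.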

What kinds of images work best?

Clean printed text, dark on a light background, at roughly 300 DPI or better for scans (a screenshot of normal-size screen text is usually fine; a phone photo of a page from across the room is not). Text should be aligned roughly horizontally: Tesseract has limited tolerance for skew and almost none for rotation by 90 or 180 degrees. PDFs aren't supported directly here, so render them to PNG/JPG first (the PDF-to-image tool handles that). Tables and multi-column layouts produce reading-order issues; for structured extraction, a layout-aware pipeline like AWS Textract or Google Document AI does better.
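For low-contrast or light-on-dark images, a grayscale-and-threshold pass before OCR can help. Tesseract already binarises internally, so this sketch matters mostly for inverted input; the Tesseract documentation recommends converting that to dark text on a light background. The sketch makes three simplifying assumptions. It expects an RGBA buffer like the one a canvas `getImageData` call returns. It uses a fixed threshold rather than an adaptive one. It ignores alpha.

```typescript
// Assumed RGBA buffer, 4 bytes per pixel (R, G, B, A), row-major, as from
// a canvas getImageData call.
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Grayscale via Rec. 601 luma weights, then a fixed threshold (alpha is
// ignored). If most pixels come out dark (light text on a dark background),
// the result is inverted so the output is dark text on white.
function binarize(img: RgbaImage, threshold = 128): Uint8Array {
  const n = img.width * img.height;
  const out = new Uint8Array(n);
  let dark = 0;
  for (let i = 0; i < n; i++) {
    const r = img.data[i * 4];
    const g = img.data[i * 4 + 1];
    const b = img.data[i * 4 + 2];
    const luma = 0.299 * r + 0.587 * g + 0.114 * b;
    const isDark = luma < threshold;
    out[i] = isDark ? 0 : 255;
    if (isDark) dark++;
  }
  if (dark > n / 2) {
    for (let i = 0; i < n; i++) out[i] = 255 - out[i];
  }
  return out;
}
```

Inverting on a simple majority vote is a heuristic. A page with a large dark photo and dark text could be flipped by mistake, so a real pipeline might decide per region instead.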

Is this private?

Yes. The image and the recognition stay in your browser: nothing is uploaded to any server, and there is no telemetry and no logging. You can verify this in your browser's devtools Network panel. After the initial JS, WASM, and language-data download (which completes before recognition starts), there is zero network activity during recognition. Sensitive documents (medical, legal, financial) can be OCR'd safely without leaking to a third party.