Free · Fast · Privacy-first

Convert a Scanned PDF to Word

Scanned PDFs are image files at heart: the text on each page is rendered as pixels rather than as machine-readable characters, which is why your cursor cannot select a sentence and your copy command returns nothing.

⚡

OCR extracts text from scanned pages

🔒

Multi-page scanned PDFs supported

✨

Editable Word output

⚡

Free, no Adobe required

Cost: Free forever
Sign-up: Not required
Processing: In your browser
Privacy: Files stay local

Free · No signup · White-label

Add the PDF to Word tool to your website

Drop the PDF to Word tool into any page (blog post, product docs, intranet, school portal) with a single iframe snippet. Your visitors get the full tool, processed entirely in their browser. No backend, no uploads, no signup.

Files stay 100% in the visitor's browser
Responsive — adapts to any container width
Free forever, no API key needed

Embed code

<iframe
  src="https://www.fixtools.io/pdf/pdf-to-word?embed=1"
  width="100%"
  height="780"
  frameborder="0"
  style="border:0;border-radius:16px;max-width:900px;"
  title="PDF to Word by FixTools"
  loading="lazy"
  allow="clipboard-write"
></iframe>

Attribution-friendly: a small "Powered by FixTools" link appears in the embed footer.

How OCR reads scanned pages and what determines its accuracy

When a document is scanned, the scanner captures a raster image of the page, typically stored as a JPEG or TIFF stream embedded inside the PDF container. There is no machine-readable text in the file, only pixels arranged in patterns that human eyes interpret as letters. Optical Character Recognition works by analysing those pixels, identifying shapes that match known glyph templates, and converting them to Unicode text plus positional metadata. The process involves several distinct stages: deskewing the image to correct for the slight rotation introduced by feeding a page through a scanner, binarising the image to high-contrast black and white so character edges become unambiguous, segmenting the page into text regions and non-text regions like photographs or logos, and then applying a character classifier to each text segment. Modern OCR engines like Tesseract running in WebAssembly can process a single A4 page in one to three seconds on a typical laptop. For a 20-page scanned document, expect 20 to 60 seconds of processing time entirely inside your browser tab.
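
To make the binarisation stage concrete, here is a minimal Python sketch that picks a threshold with Otsu's method and maps greyscale pixels to pure black or white. Otsu's method is used here only as a familiar textbook choice; this page does not say which thresholding algorithm FixTools or its OCR engine applies, and the function names `otsu_threshold` and `binarise` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    sum_background = 0
    weight_background = 0
    best_threshold = 0
    best_variance = -1.0

    for level in range(256):
        weight_background += histogram[level]
        if weight_background == 0:
            continue  # no pixels at or below this level yet
        weight_foreground = total - weight_background
        if weight_foreground == 0:
            break  # every pixel is already in the background class
        sum_background += level * histogram[level]
        mean_background = sum_background / weight_background
        mean_foreground = (sum_all - sum_background) / weight_foreground
        # Between-class variance: large when the two groups are well separated.
        variance = (weight_background * weight_foreground
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_variance = variance
            best_threshold = level
    return best_threshold


def binarise(pixels, threshold):
    """Map each pixel to 0 (ink) or 255 (paper) around the threshold."""
    return [0 if value <= threshold else 255 for value in pixels]


# Illustrative data: three dark "ink" pixels and five light "paper" pixels.
sample = [30, 40, 35, 220, 230, 225, 210, 240]
threshold = otsu_threshold(sample)
print(threshold, binarise(sample, threshold))
```

A real page would pass in millions of pixel values, but the idea is the same: once ink and paper are forced to opposite extremes, character edges are easier for the later segmentation and classification stages to trace.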

Scan resolution is the single most important factor in OCR accuracy, far more important than the OCR engine or the language pack chosen. A scan at 150 DPI (dots per inch) produces roughly 1240 by 1754 pixels for an A4 page, which gives the OCR classifier very limited character detail, so it routinely confuses similar shapes such as e and c or a and o. At 300 DPI that becomes 2480 by 3508 pixels, which is the widely recommended minimum for reliable character recognition on body-size fonts. At 600 DPI, accuracy improves further for small footnote fonts and fine details such as accented characters in European languages. Most office multifunction devices default to 200 or 300 DPI. If you are rescanning a document specifically to run through OCR, set your scanner to 300 DPI for standard ten to twelve point text and 400 DPI for documents with eight-point or smaller fonts. For the source scan, TIFF usually preserves more detail than JPEG because TIFF files are normally saved uncompressed or with lossless compression, whereas JPEG compression introduces artefacts around character edges.
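
The pixel counts above follow directly from the A4 page size (210 by 297 mm) and the scan resolution. The short Python sketch below reproduces them and also estimates how many pixels a 10 pt font gets at each setting. The function names are hypothetical, and the font figure is the nominal point size, so actual letter shapes are somewhat smaller.

```python
A4_MM = (210, 297)       # ISO 216 A4 page size in millimetres
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72     # one typographic point is 1/72 inch


def page_pixels(dpi, page_mm=A4_MM):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def font_height_pixels(point_size, dpi):
    """Return the nominal height of a font, in pixels, at the given DPI."""
    return point_size / POINTS_PER_INCH * dpi


for dpi in (150, 300, 600):
    width, height = page_pixels(dpi)
    tall = font_height_pixels(10, dpi)
    print(f"{dpi} DPI: {width} x {height} px, 10 pt text is about {tall:.0f} px tall")
```

At 150 DPI a 10 pt font has only about 21 pixels of nominal height to work with, which is why similar letter shapes start to blur together; at 300 DPI that doubles to about 42.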

For printed, typed documents scanned at 300 DPI or above with good contrast, expect OCR to recognise 98 to 99 percent of words correctly for standard Latin text in common business fonts. On a 500-word page that works out to roughly five to ten recognition errors, almost all of which are obvious misreadings you can fix in seconds. Common error patterns include 0 misread as O, 1 misread as l or I, the rn pair misread as m, and the cl pair misread as d. Numbers in tables are slightly more error-prone than continuous prose because the classifier has fewer surrounding-context cues to break ties. Handwritten text is fundamentally different: handwriting recognition requires dedicated neural models trained on handwritten samples, and standard OCR engines produce poor results on handwritten pages regardless of resolution. If your scanned document contains handwriting, plan to correct the text manually after conversion or treat the OCR output as a starting outline.

Beyond accuracy, the structure of the resulting Word document depends on how cleanly the OCR engine can identify columns, paragraphs, and tables on the page. Single-column reports with consistent margins convert with clean paragraph breaks and matched indentation. Two-column academic papers usually need a small amount of reordering because the engine occasionally reads across both columns on a single line. Tables built with visible borders are detected fairly well, while tables that rely on whitespace alignment without rules sometimes collapse into a flat sequence of paragraphs that you have to rebuild with Insert Table in Word. Knowing the layout of your source helps you predict the cleanup time, and a quick pre-scan to crop margins and check page orientation pays off in measurably better output.

Tesseract OCR open source engine · ISO 12653 scanner resolution standards

How to use this tool

💡

Upload your scanned PDF. FixTools runs OCR on the image pages and produces an editable Word document with the recognised text. Best results come from clear, high-resolution scans.

How It Works

Step-by-step guide to converting a scanned PDF to Word:

1
Upload your scanned PDF
Open the PDF to Word tool and drag your scanned PDF onto the upload area, or click to browse. Scanned files are often large because each page carries an embedded raster image, so allow a few seconds for the file to load fully into browser memory before the convert button becomes active. A 50-page colour scan can be 30 MB or more, which is normal and not a sign of trouble.
2
OCR processing
FixTools automatically detects that the pages are image-based and routes them through the embedded OCR engine, which runs as WebAssembly inside your browser. The engine deskews, binarises, segments, and classifies each page in turn, reporting progress as it goes. No image data is uploaded to any server; every recognition step happens on your own machine.
3
Review the output
The converted Word document contains the recognised text along with detected paragraph breaks and table structures. Review it for any OCR errors using Word's spell check as a quick first pass, since most genuine OCR misreads also produce spelling flags. Accuracy depends almost entirely on the scan quality of the original source PDF.
4
Download and edit
Click Download to save the .docx file to your downloads folder. Open it in your preferred word processor, run a final read-through to catch the small handful of typical OCR misreads, then edit, reformat, or send it on as you would any other Word document. The whole round trip for a clean 20-page scan typically takes about three minutes.

Real-world examples

Common situations where this approach makes a real difference:

Law firm digitising paper case files

A small law firm scans 200-page paper case files at 300 DPI to create archival PDFs for long-term storage and easier remote access. Converting these PDFs to Word lets paralegals search across the full text of every case, copy clauses and quotes into new filings, and reference exhibits without trips to the filing cabinet. OCR accuracy on standard legal typewritten text at 300 DPI consistently exceeds 98 percent, keeping manual correction time under ten minutes per file even on dense pleadings, and producing files clean enough for full-text indexing in the firm's document management system.

Academic researcher transcribing archival documents

A history researcher has 1970s typewritten interview transcripts, scanned at 300 DPI from a university archive. Converting the scanned PDFs to Word provides a working draft that captures roughly 97 percent of the text accurately, including the slightly faded carbon-copy pages. The researcher then reviews the output against the original scan side by side, correcting the remaining errors in under twenty minutes per 30-page document, and ends up with searchable transcripts that can be quoted, coded with qualitative analysis software, or shared with collaborators.

Business owner recovering records from old paper files

A small business owner needs to digitise five years of paper invoices that were scanned to PDF at 200 DPI by a previous bookkeeper. FixTools OCR extracts the vendor names, invoice amounts, dates, and reference numbers into a Word document with each invoice on its own page. The owner then copies the structured data into a spreadsheet for expense tracking, VAT reclaim, and historical analysis, avoiding complete manual reentry of several hundred line items and reducing what would have been a week of typing to an afternoon of review.

Student converting a scanned textbook chapter

A student scanned a chapter from a borrowed library textbook on a flatbed scanner at 300 DPI to support a research essay. Converting the scanned PDF to Word gives them searchable, editable text they can paste into their research notes, with the chapter's subheadings preserved as separate paragraph blocks. Footnotes and running headers on each page are captured as small text fragments that the student cleans up in under five minutes, leaving a tidy reference document with every quotable passage at their fingertips.

Pro tips

Get better results with these expert suggestions:

Scan at 300 DPI minimum for reliable OCR

If you are scanning a paper document specifically to convert it to Word later, use 300 DPI as your minimum scanner setting and do not let your scanner default to a lower draft mode. Documents scanned at 150 DPI produce noticeably more OCR errors on standard-sized text because the classifier has too few pixels per character to disambiguate similar shapes. Set your scanner to grayscale rather than colour for pure OCR scans, which dramatically reduces file size and improves the binary contrast that the OCR engine actually uses internally.

Use Word Find and Replace to catch common OCR errors

After conversion, run Find and Replace searches for the most common OCR substitutions: a numeric zero replacing the letter O in proper nouns, a numeric one replacing a lower-case l inside words, the pair r and n appearing as the single letter m, and the pair c and l appearing as a single d. A ten-minute pass through these common patterns cleans up most of the residual mistakes in a typical 20-page scanned document and leaves the file ready for serious editing or sharing without further proofreading.
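
Word's Find and Replace handles these substitutions one pattern at a time. As a complement, the Python sketch below flags tokens that mix letters with the digits 0 or 1, which is where the first two patterns show up. It assumes the recognised text has already been saved as plain text, and the name `flag_suspect_tokens` is hypothetical. The rn/m and cl/d confusions produce ordinary-looking letters, so they still need spell check or a read-through.

```python
import re

# A token made only of letters and the digits 0/1 that contains
# at least one letter AND at least one 0 or 1, e.g. "J0hn" or "c1ient".
SUSPECT_TOKEN = re.compile(
    r"\b(?=[A-Za-z01]*[A-Za-z])(?=[A-Za-z01]*[01])[A-Za-z01]+\b"
)


def flag_suspect_tokens(text):
    """Return tokens that look like words containing a stray 0 or 1."""
    return SUSPECT_TOKEN.findall(text)


sample = "The invoice from J0hn's c1ient totals 1001 units in 2010."
print(flag_suspect_tokens(sample))
```

Pure numbers such as 1001 and 2010 are left alone, so the list stays short enough to review by hand. Legitimate codes such as part numbers may also be flagged, which is why the sketch only reports candidates instead of rewriting them.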

Check the scanned PDF is not already text-based

Before converting a scanned PDF through OCR, try selecting a line of text in your PDF viewer first. If a blue selection box wraps around the words and you can copy them to the clipboard normally, the PDF already contains an embedded text layer, perhaps added by the scanner's built-in OCR feature or by the document's original author, and you do not need to run OCR again. Converting it as text rather than as an image produces a cleaner result, completes faster, and avoids the small accuracy loss every OCR pass introduces.

Crop blank margins before scanning to reduce file size

Large white margins around scanned content increase file size without adding any useful data for the OCR engine to work with. If your scanner software supports it, enable automatic margin cropping or set a custom scan area that hugs the printed region. A 10 MB scanned PDF carrying generous letterhead margins can shrink to about 4 MB after cropping, which speeds up the OCR conversion noticeably and reduces browser memory pressure. Tighter scans also make page-by-page review easier when you compare the output against the source.

FAQ

Frequently asked questions

Yes, this is exactly what the tool is designed to do for image-based PDFs. FixTools applies OCR (Optical Character Recognition) to image-only scanned PDFs to extract the underlying text, then outputs it as an editable .docx file you can open in Microsoft Word, Google Docs, or any compatible word processor. The process is fully automatic: upload the scanned PDF, and the tool detects that the pages are images rather than text and runs the embedded OCR engine without you needing to change any settings or pick a language pack. The result is the same kind of Word output you would get from a text-based PDF; the only difference is the slight residual error rate inherent to OCR.

OCR accuracy for printed text scanned at 300 DPI typically exceeds 98 percent when the document uses standard business fonts and has good black-on-white contrast. Read as a per-word rate, that means a typical 500-word page may have five to ten recognition errors, almost all of them simple substitutions that you can fix in seconds with spell check. Accuracy drops significantly with low-resolution scans below 200 DPI, with pages that were fed through the scanner at a slight angle, with documents printed on tinted or yellowed paper, and with unusual display fonts. Handwritten text achieves much lower accuracy regardless of scan quality and should be treated as a starting outline rather than a finished transcription.
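
The arithmetic behind that estimate can be sketched in a few lines of Python, reading the accuracy figure as a per-word rate (an assumption; a per-character rate would give a different count). The function name `expected_word_errors` is hypothetical.

```python
def expected_word_errors(word_count, word_accuracy):
    """Estimate misrecognised words, assuming accuracy is measured per word."""
    return round(word_count * (1 - word_accuracy))


for accuracy in (0.98, 0.99):
    errors = expected_word_errors(500, accuracy)
    print(f"{accuracy:.0%} accuracy on 500 words: about {errors} errors")
```

The same function gives a quick feel for cleanup effort on longer documents: multiply the per-page estimate by the page count before deciding whether a rescan at higher resolution is worth the time.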

Scans at 300 DPI with strong black-on-white contrast and minimal page skew give the best OCR results, and small adjustments to your scanning workflow pay off measurably in cleanup time. Avoid scanning at an angle: even a five-degree tilt reduces accuracy because the classifier expects horizontal baselines. If rescanning specifically for OCR, choose grayscale mode at 300 DPI rather than full colour, which keeps file sizes manageable without reducing text recognition quality. For documents with very small fonts such as footnotes or fine-print contracts, bump the resolution to 400 or 600 DPI.

OCR conversion focuses on extracting text content correctly rather than replicating the exact visual layout of the source page. The resulting .docx will contain the text in reading order, with paragraph breaks where the engine detects them, but multi-column layouts, precise text positioning, decorative drop caps, and ornamental elements from the original scan will not be preserved in any sophisticated way. Think of the Word output as a clean, editable text extraction of the document's meaning rather than a pixel-perfect visual replica. If you need a visual replica, keep the original PDF and use the Word file only for the text content.

Standard OCR engines, including the one embedded in FixTools, are designed for printed or typed text, not for handwriting. Handwriting recognition is a separate research area that requires specialised neural network models trained on labelled handwritten samples, and even purpose-built handwriting recognition tools achieve substantially lower accuracy than printed-text OCR. For handwritten documents, the OCR output will typically need extensive manual correction or full retyping. If the handwriting is very neat, large, and consistent, results are slightly better, but you should not trust them for legal, medical, or financial records without a careful human review against the source.

Scanned PDFs contain embedded raster images, which are inherently large compared to text-based PDFs of the same page count. A single A4 page scanned at 300 DPI in colour produces roughly 25 MB of uncompressed pixel data. The PDF format compresses this internally, usually with JPEG compression for colour pages and CCITT fax compression for black-and-white, but a 10-page colour scan can still weigh in at 5 to 15 MB. Use the FixTools PDF Compressor to reduce the file size before converting, which also speeds up the OCR process because the engine reads smaller image streams into memory faster and can complete each page sooner.
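
The uncompressed figure is simple arithmetic: pixel count times bytes per pixel. The Python sketch below works it out for an A4 page at 300 DPI (2480 by 3508 pixels) in three common scan modes. The function name is hypothetical, and the bit depths are the usual ones for each mode, not values taken from FixTools.

```python
def raw_image_bytes(width_px, height_px, bits_per_pixel):
    """Return the uncompressed size of a raster image in bytes."""
    return width_px * height_px * bits_per_pixel // 8


A4_AT_300_DPI = (2480, 3508)  # pixel dimensions of an A4 page at 300 DPI

for label, bits in (("24-bit colour", 24),
                    ("8-bit grayscale", 8),
                    ("1-bit black and white", 1)):
    size = raw_image_bytes(*A4_AT_300_DPI, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

Colour comes out at about 26 million bytes, in line with the rough 25 MB figure, while grayscale is a third of that. That gap is part of why grayscale scans are easier on browser memory.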

The OCR process attempts to group detected text lines into paragraphs based on spacing and indentation analysis, looking for blank gaps between groups of lines, hanging or first-line indents, and consistent left margins. For well-structured printed documents with standard typography, paragraph grouping is usually correct and the resulting Word file reads naturally. Documents with unusual spacing, dense tables, mixed two and three column layouts, or text wrapped around inline images may produce incorrectly grouped paragraphs that need manual reorganisation in the Word output. A quick review pass in Word's draft view usually identifies any such issues immediately.

If the original paper document is available, the simplest and most effective improvement is to rescan it at 300 DPI or higher with the scanner glass clean and the page held flat. If you only have the existing low-quality scan, try opening it in an image editor such as Photoshop, GIMP, or the free Photopea, increasing contrast and brightness so blacks are pure black and whites are pure white, then converting the cleaned image to a fresh PDF before running OCR. This pre-processing pass can recover several percentage points of accuracy on borderline scans, especially those suffering from low contrast or background colour bleed-through from the reverse side of the page.

The default OCR pipeline targets Latin-alphabet languages including English, French, Spanish, German, Italian, Portuguese, Dutch, and the Scandinavian languages, and it will correctly transcribe accented characters such as é, ü, ñ, å, and ç when the source scan resolution is 300 DPI or higher. Cyrillic, Greek, and Asian scripts may need a different OCR build, and even then accuracy depends heavily on font choice and stroke clarity. For mixed-language documents, the engine handles inline foreign-language phrases reasonably well as long as the alphabet is Latin-based, so a German quotation inside an English paper will normally transcribe accurately.