File Juicer

File Juicer for macOS

Rebuild a Word Document From PDF

File Juicer is first and foremost an image and text extractor, but on macOS 10.4, 10.5 or 10.6 you can also use it to convert simple PDF files to Word, RTF or plain text, as long as they are not scanned or encrypted.

When a PDF is generated from a Word document, information about the document's structure is not saved in the PDF, and File Juicer does not try to recreate it. You can, however, extract the text from the PDF as RTF (Rich Text Format). This may be good enough if you don't need to preserve a multi-column or table layout.

Professional Tools

For advanced software that converts PDF to Office formats and recreates the layout, take a look at Adobe Acrobat.

Converting Scanned Documents to Text

File Juicer does not convert scanned images to text (Optical Character Recognition, or OCR). Three classic applications do this: Adobe Acrobat, OmniPage and Readiris Pro. You can also buy a scanner that comes with bundled OCR software (read the product description closely). VueScan also does OCR; it is not as advanced, but it may be enough for your needs.

You can also search for OCR in Apple's Mac App Store. Several OCR apps are available, and they vary in how advanced and how accurate they are. Accuracy is one of the things you pay a premium for.

PDF To Word via RTF or ASCII Text


Demonstration Video

If you wish to try this yourself, you can download this PDF and watch this one-minute screen recording.

RTF was developed by Microsoft to carry formatted text between applications. Word, TextEdit and other applications can open RTF files and retain the fonts, font sizes and colors, but not the layout.

File Juicer uses the same PDF-to-RTF engine as Apple's Preview, so you can get the same result in Preview by copying and pasting the text out of each page of the PDF. File Juicer also extracts the images from the PDF and places them in a separate folder. Once you have recreated the layout, you place them in the Word document manually.

These are the File Juicer preferences I recommend for extracting the text and images needed to rebuild a Word file.

Preferences for Conversion from PDF to Word

AutoFormat


I recommend extracting both ASCII and RTF, as it is sometimes easier to rebuild the document from plain text without the formatting. Word lets you use abstract names for formatting, such as "Heading 1" or "Normal". In the RTF file, these names are replaced by the actual font names and sizes used, such as "Arial 16" and "Times 12".
Word's AutoFormat... function is designed to turn this font information back into document structure.
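To make the idea of turning font information back into structure concrete, here is a toy Python sketch that maps the two examples above ("Arial 16" and "Times 12") back to style names. It is not how Word's AutoFormat works internally; the `STYLE_MAP` table, the `restore_styles` function and the fallback to "Normal" are hypothetical, chosen only for illustration.

```python
# Hypothetical table: (font name, size in points) as found in the RTF -> Word style name.
# The two entries are the examples from the text; a real document needs its own table.
STYLE_MAP = {
    ("Arial", 16): "Heading 1",
    ("Times", 12): "Normal",
}


def restore_styles(paragraphs):
    """Assign a style name to each (text, font, size) paragraph.

    Paragraphs whose font and size are not in STYLE_MAP fall back to "Normal".
    """
    return [
        (STYLE_MAP.get((font, size), "Normal"), text)
        for text, font, size in paragraphs
    ]


if __name__ == "__main__":
    sample = [
        ("Introduction", "Arial", 16),
        ("File Juicer extracts text and images.", "Times", 12),
    ]
    for style, text in restore_styles(sample):
        print(f"{style}: {text}")
```

A lookup table like this only works when each style was used consistently in the original document, which is also why rebuilding from the RTF can take some manual cleanup.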

Extract Images from Files

File Juicer is a general-purpose extraction tool that searches inside any file for images in any standard format. It was originally made to extract images from PowerPoint files, but it has since been extended to recognize many other file formats.

Images are extracted from PDF files without re-compression, so they keep all the quality they had when they were originally saved in the PDF.
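As a rough illustration of what searching inside a file for images without re-compression can mean, here is a minimal Python sketch. It is not File Juicer's actual method. It assumes the images are JPEG data stored as-is in the file (in a PDF, a DCTDecode stream with no additional filter), and it naively treats everything from the JPEG start marker `FF D8 FF` to the next end marker `FF D9` as one image. The function names `find_jpeg_streams` and `extract_jpegs` are hypothetical.

```python
from pathlib import Path

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def find_jpeg_streams(data: bytes) -> list[bytes]:
    """Return byte slices of data that look like complete JPEG streams.

    Naive heuristic: stops at the first end marker after each start marker,
    so a JPEG containing an embedded thumbnail would be cut short.
    """
    streams = []
    start = data.find(SOI)
    while start != -1:
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break  # unterminated stream: ignore it
        streams.append(data[start:end + len(EOI)])
        start = data.find(SOI, end + len(EOI))
    return streams


def extract_jpegs(src: Path, out_dir: Path) -> int:
    """Write every JPEG stream found in src to out_dir, byte for byte."""
    out_dir.mkdir(parents=True, exist_ok=True)
    streams = find_jpeg_streams(src.read_bytes())
    for i, stream in enumerate(streams, start=1):
        # The bytes are copied unchanged: no decoding, no re-encoding.
        (out_dir / f"image{i:03d}.jpg").write_bytes(stream)
    return len(streams)
```

Because each slice is written out exactly as it appears in the source file, the saved image is identical to the data that was stored in the PDF. A more robust tool would parse the JPEG segment lengths instead of stopping at the first end marker, and would also handle images stored in other formats or behind other PDF filters.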

Rebuilding Word documents from other files

You can download File Juicer from the File Juicer page and try it for free for this function alone, but you may also want to explore its other functions by browsing the User Guide and the File Format tips.