File Juicer

File Juicer for Mac OS X

Rebuild a Word Document From PDF

File Juicer is first and foremost an image and text extractor, but on Mac OS X 10.4, 10.5 or 10.6 you can also use it to convert simple PDF files to Word, RTF or plain text, as long as they are not scanned or encrypted.

When a PDF is generated from a Word document, information about the document's structure is not saved in the PDF file, and File Juicer does not try to recreate it. You can extract the text from PDF documents as RTF (Rich Text Format), which may be good enough if you don't need to preserve multicolumn or table layouts.

Professional Tools

For advanced software that converts PDF to Office formats and recreates the layout, take a look at Adobe Acrobat X Professional [Mac] or PDF2Office.

Converting Scanned Documents to Text

File Juicer does not convert scanned images to text (optical character recognition, or OCR). Three classic applications do this: Adobe Acrobat X Professional [Mac], OmniPage Pro X for Macintosh and Readiris Pro 12 for Mac. You can also buy a scanner (Canon from Amazon) that comes with bundled OCR software (read the product description closely). VueScan also does OCR; it is not as advanced, but it may be enough to cover your needs.

You can also search for OCR in Apple's Mac App Store. Several OCR apps are available, and they vary in how advanced and how accurate they are. Accuracy is one of the things you pay a premium for. ABBYY FineReader Express has received good reviews as of this writing.

Using a Scanning Service

If you are not ready to learn the art of OCR, you can hire an OCR service to do it for you. Prices can be affordable, particularly if the service has offices in India. One example is New York Document Scanning; a Google search will turn up others.

PDF To Word via RTF or ASCII Text


Demonstration Video

If you wish to try this yourself, you can download this PDF and watch this 1-minute screen recording.

RTF was developed by Microsoft to carry formatted text between applications. Word, TextEdit and other applications can open RTF files and retain the fonts, font sizes and colors, but not the layout.
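To make this concrete, here is a minimal Python sketch that writes a tiny RTF document. It is only an illustration, not how File Juicer or Preview generate RTF: the `to_rtf` and `escape` functions are hypothetical, and the sketch assumes plain ASCII text. Notice that the output records font names, font sizes (RTF's `\fs` control word counts half-points, so 16 pt becomes `\fs32`) and colors, but nothing about columns, tables or page positions.

```python
def escape(text):
    # Backslash and braces are RTF control characters and must be escaped.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def to_rtf(runs):
    """Build a minimal RTF string from (text, font_name, size_pt, (r, g, b)) runs.

    Hypothetical sketch: ASCII text only, one paragraph per run.
    """
    fonts, colors = [], []
    for _, font, _, color in runs:
        if font not in fonts:
            fonts.append(font)
        if color not in colors:
            colors.append(color)

    # Font table: {\f0 Arial;}{\f1 Times;} ...
    font_table = "".join(f"{{\\f{i} {name};}}" for i, name in enumerate(fonts))
    # Color table starts with an empty "auto" entry, so real colors are 1-based.
    color_table = ";" + "".join(f"\\red{r}\\green{g}\\blue{b};" for r, g, b in colors)

    # Each run selects its font (\f), size in half-points (\fs) and color (\cf).
    body = "".join(
        f"\\f{fonts.index(font)}\\fs{round(size * 2)}\\cf{colors.index(color) + 1} "
        f"{escape(text)}\\par\n"
        for text, font, size, color in runs
    )
    return f"{{\\rtf1\\ansi\\deff0{{\\fonttbl{font_table}}}{{\\colortbl{color_table}}}\n{body}}}"


if __name__ == "__main__":
    print(to_rtf([("Heading", "Arial", 16, (0, 0, 0)),
                  ("Body text", "Times", 12, (0, 0, 0))]))
```

Saving the printed string to a `.rtf` file should open in TextEdit or Word with the fonts and sizes intact; any multicolumn or table layout from the original PDF simply has no place to live in this format.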

File Juicer uses the same PDF-to-RTF engine as Apple's Preview, so you can do the same extraction in Preview by copying and pasting the text out of each page of the PDF. File Juicer also extracts the images from the PDF and places them in a separate folder; you then place them manually in the Word document once you have recreated the layout.

These are the File Juicer preferences I recommend for extracting the text and images needed to rebuild a Word file.

Preferences for Conversion from PDF to Word

AutoFormat


I recommend extracting both ASCII and RTF, because it is sometimes easier to rebuild the document from plain text without the formatting. Word lets you use abstract style names for the formatting, such as "Heading 1" or "Normal". In the RTF file, these are replaced by the actual font names and sizes used, such as "Arial 16" and "Times 12".
Word's AutoFormat... function turns this font information back into document structure.
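The idea of turning font information back into structure can be illustrated with a simple heuristic: treat the most common font size as body text ("Normal") and map each larger size to "Heading 1", "Heading 2" and so on, largest first. This is a hypothetical Python sketch, not Word's actual AutoFormat algorithm; the `infer_styles` function and its (text, font name, size) input format are assumptions made for illustration.

```python
from collections import Counter


def infer_styles(runs):
    """Guess Word-style names from (text, font_name, size_pt) runs.

    Hypothetical heuristic, not Word's AutoFormat algorithm.
    """
    # Assume the most frequent font size is the body text ("Normal").
    body_size = Counter(size for _, _, size in runs).most_common(1)[0][0]
    # Every larger size is treated as a heading level, biggest first.
    heading_sizes = sorted({size for _, _, size in runs if size > body_size}, reverse=True)
    levels = {size: f"Heading {i + 1}" for i, size in enumerate(heading_sizes)}
    # Sizes that are not headings (body size or smaller) fall back to "Normal".
    return [(text, levels.get(size, "Normal")) for text, _, size in runs]


if __name__ == "__main__":
    runs = [
        ("Introduction", "Arial", 16),
        ("Some body text.", "Times", 12),
        ("Details", "Arial", 14),
        ("More text.", "Times", 12),
    ]
    for text, style in infer_styles(runs):
        print(f"{style:10} {text}")
```

With the sample runs, "Arial 16" becomes "Heading 1", "Arial 14" becomes "Heading 2" and "Times 12" becomes "Normal". Real documents need more signals (bold, indentation, spacing), which is why checking the result by hand is still worthwhile.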

Extract Images from Files

File Juicer is a general-purpose extraction tool designed to search inside any file for images in standard formats. It was originally made to extract images from PowerPoint files, but it has since been extended to recognize many other file formats.

Images are extracted from PDF files without recompression, so they keep all the quality that was originally saved in the PDF file.
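A naive way to search inside a file for images and save them without recompression is to look for the byte markers that start and end a JPEG (FF D8 FF ... FF D9) and copy everything in between verbatim. The Python sketch below is an illustration under stated assumptions, not File Juicer's actual method: the `carve_jpegs` function is hypothetical, it only recognizes JPEG, and it stops at the first end marker, so a JPEG with an embedded thumbnail would be cut short. It relies on the fact that a PDF typically stores a JPEG photo as a DCTDecode stream whose bytes form a complete JPEG file.

```python
def carve_jpegs(data: bytes) -> list[bytes]:
    """Return byte ranges in `data` that look like complete JPEG files.

    Naive, hypothetical sketch: first end-of-image marker wins.
    """
    SOI = b"\xff\xd8\xff"  # start-of-image marker followed by the next marker's 0xFF
    EOI = b"\xff\xd9"      # end-of-image marker
    images = []
    start = data.find(SOI)
    while start != -1:
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break  # truncated image: no end marker found
        # Copy the bytes verbatim; nothing is decoded or re-encoded.
        images.append(data[start:end + len(EOI)])
        start = data.find(SOI, end + len(EOI))
    return images


# Example (hypothetical file name):
#   with open("document.pdf", "rb") as f:
#       for i, jpg in enumerate(carve_jpegs(f.read())):
#           with open(f"image{i}.jpg", "wb") as out:
#               out.write(jpg)
```

Because the bytes are copied as-is, each saved file has exactly the quality that was stored in the PDF, which is the point made above about avoiding recompression.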

Rebuilding Word documents from other files

You can download File Juicer from the File Juicer page and try it for free for just this one function, or explore its other functions by browsing the User Guide and the File Format tips.