File Juicer

File Juicer for Mac OS X

HTML and Web Archives

Extracting Images, Flash and Movies from HTML

HTML itself does not contain other file formats, at least not in the literal sense. It does, however, contain links to images, Flash animations, and movies, which are displayed as part of the web page.
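This minimal Python sketch makes "contains links" concrete. It uses only the standard `html.parser` module to list the image, Flash, and movie URLs a page refers to. The tag/attribute table (`MEDIA_ATTRS`) and the names `MediaLinkCollector` and `media_links` are illustrative assumptions. They cover common embedding markup, not every way a page can load media; scripts and CSS backgrounds, for example, are ignored.

```python
from html.parser import HTMLParser

# Hypothetical table of tags/attributes that commonly reference media.
# Illustrative only, not a complete list of how HTML can embed files.
MEDIA_ATTRS = {
    "img": ("src",),
    "embed": ("src",),          # often used for Flash (.swf)
    "object": ("data",),        # older Flash/movie embedding
    "param": ("value",),        # <param name="movie" value="...swf">
    "video": ("src", "poster"),
    "audio": ("src",),
    "source": ("src",),
}


class MediaLinkCollector(HTMLParser):
    """Collects URLs of images, Flash and movies referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only <param name="movie"> / <param name="src"> point at a file.
        if tag == "param" and (attr_map.get("name") or "").lower() not in ("movie", "src"):
            return
        for name in MEDIA_ATTRS.get(tag, ()):
            value = attr_map.get(name)
            if value:
                self.links.append(value)


def media_links(html: str) -> list[str]:
    """Return the media URLs in document order, exactly as written in the HTML."""
    parser = MediaLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The URLs come back unresolved (relative paths stay relative), which is enough to see what a page depends on without fetching anything.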

These external files can be collected and saved together with the HTML file in a "web archive".
Firefox has a "save complete" option (listed as "Web Page, complete" in its Save dialog), which downloads all the referenced files and stores them in a folder next to the HTML file. This is probably the best choice for archiving web pages in an open format.
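A "save complete" archive comes down to two steps: fetch every referenced file, and rewrite the links in the HTML so they point at the local copies. The Python sketch below covers only the planning and rewriting step, under stated assumptions. The folder name `page_files`, the helper `plan_complete_save`, and the regex that only looks at `src` attributes are all hypothetical. This is not how Firefox itself implements the feature.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Matches src="..." or src='...'. A regex is a simplification; a real
# archiver would use an HTML parser and also handle href, CSS url(), etc.
SRC_ATTR = re.compile(r"""(\bsrc\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def plan_complete_save(html, page_url, folder="page_files"):
    """Return (rewritten_html, downloads) for a 'save complete' style archive.

    downloads maps each absolute resource URL to the local path it should be
    saved under; actually fetching the files is left out of this sketch.
    """
    downloads = {}
    used_names = set()

    def local_path_for(url):
        if url in downloads:
            return downloads[url]  # same resource referenced twice
        name = posixpath.basename(urlparse(url).path) or "index"
        candidate, n = name, 1
        while candidate in used_names:
            # Two different URLs with the same file name: add a counter.
            stem, ext = posixpath.splitext(name)
            candidate = f"{stem}_{n}{ext}"
            n += 1
        used_names.add(candidate)
        downloads[url] = f"{folder}/{candidate}"
        return downloads[url]

    def rewrite(match):
        prefix, quote, value = match.groups()
        absolute = urljoin(page_url, value)
        if urlparse(absolute).scheme not in ("http", "https"):
            return match.group(0)  # leave data:, javascript: etc. untouched
        return f"{prefix}{quote}{local_path_for(absolute)}{quote}"

    return SRC_ATTR.sub(rewrite, html), downloads
```

Relative links are resolved against the page URL before being mapped, so `img/logo.png` on `http://example.com/news/story.html` is saved once as `page_files/logo.png`. A different `logo.png` from another host gets a numbered name instead of overwriting it.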

Web archives do not work in every case. Some web pages are scripted in ways that prevent archiving, which is common on movie trailer sites.

Safari and Internet Explorer both have their own web archive formats (.webarchive and .mht respectively). They do the same thing, except that all the files are packed into a single archive file, designed to be opened again easily with Safari or Internet Explorer. If you have one of these archives, File Juicer can also extract the main content from it.
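Both formats can be unpacked with Python's standard library. A Safari web archive is a property list, and an Internet Explorer .mht file is a MIME multipart message (MHTML, RFC 2557). The sketch below assumes the plist keys commonly seen in Safari archives (`WebMainResource`, `WebResourceData`, `WebResourceTextEncodingName`, `WebSubresources`, `WebResourceURL`). Treat that layout as an assumption rather than a specification. The function names are hypothetical, and this is not File Juicer's implementation.

```python
import email
import plistlib
from email import policy


def main_html_from_webarchive(data: bytes) -> str:
    """Return the main HTML page stored in a Safari .webarchive file."""
    # Assumed layout: a property list whose "WebMainResource" dictionary holds
    # the page bytes in "WebResourceData" and, optionally, the charset name in
    # "WebResourceTextEncodingName".
    archive = plistlib.loads(data)
    main = archive["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    try:
        return main["WebResourceData"].decode(encoding, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to UTF-8 rather than failing.
        return main["WebResourceData"].decode("utf-8", errors="replace")


def subresource_urls_from_webarchive(data: bytes) -> list[str]:
    """List the URLs of the images, scripts etc. packed alongside the page."""
    archive = plistlib.loads(data)
    return [res.get("WebResourceURL", "") for res in archive.get("WebSubresources", [])]


def main_html_from_mht(data: bytes) -> str:
    """Return the first text/html part of an Internet Explorer .mht (MHTML) file."""
    message = email.message_from_bytes(data, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and applies the charset.
            return part.get_content()
    raise ValueError("no text/html part found in MHT data")
```

`plistlib.loads` detects binary and XML property lists automatically, so the same function handles either flavour of .webarchive.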

Extracting HTML from other file formats

HTML is mostly found in browser cache files, web archives, and email attachments, but also inside other formats where it is used as rich text. File Juicer will extract the HTML if it finds an opening <html> tag and a proper closing </html> tag, and it will include the <!DOCTYPE> declaration if one is present.
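The rule above can be expressed as a small carving routine over raw bytes. This Python sketch is a hypothetical reconstruction, not File Juicer's actual code. It looks for an opening `<html` tag, requires a matching `</html>`, and pulls in a `<!DOCTYPE ...>` that sits directly before the opening tag. The 512-byte look-back window and the name `carve_html` are arbitrary choices for illustration.

```python
import re

# Patterns for the markers the rule relies on (case-insensitive, bytes).
HTML_START = re.compile(rb"<html[\s>]", re.IGNORECASE)
HTML_END = re.compile(rb"</html\s*>", re.IGNORECASE)
# A doctype followed only by whitespace up to the end of the search window.
DOCTYPE_BEFORE = re.compile(rb"<!doctype[^>]*>\s*$", re.IGNORECASE)

LOOKBACK = 512  # how far before <html> to search for a doctype (arbitrary)


def carve_html(blob: bytes) -> list[bytes]:
    """Return each <html>...</html> span in blob, with a directly preceding doctype."""
    found = []
    pos = 0
    while True:
        start = HTML_START.search(blob, pos)
        if start is None:
            break
        end = HTML_END.search(blob, start.end())
        if end is None:
            break  # no proper end tag, so nothing more to extract
        begin = start.start()
        # With endpos=begin, "$" anchors right at the opening <html> tag.
        doctype = DOCTYPE_BEFORE.search(blob, max(0, begin - LOOKBACK), begin)
        if doctype is not None:
            begin = doctype.start()
        found.append(blob[begin:end.end()])
        pos = end.end()
    return found
```

Working on bytes rather than text matters here: cache files and other containers usually mix binary data with the HTML, and decoding the whole file first would fail or mangle it.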

Dedicated web archive application

A dedicated web archiving application gives plenty of control over how links are followed and how web pages are archived.
Web site downloaders are not always welcome on sites served over slower internet connections, because they can use a lot of the site's bandwidth.
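The kind of control such an application offers can be sketched as a breadth-first crawl with three settings: a depth limit, a same-host restriction, and a pause between requests to go easy on slow servers. Everything here (`crawl_order`, `get_links`, `max_depth`, `same_host_only`, `delay_seconds`) is a hypothetical illustration. Page fetching and link parsing are passed in as `get_links` so the sketch stays self-contained.

```python
import time
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse


def crawl_order(start_url: str,
                get_links: Callable[[str], list[str]],
                max_depth: int = 2,
                same_host_only: bool = True,
                delay_seconds: float = 1.0) -> list[str]:
    """Return the pages a breadth-first archiver would visit, in order."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        if order and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause before handling each further page
        order.append(url)
        if depth >= max_depth:
            continue  # page is listed for archiving, but its links are not followed
        for href in get_links(url):
            link = urljoin(url, href).split("#", 1)[0]  # drop #fragments
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            if same_host_only and parts.netloc != start_host:
                continue  # stay on the starting site
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

Breadth-first order means a low `max_depth` archives the pages closest to the start page first. A non-zero `delay_seconds` trades archiving speed for less load on the server.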