Friday, October 12, 2007

HOWTO: Converting .CHM to .PDF

If you are like me and fuss about having to click 'Next' to move on to every page of a .CHM file, or you need to have a CHM file printed (imagine the horror of having to select every page to print o_O;), you have probably wondered about this at some point.

There is a good Ubuntu Geek guide on this (google: convert chm pdf ubuntu). I don't use htmldoc for the conversion though, because I think the generated PDF file is too ugly (probably because I didn't configure it properly. lol)

Step 1: Assuming you have already installed libchm-bin, this command extracts the contents of book.chm into the directory outdir:

extract_chmLib book.chm outdir

I have always found this command hard to remember, so I like to save it somewhere and cut and paste it, changing only the file name and destination directory.
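If you would rather not remember the command at all, a tiny wrapper can do it for you. This is only a sketch: the function names `build_extract_command` and `extract_chm` are my own invention, and it assumes `extract_chmLib` (from libchm-bin) is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(chm_file, out_dir):
    """Return the argument list for: extract_chmLib <chm_file> <out_dir>."""
    return ["extract_chmLib", str(chm_file), str(out_dir)]


def extract_chm(chm_file, out_dir):
    """Run extract_chmLib, failing early if the tool is not installed."""
    if shutil.which("extract_chmLib") is None:
        raise FileNotFoundError("extract_chmLib not found; install libchm-bin first")
    # Creating the destination directory up front is harmless if it already exists.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_extract_command(chm_file, out_dir), check=True)
```

Calling `extract_chm("book.chm", "outdir")` then does the same thing as the command above.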

Step 2: With the decompressed HTML files, I used Adobe Acrobat 8 Professional (Dualbooting = best of both worlds) to combine multiple files into a single PDF file.

The only things to note here are that you need to select the order of the files yourself, and that you can remove the headers and footers (showing the file name, location, time/date etc., which I think is too ugly) by unsetting the options under the "Create PDF from Web Page" option. It is NOT under the Preferences -> Options setting (I couldn't find it earlier because I was combing through this -.- )
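To make picking the order less painful, you can print the extracted files in a sensible order first and use that as a checklist. The sketch below assumes the file names themselves reflect the reading order (e.g. ch1.html, ch2.html, ch10.html), which is not guaranteed for every CHM; `natural_key` and `ordered_html_files` are hypothetical helper names.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and number chunks so 'ch10' sorts after 'ch2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def ordered_html_files(directory):
    """Return the .html/.htm files under directory in natural order."""
    files = [p for p in Path(directory).rglob("*")
             if p.suffix.lower() in {".html", ".htm"}]
    return sorted(files, key=lambda p: natural_key(p.as_posix()))
```

Looping over `ordered_html_files("outdir")` and printing each path gives you a list to follow when adding files in Acrobat.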

Optional Step:
Before step 2, if the Next and Previous links that appear on each HTML page bother you (like they bother me), you can opt to remove them.

Using UltraEdit (or any good text editor that lets you search and replace text in files without having to open them manually), I opened one sample HTML file and looked at how the Next and Previous links were coded. Most often each one is a table by itself.

Copy the code for the 'Next' link and use it in a regular expression search and replace. The regular expression is important because of the a href="something.html" part: this is the dynamic bit that changes from page to page, while the rest remains static.

Replace the "something.html" with "[\w]+[.]html". This accounts for the different variations of the URL, as long as the file names contain only letters, digits and underscores.

Leave the replace field blank. This effectively means removing the search string.
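Here is roughly what that search and replace does, sketched in Python rather than in the editor. The markup around the link is hypothetical (a bare `<a>` tag); copy the real code from your own HTML files, which will often be a whole table.

```python
import re

# Hypothetical markup for a 'Next' link; only the file name part is a pattern.
NEXT_LINK = re.compile(r'<a href="[\w]+[.]html">Next</a>')

sample = '<p>Some text</p><a href="chapter02.html">Next</a>'

# Replacing with an empty string removes every match.
cleaned = NEXT_LINK.sub("", sample)
print(cleaned)  # <p>Some text</p>

# A file name with a hyphen is NOT matched by [\w]+, so this stays untouched.
hyphenated = '<a href="chapter-02.html">Next</a>'
print(NEXT_LINK.sub("", hyphenated) == hyphenated)  # True
```

If your file names contain hyphens or other characters, widen the character class, for example to `[\w-]+`.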

Now run the search and replace on all the .html files in the directory. Leave it to run.

Repeat the same thing for the 'Previous' tag.
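If you don't have an editor that can replace across files, the same clean-up can be sketched as a small script. The two patterns are placeholders for whatever markup your files actually use, and `strip_links` is a name I made up. It works on raw bytes so that the files' original encoding and line endings are left alone. Run it on a copy of the directory first, since it overwrites files in place.

```python
import re
from pathlib import Path

# Hypothetical patterns; swap in the static markup from your own files.
LINK_PATTERNS = [
    re.compile(rb'<a href="[\w]+[.]html">Next</a>'),
    re.compile(rb'<a href="[\w]+[.]html">Previous</a>'),
]


def strip_links(directory, patterns=LINK_PATTERNS):
    """Remove every match of each pattern from the .html files under directory.

    Returns the number of files that were rewritten.
    """
    changed = 0
    for path in Path(directory).rglob("*.html"):
        original = path.read_bytes()
        cleaned = original
        for pattern in patterns:
            cleaned = pattern.sub(b"", cleaned)
        # Only touch files that actually contained a match.
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling `strip_links("outdir")` handles the 'Next' and 'Previous' passes in one go.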

Done! Once the HTML files are cleansed of those bothersome links, you can convert them to PDF using step 2 above.

Enjoy your beautifully created pdf file! =)