Just your average bearded geek

How to create ebooks with Linux

The digital age has brought us new ways to preserve information. Because they allow unlimited copies of the same quality as the original and concentrate large amounts of data on tiny devices, electronic documents are clearly the way to go if you want to keep a book forever. Projects like Google Books amaze me because they are the biggest attempt to date to preserve mankind's knowledge. On your own scale, you can also preserve the books that matter to you.

Printed books have a limited life, are more prone to damage and accidents than digital content, and degrade over time. To preserve your precious old books, you can digitize them and produce a portable, readable file that can be opened anywhere. The de facto standard format for this is PDF. This document gives a brief overview of the process and tools needed to create a PDF ebook from a printed book under Linux. The goal is to make the digitized book as close to the original as possible. Therefore, I won't detail the OCR process (there are many tools and tutorials available). What we are looking for here is to produce a PDF file from a set of scanned images - while some operations like searching won't be possible, the digitized book will be a verbatim, lasting copy of the original.

Acquiring the pages

The first step is, of course, to use your scanner to digitize the pages of your book, one by one. All pages should have the same resolution - so use the preview to set up your first page correctly and select the right area to scan, then don't change the selection anymore. That way, all the pages will have exactly the same size.
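For scanners supported by SANE, the same fixed-settings idea can be sketched on the command line with scanimage instead of a graphical tool. The options used here (--resolution, --mode, -l, -t, -x, -y) come from the scanner backend, so their availability and accepted values are assumptions to check against scanimage --help for your device, and PNG output needs a reasonably recent scanimage. The function name scan_page and the A4 scan area are made up for this example.

```shell
# Hypothetical helper: scan one page with fixed settings.
# Usage: scan_page PAGE_NUMBER   (plain decimal, e.g. 7, not 007)
scan_page() {
    page=$1
    out=$(printf 'page%03d.png' "$page")
    # Assumed settings: 200 dpi, grayscale, A4 area in millimetres.
    # Keeping them identical for every page gives images of the same size.
    scanimage --resolution 200 --mode Gray --format=png \
        -l 0 -t 0 -x 210 -y 297 > "$out" || {
        rm -f "$out"    # do not leave a truncated file behind
        return 1
    }
}
```

Calling scan_page 1, turning the page, then calling scan_page 2 and so on would produce page001.png, page002.png, etc.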

For odd pages, you may find it easier to scan them upside down. Do it if necessary - we'll fix that later.

You don't need to scan at a very high resolution. 300 dpi is more than enough - in my case, 200 dpi was good. Don't scan your book in color if it is black & white - if grayscale is enough, use it. Save whatever space you can as long as it doesn't affect your scan quality. To determine what is good and what is not, you will need to make several test scans, since the right parameters vary from book to book.
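To get a feel for what resolution and color mode cost before making test scans, the arithmetic can be sketched in a few lines of shell. The function names page_pixels and raw_bytes are made up for this illustration, and the figures are for uncompressed data only; real PNG or JPEG files will be smaller.

```shell
# Hypothetical helpers, not part of any scanning tool.

# page_pixels DPI WIDTH_MM HEIGHT_MM
# Prints the pixel width and height of a scan (1 inch = 25.4 mm),
# rounded down by integer arithmetic.
page_pixels() {
    dpi=$1 w_mm=$2 h_mm=$3
    echo "$(( w_mm * dpi * 10 / 254 )) $(( h_mm * dpi * 10 / 254 ))"
}

# raw_bytes WIDTH_PX HEIGHT_PX MODE
# Prints the uncompressed size in bytes for MODE = color, gray or bw
# (24, 8 and 1 bit per pixel respectively).
raw_bytes() {
    case $3 in
        color) echo $(( $1 * $2 * 3 )) ;;
        gray)  echo $(( $1 * $2 )) ;;
        bw)    echo $(( $1 * $2 / 8 )) ;;
        *)     echo "unknown mode: $3" >&2; return 1 ;;
    esac
}
```

For example, page_pixels 200 210 297 gives the pixel size of an A4 page at 200 dpi, and passing that size to raw_bytes with gray and then color shows that color triples the raw data.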

What is important, however, is to save the images in a lossless format like PNG. Don't save your images as JPEG yet, since you will probably want to improve them. Doing so involves loading and saving the image several times, and since every JPEG save loses a little more detail, you would end up with an image degraded several times over, which is probably not what you want.

Name your scanned images correctly - they should include the page number at the end of the name. Some tools like kooka are great for this, since they increment the page number and include it in the image file name.
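One thing to watch with numbered names: shell patterns such as *.png sort names as text, so page10.png comes before page2.png. Padding the numbers with zeros keeps the pages in order. Here is a minimal sketch, assuming names of the form prefix plus number plus .png; the function name pad_page_names is hypothetical.

```shell
# Hypothetical helper: rename <prefix><number>.png so that every page
# number has three digits (page7.png becomes page007.png).
# Usage: pad_page_names PREFIX
pad_page_names() {
    prefix=$1
    for f in "$prefix"*.png; do
        [ -e "$f" ] || continue
        n=${f#"$prefix"}          # drop the prefix
        n=${n%.png}               # drop the extension
        case $n in
            ''|*[!0-9]*) continue ;;   # not a plain number: leave it alone
        esac
        # Strip leading zeros so printf does not read the number as octal.
        while [ "${n#0}" != "$n" ] && [ -n "${n#0}" ]; do
            n=${n#0}
        done
        new=$(printf '%s%03d.png' "$prefix" "$n")
        # Never overwrite an existing file.
        if [ "$f" != "$new" ] && [ ! -e "$new" ]; then
            mv -- "$f" "$new"
        fi
    done
}
```

Run it in the directory holding the scans, for instance pad_page_names page. The last digit of each name is unchanged, so selecting odd pages by their final digit still works afterwards.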

Improving the scanned images

You may have to rotate all the odd or even pages. This happened to me since I could only scan the book this way. For odd pages, all I had to do was:

for i in *{1,3,5,7,9}.png; do mogrify -rotate 180 "$i"; done
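Keep in mind that mogrify overwrites the files it processes. A slightly more careful variant of the loop, sketched here with a hypothetical function name and backup directory, copies each odd page aside before rotating it:

```shell
# Hypothetical helper: rotate odd-numbered pages by 180 degrees,
# keeping a copy of each original first.
# Usage: rotate_odd_pages [BACKUP_DIR]   (default: originals)
rotate_odd_pages() {
    backup=${1:-originals}
    mkdir -p "$backup" || return 1
    for f in *.png; do
        case $f in
            *[13579].png)
                # Only rotate when the backup copy succeeded.
                cp -p -- "$f" "$backup/" && mogrify -rotate 180 "$f"
                ;;
        esac
    done
}
```

If a rotation goes wrong, the untouched scan is still available in the backup directory.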

Then, you can try to improve your images. Tiffprocess is a great tool that will do it automatically for you. If you need more precise processing, you can use Gimp to rotate your images slightly and straighten improperly placed pages. You may also want to remove unwanted specks on the page, or color variations in the background. When I was scanning, the other side of the page was slightly visible - this was very annoying. Using the select-by-color tool, I selected the background, which included this unwanted show-through. Then, using the rectangle selection tool, I deselected all the pictures, and finally filled the remaining selection with white using the bucket fill tool. It did wonders to improve readability as well as reduce the image file size.
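If the book is mostly text, a rough command-line approximation of this cleanup is possible with ImageMagick. This is not the Gimp procedure described above: it applies one global threshold, so light areas inside pictures are whitened too. The function name clean_page and the 40% and 85% values are assumptions to tune on test pages, and on ImageMagick 7 the command may be called magick rather than convert.

```shell
# Hypothetical helper: straighten a page and push a light background to white.
# Usage: clean_page FILE [OUTPUT_DIR]   (default output directory: cleaned)
clean_page() {
    src=$1 outdir=${2:-cleaned}
    mkdir -p "$outdir" || return 1
    # -deskew straightens slightly rotated scans;
    # -white-threshold turns every pixel lighter than the threshold white.
    convert "$src" -deskew 40% -white-threshold 85% \
        "$outdir/$(basename "$src")"
}
```

The result is written to a separate directory, so the original scan stays available for a manual pass in Gimp if the automatic result is not good enough.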

The .png files can then be converted to jpg with this command:

mogrify -format jpg *.png
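With -format, mogrify writes new .jpg files and leaves the .png files alone. If you want to control the compression level and keep the results in their own directory, something along these lines should work; the function name, the directory name and the default quality of 85 are only assumptions for the example.

```shell
# Hypothetical helper: convert every PNG in the current directory to JPEG.
# Usage: png_to_jpg [OUTPUT_DIR] [QUALITY]   (defaults: jpg, 85)
png_to_jpg() {
    outdir=${1:-jpg} quality=${2:-85}
    mkdir -p "$outdir" || return 1
    # -path sends the converted files to OUTPUT_DIR;
    # -quality sets the JPEG compression level (1 to 100).
    mogrify -path "$outdir" -format jpg -quality "$quality" *.png
}
```

A lower quality gives a smaller PDF at the end, so it is worth trying a few values on one page and comparing the results.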

Assembling the whole

And finally, assemble all the .jpg files into one PDF file. I have been using pstill which, although not free software, is free as in beer for non-commercial use and works pretty well. Assuming your images have a width of 2000 pixels and a height of 3000, here is the command you may want to use:

pstill -2 -c -c -c -c -i -t -v -w 2000 -h 3000 *.jpg

With 200 dpi scans, for a 60-page book, this command gave me a 60 MB file named out.pdf that opened pretty fast in kpdf or acroread. Of course, if you are only interested in the content of the book and don't care about the layout, you'd be better off running OCR on it.
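Rather than typing the width and height by hand, you can read them from the first image with ImageMagick's identify and pass them to the same pstill command. This is only a sketch: the function name assemble_pdf is made up, and it assumes all pages have the same size, as recommended above.

```shell
# Hypothetical helper: run the pstill command above with the pixel size
# of the first image instead of hard-coded values.
# Usage: assemble_pdf page*.jpg
assemble_pdf() {
    # identify prints "<width> <height>" for the first file.
    dims=$(identify -format '%w %h' "$1") || return 1
    w=${dims% *}
    h=${dims#* }
    pstill -2 -c -c -c -c -i -t -v -w "$w" -h "$h" "$@"
}
```

The files are passed in the order the shell expands them, so zero-padded page numbers matter here too.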

A little simpler

Nice article, thanks. The rotation upside-down of the odd pages could also be written like this:

for i in *[13579].png; do mogrify -rotate 180 $i; done

Cheers...

Right, thanks a lot. I'm hopeless when it comes to shell scripting...

Actually it should be:

for i in *{1,3,5,7,9}.png; do mogrify -rotate 180 $i; done

It's shell expansion, not regex, in the for command.

Much better indeed. Thanks a lot.