Wednesday, May 7, 2008

Wikimedia2pdf Tool

Textbook-l received a notice this morning from the team at about a new tool that they've been working on: wikimedia2pdf. Apparently the tool was originally titled "Wikibooks2pdf" because of its focus on creating PDF versions of existing multi-page wikibooks. However, it seems that some changes have been made to allow creating PDFs from Wikipedia as well. I'm not sure of all the details of how this works, so I'm going to spend some time playing with it today or tomorrow.

I tried to make a quick PDF this morning for a new book I've been working on, but the formatting came out a little strange. I didn't have time to really play with the settings, so I'll try again later, hoping for better results.

The website also includes several pre-made PDF versions of books that you can look at. Specifically, they seem to have already made PDFs of all our featured books. I took a look at the PDF version of [[Control Systems]], a book that I wrote most of and the only book I've authored that has become featured so far. I know that Control Systems is about 250 pages long (at least when I created the PDF manually using my own software), so I was surprised to see that the version from Wikimedia2pdf was over 1350 pages! The algorithm used by this tool appears to be a little bit naive. Control Systems has 4 "print versions", pages into which the rest of the pages in the book are transcluded. In a sense, a "print version" is like a book-on-a-single-page. Wikimedia2pdf included the full contents of all 4 print versions in the PDF it created, on top of the individual pages. In essence, the PDF contains 5 copies of the book, back to back. What would be very cool here would be to show a list of all the pages that the tool finds, and let the user uncheck pages that do not belong in the final PDF (like print versions, meta-data pages, etc). If this issue gets sorted out, Wikimedia2pdf will be a very cool tool for us to use.
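Here's a rough Python sketch of the page-selection step I'm describing. To be clear, this is not how Wikimedia2pdf actually works; the function name `select_pages`, the exclusion patterns, and the page titles are all hypothetical, chosen just for illustration. The idea is to pre-uncheck pages that look like print versions or meta-data pages, honor anything the user unchecks by hand, and never include the same page twice.

```python
import re

# Hypothetical title patterns for pages that should be unchecked by default.
# Real Wikibooks naming varies from book to book; these are assumptions.
DEFAULT_EXCLUDE = (
    r"/Print [Vv]ersion",  # single-page compilations that transclude other pages
    r"/Contributors$",     # a meta-data page
)

def select_pages(pages, exclude_patterns=DEFAULT_EXCLUDE, unchecked=()):
    """Return the page titles to render, in crawl order, each at most once.

    pages: titles in the order the crawler found them.
    exclude_patterns: regexes for pages that are unchecked by default.
    unchecked: titles the user explicitly unchecked in the page list.
    """
    compiled = [re.compile(p) for p in exclude_patterns]
    unchecked = set(unchecked)
    seen = set()
    selected = []
    for title in pages:
        if title in seen or title in unchecked:
            continue  # duplicate, or removed by the user
        if any(rx.search(title) for rx in compiled):
            continue  # looks like a print version or meta-data page
        seen.add(title)
        selected.append(title)
    return selected

# Hypothetical page list, including a print version and a duplicate.
pages = [
    "Control Systems/Introduction",
    "Control Systems/Print Version",
    "Control Systems/Transforms",
    "Control Systems/Introduction",
    "Control Systems/Contributors",
]
print(select_pages(pages))
# ['Control Systems/Introduction', 'Control Systems/Transforms']
```

Because the patterns only decide what starts out unchecked, the user still has the final say; passing `unchecked=["Control Systems/Transforms"]` would drop that page as well.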


  1. Now if such a tool could automate the FDL-required attribution for Wikipedia and Wikibooks, it would be really impressive.

  2. You're right, proper GFDL attribution would be a great thing to have in this tool. I had heard about a project on the toolserver that aimed to do exactly that, but I can't remember where the link to it is. Adding in a little copyright notice for each of the images used in the book would probably be nice too.

    There are a lot of features that need to be added to this tool, but I think it's a pretty good start.

  3. already know this, but Brion Vibber posted about PDF generators a while ago, and both the post and its comments have several links that might be useful or interesting.

  4. Please add "Perhaps you already" to the beginning of my previous comment.