# Linux, OCR and PDF: Scan to PDF/A

Friday, March 29th, 2013 | Author:

By far the most visited post on this blog is one from 2010 about OCRing (Optical Character Recognition) a PDF in GNU/Linux; it contains a small shell script that others have improved several times. After buying a new flatbed scanner, I re-investigated how to scan and OCR PDFs, how to produce incredibly small DjVu files and how to get the metadata right. It turns out that what I had really wanted all along was to create PDF/A-compliant documents (I just didn't know what PDF/A was before). But let me explain the details after presenting the quick solution. At the end, I have a shell script that scans directly to PDF/A.

Category: English, Not Mathematics | 20 Comments

# Get your own LaTeX-enabled wiki in the cloud with Instiki on Heroku

Wednesday, November 21st, 2012 | Author:

I guess you all know what a WikiWikiWeb (short: wiki) is: a website where you can easily add new pages and modify existing ones. MathOverflow is a kind of hybrid between a Q&A site and a wiki, since users with enough reputation can edit other people's questions and answers. MathOverflow made the Markdown syntax very popular, and people got used to using LaTeX online. Some of my readers surely know the nLab, a collaborative wiki on n-categorical math(ematical physics) and stuff. The nLab runs on software called Instiki, a wiki written in Ruby (an interpreted language similar to Python, and somewhat similar to Lisp, Perl and JavaScript, which is often used for web applications like wikis). The good thing about Instiki is that it supports editing pages in Markdown syntax with embedded LaTeX, which makes it a good fit for personal knowledge management. In addition, Instiki is small (so not many bugs are to be expected), fast, and its code is quite readable, which is something I wouldn't say about MediaWiki, the software behind Wikipedia.

In this post, I will tell you how to run your own wiki like the nLab. [UPDATED 2013-01-07; easier fix]

Category: English, Mathematics, Not Mathematics | 2 Comments

# An arrow notation for annotations

Saturday, October 27th, 2012 | Author:

Nowadays it is common to use $x \mapsto f(x)$ to denote that an element $x \in X$ is mapped to an element $f(x) \in Y$ by the map(ping) $f : X \to Y$. In particular, the arrow $\rightarrow$ (in LaTeX: \rightarrow) denotes a map, or more generally a morphism, while $\mapsto$ (in LaTeX: \mapsto) denotes how particular elements or objects are mapped to other elements or objects.
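
To make the distinction concrete, here is a small example; the squaring map on the real numbers is only an illustration chosen here, with $X = Y = \mathbb{R}$:

```latex
\[
  f : \mathbb{R} \rightarrow \mathbb{R}, \qquad x \mapsto x^2
\]
```

The first arrow relates the two sets, the second one says what happens to a single element.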

Have you ever seen an arrow with a triangle as its head? Like these:

# 2nd Workshop on Personal Knowledge Management

Sunday, September 12th, 2010 | Author:

Today I'm attending the second Workshop on Personal Knowledge Management (PKM2010) at the Human-Computer-Interaction (HCI) conference "Mensch und Computer" in Duisburg (Germany).

I have absolutely no idea what to expect, so I expect to be surprised.

UPDATE: Now that the workshop is almost over (coffee break right now), maybe the most important thing for me:

It was fun!

Category: English | One Comment

# Mass renaming papers with BibTeX+JabRef export filters

Monday, June 28th, 2010 | Author:

If you manage your (scientific) references, such as journal articles, arXiv papers and textbooks, within some reference management system that uses BibTeX as its storage/export format, and you have local copies of your files, then the following might be of interest:

I wrote a JabRef export filter that takes a BibTeX file with file links (that is, BibTeX fields of the form file={somefile.pdf}) and writes a Linux shell script that renames the files systematically according to the scheme [bibtexkey] - [authors] - [title].[extension]. JabRef can then find the files again via its automatic file association mechanism. I use lower-case bibtexkeys, but the export filter is easily adaptable; read about it on the JabRef custom export filter documentation page.

Just create (or download) a file named "renamer.layout" and fill in this line:
\begin{file}mv "\format[FileLink]{\file}" "\format[ToLowerCase,FormatChars]{\bibtexkey} - \format[AuthorNatBib,ToLowerCase,FormatChars,RemoveBrackets]{\author} - \format[FormatChars,RemoveBrackets,ToLowerCase]{\title}.\format[Replace(.*:,),ToLowerCase]{\file}"\end{file}
Then open JabRef and go to the menu entry Options->Manage custom exports->Add new, where you enter (for example) "renamer" as Export name, the full path to your renamer.layout file in the Main layout file field and "sh" as File extension.

Then open your BibTeX file (.bib) with JabRef, select the menu entry File->Export and choose your newly created export filter renamer (*.sh) in the drop-down menu Files of Type. This gives you a shell script which, when executed, renames all files linked from the BibTeX document into a standardised format (and moves them all into the directory from which you execute the script).
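
To give an idea of what a line in the generated script looks like, here is a minimal sketch that rebuilds the naming scheme by hand. The entry (key Doe2010, its authors, title and the file somefile.pdf) and the helper target_name are made up for illustration; the exact strings JabRef produces depend on its formatters (AuthorNatBib, FormatChars and so on) and may differ from this.

```shell
#!/bin/sh
# Sketch only: mimics the naming scheme of the renamer export filter.
# The entry used below (key, authors, title, file) is hypothetical.

# Build "[bibtexkey] - [authors] - [title].[extension]", all in lower case,
# mirroring the ToLowerCase formatters in renamer.layout.
target_name() {
    printf '%s - %s - %s.%s\n' "$1" "$2" "$3" "$4" | tr '[:upper:]' '[:lower:]'
}

new_name=$(target_name "Doe2010" "Doe and Roe" "An Example Title" "PDF")

# Print the mv command instead of running it, so nothing gets renamed by accident.
printf 'mv "%s" "%s"\n' "somefile.pdf" "$new_name"
```

Since the target name carries no directory part, running such an mv line puts the renamed file into the current working directory, which is why all files end up in the directory where the script is executed.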