Managing papers – Konrad Völkel

Today in the series "How to do XYZ with software?":

How to manage papers?

I have lots of PDFs on my hard disk, and most of them are half-read or unread. Since I'm studying mathematics, these PDFs are lecture notes, research papers, my own notes and several more-or-less relevant books. How do I organise them? It's a problem.

I separate my own notes from everything else and keep my own PDFs in the same folder as their LaTeX source; these folders are sorted by subject. The other PDFs fall into three categories: the relevant, the irrelevant and the books. I keep all books in one big folder and use search to find something specific. The irrelevant PDFs just float around somewhere on my computer: on my desktop and in various folders called "papers", "some papers", "more papers" and "maybe read". The relevant PDFs are renamed to make them easier to find with a desktop search engine (I usually put the author's name, some keywords and the year in the file name), and they're sorted by subject in folders. I consider this a very bad solution.
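To make the renaming less tedious, the naming scheme could be scripted. This is only a minimal sketch in Python: the function name `paper_filename`, the underscore and hyphen separators, and the decision to drop accents are my assumptions for illustration, not a fixed convention.

```python
import re
import unicodedata


def paper_filename(authors, keywords, year, ext="pdf"):
    """Build a search-friendly file name from authors, keywords and year."""

    def slug(text):
        # Drop accents, lowercase, and turn every run of other characters into '-'.
        ascii_text = (
            unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

    parts = [
        "-".join(slug(a) for a in authors),
        "-".join(slug(k) for k in keywords),
        str(year),
    ]
    # Skip empty parts so a missing keyword list does not leave a stray '_'.
    return "_".join(p for p in parts if p) + "." + ext


# Hypothetical example input, not a real paper.
print(paper_filename(["Doe", "Müller"], ["sheaf cohomology"], 2010))
```

Dropping the accents is a trade-off: the names become easier to type into a search box, but an author name like "Müller" turns into "muller".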

So I have a second system: JabRef, an open-source reference manager (and BibTeX database editor) written in Java, so you can use it on Windows, Linux and Mac OS X. Into JabRef I put every book or paper that I want to cite somewhere, or that I find interesting enough to keep one more safe reference to. JabRef lets you link BibTeX entries to files on your hard disk. This way you can forget about where you stored your files.
Interesting fact: Google Scholar has a setting to enable BibTeX export, which you can use together with JabRef's BibTeX import for very fast database creation and updating. Sadly, the Google Scholar metadata is sometimes wrong...
Another interesting fact: there is a Bibsonomy plug-in for JabRef. Sadly, it doesn't work for me (I can't figure out how to install it under Linux). Bibsonomy seems to be used almost exclusively by computer scientists, especially those working on semantic web stuff.
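To illustrate what linking a BibTeX entry to a file can look like, here is a small Python sketch that writes an entry with a `file` field. The `description:path:type` layout of that field is my assumption about how JabRef stores links and may differ between versions; the function name `bibtex_entry`, the citation key and the path are made up for the example.

```python
def bibtex_entry(key, entry_type, fields, pdf_path=None):
    """Format one BibTeX entry, optionally with a link to a local PDF."""
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    if pdf_path is not None:
        # Assumed JabRef-style link: empty description, then path, then file type.
        # A path that itself contains ':' would need escaping, which is not done here.
        lines.append("  file = {:%s:PDF}," % pdf_path)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical entry, not a real reference.
print(
    bibtex_entry(
        "doe2010",
        "article",
        {"author": "Jane Doe", "title": "An Example", "year": "2010"},
        pdf_path="papers/doe_example_2010.pdf",
    )
)
```

If the path is relative, it only makes sense together with a base folder for PDFs, so whether this works out of the box depends on how the reference manager is configured.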

It's strange, but I have a third system: a huge spreadsheet in Google Docs, where I keep all the papers and books related to something I want to learn in the near future. I keep track of dependencies, that is, I use this spreadsheet to find out which paper or book I have to read first in order to understand another one. I also attach a relevance score to each item. Finally, some insane formula calculates a ranking across all items, which tells me the top 3 papers I have to read next. This works (at least after I adjusted the formula for long enough that it displayed what I already knew was important). I'm currently thinking about replacing this system with CiteULike, because you can enter a reading priority there, too.
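The idea behind such a ranking can be sketched in a few lines of Python. This is not the formula from my spreadsheet, just one plausible variant under stated assumptions: an item's priority is its own relevance plus the relevance of everything that (directly or indirectly) depends on it, and only unread items whose prerequisites are all read are candidates. The names `next_to_read`, `relevance`, `requires` and `read` are invented for the sketch, and every title listed under `requires` is assumed to exist in the table.

```python
def next_to_read(items, top=3):
    """items maps title -> {"relevance": number, "requires": [titles], "read": bool}."""

    def dependents(title, seen):
        # Collect every item that directly or indirectly requires `title`.
        for other, info in items.items():
            if title in info["requires"] and other not in seen:
                seen.add(other)
                dependents(other, seen)
        return seen

    def priority(title):
        unlocked = dependents(title, set())
        unlocked.discard(title)  # guards against circular dependencies
        return items[title]["relevance"] + sum(
            items[d]["relevance"] for d in unlocked
        )

    # Candidates: unread items whose prerequisites have all been read.
    ready = [
        title
        for title, info in items.items()
        if not info["read"] and all(items[r]["read"] for r in info["requires"])
    ]
    # Highest priority first; ties are broken alphabetically.
    return sorted(ready, key=lambda t: (-priority(t), t))[:top]


# Hypothetical reading list.
reading_list = {
    "intro": {"relevance": 1, "requires": [], "read": False},
    "advanced": {"relevance": 5, "requires": ["intro"], "read": False},
    "survey": {"relevance": 3, "requires": [], "read": False},
    "old notes": {"relevance": 2, "requires": [], "read": True},
}
print(next_to_read(reading_list))
```

In this example "intro" is ranked above "survey" even though its own score is lower, because reading it unlocks "advanced". That is the kind of behaviour I would expect from a dependency-aware ranking.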

For Wikipedia users, the reference management system Zeteo might be of interest. You can't have a user account there, but you can easily manage references for later use in Wikipedia. Funny: it turns out it's written by someone working in my math department!

(image "Dropbox Upgrade" licensed from Scott Beale under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 Generic license)

It's no fun to "maintain" three systems, each messier than the last. I'm currently thinking of just moving everything over to JabRef, but this doesn't resolve the "storage" problem. Maybe one big folder that contains it all, combined with desktop search, is the best solution. Maybe it would be cool to store the papers online, in Ubuntu One, for example. But I haven't been bold enough to try yet. Maybe it's good to keep track of references in some Web 2.0 tool, like Bibsonomy? I haven't tried that either. I think it is a good idea to remove from your hard disk any PDFs that are available without restrictions (like arXiv papers) and that you don't need in the near future. The less there is, the easier it is to manage. If you have any suggestions, I would be very happy to read your comments.