Mass renaming papers with BibTeX+JabRef export filters
Monday, June 28th, 2010 | Author: Konrad Voelkel
If you manage your (scientific) references, such as journal articles, arXiv papers and textbooks, within a reference management system that uses BibTeX as its storage or export format, and you keep local copies of the files, then the following might be of interest:
I wrote a JabRef export filter that takes a BibTeX file with file links (that is, BibTeX fields of the form file={somefile.pdf}) and writes a Linux shell script that renames the files systematically according to the scheme [bibtexkey] - [authors] - [title].[extension]. JabRef can then find the files again via its automatic file association mechanism. I use lower-case bibtexkeys, but the export filter is easy to adapt; the JabRef custom export filter documentation page explains how.
Just create (or download) a file named "renamer.layout" and fill in this line:
\begin{file}mv "\format[FileLink]{\file}" "\format[ToLowerCase,FormatChars]{\bibtexkey} - \format[AuthorNatBib,ToLowerCase,FormatChars,RemoveBrackets]{\author} - \format[FormatChars,RemoveBrackets,ToLowerCase]{\title}.\format[Replace(.*:,),ToLowerCase]{\file}"\end{file}
Then open JabRef and go to the menu entry Options->Manage custom exports->Add new, where you enter (for example) "renamer" as Export name, the full path to your renamer.layout file in the Main layout file field and "sh" as File extension.
Next, open your BibTeX file (.bib) with JabRef, choose the menu entry File->Export and pick your newly created export filter renamer (*.sh) in the drop-down menu Files of Type. This gives you a shell script which, when executed, renames all files linked from the BibTeX document into a standardised format (and moves them all into the directory from which you run the script).
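For orientation, here is a sketch of the kind of line the layout produces for each linked file, and what running it does. The entry (key doe2010, author Doe, the title and the path downloads/1234.pdf) is made up for illustration. As far as I can tell, the extension comes from the last colon-separated part of the file field, lower-cased. The exact output depends on your JabRef version and on how the formatters treat your entries, so treat this as an assumption rather than a guarantee.

```shell
# Hypothetical demo in a scratch directory; every name below is invented.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir downloads
: > downloads/1234.pdf   # empty stand-in for a linked PDF

# A line of the kind the export filter would generate for this entry:
# [bibtexkey] - [authors] - [title].[extension], all lower-case.
mv "downloads/1234.pdf" "doe2010 - doe - an example paper.pdf"

# The renamed file now sits in the current directory.
ls
```

Note that the target name has no directory part, which is why the file ends up wherever the script is run.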
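Since the script moves files, it may be worth looking at it before running it, and running it from the directory that should receive the renamed files. The following is a minimal sketch under assumptions: the exported script is called renamer.sh, the target directory is called library, and a one-line stand-in script is created here only so that the example runs on its own.

```shell
# Stand-in setup so the example is self-contained; in practice renamer.sh
# comes from JabRef's File->Export and the paths are your own.
work=$(mktemp -d)
mkdir "$work/inbox" "$work/library"
: > "$work/inbox/1234.pdf"
printf 'mv "%s" "%s"\n' "$work/inbox/1234.pdf" "doe2010 - doe - a title.pdf" > "$work/renamer.sh"

# Look at the first few generated commands before running anything.
head -n 5 "$work/renamer.sh"

# Run the script from the directory that should receive the files.
# The subshell keeps the cd from affecting the rest of the session.
(cd "$work/library" && sh "$work/renamer.sh")

ls "$work/library"
```

If the linked paths in the generated script turn out to be relative, they are resolved against the current directory as well, so the script then has to be run from a place where those paths are valid.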
This is only useful if you have files linked from your BibTeX file, so you might need to set up those links first. If your filenames already contain some metadata, like author names or document titles, you might be very happy with JabRef's RegEx-capable automatic file finder, which can be configured in the menu entry Options->Preferences->External Programs->External file links.
Even if you don't use JabRef for your daily work, you can still use the process described here: export your references as BibTeX from your favourite reference management system and run the resulting file through the export filter.
You might ask "why", and I respond: my files are all organised so that I can easily extract metadata using only the tools the operating system provides. In case I don't have access to my BibTeX file, I can still find the desired files using the GNU/Linux command locate. Of course, I have also embedded the BibTeX information as XMP in the PDF files (another feature of JabRef that I like a lot), so nothing is lost if I ever switch to a different reference management system.
Another lesson learned from this blogpost: writing specific JabRef export filters is very easy. Another one I wrote automagically downloads entries from the arXiv when the URL is supplied in the url BibTeX field. I won't post it here because you need to disguise wget as "Mozilla 5.0", otherwise the arXiv won't let you download stuff (robot protection). I hope those who are able to figure out the details are also responsible enough not to download huge amounts of papers from the arXiv.
Putting it together, this gives me a convenient way to get arXiv papers onto my computer with full metadata included in the filename, the PDF and the BibTeX file. The still-not-perfect part is the first step, getting the metadata from the arXiv in BibTeX format - I use CiteULike as a proxy (and would be happy to hear about better solutions with JabRef).
You might also ask why I keep copies of my references on my computer (or why they have to be linked from my reference management system). I just find it very convenient to use my laptop as an eReader, even when no internet is available, and given that I have 100+ references in the system, it is good to have metadata such as keywords, abstracts, reviews, annotations and so on.
I learned about JabRef export filters more or less by accident, through another project related to reference management; look there.