Export purchased books list from Amazon

Sunday, September 18th, 2011

If you have bought books from Amazon.com (or, in my case, Amazon.de) and perhaps used the recommendation engine and the wishlist (and so on), there will be lots of data about your books on the Amazon website. Have you ever thought about organizing your library with a different tool? Whether it is Google Books, LibraryThing or Shelfari, you will have to export this precious pile of data from Amazon to the other service. Luckily, some intelligent people invented the ISBN, so you basically need to extract a list of ISBNs to identify the books (neglecting your reviews and tags for now). Unfortunately, Amazon doesn't offer such an export function to the layman. Searching the internet yields a Greasemonkey script that lets you export wishlist content, but without ISBNs, so importing into other services is not so easy.

The solution is to save each page of "your purchases" (or other such lists of books) as an HTML file and let a smart script do the extraction work. This way, you're not violating Amazon's terms of service (which most likely don't allow robots to scrape the website) and, on the positive side, it works.

Here is my Python script, which you can also download here (in a better version):
import re
import sys

# Each book row carries its 10-character ID in the id attribute.
asinRegExString = "<tr valign=middle id=\"iyrListItem([A-Z0-9]{10})\">"
asinRegEx = re.compile(asinRegExString)
filename = sys.argv[1]
asinlist = []
with open(filename, 'r') as f:
    for line in f:
        # search() also finds the tag when the line is indented
        match = asinRegEx.search(line)
        if match is not None:
            asinlist.append(match.group(1))
# print() with parentheses works in both Python 2 and Python 3
print("\n".join(asinlist))

To run this script, you need a Python interpreter. On most common GNU/Linux systems, one is already installed or easily installable, for example with "apt-get install python" on Debian-based systems. The script takes the saved HTML file as its only argument and prints one ID per line.
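If you have saved several pages, running the script once per file gets tedious. The following is a minimal sketch of how the same extraction could be applied to a whole list of saved files while dropping duplicates. The function names `extract_asins` and `collect_asins` are made up for this example, and the pattern is the one from the script, so it only works as long as the saved pages still use that markup.

```python
import re

# Same pattern as in the script above; it depends on Amazon's markup at the time.
ASIN_PATTERN = re.compile(r'<tr valign=middle id="iyrListItem([A-Z0-9]{10})">')


def extract_asins(html):
    """Return all 10-character IDs found in one saved page, in page order."""
    return ASIN_PATTERN.findall(html)


def collect_asins(paths):
    """Read every saved page and return the IDs without duplicates."""
    seen = set()
    ordered = []
    for path in paths:
        # errors="replace" keeps odd bytes in a saved page from aborting the run
        with open(path, encoding="utf-8", errors="replace") as f:
            for asin in extract_asins(f.read()):
                if asin not in seen:
                    seen.add(asin)
                    ordered.append(asin)
    return ordered
```

Calling `collect_asins(sys.argv[1:])` (after `import sys`) and printing the result joined by newlines would give the same kind of output as the original script, just for all files given on the command line.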

I have tested it with Amazon.de and the "purchased books" page, but I guess it would work equally well with Amazon.co.uk and Amazon.com. As always, leave a comment to say whether it worked for you. If it doesn't work, or if you have different needs (like extracting ISBN, name and price), it can be adapted by altering the regular expression in the script (easy for a programmer, not that easy for anyone else).


I used this to import all the books I bought via Amazon into my Google Books library, which I use to maintain a list of all books I own. The nice thing about Google Books is its XML export feature, which I commented on earlier.
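One caveat when importing elsewhere: the 10-character IDs the script extracts are Amazon's own item numbers. For printed books they normally coincide with the ISBN-10, but other items (e-books, for instance) get IDs that are not ISBNs. The sketch below shows how one might filter the list with the standard ISBN-10 check digit and, if a service prefers 13-digit ISBNs, convert the survivors. The function names are my own choice for this example; the checksum rules are the standard ones.

```python
def is_valid_isbn10(code):
    """Check the ISBN-10 check digit.

    The digits are weighted 10, 9, ..., 1 and the weighted sum must be
    divisible by 11. 'X' stands for 10 and may only appear in the last place.
    """
    if len(code) != 10:
        return False
    total = 0
    for position, char in enumerate(code):
        if "0" <= char <= "9":
            value = int(char)
        elif char == "X" and position == 9:
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Convert a valid ISBN-10 to ISBN-13.

    Prefix 978, drop the old check digit, then compute the new one with
    alternating weights 1 and 3.
    """
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)
```

For example, `is_valid_isbn10("0306406152")` is true and `isbn10_to_isbn13("0306406152")` gives `"9780306406157"`, while an ID starting with a letter is rejected by the check.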



4 Responses

  1. Oh, it turns out LibraryThing does it the same way I do:
    http://www.librarything.com/wiki/index.php/Adding_and_importing_books

    so you can probably just sign up there and then (hopefully; I haven't tried it) import from Amazon and export the ISBNs. Anyway, with the script I wrote there is no need to create an account at LibraryThing.

  2. It doesn't work for the "I-own-it" list because Python has to log into Amazon first (my browser being logged into Amazon is independent of that).

    I tried to use Trill / Mechanize to log into Amazon but their code is currently buggy and cannot parse Amazon's HTML correctly.

    Any help would be appreciated -- I have 65 pages of owned books that I want to export! :)

  3. I don't quite understand your comment "python has to log into Amazon first". My solution really is to download the relevant HTML files by hand while being logged in with your browser. Only after that will you be able to use my Python script.

    Downloading the 65 pages by hand sounds painful, so in that case you might want to write a Greasemonkey script to do that. On the other hand, I would most certainly do this by hand.

  4. Thanks very much for keeping this online. I've just run into this problem and was able to use your script to export all my purchased books and import them all into Google Books.

    I did make some small changes to your script, for example I changed the regex to

    asinRegExString = "<td id=\"iyrListCount([A-Z0-9]{10})"

    and put brackets around the print statements to make it work with Python 3+.
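
Since the markup evidently changed between the post and the last comment, one way to avoid editing the script each time is to keep a list of known patterns and use the first one that matches. This is only a sketch: the two patterns are the ones quoted in the post and in the comment above, the names `PATTERNS` and `extract_asins` are made up for this example, and neither pattern is guaranteed to match what Amazon serves today.

```python
import re

# Patterns taken from the post and from the comment above; both depend on
# Amazon's markup at the time and may need adjusting again.
PATTERNS = [
    re.compile(r'<tr valign=middle id="iyrListItem([A-Z0-9]{10})">'),
    re.compile(r'<td id="iyrListCount([A-Z0-9]{10})'),
]


def extract_asins(html):
    """Return the matches of the first pattern that finds anything."""
    for pattern in PATTERNS:
        found = pattern.findall(html)
        if found:
            return found
    return []
```

If a saved page matches neither pattern, the function returns an empty list, which is a hint that a new pattern has to be added.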