During a recent clear-out, I decided to get rid of all my CDs, most of which I had already ripped to MP3. About 50 had survived previous clear-outs, and some were rather good and hard to come by (though nowadays easy to get on Amazon). It seemed a shame to just throw them out, so I looked for a way to find new homes for as many of them as I could.
My initial idea was to take a photo of each cover, make a simple HTML image gallery, and type in the artist and album names by hand. Too much work, so my inner lazy programmer took over (and probably spent more time coding than it would have taken to type everything in).
Barcodes, of course, are ideal. I remember that a few years ago some company developed a cataloguing app (for Mac?) that required you to buy a barcode scanner, but that was before smartphones. So I went looking and found an awesome Android app, Barcode Scanner from the ZXing Team. It scans reliably even in low light (though you need a reasonably good smartphone camera that can focus quite close; my Galaxy Nexus worked well). The scanned data can be exported as a CSV (comma-separated values) file via email or to your Dropbox. So far so good: I now had a mixture of UPC (12-digit) and EAN (13-digit) codes.
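Since the app exports raw digits, it can be worth catching a mis-scan before doing any lookups. UPC-A and EAN-13 codes both end in a check digit computed the same way (weights of 3 and 1 alternating leftwards from the rightmost data digit), so one function covers both. A minimal sketch; the function names are illustrative, not part of the project:

```python
from typing import Optional


def gtin_check_digit(data_digits: str) -> int:
    """Return the check digit for the given GTIN data digits (everything but the last digit)."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost data digit.
    for i, ch in enumerate(reversed(data_digits)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def classify_barcode(code: str) -> Optional[str]:
    """Return 'UPC' for a valid 12-digit code, 'EAN' for a valid 13-digit code, else None."""
    code = code.strip()
    if len(code) not in (12, 13) or not all(c in "0123456789" for c in code):
        return None
    if gtin_check_digit(code[:-1]) != int(code[-1]):
        return None  # check digit mismatch: probably a mis-scan
    return "UPC" if len(code) == 12 else "EAN"
```

For example, classify_barcode('643443104920') returns 'UPC', while changing its last digit to 1 makes it return None.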
There are a few ways to get CD information from the barcodes. Sites like Yoopsie, upc-search.org and upcdatabase.com (and their EAN equivalents) offer web-based access and paid API access to their product databases. Being a cheapskate, I tried running web searches in Python and scraping the returned HTML (with lxml and requests), but the sites block crawlers, even with a modified User-Agent string.
A rather elegant solution is the Amazon Product Advertising API. With it and the Python module python-amazon-simple-product-api (installable with pip), and after putting your AWS keys and associate tag in AWS.py, searching becomes easy:
    In : from amazon.api import AmazonAPI
    In : import AWS
    In : amazon_co_uk = AmazonAPI(AWS.AWSAccessKeyId, AWS.AWSSecretKey, AWS.AWSAssociateTag, region='UK')
    In : ItemId='643443104920'
    In : product = amazon_co_uk.lookup(SearchIndex='All', IdType='UPC', ItemId=ItemId)
    In : product.title
    Out: 'Who Can You Trust / Beats & B-Sides'
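For a whole collection, the single lookup above can be wrapped in a loop. This is a minimal sketch, assuming the lookup call accepts the same keyword arguments as in the session above and returns an object with title and medium_image_url. The function name lookup_all is hypothetical, and catching any exception as "unknown" is a simplification, since the library's specific exception types aren't covered here. Passing the lookup function in as a parameter keeps the sketch testable without AWS credentials, and the optional delay is there in case the API throttles rapid requests.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

Result = Tuple[str, Optional[str], Optional[str]]


def lookup_all(codes: Iterable[str], lookup: Callable, delay: float = 0.0) -> List[Result]:
    """Look up each barcode, returning (code, title, image_url); title and url are None on failure."""
    results: List[Result] = []
    for code in codes:
        # 12-digit codes are UPC-A, 13-digit codes are EAN-13
        id_type = "UPC" if len(code) == 12 else "EAN"
        try:
            product = lookup(SearchIndex="All", IdType=id_type, ItemId=code)
        except Exception:
            # Any failure (not found, throttling, network) is recorded as an "unknown" CD
            results.append((code, None, None))
        else:
            results.append((code, product.title, product.medium_image_url))
        if delay:
            time.sleep(delay)  # pause between requests
    return results
```

With the session above, this would be called as lookup_all(codes, amazon_co_uk.lookup, delay=1.0).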
Depending on the product type, another object becomes available, product.item.ItemAttributes, which in this case holds CD-specific data such as the artist, number of discs, label and release date. You also have access to the cover images:
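Not every product carries every attribute, so it can help to read the fields defensively. A small sketch, assuming product.item.ItemAttributes exposes its child elements as Python attributes and that the CD fields are named Artist, Label, NumberOfDiscs and ReleaseDate; the helper cd_details is hypothetical:

```python
from typing import Dict, Iterable, Optional

# Assumed attribute names for CD products
CD_FIELDS = ("Artist", "Label", "NumberOfDiscs", "ReleaseDate")


def cd_details(item_attributes, fields: Iterable[str] = CD_FIELDS) -> Dict[str, Optional[str]]:
    """Return the requested fields as strings, with None for any that are missing."""
    details: Dict[str, Optional[str]] = {}
    for name in fields:
        value = getattr(item_attributes, name, None)  # missing attribute -> None
        details[name] = None if value is None else str(value)
    return details
```

Usage would be cd_details(product.item.ItemAttributes).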
    In : product.medium_image_url
    Out: 'http://ecx.images-amazon.com/images/I/413FM7Z5X3L._SL160_.jpg'
The script, barcodes.py, looks up each scanned code and writes a raw HTML page to stdout, based on the Jinja2 template in index.html.template (which links to bootstrapcdn for the stylesheet). Upload this file to your webserver.
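A rough sketch of the rendering step, using a much-simplified, hypothetical template in place of the real index.html.template and assuming each CD is a dict with code, title and image keys. Autoescaping is switched on so that characters like the & in "Beats & B-Sides" come out as valid HTML, and CDs without a title are shown as unknown.

```python
from jinja2 import Environment

# Hypothetical stand-in for index.html.template
TEMPLATE = """<!DOCTYPE html>
<html>
<head><link rel="stylesheet" href="{{ css_url }}"></head>
<body>
{% for cd in cds %}
  <div class="cd">
    {% if cd.image %}<img src="{{ cd.image }}" alt="">{% endif %}
    <p>{{ cd.title or "Unknown CD (" ~ cd.code ~ ")" }}</p>
  </div>
{% endfor %}
</body>
</html>
"""


def render_page(cds, css_url: str) -> str:
    """Render the list of CD dicts into a single HTML page."""
    env = Environment(autoescape=True)  # escape &, <, > in titles
    return env.from_string(TEMPLATE).render(cds=cds, css_url=css_url)
```

Writing the page to stdout is then sys.stdout.write(render_page(cds, css_url="style.css")), with "style.css" as a placeholder.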
The whole process works like this:
- pip install python-amazon-simple-product-api jinja2
- git clone https://github.com/ciaron/cd-indexer.git
- scan the barcodes using the Android app
- export the scanned data (e.g. barcode.csv) to a file via email or Dropbox
- python barcodes.py < barcode.csv > index.html
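The first step of that pipeline, reading the exported CSV from stdin, might look like the sketch below. It assumes the barcode text is the first field of each row; the exact column layout of the app's export may differ. Rows that aren't a 12- or 13-digit code (such as a header) are skipped, and repeat scans of the same CD are dropped. The function name read_barcodes is hypothetical.

```python
import csv
from typing import Iterable, List


def read_barcodes(lines: Iterable[str]) -> List[str]:
    """Return unique 12/13-digit codes from the first CSV column, in scan order."""
    seen = set()
    codes: List[str] = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line
        code = row[0].strip()
        # Skip header rows or anything that isn't a plain 12- or 13-digit code
        if len(code) not in (12, 13) or not all(c in "0123456789" for c in code):
            continue
        if code not in seen:  # the same CD may have been scanned twice
            seen.add(code)
            codes.append(code)
    return codes
```

In the script this would be called as read_barcodes(sys.stdin).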
The finished result can be seen here. There are still some glitches (such as handling “unknown” CDs), and I'd like to make it more automatic (ideally scanning straight to the web) and to handle other product categories (books, perhaps).
If you'd like to play around with the code, branch it, fork it or do whatever you like with it: the source is up on GitHub here.