CD indexing with barcodes and Python

During a recent clear-out of stuff, I decided to get rid of all my CDs, most of which I had already ripped to MP3. I had about 50 which had survived previous clear-outs, and some were rather good and hard to come by (though nowadays easy to get on Amazon). It seemed a shame to just throw them out, so I decided to look for a way to find new homes for as many of them as I could.

Initial ideas were to take a photo of the cover and simply make an image gallery HTML page, and manually type in the artist and album name. Too much work, so my inner lazy programmer took over (and probably spent more time on coding than it would have taken to type stuff in).

Barcodes, of course, are ideal. I remember that a few years ago some company developed a cataloguing app (for Mac?) that required you to buy a barcode scanner, but that was before smartphones. So I went looking and found an awesome Android app, Barcode Scanner from the ZXing Team. It scans reliably even in low light, though you need a reasonably good smartphone camera that can focus quite close (my Galaxy Nexus worked well). The scanned data can be exported as a CSV (comma-separated values) file via email or to your Dropbox. So far so good: I have a mixture of UPC (12-digit) and EAN (13-digit) codes.
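Since the scans are a mix of 12-digit UPC and 13-digit EAN codes, it can help to sanity-check each one before looking it up. The following is a minimal sketch and not part of the original script; the function names gtin_check_digit and is_valid_barcode are made up for illustration. It applies the standard check-digit rule shared by both code types: weight the digits 3, 1, 3, ... starting from the right of the payload.

```python
def gtin_check_digit(payload):
    """Return the check digit for a UPC/EAN payload (the code minus its last digit)."""
    total = 0
    # Walk the payload from right to left; weights alternate 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_barcode(code):
    """True if code is a 12-digit UPC or 13-digit EAN with a correct check digit."""
    code = code.strip()
    if not (code.isascii() and code.isdigit()) or len(code) not in (12, 13):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


if __name__ == "__main__":
    # The UPC used in the lookup example in this post.
    print(is_valid_barcode("643443104920"))
```

A misread scan will usually fail this test, so it is a cheap filter to run before spending an API call on a code.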

To get CD information from the barcodes, there are a few methods. There are sites like Yoopsie, upc-search.org and upcdatabase.com (and their EAN equivalents) which allow web-based access and pay-for API access to databases of products. Being a cheapskate, I tried running web searches in Python and scraping the returned HTML (import lxml, requests), but crawlers get blocked, even with modified User-Agent strings.

A rather elegant solution is to be found in the form of the Amazon Product Advertising API. With this and the Python module python-amazon-simple-product-api (installable with pip), and after putting your AWS keys and associate tag in AWS.py, searching becomes easy:

In [1]: from amazon.api import AmazonAPI
In [2]: import AWS
In [3]: amazon_co_uk = AmazonAPI(AWS.AWSAccessKeyId, AWS.AWSSecretKey, AWS.AWSAssociateTag, region='UK')
In [4]: ItemId='643443104920'
In [5]: product = amazon_co_uk.lookup(SearchIndex='All', IdType='UPC', ItemId=ItemId)
In [6]: product.title
Out[6]: 'Who Can You Trust / Beats & B-Sides'
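
The session above looks up a 12-digit UPC. For 13-digit codes, the Product Advertising API also accepts IdType='EAN'. A small helper can pick the type from the code length. Note that lookup_kwargs is a hypothetical name, not part of the library, and the sketch only builds the keyword arguments so that it runs without AWS keys.

```python
def lookup_kwargs(code):
    """Build keyword arguments for AmazonAPI.lookup() from a scanned barcode."""
    code = code.strip()
    if len(code) == 12:
        id_type = "UPC"
    elif len(code) == 13:
        id_type = "EAN"
    else:
        raise ValueError("expected a 12- or 13-digit barcode, got %r" % code)
    return {"SearchIndex": "All", "IdType": id_type, "ItemId": code}


# Usage, assuming amazon_co_uk was created as in the session above:
#   product = amazon_co_uk.lookup(**lookup_kwargs("643443104920"))
```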

Depending on the product type, another object becomes available, product.item.ItemAttributes, which in this case has CD-specific data, like artist, number of CDs, label, release date. You also have access to cover images:

In [9]: product.medium_image_url
Out[9]: 'http://ecx.images-amazon.com/images/I/413FM7Z5X3L._SL160_.jpg'

The output (to stdout) is a raw HTML file based on the Jinja2 template in index.html.template (with links to bootstrapcdn for the stylesheet). Upload this file to your webserver.
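
To give an idea of the templating step, here is a minimal sketch of rendering an album list with Jinja2. The inline template, the render_index function and the dictionary keys are illustrative assumptions; the real index.html.template in the repository will differ.

```python
from jinja2 import Environment

# Illustrative inline template; the real project loads index.html.template.
TEMPLATE_SOURCE = """<ul>
{% for cd in cds %}  <li><img src="{{ cd.image }}" alt=""> {{ cd.title }}</li>
{% endfor %}</ul>"""


def render_index(cds):
    """Render a list of dicts (keys 'title' and 'image') to an HTML fragment."""
    env = Environment(autoescape=True)  # escape titles containing & or <
    return env.from_string(TEMPLATE_SOURCE).render(cds=cds)


if __name__ == "__main__":
    print(render_index([
        {"title": "Who Can You Trust / Beats & B-Sides",
         "image": "http://ecx.images-amazon.com/images/I/413FM7Z5X3L._SL160_.jpg"},
    ]))
```

Autoescaping is switched on here because album titles often contain an ampersand, which would otherwise end up unescaped in the HTML.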

The whole process works like this:

  1. pip install python-amazon-simple-product-api jinja2
  2. git clone https://github.com/ciaron/cd-indexer.git
  3. scan the barcodes using the Android app
  4. export the scanned data (e.g. barcode.csv) to a file via email or Dropbox
  5. python barcodes.py < barcode.csv > index.html
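
Step 5 reads the exported CSV on stdin. A minimal sketch of that input stage follows. It assumes the barcode sits in the first column of each row, which may not match the app's actual export layout; adjust column if it does not. The name read_barcodes is hypothetical and not taken from barcodes.py.

```python
import csv
import io


def read_barcodes(stream, column=0):
    """Return the digit-only values found in one column of a CSV stream."""
    codes = []
    for row in csv.reader(stream):
        if len(row) <= column:
            continue  # skip blank or short rows
        value = row[column].strip()
        if value.isdigit():  # ignores a header row or non-numeric scans
            codes.append(value)
    return codes


if __name__ == "__main__":
    # Made-up sample data; in the real pipeline you would pass sys.stdin.
    sample = io.StringIO("643443104920,example\n")
    print(read_barcodes(sample))
```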

The finished result can be seen here. There are still some glitches (like processing “unknown” CDs), and I’d like it to be more automatic (ideally scan straight to web), and to handle other product categories (books, perhaps).

If you’d like to play around with the code, branch it, fork it, do whatever, the source is up on GitHub here.


links of the week – 22 July 2014

Sorry for the gap in posting. Here are my links of the week:

I’ve been getting my music playing setup working with Music Player Daemon (mpd). Several clients exist, but I’m using Theremin on Mac OS X and Gnome Music Player Client on Linux (despite the name, this one works on Windows too).

An Illustrated Book of Bad Arguments.

Battleship Rangefinders and Geometry.

links of the week

Secret Thirteen Mix 020 – Machinefabriek

A lovely ambient (and, as the blurb says, lethargic) mix from Machinefabriek.

Secret Thirteen Mix 020 – Machinefabriek | Secret Thirteen – Infinite Music and Art Journal.

Tech Tip: Extract Pages From a PDF | Linux Journal

Open source to the rescue! If you need to extract a range of pages from a PDF file, here’s how, assuming you have pdftops (from poppler-utils or xpdf), psselect (from psutils) and ps2pdf14 (from Ghostscript) available on your system. For example, to extract pages 22 to 36 to a new PDF:

$ pdftops 100p-inputfile.pdf - | psselect -p22-36 | ps2pdf14 - outfile_p22-p36.pdf
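
If you would rather drive the same pipeline from a script, the sketch below chains the three commands with Python's subprocess module. The names page_range_commands and extract_pages are illustrative; the tools themselves must still be installed, and only the command-building part runs without them.

```python
import subprocess


def page_range_commands(infile, first, last, outfile):
    """Return the argv lists for pdftops | psselect | ps2pdf14."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return [
        ["pdftops", infile, "-"],                 # PDF -> PostScript on stdout
        ["psselect", "-p%d-%d" % (first, last)],  # keep only the page range
        ["ps2pdf14", "-", outfile],               # PostScript -> PDF 1.4
    ]


def extract_pages(infile, first, last, outfile):
    """Run the pipeline, connecting each stdout to the next command's stdin."""
    commands = page_range_commands(infile, first, last, outfile)
    procs = []
    previous = None
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=previous.stdout if previous else None,
            stdout=None if is_last else subprocess.PIPE,
        )
        if previous:
            previous.stdout.close()  # so earlier stages get SIGPIPE on failure
        procs.append(proc)
        previous = proc
    return [p.wait() for p in procs]  # exit codes, one per stage
```

Calling extract_pages("100p-inputfile.pdf", 22, 36, "outfile_p22-p36.pdf") should be equivalent to the one-liner above.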

Tech Tip: Extract Pages From a PDF | Linux Journal.

More tips here.

links of the week

  • Shipping pallets

    Take any object you like, pile it onto a pallet, and it becomes, simply, a “unit load”—standardized, cubical, and ideally suited to being scooped up by the tines of a forklift.

  • The Aviator’s Heart

    Amidst hangars full of airplanes and aviation memorabilia, visitors to Brazil’s National Air and Space Museum encounter a much stranger object. It is a gold plated celestial globe, supported by a marble statue of an Icarus-like figure with its arms raised skyward. There is a human heart inside the globe, preserved in formaldehyde. The heart of a man called Alberto Santos-Dumont. Brazilians consider him to be the true inventor of the airplane.

  • Been playing with SuperCollider this week.
  • LCD Module control using Python: Raspberry Pi -> Hitachi HD44780 LCD controller. More GPIO things.
  • Demonstration of herd immunity.