Tom MacWright

tom@macwright.com

Recently

Stand up Berkeley, Stand up

Reading

  • I went to a Logic magazine event at City Lights bookstore, glanced to my right, saw The Banjo: America’s African Instrument, and immediately bought it. It’s been my reading this month, and I’m quite enjoying it. The book doesn’t follow the exciting-adventure-through-time format of similar historical non-fiction: it digs into the minute details of the instrument’s history, predecessors, and context. And, just as Nixonland was an education on the real substance of the civil rights struggle in the US, this has been an invaluable lesson on slavery. It covers many details and dynamics that my limited history education missed.

Listening

I’ve adored Way Yes for years now, and their new album gives me a much-needed hit of inspiration.

Four Tet’s new music has been making waves, partly because he shared the minimal gear he wrote the album on.

I’ve been learning and practicing classical music on the guitar recently. I would say ‘classical guitar music’, but I’m not sure that’s pedantically true, given that all of the music I like to learn was originally written for the piano. classtab.org is just spectacular for this purpose: it’s a well-organized, fast, to-the-point website with tabs for many, many pieces. I learned Ravel’s Pavane de la Belle au Bois Dormant (wiki link). I’m also learning the rest of Satie’s catalog - currently Ce Que Dit La Petite Princesse Des Tulipes, ‘What the Little Princess of Tulips Said’.

Watching

I’m working on reducing my TV consumption and replacing it with reading, running, and going to things in the area. I’m trying.

Site update: Amazon → WorldCat for books

I’ve made a bit of an update to this site: when I link to books, from now on I’m going to link to WorldCat instead of Amazon. For years, I’ve defaulted links to Amazon. It’s where a majority of people consume books anyway, it’s the only place to link to for Amazon-only eBooks, and the fees Amazon paid helped cover my domain registration & gaug.es account. Not entirely, of course - I still lose $30 or $40 a year running this site.

Anyway, I decided that I care more about good, neutral, reliable links than about absolute convenience or some attempt at profitability. In reviewing a bunch of options, I learned quite a bit about how books can be referenced and the many systems involved. ISBNs are far from the only identifier in town: there are also OCLC numbers, issued by OCLC (the cooperative behind WorldCat), Open Library identifiers, EAN codes, Library of Congress identifiers, and so on. WorldCat won out as the link target because it has high-quality data, it’s a non-profit union catalog, and its technical chops seem good: pages follow schema rules, are accessible, and are simple. I really like Open Library’s design, but its data is iffier.

For the curious, this is the script I used to mass-convert the initial batch of URLs, and then I did a separate secondary pass for harder conversions, like links to eBooks that didn’t contain an ISBN in the URL. The process is:

  • For each post, find all amzn.to URLs
  • Do a HEAD request to learn where each link points to
  • If there’s an ISBN in that redirected URL, do a HEAD request to WorldCat’s /isbn/ path to get its redirect, which is the canonical page on WorldCat.
import re
import codecs
import requests
import glob

# Raw strings avoid invalid-escape warnings, and the dots are escaped so they
# only match literal dots in the URLs.
AMZN_RE = re.compile(r"https?://amzn\.to/([0-9A-Za-z]+)")
ISBN1 = re.compile(r"https://www\.amazon\.com/(?:[A-Za-z\-]+)/dp/(\d{10})/")
ISBN2 = re.compile(r"https://www\.amazon\.com/gp/product/(\d{10})/")

def remove_amazon(filename):
    print("Translating %s" % filename)
    f = codecs.open(filename, encoding='utf-8').read()
    # Find every amzn.to short link in the post.
    for cap in AMZN_RE.finditer(f):
        url = cap.group(0)
        # Resolve the short link to the full Amazon URL it redirects to.
        redirected_to = requests.head(url, allow_redirects=True).url
        capture = ISBN1.match(redirected_to) or ISBN2.match(redirected_to)
        if capture is None:
            print("Could not capture ISBN from %s" % redirected_to)
            continue
        isbn = capture.group(1)
        # WorldCat's /isbn/ path redirects to the canonical record page.
        worldcat_permalink = requests.head("https://www.worldcat.org/isbn/%s" % isbn, allow_redirects=True).url
        print(url, worldcat_permalink)
        f = f.replace(
            url,
            worldcat_permalink
        )
    codecs.open(filename, 'w', encoding='utf-8').write(f)


for file in glob.glob('../tmcw.github.com/_posts/*.md'):
    remove_amazon(file)

I still have mild concerns about WorldCat. They maintain OCLC Control Numbers, which are public domain, but the WorldCat Search API isn’t open to regular old folks like myself. So if I were to do this conversion again, I’d either have to become a full-time librarian to get access to the API, or I’d need to scrape WorldCat’s pages to get the alternative identifiers.
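One lighter-weight possibility - assuming WorldCat permalinks keep embedding the OCLC number in their path as /oclc/<number>, which they seem to but don’t promise - would be to pull the number straight out of the redirect URL the script above already fetches. A rough sketch, where oclc_from_isbn is just an illustrative helper, not part of the script:

import re
import requests

# Assumption: WorldCat permalinks look like .../oclc/<digits>.
OCLC_RE = re.compile(r"/oclc/(\d+)")

def oclc_from_isbn(isbn):
    # Follow the same /isbn/ redirect the conversion script uses.
    permalink = requests.head(
        "https://www.worldcat.org/isbn/%s" % isbn, allow_redirects=True
    ).url
    match = OCLC_RE.search(permalink)
    return match.group(1) if match else None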

That was kind of surprising: despite so much effort spent on great indexes for books, there wasn’t a cross-reference service that would give you alternative IDs - provide an ISBN, get a Library of Congress number, and so on. Maybe I just couldn’t find it.
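If Open Library’s Books API behaves as documented - its jscmd=data responses are supposed to include an identifiers map with keys like lccn, oclc, and openlibrary - it could serve as a rough cross-reference. A sketch, with cross_reference as a made-up name:

import requests

def cross_reference(isbn):
    # Assumption: the Books API returns a dict keyed by "ISBN:<isbn>" whose
    # "identifiers" field maps names like "lccn" and "oclc" to lists of IDs.
    resp = requests.get(
        "https://openlibrary.org/api/books",
        params={"bibkeys": "ISBN:%s" % isbn, "format": "json", "jscmd": "data"},
    )
    record = resp.json().get("ISBN:%s" % isbn, {})
    return record.get("identifiers", {})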