At the end of the reboot conference this year I ended up walking to dinner with Christian Dalager and I mentioned that my “one actionable idea” contributed at the end of the conference was to create an RSS feed of my banking transactions.
As it turned out, I was talking to the right man: not only was Christian interested in this notion, but he had already done work to parse his own Danish bank’s electronic data (in his case he had to screen-scrape, a considerably more ambitious task than I hoped to require).
My first thought when approaching the task was simply to grab the data in the handy comma-delimited ASCII file format that’s available under “Activity Format” on the “Account Activity Request” page of the site. Unfortunately, the transaction data this provides only includes “metadata” (the actual place I made a debit card payment, for example) for the current month’s transactions; before then it’s only dates and amounts. This would be fine if I wanted to start my data with the current month (I could just make sure I grabbed the data before it lost its metadata layer at the end of the month), but I want to go back in time as far as possible.
Fortunately there’s another option: MemberDirect now makes “Electronic Statements” available for download going back, in my case, to the start of 2007. While the statements are PDF files, and therefore require some additional parsing to grab transaction data, they are metadata-rich and thus the best candidate for my project.
Rather than going through the process of downloading the 29 PDF files currently available to me by hand, I decided to write a script that would log in to my MemberDirect account and grab all available statements automatically. Fortunately this proved to be quite easy: the MemberDirect authentication model simply accepts an HTTP POST with the username and password, and sends back a session cookie that can then be sent along with all future requests.
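To illustrate the POST-then-cookie flow described above, here is a minimal sketch in Python using only the standard library, rather than the PHP and cURL my actual script uses. The login URL and the form field names (`username`, `password`) are placeholders I've made up for the example; the real ones depend on your credit union's login page.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical value: the real login URL is not given in this post and
# will differ from one credit union to the next.
LOGIN_URL = "https://banking.example.com/login"


def make_session():
    """Return an opener that stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build the HTTP POST carrying the credentials as a form-encoded body.

    The field names "username" and "password" are assumptions.
    """
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(LOGIN_URL, data=body.encode("ascii"), method="POST")


def log_in(opener, username, password):
    """Send the login POST; any session cookie in the response lands in the jar."""
    with opener.open(build_login_request(username, password)) as response:
        return response.status
```

Once `log_in` has run, every later `opener.open(...)` call sends the stored session cookie automatically, which is the same job cURL's cookie handling does in the PHP version.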
After some experimenting and parsing of the HTML that gets returned, I’ve come up with memberdirect-getstatements.php, a PHP script that uses cURL and the PHP Simple HTML DOM Parser to log in, grab the index of available years, and then download all available PDF electronic statements.
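The link-harvesting step can be sketched roughly like this, again in Python with the standard library instead of the PHP Simple HTML DOM Parser. It assumes that statements appear as ordinary `<a href="...">` links whose targets end in `.pdf`; the real MemberDirect pages may well be structured differently, and the URLs below are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: statement links end in ".pdf" with no query string.
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def find_statement_links(html, base_url):
    """Return absolute URLs for all PDF links found in an HTML page."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Running `find_statement_links` over each year's index page would give the list of PDFs to fetch with the logged-in session.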
This script isn’t for the faint of heart: you’ll need a PHP-equipped host with cURL (if you have a modern Mac you’ve already got both) and it’s possible that the script might need tweaking for credit unions that aren’t mine.
More information is (or shortly will be) available on this wiki page. Comments and experiences welcome.
My next step is to write a parser for the Electronic Statements that will allow me to do some basic “how much am I spending on coffee”-type data analysis.
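As a rough idea of what that analysis might look like once transactions are parsed, here is a small Python sketch. The rows are made-up sample data, and the `(date, description, amount)` shape is my assumption for illustration, not the actual statement layout.

```python
from decimal import Decimal

# Made-up sample rows: (date, description, amount). These are not real
# transactions, and the real statement layout is not described here.
SAMPLE_TRANSACTIONS = [
    ("2008-07-02", "POS PURCHASE EXAMPLE COFFEE HOUSE", Decimal("3.25")),
    ("2008-07-03", "POS PURCHASE EXAMPLE GROCERY", Decimal("41.80")),
    ("2008-07-05", "POS PURCHASE EXAMPLE COFFEE HOUSE", Decimal("4.10")),
]


def total_matching(transactions, keyword):
    """Sum the amounts of transactions whose description mentions keyword."""
    needle = keyword.lower()
    return sum(
        (amount for _, description, amount in transactions
         if needle in description.lower()),
        Decimal("0"),
    )
```

Calling `total_matching(SAMPLE_TRANSACTIONS, "coffee")` adds up the two coffee purchases. `Decimal` is used rather than floats so that currency amounts don't pick up rounding errors.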
Update: you can grab the statement parsing code here to extract transaction data from the electronic statements into a MySQL table.