One of my friends reached out to me last week for help. The photographer she hired for her wedding had gone bankrupt. Since she never received her album, she now can't get her wedding photos in either print or digital form. All she has access to is the photographer's website, where people can order prints. She was hoping I could somehow retrieve all the photos (~1220) from the site so that she would at least have digital copies of her wedding photos.
After poking around the site a bit, I found out that I was in luck. Sort of. All the photos are accessible, but only in low resolution (500 x 366). More importantly for me, the site developer had opted for a nice AJAX interface and, for some reason, decided to embed all the image URLs in the page!
So a plan of action quickly formed in my head:
1. Download the page source
2. Extract the image URLs from all the HTML/JavaScript code
3. Download all the images, one by one
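Step 2 is the fiddly one. Below is a minimal Python sketch of it, not the PowerShell script I actually used. It assumes the image URLs appear as absolute http(s) links ending in a common image extension, and that URLs inside embedded JavaScript/JSON may have their slashes escaped as `\/`. The pattern and the name `extract_image_urls` are hypothetical; the real site's markup would dictate the exact regex.

```python
import re

# Assumed pattern: an absolute http(s) URL, no whitespace/quotes/angle brackets,
# ending in .jpg/.jpeg/.png/.gif (case-insensitive). Greedy matching plus the
# trailing \b keeps host names like "img.gifts.com" from ending the match early.
IMAGE_URL_RE = re.compile(
    r"""https?://[^\s"'<>]+\.(?:jpe?g|png|gif)\b""",
    re.IGNORECASE,
)


def extract_image_urls(page_source: str) -> list[str]:
    """Return the unique image URLs in the order they first appear."""
    # JSON embedded in JavaScript often escapes "/" as "\/"; undo that first.
    text = page_source.replace("\\/", "/")
    seen: dict[str, None] = {}  # dict preserves insertion order (Python 3.7+)
    for match in IMAGE_URL_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)
```

Feeding it the saved page source (step 1) yields the list of URLs to hand to the download step; de-duplicating matters because AJAX pages often repeat the same URL for thumbnails and viewers.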
Initially I wanted to try out Automator on my Mac to see how easy this would be, since this is exactly the type of repetitive task Steve Jobs told us Automator is perfect for! Unfortunately, after poking around for 30 minutes, I came to the conclusion that Automator is woefully inadequate for this task. So I turned to PowerShell on my Windows VM instead. Having used the PowerShell 2.0 CTP a lot in my last project, I was able to quickly write a script that extracts the URLs from the page source, downloads each file, and saves it to disk. In fact, the most time-consuming part was figuring out the regular expression for URL extraction!
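For the download-and-save part, here is a rough Python equivalent of what such a script does, again a sketch rather than my original PowerShell. The names `local_filename` and `download_all` are hypothetical, and it assumes the image URLs are directly fetchable with no login or cookies.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_filename(url: str, index: int) -> str:
    """Zero-padded index plus the URL's base name, so files sort in page order."""
    base = os.path.basename(urlparse(url).path) or "image.jpg"
    return f"{index:04d}_{base}"


def download_all(urls: list[str], out_dir: str) -> None:
    """Fetch each URL one by one and write the bytes under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls, start=1):
        target = os.path.join(out_dir, local_filename(url, i))
        if os.path.exists(target):
            continue  # already saved on an earlier run, so a rerun resumes
        with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as out:
            out.write(resp.read())
```

Prefixing the index avoids collisions if two photos share a base name, and skipping existing files means a dropped connection partway through ~1220 downloads doesn't force starting over.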
From start to finish, it took me around 90 minutes to get all the photos downloaded. If I hadn't wasted time on Automator and were better at regular expressions, I think I could have done it in 15 minutes!
Now my friend has her wedding photos. She may not be able to print them, but at least she can view them on a computer screen.