A coworker of mine has kept a blog on Typepad for several years. Recently he decided to switch to WordPress on one of our servers. This is usually easy: you export your blog from the original site and import it using the appropriate plugin. WordPress provides great plugins for importing a blog from various other sources, so an import usually doesn’t require much work. The current Typepad plugin has one big flaw, though: it doesn’t import your photos! To make matters worse, my coworker’s blog is a photography website with several hundred posts and even more photos. The photos are essentially stuck on Typepad’s servers, since manually re-linking each picture through WordPress would be an incredibly grueling process.
I decided the only way to make this work was to modify the plugin and add the much-needed image downloading functionality. Thanks to some of WordPress’ excellent built-in functions and a lightweight open source HTML parsing library, I was able to do this with relative ease. Here’s how it works…
The original plugin does the bulk of its work in a function called process_posts(). This is where the importer reads your export file and turns it into WordPress content. Towards the end of the function, the parsed data is assigned to new post elements, and that’s where I tied in my image processing code. When the plugin builds a post’s body, excerpt, or extended content, it first looks at each line and pulls out any images it finds using my function get_images_from_line().
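To make the hook point a little more concrete, here’s a rough sketch of the idea in plain PHP. This isn’t the plugin’s actual code: process_post_fields(), the array keys, and the example URL are all made up for illustration, and get_images_from_line() is only a stub that hands the line back unchanged.

```php
<?php
// Hypothetical sketch. Only the name get_images_from_line() comes from the
// post; its body here is a stub, and everything else is invented.

/**
 * Stub standing in for the real image handler. It returns the line
 * unchanged; the real one downloads images and rewrites their URLs.
 */
function get_images_from_line(string $line): string
{
    return $line;
}

/**
 * Pass every line of the body, excerpt, and extended content through the
 * given handler before the post is saved. Other keys are left alone.
 */
function process_post_fields(array $post, callable $handler): array
{
    foreach (['body', 'excerpt', 'extended'] as $field) {
        if (!isset($post[$field])) {
            continue; // this post has no such field
        }
        $lines = explode("\n", $post[$field]);
        $post[$field] = implode("\n", array_map($handler, $lines));
    }
    return $post;
}

// Usage: run one parsed post through the handler.
$post = [
    'title' => 'Sunset',
    'body'  => "<p>First line</p>\n<img src=\"http://example.typepad.com/a.jpg\">",
];
$post = process_post_fields($post, 'get_images_from_line');
```

Passing the handler in as a callable keeps the sketch easy to try with any per-line function; the real plugin may simply call get_images_from_line() directly.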
get_images_from_line() does just that: it reads a line of data and gets any images it may contain. To make it easy to look at each line’s HTML elements, I used the PHP Simple HTML DOM Parser library by S.C. Chen, which is based on Jose Solorzano’s HTML Parser for PHP 4. Images are pulled in from image tags and from hyperlinks carrying the Typepad class “asset-img-link”. Next, the plugin downloads each image and adds it to WordPress’ media library. If the file name is already taken, the image is given a new one. Finally, the path to the newly downloaded image replaces the old path that pointed to the Typepad-hosted file.
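The steps above can be sketched roughly like this. A few loud caveats: this is not the plugin’s code, it uses PHP’s built-in DOMDocument (assuming the DOM extension is enabled) rather than Simple HTML DOM Parser, and it skips the actual download and media library step entirely. The function names, the example URLs, and the /wp-content/uploads base path are all assumptions for illustration.

```php
<?php
// Sketch only: function names and example URLs are hypothetical.

/**
 * Collect image URLs from one line of HTML: the src of every <img> and the
 * href of every <a> whose class list includes "asset-img-link".
 */
function find_typepad_images(string $line): array
{
    if (trim($line) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect parser warnings quietly; export lines are HTML fragments.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($line);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('src') !== '') {
            $urls[] = $img->getAttribute('src');
        }
    }
    foreach ($doc->getElementsByTagName('a') as $link) {
        $classes = preg_split('/\s+/', trim($link->getAttribute('class')));
        if (in_array('asset-img-link', $classes, true) && $link->getAttribute('href') !== '') {
            $urls[] = $link->getAttribute('href');
        }
    }
    return array_values(array_unique($urls));
}

/** Return $name, or $name with -1, -2, ... appended until it is unused. */
function unique_filename(string $name, array $taken): string
{
    $info = pathinfo($name);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $candidate = $name;
    $n = 1;
    while (in_array($candidate, $taken, true)) {
        $candidate = $info['filename'] . '-' . $n . $ext;
        $n++;
    }
    return $candidate;
}

/**
 * Swap each Typepad image URL in the line for a local one under $newBase.
 * Plain string replacement: URLs containing HTML entities would need
 * extra care, which this sketch does not attempt.
 */
function rewrite_image_urls(string $line, string $newBase, array &$taken): string
{
    foreach (find_typepad_images($line) as $oldUrl) {
        $base = basename((string) parse_url($oldUrl, PHP_URL_PATH));
        if ($base === '') {
            continue; // no usable file name in this URL
        }
        $name = unique_filename($base, $taken);
        $taken[] = $name;
        // A real importer would download $oldUrl and store it as $name here.
        $line = str_replace($oldUrl, $newBase . '/' . $name, $line);
    }
    return $line;
}

// Usage: "sunset.jpg" already exists, so the linked image gets a new name.
$taken = ['sunset.jpg'];
$line = '<a class="asset-img-link" href="http://example.typepad.com/photos/sunset.jpg">'
      . '<img src="http://example.typepad.com/photos/sunset-thumb.jpg"></a>';
echo rewrite_image_urls($line, '/wp-content/uploads', $taken), "\n";
```

Inside WordPress itself, helpers such as wp_unique_filename() and media_handle_sideload() cover similar ground for naming and registering files, though I can’t say from the description alone which ones the modified plugin calls.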
If I were to modify the plugin further, I would absolutely add an Ajax progress bar and status system: something simple that lets the user know what’s happening during the import. A large site can take a long time to import, so this would keep things user-friendly and also help avoid unnecessary server timeouts from large files.