A coworker of mine has been keeping a blog on Typepad for several years. Recently he decided he wanted to switch to WordPress on one of our servers. This is usually easy: you export your blog from the original site and import it using the appropriate plugin. WordPress provides great plugins for importing a blog from various other sources, so an import usually doesn't require much work. The current Typepad plugin has one big flaw, though: it doesn't import your photos! To make matters worse, my coworker's blog is a photography website with several hundred posts and even more photos. The photos are essentially stuck on Typepad's servers, since manually linking each picture through WordPress would be an incredibly grueling process.
I decided the only way to make this work was to modify the plugin and add the much-needed image downloading functionality. Thanks to some of WordPress' excellent built-in functions and a lightweight open source HTML parsing library, I was able to do this with relative ease. Here's how it works…
Modification overview
The original plugin does the bulk of its work in a function called process_posts(). This is where the importer reads your export file and turns it into WordPress content. Toward the end of the function, the parsed data gets assigned to the new post's fields, and that's where I tied in my image-processing code. When the plugin builds a post's body, excerpt, or extended content, it first looks at each line and pulls out any images it finds using my function get_images_from_line().
get_images_from_line() does just that: it reads a line of data and gets any images it may contain. To easily inspect each line's HTML elements, I used the PHP Simple HTML DOM Parser library by Jose Solorzano. Images are picked up from image tags and from hyperlinks with the Typepad class "asset-img-link". Next, the plugin downloads each image and adds it to WordPress' media library. If the file name is already taken, the image is given a new one. Finally, the path to the newly downloaded image replaces the old path that pointed to the Typepad-hosted file.
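For readers who want to try the idea outside WordPress, here is a rough Python sketch of the same per-line flow. It is not the plugin's PHP code. It swaps in Python's built-in html.parser for Simple HTML DOM and collects URLs from image tags and from links with the asset-img-link class. Each download is saved under a free file name, with -1, -2 and so on appended in the spirit of WordPress's duplicate-file renaming, and the line is then rewritten to point at the local copies. localize_post() stands in for the process_posts() hook point, running every line of each field through that step. All of the names here (TypepadImageFinder, unique_name(), localize_line_images(), localize_post(), disk_saver()), the 30-second timeout, and the upload URL are assumptions for illustration. Registering files with the media library is left out.

```python
import os
import posixpath
import re
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urlparse

Saver = Callable[[str, bytes], None]   # save(file_name, data)
Fetcher = Callable[[str], bytes]       # fetch(url) -> file contents


class TypepadImageFinder(HTMLParser):
    """Collect URLs from <img src="..."> and <a class="asset-img-link" href="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.urls.append(a["src"])
        elif tag == "a" and a.get("href") and "asset-img-link" in (a.get("class") or "").split():
            self.urls.append(a["href"])


def unique_name(name: str, taken: Set[str]) -> str:
    """Return name, or name-1, name-2, ... (before any extension) if it is already taken."""
    base, ext = os.path.splitext(name)
    candidate, n = name, 1
    while candidate in taken:
        candidate = f"{base}-{n}{ext}"
        n += 1
    taken.add(candidate)
    return candidate


def http_fetch(url: str) -> bytes:
    """Download one file; the timeout stops a single slow image from hanging the import."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def disk_saver(directory: str) -> Saver:
    """Return a save(name, data) callback that writes into directory."""
    def save(name: str, data: bytes) -> None:
        with open(os.path.join(directory, name), "wb") as fh:
            fh.write(data)
    return save


def localize_line_images(line: str, upload_url: str, taken: Set[str],
                         save: Saver, fetch: Fetcher = http_fetch) -> str:
    """Download every image referenced in one line and point the line at the local copies."""
    finder = TypepadImageFinder()
    finder.feed(line)
    finder.close()
    mapping: Dict[str, str] = {}
    for url in dict.fromkeys(finder.urls):  # de-duplicate, keep order
        name = unique_name(posixpath.basename(urlparse(url).path) or "image", taken)
        save(name, fetch(url))
        mapping[url] = upload_url.rstrip("/") + "/" + name
    if not mapping:
        return line
    # Longest URL first, so a URL that is a prefix of another can't clobber it.
    pattern = re.compile("|".join(re.escape(u) for u in sorted(mapping, key=len, reverse=True)))
    return pattern.sub(lambda m: mapping[m.group(0)], line)


def localize_post(fields: Dict[str, str], upload_url: str, taken: Set[str],
                  save: Saver, fetch: Fetcher = http_fetch) -> Dict[str, str]:
    """Run every line of each field (body, excerpt, extended...) through the image step."""
    return {
        key: "\n".join(localize_line_images(line, upload_url, taken, save, fetch)
                       for line in text.split("\n"))
        for key, text in fields.items()
    }
```

A real run might look like localize_post(fields, "/wp-content/uploads/2010/05", set(), disk_saver("/var/www/wp-content/uploads/2010/05")), with both paths hypothetical. Two caveats follow from the line-by-line design. A tag that the export splits across two lines won't be seen by either call. And because the parser unescapes attribute values, a URL containing an entity such as &amp; won't match the raw text and is left pointing at Typepad.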
Future modifications
If I were to modify the plugin further or add new features, I would definitely incorporate an Ajax progress bar and status system: something simple that lets the user know what's happening during the import. A large site can take a long time to import, so this would keep things user-friendly, and splitting the work into smaller requests would also avoid unnecessary server timeouts on large files.
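The server side of such a progress system could look roughly like the Python sketch below (the plugin itself is PHP). The import keeps a queue of pending images. Each request works through a small batch and returns a status payload that a front-end progress bar could poll, so no single request runs long enough to hit a timeout. ImportJob, step(), the batch size, and the payload keys are hypothetical. Persisting the job between requests, for example in the database, is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImportJob:
    """Queue of image URLs worked through a few at a time, so that each
    (hypothetical) Ajax request does a short, bounded amount of work."""
    pending: List[str]
    done: int = 0
    failed: List[str] = field(default_factory=list)

    def step(self, handle: Callable[[str], None], batch_size: int = 5) -> dict:
        """Process up to batch_size images and return a status payload for the progress bar."""
        batch, self.pending = self.pending[:batch_size], self.pending[batch_size:]
        for url in batch:
            try:
                handle(url)
                self.done += 1
            except OSError:
                # Download or write failed (URLError is an OSError): record it and keep going.
                self.failed.append(url)
        processed = self.done + len(self.failed)
        total = processed + len(self.pending)
        return {
            "processed": processed,
            "total": total,
            "percent": 100 if total == 0 else round(100 * processed / total),
            "finished": not self.pending,
        }
```

Recording failures instead of aborting means one unreachable image doesn't stop the migration, and the failed list can be shown to the user at the end.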
6 responses to “WordPress: Movable Type and TypePad Import Plugin + Image Downloader”
Awesome! Thank you!
Unfortunately, I get an exception 🙁 “The file cannot be saved.”
This could be the result of a few different issues.
1. Check how your server is configured; PHP might not have write permission to the destination directory.
2. Though I've gotten this project to work on small sites, in its current form it will time out on large images or on sites with many images. Ideally, each image should be pulled with its own asynchronous request; I just never got around to implementing that. You can get around this by raising PHP's timeout limits on your server, though I'd do this only as a temporary fix for the migration and reset the limits afterward.
This is awesome; it saved me a lot of time. I've made a slight modification: you have a media_process function that's never called, so I'm using it to add the images to the media library, then picking up the attachment IDs and linking them to the original posts.
I'm also changing -popup URLs to -pi, which seems to fetch the largest image size.
Let me know if you’d like me to add the changes via a fork.
That sounds great! Thanks for pushing the code further.