I do a *lot* of web scraping and automation with my marketing tools. Some of it is very advanced. For example, I get into some pretty deep stuff with Site Sniper Pro to be able to determine the ad region location on the page.
Every once in a while I stumble across a tool so good that I just have to share it. Well… I’ve stumbled 😉
I usually use a program on my Mac called Scoop to intercept HTTP requests so I can see what’s happening behind the scenes of web requests. If you’re not using something similar and you do a lot of web scraping or web automation then you’re wasting a *ton* of time.
WARNING: Geek Alert! Ok… before I go any further I should warn you… this is not a beginner’s post. If you don’t know what web scraping is or don’t do home-grown automation then you might want to skip this post. It’s more technical than most of my posts. I’m assuming you have at least a basic knowledge of what I’m talking about in the post or it won’t make sense. You’ve been warned. If you have questions feel free to post in the comments.
Back to the story… The 1 HUGE problem with Scoop is that it can’t handle secure HTTP (HTTPS) requests very well. This is a common problem with packet sniffer type software. It sees the request too late in the game, after the browser has already encrypted it, so all I can see is the encrypted stream. Typically this is good news. It makes it difficult for unwanted guests to eavesdrop on my browsing session. But when I’m trying to deconstruct a complex web request it just blows.
I’ve been able to get around it in the past using an overly complex system that required entirely too much work on my part.
But recently I started a project where I wanted to create my own front-end for Google’s keyword research tool. Their entire session is secure. And it uses nothing but AJAX and JavaScript to do its work. I was really struggling with my traditional methods to deconstruct the browser conversation and duplicate it with my software.
After a bit of googling and trying lots of different solutions I finally found a tool that just ROCKS for what I’m trying to do. It’s called IE Inspector (available from ieinspector.com – not an affiliate link). It runs on Windows and intercepts requests from all the major browsers (IE, Firefox, even Chrome). It gives me a window into the browser conversation like I’ve never had before.
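For the geeks who want to see what "duplicating the browser conversation" looks like in practice, here’s a minimal sketch in Python. It is not the code from my project. The URL, header values, cookie and form fields are all made-up placeholders standing in for whatever your inspector shows you, and `build_replay_request` is just a helper name I picked for the example. It rebuilds a captured AJAX POST but doesn’t send it.

```python
import urllib.parse
import urllib.request


def build_replay_request(url, headers, form_fields):
    """Rebuild a captured POST as a urllib Request without sending it."""
    # Form fields are URL-encoded the same way a browser form post would be.
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # Copy over the headers seen in the captured conversation.
    for name, value in headers.items():
        request.add_header(name, value)
    return request


# Hypothetical values: replace with what the inspector actually captured.
captured_url = "https://example.com/ajax/keywords"
captured_headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
    "Cookie": "session=PLACEHOLDER",
}
captured_fields = {"query": "web scraping", "page": "1"}

request = build_replay_request(captured_url, captured_headers, captured_fields)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request).read()
```

The point is that once you can read the decrypted request, replaying it is mostly a matter of copying the URL, the headers and the body. A real session usually needs more than this (fresh cookies, tokens that change per request), which is exactly why seeing the full conversation matters.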
Here’s a short video of how I use it and what it does for me. Enjoy.