Web Scraping Magic…

I do a *lot* of web scraping and automation with my marketing tools.  Some of it is very advanced.  For example, I get into some pretty deep stuff with Site Sniper Pro to be able to determine the ad region location on the page.

Every once in a while I stumble across a tool so good that I just have to share it.  Well… I’ve stumbled 😉

I usually use a program on my Mac called Scoop to intercept HTTP requests so I can see what’s happening behind the scenes of web requests.  If you’re not using something similar and you do a lot of web scraping or web automation then you’re wasting a *ton* of time.

WARNING: Geek Alert!  Ok… before I go any further I should warn you… this is not a beginner’s post.  If you don’t know what web scraping is or don’t do home-grown automation then you might want to skip this post.  It’s more technical than most of my posts.  I’m assuming you have at least a basic knowledge of what I’m talking about in the post or it won’t make sense.  You’ve been warned.  If you have questions feel free to post in the comments.

Back to the story… The one HUGE problem with Scoop is that it can’t handle secure HTTP (HTTPS) requests very well.  This is a common problem with packet sniffer type software.  It gets the request too late in the game, after the browser has already encrypted it, and all I can see is the encrypted stream.  Typically this is good news.  It makes it difficult for unwanted guests to eavesdrop on my browsing session.  But when I’m trying to deconstruct a complex web request it just blows.
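
If you want to see what “only the encrypted stream” looks like, here’s a minimal Python sketch.  It looks at the first bytes of a captured payload and guesses whether it’s a readable HTTP request or a TLS record.  The function name and the sample bytes are mine, purely for illustration; the only facts it relies on are that a TLS record starts with a content-type byte (0x14 to 0x17) followed by a 0x03 version byte, while a plain HTTP request starts with a method name.

```python
# Rough classifier for the start of a captured TCP payload.
# Assumption: `payload` holds the bytes from the start of one captured segment.

HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ",
                b"DELETE ", b"OPTIONS ", b"PATCH ")

# TLS record content types (first byte of every TLS record).
TLS_CONTENT_TYPES = {
    0x14: "change_cipher_spec",
    0x15: "alert",
    0x16: "handshake",
    0x17: "application_data",
}


def classify_payload(payload: bytes) -> str:
    """Guess whether a payload is readable HTTP or an encrypted TLS record."""
    if payload.startswith(HTTP_METHODS):
        return "plain HTTP request"
    if (len(payload) >= 3
            and payload[0] in TLS_CONTENT_TYPES
            and payload[1] == 0x03):
        return f"TLS record ({TLS_CONTENT_TYPES[payload[0]]})"
    return "unknown"


if __name__ == "__main__":
    print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
    # Made-up bytes shaped like an encrypted application-data record.
    print(classify_payload(bytes([0x17, 0x03, 0x03, 0x00, 0x20]) + b"\x8f" * 32))
```

Anything that comes back as “application_data” is the part a sniffer can’t read.  That’s why a tool has to hook in before encryption happens to be useful for HTTPS.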

I’ve been able to get around it in the past using an overly complex system that required entirely too much work on my part.

But recently I started a project where I wanted to create my own front-end for Google’s keyword research tool.  Their entire session is secure.  And it uses nothing but AJAX and JavaScript to do its work.  I was really struggling with my traditional methods to deconstruct the browser conversation and duplicate it with my software.
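
To give you an idea of what “duplicating the browser conversation” means, here’s a rough Python sketch (not the code I actually use).  It takes the raw text of one captured request and turns it into a request object with the same method, URL, headers and body, ready to be replayed.  The function name, the example.com host, the path and the form field are all made up for illustration.

```python
from urllib.request import Request


def parse_raw_request(raw: str, scheme: str = "https") -> Request:
    """Build a urllib Request from the raw text of a captured HTTP request.

    The request is only constructed here, not sent.
    """
    # Normalise line endings, then split the head from the body
    # at the first blank line.
    text = raw.lstrip().replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")

    # Request line looks like: POST /some/path HTTP/1.1
    method, path, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()

    # The Host header tells us which server the path belongs to.
    url = f"{scheme}://{headers.get('Host', '')}{path}"
    data = body.encode("utf-8") if body else None
    return Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    # Hypothetical capture; every value below is a placeholder.
    captured = (
        "POST /tool/query HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "User-Agent: demo\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        "\r\n"
        "keyword=scraping"
    )
    req = parse_raw_request(captured)
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Passing the result to urllib.request.urlopen would actually send it.  In real life the hard part is the cookies and tokens that change with every session, which is exactly why you need to see the whole conversation first.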

After a bit of googling and trying lots of different solutions I finally found a tool that just ROCKS for what I’m trying to do.  It’s called IE Inspector (available from ieinspector.com – not an affiliate link).  It works in Windows and intercepts all major browser requests (IE, Firefox, even Chrome).  It gives me a window into the browser conversation like I’ve never had before.

Here’s a short video of how I use it and what it does for me.  Enjoy.

Watch the Video...

8 Comments on Web Scraping Magic…

  1. admin

    Good question… here are the differences that matter to me:

    First, I need something that works outside of Firefox. I love addons like Firebug (unbeatable), Live HTTP Headers, Tamper Data, etc. for Firefox… but they don’t always help me automate.

    For example, I use the native WinInet libraries for most of my client-side automation. I often need to peek into my own work to match headers, responses, etc. I can’t do that with a pure Firefox addon. I like that IEInspector is standalone and works with both the WinInet and NSS (Firefox, Chrome) libraries.

    I need to see not just my browser activity, but my application activity as well. IEInspector gives my threaded process access to all the applications using WinInet/NSS on my machine.

    That’s the *big* reason for me. My development efforts require flexibility that an addon won’t give me.

    Beyond that there are some cool things I can do with IEInspector that I like. I may be able to do some (or all) of these with a tool like Live HTTP Headers as well. I don’t know… I’m not saying this is the only tool available that works… just the one I fell in love with.

    For example, I can “tamper” with the data using IEInspector. I can also “record” and playback original or tampered sequences for testing purposes. I can even construct raw requests for testing purposes.

    There are actually a variety of Firefox plugins that I used to use to work out the browser conversation. I’m not against them. I still use them for certain things. But none of them (that I’ve found) allow me to do what I can do with IEInspector’s HTTP Analyzer.

  2. admin

    Yes. It’s part of the QVM Extreme Suite I’m developing. Beta testing is starting this week for the first 4 applications in the suite, including the keyword research tool.

  3. underground

    This thing rocks. I can see a lot of uses for it: breaking down everything going on behind the scenes in your browser to help create ways to automate different aspects of SEO. Thanks for the video and for turning us on to this app.

  4. Susan Kunkel

    I’m looking for a tool that scrapes email header info from messages in Outlook, either in personal folders or inbox. I’d like to grab the names, email addresses and subject from each message into a CSV format.

    Are you aware of such a tool?
