Grab sites using Python: Create your own site grabber like IDM

Site Grabber Using Twisted Python
—————————————————————-

First install the Twisted network engine.
You can download it from -> https://twistedmatrix.com/trac/
A direct installer is available for Python 2.7 on Windows. After
installing Twisted, open your editor and create this Python file.
Twisted is an event-driven network engine with many more capabilities;
this is only a small use of it.

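#########Grab.py###############

#######PROGRAM#################
from __future__ import print_function  # lets print(..., file=...) work on Python 2.7

import sys

from twisted.internet import reactor
from twisted.web.client import downloadPage


def printError(failure):
    # Report any download failure on stderr
    print(failure, file=sys.stderr)


def stop(result):
    # Stop the event loop whether the download succeeded or failed
    reactor.stop()


if len(sys.argv) != 3:
    print("Usage: python grab.py <URL> <output file>", file=sys.stderr)
    sys.exit(1)

# downloadPage returns a Deferred that fires when the download finishes
d = downloadPage(sys.argv[1], sys.argv[2])
d.addErrback(printError)
d.addBoth(stop)
reactor.run()
###############################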

Run it:

C:>python grab.py http://microsoft.com microsoft.html

This saves the static page at http://microsoft.com to microsoft.html
in the current directory.

Note:
Some people will get the error “No module named win32api” while running
the program. If so, download the Windows extensions for Python (pywin32) from here:

->http://sourceforge.net/projects/pywin32/files/pywin32/Build%20218/
Then rerun it; it should work.
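
Before rerunning, you can quickly check whether the missing module is now importable. This is only a small sketch; the helper name module_available is my own and is not part of Twisted or pywin32.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # On Windows this should print True once pywin32 is installed
    print(module_available("win32api"))
```

If it prints False, the extensions were probably installed for a different Python version than the one running the script.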

Hint:

We can also mine the links out of each downloaded page, using the standard library's HTML and URL parsing modules (HTMLParser and urlparse in Python 2.7), and save all the linked pages offline as well.

I’m leaving it for you.
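
As a rough starting point for the hint above, here is one way the link mining could look. It is a sketch under my own assumptions, not the only way to do it: the names LinkCollector and extract_links are made up for illustration, it only looks at href attributes on <a> tags, and it uses only the standard library (with a fallback import so it runs on both Python 2.7 and Python 3).

```python
try:  # Python 3 module names
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7 module names
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as /about into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each URL returned by extract_links could then be passed to downloadPage with its own output file name. Choosing file names, skipping duplicates and limiting how deep the crawl goes are still left to you.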
