Implementing activity streams in Django web framework

 


 

What is an activity stream?

It is a sequential chain of all the activities performed by users on a website: Twitter showing our own feed, GitLab listing developers' activities on the dashboard, or LinkedIn showing the activities of our connections. Whatever you may call it, it is an activity stream.

For example, consider the activity stream on a company dashboard: each member's recent activity is listed beside their smiling face. If you are a Django developer trying to create similar activity streams, then this post is for you. It is for everyone who wants to learn how to create a LinkedIn-like real-time dashboard using the powerful Django 1.7 web framework.

How can we implement activity streams?

Django has a few advanced concepts, signals and handlers, with which we can notify anybody when something happens, for example sending an email when a user posts a comment. Using only that basic signal machinery, we would have to work hard to build dashboard activity streams ourselves. Fortunately, a young Django developer called Justin Quick wrote a handy, reusable Django plug-in called django-activity-stream that lifts this burden from our shoulders. The documentation is pretty good, but beginners may get confused about its usage. Here is the link to that Django plugin library.

https://github.com/justquick

What are the requirements?

* Python 2.7

* Django >= 1.7 (in lower versions we would need to take care of many more things, so play safe and install 1.7 in a virtualenv)

* django-activity-stream

Use the commands below to install Django 1.7 in a virtual environment:

$ pip install virtualenv
$ virtualenv djangoenv

The first command installs the virtual environment tool, and the second creates a virtual environment called djangoenv, in which we are going to install Django and its related libraries. At this point only the environment is created; in order to enter it we need to activate it.

$ source djangoenv/bin/activate

Then you will see the shell prompt change:

(djangoenv)$ ls -l

Now you are in a virtual environment that has its own copy of the Python interpreter and a fresh set of libraries to work with.

Note: in a virtual environment, use plain pip install instead of sudo pip install.

Now let us install Django 1.7 and django-activity-stream:

$ pip install Django==1.7.1
$ pip install django-activity-stream

Now everything is ready.

I am creating a new project called “tutorial” and an app called “dashboard”.

 

(djangoenv)$ django-admin.py startproject tutorial

(djangoenv)$ cd tutorial

(djangoenv)$ django-admin.py startapp dashboard

Now your directory structure should have the dashboard app inside the project directory. I also assume you have chosen your own database back-end; I am continuing with MySQL. At first there will not be any users. Create the database and then use the following commands to create the tables.

(djangoenv)$ ./manage.py makemigrations

(djangoenv)$ ./manage.py migrate

The makemigrations command scans all the models in your apps. Django gives you a default User model in django.contrib.auth.models, so after running those commands your User table will be created. Now create a superuser so we can quickly insert data into the Django models.

(djangoenv)$ ./manage.py createsuperuser

Now, after creating the superuser, log in to the admin site and populate the users. I created 3 users:

* naren

* sriman

* manoj

[Screenshot: the three users in the Django admin]

I am going to make these users supervisors, so I create Supervisor and Task models in my dashboard app. Each supervisor can be allocated many tasks.

My models.py file looks like this:

# dashboard/models.py
from django.db import models
from django.contrib.auth.models import User
# Create your models here.
class Task(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField(null=True,blank=True)

class Supervisor(models.Model):
    user = models.ForeignKey(User,null=True,blank=True,related_name="supervisor")
    task = models.ManyToManyField(Task,related_name='tasks')

I use a foreign key to reach the user, and since one supervisor can have many tasks, task is a many-to-many field.

Setting up django-activity-stream

Your settings.py should contain the following. Add actstream to the list of installed apps, and add the ACTSTREAM_SETTINGS dictionary to your settings.py file.

#settings.py
....
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'dashboard',
    'actstream',
)
ACTSTREAM_SETTINGS = {
    'MANAGER': 'dashboard.managers.MyActionManager',
    'FETCH_RELATIONS': True,
    'USE_PREFETCH': True,
    'USE_JSONFIELD': False,
    'GFK_FETCH_DEPTH': 1,
}
....

Now you need to create three files in order to proceed further. In dashboard, create managers.py, signals.py and apps.py.

# dashboard/managers.py

from datetime import datetime
from django.contrib.contenttypes.models import ContentType
from actstream.managers import ActionManager, stream

class MyActionManager(ActionManager):
    @stream
    def mystream(self, obj, verb='posted', time=None):
        if time is None:
            time = datetime.now()
        return obj.actor_actions.filter(verb=verb, timestamp__lte=time)

This file defines a custom action manager whose stream filters actions by verb up to a given timestamp. You can write your own logic here if you wish to filter on parameters other than time.

# dashboard/apps.py
from django.apps import AppConfig
from actstream import registry
from django.contrib.auth.models import User

class MyAppConfig(AppConfig):
    name = 'dashboard'
    def ready(self):
        registry.register(User, self.get_model('Task'), self.get_model('Supervisor'))

This apps.py registers the models with the activity stream. In order to use the activity stream we need to register our models first, so I register the User model along with the dashboard models Task and Supervisor.

Next we need to add one line to the __init__.py of the dashboard app. All of these steps are required to use activity-stream.

default_app_config = 'dashboard.apps.MyAppConfig'

This links the dashboard app's configuration to MyAppConfig, so the models are registered when Django starts.

Now everything is set up and we are ready to go. I am going to create tasks in the shell, so we can concentrate on the creation of activity streams.

[Screenshot: creating Task objects in the Django shell]

 

I created three tasks, and each can be assigned to any supervisor. I am going to assign tasks to supervisors, and at the same moment the activity will be recorded by the activity stream. Normally, in our projects, assigning a task to a user happens inside a view, so you need to send an action signal at the time the assignment is created. action_handler in actstream.actions handles the storing of these actions.

[Screenshot: sending an action signal from the Django shell]

Using the action method we send a signal with actor, verb and target as parameters (see the sketch below). The actor and target models must be registered before using them; that is what we did above in dashboard/apps.py.

actor – the source object performing the action; it is like the subject of an English sentence.

verb – the description of the action.

target – the object upon which the action is performed; it is like the object of an English sentence.
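For example, in a view or in the shell, an assignment plus its signal might look like this; a minimal sketch, assuming the supervisor and task objects created above:

from actstream import action
from dashboard.models import Supervisor, Task

naren = Supervisor.objects.get(user__username='naren')
planning = Task.objects.get(name='planning')

# allocate the task, then record the activity as actor / verb / target
naren.task.add(planning)
action.send(naren.user, verb='was assigned the task', target=planning)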

Now I am going to assign the planning task to supervisor naren and instantly send that signal.

[Screenshot: the shell returning the created action object]

Just look at the output: it returned the action object that was created, and that object is stored in the database.

Let us assign tasks to the two other supervisors.

[Screenshots: assigning tasks to the other two supervisors in the shell]

Remember, these assignments can occur anywhere in your views at any time, but the actions are stored in a common, time-ordered, queue-like model. We can fetch actions from it and list them on the dashboard.

Action is the model that stores all the generated actions. Who created that model? django-activity-stream takes care of it. Now we need to list all the occurred actions on a page: just create the dashboard view and display all the actions in it.
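A minimal sketch of such a view; the template name and the explicit ordering are my assumptions:

# dashboard/views.py
from django.shortcuts import render
from actstream.models import Action

def dashboard(request):
    # every recorded action, newest first; each has actor, verb, target, timestamp
    actions = Action.objects.all().order_by('-timestamp')
    return render(request, 'dashboard.html', {'actions': actions})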

See how the timestamps and everything else are perfectly managed. Just display them in a view and your dashboard is finished. Create more combinations of actions as required, and use the full power of django-activity-stream by tweaking all the options available.

django-activity-stream gives us tons of options, like:

* fetching actor_stream (an actor's actions)

* fetching target_stream (a target's actions)

* fetching user_stream (all actions associated with the objects a user follows)

* following/unfollowing a certain stream

A couple of these are sketched below.

The full code for the project, along with an sqlite db, is available at this link:

https://github.com/narenaryan/django-activity-stream-tutorial

:) I hope you enjoyed the article.

 

Ultimate guide for scraping JavaScript rendered web pages

We have all scraped web pages. The HTML content returned as the response holds our data, and we scrape it to fetch certain results. If a web page has a JavaScript implementation, the original data only appears after a rendering process, so when we use the normal requests package in that situation, the responses returned contain no data. Browsers know how to render and display the final result, but how can a program know? So I came up with a power-pack solution to scrape any JavaScript-rendered website very easily.

Many of us use the libraries below to perform scraping:

1) lxml

2) BeautifulSoup

I don't mention the scrapy or dragline frameworks here, since the basic scraper underlying them is lxml. My favorite is lxml. Why? It has element traversal methods, rather than relying on a regular-expression methodology like BeautifulSoup. Here I am going to take a very interesting example. I was amazed to find that my article appeared in the recent PyCoders Weekly issue 147, so I am taking PyCoders Weekly as the example and scraping all the useful links from its archives. The link to the PyCoders Weekly archives is here.

http://pycoders.com/archive/

It is a totally JavaScript-rendered website. I want all the links to the archives, and then all the links from each archive post. How to do that? First I will show that the plain HTTP approach returned me nothing.

import requests
from lxml import html

#storing response
response = requests.get('http://pycoders.com/archive/')

#creating lxml tree from response body
tree = html.fromstring(response.text)

#Finding all anchor tags in response
print tree.xpath('//div[@class="campaign"]/a/@href')

When I run this, I get the following output:

[Screenshot: only three links printed]

So I got back only 3 links. How is that possible, when there are nearly 133 archives of PyCoders Weekly? I got essentially nothing in the response. Now let us think about tackling the problem.

How can we get the content?

There is one approach to getting data from JS-rendered web pages: the WebKit library. WebKit can do everything that a browser can do. For some browsers, WebKit is the underlying engine that renders web pages. WebKit is part of the Qt library, so if you have installed the Qt library and PyQt4, you are ready to go.

You can install it with this command:

sudo apt-get install python-qt4

Now everything is in place. We retry the fetching process, but with a different approach.

Here comes the solution

We first send the request through WebKit and wait until everything is loaded perfectly, then assign the completed HTML to a variable. Then we scrape that HTML content using lxml and obtain the results. This process is a little slow, but you will be surprised to see that the content is fetched perfectly.

Take this code for granted:

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  
from lxml import html 

class Render(QWebPage):  
  def __init__(self, url):  
    self.app = QApplication(sys.argv)  
    QWebPage.__init__(self)  
    self.loadFinished.connect(self._loadFinished)  
    self.mainFrame().load(QUrl(url))  
    self.app.exec_()  
  
  def _loadFinished(self, result):  
    self.frame = self.mainFrame()  
    self.app.quit() 

The Render class renders the web page; the URL of the page to scrape is passed to its constructor. Don't bother about the Qt details. Just remember that when we create a Render object, it loads everything and creates a frame containing all the information about the rendered web page.

url = 'http://pycoders.com/archive/'  
#This does the magic.Loads everything
r = Render(url)  
#result is a QString.
result = r.frame.toHtml()

We store the resulting HTML in the variable result. It is a QString, not a plain string, so it needs to be converted before lxml can process the content.

#QString should be converted to string before processed by lxml
formatted_result = str(result.toAscii())

#Next build lxml tree from formatted_result
tree = html.fromstring(formatted_result)

#Now using correct Xpath we are fetching URL of archives
archive_links = tree.xpath('//div[@class="campaign"]/a/@href')
print archive_links

It gives us all the links to the archives, and the output is a well-populated one.

[Screenshot: the full list of archive links]

Next, create Render objects with these links as URLs and extract the required content. The power of WebKit lets us render a web page programmatically and then fetch the data. Use this technique to get data from any JavaScript-rendered web page.

The total code looks like this:

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *  
from lxml import html 

#Take this class for granted.Just use result of rendering.
class Render(QWebPage):  
  def __init__(self, url):  
    self.app = QApplication(sys.argv)  
    QWebPage.__init__(self)  
    self.loadFinished.connect(self._loadFinished)  
    self.mainFrame().load(QUrl(url))  
    self.app.exec_()  
  
  def _loadFinished(self, result):  
    self.frame = self.mainFrame()  
    self.app.quit()  

url = 'http://pycoders.com/archive/'  
r = Render(url)  
result = r.frame.toHtml()
# This step is important: convert the QString to ASCII for lxml to process
tree = html.fromstring(str(result.toAscii()))
archive_links = tree.xpath('//div[@class="campaign"]/a/@href')
print archive_links

 

I have shown you a fully functional way to scrape a JavaScript-rendered web page. Apply this technique to automate any number of steps, or integrate it to override the default behavior of a scraping framework. It is slow, but it reliably gets the content. I hope you enjoyed the post. Now try this on any website you think is tricky to scrape.

All the best.

How situations drive me to write my own applications?

This article shows how I created an audio-book downloader because I was tired of downloading manually.

Who am I?

I am a big fan of books. I have an e-book reader and I listen to podcasts often. I always download audio books before a journey; it is pleasant to listen to stories and novels rather than read them, using ears rather than eyes. So I keep track of websites which provide good audio books to listen to online: audible.com, iTunes and many more. Only a few provide downloadable audio books free with no restrictions. The main players are http://www.librivox.org , http://www.loyalbooks.com and http://www.openculture.com/freeaudiobooks . Everything was fine for a while, and I adjusted to them. Then came the actual problem: the recordings on librivox and loyalbooks are volunteered, meaning anybody can record and upload an audio book. Some books are good to listen to, but many others are boring, with below-average voices. I wanted premium voices for zero bucks. I searched the web and found one such website: Lit2Go ( http://etc.usf.edu/lit2go/books/ ). It consists of classical books recorded with premium voices, and it is kept free because it is an academic project of http://etc.usf.edu/ .

What is the drawback of Lit2Go?

Lit2Go is a free service. It divides each book into a number of chapters and allows you to play them online. We can download them by right-clicking, the common way, but if there are many chapters, downloading them is a dead boring task: going to each chapter link, right-clicking, then copying the files to a folder. I did it frequently and always thought, “why don't these people provide all the chapters in a single file?”.

How I overcame that

This is my personal life, and Python still plays a role in it. Computing is made as simple as talking about a movie with a friend. I created a home application called “createbook” that asks me which book to download, then downloads all its chapters into a single folder in systematic order. Just by running a small script, I can now select and download an entire audio book.

How createbook works?

I just run the script (you can simply double-click it if you are a Windows user) and get a list of all the classic book options. I selected 199 (Winesburg, Ohio), and after some time I saw a new folder in my script directory: all 26 chapters downloaded neatly, in a folder named after the book. Now I run it happily, copy the downloaded folder to my phone, and enjoy the premium voices without the pain of manual downloading. Python helped me in this situation; it may help you too. Just apply it and you will see the comfort you get. Below is the code for my application. I used lxml and requests to do this. In your case, find the library that suits your situation and build your own fun applications.

#Just run it,select book and sit back,sip coffee
import requests
from lxml import html
import os

def createbook(url):
    res = requests.get(url)
    folder = url.split('/')[-2]
    if not os.path.exists(folder):
        os.makedirs(folder)
    tree  = html.fromstring(res.text)
    parts = tree.xpath('//dl/dt/a/@href')
    for i in parts:
        res = requests.get(i)
        tree  = html.fromstring(res.text)
        parturl = tree.xpath('//audio/source[@type="audio/mpeg"]/@src')
        for surl in parturl:
            with open('%s/%s'%(folder,surl.split('/')[-1]), 'wb') as handle:
                response = requests.get(surl, stream=True)
                for block in response.iter_content(1024):
                    if not block:
                        break
                    handle.write(block)

    print '"%s" is successfully downloaded.....'%folder

res = requests.get('http://etc.usf.edu/lit2go/books/')
tree = html.fromstring(res.text)
books = [i.encode('utf-8') for i in tree.xpath('//figcaption[@class="title"]/a/text()')]
links = tree.xpath('//figcaption[@class="title"]/a/@href')

catalog = dict(zip(books,links))
numcatalog = enumerate(books,1)
chose = {}
print '@@@@@@@@@@ Select from the books below @@@@@@@@@@\n'

for i,j in numcatalog:
    print '%d) %s'%(i,j)
    chose[i] = catalog[j]

choice = int(raw_input('\nSelect Book:'))

print 'Your book started downloading.......'
createbook(chose[choice])

If you share this taste for listening to audio books, feel free to use this script and download classic books. You can find the same code in my git repository; the downloaded book folder appears in the same directory as the script. I advise you to use Python's computing power wherever it is required: create your own applications and be comfortable. https://github.com/narenaryan/makebook

Anatomy and application of parallel programming in python

Google data center clusters

What is Parallel  Processing?

Parallel processing is the simultaneous use of more than one CPU or processor core to execute a program or multiple computational threads. Ideally, parallel processing makes programs run faster because there are more engines (CPUs or cores) running them. In practice, it is often difficult to divide a program in such a way that separate CPUs or cores can execute different portions without interfering with each other. Most computers have just one CPU, but some models have several, and multi-core processor chips are becoming the norm. There are even computers with thousands of CPUs. After reading that Wikipedia definition, what are you thinking? The same stuff that has been repeated in our ears for years. We all know the core concepts, but when it comes to implementation we step in lazily, for many reasons. In this post we are going to understand the basic concepts of parallel processing and distributed computing, and jump straight into a hands-on example.

Why parallel processing?

Assume you have a very popular website featuring a web Fibonacci calculator. If everything is processed sequentially, the requests from users are stored in a queue, and the program calculates each Fibonacci number and returns it to its user. If the website gets 80,000 requests, calculating sequentially makes the last user wait endlessly; their loading bar will spin forever. How do all the e-commerce websites serve their customers spontaneously, at light speed? They use parallelization techniques. Learn how to use them, and implement them straight away in your programs.

Let’s begin the show

A parallel program is distributed among various processes, which in turn are handed to the processor cores to execute. Take a look at the smallest parallel-computing illustration: multiplying a matrix by a scalar. Sequentially, the scalar is multiplied by each element, one after another. In parallel computing, each element is treated as a unit, and workers are allocated the task of multiplying the scalar by their element. Here the big task is distributed to 4 processes.
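To make that illustration concrete, here is a minimal sketch of the same idea using Python's built-in multiprocessing module; the scalar 3 and the flattened example matrix are made up for the demo:

from multiprocessing import Pool

def scale(element):
    return 3 * element  # each worker multiplies one element by the scalar 3

if __name__ == '__main__':
    matrix = [1, 2, 3, 4, 5, 6, 7, 8]
    pool = Pool(processes=4)        # distribute the element units to 4 processes
    print pool.map(scale, matrix)   # -> [3, 6, 9, 12, 15, 18, 21, 24]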

Patterns for designing parallel structures

The criteria for designing a pattern for a parallel problem depend on the context of the problem itself. The number of workers to dispatch and the number of cores to use are all decided with the main problem in mind. But here we are going to see one universal pattern for parallel computing problems.

Pipeline concept

This concept is simple: a task is processed in different stages, and many workers are present at each stage. At stage 1 the workers process the data, then it moves on to stage 2, and so on. It is like the car assembly lines shown on Discovery and NatGeo: a car chassis is sent to a stage where multiple robotic arms paint the body at once, then it moves to another stage where another set of robotic workers fix the bolts, and so on.

Analyzing best python tools for implementing parallelism

There are four common ways of achieving concurrent processing in Python:

1) threading and concurrent.futures

2) multiprocessing and ProcessPoolExecutor

3) Parallel Python

4) Celery

The first two are available as built-in libraries; the other two are external libraries that do most of the job behind the scenes. A threading solution to a parallel problem is not preferable here, since synchronization mechanisms need to be implemented carefully. In this post we keep things simple and clear: no locks, no synchronization techniques, just executing a program in a parallel way.

Parallel Python: a wonderful, simple library for parallel as well as distributed computing

If you already have Parallel Python installed, fine. Otherwise just open your terminal and install it by downloading the compressed file, extracting it and running setup.py. The compressed files are available here: http://www.parallelpython.com/content/view/18/32/ . After pp is installed successfully, we are going to build a practical example to illustrate parallel processing using the Parallel Python library, which is a very good tool.

What can Parallel Python do?

The most important advantage of using PP is the abstraction this module provides. Some important features of PP are as follows:

* automatic detection of the number of processors, to improve load balancing

* the number of allocated processors can be changed at runtime

* load balancing at runtime

* auto-discovery of resources throughout the network

Parallel Python basics

We just need to create a Server from pp, then use the submit method of that instance to submit tasks to multiple processes.

import pp

#Server encapsulates and dispatches task to multiple processes
s = pp.Server()

#The server can also be started with multiple cores and with other machines in a cluster

s = pp.Server(ncpus=4, ppservers=("192.168.25.21", "192.168.25.9"))

# ncpus => number of cores to use; ppservers => IPs of computers connected as a cluster
#There is one important method called submit for adding tasks to processes.

"""
submit(self, func, args=(), depfuncs=(), modules=(),
callback=None, callbackargs=(), group='default',
globals=None)

func -> target function to be executed
args -> arguments to be passed to func
modules -> list of packages func needs to import to do its job
callback -> a function to which the result of func is returned; we can process
results here, e.g. sending computed values back to the user as a response
"""
#For example, see this submit call below.
s.submit(fibo_task, (item,), modules=('os',),
         callback=aggregate_results)

With these basics, let us solve a real-world problem.

Problem (calculate the great-circle distance)

Say I created a popular website for getting the great-circle distance, and I need to process 80,000 requests; the data entered by users should be processed, i.e. the results computed, concurrently. What is the great-circle distance? It is the distance between two locations on Earth, where a location is a tuple of (latitude, longitude). Calculating the great-circle distance uses a formula called the haversine formula. Don't panic at the big terms: it is just a function that returns the distance when two locations are given as arguments.

import math

#This is a function. Don't worry about its contents.

def haversine(lat1, lon1, lat2, lon2):
    R = 6372.8 # Earth radius in kilometers
    dLat = math.radians(lat2 - lat1)
    dLon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = math.sin(dLat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dLon/2)**2
    c = 2* math.asin(math.sqrt(a))
    #calculating KM
    a = R * c
    return a

Think of it this way: haversine takes lat1, lon1 and lat2, lon2 and returns the great-circle distance. For example, take California (37.0000° N, 120.0000° W) and New Jersey (40.0000° N, 74.5000° W). To find the distance between them, use the function above as

distance_ca_nj = haversine(37.0000, 120.0000, 40.0000, 74.5000)  # distance between California and New Jersey in KM

How to make the haversine function execute in parallel

Here I use four inputs from users; we can extend this to any number of inputs and have them run concurrently.

import os, pp

users={'california_to_newJersey' : (37.0000,120.0000,40.0000,74.0000),
'oklahoma_to_texas' : (35.5000, 98.0000,31.0000, 100.0000),
'arizona_to_kansas' : (34.0000, 112.0000,38.5000, 98.0000),
'mississippi_to_boston' : (33.0000, 90.0000,42.3581, 71.0636)} 

#This dict stores Great distance values for each key in users
result_dict = {}

Now we modify the haversine function so it also returns the user's key and a message carrying the worker's PID:

def haversine(key,lat1, lon1, lat2, lon2):
    R = 6372.8 # Earth radius in kilometers
    dLat = math.radians(lat2 - lat1)
    dLon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = math.sin(dLat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dLon/2)**2
    c = 2* math.asin(math.sqrt(a))
    #calculating KM
    a = R * c
    message = "the Great Circle Distance calculated by pid %d was %d KM"%(os.getpid(), a)
    return (key, message)

Next we need to process the returned result, so let's create a callback function that takes the result and stores it in result_dict:
def aggregate_results(result):
    print "Computing results with PID [%d]" % os.getpid()
    result_dict[result[0]] = result[1]

Now let us write the main part that submits the tasks to the processes, using all the functions we created above:
job_server = pp.Server(ncpus=4)

for key in users.keys():
    job_server.submit(haversine,
                      (key, users[key][0], users[key][1], users[key][2], users[key][3]),
                      modules=('os', 'math'),
                      callback=aggregate_results)

"""
Above line creates multiple processes and assign 
executing haversine function with arguments as key,expanded tuple.
Processes starts executing using all cores of your processor.
Wait for all processes complete execution before retrieving result
"""
job_server.wait()

#Next main process starts executing
print "Main process PID [%d]" % os.getpid()
for key, value in result_dict.items():
    print "For input %d, %s" % (key, value)

See the output; it works.

Why is there a common PID for all the processes? The program executed in parallel under the main process, so the main process PID is listed for each dispatched worker. But it executes in about a quarter of the time of the ordinary sequential program.

The total code can be found at this github link. Just save haversine.py from the repository below and run it, and you will see its lightning execution speed.

For any discussion, please write to narenarya@live.com
Resources:

* Parallel Programming with Python, by Jan Palach

Geospatial development simplified with Python

 

 

 

 

In this article I am going to show how easy it is to build a real-world geospatial application with Python.

GIS?

What is geospatial development, or a Geographic Information System (GIS)? According to Wikipedia:

A geographical information system (GIS) is a large domain that provides a variety of capabilities designed to capture, store, manipulate, analyze, manage, and present all types of geographical data, and it utilizes geospatial analysis in a variety of contexts, operations and applications.

Basic Applications

Geo-spatial analysis, using GIS, was developed for problems in the environmental and life sciences, in particular ecology, geology and epidemiology. It has extended to almost all industries including defense, intelligence, utilities, Natural Resources (i.e. Oil and Gas, Forestry etc.), social sciences, medicine and Public Safety (i.e. emergency management and criminology), disaster risk reduction and management (DRRM), and climate change adaptation (CCA). Spatial statistics typically result primarily from observation rather than experimentation.

For more visit this page http://en.wikipedia.org/wiki/Geospatial_analysis

What has Python got to do with it?

If somebody asks you to find the churches within 30 KM of your new residence, your first choice is the web: you find them manually by observing the map of your city. But it becomes complex if you want locations of many types. GIS are the systems that analyze earth data and fetch the results for you. There are two types of earth data:

1) Raster data: scanned images of the earth

2) Vector data: geometry drawn as shapes

Analyzing both types of data is essential for many organizations, like the military and survey agencies. In the early days, people from a mathematics background were hired for geospatial development, because all the basic operations and algorithms had to be designed mathematically from scratch. But nowadays, abstract bindings for existing libraries let normal software developers use wrappers to create geospatial applications. If you choose the right programming language, like Python, you can create powerful geospatial applications in less time and with less effort. The programmer just needs to be familiar with the terminology of geography: latitude, longitude, hemisphere, datum, meridian, directions, shapefiles and so on.

Python has three fantastic binding libraries, written on top of existing C++ libraries:

1) GDAL/OGR

2) PyQGIS

3) ArcPy

In this post I am going to show you a jump-start example: finding locations of a given type around a city in the United States, using United States city data I have with me. If data is available for your own region, you can use the same code for finding theaters, parks and so on within a few kilometers. My application is totally offline, since I am analyzing a downloaded dataset.

I am going to use the GDAL/OGR library to create the application. You need to install the GDAL/OGR, pyproj and shapely Python libraries before starting to build it. For how to install those libraries, see

http://gis.stackexchange.com/questions/9553/whats-the-easiest-way-to-install-gdal-and-ogr-for-python/124751#124751

Seeing is believing

See the application running, and be confident. I am going to find the cliffs around Texas, within a 50 KM range.

 

My main program is cliffs.py, and it prompts for an ISO2 code: TX for Texas, CA for California.

 

[Screenshot: selecting the place category]

Now my application finally returns all the cliffs around Texas within the 50 KM range.

[Screenshot: cliffs found around Texas]

How I did that?

First we need to download the required datasets into our directory. The two datasets we need here are:

1) https://app.box.com/s/2s7i5culjrkq6sm3fqr5

2) https://app.box.com/s/7qzk2y3bpvo264hgxs0r

 

The first file is Place-Find.zip; unzip it and save all its files in the same directory as the program cliffs.py. The second file is NationalFile_20141005.txt; keep this file in the same directory as well. Now we have the required datasets.

The directory will then look like this:

[Screenshot: project directory with both datasets]

I am creating a new file called settings.py, which builds the list of place categories to display:

#settings.py
cats= []
with open('NationalFile_20141005.txt','r') as fil:
    for i in fil.readlines():
        cats.append((i.rstrip().split('|'))[2])

#List the places and take input from User for Park,Bar,Hotel etc
#we can use dict(enumerate(set(cats))) here but we need to delete FEATURE_CLASS,Unknown fields from list
show_first = {k:v for k,v in enumerate(set(cats)) if v != 'FEATURE_CLASS' and v != 'Unknown'} 

Observe NationalFile_20141005.txt once: it consists of information about locations like parks, bays, cliffs, etc.

Next I am going to create the main program cliffs.py. Here go the imports:
from __future__ import division
from settings import show_first

from osgeo import ogr
import shapely.geometry
import shapely.wkt

osgeo deals with opening shapefiles, and shapely is helpful for translating them into geometric shapes. division is used to convert KM into the angular distance to measure (1 degree = 100 KM). We import the show_first dictionary from settings.py.
shapefile = ogr.Open("tl_2014_us_cbsa.shp")
layer = shapefile.GetLayer(0)


This opens a file in the same directory and creates a shapefile object. Next we create a layer using the GetLayer function; since the shapefile tl_2014_us_cbsa.shp has only one layer, we can use GetLayer(0) to fetch it.
city = raw_input('\nEnter ISO2 code of city: ')
print '\nSelect category of place to search in and around the city\n'
for index,place in show_first.items():
    print '||%s|||||||%s'%(index,place)
place_choice = int(raw_input('\nEnter code of place from above listing: '))
place = show_first[place_choice]
distance = int(raw_input('\nEnter with in range distance(KM) to find %s: '%place))
#converting distance range to angular distance $$$$ 100 KM = 1 Degree $$$
MAX_DISTANCE = distance/100 # angular distance: 100 KM is roughly 1 degree

The lines above are normal Python code for taking the user's preferences; the last line converts the distance range into an angular distance.

print "Loading urban areas..."

# Maps area name to Shapely polygon.
urbanAreas = {} 

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    name = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    shape = shapely.wkt.loads(geometry.ExportToWkt())
    dilatedShape = shape.buffer(MAX_DISTANCE)
    urbanAreas[name] = dilatedShape

In the above code we translate the geometry of each feature from the shapefile into a dilated shape and store it in urbanAreas.

urbanAreas now holds the polygon for each area. Next I open NationalFile_20141005.txt and read the latitude and longitude values of the matching records. If a point lies within a polygon we save it, else we ignore it.
f = open("NationalFile_20141005.txt", "r")
result = {}
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == place and chunks[3] == city:
        cliffName = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])
        pt = shapely.geometry.Point(longitude, latitude)
        for urbanName,urbanArea in urbanAreas.items():
            if urbanArea.contains(pt):
                if not result.has_key(cliffName):
                    result[cliffName]=[urbanName]
                else:
                    result[cliffName].append(urbanName)

The result dictionary maps each cliff name to the list of areas containing it. The code above is quite obvious. Now print the results:
print '\n---------------------%s--------------------\n'%place
for k,v in result.items():
    print k,'\n','=========================='
    for item in v:
        print item
    print '\n\n'
f.close()

This finishes our cliffs.py. The application works entirely offline; the datasets are huge because of the detail they hold. The final code looks like this:
#cliffs.py                    
from __future__ import division
from settings import show_first

from osgeo import ogr
import shapely.geometry
import shapely.wkt



shapefile = ogr.Open("tl_2014_us_cbsa.shp")
layer = shapefile.GetLayer(0)
# take input from the user and display the available place categories

city = raw_input('\nEnter ISO2 code of city: ')

print '\nSelect category of place to search in and around the city\n'
for index,place in show_first.items():
    print '||%s|||||||%s'%(index,place)

place_choice = int(raw_input('\nEnter code of place from above listing: '))
place = show_first[place_choice]

distance = int(raw_input('\nEnter with in range distance(KM) to find %s: '%place))

#converting distance range to angular distance $$$$ 100 KM = 1 Degree $$$
MAX_DISTANCE = distance/100 # angular distance: 100 KM is roughly 1 degree
print "Loading urban areas..."

# Maps area name to Shapely polygon.
urbanAreas = {} 

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    name = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    shape = shapely.wkt.loads(geometry.ExportToWkt())
    dilatedShape = shape.buffer(MAX_DISTANCE)
    urbanAreas[name] = dilatedShape

print "Checking %ss..."%place


f = open("NationalFile_20141005.txt", "r")
result = {}
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == place and chunks[3] == city:
        parkName = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])
        pt = shapely.geometry.Point(longitude, latitude)
        for urbanName,urbanArea in urbanAreas.items():
            if urbanArea.contains(pt):
                if not result.has_key(parkName):
                    result[parkName]=[urbanName]
                else:
                    result[parkName].append(urbanName)


print '\n---------------------%s--------------------\n'%place
for k,v in result.items():
    print k,'\n','=========================='
    for item in v:
        print item
    print '\n\n'
f.close()

This completes our application. We can do many other things: find the borders of countries accurately, compute the length from one place to another, satellite functions and more. Any average Python developer can excel in geospatial informatics, having a powerful programming language and open-source libraries that hide the complexity behind great mathematical functions. The code for this application is available at GITHUB.

If you haven't done GIS programming before, this might look like complex stuff; but learn the basics and look at it once again, and this post will feel easy.

Quotebot, a clever Twitter bot powered by Python

A person creating a Twitter bot and passing some orders to it.

 

We can do everything these days with the help of computing: we can automate, we can play with the web, we can do anything. The main concern is collecting the different pieces of thought and combining them into an idea. The thing I did is somewhat crazy, but not senseless.

What is a Twitter bot?

A Twitter bot is a program that posts to a Twitter page autonomously, without the intervention of a human operator. Once the bot is initiated, nobody needs to look after it, because the bot manages its own job from that moment.

Quotebot: the story begins!

I created a Twitter bot called Quotebot that posts quotes to a Twitter page daily. There are lots of bots out there doing different things; in this post I am showcasing mine, which is written entirely in Python. My main idea was to post a quotation daily, with the category of the quotations changing according to the day.

In the beginning I didn't have a single quote with me. So I quickly wrote a spider in the powerful dragline framework and collected nearly 93,000 quotes from different categories within a few minutes. Below is my Robomongo snapshot; I created 31 categories, since a month has at most 31 days.

[Screenshot: quote categories in Robomongo]

Now I had the data source to publish from, but quotes vary in length, and Twitter does not allow posting messages longer than 140 characters. I was depressed: I had collected the data, but Twitter was limiting me. Then one thing came to my mind.

Twitter allows us to upload media, i.e. image files. So converting the quotes into images and then uploading those to Twitter works out.

I used tweepy, a well-known Python library for the Twitter API, to update the status, and the Img4me API to convert the quotes into images. The main characteristics given to Quotebot are outlined below:

1) It posts at most 120 posts daily, one post every 5 minutes.

2) It only posts quotes according to the day of the month; each day in a month is classified as love day, science day, inspiration day, and so on.

Quotebot only posts the corresponding quotes on a particular day.

3) Each quote fetched from the database is automatically converted into an image and then uploaded to the Twitter page.

4) Logs the posting time and all the network transactions.

5) No conflicts between already-posted and to-be-posted data, since the bot uses the Redis data store to control post integrity.

6) Quotebot wakes up when the system boots, posts its 120 posts, and then goes to sleep.

Technologies used in building Quotebot

1) Tweepy library & Python

2) Redis

3) MongoDB

4) Img4me API

Here is the screenshot of the page on which the Twitter bot is running.

I think you are interested in how things work in the background. Tweepy is the straightforward tool for working with Twitter: posting, retweeting, deleting tweets, making friends, removing friends, following pages; everything can be done with tweepy. Tweak it yourself if you are interested; the link is this: http://tweepy.readthedocs.org/en/v2.3.0/getting_started.html#introduction
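The core of the posting step is only a few lines with tweepy. Here is a minimal sketch; the credentials are placeholders from your Twitter app settings, and quote.png stands for the image rendered from a quote:

import tweepy

# placeholder credentials from your Twitter app settings
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')
api = tweepy.API(auth)

# upload the image rendered from a quote as a media tweet
api.update_with_media('quote.png', status='Quote of the hour')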

Where is the code for Quotebot?

Because it is not good to tell a story and mess up the post with code at the same time, I am not describing the entire procedure here. The entire, fully functional code for Quotebot is available in my github repository. Feel free to explore the code.

https://github.com/narenaryan/Quotebot

Hey, where is that Twitter page link? You only kept one screenshot!

The Quotebot functionality is 100% authentic. I posted this to show how we can automate things with the help of computing power, with the sweet language Python and light-speed Redis. The spider part is only for fetching data from the well-known website and storing the quotes in MongoDB. Here is the link to the Twitter page the bot operates on.

https://twitter.com/QuotesAryan

Explore the code, enjoy the thought. You can also view my other repositories on github.

https://github.com/narenaryan

 

How to wake up a Python script, while you are in a sound sleep?


We all know that programs die after execution; data persists only if serialized. So consider the case where we need to back up all the logs, or delete something periodically. We need a scheduler for that. One great scheduler available on Linux systems is the CRON scheduler. There are two things of subtle importance here:

1) How to use the CRON scheduler to execute any command on a Linux computer.

2) How to use APScheduler to run functions in a Python script at particular times.

Crontab

The crontab (cron derives from chronos, Greek for time; tab stands for table) command, found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. To see what crontabs are currently running on your system, you can open a terminal and run:

$ crontab -l

If we want to create a new job or edit an existing crontab job, just type:

$ crontab -e

If we wish to remove all crontabs, just type:

$ crontab -r

It removes all the crontabs. So what is the crontab? It is the file to which jobs are added; add jobs to the end of that file after launching “$ crontab -e”.

This will open the default editor (could be vi or pico; if you want, you can change the default editor) to let us manipulate the crontab. If you save and exit the editor, all your cron jobs are saved into the crontab. Cron jobs are written in the following format; any valid command can be used after the 5 stars:

* * * * * /bin/execute/this/script.sh
* * * * * [any_valid_linux_command]

Scheduling

 

As you can see there are 5 stars. The stars represent different date parts in the following order:

  • minute (from 0 to 59)
  • hour (from 0 to 23)
  • day of month (from 1 to 31)
  • month (from 1 to 12)
  • day of week (from 0 to 6) (0=Sunday)

Execute every minute

If you leave the star, or asterisk, it means every. Maybe that's a bit unclear. Let's use the previous example again:

* * * * * python /home/execute/this/funny.py

They are all still asterisks! So this means execute /home/execute/this/funny.py:

  • every minute
  • of every hour
  • of every day of the month
  • of every month
  • and every day in the week.

In short: This script is being executed every minute. Without exception.

Execute every Friday 1AM

So if we want to schedule the python script to run at 1AM every Friday, we would need the following cronjob:

0 1 * * 5 python /home/aryan/this/script.py

Get it? The script is now being executed when the system clock hits:

  • minute: 0
  • of hour: 1
  • of day of month: * (every day of month)
  • of month: * (every month)
  • and weekday: 5 (=Friday)

Execute on workdays 1AM

So if we want to schedule the python script to Monday till Friday at 1 AM, we would need the following cronjob:

0 1 * * 1-5 python /bin/execute/this/script.py

Neat scheduling tricks

What if you’d want to run something every 10 minutes? Well you could do this:

0,10,20,30,40,50 * * * * python /bin/execute/this/script.py

But crontab allows you to do this as well:

*/10 * * * * python /bin/execute/this/script.py

 

Storing the crontab output

By default cron saves the output of /bin/execute/this/backup.py in the user’s mailbox (root in this case). But it’s prettier if the output is saved in a separate logfile. Here’s how:

*/10 * * * * python /bin/execute/this/backup.py >> /var/log/script_output.log 2>&1

Linux can report on different levels. There’s standard output (STDOUT) and standard errors (STDERR). STDOUT is marked 1, STDERR is marked 2. So the following statement tells Linux to store STDERR in STDOUT as well, creating one datastream for messages & errors:

2>&1

This is a shortcut illustration to get up and running with cron.

So now we understand how to run a Python script at a particular time. This happens outside the program: the programmer schedules the Python program manually. But sometimes we need to schedule from inside the program. For that we use a good library called APScheduler. You can install it using the following command:

$ sudo pip install apscheduler

OK, after installing APScheduler, we can see how simple scheduling any job is. Here we go a level deeper and schedule Python functions to execute at particular times; here the jobs are Python functions.
from apscheduler.scheduler import Scheduler

#start the scheduler i.e. create an instance
sched = Scheduler()
sched.start()

def my_job():
    print 'Happy_Birthday,Aryan'

#schedule the job function my_job to greet me every year on my birthday
sched.add_cron_job(my_job, month=6, day=24, hour=0)

So this script greets me on my birthday by running the function every year. Running the script itself is done by step 1, the cron scheduling, and the scheduling inside the script is handled by APScheduler; see how many goodies are provided for us. Here my_job does a simple task, but in real systems it could mean anything: taking a backup, deleting logs, housekeeping, etc.

There are lots of other things we can do with APScheduler, like adding jobs to a Mongo store or a Redis store. For the full-fledged documentation, and especially for the cron format, kindly go to the APScheduler docs and explore yourself.

Last but not least

Sometimes we create Python applications for desktops using KDE or GTK+. Those applications should run at system startup; for that, just add this simple line to the crontab file:

@reboot python /home/arya/Timepass.py &

Here @reboot tells cron that the command should be executed at reboot, and & tells the process to run in the background. This finishes our little chit-chat about CRON and APScheduler. Hope you enjoyed this post.

Understanding Egyptian multiplication via Python

Egyptian Multiplication

The ancient Egyptians used a curious way to multiply two numbers. The algorithm draws on the binary system: multiplication by 2, or just adding a number to itself. Unlike the Russian Peasant Multiplication, which determines the involved powers of 2 automatically, the Egyptian algorithm has an extra step where those powers have to be found explicitly.



Write the two multiplicands, with some room in between, as the captions of two columns of numbers. The first column starts with 1 and the second with the second multiplicand. Below, in each column, successively write the doubles of the preceding numbers. The first column will generate the sequence of powers of 2: 1, 2, 4, 8, … Stop when the next power becomes greater than the first multiplicand. I'll use the same example as in the Russian Peasant Multiplication, 85×18 (starred rows explained below):

 1*    18
 2     36
 4*    72
 8    144
16*   288
32    576
64*  1152

 

 

The right column is exactly the same as it would be in the Russian Peasant Multiplication. The left column consists of the powers of two. The starred ones are important: the corresponding entries in the right column add up to the product 85×18 = 1530, since 18 + 72 + 288 + 1152 = 1530.

 

Why are some powers of two starred while others are not? The starred ones add up to the first multiplicand:

  85 = 1 + 4 + 16 + 64,

which corresponds to the binary representation of 85:

  85 = 1010101 in base 2.

According to the Rhind papyrus, these powers are found the following way.

64 is included simply because it is the largest power of 2 below 85. Compute 85 - 64 = 21 and find the largest power of 2 below 21: 16. Compute 21 - 16 = 5 and find the largest power of 2 below 5: 4. Compute 5 - 4 = 1 and observe that the result, 1, is itself a power of 2: 1 = 2^0. This is the reason to stop. The powers of two that go into 85 are 64, 16, 4, 1.
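In Python, that greedy search for the powers can be sketched like this; my own helper, separate from the program built below:

def powers_of_two(n):
    """Greedily decompose n into the powers of 2 that sum to it, largest first."""
    powers, p = [], 1
    while p * 2 <= n:          # find the largest power of 2 not exceeding n
        p *= 2
    while n:
        if p <= n:             # this power goes into n
            powers.append(p)
            n -= p
        p //= 2
    return powers

print powers_of_two(85)  # [64, 16, 4, 1]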

For the product 18×85, the same procedure decomposes 18 = 2 + 16 and gives:

 1     85
 2*   170
 4    340
 8    680
16*  1360

and again 170 + 1360 = 1530.

 

It is also called the Russian Peasant Algorithm.

Now let us deal with this problem in Python.

First, prepare the imports:

from __future__ import division
import math

 

We can design a function that returns the greatest power of 2 that is less than or equal to a given number, because we need that concept frequently.

def greatest2power(n,i=0):
    while int(math.pow(2,i)) <= n : i = i+1
    return int(math.pow(2,i-1))

Now let us take the inputs: a multiplier and a multiplicand.

m = int(raw_input('Enter multiplicand'))
n = int(raw_input('Enter multiplier'))

Now, according to the description above, set first to the greater of the two and second to the smaller.

if m>n : first , second = m , n
else : first , second = n , m

We simulate the two columns for 85 and 18 with fcol and scol. seed is the running multiple of two used to populate those columns according to the algorithm.

fcol , scol = [] , []
seed = 1

Now we populate the two columns with the values the algorithm describes. The code snippet below is quite obvious.

while seed <= greatest2power(first):
    fcol.append(seed)
    scol.append(second*seed)
    seed = seed*2

Now we compute the valid powers of two by repeatedly subtracting from the first number, and store them in a list.

valid , backseed = [] , seed//2
while backseed>=1:
    valid.append(backseed)
    temp = backseed
    backseed = greatest2power(first-backseed)
    first = first - temp

The above snippet is analogous to (85-64=21), (21-16=5) and (5-4=1), so [64, 16, 4, 1] are the valid powers of 2.

Now we iterate over the zip of fcol and scol to fetch the corresponding right-column element for each valid power of two.

answer = 0
for sol in valid:
    for a,b in zip(fcol,scol):
        if a==sol:
            answer = answer+b

Finally the answer is stored in the answer variable, and we print it:

print 'The Egyptian Product is:%d'%answer 

What is special about this? We could instead multiply directly. The actual beauty of the Egyptian strategy is that they used only the number 2 in their calculations. As you can see in the program, raising to the next power of 2 is just doubling, so the Egyptians did multiplication using only the addition operator and the number 2, exactly as the program does. The complete code is here:

https://drive.google.com/file/d/0B6VAvV8caRaBd1JDQU5tendBVVE/view?usp=sharing

Resources:

http://www.cut-the-knot.org/Curriculum/Algebra/EgyptianMultiplication.shtml

http://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication

Alas, Julius Caesar didn't have Python in 50 BC

 

 


We all know Julius Caesar as a Roman dictator, who is also notable for his early studies in cryptography. The one thing most of us are unaware of is that hundreds of trees were cut down in 50 BC to provide cipher wheels to all the Roman generals. A cipher wheel is a data-encrypting device that uses the Caesar cipher algorithm, which gave the base idea for all modern encryption technologies.

Little past

The Roman ruler Julius Caesar (100 B.C. – 44 B.C.) used a very simple cipher for secret communication. He substituted each letter of the alphabet with a letter three positions further along. Later, any cipher that used this “displacement” concept for the creation of a cipher alphabet, was referred to as a Caesar cipher. Of all the substitution type ciphers, this Caesar cipher is the simplest to solve, since there are only 25 possible combinations.

What is a Cipher wheel ?

A cipher wheel is an encrypting device that consists of two concentric circles, an inner circle and an outer circle. The inner circle is fixed, and the outer circle is rotated randomly so that it stops at some point. Then the 'A' of the outer circle is tallied against the letter it lands on in the inner circle. That position is taken as the key, and the mapping between all positions of the outer and inner circles is used as the encryption logic.

 

Here key = 3, since the 'A' of the outer circle sits on the 'D' of the inner circle.
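As a quick sanity check, the key = 3 mapping can be built in a couple of lines; a sketch, separate from the full program below:

import string

key = 3
alphabet = string.lowercase  # 'abcdefghijklmnopqrstuvwxyz' (Python 2)
mapping = {a: alphabet[(i + key) % 26] for i, a in enumerate(alphabet)}
print mapping['a'], mapping['z']  # -> d c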

Why would Julius Caesar wonder, if he were alive?

If the message to encrypt is small, it can be encrypted with a cipher disk by hand. But if the message consists of thousands of lines, only computing power can make it as easy as a 'Home Alone' task. Unfortunately, Caesar didn't have a computer with a Python interpreter in it. If he were alive, he would be amazed at how simple it is to implement any mathematical algorithm in Python. We are now going to build a cipher wheel in Python, a minimal encryption program for communicating secrets.

Ready, set, go: build it

#cipherwheel.py
import string
from random import randrange

#functions for encryption and decryption

def encrypt(m):
    #define circular wheels
    inner_wheel = [i for i in string.lowercase]
    outer_wheel = inner_wheel
    #calculate a random, non-zero secret key
    while True:
        key = randrange(26)
        if key!=0:
            break
    cipher_dict={}
    #map the encryption logic
    original_key =key
    for i in range(26):
        cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
        key = key+1
    #getting encrypted message
    print 'Encrypted with secret key ->> %d\n'%original_key
    cipher = ''.join([cipher_dict[i] if i!=' ' else ' ' for i in m])
    return cipher,original_key

def decrypt(cipher,key):
    inner_wheel = [i for i in string.lowercase]    
    outer_wheel = inner_wheel
    cipher_dict={}
    for i in range(26):
        cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
        key = key+1
    #decryption logic
    reverse_dict = dict(zip(cipher_dict.values() , cipher_dict.keys()))

    #getting original message back
    message = ''.join([reverse_dict[i] if i!=' ' else ' ' for i in cipher])
    return message

#Using cipher wheel here


while True:
    s = raw_input("Enter your secret message:")
    encrypted = encrypt(s)
    print 'encrypted message ->> %s\n'%(encrypted[0])
    print 'decrypted message ->> %s\n'%decrypt(encrypted[0],encrypted[1])

This is a small, basic encryption system that uses the Caesar cipher as its algorithm. Let us do an anatomy of the program and try to understand how it was built.

Anatomy of the above cipher wheel

First let us design the encrypt function with the cipher wheel. It is analogous to the encrypt() function in our program cipherwheel.py.

We need an inner wheel and an outer wheel, each initialized with the 26 alphabets. For that we use the string module constant string.lowercase, which holds 'abcd……xyz', and iterate over it to get a list of letters.

import string
inner_wheel = [i for i in string.lowercase] 
outer_wheel = inner_wheel

So now both the outer and inner circles are initialized with a list of alphabets. Now, when the outer circle is rotated, it should stop at some random point, which becomes the key of the algorithm.

from random import randrange
#rotating outer circle i.e generating random key
while True:
    key = randrange(26)
    if key!=0:
        break

Here the program is rotating the outer circle, i.e. generating a random key that is used to encrypt the message. While encrypting, the key variable gets incremented and its original value is lost, so we make a backup of it.

original_key =key

Now we need to create a mapping dictionary that maps the 'a' of the outer circle to the letter of the inner circle at the position of the key, and so on around the wheel. For example, if key = 2, then 'a' of the outer circle is mapped to 'c' of the inner circle, because 'c' has index 2 in the list. This mapping procedure is done with the code below.

cipher_dict={}
for i in range(26):
    cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
    key = key+1

By this, a mapping dictionary for the randomly generated key is formed. Now we need to use this dictionary to translate the original message into the secret message.

cipher = ''.join([cipher_dict.get(i, i) for i in m])
return cipher,original_key

cipher is the secret message created by the encryption mapping dictionary cipher_dict. Finally, we return both the cipher and the randomly generated key from the encrypt() function.

The decryption process is similar, but we need to reverse-map the dictionary in order to get the original message back. This intelligent one-line tweak shows the expressive power of Python.

#reverse map the dictionary
reverse_dict = dict(zip(cipher_dict.values() , cipher_dict.keys()))

#get original message from cipher (non-letters pass through again)
message = ''.join([reverse_dict.get(i, i) for i in cipher])

So the final output of the program looks like this.
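
Since the key is generated at random, every run differs; assuming the wheel stopped at key 3, a sample run would be:

Enter your secret message:et tu brute
Encrypted with secret key ->> 3

encrypted message ->> hw wx euxwh

decrypted message ->> et tu brute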

We got it: we designed a cipher wheel with Python. There are many other design aspects still open, like handling special symbols and a combination of lower and upper case letters in the message.

Caution

This is the very basic encryption algorithm that came to the mind of Julius Caesar. He could not have expected that, with the very same Python, we can crack the algorithm in seconds, because brute force only has to try 25 possible keys. So don't use this algorithm for commercial purposes (and don't reveal it to kids). My intention is to show how to build practical things with Python. In the next article I will come up with the 'transposition cipher', which is more powerful than the Caesar cipher, but still not the most powerful one.
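
To see just how weak it is, here is a minimal brute-force sketch (assuming, as in our program, that the message uses only lowercase letters) that tries all 25 keys against the cipher text from the sample run above:

#bruteforce.py - crack a Caesar cipher by trying every possible key
import string

def crack(cipher):
    letters = string.lowercase
    for key in range(1, 26):
        #shift every letter back by 'key' positions; other characters pass through
        guess = ''.join(letters[(letters.index(c) - key) % 26] if c in letters else c
                        for c in cipher)
        print 'key %2d ->> %s' % (key, guess)

crack('hw wx euxwh')   #the line for key 3 reads 'et tu brute'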
You can download the source code for the cipher wheel here: cipherwheel.py

Screenshot top 20 websites from top 20 categories using python

Yes, you heard it right. In this post we are going to simulate the Wayback Machine using Python. We are going to take screenshots of the top 20 websites from the top 20 categories.

We are creating here a project similar to http://www.waybackmachine.com . But here we are going to save a screenshot of each top website as an image on our computer. Along with that, we save the URLs of all those top websites in a text file for future use.

Let us build a SnapShotter

For building the SnapShotter (I named it that way) we need to face two questions:

1. How do we get the URLs of the top 20 websites in different categories?

2. How do we then navigate to each URL and snapshot it?

So we take a step-by-step approach. Everything will be clear in this post. No hurry-burry.

step 1: Know about spynner

First, let us look at how to screenshot a web page. The WebKit browser engine can help us here, and there is an easy-to-use Python wrapper built on top of it (via PyQt) named spynner. Why is it named spynner? Because it performs headless loading and rendering of web pages, similar to PhantomJS, and so acts like a spy in a war.

I advise you to install spynner now. Don't jump straight to pip; a clear installation procedure is given here, so refer to it once: install Spynner.

Now open the python terminal and type following

>>>import spynner
>>>browser = spynner.Browser()
>>>browser.load('http://www.example.com')
>>>browser.snapshot().save('example.png')

We are creating a browser instance. Next we load a URL into that headless browser. The last line screenshots example.com and saves the PNG in the current working directory with the file name 'example.png'.

So now we have a way to capture a web page into an image. Next, let's go and get the URLs required for our project.

step 2: Design scraper

We need to write a small web crawler to fetch the required URLs of the top websites. I found the website http://www.top20.com, which lists the top 20 websites from the top 20 categories. First roam around the website and see how it is designed. We need to screenshot 400+ URLs, and doing that manually is a Herculean task; that is why we need a crawler here.

#scraper.py
from lxml import html
import requests
def scrape(url,expr):
    #get the response
    page=requests.get(url)
    #build lxml tree from response body
    tree=html.fromstring(page.text)
    #use xpath() to fetch DOM elements
    url_box=set(tree.xpath(expr))
    return url_box

We are creating here a new file called scraper.py, with a function called scrape() in it, which we are going to use to build our crawler. Observe that the scrape() function takes a URL and an XPath expression as its arguments, and returns a set of all the URLs that the expression matches in the given web page. For crawling from one web page to another, we need all the navigating URLs from that page.
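
As a quick smoke test, you can point scrape() at any page with a generic expression; on http://www.example.com it might print something like this (the exact links depend on the page):

>>> from scraper import scrape
>>> scrape('http://www.example.com', '//a/@href')
set(['http://www.iana.org/domains/example'])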
step 3: Design crawler body

Now we are going to write code to scrape all the links of the top websites from http://www.top20.com .

#SnapShotter.py
from scraper import scrape
import spynner

#Initializations
browser=spynner.Browser()
w = open('top20sites.txt','w')
base_url = 'http://www.top20.com'

Now we are done with the imports and initialization. The next job is to write handlers for navigating from one web page to another.

def scrape_page():
    for scraped_url in scrape(base_url,'//a[@class="link"]/@href'):
        yield scraped_url

The scrape_page() function calls scrape() with base_url and an XPath expression, gets the URLs of the different category pages, and yields them one by one. The XPath expression is designed entirely by observing the DOM structure of the web page. If you have doubts about writing XPath expressions, kindly refer to http://lxml.de/xpathxslt.html
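
If XPath is new to you, this tiny self-contained example shows what the expression above selects. The markup here is hypothetical, just shaped like a listing page:

from lxml import html

#hypothetical markup, only to demonstrate the expression
snippet = '''
<ul>
  <li><a class="link" href="http://www.top20.com/news">News</a></li>
  <li><a class="link" href="http://www.top20.com/sports">Sports</a></li>
  <li><a href="/about">About</a></li>
</ul>
'''
tree = html.fromstring(snippet)
#the trailing @href pulls the attribute value instead of the element
print tree.xpath('//a[@class="link"]/@href')
#['http://www.top20.com/news', 'http://www.top20.com/sports']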

def scrape_absolute_url():
    for scraped_url in scrape_page():
        for final_url in scrape(scraped_url,'//a[@class="link"]/@href'):
            yield final_url

This is the second callback, for the second-level pages, each of which lists the top 20 websites of one category. It gets each category link by calling scrape_page(), sends every one of them to scrape() with an XPath expression, and yields the URLs of the top websites, which we consume in another function called save_url().

def save_url():
    for final_url in scrape_absolute_url():
        browser.load(final_url)
        #a URL contains '/' characters, which are not legal in a file name
        browser.snapshot().save('%s.png' % final_url.replace('/', '_'))
        w.write(final_url + '\n')

save_url() takes a screenshot of every website whose URL is yielded to it, and also writes that URL to the text file "top20sites.txt" which we opened earlier.

step 4: Initiate calling of handlers
save_url()

This is the starting point of our program. We need to call save_url(), which pulls from scrape_absolute_url(), which in turn pulls from scrape_page(). See how the generators transfer control between one another. Beautiful, isn't it?
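
If the way these generators hand control to each other feels magical, here is the same control flow in miniature, with toy names that have nothing to do with our project:

def pages():
    for url in ('cat1', 'cat2'):
        yield url

def sites():
    #each value pulled from sites() pulls, in turn, from pages()
    for page in pages():
        for n in (1, 2):
            yield '%s/site%d' % (page, n)

for site in sites():
    print site   #cat1/site1 cat1/site2 cat2/site1 cat2/site2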

w.close()

Next we need to close the file. That's it; our entire code looks this way.

step 5: Complete code
#SnapShotter.py
from scraper import scrape
import spynner

#Initializations
browser=spynner.Browser()
w = open('top20sites.txt','w')
base_url = 'http://www.top20.com'

#rock the spider from here

def scrape_page():
    for scraped_url in scrape(base_url,'//a[@class="link"]/@href'):
        yield scraped_url

def scrape_absolute_url():
    for scraped_url in scrape_page():
        for final_url in scrape(scraped_url,'//a[@class="link"]/@href'):
            yield final_url

def save_url():
    for final_url in scrape_absolute_url():
        browser.load(final_url)
        #a URL contains '/' characters, which are not legal in a file name
        browser.snapshot().save('%s.png' % final_url.replace('/', '_'))
        w.write(final_url + '\n')

save_url()
w.close()

This completes our SnapShotter. You will get the image screenshots in your directory, along with a text file listing the URLs of all the top websites. Here is the text file that was generated for me: https://app.box.com/s/895ypei1mlzb2yk0p0gb

Hope you enjoyed this post. This is a basic way to scrape the web systematically.