How to wake up a Python script while you are in a sound sleep?


We all know that programs die after execution. Data may persist if serialized. So consider the case where we need to back up all the logs, or delete something periodically. We need a scheduler to do that. One great scheduler available for Linux systems is the CRON scheduler. There are two things of subtle importance here:

1) How to use the CRON scheduler to execute any command on a Linux computer.

2) How to use APScheduler to run functions in a Python script at a particular time.

Crontab

The crontab (cron derives from chronos, Greek for time; tab stands for table) command, found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. To see what crontabs are currently running on your system, you can open a terminal and run:

$ crontab -l

If we want to create a new job or edit an existing crontab job, just type:

$ crontab -e
If we wish to remove all crontabs, just type:

$ crontab -r

It removes all crontabs. So what is a crontab? It is a file to which jobs are added. We add jobs to the end of that file by launching the command "$ crontab -e".

This will open the default editor (could be vi or pico; you can change the default editor if you want) to let us manipulate the crontab. If you save and exit the editor, all your cronjobs are saved into the crontab. Cronjobs are written in the following format; any valid command can be used after the five stars:

* * * * * /bin/execute/this/script.sh
* * * * * [any_valid_linux_command]

Scheduling

 

As you can see there are 5 stars. The stars represent different date parts in the following order:

  • minute (from 0 to 59)
  • hour (from 0 to 23)
  • day of month (from 1 to 31)
  • month (from 1 to 12)
  • day of week (from 0 to 6) (0=Sunday)

Execute every minute

If you leave a field as a star, or asterisk, it means every. Maybe that's a bit unclear. Let's use the previous example again:

* * * * * python /home/execute/this/funny.py

They are all still asterisks! So this means execute /home/execute/this/funny.py:

  • every minute
  • of every hour
  • of every day of the month
  • of every month
  • and every day in the week.

In short: This script is being executed every minute. Without exception.

Execute every Friday 1AM

So if we want to schedule the python script to run at 1AM every Friday, we would need the following cronjob:

0 1 * * 5 python /home/aryan/this/script.py

Get it? The script is now being executed when the system clock hits:

  • minute: 0
  • of hour: 1
  • of day of month: * (every day of month)
  • of month: * (every month)
  • and weekday: 5 (=Friday)

Execute on workdays 1AM

So if we want to schedule the Python script to run Monday through Friday at 1 AM, we would need the following cronjob:

0 1 * * 1-5 python /bin/execute/this/script.py

Neat scheduling tricks

What if you’d want to run something every 10 minutes? Well you could do this:

0,10,20,30,40,50 * * * * python /bin/execute/this/script.py

But crontab allows you to do this as well:

*/10 * * * * python /bin/execute/this/script.py

 

Storing the crontab output

By default cron saves the output of /bin/execute/this/backup.py in the user’s mailbox (root in this case). But it’s prettier if the output is saved in a separate logfile. Here’s how:

*/10 * * * * python /bin/execute/this/backup.py >> /var/log/script_output.log 2>&1

Linux can report on different levels. There's standard output (STDOUT) and standard error (STDERR). STDOUT is marked 1, STDERR is marked 2. So the following statement tells Linux to redirect STDERR into STDOUT as well, creating one data stream for messages and errors:

2>&1
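If you would rather keep errors apart from normal messages, the two streams can also go to separate files. A hypothetical variant of the job above (the log paths are just examples):

*/10 * * * * python /bin/execute/this/backup.py >> /var/log/script_output.log 2>> /var/log/script_errors.log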

This is a quick illustration to get up and running with cron.

So now we understand how to run a Python script at a particular time. This is done outside the program, meaning the programmer schedules the Python program manually. But sometimes we need to schedule from inside the program. For that we use a good library called APScheduler. You can install it using the following command:

$ sudo pip install apscheduler
OK, after installing APScheduler, we can see how simple it is to schedule any job. Here we are going a level deeper and scheduling Python functions to execute at a particular time. Here, jobs are Python functions.
from apscheduler.scheduler import Scheduler

#create a scheduler instance and start it
sched = Scheduler()
sched.start()

def my_job():
    print 'Happy_Birthday,Aryan'

#schedule the job function my_job to greet me every year on my birthday
sched.add_cron_job(my_job, month=6, day=24, hour=0)
So this script greets me on my birthday by running the function every year. Running the script itself is done by step 1, i.e. cron scheduling, and scheduling inside the script is handled by APScheduler; see how many goodies are provided for us. Here my_job does a simple task, but in real systems it could be anything: taking backups, deleting logs, housekeeping and so on.
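One caveat: the scheduler runs its jobs in a background thread, so a script like the one above would exit before the job ever fires. A minimal sketch of keeping the process alive (the sleep interval is arbitrary):

import time

#keep the main thread alive so the background scheduler can fire
while True:
    time.sleep(60)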
There are lots of things we can do with APScheduler, such as persisting jobs to a Mongo store, Redis store, etc. For full-fledged documentation, kindly go to the official APScheduler docs, especially the page on the cron format, and explore yourself.

Last but not least

Sometimes we need to create Python applications for desktop environments like KDE or GTK+. Such applications should run at system startup; for that we need to add the command below to the crontab file. Just add this simple line:

@reboot python /home/arya/Timepass.py &
Here @reboot says the command should be executed at reboot, and & tells the process to run in the background. This finishes our little chit-chat about CRON and APScheduler. Hope you enjoyed this post.

Understanding Egyptian multiplication via Python

Egyptian Multiplication

The ancient Egyptians used a curious way to multiply two numbers. The algorithm draws on the binary system: multiplication by 2, or just adding a number to itself. Unlike the Russian Peasant Multiplication, which determines the involved powers of 2 automatically, the Egyptian algorithm has an extra step where those powers have to be found explicitly.

Write two multiplicands with some room in-between as the captions for two columns of numbers. The first column starts with 1 and the second with the second multiplicand. Below, in each column, write successively the doubles of the preceding numbers. The first column will generate the sequence of the powers of 2: 1, 2, 4, 8, … Stop when the next power becomes greater than the first multiplicand. I’ll use the same example as in the Russian Peasant Multiplication, 85×18:

      1        18
      2        36
      4        72
      8       144
     16       288
     32       576
     64      1152

The right column is exactly the same as it would be in the Russian Peasant Multiplication. The left column consists of the powers of two. The red ones (1, 4, 16 and 64) are important: the corresponding entries in the right column add up to the product 85×18 = 1530:

  1 → 18, 4 → 72, 16 → 288, 64 → 1152, and 18 + 72 + 288 + 1152 = 1530.

Why do some powers of two come in red, while others in gold? Those in red add up to the first multiplicand:

  85 = 1 + 4 + 16 + 64,

which corresponds to the binary representation of 85:

  85 = 1010101₂,

According to the Rhind papyrus, these powers are found in the following way.

64 is included simply because it's the largest power of 2 not exceeding 85. Compute 85 - 64 = 21 and find the largest power of 2 below 21: 16. Compute 21 - 16 = 5 and find the largest power of 2 below 5: 4. Compute 5 - 4 = 1 and observe that the result, 1, is itself a power of 2: 1 = 2⁰. This is the reason to stop. The powers of two that go into 85 are 64, 16, 4, 1.

For the product 18×85 we get the same table and, of course, the same result, 1530. This method is closely related to the Russian peasant algorithm.

Now let us tackle this problem in Python.

First, prepare the imports:

from __future__ import division
import math

 

We can design a function that returns the greatest power of 2 less than or equal to a given number, because we need that concept frequently.

def greatest2power(n, i=0):
    #increase i until 2**i exceeds n, then step back one power
    while int(math.pow(2,i)) <= n : i = i+1
    return int(math.pow(2,i-1))
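For example, a quick check in the interpreter (a hypothetical session):

>>> greatest2power(85)
64
>>> greatest2power(64)
64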

Now let us take the inputs: a multiplicand and a multiplier.

m = int(raw_input('Enter multiplicand'))
n = int(raw_input('Enter multiplier'))

Now, according to the description above, set first to the greater number and second to the lesser one.

if m>n : first , second = m , n
else : first , second = n , m

We simulate the two columns (for 85 and 18) with fcol and scol. seed is the multiple of two used to populate those columns according to the algorithm.

fcol , scol = [] , []
seed = 1

Now we populate the two columns with the values the algorithm describes. The code snippet below is quite obvious:

while seed <= greatest2power(first):
    fcol.append(seed)
    scol.append(second*seed)
    seed = seed*2

Now we need to compute the valid powers of two by repeatedly subtracting from first, and store them in a list.

valid , backseed = [] , seed//2
while backseed>=1:
    valid.append(backseed)
    temp = backseed
    backseed = greatest2power(first-backseed)
    first = first - temp

The above snippet is analogous to (85 - 64 = 21, 21 > 16), (21 - 16 = 5, 5 > 4) and (5 - 4 = 1 >= 1), so [64, 16, 4, 1] are the valid powers of 2.

Now we iterate over the zip of fcol and scol to fetch the corresponding element for each valid power of two.

answer = 0
for sol in valid:
    for a,b in zip(fcol,scol):
        if a==sol:
            answer = answer+b

Finally, the answer is stored in the answer variable, and we print it:

print 'The Egyptian Product is:%d'%answer 

What is special about this? We could instead do straightforward multiplication. The actual beauty of the Egyptian strategy is that they used only the number 2 in their calculation. If you look at the program, reaching the next power of 2 is just adding a number to itself. So the Egyptians did multiplication with only the addition operator and the number 2, as we did in the program. The complete code goes here:

https://drive.google.com/file/d/0B6VAvV8caRaBd1JDQU5tendBVVE/view?usp=sharing
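In case the link is unreachable, here is the same program assembled verbatim from the snippets above:

#egyptian.py - Egyptian multiplication, assembled from the snippets above
from __future__ import division
import math

def greatest2power(n, i=0):
    while int(math.pow(2,i)) <= n : i = i+1
    return int(math.pow(2,i-1))

m = int(raw_input('Enter multiplicand'))
n = int(raw_input('Enter multiplier'))

if m>n : first , second = m , n
else : first , second = n , m

fcol , scol = [] , []
seed = 1
while seed <= greatest2power(first):
    fcol.append(seed)
    scol.append(second*seed)
    seed = seed*2

valid , backseed = [] , seed//2
while backseed>=1:
    valid.append(backseed)
    temp = backseed
    backseed = greatest2power(first-backseed)
    first = first - temp

answer = 0
for sol in valid:
    for a,b in zip(fcol,scol):
        if a==sol:
            answer = answer+b

print 'The Egyptian Product is:%d'%answer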

resources :

http://www.cut-the-knot.org/Curriculum/Algebra/EgyptianMultiplication.shtml

http://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication

Alas, Julius Caesar didn't have Python in 50 BC

 

 


We all know that Julius Caesar was a Roman dictator, who is also notable for his early cryptography studies. The one thing most of us are unaware of is that hundreds of trees were cut down in 50 BC to provide cipher wheels to all the Roman generals. A cipher wheel is a data-encrypting device that uses the Caesar cipher algorithm, which gave the base idea for modern encryption technologies.

A little history

The Roman ruler Julius Caesar (100 B.C. – 44 B.C.) used a very simple cipher for secret communication. He substituted each letter of the alphabet with a letter three positions further along. Later, any cipher that used this “displacement” concept for the creation of a cipher alphabet, was referred to as a Caesar cipher. Of all the substitution type ciphers, this Caesar cipher is the simplest to solve, since there are only 25 possible combinations.

What is a Cipher wheel ?

A cipher wheel is an encrypting device that consists of two concentric circles, an inner circle and an outer circle. The inner circle is fixed and the outer circle is rotated randomly, so that it stops at some point. Then the 'A' of the outer circle is tallied against the position of the 'A' of the inner circle. That offset is taken as the key, and the mapping between the positions of the outer and inner circles is used as the encrypting logic.

 

Here key = 3, since the 'A' of the outer circle is on the 'D' of the inner circle.

Why would Julius Caesar wonder, were he alive?

If the message is small, it can be encrypted by hand using a cipher disk. But if the message consists of thousands of lines, only computing power can make it as easy as a 'Home Alone' task. Unfortunately, Caesar didn't have a computer with a Python interpreter in it to do that. If he were alive, he might have wondered how simple it is to implement any mathematical algorithm in Python. We are now building a cipher wheel in Python, a minimal encryption program for communicating secrets.

Ready, set, go: build it

#cipherwheel.py
import string
from random import randrange

#functions for encryption and decryption

def encrypt(m):
    #define circular wheels
    inner_wheel = [i for i in string.lowercase]
    outer_wheel = inner_wheel
    #calculate a random, nonzero secret key
    while True:
        key = randrange(26)
        if key!=0:
            break
    cipher_dict={}
    #map the encryption logic
    original_key =key
    for i in range(26):
        cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
        key = key+1
    #getting encrypted message
    print 'Encrypted with secret key ->> %d\n'%original_key
    cipher = ''.join([cipher_dict[i] if i!=' ' else ' ' for i in m])
    return cipher,original_key

def decrypt(cipher,key):
    inner_wheel = [i for i in string.lowercase]    
    outer_wheel = inner_wheel
    cipher_dict={}
    for i in range(26):
        cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
        key = key+1
    #decryption logic
    reverse_dict = dict(zip(cipher_dict.values() , cipher_dict.keys()))

    #getting original message back
    message = ''.join([reverse_dict[i] if i!=' ' else ' ' for i in cipher])
    return message

#Using cipher wheel here


while True:
    s = raw_input("Enter your secret message:")
    encrypted = encrypt(s)
    print 'encrypted message ->> %s\n'%(encrypted[0])
    print 'decrypted message ->> %s\n'%decrypt(encrypted[0],encrypted[1])

This is a small, basic encryption system that uses the Caesar cipher as its algorithm. Let us do an anatomy of the program and try to understand how it was built.

Anatomy of above Caesar wheel

First let us design the encrypt function with the cipher wheel. It is analogous to the encrypt() function in our program cipherwheel.py.

We need an inner wheel and an outer wheel initialized with the 26 letters. For that, use the string module variable string.lowercase, which returns 'abcd...xyz'. We split it to get a list of letters.

import string
inner_wheel = [i for i in string.lowercase] 
outer_wheel = inner_wheel

So now both the outer and inner circles are initialized with the list of letters. When the outer circle is rotated, it should stop at some random point, which is the key of the algorithm.

from random import randrange
#rotating outer circle i.e generating random key
while True:
    key = randrange(26)
    if key!=0:
        break

 

Here the program is rotating the outer circle, generating a random key which is used to encrypt the message. The encryption loop mutates the key value, so we make a backup of it first.

original_key =key

Now we need to create a mapping dictionary that maps the 'a' of the outer circle to the letter of the inner circle at the position of the key. For example, if key = 2, then 'a' of the outer circle is mapped to 'c' of the inner circle, because 'c' has index 2 in the list. This mapping is done with the code below:

cipher_dict={}
for i in range(26):
    cipher_dict[outer_wheel[i]] = inner_wheel[key%26]
    key = key+1

 

With this, a mapping dictionary for the randomly generated key is formed. Now we use this dictionary to translate the original message into the secret message:

cipher = ''.join([cipher_dict[i] if i!=' ' else ' ' for i in m])
return cipher,original_key

 

cipher is the secret message created by the encryption mapping dictionary cipher_dict. Next we return both the cipher and the randomly generated key from the encrypt() function.

The decryption process is similar, but we need to reverse-map the dictionary in order to get the original message back. This one-line tweak shows the expressive power of Python:

#reverse map the dictionary
reverse_dict = dict(zip(cipher_dict.values() , cipher_dict.keys()))

#get original message from cipher
message = ''.join([reverse_dict[i] if i!=' ' else ' ' for i in cipher])

 

So the final output of the program looks like this.
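A hypothetical sample run (the key is random; here it happened to come out as 3):

Enter your secret message:attack at dawn
Encrypted with secret key ->> 3

encrypted message ->> dwwdfn dw gdzq

decrypted message ->> attack at dawn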

We got it. We designed a cipher wheel with Python. There are many other design aspects left to explore, like handling special symbols and a mix of lower and upper case in the message (the code above only handles lowercase letters and spaces).

Caution

This is the very basic encryption algorithm that came into the mind of Julius Caesar. He might not have expected that, with the very same Python, we can crack the algorithm in a few seconds, because brute force requires only 25 trials. So don't use this algorithm for commercial purposes (and don't reveal it to kids). My intention is to show how to build practical things with Python. In the next article I will come up with the 'Transposition cipher', which is more powerful than the Caesar cipher, but not the most powerful one.
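To back up that claim, here is a minimal brute-force sketch (my own illustration, not part of cipherwheel.py): it rebuilds the reverse mapping for every possible key and prints all 25 candidates, one of which is the original message.

import string

def crack(cipher):
    letters = [i for i in string.lowercase]
    for key in range(1, 26):
        #rebuild the decryption mapping for this candidate key
        reverse_dict = {}
        for i in range(26):
            reverse_dict[letters[(key + i) % 26]] = letters[i]
        guess = ''.join([reverse_dict[c] if c != ' ' else ' ' for c in cipher])
        print 'key %2d ->> %s' % (key, guess)

crack('dwwdfn dw gdzq')   #one of the 25 lines reads 'attack at dawn'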
You can download source code for cipher wheel here: cipherwheel.py

Screenshot top 20 websites from top 20 categories using python

 

Yes, you heard it right. In this post we are going to simulate the Wayback Machine using Python. We are going to take screenshots of the top 20 websites from the top 20 categories.

We are creating a project similar to http://www.waybackmachine.com. But here we are going to save a screenshot of each top website as an image on our computer. Along with that, we save all those top websites' URLs in a text file for future use.

Let us build a SnapShotter

For building the SnapShotter (I named it that way) we need to face two questions.

1. How to get the URLs of the top 20 websites in different categories?

2. Then how to navigate to each URL and snapshot it?

So for this we take a step-by-step approach. Everything will become clear in this post. No hurry-burry.

step 1 : Know about spynner

First, let's look at how to screenshot a web page. The great WebKit rendering engine can help us here, and there is an easy-to-use Python wrapper library around it named spynner. Why is it named spynner? Because it helps us perform headless testing of web page rendering, similar to PhantomJS, and acts like a spy in a war.

I advise you to install spynner. Don't jump straight to pip; a clear installation procedure is given here: install Spynner.

Now open the Python terminal and type the following:

>>>import spynner
>>>browser = spynner.Browser()
>>>browser.load('http://www.example.com')
>>>browser.snapshot().save('example.png')

We create a browser instance. Next we load a URL into that headless browser. The last line screenshots example.com and saves the PNG in the current working directory with the file name 'example.png'.

So now we have a way to capture a webpage into an image. Now let's go and get the required URLs for our project.

step 2 : Design scraper

We need to write a small web crawler to fetch the required URLs of the top websites. I found the website http://www.top20.com, which lists the top 20 websites from the top 20 categories. First roam the website and see how it was designed. We need to screenshot 400+ URLs; doing this manually is a Herculean task, and that is why we require a crawler here.

#scraper.py
from lxml import html
import requests
def scrape(url,expr):
    #get the response
    page=requests.get(url)
    #build lxml tree from response body
    tree=html.fromstring(page.text)
    #use xpath() to fetch DOM elements
    url_box=set(tree.xpath(expr))
    return url_box
We create a new file called scraper.py with a function called scrape() in it. We are going to use this to build our crawler. Observe that the scrape function takes a URL and an XPath expression as its arguments and returns a set of all the matching URLs in the given webpage. For crawling from one web page to another we need all the navigation URLs from each page.
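A quick interactive check of scrape() (a hypothetical session; the XPath expression is the one we use below):

>>> from scraper import scrape
>>> links = scrape('http://www.top20.com', '//a[@class="link"]/@href')
>>> type(links)
<type 'set'>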
step 3: Design crawler body
 Now we are going to write code to scrape all the links of top websites from http://www.top20.com
#SnapShotter.py
from scraper import scrape
import spynner

#Initializations
browser=spynner.Browser()
w = open('top20sites.txt','w')
base_url = 'http://www.top20.com'
Now we are done with the imports and initialization. The next job is to write handlers for navigating from one webpage to another.
def scrape_page():
    for scraped_url in scrape(base_url,'//a[@class="link"]/@href'):
        yield scraped_url

The scrape_page() function calls scrape() with base_url and an XPath expression and gets the URLs of the different categories, yielding them one by one. The XPath expression is designed entirely by observing the DOM structure of the webpage. If you have doubts about writing XPath expressions, kindly refer to http://lxml.de/xpathxslt.html.

def scrape_absolute_url():
    for scraped_url in scrape_page():
        for final_url in scrape(scraped_url,'//a[@class="link"]/@href'):
            yield final_url

This is the second callback, for the second-level pages, each of which lists the top 20 websites of one category. It gets each category link by calling the scrape_page() callback, then sends it to the scrape() function with an XPath expression. This function yields the top-website URLs, which we capture in another function called save_url().

def save_url():
    for final_url in scrape_absolute_url():
        browser.load(final_url)
        #replace '/' so the URL becomes a valid file name
        browser.snapshot().save('%s.png'%(final_url.replace('/','_')))
        w.write(final_url+'\n')

save_url() creates a screenshot of each website yielded to it and also writes the URL to the text file "top20sites.txt" which we opened before. (Note that URLs contain characters like '/' that are invalid in file names, so we replace them before saving.)

step 4: Initiate calling of handlers
save_url()

This is the starting point of our program. We call save_url(), which calls scrape_absolute_url(), which in turn calls scrape_page(). See how the callbacks transfer control. Beautiful, isn't it?

w.close()

Next we need to close the file. That's it; our entire code looks this way:

step 5: Complete code
#SnapShotter.py
from scraper import scrape
import spynner

#Initializations
browser=spynner.Browser()
w = open('top20sites.txt','w')
base_url = 'http://www.top20.com'

#rock the spider from here

def scrape_page():
    for scraped_url in scrape(base_url,'//a[@class="link"]/@href'):
        yield scraped_url

def scrape_absolute_url():
    for scraped_url in scrape_page():
        for final_url in scrape(scraped_url,'//a[@class="link"]/@href'):
            yield final_url

def save_url():
    for final_url in scrape_absolute_url():
        browser.load(final_url)
        browser.snapshot().save('%s.png'%(final_url.replace('/','_')))
        w.write(final_url+'\n')

save_url()
w.close()

This completes our SnapShotter; you will get image screenshots in your directory along with a text file listing the URLs of all the top websites. Here is the text file that was generated for me: https://app.box.com/s/895ypei1mlzb2yk0p0gb

Hope you enjoyed this post. This is the basic way to scrape the web systematically.

How I satisfied a request from my friend with Python

 

You may be wondering why the title reads that way. Yes, my friend asked me to do a task. What the request was, and how I completed it, you will know if you read this story.

The Story

Two days back my friend Sai Madhu was leaving for his home town, 700 miles away, from our office in Kochi. He had booked a train leaving in the afternoon. He is very passionate about cricket and never misses the score when the Indian cricket team is playing. On that same day there was an ODI match between England and India. Because he doesn't have a smartphone, on the train he had no way to follow the score. But he had a feature phone. He requested me to send him the score as SMS frequently, and I gave my word. I had lots of work to do and thought for a moment, "Did I unthinkingly give my word to him? No matter what, I should send him the score." Then Python came to the rescue. In the next section you will find how I handled Sai Madhu's request.

Python pulled a rabbit out of the hat

I thought, "Why do I not automate the process?" and three things came to my mind:

1. Scrape the score from a live-score website.

2. Send him an SMS with the score through a messaging API.

3. Repeat every 3 minutes.

Step 1:

import requests

from lxml import html

Step 2:

I am using Twilio for sending the SMS here. You can sign up for a free account and enjoy sending SMS to verified numbers. There are lots of free SMS APIs, but Twilio works perfectly. I verified Sai Madhu's number and was ready to go. To access our Twilio account from a Python program we need to install a package called twilio, which we can get easily by typing:

$ sudo pip install twilio

Now installation is over and the twilio library is ready to use. Do this final import:

from twilio.rest import TwilioRestClient

Now we have imported all the tools required for completing the automation job:

import requests
from lxml import html
from twilio.rest import TwilioRestClient
import time

 

OK, I also imported the time package to use its sleep function to make a delay of 3 minutes between messages. Twilio provides an Account SID and an Authorization token, which are required to send a message. We will instantiate a Twilio REST client with those details. I am defining a separate function for the sending stuff; I named it sendscore.

def sendscore(body):
    account_sid = "ACe5a382a0fe505XXXXXXXXXXXXXXXXXXX"
    auth_token = "cc2b50d82df3XXXXXXXXXXXXXXXXXXXXXXX"
    client = TwilioRestClient(account_sid, auth_token)
    message = client.messages.create(body=body,to="+919052108147",from_="+1 720-548-2740")
    print message.sid

 

So simple. First I created a Twilio client, passing the Account SID and Authorization token as arguments; this gives a connection to my Twilio account. There are many functions available on a Twilio client for sending SMS, sending MMS, making a call, etc., but I chose to send SMS. client.messages creates an instance for handling message transactions, and I use the create function on that instance to send an SMS. The 'to' argument is the verified destination number. The 'from_' argument is the number allocated to us on signup, along with the SID and token. If the message is sent successfully, the SMS id is returned, otherwise an error message; I print it to know what actually happened. Now I design the Python code for scraping the score.

page=requests.get('http://sports.ndtv.com/cricket/live-scores')
tree=html.fromstring(page.text)
score=(tree.xpath('//div[@class="ckt-scr"]/text()')[0].lstrip()).rstrip()

OK, now I have a clean score, fetched using the xpath method on the lxml tree object and formatted neatly. Observe the way I send the body of the HTML response to html.fromstring() to build an element tree. Now I call the sendscore function with the fetched score as its argument. It sends a message to the 'to' number with the score as its body. I also log each send, to monitor the process.

sendscore(score)
print "%s sent at:%s "%(score,time.ctime())
#to delay a message by three minutes 
time.sleep(180)

This process should run every three minutes, so I use a while loop and combine the two snippets above:

while True:
    page=requests.get('http://sports.ndtv.com/cricket/live-scores')
    tree=html.fromstring(page.text)
    score=(tree.xpath('//div[@class="ckt-scr"]/text()')[0].lstrip()).rstrip()
    sendscore(score)
    print "%s sent at:%s "%(score,time.ctime())
    time.sleep(180)
This completes the task. Putting it all together gives us the program that sends the score of England vs India to Sai Madhu every three minutes.
#final script for sending the score. I name it sendscore.py
from twilio.rest import TwilioRestClient
from lxml import html
import requests
import time



def sendscore(body):
    #Your account_sid goes here
    account_sid = "ACe5a382a0fe505faaXXXXXXXXXXXXXXXX"
    #Your authorization token goes here
    auth_token = "cc2b50d82df3a31cXXXXXXXXXXXXXXXXXX"
    client = TwilioRestClient(account_sid, auth_token)
    #to = "number to which SMS should be sent",from_="your twilio number"
    message = client.messages.create(body=body,to="+919052108147",from_="+1 720-548-2740")
    print message.sid

while True:
    page=requests.get('http://sports.ndtv.com/cricket/live-scores')
    tree=html.fromstring(page.text)
    score=(tree.xpath('//div[@class="ckt-scr"]/text()')[0].lstrip()).rstrip()
    sendscore(score)
    print "%s sent at:%s "%(score,time.ctime())
    time.sleep(180)

You can press Ctrl+C to quit the program. Once you start running it, it automatically sends the score every three minutes. I designed this program in 5 minutes, and Sai Madhu was very impressed with the accuracy of the scores he got that day. What can one not do with a computer, the internet and Python?

My intention in writing this post is to show how we can send an SMS via a Python program. We can also send MMS and make calls; you can refer to the complete API here: https://www.twilio.com/docs/api/rest.
Hope you enjoyed the post. Bye.

Forbes top 100 quote collecting spider with Dragline

 

In this post we start with the Dragline crawling framework, writing a jump-start, real-world spider. By the end of this post, in addition to the Dragline API, we will have covered some basic concepts of Python.

Forbes compiles good quotes and displays them on its website. Suppose you wish to save them so you can forward them to your close friends. You are a programmer, a Python coder at that, and your intention is to fetch all that data from the website using a spider. There are many spiders to study in the coming days, but here we discuss a basic one for fetching Forbes quotes. This attempt is to make you familiar with the Dragline framework.

As already explained in the previous post, Dragline has a lot of invisible features which make the spiders created with it smart. I hope you have already installed Dragline; if not, see the installation instructions in the previous post.

Task 1:

Learning Basics of dragline api

Dragline mainly consists of these major modules

      • dragline.http
      • dragline.htmlparser

dragline.http

It has a Request class:

class dragline.http.Request(url, method='GET', form_data=None, headers={}, callback=None, meta=None)

Parameters:
  • url (string) – the URL of this request
  • method (string) – the HTTP method of this request. Defaults to 'GET'.
  • headers (dict) – the headers of this request.
  • callback (string) – name of the function to call after url is downloaded.
  • meta (dict) – A dict that contains arbitrary metadata for this request.
send()
This function sends HTTP requests.

Returns: response
Return type: dragline.http.Response
Raises: dragline.http.RequestError when it fails to fetch contents
>>> req = Request("http://www.example.org")
>>> response = req.send()
>>> print response.headers['status']
200

and a Response class:

class dragline.http.Response(url=None, body=None, headers=None, meta=None)

Parameters:
  • headers (dict) – the headers of this response.
  • body (str) – the response body.
  • meta (dict) – meta copied from request

This class is used to create a user-defined response, for example to test your spider, and also in many other cases.
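For instance, a hand-built response for testing a parse function could look like this (a sketch based on the signature above; the HTML body is made up):

>>> fake = Response(url="http://www.example.org",
...                 body="<html><body><a href='/x'>x</a></body></html>")
>>> parse_obj = HtmlParser(fake)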

dragline.htmlparser

This is the basic parser module for extracting content from HTML data; its main function is called HtmlParser. Even apart from the rest of Dragline, htmlparser alone is a powerful parsing tool.

HtmlParser Function

dragline.htmlparser.HtmlParser(response)
Parameters: response (dragline.http.Response)

This function takes a response object as its argument and returns an lxml etree object.

The HtmlParser function returns an lxml object of type HtmlElement, which has a few powerful methods. All the details of the lxml object are discussed in the section lxml.html.HtmlElement.

First we should create an HtmlElement object by sending the appropriate URL as a parameter. The URL is that of the page we want to scrape.

HtmlElement object is returned by the HtmlParser function of dragline.htmlparser module:

>>> req = Request('http://www.gutenberg.com')
>>> parse_obj = HtmlParser(req.send())

The methods on the HtmlElement object are:

extract_urls(xpath_expr)

This function fetches all the links from the webpage in the response, filtered by the XPath expression given as its argument.

If no XPath expression is given, links are fetched from the entire document. Continuing the previous example, with parse_obj as our HtmlElement object:

>>> parse_obj.extract_urls('//div[@class="product"]')
xpath(expression)

This function directly accumulates the results of the XPath expression. It is used to fetch HTML body elements directly. Given the document:

<html>
    <head>
    </head>
    <body>
        <div class="tree">
            <a href="http://www.treesforthefuture.org/">Botany</a>
        </div>
        <div class="animal">
            <a href="http://www.animalplanet.com/">Zoology</a>
        </div>
    </body>
</html>

then we can use the following XPath expression:

>>> parse_obj.xpath('//div[@class="tree"]/a/@href')
extract_text(xpath_expr)

This function grabs all the text from the web page. The XPath expression is an optional argument; if specified, the text obtained is restricted to the nodes matching it.

     >>> parse_obj.extract_text('//html')

So now you have seen the main modules of Dragline and the important methods in them.

Now let's begin our journey by writing a small spider.
First go to the folder where you want to save your spider and follow the procedure below:
  • $ mkdir samplespider
  • $ cd samplespider
  • $ dragline-admin init forbesquotes

This creates a spider called forbesquotes in your newly created samplespider directory.

Now you will see a folder forbesquotes inside samplespider; traverse into it:

  • $ cd forbesquotes
 Task 2:

Writing a spider for collecting the top 100 quotes from Forbes

 

This is the 26-line spider for extracting the top 100 quotes from Forbes:

from dragline.htmlparser import HtmlParser
from dragline.http import Request
import re

class Spider:
    def __init__(self, conf):
        self.name = "forbesquotes"
        self.start = "http://www.forbes.com/sites/kevinkruse/2013/05/28/inspirational-quotes"
        self.allowed_domains = ['www.forbes.com']
        self.conf = conf

    def parse(self,response):
        html = HtmlParser(response)
        self.parseQuote(response)
        for url in html.xpath('//span[@class="page_links"]/a/@href'):
            yield Request(url,callback="parseQuote")

    def parseQuote(self,response):
        print response.url
        html = HtmlParser(response)
        title = html.xpath('//div[@class="body contains_vestpocket"]/p/text()')
        quotes = [i.encode('ascii',"ignore") for i in title if i!=' '][2:]
        pat = re.compile(r'\d*\.')
        with open('quotes.txt','a') as fil:
            for quote in [i.split(pat.search(i).group(),1)[1] for i in quotes]:
                fil.write('\n'+quote+'\n')

This is a 26-line spider with Dragline. Just by looking at it you might not understand much, so let's explain everything.

As already mentioned, when we create a new spider a new directory is formed with the spider's name. It consists of two files:

  • main.py
  • settings.py

main.py looks like the following, with a default class called Spider and two methods, __init__ and parse:

from dragline.htmlparser import HtmlParser
from dragline.http import Request


class Spider:

    def __init__(self, conf):
       self.name = "forbesquotes"
       self.start = "http://www.example.org"
       self.allowed_domains = []
       self.conf = conf

    def parse(self,response):
       html = HtmlParser(response)

All these things are given to us as a gift, so we don't have to hard-code them again. Now we only need to concentrate on how to attack the problem.

1) The __init__ method takes the starting URL and the allowed domains from which the spider begins.

In our case the forbesquotes spider starts at self.start = 'http://www.forbes.com/sites/kevinkruse/2013/05/28/inspirational-quotes'

and sets self.allowed_domains = ['www.forbes.com'],

a list which can take any number of allowed domains.

Now our main.py looks like:

from dragline.htmlparser import HtmlParser
from dragline.http import Request


class Spider:

    def __init__(self, conf):
        self.name = "forbesquotes"
        self.start = "http://www.forbes.com/sites/kev inkruse/2013/05/28/inspirational-quotes"
        self.allowed_domains = ['www.forbes.com']
        self.conf = conf

    def parse(self,response):
        html = HtmlParser(response)

OK, now we should crawl through the pages, so let's write a function called parseQuote for processing a page: its input is the response object and its outcome is that the quotes from the response page are written to a file. We should invoke parseQuote once for each page on which quotes are available. So after adding the parseQuote function:

from dragline.htmlparser import HtmlParser
from dragline.http import Request
import re


class Spider:

    def __init__(self, conf):
        self.name = "forbesquotes"
        self.start = "http://www.forbes.com/sites/kevinkruse/2013/05/28/inspirational-quotes"
        self.allowed_domains = ['www.forbes.com']
        self.conf = conf

    def parse(self,response):
        html = HtmlParser(response)

 
    def parseQuote(self,response):
        print response.url
        html = HtmlParser(response)
        title = html.xpath('//div[@class="body contains_vestpocket"]/p/text()')
        quotes = [i.encode('ascii',"ignore") for i in title if i!=' '][2:]
        pat = re.compile(r'\d*\.')
        with open('quotes.txt','a') as fil:
            for quote in [i.split(pat.search(i).group(),1)[1] for i in quotes]:
                fil.write('\n'+quote+'\n')

If you observe parseQuote, only the first three lines involve the framework; the remaining code is pure Python logic for stripping and editing the raw quotes fetched from the response and writing them to a file.

parse is the function where spider execution starts. From there we supply callbacks for the pages to which we wish to navigate, which means the spider smartly follows the path we specify.

So now I am adding content to the parse method. After observing the web page's structure, I call parseQuote on the current response.

Next, using the extract_urls method of the Dragline HtmlElement object, I extract all the URLs matching the relevant XPath and pass them as callbacks to the parseQuote function. The resulting code looks like:

from dragline.htmlparser import HtmlParser
from dragline.http import Request
import re

class Spider:
    def __init__(self, conf):
        self.name = "forbesquotes"
        self.start = "http://www.forbes.com/sites/kevinkruse/2013/05/28/inspirational-quotes"
        self.allowed_domains = ['www.forbes.com']
        self.conf = conf

    def parse(self,response):
        html = HtmlParser(response)
        self.parseQuote(response)
        for url in html.extract_urls('//span[@class="page_links"]/a'):
            yield Request(url,callback="parseQuote")

    def parseQuote(self,response):
        print response.url
        html = HtmlParser(response)
        title = html.xpath('//div[@class="body contains_vestpocket"]/p/text()')
        quotes = [i.encode('ascii',"ignore") for i in title if i!=' '][2:]
        pat = re.compile(r'\d*\.')
        with open('quotes.txt','a') as fil:
            for quote in [i.split(pat.search(i).group(),1)[1] for i in quotes]:
                fil.write('\n'+quote+'\n')

So now, after completing main.py, just go to the terminal and type the following command to run the spider:

  • $ dragline .   (from inside the spider directory)
  • $ dragline /path_to_spider/   (from other paths)

Then our spider starts running, displaying all the processed URLs in the command prompt, and a new file is created in our current directory with the top 100 quotes:

 Life isn't about getting and having, it's about giving and being.

 Whatever the mind of man can conceive and believe, it can achieve. Napoleon Hill

 Strive not to be a success, but rather to be of value. Albert Einstein

 Two roads diverged in a wood, and I, I took the one less traveled by, And that has made all the difference. Robert Frost

 I attribute my success to this: I never gave or took any excuse. Florence Nightingale

 You miss 100% of the shots you don't take. Wayne Gretzky

 I've missed more than 9000 shots in my career. I've lost almost 300 games. 26 times I've been trusted to take the game winning shot and missed. I've failed over and over and over again in my life. And that is why I succeed. Michael Jordan

 The most difficult thing is the decision to act, the rest is merely tenacity. Amelia Earhart

 Every strike brings me closer to the next home run. Babe Ruth

 Definiteness of purpose is the starting point of all achievement. W. Clement Stone

 We must balance conspicuous consumption with conscious capitalism. Kevin Kruse

 Life is what happens to you while you're busy making other plans. John Lennon

 We become what we think about. Earl Nightingale

 Twenty years from now you will be more disappointed by the things that you didn't do than by the ones you did do, so throw off the bowlines, sail away from safe harbor, catch the trade winds in your sails. Explore, Dream, Discover. Mark Twain

 Life is 10% what happens to me and 90% of how I react to it. Charles Swindoll

 There is only one way to avoid criticism: do nothing, say nothing, and be nothing. Aristotle

 Ask and it will be given to you; search, and you will find; knock and the door will be opened for you. Jesus

 The only person you are destined to become is the person you decide to be. Ralph Waldo Emerson

 Go confidently in the direction of your dreams. Live the life you have imagined. Henry David Thoreau

 When I stand before God at the end of my life, I would hope that I would not have a single bit of talent left and could say, I used everything you gave me. Erma Bombeck

 Few things can help an individual more than to place responsibility on him, and to let him know that you trust him. Booker T. Washington

 Certain things catch your eye, but pursue only those that capture the heart. Ancient Indian Proverb

 Believe you can and you're halfway there. Theodore Roosevelt

 Everything you've ever wanted is on the other side of fear. George Addair

 We can easily forgive a child who is afraid of the dark; the real tragedy of life is when men are afraid of the light. Plato

 If you're offered a seat on a rocket ship, don't ask what seat! Just get on. Sheryl Sandberg

 First, have a definite, clear practical ideal; a goal, an objective. Second, have the necessary means to achieve your ends; wisdom, money, materials, and methods. Third, adjust all your means to that end. Aristotle

 If the wind will not serve, take to the oars. Latin Proverb

 You can't fall if you don't climb. But there's no joy in living your whole life on the ground. Unknown

 We must believe that we are gifted for something, and that this thing, at whatever cost, must be attained. Marie Curie

 Too many of us are not living our dreams because we are living our fears. Les Brown

 Challenges are what make life interesting and overcoming them is what makes life meaningful. Joshua J. Marine

 If you want to lift yourself up, lift up someone else. Booker T. Washington

 I have been impressed with the urgency of doing. Knowing is not enough; we must apply. Being willing is not enough; we must do. Leonardo da Vinci

 Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless. Jamie Paolinetti

 You take your life in your own hands, and what happens? A terrible thing, no one to blame. Erica Jong

 What's money? A man is a success if he gets up in the morning and goes to bed at night and in between does what he wants to do. Bob Dylan

 I didn't fail the test. I just found 100 ways to do it wrong. Benjamin Franklin

 Nothing is impossible, the word itself says, I'm possible! Audrey Hepburn

 The only way to do great work is to love what you do. Steve Jobs

 If you can dream it, you can achieve it. Zig Ziglar

          and so on ....................

So this is a very small example; it is actually killing an ant with an axe. The main theme of this post is to introduce Dragline and make you familiar with it. Crawling is not always legal, so write spiders with the threats and benefits in mind. Many Python techniques were used, like smart usage of list comprehensions and regexes. Hope you enjoyed it. Comment if you have any queries.

Dragline,a Samson jawbone for crawling the web


Samson, a great hero of legend, slew all his enemies with a single jawbone. Here we have a single tool to slay all the problems of crawling. The olden days of pain and patience are gone. Now a new library is emerging, especially for writing powerful spiders for the web. This library will instantly turn you into an amazing spiderman who can play with hyperlinks. This is not a tool only for novices; enterprise-level work can be done with it with ease.

What is Dragline?

Dragline is a powerful Python framework for writing our own spiders. It can even be considered a full replacement for the other well-known web scraping frameworks developed so far.

Dragline actually has many advantages over its ancestors in the field of crawling. The main features are not going to be discussed here; you can find out why Dragline is more sophisticated than the other scraping frameworks by navigating to this link: Dragline features.

Where to get it?

You can download Dragline from the official python repository https://pypi.python.org/pypi/Dragline

We can also install Dragline with the following command, if pip is installed on the system:

$ sudo pip install --pre dragline

or

C:\> pip install --pre dragline

1. Introduction to dragline

Now we can begin our fun journey. I am going to show a real-world example in an upcoming post, but for now there are a few important points to ponder.

What is a spider? A spider is a program that crawls through web pages in a specified manner; the specified manner is the way we ask our spider to run. You may wonder that there are no good resources on crawling, not even a single O'Reilly book on the subject of spiders, especially with Python.

Many good crawling frameworks are ignored and known only to the few developers (few, in such a large Python community) who really work in the enterprise industry. Is crawling a worthy topic to consider? Yes, obviously, because crawlers are the main sources for creating datasets and for fetching information programmatically.

Then why did this new framework emerge? The existing crawling frameworks have some drawbacks when we work on huge projects. Young readers may be put off by words like "project" and "enterprise", but I ask you to take them lightly; I don't like them either. Everything should be plain. I used Dragline to write spiders for many websites, and it roughly takes 5 minutes to write a spider for a normal website. What amazing speed!

The complexity of crawling increases by some factors like:

a) JavaScript rendering of web pages

b) pages that load dynamically as you scroll down

c) the rejection of HTTP requests by the server, i.e. timeouts

The first two factors are unavoidable and left to the genius of the programmer, but the last one can be handled by the library, if it is smart. Dragline is good at pausing and resuming connections when a server's load is heavy. I want to keep the usage of Dragline in suspense for the time being, but if you are inspired by my words you can check it out right now. There is good, though not extraordinary, documentation available. You may be surprised at how well the framework works once you understand it.

Thanks for listening patiently; I assure you that Dragline won't disappoint. We will meet next time with a real-world spider that amazes you and keeps your sleeves rolled up.

If you wish for any improvements to Dragline, you can contribute to Dragline's GitHub repository here:

Dragline github

 

 

Up and running MongoDB with python.


MongoDB is a NoSQL database which stores records as documents. MongoDB has some really nice, unique tools that are not (all) present in any other solution:

1. Indexing

MongoDB supports generic secondary indexes, allowing a variety of fast queries, and provides unique, compound, and geospatial indexing capabilities as well.

2. Stored JavaScript

Instead of stored procedures, developers can store and use JavaScript functions and values on the server side.

3. Aggregation

MongoDB supports MapReduce and other aggregation tools.

4. Fixed-size collections

Capped collections are fixed in size and are useful for certain types of data, such as logs.

5. File storage

MongoDB supports an easy-to-use protocol for storing large files and file metadata.

Some features common to relational databases are not present in MongoDB, notably joins and complex multi-row transactions. These are architectural decisions to allow for scalability, because both of those features are difficult to provide efficiently in a distributed system.

After installing MongoDB, please run the following command to start the MongoDB server:

$ ./mongod        for Linux or Mac users, or

C:\> mongod       for Windows users

The server runs on port 27017 by default.

Next, if you want to perform raw database operations, open another terminal and type

$ ./mongo (or) C:\> mongo to launch the client database JavaScript shell.

Actually, the documents stored in MongoDB are in the BSON format. Python developers only deal with storing dictionaries as documents in MongoDB and never look into the BSON format, so we may wonder how things go fine. The answer is that a driver called pymongo abstracts away the conversion of a Python dictionary into a valid MongoDB BSON document. So now we should install pymongo to integrate Python and MongoDB.

First we can look into how the basic MongoDB client JavaScript shell works.

initially the shell looks like this:

MongoDB shell version 2.2.7

connecting to: test

>

> x = 200
200
> x / 5;
40

> "Hello, World!".replace("World", "MongoDB");
Hello, MongoDB!

> function factorial (n) {
... if (n <= 1) return 1;
... return n * factorial(n - 1);
... }
> factorial(5);
120

All of the above are JavaScript expressions and functions, so it is a full-fledged interpreter.

If you have a database called 'student', we can switch from the test db to the student db as:

>use student

switched to db student

 

Insertion into MongoDB

First we create a post and insert it into the database:

> post = {"title" : "Naren",
... "ID" : "10C91A05B3",
... "date" : new Date()}

> db.student.insert(post)

Now the record post is saved in the student database.

 

Retrieving the record from MongoDB

>db.student.find()

This command returns all the documents in the student collection. It is equivalent to 'select * from student' in SQL.

>db.student.findOne()

This command returns only the first document from the student collection. We can also use criteria for fetching documents.

Note: _id is a default field that MongoDB adds to each document we insert. On fetching, you will see this extra field in addition to "title", "ID" and "date". In this way MongoDB provides a default primary key for each document.
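For example, a fetch might look like this (the ObjectId value is illustrative):

> db.student.findOne()
{
    "_id" : ObjectId("537885629a6b8f1f2c8f6f43"),
    "title" : "Naren",
    "ID" : "10C91A05B3",
    "date" : ISODate("2014-05-18T15:48:22Z")
}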

If you want to select a record whose title is Naren, use the following command:

> db.student.findOne({"title":"Naren"})

 

Updating a document in MongoDB

Case 1: adding a new field

If we want to insert a new field percentage into the document, first do:

> post.percentage = 83

> db.student.update({"title":"Naren"}, post)

Now the document looks like:

> db.student.findOne({"title":"Naren"})

{"title" : "Naren", "ID" : "10C91A05B3",
"date" : "Sunday May 18 2014 09:18:22 PM",
"percentage" : 83}

Case 2: modifying an existing record

> db.student.update({"title":"Naren"}, {"$set":{"ID":"10C91A0501"}})

With the above command, the database searches for the document whose title is Naren and sets its ID value to 10C91A0501.

'$set' is called a modifier; MongoDB is very rich in modifiers. Check the MongoDB manual for the complete list.

 

Deleting records in database

> db.student.remove({title : "Naren"})

It removes all the records with title "Naren".

Note: if you want to remove the entire student collection, don't loop over each document. Simply use the following command:

> db.student.drop()

The above command drops the student collection in one shot and is preferred for performance reasons.

Jump starting the Python and MongoDB

First install the pymongo library for your Python interpreter; use "pip install pymongo" to do that. Once pymongo is installed, we are ready to play with MongoDB.


 

from datetime import datetime
import sys
from pymongo import Connection
from pymongo.errors import ConnectionFailure
def main():
    """connects to mongo db"""
    try:
        c=Connection(host="localhost",port=27017)
        print "Connected sucessfully"
    except ConnectionFailure,e:
        sys.stderr.write("Couldn't connect:%s"%e)
        sys.exit(1)
    dbh=c['mydb']
    assert dbh.connection==c
    user_doc={
        "username":"narenarya",
        "firstname":"naren",
        "secondname":"arya",
        "date_of_bitrh":datetime(1993,6,24),
        "email":"narenarya@live.com",
        "score":80}
    dbh.users.insert(user_doc,safe=True)
    print "successfully inserted doc:%s"%user_doc
    print "Now reteiving result"
    fan=dbh.users.find({"firstname":"naren"})
    for user in fan:
        print user["email"]
if __name__=="__main__":main()

  1. First you should start the MongoDB server by typing $ ./mongod or C:\> mongod.
  2. Don't be afraid of all those imports. What we really need is the Connection class from the pymongo package, so 'from pymongo import Connection' is required.
  3. The above two steps are compulsory. Now we get a Connection object to MongoDB with c = Connection(host, port); here host="localhost" and port=27017 by default.
  4. Next we get a database handle to perform CRUD operations: dbh = c['mydb']. This fetches a handle to the database mydb; in this database we can store many collections (tables, in SQL language).
  5. To insert a document user_doc into the collection 'users', we simply call dbh.users.insert(user_doc). We don't need to create the users collection first; MongoDB creates it automatically.
  6. Retrieving gives us a cursor object, here "fan". So for each document user in the cursor we print its 'email'. The other imports are for error handling if the connection fails, and datetime generates the date for user_doc.
  7. This is just a quick start to get you switched on to MongoDB; there are good books on the topic. For any queries please mail me: narenarya@live.com
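For completeness, updates and removes go through the same handle. A minimal sketch, assuming the dbh handle and the document from the snippet above:

#update the score of the matching user, then remove the document
dbh.users.update({"username": "narenarya"},
                 {"$set": {"score": 90}}, safe=True)
dbh.users.remove({"username": "narenarya"}, safe=True)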

Valuable recipes of core Python we love to skip, part 1


Recipe 1: Accessing private methods in Python

Python doesn’t support privacy directly, but relies on the programmer to know when it is safe to modify an attribute from the outside. After all, you should know how to use an object before using that object. It is, however, possible to achieve something like private attributes with a little trickery.

To make a method or attribute private (inaccessible from the outside), simply start its name with two underscores:
class Secretive:
    def __inaccessible(self):
        print "Bet you can't see me..."

    def accessible(self):
        print "The secret message is:"
        self.__inaccessible()

Now __inaccessible is inaccessible to the outside world, while it can still be used inside the class (for example, from accessible):

>>> s = Secretive()
>>> s.__inaccessible()
Traceback (most recent call last):
  File "<pyshell#112>", line 1, in ?
    s.__inaccessible()
AttributeError: Secretive instance has no attribute '__inaccessible'

>>> s.accessible()
The secret message is:
Bet you can't see me...

Although the double underscores are a bit strange, this seems like a standard private method, as found in other languages. What’s not so standard is what actually happens. Inside a class definition, all names beginning with a double underscore are “translated” by adding a single underscore and the class name to the beginning:

>>> Secretive._Secretive__inaccessible
<unbound method Secretive.__inaccessible>
If you know how this works behind the scenes, it is still possible to access private methods outside the class, even though you’re not supposed to:
>>> s._Secretive__inaccessible()
Bet you can't see me...

So, in short, you can’t be sure that others won’t access the methods and attributes of your objects, but this sort of name-mangling is a pretty strong signal that they shouldn’t.
If you don’t want the name-mangling effect, but you still want to send a signal for other objects to stay away, you can use a single initial underscore. This is mostly just a convention, but has some practical effects. For example, names with an initial underscore aren’t imported with starred imports.
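As a quick illustration of that last point, consider a hypothetical module (call it stuff.py) with one public and one underscored function:

#stuff.py (hypothetical module)
def public():
    print 'I am public'

def _private():
    print 'I am conventionally private'

#elsewhere, a starred import skips the underscored name:
#from stuff import *
#public()      #works
#_private()    #NameError: not imported by the starred import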

Recipe 2: Dictionary methods you should know perfectly

    

has_key

The has_key method checks whether a dictionary has a given key. The expression d.has_key(k) is equivalent to k in d. The choice of which to use is largely a matter of taste, although has_key is on its way out of the language (it will be gone in Python 3.0).
Here is an example of how you might use has_key:

>>> d = {}
>>> d.has_key('name')
False
>>> d['name'] = 'Eric'
>>> d.has_key('name')
True
    

items and iteritems

The items method returns all the items of the dictionary as a list of items in which each item is of the form (key, value). The items are not returned in any particular order:

>>> d = {'title': 'Python Web Site', 'url': 'http://www.python.org', 'spam': 0}
>>> d.items()
[('url', 'http://www.python.org'), ('spam', 0), ('title', 'Python Web Site')]
The iteritems method works in much the same way, but returns an iterator instead of a list:

>>> it = d.iteritems()
>>> it
<dictionary-itemiterator object at 0x...>
>>> list(it) # Convert the iterator to a list
[('url', 'http://www.python.org'), ('spam', 0), ('title', 'Python Web Site')]
Using iteritems may be more efficient in many cases (especially if you want to iterate over the result).
keys and iterkeys
The keys method returns a list of the keys in the dictionary, while iterkeys returns an iterator over the keys.
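A quick interpreter sketch of both (key order in the output is arbitrary):

>>> d = {'x': 1, 'y': 2}
>>> d.keys()                # a plain list
['y', 'x']
>>> list(d.iterkeys())      # an iterator, converted here for display
['y', 'x']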
    

pop

The pop method can be used to get the value corresponding to a given key, and then remove the key-value pair from the dictionary:

>>> d = {'x': 1, 'y': 2}
>>> d.pop('x')
1
>>> d
{'y': 2}
    

popitem

The popitem method is similar to list.pop, which pops off the last element of a list. Unlike list.pop, however, popitem pops off an arbitrary item because dictionaries don't have a "last element" or any order whatsoever. This may be very useful if you want to remove and process the items one by one in an efficient way (without retrieving a list of the keys first):

>>> d
{'url': 'http://www.python.org', 'spam': 0, 'title': 'Python Web Site'}
>>> d.popitem()
('url', 'http://www.python.org')
>>> d
{'spam': 0, 'title': 'Python Web Site'}

Although popitem is similar to the list method pop, there is no dictionary equivalent of append (which adds an element to the end of a list). Because dictionaries have no order, such a method wouldn’t make any sense.
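A short sketch of that remove-and-process pattern (an aside, not from the original text):

# Drain a dictionary one item at a time; the order of removal is arbitrary
d = {'spam': 0, 'title': 'Python Web Site'}
while d:
    key, value = d.popitem()
    print key, value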
    

setdefault

The setdefault method is somewhat similar to get, in that it retrieves a value associated with a given key. In addition to the get functionality, setdefault sets the value corresponding to the given key if it is not already in the dictionary:

>>> d = {}
>>> d.setdefault('name', 'N/A')
'N/A'
>>> d
{'name': 'N/A'}
>>> d['name'] = 'Gumby'
>>> d.setdefault('name', 'N/A')
'Gumby'
>>> d
{'name': 'Gumby'}

As you can see, when the key is missing, setdefault returns the default and updates the dictionary accordingly. If the key is present, its value is returned and the dictionary is left unchanged. The default is optional, as with get; if it is left out, None is used:

>>> d = {}
>>> print d.setdefault('name')
None
>>> d
{'name': None}
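One classic use of setdefault (an aside, not from the original text) is grouping values under a computed key, supplying the empty list only when the key is first seen:

groups = {}
for word in ['apple', 'bee', 'ant']:
    groups.setdefault(word[0], []).append(word)
print groups   # e.g. {'a': ['apple', 'ant'], 'b': ['bee']} (key order is arbitrary)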

    

update


The update method updates one dictionary with the items of another:

>>> d = {
...     'title': 'Python Web Site',
...     'url': 'http://www.python.org',
...     'changed': 'Mar 14 22:09:15 MET 2008'}
>>> x = {'title': 'Python Language Website'}
>>> d.update(x)
>>> d
{'url': 'http://www.python.org', 'changed': 'Mar 14 22:09:15 MET 2008', 'title': 'Python Language Website'}

The items in the supplied dictionary are added to the old one, supplanting any items there with the same keys.
The update method can be called in the same way as the dict function (or type constructor). This means that update can be called with a mapping, a sequence (or other iterable object) of (key, value) pairs, or keyword arguments.
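A quick sketch of the three calling forms:

d = {}
d.update({'title': 'Python Web Site'})         # a mapping
d.update([('url', 'http://www.python.org')])   # a sequence of (key, value) pairs
d.update(spam=0)                               # keyword arguments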
values and itervalues
The values method returns a list of the values in the dictionary (and itervalues returns an iterator of the values). Unlike keys, the list returned by values may contain duplicates:

>>> d = {}
>>> d[1] = 1
>>> d[2] = 2
>>> d[3] = 3
>>> d[4] = 1
>>> d.values()
[1, 2, 3, 1]

Recipe 3: How can nested functions in Python be used productively?

Python functions may be nested: you can put one inside another. Here is an example:
def foo():
    def bar():
        print "Hello, world!"
    bar()

Nesting is normally not all that useful, but there is one particular application that stands out: using one function to "create" another. This means that you can (among other things) write functions like the following:

def multiplier(factor):
    def multiplyByFactor(number):
        return number*factor
    return multiplyByFactor

One function is inside another, and the outer function returns the inner one; that is, the function itself is returned—it is not called. What’s important is that the returned function still has access to the scope where it was defined; in other words, it carries its environment (and the associated local variables) with it!

Each time the outer function is called, the inner one gets redefined, and each time, the variable factor may have a new value. Because of Python’s nested scopes, this variable from the outer local scope (of multiplier) is accessible in the inner function later on, as follows:

>>> double = multiplier(2)
>>> double(5)
10
>>> triple = multiplier(3)
>>> triple(3)
9
>>> multiplier(5)(4)
20

A function such as multiplyByFactor that stores its enclosing scopes is called a closure. Normally, you cannot rebind variables in outer scopes. In Python 3.0, however, the keyword nonlocal is introduced. It is used in much the same way as global, and lets you assign to variables in outer (but nonglobal) scopes.
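A short sketch of nonlocal; note that this one is Python 3 only, while the rest of this post uses Python 2 syntax:

def counter():
    count = 0
    def increment():
        nonlocal count      # rebinds count in the enclosing scope (Python 3 only)
        count += 1
        return count
    return increment

tick = counter()
print(tick(), tick(), tick())   # 1 2 3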

Recipe 4: Launching a web browser from a Python program

The os.system function is useful for a lot of things, but for the specific task of launching a web browser, there’s an even better solution: the webbrowser module. It contains a function called open, which lets you automatically launch a web browser to open the given URL. For example, if you want your program to open the impythonist web site in a web browser (either starting a new browser or using one that is already running), you simply use this:

import webbrowser
webbrowser.open("http://www.impythonist.co.nr")

The page should pop up. Pretty nifty, huh?
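The module also provides open_new and open_new_tab, which ask for a new browser window or tab respectively (the browser is free to ignore the hint):

import webbrowser
webbrowser.open_new("http://www.impythonist.co.nr")       # request a new window
webbrowser.open_new_tab("http://www.impythonist.co.nr")   # request a new tab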
We will discuss a lot more. Stay tuned.

How to use Python's expressive power for solving a level-5 challenge

Hello friends. It took too long for me to post a new article, but Impythonist came up with a really interesting insight this time.

OK, what does the title refer to? We have all programmed day and night, so why do we stumble when a new challenge is given to us?
This article is not intended for hardcore coders; the main focus is on beginners and intermediate programmers.

“The coding is an art which rises from deep of the heart”

So here we are going to look at a use-case challenge of the kind that comes up frequently in local or web coding contests. We could use any
programming language, but as a Pythonist I am privileged to use Python, a beautiful and elegant programming language.
The problem goes as follows:

CHALLENGE

""" You are provided with a text file containing a list of integers.

Let S be the longest subsequence of consecutive increasing integers in this list,
such that the element at position n is greater than the element at position n-1.
If there are multiple sub-sequences of the same length, let S be the one that appears first in the list.
What is the absolute value of the sum of the elements in S?

For example, take the list [3, 4, 2, 5, 7, 7, 1, 2].
For this list, the longest subsequence of increasing integers has 3 elements: S = [2, 5, 7].
The absolute value of the sum of these elements is |2+5+7|=14 """

The text file is below:

https://app.box.com/s/7fjlraqe5dm6ik54cbgg

Step 1: Understand The Problem

It is not a simple question, but thinking a bit leads us to theories for attacking the problem.
At first sight we may come up with the idea of dynamic programming, but only a little application of dynamic
programming is required here.

Step 2: Run the Python interpreter

Now think about how the beginning of the solution could look: what data types are required, and what Python concepts
are involved?

By a little observation of the problem statement, we draw the following conclusions:

1. File handling in Python is needed.
2. A little string handling is required.
3. List manipulation is necessary, and nothing more.

Now go and get some coffee and roll up your sleeves before diving into the technique.

Step 3: No more steps. Jumping right into the code.

# My version of the solution to the subsequence challenge

1. fil=open("num.txt","r").readlines()
2. mylo=[]
3. for l in fil:
4.     mylo.append(int(l))
5. long_seq,i,temp,max_seq_index=0,0,1,0
6. while i<len(mylo):
7.     if long_seq>temp: temp,max_seq_index=long_seq,i-1
8.     long_seq,j=1,i
9.     while j!=len(mylo)-1 and mylo[j]<mylo[j+1] :
10.         long_seq+=1
11.         j+=1
12.     i+=1
13. total=abs(sum(mylo[max_seq_index:max_seq_index+temp]))
14. print "The longest sequence no in File is: %d and sum is %d"% (temp,total)

Just observe the above code. It is a complete solution to the problem. Your feet will ache if you try to find
the solution in other, more traditional languages. Don't be panicked by the code; we are going to discuss each and every line.
line 1:
fil=open("num.txt","r").readlines()

It is simple. We open a file called num.txt in read mode. The readlines() method of the file object returns a list of
strings, one for each line in the file. Here num.txt is in the current directory; a path can be specified instead.

line 2:
mylo=[]

Here we plan to capture each line as a single value and store it in a list, so we create an empty list first.

line 3,4:
for l in fil:
    mylo.append(int(l))

Now, for each element of the returned list fil, we cast the value to int and append it to the list mylo.
With this, a list is set up with values to work on. The actual logic starts in the next lines.

line 5:
long_seq,i,temp,max_seq_index=0,0,1,0

In this line we pack (assign) four variables in a single line. Here is what the variables mean.

long_seq ———> This variable stores the length of the increasing run found in each iteration.
It may seem alien now, but things will settle in a moment.

i ———-> This variable is used to iterate over each element of the list.

temp ———-> This variable stores the longest subsequence length found so far. It holds part of the answer,
i.e. the length of the longest run of elements satisfying the condition
s[i]<s[i+1]. It is initialized to 1 because at least one element always satisfies the condition.

max_seq_index —–> This variable holds the starting index of the longest subsequence. We later use it to compute
the sum of all elements satisfying the criterion s[i]<s[i+1] in the longest subsequence.

line 6:
while i<len(mylo):

Here we iterate from the first element of the list 'mylo' up to the last one.

line 7:
if long_seq>temp: temp,max_seq_index=long_seq,i-1

Here comes the tricky part. This construct is written by looking into the future; that is the dynamic-programming flavour.
It says: if the run length found in the previous iteration is greater than the previously recorded one, update temp, and store
the index where that run started into max_seq_index. Why i-1 and not i? Because the run we are recording started at the
previous value of i. This will become clear in a few minutes.

line 8:
long_seq,j=1,i

long_seq is a temporary counter for each iteration; it is a sentinel working for temp. The analogy goes like this: if the money
collected by long_seq is greater, temp takes ownership of it; otherwise temp doesn't care. long_seq is sent into the battlefield
initialized with one rupee; whatever it earns in the loop below, the longest run length is later fed to temp. j is initialized
to i in order to check whether a longer subsequence starts at position i.

line 9-11:

while j!=len(mylo)-1 and mylo[j]<mylo[j+1] :
    long_seq+=1
    j+=1

j!=len(mylo)-1 is written for fear of an array-index-out-of-bounds in the next condition, mylo[j]<mylo[j+1]. If both checks pass,
j skids to the next position and long_seq bags one more point for the satisfied condition. This continues as long as the
subsequence keeps increasing; when the condition fails, i restarts from the next element.

line 12:

i+=1

Simple: this advances i to the next element of 'mylo'.

line 13:

total=abs(sum(mylo[max_seq_index:max_seq_index+temp]))

Here total stores the sum of all the elements of the longest subsequence satisfying the condition. The temp value is used for
slicing the list 'mylo' to return only the part that is the longest subsequence. The built-in sum() adds up the sliced list,
and abs() returns the absolute value, converting a negative sum into a positive one.

line 14:

print "The longest sequence no in File is: %d and sum is %d"% (temp,total)

As is clear, this line prints the longest subsequence length and the sum, as asked in the challenge.
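As a quick sanity check (my aside, not part of the original solution), the same logic can be wrapped in a function and run against the example list from the problem statement:

# Same algorithm as above, wrapped in a function for testing
def longest_run_sum(mylo):
    long_seq, i, temp, max_seq_index = 0, 0, 1, 0
    while i < len(mylo):
        if long_seq > temp:
            temp, max_seq_index = long_seq, i - 1
        long_seq, j = 1, i
        while j != len(mylo) - 1 and mylo[j] < mylo[j + 1]:
            long_seq += 1
            j += 1
        i += 1
    return temp, abs(sum(mylo[max_seq_index:max_seq_index + temp]))

print longest_run_sum([3, 4, 2, 5, 7, 7, 1, 2])   # prints (3, 14), matching the example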

In this way Python constructs are highly optimized and can be used to write new programs at lightning speed. The dynamic-programming
approach might confuse you a little, but if you follow the execution path you will see how cleverly the problem was solved.
This is a small effort to enhance design strategies for beginners. For further discussions feel free to mail me.

my mail address : narenarya@live.com