
Building a Virtual Personal Assistant with Telegram app and Telepot

Have you ever wondered what comforts a truly programmable app can give to a consumer? Many of us admire the Internet of Things (IoT). So today I am going to talk about creating Personal Assistants (PA) for the Telegram app, an application similar to WhatsApp but fully programmable through its Bot API. We can pass messages to multiple devices in the blink of an eye. Telegram is an app that hackers like because of its customization. Messaging apps should not only provide communication between humans but also lay channels between humans and programmable machines. There are obvious advantages for programmers who use the Telegram app.

  1. Get Github and Sentry messages directly to your app
  2. Get favorite tweets from Twitter
  3. Get updates from multiple information sources like weather or scores
  4. Control home appliances by sending pre-defined commands

The applications are endless. In this IoT generation you need a platform to program on, and the Telegram API provides exactly that.

This tutorial will use two main ingredients.

  1. Telegram app
  2. Telepot python library

Target

Our goal is to build a PA for ourselves using the Bot API provided by the Telegram app. First we need to install the app:

Telegram app on Google playstore

You can also have a Telegram client for your system. Mine is Ubuntu 14.04. You can download all clients here.

Telegram desktop clients

I presume that you have installed the Telegram app. Now we need to create a bot. Telegram provides a program called BotFather with which we can create custom bots. Launch BotFather by visiting this link.

https://telegram.me/botfather

After adding BotFather to your chat list, open it and you will see a few options like this:

 

[Screenshot: BotFather command options]

Now type /newbot and hit Enter. It will ask you for a name; provide one. Next it will generate an API key, which we are going to use to build our bot. Store this API key. It also gives a link to your bot. Visit that link to add the bot as one of your friends, and share it with others if you want to.

Telepot, a Python client for Telegram

Telepot is a Python client that wraps the Telegram Bot REST API. Using it we can take commands from the user, compute something, and return results. Now I am going to build a small bot program which does the following things when the commands below are given.

  1. /timeline -> Should fetch the latest tweets on my timeline
  2. /tweet=message  -> Should tweet my message on Twitter
  3. /chat  -> Should launch a virtual chat with machine
  4. /stopchat -> Should stop the chat when you are bored

These tasks might seem simple. But once you know how to unleash the power of message passing between devices, you can define your own custom tasks with far greater value. Code for this application is at https://github.com/narenaryan/Mika

Let us build the PA

First we need to install the required libraries for constructing the virtual assistant. I named my PA Mika.

$ virtualenv  telegram-bot
$ source telegram-bot/bin/activate
$ pip install telepot tweepy nltk

We are installing telepot for sending and receiving messages from Telegram, NLTK for its virtual chat engines, and Tweepy for accessing a Twitter account through consumer keys. For now I am creating a simple bot command which returns "Hello, how are you?" when we say hello to it.

# pa.py
import telepot, time

def handle(msg):
    chat_id = msg['chat']['id']
    command = msg['text']
    print 'Got command: %s' % command

    if command == '/hello':
        bot.sendMessage(chat_id, "Hello, how are you?")

# Create a bot object with API key
bot = telepot.Bot('152871568:AAFRaZ6ibZQ52wEs2sd2XXXXXXXXX')

# Attach a function to notifyOnMessage call back
bot.notifyOnMessage(handle)

# Listen to the messages
while 1:
    time.sleep(10)

Run $ python pa.py

Now enter /hello in the bot channel you created. You will see the following output.

[Screenshot: the bot replying to /hello]

So our bot received our message and replied back to us with the greeting. It is actually the Python code running under the hood that manages those tasks. The code is very simple. We need to:

  • Create a Bot object using the API key
  • Create a function for handling commands and returning information
  • Attach that function to the bot's message callback. Whenever the bot receives a message this handler executes, and we can put any logic inside it.

You can see all the inputs you can accept and the types of outputs you can send to users from the bot here: telepot github link
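Besides plain text, the bot can reply with files too, since telepot mirrors the Telegram Bot API method names. A small hedged sketch (assuming a local file named graph.png exists) that could sit inside a handler like handle() above:

# Inside handle(), reply with a picture when the user asks for it
if command == '/graph':
    # sendPhoto takes the chat id and a file-like object opened in binary mode
    bot.sendPhoto(chat_id, open('graph.png', 'rb'))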

For now let us integrate Twitter and the chat engines of NLTK into our bot. We all know that NLTK comes with a few chat engines like Eliza, Iesha, Zen etc. I am using the chatbot called Iesha here. First I create a file called tweep.py for managing my tweet and timeline-fetch tasks.

# tweep.py
import tweepy

#I prepared this class for simplicity. Fill in details and use it.
class Tweet:
    #My Twitter consumer key
    consumer_key='3CbMubgpZvXXXXXXXXXX'
    #My consumer secret
    consumer_secret='Clua2xLNfvbjj3Zoi4BQU5EXXXXXXXXXXX'
    #My access token
    access_token='153952894-cPurjdaQW7bA3B3eXXXXXXXXXXXX'
    #My access token secret
    access_token_secret='r6NJ6qjPrYDenqwuHaop1eBnXXXXXXXXXXXXX'

    def __init__(self):
        self.auth = tweepy.OAuthHandler(self.consumer_key,self.consumer_secret)
        self.auth.set_access_token(self.access_token, self.access_token_secret)
        self.handle = tweepy.API(self.auth)

    def hitme(self, status):
        self.handle.update_status(status)
        print 'tweet posted successfully'

Now let me finish the show by adding both chatting and tweeting.

import telepot, time
from nltk.chat.iesha import iesha_chatbot
from tweep import Tweet

# create tweet client
tweet_client = Tweet()
is_chatting = False

def handle(msg):
    global is_chatting
    global tweet_client
    chat_id = msg['chat']['id']
    command = msg['text']
    print 'Got command: %s' % command
    
    if command == '/timeline' and not is_chatting:
        bot.sendMessage(chat_id, '\n'.join([message.text for message in tweet_client.handle.home_timeline()]))
    elif command.split('=')[0] == '/tweet' and not is_chatting:
        try:
            tweet_client.hitme(command.split('=')[1] + ' #mika')
            bot.sendMessage(chat_id, 'Your message tweeted successfully')
        except:
            bot.sendMessage(chat_id, 'There is some problem tweeting! Try after some time')
    elif command == '/chat':
        is_chatting = True
        bot.sendMessage(chat_id, 'Hi I am Iesha. Who are You?')
    elif command == '/stopchat':
        is_chatting = False
        bot.sendMessage(chat_id, 'Bye Bye. take care!')
    elif not command.startswith('/') and is_chatting:
        bot.sendMessage(chat_id, iesha_chatbot.respond(command))
    else:
        pass


# Create a bot object with API key
bot = telepot.Bot('152871568:AAFRaZ6ibZQ52wEs2sd2Tp4Wcs-IXoWfA-Q')

# Attach a function to notifyOnMessage call back
bot.notifyOnMessage(handle)

# Listen to the messages
while 1:
 time.sleep(10)

So the output screens will look like this for /chat and /tweet:

[Screenshot: /chat and /tweet responses]

For /timeline

[Screenshot: /timeline response]

Isn't it fun? We can add lots of features to this basic personal assistant bot, like:

  1. Tracking time
  2. Scheduler and alert
  3. Note taking etc.
  4. Opening your garage gate when you push a command to bot

If you observe the code there isn't much in it. I just used the Iesha chatbot from NLTK and Tweepy methods to fetch the timeline and post a tweet. If you want to use the code, visit my repo.

https://github.com/narenaryan/Mika

Thanks for reading. I hope it will be helpful for you to build your own IoT assistant.


Visualize Tennis World with Plot.ly, Docker and Pandas

Namaste everyone. Today I am going to do a small experiment on tennis game history. I asked myself, "How can I learn the facts of tennis without asking others? What if I generate them myself?" So I tried to visualize which countries produce the majority of tennis players. But I don't want to jump straight to the solution here. Rather, we will discuss a few things which are useful in constructing a universal visualization lab. I want to use this article to introduce plot.ly, a plotting library in Python.

What I finally visualized in the experiment

I wanted to find out which countries have the largest number of players in professional ATP tennis.

 

Western countries occupy the top of the list of ATP tennis player producing nations.

We can solve many other queries like:

“How well players are performing in their respective ages?”

“Which country is producing more quality players?”

and more. But here I am going to show you how we can visualize the data and arrive at a solution like the one above.

For downloading the Ipython notebook visit this link. https://github.com/narenaryan/tennis_atp/blob/master/most_player_countries.ipynb 

Building a Python data visualization lab in Docker

Folks, you may be wondering why I brought Docker into the picture. I am discussing Docker because it lets a data analyst or a developer isolate their work from everything else. Otherwise I would need to write 100 articles showing the setup procedure on 100 operating systems; Docker allows us to create an identical container on whatever operating system we are working with. I will show how to build a complete scientific Python stack from scratch in a Docker container. You can store it as a package which you can also push to the cloud via Docker Hub. So let us begin.

I hope you know something about Docker. If not, just read my previous article here: Docker up and running

Step 1

$ docker run -i -t -p 0.0.0.0:8000:8000 -p 0.0.0.0:8001:8001 -v /home/naren/pylab:/home/pylab ubuntu:14.04

This creates an Ubuntu 14.04 container with two ports open, 8000 and 8001. We can use these ports to forward the IPython notebook to the host browser later in our visualization procedure. It also mounts the pylab folder from my host home directory to /home/pylab in the container. When you run this, you automatically enter the bash shell of the container.

Step 2

Now install required packages as below.

root@ffrt76yu:/# apt-get update && apt-get upgrade
root@ffrt76yu:/# apt-get install build-essential
root@ffrt76yu:/# apt-get install python python-pip python-dev
root@ffrt76yu:/# pip install pandas ipython jupyter plotly

That's it. Pandas will install NumPy and matplotlib as dependencies. We are now ready with our development environment for visualizing anything. We can launch an IPython notebook using this command.

$ ipython notebook --ip=0.0.0.0 --port=8000

So now we have an IPython notebook running on port 8000 of our local machine. Fire up your browser and you will find the notebook software running there. Select a new "Python 2" notebook from the menu at the top right.

If you don’t want all the pain, just pull my plotting environment from docker hub.

$ docker run -i -t -p 0.0.0.0:8000:8000 -p 0.0.0.0:8001:8001 -v /home/naren/pylab:/home/pylab narenarya/plotlab

Beginning of the visualization

plot.ly is a library which allows us to create complex graphs and charts using NumPy and pandas. We can load a dataset into a dataframe using pandas and then plot the cleaned data using plot.ly. Full documentation of plot.ly can be found at: https://plot.ly/python/

For my work I used Jeff Sackmann's ATP tennis dataset from GitHub. https://github.com/JeffSackmann/tennis_atp

Extract all the dataset files to your pylab folder so that they are visible to your notebook. We are interested here in atp_players.csv. We first clean the data to find out how many players belong to each country and map the counts on a scatter plot. The code looks like this.

from random import shuffle
import colorsys
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import *

init_notebook_mode()

# Load players into players dataframe 
players = pd.read_csv('atp_players.csv')

# Find the top 20 countries with the highest player frequencies
countries = players.groupby(['Country']).size()
selected_countries = countries.sort_values(ascending=False)[:20]

# Generate 20 random color palettes, one for each country
N = 20
HSV_tuples = [(x*1.0/N, 0.5, 0.5) for x in range(N)]
RGB_tuples = map(lambda x: colorsys.hsv_to_rgb(*x), HSV_tuples)
shuffle(RGB_tuples)
# Convert the 0-1 RGB tuples into 'rgb(r,g,b)' strings that plot.ly understands
plot_colors = ['rgb(%d,%d,%d)' % tuple(int(c * 255) for c in rgb) for rgb in RGB_tuples]

""" Plot.ly plotting code. A plot.ly iplot needs data and a layout 
    So now we prepare data and then layout. Here data is a scatter plot
"""
trace0 = Scatter(
    x = list(selected_countries.index),
    y = list(selected_countries.values),
    mode = 'markers',
    marker = {'color' : plot_colors, 'size' : [30] * N}
)

# Data can be a list of plot types. You can have more than one scatter plots on figure 
data = [trace0]

# layout has properties like x-axis label, y-axis label, background-color etc
layout = Layout(
    xaxis = {'title':"Country"}, # x-axis label
    yaxis = {'title':" No of ATP players produced"}, # y-axis label
    showlegend=False,
    height=600, # height & width of plot
    width=600,
    paper_bgcolor='rgb(233,233,233)', 
    plot_bgcolor='rgb(233,233,233)', # background color of plot layout
)

# Build figure from data, layout and plot it.
fig = Figure(data=data, layout=layout)
iplot(fig)

There is nothing fancy in the code. We just did the following things:

  • Loaded the ATP players dataset into a pandas DataFrame
  • Assigned a different random color to each country by generating random RGB values
  • Created a Scatter plot with markers mode
  • Created a layout with the axis details
  • Plotted the data and layout using the iplot method of the plotly library.

When I run this code in the IPython notebook (Shift + Enter), I see the scatter plot given at the beginning of the article.

For full documentation on all kinds of plots visit this link. https://plot.ly/python/

This is only one visualization from the dataset. You can derive many more analytics from all the datasets provided in the git repo. One obvious advantage here is that you are doing this entire thing in a Docker container; it is fast and easy to recover from broken environments. You can also commit your container to a Docker image.
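For example, committing and publishing the container could look like this (the container ID is a placeholder; the image name matches the one pulled earlier):

# Find the running container's ID, snapshot it as an image, then push it to Docker Hub
$ docker ps
$ docker commit <container-id> narenarya/plotlab
$ docker push narenarya/plotlab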

For downloading my Ipython notebook visit this link. https://github.com/narenaryan/tennis_atp/blob/master/most_player_countries.ipynb 

My email address is narenarya@live.com. Thanks to all.


Lessons I learnt in the quest of writing beautiful Python code

Hello everyone. I always wonder what the good practices are in developing software in Python. I was young and inexperienced a few years back, but the people around me and the situations I faced over the past few years have taught me many things: about coding style, good development patterns and more. Here I am going to discuss a few things which are important to turn a plain coding style into an elegant one. These points are collected from my own code reviews and those of others.

If you keep all these points in mind, from tomorrow you will see a different side of coding. Thanks to my inspiration, Chandra, Software Architect @ Knowlarity Communications, for reviewing my code and giving valuable tips from his vast software development experience. Let us see how not to write code.

* Your code is a Baby. Protect it with Exception Handling

A program fails when it accepts the wrong input. A good developer always protects his piece of code, because no one can guess all the possible bugs that creep in. In statically typed languages like C and C++ the type system enforces the kind of information passed to a variable. But in dynamic languages like Python and Ruby there are many chances of a program failing because a value of the wrong type slips in. Duck typing is a comfort, but it comes at the expense of more careful error handling. So I always wrap my code in try/except blocks. If you know what type of error you might encounter, it is easy to make your code behave properly; at the very least it won't crash. Let us see the first illustration, handling JSON data.

import json

def handle_json(data_string):
    parsed_data = json.loads(data_string)
    return parsed_data

A Python newbie just leaves the above code and thinks the job is finished. But the code may break if ill-formed JSON is passed to handle_json. So it is better to handle the error.

import json

def handle_json(data_string):
    try:
        parsed_data = json.loads(data_string)
    except:
        return {}
    return parsed_data

This is basic error handling. It becomes good practice if we also log a message when the error occurs. Handling the specific error does even more good.

import json
import logging

logger = logging.getLogger(__name__)

def handle_json(data_string):
    try:
        parsed_data = json.loads(data_string)
    except ValueError as e:
        logger.info("Error occurred: %s" % e.message)
        return {}
    return parsed_data

So never think of error handling as an add-on. It is compulsory when writing software for reliable systems.

* Never put magic numbers in the code

It is common for us to use constants in programs. We often map a set of named things to a sequence of numbers; an enumerated data type is an example, giving us a range of named constants. So use the name of the constant instead of the constant itself.

fruit = int(raw_input("1.Apple\n2.Mango\n3.Gauva\n4.Grape\n5.Orange\nEnter your favorite fruit: "))
if fruit == 1:
    print "Fruit is Apple"
elif fruit == 2:
    print "Fruit is Mango"
elif fruit == 3:
    print "Fruit is Gauva"
.....
else:
    print "Fruit is not available"

It is just a simple program which reads a number and uses that input to select a fruit type. But when someone else sees the code, they will wonder what those 1, 2, 3 mean. English names convey a better message than bare numbers. So the good practice is not to hard-code anything; instead use your own Enum-like type to map meaningful names to numbers.

class Fruit(object):
    APPLE, MANGO, GAUVA, GRAPE, ORANGE = range(1, 6)

    @classmethod
    def tostring(cls, val):
        """String representation of a Fruit type."""
        for k, v in vars(cls).iteritems():
            if v == val:
                return k

fruit = int(raw_input("1.Apple\n2.Mango\n3.Gauva\n4.Grape\n5.Orange\nEnter your favorite fruit: "))
print "The fruit is: %s" % Fruit.tostring(fruit).capitalize()

See, by building our own enumeration we are able to transform a hard-coded program into a meaningful one. Here we defined a class to store named constants, and our tostring method does a reverse lookup of the key from the value. Never put magic numbers in code, because in larger systems they create ambiguity. Code is for humans first and for computers next.

* Best ways of working with a dictionary

Many of us work with dictionaries in Python as frequently as we take a sip of coffee. Observe carefully and you will find that beginner developers usually access a dictionary value using the bracket syntax.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb"}
print students[1]

Everybody does that, you might say. Yes, it is the most obvious way of accessing a value from a dictionary. But as we shouted in our first tip, you should handle the error when you query a dictionary for a non-existing key, like this:

students =  {1: "Naren", 2: "Sriman",  3:"Habeeb", 4:"Ashwin"}
try:
    print students[1]
except KeyError as e:
    print None

Instead of doing all this we can use one straightforward operation on a dictionary: get. Python returns the value if the key exists, else it returns None.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
print students.get(1)
# This prints None
print students.get(101)

In the beginning of my development career I used to mix both styles in a program, which looks pretty awkward. So my advice is to use the get function or the bracket [] syntax according to your personal taste, but keep two things in mind.

  • Using get gives you automatic error handling
  • Keep your program uniform.
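As a small addition, get also takes an optional second argument, a default value that is returned instead of None when the key is missing:

students = {1: "Naren", 2: "Sriman", 3: "Habeeb", 4: "Ashwin"}
# Returns "Unknown" instead of None because key 101 does not exist
print students.get(101, "Unknown")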

One more useful case is when you are processing a dictionary and want to update an existing dictionary with a new one. Many people do this:

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
new_students = {5: "Tony", 6:"Srikanth", 7:"Rajesh"}
# A trivial way to add new students to students map
students[5] = new_students[5]
students[6] = new_students[6]
students[7] = new_students[7]

But there is a handy method called update on any Python dictionary. It merges the second dictionary into the first.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
new_students = {5: "Tony", 6:"Srikanth", 7:"Rajesh"}
students.update(new_students)

This is crisp: it avoids a lot of typing and makes the program look cleaner.

* Always do Validation of the data first and then pre-processing

My point here is that many fellow programmers return empty (None) from a function when they find the data is invalid to proceed with, but they do a lot of pre-processing before checking for validity. That computation is wasted. It is illogical for a program to spend time doing work and only then check whether the result is useful or ignored. It may seem fine to many people, but handling this pattern cleverly can have a huge impact on code performance.

 

valid_data = [1,2,3,4,5]
def process(value):
    new_value = preprocess(value)
    if value not in valid_data:
        return None
    return new_value

def preprocess(value):
    # Do a heavy computation task
    return value

print process(23)

In small amounts of code we will catch the inefficiency of checking the condition last. I feel like I lost my common sense whenever I find the above mistake in my code later. I always check the conditions in the first lines of a function and only then work with the data. So the process function should look like this:

def process(value):
    # Make a habit of filtering in the first lines
    if value not in valid_data:
        return None
    # Now do whatever you want
    new_value = preprocess(value)
    return new_value

I write code for a telephony company where the product is built upon thousands of lines of legacy Python code. There, performance is critical. If I designed one procedure with the above mistake, it would have a huge business impact on the product; even a few seconds of delay is not bearable. So keep this in mind: always return the invalid cases first and then do the pre-processing.

* Avoid trivial conditionals in code

This is not actually a mistake, but it is a very good practice to avoid unnecessary if and else blocks in code.

def  is_even(value):
    if value % 2 == 0:
        return True
    else:
        return False
print is_even(4)

Observing carefully, we can remove the else here: if the condition is true, control leaves the function immediately anyway. So we can modify the code to:

def  is_even(value):
    if value % 2 == 0:
        return True
    return False
print is_even(4)

So remember this as a rule of thumb: "When there is a truth check, try to use a single conditional and treat the other branch as the fall-through case."
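In fact, for a pure truth check like this one we can go a step further and return the comparison itself, since it already evaluates to True or False:

def is_even(value):
    # The comparison is already a boolean, so return it directly
    return value % 2 == 0

print is_even(4)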

* Other notable points

In addition to above points, there are few other important things.

  • Aim for the right level of abstraction by placing common logic at the top level of the code and specific implementations at the bottom.
  • Follow PEP-8 and PEP-257. It makes code more readable. I hated it at first, but now I love the structure it gives the code.
  • Make sure the docstrings of classes and methods convey the right message in a Python program.
  • In an ORM like Django or SQLAlchemy, prefer filter over get because the former is safer: filter returns an empty result set when nothing matches, while get raises errors (such as "does not exist" or "multiple objects returned") which you must handle explicitly. See the sketch after this list.
  • Make a habit of removing print statements and debuggers before committing code to Git.
  • When you add a new feature, write a unit test case. It will help a new developer understand the functionality of the class or procedure you defined.
  • Never push code without developer testing.
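A minimal sketch of the filter-versus-get point above, assuming a hypothetical Django model Note with a title field (not code from any project in this post):

from myapp.models import Note  # hypothetical app and model

# get() raises DoesNotExist (or MultipleObjectsReturned), which you must handle
try:
    note = Note.objects.get(title="todo")
except Note.DoesNotExist:
    note = None

# filter() simply gives an empty queryset when nothing matches
note = Note.objects.filter(title="todo").first()  # None if there is no match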

Once again, thanks to my inspirations, Chandra, Software Architect @ Knowlarity Communications, and Mohammed Habeeb for reviewing my code and giving valuable tips from their vast software development experience.


Building your own URL shortening service with Python and Flask

Have you ever wondered how people create URL shortening websites? They just do it using common sense. You heard it right. I too thought it was a very big task, but after thinking about it a bit I realized that simple mathematical concepts can be used to write beautiful applications. What is the link between mathematics and URL shortening? That is what we are going to unveil in this article.

In a single statement URL shortening service is built upon two things.

  1.  String mapping Algorithm to map long strings to short strings ( Base 62)
  2.  A simple web framework (Flask, Tornado) that redirects a short URL to Original URL

There are two obvious advantages of URL shortening.

  1. Can remember the URL. Easy to maintain.
  2. Can use the links where there are restrictions in text length Ex. Twitter.

Technique of URL shortening

There is no special URL shortening algorithm. Under the hood, every record stored in the database is allocated a primary key (PK). That PK is passed into an algorithm which in turn generates a short string. We then indirectly map that short string to the URL the customer registered with us.

I visited the Bit.ly website and passed it my blog link http://www.impythonist.wordpress.com. I got back this short link.

[Screenshot: the shortened bit.ly link]

Here one question comes to mind: how do they reduce a lengthy string to a short one? They are not actually reducing the size of the original link; they just add a layer of abstraction. The steps everyone follows are:

  • Insert a record with URL into database
  • Use the record ID returned to generate the short string
  • Pass it back to Customer
  • Whenever you receive a request, then extract short string from URL and re-generate Database record ID -> Fetch the URL -> Simple Redirect to Website


That's it. It is very simple to generate a short string from a given number using the Base62 algorithm. Whenever a request comes to our website, we get back the number by decoding the short string in the URL, then use that number as the ID to fetch the record from the database and redirect to that URL.

Let us build one such URL shortener in Python

Code for this project is available at my git repo. https://github.com/narenaryan/Pyster

As I told you before there are three ingredients in preparing a URL shortening service.

  • Base62 Encoder and Decoder
  • Flask for handling requests and redirects
  • SQLite3 for serving the purpose of database

Now, if you know about converting Base10 to Base64 or Base62 (any base) then you can proceed with me. Otherwise, first see what base conversions are here:

http://tools.ietf.org/html/rfc3548.html

I am interested only in Base62 here because I need to generate strings which are combinations of [a-z][A-Z][0-9]. The encoder maps an integer to a string; the decoder recovers the integer from a given string. They are a function and its inverse. This is the Base62 encoder and decoder in Python:

from math import floor
import string

def toBase62(num, b = 62):
    if b <= 0 or b > 62:
        return 0
    base = string.digits + string.lowercase + string.uppercase
    r = num % b
    res = base[r];
    q = floor(num / b)
    while q:
        r = q % b
        q = floor(q / b)
        res = base[int(r)] + res
    return res

def toBase10(num, b = 62):
    base = string.digits + string.lowercase + string.uppercase
    limit = len(num)
    res = 0
    for i in xrange(limit):
        res = b * res + base.find(num[i])
    return res
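A quick sanity check of the pair, worked out by hand: record ID 125 encodes to the two-character string '21', and decoding '21' gives back 125.

print toBase62(125)    # prints: 21
print toBase10('21')   # prints: 125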

Now let me create a database called urls.db using the following command.

$ sqlite3 urls.db

Now I am creating main.py for the Flask app and a template file.

# main.py 

from flask import Flask, request, render_template, redirect
from math import floor
from sqlite3 import OperationalError
import string, sqlite3
from urlparse import urlparse

host = 'http://localhost:5000/'

#Assuming urls.db is in your app root folder
def table_check():
    create_table = """
        CREATE TABLE WEB_URL(
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        URL  TEXT    NOT NULL
        );
        """
    with sqlite3.connect('urls.db') as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(create_table)
        except OperationalError:
            pass

# Base62 Encoder and Decoder
def toBase62(num, b = 62):
    if b <= 0 or b > 62:
        return 0
    base = string.digits + string.lowercase + string.uppercase
    r = num % b
    res = base[r];
    q = floor(num / b)
    while q:
        r = q % b
        q = floor(q / b)
        res = base[int(r)] + res
    return res

def toBase10(num, b = 62):
    base = string.digits + string.lowercase + string.uppercase
    limit = len(num)
    res = 0
    for i in xrange(limit):
        res = b * res + base.find(num[i])
    return res


app = Flask(__name__)

# Home page where user should enter 
@app.route('/', methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        original_url = request.form.get('url')
        if urlparse(original_url).scheme == '':
            original_url = 'http://' + original_url
        with sqlite3.connect('urls.db') as conn:
            cursor = conn.cursor()
            insert_row = """
                INSERT INTO WEB_URL (URL)
                    VALUES ('%s')
                """%(original_url)
            result_cursor = cursor.execute(insert_row)
            encoded_string = toBase62(result_cursor.lastrowid)
        return render_template('home.html',short_url= host + encoded_string)
    return render_template('home.html')



@app.route('/<short_url>')
def redirect_short_url(short_url):
    decoded_string = toBase10(short_url)
    redirect_url = 'http://localhost:5000'
    with sqlite3.connect('urls.db') as conn:
        cursor = conn.cursor()
        select_row = """
                SELECT URL FROM WEB_URL
                    WHERE ID=%s
                """%(decoded_string)
        result_cursor = cursor.execute(select_row)
        try:
            redirect_url = result_cursor.fetchone()[0]
        except Exception as e:
            print e
    return redirect(redirect_url)


if __name__ == '__main__':
    # This code checks whether database table is created or not
    table_check()
    app.run(debug=True)

Let me explain what is going on here.

  • We have a Base62 encoder and decoder
  • We have two view functions: home and redirect_short_url
  • The home function ('/') returns the home page and also posts the original URL into the database
  • redirect_short_url ('/<short_url>') receives the request, decodes the short string and finally redirects the shortened URL to the original URL. If you read the code carefully, you can easily grasp it.

We can also give look at template here. https://raw.githubusercontent.com/narenaryan/Pyster/master/templates/home.html .

Project structure looks this way.

[Screenshot: project structure]

Run the flask app on port 5000.

$ python main.py
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat......

If you visit http://localhost:5000 in your browser you will see

[Screenshot: URL shortener home page]

Now enter a URL to shorten and click submit. It posts the data to the database and generates a short string as in the image below. In my case it is http://localhost:5000/f. The string seems very short now, but as the number of registered URLs grows the string gradually gets longer, e.g. 11Qxd.

[Screenshot: generated short URL]

Now if we click that link, it takes us to http://www.example.org

[Screenshot: redirect to http://www.example.org]

So this is how URL shortening works. For the entire code, just clone my repo and give it a try. https://github.com/narenaryan/Pyster

I hope you enjoyed the article. Please do comment if you have any query. You can even mail me at narenarya@live.com

A primer on Database Transactions and Asynchronous Requests in Django

Hello, Namaste. Today we are going to look at a few Django web framework "cookies" that make our life sweeter. Let us learn a few things which help us implement functionality when the situation demands. The topics are the following:

  1. Implementing Database transactions in  Django
  2.  Making asynchronous HTTP requests from Django code

1) Django DB Transactions

I am creating a REST API and want to insert POST data into the database. But a list is received in the POST body, and I want to validate each element of the list before inserting the data. There are two rules that make this insertion operation atomic:

  • Insert data if all elements pass the validation criteria.
  • While inserting, if  there is duplicate data then abort the transaction  and return integrity error.

Demo project will be available at https://github.com/narenaryan/trans-cookie

Let me create a sample Django project to illustrate all the things we are going to discuss. I am doing this on an Ubuntu 14.04 machine with Python 2.7, Django 1.8 and MySQL.

$ virtualenv cookie
$ source cookie/bin/activate
$ pip install django==1.8.5 MySQL-python

Now let us create a sample project called cookie.

$ django-admin startproject cookie

Here in cookie I am going to create a view which takes a list of numbers and, if all the numbers are primes, stores them in the db. If there is an invalid prime or any duplicate entry, it aborts the operation.

$ django-admin startapp primer

Now do the following to create a model called Prime in the primer app.

# primer/models.py

from django.db import models
import re

class Prime(models.Model):
    number = models.IntegerField(unique=True)
    def __str__(self):
        return str(self.number)
    def prime_check(self):
        if re.match(r'^1?$|^(11+?)\1+$', '1' * self.number):
            raise Exception('Number is not prime')

prime_check is the function we defined to validate data before inserting it into the db; always validate your data using a model class method. Now modify settings.py to change the database to MySQL, add the primer app to settings.py and run the migrations.

 # cookie/settings.py
...

INSTALLED_APPS = (
 'django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'primer'
)

...
DATABASES = {
 'default': {
 'ENGINE': 'django.db.backends.mysql',
 'NAME': 'cookie',
 'USER': 'root',
 'PASSWORD': 'passme',
 'HOST': 'localhost',
 'PORT': '3306',
 }
}
...
$ python manage.py makemigrations
$ python manage.py syncdb

Now the MySQL tables User and Prime will be created. Next let us create a URL and a view that takes a list of numbers as POST data and inserts them into the db if all are primes. Modify primer/urls.py and primer/views.py as below:

# primer/urls.py
from django.conf.urls import include, url
from django.contrib import admin
from primer import views

urlpatterns = [
   url(r'^admin/', include(admin.site.urls)),
   url(r'^supply_primes/$', views.supply_primes, name="prime")
]
# primer/views.py
from django.shortcuts import render
from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from primer.models import Prime
import json 

# Create your views here.
@csrf_exempt
def supply_primes(request):
    if request.method == 'GET':
        return JsonResponse({'response':'prime numbers insert API'})
    if request.method == 'POST':
        primes = json.loads(request.body)['primes']
        #Validating data before inserting
        valid_prime = Prime()
        for number in primes:
            valid_prime.number = number
            try:
                valid_prime.prime_check()
            except Exception:
                message = {'error': {
                      'prime_number': 'The Prime number : %s \
                       is invalid.' % number}}
                return JsonResponse(message)
        return JsonResponse({"response":"data successfully stored"})

We can filter the data before inserting anything; the integrity error only comes when we actually insert data into the db.

If we insert [11, 13, 17] and next try to insert [19, 17, 23], then in the second case an error will be returned while inserting 17 because it is a duplicate. But 19 has already been inserted. This is where transactions come in handy. Now we can modify the code to:

from django.shortcuts import render
from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from primer.models import Prime
from django.db import transaction,IntegrityError
import json 

# Create your views here.

@csrf_exempt
def supply_primes(request):
    if request.method == 'GET':
        return JsonResponse({'response':'prime numbers insert API'})
    if request.method == 'POST':
        primes = json.loads(request.body)['primes']
        #Validating data before inserting
        valid_prime = Prime()
        for number in primes:
            valid_prime.number = number
            try:
                valid_prime.prime_check()
            except Exception:
                message = {'error': {
                      'prime_number': 'The Prime number : %s \
                       is invalid.' % number}}
                return JsonResponse(message)
         #Carefully look for exceptions in real time at inserting
        transaction.set_autocommit(False)
        for number in primes:
            try:
                Prime(number=number).save()
            except IntegrityError:
            # We got error. undo all previous insertions
                transaction.rollback()
                message = {'error': {'prime_number': 'This prime number(%s) is already registered.' % number}}
                return JsonResponse(message)
         # If everything is fine, Commit the changes and flush db
        transaction.commit()
        return JsonResponse({"response":"data successfully stored"})

 

The three statements I used for transactions

  • transaction.set_autocommit(False)
  • transaction.rollback()
  • transaction.commit()

The first statement removes the autopilot and makes commits manual: let me choose whether to save something or not.

The second statement rolls back whatever changes I made since the last commit.

The third statement commits and flushes the changes to the db.

These statements give us full control over how data is stored in the database. Without transactions you have the single statement Prime(number=number).save(), which directly pushes changes to the database. If we need to put something into the DB through our own logic, the transaction library in Django is the tool.
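As a side note, Django also ships a higher-level helper, transaction.atomic, which wraps a block in a transaction and rolls it back automatically when an exception escapes the block. A minimal sketch of the same insert loop written with it (an alternative, not the code used in this post):

from django.db import transaction, IntegrityError

try:
    with transaction.atomic():
        for number in primes:
            Prime(number=number).save()
except IntegrityError:
    # Everything inside the atomic block has been rolled back automatically;
    # build and return the same duplicate-prime error response as in the view above
    pass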

Let us see it in action

Run Django web  server as below

  $ python manage.py runserver 0.0.0.0:8200

It runs our Django project on localhost with PORT 8200

Let us fire up Postman to make a POST request to http://localhost:8200/supply_primes. You can also use curl.

[Screenshot: Postman POST with a list of primes]

It shows that the data is stored successfully, because all the numbers are primes. If we look at the data:

[Screenshot: rows stored in the Prime table]

Now let me try to insert [26, 13, 17]. Because 26 is not prime, it returns the following response.

[Screenshot: error response for a non-prime number]

Cool. Then try to insert [29, 13, 67]. If you observe, we are trying to insert a duplicate (13).

[Screenshot: error response for a duplicate prime]

and the database looks like:

[Screenshot: database contents after the rollback]

Here 29 does not appear. It was actually inserted but rolled back when 13 generated the IntegrityError. This is how transactions work.

2) Asynchronous Requests from the Django Code

Suppose your Django code base is large and slow, and someone asks you to insert a hook in the code which posts some data to an external URL. Your Django app then becomes even slower; if you are making 100 sequential requests, the last hook executes after a long time. Critical code should not be blocked because of side players.

The solution to overcome this problem is to make asynchronous non-blocking requests from the Django code.

* Synchronous code

import requests
res = requests.get('http://localhost:8200/supply_primes')
# some other django task
counter += 1

Here the counter is incremented only after the request made by the second statement succeeds or fails. The blocking request makes Django pause until the request is processed.

* Asynchronous code

If you are using Python 3 there is a wonderful library called asyncio for making parallel HTTP requests; visit this excellent link if you are on Python 3: http://geekgirl.io/concurrent-http-requests-with-python3-and-asyncio/ . If you are using your Django projects with Python 2.7.x then carry on.

$ pip install requests-futures

This is a library which makes parallel requests on top of Python's requests library.

from requests_futures.sessions import FuturesSession
session = FuturesSession()
res = session.get('http://localhost:8200/supply_primes')
# Some other django task
count += 1

Here session.get won't block the increment of the counter, so your Django code speeds up. Use this library wherever you want the HTTP request handled in a background thread instead of blocking the main flow.
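session.get returns a Future from concurrent.futures, so the response can still be collected later, once the other work is done:

from requests_futures.sessions import FuturesSession

session = FuturesSession()
future = session.get('http://localhost:8200/supply_primes')

# ... do other Django work here; the request runs in a background thread ...

response = future.result()  # blocks only now, when we actually need the response
print response.status_code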

For more details visit demo project at https://github.com/narenaryan/trans-cookie 

Five trivial things every python programmer should work with

 

Namaste everyone. This time I came up with a few simple suggestions which can affect our coding style. A good habit leads to a good output. If you are already working with the things I am going to mention, then you are on the right track; otherwise you will surely gain something useful in a few minutes.

1) Virtualenv

Yeah, the first important thing we should know about is working with virtual environments in Python. I observed that a lot of people install packages into their default Python interpreter. Separating the interpreter environment always keeps things clean: we can work with different projects on the same machine without conflicts. For installing virtualenv on an Ubuntu 14.04 machine just do:

$ sudo apt-get install python-pip
$ pip install virtualenv

Suppose I am working on a Flask project; I create a virtual environment for it and install all the dependencies for Flask. A virtual environment is created with the command "virtualenv env_name".

# This creates a virtual environment called flask_env
$ virtualenv ~/flask_env

Now tell the machine to drop the default Python interpreter and load this flask_env interpreter using:

$ source ~/flask_env/bin/activate

Now you are in a separate world. Install packages using pip.

(flask_env)$ pip install flask requests

Now if you want to leave the virtual environment, run deactivate.

# This command deactivates virtual environment's interpreter and loads default

(flask_env)$ deactivate

Hint: Always use Virtualenv to separate project environments.

2) IPython

Have you ever faced the problem of hitting the up-arrow key several times to recall the nth previous command in the Python shell? Or needing to rush to the Python API docs to learn the properties and methods available in a package or module? Then you should use IPython. It is an interactive shell with tons of options; you can see the method names and properties of any module on the fly. It is a tool every programmer should have. For installing IPython just use this command.

$ pip install ipython

There is another variation of IPython called Notebook, where we can save our scripts as notebooks in a web-based interpreter, share them and reuse them.

You can launch the IPython shell using the "ipython" command. To see the suggestion lookup for method names, press TAB after entering the dot (.).

[Screenshot: IPython method lookup with TAB]

Generally IPython is used for creating shorter scripts and testing language features. My favorite command is "%cpaste": using it I can copy code directly into the terminal without losing the indentation. In the conventional Python shell, pasting and formatting is painful. For more details visit this link https://github.com/ipython/ipython

3) Anaconda sublime plugin

[Screenshot: Anaconda plugin in Sublime Text]

If you are writing shorter scripts and testing them, IPython is sufficient. But if you want a full-fledged Python editor with the following features:

  • Automatic code completion
  • PEP-8 and PEP-257 checking and reporting

then you should use the Anaconda plugin with Sublime Text. Sublime Text 3 is a great editor for Python development: it is fluid, takes few resources and can handle any kind of file without pain. Combining [Anaconda plugin + Sublime Text 3] = Python IDE. You can see how to set up the plugin using Package Control here.

http://damnwidget.github.io/anaconda/

4) IPdb

One more common thing I observe in Python beginners is not using any debugger while testing their code. Python is an interpreted language that executes line by line, but in big projects with many function calls we still lose track of the actual code flow. We all know the classic Python debugger, pdb. IPdb is the combination of IPython + pdb (an interactive Python debugger).

Using IPdb we can set break points anywhere in our code using one single statement.

import ipdb;ipdb.set_trace()

Insert the above statement into your Python code. When the program executes, control stops at that line; from there you can step line by line and inspect variables to debug the code. I am listing the primary keys used for debugging here.

  • n – execute next line
  • c – execute remaining program
  • r – continue execution until current function returns
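For example, a tiny sketch (a hypothetical function, not code from any project here) showing where the breakpoint line usually goes:

def divide(a, b):
    import ipdb; ipdb.set_trace()  # execution pauses here; inspect a and b, then press n or c
    return a / b

divide(10, 2)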

For more details about IPdb and debugging visit this link. http://georgejhunt.com/olpc/pydebug/pydebug/ipdb.html

5)  Logging

I see people putting print statements everywhere to debug code and write information to the console. Printing to the console is a very poor substitute for logging. Python provides an excellent built-in logging library which is sadly neglected by most Python developers. Logging your program's activity is a very good habit that helps you diagnose failures. Here is a jump start on logging a Python function's activity to a file.

# loggex.py
import logging


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

fh = logging.FileHandler('add.log')

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

fh.setFormatter(formatter)
logger.addHandler(fh)

def add(x,y):
    logger.info("Just now received parameters %d and %d" % (x, y))
    addition = x + y
    logger.info("Returning the computed addition %d" % addition)
    return addition

if __name__ == '__main__':
    add(13,31)

Here we are not doing anything fancy. We are just logging the activity of an add function in a file called add.log. The script loggex.py does these things:

  • Create a logger object with the current file name as its handle
  • Set the level to DEBUG. It can also be INFO or ERROR according to the context of the log
  • Create a file handler, which redirects logs to a physical file
  • Create a formatter and set it on the file handler. This defines the custom format for the timestamp, logger name etc. that appear in the log file
  • Add the file handler to the logger object we created
  • Sprinkle INFO or DEBUG messages wherever you want to record activity. They will be written to the file, and you can review the log file in case of failure.

[Screenshot: contents of add.log]

See how simple logging is. Yet very few developers show interest in doing it while building software. Make logging in your programs a habit.

So these are five minimal things every Python developer should use and care about to improve their productivity. If you have any queries just comment below. Thanks.

narenarya@live.com

@Narenarya3


Build a massively scalable RESTful API with Falcon and PyPy

Namaste everyone. If you were building a RESTful API, what technology stack would you use in Python and why? I might receive the following answers from you.

1)  I use Flask with Flask-RESTFul

2)  I use (Django + Tastypie) or (Django + REST Framework)

Neither option suits me, because there is a very good lightweight API framework available in Python called Falcon. I always keep my project and my REST API loosely coupled: my REST API knows little about the Django or Flask project it serves. Creating cloud APIs with a low-level web framework rather than a bulky, heavily wrapped one always speeds up my API.

What is Falcon?

As per the official Falcon website:

“Falcon is a minimalist WSGI library for building speedy web APIs and app backends. We like to think of Falcon as the Dieter Rams of web frameworks.”

“When it comes to building HTTP APIs, other frameworks weigh you down with tons of dependencies and unnecessary abstractions. Falcon cuts to the chase with a clean design that embraces HTTP and the REST architectural style.”

If you want to hit bare metal when creating an API, use Falcon. You can build an easy-to-develop, easy-to-serve and easy-to-scale API with Falcon. Just use it for speed.

What is PyPy?

“If you want your code to run faster, you should probably just use PyPy.” — Guido van Rossum

PyPy is a fast, compliant alternative implementation of the Python language

So PyPy is a JIT-compiled implementation of Python. It is a separate interpreter that can be used like the normal interpreter in a virtual environment to power our projects. In most cases there are no issues with PyPy.

Let’s start building a simple todo REST API

Note: Project source is available at https://github.com/narenaryan/Falcon-REST-API-Pattern

Falcon and PyPy are our ingredients for building a scalable, faster REST API. We start with a virtual environment that runs PyPy, with Falcon installed using pip. Then we use RethinkDB as the resource provider for our API. Our todo app does these main things:

  1. Create a note (POST)
  2. Fetch a note by ID (GET)
  3. Fetch all notes (GET)
  4. PUT & DELETE are obvious

Install RethinkDB on Ubuntu14.04 in this way.

$ source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
$ wget -qO- http://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install rethinkdb
$ sudo cp /etc/rethinkdb/default.conf.sample /etc/rethinkdb/instances.d/instance1.conf
$ sudo /etc/init.d/rethinkdb restart

Create a virtualenv for the project and install the required libraries. Download PyPy from this URL: PyPy Download. After downloading, extract the files and install pip if required.

$ sudo apt-get install python-pip
$ virtualenv -p pypy-2.6.1-linux64/bin/pypy falconenv
$ source falconenv/bin/activate
$ pip install rethinkdb falcon gunicorn

Now we are ready with our stack: PyPy as the Python interpreter, Falcon as the web framework for building the RESTful API, and Gunicorn as the WSGI server that serves the API. Now let us prepare our RethinkDB database client for fetching and inserting resources. Let me name the file "db_client.py".

#db_client.py
import os
import rethinkdb as r
from rethinkdb.errors import RqlRuntimeError, RqlDriverError

RDB_HOST = 'localhost'
RDB_PORT = 28015

# Database is todo and table is notes
PROJECT_DB = 'todo'
PROJECT_TABLE = 'notes'

# Set up db connection client
db_connection = r.connect(RDB_HOST,RDB_PORT)


# Function cross-checks that the database and table exist, creating them if needed
def dbSetup():
    try:
        r.db_create(PROJECT_DB).run(db_connection)
        print 'Database setup completed.'
    except RqlRuntimeError:
        print 'Database already exists.'
    try:
        r.db(PROJECT_DB).table_create(PROJECT_TABLE).run(db_connection)
        print 'Table creation completed.'
    except RqlRuntimeError:
        print 'Table already exists. Nothing to do.'

dbSetup()

Don't worry if you do not know about RethinkDB; just go to this link and see the quickstart: RethinkDB Python. We just prepared a db connection client and created the database and table. Now comes the actual thing. Falcon allows us to define a resource class which we can route to a URL. In that resource class we can have four REST methods:

  1. on_get
  2. on_post
  3. on_put
  4. on_delete

We are going to implement the first two methods in this article. Create a file called app.py.

#app.py
import falcon
import json

from db_client import *

class NoteResource:
 
    def on_get(self, req, resp):
        """Handles GET requests"""
        # Return note for particular ID
        if req.get_param("id"):
            result = {'note': r.db(PROJECT_DB).table(PROJECT_TABLE). get(req.get_param("id")).run(db_connection)}
        else:
            note_cursor = r.db(PROJECT_DB).table(PROJECT_TABLE).run(db_connection)
            result = {'notes': [i for i in note_cursor]}
        resp.body = json.dumps(result)

    def on_post(self, req, resp):
        """Handles POST requests"""
        try:
            raw_json = req.stream.read()
        except Exception as ex:
            raise falcon.HTTPError(falcon.HTTP_400, 'Error', ex.message)

        try:
            result = json.loads(raw_json, encoding='utf-8')
            sid = r.db(PROJECT_DB).table(PROJECT_TABLE).insert({'title': result['title'], 'body': result['body']}).run(db_connection)
            resp.body = 'Successfully inserted %s' % sid
        except ValueError:
            raise falcon.HTTPError(falcon.HTTP_400, 'Invalid JSON', 'Could not decode the request body. The JSON was incorrect.')

api = falcon.API()
api.add_route('/notes', NoteResource())

We can break down the code into following pieces.

  1. We imported falcon and database client
  2. Created a resource class called NoteResource
  3. Created two methods called on_get and on_post on NoteResource.
  4. In on_get method, we are checking for “id” parameter in the request and sending one resource (note) or all resources (notes). req, resp are the request and response objects of falcon respectively.
  5. In on_post method, we are checking for data as a raw JSON. We are decoding that raw JSON to store title and body in the rethinkDB notes table.
  6. We are creating API class of falcon and adding a route for it. ex: ‘/notes’ in our case.

Now in order to serve API, we should start WSGI server because falcon needs an independent server to deliver the API. So launch Gunicorn

$ gunicorn app:api

 

This will run Gunicorn WSGI server on port 8000. Visit

http://localhost:8000/notes

to view all notes stored.

If the notes list is empty, add one using a POST request to our API.
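For example, a POST with curl could look like this (the note text is just sample data):

$ curl -X POST http://localhost:8000/notes \
       -d '{"title": "At 9:00 AM", "body": "Daily standup"}'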

[Screenshot: POSTing a note to the API]

Now add one more note as shown above with different data; let us say it is { "title" : "At 10:00 AM", "body" : "Scrum meeting scheduled" }. Now visit http://localhost:8000/notes once again and you will find this:

 

[Screenshot: all notes returned by GET /notes]

If we want to fetch an element by id then do it with this. http://localhost:8000/notes?id=d24866be-36f0-4713-81fd-750b1b2b3bd4. Now only one note with given ID will be displayed.

[Screenshot: a single note fetched by id]

This is how Falcon lets us create a REST API easily at a very low level. There are many additional features in Falcon; for more details visit the Falcon home page. If you want to see the full source code of the above demonstration, visit this link.

https://github.com/narenaryan/Falcon-REST-API-Pattern

Please do comment if you have any query. Have a good day :)

Build a real time data push engine using Python and Rethinkdb


Namaste everyone. Today we are going to talk about building real-time data push engines; how to design models for the modern realtime web will be the limelight of this article. We are going to build a cool push engine that notifies "Super Heroes" in real time in the Justice League (DC). We can also develop real-time chat applications very easily with the same principles.

What actually is a Data Push Engine?

A push engine is nothing but a piece of software that pushes notifications from the server to all the clients who subscribed to receive those events. When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. To overcome this burden two proposals were made:

  1. Web Sockets
  2. Server Sent Events(SSE)

But using either of the above technologies alone is not sufficient for the modern real-time web. Think of it this way: the query-response database access model works well on the web because it maps directly to HTTP's request-response. However, modern marketplaces, streaming analytics apps, multiplayer games, and collaborative web and mobile apps require sending data directly to the client in realtime. For example, when a user changes the position of a button in a collaborative design app, the server has to notify other users who are simultaneously working on the same project. Web browsers support these use cases via WebSockets and long-lived HTTP connections, but having the database itself notify you of updates is even better.

Seeing is believing

I am going to run my project first to make you comfortable with it. The project is nothing but a website which does the following. Code for this project is available at https://github.com/narenaryan/thinktor

  • I am going to start a Justice League website (like the one superman runs).
  • Website collects nickname and email of a SuperHero.
  • Notify all existing heroes about newly joined members in real time.

Here is the flow in short: we ask new clients for their information and then navigate them to their dashboard. From then on, all clients who are on the dashboard are notified about newly joined people instantly. No refresh, no AJAX polling. Thanks to our push engine.

Are you kidding? I can implement that using WebSockets!

Yes, you are right. You can implement the above notification system purely using WebSockets. But why did I use a few more things to do it? Here is the answer.

“Using WebSockets alone for the push logic is cumbersome. The WebSocket code must do a push from the server and receive it in the client. Traditional databases do not know about WebSockets or Server Sent Events, so we would need to poll the database for changes, push them to an intermediate queue, and from there to the clients. I say remove that headache from our server. Just exploit the database's capability of pushing changes in realtime whenever its data changes. That is why I chose RethinkDB plus WebSockets.“

How I built that push engine

I used two main ingredients to create the data push engine shown above.

  1. Python Tornado web server ( for handling websocket requests and responses)
  2. RethinkDB ( for storing data and also to push real time changes to the server)

What is RethinkDB?

According to RethinkDB official website

RethinkDB is the first open-source, scalable JSON database built from the ground up for the realtime web. It inverts the traditional database architecture by exposing an exciting new access model – instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results to applications in realtime. RethinkDB’s realtime push architecture dramatically reduces the time and effort necessary to build scalable realtime apps.

When is RethinkDB a good choice?

RethinkDB is a great choice when your applications could benefit from realtime feeds to your data.

The query-response database access model works well on the web because it maps directly to HTTP’s request-response. However, modern applications require sending data directly to the client in realtime. Use cases where companies benefited from RethinkDB’s realtime push architecture include:

  • Collaborative web and mobile apps
  • Streaming analytics apps
  • Multiplayer games
  • Realtime marketplaces
  • Connected devices

Modern web demands fall into one of the above categories, so RethinkDB is extremely useful for people who want to exploit its real power for building real time apps.

RethinkDB has a dedicated Python driver. In our project we are just inserting a document and reading the changes on the users table. To get familiar with the RethinkDB Python client, visit these links.

http://rethinkdb.com/docs/guide/python/

http://rethinkdb.com/docs/introduction-to-reql/
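As a quick taste of the driver (a minimal sketch, assuming a local RethinkDB instance plus the userfeed database and users table that we set up later in this project), inserting and reading back a document looks roughly like this:

import rethinkdb as r

#Connect to the local RethinkDB instance (default port 28015)
conn = r.connect("localhost", 28015, db="userfeed")

#Insert a document into the users table
r.table("users").insert({"name": "batman", "email": "bat@league.dc"}).run(conn)

#Read back everything that is stored
for doc in r.table("users").run(conn):
    print doc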

Setup for our data push engine

Install RethinkDB on Ubuntu 14.04 in this way.

$ source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
$ wget -qO- http://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install rethinkdb
$ sudo cp /etc/rethinkdb/default.conf.sample /etc/rethinkdb/instances.d/instance1.conf
$ sudo /etc/init.d/rethinkdb restart

Create virtualenv for the project and install required libraries

$ virtualenv rethink
$ source rethink/bin/activate
$ pip install tornado rethinkdb jinja2

Now everything is fine. My main application will be app.py, and there are templates and static files in my project. The project structure looks like this.

.
|-- app.py
|-- conf.py
|-- requirements.txt
|-- static
|   `-- js
|       `-- sockhand.js
`-- templates
    |-- detail.html
    `-- home.html

Now let us write our app.py file.

import os  #Needed for locating the static folder

#For tornado server stuff
import tornado.ioloop
import tornado.web
import tornado.gen
import tornado.websocket
import tornado.httpserver
from tornado.concurrent import Future


from jinja2 import Environment, FileSystemLoader #For templating stuff

import rethinkdb as r #For db stuff

from rethinkdb.errors import RqlRuntimeError, RqlDriverError

from conf import * #Fetching db and table details here


#Load the template environment

template_env = Environment(loader=FileSystemLoader("templates"))

db_connection = r.connect(RDB_HOST,RDB_PORT) #Connecting to RethinkDB server

#Our superheroes who connect to the server
subscribers = set() 

#This just cross-checks that the required database and table exist, and creates them if needed
def dbSetup():
    print PROJECT_DB, db_connection
    try:
        r.db_create(PROJECT_DB).run(db_connection)
        print 'Database setup completed.'
    except RqlRuntimeError:
        print 'App database already exists. Nothing to do'
    try:
        r.db(PROJECT_DB).table_create(PROJECT_TABLE).run(db_connection)
        print 'Table creation completed'
    except RqlRuntimeError:
        print 'Table already exists. Nothing to do'
    db_connection.close()

#The python RethinkDB driver supports pluggable event loops; set it to use tornado's
r.set_loop_type("tornado")


class MainHandler(tornado.web.RequestHandler): #Class that renders the details page and dashboard
    @tornado.gen.coroutine
    def get(self):
        detail_template = template_env.get_template("detail.html") #Loads the template
        self.write(detail_template.render())
 
    @tornado.gen.coroutine
    def post(self):
        home_template = template_env.get_template("home.html")
        email = self.get_argument("email")
        name = self.get_argument("nickname")
        connection = r.connect(RDB_HOST, RDB_PORT, PROJECT_DB)
        #Thread the connection
        threaded_conn = yield connection
        result = yield r.table(PROJECT_TABLE).insert({ "name": name , "email" : email}, conflict="error").run(threaded_conn)
        print 'log: %s inserted successfully'%result
        self.write(home_template.render({"name":name}))


#Sends the new user joined alerts to all subscribers who subscribed
@tornado.gen.coroutine
def send_user_alert():
    while True:
        try:
            temp_conn = yield r.connect(RDB_HOST,RDB_PORT,PROJECT_DB)
            feed = yield r.table("users").changes().run(temp_conn)
            while (yield feed.fetch_next()):
                new_user_alert = yield feed.next()
                for subscriber in subscribers:
                    subscriber.write_message(new_user_alert)
        except:
            pass


class WSocketHandler(tornado.websocket.WebSocketHandler): #Tornado Websocket Handler
    def check_origin(self, origin):
        return True

    def open(self):
        self.stream.set_nodelay(True)
        subscribers.add(self) #Join client to our league

    def on_close(self):
        if self in subscribers:
            subscribers.remove(self) #Remove client


if __name__ == "__main__":
    dbSetup() #Check DB and Tables were pre created
 
    #Define tornado application
    current_dir = os.path.dirname(os.path.abspath(__file__))
    static_folder = os.path.join(current_dir, 'static')
    tornado_app = tornado.web.Application([
        (r'/', MainHandler),                                                        #For landing page and dashboard
        (r'/ws', WSocketHandler),                                                   #For websockets
        (r'/static/(.*)', tornado.web.StaticFileHandler, {'path': static_folder}),  #For static files
    ])

    #Start the server
    server = tornado.httpserver.HTTPServer(tornado_app)
    server.listen(8000) #Bind server to port 8000
    tornado.ioloop.IOLoop.current().add_callback(send_user_alert)
    tornado.ioloop.IOLoop.instance().start()

I am going to define database configuration parameters like the db name and table name in a separate conf.py file.

import os

RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
PROJECT_DB = 'userfeed'
PROJECT_TABLE = 'users'

That’s it. We now have app.py and conf.py ready. I will explain what I did in app.py point by point below.

  • Import tornado tools and the RethinkDB client driver.
  • Write a function called dbSetup that checks whether the required database and table exist and creates them if not.
  • Use the MainHandler class to handle HTTP requests: for a GET request it displays the enter-details page, and for a POST it shows the dashboard.
  • WSocketHandler is the tornado websocket handler that adds or removes subscribers.
  • The send_user_alert coroutine is the actual pusher of changes to the clients. It does only two things: subscribe to the table's changefeed and send those changes to every subscriber.

In RethinkDB we have a concept called changefeeds. It is similar to Redis PUBSUB. We can subscribe to a particular changefeed and RethinkDB returns us a cursor of infinite length. Whenever the database receives a change on that table, it pushes an event to the subscribed cursor with the new and old values of the data. For example:

#A cursor is returned when we subscribe to changes on the users table
cursor = r.table("users").changes().run(connection)

#Just loop through it infinitely to grab the changes that RethinkDB pushes to the cursor
for document in cursor:
    print(document)

I think you got the idea by now. The other files in our project are the templates and static files.

  • detail.html
  • home.html
  • sockhand.js

The code for the templates is quite obvious. You can find the templates here: https://github.com/narenaryan/thinktor

But we need to look into the JS file.

//function that listens to Socket and do something when notification comes
function listen() {
    var source = new WebSocket('ws://' + window.location.host + '/ws');
    var parent = document.getElementById("mycol")
    source.onmessage = function(msg) {
              var message = JSON.parse(msg.data);
              console.log(message);
              //Build a message element for the newly joined superhero
              var child = document.createElement("DIV");
              child.className = 'ui red message';
 
              var text = message['new_val']['name'].toUpperCase() + ' joined the league on '+ Date(); 
              var content = document.createTextNode(text);
              child.appendChild(content);
              parent.appendChild(child);
              return false;
       }
}

$(document).ready(function(){
    console.log('I am ready'); 
    listen();
});

Here we define a listen function that runs when the webpage is loaded. The listen function initializes a variable called source, which is a WebSocket, and links it to the /ws URL that we defined in the Tornado application. It also sets a callback that fires when a message is received; that callback updates the DOM and adds information about the new user.

If you are still confused, run the application yourself and see. The app we wrote above is a data push engine that routes changes directly from the database to the client. Go to the project link https://github.com/narenaryan/thinktor, clone it, install requirements.txt, then run app.py and visit localhost:8000. If you still have any queries on how it works, feel free to comment below or reach me at narenarya@live.com.
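For convenience, the same steps as commands (assuming the dependencies from the virtualenv section above are installed):

$ git clone https://github.com/narenaryan/thinktor.git
$ cd thinktor
$ pip install -r requirements.txt
$ python app.py

Then open localhost:8000 in two browser windows to watch the live notifications.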

I thought about introducing RethinkDB for absolute beginners, but the article would become very lengthy. I will surely come up with an article dedicated to RethinkDB in the near future.

In this way we can build a real time data push engine using Python and RethinkDB.

Points to ponder

  • Use RethinkDB for building real time applications. It is scalable too.
  • Use Tornado because it can easily handle concurrent connections without any fuss.
  • Remove queuing from your architectural design.
  • Use websockets for bidirectional communication
  • Try out new things frequently


Build a Collatz conjecture solver with Python and ZeroMQ

Connecting computers is so difficult that software and services to do this are a multi-billion dollar business. So today we're still connecting applications using raw UDP and TCP, proprietary protocols, HTTP, and WebSockets. It remains painful, slow, hard to scale, and essentially centralized.

To fix the world, we needed to do two things. One, to solve the general problem of "how to connect any code to any code, anywhere". Two, to wrap that up in the simplest possible building blocks that people could understand and use easily. It sounds ridiculously simple. And maybe it is. That's kind of the whole point. ZeroMQ comes to rescue us from the problem. With an average hardware configuration, we can handle 2.5-8 million messages/second using ZeroMQ.

What is ZMQ?

ZeroMQ is a library used to implement messaging and communication systems between applications and processes, fast and asynchronously. It is as fast as a bullet train. You can use it for multiple purposes, like the things listed below.

* Networking and concurrency library

* Asynchronous messaging

* Brokerless communication

* Multiple transport

* Cross-platform and open-source

Why a message queue is required in distributed applications, and how ZeroMQ can be used as the best communication practice between applications, will be explained in a few minutes.

[Diagram: clients push requests into a message queue, workers process them and route the results onward]

I guess you captured the logic from the above picture. Instead of hitting the server directly for each request, we can push it into a message queue, process it with workers, and then route it to the appropriate location. Ok, now let us context-switch to the Collatz conjecture.

What is Collatz conjecture?

This is the 3n+1 problem, also known as the Collatz conjecture. The Collatz conjecture states that for any number n, the following function f(n) will always boil down to 1 if you keep feeding the previous result back into the function over and over again.
  f(n) = {
           1,      if n is 1
           3n+1,   if n is odd
           n/2,    if n is even
         }
Eg: if n = 20, then:
           f(20) = 20/2 = 10
           f(10) = 10/2 = 5
           f(5)  = 3*5+1 = 16
           f(16) = 16/2 = 8
           f(8)  = 8/2 = 4
           f(4)  = 4/2 = 2
           f(2)  = 2/2 = 1
The term cycle count refers to the length of the sequence of numbers generated, counting the starting number and the final 1. In the above case the sequence for 20 is 20, 10, 5, 16, 8, 4, 2, 1, so the cycle count for f(20) is 8.
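To make the definition concrete, here is a minimal plain-Python sketch of the cycle count (the gevent-based version used by the server appears below; the function name is just illustrative):

def collatz_cycle_length(n):
    #Count the numbers in the sequence, including the starting n and the final 1
    count = 1
    while n != 1:
        if n % 2 == 1:
            n = 3 * n + 1
        else:
            n = n / 2
        count += 1
    return count

print collatz_cycle_length(20)   #prints 8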
We are going to build a Collatz conjecture cycle finding server using Python and ZeroMQ. The complete code of this project is available at the GitHub link at the end of this article.

Beauty of collatz conjecture

All numbers lead to one. That is the philosophy of the Collatz conjecture. Visit this site to visually see the construction of Collatz numbers for an orbital length of 18.
[Image: Collatz graph for orbit lengths up to 18]

Let us build a Collatz Conjecture Cycle Server

Now, coming to the coding part: our aim is to construct a server that takes a number from the client, calculates the longest collatz cycle from 1 to that number, and returns it back. For example, if we give an input of 1000, our server should calculate the collatz cycles for 1, 2, 3, ..., 1000 separately and return the longest cycle of all.

Ingredients

  • Python
  • ZeroMQ
  • Gevent

We can build the same server with Python and Gevent alone, but that setup struggles beyond 10K connections. To scale it to millions, we should use the power of ZeroMQ.

Requirements

  • Install ZeroMQ 4.1.2 ( http://zeromq.org/intro:get-the-software ). The steps below show the installation procedure.
$ sudo apt-get install uuid uuid-dev uuid-runtime
$ sudo apt-get install libzmq-dbg libzmq-dev libzmq1
$ sudo apt-get install build-essential gcc
$ cd /tmp/ && wget http://download.zeromq.org/zeromq-4.1.2.tar.gz
$ tar -xvf zeromq-4.1.2.tar.gz && cd zeromq-4.1.2
$ ./configure && make
$ sudo make install
$ sudo ldconfig
  • Install pyzmq
$ sudo apt-get install python-dev
$ sudo pip install pyzmq
  • Install gevent
$ sudo pip install gevent

Please be aware that the ZeroMQ installation will fail if all dependencies are not installed, and the pyzmq installation will fail if the python-dev dependency is not fulfilled. I hope you are now ready with the required setup on an Ubuntu 14.04 machine.

This ZeroMQ server serves requests through a TCP port to which a zmq socket is bound; the server collects data from that bound socket. Let us first design the function that returns the maximum collatz cycle for a given input range. The algorithm looks like this.

Collatz Conjecture Algorithm

import gevent
from gevent import monkey

monkey.patch_all()

#Algorithm for finding the longest collatz conjecture cycle.
#Returns the max cycle count among the cycles calculated for 1 to n.
def do_collatz(n):
    def collatz(m):
        cycle = 1              #count the starting number itself
        while m != 1:
            if m % 2 == 1:
                m = (3 * m) + 1
            else:
                m = m / 2
            cycle += 1         #one more number in the sequence
        return cycle
    #This Gevent code spawns one greenlet per number to calculate the cycles
    jobs = [gevent.spawn(collatz, x) for x in range(1, n + 1)]
    gevent.joinall(jobs)
    return max([g.value for g in jobs])

Now let us use this function in the ZeroMQ server we are going to write below. It just receives a number from the client, calls the do_collatz function, and returns the longest cycle back to the client.

ZeroMQ Collatz Server

#zmq_collatz_server.py

import time
import zmq
import gevent
from gevent import monkey

monkey.patch_all()

#Create context
context = zmq.Context()

#Set type of socket
socket = context.socket(zmq.REP)

#Bind socket to port 5555
socket.bind("tcp://*:5555")

#Algorithm for finding the longest collatz conjecture cycle (same as above)
def do_collatz(n):
    def collatz(m):
        cycle = 1              #count the starting number itself
        while m != 1:
            if m % 2 == 1:
                m = (3 * m) + 1
            else:
                m = m / 2
            cycle += 1         #one more number in the sequence
        return cycle
    #This Gevent code spawns one greenlet per number to calculate the cycles
    jobs = [gevent.spawn(collatz, x) for x in range(1, n + 1)]
    gevent.joinall(jobs)
    return max([g.value for g in jobs])

#Create a loop and listen for clients to send requests 
while True:
    # Wait for next request from client
    number = int(socket.recv())
    print("Received request for finding max collatz cycle between 1.....%s" % number)
    # Send reply back collatz conjecture maximum cycle to client
    num = str(do_collatz(number))
    socket.send(num)

 

Now we have a server ready to serve any number of clients. Let us build a ZeroMQ client that sends a request to the above server and receives the maximum collatz cycle for a given number.

ZeroMQ Collatz Client

#zmq_collatz_client.py

import zmq
context = zmq.Context()

# Socket to talk to server
print 'Connecting to collatz cycle server'

socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")

number = raw_input("please give a no to calculate collatz conjecture max cycle: ")
print 'Sending request %s ' % number
#Send number to server
socket.send(number)
#Wait and print result
message = socket.recv()
print 'Collatz Conjecture max cycle of %s is <[ %s ]>' % (number, message)

That’s it. Our client is ready too. Now open a terminal with three tabs: one for the server and the other two for two clients to send requests.
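Something like this, one command per tab (file names as given in the snippets above):

$ python zmq_collatz_server.py     # terminal 1: start the server
$ python zmq_collatz_client.py     # terminal 2: first client
$ python zmq_collatz_client.py     # terminal 3: second client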

Try giving an input of 1000 from one client and 7000 from another client. The server instantly returns the maximum collatz cycle back to each client. On my Ubuntu 14.04 machine it looks like this.

[Screenshot: server terminal and two client terminals showing the returned max collatz cycles]

So it is clearly visible that our ZeroMQ server is working perfectly, serving the clients and solving the collatz conjecture problem. This is called the Request-Reply pattern of ZeroMQ. Here communication is achieved over TCP rather than HTTP. There are three more patterns that can be implemented using ZeroMQ. They are:

  • Publish/Subscribe Pattern: Used for distributing data from a single process (e.g. publisher) to multiple recipients (e.g. subscribers).
  • Pipeline Pattern: Used for distributing data to connected nodes.
  • Exclusive Pair Pattern: Used for connecting two peers together, forming a pair.
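As a quick taste of one of these, here is a minimal Publish/Subscribe sketch with pyzmq. It is not part of the collatz project; port 5556 and the "league" topic prefix are purely illustrative.

# pub.py -- a minimal publisher sketch
import time
import zmq

context = zmq.Context()
publisher = context.socket(zmq.PUB)
publisher.bind("tcp://*:5556")

while True:
    #Subscribers filter messages by prefix, so "league" acts as the topic
    publisher.send("league superman joined")
    time.sleep(1)

# sub.py -- a minimal subscriber sketch
import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://localhost:5556")
subscriber.setsockopt(zmq.SUBSCRIBE, "league")   #receive only messages starting with "league"

while True:
    print subscriber.recv()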

So ZeroMQ has a lot of scope. It is a good scalability solution for current distributed application architectures. All code for the above collatz-cycle server is available at the GitHub link below.

https://github.com/narenaryan/collatz-cycle-server

References:

https://www.digitalocean.com/community/tutorials/how-to-work-with-the-zeromq-messaging-library

http://zeromq.org/intro:read-the-manual

http://inerciatech.com/post/5251827502/a-rabbitmq-to-zeromq-gateway

https://speakerd.s3.amazonaws.com/presentations/8035a1002fcd013209132673290742c6/ZeroMQ.pdf

 

Build an API under 30 lines of code with Python and Flask

 


Hello everyone. Nowadays developers need to perform many jobs: web development, database development, API development and so on. Some companies even have a role called API developer on their openings sheet. What role APIs are playing now, and why one should learn to build them, is our topic today. Developing an API with Python is a very easy task compared to other languages. So sit back and grab this skill for yourself. Take my word, this skill is hot right now in the market.

What is a REST API?

REST (REpresentational State Transfer) is an architectural style, and an approach to communications that is often used in the development of Web services. The use of REST is often preferred over the more heavyweight SOAP (Simple Object Access Protocol) style because REST does not leverage as much bandwidth, which makes it a better fit for use over the Internet. The SOAP approach requires writing or using a provided server program (to serve data) and a client program (to request data).

In three simple lines, a REST API is:

1) A way to expose your internal system to the outside world.

2) A programmatic way of interfacing with third party systems.

3) Communication between different domains and technologies.

I think we are sounding technical; let us jump into practical things. By the end of this tutorial you will be comfortable creating an API using Python and Flask.

Ingredients to build our API

We are going to use these things to build a running API.

*  Python

*  Flask web framework

*  Flask-RESTFul extension

*  SQLite3

* SQLAlchemy

Let us build Chicago employees salary API under 30 lines of code

I am going to build a salary info API for Chicago city employees. Do you know? It is damn easy. An API can give you a computation result or data from a remote database in a nice format; that is what an API is intended for. An API is a bridge between private databases and applications. I am collecting employee salary details from the Chicago city data website, data.cityofchicago.org.

Code of this entire project can be found at this link: https://github.com/narenaryan/Salary-API

Let’s begin the show……..

First, I downloaded the dataset as CSV and dumped it into my SQLite database.

$ sqlite3 salaries.db
sqlite> .mode csv
sqlite> .import employee_chicago.csv salaries

This imports the CSV into a table called salaries.

Now we are going to build a flask app that serves this data as a REST API.

$ virtualenv rest-api
$ source rest-api/bin/activate
$ mkdir ~/rest-app
$ cd ~/rest-app

Now we are in the main folder of the app. Create a file called app.py in that folder. We need a few libraries to finish our task; install them by typing the commands below.

$ pip install flask
$ pip install flask-restful
$ pip install sqlalchemy

That’s it. We are ready to build a cool salary API that can even be accessed from mobile. Let us recall the REST API design: it has 4 verbs, GET, PUT, POST and DELETE.

[Diagram: the four REST verbs GET, PUT, POST and DELETE]

Here we are dealing with open data that can be accessed by multiple applications, so we implement GET here and the remaining REST verbs become quite obvious.

from flask import Flask, request
from flask_restful import Resource, Api
from sqlalchemy import create_engine
from json import dumps

#Create an engine for connecting to SQLite3.
#Assuming salaries.db is in your app root folder

e = create_engine('sqlite:///salaries.db')

app = Flask(__name__)
api = Api(app)

class Departments_Meta(Resource):
    def get(self):
        #Connect to database
        conn = e.connect()
        #Perform query and return JSON data
        query = conn.execute("select distinct DEPARTMENT from salaries")
        return {'departments': [i[0] for i in query.cursor.fetchall()]}

class Departmental_Salary(Resource):
    def get(self, department_name):
        conn = e.connect()
        query = conn.execute("select * from salaries where Department='%s'"%department_name.upper())
        #Query the result and get the cursor; Flask-RESTful serializes the returned dict to JSON
        result = {'data': [dict(zip(tuple (query.keys()) ,i)) for i in query.cursor]}
        return result
        #We can have PUT,DELETE,POST here. But in our API GET implementation is sufficient
 
api.add_resource(Departmental_Salary, '/dept/<string:department_name>')
api.add_resource(Departments_Meta, '/departments')

if __name__ == '__main__':
     app.run()

Save it as app.py and run it as

 $ python app.py 

That’s it. Your salary API is now up and running on localhost, port 5000. There are two rules we defined in the API: one to get the list of all available departments, and another to get the full details of the employees working in a particular department.

So now go to

http://localhost:5000/departments

and you will find this.

[Screenshot: JSON response listing all departments at /departments]

See how Flask serves database data as JSON through the REST API we defined. Next, modify the URL to peek at all employees who are working in the police department.

http://localhost:5000/dept/police

[Screenshot: JSON response with salary records for the police department]

Oh man, it seems like police officers are well paid in Chicago, but they can't beat a Django or Python developer who earns $100,000 per annum. Just kidding.

My code walk-through is as follows:

*  I downloaded the latest salary dataset from the Chicago data site.

*  Dumped that CSV into my SQLite db.

*  Used SQLAlchemy to connect to the database and do select operations.

*  Created Flask-RESTful classes to map functions to API URLs.

*  Returned the queried data as JSON, which can be used universally.

See how simple it is to create a data API. We can also add support for PUT, POST and DELETE on the data, and we can add an authentication system for fetching data through the API. Python and Flask are very powerful tools to create APIs rapidly. The GitHub link is given below; give it a try and extend it with the things mentioned above. A rough sketch of a POST handler follows the link.

https://github.com/narenaryan/Salary-API
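As promised, here is a rough sketch of how a POST handler could slot into the same style. This is hypothetical and not part of the project; NAME and DEPARTMENT are placeholder column names, not the real dataset's schema.

from sqlalchemy import text

class Department(Resource):
    def post(self):
        #Hypothetical sketch: accept a JSON payload and insert it into the table.
        #NAME and DEPARTMENT are placeholder column names.
        payload = request.get_json()
        conn = e.connect()
        conn.execute(text("insert into salaries (NAME, DEPARTMENT) values (:name, :dept)"),
                     name=payload['name'], dept=payload['department'])
        return {'status': 'created'}, 201

You would register it with api.add_resource and a new route, just like the GET resources above.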

See you soon with more stories.