
A practical guide to building RESTful APIs with Django Tastypie

Namaste everyone. It has been very long since I wrote an article. This time I came up with an essential topic: creating APIs in the Django web framework. We all know that REST APIs are needed these days to build communication channels between different systems and devices.

What is Django Tastypie?

Django Tastypie is a framework that allows us to create RESTful APIs in our Django web applications. Simply put, it is a web service API framework. There is one more such library, called Django REST Framework. Without a library we can only implement a basic API, with no security and no customized API results. Tastypie provides tons of features for building an API with very good control. In the journey ahead we start with the basics and then look at advanced stuff.

We will build an API for a restaurant. You can access the sample working code here: https://github.com/narenaryan/tastypie-tutorial

Basics of Tastypie

Since we are following a practical hands-on approach, let us create a sample Django project from scratch. I hope you are in a virtual environment; virtual environments are good for isolating different projects. If you are not familiar with them, take a look at http://docs.python-guide.org/en/latest/dev/virtualenvs/

In this article we are going to create an API for a fake restaurant, which allows developers to build apps around its products. I use Django 1.8 for my illustration.

$ pip install django-tastypie
$ django-admin startproject urban_tastes
$ cd urban_tastes
$ django-admin startapp services

We just installed the Tastypie library and created a project called urban_tastes. Then we created an app called services which takes care of the API for the restaurant. Now the project structure looks like this:

urban_tastes/
├── manage.py
├── urban_tastes/
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── services/
    ├── __init__.py
    ├── admin.py
    ├── migrations/
    ├── models.py
    ├── tests.py
    └── views.py

There are certain steps to using Tastypie. They are:

  • Include “tastypie” in INSTALLED_APPS
  • Create api.py file in app
  • Create Resources for Django models
  • Provide Authorization

Let us create Django models for holding the product and order information of the restaurant.

# services/models.py

from django.db import models
import uuid

class Product(models.Model):
    name = models.CharField(max_length=30)
    product_type = models.CharField(max_length=50)
    price = models.IntegerField()

    def __str__(self):
        return self.name

class Order(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    product = models.ForeignKey(Product)

    def __str__(self):
        return str(self.id)

Let us add a few products using the admin panel. But in order to access our custom models Product and Order in the Django admin panel, we need to register them in the admin.py file. So go and edit the admin file as follows.

# services/admin.py

from django.contrib import admin
from services.models import Product, Order

admin.site.register(Product)
admin.site.register(Order)

Now we need to add both our app “services” and “tastypie” to the INSTALLED_APPS list.

# urban_tastes/settings.py
...
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'tastypie',
    'services'
)
...

Now apply migrations for the new models.

$ python manage.py makemigrations
$ python manage.py migrate

Create a superuser and run the server; I then added three products called Pizza, Hamburger and Cake through the admin panel.

$ python manage.py createsuperuser
$ python manage.py runserver 0.0.0.0:8000 # Visit http://localhost:8000/admin


We are going to authenticate our upcoming REST API using an API key. Tastypie has a built-in API key generator. We can use it by connecting a signal to the User model so that whenever a new user is created, an API key is generated. Tastypie already has the signal handler defined; we just need to connect it. So create apps.py in services and define a custom app configuration.

# services/apps.py

from django.apps import AppConfig
from django.db.models import signals


class ServiceConfig(AppConfig):
    name = "services"

    def ready(self):
        # Import here so that the app registry is fully loaded first
        from django.contrib.auth.models import User
        from tastypie.models import create_api_key

        # This line tells Tastypie to create an ApiKey whenever a User is saved
        signals.post_save.connect(create_api_key, sender=User)

Now set the ServiceConfig app configuration as the default in the __init__.py file.

# services/__init__.py
default_app_config = 'services.apps.ServiceConfig'

The basic setup is done. If we create a new user, Tastypie automatically generates an API key. I created a superuser and also added another user; API keys were generated for both.


Resources in Tastypie

Resources are the heart of Tastypie. By defining a resource we can convert a model into an API stream; the data is automatically converted into an API response. Tastypie resources give us good flexibility in checking the validity of requested data and in modifying a response before sending it to the client.

Let us create resources for both Product and Order so that the API is available.

# services/api.py

from tastypie.resources import ModelResource
from services.models import Product, Order


class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        allowed_methods = ['get']


class OrderResource(ModelResource):
    class Meta:
        queryset = Order.objects.all()
        resource_name = 'order'
        allowed_methods = ['get', 'post', 'put']

Let us understand the process of creating a resource.

  1. Import ModelResource from tastypie
  2. Import the models from the services app
  3. Create a custom resource by inheriting ModelResource and linking the app model in the inner Meta class of the resource. We created two resources to build the API for the two models Product and Order. The allowed_methods setting defines which HTTP methods the API supports: we make our product API implement only GET, but an order can be created and modified, so it supports GET, POST and PUT.

Add the API URLs in the app's urls.py; create urls.py under the services app.

# services/urls.py

from django.conf.urls import url, include
from tastypie.api import Api
from services.api import ProductResource, OrderResource

v1_api = Api(api_name='v1')
v1_api.register(ProductResource())
v1_api.register(OrderResource())

urlpatterns = [url(r'^api/', include(v1_api.urls))]

Now include these URLs in the project's URL dispatcher file.

# urban_tastes/urls.py

from django.conf.urls import include, url
from django.contrib import admin

urlpatterns = [
 url(r'^admin/', include(admin.site.urls)),
 url(r'', include("services.urls")),
]

That's it. We have set up Tastypie with Django and are ready to access the API. Now run the server. If it is already running, just visit this URL in a browser or make a request using Python's requests library.

http://localhost:8000/api/v1/product/?format=json

You will see something like this:

{
  "meta": {
    "limit": 20,
    "next": null,
    "offset": 0,
    "previous": null,
    "total_count": 3
  },
  "objects": [
    {
      "id": 1,
      "name": "Pizza",
      "price": 2,
      "product_type": "food",
      "resource_uri": "/api/v1/product/1/"
    },
    {
      "id": 2,
      "name": "Hamburger",
      "price": 3,
      "product_type": "food",
      "resource_uri": "/api/v1/product/2/"
    },
    {
      "id": 3,
      "name": "Cake",
      "price": 2,
      "product_type": "snack",
      "resource_uri": "/api/v1/product/3/"
    }
  ]
}

Now try to access orders with the same API structure.

http://localhost:8000/api/v1/order/?format=json

It returns zero objects.

{"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 0}, "objects": []}

The above Tastypie product API returns data for all the fields. To make only a few fields accessible to developers, we use a setting called excludes.

# services/api.py

...
class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        excludes = ["product_type", "price"]
        allowed_methods = ['get']
...

Now the JSON response coming from the API will not contain data for product_type and price.

{
  "meta": {
    "limit": 20,
    "next": null,
    "offset": 0,
    "previous": null,
    "total_count": 3
  },
  "objects": [
    {
      "id": 1,
      "name": "Pizza",
      "resource_uri": "/api/v1/product/1/"
    },
    {
      "id": 2,
      "name": "Hamburger",
      "resource_uri": "/api/v1/product/2/"
    },
    {
      "id": 3,
      "name": "Cake",
      "resource_uri": "/api/v1/product/3/"
    }
  ]
}

So far we don't have any authentication. Let us add basic authentication, where a username and password need to be provided to access the Tastypie REST API.

# services/api.py
from tastypie.authentication import BasicAuthentication

class ProductResource(ModelResource):
    class Meta:
        ...
        authentication = BasicAuthentication()
Now open Postman or any other REST client and check the difference. Since I am a Pythonista, I fire a Python request.

>>> import requests
>>> print requests.get('http://localhost:8000/api/v1/product/')
<Response [401]>
>>> print requests.get('http://localhost:8000/api/v1/product/', auth=('naren', 'passme'))
<Response [200]>

BasicAuthentication is a very naive way of authenticating users. For a production-grade API, we should have API key based authentication. Let us implement it.

Tastypie provides ApiKeyAuthentication, SessionAuthentication, OAuthAuthentication etc. in addition to BasicAuthentication. For more details about authentication visit http://django-tastypie.readthedocs.io/en/latest/authentication.html

Implementing custom API key authentication in Tastypie

Tastypie provides a ready-made version of API key authentication. Let us remove BasicAuthentication and add ApiKeyAuthentication in api.py.

# services/api.py

from tastypie.authentication import ApiKeyAuthentication
...
class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        excludes = ["product_type", "price"]
        allowed_methods = ['get']
        authentication = ApiKeyAuthentication()
...

The API key for a user can be obtained from the admin panel for now. The API key for the user naren is 4fa7b65b6fcb951b6185000c699a22450b1cd060. Now you need to pass the API key in the Authorization header in the following format.

Authorization =>   "ApiKey naren:4fa7b65b6fcb951b6185000c699a22450b1cd060"

In Python requests, the above request looks like this.

>>> requests.get('http://localhost:8000/api/v1/product/', headers={"Authorization": "ApiKey naren:4fa7b65b6fcb951b6185000c699a22450b1cd060"})
<Response [200]>

This format is awkward because it requires the username. We want an API which takes the API key alone as the Authorization value: we just generate an API key and hand it to the developer. It should look like this.

Authorization =>   "4fa7b65b6fcb951b6185000c699a22450b1cd060"

For this, Tastypie allows us to implement our own authentication system. We just need to inherit the Authentication class from Tastypie. Create a file called authentication.py and define the custom authentication there.

# services/authentication.py

from tastypie.models import ApiKey
from tastypie.http import HttpUnauthorized
from tastypie.authentication import Authentication
from django.core.exceptions import ObjectDoesNotExist

class CustomApiKeyAuthentication(Authentication):
    def _unauthorized(self):
        return HttpUnauthorized()

    def is_authenticated(self, request, **kwargs):
        if not request.META.get('HTTP_AUTHORIZATION'):
            return self._unauthorized()

        api_key = request.META['HTTP_AUTHORIZATION']
        return self.get_key(api_key, request)

    def get_key(self, api_key, request):
        """
        Check whether the given key exists in Tastypie's ApiKey model
        """
        try:
            ApiKey.objects.get(key=api_key)
        except ObjectDoesNotExist:
            return self._unauthorized()
        return True

Explaining the above code:

  • We inherited the Authentication class and overrode the is_authenticated method.
  • is_authenticated should return a boolean (Tastypie also accepts an HttpResponse such as HttpUnauthorized to reject the request). Here we check whether the passed API key exists; if it does not, we do not authorize the request.

After replacing ApiKeyAuthentication with our CustomApiKeyAuthentication, the complete api.py file looks like this.

# services/api.py

from tastypie.resources import ModelResource
from services.models import Product, Order
from services.authentication import CustomApiKeyAuthentication


class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        excludes = ["product_type", "price"]
        allowed_methods = ['get']
        authentication = CustomApiKeyAuthentication()


class OrderResource(ModelResource):
    class Meta:
        queryset = Order.objects.all()
        resource_name = 'order'
        allowed_methods = ['get', 'post', 'put']
        authentication = CustomApiKeyAuthentication()

Now if we make a Python request with the API key as the Authorization header from the shell, we receive data.

>>> requests.get('http://localhost:8000/api/v1/product/', headers={"Authorization": "4fa7b65b6fcb951b6185000c699a22450b1cd060"})
<Response [200]>

In this way we can implement custom authentication as per our requirements. These days Django developers are even serving user interfaces from Tastypie APIs instead of Django views; it is the idea behind micro services.
As I already mentioned, Tastypie has tons of features which allow you to write super cool REST APIs. A few pieces that are used quite often in Tastypie are:
* the dehydrate method
* the hydrate method
* the Paginator class (see the sketch below)
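
The first two get their own sections below. For the Paginator class, here is only a small hedged sketch: page size and pagination behaviour are controlled through Meta options (limit and paginator_class are standard Tastypie settings; the values shown are just examples).

# services/api.py (sketch: pagination options on ProductResource)
from tastypie.paginator import Paginator

class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        paginator_class = Paginator  # the default; subclass it to customize
        limit = 10                   # 10 objects per page instead of the default 20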

Dehydrating the JSON data


Dehydration in Tastypie means making alterations to the data before sending it to the client. Suppose we need to send capitalized product names instead of lowercase ones. We would normally iterate over the objects and capitalize the letters of the JSON response on the client side, but Tastypie provides useful methods for altering the ready-to-send data on the fly. We will now see two kinds of dehydrate methods.

The dehydrate_field method

A dehydrate_FIELD method is used to modify a single field of the response JSON, where FIELD is the field's name. See in the code below how it works.

# services/api.py

...
class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        excludes = ["product_type", "price"]
        allowed_methods = ['get']
        authentication = CustomApiKeyAuthentication()

    # This method looks for the field "name" and applies upper() to it
    def dehydrate_name(self, bundle):
        return bundle.data['name'].upper()
...

Now the resulting JSON looks like this:

{
  "meta": {
    "limit": 20,
    "next": null,
    "offset": 0,
    "previous": null,
    "total_count": 3
  },
  "objects": [
    {
      "id": 1,
      "name": "PIZZA",
      "resource_uri": "/api/v1/product/1/"
    },
    {
      "id": 2,
      "name": "HAMBURGER",
      "resource_uri": "/api/v1/product/2/"
    },
    {
      "id": 3,
      "name": "CAKE",
      "resource_uri": "/api/v1/product/3/"
    }
  ]
}

Observe carefully: we got PIZZA, HAMBURGER and CAKE instead of the lowercase names. Similarly, we can use the dehydrate method to modify the bundle data. A bundle is the serialized data that Tastypie keeps ready to send to the client.

The dehydrate method

The dehydrate method is useful for adding extra fields to the bundle (the response data). Let us add the server time as additional data to the response JSON.

# services/api.py

import time

...
class ProductResource(ModelResource):
    class Meta:
        queryset = Product.objects.all()
        resource_name = 'product'
        excludes = ["product_type", "price"]
        allowed_methods = ['get']
        authentication = CustomApiKeyAuthentication()

    # This method looks for the field "name" and applies upper() to it
    def dehydrate_name(self, bundle):
        return bundle.data['name'].upper()

    # Using dehydrate we can add more fields, or modify them like above
    def dehydrate(self, bundle):
        bundle.data["server_time"] = time.ctime()
        return bundle
...

Now the response JSON will look like:

{
  "meta": {
    "limit": 20,
    "next": null,
    "offset": 0,
    "previous": null,
    "total_count": 3
  },
  "objects": [
    {
      "id": 1,
      "name": "PIZZA",
      "resource_uri": "/api/v1/product/1/",
      "server_time": "Sat Apr 30 14:06:14 2016"
    },
    {
      "id": 2,
      "name": "HAMBURGER",
      "resource_uri": "/api/v1/product/2/",
      "server_time": "Sat Apr 30 14:06:14 2016"
    },
    {
      "id": 3,
      "name": "CAKE",
      "resource_uri": "/api/v1/product/3/",
      "server_time": "Sat Apr 30 14:06:14 2016"
    }
  ]
}
Similarly, using the hydrate method we can alter the bundle data generated from the request body of PUT or POST calls. The hydrate method is a life saver when you publish documentation for developers and want to format data before Tastypie writes it into the DB.
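
A minimal hydrate sketch, assuming we want to normalize incoming product names before they reach the DB (the strip/lower rule is just an example, not part of the tutorial code):

# services/api.py (sketch)
class ProductResource(ModelResource):
    # ... Meta as before ...

    # Runs on incoming POST/PUT data; the mirror image of dehydrate
    def hydrate(self, bundle):
        if 'name' in bundle.data:
            bundle.data['name'] = bundle.data['name'].strip().lower()
        return bundle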

Once again, the code for this article is here: https://github.com/narenaryan/tastypie-tutorial

For information on other important methods of a Tastypie Resource, visit the Tastypie documentation.

There are plenty of other things to discuss; Tastypie is so vast that even this lengthy article cannot cover it completely. Maybe in upcoming posts I will dig into advanced usage of Tastypie internals like serialization and throttling.

Until then, bye. Feel free to contact me if you have any doubt: narenarya@live.com

Building High Performance Django Systems

The main motto of Django web framework is:

The web framework for perfectionists with deadlines

It is true. Django always delivers a polished product within the deadline. Today many Django developers are racing to finish project development with Python as their favorite choice. But the evil of wrong development practices can slow down a project by a significant amount.

These days perfectionism is losing to deadlines: the eagerness to finish a task dominates efficiency and optimization. People complain too much that Django's code abstraction makes it slow. That is not true, and I am going to prove it here by showing how to optimize Django code and where to optimize it. We need to find the sweet spot and do the repair there.

The techniques that can improve our Django website's performance:

  • Advanced & Correct Django ORM usage
  • Query caching
  • Django template caching
  • Non-blocking code
  • Alternate data stores

* Django ORM (Doctor Evil of newcomers)

Django ORM is the easiest way to link an application and a database (MySQL, PostgreSQL). In any web stack, the communication between the web application and the database is the slowest part, and with bad ORM usage practices we make it even slower. Django is a very good framework which gives you full control over how you define business logic. I am going to show here how we can fall into ORM traps which, in turn, make our website unscalable.

* Select all illusion

When a developer new to Django writes code, she usually has a bad habit of doing this:

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    city = models.CharField(max_length=30)

# Find the number of persons in the DB. A very bad way:
>>> count = len(Person.objects.all())

# The right way to do it:
>>> count = Person.objects.count()

Loading objects into memory just to process them is a bad thing. SQL is an excellent querying language for filtering and processing data; there is no need for us to pull raw data out and process it ourselves. Where possible, use the ORM functions that map one-to-one to SQL. If we time the above example with one hundred thousand records in MySQL:

(Screenshot: the time difference between the two ORM queries.)

It is a journey from almost no time to nearly 9 seconds. If you use the slow query in 20 places, the website will be dead slow even with generous resources. Experienced people have a good chance of not making this silly mistake. But there is a "select * from db" illusion that got taught in our first database class and is still widely used: even though people need only a few fields, they fetch objects with full data from the DB and create overhead. It is like doing

mysql> select first_name from person
mysql> select * from person

Here we skip only one additional field, but in reality we may need 5 fields out of 40, and querying all fields loads memory with unnecessary data. There is a solution for this. Let us fetch only the first names of people who live in Hyderabad city.

# This query fetches only id and first_name from the DB
>>> p1 = Person.objects.filter(city="Hyderabad").values("id", "first_name")[0]
>>> print p1["first_name"]

# This fetches all the fields of the matching rows
>>> p1 = Person.objects.filter(city="Hyderabad")[0]
>>> print p1.first_name

The first query fetches only the two columns id and first_name instead of everything, saving the memory that unwanted fields would occupy.
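
Django also offers only() and defer() for the same purpose when you prefer model instances over the dicts that values() returns; a small sketch with the same Person model:

# Loads only the primary key and first_name; other fields are deferred
>>> p1 = Person.objects.filter(city="Hyderabad").only("first_name")[0]
>>> print p1.first_name
# (accessing a deferred field later triggers an extra query)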

* Repetitive Database calls

In SQL, joins are used to fetch data from related tables in a single shot; we can apply inner joins to combine results from multiple tables matching a criterion. Django provides constructs like select_related and prefetch_related to optimize related-object queries. I will show here why we need to use them.

from django.db import models

class Author(models.Model):
    name = models.CharField(max_length=30)
    # ...

class Book(models.Model):
    name = models.CharField(max_length=30)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    # ...

Here Book has a foreign key to Author, so we can query books this way.

# Hits the DB for first time
>>> book = Book.objects.get(id=1)
>>> book.name

# Hits the DB again 
>>> book.author.name

If you query a set of books and then access all their related authors, that is a bunch of queries suffocating the DB.

from django.utils import timezone

# Find all the authors who published books
authors = set()

for e in Book.objects.filter(pub_date__lt=timezone.now()):
    # For each published book make a DB query to fetch author.
    authors.add(e.author)
It means that if there are 300 books, 300 extra queries are going to hit the DB.

What is the solution?

You should use select_related in that case. It fetches the specified related objects in the same query using SQL joins.
>>> book = Book.objects.select_related('author').get(id=1)
# This won't cost another query
>>> book.author.name
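
The earlier loop over published books can be fixed the same way; with select_related the whole thing costs a single JOIN query instead of 1 + N queries:

authors = set()

# One query with a JOIN instead of one query per book
for e in Book.objects.select_related('author').filter(pub_date__lt=timezone.now()):
    authors.add(e.author)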

Similarly, you can use prefetch_related for many-to-many relations, since select_related can only be used for foreign-key and one-to-one relationships. For a thorough inspection of how the Django ORM makes SQL calls, use connection.queries from the django.db library.
>>> from django import db

# A list of the raw SQL queries Django executed on the DB (populated when DEBUG=True)
>>> print db.connection.queries

# Clear that list and start listening to SQL afresh
>>> db.reset_queries()

For more advanced ORM optimization tips, visit the official Django docs.

* Caching (Swiss knife)

Caching is the best method to reduce DB hits as much as possible. There are different kinds of caching implementations in Django:

  • cached property on model
  • template caching
  • query caching

cached property on model

We all use properties on Django models. They are functions which return calculated values from a particular model's data. For example, let us have a fullName property which returns the complete name by appending first_name + last_name. Each time you access fullName on a model instance, some processing is done on the instance data.

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)

    @property
    def fullName(self):
        # Any expensive calculation on instance data
        return self.first_name + " " + self.last_name



>>> naren = Person.objects.get(pk = 1)

# Now it calculates fullName from first_name and last_name data of instance
>>> naren.fullName
Naren Aryan

And if we call it in a template, the value is calculated from the data once again.

<p> Name: {{ naren.fullName }} </p>

If you know that the calculated property won't change for a particular model instance, you can cache the result instead of computing it again. So modify the code to:
from django.utils.functional import cached_property

class Person(models.Model):
    # ...
    @cached_property
    def fullName(self):
        # Any expensive calculation on instance data.
        # The returned value is cached and not calculated again.
        return self.first_name + " " + self.last_name

Now when you access the fullName property, the model returns the cached value instead of recomputing first_name + last_name. You can invalidate the cache by deleting the attribute on the model instance. Here appending first_name and last_name is a trivial example; this is really useful for optimizing a heavy computation performed inside a property.
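
In the shell the behaviour looks like this; deleting the attribute is the documented way to invalidate Django's cached_property:

>>> naren = Person.objects.get(pk=1)
>>> naren.fullName      # computed once, then stored on the instance
u'Naren Aryan'
>>> del naren.fullName  # invalidate; the next access recomputes it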

Query caching

Many times we issue the same queries to fetch data. If the data is not changing rapidly, we can cache the QuerySet returned by a particular query. Caching systems generate a hash of the SQL query and map it to the cached results, so whenever the ORM issues the same query again, the cached QuerySet is returned. There are good caching libraries available for Django, such as cache-machine and johnny-cache.

Using cache-machine with Redis as the back-end store, we can cache QuerySets. Usage is very simple, but invalidation here is done by timeouts. Invalidating data and refreshing a QuerySet can also be done effectively using a post-save hook on a model. For example:

from django.db import models

from caching.base import CachingManager, CachingMixin

class Person(CachingMixin, models.Model):
    name = models.CharField(max_length=30)
    objects = CachingManager()

We can cache all QuerySets generated for the Person model with syntax as simple as the above. It is a good feature if you have far more reads than writes. Remember to invalidate a QuerySet when new data is saved, and choose timeouts according to the actual situation.

Template Caching

If you have web pages whose content won't change for long periods, cache the parts that remain constant, like the sub-menu or the navigation bar of the website; on a news website, for instance, the side pane content stays the same. You can give a timeout to a particular template fragment: until the timeout happens, the cached fragment is returned, reducing DB hits. Django has template fragment caching built in for this task. It is a small but effective step in optimizing a Django website.

{% load cache %}
{% cache 500 sidebar %}
    .. sidebar ..
{% endcache %}

For more information, visit this link for per-view caching and many more options.

Non blocking Code

When your Django project is growing and different teams are cluttering your code, the main problem is synchronous API calls scattered through the code. There is another case where Django code gets blocked doing "no hurry" things (like sending email or converting an invoice HTML to PDF) while instant necessities (showing the web page) are not being served. In both cases you need asynchronous task execution, which removes the burden from your main Django Python process. Use the following:

  • Messaging Queues + Worker management (Rabbit MQ + Celery)
  • Async IO (Python 3) or requests-futures (Python 2.7)

I wrote a practical guide on how to use Celery and Redis to do that in my article: Integrating Mailgun email service with Django.
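
For a flavour of it, here is a minimal Celery sketch, assuming a Redis broker on localhost; the task name and addresses are made up for illustration:

# tasks.py -- the slow "no hurry" work runs in a worker, not in the request cycle
from celery import Celery
from django.core.mail import send_mail

app = Celery('urban_tastes', broker='redis://localhost:6379/0')

@app.task
def send_invoice_email(to_address, subject, body):
    send_mail(subject, body, 'noreply@example.com', [to_address])

# In a view, queue it and return immediately:
# send_invoice_email.delay(user.email, 'Your invoice', 'Thanks for your order!')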

* Scaling Infrastructure

In addition to coding standards for optimization, the stack also plays a vital role in scaling a Django website. But it is a waste to set up a huge stack on top of bad practices. Here I am going to briefly show which stack components allow us to scale.

(Diagram: a typical scaled Django stack.)

But think of adding all of these only when you really need them. The essential components that should be in your stack are:

  • Load Balancers (HAProxy)
  • Web accelerators (Varnish)
  • Caching backends (Redis)
  • JSON stores (PostgreSQL JSON store)

A caching back-end like Redis can be used for multiple purposes: storing results from multiple caching systems, and keeping small, frequently used data such as user-verification details. Varnish is a good static-file caching system. You can have heartbeat-based load balancers that share load between multiple web application servers intelligently. There are also lots of good open source tools available for tuning a website and analyzing its weak points. I prefer PostgreSQL's JSON store to MongoDB for storing JSON documents.

All this proves that a Django website can live happily on a minimal stack with correct ORM usage. If it is actually needed, the right infrastructure will come to the rescue. Many of these patterns are applicable to web frameworks in other languages too.

If you have any query, comment below or mail me at narenarya@live.com

References

https://docs.djangoproject.com/en/1.9/ref/utils/#django.utils.functional.cached_property

https://docs.djangoproject.com/en/1.9/topics/cache/

https://github.com/jmoiron/johnny-cache

http://blog.narenarya.in

https://highperformancedjango.com/

https://dzone.com/articles/milk-your-caching-all-its


Building a Virtual Personal Assistant with Telegram app and Telepot

Have you ever wondered what comforts a truly programmable app can give to a consumer? Many of us admire the Internet of Things (IoT). So today I am going to talk about creating Personal Assistants (PA) for the Telegram app, an application similar to WhatsApp but fully programmable through its Bot API. We can pass messages to multiple devices in the blink of an eye. Telegram is an app hackers like because of its customizability. Messaging apps should not only provide communication between humans but also lay channels between humans and programmable machines. There are obvious advantages for programmers who use the Telegram app.

  1. Get Github and Sentry messages directly to your app
  2. Get favorite tweets from Twitter
  3. Get updates from multiple information sources like weather or scores
  4. Control home appliances by sending pre-defined commands

Applications are endless. In this IoT generation you need a platform to program on, and the Telegram API provides you that.

This tutorial will use two main ingredients.

  1. Telegram app
  2. Telepot python library

Target

Our goal is to build a PA using the Bot API provided by the Telegram app. We need to install the app:

Telegram app on Google playstore

You can also get a Telegram client for your system; mine is Ubuntu 14.04. You can download all the clients here.

Telegram desktop clients

I presume you have installed the Telegram app. Now we need to create a bot. Telegram provides a program called BotFather, with which we can create custom bots. Launch BotFather by visiting this link:

https://telegram.me/botfather

After adding BotFather to your chat list, enter it and you will see a few options like this:

(Screenshot: BotFather's command options.)

Now type /newbot and hit Enter. It will ask you for a name; give it one. Next it will generate an API key, which we are going to use to build our bot. Store this API key. It also gives a link to your bot; visit that link to add it as one of your friends, and share it with others if you want to.

Telepot, a Python client for Telegram

Telepot is a Python client that wraps the Telegram Bot API. Using it we can take commands from the user, compute something, and give results back. Now I am going to build a small bot program which does the following things when the commands below are given.

  1. /timeline -> Should fetch the latest tweets on my timeline
  2. /tweet=message  -> Should tweet my message on Twitter
  3. /chat  -> Should launch a virtual chat with machine
  4. /stopchat -> You are bored and stop chatting

These tasks might be simple ones, but once you know how to unleash the power of message passing between devices, you can define your own custom tasks of far greater value. Code for this application is at https://github.com/narenaryan/Mika

Let us build the PA

First we need to install the necessary libraries for constructing the virtual assistant. I name my PA Mika.

$ virtualenv  telegram-bot
$ source telegram-bot/bin/activate
$ pip install telepot tweepy nltk

We are installing telepot for sending and receiving messages from Telegram, NLTK for its virtual chat engines, and Tweepy for accessing a Twitter account through consumer keys. For now I am creating a simple bot command which returns "Hello, how are you?" when we say hello to it.

# pa.py
import telepot, time

def handle(msg):
    chat_id = msg['chat']['id']
    command = msg['text']
    print 'Got command: %s' % command

    if command == '/hello':
        bot.sendMessage(chat_id, "Hello, how are you?")

# Create a bot object with API key
bot = telepot.Bot('152871568:AAFRaZ6ibZQ52wEs2sd2XXXXXXXXX')

# Attach a function to notifyOnMessage call back
bot.notifyOnMessage(handle)

# Listen to the messages
while 1:
    time.sleep(10)

Run $ python pa.py

Now enter /hello in the bot channel you created. You will see the following output.

(Screenshot: the bot replying to /hello.)

So our bot received our message and replied to us with the greeting. It is actually the Python code running under the hood managing those tasks. The code is very simple. We need to:

  • Create a Bot object using the API key
  • Create a function for handling commands and returning information
  • Attach the above function to the bot's message callback. Whenever the bot receives a message this handler executes, and we can put any logic in those handlers.

You can see all the inputs you can accept and the types of outputs you can send to users from the bot at the telepot GitHub page.

Now let us integrate Twitter and the NLTK chat engines into our bot. We all know that NLTK comes with a few chat engines like Eliza, Iesha, Zen etc.; I am using the chatbot called Iesha here. Before that, I create a file called tweep.py for managing my tweet and timeline-fetch tasks.

# tweep.py
import tweepy

#I prepared this class for simplicity. Fill in details and use it.
class Tweet:
    #My Twitter consumer key
    consumer_key='3CbMubgpZvXXXXXXXXXX'
    #My consumer secret
    consumer_secret='Clua2xLNfvbjj3Zoi4BQU5EXXXXXXXXXXX'
    #My access token
    access_token='153952894-cPurjdaQW7bA3B3eXXXXXXXXXXXX'
    #My access token secret
    access_token_secret='r6NJ6qjPrYDenqwuHaop1eBnXXXXXXXXXXXXX'

    def __init__(self):
        self.auth = tweepy.OAuthHandler(self.consumer_key, self.consumer_secret)
        self.auth.set_access_token(self.access_token, self.access_token_secret)
        self.handle = tweepy.API(self.auth)

    def hitme(self, status):
        self.handle.update_status(status)
        print 'tweet posted successfully'

Now let me finish the show by adding both chatting and tweeting.

import telepot, time
from nltk.chat.iesha import iesha_chatbot
from tweep import Tweet

# create tweet client
tweet_client = Tweet()
is_chatting = False

def handle(msg):
    global is_chatting
    global tweet_client
    chat_id = msg['chat']['id']
    command = msg['text']
    print 'Got command: %s' % command
    
    if command == '/timeline' and not is_chatting:
        bot.sendMessage(chat_id, '\n'.join([message.text for message in tweet_client.handle.home_timeline()]))
    elif command.split('=')[0] == '/tweet' and not is_chatting:
        try:
            tweet_client.hitme(command.split('=')[1] + ' #mika')
            bot.sendMessage(chat_id, 'Your message tweeted successfully')
        except:
            bot.sendMessage(chat_id, 'There is some problem tweeting! Try after some time')
    elif command == '/chat':
        is_chatting = True
        bot.sendMessage(chat_id, 'Hi I am Iesha. Who are You?')
    elif command == '/stopchat':
        is_chatting = False
        bot.sendMessage(chat_id, 'Bye Bye. take care!')
    elif not command.startswith('/') and is_chatting:
        bot.sendMessage(chat_id, iesha_chatbot.respond(command))
    else:
        pass


# Create a bot object with API key
bot = telepot.Bot('152871568:AAFRaZ6ibZQ52wEs2sd2XXXXXXXXX')

# Attach a function to notifyOnMessage call back
bot.notifyOnMessage(handle)

# Listen to the messages
while 1:
    time.sleep(10)

So the output screens will look like this for /chat and /tweet.

(Screenshot: the /chat and /tweet responses.)

For /timeline

(Screenshot: the /timeline response.)

Isn't it fun? We can add lots of features to this basic personal assistant bot, like:

  1. Tracking time
  2. Scheduler and alert
  3. Notes taking etc
  4. Opening your garage gate when you push a command to bot

If you observe the code, there isn't much in it: I just used the Iesha chat bot from NLTK and tweepy methods to fetch the timeline and post a tweet. If you want to use the code, visit my repo.

https://github.com/narenaryan/Mika

Thanks for reading this stuff. Hope it helps you build your own IoT projects.


Visualize Tennis World with Plot.ly, Docker and Pandas

Namaste everyone. Today I am going to do a small experiment on tennis game history. I asked myself: "How can I know the facts of tennis without asking others? What if I generate them myself?" So I tried to visualize which countries produce the majority of professional tennis players. But I don't want to jump straight into the solution here; rather, we will first discuss a few things which are useful for constructing a universal visualization lab. I also want to use this article to introduce plot.ly, a plotting library for Python.

What I finally visualized in the experiment

I wanted to find out which countries have produced the largest numbers of players in professional ATP tennis.

 

(Scatter plot: Western countries occupy the top of the list in producing tennis players in ATP history.)

We can answer many other queries like:

“How well players are performing in their respective ages?”

“Which country is producing more quality players?”

and more. But I am going to show you how we can visualize and arrive at a solution like the one above.

To download the IPython notebook, visit this link: https://github.com/narenaryan/tennis_atp/blob/master/most_player_countries.ipynb

Building a Python data visualization lab in Docker

Folks, you may be wondering why I brought Docker into the picture. I am discussing Docker because it is an advantage for a data analyst or a developer to isolate their work from everything else. Otherwise I would need to write 100 articles showing the setup procedure on 100 operating systems; Docker allows us to create an identical container on whatever operating system we are working with. I will show now how to build a complete scientific Python stack from scratch in a Docker container. You can store it as a package which you can also push to the cloud via Docker Hub. So let us begin.

I hope you know something about docker. If not just read my previous article here. Docker up and running

Step 1

$ docker run -i -t -p 0.0.0.0:8000:8000 -p 0.0.0.0:8001:8001 -v /home/naren/pylab:/home/pylab ubuntu:14.04

This creates an Ubuntu 14.04 container with two ports open, 8000 and 8001. We can use these ports to forward the IPython notebook to the host browser later in our visualization procedure. It also mounts the pylab folder from my host's /home/naren directory to /home/pylab in the container. When you run this, you automatically enter the bash shell of the container.

Step 2

Now install required packages as below.

root@ffrt76yu:/# apt-get update && apt-get upgrade
root@ffrt76yu:/# apt-get install build-essential
root@ffrt76yu:/# apt-get install python python-pip python-dev
root@ffrt76yu:/# pip install pandas ipython jupyter plotly

That's it. Pandas will pull in numpy as a dependency. We are now ready with our development environment for visualizing anything. We can launch an IPython notebook using this command:

$ ipython notebook --ip=0.0.0.0 --port=8000

So now we have an IPython notebook running on port 8000 of our local machine. Fire up your browser and you will find the notebook software running there. Select a new Python 2 notebook from the menu at the top right.

If you don't want all the pain, just pull my plotting environment from Docker Hub.

$ docker run -i -t -p 0.0.0.0:8000:8000 -p 0.0.0.0:8001:8001 -v /home/naren/pylab:/home/pylab narenarya/plotlab

Beginning of the visualization

plot.ly is a library which allows us to create complex graphs and charts using numpy and pandas. We can load a dataset into a dataframe using pandas, then plot the cleaned data using plot.ly. Full documentation of plot.ly can be found at: https://plot.ly/python/

For my work I used Jeff Sackmann's ATP tennis dataset from GitHub: https://github.com/JeffSackmann/tennis_atp

Extract all the dataset files into your pylab folder so that they are visible to your notebook. Here we are interested in atp_players.csv. We first clean the data to find out how many players belong to each country, then map the counts on a scatter plot. The code looks like this:

from random import shuffle
import colorsys
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import *

init_notebook_mode()

# Load players into players dataframe 
players = pd.read_csv('atp_players.csv')

# Find the top 20 countries with the most players
countries = players.groupby(['Country']).size()
selected_countries = countries.sort_values(ascending=False)[:20]

# Generate 20 random color palettes for plotting each country
N = 20
HSV_tuples = [(x * 1.0 / N, 0.5, 0.5) for x in range(N)]
RGB_tuples = map(lambda x: colorsys.hsv_to_rgb(*x), HSV_tuples)
shuffle(RGB_tuples)
# Plotly expects CSS-style color strings, so scale the floats to 0-255
plot_colors = ['rgb(%d,%d,%d)' % (r * 255, g * 255, b * 255) for r, g, b in RGB_tuples]

""" Plot.ly plotting code. A plot.ly iplot needs data and a layout 
    So now we prepare data and then layout. Here data is a scatter plot
"""
trace0 = Scatter(
    x = list(selected_countries.index),
    y = list(selected_countries.values),
    mode = 'markers',
    marker = {'color' : plot_colors, 'size' : [30] * N}
)

# Data can be a list of plot types. You can have more than one scatter plots on figure 
data = [trace0]

# layout has properties like x-axis label, y-axis label, background-color etc
layout = Layout(
    xaxis = {'title':"Country"}, # x-axis label
    yaxis = {'title':" No of ATP players produced"}, # y-axis label
    showlegend=False,
    height=600, # height & width of plot
    width=600,
    paper_bgcolor='rgb(233,233,233)', 
    plot_bgcolor='rgb(233,233,233)', # background color of plot layout
)

# Build figure from data, layout and plot it.
fig = Figure(data=data, layout=layout)
iplot(fig)

There is nothing fancy in the code. We just did the following things:

  • Loaded ATP players dataset into Pandas Dataframe
  • We need to assign different random colors to each country. So created random RGB values
  • Created a Scatter kind of plot with markers mode
  • Created a layout with axis details
  • Plotted the data and layout using the iplot method of the plotly library

When I run this code in the IPython notebook (Shift + Enter), I see the scatter plot given at the beginning of the article.

For full documentation on all kinds of plots visit this link. https://plot.ly/python/

This is only one visualization from the dataset; you can draw many more analytics from all the datasets provided in the git repo. One obvious advantage here is that you are doing this entire thing in a Docker container: it is faster, and it is easy to recover from broken environments. You can also commit your container to a Docker image.

To download my IPython notebook, visit this link: https://github.com/narenaryan/tennis_atp/blob/master/most_player_countries.ipynb

My email address is narenarya@live.com. Thanks to all.


Lessons I learnt in quest of writing beautiful python code

Hello everyone. I always wonder what the good practices are for developing software in Python. I was young and inexperienced a few years back, but the people around me and the situations I faced over the past few years taught me many things: about coding style, good development patterns, etc. Here I am going to discuss a few things which are important to turn your normal coding style into an elegant one. These points are collected from my own code reviews and those of others.

If you keep all these points in mind, from tomorrow you will see a different side of coding. Thanks to my inspiration, Chandra, Software Architect @ Knowlarity Communications, for reviewing my code and giving valuable tips from his vast software development experience. Let us see how not to write code.

* Your code is a Baby. Protect it with Exception Handling

A software program fails when it accepts wrong input. A good developer always guards his piece of code, since no one can guess all the possible bugs that creep in. In statically typed languages like C and C++, the type system enforces the kind of information passed to a variable. But in dynamic languages like Python and Ruby there are many more chances of a program failing due to an incorrect type. Duck typing is a comfort, but it comes at the expense of more careful error handling. So I always wrap my code in try/except blocks. If you know what type of error you might encounter, it is easy to make your code behave properly; at the very least, the error won't break your code. Let us see the first illustration, handling JSON data.

import json

def handle_json(data_string):
    parsed_data = json.loads(data_string)
    return parsed_data

A Python newbie just leaves the above code and thinks his job is finished. But the code may break if ill-formed JSON is passed to the handle_json function, so it is better to handle the error.

import json

def handle_json(data_string):
    try:   
        parsed_data = json.loads(data_string)
    except:
       return {}
    return parsed_data

This is basic error handling. It becomes a good practice if we also log a message when the error occurs, and handling the specific error does even more good.

import json
import logging

logger = logging.getLogger(__name__)

def handle_json(data_string):
    try:
        parsed_data = json.loads(data_string)
    except ValueError as e:
        logger.info("Error occurred: %s" % e.message)
        return {}
    return parsed_data
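
A quick check in the shell:

>>> handle_json('{"price": 2}')
{u'price': 2}
>>> handle_json('oops, not JSON')   # logged, then an empty dict comes back
{}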

So never think of error handling as an add-on. It is compulsory when writing software for reliable systems.

* Never put magic numbers in the code

It is common to use constants in programs. We define some things as mappings to sequences of numbers; an enum data type, for example, gives us a range of named constants. So use the name of the constant instead of the constant itself.

fruit = int(raw_input("1.Apple\n2.Mango\n3.Gauva\n4.Grape\n5.Orange\nEnter your favorite fruit: "))
if fruit == 1:
    print "Fruit is Apple"
elif fruit == 2:
    print "Fruit is Mango"
elif fruit == 3:
    print "Fruit is Gauva"
.....
else:
    print "Fruit is not available"

It is just a simple program which reads a number and uses that input to select a fruit type. But when someone sees the code, he will wonder what those 1, 2, 3 mean. English names convey better messages than bare numbers, so the good practice is not to hard-code anything. Instead, build your own enum type to map meaningful names to numbers.

class Fruit(object):
    APPLE, MANGO, GAUVA, GRAPE, ORANGE = range(1, 6)

    @classmethod
    def tostring(cls, val):
        """String representation of a Fruit type."""
        for k, v in vars(cls).iteritems():
            if v == val:
                return k

fruit = int(raw_input("1.Apple\n2.Mango\n3.Gauva\n4.Grape\n5.Orange\nEnter your favorite fruit: "))
print "The fruit is: %s" % Fruit.tostring(fruit).capitalize()

See, by building our own enumeration we transformed a hard-coded program into a beautiful, meaningful one. Here we defined a class to store named constants, and our tostring method reverse-looks-up a key from its value. Never ever put magic numbers in code, because in larger systems they create ambiguity. Code is for humans first and for computers next.

* Best ways of working with a dictionary

Many of us work with dictionaries in Python as frequently as we take sips of coffee. Observed carefully, beginner developers usually have a habit of accessing a dictionary value using the bracket method.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb"}
print students[1]

"Everybody does that", you might wonder. Yes, it is the most obvious way of accessing a value from a dictionary. But as we shouted in our first tip, you should handle the error when you query a dictionary for a non-existing key. You can do it like this:

students =  {1: "Naren", 2: "Sriman",  3:"Habeeb", 4:"Ashwin"}
try:
    print students[1]
except KeyError as e:
    print None

Instead of doing all that, we can use one straight operation on a dictionary: get. Python returns the value if the key exists, else it returns None.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
print students.get(1)
# This prints None
print students.get(101)
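
get also takes an optional second argument, a default to return instead of None when the key is missing:

# The second argument is returned when the key is absent
print students.get(101, "Unknown")   # prints Unknown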

But at the beginning of my development career, I used to mix both ways in one program, which looks pretty awkward. So my advice is to use the get function or the bracket [] method according to your personal taste, but remember two things:

  • Using get gives you automatic error handling
  • Keep your program uniform

One more useful case is when you are processing a dictionary and want to merge an existing dictionary with a new one. Many people do this:

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
new_students = {5: "Tony", 6:"Srikanth", 7:"Rajesh"}
# A trivial way to add new students to students map
students[5] = new_students[5]
students[6] = new_students[6]
students[7] = new_students[7]

But there is a handy method called update on any Python dictionary. It merges the second dictionary into the first.

students =  {1: "Naren", 2: "Sriman", 3:"Habeeb", 4:"Ashwin "}
new_students = {5: "Tony", 6:"Srikanth", 7:"Rajesh"}
students.update(new_students)

This method is crisp because it avoids a lot of typing, and it also makes the program look cleaner.

* Always validate the data first, then do the pre-processing

My point here is that many fellow programmers return empty (None) from a function when they find the data is invalid, but they do a lot of pre-processing before checking validity. That computation is wasted: it is illogical for a program to spend time on work before checking whether the result will be used or ignored. It may seem fine to many people, but handling this pattern cleverly can have a huge impact on code performance.

valid_data = [1,2,3,4,5]
def process(value):
    new_value = preprocess(value)
    if value not in valid_data:
        return None
    return new_value

def preprocess(value):
    # Do a heavy computation task
    return value

print process(23)

In small amounts of code we can catch the inefficiency of checking the condition last. I feel like I have lost my common sense if I see the above mistake in my code later. Always check the conditions in the first lines of a function, and only then work with the data. So the process function should look like this:
def process(value):
    # Make a habit of filtering in the first lines
    if value not in valid_data:
        return None
    # Now do whatever you want
    new_value = preprocess(value)
    return new_value

I write code for a telephony company where the product is built upon thousands of lines of legacy Python code, and performance is critical there. If I design one procedure with the above mistake, it has a huge business impact on the product; even a few seconds of delay is not bearable. So keep this in mind: always return early on invalid cases, and only then do the pre-processing.

* Avoid trivial conditionals in code

This is not actually a mistake, but it is a very good practice to avoid a lot of if and else blocks in the code.

def is_even(value):
    if value % 2 == 0:
        return True
    else:
        return False

print is_even(4)

But observing carefully, we can remove the else here, because it is obvious that if the condition is true, control won't stay in the function any longer. So we can modify the code to:

def is_even(value):
    if value % 2 == 0:
        return True
    return False

print is_even(4)

So remember this rule of thumb: "Always try to use a single conditional when there is a truth check, and treat the other branch as the trivial fall-through." (In this particular case we can go one step further and simply return value % 2 == 0.)

* Other notable points

In addition to above points, there are few other important things.

  • Reach the maximum level of abstraction by placing common logic at the top level of the code and specific implementations at the bottom.
  • Follow PEP-8 and PEP-257. It will make the code more readable. I hated it at first, but now I love the structure it gives the code.
  • Make sure the docstrings of classes and methods convey the right message in a Python program.
  • In an ORM like Django's or SQLAlchemy, use filter rather than get, because the former is safer. filter always returns a QuerySet (possibly empty), while get throws DoesNotExist or MultipleObjectsReturned errors which you should handle explicitly (see the sketch after this list).
  • Make a habit of removing print statements and debuggers before committing code to Git.
  • When you add a new feature, please write a unit test case. It will help a new developer understand the functionality of the class or procedure you defined.
  • Never push code without developer testing.
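
For the filter-versus-get point, a small Django sketch (the Person model here is hypothetical):

# get() raises DoesNotExist / MultipleObjectsReturned; handle them explicitly
try:
    person = Person.objects.get(first_name="Naren")
except Person.DoesNotExist:
    person = None

# filter() just gives an empty QuerySet when nothing matches
person = Person.objects.filter(first_name="Naren").first()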

Once again, thanks to my inspirations, Chandra, Software Architect @ Knowlarity Communications, and Mohammed Habeeb, for reviewing my code and giving valuable tips from their vast software development experience.


Building your own URL shortening service with python and flask

Have you ever wondered how people create URL shortening websites? They just do it using common sense. You heard it right. I too thought it was a very big task, but after thinking a bit I realized that simple mathematical concepts can be used to write beautiful applications. What is the link between mathematics and URL shortening? That is what we are going to unveil in this article.

In a single statement, a URL shortening service is built upon two things.

  1. A string-mapping algorithm to map long IDs to short strings (Base62)
  2. A simple web framework (Flask, Tornado) that redirects a short URL to the original URL

There are two obvious advantages of URL shortening.

  1. Short URLs are easy to remember and maintain.
  2. They can be used where text length is restricted, e.g. on Twitter.

Technique of URL shortening

There is no such thing as a URL shortening algorithm. Under the hood, every record stored in the database is allocated a primary key (PK). That PK is passed into an algorithm which in turn generates a short string, and we map that short string back to the URL that the customer registered with us.

I visited the Bit.ly website and passed my blog link http://www.impythonist.wordpress.com to it. Then I got back this short link:

(Screenshot: the shortened link returned by Bit.ly.)

Here one question comes to mind: how do they reduce a lengthy string to a short one? They are not actually reducing the size of the original link; they just add an abstraction here. The steps everyone does are:

  • Insert a record with URL into database
  • Use the record ID returned to generate the short string
  • Pass it back to Customer
  • Whenever you receive a request, extract the short string from the URL and regenerate the database record ID -> fetch the URL -> simply redirect to the website

(Diagram: the Base62 encode/decode flow.)

That's it. It is very simple to generate a short string from a given large number using the Base62 algorithm. Whenever a request comes to our website, we can get the number back by decoding the short string in the URL, then use that numeric ID to fetch the record from the database and redirect to that URL.

Let us build one such URL shortener in Python

Code for this project is available at my git repo. https://github.com/narenaryan/Pyster

As I told you before, there are three ingredients in preparing a URL shortening service:

  • Base62 Encoder and Decoder
  • Flask for handling requests and redirects
  • SQLite3 for serving the purpose of database

Now, if you know about converting Base10 to Base64 or Base62 (any base), you can proceed with me. Otherwise, first see what base conversions are here:

http://tools.ietf.org/html/rfc3548.html

I am interested only in Base62 here because I need to generate strings which are combinations of [a-z][A-Z][0-9]. The encoder maps an integer to a string, and the decoder recovers the integer from a given string; they behave like a function and its inverse. This is the Base62 encoder and decoder in Python (2.7, hence string.lowercase and string.uppercase).

import string

def toBase62(num, b=62):
    if b <= 0 or b > 62:
        return 0
    base = string.digits + string.lowercase + string.uppercase
    r = num % b
    res = base[r]
    q = num // b          # integer division keeps the indices integral
    while q:
        r = q % b
        q = q // b
        res = base[r] + res
    return res

def toBase10(num, b=62):
    base = string.digits + string.lowercase + string.uppercase
    res = 0
    for char in num:
        res = b * res + base.find(char)
    return res
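
A quick round trip in the Python 2 shell shows the mapping in action (the values follow from the alphabet order: digits, then lowercase, then uppercase):

>>> toBase62(125)
'21'
>>> toBase10('21')
125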
Now let me create a database called urls.db using the following command.

$ sqlite3 urls.db

Now I am creating main.py for the flask app, plus a template file.

# main.py

from flask import Flask, request, render_template, redirect
from sqlite3 import OperationalError
import string, sqlite3
from urlparse import urlparse  # Python 2; on Python 3 use urllib.parse

host = 'http://localhost:5000/'

#Assuming urls.db is in your app root folder
def table_check():
    # SQLite allows AUTOINCREMENT only on an INTEGER PRIMARY KEY
    create_table = """
        CREATE TABLE WEB_URL(
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        URL TEXT NOT NULL
        );
        """
    with sqlite3.connect('urls.db') as conn:
        cursor = conn.cursor()
        try:
            cursor.execute(create_table)
        except OperationalError:
            pass  # table already exists

# Base62 Encoder and Decoder
def toBase62(num, b=62):
    if b <= 0 or b > 62:
        return 0
    base = string.digits + string.lowercase + string.uppercase
    r = num % b
    res = base[r]
    q = num // b
    while q:
        r = q % b
        q = q // b
        res = base[r] + res
    return res

def toBase10(num, b=62):
    base = string.digits + string.lowercase + string.uppercase
    res = 0
    for char in num:
        res = b * res + base.find(char)
    return res


app = Flask(__name__)

# Home page where the user submits a URL
@app.route('/', methods=['GET', 'POST'])
def home():
    if request.method == 'POST':
        original_url = request.form.get('url')
        if urlparse(original_url).scheme == '':
            original_url = 'http://' + original_url
        with sqlite3.connect('urls.db') as conn:
            cursor = conn.cursor()
            insert_row = """
                INSERT INTO WEB_URL (URL)
                    VALUES ('%s')
                """%(original_url)
            result_cursor = cursor.execute(insert_row)
            encoded_string = toBase62(result_cursor.lastrowid)
        return render_template('home.html',short_url= host + encoded_string)
    return render_template('home.html')



@app.route('/<short_url>')
def redirect_short_url(short_url):
    decoded_string = toBase10(short_url)
    redirect_url = 'http://localhost:5000'
    with sqlite3.connect('urls.db') as conn:
        cursor = conn.cursor()
        select_row = """
                SELECT URL FROM WEB_URL
                    WHERE ID=%s
                """%(decoded_string)
        result_cursor = cursor.execute(select_row)
        try:
            redirect_url = result_cursor.fetchone()[0]
        except Exception as e:
            print e
    return redirect(redirect_url)


if __name__ == '__main__':
    # This code checks whether database table is created or not
    table_check()
    app.run(debug=True)

Let me explain what is going on here.
  • We have a Base62 encoder and decoder.
  • We have two view functions: home and redirect_short_url.
  • home ('/') renders the home page and also inserts the submitted URL into the database.
  • redirect_short_url ('/<short_url>') receives the request, decodes the short string, and redirects to the original URL. If you read the code carefully, you can easily grasp it.

We can also take a look at the template here: https://raw.githubusercontent.com/narenaryan/Pyster/master/templates/home.html

The project structure is simple: main.py and urls.db sit at the root, with home.html inside a templates folder.

Run the flask app on port 5000.

$ python main.py
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat

If you visit http://localhost:5000 in your browser, you will see the home page with a form for entering a URL.

Now enter a URL to shorten and click submit. The app stores it in the database and renders a short link; in my case it was http://localhost:5000/f. The string seems very short now, but as the number of registered URLs grows, the strings get gradually longer, e.g. 11Qxd. Clicking that short link takes us straight to http://www.example.org.
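
You can also exercise it from the command line; a quick sketch with curl (assuming the very first inserted row, whose ID 1 encodes to '1'):

# shorten a link; the form field name 'url' matches what the view reads
$ curl -s -X POST -d "url=http://www.example.org" http://localhost:5000/
# follow the short link and watch the 302 redirect
$ curl -I http://localhost:5000/1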
So this is how URL shortening works. For the entire code, just clone my repo and give it a try: https://github.com/narenaryan/Pyster

I hope you enjoyed the article. Please do comment if you have any queries, or mail me at narenarya@live.com

A primer on Database Transactions and Asynchronous Requests in Django

Hello, Namaste. Today we are going to look at a few Django web framework cookies that make our life sweeter. Let us learn a couple of things that help us implement the functionality when the situation demands it. The topics are the following:

  1. Implementing database transactions in Django
  2. Making asynchronous HTTP requests from Django code

1) Django DB Transactions

I am creating a REST API and want to insert POST data into the database, but the POST body carries a list, and I want to validate each element of the list before inserting anything. Two rules make this insertion operation atomic:

  • Insert the data only if all elements pass the validation criteria.
  • While inserting, if there is duplicate data, abort the transaction and return an integrity error.

The demo project is available at https://github.com/narenaryan/trans-cookie

Let me create a sample Django project to illustrate everything we are going to discuss. I am doing this on an Ubuntu 14.04 machine with Python 2.7, Django 1.8 and MySQL.

$ virtualenv cookie
$ source cookie/bin/activate
$ pip install django==1.8.5 MySQL-python

Now let us create a sample project called cookie

$ django-admin startproject cookie

Here in cookie I am going to create a view which takes a list of numbers and, if all the numbers are prime, stores them in the db. If there is an invalid prime or any duplicate entry, it aborts the operation.

$ django-admin startapp primer

Now do the following to create a model called Prime in primer app.

# primer/models.py

from django.db import models
import re

class Prime(models.Model):
    number = models.IntegerField(unique=True)
    def __str__(self):
        return str(self.number)
    def prime_check(self):
        # the classic regex trick: it matches '1' * n only when n is NOT prime
        if re.match(r'^1?$|^(11+?)\1+$', '1' * self.number):
            raise Exception('Number is not prime')

prime_check is the method we defined to validate data before inserting it into the db. Always validate your data using a model class method.
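
Before wiring up any view, we can sanity-check the validator; a quick sketch in the Django shell (python manage.py shell):

>>> from primer.models import Prime
>>> p = Prime(number=7)
>>> p.prime_check()        # passes silently: 7 is prime
>>> p.number = 9
>>> p.prime_check()        # raises Exception('Number is not prime')

Now go and modify settings.py to switch the database to MySQL, add the primer app to INSTALLED_APPS, and run the migrations.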

 # cookie/settings.py
...

INSTALLED_APPS = (
 'django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'primer'
)

...
DATABASES = {
 'default': {
 'ENGINE': 'django.db.backends.mysql',
 'NAME': 'cookie',
 'USER': 'root',
 'PASSWORD': 'passme',
 'HOST': 'localhost',
 'PORT': '3306',
 }
}
...
$ python manage.py makemigrations
$ python manage.py migrate

Now the MySQL tables for User and Prime will be created. Next, let us create a URL and a view that takes a list of numbers as POST data and inserts them into the db if all are primes. Modify urls.py and primer/views.py as below:

# primer/urls.py
from django.conf.urls import include, url
from django.contrib import admin
from primer import views

urlpatterns = [
   url(r'^admin/', include(admin.site.urls)),
   url(r'^supply_primes/$', views.supply_primes, name="prime")
]
# primer/views.py
from django.shortcuts import render
from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from primer.models import Prime
import json 

# Create your views here.
@csrf_exempt
def supply_primes(request):
    if request.method == 'GET':
        return JsonResponse({'response':'prime numbers insert API'})
    if request.method == 'POST':
        primes = json.loads(request.body)['primes']
        #Validating data before inserting
        valid_prime = Prime()
        for number in primes:
            valid_prime.number = number
            try:
                valid_prime.prime_check()
            except Exception:
                message = {'error': {
                      'prime_number': 'The Prime number : %s \
                       is invalid.' % number}}
                return JsonResponse(message)
        return JsonResponse({"response":"data successfully stored"})

We can validate the data before inserting anything, but an IntegrityError only surfaces when we actually insert into the db.

If we insert [11, 13, 17] and next try to insert [19, 23, 13], then in the second case an error is returned while inserting 13 because it is a duplicate; but 19 and 23 have already been inserted by then. This is where transactions come in handy. Now we can modify the code to:

from django.shortcuts import render
from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
from primer.models import Prime
from django.db import transaction,IntegrityError
import json 

# Create your views here.

@csrf_exempt
def supply_primes(request):
    if request.method == 'GET':
        return JsonResponse({'response':'prime numbers insert API'})
    if request.method == 'POST':
        primes = json.loads(request.body)['primes']
        #Validating data before inserting
        valid_prime = Prime()
        for number in primes:
            valid_prime.number = number
            try:
                valid_prime.prime_check()
            except Exception:
                message = {'error': {
                      'prime_number': 'The Prime number : %s \
                       is invalid.' % number}}
                return JsonResponse(message)
        #Carefully watch for exceptions while inserting
        transaction.set_autocommit(False)
        for number in primes:
            try:
                Prime(number=number).save()
            except IntegrityError:
                #We got an error: undo all previous insertions
                transaction.rollback()
                transaction.set_autocommit(True)
                message = {'error': {'prime_number': 'This prime number (%s) is already registered.' % number}}
                return JsonResponse(message)
        #If everything went fine, commit the changes and restore autocommit
        transaction.commit()
        transaction.set_autocommit(True)
        return JsonResponse({"response": "data successfully stored"})

 

The three statements I used for transactions:

  • transaction.set_autocommit(False)
  • transaction.rollback()
  • transaction.commit()

The first statement switches off autopilot and makes it manual: let me choose whether to save something or not.

The second statement rolls back whatever changes I made since the last commit.

The third statement commits and flushes the changes to the db.

These statements give us full control over how data reaches the database. Without transactions you have a single statement, Prime(number=number).save(), which directly pushes changes to the database. If you need to gate what goes into the DB with your own logic, use the transaction library in Django.
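
As a side note, since Django 1.6 the same guarantee can be had with the transaction.atomic() context manager, which commits on success and rolls back on any exception, so you cannot forget to restore autocommit. A sketch of how the insert loop inside the same view could read (Prime and primes as above):

from django.db import transaction, IntegrityError

try:
    with transaction.atomic():
        for number in primes:
            Prime(number=number).save()
except IntegrityError:
    # everything inside the atomic block was rolled back
    return JsonResponse({'error': 'duplicate prime found'})
return JsonResponse({'response': 'data successfully stored'})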

Let us see it in action.

Run the Django web server as below:

  $ python manage.py runserver 0.0.0.0:8200

It runs our Django project on localhost, port 8200.

Let us fire up Postman to make a POST request to http://localhost:8200/supply_primes. You can also use curl.
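If curl is handy, a sketch of the same request looks like this (the payload shape matches what the view reads):

$ curl -X POST http://localhost:8200/supply_primes -d '{"primes": [11, 13, 17]}'
{"response": "data successfully stored"}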

Whether through Postman or curl, the response says the data was successfully stored, because all of the numbers are primes, and the three rows appear in the table.

Now let me try to insert [26, 13, 17]. Because 26 is not prime, the API returns the validation error response and nothing is stored.

Cool. Then try to insert [29, 13, 67]. Observe that we are trying to insert a duplicate (13). The API returns the duplicate error, and if you look at the table, 29 is nowhere to be seen. It was actually inserted first, but rolled back when 13 raised the IntegrityError. This is how transactions work.

2) Asynchronous Requests from the Django Code

Imagine your Django code base is large and slow, and someone asks you to insert a hook that posts some data to an external URL. Then your Django behaves even slower: if you make 100 sequential requests, the last hook executes after a long time. Critical code should not be blocked because of side players.

The solution is to make asynchronous, non-blocking requests from the Django code.

* Synchronous code

import requests

counter = 0
res = requests.get('http://localhost:8200/supply_primes')  # blocks until the response arrives
# some other django task
counter += 1

Here counter is incremented only after the request completes, successfully or not. The blocking request makes Django pause until it is processed.

* Asynchronous code

If you are using Python 3, there is a wonderful library called asyncio for making concurrent HTTP requests; see this diligent link if that is your case: http://geekgirl.io/concurrent-http-requests-with-python3-and-asyncio/ . If your Django projects run on Python 2.7.x, carry on.

$ pip install requests-futures

This library makes asynchronous requests on top of Python's requests library.

from requests_futures.sessions import FuturesSession

session = FuturesSession()
counter = 0
res = session.get('http://localhost:8200/supply_primes')  # returns immediately
# Some other django task
counter += 1

Here session.get won't block the increment of the counter, so your Django code speeds up. Note that requests-futures runs each request in a background thread pool, not a separate process; use it wherever you want fire-and-forget HTTP calls.
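
When you eventually need the response, the Future returned by session.get hands it over; a small sketch (the URL is the one from our primer API):

from requests_futures.sessions import FuturesSession

session = FuturesSession()    # runs requests in a background thread pool
future = session.get('http://localhost:8200/supply_primes')
# ... carry on with other Django work here ...
response = future.result()    # blocks only at the point you really need it
print response.status_code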

For more details visit demo project at https://github.com/narenaryan/trans-cookie 

Five trivial things every python programmer should work with

 

Namaste everyone. This time I came up with a few sensible suggestions which can affect our coding style. A good habit leads to a good output. If you are already working with the things I am going to mention, then you are on the right track. Otherwise, you will surely gain something useful in the next few minutes.

1) Virtualenv

Yeah, the first important thing we should know about is working with virtual environments in Python. I have observed that a lot of people install packages into their default Python interpreter. Separating the interpreter environment keeps things clean: we can work on different projects on the same machine without conflicts. To install virtualenv on an Ubuntu 14.04 machine, just do

$ sudo apt-get install python-pip
$ pip install virtualenv

Suppose I am working on a Flask project; I create a virtual environment for it and install all the Flask dependencies there. A virtual environment is created with the command "virtualenv env_name".

# This creates a virtual environment called flask_env
$ virtualenv ~/flask_env

Now tell the machine to drop the default Python interpreter and load the flask_env interpreter using

$ source ~/flask_env/bin/activate

Now you are in a separate world. Install packages using pip.

(flask_env)$ pip install flask requests

When you want to leave the virtual environment, run deactivate.

# This command deactivates virtual environment's interpreter and loads default

(flask_env)$ deactivate

Hint: Always use Virtualenv to separate project environments.

2) IPython

Have you ever faced the problem of hitting the up-arrow key several times to recall the nth previous command in the Python shell? Or needed to rush to the Python docs to learn the properties and methods available in a package or module? Then you should use IPython. It is an interactive shell with tons of options: you can see the method names and properties of any module on the fly. It is a tool every programmer should have. To install IPython just use this command.

$ pip install ipython

There is another variation of IPython called Notebook, where we can save our scripts as notebooks on a web-based interpreter, share them, and reuse them.

You can launch the IPython shell with the "ipython" command. To see the suggestion lookup for method names, press TAB after entering the dot (.).

Generally IPython is used for writing shorter scripts and testing language features. My favorite command is "%cpaste": using it I can paste code directly into the terminal without losing the indentation. In the conventional Python shell, pasting and formatting is painful. For more details visit this link: https://github.com/ipython/ipython
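
A small sketch of a %cpaste session (the banner text may vary slightly between IPython versions):

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def greet(name):
:    print "Hello, %s" % name
:--

In [2]: greet("impythonist")
Hello, impythonist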

3) Anaconda sublime plugin


If you are writing shorter scripts and testing them, IPython is sufficient. But if you want a full-fledged Python editor with the following features:

  • Automatic code completion
  • PEP-8 and PEP-257 checking and reporting

then you should use the Anaconda plugin with Sublime Text. Sublime Text 3 is a great editor for Python development: it is fluid, takes few resources, and can handle any kind of file without pain. Anaconda plugin + Sublime Text 3 = Python IDE. You can see how to set up the plugin using Package Control here.

http://damnwidget.github.io/anaconda/

4) IPdb

One more common thing I observe in Python beginners is not using any debugger while testing their code. Python is an interpreted language and executes line by line, but still, in big projects with many function calls, we do not see the actual code flow. We all know the classic Python debugger, Pdb. IPdb is IPython + Pdb: an interactive Python debugger.

Using IPdb we can set a breakpoint anywhere in our code with one single statement.

import ipdb; ipdb.set_trace()

Insert the above statement into your Python code. When the program executes, control stops at that line; from there you can step line by line and inspect variables to debug the code. The primary keys used while debugging are:

  • n – execute the next line
  • c – continue executing the remaining program
  • r – continue execution until the current function returns
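
For instance, here is a minimal sketch (a hypothetical script) of dropping a breakpoint inside a function:

# debug_me.py -- hypothetical demo script
import ipdb

def average(values):
    total = sum(values)
    ipdb.set_trace()   # execution pauses here; inspect total, then press 'n'
    return total / len(values)

print average([10, 20, 30])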

For more details about IPdb and debugging visit this link. http://georgejhunt.com/olpc/pydebug/pydebug/ipdb.html

5)  Logging

I have seen people put print statements everywhere to debug code and write information to the console. Logging information only on the console is a bad practice. Python provides an excellent built-in logging library which is sadly neglected by most Python developers. Logging your program's activity is a very good habit that helps you diagnose failures. Here is a jump start on logging a Python function's activity to a file.

# loggex.py
import logging

# create a logger named after the current module
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# redirect log records to a physical file
fh = logging.FileHandler('add.log')

# define how each record appears: time, logger name, level, message
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

fh.setFormatter(formatter)
logger.addHandler(fh)

def add(x, y):
    logger.info("Just now received parameters %d and %d" % (x, y))
    addition = x + y
    logger.info("Returning the computed addition %d" % addition)
    return addition

if __name__ == '__main__':
    add(13, 31)

We are not doing anything fancy here, just logging the activity of an add function in a file called add.log. The script loggex.py does these things:

  • Create a logger object with the current module name as its handle
  • Set the level to DEBUG; it can also be INFO or ERROR according to the context of the log
  • Create a file handler, which redirects logs to a physical file
  • Create a formatter and set it on the file handler. This defines the custom message layout (time, date, etc.) that appears in the log file
  • Add the file handler to the logger object we created
  • Sprinkle INFO or DEBUG messages wherever you want to record activity. They will be written to the file, and you can review the log in case of failure.
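
After running the script, add.log should contain lines shaped by our formatter, roughly like this (timestamps are illustrative):

2015-10-11 20:55:03,114 - __main__ - INFO - Just now received parameters 13 and 31
2015-10-11 20:55:03,115 - __main__ - INFO - Returning the computed addition 44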


See how simple logging is. Yet very few developers show interest in doing it while building software. Make logging in your programs a habit.

So these are five small but notable things every Python developer should use and care about to improve their productivity. If you have any queries just comment below. Thanks.

narenarya@live.com

@Narenarya3


Build massively scalable RESTful API with Falcon and PyPy

Namaste everyone. If you build a RESTful API for some purpose, which technology stack do you use in Python, and why? I may receive the following answers from you.

1) I use Flask with Flask-RESTful

2) I use (Django + Tastypie) or (Django + REST Framework)

Neither option is suitable for me, because there is a very good lightweight API framework available in Python called Falcon. I always keep my project and REST API loosely coupled: my REST API knows little about the Django or Flask project behind it. Creating cloud APIs with a low-level web framework rather than a bulky wrapper always speeds up my API.

What is Falcon?

As per the Falcon official website:

“Falcon is a minimalist WSGI library for building speedy web APIs and app backends. We like to think of Falcon as the Dieter Rams of web frameworks.”

“When it comes to building HTTP APIs, other frameworks weigh you down with tons of dependencies and unnecessary abstractions. Falcon cuts to the chase with a clean design that embraces HTTP and the REST architectural style.”

If you want to hit bare metal for creating API use Falcon. You can build easy to develop, easy to serve and easy to scale API with Falcon. Just use it for speed.

What is PyPy?

“If you want your code to run faster, you should probably just use PyPy.” — Guido van Rossum

PyPy is a fast, compliant alternative implementation of the Python language

So PyPy is a JIT-compiled implementation of Python. It is a separate interpreter that can be used like the normal one inside a virtual environment to power our projects. In most cases there are no compatibility issues with PyPy.

Let’s start building a simple todo REST API

Note: Project source is available at https://github.com/narenaryan/Falcon-REST-API-Pattern

Falcon and PyPy are our ingredients for building a scalable, fast REST API. We start with a virtual environment that runs PyPy, with Falcon installed using pip. Then we use RethinkDB as the data store behind our API. Our todo app does three main things:

  1. Create a note (POST)
  2. Fetch a note by ID (GET)
  3. Fetch all notes (GET)

PUT and DELETE handlers follow the same pattern, so we skip them here.

Install RethinkDB on Ubuntu14.04 in this way.

$ source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
$ wget -qO- http://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install rethinkdb
$ sudo cp /etc/rethinkdb/default.conf.sample /etc/rethinkdb/instances.d/instance1.conf
$ sudo /etc/init.d/rethinkdb restart

Create a virtualenv for the project and install the required libraries. Download PyPy from the official PyPy download page; after downloading, extract the files and install pip if required.

$ sudo apt-get install python-pip
$ virtualenv -p pypy-2.6.1-linux64/bin/pypy falconenv
$ source falconenv/bin/activate
$ pip install rethinkdb falcon gunicorn

Now we are ready with our stack: PyPy as the Python interpreter, Falcon as the web framework for building the RESTful API, and Gunicorn as the WSGI server that serves it. Let us prepare our RethinkDB database client for fetching and inserting resources. I give it the filename "db_client.py".

#db_client.py
import os
import rethinkdb as r
from rethinkdb.errors import RqlRuntimeError, RqlDriverError

RDB_HOST = 'localhost'
RDB_PORT = 28015

# Database is todo and table is notes
PROJECT_DB = 'todo'
PROJECT_TABLE = 'notes'

# Set up db connection client
db_connection = r.connect(RDB_HOST,RDB_PORT)


# Cross-check that the database and table exist
def dbSetup():
    try:
        r.db_create(PROJECT_DB).run(db_connection)
        print 'Database setup completed.'
    except RqlRuntimeError:
        print 'Database already exists. Nothing to do.'
    try:
        r.db(PROJECT_DB).table_create(PROJECT_TABLE).run(db_connection)
        print 'Table creation completed.'
    except RqlRuntimeError:
        print 'Table already exists. Nothing to do.'

dbSetup()

Don't worry if you do not know RethinkDB: just go to the RethinkDB Python quickstart and skim it. We just prepared a db connection client and created the database and table. Now the actual thing comes. Falcon lets us define a resource class which we route to a URL. In that resource class we can have four REST methods:

  1. on_get
  2. on_post
  3. on_put
  4. on_delete

We are going to implement the first two methods in this article. Create a file called app.py.

#app.py
import falcon
import json

from db_client import *

class NoteResource:

    def on_get(self, req, resp):
        """Handles GET requests"""
        # Return the note for a particular ID, or all notes
        if req.get_param("id"):
            result = {'note': r.db(PROJECT_DB).table(PROJECT_TABLE).get(req.get_param("id")).run(db_connection)}
        else:
            note_cursor = r.db(PROJECT_DB).table(PROJECT_TABLE).run(db_connection)
            result = {'notes': [i for i in note_cursor]}
        resp.body = json.dumps(result)

    def on_post(self, req, resp):
        """Handles POST requests"""
        try:
            raw_json = req.stream.read()
        except Exception as ex:
            raise falcon.HTTPError(falcon.HTTP_400, 'Error', ex.message)

        try:
            result = json.loads(raw_json, encoding='utf-8')
            sid = r.db(PROJECT_DB).table(PROJECT_TABLE).insert({'title': result['title'], 'body': result['body']}).run(db_connection)
            resp.body = 'Successfully inserted %s' % sid
        except ValueError:
            raise falcon.HTTPError(falcon.HTTP_400, 'Invalid JSON', 'Could not decode the request body. The JSON was incorrect.')

api = falcon.API()
api.add_route('/notes', NoteResource())

We can break down the code into following pieces.

  1. We imported falcon and database client
  2. Created a resource class called NoteResource
  3. Created two methods called on_get and on_post on NoteResource.
  4. In on_get, we check for an "id" parameter in the request and send back one resource (note) or all resources (notes). req and resp are Falcon's request and response objects respectively.
  5. In on_post, we read the request body as raw JSON and decode it to store title and body in the RethinkDB notes table.
  6. We create Falcon's API object and add a route to it, '/notes' in our case.

Now, in order to serve the API we should start a WSGI server, because Falcon needs an independent server to deliver it. So launch Gunicorn:

$ gunicorn app:api

 

This will run the Gunicorn WSGI server on port 8000. Visit http://localhost:8000/notes to view all stored notes.
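
Since scale is the point of this stack, note that Gunicorn can also fan the API out across several worker processes; a sketch (the worker count here is just an example):

$ gunicorn -w 4 -b 0.0.0.0:8000 app:api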

If notes are empty, add one using a POST request to our API.
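
If you prefer the command line to a REST client, a quick sketch with curl (the JSON fields match what on_post expects):

$ curl -X POST http://localhost:8000/notes -d '{"title": "At 9:00 AM", "body": "Daily standup"}'
$ curl http://localhost:8000/notes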

Now add one more note the same way with different data, say { "title": "At 10:00 AM", "body": "Scrum meeting scheduled" }. Visit http://localhost:8000/notes once again and you will find both notes.

If we want to fetch an element by id, pass it as a query parameter: http://localhost:8000/notes?id=d24866be-36f0-4713-81fd-750b1b2b3bd4. Now only the note with the given ID is displayed.

This is how Falcon lets us create a REST API easily at a very low level. There are many more features available in Falcon; for details visit the Falcon home page. The full source code of the above demonstration is at this link:

https://github.com/narenaryan/Falcon-REST-API-Pattern

Please do comment if you have any queries. Have a good day :)

Build a real time data push engine using Python and Rethinkdb


Namaste everyone. Today we are going to talk about building real time data push engines, and how to design models for the modern realtime web will be the limelight of this article. We are going to build a cool push engine that notifies the "Super Heroes" of the Justice League (DC) in real time. With the same principles we can also develop real time chat applications very easily.

What actually is a Data Push Engine?

A push engine is nothing but a piece of software that pushes notifications from the server to all the clients who subscribed to receive those events. When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. To overcome this burden, two proposals were made:

  1. Web Sockets
  2. Server Sent Events(SSE)

But using either of the above technologies alone is not sufficient for the modern real time web. Think of it this way. The query-response database access model works well on the web because it maps directly to HTTP's request-response. However, modern marketplaces, streaming analytics apps, multiplayer games, and collaborative web and mobile apps require sending data directly to the client in realtime. For example, when a user changes the position of a button in a collaborative design app, the server has to notify other users who are simultaneously working on the same project. Web browsers support these use cases via WebSockets and long-lived HTTP connections, but it is even better when the database itself notifies us of the updates.

Seeing is believing

I am going to run my project first to make you comfortable with it. The project is a website which does the following; code for it is available at https://github.com/narenaryan/thinktor

  • I am going to start a Justice League website (like the one Superman runs).
  • The website collects the nickname and email of a superhero.
  • It notifies all existing heroes about new joiners in real time.

Here is the flow in short: we ask a new hero for his information and navigate him to the dashboard. From then on, every client sitting on the dashboard is notified about newly joined people instantly. No refresh, no ajax polling. Thanks to our push engine.

Are you kidding? I can implement that using web sockets!

Yes, you are right: you can implement the above notification system purely with websockets. But why did I use a few more things? Here is the answer.

"Writing push logic with websockets alone is cumbersome. The websocket code must push from the server and receive on the client, yet traditional databases know nothing about websockets or Server Sent Events, so we would have to poll the database for changes, push them to an intermediate queue, and from there to the clients. I say remove that headache from our server: just exploit the database's capability of pushing changes in realtime whenever its data changes. That is why I chose RethinkDB plus websockets."

How I built that push engine

I used two main ingredients to create the data push engine shown above:

  1. The Python Tornado web server (for handling websocket requests and responses)
  2. RethinkDB (for storing data and pushing real time changes to the server)

What is RethinkDB?

According to RethinkDB official website

RethinkDB is the first open-source, scalable JSON database built from the ground up for the realtime web. It inverts the traditional database architecture by exposing an exciting new access model – instead of polling for changes, the developer can tell RethinkDB to continuously push updated query results to applications in realtime. RethinkDB’s realtime push architecture dramatically reduces the time and effort necessary to build scalable realtime apps.

When is RethinkDB a good choice?

RethinkDB is a great choice when your applications could benefit from realtime feeds to your data.

The query-response database access model works well on the web because it maps directly to HTTP’s request-response. However, modern applications require sending data directly to the client in realtime. Use cases where companies benefited from RethinkDB’s realtime push architecture include:

  • Collaborative web and mobile apps
  • Streaming analytics apps
  • Multiplayer games
  • Realtime marketplaces
  • Connected devices

Most modern web demands fall into one of the above categories, so RethinkDB is extremely useful for people who want to exploit its real power for building real time apps.

RethinkDB has a dedicated Python driver. In our project we just insert our document and read the changes on the users table. To get familiar with the RethinkDB Python client, visit these links.

http://rethinkdb.com/docs/guide/python/

http://rethinkdb.com/docs/introduction-to-reql/

Setup for our data push engine

Install RethinkDB on Ubuntu14.04 in this way.

$ source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
$ wget -qO- http://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install rethinkdb
$ sudo cp /etc/rethinkdb/default.conf.sample /etc/rethinkdb/instances.d/instance1.conf
$ sudo /etc/init.d/rethinkdb restart

Create virtualenv for the project and install required libraries

$ virtualenv rethink
$ source rethink/bin/activate
$ pip install tornado rethinkdb jinja2

Now everything is in place. My main application will be app.py, and there are templates and static files in my project. The project structure looks like this.

.
|-- app.py
|-- conf.py
|-- requirements.txt
|-- static
|   `-- js
|       `-- sockhand.js
`-- templates
    |-- detail.html
    `-- home.html

Now let us write our app.py file.

#For tornado server stuff

import os  # needed later to locate the static folder
import tornado.ioloop
import tornado.web
import tornado.gen
import tornado.websocket
import tornado.httpserver
from tornado.concurrent import Future


from jinja2 import Environment, FileSystemLoader #For templating stuff

import rethinkdb as r #For db stuff

from rethinkdb.errors import RqlRuntimeError, RqlDriverError

from conf import * #Fetching db and table details here


#Load the template environment

template_env = Environment(loader=FileSystemLoader("templates"))

db_connection = r.connect(RDB_HOST,RDB_PORT) #Connecting to RethinkDB server

#Our superheroes who connect to the server
subscribers = set()

#Cross-check that the database and table exist
def dbSetup():
    print PROJECT_DB, db_connection
    try:
        r.db_create(PROJECT_DB).run(db_connection)
        print 'Database setup completed.'
    except RqlRuntimeError:
        print 'App database already exists. Nothing to do.'
    try:
        r.db(PROJECT_DB).table_create(PROJECT_TABLE).run(db_connection)
        print 'Table creation completed.'
    except RqlRuntimeError:
        print 'Table already exists. Nothing to do.'
    db_connection.close()

#The python RethinkDB client supports pluggable event loops; set it to tornado's
r.set_loop_type("tornado")


class MainHandler(tornado.web.RequestHandler): #Renders the details page and dashboard
    @tornado.gen.coroutine
    def get(self):
        detail_template = template_env.get_template("detail.html") #Loads template
        self.write(detail_template.render())

    @tornado.gen.coroutine
    def post(self):
        home_template = template_env.get_template("home.html")
        email = self.get_argument("email")
        name = self.get_argument("nickname")
        #Under the tornado loop type, connect() and run() return futures we must yield
        threaded_conn = yield r.connect(RDB_HOST, RDB_PORT, PROJECT_DB)
        result = yield r.table(PROJECT_TABLE).insert({"name": name, "email": email}, conflict="error").run(threaded_conn)
        print 'log: %s inserted successfully' % result
        self.write(home_template.render({"name": name}))


#Send new-user alerts to all subscribers
@tornado.gen.coroutine
def send_user_alert():
    while True:
        try:
            temp_conn = yield r.connect(RDB_HOST, RDB_PORT, PROJECT_DB)
            feed = yield r.table(PROJECT_TABLE).changes().run(temp_conn)
            while (yield feed.fetch_next()):
                new_user_alert = yield feed.next()
                for subscriber in subscribers:
                    subscriber.write_message(new_user_alert)
        except:
            pass  # reconnect and resubscribe if the connection drops


class WSocketHandler(tornado.websocket.WebSocketHandler): #Tornado Websocket Handler
    def check_origin(self, origin):
        return True

    def open(self):
        self.stream.set_nodelay(True)
        subscribers.add(self) #Join client to our league

    def on_close(self):
        if self in subscribers:
            subscribers.remove(self) #Remove client


if __name__ == "__main__":
    dbSetup() #Check DB and Tables were pre created
 
    #Define tornado application
    current_dir = os.path.dirname(os.path.abspath(__file__))
    static_folder = os.path.join(current_dir, 'static')
    tornado_app = tornado.web.Application([('/', MainHandler), #For Landing Page (r'/ws', WSocketHandler), #For Sockets
(r'/static/(.*)', tornado.web.StaticFileHandler, { 'path': static_folder }) #Define static folder 
 ])

    #Start the server
    server = tornado.httpserver.HTTPServer(tornado_app)
    server.listen(8000) #Bind port 8888 to server
    tornado.ioloop.IOLoop.current().add_callback(send_user_alert)
    tornado.ioloop.IOLoop.instance().start()

I define the database configuration parameters, like the db name and table name, in a separate conf.py file.

import os

RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
PROJECT_DB = 'userfeed'
PROJECT_TABLE = 'users'

That's it. We have our app.py and conf.py ready. Point-wise, here is what app.py does:

  • imports the tornado tools and the RethinkDB client driver
  • defines a function called dbSetup that checks whether the required database and table exist
  • uses the MainHandler class to handle HTTP requests: GET displays the enter-details page and POST shows the dashboard
  • WSocketHandler is the tornado websocket handler that adds or removes subscribers
  • send_user_alert is the actual pusher of changes to the clients. It does only two things: subscribe to the database table's changes, and send those changes to the clients

In RethinkDB we have a concept called changefeeds, similar to Redis PUBSUB. We can subscribe to a particular changefeed and RethinkDB returns a cursor of infinite length: whenever the db receives a change on that table, it pushes an event to the subscribed cursor with the new and old values of the data. For example:

#a cursor is returned when we subscribe to changes on the users table
cursor = r.table("users").changes().run(connection)

#loop over it forever to grab the changes RethinkDB pushes into the cursor
for document in cursor:
     print(document)
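
Each change document carries the old and new values of the row. A freshly inserted hero arrives shaped roughly like this (the id is a placeholder for the generated UUID, and the name and email are made up):

{'old_val': None, 'new_val': {'id': '<generated-uuid>', 'name': 'batman', 'email': 'bat@justiceleague.org'}}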

I think you have got the idea by now. The other files in our project are templates and static files:

  • detail.html
  • home.html
  • sockhand.js

The code for the templates is quite obvious; you can find them here: https://github.com/narenaryan/thinktor

But we need to look into the js file.

 
//Listen on the socket and update the page when a notification comes
function listen() {
    var source = new WebSocket('ws://' + window.location.host + '/ws');
    var parent = document.getElementById("mycol");
    source.onmessage = function(msg) {
        var message = JSON.parse(msg.data);
        console.log(message);

        //Build a red message box for the newly joined superhero
        var child = document.createElement("DIV");
        child.className = 'ui red message';

        var text = message['new_val']['name'].toUpperCase() + ' joined the league on ' + Date();
        var content = document.createTextNode(text);
        child.appendChild(content);
        parent.appendChild(child);
        return false;
    }
}

$(document).ready(function(){
    console.log('I am ready');
    listen();
});

Here we define a listen function that runs when the webpage loads. It initializes a variable called source, of type WebSocket, and links it to the /ws URL we defined in the Tornado application. It also sets a callback for when a message is received; that callback updates the DOM and adds the information about the new user.

If you are still confused, then run the application yourself and see. The app we wrote above is a data push engine that routes changes directly from the database to the client. Go to the project link https://github.com/narenaryan/thinktor, clone it, install requirements.txt, then run app.py and visit localhost:8000. If you still have any queries on how it works, feel free to comment below or write to narenarya@live.com

I thought of introducing RethinkDB for absolute beginners, but the article would become very lengthy; I will surely come up with an article dedicated to RethinkDB in the near future.

In this way we can build a real time data push engine using Python and RethinkDB.

Points to ponder

  • Use RethinkDB for building real time applications. It is scalable too.
  • Use Tornado, because it can easily handle many concurrent connections without any fuss.
  • Remove queuing from your architectural design
  • Use websockets for bidirectional communication
  • Try out new things frequently

References