Anatomy and application of parallel programming in python

Google data center clusters

What is Parallel  Processing?

Parallel processing is the simultaneous use of more than one CPU or processor core to execute a program or multiple computational threads. Ideally, parallel processing makes programs run faster because there are more engines (CPUs or cores) running it. In practice, it is often difficult to divide a program in such a way that separate CPUs or cores can execute different portions without interfering with each other. Most computers have just one CPU, but some models have several, and multi-core processor chips are becoming the norm. There are even computers with thousands of CPUs. After reading that weird Wikipedia definition, what are you thinking?. The same stuff repeated in our ears for years.We all know the core concepts,but coming to implementation we take step lazily.There are many reasons for that. In this post we are going to understand the basic concepts of parallel processing,distributed computing and jump start to a hands-on example for implementing them.

Why parallel Processing?

Assume that you have a very popular website,it has a web Fibonacci calculator.If everything is processed sequentially the requests from users are stored in a Queue and program calculates Fibonacci no and return it to user.If website got 80000 requests and calculating sequentially causes waiting of last submitted user.His reload bar will rotate for ever.How all e-commerce websites are serving the customers with light speed spontaneously?. They use parallelization techniques.Learn how to use it,implement straight away in your programmes.

 Let’s begin the show

The parallel program is the program that is distributed among various process,they in turn given to the cores of the processor to execute. Take a look at smallest parallel computing illustration here.It is multiplying a scalar to a matrix.Multiplication is done by sequentially multiplying scalar to each element.But in parallel computing each element is divided as a unit,and workers are allocated with tasks of multiplying scalar to that element unit.This big task is distributed to 4 processes. Selection_022

Patterns for designing parallel structures

The criteria for designing a pattern for a parallel problem depends on the context of problem itself.The no of workers to be dispatched,no of cores to be used all this factors are dealt with taking main problem into consideration.But here we are going to see a universal pattern for parallel computing problems.

Pipeline concept

This concept is so simple.A task is processed in different stages.many no of workers are present in the each stage.At stage 1,workers 1 process the data,next it goes to stage 2 etc. Selection_026 It is like an assembly line shown in Discovery and NatGeo shows of car manufacturing.In that a car chassis  will be sent to the next stage in which multiple robotic hands paint body at a time,next it moves to another stage in which another set of robotic workers fix the bolts etc. Bayerische Motoren Werke AG Unveil Their Latest Mini Automobile

Analyzing best python tools for implementing parallelism

There are four ways for achieving concurrent processing in python. 1) threading and concurrent.features 2) multiprocessing and ProcessPoolExecutor 3) Parallel Python 4) Celery First two tools are available as built in libraries.Next two are external libraries that do most of the job behind the screens.threading solution to a parallel problem is not preferable since,synchronization mechanisms need to be carefully implemented.In this post we are making things simple and clear.No locks,no synchronization techniques.Just execute a program in a parallel way.

Parallel Python, a wonderful ,simple,cool library for parallel as well as distributed computing

If you installed Parallel Python,it is fine.Other wise just open your terminal and install it by downloading compressed file,extract and running files available here. After pp is successfully installed, we are going to build a great practical example to illustrate parallel processing using Parallel Python library,which is a very good tool.

What Parallel Python can do?

The most important advantage of using PP is the abstraction that this module provides. Some important features of PP are as follows: •     Automatic detection of number of processors to improve load balance •     Many processors allocated can be changed at runtime •     Load balance at runtime •     Auto-discovery resources throughout the network

Parallel Python basics

Just we need to create a Server from pp next use submit method of that instance to submit tasks for multiple processes.

import pp

#Server encapsulates and dispatches task to multiple processes
s = pp.Server()

#Server can also be started with multiple cores and other distributed systems

s = pp.Server(ncpus=4,ppservers = ("", ""))

# ncpus => no of cores to use, ppservers = Ip's_of_computers_conected_as_cluster
#There is one important function called submit for adding tasks to processes.

submit(self, func, args=(), depfuncs=(), modules=(),
callback=None, callbackargs=(), group='default',

func -> target function to be executed,
args -> arguments to be passed for func,
modules -> list of packages to be imported by func to do it's function
callback ->A function to which result of func will be returned,we can process results here,like sending computed values as response to user
#For exmaple see this submit call below.
s.submit(fibo_task, (item,), modules=('os',),

with these basics let us solve a real world problem.

Problem ( calculate greater circle distance )

I created a popular website for getting greater circle distance and I need to process 80000 requests.It means the data entered by users should be processed i.e results are computed concurrently. What is greater circle distance?. It is the distance between two locations on earth.Location is a tuple of (latitude,longitude). Calculating greater circle distance uses a formula called Haversine formula.Don’t be panic by seeing big terms.It is a function that returns distance if two locations are given as arguments.

import math

#This is a function.Don't bother it's contents 

def haversine(key,lat1, lon1, lat2, lon2):
    R = 6372.8 # Earth radius in kilometers
    dLat = math.radians(lat2 - lat1)
    dLon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = math.sin(dLat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dLon/2)**2
    c = 2* math.asin(math.sqrt(a))
    #calculating KM
    a = R * c
    return a

Think this way.haversine function takes lat1,lon1 and lat2,lan2 and return Greater circle distance.For example we have California(37.0000° N, 120.0000° W) and New Jersey(40.0000° N, 74.5000° W).If we want to find distance between California and New Jersey then use above function as distance_ca_nj = haversine(37.0000,120.0000,40.0000,74.0000) #Gives us distance between California and New jersey in KM.

How to make haversine function execute in parallel fashion

Here I am using four inputs from users,we can extend it to any no of inputs and make them run concurrently.

import os, pp

users={'california_to_newJersey' : (37.0000,120.0000,40.0000,74.0000),
'oklahoma_to_texas' : (35.5000, 98.0000,31.0000, 100.0000),
'arizona_to_kansas' : (34.0000, 112.0000,38.5000, 98.0000),
'mississippi_to_boston' : (33.0000, 90.0000,42.3581, 71.0636)} 

#This dict stores Great distance values for each key in users
result_dict = {}

Now we can modify haversine function.

def haversine(key,lat1, lon1, lat2, lon2):
    R = 6372.8 # Earth radius in kilometers
    dLat = math.radians(lat2 - lat1)
    dLon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = math.sin(dLat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dLon/2)**2
    c = 2* math.asin(math.sqrt(a))
    #calculating KM
    a = R * c
    message = "the Great Circle Distance calculated by pid %d was %d KM"%(os.getpid(), a)
    return (key, message)
Next we need to process the result returned.So let’s create a callback function that takes result returned and stores it in the result_dict
def aggregate_results(result):
    print "Computing results with PID [%d]" % os.getpid()
    result_dict[result[0]] = result[1]
 Now let us code the main thing that adds tasks to process,we will use all the functions we created above.
job_server = pp.Server(ncpus=4)

for key in users.keys(): 
    job_server.submit(haversine,(key,users[key][0],users[key][1],users[key] [2],users[key][3]), modules=('os','math'), \

Above line creates multiple processes and assign 
executing haversine function with arguments as key,expanded tuple.
Processes starts executing using all cores of your processor.
Wait for all processes complete execution before retrieving result

#Next main process starts executing
print "Main process PID [%d]" % os.getpid()
for key, value in result_dict.items():
    print "For input %d, %s" % (key, value)

See the output, It works

Why there is common pid for all processes?,actually program executed parallel under main process.So main process process id is listed for each dispatched worker.But it  executes in 1/4 th  time of general sequential program.
Total code can be found at this github link.Just save from below repository and run it.You will find it’s lightening execution speed.
any discussions please forward to

Parallel Programming with Python
By Jan Palach



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s