Visualize Tennis World with Plotly, Docker and Pandas

Namaste everyone. Today I am going to do a small experiment on tennis history. I asked myself, “How can I learn the facts of tennis without asking others? What if I generate them myself?” So I tried to visualize which countries have produced the largest number of tennis players. But I don’t want to jump straight to the solution here. Instead, we will first discuss a few things that are useful for building a universal visualization lab. I also want to use this article to introduce Plotly, a plotting library for Python.

What I finally visualized in the experiment

I wanted to find out which countries have the largest number of players in professional ATP tennis.


Western countries top the list of countries producing tennis players in ATP history.

We can solve many other queries like:

“How well do players perform at different ages?”

“Which country produces more quality players?”

and more. But here I am going to show you how to build a visualization like the one above.

To download the IPython notebook, visit this link.

Building a Python data visualization lab in Docker

Folks, you may be wondering why I brought Docker into the picture. I am discussing Docker because it is an advantage for a data analyst or a developer to isolate this work from everything else on the machine. Without it, I would need to write 100 articles showing the setup procedure for 100 operating systems. Docker instead lets us create an identical container on whatever operating system we are working with. I will now show how to build a complete scientific Python stack from scratch in a Docker container. You can store it as an image, which you can also push to the cloud via Docker Hub. So let us begin.

I hope you know something about Docker. If not, just read my previous article, “Docker up and running”.

Step 1

$ docker run -i -t -p 8000:8000 -p 8001:8001 -v /home/naren/pylab:/home/pylab ubuntu:14.04

This creates an Ubuntu 14.04 container with two ports open: 8000 and 8001. We can use these ports later to forward the IPython notebook to the host browser. It also mounts the /home/naren/pylab folder on my host to /home/pylab in the container. When you run this, you will automatically enter the bash shell of the container.

Step 2

Now install required packages as below.

root@ffrt76yu:/# apt-get update && apt-get upgrade
root@ffrt76yu:/# apt-get install build-essential
root@ffrt76yu:/# apt-get install python python-pip python-dev
root@ffrt76yu:/# pip install pandas ipython jupyter plotly

That’s it. Pandas will install NumPy as a dependency. We are now ready with our development environment for visualizing anything. We can launch an IPython notebook using this command.

$ ipython notebook --ip= --port=8000

So now we have an IPython notebook running on port 8000 of our local machine. Now fire up your browser, open that port, and you will find the notebook software running there. Select a new “python 27” project from the top-right menu.
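If the browser shows nothing, it can help to check whether the forwarded port is reachable at all. The following is only a minimal sketch: the helper name port_is_open and the use of localhost are my assumptions, while 8000 is the port used in this article.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Nothing is listening, or the connection timed out
        return False
    conn.close()
    return True
```

Calling port_is_open("localhost", 8000) on the host should return True once the notebook is up and the port is forwarded. A False result usually points at the -p option or the notebook's --ip setting rather than at the notebook itself.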

If you don’t want all the pain, just pull my plotting environment from Docker Hub.

$ docker run -i -t -p 8000:8000 -p 8001:8001 -v /home/naren/pylab:/home/pylab narenarya/plotlab

Beginning of the visualization

Plotly is a library which allows us to create complex graphs and charts using NumPy and pandas. We can load a dataset into a dataframe using pandas. Then we will plot the cleaned data using Plotly. Full documentation is available on the Plotly website.

For my work I used Jeff Sackmann’s ATP tennis dataset from GitHub.

Extract all the dataset files to your pylab folder so that they are visible to your notebook. Here we are interested in atp_players.csv. We first clean the data to find out how many players belong to each country, and then map the counts on a scatter plot. The code looks like this.

from random import shuffle
import colorsys
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import Figure, Layout, Scatter

# Enable offline plotting inside the notebook
init_notebook_mode()

# Load players into players dataframe
players = pd.read_csv('atp_players.csv')

# Find the 20 countries with the most players
countries = players.groupby(['Country']).size()
selected_countries = countries.sort_values(ascending=False)[:20]

# Generate 20 colors, one for each country
N = 20
HSV_tuples = [(x*1.0/N, 0.5, 0.5) for x in range(N)]
RGB_tuples = [colorsys.hsv_to_rgb(*x) for x in HSV_tuples]

# Convert to 'rgb(r,g,b)' strings for plotly and shuffle the order
# (plot_colors is an assumed reconstruction of the missing step)
plot_colors = ['rgb(%d,%d,%d)' % tuple(int(c * 255) for c in rgb)
               for rgb in RGB_tuples]
shuffle(plot_colors)

# Plotting code. iplot needs data and a layout,
# so we prepare the data and then the layout. Here the data is a scatter plot.
trace0 = Scatter(
    x = list(selected_countries.index),
    y = list(selected_countries.values),
    mode = 'markers',
    marker = {'color' : plot_colors, 'size' : [30] * N}
)

# Data can be a list of plot types. You can have more than one scatter plot on a figure
data = [trace0]

# layout has properties like x-axis label, y-axis label, background-color etc
layout = Layout(
    xaxis = {'title':"Country"}, # x-axis label
    yaxis = {'title':" No of ATP players produced"}, # y-axis label
    height=600, # height of plot
    plot_bgcolor='rgb(233,233,233)', # background color of plot layout
)

# Build figure from data, layout and plot it.
fig = Figure(data=data, layout=layout)
iplot(fig)

There is nothing fancy in the code. We just did the following things:

  • Loaded the ATP players dataset into a pandas DataFrame
  • Created a set of RGB color values in random order, so that each country gets a different color
  • Created a Scatter plot in markers mode
  • Created a layout with axis details
  • Plotted the data and layout using the iplot method of the plotly library
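The color step can be tried on its own, without plotly or the dataset. This is only a sketch of one way to do it: the function name make_palette and its saturation and value defaults are my assumptions, chosen to match the 0.5 values used in the listing above.

```python
import colorsys
from random import shuffle

def make_palette(n, saturation=0.5, value=0.5):
    """Return n 'rgb(r,g,b)' strings with evenly spaced hues, in shuffled order."""
    colors = []
    for i in range(n):
        # Hue runs over [0, 1); hsv_to_rgb returns floats in the 0-1 range
        r, g, b = colorsys.hsv_to_rgb(i * 1.0 / n, saturation, value)
        colors.append('rgb(%d,%d,%d)' % (int(r * 255), int(g * 255), int(b * 255)))
    # Shuffle so that neighbouring countries do not get neighbouring hues
    shuffle(colors)
    return colors

palette = make_palette(20)
print(len(palette))  # prints 20
```

Spacing the hues evenly keeps the colors easy to tell apart, and the shuffle only changes which country gets which color.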

When I run this code in the IPython notebook (Shift + Enter), I see the scatter plot shown at the beginning of the article.

For full documentation on all kinds of plots, visit this link.

This is only one visualization from the dataset. You can draw many more analytics from the datasets provided in the git repo. One obvious advantage here is that you are doing the entire thing in a Docker container, so it is faster and easier to recover when an environment breaks. You can also commit your container to a Docker image.
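Before pointing a new query at the real files, the counting step can be checked on a tiny made-up table. This is a sketch under stated assumptions: the sample rows are invented, the helper name top_countries is mine, and the 'Name' and 'Country' column names simply follow the listing above rather than the actual CSV header.

```python
import io
import pandas as pd

# Made-up sample rows standing in for atp_players.csv
sample_csv = u"""Name,Country
Player A,USA
Player B,USA
Player C,ESP
Player D,USA
Player E,ESP
Player F,FRA
"""

def top_countries(frame, n):
    """Count players per country and return the n largest counts."""
    counts = frame.groupby('Country').size()
    return counts.sort_values(ascending=False)[:n]

players = pd.read_csv(io.StringIO(sample_csv))
top2 = top_countries(players, 2)
print(top2)
```

The same groupby, size and sort pattern works for any other column you want to rank by.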

To download my IPython notebook, visit this link.

My email address is: . Thanks to all.
