Data mining now simplified with python


Do you know pandas, a Python library for data analysis? Version 0.13 came out on January the 16th and this post describes a few new features and improvements that I think are important.

Each release has its list of bug fixes and API changes. You may read the full release notes if you want all the details, but I will just focus on a few things.

You may also be interested in one of my previous blog posts, which showed a few useful Pandas features with datasets from the Quandl website and came with an IPython Notebook for reproducing the results.

Let’s talk about some new and improved Pandas features. I assume that you have some knowledge of Pandas features and main objects such as Series and DataFrame. If not, I suggest you watch the tutorial video by Wes McKinney on the main page of the project or read 10 Minutes to Pandas in the documentation.

Refactoring

I applaud the refactoring effort: the Series type, which used to subclass ndarray, now has the same base class as DataFrame and Panel, i.e. NDFrame. This work unifies methods and behaviors for these classes. Be aware that you can hit two potential incompatibilities with versions older than 0.13. See internal refactoring for more details.

Timeseries

to_timedelta()

The new function pd.to_timedelta converts a string, scalar or array of strings to a Numpy timedelta type (np.timedelta64 in nanoseconds). It requires a Numpy version >= 1.7. You can handle an array of timedeltas and divide it by another timedelta to carry out a frequency conversion.

from datetime import timedelta
import numpy as np
import pandas as pd

# Create a Series of timedelta from two DatetimeIndex.
dr1 = pd.date_range('2013/06/23', periods=5)
dr2 = pd.date_range('2013/07/17', periods=5)
td = pd.Series(dr2) - pd.Series(dr1)

# Set some Na{N,T} values.
td[2] -= np.timedelta64(timedelta(minutes=10, seconds=7))
td[3] = np.nan
td[4] += np.timedelta64(timedelta(hours=14, minutes=33))
td
0   24 days, 00:00:00
1   24 days, 00:00:00
2   23 days, 23:49:53
3                 NaT
4   24 days, 14:33:00
dtype: timedelta64[ns]

Note the NaT type (instead of the well-known NaN). For day conversion:

td / np.timedelta64(1, 'D')
0    24.000000
1    24.000000
2    23.992975
3          NaN
4    24.606250
dtype: float64

You can also use DateOffset objects:

td + pd.offsets.Minute(10) - pd.offsets.Second(7) + pd.offsets.Milli(102)
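
The snippets above build timedeltas by subtracting dates. As a complement, here is a minimal sketch of pd.to_timedelta itself. It assumes a recent pandas release, where the function returns Timedelta and TimedeltaIndex objects (the return types in 0.13 were closer to raw Numpy types); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# A single string becomes one timedelta value.
single = pd.to_timedelta('1 days 06:05:01')

# A list of strings becomes an index of timedeltas.
several = pd.to_timedelta(['1 days', '2 days 12:00:00'])

# Dividing by a one-hour timedelta converts the values to hours.
hours = several / np.timedelta64(1, 'h')

print(single)
print(list(hours))
```

The division is the same frequency conversion trick as the day conversion shown earlier, only with hours as the unit.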

Nanosecond Time

Nanosecond time is now supported as an offset; see pd.offsets.Nano. You can pass N as the value of the freq argument of the pd.date_range function.
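
A small sketch of a nanosecond-spaced index may help here. It passes the pd.offsets.Nano object itself as freq rather than a string alias, on the assumption that the object form is the most portable across pandas versions; the date used is arbitrary.

```python
import pandas as pd

# Three timestamps spaced one nanosecond apart, using the Nano offset
# object directly as the freq argument.
idx = pd.date_range('2014-01-01', periods=3, freq=pd.offsets.Nano(1))

# The distance between two consecutive entries is a single nanosecond.
step = idx[1] - idx[0]

print(idx)
print(step)
```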

Daylight Savings

The tz_localize method can now infer a fall daylight savings transition based on the structure of the unlocalized data. This method, like the tz_convert method, is available for any DatetimeIndex, and for any Series or DataFrame with a DatetimeIndex. You can use it to localize your datasets thanks to the pytz module or convert your timeseries to a different time zone. See the related documentation about time zone handling. To use the daylight savings inference in the method tz_localize, set the infer_dst argument to True.
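
To illustrate the inference, here is a hedged sketch written against later pandas releases, where the infer_dst flag was replaced by ambiguous='infer'. The timestamps and the America/New_York zone are chosen only for the example; with 0.13 you would pass infer_dst=True instead.

```python
from datetime import timedelta

import pandas as pd

# Naive hourly timestamps across the 2011 US fall transition:
# 01:00 occurs twice, first in daylight time, then in standard time.
naive = pd.DatetimeIndex(['2011-11-06 00:00', '2011-11-06 01:00',
                          '2011-11-06 01:00', '2011-11-06 02:00'])

# ambiguous='infer' resolves the repeated hour from the order of the data.
localized = naive.tz_localize('America/New_York', ambiguous='infer')
offsets = [ts.utcoffset() for ts in localized]

# Convert the localized index to another time zone.
in_utc = localized.tz_convert('UTC')

print(localized)
print(in_utc)
```

Without the inference, the repeated 01:00 entries are ambiguous and localization raises an error rather than guessing.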

DataFrame Features

New Method isin()

The new DataFrame method isin is used for boolean indexing. The argument to this method can be another DataFrame, a Series, or a dictionary of lists of values. Comparing two DataFrames with isin is equivalent to df1 == df2. But you can also check whether values from a list occur in any column, or whether some values for a few specific columns occur in the DataFrame (i.e. using a dict instead of a list as argument):

df = pd.DataFrame({'A': [3, 4, 2, 5],
                   'Q': ['f', 'e', 'd', 'c'],
                   'X': [1.2, 3.4, -5.4, 3.0]})
   A  Q    X
0  3  f  1.2
1  4  e  3.4
2  2  d -5.4
3  5  c  3.0

and then:

df.isin(['f', 1.2, 3.0, 5, 2, 'd'])
       A      Q      X
0   True   True   True
1  False  False  False
2   True   True  False
3   True  False   True

Of course, you can use the previous result as a mask for the current DataFrame.

mask = _
df[mask.any(axis=1)]
   A  Q    X
0  3  f  1.2
2  2  d -5.4
3  5  c  3.0

When you pass a dictionary to the isin method, you can specify the values to look for under each column label.

mask = df.isin({'A': [2, 3, 5], 'Q': ['d', 'c', 'e'], 'X': [1.2, -5.4]})
df[mask]
    A    Q    X
0   3  NaN  1.2
1 NaN    e  NaN
2   2    d -5.4
3   5    c  NaN

See the related documentation for more details or different examples.

New Method str.extract

The new vectorized extract method comes from the StringMethods object, available through the str accessor on a Series. Thus, it is possible to extract some data thanks to regular expressions as follows:

s = pd.Series(['doe@umail.com', 'nobody@post.org', 'wrong.mail', 'pandas@pydata.org', ''])
# Extract usernames.
s.str.extract(r'(\w+)@\w+\.\w+')

returns:

0       doe
1    nobody
2       NaN
3    pandas
4       NaN
dtype: object

Note that the result is a Series holding the text captured by the group. You can also add more groups:

# Extract usernames and domain.
s.str.extract(r'(\w+)@(\w+\.\w+)')
        0           1
0     doe   umail.com
1  nobody    post.org
2     NaN         NaN
3  pandas  pydata.org
4     NaN         NaN

Elements that do not match return NaN. You can also use named groups, which is more useful if you want more explicit column names (NaN values are dropped in the following example):

# Extract usernames and domain with named groups.
s.str.extract(r'(?P<user>\w+)@(?P<at>\w+\.\w+)').dropna()
     user          at
0     doe   umail.com
1  nobody    post.org
3  pandas  pydata.org

Thanks to this part of the documentation, I also found out about other useful string methods such as split, strip, replace, etc. for when you handle a Series of str, for instance. Note that most of them have been available since 0.8.1. Take a look at the string handling API doc (recently added) and some basics about vectorized string methods.
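
As a quick illustration of those three methods, here is a minimal sketch on a made-up Series of addresses (the data and variable names are assumptions for the example):

```python
import pandas as pd

s = pd.Series(['  doe@umail.com ', 'nobody@post.org', ' pandas@pydata.org'])

# Drop surrounding whitespace from every element.
cleaned = s.str.strip()

# Split every element on '@'; each result is a list of parts.
parts = cleaned.str.split('@')

# Replace the '@' sign in every element.
renamed = cleaned.str.replace('@', ' at ')

print(cleaned)
print(parts)
print(renamed)
```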

Interpolation Methods

DataFrame has a new interpolate method, similar to Series. It was possible to interpolate missing data in a DataFrame before, but it did not take the dates into account if you had a timeseries index. Now, it is possible to pass a specific interpolation method to the method argument. You can use scipy interpolation functions such as slinear, quadratic, polynomial, and others. The time method is used to take your timeseries index into account.

from datetime import date
# Arbitrary timeseries
ts = pd.DatetimeIndex([date(2006,5,2), date(2006,12,23), date(2007,4,13),
                       date(2007,6,14), date(2008,8,31)])
df = pd.DataFrame(np.random.randn(5, 2), index=ts, columns=['X', 'Z'])
# Fill the DataFrame with missing values.
df.loc[df.index[[1, -1]], 'X'] = np.nan
df.loc[df.index[3], 'Z'] = np.nan
df
                   X         Z
2006-05-02  0.104836 -0.078031
2006-12-23       NaN -0.589680
2007-04-13 -1.751863  0.543744
2007-06-14  1.210980       NaN
2008-08-31       NaN  0.566205

Without any optional argument, you have:

df.interpolate()
                   X         Z
2006-05-02  0.104836 -0.078031
2006-12-23 -0.823514 -0.589680
2007-04-13 -1.751863  0.543744
2007-06-14  1.210980  0.554975
2008-08-31  1.210980  0.566205

With the time method, you obtain:

df.interpolate(method='time')
                   X         Z
2006-05-02  0.104836 -0.078031
2006-12-23 -1.156217 -0.589680
2007-04-13 -1.751863  0.543744
2007-06-14  1.210980  0.546496
2008-08-31  1.210980  0.566205

I suggest you read more examples in the missing data part of the documentation and in the scipy documentation for the interpolate module.

Misc

Convert a Series to a single-column DataFrame with its method to_frame.
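
A minimal sketch of to_frame, with made-up values; the name of the Series becomes the label of the single column:

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], name='X')

# The Series name is reused as the column label.
frame = s.to_frame()

print(frame)
```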

Misc & Experimental Features

Retrieve R Datasets

Not a killer feature but very pleasant: the possibility to load into a DataFrame any of the R datasets listed at http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

import pandas.rpy.common as com
titanic = com.load_data('Titanic')
titanic.head()
  Survived    Age     Sex Class value
0       No  Child    Male   1st   0.0
1       No  Child    Male   2nd   0.0
2       No  Child    Male   3rd  35.0
3       No  Child    Male  Crew   0.0
4       No  Child  Female   1st   0.0

This is the dataset about the survival of passengers on the Titanic. You can find many different datasets about New York air quality measurements, body temperature series of two beavers, plant growth results or the violent crime rates by US state, for instance. Very useful if you would like to show pandas to a friend, a colleague or your Grandma and you do not have a dataset with you.

And then three great experimental features.

Eval and Query Experimental Features

The eval and query methods use numexpr, which can quickly evaluate array expressions such as x - 0.5 * y. For numexpr, x and y are Numpy arrays. You can use this powerful feature in pandas to evaluate expressions over different DataFrame columns. By the way, we already talked about numexpr a few years ago in EuroScipy 09: Need for Speed.

df = pd.DataFrame(np.random.randn(10, 3), columns=['x', 'y', 'z'])
df.head()
          x         y         z
0 -0.617131  0.460250 -0.202790
1 -1.943937  0.682401 -0.335515
2  1.139353  0.461892  1.055904
3 -1.441968  0.477755  0.076249
4 -0.375609 -1.338211 -0.852466
df.eval('x + 0.5 * y - z').head()
0   -0.184217
1   -1.267222
2    0.314395
3   -1.279340
4   -0.192248
dtype: float64

As for the query method, you can select elements using a very simple query syntax.

df.query('x >= y > z')
          x         y         z
9  2.560888 -0.827737 -1.326839

msgpack Serialization

There are new reading and writing functions to serialize your data with the great and well-known msgpack library. Note that this experimental feature does not have a stable storage format. You can imagine using zmq to transfer msgpack serialized pandas objects over TCP, IPC or SSH, for instance.

Google BigQuery

A recent module, pandas.io.gbq, provides a way to load datasets into and extract them from the Google BigQuery Web service. I have not installed the requirements for this feature yet. The example in the release notes shows how you can select the average monthly temperature in the year 2000 across the USA. You can also read the related pandas documentation. Nevertheless, you will need a BigQuery account, as with other Google products.

Take Your Keyboard

Give it a try, play with some data, mangle and plot them, compute some stats, retrieve some patterns or whatever. I’m convinced that pandas will be more and more used and not only for data scientists or quantitative analysts. Open an IPython Notebook, pick up some data and let yourself be tempted by pandas.

I think I will make more use of the vectorized string methods that I found out about when writing this post. I’m glad to learn more about timeseries because I know that I’ll use these features. I’m looking forward to the experimental features such as eval/query and msgpack serialization.

You can follow me on Twitter (@jazzydag). See also Logilab (@logilab_org).

January 21, 2014 12:22 PM

January 20, 2014


Tomasz Ducin

python os.fork parent child PIDs

Ever wondered how to find out the process ID of a forked child? The following code snippet makes things pretty clear:

You need to know two facts:

  • You can always access the PID of the current process using os.getpid(), no matter if you’re in the parent or child process,
  • os.fork() returns 0 inside the child process (this is standard for Unix systems) and the child PID inside the parent.
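
The two facts above can be sketched as follows. This is a minimal illustration and not necessarily the script that produced the output discussed next: the fork_once helper and the pipe used to hand the child's PID back to the parent are assumptions made so that both views can be compared in one place. It runs on Unix-like systems only.

```python
import os


def fork_once():
    """Fork, and return (parent_pid, fork_result, child_pid_seen_by_child)."""
    parent_pid = os.getpid()
    read_end, write_end = os.pipe()

    fork_result = os.fork()
    if fork_result == 0:
        # Child: os.fork() returned 0, so ask the OS for our own PID
        # and send it to the parent through the pipe.
        os.close(read_end)
        os.write(write_end, str(os.getpid()).encode('ascii'))
        os.close(write_end)
        os._exit(0)  # leave immediately, do not run the parent's code

    # Parent: os.fork() returned the PID of the new child.
    os.close(write_end)
    child_pid_seen_by_child = int(os.read(read_end, 32).decode('ascii'))
    os.close(read_end)
    os.waitpid(fork_result, 0)  # reap the child
    return parent_pid, fork_result, child_pid_seen_by_child


parent_pid, fork_result, reported = fork_once()
print("[parent] PID: %d" % parent_pid)
print("[parent] os.fork() returned child PID: %d" % fork_result)
print("[child] os.getpid() reported: %d" % reported)
```

The value returned by os.fork() in the parent and the value of os.getpid() in the child are the same number, which is the whole point.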

Based on the above facts, we can analyse the output of the script:

[parent] starts PID: 17420
[parent] parent process have created child with PID: 17421
[child] child process can't use os.fork() PID, since it's 0
[child] but it can reevaluate os.getpid() to get it's own PID: 17421

The evaluation goes like this:

  • when executing python fork.py, the operating system creates a new process (which we’ll call the parent process in terms of forking) that has PID = 17420 (we store it inside the parent_pid variable),
  • when executing os.fork, the process is forked, which means there are now two processes. The original process (the parent) continues execution and a new process is created (the child, which is a copy of the parent process),
  • the value of parent_pid is the same for both processes, since the memory was copied from the parent to the child. But the os.fork result is different: it is 0 in the child and 17421 in the parent. Based on this difference we determine which process we are in,
  • if you want to know the child PID inside the child, you can simply call os.getpid() inside the child process code.

January 20, 2014 11:33 PM

python open interactive console

In this article I’ll show a small code snippet that simulates a breakpoint without using any IDE (Integrated Development Environment). This is similar to firebug’s / chrome developer tools’ javascript console, where you may run your custom commands (typed in realtime) while being enclosed in the breakpoint’s scope. This is very useful when dealing with big/undocumented/legacy code and you want to check the state of variables.

All this code does is copy the local/global variables, set up console autocompletion and start the interactive shell, where you, the developer, can look at the python runtime environment. The following code presents the console.py module with the copen method and the test.py file which demonstrates the console usage:
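
A minimal sketch of what such a copen function could look like, built on the standard library's code, readline and rlcompleter modules. The build_scope helper is a hypothetical name introduced here, and the real console.py in the repository may differ.

```python
import code


def build_scope(_globals, _locals):
    """Merge globals and locals into one dict; locals win on name clashes."""
    scope = dict(_globals)
    scope.update(_locals)
    return scope


def copen(_globals, _locals):
    """Open an interactive console that sees the caller's variables."""
    scope = build_scope(_globals, _locals)
    try:
        # Tab completion is optional: readline is missing on some platforms.
        import readline
        import rlcompleter
        readline.set_completer(rlcompleter.Completer(scope).complete)
        readline.parse_and_bind("tab: complete")
    except ImportError:
        pass
    # Blocks until the user leaves the console (ctrl+D), then returns.
    code.InteractiveConsole(scope).interact()


# Typical call site, placed wherever a "breakpoint" is wanted:
#     copen(globals(), locals())
merged = build_scope({'a': 1, 'b': 2}, {'b': 3})
print(merged)
```

Because the console works on a copy of the scope, names rebound inside it are not written back to the running script.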

Fetch the repository and run the test.py file:

git clone git@github.com:6882621.git py_console && cd py_console && python test.py

Type dir() to check the current scope content and see example_list and example_tuple. After closing the console, the script will continue where it stopped (see the print statement):

remote: Counting objects: 14, done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 14 (delta 3), reused 8 (delta 2)
Receiving objects: 100% (14/14), done.
Resolving deltas: 100% (3/3), done.
Python 2.7.2+ (default, Jul 20 2012, 22:12:53) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> dir()
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'console', 'example_list', 'example_tuple']
>>> example_list
[1, 2, 3]
>>> example_tuple
('abc', 'def')
>>> # hit ctrl+D to quit
>>> 
this will be continued
$

January 20, 2014 11:31 PM


Richard Jones

Python 3.3 and virtualenv

We’re kicking off some new projects using Python 3 (yay!) but had some issues getting virtualenvs working. Which is kinda ironic given that Python 3.3 included virtualenv in it, as pyvenv. Unfortunately, pyvenv isn’t quite the same thing as virtualenv, and in particular it doesn’t install/include pip and setuptools. There are also some additional issues introduced under Ubuntu.

First, you’ll need to obtain Python 3.3. Some of the methods you could use will work and some are known to produce a non-viable environment. In particular:

  • OS X: get it from homebrew (“brew install python3”). I’ve not tried other avenues, but this works and is the easiest approach in my opinion.
  • Ubuntu: get it from source, building like so:
    sudo apt-get install build-essential libsqlite-dev sqlite3 bzip2 libbz2-dev
    wget http://python.org/ftp/python/3.3.3/Python-3.3.3.tar.bz2
    tar jxf ./Python-3.3.3.tar.bz2
    cd ./Python-3.3.3
    ./configure --prefix=/opt/python3.3
    make && sudo make install

    Do not attempt to use any currently-available pre-built packages (eg. from a PPA) as they will create broken virtualenvs. See this discussion for some enlightenment, but note the lack of a reasonable solution.

  • Windows: no idea, sorry.

Now that you’ve got a Python 3.3 installation, you can create your virtual environment. You do this with this command combination:

pyvenv-3.3 
. /bin/activate
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
python3.3 ez_setup.py
wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
python3.3 get-pip.py

Now you should have a viable, working Python 3.3 virtual environment.

Fortunately Python 3.4 is going to improve on this by installing pip alongside python.

Also, pip 1.5.1’s “get-pip.py” will let you skip that extra setuptools install above when it’s out (real soon).
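
For reference, the standard library's venv module (the machinery behind pyvenv) can also be driven from Python. The sketch below is only an illustration: make_env and the temporary directory are assumptions, and with_pip is left off so the example stays fast and offline. On Python 3.4 and later, passing with_pip=True asks venv to bootstrap pip into the new environment.

```python
import os
import tempfile
import venv


def make_env(env_dir, with_pip=False):
    """Create a virtual environment and return the path of its config file."""
    venv.create(env_dir, with_pip=with_pip)
    return os.path.join(env_dir, 'pyvenv.cfg')


# Build a throwaway environment in a temporary directory.
env_dir = os.path.join(tempfile.mkdtemp(), 'py3env')
cfg_path = make_env(env_dir)

with open(cfg_path) as cfg:
    print(cfg.read())
```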

January 20, 2014 10:33 PM


François Dion

Python tip[6]

Tip #6

Today’s tip is in response to a great question on a local Linux user group:

python -m cProfile myscript.py

What it does: It’ll give you a breakdown per function of how much time each call takes to execute. Normally, profiling is best done with something like dtrace, to minimize the impact on the run time, but the original question was about figuring out the time for each operation in a python script running on the Raspberry Pi (no dtrace…).

Assuming the following script (we’ll use sleep to simulate different runtimes, and not call the same function either, else each would be collapsed under one line on the report):

from time import sleep

def x():
    sleep(4)

def y():
    sleep(5)

def z():
    sleep(2)

x()
y()
z()
print("outta here")

we get:

python -m cProfile script.py
outta here
         8 function calls in 11.009 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   11.009   11.009 t.py:1(<module>)
        1    0.000    0.000    4.002    4.002 t.py:3(x)
        1    0.000    0.000    5.005    5.005 t.py:6(y)
        1    0.000    0.000    2.002    2.002 t.py:9(z)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        3   11.009    3.670   11.009    3.670 {time.sleep}
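
The same kind of report can also be produced from inside a script, which makes it possible to sort by cumulative time instead of by name. A minimal sketch using cProfile.Profile and pstats from the standard library; the sleep durations are shortened and the profile_calls helper is an illustrative name, not part of the original tip.

```python
import cProfile
import io
import pstats
from time import sleep


def x():
    sleep(0.02)


def y():
    sleep(0.03)


def profile_calls():
    """Profile x() and y() and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    x()
    y()
    profiler.disable()

    # Write the report to a string buffer, slowest cumulative time first.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats()
    return stream.getvalue()


report = profile_calls()
print(report)
```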

François
@f_dion

January 20, 2014 09:40 PM


Mike C. Fletcher

Voice Dictation Privilege

Apparently I’m blessed with a Voice Dictation friendly accent and tone. I can pretty much use most Voice Dictation systems, including the ones on Android, Dragon Naturally Speaking, and PocketSphinx. The results aren’t mind-blowing-ly good, but if it’s quiet and I speak clearly and train anything that gets missed, I get reasonable results.

Watching a child play with both the Android and PocketSphinx recognizers this weekend made me realize just how much I’m privileged.  His accent is pretty much identical to mine, but the recognizers just don’t seem to pick up his (much higher) voice anywhere near as well using the same setup. Of the dozens of phrases he tried, there were only 2 or 3 that were close (off by a word or two), and the vast bulk were just laughable, maybe a single syllable or a word would match, but the rest would just be gibberish.  Possibly his (quiet) voice just cut below the word/noise threshold? Will need to figure that out before I can make any toys that actually use the tool.

Of course, being a child, he thought it was hilarious how bad the results were when using PocketSphinx (without any training, mind you).  I’ll likely code up a toy that does the recognition and then reads it back using festival (he loves the “type and have festival read it” toy) so he can play with it more (and see the random words, ’cause hey, random words is good words). The results with Google/Android, however, were a deadly serious matter of not getting to the pantomime zebra video he wanted to watch.

January 20, 2014 08:38 PM


Europython

Spread the Word – EuroPython 2014 is the Summer Highlight

EuroPython 2014 will be in Berlin, Germany starting July 21.

Let everybody know: tweet, blog and talk about it.

January 20, 2014 06:41 PM


Stefan Scherfke

SimPy: Environments

This is the second in a series of guides that describe how SimPy works and how to use it best. This time I’ll discuss environments.

A simulation environment manages the simulation time as well as the scheduling and processing of events. It also provides means to step through or execute the simulation.

The base class for all environments is BaseEnvironment. “Normal” simulations usually use its subclass Environment. For real-time simulations, SimPy provides a RealtimeEnvironment (more on that in another guide).

Simulation control

SimPy is very flexible in terms of simulation execution. You can run your simulation until there are no more events, until a certain simulation time is reached, or until a certain event is triggered. You can also step through the simulation event by event. Furthermore, you can mix these things as you like.

For example, you could run your simulation until an interesting event occurs. You could then step through the simulation event by event for a while, and finally run the simulation until there are no more events left and your processes have all terminated.

The most important method here is Environment.run():

  • If you call it without any argument (env.run()), it steps through the simulation until there are no more events left. WARNING: If your processes run forever (while True: yield env.timeout(1)), this method will never terminate (unless you kill your script by, e.g., pressing Ctrl-C).
  • In most cases it is more advisable to stop your simulation when it reaches a certain simulation time. Therefore, you can pass the desired time via the until parameter, e.g.: env.run(until=10). The simulation will then stop when the internal clock reaches 10 but will not process any events scheduled for time 10. This is similar to a new environment where the clock is 0 but (obviously) no events have yet been processed. If you want to integrate your simulation in a GUI and want to draw a progress bar, you can repeatedly call this function with increasing until values and update your progress bar after each call:
    for i in range(100):
        env.run(until=i)
        progressbar.update(i)
  • Instead of passing a number to run(), you can also pass any event to it. run() will then return when the event has been processed. Assuming that the current time is 0, env.run(until=env.timeout(5)) is equivalent to env.run(until=5). You can also pass other types of events (remember that a Process is an event, too):
    >>> import simpy
    >>>
    >>> def my_proc(env):
    ...     yield env.timeout(1)
    ...     return 'Monty Python’s Flying Circus'
    >>>
    >>> env = simpy.Environment()
    >>> proc = env.process(my_proc(env))
    >>> env.run(until=proc)
    'Monty Python’s Flying Circus'

To step through the simulation event by event, the environment offers peek() and step().

peek() returns the time of the next scheduled event, or infinity (float('inf')) if no more events are scheduled.

step() processes the next scheduled event. It raises an EmptySchedule exception if no event is available.

In a typical use case, you use these methods in a loop like:

until = 10
while env.peek() < until:
   env.step()
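
To make the peek()/step() contract concrete without SimPy installed, here is a toy scheduler built on heapq. ToyEnvironment, its schedule method and its log attribute are hypothetical names for this sketch; this is not SimPy's implementation, only an illustration of the behaviour described above.

```python
import heapq
from itertools import count


class EmptySchedule(Exception):
    """Raised by step() when no event is left."""


class ToyEnvironment:
    def __init__(self, initial_time=0):
        self.now = initial_time
        self._queue = []      # heap of (time, insertion order, name)
        self._ids = count()   # tie-breaker for events at the same time
        self.log = []         # (time, name) of every processed event

    def schedule(self, delay, name):
        """Schedule a named event `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), name))

    def peek(self):
        """Time of the next event, or infinity if nothing is scheduled."""
        return self._queue[0][0] if self._queue else float('inf')

    def step(self):
        """Process the next event and advance the clock to its time."""
        if not self._queue:
            raise EmptySchedule()
        self.now, _, name = heapq.heappop(self._queue)
        self.log.append((self.now, name))


env = ToyEnvironment()
for delay in (3, 7, 12):
    env.schedule(delay, 'event@%d' % delay)

# Same loop as above: process everything scheduled before time 10.
until = 10
while env.peek() < until:
    env.step()

print(env.now, env.log)
```

The event at time 12 stays in the queue, so a later loop with a larger until value would pick it up.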

State access

The environment allows you to get the current simulation time via the Environment.now property. The simulation time is a number without unit and is increased via Timeout events.

By default, now starts at 0, but you can pass an initial_time to the Environment to use something else.

NOTE: Although the simulation time is technically unitless, you can pretend that it is, for example, in seconds and use it like a timestamp returned by time.time() to calculate a date or the day of the week.

The property Environment.active_process is comparable to os.getpid() and is either None or pointing at the currently active Process. A process is active when its process function is being executed. It becomes inactive (or suspended) when it yields an event.

Thus, it only makes sense to access this property from within a process function or a function that is called by your process function:

>>> def subfunc(env):
...     print(env.active_process)  # will print "p1"
>>>
>>> def my_proc(env):
...     while True:
...         print(env.active_process)  # will print "p1"
...         subfunc(env)
...         yield env.timeout(1)
>>>
>>> env = simpy.Environment()
>>> p1 = env.process(my_proc(env))
>>> env.active_process  # None
>>> env.step()
<Process(my_proc) object at 0x...>
<Process(my_proc) object at 0x...>
>>> env.active_process  # None

An exemplary use case for this is the resource system: If a process function calls request() to request a resource, the resource determines the requesting process via env.active_process. Take a look at the code to see how we do this :-).

Event creation

To create events, you normally have to import simpy.events, instantiate the event class and pass a reference to the environment to it. To reduce the amount of typing, the Environment provides some shortcuts for event creation. For example, Environment.event() is equivalent to simpy.events.Event(env).

Other shortcuts are:

More details on what the events do can be found in the guide to events (not yet written :-)).

Miscellaneous

Since Python 3.3, a generator function can have a return value:

def my_proc(env):
    yield env.timeout(1)
    return 42

In SimPy, this can be used to provide return values for processes that can be used by other processes:

def other_proc(env):
    ret_val = yield env.process(my_proc(env))
    assert ret_val == 42

Internally, Python passes the return value as parameter to the StopIteration exception that it raises when a generator is exhausted. So in Python 2.7 and 3.2 you could replace the return 42 with a raise StopIteration(42) to achieve the same result.
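
This mechanism can be observed in plain Python 3, without SimPy. The sketch below uses a hypothetical generator, my_gen, and reads the return value from the StopIteration exception:

```python
def my_gen():
    yield 1
    return 42


gen = my_gen()
first = next(gen)  # runs the generator up to the yield

try:
    next(gen)      # resumes after the yield and hits the return
except StopIteration as stop:
    # The return value travels on the exception.
    ret_val = stop.value

print(first, ret_val)
```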

To keep your code more readable, the environment provides the method exit() to do exactly this:

def my_proc(env):
    yield env.timeout(1)
    env.exit(42)  # Py2 equivalent to "return 42"

You can find the complete guide on Read the Docs. The next one will be about events and the event types provided by SimPy.

January 20, 2014 04:33 PM


Nathan Lemoine

ggplot2 in Python: A major barrier broken

I have been working with Python recently and I have to say, I love it. There’s a learning curve, of course, which has been frustrating. However, once I got comfortable with it (and continue to do so), I found that …

January 20, 2014 03:00 PM


Python Piedmont Triad User Group

PYPTUG Meeting – January 27th

PYthon Piedmont Triad User Group meeting

Come join PYPTUG at our next meeting (January 27th 2014) to learn more about the Python programming language, modules and tools. Python is the perfect language to learn if you’ve never programmed before, and at the other end, it is also the perfect tool that no expert would do without.

What

Meeting will start at 5:30pm.

We will open on an Intro to PYPTUG and on how to get started with Python, PYPTUG activities and members projects, then on to News from the community. After that, on to our main talk.

Main talk: Get to know Django

by Brandon Taylor

Brandon has been a professional web developer for about 16 years. Currently an Enterprise UX Architect at Inmar and owner of bTaylor Web, he has also worked at Dell, The Texas Tribune and Razorfish.

Abstract:

  • What is Django? Similarities and differences to other frameworks.
  • Overview of Models, Forms, Urls, Views and Templates
  • A quick “To do” app writing actual code.
  • Django admin

With plenty of room for questions.

Secondary talk: Analytics and Visualization

by Francois Dion

Francois is a Python developer at Inmar and owner of Dion Research.

Abstract:
You know the usual suspects: SAS, R, Matlab. We’ll see how Python is transforming that space. We’ll review training material. We’ll talk pandas (no, not the cute animals) and integration, and finally, we’ll draw pretty pictures (no pandas harmed in the process).

Lightning talks!

After the talks, we will have some time for extemporaneous “lightning talks” of 5-10 minute duration. If you’d like to do one, some suggestions of talks were provided here, if you are looking for inspiration. Or talk about a project you are working on.

When

Monday, January 27th 2014
Meeting starts at 5:30PM

Where

We continue to have the meetings at Wake Forest University, close to Polo Rd and University Parkway:

Wake Forest University, Winston-Salem, NC 27109

 Map this

See also this campus map (PDF) and also the Parking Map (PDF) (Manchester hall is #20A on the parking map)

And speaking of parking:  Parking after 5pm is on a first-come, first-serve basis.  The official parking policy is:

Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit.

Mailing List

Don’t forget to sign up to our user group mailing list:

https://groups.google.com/d/forum/pyptug?hl=en

It is the only step required to become a PYPTUG member.

Meetup Group

In order to get a feel for how much pizza we’ll need, we ask that you register your attendance to this meeting on meetup:

http://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG

January 20, 2014 02:36 PM


Mike Driscoll

Python 201: Properties

Python has a neat little concept called a property that can do several useful things. In this article, we will be looking into how to do the following:

  • Convert class methods into read-only attributes
  • Reimplement setters and getters into an attribute

In this article, you will learn how to use the builtin class property in several different ways. Hopefully by the end of the article, you will see how useful it is.

Getting Started

One of the simplest ways to use a property is to use it as a decorator of a method. This allows you to turn a class method into a class attribute. I find this useful when I need to do some kind of combination of values. Others have found it useful for writing conversion methods that they want to have access to as methods. Let’s take a look at a simple example:

########################################################################
class Person(object):
    """"""

    #----------------------------------------------------------------------
    def __init__(self, first_name, last_name):
        """Constructor"""
        self.first_name = first_name
        self.last_name = last_name

    #----------------------------------------------------------------------
    @property
    def full_name(self):
        """
        Return the full name
        """
        return "%s %s" % (self.first_name, self.last_name)

In the code above, we create two instance attributes: self.first_name and self.last_name. Next we create a full_name method that has a @property decorator attached to it. This allows us to do the following in an interpreter session:

>>> person = Person("Mike", "Driscoll")
>>> person.full_name
'Mike Driscoll'
>>> person.first_name
'Mike'
>>> person.full_name = "Jackalope"
Traceback (most recent call last):
  File "<string>", line 1, in <fragment>
AttributeError: can't set attribute

As you can see, because we turned the method into a property, we can access it using normal dot notation. However, if we try to set the property to something different, we will cause an AttributeError to be raised. The only way to change the full_name property is to do so indirectly:

>>> person.first_name = "Dan"
>>> person.full_name
'Dan Driscoll'

This is kind of limiting, so let’s look at another example where we can make a property that does allow us to set it.

Replacing Setters and Getters with Python property

Let’s pretend that we have some legacy code that someone wrote who didn’t understand Python very well. If you’re like me, you’ve already seen this kind of code before:

from decimal import Decimal

########################################################################
class Fees(object):
    """"""

    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        self._fee = None

    #----------------------------------------------------------------------
    def get_fee(self):
        """
        Return the current fee
        """
        return self._fee

    #----------------------------------------------------------------------
    def set_fee(self, value):
        """
        Set the fee
        """
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value

To use this class, we have to use the setters and getters that are defined:

>>> f = Fees()
>>> f.set_fee("1")
>>> f.get_fee()
Decimal('1')

If you want to add the normal dot notation access of attributes to this code without breaking all the applications that depend on this piece of code, you can change it very simply by adding a property:

from decimal import Decimal

########################################################################
class Fees(object):
    """"""

    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        self._fee = None

    #----------------------------------------------------------------------
    def get_fee(self):
        """
        Return the current fee
        """
        return self._fee

    #----------------------------------------------------------------------
    def set_fee(self, value):
        """
        Set the fee
        """
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value

    fee = property(get_fee, set_fee)

We added one line to the end of this code. Now we can do stuff like this:

>>> f = Fees()
>>> f.set_fee("1")
>>> f.fee
Decimal('1')
>>> f.fee = "2"
>>> f.get_fee()
Decimal('2')

As you can see, when we use property in this manner, it allows the fee property to set and get the value itself without breaking the legacy code. Let’s rewrite this code using the property decorator and see if we can get it to allow setting.

from decimal import Decimal

########################################################################
class Fees(object):
    """"""

    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        self._fee = None

    #----------------------------------------------------------------------
    @property
    def fee(self):
        """
        The fee property - the getter
        """
        return self._fee

    #----------------------------------------------------------------------
    @fee.setter
    def fee(self, value):
        """
        The setter of the fee property
        """
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value

#----------------------------------------------------------------------
if __name__ == "__main__":
    f = Fees()

The code above demonstrates how to create a “setter” for the fee property. You can do this by decorating a second method that is also called fee with a decorator called @fee.setter. The setter is invoked when you do something like this:

>>> f = Fees()
>>> f.fee = "1"
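To see the setter and getter working together, here is a minimal sketch that repeats the Fees class from above in condensed form and reads the value back after each assignment. The last assignment, with an int, is my own addition: because the setter only handles str and Decimal, a value of any other type is silently ignored.

```python
from decimal import Decimal


class Fees(object):
    """Condensed copy of the Fees class shown above."""

    def __init__(self):
        self._fee = None

    @property
    def fee(self):
        """The fee property - the getter"""
        return self._fee

    @fee.setter
    def fee(self, value):
        """The setter of the fee property: accepts a str or a Decimal"""
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value


f = Fees()

f.fee = "1"              # the setter converts the string to a Decimal
first = f.fee            # Decimal('1')

f.fee = Decimal("2.5")   # a Decimal is stored as-is
second = f.fee           # Decimal('2.5')

f.fee = 3                # an int matches neither branch, so nothing changes
third = f.fee            # still Decimal('2.5')
```

If silently ignoring unsupported types is not what you want, the setter is the natural place to raise an exception instead.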

If you look at the signature for property, it has fget, fset, fdel and doc as “arguments”. You can create another decorated method using the same name to correspond to a delete function using @fee.deleter if you want to catch the del command against the attribute.
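Here is a small sketch of what a deleter could look like, shown both with the decorator and with the plain property(fget, fset, fdel, doc) call. Resetting the fee to None on deletion is an assumption of this example, as are the LegacyFees class name and the del_fee method; the original code does not define a deleter.

```python
from decimal import Decimal


class Fees(object):
    """Decorator version with a getter, a setter and a deleter."""

    def __init__(self):
        self._fee = None

    @property
    def fee(self):
        """The fee property - the getter"""
        return self._fee

    @fee.setter
    def fee(self, value):
        """The setter of the fee property"""
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value

    @fee.deleter
    def fee(self):
        """Runs on ``del f.fee``; resetting to None is this sketch's choice"""
        self._fee = None


class LegacyFees(object):
    """Same behavior, spelled with explicit methods and property()."""

    def __init__(self):
        self._fee = None

    def get_fee(self):
        return self._fee

    def set_fee(self, value):
        if isinstance(value, str):
            self._fee = Decimal(value)
        elif isinstance(value, Decimal):
            self._fee = value

    def del_fee(self):
        self._fee = None

    # fget, fset, fdel and doc, in that order
    fee = property(get_fee, set_fee, del_fee, "The fee property")


f = Fees()
f.fee = "1"
del f.fee                 # invokes the method decorated with @fee.deleter
after_delete = f.fee      # None

legacy = LegacyFees()
legacy.fee = "2"
del legacy.fee            # invokes del_fee
legacy_after_delete = legacy.get_fee()  # None
```

In the decorator version the docstring of the getter becomes the docstring of the property, which is what the doc argument sets explicitly in the second version.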

Wrapping Up

Now you know how to use Python properties in your own classes. Hopefully you can find even more useful ways to use them in your own code.

January 20, 2014 02:17 PM


PyTennessee

PyTN Profiles: T. Scot Clausing and Pyrson

Speaker Profile: T. Scot Clausing (@tsclausing)

Scot recently worked for Digital Reasoning and currently resides at Emma – two great Python employers in Nashville, TN. Projects have ranged from Django web and API development to log analysis with Pandas and a personal pursuit to randomly generate Shakespeare’s complete works.

On Sunday, February 23rd at 9:00 AM, Scot will be presenting on PyCharm in 60 Minutes.

JetBrains has released a Community Edition (free) version of PyCharm! This 60 minute tutorial will be a fast-paced, worthwhile investment in your PyCharm productivity – or it may just help you decide if PyCharm is right for you. We’ll cover each of the following topics (with plenty of take-home material to continue playing, I mean working).

  • Editing
  • Debugging
  • Refactoring
  • Testing

Sponsor Profile: Pyrson (@pyrsonwho)
