Friday, August 31, 2012

Data Analysis with Python - Installation

I wanted to start exploring data analysis with Python using the book "Python for Data Analysis" (O'Reilly book page). It is in preview release, so if you want it now you need to buy it from O'Reilly. I don't know why it is so late, but the release date looks to be set for October 22nd, 2012.
Here is the link for Amazon: Python for Data Analysis

Just to get started, the book suggests getting the all-in-one package from Enthought, then installing the pandas package from PyPI. Easy as it sounds, I found the actual steps a little bit different.

1. For Windows and Mac users, there are exe and dmg installers; you just need to pick the one matching your Python version. I use Python 2.7 (lots of packages are not updated to 3 yet).


My existing Windows path for Python 2.7 is C:\Python27 and I still want to keep it, so I chose C:\Python27EPD for the install, and all the later packages picked up the right path right away.

2. When I tried to install pandas (sudo easy_install pandas, or from packages), it gave an error saying it requires NumPy > 1.6. The all-in-one package ships a lower version, which you can check with:

import numpy
numpy.version.version  # a plain string attribute, not a function, so no parentheses

So you are better off just downloading the NumPy package yourself; I picked the latest stable release:


For Ubuntu, I used 'sudo apt-get install -U numpy' to get the latest version. 
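To see up front whether the installed NumPy is new enough, the version string can be compared against the requirement. This is only a minimal sketch: the helper names `version_tuple` and `meets_minimum` are made up for illustration, and treating "1.6" as the minimum is my reading of the error message above, not something taken from the pandas documentation.

```python
import re


def version_tuple(version_string):
    """Turn '1.6.2' or '1.7.0rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first piece that does not start with a digit
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import numpy
    # "1.6" is an assumed threshold based on the easy_install error
    print("NumPy %s, new enough: %s"
          % (numpy.__version__, meets_minimum(numpy.__version__, "1.6")))
except ImportError:
    print("NumPy is not installed")
```

Comparing tuples of integers rather than the raw strings avoids the trap where "1.10" sorts before "1.6" as text.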

3. Now time to download and install Pandas:


4. Now check the install by clicking on the PyLab icon:

5. Then try to import Pandas module and do a simple plot:

import pandas
plot(arange(10)) # note that this would work regardless of pandas since this is matplotlib.

6. You should see this pop up on the screen, and you can change the graph interactively:

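Outside the PyLab shell, the same sanity check can be sketched as a small script that tries to import each package and reports its version. The function name `installed_versions` is hypothetical, and the list of modules is just the three packages this post relies on.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if the import fails."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a placeholder
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    report = installed_versions(["numpy", "pandas", "matplotlib"])
    for name, version in sorted(report.items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Note that in a plain script, unlike in PyLab, `plot` and `arange` are not preloaded: they would have to be imported from `matplotlib.pyplot` and `numpy` respectively.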

Looking forward to learning more about data analysis with Python... Stay tuned.



