Tuesday, December 31, 2013

Project: Mining the Social Web 2nd Edition - Part 2

This is part 2 of my experience trying out and experimenting with the code in Mining the Social Web 2nd Edition. It covers Chapter 1 to Chapter 3 (Twitter, Facebook, LinkedIn).

A couple of thoughts so far:

1. The author is really dedicated to his book and audience, evident in his responsiveness in answering my question (#7 below).
2. The VM experience is totally awesome.
3. Each of the topics covered could be expanded into a whole book by itself. We are only scratching the surface.
4. For me, when writing a script on a new topic, finding the right library is typically half the battle. Nothing is more frustrating than going down the path of using one library only to find a better one (or worse, nothing at all). The book gives you a nice list of best-of-breed libraries for each topic, which in itself is worth every penny of the purchase price.

Here is more detail about my experience:

1. I reran the exercise from my previous post in the IPython Notebook on the VM. The top three trending topics correspond to the bombing in Russia and the NFL game that was on at the time:

set([u'AJ Green', u'Volgograd', u'Andy Dalton'])
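
A minimal sketch of how a set like the one above can be produced, assuming it came from intersecting two lists of trend names (the book's Chapter 1 compares worldwide and US trends this way). The topic names here are hypothetical placeholders, not real API output:

```python
# Hypothetical trend names; a real run would pull these from the Twitter API.
world_trends = ['Topic A', 'Topic B', 'Topic C', 'Topic D']
us_trends = ['Topic B', 'Topic D', 'Topic E']

# Set intersection keeps only the names that appear in both lists.
common_trends = set(world_trends) & set(us_trends)
print(sorted(common_trends))
```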

2.  I ran the rest of the code from the VM because I was short on time:

| Screen Name    | Count |
| justinbieber   |     6 |
| Kid_Charliej   |     2 |
| Cavillafuerte  |     2 |
| touchmestyles_ |     1 |
| aliceorr96     |     1 |
| gymleeam       |     1 |
| fienas         |     1 |
| nayely_1D      |     1 |
| angelchute     |     1 |
3. I was really glad I used virtualenv to isolate each chapter, as each of them requires different Python packages. I had no problem installing any of the packages or running the scripts that I tried on the Win8.1 platform.
4. The Facebook Graph API is a beast. It really requires more reading to fully grasp its depth and what you can do with it. Here is a partial list of my friends' top likes:

Top likes amongst friends
| Name                                   | Freq |
| Amazon.com                             |    9 |
| Python for Network Engineering         |    5 |
| George Takei                           |    4 |
| West Seattle Rolfing                   |    4 |
| Photography on Facebook                |    3 |
| Music on Facebook                      |    3 |
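
A frequency table like the one above can be built with collections.Counter. This is only a sketch under assumptions: the likes_by_friend dictionary and the top_likes helper are hypothetical names, and the sample data stands in for whatever the Graph API actually returns.

```python
from collections import Counter

# Hypothetical input: each friend's name mapped to the page names they like.
likes_by_friend = {
    'friend_1': ['Page A', 'Page B'],
    'friend_2': ['Page A'],
    'friend_3': ['Page A', 'Page C', 'Page B'],
}

def top_likes(likes_by_friend, n=10):
    """Return the n most common page names as (name, frequency) pairs."""
    counts = Counter()
    for pages in likes_by_friend.values():
        # set() ensures a friend counts at most once per page.
        counts.update(set(pages))
    return counts.most_common(n)

for name, freq in top_likes(likes_by_friend, n=2):
    print('{0:<10} {1:>4}'.format(name, freq))
```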
5. D3 looks really awesome, but there is a learning curve to it that I will need to come back to. 

6. For the first two LinkedIn scripts, I decided to rewrite the list comprehension in the second one as a more readable loop. For me, once a list comprehension goes two levels deep, I'd rather break it apart:

     1. First file to get the connection file:

from linkedin import linkedin
import json

consumer_key = ''
consumer_secret = ''
user_token = ''
user_secret = ''

return_url = ''

auth = linkedin.LinkedInDeveloperAuthentication(consumer_key, consumer_secret,
     user_token, user_secret, return_url)

app = linkedin.LinkedInApplication(auth)

# Get your own profile
#print app.get_profile()

# Get connections, with yourself as the ego (center) of the network
#print app.get_connections()

# A good idea to get connections and store them in file to save API calls
connections = app.get_connections()
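# Use "with" so the file is flushed and closed before it is read back
with open('linkedin_connections.json', 'w') as f:
     f.write(json.dumps(connections, indent=1))

# Print the saved file to confirm what was written
with open('linkedin_connections.json', 'r') as f:
     for line in f:
          print line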
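
The idea of saving API calls by storing results in a file can be taken one step further with a small cache helper. This is a sketch, not part of the book's code: load_or_fetch is a hypothetical name, and fetch is assumed to be any zero-argument function that returns JSON-serializable data. It is written so that it runs on Python 3 as well.

```python
import json
import os

def load_or_fetch(path, fetch):
    """Return cached JSON from path, calling fetch() only when no cache exists."""
    if os.path.exists(path):
        with open(path, 'r') as f:
            return json.load(f)
    data = fetch()
    with open(path, 'w') as f:
        json.dump(data, f, indent=1)
    return data
```

With the script above, the call would look like load_or_fetch('linkedin_connections.json', app.get_connections), so the API is only hit on the first run.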

     2. Second file to do the parsing: 

from prettytable import PrettyTable
import json

# Load the connections saved by the first script
with open('linkedin_connections.json', 'r') as f:
     connections = json.loads(f.read())

# PrettyTable Print
pt = PrettyTable(field_names=['Name', 'Location'])
pt.align = 'l'

# Some connections have no location, so skip those
for c in connections['values']:
     if 'location' in c:
          pt.add_row([c['firstName'] + ' ' + c['lastName'], c['location']['name']])

print pt
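
To make the readability trade-off from item 6 concrete, here is a sketch that builds the same rows both ways. The sample records are hypothetical and only mimic the shape of the saved connections data (firstName, lastName, and an optional location); PrettyTable is left out so the snippet has no dependencies.

```python
# Hypothetical sample in the same shape as the saved connections data.
connections = {
    'values': [
        {'firstName': 'Ada', 'lastName': 'Example',
         'location': {'name': 'Seattle'}},
        {'firstName': 'Bob', 'lastName': 'Sample'},  # no location key
    ]
}

# Comprehension form: filter and transform packed into one expression.
rows_comprehension = [
    [c['firstName'] + ' ' + c['lastName'], c['location']['name']]
    for c in connections['values']
    if 'location' in c
]

# Loop form: the same logic, one step per line.
rows_loop = []
for c in connections['values']:
    if 'location' in c:
        full_name = c['firstName'] + ' ' + c['lastName']
        rows_loop.append([full_name, c['location']['name']])

print(rows_loop)
```

Both forms give the same rows; the loop just leaves room for a comment or an extra step on each line.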
7. I got stuck on the CSV file export portion. It turns out the path should be ipynb/resources/<blah> instead of just resources/<blah>. More details are on the book's Facebook page: https://www.facebook.com/MiningTheSocialWeb/posts/594068610648372?comment_id=5203748&reply_comment_id=5204030&offset=0&total_comments=16&notif_t=feed_comment
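
Path problems like the one in item 7 usually come down to the working directory not being what the script expects. A sketch of one way to guard against that, assuming a hypothetical find_resource helper that tries a relative path under several candidate root directories (the default roots are my own guess based on the fix above, not something from the book):

```python
import os

def find_resource(relative_path, roots=('.', 'ipynb')):
    """Return the first existing path among root/relative_path, else None."""
    for root in roots:
        candidate = os.path.join(root, relative_path)
        if os.path.exists(candidate):
            return candidate
    return None

# Printing the working directory is often the quickest first check.
print(os.getcwd())
```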

I am enjoying the experience very much and feel like I am learning a lot. Again, the biggest difference for me is the author's dedication to his audience as well as the VM experience.

Happy coding!
