Wednesday, August 10, 2016

How to Mine Twitter in Under One Hour

There are a number of books on using Python to mine the social web. After trying out the Facebook, LinkedIn, and Twitter APIs, it is my belief that Twitter provides the friendliest workflow for anybody getting started. For the past two days I have been working on a project where mining the Twitter feed was part of the overall process. Here I want to briefly share some of the steps I took using Python. I believe the whole process shouldn't take you more than one hour.

1. Determine the Twitter module to use.

There are a number of Python Twitter modules; a simple search turns up two popular ones: python-twitter and Tweepy. Since I had the chance to listen to Elizabeth Uselton's PyCon 2016 talk a few weeks back at a local Python meetup, I decided to use the same library, which was python-twitter.

[Update 9/5/2016] Make sure you use the 'python-twitter' package, i.e. pip install python-twitter, not pip install twitter.
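Both packages install a top-level module named twitter, so it is easy to end up with the wrong one. The sketch below is only a heuristic: it assumes python-twitter is the package that exposes twitter.Api (the class used later in this post). The helper names are hypothetical and not part of either library.

```python
import importlib


def looks_like_python_twitter(module):
    """Heuristic: python-twitter exposes a callable `Api` at the top level."""
    return callable(getattr(module, "Api", None))


def check_installed_twitter_module():
    """Return a short, human-readable verdict about the installed module."""
    try:
        module = importlib.import_module("twitter")
    except ImportError:
        return "no 'twitter' module found; try: pip install python-twitter"
    if looks_like_python_twitter(module):
        return "twitter.Api is available; this looks like python-twitter"
    return "twitter.Api is missing; a different 'twitter' package may be installed"


if __name__ == "__main__":
    print(check_installed_twitter_module())
```

If the last message shows up, uninstalling the other package before running pip install python-twitter is probably the simplest fix.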

2. Register a developer account at https://dev.twitter.com/ for documentation and create an app via https://apps.twitter.com/ to get all the keys for your program.

3. Follow the instructions here to pip install the module and give it a spin. Below is an example using the Python REPL (off my Raspberry Pi 3, mind you :)).

pi@raspberrypi:~/Alexa $ python
Python 2.7.9 (default, Mar  8 2015, 00:52:26)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import twitter, pprint, json
>>> api = twitter.Api(consumer_key="<you get this from the twitter app>",
...                   consumer_secret="<you get this from the twitter app>",
...                   access_token_key="<you get this from the twitter app>",
...                   access_token_secret="<you get this from the twitter app>")
>>> print(api.VerifyCredentials())
{"created_at": "Thu Dec 18 17:35:54 +0000 2008", "description": "Network Automation Nerds", "favourites_count": 16, "followers_count": 58, "friends_count": 225, "name": "ericchou", "profile_background_color": <skip>}
>>>
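Rather than pasting the keys into the session, one option is to read them from environment variables. This is a minimal sketch: the environment variable names and the load_credentials helper are assumptions made up for illustration. Only the four keyword names come from the twitter.Api call above.

```python
import os

# Maps the twitter.Api keyword names (from the example above) to
# hypothetical environment variable names chosen for this sketch.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token_key": "TWITTER_ACCESS_TOKEN_KEY",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=None):
    """Build the keyword arguments for twitter.Api from the environment."""
    environ = os.environ if environ is None else environ
    # Fail early, and name every variable that is unset or empty.
    missing = [var for var in ENV_NAMES.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(sorted(missing)))
    return {kwarg: environ[var] for kwarg, var in ENV_NAMES.items()}
```

With python-twitter installed and the variables exported, the result can be passed straight through as twitter.Api(**load_credentials()), which keeps the secrets out of your shell history and source files.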

4. Cool, let's run a search. Since the Olympics are going on, I bet that is a hot topic:

>>> searchResult = api.GetSearch(term="Olympics", lang='en', result_type='recent', count=5, max_id='')
>>> pprint.pprint(searchResult)
[Status(ID=763393850215960577, ScreenName=Brezshun, Created=Wed Aug 10 15:17:31 +0000 2016, Text=u'RT @DragonflyJonez: Even Hitler sent Jesse Owens a nondescript, mass mailing thank you card for participating in the Olympics. FDR never ev\u2026'),
 Status(ID=763393849964294144, ScreenName=JeremyMcDoniell, Created=Wed Aug 10 15:17:31 +0000 2016, Text=u'RT @br_uk: Michael Phelps has won more men\u2019s swimming #Gold medals than all but two countries in the history of the #Olympics \U0001f64c https://t.c\u2026'),
 Status(ID=763393849930584065, ScreenName=alyajaafar, Created=Wed Aug 10 15:17:31 +0000 2016, Text=u'RT @BBAnimaIVids: Kitten Summer Olympics \U0001f63a https://t.co/EGUWnOrGsX'),
 Status(ID=763393849536372736, ScreenName=ilhamfachrul, Created=Wed Aug 10 15:17:31 +0000 2016, Text=u'RT @BBCSport: Watch live @BBCOne (UK only) and the @BBCSport website.\n\nhttps://t.co/b87rfBg0qt #RioOlympics2016  https://t.co/767BAdAmza'),
 Status(ID=763393848747888640, ScreenName=JustLandlords, Created=Wed Aug 10 15:17:31 +0000 2016, Text=u'RT @NewsLandlords: Have the London Olympics had a long-lasting effect on the property market? https://t.co/QolbtKou6a #London #property htt\u2026')]
>>>
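The pprint output is fine for a quick look, but for further processing it may help to flatten each result into a plain dictionary. The sketch below assumes Python 3 (for the %z directive) and that each status exposes id, created_at, text, and user.screen_name attributes, which matches the fields printed above. The summarize_status helper is hypothetical, and the "RT @" prefix check is only a rough guess at what counts as a retweet.

```python
from datetime import datetime

# Timestamp layout seen in the output above,
# e.g. "Wed Aug 10 15:17:31 +0000 2016".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_status(status):
    """Flatten one status-like object into a plain dictionary."""
    created = datetime.strptime(status.created_at, CREATED_AT_FORMAT)
    return {
        "id": status.id,
        "screen_name": status.user.screen_name,
        "created": created.isoformat(),
        # Rough heuristic: the retweets in the sample output start with "RT @".
        "is_retweet": status.text.startswith("RT @"),
        "text": status.text,
    }


def summarize_statuses(statuses):
    """Apply summarize_status to every item, preserving order."""
    return [summarize_status(status) for status in statuses]
```

Calling summarize_statuses(searchResult) on the list returned above should give a list of dictionaries that is easy to dump with json or load into a spreadsheet.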

5. There you have it, quick and simple. Here is the official API Doc from Twitter.

Happy Coding!

3 comments:

  1. This comment has been removed by the author.

    Replies
    1. Hi Ahmed, thanks for reading. I'm not sure why it returned an empty list; I'd imagine it would return either an error or some results. I thought maybe people were not tweeting about the Olympics anymore, but I just ran the code again as a script and it returned the results below, so I think people are still tweeting about the Olympics:

      [Status(ID=773013590680989696, ScreenName=marcio_quintal, Created=Tue Sep 06 04:22:56 +0000 2016, Text=u'RT @HistoricalPics: 56 years ago today Muhammad Ali (then Cassius Clay) won gold for USA at the Rome Olympics. https://t.co/OLjMnpKTD7'),
      Status(ID=773013581394681856, ScreenName=MyloveTomiho, Created=Tue Sep 06 04:22:54 +0000 2016, Text=u'RT @akosisahlle: Abangan! @EsguerraTommy @gt_miho on Myx Olympics ct to ms janice n ms diana\n\nHAPPY 1st ANNIVERSARY TOMIHOS https://t.co/5d\u2026'),
      Status(ID=773013576747409408, ScreenName=IceApex_, Created=Tue Sep 06 04:22:53 +0000 2016, Text=u'RT @LegendaryRoasts: Hood Olympics https://t.co/Xt1iyWh6vX'),
      Status(ID=773013574184828928, ScreenName=twtweetr, Created=Tue Sep 06 04:22:52 +0000 2016, Text=u'RT @IT_securitynews: How a Massive 540 Gb/sec DDoS Attack Failed to Spoil the Rio Olympics: On 21 August, 2016, the\u2026 https://t.co/27iRGKu3M\u2026'),
      Status(ID=773013551938232320, ScreenName=SamanthaTowe1, Created=Tue Sep 06 04:22:47 +0000 2016, Text=u"RT @DobreMarcus: I can't believe my mom went to the Olympics in 1988 for Gymnastics. It's such an honor. I'm so proud of you and I love you\u2026")]

      Did the api.VerifyCredentials() return a valid response that shows the API authentication works?

      One other thing is to make sure you are using the python-twitter package, i.e. pip install python-twitter, and not just twitter.

      Let me know if that helps.

    2. Hi Ahmed, you can also check out the latest post on using AWS Lambda for a Twitter bot, http://blog.pythonicneteng.com/2016/09/make-twitter-bot-in-python-and-aws.html. It uses a different library, Twython, to talk to the Twitter API. Happy coding! :)
