Saturday, September 1, 2012

[Book Review] Regular Expressions Cookbook, 2nd Edition

Book review for Regular Expressions Cookbook, by Jan Goyvaerts and Steven Levithan, ISBN 1449319432.

We work with text files a lot in scripts, and O'Reilly cookbooks are awesome. They are quick references when you need something fast. This book is full of good examples for when you need to, say, validate an IPv4 address in a bunch of text, or find a particular BGP neighbor / IP combination.

Here is a quick example I put together from Recipe 3.5 to search for an IP in a list of IPs:

##### Script #####

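# Modified from "Regular Expressions Cookbook, 2nd Edition", Recipe 3.5
# Written for Python 2; on Python 3, use input() instead of raw_input()

import re

pattern = raw_input("Type in Pattern to Search: ")
f = raw_input("Type the file to search: ")

# Read all lines up front and close the file when done
with open(f, 'r') as infile:
    subject = infile.readlines()

reobj = re.compile(pattern)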

print('... Start ...')
for line in subject:
    if reobj.search(line):
        print("Matched: " + pattern + " in line " + line.strip())

print('... Done ...')
#####

I used the following script to generate TestIP.txt, a file that contains every IP in 192.168.1.0/24 and 10.0.0.0/24:

#####
#!/usr/bin/env python

# The with-block closes the file automatically
with open('TestIP.txt', 'w') as f:
    # writes all the IPs in 192.168.1.0/24
    for i in range(256):
        f.write('192.168.1.' + str(i) + '\n')

    # writes all the IPs in 10.0.0.0/24
    for i in range(256):
        f.write('10.0.0.' + str(i) + '\n')
#####

##### TestIP.txt #####
.....
192.168.1.0
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
192.168.1.6
<blah>
192.168.1.252
192.168.1.253
192.168.1.254
192.168.1.255
10.0.0.0
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
10.0.0.5
<blah>
10.0.0.252
10.0.0.253
10.0.0.254
10.0.0.255

##### Here is one usage ###
>>> 
Type in Pattern to Search: 10.0.0.5
Type the file to search: TestIP.txt
... Start ...
Matched: 10.0.0.5 in line 10.0.0.5
Matched: 10.0.0.5 in line 10.0.0.50
Matched: 10.0.0.5 in line 10.0.0.51
Matched: 10.0.0.5 in line 10.0.0.52
Matched: 10.0.0.5 in line 10.0.0.53
Matched: 10.0.0.5 in line 10.0.0.54
Matched: 10.0.0.5 in line 10.0.0.55
Matched: 10.0.0.5 in line 10.0.0.56
Matched: 10.0.0.5 in line 10.0.0.57
Matched: 10.0.0.5 in line 10.0.0.58
Matched: 10.0.0.5 in line 10.0.0.59
... Done ...
>>> 

##### You can also use metacharacters to narrow down your search ($ anchors the match at the end of the line)
>>> 
Type in Pattern to Search: 10.0.0.5$
Type the file to search: TestIP.txt
Matched: 10.0.0.5$ in line 10.0.0.5
###
Done
>>>
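One caveat with the searches above: an unescaped `.` in a regular expression matches any character, so the pattern `10.0.0.5` would also match a line such as `10a0b0c5`, and without a leading anchor it also matches `110.0.0.5`. Here is a minimal sketch of a stricter lookup, assuming the goal is to find lines that hold exactly one address. The helper name `find_exact_ip` and the sample list are my own for illustration and are not from the book.

```python
import re

def find_exact_ip(ip, lines):
    """Return the stripped lines that consist of exactly the address `ip`."""
    # re.escape turns each "." into a literal dot; ^ and $ anchor both ends.
    exact = re.compile("^" + re.escape(ip) + "$")
    return [line.strip() for line in lines if exact.search(line.strip())]

# Hypothetical sample input, in the same one-address-per-line layout as TestIP.txt
sample = ["10.0.0.5\n", "10.0.0.50\n", "10a0b0c5\n", "110.0.0.5\n"]
print(find_exact_ip("10.0.0.5", sample))
```

With the sample list above, only the first line should come back. To use it against the real file, pass the list returned by `readlines()` instead of `sample`.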


### Some examples that match IPv4 and IPv6 addresses

8.16 Checking IPv4 address, disallowing leading 0:
^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$
8.17 Checking IPv6 address within text:
(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])
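Here is a rough sketch of how these two patterns could be used from Python. The function names and sample strings are my own for illustration. I compile the IPv6 pattern with `re.IGNORECASE` on the assumption that lowercase hex digits should match too, and note that this pattern only covers the full eight-group notation (no `::` compression).

```python
import re

# Recipe 8.16 pattern: four octets in 0-255, no leading zeros.
# The octet alternation is factored out only to keep the line readable.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")

# Recipe 8.17 pattern: eight colon-separated hex groups found inside text.
# IGNORECASE is an assumption so that lowercase hex digits also match.
IPV6_IN_TEXT = re.compile(
    r"(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])",
    re.IGNORECASE,
)

def is_ipv4(text):
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.search(text) is not None

def find_ipv6(text):
    """Return every full-notation IPv6 address found in the text."""
    return IPV6_IN_TEXT.findall(text)

print(is_ipv4("192.168.1.255"))   # valid address
print(is_ipv4("256.1.1.1"))       # first octet out of range
print(is_ipv4("10.0.0.05"))       # leading zero in the last octet
print(find_ipv6("peer 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is up"))
```

The IPv4 pattern is anchored with `^` and `$`, so it validates a whole string, while the IPv6 pattern uses lookarounds so it can pick an address out of surrounding text.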



Here is the review as it appears on Amazon:


O'Reilly cookbooks are awesome. But just as I don't read regular cookbooks cover to cover, I don't read every recipe in the O'Reilly cookbooks either. And just as with regular cookbooks, the day before Thanksgiving is not a good time to open the book for the first time: I at least glance through all the recipes to know what is there, pick out a few that I can use right away, and dog-ear the ones I think I will come back to. So here are the criteria I reviewed this book against:

1. Easy Navigation: Yep, this book is easy to navigate. If I need to do, say, form validation, I know I should start at Chapter 4, "Validation and Formatting".
2. Clear and precise explanation: Yes, I think the explanations are short and stick to the topic under discussion.
3. Pointers for more information: This is hard to do, but the book has "See Also" sections that cross-reference recipes, and the introduction chapter gives a general pointer toward 'Mastering Regular Expressions'.
4. Easy Reading: Hmm, this one is more of a wish list item of mine. I wish the book were broken down into different books by language. The book covers VB.NET, C#, Java, JavaScript, XRegExp, PHP, Perl, Python, and Ruby. I typically skip down to Python and occasionally stop at C# and PHP. The book is over 600 pages and listed at $49.99. I would have been happy to pay 1/5 of the price for one that focuses just on Python, and another 1/5 of the price for one on PHP.

All in all, it is a good value and a keeper on the bookshelf. But I really think it should be broken down into language-specific cookbooks, as most readers probably use only one or two languages on a daily basis. With today's print-on-demand and e-book formats, I think it would be very minimal work for the authors and a whole lot less skipping for the readers. Just my 2 cents.


5 comments:

  1. Hey Eric.

    This book actually looks really cool! Perhaps a bit "geekish" to think of regex as being cool, but whatever! :D

    Really nice example you have there!

    I'm having fun with Python, and am actually creating a small script for MAC tracking ("show mac add | include ab12", then show the CDP neighbor and continue until found). But regex with PExpect ain't that easy imo >.<

    1. Hi Joe, yeah totally. I think 90% of the time we just figure out the RE for whatever it is we are doing and just move on with life. :) Which makes it a perfect cookbook-type of subject.

      For the problem you described, do you grab the output, dump it to a temp file and analyze it? I find that easier than trying to do it in memory via PExpect. But maybe you mean something else?

    2. This comment has been removed by the author.

    3. Pretty awesome Joe. I will make sure I use it next time I have the need. Nice of you to put all the comments in the code. :)

  2. Cleaned it up a bit: https://github.com/Joe-testing/mac-track_joe.py/blob/master/mactrack_joe.py :-)
