Bytes IT Community

Regular expressions and Python

Hello,

I am working with Regular Expressions in Python.

I have a text file (authors.txt) in which each line contains the first and last name of an author separated by a space, followed by another space and the book title:

Peter Smith The Lobster story
Christine Bower In the closet
Tom Martin How to paint your furniture


My questions:

1) I want to transform the name into a string like this:

Peter Smith => psmith

I tried to get the first character of the first group, then to concatenate it with the second group and transform the whole string into lower case.


2) Then I want to transform the book title into a string like this:

The Lobster story => the_lobster_story

I guess I just have to replace the whitespaces with an underscore '_' and transform the whole thing into lowercase but I don't know what function to use and how...


Then I wrote this script.py:


import re
import string

rgx = re.compile("(([A-Z])+\w+)[ ](([A-Z])+\w+)[ ]([^:]+)")

inf = open('authors.txt', 'r')
outf= open('authors2.txt', 'w')


for row in inf.readlines():
    corr = rgx.search(row)
    #these variables are false
    first_name = corr.group(0)
    last_name = corr.group(1)
    name = corr.lower(first_name[0])+corr.lower(last_name)
    title = corr.group(2)
    title2 = title.lower(title.replace(' ', '_'))

    if corr != None:
        str = name ,"@bookstore.com" + " " + "http://www.bookstore.org/", title2
        outf.write(str)


inf.close()
outf.close()


Can anyone help?
Mar 23 '08 #1
2 Replies


bvdet
Please use code tags when posting code (see the Posting Guidelines, "How to ask a question").
You do not need regular expressions for this. Create an empty list to hold the results, split each line on whitespace, and append the processed results to the list.
f = open('file_name')
output = []
for line in f:
    lineList = line.split()
    if len(lineList) < 3:
        # Skip blank or incomplete lines (need first name, last name, title).
        continue
    # [first initial + last name, title words joined by '_'], all lower case
    output.append(['%s%s' % (lineList[0][0].lower(), lineList[1].lower()),
                   '_'.join([word.lower() for word in lineList[2:]])])

f.close()

for item in output:
    print(item)
>>> ['psmith', 'the_lobster_story']
['cbower', 'in_the_closet']
['tmartin', 'how_to_paint_your_furniture']
>>>
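
The same split-and-join idea can be sketched as a small function, which makes it easier to try on single lines. This is only an illustration: the name to_slugs and the choice to return None for short lines are my assumptions, not something from the original script.

```python
def to_slugs(line):
    """Return [username, title_slug] for a 'First Last Title...' line.

    Returns None when the line has fewer than three words
    (hypothetical convention chosen for this sketch).
    """
    words = line.split()
    if len(words) < 3:
        # Blank or incomplete line: nothing to convert.
        return None
    first, last, title_words = words[0], words[1], words[2:]
    # First letter of the first name + whole last name, lower case.
    username = (first[0] + last).lower()
    # Title words joined by underscores, lower case.
    title_slug = "_".join(title_words).lower()
    return [username, title_slug]


sample = [
    "Peter Smith The Lobster story",
    "Christine Bower In the closet",
    "Tom Martin How to paint your furniture",
]
for line in sample:
    print(to_slugs(line))
```

Because split() with no argument splits on any run of whitespace, extra spaces or a trailing newline in the file should not matter here.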
Mar 23 '08 #2

bvdet
I modified your regex pattern somewhat and added names to the groups. Using names, it is easier to read the structure of the pattern, and you can access the matched substrings with the MatchObject.groupdict() method. The rest is almost the same code as the non-regex solution.
import re

# Raw strings keep the backslash in \w intact; named groups document the pattern.
rgx = re.compile(r'%s %s %s' % (r'(?P<first_name>(?P<first_initial>[A-Z])?\w+)',
                                r'(?P<last_name>(?P<last_initial>[A-Z])?\w+)',
                                r'(?P<book_title>.+)'))
fn = 'authors.txt'  # the input file named in the question
f = open(fn)
output = []
for line in f:
    m = rgx.search(line)
    if m is None or m.group('first_initial') is None:
        # Skip blank lines and names that do not start with a capital letter.
        continue
    dd = m.groupdict()
    output.append(['%s%s' % (dd['first_initial'].lower(), dd['last_name'].lower()),
                   '_'.join([word.lower() for word in dd['book_title'].split()])])

f.close()

for item in output:
    print(item)
>>> ['psmith', 'the_lobster_story']
['cbower', 'in_the_closet']
['tmartin', 'how_to_paint_your_furniture']
>>>
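
If you also want to write the results to authors2.txt as in your script, something like the following might work. It is a rough sketch under assumptions: I am guessing that each output line should look like "psmith@bookstore.com http://www.bookstore.org/the_lobster_story", since the write line in your post is ambiguous. The names LINE_RE, format_record and convert are made up for this example.

```python
import re

# Named groups: first initial, last name, and the rest of the line as title.
LINE_RE = re.compile(
    r"(?P<first_initial>[A-Za-z])\w*\s+(?P<last_name>\w+)\s+(?P<book_title>.+)"
)


def format_record(line):
    """Build one output line, or return None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    dd = m.groupdict()
    name = (dd["first_initial"] + dd["last_name"]).lower()
    # str.lower() and str.split()/join() are string methods, not match methods.
    title = "_".join(dd["book_title"].lower().split())
    # Assumed output format: e-mail address, a space, then the book URL.
    return "%s@bookstore.com http://www.bookstore.org/%s" % (name, title)


def convert(in_path, out_path):
    """Read in_path line by line and write the formatted records to out_path."""
    with open(in_path) as inf, open(out_path, "w") as outf:
        for row in inf:
            record = format_record(row)
            if record is not None:
                # write() needs a single string, so add the newline explicitly.
                outf.write(record + "\n")


print(format_record("Peter Smith The Lobster story"))
```

Calling convert('authors.txt', 'authors2.txt') would then process the whole file. Note that the match is checked for None before any group is read, which is the order your original loop had reversed.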
Mar 23 '08 #3
