471,305 Members | 1,522 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,305 software developers and data experts.

Filtering content of a text file

Hello All,

I'd greatly appreciate if you can take a look at the task I need help
with.

It'd be outstanding if someone can provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The records start with the same string (in the example //-+) wich is
followed by another string of characters taht's changing from record
to record.

I am working on one file at the time and for each file I need to be
able to do the following:

a) By looping thru the file the program should isolate all records
that have letter a following the //-+
b) The isolated dataset will contain only records that start with //-
+a
c) Save the isolated dataset as flat flat text file named a.txt
d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
and numerical values (0 thru 9)


CP: An inch of time is an inch of gold but you can't buy that inch of
time with an inch of gold.

Random Link Generator
--------------------------------------------------
http://www.transactioncodes.com
--------------------------------------------------

Jul 27 '07 #1
7 1337
On 7/27/07, Ir*******@gmail.com <Ir*******@gmail.comwrote:
Hello All,

I'd greatly appreciate if you can take a look at the task I need help
with.

It'd be outstanding if someone can provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The records start with the same string (in the example //-+) wich is
followed by another string of characters taht's changing from record
to record.

I am working on one file at the time and for each file I need to be
able to do the following:

a) By looping thru the file the program should isolate all records
that have letter a following the //-+
b) The isolated dataset will contain only records that start with //-
+a
c) Save the isolated dataset as flat flat text file named a.txt
d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
and numerical values (0 thru 9)
Well that should be easy if you take a look at methods in "string" module:
A rough sketch would be :

import string # import string module
alnums = list(string.lowercase+string.digits) # create a list of
alphabets and digits

for alnum in alnums:
outfile = open(alnum+'.txt', 'w')
for line in file("myrecords.txt"): # iterate over the records
if line.startswith("//-+"+alnum): # check your condition
# write the matches to a file
outfile.write(line)
outfile.close()

However rather than looping over the file for each alnum you may just
iterate over the file, and check the starting characters (if len(line)
4: ch=line[4]) , and if it is alnum then process it.
Cheers,
--
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377

Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.
Jul 27 '07 #2
Ir*******@gmail.com a écrit :
Hello All,

I'd greatly appreciate if you can take a look at the task I need help
with.

It'd be outstanding if someone can provide some sample Python code.
No problem. It's 600 euro per day. Do I send you the contract ?
-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The records start with the same string (in the example //-+) wich is
followed by another string of characters taht's changing from record
to record.

I am working on one file at the time and for each file I need to be
able to do the following:

a) By looping thru the file the program should isolate all records
that have letter a following the //-+
b) The isolated dataset will contain only records that start with //-
+a
c) Save the isolated dataset as flat flat text file named a.txt
d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
and numerical values (0 thru 9)
This really looks like homework, and asking people to do your homework
for you is a pretty bad idea. On most newsgroup, the answer would stop
here, but c.l.py is a very friendly place, so I'll give you a couple
starting points:

1/ for char in "abc":
print "char is %s" % char
print "//-+%s" % char

2/ for line in open('somefile'):
print line

3/ print "//-+alibaba sinage"[4:]

4/ print "//-+alibaba sinage"[4:].startswith('a')

5/ data = []
data.append("//-+alibaba sinage\n")
data.append("//-+amra damian//_9\n")
print "".join(data)

6/ f = open('someotherfile.txt', 'w')
f.write("line1\nline2\nline3\n")
f.close()

This is all you need to know to complete your task.
Jul 27 '07 #3
On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
4/ print "//-+alibaba sinage"[4:].startswith('a')
print "//-+alibaba sinage".startswith('a', 4)

This does not create an extra string from the slicing.

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #4
Marc 'BlackJack' Rintsch a écrit :
On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
>4/ print "//-+alibaba sinage"[4:].startswith('a')

print "//-+alibaba sinage".startswith('a', 4)

This does not create an extra string from the slicing.
One learns everyday...
Thanks Marc.
Jul 27 '07 #5
On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:
I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The example seems to be sorted, is this true for the real data too? And
are there records that don't start with a-z or 0-9?
a) By looping thru the file the program should isolate all records
that have letter a following the //-+
b) The isolated dataset will contain only records that start with //-
+a
c) Save the isolated dataset as flat flat text file named a.txt
d) Repeat a), b) and c) for all letters of english alphabet (a thru z)
and numerical values (0 thru 9)
This might be a little bit inefficient because the file gets read 36
times. If the data is already sorted you can use `itertools.groupby()` to
get the groups and write them to several files. Otherwise if the files
can be read into memory completely you can sort in memory and then use
`itertools.groupby()`.

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #6
Ir*******@gmail.com wrote:
I'd greatly appreciate if you can take a look at the task I need
help with.

It'd be outstanding if someone can provide some sample Python
code.
Sure.
CP: An inch of time is an inch of gold but you can't buy that inch
of time with an inch of gold.
So, how much gold will I get for an "inch" of time?

Regards,
Björn

--
BOFH excuse #135:

You put the disk in upside down.

Jul 27 '07 #7
Thanks all for the input. This is going to be a great basis for
starting. And, yeah - I wish it was a homework.

Best,

Ira

Jul 27 '07 #8

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Charles Hartman | last post: by
3 posts views Thread by serge calderara | last post: by
2 posts views Thread by Konrad | last post: by
8 posts views Thread by Michiel Rapati-Kekkonen | last post: by
reply views Thread by rosydwin | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.