Bytes | Software Development & Data Engineering Community

Filtering content of a text file

Hello All,

I'd greatly appreciate it if you could take a look at the task I need help
with.

It'd be outstanding if someone could provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The records start with the same string (in the example //-+), which is
followed by another string of characters that changes from record
to record.

I am working on one file at a time, and for each file I need to be
able to do the following:

a) By looping through the file, the program should isolate all records
that have the letter a following the //-+
b) The isolated dataset will contain only records that start with //-+a
c) Save the isolated dataset as a flat text file named a.txt
d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
and the digits (0 through 9)


CP: An inch of time is an inch of gold but you can't buy that inch of
time with an inch of gold.

Random Link Generator
--------------------------------------------------
http://www.transactioncodes.com
--------------------------------------------------

Jul 27 '07 #1
On 7/27/07, Ir*******@gmail.com <Ir*******@gmail.com> wrote:
Well, that should be easy if you take a look at the "string" module.
A rough sketch would be:

import string  # import the string module

# create a list of lowercase letters and digits
alnums = list(string.ascii_lowercase + string.digits)

for alnum in alnums:
    with open(alnum + '.txt', 'w') as outfile, open("myrecords.txt") as infile:
        for line in infile:  # iterate over the records
            if line.startswith("//-+" + alnum):  # check your condition
                outfile.write(line)  # write the matches to a file

However, rather than looping over the file once for each alnum, you could
iterate over the file just once, check the character after the prefix
(if len(line) > 4: ch = line[4]), and if it is an alnum, process it.
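A minimal Python 3 sketch of that single-pass idea. The function name `split_by_first_char` and the `outdir` parameter are hypothetical; it assumes every record begins with the literal prefix `//-+` and, like the loop above, only handles lowercase a-z and 0-9. Output files are opened lazily, so unlike the loop above it does not create empty files for characters that never occur.

```python
import os
import string

PREFIX = "//-+"  # record prefix taken from the example data (assumption)
VALID = set(string.ascii_lowercase + string.digits)


def split_by_first_char(infile_path, outdir="."):
    """Read infile_path once, appending each record to <outdir>/<ch>.txt,
    where <ch> is the first character after PREFIX.

    Returns the sorted list of characters for which a file was written.
    """
    outfiles = {}  # ch -> open output file, opened on first matching record
    try:
        with open(infile_path) as infile:
            for line in infile:
                if not line.startswith(PREFIX) or len(line) <= len(PREFIX):
                    continue  # not a record, or nothing after the prefix
                ch = line[len(PREFIX)]
                if ch not in VALID:
                    continue  # e.g. uppercase letters, punctuation, newline
                if ch not in outfiles:
                    path = os.path.join(outdir, ch + ".txt")
                    outfiles[ch] = open(path, "w")
                outfiles[ch].write(line)
    finally:
        # Close every output file, even if reading failed part-way.
        for f in outfiles.values():
            f.close()
    return sorted(outfiles)
```

For example, `split_by_first_char("myrecords.txt")` would write a.txt, b.txt, ... into the current directory. At most 36 output files are open at once, which is well within typical OS limits.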
Cheers,
--
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377

Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.
Jul 27 '07 #2
Ir*******@gmail.com wrote:
> It'd be outstanding if someone can provide some sample Python code.
No problem. It's 600 euros per day. Shall I send you the contract?
This really looks like homework, and asking people to do your homework
for you is a pretty bad idea. On most newsgroups, the answer would stop
here, but c.l.py is a very friendly place, so I'll give you a couple of
starting points:

1/ for char in "abc":
       print("char is %s" % char)
       print("//-+%s" % char)

2/ for line in open('somefile'):
       print(line)

3/ print("//-+alibaba sinage"[4:])

4/ print("//-+alibaba sinage"[4:].startswith('a'))

5/ data = []
   data.append("//-+alibaba sinage\n")
   data.append("//-+amra damian//_9\n")
   print("".join(data))

6/ f = open('someotherfile.txt', 'w')
   f.write("line1\nline2\nline3\n")
   f.close()

This is all you need to know to complete your task.
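Putting those six pieces together might look roughly like the following Python 3 sketch. The function name `write_groups` and its `outdir` and `chars` parameters are hypothetical; it assumes the input file fits in memory, so it is read only once instead of once per character. Like the task description asks, it writes one file per character, even when that file ends up empty.

```python
import os
import string


def write_groups(infile_path, outdir=".",
                 chars=string.ascii_lowercase + string.digits):
    """Write the records starting with //-+<char> to <outdir>/<char>.txt
    for every char in chars (empty files included)."""
    with open(infile_path) as infile:
        lines = infile.readlines()  # read once; assumes the file fits in memory
    for char in chars:                        # loop over characters (hint 1)
        prefix = "//-+%s" % char              # build the prefix (hint 1)
        data = [line for line in lines        # keep the matching lines
                if line.startswith(prefix)]   # in a list (cf. hints 4 and 5)
        with open(os.path.join(outdir, char + ".txt"), "w") as f:
            f.write("".join(data))            # write them out (hints 5 and 6)
```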
Jul 27 '07 #3
On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
> 4/ print("//-+alibaba sinage"[4:].startswith('a'))

print("//-+alibaba sinage".startswith('a', 4))

This does not create an extra string from the slicing.
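A small illustration of the difference, assuming the 4-character `//-+` prefix from the example data (`first_char_is` is a hypothetical helper name). Both forms return False, rather than raising, on a line too short to have a fifth character, whereas indexing with `line[4]` would raise IndexError.

```python
def first_char_is(line, char, offset=4):
    """True if `char` appears at index `offset` of `line`
    (index 4 is just after the "//-+" prefix)."""
    # str.startswith(prefix, start) begins comparing at index `start`,
    # so no sliced copy of the line is made.
    return line.startswith(char, offset)


for rec in ["//-+alibaba sinage", "//-+borom", "//-"]:
    # Slicing version vs. start-argument version: same result each time.
    print(repr(rec), rec[4:].startswith("a"), first_char_is(rec, "a"))
```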

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #4
Marc 'BlackJack' Rintsch wrote:
> On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
> > 4/ print("//-+alibaba sinage"[4:].startswith('a'))
>
> print("//-+alibaba sinage".startswith('a', 4))
>
> This does not create an extra string from the slicing.
One learns something every day...
Thanks, Marc.
Jul 27 '07 #5
On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:
I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The example seems to be sorted; is this true for the real data too? And
are there records that don't start with a-z or 0-9?
a) By looping through the file, the program should isolate all records
that have the letter a following the //-+
b) The isolated dataset will contain only records that start with //-+a
c) Save the isolated dataset as a flat text file named a.txt
d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
and the digits (0 through 9)
This might be a little inefficient because the file gets read 36
times, once per character. If the data is already sorted, you can use
`itertools.groupby()` to get the groups and write them to several files.
Otherwise, if the files fit completely in memory, you can sort them in
memory and then use `itertools.groupby()`.
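A sketch of the `itertools.groupby()` approach in Python 3. The names `split_with_groupby`, `record_key` and `already_sorted` are hypothetical; it assumes every record starts with the 4-character `//-+` prefix, keeps only lowercase a-z and 0-9 keys, and, when the data is not already sorted, that the whole file fits in memory. `groupby()` only merges consecutive lines with equal keys, which is why unsorted input is sorted first; `sorted()` is stable, so records keep their original order within each output file.

```python
import itertools
import os
import string

PREFIX_LEN = 4  # length of the "//-+" prefix in the example records
VALID = set(string.ascii_lowercase + string.digits)  # keys that get a file


def record_key(line):
    """The character right after the prefix, or "" if the line is too short."""
    return line[PREFIX_LEN:PREFIX_LEN + 1]


def split_with_groupby(infile_path, outdir=".", already_sorted=False):
    """Write each group of records sharing a key to <outdir>/<key>.txt.

    Returns the keys for which a file was written, in the order written.
    """
    written = []
    with open(infile_path) as infile:
        # Sorted input can be streamed line by line; otherwise load the
        # whole file and sort it by key, because groupby() only merges
        # *consecutive* lines that have the same key.
        lines = infile if already_sorted else sorted(infile, key=record_key)
        for key, group in itertools.groupby(lines, key=record_key):
            if key not in VALID:
                continue  # skip records that don't start with a-z or 0-9
            with open(os.path.join(outdir, key + ".txt"), "w") as outfile:
                outfile.writelines(group)
            written.append(key)
    return written
```

If `already_sorted=True` is passed for data that is not actually sorted, a character appearing in two separate runs would have its file overwritten by the later run, so the flag should only be set when the sort order is known.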

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #6
Ir*******@gmail.com wrote:
I'd greatly appreciate if you can take a look at the task I need
help with.

It'd be outstanding if someone can provide some sample Python
code.
Sure.
CP: An inch of time is an inch of gold but you can't buy that inch
of time with an inch of gold.
So, how much gold will I get for an "inch" of time?

Regards,
Björn

--
BOFH excuse #135:

You put the disk in upside down.

Jul 27 '07 #7
Thanks, all, for the input. This is going to be a great basis for
starting. And yeah, I wish it were homework.

Best,

Ira

Jul 27 '07 #8


Similar topics

3
by: Charles Hartman | last post by:
I'm working on text-handling programs that want plain-text files as input. It's fine to tell users to feed the programs with plain-text only, but not all users know what this means, even after you...
3
by: serge calderara | last post by:
Dear all, I have a csv file data which has been read and populate on a dataset object. then I need to bind part of the content of the dataset to a treeview control. I have read that XML format...
8
by: Tim Pollard | last post by:
Hi I am trying to filter a table of users to select only those records whose roleID matches a value in an array. There could be any number of IDs held in the array from one to a few hundred. The...
2
by: Konrad | last post by:
Hi Can you point examples in .NET of filtering (avoiding) displaying web pages with unwanted content on machine with ie? Thanks Konrad
0
by: Neo | last post by:
Hello: I am receiving a Binary File in a Request from a application. The stream which comes to me has the boundary (Something like "---------------------------390C0F3E0099" without the quotes),...
0
by: Romulo NF | last post by:
Greetings again everyone Recently i´ve been asked to develop a script to allow filtering in the content of the table, with dinamic options based on the own content. Example: a table with the name of...
8
by: levi | last post by:
Hallo I have some problems regarding text filtering with Visual Basic. In order to filter only some text lines I want to make a copy of the file as a temporary one, read this copy as a file...
8
by: Michiel Rapati-Kekkonen | last post by:
Hi, I would like that my subform is immediately filtered as soon as one types a letter in an unbound searchbox in the form. If one types an 's' the content of the subform shows only records...
3
by: premprakashbhati | last post by:
hi, good evening.. i am going to upload an image in a web form .....for that iam using HTML input(file) control and one web control button i.e., Upload_Button() here is the code ...its work fine...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
