Filtering content of a text file

Hello All,

I'd greatly appreciate it if you could take a look at the task I need help
with.

It'd be outstanding if someone could provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole
The records start with the same string (in the example, //-+), which is
followed by another string of characters that changes from record
to record.
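To make the record layout concrete, here is a minimal Python 3 sketch that pulls out the character right after the common prefix. It assumes every record begins with the literal string //-+ as in the sample above; the helper name `record_key` is hypothetical.

```python
PREFIX = "//-+"  # the common record prefix from the sample data


def record_key(line, prefix=PREFIX):
    """Return the first character after the prefix, or None if there is none."""
    if not line.startswith(prefix) or len(line) <= len(prefix):
        return None  # not a record, or nothing follows the prefix
    return line[len(prefix)]


print(record_key("//-+alibaba sinage"))     # a
print(record_key("//-+drago kimano sole"))  # d
print(record_key("//-+"))                   # None
```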

I am working on one file at a time, and for each file I need to be
able to do the following:

a) By looping through the file, the program should isolate all records
that have the letter a following the //-+
b) The isolated dataset will contain only records that start with //-+a
c) Save the isolated dataset as a flat text file named a.txt
d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
and the digits (0 through 9)
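Steps a) through d) could be sketched like this in Python 3, assuming the whole file fits in memory and that the input file is called records.txt (a hypothetical name, as is the function `split_by_first_char`). The file is read once and then filtered once per character, and each output file is named after its character, as in a.txt.

```python
import os
import string

PREFIX = "//-+"


def split_by_first_char(infile, outdir="."):
    """Write records starting with PREFIX + ch to <ch>.txt for ch in a-z, 0-9."""
    with open(infile) as f:
        lines = f.readlines()  # read once, filter many times
    for ch in string.ascii_lowercase + string.digits:
        wanted = PREFIX + ch
        matches = [line for line in lines if line.startswith(wanted)]
        with open(os.path.join(outdir, ch + ".txt"), "w") as out:
            out.writelines(matches)  # empty file if nothing matched


# split_by_first_char("records.txt")
```

All 36 files are always created, even when no record matches, which is one reading of step d); skip the write when `matches` is empty if only non-empty files are wanted.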


CP: An inch of time is an inch of gold but you can't buy that inch of
time with an inch of gold.


Jul 27 '07 #1
On 7/27/07, Ir*******@gmail.com <Ir*******@gmail.com> wrote:
Well, that should be easy if you take a look at the methods in the "string" module.
A rough sketch would be:

import string  # provides ascii_lowercase and digits

# the characters to split on: 'a'..'z' followed by '0'..'9'
alnums = list(string.ascii_lowercase + string.digits)

for alnum in alnums:
    with open(alnum + '.txt', 'w') as outfile, open("myrecords.txt") as infile:
        for line in infile:  # iterate over the records
            if line.startswith("//-+" + alnum):  # check your condition
                outfile.write(line)  # write the matches to a file

However, rather than looping over the file once for each alnum, you could
iterate over the file just once and check the character after the prefix
(if len(line) > 4: ch = line[4]); if it is alphanumeric, process it.
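The single-pass idea could be sketched like this in Python 3. It keeps one output file open per character and reads the input only once. The function name `split_one_pass` is hypothetical, and skipping records whose character after //-+ is not a lowercase letter or digit is an assumption.

```python
import os
import string

PREFIX = "//-+"
KEYS = set(string.ascii_lowercase + string.digits)


def split_one_pass(infile, outdir="."):
    """Read infile once, appending each record to <ch>.txt; return the chars seen."""
    outfiles = {}  # character -> open output file
    try:
        with open(infile) as f:
            for line in f:
                if not line.startswith(PREFIX) or len(line) <= len(PREFIX):
                    continue  # not a record, or nothing after the prefix
                ch = line[len(PREFIX)]
                if ch not in KEYS:
                    continue  # e.g. uppercase or punctuation: skipped here
                if ch not in outfiles:
                    outfiles[ch] = open(os.path.join(outdir, ch + ".txt"), "w")
                outfiles[ch].write(line)
    finally:
        for out in outfiles.values():
            out.close()
    return sorted(outfiles)
```

Unlike the 36-pass loop, this only creates files for characters that actually occur, and it holds at most 36 files open at once.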
Cheers,
Amit Khemka
Jul 27 '07 #2
Ir*******@gmail.com wrote:
Hello All,

I'd greatly appreciate it if you could take a look at the task I need help
with.

It'd be outstanding if someone could provide some sample Python code.
No problem. It's 600 euros per day. Shall I send you the contract?
This really looks like homework, and asking people to do your homework
for you is a pretty bad idea. On most newsgroups the answer would stop
here, but c.l.py is a very friendly place, so I'll give you a couple of
starting points:

1/ for char in "abc":
       print("char is %s" % char)
       print("//-+%s" % char)

2/ for line in open('somefile'):
       print(line.rstrip("\n"))  # each line already ends with "\n"

3/ print("//-+alibaba sinage"[4:])

4/ print("//-+alibaba sinage"[4:].startswith('a'))

5/ data = []
   data.append("//-+alibaba sinage\n")
   data.append("//-+amra damian//_9\n")
   print("".join(data))

6/ f = open('someotherfile.txt', 'w')
   f.write("line1\nline2\nline3\n")
   f.close()

This is all you need to know to complete your task.
Jul 27 '07 #3
On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
4/ print("//-+alibaba sinage"[4:].startswith('a'))
print("//-+alibaba sinage".startswith('a', 4))

This does not create an extra string from the slicing.
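A quick Python 3 comparison of the two forms. `str.startswith()` takes an optional start index, and it also accepts a tuple of candidate prefixes; the variable names below are only for illustration.

```python
record = "//-+alibaba sinage"

sliced = record[4:].startswith("a")   # builds the substring "alibaba sinage" first
in_place = record.startswith("a", 4)  # starts comparing at index 4, no copy

print(sliced, in_place)  # True True

# a tuple of prefixes is also accepted:
print(record.startswith(("//-+a", "//-+b")))  # True
```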

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #4
Marc 'BlackJack' Rintsch wrote:
One learns something every day...
Thanks Marc.
Jul 27 '07 #5
On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:
The example seems to be sorted. Is this true for the real data too? And
are there records that don't start with a-z or 0-9?
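Both questions can be answered with a quick check before picking an approach. A minimal Python 3 sketch, assuming the records are already read into a list of lines; the helper name `inspect_records` is hypothetical, and "sorted" here means plain string order of whole lines.

```python
import string

PREFIX = "//-+"
KEYS = set(string.ascii_lowercase + string.digits)


def inspect_records(lines):
    """Return (is_sorted, odd_lines) for a list of record lines."""
    is_sorted = all(a <= b for a, b in zip(lines, lines[1:]))
    odd_lines = [line for line in lines
                 if not (line.startswith(PREFIX)
                         and len(line) > len(PREFIX)
                         and line[len(PREFIX)] in KEYS)]
    return is_sorted, odd_lines


sample = ["//-+alibaba sinage", "//-+borom", "//-+Zed", "//-+ciren"]
print(inspect_records(sample))  # (False, ['//-+Zed'])
```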
Repeating the filter for every character might be a little inefficient,
because the file gets read 36 times. If the data is already sorted, you can
use `itertools.groupby()` to get the groups and write each one to its own
file. Otherwise, if the files fit completely in memory, you can sort them in
memory and then use `itertools.groupby()`.
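A sketch of the sort-then-groupby route in Python 3, assuming the file fits in memory; the names `first_char` and `split_with_groupby` are hypothetical, and dropping records without a lowercase letter or digit after //-+ is an assumption. `itertools.groupby()` only groups consecutive items with the same key, which is why the lines are sorted by that key first; without the sort, one character could appear in several groups and its file would be overwritten.

```python
import itertools
import os
import string

PREFIX = "//-+"
KEYS = set(string.ascii_lowercase + string.digits)


def first_char(line):
    return line[len(PREFIX)]  # the character right after the prefix


def split_with_groupby(infile, outdir="."):
    with open(infile) as f:
        # keep only records whose character after the prefix is a-z or 0-9
        lines = [line for line in f
                 if line.startswith(PREFIX)
                 and line[len(PREFIX):len(PREFIX) + 1] in KEYS]
    lines.sort(key=first_char)  # stable: file order is kept within a group
    for ch, group in itertools.groupby(lines, key=first_char):
        with open(os.path.join(outdir, ch + ".txt"), "w") as out:
            out.writelines(group)
```

Sorting by the key alone is enough for `groupby()`; a full sort of the lines is not needed, and the stable sort keeps each group's records in their original order.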

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #6
Ir*******@gmail.com wrote:
I'd greatly appreciate it if you could take a look at the task I need
help with.

It'd be outstanding if someone could provide some sample Python
code.
Sure.
CP: An inch of time is an inch of gold but you can't buy that inch
of time with an inch of gold.
So, how much gold will I get for an "inch" of time?

Regards,
Björn


Jul 27 '07 #7
Thanks, all, for the input. This is going to be a great starting
point. And, yeah, I wish it were homework.

Best,

Ira

Jul 27 '07 #8
