Filtering content of a text file

Ira.Kovac

Hello All,

I'd greatly appreciate it if you could take a look at a task I need help
with.

It'd be outstanding if someone could provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole

The records start with the same string (in the example, //-+), which is
followed by another string of characters that changes from record
to record.

I am working on one file at a time, and for each file I need to be
able to do the following:

a) By looping through the file, the program should isolate all records
that have the letter a following the //-+
b) The isolated dataset will contain only records that start with //-+a
c) Save the isolated dataset as a flat text file named a.txt
d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
and all digits (0 through 9)

CP: An inch of time is an inch of gold but you can't buy that inch of
time with an inch of gold.

Random Link Generator
--------------------------------------------------
http://www.transactioncodes.com
--------------------------------------------------

Jul 27 '07 #1

Amit Khemka

Well, that should be easy if you take a look at the "string" module.
A rough sketch would be:

import string  # import the string module

alnums = list(string.ascii_lowercase + string.digits)  # list of letters and digits

for alnum in alnums:
    outfile = open(alnum + '.txt', 'w')
    for line in open("myrecords.txt"):  # iterate over the records
        if line.startswith("//-+" + alnum):  # check your condition
            # write the matches to a file
            outfile.write(line)
    outfile.close()

However, rather than looping over the file once for each alnum, you may
just iterate over the file a single time, check the character after the
prefix (if len(line) > 4: ch = line[4]), and if it is an alnum, process it.
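A minimal single-pass sketch of that suggestion, written for Python 3 and assuming every record sits on its own line and begins with //-+. The names `split_by_key`, `PREFIX` and `KEYS` are illustrative, not from the thread. It keeps at most 36 output files open at once in a dictionary and only creates a file for a character that actually occurs; keys are matched case-sensitively, so a record such as //-+Xray would be skipped.

```python
import os
import string

PREFIX = "//-+"  # record marker from the sample data
KEYS = set(string.ascii_lowercase + string.digits)  # a-z and 0-9


def split_by_key(in_path, out_dir="."):
    """Read in_path once, writing each record to <key>.txt in out_dir.

    Records that lack the prefix, are too short to have a key character,
    or whose key is not a-z / 0-9 are skipped.
    """
    handles = {}  # key character -> open output file
    try:
        with open(in_path) as infile:
            for line in infile:
                if not line.startswith(PREFIX) or len(line) <= len(PREFIX):
                    continue
                key = line[len(PREFIX)]  # safe: length checked above
                if key not in KEYS:
                    continue
                if key not in handles:
                    path = os.path.join(out_dir, key + ".txt")
                    handles[key] = open(path, "w")
                handles[key].write(line)
    finally:
        # Close every output file, even if reading fails halfway.
        for handle in handles.values():
            handle.close()
    return sorted(handles)  # the key characters that got a file
```

Unlike the loop above, which creates all 36 files (some possibly empty), this version produces only the files that have at least one matching record.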

Cheers,
--
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377

Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.

Jul 27 '07 #2

Bruno Desthuilliers

Ir*******@gmail.com wrote:

> It'd be outstanding if someone could provide some sample Python code.

No problem. It's 600 euro per day. Shall I send you the contract?

This really looks like homework, and asking people to do your homework
for you is a pretty bad idea. On most newsgroups, the answer would stop
here, but c.l.py is a very friendly place, so I'll give you a couple of
starting points:

1/ for char in "abc":
       print("char is %s" % char)
       print("//-+%s" % char)

2/ for line in open('somefile'):
       print(line)

3/ print("//-+alibaba sinage"[4:])

4/ print("//-+alibaba sinage"[4:].startswith('a'))

5/ data = []
   data.append("//-+alibaba sinage\n")
   data.append("//-+amra damian//_9\n")
   print("".join(data))

6/ f = open('someotherfile.txt', 'w')
   f.write("line1\nline2\nline3\n")
   f.close()

This is all you need to know to complete your task.
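Put together, those six building blocks cover steps a) to d). The sketch below is one possible way to combine them in Python 3; the function names `extract`, `save` and `split_file` are hypothetical, and, like the first sketch in this thread, it rereads the input file once per character.

```python
import string


def extract(lines, char, prefix="//-+"):
    """Steps a) and b): keep only records whose first character after prefix is char."""
    data = []
    for line in lines:
        if line.startswith(prefix + char):
            data.append(line)
    return data


def save(records, path):
    """Step c): write the isolated records to a flat text file."""
    with open(path, "w") as f:
        f.write("".join(records))


def split_file(path):
    """Step d): repeat a) to c) for a-z and 0-9, rereading the input each time."""
    for char in string.ascii_lowercase + string.digits:
        with open(path) as infile:
            save(extract(infile, char), char + ".txt")
```

Calling `split_file("somefile")` would then leave a.txt through z.txt and 0.txt through 9.txt in the current directory.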

Jul 27 '07 #3

Marc 'BlackJack' Rintsch

On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:

> 4/ print("//-+alibaba sinage"[4:].startswith('a'))

print("//-+alibaba sinage".startswith('a', 4))

This does not create an extra string from the slicing.
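A quick Python 3 check that the two spellings agree on records from the sample, and that the offset form also copes with a string too short to have a character after the prefix (the short string "//-" is an illustrative addition, not from the data).

```python
records = ["//-+alibaba sinage", "//-+borom", "//-+ciren", "//-"]

for rec in records:
    # Both forms test the character right after the 4-character prefix;
    # the offset form avoids building a new string with a slice.
    assert rec[4:].startswith("a") == rec.startswith("a", 4)

# Unlike rec[4], which raises IndexError on a short string, both forms
# simply return False when there is no character after the prefix.
print("//-".startswith("a", 4))  # False
```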

Ciao,
Marc 'BlackJack' Rintsch

Jul 27 '07 #4

Bruno Desthuilliers

Marc 'BlackJack' Rintsch wrote:

> On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
>
>>4/ print("//-+alibaba sinage"[4:].startswith('a'))
>
> print("//-+alibaba sinage".startswith('a', 4))
>
> This does not create an extra string from the slicing.

One learns something every day...
Thanks, Marc.

Jul 27 '07 #5

Marc 'BlackJack' Rintsch

On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:

The example seems to be sorted; is this true for the real data too? And
are there records that don't start with a-z or 0-9?
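One way to answer both questions for a real file, sketched in Python 3 under the assumption that only the first character after //-+ matters (the function name `check_records` is hypothetical). Note that the sample is ordered by that character only: //-+borom comes before //-+bokima, and //-+cop before //-+ciciban, so a plain line-by-line comparison would call it unsorted even though each group is contiguous.

```python
import string

PREFIX = "//-+"
KEYS = set(string.ascii_lowercase + string.digits)  # a-z and 0-9


def check_records(lines):
    """Return (is_sorted, odd) for an iterable of record lines.

    is_sorted: True if the key characters (the first character after
    PREFIX) never decrease, so each key's records are contiguous.
    odd: the records that lack PREFIX or whose key is not a-z / 0-9.
    """
    keys = []
    odd = []
    for line in lines:
        # Slicing yields "" instead of raising when the line is too short.
        key = line[len(PREFIX):len(PREFIX) + 1] if line.startswith(PREFIX) else ""
        if key in KEYS:
            keys.append(key)
        else:
            odd.append(line)
    is_sorted = all(a <= b for a, b in zip(keys, keys[1:]))
    return is_sorted, odd
```

Running it over an open file, e.g. `check_records(open("myrecords.txt"))`, reads the records once and tells you which of the approaches discussed in this thread applies.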

Repeating steps a) to c) for every character might be a little
inefficient, because the file gets read 36 times. If the data is already
sorted, you can use `itertools.groupby()` to get the groups and write each
one to its own file. Otherwise, if the files can be read into memory
completely, you can sort in memory and then use `itertools.groupby()`.
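A sketch of the `itertools.groupby()` approach in Python 3 (the names `key_of` and `write_groups` are illustrative). With `presorted=False` it reads the whole file into memory and sorts it by the character after //-+; because `sorted()` is stable, records keep their original order within each group. Pass `presorted=True` only if records with the same key are already contiguous; otherwise a later run of the same key would overwrite the earlier file.

```python
import itertools
import os
import string

PREFIX = "//-+"
KEYS = set(string.ascii_lowercase + string.digits)  # a-z and 0-9


def key_of(line):
    """Return the character right after PREFIX, or "" if there is none."""
    if not line.startswith(PREFIX):
        return ""
    return line[len(PREFIX):len(PREFIX) + 1]


def write_groups(in_path, out_dir=".", presorted=False):
    """Write each run of records sharing a key character to <key>.txt."""
    written = []
    with open(in_path) as infile:
        # sorted() loads every record into memory; skip it for sorted input.
        lines = infile if presorted else sorted(infile, key=key_of)
        for key, group in itertools.groupby(lines, key=key_of):
            if key not in KEYS:  # no usable a-z / 0-9 key: skip the group
                continue
            with open(os.path.join(out_dir, key + ".txt"), "w") as out:
                out.writelines(group)
            written.append(key)
    return written
```

Sorting 30K short lines in memory is usually cheap, which is why the sorted variant is the default here.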

Ciao,
Marc 'BlackJack' Rintsch

Jul 27 '07 #6

Bjoern Schliessmann

Ir*******@gmail.com wrote:

> I'd greatly appreciate it if you could take a look at a task I need
> help with.
>
> It'd be outstanding if someone could provide some sample Python
> code.

Sure.

> CP: An inch of time is an inch of gold but you can't buy that inch
> of time with an inch of gold.

So, how much gold will I get for an "inch" of time?

Regards,
Björn

--
BOFH excuse #135:

You put the disk in upside down.

Jul 27 '07 #7

Ira.Kovac

Thanks all for the input. This is going to be a great basis to start
from. And yeah, I wish it were homework.

Best,

Ira

Jul 27 '07 #8
