Bytes | Software Development & Data Engineering Community
Filtering content of a text file

Hello All,

I'd greatly appreciate it if you could take a look at the task I need help
with.

It'd be outstanding if someone can provide some sample Python code.

Thanks a lot,

Ira

-------------------------------------------------------------------------------
Problem
-------------------------------------------------------------------------------

I am working with 30K+ record datasets in flat file format (.txt) that
look like this:

//-+alibaba sinage
//-+amra damian//_9
//-+anix anire//_
//-+borom
//-+bokima sun drane
//-+ciren
//-+cop calestieon eded
//-+ciciban
//-+drago kimano sole

The records start with the same string (in the example //-+) which is
followed by another string of characters that changes from record
to record.

I am working on one file at a time and for each file I need to be
able to do the following:

a) By looping through the file the program should isolate all records
that have the letter a following the //-+
b) The isolated dataset will contain only records that start with //-+a
c) Save the isolated dataset as a flat text file named a.txt
d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
and numerical values (0 through 9)


CP: An inch of time is an inch of gold but you can't buy that inch of
time with an inch of gold.

Random Link Generator
--------------------------------------------------
http://www.transactioncodes.com
--------------------------------------------------

Jul 27 '07 #1
On 7/27/07, Ir*******@gmail.com <Ir*******@gmail.com> wrote:
> a) By looping through the file the program should isolate all records
> that have the letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as a flat text file named a.txt
> d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
> and numerical values (0 through 9)

Well that should be easy if you take a look at methods in "string" module:
A rough sketch would be :

import string  # import string module
# create a list of lowercase letters and digits
alnums = list(string.ascii_lowercase + string.digits)

for alnum in alnums:
    outfile = open(alnum + '.txt', 'w')
    for line in open("myrecords.txt"):  # iterate over the records
        if line.startswith("//-+" + alnum):  # check your condition
            # write the matches to a file
            outfile.write(line)
    outfile.close()

However, rather than looping over the file for each alnum you may just
iterate over the file once and check the character after the prefix
(if len(line) > 4: ch = line[4]), and if it is an alnum then process it.
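
A minimal sketch of that single-pass idea, written for Python 3. The //-+ prefix and the <char>.txt naming come from the question; the function name split_records, its parameters, and the choice to collect matches in an in-memory dict before writing are assumptions made for illustration.

```python
import os
import string
from collections import defaultdict

PREFIX = "//-+"  # record prefix taken from the question
ALNUMS = set(string.ascii_lowercase + string.digits)


def split_records(infile, outdir=".", prefix=PREFIX):
    """Read infile once and write <char>.txt for each leading character seen.

    Hypothetical helper: returns a dict mapping each character to the
    number of records written for it.
    """
    buckets = defaultdict(list)
    with open(infile) as records:
        for line in records:
            # The character right after the prefix decides the bucket.
            if line.startswith(prefix) and len(line) > len(prefix):
                ch = line[len(prefix)]
                if ch in ALNUMS:
                    buckets[ch].append(line)
    for ch, lines in buckets.items():
        with open(os.path.join(outdir, ch + ".txt"), "w") as out:
            out.writelines(lines)
    return {ch: len(lines) for ch, lines in buckets.items()}
```

Unlike the 36-pass loop, this version only creates files for characters that actually occur, and it holds the matching lines in memory, which should be fine for files of 30K records.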
Cheers,
--
----
Amit Khemka
website: www.onyomo.com
wap-site: www.owap.in
Home Page: www.cse.iitd.ernet.in/~csd00377

Endless the world's turn, endless the sun's Spinning, Endless the quest;
I turn again, back to my own beginning, And here, find rest.
Jul 27 '07 #2
Ir*******@gmail.com wrote:
> It'd be outstanding if someone can provide some sample Python code.
No problem. It's 600 euro per day. Do I send you the contract ?
> a) By looping through the file the program should isolate all records
> that have the letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as a flat text file named a.txt
> d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
> and numerical values (0 through 9)

This really looks like homework, and asking people to do your homework
for you is a pretty bad idea. On most newsgroups, the answer would stop
here, but c.l.py is a very friendly place, so I'll give you a couple of
starting points:

1/ for char in "abc":
       print("char is %s" % char)
       print("//-+%s" % char)

2/ for line in open('somefile'):
       print(line)

3/ print("//-+alibaba sinage"[4:])

4/ print("//-+alibaba sinage"[4:].startswith('a'))

5/ data = []
   data.append("//-+alibaba sinage\n")
   data.append("//-+amra damian//_9\n")
   print("".join(data))

6/ f = open('someotherfile.txt', 'w')
   f.write("line1\nline2\nline3\n")
   f.close()

This is all you need to know to complete your task.
Jul 27 '07 #3
On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
> 4/ print("//-+alibaba sinage"[4:].startswith('a'))

print("//-+alibaba sinage".startswith('a', 4))

This does not create an extra string from the slicing.
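
To illustrate the point, here is a small check in Python 3 syntax that the two spellings agree. str.startswith(prefix, start) begins the comparison at index start; the helper name has_letter_after_prefix is made up for this example and is not from the thread.

```python
PREFIX = "//-+"  # record prefix taken from the question


def has_letter_after_prefix(line, letter, prefix=PREFIX):
    """True if line starts with prefix immediately followed by letter.

    Hypothetical helper, for illustration only.
    """
    # The second argument tells startswith() where to begin comparing,
    # so no sliced copy of the line is built.
    return line.startswith(prefix) and line.startswith(letter, len(prefix))


record = "//-+alibaba sinage"
# Both forms give the same answer; only the first one builds a slice.
print(record[4:].startswith("a"))
print(record.startswith("a", 4))
print(has_letter_after_prefix(record, "b"))
```

The first two lines print True and the last prints False.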

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #4
Marc 'BlackJack' Rintsch wrote:
> On Fri, 27 Jul 2007 12:15:25 +0200, Bruno Desthuilliers wrote:
>> 4/ print("//-+alibaba sinage"[4:].startswith('a'))
>
> print("//-+alibaba sinage".startswith('a', 4))
>
> This does not create an extra string from the slicing.

One learns every day...
Thanks Marc.
Jul 27 '07 #5
On Fri, 27 Jul 2007 02:28:27 -0700, Ira.Kovac wrote:
> I am working with 30K+ record datasets in flat file format (.txt) that
> look like this:
>
> //-+alibaba sinage
> //-+amra damian//_9
> //-+anix anire//_
> //-+borom
> //-+bokima sun drane
> //-+ciren
> //-+cop calestieon eded
> //-+ciciban
> //-+drago kimano sole

The example seems to be sorted. Is this true for the real data too? And
are there records that don't start with a-z or 0-9?

> a) By looping through the file the program should isolate all records
> that have the letter a following the //-+
> b) The isolated dataset will contain only records that start with //-+a
> c) Save the isolated dataset as a flat text file named a.txt
> d) Repeat a), b) and c) for all letters of the English alphabet (a through z)
> and numerical values (0 through 9)

This might be a little bit inefficient because the file gets read 36
times. If the data is already sorted you can use `itertools.groupby()` to
get the groups and write them to several files. Otherwise, if the files
can be read into memory completely, you can sort in memory and then use
`itertools.groupby()`.
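
A rough sketch of the sort-then-group variant in Python 3, assuming the whole file fits in memory. The names leading_char and split_sorted are invented for this example, and the restriction to a-z and 0-9 follows the question rather than anything groupby requires.

```python
import itertools
import os
import string

PREFIX = "//-+"  # record prefix taken from the question
WANTED = string.ascii_lowercase + string.digits


def leading_char(line, prefix=PREFIX):
    """Character right after the prefix, or '' if the line does not qualify."""
    if line.startswith(prefix) and len(line) > len(prefix):
        return line[len(prefix)]
    return ""


def split_sorted(infile, outdir="."):
    """Sort the records in memory, then write one file per leading character.

    Hypothetical helper: returns the characters written, in sorted order.
    """
    with open(infile) as records:
        # groupby() only merges adjacent items, so group members must be
        # next to each other; sorted() is stable and keeps the original
        # order of the records inside each group.
        lines = sorted(records, key=leading_char)
    written = []
    for ch, group in itertools.groupby(lines, key=leading_char):
        if ch and ch in WANTED:
            with open(os.path.join(outdir, ch + ".txt"), "w") as out:
                out.writelines(group)
            written.append(ch)
    return written
```

If the input is known to be grouped already, the sorted() call can be dropped and the file object passed to groupby() directly, so the file is streamed rather than loaded.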

Ciao,
Marc 'BlackJack' Rintsch
Jul 27 '07 #6
Ir*******@gmail.com wrote:
> I'd greatly appreciate if you can take a look at the task I need
> help with.
>
> It'd be outstanding if someone can provide some sample Python
> code.

Sure.

> CP: An inch of time is an inch of gold but you can't buy that inch
> of time with an inch of gold.

So, how much gold will I get for an "inch" of time?

Regards,
Björn

--
BOFH excuse #135:

You put the disk in upside down.

Jul 27 '07 #7
Thanks all for the input. This is going to be a great basis for
starting. And, yeah - I wish it was a homework.

Best,

Ira

Jul 27 '07 #8
