split and regexp on textfile

Flyzone

Hi,
I have a problem with the split function and regexps.
I have a file that I want to split using the date as the token.
Here is a sample:
-----
Mon Apr 9 22:30:18 2007
text
text
Mon Apr 9 22:31:10 2007
text
text
----

I'm trying to put all the lines into one string and then split it
(it would be better not to delete the \n, if possible...):
while 1:
    line = ftoparse.readline()
    if not line: break
    if line[-1]=='\n': line=line[:-1]
    file_str += line
matchobj=re.compile('[A-Z][a-z][a-z][ ][A-Z][a-z][a-z][ ][0-9| ][0-9][ ][0-9][0-9][:]')
matchobj=matchobj.split(file_str)
print matchobj

I have also tried:
matchobj=re.split(r"^[A-Z][a-z][a-z][ ][A-Z][a-z][a-z][ ][0-9| ][0-9][ ][0-9][0-9][:]",file_str)
and reading everything at once with:
file_str=ftoparse.readlines()
but the split doesn't work... where am I going wrong?

Apr 13 '07 #1

mik3l3374

On Apr 13, 3:59 pm, "Flyzone" <flyz...@technologist.com> wrote:

> I have a file that I want to split using the date as the token.

You're trying to match the date part, right? If re is what you want,
here's one example:

>>> import re
>>> data = open("file").read()
>>> pat = re.compile("[A-Z][a-z]{2} [A-Z][a-z]{2} \d{,2}\s+\d{,2}:\d{,2}:\d{,2} \d{4}",re.M|re.DOTALL)
>>> print pat.findall(data)
['Mon Apr 9 22:30:18 2007', 'Mon Apr 9 22:31:10 2007']

Apr 13 '07 #2

Flyzone

On 13 Apr, 10:40, mik3l3...@gmail.com wrote:

> You're trying to match the date part, right? If re is what you want,
> here's one example:

Umm... no! I need to get the text block between the two dates, not the
dates! :)

Apr 13 '07 #3

mik3l3374

On Apr 13, 4:55 pm, "Flyzone" <flyz...@technologist.com> wrote:

> Umm... no! I need to get the text block between the two dates, not the
> dates! :)

Change it to pat.split(data), then.
I get this:

['', '\ntext\ntext\n', '\ntext\ntext']

Apr 13 '07 #4
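For readers on Python 3, here is a small sketch of the same `findall`/`split` pair, run on an in-memory string shaped like the sample (the `data` literal stands in for the file and is an assumption). It also shows where the leading `''` in the split result comes from: `split` returns the text before the first match, which is empty when the input opens with a date, so the remaining pieces line up one-to-one with the dates.

```python
import re

# Assumed sample input, standing in for open("file").read().
data = """Mon Apr 9 22:30:18 2007
text
text
Mon Apr 9 22:31:10 2007
text
text
"""

# The pattern from the reply above, as a raw string. The re.M|re.DOTALL
# flags are dropped because this pattern uses neither ^/$ nor '.'.
pat = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} \d{,2}\s+\d{,2}:\d{,2}:\d{,2} \d{4}")

dates = pat.findall(data)   # the separators themselves
blocks = pat.split(data)    # the text around them; blocks[0] precedes the first date

# Skip blocks[0] so each date is paired with the block that follows it.
pairs = [(d, b.strip().splitlines()) for d, b in zip(dates, blocks[1:])]

print(dates)   # ['Mon Apr 9 22:30:18 2007', 'Mon Apr 9 22:31:10 2007']
print(blocks)  # ['', '\ntext\ntext\n', '\ntext\ntext\n']
```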

Flyzone

On 13 Apr, 11:14, mik3l3...@gmail.com wrote:

> Change it to pat.split(data), then.

That's what I tried originally... but it's not working. My result is
this:

["Mon Feb 26 11:25:04 2007\ntext\ntext\ntext\nMon Feb 26 11:25:16 2007\ntext\ntext\ntext\nMon Feb 26 17:06:41 2007\ntext"]

All together :(

Apr 13 '07 #5

bearophileHUGS

Flyzone:

> I have a problem with the split function and regexps.
> I have a file that I want to split using the date as the token.

My first try:

data = """
error text
Mon Apr 9 22:30:18 2007
text
text
Mon Apr 9 22:31:10 2007
text
text
Mon Apr 10 22:31:10 2007
text
text
"""

import re
date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")

section = []
for line in data.splitlines():
    if date_find.search(line):
        if section:
            print "\n" + "-" * 10 + "\n", "\n".join(section)
        section = [line]
    else:
        if line:
            section.append(line)

print "\n" + "-" * 10 + "\n", "\n".join(section)

itertools.groupby() is suited to splitting sequences like:
1111100011111100011100101011111
into:
11111 000 111111 000 111 00 1 0 1 0 11111
Here, though, we have a sequence like:
100001000101100001000000010000
that has to be split as:
10000 1000 10 1 10000 10000000 10000
A standard itertool could be added for this fairly common situation too.
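A Python 3 sketch of both cases. The first uses plain `groupby`, which starts a new group whenever the key changes. For the second, one workaround (an assumption here, not a standard itertool) is to key each element on the running count of 1s seen so far: that count changes exactly where a new group must start, much like the state flip in the Splitter class below.

```python
from itertools import accumulate, groupby

# Case 1: groupby splits a sequence into runs of equal elements.
bits = "1111100011111100011100101011111"
runs = ["".join(g) for _, g in groupby(bits)]
print(" ".join(runs))  # 11111 000 111111 000 111 00 1 0 1 0 11111

# Case 2: a new group must start at every "1". The running count of "1"s
# increases exactly at those positions, so it works as a groupby key.
marks = "100001000101100001000000010000"
ids = accumulate(int(ch == "1") for ch in marks)
pieces = ["".join(ch for _, ch in g)
          for _, g in groupby(zip(ids, marks), key=lambda pair: pair[0])]
print(" ".join(pieces))  # 10000 1000 10 1 10000 10000000 10000
```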

Along those lines, I have devised this different (and maybe over-engineered) version:
from itertools import groupby
import re

class Splitter(object):
    # Not tested much
    def __init__(self, predicate):
        self.predicate = predicate
        self.precedent_el = None
        self.state = True
    def __call__(self, el):
        if self.predicate(el):
            self.state = not self.state
        self.precedent_el = el
        return self.state

date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")
splitter = Splitter(date_find.search)

sections = ("\n".join(g) for h,g in groupby(data.splitlines(),
                                            key=splitter))
for section in sections:
    if section:
        print "\n" + "-" * 10 + "\n", section
The Splitter class plus groupby can become a single, simpler
generator, as in this version:
def grouper(seq, key=bool):
    # A fast identity function can be used instead of bool()
    # Not tested much
    group = []
    for part in seq:
        if key(part):
            if group: yield group
            group = [part]
        else:
            group.append(part)
    yield group

import re
date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")

for section in grouper(data.splitlines(), date_find.search):
    print "\n" + "-" * 10 + "\n", "\n".join(section)
Maybe that grouper could be modified to produce each group lazily, as
groupby does, instead of building a real list.
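The grouper above carries over to Python 3 as-is; only the driver's `print` changes. A quick check on a shortened copy of the sample data (the `data` literal is an assumption) shows that any text before the first date ends up in its own group:

```python
import re

def grouper(seq, key=bool):
    # Yield lists of items; a new list starts at each item for which key() is true.
    group = []
    for part in seq:
        if key(part):
            if group:
                yield group
            group = [part]
        else:
            group.append(part)
    yield group

# Shortened copy of the sample data, standing in for the file.
data = """error text
Mon Apr 9 22:30:18 2007
text
text
Mon Apr 10 22:31:10 2007
text
"""

date_find = re.compile(r"\d\d:\d\d:\d\d \d{4}$")
sections = list(grouper(data.splitlines(), date_find.search))
for section in sections:
    print("-" * 10)
    print("\n".join(section))
```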
Flyzone (seen later):

> Umm... no! I need to get the text block between the two dates, not the dates! :)

Then you can modify the code like this:

def grouper(seq, key=bool):
    group = []
    for part in seq:
        if key(part):
            if group: yield group
            group = [] # changed
        else:
            group.append(part)
    yield group

Bye,
bearophile

Apr 13 '07 #6

Flyzone

On 13 Apr, 11:30, "Flyzone" <flyz...@technologist.com> wrote:

> All together :(

Damn, my regexp was wrong:
pat = re.compile("[A-Z][a-z][a-z][ ][A-Z][a-z][a-z][ ][0-9| ][0-9][ ][0-9][0-9][:][0-9][0-9]",re.M|re.DOTALL)

Now it's working! :)
Great! Thanks a lot for the help!

A little question: can pat.split split without deleting the date?

Apr 13 '07 #7

mik3l3374

On Apr 13, 6:08 pm, "Flyzone" <flyz...@technologist.com> wrote:

> A little question: can pat.split split without deleting the date?

Not that I know of.

Apr 13 '07 #8
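For reference, the `re` module documentation says that when the pattern passed to `split` contains capturing parentheses, the text of the groups is also returned in the result list; and since Python 3.7, `re.split` also accepts patterns that can match an empty string, such as a lookahead. A minimal Python 3 sketch of both, assuming a small in-memory `data` string in place of the file:

```python
import re

# Assumed sample input, standing in for the file contents.
data = """Mon Apr 9 22:30:18 2007
text
text
Mon Apr 9 22:31:10 2007
text
text
"""

# The date pattern from earlier in the thread, wrapped in one capturing
# group so that re.split also returns each matched date.
pat = re.compile(r"([A-Z][a-z]{2} [A-Z][a-z]{2} \d{,2}\s+\d{,2}:\d{,2}:\d{,2} \d{4})")
parts = pat.split(data)
# parts[0] is the text before the first date; after it, dates and blocks alternate.
pairs = list(zip(parts[1::2], parts[2::2]))

# Python 3.7+: split on a zero-width lookahead at the start of each date line,
# which leaves every date attached to the front of its own block.
blocks = re.split(r"(?m)^(?=[A-Z][a-z]{2} [A-Z][a-z]{2} )", data)

for date, block in pairs:
    print(repr(date), repr(block))
```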

Gabriel Genellina

On Fri, 13 Apr 2007 07:08:05 -0300, Flyzone <fl*****@technologist.com>
wrote:

> A little question: can pat.split split without deleting the date?

No, but instead of reading the whole file and splitting on dates, you
could iterate over the file and detect block endings:

def split_on_dates(ftoparse):
    # fancy_date_regexp: placeholder for a compiled date regexp
    block = None
    for line in ftoparse:
        if fancy_date_regexp.match(line):
            # a new block begins, yield the previous one
            if block is not None:
                yield current_date, block
            current_date = line
            block = []
        elif block is not None:
            # accumulate lines for current block
            # (lines before the first date are skipped)
            block.append(line)
    # don't forget the last block
    if block is not None:
        yield current_date, block

for date, block in split_on_dates(ftoparse):
    pass  # process block

--
Gabriel Genellina

Apr 15 '07 #9
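A self-contained Python 3 sketch of the line-by-line approach from the last reply. `DATE_RE` is a hypothetical stand-in for the unspecified `fancy_date_regexp`, and `io.StringIO` stands in for the open file. Unlike the original sketch, this version strips the trailing newline from each line and skips any lines before the first date; it never holds the whole file in memory, and each date stays paired with its block.

```python
import io
import re

# Hypothetical date pattern (an assumption); match() anchors it at line start.
DATE_RE = re.compile(r"[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d \d{4}$")

def split_on_dates(lines, date_re=DATE_RE):
    """Yield (date_line, block_lines) pairs; lines before the first date are skipped."""
    current_date = None
    block = None
    for line in lines:
        line = line.rstrip("\n")
        if date_re.match(line):
            # a new block begins, so hand back the previous one
            if block is not None:
                yield current_date, block
            current_date = line
            block = []
        elif block is not None:
            block.append(line)
    # don't forget the last block
    if block is not None:
        yield current_date, block

# io.StringIO stands in for the real file object (ftoparse in the thread).
sample = io.StringIO(
    "error text\n"
    "Mon Apr 9 22:30:18 2007\n"
    "text\n"
    "text\n"
    "Mon Apr 9 22:31:10 2007\n"
    "text\n"
)
result = list(split_on_dates(sample))
for date, block in result:
    print(date, block)
```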
