A vote for re scanner

Every couple of months I have a use for the experimental 'scanner'
object in the re module, and when I do, as I did this morning, it's
really handy. So if anyone is counting votes for making it a standard
part of the module, here's my vote:

+1

-- Wade Leftwich
Ithaca, NY
Jul 18 '05 #1
wa**@lightlink.com (Wade Leftwich) wrote in message news:<5b**************************@posting.google.com>...
Every couple of months I have a use for the experimental 'scanner'
object in the re module, and when I do, as I did this morning, it's
really handy. So if anyone is counting votes for making it a standard
part of the module, here's my vote:


While I don't think they're still accepting votes :), you've pointed
me to something I didn't know about until now. What kinds of things
have you been using re.Scanner for?

Jeremy
Jul 18 '05 #2
tw*********@hotmail.com (Jeremy Fincher) wrote in message news:<69**************************@posting.google.com>...
wa**@lightlink.com (Wade Leftwich) wrote in message news:<5b**************************@posting.google.com>...
Every couple of months I have a use for the experimental 'scanner'
object in the re module, and when I do, as I did this morning, it's
really handy. So if anyone is counting votes for making it a standard
part of the module, here's my vote:


While I don't think they're still accepting votes :), you've pointed
me to something I didn't know about until now. What kinds of things
have you been using re.Scanner for?

Jeremy


A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.
Jul 18 '05 #3
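
For readers who have not met the scanner, here is a minimal sketch of the usage described above, written for current Python 3. The pattern and the sample string are borrowed from a later post in this thread. Note that the scanner() method of a compiled pattern is undocumented, so treat its behaviour as a CPython implementation detail rather than a guaranteed API.

```python
import re

# Pattern and sample string are taken from later in this thread.
regex = re.compile(r'(?P<before>.)a(?P<after>.)')
s = '1ab2ac3ad'

# scanner() is undocumented; this assumes CPython's re module.
scanner = regex.scanner(s)

pairs = []
while True:
    m = scanner.search()  # next match object, or None when exhausted
    if m is None:
        break
    pairs.append((m.group('before'), m.group('after')))

print(pairs)
```

Each call to search() resumes where the previous match ended, which is what makes the object convenient for strings with several matches.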
On 12 Nov 2003 13:04:36 -0800, wa**@lightlink.com (Wade Leftwich)
wrote:
tw*********@hotmail.com (Jeremy Fincher) wrote in message news:<69**************************@posting.google.com>...
wa**@lightlink.com (Wade Leftwich) wrote in message news:<5b**************************@posting.google.com>...
> Every couple of months I have a use for the experimental 'scanner'
> object in the re module, and when I do, as I did this morning, it's
> really handy. So if anyone is counting votes for making it a standard
> part of the module, here's my vote:


While I don't think they're still accepting votes :), you've pointed
me to something I didn't know about until now. What kinds of things
have you been using re.Scanner for?

Jeremy


A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.


Or in Eric's case, *the* roller skate.
--dang
Jul 18 '05 #4
Wade Leftwich wrote:
...
A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.


....if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...
Alex

Jul 18 '05 #5
Alex Martelli <al***@aleax.it> wrote:
Wade Leftwich wrote:
...
A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.


...if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...
Alex


Indeed:

>>> import re
>>> class CoolerScanner(object):
...     def __init__(self, regex, s):
...         self.scanner = regex.scanner(s)
...     def next(self):
...         m = self.scanner.search()
...         if m:
...             return m
...         else:
...             raise StopIteration
...     def __iter__(self):
...         while 1:
...             yield self.next()
...
>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in CoolerScanner(regex, s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d


-- Wade
Jul 18 '05 #6
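
The wrapper above is written for Python 2. A rough Python 3 equivalent is sketched below, under two assumptions: the iterator method is named __next__ rather than next, and StopIteration is raised from __next__ directly, because since Python 3.7 a StopIteration raised inside a generator body is turned into a RuntimeError. The class name is kept from the post; the still undocumented scanner() method is assumed to behave as described earlier in the thread.

```python
import re


class CoolerScanner:
    """Iterate over successive matches using the undocumented scanner()."""

    def __init__(self, regex, s):
        self.scanner = regex.scanner(s)

    def __iter__(self):
        # The object is its own iterator.
        return self

    def __next__(self):
        m = self.scanner.search()
        if m is None:
            raise StopIteration
        return m


regex = re.compile(r'(?P<before>.)a(?P<after>.)')
s = '1ab2ac3ad'
pairs = [(m.group('before'), m.group('after')) for m in CoolerScanner(regex, s)]
print(pairs)
```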
Wade Leftwich wrote:
>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in CoolerScanner(regex, s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d

>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in regex.finditer(s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d

</F>


Jul 18 '05 #7
Alex Martelli wrote:
Wade Leftwich wrote:
...
A scanner is constructed from a regex object and a string to be
scanned. Each call to the scanner's search() method returns the next
match object of the regex on the string. So to work on a string that
has multiple matches, it's the bee's roller skates.


...if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...


re.finditer

</F>


Jul 18 '05 #8
Fredrik Lundh wrote:
Alex Martelli wrote:
Wade Leftwich wrote:
...
> A scanner is constructed from a regex object and a string to be
> scanned. Each call to the scanner's search() method returns the next
> match object of the regex on the string. So to work on a string that
> has multiple matches, it's the bee's roller skates.


...if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...


re.finditer


Yep. So the scanner isn't warranted any longer, right?
Alex

Jul 18 '05 #9
"Fredrik Lundh" <fr*****@pythonware.com> wrote in message news:<ma************************************@python.org>...
Wade Leftwich wrote:
> >>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
> >>> s = '1ab2ac3ad'
> >>> for m in CoolerScanner(regex, s):
> ...     print m.group('before'), m.group('after')
> ...
> 1 b
> 2 c
> 3 d

>>> regex = re.compile(r'(?P<before>.)a(?P<after>.)')
>>> s = '1ab2ac3ad'
>>> for m in regex.finditer(s):
...     print m.group('before'), m.group('after')
...
1 b
2 c
3 d

</F>


There I go, reimplementing the wheel again. Guess I didn't pay enough
attention to "What's New In 2.2". Thanks for the pointer. It appears
we don't need that scanner() method after all.

However, from my point of view it was a good exercise, because now I
know how easy it is to make an iterator.

Thanks again

-- Wade
Jul 18 '05 #10
Alex Martelli wrote:
...if that method's name was 'next' (and an appropriate __iter__
also present) it might be even cooler, though...


re.finditer


Yep. So the scanner isn't warranted any longer, right?


if you remove it, you'll break re.Scanner.

</F>


Jul 18 '05 #11
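
The re.Scanner mentioned in the last reply is a different thing from the scanner() method discussed so far: it is a small, also undocumented, tokenizer class that takes a list of (pattern, action) pairs. The sketch below shows the general shape only; the lexicon and the token names NUMBER and WORD are invented for illustration and do not come from this thread.

```python
import re

# Each lexicon entry is (pattern, action). The action receives the
# scanner and the matched text; an action of None drops the match.
lexicon = [
    (r'\d+', lambda scanner, token: ('NUMBER', int(token))),
    (r'[A-Za-z]+', lambda scanner, token: ('WORD', token)),
    (r'\s+', None),  # skip whitespace
]

# scan() returns the list of action results plus any unmatched tail.
tokens, remainder = re.Scanner(lexicon).scan('15 KG SUGARPLUM')
print(tokens)
print(repr(remainder))
```

A non-empty remainder means the scanner hit text that none of the patterns matched, which is the usual place to report an error.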
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.
Any ideas would be greatly appreciated.

Allan
Jul 18 '05 #12
On Wed, 04 Feb 2004 19:35:52 GMT, allanc
<ka***********@nospamyahoo.ca> wrote:
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.


Are the key fields in fixed positions? If so, pluck them out and use
them as an index into a dictionary of functions to call. I can't tell
from your example where the keys are, so I'm assuming the first 8 are
simply a line number and the next 4 are the key.

Maybe something along these lines:

def header(x):
    print 'header: %s' % x # process header

def testinstruction(x):
    print 'test instruction: %s' % x # process test instruction

def lineitem(x):
    print 'lineitem: %s' % x # process line item

ptable = {'0190':header, '5549': testinstruction, '2069': lineitem}

for line in file("data.dat"):
    ptable[line[8:12]](line)

--dang
Jul 18 '05 #13
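
The same dispatch-table idea can be sketched in Python 3 as follows. It keeps the post's guess that the key is the four characters at line[8:12]; the handler names, the returned tuples, and the error raised for an unknown key are assumptions added for illustration. The sample lines are the three records quoted in the question, as they appear in this archive.

```python
def header(line):
    return ('header', line)


def text_instruction(line):
    return ('text instruction', line)


def line_item(line):
    return ('line item', line)


# Keys are the four characters at line[8:12], as guessed in the post above.
HANDLERS = {'0190': header, '5549': text_instruction, '2069': line_item}


def dispatch(line):
    """Look up the handler for a record and call it."""
    key = line[8:12]
    handler = HANDLERS.get(key)
    if handler is None:
        raise ValueError('unknown record key %r' % key)
    return handler(line)


sample = [
    '01508390019002 11284361000002SUGARPLUM',
    '015083915549 SHORT ON LAST ORDER',
    '0150839220692 000002EA BMC 15 KG 001400',
]
results = [dispatch(line) for line in sample]
for kind, line in results:
    print(kind, '->', line)
```

Using dict.get with an explicit error, rather than indexing the table directly, gives a clearer message when the file contains a record type nobody planned for.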
allanc wrote:
Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.


I've written many programs to parse data very similar to this,
until I generalized the algorithm (a line-oriented state machine)
into a module. You can find the module (internally documented)
at http://docutils.sf.net/docutils/statemachine.py.

Hope it helps!

--
David Goodger http://python.net/~goodger
For hire: http://python.net/~goodger/cv
Jul 18 '05 #14


allanc wrote:
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.
Any ideas would be greatly appreciated.

Allan

allanc,
- slices, as in str[0:5] or str[5:] or str[5:-1], get pieces of a string
- you'll probably want to strip leading/trailing spaces; see the string docs
- you may need to cast/convert:
    _int = int("55")
    _float = float("4.2")
wes

Jul 18 '05 #15


allanc wrote:
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.
Any ideas would be greatly appreciated.

Allan


Allan,
Maybe this will help more:
>>> line = "015083915549 SHORT ON LAST ORDER 0150839220692"
>>> print line[0:10]
0150839155
>>> print line[:10]
0150839155
>>> print line[5:10]
39155
>>> print line[-10:-1]
083922069
>>> print int(line[-10:-1])
83922069
>>> print " xyz ".strip()
xyz

wes

Jul 18 '05 #16
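
Slicing, stripping and converting can be combined into one small helper driven by a layout table. This is only a sketch in Python 3: the field names, column positions and converters in LAYOUT are hypothetical, since the real record layout is not given anywhere in the thread.

```python
# Hypothetical layout: (field name, start column, end column, converter).
# The real column positions are not stated in the thread.
LAYOUT = [
    ('recnum', 0, 8, str),
    ('rectype', 8, 10, str),
    ('seq', 10, 12, int),
    ('text', 12, None, str.strip),  # None means "to the end of the line"
]


def parse_fixed(line, layout):
    """Slice each field out of the line and apply its converter."""
    return {name: conv(line[start:end]) for name, start, end, conv in layout}


record = parse_fixed('015083915549 SHORT ON LAST ORDER', LAYOUT)
print(record)
```

Keeping the column positions in data rather than in code means a second record type only needs a second layout list.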
"allanc" <ka***********@nospamyahoo.ca> wrote in message
news:Xn******************************@198.161.157.145...
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.
Any ideas would be greatly appreciated.

Allan

Allan -

Let me put in a plug for pyparsing. I think your problem is tailor-made for
pyparsing's easy-to-use grammar definitions and execution. No special
lex/yacc-like syntax or RE symbology to master; you assemble your grammar
using simply-named classes (such as Literal, OneOrMore, Word(wordchars),
Optional, etc.) and intuitive operators (+ for sequence, | for greedy
alternation, ^ for longest-match alternation, ~ for, um, Not-tion).

A grammar to parse "Hello, World!" might look like:
    helloGrammar = Word(alphas) + "," + Word(alphas) + oneOf(". ! ? !! !!!")
which could then parse any of:
    Hello, World!
    Hello , World !
    Hello,World!
    Yo, Adrian!!!
    Hey, man.
    Whattup, dude?

You can associate field names with specific parse elements, so that the
fields can be extracted from the results such as:
    helloGrammar = Word(alphas).setResultsName("greeting") + "," + \
                   Word(alphas).setResultsName("to") + oneOf(". ! ? !! !!!")
    results = helloGrammar.parseString( greetingstring )
    print results.greeting
    print results.to

You can associate parse actions (a la SAX) to fire when matching parse
elements are matched in the input.

You can find the pyparsing home page at http://pyparsing.sourceforge.net.

-- Paul McGuire
Jul 18 '05 #17
I think one of the easiest ways to do this is to
write a class that knows how to parse each of the
unique lines. As you are reading through the file/table
and encounter a line like the first, create a new
class instance and pass it the line's contents. The
__init__ method of the class can parse the line and
place each of the field values in an attribute of the
class.

Something like (this is pseudocode):

class linetype01:
    #
    # Define a list that contains information about how to
    # parse a single linetype. The info is fieldname,
    # beginning column, fieldlength
    #
    _parsinginfo=[('recnum',0,8),
                  ('linetype',8,3),
                  ('dataitem2',11,3),
                  ...]

    def __init__(self, linetext):
        self.linetext=linetext
        for fieldname, begincol, fieldlength in self._parsinginfo:
            # slice from the beginning column for fieldlength characters
            self.__dict__[fieldname]=linetext[begincol:
                                              begincol+fieldlength]
        return

you would define a class like this for each unique linetype

in main program

import sys

#
# Insert code to open file/table here
#
for line in table:
    #
    # See which linetype it is
    #
    linetype=line[8:10]
    if linetype == "01":
        pline=linetype01(line)
        #
        # Now you can extract the values by accessing attributes of
        # the class.
        #
        recordnum=pline.recnum
        tlinetype=pline.linetype
        #
        # Do something with the values
        #
    elif linetype == "55":
        pline=linetype55(line)

    elif linetype == "20":
        pline=linetype20(line)
    else:
        print "ERROR-Illegal linetype encountered"
        sys.exit(2)

Just one of many ways to solve this problem.

-Larry
"allanc" <ka***********@nospamyahoo.ca> wrote in message
news:Xn******************************@198.161.157.145...
I'm new with python so bear with me.

I'm looking for a way to elegantly parse fixed-width text data (as opposed
to CSV) and saving the parsed data unto a database. The text data comes
from an old ISAM-format table and each line may be a different record
structure depending on key fields in the line.

RegExp with match and split are of interest but it's been too long since
I've dabbled with RE to be able to judge whether its use will make the
problem more complex.

Here's a sample of the records I need to parse:

01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400

1st Line is a (portion of) header record.
2nd Line is an text instruction record.
3rd Line is a Transaction Line Item record.

Each type of record has a different structure. But these set of lines
appear in the one table.
Any ideas would be greatly appreciated.

Allan

Jul 18 '05 #18
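
The class-per-linetype pseudocode above could be made runnable in Python 3 along the following lines. This is a sketch under stated assumptions: the class names, the shared Record base class, the RECORD_TYPES registry, and every column position in FIELDS are hypothetical; only the idea of slicing fields into attributes and choosing the class from line[8:10] comes from the post.

```python
class Record:
    """Base class: slices each field named in FIELDS into an attribute."""

    FIELDS = ()  # (field name, start column, field length) tuples

    def __init__(self, line):
        self.line = line
        for name, start, length in self.FIELDS:
            setattr(self, name, line[start:start + length])


class HeaderRecord(Record):
    # Hypothetical columns; the real layout is not given in the thread.
    FIELDS = (('recnum', 0, 8), ('linetype', 8, 2))


class TextRecord(Record):
    FIELDS = (('recnum', 0, 8), ('linetype', 8, 2), ('text', 12, 68))


# Registry replacing the if/elif chain: linetype code -> record class.
RECORD_TYPES = {'01': HeaderRecord, '55': TextRecord}


def parse(line):
    """Build the record object that matches the line's type code."""
    linetype = line[8:10]
    try:
        cls = RECORD_TYPES[linetype]
    except KeyError:
        raise ValueError('illegal linetype %r' % linetype) from None
    return cls(line)


rec = parse('015083915549 SHORT ON LAST ORDER')
print(type(rec).__name__, rec.recnum, rec.text.strip())
```

Raising an exception instead of calling sys.exit() leaves the decision about how to handle a bad line to the caller.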
> 01508390019002 11284361000002SUGARPLUM
015083915549 SHORT ON LAST ORDER
0150839220692 000002EA BMC 15 KG 001400


Is the above the format of all possible lines (aside from empty lines)?

- Josiah
Jul 18 '05 #19
