Regular expression for file name

Hello All,

In a configuration file there can be ID's and filename tokens.
The file names have a known suffix (.o or .mls) and I need to get a regular
expression that will catch filename but not an ID.

Currently:
ID = r"[a-zA-Z\.]\w+(?![/\\])"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"

However if I have the filename "Sources/kernel/rom_kernel.mls" then
"Source" is interrupted as ID and "s/kernel/rom_kernel.mls" is interrupted
as file name.
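
The split can be reproduced with plain `re`, outside PLY. This is only a sketch: it assumes the lexer tries ID before FILENAME at each position, which is a guess about the setup rather than something checked against PLY itself.

```python
import re

ID = r"[a-zA-Z\.]\w+(?![/\\])"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"

text = "Sources/kernel/rom_kernel.mls"

# Assumption: ID gets the first try at the start of the input.
# \w+ first grabs "ources", but the next character is "/", so the
# negative lookahead fails. The engine backtracks one character to
# "Source"; the next character is now "s", so the lookahead passes.
id_match = re.match(ID, text)
print(id_match.group())     # Source

# FILENAME then picks up where ID stopped.
rest = re.match(FILENAME, text[id_match.end():])
print(rest.group())         # s/kernel/rom_kernel.mls
```

So the negative lookahead does not reject the token; it only makes the match one character shorter.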

Any way to do better?

BTW: I'm using PLY (http://systems.cs.uchicago.edu/ply/) for parsing.

Bye.
--
------------------------------------------------------------------------
Miki Tebeka <mi*********@zoran.com>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
Jul 18 '05 #1


On Sun, 18 Jul 2004 14:21:14 +0200, "Miki Tebeka" <mi*********@zoran.com> wrote:
> Hello All,
>
> In a configuration file there can be ID's and filename tokens.
> The file names have a known suffix (.o or .mls) and I need to get a regular
> expression that will catch filename but not an ID.
>
> Currently:
> ID = r"[a-zA-Z\.]\w+(?![/\\])"
> FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"
>
> However if I have the filename "Sources/kernel/rom_kernel.mls" then
> "Source" is interrupted as ID and "s/kernel/rom_kernel.mls" is interrupted
> as file name.

ITYM s/interrupted/interpreted/ ;-)

> Any way to do better?

If you want to prioritize matching amongst several
patterns with some leading commonality, UIAM or'ed terms get
tried left to right. I'm not checking your terms, but I think
here's a possible way to give priority to the FILENAME
pattern:
>>> import re
>>> ID = r"[a-zA-Z\.]\w+(?![/\\])"
>>> FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"
>>> COMBINED = '(?P<file>%s)|(?P<id>%s)' % (FILENAME, ID)
>>> rxo = re.compile(COMBINED)
>>> filename = "Sources/kernel/rom_kernel.mls"
>>> rxo.search(filename).groupdict()
{'id': None, 'file': 'Sources/kernel/rom_kernel.mls'}

Try it with an id:
>>> rxo.search('no_slashes_in_this').groupdict()
{'id': 'no_slashes_in_this', 'file': None}

Of course you can mess with the result, e.g.,
>>> result = rxo.search('no_slashes_in_this').groupdict()
>>> result['id']
'no_slashes_in_this'
>>> result['file']
>>> result['file'] is None
True
>>> result['id'], result['file']
('no_slashes_in_this', None)
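
To see that the order of the or'ed terms is what matters, here is a rough sketch that runs both orderings over a line holding a filename and an id. The `tokenize` helper and the sample line are made up for illustration; only `re` is used, not PLY.

```python
import re

ID = r"[a-zA-Z\.]\w+(?![/\\])"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"

# FILENAME listed first: it gets the first try at every position.
file_first = re.compile('(?P<file>%s)|(?P<id>%s)' % (FILENAME, ID))
# Same two terms with ID listed first, for comparison.
id_first = re.compile('(?P<id>%s)|(?P<file>%s)' % (ID, FILENAME))

def tokenize(rx, text):
    """Hypothetical helper: list (kind, text) for each match in text."""
    tokens = []
    for m in rx.finditer(text):
        kind = 'file' if m.group('file') is not None else 'id'
        tokens.append((kind, m.group()))
    return tokens

line = "Sources/kernel/rom_kernel.mls no_slashes_in_this"

print(tokenize(file_first, line))
# [('file', 'Sources/kernel/rom_kernel.mls'), ('id', 'no_slashes_in_this')]

print(tokenize(id_first, line))
# [('id', 'Source'), ('file', 's/kernel/rom_kernel.mls'), ('id', 'no_slashes_in_this')]
```

With ID first you get the original "Source" split back, so whatever builds the master pattern has to put FILENAME ahead of ID.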

No guarantees, but HTH

Regards,
Bengt Richter
Jul 18 '05 #2

On Sun, 18 Jul 2004, Miki Tebeka wrote:
> In a configuration file there can be ID's and filename tokens.
> The file names have a known suffix (.o or .mls) and I need to get a regular
> expression that will catch filename but not an ID.
>
> Currently:
> ID = r"[a-zA-Z\.]\w+(?![/\\])"
> FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))"
>
> However if I have the filename "Sources/kernel/rom_kernel.mls" then
> "Source" is interrupted as ID and "s/kernel/rom_kernel.mls" is interrupted
> as file name.


I'm not familiar with PLY, but my guess is that you get those results
because it tries to match ID first, and then FILENAME. The best way to
solve this is to add another constraint to your RE, namely the delimiter
at the end of the pattern (presumably whitespace):

ID = r"[a-zA-Z\.]\w+(?=\s)"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s)"

I'm not sure if PLY supports (?=...) or not, but I assume it does, since
you used its complement ((?!...)) in your original REs.
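
A quick check of these two patterns with plain `re` (not PLY), as a sketch. The sample line is made up, and the `ID_EOL` variant with `(?=\s|$)` is an addition that assumes a token may also sit at the very end of the input with no trailing whitespace.

```python
import re

ID = r"[a-zA-Z\.]\w+(?=\s)"
FILENAME = r"([a-zA-Z]:)?[\w./\\]+\.((mls)|(o))(?=\s)"

line = "Sources/kernel/rom_kernel.mls some_id\n"

# ID can no longer stop inside the path: no prefix of "Sources"
# is followed by whitespace, so the match fails outright.
print(re.match(ID, line))                  # None
print(re.match(FILENAME, line).group())    # Sources/kernel/rom_kernel.mls

# Caveat: (?=\s) needs an actual whitespace character, so the last
# token of an input without a trailing newline is rejected.
print(re.match(ID, "some_id"))             # None

# Hypothetical variant that also accepts end of input.
ID_EOL = r"[a-zA-Z\.]\w+(?=\s|$)"
print(re.match(ID_EOL, "some_id").group()) # some_id
```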

Jul 18 '05 #3
