By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,971 Members | 900 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,971 IT Pros & Developers. It's quick & easy.

SUBSTRING for a regular expression

P: n/a
I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

--Berend Tober


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #1
Share this Question
Share on Google+
5 Replies


P: n/a
What language are you using for this plperl?

That command listed in the documentation certainly works in psql or
plpgsql but I don't know about plperl.

Are you assigning the result to a variable and then printing the result
of that variable before the function ends or printing based on what the
function returns?

I think you will probably need to show us the rest of the code in your
function.

On Mon, 2004-07-05 at 19:40, bt****@computer.org wrote:
I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

--Berend Tober


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #2

P: n/a
In article <64617.206.53.65.243.1089074434.squirrel@$HOSTNAME >,
<bt****@computer.org> writes:
I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column: Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer. Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance. Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt. Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely. Into separate columns for: name, age, address, charge. For example the
first record would have name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.' To get the name, for instance, I tried SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log; or the age value SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log; But return values are all NULL. Can anyone give me some RE help, please?


Could you use Perl? A Perl regexp for that would be

/^(.+), (\d+), of (.+?),? (was charged.+)$/
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #3

P: n/a

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged') AS
address, SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;


Aha! The old double-slash escape. Thank you very much.

--Berend Tober


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #4

P: n/a

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

- --
Greg Sabino Mullane gr**@turnstep.com
PGP Key: 0x14964AC8 200407062103
-----BEGIN PGP SIGNATURE-----

iD8DBQFA60wAvJuQZxSWSsgRAuKPAJ0QAeG0hdoJ/Ofqq/lXVtwMjyzQjACgoer3
kwPy0xvRiZxwr3cgPq6Rjwc=
=mF/C
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #5

P: n/a
> SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

- --
Greg Sabino Mullane gr**@turnstep.com
PGP Key: 0x14964AC8 200407062103


Thanks Greg,

I was hopeful that this would work, since I had missed the need to double
the back-slash escape character in my original work, but something still
isn't right.

First I got an error message that psql didn't like the "?" characters in
the RE, so I eliminated them and wrote

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*), was charged') AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;
This modified version of your suggestion gets the dow, and charge columns
right, but I'm still not seeing the vname, age, and address columns --
they return null. You've gotten me part way there, and I appreciate that.
Any further ideas?

--Berend Tober


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.