Bytes IT Community

Complex regular expression?

Hi All,

I couldn't find a regular expressions group to ask this in, so I
thought I'd ask here as I'm a little familiar with PHP's regular
expression syntax.

I have a comma delimited text file that I need to change to being tab
delimited.

My problem is that commas appear in the values of one of my columns,
and I'm trying to think of a graceful way of changing the other commas
(i.e. those that delimit fields, rather than those that appear within
the value of a field) to tabs without affecting the commas that appear
in the column in question.

An example of the contents of the file would be:

1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56
2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00

So, I'm trying to create a regular expression that will change all the
commas to tabs, except where the comma(s) appear within quotes.

I've tried several different approaches, including a three-step
process where I first change the commas that appear within quotes to a
known 'escape' value, then change all the remaining commas to tabs, then
change the 'escape' values back to commas, but I can't seem to
create a regular expression that will take into account the
possibility of several commas appearing between quotes.

I'm wondering if anyone can help me understand this better?

Many thanks in advance,

Murray
Jul 17 '05 #1
4 Replies


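A not so elegant way:

// $matches[1] is a run of unquoted text, $matches[2] the quoted field after it.
function to_tab($matches) {
    // Convert commas only in the unquoted part; keep the quoted field as-is.
    return strtr($matches[1], ",", "\t") . $matches[2];
}

// $s holds one line of the comma-delimited file.
$r = preg_replace_callback('/([^"]*)("?[^"]*"?)/', 'to_tab', $s);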
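
To see what the callback approach above does, here is a minimal runnable sketch applied to the first sample line from the question. The variable names are only illustrative, and it assumes quoted fields never contain an escaped double quote.

```php
<?php
// Callback from the reply above: $matches[1] is unquoted text,
// $matches[2] is the quoted field that follows it (possibly empty).
function to_tab(array $matches): string
{
    return strtr($matches[1], ",", "\t") . $matches[2];
}

// First sample line from the question.
$s = '1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56';

// Each match is "unquoted run + optional quoted field"; only the
// unquoted run has its commas turned into tabs.
$r = preg_replace_callback('/([^"]*)("?[^"]*"?)/', 'to_tab', $s);

echo $r, "\n";
```

The comma inside "BOOK, RETAIL" is left alone because it only ever lands in $matches[2], which the callback returns unchanged.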
Jul 17 '05 #2

Another way to solve this problem is to replace only the commas that are
NOT followed by a space, provided you can be sure that commas inside quotes
always have a space after them:

// The lookahead leaves the next character unconsumed, so empty fields (",,") work too.
$new_string = preg_replace('/,(?=\S)/', "\t", $string);
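
If the space after embedded commas cannot be guaranteed, one commonly used alternative is a lookahead that counts the double quotes remaining on the line. The following is only a sketch: the function name is hypothetical, and it assumes each line is processed on its own and that quotes are balanced within the line.

```php
<?php
// Hypothetical helper name; not part of the thread.
function delimiter_commas_to_tabs(string $line): string
{
    // Match a comma only when the rest of the line holds an even number
    // of double quotes, i.e. when the comma sits outside a quoted field.
    return preg_replace('/,(?=(?:[^"]*"[^"]*")*[^"]*$)/', "\t", $line);
}

// Second sample line from the question (two commas inside one field).
$line = '2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00';
$converted = delimiter_commas_to_tabs($line);

echo $converted, "\n";
```

The lookahead rescans the rest of the line for every comma, so this is best kept to reasonably short lines.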

*Hannes*

Jul 17 '05 #3

M Wells wrote:
> I have a comma delimited text file that I need to change to being tab
> delimited.
>
> My problem is that commas appear in the values of one of my columns,
> and I'm trying to think of a graceful way of changing the other commas
> (ie those that do indicate the delimitation of a field, rather than
> which appear within the value of a field) in the file to tabs without
> affecting the commas that appear in the column in question.


http://www.php.net/manual/en/function.fgetcsv.php

If you have commas inside the quoted fields, this function takes care of them
for you. You can also specify which delimiter to use (e.g. tab or comma).
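
As a rough illustration of the fgetcsv() route, the sketch below reads comma-delimited rows from one stream and writes them tab-delimited to another. The helper name csv_to_tsv and the in-memory streams are assumptions for the example (real code would open the actual files with fopen()), and it assumes no field contains a tab or a newline.

```php
<?php
// Hypothetical helper, not from the thread: copies CSV rows from $in
// to $out as tab-separated lines. Quotes around fields are dropped.
function csv_to_tsv($in, $out): void
{
    // Separator, enclosure and escape are passed explicitly.
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        fwrite($out, implode("\t", $fields) . "\n");
    }
}

// Demo on the sample rows from the question, using in-memory streams.
$in = fopen('php://memory', 'w+');
fwrite($in, '1,"1","20040301","08-08","BOOK, RETAIL",20.00,23.56' . "\n");
fwrite($in, '2,"1","20040301","03-09","BOOK, WHOLESALE, DISTRIBUTOR",15.99,22.00' . "\n");
rewind($in);

$out = fopen('php://memory', 'w+');
csv_to_tsv($in, $out);
rewind($out);
$tsv = stream_get_contents($out);

echo $tsv;
```

Letting fgetcsv() do the parsing avoids hand-written regular expressions for the quoting rules.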

Chris

--
Chris Hope
The Electric Toolbox Ltd
http://www.electrictoolbox.com/
Jul 17 '05 #4


Had a bit of a tinker, came up with this:

<?php
$x = '1,2,3,"some text in quotes",4,5,"some more, this time with a comma"';
// Non-greedy quantifier: each quoted string is matched on its own.
preg_match_all('/(".*?")/', $x, $r);
?>

$r[0] now looks like this:
Array
(
[0] => "some text in quotes"
[1] => "some more, this time with a comma"
)

As you can see, the non-greediness of the regexp is the key. Run
your original line through substr_replace() to get these strings replaced
with tokens, then carry on with the remaining steps of your three-step
approach (commas to tabs, then tokens back to the quoted strings).
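
A minimal sketch of that token idea follows. It swaps in preg_replace_callback() and strtr() rather than substr_replace(), and the function name and NUL-delimited token format are assumptions for illustration (the input is assumed to contain no NUL bytes).

```php
<?php
// Hypothetical helper, not from the thread.
function commas_to_tabs_with_tokens(string $line): string
{
    $saved = [];

    // Step 1: swap every quoted string for a token that contains no comma.
    $masked = preg_replace_callback(
        '/".*?"/',
        function (array $m) use (&$saved): string {
            $token = "\x00" . count($saved) . "\x00";
            $saved[$token] = $m[0];
            return $token;
        },
        $line
    );

    // Step 2: every comma that is left is a field delimiter.
    $tabbed = str_replace(',', "\t", $masked);

    // Step 3: put the original quoted strings back.
    return strtr($tabbed, $saved);
}

$x = '1,2,3,"some text in quotes",4,5,"some more, this time with a comma"';
$result = commas_to_tabs_with_tokens($x);

echo $result, "\n";
```

The non-greedy ".*?" ensures each quoted string gets its own token, so any number of commas between a pair of quotes is protected.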

HTH
Garp

Jul 17 '05 #5
