Regular expression, (preg_split etc...), some help please.

Hi,

I need some help to split data using regular expression

Consider the string

'1,2,3', I can split it using, preg_split("/,/", '1,2,3') and i correctly
get [0]=1, [1]=2,[2]=3.

Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and i
correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.

So how can I write a regular expression that matches a comma only when it is
*not within* double quotes?

I think I might have to write my own split function, especially if I have an
extreme case like '1," 2 , \" 3"' (note the escaped double quote).

Many thanks for your input.

Sims

Jul 17 '05 #1
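
The hand-written split function mentioned in the question can be sketched as a small character-by-character scanner. This is only an illustration under stated assumptions: split_quoted() is a hypothetical helper name (not a PHP built-in), the delimiter is a comma, fields are enclosed in double quotes, and a backslash protects the character that follows it. Like the results shown in the question, it leaves the surrounding quotes in place.

```php
<?php
/**
 * Split a line on a delimiter, ignoring delimiters inside double quotes.
 * Hypothetical helper for illustration; quotes and backslashes are kept.
 */
function split_quoted(string $line, string $delimiter = ',', string $quote = '"'): array
{
    $fields   = [];
    $current  = '';
    $inQuotes = false;
    $length   = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Copy the backslash and the escaped character without inspecting it.
            $current .= $char . $line[++$i];
        } elseif ($char === $quote) {
            // An unescaped quote toggles the "inside quotes" state.
            $inQuotes = !$inQuotes;
            $current .= $char;
        } elseif ($char === $delimiter && !$inQuotes) {
            // A delimiter outside quotes ends the current field.
            $fields[] = $current;
            $current  = '';
        } else {
            $current .= $char;
        }
    }

    $fields[] = $current; // the last field has no trailing delimiter
    return $fields;
}

print_r(split_quoted('1," 2 , \" 3"'));
```

With the extreme case from the question this yields two items: 1 and " 2 , \" 3". Stripping the quotes or trimming whitespace, if wanted, would be a separate step on each field.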
In article <c0*************@ID-162430.news.uni-berlin.de>,
"Sims" <si*********@hotmail.com> wrote:
Consider the string

'1,2,3', I can split it using, preg_split("/,/", '1,2,3') and i correctly
get [0]=1, [1]=2,[2]=3.

Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and i
correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.


Is the overall goal to extract numbers having arbitrary separators? If so,
how about splitting on "/\D+/"?

--
CC
Jul 17 '05 #2
In article <c0*************@ID-162430.news.uni-berlin.de>,
"Sims" <si*********@hotmail.com> wrote:
Consider the string

'1,2,3', I can split it using, preg_split("/,/", '1,2,3') and i correctly
get [0]=1, [1]=2,[2]=3.

Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and i
correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.


(Oops, never mind that last post.)

--
CC
Jul 17 '05 #3
On Thu, 12 Feb 2004 08:49:59 +0200, Sims wrote:
I need some help to split data using regular expression


No you don't :) http://www.php.net/fgetcsv
Jul 17 '05 #4

"Ewoud Dronkert" <me@privacy.net> wrote in message
news:n3********************************@4ax.com...
On Thu, 12 Feb 2004 08:49:59 +0200, Sims wrote:
I need some help to split data using regular expression


No you don't :) http://www.php.net/fgetcsv


Sorry but that is reading from a file, i am reading from a line.
I do want the same sort of output but not from a file.

I will have a look at the C code to see if I can create a PHP function to
achieve what I need.

Sims.
Jul 17 '05 #5
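
For a line that is already held in a string, later PHP releases (5.3 and up) provide str_getcsv(), the string counterpart of fgetcsv(). A minimal sketch, assuming comma separators, double-quote enclosures and a backslash escape character, using the sample strings from the question:

```php
<?php
// Sample lines taken from the question.
$lines = ['1,2,3', '1,"2,3"', '1," 2 , 3"'];

foreach ($lines as $line) {
    // Separator, enclosure and escape are passed explicitly rather than
    // relying on the defaults.
    $fields = str_getcsv($line, ',', '"', '\\');
    print_r($fields);
}
```

Unlike the results shown in the question, str_getcsv() removes the enclosing double quotes, so the second field of '1,"2,3"' comes back as 2,3.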
"Sims" <si*********@hotmail.com> wrote in message news:<c0*************@ID-162430.news.uni-berlin.de>...
Hi,

I need some help to split data using regular expression

Consider the string

'1,2,3', I can split it using, preg_split("/,/", '1,2,3') and i correctly
get [0]=1, [1]=2,[2]=3.

Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and i
correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.

So how can I write a regular expression that matches a comma only when it is
*not within* double quotes?

I think I might have to write my own split function, especially if I have an
extreme case like '1," 2 , \" 3"' (note the escaped double quote).

Many thanks for your input.

Sims


My guess is following RegExp.

[SNIP]

$str = '1," 2 ,\" 3 ", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches);
print_r($matches);

[/SNIP]

Try and let me know if it does not work as per your requirements.

--
Cheers,
Rahul Anand
Jul 17 '05 #6

My guess is following RegExp.

[SNIP]

$str = '1," 2 ,\" 3 ", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches); print_r($matches);

[/SNIP]


If it is a guess then it is brilliant, thanks : ).

It works 99.9%.

The only time it does not work is when you have something like $str =
'1," 2 ,\" 3a", 4, 5'; (note the 'a' after the '3').
Your RegExp drops the last character (the letter).
Your expression is very complicated for me, so I am going to spend some
time trying to work it all out.

Thanks a million.
Jul 17 '05 #7
On Thu, 12 Feb 2004 11:42:08 +0200, Sims wrote:
No you don't :) http://www.php.net/fgetcsv


Sorry but that is reading from a file, i am reading from a line.


So post the data to a script, open a filepointer to "php://input" and
read from that. Or write data to diskfile and read from diskfile (duh).
Jul 17 '05 #8
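
The file-pointer idea does not have to touch the disk. A sketch, assuming a PHP version that offers the php://memory stream wrapper (5.1 and up); parse_csv_line() is a hypothetical helper name, and the separator, enclosure and escape characters are assumptions:

```php
<?php
/**
 * Parse one CSV line by feeding it to fgetcsv() through an in-memory stream.
 * Hypothetical helper for illustration only.
 */
function parse_csv_line(string $line): array
{
    $handle = fopen('php://memory', 'r+'); // read/write stream held in memory
    fwrite($handle, $line);
    rewind($handle);                       // move back to the start before reading

    // Length 0 means "no line length limit".
    $fields = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);

    // fgetcsv() returns false when nothing could be read.
    return $fields === false ? [] : $fields;
}

print_r(parse_csv_line('1," 2 , 3"'));
```

Nothing is written to a file on disk, which may address the performance concern raised later in the thread, although it remains more roundabout than parsing the string directly.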

"Ewoud Dronkert" <me@privacy.net> wrote in message
news:ak********************************@4ax.com...
On Thu, 12 Feb 2004 11:42:08 +0200, Sims wrote:
No you don't :) http://www.php.net/fgetcsv
Sorry but that is reading from a file, i am reading from a line.


So post the data to a script, open a filepointer to "php://input" and
read from that. Or write data to diskfile and read from diskfile


Sorry that is simply not good practice at all, in fact it is very bad
programming.

Reading/writing data to a file just to use a function? I would bring my
server to a standstill.
I think I would rather use a more realistic function like the one offered by
Rahul.
(duh).


Thanks.

Sims
Jul 17 '05 #9
On Thu, 12 Feb 2004 21:59:30 +0200, Sims wrote:
Sorry that is simply not good practice at all, in fact it is very bad
programming.


Then provide some more details of the environment/circumstances/typical
situation. You gave none. For all we know, you were trying to convert a
small one-time output by your heartrate monitor or something.

Please don't put me down for failing prerequisites you did not mention.
Jul 17 '05 #10

Then provide some more details of the environment/circumstances/typical
situation.
You are trying to dig yourself out, i gave a list of strings and what i
wanted to achieve with the strings and the problems i was coming across.
My environment/circumstances/typical is not necessary in my particular case.

I wanted to get an array from a string that is more than enough info.
Knowing that i have an Apache server with Win2003 is of no use to the
problem, (to a knowledgeable programmer anyway).
You gave none. For all we know, you were trying to convert a
small one-time output by your heartrate monitor or something.
Still, creating a file and reading it is simply wrong, regardless of what I
was trying to achieve.

Please don't put me down for failing prerequisites you did not mention.


Then don't quote Homer and read the OP properly.

Regards.

Sims
Jul 17 '05 #11
Sims wrote:
Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"')
There's a typo in there somewhere, I believe.
and i correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.


This is very similar to -- and based on -- Rahul Anand's "guess". :-)

A jungle of assertions!:

preg_match_all(
'`(?<=").*?(?=(?<!\\\)")|\d+`s',
$string,
$array)

Describing it at a high-ish level, the pattern looks for either of
two alternatives: the contents of a quoted substring (zero or more
characters), or one or more decimal digits. A quoted substring,
intuitively, begins and ends with double-quotes.

Describing it at a much lower level, the pattern uses a positive
look-behind assertion to check to see if a quoted substring follows,
i.e. a double-quote precedes the current matching point. Everything
until the closing double-quote is part of a quoted substring. The
closing double-quote is found using two more assertions: firstly, a
positive look-ahead assertion checks that the next character is a
double-quote; secondly, a negative look-behind assertion checks this
double-quote is not preceded by a backslash. (Three backslashes are
needed because the pattern sits in a single-quoted PHP string: PHP
reduces \\\) to \\), which PCRE reads as a literal backslash followed
by the closing parenthesis of the assertion.) If no match can be found
for a quoted substring, one or more decimal digits are looked for,
using the \d character type.

The s pattern modifier means the dot metacharacter matches newlines,
which it doesn't by default.

All these grim details of the syntax are explained in the Manual's
section on PCRE pattern syntax.

--
Jock
Jul 17 '05 #12
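
The description above can be checked with a short script. This is a sketch only: the pattern is the one from the post, written with four backslashes instead of three (both reach PCRE as an escaped backslash inside a single-quoted PHP string), and the second input string is an assumed example with two quoted fields.

```php
<?php
// Lookaround pattern from the post above; \\\\ in a single-quoted PHP
// string reaches PCRE as \\, i.e. one literal backslash.
$pattern = '`(?<=").*?(?=(?<!\\\\)")|\d+`s';

// One quoted field: the digit and the quoted contents are found.
preg_match_all($pattern, '1," 2 , 3"', $single);
print_r($single[0]); // two items: "1" and " 2 , 3"

// Two quoted fields: the comma between them is also reported, because
// the closing quote of "a" satisfies the look-behind as an opening quote.
preg_match_all($pattern, '"a","b"', $double);
print_r($double[0]); // three items: "a", "," and "b"
```

The second call shows a limitation of relying on zero-width assertions alone: a closing quote is indistinguishable from an opening one.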
Now if i have

'1,"2,3"' i could split it using preg_split("/(?<!\"),/\d", '1,"2,3"')
There's a typo in there somewhere, I believe.


Yes indeed, sorry, I should have copied and pasted instead.
and i correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the \d is no longer useful.
This is very similar to -- and based on -- Rahul Anand's "guess". :-)


I could see that Rahul was not far from it.
A jungle of assertions!:

preg_match_all(
'`(?<=").*?(?=(?<!\\\)")|\d+`s',
$string,
$array)
so using an example like,

$string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
print_r( $array );

i get an output like...

Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
=> 7 ) )

Why are some of the letters ignored?
Describing it at a high-ish level, the pattern looks for either of
two alternatives: zero or more quoted substrings, or one or more
decimal digits. A quoted substring, intuitively, begins and ends
with double-quotes.
What part looks for a decimal number?
I am afraid that my description should not have included numbers only.
I want the RegEx to work for anything.

so that a case like

$string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

would work regardless of whether 'x' represents a letter, a number or a
symbol (apart from x = " itself).

All these grim details of the syntax are explained in the Manual's
section on PCRE pattern syntax.

Once i have your and Rahul's example in front of me then i can try and work
it out but the number of assertions is simply mind boggling.
--
Jock


Many thanks

Sims
Jul 17 '05 #13

On 12-Feb-2004, "Sims" <si*********@hotmail.com> wrote:

My guess is following RegExp.

[SNIP]

$str = '1," 2 ,\" 3 ", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches);
print_r($matches);

[/SNIP]


If it is a guess then it is brilliant, thanks : ).

It works 99.9%.

The only time it does not work is when you have something like... $str =
'1," 2 ,\" 3a", 4, 5';, (Note the 'a' after the '3').
Your RegExp removes the last item, (letter),
Your expression is very very complicated for me so i am going to spend some
time trying to work it all out.

Thanks a million.


There are several classes for handling comma delimited stuff on
phpclasses.org, search on 'comma'. You should be able to hack one of them
into parsing your string.

--
Tom Thackrey
www.creative-light.com
tom (at) creative (dash) light (dot) com
do NOT send email to ja*********@willglen.net (it's reserved for spammers)
Jul 17 '05 #14
On Thu, 12 Feb 2004 22:25:46 +0200, Sims wrote:
You are trying to dig yourself out


I give up.
Jul 17 '05 #15
"Sims" <si*********@hotmail.com> wrote in message news:<c0*************@ID-162430.news.uni-berlin.de>...

$string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
print_r( $array );

i get an output like...

Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
=> 7 ) )

Why are some of the letters ignored?
Describing it at a high-ish level, the pattern looks for either of
two alternatives: zero or more quoted substrings, or one or more
decimal digits. A quoted substring, intuitively, begins and ends
with double-quotes.


What part looks for a decimal number?
I am afraid that my description should not have included numbers only.
I want the RegEx to work for anything.

so that a case like

$string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

would work regardless of whether 'x' represents a letter, a number or a
symbol (apart from x = " itself).


Modify your RegExp according to following example.
I hope now you will get the expected result.

[SNIP]

$str = '1," 2 ,\" 3aaa", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*[^\\\](?=")|[\d]+#U',$str,$matches);
print_r($matches);

[/SNIP]

--
Cheers,
Rahul Anand
Jul 17 '05 #16
Sims wrote:
$string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
print_r( $array );

i get an output like...

Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
=> 7 ) )

Why are some of the letters ignored?
Because that pattern doesn't allow letters outside quoted strings.
What part looks for a decimal number?
The decimal number character type ("\d"). That's in the second
alternative, near the end of the pattern.
I am afraid that my description should not have included numbers only.
I want the RegEx to work for anything.

so that a case like

$string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

would work regardless of whether 'x' represents a letter, a number or a
symbol (apart from x = " itself).


Well, I spent some considerable time on this and discovered I was
going round and round in circles, covering the same ground. I can't
think of how to do it all in a single regular expression. :-(

Because the earlier pattern used assertions to check for double-quotes,
it would also match the commas between quoted substrings. Since an
assertion is zero-width -- that is, it doesn't consume any characters --
a closing double-quote is also taken to mean the start of a quoted
substring.

Consider:

preg_match_all(
'`"(.*?)(?<!\\\)"|([^\s,"]+)`s',
$string,
$array)

Matched against your example, this returns three arrays. You can
retrieve the information you want from the second and third arrays.
The first array contains all the data, but with double-quotes left
in. You could use the first array instead, and remove any leading
and trailing double-quotes.

--
Jock
Jul 17 '05 #17
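
Retrieving the fields from the second and third arrays, as suggested above, can be sketched as follows. The assumptions are labelled in the comments: the pattern is the one from the post, rewritten with ~ delimiters and four backslashes for readability, and PREG_SET_ORDER is used so that each match arrives with its own capture groups.

```php
<?php
// Pattern from the post above: group 1 holds the contents of a quoted
// field, group 2 holds an unquoted field.
$pattern = '~"(.*?)(?<!\\\\)"|([^\s,"]+)~s';
$string  = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

preg_match_all($pattern, $string, $sets, PREG_SET_ORDER);

$fields = [];
foreach ($sets as $set) {
    // Group 2 is present and non-empty only for unquoted fields;
    // otherwise the quoted contents are in group 1.
    $fields[] = (isset($set[2]) && $set[2] !== '') ? $set[2] : $set[1];
}

print_r($fields);
```

For the example string this gives eight fields, with the quoted ones keeping their inner spaces and the escaped \" intact. Two limitations follow from the pattern itself: an unquoted field that contains spaces is split into several items, and empty unquoted fields (two commas in a row) are skipped.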
