Regular expression, (preg_split etc...), some help please.

Hi,

I need some help splitting data using a regular expression.

Consider the string

'1,2,3'. I can split it using preg_split("/,/", '1,2,3') and I correctly
get [0]=1, [1]=2, [2]=3.

Now if I have

'1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and I
correctly get [0]=1, [1]="2,3".

But it clearly does not work in some more advanced cases, for example

'1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.

So how can I search for a regular expression match that is *not within*
double quotes?

I think I might have to write my own split function, especially if I have an
extreme case like '1," 2 , \" 3"' (note the escaped double quote).

Many thanks for your input.

Sims

Jul 17 '05 #1
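
The hand-written split function mentioned in the post above could be sketched as a small character-by-character scanner. This is only an illustration, not code from the thread: the name split_quoted is hypothetical, and it assumes that a backslash escapes the next character and that the enclosing quotes should stay in the output, as in the [1]="2,3" result asked for.

```php
<?php
// Hypothetical helper (not from the thread): split a line on commas,
// ignoring commas inside double quotes. A backslash keeps the next
// character as-is, so \" does not open or close a quoted section.
function split_quoted(string $line): array
{
    $fields = [];
    $current = '';
    $inQuotes = false;
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Copy the backslash and the escaped character verbatim.
            $current .= $char . $line[++$i];
        } elseif ($char === '"') {
            // Toggle the quoted state; the quote itself stays in the field.
            $inQuotes = !$inQuotes;
            $current .= $char;
        } elseif ($char === ',' && !$inQuotes) {
            // An unquoted comma ends the current field.
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $fields[] = $current;

    return $fields;
}

print_r(split_quoted('1," 2 , \" 3"'));
// Fields: [0] => 1   [1] => " 2 , \" 3"
```

Whitespace around unquoted fields is kept; wrapping each field in trim() would be one way to drop it.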
In article <c0*************@ID-162430.news.uni-berlin.de>,
"Sims" <si*********@hotmail.com> wrote:
> Consider the string
>
> '1,2,3'. I can split it using preg_split("/,/", '1,2,3') and I correctly
> get [0]=1, [1]=2, [2]=3.
>
> Now if I have
>
> '1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and I
> correctly get [0]=1, [1]="2,3".
>
> But it clearly does not work in some more advanced cases, for example
>
> '1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.


Is the overall goal to extract numbers having arbitrary separators? If so,
how about splitting on "/\D+/"?

--
CC
Jul 17 '05 #2
In article <c0*************@ID-162430.news.uni-berlin.de>,
"Sims" <si*********@hotmail.com> wrote:
> Consider the string
>
> '1,2,3'. I can split it using preg_split("/,/", '1,2,3') and I correctly
> get [0]=1, [1]=2, [2]=3.
>
> Now if I have
>
> '1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and I
> correctly get [0]=1, [1]="2,3".
>
> But it clearly does not work in some more advanced cases, for example
>
> '1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.


(Oops, never mind that last post.)

--
CC
Jul 17 '05 #3
On Thu, 12 Feb 2004 08:49:59 +0200, Sims wrote:
> I need some help splitting data using a regular expression.


No you don't :) http://www.php.net/fgetcsv
Jul 17 '05 #4

"Ewoud Dronkert" <me@privacy.net> wrote in message
news:n3********************************@4ax.com...
> On Thu, 12 Feb 2004 08:49:59 +0200, Sims wrote:
>> I need some help splitting data using a regular expression.
>
> No you don't :) http://www.php.net/fgetcsv

Sorry, but that reads from a file; I am reading from a line.
I want the same sort of output, but not from a file.

I will have a look at the C code to see if I can create a PHP function to
achieve what I need.

Sims.
Jul 17 '05 #5
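
For the "reading from a line, not a file" case: later PHP releases (5.3 and up, so not available when this thread was written) include str_getcsv(), which parses a CSV-formatted string into an array. A minimal sketch using the sample strings from the first post; note that it strips the enclosing quotes rather than keeping them as in the [1]="2,3" output asked for.

```php
<?php
// str_getcsv() parses one CSV line held in a string.
// Separator, enclosure and escape character are passed explicitly.
$lines = ['1,2,3', '1,"2,3"', '1," 2 , 3"'];

$parsed = [];
foreach ($lines as $line) {
    $parsed[] = str_getcsv($line, ',', '"', '\\');
}

print_r($parsed);
// $parsed[0] is ['1', '2', '3']
// $parsed[1] is ['1', '2,3']
// $parsed[2] is ['1', ' 2 , 3']
```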
"Sims" <si*********@hotmail.com> wrote in message news:<c0*************@ID-162430.news.uni-berlin.de>...
> Hi,
>
> I need some help splitting data using a regular expression.
>
> Consider the string
>
> '1,2,3'. I can split it using preg_split("/,/", '1,2,3') and I correctly
> get [0]=1, [1]=2, [2]=3.
>
> Now if I have
>
> '1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"') and I
> correctly get [0]=1, [1]="2,3".
>
> But it clearly does not work in some more advanced cases, for example
>
> '1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.
>
> So how can I search for a regular expression match that is *not within*
> double quotes?
>
> I think I might have to write my own split function, especially if I have an
> extreme case like '1," 2 , \" 3"' (note the escaped double quote).
>
> Many thanks for your input.
>
> Sims


My guess is the following RegExp.

[SNIP]

$str = '1," 2 ,\" 3 ", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches);
print_r($matches);

[/SNIP]

Try and let me know if it does not work as per your requirements.

--
Cheers,
Rahul Anand
Jul 17 '05 #6

> My guess is the following RegExp.
>
> [SNIP]
>
> $str = '1," 2 ,\" 3 ", 4, 5';
> $arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches);
> print_r($matches);
>
> [/SNIP]

If it is a guess then it is brilliant, thanks :).

It works 99.9% of the time.

The only time it does not work is when you have something like
$str = '1," 2 ,\" 3a", 4, 5'; (note the 'a' after the '3').
Your RegExp drops the last character (the letter).
Your expression is very complicated for me, so I am going to spend some
time trying to work it all out.

Thanks a million.
Jul 17 '05 #7
On Thu, 12 Feb 2004 11:42:08 +0200, Sims wrote:
>> No you don't :) http://www.php.net/fgetcsv
>
> Sorry, but that reads from a file; I am reading from a line.

So post the data to a script, open a file pointer to "php://input" and
read from that. Or write the data to a disk file and read from the disk file (duh).
Jul 17 '05 #8
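
A variation on the stream idea above that avoids both the extra request and the disk: PHP also offers the php://memory stream (PHP 5.1 and later), which holds the data in RAM. A sketch under that assumption; csv_line_to_array is a hypothetical helper name, not something proposed in the thread.

```php
<?php
// Hypothetical wrapper: run fgetcsv() over a string by way of an
// in-memory stream, so nothing is written to disk.
function csv_line_to_array(string $line): array
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $line);
    rewind($handle); // move back to the start before reading

    // Length 0 means "no line length limit".
    $fields = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);

    // fgetcsv() returns false when there is nothing to read.
    return $fields === false ? [] : $fields;
}

print_r(csv_line_to_array('1," 2 , 3"'));
// Fields: [0] => 1   [1] =>  2 , 3   (enclosing quotes removed)
```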

"Ewoud Dronkert" <me@privacy.net> wrote in message
news:ak********************************@4ax.com...
> On Thu, 12 Feb 2004 11:42:08 +0200, Sims wrote:
>>> No you don't :) http://www.php.net/fgetcsv
>> Sorry, but that reads from a file; I am reading from a line.
>
> So post the data to a script, open a file pointer to "php://input" and
> read from that. Or write the data to a disk file and read from the disk file

Sorry, that is simply not good practice at all; in fact it is very bad
programming.

Reading and writing data to a file just to use a function? I would bring my
server to a standstill.
I think I would rather use a more realistic function like the one offered by
Rahul.
(duh).


Thanks.

Sims
Jul 17 '05 #9
On Thu, 12 Feb 2004 21:59:30 +0200, Sims wrote:
> Sorry, that is simply not good practice at all; in fact it is very bad
> programming.


Then provide some more details of the environment/circumstances/typical
situation. You gave none. For all we know, you were trying to convert a
small one-time output by your heartrate monitor or something.

Please don't put me down for failing prerequisites you did not mention.
Jul 17 '05 #10

> Then provide some more details of the environment/circumstances/typical
> situation.

You are trying to dig yourself out. I gave a list of strings, what I
wanted to achieve with them, and the problems I was coming across.
My environment and circumstances are not relevant in this particular case.

I wanted to get an array from a string; that is more than enough info.
Knowing that I have an Apache server on Win2003 is of no use to the
problem (to a knowledgeable programmer anyway).

> You gave none. For all we know, you were trying to convert a
> small one-time output by your heartrate monitor or something.

Still, creating a file and reading it is simply wrong, regardless of what I was
trying to achieve.

> Please don't put me down for failing prerequisites you did not mention.

Then don't quote Homer and read the OP properly.

Regards.

Sims
Jul 17 '05 #11
Sims wrote:
> Now if I have
>
> '1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"')

There's a typo in there somewhere, I believe.

> and I correctly get [0]=1, [1]="2,3".
>
> But it clearly does not work in some more advanced cases, for example
>
> '1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.


This is very similar to -- and based on -- Rahul Anand's "guess". :-)

A jungle of assertions!:

preg_match_all(
'`(?<=").*?(?=(?<!\\\)")|\d+`s',
$string,
$array)

Describing it at a high-ish level, the pattern looks for either of
two alternatives: a quoted substring of zero or more characters, or
one or more decimal digits. A quoted substring, intuitively, begins
and ends with double-quotes.

Describing it at a much lower level, the pattern uses a positive
look-behind assertion to check to see if a quoted substring follows,
i.e. a double-quote precedes the current matching point. Everything
until the closing double-quote is part of a quoted substring. The
closing double-quote is found using two more assertions: firstly, a
positive look-ahead assertion checks that the next character is a
double-quote; secondly, a negative look-behind assertion checks this
double-quote is not preceded by a backslash. (The three backslashes
are needed to escape the backslash character's special meaning.) If
no match can be found for a quoted substring, one or more decimal
digits are looked for, using the \d character type.

The s pattern modifier means the dot metacharacter matches newlines,
which it doesn't by default.

All these grim details of the syntax are explained in the Manual's
section on PCRE pattern syntax.

--
Jock
Jul 17 '05 #12
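
To see the assertions from the post above at work, here is a small run of the same pattern. The sample string is an assumption chosen for illustration, the delimiter is changed from a backtick to ~, and the backslash is written as \\\\ (which reaches PCRE as \\, one literal backslash, the same as the three-backslash form).

```php
<?php
// Alternative 1: text that follows a double-quote, up to the next
//                double-quote that is not preceded by a backslash.
// Alternative 2: a run of decimal digits.
$pattern = '~(?<=").*?(?=(?<!\\\\)")|\d+~s';
$string  = '1," 2 , \" 3",4,5';

preg_match_all($pattern, $string, $array);

print_r($array[0]);
// Matches: [0] => 1   [1] =>  2 , \" 3   [2] => 4   [3] => 5
```

The escaped \" inside the quoted part is skipped by the negative look-behind, so the quoted substring runs to the final unescaped double-quote.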
>> Now if I have
>>
>> '1,"2,3"' I could split it using preg_split("/(?<!\"),/\d", '1,"2,3"')
>
> There's a typo in there somewhere, I believe.

Yes indeed, sorry. Maybe I should have copied and pasted instead.

>> and I correctly get [0]=1, [1]="2,3".
>>
>> But it clearly does not work in some more advanced cases, for example
>>
>> '1," 2 , 3"' or '1,"2 , 3 "', mainly because the /d is no longer useful.
>
> This is very similar to -- and based on -- Rahul Anand's "guess". :-)

I could see that Rahul was not far from it.

> A jungle of assertions!:
>
> preg_match_all(
> '`(?<=").*?(?=(?<!\\\)")|\d+`s',
> $string,
> $array)

So using an example like

$string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
print_r( $array );

I get an output like...

Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
=> 7 ) )

Why are some of the letters ignored?

> Describing it at a high-ish level, the pattern looks for either of
> two alternatives: a quoted substring of zero or more characters, or
> one or more decimal digits. A quoted substring, intuitively, begins
> and ends with double-quotes.

What part looks for a decimal number?
I am afraid that my description should not have included only numbers.
I want the RegEx to work for anything,

so that a case like

$string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

would work regardless of whether 'x' represents a letter, a number or a symbol
(apart from x = " itself).

> All these grim details of the syntax are explained in the Manual's
> section on PCRE pattern syntax.

Once I have your and Rahul's examples in front of me I can try to work
it out, but the number of assertions is simply mind-boggling.

> --
> Jock

Many thanks

Sims
Jul 17 '05 #13

On 12-Feb-2004, "Sims" <si*********@hotmail.com> wrote:

>> My guess is the following RegExp.
>>
>> [SNIP]
>>
>> $str = '1," 2 ,\" 3 ", 4, 5';
>> $arr = preg_match_all('#(?<=")\s*[\d]+.*(?=[^\\\]")|[\d]+#U',$str,$matches);
>> print_r($matches);
>>
>> [/SNIP]
>
> If it is a guess then it is brilliant, thanks :).
>
> It works 99.9% of the time.
>
> The only time it does not work is when you have something like
> $str = '1," 2 ,\" 3a", 4, 5'; (note the 'a' after the '3').
> Your RegExp drops the last character (the letter).
> Your expression is very complicated for me, so I am going to spend some
> time trying to work it all out.
>
> Thanks a million.


There are several classes for handling comma delimited stuff on
phpclasses.org, search on 'comma'. You should be able to hack one of them
into parsing your string.

--
Tom Thackrey
www.creative-light.com
tom (at) creative (dash) light (dot) com
do NOT send email to ja*********@willglen.net (it's reserved for spammers)
Jul 17 '05 #14
On Thu, 12 Feb 2004 22:25:46 +0200, Sims wrote:
> You are trying to dig yourself out


I give up.
Jul 17 '05 #15
"Sims" <si*********@hotmail.com> wrote in message news:<c0*************@ID-162430.news.uni-berlin.de>...

> $string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
> preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
> print_r( $array );
>
> I get an output like...
>
> Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
> => 7 ) )
>
> Why are some of the letters ignored?
>
>> Describing it at a high-ish level, the pattern looks for either of
>> two alternatives: a quoted substring of zero or more characters, or
>> one or more decimal digits. A quoted substring, intuitively, begins
>> and ends with double-quotes.
>
> What part looks for a decimal number?
> I am afraid that my description should not have included only numbers.
> I want the RegEx to work for anything,
>
> so that a case like
>
> $string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';
>
> would work regardless of whether 'x' represents a letter, a number or a symbol
> (apart from x = " itself).

Modify your RegExp as in the following example.
I hope you will now get the expected result.

[SNIP]

$str = '1," 2 ,\" 3aaa", 4, 5';
$arr = preg_match_all('#(?<=")\s*[\d]+.*[^\\\](?=")|[\d]+#U',$str,$matches);
print_r($matches);

[/SNIP]

--
Cheers,
Rahul Anand
Jul 17 '05 #16
Sims wrote:
> $string = 'a1a ,2b , 3c, " 4, \"aaa, 5", 7';
> preg_match_all( '`(?<=").*?(?=(?<!\\\)")|\d+`s', $string, $array);
> print_r( $array );
>
> I get an output like...
>
> Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 [3] => 4, \"aaa, 5 [4]
> => 7 ) )
>
> Why are some of the letters ignored?

Because that pattern doesn't allow letters outside quoted strings.

> What part looks for a decimal number?

The decimal digit character type ("\d"). That's in the second
alternative, near the end of the pattern.

> I am afraid that my description should not have included only numbers.
> I want the RegEx to work for anything,
>
> so that a case like
>
> $string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';
>
> would work regardless of whether 'x' represents a letter, a number or a symbol
> (apart from x = " itself).


Well, I spent some considerable time on this and discovered I was
going round and round in circles, covering the same ground. I can't
think of how to do it all in a single regular expression. :-(

Because the previous pattern used assertions to check for
double-quotes, it would also match the commas between quoted
substrings. Since an assertion is zero-width -- that is, it doesn't
consume any characters -- a closing double-quote is also taken as
the start of a new quoted substring.

Consider:

preg_match_all(
'`"(.*?)(?<!\\\)"|([^\s,"]+)`s',
$string,
$array)

Matched against your example, this returns three arrays. You can
retrieve the information you want from the second and third arrays.
The first array contains all the data, but with double-quotes left
in. You could use the first array instead, and remove any leading
and trailing double-quotes.

--
Jock
Jul 17 '05 #17
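
Pulling the wanted values out of the second and third arrays, as described in the post above, could look like the sketch below. It is an illustration only: PREG_SET_ORDER is used so that each match arrives with its own groups, the delimiter is changed from a backtick to ~, and the backslash is written as \\\\ (equivalent to the three-backslash form).

```php
<?php
$string = 'xx,xx , xx," x, \"x, x", " x, x", xx, "xx", x';

// Group 1: the inside of a quoted field (closing quote not preceded by \).
// Group 2: an unquoted run without whitespace, commas or double-quotes.
$pattern = '~"(.*?)(?<!\\\\)"|([^\s,"]+)~s';

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $m) {
    // Index 2 exists only when the unquoted alternative matched;
    // otherwise the quoted content is in index 1.
    $fields[] = isset($m[2]) ? $m[2] : $m[1];
}

print_r($fields);
// 8 fields: xx, xx, xx, " x, \"x, x", " x, x", xx, xx, x
// (quoted ones shown here in quotes only to make the leading space visible)
```

Two limits of this pattern are worth knowing: an unquoted field containing whitespace is returned as several fields, and a quoted field that ends in a backslash is not closed by the quote that follows it.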
