
simple regex problem

Chamomile
#1: Jul 17 '05
I have to split strings of the type:

$str1 =' Large ladies hats 1.365 0.334';
$str2 = 'Pins 0.335 0.22';

into separate variables (or array members), say $textStr, $number1,
$number2 or $pieces[0], $pieces[1], $pieces[2], for inclusion in 3 separate
fields of a database row.
I am new to the world of regex, split(), preg_split() etc., and have been
trying all day to do this seemingly simple task, but I keep running into
problems till my head spins.
Can anyone give me some pointers?
(I have tried the manuals!)





John Dunlop
#2: Jul 17 '05

re: simple regex problem


Chamomile multiposted:

> I have to split strings of the type:
>
> $str1 =' Large ladies hats 1.365 0.334';
> $str2 = 'Pins 0.335 0.22';

What type is that exactly?

> into separate variables (or array members)

preg_split('`\s{2,}`',$string)

That returns an array containing substrings of $string split along
boundaries of two or more whitespace characters.
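
A minimal sketch of the call above, assuming the fields really are separated by runs of two or more spaces. The spacing in this sample string is hypothetical, since the strings as shown in the thread contain only single spaces:

```php
<?php
// Hypothetical input: fields separated by three spaces each,
// while the words inside the text field are separated by one.
$string = 'Large ladies hats   1.365   0.334';

// Split on runs of two or more whitespace characters.
$pieces = preg_split('`\s{2,}`', $string);

print_r($pieces);
```

The single spaces inside "Large ladies hats" do not match `\s{2,}`, so the text field should stay in one piece.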

--
Jock
Chamomile
#3: Jul 17 '05

re: simple regex problem



"John Dunlop" <john+usenet@johndunlop.info> wrote in message
news:MPG.1a8f434f8d7d31cd989684@News.Individual.NET...
> Chamomile multiposted:
>
> > I have to split strings of the type:
> >
> > $str1 =' Large ladies hats 1.365 0.334';
> > $str2 = 'Pins 0.335 0.22';
>
> What type is that exactly?
>
> > into separate variables (or array members)
>
> preg_split('`\s{2,}`',$string)
>
> That returns an array containing substrings of $string split along
> boundaries of two or more whitespace characters.
>
> --
> Jock

Thank you, Jock.
Yes, I suppose I should have said 'kind', not 'type'.
'Multiposting' (by which I assume you mean also posting on alt.php) is a
bad thing then?
I am not well versed in the etiquette of newsgroups.
Also, I should have said that the boundary between the first (text) portion
of the string and the first number is sometimes only 1 space (unforgivable,
I know), and I have been trying to use an alpha vs. numeric comparison to do
the first split.



John Dunlop
#4: Jul 17 '05

re: simple regex problem


Chamomile wrote:

> Yes, I suppose I should have said 'kind' not type .

I wasn't nitpicking your word choice. Sorry if it came across as if
I was -- my phraseology was obviously poor in that case. I was
trying to get a better idea as to how you wanted to split the string.

> 'multiposting' (by which I assume you mean also posting on alt.php) is a
> bad thing then?

Yes. Most definitely.

("Multiposting" is posting the same article separately to multiple
newsgroups; "crossposting" is simultaneously sending a *single*
article to multiple newsgroups.)

If someone were to follow up to your article in alt.php, there'd be
two threads discussing exactly the same subject, which'd pointlessly
cover the same ground. It so happens that most folks read both
groups, but the possibility remains.

If you want to post the same article to different newsgroups,
crosspost, and seriously consider setting followups to the group
where the discussion is most topical. It's usually unnecessary to
even crosspost though. Some groups condemn crossposting; some
moderated groups even prevent crossposting.

http://www.cs.tut.fi/~jkorpela/usenet/xpost.html

> Also, I should have said that the boundary between the first (text) portion
> of the string is sometimes only 1 space (unforgivable I know)

That makes it slightly more tricky.

> and I have been trying to use an alpha vs. numeric comparison to do the
> first split.

Right. That's probably the best way, unless there are other
constraints you're hiding. ;-)

Consider:

preg_split('`\s+(?=\d)`',$string)

That returns an array containing substrings of $string split along
boundaries of one or more whitespace characters that are followed by
a decimal digit.
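
As a rough sketch, here is that pattern applied to the two sample strings from the question. The trim() call and the $rows array are additions for illustration, not part of the suggestion; the variable names are the ones proposed in the question:

```php
<?php
// Sample strings taken from the original question.
$strings = array(' Large ladies hats 1.365 0.334', 'Pins 0.335 0.22');

$rows = array();
foreach ($strings as $string) {
    // trim() is an addition for this sketch: it drops the leading space
    // in the first sample so it does not end up in the text field.
    $pieces = preg_split('`\s+(?=\d)`', trim($string));

    // Variable names follow the ones proposed in the question.
    list($textStr, $number1, $number2) = $pieces;
    $rows[] = array($textStr, $number1, $number2);

    echo "$textStr | $number1 | $number2\n";
}
```

One caveat worth keeping in mind: if the text portion ever contained a word starting with a digit (a hypothetical "Size 12 hats", say), the pattern would split there as well.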

--
Jock
Chamomile
#5: Jul 17 '05

re: simple regex problem


I take your points about crossposting and multiposting.

> > Also, I should have said that the boundary between the first (text) portion
> > of the string is sometimes only 1 space (unforgivable I know)
>
> That makes it slightly more tricky.
>
> > and I have been trying to use an alpha vs. numeric comparison to do the
> > first split.
>
> Right. That's probably the best way, unless there are other
> constraints you're hiding. ;-)

No, I'm finding this difficult enough.

> Consider:
>
> preg_split('`\s+(?=\d)`',$string)
>
> That returns an array containing substrings of $string split along
> boundaries of one or more whitespace characters that are followed by
> a decimal digit.

Yes, that works! I'll use that solution to try to reverse engineer the
preg_split() thing and so gain some insight into how it works. I seem to
find regex stuff incredibly difficult. I hope it's just lack of practice,
not an incurable brain deficit.
Thanks for your help.
mjg


John Dunlop
#6: Jul 17 '05

re: simple regex problem


Chamomile wrote:

> [John Dunlop wrote:]
>
> > preg_split('`\s+(?=\d)`',$string)
> >
> > That returns an array containing substrings of $string split along
> > boundaries of one or more whitespace characters that are followed by
> > a decimal digit.
>
> yes, that works! I'll use that solution to try and reverse engineer the
> preg_split() thing so gain some insight into how it works..

The pattern itself isn't too complicated:

`\s+(?=\d)`

Firstly, the "\s" is a character type, which stands for any
whitespace character. The quantifier tells how many times that type
is allowed: "+" means one or more times.

The "(?=" starts a zero-width positive look-ahead assertion. A
look-ahead assertion looks at the characters following the current
position in the string, but does not "consume" them. So the pattern
is looking ahead of the last whitespace character it matched. The
"\d" is another character type, this time meaning any decimal digit
(the character class [0-9]).
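
To see what "consuming" means in practice, here is a small sketch comparing the look-ahead pattern with a variant that matches the digit outright. The second pattern is not from this thread; it is only there for contrast:

```php
<?php
$string = 'Pins 0.335 0.22';

// With the look-ahead, only the whitespace is matched and removed.
$withLookahead = preg_split('`\s+(?=\d)`', $string);

// Comparison pattern (not from the thread): here \d is part of the match,
// so the first digit of each number is consumed along with the whitespace.
$withoutLookahead = preg_split('`\s+\d`', $string);

print_r($withLookahead);    // pieces keep their leading digit
print_r($withoutLookahead); // pieces lose their leading digit
```

In the second result the numbers come back as ".335" and ".22", because the leading "0" was part of each match and so was thrown away with the delimiter.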

The details are covered in the Manual.

http://www.php.net/manual/en/pcre.pattern.syntax.php

preg_split() simply uses that pattern to return an array of substrings
from the original string. The original string is split up along
matches of the pattern, which are only ever whitespace characters
since no decimal digits are taken up by the pattern.

http://www.php.net/manual/en/function.preg-split.php
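
One way to check that the matches are whitespace only is to list them with preg_match_all() and the PREG_OFFSET_CAPTURE flag. This is just an illustrative sketch using one of the sample strings:

```php
<?php
$string = 'Pins 0.335 0.22';

// Collect every match of the pattern together with its byte offset.
// These matches are exactly what preg_split() treats as delimiters.
preg_match_all('`\s+(?=\d)`', $string, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $match) {
    list($text, $offset) = $match;
    printf("matched %s at offset %d\n", var_export($text, true), $offset);
}
```

Each match should be a single space, sitting immediately before a digit that stays in the following piece.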

--
Jock