reg ex expression - finding long character strings

"Garp" <ga***@no7.blueyonder.co.uk> wrote in message news:<_v*********************@news-text.cableinet.net>...
"lawrence" <lk******@geocities.com> wrote in message
news:da**************************@posting.google.com...
>> This reg ex will find me any block of strings 40 or more characters in
>> length without a white space, yes?
>>
>> [^ ]{40}
>>
>> To get it to include tabs and newlines, do I do this?
>>
>> [^ \n\t]{40}
>
> \s is the whitespace token, if that's easier for you.


Good, but now my question is how to insert the white space that I
want. If I do this:

$string = ereg_replace("[^\s]{40}", " ", $string);

Then the text gets obliterated and replaced by a white space. That is
not what I want. I simply want to break up long strings (mostly URLs)
that threaten to destroy the format of a page. This is especially true
of Internet Explorer, which tends to expand DIV tags to fit the
contents (Netscape lets long URLs burst outside the boundaries of the
DIV).

Go look at this page using IE 5 or 6:

http://www.publicdomainsoftware.org

You'll see a comment (right now it is the second one down) that looks
like this:
>>>>>>>>>>> Misty, I assume you're the one who came up with these interesting
photos of vegetables? Are they from the ARE garden?
http://www.publicdomainsoftware.org/...egetables2.JPG ...read
more>>>>>>>>>>>


That long url is distorting the whole page. I need to break it up.

I suppose I could hit the whole string with explode() and break them
on the white space and then loop through the array and test each entry
for a length of more than 30 or 40 or so, and then stitch it all back
together with implode, but I was assuming I could do it all more
elegantly with regular expressions. I don't know much about regular
expressions, but if someone does, please let me know.
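
For what it's worth, the replacement does not have to throw the match away. Wrapping the pattern in parentheses captures the matched text, and the replacement string can put it back followed by a space. A minimal sketch, assuming preg_replace() (PCRE) is available; break_long_runs() is a made-up name for illustration, not something from this thread:

```php
<?php
// Hypothetical helper: insert a space after every run of $max
// consecutive non-whitespace characters.
function break_long_runs($text, $max = 40) {
    // (\S{40}) captures the run so it can be put back as $1.
    // (?=\S) only lets the match succeed when more non-whitespace
    // follows, so no space is appended at the end of a word.
    $rx = '/(\S{' . (int)$max . '})(?=\S)/';
    return preg_replace($rx, '$1 ', $text);
}

// Prints 40 x's, a space, then the remaining 10 x's.
echo break_long_runs(str_repeat('x', 50)), "\n";
```

PHP's built-in wordwrap(), with its fourth argument set to true, may also be worth a look if no regex is wanted.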
Jul 17 '05 #1
lawrence wrote:
> I suppose I could hit the whole string with explode() and break them
> on the white space and then loop through the array and test each entry
> for a length of more than 30 or 40 or so, and then stitch it all back
> together with implode, but I was assuming I could do it all more
> elegantly with regular expressions. I don't know much about regular
> expressions, but if someone does, please let me know.


Try this. Change as you see fit.
<?php
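// Replace every long http URL in $txt with a shortened link.
// Note: the "e" (eval) pattern modifier was removed in PHP 7,
// so this version only runs on older PHP releases.
function compress_url($txt, $size=40) {
    // "http://" is 7 characters, so require $size-7 more non-whitespace ones
    $rx = '=(http://\S{' . ($size-7) . ',})=e';
    $compressed_txt = preg_replace($rx,
        "'[<a class=\"compressed\" href=\"$1\" title=\"$1\">'
        . substr('$1', 0, $size-10)
        . '...'
        . substr('$1', -7)
        . '</a>]'",
        $txt);
    return $compressed_txt;
}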
$txt = '
Misty, I assume you\'re the one who came up with these interesting
photos of vegetables? Are they from the ARE garden?
http://www.publicdomainsoftware.org/...egetables2.JPG ...read
more';

# # # # # # # # # # # # # # # # # # # # # # # #
#
# Remember to define a "compressed" class in your stylesheet
#
# # # # # # # # # # # # # # # # # # # # # # # #

echo compress_url($txt);
?>
Happy Coding :-)
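
One caveat for anyone reading this later: the "e" pattern modifier used above was deprecated in PHP 5.5 and removed in PHP 7. A rough equivalent using preg_replace_callback() might look like the sketch below. The name compress_url_cb() and the htmlspecialchars() calls are additions for illustration and are not part of the function as posted.

```php
<?php
// Sketch of compress_url() without the "e" modifier.
// compress_url_cb is a hypothetical name, not from the original post.
function compress_url_cb($txt, $size = 40) {
    // Same pattern as before, minus the trailing "e".
    $rx = '=(http://\S{' . ($size - 7) . ',})=';
    return preg_replace_callback($rx, function ($m) use ($size) {
        // $m[1] holds the full URL captured by the parentheses.
        $url   = htmlspecialchars($m[1], ENT_QUOTES);
        // Visible text: the first $size-10 characters, "...", the last 7.
        $label = htmlspecialchars(
            substr($m[1], 0, $size - 10) . '...' . substr($m[1], -7),
            ENT_QUOTES
        );
        return '[<a class="compressed" href="' . $url . '" title="' . $url . '">'
             . $label . '</a>]';
    }, $txt);
}

echo compress_url_cb('see http://example.com/' . str_repeat('x', 40) . ' end'), "\n";
```

The callback receives the match as an array, so there is no PHP code inside a string to be evaluated, and the quoting becomes a good deal easier to read.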

--
USENET would be a better place if everybody read: : mail address :
http://www.catb.org/~esr/faqs/smart-questions.html : is valid for :
http://www.netmeister.org/news/learn2quote2.html : "text/plain" :
http://www.expita.com/nomime.html : to 10K bytes :
Jul 17 '05 #2
Pedro Graca <he****@hotpop.com> wrote in message news:<sl*******************@ID-203069.user.uni-berlin.de>...
> Try this. Change as you see fit.
>
> <?php
> function compress_url($txt, $size=40) {
> $rx = '=(http://\S{' . ($size-7) . ',})=e';
> $compressed_txt = preg_replace($rx,
> "'[<a class=\"compressed\" href=\"$1\" title=\"$1\">'
> . substr('$1', 0, $size-10)
> . '...'
> . substr('$1', -7)
> . '</a>]'",
> $txt);
> return $compressed_txt;
> }


That looks brilliant, though I have trouble reading it. When you write:

http://\S{' . ($size-7) . ',

are the dots saying "one or more of this white space"?
Jul 17 '05 #3
lawrence wrote:
> That looks brilliant, though I have trouble reading it. When you write:
>
> http://\S{' . ($size-7) . ',
>
> are the dots saying "one or more of this white space"?


No. They are the string concatenator; they are not part of the regular
expression.

If I want to find 40 or more non-whitespace characters in a regular
expression I do

\S{40,}

In the function, I made the length a parameter, so that should be

\S{$size,} *** DOES NOT WORK LIKE THIS!

but, for that specific function I'm already using "http://" (7 chars),
so, that part of the regexp is

\S{$size-7,} *** DOES NOT WORK LIKE THIS!

So, that $rx line concatenates these three strings:
=(http://\S{
$size - 7 *** the result of the subtraction
,})=e

giving, for $size=40

=(http://\S{33,})=e

so it will match http URLs (and not https, ftp, mailto, ...) that are
40 characters or longer.
HTH
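
To see the concatenation at work, one can print the pattern and try it against a few strings. A small sketch; the "e" modifier is left off here because nothing is being evaluated, and the test strings are made up:

```php
<?php
$size = 40;

// Three pieces joined with the "." operator; inside single quotes
// the \S is passed to the regex engine untouched.
$rx = '=(http://\S{' . ($size - 7) . ',})=';

echo $rx, "\n";   // prints =(http://\S{33,})=

// "http://" plus 33 more characters is exactly 40, so this matches.
echo preg_match($rx, 'http://' . str_repeat('a', 33)), "\n";   // prints 1

// One character short: no match.
echo preg_match($rx, 'http://' . str_repeat('a', 32)), "\n";   // prints 0

// https is not covered by the pattern.
echo preg_match($rx, 'https://' . str_repeat('a', 40)), "\n";  // prints 0
```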

Jul 17 '05 #4
Pedro Graca <he****@hotpop.com> wrote in message news:<sl*******************@ID-203069.user.uni-berlin.de>...
> No. They are the string concatenator; they are not part of the regular
> expression.
>
> So, that $rx line concatenates these three strings:
> =(http://\S{
> $size - 7 *** the result of the subtraction
> ,})=e
>
> giving, for $size=40
>
> =(http://\S{33,})=e

So you can, so to speak, go in and out of "regex mode" by using a
single quote:

'

I assume this is simply the way PHP is built. And when I wanted to use
a real ' I suppose I would do this:

\'
Jul 17 '05 #5
lawrence wrote:
> So you can, so to speak, go in and out of "regex mode" by using a
> single quote:
>
> '


No, not quite!
This is all standard string management:
http://www.php.net/manual/en/language.types.string.php

I prefer to use single quotes most of the time.

$x = 'abc'; // $x holds a three-character string
$x = $x . 14; // PHP automagically transforms the number 14 into a
// two-character string; $x now holds a five-character
// string
$x .= 'yz'; // add two more characters to $x
// making it "abc14yz" (without the quotes)

It's the exact same thing with the regexp above :)
Instead of it being constant, it is a /dynamic/ regexp.

> I assume this is simply the way PHP is built. And when I wanted to use
> a real ' I suppose I would do this:
>
> \'


If it's inside single quotes, yes.
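
A short sketch of the quoting rules discussed above (the variable names are arbitrary and only for illustration):

```php
<?php
$n = 14;

// Inside single quotes, \' produces a literal single quote.
$a = 'It\'s';
// Inside double quotes the single quote needs no escaping.
$b = "It's";

// Concatenation: the number is converted to a string automatically.
$c = 'abc' . $n . 'yz';

// Double quotes interpolate variables; single quotes do not.
$d = "abc{$n}yz";
$e = 'abc{$n}yz';

echo $a, "\n";   // prints It's
echo $c, "\n";   // prints abc14yz
echo $d, "\n";   // prints abc14yz
echo $e, "\n";   // prints abc{$n}yz
```

This is one reason single quotes are convenient for regular expressions: the backslashes and dollar signs in the pattern reach the regex engine as written.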

--
USENET would be a better place if everybody read: | to email me: use |
http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
http://www.netmeister.org/news/learn2quote2.html | header, textonly |
http://www.expita.com/nomime.html | no attachments. |
Jul 17 '05 #6
