Grab Text Between Tags

I have researched the gist behind making this work, but I am having a unique problem that I can't seem to fix.

I use the following code to grab the text between these two tags.
    $content = "[tag]Hello[/tag]";
    preg_match_all("/(\[([\w]+)\])(.*)(\[\/\\2\])/", $content, $matches);
    print_r($matches);
This successfully yields the following result:
    Array
    (
        [0] => Array
            (
                [0] => [tag]Hello[/tag]
            )
        [1] => Array
            (
                [0] => [tag]
            )
        [2] => Array
            (
                [0] => tag
            )
        [3] => Array
            (
                [0] => Hello
            )
        [4] => Array
            (
                [0] => [/tag]
            )
    )
The problem arises when there are duplicate tags.
    $content = "[tag]Hello[/tag] More Text [tag]Hello2[/tag]";
    preg_match_all("/(\[([\w]+)\])(.*)(\[\/\\2\])/", $content, $matches);
    print_r($matches);
With this I end up with the following:
    Array
    (
        [0] => Array
            (
                [0] => [tag]Hello[/tag] More Text [tag]Hello2[/tag]
            )
        [1] => Array
            (
                [0] => [tag]
            )
        [2] => Array
            (
                [0] => tag
            )
        [3] => Array
            (
                [0] => Hello[/tag] More Text [tag]Hello2
            )
        [4] => Array
            (
                [0] => [/tag]
            )
    )
I need it to stop at the first closing tag and then register the second opening and closing tag as a separate match. Any ideas?
Apr 11 '07 #1
bucabay
What is happening is that the .* quantifier in your regular expression is greedy: it matches as many characters as it can, so it runs all the way to the last closing tag.

To make a quantifier match as few characters as possible, append the "un-greedy" indicator, "?", to it.

e.g.:

[PHP]preg_match_all("/(\[([\w]+)\])(.*?)(\[\/\\2\])/", $content, $matches);[/PHP]

Notice you now have (.*?) matching the characters between the tags rather than the previous (.*).

(.*?) will match until it reaches the first (\[\/\\2\])

Before, you had:

(.*) will match until it reaches the last (\[\/\\2\])
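
As a rough sketch of how the un-greedy version behaves on the two-tag input from the question, the snippet below runs the same regular expression and prints one line per tag pair. The single-quoted pattern (so the backreference is written \2 instead of \\2) and the PREG_SET_ORDER flag are my own choices for the example, not something from the original code; PREG_SET_ORDER simply groups the results per match, which may be closer to the "different array" layout asked for.

```php
<?php
// Sample input from the question: two sibling [tag] pairs.
$content = "[tag]Hello[/tag] More Text [tag]Hello2[/tag]";

// Same regex as above, in a single-quoted string so \2 needs no extra escaping.
// The lazy (.*?) stops at the first closing tag that matches the backreference.
$pattern = '/(\[([\w]+)\])(.*?)(\[\/\2\])/';

// PREG_SET_ORDER (assumption: chosen for this example) returns one array per match
// instead of one array per capture group.
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[2] is the tag name, $m[3] is the text between the tags.
    echo $m[2], ': ', $m[3], "\n";
}
// Prints:
// tag: Hello
// tag: Hello2
```

Without PREG_SET_ORDER the same two matches come back in the default layout, so $matches[3] would hold both "Hello" and "Hello2".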
Apr 11 '07 #2
Rock on! Thanks a lot. I thought I had tried that but I guess working at 4 am with 0 sleep can screw with your head. Thanks again.
Apr 11 '07 #3
