HELP !!! (Capture text between html tags)

Hey ppl,

How can we capture text between html tags using regular expressions? For example, how to capture the words "hello", "world", "bla", "bla" and "bla" in the following input:

<br><i>hello world <br><br> bla bla bla <br>

Best Regards
Jul 31 '08 #1
A parser may already exist for this kind of thing, but why don't you just parse it yourself? Read in the text one character at a time. When you get to a ">", look at what comes next. If it's a "<", do nothing; otherwise keep the characters and store them (capture them) until you reach the next "<". Basically, if there's text between ">" and "<", you extract it. Otherwise, you keep going.

Hope this helped,

Edit: you might also want to do something to ignore whitespace-only text in case someone goes <br> <br>Something</br></br> ...
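
The character-by-character idea above could look roughly like this in Java. This is only a sketch: the class and method names (ManualTagScanner, textBetweenTags) are made up for illustration, it also keeps any text that appears before the first tag, and it assumes that "<" and ">" only ever appear as tag delimiters (no comments, scripts, or ">" inside attribute values).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class ManualTagScanner {

    /** Returns the non-blank runs of text found outside <...> tags. */
    static List<String> textBetweenTags(String html) {
        List<String> runs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;

        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                // A tag starts: whatever was collected so far is one run of text.
                flush(current, runs);
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                current.append(c);
            }
        }
        flush(current, runs); // text after the last tag, if any
        return runs;
    }

    /** Stores the collected text unless it is empty or whitespace-only. */
    private static void flush(StringBuilder current, List<String> runs) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            runs.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String input = "<br><i>hello world <br><br> bla bla bla <br>";
        System.out.println(textBetweenTags(input)); // prints [hello world, bla bla bla]
    }
}
```

To get the individual words, each run can then be split on whitespace.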
Jul 31 '08 #2
Java has an entire framework implemented for manipulating HTML, so why do it
all yourself? Create an HTMLEditorKit and an HTMLDocument. Make
the kit read data into the document given a simple Reader. When the content
is loaded, ask the document for an HTMLDocument.Iterator with getIterator. The
method needs an HTML.Tag to iterate over; the iterator delivers the start and
end offsets of each occurrence, which you can pass to the document's getText
to get the text.
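
A rough sketch of the loading step described above, using the Swing classes from javax.swing.text.html. To keep it short it swaps the per-tag iterator for a plain getText call over the whole document, which returns all the text with the tags already stripped. The class and method names (SwingHtmlText, plainText) are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper, not part of any library.
final class SwingHtmlText {

    /** Parses the HTML with HTMLEditorKit and returns the document's plain text. */
    static String plainText(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids a ChangedCharSetException if the input carries a charset <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        Reader reader = new StringReader(html);
        kit.read(reader, doc, 0);               // load the markup into the document
        return doc.getText(0, doc.getLength()); // text content only, no tags
    }

    public static void main(String[] args) throws Exception {
        String text = plainText("<br><i>hello world <br><br> bla bla bla <br>");
        // Collapse line breaks and runs of spaces before printing.
        System.out.println(text.replaceAll("\\s+", " ").trim());
    }
}
```

The returned text may contain line breaks where the <br> tags were, so normalise the whitespace before splitting it into words.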

kind regards,

Aug 1 '08 #3
I would use a regular expression:
  String[] yourArray; // an array of the captured text
  yourArray = yourString.split("<[^>]*>"); // split on every tag; the result can contain empty or whitespace-only entries
Unfortunately there is no implode method (String.join only arrived in Java 8), so if you need it to be one string you have to connect those parts together yourself, e.g. using a StringBuilder.
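
Here is a slightly fuller sketch of the split approach, assuming the tags are simple enough for the pattern "<[^>]*>" to match them (no ">" inside attribute values, no comments or scripts). It drops the empty and whitespace-only pieces, breaks the rest into words, and shows a hand-rolled join with StringBuilder. The names TagSplitWords, extractWords and join are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class TagSplitWords {

    /** Splits the input on tags and returns the individual words between them. */
    static List<String> extractWords(String html) {
        List<String> words = new ArrayList<String>();
        for (String part : html.split("<[^>]*>")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing but whitespace between two tags
            }
            for (String word : trimmed.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }

    /** A small stand-in for implode: joins the parts with the given separator. */
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = extractWords("<br><i>hello world <br><br> bla bla bla <br>");
        System.out.println(words);            // prints [hello, world, bla, bla, bla]
        System.out.println(join(words, " ")); // prints hello world bla bla bla
    }
}
```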

good luck with your project
jan jarczyk
Aug 2 '08 #4
