extract text from html

I've got some text with a few HTML tags, such as the following
<Bold>Hello</Bold>There buddy<p>please .....

I need to be able to extract just the text, which would be
Hello there buddy please....

Note, this is a Windows App, and not a Web App.
Any ideas anyone?
Dec 12 '05 #1
Patrick,

Have a look and see whether this sample fits your needs:

http://www.vb-tips.com/default.aspx?...f-56dbb63fdf1c

I hope this helps,

Cor
Dec 12 '05 #2
"Patrick" <pr***@pnews.ui k> wrote:
I've got some text with a few HTML tags, such as the following
<Bold>Hello</Bold>There buddy<p>please .....

<URL:http://dotnet.mvps.org/dotnet/code/net/#InternetLoadFile>

+

MSHTML
(<URL:http://groups.google.de/group/microsoft.public.de.german.entwickler.dotnet.csharp/msg/a83d872faebd1134>)

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://classicvb.org/petition/>

Dec 12 '05 #3
Patrick,

If your goal is simply to remove the HTML tags from a string,
I made a function for this purpose using a Regex:

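Private Function stripHTML(ByVal strHTML As String) As String
    ' Lazily match "<", then any characters (including newlines), up to the next ">".
    Dim objRegExp As New System.Text.RegularExpressions.Regex("<(.|\n)+?>")
    ' Replace every matched tag with an empty string, leaving only the text.
    Return objRegExp.Replace(strHTML, "")
End Function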

I use this in a WinForms app that strips websites for valuable information
with a WebClient.
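
One thing to watch with the function above: because each tag is replaced with an empty string, the sample from the question comes out as "HelloThere buddyplease ....." with the words run together. A minimal sketch of one way around that is below. It replaces each tag with a space, decodes entities, and collapses whitespace. The names HtmlTextDemo and HtmlToText are hypothetical (not from the thread), and it assumes .NET Framework 4.0 or later for System.Net.WebUtility; on older frameworks System.Web.HttpUtility.HtmlDecode would be the nearest equivalent. It is still regex-based, so it will not cope with script or style blocks, comments, or a ">" inside an attribute value.

```vb
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlTextDemo

    ' Hypothetical helper (not from the thread): strips tags, decodes
    ' entities, and tidies whitespace. Assumes .NET Framework 4.0+.
    Function HtmlToText(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return String.Empty

        ' Replace each tag with a space so adjacent words do not run together.
        ' RegexOptions.Singleline lets "." match newlines inside a tag.
        Dim noTags As String = Regex.Replace(html, "<.+?>", " ", RegexOptions.Singleline)

        ' Turn entities such as &amp; back into plain characters.
        Dim decoded As String = WebUtility.HtmlDecode(noTags)

        ' Collapse runs of whitespace to single spaces and trim the ends.
        Return Regex.Replace(decoded, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Sample input taken from the original question.
        Console.WriteLine(HtmlToText("<Bold>Hello</Bold>There buddy<p>please ....."))
    End Sub

End Module
```

For real web pages rather than short tagged strings, a proper HTML parser (such as the MSHTML route suggested earlier in the thread) is likely to be more robust than any regex.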

hth

Michel Posseth [MCP]
Dec 14 '05 #4
Excellent solutions.
Thanx.
Dec 15 '05 #5
