Bytes | Software Development & Data Engineering Community

Find text within HTML file

Hi

Given a keyword, I need to search an HTML file for it, ignoring all
the tags and checking only the plain text.
Is there an easy way to do this in C#?

Thanks
PK

Sep 8 '07 #1
Hi Piotrekk,

For example, System.IO.File.ReadAllText(@"C:\text.txt").Contains("something")

Regards, Alex
[TechBlog] http://devkids.blogspot.com

Sep 8 '07 #2
This will not do what I asked for.
That only reads the file and searches the raw text. I need to find text between
the HTML tags, i.e. the text that is visible to a user opening the page.

Sep 8 '07 #3
Hi Piotrekk,

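Hmm, yeah, sorry. The simplest way is to match a regex like "search_string(?=[^>]*<)",
which only accepts a match whose next angle bracket is a '<', i.e. one that lies in
text between tags rather than inside a tag.
Anything more robust depends on the properties of the HTML (whether it is valid,
which tags should be ignored, and so on).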
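The lookahead regex suggested above can be sketched in C# roughly as follows. This is a minimal illustration, assuming the whole page has already been read into a string (for example with File.ReadAllText); the class and method names are hypothetical. Regex.Escape is added so that keywords containing characters such as '.' or '+' are matched literally. It remains a heuristic: a keyword after the last tag (with no following '<') is missed, and text inside <script> or <style> elements still counts as "between tags".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the lookahead heuristic from this post.
public static class HtmlTextSearch
{
    // Returns true if keyword occurs at a position where the next angle bracket
    // is '<' (heuristically: in text between tags, not inside a tag).
    public static bool ContainsOutsideTags(string html, string keyword)
    {
        // Escape the keyword so regex metacharacters are matched literally,
        // then require "any non-'>' characters followed by '<'" after it.
        string pattern = Regex.Escape(keyword) + "(?=[^>]*<)";
        // Case-sensitive, like the original suggestion.
        return Regex.IsMatch(html, pattern);
    }
}
```

Usage would be something like HtmlTextSearch.ContainsOutsideTags(File.ReadAllText(path), "keyword"). Passing RegexOptions.IgnoreCase to Regex.IsMatch is an option if a case-insensitive search is wanted.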

Regards, Alex

Sep 8 '07 #4
You could use a Regex.Replace call with a suitable regular expression to
"clean" all the HTML tags out of the page's text, but that might not even be
necessary, since your keyword is unlikely to appear in HTML tag names or
attributes.
Have you tried just:

int foundPosition = myHtmlString.IndexOf(keyWord);

This returns the zero-based position of the first occurrence of the keyword,
or -1 if it is not found.
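The strip-the-tags-then-search idea can be sketched like this. It is a minimal example, assuming the page fits in memory as a string; the names are hypothetical, and the tag pattern "<[^>]*>" is a simple heuristic rather than a real HTML parser. Removing <script>/<style> blocks, decoding entities with WebUtility.HtmlDecode (so "&amp;" reads as "&") and the case-insensitive comparison are all additions beyond the post, aimed at matching only the text a user actually sees. Note that the returned index refers to the stripped text, not to the original HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: strip markup, then search the remaining plain text.
public static class HtmlPlainText
{
    // Removes script/style blocks and tags, then decodes entities such as &amp;.
    public static string StripTags(string html)
    {
        // Drop <script>...</script> and <style>...</style> blocks, which are not displayed.
        string noScripts = Regex.Replace(
            html,
            @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Replace every remaining tag with a space so adjacent words do not merge.
        string noTags = Regex.Replace(noScripts, "<[^>]*>", " ");
        return WebUtility.HtmlDecode(noTags);
    }

    // Returns the zero-based index of keyword in the page's plain text, or -1 if absent.
    public static int IndexOfInText(string html, string keyword)
    {
        return StripTags(html).IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
    }
}
```

Replacing tags with a space rather than an empty string is a deliberate choice: "<b>a</b><i>b</i>" then yields separate words instead of "ab". Drop StringComparison.OrdinalIgnoreCase for a case-sensitive search.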
-- Peter
Recursion: see Recursion
site: http://www.eggheadcafe.com
unBlog: http://petesbloggerama.blogspot.com
BlogMetaFinder: http://www.blogmetafinder.com

Sep 8 '07 #5
Have a look at the HTML Agility Pack:
http://www.codeplex.com/Wiki/View.as...tmlagilitypack
It is a lovely bit of freeware that will parse just about any HTML, even
badly formed markup.
It is great for wandering around inside an HTML page, plucking out and
inserting field values.
hth
Bob
Sep 9 '07 #6
