Bytes | Software Development & Data Engineering Community
C# .NET: get text between body tags of an HTML file

rhitam30111985
112 New Member
Hi all,

I am trying to read an HTML file and retrieve only the text between
the body tags of that file. For reading a string between two
strings, I already have a function:


http://www.mycsharpcorner.com/Post.aspx?postID=15


But the problem is that the body tag might have some attributes. In
that case I don't know how to exclude them and get only the text
between the tags, i.e. something like this:


<body style="margin:0;padding:0">
..
.
.
.
</body>


Any ideas?


Regards,
Rhitam
May 5 '09 #1
cloud255
427 Recognized Expert Contributor
This needs only a slight modification to the search algorithm: the start marker is not the entire opening tag "<body>" but the first occurrence of ">" after the start of the body tag, "<body".

So you need to find the end index of the "<body" string and use it as the start index for your next search, in which you look for the ">" character. The position right after that ">" is the starting index of your actual text.

From this point it's fairly straightforward to find the starting index of the closing "</body>" tag, which marks the end of the text.

This is string manipulation, which can be done using Regex or the .NET String class.
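
The steps above could be sketched roughly like this with plain String methods. The class and method names (BodyExtractor, GetBodyText) are made up for illustration, and the sketch assumes no attribute value contains a ">" character and that the file has at most one body element:

```csharp
using System;

public static class BodyExtractor
{
    // Hypothetical helper: returns the text between <body ...> and </body>,
    // or null if either tag cannot be found.
    public static string GetBodyText(string html)
    {
        // Step 1: find where the opening tag starts ("<BODY" is valid HTML too).
        int tagStart = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (tagStart < 0) return null;

        // Step 2: the opening tag ends at the first '>' after "<body",
        // no matter which attributes it carries.
        int tagEnd = html.IndexOf('>', tagStart);
        if (tagEnd < 0) return null;

        // Step 3: the content runs from just after that '>' to the closing tag.
        int contentStart = tagEnd + 1;
        int contentEnd = html.IndexOf("</body>", contentStart, StringComparison.OrdinalIgnoreCase);
        if (contentEnd < 0) return null;

        return html.Substring(contentStart, contentEnd - contentStart);
    }
}
```

For example, GetBodyText("<html><body style=\"margin:0;padding:0\">Hello</body></html>") returns "Hello".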
May 5 '09 #2
r035198x
13,262 MVP
Just use regex.
If you have

    string text = "<body style=\"margin:0;padding:0\">r035198x</body> ";

then

    // needs: using System.Text.RegularExpressions;
    string result = Regex.Replace(text, "</?body.*?>", "");

should do it.

P.S. Not tested with a C# compiler.
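
Note that stripping the tags only helps when the string holds nothing but the body element. For a whole file (with a head section and so on), one possible variation is to capture the content with a group instead. This is only a sketch: BodyRegex and GetBodyText are hypothetical names, and it assumes reasonably well-formed HTML with a single body element:

```csharp
using System.Text.RegularExpressions;

public static class BodyRegex
{
    // "<body", optional attributes up to the first '>', then a lazy capture
    // of everything up to the closing tag.
    // IgnoreCase accepts <BODY>; Singleline lets '.' match line breaks.
    private static readonly Regex BodyPattern = new Regex(
        @"<body\b[^>]*>(.*?)</body\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the captured body content, or null if no match.
    public static string GetBodyText(string html)
    {
        Match m = BodyPattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Regular expressions tend to break on malformed or unusual markup, so for arbitrary pages a proper HTML parser is likely the safer choice.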
May 5 '09 #3
