The best way to parse an html file?

Hi,

I have an HTML file that I want to parse with ASP.NET to retrieve the
value of a custom tag. Let's say the average HTML file is about 30 KB.
Once the file is loaded and converted into a single string, I currently
use two string.IndexOf calls to find the beginning and the end of the
desired tag, and then string.Substring to extract the data. I'm not using
regular expressions, since I know exactly which tags to find.
My function looks like this:

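private string ParseHtml(string html)
{
    // Strip Windows line breaks so the markers and the tag are contiguous
    html = html.Replace("\r\n", "");

    // IndexOf returns -1 when the marker is missing and 0 when it starts the string
    int begin = html.IndexOf("%%StartGetHtml%%");
    if (begin < 0)
    {
        return null;
    }
    // Search for the end marker only after the start marker
    int end = html.IndexOf("%%EndGetHtml%%", begin);
    if (end < 0)
    {
        return null;
    }

    // Gets the beginning of the tag: the first "<" after the start marker
    int begin2 = html.IndexOf("<", begin);
    // Gets the end of the tag: a ">" within the 3 characters before the end marker
    int end2 = html.IndexOf(">", end - 3);
    string str = null;
    if (begin2 >= 0 && begin2 < end2 && end2 < end)
    {
        // Gets the tag, up to (not including) the end marker
        str = html.Substring(begin2, end - begin2);
    }
    return str;
}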

Is this the fastest way, or is there a better way to do this?

Thanks

Stephane
Nov 18 '05 #1


Stephane wrote:

Is this the fastest way, or is there a better way to do this?


If those string processing attempts suffice for you, then use them, but in
general, if you want to parse HTML, you might want to check SGMLReader; see
http://www.gotdotnet.com/community/u...ery=sgmlreader
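
If the string approach is enough, a minimal sketch of it might look like the following. It is not a definitive implementation: the class name MarkerExtractor and the helper ExtractBetween are hypothetical, and it assumes the markers appear literally in the page and that only the text between them is wanted. It passes StringComparison.Ordinal (available from .NET 2.0) so that IndexOf compares raw characters instead of applying the culture-sensitive rules it uses by default, which is usually the cheaper option for fixed markers like these.

```csharp
using System;

public static class MarkerExtractor
{
    // Hypothetical helper: returns the trimmed text between startMarker and
    // endMarker, or null when either marker is missing.
    public static string ExtractBetween(string html, string startMarker, string endMarker)
    {
        // Ordinal comparison matches raw characters and skips culture rules.
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }

        // Begin looking for the end marker right after the start marker.
        int contentStart = start + startMarker.Length;
        int end = html.IndexOf(endMarker, contentStart, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        // Trim removes the line breaks and spaces that surround the content.
        return html.Substring(contentStart, end - contentStart).Trim();
    }

    public static void Main()
    {
        string page = "<html><body>%%StartGetHtml%%\r\n<b>Hello</b>\r\n%%EndGetHtml%%</body></html>";
        // Prints: <b>Hello</b>
        Console.WriteLine(ExtractBetween(page, "%%StartGetHtml%%", "%%EndGetHtml%%"));
    }
}
```

Checking each IndexOf result for a negative value before using it avoids the exception that a start index of -1 would otherwise cause, and it also handles a marker that sits at position 0.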

--

Martin Honnen
http://JavaScript.FAQTs.com/
Nov 18 '05 #2
