Bytes | Software Development & Data Engineering Community
ASP Question: Parse HTML file?

Hi all,

I'm working on a project where there are just under 1300 course files, these
are HTML files - my problem is that I need to do more with the content of
these pages - and the thought of writing 1300 asp pages to deal with this
doesn't thrill me.

The HTML pages are provided by a training company. They seem to be
"structured" to some degree, but I'm not sure how easy its going to be to
parse the page.

Typically there are the following "sections" of each page:

Title
Summary
Topics
Technical Requirements
Copyright Information
Terms Of Use

I need to get the content for the Title, Summary, Topics, Technical
Requirements and lose the Copyright and Terms of use...in addition I need to
squeeze in a new section which will display pricing information and a link
to "Add to cart" etc....

My "plan" (if you can call it that) was to have 1 asp page which can parse
the appropriate HTML file based on the asp page being passed a code in the
querystring - the code will match the filename of the HTML page (the first
part prior to the dot).
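Roughly what I have in mind for that single page is something like this (completely untested - the /courses folder name and the 404 handling are just illustrative):

```vbscript
' Sketch of the single dispatcher page (untested; paths illustrative).
' e.g. /course.asp?code=560c04 maps to /courses/560c04.html
Dim strCode, strPath, objFSO, objFile, strHTML, objRE

strCode = Request.QueryString("code")

' Only allow simple alphanumeric codes so the querystring
' can't be used to read arbitrary files off the server.
Set objRE = New RegExp
objRE.Pattern = "^[a-z0-9]+$"
objRE.IgnoreCase = True
If Not objRE.Test(strCode) Then
    Response.Status = "404 Not Found"
    Response.End
End If

strPath = Server.MapPath("/courses/" & strCode & ".html")
Set objFSO = CreateObject("Scripting.FileSystemObject")
If Not objFSO.FileExists(strPath) Then
    Response.Status = "404 Not Found"
    Response.End
End If

Set objFile = objFSO.OpenTextFile(strPath, 1)
strHTML = objFile.ReadAll()
objFile.Close

' ...then parse strHTML for the sections I need and Response.Write
' my own layout with the pricing / add-to-cart bits inserted...
```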

What I then need to do is go through the content of the HTML....this is
where I am currently stuck....

I have pasted an example of one of these pages below - if anyone can suggest
to me how I might achieve this I would be most grateful - in addition - if
anyone can explain the XML Name Space stuff in there that would be handy
too - I figure this is just a normal HTML page, as there is no declaration
or anything at the top?

Any information/suggestions would be most appreciated.

Thanks in advance for your help,

Regards

Rob
Example file:

<html>
<head>
<title>Novell 560 CNE Series: File System</title>
<meta name="Description" content="">
<link rel="stylesheet" href="../resource/mlcatstyle.css"
type="text/css">
</head>
<body class="MlCatPage">
<table class="Header" xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<tr>
<td class="Logo" colspan="2">
<img class="Logo" src="../images/logo.gif">
</td>
</tr>
<tr>
<td class="Title">
<div class="ProductTitle">
<span class="CoCat">Novell 560 CNE Series: File System</span>
</div>
<div class="ProductDetails">
<span class="SmallText">
<span class="BoldText"> Product Code: </span>
560c04<span class="BoldText"> Time: </span>
4.0 hour(s)<span class="BoldText"> CEUs: </span>
Available</span>
</div>
</td>
<td class="Back">
<div class="BackButton">
<a href="javascript:history.back()">
<img src="../images/back.gif" align="right" border="0">
</a>
</div>
</td>
</tr>
</table>
<br xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<table class="HighLevel" xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<tr>
<td class="BlockHeader">
<h3 class="sectiontext">Summary:</h3>
</td>
</tr>
<tr>
<td class="Overview">
<div class="ProductSummary">This course provides an introduction
to NetWare 5 file system concepts and management procedures.</div>
<br>
<h3 class="Sectiontext">Objectives:</h3>
<div class="FreeText">After completing this course, students will
be able to: </div>
<div class="ObjectiveList">
<ul class="listing">
<li class="ObjectiveItem">Explain the relationship of the file
system and login scripts</li>
<li class="ObjectiveItem">Create login scripts</li>
<li class="ObjectiveItem">Manage file system directories and
files</li>
<li class="ObjectiveItem">Map network drives</li>
</ul>
</div>
<br></br>
<h3 class="Sectiontext">Topics:</h3>
<div class="OutlineList">
<ul class="listing">
<li class="OutlineItem">Managing the File System</li>
<li class="OutlineItem">Volume Space</li>
<li class="OutlineItem">Examining Login Scripts</li>
<li class="OutlineItem">Creating and Executing Login
Scripts</li>
<li class="OutlineItem">Drive Mappings</li>
<li class="OutlineItem">Login Scripts and Resources</li>
</ul>
</div>
</td>
</tr>
</table>
<br xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<table class="Details" xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<tr>
<td class="BlockHeader">
<h3 class="Sectiontext">Technical Requirements:</h3>
</td>
</tr>
<tr>
<td class="Details">
<div class="ProductRequirements">200MHz Pentium with 32MB Ram. 800
x 600 minimum screen resolution. Windows 98, NT, 2000, or XP. 56K minimum
connection speed, broadband (256 kbps or greater) connection recommended.
Internet Explorer 5.0 or higher required. Flash Player 7.0 or higher
required. JavaScript must be enabled. Netscape, Firefox and AOL browsers not
supported.</div>
</td>
</tr>
</table>
<br xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<table class="Legal" xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:fn="http://www.w3.org/2005/xpath-functions">
<tr>
<td class="BlockHeader">
<h3 class="Sectiontext">Copyright Information:</h3>
</td>
</tr>
<tr>
<td class="Copyright">
<div class="ProductRequirements">Product names mentioned in this
catalog may be trademarks/servicemarks or registered trademarks/servicemarks
of their respective companies and are hereby acknowledged. All product
names that are known to be trademarks or service marks have been
appropriately capitalized. Use of a name in this catalog is for
identification purposes only, and should not be regarded as affecting the
validity of any trademark or service mark, or as suggesting any affiliation
between MindLeaders.com, Inc. and the trademark/servicemark
proprietor.</div>
<br>
<h3 class="Sectiontext">Terms of Use:</h3>
<div class="ProductUsenote"></div>
</td>
</tr>
</table>
<p align="center">
<span class="SmallText">Copyright &copy; 2006 MindLeaders. All rights
reserved.</span>
</p>
</body>
</html>
May 18 '06 #1

Rob Meade wrote:
[snip]


If you can identify the specific divs that hold this information (and
they are consistent across pages), you could use regex to parse the
files and pop the relevant bits into a database.
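For example, something along these lines (untested, off the top of my head - the class name and the lazy match would need checking against the real files):

```vbscript
' Untested sketch: pull one div's content out of a course file.
' strHTML holds the contents of the file (ReadAll'd in beforehand).
Dim re, matches, match
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
' VBScript's RegExp has no "dot matches newline" mode, so use
' [\s\S] to cross line breaks; the divs contain nested tags,
' so match lazily up to the first closing </div>.
re.Pattern = "<div class=""ProductSummary"">([\s\S]*?)</div>"
Set matches = re.Execute(strHTML)
For Each match In matches
    Response.Write match.SubMatches(0)  ' the text between the tags
Next
```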

--
Mike Brind

May 18 '06 #2

I have pasted an example of one of these pages below - if anyone can suggest to me how I might achieve this I would be most grateful - in addition - if
anyone can explain the XML Name Space stuff in there that would be handy
too - I figure this is just a normal HTML page, as there is no declaration
or anything at the top?


These pages will have been generated via an XSLT transform, and the transform
will have made use of these namespaces. However, unless told otherwise, an
XSLT processor outputs the xmlns declarations for those namespaces even when
no element belonging to them appears in the result - which is the case here.

That's a long-winded way of saying they don't do anything - ignore them.
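For example, if their stylesheet had declared something like this (illustrative - I haven't seen their transform), the prefixes would have been dropped from the output:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="fo fn">
  <!-- ...templates as before; the fo: and fn: declarations
       no longer leak into the HTML output... -->
</xsl:stylesheet>
```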

It's a pity they didn't go the whole hog and output the whole page as XML; it
would be a lot easier to do what you need. Still, it's a good sign that the
content of the other 1299 pages is likely to be consistent, so Mike's idea
of scanning with RegExp should work.

Anthony.
May 18 '06 #3
"Rob Meade" <ro********@NO-SPAM.kingswoodweb.net> wrote in message
news:PS*****************@text.news.blueyonder.co.uk...

[snip]

Consider displaying their page inside of an <iframe>
inside of a page that has your content.

"The iframe element creates an inline frame that contains another document."
http://www.w3schools.com/tags/tag_iframe.asp
May 18 '06 #4
"McKirahan" wrote ...
Consider displaying their page inside of an <iframe>
inside of a page that has your content.


Hi McKirahan,

Thanks for your reply - alas I need "bits" of their pages, with "bits" of my
stuff inserted in between, so including their whole page as-is unfortunately
is no good for me.

Regards

Rob
May 19 '06 #5
"Mike Brind" wrote ...
If you can identify the specific divs that hold this information (and
they are consistent across pages), you could use regex to parse the
files and pop the relevant bits into a database.


Hi Mike,

Thanks for your reply.

I don't suppose by any chance you might have an example that would get me
started with that approach, would you? It sounds like it could well work.

Regards

Rob
May 19 '06 #6
"Anthony Jones" wrote ...
[snip]


Hi Anthony,

Thanks for the reply.

I especially appreciate the explanation of why they are there - I tried
googling it last night and found some stuff about XSLT 2.0, but it didn't
really get me anywhere. I would agree that it's a shame the pages aren't
XML - that would have been nice!

Cheers

Rob
May 19 '06 #7
"Mike Brind" <pa*******@hotmail.com> wrote in message
news:11**********************@y43g2000cwc.googlegroups.com...

Rob Meade wrote:
[snip]

If you can identify the specific divs that hold this information (and
they are consistent across pages), you could use regex to parse the
files and pop the relevant bits into a database.

--
Mike Brind


It would have been nice if each div class were unique.
This one is repeated:
<div class="ProductRequirements">
It's not wrong, just (potentially) inconvenient.

<td class="Details">
<div class="ProductRequirements">200MHz Pentium ...

<td class="Copyright">
<div class="ProductRequirements">Product names ...

Which div's are you interested in?
Here's a script that will extract all the div's into a new file:

Option Explicit
'*
Const cVBS = "Novell.vbs"
Const cOT1 = "Novell.htm" '= Input filename
Const cOT2 = "Novell.txt" '= Output filename
Const cDIV = "</div>"
'*
'* Declare Variables
'*
Dim intBEG
intBEG = 1
Dim arrDIV(9)
arrDIV(0) = "<div class=" & Chr(34) & "?" & Chr(34) & ">"
arrDIV(1) = "ProductTitle"
arrDIV(2) = "ProductDetails"
arrDIV(3) = "ProductSummary"
arrDIV(4) = "FreeText"
arrDIV(5) = "ObjectiveList"
arrDIV(6) = "OutlineList"
arrDIV(7) = "ProductRequirements"
arrDIV(8) = "ProductRequirements"
arrDIV(9) = "ProductUsenote"
Dim intDIV
Dim strDIV
Dim strOT1
Dim intPOS
Dim intEND
'*
'* Declare Objects
'*
Dim objFSO
Set objFSO = CreateObject("Scripting.FileSystemObject")
Dim objOT1
Set objOT1 = objFSO.OpenTextFile(cOT1,1)
Dim objOT2
Set objOT2 = objFSO.OpenTextFile(cOT2,2,True)
'*
'* Read File, Extract "div", Write Line
'*
strOT1 = objOT1.ReadAll()
objOT1.Close
For intDIV = 1 To UBound(arrDIV)
  strDIV = Replace(arrDIV(0),"?",arrDIV(intDIV))
  'Search from where the previous match ended, so the repeated
  '"ProductRequirements" class picks up its second occurrence
  intPOS = InStr(intBEG,strOT1,strDIV)
  If intPOS > 0 Then
    intEND = InStr(intPOS,strOT1,cDIV)
    If intEND > 0 Then
      intEND = intEND + Len(cDIV)
      objOT2.WriteLine(Mid(strOT1,intPOS,intEND-intPOS) & vbCrLf)
      intBEG = intEND
    End If
  End If
Next
objOT2.Close
'*
'* Destroy Objects
'*
Set objOT1 = Nothing
Set objOT2 = Nothing
Set objFSO = Nothing
'*
'* Done!
'*
MsgBox "Done!",vbInformation,cVBS

You could modify it to loop through a list or folder of files.
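For example (untested sketch - the folder names are just illustrative):

```vbscript
' Untested sketch: run the same extraction over every .htm/.html
' file in a folder (the C:\courses path is illustrative).
Dim objFSO, objFolder, objFile, strEXT
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\courses")
For Each objFile In objFolder.Files
    strEXT = LCase(objFSO.GetExtensionName(objFile.Name))
    If strEXT = "htm" Or strEXT = "html" Then
        ' open objFile.Path, extract the divs as above, and write
        ' the results to an output folder (or a database)
    End If
Next
```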

Note that each "class=" is in the stylesheet:
<link rel="stylesheet" href="../resource/mlcatstyle.css"
type="text/css">
which you should refer to when using their div's.
May 19 '06 #8
"McKirahan" wrote ...

Hi McKirahan, thank you again for your reply and example.

I should add that I won't be writing these out to another file - instead it'll
need to do it on the fly, ie, take the original source page based on the code
passed in the URL, read in the appropriate parts, and then spit out my own
layout and extra parts.

With the example you posted - does it extract what's between the DIV
tags, ie the <tr>'s and <td>'s as well, or just the actual "text"?

Thanks again

Rob
PS: The copyright one can be excluded..
PPS: When I say it's going to happen on the fly, this would obviously depend
on how quick and efficient it is - if, because of the number of hits the site
in question gets, it turns out to be a bit too slow, then I might have to have
some kind of "import" process (which would make more sense anyway) - this
could then create new pages, or perhaps store the information in the database.
[snip]

May 19 '06 #9
"Rob Meade" <ku***************@edaem.bor> wrote in message
news:e3**************@TK2MSFTNGP03.phx.gbl...
"McKirahan" wrote ...

[snip]


Did you try it as-is to see what you get?

I would probably put all 1300 files (pages) in a single folder.
Then run a process against each to generate 1300 new files in
a different folder. These would be posted for quick access.

Prior to posting, they could be reviewed for accuracy.

Also, instead of extracting out the div's you could just identify
where you want your stuff inserted.
May 19 '06 #10
"McKirahan" wrote ...
Did you try it as-is to see what you get?
Hi McKirahan, thanks for your reply.

Not as of yet, no - but I'm home this weekend so will be giving it a go :o)
I would probably put all 1300 files (pages) in a single folder.
They come in a /courses directory
Then run a process against each to generate 1300 new files in
a different folder. These would be posted for quick access.
I think I might have to change the process a bit but the idea is the same -
the content provider has other bits that link to these files, so they'd
still need to be in a /courses directory, but I could put them somewhere
else first, "mangle" them and then spit them out to the /courses directory
:o)
Prior to posting the could be reviewed for accuracy.
I might check a couple - but not all 1300 - I don't wanna go mental... :oD
Also, instead of extracting out the div's you could just identify
where you want your stuff inserted.


Yeah, but there were bits I needed to lose, ie the copyright section etc..

I seem to remember, a long time back, a discussion about transforming pages -
I think it might have been done with an ISAPI filter or something - from what
I remember the requested page would get grabbed, actions would happen, and
then it could be spat out as a different page. I wonder if this is what the
previous company adopted, because I find it hard to believe they would have
created 1300 asp files - yet all of the links on the original site were
<course-code>.asp as opposed to the real file <course-code>.html - if you see
what I mean...

Regards

Rob
May 20 '06 #11
"Rob Meade" <te*********************@edaem.bbor> wrote in message
news:cT******************@text.news.blueyonder.co.uk...

[snip]


An approach they could have taken was to store the "sections" in a database
table -- one memo field per section -- then generate static pages from it.

Thus, the header, navigation, and footer could be modified independently.
May 20 '06 #12
"McKirahan" wrote ...
An approach they could have taken was to store the "sections" in a
database
table -- one memo field per section -- then generate static pages from it.

Thus, the header, navigation, and footer could be modified independently.


I suspect the company does have this, but they most likely use it for the
generation of these files which they then sell on etc...

The one thing I do have missing at the moment is a nice file that ties the
<course_code>.html file names (or just the codes) to the titles of the
courses!

They give you a "contents.html" file which has all of the courses listed and
the codes / files as hyperlinks - but again it would mean parsing the entire
file to get at the goodies. I'm going to ask them if they have the same thing
in XML or a database or something, to hopefully make that a bit easier.

Thanks again for your help - alas, due to my 9-month-old son, I have yet to
get around to trying your example! But I will :o)

Rob
May 21 '06 #13
When you do get to try Rob's code, you will see that it opens a number
of possibilities - one of which is to insert the contents of the divs
into a database instead of writing them to 1300 text files. I really
can't understand why this is not at the top of your list of options -
manage 1300 files...? or manage 1? Hmmmm.... But then you obviously
know a lot more about your project than I do :-)

If you were using Rob's code, you could insert this into it:

If intDIV = 2 Then
    Dim re, m, myMatches, pcode
    Set re = New RegExp
    With re
        .Pattern = "Product Code: </span>[\s]+[\n]+[\s]+([a-z0-9]{6})"
        .IgnoreCase = True
        .Global = True
    End With
    Set myMatches = re.Execute(strOT2)
    For Each m In myMatches
        If m.Value <> "" Then
            pcode = Replace(m.Value,"Product Code: </span>","")
            pcode = Replace(pcode," ","")
            pcode = Replace(pcode,Chr(10),"")
            pcode = Replace(pcode,Chr(13),"")
            Response.Write pcode 'or write to db
        End If
    Next
    Set re = Nothing
End If

And that will return the Product Code on its own. Change the pattern
to "<title>[^<]*</title>" and you get the title stripped out too.

--
Mike Brind
Rob Meade wrote:
"McKirahan" wrote ...
An approach they could have taken was to store the "sections" in a
database
table -- one memo field per section -- then generate static pages from it.

Thus, the header, navigation, and footer could be modified independently.


I suspect the company does have this, but they most likely use it for the
generation of these files which they then sell on etc...

The one thing I do have missing at the moment is a nice file that ties the
<course_code.html> file names (or just the codes) - to the titles of the
courses!

They give you a "contents.html" file which has all of the courses listed and
the codes / files as hyperlinks - but again it would mean parsing the entire
file to get at the goodies, I'm going to ask them if they have the same
thing in XML/Database or something to hopefully make that a bit easier..

Thanks again for your help - alas due to my 9 month old son I have yet to
get around to trying your example! But I will :o)

Rob


May 21 '06 #14
"Mike Brind" wrote ...
[snip]


Hi Mike,

Thanks for your reply - something else to try with it - very much
appreciated, thank you.

Regards

Rob
PS: It's McKirahan's code ;o)
May 22 '06 #15

This thread has been closed and replies have been disabled. Please start a new discussion.
