
Regular Expression help

Rob
Guest
 
Posts: n/a
#1: Apr 26 '07
Hi,
I need to convert our Word documents to HTML for our website. I've used
MS Word's "Save as HTML" feature and ran "Microsoft Office HTML Filter
2.0" to clean up the code, but I am still stuck with a lot of extra markup
and I want to write a script that will do a custom cleanup.

The Word document has a "Table of Contents" and when I convert, I get
links at the top of my page that link to the appropriate section but I
get code like this:

<a name="_Toc54767572"></a><a name="_Toc58978952"></a><a
name="_Toc58980987"></a><a
name="_Toc58981749"></a><a name="_Toc90871301"></a><a
name="_Toc93973545"></a><a
name="_Toc126114863"></a>
<a name="_Toc157391168">My Title</a>

I get a whole bunch of empty anchor tags each with a different name and
only the last anchor tag is correct. I would like to use regular
expressions to remove all empty "a" tags.

I know how to use regular expressions with ASP 3.0 but I don't know the
pattern.

Does anyone know the regex.pattern to replace all empty <a> tags with an
empty string?

Thanks
Rob




Alexey Smirnov
Guest
 
Posts: n/a
#2: Apr 27 '07

re: Regular Expression help



"Rob" <robert@hotmail.com> wrote in message
news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
Quote:
> [..]
> I get a whole bunch of empty anchor tags each with a different name and
> only the last anchor tag is correct. I would like to use regular
> expressions to remove all empty "a" tags.

Rob, I think something similar to this should work:

Set RegularExpressionObject = New RegExp

With RegularExpressionObject
    .Pattern = "\<a(.|\n)*\>\<\/a\>"
    .IgnoreCase = True
    .Global = True
End With

ReplacedText = RegularExpressionObject.Replace(InitialText, "")


Evertjan.
Guest
 
Posts: n/a
#3: Apr 27 '07

re: Regular Expression help


Alexey Smirnov wrote on 27 Apr 2007 in
microsoft.public.inetserver.asp.general:
Quote:
> "Rob" <robert@hotmail.com> wrote in message
> news:uMscFjDiHHA.4904@TK2MSFTNGP05.phx.gbl...
> [..]
>> I get a whole bunch of empty anchor tags each with a different name
>> and only the last anchor tag is correct. I would like to use regular
>> expressions to remove all empty "a" tags.
>
> Rob, I think something similar to
>
> Set RegularExpressionObject = New RegExp
>
> With RegularExpressionObject
> .Pattern = "\<a(.|\n)*\>\<\/a\>"
> .IgnoreCase = True
> .Global = True
> End With
>
> ReplacedText = RegularExpressionObject.Replace(InitialText, "")

.Pattern = "<a[^>]*>\s*<\/a>"

will do.
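
A rough sketch of how the two patterns differ, written in JavaScript (close to the JScript that ASP can also run). The `sample` string is a made-up input, not taken from Rob's document. The greedy `(.|\n)*` can run from the first `<a` to the last `></a>` and swallow whatever sits in between, and all the backtracking it does is a likely reason it is slow on large pages. `[^>]*` cannot leave the opening tag.

```javascript
// Hypothetical input: real content sits between two empty anchors.
var sample = '<a name="x"></a><p>Keep me</p><a name="y"></a>';

// Pattern from the quoted reply: (.|\n)* is greedy and crosses tag boundaries.
var greedy = /<a(.|\n)*><\/a>/gi;

// Pattern suggested above: [^>]* stops at the end of the opening tag.
var bounded = /<a[^>]*>\s*<\/a>/gi;

var greedyResult = sample.replace(greedy, '');   // the paragraph is removed too
var boundedResult = sample.replace(bounded, ''); // only the empty anchors go

console.log(JSON.stringify(greedyResult));
console.log(JSON.stringify(boundedResult));
```

With this input the greedy pattern matches the whole string as a single match, while the bounded one removes the two anchors separately and leaves the paragraph alone.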

=================

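However, why not [yes, I know it is personal preference] use a bit of
JScript even if you use VBScript in ASP?


<% ' vbs
' VBScript has no "\n" escape, so the line breaks are built with vbLf
dim t, result
t = "x<a " & vbLf & "href='bbb'>" & vbLf & " </a>" & vbLf & vbLf & "<a href='bbb'>x </a>"
result = deleteEmptyAnchors(t)
%>


<script language='jscript' runat='server'>
// Removes every <a ...></a> pair that contains only whitespace
function deleteEmptyAnchors(t){
  return t.replace(/<a[^>]*>\s*<\/a>/gi,'');
}
</script>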


--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Rob
Guest
 
Posts: n/a
#4: Apr 27 '07

re: Regular Expression help


Thanks Evertjan

I tried the other example "\<a(.|\n)*\>\<\/a\>" but my page was taking
too long to process it. Then I tried your example "<a[^>]*>\s*<\/a>" and
it works great.
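
For reference, a minimal sketch of that pattern applied to the sample markup from the first post, written as plain JavaScript so it can be tried outside ASP. The variable names `wordHtml` and `cleaned` are only illustrative.

```javascript
// Same replacement as the JScript helper in the thread.
function deleteEmptyAnchors(t) {
  return t.replace(/<a[^>]*>\s*<\/a>/gi, '');
}

// Sample markup from the first post, line breaks included.
var wordHtml = [
  '<a name="_Toc54767572"></a><a name="_Toc58978952"></a><a',
  'name="_Toc58980987"></a><a',
  'name="_Toc58981749"></a><a name="_Toc90871301"></a><a',
  'name="_Toc93973545"></a><a',
  'name="_Toc126114863"></a>',
  '<a name="_Toc157391168">My Title</a>'
].join('\n');

// The seven empty anchors are removed; the titled anchor has content,
// so the pattern does not match it.
var cleaned = deleteEmptyAnchors(wordHtml);

console.log(JSON.stringify(cleaned));
```

Because `[^>]*` also matches line breaks, anchors whose `name` attribute wrapped onto the next line are removed as well. The one remaining line break before the titled anchor is left in place.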

Thanks again.

Rob


