Tough Regular Expression problem

Bryan

Hi All:

I'm trying to find the right Regexp string to remove empty SPAN tags
from an HTML string.

Say I have a string like so, and I want to remove the empty span tags:

<SPAN>This is my text</SPAN>

A simple expression like this /<SPAN>(.*)?<\/SPAN>/gi will give me the
text between the two span tags, which I can then use in a replace
statement.
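
For the flat case, a minimal sketch of that replace might look like the following. It assumes that "empty" means a SPAN tag with no attributes and that spans never nest; the function name stripFlatSpans is made up for illustration.

```javascript
// Assumption: "empty" means <span> with no attributes, and spans do not nest.
function stripFlatSpans(html) {
  // [\s\S]*? is non-greedy and, unlike ., also matches line breaks.
  // $1 puts the captured inner text back in place of the whole element.
  return html.replace(/<span>([\s\S]*?)<\/span>/gi, '$1');
}

var result = stripFlatSpans('<SPAN>This is my text</SPAN>');
// result is 'This is my text'
```

Spans that carry attributes do not match the pattern and are left alone.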

This gets much more complicated when we have nested tags, however.
For example:

one two three four five

What I really want after the replace statement is this:

one two three four five

I'm having trouble crafting the perfect expression for this. I can't
seem to get my head around the right solution to handle the greedy vs
non-greedy thing, and not eliminate the wrong closing tag.
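
To make the greedy vs non-greedy problem concrete, here is a small sketch. The two input strings are hypothetical (they are not the strings from this post), and "empty" is assumed to mean a span without attributes.

```javascript
var greedy = /<span>([\s\S]*)<\/span>/gi;
var lazy = /<span>([\s\S]*?)<\/span>/gi;

// Hypothetical inputs: two sibling spans, and a kept span nested in a bare one.
var siblings = '<span>a</span> and <span>b</span>';
var nested = '<span>one <span class="k">two</span> three</span>';

// Greedy runs from the first opening tag to the LAST closing tag.
var greedyOnSiblings = siblings.replace(greedy, '$1');
// greedyOnSiblings is 'a</span> and <span>b'

// Non-greedy stops at the FIRST closing tag, which belongs to the inner span.
var lazyOnNested = nested.replace(lazy, '$1');
// lazyOnNested is 'one <span class="k">two three</span>'
```

In the second case the closing tag of the inner span is the one that disappears, so the remaining markup is mismatched.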

Is this even possible with straight expressions?

Thanks in advance for any help you can provide!

Bryan

Jul 23 '05 #1


J. J. Cale

"Bryan" <br***@chameleon-systems.com> wrote in message
news:b1**************************@posting.google.com...

> Hi All:
>
> I'm trying to find the right Regexp string to remove empty SPAN tags
> from an HTML string.

If you need to remove the element, try the DOM,
and specifically the childNodes collection.

> <snip>
> This gets much more complicated when we have nested tags, however.
> For example:
> one two three four five

<span> is the containing element,
a node of nodeType element (obj.nodeType == 1).
First you need a reference to the containing span. Either find it via the
DOM tree or give it a specific id, <span id="anId">, and use
var oRef = document.getElementById('anId');
or whatever you wish to support.
'one' is a text node (type 3): oRef.childNodes[0] or oRef.firstChild.
oRef.childNodes[0].nodeValue is 'one'.
oRef.childNodes[1] is the next span element (type 1), containing
oRef.childNodes[1].firstChild, the text node containing 'two'.
From here there are a number of ways to deal with this.

> What I really want after the replace statement is this:
> one two three four five

Create a new text node, insert it before the span
you want to delete, and delete the span.
Or clone the spanToDelete.firstChild node, insert it
before the span to delete, and delete the span.
Or copy the span.firstChild.nodeValue, delete the span,
and append the copied text to the firstSpan.firstChild.nodeValue.
There are other possibilities as well.
Google for DOM Level 2 to see how to do these things correctly.
Hope this helps
Jimbo
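
A minimal sketch of the DOM approach described above, assuming that "empty" means a span element with no attributes. The function names unwrapSpan and unwrapBareSpans are made up for illustration; only standard DOM members (getElementsByTagName, attributes, firstChild, insertBefore, removeChild) are used.

```javascript
// Move every child of `span` in front of it, then remove the emptied span.
function unwrapSpan(span) {
  var parent = span.parentNode;
  while (span.firstChild) {
    // insertBefore moves a node that is already part of the tree.
    parent.insertBefore(span.firstChild, span);
  }
  parent.removeChild(span);
}

// Unwrap all attribute-less spans below `root`.
// Assumption: that is what "empty SPAN tags" means in this thread.
function unwrapBareSpans(root) {
  var live = root.getElementsByTagName('span');
  var spans = [];
  var i;
  // Copy first: the collection is live and shrinks as spans are removed.
  for (i = 0; i < live.length; i++) {
    spans.push(live[i]);
  }
  for (i = 0; i < spans.length; i++) {
    if (spans[i].attributes.length === 0) {
      unwrapSpan(spans[i]);
    }
  }
}

// Usage in a browser, with the example id from the post:
// unwrapBareSpans(document.getElementById('anId'));
```

Only descendants of root are examined, so root itself is never removed. Moving the children, rather than copying nodeValue, also keeps any nested elements intact. Calling root.normalize() afterwards merges the adjacent text nodes this leaves behind.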

Jul 23 '05 #2

Bryan

J. J. Cale wrote...

> if you need to remove the element try the DOM
> and specifically the childNodes collection

Huh. That's an interesting idea. A little more complicated than a
regexp replace, but it should work. If I can come up with something
that's cross-browser, I might be able to use that approach.

Thanks for the idea.

Jul 23 '05 #3

Thomas 'PointedEars' Lahn

Bryan wrote:

> [...]
> A simple expression like this /<SPAN>(.*)?<\/SPAN>/gi will give me the
> text between the two span tags, which I can then use in a replace
> statement.
>
> This gets much more complicated when we have nested tags, however.
> For example:
>
> one two three four five
>
> What I really want after the replace statement is this:
>
> one two three four five
>
> I'm having trouble crafting the perfect expression for this. I can't
> seem to get my head around the right solution to handle the greedy vs
> non-greedy thing, and not eliminate the wrong closing tag.
>
> Is this even possible with straight expressions?

No, it is not, by design; or let us say it is not generally possible --
given enough constraints (such as that `span' elements may not nest,
contrary to the HTML specifications), it may be possible (which
is why removeTags() exists in my JSX:string.js, BTW).

AIUI, Regular Expressions require either a DFA or an NFA or both of them
to be matched against a text (that said, know that because ECMAScript
implementations like JavaScript and JScript support PCRE alternation,
they must be using either an NFA or a combination of DFA and NFA to
match RegExps). However, to parse arbitrary occurrences of open and
matching close tags, i.e. to recognize a program in a (deterministic)
context-free language, you require a (N)PDA (which could be implemented
as a markup parser to build a parse tree, which indeed is done in common
HTML UAs) [1].
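
The pushdown idea can be sketched without a full parser: a regular expression only finds the individual span tags, and an explicit stack remembers, for each open tag, whether it was dropped, so that the matching close tag gets the same treatment. This is a minimal sketch and not the removeTags() mentioned above. The name removeBareSpans is hypothetical, "empty" is assumed to mean "no attributes", and the input is assumed to be well nested with no > inside attribute values.

```javascript
function removeBareSpans(html) {
  var tagRe = /<(\/?)span\b([^>]*)>/gi; // matches open and close span tags
  var stack = [];  // one entry per open span: true if that tag was dropped
  var out = '';
  var last = 0;    // end of the previous tag in the input
  var m;

  while ((m = tagRe.exec(html)) !== null) {
    out += html.slice(last, m.index); // copy the text between tags unchanged
    last = tagRe.lastIndex;

    if (m[1] === '') {
      // Opening tag: drop it only if nothing but whitespace follows "span".
      var bare = m[2].replace(/\s+/g, '') === '';
      stack.push(bare);
      if (!bare) { out += m[0]; }
    } else {
      // Closing tag: it belongs to the most recent open span still pending.
      var dropped = stack.length > 0 ? stack.pop() : false;
      if (!dropped) { out += m[0]; }
    }
  }
  return out + html.slice(last);
}

var cleaned = removeBareSpans('<span>one <span class="k">two</span> three</span>');
// cleaned is 'one <span class="k">two</span> three'
```

The stack is what the regular expression alone lacks: it is the memory that pairs each closing tag with its own opening tag, however deep the nesting goes.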

See Jeffrey E. F. Friedl, Mastering Regular Expressions, chapter 4,
section 'Multi-Character "Quotes"' pp., available online at
<http://www.oreilly.com/catalog/regex/chapter/ch04.html> for
further information and possible solutions.

PointedEars
___________
[1] It has been a while since my lectures in automata theory, please CMIIW.
--
"Nothing makes you appreciate the weekend like idiots."
-- Jen

Jul 23 '05 #4
