Some help with regular expressions

Dmitri
Guest
 
Posts: n/a
#1: Aug 4 '08
Hello RegExp gurus,
I have a little problem.

I need to convert a huge, dirty HTML file to CSV format. I am stuck at the
step where I need to add semicolons inside the tags, e.g.:

<tr calss="testme">
<td><b>Foo</b></td>
<td><strong id="bla bla">Bar</strong></td>
</tr>

I need to get:

<tr calss="testme">
<td><b>Foo;</b></td>
<td><strong id="bla bla">Bar;</strong></td>
</tr>

Then I will strip the tags; that part I can do myself.
Thanks in advance.

Michael Fesser
Guest
 
Posts: n/a
#2: Aug 4 '08

.oO(Dmitri)

>Hello RegExp gurus,
>I have a little problem.
>
>I need to convert a huge, dirty HTML file to CSV format. I am stuck at the
>step where I need to add semicolons inside the tags, e.g.:
>
><tr calss="testme">
><td><b>Foo</b></td>
><td><strong id="bla bla">Bar</strong></td>
></tr>
>
>I need to get:
>
><tr calss="testme">
><td><b>Foo;</b></td>
><td><strong id="bla bla">Bar;</strong></td>
></tr>

IMHO regular expressions are the wrong tool here.

>Then I will strip the tags; that part I can do myself.
>Thanks in advance.

Have a look at DOM instead to parse the HTML into an XML tree. Then you
can use XPath syntax to access all the nodes you need and easily format
them any way you want.
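
A minimal sketch of that DOM/XPath idea, written in Python because the thread does not name a language (that choice is an assumption). It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed markup, so genuinely dirty HTML would first need a tolerant HTML parser. The function name rows_to_csv and the <table> wrapper around the sample are illustrative, not part of the original question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample taken from the question; the <table> wrapper is added here
# only so that the fragment has a single root element (an assumption).
SAMPLE = """<table>
<tr calss="testme">
<td><b>Foo</b></td>
<td><strong id="bla bla">Bar</strong></td>
</tr>
</table>"""


def rows_to_csv(markup: str) -> str:
    """Return one semicolon-separated line per <tr> found in the markup."""
    root = ET.fromstring(markup)  # raises ParseError on malformed input
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    # ".//tr" is an XPath-style query: every <tr> below the root.
    for tr in root.findall(".//tr"):
        # itertext() yields all text nested in a cell, skipping inner tags,
        # so no separate tag-stripping pass is needed.
        cells = ["".join(td.itertext()).strip() for td in tr.findall("td")]
        writer.writerow(cells)
    return out.getvalue()


print(rows_to_csv(SAMPLE), end="")
```

Because the cell text is read straight from the tree, there is no need to insert semicolons into the markup and strip the tags afterwards; the CSV writer adds the separators and quotes any cell that itself contains a semicolon.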

Micha
Dmitri
Guest
 
Posts: n/a
#3: Aug 4 '08

On Aug 4, 6:47 pm, Michael Fesser <neti...@gmx.de> wrote:
>.oO(Dmitri)
>
>>Hello RegExp gurus,
>>I have a little problem.
>>
>>I need to convert a huge, dirty HTML file to CSV format. I am stuck at the
>>step where I need to add semicolons inside the tags, e.g.:
>>
>><tr calss="testme">
>>   <td><b>Foo</b></td>
>>   <td><strong id="bla bla">Bar</strong></td>
>></tr>
>>
>>I need to get:
>>
>><tr calss="testme">
>>   <td><b>Foo;</b></td>
>>   <td><strong id="bla bla">Bar;</strong></td>
>></tr>
>
>IMHO regular expressions are the wrong tool here.
>
>>Then I will strip the tags; that part I can do myself.
>>Thanks in advance.
>
>Have a look at DOM instead to parse the HTML into an XML tree. Then you
>can use XPath syntax to access all the nodes you need and easily format
>them any way you want.
>
>Micha

Thanks for the idea. I hadn't thought of that. ))))