Bytes IT Community

xhtml -> database

cross-posted to: mailing.database.myodbc,comp.text.xml

I have an XHTML file whose data I'd like to import into MySQL.
Unfortunately, mysqlimport only works with plain-text files. Mixed in
with the text are some links (URLs), which I'd also like to import into
the database. For the most part, a copy/paste into a plain-text file would
do the trick, but the links get lost in the process.

How do I grab the links?

Here's a snippet of the xhtml:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="content-type" content="text/html; charset=utf-8" /><title
/><meta name="generator" content="StarOffice/OpenOffice.org XSLT
(http://xml.openoffice.org/sx2ml)" /><meta name="created"
content="2006-02-07T04:18:53" /><meta name="changed"
content="2006-02-07T04:19:36" /><base href="." /><style
type="text/css">
@page { }
table { border-collapse:collapse; border-spacing:0;
empty-cells:show }
td, th { vertical-align:top; }
h1, h2, h3, h4, h5, h6 { clear:both }
ol, ul { padding:0; }
* { margin:0; }
*.ta1 { }
*.ce1 { font-family:'Nimbus Roman No9 L'; font-size:10pt;
font-style:normal; text-shadow:none; font-weight:normal; }
*.Default { font-family:'Bitstream Vera Sans'; }
*.Heading { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
*.Heading1 { font-family:'Bitstream Vera Sans';
text-align:center ! important; font-size:16pt; font-style:italic;
font-weight:bold; }
*.Result { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
*.Result2 { font-family:'Bitstream Vera Sans';
font-style:italic; font-weight:bold; text-decoration:underline; }
*.co1 { width:0.8925in; }
*.ro1 { height:0.4146in; }
*.ro2 { height:0.2173in; }
*.ro3 { height:0.611in; }
*.ro4 { height:0.8083in; }
*.ro5 { height:0.1681in; }
....

thanks,

Thufir

Feb 7 '06
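One way to keep the links is to parse the file as XML instead of copy/pasting it, pulling out each `<a>` element's text and `href`, and writing them as lines in the tab-separated layout mysqlimport reads by default. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names `extract_links` and `to_mysqlimport_line` are hypothetical. It assumes the file is well-formed XHTML without named entities such as `&nbsp;`: ElementTree does not read the external DTD, so such entities would likely be rejected as undefined.

```python
import xml.etree.ElementTree as ET

# Elements in an XHTML document live in this namespace, so tag lookups must include it.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_links(xhtml: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> element that has an href."""
    root = ET.fromstring(xhtml)
    links = []
    for a in root.iter(f"{{{XHTML_NS}}}a"):
        href = a.get("href")
        if href:
            # itertext() also collects text from nested inline elements such as <b>;
            # split/join collapses runs of whitespace into single spaces.
            text = " ".join("".join(a.itertext()).split())
            links.append((text, href))
    return links


def to_mysqlimport_line(*fields: str) -> str:
    """Format one row in mysqlimport's default layout: tab-separated, newline-terminated."""
    # Double backslashes (the default escape character) and replace tabs/newlines
    # with spaces so they cannot be mistaken for field or line terminators.
    cleaned = [f.replace("\\", "\\\\").replace("\t", " ").replace("\n", " ") for f in fields]
    return "\t".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p>See <a href="http://www.example.com/">the example site</a>.</p>'
        "</body></html>"
    )
    for text, href in extract_links(sample):
        print(to_mysqlimport_line(text, href), end="")
```

For real data, the lines would be written to a file named after the target table, e.g. `links.txt` for a hypothetical table `links` with two text columns, and loaded with `mysqlimport --local yourdb links.txt`; mysqlimport takes the table name from the file name minus its extension.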


ha**********@gmail.com wrote:
> cross-posted to: mailing.database.myodbc,comp.text.xml
>
> I have an XHTML file whose data I'd like to import into MySQL.
> Unfortunately, mysqlimport only works with plain-text files. Mixed in
> with the text are some links (URLs), which I'd also like to import into
> the database. For the most part, a copy/paste into a plain-text file would
> do the trick, but the links get lost in the process.


If you are looking for URLs *ANYWHERE* in the doc, irrespective of where
they are and what they are pointing at, I suggest you treat the XML
as plain text and use a regular-expression extractor. Unix geeks
have sed, grep, etc., or you can code it in .NET, Java, or Perl pretty
easily.

Soren
Feb 10 '06
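Soren's regular-expression approach can be sketched in a few lines of Python. The pattern is an assumed, deliberately rough heuristic (a scheme followed by anything up to whitespace, a quote, an angle bracket, or a parenthesis), not a complete URL grammar, and `find_urls` is a hypothetical name.

```python
import re

# Rough URL pattern: scheme, then everything up to whitespace, a quote,
# an angle bracket, or a parenthesis. A heuristic, not a full RFC 3986 parser.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s"'<>()]+""")


def find_urls(text: str) -> list[str]:
    """Return every URL-looking substring, treating the XHTML as plain text."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    sample = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta name="generator"
content="StarOffice/OpenOffice.org XSLT (http://xml.openoffice.org/sx2ml)" />'''
    for url in find_urls(sample):
        print(url)
```

Run against the head of the snippet in the question, it returns the DTD URL, the XHTML namespace URI, and the generator's URL, none of which are links in the document body. That is the trade-off Soren describes: the regex finds URLs anywhere, irrespective of what they point at.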

Soren Kuula wrote:
> If you are looking for URLs *ANYWHERE* in the doc, irrespective of where
> they are and what they are pointing at, I suggest you treat the XML
> as plain text and use a regular-expression extractor. Unix geeks
> have sed, grep, etc., or you can code it in .NET, Java, or Perl pretty
> easily.
>
> Soren


I think I'll give it a go with Saxon, hopefully this weekend. For this
particular example, yes, the URLs are "anywhere", but that might not be
the case down the road.
thanks,

Thufir

Feb 10 '06
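When the URLs are no longer "anywhere", a structure-aware query is what picks out the right ones. Saxon would do this with XSLT and XPath; the sketch below is not XSLT but uses Python's `xml.etree.ElementTree`, which supports a limited XPath subset, to select links by their position in the document. It assumes, hypothetically, that the data sits in an XHTML table (the `ta1`/`co1`/`ro1` style classes in the snippet hint at a spreadsheet export, but the body was elided), and `rows_with_links` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for ElementTree's path expressions; the prefix "x" is arbitrary.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def rows_with_links(xhtml: str) -> list[list[str]]:
    """Turn each <tr> into a list of cell values; a cell containing a link yields its href."""
    root = ET.fromstring(xhtml)
    rows = []
    # ".//x:tr" finds table rows at any depth; ElementTree supports only a subset of XPath.
    for tr in root.iterfind(".//x:tr", NS):
        cells = []
        for td in tr.iterfind("x:td", NS):
            link = td.find(".//x:a[@href]", NS)
            if link is not None:
                cells.append(link.get("href"))
            else:
                # No link: keep the cell's visible text with whitespace normalized.
                cells.append(" ".join("".join(td.itertext()).split()))
        rows.append(cells)
    return rows


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body><table>'
        '<tr><td><p>Example</p></td><td><p><a href="http://www.example.com/">link</a></p></td></tr>'
        "</table></body></html>"
    )
    for row in rows_with_links(sample):
        print("\t".join(row))
```

Mapping each `<tr>` to one list keeps a spreadsheet row aligned with one database record, and because the selection is by path, URLs in the `<head>` (the DTD, namespace, and generator URLs) are never picked up.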
