Does a DTD change the default namespace or something?

I'm feeling very stupid about this ...

pdf2html (http://pdf2html.sourceforge.net) is an app that reads a PDF
and can generate HTML or XML; in my case I'm using the XML. The PDF I'm
working with is a concatenation of many reports; my objective is to
find the first page of each report. In this particular instance I've
discovered that the first page can be identified by a "text" element
whose "left" attribute equals 277.

So I want to consume this XML using XPath, to find all "page" elements
that contain "text" elements whose "left" attribute is 277. The XPath
expression that selects those "text" elements is:

"/pdf2xml/page/text[@left=277]"

Works great ... IF I change the XML output by the tool to remove the
DTD reference. If I leave the DTD reference in there, it stops finding
any nodes. Why? Does the presence of the DTD reference automatically
assign a namespace? Do I need an XmlNamespaceManager? What do I use it
with?

Altering the input XML is not the preferred option here. I also have a
version that just uses the Reader to walk the tree ... I want to get
away from that because I eventually want to be able to specify an XPath
query as input.

My code:
Imports System.Diagnostics
Imports System.Xml

Module PageFinder
    Sub test()
        Dim inputfile As String = "test.xml"
        Dim r As New XmlTextReader(inputfile)
        ' XPathDocument lives in System.Xml.XPath, not System.Xml
        Dim xd As New XPath.XPathDocument(r)
        Dim nav As XPath.XPathNavigator = xd.CreateNavigator()
        Dim expr As XPath.XPathExpression = _
            nav.Compile("/pdf2xml/page/text[@left=277]")
        Dim ni As XPath.XPathNodeIterator = nav.Select(expr)
        Do While ni.MoveNext()
            Dim node As XPath.XPathNavigator = ni.Current
            ' Walk up from the matching text element to its ancestors
            Dim ani As XPath.XPathNodeIterator = _
                node.SelectAncestors(XPath.XPathNodeType.Element, False)
            If ani.MoveNext() Then
                Dim pagenum As Integer = _
                    Integer.Parse(ani.Current.GetAttribute("number", ""))
                Debug.WriteLine(pagenum)
            End If
        Loop
        r.Close()
    End Sub
End Module
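
Since the goal is the "page" elements themselves, the ancestor walk can probably be avoided by putting the test on "text" into a predicate on "page". The following is only a sketch, assuming the same "test.xml" sample and an otherwise unchanged setup; the module name is made up for illustration, and it does not by itself address the DTD problem described above.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module PageSelectSketch
    Sub Main()
        ' "test.xml" is the sample file name used in the post.
        Dim xd As New XPathDocument("test.xml")
        Dim nav As XPathNavigator = xd.CreateNavigator()

        ' The predicate keeps only pages that have a text child with left = 277.
        Dim ni As XPathNodeIterator = nav.Select("/pdf2xml/page[text/@left=277]")
        Do While ni.MoveNext()
            ' GetAttribute returns an empty string if the attribute is absent.
            Console.WriteLine(ni.Current.GetAttribute("number", ""))
        Loop
    End Sub
End Module
```

Selecting the page directly also means the XPath string alone decides which nodes come back, which fits the stated plan of accepting an XPath query as input.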

My XML is below, showing two pages; the desired result is to get the
first page. It's actual output from pdf2html, slightly stripped and
censored.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml>
  <page number="1" position="absolute" top="0" left="0" height="1188" width="918">
    <text top="805" left="277" width="0" height="18" font="0"><i><b>Person Name</b></i></text>
    <text top="805" left="298" width="0" height="18" font="0"><i><b>123 Main St</b></i></text>
    <text top="805" left="319" width="0" height="18" font="0"><i><b>Hometown, IL 60000</b></i></text>
  </page>
  <page number="2" position="absolute" top="0" left="0" height="1188" width="918">
    <text top="245" left="144" width="136" height="18" font="0"><i><b>Person Name</b></i></text>
    <text top="266" left="144" width="124" height="18" font="0"><i><b>123 Main St</b></i></text>
    <text top="287" left="144" width="168" height="18" font="0"><i><b>Hometown, IL 60000</b></i></text>
    <text top="470" left="143" width="319" height="19" font="1"><b>STATEMENT OF MANAGEMENT FEES</b></text>
  </page>
</pdf2xml>

Jan 10 '07 #1
* Ross Presser wrote in microsoft.public.dotnet.xml:
> Works great ... IF I change the XML output by the tool to remove the
> DTD reference. If I leave the DTD reference in there, it stops finding
> any nodes. Why? Does the presence of the DTD reference automatically
> assign a namespace? Do I need a XmlNamespaceManager? What do I use it
> with?

Yes, unfortunately some DTDs declare a default namespace and cause such
confusion. If the generating tool does not itself declare the namespace
in the document, I would consider that a bug in the tool. For using the
XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx
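
For context: a DTD can put elements into a namespace by declaring a default or #FIXED "xmlns" attribute in an ATTLIST, which a DTD-processing parser then applies to the element. If that turns out to be the case, the manager is used with the compiled expression through SetContext. Below is a minimal sketch, assuming a hypothetical namespace URI "http://example.org/pdf2xml" and an arbitrary prefix "p"; neither comes from the tool, so substitute whatever URI the elements really carry.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module NamespaceManagerSketch
    Sub Main()
        Dim xd As New XPathDocument("test.xml")
        Dim nav As XPathNavigator = xd.CreateNavigator()

        ' Hypothetical URI and prefix, for illustration only.
        Dim nsmgr As New XmlNamespaceManager(nav.NameTable)
        nsmgr.AddNamespace("p", "http://example.org/pdf2xml")

        ' XPath 1.0 has no default namespace for element names,
        ' so every element step needs the prefix. Attributes stay unprefixed.
        Dim expr As XPathExpression = _
            nav.Compile("/p:pdf2xml/p:page/p:text[@left=277]")
        expr.SetContext(nsmgr)

        Dim ni As XPathNodeIterator = nav.Select(expr)
        Do While ni.MoveNext()
            Console.WriteLine(ni.Current.Value)
        Loop
    End Sub
End Module
```

The prefix in the XPath string does not have to match anything in the document; only the namespace URI registered with the manager has to match.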
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jan 10 '07 #2

Bjoern Hoehrmann wrote:
> Yes, unfortunately some DTDs declare a default namespace and cause such
> confusion. If the generating tool does not itself declare the namespace
> in the document, I would consider that a bug in the tool. For using the
> XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx

The thing is, I can't figure out what namespace is being applied.
This was the DTD line in the XML file:

<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

and this is the contents of pdf2xml.dtd:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT pdf2xml (page+)>
<!ELEMENT page (fontspec*, text*)>
<!ATTLIST page
number CDATA #REQUIRED
position CDATA #REQUIRED
top CDATA #REQUIRED
left CDATA #REQUIRED
height CDATA #REQUIRED
width CDATA #REQUIRED
>
<!ELEMENT fontspec EMPTY>
<!ATTLIST fontspec
id CDATA #REQUIRED
size CDATA #REQUIRED
family CDATA #REQUIRED
color CDATA #REQUIRED
>
<!ELEMENT text (#PCDATA | b | i)*>
<!ATTLIST text
top CDATA #REQUIRED
left CDATA #REQUIRED
width CDATA #REQUIRED
height CDATA #REQUIRED
font CDATA #REQUIRED
>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

Some experimentation with msxslt, by the way, did not seem to show a
need to use a namespace.
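
One way to check which namespace, if any, the parser actually assigned is to load the document with the DTD reference in place and print the namespace URI of the document element. This is a rough diagnostic sketch, assuming the same "test.xml" file; the module name is made up for illustration.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module NamespaceCheckSketch
    Sub Main()
        Dim xd As New XPathDocument("test.xml")
        Dim nav As XPathNavigator = xd.CreateNavigator()

        ' The navigator starts at the root node; "*" selects its element
        ' children, i.e. the document element, whatever its namespace.
        Dim ni As XPathNodeIterator = nav.Select("*")
        If ni.MoveNext() Then
            Console.WriteLine("Local name: " & ni.Current.LocalName)
            Console.WriteLine("Namespace URI: '" & ni.Current.NamespaceURI & "'")
        End If
    End Sub
End Module
```

An empty namespace URI would mean the element is in no namespace, in which case the unprefixed XPath expression should match and the cause lies elsewhere.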

Jan 10 '07 #3
Ross Presser wrote:
> and this is the contents of pdf2xml.dtd:
> [...]
> Some experimentation with msxslt, by the way, did not seem to show a
> need to use a namespace.

There is no xmlns attribute defined in that DTD.

As for your original problem with .NET code, which version of the .NET
framework are you using?
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 11 '07 #4

Martin Honnen wrote:
> There is no xmlns attribute defined in that DTD.
>
> As for your original problem with .NET code, which version of the .NET
> framework are you using?

Version 1.1

Jan 11 '07 #5
