Does a DTD change the default namespace or something?

I'm feeling very stupid about this ...

pdf2html (http://pdf2html.sourceforge.net) is an app that reads a PDF
and can generate HTML or XML; in my case I'm using the XML. The PDF I'm
working with is a concatenation of many reports; my objective is to
find the first page of each report, which I've discovered can be found
in this particular instance by looking for an xml element with a
particular attribute "left" equal to 277.

So I want to consume this XML using XPath, to find all "page" elements
that contain "text" elements whose "left" attribute equals 277. The XPath
expression is therefore:

"/pdf2xml/page/text[@left=277]"

Works great ... IF I change the XML output by the tool to remove the
DTD reference. If I leave the DTD reference in there, it stops finding
any nodes. Why? Does the presence of the DTD reference automatically
assign a namespace? Do I need an XmlNamespaceManager? What do I use it
with?

Altering the input XML is not the preferred option here. I also have a
version that just uses the Reader to walk the tree ... I want to get
away from that because I eventually want to be able to specify an XPath
query as input.

My code:
Imports System.Xml

Sub test()
    Dim inputfile As String = "test.xml"
    Dim r As New XmlTextReader(inputfile)
    ' XPathDocument lives in System.Xml.XPath
    Dim xd As New XPath.XPathDocument(r)
    Dim nav As XPath.XPathNavigator = xd.CreateNavigator()
    Dim expr As XPath.XPathExpression = _
        nav.Compile("/pdf2xml/page/text[@left=277]")
    Dim ni As XPath.XPathNodeIterator = nav.Select(expr)
    Do While ni.MoveNext()
        Dim node As XPath.XPathNavigator = ni.Current
        ' Walk up from the matching text element to its enclosing page
        Dim ani As XPath.XPathNodeIterator = _
            node.SelectAncestors(XPath.XPathNodeType.Element, False)
        ani.MoveNext()
        Dim pagenum As Integer = _
            CInt(ani.Current.GetAttribute("number", ""))
        Debug.WriteLine(pagenum)
    Loop
End Sub

My XML is below, showing two pages; the desired result is to get the
first page. It's actual output from pdf2html, slightly stripped and
censored.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="805" left="277" width="0" height="18" font="0"><i><b> Person
Name</b></i></text>
<text top="805" left="298" width="0" height="18" font="0"><i><b> 123
Main St</b></i></text>
<text top="805" left="319" width="0" height="18"
font="0"><i><b> Hometown, IL 60000</b></i></text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="245" left="144" width="136" height="18"
font="0"><i><b> Person Name</b></i></text>
<text top="266" left="144" width="124" height="18" font="0"><i><b> 123
Main St</b></i></text>
<text top="287" left="144" width="168" height="18"
font="0"><i><b> Hometown, IL 60000</b></i></text>
<text top="470" left="143" width="319" height="19"
font="1"><b>STATEMENT OF MANAGEMENT FEES</b></text>
</page>
</pdf2xml>

Jan 10 '07 #1
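
For reference, the predicate can also sit on the page step, so that the query returns the page elements directly and the walk up through the ancestors is not needed. A minimal sketch, assuming a trimmed inline copy of the sample (no DOCTYPE line) and a made-up module name; it does not address the DTD problem itself:

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module PageSelectSketch
    Sub Main()
        ' Trimmed stand-in for the sample above, without the DOCTYPE line.
        Dim xml As String = _
            "<pdf2xml>" & _
            "<page number='1'><text left='277'>Person Name</text></page>" & _
            "<page number='2'><text left='144'>Person Name</text></page>" & _
            "</pdf2xml>"

        Dim xd As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = xd.CreateNavigator()

        ' The predicate is on page, so each hit is already the page element.
        Dim ni As XPathNodeIterator = nav.Select("/pdf2xml/page[text/@left=277]")
        Do While ni.MoveNext()
            Console.WriteLine(ni.Current.GetAttribute("number", ""))
        Loop
    End Sub
End Module
```

For this two-page stand-in, only page 1 has a text element with left equal to 277, so only its number is written.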
* Ross Presser wrote in microsoft.public.dotnet.xml:
> Works great ... IF I change the XML output by the tool to remove the
> DTD reference. If I leave the DTD reference in there, it stops finding
> any nodes. Why? Does the presence of the DTD reference automatically
> assign a namespace? Do I need an XmlNamespaceManager? What do I use it
> with?

Yes, unfortunately some DTDs declare a default namespace and cause such
confusion. If the generating tool does not itself declare the namespace
in the document, I would consider that a bug in the tool. For using the
XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jan 10 '07 #2
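
If the elements did turn out to be in a namespace, the usual pattern is to bind a prefix with XmlNamespaceManager and use that prefix on every element step of the XPath. A minimal sketch, assuming a hypothetical namespace URI "urn:example:pdf2xml" declared on the root (the pdf2xml output in this thread shows no such declaration) and an arbitrary prefix "p":

```vbnet
Imports System
Imports System.Xml

Module NamespaceManagerSketch
    Sub Main()
        ' Hypothetical input: same shape as the pdf2xml output, but with a
        ' default namespace on the root. The URI is made up for illustration.
        Dim xml As String = _
            "<pdf2xml xmlns='urn:example:pdf2xml'>" & _
            "<page number='1'><text left='277'>Person Name</text></page>" & _
            "<page number='2'><text left='144'>Person Name</text></page>" & _
            "</pdf2xml>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        ' XPath 1.0 has no notion of a default namespace, so an unprefixed
        ' name never matches a namespaced element. Bind a prefix instead.
        Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
        nsmgr.AddNamespace("p", "urn:example:pdf2xml")

        Dim hits As XmlNodeList = _
            doc.SelectNodes("/p:pdf2xml/p:page[p:text[@left=277]]", nsmgr)
        For Each page As XmlNode In hits
            Console.WriteLine(page.Attributes("number").Value)
        Next
    End Sub
End Module
```

Attributes such as left stay unprefixed because a default namespace does not apply to attributes. With the XPathNavigator approach from the original post, the same manager can be attached to the compiled expression through XPathExpression.SetContext before calling Select.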

Bjoern Hoehrmann wrote:
> * Ross Presser wrote in microsoft.public.dotnet.xml:
>> Works great ... IF I change the XML output by the tool to remove the
>> DTD reference. If I leave the DTD reference in there, it stops finding
>> any nodes. Why? Does the presence of the DTD reference automatically
>> assign a namespace? Do I need an XmlNamespaceManager? What do I use it
>> with?
>
> Yes, unfortunately some DTDs declare a default namespace and cause such
> confusion. If the generating tool does not itself declare the namespace
> in the document, I would consider that a bug in the tool. For using the
> XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx

The thing is, I can't figure out what namespace is being applied.
This was the DTD line in the XML file:

<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

and this is the contents of pdf2xml.dtd:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT pdf2xml (page+)>
<!ELEMENT page (fontspec*, text*)>
<!ATTLIST page
number CDATA #REQUIRED
position CDATA #REQUIRED
top CDATA #REQUIRED
left CDATA #REQUIRED
height CDATA #REQUIRED
width CDATA #REQUIRED
>
<!ELEMENT fontspec EMPTY>
<!ATTLIST fontspec
id CDATA #REQUIRED
size CDATA #REQUIRED
family CDATA #REQUIRED
color CDATA #REQUIRED
>
<!ELEMENT text (#PCDATA | b | i)*>
<!ATTLIST text
top CDATA #REQUIRED
left CDATA #REQUIRED
width CDATA #REQUIRED
height CDATA #REQUIRED
font CDATA #REQUIRED
>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

Some experimentation with msxslt, by the way, did not seem to show a
need to use a namespace.

Jan 10 '07 #3
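
One way to see which namespace, if any, the parser assigns is to load the document and print NamespaceURI for the root element. A minimal sketch, assuming a few of the declarations from pdf2xml.dtd are pasted into an internal subset so the example runs without the DTD file on disk. An external DTD goes through a different loading path, so treat this as a diagnostic only, not as an explanation of the XPathDocument behaviour:

```vbnet
Imports System
Imports System.Xml

Module NamespaceCheckSketch
    Sub Main()
        ' Assumption: a trimmed copy of the DTD declarations, moved into an
        ' internal subset so nothing has to be fetched from disk.
        Dim xml As String = _
            "<!DOCTYPE pdf2xml [" & _
            "<!ELEMENT pdf2xml (page+)>" & _
            "<!ELEMENT page (fontspec*, text*)>" & _
            "<!ATTLIST page number CDATA #REQUIRED>" & _
            "<!ELEMENT text (#PCDATA | b | i)*>" & _
            "<!ATTLIST text left CDATA #REQUIRED>" & _
            "]>" & _
            "<pdf2xml><page number='1'><text left='277'>x</text></page></pdf2xml>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        ' An empty string between the brackets means "no namespace".
        Console.WriteLine("[" & doc.DocumentElement.NamespaceURI & "]")

        ' Number of nodes matched by the unprefixed query from the first post.
        Console.WriteLine(doc.SelectNodes("/pdf2xml/page/text[@left=277]").Count)
    End Sub
End Module
```

None of these declarations defines an xmlns attribute, so the brackets should come out empty and the unprefixed query should find the single text element.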
Ross Presser wrote:
> and this is the contents of pdf2xml.dtd:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!ELEMENT pdf2xml (page+)>
> <!ELEMENT page (fontspec*, text*)>
> <!ATTLIST page
> number CDATA #REQUIRED
> position CDATA #REQUIRED
> top CDATA #REQUIRED
> left CDATA #REQUIRED
> height CDATA #REQUIRED
> width CDATA #REQUIRED
> >
> <!ELEMENT fontspec EMPTY>
> <!ATTLIST fontspec
> id CDATA #REQUIRED
> size CDATA #REQUIRED
> family CDATA #REQUIRED
> color CDATA #REQUIRED
> >
> <!ELEMENT text (#PCDATA | b | i)*>
> <!ATTLIST text
> top CDATA #REQUIRED
> left CDATA #REQUIRED
> width CDATA #REQUIRED
> height CDATA #REQUIRED
> font CDATA #REQUIRED
> >
> <!ELEMENT b (#PCDATA)>
> <!ELEMENT i (#PCDATA)>
>
> Some experimentation with msxslt, by the way, did not seem to show a
> need to use a namespace.

There is no xmlns attribute defined in that DTD.

As for your original problem with .NET code, which version of the .NET
framework are you using?
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jan 11 '07 #4

Martin Honnen wrote:
> There is no xmlns attribute defined in that DTD.
>
> As for your original problem with .NET code, which version of the .NET
> framework are you using?

Version 1.1

Jan 11 '07 #5
