resolving an entity

I am writing a parser for xml that will not have
an associated DTD. I want to be able to handle
certain character references (e.g., &copy;) in
the program.

When I run the following against a chunk of xml
containing &copy;, I get the following:

org.xml.sax.SAXParseException: Reference to undefined entity "&copy;".
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3182)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3176)
    at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2513)
    at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2422)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1833)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:500)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:281)
    at Article.main(Article.java:18)

What can I do to catch these references in my code and output replacement
text for them?

Thanks.
Dean Hoover

Here are the two Java files:
---
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class Article
{
    public static void main(String argv[])
    {
        String file = argv[0];
        PrintWriter pw = new PrintWriter(System.out);
        DefaultHandler handler = new LoadXML(pw, LoadXML.TYPE_HTML);
        SAXParserFactory factory = SAXParserFactory.newInstance();

        try
        {
            SAXParser reader = factory.newSAXParser();
            reader.parse(new File(file), handler);
        }
        catch (Exception e)
        {
            e.printStackTrace();
            return;
        }

        pw.flush();
    }
}
---
import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class LoadXML extends DefaultHandler
{
    public static final int TYPE_HTML = 1;
    public static final int TYPE_TEXT = 2;

    public LoadXML
    (
        java.io.Writer writer,
        int type
    )
    {
        elements_ = new Stack();
        writer_ = writer;
        type_ = type;
    }

    public InputSource resolveEntity
    (
        String publicId,
        String systemId
    ) throws SAXException
    {
        String s = "stuff";
        return new InputSource(new CharArrayReader(s.toCharArray()));
    }

    public void startDocument() throws SAXException
    {
    }

    public void endDocument() throws SAXException
    {
    }

    public void startElement
    (
        String uri,
        String localName,
        String qName,
        Attributes attributes
    ) throws SAXException
    {
        String elementName = qName;
        elements_.push(elementName);

        try
        {
            if (elementName.equals("p"))
            {
                if (type_ == TYPE_HTML)
                    writer_.write("<p class=\"article-text\">");
            }
            else if (elementName.equals("title"))
            {
                if (type_ == TYPE_HTML)
                    writer_.write("<p class=\"article-title\">");
            }
            else if (elementName.equals("by"))
            {
                if (type_ == TYPE_HTML)
                    writer_.write("<p class=\"article-by\">");
            }
            else if (elementName.equals("copyright"))
            {
                if (type_ == TYPE_HTML)
                    writer_.write("<p class=\"article-copyright\">");
            }
        }
        catch (IOException e)
        {
            throw new SAXException(e);
        }
    }

    public void endElement
    (
        String uri,
        String localName,
        String qName
    ) throws SAXException
    {
        String elementName = qName;
        elements_.pop();

        try
        {
            if (type_ == TYPE_HTML)
            {
                if (elementName.equals("p") || elementName.equals("title") ||
                    elementName.equals("by") || elementName.equals("copyright"))
                {
                    writer_.write("</p>\n");
                }
                else if (elementName.equals("br"))
                {
                    writer_.write("<br/>\n");
                }
            }
        }
        catch (IOException e)
        {
            throw new SAXException(e);
        }
    }

    public void characters
    (
        char[] ch,
        int start,
        int length
    ) throws SAXException
    {
        try
        {
            String content = new String(ch, start, length);
            String top = (String)elements_.peek();
            String text =
                content.replaceAll("\n", " ").replaceAll(" +", " ").trim();

            if (text.length() == 0)
                return;

            if (type_ == TYPE_HTML)
            {
                if (top.equals("p") || top.equals("title") ||
                    top.equals("by") || top.equals("copyright"))
                    writer_.write(text);
            }
        }
        catch (IOException e)
        {
            throw new SAXException(e);
        }
    }

    private Stack elements_;
    private java.io.Writer writer_;
    private int type_;
}

Jul 20 '05 #1
"Dean A. Hoover" <dh*******@yahoo.com> wrote in message
news:4q********************@twister.nyroc.rr.com...
I am writing a parser for xml that will not have
an associated DTD. I want to be able to handle
certain character references (e.g., &copy;) in
the program.


As I understand it, that's quite impossible. The case is defined
in the spec, and without a DTD you don't get to choose what
entities are defined or not.

But DTD may not mean what you think it does. Would it be permissible
for this document to have an internal DTD subset?

<?xml version="1.0"?>
<!DOCTYPE root [ <!ENTITY copy 'copy'> ]>
<root>&copy;</root>

A quick reading of the XML spec suggests (but I may have missed
something) that this is a correct construction in XML.
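
For illustration, here is a minimal sketch of how a SAX parser handles a document that carries such an internal subset. The class name `InternalSubsetDemo`, the helper `textOf`, and the replacement text `(c)` are assumptions made up for this example; none of them come from the original code. It also assumes the parser is configured to accept DOCTYPE declarations, which a default JAXP setup normally is.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InternalSubsetDemo {

    /** Parses the XML text and returns all character data the parser reports. */
    static String textOf(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references arrive here already expanded.
                out.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The internal subset declares "copy"; the replacement text is hypothetical.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root [ <!ENTITY copy '(c)'> ]>\n"
                + "<root>&copy; 2005</root>";
        System.out.println(textOf(xml));
    }
}
```

With the entity declared, the parser no longer raises the "undefined entity" error and the handler simply sees the replacement text in `characters`.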

Groetjes,
Maarten Wiltink
Jul 20 '05 #2
Maarten Wiltink wrote:
"Dean A. Hoover" <dh*******@yahoo.com> wrote in message
news:4q********************@twister.nyroc.rr.com...
I am writing a parser for xml that will not have
an associated DTD. I want to be able to handle
certain character references (e.g., &copy;) in
the program.

As I understand it, that's quite impossible. The case is defined
in the spec, and without a DTD you don't get to choose what
entities are defined or not.

But DTD may not mean what you think it does. Would it be permissible
for this document to have an internal DTD subset?

<?xml version="1.0"?>
<!DOCTYPE root [ <!ENTITY copy 'copy'> ]>
<root>&copy;</root>

A quick reading of the XML spec suggests (but I may have missed
something) that this is a correct construction in XML.

I really don't want any DTD in the document at all. I am writing
some code that will parse an xml document and output either html
or plain text depending on a parameter. In the case of HTML it
would output "&copy;", in the case of plain text it would output
"(c)". I have other similar context based entities to handle as
well.
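
As a rough sketch of the requirement described here, the context-dependent replacements could live in a small lookup table keyed by name. The class `EntityReplacements` and the method `replacementFor` are hypothetical names; only the `copy` entry and its two output forms come from this post, and how the names get out of the XML in the first place is a separate question.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityReplacements {
    public static final int TYPE_HTML = 1;
    public static final int TYPE_TEXT = 2;

    // name -> { HTML form, plain-text form }
    private static final Map<String, String[]> TABLE = new HashMap<String, String[]>();
    static {
        TABLE.put("copy", new String[] { "&copy;", "(c)" });
    }

    /** Returns the replacement for a name in the given output type. */
    static String replacementFor(String name, int type) {
        String[] forms = TABLE.get(name);
        if (forms == null) {
            throw new IllegalArgumentException("No replacement registered for: " + name);
        }
        return type == TYPE_HTML ? forms[0] : forms[1];
    }

    public static void main(String[] args) {
        System.out.println(replacementFor("copy", TYPE_HTML));
        System.out.println(replacementFor("copy", TYPE_TEXT));
    }
}
```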

Dean

Jul 20 '05 #3


Dean A. Hoover wrote:
I really don't want any DTD in the document at all. I am writing
some code that will parse an xml document and output either html
or plain text depending on a parameter. In the case of HTML it
would output "&copy;", in the case of plain text it would output
"(c)". I have other similar context based entities to handle as
well.


Well, if you write your own parser then you can of course parse
something like XML but with references to undefined entities. But then
don't attempt to parse it with an XML parser, which expects entities to
be defined.

--

Martin Honnen
http://JavaScript.FAQTs.com/

Jul 20 '05 #4
"Dean A. Hoover" <dh*******@yahoo.com> wrote in message
news:uL********************@twister.nyroc.rr.com...
Maarten Wiltink wrote:
"Dean A. Hoover" <dh*******@yahoo.com> wrote in message
news:4q********************@twister.nyroc.rr.com...
I am writing a parser for xml that will not have
an associated DTD. I want to be able to handle
certain character references (e.g., &copy;) in
the program.
[...] I really don't want any DTD in the document at all. I am writing
some code that will parse an xml document and output either html
or plain text depending on a parameter. In the case of HTML it
would output "&copy;", in the case of plain text it would output
"(c)". I have other similar context based entities to handle as
well.


That's reasonable, but entities simply aren't the solution.
Would using processing instructions instead be acceptable?

In XSLT, you could even source in the transformation itself
with document('') and switch treatment of <?copy?> based on
the output method.

I'm working under the assumption that you want the source to
be well-formed XML, valid if possible.
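
On the Java/SAX side, the processing-instruction idea might look roughly like the sketch below. It assumes the source writes `<?copy?>` where it previously wrote the entity reference; the class name `PiDemo`, the method `render`, and the sample input are all made up for illustration. SAX reports each instruction through `processingInstruction(target, data)`, so the handler can pick the output per type.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PiDemo {
    public static final int TYPE_HTML = 1;
    public static final int TYPE_TEXT = 2;

    /** Returns the document's text, with <?copy?> replaced according to type. */
    static String render(String xml, final int type) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void processingInstruction(String target, String data) {
                // Only the "copy" target is handled; others are ignored.
                if ("copy".equals(target)) {
                    out.append(type == TYPE_HTML ? "&copy;" : "(c)");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<copyright><?copy?> 2005 Example Corp.</copyright>";
        System.out.println(render(xml, TYPE_HTML));
        System.out.println(render(xml, TYPE_TEXT));
    }
}
```

The document stays well-formed without any DTD, and the choice of output is made entirely in the handler.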

Groetjes,
Maarten Wiltink
Jul 20 '05 #5
In article <4q********************@twister.nyroc.rr.com>,
Dean A. Hoover <dh*******@yahoo.com> wrote:
I am writing a parser for xml that will not have
an associated DTD. I want to be able to handle
certain character references (e.g., &copy;) in
the program.


Well, this is not *real* XML.

The simplest thing to do would be to read the file into a string and
prepend an internal subset that declares the entities in question.
This will be easy if you know that there isn't an XML declaration or
DOCTYPE declaration in the file and you know the file's encoding.
Otherwise it will be more tedious.
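
A minimal sketch of that approach, under the easy-case assumptions just stated (no XML declaration, no DOCTYPE, known encoding). The names `PrependSubsetDemo`, `withInternalSubset` and `textOf` are hypothetical, UTF-8 is an assumed encoding, and the root element name is passed in by the caller because the thread does not say what it is.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PrependSubsetDemo {

    /** Prepends a DOCTYPE with an internal subset; xml must not already have one. */
    static String withInternalSubset(String xml, String rootName, String declarations) {
        return "<!DOCTYPE " + rootName + " [ " + declarations + " ]>\n" + xml;
    }

    /** Parses the XML text and returns the character data reported by SAX. */
    static String textOf(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: file path, args[1]: root element name (both supplied by the caller).
        String xml = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        String patched = withInternalSubset(xml, args[1], "<!ENTITY copy '(c)'>");
        System.out.println(textOf(patched));
    }
}
```

This version covers only the plain-text case. A different replacement per output type would mean prepending a different set of declarations before parsing.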

-- Richard
--
Spam filter: to mail me from a .com/.net site, put my surname in the headers.

FreeBSD rules!
Jul 20 '05 #6
