character to HTML ampersand escape sequence converter

Hello,
I'm looking for a program that converts characters of different
encodings (such as EUC-JP, Big5, GB-18030, etc.) into HTML ampersand
escape sequences. Does anybody know where I can find one?

thx.

Jul 23 '05
In article <Pi************ *************** ****@ppepc56.ph.gla.ac.uk>,
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
On Sat, 18 Dec 2004, Henri Sivonen wrote:
In article <fg************ @hugin.webthing.com>,
ni**@hugin.webthing.com (Nick Kew) wrote:
Indeed. I was on the point of suggesting AN XML processor until I
saw that (libxml2 accepts HTML as well as XML input).
A quick glance at the API docs suggested that the HTML API is similar
but separate from the XML API. Is it so?


But does this matter, in the context of the original question?


Perhaps not. It was a new question in the spirit of "discussion
forum--not help desk". :-)
Surely, given any WWW-compatible HTML or XHTML data stream, one can
choose to convert any non-ascii coded character (or any selection of
non-ascii characters) to a unicode code point and thence into
&#bignumber; notation, purely at the character stream layer, without
parsing the rest of the material at all?


Yes, except comments change if they exist and contain non-ASCII.
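
For what it's worth, the pure character-stream approach discussed above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name to_ncr and the sample string are made up for illustration, the source encoding has to be known in advance, and nothing here looks at the markup, so comments, style and script content get converted along with everything else.

```python
def to_ncr(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes, then re-encode as ASCII with &#NNNN; references.

    to_ncr is a hypothetical helper name; "xmlcharrefreplace" is the
    standard codec error handler that emits decimal character references.
    """
    text = raw.decode(source_encoding)
    return text.encode("ascii", "xmlcharrefreplace")


if __name__ == "__main__":
    # Sample input: three Japanese characters encoded as EUC-JP.
    sample = "<p>\u65e5\u672c\u8a9e</p>".encode("euc_jp")
    print(to_ncr(sample, "euc_jp").decode("ascii"))
    # prints: <p>&#26085;&#26412;&#35486;</p>
```

The same call should work for the other encodings mentioned in the original question by passing codec names such as "big5" or "gb18030".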

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #11
* Alan J. Flavell wrote in comp.infosystems.www.authoring.html:
I don't dispute that in theory you can produce counter-examples where
the simple method described above gives the wrong result, for the
reasons you gave; but I'm interested if a real-life example can be
produced where this would matter.


Consider an HTML document with

<style type="text/css">
q:lang(no) { quotes: "«" "»" '"' '"' }
</style>

or consider HTML documents with scripts such as those in

http://www.rfs.jp/sitebuilder/javascript/01/08.html
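
To make the counter-example concrete: in HTML 4 the content of style and script elements is not scanned for character references, so a converter has to know where it is in the markup. Below is a rough sketch using Python's standard html.parser. The names ncr, MarkupAwareConverter and convert are hypothetical, the reconstruction of the markup is approximate (end tags come out lower-cased, for instance), and style/script content is simply left untouched rather than rewritten in CSS or JavaScript escape syntax.

```python
from html.parser import HTMLParser


def ncr(text: str) -> str:
    """Replace every non-ASCII character with a &#NNNN; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


class MarkupAwareConverter(HTMLParser):
    """Hypothetical helper: converts text and tags, skips style/script."""

    RAW_ELEMENTS = {"style", "script"}

    def __init__(self):
        # convert_charrefs=False so existing references pass through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_raw = False

    def handle_starttag(self, tag, attrs):
        # Attribute values may carry character references, so convert them.
        self.out.append(ncr(self.get_starttag_text()))
        if tag in self.RAW_ELEMENTS:
            self.in_raw = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(ncr(self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)
        if tag in self.RAW_ELEMENTS:
            self.in_raw = False

    def handle_data(self, data):
        # Inside style/script a reference would not be decoded: leave as-is.
        self.out.append(data if self.in_raw else ncr(data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def convert(html_text: str) -> str:
    parser = MarkupAwareConverter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Feeding it a paragraph followed by the style block above converts the paragraph text but leaves the quote characters in the CSS alone, which also means the output is no longer pure ASCII.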
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Jul 23 '05 #12
On Sat, 18 Dec 2004, Bjoern Hoehrmann wrote:
Consider an HTML document with

<style type="text/css">
q:lang(no) { quotes: "«" "»" '"' '"' }
</style>

or consider HTML documents with scripts such as those in

http://www.rfs.jp/sitebuilder/javascript/01/08.html


OK, I concede.

Of course, if the target encoding was meant to be us-ascii with
&#bignumber; representations of non-ascii characters (which might have
been what the questioner had in mind, since I understood the request to
be for &#bignumber; representation rather than actual utf-8-encoded
characters in the HTML part), then you'd need CSS-aware and
Javascript-aware converters to know how to represent those non-ascii
characters in their respective languages.
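
As a rough illustration of what "their respective languages" means here: CSS writes a non-ASCII character as a backslash followed by hex digits, and JavaScript string literals use \uXXXX with UTF-16 code units. The sketch below uses hypothetical function names (css_escape, js_escape) and assumes the input is the body of a string literal, not arbitrary CSS or script source.

```python
def css_escape(text: str) -> str:
    """Escape non-ASCII characters for use inside a CSS string.

    Six hex digits are always emitted, so no trailing whitespace is
    needed to terminate the escape.
    """
    return "".join(
        ch if ord(ch) < 128 else "\\%06x" % ord(ch) for ch in text
    )


def js_escape(text: str) -> str:
    """Escape non-ASCII characters for use inside a JavaScript string.

    Characters outside the Basic Multilingual Plane are written as a
    surrogate pair, i.e. two \\uXXXX escapes.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        units = ch.encode("utf-16-be")  # 2 bytes per UTF-16 code unit
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


if __name__ == "__main__":
    print(css_escape("\u00ab"))  # prints: \0000ab
    print(js_escape("\u00ab"))   # prints: \u00ab
```

A real converter would still need to locate the string literals inside the style sheet or script first, which is the hard part.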

Indeed the W3C were wise in their XHTML documentation to recommend
moving those enclosures out into separate files rather than trying to
in-line them as CDATA ;-)
Jul 23 '05 #13
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:
Indeed. I was on the point of suggesting AN XML processor until I saw
that (libxml2 accepts HTML as well as XML input).
A quick glance at the API docs suggested that the HTML API is similar
but separate from the XML API. Is it so?


Yes, that's a reasonably fair summary. The HTML parser is the XML
parser with tolerance of non-XML and knowledge of HTML4.
Is there an equivalent of SAX
filter or somesuch that would make HTML appear to the app as XHTML?
The HTML parser gives you either SAX or DOM, and will process either
HTML or XHTML input without distinction. HTML mode is also tolerant
of tag-soup, though not quite as forgiving as a typical browser.
There are a few bugs wrt the spec: most obviously, it only recognises
XML comment syntax (but then, so do the browsers).

As a corollary, you can use it to apply XML processing to HTML.
TagSoup on the Java side appears to the app as an XML parser parsing
XHTML.
I'm not familiar with that, but it's not uncommon.
Has anyone compared the tag slurping features of TagSoup and libxml2? I
wonder which one is a better idea when writing in Python: using libxml2
with CPython or using TagSoup with Jython?


Couldn't tell you. But I'd venture a strong guess that libxml2 will be
not only a great deal faster than anything-java, but also no harder
and possibly easier to work with.
--
Nick Kew
Jul 23 '05 #14
In article <fu************ @hugin.webthing.com>,
ni**@hugin.webthing.com (Nick Kew) wrote:
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:
Indeed. I was on the point of suggesting AN XML processor until I saw
that (libxml2 accepts HTML as well as XML input).
The HTML parser gives you either SAX or DOM, and will process either
HTML or XHTML input without distinction.
Are the elements in the XHTML namespace or in no namespace? The good
thing about TagSoup is that it allows the app internals to be written
for XHTML, so the same app internals work for HTML, XHTML *and*
XHTML+FooML (using an XML parser). That is, the HTML/XHTML difference is
left on the parsing level and not carried over to higher levels as in
browsers.
But I'd venture a strong guess that libxml2 will be
not only a great deal faster than anything-java, but also no harder
and possibly easier to work with.


I think I read somewhere that the libxml2 wrapper gives the Python side
UTF-8 byte strings instead of Python Unicode strings.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #15
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:
>> Indeed. I was on the point of suggesting AN XML processor until I saw
>> that (libxml2 accepts HTML as well as XML input).
The HTML parser gives you either SAX or DOM, and will process either
HTML or XHTML input without distinction.


Are the elements in the XHTML namespace or in no namespace?


They're not namespaced. At least not in the SAX parse mode, which is
where I've investigated the issue. At least, my preliminary experiments
trying to use the HTML parser in SAX2 mode were not successful, which
is not to say I won't return to the issue.
The good
thing about TagSoup is that it allows the app internals to be written
for XHTML, so the same app internals work for HTML, XHTML *and*
XHTML+FooML (using an XML parser). That is, the HTML/XHTML difference is
left on the parsing level and not carried over to higher levels as in
browsers.


Watch this space. That's what I'd like mod_publisher to do. OTOH,
how many people mix HTML (no X) with other namespaces in real life?
The full capability is at best a pathological edge-case.

BTW, if you're interested in namespace processing on the Web,
may I refer you to my recently-published article at
http://www.xml.com/pub/a/2004/12/15/...amespaces.html

--
Nick Kew
Jul 23 '05 #16
In article <cq***********@hugin.webthing.com>,
ni**@hugin.webthing.com (Nick Kew) wrote:
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:

>> Indeed. I was on the point of suggesting AN XML processor until I saw
>> that (libxml2 accepts HTML as well as XML input).
The HTML parser gives you either SAX or DOM, and will process either
HTML or XHTML input without distinction.


Are the elements in the XHTML namespace or in no namespace?


They're not namespaced.


That's a pity. Of course, it's possible to write a filter that takes
SAX1 events, adds the namespacing and emits SAX2 events, but it is
uncool to have to implement stuff that a library should be able to do
out of the box.
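
Such a filter might look roughly like the following, sketched with Python's standard xml.sax module rather than libxml2 itself. The class names NamespaceAdder and Collector are made up for illustration, the demo parses well-formed input with expat instead of real tag soup, and lower-casing element names is an assumption about what an XHTML-oriented application would want.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

XHTML_NS = "http://www.w3.org/1999/xhtml"


class NamespaceAdder(ContentHandler):
    """Receives SAX1-style events, re-emits them as namespaced SAX2 events."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream

    def startDocument(self):
        self.downstream.startDocument()
        # Declare XHTML as the default namespace (prefix None).
        self.downstream.startPrefixMapping(None, XHTML_NS)

    def endDocument(self):
        self.downstream.endPrefixMapping(None)
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        name = name.lower()
        # Attributes stay in no namespace, as in XHTML.
        values = {(None, key): value for key, value in attrs.items()}
        qnames = {(None, key): key for key in attrs.keys()}
        self.downstream.startElementNS(
            (XHTML_NS, name), name, AttributesNSImpl(values, qnames)
        )

    def endElement(self, name):
        name = name.lower()
        self.downstream.endElementNS((XHTML_NS, name), name)

    def characters(self, content):
        self.downstream.characters(content)


class Collector(ContentHandler):
    """Toy downstream handler that records namespaced element names."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(name)


if __name__ == "__main__":
    collector = Collector()
    xml.sax.parseString(
        b"<HTML><BODY><P>hi</P></BODY></HTML>", NamespaceAdder(collector)
    )
    print(collector.events)
```

The application code behind the filter then only ever sees elements in the XHTML namespace, whichever parser produced the events.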
The good
thing about TagSoup is that it allows the app internals to be written
for XHTML, so the same app internals work for HTML, XHTML *and*
XHTML+FooML (using an XML parser). That is, the HTML/XHTML difference is
left on the parsing level and not carried over to higher levels as in
browsers.


Watch this space. That's what I'd like mod_publisher to do. OTOH,
how many people mix HTML (no X) with other namespaces in real life?


The people who export from MS Office?

I was not suggesting that namespaces in HTML should be supported. How
that would work isn't even defined.

However, I think it doesn't make sense to write the app internals for
namespaceless HTML so that massive rework is needed for XHTML+FooML. It
makes more sense to write the app internals for namespaced compound
documents and to convert HTML to XHTML at parse time. Using an XML
parser is the right way to go for XHTML and XHTML+FooML.
BTW, if you're interested in namespace processing on the Web,
may I refer you to my recently-published article at
http://www.xml.com/pub/a/2004/12/15/...amespaces.html


Interesting.

BTW, how do you reconcile the GPL and the Apache license?

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #17
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:
Watch this space. That's what I'd like mod_publisher to do. OTOH,
how many people mix HTML (no X) with other namespaces in real life?
The people who export from MS Office?


Good catch. I'd forgotten that one. Don't they try/claim to be XHTML?
I was not suggesting that namespaces in HTML should be supported. How
that would work isn't even defined.


It would presumably work by treating it as XHTML. Like XPath, XSLT,
etc, which do work fine with HTML and the libxml2 parser.
BTW, if you're interested in namespace processing on the Web,
may I refer you to my recently-published article at
http://www.xml.com/pub/a/2004/12/15/...amespaces.html


Interesting.

BTW, how do you reconcile the GPL and the Apache license?


Why is that a problem? My work is GPL (if you want it free - dual
licensing available otherwise). Apache is ASF license. They are
distributed separately. Those Linux distros (and FreeBSD) that
package my GPL modules offer them to users as separate packages,
and don't have a problem with it. Even the fundamentalists at
Debian don't have a problem with it. Any more than they have a
problem distributing non-GPL apps like Apache to run on Linux itself.

--
Nick Kew
Jul 23 '05 #18
In article <l8************ @hugin.webthing.com>,
ni**@hugin.webthing.com (Nick Kew) wrote:
In article <hs************ *************** *@news.dnainternet.net>,
Henri Sivonen <hs******@iki.fi> writes:
Watch this space. That's what I'd like mod_publisher to do. OTOH,
how many people mix HTML (no X) with other namespaces in real life?
The people who export from MS Office?


Good catch. I'd forgotten that one. Don't they try/claim to be XHTML?


I don't think so. It's more like HTML tag soup spiced up with colonified
names and XML "data islands".
I was not suggesting that namespaces in HTML should be supported. How
that would work isn't even defined.


It would presumably work by treating it as XHTML.


With namespaces in HTML I meant this kind of Microsoftism:

<HTML xmlns:k='urn:kewl-schema-urn'>
<HEAD>
<TITLE>Test</TITLE>
<xml>
<k:foo>
<k:bar/>
</k:foo>
</xml>
</HEAD>
<BODY>
....
</BODY>
</HTML>

(I suppose Microsoft has defined how that is supposed to work. So saying
it isn't defined was not entirely accurate.)
Why is that a problem?
The FSF lists the Apache licenses 1.0, 1.1 and 2.0 as GPL-incompatible
free software licenses.

http://www.fsf.org/licenses/license-...atibleLicenses
Even the fundamentalists at Debian don't have a problem with it.
That's surprising. :-)
Any more than they have a
problem distributing non-GPL apps like Apache to run on Linux itself.


IIRC, Linus Torvalds declared an exception when the subject came up.

--
Henri Sivonen
hs******@iki.fi
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 23 '05 #19
