
Translating foreign characters to HTML code

Is there a way or a program (for Windows) that can translate foreign
characters into the proper HTML code? I have a Word document with many
different characters and I really don't want to spend all the time
editing it with all the HTML code, i.e. "&#347ci&#261". Certainly someone
must have a program that can do this automatically.

Jul 24 '05 #1
gr***@kcls.org wrote:
> Is there a way or a program (for Windows) that can translate foreign
> characters into the proper HTML code? I have a Word document with many
> different characters and I really don't want to spend all the time
> editing it with all the HTML code, i.e. "&#347ci&#261". Certainly someone
> must have a program that can do this automatically.


Just save it with the UTF-8 character encoding. Then you don't need to
write non-ASCII characters as character references.
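
As a rough illustration of this advice, the Python sketch below writes a page as UTF-8 and declares the encoding in a meta element. The function name `write_utf8_html` and the page skeleton are made up for the example; the thread does not say which editor or tool the poster actually uses.

```python
import html
from pathlib import Path


def write_utf8_html(path, body_text):
    """Write a minimal HTML 4.01 page as UTF-8 (hypothetical helper)."""
    page = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
        '  "http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n"
        "<head>\n"
        # Declare the encoding so browsers do not have to guess it.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "<title>Example</title>\n"
        "</head>\n"
        "<body>\n"
        # Only markup characters are escaped; letters such as "ś" stay literal.
        f"<p>{html.escape(body_text, quote=False)}</p>\n"
        "</body>\n"
        "</html>\n"
    )
    # The encoding argument is what actually makes the file UTF-8.
    Path(path).write_text(page, encoding="utf-8")
```

Calling `write_utf8_html("example.html", "ścią")` would store the two Polish letters as UTF-8 bytes, with no character references in the file. The declared charset has to match the encoding the file is really saved in, otherwise browsers will show the wrong characters.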

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
Jul 24 '05 #2
gr***@kcls.org wrote:
> Is there a way or a program (for Windows) that can translate foreign
> characters into the proper HTML code? I have a Word document with many
> different characters and I really don't want to spend all the time
> editing it with all the HTML code, i.e. "&#347ci&#261". Certainly someone
> must have a program that can do this automatically.


You need to learn Unicode.
http://lachy.id.au/log/2004/12/guide-to-unicode-part-1

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Jul 24 '05 #3
gr***@kcls.org wrote:
> Is there a way or a program (for Windows) that can translate foreign
> characters into the proper HTML code? I have a Word document with many
> different characters and I really don't want to spend all the time
> editing it with all the HTML code, i.e. "&#347ci&#261".


That's fine, because it wouldn't work anyway. You forgot the semicolons.
Jul 24 '05 #4
Harlan Messinger <hm*******************@comcast.net> wrote:
> gr***@kcls.org wrote:
>> Is there a way or a program (for Windows) that can translate
>> foreign characters into the proper HTML code? I have a Word
>> document with many different characters and I really don't want
>> to spend all the time editing it with all the HTML code, i.e.
>> "&#347ci&#261".
>
> That's fine, because it wouldn't work anyway. You forgot the
> semicolons.


It _will_ work (for HTML, not XHTML) since the parsing for the numeric
character reference will end with the first non-digit, in this case
"c".
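
For what it is worth, a current HTML-oriented decoder can be checked against this claim. A minimal sketch, assuming Python 3's standard `html.unescape`, which is documented as following the HTML5 rules (a later specification than anything cited in this thread, so it says nothing about what HTML 4 required in 2005):

```python
import html

# The references from the original post, without semicolons.
unterminated = "&#347ci&#261"
# The same text with every reference properly terminated.
terminated = "&#347;ci&#261;"

decoded_unterminated = html.unescape(unterminated)
decoded_terminated = html.unescape(terminated)

# Both spellings decode to the same three-letter string here,
# because "c" and the end of input cannot continue a decimal number.
print(decoded_unterminated == decoded_terminated)
```

This only shows how one lenient decoder treats the input; it does not make the unterminated form valid markup.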

--
David Håsäther
Jul 24 '05 #5
On Mon, 27 Jun 2005, David Håsäther wrote:

[ "&#347ci&#261". ]
> It _will_ work (for HTML, not XHTML) since the parsing for the numeric
> character reference will end with the first non-digit, in this case
> "c".


No. Some client agents may choose that fixup, but I challenge you to
produce any authoritative specification which requires it.
Jul 24 '05 #6
Alan J. Flavell <fl*****@ph.gla.ac.uk> wrote:
> On Mon, 27 Jun 2005, David Håsäther wrote:
>
> [ "&#347ci&#261". ]
>> It _will_ work (for HTML, not XHTML) since the parsing for the
>> numeric character reference will end with the first non-digit, in
>> this case "c".
>
> No. Some client agents may choose that fixup, but I challenge you
> to produce any authoritative specification which requires it.


The SGML Handbook (353:3) says (emphasis mine):

| The refc or RE can be omitted only if the reference is _not followed
| by a character that could occur in the reference_, or by a character
| that could be interpreted as the omitted reference end.

Looking at the production for a numeric character reference[62.2] (this
production was introduced in the Web SGML Adaptations TC) we see that
CRO is followed by "character number" (which is one or more digits),
i.e., nothing other than digits can occur in a numeric character
reference, and therefore REFC can be omitted before a non-digit such
as "c".

Try it in nsgmls too.

[62.2] numeric character reference =
cro, character number, reference end
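
The production can be approximated with a regular expression to show where omitting the reference end is safe and where it is not. This is a simplified sketch, not an SGML parser: it assumes cro is "&#", treats only ";" as the reference end (ignoring RE, the record end), handles decimal numbers only, and the names `NUMERIC_CHAR_REF` and `expand_decimal_refs` are invented for the example.

```python
import re

# cro ("&#"), character number (one or more digits), optional refc (";").
NUMERIC_CHAR_REF = re.compile(r"&#([0-9]+);?")


def expand_decimal_refs(text):
    """Replace decimal character references with the characters they name."""
    # Assumes every number is a valid code point; chr() raises ValueError otherwise.
    return NUMERIC_CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


# Followed by "c", which cannot occur in the number: the reference ends there.
safe = expand_decimal_refs("&#347ci&#261")

# Followed by the digit "1", which can occur in the number: it is swallowed,
# so the result is character 3471 rather than character 347 plus "1".
ambiguous = expand_decimal_refs("&#3471")

# With the semicolon present the intended reading is unambiguous.
explicit = expand_decimal_refs("&#347;1")
```

The second case is the situation the Handbook passage excludes: the reference is followed by a character that could occur in the reference, so the terminator cannot be dropped.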

--
David Håsäther
Jul 24 '05 #7
On Mon, 27 Jun 2005, David Håsäther wrote:
> Alan J. Flavell <fl*****@ph.gla.ac.uk> wrote:
>> On Mon, 27 Jun 2005, David Håsäther wrote:
>>
>> [ "&#347ci&#261". ]
>>> It _will_ work (for HTML, not XHTML) since the parsing for the
>>> numeric character reference will end with the first non-digit, in
>>> this case "c".
>>
>> No. Some client agents may choose that fixup, but I challenge you
>> to produce any authoritative specification which requires it.
>
> The SGML Handbook (353:3) says (emphasis mine)


[...]

OK, I have to admit that this SGML detail was not known to me, and it
appears you are correct about that.

However, the HTML specification (parts displayed in black)
does not include this option - see

http://www.w3.org/TR/html4/charset.html#h-5.3

Indeed there is a note (in green) stating:

| Note. In SGML, it is possible to eliminate the final ";" after a
| character reference in some cases (e.g., at a line break or
| immediately before a tag). In other circumstances it may not be
| eliminated (e.g., in the middle of a word). We strongly suggest using
| the ";" in all cases to avoid problems with user agents that require
| this character to be present.

And all of their recipes and examples include the terminating
semicolon.
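
For anyone who does want references rather than a UTF-8 file, a converter that always emits the semicolon is easy to sketch. The example below assumes Python 3 and uses the standard "xmlcharrefreplace" error handler; the function name `to_numeric_references` is made up here, and this is only one possible approach, not a tool anyone in the thread recommended.

```python
def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal reference ending in ";"."""
    # Encoding to ASCII with this handler turns each unencodable character
    # into "&#NNN;"; decoding again gives back an ordinary string.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


converted = to_numeric_references("ścią")
print(converted)  # prints: &#347;ci&#261;
```

ASCII characters are left untouched, so any "&" or "<" in the text would still need escaping separately before the result is pasted into a page.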

So although I concede that you may be technically correct on the SGML
front, I think it's overstating the case to claim that it's true for
HTML.

(Modulo the usual arguments about "the HTML specification purports to
exclude some features of SGML which SGML does not permit to be
excluded".)

thanks.
Jul 24 '05 #8
