
Multiple coding systems, and filesystems

On some of my course pages, I quote (with attribution)
small sections of Wikipedia and the like. E.g., the top of
http://en.wiktionary.org/wiki/entropy

has "entropia" in Greek font,

http://en.wikipedia.org/wiki/Goedel

has the o-umlaut from German, and

http://en.wikipedia.org/wiki/Origami

has a Japanese font. What is the correct --maybe "coding
system" is the term?-- so that I could quote all three of
these on the same HTML page?

And can the HTML-page be set up so that it will validate?
=======================================================

Actually, I'm ahead of myself. In the past I've cut&pasted
a snippet from, say, wiki/entropy, into an Emacs buffer,
adjoined a "From Wiktionary http://..." and attempted to
save the buffer. Sometimes Emacs asked me for what coding
system to use --and I don't know how to placate it.

If I'm using multiple coding systems on the same webpage,
do I have to save the different snippets in different files
stored with different coding systems, and then

<!--#include ... -->

each of them into one webpage? Or can the file system
permit a file that simultaneously has Greek, German and
Japanese characters?

FWIW, my home OS is MacOSX and I need to upload my webpages
to school. The math dept. server is probably running
Unix; when I manipulate the html files (when at work), I'm
using Emacs running on a Solaris (unix) system.

Sincerely,
Prof. Jonathan King (gentsquash)
Mathematics dept, Univ. of Florida
Jun 27 '08 #1
Tue, 3 Jun 2008 14:08:25 -0700 (PDT), /ge********@gmail.com/:
> Or can the file system
> permit a file that simultaneously has Greek, German and
> Japanese characters?
Files generally store bytes. How these bytes will be interpreted is
up to the application reading them. Characters are encoded into
bytes using different coding schemes which generally are capable of
representing the characters of a specific character set. The
Unicode character set covers the characters of practically all
writing systems, so if you use a UTF (Unicode Transformation Format)
variant you can have all the characters you need encoded in a single
file. So make sure your text editor supports reading and saving files
in UTF-8, for example.
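A minimal Python sketch of the point above, assuming Python 3.9 or later (the file name quotes.html and the sample words are only illustrations): the Greek, German and Japanese words from the question go into one file saved as UTF-8, and each word is shown with its character count and the UTF-8 bytes it turns into.

```python
import tempfile
from pathlib import Path

# Sample words like those in the question: Greek, German (o-umlaut), Japanese.
snippets = ["ἐντροπία", "Gödel", "折り紙"]

for s in snippets:
    data = s.encode("utf-8")
    # ASCII-safe output: the bytes repr shows UTF-8 using 1 to 4 bytes per character.
    print(f"{len(s)} characters -> {len(data)} bytes: {data!r}")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quotes.html"  # hypothetical file name
    # One file, one encoding: UTF-8 can hold all three scripts at once.
    path.write_text("\n".join(snippets) + "\n", encoding="utf-8")
    raw = path.read_bytes()
    # Reading back only works if the same encoding is named again.
    restored = path.read_text(encoding="utf-8").splitlines()

print(restored == snippets)
```

The file itself is just bytes; it is the `encoding="utf-8"` on both the write and the read that makes the three scripts round-trip.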

--
Stanimir
Jun 27 '08 #2
Scripsit ge********@gmail.com:
> On some of my course pages, I quote (with attribution)
> small sections of Wikipedia and the like. E.g., the top
> of
> http://en.wiktionary.org/wiki/entropy
>
> has "entropia" in Greek font,
Technically, it has the word in Greek _characters_ (letters). This is
the key issue; fonts are secondary. The page has a style sheet that
makes special suggestions on the font of such words, in a most confusing
and tricky way.
> What is the correct --maybe "coding
> system" is the term?-- so that I could quote all three of
> these on the same HTML page?
The proper _character encoding_ is UTF-8 in such cases. As soon as you
have Japanese, Greek, and umlaut Latin letters on one page, that's
definitely the best option. If there were just a few "special"
characters, you could present them using entity references like &ouml;
or character references like &#261;, but this gets clumsy (or requires
suitable software for generating them) if you have full sentences that
consist of "special" characters.
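For the "few special characters" case, the references can be generated rather than typed by hand. A small Python sketch (the helper name to_ascii_html is hypothetical) uses the standard xmlcharrefreplace error handler, after escaping markup characters with html.escape. As noted above, this is fine for isolated letters but unwieldy for whole Japanese sentences.

```python
import html

def to_ascii_html(text: str) -> str:
    """Return text as pure ASCII HTML: & < > " ' escaped, non-ASCII as &#N;."""
    escaped = html.escape(text)  # & -> &amp;, < -> &lt;, and so on
    # Every character outside ASCII becomes a decimal reference such as &#246;.
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Gödel"))   # G&#246;del
print(to_ascii_html("ą"))       # &#261;
print(to_ascii_html("折り紙"))   # &#25240;&#12426;&#32025;
```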

It's not possible (in practice on web pages) to switch the character
encoding in the middle of an HTML document.
> In the past I've cut&pasted
> a snippet from, say, wiki/entropy, into an Emacs buffer,
> adjoined a "From Wiktionary http://..." and attempted to
> save the buffer. Sometimes Emacs asked me for what coding
> system to use --and I don't know how to placate it.
UTF-8, if Emacs can really produce it. The version of Emacs I've been
using does not handle "special" characters, but I recently looked at
the newest version of Emacs for Windows, and it seems to have
impressive support for "special" characters.

Note that the server should be configured to send an appropriate HTTP
header (a Content-Type header with a charset parameter, such as
"text/html; charset=utf-8"). You normally do this by adding a directive
to your .htaccess file, and in practice you need to use the same
encoding for all ".html" files in a directory (folder), though you could
use, for example, ISO-8859-1 for ".html" and UTF-8 for ".htm" files.
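What the server should end up sending can be sketched in Python, assuming Python 3.9 or later. build_page is a hypothetical helper, and the HTML 4.01 Strict doctype is only one choice that validates. The header is normally set by the server configuration, not by the page, so the sketch just returns the header value next to the document bytes. The meta element repeats the same charset for copies of the file opened without an HTTP header, and the two declarations should agree.

```python
import html

def build_page(title: str, body: str) -> tuple[dict[str, str], bytes]:
    """Return (HTTP headers, UTF-8 encoded document) for a tiny HTML page."""
    charset = "utf-8"
    # The header the server configuration (e.g. .htaccess) should produce.
    headers = {"Content-Type": f"text/html; charset={charset}"}
    doc = f"""<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset={charset}">
<title>{html.escape(title)}</title>
</head>
<body>
<p>{html.escape(body)}</p>
</body>
</html>
"""
    # Encode with the same charset that the header and meta element declare.
    return headers, doc.encode(charset)

headers, page = build_page("Quotes", "ἐντροπία, Gödel, 折り紙")
print(headers["Content-Type"])
print(len(page), "bytes")
```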
> If I'm using multiple coding systems on the same webpage,
> do I have to save the different snippets in different files
> stored with different coding systems, and then
>
> <!--#include ... -->
>
> each of them into one webpage?
No, it won't work that way, even if your server supports SSI includes.
They result in a single document, which can have only one encoding. (I
won't mention <iframe>, because it's really a poor hack for things like
this, but it performs a sort of include where the included document is
displayed "autonomously" inside the main canvas and may have a different
encoding.)
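Why the include approach fails can be shown in a few lines of Python (the two fragments are made up for illustration). An include splices bytes together without converting them, so a UTF-8 fragment followed by an ISO-8859-1 fragment gives a byte stream that no single encoding reads correctly.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# Hypothetical fragments saved by an editor with different coding systems.
greek_utf8 = "ἐντροπία\n".encode("utf-8")
german_latin1 = "Gödel\n".encode("iso-8859-1")  # o-umlaut is the single byte 0xF6

# An SSI-style include just concatenates the bytes.
combined = greek_utf8 + german_latin1

print(decodes_as(combined, "utf-8"))  # False: 0xF6 cannot start a UTF-8 sequence
# ISO-8859-1 accepts any byte, but then the Greek part turns into mojibake.
print(combined.decode("iso-8859-1").startswith("ἐντροπία"))  # False

# Re-saving the German fragment as UTF-8 fixes it: one document, one encoding.
fixed = greek_utf8 + "Gödel\n".encode("utf-8")
print(decodes_as(fixed, "utf-8"))  # True
```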
> FWIW, my home OS is MacOSX and I need to upload my webpages
> to school. The math dept. server is probably running
> Unix; when I manipulate the html files (when at work), I'm
> using Emacs running on a Solaris (unix) system.
A nice mess :-) but it should be manageable when using UTF-8. When
uploading with FTP, use binary (not ASCII) mode, since no character
conversion should be performed; the data is already in a
system-independent encoding.
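In Python's standard ftplib, for instance, the difference comes down to the method: storbinary switches the connection to binary ("TYPE I") before sending, while storlines uses ASCII mode ("TYPE A"), in which line endings may be rewritten. A minimal sketch, where upload_binary is a hypothetical helper and the host, account and file names in the usage comment are placeholders:

```python
from ftplib import FTP
from pathlib import Path

def upload_binary(ftp: FTP, local_path: Path, remote_name: str) -> None:
    """Upload a file byte for byte; storbinary sends TYPE I (binary) first."""
    with local_path.open("rb") as fh:  # raw bytes, no decoding or conversion
        ftp.storbinary(f"STOR {remote_name}", fh)

# Usage (placeholder host and credentials):
#     with FTP("ftp.example.edu") as ftp:
#         ftp.login("username", "password")
#         upload_binary(ftp, Path("entropy.html"), "entropy.html")
```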

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Jun 27 '08 #3
On Tue, 3 Jun 2008, ge********@gmail.com wrote:
> Greek
> German
> Japanese
> What is the correct --maybe "coding
> system" is the term?-- so that I could quote all three of
> these on the same HTML page?
Use Unicode in the encoding ("charset") UTF-8:
http://www.unics.uni-hannover.de/nht...ilingual1.html
> Sometimes Emacs asked me for what coding
> system to use --and I don't know how to placate it.
Choose UTF-8 for the web.
> Or can the file system
> permit a file that simultaneously has Greek, German and
> Japanese characters?
Yes - with Unicode.
> when I manipulate the html files (when at work), I'm
> using Emacs running on a Solaris (unix) system.
Either use a UTF-8 locale such as

export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"

or write all non-ASCII characters as character references
&#number;
http://www.unics.uni-hannover.de/nht...ilingual2.html
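Whether a shell session already has a UTF-8 locale can be checked roughly as below. This Python sketch only looks at the environment variables, using the POSIX precedence for character handling (LC_ALL, then LC_CTYPE, then LANG, with empty values ignored). The function names are hypothetical, and Emacs can still be told to use a different coding system regardless of the locale.

```python
import os
from collections.abc import Mapping

def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """Locale name governing character handling, by POSIX precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # the POSIX default when nothing is set

def is_utf8_locale(env: Mapping[str, str]) -> bool:
    """True for names like "en_US.UTF-8", "en_US.utf8" or "C.UTF-8"."""
    locale_name = effective_ctype_locale(env)
    # The codeset sits between "." and an optional "@modifier".
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"

print(effective_ctype_locale(os.environ), is_utf8_locale(os.environ))
```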

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell
Jun 27 '08 #4
On Wed, 4 Jun 2008, Jukka K. Korpela wrote:
> though you could use, for example, ISO-8859-1
> for ".html" and UTF-8 for ".htm" files.
A better idea is to separate content-type and charset.
For example, use "utf8" for UTF-8 and "iso1" for ISO-8859-1.
On Apache, you can write into your .htaccess file:

Options +Multiviews
DefaultType text/html
AddCharset iso-8859-1 iso1
AddCharset utf-8 utf8

Name the files as "mypage.html.iso1" and "anotherpage.html.utf8"
or simply as "mypage.iso1" and "anotherpage.utf8";
and don't forget "stylesheet.css.utf8".

In the URLs, omit ".iso1" and ".utf8" of course:

<a href="mypage.html">
<a href="anotherpage.html">
/* One wonders if you need ISO-8859-1 at all
when you can have documents in UTF-8. */
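To see what the AddCharset setup above amounts to, here is a simplified Python model, assuming Python 3.9 or later. It is not Apache's actual algorithm, and the tables and function names are hypothetical. Each extension of the file name contributes either a media type or a charset, and a MultiViews-like lookup maps the URL "mypage.html" to the file carrying the extra extension.

```python
from pathlib import PurePosixPath
from typing import Optional

# Tables mirroring the .htaccess lines above (simplified model).
CHARSETS = {".iso1": "iso-8859-1", ".utf8": "utf-8"}  # AddCharset
TYPES = {".html": "text/html", ".css": "text/css"}    # usual type mappings
DEFAULT_TYPE = "text/html"                            # DefaultType text/html

def content_type(filename: str) -> str:
    """Build a Content-Type value from every extension of filename."""
    media_type, charset = DEFAULT_TYPE, None
    for suffix in PurePosixPath(filename).suffixes:
        suffix = suffix.lower()
        if suffix in TYPES:
            media_type = TYPES[suffix]
        elif suffix in CHARSETS:
            charset = CHARSETS[suffix]
    return f"{media_type}; charset={charset}" if charset else media_type

def find_variant(requested: str, files: list[str]) -> Optional[str]:
    """MultiViews-like lookup: 'mypage.html' matches 'mypage.html.iso1'."""
    for name in files:
        if name == requested or name.startswith(requested + "."):
            return name
    return None

files = ["mypage.html.iso1", "anotherpage.html.utf8", "stylesheet.css.utf8"]
for url in ["mypage.html", "anotherpage.html", "stylesheet.css"]:
    name = find_variant(url, files)
    print(url, "->", name, "->", content_type(name))
```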

--
Solipsists of the world - unite!
Jun 27 '08 #5
