Notepad and UTF-8

Okay my web site grew up and is moving to a non-Windows server, Unix. I am
converting my static HTML/CSS files to Drupal content management system. The
leading white spaces I use to indent text for easy editing are not collapsed
by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
environment and ran a sed command to strip whitespace.

When I opened the files in Notepad they were all on one line each. So I
tried copying them from Microsoft FrontPage where they looked okay in HTML
view and pasting them into Notepad then saving over the HTML file. I most
definitely and carefully chose save as UTF-8 from the list of options
offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

Please tell me there is an easier way... I need to
a) strip leading whitespace from the content of my html files and
b) save these files as UTF-8 and have them STAY UTF-8. Thanks
Mar 8 '08 #1

On 2008-03-08, The Bicycling Guitarist <Ch***@TheBicyclingGuitarist.net> wrote:
> Okay my web site grew up and is moving to a non-Windows server, Unix. I am
> converting my static HTML/CSS files to Drupal content management system. The
> leading white spaces I use to indent text for easy editing are not collapsed
> by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
> environment and ran a sed command to strip whitespace.
>
> When I opened the files in Notepad they were all on one line each. So I
> tried copying them from Microsoft FrontPage where they looked okay in HTML
> view and pasting them into Notepad then saving over the HTML file. I most
> definitely and carefully chose save as UTF-8 from the list of options
> offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

Not sure what you mean by ANSI. Everything appeared on one line probably
because Cygwin sed wrote Unix line endings (just LF, not CRLF) at the
ends of the lines. You can configure Cygwin somehow not to do that, I
think on a per-filesystem basis.

Most editors even on Windows will sort of half-work with just LF, which
is probably why it looked OK in FrontPage but not in Notepad.
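
To see which line endings a file actually contains, a minimal Python sketch
along these lines may help. It is an illustration, not something from the
thread: the function name count_line_endings and the sample bytes are made up,
and it only counts bytes, it does not change the file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical sample: two Unix-style lines and one DOS-style line.
sample = b"<p>one</p>\n<p>two</p>\r\n<p>three</p>\n"
print(count_line_endings(sample))  # {'CRLF': 1, 'LF': 2, 'CR': 0}
```

For a real file, read it in binary mode first, for example
open("page.html", "rb").read(), where page.html is a placeholder name. More
than one non-zero count means the file has mixed line endings.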
> Please tell me there is an easier way... I need to
> a) strip leading whitespace from the content of my html files and
> b) save these files as UTF-8 and have them STAY UTF-8. Thanks

Just don't use Notepad or FrontPage. It could have been the copy and
pasting from FrontPage that messed up the UTF-8.
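
One way to check what ended up on disk is to look at the raw bytes. The sketch
below is only an illustration (describe_encoding is a made-up name). Note that
a file containing nothing but ASCII characters is byte-for-byte the same in
UTF-8 and in the Windows "ANSI" code page, so a tool may report it as ANSI
even though it is also valid UTF-8. Whether that is what happened here is a
guess.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Report what the raw bytes alone reveal about UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "starts with a UTF-8 BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    if all(b < 0x80 for b in data):
        # Pure ASCII: the bytes are identical in UTF-8 and Windows-1252.
        return "pure ASCII"
    return "valid UTF-8 without BOM"


print(describe_encoding("caf\u00e9".encode("utf-8")))  # valid UTF-8 without BOM
print(describe_encoding(b"caf\xe9"))                   # not valid UTF-8
```

The second call uses the Windows-1252 byte for the same accented letter, which
is not a valid UTF-8 sequence on its own.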

You could try to set up cygwin to use DOS line endings, or just stick to
Unix line endings. But then you need to be careful because some Windows
editors may open the file silently and apparently OK with the Unix line
endings, but then save DOS line endings on the one or two lines you edit
leaving you with an inconsistent mixture. Without any decent tools it's
often hard to know what you've actually ended up with or why things are
going wrong.
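
If no editor can be trusted with this, a short script can do both jobs in one
pass. The following Python sketch is an assumption-laden illustration, not the
sed command from the original post: the function names are made up, it assumes
the input is already valid UTF-8, and it strips indentation on every line,
including lines inside <pre> blocks where the whitespace matters.

```python
def strip_leading_whitespace(text: str, newline: str = "\n") -> str:
    """Remove leading spaces and tabs from each line and use one line ending."""
    # splitlines() treats CRLF, LF and CR (and a few rarer separators) as breaks.
    lines = text.splitlines()
    return "".join(line.lstrip(" \t") + newline for line in lines)


def convert_file(src: str, dst: str, newline: str = "\n") -> None:
    """Read src as UTF-8, strip indentation, write dst as UTF-8."""
    # newline="" stops Python from translating line endings on read or write,
    # so the output contains exactly the ending chosen above.
    with open(src, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(strip_leading_whitespace(text, newline))


print(repr(strip_leading_whitespace("  <p>a</p>\r\n\t<p>b</p>\n")))
# '<p>a</p>\n<p>b</p>\n'
```

Passing newline="\r\n" gives DOS line endings instead. Either way every line
ends the same way, which avoids the inconsistent mixture described above.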
Mar 8 '08 #2

On Thu, 13 Mar 2008, Ben C wrote:
> Better to use a Content-Language header and/or set the lang attribute on
> the html element to tell the browser the language so it can use that as
> a hint to pick a font.

But that does not work in Internet Explorer. It works in Mozilla & Co.
http://www.unics.uni-hannover.de/nht...-attribute.htm
How about others like Opera?

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell
Mar 13 '08 #3

On 2008-03-13, Andreas Prilop <ap*********@trashmail.net> wrote:
> On Thu, 13 Mar 2008, Ben C wrote:
>> Better to use a Content-Language header and/or set the lang attribute on
>> the html element to tell the browser the language so it can use that as
>> a hint to pick a font.
>
> But that does not work in Internet Explorer.

I didn't know that. It doesn't surprise me though.

> It works in Mozilla & Co.
> http://www.unics.uni-hannover.de/nht...-attribute.htm
> How about others like Opera?

In that test everything gets the same font. I think what Opera does,
but this is just a guess, is choose a font based on the actual
characters.

Although I don't know how they tell the difference between zh-tw and
zh-cn (languages and codepoints very similar but you need different
fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
Mar 13 '08 #4

On Thu, 13 Mar 2008, Ben C wrote:
>> http://www.unics.uni-hannover.de/nht...-attribute.htm
>
> In that test everything gets the same font. I think what Opera does,
> but this is just a guess, is choose a font based on the actual
> characters.

If that is true, you should be able to see different fonts for
Latin letters and Greek letters on
http://www.unics.uni-hannover.de/nhtcapri/greek.html7
and different fonts for Latin letters and Cyrillic letters on
http://www.unics.uni-hannover.de/nht...cyrillic.html5

But I doubt. I believe Opera uses only one font for each of
these two test pages.

> Although I don't know how they tell the difference between zh-tw and
> zh-cn (languages and codepoints very similar but you need different
> fonts-- simplified characters for zh-cn and traditional ones for zh-tw).

But how to do this with "charset=utf-8"? The codepoints in Unicode
are the same for CN and TW and JP.
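
A small Python sketch of that point, for illustration only. U+76F4 is used
here as one commonly cited example of a character that Chinese and Japanese
typefaces draw differently; the helper name utf8_bytes is made up. The encoded
bytes depend only on the code point, so nothing in a UTF-8 byte stream says
which regional glyph is wanted, and that hint has to come from somewhere else,
such as the lang attribute.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


ch = "\u76f4"  # one code point, shared by Chinese and Japanese text
print(hex(ord(ch)), utf8_bytes(ch))  # 0x76f4 b'\xe7\x9b\xb4'
```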

--
Solipsists of the world - unite!
Mar 14 '08 #5

On 2008-03-14, Andreas Prilop <ap*********@trashmail.net> wrote:
> On Thu, 13 Mar 2008, Ben C wrote:
>>> http://www.unics.uni-hannover.de/nht...-attribute.htm
>>
>> In that test everything gets the same font. I think what Opera does,
>> but this is just a guess, is choose a font based on the actual
>> characters.
>
> If that is true, you should be able to see different fonts for
> Latin letters and Greek letters on
> http://www.unics.uni-hannover.de/nhtcapri/greek.html7
> and different fonts for Latin letters and Cyrillic letters on
> http://www.unics.uni-hannover.de/nht...cyrillic.html5
>
> But I doubt. I believe Opera uses only one font for each of
> these two test pages.

Probably. I don't know what it does.

>> Although I don't know how they tell the difference between zh-tw and
>> zh-cn (languages and codepoints very similar but you need different
>> fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
>
> But how to do this with "charset=utf-8"? The codepoints in Unicode
> are the same for CN and TW and JP.

Exactly, that was my point.
Mar 14 '08 #6

> Please tell me there is an easier way... I need to
> a) strip leading whitespace from the content of my html files and
> b) save these files as UTF-8 and have them STAY UTF-8. Thanks

Check out Notepad2 and Notepad++
Mar 15 '08 #7
