Notepad and UTF-8

Okay my web site grew up and is moving to a non-Windows server, Unix. I am
converting my static HTML/CSS files to Drupal content management system. The
leading white spaces I use to indent text for easy editing are not collapsed
by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
environment and ran a sed command to strip whitespace.

When I opened the files in Notepad they were all on one line each. So I
tried copying them from Microsoft FrontPage where they looked okay in HTML
view and pasting them into Notepad then saving over the HTML file. I most
definitely and carefully chose save as UTF-8 from the list of options
offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?

Please tell me there is an easier way... I need to
a) strip leading whitespace from the content of my html files and
b) save these files as UTF-8 and have them STAY UTF-8. Thanks
Mar 8 '08 #1
On 2008-03-08, The Bicycling Guitarist <Ch***@TheBicyclingGuitarist.net> wrote:
> Okay my web site grew up and is moving to a non-Windows server, Unix. I am
> converting my static HTML/CSS files to Drupal content management system. The
> leading white spaces I use to indent text for easy editing are not collapsed
> by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
> environment and ran a sed command to strip whitespace.
>
> When I opened the files in Notepad they were all on one line each. So I
> tried copying them from Microsoft FrontPage where they looked okay in HTML
> view and pasting them into Notepad then saving over the HTML file. I most
> definitely and carefully chose save as UTF-8 from the list of options
> offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?
Not sure what you mean by ANSI. Everything probably appeared on one line
because Cygwin's sed wrote Unix line endings (just LF, not CRLF) at the
ends of the lines. You can configure Cygwin not to do that, I think on a
per-filesystem basis.

Most editors, even on Windows, will sort of half-work with just LF, which
is probably why it looked OK in FrontPage but not in Notepad.
> Please tell me there is an easier way... I need to
> a) strip leading whitespace from the content of my html files and
> b) save these files as UTF-8 and have them STAY UTF-8. Thanks
Just don't use Notepad or FrontPage. It could have been the copy and
pasting from FrontPage that messed up the UTF-8.
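
One possible explanation for the "ANSI" label, offered here only as an assumption: a file that contains nothing but ASCII characters is byte-for-byte identical in UTF-8 and in the Windows "ANSI" code pages, so a tool that finds no byte order mark may report it as ANSI even though it is also valid UTF-8. The thread does not confirm that this is what happened. A minimal Python sketch for checking what a file actually contains (the function name `describe_encoding` is made up for this illustration):

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw bytes as 'utf-8-bom', 'ascii', 'utf-8', or 'not-utf-8'."""
    # Only the three-byte prefix is checked here, not the rest of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Probably a legacy 8-bit code page (what Notepad calls "ANSI").
        return "not-utf-8"
    # Pure ASCII is valid UTF-8 and valid "ANSI" at the same time.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


print(describe_encoding(b"<p>hello</p>"))
print(describe_encoding("caf\u00e9".encode("utf-8")))
print(describe_encoding("caf\u00e9".encode("cp1252")))
```

To check a real file, pass the result of `open(path, "rb").read()`; reading in binary mode matters, because text mode would already have guessed an encoding.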

You could try to set up cygwin to use DOS line endings, or just stick to
Unix line endings. But then you need to be careful because some Windows
editors may open the file silently and apparently OK with the Unix line
endings, but then save DOS line endings on the one or two lines you edit
leaving you with an inconsistent mixture. Without any decent tools it's
often hard to know what you've actually ended up with or why things are
going wrong.
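
For anyone without an editor that shows line endings, a rough way to see what a file has ended up with is to count the endings in the raw bytes. This is only a sketch; the helper names `count_line_endings` and `is_mixed` are invented for the example and the sample bytes are made up:

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def is_mixed(counts: dict) -> bool:
    """True when more than one line-ending style is present."""
    return sum(1 for n in counts.values() if n > 0) > 1


# Made-up sample: two DOS endings and one Unix ending in the same file.
sample = b"<p>one</p>\r\n<p>two</p>\n<p>three</p>\r\n"
counts = count_line_endings(sample)
print(counts, "mixed" if is_mixed(counts) else "consistent")
```

For a file on disk, read it with `open(path, "rb").read()` so that Python does not translate the line endings before they are counted.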
Mar 8 '08 #2
On Thu, 13 Mar 2008, Ben C wrote:
> Better to use a Content-Language header and/or set the lang attribute on
> the html element to tell the browser the language so it can use that as
> a hint to pick a font.
But that does not work in Internet Explorer. It works in Mozilla & Co.
http://www.unics.uni-hannover.de/nht...-attribute.htm
How about others like Opera?

--
In memoriam Alan J. Flavell
http://groups.google.com/groups/sear...Alan.J.Flavell
Mar 13 '08 #3
On 2008-03-13, Andreas Prilop <ap*********@trashmail.net> wrote:
> On Thu, 13 Mar 2008, Ben C wrote:
>> Better to use a Content-Language header and/or set the lang attribute on
>> the html element to tell the browser the language so it can use that as
>> a hint to pick a font.
>
> But that does not work in Internet Explorer.
I didn't know that. It doesn't surprise me though.
> It works in Mozilla & Co.
> http://www.unics.uni-hannover.de/nht...-attribute.htm
> How about others like Opera?
In that test everything gets the same font. I think what Opera does,
but this is just a guess, is choose a font based on the actual
characters.

Although I don't know how they tell the difference between zh-tw and
zh-cn (languages and codepoints very similar but you need different
fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
Mar 13 '08 #4
On Thu, 13 Mar 2008, Ben C wrote:
>> http://www.unics.uni-hannover.de/nht...-attribute.htm
>
> In that test everything gets the same font. I think what Opera does,
> but this is just a guess, is choose a font based on the actual
> characters.
If that is true, you should be able to see different fonts for
Latin letters and Greek letters on
http://www.unics.uni-hannover.de/nhtcapri/greek.html7
and different fonts for Latin letters and Cyrillic letters on
http://www.unics.uni-hannover.de/nht...cyrillic.html5

But I doubt it. I believe Opera uses only one font for each of
these two test pages.
> Although I don't know how they tell the difference between zh-tw and
> zh-cn (languages and codepoints very similar but you need different
> fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
But how to do this with "charset=utf-8"? The codepoints in Unicode
are the same for CN and TW and JP.

--
Solipsists of the world - unite!
Mar 14 '08 #5
On 2008-03-14, Andreas Prilop <ap*********@trashmail.net> wrote:
> On Thu, 13 Mar 2008, Ben C wrote:
>>> http://www.unics.uni-hannover.de/nht...-attribute.htm
>>
>> In that test everything gets the same font. I think what Opera does,
>> but this is just a guess, is choose a font based on the actual
>> characters.
>
> If that is true, you should be able to see different fonts for
> Latin letters and Greek letters on
> http://www.unics.uni-hannover.de/nhtcapri/greek.html7
> and different fonts for Latin letters and Cyrillic letters on
> http://www.unics.uni-hannover.de/nht...cyrillic.html5
>
> But I doubt it. I believe Opera uses only one font for each of
> these two test pages.
Probably. I don't know what it does.
>> Although I don't know how they tell the difference between zh-tw and
>> zh-cn (languages and codepoints very similar but you need different
>> fonts-- simplified characters for zh-cn and traditional ones for zh-tw).
>
> But how to do this with "charset=utf-8"? The codepoints in Unicode
> are the same for CN and TW and JP.
Exactly, that was my point.
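
To illustrate the point with a small sketch: the UTF-8 bytes of a Han character are the same whatever language the page is in, so any language hint has to come from markup such as the lang attribute. The character U+76F4 is used here as an assumption, since it is commonly cited as one whose preferred glyph differs between locales; `utf8_hex` and `tag_language` are hypothetical helpers written for this example, and, as noted earlier in the thread, not every browser uses lang for font selection.

```python
import html


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a span carrying a lang attribute (hypothetical helper)."""
    return f'<span lang="{html.escape(lang, quote=True)}">{html.escape(text)}</span>'


han = "\u76f4"  # one code point, shared by Chinese and Japanese text

# The encoded bytes carry no information about the language.
print(utf8_hex(han))

# The language only appears in the markup around the character.
for lang in ("zh-CN", "zh-TW", "ja"):
    print(tag_language(han, lang))
```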
Mar 14 '08 #6
> Please tell me there is an easier way... I need to
> a) strip leading whitespace from the content of my html files and
> b) save these files as UTF-8 and have them STAY UTF-8. Thanks
Check out Notepad2 and Notepad++
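
If a scripted route is preferable to an editor, both steps can be done in one pass. The following is a minimal sketch, not a tested migration tool: the function names, the file names in the usage note, and the choice of source encoding are all assumptions for illustration. Note that it strips indentation everywhere, including inside `<pre>` blocks, where leading whitespace is significant.

```python
import re
from pathlib import Path

# Leading spaces and tabs at the start of every line.
_LEADING = re.compile(r"^[ \t]+", re.MULTILINE)


def strip_leading_whitespace(text: str) -> str:
    """Remove leading spaces and tabs from each line, keeping the line breaks."""
    return _LEADING.sub("", text)


def rewrite_as_utf8(src: str, dst: str, src_encoding: str, newline: str = "\n") -> None:
    """Read src in src_encoding, strip indentation, write dst as UTF-8 without a BOM."""
    text = Path(src).read_bytes().decode(src_encoding)
    # Normalize CRLF and bare CR to LF before stripping.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = strip_leading_whitespace(text)
    if newline != "\n":
        text = text.replace("\n", newline)
    # Writing bytes avoids any platform-specific newline translation.
    Path(dst).write_bytes(text.encode("utf-8"))


print(strip_leading_whitespace("  <p>\n\t\tText\n  </p>\n"))
```

A call might look like `rewrite_as_utf8("index.html", "index.utf8.html", src_encoding="cp1252", newline="\r\n")`, where the file names are placeholders and `cp1252` assumes the file was saved in the Western European "ANSI" code page. Pass `newline="\r\n"` only if the result still has to open cleanly in Notepad.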
Mar 15 '08 #7
