On 2008-03-08, The Bicycling Guitarist <Ch***@TheBicyclingGuitarist.net> wrote:
Okay my web site grew up and is moving to a non-Windows server, Unix. I am
converting my static HTML/CSS files to Drupal content management system. The
leading white spaces I use to indent text for easy editing are not collapsed
by Drupal, so I installed Cygwin on my Windows machine to simulate a Unix
environment and ran a sed command to strip whitespace.
When I opened the files in Notepad they were all on one line each. So I
tried copying them from Microsoft FrontPage where they looked okay in HTML
view and pasting them into Notepad then saving over the HTML file. I most
definitely and carefully chose save as UTF-8 from the list of options
offered by Notepad, but now all the files are ANSI instead of UTF-8. WTF?
Not sure what you mean by ANSI. Everything probably appeared on one line
because cygwin sed put Unix line separators (just LF, not CRLF) at the
ends of the lines. You can configure cygwin somehow not to do that, I
think on a per-filesystem basis.
Most editors, even on Windows, will sort of half-work with just LF, which
is probably why it looked OK in FrontPage but not in Notepad.
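If you would rather convert the files than reconfigure anything, a rough
sketch in Python (my choice of language here, nothing in this thread
requires it) could turn every bare LF into CRLF so that Notepad shows the
line breaks again. The function name to_crlf is made up for illustration.
It works on raw bytes, so it should not disturb the character encoding.

```python
import re

def to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF.

    Existing CRLF pairs are matched as a unit and left as CRLF,
    so running this twice does not double up the CRs.
    """
    return re.sub(rb"\r?\n", b"\r\n", data)

# Example: a mix of LF and CRLF comes out as CRLF throughout.
print(to_crlf(b"<p>\nHello\r\n</p>\n"))
```

Going the other way (CRLF to LF) would be data.replace(b"\r\n", b"\n").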
Please tell me there is an easier way... I need to
a) strip leading whitespace from the content of my html files and
b) save these files as UTF-8 and have them STAY UTF-8. Thanks
Just don't use Notepad or FrontPage. It could have been the copy and
pasting from FrontPage that messed up the UTF-8.
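One way to do both (a) and (b) without any editor in the loop is a short
script. The following is only a sketch, assuming the files are already
valid UTF-8 and that Python is available; the names
strip_leading_whitespace and strip_file are hypothetical. It removes
leading spaces and tabs from each line, leaves the line endings exactly as
they were, and writes the result back as UTF-8.

```python
import re

def strip_leading_whitespace(text: str) -> str:
    """Remove leading spaces and tabs from every line.

    Only spaces and tabs are matched, so blank lines and the
    existing line endings (LF or CRLF) are preserved.
    """
    return re.sub(r"(?m)^[ \t]+", "", text)

def strip_file(path: str) -> None:
    """Rewrite one file in place, reading and writing UTF-8.

    newline="" turns off Python's newline translation in both
    directions, so the file keeps whatever line endings it had.
    """
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_leading_whitespace(text))

print(repr(strip_leading_whitespace("  <p>\r\n\t\tHi\r\n</p>\n")))
```

Note that this also strips indentation inside <pre> blocks, where leading
whitespace is significant, so check any pages that use them.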
You could try to set up cygwin to use DOS line endings, or just stick to
Unix line endings. But then you need to be careful, because some Windows
editors may open a file with Unix line endings silently and apparently OK,
but then save DOS line endings on the one or two lines you edit, leaving
you with an inconsistent mixture. Without decent tools it's often hard to
know what you've actually ended up with or why things are going wrong.
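To see what a file really contains, you can look at the bytes directly
instead of trusting what an editor reports. Here is a minimal sketch,
again in Python and with a made-up function name (inspect_bytes), that
counts each kind of line ending and checks whether the bytes decode as
UTF-8 and whether they start with a UTF-8 byte order mark.

```python
import codecs

def inspect_bytes(data: bytes) -> dict:
    """Report line-ending counts and basic UTF-8 facts for raw file bytes."""
    crlf = data.count(b"\r\n")
    report = {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR not followed by LF
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
    }
    try:
        data.decode("utf-8")
        report["valid_utf8"] = True
    except UnicodeDecodeError:
        report["valid_utf8"] = False
    return report

# Example: one CRLF line and one LF line, with a BOM at the start.
print(inspect_bytes(codecs.BOM_UTF8 + b"<p>\r\nHello\n"))
```

For a real file you would read it with open(path, "rb").read() and pass the
result in. More than one nonzero count among CRLF, LF and CR means the
line endings are mixed. Bear in mind that a file containing only ASCII
characters and no BOM is byte-for-byte identical whether it is labelled
UTF-8 or a single-byte Windows code page, so the bytes alone cannot tell
those two apart.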