
Unicode line endings

After switching text editors, my code started causing mysterious PHP
errors. I narrowed the problem down to the Unicode line endings I
started using with the new text editor: when I save documents using
Unicode line endings, PHP no longer registers the line endings, meaning
that:

<?php

echo "Hello World!";

?>

registers as:

<?phpecho "Hello World!";?>

I've verified that I'm using correct Unicode line endings. PHP accepts
all files without problem when they are saved using Unix/DOS line
endings, but Unicode line endings really seem to confuse it.

Does anyone know what could be causing this? Are there any known fixes?

Thanks for any help - I'm pulling hair out over this one.

Jun 21 '06 #1
5 Replies


*** jdbartlett wrote (21 Jun 2006 14:00:23 -0700):
: After switching text editors, my code started causing mysterious PHP
: errors. I narrowed the problem down to the Unicode line endings [...]
: registers as:
:
: <?phpecho "Hello World!";?>


We'd need to know two things: what you mean by "Unicode line endings"
and what you mean by "register"...

Anyway, I'd say you're using a Mac to edit files and you upload them using
FTP in binary mode. Try ASCII mode instead.

--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #2

Thanks for the response. I am using a Mac, but I'm not uploading files
at all, just saving them and then using command line PHP to execute
them.

The text editor I'm using is called TextWrangler from BareBones
software. According to the TextWrangler manual, Unicode has its own
standard for line endings (page 36, second para). In the 'line ending'
menu, TextWrangler offers 4 options (Unicode in addition to Unix,
Macintosh and WIN/DOS). I have selected "Unicode". PHP recognizes Unix,
Mac and WIN/DOS line endings just fine, but seems to have trouble
recognizing these "Unicode" line endings when no other apps do.


Jun 21 '06 #3

*** jdbartlett wrote (21 Jun 2006 14:48:21 -0700):
: Thanks for the response. I am using a Mac, but I'm not uploading files
: at all, just saving them and then using command line PHP to execute
: them.
:
: The text editor I'm using is called TextWrangler from BareBones
: software. According to the TextWrangler manual, Unicode has its own
: standard for line endings (page 36, second para). In the 'line ending'
: menu, TextWrangler offers 4 options (Unicode in addition to Unix,
: Macintosh and WIN/DOS).


Oh my... No matter how much I learn about web development, there's always
more :)

http://en.wikipedia.org/wiki/Newline

Sorry, I couldn't find any references about PHP so my best educated guess
is that it isn't supported :-?
--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #4

I e-mailed BareBones, and they informed me they are using 0x2029 for
Unicode line endings. They also recommended against using Unicode line
endings for web content and everything else unless there is a specific
need.

With that in mind, I'm switching to UTF-8 encoding with Unix line
endings.

Thanks again!
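
For files that were already saved with these endings, the conversion to Unix line endings can be sketched in PHP roughly as follows. This is only a sketch: it assumes the content is valid UTF-8 and PHP 7 or later, the function name normalize_line_endings is made up for illustration, and the set of characters treated as line breaks (CR, CRLF, U+0085, U+2028, U+2029) is my assumption rather than anything BareBones specified.

```php
<?php
// Hypothetical helper: rewrite CR, CRLF and the Unicode line terminators
// (U+0085, U+2028, U+2029) as a plain Unix "\n".
function normalize_line_endings(string $utf8Text): string
{
    // "\r\n" is listed first so that a CRLF pair becomes one "\n", not two.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/\r\n|[\r\x{85}\x{2028}\x{2029}]/u', "\n", $utf8Text);

    // preg_replace() returns null on failure, e.g. when the input is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $result;
}
```

Reading a file with file_get_contents(), passing the content through this function and writing it back with file_put_contents() would convert the file in place.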


Jun 21 '06 #5

jdbartlett (jd*****@gmail.com) wrote:
: I e-mailed BareBones, and they informed me they are using 0x2029 for
: Unicode line endings. They also recommended against using Unicode line
: endings for web content and everything else unless there is a specific
: need.

: With that in mind, I'm switching to UTF-8 encoding with Unix line
: endings.

Google can tell you about Unicode line endings. Basically, the character
0x85 is called "NEL" (Next Line), plus there is 0x2029, called
Paragraph Separator, and 0x2028, called Line Separator (probably what
BareBones meant to tell you, not 0x2029). Unicode suggests that about
eight (?) characters be recognized as denoting new lines, including the
normal things like carriage return, plus the NEL, LS and PS things, plus ones
like form feed.
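
A rough way to check which of these characters a file really contains is to count them. The sketch below assumes valid UTF-8 input and PHP 7 or later; count_line_terminators is a hypothetical name, and it only covers the terminators named in this thread, not vertical tab or form feed.

```php
<?php
// Hypothetical helper: count each kind of line terminator in a UTF-8 string.
function count_line_terminators(string $utf8Text): array
{
    // Count CRLF pairs first so they are not also counted as a lone CR and a lone LF.
    $crlf = substr_count($utf8Text, "\r\n");

    return [
        'CRLF' => $crlf,
        'LF'   => substr_count($utf8Text, "\n") - $crlf,
        'CR'   => substr_count($utf8Text, "\r") - $crlf,
        // "\u{...}" yields the UTF-8 bytes of the code point, so a plain
        // byte search is enough for valid UTF-8 input.
        'NEL'  => substr_count($utf8Text, "\u{0085}"),
        'LS'   => substr_count($utf8Text, "\u{2028}"),
        'PS'   => substr_count($utf8Text, "\u{2029}"),
    ];
}
```

Running it on the content of the problem file, for example via file_get_contents(), should show a non-zero PS or LS count and few or no LF characters if the editor really saved Unicode line endings.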

The 0x85 character in the default dos codepage is "a grave", which is the
letter "a" with an accent somewhat like \ only smaller and on top.

However 0x85 in my default windows codepage is three dots in a row, like
"..." only fitting into a single character.

If you use UTF-8 then 0x85 requires two bytes, so it isn't even a single
"character" for any older software.

PS and LS can't be included directly as themselves at all in a byte stream
since they are bigger than a byte, so they will always undergo some kind
of (possible mis-) interpretation. In UTF-8 I assume they take three
bytes, though I haven't checked.
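
The byte counts are easy to check with a few lines of PHP. This is a small sketch assuming PHP 7 or later (for the "\u{...}" string escapes); the helper name utf8_hex and the labels in the array are just for illustration.

```php
<?php
// Hypothetical helper: hex dump of the bytes that make up a string.
function utf8_hex(string $char): string
{
    return strtoupper(bin2hex($char));
}

// The three Unicode line terminators discussed above, as UTF-8 strings.
$lineBreaks = [
    'NEL (U+0085)' => "\u{0085}",
    'LS  (U+2028)' => "\u{2028}",
    'PS  (U+2029)' => "\u{2029}",
];

foreach ($lineBreaks as $name => $char) {
    // strlen() counts bytes, not characters, which is what we want here.
    printf("%s: %d bytes, %s\n", $name, strlen($char), utf8_hex($char));
}
// NEL (U+0085): 2 bytes, C285
// LS  (U+2028): 3 bytes, E280A8
// PS  (U+2029): 3 bytes, E280A9
```

So NEL is two bytes in UTF-8, and LS and PS are three bytes each.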

It seems to me that the whole thing is a bit problematical, rather like
using a word processor to do your coding - it can be done but do you
really need the headaches?

The key thing is that a programmer is not writing "text" at all - these
are not English essays to be read to your friends - in fact you are laying
out a carefully arranged set of bytes that the compiler can understand.
The compiler accepts things that look a lot like text to make it practical
for a programmer to work with, but it's not text at all, it's a
communication protocol between you and the compiler.

google: unicode line ending

gives all sorts of interesting details.
Jun 22 '06 #6
