Unicode line endings

After switching text editors, my code started causing mysterious PHP
errors. I narrowed the problem down to the Unicode line endings I
started using with the new text editor: when I save documents using
Unicode line endings, PHP no longer registers the line endings, meaning
that:

<?php

echo "Hello World!";

?>

registers as:

<?phpecho "Hello World!";?>

I've verified that I'm using correct Unicode line endings. PHP accepts
all files without problem when they are saved using Unix/DOS line
endings, but Unicode line endings really seem to confuse it.

Does anyone know what could be causing this? Are there any known fixes?

Thanks for any help - I'm pulling hair out over this one.

Jun 21 '06 #1
5 4492
*** jdbartlett wrote (21 Jun 2006 14:00:23 -0700):
After switching text editors, my code started causing mysterious PHP
errors. I narrowed the problem down to the Unicode line endings [...] registers as:

<?phpecho "Hello World!";?>


We'd need to know two things: what you mean by "Unicode line endings"
and what you mean by "register"...

Anyway, I'd say you're using a Mac to edit files and you upload them using
FTP in binary mode. Try ascii mode instead.

--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #2
Thanks for the response. I am using a Mac, but I'm not uploading files
at all, just saving them and then using command line PHP to execute
them.

The text editor I'm using is called TextWrangler from BareBones
software. According to the TextWrangler manual, Unicode has its own
standard for line endings (page 36, second para). In the 'line ending'
menu, TextWrangler offers 4 options (Unicode in addition to Unix,
Macintosh and WIN/DOS). I have selected "Unicode". PHP recognizes Unix,
Mac and WIN/DOS line endings just fine, but seems to have trouble
recognizing these "Unicode" line endings when no other apps do.

Jun 21 '06 #3
*** jdbartlett wrote (21 Jun 2006 14:48:21 -0700):
Thanks for the response. I am using a Mac, but I'm not uploading files
at all, just saving them and then using command line PHP to execute
them.

The text editor I'm using is called TextWrangler from BareBones
software. According to the TextWrangler manual, Unicode has its own
standard for line endings (page 36, second para). In the 'line ending'
menu, TextWrangler offers 4 options (Unicode in addition to Unix,
Macintosh and WIN/DOS).


Oh my... No matter how much I learn about web development, there's always
more :)

http://en.wikipedia.org/wiki/Newline

Sorry, I couldn't find any references about PHP so my best educated guess
is that it isn't supported :-?
--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Jun 21 '06 #4
I e-mailed BareBones, and they informed me they are using 0x2029 for
Unicode line endings. They also recommended against using Unicode line
endings for web content and everything else unless there is a specific
need.

With that in mind, I'm switching to UTF-8 encoding with Unix line
endings.
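
For files that were already saved with these separators, the switch back to Unix line endings can also be scripted. What follows is only a minimal sketch in Python (not PHP), assuming the file has already been decoded to text. The name to_unix_line_endings is made up for this example, and the set of characters it replaces (CRLF, CR, NEL, LS, PS) is an assumption of the sketch, not something TextWrangler or PHP defines.

```python
import re

# Line breaks to normalise: CRLF, lone CR, U+0085 (NEL),
# U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR).
# CRLF is listed first so it collapses to one LF, not two.
_BREAKS = re.compile("\r\n|[\r\u0085\u2028\u2029]")


def to_unix_line_endings(text: str) -> str:
    """Return text with every recognised line break replaced by LF."""
    return _BREAKS.sub("\n", text)


# The script from the first post, as it might look when saved with
# U+2029 as the line ending (the value reported by the editor vendor).
source = '<?php\u2029\u2029echo "Hello World!";\u2029\u2029?>\u2029'
print(to_unix_line_endings(source))
```

To apply it to a real file, read the file with the encoding it was saved in, pass the text through the function, and write it back out as UTF-8.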

Thanks again!

Jun 21 '06 #5
jdbartlett (jd*****@gmail. com) wrote:
: I e-mailed BareBones, and they informed me they are using 0x2029 for
: Unicode line endings. They also recommended against using Unicode line
: endings for web content and everything else unless there is a specific
: need.

: With that in mind, I'm switching to UTF-8 encoding with Unix line
: endings.

Google can tell you about Unicode line endings. Basically, the character
0x85 is called "NEL" (Next Line); then there is 0x2029, called Paragraph
Separator, and 0x2028, called Line Separator (probably what BareBones
meant to tell you, not 0x2029). Unicode suggests that about eight (?)
characters be recognized as denoting new lines, including the normal
things like carriage return, plus the NEL, LS and PS things, plus ones
like form feed.
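
The difference between Unicode-aware and byte-oriented line splitting is easy to see in a few lines of Python. This is only an illustrative sketch: the sample string is made up, and the idea that PHP's parser behaves like the byte-oriented case is what this thread suggests, not something confirmed here.

```python
# Made-up sample using PS (U+2029), LS (U+2028), NEL (U+0085) and a plain LF.
sample = "one\u2029two\u2028three\u0085four\nfive"

# str.splitlines() treats PS, LS and NEL (among others) as line boundaries.
unicode_aware = sample.splitlines()

# A reader that works on raw bytes and only knows LF finds a single break.
byte_oriented = sample.encode("utf-8").split(b"\n")

print(len(unicode_aware))   # number of lines seen by the Unicode-aware split
print(len(byte_oriented))   # number of lines seen by the LF-only byte split
```

The first count is five and the second is two, which matches the symptom in the original post: everything before the first real LF looks like one long line.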

The 0x85 character in the default dos codepage is "a grave", which is the
letter "a" with an accent somewhat like \ only smaller and on top.

However 0x85 in my default windows codepage is three dots in a row, like
"..." only fitting into a single character.

If you use utf-8 then 0x85 requires two bytes, so it isn't even a single
"character" for any older software.

PS and LS can't be included directly as themselves at all in a byte stream
since they are bigger than a byte, so they will always undergo some kind
of (possible mis)interpretation. In UTF-8 I assume they take three
bytes, though I haven't checked.
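
The byte counts and code page readings above can be checked with a short Python sketch. It uses only the standard library; "cp437" and "cp1252" are Python's codec names for the usual DOS and Western European Windows code pages, which I am assuming are the "default" ones meant above.

```python
import unicodedata

# UTF-8 encodings of the three Unicode-specific line-break characters.
for label, ch in [("NEL", "\u0085"), ("LS", "\u2028"), ("PS", "\u2029")]:
    encoded = ch.encode("utf-8")
    print(label, f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded), "bytes")

# The single byte 0x85 read through two legacy code pages.
legacy = bytes([0x85])
print("cp437: ", unicodedata.name(legacy.decode("cp437")))
print("cp1252:", unicodedata.name(legacy.decode("cp1252")))
```

NEL comes out as the two bytes c2 85, and LS and PS as the three-byte sequences e2 80 a8 and e2 80 a9, so the three-byte assumption holds.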

It seems to me that the whole thing is a bit problematical, rather like
using a word processor to do your coding - it can be done but do you
really need the headaches?

The key thing is that a programmer is not writing "text" at all - these
are not English essays to be read to your friends - in fact you are laying
out a carefully arranged set of bytes that the compiler can understand.
The compiler accepts things that look a lot like text to make it practical
for a programmer to work with, but it's not text at all, it's a
communication protocol between you and the compiler.
google: unicode line ending

gives all sorts of interesting details.
Jun 22 '06 #6
