Bytes IT Community

string length and newlines

Rob
I am trying to perform client-side input validation for a textarea to
determine that the number of characters doesn't exceed a certain
length. Currently, I am just using str.length, but if the textarea
contains newlines, str.length is inaccurate. If I view the string, I
will see something like '123\n456', but when this gets passed back to
the server, '\n' will be changed to '\r\n' and what had a length of 7
characters now has a length of 8. Is the best way to approach this to
search the string for newlines and add 1 to the length count or is
there a simpler way?

Thanks for your help,
Rob
Jan 10 '08 #1


On Jan 10, 8:38 am, Rob <rdenn...@triactive.com> wrote:
The most compact code I can think of would be:

var num_newlines = some_string.split("\n").length - 1;
var augmented_length = some_string.length + num_newlines;
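A runnable version of Joey's two lines, wrapped in a function so it can be tested. The name `crlfLength` is hypothetical, and the sketch assumes the string uses bare `\n` line breaks, as in Rob's '123\n456' example; a string that already contains `\r\n` pairs would be over-counted by one per pair.

```javascript
// Length of a textarea value once each "\n" is submitted as "\r\n".
// Assumes the value contains bare "\n" line breaks only (hypothetical helper).
function crlfLength(value) {
  // split("\n") returns one more piece than there are newlines
  const numNewlines = value.split("\n").length - 1;
  return value.length + numNewlines;
}

console.log(crlfLength("123\n456"));   // 8
console.log(crlfLength("no breaks")); // 9
```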

-Joey
Jan 10 '08 #2

On 1/10/2008 11:47 AM, jh******@gmail.com wrote:

>The most compact code I can think of would be:
>
>var num_newlines = some_string.split("\n").length - 1;
>var augmented_length = some_string.length + num_newlines;
>
>-Joey
some_string.replace(/\n/g/, ' ');

will replace the newlines in the string with a single space

--
anthony at my pet programmer dot com
Jan 10 '08 #3

In comp.lang.javascript message <fm**********@registered.motzarella.org>,
Thu, 10 Jan 2008 12:26:44, Anthony Levensalor
<ki******@mypetprogrammer.com> posted:
>On 1/10/2008 11:47 AM, jh******@gmail.com wrote:
>>The most compact code I can think of would be:
>>var num_newlines = some_string.split("\n").length - 1;
>>var augmented_length = some_string.length + num_newlines;
That requires creating an Object for each line, which could be a little
slow.
>some_string.replace(/\n/g/, ' ');
>
>will replace the newlines in the string with a single space
If the third slash is first removed.

If the line separation always contains a match to \n, then

X = some_string.replace(/[^\n]/g, "").length

should give a count of new lines, *possibly* quicker. Untested.
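A small sketch comparing the replace-based count (taking `.length` of the stripped string) with the split-based count; both function names are hypothetical. Because `[^\n]` also matches `\r`, a `\r\n` pair is counted once, while a lone `\r` is not counted at all, which matches the stated assumption that every line separation contains a `\n`. Relative speed is left untested, as in the post.

```javascript
// Replace-based count: delete every character that is not "\n";
// the length of what remains is the number of newlines.
// Assumes each line break contains a "\n" (true for "\n" and "\r\n").
function countNewlinesByReplace(s) {
  return s.replace(/[^\n]/g, "").length;
}

// Split-based count, for comparison: n newlines give n + 1 pieces.
function countNewlinesBySplit(s) {
  return s.split("\n").length - 1;
}

const sample = "123\n456\n789";
console.log(countNewlinesByReplace(sample)); // 2
console.log(countNewlinesBySplit(sample));   // 2
```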

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "" (SonOfRFC1036)
Jan 10 '08 #4

On 1/10/2008 6:27 PM, Dr J R Stockton wrote:
>>some_string.replace(/\n/g/, ' ');
>>
>>will replace the newlines in the string with a single space
>
>If the third slash is first removed.

D'oh! Thanks for the catch.

>If the line separation always contains a match to \n, then
>
>X = some_string.replace(/[^\n]/g, "").length
>
>should give a count of new lines, *possibly* quicker. Untested.
Actually, I'm betting it's a ton quicker, but I am waaaaaay too busy to
test that at the moment.

~A!
--
anthony at my pet programmer dot com
Jan 11 '08 #5

On Jan 10, 11:38 am, Rob <rdenn...@triactive.com> wrote:
What happens if you add a maxlength attribute to the textarea? If it
still allows the wrong number of characters, strip out the \r's on the
server before storing the data.
Jan 11 '08 #6

Rob wrote:
>[...] Currently, I am just using str.length, but if the textarea contains
>newlines, str.length is inaccurate.
It isn't.
>If I view the string, I will see something like '123\n456', but when this
>gets passed back to the server, '\n' will be changed to '\r\n' and what
>had a length of 7 characters now has a length of 8. Is the best way to
>approach this to search the string for newlines and add 1 to the length
>count
No.
>or is there a simpler way?
Your problem is server-side, not client-side. And since you can't expect
consistent results from the client, you should replace all \r and \r\n with
\n server-side before, as I suppose, storing it in the database.
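A minimal sketch of that server-side normalization, written in JavaScript on the assumption that the server runs JavaScript (for example Node.js); the thread does not say what the server uses, and `normalizeNewlines` is a hypothetical name. Listing `\r\n` before `\r` in the alternation matters: it turns a CR LF pair into one `\n` instead of two.

```javascript
// Canonicalise line breaks to "\n" before storing.
// "\r\n" is tried first, so a CR LF pair becomes a single "\n";
// any remaining lone "\r" also becomes "\n". Bare "\n" is left alone.
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r/g, "\n");
}

const submitted = "123\r\n456\r789"; // hypothetical mixed input
const stored = normalizeNewlines(submitted);
console.log(JSON.stringify(stored)); // "123\n456\n789"
console.log(stored.length);          // 11
```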
PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Jan 11 '08 #7

David Mark wrote:
>On Jan 13, 6:11 pm, Bart Van der Donck <b...@nijlen.com> wrote:
>>It is the browser itself that silently converts \n (or \r) into
>>\r\n, before the data is sent to the server. The script at the
>>server only reads out what was offered.

>But the database should store in a predetermined canonical form,
>regardless of what the browser says. Whether that is \n, \n\r or \r
>is up to the DBA.
You probably mean '\r\n' instead of '\n\r'. I would say that it's
rather up to the operating system. I haven't seen a case where the DBA
interferes with these OS settings when it comes to _storing_ data.

From http://en.wikipedia.org/wiki/Newline :
\n: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix,
Mac OS X, etc.), BeOS, Amiga, RISC OS, and others
\r\n: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M,
MP/M, DOS, OS/2, Microsoft Windows
\r: Commodore machines, Apple II family and Mac OS up to version 9
>>http://www.rfc-editor.org/EOLstory.txt says:
>>| ASCII text (ed.: like percent-encoded form-data) transmitted across
>>| the network *must* use the two-character sequence: CR LF (ed.: \r\n).
>>I don't agree with your suggestion to store end-of-line characters as
>>\n by force; I would always store \r\n, as offered by the browser.

>As offered by which browser? As mentioned, some don't send \r\n.
When a browser doesn't send '\r\n', it violates the RFC (see quotation
above from http://www.rfc-editor.org/EOLstory.txt). The word *must*
means:

| MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
|        definition is an absolute requirement of the specification.

http://www.faqs.org/rfcs/rfc2119.html

One can safely conclude that a browser which doesn't send '\r\n' is a
bad browser.
>>To calculate the length, I would use a regular expression to replace
>>\r\n by a single character.

>Then how could you store what is offered by the browser?
The browser *must* offer '\r\n' anyhow, so in theory there can be no
discussion. It is the operating system which decides which newline
character it uses internally. You are right that the stored data might
not be identical to the data that was offered by the browser regarding
line-ends. But this is not important for browsers, because any stored
line-end *must* be sent over the network again as '\r\n', no matter
how it was stored on the server.

--
Bart
Jan 14 '08 #8

On Jan 14, 4:59 am, Bart Van der Donck <b...@nijlen.com> wrote:
>You probably mean '\r\n' instead of '\n\r'. I would say that it's
Yes. CRLF.
>rather up to the operating system. I haven't seen a case where the DBA
>interferes with these OS settings when it comes to _storing_ data.

>>As offered by which browser? As mentioned, some don't send \r\n.

>One can safely conclude that a browser which doesn't send '\r\n' is a
>bad browser.
I re-read the OP as I thought it had implied that some browsers were
sending \n alone. If they all send \r\n and a text field is used in
the database (which would likely be the norm in this case), then you
are right on all counts.

The issue is only related to client-side validation. If the client
counts \n as one character, then it will disagree with the server-side
validation. Your suggestion to convert two characters to one before
client-side validation doesn't seem to address the issue (though I may
be missing something). It seems more logical to me to do the opposite
(you know it will be sent as two, so count it as two in the client).
If the database stores it as one, there is no harm done.
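The "count it as two in the client" idea can be sketched as follows. The helper names are hypothetical, and wiring them to a real textarea (for example reading its `value` in a submit handler) is left out. Converting every kind of line break to `\r\n` before measuring gives the same count whether the value holds `\n`, `\r\n` or a lone `\r`, so it does not double-count a value that already contains `\r\n`.

```javascript
// Length as submitted, assuming every line break goes out as CR LF.
// Existing "\r\n", lone "\r" and lone "\n" all become "\r\n" first.
function submittedLength(value) {
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

// Hypothetical validation helper: true when the value fits within maxChars.
function fitsLimit(value, maxChars) {
  return submittedLength(value) <= maxChars;
}

console.log(submittedLength("123\n456"));   // 8
console.log(submittedLength("123\r\n456")); // 8
console.log(fitsLimit("123\n456", 7));      // false
```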
Jan 14 '08 #9

David Mark wrote:
>I re-read the OP as I thought it had implied that some browsers were
>sending \n alone. If they all send \r\n and a text field is used in
>the database (which would likely be the norm in this case), then you
>are right on all counts.
I have a related question. Many of my webpages use simple flat files as
their "database" with one line added per transaction. This is fine until
the data to be stored comes from a TEXTAREA, because that can contain
embedded CRLF/CR/LF sequences which would screw up the lines in my file.

I've adopted the convention of converting CRLF or CR or LF into x'0102'
on the assumption that no one (certainly no one in their right mind)
will ever enter hex 01 or 02 characters into a text area. I'm curious to
know if anyone sees a problem with this; I've not encountered one in
many years of practice.
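A sketch of this flat-file convention. The constant and function names are hypothetical, and the decoding step (restoring CR LF when a record is read back) is an assumption, since the post only describes encoding. A reversed `\n\r` sequence would still produce two markers here, because it is matched as a lone LF followed by a lone CR.

```javascript
// Each line break (CR LF, lone CR or lone LF) becomes the two characters
// 0x01 0x02, so one record stays on one line of the flat file.
const LINE_END_MARK = "\x01\x02";

function encodeForFlatFile(text) {
  // "\r\n" is listed first so a CR LF pair yields one marker, not two
  return text.replace(/\r\n|\r|\n/g, LINE_END_MARK);
}

// Assumed reverse direction: restore CR LF, the form sent over the network.
function decodeFromFlatFile(field) {
  return field.split(LINE_END_MARK).join("\r\n");
}

const record = encodeForFlatFile("line one\r\nline two\nline three");
console.log(record.includes("\n") || record.includes("\r")); // false
console.log(decodeFromFlatFile(record) === "line one\r\nline two\r\nline three"); // true
```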

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
Jan 15 '08 #10

Steve Swift wrote:
>I have a related question. Many of my webpages use simple flat files as
>their "database" with one line added per transaction. This is fine until
>the data to be stored comes from a TEXTAREA, because that can contain
>embedded CRLF/CR/LF sequences which would screw up the lines in my file.
Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA which are not transmitted as '\r\n'
are in violation of the RFC. This is an old and widespread convention; I
would be surprised to see any browser which would behave differently
(I would immediately send a bug report anyway).
>I've adopted the convention of converting CRLF or CR or LF into x'0102'
>on the assumption that no one (certainly no one in their right mind)
>will ever enter hex 01 or 02 characters into a text area.
You should be pretty safe. MSIE, FF and Opera don't allow \x01 and
\x02 to be typed inside form elements; CTRL+A and CTRL+B are shortcuts
to browser functions.
>I'm curious to know if anyone sees a problem with this; I've not
>encountered one in many years of practice.
I think you have a robust solution. A good deal of the ASCII control
characters were actually meant for this purpose; you see them all the
time on older mainframe systems.

--
Bart
Jan 15 '08 #11

Bart Van der Donck wrote:
>Checking on a separate CR or LF is not necessary; CR+LF should be
>enough. Newlines in a TEXTAREA which are not transmitted as '\r\n'
>are in violation of the RFC.
Bart, thank you for confirming what I'd noticed in practice.
I do, however, have a few examples where single x'0A' characters have
found their way into my data files, and since this is the linend
sequence on my Linux server, it caused problems.

I checked my code until I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no recurrences of the problem.
I'm just waiting for the browser that sends x'0A0D' now, but hope to
retire before that occurs. :-)

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk
Jan 16 '08 #12

Bart Van der Donck wrote:
>Steve Swift wrote:

>>Bart Van der Donck wrote:

>>>Checking on a separate CR or LF is not necessary; CR+LF should be
>>>enough. Newlines in a TEXTAREA which are not transmitted as '\r\n'
>>>are in violation of the RFC.
....
>
>I'm thinking of 4 possibilities:
[5] User copying and pasting.
Mick
Jan 16 '08 #13

In comp.lang.javascript message <47******@news.greennet.net>, Wed, 16
Jan 2008 06:47:48, Steve Swift <St***********@gmail.com> posted:
>
>I checked my code until I was blue in the face, and never found any way
>this could happen unless a browser had submitted an x'0A' as a linend
>from a TEXTAREA control. Of course, I have no control over what strange
>browsers people might be using, so I took the pragmatic approach of
>translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
>There have been no recurrences of the problem.
>I'm just waiting for the browser that sends x'0A0D' now, but hope to
>retire before that occurs. :-)
Whenever data is of possibly uncertain origin, it is well to assume the
worst of the characters which come between the lines.

In (past?) Delphi, for example, one could by various editing generate a
source file in which most line separations were CRLF but some were just
LF (or maybe just CR). Unfortunately, the IDE editor believed both, but
the compiler did not believe a lone LF.

Therefore, in Delphi, with
<statement1> CR LF
// comment LF
<statement2> CR LF
<statement3> CR LF

<statement2> would not be compiled. An LF between statements would not
matter so much, since, in Delphi, newline is a terminator only for that
type of comment, and not for code statements.

One needs an algorithm to convert bad newlines to good ones.
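One possible first step for such an algorithm, sketched in JavaScript with hypothetical names: classify each separator as CR LF, lone CR or lone LF, flag the text as mixed when more than one kind appears, and then rewrite everything to a single form (CR LF here, by assumption). The sample string mirrors the Delphi example above.

```javascript
// Count each kind of line separator in one pass.
// "\r\n" is tried first, so it is never counted as a CR plus an LF.
function lineEndingStats(text) {
  const stats = { crlf: 0, cr: 0, lf: 0 };
  for (const m of text.matchAll(/\r\n|\r|\n/g)) {
    if (m[0] === "\r\n") stats.crlf++;
    else if (m[0] === "\r") stats.cr++;
    else stats.lf++;
  }
  return stats;
}

// Mixed means more than one kind of separator is present.
function isMixed(stats) {
  return [stats.crlf, stats.cr, stats.lf].filter((n) => n > 0).length > 1;
}

const source = "statement1\r\n// comment\nstatement2\r\nstatement3\r\n";
const stats = lineEndingStats(source);
console.log(stats);          // { crlf: 3, cr: 0, lf: 1 }
console.log(isMixed(stats)); // true

// Convert to one separator (CR LF here) once the mix is known.
const fixed = source.replace(/\r\n|\r|\n/g, "\r\n");
console.log(isMixed(lineEndingStats(fixed))); // false
```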

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Delphi 3? Turnpike 6.05
<URL:http://www.merlyn.demon.co.uk/TP/BP/Delphi/> &c., FAQqy topics & links;
<URL:http://www.bancoems.com/CompLangPascalDelphiMisc-MiniFAQ.htm> clpdmFAQ;
<URL:http://www.borland.com/newsgroups/guide.html> news:borland.* Guidelines
Jan 16 '08 #14

Dr J R Stockton wrote:
>One needs an algorithm to convert bad newlines to good ones.
man recode
man iconv
PointedEars
Jan 16 '08 #15
