Split string on empty line

I'm using string.split(/^$/m, 2) on a curl output to separate header
and body. There’s an empty line between them. ^$ doesn’t seem to work...

Example curl output:
HTTP/1.1 404 Not Found
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>test</TITLE>
</HEAD><BODY>
<H1>test</H1>
The requested URL was not found on this server.<P>
<HR>
</BODY></HTML>

Feb 22 '06 #1


Sen Haerens wrote:
I'm using string.split(/^$/m, 2) on a curl output to separate header and
body. There’s an empty line between them. ^$ doesn’t seem to work...
First the caveat: not all UAs support the use of regular expressions as
arguments to split().

Now the problem: you haven't given any regular expression to match. The
'^' character means match the pattern when it occurs at the start of a
line; it does not match the start of a line itself. Similarly, $ does
not match the end of a line (which is different to how they are treated
in regular expressions in some other environments).

If you are looking for an empty line, then match the pattern that
represents two consecutive newlines. Where text is input to a browser
through a form control, in Firefox the required pattern is \n\n and in
IE it is \r\n\r\n. Other patterns may be needed for other browsers.
Since your text is generated elsewhere, you may need some other pattern.

You can match different patterns simultaneously using '|' (which means
or) between the patterns:

/\n\n|\r\n\r\n/
will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
new lines in both Firefox and IE (presuming that there is absolutely
nothing on each line).

A safer pattern that allows for possible white space on the 'empty' line is:

/\n\s*\n|\r\n\s*\r\n/

Example curl output:
HTTP/1.1 404 Not Found
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1


That does not have any empty lines, so how should head and body be
separated? Here is a small test case based on your sample text:
<form action="">
<textarea id="ta" rows="10" cols="60">HTTP/1.1 404 Not Found

Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1</textarea>
<input type="button" value="Split header &amp; body"
onclick="splitHB(this.form.ta.value);">
</form>

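<script type="text/javascript">
// Split the text wherever two line breaks (LF or CRLF) occur in a row,
// allowing white space on the otherwise empty line between them.
function splitHB(txt)
{
  var bits = txt.split(/\n\s*\n|\r\n\s*\r\n/);
  alert('Header:\n' + bits[0]
    + '\n\nBody:\n' + bits[1]);
}
</script>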
Shows an alert with:

Header:
HTTP/1.1 404 Not Found

Body:
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
[...]
--
Rob
Feb 22 '06 #2

On 2006-02-22 02:22:09 +0100, RobG <rg***@iinet.net.au> said:
Similarly, $ does not match the end of a line (which is different to
how they are treated in regular expressions in some other environments).
That was the confusing part. ^$ works perfectly in other programs.
A safer pattern that allows for possible white space on the 'empty' line is:
/\n\s*\n|\r\n\s*\r\n/


This one did the trick.

Thanks a lot!
Sen

Feb 22 '06 #3

RobG wrote:
Sen Haerens wrote:
I'm using string.split(/^$/m, 2) on a curl output to separate header and
body. There’s an empty line between them. ^$ doesn’t seem to work...
First the caveat: not all UAs support the use of regular expressions as
arguments to split().


Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
specified for Regular Expressions in ECMAScript Edition 3, is not supported
before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
based).
Now the problem: you haven't given any regular expression to match. The
'^' character means match the pattern when it occurs at the start of a
line; it does not match the start of a line itself. Similarly, $ does
not match the end of a line (which is different to how they are treated
in regular expressions in some other environments).
I beg to differ. `$' matches at the end of a line with the `m' modifier,
or only at the end of input without that modifier. It does not consume the
newline character (sequence) _at the end of a line_, though.
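
To make the point about `^' and `$' concrete, here is a small sketch; the sample strings and variable names are made up for illustration, and the behaviour shown assumes an engine that follows the ECMAScript specification. With the `m' modifier, /^$/ does match the empty line, but only as a zero-width position, so split() leaves the line terminators attached to the pieces. Because \r and \n each count as a line terminator, CRLF text is also cut between the \r and the \n, and the limit argument of split() then discards the rest instead of keeping it in the last element.

```javascript
// Hypothetical sample data: the same text with LF and with CRLF line endings.
var lf = "Header: x\n\nbody";
var crlf = "Header: x\r\n\r\nbody";

// /^$/m matches the zero-width position between the two newlines,
// so the newline characters themselves stay in the pieces.
var lfParts = lf.split(/^$/m);
console.log(JSON.stringify(lfParts)); // ["Header: x\n","\nbody"]

// With CRLF, the position between "\r" and "\n" is also both a line
// start and a line end, so the text is cut inside each CRLF pair too.
var crlfParts = crlf.split(/^$/m);
console.log(JSON.stringify(crlfParts)); // ["Header: x\r","\n","\r","\nbody"]

// The limit argument keeps only the first two pieces; the body is lost.
var limited = crlf.split(/^$/m, 2);
console.log(JSON.stringify(limited)); // ["Header: x\r","\n"]
```

This may be why the original split(/^$/m, 2) call appeared not to work on CRLF-terminated curl output, although the thread does not confirm it.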
If you are looking for an empty line, then match the pattern that
represents two consecutive newlines. Where text is input to a browser
through a form control, in Firefox the required pattern is \n\n and in
IE it is \r\n\r\n. Other patterns may be needed for other browsers.
Since your text is generated elsewhere, you may need some other pattern.

You can match different patterns simultaneously using '|' (which means
or) between the patterns:

/\n\n|\r\n\r\n/
This can be simplified to /(\r\n?|\n){2}/, which also covers a lone \r used as a line break.
will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
new lines in both Firefox and IE (presuming that there is absolutely
nothing on each line).

A safer pattern that allows for possible white space on the 'empty' line
is:

/\n\s*\n|\r\n\s*\r\n/
Consequently,

/(\r\n?|\n)\s*(\r\n?|\n)/

or

/(\r\n?|\n)\s*\1/

if backreferences are supported within the expression.
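
One thing to watch for when passing these patterns to split(): in engines that follow ECMAScript Edition 3, text matched by capturing parentheses is spliced into the result array. The sketch below, with a made-up sample string and variable names, compares the capturing form with a non-capturing group, (?:...), assuming the engine supports that syntax. The \1 variant cannot use this, since the backreference needs the capture.

```javascript
// Hypothetical sample: a header line, an empty line, then a body line (CRLF).
var text = "Header: x\r\n\r\nbody";

// Capturing group: the last captured line break shows up as an extra element.
var withCapture = text.split(/(\r\n?|\n){2}/);
console.log(JSON.stringify(withCapture)); // ["Header: x","\r\n","body"]

// Non-capturing group: only the header and body pieces are returned.
var withoutCapture = text.split(/(?:\r\n?|\n){2}/);
console.log(JSON.stringify(withoutCapture)); // ["Header: x","body"]
```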

[completed quotation]
Example curl output:
HTTP/1.1 404 Not Found
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

^
That does not have any empty lines,


There can be no completely empty line here, really. Every line that
conforms to the Internet Message Format (RFC822/STD11), as a line in
an HTTP/1.1 response, must end with <CR><LF>.
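
Given that the header block ends at the first <CR><LF><CR><LF>, one way to separate header and body without a regular expression is to search for that sequence directly. This is only a sketch: splitResponse is a hypothetical helper name, the sample response is shortened from the one in the question, and the fallback to \n\n assumes the line endings may have been converted before the text reaches the script.

```javascript
// Split a raw HTTP response into header and body at the first empty line.
// splitResponse is a hypothetical helper, not part of any library.
function splitResponse(raw) {
  // Look for CRLF CRLF first; fall back to LF LF for input whose line
  // endings were converted somewhere along the way (an assumption).
  var sep = "\r\n\r\n";
  var pos = raw.indexOf(sep);
  if (pos === -1) {
    sep = "\n\n";
    pos = raw.indexOf(sep);
  }
  if (pos === -1) {
    // No empty line found: treat everything as header.
    return { header: raw, body: "" };
  }
  return {
    header: raw.substring(0, pos),
    body: raw.substring(pos + sep.length)
  };
}

// Shortened sample; the body deliberately contains an empty line of its own.
var sample = "HTTP/1.1 404 Not Found\r\n" +
  "Content-Type: text/html; charset=iso-8859-1\r\n" +
  "\r\n" +
  "<H1>test</H1>\r\n\r\n<HR>";
var parts = splitResponse(sample);
console.log(parts.header);
console.log(parts.body);
```

Unlike split() with a limit, this keeps the whole body in one piece even when the body itself contains empty lines.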
PointedEars
Feb 22 '06 #4

On 2006-02-22 15:23:34 +0100, Thomas 'PointedEars' Lahn
<Po*********@web.de> said:
RobG wrote:
Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
specified for Regular Expressions in ECMAScript Edition 3, is not supported
before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
based).
The browser used is Safari 2.0.3.
There can be no completely empty line here, really. Every line that
conforms to the Internet Message Format (RFC822/STD11), as a line in
a HTTP/1.1 response, must be ended with <CR><LF>.


That clears it all up. Thank you!

Feb 24 '06 #5

Sen Haerens wrote:

[Restored context]
[...] Thomas 'PointedEars' Lahn [...] said:
RobG wrote:
First the caveat: not all UAs support the use of regular expressions
as arguments to split().

Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0,
JScript 3.0/IE 4.0) do. The issue is a different one here: The `m'
modifier, specified for Regular Expressions in ECMAScript Edition 3,
is not supported before JavaScript 1.5 (Gecko-based incl. NN6+) and
JScript 3.0 (IE 4.0+ based).


The browser used is Safari 2.0.3.


Apple Safari 2.0.3's WebCore 417.8 and WebKit/417.9 (released January 10,
2006) should implement at least KJS 3.4.1 (KDE 3.4.1 was released May 31,
2005), which supports both a Regular Expression object reference as an
argument to String.prototype.split() and the `m' modifier for Regular
Expressions:

<URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/string__object_8cpp-source.html>
<URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/regexp__object_8cpp-source.html>

However, it is unclear what the test browser has to do with the target
browser here.
There can be no completely empty line here, really. Every line that
conforms to the Internet Message Format (RFC822/STD11), as a line in
a HTTP/1.1 response, must be ended with <CR><LF>.


That clears it all up. Thank you!


You are welcome :)
PointedEars
Feb 24 '06 #6
