
# Save exact contents of web form (including special characters) to file?

Hello all, I am quite new to web scripting and making web pages in
general and I have stumbled across a problem I have as yet been unable
to solve. I am trying to take the contents of a textarea box and save
it to a file. This step is not too hard; however, the contents of the
textarea is mostly latex source so it contains just about every special
character you can imagine. My question is this: how do I save an exact
copy of the textarea contents with special characters, carriage
returns, etc. to a file?

Any help would be great,
Greetings, David.

Mar 6 '06 #1

da*****************@gmail.com wrote:
Hello all, I am quite new a web scripting and making web pages in
general and I have stumbled across a problem I have as yet been unable
to solve. I am trying to take the contents of a textarea box and save
it to a file. This step is not to hard however the contents of the
textarea is mostly latex source so it contains just about every special
character you can imagine. My question is this, how do I save an exact
copy of the textarea contents with special characters, carriage
returns, etc to a file?

Any help would be great,
Greetings, David.

I am an amateur at this, but your problem could be better defined.
I.e. is your problem:-

(a) you don't know at all how to extract the contents of a TEXTAREA
element using the BOM/DOM (browser object model/document object model)
and save this using the ActiveX Scripting.FileSystemObject (IE only) or
XPCOM (Firefox)?

(b) when you extract the contents of a TEXTAREA using the BOM, the
Javascript String does not contain all of the "special characters" you
expect it to. In which case, can you be explicit as to which
characters are causing a problem for you. A JavaScript String consists
of 16-bit characters, and so can represent Unicode values from 0000 to
FFFF, I think.

(c) when you save the contents to a text file, the component throws an
error or the resulting save file does not contain all of the "special
characters" you expect it to. In which case, again can you be explicit
as to which characters are causing a problem for you. If your problem
is (c), and you are working in Windows, then you may need to become
familiar with "code pages" (I only have a rough knowledge of them).
For English code pages, when Scripting.FileSystemObject saves a text
file, there are some characters (I think those with codes in the range
129 to 159, just above the ASCII range) which can cause errors.
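
As an illustration of point (b), the sketch below checks that a JavaScript string keeps every character of a LaTeX-like sample, including "+" and the carriage return, and that a character beyond FFFF is stored as two 16-bit code units. The sample text and variable names are made up for the example.

```javascript
// Hypothetical sample text standing in for the textarea value.
const sample = "\\frac{a+b}{c} & $x<y$ !\r\n";

// Collect the 16-bit code unit of every position in the string.
const codeUnits = [];
for (let i = 0; i < sample.length; i++) {
  codeUnits.push(sample.charCodeAt(i));
}

// A character outside 0000-FFFF (here U+1D6FC) is stored as a surrogate pair.
const alpha = "\u{1D6FC}";

console.log(sample.includes("+"), sample.endsWith("\r\n")); // true true
console.log(codeUnits.every((u) => u <= 0xffff));           // true
console.log(alpha.length);                                  // 2
```

So the string itself is unlikely to be where a "+" goes missing; the loss is more likely to happen when the value is sent or written.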

Regards

Julian Turner

Mar 6 '06 #2
da*****************@gmail.com wrote:
[...] I am trying to take the contents of a textarea box and save
it to a file. This step is not to hard however the contents of the
textarea is mostly latex source so it contains just about every
special character you can imagine.
(La)TeX source is plain text, I do not see why that would be relevant.
Are you confusing the source code with the output it generates?
My question is this, how do I save an exact copy of the textarea contents
with special characters, carriage returns, etc to a file?

The same way you save other content to a file, which has been discussed
here several times before. You are going to retrieve the value with a
recent ECMAScript implementation (because older implementations do not
correspond with the required host object models). All of those use UTF-16
as character string encoding, it does not matter what characters are in
there.
PointedEars
Mar 6 '06 #3
Ok, thanks for the replies although a lot of it was over my head. I
have been learning web scripting for 3 or 4 days now so please dumb
it down a bit. I think my problem is
(b) when you extract the contents of a TEXTAREA using the BOM, the
Javascript String does not contain all of the "special characters" you
expect it to. In which case, can you be explicit as to which
characters are causing a problem for you. A Javascript String consists
of 16bit characters, and so can represent unicode values from 0000 to
FFFF I think.
and it is the "+" symbol that I seem to be missing, at least that's the
only one missing from my short test latex file.
(La)TeX source is plain text, I do not see why that would be relevant.
Are confusing the source code with the output it generates?

When I say special characters I mean things like "+/!$&<", not things
like alpha, beta, gamma etc. you see in LaTex output. I believe my
problem may lie in either the way I am retrieving the contents of the
textarea or in the way I am saving it to a file. I have included a
stripped-down version of my code that still does not save the "+" symbol
but does save the carriage returns after including the escape()
function and the stripslashes() function in the PHP script used to write
the file.

sample.html

<html>
<head>
<script language="javascript" type="text/javascript">

function loadtex(url){
var xmlHttp = new XMLHttpRequest();
xmlHttp.open('GET',url,true);
xmlHttp.onreadystatechange=function(){
setsource(xmlHttp);
};
xmlHttp.send(null);
}

function setsource(xmlHttp){
if(xmlHttp.readyState==4){
if (xmlHttp.status==200){
var response = xmlHttp.responseText;
document.getElementById("texSource").value=response;
}
}
}

function savetex(){
var updtSrc = escape(document.getElementById("texSource").value);
var urlS = 'save.php?content='+updtSrc;
var svHttp = new XMLHttpRequest();

svHttp.open('POST',urlS,true);
svHttp.setRequestHeader('Content-Type','application/x-www-form-urlencoded');
svHttp.send(updtSrc);
}
</script>
</head>

<body>
<table border="1" width=100%>
<tr><td><center>
<INPUT type=button value="Load LaTex Source"
onclick="loadtex('sample.tex')">
</center></td></tr>
<tr><td>
<br>
<center>
<textarea name="tex source" id="texSource", cols=80,
rows=35></textarea>
</center></td></tr>
<tr><td><center>
<INPUT type=button value="Save Source"
onclick="savetex()">
</center></td></tr>
</table>
</body>
</html>

save.php

<?
$datevar=date("YmdHms");
$array1=array($datevar,'.tex');
$stuff=stripslashes($_GET['content']);
$filename=implode($array1);
$f=fopen($filename,"w");
fwrite($f,$stuff);
fclose($f);
?>

Mar 6 '06 #4

da*****************@gmail.com wrote:
Ok, thanks for the replies although a lot of it was over my head. I
have been learning web scripting for 3 or 4 days now so be please dumb
it down a bit. I think my problem is
(b) when you extract the contents of a TEXTAREA using the BOM, the
Javascript String does not contain all of the "special characters" you
expect it to. In which case, can you be explicit as to which
characters are causing a problem for you. A Javascript String consists
of 16bit characters, and so can represent unicode values from 0000 to
FFFF I think.
and it is the "+" symbol that I seem to be missing, at least thats the
only one missing from my short test latex file.
(La)TeX source is plain text, I do not see why that would be relevant.
Are confusing the source code with the output it generates?

When I say special characters I mean things like "+/!$&<" not things
like alpha, beta, gamma etc you see in LaTex output. I believe my
problem may lie in either the way I am retrieving the contents of the
textarea or in the way i am saving it to a file. I have included a
striped down version of my code that still does not save the "+" symbol
but does save the carriage returns after including the escape()
function and stripslashes function in the php script used to write the
file.

sample.html

<html>
<script language="javascript" type="text/javascript">

The language attribute is deprecated, remove it. Keep type.

var xmlHttp = new XMLHttpRequest();
xmlHttp.open('GET',url,true);
xmlHttp.send(null);
}

function setsource(xmlHttp){
if (xmlHttp.status==200){
var response = xmlHttp.responseText;
document.getElementById("texSource").value=response;
}
}
}

function savetex(){
var updtSrc = escape(document.getElementById("texSource").value);
              ^^^^^^

Don't use 'escape', use encodeURIComponent.

"The encodeURI and decodeURI functions are intended to work
with complete URIs; they assume that any reserved characters
in the URI are intended to have special meaning and so are
not encoded. The encodeURIComponent and decodeURIComponent
functions are intended to work with the individual component
parts of a URI; they assume that any reserved characters
represent text and so must be encoded so that they are
not interpreted as reserved characters when the component
is part of a complete URI."
ECMAScript Language Specification, Ed 3, 15.1.3
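
The difference matters for the "+" sign in particular. A small sketch, using a made-up sample string, of what the two functions produce:

```javascript
// Hypothetical sample: a fragment of LaTeX source with "+", "&" and a CRLF.
const texSource = "a + b & c\r\n\\end{document}";

// escape() leaves "+" untouched, so a form-decoding server may read it as a space.
const viaEscape = escape(texSource);

// encodeURIComponent() encodes "+" as %2B and "&" as %26.
const viaComponent = encodeURIComponent(texSource);

console.log(viaEscape);    // a%20+%20b%20%26%20c%0D%0A%5Cend%7Bdocument%7D
console.log(viaComponent); // a%20%2B%20b%20%26%20c%0D%0A%5Cend%7Bdocument%7D
```

Decoding the second form with decodeURIComponent() gives back the original text unchanged.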
var urlS = 'save.php?content='+updtSrc;
var svHttp = new XMLHttpRequest();

svHttp.open('POST',urlS,true);

svHttp.send(updtSrc);
}
</script>

<body>
<table border="1" width=100%>
<tr><td><center>
<INPUT type=button value="Load LaTex Source"
</center></td></tr>
<tr><td>
<br>
<center>
<textarea name="tex source" id="texSource", cols=80,

It's OK to have spaces in the name attribute but it can make life more
difficult. It's generally better to have the name the same as the ID
unless you have a reason to make them different - consider using
'texSource' for both. Since the element isn't in a form, giving it a
name attribute seems redundant.
[...]

--
Rob
Mar 7 '06 #5
First of all, please learn to post. NetNews is thread-based; you should
post a followup to the text you are directly referring to, provide
attribution of quoted material (in Google Groups: show options, Reply),
and trim your quotes to the minimum necessary to retain context:

<URL:http://jibbering.com/faq/faq_notes/pots1.html#ps1Post>

That said, for the flaw in its Web interface, posting with Google Groups is
recommended against anyway. Use a decent newsreader application instead.

___________________________________________________________________________

da*****************@gmail.com wrote:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You better use your name here.
Ok, thanks for the replies although a lot of it was over my head. I
have been learning web scripting for 3 or 4 days now so be please dumb
it down a bit.
I will try. However, never hesitate to ask explicitly about things you do
not (fully) understand.
I think my problem is

[Julian Turner wrote:]
(b) when you extract the contents of a TEXTAREA using the BOM, the
Javascript String does not contain all of the "special characters" you
expect it to. In which case, can you be explicit as to which
characters are causing a problem for you. A Javascript String consists
of 16bit characters, and so can represent unicode values from 0000 to
FFFF I think.
and it is the "+" symbol that I seem to be missing, at least thats the
only one missing from my short test latex file.

Because you are including it in the request URI, even though you are
using POST. The plain '+' character is considered a substitute for the
space character in the query component by many server-side applications,
including PHP.
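
The decoding side can be imitated in JavaScript. The sketch below uses URLSearchParams, a much more recent API than the browsers discussed in this thread, purely as a stand-in for the form decoding a server such as PHP performs; the parameter name "content" is the one used in the posted code.

```javascript
// URLSearchParams follows the application/x-www-form-urlencoded rules,
// under which a literal "+" stands for a space.
const raw = new URLSearchParams("content=a+b").get("content");

// When "+" is sent as %2B it survives decoding.
const encoded = new URLSearchParams(
  "content=" + encodeURIComponent("a+b")
).get("content");

console.log(JSON.stringify(raw));     // "a b"
console.log(JSON.stringify(encoded)); // "a+b"
```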
[Thomas 'PointedEars' Lahn wrote:]
(La)TeX source is plain text, I do not see why that would be relevant.
Are confusing the source code with the output it generates?
When I say special characters I mean things like "+/!$&<" not things
like alpha, beta, gamma etc you see in LaTex output.

Of those characters, only '+' and '&' can be considered special, and only
within the query component of a URI. (See above for the '+' character.)
The '&' character is the delimiter between each parameter of the
component, so it must be escaped if it does not serve as such (see
RFC3986; you do this already).

I believe my problem may lie in either the way I am retrieving the
contents of the textarea or in the way i am saving it to a file.

Your assumption is correct.

<html>

The DOCTYPE declaration is missing before this, so the markup is not Valid.

<URL:http://validator.w3.org/>

<head>
<script language="javascript" type="text/javascript">

You can safely omit the deprecated 'language' attribute. You must omit it
with a Strict (X)HTML document type.

function loadtex(url){
var xmlHttp = new XMLHttpRequest();

See below.

xmlHttp.open('GET',url,true);
xmlHttp.onreadystatechange=function(){
setsource(xmlHttp);
};

Unless you are planning to reuse setsource() somewhere, there is no need to
call it. You can simply use its code as code of the anonymous function.

xmlHttp.send(null);
}

function setsource(xmlHttp){
if(xmlHttp.readyState==4){
if (xmlHttp.status==200){
var response = xmlHttp.responseText;
document.getElementById("texSource").value=response;

Use a 'form' element to contain the form controls, and pass 'this.form' as
second argument to the method, say 'f'. Then you can refer to the control
within the method by

f.elements['texSource'].value = response;

However, you would not need the "texSource" ID here:

f.elements['tex source'].value = response;

("tex source" is the current name of the control) would work as well.

}
}
}

function savetex(){
var updtSrc = escape(document.getElementById("texSource").value);

See above.

var urlS = 'save.php?content='+updtSrc;

You are doing this the wrong way.

Although specified different, there is a limit on URI length in some
browsers (IE is especially restrictive here). So on request what is
properly evaluated from 'save.php?content='+updtSrc by the script engine
is truncated for longer LaTeX code anyway.

var svHttp = new XMLHttpRequest();

This will not work in Internet Explorer before version 7 beta 2+. See
<URL:http://jibbering.com/2002/4/httprequest.html> for details.

svHttp.open('POST',urlS,true);

Using POST is the correct approach to work around the URI length limit,
however then you do not need to use the query component anymore:

var urlS = 'save.php';

svHttp.setRequestHeader('Content-Type','application/x-www-form-urlencoded');

Opera 8+ supports XMLHTTPRequest, but Opera 8.0 does not implement the
setRequestHeader() method yet. (It is supported since version 8.01.)

svHttp.send(updtSrc);

svHttp.send("content=" + updtSrc);

}
</script>
</head>

<body>
<table border="1" width=100%>

Consider not using tables here, and learn to use CSS.

<tr><td><center>

You are looking for <td style="text-align:center"> instead of the
deprecated 'center' element.

<INPUT type=button value="Load LaTex Source"
onclick="loadtex('sample.tex')">

Whenever you use event handler attributes such as 'onclick', you should
declare the default scripting language in the 'head' element:

<head>
...
<meta http-equiv="Content-Script-Type" content="text/javascript">
...
</head>

</center></td></tr>
<tr><td>
<br>

Do not use the 'br' element to achieve margins. Use the CSS 'padding' and
'margin' properties instead.

<center>

See above.

<textarea name="tex source" id="texSource", cols=80,
                                          ^        ^
rows=35></textarea>

You are confusing programming and markup language here. Commas are not
allowed within a (start) tag outside of CDATA attribute values. They may
be ignored, but one should not depend on that. Remove them. Attribute
values should always be single-quoted or double-quoted.

Although not required, maybe it is better for later element object access
to replace the space in the name with an underscore (_) or, if you omit
the ID attribute as rendered redundant by my suggestion above, just remove
the space. You should have a preference for lowercase names and IDs.

</center></td></tr>
<tr><td><center>

See above.

<INPUT type=button value="Save Source"
onclick="savetex()">
</center></td></tr>
</table>
</body>
</html>

save.php

<?$datevar=date("YmdHms");
$array1=array($datevar,'.tex');
This is unnecessary, see below.
$stuff=stripslashes($_GET['content']);
This line removes all backslashes (e.g. from "\tilde{}"):

<URL:http://de.php.net/manual/en/function.stripslashes.php>

Do not use stripslashes() here.

Use $_POST instead of $_GET here (see above).
$filename=implode($array1);
Unnecessarily inefficient. Consider

$filename = date('YmdHms') . '.tex';

instead.
$f=fopen($filename,"w");
fwrite($f,$stuff);
fclose($f);
?>

You should test whether fopen() was successful before you fwrite() and
fclose():

$f = fopen($filename, "w");
if ($f)
{
  fwrite($f, $stuff);
  fclose($f);
}
HTH

PointedEars
Mar 7 '06 #6
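
Putting the client-side advice above together (POST to 'save.php' with no query component, encodeURIComponent instead of escape, and a "content=" prefix in the body), a request could look roughly like the sketch below. The function postContent, the createXhr factory parameter and the recording stand-in object are hypothetical names introduced only so the sketch can run outside a browser; in a page, the factory would simply return a new XMLHttpRequest.

```javascript
// createXhr is a hypothetical factory; in a browser it could be
// function () { return new XMLHttpRequest(); }
function postContent(createXhr, url, text) {
  const xhr = createXhr();
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  // The value goes in the request body, not in the URI.
  const body = "content=" + encodeURIComponent(text);
  xhr.send(body);
  return body;
}

// Stand-in object that only records the calls made on it.
function createRecordingXhr(log) {
  return {
    open: (method, target, async) => log.push(["open", method, target, async]),
    setRequestHeader: (name, value) => log.push(["header", name, value]),
    send: (payload) => log.push(["send", payload]),
  };
}

const calls = [];
postContent(() => createRecordingXhr(calls), "save.php", "x+y & \\alpha");
console.log(calls[2]); // [ 'send', 'content=x%2By%20%26%20%5Calpha' ]
```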

da*****************@gmail.com wrote:
When I say special characters I mean things like "+/!$&<" not things
like alpha, beta, gamma etc you see in LaTex output. I believe my
problem may lie in either the way I am retrieving the contents of the
textarea or in the way i am saving it to a file. I have included a
striped down version of my code that still does not save the "+" symbol
but does save the carriage returns after including the escape()
function and stripslashes function in the php script used to write the
file.

That can be two issues at once:

1) The infamous "Korean issue" in IE
<http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/ed1443e6b5e51bc6/f6455f4b2c3ffaa7>
if your server doesn't set charset.
* Always declare charset on your page - or be sure that your server will
provide one *

2) "+" is used in CGI to transmit an encoded space character.
* Always encode your values before sending them *
(on regular GET / POST the browser does it for you, with ajaxoids you have
to do it manually)

Mar 7 '06 #7

Thomas 'PointedEars' Lahn wrote:
First of all, please learn to post. NetNews is thread-based; you should
post a followup to the text you are directly referring to, provide
attribution of quoted material (in Google Groups: show options, Reply),
and trim your quotes to the minimum necessary to retain context:
<URL:http://jibbering.com/faq/faq_notes/pots1.html#ps1Post>
That said, for the flaw in its Web interface, posting with Google Groups is
recommended against anyway. Use a decent newsreader application instead.

Still working on the decent newsreader but hopefully I am posting
correctly this time.

da*****************@gmail.com wrote:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You better use your name here.

This should be fixed now.

$stuff=stripslashes($_GET['content']);
This line removes all backslashes (e.g. from "\tilde{}"):
Do not use stripslashes() here.

I have implemented all the recommendations successfully except this one.
I found that even when using the $_POST I still needed to include the
stripslashes() function or the saved file included extra slashes. For
example, without the stripslashes() the saved file would include

" \' " instead of " ' "
and
"\\end{document}" instead of "\end{document}"

So using the following code gives me the results I want, hopefully it
is a little more correct now. Please point out any amateur mistakes.

sample.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<title>Test Latex Save</title>
<meta http-equiv="Content-Script-Type" content="text/javascript">
<link rel="stylesheet" type="text/css" href="sample.css">
<script type="text/javascript">

var xmlHttp = new XMLHttpRequest();
xmlHttp.open('GET',url,true);
if (xmlHttp.status==200){
var response = xmlHttp.responseText;
form1.elements['texsource'].value=response;
}
}
}
xmlHttp.send(null);
}

function savetex(form1){
var updtSrc =
encodeURIComponent(form1.elements['texsource'].value);
var urlS = 'save.php';
var svHttp = new XMLHttpRequest();

svHttp.open('POST',urlS,true);

svHttp.send("content=" + updtSrc);
}
</script>

<body>
<div id='main'>
<form name="form1" method="post" action="simple.html">
<div><input type=button value="Load LaTex Source"
<div><textarea name='texsource' cols='80'
rows='35'></textarea></div>
<div><input type=button value='Save Source'
onclick='savetex(form1)'></div>
</form>
</div>
</body>
</html>

save.php

<?
$stuff=$_POST['content'];
$filename=date('YmdHms') . '.tex';
$f=fopen($filename,"w");
if ($f)
{
fwrite($f,$stuff);
fclose($f);
}
?>

sample.css

div#main {background: white; padding: 10px; text-align: center;}

Mar 8 '06 #8

David L Green wrote:
Thomas 'PointedEars' Lahn wrote:
Still working on the decent newsreader but hopefully I am posting
correctly this time.

Looks OK now :)

da*****************@gmail.com wrote:
[...]
$stuff=stripslashes($_GET['content']);
This line removes all backslashes (e.g. from "\tilde{}"):
Do not use stripslashes() here.

I have implemented all the recommendations succesfully except this one.
I found that even when using the $_POST I still needed to include the
stripslashes() function or the saved file included extra slashes. For
example, without the stripslashes() the saved file would include

" \' " instead of " ' "
and
"\\end{document}" instead of "\end{document}"

The reason is you have the 'magic_quotes_gpc' option in php.ini set to
'on' (default).

<URL:http://php.net/manual/en/ref.info.php#ini.magic-quotes-gpc>
<URL:http://php.net/manual/en/security.magicquotes.php>
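
To see where the extra backslashes come from, here is a rough JavaScript imitation of what magic quotes does to incoming data and what stripslashes() does afterwards. The helper names addSlashes and stripSlashes are made up for this sketch and only approximate the PHP behaviour; they are not the PHP implementation.

```javascript
// Imitates magic quotes: put a backslash before ', ", \ and NUL.
function addSlashes(s) {
  return s.replace(/[\\'"]/g, "\\$&").replace(/\0/g, "\\0");
}

// Imitates stripslashes(): drop one level of backslash escaping.
function stripSlashes(s) {
  return s.replace(/\\([\s\S]?)/g, (match, c) => (c === "0" ? "\0" : c));
}

const tex = "\\end{document} it's";
const received = addSlashes(tex);

console.log(received);               // \\end{document} it\'s
console.log(stripSlashes(received)); // \end{document} it's

// Without magic quotes, the same call would eat a genuine backslash.
console.log(stripSlashes("\\tilde{}")); // tilde{}
```

This suggests both observations in the thread can hold: with the option on, stripslashes() restores the text; with it off, the same call damages LaTeX commands.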
So using the following code gives me the results I want, hopefully it
is a little more correct now. Please point out any amatuer mistakes

sample.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<title>Test Latex Save</title>
A little bit more indentation would be nice. Usually two spaces suffice.

And it is 'LaTeX' (or 'LaTeΧ'; or -- almost the original --
'L<sup>A</sup>T<sub>E</sub>Χ', but not within the 'title' element),
really. Compare <URL:http://en.wikipedia.org/wiki/Latex> :)
[...]
The second argument is named like your form. Both names are
unnecessary (the named argument itself is not), see below.
var xmlHttp = new XMLHttpRequest();
[...]
function savetex(form1){
var updtSrc =
encodeURIComponent(form1.elements['texsource'].value);
var urlS = 'save.php';
These two variables are not really needed, you can include their values
directly as arguments in the method calls.
var svHttp = new XMLHttpRequest();
See above.
svHttp.open('POST',urlS,true);

function isMethodType(s)
{
return (s == "function" || s == "object");
}

{
'Content-Type',
'application/x-www-form-urlencoded');
}
svHttp.send("content=" + updtSrc);
}
[...]
</script>
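
The fragment above seems to have lost some lines, but isMethodType() together with the earlier remark about Opera 8.0 suggests a feature test before calling setRequestHeader(). One way such a test might look, as a sketch only: setHeaderIfSupported is a hypothetical helper name, and the objects passed to it stand in for real request objects.

```javascript
function isMethodType(s) {
  return (s == "function" || s == "object");
}

// Hypothetical helper: call setRequestHeader() only when it is available.
// The extra truthiness check guards against null, whose typeof is "object".
function setHeaderIfSupported(xhr, name, value) {
  if (isMethodType(typeof xhr.setRequestHeader) && xhr.setRequestHeader) {
    xhr.setRequestHeader(name, value);
    return true;
  }
  return false;
}

// Stand-in objects: one with the method, one without it.
const headers = {};
const withMethod = { setRequestHeader: (n, v) => { headers[n] = v; } };
const withoutMethod = {};

console.log(setHeaderIfSupported(withMethod, "Content-Type",
  "application/x-www-form-urlencoded")); // true
console.log(setHeaderIfSupported(withoutMethod, "Content-Type",
  "application/x-www-form-urlencoded")); // false
```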

<body>
<div id='main'>
The 'div' element appears to be redundant with the CSS declarations below;
you can target the 'body' element directly:

body {
/* ... */
}
<form name="form1" method="post" action="simple.html">
<div><input type=button value="Load LaTex Source"
You are still using proprietary references here, referring to the
HTMLFormElement by its name. You should not, and you do not have
to. As long as the 'input' element is a descendant of a 'form'
element, it knows its ancestor:

...<input type="button" value="Load LaTeX Source"

(BTW, you cannot be sure that the _TeX_ source uses the LaTeX macro
package, can you? Therefore, AIUI, it would be better to refer only
to TeX explicitly.)
<div><textarea name='texsource' cols='80'
rows='35'></textarea></div>
<div><input type=button value='Save Source'
onclick='savetex(form1)'></div> ^^^^^
See above. Maybe you also want to write those buttons dynamically,
because they will not work without client-side script support. E.g.

<script type="text/javascript">
document.write('<div><input type=button value="Save Source"'
+ ' onclick="savetex(this.form);"><\/div>');
</script>

Without script support, no button is displayed then, which is what
we want. (No button, no cry ;-))

In order to maintain your previous column-based layout, you can make
the 'div' elements float around each other with the 'float' and 'clear'
CSS properties:

<URL:http://www.w3.org/TR/CSS2/visuren.html#floats>

However, I just wanted you to take into consideration that tables may not
be appropriate here. If you are convinced that your form can be interpreted
as tabular data, there is nothing wrong with using tables; in fact, it
would be wrong not to use a table then. For most standard Web forms, a
table is just fine (for me) as it shows a label-control relationship.
[...]
save.php

<?

<URL:http://php.net/manual/en/ini.core.php#ini.short-open-tag>
<URL:http://php.net/manual/en/language.basic-syntax.php>
$f=fopen($filename,"w");
When string expansion is not needed, use single quotes
instead of double quotes in PHP. They are considerably
faster:

<URL:http://php.net/manual/en/language.types.string.php>
<URL:http://www.zend.com/zend/tut/using-strings.php>
[...]
sample.css

div#main {background: white; padding: 10px; text-align: center;}

^^^^^^^^^^^^^^^^^^
background-color: white;

suffices. And be sure to declare both the background (color) and
the foreground color always:

<URL:http://www.w3.org/QA/Tips/color>

The snipped parts look just fine :)
Regards,
PointedEars
Mar 8 '06 #9
