Help with cleaning input text - removing control characters

I have an HTML form with a textarea input box. When the user submits the
form (a POST request, e.g. by clicking the submit button), an HTML preview
page is presented to them with the information they filled out in the prior
page's form elements.

Naturally, some users like to copy and paste text into the textarea box,
presumably from a word processor. Some Macintosh-based users I know of see
foreign-looking characters, i.e. tiny square boxes, in the HTML output. The
server processing their requests is PC/Microsoft Windows (2000) based.

To fix the problem, I know this is a matter of removing certain control
characters. I would like to write some client side Javascript validation
code to handle this.

The problem for me is two-fold. I do not have a Mac/PowerPC to use for
testing, and I am not all that familiar with Macs, nor do I know which
control characters to screen for. (About the only thing I know is that Mac
and Windows use different control-character representations for line feeds,
carriage returns, or both.)

Can someone shed some light on this for me? For example, which characters
should I look for when parsing strings, e.g. \n, \t, etc.? Thanks.

--
Peter O'Reilly
Jul 23 '05 #1
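
On the line-ending point raised in the question: classic Mac OS ends lines with a lone carriage return (\r), Windows with \r\n, and Unix with \n. A minimal sketch of normalising all three to \n before any further cleaning; the function name normalizeLineEndings is made up for illustration and does not come from the thread:

```javascript
// Hypothetical helper: convert CRLF (Windows) and lone CR (classic Mac OS)
// line endings to a single LF.
function normalizeLineEndings(text) {
  // \r\n? matches a CR optionally followed by an LF,
  // so a CRLF pair becomes one LF rather than two.
  return text.replace(/\r\n?/g, '\n');
}
```

This only addresses line breaks. It would not explain or remove the square boxes if they come from other characters.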
Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
> To fix the problem, I know this is a matter of removing certain control
> characters. I would like to write some client side Javascript validation
> code to handle this.

<input
onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')">

This removes anything that is not alphanumeric or a space once the field
loses focus.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #2
Evertjan. wrote:
> Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
>> To fix the problem, I know this is a matter of removing certain control
>> characters. I would like to write some client side Javascript validation
>> code to handle this.
>
> <input
> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')">

onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"

Replace any character that is not a-z or a number with a space.
Better, no?
Mick

> removes anything that is not alphanumeric or space after loss of focus

Jul 23 '05 #3
Mick White wrote:
> Evertjan. wrote:
>> Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
>>> To fix the problem, I know this is a matter of removing certain control
>>> characters. I would like to write some client side Javascript
>>> validation code to handle this.
>>
>> <input
>> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')">
>
> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"
>
> Replace any character that is not a-z or a number with a space.
> Better, no?
> Mick

Oops, you're right, I didn't notice the space in your "not" character set.
Mick

>> removes anything that is not alphanumeric or space after loss of focus

Jul 23 '05 #4
Mick White wrote on 05 aug 2004 in comp.lang.javascript:
>> <input
>> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')">
>
> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"
>
> Replace any character that is not a-z or a number with a space.
> Better, no?


Better, yes.
But not quite complete:

==============

Replace any group of characters that are
not a-z
or A-Z
or a number
or a space
with a space:

onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,' ')"

[this will leave multiple spaces as they are,
but replace each run of disallowed characters with one space]

==============

Replace any character that is
not a-z
or A-Z
or a number
or a space
with a space:

onchange="this.value=this.value.replace(/[^a-z\d ]/ig,' ')"

[this will leave multiple spaces as they are,
and replace each disallowed character with its own space]
==============

Replace any group of characters that are
not a-z
or A-Z
or a number
with a space:

onchange="this.value=this.value.replace(/[^a-z\d]+/ig,' ')"

[this will replace multiple white space with one space,
and likewise replace each run of other disallowed characters with one space]

===============

not tested, beware of any silly mistake.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #5
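
To make the difference between the three variants above concrete, here is a small sketch that runs each pattern over the same string. The sample text and the variable names are invented for illustration; the patterns are the ones from the post.

```javascript
// Sample input: two spaces after "Hello", then "!!" between two words.
var sample = 'Hello  world!!ok';

// Variant 1: space is allowed; each run of disallowed characters
// becomes one space.
var v1 = sample.replace(/[^a-z\d ]+/ig, ' ');   // 'Hello  world ok'

// Variant 2: space is allowed; every disallowed character
// becomes its own space.
var v2 = sample.replace(/[^a-z\d ]/ig, ' ');    // 'Hello  world  ok'

// Variant 3: space is NOT allowed; each run of non-alphanumerics,
// spaces included, becomes one space.
var v3 = sample.replace(/[^a-z\d]+/ig, ' ');    // 'Hello world ok'
```

Only the third variant collapses the existing double space, because it is the only one whose character class treats a space as something to replace.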
Evertjan & Mick,

Thank you both for the very helpful replies and code samples.

To be honest though, I am a little bit uncomfortable with the "what to
allow" approach. Don't get me wrong, your regular expressions are great, but
I'm afraid they may be a bit too aggressive in replacing text. For example,
consideration must be given to characters like !, @, #, $, ~, etc. Of
course, those characters can always be added to the regular expression, but
I'm afraid I will not think of all possible allowable characters.

Instead, a "what not to allow" approach would be most ideal,
e.g. specifically targeting those few characters to screen out. What those
characters are is a mystery to me.
Perhaps a String.charCodeAt() approach is needed?

Thanks again/dank u wel.

--
Peter O'Reilly
Jul 23 '05 #6
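
A minimal sketch of the charCodeAt() idea mentioned above, under the assumption (not confirmed anywhere in the thread) that the unwanted characters are ASCII control characters. The function name stripControlChars and the choice to keep tab, line feed and carriage return are illustrative only:

```javascript
// Hypothetical helper: drop ASCII control characters (codes 0-31 and 127)
// while keeping tab (9), line feed (10) and carriage return (13).
function stripControlChars(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    var isControl = code < 32 || code === 127;
    var isAllowedWhitespace = code === 9 || code === 10 || code === 13;
    // Keep everything that is not a control character,
    // plus the three whitespace controls listed above.
    if (!isControl || isAllowedWhitespace) {
      out += text.charAt(i);
    }
  }
  return out;
}
```

Punctuation such as !, @, #, $ and ~ passes through untouched, which is the point of a "what not to allow" filter. Characters above code 127 also pass through, so if the square boxes come from word-processor punctuation rather than control characters, this filter will not remove them.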
Peter O'Reilly wrote on 05 aug 2004 in comp.lang.javascript:
> Instead, a "what not to allow" approach would be most ideal,
> e.g. specifically targeting those few characters to screen out. What
> those characters are is a mystery to me.

If you do not know the character, or its ASCII value, or its Unicode value,
it will be very difficult to specify a positive exclusion, Peter.

onchange="this.value=this.value.replace(/[@\\\n\x08\x1b\u00A9]+/ig,' ')"

This will exclude:
the @
the \ itself (\\)
the linefeed char (\n)
the backspace (\x08 = hex 8)
the escape (\x1b = hex 1b = decimal 27)
the Unicode copyright symbol (\u00A9 = ©)
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #7
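
As a quick check of the exclusion pattern posted above, the sketch below applies it to a string containing one of each listed character. The sample string and variable names are invented for illustration.

```javascript
// The exclusion pattern from the post: @, backslash, line feed,
// backspace (\x08), escape (\x1b) and the copyright sign (\u00A9).
var blocked = /[@\\\n\x08\x1b\u00A9]+/g;

// One of each blocked character, separated by ordinary letters.
var input = 'a@b\\c\nd\x08e\x1bf\u00A9g';

// Each blocked character is replaced by a single space.
var output = input.replace(blocked, ' ');   // 'a b c d e f g'
```

The i flag used in the original handler is left out here because none of the listed characters has an upper- or lower-case form.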
> If you do not know the character or its ASCII value or its Unicode
> value, it will be very difficult to specify a positive exclusion, Peter.


Evertjan, it's good to see that you are finally catching on. If someone
could shed some more insight into the original query, that would be great.
I'm sure someone else here must have experienced such a problem and found a
solution for it.

In particular, information on the character encoding issues or type(s) used
by English Macintosh users (versus the IBM-PC/OEM ASCII character set I am
accustomed to) would be helpful.
--
Peter "UTF-8" O'Reilly
Jul 23 '05 #8
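
Since the thread never identifies the offending characters, one way to find them is to log what actually arrives. The sketch below lists every character outside printable ASCII together with its position and code, which a Mac user could run against a problem paste. The function name findSuspectChars and the shape of the returned objects are assumptions made for illustration.

```javascript
// Hypothetical helper: report every character outside printable ASCII
// (codes 32-126), ignoring tab, line feed and carriage return.
function findSuspectChars(text) {
  var found = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    var isPlainWhitespace = code === 9 || code === 10 || code === 13;
    if ((code < 32 || code > 126) && !isPlainWhitespace) {
      found.push({
        index: i,
        code: code,
        // Zero-padded hexadecimal form, e.g. U+2019
        hex: 'U+' + ('0000' + code.toString(16).toUpperCase()).slice(-4)
      });
    }
  }
  return found;
}
```

For example, findSuspectChars('a\u2019b') returns one entry with index 1, code 8217 and hex 'U+2019'. Knowing the real codes makes it possible to write the positive exclusion discussed above instead of guessing.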
On Thu, 05 Aug 2004 16:33:49 +0000, Evertjan. wrote:
> <input
> onchange="this.value=this.value.replace(/[^a-z\d ]+/ig,'')">


Don't forget to perform this validation on the server side, too, for those
with JavaScript disabled in their browser.

La'ie Techie

Jul 23 '05 #9
