Valid Characters

I'm trying to ensure that all the characters in my XML document are
characters specified in this document:
http://www.w3.org/TR/2000/REC-xml-20001006#charsets

Would a function like this work:

private static string formatXMLString(string n)
{
    if (string.IsNullOrEmpty(n)) return n;
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    char[] chrs = n.ToCharArray();
    char c;
    int x, j = chrs.Length;
    for (x = 0; x < j; x++)
    {
        c = chrs[x];
        if (c == 0x9 || c == 0xA || c == 0xD ||
            (c > 0x20 && c < 0xd7ff) ||
            (c > 0xe000 && c < 0xffd) ||
            (c > 0x10000 && c < 0x10ffff))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

I've never compared characters to hex values like this (0x9, 0xffd, etc.).
I'm not trying to be lazy and avoid testing it myself; I just don't know if
this type of character comparison is the correct logic for the results I'm
looking for.

Any input?
Nov 30 '06
Thank you. I was aware of the tab, carriage return, and line feed, and
I did end up catching my "inclusive comparison" problem, but I did not know
about U+FFFE or the U+10000-U+10FFFF range.

Thank you.
Dec 1 '06 #11


"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP*************************@msnews.microsoft.com...
> "David Browne" <davidbaxterbrowne no potted me**@hotmail.com> wrote:

> <snip>
>> Sure. Don't be lazy.
>>
>> And a char is a 2-byte type, so your literals should all be 2-byte
>> literals, and should be cast to char for comparison.
>>
>> eg
>> char space = (char)0x0020;
>
> While I agree that the "high order" comparisons are invalid, where do
> you see the benefit in converting to char for comparison?

Just to remove implicit conversions from the code. The same reason I like
to see parentheses control order of operations instead of relying on
operator precedence. I think it makes the code more readable and less
fragile to have the operations explicit.

David

Dec 1 '06 #12
"David Browne" <davidbaxterbrowne no potted me**@hotmail.com> wrote:
>> While I agree that the "high order" comparisons are invalid, where do
>> you see the benefit in converting to char for comparison?
>
> Just to remove implicit conversions from the code. The same reason I like
> to see parentheses control order of operations instead of relying on
> operator precedence. I think it makes the code more readable and less
> fragile to have the operations explicit.

In some cases I'd agree, but in this case I think it would make it
harder to read overall. Personal preference though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 1 '06 #13
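
For anyone comparing the two styles discussed above, here is a minimal sketch. C# implicitly converts a char to int when it is compared against an integer literal, so both forms should behave the same and the choice is about readability. The class and method names are made up for illustration.

```csharp
using System;

public static class CharCompareDemo
{
    // Style from the original post: the char operand is implicitly widened to int.
    public static bool IsSpaceImplicit(char c)
    {
        return c == 0x20;
    }

    // Style David suggests: the literal is cast, so both operands are written as char.
    public static bool IsSpaceExplicit(char c)
    {
        return c == (char)0x0020;
    }

    public static void Main()
    {
        foreach (char c in new[] { ' ', '\t', 'A' })
        {
            // Both columns should always agree.
            Console.WriteLine($"U+{(int)c:X4}: {IsSpaceImplicit(c)} {IsSpaceExplicit(c)}");
        }
    }
}
```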
> Yes, but a surrogate will occupy two chars. A char is not a Unicode
> character; it's an unsigned 16-bit integer, and the range of a char is not
> U+0000 to U+ffff, it's 0x0000 to 0xffff.

Yes.
Which means that the condition
(c > 0x10000 && c < 0x10ffff))
has to be rewritten in terms of surrogates.
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Dec 2 '06 #14
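
Pulling the points from the thread together, here is one possible sketch of the filter. It is not the code anyone in the thread posted. It assumes the goal is to drop anything outside the Char production in the XML 1.0 recommendation linked in the question, uses inclusive range bounds, ends the second range at 0xFFFD, and handles U+10000-U+10FFFF through surrogate pairs. The names XmlCharFilter, IsXmlChar and StripInvalidXmlChars are hypothetical.

```csharp
using System;
using System.Text;

public static class XmlCharFilter
{
    // True if the code point falls in one of the XML 1.0 Char ranges.
    public static bool IsXmlChar(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of s with every disallowed character removed.
    public static string StripInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // A well-formed surrogate pair encodes one code point above U+FFFF.
                int cp = char.ConvertToUtf32(c, s[i + 1]);
                if (IsXmlChar(cp)) sb.Append(c).Append(s[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else if (IsXmlChar(c))
            {
                // Lone surrogates (0xD800-0xDFFF) fail this check and are dropped.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, StripInvalidXmlChars("a\u0000b\uFFFEc") returns "abc", and a string containing a character such as U+1F600 comes back unchanged because both halves of its surrogate pair are kept.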
