getting rid of non english characters

Hey all, I am currently developing a parsing application. My input is a
10MB English text file. The parsing works fine, but every now and then
a non-English character appears that messes everything up. I need to get
rid of all these characters before I parse. Help!

Teekus
(P.S. I tried Regex.Replace, but that did not take out the non-English
characters!)

Nov 16 '05 #1
If it's an English text file but you're reading some non-English
characters, that suggests that you're losing data - I'd worry about
that to start with. Have you looked at the file to see what's actually
there where you're getting incorrect data?
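
One way to look, as a rough sketch: list each non-ASCII character in a line with its column and code point. The names here (NonAsciiReport, Describe) are made up for illustration and are not from the original application.

```csharp
using System;
using System.Collections.Generic;

static class NonAsciiReport
{
    // Returns one entry per non-ASCII character in the line,
    // formatted as "column N: U+XXXX" (columns are 1-based).
    public static List<string> Describe(string line)
    {
        List<string> found = new List<string>();
        for (int i = 0; i < line.Length; i++)
        {
            if (line[i] > 127)
            {
                found.Add(string.Format("column {0}: U+{1:X4}", i + 1, (int)line[i]));
            }
        }
        return found;
    }
}
```

Running Describe over each line as it is read, and printing the results next to the line number, should show whether the stray characters are always the same few code points (which may point to an encoding mismatch when opening the file) or genuinely random rubbish.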

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
Jon, no, I don't think I am losing data. The BSCs sometimes just send
down some rubbish in their files for a reason we do not yet know. Right
now all I need is to take these characters out so that my parser can run
smoothly. (If I replace the character with any other English character,
it is handled correctly.) Suggestions?

Teekus

Nov 16 '05 #3
Well, the easiest way would be to do something like:

1) Read the file line by line
2) For each line, check whether or not there are any non-ASCII
characters
3) If there are, use ToCharArray to get a character array for the
string, then run through that array and convert any non-ASCII character
into '?', then convert the char array to a string
4) Do whatever you want to do with the line.

Something like:

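using (StreamReader reader = ...)
{
    string line;

    // Read the file one line at a time until the end is reached
    while ((line = reader.ReadLine()) != null)
    {
        // First pass: does this line contain anything outside 7-bit ASCII?
        bool hasNonAscii = false;
        foreach (char c in line)
        {
            if (c > 127)
            {
                hasNonAscii = true;
                break;
            }
        }

        // Only copy and rewrite the lines that actually need it
        if (hasNonAscii)
        {
            char[] chars = line.ToCharArray();
            for (int i = 0; i < chars.Length; i++)
            {
                if (chars[i] > 127)
                {
                    chars[i] = '?';
                }
            }
            line = new string(chars);
        }

        // Do whatever with line
    }
}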
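
If the characters should be removed outright rather than replaced with '?', which is what the original question asked for, a minimal sketch might look like the following. AsciiFilter and its method names are hypothetical, and both variants assume that "non-English" means anything outside the 7-bit ASCII range.

```csharp
using System.Text;
using System.Text.RegularExpressions;

static class AsciiFilter
{
    // Copies only the 7-bit ASCII characters into a new string.
    public static string StripNonAscii(string line)
    {
        StringBuilder sb = new StringBuilder(line.Length);
        foreach (char c in line)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Same result via Regex.Replace: the character class matches
    // any single character outside the range U+0000 to U+007F.
    public static string StripNonAsciiRegex(string line)
    {
        return Regex.Replace(line, @"[^\x00-\x7F]", "");
    }
}
```

Inside the loop above this would be used as line = AsciiFilter.StripNonAscii(line); in place of the hasNonAscii block. Note that dropping characters shifts column positions, so replacing with '?' may be the safer choice if the parser relies on fixed-width fields.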

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
