way to remove all non-ascii characters from a file?

I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?
Jul 18 '05 #1
Something simple like the following will work for files
that fit in memory:

def onlyascii(char):
    # Keep only 7-bit ASCII characters (code points 0-127).
    return ord(char) < 128

f = open('filename.ext', 'r')
data = f.read()
f.close()
# join() also covers Pythons where filter() returns an iterator
filtered_data = ''.join(filter(onlyascii, data))

For larger files you will need to loop and read
the data in chunks.
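
The chunked approach mentioned above could look roughly like the following on Python 3. This is only a sketch: the function name `strip_non_ascii`, the paths and the 64 KiB chunk size are assumptions made for illustration, not something from the post. It works on raw bytes and assumes an ASCII-compatible encoding such as Latin-1 or UTF-8, where every byte of a non-ASCII character is 128 or higher, so a character split across two chunks is still removed completely.

```python
def strip_non_ascii(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, dropping every byte >= 128."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            # Iterating over bytes yields ints in Python 3.
            dst.write(bytes(b for b in chunk if b < 128))
```

Something like `strip_non_ascii("filename.ext", "filename.clean.ext")` would then write the filtered copy without loading the whole file into memory.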

-Larry Bates
----------------------------
"omission9" <ru******@salemstate.edu> wrote in message
news:de**************************@posting.google.com...
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?

Jul 18 '05 #2
omission9 wrote:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?


import string

open("file2", "w").write("".join(
    [ch for ch in open("file1", "r").read()
     if ch in string.ascii_letters]))

but this will also strip line breaks and whatnot :)

(n.b. I didn't actually test the above code, and wrote it for its
amusement value :) )
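
If the line breaks should survive, one possible variation is to widen the set of allowed characters. The sketch below is an assumption about what such a variant might look like; the helper name `keep_letters_and_whitespace` is made up for illustration. It still drops digits and punctuation, just like the one-liner above.

```python
import string

# Characters allowed through: ASCII letters plus spaces, tabs and line breaks.
ALLOWED = set(string.ascii_letters + string.whitespace)

def keep_letters_and_whitespace(text):
    """Return text with everything outside ALLOWED removed."""
    return "".join(ch for ch in text if ch in ALLOWED)
```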

Jul 18 '05 #3
omission9 wrote:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?


Read it in chunks, then remove the non-ascii characters like so:
>>> t = "".join(map(chr, range(256)))
>>> d = "".join(map(chr, range(128, 256)))
>>> "Törichte Logik böser Kobold".translate(t, d)
'Trichte Logik bser Kobold'

and finally write the maimed chunks to a file. However, it's not clear to
me how removing characters could be a good idea in the first place.
Replacing them at least gives some minimal hint that something is missing:
>>> t = "".join(map(chr, range(128))) + "?" * 128
>>> "Törichte Logik böser Kobold".translate(t)
'T?richte Logik b?ser Kobold'
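
On Python 3, where `str.translate` no longer takes a second argument of characters to delete, a similar remove-or-replace choice can be sketched with the `errors` argument of `str.encode`. This is an alternative technique, not the translate-table method shown above, and the variable names are only illustrative.

```python
text = "T\u00f6richte Logik b\u00f6ser Kobold"  # "Törichte Logik böser Kobold"

# errors="ignore" silently drops characters that ASCII cannot represent.
removed = text.encode("ascii", errors="ignore").decode("ascii")

# errors="replace" substitutes "?" for each such character.
replaced = text.encode("ascii", errors="replace").decode("ascii")

print(removed)   # Trichte Logik bser Kobold
print(replaced)  # T?richte Logik b?ser Kobold
```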

Peter
Jul 18 '05 #4
omission9 wrote:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?


Here's a simple example that does what you want:
>>> orig = "Häring"
>>> "".join([x for x in orig if ord(x) < 128])
'Hring'

-- Gerhard

Jul 18 '05 #5
Gerhard Häring wrote:

omission9 wrote:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?


Here's a simple example that does what you want:
>>> orig = "Häring"
>>> "".join([x for x in orig if ord(x) < 128])
'Hring'

Or, if performance is critical, it's possible something like this would
be faster. (A regex might be even better, since it avoids the redundant
identity transformation step.)
>>> from string import maketrans, translate
>>> table = maketrans('', '')
>>> translate(orig, table, table[128:])
'Hring'
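
The regex alternative mentioned in passing might look like the sketch below. The pattern and the helper name `strip_non_ascii_text` are assumptions made for illustration, and no claim is made here about whether it is actually faster than the translate-based version.

```python
import re

# Matches any run of characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]+")

def strip_non_ascii_text(text):
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub("", text)

print(strip_non_ascii_text("H\u00e4ring"))  # Hring
```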
-Peter
Jul 18 '05 #6
