Bytes | Software Development & Data Engineering Community

Unicode, lists & strings

A document contains text with a list of notes below it. Both sections contain Unicode characters. I want to add a disclaimer above the list of notes.

  1. import re
  2. f = open('C:/example.txt', 'r') 
  3. source =  f.read()
  4. patt = re.compile(r'(.*)[a-zA-Z\-]+\s?(?<![Error]):[^\[\]\n]*', re.DOTALL) # finding the main text
  5. text= patt.search(source) # returns text containing unicode
  6. patt = re.compile(r'[a-zA-Z\-]+\s?(?<![Error]):[^\[\]\n]*') # finding the notes
  7. notelist = patt.findall(source) # returns a list of notes, containing more unicode
  8. notes = u'\n'.join(str(x) for x in notelist)  # converting the list to a string
  9. outputFile = open('C:/example.txt', 'w')
  10. outputFile.write(u'%s%s\n\n%s' % (text, 'This is a disclaimer above the list of notes', notes))  # writes the original text to the original file with the disclaimer added
The first error occurs at line 8, where the list notelist is converted to a string. The second occurs at line 10, where text contains Unicode characters that cannot be decoded.
Apr 24 '08 #1
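Both errors look like Python 2's implicit ASCII conversions. In Python 2, open(...).read() returns a byte string, so the regex matches are byte strings too; joining them with u'\n' or formatting them into u'%s%s\n\n%s' makes Python decode them with the ASCII codec, which fails on the first non-ASCII byte. (If the items are already unicode objects, str(x) fails the other way, because it encodes with ASCII.) A common remedy is to decode once when reading, work only with text, and encode once when writing. The sketch below assumes the file is UTF-8 (the thread does not say which encoding it uses) and that text and the notes have already been extracted as text strings; the helper names read_text and write_with_disclaimer are hypothetical. io.open accepts an encoding argument in both Python 2.6+ and Python 3.

```python
import io


def read_text(path, encoding="utf-8"):
    # Decode bytes to text once, at the input boundary (UTF-8 is an assumption)
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def write_with_disclaimer(path, text, notes, disclaimer, encoding="utf-8"):
    # text, notes and disclaimer are all text strings, so nothing is
    # implicitly converted with the ASCII codec here
    body = u"%s%s\n\n%s" % (text, disclaimer, u"\n".join(notes))
    # Encode text to bytes once, at the output boundary; the with block
    # also flushes and closes the file, which the original code never does
    with io.open(path, "w", encoding=encoding) as f:
        f.write(body)
```

With the question's names, source = read_text('C:/example.txt') would replace lines 2 and 3, and write_with_disclaimer('C:/example.txt', text, notelist, u'This is a disclaimer above the list of notes') would replace lines 8 to 10, provided text and notelist hold text strings.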
jlm699
What are your errors?

On first inspection, I'd say add [] around the generator expression on line 8 to turn it into a list comprehension, i.e.:
  notes = u'\n'.join([str(x) for x in notelist])  # converting the list to a string
Apr 24 '08 #2
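For what it's worth, join() accepts any iterable of strings, so the brackets alone should not change the outcome. The short sketch below uses hypothetical notelist data to check that the generator form, the list-comprehension form and passing the list directly all produce the same string. In Python 2, what raises on non-ASCII text is the str(x) call (which encodes with ASCII) or the implicit ASCII decode of byte strings, not the choice between a generator and a list.

```python
# Hypothetical notes containing non-ASCII characters (written as escapes)
notelist = [u"Note-a: caf\u00e9", u"Note-b: na\u00efve"]

# join() accepts any iterable of strings, so all three forms are equivalent
from_generator = u"\n".join(x for x in notelist)  # generator expression, as in the question
from_list = u"\n".join([x for x in notelist])     # list comprehension, as suggested in the reply
direct = u"\n".join(notelist)                     # elements are already strings; no str() needed

print(from_generator == from_list == direct)  # prints True
```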
It turns out my problems do not lie where I thought they did. I'll have to do more research; thanks for the help.
Apr 25 '08 #3
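Two other spots in the question's code may be worth checking, although the thread never confirms either. First, patt.search() returns a match object (or None), not a string, so formatting text with %s writes the match object's repr rather than the matched text. Second, (?<![Error]) is a lookbehind for a single character from the set E, r, o, so it silently drops any label whose last letter is one of those (for example "Author:"); (?<!Error) is the lookbehind for the literal word. The sketch below also swaps the greedy (.*) pattern, which backtracks from the end of the file and so ends inside the label of the last note rather than before the first, for a different technique: slice the source at the start of the first note match. The helper name split_text_and_notes and the sample data are hypothetical, and the change to (?<!Error) is an assumption about what the pattern was meant to do.

```python
import re

# Notes pattern from the question with one assumed change: (?<!Error) rejects
# labels ending in the literal word "Error", whereas the original (?<![Error])
# rejects any label whose last character is 'E', 'r' or 'o'.
NOTE_PATTERN = re.compile(r'[a-zA-Z\-]+\s?(?<!Error):[^\[\]\n]*')


def split_text_and_notes(source):
    """Hypothetical helper: return (main_text, notes) for a text string.

    main_text is everything before the first note; notes is the list of
    note lines matched by NOTE_PATTERN.
    """
    first = NOTE_PATTERN.search(source)  # a match object or None, not a string
    if first is None:
        return source, []
    # Slice the source at the match position instead of formatting the match
    # object itself, which would insert its repr into the output.
    main_text = source[:first.start()]
    # The pattern has no capturing groups, so findall returns whole matches
    notes = NOTE_PATTERN.findall(source)
    return main_text, notes
```

With the question's variables, text, notelist = split_text_and_notes(source) would take the place of lines 4 to 7.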

Similar topics

by Edward K. Ream: Am I reading pep 277 correctly? On Windows NT/XP, should filenames always be converted to Unicode using the mbcs encoding? For example, myFile = unicode(__file__, "mbcs", "strict") This...

by Jonathon Blake: All: Question Python is currently Unicode Compliant. What happens when strings are read in from text files that were created using GB 2312-1980, or KPS 9566-2003, or other, equally...

by anthony hornby: Hi, I am starting my honours degree project and part of it is going to be manipulating ASCII encoded XML files from a legacy database and converting them to Unicode and doing text processing stuff...

by webdev: lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3...

by Robert: Hello, I'm using Pythonwin and py2.3 (py2.4). I did not come clear with this: I want to use win32-fuctions like win32ui.MessageBox, listctrl.InsertItem ..... to get unicode strings on the...

by John Salerno: Forgive my newbieness, but I don't quite understand why Unicode is still something that needs special treatment in Python (and perhaps elsewhere). I'm reading Dive Into Python right now, and it...

by laxmikiran.bachu: Can we have change a unicode string Type object to a Tuple type object.. If so how ????

by Peter Robinson: Dear list I am at my wits end on what seemed a very simple task: I have some greek text, nicely encoded in utf8, going in and out of a xml database, being passed over and beautifully displayed...

by nkarkhan: Hello, I have a list of strings, some of the strings might be unicode. I am trying to a .join operation on the list and the .join raises a unicode exception. I am looking for ways to get around...