unicode text file

I have some Unicode (UTF-8) text files. I _tried_ to write a simple
program that reads one of them and writes it to standard output but...
of course it doesn't work. Is there an easy way to do it? Thanks, K.

This is my program.

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(){
    ifstream infile ("in.txt");
    string s;
    while (infile >> s) {
        cout << s;
    }
}
Jul 23 '05 #1

"Koulbak" <tu********@gma il.com> wrote in message
news:42******** **@news.bluewin .ch...
I have some Unicode (UTF-8) text files. I _tried_ to write a simple program
that reads one of them and writes it to standard output but... of course
it doesn't work. Is there an easy way to do it? Thanks, K.

This is my program.

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(){
    ifstream infile ("in.txt");

You should check here that the file was opened successfully
before attempting to read from it.

    string s;
    while (infile >> s) {
        cout << s;
    }
}


Try using 'wifstream' and 'wcout'.

-Mike
Jul 23 '05 #2
Mike Wahler wrote:
[read unicode text file]
int main(){
    ifstream infile ("in.txt");


You should check here that the file was opened successfully
before attempting to read from it.


In the real program I do that, of course; in my post I included only the
essential part of the question.

    string s;
    while (infile >> s) {
        cout << s;
    }
}

Try using 'wifstream' and 'wcout'.


1. I tried; it doesn't compile.

error C2679: binary '>>' : no operator found which takes a right-hand
operand of type 'std::string' (or there is no acceptable conversion)

I also switched to wstring, and then it compiles, but it doesn't work
correctly: it prints a lot of garbage.

2. I thought that in C++ it was possible to use exactly the standard
facilities (avoiding special constructs such as wcout), perhaps by setting
some library option. Is that not true at all?

Thanks a lot, K.
Jul 23 '05 #3
Koulbak wrote:
1. I tried; it doesn't compile.

error C2679: binary '>>' : no operator found which takes a right-hand
operand of type 'std::string' (or there is no acceptable conversion)

You should use wstring. A wchar_t string literal is prefixed with L. For example:

wstring s = L"Some string";

I also switched to wstring, and then it compiles, but it doesn't work
correctly: it prints a lot of garbage.

2. I thought that in C++ it was possible to use exactly the standard
facilities (avoiding special constructs such as wcout), perhaps by setting
some library option. Is that not true at all?


These *are* standard facilities. All string facilities come with their wchar_t equivalents
(including the facilities of the C-subset).

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #4
Ioannis Vranos wrote:
You should use wstring. [...]


I added wstring; it still doesn't work.

2. I thought that in C++ it was possible to use exactly the standard
facilities (avoiding special constructs such as wcout), perhaps by setting
some library option. Is that not true at all?

These *are* standard facilities. All string facilities come with their
wchar_t equivalents (including the facilities of the C-subset).


Sorry, I was not clear at all. I would like to avoid implementation details
as much as possible. I don't want to call Unicode-specific functions
explicitly; I would rather just tell the compiler or the library that my
character encoding is Unicode and then read the file exactly in the usual way.

I would like to avoid learning a new set of functions for reading and
manipulating Unicode characters, Unicode strings and so on. If that is
possible, of course.

Thanks, K.
Jul 23 '05 #5
Koulbak wrote:
    string s;
    while (infile >> s) {
        cout << s;
    }
}

1. I tried; it doesn't compile.

error C2679: binary '>>' : no operator found which takes a right-hand
operand of type 'std::string' (or there is no acceptable conversion)

Either you have not included all the necessary header files, or you have
included the wrong ones (or you have the wrong files in your include path).

I also switched to wstring, and then it compiles, but it doesn't work
correctly: it prints a lot of garbage.


wstring is not appropriate for UTF-8.

R.C.

Jul 23 '05 #6
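
One way to see why plain `std::string` is a reasonable container for UTF-8: the encoding is just a sequence of bytes, in which one character may occupy one to four of them. A wide stream in the default locale typically widens each byte separately instead of decoding the multi-byte sequences, which would explain the garbage reported above. The sketch below counts characters (code points) in a UTF-8 string held in an ordinary `std::string`. The function name `count_code_points` is made up for illustration, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes `utf8` is well-formed;
// no validation is attempted.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) {   // not a continuation byte
            ++n;
        }
    }
    return n;
}
```

For example, the euro sign is stored as the three bytes E2 82 AC, so `size()` reports 3 for that string while `count_code_points` reports 1.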
[....]
I also switched to wstring, and then it compiles, but it doesn't work
correctly: it prints a lot of garbage.

wstring is not appropriate for UTF-8.


OK, that's the problem. My encoding is UTF-8.

Any solution?
Thanks, K.
Jul 23 '05 #7
Koulbak wrote:

wstring is not appropriate for UTF-8.


OK, that's the problem. My encoding is UTF-8.

Any solution?


Maybe I've been wrong. See e.g.
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www-106.ibm.com/developerwork.../l-linuni.html
or search for other 'UTF-8' resources.

Jul 23 '05 #8
Koulbak wrote:
I have some Unicode (UTF-8) text files. I _tried_ to write a
simple program that reads one of them and writes it to
standard output but... of course it doesn't work. Is there
an easy way to do it? Thanks, K.

This is my program.

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main(){
    ifstream infile ("in.txt");
    string s;
    while (infile >> s) {
        cout << s;
    }
}


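istream >> string skips any leading whitespace (including newlines)
and then reads one word, up to the next whitespace. Your loop
therefore prints the words run together, with all spaces and line
breaks dropped.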
To do line-by-line reading, you would go:

while (getline(infile, s))
    cout << s << '\n';   // getline discards the newline, so put it back

This is safe for UTF-8 files: every byte of a multi-byte UTF-8
sequence has its high bit set, so the newline byte (0x0A) can never
be part of one. (It would not be safe for UTF-16.)

To output the whole file at once:

cout << infile.rdbuf();

I'm assuming you want to output UTF-8 on stdout (Standard
C++ offers no facilities for converting UTF-8 to a stream
of wide characters). Can you clarify your intention?

The best thing to do (IMHO) would be to open the file in
binary mode, and also force std::cout into binary mode. (This
would require system-specific code.) Then no translation
will occur and it will work correctly.

If you can't force cout to binary, then it *might* work to
open the input in text mode too, and hope that the input
conversions match the output conversions!

Jul 23 '05 #9
Koulbak wrote:
Sorry, I was not clear at all. I would like to avoid implementation details
as much as possible. I don't want to call Unicode-specific functions
explicitly; I would rather just tell the compiler or the library that my
character encoding is Unicode and then read the file exactly in the usual way.

I would like to avoid learning a new set of functions for reading and
manipulating Unicode characters, Unicode strings and so on. If that is
possible, of course.


wchar_t can represent the largest character set supported by a system, while char mainly
represents a byte and single-byte character sets. If you have to deal with various
character sets, it is better to stick to wchar_t and the corresponding facilities for it
(which are the same as the plain char facilities, with an additional w in their name).

--
Ioannis Vranos

http://www23.brinkster.com/noicys
Jul 23 '05 #10
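
If wide characters are wanted for processing, the UTF-8 bytes have to be decoded first, and an earlier reply notes that standard C++ offered no ready-made facility for that. A hand-written decoder is one option. The following is a minimal sketch, not a production decoder: `decode_utf8` is a hypothetical name, the result is a vector of code point values rather than a `wstring` (on Windows `wchar_t` is 16 bits, so code points above U+FFFF would need surrogate pairs), and validation is deliberately incomplete.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decodes a UTF-8 byte string into Unicode code points. A minimal sketch:
// malformed input (stray continuation bytes, truncated sequences, invalid
// lead bytes) is replaced by U+FFFD, but overlong forms and surrogates
// are not rejected.
std::vector<unsigned long> decode_utf8(const std::string& utf8) {
    std::vector<unsigned long> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;
        std::size_t extra;                 // continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size()) { ok = false; break; }  // truncated
            unsigned char c = static_cast<unsigned char>(utf8[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);   // append 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}
```

The decoded values can then be processed one code point at a time, which is the kind of per-character handling that wide strings are usually chosen for.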
