How to read unicode (utf-8) / binary file line by line

13 New Member
Hi programmers,

I want to read a Unicode (UTF-8) text file created by Notepad, line by line. I don't want to display the Unicode strings on the screen; I just want to read them and compare them.

This code reads an ANSI file line by line and compares the strings:

What I want
  • Read test_ansi.txt line by line
  • if the line = "b" print "YES!"
  • else print "NO!"

read_ansi_line_by_line.c

    #include <stdio.h>
    #include <string.h> /* strcmp */

    int main(void)
    {
        const char *inname = "test_ansi.txt";
        FILE *infile;
        char line_buffer[BUFSIZ]; /* BUFSIZ is defined if you include stdio.h */
        int line_number = 0;

        infile = fopen(inname, "r");
        if (!infile) {
            printf("\nfile '%s' not found\n", inname);
            return 1;
        }
        printf("\n%s\n\n", inname);

        while (fgets(line_buffer, sizeof(line_buffer), infile)) {
            ++line_number;
            /* note that the newline is in the buffer */
            if (strcmp("b\n", line_buffer) == 0) {
                printf("%d: YES!\n", line_number);
            } else {
                printf("%d: NO!\n", line_number);
            }
        }
        printf("\n\nTotal: %d\n", line_number);
        fclose(infile);
        return 0;
    }

test_ansi.txt

    a
    b
    c

Compiling

    gcc -o read_ansi_line_by_line read_ansi_line_by_line.c

Output

    test_ansi.txt

    1: NO!
    2: YES!
    3: NO!


    Total: 3

Now I need to read a Unicode (UTF-8) file created by Notepad. After more than 6 months of searching I have not found any good code or library in C that can read a file encoded in UTF-8! I don't know exactly why, but I think standard C doesn't support Unicode.

Reading a Unicode binary file works, but the problem is that the file must already have been created in binary mode. That means if we want to read a Unicode (UTF-8) file created by Notepad, we first need to translate it from a UTF-8 file to a BINARY file!

This code writes a Unicode string to a binary file. NOTE: the C file is encoded in UTF-8 and compiled with GCC.

What I want
  • Write the Unicode char "ب" to test_bin.dat

create_bin.c

    #define UNICODE
    #ifdef UNICODE
    #define _UNICODE
    #else
    #define _MBCS
    #endif

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* Data to be stored in the file */
        wchar_t line_buffer[BUFSIZ] = L"ب";
        /* Open the file for writing in binary mode */
        FILE *outfile = fopen("test_bin.dat", "wb");
        if (!outfile) {
            printf("cannot create 'test_bin.dat'\n");
            return 1;
        }
        /* Write the string and its terminating L'\0' as raw wchar_t units */
        fwrite(line_buffer, sizeof(wchar_t), wcslen(line_buffer) + 1, outfile);
        /* Close the file */
        fclose(outfile);

        return 0;
    }

Compiling

    gcc -o create_bin create_bin.c

Output

    create test_bin.dat


Now I want to read the binary file line by line and compare!

What I want
  • Read test_bin.dat line by line
  • if the line = "ب" print "YES!"
  • else print "NO!"

read_bin_line_by_line.c

    #define UNICODE
    #ifdef UNICODE
    #define _UNICODE
    #else
    #define _MBCS
    #endif

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        const wchar_t *inname = L"test_bin.dat";
        FILE *infile;
        wchar_t line_buffer[BUFSIZ]; /* BUFSIZ is defined if you include stdio.h */
        size_t n;

        /* _wfopen is a Microsoft CRT extension (available with MinGW GCC) */
        infile = _wfopen(inname, L"rb");
        if (!infile) {
            wprintf(L"\nfile '%ls' not found\n", inname);
            return 1;
        }
        wprintf(L"\n%ls\n\n", inname);

        /* Read raw wchar_t units, leaving room for a terminator */
        while ((n = fread(line_buffer, sizeof(wchar_t), BUFSIZ - 1, infile)) > 0) {
            line_buffer[n] = L'\0'; /* fread does not terminate the string */
            if (wcscmp(L"ب", line_buffer) == 0) {
                wprintf(L"YES!\n");
            } else {
                wprintf(L"NO!\n");
            }
        }
        /* Close the file */
        fclose(infile);
        return 0;
    }

Compiling

    gcc -o read_bin_line_by_line read_bin_line_by_line.c

Output

    test_bin.dat

    YES!

THE PROBLEM

This method is VERY LONG and NOT POWERFUL (I'm a beginner in software engineering).

Does anyone know how to read a Unicode file? (I know it's not easy!)
Does anyone know how to convert a Unicode file to a binary file? (A simple method.)
Does anyone know how to read a Unicode file in binary mode? (I'm not sure.)

Thank you.
Jan 21 '10 #1
johny10151981
1,059 Top Contributor
Hello,
A few things.
1. Unicode and UTF-8 are not the same thing (if I am not wrong). What Windows calls "Unicode" is UTF-16, which uses 2-byte code units. UTF-8, on the other hand, is a multibyte encoding.

2. (Don't listen to me.) Looking for an easy way is not a good idea :)

Best regards,
JOHNY
Jan 22 '10 #2
RedSon
5,000 Recognized Expert Expert
Instead of using fgets and strcmp, you are going to want to use the wide character versions of those functions.

You will have to read the documentation of the OS/libraries you are using to find out what the wide char variants are.

If you are using Windows, a quick search on MSDN should be helpful. You can also convert from one to the other.
Jan 22 '10 #3
freeseif
13 New Member
@JOHNY

Yes, you are right. I wanted to edit the title to remove "unicode", but I don't have permission ^_^
If you have a UTF-8 project and you want to read a UTF-8 file line by line, what is the easy way you use? =)
Jan 22 '10 #4
freeseif
13 New Member
@RedSon

After 6 months of searching in books, MSDN, documentation, the Internet, and forums, I never found a solution for reading a UTF-8 file in C99! Can you help me please? =)
Jan 22 '10 #5
RedSon
5,000 Recognized Expert Expert
Did you read the Unicode and Character Set functions on MSDN?

http://msdn.microsoft.com/en-us/libr...85(VS.85).aspx
Jan 22 '10 #6
RedSon
5,000 Recognized Expert Expert
Oh wait, if you are using gcc then you are not on a windows machine, so that MSDN link is not going to do you any good. I don't know why you are even searching MSDN like you state in post #5.

That is why I suggested that you search your libraries and other documentation for wide string functions. Your header files that come with C99 should have something for that.
Jan 22 '10 #7
freeseif
13 New Member
@RedSon

First, thank you. I have already read all the MSDN pages that talk about UTF-8 ^_^, but I think I need to use MultiByteToWideChar() after reading a string from the UTF-8 file, and I don't know exactly how to use it!
Jan 22 '10 #8
freeseif
13 New Member
@RedSon

Yes, I'm looking for a solution in C99 with GCC. I think I need to read the UTF-8 file in binary mode and convert UTF-8 to UTF-16, or maybe not, or some other way.. I seriously need help =)
Jan 22 '10 #9
RedSon
5,000 Recognized Expert Expert
Like I said, you won't be able to use it, because you are not building a windows application using windows libraries.

You will need to find an appropriate library call in your headers.
Jan 22 '10 #10
