Hi...about some strange character in textfile

9
Hi all,
I'm a newbie in VB.NET programming.

I hope some of you can help me solve this.

I'm working on reading, parsing and saving a text file into SQL Server.
The text file contains thousands of rows with about 50 columns in every row.

Everything went well until I found one text file with some strange characters. They seem to be Japanese characters (the file belongs to a Japanese company).

The problem is:
Not all rows in this file have these strange characters, and because of them the parse function can't work properly (I'm using Substring to read every column),
because the length of the row becomes different.
The length of a row with the strange characters is 206. The length of a row without them is 208.

Can someone please tell me how to fix this kind of problem?

Thanks for any suggestions and answers.
Sep 3 '07 #1
14 3268
Robbie
180 100+
Hi. I do not know about .NET programming much at all, but I know a lot about dealing with such Unicode characters in VB6, and maybe .NET is similar in this respect...?

A single one of these characters takes two bytes in a double-byte code page such as Shift-JIS.
If, for example, you ask VB to give you the length of...
"aaaか" (3 of letter 'a', then a Hiragana 'ka' - don't know if that'll come out alright on these forums)
...using Len(), then VB will say that it is 4 characters long, but if you count the bytes it takes in that code page (for example LenB(StrConv(s, vbFromUnicode)) on a Japanese system) it'll say 5.

I don't fully understand your problem, though.
You say that something isn't parsing properly, and you also seem to think the 2-character difference in length is wrong.

Well, I've tried to explain why the length differs by 2 when the Unicode characters are there or not, but I don't understand what you mean about the parsing, sorry. -_-;

EDIT: So basically, if there are different functions in .NET which let you deal with strings by 'number of characters' and by 'number of bytes', as there are in VB6, maybe you need to play around with the different functions to find ones which work...? (Yep, I'm still not understanding >__<)
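
To make the character-versus-byte distinction concrete in VB.NET terms, here is a minimal sketch. It uses only the standard `System.Text.Encoding` classes; the module name `LengthDemo` is made up for illustration. `String.Length` counts UTF-16 code units, while `Encoding.GetByteCount` reports how many bytes the same text occupies in a given encoding.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module LengthDemo
    Sub Main()
        ' Three times 'a' followed by Hiragana 'ka' (U+304B).
        Dim s As String = "aaa" & ChrW(&H304B)

        Console.WriteLine(s.Length)                          ' 4 (characters)
        Console.WriteLine(Encoding.UTF8.GetByteCount(s))     ' 6 (1 + 1 + 1 + 3 bytes)
        Console.WriteLine(Encoding.Unicode.GetByteCount(s))  ' 8 (2 bytes per character in UTF-16)
    End Sub
End Module
```

In a double-byte code page such as Shift-JIS the same text would take 5 bytes (one per 'a', two for the kana), which is where a "4 versus 5" difference can come from.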
Sep 3 '07 #2
blumen
9
Thanks, Robbie.

I know that it has something to do with Unicode, but I don't know how to get around this in my code.

So, it is like this:
I have one text file with almost 40,000 rows and 50 columns.
One of these columns contains some strange characters,
but they show up only in some rows, not in all rows of the text file. That makes the length of the rows differ.

As I said before, I'm working on reading, parsing and saving the text file into SQL Server.
First I read the text file line by line. Then I read and save the data column by column for every line, using Substring.
Simply put, I rely on the length of every column to extract the data and save it to the database.

For example:

The column:
ItemCode ItemDescription InvoiceNo

The Data :
CV1025 HandkerchiefRED SX100 --> no strange character
SC22254 Leather Purse Orange U SC452 --> with strange character

Let's say the length of column ItemCode is 10,
the length of column ItemDescription is 15 (this one sometimes contains strange characters),
and the length of column InvoiceNo is 10.


But because the length differs in some rows, the function I wrote to read the data column by column no longer works properly.

In a row without strange characters I get the data exactly as it appears in the text file, column by column:

column ItemCode : CV1025
column ItemDescription : HandkerchiefRED
column InvoiceNo : SX100


But in rows with the strange characters, I get the data like this:

column ItemCode : SC22254
column ItemDescription : Leather Purse Orange (the strange character is missing)
column InvoiceNo : 452 (the SC is missing)

and it can't be saved into the database.
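
The symptom described above can be reproduced in a few lines. This is a sketch with made-up data: `##` stands in for the two strange characters, and `Remove` simulates them being lost while the file is read (an assumption about the cause, not something the post confirms). The column widths 10, 15 and 10 are the ones given in the example.

```vbnet
Imports System

Module ShiftDemo
    Sub Main()
        ' ItemCode (10) + ItemDescription (15, ending in two placeholder chars) + InvoiceNo (10).
        Dim intact As String = "SC22254".PadRight(10) & "Leather Purse" & "##" & "SC452".PadRight(10)

        ' Simulate the two characters disappearing: everything after them moves left by 2.
        Dim damaged As String = intact.Remove(23, 2)

        Console.WriteLine(intact.Length)    ' 35
        Console.WriteLine(damaged.Length)   ' 33

        ' The same fixed offset now lands two characters too late.
        Console.WriteLine("[" & intact.Substring(25, 5) & "]")    ' [SC452]
        Console.WriteLine("[" & damaged.Substring(25, 5) & "]")   ' [452  ]
    End Sub
End Module
```

Every column after the shortened one is shifted the same way, which is why a row that is two characters short loses the first two characters of InvoiceNo.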

I hope the problem is clearer now.

Can anyone help me?

thanks...
Sep 3 '07 #3
Robbie
180 100+
Sorry, I can't help with SQL Server at all. It means nothing to me.
But I think that your function would only fail if different parts of .NET are working differently. For example, getting the length of a piece of text measures in characters (a Unicode character is counted as 1), but using Substring to pick out some text measures in bytes (a Unicode character is counted as 2).
Or something to that effect.
That's all I can do, only hint towards where I think the problem is; I don't know how to solve it. T_T
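
For what it's worth, a quick check suggests that in .NET both `String.Length` and `String.Substring` work in characters (UTF-16 code units), not bytes, so a mismatch between those two is unlikely to be the cause. A minimal sketch, with a made-up module name:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SubstringDemo
    Sub Main()
        ' "AB", then Hiragana 'ka' (U+304B), then "CD".
        Dim ka As String = ChrW(&H304B).ToString()
        Dim s As String = "AB" & ka & "CD"

        Console.WriteLine(s.Length)                  ' 5
        Console.WriteLine(s.Substring(2, 1) = ka)    ' True: the kana is one position
        Console.WriteLine(s.Substring(3, 2))         ' CD
    End Sub
End Module
```

If the row length changes, the characters have most likely already been altered by the time the line arrives as a `String`, that is, while the file is being read.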
Sep 3 '07 #4
QVeen72
1,445 Expert 1GB
Hi,

Looking at your last post, I noticed you are using a space character as the separator between columns. If a field value contains a space, you may not be able to parse/read it properly. So why not use some other character that won't appear in the data to separate the columns in the text file, say Chr(165) or Chr(166) or a Tab,
or a semicolon or an asterisk?
Another way out is to use fixed-length strings, say chars 1 to 10 contain the item code, 11 to 50 contain the item description, and so on.
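
A minimal sketch of the separator idea, assuming the producing application could be changed to write Tab-delimited rows (the sample row is made up):

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitDemo
    Sub Main()
        ' A Tab-separated row: spaces inside a field no longer matter.
        Dim line As String = "SC22254" & vbTab & "Leather Purse Orange" & vbTab & "SC452"

        Dim fields() As String = line.Split(ControlChars.Tab)
        For Each field As String In fields
            Console.WriteLine("[" & field & "]")
        Next
        ' Prints [SC22254], [Leather Purse Orange] and [SC452] on separate lines.
    End Sub
End Module
```

With a delimiter, the length of any single field stops affecting where the next one starts.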

Regards
Veena
Sep 3 '07 #5
blumen
9
Thanks anyway, Robbie..

I hope that others can give me solutions too..


Thanks in advance
Sep 3 '07 #6
blumen
9
Thanks, Veena.
I think I'm already using fixed-length strings:

Dim itemDescription As String = l.Substring(85, 50)
Dim invoiceNo As String = l.Substring(135, 10)

I start reading the item description at position 85 with length 50,
and the invoiceNo begins at position 135.
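
As a side note, `Substring` throws an `ArgumentOutOfRangeException` when a row is shorter than the offsets expect. A small guard can make that case visible instead of fatal. This is only a sketch: `GetField` is a hypothetical helper, and the sample line is invented to match the offsets 85/50 and 135/10 above.

```vbnet
Imports System

Module FixedWidthDemo
    ' Hypothetical helper: returns the trimmed field at [start, start + width),
    ' clamped to whatever part of the line actually exists.
    Function GetField(ByVal line As String, ByVal start As Integer, ByVal width As Integer) As String
        If start >= line.Length Then Return String.Empty
        Dim available As Integer = Math.Min(width, line.Length - start)
        Return line.Substring(start, available).Trim()
    End Function

    Sub Main()
        ' 85 filler characters, a 50-wide description, then an unpadded invoice number.
        Dim line As String = New String(" "c, 85) & "RACK MOLDING (R)".PadRight(50) & "SX100"

        Console.WriteLine(GetField(line, 85, 50))    ' RACK MOLDING (R)
        Console.WriteLine(GetField(line, 135, 10))   ' SX100 (only 5 characters are left)
        Console.WriteLine(GetField(line, 200, 10) = String.Empty)   ' True
    End Sub
End Module
```

This does not fix shifted columns; it only keeps a short row from crashing the import.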

How can I fix this problem?

thanks
Sep 3 '07 #7
QVeen72
1,445 Expert 1GB
Hi,

Can you post the code (reading and writing the text file)? How are you populating the text file? If you are using fixed length, are you padding each field value with the proper number of spaces when writing to the text file? That will make it easier to help you.

Regards
Veena
Sep 3 '07 #8
blumen
9
Actually I don't generate the text file. It's generated by another application.
I only use the text file as an input file for my application.
So I upload the file, open it, read it and then work with it.

Here's the code to open the file:

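Private Sub textfile()
    ' Requires Imports System.IO. Note: File.OpenText always decodes the file as UTF-8.
    Dim sr As StreamReader = File.OpenText(Me.txtFile.Text)
    Try
        Dim read As String = sr.ReadLine()
        While read IsNot Nothing
            Dim x = parseLineTextfile(read)
            Try
                ' I open the connection and execute the query here
            Finally
                ' I close the connection
            End Try
            read = sr.ReadLine()
        End While
    Finally
        ' Release the file handle even if parsing or saving fails
        sr.Close()
    End Try
End Sub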


and here's the code to parse the line:
Private Function parseLineTextfile(ByVal l As String)
    If l.Length <= 0 Then
        Return Nothing
    End If

    ' Each column is cut out by its fixed start position and length
    Dim year As String = l.Substring(0, 4)
    '
    '
    '
    Dim itemCode As String = l.Substring(70, 10)
    Dim itemDescription As String = l.Substring(80, 50)
    Dim InvoiceNo As String = l.Substring(130, 10)
    '
    '
    '

    ' after that I save the data into the database using parameters

End Function

As I said, the problem is that the strange characters change the length of the rows, so the parsing code no longer works properly and I can't get the right data anymore.

So, any suggestions?
And because there's no separator like Tab or | or anything else, I can't use the Split function to cut the data into columns. I can only count the start position and the length of each column. Does anyone have other ideas?
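
One possible explanation, offered as an assumption rather than a diagnosis: `File.OpenText` always decodes the file as UTF-8, so bytes written in another encoding (on a Japanese system, often Shift-JIS, Windows code page 932) may be dropped or replaced while reading, which changes the row length. A minimal sketch of reading with an explicit encoding instead; the module and function names are made up, and code page 932 is a guess about this particular file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ReadWithEncoding
    ' Reads all lines of a text file using the given encoding.
    Function ReadLines(ByVal path As String, ByVal enc As Encoding) As List(Of String)
        Dim lines As New List(Of String)
        Using sr As New StreamReader(path, enc)
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                lines.Add(line)
                line = sr.ReadLine()
            End While
        End Using
        Return lines
    End Function

    Sub Main(ByVal args() As String)
        If args.Length = 0 Then
            Console.WriteLine("usage: ReadWithEncoding <path>")
            Return
        End If

        ' Assumption: the file is Shift-JIS. This works out of the box on .NET Framework;
        ' newer .NET versions need the code pages encoding provider registered first.
        Dim enc As Encoding = Encoding.GetEncoding(932)

        For Each line As String In ReadLines(args(0), enc)
            Console.WriteLine(line.Length)   ' every row should now report the same length
        Next
    End Sub
End Module
```

If the lengths still differ, the guess about the encoding is wrong and another one (for example `Encoding.Default`) would be the next thing to try.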

thanks....
Sep 4 '07 #9
QVeen72
1,445 Expert 1GB
Hi,

What type of Strange Chars do you have? Can you post some text lines which contain these Characters..?

Regards
Veena
Sep 4 '07 #10
blumen
9
Here are two rows of data:

200202ML11 SM01 GUDANG WM05 PART ASSY V844930 RACK MOLDING (R) CLP-120/130/150/170 Y20020228693374 PC 288.0000 2.0325 585.36 KZZ

--> without strange characters --> length 208


200202ML11 SM01 GUDANG WM05 PART ASSY V845560 CUSHION 380X7XT4 ¸Û Y20020206729872 PC 50.0000 0.0241 1.21 KZZ
--> with strange characters ( ¸Û ) --> length 206

For my database I'm using varchar.

How can I count this strange character as 1 character, not as 2 bytes?
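
A hedged observation: if the two characters shown as ¸Û are the bytes 0xB8 0xDB displayed as Latin-1, then in Shift-JIS they would be the half-width katakana ｸﾛ, one byte each. Fixed-width exports are also often laid out by byte rather than by character. Under that assumption, columns can be cut from the row's bytes and decoded afterwards. The sketch below uses UTF-8 so it runs anywhere; `FieldByBytes` is a hypothetical helper and the sample row is invented.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module ByteFieldDemo
    ' Hypothetical helper: decodes the field occupying bytes [start, start + width),
    ' clamped to the bytes that are actually present.
    Function FieldByBytes(ByVal lineBytes As Byte(), ByVal start As Integer, _
                          ByVal width As Integer, ByVal enc As Encoding) As String
        If start >= lineBytes.Length Then Return String.Empty
        Dim count As Integer = Math.Min(width, lineBytes.Length - start)
        Return enc.GetString(lineBytes, start, count).Trim()
    End Function

    Sub Main()
        Dim enc As Encoding = Encoding.UTF8
        Dim ka As String = ChrW(&H304B).ToString()   ' Hiragana 'ka', 3 bytes in UTF-8

        ' First column is 5 bytes wide ('a' + kana + space), second column is "XY".
        Dim lineBytes As Byte() = enc.GetBytes("a" & ka & " XY")

        Console.WriteLine(lineBytes.Length)                           ' 7
        Console.WriteLine(FieldByBytes(lineBytes, 0, 5, enc) = "a" & ka)   ' True
        Console.WriteLine(FieldByBytes(lineBytes, 5, 2, enc))         ' XY
    End Sub
End Module
```

Cutting by bytes only makes sense if the file really is byte-aligned; if a boundary falls inside a multi-byte character, the decoded field will contain a replacement character.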

thanks...
Sep 4 '07 #11
QVeen72
1,445 Expert 1GB
Hi,

Before you parse the string, you can build another string which excludes all the special chars, something like this:

Public Function RemoveSpChar(ByVal TempStr As String) As String
    Dim i As Integer
    Dim NewStr As String
    Dim TStr As String
    Dim TAsc As Integer
    NewStr = ""
    For i = 1 To Len(TempStr)
        TStr = Mid(TempStr, i, 1)
        TAsc = Asc(TStr)
        If TAsc >= 65 And TAsc <= 122 Then
            ' A to Z and a to z (plus the punctuation between them)
        ElseIf TAsc >= 32 And TAsc <= 57 Then
            ' space, punctuation and digits
        Else
            ' special char: drop it
            TStr = ""
        End If
        NewStr = NewStr & TStr
    Next
    RemoveSpChar = NewStr
End Function

Maybe you can check for a few more ASCII codes and zap the char if it doesn't match the codes you allow.
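
A variation on the same idea, sketched under the assumption that the special characters have survived reading: replacing each one with a placeholder instead of removing it keeps the row width unchanged, so the fixed offsets still line up. `MaskNonAscii` is a hypothetical name, and the allowed range 32 to 126 (printable ASCII) is a choice made for this example.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module MaskDemo
    ' Replaces every character outside printable ASCII (32..126) with the placeholder,
    ' so the result has exactly as many characters as the input.
    Function MaskNonAscii(ByVal text As String, ByVal placeholder As Char) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Dim code As Integer = AscW(c)
            If code >= 32 AndAlso code <= 126 Then
                sb.Append(c)
            Else
                sb.Append(placeholder)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' "AB", two half-width katakana (U+FF78, U+FF9B), then "CD".
        Dim s As String = "AB" & ChrW(&HFF78) & ChrW(&HFF9B) & "CD"
        Dim masked As String = MaskNonAscii(s, "?"c)

        Console.WriteLine(masked)          ' AB??CD
        Console.WriteLine(masked.Length)   ' 6, same as s.Length
    End Sub
End Module
```

The original characters are lost this way, so it only suits cases where the description text does not need to be stored exactly.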


REgards
Veena
Sep 4 '07 #12
blumen
9
thanks..Veena..
I'll try the code now..

Li
Sep 5 '07 #13
blumen
9
Veena, it doesn't work.
I can't catch the strange character.

But thanks anyway. I'm still trying to find the solution.
Sep 5 '07 #14
blumen
9
Veena....I've got it..
thanks a lot for your time and suggestion..:)

Li
Sep 5 '07 #15
