Hi... about some strange characters in a text file

Hi all,
I'm a newbie in VB.NET programming.

I hope some of you can help me solve this.

I'm trying to read and parse a text file and save it into SQL Server.
The text file contains thousands of rows with about 50 columns in every row.

Everything went well until I found one text file with some strange characters. They seem to be Japanese characters (the text file belongs to a Japanese company).

The problem is that not all rows in this file have these strange characters, and because of them the parse function can't work properly (I'm using Substring to read every column),
since the length of the row becomes different.
The length of a row with the strange characters is 208. The length without them is only 206.

Can someone please tell me how to fix this kind of problem?

Thanks for any suggestions and answers.
Sep 3 '07 #1
Hi. I don't know much about .NET programming at all, but I know a lot about dealing with such Unicode characters in VB6, and maybe .NET is similar in this respect?

A single one of these Unicode characters is made of two bytes.
If, for example, you ask VB to give you the length of
"aaaか" (three of the letter 'a', then a Hiragana 'ka'; I don't know if that will come out right on these forums)
using Len(), VB will say that it is 4 characters long, but if you use LenB() it will say it is 5 long (5 bytes).

I don't understand your problem, though.
You say that something isn't parsing properly, and you also seem to think VB is wrong to count the character as 2.

Well, I've tried to explain why the length drops by 2 when you remove the Unicode character, but I don't understand what you mean about the parsing, sorry. -_-;

EDIT: So basically, if .NET has different functions that work on strings by 'number of characters' and by 'number of bytes', as VB6 does, maybe you need to try the different functions to find ones that work? (Yep, I'm still not understanding >__<)
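
For the .NET side of this, here is a minimal sketch (not from the thread; the module name is made up for illustration). A .NET String is stored as UTF-16, Length counts Char values rather than bytes, and a byte count only exists relative to an encoding you choose. In a double-byte code page such as Shift-JIS the same text would take 5 bytes, which is the figure mentioned above.

```vbnet
Imports System
Imports System.Text

Module LengthVersusBytes
    Sub Main()
        ' "aaa" followed by Hiragana "ka" (U+304B), built with ChrW so the
        ' source file's own encoding does not matter.
        Dim s As String = "aaa" & ChrW(&H304B)

        Console.WriteLine(s.Length)                           ' 4 characters
        Console.WriteLine(Encoding.UTF8.GetByteCount(s))      ' 6 bytes: 3 + 3
        Console.WriteLine(Encoding.Unicode.GetByteCount(s))   ' 8 bytes: 4 x 2 (UTF-16)
    End Sub
End Module
```

So the "number of characters" functions are the String members themselves, and the "number of bytes" functions live on System.Text.Encoding.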
Sep 3 '07 #2

I know it has something to do with Unicode, but I don't know how to get around it in my code.

So, it is like this:
I have one text file with almost 40,000 rows and 50 columns.
One of these columns contains some strange characters,
but they show up only in some rows, not in all rows of the text file. That makes the rows differ in length.

As I said before, I'm trying to read and parse the text file and save it into SQL Server.
First I read the text file line by line. Then I read and save the data column by column for every line, using Substring.
Simply put, I rely on the length of every column to extract the data and save it to the database.

For example:

The columns:
ItemCode ItemDescription InvoiceNo

The data:
CV1025 HandkerchiefRED SX100 --> no strange character
SC22254 Leather Purse Orange U SC452 --> with strange character

Let's say the length of the ItemCode column is 10,
the length of the ItemDescription column is 15 (this one sometimes contains strange characters),
and the length of the InvoiceNo column is 10.

But because the length differs in some rows, the function I wrote to read the data column by column no longer works properly.

In a row without strange characters I get the data exactly as it is in the text file, column by column:

column ItemCode : CV1025
column ItemDescription : HandkerchiefRED
column InvoiceNo : SX100

But in rows with the strange character, I get the data like this:

column ItemCode : SC22254
column ItemDescription : Leather Purse Orange (the strange character is missing)
column InvoiceNo : 452 (the SC is missing)

and it can't be saved into the database.

I hope the problem is clearer now.

Can anyone help me?

Sep 3 '07 #3
Sorry, I can't help with SQL Server at all. It means nothing to me.
But I think your function would only fail if different parts of .NET work differently. For example, getting the length of a piece of text measures in characters (a Unicode character counts as 1), but using Substring to pick out some text measures in bytes (a Unicode character counts as 2).
Or something to that effect.
That's all I can do: hint at where I think the problem is. I don't know how to solve it. T_T
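
In .NET, Length and Substring both count Char values, never bytes, so they cannot disagree with each other. What can go wrong is a column layout that the exporting program defined in bytes: once a line is decoded, every multi-byte character makes the string shorter than the byte layout expects, and the later columns drift. A sketch of that effect, assuming UTF-8 and the 10/15/10 column widths from the earlier example (the record contents and all names are hypothetical):

```vbnet
Imports System
Imports System.Text

Module ColumnDrift
    ' Builds a sample record laid out in BYTES (UTF-8 is assumed here):
    ' item code = 10 bytes, description = 15 bytes, invoice number = 10 bytes.
    Function BuildSampleRecord() As Byte()
        Dim ka As Char = ChrW(&H304B)                 ' Hiragana "ka", 3 bytes in UTF-8
        Dim itemCode As String = "SC22254".PadRight(10)
        ' 6 ASCII bytes + 3 bytes for "ka" + 6 padding spaces = 15 bytes, 13 characters
        Dim description As String = "Purse " & ka & New String(" "c, 6)
        Dim invoiceNo As String = "SC452".PadRight(10)
        Return Encoding.UTF8.GetBytes(itemCode & description & invoiceNo)
    End Function

    Sub Main()
        Dim record As Byte() = BuildSampleRecord()
        Dim line As String = Encoding.UTF8.GetString(record)

        Console.WriteLine(record.Length)   ' 35 bytes
        Console.WriteLine(line.Length)     ' 33 characters

        ' Byte offset 25 is where the invoice number starts, but character 25
        ' is already two positions into it, so the leading "SC" is lost.
        Console.WriteLine("[" & line.Substring(25, 5) & "]")   ' [452  ]
    End Sub
End Module
```

That reproduces the "452 (the SC is missing)" symptom from the post above, although whether the real file behaves this way depends on its actual encoding.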
Sep 3 '07 #4
Looking at your last post, I noticed you are using a space character as the separator between columns. If a field value contains a space, you may not be able to parse it properly. So why not use some other, rarely used character to separate the columns in the text file, say Chr(165), Chr(166), a tab, a semicolon, or an asterisk?
Another way out is to use fixed-length strings: say characters 1 to 10 contain the item code, 11 to 50 contain the item description, and so on.

Sep 3 '07 #5
Thanks anyway, Robbie.

I hope others can give me solutions too.

Thanks in advance.
Sep 3 '07 #6
Thanks, Veena.
I think I am already using fixed-length strings:

Dim itemDescription As String = l.Substring(85, 50)
Dim invoiceNo As String = l.Substring(135, 10)

I start reading the item description at position 85 with length 50,
and the invoice number begins at position 135.

How can I fix this problem?

Sep 3 '07 #7
Can you post the code (reading and writing the text file)? How are you populating the text file? If you are using fixed-length fields, are you padding each field value with the proper number of spaces when writing to the text file? That will make it easier to help you.

Sep 3 '07 #8
Actually, I don't generate the text file. It's generated by another application.
I only use the text file as an input file in my application.
So I upload the file, open it, read it, and then work with it.

Here's the code to open the file:

Private Sub textfile()
    ' File.OpenText assumes the file is UTF-8 encoded
    Using sr As StreamReader = File.OpenText(Me.txtFile.Text)
        Dim read As String = sr.ReadLine()
        While read IsNot Nothing
            Dim x = parseLineTextfile(read)
            ' I open the connection and execute the query here
            ' I close the connection
            read = sr.ReadLine()
        End While
    End Using
End Sub

and here's the code to parse the line:

Private Function parseLineTextfile(ByVal l As String)
    If l.Length <= 0 Then
        Return Nothing
    End If

    ' Substring(start, length) counts characters, starting at 0
    Dim year As String = l.Substring(0, 4)
    Dim itemCode As String = l.Substring(70, 10)
    Dim itemDescription As String = l.Substring(80, 50)
    Dim InvoiceNo As String = l.Substring(130, 10)

    ' after that I save the data into the database using parameters

End Function

As I said, the problem is that the strange characters change the length of the rows, so the parsing code no longer works properly and I can't get the right data.

So, any suggestions?
Because there's no separator like a tab or | or anything else, I can't use the Split function to cut the data into columns. I can only count the start position and the length of each column. Does anyone have other ideas?
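
One possible way around the counting problem, sketched under the assumption that the column positions are byte offsets in the file's original encoding: convert each line back to bytes and cut the fields by byte position before decoding them. FieldByBytes is a hypothetical helper, the sample record is invented, and UTF-8 only stands in for the real encoding, which is not stated in this thread.

```vbnet
Imports System
Imports System.Text

Module FixedWidthByBytes
    ' Hypothetical helper: decodes the field that occupies `count` bytes
    ' starting at byte `offset` of an encoded record, then trims the padding.
    Function FieldByBytes(ByVal record As Byte(), ByVal offset As Integer,
                          ByVal count As Integer, ByVal enc As Encoding) As String
        If offset >= record.Length Then Return String.Empty
        ' Never read past the end of a short record.
        Dim available As Integer = Math.Min(count, record.Length - offset)
        Return enc.GetString(record, offset, available).TrimEnd()
    End Function

    Sub Main()
        ' Assumption for this demo only: the file is UTF-8.
        Dim enc As Encoding = Encoding.UTF8

        ' Invented record: 10-byte item code, 15-byte description, 10-byte invoice number.
        Dim line As String = "SC22254".PadRight(10) &
                             "Purse " & ChrW(&H304B) & New String(" "c, 6) &
                             "SC452".PadRight(10)
        Dim record As Byte() = enc.GetBytes(line)

        Console.WriteLine(FieldByBytes(record, 0, 10, enc))    ' SC22254
        Console.WriteLine(FieldByBytes(record, 10, 15, enc))   ' description, multi-byte character intact
        Console.WriteLine(FieldByBytes(record, 25, 10, enc))   ' SC452
    End Sub
End Module
```

If an offset lands in the middle of a multi-byte character, the decoder substitutes replacement characters, so the offsets have to match the layout the exporting application actually used.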

Sep 4 '07 #9
What type of strange characters do you have? Can you post some text lines that contain these characters?

Sep 4 '07 #10
Here are two rows of data:

200202ML11 SM01 GUDANG WM05 PART ASSY V844930 RACK MOLDING (R) CLP-120/130/150/170 Y20020228693374 PC 288.0000 2.0325 585.36 KZZ

--> without strange characters --> length 208

200202ML11 SM01 GUDANG WM05 PART ASSY V845560 CUSHION 380X7XT4 łŘ Y20020206729872 PC 50.0000 0.0241 1.21 KZZ
--> with strange characters ( łŘ ) --> length 206

For my database I'm using varchar.

How can I count this strange character as 1 character, not as 2 bytes?
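
How many characters a byte sequence turns into depends on the encoding used to read it. File.OpenText assumes UTF-8, while the StreamReader constructor also accepts an explicit Encoding. The sketch below is only an illustration with a temporary file: the same bytes are read once with the encoding they were written in and once with a wrong one (ISO-8859-1, chosen arbitrarily). ReadFirstLine is a made-up helper, and the encoding of the real file is not given in this thread.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module DecodeWithExplicitEncoding
    ' Hypothetical helper: reads the first line of a file with the given encoding.
    Function ReadFirstLine(ByVal fileName As String, ByVal enc As Encoding) As String
        Using sr As New StreamReader(fileName, enc)
            Return sr.ReadLine()
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        Dim original As String = "aaa" & ChrW(&H304B)   ' "aaa" plus Hiragana "ka"

        ' Write the raw UTF-8 bytes (GetBytes adds no byte order mark).
        File.WriteAllBytes(tempFile, Encoding.UTF8.GetBytes(original))

        Dim decodedOk As String = ReadFirstLine(tempFile, Encoding.UTF8)
        Dim decodedWrong As String = ReadFirstLine(tempFile, Encoding.GetEncoding("iso-8859-1"))

        Console.WriteLine(decodedOk.Length)      ' 4: "ka" is one character
        Console.WriteLine(decodedWrong.Length)   ' 6: each of its 3 bytes became a character

        File.Delete(tempFile)
    End Sub
End Module
```

The point is only that the character count follows from the decoder: with the encoding the exporting application really used, one source character should come back as one Char (or a surrogate pair for rare characters).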

Sep 4 '07 #11
Before you parse the string, you can build another string that excludes all the special characters, something like this:

Public Function RemoveSpChar(ByVal TempStr As String) As String
    Dim i As Integer
    Dim NewStr As String
    Dim TStr As String
    Dim TAsc As Integer
    NewStr = ""
    For i = 1 To Len(TempStr)
        ' Take one character at a time and look at its character code
        TStr = Mid(TempStr, i, 1)
        TAsc = Asc(TStr)
        If TAsc >= 65 And TAsc <= 122 Then
            ' A to Z and a to z (plus the few symbols between them): keep
        ElseIf TAsc >= 32 And TAsc <= 57 Then
            ' space, punctuation and digits: keep
        Else
            ' special character: drop it
            TStr = ""
        End If
        NewStr = NewStr & TStr
    Next
    RemoveSpChar = NewStr
End Function

Maybe you can check for a few more ASCII codes and drop the character if it falls outside the range you want to keep.

Sep 4 '07 #12
I'll try the code now..

Sep 5 '07 #13
Veena, it doesn't work.
I can't catch the strange character.

But thanks anyway. I'm still trying to find the solution.
Sep 5 '07 #14
Veena, I've got it.
Thanks a lot for your time and suggestions. :)

Sep 5 '07 #15
