Bytes IT Community

Strange characters in a text file

P: 9
Hi all,
I'm a newbie in VB.NET programming.

I hope some of you can help me solve this.

I'm working on reading, parsing, and saving a text file into SQL Server.
The text file contains thousands of rows with about 50 columns in every row.

Everything went well until I found one text file with some strange characters. They seem to be Japanese characters (the text file belongs to a Japanese company).

The problem is this:
Not all rows in this file have these strange characters, and because of them the parse function can't work properly (I'm using Substring to read every column),
because the length of the row becomes different.
The length of a row with the strange characters is 208. The length without them is only 206.

Can someone please tell me how to fix this kind of problem?

Thanks for any suggestions and answers.
Sep 3 '07 #1
14 Replies


Robbie
100+
P: 180
Hi. I do not know about .NET programming much at all, but I know a lot about dealing with such Unicode characters in VB6, and maybe .NET is similar in this respect...?

In a double-byte encoding such as Shift-JIS, a single one of these Japanese characters is made of two bytes.
If, for example, you ask VB to give you the length of...
"aaaか" (3 of letter 'a', then a Hiragana 'ka' - don't know if that'll come out alright on these forums)
...using Len(), then VB will say that it is 4 characters long, but counted in bytes in such an encoding it is 5 bytes long. (LenB() on the VB6 string itself reports 8, because VB6 stores strings internally as 2 bytes per character.)
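
For comparison, here is a minimal sketch of the same character-versus-byte distinction in VB.NET rather than VB6. The module name LengthDemo is made up for illustration, and UTF-16 and UTF-8 are used only because they are always available in .NET. In Shift-JIS the same text would take 5 bytes, but that encoding may need to be registered separately on newer .NET versions.

```vbnet
Imports System
Imports System.Text

Module LengthDemo
    Sub Main()
        ' Three 'a' characters followed by Hiragana 'ka' (U+304B)
        Dim s As String = "aaaか"

        ' String.Length counts UTF-16 code units, so 'ka' counts as 1
        Console.WriteLine(s.Length)                         ' 4

        ' Byte counts depend entirely on the encoding chosen
        Console.WriteLine(Encoding.Unicode.GetByteCount(s)) ' 8 (UTF-16: 2 bytes per code unit)
        Console.WriteLine(Encoding.UTF8.GetByteCount(s))    ' 6 (1 byte per 'a', 3 for 'ka')
    End Sub
End Module
```

So in .NET the string length is already measured in characters, and a byte count only appears once a specific encoding is applied.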

I don't quite understand your problem, though.
You say that something isn't parsing properly, and you also seem to think VB is wrong to count the row as 2 characters longer.

Well, I've tried to explain why it says it's 2 characters less when you remove the Unicode character, but I don't understand what you mean about the parsing, sorry. -_-;

EDIT: So basically, if there are different functions in .NET which let you deal with strings of characters by the 'number of characters' and 'number of bytes', as there are in VB6, maybe you need to play around with the different functions to find ones which work...? (Yep, I'm still not understanding >__<)
Sep 3 '07 #2

P: 9
Thanks, Robbie.

I know that it has something to do with Unicode, but I don't know how to get around it in my code.

So, it is like this:
I have one text file with almost 40,000 rows and 50 columns.
One of these columns contains some strange characters,
but they show up only in some rows, not in all rows of the text file. That makes the lengths of the rows differ.

As I said before, I'm working on reading, parsing, and saving the text file into SQL Server.
First I read the text file line by line. After that I try to read and save the data column by column in every line. Here, I'm using Substring to do it.
Simply put, I'm counting the length of every column to get the data and save it to the database.

For example:

The columns:
ItemCode   ItemDescription          InvoiceNo

The data:
CV1025     HandkerchiefRED          SX100   --> no strange character
SC22254    Leather Purse Orange U   SC452   --> with strange character

Let's say the length of column ItemCode is 10,
the length of column ItemDescription is 15 (this one sometimes contains strange characters),
and the length of column InvoiceNo is 10.


But because the length differs in some rows, the function I've made to read the data column by column no longer works properly.

In a row without strange characters I get the data exactly as it is in the text file, column by column:

column ItemCode : CV1025
column ItemDescription : HandkerchiefRED
column InvoiceNo : SX100


But in rows with the strange character, I get the data like this:

column ItemCode : SC22254
column ItemDescription : Leather Purse Orange (the strange character is missing)
column InvoiceNo : 452 (the SC is missing)

and it can't be saved into the database.
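
The drift described above can be reproduced with a small sketch. It assumes the file's columns are fixed in bytes and that every non-ASCII character takes 2 bytes (as in Shift-JIS), which the thread does not confirm. The extra Qty column, the sample text, and the names PadToByteWidth, BuildRow, and InvoiceByChars are all made up for illustration.

```vbnet
Imports System

Module ShiftDemo
    ' Hypothetical helper: pads text with spaces to a fixed width measured in bytes,
    ' assuming every non-ASCII character occupies 2 bytes.
    Function PadToByteWidth(text As String, width As Integer) As String
        Dim used As Integer = 0
        For Each c As Char In text
            used += If(Convert.ToInt32(c) < 128, 1, 2)
        Next
        Return text & New String(" "c, Math.Max(0, width - used))
    End Function

    ' Row layout in bytes: ItemCode (10), ItemDescription (15), InvoiceNo (10), Qty (6, right-aligned)
    Function BuildRow(code As String, desc As String, invoice As String, qty As String) As String
        Dim row As String = PadToByteWidth(code, 10) & PadToByteWidth(desc, 15)
        row = row & PadToByteWidth(invoice, 10) & qty.PadLeft(6)
        Return row
    End Function

    ' Reads InvoiceNo by character position, the way the Substring-based parser does
    Function InvoiceByChars(row As String) As String
        Return row.Substring(25, 10).Trim()
    End Function

    Sub Main()
        Dim plain As String = BuildRow("CV1025", "HandkerchiefRED", "SX100", "50")
        Dim mixed As String = BuildRow("SC22254", "Purse 赤色", "SC452", "288")

        Console.WriteLine(plain.Length)          ' 41
        Console.WriteLine(mixed.Length)          ' 39
        Console.WriteLine(InvoiceByChars(plain)) ' SX100
        Console.WriteLine(InvoiceByChars(mixed)) ' 452
    End Sub
End Module
```

Under these assumptions each double-byte character makes the decoded row one character shorter, so every Substring offset after it lands too far to the right. Here the offset is off by 2 positions, which matches the missing "SC".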

I hope the problem is clearer now.

Can anyone help me?

Thanks.
Sep 3 '07 #3

Robbie
100+
P: 180
Sorry, I can't help with SQL Server at all. It means nothing to me.
But I think that your function would only fail if different parts of .NET are working differently. For example, getting the length of a piece of text might measure in characters (a Unicode character is counted as 1), while using Substring to pick out some text might measure in bytes (a Unicode character is counted as 2).
Or something to that effect.
That's all I can do, only hint towards where I think the problem is; I don't know how to solve it. T_T
Sep 3 '07 #4

QVeen72
Expert 100+
P: 1,445
Hi,

Looking at your last post, I noticed you are using a space character as a separator between columns. If a field value contains a space, you may not be able to parse/read it properly. So why not use some other character that does not appear in the data to separate the columns in the text file, say Chr(165) or Chr(166) or a Tab,
or a semicolon or an asterisk?
Another way out is to use fixed-length strings: say characters 1 to 10 contain the item code, 11 to 50 contain the item description, and so on.

Regards
Veena
Sep 3 '07 #5

P: 9
Thanks anyway, Robbie.

I hope that others can give me solutions too.


Thanks in advance.
Sep 3 '07 #6

P: 9
Thanks, Veena.
I think I'm already using fixed-length strings:

Dim itemDescription As String = l.Substring(85, 50)
Dim invoiceNo As String = l.Substring(135, 10)

I begin reading the item description at position 85 with length 50,
and the invoiceNo begins at position 135.

How can I fix this problem?

Thanks.
Sep 3 '07 #7

QVeen72
Expert 100+
P: 1,445
Hi,

Can you post the code (reading and writing the text file)? How are you populating the text file? If you are using fixed length, are you padding the field values with the proper number of spaces when writing to the text file? Then it will be easier to help you.

Regards
Veena
Sep 3 '07 #8

P: 9
Actually, I don't generate the text file. It's generated by another application.
I only use the text file as an input file in my application.
So I upload the file, open it, read it, and then work with it.

Here's the code to open the file:

Imports System.IO   ' needed for StreamReader and File

Private Sub textfile()
    ' Using disposes the reader when the loop is done
    Using sr As StreamReader = File.OpenText(Me.txtFile.Text)
        Dim read As String = sr.ReadLine()
        While read IsNot Nothing
            Dim x = parseLineTextfile(read)
            read = sr.ReadLine()
            Try
                ' I open the connection and execute the query here
            Finally
                ' I close the connection
            End Try
        End While
    End Using
End Sub


And here's the code to parse the line:

Private Function parseLineTextfile(ByVal l As String) As Object
    ' Skip empty lines
    If l.Length <= 0 Then
        Return Nothing
    End If

    ' Each column is cut out by start position and length, counted in characters
    Dim year As String = l.Substring(0, 4)
    '
    '
    '
    Dim itemCode As String = l.Substring(70, 10)
    Dim itemDescription As String = l.Substring(80, 50)
    Dim InvoiceNo As String = l.Substring(130, 10)
    '
    '
    '

    ' After that I save the data into the database using parameters

End Function

As I said, the problem is that the strange characters change the length of the rows, so the parsing code no longer works properly and I can't get the right data.

So, any suggestions?
Because there's no separator like a Tab or | or anything else, I can't use the Split function to cut the data into columns. I can only count the starting position and the length of each column. Does anyone have other ideas?
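
One possible way around this, sketched under the assumption that the column widths in the file are byte widths: convert the row back to bytes with the file's encoding and cut the columns at byte offsets instead of character offsets. ColumnByBytes is a hypothetical helper, and the layout (10, 15, 10) is the simplified one from the earlier example, not the real 50-column file. The file's actual encoding is never stated in the thread. Shift-JIS is only a guess, and UTF-8 is used here just so the sketch runs anywhere.

```vbnet
Imports System
Imports System.Text

Module ByteSliceDemo
    ' Hypothetical helper: returns the column occupying "length" bytes starting at
    ' byte offset "start", decoded with the same encoding used to produce rowBytes.
    Function ColumnByBytes(rowBytes As Byte(), enc As Encoding, start As Integer, length As Integer) As String
        If start >= rowBytes.Length Then Return String.Empty
        Dim count As Integer = Math.Min(length, rowBytes.Length - start)
        Return enc.GetString(rowBytes, start, count).Trim()
    End Function

    Sub Main()
        ' Assumption: replace this with the encoding the file was really written in
        Dim enc As Encoding = Encoding.UTF8

        ' Build a sample row laid out in bytes: ItemCode (10), ItemDescription (15), InvoiceNo (10)
        Dim desc As String = "Purse 赤色"
        Dim descPad As Integer = 15 - enc.GetByteCount(desc)
        Dim row As String = "SC22254".PadRight(10) & desc & New String(" "c, descPad) & "SC452".PadRight(10)

        ' Encode once per row, then slice every column from the byte array
        Dim rowBytes As Byte() = enc.GetBytes(row)
        Console.WriteLine(ColumnByBytes(rowBytes, enc, 0, 10))  ' SC22254
        Console.WriteLine(ColumnByBytes(rowBytes, enc, 10, 15)) ' Purse 赤色
        Console.WriteLine(ColumnByBytes(rowBytes, enc, 25, 10)) ' SC452
    End Sub
End Module
```

For this to work on a real file, the reader has to decode with the same encoding, for example New StreamReader(path, enc) instead of File.OpenText, which reads the file as UTF-8. If a column boundary ever falls in the middle of a multi-byte character, the decoded text will contain replacement characters, so the offsets should be checked against real data.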

thanks....
Sep 4 '07 #9

QVeen72
Expert 100+
P: 1,445
Hi,

What type of Strange Chars do you have? Can you post some text lines which contain these Characters..?

Regards
Veena
Sep 4 '07 #10

P: 9
Here are two rows of data:

200202ML11 SM01 GUDANG WM05 PART ASSY V844930 RACK MOLDING (R) CLP-120/130/150/170 Y20020228693374 PC 288.0000 2.0325 585.36 KZZ

--> without strange characters --> length 208


200202ML11 SM01 GUDANG WM05 PART ASSY V845560 CUSHION 380X7XT4 Y20020206729872 PC 50.0000 0.0241 1.21 KZZ
--> with strange characters ( ) --> length 206

For my database I'm using varchar.

How can I count this strange character as 1 character, not as 2 bytes?
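
Before choosing a fix, it may help to see exactly which characters are involved. Below is a minimal sketch that lists every character outside 7-bit ASCII with its position and code point. FindNonAscii is a hypothetical helper and the sample line is invented, not taken from the file above. Note that a .NET String already counts such a character as 1 (one UTF-16 code unit, for characters in the Basic Multilingual Plane), so the 2-byte width exists only in the file's bytes.

```vbnet
Imports System
Imports System.Collections.Generic

Module NonAsciiDemo
    ' Hypothetical helper: returns "index:U+XXXX" for every character above code point 127
    Function FindNonAscii(line As String) As List(Of String)
        Dim hits As New List(Of String)()
        For i As Integer = 0 To line.Length - 1
            Dim code As Integer = Convert.ToInt32(line(i))
            If code > 127 Then
                hits.Add(i.ToString() & ":U+" & code.ToString("X4"))
            End If
        Next
        Return hits
    End Function

    Sub Main()
        ' Invented sample: Hiragana 'ka' sits at zero-based index 8
        For Each hit As String In FindNonAscii("CUSHION か 380X7XT4")
            Console.WriteLine(hit) ' 8:U+304B
        Next
    End Sub
End Module
```

Running something like this over the rows whose length differs would show where the odd characters sit and what they are.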

thanks...
Sep 4 '07 #11

QVeen72
Expert 100+
P: 1,445
Hi,

Before you parse the string, you can build another string which excludes all the special characters, something like this:

Public Function RemoveSpChar(ByVal TempStr As String) As String
    Dim i As Integer
    Dim NewStr As String
    Dim TStr As String
    Dim TAsc As Integer
    NewStr = ""
    ' Mid is 1-based, so walk positions 1 to Len
    For i = 1 To Len(TempStr)
        TStr = Mid(TempStr, i, 1)
        ' AscW returns the Unicode code point, independent of the system code page
        TAsc = AscW(TStr)
        If TAsc >= 65 And TAsc <= 122 Then
            ' A to Z and a to z (plus the punctuation between them)
        ElseIf TAsc >= 32 And TAsc <= 57 Then
            ' Space, punctuation, and digits 0 to 9
        Else
            ' Special character: drop it
            TStr = ""
        End If
        NewStr = NewStr & TStr
    Next
    RemoveSpChar = NewStr
End Function

Maybe you can check for a few more ASCII codes and drop the character if it doesn't fall within your allowed ranges.


Regards
Veena
Sep 4 '07 #12

P: 9
Thanks, Veena.
I'll try the code now.

Li
Sep 5 '07 #13

P: 9
Veena, it doesn't work.
I can't catch the strange character.

But thanks anyway. I'm still trying to find the solution.
Sep 5 '07 #14

P: 9
Veena, I've got it.
Thanks a lot for your time and suggestions. :)

Li
Sep 5 '07 #15
