
Problems with Replace Method

Hi, I am loading a CSV (Comma Separated Value) file into a RichTextBox. I have a routine that splits the data at each ","
and then copies the results into a ListBox. The data also contains some unusual characters that I am trying to
remove: the small a with two dots over it (ä) and the small y with two dots over it (ÿ). Here is my code so far to remove the small y:

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    'cleans the "," character out of a comma delimited string and outputs the results to a listbox
    Dim str As String = RichTextBox1.Text
    Dim x As String = Chr(255) 'Chr(255)
    Dim arr() As String = str.Split(","c)
    'Dim arr() As String = str.Split(Chr(255), c)
    For Each s As String In arr
        ' s = Replace(Chr(255), Chr(32),)
        For Each x In s
            s.Replace(x, Chr(32))
        Next
        ListBox1.Items.Add(s)
    Next
End Sub

The problem is that when I put a breakpoint on ListBox1.Items.Add(s) and step through the code as it goes through the data from the
RichTextBox, it shows the value of x as """" instead of the y with the two dots over it. Therefore, instead of
replacing the y with a space (Chr(32)), it does nothing! I have checked that I am using the right character code, and
that part is correct. (I tested it by setting a label's Text property to Chr(255), and it displayed correctly.)

I know that I am doing something wrong; I just cannot see what it is.

Any suggestions would be greatly appreciated.

james



Nov 21 '05 #1


On Fri, 12 Nov 2004 16:17:53 -0600, james wrote:
> s.Replace(x, Chr(32))

This line will not replace anything. Strings in .NET are immutable, so String.Replace returns a new string and
leaves s unchanged; you must assign the result back to the string:

s = s.Replace(x, Chr(32))
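A minimal console sketch of that point (ReplaceDemo and ReplaceYDiaeresis are illustrative names, not from the thread). It uses ChrW(&HFF) for ÿ; calling Replace without keeping its return value leaves the original string exactly as it was:

```vb
Imports System
Imports Microsoft.VisualBasic

Module ReplaceDemo
    ' Returns a copy of s in which every ÿ (U+00FF) is replaced by a space.
    Function ReplaceYDiaeresis(ByVal s As String) As String
        Return s.Replace(ChrW(&HFF), " "c)
    End Function

    Sub Main()
        Dim s As String = "ab" & ChrW(&HFF) & "cd"

        s.Replace(ChrW(&HFF), " "c)               ' returns a new string, which is discarded
        Console.WriteLine(s.IndexOf(ChrW(&HFF)))  ' prints 2: s still contains ÿ

        s = s.Replace(ChrW(&HFF), " "c)           ' the new string is assigned back to s
        Console.WriteLine("[" & s & "]")          ' prints [ab cd]
    End Sub
End Module
```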

--
Chris

dunawayc[AT]sbcglobal_lunchmeat_[DOT]net

To send me an E-mail, remove the "[", "]", underscores ,lunchmeat, and
replace certain words in my E-Mail address.
Nov 21 '05 #2

Chris, thanks for the suggestion; however, that just replaces all the text with spaces (" ").
What I am having a problem with is that x, which I define as Chr(255) (supposed to be the small y with
two dots over it), has an empty value instead of the intended value. I don't understand how x's value is being changed, as I
set it to one value and for whatever reason it becomes another.
I thought that maybe I was using the wrong Chr value for the little y with the two dots above it.
But I cannot find any reference in Help (VB.NET 2003) that shows what the character values are. I cannot find any table in the
docs like the one in my old VB6 reference library, which listed the ANSI character sets with the correct Chr numbers
that I can use to define the character I need to find and replace with Chr or ChrW. I think that is the problem:
since VB.NET uses Unicode for characters, it is not reading the correct value for x, even though I can display that
character using Chr(255) on a label and it shows correctly.
I'm getting cross-eyed (hard for a one-eyed guy to do) looking for this reference and the solution to what should be a simple
problem.
james
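A likely explanation for both symptoms, offered as a reading of the first routine rather than something confirmed in the thread: `For Each x In s` reuses x as the loop variable, so on every pass x is overwritten with the current character of s (a debugger value of """" would be a string holding a single quotation mark, such as the first character of a quoted CSV field). Once the Replace result is assigned back, each character in turn gets replaced, which blanks the whole field. The sketch below drops the inner loop; CleanFields is a hypothetical helper standing in for the button handler, and it uses ChrW(&HFF) rather than Chr(255) so the target character does not depend on the system's ANSI code page:

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module CleanDemo
    ' Hypothetical helper: splits comma-delimited text and replaces
    ' every ÿ (U+00FF) in each field with a space.
    Function CleanFields(ByVal text As String) As List(Of String)
        Dim target As String = ChrW(&HFF)  ' set once and never reused as a loop variable
        Dim result As New List(Of String)
        For Each field As String In text.Split(","c)
            ' No inner For Each over the characters: Replace already scans the
            ' whole field, and the returned copy is what gets stored.
            result.Add(field.Replace(target, " "))
        Next
        Return result
    End Function

    Sub Main()
        For Each f As String In CleanFields("Smith" & ChrW(&HFF) & ",Jones")
            Console.WriteLine("[" & f & "]")  ' prints [Smith ] then [Jones]
        Next
    End Sub
End Module
```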

Nov 21 '05 #4

Fixed it!!!!!!!!!
For anyone interested (besides me) here's what I did:

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    'cleans the "," character out of a comma delimited string and outputs the results to a listbox
    Dim str As String = RichTextBox1.Text
    Dim x As String = Chr(255)
    Dim arr() As String = str.Split(","c)
    For Each s As String In arr
        ' the routine below looks for Chr(255) in the string s and, if it's there, replaces it
        ' with a space; if it's not there, s is left unchanged
        If InStr(s, x) > 0 Then
            s = s.Replace(x, Chr(32))
        End If
        ListBox1.Items.Add(s)
    Next
End Sub


Nov 21 '05 #6

James,
I use Character Map under "Programs - Accessories - System Tools" on Windows XP.

If you select a character, you can look in the lower left-hand corner for the Unicode code point (which is displayed
in hex). For example, the Latin Small Letter Y With Diaeresis (ÿ) is U+00FF, which in VB.NET you can write as:

Const LatinSmallLetterYWithDiaeresis As Char = ChrW(&HFF)

It just happens that in the US (& most of Europe) the ANSI code point is the same: 255 or &HFF. However, for a number
of characters, such as the Left Double Quote (“), the code points are different.

Const LeftDoubleQuote As Char = ChrW(&H201C) ' Ansi = 147
Const RightDoubleQuote As Char = ChrW(&H201D) ' Ansi = 148

I almost always use ChrW & the Unicode code points, as Chr & the ANSI code points can be affected by the region
settings in Control Panel. ChrW(&HFF) will be ÿ no matter where you go, whereas Chr(&HFF) may be some other
character in other countries...
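A small console sketch that checks these code points, assuming .NET Core 3.0 / .NET 5 or later, where legacy code pages such as 1252 must first be registered through CodePagesEncodingProvider (on .NET Framework that call can simply be removed). AnsiByte is an illustrative helper name:

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CodePointDemo
    ' Returns the byte that Windows code page 1252 (US/Western European "ANSI") uses for c.
    Function AnsiByte(ByVal c As Char) As Integer
        ' Needed on .NET Core 3.0 / .NET 5+; registering the same provider again is a no-op.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Dim bytes As Byte() = Encoding.GetEncoding(1252).GetBytes(New Char() {c})
        Return bytes(0)
    End Function

    Sub Main()
        Const LatinSmallLetterYWithDiaeresis As Char = ChrW(&HFF)
        Const LeftDoubleQuote As Char = ChrW(&H201C)
        Const RightDoubleQuote As Char = ChrW(&H201D)

        Console.WriteLine(AscW(LatinSmallLetterYWithDiaeresis))     ' 255 (Unicode)
        Console.WriteLine(AnsiByte(LatinSmallLetterYWithDiaeresis)) ' 255 (ANSI): the same
        Console.WriteLine(AscW(LeftDoubleQuote))                    ' 8220 = &H201C (Unicode)
        Console.WriteLine(AnsiByte(LeftDoubleQuote))                ' 147 (ANSI): different
        Console.WriteLine(AnsiByte(RightDoubleQuote))               ' 148 (ANSI)
    End Sub
End Module
```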

Hope this helps
Jay

Nov 21 '05 #8

Thank you, Jay. I guess I have been spoiled by VB6's reference library (no, I did not throw it away), and now that I am using
VB.NET I find it hard to understand why the character sets are not listed in Help, where they can easily be found.
As you have probably seen from my previous posts in this thread, I am working on a utility to filter out
ANSI characters from a CSV file, and that part is working fine now. The problem I have run into now is that when I open the
CSV file with Access as a text file with "," as the delimiter, it parses the file just fine, but more of the extra characters
appear in some of the text in the DataGrid I display the data in.
I can use a program called TextPad to view the original file (the one I filtered), and it does not show the extra characters. I
can go into Access 2003 itself, build a new database and import that same file, and under the Advanced settings I can select
UTF-8 and Access will import the CSV file correctly, without the extra characters. But my utility has a function, which I am
working on, to build a new Access database from the CSV file's contents, and when I load the file into the DataGrid using this routine:

Private Sub btnImportCSV_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnImportCSV.Click
    DataGrid1.DataSource = Nothing
    Dim strFileName As String = ""
    Dim strFilePath As String = ""
    Dim sSlash As Integer
    Try
        With OpenFileDialog1
            .Title = "Import CSV file"
            '.InitialDirectory = "C:\"
            .Filter = "File (*.xls;*.csv;*.txt)|*.xls;*.csv;*.txt|All files (*.*)|*.*"
            .FileName = ""
            If .ShowDialog() <> System.Windows.Forms.DialogResult.OK Then Exit Sub ' user cancelled
            sSlash = InStrRev(.FileName, "\")
            strFilePath = Mid(.FileName, 1, sSlash)
            strFileName = Mid(.FileName, sSlash + 1, Len(.FileName))
        End With
        ' Jet's text driver treats the folder as the database and the file as the table.
        Dim strConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;" _
            & "Data Source=" & strFilePath & ";" _
            & "Extended Properties=""text;HDR=Yes;FMT=Delimited"""
        Dim conn As New OleDb.OleDbConnection(strConnectionString)
        conn.Open() ' Open connection with the database.
        Dim objCmdSelect As New OleDb.OleDbCommand("SELECT * FROM [" & strFileName & "]", conn)
        ' Create new OleDbDataAdapter that is used to build a DataSet based on the preceding SQL SELECT statement.
        Dim objAdapter1 As New OleDb.OleDbDataAdapter
        objAdapter1.SelectCommand = objCmdSelect 'Pass the Select command to the adapter.
        Dim objDataset1 As New DataSet 'Create new DataSet to hold information from the file.
        objAdapter1.Fill(objDataset1, "mytable") 'Fill the DataSet with the information from the file.
        DataGrid1.DataSource = objDataset1.Tables(0).DefaultView 'Bind the grid to the imported data.
        TextBox1.Text = objDataset1.Tables(0).Columns.Count.ToString()
        TextBox2.Text = objDataset1.Tables(0).TableName
        TextBox3.Text = objDataset1.Tables(0).Rows.Count.ToString()
        conn.Close() 'Clean up objects.
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
    DataGrid1.Visible = True
    DataGrid1.Refresh()
    Me.Text = "Importing CSV File: " & strFileName
End Sub

I get the extra characters displayed in the DataGrid. So I know that the extra characters are not actually in the new CSV file
that I saved, but are being displayed by Access in some way.

Any suggestions on what I may be doing wrong here would be appreciated.

james
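One avenue worth trying, offered as an assumption rather than a confirmed fix: the Jet text driver looks for a Schema.ini file in the same folder as the text file and uses it for that file's format, header row and character set. Microsoft's Schema.ini documentation lists ANSI and OEM for CharacterSet; numeric code pages such as 65001 (UTF-8, the option that worked in Access's own import) are commonly used too, but that should be verified against the installed Jet version. WriteSchemaIni is a hypothetical helper:

```vb
Imports System
Imports System.IO

Module SchemaIniDemo
    ' Hypothetical helper: writes Schema.ini into the CSV file's folder so the
    ' Jet text driver reads that file with an explicit character set.
    ' CAUTION: overwrites any Schema.ini already in that folder.
    ' characterSet is e.g. "ANSI", "OEM", or (assumed to work) "65001" for UTF-8.
    Function WriteSchemaIni(ByVal csvPath As String, ByVal characterSet As String) As String
        Dim iniPath As String = Path.Combine(Path.GetDirectoryName(csvPath), "Schema.ini")
        ' The section name is the file name used in "SELECT * FROM [...]";
        ' ColNameHeader=True matches HDR=Yes in the connection string.
        Dim lines() As String = New String() { _
            "[" & Path.GetFileName(csvPath) & "]", _
            "Format=CSVDelimited", _
            "ColNameHeader=True", _
            "CharacterSet=" & characterSet}
        File.WriteAllLines(iniPath, lines)
        Return iniPath
    End Function

    Sub Main()
        Dim folder As String = Path.Combine(Path.GetTempPath(), "SchemaIniDemo")
        Directory.CreateDirectory(folder)
        Dim ini As String = WriteSchemaIni(Path.Combine(folder, "export.csv"), "65001")
        For Each line As String In File.ReadAllLines(ini)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

With the file in place, the connection string in the routine above (Data Source set to the folder) should pick it up without further changes.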

Nov 21 '05 #10

I thought someone out there might like to see how I finally solved the Replace problem I had with odd
characters appearing in old data files (DOS-based Dataflex 3.1d .DAT files) that were converted to CSV files.
The "normal" character set that the old program used was 0 to 127. I think part of the problem was that Windows XP (and
Visual Basic .NET) use Unicode character sets.
I finally wrote a small routine to remove all characters from 128 to 255 and substitute Chr(32) <SPACE> for them.
That cleaned up the files I am working on, so much so that I can use Access 2002 to build new databases,
import the cleaned CSV files and get the data back the way I need it (even though I plan eventually to write a complete tool
that builds the new databases using ADOX).
So, here is the routine I used:
Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    'cleans the "," character out of a comma delimited string and outputs the results to a listbox
    ' later put back the "," when writing the cleaned file back so Access can import a "proper" Comma Separated Value file (CSV)
    'not in this routine but, in my save file routine
    Dim str As String = RichTextBox1.Text
    Dim arr() As String = str.Split(","c)
    For Each s As String In arr
        ' this replaces every character in the character set (128-255) with a space, Chr(32)
        Dim y As Int16
        Dim z As String
        For y = 128 To 255
            z = Chr(y)
            If InStr(s, z) > 0 Then
                s = s.Replace(z, Chr(32))
            End If
        Next
        ListBox1.Items.Add(s)
    Next
    'counts the fields processed (one ListBox item per field)
    Label1.Text = ListBox1.Items.Count.ToString()
End Sub
I searched quite a lot for this, and the inspiration for the actual routine came from one posted by Cor Ligthert some time ago;
I just modified it for my needs. Also, Chris Dunaway gave me some good pointers on how to use the Replace function properly. I
hope this post helps someone else who is looking to solve a similar problem.
james
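A single-pass alternative to the nested loop, sketched under the assumption that replacing every character above code point 127 with a space is the intended behavior. Unlike the loop over Chr(128) to Chr(255), it does not depend on the current ANSI code page, it also catches characters that code page cannot represent, and it visits each field once instead of scanning it 128 times. ReplaceNonAscii is an illustrative name:

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NonAsciiDemo
    ' Replaces every character above code point 127 (outside 7-bit ASCII)
    ' with a space, visiting each character once.
    Function ReplaceNonAscii(ByVal s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If AscW(c) > 127 Then
                sb.Append(" "c)
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim field As String = "Caf" & ChrW(&HE9) & " " & ChrW(&H201C) & "x"
        Console.WriteLine("[" & ReplaceNonAscii(field) & "]")  ' prints [Caf   x]
    End Sub
End Module
```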

Nov 21 '05 #12

James,
It really sounds like you & your program are confusing:

- ASCII (7-bit character encoding, 0 to 127), used in files.
- ANSI (8-bit character encoding, 0 to 255), used in files. Default for Notepad & most Windows utilities (including XP).
Uses code pages to distinguish characters in the 128 to 255 range, which generally means that a file can only use one
code page. Note that Windows code pages differ from DOS code pages.
- UTF-8 (variable-length encoding of Unicode characters in 8-bit units, 1 to 4 bytes per character), used in files;
the default for the .NET file I/O classes in the System.IO namespace.
- UTF-16 aka Unicode (encoding of Unicode characters in 16-bit units), used as the basis of System.Char & System.String.
Also available for Win32 API calls on Windows NT, 2000 & XP.
- UTF-32 (32-bit encoding of Unicode characters), not fully supported in .NET until 2.0 (VS.NET 2005, aka Whidbey),
due out later in 2005.
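To make the list concrete, here is a sketch that prints the bytes each encoding produces for the single character ÿ (U+00FF). It assumes .NET Core 3.0 / .NET 5 or later for the CodePagesEncodingProvider registration (remove that call on .NET Framework); HexBytes and GetCodePage are illustrative helper names. Code page 437, the original US DOS code page, is included to show that DOS and Windows code pages disagree:

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodingBytesDemo
    ' Returns the bytes enc produces for s as hex pairs, e.g. "C3 BF".
    Function HexBytes(ByVal enc As Encoding, ByVal s As String) As String
        Return BitConverter.ToString(enc.GetBytes(s)).Replace("-", " ")
    End Function

    ' Returns a legacy code page; registration is needed on .NET Core 3.0 / .NET 5+.
    Function GetCodePage(ByVal number As Integer) As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(number)
    End Function

    Sub Main()
        Dim y As String = ChrW(&HFF)  ' ÿ, U+00FF
        Console.WriteLine("ASCII:       " & HexBytes(Encoding.ASCII, y))      ' 3F ("?": no ÿ in ASCII)
        Console.WriteLine("ANSI 1252:   " & HexBytes(GetCodePage(1252), y))   ' FF
        Console.WriteLine("DOS 437:     " & HexBytes(GetCodePage(437), y))    ' 98
        Console.WriteLine("UTF-8:       " & HexBytes(Encoding.UTF8, y))       ' C3 BF
        Console.WriteLine("UTF-16 (LE): " & HexBytes(Encoding.Unicode, y))    ' FF 00
        Console.WriteLine("UTF-32 (LE): " & HexBytes(Encoding.UTF32, y))      ' FF 00 00 00
    End Sub
End Module
```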

I have not used Jet (Microsoft.Jet.OLEDB.4.0 aka Access) enough to know how
you specify the Encoding required to read a file correctly.

For further information on Encodings see :
http://www.yoda.arachsys.com/csharp/unicode.html

A couple of the links get you to character reference tables; however, due to
the size of the tables, I normally rely on System.Text.Encoding to convert
from one encoding to another.
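A sketch of that approach, assuming the exported files use a DOS code page; 437 is only a guess here (850 was also common in Western Europe), and the right value would have to be confirmed against the actual data. It also shows one possible, unconfirmed, source of stray ÿ characters: byte 0xFF is a non-breaking space in code page 437 but ÿ in code page 1252. Decode and ConvertToUtf8 are hypothetical helpers, and the RegisterProvider call assumes .NET Core 3.0 / .NET 5 or later:

```vb
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ConvertDemo
    ' Returns a legacy code page; the registration is needed on .NET Core 3.0 / .NET 5+
    ' and can be removed on .NET Framework.
    Private Function GetCodePage(ByVal codePage As Integer) As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(codePage)
    End Function

    ' Decodes raw bytes as the given code page (437 = US DOS, 1252 = US Windows "ANSI").
    Function Decode(ByVal data As Byte(), ByVal codePage As Integer) As String
        Return GetCodePage(codePage).GetString(data)
    End Function

    ' Hypothetical helper: reads a file written in sourceCodePage and saves it as UTF-8
    ' with a byte order mark.
    Sub ConvertToUtf8(ByVal sourcePath As String, ByVal targetPath As String, ByVal sourceCodePage As Integer)
        Dim text As String = Decode(File.ReadAllBytes(sourcePath), sourceCodePage)
        File.WriteAllText(targetPath, text, New UTF8Encoding(True))
    End Sub

    Sub Main()
        Dim raw As Byte() = New Byte() {&H41, &H98, &HFF}
        Dim asDos As String = Decode(raw, 437)
        Dim asAnsi As String = Decode(raw, 1252)
        Console.WriteLine(AscW(asDos.Chars(1)))   ' 255: byte 0x98 is ÿ in code page 437
        Console.WriteLine(AscW(asDos.Chars(2)))   ' 160: byte 0xFF is a non-breaking space in 437
        Console.WriteLine(AscW(asAnsi.Chars(2)))  ' 255: byte 0xFF is ÿ in code page 1252
    End Sub
End Module
```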
> I finally wrote a small routine to remove all characters from 128 to 255
> and substitute them with Chr(32) <SPACE>

Although it appears to work, I hope you agree that it really is not correct. I
would find out what the actual encoding of the Dataflex .DAT files is & set
the encoding appropriately. Of course, if it works for you & it is "close
enough" for your customer, then go for it.

Hope this helps
Jay

Nov 21 '05 #14

I understand what you are saying, Jay. The problem is that the old Dataflex program, which is DOS based, appears to be using ASCII
characters. But DataAccess (the creators, or current owners, of Dataflex) will not disclose the format of the data files,
even though the format is very old (well, old in computer terms).

I have a copy of their latest development suite (a time-limited demo), and even when I use it to export the older DOS-based
program's data files, the exported data ends up with characters added that are not in the original program's data. Using
the original DOS application to view the data, none of those "extra" characters show up.

I can use a program called TextPad (very nice shareware, by the way) to look at the CSV files exported from Visual Dataflex, or
even from DFQUERY, an old DOS-based query program they had for the original program, and the output from both
programs has the extra characters in the data. Somewhere, something is being added in the translation.

I am sure I do not have a firm grasp of exactly what is causing this problem and what exactly to do to get rid of it. That is
why I came up with the character substitution routine. Once I run each of the CSV files that either program exports through that
routine, the data no longer has the extra characters in it, and when I import those same "filtered" CSV files into
Access, the extra characters are gone.

In my search to find out why this was happening, I ran into a lot of posts by others having the same problem, and most
explanations were similar to yours. But even then, none of the original posters ever came back and said that making the
suggested changes worked. Most of the time, they said it did not, and they ended up doing something like I have done, or
even manually re-entering the data, to get rid of the problem in the new application. And several times, people have
come back and said that they just did not get it: that if they understood the different character encodings, they would not
have a problem. I guess what I am trying to say is that I may not know a lot about the different character encoding formats,
but I do know that somewhere there is a conversion problem. Whether it is with me or with the way Windows XP's character handling
interacts with old DOS-based applications, I don't know. But I do know that I am not the only person to run into this
problem.

That is why I posted the code I used to get rid of the extra characters. It is more or less an "if all else fails" solution,
and I figured that someone like me searching with Google etc. might find it and it might help solve the immediate problem.
Again, thank you, Jay, for your suggestions. I will investigate the resources you gave further, and hopefully I can
find a better, more elegant solution.
james

"Jay B. Harlow [MVP - Outlook]" <Ja************@msn.com> wrote in message news:eT**************@TK2MSFTNGP10.phx.gbl...
James,
It really sounds like you & your program are confusing:

- Ascii (7 bit character encoding 0 to 127), used in files.
- Ansi (8 bit character encoding 0 to 255), used in files. Default for NotePad & most Windows utilities (including XP). Uses
Code Pages to distinguish characters in the 128 to 255 range. Also generally means that a file can only have one encoding.
Note: Windows code pages differ from DOS code pages.
- UTF-8 (8 bit character encoding of Unicode characters), used in files, default for .NET file I/O classes in the System.IO
namespace.
- UTF-16 aka Unicode (16 bit character encoding of Unicode characters), used for the basis of System.Char & System.String.
Also available for Win32 API calls on Windows NT, 2000 & XP.
- UTF-32 (32 bit character encoding of Unicode characters) not fully supported in .NET until 2.0 (VS.NET 2005) aka Whidbey due
out later in 2005.

I have not used Jet (Microsoft.Jet.OLEDB.4.0 aka Access) enough to know how you specify the Encoding required to read a file
correctly.

For further information on Encodings see :
http://www.yoda.arachsys.com/csharp/unicode.html

A couple of the links get you to character reference tables; however, due to the size of the tables, I normally rely on
System.Text.Encoding to convert from one encoding to another.
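A minimal sketch of that kind of conversion, assuming (hypothetically) that the source bytes are in DOS code page 437; the real code page of the Dataflex export is not established in this thread. Encoding.Convert re-encodes the raw bytes as UTF-8, and the helper name ToUtf8 is made up for illustration:

```vbnet
Imports System
Imports System.Text

Module ConvertSketch
    ' Re-encodes raw bytes from the given code page into UTF-8.
    ' sourceCodePage is an assumption about the data: 437 and 850 are common
    ' DOS code pages, 1252 is the usual Windows "ANSI" code page.
    ' On .NET Core / .NET 5 and later these code pages must first be enabled with
    ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    ' the .NET Framework has them built in.
    Function ToUtf8(ByVal raw As Byte(), ByVal sourceCodePage As Integer) As Byte()
        Dim source As Encoding = Encoding.GetEncoding(sourceCodePage)
        Return Encoding.Convert(source, Encoding.UTF8, raw)
    End Function
End Module
```

Decoding the result with Encoding.UTF8.GetString gives the same text that Encoding.GetEncoding(437).GetString would give directly from the original bytes.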
I finally wrote a small routine to remove all characters from 128 to 255 and substitute them with Chr(32) <SPACE>

Although it appears to work, I hope you agree that it really is not correct. I would find out what the actual encoding of the
Dataflex .DAT files is & set the encoding appropriately. Of course, if it works for you & it is "close enough" for your
customer, then go for it.
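Setting the encoding when reading can be sketched like this, assuming the export is a plain text file and that its code page is known (437, 850 and 1252 are only candidates to try, not something established in this thread). StreamReader defaults to UTF-8, which is why the code page is passed explicitly; ReadAllWithCodePage is a hypothetical helper name:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadSketch
    ' Reads an entire text file using an explicitly chosen code page instead of
    ' the StreamReader default (UTF-8). The code page number is an assumption the
    ' caller must verify against the real export: for example 437 or 850 for
    ' DOS-era data, or 1252 for Windows ANSI.
    ' On .NET Core / .NET 5 and later, register CodePagesEncodingProvider first.
    Function ReadAllWithCodePage(ByVal fileName As String, ByVal codePage As Integer) As String
        Dim reader As New StreamReader(fileName, Encoding.GetEncoding(codePage))
        Try
            Return reader.ReadToEnd()
        Finally
            reader.Close()
        End Try
    End Function
End Module
```

The resulting string could then be assigned to RichTextBox1.Text before splitting on commas, so the characters arrive already decoded with the intended code page.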

Hope this helps
Jay

"james" <jjames700ReMoVeMe at earthlink dot net> wrote in message news:%2****************@TK2MSFTNGP12.phx.gbl...
I thought there might be someone out there who would like to see how I finally solved the Replace problem I had with odd
characters appearing in old data files (DOS-based Dataflex 3.1d .DAT files) that were converted to CSV files.
The "normal" character set that the old program used was from 0 to 127. I think that, because Windows XP (and Visual
Basic .NET) use Unicode character sets, that was part of the problem.
I finally wrote a small routine to remove all characters from 128 to 255 and substitute them with Chr(32)
<SPACE>, and that cleaned up the files I am working on. So much so that I can use Access 2002 to build new databases,
import the cleaned CSV files, and get the data back like I need (even though I plan to eventually write a complete tool that
builds the new databases using ADOX).
So, here is the routine I used:
Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    'splits a comma delimited string on the "," character and outputs the results to a listbox
    'the "," is put back later, when writing the cleaned file, so Access can import a "proper"
    'Comma Separated Value (CSV) file (not in this routine, but in my save file routine)
    Dim str As String = RichTextBox1.Text
    Dim arr() As String = str.Split(","c)

    For Each s As String In arr
        'this replaces every character in the range Chr(128) to Chr(255) with a space, Chr(32)
        '(Chr maps the values 128-255 through the system's ANSI code page)
        For y As Integer = 128 To 255
            Dim z As String = Chr(y)
            If InStr(s, z) > 0 Then
                s = s.Replace(z, Chr(32))
            End If
        Next
        ListBox1.Items.Add(s)
    Next

    'shows how many items the listbox now holds (one item per field)
    Label1.Text = ListBox1.Items.Count.ToString()
End Sub
I searched quite a lot for this and the inspiration for the actual routine came from one posted by Cor Ligthert some time
ago. I just modified it for my needs. Also, Chris Dunaway gave me some good pointers on how to use the Replace function
properly. I hope this post helps someone else who is looking to solve a similar problem.
james
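For comparison, a single-pass sketch (a different technique from the loop above, not james's code): it tests each character's Unicode value with AscW and replaces anything above 127 with a space. It does not behave identically to the Chr(128) to Chr(255) loop, because Chr maps those values through the system ANSI code page (on a Windows-1252 system Chr(128) is the euro sign, U+20AC), whereas this version treats every non-ASCII character the same regardless of code page. ReplaceNonAscii is a hypothetical name:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module AsciiFilterSketch
    ' Replaces every Char outside 7-bit ASCII (value > 127) with a space,
    ' in a single pass over the string.
    Function ReplaceNonAscii(ByVal s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            If AscW(c) > 127 Then
                sb.Append(" "c)
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Inside the For Each loop of Button2_Click, ListBox1.Items.Add(ReplaceNonAscii(s)) would take the place of the inner For loop.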



Nov 21 '05 #15

James,
I understand what you are saying Jay. The problem is the old Dataflex
program, which is DOS based, appears to be using Ascii characters.

My point is there is NO ASCII char 255! ASCII only has code points from 0
to 127. If you have a 255 character in your data file, then there is an
extremely good chance that it is Ansi, either a DOS or Windows Code Page.

There is a slight chance the extra characters you are seeing might actually
be control characters of some kind (End of field & End of record come to
mind). 255 is a "special number": it is a byte with all bits set. Some
programs use it to indicate end of field, some use a byte with all bits
cleared for the same thing.

Are you attempting to read the DAT file itself, did you use Dataflex to
export the file to a CSV, are you using Jet to attempt to read the DAT,
or are you using Jet to attempt to read the CSV? Depending on what you are
doing, there are any number of places where you are losing the file format
and/or the encoding. Hence you see the 255 in your VB.NET program.
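One way to answer that question is to look at the raw bytes rather than at decoded text. A minimal diagnostic sketch (FindHighBytes and ReadRawBytes are hypothetical names) that lists the offset and hex value of every byte of 128 or above in a file, so you can see whether the suspect character is really stored as 0xFF and where it sits:

```vbnet
Imports System
Imports System.Collections
Imports System.IO

Module ByteScanSketch
    ' Returns one "offset: hex value" string for every byte >= 128 in the file,
    ' so the actual stored bytes behind an odd-looking character can be inspected.
    Function FindHighBytes(ByVal fileName As String) As ArrayList
        Dim found As New ArrayList()
        Dim data As Byte() = ReadRawBytes(fileName)
        For i As Integer = 0 To data.Length - 1
            If data(i) >= 128 Then
                found.Add(String.Format("{0}: {1:X2}", i, data(i)))
            End If
        Next
        Return found
    End Function

    ' Reads the whole file with a BinaryReader (fine for modest export files).
    Private Function ReadRawBytes(ByVal fileName As String) As Byte()
        Dim reader As New BinaryReader(File.OpenRead(fileName))
        Try
            Return reader.ReadBytes(CInt(reader.BaseStream.Length))
        Finally
            reader.Close()
        End Try
    End Function
End Module
```

Running it on both the original .DAT file and the exported CSV would show whether the byte between "John" and "Doe" is the same in both, or whether the export step changes it.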

Doing a google search on "Dataflex file format" came up with a handful of
promising sites, such as:

http://myfileformats.com/search.php?...tabase%20files

Hope this helps
Jay

Nov 21 '05 #16

My bad Jay, I meant to say ANSI Character Set. And Chr(255) in that set is a lower case y with the two dots over it. As of
right now I have a side project to read the actual .DAT files using a binary reader.
But what I have been doing is using Visual Dataflex 10 (demo version) & the DOS program called
DFQUERY (which is a query builder for Dataflex 3.1d) to export the original .DAT files to CSV format in order to import the data
into Access. Using the original DATABASE application to view the actual data files in DOS mode, the extra characters do
not appear in any of the data. But exporting the original files using either one of the two programs I mentioned results in the
characters being added to the data. And it only occurs in a particular situation, like so:
If you have an entry like Smith, John Doe, there will be the small character (either a lower case a or y with the two
dots over it) between John & Doe where a space should be.
However, if you have Smith, John D., they do not appear!! And looking at the original data where
Smith, John Doe is listed, those characters are not there.
Another note too: one of the original developers of Dataflex, who lives in the Netherlands, sent me two
small programs that run in DOS mode using Dataflex 3.1d's runtime (like the old Quick Basic runtime)
to convert the data to XML or CSV files. And in both of those programs' output, the same thing happens!!
So, I am thinking the problem exists because the original DOS-based program used the ANSI character set, and when it is run in
the underlying DOS of Windows XP, there is a conversion error taking place between the character sets: ANSI to Unicode.
That is also why I am working on a binary reader to get past the problem. If this were a one-time thing, I would
already be done, since I built the little filtering/substitution utility to get rid of the extra characters. But there are
several hundred users of the original application who could "potentially" become new clients for me and would want a new
application that does not have the limits the older application has, and that works better with newer hardware and
operating systems.
I have the file you pointed me to. But it is for a previous version, Dataflex 2.3b, and its author says
there may be changes between it and later versions that could break anything that uses what he presents. So there is a chance
that the info will not work. But at least it is a starting point for reading the files in binary format, actually getting
to the raw data, and parsing it.
Thank you for taking the time to offer suggestions on this particular problem. Any and all suggestions are welcome. I have
Googled until my one eye is crossed............:-)
james

(Oh, and yes, they could be control characters, but I doubt it, since they only appear in the conditions I mentioned above, and
the name listings appear exactly the way they should in the original database, without the extra characters. Meaning there is a
space in Smith, John Doe instead of the lower case y with the two dots between John and Doe, and it only occurs in that
particular situation. Longer text with lots of words does not contain it either.)
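One possible explanation, offered as an untested assumption rather than a diagnosis: in the DOS code pages 437 and 850, byte 0xFF is a no-break space (U+00A0), which a DOS screen displays as a blank, while the same byte read with the Windows code page 1252 is ÿ (U+00FF, lower case y with diaeresis). If the original data stores a no-break space between "John" and "Doe", any tool that reads the export as Windows ANSI would show exactly the ÿ described above. This sketch (DecodeByte and ShowByteFF are hypothetical helpers) shows how each code page decodes that byte:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CodePageCheck
    ' Decodes one byte with the given code page and returns the Unicode code point.
    ' On .NET Core / .NET 5 and later, call
    ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first;
    ' the .NET Framework has these code pages built in.
    Function DecodeByte(ByVal b As Byte, ByVal codePage As Integer) As Integer
        Dim decoded As String = Encoding.GetEncoding(codePage).GetString(New Byte() {b})
        Return AscW(decoded.Chars(0))
    End Function

    Sub ShowByteFF()
        ' DOS code page 437: prints "437:  U+00A0" (no-break space)
        Console.WriteLine("437:  U+{0:X4}", DecodeByte(&HFF, 437))
        ' Windows code page 1252: prints "1252: U+00FF" (lower case y with diaeresis)
        Console.WriteLine("1252: U+{0:X4}", DecodeByte(&HFF, 1252))
    End Sub
End Module
```

If this turns out to be the cause, reading the export with the DOS code page (or mapping U+00A0 to an ordinary space) would fix the names without discarding every character from 128 to 255.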

Nov 21 '05 #17


"james" <jjames700ReMoVeMe at earthlink dot net> wrote
My bad Jay, I meant to say ANSI Character Set. And Chr(255) in that set is a lower case y with the two dots over it. As of
right now I have a side project to read the actual .DAT files using a binary reader.
But, what I have been doing is using Visual Dataflex 10 (demo version) & the DOS program called
DFQUERY (which is a Query builder for Dataflex 3.1d) to export the original .DAT files to CSV format in order to import the data
into Access. And using the original DATABASE application to view the actual data files in DOS mode, the extra characters do
not appear in any of the data. But, exporting the original files using either one of the two programs I mentioned results in the
characters being added to the data. And it only occurs in a particular situation: <...>
That is also why I am working on a Binary reader to get past the problem. If this were to be a one time thing then I would
already be done since I built the little filtering/substitution utility to get rid of the extra characters. But, there are
several hundred users of the original application that could "potentially" become new clients for me who would want the new
application that does not have the limits that the older application has. And that would work better with newer hardware and
Operating Systems. <...>
Thank you for taking the time to offer suggestions on this particular problem. Any and all suggestions are welcome. I have

I was going to suggest you use a table method, but I also thought there may be a code page
issue happening, something I am not very familiar with. But if you are resigned to doing a manual
conversion, I would suggest you look at using a table of converted characters to simplify the
task. The table (0 - 255) simply contains the character you want for each value. If you want a
space to display for the character Chr(255), then place a space character in the table at location
255. Do that for all the characters you want changed. The ones you don't want changed should
map to themselves: for example, the element for capital A is at location 65, so that element should
contain Chr(65) (the capital letter A).

Here is a quick example I typed up using a table method to substitute capital letters for the
lower case set:

HTH
LFS

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    Dim tbl As Char() = BuildTable()
    Dim text As String = "Now is the time for..."

    Debug.WriteLine(Transpose(text, tbl))
End Sub

Private Function Transpose(ByVal Text As String, ByVal Table As Char()) As String
    Dim result As Char() = Text.ToCharArray()
    For idx As Integer = 0 To result.GetUpperBound(0)
        ' Look up each character's ANSI code (0-255) in the table
        result(idx) = Table(Asc(result(idx)) And 255)
    Next
    Return New String(result)
End Function

Private Function BuildTable() As Char()
    Dim Table(255) As Char

    For idx As Integer = 0 To 255
        ' This table converts lower case to upper case
        If idx >= 97 AndAlso idx <= 122 Then
            Table(idx) = Chr(idx - 32)
        Else
            Table(idx) = Chr(idx)
        End If
    Next
    Return Table
End Function
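Applied to james's case, the same table idea might look like the sketch below (BuildSpaceTable and ApplyTable are hypothetical names, not Larry's code): entries 128 to 255 hold a space and everything else maps to itself. Unlike Transpose, it indexes the table with AscW, the character's Unicode value, so the result does not depend on the system ANSI code page; characters above 255 fall outside the table and get a caller-supplied fallback character, which is a design choice of this sketch.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SpaceTableSketch
    ' Builds a 256-entry table: codes 0-127 map to themselves,
    ' codes 128-255 map to a space.
    Function BuildSpaceTable() As Char()
        Dim table(255) As Char
        For idx As Integer = 0 To 255
            If idx >= 128 Then
                table(idx) = " "c
            Else
                table(idx) = ChrW(idx)
            End If
        Next
        Return table
    End Function

    ' Replaces each character by its table entry, indexing by Unicode value (AscW).
    ' Characters above 255 are outside the table and become the fallback character.
    Function ApplyTable(ByVal text As String, ByVal table As Char(), ByVal fallback As Char) As String
        Dim chars As Char() = text.ToCharArray()
        For idx As Integer = 0 To chars.Length - 1
            Dim code As Integer = AscW(chars(idx))
            If code <= 255 Then
                chars(idx) = table(code)
            Else
                chars(idx) = fallback
            End If
        Next
        Return New String(chars)
    End Function
End Module
```

The table is built once and reused for every field, so each character costs a single array lookup instead of a search for each of the 128 replacement candidates.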

Nov 21 '05 #18

Thank you Larry, I appreciate your input. I will try your suggestion on using a table for substitutions.
I take it you feel it would be faster than the routine I posted. So, I will give it a try.
Thanks again.
james


Nov 21 '05 #19
