I could not find anything on the web related to this issue, so I am opening a discussion about it. Maybe someone can help me out or suggest a better way to handle it.
Environment:
Windows XP Pro, VS2003, .NET 1.1, C#
The Case:
When we read the contents of a csv file from a "multipart/form-data" post, the contents come with strange chars that look like double spaces but are not exactly spaces (very tricky to clean up). I searched for this case but did not find anything related. So I managed to solve it in a very dirty way, which I am embarrassed to show here (lol).
We can upload files (without using any COM components) in basically 2 ways:
1) using Request.BinaryRead (manual handling with posted data)
2) using HtmlInputFile.PostedFile (using .NET web controls)
I've tested my case with both and the results are the same.
E.g.
piece of the file = [bar foo,bar@foo.com,]
after obtaining the info from the post = [
b a r f o o , b a r @ f o o . c o m , ]
When I run a simple script to separate all the elements of the content that I receive, this is what it looks like:
[ ] [
](this is a new line content) [ ] [b] [ ] [a] [ ] [r] [ ] [ ] [ ] [f] [ ] [o] [ ] [o] [ ] [,] [ ] [b] [ ] [a] [ ] [r] [ ] [@] [ ] [f] [ ] [o] [ ] [o] [ ] [.] [ ] [c] [ ] [o] [ ] [m] [ ] [,] [ ]
For your information, the script:
// line is the line from the csv file
for (int i = 0; i < line.Length; i++)
{
    string digit = line.Substring(i, 1);
    Response.Write(" [" + digit + "] ");
}
Note: Not all csv files have this problem. I got these files from Google's export features (Gmail, Orkut, etc).
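Since only some files show the problem, one thing that might be worth checking is whether those files are saved in a different text encoding, such as UTF-16 with a byte order mark. That is only a guess, nothing above confirms it. A minimal sketch of reading an uploaded stream while letting StreamReader detect the encoding from the byte order mark could look like this. The class and method names are hypothetical, and the MemoryStream only simulates an upload:

```csharp
using System;
using System.IO;
using System.Text;

public class UploadDecoding
{
    // Reads all text from the stream. If the stream starts with a byte
    // order mark, StreamReader uses the matching encoding; otherwise it
    // falls back to the encoding passed in.
    public static string ReadAllText(Stream input, Encoding fallback)
    {
        StreamReader reader = new StreamReader(input, fallback, true);
        try
        {
            return reader.ReadToEnd();
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Simulated upload (assumption): UTF-16 little-endian bytes
        // preceded by the UTF-16 byte order mark.
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes("bar foo,bar@foo.com,");
        MemoryStream ms = new MemoryStream();
        ms.Write(bom, 0, bom.Length);
        ms.Write(body, 0, body.Length);
        ms.Position = 0;

        Console.WriteLine(ReadAllText(ms, Encoding.Default)); // prints "bar foo,bar@foo.com,"
    }
}
```

In a page, the stream would presumably be HtmlInputFile.PostedFile.InputStream instead of the MemoryStream. If the file has no byte order mark, this sketch does not help and the encoding would have to be chosen explicitly.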
The funny thing is that when I display the contents on a webpage, everything seems ok, because the strange chars do not appear. I first discovered this when I wrote a script to automatically store the csv contents in a database. The data looked very strange, because all the strange spaces were there in the database, including the "new line", which does not disappear even if you try to replace it with something else.
When I tried to compare these "spaces" with known values, I could not find anything that would clean them:
line = line.Replace(" ", "") -> does not work
digit.Equals(string.Empty) -> always returns false
"" + digit == "" -> is false
"" + digit == " " -> is false
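When comparisons like these fail, one way to find out what a character really is would be to print its numeric code instead of the character itself. A minimal sketch, where CharDump and DumpCodes are hypothetical names and the sample string is made up just to show the output format:

```csharp
using System;
using System.Text;

public class CharDump
{
    // Returns the code of every character in the line as 4-digit hex,
    // separated by spaces, so invisible characters become visible.
    public static string DumpCodes(string line)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.Length; i++)
        {
            if (i > 0)
            {
                sb.Append(' ');
            }
            sb.Append(((int)line[i]).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up sample: "ba" with an invisible U+0000 after each letter.
        Console.WriteLine(DumpCodes("b\0a\0")); // prints "0062 0000 0061 0000"
    }
}
```

Running DumpCodes on a real line from the csv file would show which codes the strange chars have, and those codes can then be compared directly.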
So I lost hope of identifying this char, which is neither empty nor blank, and went for the hash code instead. That apparently solved the problem but, as I said before, I would never advise anyone (myself included) to do something like this.
So I simply read everything char by char and skip the strange ones:
// 5381 and 177583 are the hash codes I got for the two strange chars
string correctedLine = "";
for (int i = 0; i < line.Length; i++)
{
    string digit = line.Substring(i, 1);
    int hash = digit.GetHashCode();
    if (hash != 5381 && hash != 177583)
    {
        correctedLine += digit;
    }
}
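Hash codes are not guaranteed to stay the same between .NET versions, so a somewhat less fragile variant of the same idea would be to filter on the characters themselves. This is only a sketch: LineCleaner and StripChars are hypothetical names, and the two characters removed here (U+0000 and line feed) are an assumption that should be replaced by whatever the strange chars actually turn out to be.

```csharp
using System;
using System.Text;

public class LineCleaner
{
    // Returns a copy of the line without any character listed in unwanted.
    public static string StripChars(string line, char[] unwanted)
    {
        StringBuilder sb = new StringBuilder(line.Length);
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (Array.IndexOf(unwanted, c) < 0)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumption: the strange chars are U+0000 and a line feed.
        char[] unwanted = new char[] { '\0', '\n' };
        Console.WriteLine(StripChars("\nb\0a\0r\0", unwanted)); // prints "bar"
    }
}
```

Using a StringBuilder also avoids creating a new string for every appended character, which matters for long lines.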
Has anyone ever seen this?
Thanks in advance.