Testing File Format

Tom
Hi all,

I am looking for a smart way to assure a file is indeed a text file
within a C# method and not binary.

For example: Will "thisMysteryFile.dat" be legible if opened in a
RichTextBox ... or is it a binary file?

I have searched various methods in the string class and am having no
luck.

Under consideration >>

Open the file in a binary reader and then test either the first 1000
char or until File End and if any char are less than 32 or greater
than 127 ... then flag it as binary.
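
For reference, the check described above could be sketched roughly as follows. This is only an illustration: the names `NaiveBinaryCheck` and `LooksBinary` and the `maxBytes` parameter are made up for the example. Note that the rule as literally stated also flags tab, carriage return and line feed (all below 32), so almost any multi-line text file would be reported as binary.

```csharp
using System;
using System.IO;

public static class NaiveBinaryCheck
{
    // Returns true if any of the first maxBytes bytes is below 32 or above 127.
    // This is the rule exactly as stated in the post, with no exceptions for
    // tab, CR or LF.
    public static bool LooksBinary(string path, int maxBytes)
    {
        byte[] head;
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // ReadBytes returns fewer bytes if the file is shorter than maxBytes.
            head = reader.ReadBytes(maxBytes);
        }

        foreach (byte b in head)
        {
            if (b < 32 || b > 127)
            {
                return true;
            }
        }
        return false;
    }
}
```

A call such as `NaiveBinaryCheck.LooksBinary(path, 1000)` matches the "first 1000 char or until File End" wording.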

If not binary >open in a RichTextBox

Can anyone tell me a more efficient way to accomplish this task?

Thanks !!
Nov 19 '07 #1
The first problem I see with the "under consideration" method is that there
are so many legitimate characters (mostly in languages other than English)
that will fall outside your ASCII code range. Unicode (which can certainly be
the contents of a "text file") supports 65536 characters.
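
To make the point concrete, here is a small sketch that counts how many bytes of a string's UTF-8 encoding fall above 127. The class and method names are invented for the example, and UTF-8 is just one encoding a text file might use.

```csharp
using System;
using System.Text;

public static class NonAsciiDemo
{
    // Counts the bytes above 127 in the UTF-8 encoding of the given text.
    public static int CountHighBytes(string text)
    {
        int count = 0;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            if (b > 127)
            {
                count++;
            }
        }
        return count;
    }
}
```

`NonAsciiDemo.CountHighBytes("caf\u00E9")` returns 2, because the single character U+00E9 is encoded as the two bytes C3 A9. A range check on raw bytes would therefore reject a perfectly ordinary text file.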

--Peter
"Inside every large program, there is a small program trying to get out."
http://www.eggheadcafe.com
http://petesbloggerama.blogspot.com
http://www.blogmetafinder.com
Nov 19 '07 #2
Tom
Peter -- Thanks. Your comments have me thinking outside the match box
in which I was stuck. I'm now digging into the RichTextBoxStreamType
enumeration > UnicodePlainText.

I'll experiment with this enumeration and see if loading a binary data
file throws an exception. All this RichTextBox stuff is new for me ...
so I have a lot to learn for sure.

Perhaps a restricted load of a tiny size for a preview and then have
control buttons with "Load Full File" or "Clear RichTextBox" options?
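
A restricted preview load along those lines might look like the sketch below. `FilePreview` and `ReadHead` are hypothetical names, and the sketch relies on `StreamReader` to pick the encoding from a byte order mark if there is one (it falls back to UTF-8 otherwise).

```csharp
using System;
using System.IO;

public static class FilePreview
{
    // Reads at most maxChars characters from the start of the file,
    // so a huge file is never loaded in full just for a preview.
    public static string ReadHead(string path, int maxChars)
    {
        // The second argument asks StreamReader to honor a byte order mark.
        using (var reader = new StreamReader(path, true))
        {
            char[] buffer = new char[maxChars];
            int read = reader.ReadBlock(buffer, 0, maxChars);
            return new string(buffer, 0, read);
        }
    }
}
```

The result can be assigned to the `Text` property of the RichTextBox, with the "Load Full File" button doing the full read only on request.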

Avoiding the accidental loading of a huge binary data file is part of
my objective. The other part of the objective is read only viewing the
small parameter data file as part of a data run initialization.

I am always amazed at how another's input can cause me to refocus.
Darn trees ruining my view of the forest!! LOL

Have a great day. Thanks again!

-- Tom

On Sun, 18 Nov 2007 18:04:00 -0800, Peter Bromberg [C# MVP]
<pb*******@yahoo.NoSpamMaam.com> wrote:
[...]
Nov 19 '07 #3
On 2007-11-18 19:45:15 -0800, Tom <Th********@earthlink.net> said:
> Peter -- Thanks. Your comments have me thinking outside the match box
> in which I was stuck. I'm now digging into the RichTextBoxStreamType
> enumeration > UnicodePlainText.

If you do that, won't you limit your input to Unicode files?

I think that one approach would be to use a StreamReader to
automatically detect the encoding of the file for you, and then read
the first 1K or so, counting how many characters return true for the
Char.IsLetterOrDigit method and comparing that to the total number of
characters.

It still won't be perfect, but you should be able to come up with a
reasonably good heuristic regarding what the ratio of alphanumeric
characters to other characters you would expect to see in a text file.
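
A minimal sketch of that heuristic follows. The names `TextSniffer`, `AlphanumericRatio` and `LooksLikeText` are invented for illustration, and the threshold is left as a parameter because a sensible value has to be found by experiment. One caveat: the automatic detection in `StreamReader` is based on byte order marks only, and without one it assumes UTF-8.

```csharp
using System;
using System.IO;

public static class TextSniffer
{
    // Fraction of the first sampleSize characters that are letters or digits.
    public static double AlphanumericRatio(string path, int sampleSize)
    {
        char[] buffer = new char[sampleSize];
        int read;
        // The second argument asks StreamReader to honor a byte order mark.
        using (var reader = new StreamReader(path, true))
        {
            read = reader.ReadBlock(buffer, 0, sampleSize);
        }

        if (read == 0)
        {
            return 0.0; // empty file: nothing to judge
        }

        int alphanumeric = 0;
        for (int i = 0; i < read; i++)
        {
            if (char.IsLetterOrDigit(buffer[i]))
            {
                alphanumeric++;
            }
        }
        return (double)alphanumeric / read;
    }

    // threshold is an assumed tuning knob, not a value from the thread.
    public static bool LooksLikeText(string path, int sampleSize, double threshold)
    {
        return AlphanumericRatio(path, sampleSize) >= threshold;
    }
}
```

A call might look like `TextSniffer.LooksLikeText(path, 1024, 0.6)`, where 0.6 is only a placeholder to be tuned against real files.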

Of course, you can still include the user in the determination. For
example, run the above test and if the file passes go ahead and use it,
but if it fails provide the user with a chance to override your
analysis. You could even do this just as you suggest: provide a brief
preview of the initial part of the file to the user so that they can
visually decide whether it's a file they want treated as text.

Caveat: I have basically no experience with non-alphabetic languages,
and I don't know if in a non-alphabetic language a word character would
be considered a "letter" for the purpose of the above test. If that's
important to you, you'll want to verify that and/or find a form of
classification that will correctly detect those characters as text.
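
One way to verify this is to ask the framework how it classifies a given character. The helper below is a hypothetical sketch (`LetterCheck.Describe` is not an existing API) that reports the Unicode category together with the `Char.IsLetterOrDigit` result.

```csharp
using System;

public static class LetterCheck
{
    // Describes how .NET classifies a single UTF-16 code unit.
    public static string Describe(char c)
    {
        return string.Format(
            "U+{0:X4} {1} IsLetterOrDigit={2}",
            (int)c,
            char.GetUnicodeCategory(c),
            char.IsLetterOrDigit(c));
    }
}
```

For example, `LetterCheck.Describe('\u4E2D')` reports the category `OtherLetter` for that CJK ideograph, and `Char.IsLetterOrDigit` returns true for it, since the letter test covers that category as well as upper and lower case letters.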

Pete

Nov 19 '07 #4
> Unicode (which can certainly
> be the contents of a "text file" supports 65536 characters.

Unicode goes up to 10FFFF, which is a bit more than one million.
Other than that, very good warning :-)
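
The arithmetic, and its consequence for .NET code that counts `char` values, can be sketched as below. The class and member names are made up for the example.

```csharp
using System;

public static class CodePointDemo
{
    public const int MaxCodePoint = 0x10FFFF;

    // Number of code points in the range U+0000..U+10FFFF
    // (the surrogate range is included in this count).
    public static int CodePointCount()
    {
        return MaxCodePoint + 1;
    }

    // How many UTF-16 code units (.NET char values) one code point occupies.
    public static int Utf16Length(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint).Length;
    }
}
```

`CodePointDemo.CodePointCount()` is 1114112. `CodePointDemo.Utf16Length(0x41)` is 1, while `CodePointDemo.Utf16Length(0x10FFFF)` is 2, so any per-`char` test sees characters above U+FFFF as two surrogate halves rather than one character.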
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Nov 19 '07 #5
Tom
Pete --

Thank you! I am new to C# and I am exploring StreamReader a.s.a.p.

I work only in the English language and am not developing programs for
global distribution. Your methodology seems solid to this newb. Usage
of Char.IsLetterOrDigit would effectively provide some language
independence. That independence makes for a MUCH better tool than what
I had been focused upon.

Very, very thought provoking!

Again, thanks. -- Tom

On Sun, 18 Nov 2007 20:05:37 -0800, Peter Duniho
<Np*********@NnOwSlPiAnMk.com> wrote:
[...]
Nov 19 '07 #6
Tom
Hey folks --

I've been rethinking my usage of RichTextBox long and hard. At first
it seemed the do-all new magic class. For some tasks it is just that!
Accidentally opening a huge file from a ListView selection is
painfully slow and consumes resources like no tomorrow. Ouch.

What I really crave is a Text Viewer class without editing capability.
One that only loads a screen worth of text at a time. Where the thumb
is sized to reflect the file size and placement of the thumb loads
just that section of the data file. Like Petzold's painting with text
example from Programming Windows 95 ... only in .Net 2.0 C# and
integrated with a simpler TextBox? Or another text viewing control
that is more appropriate.
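
The core of such a viewer is mapping the thumb position to a file offset and reading only that window. A rough sketch, with invented names (`FileWindow`, `ReadAt`) and the assumption that the thumb position is supplied as a fraction between 0 and 1:

```csharp
using System;
using System.IO;
using System.Text;

public static class FileWindow
{
    // Reads up to windowBytes bytes starting at the offset that corresponds
    // to a scroll fraction in [0, 1], and decodes them with the given encoding.
    public static string ReadAt(string path, double fraction, int windowBytes, Encoding encoding)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            // The last possible window starts windowBytes before the end.
            long maxStart = Math.Max(0L, stream.Length - windowBytes);
            double clamped = Math.Min(1.0, Math.Max(0.0, fraction));
            long start = (long)(clamped * maxStart);
            stream.Seek(start, SeekOrigin.Begin);

            byte[] buffer = new byte[windowBytes];
            int total = 0;
            int n;
            // Read may return fewer bytes than requested, so loop until done.
            while (total < windowBytes &&
                   (n = stream.Read(buffer, total, windowBytes - total)) > 0)
            {
                total += n;
            }
            return encoding.GetString(buffer, 0, total);
        }
    }
}
```

This works cleanly for single-byte encodings such as ASCII. With a multi-byte encoding an arbitrary offset can land in the middle of a character, so a real viewer would need to resynchronize at the window edges or snap to line boundaries.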

I'm still searching for such a Text Viewer. A search on "Thumb Size
.Net 2.0" led me to some graphics intensive TrackBarRenderer,
trackRectangle, thumbRectangle, etc. usage that goes way beyond the
WinForms book and C# Instructional Texts that I have. Certainly
steepening my learning curve!

My guess is someone has already duplicated that Petzold example in C#
2.0 and that I would learn more and faster from studying a guru's
coding than creating my own.

If anyone can point me towards such a useful, compact, and also
complex tool ... I would be without doubt grateful.

Thanks. -- Tom


Nov 19 '07 #7
On 2007-11-19 06:25:57 -0800, Tom <Th********@earthlink.net> said:
> [...]
> I'm still searching for such a Text Viewer. A search on "Thumb Size
> .Net 2.0" led me to some graphics intensive TrackBarRenderer,
> trackRectangle, thumbRectangle, etc. usage that goes way beyond the
> WinForms book and C# Instructional Texts that I have. Certainly
> steepening my learning curve!
>
> My guess is someone has already duplicated that Petzold example in C#
> 2.0 and that I would learn more and faster from studying a guru's
> coding than creating my own.
>
> If anyone can point me towards such a useful, compact, and also
> complex tool ... I would be without doubt grateful.

I'm not familiar with Petzold's examples, so I can't comment on that.
As far as what you're asking about, I'm not aware of a specific
text-box implementation that does what you're talking about. It
wouldn't be that hard to do, at least for the basic implementation
(duplicating the full functionality of the TextBoxBase classes would be
harder, but it sounds like you only need a minimal subset).

Interestingly, taking a suggestion from a different thread -- in which
someone suggested using a ListBox to implement a console-output-like
control -- you could use the DataGridView in a similar way, taking
advantage of its "VirtualMode" mechanism. Using that, the control
handles all of the display and you provide the code that virtualizes
the data rather than having it all in memory at once.
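
As a rough sketch of that idea, the code below indexes the byte offset of each line once and then serves rows on demand through the `CellValueNeeded` event. `LineIndex` and `VirtualTextViewer` are hypothetical helpers written for this example. It assumes UTF-8 or ASCII content and, for brevity, reopens the file for every requested line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper: remembers where each line starts, not the text itself.
public sealed class LineIndex
{
    private readonly string path;
    private readonly List<long> starts = new List<long>();
    private readonly long length;

    public LineIndex(string path)
    {
        this.path = path;
        using (var stream = new BufferedStream(File.OpenRead(path)))
        {
            length = stream.Length;
            if (length > 0) starts.Add(0);
            long pos = 0;
            int b;
            while ((b = stream.ReadByte()) >= 0)
            {
                pos++;
                // A new line starts right after each LF, unless the file ends there.
                if (b == '\n' && pos < length) starts.Add(pos);
            }
        }
    }

    public int Count { get { return starts.Count; } }

    public string GetLine(int index)
    {
        long start = starts[index];
        long end = index + 1 < starts.Count ? starts[index + 1] : length;
        byte[] bytes = new byte[end - start];
        using (FileStream stream = File.OpenRead(path))
        {
            stream.Seek(start, SeekOrigin.Begin);
            int total = 0;
            int n;
            while (total < bytes.Length &&
                   (n = stream.Read(bytes, total, bytes.Length - total)) > 0)
            {
                total += n;
            }
        }
        return Encoding.UTF8.GetString(bytes).TrimEnd('\r', '\n');
    }
}

public static class VirtualTextViewer
{
    // Builds a read-only, single-column grid that asks for one line at a time.
    public static DataGridView Create(string path)
    {
        var index = new LineIndex(path);
        var grid = new DataGridView();
        grid.VirtualMode = true;
        grid.ReadOnly = true;
        grid.AllowUserToAddRows = false;
        grid.RowHeadersVisible = false;
        grid.ColumnHeadersVisible = false;
        grid.Dock = DockStyle.Fill;
        grid.Columns.Add("text", "Text");
        grid.Columns[0].AutoSizeMode = DataGridViewAutoSizeColumnMode.Fill;
        grid.CellValueNeeded += delegate(object sender, DataGridViewCellValueEventArgs e)
        {
            e.Value = index.GetLine(e.RowIndex);
        };
        grid.RowCount = index.Count;
        return grid;
    }
}
```

Adding the returned control to a form with `Controls.Add(VirtualTextViewer.Create(path))` should be enough to try it out. The index still costs one full pass over the file and one offset per line, so it trades a little memory for fast random access.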

It could be overkill -- the DataGridView control has lots of stuff in
it that would be of no value for this purpose -- and you might have
trouble getting it to look just right, since the DataGridView does have
a specific look and I don't know if you could get rid of the elements
that would be distracting in this use.

But hey, when you're hacking stuff, you can't be picky. :)

Pete

Nov 19 '07 #8
