Question about file formats

Hi,

I have a DOS program that creates data files which are used in another
program that is written by the same company. I am trying to figure out how I
can read the data from the data files.

When I open the files in Notepad they look like crap so I'm assuming they
are written in binary?

Now, assuming they are written in binary are there any methods I can use to
try to determine the format?

I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20 02
00 20 03 etc) on the left and a bunch of dots on the right side view. From
my limited knowledge I'm guessing that I'm looking at the file in
hexadecimal on the left and the right is non-printable or non-text
characters, thus showing up as dots.

So I'm curious how people go about determining file formats. Is it mostly
guesswork or is there a more strategic approach I can use?

Thanks a lot!!

Btw please recommend a group I can ask this in if it doesn't apply here.

Jul 22 '05 #1
James wrote:

> Hi,
>
> I have a DOS program that creates data files which are used in another
> program that is written by the same company. I am trying to figure out how I
> can read the data from the data files.
>
> When I open the files in Notepad they look like crap so I'm assuming they
> are written in binary?

Reasonable assumption.

> Now, assuming they are written in binary are there any methods I can use to
> try to determine the format?
>
> I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20 02
> 00 20 03 etc) on the left and a bunch of dots on the right side view. From
> my limited knowledge I'm guessing that I'm looking at the file in
> hexadecimal on the left and the right is non-printable or non-text
> characters, thus showing up as dots.

Right. Most hex editors present the data that way.

> So I'm curious how people go about determining file formats. Is it mostly
> guesswork or is there a more strategic approach I can use?

Ask the company for documentation of the file format.
If they won't give you that information, then it is ... guesswork.

Usually you start like this:
Let the program create a data file with minimal data (no user data
at all, if possible). Name that file 'Empty'.
Now let the program create a data file with a little more user
data. Compare that file with 'Empty' and try to find the user data
(the things that change). If your user data contains some text, you
will most likely find that text somewhere in the file. Other parts
of the file may have changed as well. They could be organizational
entries, such as where in the file the text section starts, or how
many entries there are (e.g. a byte that changes from 0 to 1). Things
like that. Try to make sense of those changes.
Try various other data files, but start with small ones. There is
no sense in analyzing a multi-MB data file; you will never figure out
how all those bytes are connected.
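
For what it's worth, the compare-against-'Empty' steps above could be sketched in Python roughly as follows. This is only a starting point under assumptions: the helper names (changed_ranges, find_text, as_integers, report) are made up for illustration, the text is assumed to be stored as plain ASCII, and the integer guesses assume fields of 1, 2 or 4 bytes. Little-endian is tried first because x86 DOS programs commonly write integers that way, but nothing in the file guarantees it.

```python
import struct
from pathlib import Path


def changed_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offset pairs, end exclusive, where a and b differ.

    Bytes are compared position by position over the common length. If the
    lengths differ, the extra tail is reported as one final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


def find_text(data: bytes, text: str, encoding: str = "ascii") -> list[int]:
    """Return every offset at which the encoded text occurs in data."""
    needle = text.encode(encoding)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def as_integers(data: bytes, offset: int) -> dict[str, int]:
    """Interpret the bytes at offset as a few common unsigned integer types."""
    guesses = {"u8": data[offset]}
    if offset + 2 <= len(data):
        guesses["u16_le"] = struct.unpack_from("<H", data, offset)[0]
        guesses["u16_be"] = struct.unpack_from(">H", data, offset)[0]
    if offset + 4 <= len(data):
        guesses["u32_le"] = struct.unpack_from("<I", data, offset)[0]
    return guesses


def report(empty_path: str, sample_path: str, known_text: str) -> None:
    """Print what changed between the 'Empty' file and a small sample file."""
    empty = Path(empty_path).read_bytes()
    sample = Path(sample_path).read_bytes()
    for start, end in changed_ranges(empty, sample):
        print(f"changed 0x{start:04X}..0x{end - 1:04X}: "
              f"{empty[start:end].hex(' ')} -> {sample[start:end].hex(' ')}")
        if start < len(sample):
            print("   as integers:", as_integers(sample, start))
    for offset in find_text(sample, known_text):
        print(f"text {known_text!r} found at 0x{offset:04X}")
```

You would call it as something like report("Empty", "OneRecord", "SMITH"), where "OneRecord" and "SMITH" stand for whatever small file and text you entered yourself. Keep in mind that a position-by-position comparison stops being useful once the program inserts data in the middle of the file and everything after it shifts; that is one more reason to start with very small files.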

Good luck. It can take days or weeks to analyze a binary data format.

--
Karl Heinz Buchegger
kb******@gascad .at
Jul 22 '05 #2
"James" <j@j.net> wrote:
> Hi,
>
> I have a DOS program that creates data files which are used in another
> program that is written by the same company. I am trying to figure out how I
> can read the data from the data files.
>
> When I open the files in Notepad they look like crap so I'm assuming they
> are written in binary?

Something other than ASCII. Anything other than ASCII can be called
binary.

> Now, assuming they are written in binary are there any methods I can use to
> try to determine the format?

No. The only foolproof way is to examine the source code of the
program that wrote it, or to examine documentation written by somebody
who knew that code.

> I opened one in a hex editor and I see 4 rows of numbers (like 01 00 20 02
> 00 20 03 etc) on the left and a bunch of dots on the right side view. From
> my limited knowledge I'm guessing that I'm looking at the file in
> hexadecimal on the left and the right is non-printable or non-text
> characters, thus showing up as dots.

That's right, that's how reasonable hex editors work: a hex display
side by side with an ASCII display. Dots are usually displayed on the
ASCII side for unprintables.
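
To make that layout concrete, here is a minimal Python sketch of such a display. It is not how any particular hex editor is implemented; the function name hexdump, the 16-bytes-per-row default and the exact column layout are assumptions chosen for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as rows of: offset, hex bytes, ASCII view with dots."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Left side: every byte as two hexadecimal digits.
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Right side: printable ASCII (0x20..0x7E) as-is, anything else as a dot.
        ascii_part = "".join(
            chr(byte) if 0x20 <= byte <= 0x7E else "." for byte in chunk
        )
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  |{ascii_part}|")
    return "\n".join(lines)


# The first seven bytes are the ones quoted in the question; "Hi" is added
# here only to show what printable bytes look like on the right-hand side.
print(hexdump(bytes([0x01, 0x00, 0x20, 0x02, 0x00, 0x20, 0x03]) + b"Hi"))
```

Note that 0x20 is the ASCII space character, so it shows up as a blank rather than a dot on the right-hand side.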

> So I'm curious how people go about determining file formats. Is it mostly
> guesswork or is there a more strategic approach I can use?

As stated above, guesswork unless you can find source code or
documentation.

--
Tim Slattery
Sl********@bls. gov
Jul 22 '05 #3
