Suggestion needed on data storage format in text file


The project I am developing doesn't involve a database. I want to parse
the mailbox file (.mbx) and store the summary in a text file for fast
retrieval and display of information on the Inbox page.

The suggested formats are:

#1

ID [4 bytes]: Subject [100 bytes]: To Address [100 bytes]: From
Address [100 bytes]...etc...

#2

Instead of preassigning a fixed size to each variable (as the actual data may be
much smaller or may grow larger), we can store the values contiguously,
separated by some unique separator (#|#, *#*, ...)

1324#|#Hi, How are yo******@google.com#|****@google.com#|# ... and so
on

Which of these will be the more efficient one (as there will be frequent
insert/delete/update of individual pieces of information, e.g. set message as
read, delete message, new message)?

Also, please suggest how to determine the variable size (100 bytes as
in #1), assign the size to the variable accordingly, and read it back
(differentiating multiple variables) when required.

Thanks.

Manish

Jul 19 '06 #1
Personally, I'd use a database. I wouldn't even try a flat file for
this. Too much work trying to keep things straight.

But you asked about the formats. The fixed-length fields will have
extra space any time the amount of data is less than the amount
reserved. Then you run into the problem of someone who gets very
verbose with their subject line and exceeds the 100 characters. And 4
bytes of decimal digits allows IDs only up to 9999. Is that enough? Or are
you going to try to read/write binary (not easy in PHP)?

The second one is problematic because the user may include your
separator in their Subject: line (or even name/address if you pick the
wrong character).

Two other ways - use CSV format, which is well documented and supported
by PHP and other programs. Or, add a length field at the beginning of
each field, specifying how many characters in the following field.
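
The two alternatives above could look roughly like this. It is a minimal sketch, not a finished storage layer: the function names and the field order are made up for illustration, and the empty escape argument passed to fputcsv()/fgetcsv() assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helpers; the names and the field order are assumptions.

// Append one summary row to a CSV file. fputcsv() quotes fields that
// contain the delimiter, quotes, or newlines, so a subject such as
// 'Hi, "how" are you?' needs no hand-written escaping.
function appendSummaryCsv(string $path, array $fields): void
{
    $fh = fopen($path, 'ab');
    fputcsv($fh, $fields, ',', '"', '');
    fclose($fh);
}

// Read every row back as an array of fields.
function readSummaryCsv(string $path): array
{
    $rows = [];
    $fh = fopen($path, 'rb');
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if ($row !== [null]) { // fgetcsv() yields [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($fh);
    return $rows;
}

// Length-prefixed alternative: each field is stored as "<byte count>:<bytes>".
function packFields(array $fields): string
{
    $out = '';
    foreach ($fields as $field) {
        $out .= strlen($field) . ':' . $field;
    }
    return $out;
}

// Reverse of packFields(): read a length, then exactly that many bytes.
function unpackFields(string $record): array
{
    $fields = [];
    $pos = 0;
    $end = strlen($record);
    while ($pos < $end) {
        $colon = strpos($record, ':', $pos);
        $len = (int) substr($record, $pos, $colon - $pos);
        $fields[] = substr($record, $colon + 1, $len);
        $pos = $colon + 1 + $len;
    }
    return $fields;
}
```

With the length prefix, the field contents never need escaping, because the reader counts bytes instead of searching for a separator.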

But I'd still use a database.
--
=============== ===
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attgl obal.net
=============== ===
Jul 19 '06 #2
On Tue, 18 Jul 2006 21:28:13 -0700, Manish wrote:
> #1
> ID [4 bytes]: Subject [100 bytes]: To Address[100 bytes]: From Address[100
> bytes]...etc...
>
> #2
> 1324#|#Hi, How are yo******@google.com#|****@google.com#|# ... and so on
> Which of these will be the efficient one (as there will be frequent
> insert/delete/update of the individual information, e.g. set message as
> read ..., delete message ..., new message ...)

The first one will be more efficient from a search/replace point of view,
the second will be more efficient from a space usage point of view.
Efficiency is subjective.

> Also please suggest on how to determine the variable size (100 bytes as
> in #1), and assign the size to the variable accordingly and read it
> (differentiate multiple variables) when required.

substr would be used to cut out the various portions of the string (e.g. 100
characters starting at position 4), and sprintf to build the fixed-width
fields when writing (or fprintf, if you're using PHP 5, to save a step).
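
As a rough illustration of the substr/sprintf approach for format #1: the widths come from the question, while the function and constant names are hypothetical, and widths are counted in bytes, not multibyte characters.

```php
<?php
// Field widths taken from format #1 in the question; names are hypothetical.
const ID_WIDTH = 4;
const TEXT_WIDTH = 100;

// Build one fixed-width record: a zero-padded ID plus three text fields,
// each space-padded or truncated to exactly TEXT_WIDTH bytes.
function packRecord(int $id, string $subject, string $to, string $from): string
{
    if ($id < 0 || $id > 9999) {
        throw new InvalidArgumentException('ID does not fit in 4 digits');
    }
    return sprintf('%04d%-100.100s%-100.100s%-100.100s', $id, $subject, $to, $from);
}

// Cut the record apart again with substr() and strip the padding.
// Note that rtrim() also removes trailing spaces that were part of the data.
function unpackRecord(string $record): array
{
    return [
        'id'      => (int) substr($record, 0, ID_WIDTH),
        'subject' => rtrim(substr($record, ID_WIDTH, TEXT_WIDTH)),
        'to'      => rtrim(substr($record, ID_WIDTH + TEXT_WIDTH, TEXT_WIDTH)),
        'from'    => rtrim(substr($record, ID_WIDTH + 2 * TEXT_WIDTH, TEXT_WIDTH)),
    ];
}
```

Because every record is exactly 304 bytes, record n starts at byte n * 304, so a single record can be reached with fseek() and overwritten in place. That is the search/replace advantage of the fixed-width layout.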

If you need more than a pointer to the right functions, then it's starting
to sound like a homework assignment and I wish you luck with it...

Cheers,
Andy
--
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

Jul 19 '06 #3
ronverdonk
4,258 Recognized Expert Specialist
Don't reinvent the wheel!
Either use a database or, if you must, stay with well-documented formats like CSV or XML.

Ron :cool:
Jul 19 '06 #4
ronverdonk
4,258 Recognized Expert Specialist
And to contradict myself:
I just saw a new class 'Variable Length Coding' at the PHP Classes, link:
http://www.phpclasses.org/browse/package/3232.html

Its description reads:
This class can be used to compress and uncompress data using the variable length encoding.

It can read a stream of data and pack it using a pure PHP implementation of the variable length encoding algorithm.

It can also do the opposite, reading a variable length encoded stream of data and unpacking it to restore the original uncompressed data.

So, if you still want variable length coding, check it out!

Ronald :cool:
Jul 19 '06 #5
That's the kind of project that SQLite was designed for. It's worth
looking into.

Jul 19 '06 #6
My suggestion is to use XML. PHP and JavaScript both have DOM classes that
support this format very well. It's also easily extensible. And best of
all, it's a text file.

Sample:

<mailbox name="some user">
  <email>
    <id>1234</id>
    <subject>Send me the check</subject>
    <to>no****@nospam.com</to>
    <from>so*****@someone.com</from>
    <message><![CDATA[blah blah blah blah blah
blah blah blah]]></message>
    <attach>path to attach 1</attach>
    <attach>path to attach 2</attach>
  </email>
  <email>
    <id>5678</id>
    <subject>Send me the check</subject>
    <to>no****@nospam.com</to>
    <from>so*****@someone.com</from>
    <message><![CDATA[blah blah blah ]]></message>
    <attach>path to attach 1</attach>
    <attach>path to attach 2</attach>
  </email>
  ....etc...
</mailbox>
<mailbox name="some other user">
....
</mailbox>
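
Reading a file like this back for an inbox listing might look as follows. This is only a sketch using PHP 5's SimpleXML extension on a cut-down, well-formed version of the sample; the function name and the example.com addresses are placeholders.

```php
<?php
// A well-formed, cut-down version of the sample; addresses are placeholders.
$xml = <<<XML
<mailbox name="some user">
  <email>
    <id>1234</id>
    <subject>Send me the check</subject>
    <to>to@example.com</to>
    <from>from@example.com</from>
    <message><![CDATA[blah blah blah]]></message>
  </email>
  <email>
    <id>5678</id>
    <subject>Re: Send me the check</subject>
    <to>to@example.com</to>
    <from>from@example.com</from>
    <message><![CDATA[blah]]></message>
  </email>
</mailbox>
XML;

// Collect id => subject pairs for an inbox listing.
// Numeric string keys such as "1234" become integer array keys in PHP.
function listSubjects(string $xml): array
{
    $mailbox = simplexml_load_string($xml);
    $subjects = [];
    foreach ($mailbox->email as $email) {
        $subjects[(string) $email->id] = (string) $email->subject;
    }
    return $subjects;
}
```

Note that the whole document is parsed into memory on every read, which matters once a mailbox holds thousands of messages.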

Chung Leong wrote:

That's the kind of project that SQLite was designed for. It's worth
looking into.
Jul 20 '06 #7
Hi Jerry Stuckle, the project specifies not to use a database; otherwise
it would definitely have been much easier. I have to store all the
information in the file itself. Thanks for pointing out that
whichever separator is chosen, even one with the least probability of
occurrence, can still occur in the subject line. Maybe we should use an
escape character for it, as is done in the mailbox file: every new mail
starts with "From ", but if that appears in the message itself, it is
replaced by ">From ". I will also look into the CSV format for storing the data.
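
A minimal sketch of the escape-character idea, in the same spirit as the ">From " convention. The choice of '|' as separator and '\' as escape character is an assumption made for illustration, as are the function names.

```php
<?php
// Hypothetical helpers: '|' as the separator and '\' as the escape
// character are assumptions, not something fixed by the project.
const SEP = '|';
const ESC = '\\';

// Join fields, prefixing any literal '\' or '|' inside a field with '\'.
// The escape character must be handled first so it is not escaped twice.
function joinFields(array $fields): string
{
    $escaped = [];
    foreach ($fields as $field) {
        $escaped[] = str_replace([ESC, SEP], [ESC . ESC, ESC . SEP], $field);
    }
    return implode(SEP, $escaped);
}

// Split a line back into fields, treating '\x' as the literal character x.
function splitFields(string $line): array
{
    $fields = [''];
    $i = 0;
    $n = strlen($line);
    while ($i < $n) {
        $ch = $line[$i];
        if ($ch === ESC && $i + 1 < $n) {
            $fields[count($fields) - 1] .= $line[$i + 1];
            $i += 2;
        } elseif ($ch === SEP) {
            $fields[] = '';
            $i++;
        } else {
            $fields[count($fields) - 1] .= $ch;
            $i++;
        }
    }
    return $fields;
}
```

This sketch does not handle newlines inside a field; if records are stored one per line, newlines would need an escape of their own.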

Hi Andy Jeffries, we are using PHP 5, so sprintf/fprintf can be used. I
haven't come across pointers in PHP. I will definitely try to
learn about them.

Hi ImOk, our initial data structure was in XML format
(an individual XML file for every user). As there can be thousands of
emails, the file will grow large and reading/writing may be slow and
error-prone. So it was suggested that we use a text file.

-----------------------------------------------------------------------------------------------------------------------------
This is how the datastructure is
-----------------------------------------------------------------------------------------------------------------------------

<mails>
  <details id="">
    <!-- Mail type (incoming, outgoing) -->
    <mailtype></mailtype>
    <!-- Whether the message is saved as a template (Yes: 1, No: 0) -->
    <istemplate></istemplate>
    <!-- The id of the mailbox in which the mail resides (id for Inbox, Personal Folders, Trash ...) -->
    <mailboxid></mailboxid>
    <!-- Message priority (Normal: 1, High Priority: 2) -->
    <priority></priority>
    <!-- Is the message starred (Yes: 1, No: 0) -->
    <isstarred></isstarred>
    <!-- Is the message read (Yes: 1, No: 0) -->
    <isread></isread>
    <!-- Has the message been replied to (Yes: 1, No: 0) -->
    <isreplied></isreplied>
    <!-- Has the message been forwarded to any email (Yes: 1, No: 0) -->
    <isforwarded></isforwarded>

    <!-- Does the message have an attachment (Yes: 1, No: 0) -->
    <hasattachment></hasattachment>
    <!-- Attachment details -->
    <attachments>
      <attdetails id="">
        <!-- Attachment file name -->
        <filename></filename>
        <!-- Attachment file size -->
        <filesize></filesize>
      </attdetails>
    </attachments>
    <!-- Sender name -->
    <fromname></fromname>
    <!-- Sender email -->
    <fromemail></fromemail>
    <!-- Total emails in the conversation (1, 2, ...) -->
    <totalconversation></totalconversation>
    <!-- Main email detail id (sno), from which the conversation started -->
    <mainemailsno></mainemailsno>
    <!-- Emails in To field -->
    <toemails></toemails>
    <!-- Emails in CC field -->
    <ccemails></ccemails>

    <!-- Mail content in HTML format -->
    <htmlcontent></htmlcontent>
    <!-- Mail content in Text format -->
    <textcontent></textcontent>
    <!-- Date time when the message was sent -->
    <sentdatetime></sentdatetime>
    <!-- Message size in KB -->
    <messagesize></messagesize>

    <!-- Offset in mbx file -->
    <offsetinmbx></offsetinmbx>

    <!-- Extra details for incoming/outgoing type emails -->
    <incomingdetails>
    </incomingdetails>
    <outgoingdetails>
      <!-- Emails in BCC field -->
      <bccemails></bccemails>
      <!-- Message status (sent, pending) -->
      <msgstatus></msgstatus>
      <!-- Id of the signature to be appended to the message -->
      <signatureid></signatureid>
      <!-- Scheduled date time (24 hour format) for sending the mail to recipients (MM/DD/YYYY hh:mm) -->
      <scheduledtime></scheduledtime>
      <!-- Whether to request a return receipt (Yes: 1, No: 0) -->
      <requestreceipt></requestreceipt>
      <!-- Message send status (pending, sent) -->
      <sendstatus></sendstatus>
    </outgoingdetails>
  </details>

</mails>

-----------------------------------------------------------------------------------------------------------------------------

But the other settings will still be in an XML file.

We are using SimpleXML functions (get values, update values) and DOM
(insert). The delete functionality is still not working. We are
thinking of implementing it with preg_replace().
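
One way deletion is often done without regular expressions is to hand the SimpleXML node to DOM and remove it from its parent. The following is only a sketch under that assumption: the element and attribute names follow the structure above, and the function name is hypothetical.

```php
<?php
// Remove the <details> element whose id attribute matches $id and return
// the updated XML as a string. The function name is hypothetical.
function deleteMail(string $xml, string $id): string
{
    $mails = simplexml_load_string($xml);
    foreach ($mails->details as $details) {
        if ((string) $details['id'] === $id) {
            // Both extensions share the same underlying tree, so removing
            // the DOM node also removes it from the SimpleXML object.
            $node = dom_import_simplexml($details);
            $node->parentNode->removeChild($node);
            break; // stop iterating once the node is gone
        }
    }
    return $mails->asXML();
}
```

Working on the parsed tree avoids the problems a pattern match has with nested elements, CDATA sections, and comments.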

Thanks.

Manish

Jul 20 '06 #8
On Wed, 19 Jul 2006 21:07:06 -0700, Manish wrote:
> Hi Andy Jeffries, we are using PHP 5, so sprintf/fprintf can be used. I
> haven't come across using pointers in PHP. I will definitely try to learn
> it.

It's not pointers but string parsing (extracting a section of a string,
and formatting a string so that each field has an exact length).

> Hi ImOk, our initial datastructure was in the XML format itself,
> (individual XML file for every user). As there can be thousands of email,
> the file will grow larger and reading/writing may be slow/error prone. So
> it was suggested to use text file.

I don't wish to sound offensive, but if you can't correctly write to an
XML file without errors, why do you think you'll be able to do it to a
flat file using functions/methods you don't know?

Also, bear in mind if you use a database it will also handle locking from
multiple processes easily, which you will have to handle yourself in this
situation.

Don't think "we'll only have one user accessing their account through a
single web instance so we won't have concurrency issues" - people these
days may use browser tabs to work on their mail concurrently.

And you really do run the risk of data loss/corruption if you don't
correctly lock access to the file.
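
For a flat file, the locking would have to be done by hand, for example with PHP's flock(). A minimal sketch follows; the function name and the callback shape are assumptions. Bear in mind that flock() is advisory, so it only protects the file if every reader and writer goes through the same locking code.

```php
<?php
// Run $update on the file's contents while holding an exclusive lock.
// The function name and the callback shape are hypothetical.
function updateLocked(string $path, callable $update): void
{
    $fh = fopen($path, 'c+b');          // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fh, LOCK_EX)) {     // blocks until other lock holders finish
            throw new RuntimeException("Cannot lock $path");
        }
        $old = stream_get_contents($fh);
        $new = $update($old);
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, $new);
        fflush($fh);                    // push data out before unlocking
        flock($fh, LOCK_UN);
    } finally {
        fclose($fh);
    }
}
```

The read, modify, and write steps all happen under one lock, so two browser tabs updating the same mailbox cannot interleave their writes.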

Cheers,

Andy

--
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

Jul 20 '06 #9
Manish,

If the problem is speed, a flat file isn't going to help you that much
more. You'll still have to encode and decode the data, no matter which
format you use. And even if it's faster now, all you're doing is
delaying the inevitable. You definitely need a database.

If it were me, I'd go back to them and explain why they need a database.
But I'm only a consultant...

--
=============== ===
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attgl obal.net
=============== ===
Jul 20 '06 #10
