Bytes | Software Development & Data Engineering Community

High-reliability files: writing, reading and maintaining

Hello, help/advice appreciated.

Background:
I am writing some web scripts in Python to receive small amounts of data
from remote sensors and store the data in a file: 50 to 100 bytes every 5 or
10 minutes. A new file for each day is anticipated. Of considerable
importance is the long-term availability of this data and its gathering and
storage without gaps.

As the remote sensors have little on board storage it is important that a
web server is available to receive the data. To that end two separately
located servers will be running at all times and updating each other as new
data arrives.

I also assume each server will maintain two copies of the current data file,
only one of which would be open at any one time, and some means of
indicating if a file write has failed and which file contains valid data.
The latter is not so important as the data itself will indicate both its
completeness (missing samples) and its newness because of a time stamp with
each sample.
I would wish to secure this data gathering against crashes of the OS,
hardware failures and power outages.

So my request:
1. Are there any Python modules 'out there' that might help in securely
writing such files?
2. Can anyone suggest a book or two on this kind of file management? (These
kinds of problems must have been solved in the financial world many times.)

Many thanks,

John Pote

Feb 7 '06 #1

"John Pote" <jo******@blueyonder.co.uk> wrote in message
news:Y1*********************@fe3.news.blueyonder.co.uk...
> I would wish to secure this data gathering against crashes of the OS,

I have read about people running *nix servers a year or more without
stopping.

> hardware failures

To transparently write to duplicate disks, look up RAID (but not level 0,
which I believe does no duplication).

> and power outages.

The UPSes (uninterruptible power supplies) sold in stores will run a
computer for about half an hour on the battery. This is long enough either to
shut down the computer gracefully or to start up a generator.


Feb 7 '06 #2
John Pote wrote:
> Hello, help/advice appreciated.
> I am writing some web scripts in python to receive small amounts of data
> from remote sensors and store the data in a file. 50 to 100 bytes every 5 or
> 10 minutes. A new file for each day is anticipated. Of considerable
> importance is the long term availability of this data and its gathering and
> storage without gaps.


This looks to me like the kind of thing a database is designed to
handle. File systems under many operating systems have a nasty
habit of re-ordering writes for I/O efficiency, and don't necessarily
have the behavior you need for your application. The "ACID" design
criteria for database design ask that operations on the DB are:
Atomic
Consistent
Isolated
Durable
"Atomic" means that the database always appears as if the "transaction"
has either happened or not; it is not possible for any transaction to
see the DB with any transaction in a semi-completed state. "Consistent"
says that if you have invariants that are true about the data in the
database, and each transaction preserves the invariants, the database
will always satisfy the invariants. "Isolated" (that is, independent)
essentially says that no transaction (such as reading the DB) will be
able to tell it is running in parallel with other transactions (such as
reads). "Durable" says that, once a transaction has been committed, even
pulling the plug and restarting the DBMS should give a database that
contains every committed transaction, and no pieces of any other.

Databases often provide pre-packaged ways to do backups while the DB
is running. These considerations are at the core of database
design, so I'd suggest you consider using a DB for your application.

I do note that some of the most modern operating systems are trying
to provide "log-structured file systems," which may help with the
durability of file writes. I understand there is an attempt even to
provide transactional interactions to the file systems, but I'm not
sure how far down the line that goes.
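
To make the write re-ordering concern above concrete: if plain files are used anyway, one common approach is to append each sample and then call os.fsync before telling the sensor the data is safe. The following is only a minimal sketch under stated assumptions (one text line per sample, a POSIX-like system, and a hypothetical helper name append_record); none of it comes from the original posts.

```python
import os

def append_record(path, line):
    """Append one text record and ask the OS to push it to stable storage.

    Hypothetical helper for illustration; one record per line is an assumption.
    """
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)
        if written != len(data):
            raise OSError("short write to %s" % path)
        # Block until the OS reports the data as written to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Even after fsync returns, a drive with a volatile write cache may still lose the data in a power failure, and on POSIX systems a newly created file may also need an fsync on its directory. Those are among the details a database takes care of for you.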

--
-Scott David Daniels
sc***********@acm.org
Feb 7 '06 #3
Terry Reedy wrote:
> "John Pote" <jo******@blueyonder.co.uk> wrote in message
> news:Y1*********************@fe3.news.blueyonder.co.uk...
>> I would wish to secure this data gathering against crashes of the OS,
> I have read about people running *nix servers a year or more without
> stopping.

He'd probably want to check the various block-journaling filesystems to
boot (such as Reiser4 or ZFS). Even though they don't reach DB-level
data integrity, they've reached an interesting and certainly useful
level of recovery.

> To transparently write to duplicate disks, look up RAID (but not level 0
> which I believe does no duplication).

Indeed, Raid0 stores data across several physical drives (striping),
Raid1 fully duplicates the data over several physical HDs (mirror raid),
Raid5 uses parity checks (which puts it between Raid0 and Raid1) and
requires at least 3 physical drives (Raid0 and Raid1 require 2 or more).

You can also nest Raid arrays. The most common nestings are Raid 01
(creating Raid1 arrays of Raid0 arrays), Raid 10 (creating Raid0 arrays
of Raid1 arrays), Raid 50 (Raid0 array of Raid5 arrays), and the "Raids
for Paranoids", Raid 15 and Raid 51 (creating a Raid5 array of
Raid1 arrays, or a Raid1 array of Raid5 arrays). Both basically mean
that you're spending most of your storage space on redundancy
information, but that the probability of losing any data is extremely low.
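
The "wasting most of your storage space" remark can be checked with a little arithmetic. This is a rough sketch that assumes equal-size drives and the textbook capacity rules (striping adds capacity, mirroring keeps one copy's worth, Raid5 gives up one member to parity); the function names are made up for illustration.

```python
def raid0(members):
    # Striping: usable space is the member count times the smallest member.
    return len(members) * min(members)

def raid1(members):
    # Mirroring: every member holds the same data.
    return min(members)

def raid5(members):
    # One member's worth of space goes to parity; needs at least 3 members.
    if len(members) < 3:
        raise ValueError("Raid5 needs at least 3 members")
    return (len(members) - 1) * min(members)

disk = 1.0  # capacity of one drive, in arbitrary units

# Raid 10: a Raid0 stripe over two Raid1 pairs (4 drives in total).
raid10 = raid0([raid1([disk, disk]) for _ in range(2)])
# Raid 15: a Raid5 array over three Raid1 pairs (6 drives in total).
raid15 = raid5([raid1([disk, disk]) for _ in range(3)])
# Raid 51: a Raid1 mirror of two 3-drive Raid5 arrays (6 drives in total).
raid51 = raid1([raid5([disk] * 3) for _ in range(2)])

for name, usable, drives in (("Raid 10", raid10, 4),
                             ("Raid 15", raid15, 6),
                             ("Raid 51", raid51, 6)):
    print("%s: %.0f%% of raw capacity usable" % (name, 100.0 * usable / drives))
```

With these minimal layouts, Raid 10 keeps half of the raw capacity, while Raid 15 and Raid 51 keep only a third.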
Feb 7 '06 #4
"John Pote" <jo******@blueyonder.co.uk> writes:
> 1. Are there any python modules 'out there' that might help in securely
> writing such files.
> 2. Can anyone suggest a book or two on this kind of file management. (These
> kind of problems must have been solved in the financial world many times).


It's a complicated subject and is intimately mixed up with details of
the OS and filesystem you're using. The relevant books are books
about database implementation.

One idea for your situation is to use an actual database (e.g. MySQL or
PostgreSQL) to store the data, so someone else (the database
implementer) will have already dealt with the issues of making sure
data is flushed properly. Use one of the Python DB-API modules to
communicate with the database.
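
For what it's worth, the DB-API pattern looks roughly like this. The sketch uses the standard-library sqlite3 module purely as a stand-in, because it needs no server; with a MySQL or PostgreSQL driver the connect call differs and the parameter placeholder is usually "%s" rather than "?". The table, columns and the store_sample helper are hypothetical.

```python
import sqlite3  # stand-in for a MySQL or PostgreSQL DB-API driver

def store_sample(conn, sensor_id, timestamp, value, placeholder="?"):
    """Insert one sample inside a transaction on a DB-API connection.

    placeholder is "?" for sqlite3; many other drivers expect "%s".
    """
    sql = "INSERT INTO samples (sensor_id, ts, value) VALUES (%s, %s, %s)" % (
        (placeholder,) * 3)
    cur = conn.cursor()
    try:
        cur.execute(sql, (sensor_id, timestamp, value))
        conn.commit()    # report success to the sensor only after this returns
    except Exception:
        conn.rollback()  # leave no half-written sample behind
        raise
    finally:
        cur.close()

# In-memory database for demonstration only; a real setup would connect to a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sensor_id TEXT, ts TEXT, value REAL)")
store_sample(conn, "sensor-1", "2006-02-08T12:00:00", 21.5)
print(conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0])
```

The point of the commit/rollback pair is that a sample is either stored completely or not at all, which is the "atomic" part of what the database buys you.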
Feb 8 '06 #5
Others have made recommendations that I agree with: Use a REAL
database that supports transactions. Other items you must
consider:

1) Don't spend a lot of time engineering your software and then
purchase the cheapest server you can find. Most fault tolerance
has to do with dealing with hardware failures. Eliminate as
many single-point-of-failure devices as possible. If your
application requires 99.999% uptime, consider clustering.

2) Using RAID arrays, multiple controllers, ECC memory, etc. is
not cheap but then fault tolerance requires such investments.

3) Don't forget that power and Internet access are normally the
final single point of failure. It doesn't matter about all the
rest if the power is off for an extended period of time. You
will need to host your server(s) at a hosting facility that has
rock-solid Internet pipes and generator backed power. It won't
do any good to have a kick-ass server and software that can
handle all types of failures if someone knocking over a power
pole outside your office can take you offline.
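
To put the uptime figure in point 1 into perspective, here is a small calculation of how much downtime per year a given availability allows. It assumes the figure is meant as a percentage ("five nines") and uses an average year of 365.25 days; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days

def downtime_minutes_per_year(availability_percent):
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)

for pct in (99.0, 99.9, 99.99, 99.999):
    print("%7.3f%% -> %8.1f minutes/year" % (pct, downtime_minutes_per_year(pct)))
```

At 99.999% that comes to a little over five minutes a year, which is about one sample interval for sensors reporting every 5 to 10 minutes.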

Hope info helps.

-Larry Bates
Feb 8 '06 #6
John Pote wrote:
> I would wish to secure this data gathering against crashes of the OS,
> hardware failures and power outages.


My first thought when reading this is "SQLite" (with the Python wrappers
PySqlite or APSW).

See http://www.sqlite.org where it claims "Transactions are atomic,
consistent, isolated, and durable (ACID) even after system crashes and
power failures",

.... or some of the sections in http://www.sqlite.org/lockingv3.html
which provide more technical background.

If intending to rely on this for a mission critical system, one would be
well advised to research independent analyses of the claims.
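
A minimal sketch of how this might look with the sqlite3 module that ships with current Python versions (used here instead of the PySqlite or APSW wrappers named above). The schema, the function names and the idea of keying each sample on sensor and timestamp are assumptions for illustration; the key makes it harmless to replay the same sample when two servers update each other, and the timestamps make gaps easy to find.

```python
import sqlite3

def open_store(path):
    conn = sqlite3.connect(path)
    # Ask SQLite to sync to disk at every critical point of a commit.
    conn.execute("PRAGMA synchronous = FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " sensor_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"       # seconds since the epoch
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (sensor_id, ts))"
    )
    conn.commit()
    return conn

def store(conn, sensor_id, ts, payload):
    """Store one sample; return False if it was already present."""
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO samples VALUES (?, ?, ?)",
            (sensor_id, ts, payload))
    return cur.rowcount == 1

def find_gaps(conn, sensor_id, max_interval_s):
    """Return (earlier, later) timestamp pairs further apart than expected."""
    rows = conn.execute(
        "SELECT ts FROM samples WHERE sensor_id = ? ORDER BY ts",
        (sensor_id,)).fetchall()
    stamps = [row[0] for row in rows]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_interval_s]
```

Whether the durability claim holds on a particular OS, filesystem and disk is exactly what the independent research suggested above would need to confirm.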

-Peter

Feb 8 '06 #7
On Wed, 08 Feb 2006 00:29:16 +0100, rumours say that Xavier Morel
<xa**********@masklinn.net> might have written:
> You can also nest Raid arrays, the most common nesting are Raid 01
> (creating Raid1 arrays of Raid0 arrays), Raid 10 (creating Raid0 arrays
> of Raid1 arrays), Raid 50 (Raid0 array of Raid5 arrays), and the "Raids
> for Paranoids", Raid 15 and Raid 51 arrays (creating a Raid5 array of
> Raid1 arrays, or a Raid1 array of Raid5 arrays, both basically means
> that you're wasting most of your storage space for redundancy
> information, but that the probability of losing any data is extremely low).


Nah, too much talk. Better provide images:

http://www.epidauros.be/raid.jpg

--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Feb 10 '06 #8
