Bytes | Software Development & Data Engineering Community

How to best update remote compressed, encrypted archives incrementally?

Hello,

I want to put changed/new files from a big file tree incrementally,
"directly, compressed and password-only-encrypted", onto a remote backup
server via FTP, SFTP or DAV. Ideally this would be a closed
algorithm inside Python, without extra shell tools.
(The method should work with any protocol that allows reading,
writing and seeking in a remote file.)
Neither the server nor the transmission line should ever see
unencrypted data.

Usually one would create a big archive, then compress it, then encrypt it
(e.g. with gpg -c file), then transfer it. However, that method needs a lot
of free temporary disk space and, most costly, always transfers the
complete archive.
With proven whole-file encryption methods like GPG I don't get the
flexibility my task needs, I guess?
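The temp-space half of this problem can be avoided by compressing as a stream and handing each compressed piece straight to the next stage (encryption, upload) as it is produced. A minimal sketch with the standard `zlib` module; the helper name `compressed_chunks` and the 64 KiB chunk size are assumptions, and encryption and upload are left out.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # assumption: read 64 KiB of input at a time


def compressed_chunks(src, level=6):
    """Yield zlib-compressed pieces of the binary file object `src`.

    Only one input chunk is held in memory at a time, so each piece can be
    encrypted and uploaded as it is produced instead of going to a temp file.
    """
    comp = zlib.compressobj(level)
    while True:
        block = src.read(CHUNK_SIZE)
        if not block:
            break
        piece = comp.compress(block)
        if piece:  # zlib may buffer input and return nothing yet
            yield piece
    yield comp.flush()  # remaining buffered data plus the stream trailer


# Usage: an in-memory file stands in for a large local file.
data = b"backup me " * 100_000
stream = b"".join(compressed_chunks(io.BytesIO(data)))
assert zlib.decompress(stream) == data
```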

The ZIP2 format allows encryption (is this ZIP encryption method supported
by Python at all?). It would then somehow be possible to
navigate in a remote ZIP (e.g. over FTP). But ZIP encryption is also
known to be very weak; it can be cracked within a few hours of computing
time, at least when every file uses the same password.

Another method would be to produce slice files: create incremental
TAR/ZIP archives, encrypt them locally with "gpg -c" and upload them as
separate files. That is still a fragile setup which allows only rough
control, needs a common archive timestamp (comparing individual file
attributes is not possible), and needs external tools.
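For reference, the slice approach itself needs no external tool apart from the encryption step: the standard `tarfile` module can write a compressed incremental slice. A minimal sketch, assuming the "common archive time stamp" is the Unix time of the previous run; `make_slice` is a hypothetical name and encryption (`gpg -c` or otherwise) is not shown. It also shows the known weakness of a single threshold: a file copied into the tree with an old modification time is skipped.

```python
import os
import tarfile


def make_slice(root, since, out_path):
    """Write every file under `root` modified after `since` (Unix time) into
    a gzip-compressed tar slice at `out_path`; return the archived paths.

    `out_path` should lie outside `root`, or the slice being written could
    be picked up by the walk.
    """
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if os.path.getmtime(full) > since:  # the single threshold test
                    arcname = os.path.relpath(full, root)
                    tar.add(full, arcname=arcname)
                    added.append(arcname)
    return added
```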

Much nicer would be a method that can directly compare against and update
a single consistent file like
ftp://..../archive.zip.gpg

Is something like this possible?

Robert
Mar 10 '06 #1
On Fri, 10 Mar 2006 15:13:07 +0100, robert wrote:
> I want to put changed/new files from a big file tree incrementally,
> "directly, compressed and password-only-encrypted", onto a remote backup
> server via FTP, SFTP or DAV. Ideally this would be a closed
> algorithm inside Python, without extra shell tools.
What do you mean by "closed algorithm"?

The only thing I can think of is you mean a secret algorithm, one which
nobody but yourself will know. So let's get this straight... you are
asking a public newsgroup dedicated to an open-source language for
somebody to tell you a secret algorithm that only you will know?

Please tell me I've misunderstood.

> (The method should work with any protocol that allows reading,
> writing and seeking in a remote file.)
> Neither the server nor the transmission line should ever see
> unencrypted data.


Break the job into multiple pieces. Your task is:

- transmit information to the remote server;

Can you use SSH for that? SSH will use industrial-strength encryption,
likely better than anything you can create.

- you want to update the files at the other end;

Sounds like a job for any number of already existing technologies, like
rsync (which, by the way, already uses ssh for the encrypted transmission
of data).

--
Steven.

Mar 11 '06 #2
Steven D'Aprano wrote:
> On Fri, 10 Mar 2006 15:13:07 +0100, robert wrote:
>> Ideally this would be a closed algorithm inside Python, without extra shell tools.
>
> What do you mean by "closed algorithm"?
>
> The only thing I can think of is you mean a secret algorithm, one which
> nobody but yourself will know. So let's get this straight... you are
> asking a public newsgroup dedicated to an open-source language for
> somebody to tell you a secret algorithm that only you will know?
>
> Please tell me I've misunderstood.

No. I meant it in the sense of 'cohesive': a Python solution without a lot
of other tools. (Only the password has to be secret.)
> Break the job into multiple pieces. Your task is:
>
> - transmit information to the remote server;
>
> Can you use SSH for that? SSH will use industrial-strength encryption,
> likely better than anything you can create.


Yes, SFTP (= SSH) or FTP with TLS (= SSL) are good protocols. They can
also read/navigate in a remote file and append to a file. But what about
incremental + encrypted?
> - you want to update the files at the other end;
>
> Sounds like a job for any number of already existing technologies, like
> rsync (which, by the way, already uses ssh for the encrypted transmission
> of data).


As far as I know, rsync cannot update compressed + encrypted data inside
an existing file (set)?
In any case, with rsync I would need a duplicate of the backup
file tree on the local machine (consuming roughly as much space again as
the files themselves)?

That's why I ask: how do I combine all these tasks into a cohesive encrypted
backup solution that wastes neither disk space nor network bandwidth?

Robert
Mar 11 '06 #3
On Sat, 11 Mar 2006 11:46:24 +0100, robert wrote:
> As far as I know, rsync cannot update compressed + encrypted data inside
> an existing file (set)?
> In any case, with rsync I would need a duplicate of the backup
> file tree on the local machine (consuming roughly as much space again as
> the files themselves)?


Let me see if I understand you.

On the remote machine, you have one large file, which is compressed and
encrypted. Call the large file "Archive". Archive is made up of a number
of virtual files, call them A, B, ... Z. Think of Archive as a compressed
and encrypted tar file.

On the local machine, you have some, but not all, of those smaller
files, let's say B, C, D, and E. You want to modify those smaller files,
compress them, encrypt them, transmit them to the remote machine, and
insert them in Archive, replacing the existing B, C, D and E.

Is that correct?
> That's why I ask: how do I combine all these tasks into a cohesive encrypted
> backup solution that wastes neither disk space nor network bandwidth?


What's your budget for developing this solution? $100? $1000? $10,000?
Stop me when I get close. Remember, your time is money, and if you are a
developer, every hour you spend on this is costing your employer anything
from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
might value your time as Free.)

If you have an unlimited budget, you can probably create a solution to do
this, keeping in mind that compressed/encrypted and modify-in-place
*rarely* go together.
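One common way around the modify-in-place problem is to never modify in place at all: append a new, independently compressed record for each changed file and let the newest record for a name win (superseded records waste space until the archive is compacted). That only needs append and ranged reads on the remote side. A minimal sketch of such a log-structured archive; the record layout and the names `append_record`, `latest_index` and `read_member` are hypothetical, `seal`/`unseal` stand in for per-record encryption (identity functions here), and member names are stored unencrypted in this sketch.

```python
import io
import struct
import zlib

# Hypothetical record header: name length (2 bytes), payload length (4 bytes).
HEADER = struct.Struct(">HI")


def append_record(archive, name, data, seal=lambda b: b):
    """Append one record: header, UTF-8 name, sealed compressed payload."""
    payload = seal(zlib.compress(data))
    encoded = name.encode("utf-8")
    archive.seek(0, io.SEEK_END)
    archive.write(HEADER.pack(len(encoded), len(payload)) + encoded + payload)


def latest_index(archive):
    """Map each name to (offset, length) of its newest payload."""
    index = {}
    archive.seek(0)
    while True:
        header = archive.read(HEADER.size)
        if len(header) < HEADER.size:
            break
        name_len, payload_len = HEADER.unpack(header)
        name = archive.read(name_len).decode("utf-8")
        index[name] = (archive.tell(), payload_len)  # later records overwrite earlier
        archive.seek(payload_len, io.SEEK_CUR)
    return index


def read_member(archive, index, name, unseal=lambda b: b):
    offset, length = index[name]
    archive.seek(offset)
    return zlib.decompress(unseal(archive.read(length)))


# Usage: a BytesIO stands in for the remote archive file.
arc = io.BytesIO()
append_record(arc, "B", b"old contents of B")
append_record(arc, "C", b"contents of C")
append_record(arc, "B", b"new contents of B")  # replaces B without rewriting anything
index = latest_index(arc)
print(read_member(arc, index, "B"))  # b'new contents of B'
```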

If you have a lower budget, I'd suggest you drop the "single file"
requirement. Hard disks are cheap, less than an Australian dollar a
gigabyte, so don't get trapped into the false economy of spending $100 of
developer time to save a gigabyte of data. Using multiple files makes it
*much* simpler to modify-in-place: you simply replace the modified file.
Of course the individual files can be compressed and encrypted, or you can
use a compressed/encrypted file system.
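If each file becomes its own remote object, the remote side would still see every file name. A minimal sketch of one way to hide them, assuming a secret `name_key` kept alongside the password: derive each remote name from the relative path with HMAC-SHA256 (standard `hmac`/`hashlib`), so a given path always maps to the same opaque name. `remote_name` is a hypothetical helper; the mapping is one-way, so a restore also needs an encrypted list of the original paths.

```python
import hashlib
import hmac


def remote_name(name_key, relative_path):
    """Map a local relative path to a stable, opaque remote file name."""
    # Normalise separators so Windows and POSIX clients derive the same name
    # (assumes paths never contain a literal backslash as part of a name).
    canonical = relative_path.replace("\\", "/").encode("utf-8")
    return hmac.new(name_key, canonical, hashlib.sha256).hexdigest() + ".bin"


# Usage: the server only ever sees 64 hex digits plus ".bin".
key = b"\x00" * 32  # placeholder; derive a real key from the password instead
print(remote_name(key, "docs/report.txt") == remote_name(key, "docs\\report.txt"))  # True
```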

Lastly, have you considered that your attempted solution is completely the
wrong way to solve the problem? If you explain _what_ you are wanting to
do, rather than _how_ you want to do it, perhaps there is a better way.
--
Steven.

Mar 11 '06 #4
Steven D'Aprano wrote:

> Let me see if I understand you.
>
> On the remote machine, you have one large file, which is compressed and
> encrypted. Call the large file "Archive". Archive is made up of a number
> of virtual files, call them A, B, ... Z. Think of Archive as a compressed
> and encrypted tar file.
>
> On the local machine, you have some, but not all, of those smaller
> files, let's say B, C, D, and E. You want to modify those smaller files,
> compress them, encrypt them, transmit them to the remote machine, and
> insert them in Archive, replacing the existing B, C, D and E.
>
> Is that correct?


Yes, that is it. In addition, a way to compare individual files quickly
would be ideal.
> [snip]
>
> Lastly, have you considered that your attempted solution is completely the
> wrong way to solve the problem? If you explain _what_ you are wanting to
> do, rather than _how_ you want to do it, perhaps there is a better way.


So there seems to be a big barrier to that task when the encryption covers
the whole archive. Complex block navigation within a block cipher
would be required, and apparently no such (handy) code
exists yet. Or is there an encryption/decryption method that you can
use like a file pipe _and_ that supports 'seek'?
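Seekable encryption does exist in principle: in counter (CTR) mode, keystream block i depends only on the key, a nonce and the counter i, so any byte range can be decrypted on its own after a seek. A minimal sketch of the idea, assuming HMAC-SHA256 from the standard library as the keystream function purely for illustration (real code would use AES-CTR from a vetted crypto library, plus authentication); `ctr_xor` is a hypothetical name. One caveat matters for in-place updates: rewriting a region under the same key and nonce reuses keystream and leaks the XOR of old and new plaintext, so updated data must be written under a fresh nonce.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 32  # keystream bytes per counter value (SHA-256 output size)


def _keystream_block(key, nonce, counter):
    # Illustration only: HMAC-SHA256 stands in for the block cipher of real AES-CTR.
    return hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()


def ctr_xor(key, nonce, offset, data):
    """Encrypt or decrypt `data` located at byte `offset` of the stream.

    XOR with the keystream is its own inverse, and each keystream block
    depends only on (key, nonce, counter), so any range works on its own.
    """
    out = bytearray()
    pos = offset
    while len(out) < len(data):
        counter, skip = divmod(pos, BLOCK_SIZE)
        keystream = _keystream_block(key, nonce, counter)[skip:]
        chunk = data[len(out):len(out) + len(keystream)]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
        pos += len(chunk)
    return bytes(out)


# Usage: decrypt one slice without touching the rest of the ciphertext.
key, nonce = os.urandom(32), os.urandom(16)
plaintext = b"".join(b"record %04d\n" % i for i in range(1000))
ciphertext = ctr_xor(key, nonce, 0, plaintext)
# e.g. after seeking to offset 5000 in the remote file and reading 100 bytes
assert ctr_xor(key, nonce, 5000, ciphertext[5000:5100]) == plaintext[5000:5100]
```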

Thus a simple method would use a common threshold timestamp or
archive bits and create multiple archive slices. (This is unstable when the
file set is dynamic and older files are copied into the tree.)

Two nearly optimal solutions that allow comparing individual files:

1st:
+ an (s)ftp(s)-to-zip/tar bridge seems possible, e.g. by hooking
ZipFile to use a virtual self.fp
+ the files would be individually encrypted with a password
- an external tool like "gpg -c" is necessary (or is there good
encryption in a native Python module? Is PGP (password only) possible
with a native Python module?)
- the filenames would be visible
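`zipfile.ZipFile` accepts any seekable binary file object, so the "virtual self.fp" idea may not need any hooking of `ZipFile`: a small `io.RawIOBase` subclass can turn `seek`/`read` into ranged fetches, and ZipFile then only pulls the central directory and the members it needs. A minimal sketch; `RangeReader` is hypothetical and `fetch(offset, size)` is an assumed callable you would implement per protocol (for example an SFTP file's `seek` and `read`, or an FTP retrieval started at an offset). An in-memory zip stands in for the server here.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object on top of a ranged `fetch` callable.

    fetch(offset, size) is an assumed, protocol-specific function that must
    return exactly `size` bytes starting at `offset` (fewer only at EOF).
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size  # remote file size, e.g. from FTP SIZE or SFTP stat
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            base = 0
        elif whence == io.SEEK_CUR:
            base = self._pos
        elif whence == io.SEEK_END:
            base = self._size
        else:
            raise ValueError(f"invalid whence: {whence}")
        if base + offset < 0:
            raise OSError("negative seek position")  # same as a real file
        self._pos = base + offset
        return self._pos

    def readinto(self, buf):
        n = min(len(buf), max(0, self._size - self._pos))
        if n == 0:
            return 0  # end of file
        data = self._fetch(self._pos, n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


# Usage: an in-memory zip stands in for the remote archive.
local = io.BytesIO()
with zipfile.ZipFile(local, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/a.txt", "first file")
    zf.writestr("docs/b.txt", "second file " * 1000)
remote_bytes = local.getvalue()

requests = []


def fetch(offset, size):  # hypothetical stand-in for a ranged FTP/SFTP read
    requests.append((offset, size))
    return remote_bytes[offset:offset + size]


with zipfile.ZipFile(RangeReader(fetch, len(remote_bytes))) as remote_zip:
    print(remote_zip.namelist())          # ['docs/a.txt', 'docs/b.txt']
    print(remote_zip.read("docs/a.txt"))  # b'first file'
```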

2nd:
+ manage a dummy file tree locally (with 0-length files) for speedy
comparison
+ create encrypted archive slices for upload, with iterated filenames
- an external tool like "gpg -c" is necessary
- an extra file tree or file attribute database is needed
- reconstructing the current state from multiple archive slices is arduous
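Instead of a dummy tree of 0-length files, the "file attribute database" can be a small manifest file. A minimal sketch, assuming file size plus modification time is a good enough change signal (hash the contents if it is not); `scan`, `diff`, `save_manifest` and `load_manifest` are hypothetical names. Because it compares per-file attributes rather than one threshold timestamp, a file copied into the tree with an old modification time still shows up as new. The manifest itself would be encrypted before upload like any other file.

```python
import json
import os


def scan(root):
    """Map each file under `root` (relative, '/'-separated) to [size, mtime_ns]."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # lists, not tuples, so a manifest reloaded from JSON compares equal
            manifest[rel] = [st.st_size, st.st_mtime_ns]
    return manifest


def diff(old, new):
    """Return (new_or_changed, deleted) paths between two manifests."""
    changed = sorted(path for path, attrs in new.items() if old.get(path) != attrs)
    deleted = sorted(path for path in old if path not in new)
    return changed, deleted


def save_manifest(manifest, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f)


def load_manifest(path):
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # first run: everything counts as new
```

A run would then load the previous manifest, `scan` the tree, upload what `diff` reports, and save the new manifest.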

Robert
Mar 11 '06 #5
On Sat, 11 Mar 2006 16:09:22 +0100, robert wrote:
> So there seems to be a big barrier to that task when the encryption covers
> the whole archive. Complex block navigation within a block cipher
> would be required, and apparently no such (handy) code
> exists yet. Or is there an encryption/decryption method that you can
> use like a file pipe _and_ that supports 'seek'?


[snip]

Let's try again: rather than you telling us what technology you want to
use, tell us what your aim is. I suspect you are too close to the trees to
see the forest -- you are focusing on the fine detail. Let's hear the big
picture: what is the problem you are trying to solve? Because, frankly, as
far as I can see, the solution you are looking for doesn't exist. But
maybe I'm too far from the forest to see the individual trees.

"I need encryption that supports seek" -- no, that's you telling us _how_
you want to solve your problem.

Perhaps you can tick some/all of the following requirements:

- low bandwidth usage when updating the remote site

- transmission needs to be secure

- data on the remote site needs to be secure in case of theft or break-ins

- remote site is under the control of untrusted parties;
or remote site is trusted

- remote site is an old machine with limited processing power and very
small disk storage;
or remote site can be any machine we choose

- local site needs to run Windows/Macintosh/Linux/BSD/all of the above

- remote site runs on Windows/Macintosh/Linux/BSD/anything we like

- we are updating text files/binary files

- anything else you can tell us about the nature of your problem

--
Steven.

Mar 11 '06 #6
Steven D'Aprano wrote:
> [snip]
> Let's hear the big picture: what is the problem you are trying to solve?
> [snip]
> "I need encryption that supports seek" -- no, that's you telling us _how_
> you want to solve your problem.
>
> Perhaps you can tick some/all of the following requirements:
> [snip]

The main requirement is that it has to become a cohesive, reusable,
portable (FTP/SFTP standard) functionality, as mentioned in the OP: a
Python module at best, for integration into a bigger Python app, not a
one-time admin hack with a bunch of tools to be fiddled together on each
user machine. So the 'how' is mostly == 'what'. It's a Python question so far.

The last two methods I mentioned may be a way to a compromise
(if integrated single-stream encryption cannot be managed).

The only remaining issue: a native Python module for PGP (password-only)
encryption or another kind of good (commonly supported)
encryption. ZIP2 encryption itself seems to be too weak? (Is that still so
in recent ZIP formats? What about the mode used by 7-Zip etc.?) But I found
no Python modules for either.

http://www.amk.ca/python/code/gpg just calls into an external gpg
installation.

Can the functionality of "gpg -c" perhaps be put together easily with
PyCrypto? (variable-length key/password only; no public-key stuff required)

And what about ZIP password-only encryption itself? Are there perhaps any
usable improvements?

And: when many files are encrypted with the same password (both
PGP and ZIP), does this decrease the strength of the encryption?
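With password-based schemes the usual safeguard for this is a random salt (and nonce) per file, so the same password still yields unrelated keys; OpenPGP's symmetric mode (`gpg -c`) uses a salted, iterated string-to-key (S2K) function for this purpose. A minimal sketch of the key-handling part, with the standard library's `hashlib.pbkdf2_hmac` standing in for S2K; the iteration count and the helper names `derive_keys`, `new_file_keys`, `tag` and `verify` are assumptions, and the cipher itself (e.g. AES from PyCrypto) is deliberately left out.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: tune so one derivation takes a noticeable fraction of a second
SALT_SIZE = 16


def derive_keys(password, salt, iterations=ITERATIONS):
    """Stretch a password into a 32-byte encryption key and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=64
    )
    return material[:32], material[32:]


def new_file_keys(password):
    """Fresh random salt per file; the salt is stored in that file's header."""
    salt = os.urandom(SALT_SIZE)
    enc_key, mac_key = derive_keys(password, salt)
    return salt, enc_key, mac_key


def tag(mac_key, ciphertext):
    """Encrypt-then-MAC tag; a mismatch means tampering or a wrong password."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify(mac_key, ciphertext, expected_tag):
    # constant-time comparison, so timing does not reveal how much of the tag matched
    return hmac.compare_digest(tag(mac_key, ciphertext), expected_tag)


# Usage: the same password used for two files gives two unrelated keys.
salt_a, key_a, _ = new_file_keys("same password")
salt_b, key_b, _ = new_file_keys("same password")
assert salt_a != salt_b and key_a != key_b
```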

Robert
Mar 11 '06 #7
Would rsync into a remote encrypted filesystem work for you?

Mar 13 '06 #8
ga*******@gmail.com wrote:
> Would rsync into a remote encrypted filesystem work for you?


The sync (selection) is custom anyway. The remote file system is
general/unknown. FTP(S)/SFTP is the only given standard.
Mar 13 '06 #9
