
Using Python To Create An Encrypted Container

Is it possible? BestCrypt can supposedly be set up on Linux, but it seems
to need changes to the kernel before it can be installed, and I have no
intention of going through whatever hell that would cause.

If I could create a large file that could be encrypted, and maybe add
files to it by appending them and putting in some kind of delimiter
between files, maybe a homemade version of TrueCrypt could be constructed.

Any idea what it would take?
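Instead of a delimiter (which could also occur inside a file's data), one common way to append files to a single large container is to prefix each entry with its lengths. A minimal Python 3 sketch of that framing, assuming a made-up record layout; `append_file` and `read_entries` are hypothetical names, and encrypting the container is not covered:

```python
import io
import struct

# Hypothetical record layout (not a standard format): a little-endian
# header holding the name length (uint32) and data length (uint64),
# followed by the UTF-8 encoded name and then the raw file data.
RECORD_HEADER = struct.Struct("<IQ")


def append_file(container, name, data):
    """Append one (name, data) record to a binary file object."""
    encoded = name.encode("utf-8")
    container.write(RECORD_HEADER.pack(len(encoded), len(data)))
    container.write(encoded)
    container.write(data)


def read_entries(container):
    """Yield (name, data) pairs from the current position to end of file."""
    while True:
        header = container.read(RECORD_HEADER.size)
        if not header:
            return  # clean end of the container
        if len(header) != RECORD_HEADER.size:
            raise ValueError("truncated record header")
        name_len, data_len = RECORD_HEADER.unpack(header)
        name = container.read(name_len).decode("utf-8")
        data = container.read(data_len)
        if len(data) != data_len:
            raise ValueError("truncated record data")
        yield name, data


buf = io.BytesIO()  # stands in for a real file opened in binary append mode
append_file(buf, "notes.txt", b"hello")
append_file(buf, "empty.bin", b"")
buf.seek(0)
print(list(read_entries(buf)))  # [('notes.txt', b'hello'), ('empty.bin', b'')]
```

Because each record carries its own lengths, the file data may contain any byte values, including whatever would otherwise have been chosen as a delimiter.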

Apr 16 '06 #1


Michael Sperlle <sp*****@yahoo.com> writes:
> If I could create a large file that could be encrypted, and maybe add
> files to it by appending them and putting in some kind of delimiter
> between files, maybe a homemade version of TrueCrypt could be constructed.
> Any idea what it would take?


If by container you mean a user-level file system with transparent
encryption, there are a bunch of ways to do it, but it's system
hacking; Python doesn't come into it much. If you just want an
encrypted archive, then put your files into a normal zip file and
encrypt the zip file.
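The "zip first, then encrypt" route can be done in memory with the standard `zipfile` module. A minimal Python 3 sketch; `zip_to_bytes` and `unzip_from_bytes` are hypothetical helper names, and the encryption step is deliberately left out because the standard library ships no cipher, so the resulting bytes would go to whatever encryption tool or library you already trust:

```python
import io
import zipfile


def zip_to_bytes(files):
    """Pack a {name_in_archive: bytes} mapping into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def unzip_from_bytes(blob):
    """Inverse of zip_to_bytes(): return {name_in_archive: bytes}."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


archive = zip_to_bytes({"a.txt": b"alpha", "docs/b.txt": b"beta"})
# `archive` is what would be handed to the encryption step (not shown).
assert unzip_from_bytes(archive) == {"a.txt": b"alpha", "docs/b.txt": b"beta"}
```

Note that with this approach the whole archive has to be encrypted and decrypted every time it is used, not just the files you touch.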
Apr 16 '06 #2

Michael Sperlle <sp*****@yahoo.com> wrote:
> Is it possible? BestCrypt can supposedly be set up on Linux, but it seems
> to need changes to the kernel before it can be installed, and I have no
> intention of going through whatever hell that would cause.
>
> If I could create a large file that could be encrypted, and maybe add
> files to it by appending them and putting in some kind of delimiter
> between files, maybe a homemade version of TrueCrypt could be constructed.
>
> Any idea what it would take?


You can either use FUSE and its Python bindings (this is rather trivial
filesystem-wise, although there are of course challenges in implementing
the encryption effectively and efficiently; this is what EncFS or Phonebook
do, but they are written in C).

Or implement an encrypted network block device in Python; again, this
should not be _that_ hard.
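One reason a block-device style container can be fast is that a read only touches the fixed-size blocks it overlaps, so only those need decrypting, rather than the whole volume. A Python 3 sketch of that offset arithmetic, assuming a hypothetical 4096-byte block size; the per-block encryption itself is not shown, and `blocks_for_range`/`slice_from_blocks` are hypothetical names:

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return (first, last) indices of the blocks covering [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("need offset >= 0 and length > 0")
    return offset // block_size, (offset + length - 1) // block_size


def slice_from_blocks(blocks, offset, length, block_size=BLOCK_SIZE):
    """Cut the requested bytes out of the (already decrypted) covering blocks."""
    first, _ = blocks_for_range(offset, length, block_size)
    start = offset - first * block_size
    return b"".join(blocks)[start:start + length]


# Usage: a 10-byte read at offset 4090 straddles blocks 0 and 1, so only
# those two blocks would need to be fetched and decrypted.
volume = bytes(range(256)) * 64  # 16 KiB stand-in for a decrypted volume
first, last = blocks_for_range(4090, 10)
needed = [volume[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(first, last + 1)]
assert (first, last) == (0, 1)
assert slice_from_blocks(needed, 4090, 10) == volume[4090:4100]
```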

Or implement either an NFS or a Samba server with transparent encrypted
storage; this is what CFS does (again in C). If you go the Samba way,
it would even be cross-platform, but I have no idea how difficult it
would be to implement a Samba server in Python.

In all cases, performance is going to be THE issue.

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
Apr 16 '06 #3

On Sun, 16 Apr 2006 08:11:12 -0700, Paul Rubin wrote:
> Michael Sperlle <sp*****@yahoo.com> writes:
>> If I could create a large file that could be encrypted, and maybe add
>> files to it by appending them and putting in some kind of delimiter
>> between files, maybe a homemade version of TrueCrypt could be
>> constructed. Any idea what it would take?
>
> If by container you mean a user-level file system with transparent
> encryption, there are a bunch of ways to do it, but it's system hacking,
> Python doesn't come into it much. If you just want an encrypted archive,
> then put your files into a normal zip file and encrypt the zip file.


I've tried that, but encrypting and decrypting the zip file takes a long
time compared to opening/closing a container, once it's made.

The only other thing I can think of is making it non-readable for anyone
except root, but I have the feeling that's not too secure.
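For reference, restricting a directory to its owner is a single `os.chmod` call, shown here in Python 3 with a hypothetical `make_private_dir` helper. It only guards against other (non-root) accounts on a running system; root can still read it, and it does nothing if someone boots another OS or removes the disk, which is the distinction the following reply draws:

```python
import os
import stat
import tempfile


def make_private_dir(path):
    """Create `path` if needed and restrict it to its owner (mode 0700).

    POSIX only: on Windows, os.chmod can only toggle the read-only flag.
    """
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # rwx for the owner, nothing for group/other
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as base:
    print(oct(make_private_dir(os.path.join(base, "vault"))))  # 0o700 on POSIX
```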
Apr 16 '06 #4

On Sunday 16 April 2006 19:11, Michael Sperlle wrote:
> The only other thing I can think of is making it non-readable for anyone
> except root, but I have the feeling that's not too secure.


Huh? If you don't trust your operating system to correctly validate
file permissions for you (on a server, or on a client system that can be
booted by someone other than you or whose physical hard disk can be
removed, the security implications are completely different), you're in
absolutely no position to even want encryption, because any malicious user
can replace your encryption code with code of his own that is easily
breakable by him.

Of course there are temporary local privilege escalations (in some
applications, or even in the kernel of your operating system), but if you
rely on the operating system to keep your encryption code secure, you might
as well rely on it to keep your data secure, because that's basically the
same thing.

--- Heiko.
Apr 16 '06 #5

>>>>> "Michael" == Michael Sperlle <sp*****@yahoo.com> writes:

Michael> Is it possible? Bestcrypt can supposedly be set up on
Michael> linux, but it seems to need changes to the kernel before
Michael> it can be installed, and I have no intention of going
Michael> through whatever hell that would cause.

Michael> If I could create a large file that could be encrypted,
Michael> and maybe add files to it by appending them and putting
Michael> in some kind of delimiter between files, maybe a homemade
Michael> version of truecrypt could be constructed.

One problem with using a single large encrypted file is that a single-byte
error can destroy all your data (see DEDICATION below). I wrote a little
Python module called hashtar that works like tar, but encrypts every file
independently into a flattened dir structure with filename hiding. It
stores the file permissions, dir structure, and ownership in an encrypted
header of each encrypted file, after first padding some random data at the
top to reduce the risk of known-plaintext attacks.
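The DEDICATION in the script explains the motivation. A small Python 3 illustration of it, using zlib compression as a stand-in for the gzip/encryption layer (no encryption is involved here): one damaged byte in a single stream spoils the whole archive, while damage to one separately stored file leaves the others readable. The file contents and the helpers `flip_byte` and `intact` are made up for the demo:

```python
import zlib

files = {
    "a.txt": b"first file " * 50,
    "b.txt": b"second file " * 50,
    "c.txt": b"third file " * 50,
}


def flip_byte(blob, index):
    """Return a copy of blob with one byte inverted (simulated media damage)."""
    damaged = bytearray(blob)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def intact(blob, expected):
    """True if blob still decompresses to exactly the expected bytes."""
    try:
        return zlib.decompress(blob) == expected
    except zlib.error:
        return False


# Strategy 1: one stream for everything, like a single gzipped tar file.
everything = b"".join(files.values())
single = zlib.compress(everything)
single_ok = intact(flip_byte(single, len(single) // 2), everything)

# Strategy 2: every file stored separately, as hashtar does.
separate = {name: zlib.compress(data) for name, data in files.items()}
separate["b.txt"] = flip_byte(separate["b.txt"], len(separate["b.txt"]) // 2)
recovered = [name for name, blob in separate.items() if intact(blob, files[name])]

print(single_ok, recovered)  # False ['a.txt', 'c.txt']
```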

Here is some example usage:

hashtar.py -cvf numeric.htar numeric
Password:
Confirm:
numeric/__init__.py -> numeric.htar/1a/1a9f48d439144d1fa33b186fa49a6b63
numeric/contiguous_demo.py -> numeric.htar/8a/8a7757bf6f4a20e6904173f7c597eb45
numeric/diff_dmo.py -> numeric.htar/0c/0cea827761aef0ccfc55a869dd2aeb38
numeric/fileio.py -> numeric.htar/3e/3e50f59a1d2d87307c585212fb84be6a
numeric/find -> numeric.htar/b1/b1070de08f4ea531f10abdc58cfe8edc
...snip

find numeric.htar | head
numeric.htar
numeric.htar/1a
numeric.htar/1a/1a9f48d439144d1fa33b186fa49a6b63
numeric.htar/8a
numeric.htar/8a/8a7757bf6f4a20e6904173f7c597eb45
numeric.htar/8a/8a4343ba60feda855fbaf8132e9b5a6b
numeric.htar/8a/8a72457096828c8d509ece6520e49d0b
numeric.htar/0c
numeric.htar/0c/0cea827761aef0ccfc55a869dd2aeb38
numeric.htar/3e

hashtar.py -tvf numeric.htar
Password:
131 numeric/__init__.py
594 numeric/contiguous_demo.py
26944 numeric/Quant.pyc
209 numeric/extensions/test/rpm_build.sh
230 numeric/diff_dmo.py
439 numeric/fileio.py
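The archive names in the listing above come from hashing each path. A Python 3 sketch of that naming scheme as `encode()` implements it, using `hashlib.md5` in place of the old `md5` module; `hashed_path` is a hypothetical name. For plain ASCII paths it should produce the same digests as the script, since both hash the same bytes:

```python
import hashlib
import os


def hashed_path(arcdir, path):
    """Return arcdir/<first two hex digits>/<MD5 hex digest of path>.

    Mirrors the naming scheme in hashtar's encode(); directories are
    hashed there with a trailing separator (e.g. 'numeric/').
    """
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return os.path.join(arcdir, digest[:2], digest)


p = hashed_path("numeric.htar", "numeric/__init__.py")
top, sub, name = p.split(os.sep)
assert top == "numeric.htar" and len(name) == 32 and sub == name[:2]
```

The two-character subdirectory keeps any single directory from holding so many files that they can no longer be passed on a command line, as the script's docstring explains.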

It works across platforms (tested on win32, OS X, and Linux), so files
encrypted on one will decrypt on the others.
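Part of what makes the files portable is that the per-file header is written with an explicit byte order (struct's `<` prefix, chosen by the script's `-e/--endian` option, little-endian by default) rather than the machine's native one. A Python 3 sketch of the header layout written by `encode()` and `info_to_str()`: a float version and an unsigned header length, then random padding followed by a CSV row. In hashtar the padded row is encrypted before framing; that step is omitted here, the padding uses `secrets.token_hex` instead of the script's shuffled junk characters, and `build_header`/`parse_header` are hypothetical names:

```python
import csv
import io
import secrets
import struct

PREFIX = struct.Struct("<fI")  # float32 version + uint32 header length, little-endian
PAD_LEN = 200                  # hashtar uses numJunk = 200 padding characters


def build_header(version, fname, mode, uid, gid):
    """Return prefix + (random padding + CSV row), as in info_to_str()."""
    row = io.StringIO()
    csv.writer(row).writerow([fname, mode, uid, gid])
    padding = secrets.token_hex(PAD_LEN // 2)  # exactly PAD_LEN hex characters
    body = (padding + row.getvalue()).encode("utf-8")
    # hashtar encrypts `body` at this point; this sketch leaves it in clear text.
    return PREFIX.pack(version, len(body)) + body


def parse_header(blob, max_len=2000):
    """Inverse of build_header(); rejects implausible lengths as decode() does."""
    version, length = PREFIX.unpack_from(blob)
    if length > max_len:
        raise ValueError("header length %d too large (wrong byte order?)" % length)
    body = blob[PREFIX.size:PREFIX.size + length].decode("utf-8")
    fname, mode, uid, gid = next(csv.reader(io.StringIO(body[PAD_LEN:])))
    return version, fname, int(mode), int(uid), int(gid)


blob = build_header(0.3, "numeric/__init__.py", 0o100644, 1000, 1000)
print(parse_header(blob)[1:])  # ('numeric/__init__.py', 33188, 1000, 1000)
```

Reading a big-endian header with the little-endian format yields a huge length, which is why the script's size check suggests flipping the endian option.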

All the code lives in a single file (hashtar.py) included below.

See also the WARNING below (hint -- I am not a cryptographer)

#!/usr/bin/env python
"""
OVERVIEW

hashtar: an encrypted archive utility designed for secure archiving
to media vulnerable to corruption.

Recursively encrypt the files and directories passed as arguments.
Rather than preserving the directory structure, or archiving to a
single file as in tar, the files are encrypted to a single dir and
named with the hash of their relative path. The file information
(filename, permission mode, uid, gid) is encrypted and stored in the
header of the file itself, and can be used to restore the original
file with dir structure from the archive file. For example, the
command
    hashtar.py -cvf tmp.htar finance/
prompts for a password and generates an encrypted recursive archive
of the finance dir in the tmp.htar dir, with filenames mapped like

    finance/irs/98/f1040.pdf -> tmp.htar/e5/e5ed546c0bc0191d80d791bc2f73c890
    finance/sale_house/notes -> tmp.htar/58/580e89bad7563ae76c295f75aecea030
    finance/online/accounts.gz.mcr -> tmp.htar/bb/bbf12f06dc3fcee04067d40b9781f4a8
    finance/phone/prepaid1242.doc -> tmp.htar/c1/c1fe52a9d8cbef55eff8840d379d972a

The encrypted files are placed in subdirs based on the first two
characters in their hash name because if too many files are placed
in one dir, it may not be possible to pass all of them as command
line arguments to the restore command. The entire finance dir
structure can later be restored with
    hashtar.py -xvf tmp.htar

The advantage of this method of encrypted archiving, as opposed to
archiving to a single tar file and encrypting it, is that this
method is not sensitive to single byte corruption, which becomes
important especially on externally stored archives, such as on CDR,
or DVDR. Any individual file contains all the information needed to
restore itself, with directory structure, permission bits, etc. So
only the specific files that are corrupted on the media will be
lost.

The alternative strategy, encrypting all the files in place and then
archiving to external media, doesn't suffer from single byte
corruption but affords less privacy since the filenames, dir
structure, and permission bits are available, and less security
since a filename may indicate contents and thus expose the archive
to a known plaintext attack.

A match string allows you to only extract files matching a given
pattern. Eg, to only extract pdf and xls files, do
    hashtar.py -m pdf,xls -xvf tmp.htar

Because the filenames are stored in the header, only a small portion
of the file needs to be decrypted to determine the match, so this is
quite fast.

Data can be encrypted and decrypted across platforms (tested between
linux and win32 and vice-versa) but of course some information may
be lost, such as uid, gid for platforms that don't support it.

USAGE:
    hashtar.py [OPTIONS] files
OPTIONS

    -h, --help            Show help message and exit
    -fDIR, --arcdir=DIR   Write hashed filenames to archive dir
    -eENDIAN, --endian=ENDIAN
                          Byte order of the file headers: big|little
                          (default little)
    -pFILE, --passwdfile=FILE
                          Get passwd from FILE, otherwise prompt
    -mPATTERN, --match=PATTERN
                          Only extract files that match PATTERN.
                          PATTERN is a comma separated list of strings,
                          one of which must match the filename
    -u, --unlink          Delete files after archiving them
    -c, --create          Create archive dir
    -t, --tell            Report information about files
    -x, --extract         Extract files recursively from archive dir
    -v, --verbose         Verbose listing of filenames to stdout

WARNING:

I think this software is suitable to protect your data from your
sister, your boss, and even the nosy computer hacker next door, but
not the NSA.

REQUIREMENTS:

python2.3 - python.org
yawPyCrypto and Flatten - http://yawpycrypto.sourceforge.net/
pycrypto - http://www.amk.ca/python/code/crypto.html

The python dependencies are very easy to install; just do the usual python setup.py install

PLATFORMS:

Tested on linux and win32

AUTHOR:

John D. Hunter <jd******@ace.bsd.uchicago.edu>

LICENSE:

same as python2.3

KNOWN BUGS:

Ignores symbolic links

DEDICATION:

For Erik Curiel, whose life's work I lost when I volunteered to
back up the only copy of his home dir on a CD containing a single
encrypted gzipped tar file, which was subsequently corrupted.

"""

import sys, os, random, struct, csv, time, glob
from md5 import md5
from optparse import OptionParser
from cStringIO import StringIO
from getpass import getpass
#def getpass(arg): pass
from yawPyCrypto.Cipher import DecryptCipher, EncryptCipher
from yawPyCrypto.Cipher import ZipDecryptCipher, ZipEncryptCipher
from yawPyCrypto.Constants import CIPHER_BLOWFISH, MODE_CBC

version = 0.3
pathsep = os.sep  # the platform's path separator, '/' or '\\'

def encrypt_str(passwd, s, enc=None):
    """
    Encrypt the string s using passwd and encryption cipher enc
    """
    if enc is None:
        enc = ZipEncryptCipher(passwd, CIPHER_BLOWFISH, MODE_CBC)
    enc.feed(s)
    enc.finish()
    return enc.data

def decrypt_str(passwd, s, dec=None):
    """
    Decrypt the string s using passwd and decryption cipher dec
    """
    if dec is None:
        dec = ZipDecryptCipher(passwd)
    dec.feed(s)
    dec.finish()
    return dec.data

def ends_with_pathsep(fname):
    """
    Return true if string fname ends in a path separator string
    """
    head, tail = os.path.split(fname)
    return tail == ''

junk = list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_- /\\.\n'*20)
numJunk = 200

def info_to_str(info):
    """
    info is a fname, mode, uid, gid tuple

    Return a string for storage in the encrypted archive outfile
    """
    global junk
    # Here is the real info string
    sh = StringIO()
    writer = csv.writer(sh)
    writer.writerow(info)
    s = sh.getvalue()

    # because the info string will be fed thru zlib before encryption,
    # I'm prepending numJunk garbage characters at the beginning of
    # the key string to make it difficult for someone trying to do a
    # known plaintext attack on the key file given that the info
    # contains plain text that might be guessed and plain text that is
    # common between archive file (such as base path names).
    random.shuffle(junk)
    junkStr = ''.join(junk[:numJunk])

    return junkStr + s

def str_to_info(s):
    """
    Takes string and returns (fname, mode, uid, gid)
    """
    sh = StringIO(s[numJunk:])
    reader = csv.reader(sh)
    for row in reader:
        fname,mode,uid,gid = row
        return fname, int(mode), int(uid), int(gid)

def encode(infile, arcdir, passwd, unlink=0, verbose=0, endian='little'):
    """
    Encrypt the file infile and hash its filename to arcdir

    - infile: the input filename
    - arcdir is the dir where the files with hashed filenames will
      be placed
    - passwd is the encryption passwd as string
    - unlink, if true, will try to remove the src file after
      archiving it

    """

    if not os.path.exists(infile):
        print >> sys.stderr, '%s does not exist: skipping' % infile
        return
    if os.path.isdir(infile):
        # mark dirs with pathsep at the end if they don't have one
        head, tail = os.path.split(infile)
        if not ends_with_pathsep(infile):
            infile = os.path.join(infile, '')

    m = md5(infile)
    hd = m.hexdigest()
    tup = os.stat(infile)

    if pathsep=='/': storeName = infile
    else: storeName = '/'.join(infile.split(pathsep))
    s = info_to_str( (storeName, tup.st_mode, tup.st_uid, tup.st_gid) )
    s = encrypt_str(passwd, s)

    outdir = os.path.join(arcdir, hd[:2])
    if not os.path.isdir(outdir): os.mkdir(outdir, 0700)
    outfile = os.path.join(outdir, hd)

    oh = file(outfile, 'wb')

    if endian=='little': fmt = '<'
    else: fmt = '>'

    oh.write(struct.pack(fmt+'fI', version, len(s)))
    oh.write(s)

    if verbose: print '%s -> %s' % (infile, outfile)
    if os.path.isdir(infile):
        oh.close()
        return 1 # nothing more to do for dirs

    ih = file(infile, 'rb')
    enc = EncryptCipher(passwd, CIPHER_BLOWFISH, MODE_CBC)
    while 1:
        data = ih.read(1024)
        if len(data)==0: break
        enc.feed(data)
        oh.write(enc.data)
    enc.finish()
    oh.write(enc.data)
    ih.close()
    oh.close()

    if unlink:
        try: os.remove(infile)
        except OSError, msg:
            print >> sys.stderr, 'Could not remove', infile
            print >> sys.stderr, msg
    return 1

def decode(infile, passwd, unlink=0, verbose=0, match=None, endian='little'):
    """
    Restore one original file (or dir) from its archive file

    - infile is the hashed archive file to decode
    - passwd is the decryption passwd as string
    - unlink, if true, will try to remove the archive file after
      restoring it
    - match, if not None, is a comma separated list of strings; the
      file is only restored if one of them occurs in its filename

    """

    ih = file(infile, 'rb')
    if endian=='little': fmt = '<'
    else: fmt = '>'

    thisVersion, lenHeader = struct.unpack(fmt+'fI', ih.read(8))
    if lenHeader>2000:
        print '%s header size %d too large; aborting (try flipping endian)'%(infile, lenHeader)
        sys.exit()

    try: header = decrypt_str(passwd, ih.read(lenHeader))
    except MemoryError, msg:
        print >>sys.stderr, 'Could not decode %s; skipping' % infile
        return 0
    except ValueError, msg:
        print >>sys.stderr, 'Could not decode %s; bad passwd or file?' % infile
        print >>sys.stderr, '\t', msg
        return 0
    fname, mode, uid, gid = str_to_info(header)

    if match is not None:
        for pattern in match.split(','):
            if fname.find(pattern)!=-1: break
        else: return 0

    if verbose: print '%s -> %s' % (infile,fname)
    if ends_with_pathsep(fname): # it's a dir
        if not os.path.isdir(fname):
            os.makedirs(fname)
        os.chmod(fname, mode)
        try: os.chown(fname, uid, gid)
        except AttributeError, msg: pass
        except OSError, msg: pass # if coming from win32, uid,gid=0

        return 1 # nothing more to do

    thedir, thename = os.path.split(fname)
    if not os.path.isdir(thedir) and len(thedir): os.makedirs(thedir)

    dec = DecryptCipher(passwd)
    oh = file(fname, 'wb')

    while 1:
        data = ih.read(1024)
        if len(data)==0: break
        dec.feed(data)
        oh.write(dec.data)
    dec.finish()
    oh.write(dec.data)
    ih.close()
    oh.close()

    os.chmod(fname, mode)
    try: os.chown(fname, uid, gid)
    except AttributeError: pass
    except OSError, msg: pass # if coming from win32, uid,gid=0

    if unlink:
        try: os.remove(infile)
        except OSError, msg:
            print >> sys.stderr, 'Could not remove', infile

def tell(infile, passwd, verbose=0, endian='little'):
    """
    Report the information about infile
    """

    if endian=='little': fmt = '<'
    else: fmt = '>'

    ih = file(infile, 'rb')
    thisVersion, lenHeader = struct.unpack(fmt+'fI', ih.read(8))
    if lenHeader>2000:
        print '%s header size %d too large; aborting (try flipping endian)'%(infile, lenHeader)
        sys.exit()
    try: header = decrypt_str(passwd, ih.read(lenHeader))
    except MemoryError, msg:
        print >>sys.stderr, 'Could not decode %s; skipping' % infile
        return 0
    except ValueError, msg:
        print >>sys.stderr, 'Could not decode %s; bad passwd or file?' % infile
        print >>sys.stderr, '\t', msg
        return 0
    size = os.path.getsize(infile)-lenHeader
    fname, mode, uid, gid = str_to_info(header)

    print '%d\t%s'%(size, fname)

def getpass2():
    """
    Prompt for a passwd twice, returning the string only when they
    match
    """
    p1 = getpass('Password: ')
    p2 = getpass('Confirm: ')
    if p1!=p2:
        print >> sys.stderr, '\nPasswords do not match. Try again.\n'
        return getpass2()
    else:
        return p1

def listFiles(root, patterns='*', recurse=1, return_folders=0):
    # from Parmar and Martelli in the Python Cookbook
    import os.path, fnmatch
    # Expand patterns from semicolon-separated string to list
    pattern_list = patterns.split(';')
    # Collect input and output arguments into one bunch
    class Bunch:
        def __init__(self, **kwds): self.__dict__.update(kwds)
    arg = Bunch(recurse=recurse, pattern_list=pattern_list,
                return_folders=return_folders, results=[])

    def visit(arg, dirname, files):
        # Append to arg.results all relevant files (and perhaps folders)
        for name in files:
            fullname = os.path.normpath(os.path.join(dirname, name))
            if arg.return_folders or os.path.isfile(fullname):
                for pattern in arg.pattern_list:
                    if fnmatch.fnmatch(name, pattern):
                        arg.results.append(fullname)
                        break
        # Block recursion if recursion was disallowed
        if not arg.recurse: files[:]=[]

    os.path.walk(root, visit, arg)

    return arg.results

def get_recursive_filelist(args):
    """
    Recurse into all the files and dirs in args, ignoring symbolic
    links, and return the files as a list of strings
    """
    files = []

    for arg in args:
        if os.path.isfile(arg):
            files.append(arg)
            continue
        if os.path.isdir(arg):
            newfiles = listFiles(arg, recurse=1, return_folders=1)
            files.extend(newfiles)

    return [f for f in files if not os.path.islink(f)]

if __name__=='__main__':
    parser = OptionParser()

    parser.add_option("-f", "--arcdir", dest="arcdir",
                      help="Write hashed filenames to archive dir",
                      metavar="DIR",
                      default=os.getcwd())

    parser.add_option("-e", "--endian", dest="endian",
                      help="big|little",
                      default="little")
    parser.add_option("-p", "--passwdfile", dest="passwdfile",
                      help="Get passwd from FILE, otherwise prompt",
                      metavar="FILE", default=None)

    parser.add_option("-m", "--match", dest="match",
                      help="Only extract files that match PATTERN. PATTERN is a comma separated list of strings, one of which must match the filename",
                      metavar="PATTERN", default=None)

    parser.add_option("-u", "--unlink",
                      action="store_true", dest="unlink", default=False,
                      help="Delete files after archiving them")

    parser.add_option("-c", "--create",
                      action="store_true", dest="create", default=False,
                      help="Create archive dir")

    parser.add_option("-x", "--extract",
                      action="store_true", dest="extract", default=False,
                      help="Extract files recursively from archive dir")

    parser.add_option("-t", "--tell",
                      action="store_true", dest="tell", default=False,
                      help="Report information about file but do not extract")

    parser.add_option("-v", "--verbose",
                      action="store_true", dest="verbose", default=False,
                      help="Verbose listing of filenames to stdout")

    (options, args) = parser.parse_args()

    if options.create and options.extract:
        print >>sys.stderr, 'Cannot create and extract archive simultaneously!'
        sys.exit(1)
    if not( options.create or options.extract or options.tell):
        print >>sys.stderr, 'You must specify one of -c, -x or -t!'
        sys.exit(1)

    if os.path.exists(options.arcdir) and not os.path.isdir(options.arcdir):
        print >>sys.stderr, '%s exists and is not a dir' % options.arcdir
        sys.exit(1)

    if not os.path.exists(options.arcdir):
        os.makedirs(options.arcdir)

    if options.passwdfile is not None:
        passwd = file(options.passwdfile, 'rb').read()
    else:
        if options.create: passwd = getpass2()
        else: passwd = getpass()

    if options.extract:
        if len(args)==0:
            args = [options.arcdir]
        files = get_recursive_filelist(args)

        for thisFile in files:
            head, tail = os.path.split(thisFile)
            if not os.path.isfile(thisFile): continue
            if not len(tail)==32:
                print >>sys.stderr, '%s does not look like a hashname; skipping' % thisFile
                continue
            decode(thisFile, passwd,
                   unlink=options.unlink,
                   verbose=options.verbose,
                   match=options.match,
                   endian=options.endian
                   )
    elif options.tell:
        if len(args)==0:
            args = [options.arcdir]
        files = get_recursive_filelist(args)

        for thisFile in files:
            head, tail = os.path.split(thisFile)
            if not os.path.isfile(thisFile): continue
            if not len(tail)==32:
                print >>sys.stderr, '%s does not look like a hashname; skipping' % thisFile
                continue
            tell(thisFile, passwd, verbose=options.verbose, endian=options.endian)

    else:

        if sys.platform=='win32':
            # do glob expansion manually
            expand = []
            for arg in args:
                expand.extend(glob.glob(arg))
            args = expand
        files = get_recursive_filelist(args)
        for thisFile in files:
            encode(thisFile, options.arcdir, passwd,
                   unlink=options.unlink,
                   verbose=options.verbose,
                   endian=options.endian)
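`listFiles()` depends on `os.path.walk`, which no longer exists in Python 3. A sketch of equivalent file selection built on `os.walk`, together with the comma-separated substring test that `decode()` applies for `-m`; `recursive_filelist` and `matches` are hypothetical names, and the rest of the script (print statements, `file()`, the yawPyCrypto calls) would still need porting separately:

```python
import os


def recursive_filelist(args):
    """Files, plus dirs found under each dir arg, skipping symbolic links.

    A Python 3 stand-in for listFiles()/get_recursive_filelist().
    """
    found = []
    for arg in args:
        if os.path.isfile(arg):
            found.append(arg)
        elif os.path.isdir(arg):
            for dirpath, dirnames, filenames in os.walk(arg):
                for name in dirnames + filenames:
                    found.append(os.path.normpath(os.path.join(dirpath, name)))
    return [f for f in found if not os.path.islink(f)]


def matches(fname, pattern):
    """True if any comma-separated substring of `pattern` occurs in fname."""
    return any(part in fname for part in pattern.split(","))
```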

Apr 17 '06 #6
