
Script to automate extraction of file from compressed archives

Peter Thorne
#1: May 7 '06
I am a Perl newbie who is trying to write a script to automate a
task.

I have a large collection of compressed archives (mostly .tar.gz,
.tar.bz2, .tar.Z, .tgz etc.). These are stored in a number of directories
and sub-directories.

I am looking for a script that will recursively extract a single file
from each of these archives e.g. the file INSTALL, for the extracted
file to be moved to a different location and renamed to the name of
the archive itself, but keeping the same directory structure;

e.g.

Suppose I have archive files x.tar.gz, y.tar.gz, and x.tar.Z in
/home/peter/a/

and in /home/peter/a/b/ files ab.tar.gz, b.tar.bz2, c.tgz

I would like the script to recursively extract the INSTALL from all of
these files, for the INSTALL files to be copied to /tmp, and renamed
to the name of the archive, so that in

/tmp/a/ there will be files named x.tar.gz, y.tar.gz, and x.tar.Z
(which are just the relevant INSTALL files), and in /tmp/a/b/ files
ab.tar.gz, b.tar.bz2, c.tgz (again these files to be just the INSTALL
files).


I appreciate that tar -zf name.tar.gz -x <file name> extracts just a
file, but it creates directories etc., which means the above is
unworkable.

Would be really grateful for any help you can give. Please bear in
mind that I am not very technically minded.

Thanks,
Peter
peter_thorneNOSPAM@fastmail.fm




Michael Wehner
#2: May 21 '06

re: Script to automate extraction of file from compressed archives


Peter Thorne wrote:
> I am looking for a script that will recursively extract a single file
> from each of these archives e.g. the file INSTALL, for the extracted
> file to be moved to a different location and renamed to the name of
> the archive itself, but keeping the same directory structure;

This should be quite straightforward, especially if you intend to use an
external program like `tar' to unarchive your files.

You should need only one function that accepts as its sole argument a
directory name. This function is recursive, in that it calls itself,
passing along the name of whatever subdirectory it's currently looking
at. This function is just called from one starting point in your
program, and should be passed the initial directory name (perhaps from a
command-line switch).

The code might look like:

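#!/usr/bin/perl
use strict;
use warnings;

my $tar_path     = '/bin/tar';   # Path to the tar program
my $starting_dir = shift;        # Starting point for extraction
die "Usage: $0 <starting directory>\n" unless defined $starting_dir;

extract_dir($starting_dir);

exit;

sub extract_dir {
    my $current_dir = shift;     # Current directory level

    # Get a handle for this directory
    opendir my $DIR, $current_dir
        or die "Cannot open directory $current_dir: $!";

    # Go through each entry in this directory
    for my $filename (readdir $DIR) {
        # Skip `.' and `..'; recursing into them would never terminate
        next if $filename eq '.' or $filename eq '..';

        # readdir returns bare names, so build the full path once
        my $path = "$current_dir/$filename";

        # Check to see if this is a regular file
        if (-f $path) {
            # Extract the `INSTALL' file (tar writes it relative to
            # the current working directory)
            system $tar_path, '-xf', $path, 'INSTALL';
        }

        # Check to see if this is a subdirectory
        if (-d $path) {
            # Go through this directory
            extract_dir($path);
        }
    }

    closedir $DIR;

    return;
}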


And of course, real use needs more checking than this: an entry that is
the same directory or a parent directory (`.' and `..') must never be
recursed into, a cyclical symlink has to be guarded against, and a
regular file should be confirmed to be an archive you actually want to
extract. Given your cited example, you'll probably want to also check
the type of archive in order to pass `tar' the appropriate arguments
(since my example passes no decompression option and is written with a
basic uncompressed archive in mind).
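
As a rough illustration of those extra checks, here is a minimal sketch. It is not a finished tool, and it makes several assumptions: that GNU tar is the `tar' on your PATH (it relies on GNU tar's -z, -j, -Z and -O/--to-stdout options), that the archive type can be guessed from the file name alone, and that the member name you pass matches the path as stored inside the archive (often something like `pkg-1.0/INSTALL' rather than plain `INSTALL'; that name is only an example). The helper names compression_flag, dest_path and extract_member are made up for this sketch. Writing the member to standard output also sidesteps the problem of tar creating directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Guess the tar decompression letter from the archive name.
# Returns undef for names that do not look like a tar archive.
sub compression_flag {
    my ($name) = @_;
    return 'z' if $name =~ /\.(?:tar\.gz|tgz)\z/;   # gzip
    return 'j' if $name =~ /\.tar\.bz2\z/;          # bzip2
    return 'Z' if $name =~ /\.tar\.Z\z/;            # compress
    return ''  if $name =~ /\.tar\z/;               # uncompressed
    return;
}

# Map an archive path under $src_root to the same relative path
# under $dest_root, keeping the archive's own name.
sub dest_path {
    my ($archive, $src_root, $dest_root) = @_;
    (my $relative = $archive) =~ s{\A\Q$src_root\E/*}{};
    return "$dest_root/$relative";
}

# Copy one member of $archive into the file $dest.
# Returns 1 on success, 0 if the archive type is unknown or tar fails
# (for instance because the member is not in the archive).
sub extract_member {
    my ($archive, $member, $dest) = @_;
    my $flag = compression_flag($archive);
    return 0 unless defined $flag;

    # -O sends the member to standard output instead of creating files
    open my $tar, '-|', 'tar', "-x${flag}Of", $archive, $member
        or die "Cannot run tar: $!";
    my $content = do { local $/; <$tar> };
    close $tar or return 0;          # non-zero exit status from tar

    make_path(dirname($dest));
    open my $out, '>', $dest or die "Cannot write $dest: $!";
    binmode $out;
    print {$out} defined $content ? $content : '';
    close $out or die "Cannot close $dest: $!";
    return 1;
}

# Optional command-line use: archive, member, source root, destination root
if (@ARGV == 4) {
    my ($archive, $member, $src_root, $dest_root) = @ARGV;
    my $dest = dest_path($archive, $src_root, $dest_root);
    extract_member($archive, $member, $dest)
        or warn "Nothing extracted from $archive\n";
}
```

With your example layout, dest_path('/home/peter/a/b/c.tgz', '/home/peter', '/tmp') gives '/tmp/a/b/c.tgz', so calling extract_member from inside the directory loop in place of the plain `system' call would put each INSTALL file where you described.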

Hope this gives some direction.

- Michael Wehner