Data Transfer Dilemma

goralondos
Hopefully this is the right forum, it didn't seem to quite fit into any of the others.

My company has 20 remote sites that back up data to our main building.
Presently, a scheduled batch job is set up that zips (tar+gnu) all data into one large file, then FTPs the zip file to the main location. The size of the data coming over ranges from 50MB to 1.5GB per location, 12 GB total.

The size of the data will continue to grow as time goes on.
On a day to day basis, only 1-5% of the files coming over will have actually changed, but those files can be 20-25% of the total size of the transfer.

For me, it seemed like a waste to bring over the rest of the files that haven’t changed.

Why I came here: I’m hoping to get some advice about the sequence of events and/or programs needed to bring over only the files that have changed.

I hope to have a zipped copy of the files in a backup directory at the remote site and at my location, with as little extra data transferred as possible, in the least amount of time possible.

It should be pretty simple…
Data Directories – (sync) -> Backup Directory
Backup Directory – (sync) -> FTP Site
It would be nice to have the Backup Directory and FTP Site sync zipped files instead, so the files travel across the internet compressed.
Is it possible to compare zipped archives through an FTP (SFTP, etc.) type connection and send only the files that have changed from one archive to the other?

A secondary option: instead of creating one large zip file, zip all the files as individual archives in the backup directory, then sync?

Any thoughts would be greatly appreciated.

Thanks!
Adam
Mar 8 '07 #1
Killer42
8,435 Expert 8TB
Don't know about through the FTP facility, but it should be possible to check the CRC of files within the archive, in order to get a comparison without having to transfer the entire archive.

Hm... how about this?

Along with the archive (zip) file, perhaps you could store a text file listing the contents of the archive. This could be FTP'd across, and examined to determine what else has to be done.
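A minimal sketch of that manifest idea, assuming the archives are ordinary zip files and that a small text file can be fetched from the other end before the archive itself. The function names and the tab-separated manifest format are made up for illustration; only Python's standard `zipfile` module is used.

```python
import zipfile


def read_manifest(zip_path):
    """Map each member name in the archive to its stored CRC-32."""
    with zipfile.ZipFile(zip_path) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def write_manifest(manifest, text_path):
    """Write one 'crc<TAB>name' line per member, sorted by name."""
    with open(text_path, "w", encoding="utf-8") as fh:
        for name in sorted(manifest):
            fh.write(f"{manifest[name]:08x}\t{name}\n")


def load_manifest(text_path):
    """Read a manifest written by write_manifest back into a dict."""
    manifest = {}
    with open(text_path, encoding="utf-8") as fh:
        for line in fh:
            crc, name = line.rstrip("\n").split("\t", 1)
            manifest[name] = int(crc, 16)
    return manifest


def changed_members(local, remote):
    """Names that are new locally or whose CRC differs from the remote copy."""
    return sorted(name for name, crc in local.items() if remote.get(name) != crc)
```

Only the small manifest has to cross the wire to decide what to send. Members that were deleted locally are not reported by this sketch.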

Ideally, of course, you want as much as possible to happen on the sending side (for example if anything needs to be extracted/re-zipped), to minimise traffic. You might be better off looking into real backup software rather than rolling your own.

Maybe you could just do weekly full and daily incremental zip files? Zip utilities usually allow you to add only those files which have been touched.
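As a rough illustration of the daily incremental, here is a sketch that zips only the files modified after a given time. It assumes modification times are trustworthy and that the archive is written outside the source directory; `incremental_zip` and its parameters are hypothetical names, not part of any zip utility.

```python
import os
import zipfile


def incremental_zip(src_dir, zip_path, since):
    """Zip files under src_dir whose mtime is later than `since` (epoch seconds).

    Returns the archived paths, relative to src_dir.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.getmtime(full) > since:
                    rel = os.path.relpath(full, src_dir)
                    zf.write(full, rel)  # store under the relative path
                    added.append(rel)
    return sorted(added)
```

Passing the time of the last full backup as `since` gives a cumulative daily archive; passing the time of the previous run gives a smaller archive per day.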
Mar 8 '07 #2
Thank you for the response!

Quoting Killer42: "Ideally, of course, you want as much as possible to happen on the sending side (for example if anything needs to be extracted/re-zipped), to minimize traffic. You might be better off looking into real backup software rather than rolling your own."
All the remote sites use one kind of backup software or another already. Coordinating all the transfers around the backups is another fun “side” project. This backup is meant for disaster recovery. Just to give a little more detail, all the remote sites do different accounting related work. Some remote sites print payroll checks, some accept payments, etc. If a disaster struck on the right day, it could affect the payroll checks getting out on time; this backup is meant to minimize that chance.

Quoting Killer42: "Maybe you could just do weekly full and daily incremental zip files? Zip utilities usually allow you to add only those files which have been touched."
I’ve been playing around with a utility I found on the web called “SynchronEx”.
It will do a one-way sync of a set of backup directories to another location. Even though it would end up being a full backup in both the backup directory at the remote site and on the FTP site, it is incremental in the sense that it only moves the files that have changed to the 2 other locations. The main problem I have with the program, and with this approach in general, is that I would feel more comfortable creating a secondary directory and syncing off that, because of the time the transfer can take.
Mar 9 '07 #3
NeoPa
32,556 Expert Mod 16PB
Personally, I'd maintain a copy on each remote site of the current status of the files backed up to the central site (Keep separate files, the higher the granularity, the better the efficiency).
Every time a backup is scheduled, it will compare the current file with the file that is a duplicate of what is on the central site (No bandwidth used in this), only Zipping and copying those files which have been changed.
Alternatively, Zip them all before the comparison to waste less local space.
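A sketch of that local comparison, assuming each remote site keeps an uncompressed mirror of what the central site already holds. The directory layout and function names are hypothetical. The comparison is byte-for-byte via the standard `filecmp` module, so it uses no bandwidth.

```python
import filecmp
import os
import shutil


def changed_since_mirror(live_dir, mirror_dir):
    """Relative paths of live files that are missing from, or differ from, the mirror."""
    changed = []
    for root, _dirs, files in os.walk(live_dir):
        for name in files:
            live = os.path.join(root, name)
            rel = os.path.relpath(live, live_dir)
            mirror = os.path.join(mirror_dir, rel)
            # shallow=False forces a content comparison, not just size and mtime.
            if not os.path.exists(mirror) or not filecmp.cmp(live, mirror, shallow=False):
                changed.append(rel)
    return sorted(changed)


def update_mirror(live_dir, mirror_dir, rel_paths):
    """After a successful upload, copy the sent files into the mirror."""
    for rel in rel_paths:
        dest = os.path.join(mirror_dir, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(live_dir, rel), dest)
```

The intended order is: list the changes, zip and send only those files, then update the mirror once the transfer has succeeded. Files deleted from the live directory are not handled here.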
Mar 11 '07 #4
Banfa
9,065 Expert Mod 8TB
Quoting post #3: "I’ve been playing around with a utility I found on the web called “SynchronEx”. It will do a one-way sync of a set of backup directories to another location. Even though it would end up being a full backup in both the backup directory at the remote site and on the FTP site, it is incremental in the sense that it only moves the files that have changed to the 2 other locations. The main problem I have with the program, and with this approach in general, is that I would feel more comfortable creating a secondary directory and syncing off that, because of the time the transfer can take."
Surely you have just answered your own question.

Create a batch/command file that copies the files at the remote site to an alternate directory and then uses SynchronEx on that directory to back up to the server.
Mar 11 '07 #5
drhowarddrfine
7,435 Expert 4TB
This sounds like a job for 'dump' in Unix/BSD.
Mar 11 '07 #6
Killer42
8,435 Expert 8TB
Quoting NeoPa: "Personally, I'd maintain a copy on each remote site of the current status of the files backed up to the central site (Keep separate files, the higher the granularity, the better the efficiency).
Every time a backup is scheduled, it will compare the current file with the file that is a duplicate of what is on the central site (No bandwidth used in this), only Zipping and copying those files which have been changed.
Alternatively, Zip them all before the comparison to waste less local space."
One bonus of this sort of approach is that, if you keep a complete copy at each site rather than just the status, you automatically have a number of off-site backups in case of disaster.

Of course, I may be totally destroying the bandwidth-minimisation idea there. Maybe they just need a faster connection. :-)
Mar 11 '07 #7
NeoPa
32,556 Expert Mod 16PB
The OP mentioned 12GB of data.
That's a delayed response even at LAN speeds. I think the most important issue here is to avoid uploading as much as possible. Uploading just the Deltas may be a step too far though ;)
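To put rough numbers on that, here is a back-of-the-envelope calculation. The 12 GB total and the 20-25% daily change come from the original post; the two link speeds are assumptions picked purely for illustration, and protocol overhead and compression are ignored.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and compression."""
    return size_bytes * 8 / link_bits_per_second


GB = 10**9
FULL_SET = 12 * GB  # total quoted in the original post

# Hypothetical link speeds, chosen only for illustration.
LINKS = {"100 Mbit/s LAN": 100e6, "1.5 Mbit/s uplink": 1.5e6}

if __name__ == "__main__":
    for label, bps in LINKS.items():
        full_min = transfer_seconds(FULL_SET, bps) / 60
        # 20-25% of the total size changes per day, per the original post.
        low_min = transfer_seconds(FULL_SET * 20 // 100, bps) / 60
        high_min = transfer_seconds(FULL_SET * 25 // 100, bps) / 60
        print(f"{label}: full {full_min:.0f} min, "
              f"changed only {low_min:.0f}-{high_min:.0f} min")
```

At the assumed 100 Mbit/s the full 12 GB takes about 16 minutes even in the ideal case, and any slower link scales that up proportionally, which is why sending only the changed files matters.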
Mar 12 '07 #8
Thank you for all the replies!

I did some testing with the SynchronEx utility over the weekend. It works very well for syncing directories, if anyone needs that function in a simple command-line utility that works on multiple platforms.

Let me know what you guys think about a configuration like this…

Live Data Directory is compressed to Backup directory at Remote Site 1.
Instead of compressing the data in one large archive, compress each file in an individual archive in the backup directory at Remote Site 1. Then run a utility like SynchronEx, syncing the backup directory at Remote Site 1 to backup directory at my location.
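The per-file archive step could look something like the sketch below. It assumes the backup directory lives outside the live data directory and that file modification times are reliable; `zip_each_file` is a hypothetical helper, not a feature of SynchronEx. An archive is rewritten only when its source file is newer, so a sync tool that follows sees only the archives that really changed.

```python
import os
import zipfile


def zip_each_file(live_dir, backup_dir):
    """Create or refresh one .zip per file; return the archives (re)written."""
    written = []
    for root, _dirs, files in os.walk(live_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, live_dir)
            dest = os.path.join(backup_dir, rel + ".zip")
            # Skip files whose archive is already at least as new as the source.
            if os.path.exists(dest) and os.path.getmtime(dest) >= os.path.getmtime(src):
                continue
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
                zf.write(src, name)  # one member per archive
            written.append(rel + ".zip")
    return sorted(written)
```

Running this from the scheduled job before the sync keeps the directory structure intact under the backup directory, with each file replaced by its own `.zip`.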
Mar 12 '07 #9
NeoPa
32,556 Expert Mod 16PB
If "Live Data Directory is compressed to Backup directory at Remote Site 1." refers to a "Live Data Directory" at that same "Remote Site 1" then this sounds ideal. Obviously, this would not be limited to "Remote Site 1" but would have to be repeated at each of the remote sites.
Mar 12 '07 #10
Yes, my apologies for not being clear about that; the Live Data Directory and Remote Site 1 are at the same location.
Once I have it working at one remote site, I will need to repeat the new setup at the other remote sites.

Now I just have to find a good command line program that can zip multiple directories to the backup directory as individual archives.

I do have to say, as much as projects like this can be frustrating, they are quite fun and I do greatly appreciate the input provided here.
Mar 12 '07 #11
NeoPa
32,556 Expert Mod 16PB
I have a routine which depends on having WinZip Command Line Interface (CLI) installed on the PC.
'Zip zips up the files in strFiles into strZip.
'Returns False only if the command could not be built or launched: Shell()
'starts WinZip asynchronously, so True does not confirm the archive was written.
'Relies on RegRead() and ParamReplace(), plus the constants conHKLM and
'conZipKey, which are assumed to be defined elsewhere in the project.
Public Function Zip(strZip As String, strFiles As String) As Boolean
    Dim strCmd As String, strExe As String

    Zip = True
    On Error GoTo ErrorZ
    'Look up the path of the WinZip CLI executable in the registry.
    strExe = RegRead(conHKLM, conZipKey, "")
    'Build the command line: "<exe>" -a+ -ex -ybc "<zip>" "<files>"
    strCmd = ParamReplace("""%E"" -a+ -ex -ybc ""%Z"" ""%F""", "%E", strExe, _
                                                               "%Z", strZip, _
                                                               "%F", strFiles)
    Call Shell(PathName:=strCmd, WindowStyle:=vbNormalFocus)
    Exit Function

ErrorZ:
    'Report the failure to the user and return False.
    strCmd = ParamReplace("Unable to zip {%F} into '%Z'", "%F", strFiles, _
                                                          "%Z", strZip)
    Call ShowMsg(strMsg:=strCmd, strTitle:="Zip", intButtons:=vbInformation)
    Zip = False
End Function
Mar 12 '07 #12
NeoPa
32,556 Expert Mod 16PB
There are two function calls in there I should really explain further:
RegRead()
This is provided as a separate module in "Module to Read from the Windows Registry".
ParamReplace()
This is simply Replace() but for multiple parameter sets. I'll include it here for your reference (or usage) but you may prefer simply to change the code to issue multiple Replace() references.
'ParamReplace replaces all occurrences of varParam in strMain with varReplace,
'then does the same for each further Find/Replace pair passed in avarArgs.
'Using VbBinaryCompare means that case is not ignored.
'Nz() is an Access function, so this runs as written only in Access VBA.
Public Function ParamReplace(ByRef strMain As String, _
                             ByVal varParam As Variant, _
                             ByVal varReplace As Variant, _
                             ParamArray avarArgs()) As String
    Dim intIdx As Integer

    'An odd number of extra arguments means a Find value has no Replace
    'partner; Stop breaks into the debugger so the call can be corrected.
    If (UBound(avarArgs) - LBound(avarArgs)) Mod 2 = 0 Then Stop
    'First pair: the two named parameters.
    ParamReplace = Replace(Expression:=strMain, _
                           Find:=Nz(varParam, ""), _
                           Replace:=Nz(varReplace, ""), _
                           Compare:=vbBinaryCompare)
    'Remaining pairs: avarArgs(0) is replaced by avarArgs(1), and so on.
    For intIdx = LBound(avarArgs) To UBound(avarArgs) Step 2
        ParamReplace = Replace(Expression:=ParamReplace, _
                               Find:=Nz(avarArgs(intIdx), ""), _
                               Replace:=Nz(avarArgs(intIdx + 1), ""), _
                               Compare:=vbBinaryCompare)
    Next intIdx
End Function
Mar 12 '07 #13
