Quozl's Copy In Place
quozl@us.netrek.org
Scope: Linux Copy In Place Program.
When archiving, an incremental rsync (--partial) slows dramatically if any portion of the output file already exists. No idea why, but what was needed was a way to copy only the remaining portion of a file over NFS.

    A : 0123456789012345678901234567890123456789 (input file)
    B : 0123456789012345678                      (partial output file)
    C :                    901234567890123456789 (to be copied)

So what on Linux can do this? Tell Quozl.
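The offsets in the diagram can be expressed directly: the length of B is where copying resumes in A, and the number of bytes left is the size of A minus the size of B. A minimal sketch, using the digit strings as stand-ins for file contents and assuming B is an exact prefix of A (the function name `uncopied_portion` is hypothetical):

```python
def uncopied_portion(a: bytes, b: bytes) -> bytes:
    """Return the part of input a that is missing from partial output b.

    Assumes b is an exact prefix of a, so copying resumes at offset len(b).
    """
    offset = len(b)          # where copying resumes in the input
    count = len(a) - offset  # how many bytes remain to be copied
    return a[offset:offset + count]


A = b"0123456789012345678901234567890123456789"  # input file
B = b"0123456789012345678"                       # partial output file
C = uncopied_portion(A, B)                       # to be copied
print(C.decode())  # prints 901234567890123456789
```

Appending C to B reproduces A, which is all the copy has to achieve.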
command | comment | test time |
---|---|---|
cp | no relevant flag | |
scp | no relevant flag | |
dd | too hard, but possible with seek= | 1m2s |
wget --continue http://tv/file | works, but HTTP isn't in use | |
wget --continue ftp://tv/file | works, but FTP isn't in use | |
rsync rsync://tv/file | slows to 2.5Mbit/sec over 100Mbit/sec link, version 3.0.2 | 4m18s |
rsync --inplace rsync://tv/file | slows to 2.5Mbit/sec over 100Mbit/sec link, version 3.0.2 | 4m18s |
rsync --append rsync://tv/file | does not slow, but does checksum the current output file first | 1m29s |
cp-inplace | does not slow, does not checksum | 1m6s |
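The dd route in the table works by using the current output size as both the input skip and the output seek. Below is a sketch that only builds such a command line. It assumes GNU coreutils dd, where `iflag=skip_bytes` and `oflag=seek_bytes` make `skip=` and `seek=` count bytes instead of blocks, and it assumes the existing output is a correct prefix of the input. The function name `dd_resume_command` and the 8 MiB block size are illustrative, and this is not necessarily the invocation behind the timing above:

```python
import os
from typing import List


def dd_resume_command(src: str, dst: str, bs: int = 8 * 1024 * 1024) -> List[str]:
    """Build a GNU dd command that copies the uncopied tail of src onto dst.

    Hypothetical helper; assumes dst is a correct prefix of src.
    """
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    return [
        "dd",
        f"if={src}",
        f"of={dst}",
        f"bs={bs}",
        f"skip={done}",      # input bytes to skip (byte count via iflag=skip_bytes)
        f"seek={done}",      # output bytes to skip (byte count via oflag=seek_bytes)
        "iflag=skip_bytes",
        "oflag=seek_bytes",
        "conv=notrunc",      # do not truncate the existing output
    ]
```

The resulting list could then be run with `subprocess.run(cmd, check=True)`.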
The cp-inplace script:

```python
#!/usr/bin/python3
""" copy the uncopied portion of a file """
import os, sys, time

if len(sys.argv) != 3:
    sys.exit("usage: cp-inplace INPUT OUTPUT")

r = open(sys.argv[1], 'rb')  # input file
a = open(sys.argv[2], 'ab')  # output file, may already exist

# seek input to end to determine current size
r.seek(0, os.SEEK_END)
rs = r.tell()
print(rs, "size of input.")

# seek output to end to get output file length
a.seek(0, os.SEEK_END)
ws = a.tell()
uncopied = rs - ws
print(ws, "size of output,", uncopied, "to be moved.")

# position input to current output position
r.seek(ws, os.SEEK_SET)

start = time.time()
copied = 0
# chunk size: a tenth of what remains, but at least 8 MiB
size = max(uncopied // 10, 8192 * 1024)

# loop reading and writing until end of file on input
chunk = r.read(size)
while chunk:
    a.write(chunk)
    copied += len(chunk)
    print(r.tell(), "moved", len(chunk), "chunk", copied, "copied")
    chunk = r.read(size)
end = r.tell()
r.close()
a.close()  # flush remaining buffered output before timing stops

# generate summary
elapsed = time.time() - start
bps = int(copied / elapsed) if elapsed > 0 else 0
print(end, "eof", copied, "copied", bps, "bytes per second")
```
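Unlike rsync --append, cp-inplace trusts that the existing output is a correct prefix of the input. If that is in doubt, a check along these lines could be run first. It is a hypothetical addition, not part of cp-inplace. It compares bytes directly rather than checksums, since over NFS both files are readable locally, and it reads the whole existing output plus the same length of input, which is the kind of up-front cost the table attributes to rsync --append:

```python
def prefix_matches(src_path: str, dst_path: str, block: int = 8 * 1024 * 1024) -> bool:
    """Return True if the file at dst_path is a byte-for-byte prefix of src_path.

    Hypothetical helper; reads both files sequentially in block-sized pieces.
    """
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            want = dst.read(block)
            if not want:
                return True   # reached end of output: every byte matched
            if src.read(len(want)) != want:
                return False  # mismatch, or input shorter than output
```

A wrapper could call `prefix_matches(sys.argv[1], sys.argv[2])` before copying and stop if it returns False.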