pdiffcopy: Fast large file synchronization inspired by rsync


Welcome to the documentation of pdiffcopy version 1.0.1! The following sections are available:

User documentation

The readme is the best place to start reading: it’s aimed at all users and documents the command line interface:

Fast large file synchronization inspired by rsync


The pdiffcopy program synchronizes large binary data files between Linux servers at blazing speeds by performing delta transfers and spreading its work over many CPU cores. It’s currently tested on Python 2.7, 3.5+ and PyPy (2.7) on Ubuntu Linux but is expected to work on most Linux systems.

Status

Although the first prototype of pdiffcopy was developed back in June 2019, it wasn’t until March 2020 that the first release was published as an open source project.

Note

This is an alpha release, meaning it’s not considered mature and you may encounter bugs. If you’re going to use pdiffcopy, I suggest you keep backups, be cautious and sanity check your results.

There are lots of features and improvements I’d love to add, but more importantly the project needs to actually be used for a while before I’ll consider changing the alpha label to beta or mature.

Installation

The pdiffcopy package is available on PyPI which means installation should be as simple as:

$ pip install 'pdiffcopy[client,server]'

There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions 😉.

The names between the square brackets (client and server) are called “extras” and they enable you to choose whether to install the client dependencies, server dependencies or both.

Command line

Usage: pdiffcopy [OPTIONS] [SOURCE, TARGET]

Synchronize large binary data files between Linux servers at blazing speeds by performing delta transfers and spreading the work over many CPU cores.

One of the SOURCE and TARGET arguments is expected to be the pathname of a local file and the other argument is expected to be a URL that provides the location of a remote pdiffcopy server and a remote filename. File data will be read from SOURCE and written to TARGET.

If no positional arguments are given the server is started.

Supported options:

Option Description
-b, --block-size=BYTES Customize the block size of the delta transfer. Can be a plain integer number (bytes) or an expression like 5K, 1MiB, etc.
-m, --hash-method=NAME Customize the hash method of the delta transfer (defaults to ‘sha1’ but supports all hash methods provided by the Python hashlib module).
-W, --whole-file Disable the delta transfer algorithm (skips computing hashes and downloads all blocks unconditionally).
-c, --concurrency=COUNT Change the number of parallel block hash / copy operations.
-n, --dry-run Scan for differences between the source and target file and report the similarity index, but don’t write any changed blocks to the target.
-B, --benchmark=COUNT Evaluate the effectiveness of delta transfer by mutating the TARGET file (which must be a local file) and resynchronizing its contents. This process is repeated COUNT times, with varying similarity. At the end an overview is printed.
-l, --listen=ADDRESS Listen on the specified IP:PORT or PORT.
-v, --verbose Increase logging verbosity (can be repeated).
-q, --quiet Decrease logging verbosity (can be repeated).
-h, --help Show this message and exit.

Benchmarks

The command line interface provides a simple way to evaluate the effectiveness of the delta transfer implementation and compare it against rsync. The tables in the following sections are based on that benchmark.

Low concurrency
Concurrency: 6 processes on 4 CPU cores
Disks: Magnetic storage (slow)
Filesize: 1.79 GiB

The following table shows the results of the benchmark on a 1.79 GiB datafile that’s synchronized between two bare metal servers that each have four CPU cores and spinning disks, where pdiffcopy was run with a concurrency of six [1]:

Delta Data size pdiffcopy rsync
10% 183 MiB 3.20 seconds 38.55 seconds
20% 366 MiB 4.15 seconds 44.33 seconds
30% 549 MiB 5.17 seconds 49.63 seconds
40% 732 MiB 6.09 seconds 53.74 seconds
50% 916 MiB 6.99 seconds 57.49 seconds
60% 1.07 GiB 8.06 seconds 1 minute and 0.97 seconds
70% 1.25 GiB 9.06 seconds 1 minute and 2.38 seconds
80% 1.43 GiB 10.12 seconds 1 minute and 4.20 seconds
90% 1.61 GiB 10.89 seconds 1 minute and 3.80 seconds
100% 1.79 GiB 12.05 seconds 1 minute and 4.14 seconds
[1] Allocating more processes than there are CPU cores available can make sense when the majority of the time spent by those processes is waiting for I/O (this definitely applies to pdiffcopy).
High concurrency
Concurrency: 10 processes on 48 CPU cores
Disks: NVMe (fast)
Filesize: 5.5 GiB

Here’s a benchmark based on a 5.5 GiB datafile that’s synchronized between two bare metal servers that each have 48 CPU cores and high-end NVMe disks, where pdiffcopy was run with a concurrency of ten:

Delta Data size pdiffcopy rsync
10% 562 MiB 4.23 seconds 49.96 seconds
20% 1.10 GiB 6.76 seconds 1 minute and 2.38 seconds
30% 1.65 GiB 9.43 seconds 1 minute and 13.73 seconds
40% 2.20 GiB 12.41 seconds 1 minute and 19.67 seconds
50% 2.75 GiB 14.54 seconds 1 minute and 25.86 seconds
60% 3.29 GiB 17.21 seconds 1 minute and 26.97 seconds
70% 3.84 GiB 19.79 seconds 1 minute and 27.46 seconds
80% 4.39 GiB 23.10 seconds 1 minute and 26.15 seconds
90% 4.94 GiB 25.19 seconds 1 minute and 21.96 seconds
100% 5.43 GiB 27.82 seconds 1 minute and 19.17 seconds

This benchmark shows how well pdiffcopy can scale up its performance by running on a large number of CPU cores. Notice how the smaller the delta, the bigger the edge pdiffcopy has over rsync. This is because pdiffcopy computes the differences between the local and remote file using many CPU cores at the same time. This operation requires only reading, and that parallelizes surprisingly well on modern NVMe disks.

Silly concurrency
Concurrency: 20 processes on 48 CPU cores
Disks: NVMe (fast)
Filesize: 5.5 GiB

In case you looked at the high concurrency benchmark above, noticed the large number of CPU cores available and wondered whether increasing the concurrency further would make a difference, this section is for you 😉. Having gone to the effort of developing pdiffcopy and enabling it to run on many CPU cores, I was curious myself, so I reran the high concurrency benchmark using 20 processes instead of 10. Here are the results:

Delta Data size pdiffcopy rsync
10% 562 MiB 3.80 seconds 49.71 seconds
20% 1.10 GiB 6.25 seconds 1 minute and 3.37 seconds
30% 1.65 GiB 8.90 seconds 1 minute and 12.40 seconds
40% 2.20 GiB 11.44 seconds 1 minute and 19.57 seconds
50% 2.75 GiB 14.21 seconds 1 minute and 25.43 seconds
60% 3.29 GiB 16.45 seconds 1 minute and 28.12 seconds
70% 3.84 GiB 19.05 seconds 1 minute and 28.34 seconds
80% 4.39 GiB 21.95 seconds 1 minute and 25.49 seconds
90% 4.94 GiB 24.60 seconds 1 minute and 22.27 seconds
100% 5.43 GiB 26.42 seconds 1 minute and 18.73 seconds

As you can see, increasing the concurrency from 10 to 20 does make the benchmark a bit faster; however, the margin is so small that it’s hardly worth the bother. I interpret this to mean that the NVMe disks on these servers can be more or less saturated using 8–12 writer processes.

Note

In the end the question is how many CPU cores it takes to saturate your storage infrastructure. This can be determined through experimentation, which the benchmark can assist with. There are no fundamental reasons why 30 or even 50 processes couldn’t work well, as long as your storage infrastructure can keep up…

Limitations

While inspired by rsync, the goal definitely isn’t feature parity. Right now only single files can be transferred, and only the file data is copied, not the metadata. It’s a proof of concept that works but is limited. While I’m tempted to add support for synchronizing directory trees and file metadata just because it’s convenient, it’s definitely not my intention to compete with rsync in the domain of synchronizing large directory trees, because I would most likely fail.

Error handling is currently very limited and interrupting the program using Control-C may get you stuck with an angry pool of multiprocessing workers that refuse to shut down 😝. In all seriousness, hitting Control-C a couple of times should break out of it, otherwise try Control-\ (that’s a backslash, it should send a QUIT signal).

History

In June 2019 I found myself in a situation where I wanted to quickly synchronize large binary datafiles (a small set of very large MySQL *.ibd files totaling several hundred gigabytes) using the abundant computing resources available to me (48 CPU cores, NVMe disks, bonded network interfaces, you name it 😉).

I spent quite a bit of time experimenting with running many rsync processes in parallel, but the small number of very large files was “clogging up the pipe” so to speak, no matter what I did. This was how I realized that rsync was a really poor fit, which was a disappointment for me because rsync has long been one of my go-to programs for ad hoc problem solving on Linux servers 🙂.

In any case, I decided to prove to myself that the hardware available to me could do much more than rsync was getting out of it, and after a weekend of hacking on a prototype I had something that could outperform rsync even though it was written in Python and used HTTP as a transport 😁. During that weekend I decided that my prototype was worthy of being published as an open source project; however, it wasn’t until months later that I actually found the time to do so.

About the name

The name pdiffcopy is intended as a (possibly somewhat obscure) abbreviation of “Parallel Differential Copy”:

  • Parallel because it’s intended to run on many CPU cores.
  • Differential because of the delta transfer mechanism.

But mostly I just needed a short, unique name like rsync so that searching for this project will actually turn up this project instead of a dozen others 😇.

Contact

The latest version of pdiffcopy is available on PyPI and GitHub. The documentation is hosted on Read the Docs and includes a changelog. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.

License

This software is licensed under the MIT license.

© 2020 Peter Odding.

API documentation

The following API documentation is automatically generated from the source code:

API documentation

This documentation is based on the source code of version 1.0.1 of the pdiffcopy package. The following modules are available:

pdiffcopy

Configuration defaults for the pdiffcopy program.

pdiffcopy.BLOCK_SIZE = 1048576

The default block size to be used by pdiffcopy (1 MiB).

pdiffcopy.DEFAULT_CONCURRENCY = 2

The default concurrency to be used by pdiffcopy (at least two, at most 1/3 of available cores).
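The documented rule (at least two processes, at most a third of the available cores) can be expressed as a one-liner. This is a sketch of the stated behavior, not necessarily the package’s actual code:

```python
import multiprocessing

def default_concurrency():
    """At least two processes, at most a third of the available CPU cores."""
    return max(2, multiprocessing.cpu_count() // 3)
```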

pdiffcopy.DEFAULT_PORT = 8080

The default port number for the pdiffcopy server (an integer number, defaults to 8080).

pdiffcopy.cli

Usage: pdiffcopy [OPTIONS] [SOURCE, TARGET]

Synchronize large binary data files between Linux servers at blazing speeds by performing delta transfers and spreading the work over many CPU cores.

One of the SOURCE and TARGET arguments is expected to be the pathname of a local file and the other argument is expected to be a URL that provides the location of a remote pdiffcopy server and a remote filename. File data will be read from SOURCE and written to TARGET.

If no positional arguments are given the server is started.

Supported options:

Option Description
-b, --block-size=BYTES Customize the block size of the delta transfer. Can be a plain integer number (bytes) or an expression like 5K, 1MiB, etc.
-m, --hash-method=NAME Customize the hash method of the delta transfer (defaults to ‘sha1’ but supports all hash methods provided by the Python hashlib module).
-W, --whole-file Disable the delta transfer algorithm (skips computing hashes and downloads all blocks unconditionally).
-c, --concurrency=COUNT Change the number of parallel block hash / copy operations.
-n, --dry-run Scan for differences between the source and target file and report the similarity index, but don’t write any changed blocks to the target.
-B, --benchmark=COUNT Evaluate the effectiveness of delta transfer by mutating the TARGET file (which must be a local file) and resynchronizing its contents. This process is repeated COUNT times, with varying similarity. At the end an overview is printed.
-l, --listen=ADDRESS Listen on the specified IP:PORT or PORT.
-v, --verbose Increase logging verbosity (can be repeated).
-q, --quiet Decrease logging verbosity (can be repeated).
-h, --help Show this message and exit.
pdiffcopy.cli.main()[source]

The command line interface.

pdiffcopy.cli.run_client(**options)[source]

Run the client program.

pdiffcopy.cli.run_server(**options)[source]

Run the server program.

pdiffcopy.client

Parallel, differential file copy client.

class pdiffcopy.client.Client(**kw)[source]

Python API for the client side of the pdiffcopy program.

Here’s an overview of the Client class:

Superclass: PropertyManager
Public methods: compute_transfer_size(), find_changes(), mutate_target(), run_benchmark(), synchronize(), synchronize_once() and transfer_changes()
Properties: benchmark, block_size, concurrency, delta_transfer, dry_run, hash_method, source and target

You can set the values of the benchmark, block_size, concurrency, delta_transfer, dry_run, hash_method, source and target properties by passing keyword arguments to the class initializer.

benchmark[source]

How many times the benchmark should be run (an integer, defaults to 0).

Note

The benchmark property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

block_size[source]

The block size used by the client.

Note

The block_size property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

concurrency[source]

The number of parallel processes that the client is allowed to start.

Note

The concurrency property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

delta_transfer[source]

Whether delta transfer is enabled (a boolean, defaults to True).

Note

The delta_transfer property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

dry_run[source]

Whether the client is allowed to make changes.

Note

The dry_run property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

hash_method[source]

The block hash method (a string, defaults to ‘sha1’).

Note

The hash_method property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

source[source]

The Location from which data is read.

Note

The source property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

target[source]

The Location to which data is written.

Note

The target property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

compute_transfer_size(offsets)[source]

Figure out how much data we’re going to transfer.

Parameters: offsets – a list of integers with the offsets of the blocks to be synchronized.
Returns: The amount of data to be transferred in bytes (an integer).

This would be trivially easy if it weren’t for the last block, which can be smaller than the block size. Depending on the configured block size and the size of the file being synchronized, the difference may be negligible or quite significant, so we go to the effort of calculating this correctly.
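The calculation can be illustrated with a standalone sketch (a hypothetical, simplified version; the real method operates on a Client object):

```python
def compute_transfer_size(offsets, block_size, file_size):
    """Compute how many bytes the given blocks contain, accounting for
    the final block, which may be shorter than the block size."""
    return sum(min(block_size, file_size - offset) for offset in offsets)
```

For example, with a block size of 10 and a file of 25 bytes, the block at offset 20 contributes only 5 bytes instead of 10.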

mutate_target(percentage)[source]

Invalidate a percentage of the data in the target file.

run_benchmark()[source]

Benchmark the effectiveness of the delta transfer implementation.

synchronize()[source]

Synchronize from source to target (possibly more than once, see benchmark).

synchronize_once()[source]

Synchronize from source to target.

Returns: The number of blocks that differed (an integer).
find_changes()[source]

Helper for synchronize() to compute the similarity index.

transfer_changes(offsets)[source]

Helper for synchronize() to transfer the differences.

Parameters: offsets – A list of integers with the byte offsets of the blocks to copy from source to target.
pdiffcopy.client.get_hashes_fn(location, **options)[source]

Adapter for multiprocessing used by Client.find_changes().

pdiffcopy.client.transfer_block_fn(offset, source, target, block_size)[source]

Adapter for multiprocessing used by Client.transfer_changes().

class pdiffcopy.client.Location(**kw)[source]

A local or remote file to be copied.

Here’s an overview of the Location class:

Superclass: PropertyManager
Public methods: get_hashes(), get_url(), read_block(), resize() and write_block()
Properties: exists, expression, file_info, file_size, filename, hostname, label and port_number

You can set the values of the expression, filename, hostname and port_number properties by passing keyword arguments to the class initializer.

exists[source]

True if the file exists, False otherwise.

Note

The exists property is a cached_property. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

expression[source]

The location expression (a string).

Note

The expression property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

filename[source]

The absolute pathname of the file to copy (a string).

Note

The filename property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

hostname[source]

The host name of a pdiffcopy server (a string or None).

Note

The hostname property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

label

A human friendly label for the location (a string).

port_number[source]

The port number of a pdiffcopy server (a number or None).

Note

The port_number property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

file_info[source]

A dictionary with file metadata.

Note

The file_info property is a cached_property. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

file_size[source]

The size of the file in bytes (an integer).

Note

The file_size property is a cached_property. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

get_hashes(**options)[source]

Get the hashes of the blocks in a file.

Parameters: options – See get_url().
Returns: A generator of tuples with two values each:
  1. A byte offset into the file (an integer).
  2. The hash of the block starting at that offset (a string).
get_url(endpoint, **params)[source]

Get the server URL for the given endpoint.

Parameters:
  • endpoint – The name of a server side endpoint (a string).
  • params – Any query string parameters.
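Such a URL presumably takes the following shape (an illustration based on the parameters above; the exact scheme, path layout and parameter encoding used by pdiffcopy are assumptions, and `build_endpoint_url` is a hypothetical name):

```python
from urllib.parse import urlencode

def build_endpoint_url(hostname, port_number, endpoint, **params):
    """Build an HTTP URL for a named server endpoint with query parameters."""
    url = "http://%s:%i/%s" % (hostname, port_number, endpoint)
    if params:
        # Sort for a deterministic query string.
        url += "?" + urlencode(sorted(params.items()))
    return url
```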
read_block(offset, size)[source]

Read a block of data from filename.

Parameters:
  • offset – The byte offset where reading starts (an integer).
  • size – The number of bytes to read (an integer).
Returns:

A byte string.

resize(size)[source]

Adjust the size of filename to the given size.

Parameters: size – The new file size in bytes (an integer).
write_block(offset, data)[source]

Write a block of data to filename.

Parameters:
  • offset – The byte offset where writing starts (an integer).
  • data – The byte string to write to the file.

pdiffcopy.exceptions

Custom exceptions raised by the pdiffcopy modules.

exception pdiffcopy.exceptions.ProgramError(text, *args, **kw)[source]

The base exception class for all custom exceptions raised by the pdiffcopy modules.

__init__(text, *args, **kw)[source]

Initialize a ProgramError object.

For argument handling see the compact() function. The resulting string is used as the exception message.

exception pdiffcopy.exceptions.BenchmarkAbortedError(text, *args, **kw)[source]

Raised when the operator doesn’t give explicit permission to run the benchmark.

exception pdiffcopy.exceptions.DependencyError(text, *args, **kw)[source]

Raised when client or server installation requirements are missing.

pdiffcopy.hashing

Parallel hashing of files using multiprocessing and pdiffcopy.mp.

pdiffcopy.hashing.compute_hashes(filename, block_size, method, concurrency)[source]

Compute checksums of a file in blocks (parallel).

pdiffcopy.hashing.hash_worker(offset, block_size, filename, method)[source]

Worker function to be run in child processes.

pdiffcopy.mp

Adaptations of multiprocessing that make it easier to do the right thing.

This module stands alone as a library used by the other modules that are specialized to what pdiffcopy does (synchronizing files). I may end up extracting this to a separate package at some point, because over the 10+ years that I’ve been programming Python I’ve written an awful lot of plumbing code for multiprocessing and it’s not exactly my favorite thing in the world (I suck at reasoning about concurrency, like most people I guess).

class pdiffcopy.mp.Promise(**options)[source]

Execute a Python function in a child process and retrieve its return value.

__init__(**options)[source]

Initialize a Promise object.

The initializer arguments are the same as for multiprocessing.Process. The child process is started automatically.

run()[source]

Run the target function in a newly spawned child process.

join()[source]

Get the return value and wait for the child process to finish.
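The pattern Promise implements (run a function in a child process, collect the return value later) can be approximated with standard library primitives. This is a simplified sketch, not the actual implementation; it pins the fork start method, which matches the Linux focus of pdiffcopy:

```python
import multiprocessing

_ctx = multiprocessing.get_context("fork")  # pdiffcopy targets Linux

class Promise:
    """Run a function in a child process and retrieve its return value."""

    def __init__(self, target, args=()):
        self._receiver, sender = _ctx.Pipe(duplex=False)
        self._process = _ctx.Process(target=self._run, args=(sender, target, args))
        self._process.start()  # the child process starts immediately

    @staticmethod
    def _run(pipe, target, args):
        # Runs in the child process: send the return value to the parent.
        pipe.send(target(*args))

    def join(self):
        """Wait for the child process and return the function's result."""
        result = self._receiver.recv()
        self._process.join()
        return result
```

For example, `Promise(pow, (2, 10)).join()` evaluates `pow(2, 10)` in a child process and returns 1024 in the parent.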

class pdiffcopy.mp.WorkerPool(**kw)[source]

Simple to use worker pool implementation using multiprocessing.

Here’s an overview of the WorkerPool class:

Superclass: PropertyManager
Special methods: __enter__(), __exit__() and __iter__()
Properties: all_processes, concurrency, generator_fn, generator_process, input_queue, log_level, output_queue, polling_interval, worker_fn and worker_processes

When you initialize a WorkerPool object you are required to provide values for the concurrency, generator_fn and worker_fn properties. You can set the values of the concurrency, generator_fn, log_level, polling_interval and worker_fn properties by passing keyword arguments to the class initializer.
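The producer/consumer pattern described above can be approximated with standard library primitives. This is a simplified standalone sketch under the fork start method, not the WorkerPool implementation (which also handles logging levels, polling and cleanup):

```python
import multiprocessing

def run_pool(generator_fn, worker_fn, concurrency):
    """Feed items produced by generator_fn to `concurrency` worker
    processes running worker_fn, yielding results as they arrive."""
    ctx = multiprocessing.get_context("fork")
    input_queue = ctx.Queue()
    output_queue = ctx.Queue()

    def worker():
        while True:
            item = input_queue.get()
            if item is None:  # sentinel: no more work for this worker
                break
            output_queue.put(worker_fn(item))

    processes = [ctx.Process(target=worker) for _ in range(concurrency)]
    for process in processes:
        process.start()
    count = 0
    for item in generator_fn():
        input_queue.put(item)
        count += 1
    for _ in processes:
        input_queue.put(None)  # one sentinel per worker
    for _ in range(count):
        yield output_queue.get()
    for process in processes:
        process.join()
```

Note that results are yielded in completion order, not input order, which is fine for pdiffcopy’s use case of hashing and copying independent blocks.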

all_processes[source]

A list with all multiprocessing.Process objects used by the pool.

Note

The all_processes property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

concurrency[source]

The number of processes allowed to run simultaneously (an integer).

Note

The concurrency property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named concurrency (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

generator_fn[source]

A user defined generator to populate input_queue.

Note

The generator_fn property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named generator_fn (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

generator_process[source]

A multiprocessing.Process object to run generator_fn.

Note

The generator_process property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

input_queue[source]

The input queue (a multiprocessing.Queue object).

Note

The input_queue property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

log_level[source]

The logging level to configure in child processes (an integer).

Defaults to the current log level in the parent process at the point when the worker processes are created.

Note

The log_level property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

output_queue[source]

The output queue (a multiprocessing.Queue object).

Note

The output_queue property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

polling_interval[source]

The time to wait between checking output_queue (a floating point number, defaults to 0.1 second).

Note

The polling_interval property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

worker_fn[source]

A user defined worker function to consume input_queue and populate output_queue.

Note

The worker_fn property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named worker_fn (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

worker_processes[source]

A list of multiprocessing.Process objects to run worker_fn.

Note

The worker_processes property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

__iter__()[source]

Initialize the generator and worker processes and start yielding values from the output_queue.

__enter__()[source]

Start up the generator and worker processes.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Terminate any child processes that are still alive.

pdiffcopy.mp.generator_adapter(concurrency, generator_fn, input_queue, log_level)[source]

Adapter function for the generator process.

pdiffcopy.mp.worker_adapter(input_queue, log_level, output_queue, worker_fn)[source]

Adapter function for the worker processes.

pdiffcopy.operations

Utility functions used by the client as well as the server.

pdiffcopy.operations.get_file_info(filename)[source]

Get information about a local file.

Parameters: filename – An absolute filename (a string).
Returns: A dictionary with file metadata; currently only the file size is included. If the file doesn’t exist, an empty dictionary is returned.
pdiffcopy.operations.get_file_size(filename)[source]

Get the size of a local file.

Parameters: filename – An absolute filename (a string).
Returns: The size of the file (an integer) or None when the file doesn’t exist.
pdiffcopy.operations.read_block(filename, offset, size)[source]

Read a block of data from a local file.

Parameters:
  • filename – An absolute filename (a string).
  • offset – The byte offset where reading starts (an integer).
  • size – The number of bytes to read (an integer).
Returns:

The read data (a byte string).

pdiffcopy.operations.resize_file(filename, size)[source]

Create or resize a local file, in preparation for synchronizing its contents.

Parameters:
  • filename – An absolute filename (a string).
  • size – The new size in bytes (an integer).
pdiffcopy.operations.write_block(filename, offset, data)[source]

Write a block of data to a local file.

Parameters:
  • filename – An absolute filename (a string).
  • offset – The byte offset where writing starts (an integer).
  • data – The data to write (a byte string).

Change log

The change log lists notable changes to the project:

Changelog

The purpose of this document is to list all of the notable changes to this project. The format was inspired by (but doesn’t strictly adhere to) Keep a Changelog. This project adheres to semantic versioning.

Release 1.0 (2020-03-06)

The initial release of the pdiffcopy program after several weekends of hacking on prototypes and refactoring things until I was happy enough with the code base to share it with the world 🙂.