librsync
2.3.1
# File formats {#page_formats}

## Generalities

There are two file formats used by `librsync` and `rdiff`: the
*signature* file, which summarizes a data file, and the *delta* file,
which describes the edits from one data file to another.

librsync does not know or care about any formats in the data files.

All integers are big-endian.

## Magic numbers

All librsync files start with a u32 \ref rs_magic_number identifying them.
These are declared in `librsync.h`, and there is a different number for each
signature and delta file type. Note that magic numbers for newer file
types are not supported by older versions of librsync. Older librsync versions
fail immediately with an error when they encounter a file type they don't
support.

## Signatures

Signatures consist of a header followed by one block signature for
each block in the data file.

The signature header is:

    u32 magic;          // Some RS_*_SIG_MAGIC value.
    u32 block_len;      // Bytes per block.
    u32 strong_sum_len; // Bytes per strong sum in each block.

Each block signature includes a weaksum followed by a truncated strongsum hash
for one block of `block_len` bytes from the input data file. The strongsum
is truncated to the `strong_sum_len` given in the header. The final
data block may be shorter. The number of blocks in the signature is therefore

    ceil(input_len/block_len)

The weak checksum in each block signature is used as a rolling checksum to
find moved data, and the strong hash is used to check that the match is
correct. The weak checksum is either a rollsum (based on adler32) or rabinkarp
(the better alternative), and the strong hash is either MD4 or BLAKE2,
depending on the magic number.
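
As an illustration of the header layout and block count described above, the following is a minimal sketch in C that decodes the three big-endian u32 fields from a byte buffer and computes `ceil(input_len/block_len)` with integer arithmetic. The names `sig_header`, `read_be32`, `parse_sig_header`, and `sig_block_count` are hypothetical and are not part of the librsync API. The sketch does not check the magic number against the values declared in `librsync.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the signature header (not a librsync type). */
typedef struct {
    uint32_t magic;          /* some RS_*_SIG_MAGIC value */
    uint32_t block_len;      /* bytes per block */
    uint32_t strong_sum_len; /* bytes per truncated strong sum */
} sig_header;

/* Decode one big-endian u32 from four bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the 12-byte header at buf.
 * Returns 0 on success, -1 if the buffer is too short or block_len is zero. */
int parse_sig_header(const unsigned char *buf, size_t len, sig_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12)
        return -1;
    out->magic = read_be32(buf);
    out->block_len = read_be32(buf + 4);
    out->strong_sum_len = read_be32(buf + 8);
    return out->block_len == 0 ? -1 : 0;
}

/* ceil(input_len / block_len) without floating point;
 * the final block may be shorter than block_len. */
uint64_t sig_block_count(uint64_t input_len, uint32_t block_len)
{
    assert(block_len != 0);
    return input_len / block_len + (input_len % block_len != 0);
}
```

Under these assumptions, an input of 2049 bytes with a `block_len` of 2048 yields two blocks: one full block and one final block of a single byte.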

Truncating the strongsum makes the signatures smaller at the cost of a greater
chance of collisions. The strongsums are truncated by keeping the leftmost
(first) bytes after computation.

The format of each block signature is (see `rs_sig_do_block`):

    u32 weak_sum;
    u8[strong_sum_len] strong_sum;

## Delta files

Deltas consist of the delta magic constant `RS_DELTA_MAGIC` followed by a
series of commands. Commands tell the patch logic how to construct the result
file (new version) from the basis file (old version).

There are three kinds of commands: the literal command, the copy command, and
the end command. A command consists of a single byte followed by zero or more
arguments. The number and size of the arguments are defined in `prototab.c`.

A literal command describes data not present in the basis file. It has one
argument: `length`. The format is:

    u8 command;          // in the range 0x41 through 0x44 inclusive
    u8[arg1_len] length;
    u8[length] data;     // new data to append

A copy command describes a range of data in the basis file. It has two
arguments: `start` and `length`. The format is:

    u8 command;          // in the range 0x45 through 0x54 inclusive
    u8[arg1_len] start;  // offset in the basis to begin copying data
    u8[arg2_len] length; // number of bytes to copy from the basis

The end command indicates the end of the delta file. It consists of a single
null byte and has no arguments.
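
The command layout above can be sketched as a small decoder in C. This is an illustration only, not the librsync patch implementation. The names `delta_cmd`, `cmd_kind`, `read_be`, and `decode_delta_cmd` are hypothetical. The sketch also assumes that argument widths cycle through 1, 2, 4, and 8 bytes in opcode order (four literal opcodes, sixteen copy opcodes), which the text above does not state; `prototab.c` remains the authoritative source for the actual widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_END, CMD_LITERAL, CMD_COPY, CMD_UNKNOWN } cmd_kind;

/* Hypothetical decoded command (not a librsync type). */
typedef struct {
    cmd_kind kind;
    uint64_t start;    /* copy only: offset in the basis file */
    uint64_t length;   /* literal: data bytes that follow; copy: bytes to copy */
    size_t header_len; /* command byte plus argument bytes */
} delta_cmd;

/* ASSUMPTION: argument widths in opcode order; check prototab.c. */
static const unsigned arg_width[4] = {1, 2, 4, 8};

/* Decode an n-byte big-endian unsigned integer (n <= 8). */
static uint64_t read_be(const unsigned char *p, unsigned n)
{
    uint64_t v = 0;
    while (n--)
        v = (v << 8) | *p++;
    return v;
}

/* Decode the command at buf.
 * Returns 0 on success, -1 if the input is truncated or the opcode
 * is outside the three ranges described in the text. */
int decode_delta_cmd(const unsigned char *buf, size_t len, delta_cmd *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 1)
        return -1;
    unsigned char op = buf[0];
    out->start = 0;
    out->length = 0;
    out->header_len = 1;
    if (op == 0x00) { /* end command: a single null byte */
        out->kind = CMD_END;
        return 0;
    }
    if (op >= 0x41 && op <= 0x44) { /* literal: one argument */
        unsigned w = arg_width[op - 0x41];
        if (len < (size_t)1 + w)
            return -1;
        out->kind = CMD_LITERAL;
        out->length = read_be(buf + 1, w);
        out->header_len = (size_t)1 + w;
        return 0;
    }
    if (op >= 0x45 && op <= 0x54) { /* copy: two arguments */
        unsigned w1 = arg_width[(op - 0x45) / 4];
        unsigned w2 = arg_width[(op - 0x45) % 4];
        if (len < (size_t)1 + w1 + w2)
            return -1;
        out->kind = CMD_COPY;
        out->start = read_be(buf + 1, w1);
        out->length = read_be(buf + 1 + w1, w2);
        out->header_len = (size_t)1 + w1 + w2;
        return 0;
    }
    out->kind = CMD_UNKNOWN;
    return -1;
}
```

A caller walking a delta would skip the u32 magic, then call the decoder repeatedly, advancing by `header_len` (plus `length` data bytes after a literal command) until it sees the end command.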