Efficient Copy-on-Write (“Forkery”) – High-Level Algorithm Description

Originates from https://blockchain.vporton.name/whitepaper-draft/

I realized that there is a data structure in which copying an object takes almost zero time and energy. Hey Linus, it’s possible to build a filesystem in which copying a 100 GB file takes about 1/10000 s without using much memory! We could back up every file on an average SSD disk every second. It also makes it possible to create modifiable blockchains.

A scientific article about this algorithm is yet to be written. The algorithm rests on the following simple ideas:

  • Index blocks of memory by a fork number and a point in time (both natural numbers). Store all such pairs, together with pointers to the memory blocks, in a data structure whose storage and complexity are similar to a (My)SQL table with a two-column composite index on (fork number, point in time).
  • So, for “sequential” forks (where an object that has already been forked cannot be forked again, but its children can be forked further), retrieval is an efficient operation, by analogy with the SQL query
    SELECT value FROM storage WHERE fork <= ? AND time <= ? ORDER BY fork DESC, time DESC LIMIT 1
    (descending order, so that the most recent matching version is returned). In particular, retrieval of the first or the last element is constant-time (very efficient). A runnable sketch of this appears right after this list.
  • What if we have not just a sequence of forks 0, 1, 2, … but a tree of forks? We can manage this reasonably efficiently too: e.g., map every fork number that deviates from the linear order (that is, where a fork other than the very last one is forked) to a new version of a B-tree (see https://www.usenix.org/legacy/event/hotstorage11/tech/final_files/Twigg.pdf for “stratified” B-trees, which offer logarithmic, that is adequate, performance); a simplified sketch of the tree case follows the sequential one below.
    The structure still remains efficient; in particular, insertion at the end is constant-time.
  • The size of the data structure can be kept small, because we need to store only the latest version of each block and the versions on which other forks depend. (Alternatively, we can opt out of deleting old data and keep the complete version history, which is another useful feature.)
  • Storing data (except for storing into a new fork, which is logarithmic in the number of forks) is also constant-time (super-efficient), because we only update the pointer to the actual storage, not the indexes. (We don’t need to update the time, unless we opt in to keeping the complete history of changes.)
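
To make the sequential case concrete, here is a minimal Python sketch (an illustration only, not the author’s implementation; the class ForkStore and its method names are made up for this example). It keeps the (fork, time) pairs sorted, mirroring the composite-index analogy above, and it assumes that times increase globally and that a fork is no longer written to once it has been forked:

    import bisect
    from typing import Any, List, Optional, Tuple

    class ForkStore:
        """Minimal sketch of the sequential-forks case: (fork, time) keys
        are kept sorted, like the two-column composite index above."""

        def __init__(self) -> None:
            self._keys: List[Tuple[int, int]] = []  # sorted (fork, time) pairs
            self._vals: List[Any] = []              # pointers to memory blocks

        def write(self, fork: int, time: int, value: Any) -> None:
            key = (fork, time)
            i = bisect.bisect_left(self._keys, key)
            if i < len(self._keys) and self._keys[i] == key:
                # Overwriting an existing version only swaps the pointer;
                # the index is untouched, so this is constant-time.
                self._vals[i] = value
            else:
                # Appending at the end (writing to the newest fork at the
                # newest time, the common case) is amortized constant-time.
                self._keys.insert(i, key)
                self._vals.insert(i, value)

        def read(self, fork: int, time: int) -> Optional[Any]:
            # Greatest key <= (fork, time): the analogue of the SQL query
            # SELECT ... ORDER BY fork DESC, time DESC LIMIT 1 above.
            i = bisect.bisect_right(self._keys, (fork, time))
            return self._vals[i - 1] if i > 0 else None

    # Example: fork 0 is written, then fork 1 branches off and diverges.
    s = ForkStore()
    s.write(0, 1, "a")
    s.write(0, 2, "b")
    s.write(1, 3, "c")          # fork 1 starts after fork 0 is frozen
    assert s.read(0, 2) == "b"  # fork 0 sees its own latest version
    assert s.read(1, 3) == "c"  # fork 1 sees its own write
    assert s.read(1, 2) == "b"  # ...and inherits earlier data from fork 0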

That’s an efficient copy-on-write.
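
For the tree-of-forks case, here is a similarly hedged sketch. Instead of the stratified B-trees cited above, it uses a plainer stand-in: each fork records its parent and the time at which it branched, and a read walks up the ancestor chain (so a read costs time proportional to the branch depth, whereas the stratified construction keeps it logarithmic). The key point is that creating a fork copies nothing:

    import bisect
    from typing import Any, Dict, List, Optional, Tuple

    class ForkTreeStore:
        """Sketch of a tree of forks via parent pointers (a simple
        stand-in for the stratified-B-tree mapping mentioned above)."""

        def __init__(self) -> None:
            # fork id -> (parent fork id, parent time at which we branched)
            self._parent: Dict[int, Tuple[Optional[int], int]] = {0: (None, 0)}
            self._times: Dict[int, List[int]] = {0: []}  # per-fork write times
            self._vals: Dict[int, List[Any]] = {0: []}   # per-fork values
            self._next = 1

        def fork(self, parent: int, at_time: int) -> int:
            # Any fork may be forked; nothing is copied, so this is O(1).
            child = self._next
            self._next += 1
            self._parent[child] = (parent, at_time)
            self._times[child] = []
            self._vals[child] = []
            return child

        def write(self, fork: int, time: int, value: Any) -> None:
            # Assumes writes within one fork arrive in increasing time order.
            self._times[fork].append(time)
            self._vals[fork].append(value)

        def read(self, fork: int, time: int) -> Optional[Any]:
            cur: Optional[int] = fork
            t = time
            while cur is not None:
                i = bisect.bisect_right(self._times[cur], t)
                if i > 0:
                    return self._vals[cur][i - 1]
                parent, branch_time = self._parent[cur]
                # In the parent, only data that existed at branch time is visible.
                cur, t = parent, min(t, branch_time)
            return None

Replacing the parent-pointer walk with the stratified-B-tree mapping cited above is what would restore the logarithmic bound claimed in the text.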

By Victor Porton

I am the chief editor of this journal and creator of this site.