Module forest_filecoin::db::gc

source ·
Expand description

The current implementation of the garbage collector is concurrent mark-and-sweep.

§Terminology

chain finality - a number of epochs after which it becomes impossible to add or remove a block previously appended to the blockchain.

§Design goals

A correct GC algorithm that is simple and efficient for forest scenarios. This algorithm removes unreachable blocks that are older than chain finality, making sure to avoid removing something that could later become reachable as a result of a fork.

Properties:

  • No BlockHeader reachable from HEAD may be garbage collected.
  • No data younger than chain finality epochs may be garbage collected.
  • State-trees older than depth epochs should be garbage collected.
  • Not all unreachable data has to be garbage collected. In other words, it’s acceptable for the garbage collector to be conservative.
  • The garbage collector may not prevent access to the database.

§GC Algorithm

The mark-and-sweep algorithm was chosen due to it’s simplicity, efficiency and low memory footprint.

§GC Workflow

  1. Mark: traverse all the blocks, generating integer hash representations for each identifier and storing those in a set.
  2. Wait at least chain finality blocks.
  3. Traverse reachable blocks starting at the current heaviest tipset and remove those from the marked set, leaving only unreachable entries that are older than chain finality.
  4. Sweep, removing all the remaining marked entries from the database.

§Correctness

This algorithm considers all the blocks that are visited during the snapshot export task reachable, making sure they are kept in the database after the run. It makes sure to retain the reachable graph as well as all the blocks for at least chain finality to account for potential forks. A snapshot can be used to bootstrap the node from scratch, thus the algorithm is considered correct when a valid snapshot can be exported using records available in the database after the run.

§Disk usage

The expected disk usage is slightly greater than the size of live data for three reasons:

  1. Unreachable data is not removed until it is at least 7.5 hours old (see chain finality).
  2. The garbage collector is conservative and is expected to leave a small (less than 1%) amount of unreachable data behind.
  3. The blockstore back-end may be fragmented, therefore not relinquishing the disk space back to the OS.

§Memory usage

During the mark and up to the sweep stage, the algorithm requires 4 bytes of memory for each database record. Additionally, the seen cache while traversing the reachable graph executing the filter stage requires at least 32 bytes of memory for each reachable block. For a typical mainnet snapshot of about 100 GiB that adds up to roughly 2.5 GiB.

§Scheduling

  1. GC is triggered automatically and there have to be at least chain finality epochs stored for the mark step.
  2. The filter step is triggered after at least chain finality has passed since the mark step.
  3. Then, the sweep step happens.
  4. Finally, the algorithm waits for a configured amount of time to initiate the next run.

§Performance

The time complexity of mark and sweep steps is O(n). The filter step is currently utilizing a depth-first search algorithm, with O(V+E) complexity, where V is the number of vertices and E is the number of edges.

Structs§

  • MarkAndSweep is a simple garbage collector implementation that traverses all the database keys writing them to a CidHashSet, then filters out those that need to be kept and schedules the rest for removal.

Constants§