Page MenuHomeClusterLabs Projects
Diviner Tech Docs PhabricatorRepositoryGraphCache

final class PhabricatorRepositoryGraphCache
Phorge Technical Documentation (Repositories)

Given a commit and a path, efficiently determine the most recent ancestor commit where the path was touched.

In Git and Mercurial, log operations with a path are relatively slow. For example:

git log -n1 <commit> -- <path>

...routinely takes several hundred milliseconds, and equivalent requests often take longer in Mercurial.

Unfortunately, this operation is fundamental to rendering a repository for the web, and essentially everything else that's slow can be reduced to this plus some trivial work afterward. Making this fast is desirable and powerful, and allows us to make other things fast by expressing them in terms of this query.

Because the query is fundamentally a graph query, it isn't easy to express in a reasonable way in MySQL, and we can't do round trips to the server to walk the graph without incurring huge performance penalties.

However, the total amount of data in the graph is relatively small. By caching it in chunks and keeping it in APC, we can reasonably load and walk the graph in PHP quickly.

For more context, see T2683.

Structure of the Cache

The cache divides commits into buckets (see getBucketSize()). To walk the graph, we pull a commit's bucket. The bucket is a map from commit IDs to a list of parents and changed paths, separated by null. For example, a bucket might look like this:

array(
  1 => array(0, null, 17, 18),
  2 => array(1, null, 4),
  // ...
)

This means that commit ID 1 has parent commit 0 (a special value meaning no parents) and affected path IDs 17 and 18. Commit ID 2 has parent commit 1, and affected path 4.

This data structure attempts to balance compactness, ease of construction, simplicity of cache semantics, and lookup performance. In the average case, it appears to do a reasonable job at this.

Tasks

Querying the Graph Cache

  • public function loadLastModifiedCommitID($commit_id, $path_id, $time) — Search the graph cache for the most modification to a path.

Cache Internals

  • private function getBucketKey($commit_id) — Get the bucket key for a given commit ID.
  • private function getBucketCacheKey($bucket_key) — Get the cache key for a given bucket key (from @{method:getBucketKey}).
  • private function getBucketSize() — Get the number of items per bucket.
  • private function getBucketData($bucket_key, $rebuild_data) — Retrieve or build a graph cache bucket from the cache.
  • private function rebuildBucket($bucket_key, $current_data) — Rebuild a cache bucket, amending existing data if available.

Methods

public function loadLastModifiedCommitID($commit_id, $path_id, $time)

Search the graph cache for the most modification to a path.

Parameters
int$commit_idThe commit ID to search ancestors of.
int$path_idThe path ID to search for changes to.
float$timeMaximum number of seconds to spend trying to satisfy this query using the graph cache. By default, `0.5` (500ms).
Return
mixedCommit ID, or `null` if no ancestors exist, or `false` if the graph cache was unable to determine the answer.

private function getBucketKey($commit_id)

Get the bucket key for a given commit ID.

Parameters
int$commit_idCommit ID.
Return
intBucket key.

private function getBucketCacheKey($bucket_key)

Get the cache key for a given bucket key (from getBucketKey()).

Parameters
int$bucket_keyBucket key.
Return
stringCache key.

private function getBucketSize()

Get the number of items per bucket.

Return
intNumber of items to store per bucket.

private function getBucketData($bucket_key, $rebuild_data)

Retrieve or build a graph cache bucket from the cache.

Normally, this operates as a readthrough cache call. It can also be used to force a cache update by passing the existing data to $rebuild_data.

Parameters
int$bucket_keyBucket key, from @{method:getBucketKey}.
mixed$rebuild_dataCurrent data, to force a cache rebuild of this bucket.
Return
arrayData from the cache.

private function rebuildBucket($bucket_key, $current_data)

Rebuild a cache bucket, amending existing data if available.

Parameters
int$bucket_keyBucket key, from @{method:getBucketKey}.
array$current_dataExisting bucket data.
Return
arrayRebuilt bucket data.