Database SchemaPhorge Contributor Documentation (Developer Guides)
This document describes key components of the database schema and should answer questions like how to store new types of data.
Database System
Phorge uses MySQL or another MySQL-compatible database (like MariaDB or Amazon RDS).
Phorge uses the InnoDB table engine. The only exception is the search_documentfield table which uses MyISAM because MySQL doesn't support fulltext search in InnoDB (recent versions do, but we haven't added support yet).
We are unlikely to ever support other incompatible databases like PostgreSQL or SQLite.
PHP Drivers
Phorge supports MySQL and MySQLi PHP extensions.
Databases
Each Phorge application has its own database. The names are prefixed by phorge_ (this is configurable).
Phorge uses a separate database for each application. To understand why, see Why does Phorge need so many databases?.
Connections
Phorge specifies if it will use any opened connection just for reading or also for writing. This allows opening write connections to a primary and read connections to a replica in primary/replica setups (which are not actually supported yet).
Tables
Most table names are prefixed by their application names. For example, Differential revisions are stored in database phorge_differential and table differential_revision. This generally makes queries easier to recognize and understand.
The exception is a few tables which share the same schema over different databases such as edge.
We use lower-case table names with words separated by underscores.
Column Names
Phorge uses camelCase names for columns. The main advantage is that they directly map to properties in PHP classes.
Don't use MySQL reserved words (such as order) for column names.
Data Types
Phorge defines a set of abstract data types (like uint32, epoch, and phid) which map to MySQL column types. The mapping depends on the MySQL version.
Phorge uses utf8mb4 character sets where available (MySQL 5.5 or newer), and binary character sets in most other cases. The primary motivation is to allow 4-byte unicode characters to be stored (the utf8 character set, which is more widely available, does not support them). On newer MySQL, we use utf8mb4 to take advantage of improved collation rules.
Phorge stores dates with an epoch abstract data type, which maps to int unsigned. Although this makes dates less readable when browsing the database, it makes date and time manipulation more consistent and straightforward in the application.
We don't use the enum data type because each change to the list of possible values requires altering the table (which is slow with big tables). We use numbers (or short strings in some cases) mapped to PHP constants instead.
JSON and Other Serialized Data
Some data don't require structured access -- we don't need to filter or order by them. We store these data as text fields in JSON format. This approach has several advantages:
- If we decide to add another unstructured field then we don't need to alter the table (which is slow for big tables in MySQL).
- Table structure is not cluttered by fields which could be unused most of the time.
An example of such usage can be found in column differential_diffproperty.data.
Primary Keys
Most tables have an auto-increment column named id. Adding an ID column is appropriate for most tables (even tables that have another natural unique key), as it improves consistency and makes it easier to perform generic operations on objects.
For example, LiskMigrationIterator allows you to very easily apply a migration to a table using a constant amount of memory provided the table has an id column.
Indexes
Create all indexes necessary for fast query execution in most cases. Don't create indexes which are not used. You can analyze queries Using DarkConsole.
Older MySQL versions are not able to use indexes for tuple search: (a, b) IN ((%s, %d), (%s, %d)). Use AND and OR instead: ((a = %s AND b = %d) OR (a = %s AND b = %d)).
Foreign Keys
We don't use foreign keys because they're complicated and we haven't experienced significant issues with data inconsistency that foreign keys could help prevent. Empirically, we have witnessed first hand as ON DELETE CASCADE relationships accidentally destroy huge amounts of data. We may pursue foreign keys eventually, but there isn't a strong case for them at the present time.
PHIDs
Each globally referenceable object in Phorge has an associated PHID ("Phorge ID") which serves as a global identifier, similar to a GUID. We use PHIDs for referencing data in different databases.
We use both auto-incrementing IDs and global PHIDs because each is useful in different contexts. Auto-incrementing IDs are meaningfully ordered and allow us to construct short, human-readable object names (like D2258) and URIs. Global PHIDs allow us to represent relationships between different types of objects in a homogeneous way.
For example, infrastructure like "subscribers" can be implemented easily with PHID relationships: different types of objects (users, projects, mailing lists) are permitted to subscribe to different types of objects (revisions, tasks, etc). Without PHIDs, we would need to add a "type" column to avoid ID collision; using PHIDs makes implementing features like this simpler.
For more information, see Handles Technical Documentation
Transactions
Transactional code should be written using transactions. Example of such code is inserting multiple records where one doesn't make sense without the other, or selecting data later used for update. See chapter in LiskDAO.
Advanced Features
We don't use MySQL advanced features such as triggers, stored procedures or events because we like expressing the application logic in PHP more than in SQL. Some of these features (especially triggers) can also cause a great deal of confusion, and are generally more difficult to debug, profile, version control, update, and understand than application code.
Schema Denormalization
Phorge uses schema denormalization sparingly. Avoid denormalization unless there is a compelling reason (usually, performance) to denormalize.
Schema Changes and Migrations
To create a new schema change or migration:
Create a database patch. Database patches go in resources/sql/autopatches/. To change a schema, use a .sql file and write in SQL. To perform a migration, use a .php file and write in PHP. Name your file YYYYMMDD.patchname.ext. For example, 20141225.christmas.sql.
Keep patches small. Most schema change statements are not transactional. If a patch contains several SQL statements and fails partway through, it normally can not be rolled back. When a user tries to apply the patch again later, the first statement (which, for example, adds a column) may fail (because the column already exists). This can be avoided by keeping patches small (generally, one statement per patch).
Use namespace and character set variables. When defining a .sql patch, you should use these variables instead of hard-coding namespaces or character set names:
Variable | Meaning | Notes |
---|---|---|
{$NAMESPACE} | Storage Namespace | Defaults to phabricator |
{$CHARSET} | Default Charset | Mostly used to specify table charset |
{$COLLATE_TEXT} | Text Collation | For most text (case-sensitive) |
{$COLLATE_SORT} | Sort Collation | For sortable text (case-insensitive) |
{$CHARSET_FULLTEXT} | Fulltext Charset | Specify explicitly for fulltext |
{$COLLATE_FULLTEXT} | Fulltext Collate | Specify explicitly for fulltext |
Test your patch. Run bin/storage upgrade to test your patch.