Best Practices for Scaling Reference Databases (Work in Progress)

The premise of this document is to provide reader with ideas on how to manage a growing number of reference databases as teams adopt Datical.

Introduction

During early phase of Datical adoption, teams typically setup Reference Databases on their own, i.e., they may not be using centralized infrastructure from day 1. This allows teams flexibility to manage and configure their Reference Databases as well as underlying servers. However, as Datical usage grows, the approach above would create a sprawl of Reference Databases. To better manage sprawl, it would be ideal to move Reference Databases to centralized/shared infrastructure.

Note that Reference Databases are typically structure-only databases without any significant amount of data. You may still need lookup data or test data to validate objects.

Reference Databases for SQL Server

You could put a large number of Reference Databases on single SQL Server instance. The number of Reference Database per instance would be dictated by the size of each Reference Database (size depends on number of objects in the database and the amount of data needed in the database).

Database name collisions could occur when different teams use the same database name. In that case, a separate database instance would be needed.

If dealing with multiple versions of SQL Server (2008, 2012, 2014, 2016, etc.). You could setup different versions of SQL Server on the same server. Each version of SQL Server would run as a separate Database Engine instance.

The best practice, however, for scaling out SQL Server-based Reference Databases would be to arrange separate servers for each version of SQL Server. And then run only one version of Database Engine (SQL Server instance) per server. It would be possible to set up multiple servers for the same version of Database Engine to support the case for database name collisions.

Reference Database for Oracle 11g

An Oracle Reference Database is essentially one or more schemas in an Oracle instance. Datical connects to Reference Databases using user that are also names of schemas in the database.

In a pipeline that consists of DEV, QA and PROD environments, the schema name (e.g., "appSchema") would be consistent across each environment. As such, there would be multiple Oracle instances configured for each environment and each instance would have "appSchema" defined.

Because schema names have to be unique within an Oracle instance, this creates the need to setup new Oracle instances for each numbered Reference Databases.

Each Datical project could have one or more pipelines. This allows application teams to have multiple concurrent release streams. Datical requires that each pipelines has a dedicated Reference Database in the following manner. The table below shows sample pipelines for one Datical project

	Environments in the pipeline	Notes
Pipeline 1	REF1 → DEV1 → QA1 → UAT → PROD	Reference Databases (numbered) for each pipeline. DEV1-5, QA1-5, UAT and PROD are existing databases in pipelines.
Pipeline 2	REF2 → DEV2 → QA2 → UAT → PROD
Pipeilne 3	REF3 → DEV3 → QA3 → UAT → PROD
Pipeline 4	REF4 → DEV4 → QA4 → UAT → PROD
Pipeline 5	REF5 → DEV5 → QA5 → UAT → PROD

Each Datical project will have similar pipelines as shown in the table above where projects will require one or more numbered Reference Databases. Each of the numbered Reference Databases will be housed in their designated Oracle 11g instance. For example, all REF1 databases from all Datical projects could be housed in a single Oracle 11g instance. Similarly, all REF2 databases from all Datical projects could be housed in the second Oracle 11g instance.

Numbered Reference Databases	Oracle 11g instance
REF1 databases for all Datical projects	Instance #1
REF2 databases for all Datical projects	Instance #2
REF3 databases for all Datical projects	Instance #3
REF4 databases for all Datical projects	Instance #4
REF5 databases for all Datical projects	Instance #5

Shared services team who own centralized infrastructure will need to figure out how many numbered Reference Databases are going to be used and from there decide how many Oracle 11g instances will be needed.

The best practice, for scaling out Oracle 11g based Reference Databases would be to setup one Oracle 11g instance per server and have separate servers for each Oracle 11g instance. In the example above, you would setup 5 servers, each with a single Oracle 11g instance. The number of Reference Database per Oracle 11g instance would be dictated by the size of each Reference Database (size depends on number of objects in the database and the amount of data needed in the database).

Reference Databases for Oracle 12c

Oracle 12c has the concept of pluggable databases (PDBs). A PDB is a portable collection of schemas, schema objects, and non-schema objects that appear to an Oracle Net client as a separate database. One or more PDBs together are called a container database (CDB).

Because each PDB is a separate database, you can configure all Reference Databases as PDBs on a single Oracle 12c instance. The number of PDBs (hence number Reference Databases) per Oracle 12 instance would be dictated by the size of each Reference Database (size depends on number of objects in the database and the amount of data needed in the database).

Best Practices for High Availability

Because possibly large number of databases will be running on few servers, it would be necessary to consider High Availability options for each database. The key consideration here is for SQL Server instances. The out-of-the-box configuration for SQL Server requires elevated privileges which requires SQL Server instances to be configured in single user mode. However, a custom, non-elevated setup can be used which can work with SQL Server instances configured in Availability Groups (AG).