Horizontal Scaling vs Vertical Scaling
There are two main ways to scale our database…
- Horizontal Scaling (scale-out): It’s performed by adding more database nodes creating or increasing a database cluster.
- Vertical Scaling (scale-up): It’s performed by adding more hardware resources (CPU, Memory, Disk) to an existing database node.
For Horizontal Scaling, we can add more database nodes as slave nodes. It can help us to improve the read performance balancing the traffic between the nodes. In this case, we’ll need to add a load balancer to distribute traffic to the correct node depending on the policy and the node state.
To avoid a single point of failure adding only one load balancer, we should consider adding two or more load balancer nodes and using some tool like “Keepalived”, to ensure the availability.
As PostgreSQL doesn’t have native multi-master support, if we want to implement it to improve the write performance we’ll need to use an external tool for this task.
For Vertical Scaling, it could be needed to change some configuration parameter to allow PostgreSQL to use a new or better hardware resource. Let’s see some of these parameters from the PostgreSQL documentation.
- work_mem: Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. Several running sessions could be doing such operations concurrently, so the total memory used could be many times the value of work_mem.
- maintenance_work_mem: Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. Larger settings might improve performance for vacuuming and for restoring database dumps.
- autovacuum_work_mem: Specifies the maximum amount of memory to be used by each autovacuum worker process.
- autovacuum_max_workers: Specifies the maximum number of autovacuum processes that may be running at any one time.
- max_worker_processes: Sets the maximum number of background processes that the system can support. Specify the limit of the process like vacuuming, checkpoints, and more maintenance jobs.
- max_parallel_workers: Sets the maximum number of workers that the system can support for parallel operations. Parallel workers are taken from the pool of worker processes established by the previous parameter.
- max_parallel_maintenance_workers: Sets the maximum number of parallel workers that can be started by a single utility command. Currently, the only parallel utility command that supports the use of parallel workers is CREATE INDEX, and only when building a B-tree index.
- effective_cache_size: Sets the planner’s assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used.
- shared_buffers: Sets the amount of memory the database server uses for shared memory buffers. Settings significantly higher than the minimum are usually needed for good performance.
- temp_buffers: Sets the maximum number of temporary buffers used by each database session. These are session-local buffers used only for access to temporary tables.
- effective_io_concurrency: Sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value will increase the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel. Currently, this setting only affects bitmap heap scans.
- max_connections: Determines the maximum number of concurrent connections to the database server. Increasing this parameter allows PostgreSQL running more backend process simultaneously.
At this point, there is a question that we must ask. How can we know if we need to scale our database and how can we know the best way to do it?
Scaling our PostgreSQL database is a complex process, so we should check some metrics to be able to determine the best strategy to scale it.
We can monitor the CPU, Memory and Disk usage to determine if there is some configuration issue or if actually, we need to scale our database. For example, if we’re seeing a high server load but the database activity is low, it’s probably not needed to scale it, we only need to check the configuration parameters to match it with our hardware resources.
Checking the disk space used by the PostgreSQL node per database can help us to confirm if we need more disk or even a table partitioning. To check the disk space used by a database/table we can use some PostgreSQL function like
From the database side, we should check
- Amount of connection
- Running queries
- Index usage
- Replication Lag
These could be clear metrics to confirm if the scaling of our database is needed.