/ CS

cs - DATABASE - Difference Sharding and Partitioning

CS 스터디 공부 목록

Database Sharding vs. Partitioning: What’s the Difference?

Introduction:

  • Grewing size of the service, vary and many Tables eixisted.
  • VLDB (Very Large DBMS) DBMSs that make it difficult to fit the entire DB into one DBMS have naturally emerged, and as one DBMS manages many tables, a slowdown issue occurs.
  • One way to address these issues is through partitioning.
  • Sharding and Partitioning are both related to breaking up a large data set into smaller subsets.
  • The difference is that sharding implies the data is spread across multiple computers while partitioning does not.

1. What is Partitioning?

  • Partitioning is the database process where very large tables are divied into multiple smaller parts.
  • it’s more a generic term for dividing data across tables or databases.
  • The main goal of partitioning is to aid in maintenance of large tables and to reduce the overall response time to read and load data for particular SQL operations.

1.1.Vertical Partitioning

  • Vertical table partitioning is mostly used to increase SQL Server performance especially in cases where a query retrieves all columns from a table that contains a number of very wide text or BLOB columns.
  • In this case to reduce access times the BLOB columns can be split to its own table.
  • Another example is to restrict access to sensitive data (exaple, passwords, salary information etc.)
  • Vertical partitioning splits a table into two or more tables containing different columns.

1.2. Advantage

1.2.1. Availability

  • Physical partitioning reduces the likelihood of overall data corruption and improves data availability.

1.2.2. Manageability

  • Eliminates large tables for easier management.

1.2.3. Performance

  • Improves the performance of certain DMLs and queries, primarily in large data write environments.
  • Reduces contention in OLTP systems with many inserts by distributing insert operations to separate partitions.

1.3. Disadvantages

  • Increases the cost of joins between tables.
  • We cannot partition tables and indexes separately. Tables and indexes must be partitioned together.

2. What is Sharding(Horizontal Partitioning)?

  • A type of horizontal partitioning that splits large database into smaller components, which are faster and easier to manage.
  • In other words, Splitting up a large table of data horizontally i.e. row-wise.
  • As both the database size and number of transactions increase, so does the response time of querying the database.
  • Costs associated with maintaining a huge database can also skyrocket due to the number and quality of computers you need to manage your workload.
  • Data shards, on the other hand, have fewer hardware and software requirements and can be managed on less expensive servers.

2.1. Advantages of Sharding

2.1.1. Solve Scalability Issue:

  • With a single database server architecture any application experiences performance degradation.
  • At some point, we will be running out of disk space.
  • Database sharding fixes all these issues by partitioning the data across multiple machines.

2.1.2. High Availability:

  • If an outage happens in sharded architecture, then only some specific shards will be down.
  • All the ohther shards will continue the operation and the entire application won’t be unavailable for the users.

2.1.3. Speed Up Query Response Time:

  • In a sharded database a query has to go through fewer rows, and we receive the response in less time.

2.1.4. More Write Bandwidth:

  • For many applications writings is a major bottleneck.
  • With no master database serializing writes sharded archietecure allows we to write in parallel and increase our write throughtput.

2.1.5. Scaling Out:

  • Sharding a database facilitates horizontal scaling, known as scaling out.
  • In horizontal scaling, we add more machines in the network and distribute the load and these machines for faster processing and response.

2.2. Disadvantages of Sharding

2.2.1 Adds Comlexity in the System:

  • It’s a complicated task and if it’s not implemented properly then we may lose the data or get corrupt tables in our database.

2.2.2. Reblancing Data:

  • In a sharded database architecure, somtimes shards become unbalanced and may create database hotspot.
  • To overcome this problem and to rebalance the data we need to do re-sharding for even data distribution.
  • Moving data from one shard to another shard is not a good idea beacause it requires a lot of downtime.

2.2.3. Joining Data from Multiple Shards is Expensive:

  • In sharded architecture, we need to pull the data from different shards, and we need to perform joins across multiple networked servers.
  • We can pull out the data and join the data across the network.
  • This is going to be an expensive and time-consuming process.
  • It adds latency to our system.

2.2.4. No Native Support:

  • Sharding is not natively supported by every database engine.