What is sharding - A simple explanation

- 12 minute read

Paul Hopmans
Crypto Expert

✔️ A shard is a small part of a whole

✔️ Sharding comes from the 1990s

✔️ Sharding is designed to make a database faster

✔️ Sharding on a blockchain is mainly meant to be fast and cheap

✔️ Ethereum has come up with its own solution to the scalability problem

You've probably heard of sharding, but what it entails is another matter. After reading this blog, you'll know exactly what it is.


Table of Contents

  1. What is sharding?
  2. How does sharding work on a blockchain?
  3. Purpose of sharding
  4. Forms of sharding
  5. Danksharding in detail
  6. Strengths of sharding
  7. Weaknesses of sharding
  8. Examples of blockchains using sharding
  9. Sharding compared to other solutions
  10. The future of sharding

What is sharding?

Sharding is a term that dates from the 1990s. A shard is a small part of a whole. At the time, all kinds of problems with databases on computers and in networks had to be solved. It was even predicted that distributed databases running on specialized hardware had no future.

That prediction was, as we can see in blockchain, completely wrong! But then again, predicting the future is trickier than just sitting around being negative.

There have been tremendous developments since then in hardware, software and database techniques.

In 1997, the term shard also appeared in a popular game called Ultima Online. Players were spread over separate copies of the virtual world, and each copy was called a shard. The game still exists and has since achieved 8 Guinness World Records.

Sharding is a database management technique in which a growing database is divided into smaller, manageable partitions or portions. Such a small partition is called a shard.

The idea is to store these portions on different machines as independent shards.

A database table consists of rows and columns. Database sharding uses horizontal partitioning: the rows of a large table are split across smaller sections, with the goal of faster and easier database management.

Sharding is usually used when a data set becomes too large to store in a single database. If a database becomes too large, an ordinary query, which is what a database is made for, may take too long or cost too much. Sharding is then a solution: there is no need for expensive hardware, and queries keep an acceptable response time. One logical data set is divided into several small portions of the overall database.

Sharding, then, is horizontal scaling. With this technique, more devices are added, distributing the workload. When you scale vertically, you add more processing power to existing devices. The latter cannot be done indefinitely, so when devices can no longer process everything or become too expensive, you can start thinking about sharding, as in horizontal scaling. What used to be done by one device can then be distributed across multiple devices. In theory, sharding allows you to scale up endlessly, because you can simply add more and more devices that divide the work.

Sharding also means that individual devices do not have to search the entire database. Each device holds a small partition and is responsible for a specific portion of the overall database, so it finishes much sooner when a query arrives for its portion or when a transaction that its partition is responsible for needs to be processed. This matters most in ever-growing databases, which blockchains are known for.
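To make the idea of horizontal partitioning concrete, here is a minimal sketch in Python. The rule for picking a shard (row id modulo the number of shards), the function names and the sample rows are all assumptions made for illustration; real databases use their own rules.

```python
from collections import defaultdict


def shard_for(key: int, num_shards: int) -> int:
    """Pick a shard with an assumed rule: row id modulo the shard count."""
    return key % num_shards


def partition_rows(rows, num_shards):
    """Horizontal partitioning: every whole row lands in exactly one shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["id"], num_shards)].append(row)
    return shards


def find_row(shards, key, num_shards):
    """Look up a row by scanning only the shard responsible for its key."""
    for row in shards[shard_for(key, num_shards)]:
        if row["id"] == key:
            return row
    return None


# Hypothetical sample data: 12 rows spread over 3 shards.
rows = [{"id": i, "name": f"user{i}"} for i in range(12)]
shards = partition_rows(rows, num_shards=3)

print({s: [r["id"] for r in rs] for s, rs in sorted(shards.items())})
# {0: [0, 3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8, 11]}
print(find_row(shards, 7, num_shards=3))
# {'id': 7, 'name': 'user7'}
```

The lookup for row 7 only touches the four rows in shard 1 instead of all twelve, which is the speed-up described above.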

In fact, a good visual representation of sharding is any queue. Suppose everyone in the Netherlands has to go get a new ID card on a certain date. You obviously can't make those millions of people wait in front of one counter in Utrecht or something. If you only open a second counter, the queue is already halved. That's the power of sharding. With each additional counter you open, the queue is reduced. So the purpose of sharding is clear in this case: to increase throughput by opening more counters.
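The counter example can be put into numbers with a small sketch. It assumes one million people and that every counter serves one person per round; both figures are made up for the example.

```python
import math


def waiting_rounds(people: int, counters: int) -> int:
    """Rounds needed when each counter serves one person per round."""
    return math.ceil(people / counters)


# Assumed crowd of one million people, with more and more counters opened.
for counters in (1, 2, 4, 8):
    print(counters, waiting_rounds(1_000_000, counters))
# 1 1000000
# 2 500000
# 4 250000
# 8 125000
```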


How does sharding work on a blockchain?

So a shard takes responsibility for part of the total database. A blockchain is a database maintained by many computers, which is called a distributed database. If all nodes, validators or miners have to keep track of the entire database, it can put a lot of pressure on them, especially with large databases, like Ethereum with its popular smart contracts. By dividing the database into partitions or portions, the different validators or nodes only have to keep up with a portion of the database and are thus much faster. Nodes also have to communicate with each other far less.

One of the problems with blockchain is scalability. At the start of a blockchain, the database is not that large, but it grows with each new block that is added. As time marches on, storing full copies of the database can start to cause problems. Popular networks like Bitcoin and Ethereum in particular have scalability issues. Processing rates of 20 transactions per second or fewer are no longer realistic when so many people are using a network. You can't make people wait endlessly for validation of their transaction. Moreover, miners or nodes often try to take advantage of network congestion by charging as much money as possible for validation.

One way to solve this is sharding. If done properly, it allows a blockchain to process huge numbers of transactions per second. We are talking about numbers above 100,000 per second.

Once the layout of the shards is known, each shard must do what all nodes would normally do: check the transactions within its range. For example, one shard validates all transactions with a hash starting with a 1. Next, the members of this shard have to agree according to the shard's consensus protocol and then collect the approved transactions in a new block. These blocks then become part of the blockchain. Because the database remains relatively small per shard, these types of blockchain networks are much faster than traditional blockchains such as Bitcoin.

One of the problems of blockchain is the trilemma, as described by Vitalik Buterin of Ethereum. A blockchain must be decentralized, secure and scalable, and it is hard to achieve all three at once. Sharding aims to contribute significantly to the last of these by partitioning a database into small pieces that are much easier, cheaper and faster to process. In theory, a sharded database is endlessly scalable.
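The routing step described above can be sketched as follows. This is only an illustration: SHA-256 as the hash, the `is_valid` callback standing in for a shard's consensus protocol, and the sample transactions are all assumptions, not how any particular blockchain does it.

```python
import hashlib
from collections import defaultdict


def tx_hash(tx: str) -> str:
    """Hex digest of a transaction (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(tx.encode("utf-8")).hexdigest()


def shard_of(tx: str) -> str:
    """Shard id is the first hex character of the hash, so 16 possible shards."""
    return tx_hash(tx)[0]


def build_shard_blocks(transactions, is_valid):
    """Each shard collects only the approved transactions within its own range.

    `is_valid` is a hypothetical stand-in for the shard's consensus protocol.
    """
    blocks = defaultdict(list)
    for tx in transactions:
        if is_valid(tx):
            blocks[shard_of(tx)].append(tx)
    return dict(blocks)


# Made-up transactions; every one is accepted to keep the demo short.
txs = [f"alice pays bob {n}" for n in range(20)]
blocks = build_shard_blocks(txs, is_valid=lambda tx: True)

for shard_id, block in sorted(blocks.items()):
    print(shard_id, len(block))
```

Each shard only looks at the transactions whose hash falls in its range, so no shard has to process all twenty.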

If you're going to partition a database, there are several ways you can do it. You can come up with something like dividing by alphabet, by number, by hash or whatever. The idea is then that a shard only has responsibility for that particular part of the database.

Purpose of sharding

  • Increasing scalability
  • Reducing transaction costs
  • Dividing the workload into smaller partitions
  • Shortening query response time per database
  • Increasing the computing power of a network by adding more machines
  • Using cheaper machines for validation or queries

Forms of sharding

  1. Data sharding. This is simply a classification by alphabet, for example.
  2. Hash sharding. The database is partitioned based on a hash.
  3. Range sharding. This involves sharding based on a range, for example, everything beginning with the letter a through f.
  4. Network sharding. The network is divided into consensus groups that together verify a portion of all blocks based on randomness.
  5. Transaction sharding. Here a shard is assigned a certain portion of the workload based on a transaction ID, e.g. all transactions starting with 1 are in shard 1. A plausible risk with this is double spending, so this probably won't become popular anytime soon.
  6. State sharding. Here, a node has a partial view of a component of the entire system. Validating and approving a transaction may require multiple tables and thus shards. Ethereum has been working on this for a while and it was even on their roadmap, but they decided to move forward with layer 2 rollups to scale up Ethereum and make it cheaper. This also keeps consensus on Ethereum simpler in nature and doesn't require sharded databases.
  7. Danksharding. This is Ethereum's current path and is intended to eventually get a throughput of more than 100,000 transactions per second at the cheapest possible price.
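The difference between range sharding and hash sharding is easiest to see side by side. The sketch below is a toy comparison: the four letter ranges, the use of SHA-256 and the list of names are assumptions chosen for the example.

```python
import hashlib


def range_shard(name: str) -> int:
    """Range sharding with assumed ranges: a-f -> 0, g-m -> 1, n-s -> 2, t-z -> 3."""
    first = name[0].lower()
    if not "a" <= first <= "z":
        raise ValueError("expected a name starting with a letter")
    for shard, upper in enumerate("fmsz"):
        if first <= upper:
            return shard


def hash_shard(name: str, num_shards: int = 4) -> int:
    """Hash sharding: the shard follows from a hash of the key, not its spelling."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Made-up keys that happen to cluster at the start of the alphabet.
names = ["alice", "bob", "carol", "dave", "erin", "frank", "zoe"]

print([range_shard(n) for n in names])
# [0, 0, 0, 0, 0, 0, 3]
print([hash_shard(n) for n in names])
```

With these names, range sharding sends six of the seven keys to shard 0, which is the kind of uneven load that can still cause congestion. Hash sharding ignores the spelling of the key, so it tends to spread keys more evenly, at the price of losing the ability to read a whole range from one shard.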

Danksharding in detail

Because Ethereum is such an important network we will go into a little more detail about their strategy.

Danksharding is intended to make Ethereum a truly scalable blockchain. However, this still requires intermediate steps.

One of those intermediate steps is Proto-Danksharding. The name comes from two researchers, Protolambda and Dankrad Feist. The purpose of this Ethereum proposal is to add cheaper data to blocks on the blockchain.

Rollups now have to send their data to all the nodes in the Ethereum network, and that data stays there forever.

Proto-Danksharding introduces data blobs that can be attached to blocks. The Ethereum Virtual Machine (EVM) cannot access these blobs, and they are automatically deleted after a certain amount of time.

A rollup consists of two parts: the data and the prover's check. The prover must check whether the rollup is correct. So the data only has to stay available long enough to be checked and does not have to be on Ethereum forever.
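The idea of blobs that travel with a block and disappear later can be pictured with a toy model. This is not Ethereum's actual data structure: the `Block` class, the `prune_blobs` function and the retention window of 3 blocks are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field

RETENTION_BLOCKS = 3  # hypothetical retention window, chosen only for this demo


@dataclass
class Block:
    number: int
    transactions: list
    blobs: list = field(default_factory=list)  # bulky rollup data attached to the block


def prune_blobs(chain, current_number, retention=RETENTION_BLOCKS):
    """Drop blob data from blocks older than the retention window; keep the blocks."""
    for block in chain:
        if current_number - block.number >= retention:
            block.blobs = []


# Five made-up blocks, each carrying one transaction and one blob.
chain = [Block(n, [f"tx{n}"], [f"rollup-data-{n}"]) for n in range(5)]
prune_blobs(chain, current_number=4)

print([(b.number, len(b.blobs)) for b in chain])
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)]
```

The blocks and their transactions stay in place; only the bulky blob data of the older blocks is thrown away, which is what keeps this kind of data cheap.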

Full Danksharding is actually no sharding at all. True shard chains are no longer on Ethereum's roadmap because Danksharding is such a good and simple solution.

Danksharding is the full realization of rollup scaling that started with Proto-Danksharding. With Danksharding, there will be a lot of space on Ethereum for rollups to dump their compressed transaction data on. This will allow Ethereum to support hundreds of individual rollups and eventually start reaching millions of transactions per second.

Normally I would say, "Talk is cheap!"

But yes, when Vitalik says it I am a bit more moderate in my statements. His network and success speak for themselves. If they succeed, it could become the fastest blockchain in the cryptocurrency world. That's a little different than paying 50 bucks for a transaction and waiting two hours for it too!

They are going to do this by adding not 1 blob, as in the proto part, but 64 blobs to a block. Before you know it, you'll have a scary movie! Well, scary. Rather funny.

Full Danksharding will still be some years away. But after a real summoning ceremony, the magic of Ethereum will gain another part in the form of Proto-Danksharding. It is somewhat akin to witchcraft, although you may not call that part a shard.

Strengths of sharding

  1. If one shard goes down, all the other shards still work. If you have everything on one server and it goes down, your entire network is no longer reachable! With a blockchain, by the way, this is rare given its distributed database.
  2. Sharding leads to smaller databases.
  3. Sharding makes a blockchain network cheaper.
  4. Managing a database is made easier by sharding.
  5. Sharing the workload can make complex operations easier.
  6. It makes a blockchain faster.
  7. You can use cheaper hardware.

Weaknesses of sharding

  1. Implementing sharding can be a very complex and time-consuming task.
  2. When sharding is introduced, database data may be lost or the database may become corrupted.
  3. Data is in all sorts of places, sometimes making it hard to see the wood for the trees.
  4. Congestion can still occur if a particular shard happens to get much more work than the others.
  5. The security of the blockchain or a shard may be compromised.
  6. If you ever want to get rid of sharding again, it can be a huge job to get the database correct again.
  7. You need more hardware as well as software.
  8. If shards depend on each other, a single corrupt shard can collapse the entire database.

Examples of blockchains using sharding

  • Zilliqa
  • Ontology
  • Polkadot
  • Near
  • Elrond
  • QuarkChain

Sharding compared to other solutions

The purpose of sharding is to increase scalability. It does this by creating islands that keep track of only a small portion of the blockchain. These shards usually do not require you to communicate with other shards.

Comparing this to other solutions, such as rollups, sidechains or Bitcoin's Lightning Network, one notices that while these solutions make a network more scalable, they still need to communicate with the main blockchain, so it can still be overloaded.

The advantage of sharding is that you can actually scale up endlessly. If you first work with shards that process, say, all transactions starting with 1, you can create another subdivision consisting of the first two digits, and so on, making shards smaller and smaller relative to the full database.
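The subdivision by leading digits can be sketched in a few lines. The function names and the sample transaction ID are assumptions, and the ID is assumed to consist of decimal digits.

```python
def shard_id(tx_id: str, prefix_len: int) -> str:
    """A shard is named after the leading digits of the transaction ID."""
    return tx_id[:prefix_len]


def shard_count(prefix_len: int, alphabet_size: int = 10) -> int:
    """Number of possible shards for a prefix length (decimal digits assumed)."""
    return alphabet_size ** prefix_len


tx_id = "1374"  # made-up transaction ID

print(shard_id(tx_id, 1), shard_count(1))
# 1 10
print(shard_id(tx_id, 2), shard_count(2))
# 13 100
```

Every extra digit multiplies the number of possible shards by ten, so each shard covers a smaller and smaller slice of the full database.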

The future of sharding

Sharding is a promising and convenient way to greatly increase the scalability of a blockchain. Cost savings are also an important part of this strategy.

However, because sharding is so difficult to implement, that difficulty may also hold back its adoption.

Anyone who has looked at the disadvantages of sharding and compared them to Ethereum's solution might begin to doubt the effectiveness of sharding.

However, when you see that it will take years for even the superheroes of Ethereum to cobble something like this together, there may be something to be said for sharding.

What is undeniable, however, is the fact that a blockchain that uses sharding is much cheaper and faster than the traditional blockchain. And that is incredibly good news for ordinary users.