Bigtable is the data storage system that Google uses for many of its products. It is a distributed, persistent, multidimensional sorted map. Users store data under an index that consists of a row key, a column key, and a timestamp. This index points to a value that is an uninterpreted array of bytes; in the original design, row keys are arbitrary strings of up to about 64KB. Timestamps can be assigned by Bigtable or by the application. Each row-column intersection can contain multiple cells at different timestamps, which provides a record of how the stored data has been altered over time. Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine learning applications.
Bigtable is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB), and a few other Google technologies. A public version of Bigtable was made available as a service on May 6, 2015. It also underlies Google Cloud Datastore, which is available as part of the Google Cloud Platform. The application can define how many timestamped versions of each cell are kept.
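To make the data model concrete, here is a minimal in-memory sketch of a map from (row key, column key, timestamp) to bytes that keeps only a limited number of versions per cell. It is an illustration under stated assumptions, not Bigtable's implementation: `ToyTable`, `put`, `get`, `history`, and `max_versions` are hypothetical names and do not belong to any Bigtable client library.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of (row key, column key, timestamp) -> bytes. Not a real Bigtable API."""

    def __init__(self, max_versions=3):
        # Hypothetical setting: how many timestamped versions to keep per cell.
        self.max_versions = max_versions
        # (row_key, column_key) -> list of (timestamp, value), newest first.
        self._cells = defaultdict(list)

    def put(self, row_key, column_key, timestamp, value):
        versions = self._cells[(row_key, column_key)]
        versions.append((timestamp, value))
        # Keep versions ordered newest first, then drop the oldest extras.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row_key, column_key, timestamp=None):
        """Return the newest value at or before `timestamp` (the latest if None)."""
        for ts, value in self._cells.get((row_key, column_key), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def history(self, row_key, column_key):
        """Return the retained (timestamp, value) pairs, newest first."""
        return list(self._cells.get((row_key, column_key), []))


if __name__ == "__main__":
    table = ToyTable(max_versions=2)
    table.put("com.example", "contents:html", 1, b"<html>v1</html>")
    table.put("com.example", "contents:html", 2, b"<html>v2</html>")
    table.put("com.example", "contents:html", 3, b"<html>v3</html>")
    print(table.history("com.example", "contents:html"))
```

With `max_versions=2`, the third write pushes the oldest version out, so the history holds only the entries for timestamps 3 and 2.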
Features of Bigtable
Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads. Its main features are:
- It connects easily to Google Cloud services such as BigQuery and to the Apache ecosystem.
- It scales seamlessly to match your storage needs, with no downtime during reconfiguration.
- It is designed to serve as a storage engine for machine learning applications, which can lead to better predictions.
- It delivers consistent sub-10ms latency and handles millions of requests per second.
- It is ideal for use cases such as personalization, ad tech, fintech, digital media, and IoT.
Bigtable has other key features as well, but it helps to look at its benefits first.
Benefits of Bigtable
Bigtable offers several benefits to its users. The major ones are:
- Fast and performant: You can use Bigtable as a storage engine that grows with you from your first gigabyte to petabyte scale, both for low-latency applications and for high-throughput data processing and analytics.
- Simple and integrated: It is a fully managed service that integrates easily with big data tools like Hadoop, Dataflow, and Dataproc. It also supports the open-source HBase API standard, which makes it easier for development teams to get started.
- Seamless scaling and replication: You can start with a single node per cluster and scale seamlessly to hundreds of nodes to support peak demand. Replication also adds high availability and workload isolation for live serving apps.
Key Features of Bigtable
Alongside these benefits, Bigtable has the following key features:
- High throughput at low latency: Bigtable is ideal for storing very large amounts of data in a key-value store. It supports high read and write throughput at low latency, giving fast access to the data stored in it. Throughput scales linearly: you can increase the queries per second (QPS) by adding Bigtable nodes. Bigtable is built on proven infrastructure that powers Google products used by millions of users, such as Search and Maps.
- Cluster resizing without downtime: Bigtable scales seamlessly from thousands to millions of reads/writes per second, and throughput can be adjusted dynamically by adding or removing cluster nodes without restarting. This means that you can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the cluster size again, all without any downtime.
- Flexible, automatic replication to optimize any workload: You can write data once and have it replicated automatically where needed with eventual consistency, giving you control over high availability and the isolation of read and write workloads. No manual steps are needed to ensure consistency, repair data, or synchronize writes and deletes. Bigtable offers a high-availability SLA of 99.99% for instances with multi-cluster routing (99.9% for single-cluster instances).
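The linear-scaling claim and the two SLA figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below is only illustrative: the function names are hypothetical, the per-node QPS is a number you would have to measure or look up for your own workload, and the 30-day month is an assumption; the actual SLA terms define how downtime is measured.

```python
def nodes_needed(target_qps, qps_per_node):
    """Nodes required for `target_qps`, assuming throughput scales linearly with node count."""
    # Integer ceiling division: round up so the cluster is never undersized.
    return (target_qps + qps_per_node - 1) // qps_per_node


def max_downtime_minutes(availability_percent, days=30):
    """Maximum downtime in minutes permitted by an availability target over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    # 10_000 QPS per node is a made-up figure used only for illustration.
    print(nodes_needed(250_000, 10_000))
    print(round(max_downtime_minutes(99.99), 2))
    print(round(max_downtime_minutes(99.9), 2))
```

Over a 30-day month, 99.99% availability allows roughly 4.3 minutes of downtime, while 99.9% allows roughly 43 minutes, a tenfold difference.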
Use of Bigtable
Bigtable is ideal for applications that need very high throughput and scalability for key/value data, where each value is typically no larger than 10MB. Below are the types of data that can be stored and queried on Bigtable:
- Graph data: This is information about how users are connected to one another.
- Time series data: Such as CPU and memory usage over time for multiple servers.
- Internet of Things data: Such as usage reports from energy meters and home appliances.
- Marketing data: Such as purchase histories and customer preferences.
- Financial data: Data such as transaction histories, currency exchange rates, and stock prices.
Bigtable Storage Model
Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. Each table is composed of rows, each of which describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are grouped together into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
Each row/column intersection can contain multiple cells at different timestamps, providing a record of how the stored data has been altered over time. Bigtable tables are sparse; if a cell does not contain any data, it does not consume any space.
Each Bigtable zone is managed by a primary process that balances workload and data volume within the clusters. The process splits busier/larger tablets in half and merges less accessed/smaller tablets together, redistributing them between nodes as required. If a particular tablet gets a spike of traffic, Bigtable splits the tablet in two, then moves one of the new tablets to another node. Bigtable manages all of the splitting, merging, and rebalancing automatically, thereby saving users the effort of manually administering their tablets.
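The storage model can be illustrated with a small in-memory sketch: rows kept in sorted row-key order, columns addressed by column family and qualifier, and empty cells that are simply absent. The class and method names (`ToySortedTable`, `set_cell`, `read_row`, `scan_prefix`) are hypothetical and chosen for this example; they are not part of any Bigtable client library.

```python
import bisect


class ToySortedTable:
    """Toy sorted key/value table with column families. Not a real Bigtable API."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self._row_keys = []  # row keys kept in sorted order
        self._rows = {}      # row_key -> {(family, qualifier): value}

    def set_cell(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][(family, qualifier)] = value

    def read_row(self, row_key):
        # Only cells that were written exist; empty cells are simply absent.
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix):
        """Yield (row_key, cells) for row keys starting with `prefix`, in sorted order."""
        start = bisect.bisect_left(self._row_keys, prefix)
        for row_key in self._row_keys[start:]:
            if not row_key.startswith(prefix):
                break
            yield row_key, self.read_row(row_key)


if __name__ == "__main__":
    table = ToySortedTable(["stats", "meta"])
    table.set_cell("server2#cpu", "stats", "usage", b"0.7")
    table.set_cell("server1#mem", "stats", "usage", b"0.4")
    table.set_cell("server1#cpu", "stats", "usage", b"0.9")
    for key, cells in table.scan_prefix("server1#"):
        print(key, cells)
```

Because row keys are kept sorted, all rows sharing a prefix sit next to each other, so a prefix scan only touches a contiguous slice of the table.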
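The split-and-redistribute behavior described above can be sketched with plain Python lists. This is a toy model under loud assumptions: a tablet is represented as a sorted list of row keys, load is measured by counting row keys, and `split_tablet` and `rebalance_hot_tablet` are hypothetical helpers. How Bigtable actually measures load and chooses a target node is not described in this article.

```python
def split_tablet(row_keys):
    """Split a sorted list of row keys into two contiguous halves."""
    middle = len(row_keys) // 2
    return row_keys[:middle], row_keys[middle:]


def rebalance_hot_tablet(nodes, hot_node, tablet_index):
    """Split one tablet on `hot_node` and move its upper half to the least-loaded other node.

    `nodes` maps a node name to a list of tablets; each tablet is a sorted
    list of row keys. At least two nodes are required.
    """
    tablet = nodes[hot_node].pop(tablet_index)
    lower, upper = split_tablet(tablet)
    nodes[hot_node].insert(tablet_index, lower)
    # Pick the other node that currently holds the fewest row keys.
    target = min(
        (name for name in nodes if name != hot_node),
        key=lambda name: sum(len(t) for t in nodes[name]),
    )
    nodes[target].append(upper)
    return target


if __name__ == "__main__":
    cluster = {"node-a": [["a", "b", "c", "d"]], "node-b": [["x"]], "node-c": []}
    moved_to = rebalance_hot_tablet(cluster, "node-a", 0)
    print(moved_to, cluster)
```

Each half remains a contiguous range of sorted row keys after the split, which is why one half can move to another node without affecting the other.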
Memory and disk usage
The following points describe how several components of Bigtable affect memory and disk usage for your instance, along with related durability, security, and backup properties:
- Empty cells: These do not take up any space. Each row is essentially a collection of key/value entries, where the key is a combination of the column family, column qualifier, and timestamp. If a row does not include a value for a specific key, then the key/value entry is simply not present.
- Column qualifiers: These take up space in a row, since each column qualifier used in a row is stored in that row. As a result, it is often efficient to use column qualifiers as data.
- Compactions: Bigtable periodically rewrites your tables to remove deleted entries and to organize your data so that reads and writes are more efficient.
- Mutations: Mutations, or changes, to a row take up extra storage space, because Bigtable stores mutations sequentially and compacts them only periodically. When Bigtable compacts a table, it removes values that are no longer needed.
- Deletions: These also take up extra storage space, at least in the short term, because a deletion is actually a specialized type of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
- Data durability: When you use Bigtable, your data is stored on Colossus, Google's internal, highly durable file system, using storage devices in Google data centers.
- Security: You can assign IAM roles that prevent individual users from reading from tables, writing to tables, or creating new instances.
- Backups: Bigtable backups let you save a copy of your data, then restore from the backup to a new table at a later time.
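The interaction between mutations, deletions, and compactions described in the list above can be sketched as an append-only log. This is a simplified illustration, assuming a single in-memory list and ignoring timestamps and versions; `ToyMutationLog` and `TOMBSTONE` are hypothetical names, not Bigtable internals.

```python
TOMBSTONE = object()  # hypothetical marker recorded by a delete


class ToyMutationLog:
    """Toy append-only store: every write or delete adds an entry until compaction."""

    def __init__(self):
        self._log = []  # mutations in arrival order: (row_key, column, value)

    def write(self, row_key, column, value):
        self._log.append((row_key, column, value))

    def delete(self, row_key, column):
        # A delete is just another mutation, so it adds an entry instead of freeing one.
        self._log.append((row_key, column, TOMBSTONE))

    def stored_entries(self):
        return len(self._log)

    def read(self, row_key, column):
        # The most recent mutation for the cell wins.
        for key, col, value in reversed(self._log):
            if (key, col) == (row_key, column):
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Keep only the latest live value per cell and drop deleted cells."""
        latest = {}
        for key, col, value in self._log:
            latest[(key, col)] = value
        self._log = [
            (key, col, value)
            for (key, col), value in latest.items()
            if value is not TOMBSTONE
        ]


if __name__ == "__main__":
    log = ToyMutationLog()
    log.write("row1", "cf:a", b"1")
    log.write("row1", "cf:a", b"2")
    log.write("row2", "cf:a", b"x")
    log.delete("row2", "cf:a")
    print(log.stored_entries())  # the delete added an entry
    log.compact()
    print(log.stored_entries())  # only the live value for row1 remains
```

In this sketch the delete raises the entry count from three to four, and only compaction brings it down to one, mirroring why deletions use extra storage in the short term.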