Posted over 12 years ago
Silicon Valley Cassandra User Group
Last fall, Rick Branson gave his popular Introduction to Cassandra talk to a packed house of the Silicon Valley Cassandra User Group. For all of those who missed it, Rick is doing it again. This time, he has re-tooled his talk to include CQL (Cassandra Query Language). If you've seen the talk before, it's worth catching again just to see Rick's introduction to CQL. DataStax will be picking up the tab for pizza and beverages, so you can come straight from the office. Hope to see you there! -Lynn
6:30pm - Networking
7:15pm - Rick Branson on Cassandra and CQL
8:15pm - Informal Q&A
Santa Clara, CA 95054 - USA
Thursday, January 10 at 7:00 PM
Attending: 13
Details: http://www.meetup.com/silicon-valley-cassandra-user-group/events/87094382/
|
Posted over 12 years ago
In Cassandra, a batch allows the client to group related updates into a single statement. If some of the replicas for the batch fail mid-operation, the coordinator will hint those rows automatically.
But there is one failure scenario that the classic batch design does not address: if the coordinator itself fails mid-batch, you could end up with partially applied batches.
In the past Cassandra has relied on the client to deal with this by retrying the batch to a different coordinator. This is usually adequate since writes in Cassandra, including batches, are idempotent — that is, performing the same update multiple times is harmless.
But, if the client and coordinator fail at the same time — say, because the client is an app server in the same datacenter, and suffers a power failure at the same time as the coordinator — then there is no way to recover other than manually crawling through your records and reconciling inconsistencies.
Some particularly sophisticated clients have implemented a client-side commitlog to handle this scenario, but this is a responsibility that logically belongs on the server. That is what atomic batches bring in Cassandra 1.2.
Using atomic batches
Batches are atomic by default starting in 1.2. Unfortunately, the price for atomicity is about a 30% hit to performance compared to the old non-atomic batches. So, we also provide BEGIN UNLOGGED BATCH for when performance is more important than atomicity guarantees.
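As a quick sketch (the users and users_by_name tables here are made up for illustration, standing in for any pair of denormalized tables you keep in sync), the two flavors look like this in CQL:
-- logged (atomic) batch: recorded in the batchlog and replayed if the coordinator dies mid-write
BEGIN BATCH
  INSERT INTO users (id, name) VALUES (1, 'alice');
  INSERT INTO users_by_name (name, id) VALUES ('alice', 1);
APPLY BATCH;
-- unlogged batch: skips the batchlog, trading the atomicity guarantee for speed
BEGIN UNLOGGED BATCH
  INSERT INTO users (id, name) VALUES (2, 'bob');
  INSERT INTO users_by_name (name, id) VALUES ('bob', 2);
APPLY BATCH;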
1.2 also introduces a separate BEGIN COUNTER BATCH for batched counter updates. Unlike other writes, counter updates are not idempotent, so replaying them automatically from the batchlog is not safe. Counter batches are thus strictly for improved performance when updating multiple counters in the same partition.
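For completeness, a counter batch against a hypothetical page_views counter table (note that both updates target the same partition) would look like:
-- counter batch: not replayed from the batchlog, since counter updates are not idempotent
BEGIN COUNTER BATCH
  UPDATE page_views SET views = views + 1 WHERE page = 'home';
  UPDATE page_views SET clicks = clicks + 1 WHERE page = 'home';
APPLY BATCH;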
(Note that we mean “atomic” in the database sense: if any part of the batch succeeds, all of it will. No other guarantees are implied; in particular, there is no isolation, so clients will be able to read the first updated rows from the batch while the rest are still in progress. However, updates within a single row are isolated.)
Under the hood
Atomic batches use a new system table, batchlog, defined as follows:
CREATE TABLE batchlog (
id uuid PRIMARY KEY,
written_at timestamp,
data blob
);
When an atomic batch is written, we first write the serialized batch to the batchlog as the data blob. After the rows in the batch have been successfully written (or hinted), we remove the batchlog entry. (There are thus some similarities to how Megastore uses a Bigtable ColumnFamily as a transaction log, but atomic batches are much, much more performant than Megastore writes.)
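To make that flow concrete, one batch's trip through the batchlog could be sketched as the following pair of statements (purely illustrative: StorageProxy applies these mutations internally rather than issuing CQL, and the id, timestamp, and blob values below are invented):
-- 1. before any batch rows are written, the serialized batch is recorded
INSERT INTO batchlog (id, written_at, data)
VALUES (c5dc64c2-0000-1000-8000-000000000000, '2012-10-30 12:00:00+0000', 0xcafebabe);
-- 2. once every row has been written (or hinted), the entry is removed
DELETE FROM batchlog WHERE id = c5dc64c2-0000-1000-8000-000000000000;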
The batchlog table is node-local, along with the rest of the system keyspace. Instead of relying on normal Cassandra replication, StorageProxy special-cases the batchlog. This lets us make two big improvements over a naively replicated approach.
First, we can dynamically adjust behavior depending on the cluster size and arrangement. Cassandra prefers to perform batchlog writes to two different replicas in the same datacenter as the coordinator. This is less about durability (we only need to record the batch until it has been successfully written) than about fault tolerance: if one batchlog replica fails, we don't need to wait for it to time out before retrying against another. But if only one replica is available, Cassandra will work with that, without requiring an operator to manually adjust replication parameters.
Second, we can make each batchlog replica responsible for replaying batches that didn't finish in a timely fashion. This avoids the complexity of coordinating between the coordinator and the replicas, and of having to fail over replay responsibility if the coordinator is itself replaced. (Again, since writes are idempotent, having the batch occasionally replayed by multiple replicas is fine.)
We are also able to make some performance optimizations based on our knowledge of the batchlog’s function. For instance, since each replica has local responsibility to replay failed batches, we don’t need to worry about preserving tombstones on delete. So in the normal case when a batch is written to and removed from the batchlog in quick succession, we don’t need to write anything to disk on memtable flush.
Availability
Atomic batches are feature complete in Cassandra 1.2beta1, which is available for download on the Apache site; we're projecting a final release by the end of the year.
|
Posted over 12 years ago
Silicon Valley Cassandra User Group
Matthias Broecheler of Aurelius will give a presentation on the Titan graph database. Because this talk is of interest to both the Cassandra and graph communities, this will be a joint meetup with the Bay Area Graph Geeks. For a bit of background on Titan, check out Matthias' talk at the 2012 Cassandra Summit. For his November talk, Matthias will also cover the latest developments with Titan. Additional details will follow. Hope you can join us! -Lynn
Santa Clara, CA 95054 - USA
Tuesday, November 13 at 7:00 PM
Attending: 4
Details: http://www.meetup.com/silicon-valley-cassandra-user-group/events/86687402/
|
Posted over 12 years ago
Handling Disk Failures in Cassandra 1.2
Cassandra is great at handling entire node failures. It’s not just robust, it’s almost indestructible.
But until Cassandra 1.2, a single unavailable disk has the potential to make the whole replica unresponsive, while still technically alive and part of the cluster: memtables will be unable to flush and the node will eventually run out of memory. Commitlog append may also fail if you happen to lose the commitlog disk.
The traditional workaround has been to deploy on raid10 volumes, but as Cassandra handles increasingly large data volumes the prospect of paying an extra 50% space penalty on top of Cassandra’s own replication is becoming unpalatable.
The upcoming Cassandra 1.2 release (currently in beta) fixes both of these issues by introducing a disk_failure_policy setting that allows you to choose from two policies that deal with disk failure sensibly: best_effort and stop. Here is how these work:
stop is the default behavior for new 1.2 installations. Upon encountering a file system error, Cassandra will shut down gossip and Thrift services, leaving the node effectively dead, but still inspectable via JMX for troubleshooting.
best_effort: Cassandra will do its best in the face of disk errors. If it can't write to a disk, that disk will be blacklisted for writes and the node will continue writing elsewhere; if Cassandra can't read from a disk, the disk will be marked as unreadable and the node will continue serving data from readable sstables only. This implies that stale data can be served when the most recent version was on the disk that is no longer accessible and the consistency level is ONE, so choose this option with care. In exchange, it allows you to get the most out of your disks.
An ignore policy also exists for upgrading users. In this mode Cassandra behaves exactly as 1.1 and older versions did: all file system errors are logged but otherwise ignored. DataStax recommends users opt in to stop or best_effort instead.
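For reference, the knob lives in cassandra.yaml; a minimal example, assuming otherwise default settings:
# cassandra.yaml (Cassandra 1.2)
#   stop        - shut down gossip and Thrift; the node stays inspectable via JMX
#   best_effort - blacklist the failed disk and keep going on the remaining ones
#   ignore      - pre-1.2 behavior: log file system errors but otherwise ignore them
disk_failure_policy: stop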
Summary
Starting with version 1.2, Cassandra will be able to properly react to a disk failure – either by stopping the affected node or by blacklisting the failed drive, depending on your availability/consistency requirements. This allows deploying Cassandra nodes with large disk arrays without the overhead of raid10.
|
Posted over 12 years ago
We’re pleased to let you know that DataStax Enterprise Edition version 2.2 is now available for download.
The primary enhancement included in DataStax Enterprise 2.2 is integration with a version of Apache Cassandra 1.1 that we’ve certified for production use. For a list of new features contained in Cassandra 1.1, please refer to a blog post by Jonathan that summarizes the release or our “What’s New in Apache Cassandra 1.1” white paper.
Version 2.2 also extends our Linux support by adding SUSE Linux 11.2 and 11.4 to our list of platforms.
For more information on DataStax Enterprise 2.2, please see our online documentation, white papers, articles, and customer case studies and interviews to find out how modern businesses are relying on DataStax Enterprise to power the apps that power their business.