Tuesday, August 22, 2023

Quick useful Cassandra links



Impact of changing the number of vnodes on a cluster --> https://thelastpickle.com/blog/2021/01/29/impacts-of-changing-the-number-of-vnodes.html

ER diagram shapes --> https://www.geeksforgeeks.org/introduction-of-er-model/

Cassandra interview questions --> https://www.edureka.co/blog/interview-questions/cassandra-interview-questions/





Data modelling in Cassandra

https://www.datastax.com/learn/data-modeling-by-example




Query-first approach

Cassandra gets much of its read and write speed from the fact that it never has to perform joins; in fact, joins are not possible in Cassandra at all. So instead of taking a relational modelling approach, we take a query-first approach: we design each table for a specific query. Rather than flexible, general-purpose tables (such as a separate employee table and company car table), every table caters to one query, so each query only ever has to read from a single table. There are consequences to this approach: we may end up writing the same data to multiple tables just to satisfy different queries. That trade-off is a good fit for Cassandra's distributed architecture.
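
A minimal Python sketch of the query-first idea, using plain dictionaries as stand-ins for tables. The table names (employees_by_id, employees_by_car_make), the fields, and the sample rows are hypothetical and chosen only for illustration; this is not Cassandra driver code.

```python
# Hypothetical stand-ins for two Cassandra tables, each designed for one query.
employees_by_id = {}        # serves: "find an employee by id"
employees_by_car_make = {}  # serves: "find all employees driving a given make"


def add_employee(emp_id, name, car_make):
    """Write the same data to every table that serves a query (denormalization)."""
    row = {"id": emp_id, "name": name, "car_make": car_make}
    employees_by_id[emp_id] = row
    employees_by_car_make.setdefault(car_make, []).append(row)


add_employee(1, "Asha", "BMW")
add_employee(2, "Ben", "Audi")
add_employee(3, "Chen", "BMW")

# Each read touches exactly one table; no join is needed.
by_id = employees_by_id[2]
bmw_drivers = [r["name"] for r in employees_by_car_make["BMW"]]
```

The cost shows up in add_employee: one logical write becomes one write per table, which is the duplication described above.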



Read more on read repair




Monday, August 7, 2023

Is Cassandra key-value or columnar?

Cassandra is not a columnar database.
A columnar/column-store/column-oriented database, as you said, guarantees data locality for a single column, within a given node, on disk. This is a column that spans many or all rows depending on if, or how, you specify partitions and what the database supports.

Cassandra is a column-family* store. A column-family store ensures data locality at the partition level, not the column level. In a database like Cassandra, a partition is a group of rows that share the same partition key; within a partition, rows are ordered by the clustering column(s), if any are specified. To query Cassandra efficiently, you must know at least the partition key; otherwise the query requires a full scan of your data.
All data for a given partition in Cassandra is guaranteed to be on the same node. Within a single file (SSTable), a partition's data is stored together in one location. The one thing to note here is that, depending on your compaction strategy, a partition can be spread across multiple SSTables on disk, so locality within a single file is not guaranteed.
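
To make the partition layout above concrete, here is a simplified Python sketch of reading one partition that is spread across two SSTables. The data (sensor readings), the dictionary structure, and the function name read_partition are assumptions made for illustration; real reads also involve memtables, tombstones, and other details that are left out here.

```python
# Each "SSTable" maps partition key -> {clustering key: (write_timestamp, value)}.
# Hypothetical data: readings partitioned by sensor id and clustered by hour.
sstable_old = {"sensor-1": {9: (100, 20.5), 10: (101, 21.0)}}
sstable_new = {
    "sensor-1": {10: (200, 21.4), 11: (201, 22.1)},
    "sensor-2": {9: (202, 18.0)},
}


def read_partition(partition_key, sstables):
    """Merge one partition's rows from every SSTable; the newest write wins."""
    merged = {}
    for table in sstables:
        for ck, (ts, value) in table.get(partition_key, {}).items():
            if ck not in merged or ts > merged[ck][0]:
                merged[ck] = (ts, value)
    # Rows come back ordered by clustering key.
    return [(ck, merged[ck][1]) for ck in sorted(merged)]


rows = read_partition("sensor-1", [sstable_old, sstable_new])
```

Knowing the partition key lets the read go straight to one partition in each file, but the more files a partition is spread over, the more merging the read has to do.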

Column stores lend themselves to, and are designed for, analytic workloads. Because each column is in the same location on disk, they can read all information for a given column across many/all rows incredibly fast. This comes at the cost of very slow writes which usually need to be done in batch loads to avoid drastic performance implications.
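
The read/write trade-off of a column store can be sketched in Python with two in-memory layouts of the same data. The rows and the helper append_row_columnar are hypothetical; the sketch only shows the access pattern, not real on-disk behaviour.

```python
# The same three hypothetical rows in two layouts.
row_layout = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Oslo", "amount": 80},
    {"id": 3, "city": "Lima", "amount": 200},
]

column_layout = {
    "id": [1, 2, 3],
    "city": ["Pune", "Oslo", "Lima"],
    "amount": [120, 80, 200],
}

# Analytic read: the columnar layout scans one contiguous list.
total_columnar = sum(column_layout["amount"])

# The row layout must visit every row and pick out one field.
total_rows = sum(row["amount"] for row in row_layout)


def append_row_columnar(layout, row):
    """A single-row write has to touch every column's storage."""
    for name, value in row.items():
        layout[name].append(value)


append_row_columnar(column_layout, {"id": 4, "city": "Kyiv", "amount": 50})
```

Both totals are the same, but the columnar read never looks at id or city, while the columnar write fans out to every column. That fan-out is one reason column stores tend to prefer batch loads.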

Column-family stores, like Cassandra, are a great choice if you have high-throughput writes and want to scale horizontally in a linear fashion. Reads that use the partition key are incredibly fast, as the partition key defines where the data resides. The downside is that if you need to do any sort of ad-hoc query, a full scan of all data is required.
* The term "column-family" comes from the original storage engine that was a key/value store, where the value was a "family" of column/value tuples. There was no hard limit on the number of columns that each key could have. In Cassandra this was later abstracted into "partitions", then eventually the storage engine was modified to match the abstraction.
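
The point above that the partition key defines where the data resides can be sketched as a small token ring in Python. Everything here is an assumption for illustration: the three node names, the 32-bit token space, and the use of MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3, and real clusters also replicate each partition to several nodes).

```python
import bisect
import hashlib

# Hypothetical three-node ring; each node owns the token range ending at its token.
RING = [
    (2**32 // 3, "node-a"),
    (2 * (2**32 // 3), "node-b"),
    (2**32 - 1, "node-c"),
]
TOKENS = [token for token, _ in RING]


def token_for(partition_key):
    """Hash the partition key to a 32-bit token (MD5 is only a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(partition_key):
    """Find the first node whose token is >= the key's token."""
    idx = bisect.bisect_left(TOKENS, token_for(partition_key))
    return RING[idx % len(RING)][1]


owner = node_for("user-42")
```

A read by partition key needs only this lookup to find the owning node. A query without the partition key has no token to hash, so every node would have to be asked.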

Courtesy --> https://www.quora.com/How-is-Cassandra-a-columnar-database