Install Apache Cassandra 4.0 on CentOS 8 | Rocky Linux 8

How can I install Apache Cassandra 4.0 on a CentOS 8 | Rocky Linux 8 machine? Apache Cassandra is a free and open-source NoSQL database management system designed to be distributed and highly available. Cassandra can handle large amounts of data across many commodity servers with no single point of failure.

This guide walks you through the installation of Cassandra on CentOS 8 | Rocky Linux 8. After the installation, we’ll configure and tune Cassandra to work on machines with minimal resources.

Features of Cassandra

Cassandra provides the Cassandra Query Language (CQL), an SQL-like language, to create and update database schema and access data. CQL allows users to organize data within a cluster of Cassandra nodes using:

  • Keyspace: defines how a dataset is replicated, for example in which datacenters and how many copies. Keyspaces contain tables.
  • Table: defines the typed schema for a collection of partitions. New columns can be added to Cassandra tables with zero downtime. Tables contain partitions, which contain rows, which contain columns.
  • Partition: defines the mandatory part of the primary key that all rows in Cassandra must have. All performant queries supply the partition key in the query.
  • Row: contains a collection of columns identified by a unique primary key made up of the partition key and optionally additional clustering keys.
  • Column: a single datum with a type that belongs to a row.
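
To make these terms concrete, here is a minimal sketch in CQL, wrapped in a shell heredoc so that it can be saved and run with cqlsh -f once Cassandra is installed. The keyspace demo_ks, the table sensor_readings and its columns are hypothetical names chosen for illustration, and SimpleStrategy with a replication factor of 1 is an assumption that only suits a single-node test setup.

```shell
# Write a small CQL script that illustrates keyspace, table, partition key,
# clustering key and columns. All names below are hypothetical.
cat > demo_schema.cql <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo_ks.sensor_readings (
  sensor_id   text,       -- partition key
  reading_ts  timestamp,  -- clustering key
  temperature double,     -- regular column
  PRIMARY KEY ((sensor_id), reading_ts)
);

INSERT INTO demo_ks.sensor_readings (sensor_id, reading_ts, temperature)
  VALUES ('s-001', '2021-11-01 10:00:00+0000', 21.5);

SELECT * FROM demo_ks.sensor_readings WHERE sensor_id = 's-001';
EOF

# Run the script only when cqlsh is installed; report if no node is reachable.
if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -f demo_schema.cql || echo "cqlsh could not run the script (is Cassandra up?)" >&2
fi
```

In this sketch sensor_id is the partition key, which is why the final SELECT supplies it, and reading_ts is a clustering key that orders the rows inside each partition.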

Cassandra has support for the following client drivers:

  • Java
  • Python
  • Ruby
  • C# / .NET
  • Node.js
  • PHP
  • C++
  • Scala
  • Clojure
  • Erlang
  • Go
  • Haskell
  • Rust
  • Perl
  • Elixir
  • Dart

Install Apache Cassandra 4.0 on CentOS 8 | Rocky Linux 8

Java is required to run Cassandra on CentOS 8 | Rocky Linux 8. As of this writing, Cassandra 4.0 runs on Java 8, with experimental support for Java 11. To use cqlsh you need Python 3.6 or later, or Python 2.7, whose support is deprecated; this guide installs Python 3.

Step 1: Install Java 8, Python and cqlsh

Install Python3 Pip and OpenJDK 8 on your CentOS / Rocky Linux 8:

sudo yum install python3 python3-pip java-1.8.0-openjdk java-1.8.0-openjdk-devel

Install cqlsh using the pip3 Python package manager:

sudo pip3 install cqlsh tox

Ensure the install is successful:

Collecting importlib-metadata; python_version < "3.8" (from click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Collecting zipp>=0.5 (from importlib-metadata; python_version < "3.8"->click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Collecting typing-extensions>=3.6.4; python_version < "3.8" (from importlib-metadata; python_version < "3.8"->click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Installing collected packages: thrift, cql, zipp, typing-extensions, importlib-metadata, click, geomet, cassandra-driver, cqlsh
  Running install for thrift ... done
  Running install for cql ... done
Successfully installed cassandra-driver-3.25.0 click-8.0.3 cql-1.4.0 cqlsh-6.0.0 geomet-0.2.1.post1 importlib-metadata-4.8.1 thrift-0.15.0 typing-extensions- zipp-3.6.0

Confirm the installation of Java and cqlsh.

$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

$ cqlsh --version
cqlsh 6.0.0
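
If you script the installation, the same check can be automated. The sketch below assumes that the first line of the java -version output looks like the banner shown above; is_java8 is a hypothetical helper name, not part of any package.

```shell
# Succeeds when the given `java -version` banner reports Java 8 (1.8.x).
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# java -version prints to stderr, so redirect it before reading the banner.
if command -v java >/dev/null 2>&1; then
  banner=$(java -version 2>&1 | head -n 1)
  if is_java8 "$banner"; then
    echo "Java 8 detected: $banner"
  else
    echo "Java 8 not detected: $banner" >&2
  fi
fi
```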

Step 2: Install Apache Cassandra 4.0 on CentOS 8 | Rocky Linux 8

Now that Java and Python are installed, let’s add the Cassandra repository to our CentOS / Rocky system.

sudo tee  /etc/yum.repos.d/cassandra.repo <
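
The body of the repository file did not survive in the command above. As a hedged sketch, a repository definition for the 4.0 series typically looks like the one below; the baseurl and gpgkey values are assumptions based on the Apache Cassandra installation documentation and should be verified there before use. The block writes the file to the current directory so that it can be reviewed first.

```shell
# Sketch of the repository definition. Verify baseurl and gpgkey against the
# official Apache Cassandra installation page; they are assumptions here.
cat > cassandra.repo <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF

# After reviewing the file, install it as root, for example:
#   sudo install -m 0644 cassandra.repo /etc/yum.repos.d/cassandra.repo
```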

Install Apache Cassandra with the command below.

sudo yum -y install cassandra

Create a systemd service unit for Cassandra.

sudo tee /etc/systemd/system/cassandra.service<
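
The unit file contents are missing from the command above. The following is a minimal sketch of such a unit, not the author's original: it assumes that the RPM installs the launcher at /usr/sbin/cassandra and creates a cassandra user and group, and it runs Cassandra in the foreground with -f so that systemd can supervise the process. The file is written locally for review.

```shell
# Minimal systemd unit sketch. The ExecStart path and the cassandra
# user/group are assumptions about what the package installs.
cat > cassandra.service <<'EOF'
[Unit]
Description=Apache Cassandra
After=network.target

[Service]
User=cassandra
Group=cassandra
ExecStart=/usr/sbin/cassandra -f
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# After reviewing the file, install it as root, for example:
#   sudo install -m 0644 cassandra.service /etc/systemd/system/cassandra.service
```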

Start the service and enable it to start at boot.

sudo systemctl daemon-reload
sudo systemctl start cassandra.service
sudo systemctl enable cassandra

Check service status:

$ systemctl status cassandra.service
● cassandra.service - Apache Cassandra
   Loaded: loaded (/etc/systemd/system/cassandra.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-03-04 22:24:31 EAT; 2s ago
 Main PID: 8758 (java)
    Tasks: 10 (limit: 26213)
   Memory: 3.9G
   CGroup: /system.slice/cassandra.service
           └─8758 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-Us>

Mar 04 22:24:31 cent8.localdomain systemd[1]: Started Apache Cassandra.

You can also verify that Cassandra is running with the command below.

$ nodetool status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  70 KiB     256          100.0%            0daf41fa-22e5-4471-bc00-9aed6f566235  rack1
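
In this output the first column combines status (U for up, D for down) with the state listed in the legend, so UN means the node is up and in the normal state. For scripted health checks, the sketch below reads nodetool status output and succeeds only when every listed node is UN; all_nodes_un is a hypothetical helper name.

```shell
# Reads `nodetool status` output on stdin. Succeeds only if at least one node
# line is present and every node line starts with UN (Up/Normal).
all_nodes_un() {
  awk '
    /^[UD][NLJM][[:space:]]/ { total++; if ($1 != "UN") bad++ }
    END { exit (total > 0 && bad == 0) ? 0 : 1 }
  '
}

# Run the check only when nodetool is available on this machine.
if command -v nodetool >/dev/null 2>&1; then
  if nodetool status | all_nodes_un; then
    echo "All nodes are Up/Normal"
  else
    echo "At least one node is not Up/Normal (or no nodes were listed)" >&2
  fi
fi
```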

To run a query against Cassandra, invoke the CQL shell with the command below.

$ cqlsh
Connected to Test Cluster at
[cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.

  • The default location of configuration files is /etc/cassandra.
  • The default locations of the log and data directories are /var/log/cassandra/ and /var/lib/cassandra.

Step 3: Configuring Cassandra on CentOS 8 | Rocky Linux 8

For running Cassandra on a single node, the default configuration file at /etc/cassandra/conf/cassandra.yaml is sufficient. For a multi-node cluster, you may need to modify this file to ensure your cluster is tuned properly.

At a minimum you should consider setting the following properties:

  • cluster_name: the name of your cluster.
  • seeds: a comma-separated list of the IP addresses of your cluster seeds.
  • storage_port: you don’t necessarily need to change this, but make sure that no firewall blocks this port.
  • listen_address: the IP address of your node. Other nodes use this address to communicate with this node, so it is important that you change it.
  • native_transport_port: as with storage_port, make sure that no firewall blocks this port, because clients communicate with Cassandra on it.
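
As a quick way to review these properties, the sketch below prints their current values from cassandra.yaml. The helper names show_cluster_settings and open_cassandra_ports are hypothetical. The firewall helper assumes firewalld is in use and that the ports are still at their stock defaults, 7000 for storage_port and 9042 for native_transport_port; it is defined but not called.

```shell
# Print the cluster-related settings from the given cassandra.yaml file.
show_cluster_settings() {
  grep -E '^(cluster_name|storage_port|listen_address|native_transport_port):|^[[:space:]]*- seeds:' "$1"
}

# Open the default inter-node and client ports with firewalld.
# Assumes firewalld is running and the default ports are unchanged.
open_cassandra_ports() {
  sudo firewall-cmd --permanent --add-port=7000/tcp
  sudo firewall-cmd --permanent --add-port=9042/tcp
  sudo firewall-cmd --reload
}

# Show the settings when the configuration file is readable.
CONF="${CASSANDRA_YAML:-/etc/cassandra/conf/cassandra.yaml}"
if [ -r "$CONF" ]; then
  show_cluster_settings "$CONF"
fi
```

Call open_cassandra_ports only on nodes where other cluster members or clients actually need to reach those ports.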

Changing the location of directories

The cassandra.yaml configuration file controls the following data directories.

  • data_file_directories: one or more directories where data files are located.
  • commitlog_directory: the directory where commitlog files are located.
  • saved_caches_directory: the directory where saved caches are located.
  • hints_directory: the directory where hints are located.

For performance reasons, if you have multiple disks, consider putting commitlog and data files on different disks.
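
To see whether the two locations currently share a filesystem, a rough check with df can help. This is a sketch under stated assumptions: the default paths /var/lib/cassandra/data and /var/lib/cassandra/commitlog are assumed, same_filesystem is a hypothetical helper, and two partitions on one physical disk will still show up as separate filesystems.

```shell
# Succeeds when both paths are on the same filesystem, judged by the device
# and mount point that df reports for each path.
same_filesystem() {
  fs_a=$(df -P "$1" | awk 'NR==2 {print $1 " " $6}')
  fs_b=$(df -P "$2" | awk 'NR==2 {print $1 " " $6}')
  [ "$fs_a" = "$fs_b" ]
}

# Assumed default locations; override them through the environment if needed.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
COMMITLOG_DIR="${COMMITLOG_DIR:-/var/lib/cassandra/commitlog}"

if [ -d "$DATA_DIR" ] && [ -d "$COMMITLOG_DIR" ]; then
  if same_filesystem "$DATA_DIR" "$COMMITLOG_DIR"; then
    echo "Data and commitlog directories share a filesystem"
  else
    echo "Data and commitlog directories are on separate filesystems"
  fi
fi
```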

Setting Environment variables

JVM-level settings such as heap size are set in the Cassandra environment script. You can add any additional JVM command-line arguments to the JVM_OPTS environment variable; these arguments are passed to the JVM when the Cassandra service starts.
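
As an illustration for a machine with little memory, lines of the following form could be added to the environment script, assumed here to be /etc/cassandra/conf/cassandra-env.sh. The 1G and 200M values are illustrative only, not recommendations from the original guide, and the two heap variables are normally set or left unset as a pair.

```shell
# Illustrative values for a small machine; adjust them to the available RAM.
# MAX_HEAP_SIZE and HEAP_NEWSIZE should be set together or not at all.
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="200M"

# Append an extra JVM flag. ${JVM_OPTS:-} keeps any existing options and
# also lets this snippet run standalone when JVM_OPTS is not yet defined.
JVM_OPTS="${JVM_OPTS:-} -XX:+ExitOnOutOfMemoryError"
```

Restart the service with sudo systemctl restart cassandra for the new settings to take effect.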

Cassandra Logging

The logger in use is logback. You can change logging properties by editing logback.xml. By default, Cassandra logs at INFO level to a file called system.log and at DEBUG level to a file called debug.log. When running in the foreground, it also logs at INFO level to the console.
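
For a quick look at what a node has been logging, the sketch below counts entries of a given level in system.log. It assumes the default logback pattern, in which each entry starts with the level name, and the log directory mentioned earlier; count_log_level is a hypothetical helper.

```shell
# Print the number of lines in file $2 that start with log level $1.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_log_level() {
  grep -c "^$1[[:space:]]" "$2" || true
}

# Assumed default log location; override it through the environment if needed.
SYSTEM_LOG="${SYSTEM_LOG:-/var/log/cassandra/system.log}"
if [ -r "$SYSTEM_LOG" ]; then
  echo "ERROR entries: $(count_log_level ERROR "$SYSTEM_LOG")"
  echo "WARN entries:  $(count_log_level WARN "$SYSTEM_LOG")"
fi
```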

Refer to the official guide for client configuration.

