Install Apache Cassandra on AlmaLinux 8|Oracle Linux 8


Many organizations that handle large volumes of unstructured data need a highly scalable and highly available database. Apache Cassandra is one of the most popular tools for this task.

Apache Cassandra is an open-source NoSQL database capable of handling large volumes of unstructured data. It was originally developed at Facebook, released as an open-source project in 2008, and became an Apache project in 2009. Cassandra uses a peer-to-peer design that draws on two earlier systems: Amazon's Dynamo and Google's Bigtable. With this model, all the nodes in the cluster can serve reads and writes, and no master node is required. A notable feature of Cassandra is that nodes can be added to the cluster as needed, so the cluster can be expanded as demand grows.

Apache Cassandra offers the following key features and benefits:

  • Fast writes – Since the data handled here is unstructured, it can be written to the database at very high speeds.
  • Highly scalable – You can add nodes to your cluster at any time. Cassandra is designed to grow horizontally as much as you need.
  • Fault tolerance – Since all nodes are treated equally, losing one of them does not bring the cluster down.
  • Tunable consistency – The consistency level (for example ONE, QUORUM or ALL) can be chosen per read or write, trading latency against consistency. Performance can additionally be tuned on top of your typical JVM performance tuning, and table-level compression options can be configured when creating tables.
  • Cassandra Query Language (CQL) – An SQL-like language for working with the data. Since Cassandra is NoSQL, data is distributed horizontally across the cluster, which gives it the potential for massive scalability, and it is not subject to the confines of joins and fixed schemas.

This guide offers a step-by-step walkthrough of how to install and configure Apache Cassandra on AlmaLinux 8|Oracle Linux 8.

Step 1 – Update your System.

Begin by refreshing the repository cache and updating all the packages on your system.

sudo dnf update

Now install the EPEL repository on AlmaLinux 8|Oracle Linux 8.

sudo dnf install yum-utils
sudo dnf install epel-release

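Enable the PowerTools repository as below. On Oracle Linux 8 the equivalent repository is typically named ol8_codeready_builder, so use that name in place of powertools there.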

sudo dnf config-manager --set-enabled powertools
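The package and repository names differ between the two distributions. The following is a minimal sketch rather than a definitive mapping. It assumes the ID field in /etc/os-release is almalinux on AlmaLinux and ol on Oracle Linux, and that the Oracle Linux 8 equivalents are the oracle-epel-release-el8 package and the ol8_codeready_builder repository (confirm with dnf repolist all on your system). The helper names builder_repo_for and epel_package_for are hypothetical.

```shell
# Map the distribution ID (from /etc/os-release) to the repo that plays the PowerTools role.
# Assumption: "almalinux" -> powertools, "ol" -> ol8_codeready_builder.
builder_repo_for() {
    case "$1" in
        almalinux) echo "powertools" ;;
        ol)        echo "ol8_codeready_builder" ;;
        *)         echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# Map the distribution ID to the package that enables EPEL.
# Assumption: Oracle Linux 8 ships it as oracle-epel-release-el8.
epel_package_for() {
    case "$1" in
        almalinux) echo "epel-release" ;;
        ol)        echo "oracle-epel-release-el8" ;;
        *)         echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# Usage (not run here):
#   . /etc/os-release
#   sudo dnf install -y "$(epel_package_for "$ID")"
#   sudo dnf config-manager --set-enabled "$(builder_repo_for "$ID")"
```

Keeping the mapping in small functions makes it easy to reuse the same script on both distributions.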

Step 2 – Install Java on AlmaLinux 8|Oracle Linux 8.

Since Apache Cassandra is written in Java, Java must be installed on the system before we proceed. In this guide, we will install OpenJDK 11 to provide the Java runtime environment as below.

sudo dnf install java-11-openjdk

Once installed, verify the version.

$ java --version
openjdk 11.0.14 2022-01-18 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.14+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.14+9-LTS, mixed mode, sharing)
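Cassandra 4.0 is documented to run on Java 8 and Java 11, and a system can end up with more than one JDK installed, so it can be useful to check the major version of the active java in a script. The helper below is a rough sketch; the name java_major_version is hypothetical, and it only understands the two common version formats (1.8.0_322 and 11.0.14).

```shell
# Print the Java major version for a version string given as $1.
# "1.8.0_322" -> 8 (legacy 1.x scheme), "11.0.14" -> 11 (modern scheme).
java_major_version() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}"; echo "${v%%.*}" ;;
        *)   echo "${v%%[.+-]*}" ;;
    esac
}

# Usage (not run here): extract the quoted version from `java -version`.
#   ver="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   case "$(java_major_version "$ver")" in
#       8|11) echo "Java $ver should be usable for Cassandra 4.0" ;;
#       *)    echo "Java $ver is outside the versions assumed here" ;;
#   esac
```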

Step 3 – Add Apache Cassandra Repository on AlmaLinux 8|Oracle Linux 8.

The Cassandra packages are not available in the default AlmaLinux 8|Oracle Linux 8 package repositories, so the Cassandra repository needs to be added. The main benefit of installing Cassandra from the official repository is that later software updates arrive through the regular update command.

Create the repository using your favorite editor.

sudo vi /etc/yum.repos.d/cassandra.repo

In the file, add the lines:

[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
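If you prefer not to open an editor, the same file can be written from a script. This is a small sketch with the repository contents copied from the lines above; the function name write_cassandra_repo and the staging path /tmp/cassandra.repo are assumptions made for illustration.

```shell
# Write the Cassandra repository definition to the path given as $1.
write_cassandra_repo() {
    cat > "$1" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}

# Usage (not run here): stage the file, then install it with root permissions.
#   write_cassandra_repo /tmp/cassandra.repo
#   sudo install -m 0644 /tmp/cassandra.repo /etc/yum.repos.d/cassandra.repo
```

Staging the file first keeps the part that needs root down to a single install command.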

This repository serves the 4.0 release series, which was the latest version at the time of writing; repositories for older Cassandra releases are also available.

Save the file and update your package index.

sudo dnf update -y

Step 4 – Install Apache Cassandra on AlmaLinux 8|Oracle Linux 8.

With the repository added above, we can install the latest version of Apache Cassandra on AlmaLinux 8|Oracle Linux 8 using the command:

sudo dnf install cassandra 

Dependency Tree:

Dependencies resolved.
================================================================================
 Package                        Arch   Version                 Repository  Size
================================================================================
Installing:
 cassandra                      noarch 4.0.2-1                 cassandra   45 M
Installing dependencies:
 java-1.8.0-openjdk-headless-slowdebug
                                x86_64 1:1.8.0.322.b06-2.el8_5 powertools  36 M
 java-1.8.0-openjdk-slowdebug   x86_64 1:1.8.0.322.b06-2.el8_5 powertools 345 k

Transaction Summary
================================================================================
Install  3 Packages

Total download size: 81 M
Installed size: 194 M
Is this ok [y/N]: y

Accept the GPG key importation and proceed with the installation.

Step 5 – Start and Enable the Cassandra service.

Once installed, start the Cassandra service and enable it to run automatically on system boot. This can be done using the commands below:

sudo service cassandra start
sudo systemctl enable cassandra

Verify if the service is running:

$ systemctl status cassandra
 cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/rc.d/init.d/cassandra; generated)
   Active: active (running) since Wed 2022-02-16 05:16:34 EST; 11s ago
     Docs: man:systemd-sysv-generator(8)
 Main PID: 32646 (java)
    Tasks: 16 (limit: 36433)
   Memory: 1.7G
   CGroup: /system.slice/cassandra.service
           └─32646 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-2.el8_5.x86_64-slowdebug/jre/bin/java -ea -da:net.openhft... -XX:+UseThrea>

You can also check the state of the node with nodetool, as below. Note that nodetool talks to Cassandra over JMX (port 7199 by default), while CQL clients connect on port 9042. Remember to wait until Cassandra has finished starting up.

nodetool status

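Because Cassandra can take a while to start, a script may need to wait for the node rather than check it once. The sketch below polls nodetool status until a node is reported in the UN (Up/Normal) state. The function names and the default of 30 attempts, 5 seconds apart, are assumptions chosen for illustration.

```shell
# Succeed if the `nodetool status` text on stdin lists at least one node in state UN.
has_up_normal_node() {
    grep -q '^UN[[:space:]]'
}

# Poll `nodetool status` until a UN node appears or the attempts run out.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 5).
wait_for_cassandra() {
    local attempts="${1:-30}" delay="${2:-5}" i=0
    while [ "$i" -lt "$attempts" ]; do
        if nodetool status 2>/dev/null | has_up_normal_node; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage (not run here):
#   wait_for_cassandra 30 5 && echo "Cassandra is up" || echo "Cassandra did not come up in time"
```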

Step 6 – Install the Cassandra Query Language.

To be able to interact with Cassandra, we need the cqlsh tool, which is compatible with Python 2.7 or Python 3.6+. In this guide, we will go for Python 3.8, installed as below.

sudo dnf install python38

If you have multiple Python versions installed, you may be required to set the default Python version as below.

$ sudo update-alternatives --config python3
There are 2 programs which provide 'python3'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/bin/python3.6
   2           /usr/bin/python3.8

Enter to keep the current selection[+], or type selection number: 2

Check the Python version.

$ python3 --version
Python 3.8.8

Now, using pip, install the cqlsh tool.

pip3 install --user cqlsh

Sample output:

Collecting cqlsh
  Downloading https://files.pythonhosted.org/packages/af/62/88bf9200252158871843a1f65c5215a5480f64817b663ff8ece41ad0f977/cqlsh-6.0.0-py3-none-any.whl (106kB)
     |████████████████████████████████| 112kB 12.3MB/s 
Collecting cql
  Downloading https://files.pythonhosted.org/packages/0b/15/523f6008d32f05dd3c6a2e7c2f21505f0a785b6dc8949cad325306858afc/cql-1.4.0.tar.gz (76kB)
     |████████████████████████████████| 81kB 2.9MB/s 
Collecting six
  Downloading https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
Collecting cassandra-driver
  Downloading https://files.pythonhosted.org/packages/0b/c6/77ffe96b897a6dbf867847bf1c8ebf72ca9881fffbc08c06a206a33ce1e1/cassandra_driver-3.25.0-cp38-cp38-manylinux1_x86_64.whl (3.6MB)
     |████████████████████████████████| 3.6MB 51.2MB/s 
Collecting thrift
  Downloading https://files.pythonhosted.org/packages/6e/97/a73a1a62f62375b21464fa45a0093ef0b653cb14f7599cffce35d51c9161/thrift-0.15.0.tar.gz (59kB)
     |████████████████████████████████| 61kB 1.8MB/s 
Collecting geomet<0.3,>=0.1
  Downloading https://files.pythonhosted.org/packages/c9/81/156ca48f950f833ddc392f8e3677ca50a18cb9d5db38ccb4ecea55a9303f/geomet-0.2.1.post1-py3-none-any.whl
Collecting click
......

Verify the installation.

$ cqlsh --version
cqlsh 6.0.0

Step 7 – Configure Apache Cassandra on AlmaLinux 8|Oracle Linux 8.

The Apache Cassandra configuration files are located under /etc/cassandra, while Java start-up options can be configured under /etc/default/cassandra.

7.1. Configure Storage

This step is for those who wish to configure a secondary disk to serve as the Apache Cassandra storage. By default, Apache Cassandra stores its data at /var/lib/cassandra. In this guide, we will mount an external disk on this path so that the data is stored on that disk.

First, identify the secondary attached storage.

$ lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                  8:0    0   20G  0 disk 
sr0                 11:0    1 1024M  0 rom  
vda                252:0    0   40G  0 disk 
├─vda1             252:1    0    1G  0 part /boot
└─vda2             252:2    0   39G  0 part 
  ├─almalinux-root 253:0    0   35G  0 lvm  /
  └─almalinux-swap 253:1    0    4G  0 lvm  [SWAP]

Here, the secondary disk is /dev/sda. Format the disk as ext4 using the mkfs command. Note that this erases any data already on the disk.

sudo mkfs.ext4 /dev/sda

Sample output

mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done                            
Creating filesystem with 5242880 4k blocks and 1310720 inodes
Filesystem UUID: 5c3c4032-637b-4e07-9772-83fe0425a6bd
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

Verify that the filesystem has been created.

$ lsblk -f 
NAME         FSTYPE      LABEL UUID                                   MOUNTPOINT
sda          ext4              5c3c4032-637b-4e07-9772-83fe0425a6bd   
sr0                                                                   
vda                                                                   
├─vda1       xfs               d64815a0-ceaa-42d9-a5c0-075079daf099   /boot
└─vda2       LVM2_member       HH9t2V-12NT-iKyk-sEST-HbsT-Auf1-n14VM1 
  ├─almalinux-root
  │          xfs               7872878f-1b01-4717-97e2-1f045e3685e9   /
  └─almalinux-swap
             swap              dbf263b7-aa5a-44d9-8a19-4c1c7adfa966   [SWAP]

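Now we will mount this disk at /var/lib/cassandra as below. Stop Cassandra first so that its data files do not change while they are being copied.

sudo systemctl stop cassandra
sudo cp -r /var/lib/cassandra /var/lib/cassandra.bak
sudo mount /dev/sda /var/lib/cassandra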

Verify the mounting:

$ df -hT -P /var/lib/cassandra
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda       ext4   20G   45M   19G   1% /var/lib/cassandra

Restore the backup and set the right permissions.

sudo mv /var/lib/cassandra.bak/* /var/lib/cassandra
sudo chown -R cassandra:cassandra /var/lib/cassandra

Now configure permanent mounting as below.

$ sudo vim /etc/fstab
/dev/sda /var/lib/cassandra ext4 defaults 0 0
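Device names such as /dev/sda can change between boots, so an fstab entry keyed on the filesystem UUID is often more robust. The sketch below builds such a line; the function name fstab_entry_by_uuid is hypothetical, and the mount options are simply carried over from the entry above.

```shell
# Print an fstab line that mounts the ext4 filesystem with UUID $1 at mount point $2.
fstab_entry_by_uuid() {
    printf 'UUID=%s %s ext4 defaults 0 0\n' "$1" "$2"
}

# Usage (not run here):
#   uuid="$(sudo blkid -s UUID -o value /dev/sda)"
#   fstab_entry_by_uuid "$uuid" /var/lib/cassandra | sudo tee -a /etc/fstab
#   sudo mount -a    # checks that the new entry parses and mounts cleanly
```

Use either this line or the /dev/sda line in /etc/fstab, not both.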

Now we have secondary storage configured as the Apache Cassandra datastore. For the changes to apply, restart Cassandra.

sudo systemctl restart cassandra

7.2. Change the cluster name

After the configurations have been made, switch to the CQL Shell using the command:

cqlsh


Change the cluster name using the below steps:

First, run the command below in the CQL Shell

UPDATE system.local SET cluster_name = 'My CLuster' WHERE KEY = 'local';

Exit the shell:

exit;

Now edit the Cassandra file below.

sudo vi /etc/cassandra/default.conf/cassandra.yaml

Replace the Cluster name with the set name as below.

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'My CLuster'
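Cassandra is known to refuse to start when the cluster name saved in system.local differs from the one in cassandra.yaml, so it can help to double-check the file before restarting. The following sketch reads the configured name back; the function name configured_cluster_name is hypothetical, and it assumes the cluster_name line has no trailing comment.

```shell
# Print the cluster_name value from the cassandra.yaml file given as $1,
# with surrounding single or double quotes removed.
configured_cluster_name() {
    sed -n "s/^cluster_name:[[:space:]]*//p" "$1" | tr -d "'\""
}

# Usage (not run here): compare against the name set in the CQL shell.
#   name="$(configured_cluster_name /etc/cassandra/default.conf/cassandra.yaml)"
#   [ "$name" = "My CLuster" ] && echo "names match" || echo "mismatch: $name"
```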

Now flush the system keyspace and restart the service.

nodetool flush system
sudo systemctl restart cassandra

Verify if the changes have been made.

cqlsh

Once in the shell, use the command below to check the cluster name.

DESC CLUSTER


7.3. Enable User Authentication

We will begin by taking a backup of the available file before we edit it.

sudo cp /etc/cassandra/conf/cassandra.yaml /etc/cassandra/conf/cassandra.yaml.backup

Now open the file:

sudo vi /etc/cassandra/conf/cassandra.yaml

To enable user authentication, make the below changes:

.....
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
.....
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
......
roles_validity_in_ms: 0
......
permissions_validity_in_ms: 0
.......
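The four settings can also be applied with sed instead of an editor. This is a sketch under a few assumptions: GNU sed is available (as it is on AlmaLinux and Oracle Linux), each key already exists uncommented at the start of a line, and the function names set_yaml_value and enable_password_auth are made up for illustration.

```shell
# Replace the value of a top-level "key: value" line in the YAML file $1.
# $2 = key, $3 = new value. The value must not contain "|" (the sed delimiter).
set_yaml_value() {
    sed -i "s|^$2:.*|$2: $3|" "$1"
}

# Apply the authentication settings from this section to the file given as $1.
enable_password_auth() {
    set_yaml_value "$1" authenticator org.apache.cassandra.auth.PasswordAuthenticator
    set_yaml_value "$1" authorizer org.apache.cassandra.auth.CassandraAuthorizer
    set_yaml_value "$1" roles_validity_in_ms 0
    set_yaml_value "$1" permissions_validity_in_ms 0
}

# Usage (not run here): work on a copy, review the diff, then install it.
#   cp /etc/cassandra/conf/cassandra.yaml /tmp/cassandra.yaml
#   enable_password_auth /tmp/cassandra.yaml
#   diff /etc/cassandra/conf/cassandra.yaml /tmp/cassandra.yaml
```

Editing a copy and reviewing the diff first makes it harder to break the live configuration by accident.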

Save the file and restart Cassandra.

sudo systemctl restart cassandra

7.4. Create an Admin user for your Database

Begin by logging in to the shell with the default user credentials as below:

cqlsh -u cassandra -p cassandra

Now create a superuser role with the command below:

CREATE ROLE user1 WITH PASSWORD = 'Passw0rd' AND SUPERUSER = true AND LOGIN = true;

Remember to replace user1 and Passw0rd with the preferred user credentials. Once created, exit the shell.

exit;
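The role can also be created non-interactively by passing the statement to cqlsh with its -e option. The sketch below only builds the statement; the function name create_superuser_cql is hypothetical, and the role name is passed through as-is, so it should be a plain identifier.

```shell
# Print a CREATE ROLE statement for a superuser role $1 with password $2.
# Single quotes in the password are doubled, which is how CQL escapes them
# inside string literals.
create_superuser_cql() {
    local role="$1" password
    password="$(printf '%s' "$2" | sed "s/'/''/g")"
    printf "CREATE ROLE %s WITH PASSWORD = '%s' AND SUPERUSER = true AND LOGIN = true;\n" "$role" "$password"
}

# Usage (not run here):
#   cqlsh -u cassandra -p cassandra -e "$(create_superuser_cql user1 'Passw0rd')"
```

Keep in mind that passwords given on the command line can end up in the shell history; omitting -p makes cqlsh prompt for the password instead.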

Now try logging in using the created user.

cqlsh -u user1 -p Passw0rd


Once in the shell, you can strip the default cassandra role of its superuser and login rights.

ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;

Now grant all permissions to the created user.

GRANT ALL PERMISSIONS ON ALL KEYSPACES TO 'user1';
exit;

7.5. Access Apache Cassandra Remotely.

By default, Apache Cassandra is set to listen on localhost. However, you can configure it to be accessed remotely by making adjustments to the config file as below.

sudo vi /etc/cassandra/default.conf/cassandra.yaml

In the file, set rpc_address to the IP address of your server (192.168.205.3 in this example):

# For security reasons, you should not expose this port to the internet.  Firewall it if needed.
rpc_address: 192.168.205.3

Save the file and restart Cassandra.

sudo systemctl restart cassandra

Verify if the service is listening on the set IP address:

$ sudo ss -plunt|grep 9042
tcp   LISTEN 0      128     192.168.205.3:9042       0.0.0.0:*    users:(("java",pid=39432,fd=261))  

Allow the port through the firewall.

sudo firewall-cmd --permanent --add-port=9042/tcp
sudo firewall-cmd --reload
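Opening the port as above allows connections from any source. In line with the advice not to expose this port to the internet, a firewalld rich rule can limit access to a trusted network instead. This is a sketch only: the function name cql_rich_rule is hypothetical and the 192.168.205.0/24 network in the usage lines is an assumed example.

```shell
# Print a firewalld rich rule that accepts TCP port 9042 only from
# the IPv4 source network given as $1 in CIDR notation.
cql_rich_rule() {
    printf 'rule family="ipv4" source address="%s" port port="9042" protocol="tcp" accept\n' "$1"
}

# Usage (not run here), as an alternative to --add-port=9042/tcp:
#   sudo firewall-cmd --permanent --add-rich-rule="$(cql_rich_rule 192.168.205.0/24)"
#   sudo firewall-cmd --reload
```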

Now test the connection on a remote system with cqlsh installed.

cqlsh -u user1 192.168.205.3

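If the remote cqlsh connection fails, a plain TCP check can help tell a network or firewall problem apart from an authentication problem. The helper below is a sketch that relies on bash's /dev/tcp feature and the coreutils timeout command; the name tcp_port_open and the 3-second limit are assumptions.

```shell
# Succeed if a TCP connection to host $1, port $2 can be opened within 3 seconds.
# The inner bash receives the host as $0 and the port as $1.
tcp_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (not run here), from the remote system:
#   tcp_port_open 192.168.205.3 9042 && echo "port 9042 reachable" || echo "port 9042 not reachable"
```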

That is it!

Closing Thoughts

We have walked through how to install and configure Apache Cassandra on AlmaLinux 8|Oracle Linux 8. We have also configured a secondary disk as the Apache Cassandra datastore, enabled user authentication, and enabled remote access. You can now proceed to scale Cassandra horizontally by adding nodes. I hope this guide was helpful.
