Bhupendra Singh Sisodiya
Apr 3, 2021

Ansible Task 11.1

Hadoop and Ansible…?

What is Ansible?

Ansible is an open-source automation tool. It is used for automating configuration management, cloud provisioning, application deployment, intra-service orchestration, and other IT tasks. It has its own declarative language to describe system configuration.

Ansible Playbook:

An Ansible playbook is an ordered list of tasks, saved so you can run those tasks in that order repeatedly. Playbooks include variables as well as tasks. Playbooks are written in YAML and are easy to read, write, share, and understand.

Some common terms:

  1. Control Node: A control node is a Linux server that has Ansible installed on it and is used for managing remote hosts or nodes.
  2. Managed Node: The network devices (and/or servers) you manage with Ansible. Managed nodes are also sometimes called “hosts”. Ansible is not installed on managed nodes.
  3. Inventory: A file that lists the managed nodes along with their IP addresses, usernames, passwords, connection types, etc.

What is Hadoop?

Hadoop is an Apache Software Foundation project. It is an open-source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It is written in Java and is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Some common terms:

  1. NameNode/Master node: The NameNode is the master node in the Apache Hadoop HDFS architecture. It maintains the file system metadata and keeps track of the blocks stored on the DataNodes (slave nodes).
  2. DataNode/Slave node: DataNodes are the slave nodes in HDFS. They are responsible for storing the actual data.

Installation and Configuration of Ansible

Step 1: Install Ansible

Command: pip3 install ansible

Step 2: Create an Inventory file

Add the managed node IP address, user name, password, and connection type in the inventory file.
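As a minimal sketch, an inventory with one NameNode and one DataNode could be written like this. The file name inventory.txt, the group names namenode and datanode, the 192.0.2.x addresses (a documentation-only range), and the credentials are all placeholders assumed for illustration, not values from the original setup.

```shell
# Hypothetical inventory: replace the addresses and credentials with your own.
# 192.0.2.x is reserved for documentation, so these hosts do not exist.
cat > inventory.txt <<'EOF'
[namenode]
192.0.2.10 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh

[datanode]
192.0.2.11 ansible_user=root ansible_password=CHANGE_ME ansible_connection=ssh
EOF
```

Password-based SSH logins need the sshpass program on the control node. Grouping the hosts by role is optional, but it lets a playbook treat the NameNode and the DataNodes differently.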

Step 3: Configure Ansible

Create a directory using the command: mkdir /etc/ansible

Add the inventory file location to the Ansible configuration file, /etc/ansible/ansible.cfg, and set host_key_checking=False to disable SSH host key checking.
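A configuration file along these lines should be enough for this setup. The inventory path is an assumed location, and CFG_DIR is a hypothetical variable added here so the sketch can be tried without root access. Set it to /etc/ansible to match the step above.

```shell
# CFG_DIR is a hypothetical variable; it defaults to the current directory,
# where Ansible also looks for ansible.cfg. Use CFG_DIR=/etc/ansible (as root)
# to follow the article's layout.
CFG_DIR="${CFG_DIR:-.}"
mkdir -p "$CFG_DIR"
cat > "$CFG_DIR/ansible.cfg" <<'EOF'
[defaults]
# Assumed location of the inventory file
inventory = /etc/ansible/inventory.txt
# Do not stop to confirm the SSH host key of a node seen for the first time
host_key_checking = False
EOF
```

Both settings belong under the [defaults] section. Disabling host key checking is convenient in a lab, but it removes a protection against connecting to the wrong machine.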

Step 4: Check the Connectivity

Check that the control node can reach the managed nodes.

Command: ansible all -m ping

Configuration of Hadoop Cluster

Step 1: Create an Ansible Playbook

Create a directory using the command: mkdir <dir_name>

Then create a YAML (.yml) file inside it for the playbook.

Step 2: Write the Playbook Tasks
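The playbook itself is not reproduced in this post, so what follows is only a sketch of what such a playbook might look like, not the original one. It assumes a Hadoop 3.x style installation with the hdfs command on the PATH, RHEL-family managed nodes with dnf, inventory groups named namenode and datanode, and installer files named jdk.rpm and hadoop.rpm next to the playbook. The file name hadoop.yml, the configuration directory, the storage directory, and port 9001 are likewise assumptions.

```shell
# Write a hypothetical playbook; every path, port, and file name is an assumption.
cat > hadoop.yml <<'EOF'
- name: Configure a small HDFS cluster (sketch)
  hosts: all
  vars:
    hadoop_conf_dir: /etc/hadoop      # assumed Hadoop configuration directory
    hdfs_dir: /hdfs-data              # assumed storage directory on each node
    namenode_host: "{{ groups['namenode'][0] }}"
    # Role of the current host, derived from its inventory group
    hdfs_role: "{{ 'namenode' if inventory_hostname in groups['namenode'] else 'datanode' }}"
    dir_property: "{{ 'dfs.namenode.name.dir' if hdfs_role == 'namenode' else 'dfs.datanode.data.dir' }}"
  tasks:
    - name: Copy the installers from the control node
      ansible.builtin.copy:
        src: "{{ item }}"
        dest: /root/
      loop: [jdk.rpm, hadoop.rpm]

    - name: Install Java and Hadoop from the copied RPM files
      ansible.builtin.dnf:
        name:
          - /root/jdk.rpm
          - /root/hadoop.rpm
        state: present
        disable_gpg_check: true

    - name: Create the HDFS storage directory
      ansible.builtin.file:
        path: "{{ hdfs_dir }}"
        state: directory

    - name: Point every node at the NameNode
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/core-site.xml"
        content: |
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://{{ namenode_host }}:9001</value>
            </property>
          </configuration>

    - name: Set the storage directory for this node's role
      ansible.builtin.copy:
        dest: "{{ hadoop_conf_dir }}/hdfs-site.xml"
        content: |
          <configuration>
            <property>
              <name>{{ dir_property }}</name>
              <value>{{ hdfs_dir }}</value>
            </property>
          </configuration>

    - name: Format the NameNode on the first run only
      ansible.builtin.command: hdfs namenode -format
      args:
        creates: "{{ hdfs_dir }}/current"
      when: hdfs_role == 'namenode'

    - name: Start the HDFS daemon for this role
      ansible.builtin.command: hdfs --daemon start {{ hdfs_role }}
EOF
```

Deriving the role from the inventory group keeps a single play for both node types. The last task is not idempotent: starting a daemon that is already running reports an error, so stop the daemons before re-running the playbook. Older Hadoop releases use different property names and start scripts, so adjust these to the version you install.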

Step 3: Run the Playbook

Command: ansible-playbook <file_name.yml>

Step 4: Check the Managed Node

Check whether Java and Hadoop are installed using the commands: java -version and hadoop version

To check whether the NameNode/DataNode daemon is running, use the command: jps

Now we know how to configure a Hadoop cluster using Ansible.
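These checks can also be run from the control node with Ansible ad-hoc commands instead of logging in to each node. The sketch below wraps them in a helper; the function name check_cluster and the ANSIBLE_BIN override are hypothetical names introduced for illustration.

```shell
# Run the three checks on every managed node from the control node.
# check_cluster is a hypothetical helper; ANSIBLE_BIN optionally overrides
# the ansible executable (defaults to "ansible").
check_cluster() {
  for cmd in "java -version" "hadoop version" "jps"; do
    echo "== $cmd =="
    # The command module runs the given command on each host in the inventory
    "${ANSIBLE_BIN:-ansible}" all -m command -a "$cmd" || return 1
  done
}
```

Call check_cluster once the playbook has finished. It stops at the first check that fails on any node, and it assumes java, hadoop, and jps are on the PATH of the remote user.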

Keep Learning & Keep Sharing
