Wednesday, February 24, 2016

How To Install Cassandra and Run a Single-Node Cluster on Ubuntu 14.04


Cassandra, or Apache Cassandra, is a highly scalable open source NoSQL database system, achieving great performance on multi-node setups.
In this tutorial, you’ll learn how to install and use it to run a single-node cluster on Ubuntu 14.04.


To complete this tutorial, you will need an Ubuntu 14.04 server and a user account with sudo privileges.

Step 1 — Installing the Oracle Java Virtual Machine

Cassandra requires that the Oracle Java SE Runtime Environment (JRE) be installed. So, in this step, you'll install and verify that it's the default JRE.
To make the Oracle JRE package available, you'll have to add a Personal Package Archive (PPA) using this command:

$ sudo add-apt-repository ppa:webupd8team/java

Update the package database:

$ sudo apt-get update

Then install the Oracle JRE. Installing this particular package not only installs it but also makes it the default JRE. When prompted, accept the license agreement: 

$ sudo apt-get install oracle-java8-set-default

After installing it, verify that it's now the default JRE:

$ java -version

You should see output similar to the following:
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
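
The check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the function names java_version_from_output and is_java8 are made up for illustration, and the parsing assumes the output format shown above.

```shell
# Print the version string from `java -version` output read on stdin.
java_version_from_output() {
    sed -n 's/^java version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the given version string belongs to the Java 8 series.
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (java -version writes to stderr, hence 2>&1):
#   version=$(java -version 2>&1 | java_version_from_output)
#   is_java8 "$version" && echo "Java 8 is the default: $version"
```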

Step 2 — Installing Cassandra

We'll install Cassandra using packages from the official Apache Software Foundation repositories, so start by adding the repo so that the packages are available to your system. Note that Cassandra 2.2.2 is the latest version at the time of this publication. Change the 22x to match the latest version. For example, use 23x if Cassandra 2.3 is the latest version:

$ echo "deb 22x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Then add the repo's source:

$ echo "deb-src 22x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
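
The series tag used in the repository lines (22x for Cassandra 2.2.x, 23x for 2.3.x) follows from the first two parts of the version number. As a small illustration, the hypothetical helper below (cassandra_series is not an official tool) derives the tag from a version string:

```shell
# Turn a Cassandra version such as 2.2.2 into its repository series tag (22x).
cassandra_series() {
    echo "$1" | awk -F. '{ print $1 $2 "x" }'
}

# Example:
#   cassandra_series 2.2.2
```
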
To avoid package signature warnings during package updates, we need to add three public keys from the Apache Software Foundation associated with the package repositories.
Add the first one using this pair of commands, which must be run one after the other:

$ gpg --keyserver --recv-keys F758CE318D77295D
$ gpg --export --armor F758CE318D77295D | sudo apt-key add -

Then add the second key:

$ gpg --keyserver --recv-keys 2B5C1B00
$ gpg --export --armor 2B5C1B00 | sudo apt-key add -

Then add the third:

$ gpg --keyserver --recv-keys 0353B12C
$ gpg --export --armor 0353B12C | sudo apt-key add -
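
The three key imports repeat the same pair of commands, so they can be generated in a loop. This is only a sketch: apt_key_commands is a hypothetical helper, and keyserver.example.org is a placeholder, because the keyserver host is not given in the commands above. The function only prints the commands so they can be reviewed before being run.

```shell
# Print the gpg/apt-key command pair for each key ID.
# $1 = keyserver host (placeholder; supply the one you actually use)
# remaining arguments = key IDs
apt_key_commands() {
    keyserver=$1
    shift
    for key in "$@"; do
        echo "gpg --keyserver $keyserver --recv-keys $key"
        echo "gpg --export --armor $key | sudo apt-key add -"
    done
}

# Example: review the output first, then pipe it to sh to execute it.
#   apt_key_commands keyserver.example.org F758CE318D77295D 2B5C1B00 0353B12C
#   apt_key_commands keyserver.example.org F758CE318D77295D 2B5C1B00 0353B12C | sh
```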

Update the package database once again:

$ sudo apt-get update

Finally, install Cassandra:


$ sudo apt-get install cassandra

If Cassandra is not running, checking its status with sudo service cassandra status displays the following output:

* could not access pidfile for Cassandra

This is a well-known issue with the latest versions of Cassandra on Ubuntu. We'll try a few fixes. First, start by editing its init script. The parameter we're going to modify is on line 60 of that script, so open it using:

$ sudo nano +60 /etc/init.d/cassandra

That line should read:  
Change it to:  
Save and close the file, then reboot the server:

$ sudo reboot

or:

$ sudo shutdown -r now

After logging back in, Cassandra should now be running. Verify:

$ sudo service cassandra status

If you are successful, you will see:

* Cassandra is running
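
Cassandra can take a little while to come up after a reboot, so a single status check may be premature. One possible approach is to poll the status command a few times. The sketch below assumes the status text shown above; wait_for_cassandra is a hypothetical helper name.

```shell
# Poll a status command until its output reports that Cassandra is running.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the status command to run.
wait_for_cassandra() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" 2>&1 | grep -q 'Cassandra is running'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_cassandra 10 3 sudo service cassandra status && echo "up"
```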

Step 4 — Connecting to the Cluster

If you were able to successfully start Cassandra, check the status of the cluster:

$ sudo nodetool status

In the output, UN means it's Up and Normal:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns  Host ID                               Rack
UN           142.02 KB  256     ?     2053956d-7461-41e6-8dd2-0af59436f736  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
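
In the status output, each node row starts with a two-letter code, and UN marks a node that is Up and Normal. As a rough sketch, the hypothetical helper count_up_normal counts such rows. It assumes the output has one node per line, as nodetool prints it.

```shell
# Count the nodes reported as UN (Up/Normal) in `nodetool status` output
# read on stdin.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example: on a healthy single-node cluster this should print 1.
#   sudo nodetool status | count_up_normal
```
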
Then connect to it using its interactive command line interface cqlsh:

$ cqlsh

You will see it connect:

Connected to Test Cluster at
[cqlsh 5.0.1 | Cassandra 2.2.2 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>

Type exit to quit:

cqlsh> exit


Congratulations! You now have a single-node Cassandra cluster running on Ubuntu 14.04.


Hadoop Installation : ssh-keygen -t rsa -P ""

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo apt-get install openssh-server
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys
$ wget
$ cd /home/hduser
$ tar xzf hadoop-1.1.2.tar.gz
$ mv hadoop-1.1.2 hadoop
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/home/hduser/hadoop

The next one points to the Java home directory. We need to make sure that it is pointing to Oracle Java:

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-oracle

The last one updates the PATH to include the Hadoop bin/ directory:

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_PREFIX/bin
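
After the variables are set, it is worth confirming that the Hadoop bin/ directory really ended up on the PATH. A small sketch follows; path_contains is a hypothetical helper, and its optional second argument exists only to make it easy to try out.

```shell
# Succeed if directory $1 is an entry of the colon-separated list $2
# (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   path_contains "$HADOOP_PREFIX/bin" || echo "Hadoop bin/ is not on the PATH"
```
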
$ mkdir /home/hduser/tmp

hadoop.tmp.dir (conf/core-site.xml):
A base for other temporary directories.

fs.default.name (conf/core-site.xml):
The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri’s scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri’s authority is used to
determine the host, port, etc. for a filesystem.

mapred.job.tracker (conf/mapred-site.xml):
The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.

dfs.replication (conf/hdfs-site.xml):
Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
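
The descriptions above match the Hadoop 1.x properties hadoop.tmp.dir, fs.default.name, mapred.job.tracker and dfs.replication. The values used in this setup are not shown, so the sketch below only illustrates the shape of such a file. The helpers hadoop_property and core_site_xml are hypothetical, and the hdfs://localhost:54310 address in the example is an assumed placeholder, not a value taken from this post.

```shell
# Print one <property> element for a Hadoop configuration file.
hadoop_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print a minimal core-site.xml.
# $1 = value for hadoop.tmp.dir, $2 = value for fs.default.name
core_site_xml() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    hadoop_property hadoop.tmp.dir "$1"
    hadoop_property fs.default.name "$2"
    echo '</configuration>'
}

# Example (the port is an assumption; adjust it to your setup):
#   core_site_xml /home/hduser/tmp hdfs://localhost:54310 > conf/core-site.xml
```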

$ hadoop namenode -format
$ jps
$ hadoop jar hadoop-examples-1.1.2.jar pi 3 10
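
The jps command lists the running Java processes, which makes it a quick way to see whether all Hadoop daemons came up. On a single-node Hadoop 1.x setup one would normally expect NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker. The sketch below, with the hypothetical helper missing_hadoop_daemons, prints whichever of those are absent. It assumes the daemons have already been started.

```shell
# Read `jps` output on stdin and print each expected Hadoop 1.x daemon
# that does not appear in it.
missing_hadoop_daemons() {
    output=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        if ! printf '%s\n' "$output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example: no output means every expected daemon is running.
#   jps | missing_hadoop_daemons
```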

Sunday, February 14, 2016

ssh localhost

Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
Please contact your system administrator.
Add correct host key in /home/hadoop/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/hadoop/.ssh/known_hosts:1
  remove with: ssh-keygen -f "/home/hadoop/.ssh/known_hosts" -R localhost
ECDSA host key for localhost has changed and you have requested strict checking.
Host key verification failed.
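
The warning names the known_hosts file and the line that holds the stale key. If the host key change is expected (for example after reinstalling the machine), that location can be pulled out of the message before the entry is removed with the ssh-keygen -R command shown in the warning. A minimal sketch, with offending_key_location as a hypothetical helper name:

```shell
# Read an ssh host-key warning on stdin and print "<known_hosts file> <line>"
# for each offending key it mentions.
offending_key_location() {
    sed -n 's/^Offending [A-Za-z0-9]* key in \(.*\):\([0-9][0-9]*\)$/\1 \2/p'
}

# Example:
#   ssh localhost 2>&1 | offending_key_location
```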

Saturday, February 13, 2016

How to set up password-less ssh to the slaves?

For setting up Hadoop on a cluster of machines, the master should be able to do a password-less ssh to start the daemons on all the slaves.

Classic MR - master starts TaskTracker and the DataNode on all the slaves.

MRv2 (next generation MR) - master starts NodeManager and the DataNode on all the slaves.

Here are the steps to set up password-less ssh. Ensure that port 22 is open on all the slaves (`telnet slave-hostname 22` should connect).

1) Install openssh-client on the master

sudo apt-get install openssh-client

2) Install openssh-server on all the slaves

sudo apt-get install openssh-server

3) Generate the ssh key

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

4) Copy the key to all the slaves (replace username appropriately as the user starting the Hadoop daemons). You will be prompted for the password.

ssh-copy-id -i $HOME/.ssh/ username@slave-hostname

5) If the master also acts as a slave, append the key to the master's own authorized_keys (`ssh localhost` should then work without a password)

cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys

If hdfs/mapreduce are run as different users, then steps 3, 4, and 5 have to be repeated for each user.
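
With more than a handful of slaves, step 4 becomes repetitive. One way to handle that is to generate the ssh-copy-id commands from a file that lists one slave hostname per line. The sketch below is built on assumptions: ssh_copy_id_commands is a hypothetical helper, the hosts file is whatever list you maintain, and the public key path in the example assumes the key from step 3 (ssh-keygen writes the public half next to the private key with a .pub suffix).

```shell
# Print one ssh-copy-id command per host listed in a file.
# $1 = public key file, $2 = remote user, $3 = file with one hostname per line
# Blank lines and lines starting with # are skipped.
ssh_copy_id_commands() {
    pubkey=$1
    user=$2
    hosts_file=$3
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$hosts_file" |
    while read -r host; do
        echo "ssh-copy-id -i $pubkey $user@$host"
    done
}

# Example: review the generated commands, then run them one by one.
#   ssh_copy_id_commands "$HOME/.ssh/id_rsa.pub" username slaves.txt
```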

How to test?

1) Run `ssh user@slave-hostname`. It should get connected without prompting for a password.
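
The same test can be run against several slaves without any risk of hanging at a password prompt. The sketch below relies on the OpenSSH options BatchMode=yes (never ask for a password) and ConnectTimeout. The helper check_passwordless_ssh and the SSH_CMD override are hypothetical names introduced for illustration.

```shell
# Command used to connect; can be overridden, e.g. for a dry run.
SSH_CMD=${SSH_CMD:-ssh}

# Try a key-only login to every user@host argument and report the result.
# Returns non-zero if at least one login failed.
check_passwordless_ssh() {
    failed=0
    for target in "$@"; do
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
            echo "ok      $target"
        else
            echo "FAILED  $target"
            failed=1
        fi
    done
    return "$failed"
}

# Example:
#   check_passwordless_ssh user@slave-hostname user@localhost
```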

Adding a new SSH key to the ssh-agent

When generating an SSH key, you'll need to add your newly created (or existing) SSH key to the ssh-agent.
Before adding a new SSH key to the ssh-agent, you should have checked for existing SSH keys and, if needed, generated a new SSH key.
  1. Ensure ssh-agent is enabled:
    # start the ssh-agent in the background
    eval "$(ssh-agent -s)"
    Agent pid 59566
  2. Add your SSH key to the ssh-agent:
    ssh-add ~/.ssh/id_rsa
Tip: If you used an existing SSH key rather than generating a new SSH key, you'll need to replace id_rsa in the above command with the name of your existing private key file.
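
To confirm that the key was added, ssh-add -l lists the keys the agent currently holds. According to the OpenSSH manual page, its exit status also distinguishes "no keys loaded" from "no agent reachable". The sketch below turns that status into a message; describe_agent_status is a hypothetical helper.

```shell
# Translate the exit status of `ssh-add -l` into a short message.
describe_agent_status() {
    case "$1" in
        0) echo "agent is running and holds at least one key" ;;
        1) echo "agent is running but holds no keys" ;;
        2) echo "could not contact an agent" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Example:
#   ssh-add -l >/dev/null 2>&1; describe_agent_status $?
```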

Generating a new SSH key

After you've checked for existing SSH keys, you can generate a new SSH key to use for authentication.
Before generating a new SSH key, you should have checked for existing SSH keys.
  1. In the command line, paste the text below, substituting in your GitHub email address.
    ssh-keygen -t rsa -b 4096 -C ""
    # Creates a new ssh key, using the provided email as a label
    Generating public/private rsa key pair.
  2. When you're prompted to "Enter a file in which to save the key," press Enter. This accepts the default file location.
    Enter a file in which to save the key (/Users/you/.ssh/id_rsa): [Press enter]
  3. At the prompt, type a secure passphrase. For more information, see "Working with SSH key passphrases".
    Enter passphrase (empty for no passphrase): [Type a passphrase]
    Enter same passphrase again: [Type passphrase again]
  4. In the command line, copy the alphanumeric key fingerprint you see:
    The key fingerprint is:
    If you're using OpenSSH 6.8 or newer, the key fingerprint is:
  5. Add the SSH key fingerprint you've generated to the ssh-agent and your GitHub account. For more information, see "Adding a new SSH key to the ssh-agent" and "Adding a new SSH key to your GitHub account".
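
Since the steps above assume you have already checked for existing keys, a script that generates a key can make the same check first. A minimal sketch follows: key_path_is_free is a hypothetical helper, and KEY_LABEL in the example stands for whatever label you pass with -C.

```shell
# Succeed only if neither the private key file $1 nor its .pub counterpart
# exists yet.
key_path_is_free() {
    [ ! -e "$1" ] && [ ! -e "$1.pub" ]
}

# Example: generate a key only when nothing would be overwritten.
#   key_path_is_free "$HOME/.ssh/id_rsa" &&
#       ssh-keygen -t rsa -b 4096 -C "$KEY_LABEL" -f "$HOME/.ssh/id_rsa"
```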

Switching remote URLs from SSH to HTTPS

  1. Open Terminal (for Mac and Linux users) or the command prompt (for Windows users).
  2. Change the current working directory to your local project.
  3. List your existing remotes in order to get the name of the remote you want to change.
    git remote -v
    origin (fetch)
    origin (push)
  4. Change your remote's URL from SSH to HTTPS with the git remote set-url command.
    git remote set-url origin
  5. Verify that the remote URL has changed.
    git remote -v
    # Verify new remote URL
    origin (fetch)
    origin (push)
The next time you git fetch, git pull, or git push to the remote repository, you'll be asked for your GitHub username and password.
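
The two URL forms differ only in their prefix, so the HTTPS URL can be derived from the SSH one. The sketch below assumes the common git@HOST:OWNER/REPO.git form; ssh_to_https is a hypothetical helper, and user/repo in the example is a placeholder.

```shell
# Convert git@HOST:OWNER/REPO.git into https://HOST/OWNER/REPO.git.
ssh_to_https() {
    echo "$1" | sed 's#^git@\([^:]*\):#https://\1/#'
}

# Example:
#   git remote set-url origin "$(ssh_to_https git@github.com:user/repo.git)"
```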

Switching remote URLs from HTTPS to SSH

  1. Open Terminal (for Mac and Linux users) or the command prompt (for Windows users).
  2. Change the current working directory to your local project.
  3. List your existing remotes in order to get the name of the remote you want to change.
    git remote -v
    origin (fetch)
    origin (push)
  4. Change your remote's URL from HTTPS to SSH with the git remote set-url command.
    git remote set-url origin
  5. Verify that the remote URL has changed.
    git remote -v
    # Verify new remote URL
    origin (fetch)
    origin (push)
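
Going the other way, the SSH URL can be derived from the HTTPS one. As before, this is only a sketch that assumes the https://HOST/OWNER/REPO.git form; https_to_ssh is a hypothetical helper, and user/repo is a placeholder.

```shell
# Convert https://HOST/OWNER/REPO.git into git@HOST:OWNER/REPO.git.
https_to_ssh() {
    echo "$1" | sed 's#^https://\([^/]*\)/#git@\1:#'
}

# Example:
#   git remote set-url origin "$(https_to_ssh https://github.com/user/repo.git)"
```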


You may encounter these errors when trying to change a remote.

No such remote '[name]'

This error means that the remote you tried to change doesn't exist:
git remote set-url sofake
fatal: No such remote 'sofake'
Check that you've correctly typed the remote name.
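
A script can avoid this error by checking the remote name before calling set-url. The sketch below reads the list printed by git remote (one name per line); remote_exists is a hypothetical helper, and NEW_URL in the example stands for the URL you want to set.

```shell
# Read remote names on stdin (one per line) and succeed if $1 is among them.
remote_exists() {
    grep -qxF -- "$1"
}

# Example:
#   if git remote | remote_exists origin; then
#       git remote set-url origin "$NEW_URL"
#   else
#       echo "No such remote 'origin'"
#   fi
```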