Installing PIL in Mac OS

Posted on February 10, 2013 by Bharath Kumar

PIL (Python Imaging Library) is an image processing library for Python.

In this post I will show you how to install PIL.

Prerequisites:

Python should already be installed on the machine. To check, type ‘python’ in the terminal and see whether the interpreter starts.
Xcode should already be installed.

I installed PIL using pip.

If you don't have pip, open the terminal and type the following:

curl -O http://pypi.python.org/packages/source/p/pip/pip-0.7.2.tar.gz
tar xzf pip-0.7.2.tar.gz
cd pip-0.7.2
sudo /usr/bin/python setup.py install

This will install pip. Once pip is installed, you can install PIL with the following steps.

First, make sure you have libjpeg installed for JPEG handling. To do that, type:

curl -O http://www.ijg.org/files/jpegsrc.v8c.tar.gz
tar zxvf jpegsrc.v8c.tar.gz
cd jpeg-8c/
./configure
make
sudo make install

Then install PIL using pip

sudo pip install PIL


bashrc in Mac OS

Posted on January 29, 2013 by Bharath Kumar

If you are working with CUDA or OpenCV on Mac OS, you need to set the proper paths before you compile, and in some cases you need to set them every time you open a terminal. On Linux, there is a .bashrc file to which we can add all the paths, but on Mac OS I could not find one. We need to create it ourselves and have it loaded every time we open a terminal.

To make your own profile, open the terminal and type


sudo vi /etc/bashrc

At the end of this file, add the following line


source ~/.bashrc

Now if you restart the terminal, you will get a warning:


-bash: /Users/username/.bashrc: No such file or directory

Now create this file in your home directory:


vim ~/.bashrc

Add whatever paths you need to this file. For example, I would add

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
export LD_LIBRARY_PATH=/usr

Save the file.

Now restart the terminal. From now on, every new terminal loads the settings in ~/.bashrc.

Done and done! 🙂

Using OpenCL with Eclipse on Mac OS

Posted on December 7, 2012 by Bharath Kumar

It’s very easy to compile OpenCL code on a Mac: simply pass the flag ‘-framework OpenCL’ during compilation.

But when dealing with large projects, it becomes necessary to have a visual structure of the code tree, and the command line alone won’t do the trick. In this post, I will show how to integrate an OpenCL project with the Eclipse IDE.

Eclipse is an open-source, cross-platform IDE for developing applications in many languages. First, make sure that you have OpenCL installed on your machine. Recent versions of Xcode ship with OpenCL, so no extra installation is required.

Install Eclipse IDE for C/C++ from here
Once installed, open the application
From File menu, select New -> C++ Project

Once the project is created, add a C++ source file named test.cpp and write the following code into it:

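#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(int argc, char* const argv[])
{
    cl_platform_id platform;
    cl_uint num_devices = 0, i;

    //Use the first available platform (passing NULL as the platform is implementation-defined)
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
    {
        fprintf(stderr, "Unable to get an OpenCL platform\n");
        return 1;
    }

    //First query the number of devices, then fetch their ids
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices) != CL_SUCCESS || num_devices == 0)
    {
        fprintf(stderr, "No OpenCL devices found\n");
        return 1;
    }

    cl_device_id* devices = (cl_device_id*)calloc(num_devices, sizeof(cl_device_id));
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_devices, devices, NULL);

    char buf[128];
    for (i = 0; i < num_devices; i++)
        {
            //Printing the device name and the OpenCL version it supports
            clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(buf), buf, NULL);
            fprintf(stdout, "Device %s supports ", buf);

            clGetDeviceInfo(devices[i], CL_DEVICE_VERSION, sizeof(buf), buf, NULL);
            fprintf(stdout, "%s\n", buf);
        }
    free(devices);
    return 0;
}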

The above code prints the name of every OpenCL device available on your machine, along with the OpenCL version it supports.

Right-click the project and select Properties. Under C/C++ Build, choose Settings. Click MacOS X C++ Linker; the Command field should read g++ -framework OpenCL
Click GCC C++ Compiler and paste this into the Includes field:
```
/System/Library/Frameworks/OpenCL.framework/Versions/A/Headers
```
Choose other flags as needed

Then right-click the project, select Build Project, and run the program.
This is the console output on my machine:

Device GeForce 320M supports OpenCL 1.0
Device Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz supports OpenCL 1.0

Update: To use the same settings again for a different project, try the following

Start an Empty C++ project
Add a cpp source file and write some openCL code into it
In the Project Explorer, right-click the project and click Import… to open the import dialog box.
In C/C++, select C/C++ Project Settings, then click Next

Create the file ‘OpenCL_properties.xml’ and write the following into it:

<?xml version="1.0" encoding="UTF-8"?>
<cdtprojectproperties>
<section name="org.eclipse.cdt.internal.ui.wizards.settingswizards.IncludePaths">
<language name="Object File">

</language>
<language name="Assembly Source File">

</language>
<language name="C++ Source File">
<includepath>/System/Library/Frameworks/OpenCL.framework/Versions/A/Headers</includepath>

</language>
<language name="C Source File">

</language>
</section>
<section name="org.eclipse.cdt.internal.ui.wizards.settingswizards.Macros">
<language name="Object File">

</language>
<language name="Assembly Source File">

</language>
<language name="C++ Source File">

</language>
<language name="C Source File">

</language>
</section>
</cdtprojectproperties>

For the Settings File, choose the XML file above.
One more thing is left to do: right-click the project and select Properties.
Under MacOS X C++ Linker, make sure the command is g++ -framework OpenCL.
Done and done! You can build and run the project now, and reuse the same configuration file for other projects.

Asilomar Conference

Posted on November 18, 2012 by Bharath Kumar

During my final year at the Indian Institute of Technology Madras, I worked on accelerating a decoder for Polar Codes. My project was titled ‘A GPU implementation of Belief Propagation Decoder for Polar Codes’: I used GPUs to speed up the decoding process.

Polar Codes are a class of capacity-achieving codes for Binary-input Discrete Memoryless Channels (B-DMCs). They are based on the concept of channel polarization: given N independent copies of a channel, we can synthesize another set of N channels that show a polarization effect, in the sense that as N grows large, the synthesized channels tend to become either completely noisy or completely noise-free, and the fraction of noise-free channels approaches the (symmetric) capacity of the original channel. Channel polarization therefore suggests transmitting information at rate 1 over the noise-free channels, while fixing the symbols on the noisy channels to values known to both the sender and the receiver.

I implemented the Belief Propagation decoder for Polar Codes using GPUs and observed a good throughput rate. I submitted an extended abstract at Asilomar Conference on Signals, Systems and Computers, 2012. It was accepted and I was invited to present my results at the conference which was held at Asilomar Conference Grounds, Pacific Grove, California from November 4th to 7th.

Firstly, the place was really awesome. It was along the coast, and the conference housing had a great view of the ocean. The lodging looked like one of those medieval-style buildings.

Conference Housing

The place itself was really beautiful, and the weather was pleasant. I had been longing for a good beach for quite some time. My professor joined me for the conference, and after checking into the hotel room, the first thing he said was, “Come on, let’s go to the beach!” 😉

Asilomar Beach, Pacific Grove, California

It was a great place to watch the sunset.

Sunset at Asilomar beach

This was my paper

I might have been one of the youngest people to present at this conference. These are my presentation slides. My paper will soon be published in the IEEE proceedings. 🙂

presentation

Search and replace a word in a file using Vim editor

Posted on October 22, 2012 by Bharath Kumar

There are many times when you want to replace multiple occurrences of a word in a file with another word. In small files you can do it by hand, but in large files that is impractical. The Vim editor provides a simple command for this. Say you want to replace the word ‘begin’ with the word ‘end’ in a file ‘test’.

In the terminal, open the file:

vim test

Once you have the file opened, simply type this command

:%s/begin/end/g

This command replaces all occurrences of ‘begin’ with ‘end’ in the file.

Vector Addition – OpenCL

Posted on October 16, 2012 by Bharath Kumar

In this post, I will show you how to write vector addition code using OpenCL. The code is listed below:

//Includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>

#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

#define DATA_SIZE 10

using namespace std;

const char *ProgramSource =
"__kernel void add(__global float *inputA, __global float *inputB, __global float *output)\n"\
"{\n"\
"  size_t id = get_global_id(0);\n"\
"  output[id] = inputA[id] + inputB[id];\n"\
"}\n";

int main(void)
{
cl_context context;
cl_context_properties properties[3];
cl_kernel kernel;
cl_command_queue command_queue;
cl_program program;
cl_int err;
cl_uint num_of_platforms=0;
cl_platform_id platform_id;
cl_device_id device_id;
cl_uint num_of_devices=0;
cl_mem inputA, inputB, output;

size_t global;

float inputDataA[DATA_SIZE]={1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
float inputDataB[DATA_SIZE]={1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
float results[DATA_SIZE]={0};

int i;

// retrieve the first available platform
if (clGetPlatformIDs(1, &platform_id, &num_of_platforms)!= CL_SUCCESS)
{
printf("Unable to get platform_id\n");
return 1;
}

// try to get a supported GPU device
if (clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_of_devices) != CL_SUCCESS)
{
printf("Unable to get device_id\n");
return 1;
}

// context properties list - must be terminated with 0
properties[0]= CL_CONTEXT_PLATFORM;
properties[1]= (cl_context_properties) platform_id;
properties[2]= 0;

// create a context with the GPU device
context = clCreateContext(properties,1,&device_id,NULL,NULL,&err);

// create command queue using the context and device
command_queue = clCreateCommandQueue(context, device_id, 0, &err);

// create a program from the kernel source code
program = clCreateProgramWithSource(context,1,(const char **) &ProgramSource, NULL, &err);

// compile the program
if (clBuildProgram(program, 0, NULL, NULL, NULL, NULL) != CL_SUCCESS)
{
printf("Error building program\n");
return 1;
}

// specify which kernel from the program to execute
kernel = clCreateKernel(program, "add", &err);

// create buffers for the input and output

inputA = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);
inputB = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);
output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * DATA_SIZE, NULL, NULL);

// load data into the input buffer
clEnqueueWriteBuffer(command_queue, inputA, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputDataA, 0, NULL, NULL);
clEnqueueWriteBuffer(command_queue, inputB, CL_TRUE, 0, sizeof(float) * DATA_SIZE, inputDataB, 0, NULL, NULL);

// set the argument list for the kernel command
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputA);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &inputB);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &output);

global=DATA_SIZE;

// enqueue the kernel command for execution
clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
clFinish(command_queue);

// copy the results from out of the output buffer
clEnqueueReadBuffer(command_queue, output, CL_TRUE, 0, sizeof(float) *DATA_SIZE, results, 0, NULL, NULL);

// print the results
printf("output: ");

for(i=0;i<DATA_SIZE; i++)
{
printf("%f ",results[i]);
}
printf("\n");

// cleanup - release OpenCL resources
clReleaseMemObject(inputA);
clReleaseMemObject(inputB);
clReleaseMemObject(output);
clReleaseProgram(program);
clReleaseKernel(kernel);
clReleaseCommandQueue(command_queue);
clReleaseContext(context);

return 0;

}

To compile the code on a Mac, open the terminal and type:

g++ -o add add.c -framework OpenCL

The output is:

output: 2.000000 4.000000 6.000000 8.000000 10.000000 12.000000 14.000000 16.000000 18.000000 20.000000

Take a look at line number 53 (the clGetDeviceIDs( ) call): CL_DEVICE_TYPE_GPU is used to select a GPU device. Other values for this flag include CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_ALL, etc. Refer to the OpenCL documentation for more details (see here).

OpenCL code structure

Posted on October 8, 2012 by Bharath Kumar

OpenCL is the first open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices. OpenCL (Open Computing Language) greatly improves speed and responsiveness for a wide spectrum of applications in numerous market categories from gaming and entertainment to scientific and medical software.

The Khronos consortium that manages the OpenCL standard has developed an applications programming interface (API) that is general enough to run on significantly different architectures while being adaptable enough that each hardware platform can still obtain high performance. The OpenCL API is a C API, with a C++ wrapper API that is defined in terms of the C API. There are third-party bindings for many languages, including Java, Python, and .NET. The code that executes on an OpenCL device, which in general is not the same device as the host CPU, is written in the OpenCL C language. OpenCL C is a restricted version of the C99 language with extensions appropriate for executing data-parallel code on a variety of heterogeneous devices.

Let’s get started with the OpenCL program structure. Later, I will also point out the analogy between CUDA and OpenCL commands, so that CUDA and OpenCL can be learned side by side. In general, writing OpenCL code follows these steps:

Discover and initialize the platforms
Discover and initialize the devices
Create a context
Create a command queue
Create device buffers
Write host data to device buffers
Create and compile the program
Create the kernel
Set the kernel arguments
Configure the work-item structure
Enqueue the kernel for execution
Read the output buffer back to the host
Release OpenCL resources

Discover and initialize the platforms

In the OpenCL platform model, there is a single host that coordinates execution on one or more devices. The API function clGetPlatformIDs( ) is used to discover the set of available platforms for a given system.

Discover and initialize the devices

clGetDeviceIDs( ) is used to discover the devices. clGetDeviceInfo( ) is called to retrieve information such as name, type, and vendor from each device.

Create a context

A context is an abstract container that exists on the host. A context coordinates the mechanisms for host–device interaction, manages the memory objects that are available to the devices, and keeps track of the programs and kernels that are created for each device. The API function to create a context is clCreateContext( ).

Create a command queue

Communication with a device occurs by submitting commands to a command queue. The command queue is the mechanism that the host uses to request action by the device. The API clCreateCommandQueue( ) is used to create a command queue and associate it with a device.

Create device buffers

In order for data to be transferred to a device, it must first be encapsulated as a memory object. The API function clCreateBuffer( ) allocates the buffer and returns a memory object.

Write host data to device buffers

Data contained in host memory is transferred to and from an OpenCL buffer using the commands clEnqueueWriteBuffer( ) and clEnqueueReadBuffer( ), respectively.

Create and compile the program

OpenCL C code is called a program. A program is a collection of functions called kernels, where kernels are units of execution that can be scheduled to run on a device.

The process of creating a kernel is as follows:

The OpenCL C source code is stored in a character string. If the source code is stored in a file on a disk, it must be read into memory and stored as a character array.
The source code is turned into a program object, cl_program, by calling clCreateProgramWithSource( ).
The program object is then compiled, for one or more OpenCL devices, with clBuildProgram( ).

Create the kernel

Next, we obtain a cl_kernel object that can be executed on a device by extracting the kernel from the cl_program. Extracting a kernel from a program is similar to obtaining an exported function from a dynamic library. The name of the kernel that the program exports is used to request it from the compiled program object. The name of the kernel is passed to clCreateKernel( ), along with the program object, and the kernel object will be returned if the program object was valid and the particular kernel is found.

Set the kernel arguments

Each kernel argument is set individually using the function clSetKernelArg( ).

Configure the work-item structure

Define an index space (global work size) of work items for execution.

Enqueue the kernel for execution

Requesting that a device begin executing a kernel is done with a call to clEnqueueNDRangeKernel( ).

Read the output buffer back to the host

Use clEnqueueReadBuffer( ) to copy the output buffer back to host memory.

Release OpenCL resources

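This is done using the appropriate clRelease functions, such as clReleaseMemObject( ), clReleaseKernel( ), clReleaseProgram( ), clReleaseCommandQueue( ) and clReleaseContext( ).

In the next post, I will show the OpenCL equivalent of the CUDA vector addition code from the previous post, followed by the command analogy between CUDA and OpenCL.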

References:

http://www.khronos.org/opencl/
http://en.wikipedia.org/wiki/OpenCL
Book – Heterogeneous Computing With OpenCL: http://www.amazon.com/Heterogeneous-Computing-OpenCL-Benedict-Gaster/dp/0123877660

Vector addition – CUDA

Posted on September 26, 2012 by Bharath Kumar

In this post, I will show you how to write vector addition code using CUDA. The code is listed below:


// Includes
#include <stdio.h>
// CUDA includes
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cuda_runtime_api.h>

#define N 10               //Size of the array

//Kernel function
__global__ void add (float* a, float* b, float* c)
{
int tid = threadIdx.x + blockIdx.x * blockDim.x;     // A thread id
if (tid < N)
    {
        c[tid] = a[tid] + b[tid];
    }
}

int main()
{
    //Initialising inputs
    float* a;
    float* b;
    float* c;
    float* dev_a;
    float* dev_b;
    float* dev_c;

    //CUDA event timers
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float time;

    //Allocating memory on the host
    a = (float*)malloc(N*sizeof(float));
    b = (float*)malloc(N*sizeof(float));
    c = (float*)malloc(N*sizeof(float));

    for (int i = 0; i < N; ++i)
        {
            a[i] = (float)i;
            b[i] = (float)i;
            c[i] = 0.0;
        }

    //Allocating memory on the device
    cutilSafeCall(cudaMalloc( (void**)&dev_a, N*sizeof(float) ));
    cutilSafeCall(cudaMalloc( (void**)&dev_b, N*sizeof(float) ));
    cutilSafeCall(cudaMalloc( (void**)&dev_c, N*sizeof(float) ));

    //Copying data from host to device
    cutilSafeCall(cudaMemcpy(dev_a, a, N*sizeof(float), cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy(dev_b, b, N*sizeof(float), cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy(dev_c, c, N*sizeof(float), cudaMemcpyHostToDevice));

    //Starting CUDA timer
    cudaEventRecord(start, 0);

    //Launching kernel
    add<<<N,1 >>>(dev_a, dev_b, dev_c);
    cudaThreadSynchronize();

    //Stopping CUDA timer
    cudaEventRecord(stop, 0);

    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&time, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    printf("Time taken by kernel: %f\n", time);

    //Copying data back to host
    cutilSafeCall(cudaMemcpy(c, dev_c, N*sizeof(float), cudaMemcpyDeviceToHost));
    for(int i = 0; i < N; ++i)
        {
            printf("c[%d] = %f\n",i,c[i]);
        }

    //Freeing memory
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    free(a);
    free(b);
    free(c);

    return 0;
}

Let me describe the code in detail.

Lines 1-6 include the necessary header files.

Line 8 defines the size of the array. A size of 10 is far too small for a GPU to pay off in vector addition, but it is fine for experimenting.

In lines 11-18, the kernel function is defined. tid is a unique global thread id, computed from the thread index, the block index and the block size; the check tid < N prevents threads beyond the end of the array from writing out of bounds.

The main function starts at line 20. In lines 23-28, the host and device pointers are declared.

In lines 31-33, CUDA events are created to measure the time taken on the GPU. CPU timers might not have enough precision to measure the short time the kernel takes on the GPU.

In lines 37-39, memory is allocated on the host. In lines 41-46, the inputs are initialized.

In lines 49-51, memory is allocated on the device using cudaMalloc. cutilSafeCall checks the return value of each call; if the call fails, it reports the error along with the file and line number where it occurred. It’s good practice to check every CUDA call this way, to catch bugs early.

In lines 54-56, data is copied from host to device. This is done using cudaMemcpy. cudaMemcpyHostToDevice means the copy is from host to device.

In line 59, the CUDA timer is started.

In line 62, the CUDA kernel is launched using the execution configuration syntax <<< >>>. The first argument inside it is the number of blocks and the second is the number of threads per block, so <<<N,1>>> launches N blocks of one thread each. I will discuss these numbers in more detail in future posts.

cudaThreadSynchronize in line 63 blocks the host until the device has finished all previously issued work, in this case the kernel. Kernel launches are asynchronous, so without it the host would move on immediately.

In line 66, we stop the CUDA timer.

In line 76, the results are copied back from the device to the host. Note the cudaMemcpyDeviceToHost flag.

In lines 83-89, we free the device and host memory.

Makefile:
Here is a general Makefile for compiling CUDA code with the SDK's common.mk. I will discuss the flags used in future posts.

# Add the root directory for the NVidia SDK installation
ROOTDIR := [Path to NVIDIA_CUDA SDK]/C/src
# Keep the executable here
ROOTBINDIR := bin

# Add source files here
EXECUTABLE := vectoradd
# Cuda source files (compiled with cudacc)
CUFILES_sm_20 := vectoradd.cu
# CUDA Dependencies
CU_DEPS :=  \
# C/C++ source files (compiled with gcc / c++)
CCFILES := \

# Do not link with CUTIL
OMIT_CUTIL_LIB := 1

# Additional libraries needed by the project -po maxrregcount=15
USECUFFT := 1
CFLAGS = -pg -lc -fPIC -Wall -litpp -lblas -llapack
CUDACCFLAGS := --use_fast_math --ptxas-options=-v
#############################################################
# Rules and targets

include $(ROOTDIR)/../common/common.mk

Then type make in the terminal. The output:

ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_30'
ptxas info : Function properties for _Z3addPfS_S_
 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 332 bytes cmem[0]
ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_10'
ptxas info : Used 4 registers, 12+16 bytes smem, 4 bytes cmem[1]
ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_20'
ptxas info : Function properties for _Z3addPfS_S_
 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 44 bytes cmem[0]

The executable is in the bin directory (bin/darwin/release). Run it:

./vectoradd

Time taken by kernel: 0.134816
c[0] = 0.000000
c[1] = 2.000000
c[2] = 4.000000
c[3] = 6.000000
c[4] = 8.000000
c[5] = 10.000000
c[6] = 12.000000
c[7] = 14.000000
c[8] = 16.000000
c[9] = 18.000000

Setting Up Single-Node Hadoop Cluster on a Mac

Posted on September 22, 2012 by Bharath Kumar

If you are a software engineer just getting started with cloud computing platforms, you will want a local instance to learn and experiment with, without having to go down the route of virtualization.

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project being built and used by a global community of contributors, using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

In this post, I will describe how to set up a single node Apache Hadoop cluster in Mac OS (10.6.8).

Ensure Java is installed. For me it was already pre-installed.
To check if it’s installed, open the terminal and type java -version
Terminal output:
```
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
```
If you don’t have it, you can get it directly from Apple’s site here.
Download the Hadoop tar file from here and extract it wherever you want, preferably in a non-root folder to avoid permission issues. My directory was /Users/bharath/Documents/Hadoop/hadoop, so I set HADOOP_HOME to it:
```
export HADOOP_HOME=/Users/bharath/Documents/Hadoop/hadoop
```

Now cd into the conf directory inside the Hadoop folder and modify hadoop-env.sh like this:

# The java implementation to use. Required.
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000

Next, modify hdfs-site.xml, core-site.xml and mapred-site.xml under conf:
hdfs-site.xml

 <configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.name.dir</name>
 <value>/Users/bharath/Documents/Hadoop/hadoop/dfs/name</value>
 </property>
 </configuration>

core-site.xml

 <configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:9000</value>
 </property>
 <property>
 <name>hadoop.tmp.dir</name>
 <value>/Users/bharath/Documents/Hadoop/hadoop/tmp</value>
 </property>
 </configuration>

mapred-site.xml

 <configuration>
 <property>
 <name>mapred.job.tracker</name>
 <value>localhost:9001</value>
 </property>
 </configuration>

Next, set up ssh on your Mac.
Make sure Remote Login is turned on. To check this, open System Preferences. Under Internet & Wireless, open Sharing and make sure Remote Login is checked.
We need to prepare a password-less login into localhost.
First, type ssh localhost in the terminal. If it asks for a password, follow the steps below; otherwise you are good to go.
In the terminal, type
```
ssh-keygen -t rsa -P ""
```
This generates a key pair without a passphrase. The keys are stored in the .ssh folder of the home directory of the user who ran the command: ~/.ssh for your own account, or /var/root/.ssh if you ran it as root (sudo su).
Type ls ~/.ssh to check that id_rsa and id_rsa.pub were created.
id_rsa.pub is the public key. For password-less login, it must be appended to authorized_keys (known_hosts only records the keys of hosts you have connected to).
To copy the key, use the command:
```
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
```

Setting up HDFS for the first time
cd into HADOOP_HOME
Type

bin/hadoop namenode -format

The output should be something like below. You should see a statement “…. successfully formatted”

12/09/19 15:44:53 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG: host = Bharath-Kumar-Reddys-MacBook-Air.local/172.20.10.2
  STARTUP_MSG: args = [-format]
  STARTUP_MSG: version = 0.20.2+737
  STARTUP_MSG: build = git://ubuntu64-build01.sf.cloudera.com/ on branch -r 98c55c28258aa6f42250569bd7fa431ac657b  dbd; compiled by 'root' on Tue Dec 14 11:50:19 PST 2010
  ************************************************************/
  12/09/19 15:44:54 INFO namenode.FSNamesystem: fsOwner=bharath
  12/09/19 15:44:54 INFO namenode.FSNamesystem: supergroup=supergroup
  12/09/19 15:44:54 INFO namenode.FSNamesystem: isPermissionEnabled=true
  12/09/19 15:44:54 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
  accessTokenLifetime=0 min(s)
  12/09/19 15:44:54 INFO common.Storage: Image file of size 113 saved in 0 seconds.
  12/09/19 15:44:54 INFO common.Storage: Storage directory /Users/bharath/Documents/Hadoop/hadoop/dfs/name has been successfully formatted.
  12/09/19 15:44:54 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
 SHUTDOWN_MSG: Shutting down NameNode at Bharath-Kumar-Reddys-MacBook-Air.local/172.20.10.2

ssh into localhost:
```
ssh localhost
```

Start the hadoop daemons

$HADOOP_HOME/bin/start-all.sh

The output should be like this:

starting namenode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-namenode-Bharath-Kumar-Reddys-MacBook-Air.local.out
localhost: starting datanode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-datanode-Bharath-Kumar-Reddys-MacBook-Air.local.out
localhost: starting secondarynamenode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-secondarynamenode-Bharath-Kumar-Reddys-MacBook-Air.local.out
starting jobtracker, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-jobtracker-Bharath-Kumar-Reddys-MacBook-Air.local.out
localhost: starting tasktracker, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-tasktracker-Bharath-Kumar-Reddys-MacBook-Air.local.out

Check that all the Hadoop daemons are running:
```
$JAVA_HOME/bin/jps
```
The output of the above command:
```
2490 Jps
2206 TaskTracker
2071 SecondaryNameNode
1919 NameNode
2130 JobTracker
1995 DataNode
```
So, all the nodes are up and running. 🙂
To see the list of open ports, use the command:
```
lsof -i | grep LISTEN 
```

Test the examples

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar

The output :

An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.

Run the pi example. The two arguments are the number of map tasks (10) and the number of samples per map (100):

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 10 100

The last few lines of the output:

```
...
12/09/19 16:14:20 INFO mapred.JobClient: Reduce output records=0
12/09/19 16:14:20 INFO mapred.JobClient: Spilled Records=40
12/09/19 16:14:20 INFO mapred.JobClient: Map output bytes=180
12/09/19 16:14:20 INFO mapred.JobClient: Map input bytes=240
12/09/19 16:14:20 INFO mapred.JobClient: Combine input records=0
12/09/19 16:14:20 INFO mapred.JobClient: Map output records=20
12/09/19 16:14:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=1240
12/09/19 16:14:20 INFO mapred.JobClient: Reduce input records=20
Job Finished in 76.438 seconds
Estimated value of Pi is 3.14800000000000000000
```
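A Monte Carlo estimate of Pi relies on the fact that about π/4 of points sampled uniformly in the unit square fall inside the quarter circle x² + y² ≤ 1, so π ≈ 4 × (points inside) / (total points). With 10 maps × 100 samples = 1,000 points, the 3.148 above corresponds to 787 points landing inside. As an illustration of the method only, here is a minimal pure-Python sketch that mimics the map/reduce split with pseudo-random points. It is not Hadoop's own PiEstimator code, and `map_task`, `reduce_task` and `estimate_pi` are hypothetical names.

```python
import random


def map_task(task_id: int, n_samples: int) -> tuple[int, int]:
    """Hypothetical 'map': sample points in the unit square and count
    how many fall inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(task_id)  # per-task seed so runs are reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, n_samples


def reduce_task(results: list[tuple[int, int]]) -> float:
    """Hypothetical 'reduce': add up the counts from every map
    and turn the overall fraction into an estimate of pi."""
    inside = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return 4.0 * inside / total


def estimate_pi(n_maps: int, n_samples: int) -> float:
    # Each map could run on a different node; here they run one after another.
    results = [map_task(i, n_samples) for i in range(n_maps)]
    return reduce_task(results)


if __name__ == "__main__":
    # Same shape as "pi 10 100": 10 maps x 100 samples = 1,000 points.
    print(estimate_pi(10, 100))
```

The error of a plain Monte Carlo estimate shrinks roughly as 1/√N, so going from 1,000 to 100,000 points should make the estimate about ten times more accurate.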

To stop the nodes, type
```
$HADOOP_HOME/bin/stop-all.sh
```

Done and done!

If you get an error like

```
java.io.IOException: Tmp directory hdfs://localhost:9000/user/bharath/PiEstimator_TMP_3_141592654 already exists. Please remove it first.
```

then remove the leftover directory by typing the following in the terminal:

```
$HADOOP_HOME/bin/hadoop fs -rmr hdfs://localhost:9000/user/bharath/PiEstimator_TMP_3_141592654
```


Big Data – Information in the future and for the future!

Posted on September 20, 2012 by Bharath Kumar

Every day, we create 2.5 quintillion bytes (2.5 exabytes) of data, so much that 90% of the data in the world today has been created in the last two years alone. Consider social networking sites like Facebook or Twitter: their users constantly post comments, update their status and upload photos. Imagine how large that data must be. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records and cell phone GPS signals, to name a few. This data is big data.

Visualization of all editing activity by user “Pearle” on Wikipedia (Pearle is a robot)

The three Vs (volume, velocity and variety) are commonly used to characterize different aspects of big data. Big data is data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or doesn't fit the structures of your database architecture. To gain value from this data, you must choose an alternative way to process it. McKinsey has published a research report on big data.

Ok, now we have Big Data. What can be done with it?!

We can extract insight and useful information, in context, from data of immense volume, variety and velocity, beyond what was previously possible.

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, ranging as of 2012 from a few dozen terabytes to many petabytes of data in a single data set. To deal with this difficulty, a new class of “big data” tools, such as the Apache Hadoop platform, has arisen to make sense of large quantities of data.

Some examples of big data:

In total, the four main detectors at the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010 (13,000 terabytes)
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress
Facebook handles 40 billion photos from its user base
FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide
The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates
Decoding the human genome originally took 10 years to process; now it can be achieved in one week
Computational social science – Tobias Preis et al. used Google Trends data to demonstrate that Internet users from countries with a higher per capita gross domestic product (GDP) are more likely to search for information about the future than information about the past. The findings suggest there may be a link between online behavior and real-world economic indicators. The authors of the study examined Google query logs made by Internet users in 45 different countries in 2010 and calculated the ratio of the volume of searches for the coming year (‘2011’) to the volume of searches for the previous year (‘2009’), which they call the ‘future orientation index’. They compared the future orientation index to the per capita GDP of each country and found a strong tendency for countries in which Google users enquire more about the future to exhibit a higher GDP. The results hint that there may be a relationship between the economic success of a country and the information-seeking behavior of its citizens captured in big data.
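Taking at face value the estimate in the list above that business data doubles every 1.2 years, the volume grows exponentially. Writing V_0 for today's volume and V(t) for the volume after t years (symbols introduced here only for illustration), twelve years means ten doublings, roughly a thousandfold increase:

```latex
V(t) = V_0 \cdot 2^{t/1.2}
\qquad\Longrightarrow\qquad
\frac{V(12)}{V_0} = 2^{12/1.2} = 2^{10} = 1024
```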

Consider a big organization. You have some data to process, so you get a cluster to process it. Soon the data grows in volume. You can add more nodes to the cluster, but this can be very expensive, and on top of that there is the cost of maintenance. And to what extent can you keep growing your cluster?! A better alternative would be to rent a cluster on a time basis to process your data. Once you are done, you stop using it, and whenever you need it again, you rent it on demand. This is exactly what Amazon Web Services provides.

Amazon Web Services (AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon S3. Apparently, you can get an instance running on the Amazon cloud for as little as 1 rupee per hour!

Coming back to Hadoop, two main components constitute the cluster:
1. Hadoop Distributed File System (HDFS) – Storage Layer
2. MapReduce – Computation Layer

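Map-Reduce is a simple data-parallel programming model designed for scalability and fault tolerance. A job runs in two phases: the map phase processes input records in parallel and emits key/value pairs, and the reduce phase combines all the values that share the same key.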
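To make the two phases concrete, here is a minimal single-process sketch of word counting (the same task as the `wordcount` example listed earlier), written in plain Python rather than against Hadoop's Java API. The names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical; the explicit `shuffle` step stands in for the grouping by key that the framework performs between map and reduce. It shows only the data-parallel structure, not distribution across nodes or fault tolerance.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all the values that share one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # Each line could be mapped on a different node; here we simply loop.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data moves fast"]))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both phases can be spread across many machines without changing the result.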

There are two types of Hadoop clusters:

Cloud Cluster
Local Cluster

For learning purposes, we can get started with a single-node local cluster on a personal machine. In the next post, I will show you how to set up a single-node Hadoop cluster.


GPU Enthusiast

Supercomputing for Humans
