Vector addition – CUDA

In this post, I will show you how to write a simple vector addition code using CUDA. The code is listed below:


// Includes
#include <stdio.h>
// CUDA includes
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cuda_runtime_api.h>

#define N 10               //Size of the array

//Kernel function
__global__ void add (float* a, float* b, float* c)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;     // Unique global thread index
    if (tid < N)
    {
        c[tid] = a[tid] + b[tid];
    }
}

int main()
{
    //Initialising inputs
    float* a;
    float* b;
    float* c;
    float* dev_a;
    float* dev_b;
    float* dev_c;

    //CUDA event timers
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float time;

    //Allocating memory on the host
    a = (float*)malloc(N*sizeof(float));
    b = (float*)malloc(N*sizeof(float));
    c = (float*)malloc(N*sizeof(float));

    for (int i = 0; i < N; ++i)
        {
            a[i] = (float)i;
            b[i] = (float)i;
            c[i] = 0.0;
        }

    //Allocating memory on the device
    cutilSafeCall(cudaMalloc( (void**)&dev_a, N*sizeof(float) ));
    cutilSafeCall(cudaMalloc( (void**)&dev_b, N*sizeof(float) ));
    cutilSafeCall(cudaMalloc( (void**)&dev_c, N*sizeof(float) ));

    //Copying data from host to device
    cutilSafeCall(cudaMemcpy(dev_a, a, N*sizeof(float), cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy(dev_b, b, N*sizeof(float), cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy(dev_c, c, N*sizeof(float), cudaMemcpyHostToDevice));

    //Starting CUDA timer
    cudaEventRecord(start, 0);

    //Launching kernel
    add<<<N, 1>>>(dev_a, dev_b, dev_c);
    cudaThreadSynchronize();

    //Stopping CUDA timer
    cudaEventRecord(stop, 0);

    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&time, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    printf("Time taken by kernel: %f\n", time);

    //Copying data back to host
    cutilSafeCall(cudaMemcpy(c, dev_c, N*sizeof(float), cudaMemcpyDeviceToHost));
    for(int i = 0; i < N; ++i)
        {
            printf("c[%d] = %f\n",i,c[i]);
        }

    //Freeing memory
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);

    free(a);
    free(b);
    free(c);

    return 0;
}

Let me describe the code in detail.

Lines 1-6 include the necessary header files.

Line 8 defines the size of the array. A size of 10 is far too small to make GPU vector addition worthwhile, but for experimental purposes it is fine.

In lines 11-18, the kernel function is defined. tid is a unique global thread index, computed from the thread's index within its block (threadIdx.x) plus the block's index within the grid (blockIdx.x) multiplied by the block size (blockDim.x).

The main function starts at line 20. In lines 23-28, the host and device pointers for the inputs and the output are declared.

In lines 31-33, CUDA event timers are created to measure the time taken on the GPU. CPU timers might not have enough precision to measure the short times taken by a kernel on the GPU.

In lines 37-39, memory is allocated on the host. In lines 41-46, the inputs are initialized.

In lines 49-51, memory is allocated on the device using cudaMalloc. cutilSafeCall makes sure that each call executes properly: if a call returns an error, cutilSafeCall reports it along with the offending line number. It's good practice to do this, to catch bugs early.
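cutilSafeCall comes from the cutil library bundled with the SDK. If you prefer not to depend on cutil, a minimal error-checking macro built only on the CUDA runtime might look like this (a sketch of my own; the macro name CUDA_CHECK is arbitrary):

// Minimal error-checking helper using only the CUDA runtime (no cutil).
// Requires <stdio.h> and <stdlib.h>. Wrap any call that returns cudaError_t.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            printf("CUDA error '%s' at %s:%d\n",                      \
                   cudaGetErrorString(err), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage, for example:
// CUDA_CHECK(cudaMalloc((void**)&dev_a, N*sizeof(float)));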

In lines 54-56, data is copied from host to device using cudaMemcpy. The cudaMemcpyHostToDevice flag means the copy goes from host memory to device memory.

In line 59, the CUDA timer is started.

In line 62, the CUDA kernel is launched using the execution configuration syntax <<< >>>. The first argument is the number of blocks and the second is the number of threads per block; here we launch N blocks of one thread each, so tid is simply the block index. I will discuss how to choose these numbers in future posts, but a more typical configuration for large arrays is sketched below.
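For example, a launch with many threads per block might look like this (a sketch, not part of the listing above; the variable names are my own). The if (tid < N) guard in the kernel is what makes it safe to round the number of blocks up:

// Hypothetical launch for a large N: 256 threads per block,
// with enough blocks to cover all N elements (rounded up).
int threadsPerBlock = 256;
int blocksPerGrid   = (N + threadsPerBlock - 1) / threadsPerBlock;
add<<<blocksPerGrid, threadsPerBlock>>>(dev_a, dev_b, dev_c);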

cudaThreadSynchronize in line 63 acts as a barrier: it blocks the host until all the threads launched by the kernel have finished, in this case until the end of the kernel.
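As a side note, cudaThreadSynchronize has since been deprecated in favour of cudaDeviceSynchronize, which does the same job and also returns any error raised by the kernel. On a newer toolkit the call would look something like this:

// Equivalent call on newer CUDA toolkits (cudaThreadSynchronize is deprecated)
cudaError_t err = cudaDeviceSynchronize();
if (err != cudaSuccess)
    printf("Kernel failed: %s\n", cudaGetErrorString(err));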

In line 66, we stop the CUDA timer. cudaEventElapsedTime then returns the time between the two events in milliseconds.

In line 76, the results are copied back from device to host. Note the cudaMemcpyDeviceToHost flag.

In lines 83-89, we free the device and host memory.

Makefile:
Here is a general Makefile for compiling CUDA code with the NVIDIA SDK build system. I will discuss the flags in more detail in future posts.

# Add the root directory for the NVidia SDK installation
ROOTDIR := [Path to NVIDIA_CUDA SDK]/C/src
# Keep the executable here
ROOTBINDIR := bin

# Add source files here
EXECUTABLE := vectoradd
# Cuda source files (compiled with cudacc)
CUFILES_sm_20 := vectoradd.cu
# CUDA Dependencies
CU_DEPS :=  \
# C/C++ source files (compiled with gcc / c++)
CCFILES := \

# Do not link with CUTIL
OMIT_CUTIL_LIB := 1

# Additional libraries needed by the project -po maxrregcount=15
USECUFFT := 1
CFLAGS = -pg -lc -fPIC -Wall -litpp -lblas -llapack
CUDACCFLAGS := --use_fast_math --ptxas-options=-v
#############################################################
# Rules and targets

include $(ROOTDIR)/../common/common.mk

Then type make in the terminal. Output:

ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_30'
ptxas info : Function properties for _Z3addPfS_S_
 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 332 bytes cmem[0]
ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_10'
ptxas info : Used 4 registers, 12+16 bytes smem, 4 bytes cmem[1]
ptxas info : Compiling entry function '_Z3addPfS_S_' for 'sm_20'
ptxas info : Function properties for _Z3addPfS_S_
 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 44 bytes cmem[0]

The executable is in the bin directory (bin/darwin/release). Run it:

./vectoradd

Time taken by kernel: 0.134816
c[0] = 0.000000
c[1] = 2.000000
c[2] = 4.000000
c[3] = 6.000000
c[4] = 8.000000
c[5] = 10.000000
c[6] = 12.000000
c[7] = 14.000000
c[8] = 16.000000
c[9] = 18.000000


Setting Up Single-Node Hadoop Cluster on a Mac

If you are a Software Engineer just getting started with cloud computing, you will want a local instance to learn and experiment on, without having to go down the route of virtualization.

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project being built and used by a global community of contributors, using the Java programming language. Yahoo! has been the largest contributor to the project, and uses Hadoop extensively across its businesses.

In this post, I will describe how to set up a single node Apache Hadoop cluster in Mac OS (10.6.8).

  1. Ensure Java is installed. For me it was already pre-installed.
    To check if it’s installed, open the terminal and type java -version
    Terminal output:

    java version "1.6.0_33"
    Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
    Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)

    If you don't have it, you can get it directly from Apple's site here.

  2. Download the Hadoop tar file from here and unpack it wherever you want, preferably in a non-root folder so that permission issues are avoided. My directory was /Users/bharath/Documents/Hadoop/hadoop. Set HADOOP_HOME to point to it:

    export HADOOP_HOME=/Users/bharath/Documents/Hadoop/hadoop
  3. Now cd into the conf directory in the Hadoop folder and modify hadoop-env.sh like this:
    # The java implementation to use. Required.
    export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
    # The maximum amount of heap to use, in MB. Default is 1000.
    export HADOOP_HEAPSIZE=2000
  4. Modify hdfs-site.xml, core-site.xml and mapred-site.xml under conf:
    hdfs-site.xml

     <configuration>
     <property>
     <name>dfs.replication</name>
     <value>1</value>
     </property>
     <property>
     <name>dfs.name.dir</name>
     <value>/Users/bharath/Documents/Hadoop/hadoop/dfs/name</value>
     </property>
     </configuration>

    core-site.xml

     <configuration>
     <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
     </property>
     <property>
     <name>hadoop.tmp.dir</name>
     <value>/Users/bharath/Documents/Hadoop/hadoop/tmp</value>
     </property>
     </configuration>

    mapred-site.xml

     <configuration>
     <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
     </property>
     </configuration>
  5. Next, set up ssh on your Mac
    Make sure Remote Login is turned on. To check this, open System Preferences and, under Internet & Wireless, open Sharing. Make sure Remote Login is checked.
    We need to prepare a password-less login into localhost.
    First type ssh localhost in the terminal. If it asks for a password, follow the steps below. Otherwise you are good to go.
    In the terminal, type

    ssh-keygen -t rsa -P ""

    This will generate an RSA key pair. The keys are stored in the .ssh directory under the home directory of the user who ran the command (for a normal user that is ~/.ssh; if you ran it as root, it is /var/root/.ssh).
    To check, type cd ~/.ssh
    Then type ls
    The public key is id_rsa.pub. We need to append it to authorized_keys so that ssh accepts it for password-less login.
    To copy the key file, use the command:

    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  6. Setting up HDFS for the first time
    cd into HADOOP_HOME
    Type

    bin/hadoop namenode -format

    The output should be something like below. You should see a statement “…. successfully formatted”

    12/09/19 15:44:53 INFO namenode.NameNode: STARTUP_MSG:
      /************************************************************
      STARTUP_MSG: Starting NameNode
      STARTUP_MSG: host = Bharath-Kumar-Reddys-MacBook-Air.local/172.20.10.2
      STARTUP_MSG: args = [-format]
      STARTUP_MSG: version = 0.20.2+737
      STARTUP_MSG: build = git://ubuntu64-build01.sf.cloudera.com/ on branch -r 98c55c28258aa6f42250569bd7fa431ac657b  dbd; compiled by 'root' on Tue Dec 14 11:50:19 PST 2010
      ************************************************************/
      12/09/19 15:44:54 INFO namenode.FSNamesystem: fsOwner=bharath
      12/09/19 15:44:54 INFO namenode.FSNamesystem: supergroup=supergroup
      12/09/19 15:44:54 INFO namenode.FSNamesystem: isPermissionEnabled=true
      12/09/19 15:44:54 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
      accessTokenLifetime=0 min(s)
      12/09/19 15:44:54 INFO common.Storage: Image file of size 113 saved in 0 seconds.
      12/09/19 15:44:54 INFO common.Storage: Storage directory /Users/bharath/Documents/Hadoop/hadoop/dfs/name has been successfully formatted.
      12/09/19 15:44:54 INFO namenode.NameNode: SHUTDOWN_MSG:
      /************************************************************
     SHUTDOWN_MSG: Shutting down NameNode at Bharath-Kumar-Reddys-MacBook-Air.local/172.20.10.2
    
  7. Do ssh into localhost
    ssh localhost
  8. Start the hadoop daemons
    $HADOOP_HOME/bin/start-all.sh

     The output should be like this:

    starting namenode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-namenode-Bharath-Kumar-Reddys-MacBook-Air.local.out
    localhost: starting datanode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-datanode-Bharath-Kumar-Reddys-MacBook-Air.local.out
    localhost: starting secondarynamenode, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-secondarynamenode-Bharath-Kumar-Reddys-MacBook-Air.local.out
    starting jobtracker, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-jobtracker-Bharath-Kumar-Reddys-MacBook-Air.local.out
    localhost: starting tasktracker, logging to /Users/bharath/Documents/Hadoop/hadoop/bin/../logs/hadoop-bharath-tasktracker-Bharath-Kumar-Reddys-MacBook-Air.local.out
  9. Test to see if all the daemons are running
    $JAVA_HOME/bin/jps

    The output of the above command:

    2490 Jps
    2206 TaskTracker
    2071 SecondaryNameNode
    1919 NameNode
    2130 JobTracker
    1995 DataNode

    So, all the daemons are up and running. 🙂
    To see a list of ports opened, use the command

    lsof -i | grep LISTEN 
  10. Test the examples
    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar

    The output :

    An example program must be given as the first argument.
    Valid program names are:
    aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
    aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
    dbcount: An example job that count the pageview counts from a database.
    grep: A map/reduce program that counts the matches of a regex in the input.
    join: A job that effects a join over sorted, equally partitioned datasets
    multifilewc: A job that counts words from several files.
    pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
    pi: A map/reduce program that estimates Pi using monte-carlo method.
    randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
    randomwriter: A map/reduce program that writes 10GB of random data per node.
    secondarysort: An example defining a secondary sort to the reduce.
    sleep: A job that sleeps at each map and reduce task.
    sort: A map/reduce program that sorts the data written by the random writer.
    sudoku: A sudoku solver.
    teragen: Generate data for the terasort
    terasort: Run the terasort
    teravalidate: Checking results of terasort
    wordcount: A map/reduce program that counts the words in the input files.
  11. Run pi example
    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 10 100

    Last few lines of output:

    .....................................................
     12/09/19 16:14:20 INFO mapred.JobClient: Reduce output records=0
     12/09/19 16:14:20 INFO mapred.JobClient: Spilled Records=40
     12/09/19 16:14:20 INFO mapred.JobClient: Map output bytes=180
     12/09/19 16:14:20 INFO mapred.JobClient: Map input bytes=240
     12/09/19 16:14:20 INFO mapred.JobClient: Combine input records=0
     12/09/19 16:14:20 INFO mapred.JobClient: Map output records=20
     12/09/19 16:14:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=1240
     12/09/19 16:14:20 INFO mapred.JobClient: Reduce input records=20
     Job Finished in 76.438 seconds
     Estimated value of Pi is 3.14800000000000000000
  12. To stop the nodes, type
    $HADOOP_HOME/bin/stop-all.sh

Done and done!

If you get an error like : java.io.IOException: Tmp directory hdfs://localhost:9000/user/bharath/PiEstimator_TMP_3_141592654 already exists. Please remove it first.

Then in the terminal type,

$HADOOP_HOME/bin/hadoop fs -rmr hdfs://localhost:9000/user/bharath/PiEstimator_TMP_3_141592654 

References:

  1. http://www.chaceliang.com/blog/study/03-how-to-setup-hadoop-at-ur-macbook-pro/
  2. http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)
  3. http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/ (Good reference for troubleshooting)

Big Data – Information in the future and for the future!

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. Consider social-networking sites like Facebook or Twitter. Billions of users post comments, update their status, upload photos etc. Imagine how large such data would be. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is Big data.

Visualization of all editing activity by user “Pearle” on Wikipedia (Pearle is a robot)

The three Vs – volume, velocity and variety – are commonly used to characterize different aspects of big data. Big data is data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or doesn't fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it. A research report on big data by McKinsey can be found here.

Ok, now we have Big Data. What can be done with it?!

We can extract insight and intelligent information from an immense volume, variety and velocity of data, in context, beyond what was previously possible.

Big data usually includes data sets with sizes beyond the ability of commonly-used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. With this difficulty, a new platform of “big data” tools has arisen to handle sensemaking over large quantities of data, as in the Apache Hadoop Big Data Platform.

Some instances of big data:

  1. In total, the four main detectors at the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010 (13,000 terabytes)
  2. Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress
  3. Facebook handles 40 billion photos from its user base
  4. FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide
  5. The volume of business data worldwide, across all companies, doubles every 1.2 years, according to estimates
  6. Decoding the human genome originally took 10 years to process; now it can be achieved in one week
  7. Computational social science – Tobias Preis et al. used Google Trends data to demonstrate that Internet users from countries with a higher per capita gross domestic product (GDP) are more likely to search for information about the future than information about the past. The findings suggest there may be a link between online behavior and real-world economic indicators. The authors examined the Google query logs of Internet users in 45 different countries in 2010 and calculated the ratio of the volume of searches for the coming year ('2011') to the volume of searches for the previous year ('2009'), which they call the 'future orientation index'. They compared the future orientation index to the per capita GDP of each country and found a strong tendency for countries in which Google users enquire more about the future to exhibit a higher GDP. The results hint that there may be a relationship between the economic success of a country and the information-seeking behavior of its citizens captured in big data.

Consider a big organization. You have some data to process, so you get a cluster to process it. Soon the data grows in volume, and you can add more nodes to the cluster, but this gets very expensive, and on top of that there is the cost of maintenance. To what extent can you keep growing your cluster?! A better alternative would be to rent a cluster on a time basis to process your data. Once you are done, you stop using it; later, whenever you need it again, you rent it on demand. This is exactly what Amazon Web Services provides.

Amazon Web Services (abbreviated AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon S3. Apparently, you can get an instance running on the Amazon cloud for as low as 1 Rupee/hour!

Coming back to Hadoop, two main components constitute the cluster:
1. Hadoop Distributed File System (HDFS) – Storage Layer
2. MapReduce – Computation Layer

MapReduce is a simple data-parallel programming model designed for scalability and fault tolerance.
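To make the model concrete, here is a toy, single-process word-count sketch in C++ that mimics the map and reduce phases (purely an illustration of the idea, not the Hadoop API; in Hadoop the two phases run distributed across the cluster):

// Toy illustration of MapReduce: map emits (key, value) pairs,
// the framework groups them by key, and reduce folds each group.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> documents;
    documents.push_back("big data is big");
    documents.push_back("data moves fast");

    // Map phase: emit (word, 1) for every word in every document.
    std::vector<std::pair<std::string, int> > emitted;
    for (size_t d = 0; d < documents.size(); ++d)
    {
        std::istringstream words(documents[d]);
        std::string word;
        while (words >> word)
            emitted.push_back(std::make_pair(word, 1));
    }

    // Shuffle + reduce phase: group the pairs by key and sum the values.
    std::map<std::string, int> counts;
    for (size_t i = 0; i < emitted.size(); ++i)
        counts[emitted[i].first] += emitted[i].second;

    for (std::map<std::string, int>::iterator it = counts.begin(); it != counts.end(); ++it)
        std::cout << it->first << ": " << it->second << "\n";

    return 0;
}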

There are two types of Hadoop Clusters:

  1. Cloud Cluster
  2. Local Cluster

For learning, we can get started with a single-node local cluster on our personal machines. In the next post, I will show you how to set up a single-node Hadoop cluster.

References:

  1. http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
  2. http://strata.oreilly.com/2012/01/what-is-big-data.html
  3. http://www-01.ibm.com/software/data/bigdata/

Qt is a cutie!

Qt is a cross-platform application framework used for developing stunning GUI (Graphical User Interface) applications. Most notably, it's used in Autodesk Maya, Adobe Photoshop Elements, Skype, VLC media player and Mathematica. Giants like DreamWorks, Google, HP, Lucasfilm, Walt Disney Animation Studios and Research In Motion make use of it.

Qt uses standard C++. It can also be used with several other programming languages via language bindings. It runs on almost all desktop platforms and a few mobile platforms. Non-GUI features include SQL database access, XML parsing, thread management, network support and a unified cross-platform application programming interface (API) for file handling. With Qt, you can reuse code efficiently to target multiple platforms with one code base.

  • Qt framework – intuitive APIs for C++ and CSS/JavaScript-like programming with Qt Quick for rapid UI creation
  • Qt Creator IDE – powerful cross-platform integrated development environment, including UI designer tools and on-device debugging
  • Tools and toolchains – All you need: simulator, local and remote compilers, internationalization support, device toolchains and more

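To give a flavour of what Qt code looks like, here is a minimal hello-world sketch of my own (using Qt 4-style includes; build it with qmake):

// Minimal Qt application: a single window containing a push button.
#include <QApplication>
#include <QPushButton>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);        // Manages application-wide resources and the event loop

    QPushButton button("Hello, Qt!");    // A top-level widget acts as its own window
    button.resize(200, 60);
    button.show();

    return app.exec();                   // Enter the event loop
}

In Qt 5 the widget classes move into the QtWidgets module, but the program itself stays essentially the same.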

More information regarding Qt, documentation and installation files can be found here.

The Qt 5 Beta is now available as the first major release under the new Qt Project umbrella. Major architectural changes are being implemented in it. The one that interests me the most is that Qt 5 allows smooth, accelerated graphics performance with limited resources. Qt 5 makes better use of the GPU to deliver better performance on inexpensive hardware. For example, using Qt 5 you can achieve 60 fps performance on a $35 single-board computer like the Raspberry Pi. See here for Qt on Pi. There is also a Qt on iPhone project, whose purpose is to have the Qt framework run on the iPhone. Qt already runs on Android.

References:

  1. http://qt.nokia.com/
  2. http://qt-apps.org/
  3. http://labs.qt.nokia.com/

 

CUDA vs OpenCL

CUDA and OpenCL are two major programming frameworks for GPU computing. I wrote about them briefly in one of my previous posts. Now, if you wanted to learn GPU computing, which one should you choose – CUDA or OpenCL?

Until recently, CUDA attracted most of the attention from developers, especially in the High Performance Computing realm, because of the good support from NVIDIA itself, especially on its forums. But OpenCL is gaining ground rapidly, and OpenCL software has now reached the point where GPU programmers are taking a second look.

CUDA and OpenCL do mostly the same thing – it's like Italians and French fighting over who has the most beautiful language, while they're both Romance languages.

nVidia's CUDA is vendor-specific. It has better tools, better performance, and there's a lot of sample code, tooling, documentation and utilities available. If you have an actual GPU project that you need to work on in the short term, and you can be certain that you only need to support high-end nVidia hardware, then CUDA is the way to go. OpenCL provides an open, industry-standard framework. As such, it has garnered support from nearly all processor manufacturers, including AMD, Intel and nVidia, as well as others that serve the mobile and embedded computing markets. As a result, applications developed in OpenCL are portable across a variety of GPUs and CPUs. OpenCL, being an open standard, allows any vendor to implement OpenCL support on its products. Intel has announced that it will support OpenCL on future CPU products.

Ok, now you have two frameworks – which one to choose? Well, it depends on a lot of factors. If you are planning to implement a GPU project solely on nVidia’s cards, then CUDA is a better option. But if your application is to be deployed over a range of architectures then you need to work with OpenCL.

But to start off with, I personally prefer CUDA, because of the detailed documentation that nVidia provides and the vast community support. You can post a question in the nVidia forums (which are offline right now due to some security issues) and get clarifications from experts, and there is also Stack Overflow. The basic idea behind learning CUDA and OpenCL is the same: the skills and knowledge you develop while working with CUDA will mostly be transferable to OpenCL later if needed. There are also tools, like Swan, that convert CUDA code into OpenCL code. So, basically, if you learn one, you can very easily work with the other. A good comparison of CUDA and OpenCL is shown here and here. You can also look in the references for more information.

Concluding,

CUDA

  • Better marketing
  • Good support and documentation
  • Many features and toolsets
  • Works only on nVidia cards

OpenCL

  • Supports many architectures
  • It's an open standard – which we always want
  • No proper documentation
  • Provided by different vendors in various packages – no universal package

Recently, OpenCL has been gaining ground on CUDA – this might be a reason why nVidia recently released the CUDA compiler source code to developers and also stopped providing OpenCL support in newer releases of the CUDA toolkit. Well, that indicates there is stiff competition going on, and I personally feel it's only a matter of time before OpenCL reaches the level of CUDA.

References:

  1. http://www.streamcomputing.eu/blog/2011-06-22/opencl-vs-cuda-misconceptions/
  2. http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html
  3. http://wiki.tiker.net/CudaVsOpenCL
  4. http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/

Installing CUDA on Mac OS

CUDA is a parallel computing platform and programming model that enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

In this post, I will tell you how to get started with CUDA on Mac OS. To use CUDA on your system, you will need the following installed:

  1. CUDA-enabled GPU. A list of such GPUs is available here
  2. Mac OS X v. 10.5.6 or later (10.6.3 or later for 64-bit CUDA applications)
  3. The gcc compiler and toolchain installed using Xcode
  4. CUDA software (available at no cost from http://developer.nvidia.com/cuda/cuda-downloads)

Once you have verified that you have a supported NVIDIA GPU and a supported version of Mac OS, you need to download the CUDA software. Download the following packages for the latest version of the development tools from the site above:

  1. CUDA Driver
  2. CUDA Toolkit
  3. GPU Computing SDK

Installation:

  1. Install the CUDA Driver
    Install the CUDA driver package by executing the installer and following the on-screen prompts. This installs /Library/Frameworks/CUDA.framework and the UNIX-compatibility stub /usr/local/cuda/lib/libcuda.dylib that refers to it.
  2. Install the CUDA Toolkit
    Install the CUDA Toolkit by executing the Toolkit installer package and following the on-screen prompts. The CUDA Toolkit supplements the CUDA Driver with compilers and additional libraries and header files that are installed into /usr/local/cuda by default
  3. Define the environment variables
    – The PATH variable needs to include /usr/local/cuda/bin
    – DYLD_LIBRARY_PATH needs to contain /usr/local/cuda/lib
    The typical way to place these values in your environment is with the following commands:
    export PATH=/usr/local/cuda/bin:$PATH
    export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
    To make these settings permanent, place them in ~/.bash_profile
  4. Install CUDA SDK
    The default installation process places the files in /Developer/GPU Computing

To compile the examples, cd into /Developer/GPU Computing/C and type make. The resulting binaries will be placed in /Developer/GPU Computing/C/bin/darwin/release.

Verify the installation by running ./deviceQuery, the output of which should be something like this

Sample CUDA deviceQuery Program
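If you want a quick sanity check of your own, a minimal program that queries the installed devices through the CUDA runtime API might look like this (a sketch, not the SDK's deviceQuery sample; error handling omitted for brevity):

// Minimal device query: list each CUDA device and a few of its properties.
#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; ++i)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
               i, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }
    return 0;
}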

Now, you are all set to start with CUDA programming!

 

References:

  1. CUDA getting started guide for Mac OS

Why Apple and Mac OS? – II

What's hardware without software? Many people think they need to buy a lot of software for Mac OS. Frankly, I haven't spent more than $100 on software so far. Mac OS gives you a very colorful environment to play with: you have Mach, FreeBSD, a nice driver development environment, and a lot of the system's source code to go with it.

Windows is the odd man out here, so let me not talk about it. 😛 When I started using Linux in college, I wondered why people still use Windows?! (OK... you can't play games like you can on Windows.) I have made heavy use of Linux in academics and now in my job. You can get a lot of the same, or similar, software on Mac OS X, and you can compile it yourself like you do on Linux. Using MacPorts, you can install most software just like you would on a Linux machine: like 'sudo apt-get install', you have 'sudo port install' on Mac OS. Mac OS is representative of a "best-effort" approach – Apple took technology it had collected over the years, along with technology that had flourished in the open source world, and put together a reasonable system.

Personally, I prefer Mac OS to Linux. I found installing new stuff on a Mac way easier than on other platforms. I mostly code in C/C++, and once Xcode is installed, voilà! You have the necessary libraries required to kick off programming. Macs come with Ruby and Python installed, and an incredible C/C++/Objective-C development environment in Xcode. The terminal in Mac OS is very similar to the one in Linux, so compiling and executing a program via the terminal works the same way. Newer versions of Xcode come preinstalled with OpenCL. As a GPU programmer, that makes my life easier: no need to install any extra libraries. Installing CUDA is also very easy, and libraries like OpenGL are easy to install too. If you are familiar with Linux, you will feel right at home, with a little make-up 😉

I have also started with iPhone programming, and to deploy apps you must have Mac hardware! You can't test iPhone apps on non-Mac hardware. Here again, Xcode simplifies things a lot. Mac OS runs pretty much all open source software flawlessly and, of course, it comes with all of the Unix command line tools and a well-built Terminal app for running them. So, those who are used to the terminal will not miss anything.

Here is a list of some of the software available for Mac OS.

The Spotlight feature lets you easily search for any file on the disk. There are many productivity apps like GarageBand, iPhoto, iMovie and iWeb. GarageBand is useful for starting any project involving music – you can actually tune your guitar using the mic on a Mac. iPhoto lets you organize your photos. When you attach a new device, Mac OS will mostly recognize it and install the necessary drivers; since the Mac hardware is so tightly controlled by Apple, Mac OS includes all of the drivers for everything, from graphics to USB, and things just tend to work. And there is another feature, Universal Access, through which even blind users can interact with Mac software. It's called VoiceOver and it's explained here.

Then there is a great backup feature in Mac OS called Time Machine. It's as simple as clicking 'yes' when you plug in a hard drive to use as a backup. And if you want Mac hardware but prefer working in a Linux/Windows environment, there's a way for that too. First, there is Parallels, commercial software with which you can install Linux or Windows inside Mac OS and run them from there. If you want a multi-boot system, you can use Boot Camp, which comes preloaded with Mac OS; it assists you in installing Windows/Linux through non-destructive partitioning of the hard drive.

Everything said, I prefer to work with a Mac any day over other machines! But…

“Talk is cheap, I will show you the codes from next post” 😉