Showing posts with label CUDA. Show all posts

Saturday, 8 August 2020

CUDA Memory Architecture of GPU | CUDA GPU Architecture

                    In this post, we will see the CUDA memory architecture of a GPU.

Basics of CUDA Programming | CUDA Terminologies | Host, Device, Kernel, Thread, Block, Grid, Warp:
https://www.youtube.com/watch?v=lwA4SK-82rI

What is CUDA? / Basics of CUDA (Necessity of GPU, Host, Device, Kernel, Stream Multiprocessor, Stream Processor, Thread, Block, Grid, Warp, Memory architecture of GPU):
https://www.comrevo.com/2015/05/what-is-cuda-basics-gpu-host-device-kernel-stream-multiprocessor-thread-block-grid-warp-memory-architecture-global-shared-constant-texture-local-registers.html 

Watch on YouTube: https://www.youtube.com/watch?v=9zeAcO2Etlk

Friday, 7 August 2020

CUDA Vector Addition Program | Basics of CUDA Programming with CUDA Array Addition with All Cases

                      In this post, we will see a CUDA vector addition program, covering the basics of CUDA programming through array addition in all cases.

Blog link for Cuda program for addition of two one dimensional arrays:
https://www.comrevo.com/2015/05/cuda-program-for-addition-of-two-one-dimensional-arrays.html 

Watch on YouTube: https://www.youtube.com/watch?v=vo0eCxoAf68

Basics of CUDA Programming | CUDA Terminologies | Host, Device, Kernel, Thread, Block, Grid, Warp

                       In this post, we will see the basics of CUDA programming and its core terminologies: host, device, kernel, streaming multiprocessor, streaming processor, thread, block, grid, and warp.

Blog link for What is CUDA? / Basics of CUDA (Necessity of GPU, Host, Device, Kernel, Stream Multiprocessor, Stream Processor, Thread, Block, Grid, Warp, Memory architecture of GPU):
https://www.comrevo.com/2015/05/what-is-cuda-basics-gpu-host-device-kernel-stream-multiprocessor-thread-block-grid-warp-memory-architecture-global-shared-constant-texture-local-registers.html 

Watch on YouTube: https://www.youtube.com/watch?v=lwA4SK-82rI&t=948s

Thursday, 30 July 2020

How to run CUDA program on Google Colab | How to run CUDA program online | Run CUDA prog without GPU

                   In this post, we will see how to run a CUDA program on Google Colab, i.e., how to run a CUDA program online without having a GPU of your own.

Link for steps to set up Google Colab for CUDA Programming:
https://medium.com/@harshityadav95/how-to-run-cuda-c-or-c-on-google-colab-or-azure-notebook-ea75a23a5962 

Watch on YouTube: https://www.youtube.com/watch?v=gggQ9-_crmU

Wednesday, 25 July 2018

How to run CUDA Program on Remote Machine

                   In this post, we will see how to run a CUDA program on a remote machine, i.e., how to run a CUDA program from a system that has neither a GPU nor a CUDA installation, by using the GPU of another system on your network.

Sunday, 28 May 2017

Interview Questions on CUDA Programming

                  The following questions on CUDA programming are commonly asked in interviews. Go through them.

Thursday, 9 March 2017

OpenCL Program for Vector / Array Addition

          To learn parallel computing with OpenCL, you should start with the array-addition example, as it illustrates proper use of the multi-threading paradigm.
           In this post, we will see an OpenCL program for array/vector addition.

Monday, 25 May 2015

CUDA Multi-GPU : To set a GPU of required Compute Capability as current GPU

              Suppose your system has multiple CUDA-enabled GPUs. Only one GPU can be set as the current GPU at a time. In this situation, you can set a particular GPU with a specific compute capability (say, 1.2) as the current GPU.
              There are two ways to achieve this:
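A minimal sketch of both approaches, using the CUDA runtime API (the 1.2 values are just the example capability from above; error checking is omitted for brevity):

```cuda
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    memset(&prop, 0, sizeof(prop));
    prop.major = 1;
    prop.minor = 2;

    // Way 1: let the runtime choose the closest matching device.
    int dev;
    cudaChooseDevice(&dev, &prop);
    cudaSetDevice(dev);
    printf("cudaChooseDevice picked device %d\n", dev);

    // Way 2: scan all devices yourself and pick an exact match.
    int count;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; i++) {
        cudaGetDeviceProperties(&prop, i);
        if (prop.major == 1 && prop.minor == 2) {
            cudaSetDevice(i);      // make this GPU current
            break;
        }
    }
    return 0;
}
```

Note the difference: cudaChooseDevice returns the closest match even if no device has exactly that capability, while the manual loop only switches devices on an exact match.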

Cuda program using Constant memory of a GPU
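A minimal sketch of how constant memory is typically used (the array size, kernel, and variable names here are illustrative, not the post's exact code):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 64

// Read-only data placed in the GPU's constant memory space.
__constant__ int coeff[N];

__global__ void scale(int *out)
{
    int i = threadIdx.x;
    out[i] = coeff[i] * 2;   // every thread reads from constant memory
}

int main(void)
{
    int h_coeff[N], h_out[N];
    for (int i = 0; i < N; i++) h_coeff[i] = i;

    // Constant memory is written from the host with cudaMemcpyToSymbol,
    // not with an ordinary cudaMemcpy.
    cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));

    int *d_out;
    cudaMalloc(&d_out, sizeof(h_out));
    scale<<<1, N>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    printf("h_out[5] = %d\n", h_out[5]);   // 5 * 2 = 10
    cudaFree(d_out);
    return 0;
}
```

Constant memory is cached and broadcast efficiently when all threads in a warp read the same address, which makes it a good fit for small read-only coefficient tables like this one.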


Monday, 18 May 2015

Cuda program for matrix multiplication using shared memory

                  In this post, we will see CUDA matrix multiplication using shared memory, with code and a tutorial.

Watch on YouTube: https://www.youtube.com/watch?v=XeR400_QFXQ

            In the last post, we saw matrix multiplication in CUDA. In this post, we will see matrix multiplication using shared memory.
           Here, I have considered two matrices of sizes row1*col1 and row2*col2. The resultant (product) matrix will be of size row1*col2. That is why I have considered a two-dimensional grid of row1*col2 blocks: each block is responsible for calculating one value of the product. To find each value of the product, there are col1 (equal to row2) multiplications, so I gave each block col1 threads; one thread per multiplication. In short, the total number of blocks is row1*col2, and each block has col1 (or row2) threads.
           I have used a shared array p[] to store the intermediate multiplication values. In each block, each thread computes one value p[i]. __syncthreads() makes sure all threads have finished their computation, i.e., all values of p[] are available, before they are added together to produce one value of the product.
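The scheme described above can be sketched as a kernel like this (a reconstruction from the description, not the post's exact code; here thread 0 does the final summation after the barrier):

```cuda
// Grid:  row1 x col2 blocks, one block per output element.
// Block: col1 threads, one thread per multiplication.
// The shared array size (col1 ints) is passed at launch time.
__global__ void matmul_shared(const int *a, const int *b, int *c,
                              int col1, int col2)
{
    extern __shared__ int p[];      // one slot per thread

    int row = blockIdx.x;           // row index in the product
    int col = blockIdx.y;           // column index in the product
    int k   = threadIdx.x;          // which multiplication this thread does

    // Each thread computes one intermediate product.
    p[k] = a[row * col1 + k] * b[k * col2 + col];

    __syncthreads();                // wait until all p[] values exist

    // One thread sums the shared array into the output element.
    if (k == 0) {
        int sum = 0;
        for (int i = 0; i < col1; i++) sum += p[i];
        c[row * col2 + col] = sum;
    }
}

// Launch (sketch):
// dim3 grid(row1, col2);
// matmul_shared<<<grid, col1, col1 * sizeof(int)>>>(d_a, d_b, d_c, col1, col2);
```

Without the __syncthreads() barrier, thread 0 could start summing p[] before the other threads had written their products, giving wrong results.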

Thursday, 14 May 2015

Cuda: Finding Compute Capability of a GPU

What is Compute Capability?
           Compute capability represents the microarchitecture generation of a GPU: 1.x represents the Tesla generation, 2.x Fermi, 3.x Kepler, and 5.x Maxwell. The number before the decimal point is called the major version; it represents a significant change in the microarchitecture generation. The number after the decimal point is called the minor version; it represents a smaller change within that generation. It is just like Android versions: 4.x is called KitKat while 5.x is called Lollipop, and 4.1 and 4.2 represent smaller changes within KitKat.
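Querying the compute capability from a program is straightforward with the CUDA runtime API (a minimal sketch):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major and prop.minor together form the compute capability.
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

On a single-GPU system this prints one line, e.g. a compute capability of 1.2 for the Tesla-generation card used elsewhere in these posts.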

Wednesday, 13 May 2015

Cuda program for multiplication of two matrices

                 For CUDA matrix multiplication using shared memory, see the next post.

Watch on YouTube: https://www.youtube.com/watch?v=XeR400_QFXQ

                 In this post on "Cuda program for multiplication of two matrices", I have considered two cases:
1. A two-dimensional grid of blocks with one thread per block.
2. One block with a two-dimensional arrangement of threads.
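The two cases differ only in how each output element's indices are derived; a sketch of both kernels (illustrative, not the post's exact code):

```cuda
// Case 1: a (row1 x col2) grid of blocks, one thread per block.
__global__ void matmul_blocks(const int *a, const int *b, int *c,
                              int col1, int col2)
{
    int row = blockIdx.x, col = blockIdx.y;
    int sum = 0;
    for (int k = 0; k < col1; k++)
        sum += a[row * col1 + k] * b[k * col2 + col];
    c[row * col2 + col] = sum;
}

// Case 2: one block with a (row1 x col2) arrangement of threads.
__global__ void matmul_threads(const int *a, const int *b, int *c,
                               int col1, int col2)
{
    int row = threadIdx.x, col = threadIdx.y;
    int sum = 0;
    for (int k = 0; k < col1; k++)
        sum += a[row * col1 + k] * b[k * col2 + col];
    c[row * col2 + col] = sum;
}

// Launches (sketch):
// matmul_blocks<<<dim3(row1, col2), 1>>>(d_a, d_b, d_c, col1, col2);
// matmul_threads<<<1, dim3(row1, col2)>>>(d_a, d_b, d_c, col1, col2);
```

Case 2 is limited by the maximum number of threads per block, so it only works for small matrices; case 1 scales to larger ones.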

CUDA program to add two matrices

                       In this post, we will see a CUDA program for the addition of two matrices.

Watch on YouTube: https://www.youtube.com/watch?v=jTAxCbcxwJA

Here, two cases are considered:
1. A two-dimensional grid of blocks with one thread per block.
2. One block with a two-dimensional arrangement of threads.
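Since matrix addition is purely element-wise, each kernel is one line; a sketch of both cases (illustrative, not the post's exact code):

```cuda
// Case 1: a (rows x cols) grid of blocks, one thread per block.
__global__ void add_blocks(const int *a, const int *b, int *c, int cols)
{
    int i = blockIdx.x * cols + blockIdx.y;   // flatten 2D block index
    c[i] = a[i] + b[i];
}

// Case 2: one block with a (rows x cols) arrangement of threads.
__global__ void add_threads(const int *a, const int *b, int *c, int cols)
{
    int i = threadIdx.x * cols + threadIdx.y; // flatten 2D thread index
    c[i] = a[i] + b[i];
}

// Launches (sketch):
// add_blocks<<<dim3(rows, cols), 1>>>(d_a, d_b, d_c, cols);
// add_threads<<<1, dim3(rows, cols)>>>(d_a, d_b, d_c, cols);
```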

Monday, 11 May 2015

Cuda program for addition of two one dimensional arrays

                              In this post, we will see a CUDA vector addition program, covering the basics of CUDA programming through array addition in all cases.

Watch on YouTube: https://www.youtube.com/watch?v=vo0eCxoAf68

                       The following programs were checked in Nsight Eclipse, the Eclipse IDE for C/C++ bundled with the CUDA libraries. The results were checked on a system with a GPU of compute capability 1.2, but you will get the same result for the following code on a GPU of any compute capability.

             Here, three cases are considered for the addition of two arrays:
1. n blocks and one thread per block.
2. 1 block and n threads in that block.
3. m blocks and n threads per block.

            The program's complete code and the respective output are given below.
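The three cases boil down to three ways of computing the global element index; a self-contained sketch (illustrative sizes and data, not the post's exact code):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 8

// Case 1: n blocks, one thread per block.
__global__ void add_blocks(const int *a, const int *b, int *c)
{
    int i = blockIdx.x;
    c[i] = a[i] + b[i];
}

// Case 2: one block, n threads.
__global__ void add_threads(const int *a, const int *b, int *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Case 3: m blocks with n threads per block.
__global__ void add_grid(const int *a, const int *b, int *c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

int main(void)
{
    int h_a[N], h_b[N], h_c[N];
    for (int i = 0; i < N; i++) { h_a[i] = i; h_b[i] = 10 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, sizeof(h_a));
    cudaMalloc(&d_b, sizeof(h_b));
    cudaMalloc(&d_c, sizeof(h_c));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);

    add_grid<<<2, N / 2>>>(d_a, d_b, d_c);   // case 3: 2 blocks of 4 threads

    cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) printf("%d ", h_c[i]);   // 0 11 22 ...
    printf("\n");

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Swapping add_grid for add_blocks<<<N, 1>>> or add_threads<<<1, N>>> gives the other two cases with the same result.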


Saturday, 9 May 2015

What is CUDA? / Basics of CUDA (Necessity of GPU, Host, Device, Kernel, Stream Multiprocessor, Stream Processor, Thread, Block, Grid, Warp, Memory architecture of GPU)

                     In this post, we will see the basics of CUDA programming and its core terminologies: host, device, kernel, streaming multiprocessor, streaming processor, thread, block, grid, and warp.

Watch on YouTube: https://www.youtube.com/watch?v=lwA4SK-82rI


Why do we need GPU when already we have CPU?
              GPU stands for Graphics Processing Unit, while CPU stands for Central Processing Unit. On a CPU with, say, four cores, we can run at most four threads simultaneously (one thread per core). In graphics (e.g., image) processing, many pixels are processed simultaneously, which requires many threads running at once. But, as noted, a CPU with its limited number of cores can run only a limited number of threads simultaneously. Hence we need a GPU, which consists of many more cores, especially for graphics processing (e.g., computer games).
               For your information, Nvidia's GeForce GTX TITAN X GPU consists of 3072 cores. For its specification, go through http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-x/specifications.