I am trying to print a dynamically allocated 2d array from my master process after receiving all its components from all other processes. By components I mean subarrays, or blocks.
I have made the code generic with respect to the number of processes. The following diagram shows how the blocks are arranged in the complete array; each block is handled by one process. For this question, though, let's assume that I run the program with 12 processes (natively I have 8 cores), using the command:
mpiexec -n 12 ./gather2dArray
This is the diagram, which targets the 12-process scenario specifically:
[Diagram not preserved: the global array split into a 3 x 4 grid of blocks, one block per process]
The answer by Jonathan to this question helped me a great deal, but unfortunately I have not been able to fully implement what I want.
I first create the block in each process, which I name grid. Every array is a dynamically allocated 2d array. I also create the global array (universe), which is visible only to the master process (#0).
Finally, I use MPI_Gatherv(...) to assemble all the subarrays into the global array. Then I display the local arrays and the global array.
When I run the program with the command above, I get a segmentation fault when I reach the MPI_Gatherv(...) call. I can't figure out what I am doing incorrectly. I have provided the complete code (heavily commented) below:
EDIT
I have fixed some mistakes in the code. Now MPI_Gatherv() is partly successful: I am able to print the entire first row of the global array correctly (I check the individual elements of the processes and they always match). But when I reach the second row, garbage characters appear, followed by a segmentation fault. I haven't been able to figure out what is wrong there. Still looking into it...
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
void print2dCharArray(char** array, int rows, int columns);
int main(int argc, char** argv)
{
int master = 0, np, rank;
char version[10];
char processorName[20];
int strLen[10];
// Initialize MPI environment
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &np);
if (np != 12) { MPI_Abort(MPI_COMM_WORLD,1); }
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// We need a different seed for each process
srand(time(0) ^ (rank * 33 / 4));
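int nDims = 2; // array dimensions
double density = 0.3; // hypothetical value: fraction of '#' cells (the declaration was not shown in the post)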
int rows = 4, columns = 6; // rows and columns of each block
int prows = 3, pcolumns = 4; // rows and columns of blocks. Each block is handled by 1 process
char** grid = malloc(rows * sizeof(char*));
for (int i = 0; i < rows; i++)
grid[i] = malloc(columns * sizeof(char));
char** universe = NULL; // Global array
char* recvPtr = NULL; // Pointer to start of Global array (only used on the master)
int Rows = rows * prows; // Global array rows
int Columns = columns * pcolumns; // Global array columns
int sizes[2]; // No of elements in each dimension of the whole array
int subSizes[2]; // No of elements in each dimension of the subarray
int startCoords[2]; // Starting coordinates of each subarray
MPI_Datatype recvBlock, recvMagicBlock;
if (rank == master){ // For the master's eyes only
universe = malloc(Rows * sizeof(char*));
for (int i = 0; i < Rows; i++)
universe[i] = malloc(Columns * sizeof(char));
// Create a subarray (a rectangular block) datatype from a regular, 2d array
sizes[0] = Rows;
sizes[1] = Columns;
subSizes[0] = rows;
subSizes[1] = columns;
startCoords[0] = 0;
startCoords[1] = 0;
MPI_Type_create_subarray(nDims, sizes, subSizes, startCoords, MPI_ORDER_C, MPI_CHAR, &recvBlock);
// Now resize the newly created datatype: the lower bound stays 0, but the
// extent shrinks to the width of one block, columns * sizeof(char) bytes.
// Displacements given in units of this extent then step one block to the
// right per unit, instead of by the extent of the full array (subarray default).
MPI_Type_create_resized(recvBlock, 0, columns * sizeof(char), &recvMagicBlock);
MPI_Type_commit(&recvMagicBlock);
recvPtr = &universe[0][0];
}
// populate arrays
for (int y = 0; y < rows; y++){
for (int x = 0; x < columns; x++){
if (( (double) rand() / RAND_MAX) <= density)
grid[y][x] = '#';
else
grid[y][x] = '.';
}
}
// display local array
for (int i = 0; i < np; i++){
if (i == rank) {
printf("\n[Rank] of [total]: No%d of %d\n", rank, np);
print2dCharArray(grid, rows, columns);
}
MPI_Barrier(MPI_COMM_WORLD);
}
/* MPI_Gathering.. */
int recvCounts[np], displacements[np];
// recvCounts: how many chunks of data each process has -- in units of blocks here --
for (int i = 0; i < np; i++)
recvCounts[i] = 1;
// prows * pcolumns = np
// displacements: displacement relative to global buffer (universe) at which to place the
// incoming data block from process i -- in block extents! --
int index = 0;
for (int p_row = 0; p_row < prows; p_row++)
for (int p_column = 0; p_column < pcolumns; p_column++)
displacements[index++] = p_column + p_row * (rows * pcolumns);
// MPI_Gatherv(...) is a collective routine
// Gather the local arrays to the global array in the master process
// send type: MPI_CHAR (a char)
// recv type: recvMagicBlock (a block)
MPI_Gatherv(&grid[0][0], rows * columns, MPI_CHAR, //: parameters relevant to sender
recvPtr, recvCounts, displacements, recvMagicBlock, master, //: parameters relevant to receiver
MPI_COMM_WORLD);
// display global array
MPI_Barrier(MPI_COMM_WORLD);
if (rank == master){
printf("\n---Global Array---\n");
print2dCharArray(universe, Rows, Columns);
}
MPI_Finalize();
return 0;
}
void print2dCharArray(char** array, int rows, int columns)
{
int i, j;
for (i = 0; i < rows; i++){
for (j = 0; j < columns; j++){
printf("%c ", array[i][j]);
}
printf("\n");
}
fflush(stdout);
}
The following is the output I'm getting. No matter what I try, I cannot get past this. As you can see, the first row of the global array is printed correctly from the first rows of the blocks of processes 0 to 3. On the next row, garbage characters appear...
hostname#User:~/mpi$ mpiexec -n 12 ./gather2darray
MPICH Version: 3User
Processor name: User
[Rank] of [total]: No0 of 12
. . # . . #
# . # # # .
. . . # # .
. . # . . .
[Rank] of [total]: No1 of 12
. . # # . .
. . . . # #
. # . . # .
. . # . . .
[Rank] of [total]: No2 of 12
. # # # . #
. # . . . .
# # # . . .
. . . # # .
[Rank] of [total]: No3 of 12
. . # # # #
. . # # . .
# . # . # .
. . . # . .
[Rank] of [total]: No4 of 12
. # . . . #
# . # . # .
# . . . . .
# . . . . .
[Rank] of [total]: No5 of 12
# # . # # .
# . . # # .
. . . . # .
. # # . . .
[Rank] of [total]: No6 of 12
. . # # . #
. . # . # .
# . . . . .
. . . # # #
[Rank] of [total]: No7 of 12
# # . # # .
. # # . . .
. . . . . #
. . . # # .
[Rank] of [total]: No8 of 12
. # . . . .
# . # . # .
. . . # . #
# . # # # .
[Rank] of [total]: No9 of 12
. . . . . #
. . # . . .
. . # . . #
. . # # . .
[Rank] of [total]: No10 of 12
. . . . # .
# . . . . .
. . # # . .
. . . # . #
[Rank] of [total]: No11 of 12
. # . . # .
. # . # # .
. . . # . .
. # . # . #
---Global Array---
. . # . . # . . # # . . . # # # . # . . # # # #
� � < * � � e { � � � � � �
J
*** Error in `./gather2darray': double free or corruption (out): 0x0000000001e4c050 ***
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
*** stack smashing detected ***: ./gather2darray terminated
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10979 RUNNING AT User
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Any help would be very much appreciated. Thanks in advance.
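Your code is almost correct; you have just forgotten an important MPI principle. When you pass an array to an MPI function, MPI assumes that the array's memory is allocated contiguously. Your version allocates each row with a separate malloc, so MPI reads past the end of each process's first grid row and writes past the end of the master's first universe row, which explains both the garbage and the heap corruption. You therefore have to change your 2D array allocations so that all elements live in one block, with the row pointers pointing into it.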
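A minimal sketch of that contiguous layout, using hypothetical helper names alloc_2d_char and free_2d_char (they are not part of MPI or of the code below). All elements live in a single malloc block and the row pointers index into it, so &m[0][0] refers to rows * cols consecutive chars that MPI can safely read or write. It assumes rows and cols are both positive.

```c
#include <stdlib.h>

/* Allocate a rows x cols char matrix whose elements occupy ONE contiguous
   block, plus an array of row pointers into that block.
   Returns NULL if either allocation fails. Assumes rows > 0 and cols > 0. */
char **alloc_2d_char(int rows, int cols) {
    char *data = malloc((size_t)rows * (size_t)cols);
    char **m = malloc((size_t)rows * sizeof *m);
    if (data == NULL || m == NULL) {
        free(data);
        free(m);
        return NULL;
    }
    for (int i = 0; i < rows; i++)
        m[i] = data + (size_t)i * (size_t)cols;  /* row i starts i*cols chars in */
    return m;
}

/* Release both allocations: the element block (reachable as m[0])
   and the array of row pointers. */
void free_2d_char(char **m) {
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

Freeing m[0] releases the element block, which is why the answer's cleanup below calls free(grid[0]) before free(grid).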
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <time.h>
void print2dCharArray(char** array, int rows, int columns);
int main(int argc, char** argv)
{
int master = 0, np, rank;
char version[10];
char processorName[20];
int strLen[10];
// Initialize MPI environment
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &np);
if (np != 12) { MPI_Abort(MPI_COMM_WORLD,1); }
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// We need a different seed for each process
srand(time(0) ^ (rank * 33 / 4));
int nDims = 2; // array dimensions
int rows = 4, columns = 6; // rows and columns of each block
int prows = 3, pcolumns = 4; // rows and columns of blocks. Each block is handled by 1 process
char* pre_grid = (char*) malloc(rows * columns * sizeof(char));
char** grid = (char**) malloc(rows * sizeof(char*));
for (int i = 0; i < rows; i++)
grid[i] = &(pre_grid[i * columns]);
char** universe = NULL; // Global array
char* pre_universe = NULL;
char* recvPtr = NULL; // Pointer to start of Global array (only meaningful on the master)
int Rows = rows * prows; // Global array rows
int Columns = columns * pcolumns; // Global array columns
int sizes[2]; // No of elements in each dimension of the whole array
int subSizes[2]; // No of elements in each dimension of the subarray
int startCoords[2]; // Starting coordinates of each subarray
MPI_Datatype recvBlock, recvMagicBlock;
if (rank == master){ // For the master's eyes only
/* universe = malloc(Rows * sizeof(char*));*/
/* for (int i = 0; i < Rows; i++)*/
/* universe[i] = malloc(Columns * sizeof(char));*/
pre_universe = (char*) malloc(Rows * Columns * sizeof(char));
universe = (char**) malloc(Rows * sizeof(char*));
for (int i = 0; i < Rows; i++) {
universe[i] = &(pre_universe[i * Columns]);
}
// Create a subarray (a rectangular block) datatype from a regular, 2d array
sizes[0] = Rows;
sizes[1] = Columns;
subSizes[0] = rows;
subSizes[1] = columns;
startCoords[0] = 0;
startCoords[1] = 0;
MPI_Type_create_subarray(nDims, sizes, subSizes, startCoords, MPI_ORDER_C, MPI_CHAR, &recvBlock);
// Now resize the newly created datatype: the lower bound stays 0, but the
// extent shrinks to the width of one block, columns * sizeof(char) bytes.
// Displacements given in units of this extent then step one block to the
// right per unit, instead of by the extent of the full array (subarray default).
MPI_Type_create_resized(recvBlock, 0, columns * sizeof(char), &recvMagicBlock);
MPI_Type_commit(&recvMagicBlock);
recvPtr = &universe[0][0];
}
// populate arrays
for (int y = 0; y < rows; y++){
for (int x = 0; x < columns; x++){
grid[y][x] = 'A' + rank; // one letter per rank: 'A' for rank 0, 'B' for rank 1, ...
}
}
// display local array
for (int i = 0; i < np; i++){
if (i == rank) {
printf("\n[Rank] of [total]: No%d of %d\n", rank, np);
print2dCharArray(grid, rows, columns);
}
MPI_Barrier(MPI_COMM_WORLD);
}
/* MPI_Gathering.. */
int recvCounts[np], displacements[np];
// recvCounts: how many chunks of data each process has -- in units of blocks here --
for (int i = 0; i < np; i++)
recvCounts[i] = 1;
// prows * pcolumns = np
// displacements: displacement relative to global buffer (universe) at which to place the
// incoming data block from process i -- in block extents! --
int index = 0;
for (int p_row = 0; p_row < prows; p_row++)
for (int p_column = 0; p_column < pcolumns; p_column++)
displacements[index++] = p_column + p_row * (rows * pcolumns);
// MPI_Gatherv(...) is a collective routine
// Gather the local arrays to the global array in the master process
// send type: MPI_CHAR (a char)
// recv type: recvMagicBlock (a block)
MPI_Gatherv(&grid[0][0], rows * columns, MPI_CHAR, //: parameters relevant to sender
recvPtr, recvCounts, displacements, recvMagicBlock, master, //: parameters relevant to receiver
MPI_COMM_WORLD);
// display global array
MPI_Barrier(MPI_COMM_WORLD);
if (rank == master){
printf("\n---Global Array---\n");
print2dCharArray(universe, Rows, Columns);
}
free(grid[0]);
free(grid);
if (rank == master) {
free(universe[0]);
free(universe);
MPI_Type_free(&recvMagicBlock);
MPI_Type_free(&recvBlock);
}
MPI_Finalize();
return 0;
}
void print2dCharArray(char** array, int rows, int columns)
{
int i, j;
for (i = 0; i < rows; i++){
for (j = 0; j < columns; j++){
printf("%c ", array[i][j]);
}
printf("\n");
}
fflush(stdout);
}
Output:
---Global Array---
A A A A A A B B B B B B C C C C C C D D D D D D
A A A A A A B B B B B B C C C C C C D D D D D D
A A A A A A B B B B B B C C C C C C D D D D D D
A A A A A A B B B B B B C C C C C C D D D D D D
E E E E E E F F F F F F G G G G G G H H H H H H
E E E E E E F F F F F F G G G G G G H H H H H H
E E E E E E F F F F F F G G G G G G H H H H H H
E E E E E E F F F F F F G G G G G G H H H H H H
I I I I I I J J J J J J K K K K K K L L L L L L
I I I I I I J J J J J J K K K K K K L L L L L L
I I I I I I J J J J J J K K K K K K L L L L L L
I I I I I I J J J J J J K K K K K K L L L L L L
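Why the displacements come out right: the resized receive type has an extent of columns chars, so a displacement d places a block at byte offset d * columns in universe. The plain C sketch below recomputes the answer's displacement loop without MPI (the function names block_displacements and displacement_to_coords are hypothetical) so the offsets can be checked by hand. For the 12-process layout, rank 5 gets displacement 17, i.e. byte offset 102, which is row 4, column 6 of the 12 x 24 global array.

```c
/* Recompute the MPI_Gatherv displacements from the answer. Ranks are laid
   out row-major over a prows x pcolumns grid of blocks; each block holds
   rows x columns chars. Unit: one extent of the resized type = columns chars. */
void block_displacements(int prows, int pcolumns, int rows, int *displ) {
    int index = 0;
    for (int p_row = 0; p_row < prows; p_row++)
        for (int p_column = 0; p_column < pcolumns; p_column++)
            /* One block row down skips `rows` full global rows, i.e.
               rows * (columns * pcolumns) chars = rows * pcolumns extents. */
            displ[index++] = p_column + p_row * (rows * pcolumns);
}

/* Convert a displacement (in extents) into (row, column) of the global array. */
void displacement_to_coords(int displ, int columns, int pcolumns,
                            int *row, int *col) {
    int offset = displ * columns;             /* chars, since sizeof(char) == 1 */
    int global_columns = columns * pcolumns;  /* Columns in the original code */
    *row = offset / global_columns;
    *col = offset % global_columns;
}
```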
Edit: I have removed my code, as I do not want to get caught cheating on my assignment. I will repost the code once my assignment has been submitted. I apologize for posting it on Stack Overflow; I just had nowhere else to go for help. Please respect my edit to remove the code. I have tried deleting the question, but it will not let me, as I need to request it.
MIPS code I was trying to follow
C code I was trying to follow
I am trying to convert recursive Fibonacci code into ARM assembly, but I am running into issues. When I run my ARM assembly, the final value of the sum is 5 when it should be 2. It seems as though my code loops, but maybe one time too many. Any help would be much appreciated, as I am new to this.
Here is what your code is doing, translated to C, followed by a test run. This simply isn't the usual recursive Fibonacci.
#include <stdio.h>
void f ( int );
int R2 = 0;
int main () {
for ( int i = 0; i < 10; i++ ) {
R2 = 0;
f ( i );
printf ( "f ( %d ) = %d\n", i, R2 );
}
}
void f ( int n ) {
if ( n == 0 ) { R2 += 0; return; }
if ( n == 1 ) { R2 += 1; return; }
f ( n-1 );
f ( n-2 );
R2 += n-1;
}
f ( 0 ) = 0
f ( 1 ) = 1
f ( 2 ) = 2
f ( 3 ) = 5
f ( 4 ) = 10
f ( 5 ) = 19
f ( 6 ) = 34
f ( 7 ) = 59
f ( 8 ) = 100
f ( 9 ) = 167
Either you started with a broken Fibonacci algorithm, or substantially changed it going to assembly. I don't know how this can be fixed, except by following a working algorithm.
Note that in the C code the only addition is in fib(n-1) + fib(n-2). In particular, the special cases just do return 0; and return 1; respectively. Thus your "else add 0/1 to sum" lines are wrong: you should replace those additions with moves.
Also, you do MOV R1, R0 //copy fib(n-1) which is incorrect because the fib(n-1) has been returned in R2 not R0. That should be MOV R1, R2.
With these changes the code works, even if it is slightly non-standard.
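For reference, a minimal C sketch of the structure the corrected assembly should follow (the function name fib is illustrative): the base cases only return, that is move, 0 or 1, and the single addition happens when the two recursive results are combined. In the assembly, each result comes back in R2, so fib(n-1) has to be copied out of R2 (the MOV R1, R2 fix above) before it is combined with fib(n-2).

```c
/* Plain recursive Fibonacci: fib(0) = 0, fib(1) = 1,
   fib(n) = fib(n-1) + fib(n-2). */
int fib(int n) {
    if (n == 0) return 0;       /* base case: a move, not an add */
    if (n == 1) return 1;       /* base case: a move, not an add */
    int first = fib(n - 1);     /* save the first result before the second call */
    int second = fib(n - 2);
    return first + second;      /* the only addition */
}
```

Here fib(3) is 2, the value the question expects, whereas the traced version above yields 5.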
I want to find a function that generates a sequence with the following pattern:
1 2 3 1 1 2 2 3 3 1 1 1 1 2 2 2 2 3 3 3 3 ....
The lower bound is 1 and the upper bound is 3. Each pass starts again from 1, and in pass n each number is repeated 2^n times, with n starting at 0.
Here it goes, I hope it will help.
#include <iostream>
#include <math.h>
int main(){
for(int n = 0; n < 5;n++){
for(int i = 1; i < 4;i++){
for(int j = 0;j < pow(2,n) ;j++){
std::cout << i;
}
}
}
return 0;
}
Here is some code in C++:
#include <iostream>
#include <cmath>
int main()
{
// These are the loop control variables
int n, m, i, j, k;
// Read the limit
std::cin >> n;
// Outermost loop to execute the pattern {1..., 2..., 3...} n times
for (i = 0; i < n; ++i)
{
// This loop generates the required numbers 1, 2, and 3
for (j = 1; j <= 3; ++j)
{
// Display the generated number 2^i times
m = pow(2, i);
for (k = 0; k < m; ++k)
{
std::cout << j << ' ';
}
}
}
}
You can use the same logic in any language you choose to implement it.
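If you need the k-th term directly instead of printing the whole sequence, the same structure gives a simple lookup: pass n occupies 3 * 2^n positions, so passes 0 to n-1 together occupy 3 * (2^n - 1) positions. The C sketch below (0-based index k, hypothetical function name seq_term) walks forward pass by pass until it reaches the pass containing k, then divides the offset within that pass by the run length 2^n.

```c
/* k-th term (0-based) of 1 2 3 1 1 2 2 3 3 1 1 1 1 2 2 2 2 3 3 3 3 ...
   Pass n repeats each of 1, 2, 3 exactly 2^n times. */
int seq_term(long k) {
    long run = 1;    /* 2^n: length of each run in the current pass */
    long start = 0;  /* index at which the current pass begins */
    while (k >= start + 3 * run) {  /* k lies beyond the current pass */
        start += 3 * run;
        run *= 2;
    }
    return (int)((k - start) / run) + 1;  /* which of 1, 2, 3 */
}
```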
Using RcppEigen, I want to extract only the diagonal of a sparse matrix as a sparse matrix. It seemed easy enough, but none of my attempts below delivers the desired result. Note that attempt 5 doesn't even compile. Here are some resources I used: Rcpp Gallery, KDE Forum and, in the same post, KDE Forum (2), the Eigen Sparse Tutorial and SO. I feel like I am close... maybe not... I will let the experts decide.
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
#include <Eigen/SparseCore>
// [[Rcpp::export]]
Eigen::SparseMatrix<double> diag_mat1(Eigen::Map<Eigen::SparseMatrix<double> > &X){
// cannot access diagonal of mapped sparse matrix
const int n(X.rows());
Eigen::VectorXd dii(n);
for (int i = 0; i < n; ++i) {
dii[i] = X.coeff(i,i);
}
Eigen::SparseMatrix<double> ans(dii.asDiagonal());
return ans;
}
// [[Rcpp::export]]
Eigen::SparseMatrix<double> diag_mat2(Eigen::SparseMatrix<double> &X){
Eigen::SparseVector<double> dii(X.diagonal().sparseView());
Eigen::SparseMatrix<double> ans(dii);
return ans;
}
// [[Rcpp::export]]
Eigen::SparseMatrix<double> diag_mat3(Eigen::SparseMatrix<double> &X){
Eigen::VectorXd dii(X.diagonal());
Eigen::SparseMatrix<double> ans(dii.asDiagonal());
ans.pruned(); //hoping this helps
return ans;
}
// [[Rcpp::export]]
Eigen::SparseMatrix<double> diag_mat4(Eigen::SparseMatrix<double> &X){
Eigen::SparseMatrix<double> ans(X.diagonal().asDiagonal());
return ans;
}
// [[Rcpp::export]]
Eigen::SparseMatrix<double> diag_mat5(Eigen::SparseMatrix<double> &X){
struct keep_diag {
inline bool operator() (const int& row, const int& col, const double&) const
{ return row==col; }
};
Eigen::SparseMatrix<double> ans(X.prune(keep_diag()));
return ans;
}
/***R
library(Matrix)
set.seed(42)
nc <- nr <- 5
m <- rsparsematrix(nr, nc, nnz = 10)
diag_mat1(m)
diag_mat2(m)
diag_mat3(m)
diag_mat4(m)
*/
EDIT: Added the results that each attempt gives:
> diag_mat1(m)
5 x 5 sparse Matrix of class "dgCMatrix"
[1,] 0 . . . .
[2,] . -0.095 . . .
[3,] . . 0 . .
[4,] . . . 2 .
[5,] . . . . 1.5
> diag_mat2(m)
5 x 1 sparse Matrix of class "dgCMatrix"
[1,] .
[2,] -0.095
[3,] .
[4,] 2.000
[5,] 1.500
> diag_mat3(m)
5 x 5 sparse Matrix of class "dgCMatrix"
[1,] 0 . . . .
[2,] . -0.095 . . .
[3,] . . 0 . .
[4,] . . . 2 .
[5,] . . . . 1.5
> diag_mat4(m)
5 x 5 sparse Matrix of class "dgCMatrix"
[1,] 0 . . . .
[2,] . -0.095 . . .
[3,] . . 0 . .
[4,] . . . 2 .
[5,] . . . . 1.5
EDIT2: Added the desired output:
5 x 5 sparse Matrix of class "dgCMatrix"
[1,] . . . . .
[2,] . -0.095 . . .
[3,] . . . . .
[4,] . . . 2 .
[5,] . . . . 1.5
Answer, with inspiration thanks to Aleh:
Eigen::SparseMatrix<double> diag_mat6(Eigen::Map<Eigen::SparseMatrix<double> > &X){
const int n(X.rows());
Eigen::SparseMatrix<double> dii(n, n);
for (int i = 0; i < n; ++i) {
if (X.coeff(i,i) != 0.0 ) dii.insert(i, i) = X.coeff(i,i);
}
dii.makeCompressed();
return dii;
}
I prefer RcppArmadillo because it generally behaves more like R than RcppEigen does.
For your problem, with RcppArmadillo, you can do:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::sp_mat extractDiag(const arma::sp_mat& x) {
int n = x.n_rows;
arma::sp_mat res(n, n);
for (int i = 0; i < n; i++)
res(i, i) = x(i, i);
return res;
}
As suggested by @mtall, you can simply use:
// [[Rcpp::export]]
arma::sp_mat extractDiag3(const arma::sp_mat& x) {
return arma::diagmat(x);
}
If you really want to do this in Eigen, from the documentation, I came up with:
// [[Rcpp::export]]
Eigen::SparseMatrix<double> extractDiag2(Eigen::Map<Eigen::SparseMatrix<double> > &X){
int n = X.rows();
Eigen::SparseMatrix<double> res(n, n);
double d;
typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(n);
for (int i = 0; i < n; i++) {
d = X.coeff(i, i);
if (d != 0) tripletList.push_back(T(i, i, d));
}
res.setFromTriplets(tripletList.begin(), tripletList.end());
return res;
}
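The idea shared by these answers is to store only the nonzero diagonal entries instead of converting a dense diagonal, which keeps explicit zeros. As a language-neutral illustration, here is a C sketch over a compressed sparse column (CSC) matrix, the storage scheme behind Eigen's default (column-major) SparseMatrix and R's dgCMatrix; the struct csc_matrix and the function csc_extract_diag are hypothetical names. For each column j it keeps only the entry at row j, and only if that entry is nonzero.

```c
#include <stdlib.h>

/* Hypothetical minimal CSC matrix: the entries of column j are
   row_idx[col_ptr[j] .. col_ptr[j+1]-1] with matching values. */
typedef struct {
    int nrow, ncol;
    int *col_ptr;    /* length ncol + 1 */
    int *row_idx;    /* length nnz */
    double *values;  /* length nnz */
} csc_matrix;

/* Build a new CSC matrix holding only the nonzero diagonal entries of x.
   The caller frees out->col_ptr, out->row_idx and out->values.
   Returns 0 on success, -1 on allocation failure. */
int csc_extract_diag(const csc_matrix *x, csc_matrix *out) {
    int n = x->ncol < x->nrow ? x->ncol : x->nrow;  /* diagonal length */
    size_t cap = (size_t)(n > 0 ? n : 1);
    out->nrow = x->nrow;
    out->ncol = x->ncol;
    out->col_ptr = malloc((size_t)(x->ncol + 1) * sizeof(int));
    out->row_idx = malloc(cap * sizeof(int));
    out->values = malloc(cap * sizeof(double));
    if (!out->col_ptr || !out->row_idx || !out->values) {
        free(out->col_ptr);
        free(out->row_idx);
        free(out->values);
        return -1;
    }
    int nnz = 0;
    for (int j = 0; j < x->ncol; j++) {
        out->col_ptr[j] = nnz;
        if (j >= n) continue;  /* column has no diagonal position */
        for (int p = x->col_ptr[j]; p < x->col_ptr[j + 1]; p++) {
            /* keep (j, j) only when it holds a nonzero value */
            if (x->row_idx[p] == j && x->values[p] != 0.0) {
                out->row_idx[nnz] = j;
                out->values[nnz] = x->values[p];
                nnz++;
                break;  /* at most one diagonal entry per column */
            }
        }
    }
    out->col_ptr[x->ncol] = nnz;
    return 0;
}
```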
I think you just need to skip zero elements along the diagonal:
for (int i = 0; i < n; ++i) {
if (X.coeff(i,i) != 0.0)
dii[i] = X.coeff(i,i);
}
I'm trying to learn some CUDA, and I can't figure out how to solve the following situation:
Consider two groups G1 and G2:
G1 has 2 vectors with 3 elements each: a1 = {2,5,8} and b1 = {8,4,6}
G2 has 2 vectors with 3 elements each: a2 = {7,3,1} and b2 = {4,2,9}
The task is to sum vectors a and b of each group and return a sorted vector c, so:
G1 will give c1 = {10,9,14} => (sort algorithm) => c1 = {9,10,14}
G2 will give c2 = {11,5,10} => (sort algorithm) => c2 = {5,10,11}
If I have a GeForce card with 92 CUDA cores, I would like to create 92 groups and compute all the sums in parallel, so:
core 1-> G1 -> c1 = a1 + b1 -> sort c1 -> return c1
core 2-> G2 -> c2 = a2 + b2 -> sort c2 -> return c2
....
core 92-> G92 -> c92 = a92 + b92 -> sort c92 -> return c92
The kernel below sums two vectors in parallel and returns another one:
__global__ void add( int*a, int*b, int*c )
{
c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}
What I can't understand is how to make the kernel handle the entire vector, not only one element of it, and then return an entire vector.
Something like this:
__global__ void add( int*a, int*b, int*c, int size )
{
for (int i = 0; i < size ; i++)
c[i] = a[i] + b[i];
//sort c
}
Can anyone please explain whether this is possible, and how to do it?
This is a small example. It uses cudaMallocPitch and cudaMemcpy2D. I hope it will give you guidelines to solve your particular problem:
#include <stdio.h>
#include <cuda_runtime.h>
#define N 92 // number of groups: one thread per group
#define M 3 // elements per vector
// Each thread handles one row (one group) and computes c = a + b for that row.
__global__ void test_access(float* d_a,float* d_b,float* d_c,size_t pitch1,size_t pitch2,size_t pitch3)
{
int idx = threadIdx.x;
// Rows of a pitched allocation are pitch bytes apart, so step through a char* pointer.
float* row_a = (float*)((char*)d_a + idx*pitch1);
float* row_b = (float*)((char*)d_b + idx*pitch2);
float* row_c = (float*)((char*)d_c + idx*pitch3);
for (int i=0; i<M; i++) row_c[i] = row_a[i] + row_b[i];
printf("row %i column 0 value %f \n",idx,row_c[0]);
printf("row %i column 1 value %f \n",idx,row_c[1]);
printf("row %i column 2 value %f \n",idx,row_c[2]);
}
/********/
/* MAIN */
/********/
int main()
{
float a[N][M], b[N][M], c[N][M];
float *d_a, *d_b, *d_c; // device pointers to pitched 2D allocations
size_t pitch1,pitch2,pitch3;
cudaMallocPitch((void**)&d_a,&pitch1,M*sizeof(float),N);
cudaMallocPitch((void**)&d_b,&pitch2,M*sizeof(float),N);
cudaMallocPitch((void**)&d_c,&pitch3,M*sizeof(float),N);
for (int i=0; i<N; i++)
for (int j=0; j<M; j++) {
a[i][j] = i*j;
b[i][j] = -i*j+1;
}
cudaMemcpy2D(d_a,pitch1,a,M*sizeof(float),M*sizeof(float),N,cudaMemcpyHostToDevice);
cudaMemcpy2D(d_b,pitch2,b,M*sizeof(float),M*sizeof(float),N,cudaMemcpyHostToDevice);
test_access<<<1,N>>>(d_a,d_b,d_c,pitch1,pitch2,pitch3);
// This copy waits for the kernel to finish before reading d_c.
cudaMemcpy2D(c,M*sizeof(float),d_c,pitch3,M*sizeof(float),N,cudaMemcpyDeviceToHost);
for (int i=0; i<N; i++)
for (int j=0; j<M; j++) printf("row %i column %i value %f\n",i,j,c[i][j]);
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
return 0;
}
92 3-D vectors can be seen as one 276-D vector, so you can use the single-vector add kernel to add them. Thrust would be a simpler way to do this.
Update:
If your vectors are only 3-D, you could simply sort the elements sequentially right after they are calculated.
If your vectors have more dimensions, you could consider using cub::BlockRadixSort. The idea is to first add one vector per thread/block, then sort the vector within the block using cub::BlockRadixSort.
http://nvlabs.github.io/cub/classcub_1_1_block_radix_sort.html
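For the 3-element case, the per-group work is small enough to write as plain sequential code. The C sketch below uses a hypothetical function name, add_and_sort3; in CUDA it could be marked __device__ and called once per thread for that thread's group. It adds the two vectors and then sorts the three sums with a fixed sequence of three compare-and-swap steps, which is a sorting network for three elements.

```c
/* Swap *x and *y if they are out of order. */
static void cmp_swap(int *x, int *y) {
    if (*x > *y) {
        int t = *x;
        *x = *y;
        *y = t;
    }
}

/* c = a + b element-wise, then sort c ascending.
   (0,1), (1,2), (0,1): the first two steps move the maximum to c[2],
   the last step orders the remaining pair. */
void add_and_sort3(const int a[3], const int b[3], int c[3]) {
    for (int i = 0; i < 3; i++)
        c[i] = a[i] + b[i];
    cmp_swap(&c[0], &c[1]);
    cmp_swap(&c[1], &c[2]);
    cmp_swap(&c[0], &c[1]);
}
```

With the groups from the question, G1 gives {9,10,14} and G2 gives {5,10,11}. For longer vectors, the cub::BlockRadixSort route suggested above scales better than a hand-written network.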
How do I convert an int, n, to a string so that it is sent as a string when I send it over the serial port?
This is what I have so far:
int ledPin=13;
int testerPin=8;
int n=1;
char buf[10];
void setup()
{
pinMode(ledPin, OUTPUT);
pinMode(testerPin, OUTPUT);
Serial.begin(115200);
}
void loop()
{
digitalWrite(ledPin, HIGH);
sprintf(buf, "Hello!%d", n);
Serial.println(buf);
delay(500);
digitalWrite(ledPin, LOW);
delay(500);
n++;
}
Use like this:
String myString = String(n);
You can find more examples here.
Use the itoa() function declared in stdlib.h (it is not standard C, but avr-libc provides it):
char buffer[7]; //the ASCII of the integer will be stored in this char array
itoa(-31596,buffer,10); //(integer, yourBuffer, base)
You can simply do:
Serial.println(n);
which will convert n to an ASCII string automatically. See the documentation for Serial.println().
You just need to wrap it in a String object like this:
String numberString = String(n);
The String constructor also accepts other argument types; each line below is a separate example:
String stringOne = "Hello String"; // using a constant String
String stringOne = String('a'); // converting a constant char into a String
String stringTwo = String("This is a string"); // converting a constant string into a String object
String stringOne = String(stringTwo + " with more"); // concatenating two strings
String stringOne = String(13); // using a constant integer
String stringOne = String(analogRead(0), DEC); // using an int and a base
String stringOne = String(45, HEX); // using an int and a base (hexadecimal)
String stringOne = String(255, BIN); // using an int and a base (binary)
String stringOne = String(millis(), DEC); // using a long and a base
This is a speed-optimized solution for converting an int (signed 16-bit integer) into a string.
This implementation avoids division: the 8-bit AVR used in Arduino has no hardware DIV instruction, so the compiler has to call a comparatively slow software division routine. The fastest solution is therefore to build the string with conditional branches.
A fixed 7-byte buffer is reserved in RAM up front to avoid dynamic allocation; since it's only 7 bytes, the cost of the fixed RAM usage is minimal. As a hint to the compiler, the register modifier is added to the variable declarations to speed up execution.
char _int2str[7];
char* int2str( register int i ) {
register unsigned char L = 1;
register char c;
register boolean m = false;
register char b; // lower-byte of i
// negative
if ( i < 0 ) {
_int2str[ 0 ] = '-';
i = -i;
}
else L = 0;
// ten-thousands
if( i > 9999 ) {
c = i < 20000 ? 1
: i < 30000 ? 2
: 3;
_int2str[ L++ ] = c + 48;
i -= c * 10000;
m = true;
}
// thousands
if( i > 999 ) {
c = i < 5000
? ( i < 3000
? ( i < 2000 ? 1 : 2 )
: i < 4000 ? 3 : 4
)
: i < 8000
? ( i < 6000
? 5
: i < 7000 ? 6 : 7
)
: i < 9000 ? 8 : 9;
_int2str[ L++ ] = c + 48;
i -= c * 1000;
m = true;
}
else if( m ) _int2str[ L++ ] = '0';
// hundreds
if( i > 99 ) {
c = i < 500
? ( i < 300
? ( i < 200 ? 1 : 2 )
: i < 400 ? 3 : 4
)
: i < 800
? ( i < 600
? 5
: i < 700 ? 6 : 7
)
: i < 900 ? 8 : 9;
_int2str[ L++ ] = c + 48;
i -= c * 100;
m = true;
}
else if( m ) _int2str[ L++ ] = '0';
// tens (check on the lower byte to optimize code)
b = char( i );
if( b > 9 ) {
c = b < 50
? ( b < 30
? ( b < 20 ? 1 : 2 )
: b < 40 ? 3 : 4
)
: b < 80
? ( b < 60
? 5
: b < 70 ? 6 : 7
)
: b < 90 ? 8 : 9;
_int2str[ L++ ] = c + 48;
b -= c * 10;
m = true;
}
else if( m ) _int2str[ L++ ] = '0';
// last digit
_int2str[ L++ ] = b + 48;
// null terminator
_int2str[ L ] = 0;
return _int2str;
}
// Usage example:
int i = -12345;
char* s;
void setup() {
s = int2str( i );
}
void loop() {}
This sketch compiles to 1,082 bytes of code with the avr-gcc bundled with Arduino v1.0.5 (the int2str function itself is 594 bytes). Compared with the solution using a String object, which compiles to 2,398 bytes, this implementation reduces code size by about 1.3 KB (1,316 bytes), assuming you need no other String methods and your number is strictly of signed int type.
This function could be optimized further by writing it in assembly.
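A more compact way to avoid division, shown here as a plain C sketch rather than the author's code (the name int16_to_str is hypothetical, and its size and speed on AVR have not been measured), is to subtract each power of ten repeatedly and count how many times it fits. Converting through uint16_t before negating also handles -32768, which i = -i in the function above cannot represent.

```c
#include <stdint.h>

/* Convert a signed 16-bit integer to a decimal string without / or %.
   Each digit is the number of times the current power of ten can be
   subtracted. buf must hold at least 7 chars ("-32768" + terminator). */
char *int16_to_str(int16_t value, char *buf) {
    static const uint16_t powers[] = {10000, 1000, 100, 10, 1};
    char *p = buf;
    uint16_t u = (uint16_t)value;
    if (value < 0) {
        *p++ = '-';
        u = (uint16_t)(0u - u);  /* magnitude, computed in unsigned arithmetic */
    }
    int started = 0;             /* becomes 1 once the first digit is written */
    for (int k = 0; k < 5; k++) {
        char digit = '0';
        while (u >= powers[k]) { /* count how often this power fits */
            u -= powers[k];
            digit++;
        }
        /* Skip leading zeros, but always write the final digit. */
        if (digit != '0' || started || k == 4) {
            *p++ = digit;
            started = 1;
        }
    }
    *p = '\0';
    return buf;
}
```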
The solution above is much too big. Try this simpler one. Please provide a buffer of at least 7 characters; no bounds check is made.
char *i2str(int i, char *buf){
byte l=0;
if(i<0){ buf[l++]='-'; i=-i; } // emit the sign, then work with the magnitude
boolean leadingZ=true;
for(int div=10000, mod=0; div>0; div/=10){
mod=i%div;
i/=div;
if(!leadingZ || i!=0 || div==1){ // always emit the last digit, so 0 prints as "0"
leadingZ=false;
buf[l++]=i+'0';
}
i=mod;
}
buf[l]=0;
return buf;
}
It can easily be modified to return a pointer to the end of the buffer: drop the index l and increment buf directly.
This simply works for me:
int bpm = 60;
char text[256];
sprintf(text, "Pulso: %d ", bpm);
//now use text as string
In Arduino, using the String keyword creates an object of the String class, which has multiple constructor overloads. If an integer is passed as the argument, the resulting String contains the ASCII (decimal) representation of the number.
int num = 12;
String intString = String(num);
// The value of intString should be "12"
Please check out the arduino String reference.
Below is a self-written myitoa() that is much smaller in code and fills a FIXED array of 7 chars (including the terminating 0) in char *mystring, which is often desirable. The sign is always written ('+' or '-'), and the digits are right-aligned and padded with spaces. If you need a variable-length output string, you can instead build it by shifting characters.
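void myitoa(int number, char *mystring) {
mystring[0] = number<0 ? '-' : '+'; // sign character
number = number<0 ? -number : number; // work with the magnitude
for (int n=5; n>0; n--) { // fill digits right to left
mystring[n] = ' '; // pad unused positions with spaces
if (number > 0 || n == 5) mystring[n] = number%10 + '0'; // n == 5: 0 prints as "0"
number /= 10;
}
mystring[6]=0; // terminating null
}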
Serial.println(val)
Serial.println(val, format)
For more, see the Arduino reference for Serial.println():
https://www.arduino.cc/en/Serial/Println
I hope this helps.