Free up memory in C creating issue - pointers

I have a structure
typedef struct {
    char * name;
    Player players[10];
} Team;
Memory is leaking because I am using malloc in initializeTeam.
Team * initializeTeam(int players, char *name)
{
    int i=0;
    Team *teama=malloc(sizeof(*teama));
    for(i=0 ; i < players ; i++)
    {
        teama->players[i].defensive=((rand() % 7) + 1);
        teama->players[i].offensive=((rand() % 10) + 1);
    }
    return teama;
}
I don't want the memory to leak.
I could call free(teama); inside the function, but then the value is lost. Since I am returning teama, I want the value to remain valid.
How can I go about it?
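One common way to handle this, sketched below, is to treat the returned pointer as owned by the caller: returning malloc'd memory is not a leak in itself, it only leaks if nobody ever frees it. The sketch is written as C++ (hence the casts on malloc, which plain C does not need). The Player layout, the copy of name, the bounds check, and the destroyTeam helper are all assumptions for illustration, since the question does not show them.

```cpp
#include <cstdlib>
#include <cstring>

// Assumed layout: the question does not show Player, so this is a guess.
struct Player { int defensive; int offensive; };

struct Team {
    char *name;
    Player players[10];
};

// Allocates a Team. The CALLER owns the result and must pass it to destroyTeam.
Team *initializeTeam(int players, const char *name) {
    if (players < 0 || players > 10 || name == nullptr) return nullptr;
    Team *teama = static_cast<Team *>(std::malloc(sizeof *teama));
    if (teama == nullptr) return nullptr;
    // Assumption: copy the name so the Team does not depend on the caller's buffer.
    teama->name = static_cast<char *>(std::malloc(std::strlen(name) + 1));
    if (teama->name == nullptr) { std::free(teama); return nullptr; }
    std::strcpy(teama->name, name);
    for (int i = 0; i < players; i++) {
        teama->players[i].defensive = (std::rand() % 7) + 1;
        teama->players[i].offensive = (std::rand() % 10) + 1;
    }
    return teama;
}

// Hypothetical counterpart: releases everything initializeTeam allocated.
void destroyTeam(Team *team) {
    if (team == nullptr) return;
    std::free(team->name);
    std::free(team);
}
```

The caller would then write something like Team *t = initializeTeam(5, "Reds"); use t for as long as needed, and call destroyTeam(t); exactly once when finished.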


C++ Vector understanding

I am currently practicing my coding skills. I finished one of the problems and wanted to check what other people's solutions were, and I found one I couldn't understand. It's much better than my solution and I would like to understand it better. For example, what does "*std" do exactly? What is the += i doing to min_element, and what is happening to the minimum elements?
long queueTime(std::vector<int> customers,int n){
    std::vector<long> queues(n, 0);
    for (int i : customers)
        *std::min_element(queues.begin(), queues.end()) += i;
    return *std::max_element(queues.cbegin(), queues.cend());
}
This was my solution:
#include <iostream>
#include <vector>
#include <array>
using namespace std;
long queueTime(std::vector<int> customers,int n){
int i = 0; //start of Queue
int count = 0; //keeps track of how many items has been
int biggest = 0; //Last/largest ending item size, add to count at end
int list [n]; //Declared number of registers by size n
for(int k = 0;k<n;k++) //sets each existing register to have 0
list[k] = 0;
//Start of processing customers, ends when last customer is at register.
for (auto i = customers.begin(); i!=customers.end();)
//checks if there are free registers.
for(int index = 0; index<n && i!=customers.end();index++)
list[index] = *i;
//Subtract 1 from every register
int temp=0;
for (int k =0;k<n;k++)
if(list[k]!= 0)
temp = list[k];
temp = temp-1;
list[k] = temp;
//increase count of items processed
//calculates the largest number of items a customer has amongst the last few
for(int j=0;j<n;j++)
biggest = list[j];
//end first part
cout<<"\nCount: "<<count<<" Biggest: "<<biggest<<endl;
cout<<"End Function:";
//answer if number of items processed + last biggest number of items.
return count+biggest;
The code is mapping a set of integers to n buckets and minimizing the sum of the elements assigned to a given bucket.
For each customer i, the smallest element of queues is incremented by i. Then the largest resulting queue value is returned.
std::min_element, like std::vector, is a qualified name: it refers to an identifier within the std namespace. The * in front of it is a separate operator, not part of "std".
min_element returns an iterator. The dereference operator (*) produces an lvalue referring to that element, which the compound assignment operator (+=) then increments in place.
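To see the dereference-and-increment step in isolation, here is a small sketch. The helper name add_to_smallest and the snake_case queue_time are made up for illustration; the logic is intended to mirror the short solution quoted in the question, with the one-liner split into separate steps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adds `amount` to the smallest element of `queues`
// and returns the index of the element that was changed.
std::size_t add_to_smallest(std::vector<long>& queues, long amount) {
    assert(!queues.empty());
    // min_element returns an iterator to the first smallest element.
    auto it = std::min_element(queues.begin(), queues.end());
    // Dereferencing gives a reference into the vector, so += changes it in place.
    *it += amount;
    return static_cast<std::size_t>(it - queues.begin());
}

// Same idea as the quoted solution: each customer joins the least loaded till.
long queue_time(const std::vector<int>& customers, int n) {
    assert(n > 0);
    std::vector<long> queues(static_cast<std::size_t>(n), 0L);
    for (int c : customers) add_to_smallest(queues, c);
    // The busiest till determines the total time.
    return *std::max_element(queues.cbegin(), queues.cend());
}
```

For instance, with customers {2, 3, 10} and two tills, the tills go from {0, 0} to {2, 0}, then {2, 3}, then {12, 3}, so the result is 12.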

Using MPI_Barrier leads to fatal error

I am seeing strange behavior in my simple MPI program. I spent time looking for an answer myself but could not find one. I read some questions here, such as "OpenMPI MPI_Barrier problems", "MPI_SEND stops working after MPI_BARRIER", and "Using MPI_Bcast for MPI communication". I also read the MPI tutorial on mpitutorial.
My program just modifies an array that was broadcast from the root process, then gathers the modified arrays into one array and prints them.
The problem is that when I use the code listed below with MPI_Barrier(MPI_COMM_WORLD) uncommented, I get an error.
#include "mpi/mpi.h"
#define N 4
void transform_row(int* row, const int k) {
    for (int i = 0; i < N; ++i) {
        row[i] *= k;
    }
}
const int root = 0;
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, ranksize;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ranksize);
    if (rank == root) {
        int* arr = new int[N];
        for (int i = 0; i < N; ++i) {
            arr[i] = i * i + 1;
        }
        MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
    }
    int* arr = new int[N];
    MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
    transform_row(arr, rank * 100);
    int* transformed = new int[N * ranksize];
    MPI_Gather(arr, N, MPI_INT, transformed, N, MPI_INT, root, MPI_COMM_WORLD);
    if (rank == root) {
        for (int i = 0; i < ranksize; ++i) {
            for (int j = 0; j < N ; j++) {
                printf("%i ", transformed[i * N + j]);
            }
        }
    }
    return 0;
}
The error occurs when the number of processes is greater than 1. The error:
Fatal error in PMPI_Barrier: Message truncated, error stack:
PMPI_Barrier(425)...................: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)..............: Failure during collective
MPIDI_CH3U_Request_unpack_uebuf(568): Message truncated; 16 bytes received but buffer size is 1
I understand that there is some problem with a buffer, but using MPI_Buffer_attach to attach a big buffer to MPI does not help.
It seems I need to increase this buffer, but I don't know how to do this.
XXXXXX#XXXXXXXXX:~/test_mpi$ mpirun --version
HYDRA build details:
Version: 3.2
Release Date: Wed Nov 11 22:06:48 CST 2015
So help me please.
One issue is MPI_Bcast() is invoked twice by the root rank, but only once by the other ranks. And then root rank uses an uninitialized arr.
MPI_Barrier() might only hide the problem, but it cannot fix it.
Also, note that if N is "large enough", then the second MPI_Bcast() invoked by root rank will likely hang.
Here is how you can revamp the init/broadcast phase to fix these issues.
int* arr = new int[N];
if (rank == root) {
    for (int i = 0; i < N; ++i) {
        arr[i] = i * i + 1;
    }
}
MPI_Bcast(arr, N, MPI_INT, root, MPI_COMM_WORLD);
Note in this case, you can simply initialize arr on all the ranks so you do not even need to broadcast the array.
As a side note, MPI programs typically
#include <mpi.h>
and then use mpicc (or mpicxx for C++ code such as this) for compilation and linking
(this is a wrapper that invokes the real compiler after setting the include/library paths and adding the MPI libraries)

How can I write the memory pointer in CUDA [duplicate]

I declared two GPU memory pointers, allocated the GPU memory, transferred the data, and launched the kernel in main:
// declare GPU memory pointers
char * gpuIn;
char * gpuOut;
// allocate GPU memory
cudaMalloc(&gpuIn, ARRAY_BYTES);
cudaMalloc(&gpuOut, ARRAY_BYTES);
// transfer the array to the GPU
cudaMemcpy(gpuIn, currIn, ARRAY_BYTES, cudaMemcpyHostToDevice);
// launch the kernel
role<<<dim3(1),dim3(40,20)>>>(gpuOut, gpuIn);
// copy back the result array to the CPU
cudaMemcpy(currOut, gpuOut, ARRAY_BYTES, cudaMemcpyDeviceToHost);
And this is my code inside the kernel:
__global__ void role(char * gpuOut, char * gpuIn){
    int idx = threadIdx.x;
    int idy = threadIdx.y;
    char live = '0';
    char dead = '.';
    char f = gpuIn[idx][idy];
}
But I get some errors, and I think they are related to the pointers. Can anybody help?
The key concept is the storage order of multidimensional arrays in memory -- this is well described here. A useful abstraction is to define a simple class which encapsulates a pointer to a multidimensional array stored in linear memory and provides an operator which gives something like the usual a[i][j] style access. Your code could be modified something like this:
template<typename T>
struct array2d
{
    T* p;
    size_t lda;
    __device__ __host__
    array2d(T* _p, size_t _lda) : p(_p), lda(_lda) {};
    __device__ __host__
    T& operator()(size_t i, size_t j) {
        return p[j + i * lda];
    }
    __device__ __host__
    const T& operator()(size_t i, size_t j) const {
        return p[j + i * lda];
    }
};
__global__ void role(array2d<char> gpuOut, array2d<char> gpuIn){
    int idx = threadIdx.x;
    int idy = threadIdx.y;
    char live = '0';
    char dead = '.';
    char f = gpuIn(idx,idy);
}
int main()
{
    const int rows = 5, cols = 6;
    const size_t ARRAY_BYTES = sizeof(char) * size_t(rows * cols);
    // declare GPU memory pointers
    char * gpuIn;
    char * gpuOut;
    char currIn[rows][cols], currOut[rows][cols];
    // allocate GPU memory
    cudaMalloc(&gpuIn, ARRAY_BYTES);
    cudaMalloc(&gpuOut, ARRAY_BYTES);
    // transfer the array to the GPU
    cudaMemcpy(gpuIn, currIn, ARRAY_BYTES, cudaMemcpyHostToDevice);
    // launch the kernel
    role<<<dim3(1),dim3(rows,cols)>>>(array2d<char>(gpuOut, cols), array2d<char>(gpuIn, cols));
    // copy back the result array to the CPU
    cudaMemcpy(currOut, gpuOut, ARRAY_BYTES, cudaMemcpyDeviceToHost);
    return 0;
}
The important point here is that a two dimensional C or C++ array stored in linear memory can be addressed as col + row * number of cols. The class in the code above is just a convenient way of expressing this.
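The col + row * number of cols rule can be checked on the host without any CUDA at all. The following is a minimal sketch in plain C++; the names flat_index and make_grid are invented for this illustration, and a std::vector stands in for the linear device buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps (row, col) to the offset in a row-major buffer that has `cols` columns.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);
    return col + row * cols;
}

// Fills a rows x cols buffer so that element (r, c) holds r * 10 + c,
// which makes it easy to see which logical cell a flat offset refers to.
std::vector<int> make_grid(std::size_t rows, std::size_t cols) {
    std::vector<int> buf(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            buf[flat_index(r, c, cols)] = static_cast<int>(r * 10 + c);
        }
    }
    return buf;
}
```

With 5 rows and 6 columns, cell (2, 3) lands at offset 3 + 2 * 6 = 15, so make_grid(5, 6)[15] holds 23.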

parallelize for loop using boost MPI

I am learning to use Boost.MPI to parallelize a large amount of computation. Below is a simple test to see whether I have the MPI logic right; however, I could not get it to work. I used world.size()=10 and there are 50 elements in total in the data array, so each process does 5 iterations. I hoped to update the data array by having each process send its updated data array to the root process, which then receives it and prints it out. But only a few elements get updated.
Thanks for helping me.
#include <boost/mpi.hpp>
#include <iostream>
#include <cstdlib>
namespace mpi = boost::mpi;
using namespace std;
#define max_rows 100
int data[max_rows];
int modifyArr(const int index, const int arr[]) {
    return arr[index]*2+1;
}
int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;
    int num_rows = 50;
    int my_number;
    if (world.rank() == 0) {
        for ( int i = 0; i < num_rows; i++)
            data[i] = i + 1;
    }
    broadcast(world, data, 0);
    for (int i = world.rank(); i < num_rows; i += world.size()) {
        my_number = modifyArr(i, data);
        data[i] = my_number;
        world.send(0, 1, data);
        //cout << "i=" << i << " my_number=" << my_number << endl;
        if (world.rank() == 0) {
            for (int j = 1; j < world.size(); j++)
                mpi::status s = world.recv(boost::mpi::any_source, 1, data);
        }
    }
    if (world.rank() == 0) {
        for ( int i = 0; i < num_rows; i++)
            cout << "i=" << i << " results = " << data[i] << endl;
    }
    return 0;
}
Your problem is probably here:
mpi::status s = world.recv(boost::mpi::any_source, 1, data);
This is the only way data can get back to the master node.
However, you do not tell the master node where in data to store the answers it is getting. Since data is the address of the array, everything should get stored in the zeroth element.
Interleaving which elements of the array you are processing on each node is a pretty bad idea. You should assign blocks of the array to each node so that you can send entire chunks of the array at once. That will reduce communication overhead significantly.
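As a rough illustration of the block assignment suggested above, the sketch below computes which contiguous index range each rank would own. It is plain C++ with no MPI calls; the function name block_range and the rule of giving the leftover items to the lowest ranks are assumptions for this example, and how each block is then sent back to the root is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns the half-open index range [begin, end) owned by `rank` when
// `n` items are split into contiguous blocks across `size` ranks.
// The first (n % size) ranks each get one extra item.
std::pair<int, int> block_range(int n, int rank, int size) {
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);
    const int base = n / size;
    const int extra = n % size;
    const int begin = rank * base + std::min(rank, extra);
    const int count = base + (rank < extra ? 1 : 0);
    return {begin, begin + count};
}
```

With 50 elements and 10 ranks, rank 0 owns indices 0 to 4 and rank 9 owns indices 45 to 49, so each rank would only need to send back its own 5 contiguous values.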
Also, if your issue is simply speeding up for loops, you should consider OpenMP, which can do things like this:
#pragma omp parallel for
for(int i=0;i<100;i++)
Bam! I just split that for loop up between all of my threads with no further work needed.

How do I read ext2 root directory from mapped memory?

I'm making a Remote Filesystem Server for my university and I'm having some trouble with reading the root directory... Here's the thing:
I've read the root inode (inode 2) and it has consistent data; for example, the owner user ID field is set to '1000'. Then I proceed to read the contents of the inode's data blocks, but when I try to access the data block in question (the only one that is addressed in the inode's i_block array, 240 in my debugging), all bytes are set to '0'. Can anyone help me with this? It's really important. Note: I cannot do this any other way than with mapped memory, and I'm not opening a real disk but rather a .disk Linux file. It was created with the command line
mkfs.ext2 -F -r 0 -b 1024 ext2.disk 30000
Here's my code:
#include <linux/ext2_fs.h>
typedef struct s_inode *pinode; /* Pointer to inode struct */
typedef struct s_direct *pdir; /* Pointer to direct struct */
int main(int argv, char *argc[]){
    int *data;
    pdir root = malloc(sizeof(struct s_direct));
    /* Code for mapping .disk file, fetching superblock, and other ext2 data */
    /* fsys is a global variable that holds general ext2 system data */
    fsys->root = get_inode(2);
    data = get_cont(fsys->root);
    root = (pdir)getblock(data[0]);
}
pinode get_inode(int idx){
    pinode inod;
    int grp, offs;
    grp = (idx-1)/fsys->superblock->s_inodes_per_group;
    offs = (idx-1)%fsys->superblock->s_inodes_per_group;
    inod = (pinode)&fsys->diskmap[(fsys->group[grp]->itab)+offs*sizeof(struct s_inode)];
    return inod;
}
int *get_cont(pinode inod){
int *cont;
int *idx;
int i=0;
int *block;
idx = malloc(sizeof(int));
cont = malloc(sizeof(int));
while(i < inod->i_blocks && i<13) {
realloc(cont, i*sizeof(int));
if(i < inod->i_blocks){
fetchcont(block, idx, cont, inod->i_blocks, 0);
if(i < inod->i_blocks){
fetchcont(block, idx, cont, inod->i_blocks, 1);
if(i < inod->i_blocks){
fetchcont(block, idx, cont, inod->i_blocks, 2);
return cont;
int fetchcont(int *block, int *idx, int *cont, int lim, int lvl){
int i=0;
if(lvl == 0){
while((*idx) < lim && i<fsys->bsize){
realloc(cont, (*idx)*sizeof(int));
return 1;
return 0;
if(!fetchcont((int*)getblock(block[i]), idx, cont, lim, lvl)){
return 0;
void *getblock(int idx){
    char *block;
    int grp, offs;
    grp = (idx-1)/fsys->superblock->s_blocks_per_group;
    offs = (idx-1)%fsys->superblock->s_blocks_per_group;
    block = &fsys->diskmap[fsys->group[grp]->blocks+offs*fsys->bsize];
    return block;
}
Solved the problem. I assumed that block n was the n-th data block, but the block number is an offset that counts ALL the blocks. I've changed my getblock function to
void *getblock(int idx){
    return &fsys->diskmap[fsys->bsize*idx];
}
and it worked!
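The arithmetic behind the fixed getblock can be sketched on its own. This is a plain C++ illustration, not the asker's code: block_offset and block_ptr are invented names, and the image pointer plays the role of fsys->diskmap. The bounds check is an added assumption to keep reads inside the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of block `idx`, counted from the start of the disk image.
std::size_t block_offset(std::uint32_t idx, std::size_t block_size) {
    return static_cast<std::size_t>(idx) * block_size;
}

// Hypothetical stand-in for getblock(): returns a pointer to the first byte
// of block `idx` inside a memory-mapped image of `image_size` bytes.
const unsigned char *block_ptr(const unsigned char *image, std::size_t image_size,
                               std::uint32_t idx, std::size_t block_size) {
    const std::size_t off = block_offset(idx, block_size);
    assert(off + block_size <= image_size);  // stay inside the mapping
    return image + off;
}
```

With the 1024-byte blocks used in the mkfs.ext2 command above, block 240 starts at byte 240 * 1024 = 245760 of the image, and block 1, which holds the superblock at that block size, starts at byte 1024.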
