pointers memory allocation using calloc example - pointers

When the pointer array itself has size 4, why does printing the 5th value give a random number? Tell me how this random value comes about. Thanks!
#include <stdio.h>
#include <stdlib.h>
int main()
{
int* p_array;
int i;
// call calloc to allocate that appropriate number of bytes for the array
p_array = (int *)calloc(4,sizeof(int)); // allocate 4 ints
for(i=0; i < 4; i++)
{
p_array[i] = 1;
}
for(i=0; i < 4; i++)
{
printf("%d\n",p_array[i]);
}
printf("%d\n",p_array[5]); // when the size of pointer array is itself 4 and when i try to print 5th value it gives a random number.How?
free(p_array);
return 0;
}

Array indices start at zero, so with calloc(4, sizeof(int)) the valid elements are p_array[0] through p_array[3]. p_array[5] is past the end of the allocation, so you are printing whatever happens to sit in that piece of memory on your system.
Read this for a great description of why arrays are zero-based.
e.g.:
p_array[0] = 1;
p_array[1] = 1;
p_array[2] = 1;
p_array[3] = 1;
p_array[4] = ?????; // out of bounds
p_array[5] = ?????; // out of bounds

The following has undefined behaviour since you're reading past the end of the array:
p_array[5]

printf("%d\n",p_array[5]);
is an attempt to read past the end of your array. p_array = (int *)calloc(4,sizeof(int)); only reserves room for 4 ints, p_array[0]
through p_array[3], so p_array[5] refers to memory that was never allocated to the array and hence prints a garbage value.
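If you actually need a valid element at index 5, the allocation has to cover it. As a minimal sketch of my own (not from the question), allocating 6 ints with calloc makes indices 0 through 5 valid and zero-filled:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Allocate 6 ints so indices 0..5 are all valid; calloc zero-fills them. */
    int *p_array = calloc(6, sizeof *p_array);
    if (p_array == NULL)
        return 1;

    for (int i = 0; i < 4; i++)
        p_array[i] = 1;

    for (int i = 0; i < 6; i++)
        printf("p_array[%d] = %d\n", i, p_array[i]); /* last two print 0, not garbage */

    free(p_array);
    return 0;
}

Building the original program with an out-of-bounds checker such as GCC/Clang's -fsanitize=address will also flag the read of p_array[5] at run time.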

Testing slob.c using malloc()

I have altered slob.c so that it gathers stats on the last 100 small-list allocations. I made the necessary edits to make sure SLOB is being used.
I am running a test program that calls malloc() about 10,000 or 100,000 times with a request size of 20 bytes.
But the SLOB stats collected immediately after the test program runs state that the average claimed size was 140 bytes (when I was expecting it to be at least somewhere near 20 bytes).
What am I doing wrong? Is there a way to accurately test SLOB?
I am pretty sure that my stat collecting is accurate, as I have had a few professors check it out for me. This is my current test program:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *a;
    int i;
    for (i = 0; i < 1000000; i++)
    {
        a = (char*) malloc(20*sizeof(char));
        if (a == NULL) printf("NULL\n");
    }
    // Here I print the system-call stats for claimed memory and free memory
    return 0;
}
My original answer was actually correct:
char * a ;
int i ;
for( i = 0; i < 10000; i++)
{
a = (char*) malloc(20*sizeof(char)) ;
if(a == NULL) printf("NULL\n") ;
}
This simple code can be used to test how the altered slob file allocates memory (considering you are gathering statistics within the slob file).
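One small refinement of my own (not part of the original answer): free each block inside the loop, so the test does not leak roughly 20 bytes per iteration and the allocator's free path gets exercised as well. A sketch:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;
    /* Same 20-byte request pattern, but each block is released again. */
    for (i = 0; i < 100000; i++) {
        char *a = malloc(20);
        if (a == NULL) {
            printf("NULL\n");
            continue;
        }
        a[0] = 1; /* touch the block so the allocation is actually used */
        free(a);
    }
    return 0;
}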

Mapping MPI processes to particular nodes

I think this question may be out of place here, but I couldn't help myself.
Suppose I have a cluster of 100 nodes, each node having 16 cores.
I have an MPI application whose communication pattern is already known, and I also know the cluster topology (i.e., the hop distance between nodes).
Now I know the process-to-node mapping that reduces the contention on the network. For example: the process-to-node mappings are 10->20, 30->90.
How do I map the process with rank 10 to node-20?
Please help me with this.
If you are not constrained by any kind of queueing system, you can control the rank-to-node mapping by creating your own machinefile.
For instance if the file my_machine_file has the following 1600 lines
node001
node002
node003
....
node100
node001
node002
node003
....
node100
...
[repeat 13 more times]
...
node001
node002
node003
....
node100
it would correspond to the mapping
0-> node001, 1 -> node002, ... 99 -> node100, 100 -> node001, ...
You should run your application with
mpirun -machinefile my_machine_file -n 1600 my_app
When your application needs fewer than 1600 processes, you can edit your machinefile accordingly.
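Writing the 1600 host lines by hand is tedious; here is a small sketch of my own (the node names node001..node100 are the made-up ones from the example above) that prints such a machinefile, to be redirected into my_machine_file:

#include <stdio.h>

int main(void)
{
    /* node001..node100 repeated 16 times (one line per core/slot),
       giving the round-robin rank-to-node mapping described above. */
    int rep, n;
    for (rep = 0; rep < 16; rep++)
        for (n = 1; n <= 100; n++)
            printf("node%03d\n", n);
    return 0;
}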
Please remember, though, that the cluster admin has probably numbered the nodes in a way that respects the topology of the interconnect. Still, there are reports of noticeable performance gains (on the order of 10%-20%) from careful exploitation of the cluster topology. (References to follow.)
Note: Starting an MPI program with mpirun is neither standardized nor portable. However, here the question is clearly about a specific compute cluster and a specific implementation (OpenMPI) and does not ask for a portable solution.
A little late to this party, but here's a subroutine in C++ that will give you a node communicator and a master communicator (just for the masters of nodes), as well as the size and rank of each. It's clumsy, but I haven't found a better way to do this unfortunately. Luckily it only adds about 0.1s to the wall times. Maybe you or someone else will get some use out of it.
#define MASTER 0
using namespace std;
/*
* Make a communicator for each node and another for just
* the masters of the nodes. Upon completion, everyone is
* in a new node communicator, knows its size and their rank,
* and the rank of their master in the master communicator,
* which can be useful to use for indexing.
*/
bool CommByNode(MPI::Intracomm &NodeComm,
MPI::Intracomm &MasterComm,
int &NodeRank, int &MasterRank,
int &NodeSize, int &MasterSize,
string &NodeNameStr)
{
bool IsOk = true;
int Rank = MPI::COMM_WORLD.Get_rank();
int Size = MPI::COMM_WORLD.Get_size();
/*
* ======================================================================
* What follows is my best attempt at creating a communicator
* for each node in a job such that only the cores on that
* node are in the node's communicator, and each core groups
* itself and the node communicator is made using the Split() function.
* The end of this (lengthy) process is indicated by another comment.
* ======================================================================
*/
char *NodeName, *NodeNameList;
NodeName = new char [1000];
int NodeNameLen,
*NodeNameCountVect,
*NodeNameOffsetVect,
NodeNameTotalLen = 0;
// Get the name and name character count of each core's node
MPI::Get_processor_name(NodeName, NodeNameLen);
// Prepare a vector for character counts of node names
if (Rank == MASTER)
NodeNameCountVect = new int [Size];
// Gather node name lengths to master to prepare c-array
MPI::COMM_WORLD.Gather(&NodeNameLen, 1, MPI::INT, NodeNameCountVect, 1, MPI::INT, MASTER);
if (Rank == MASTER){
// Need character count information for navigating node name c-array
NodeNameOffsetVect = new int [Size];
NodeNameOffsetVect[0] = 0;
NodeNameTotalLen = NodeNameCountVect[0];
// build offset vector and total char count for all node names
for (int i = 1 ; i < Size ; ++i){
NodeNameOffsetVect[i] = NodeNameCountVect[i-1] + NodeNameOffsetVect[i-1];
NodeNameTotalLen += NodeNameCountVect[i];
}
// char-array for all node names
NodeNameList = new char [NodeNameTotalLen];
}
// Gatherv node names to char-array in master
MPI::COMM_WORLD.Gatherv(NodeName, NodeNameLen, MPI::CHAR, NodeNameList, NodeNameCountVect, NodeNameOffsetVect, MPI::CHAR, MASTER);
string *FullStrList, *NodeStrList;
// Each core keeps its node's name in a str for later comparison
stringstream ss;
ss << NodeName;
ss >> NodeNameStr;
delete [] NodeName; // node name now stored in the string, so delete the c-array
int *NodeListLenVect, NumUniqueNodes = 0, NodeListCharLen = 0;
string NodeListStr;
if (Rank == MASTER){
/*
* Need to prepare a list of all unique node names, so first
* need all node names (incl duplicates) as strings, then
* can make a list of all unique node names.
*/
FullStrList = new string [Size]; // full list of node names, each will be checked
NodeStrList = new string [Size]; // list of unique node names, used for checking above list
// i loops over node names, j loops over characters for each node name.
for (int i = 0 ; i < Size ; ++i){
stringstream ss;
for (int j = 0 ; j < NodeNameCountVect[i] ; ++j)
ss << NodeNameList[NodeNameOffsetVect[i] + j]; // each char into the stringstream
ss >> FullStrList[i]; // stringstream into string for each node name
ss.str(""); // This and below clear the contents of the stringstream,
ss.clear(); // since the >> operator doesn't clear as it extracts
//cout << FullStrList[i] << endl; // for testing
}
delete [] NodeNameList; // master is done with full c-array
bool IsUnique; // flag for breaking from for loop
stringstream ss; // used for a full c-array of unique node names
for (int i = 0 ; i < Size ; ++i){ // Loop over EVERY name
IsUnique = true;
for (int j = 0 ; j < NumUniqueNodes ; ++j)
if (FullStrList[i].compare(NodeStrList[j]) == 0){ // check against list of uniques
IsUnique = false;
break;
}
if (IsUnique){
NodeStrList[NumUniqueNodes] = FullStrList[i]; // add unique names so others can be checked against them
ss << NodeStrList[NumUniqueNodes].c_str(); // build up a string of all unique names back-to-back
++NumUniqueNodes; // keep a tally of number of unique nodes
}
}
ss >> NodeListStr; // make a string of all unique node names
NodeListCharLen = NodeListStr.size(); // char length of all unique node names
NodeListLenVect = new int [NumUniqueNodes]; // list of unique node name lengths
/*
* Because Bcast simply duplicates the buffer of the Bcaster to all cores,
* the buffer needs to be a char* so that the other cores can have a similar
* buffer prepared to receive. This wouldn't work if we passed string.c_str()
* as the buffer, because the receiving cores don't have string.c_str() to
* receive into, and even if they did, c_str() is a method and can't be used
* that way.
*/
NodeNameList = new char [NodeListCharLen]; // allocate a plain char buffer of the required size
NodeListStr.copy(NodeNameList, NodeListCharLen); // copy the unique-name string into the buffer
for (int i = 0 ; i < NumUniqueNodes ; ++i) // fill list of unique node name char lengths
NodeListLenVect[i] = NodeStrList[i].size();
/*for (int i = 0 ; i < NumUnique ; ++i)
cout << UniqueNodeStrList[i] << endl;
MPI::COMM_WORLD.Abort(1);*/
delete [] NodeStrList; // allocated with new[], so release with delete []
delete [] FullStrList;
delete [] NodeNameCountVect;
delete [] NodeNameOffsetVect;
}
/*
* Now we send the list of node names back to all cores
* so they can group themselves appropriately.
*/
// Bcast the number of nodes in use
MPI::COMM_WORLD.Bcast(&NumUniqueNodes, 1, MPI::INT, MASTER);
// Bcast the full length of all node names
MPI::COMM_WORLD.Bcast(&NodeListCharLen, 1, MPI::INT, MASTER);
// prepare buffers for node name Bcast's
if (Rank > MASTER){
NodeListLenVect = new int [NumUniqueNodes];
NodeNameList = new char [NodeListCharLen];
}
// Lengths of node names for navigating c-string
MPI::COMM_WORLD.Bcast(NodeListLenVect, NumUniqueNodes, MPI::INT, MASTER);
// The actual full list of unique node names
MPI::COMM_WORLD.Bcast(NodeNameList, NodeListCharLen, MPI::CHAR, MASTER);
/*
* Similar to what master did before, each core (incl master)
* needs to build an actual list of node names as strings so they
* can compare the c++ way.
*/
int Offset = 0;
NodeStrList = new string[NumUniqueNodes];
for (int i = 0 ; i < NumUniqueNodes ; ++i){
stringstream ss;
for (int j = 0 ; j < NodeListLenVect[i] ; ++j)
ss << NodeNameList[Offset + j];
ss >> NodeStrList[i];
ss.str("");
ss.clear();
Offset += NodeListLenVect[i];
//cout << FullStrList[i] << endl;
}
// Now since everyone has the same list, just check your node and find your group.
int CommGroup = -1;
for (int i = 0 ; i < NumUniqueNodes ; ++i)
if (NodeNameStr.compare(NodeStrList[i]) == 0){
CommGroup = i;
break;
}
if (Rank > MASTER){
delete [] NodeListLenVect;
delete [] NodeNameList;
}
// If a process failed to identify its comm group, print an error so the caller can abort.
if (CommGroup < 0){
cout << "**ERROR** Rank " << Rank << " didn't identify comm group correctly." << endl;
IsOk = false;
}
/*
* ======================================================================
* The above method uses c++ strings wherever possible so that things
* like node name comparisons can be done the c++ way. I'm sure there's
* a better way to do this because that was way too many lines of code...
* ======================================================================
*/
// Create node communicators
NodeComm = MPI::COMM_WORLD.Split(CommGroup, 0);
NodeSize = NodeComm.Get_size();
NodeRank = NodeComm.Get_rank();
// Group for master communicator
int MasterGroup;
if (NodeRank == MASTER)
MasterGroup = 0;
else
MasterGroup = MPI_UNDEFINED;
// Create master communicator
MasterComm = MPI::COMM_WORLD.Split(MasterGroup, 0);
MasterRank = -1;
MasterSize = -1;
if (MasterComm != MPI::COMM_NULL){
MasterRank = MasterComm.Get_rank();
MasterSize = MasterComm.Get_size();
}
MPI::COMM_WORLD.Bcast(&MasterSize, 1, MPI::INT, MASTER);
NodeComm.Bcast(&MasterRank, 1, MPI::INT, MASTER);
return IsOk;
}
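If an MPI-3 library is available, the same node/master split can be obtained much more cheaply with MPI_Comm_split_type. This is my own sketch using the plain C bindings, not part of the answer above:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* One communicator per shared-memory node (MPI-3). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* A communicator containing only the rank-0 process of every node. */
    MPI_Comm master_comm;
    MPI_Comm_split(MPI_COMM_WORLD,
                   node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &master_comm);

    printf("world rank %d is node rank %d of %d\n",
           world_rank, node_rank, node_size);

    if (master_comm != MPI_COMM_NULL)
        MPI_Comm_free(&master_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}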

mpi_gather for struct with dynamic array

I have a struct:
typedef struct
{
double distance;
int* path;
} tour;
Then I try to gather the results from all processes:
MPI_Gather(&best, sizeof(tour), MPI_BEST, all_best, sizeof(tour)*proc_count, MPI_BEST, 0, MPI_COMM_WORLD);
After the gather, my root sees that all_best contains only 1 normal element and garbage in the others.
Type of all_best is tour*.
Initialisation of MPI_BEST:
void ACO_Build_best(tour *tour,int city_count, MPI_Datatype *mpi_type /*out*/)
{
int block_lengths[2];
MPI_Aint displacements[2];
MPI_Datatype typelist[2];
MPI_Aint start_address;
MPI_Aint address;
block_lengths[0] = 1;
block_lengths[1] = city_count;
typelist[0] = MPI_DOUBLE;
typelist[1] = MPI_INT;
MPI_Address(&(tour->distance), &displacements[0]);
MPI_Address(&(tour->path), &displacements[1]);
displacements[1] = displacements[1] - displacements[0];
displacements[0] = 0;
MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
MPI_Type_commit(mpi_type);
}
Any ideas are welcome.
Apart from the incorrect lengths passed to MPI_Gather, MPI does not follow pointers to pointers. With such a structured type you would be sending the value of distance and the value of the path pointer (essentially an address, which is meaningless when sent to another process). If one supposes that distance essentially gives the number of elements in path, then you can more or less achieve your goal with a combination of MPI_Gather and MPI_Gatherv:
First, gather the lengths:
int counts[proc_count];
MPI_Gather(&best->distance, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);
Now that counts is populated with the correct lengths, you can continue and use MPI_Gatherv to receive all paths:
int disps[proc_count];
disps[0] = 0;
for (int i = 1; i < proc_count; i++)
disps[i] = disps[i-1] + counts[i-1];
// Allocate space for the concatenation of all paths
int *all_paths = malloc((disps[proc_count-1] + counts[proc_count-1])*sizeof(int));
MPI_Gatherv(best->path, best->distance, MPI_INT,
all_paths, counts, disps, MPI_INT, 0, MPI_COMM_WORLD);
Now you have the concatenation of all paths in all_paths. You can examine or extract an individual path by taking counts[i] elements starting at position disps[i] in all_paths. Or you can even build an array of tour structures and make them use the already allocated and populated path storage:
tour *all_best = malloc(proc_count*sizeof(tour));
for (int i = 0; i < proc_count; i++)
{
all_best[i].distance = counts[i];
all_best[i].path = &all_paths[disps[i]];
}
Or you can duplicate the segments instead:
for (int i = 0; i < proc_count; i++)
{
all_best[i].distance = counts[i];
all_best[i].path = malloc(counts[i]*sizeof(int));
memcpy(all_best[i].path, &all_paths[disps[i]], counts[i]*sizeof(int));
}
// all_paths is not needed any more and can be safely free()-ed
Edit: Because I've overlooked the definition of the tour structure, the above code actually works with:
struct
{
int distance;
int *path;
}
where distance holds the number of significant elements in path. This is different from your case, but without some information on how tour.path is being allocated (and sized), it's hard to give a specific solution.
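A different route, offered as my own hedged sketch rather than part of the answer above: if there is a compile-time upper bound on the path length (MAX_CITIES below is my own placeholder, not from the question), the path can be stored inline in the struct, a single derived datatype can describe the whole record, and the original one-call MPI_Gather pattern works.

#include <stddef.h> /* offsetof */
#include <mpi.h>

#define MAX_CITIES 64 /* assumed upper bound on the path length */

typedef struct {
    double distance;
    int path[MAX_CITIES]; /* inline array, so MPI can copy it directly */
} tour_fixed;

void build_tour_fixed_type(MPI_Datatype *mpi_type)
{
    int block_lengths[2] = { 1, MAX_CITIES };
    MPI_Datatype typelist[2] = { MPI_DOUBLE, MPI_INT };
    MPI_Aint displacements[2] = {
        offsetof(tour_fixed, distance),
        offsetof(tour_fixed, path)
    };
    MPI_Datatype tmp;

    MPI_Type_create_struct(2, block_lengths, displacements, typelist, &tmp);
    /* Pin the extent to sizeof(tour_fixed) so arrays of the struct line up. */
    MPI_Type_create_resized(tmp, 0, sizeof(tour_fixed), mpi_type);
    MPI_Type_commit(mpi_type);
    MPI_Type_free(&tmp);
}

With MPI_BEST built this way, MPI_Gather(&best, 1, MPI_BEST, all_best, 1, MPI_BEST, 0, MPI_COMM_WORLD) gathers exactly one fixed-size record per rank.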

using stars when declaring an array

I have two questions regarding multidimensional arrays. I declared a 3D array using two stars, but when I try to access the elements I get a used-without-being-initialized error.
unsigned **(test[10]);
**(test[0]) = 5;
How come I get that error, while with the following code I don't get one? What's the difference?
unsigned test3[10][10][10];
**(test3[0]) = 5;
My second question is this: I'm trying to port a piece of code that was written for Unix to Windows. One of the lines is this:
unsigned **(precomputedHashesOfULSHs[nnStruct->nHFTuples]);
nHFTuples is of type int but it's not a constant, and this is the error that I'm getting:
error C2057: expected constant expression
Is it possible that I'm getting this error because I'm running it on Windows not Unix? - and how would I solve this problem? I can't make nHFTuples a constant because the user will need to provide the value for it!
In the first one, you didn't declare a 3D array, you declared an array of 10 pointers to pointers to unsigned ints. When you dereference it, you're dereferencing a garbage pointer.
In the second one, you declared the array correctly. Dereferencing it that way happens to work because arrays decay to pointers, but it is much clearer to index the elements directly.
Do this:
unsigned test3[10][10][10];
test3[0][0][0] = 5;
To answer your second question, the size of an array must normally be a constant known at compile time. C99 introduced variable-length arrays, and GCC also supports them as an extension, but they are not portable (Visual C++ does not support them). To fix it, you'll have to use malloc and free:
int i, j;
/* secondDim and thirdDim stand for the sizes of the two inner dimensions. */
unsigned ***precomputedHashOfULSHs = malloc(nnStruct->nHFTuples * sizeof(unsigned **));
for (i = 0; i < nnStruct->nHFTuples; ++i) {
    precomputedHashOfULSHs[i] = malloc(secondDim * sizeof(unsigned *));
    for (j = 0; j < secondDim; ++j)
        precomputedHashOfULSHs[i][j] = malloc(thirdDim * sizeof(unsigned));
}
// then when you're done...
for (i = 0; i < nnStruct->nHFTuples; ++i) {
    for (j = 0; j < secondDim; ++j)
        free(precomputedHashOfULSHs[i][j]);
    free(precomputedHashOfULSHs[i]);
}
free(precomputedHashOfULSHs);
(Pardon me if that allocation/deallocation code is wrong, it's late :))
Although you don't specify it, I think you're using a compiler on Unix that supports C99 (such as GCC), whereas the compiler you use on Windows does not support it (Visual Studio only supports C89 here).
You have three options:
You can hard-code a suitable maximum array size.
You could allocate the array yourself using malloc or calloc. Don't forget to free it when you're done.
Port the program to C++, and use std::vector.
If you choose option 3, then you'll want something like:
std::vector<unsigned int> precomputedHashOfULSHs;
for a one-dimensional vector, or, for a two-dimensional vector, use:
std::vector<std::vector<unsigned int> > precomputedHashOfULSHs;
Do note that vectors default to being empty, of zero length, so you will need to add each element from the original set.
In the case of test3 as an example, you'll want:
std::vector<std::vector<std::vector<unsigned int> > > precomputedHashOfULSHs;
precomputedHashOfULSHs.resize(10);
for(int i = 0; i < 10; i++) {
precomputedHashOfULSHs[i].resize(10);
for(int ii=0; ii<10; ii++) {
precomputedHashOfULSHs[i][ii].resize(10);
}
}
I haven't tested this code, but it should be right. C++ will manage the memory of that vector for you automatically.
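Going back to option 2 (plain malloc), another possibility, offered as my own sketch rather than anything from the original answers, is to allocate one contiguous block and index it manually; d1, d2, d3 below are placeholder run-time dimensions, not names from the question:

#include <stdlib.h>

/* One contiguous block holding d1*d2*d3 unsigned values.
   Element (i, j, k) lives at buf[(i * d2 + j) * d3 + k]. */
unsigned *alloc_3d_block(size_t d1, size_t d2, size_t d3)
{
    return malloc(d1 * d2 * d3 * sizeof(unsigned));
}

One malloc and one free, better locality, and no nested clean-up loops; the trade-off is arithmetic indexing instead of buf[i][j][k].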

stream.Read method accepts length as integer type, whereas Length is a long

I am trying to read a file from a stream,
and I am using the Stream.Read method to read the bytes. So the code goes like below:
FileByteStream.Read(buffer, 0, outputMessage.FileByteStream.Length)
Now the above gives me an error because the last parameter, outputMessage.FileByteStream.Length, returns a long value but the method expects an int.
Please advise.
Convert it to an int...
FileByteStream.Read(buffer, 0, Convert.ToInt32(outputMessage.FileByteStream.Length))
It's probably an int because this operation blocks until it's done reading...so if you're in a high volume application, you may not want to block while you read in a massively large file.
If what you're reading in isn't reasonably sized, you may want to consider looping to read the data into a buffer (example from MSDN docs):
//s is the stream that I'm working with...
byte[] bytes = new byte[s.Length];
int numBytesToRead = (int) s.Length;
int numBytesRead = 0;
while (numBytesToRead > 0)
{
// Read may return anything from 0 to 10.
int n = s.Read(bytes, numBytesRead, 10);
// The end of the file is reached.
if (n == 0)
{
break;
}
numBytesRead += n;
numBytesToRead -= n;
}
This way you don't cast, and if you pick a reasonably large chunk size to read into the buffer (rather than 10 bytes at a time), you'll only go through the while loop a few times.
