MPI - passing argument 1 makes pointer from integer without a cast

I am trying to send the rank of each process to the process on its right in a ring. When I used
MPI_Send(msg, 100, MPI_CHAR, right, 99, MPI_COMM_WORLD);
MPI_Recv(msg, 100, MPI_CHAR, left, 99, MPI_COMM_WORLD, &status);
where msg was declared as char msg[100], everything was OK. Now that I have changed it to
MPI_Send(value, 1, MPI_INT, right, 99, MPI_COMM_WORLD);
MPI_Recv(value, 1, MPI_INT, left, 99, MPI_COMM_WORLD, &status);
where value is an int (value = value + rank), I get a compilation error for each MPI_Send and MPI_Recv: passing argument 1 makes pointer from integer without a cast. Does anyone know how to solve it?
Thanks

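MPI_Send and MPI_Recv expect a pointer to the buffer. A char array decays to a pointer, but an int does not, so pass the address of value:
MPI_Send(&value, 1, MPI_INT, right, 99, MPI_COMM_WORLD);
MPI_Recv(&value, 1, MPI_INT, left, 99, MPI_COMM_WORLD, &status);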


Solving for possible combinations of variables that add up to a certain number or range of numbers

Good evening all, this is my first question and I am hoping someone on here might be able to at least point me in a direction.
I am trying to figure out how to optimally stack pallets in a new storage facility. I need to configure the racking ahead of time in order to accept different sized pallets.
I am thinking of using between 3-6 different pallet height openings, say 105", 100", 84", 78", 72" and 66".
What I need to do is figure out every possible combination of these pallet heights that will have the top of the top beam at, say, 439".
An example of a combination would be (1) 105" pallet, (1) 100" pallet and (3) 78" pallets.
Another example would be (1) 105" pallet, (1) 100" pallet, (1) 84" pallet, (1) 78" pallet and (1) 72" pallet.
Obviously there are a number of these combinations...and I need to find them all.
I'm wondering if this is possible with Excel? I just discovered "Solver" but haven't quite figured it out yet.
Any input would be greatly appreciated. I am kind of running in circles here...
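Using a bit of Python and the python-constraint solver (https://pypi.org/project/python-constraint/). Note that MaxSumConstraint accepts every combination whose total height is at most 439, and that the height list below omits 66: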
import constraint
h = [105, 100, 84, 78, 72] # heights
total = 439
n = len(h) # number of different heights
# max number of pallets that can fit
Max = int(max([total/h[i] for i in range(n)]))
problem = constraint.Problem()
problem.addVariables( [f"h{j}" for j in h], range(Max+1) )
problem.addConstraint(constraint.MaxSumConstraint(total,h))
s = problem.getSolutions()
print(f"number of solutions:{len(s)}")
print(s)
Output:
number of solutions:194
[{'h100': 4, 'h105': 0, 'h72': 0, 'h78': 0, 'h84': 0},
{'h100': 3, 'h105': 1, 'h72': 0, 'h78': 0, 'h84': 0},
{'h100': 3, 'h105': 0, 'h72': 1, 'h78': 0, 'h84': 0},
...
{'h100': 0, 'h105': 0, 'h78': 0, 'h84': 0, 'h72': 1},
{'h100': 0, 'h105': 0, 'h78': 0, 'h84': 0, 'h72': 0}]
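The two examples in the question total exactly 439", whereas the output above also contains shorter stacks (including the empty one). As a minimal sketch of the exact-total case, assuming all six heights from the question and using only plain Python, the enumeration could look like this. The helper name `exact_combinations` is hypothetical and not part of python-constraint.

```python
def exact_combinations(heights, total):
    """Return every tuple of per-height counts whose heights sum to exactly `total`."""
    results = []

    def extend(index, remaining, counts):
        # All heights have been assigned a count: keep the tuple only if nothing is left over.
        if index == len(heights):
            if remaining == 0:
                results.append(tuple(counts))
            return
        height = heights[index]
        # Try every count of this height that still fits in the remaining space.
        for count in range(remaining // height + 1):
            extend(index + 1, remaining - count * height, counts + [count])

    extend(0, total, [])
    return results


heights = [105, 100, 84, 78, 72, 66]  # opening heights from the question, in inches
combos = exact_combinations(heights, 439)
print(f"number of exact combinations: {len(combos)}")
for combo in combos:
    print(dict(zip(heights, combo)))
```

Each tuple lists the counts in the same order as `heights`, so (1, 1, 0, 3, 0, 0) is the first example from the question.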

mpi_alltoall on nonequal number of grids and processes

I understand the general usage of MPI_Alltoall, which can be described by the following figure.
In practice, however, the number of processes will almost never equal the number of grids. The case above assumes process = grid = 4. If the numbers are not equal, I will have rectangular grids. Below I show a similar alltoall operation, but with an unequal number of grids and processes (grid = 8, process = 2).
My question is then very straightforward: how should I achieve that?
I have looked at MPI_Alltoallv, but I don't think it will work.
Any suggestions are welcome.
Thank you
a "natural" alltoall would be
MPI_Alltoall(sbuf, 4, MPI_INT, rbuf, 4, MPI_INT, MPI_COMM_WORLD);
and you would end up with
P0 = { A0, A1, A2, A3, C0, C1, C2, C3}
P1 = { B0, B1, B2, B3, D0, D1, D2, D3}
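To make the block exchange concrete, here is a small pure-Python simulation of that call (no MPI involved). It assumes P0 starts with {A0..A3, B0..B3} and P1 with {C0..C3, D0..D3}, which is the layout implied by the result above; the function name `simulate_alltoall` is made up for this sketch.

```python
def simulate_alltoall(send_buffers, block):
    """Mimic MPI_Alltoall: rank src's block number dst lands on rank dst as block number src."""
    nranks = len(send_buffers)
    recv_buffers = [[None] * (block * nranks) for _ in range(nranks)]
    for src, sbuf in enumerate(send_buffers):
        for dst in range(nranks):
            chunk = sbuf[dst * block:(dst + 1) * block]
            recv_buffers[dst][src * block:(src + 1) * block] = chunk
    return recv_buffers


# Assumed initial layout of the two send buffers.
sbuf_p0 = ["A0", "A1", "A2", "A3", "B0", "B1", "B2", "B3"]
sbuf_p1 = ["C0", "C1", "C2", "C3", "D0", "D1", "D2", "D3"]

natural = simulate_alltoall([sbuf_p0, sbuf_p1], block=4)
print("P0 =", natural[0])
print("P1 =", natural[1])
```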
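Your case is a bit more convoluted, and you have to use (complex) derived datatypes. (Note that I did not free the intermediate datatypes, in order to keep the code readable. The extents passed to MPI_Type_create_resized are in bytes, so the values 4 and 8 assume a 4-byte int.)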
MPI_Datatype tmp, stype, rtype;
/* derived datatype for send */
MPI_Type_vector(2, 1, 4, MPI_INT, &tmp); /* {0, 4} */
MPI_Type_create_resized(tmp, 0, 4, &tmp); /* next type starts at 1 */
MPI_Type_contiguous(2, tmp, &tmp); /* {0, 4, 1, 5} */
MPI_Type_create_resized(tmp, 0, 8, &stype); /* next type starts at 2, likely unnecessary */
MPI_Type_commit(&stype);
/* derived datatype for recv */
MPI_Type_vector(2, 2, 4, MPI_INT, &tmp); /* {0, 1, 4, 5 } */
MPI_Type_create_resized(tmp, 0, 8, &rtype); /* next type starts at 2 */
MPI_Type_commit(&rtype);
/* all2all */
/* thanks to the derived datatypes :
P0 sends {A0, B0, A1, B1} to P0 and {A2, B2, A3, B3} to P1
P0 receives {A0, B0, .., .., A1, B1, .., ..} from itself, and
{ .., .., C0, D0, .., .., C1, D1} from P1 } */
MPI_Alltoall(sbuf, 1, stype, rbuf, 1, rtype, MPI_COMM_WORLD);
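The index arithmetic behind the two derived datatypes can be checked without MPI. The sketch below is a pure-Python simulation, assuming the same initial buffers as before (P0 holds A0..A3, B0..B3 and P1 holds C0..C3, D0..D3) and a 4-byte int, so that both 8-byte extents correspond to a stride of 2 elements. The names `SEND_PATTERN`, `RECV_PATTERN` and `simulate_typed_alltoall` are illustrative only.

```python
SEND_PATTERN = (0, 4, 1, 5)  # element offsets covered by one stype
RECV_PATTERN = (0, 1, 4, 5)  # element offsets covered by one rtype
STRIDE = 2                   # both types have an 8-byte extent, i.e. 2 ints


def simulate_typed_alltoall(send_buffers):
    """Mimic MPI_Alltoall(sbuf, 1, stype, rbuf, 1, rtype, ...) on lists of labels."""
    nranks = len(send_buffers)
    recv_buffers = [[None] * len(sbuf) for sbuf in send_buffers]
    for src, sbuf in enumerate(send_buffers):
        for dst in range(nranks):
            # The item sent to rank dst starts dst extents into the send buffer.
            payload = [sbuf[dst * STRIDE + off] for off in SEND_PATTERN]
            # The item received from rank src starts src extents into the receive buffer.
            for off, item in zip(RECV_PATTERN, payload):
                recv_buffers[dst][src * STRIDE + off] = item
    return recv_buffers


sbuf_p0 = ["A0", "A1", "A2", "A3", "B0", "B1", "B2", "B3"]
sbuf_p1 = ["C0", "C1", "C2", "C3", "D0", "D1", "D2", "D3"]

typed = simulate_typed_alltoall([sbuf_p0, sbuf_p1])
print("P0 =", typed[0])
print("P1 =", typed[1])
```

Under these assumptions P0 ends up with the first two columns of every row and P1 with the last two, which matches the comment in the C code.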

Threejs - How to offset all points on a 2d geometry by distance

Using Three.js (although I believe this is more math related), I have a set of 2D points that can create a 2D geometry, such as a square, rectangle, pentagon, or custom 2D shape. Based on the original 2D shape, I would like to create a method that offsets the points inward or outward uniformly, as in the attached image.
I don't know if there is a simple way to offset/grow/shrink all the points (Vector3) of the 2D shape uniformly inward or outward. If there is, it would be great to be able to offset the points by a distance X, i.e. offset the points on the 2D shape outward or inward by X.
And no, I'm not referring to scaling from a center point. While scaling may work for symmetrical shapes, it won't work for non-symmetrical shapes.
See the image for an example.
Thanks in advance.
You can read that forum thread.
I've made some changes to ProfiledContourGeometry and got OffsetContour, so I'll leave it here in case it helps :)
function OffsetContour(offset, contour) {
  let result = [];
  offset = new THREE.BufferAttribute(new Float32Array([offset, 0, 0]), 3);
  console.log("offset", offset);
  for (let i = 0; i < contour.length; i++) {
    let v1 = new THREE.Vector2().subVectors(contour[i - 1 < 0 ? contour.length - 1 : i - 1], contour[i]);
    let v2 = new THREE.Vector2().subVectors(contour[i + 1 == contour.length ? 0 : i + 1], contour[i]);
    let angle = v2.angle() - v1.angle();
    let halfAngle = angle * 0.5;
    let hA = halfAngle;
    let tA = v2.angle() + Math.PI * 0.5;
    let shift = Math.tan(hA - Math.PI * 0.5);
    let shiftMatrix = new THREE.Matrix4().set(
      1, 0, 0, 0,
      -shift, 1, 0, 0,
      0, 0, 1, 0,
      0, 0, 0, 1
    );
    let tempAngle = tA;
    let rotationMatrix = new THREE.Matrix4().set(
      Math.cos(tempAngle), -Math.sin(tempAngle), 0, 0,
      Math.sin(tempAngle), Math.cos(tempAngle), 0, 0,
      0, 0, 1, 0,
      0, 0, 0, 1
    );
    let translationMatrix = new THREE.Matrix4().set(
      1, 0, 0, contour[i].x,
      0, 1, 0, contour[i].y,
      0, 0, 1, 0,
      0, 0, 0, 1,
    );
    let cloneOffset = offset.clone();
    console.log("cloneOffset", cloneOffset);
    shiftMatrix.applyToBufferAttribute(cloneOffset);
    rotationMatrix.applyToBufferAttribute(cloneOffset);
    translationMatrix.applyToBufferAttribute(cloneOffset);
    result.push(new THREE.Vector2(cloneOffset.getX(0), cloneOffset.getY(0)));
  }
  return result;
}
Feel free to modify it :)
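For readers who only want the underlying math, here is a library-free sketch of per-vertex offsetting. It is not a port of OffsetContour: it uses the miter formula (n1 + n2) / (1 + n1·n2), where n1 and n2 are the unit normals of the two edges meeting at a vertex, and it assumes a simple polygon with vertices listed counter-clockwise. The names `offset_polygon` and `_outward_normal` are made up for this example, and the sketch does not handle edges that vanish when shrinking too far.

```python
import math


def _outward_normal(a, b):
    """Unit normal of edge a->b pointing outward for a counter-clockwise polygon."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    return (dy / length, -dx / length)


def offset_polygon(points, distance):
    """Move every edge by `distance` (positive = outward, negative = inward)."""
    result = []
    count = len(points)
    for i in range(count):
        prev_pt, cur, next_pt = points[i - 1], points[i], points[(i + 1) % count]
        n1 = _outward_normal(prev_pt, cur)
        n2 = _outward_normal(cur, next_pt)
        # The miter vector has unit projection on both normals, so both
        # adjacent edges end up exactly `distance` away from their originals.
        denom = 1.0 + n1[0] * n2[0] + n1[1] * n2[1]
        if abs(denom) < 1e-12:
            raise ValueError(f"edges fold back on themselves at vertex {i}")
        result.append((cur[0] + distance * (n1[0] + n2[0]) / denom,
                       cur[1] + distance * (n1[1] + n2[1]) / denom))
    return result


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
print(offset_polygon(square, 0.5))       # grow the square by 0.5
print(offset_polygon(rectangle, -0.25))  # shrink the rectangle by 0.25
```

Because each edge moves by the same distance, this differs from scaling about a center point, which is why it also works for non-symmetrical shapes.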
I have some doubts about solutions that do not modify the number of edges.
I faced the same issue in this project, where I wanted to ensure a known distance between Voronoi cells, and I quickly figured out that scaling does not fulfill the use case. One complication I faced was the disappearance of some edges, which I had to handle in a while loop. It was so difficult to debug that I had to create a debug mode that helps to see the points and lines, and I left it available. This debug mode can be activated with a checkbox.
The images were linked rather than embedded; the edges that will disappear are shown in red:
[Images: retraction snapshot 1; retraction with edges discarded 1; retraction with edges discarded 2]
Here is a link to the function in action; you might have to modify it to accept another point format, though:
https://github.com/WebSVG/voronoi/blob/8893768e3929ea713a47dba2c4d273b775e0bd82/src/voronoi_diag.js#L278
And here is a link to the complete project integrating this function; it has a link to a live demo too:
https://github.com/WebSVG/voronoi

MPI Communication Pattern

I was wondering if there was a smart way to do this. Let's say I have three nodes, 0, 1, 2. And let's say each node has an array, a0, a1, a2. If the contents of each node is something like
a0 = {0, 1, 2, 1}
a1 = {1, 2, 2, 0}
a2 = {0, 0, 1, 2}
Is there a clever communication pattern to move each number to its corresponding node, i.e.
a0 = {0, 0, 0, 0}
a1 = {1, 1, 1, 1}
a2 = {2, 2, 2, 2}
The approach I have in mind, would involve sorting and temporary buffers, but I was wondering if there was a smarter way?
You can use MPI_Alltoallv for this in the following way:
1. Sort local_data (a) in increasing order of the destination node of each element.
2. Create a send_displacements array such that send_displacements[r] is the index of the first element in local_data destined for node r.
3. Create a send_counts array such that send_counts[r] is the number of elements in local_data destined for node r. It can be computed as send_counts[r] = send_displacements[r+1] - send_displacements[r], except for the last rank, where it is the length of local_data minus send_displacements[r].
4. Exchange the counts: MPI_Alltoall(send_counts, 1, MPI_INT, recv_counts, 1, MPI_INT, comm)
5. Compute recv_displacements such that recv_displacements[r] = sum(recv_counts[r'] for all r' < r).
6. Allocate recv_data with sum(recv_counts) elements.
7. Exchange the data: MPI_Alltoallv(local_data, send_counts, send_displacements, MPI_INT, recv_data, recv_counts, recv_displacements, MPI_INT, comm)
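The bookkeeping in these steps can be tried out without an MPI runtime. The following is a pure-Python simulation of the counts, displacements and data movement for the three arrays from the question. It only mimics what the collective calls would do, and the helper names `plan_sends` and `simulate_alltoallv` are invented for the sketch.

```python
def plan_sends(local_data, nranks):
    """Sort by destination rank and derive send counts and displacements."""
    ordered = sorted(local_data)
    counts = [0] * nranks
    for dest in ordered:
        counts[dest] += 1
    displs = [0] * nranks
    for r in range(1, nranks):
        displs[r] = displs[r - 1] + counts[r - 1]  # exclusive prefix sum
    return ordered, counts, displs


def simulate_alltoallv(all_local_data):
    """Return the receive buffer each rank would hold after the exchange."""
    nranks = len(all_local_data)
    plans = [plan_sends(data, nranks) for data in all_local_data]
    received = []
    for dest in range(nranks):
        # This is what MPI_Alltoall on send_counts would deliver to rank dest.
        recv_counts = [plans[src][1][dest] for src in range(nranks)]
        recv_displs = [sum(recv_counts[:src]) for src in range(nranks)]
        recv_data = [None] * sum(recv_counts)
        for src in range(nranks):
            ordered, counts, displs = plans[src]
            chunk = ordered[displs[dest]:displs[dest] + counts[dest]]
            recv_data[recv_displs[src]:recv_displs[src] + recv_counts[src]] = chunk
        received.append(recv_data)
    return received


a0 = [0, 1, 2, 1]
a1 = [1, 2, 2, 0]
a2 = [0, 0, 1, 2]
after = simulate_alltoallv([a0, a1, a2])
for rank, data in enumerate(after):
    print(f"a{rank} = {data}")
```

In this example every node happens to receive four elements; in general the receive sizes differ per node, which is why the counts are exchanged first.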

How to multiply each digit in a number efficiently

I want to multiply all the digits of a number together.
For example:
515 would become 25 (i.e. 5*1*5)
10 would become 0 (i.e. 1*0)
111111 would become 1 (i.e. 1*1*1*1*1*1)
I used this code to do it
public static int evaluate(int no)
{
    if(no==0)return 0;
    int temp=1;
    do
    {
        temp=(no%10)*temp;
        no=no/10;
    }while(no>0);
    return temp;
}
The problem is that I want to evaluate about a billion numbers like this:
for(int i=0;i<1000000000;i++)evaluate(i);
This takes about 146 seconds on my processor. I want it to finish within a few seconds.
So, is it possible to optimize this code using shift, AND, or OR operators to reduce the evaluation time, without using multiple threads or parallelizing it?
Thanks
First, figure out how many numbers you can store in memory. For this example, let's say you can store 1000 numbers.
Your first step will be to precalculate the products of digits for all numbers from 0 to 999 and store them in memory. So, you'd have an array along the lines of:
multLookup = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
0, 2, 4, 6, 8, 10, 12, 14, 16, 18,
0, 3, 6, 9, 12, 15, 18, 21, 24, 27,
0, 4, 8, 12, 16, 20, 24, 28, 32, 36,
...]
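Now, you'd break your number up into a bunch of 3-digit numbers. For example, if your number is 1739203423, you'd break it up into 1, 739, 203, and 423. Every chunk except the leading one stands for exactly three digits, so a chunk such as 007 must count as 0*0*7 = 0; for those chunks the lookup result has to be 0 whenever the chunk value is below 100. You'd look each of these up in your multLookup array, and multiply the results together, like so: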
solution = multLookup[1] * multLookup[739] * multLookup[203] * multLookup[423];
With this approach, you process three digits per lookup instead of one, so as a rough estimate the calculation speeds up by a factor of about 3 (since we picked a 1000-entry table). To speed it up by about 5, store 100000 entries (0 to 99999) and follow the same steps. In your case, a factor of 5 would bring the 146 seconds down to roughly 29.2 seconds.
Note: the gain isn't exactly linear with respect to how many numbers you store in memory. See jogojapan's reasoning in the comments under this answer for why that is.
If you know more about the order in which your numbers show up, or the range of your numbers (say your input is only in the range of [0, 10000]), you can make this algorithm smarter.
In your example, you're using a for loop to iterate from 0 to 1000000000. In this case, this approach will be super efficient because the memory won't page-fault very frequently and there will be fewer cache-misses.
But wait! You can make this even faster (for your specific for-loop iteration example)!! How, you ask? Caching! Lets say you're going through 10 digit numbers.
Let's say you start off at 8934236000. With the 1000-entry table, you'd break this down into 8, 934, 236, and 000. Then you'd multiply:
solution = multLookup[8] * multLookup[934] * multLookup[236] * multLookup[0];
Next, you'd take 8934236001, break it down to 8, 934, 236, and 001, and multiply:
solution = multLookup[8] * multLookup[934] * multLookup[236] * multLookup[1];
And so on... But notice that the first three lookups are the same for all 1000 numbers that share this prefix! So, we cache that.
cache = multLookup[8] * multLookup[934] * multLookup[236];
And then we use the cache as such:
for (int i = 0; i < 1000; i++) {
    // the last chunk is zero-padded to three digits, so its product is 0 for i < 100
    solution = cache * (i < 100 ? 0 : multLookup[i]);
}
And just like that, we've cut the lookups per number from four to one, so as a rough estimate the time drops by a factor of almost 4. Taking the ~29.2 second estimate from before and dividing it by 4 gives roughly 7.3 seconds to go through all billion numbers.
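The chunked lookup can be sanity-checked against the digit-by-digit loop. The sketch below is in Python for brevity rather than speed, and the names `LEAD`, `PADDED` and `digit_product_chunked` are made up for it. It keeps two tables: one for the leading chunk, and one for the zero-padded lower chunks, where values below 100 contain a 0 digit.

```python
def digit_product(n):
    """Reference implementation: multiply the decimal digits one at a time."""
    if n == 0:
        return 0
    product = 1
    while n > 0:
        n, digit = divmod(n, 10)
        product *= digit
    return product


CHUNK = 1000
# Table for the leading chunk (no padding): LEAD[7] == 7, LEAD[10] == 0.
LEAD = [digit_product(i) for i in range(CHUNK)]
# Table for every lower chunk, read as exactly three digits: 007 -> 0.
PADDED = [0 if i < 100 else LEAD[i] for i in range(CHUNK)]


def digit_product_chunked(n):
    """Multiply the digits three at a time using the two lookup tables."""
    product = 1
    while n >= CHUNK:
        n, low = divmod(n, CHUNK)
        product *= PADDED[low]
    return product * LEAD[n]


for value in (515, 10, 111111, 1739203423, 999999999):
    print(value, digit_product_chunked(value))
```

Keeping the running product of the upper chunks while only the last chunk changes is the caching step described above.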
If you can store the result for every number, then you can use memoization. That way you only need to process one digit per call.
#include <vector>
int prodOf(int num){
    // can be optimized to store 1/10 of the numbers, since the last digit will always be processed
    static std::vector<int> memo(<max number of iterations>, -1);
    if(num < 10) return num; // single digit: the product is the digit itself (0 for 0)
    if(memo[num] != -1) return memo[num];
    int prod = (num%10) * prodOf(num/10);
    memo[num] = prod;
    return prod;
}
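When the numbers are visited in increasing order, the same recurrence can fill a table bottom-up, because n / 10 is always smaller than n and therefore already computed. A minimal Python sketch of that idea follows; the function name `build_product_table` is hypothetical, and a table for a full billion entries would need a compact array type rather than a Python list.

```python
def build_product_table(limit):
    """table[n] holds the product of the decimal digits of n for 0 <= n < limit."""
    table = [0] * limit
    # Single-digit numbers are their own product (0 stays 0, as in the question).
    for n in range(1, min(10, limit)):
        table[n] = n
    # Every larger number reuses the entry for the number without its last digit.
    for n in range(10, limit):
        table[n] = (n % 10) * table[n // 10]
    return table


table = build_product_table(1_000_000)
print(table[515], table[10], table[111111])
```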
Some tests I made with simple C/C++ code on my PC (Xeon 3.2GHz):
last no = i = 999999999 ==> 387420489
nb sec 23
#include "stdafx.h"
#include <chrono>
#include <iostream>
#undef _TRACE_
inline int evaluate(int no)
{
#ifdef _TRACE_
    std::cout << no;
#endif
    if(no==0)return 0;
    int temp=1;
    do
    {
        temp=(no%10)*temp;
        no=no/10;
    }while(no>0);
#ifdef _TRACE_
    std::cout << " => " << temp << std::endl;
#endif // _TRACE_
    return temp;
}
int _tmain(int argc, _TCHAR* argv[])
{
    std::chrono::time_point<std::chrono::system_clock> start(std::chrono::system_clock::now());
    int last = 0;
    int i = 0;
    for(/*int i = 0*/;i<1000000000;++i) {
        last = evaluate(i);
    }
    std::cout << "last no = i = " << (i-1) << " ==> " << last << std::endl;
    std::chrono::time_point<std::chrono::system_clock> end(std::chrono::system_clock::now());
    std::cout << "nb sec " << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() << std::endl;
    return 0;
}
I also tested the loop split over multiple threads with OpenMP, and the result is 0 seconds.
So I would say it is worth considering a genuinely efficient language if performance is a problem.
#pragma omp parallel for
for(int i = 0;i<1000000000;++i) {
    /*last[threadID][i] = */evaluate(i);
}
