Generalization of MPI rank number to MPI groups?

Generalization of MPI rank number to MPI groups? - mpi

Is there any generalization of rank number to group numbers? For my code I would like to create a hierarchical decomposition of MPI::COMM_WORLD. Assume we make use of 16 threads. I use MPI::COMM_WORLD.Split to create 4 communicators each having 4 ranks. Is there now an MPI function that provides some unique ids to the corresponding four groups?

Well, you can still refer to each process by its original rank in MPI_COMM_WORLD. You also have complete control over what rank each process receives in its new communicator via the color and key arguments of MPI_Comm_split(). This is plenty enough information to create a mapping between old ranks and new groups/ranks.

If you don't like #suszterpatt's answer (I do) you could always abuse a Cartesian communicator and pretend that the process at index (2,3) in the communicator is process 3 in group 2 of your hierarchical decomposition.
But don't read this and take away the impression that I recommend such abuse, it's just a thought.

Related

Difference between All-to-All Reduction and All-Reduce in MPI

Trying to figure out the difference between All-to-All Reduction and All-Reduce in open MPI. From my understanding All-to-One Reduction takes a piece m (integer, array, etc..) from all processes and combines all the pieces together with an operator (min, max, sum, etc..) and stores it in the selected process. From this i assume that All-to-All Reduction is the same but the product is stored in all the processes instead of just one. From this document it seems like All-Reduce is basically doing the same as All-to-All Reduction, is this right or am i getting it wrong?

The all-reduce (MPI_Allreduce) is a combined reduction and broadcast (MPI_Reduce, MPI_Bcast). They might have called it MPI_Reduce_Bcast. It is important to note that a MPI reduction does not do any global reduction. So if have 10 numbers each on 5 processes, after a MPI_Reduce one process has 10 numbers. After MPI_Allreduce, all 5 processes have the same 10 numbers.
In contrast, the all-to-all reduction performs a reduction and scatter, hence it is called MPI_Reduce_scatter[_block]. So if you have 10 numbers each on 5 processes, after a MPI_Reduce_scatter_block, the 5 processes have 2 numbers each. Note that MPI doesn't itself use the terminology all-to-all reduction, probably due to the misleading ambiguity.

Collective communications along directions in MPI cartesian topology

I have a 3D cartesian topology of nx by ny by nz processes.
There are mathematical calculations that involve at the same time only "pencils" of processors. In the case of a 3 by 3 by 3 matrix of processes, ranked from 0 to 26, process 4 is involved in three operations:
with processes 13 and 22 along the first direction
with processes 1 and 7 along the second direction
with processes 3 and 5 along the third direction
That mathematical operations require both point to point and collective communications between processes belonging to the same pencil.
For what concerns point to point communications, I used MPI_CART_SHIFT to make each process know the ranks of neighboring processes. (Then I'm going to use MPI_SENDRECV.)
For what concerns collective communications, how to perform such communications?
I think a solution could be to define "pencil" communicators, which would be in number of nx*ny + nx*nz + ny*nz (this number of communicators required is asymptotically small with respect to the number of processes, as the number of processes per direction grows).
Would this be the only way? Is there no standard subroutine relying on the cartesian communicator to perform such collective communications?

The neighborhood collectives are really the only routines that can directly exploit the connectivity information for cartesian topologies. However, they would treat all the directions (x, y, z) in the same way so wouldn't help you with your pencil scheme.
I think the only way to do this is as you suggest, i.e. construct a complete set of pencil communicators. Note that MPI does give you an easy way to do this by calling MPI_Cart_sub on the cartesian communicator. Once you've constructed the pencil communicators then you would also have the option of using neighborhood collectives on the pencils rather than point-to-point, but for a 1D communicator it's not clear that this has many advantages to computing neighbours by hand as you do at present.

customer segmentation in retail [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
Improve this question
I have a large sales database of a 'home and construction' retail.
And I need to know who are the electricians, plumbers, painters, etc. in the store.
My first approach was to select the articles related to a specialty (wires [article] is related to an electrician [specialty], for example) And then, based on customer sales, know who the customers are.
But this is a lot of work.
My second approach is to make a cluster segmentation first, and then discover which cluster belong to a specialty. (this is a lot better because I would be able to discover new segments)
But, how can I do that? What type of clustering should I occupy? Kmeans, fuzzy? What variables should I take to that model? Should I use PCA to know how many cluster to search?
The header of my data (simplified):
customer_id | transaction_id | transaction_date | item_article_id | item_group_id | item_category_id | item_qty | sales_amt
Any help would be appreciated
(sorry my english)

You want to identify classes of customers based on what they buy (I presume this is for marketing reasons). This calls for a clustering approach. I will talk you through the entire setup.
The clustering space
Let us first consider what exactly you are clustering: either orders or customers. In either case, the way you characterize the items and the distances between them is the same. I will discuss the basic case for orders first, and then explain the considerations that apply to clustering by customers instead.
For your purpose, an order is characterized by what articles were purchased, and possibly also how many of them. In terms of a space, this means that you have a dimension for each type of article (item_article_id), for example the "wire" dimension. If all you care about is whether an article is bought or not, each item has a coordinate of either 0 or 1 in each dimension. If some order includes wire but not pipe, then it has a value of 1 on the "wire" dimension and 0 on the "pipe" dimension.
However, there is something to say for caring about the quantities. Perhaps plumbers buy lots of glue while electricians buy only small amounts. In that case, you can set the coordinate in each dimension to the quantity of the corresponding article (presumably item_qty). So suppose you have three articles, wire, pipe and glue, then an order described by the vector (2, 3, 0) includes 2 wire, 3 pipe and 0 glue, while an order described by the vector (0, 1, 4) includes 0 wire, 1 pipe and 4 glue.
If there is a large spread in the quantities for a given article, i.e. if some orders include order of magnitude more of some article than other orders, then it may be helpful to work with a log scale. Suppose you have these four orders:
2 wire, 2 pipe, 1 glue
3 wire, 2 pipe, 0 glue
0 wire, 100 pipe, 1 glue
0 wire, 300 pipe, 3 glue
The former two orders look like they may belong to electricians while the latter two look like they belong to plumbers. However, if you work with a linear scale, order 3 will turn out to be closer to orders 1 and 2 than to order 4. We fix that by using a log scale for the vectors that encode these orders (I use the base 10 logarithm here, but it does not matter which base you take because they differ only by a constant factor):
(0.30, 0.30, 0)
(0.48, 0.30, -2)
(-2, 2, 0)
(-2, 2.48, 0.48)
Now order 3 is closest to order 4, as we would expect. Note that I have used -2 as a special value to indicate the absence of an article, because the logarithm of 0 is not defined (log(x) tends to negative infinity as x tends to 0). -2 means that we pretend that the order included 1/100th of the article; you could make the special value more or less extreme, depending on how much weight you want to give to the fact that an article was not included.
The input to your clustering algorithm (regardless of which algorithm you take, see below) will be a position matrix with one row for each item (order or customer), one column for each dimension (article), and either the presence (0/1), amount, or logarithm of the amount in each cell, depending on which you choose based on the discussion above. If you cluster by customers, you can simply sum the amounts from all orders that belong to that customer before you calculate what goes into each cell of your position matrix (if you use the log scale, sum the amounts before taking the logarithm).
Clustering by orders rather than by customers gives you more detail, but also more noise. Customers may be consistent within an order but not between them; perhaps a customer sometimes behaves like a plumber and sometimes like an electrician. This is a pattern that you will only find if you cluster by orders. You will then find how often each customer belongs to each cluster; perhaps 70% of somebody's orders belong to the electrician type and 30% belong to the plumber type. On the other hand, a plumber may only buy pipe in one order and then only buy glue in the next order. Only if you cluster by customers and sum the amounts of their orders, you get a balanced view of what each customer needs on average.
From here on I will refer to your position matrix by the name my.matrix.
The clustering algorithm
If you want to be able to discover new customer types, you probably want to let the data speak for themselves as much as possible. A good old fashioned
hierarchical clustering with complete linkage (CLINK) may be an appropriate choice in this case. In R, you simply do hclust(dist(my.matrix)) (this will use the Euclidean distance measure, which is probably good enough in your case). It will join closely neighbouring items or clusters together until all items are categorized in a hierarchical tree. You can treat any branch of the tree as a cluster, observe typical article amounts for that branch and decide whether that branch represents a customer segment by itself, should be split in sub-branches, or joined with a sibling branch instead. The advantage is that you find the "full story" of which items and clusters of items are most similar to each other and how much. The disadvantage is that the outcome of the algorithm does not tell you where to draw the borders between your customer segments; you can cut up the clustering tree in many ways, so it's up to your interpretation how you want to identify your customer types.
On the other hand, if you are comfortable fixing the number of clusters (k) beforehand, k-means is a very robust way to get just any segmentation of your customers in k distinct types. In R, you would do kmeans(my.matrix, k). For marketing purposes, it may be sufficient to have (say) 5 different profiles of customers that you make custom advertisement for, rather than treating all customers the same. With k-means you don't explore all of the diversity that is present in your data, but you might not need to do so anyway.
If you don't want to fix the number of clusters beforehand, but you also don't want to manually decide where to draw the borders between the segments afterwards, there is a third possibility. You start with the k-means algorithm, where you let it generate an amount of cluster centers that is much larger than the number of clusters that you hope to end up with (for example, if you hope to end up with somewhere about 10 clusters, let the k-means algorithm look for 200 clusters). Then, use the mean shift algorithm to further cluster the resulting centers. You will end up with a smaller number of compact clusters. The approach is explained in more detail by James Li over here. You can use the mean shift algorithm in R with the ms function from the LPCM package, see this documentation.
About using PCA
PCA will not tell you how many clusters you need. PCA answers a different question: which variables seem to represent a common underlying (hidden) factor. In a sense, it is a way to cluster variables, i.e. properties of entities, not to cluster the entities themselves. The number of principal components (common underlying factors) is not indicative of the number of clusters needed. PCA can still be interesting if you want to learn something about the predictive value of each article about a customer's interests.
Sources
Michael J. Crawley, 2005. Statistics. An Introduction using R.
Gerry P. Quinn and Michael J. Keough, 2002. Experimental Design and Data Analysis for Biologists.
Wikipedia: hierarchical clustering, k-means, mean shift, PCA

Error in MPI broadcast

Sorry for the long post. I did read some other MPI broadcast related errors but I couldn't
find out why my program is failing.
I am new to MPI and I am facing this problem. First I will explain what I am trying to do:
My declarations :
ROWTAG 400
COLUMNTAG 800
Create a 2 X 2 Cartesian topology.
Rank 0 has the whole matrix. It wants to dissipate parts of matrix to all the processes in the 2 X 2 Cartesian topology. For now, instead
of matrix I am just dealing with integers. So for process P(i,j) in 2 X 2 Cartesian topology, (i - row , j - column), I want it to receive
(ROWTAG + i ) in one message and (COLUMNTAG + j) in another message.
My strategy to do so is:
Processes: P(0,0) , P(0,1), P(1,0), P(1,1)
P(0,0) has all the initial data.
P(0,0) sends (ROWTAG+1) (in this case 401) to P(1,0) - In essense P(1,0) is responsible for dissipating information related to row 1 for all the processes in Row 1 - I just used a blocking send
P(0,0) sends (COLUMNTAG+1) (in this case 801) to P(0,1) - In essense P(0,1) is responsible for dissipating information related to column 1 for all the processes in Column 1 - Used a blocking send
For each process, I made a row_group containing all the processes in that row and out of this created a row_comm (communicator object)
For each process, I made a col_group containing all the processes in that column and out of this created a col_comm (communicator object)
At this point, P(0,0) has given information related to row 'i' to Process P(i,0) and P(0,0) has given information related to column 'j' to
P(0,j). I call P(i,0) and P(0,j) as row_head and col_head respectively.
For Process P(i,j) , P(i,0) gives information related to row i, and P(0,j) gives information related to column j.
I used a broad cast call:
MPI_Bcast(&row_data,1,MPI_INT,row_head,row_comm)
MPI_Bcast(&col_data,1,MPI_INT,col_head,col_comm)
Please find my code here: http://pastebin.com/NpqRWaWN
Here is the error I see:
* An error occurred in MPI_Bcast
on communicator MPI COMMUNICATOR 5 CREATE FROM 3
MPI_ERR_ROOT: invalid root
* MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Also please let me know if there is any better way to distribute the matrix data.

There are several errors in your program. First, row_Ranks is declared with one element less and when writing to it, you possibly overwrite other stack variables:
int col_Ranks[SIZE], row_Ranks[SIZE-1];
// ^^^^^^
On my test system the program just hangs because of that.
Second, you create new subcommunicators out of matrixComm but you use rank numbers from the latter to address processes in the former when performing the broadcast. That doesn't work. For example, in a 2x2 Cartesian communicator ranks range from 0 to 3. In any column- or row-wise subgroup there are only two processes with ranks 0 and 1 - there is neither rank 2 nor rank 3. If you take a look at the value of row_head across the ranks, it is 2 in two of them, hence the error.
For a much better way to distribute the data, you should refer to this extremely informative answer.

Is it possible to do pagerank without the entire dataset?

Sorry if this is dumb but I was just thinking I should give a shot. Say I have a graph thats huge(for example, 100 billion nodes). Neo4J supports 32 Billion and others support more or less the same, so say I cannot have the entire dataset in a database at the same time, can I run pagerank on it if its a directed graph(no loops) and each set of nodes connect to the next set of nodes(so no new links will be created backwards, only new links are created to new sets of data).
Is there a way I can somehow take the previous pagerank scores and apply them to new datasets(I only care about the pagerank for the most recent set of data but need the previous set's pagerank to derive the last sets data)?
Does that make sense? If so, is it possible to do?

You need to compute the principle eigenvector of a 100 billion by 100 billion matrix. Unless it's extremely sparse, you can not fit that inside your machine. So, you need a way to compute the leading eigenvector of a matrix when you can only look at a small part of your matrix at a time.
Iterative methods to compute eigenvectors only require that you store a few vectors at each iteration (they'll each have 100 billion elements). Those may fit on your machine (with 4 byte floats you'll need around 375GB per vector). Once you have a candidate vector of rankings you can (very slowly) apply your giant matrix to it by reading the matrix in chunks (since you can look at 32 billion rows at a time you'll need just over 3 chunks). Repeat this process and you'll have the basics of the power method which is what gets used in pagerank. cf http://www.ams.org/samplings/feature-column/fcarc-pagerank and http://en.wikipedia.org/wiki/Power_iteration
Of course the limiting factor here is how many times you need to examine the matrix. It turns out that by storing more than one candidate vector and using some randomized algorithms you can get good accuracy with fewer reads of your data. This is a current research topic in the applied math world. You can find more information here http://arxiv.org/abs/0909.4061 , here http://arxiv.org/abs/0909.4061 , and here http://arxiv.org/abs/0809.2274 . There's code available here: http://code.google.com/p/redsvd/ but you can't just use that off-the-shelf for the data sizes you're talking about.
Another way you may go about this is to look into "incremental svd" which may suit your problem better but is a bit more complicated. Consider this note: http://www.cs.usask.ca/~spiteri/CSDA-06T0909e.pdf and this forum: https://mathoverflow.net/questions/32158/distributed-incremental-svd

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex