Suppose that I have a 4x4 grid with positions numbered as so:
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
And I want to put pieces in this grid, the pieces can be of any of the following 10 types: {A, B, C, D, E, F, G, H, I, J}
How can I efficiently generate all possible ways of putting 3 pieces in this grid (allowing for type repetition)?
Right now I have a bunch of for loops, but this doesn't look efficient, nor is it scalable (if I want to place more pieces I need to rewrite the code to add more for loops).
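Roughly, the current approach looks something like this (a simplified sketch, not the exact code; it assumes the three pieces occupy distinct cells and that piece types may repeat):
types <- LETTERS[1:10]   # piece types A..J
cells <- 0:15            # grid positions
n <- 0
for (p1 in cells) for (p2 in cells) for (p3 in cells) {
  if (p1 < p2 && p2 < p3) {                      # three distinct, unordered cells
    for (t1 in types) for (t2 in types) for (t3 in types) {
      # ... do something with the placement (p1,t1), (p2,t2), (p3,t3) ...
      n <- n + 1
    }
  }
}
n   # choose(16, 3) * 10^3 = 560000 placements
Adding a fourth piece would require another pair of loops, which is exactly why this does not scale.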
The purpose of this post is to ask if anyone knows a better way of doing this, or could at least point to some papers that would help me find a solution to this problem.
Thanks!
Related
This question is basically a duplicate of this one; however, I'm interested in solutions in R.
Does anyone know an approach with igraph or other CRAN-based packages which would allow you to identify closed loops (for example, DGHD, BCDB, or BCEFDB, if the letters are nodes)?
Note that I have a relatively large network with ~700 edges and ~100 nodes, so it would be good if the solution is not too computationally expensive.
One more important piece of information is that my network is directed.
I am assuming that you are only interested in paths that do not go through any node twice, except that the beginning equals the end. With a little work, you can do this in igraph using all_simple_paths. The key point to notice is that any closed loop without repeated nodes is a simple path from a vertex v to one of v's neighbors, followed by the single link from that neighbor back to v. I will show how to get all simple closed loops like this starting and ending at a single node; you can simply loop through all of the nodes if you want all such loops in the graph.
First, we need some example data.
library(igraph)
set.seed(1234)
g = erdos.renyi.game(8,0.35)
plot(g)
I will get the closed loops starting and ending at node 8, because that node shows the interesting issues.
V = 8
SP = all_simple_paths(g, from=V, to=neighbors(g, v=V))
We do not want to include paths that just go to a neighbor and directly back (like 8-2-8) so we eliminate the paths with just one link.
SP2 = SP[sapply(SP, function(p) length(p) > 2)]
Depending on what you want, we might be done here, but I suspect that you do not want both a path and the same path in reverse, e.g. I think that you do not want both 8-2-5-8 and 8-5-2-8. We can get rid of these duplicates by insisting that the first neighbor (the second node in the path) has a smaller index than the last one.
SP3 = SP2[sapply(SP2, function(p) p[2] < p[length(p)])]
But we have also left off the return to the first node, so we add the first node on to the end of each path.
SP4 = lapply(SP3, function(p) c(unclass(p), V))
SP4
[[1]]
[1] 8 2 5 8
[[2]]
[1] 8 2 5 4 8
[[3]]
[1] 8 2 5 7 3 4 8
[[4]]
[1] 8 4 3 7 5 8
[[5]]
[1] 8 4 5 8
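If you want the loops for the whole graph, one possible sketch wraps the steps above into a function and applies it to every vertex (note that each closed loop will then appear once for every vertex it passes through, so some further de-duplication may be needed):
# Wrap the steps above; Filter() keeps the paths that pass each test.
closed_loops <- function(g, v) {
  sp <- all_simple_paths(g, from = v, to = neighbors(g, v = v))
  sp <- Filter(function(p) length(p) > 2, sp)           # drop v-neighbour-v paths
  sp <- Filter(function(p) p[2] < p[length(p)], sp)     # drop reversed duplicates
  lapply(sp, function(p) c(unclass(p), v))              # close the loop
}
all_loops <- lapply(seq_len(vcount(g)), function(v) closed_loops(g, v))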
This is probably a simple one, but I somehow got stuck...
I need many nested loops to get the result for every sample in my support, like the usual stacked loops:
for (a in 1:N1) {
  for (b in 1:N2) {
    for (c in 1:N3) {
      ...
    }
  }
}
but the number of for loops needed in this messy system depends on another random variable, say,
for (f in 1:N.for)
so how can I write a loop to deal with this? Or is there a more elegant way to do it?
Note that the difference is that the nested loop variables above (a, b, c, ...) do matter in my calculations, whereas the variable f of the loop that controls the number of nested loops does not enter any of my calculations; all it does is ensure that the number of nested loops is correct.
Did I make it clear?
So what I am actually trying to do is generate all the possible combinations of a number of people's preferences towards each other.
Let's say I have 6 people (the simplest case for my purpose): Abi, Bob, Cath, Dan, Eva, Fay.
Abi and Bob have preference lists of C D E F (4! = 24 possible permutations for each of them);
Cath and Dan have preference lists of A B and E F, respectively (2! * 2! = 4 possible permutations for each of them);
Eva and Fay have preference lists of A B C D (4! = 24 possible permutations for each of them);
So all together there should be 24 * 24 * 4 * 4 * 24 * 24 possible permutations of preferences when taking all six of them together.
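As a quick arithmetic check of that count (a small sketch using the numbers stated above):
prod(c(factorial(4), factorial(4), 4, 4, factorial(4), factorial(4)))
# [1] 5308416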
I am just wondering what is a clear, easy and systematic way to generate them all at once?
I'd want them in the format such as
c.prefs <- as.matrix(data.frame(Abi = c("Eva", "Fay", "Dan", "Cath"), Bob = c("Dan", "Eva", "Fay", "Cath")))
but any clear format is fine...
Thank you so much!!
I'll assume you have a list of each loop variable and its maximum value, ordered from the outermost to innermost variable.
loops <- list(a=2, b=3, c=2)
You could create a data frame with all the loop variable values in the correct order with:
(indices <- rev(do.call(expand.grid, lapply(rev(loops), seq_len))))
# a b c
# 1 1 1 1
# 2 1 1 2
# 3 1 2 1
# 4 1 2 2
# 5 1 3 1
# 6 1 3 2
# 7 2 1 1
# 8 2 1 2
# 9 2 2 1
# 10 2 2 2
# 11 2 3 1
# 12 2 3 2
If the code run at the innermost point of the nested loop doesn't depend on the previous iterations, you could use something like apply to process each iteration independently. Otherwise you could loop through the rows of the data frame with a single loop:
for (i in seq_len(nrow(indices))) {
# You can get "a" with indices$a[i], "b" with indices$b[i], etc.
}
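For instance, the apply variant could look like this (a sketch; f is a hypothetical stand-in for whatever the real innermost computation is):
# Hypothetical innermost computation; replace with the real loop body.
f <- function(a, b, c) a + b * c
results <- apply(indices, 1, function(row) f(row["a"], row["b"], row["c"]))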
For the calculation itself, one option is to use the Reduce function or some other higher-order function.
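A sketch of that idea for the case where each iteration depends on the previous ones (g is a hypothetical step function combining the running state with one row of indices; 0 is an assumed starting state):
# Hypothetical step function: combines the state so far with iteration i.
g <- function(state, i) state + indices$a[i] * indices$b[i] - indices$c[i]
final_state <- Reduce(g, seq_len(nrow(indices)), 0)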
Since your data is not inherently ordered (an individual is part of a set, and its preferences are part of a set), I would keep the individuals in a factor and keep, e.g., the preferences in lists named after the individuals. If you have large data, you can store it in an environment.
The first code block just makes the example reproducible; the problem domain lent itself to graph-oriented naming. You only need to change the first line and the runif call to change the behaviour.
#people
verts <- factor(c(LETTERS[1:10]))
#relations, disallow preferring yourself
edges <- lapply(seq_along(verts), function(ind) {
  levels(verts)[-ind]
})
names(edges) <- levels(verts)
#directions
#say you have these stored in a list or something
pool <- levels(verts)
directions <- lapply(pool, function(vert) {
  relations <- pool[unique(round(runif(5, 1, 10)))]
  relations[relations != vert]  # drop any self-preference
})
names(directions) <- pool
num_prefs <- lapply(directions, length)
names(num_prefs) <- names(directions)
#First take factorial of each persons preferences,
#then reduce that with multiplication
combinations <- Reduce(`*`, sapply(num_prefs, factorial))
I hope this answers your question!
We want to use the dtw library for R in order to shrink and expand certain time series data to a standard length.
Consider three time series with equivalent columns: moref has length (number of rows) 105, mobig has 130, and mosmall has 100. We want to project mobig and mosmall to a length of 105.
moref <- good_list[[2]]
mobig <- good_list[[1]]
mosmall <- good_list[[3]]
Therefore, we compute two alignments.
ali1 <- dtw(mobig, moref)
ali2 <- dtw(mosmall, moref)
If we print out the alignments the result is:
DTW alignment object
Alignment size (query x reference): 130 x 105
Call: dtw(x = mobig, y = moref)
DTW alignment object
Alignment size (query x reference): 100 x 105
Call: dtw(x = mosmall, y = moref)
So this is exactly what we want? From my understanding, we need to use the warping functions ali1$index1 or ali1$index2 in order to shrink or expand the time series. However, if we invoke the following commands
length(ali1$index1)
length(ali2$index1)
length(ali1$index2)
length(ali2$index2)
the result is
[1] 198
[1] 162
[1] 198
[1] 162
These are vectors of indices (probably referring to other vectors). Which one of these can we use for the mapping? Aren't they all too long?
First of all, we need to agree that index1 and index2 are two vectors of the same length that map the query/input data to the reference/stored data and vice versa.
Since you did not provide any data, here is some dummy data to give people an idea.
# Reference data is the template that we use as reference.
# say perfect pronunciation from CNN
data_reference <- 1:10
# Query data is the input data that we want to map to our reference
# say random youtube audio
data_query <- seq(1,10,0.5) + rnorm(19)
library(dtw)
alignment <- dtw(x=data_query, y=data_reference, keep=TRUE)
alignment$index1
alignment$index2
lcm <- alignment$costMatrix
image(x=1:nrow(lcm), y=1:ncol(lcm), lcm)
plot(alignment, type="threeway")
Here are the outputs:
> alignment$index1
[1] 1 2 3 4 5 6 7 7 8 9 10 11 12 13 13 14 14 15 16 17 18 19
> alignment$index2
[1] 1 1 1 2 2 3 3 4 5 6 6 6 6 6 7 8 9 9 9 9 10 10
So basically, the mapping from index1 to index2 tells us how to map the input data to the reference data,
i.e. the 10th data point of the input has been matched to the 6th data point of the template.
index1: Warping function φx(k) for the query
index2: Warping function φy(k) for the reference
-- Toni Giorgino
Per your question, "what is the deal with the length of the index": since the index vectors are basically the coordinates of the optimal path, their length can be as large as roughly m + n (a very shallow path) or as small as max(m, n) (a perfect diagonal). Clearly, it is not a one-to-one mapping, which might bother people a little bit; I guess you can do more research from here on how to pick the mapping you want.
I don't know if there is built-in functionality to pick the best one-to-one mapping, but here is one way.
library(plyr)
mapping <- data.frame(index1=alignment$index1, index2=alignment$index2)
mapping <- ddply(mapping, .(index1), summarize, index2_new = max(index2))
Now mapping contains a one-to-one mapping from query to reference. Then you can map the query to the reference and scale the mapped input in whatever way you want.
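For example, to connect this back to the original goal of projecting a series onto the reference length, one possible sketch (just one way of scaling, not the only one) is to average all query points matched to each reference index:
# For every reference position j, average the query samples matched to it;
# the result has exactly length(data_reference) points.
warped_query <- sapply(seq_along(data_reference), function(j) {
  mean(data_query[alignment$index1[alignment$index2 == j]])
})
length(warped_query)   # same as length(data_reference)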
I am not exactly sure about the content below the line, and anyone is more than welcome to improve how the mapping and scaling should work.
References: 1, 2
I know you can use read.table to read one matrix from a file, but I would like to read two matrices of the same size (m by n) from one file in R and put them in two separate R variables.
For example, this file contains two 3 by 2 matrices:
6 3
2 5
5 4
4 3
6 3
3 4
Here is my shot at it.
split(read.table("data.txt"), gl(2, 3, labels=c("x1", "x2")))
It should be easy to generalize this and wrap it up into a function.
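For example, a possible generalization might look like this (a sketch; read_matrices is just a made-up name, and it assumes the matrices are stacked vertically in one whitespace-delimited file with a known number of rows each):
# Read k matrices of n rows each, stacked vertically in one file.
read_matrices <- function(file, n, labels = NULL) {
  d <- read.table(file)
  k <- nrow(d) / n
  if (is.null(labels)) labels <- paste0("x", seq_len(k))
  lapply(split(d, gl(k, n, labels = labels)), as.matrix)
}
mats <- read_matrices("data.txt", n = 3, labels = c("x1", "x2"))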
I hope this helps.
I am trying to implement the Affinity Propagation clustering algorithm in C++. As part of testing I want to compare my results with well established implementations of the algorithm in Matlab (Link) and in R (package apcluster). Unfortunately, the clusterings do not agree.
To be more precise, the (test) data set is:
0.9411760 0.9702140
0.9607826 0.9744693
0.9754896 0.9574479
0.9852929 0.9489372
0.9950962 0.9234050
1.0000000 0.8936175
1.0000000 0.8723408
0.9852929 0.8595747
1.0000000 0.8893622
1.0000000 0.9191497
In R I typed:
S <- negDistMat(data)
A <- apcluster(S, maxits=1000, convits=100, lam=0.9, q=0.5)
and got:
> A@idx
2 2 2 5 5 9 9 9 9 5
2 2 2 5 5 9 9 9 9 5
In Matlab I just typed:
[idx,netsim,dpsim,expref]=apcluster(S,diag(S));
From the apcluster.m file implementing apcluster (line 77):
maxits=1000; convits=100; lam=0.9; plt=0; details=0; nonoise=0;
This explains the parameters for R; in Matlab these are the default values. Since I'm more comfortable with R concerning Affinity Propagation, for comparison reasons I stuck with Matlab's defaults, just to avoid messing something up unintentionally.
...but got:
>> idx'
ans =
3 3 3 3 5 9 9 9 9 5
In both cases the similarity matrices matched. What could I have missed?
Update:
I've also implemented the Matlab code proposed by Frey & Dueck in their original publication (you may notice that I omitted the noise), and although I can replicate the indexes provided by the former Matlab implementation, the availability and responsibility matrices differ in some values. The error is less than 0.01, but this is significant.
Their code is:
function [idx,A,R]=frey(S);
N=size(S,1);
A=zeros(N,N);
R=zeros(N,N);
lam=0.9;                        % Set damping factor
for iter=1:122
    % Compute responsibilities
    Rold=R;
    AS=A+S;
    [Y,I]=max(AS,[],2);
    for i=1:N
        AS(i,I(i))=-realmax;
    end;
    [Y2,I2]=max(AS,[],2);
    R=S-repmat(Y,[1,N]);
    for i=1:N
        R(i,I(i))=S(i,I(i))-Y2(i);
    end;
    R=(1-lam)*R+lam*Rold;       % Dampen responsibilities
    % Compute availabilities
    Aold=A;
    Rp=max(R,0);
    for k=1:N
        Rp(k,k)=R(k,k);
    end;
    A=repmat(sum(Rp,1),[N,1])-Rp;
    dA=diag(A);
    A=min(A,0);
    for k=1:N
        A(k,k)=dA(k);
    end;
    A=(1-lam)*A+lam*Aold;       % Dampen availabilities
end;
E=R+A;                          % Pseudomarginals
I=find(diag(E)>0); K=length(I);                % Indices of exemplars
[tmp c]=max(S(:,I),[],2); c(I)=1:K; idx=I(c);  % Assignments
I have tried all your code and the problem is caused by the way you supply the input preference. In the first case (R), you specify q=0.5. This means that the input preference p is set to the median of off-diagonal similarities (in your example, this is -0.05129912). If I run the Matlab code as follows (I used Octave, but Matlab should give the same result), I get:
octave:7> [idx,netsim,dpsim,expref]=apcluster(S,-0.05129912);
octave:8> idx'
ans =
2 2 2 5 5 9 9 9 9 5
This is exactly the same as the R result. If I run your Matlab code (with diag(S) being the second argument) and if I run
apcluster(S, p=diag(S))
in R (which sets the input preference to 0 for all samples in both cases), I get 10 one-sample clusters in both cases. So the two results match again, though I could not recover your Matlab result
3 3 3 3 5 9 9 9 9 5
I hope that makes the difference clear.
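For reference, the preference value implied by q = 0.5 can be checked directly in R (a small sketch, assuming S is the similarity matrix obtained from negDistMat(data) above):
# Median of the off-diagonal similarities; should be about -0.05129912 here.
median(S[row(S) != col(S)])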
Cheers, UBod