I would like to have a dataframe like this for example:
example=data.frame(a=c(1,2,3,4,5,6,7,8), b=c(1,2,3,4,5,6,7,8), c=c(1,2,3,4,5,6,7,8), d = c(1,2,3,4,5,6,7,8))
a b c d
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
Be transformed so that the first row stays in the same position, but each row after that shifts one column to the right from the previous row, for example:
a b c d X X.1 X.2 X.3 X.4 X.5 X.6
1 1 1 1 1 NA NA NA NA NA NA NA
2 NA 2 2 2 2 NA NA NA NA NA NA
3 NA NA 3 3 3 3 NA NA NA NA NA
4 NA NA NA 4 4 4 4 NA NA NA NA
5 NA NA NA NA 5 5 5 5 NA NA NA
6 NA NA NA NA NA 6 6 6 6 NA NA
7 NA NA NA NA NA NA 7 7 7 7 NA
8 NA NA NA NA NA NA NA 8 8 8 8
This is so that each column can then be summed (by which I mean that, for each new column, the values in all rows are added together). The column titles have no meaning, so moving them doesn't matter.
Any help would be much appreciated, as I've yet to stumble across anything that achieves this kind of data transformation.
EDIT: Thanks to everyone who answered, all the solutions worked great!
Here is another base R for loop. I construct the matrix first, and then fill it in.
# build matrix of missing values
myMat <- matrix(NA, nrow(example), ncol(example) + nrow(example) - 1)
# fill it in by row with the vector pulled from each row of example and simplified with `unlist`
for(i in seq_len(nrow(myMat))) {
myMat[i, i:(ncol(example) + i - 1)] <- unlist(example[i,], use.names=FALSE)
}
This returns
myMat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 1 1 1 NA NA NA NA NA NA NA
[2,] NA 2 2 2 2 NA NA NA NA NA NA
[3,] NA NA 3 3 3 3 NA NA NA NA NA
[4,] NA NA NA 4 4 4 4 NA NA NA NA
[5,] NA NA NA NA 5 5 5 5 NA NA NA
[6,] NA NA NA NA NA 6 6 6 6 NA NA
[7,] NA NA NA NA NA NA 7 7 7 7 NA
[8,] NA NA NA NA NA NA NA 8 8 8 8
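The same padded matrix can also be filled without a loop. This is a sketch, assuming the example data from the question (entered here as 1:8 in each column): element (i, j) of example always lands in column i + j - 1 of the result, so a single assignment with a two-column index matrix places every value at once.

```r
example <- data.frame(a = 1:8, b = 1:8, c = 1:8, d = 1:8)
m <- as.matrix(example)
n <- nrow(m)
k <- ncol(m)
shifted <- matrix(NA_real_, n, n + k - 1)
# Element (i, j) of example belongs in column i + j - 1 of the result;
# row(m) and col(m) give i and j for every cell, in the same order as c(m)
shifted[cbind(c(row(m)), c(row(m) + col(m) - 1))] <- c(m)
shifted
```

The loop is arguably easier to read; the index-matrix version avoids growing work per row and keeps everything in one vectorized step.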
A simple approach: create a matrix of NAs and fill in the values you want.
Shifted = matrix(NA, nrow=nrow(example),
ncol=nrow(example) + ncol(example) - 1)
for(i in 1:nrow(example)) {
Shifted[i, i:(i+ncol(example)-1)] = unlist(example[i,]) }
If you really want a data.frame, you can finish off with
as.data.frame(Shifted)
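Since the stated goal is the column sums, `colSums(Shifted, na.rm = TRUE)` on the padded matrix gives them directly (na.rm = TRUE is needed because every column contains NAs). As a sketch of a shortcut, assuming the same example data: because element (i, j) ends up in column i + j - 1, the totals can be computed by grouping on that key, without building the padded matrix at all.

```r
example <- data.frame(a = 1:8, b = 1:8, c = 1:8, d = 1:8)
m <- as.matrix(example)
# Group every value by the shifted column it would land in (i + j - 1)
# and sum within each group
totals <- tapply(c(m), c(row(m) + col(m) - 1), sum)
totals
```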
I'm sure there's a name for this type of matrix, but I'm not sure what it is. In R, I want to quickly convert a variable x:
x = 1:10
x #[1] 1 2 3 4 5 6 7 8 9 10
into a matrix in which every anti-diagonal (running from bottom-left to top-right) holds a single repeated value:
x
1 1 2 3 4 5 6 7 8 9 10
2 2 3 4 5 6 7 8 9 10 NA
3 3 4 5 6 7 8 9 10 NA NA
4 4 5 6 7 8 9 10 NA NA NA
5 5 6 7 8 9 10 NA NA NA NA
6 6 7 8 9 10 NA NA NA NA NA
7 7 8 9 10 NA NA NA NA NA NA
8 8 9 10 NA NA NA NA NA NA NA
9 9 10 NA NA NA NA NA NA NA NA
10 10 NA NA NA NA NA NA NA NA NA
Thanks!
We can use shift from data.table
library(data.table)
do.call(cbind, shift(x, 0:9, type = 'lead'))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 6 7 8 9 10
# [2,] 2 3 4 5 6 7 8 9 10 NA
# [3,] 3 4 5 6 7 8 9 10 NA NA
# [4,] 4 5 6 7 8 9 10 NA NA NA
# [5,] 5 6 7 8 9 10 NA NA NA NA
# [6,] 6 7 8 9 10 NA NA NA NA NA
# [7,] 7 8 9 10 NA NA NA NA NA NA
# [8,] 8 9 10 NA NA NA NA NA NA NA
# [9,] 9 10 NA NA NA NA NA NA NA NA
#[10,] 10 NA NA NA NA NA NA NA NA NA
In base R, we can use embed (the result has an extra, all-NA 11th row, which can be dropped with [-11, ]):
out <- embed(c(x, x), 10)
replace(out, lower.tri(out), NA)[, 10:1]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 6 7 8 9 10
# [2,] 2 3 4 5 6 7 8 9 10 NA
# [3,] 3 4 5 6 7 8 9 10 NA NA
# [4,] 4 5 6 7 8 9 10 NA NA NA
# [5,] 5 6 7 8 9 10 NA NA NA NA
# [6,] 6 7 8 9 10 NA NA NA NA NA
# [7,] 7 8 9 10 NA NA NA NA NA NA
# [8,] 8 9 10 NA NA NA NA NA NA NA
# [9,] 9 10 NA NA NA NA NA NA NA NA
#[10,] 10 NA NA NA NA NA NA NA NA NA
#[11,] NA NA NA NA NA NA NA NA NA NA
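A variant of the embed approach, offered as a sketch: padding x with length(x) - 1 NAs, instead of appending a second copy of x, makes the out-of-range positions NA from the start, so neither replace() nor the extra row is needed.

```r
x <- 1:10
n <- length(x)
# embed() on a series of length 2n - 1 with dimension n returns n rows;
# row i holds positions i + n - 1, ..., i, so reversing the columns
# gives positions i, ..., i + n - 1 (NA once past the end of x)
m <- embed(c(x, rep(NA, n - 1)), n)[, n:1]
m
```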
Here is an alternative solution using base R:
f <- function(x) {
y <- x:10
length(y) <- 10
return(y)
}
sapply(x, f)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
[2,] 2 3 4 5 6 7 8 9 10 NA
[3,] 3 4 5 6 7 8 9 10 NA NA
[4,] 4 5 6 7 8 9 10 NA NA NA
[5,] 5 6 7 8 9 10 NA NA NA NA
[6,] 6 7 8 9 10 NA NA NA NA NA
[7,] 7 8 9 10 NA NA NA NA NA NA
[8,] 8 9 10 NA NA NA NA NA NA NA
[9,] 9 10 NA NA NA NA NA NA NA NA
[10,] 10 NA NA NA NA NA NA NA NA NA
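This is much easier to understand and does not require any package. Note that f hard-codes the length 10 and uses the values of x as start positions, so as written it only works for x = 1:10.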
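A matrix that is constant along each anti-diagonal is called a Hankel matrix. A base R sketch that works for any vector, not just 1:10: entry [i, j] is x[i + j - 1], and indexing past the end of a vector returns NA, which supplies the padding. The helper name hankel_na and the example vector are hypothetical.

```r
hankel_na <- function(x) {                        # hypothetical helper name
  n <- length(x)
  idx <- outer(seq_len(n), seq_len(n), "+") - 1   # idx[i, j] = i + j - 1
  matrix(x[c(idx)], nrow = n)                     # x[k] is NA for k > n
}
h <- hankel_na(c(3, 1, 4, 1, 5))
h
```

Calling hankel_na(1:10) reproduces the matrix from the question.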
I've generated a small world network with 16 agents with igraph:
library(igraph)
myNetwork <- sample_smallworld(dim = 1, nei = 1, size = 16, p = 0.1) #generate small world
plot(myNetwork, vertex.size=20, vertex.label=c(1:16), layout=layout_in_circle) #inspect the network
In a separate data frame, stack, I have each agent's opinion (opinion1):
> stack
agent opinion1
1 1 0.71979146
2 2 0.25040406
3 3 0.50866647
4 4 0.53713674
5 5 0.53954982
6 6 0.23903034
7 7 0.03989347
8 8 0.29350197
9 9 0.85441826
10 10 0.44565889
11 11 0.28223782
12 12 0.39748249
13 13 0.17488017
14 14 0.08804374
15 15 0.61174168
16 16 0.30949636
I now want to calculate each agent's updated opinion (let's call it opinion2) by applying this equation, where networkNeighborsOpinion1 refers to the opinion1s of the agents that are connected in myNetwork:
opinion2 <- 0.5 * opinion1 + 0.5 * (mean(networkNeighborsOpinion1))
Given myNetwork and stack$opinion1, how can I efficiently apply this equation to each agent?
Here's my thinking so far...
From myNetwork, the corresponding adjacency matrix can be retrieved like so:
adjMatrix <- as.matrix(as_adjacency_matrix(myNetwork, names = TRUE, edges = FALSE))
adjMatrix[adjMatrix == 0] <- NA #turn all 0s into NAs
> adjMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA
[2,] 1 NA 1 NA NA NA NA NA NA NA NA NA NA NA NA
[3,] NA 1 NA NA NA NA NA NA NA NA NA NA NA 1 NA
[4,] NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA
[5,] NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA
[6,] NA NA NA NA NA NA 1 NA NA NA NA NA NA 1 NA
[7,] NA NA NA NA NA 1 NA 1 NA NA NA NA NA NA NA
[8,] NA NA NA NA NA NA 1 NA 1 NA NA NA NA NA NA
[9,] NA NA NA NA NA NA NA 1 NA 1 NA NA NA NA NA
[10,] NA NA NA NA NA NA NA NA 1 NA 1 NA NA NA NA
[11,] NA NA NA NA NA NA NA NA NA 1 NA 1 NA NA NA
[12,] NA NA NA NA NA NA NA NA NA NA 1 NA 1 NA NA
[13,] NA NA NA NA NA NA NA NA NA NA NA 1 NA 1 NA
[14,] NA NA 1 NA NA 1 NA NA NA NA NA NA 1 NA 1
[15,] NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA
[16,] 1 NA NA NA NA NA NA NA NA NA NA NA NA NA 1
Each agent is represented by a row in adjMatrix, and each network connection is indicated by a value of 1.
Then it seems like there should be a way to use each row of adjMatrix to pull the appropriate values from stack$opinion1 and generate a vector of networkNeighborsOpinion1, which could then be used to compute an opinion2 for each agent. Note that I've changed the 0s in adjMatrix to NAs: my thinking is that each row could be multiplied by the corresponding values in stack$opinion1 (i.e., each opinion1 is multiplied by either 1 or NA), and the result could then be passed to mean(networkNeighborsOpinion1, na.rm = TRUE).
Any direction on this would be appreciated. Perhaps a for loop or function?
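The NA-coding idea described above does work. A minimal sketch on a hypothetical 4-agent ring (edges 1-2, 2-3, 3-4, 4-1) standing in for myNetwork, assuming the update is the plain average 0.5 * opinion1 + 0.5 * mean(neighbor opinions), as the answer interprets it: sweep() multiplies column j of the NA-coded adjacency matrix by opinion1[j], and rowMeans(na.rm = TRUE) then averages each agent's neighbors.

```r
# Hypothetical 4-agent ring (1-2, 2-3, 3-4, 4-1) standing in for myNetwork
n <- 4
adj <- matrix(0, n, n)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
adj[edges] <- 1                 # two-column matrix index sets one cell per edge
adj[edges[, 2:1]] <- 1          # undirected graph: mirror every edge
opinion1 <- c(0.2, 0.4, 0.6, 0.8)

adj_na <- adj
adj_na[adj_na == 0] <- NA       # non-neighbors become NA, as in the question
# sweep() multiplies column j by opinion1[j], so row i holds the opinions of
# agent i's neighbors (NA elsewhere); rowMeans(na.rm = TRUE) averages them
neighbor_mean <- rowMeans(sweep(adj_na, 2, opinion1, "*"), na.rm = TRUE)
opinion2 <- 0.5 * opinion1 + 0.5 * neighbor_mean
opinion2
```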
Multiply the adjacency matrix by opinion1 and divide by the sums of the corresponding rows of the adjacency matrix (each agent's number of neighbors). Then average that with opinion1.
adjMatrix <- as.matrix(as_adjacency_matrix(myNetwork, names = TRUE, edges = FALSE))
0.5 * stack$opinion1 + 0.5 * (adjMatrix %*% stack$opinion1) / rowSums(adjMatrix)
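A small self-contained check of this formula, on a hypothetical 5-agent graph (a 4-agent ring plus an isolated agent 5) in place of myNetwork. drop() turns the 5 x 1 result of %*% into a plain vector. An agent with no neighbors gets 0 / 0 = NaN; keeping its own opinion in that case is an assumption, not part of the answer.

```r
# Hypothetical 5-agent graph: a ring 1-2-3-4-1 plus an isolated agent 5
n <- 5
adj <- matrix(0, n, n)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
opinion1 <- c(0.2, 0.4, 0.6, 0.8, 0.5)

deg <- rowSums(adj)                              # number of neighbors per agent
neighbor_mean <- drop(adj %*% opinion1) / deg    # agent 5: 0 / 0 = NaN
opinion2 <- 0.5 * opinion1 + 0.5 * neighbor_mean
# Assumed fallback (not in the answer): isolated agents keep their own opinion
opinion2_safe <- ifelse(deg > 0, opinion2, opinion1)
opinion2_safe
```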
Note
In reproducible form, stack is:
Lines <- " agent opinion1
1 1 0.71979146
2 2 0.25040406
3 3 0.50866647
4 4 0.53713674
5 5 0.53954982
6 6 0.23903034
7 7 0.03989347
8 8 0.29350197
9 9 0.85441826
10 10 0.44565889
11 11 0.28223782
12 12 0.39748249
13 13 0.17488017
14 14 0.08804374
15 15 0.61174168
16 16 0.30949636"
stack <- read.table(text = Lines)
I am trying to produce a matrix of variable dimensions of the form below (i.e., integers increasing by 1 at a time along each row, with the diagonal and lower triangle set to NA):
NA 1 2 3 4
NA NA 5 6 7
NA NA NA 8 9
NA NA NA NA 10
NA NA NA NA NA
I have used the code below:
total_nodes <- 5
sample_vector <- c(1:(total_nodes^2))
sample_matrix <- matrix(sample_vector, nrow=total_nodes, byrow=TRUE)
sample_matrix[lower.tri(sample_matrix, diag = TRUE)] <- NA
However the matrix I get with this method is of the form:
NA 2 3 4 5
NA NA 8 9 10
NA NA NA 14 15
NA NA NA NA 20
NA NA NA NA NA
How about this
total_nodes <- 5
sample_matrix <- matrix(NA, nrow=total_nodes, ncol=total_nodes)
sample_matrix[lower.tri(sample_matrix)]<-1:sum(lower.tri(sample_matrix))
sample_matrix <- t(sample_matrix)
sample_matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] NA NA 5 6 7
# [3,] NA NA NA 8 9
# [4,] NA NA NA NA 10
# [5,] NA NA NA NA NA
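I'm using the diag function to construct a matrix and upper.tri to turn it into a "target" as well as a logical indexing tool. Note that logical indexing fills the upper triangle column by column, so the numbers below run down the columns rather than along the rows as in the question: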
upr5 <- upper.tri(diag(5))
upr5
upr5[upr5] <- 1:sum(upr5)
upr5[upr5==0] <- NA # would otherwise have been zeroes
upr5
[,1] [,2] [,3] [,4] [,5]
[1,] NA 1 2 4 7
[2,] NA NA 3 5 8
[3,] NA NA NA 6 9
[4,] NA NA NA NA 10
[5,] NA NA NA NA NA
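For the row-wise numbering in the question at any size n, each value can also be computed directly. A sketch (upper_rowwise is a hypothetical helper name): rows 1 to r - 1 use (r - 1) * n - (r - 1) * r / 2 numbers between them, so entry [r, c] with c > r is that count plus c - r.

```r
upper_rowwise <- function(n) {     # hypothetical helper name
  r <- row(diag(n))                # row index of every cell
  k <- col(diag(n))                # column index of every cell
  # numbers used by rows 1..(r - 1), plus the offset within row r
  m <- (r - 1) * n - (r - 1) * r / 2 + (k - r)
  m[k <= r] <- NA                  # diagonal and lower triangle
  m
}
upper_rowwise(5)
```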
I am trying to use the Gale-Shapley algorithm in R, via matchingMarkets::hri(), to assign 10 students (A-J) to 6 groups (1-6) based on their preferences and subject to capacity constraints in each group. Each student ranks their top 3 choices of groups, and all other choices are null. My issue is that
> hri(nSlots=capacities$capacity, s.prefs = student_prefs_matrix, c.prefs = null_matrix)
returns this:
Error in x[y] : invalid subscript type 'list'
According to the documentation, hri() does allow missing values (unlike the similar matchingR::galeShapley.collegeAdmissions()), so that is not where the issue lies. I compared my inputs to the example in the documentation (p. 7), and they have the same type of structure. Here are my inputs:
> student_prefs_matrix
a b c d e f g h i j
1 3 1 NA NA NA NA NA NA NA NA
2 NA NA 3 NA NA 3 3 2 2 2
3 NA NA NA NA 1 2 NA NA 1 3
4 1 3 NA 3 NA NA NA NA NA NA
5 NA 2 2 1 3 1 1 1 3 1
6 2 NA 1 2 2 NA 2 3 NA NA
> null_matrix
1 2 3 4 5 6
a NA NA NA NA NA NA
b NA NA NA NA NA NA
c NA NA NA NA NA NA
d NA NA NA NA NA NA
e NA NA NA NA NA NA
f NA NA NA NA NA NA
g NA NA NA NA NA NA
h NA NA NA NA NA NA
i NA NA NA NA NA NA
j NA NA NA NA NA NA
> capacities$capacity
[1] 2 2 2 2 1 1
Can anyone give a hint as to what this error may mean? The only list (vector) I pass is nSlots, which is supposed to be one. Alternatively, is there a better way to solve this matching problem? I know Gale-Shapley is meant for two-sided matchings, but I thought it might work anyway if I always look for the "student-optimal" matching. Thanks for the help! This is my first time posting a question here.
To fix your problem, make sure to provide the preference lists in the appropriate format. See the documentation of the matchingMarkets package at https://matchingmarkets.org/hri.html for several examples on this.
Let us look at an example with 7 students, 2 colleges with 3 slots each, and given preference lists as follows:
> s.prefs <- matrix(c(1,2, 1,2, 1,NA, 1,2, 1,2, 1,2, 1,2), 2,7)
> c.prefs <- matrix(c(1,2,3,4,5,6,7, 1,2,3,4,5,NA,NA), 7,2)
> hri(s.prefs=s.prefs, c.prefs=c.prefs, nSlots=c(3,3))
> s.prefs
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 NA 2 2 NA NA
> c.prefs
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 4
[4,] 4 5
[5,] 5 NA
[6,] 6 NA
[7,] 7 NA
For your specification of the preference lists, there are two problems. The most obvious problem is your null_matrix (passed as c.prefs), which indicates that none of the colleges finds any student acceptable, so no student can be matched. The other problem is your student_prefs_matrix: each column should list a student's acceptable colleges in order of preference, not the rank each college receives. In your example, student a finds colleges 4, 6, and 1 acceptable (in that order). Thus the preference list should be:
> student_prefs_matrix
a b c d ...
[1,] 4 1 6 5 ...
[2,] 6 5 5 6 ...
[3,] 1 4 2 4 ...
[4,] NA NA NA NA ...
[5,] NA NA NA NA ...
[6,] NA NA NA NA ...
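Rather than typing the preference lists by hand, they can be derived from the rank matrix in the question. A sketch, assuming the rows of student_prefs_matrix are groups 1-6 in order and the entries are ranks (NA = not ranked): order(na.last = NA) sorts each student's ranked groups and drops the rest, and the result is padded with NA. The helper ranks_to_prefs is hypothetical, and so is the c_prefs line: if the groups have no preferences of their own, any fixed order listing every student makes them all acceptable.

```r
# Rank matrix from the question: rows = groups 1-6, columns = students a-j
student_ranks <- matrix(c(
   3,  1, NA, NA, NA, NA, NA, NA, NA, NA,
  NA, NA,  3, NA, NA,  3,  3,  2,  2,  2,
  NA, NA, NA, NA,  1,  2, NA, NA,  1,  3,
   1,  3, NA,  3, NA, NA, NA, NA, NA, NA,
  NA,  2,  2,  1,  3,  1,  1,  1,  3,  1,
   2, NA,  1,  2,  2, NA,  2,  3, NA, NA
), nrow = 6, byrow = TRUE, dimnames = list(1:6, letters[1:10]))

ranks_to_prefs <- function(rank_mat) {     # hypothetical helper name
  apply(rank_mat, 2, function(r) {
    p <- order(r, na.last = NA)   # group positions, best rank first, NAs dropped
    length(p) <- nrow(rank_mat)   # pad with NA to a fixed length
    p
  })
}

s_prefs <- ranks_to_prefs(student_ranks)
# Assumption: groups rank students in an arbitrary fixed order (a, b, ..., j),
# i.e. every student is acceptable to every group
c_prefs <- matrix(1:10, nrow = 10, ncol = 6)
s_prefs[1:3, ]
```

These could then be passed as s.prefs and c.prefs to hri() together with nSlots.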
I have been trying to solve the following problem.
Suppose I have the following vector:
aux1 <- c(0,0,0,4,5,0,7,0,0,10,11,12), where each nonzero number is its own row number (its position in the vector).
I want to calculate the distances between the different elements of this vector, fixing the first nonzero component, then the second, and so on.
If an element is zero, I do not want to count it, so I put an NA instead. The output I want should look like this:
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
1 NA NA NA NA
NA NA NA NA NA
3 2 NA NA NA
NA NA NA NA NA
NA NA NA NA NA
6 5 3 NA NA
7 6 4 1 NA
8 7 5 2 1
In the first column, I have the difference between the first nonzero element and every later element, i.e., Matrix[5,1] = 5-4 = 1 and Matrix[12,1] = 12-4 = 8. Also, Matrix[7,2] = 7-5 = 2, where 5 is the second nonzero element of the vector. Notice that Matrix[10,3] = 10-7 = 3, where 7 is the third nonzero element but the seventh element of my vector.
I have tried to do this in a loop. My current code looks like this:
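N <- length(aux1)                # assumed: N is the length of aux1
M <- matrix(nrow = N - 1, ncol = N - 1)
for (i in 1:(N - 1)) {           # 1:N-1 would parse as (1:N) - 1
  for (j in 1:(N - 1)) {
    if (j <= i)
      next
    else if (aux1[j] > 0)
      M[j, i] <- aux1[j] - aux1[i]
    else
      M[j, i] <- 0
  }
}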
Unfortunately, I have not been able to solve my problem. Any help would be greatly appreciated.
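Since the question started from a loop, here is a loop-based sketch that produces the desired 12 x 5 matrix: one column per nonzero reference value (the last nonzero value is dropped, as in the expected output) and one row per element of aux1, keeping only positive differences so that zeros and earlier elements stay NA.

```r
aux1 <- c(0, 0, 0, 4, 5, 0, 7, 0, 0, 10, 11, 12)
pos <- aux1[aux1 > 0]                 # nonzero entries: 4 5 7 10 11 12
ref <- head(pos, -1)                  # reference values, last one dropped
M <- matrix(NA_real_, nrow = length(aux1), ncol = length(ref))
for (i in seq_along(ref)) {           # one column per reference value
  for (j in seq_along(aux1)) {        # one row per element of aux1
    d <- aux1[j] - ref[i]
    if (d > 0) M[j, i] <- d           # keep only positive distances
  }
}
M
```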
You could try something like the following (with generous help from #thela)
res <- outer(aux1, head(aux1[aux1 > 0], -1), `-`)
is.na(res) <- res <= 0   # zero and negative differences become NA
res
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1
Using sapply and ifelse:
sapply(head(aux1[aux1>0],-1),function(y)ifelse(aux1-y>0,aux1-y,NA))
You loop over the positive values (dropping the last one), then subtract each one from the original vector. I used ifelse to replace the non-positive differences with NA.
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1