I have been trying to solve the following problem.
Suppose I have the following vector:
aux1 <- c(0,0,0,4,5,0,7,0,0,10,11,12)
where each nonzero number is the number of its own row.
I want to calculate the distances between the different elements of this vector, fixing the first nonzero element, then the second, and so on.
If an element is zero, I do not want to count it, so I put an NA instead. The output I want should look like this:
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
NA NA NA NA NA
1 NA NA NA NA
NA NA NA NA NA
3 2 NA NA NA
NA NA NA NA NA
NA NA NA NA NA
6 5 3 NA NA
7 6 4 1 NA
8 7 5 2 1
In the first column, I have the difference between each later element and the first nonzero element, i.e., Matrix[5,1]=5-4=1 and Matrix[12,1]=12-4=8. Also, Matrix[7,2]=7-5=2, where 5 is the second nonzero element in the vector. Notice that Matrix[10,3]=10-7=3, where 7 is the third nonzero element, but the seventh element in my vector.
I have tried to do this in a loop. My current code looks like this:
M=matrix(nrow=N-1, ncol=N-1)
for (i in 1:(N-1)){
for (j in 1:(N-1)){
if(j<=i)
next
else
if(aux1[j]>0)
M[j,i]=aux1[j]-aux1[i]
else
M[j,i]=0
}
}
Unfortunately, I have not been able to solve my problem. Any help would be greatly appreciated.
You could try something like the following (with generous help from @thela):
res <- outer(aux1, head(aux1[aux1 > 0], -1), `-`)
is.na(res) <- res <= 0   # non-positive differences become NA
res
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1
Using sapply and ifelse (here vv is the input vector, i.e. aux1 from the question):
sapply(head(vv[vv>0],-1), function(y) ifelse(vv-y>0, vv-y, NA))
You loop over the positive values (dropping the last one), subtract each of them from the original vector, and use ifelse to replace the non-positive differences with NA.
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA NA NA NA NA
# [2,] NA NA NA NA NA
# [3,] NA NA NA NA NA
# [4,] NA NA NA NA NA
# [5,] 1 NA NA NA NA
# [6,] NA NA NA NA NA
# [7,] 3 2 NA NA NA
# [8,] NA NA NA NA NA
# [9,] NA NA NA NA NA
# [10,] 6 5 3 NA NA
# [11,] 7 6 4 1 NA
# [12,] 8 7 5 2 1
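For comparison, the loop from the question could be repaired roughly as follows. This is only a sketch: the names refs and d are introduced here for illustration, and it assumes (as in the answers above) that only the nonzero elements except the last serve as reference columns.

```r
aux1 <- c(0, 0, 0, 4, 5, 0, 7, 0, 0, 10, 11, 12)

# Reference values: every nonzero element except the last (4 5 7 10 11)
refs <- head(aux1[aux1 > 0], -1)

# Start from an all-NA matrix: one row per element, one column per reference
M <- matrix(NA_real_, nrow = length(aux1), ncol = length(refs))

for (i in seq_along(refs)) {
  for (j in seq_along(aux1)) {
    d <- aux1[j] - refs[i]
    # Keep only positive distances; zeros and earlier elements stay NA
    if (d > 0) M[j, i] <- d
  }
}
M
```

Using seq_along() avoids the 1:N-1 pitfall, which R parses as (1:N)-1 and therefore starts at index 0.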
I've generated a small world network with 16 agents with igraph:
myNetwork <- sample_smallworld(dim = 1, nei = 1, size = 16, p = 0.1) #generate small world
plot(myNetwork, vertex.size=20, vertex.label=c(1:16), layout=layout_in_circle) #inspect the network
In a separate dataframe, stack, I have each of these agents' opinion (opinion1):
> stack
agent opinion1
1 1 0.71979146
2 2 0.25040406
3 3 0.50866647
4 4 0.53713674
5 5 0.53954982
6 6 0.23903034
7 7 0.03989347
8 8 0.29350197
9 9 0.85441826
10 10 0.44565889
11 11 0.28223782
12 12 0.39748249
13 13 0.17488017
14 14 0.08804374
15 15 0.61174168
16 16 0.30949636
I now want to calculate each agent's updated opinion (let's call it opinion2) by applying this equation, where networkNeighborsOpinion1 refers to the opinion1s of the agents that are connected in myNetwork:
opinion2 <- 0.5 * opinion1 + 0.5 * (mean(networkNeighborsOpinion1))
Given myNetwork and stack$opinion1, how can I efficiently apply this equation to each agent?
Here's my thinking so far...
From myNetwork, the corresponding adjacency matrix can be retrieved like so:
adjMatrix <- as.matrix(as_adjacency_matrix(myNetwork, names = TRUE, edges = FALSE))
adjMatrix[adjMatrix == 0] <- NA #turn all 0s into NAs
> adjMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA
[2,] 1 NA 1 NA NA NA NA NA NA NA NA NA NA NA NA
[3,] NA 1 NA NA NA NA NA NA NA NA NA NA NA 1 NA
[4,] NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA
[5,] NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA
[6,] NA NA NA NA NA NA 1 NA NA NA NA NA NA 1 NA
[7,] NA NA NA NA NA 1 NA 1 NA NA NA NA NA NA NA
[8,] NA NA NA NA NA NA 1 NA 1 NA NA NA NA NA NA
[9,] NA NA NA NA NA NA NA 1 NA 1 NA NA NA NA NA
[10,] NA NA NA NA NA NA NA NA 1 NA 1 NA NA NA NA
[11,] NA NA NA NA NA NA NA NA NA 1 NA 1 NA NA NA
[12,] NA NA NA NA NA NA NA NA NA NA 1 NA 1 NA NA
[13,] NA NA NA NA NA NA NA NA NA NA NA 1 NA 1 NA
[14,] NA NA 1 NA NA 1 NA NA NA NA NA NA 1 NA 1
[15,] NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA
[16,] 1 NA NA NA NA NA NA NA NA NA NA NA NA NA 1
Each agent is represented by a row in adjMatrix, and each network connection is indicated by a value of 1.
Then, it seems like there should be a way to use each row of adjMatrix to call the appropriate values from stack$opinion1 and generate a vector of networkNeighborsOpinion1, which could then be used to compute an opinion2 for each agent. Note that I've changed the 0s in adjMatrix to NAs, which follows my thinking that each row could be multiplied by the corresponding values in stack$opinion1 (i.e., each opinion1 is multiplied by either 1 or NA, and the result could then be passed to mean(networkNeighborsOpinion1, na.rm = TRUE)).
Any direction on this would be appreciated. Perhaps a for loop or function?
Multiply the adjacency matrix by opinion1 and divide by the sum of corresponding rows in the adjacency matrix. Then average that with opinion1.
adjMatrix <- as.matrix(as_adjacency_matrix(myNetwork, names = TRUE, edges = FALSE))
0.5 * stack$opinion1 + 0.5 * (adjMatrix %*% stack$opinion1) / rowSums(adjMatrix)
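To see why the matrix product gives the neighbour mean, here is a small self-contained check on a hand-made 4-agent path network (1-2-3-4). The names adj, vectorised and looped and the opinion values are made up for illustration; the adjacency matrix must contain 0s, not NAs, for %*% and rowSums to work this way.

```r
# Toy undirected path graph 1-2-3-4 as a 0/1 adjacency matrix
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1
adj <- adj + t(adj)

opinion1 <- c(0.2, 0.4, 0.6, 0.8)

# Vectorised form: (adj %*% opinion1) sums the neighbours' opinions,
# and dividing by rowSums(adj) (the degree) turns the sum into a mean
vectorised <- as.vector(0.5 * opinion1 + 0.5 * (adj %*% opinion1) / rowSums(adj))

# Explicit per-agent version of the same formula, for comparison
looped <- sapply(seq_along(opinion1), function(i) {
  0.5 * opinion1[i] + 0.5 * mean(opinion1[adj[i, ] == 1])
})

all.equal(vectorised, looped)
```

An agent with no neighbours has a row sum of 0, so both versions return NaN for it; such agents would need separate handling.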
Note
stack in reproducible form is:
Lines <- " agent opinion1
1 1 0.71979146
2 2 0.25040406
3 3 0.50866647
4 4 0.53713674
5 5 0.53954982
6 6 0.23903034
7 7 0.03989347
8 8 0.29350197
9 9 0.85441826
10 10 0.44565889
11 11 0.28223782
12 12 0.39748249
13 13 0.17488017
14 14 0.08804374
15 15 0.61174168
16 16 0.30949636"
stack <- read.table(text = Lines)
I am trying to produce a matrix of variable dimensions of the form below (i.e. integers increasing by 1 at a time, with a lower triangle of NAs)
NA 1 2 3 4
NA NA 5 6 7
NA NA NA 8 9
NA NA NA NA 10
NA NA NA NA NA
I have used the below code
sample_vector <- c(1:(total_nodes^2))
sample_matrix <- matrix(sample_vector, nrow=total_nodes, byrow=TRUE)
sample_matrix[lower.tri(sample_matrix, diag = TRUE)] <- NA
However the matrix I get with this method is of the form:
NA 2 3 4 5
NA NA 8 9 10
NA NA NA 14 15
NA NA NA NA 20
NA NA NA NA NA
How about this:
total_nodes <- 5
sample_matrix <- matrix(NA, nrow=total_nodes, ncol=total_nodes)
sample_matrix[lower.tri(sample_matrix)]<-1:sum(lower.tri(sample_matrix))
sample_matrix <- t(sample_matrix)
sample_matrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 1 2 3 4
# [2,] NA NA 5 6 7
# [3,] NA NA NA 8 9
# [4,] NA NA NA NA 10
# [5,] NA NA NA NA NA
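Since the question asks for variable dimensions, the same idea can be wrapped in a small function. This is a sketch; the name upper_seq is hypothetical and not part of the answer above.

```r
# Build an n x n matrix whose strict upper triangle holds 1, 2, 3, ...
# counted row by row, with NA on and below the diagonal
upper_seq <- function(n) {
  m <- matrix(NA_integer_, nrow = n, ncol = n)
  # Fill the strict lower triangle column by column ...
  m[lower.tri(m)] <- seq_len(n * (n - 1) / 2)
  # ... so that after transposing the numbers run row by row
  t(m)
}

upper_seq(5)
upper_seq(3)
```

The transpose is what converts R's column-major fill order into the row-wise numbering the question asks for.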
I'm using the diag function to construct a matrix and upper.tri to turn it into a "target" as well as a logical indexing tool. Note that this fills the upper triangle column by column rather than row by row:
upr5 <- upper.tri(diag(5))
upr5
upr5[upr5] <- 1:sum(upr5)
upr5[upr5==0] <- NA # would otherwise have been zeroes
upr5
[,1] [,2] [,3] [,4] [,5]
[1,] NA 1 2 4 7
[2,] NA NA 3 5 8
[3,] NA NA NA 6 9
[4,] NA NA NA NA 10
[5,] NA NA NA NA NA
I'm trying to fill the columns of an index matrix with samples from 1:whatever using a for loop. This is for a bootstrap coding problem. The issue I'm getting is that the for loop won't run correctly once it reaches a number that does not divide the number of rows. For some reason it thinks I want an equal representation of the sampled numbers in each column. How do I get this to stop?
index.mat=matrix(NA,nr=12,nc=10,byrow=FALSE)
for(i in 1:5)
{
index.mat[,i] <- sample(1:i, i, replace=TRUE)
print(index.mat)
}
will print
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 2 1 NA NA NA NA NA NA
[2,] 1 1 2 4 NA NA NA NA NA NA
[3,] 1 1 2 1 NA NA NA NA NA NA
[4,] 1 1 2 2 NA NA NA NA NA NA
[5,] 1 1 2 1 NA NA NA NA NA NA
[6,] 1 1 2 4 NA NA NA NA NA NA
[7,] 1 1 2 1 NA NA NA NA NA NA
[8,] 1 1 2 2 NA NA NA NA NA NA
[9,] 1 1 2 1 NA NA NA NA NA NA
[10,] 1 1 2 4 NA NA NA NA NA NA
[11,] 1 1 2 1 NA NA NA NA NA NA
[12,] 1 1 2 2 NA NA NA NA NA NA
as the final matrix before giving the error
Error in index.mat[, i] <- sample(1:i, i, replace = TRUE) :
number of items to replace is not a multiple of replacement length
Just use sample(i, size = 12, replace = TRUE).
Your LHS is index.mat[,i] which has length 12.
Your RHS is sample(1:i, i, replace = TRUE), which has length i.
By nature, R will recycle the RHS when the lengths don't match up -- this means, when i=2, your RHS is length 2 and it will simply be repeated 6 times to match the LHS length 12.
In this particular case, if the RHS length isn't a divisor of the LHS length, you get an error -- which happens first when, you guessed it, i=5 (since 1, 2, 3, and 4 all divide 12 evenly).
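A corrected version of the loop might look like this. It is a sketch that assumes the goal is 12 independent draws from 1:i in column i; n_rows and the seed are introduced here only for illustration.

```r
n_rows <- 12
index.mat <- matrix(NA_integer_, nrow = n_rows, ncol = 10)

set.seed(1)  # only to make the illustration reproducible
for (i in seq_len(ncol(index.mat))) {
  # Draw exactly n_rows values so the RHS length matches the column length
  # and no recycling takes place
  index.mat[, i] <- sample(i, size = n_rows, replace = TRUE)
}
index.mat
```

Because the right-hand side now always has length 12, the loop runs for every i, including those that do not divide 12.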
I have a large matrix which comprises 1,2 and missing (coded as NA) values. The matrix has 500000 rows by 10000 columns. There are approximately 0.05% 1- or 2-values, and the remaining values are NA.
I would like to reorder the rows and columns of the matrix so that the top left corner of the matrix comprises a relatively high number of 1s and 2s compared to the rest of the matrix. In other words, I would like to create a relatively datarich subset of the matrix, by reordering the matrix rows and columns.
Is there an efficient method of achieving this in R?
In particular, I'm interested in solutions where sorting by the number of non-NA values in each row and column is not sufficient to produce a dense corner.
In addition, I'll add a constraint. The size of the dense corner will be pre-defined.
In the following example, the goal is to reorder the rows and columns so that the top leftmost 3x3 submatrix is relatively dense (i.e. few or no NA values).
m1 <- matrix(c(rep(c(rep(NA, 3), rep(1, 7)), 1),
rep(c(rep(2, 3), rep(NA, 7)), 7),
rep(c(rep(NA, 3), rep(1, 7)), 2)
), nrow=10, byrow=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA NA NA 1 1 1 1 1 1 1
[2,] 2 2 2 NA NA NA NA NA NA NA
[3,] 2 2 2 NA NA NA NA NA NA NA
[4,] 2 2 2 NA NA NA NA NA NA NA
[5,] 2 2 2 NA NA NA NA NA NA NA
[6,] 2 2 2 NA NA NA NA NA NA NA
[7,] 2 2 2 NA NA NA NA NA NA NA
[8,] 2 2 2 NA NA NA NA NA NA NA
[9,] NA NA NA 1 1 1 1 1 1 1
[10,] NA NA NA 1 1 1 1 1 1 1
The rows and columns are ordered by the number of non-NA values (using code from an answer below):
m1 <- m1[order(rowSums(is.na(m1))), order(colSums(is.na(m1)))]
However, this does not result in a dense 3x3 top leftmost corner:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA NA NA 1 1 1 1 1 1 1
[2,] NA NA NA 1 1 1 1 1 1 1
[3,] NA NA NA 1 1 1 1 1 1 1
[4,] 2 2 2 NA NA NA NA NA NA NA
[5,] 2 2 2 NA NA NA NA NA NA NA
[6,] 2 2 2 NA NA NA NA NA NA NA
[7,] 2 2 2 NA NA NA NA NA NA NA
[8,] 2 2 2 NA NA NA NA NA NA NA
[9,] 2 2 2 NA NA NA NA NA NA NA
[10,] 2 2 2 NA NA NA NA NA NA NA
I thought that there may be a set of optimisation procedures that I could implement, as my working matrix is too large to do the reorganisation by eye.
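One heuristic that might be worth trying is to alternate between choosing the best rows for the current columns and the best columns for the current rows until the selection stops changing. The sketch below is an assumption, not something proposed in this thread: dense_corner is a hypothetical name, the result is only a local optimum, and for a 500000 x 10000 matrix the logical matrix built here would be too large, so a sparse representation would presumably be needed.

```r
dense_corner <- function(m, k, max_iter = 20) {
  filled <- !is.na(m)
  # Start with the k columns that have the most non-NA values overall
  cols <- order(colSums(filled), decreasing = TRUE)[seq_len(k)]
  rows <- integer(0)
  for (iter in seq_len(max_iter)) {
    # Best k rows given the current columns, then best k columns given those rows
    new_rows <- order(rowSums(filled[, cols, drop = FALSE]), decreasing = TRUE)[seq_len(k)]
    new_cols <- order(colSums(filled[new_rows, , drop = FALSE]), decreasing = TRUE)[seq_len(k)]
    converged <- setequal(new_rows, rows) && setequal(new_cols, cols)
    rows <- new_rows
    cols <- new_cols
    if (converged) break
  }
  # Move the selected rows and columns to the front, keep the rest in order
  m[c(rows, setdiff(seq_len(nrow(m)), rows)),
    c(cols, setdiff(seq_len(ncol(m)), cols))]
}

m1 <- matrix(c(rep(c(rep(NA, 3), rep(1, 7)), 1),
               rep(c(rep(2, 3), rep(NA, 7)), 7),
               rep(c(rep(NA, 3), rep(1, 7)), 2)), nrow = 10, byrow = TRUE)
res <- dense_corner(m1, k = 3)
res[1:3, 1:3]
```

On the example matrix this moves three of the rows containing 2s to the top, so the 3x3 corner has no NA values.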
I have a very large dataset, so I want to avoid loops.
I have three columns of data:
col1 = time, presented as 10000, 10001, 10002, 10100, 10101, 10102, 10200, 10201, 10202, 10300, ... (total 18000 times)
col2 = id number: 1 2 3 4 ... (total 500 ids)
col3 = reading associated with a particular id at a particular time: 0.1 0.5 0.6 0.7 ...
Say this is called Data3:
10000 1 0.1
10001 1 0.5
10002 1 0.6
10100 1 0.7
10200 1 0.6 (NOTE - some random entries missing)
I want to present this as a matrix (called DataMatrix), but there is missing data, so a simple reshape will not do. I want to have the missing data as NA entries.
DataMatrix is currently an NA matrix of 500 columns and 18000 rows, where the row names and column names are the times and ids respectively.
1 2 3 4 ....
10000 NA NA NA NA ....
10001 NA NA NA NA ....
Is there a way I can get R to go through each row of Data3, completing DataMatrix with the reading Data3[,3] by placing it in the row and column of the matrix whose names match Data3[,1] and Data3[,2], but without loops?
Thanks to all you smart people out there.
If I understood you correctly:
Data3 <- data.frame(col1=10000:10499,
col2=1:500,
col3=round(runif(500),1))
library(reshape2)
DataMatrix <- dcast(Data3, col1~col2, value.var="col3")
DataMatrix[1:5, 1:5]
# col1 1 2 3 4
# 1 10000 0.4 NA NA NA
# 2 10001 NA 0.6 NA NA
# 3 10002 NA NA 0.9 NA
# 4 10003 NA NA NA 0.5
# 5 10004 NA NA NA NA
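If the pre-built NA matrix with named rows and columns from the question should be kept, base R can also fill it in one step by indexing with a two-column character matrix of (row name, column name) pairs. The data below are a tiny made-up stand-in for the real Data3, and idx is a name introduced for illustration.

```r
times <- c(10000L, 10001L, 10002L, 10100L)
ids <- 1:3

# Toy long-format data: time, id, reading (some combinations missing)
Data3 <- data.frame(col1 = c(10000L, 10001L, 10100L, 10002L),
                    col2 = c(1L, 1L, 2L, 3L),
                    col3 = c(0.1, 0.5, 0.7, 0.6))

# All-NA matrix whose dimnames are the times and ids
DataMatrix <- matrix(NA_real_, nrow = length(times), ncol = length(ids),
                     dimnames = list(times, ids))

# Each row of idx names one cell; the assignment fills all cells at once
idx <- cbind(as.character(Data3$col1), as.character(Data3$col2))
DataMatrix[idx] <- Data3$col3
DataMatrix
```

Combinations that never occur in Data3 simply stay NA. Integer times are used here so that as.character() produces the same labels as the dimnames.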
Here is a solution with possible id values in 1:10 and time values in 1:20. First, create the data:
mx <- matrix(c(sample(1:20, 5), sample(1:10, 5), sample(1:50, 5)), ncol=3, dimnames=list(NULL, c("time", "id", "reading")))
times <- 1:20
ids <- 1:10
mx
# time id reading
# [1,] 4 3 25
# [2,] 5 4 9
# [3,] 9 7 45
# [4,] 18 1 40
# [5,] 11 8 28
Now, use outer to pass every possible combination of time/id to a look up function that returns the corresponding reading value:
outer(times, ids,
function(x, y) {
mapply(function(x.sub, y.sub) {
val <- mx[mx[, 1] == x.sub & mx[, 2] == y.sub, 3]
if(length(val) == 0L) NA_integer_ else val
},
x, y)
} )
This produces the (hopefully) desired answer:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] NA NA NA NA NA NA NA NA NA NA
# [2,] NA NA NA NA NA NA NA NA NA NA
# [3,] NA NA NA NA NA NA NA NA NA NA
# [4,] NA NA 25 NA NA NA NA NA NA NA
# [5,] NA NA NA 9 NA NA NA NA NA NA
# [6,] NA NA NA NA NA NA NA NA NA NA
# [7,] NA NA NA NA NA NA NA NA NA NA
# [8,] NA NA NA NA NA NA NA NA NA NA
# [9,] NA NA NA NA NA NA 45 NA NA NA
# [10,] NA NA NA NA NA NA NA NA NA NA
# [11,] NA NA NA NA NA NA NA 28 NA NA
# [12,] NA NA NA NA NA NA NA NA NA NA
# [13,] NA NA NA NA NA NA NA NA NA NA
# [14,] NA NA NA NA NA NA NA NA NA NA
# [15,] NA NA NA NA NA NA NA NA NA NA
# [16,] NA NA NA NA NA NA NA NA NA NA
# [17,] NA NA NA NA NA NA NA NA NA NA
# [18,] 40 NA NA NA NA NA NA NA NA NA
# [19,] NA NA NA NA NA NA NA NA NA NA
# [20,] NA NA NA NA NA NA NA NA NA NA