Create list with looping - r

I have an i-by-j dummy matrix of rating events for companies, with i dates and j different companies. On a day where a rating occurs, rating[i,j]=1, and 0 otherwise.
I want to create a list that contains 4 sublists (one for each of the 4 companies). Each sublist holds the row numbers of the rating events of that company.
This is my code:
r<-list(); i=1;j=2;
for(j in 1:4){
x<-list()
for(i in 100){
if(rating[i,j]!=0){
x<-c(x,i)
i=i+1
}
else{i=i+1}
}
r[[j]]<-x
j=j+1
}
It is somehow not working, and I really can not figure out where the bug is. The x sublists are always empty. Could somebody help?
Thanks a lot!
Here is an example rating matrix:
rating<-matrix(data = 0, nrow = (100), ncol = 4, dimnames=list(c(1:100), c(1:4)));
rating[3,1]=1;rating[7,1]=1;rating[20,1]=1;rating[75,1]=1;
rating[8,2]=1;rating[40,2]=1;rating[50,2]=1;rating[78,2]=1;
rating[1,3]=1;rating[4,3]=1;rating[17,3]=1;rating[99,3]=1;
rating[10,4]=1;rating[20,4]=1;rating[30,4]=1;rating[90,4]=1;

You may try this:
set.seed(123)
m <- matrix(data = sample(c(0, 1), 16, replace = TRUE), ncol = 4,
dimnames = list(date = 1:4, company = letters[1:4]))
m
# company
# date a b c d
# 1 0 1 1 1
# 2 1 0 0 1
# 3 0 1 1 0
# 4 1 1 0 1
lapply(as.data.frame(m), function(x){
which(x == 1)
})
# $a
# [1] 2 4
#
# $b
# [1] 1 3 4
#
# $c
# [1] 1 3
#
# $d
# [1] 1 2 4
Update
Or more compact (thanks to @flodel!):
lapply(as.data.frame(m == 1), which)
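As for why the loop in the question leaves every sublist empty: `for(i in 100)` iterates over the single value 100 rather than over 1:100, and row 100 of the example matrix contains no events. The manual `i=i+1` and `j=j+1` updates are also unnecessary, since `for` advances the index itself. A minimal sketch, using the example matrix from the question, that compares a corrected loop with a `which()`-based version (the names `r_loop` and `r_which` are only illustrative, and the sublists are stored as integer vectors rather than lists):

```r
# Example rating matrix from the question
rating <- matrix(0, nrow = 100, ncol = 4, dimnames = list(1:100, 1:4))
rating[c(3, 7, 20, 75), 1] <- 1
rating[c(8, 40, 50, 78), 2] <- 1
rating[c(1, 4, 17, 99), 3] <- 1
rating[c(10, 20, 30, 90), 4] <- 1

# Corrected loop: iterate over 1:100; for() advances i and j by itself
r_loop <- list()
for (j in 1:4) {
  x <- integer(0)
  for (i in 1:100) {
    if (rating[i, j] != 0) x <- c(x, i)
  }
  r_loop[[j]] <- x
}

# Loop-free equivalent: row numbers of the non-zero entries, column by column
r_which <- lapply(seq_len(ncol(rating)),
                  function(j) unname(which(rating[, j] != 0)))

identical(r_loop, r_which)
```

Both versions give one integer vector of row numbers per company.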

(Leave for-loops behind.) If rating really is a matrix, or even if it's a data frame, then why not use rowSums:
r <- rowSums(rating) # number of rating events per date (row)
# simple example:
> rating <- matrix( rbinom(100, 1,prob=.5), 10)
> rating
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 1 0 0 1 1 0 0 1
[2,] 1 0 0 0 0 0 0 1 1 1
[3,] 0 0 1 1 1 1 0 0 1 0
[4,] 1 0 0 0 1 1 0 0 0 1
[5,] 1 1 0 1 1 1 1 0 0 0
[6,] 1 1 1 0 1 1 1 0 1 0
[7,] 0 1 0 1 0 1 1 0 1 0
[8,] 0 1 0 0 1 1 0 1 1 0
[9,] 1 1 1 0 1 1 1 1 0 0
[10,] 0 1 0 0 1 0 0 1 0 1
> rowSums(rating)
[1] 5 4 5 4 6 7 5 5 7 4
> rowSums(as.data.frame(rating))
[1] 5 4 5 4 6 7 5 5 7 4
If it needs to be a list then just wrap as.list() around it.

Related

How to find rows that sum up to certain values of colSums and rowSums?

My task is to randomly generate 8 rows of 12 columns filled with 0s and 1s, such that each row sum equals 6 and each column sum equals 4.
So I create all possible combinations of 0 and 1 within 12 variables:
df <- expand.grid(0:1, 0:1, 0:1, 0:1, 0:1, 0:1,
0:1, 0:1, 0:1, 0:1, 0:1, 0:1)
Restrict the combinations to those whose row sum equals 6:
df <- df[rowSums(df)==6,]
Then I shuffle it:
shuffled <- df[sample(nrow(df)),]
and finally I'd like to pick 8 rows from shuffled data. All these 8 rows must have column sums that equal 4 and row sums equal 6:
colSums(picked_shuffled)
[1] 4 4 4 4 4 4 4 4 4 4 4 4
rowSums(picked_shuffled)
[1] 6 6 6 6 6 6 6 6
How to do it?
Doing it by trial and error will take you a very long time! An alternative is to construct a matrix that works and then shuffle it...
rows <- rep(1:8, 6) #48 row positions for the 1s - 6 of each
columns <- rep(1:12, each = 4) #48 column positions for the 1s - 4 of each
mat <- matrix(0, nrow = 8, ncol = 12) #blank matrix of 0s
mat[cbind(rows, columns)] <- 1 #set selected values to 1
mat <- mat[sample(1:8), sample(1:12)] #shuffle rows and columns
mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 0 0 0 1 1 0 0 1 1 0 1
[2,] 0 1 1 1 0 0 1 1 0 0 1 0
[3,] 0 1 1 1 0 0 1 1 0 0 1 0
[4,] 1 0 0 0 1 1 0 0 1 1 0 1
[5,] 1 0 0 0 1 1 0 0 1 1 0 1
[6,] 0 1 1 1 0 0 1 1 0 0 1 0
[7,] 1 0 0 0 1 1 0 0 1 1 0 1
[8,] 0 1 1 1 0 0 1 1 0 0 1 0
I don't know if it is possible to produce a more "random" distribution than this - there are still only two types of column and two types of row however you shuffle it!
By the way these operations are usually much faster on matrices than dataframes - you can always convert it at the end.
A more random solution...
After a bit of thought, it is possible to get a more "random" solution with the method above by shuffling the column positions until there are no duplicated row-column pairs (which seems to be quite fast). So a modified version...
rows <- rep(1:8, 6)
columns <- sample(rep(1:12, 4))
while(any(duplicated(cbind(rows, columns)))){
columns <- sample(columns)
}
mat <- matrix(0, nrow = 8, ncol = 12)
mat[cbind(rows, columns)] <- 1
mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 0 0 1 0 1 1 0 1 0 1 1 0
[2,] 1 1 0 1 0 1 1 0 0 0 1 0
[3,] 1 1 1 0 0 0 1 0 0 1 0 1
[4,] 0 1 0 1 0 1 0 1 1 1 0 0
[5,] 0 1 0 0 1 0 1 0 1 0 1 1
[6,] 0 0 1 0 1 0 1 1 1 0 0 1
[7,] 1 0 1 1 1 1 0 0 0 1 0 0
[8,] 1 0 0 1 0 0 0 1 1 0 1 1
rowSums(mat)
[1] 6 6 6 6 6 6 6 6
colSums(mat)
[1] 4 4 4 4 4 4 4 4 4 4 4 4
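Another way to scramble a valid matrix while keeping its margins fixed, not taken from the answers in this thread, is to apply repeated "checkerboard" swaps: pick two rows and two columns at random and, if the 2x2 submatrix is `1 0 / 0 1` or `0 1 / 1 0`, flip it. Each flip leaves every row sum and column sum unchanged. The sketch below starts from the structured matrix built above; `swap_randomize` and `n_swaps` are hypothetical names chosen for illustration, and no claim is made about how uniform the result is.

```r
set.seed(1)

# Structured starting matrix: row sums 6, column sums 4
rows <- rep(1:8, 6)
columns <- rep(1:12, each = 4)
mat <- matrix(0, nrow = 8, ncol = 12)
mat[cbind(rows, columns)] <- 1

# Hypothetical helper: try n_swaps random checkerboard flips
swap_randomize <- function(m, n_swaps = 2000) {
  for (k in seq_len(n_swaps)) {
    r <- sample(nrow(m), 2)    # two distinct rows
    cc <- sample(ncol(m), 2)   # two distinct columns
    sub <- m[r, cc]
    # flip only if the 2x2 block is a checkerboard (1 0 / 0 1 or 0 1 / 1 0)
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
        sub[1, 1] != sub[1, 2]) {
      m[r, cc] <- 1 - sub
    }
  }
  m
}

mixed <- swap_randomize(mat)
rowSums(mixed)   # still all 6
colSums(mixed)   # still all 4
```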
I have got a less clean but more random solution to the problem than Andrew. It randomly shoots 1 at the initially empty grid, until the conditions are satisfied. Sometimes, it removes 20% of previous hits to prevent getting stuck. When it gets stuck because of too many iterations, it resets.
I simulated it and it usually takes about 40-80 iterations to fill the grid according to your specifications. In rare cases, it takes up to 160.
grid = matrix(0,nrow=8,ncol=12)
finished = FALSE
count=0
while(!finished){
  # rows and columns that can still take another 1
  openrows = c(1:8)[rowSums(grid)<6]
  opencols = c(1:12)[colSums(grid)<4]
  if(length(openrows)>0 && length(opencols)>0){
    # stuck on a single cell that is already 1: remove roughly 20% of the hits
    if(length(openrows)==1 && length(opencols)==1 && grid[openrows[1],opencols[1]]==1){
      grid[grid==1 & runif(length(grid),0,1)>0.8]=0
    }
    # shoot a 1 at a random open row/column position
    i = as.integer(runif(1,0,length(openrows)))+1
    j = as.integer(runif(1,0,length(opencols)))+1
    grid[openrows[i],opencols[j]]=1
  }else{
    finished=TRUE
  }
  count = count+1
  # too many iterations: reset and start over
  if(count>500){
    grid = matrix(0,nrow=8,ncol=12)
    count=0
  }
}
It's not very efficient (for large tables) but it works and gives you random data.
That was quite the brain teaser, tbh.

Building adjacency matrix from delaunay triangle values

I have a large matrix containing the 3D coordinates of points, and I want to build their adjacency matrix from their Delaunay triangulation. To compute the triangulation I used the 'geometry' package. As an example of what I have, I prepared the code below:
values<-rnorm(12,mean = 10, 5)
mat<-matrix(values,ncol = 3)
dimnames(mat)<-list(c("asd","qwe","rty","poi"),c("x","y","z"))
require("geometry")
delaunaynMat<-delaunayn(mat)
However, I could not find any proper function to build the adjacency matrix from delaunay results. Any idea about it?
Perhaps with the DatabionicSwarm package
> library(DatabionicSwarm)
> Delaunay4Points(mat, IsToroid=FALSE)
1 2 3 4
1 0 1 1 1
2 1 0 1 1
3 1 1 0 0
4 1 1 0 0
Or with deldir (only for 2D):
> require(deldir)
Loading required package: deldir
deldir 0.1-15
> set.seed(42)
> x <- runif(6)
> y <- runif(6)
> dxy <- deldir(x,y)
> ind <- dxy$dirsgs[,5:6]
> adj <- matrix(0, length(x), length(y))
> for (i in 1:nrow(ind)){
+ adj[ind[i,1], ind[i,2]] <- 1
+ adj[ind[i,2], ind[i,1]] <- 1
+ }
> adj
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 0 1 0 0
[2,] 1 0 0 1 1 0
[3,] 0 0 0 0 1 1
[4,] 1 1 0 0 1 1
[5,] 0 1 1 1 0 1
[6,] 0 0 1 1 1 0
EDIT
There's something weird. I need to flip the matrix returned by Delaunay4Points to get results consistent with deldir:
set.seed(42)
x <- runif(6)
y <- runif(6)
dxy <- deldir(x,y)
ind <- dxy$delsgs[,5:6]
adj <- matrix(0, length(x), length(y))
for (i in 1:nrow(ind)){
adj[ind[i,1], ind[i,2]] <- 1
adj[ind[i,2], ind[i,1]] <- 1
}
adj
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 0 1 0 1
[2,] 1 0 1 1 1 0
[3,] 0 1 0 0 1 1
[4,] 1 1 0 0 1 1
[5,] 0 1 1 1 0 1
[6,] 1 0 1 1 1 0
> Delaunay4Points(cbind(x,y), IsToroid=FALSE)[6:1,6:1] # <- look
6 5 4 3 2 1
6 0 1 0 1 0 1
5 1 0 1 1 1 0
4 0 1 0 0 1 1
3 1 1 0 0 1 1
2 0 1 1 1 0 1
1 1 0 1 1 1 0
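If you want to stay with the output of `delaunayn` itself, the adjacency matrix can also be assembled by hand. `delaunayn` returns a matrix with one row per simplex, each row holding the indices of that simplex's vertices, so two points are adjacent whenever they share a row. A minimal sketch under that assumption; `tri_to_adj` is a hypothetical helper name, and the small `tri` matrix is made-up input so the example runs without the geometry package:

```r
# Hypothetical helper: adjacency matrix from a matrix of simplices
# tri: one simplex per row (point indices), n: total number of points
tri_to_adj <- function(tri, n) {
  adj <- matrix(0, nrow = n, ncol = n)
  for (k in seq_len(nrow(tri))) {
    pairs <- t(utils::combn(tri[k, ], 2))   # all vertex pairs of simplex k
    adj[pairs] <- 1                         # mark i -> j
    adj[pairs[, 2:1, drop = FALSE]] <- 1    # and j -> i, keeping adj symmetric
  }
  adj
}

# Made-up example: two triangles sharing the edge 2-3
tri <- rbind(c(1, 2, 3), c(2, 3, 4))
adj <- tri_to_adj(tri, n = 4)
adj

# With the question's data this would be called as (not run here):
# tri_to_adj(geometry::delaunayn(mat), nrow(mat))
```

In the made-up example, points 1 and 4 share no triangle, so `adj[1, 4]` stays 0.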

Find frequency of vector elements in a matrix

I have a matrix in R, here is a small example:
set.seed(1)
n.columns<-6
mat <- matrix(, nrow = 5, ncol = n.columns)
for(column in 1:n.columns){
mat[, column] <- sample(1:10,5)
}
mat
The matrix looks like this:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 9 3 5 10 4
[2,] 4 10 2 7 2 1
[3,] 5 6 6 8 6 10
[4,] 7 5 10 3 1 7
[5,] 2 1 5 10 9 3
I also have a vector v of integers, v<-c(1,3,6), whose elements could theoretically appear in the matrix mat above.
What I am looking for is an overview of the number of times that each element in v appears in mat per column. For the current example this overview is
1: 0 1 0 0 1 1
3: 1 0 1 1 0 1
6: 0 1 1 0 1 0
It is fairly straightforward to do this using for-loops and if-statements, but this solution is not very pretty.
Is there a professional way to do this?
One option using sapply:
t(sapply(v, function(a) colSums(mat==a)))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 1 0 0 1 1
#[2,] 1 0 1 1 0 1
#[3,] 0 1 1 0 1 0
Using table:
table(mat[mat %in% v], col(mat)[mat %in% v])
# 1 2 3 4 5 6
# 1 0 1 0 0 1 1
# 3 1 0 1 1 0 1
# 6 0 1 1 0 1 0
A drawback is that a column of mat containing none of the values in v is dropped from the output.
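One way around that drawback, sketched here as a suggestion rather than as part of the original answer, is to convert both arguments to factors with explicit levels, so that `table` reports every value of interest and every column even when the count is zero. The matrix below is typed in from the printed example, and `v2` is a hypothetical vector picked so that columns 1 and 4 contain none of its values:

```r
# The example matrix, entered column by column
mat <- cbind(c(3, 4, 5, 7, 2), c(9, 10, 6, 5, 1), c(3, 2, 6, 10, 5),
             c(5, 7, 8, 3, 10), c(10, 2, 6, 1, 9), c(4, 1, 10, 7, 3))

v2 <- c(1, 6)        # columns 1 and 4 contain neither value
hit <- mat %in% v2

# Explicit levels keep empty rows and columns in the table
tab <- table(factor(mat[hit], levels = v2),
             factor(col(mat)[hit], levels = seq_len(ncol(mat))))
tab
```

The result has one row per element of `v2` and one column per column of `mat`, with zeros for columns 1 and 4.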
Using sapply on a data.frame iterates over its columns (note that the result is transposed, with one row per column of mat):
setNames(object = as.data.frame(sapply(v, function(a)
sapply(as.data.frame(mat), function(b)
sum(b == a)))), nm = v)
# 1 3 6
#V1 0 1 0
#V2 1 0 1
#V3 0 1 1
#V4 0 1 0
#V5 1 0 1
#V6 1 1 0

Changing the values in a binary matrix

Consider the 8 by 6 binary matrix, M:
M <- matrix(c(0,0,1,1,0,0,1,1,
0,1,1,0,0,1,1,0,
0,0,0,0,1,1,1,1,
0,1,0,1,1,0,1,0,
0,0,1,1,1,1,0,0,
0,1,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is the M
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 1 0 1 0 1
[3,] 1 1 0 0 1 1
[4,] 1 0 0 1 1 0
[5,] 0 0 1 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
The following matrix contains, for each column of M, the row indices of the 1s in that column
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 2 5 2 3 2
[2,] 4 3 6 4 4 3
[3,] 7 6 7 5 5 5
[4,] 8 7 8 7 6 8
Let's call it ind:
ind <- matrix(c(3,4,7,8,
2,3,6,7,
5,6,7,8,
2,4,5,7,
3,4,5,6,
2,3,5,8),nrow = 4, ncol=6)
I'm trying to change a single 1 into 0 in each column of M.
For example, one possible choice of row indices, one per column, is (4,2,5,4,3,2), i.e. row 4 of column 1, row 2 of column 2, row 5 of column 3, and so on. Let N be the resulting matrix. This choice produces the following matrix N
N <- matrix(c(0,0,1,0,0,0,1,1,
0,0,1,0,0,1,1,0,
0,0,0,0,0,1,1,1,
0,1,0,0,1,0,1,0,
0,0,0,1,1,1,0,0,
0,0,1,0,1,0,0,1),nrow = 8,ncol = 6)
Here is that N
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 1 0 0
[3,] 1 1 0 0 0 1
[4,] 0 0 0 0 1 0
[5,] 0 0 0 1 1 1
[6,] 0 1 1 0 1 0
[7,] 1 1 1 1 0 0
[8,] 1 0 1 0 0 1
For each resulting matrix N, I do the following calculation.
X <- cbind(c(rep(1,nrow(N))),N)
ans <- sum(diag(solve(t(X)%*%X)[-1,-1]))
Then I want to find the matrix N that produces the smallest value of ans. How can I do this efficiently?
Let me know if this works.
We first build a conversion function that I'll need, and we build also the reverse function as you may need it at some point:
ind_to_M <- function(ind){
M <- matrix(rep(0,6*8),ncol=6)
for(i in 1:ncol(ind)){M[ind[,i],i] <- 1}
return(M)
}
M_to_ind <- function(M){apply(M==1,2,which)}
Then we will build a matrix of possible ways to ditch a value
all_possible_ways_to_ditch_value <- 1:4
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:4,by=NULL)
}
# there's probably a more elegant way to do that
head(all_possible_ways_to_ditch_value)
# x y.x y.y y.x y.y y
# 1 1 1 1 1 1 1 # will be used to ditch the 1st value of ind for every column
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 4 1 1 1 1 1
# 5 1 2 1 1 1 1
# 6 2 2 1 1 1 1
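The repeated `merge(..., by=NULL)` builds a full Cartesian product, and `expand.grid` can produce the same grid in one call. A short sketch; `n_col` and `ways` are illustrative names, with `n_col` standing in for `ncol(M)`:

```r
n_col <- 6   # stands in for ncol(M) from the question

# One column per column of M, each ranging over the 4 positions in ind
ways <- expand.grid(rep(list(1:4), n_col))

dim(ways)    # 4^6 = 4096 rows, 6 columns
head(ways)
```

The first column varies fastest, as in the `head()` output above; only the column names differ (`Var1` to `Var6`).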
Then we iterate through those, each time storing ans and N (as data is quite small overall).
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*3),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-all_possible_ways_to_ditch_value[j,i],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}
We finally retrieve the minimal ans and the relevant N
ans <- ans_list[[which.min(ans_list)]]
# [1] -3.60288e+15
N <- N_list[[which.min(ans_list)]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 1
# [3,] 1 1 0 0 1 1
# [4,] 1 0 0 1 1 0
# [5,] 0 0 1 1 1 1
# [6,] 0 1 1 0 0 0
# [7,] 1 0 1 0 0 0
# [8,] 0 0 0 0 0 0
EDIT:
To get minimal positive ans
ans_list[which(!sapply(ans_list,is.numeric))] <- Inf
ans <- ans_list[[which.min(abs(unlist(ans_list)))]]
# [1] 3.3
N <- N_list[[which.min(abs(unlist(ans_list)))]]
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0 0 0 0 0 0
# [2,] 0 1 0 1 0 0
# [3,] 1 1 0 0 0 1
# [4,] 1 0 0 0 1 0
# [5,] 0 0 0 1 1 1
# [6,] 0 1 1 0 1 0
# [7,] 1 0 1 1 0 0
# [8,] 0 0 1 0 0 1
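The huge negative value found first is a symptom of inverting a matrix that is singular or nearly so. As an alternative to catching errors with `try`, the rank of X can be checked before inverting, returning `Inf` when X'X has no inverse. This is only a sketch: `crit` is a hypothetical helper name, and `N_ok` and `N_bad` are made-up matrices used to exercise it.

```r
# Hypothetical helper: the criterion from the question, or Inf if X'X is singular
crit <- function(N) {
  X <- cbind(1, N)
  if (qr(X)$rank < ncol(X)) return(Inf)   # rank-deficient: X'X is not invertible
  # same as sum(diag(solve(t(X) %*% X)[-1, -1]))
  sum(diag(solve(crossprod(X)))[-1])
}

N_ok  <- rbind(diag(6), matrix(0, 2, 6))  # full column rank once the intercept is added
N_bad <- matrix(0, 8, 6)                  # all-zero columns, so X'X is singular

crit(N_ok)
crit(N_bad)   # Inf

# With the N_list built above this could be used as (not run here):
# vals <- vapply(N_list, crit, numeric(1)); N_list[[which.min(vals)]]
```

The rank check uses the default tolerance of `qr`, so matrices that are only close to singular can still pass it and yield unreliable values.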
EDIT 2 : to generalize the number of rows of ind to ditch
It seems to give the same result for ans for n_ditch = 1, and results make sense for n_ditch = 2
n_ditch <- 2
ditch_possibilities <- combn(1:4,n_ditch) # all the possible sets of indices to ditch for one given column
all_possible_ways_to_ditch_value <- 1:ncol(ditch_possibilities) # this will be all the possible sets of indices of ditch_possibilities to test
for (i in 2:ncol(M)){
all_possible_ways_to_ditch_value <- merge(all_possible_ways_to_ditch_value,1:ncol(ditch_possibilities),by=NULL)
}
ans_list <- list()
N_list <- list()
for(j in 1:nrow(all_possible_ways_to_ditch_value)){
#print(j)
ind_N <- matrix(rep(0,6*(4-n_ditch)),ncol=6) # initiate ind_N as an empty matrix
for(i in 1:ncol(M)){
ind_N[,i] <- ind[-ditch_possibilities[,all_possible_ways_to_ditch_value[j,i]],i] # fill with ind except for the value we ditch
}
N <- ind_to_M(ind_N)
X <- cbind(c(rep(1,nrow(N))),N)
ans_list[[j]] <- try(sum(diag(solve(t(X)%*%X)[-1,-1])),silent=TRUE) # some systems are not well defined, we'll just ignore the errors
N_list[[j]] <- N
}

Is there a way to generate a matrix in R of 0's and 1's to satisfy specific row and column totals?

I want to generate a 7 column by 10 row matrix with a total of exactly 20 randomly generated 1's, but with at least two 1's per row and two 1's per column. How could I do that?
Also, how would the code be different if I wanted to set a range of acceptable row and column totals instead of minimums?
Thanks!
I feel like there should be a more elegant solution, but here's a ball of duct tape:
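matbuilder <- function(n,nrow,ncol) {
  finished <- FALSE
  while(!finished) {
    # shuffle n ones and nrow*ncol-n zeros into a matrix
    trial <- matrix(sample(c(rep(1,n),rep(0,nrow*ncol-n))),nrow=nrow,ncol=ncol)
    # accept only if every row and every column has at least two 1s
    if(all(rowSums(trial)>=2) && all(colSums(trial)>=2)) finished <- TRUE
  }
  return(trial)
}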
x <- matbuilder(20, 10, 7)
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 1 0 0 0 0 0
## [2,] 0 0 0 0 1 0 1
## [3,] 0 1 0 0 0 1 0
## [4,] 1 0 0 0 1 0 0
## [5,] 0 1 0 0 0 1 0
## [6,] 0 0 1 1 0 0 0
## [7,] 0 0 1 1 0 0 0
## [8,] 0 0 0 0 1 0 1
## [9,] 0 0 0 0 0 1 1
## [10,] 0 1 1 0 0 0 0
sum(x)
## [1] 20
rowSums(x)
## [1] 2 2 2 2 2 2 2 2 2 2
colSums(x)
## [1] 2 4 3 2 3 3 3
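For this particular case the rejection loop can be narrowed: 10 rows with at least two 1s each already use up all 20 ones, so every valid matrix has exactly two 1s per row. Drawing each row with exactly two 1s and rejecting only on the column condition should therefore need far fewer attempts. A sketch under that observation; `matbuilder_rows` and its argument names are hypothetical, not part of the answer above:

```r
set.seed(42)

# Hypothetical variant: fix the number of 1s per row, reject on columns only
matbuilder_rows <- function(n_row = 10, n_col = 7, per_row = 2, colmin = 2) {
  repeat {
    # each replicate is one shuffled row; replicate() returns them as columns
    trial <- t(replicate(n_row,
                         sample(rep(c(1, 0), c(per_row, n_col - per_row)))))
    if (all(colSums(trial) >= colmin)) return(trial)
  }
}

x2 <- matbuilder_rows()
sum(x2)       # 20
rowSums(x2)   # all 2
colSums(x2)   # each at least 2
```

This shortcut only applies when the total equals the number of rows times the row minimum; for a range of row totals the general version is still needed.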
Or, to give a range of acceptable row/column totals...
matbuilder <- function(n,nrow,ncol,rowmin,rowmax,colmin,colmax,ntimeout=100000) {
  finished <- FALSE
  i <- 1
  trial <- NA
  while(!finished) {
    trial <- matrix(sample(c(rep(1,n),rep(0,nrow*ncol-n))),nrow=nrow,ncol=ncol)
    rs <- rowSums(trial)
    cs <- colSums(trial)
    if(all(rs>=rowmin) && all(rs<=rowmax) && all(cs>=colmin) && all(cs<=colmax)) {
      finished <- TRUE
    } else if(i>=ntimeout) {
      # give up: return NA rather than a matrix that violates the constraints
      finished <- TRUE
      trial <- NA
      cat("sorry boss, timeout.\n")
    }
    i <- i+1
  }
  return(trial)
}
x <- matbuilder(25,10,7,rowmin=2,rowmax=3,colmin=2,colmax=4)
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 1 0 0 0 0 1 1
## [2,] 0 1 1 0 0 0 1
## [3,] 1 0 0 0 1 0 0
## [4,] 1 0 1 1 0 0 0
## [5,] 1 0 0 0 0 1 1
## [6,] 0 1 1 1 0 0 0
## [7,] 0 0 0 1 0 0 1
## [8,] 0 0 1 0 1 0 0
## [9,] 0 0 0 1 1 0 0
## [10,] 0 0 0 0 1 1 0
sum(x)
## [1] 25
rowSums(x)
## [1] 3 3 2 3 3 3 2 2 2 2
colSums(x)
## [1] 4 2 4 4 4 3 4
This one involves sampling a matrix of indices such that each row index appears exactly twice and each column index appears at least twice.
set.seed(42)
m = matrix(rep(0, 70), nrow = 10)
#Sample rows 1-10 twice
rows = sample(c(1:10, 1:10))
#Sample columns 1-7 twice and additional 6 to make 20
columns = sample(c(sample(1:7, 6, replace = TRUE), 1:7, 1:7))
#Create a matrix of indices that should be 1
inds = cbind(rows, columns)
#Remove duplicates in inds if any (Refer: https://stackoverflow.com/q/44555420/7128934)
inds[,2] = replace(x = inds[,2],
list = duplicated(inds),
values = sample(x = columns[!(columns %in% inds[,2][duplicated(inds)])],
size = 1))
m[inds] = 1
#Check
rowSums(m)
#[1] 2 2 2 2 2 2 2 2 2 2
colSums(m)
#[1] 4 2 2 3 2 2 5
sum(m)
#[1] 20
