I am trying to create an adjacency matrix M from a list pList containing the indices that must be set to 1 in M.
In the example below, M is a 5x4 matrix.
The list pList (called s in the example) contains 5 elements, one per row of M; each element is a vector of column indices.
Example:
s <- list("1210", c("254", "534"), "254", "534", "364")
M <- matrix(c(rep(0)),nrow = 5, ncol = length(unique(unlist(s))), dimnames=list(1:5,unique(unlist(s))))
Currently, my naive solution is a brute-force for loop over the rows of the matrix:
for (i in 1:nrow(M)){
M[i, as.character(s[[i]])] <- 1
}
The expected result is:
M
1210 254 534 364
1 1 0 0 0
2 0 1 1 0
3 0 1 0 0
4 0 0 1 0
5 0 0 0 1
The problem is that I have to work with matrices of several thousand rows, and the loop takes too much time.
I am not an "apply" expert, but I wonder if there is a quicker solution.
Thanks
Regards
We can convert the list to a two-column matrix of row/column indices and use that matrix to set the corresponding elements of 'M' to 1.
M[as.matrix(stack(setNames(s, seq_along(s)))[,2:1])] <- 1
M
# 1210 254 534 364
#1 1 0 0 0
#2 0 1 1 0
#3 0 1 0 0
#4 0 0 1 0
#5 0 0 0 1
Or, instead of using stack to convert to a data.frame, we can build the index matrix directly. unlist(s) gives the column index, and the row index comes from repeating each position in the list as many times as that element has entries (using lengths). We cbind the two and assign 1 to those elements of 'M'.
M[cbind(rep(seq_along(s), lengths(s)), unlist(s))] <- 1
Yet another option is to create a sparseMatrix:
library(Matrix)
Un1 <- unlist(s)
sparseMatrix(i = rep(seq_along(s), lengths(s)),
j=as.integer(factor(Un1, levels = unique(Un1))),
x=1)
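As a quick sanity check, here is a minimal sketch that builds the matrix both ways on the example data and confirms that the results agree. It uses match() to build a numeric column index instead of relying on character dimnames. That variation and the names cols, M_loop, M_vec and idx are illustrative, not taken from the answer above.

```r
# Example data from the question
s <- list("1210", c("254", "534"), "254", "534", "364")
cols <- unique(unlist(s))

# Empty 0-matrix with one row per list element and one column per unique index
M_loop <- matrix(0, nrow = length(s), ncol = length(cols),
                 dimnames = list(seq_along(s), cols))
M_vec <- M_loop

# Loop version from the question
for (i in seq_along(s)) {
  M_loop[i, s[[i]]] <- 1
}

# Vectorized version: a two-column (row, column) numeric index matrix
idx <- cbind(rep(seq_along(s), lengths(s)),   # row of each entry
             match(unlist(s), cols))          # column position of each entry
M_vec[idx] <- 1

identical(M_loop, M_vec)
```

Because the index matrix is numeric here, this should not depend on the row names being "1", "2", and so on.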
From a given dataframe:
# Create dataframe with 4 variables and 10 obs
set.seed(1)
df<-data.frame(replicate(4,sample(0:1,10,rep=TRUE)))
I would like to subtract every pair of columns, keeping only one subtraction per pair, i.e. column A - column B but not column B - column A, and so on.
What I have is very manual, and it becomes impractical when there are many variables.
# Result
df_result <- as.data.frame(list(df$X1-df$X2,
df$X1-df$X3,
df$X1-df$X4,
df$X2-df$X3,
df$X2-df$X4,
df$X3-df$X4))
Also, the name of each resulting column should describe the operation, e.g. x1_x2 for x1 - x2.
You can use combn:
COMBI = combn(colnames(df),2)
res = data.frame(apply(COMBI,2,function(i)df[,i[1]]-df[,i[2]]))
colnames(res) = apply(COMBI,2,paste0,collapse="minus")
head(res)
X1minusX2 X1minusX3 X1minusX4 X2minusX3 X2minusX4 X3minusX4
1 0 0 -1 0 -1 -1
2 1 1 0 0 -1 -1
3 0 0 0 0 0 0
4 0 0 -1 0 -1 -1
5 1 1 1 0 0 0
6 -1 0 0 1 1 0
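If the underscore naming from the question (x1_x2) is preferred over "minus", the same combn approach can be adapted. The following is a small sketch under that assumption. The names pairs and res are illustrative, and the column names keep the capital X that data.frame() generates.

```r
set.seed(1)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))

# Each column of 'pairs' holds one unordered pair of column names
pairs <- combn(colnames(df), 2)

# Subtract the second column of each pair from the first
res <- as.data.frame(apply(pairs, 2, function(p) df[[p[1]]] - df[[p[2]]]))

# Name each result column after its pair, joined by an underscore
colnames(res) <- apply(pairs, 2, paste, collapse = "_")

colnames(res)
```

Since combn only generates each pair once, A - B is computed but B - A is not.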
I have a data frame column
df$col1=(1,2,3,4,5,6,7,8,9,...,500000)
and a vector
vc<-c(1,2,4,5,7,8,10,...,499999)
If I compare the two vectors, the second one is missing some values. How can I insert 0s in the places of the missing values? For example, I want the second vector to become
vc<-c(1,2,0,4,5,0,7,8,0,10,...,499999,0)
You could use match and replace (thanks to #RonakShah)
Input
vc <- c(1,2,4,5,7,8,10)
x <- 1:15
Result
out <- replace(tmp <- vc[match(x, vc)], is.na(tmp), 0L)
out
# [1] 1 2 0 4 5 0 7 8 0 10 0 0 0 0 0
You could try using the larger vector containing all values as a template, and then assign zero to any value which does not match to the second smaller vector:
v_out <- df$col1
v_out[!(v_out %in% vc)] <- 0
v_out
[1] 1 2 0 4 5 0 7 8 0 10
Data
df$col1 <- c(1:10)
vc <- c(1,2,4,5,7,8,10)
A more cryptic, but maybe faster one-liner alternative (using Tim's data):
`[<-`(numeric(max(df$col1)),vc,vc)
#[1] 1 2 0 4 5 0 7 8 0 10
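To compare the three suggestions side by side, here is a minimal sketch on a small input. The names x, a, b and c3 are illustrative. Note that the one-liner treats the values of vc as their own positions, so it is assumed here that the full vector is exactly 1, 2, ..., n.

```r
x  <- 1:10                       # the complete sequence (df$col1 in the question)
vc <- c(1, 2, 4, 5, 7, 8, 10)    # the vector with gaps

# 1) match + replace: NA where x has no partner in vc, then NA -> 0
a <- replace(tmp <- vc[match(x, vc)], is.na(tmp), 0)

# 2) %in%: start from the full vector and zero out what is not in vc
b <- x
b[!(b %in% vc)] <- 0

# 3) positional assignment: a zero vector with vc written at positions vc
c3 <- `[<-`(numeric(max(x)), vc, vc)

isTRUE(all.equal(a, b)) && isTRUE(all.equal(b, c3))
```

If the full vector is not a plain 1..n sequence, the first two variants should still apply, while the third would need a position lookup such as match(vc, x).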
I have the following object, model$data:
model$data
[[1]]
Category1 Category2 Category3 Category4
3555 1 0 0 0
6447 1 0 0 0
5523 1 0 1 0
7550 1 0 1 0
6330 1 0 1 0
2451 1 0 0 0
4308 1 0 1 0
8917 0 0 0 0
4780 1 0 1 0
6802 1 0 1 0
2021 1 0 0 0
5792 1 0 1 0
5475 1 0 1 0
4198 1 0 0 0
223 1 0 1 0
4811 1 0 1 0
678 1 0 1 0
I am trying to use this formula to get an index of the column names:
sample(colnames(model$data), 1)
But I receive the following error message:
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
Is there a way to avoid that error?
Notice this?
model$data
[[1]]
The [[1]] means that model$data is a list, whose first component is a data frame. To do anything with it, you need to pass model$data[[1]] to your code, not model$data.
sample(colnames(model$data[[1]]), 1)
This seems to be a near-duplicate of Random rows in dataframes in R and should probably be closed as duplicate. But for completeness, adapting that answer to sampling column-indices is trivial:
you don't need to generate a vector of column-names, only their indices. Keep it simple.
sample your col-indices from 1:ncol(df) instead of 1:nrow(df)
then put those column-indices on the RHS of the comma in df[, ...]
df[, sample(ncol(df), 1)]
the 1 is because you apparently want to take a sample of size 1.
one minor complication is that your dataframe is model$data[[1]], since your model$data looks like a list with one element which is a dataframe, rather than a plain dataframe. So first, assign df <- model$data[[1]]
finally, if you really really want the sampled column-name(s) as well as their indices:
samp_col_idxs <- sample(ncol(df), 1)
samp_col_names <- colnames(df) [samp_col_idxs]
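The diagnosis can be reproduced with a small stand-in object. This is a sketch only: the model list below is made up to mimic the printed structure, not the asker's real data.

```r
# Hypothetical stand-in: a list whose $data element is itself a list
# holding one data frame, as the [[1]] in the printout suggests
model <- list(data = list(data.frame(
  Category1 = c(1, 1, 0),
  Category2 = c(0, 0, 0),
  Category3 = c(0, 1, 0),
  Category4 = c(0, 0, 0)
)))

# A plain list has no column names, so sample() gets nothing to draw from
colnames(model$data)             # NULL

# Extract the data frame first, then sample a column index and its name
df <- model$data[[1]]
set.seed(42)                     # only to make the draw repeatable
samp_col_idx  <- sample(ncol(df), 1)
samp_col_name <- colnames(df)[samp_col_idx]
samp_col_name
```

Since colnames(model$data) is NULL, sample() ends up calling sample.int(0, ...), which is consistent with the "invalid first argument" message in the question.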
I have a data frame and I want to create a boolean data frame from it. Every unique value of every column in the original data frame should become a column name in the boolean data frame. To illustrate with an example:
mydata =
sex route
m oral
f oral
m topical
f unknown
Then, I want to create
m f oral topical unknown
1 0 1 0 0
0 1 1 0 0
1 0 0 1 0
0 1 0 0 1
I am using the code below to create the boolean data frame. It works in R but not in Shiny. What could be the problem?
col_names=c()
for(i in seq(1,ncol(mydata))){
col_names=c(col_names,unique(mydata[i]))
}
col_names= as.vector(unlist(col_names))
my_boolean= data.frame(matrix(0, nrow = nrow(mydata), ncol = length(col_names)))
colnames( my_boolean)=col_names
for(i in seq(1,nrow(mydata))){
for(j in seq(1,ncol(mydata)))
{
my_boolean[i,which(mydata[i,j]==colnames(my_boolean))]=1
}}
There are a few ways you can do this, but I always find table the easiest to understand. Here's an approach with table:
do.call(cbind, lapply(mydata, function(x) table(1:nrow(mydata), x)))
## f m oral topical unknown
## 1 0 1 1 0 0
## 2 1 0 1 0 0
## 3 0 1 0 1 0
## 4 1 0 0 0 1
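Note that table sorts the values alphabetically, so the columns come out as f, m rather than m, f. If the order of first appearance from the question matters, the columns can be reordered afterwards. Here is a minimal sketch, assuming the columns are character vectors (hence stringsAsFactors = FALSE). The names out and wanted are illustrative.

```r
mydata <- data.frame(
  sex   = c("m", "f", "m", "f"),
  route = c("oral", "oral", "topical", "unknown"),
  stringsAsFactors = FALSE
)

# One indicator table per column, bound side by side
out <- do.call(cbind, lapply(mydata, function(x) table(seq_len(nrow(mydata)), x)))

# Unique values in order of first appearance, column by column
wanted <- unlist(lapply(mydata, unique), use.names = FALSE)

# Reorder the indicator columns to match the layout in the question
out <- out[, wanted]
out
```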
Using the cshapes package in R, I want to create a list of matrices that record, for each year, whether two countries are neighbors or not.
install.packages("cshapes")
library(cshapes)
Running the code for one year (here 1990) works fine:
wmat <- distmatrix(as.Date("1990-1-1"), type="mindist", tolerance=0.5, useGW=FALSE)
This gives a matrix with the following structure:
A B C D
1 A 0 0 210 0
2 B 0 0 637 305
3 C 210 637 0 73
4 D 0 305 73 0
In the next step, I set all pairs of countries with a distance of 0 to 1 and all other pairs to 0, then reset the diagonal to 0:
wmat[wmat>0]<-5
wmat[wmat==0]<-1
wmat[wmat==5]<-0
diag(wmat)<-0
This gives me the following matrix:
A B C D
1 A 0 1 0 1
2 B 1 0 0 0
3 C 0 0 0 0
4 D 1 0 0 0
What I struggle to do is to automatically create matrices for all the years between 1960 and 2014, do the corrections for each year and store the results into a list of matrices where I can recall each matrix by the respective year.
Any inputs are highly welcome.
You could try
lst <- lapply(1960:2014, function(x) {
wmat <- distmatrix(as.Date(paste0(x, '-1-1')),
type="mindist", tolerance=0.5, useGW=FALSE)
wmat[wmat>0]<-5
wmat[wmat==0]<-1
wmat[wmat==5]<-0
diag(wmat)<-0
wmat
}
)
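To recall each matrix by its year, the list can be given names. The sketch below shows the idea on the toy matrix from the question, so that it runs without cshapes. The helper to_adjacency and the reuse of one toy matrix for every year are assumptions made for illustration; in real use, the distmatrix() call for year y would take the place of wmat.

```r
# Toy stand-in for the distance matrix shown in the question
wmat <- matrix(c(  0,   0, 210,   0,
                   0,   0, 637, 305,
                 210, 637,   0,  73,
                   0, 305,  73,   0),
               nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Hypothetical helper: 1 where the distance is 0, else 0, with a zero diagonal
to_adjacency <- function(d) {
  adj <- (d == 0) * 1
  diag(adj) <- 0
  adj
}

years <- 1960:1962
# In real use, replace 'wmat' with the distmatrix() result for year y
lst <- setNames(lapply(years, function(y) to_adjacency(wmat)), years)

lst[["1961"]]
```

The (d == 0) * 1 recode does the same job as the three assignments with the temporary value 5, and the named list allows lookups such as lst[["1990"]] once the full range of years is used.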