Comparing similarity matrices over time in R

I've computed 4 different similarity matrices for the same items but over different time periods, and I'd like to compare how the similarity between items changes over time. The problem is that the order of the items, i.e. the matrix columns, is different for every matrix. How can I reorder the columns so that all my matrices become comparable?

Moving a comment to an answer:
For each matrix mx_j, create a vector of its column names:
cnj <- colnames(mx_j)
Then for any given pair of matrices j and k,
colmatch <- intersect(cnj, cnk)
will identify the common columns, and the analysis can be limited to that subset of names.
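A minimal sketch of the full reordering, assuming the four matrices are named mx_1 through mx_4 (hypothetical names) and have matching row and column names:

# columns (and rows) present in all four matrices
common <- Reduce(intersect, list(colnames(mx_1), colnames(mx_2),
                                 colnames(mx_3), colnames(mx_4)))

# subset and reorder each matrix to the shared item order
mx_1c <- mx_1[common, common]
mx_2c <- mx_2[common, common]
# ... likewise for mx_3 and mx_4

# the matrices are now directly comparable, e.g. the change between periods:
delta_12 <- mx_2c - mx_1c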

Related

Create different subsets of variables in a matrix with no repetition

I have a matrix of about 20 different variables. My goal is to create the different combinations of these variables (no repetition, each of at least 3 variables) and store each combination in a new data frame. I tried combn and expand.grid, but they take as input all the values belonging to each variable, while I want them to take only the name of the variable, not its values.
Thanks in advance
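A minimal sketch of one possible approach, assuming the variables are the columns of a matrix m (a hypothetical name): combn can operate on the column names rather than on the data itself.

# all combinations of column *names*, for every size from 3 up to ncol(m)
combos <- unlist(lapply(3:ncol(m),
                        function(k) combn(colnames(m), k, simplify = FALSE)),
                 recursive = FALSE)

# each element of combos is a character vector of variable names;
# use it to subset the original data when needed, e.g.:
# m[, combos[[1]]]

Note that with 20 variables the number of such combinations is large (over a million), so it may be worth generating subsets on demand rather than storing each one in its own data frame.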

Optimal merge of 2 unequal dataframes with spatial coordinates in R?

I need some help here. I'm trying to merge 2 dataframes (WIDE.2018 and WIDE.2015) with different numbers of columns and rows. Sorry, I can't share the data. Both have a similar set of columns with spatial coordinates (lon and lat). I'm trying to merge them by the best unique pairs (something like an optimal match), or by nearest neighbour with replace=F (borrowing MatchIt terminology). So far I've only managed a cbind that finds the closest distance but allows repeated observations.
As mentioned by Geoffrey (thanks!), I'm looking for the optimal 1:1 matching that minimizes the euclidean distance across all matches, ensuring that each point has only one match in the other data.frame (with some points left unmatched in the longer data.frame).
library(geosphere)
D <- distm(WIDE.2018[, c("lon", "lat")], WIDE.2015[, c("lon", "lat")])
m1 <- cbind(WIDE.2018, WIDE.2015[apply(D, 1, which.min), ])
Thanks in advance!
Based on this answer:
Best match between two sets of points
I think you are looking for the Hungarian Algorithm. Here is an implementation in R that claims to work with a rectangular matrix (e.g., unequal sample sizes).
https://rdrr.io/cran/RcppHungarian/man/HungarianSolver.html
I believe the weights in the required matrix would be the distances between points.
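A hedged sketch of how that might look here; the return format of HungarianSolver is an assumption to verify against the linked documentation (pairs is taken to be a two-column matrix of matched (row, column) indices, with 0 marking an unmatched row):

library(geosphere)
library(RcppHungarian)

# pairwise distances: rows = 2018 points, columns = 2015 points
D <- distm(WIDE.2018[, c("lon", "lat")], WIDE.2015[, c("lon", "lat")])

# minimum-cost 1:1 assignment over the rectangular cost matrix
sol <- HungarianSolver(D)

# keep only genuinely matched rows (assumed convention: 0 = unmatched)
matched <- sol$pairs[sol$pairs[, 2] > 0, , drop = FALSE]
m1 <- cbind(WIDE.2018[matched[, 1], ], WIDE.2015[matched[, 2], ])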

R code to generate random pairs of rows and do simulation

I have a data matrix in R with 45 rows. Each row represents the value of an individual sample. I need to run a trial simulation: I want to pair up samples randomly and calculate their differences, drawing a large number of samples (maybe 10000) from all the possible permutations and combinations.
This is how I've managed to do it so far:
My data matrix ("data") has 45 rows and 2 columns. I selected 45 rows at random and subtracted them from another 45 randomly chosen rows:
n1 <- data[sample(nrow(data), size = 45, replace = FALSE), ] -
      data[sample(nrow(data), size = 45, replace = FALSE), ]
This gave me a random set of differences of size 45.
I made 50 such vectors (n1 to n50) and did rbind, which gave me a big data matrix containing random differences.
Of course, many rows from the first random set were paired with the same row in the second set and cancelled out to zero. I removed those as follows:
row_sub <- apply(new, 1, function(row) all(row != 0))
new.remove.zero <- new[row_sub, ]
BUT, is there a cleaner way to do this? A simpler way to generate random pairs of rows, calculate their differences, and bind them together as a new matrix?
Thanks in advance.
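One cleaner approach, sketched under the assumption that the matrix is named data and that self-pairs (the same row drawn on both sides) should simply be dropped:

set.seed(1)                              # for reproducibility
n_pairs <- 10000

# draw the two index vectors in one step each
i <- sample(nrow(data), n_pairs, replace = TRUE)
j <- sample(nrow(data), n_pairs, replace = TRUE)
keep <- i != j                           # drop pairs that would cancel to zero

# one matrix of differences, with no rbind loop and no all-zero rows
diffs <- data[i[keep], ] - data[j[keep], ]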

Using data frame values to select columns of a different data frame

I'm relatively new to R, so excuse me if I'm not even posting this question the right way.
I have a matrix generated by the combn function:
double_expression_combinations <- combn(marker_column_vector, 2)
This matrix has 2 rows and x columns. Each column holds a pair of numbers that will be used as column indices into my main data frame, named initial; these pairs are the combinations of columns to be tested. The initial data frame has 27 columns (and thousands of rows) with values of 1 and 0. The test consists of using the 2 numbers given by each column of double_expression_combinations as column indices into initial, adding those 2 columns row by row, and counting how many times the sum equals 2.
I believe I can manage the counting part; I just don't know how to use the pairs from double_expression_combinations to select the columns to test from the initial data frame.
Edited to incorporate corrections made by commenters.
When using R it's important to keep your terminology precise: double_expression_combinations is not a dataframe but rather a matrix. It's easy to loop over the columns of a matrix with apply. I'm a bit unclear about the exact test, but this might succeed:
apply(double_expression_combinations, 2,   # the 2 selects each column in turn
      function(cols) sum(initial[, cols[1]] + initial[, cols[2]] == 2))
Both the '+' and '==' operators are vectorised so no additional loop is needed inside the call to sum.
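A quick toy check, with a small hypothetical 0/1 data frame standing in for initial:

set.seed(42)
initial <- as.data.frame(matrix(rbinom(40, 1, 0.5), ncol = 4))
double_expression_combinations <- combn(1:4, 2)

apply(double_expression_combinations, 2,
      function(cols) sum(initial[, cols[1]] + initial[, cols[2]] == 2))
# one count per column pair: how many rows have a 1 in both columns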

Selecting different elements of an R dataframe (one for each row, but possibly different columns) without using loops

Say I have a data.frame of arbitrary dimensions (n by p). I want to extract a vector of length n from that data.frame, one element in the vector per row in the data.frame. However, the column in which each element lies may vary by row. Is there a way to do this without loops?
For example, say I have the following 3x3 data frame, called DATA:
X Y Z
1 17 43
3 4 2
6 9 0
I want to extract one scalar value from DATA per row. I have a vector, call it column.list, equal to c(1,3,1) (arbitrarily chosen here), which gives the column index of each element I want: the kth element of column.list is the column index for row k of DATA. How do I do this without loops? I want to avoid loops because I am using this repeatedly in a simulation study that will take a lot of running time even without them, and the number of rows might be 100,000 or so. Much appreciated!
You can do this by indexing your data.frame with a two-column matrix: the first column gives the row, the second the column. So if you do
column.list <- c(1,3,1)
DATA[cbind(1:nrow(DATA), column.list)]
You will get
[1] 1 2 6
as desired. Note that if you mix columns of different classes, all the extracted values will be coerced to the most accommodating type.
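For instance, a small hypothetical illustration of that coercion, where one column is character:

DF <- data.frame(a = 1:3, b = c("x", "y", "z"))
DF[cbind(1:3, c(1, 2, 1))]
# [1] "1" "y" "3"    # the numeric values are coerced to character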
