R - Matching rows and columns of matrices with different length - r

My problem at the moment is the following. I have a directed 1-mode edgelist representing pairs of actors participating in joint projects in a certain year, which might look like:
projectleader projectpartner year
A B 2005
A C 2000
B A 2002
... ... ...
Now I need a subset for one particular year only. Not all actors are active in every year, so the dimensions of the subsets differ. For the subsequent network analysis I need a weighted and directed adjacency matrix, so I use the [network package] to create it. I first load the subset as a network object and then transform it into an adjacency matrix.
grants_00 <- subset(grants, (year_grant == 2000), select = c(projectpartner, projectleader))
nw_00 <- network(grants_00[,1:2], matrix="edgelist", directed=TRUE)
grants_00.adj <- as.matrix(nw_00, matrix.type = "adjacency")
The resulting matrix looks somewhat like
A B C E ...
A 0 1 1 0
B 1 0 0 0
...
So far so good. My problem now is this: for the further analysis I am planning, I need an adjacency matrix for every year with the same dimensions and the same order. That means all actors from the initial dataset have to appear as row and column names of the matrix for each year, but the matrix should only contain the pairs observed in that particular year. I hope my problem is clear. I appreciate any kind of constructive solution.
My idea at the moment is the following: I create one matrix from the initial dataset and one from the reduced dataset. Then I set all values of the full matrix to zero. Then I somehow match it with the reduced matrix and fill in the right values in the right rows and columns. Unfortunately, I have no clue how to do this.
Does anybody have an idea how to solve this problem?

Your question is not entirely clear, but I will try to answer.
If I understand correctly, you want the following:
Given a big and a small matrix, find the locations where they match.
First, I regenerate your data:
library(network)
N <- 20
grants <- data.frame(
projectleader = sample(x=LETTERS[1:20],size=N,replace = TRUE),
projectpartner = sample(x=LETTERS[1:20],size=N,replace = TRUE),
year_grant = sample(x=0:5 ,size=N,replace = TRUE) +2000
)
head(grants)
projectleader projectpartner year_grant
1 D K 2002
2 M M 2001
3 K L 2005
4 N Q 2002
5 G D 2003
6 I B 2004
Function to create the small matrix
##
adjency <- function(year){
grants_00 <- subset(grants, (year_grant==year),
select = c(projectpartner, projectleader))
nw_00 <- network(grants_00, matrix="edgelist", directed=TRUE)
grants_00.adj <- as.matrix(nw_00, matrix.type = "adjacency")
as.data.frame(grants_00.adj)
}
Use plyr to get a list with one matrix per year:
library(plyr)
years <- unique(grants$year_grant)
years <- years[order(years)]
# call the adjency() function defined above for each year
bigMatrix <- llply(as.list(years),.fun=adjency)
Create full matrix (The answer)
# create an empty matrix with NAs
population <- union(grants$projectpartner,grants$projectleader)
population_size <- length(population)
full_matrix <- matrix(rep(NA, population_size*population_size),
nrow=population_size)
rownames(full_matrix) <- colnames(full_matrix) <- population
find the location where they match
frn <- as.matrix(bigMatrix[[1]])
tmp <- match(rownames(frn), rownames(full_matrix))
tmp2 <- match(colnames(frn), colnames(full_matrix))
# do a merge
full_matrix[tmp,tmp2] <- frn
head(bigMatrix[[1]])
D I J K O Q S
D 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0
J 1 0 0 0 0 0 0
K 0 0 0 0 0 0 0
O 0 0 0 1 0 0 0
Q 0 1 0 0 0 0 0
the full matrix
K M L Q D B E J C S O F G N I A H
K 0 NA NA 0 0 NA NA 0 NA 0 0 NA NA NA 0 NA NA
M NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
L NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Q 0 NA NA 0 0 NA NA 0 NA 0 0 NA NA NA 1 NA NA
D 0 NA NA 0 0 NA NA 0 NA 0 0 NA NA NA 0 NA NA
B NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
E NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
J 0 NA NA 0 1 NA NA 0 NA 0 0 NA NA NA 0 NA NA
C NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
S 0 NA NA 1 0 NA NA 0 NA 0 0 NA NA NA 0 NA NA
O 1 NA NA 0 0 NA NA 0 NA 0 0 NA NA NA 0 NA NA
F NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
G NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
N NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
I 0 NA NA 0 0 NA NA 0 NA 0 0 NA NA NA 0 NA NA
A NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
H NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
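The same name-matching idea can be applied to every year at once. What follows is a minimal base R sketch, not the answerer's code. It assumes the column names projectleader, projectpartner and year_grant from the regenerated data. The toy edge list and the helper full_adjacency are made up for illustration. Cells are filled with 0 rather than NA because the question asked for zeros, and edges are counted with table() instead of going through the network package:

```r
# Hypothetical toy edge list with the same column names as `grants` above
grants <- data.frame(
  projectleader  = c("A", "A", "B", "C"),
  projectpartner = c("B", "C", "A", "A"),
  year_grant     = c(2005, 2000, 2002, 2000),
  stringsAsFactors = FALSE
)

# All actors that ever appear, in one fixed order shared by every year
actors <- sort(union(grants$projectleader, grants$projectpartner))

# Hypothetical helper: full-size weighted adjacency matrix for one subset
full_adjacency <- function(edges, actors) {
  m <- matrix(0, nrow = length(actors), ncol = length(actors),
              dimnames = list(actors, actors))
  # Fixing the factor levels forces table() to keep absent actors as zeros
  counts <- table(factor(edges$projectleader,  levels = actors),
                  factor(edges$projectpartner, levels = actors))
  m[] <- as.vector(counts)  # rows = leader, columns = partner
  m
}

# One matrix per year, all with identical dimensions and ordering
by_year <- lapply(split(grants, grants$year_grant), full_adjacency,
                  actors = actors)
by_year[["2000"]]
```

Because the factor levels are fixed up front, repeated leader/partner pairs within a year show up as weights greater than 1, and actors that are inactive in a year simply get all-zero rows and columns.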

Related

Nested loop in R with two level of variation

Hello dear stack overflow community,
Here is the context of my problem: I have a dataframe in which each column corresponds to one bat species and each row corresponds to the acoustic activity measured during one night (not every species was sampled on every night of recording).
eg :
> Dataset
Bba Ese Hsa Mda Mda.Mca Mema Mpu
1 3 NA NA NA 33 NA NA
2 NA NA NA NA 1 NA NA
3 2 4 1 NA 19 1 NA
4 NA NA NA NA 25 NA NA
5 NA NA NA NA 3 NA NA
6 1 1 NA NA 53 NA NA
7 1 NA 9 NA NA 1 NA
8 NA NA 10 NA NA NA NA
9 NA NA NA NA NA NA NA
10 1 1 NA NA NA NA NA
11 6 NA NA NA NA NA NA
12 12 NA 1 NA NA 1 NA
13 3 NA 2 NA NA 1 NA
14 1 NA NA NA NA NA NA
15 NA NA NA NA NA NA NA
16 1 NA NA NA NA NA NA
17 2 NA NA NA NA 2 NA
18 1 1 NA NA NA NA 1
19 NA NA NA NA NA NA NA
20 1 1 NA NA NA NA NA
21 2 NA 1 NA NA NA NA
22 1 NA NA NA NA 4 NA
23 1 NA 1 NA NA 1 NA
24 NA NA NA NA NA 2 NA
25 1 NA NA NA NA NA NA
26 1 NA NA NA NA 1 NA
27 1 NA NA NA NA NA NA
28 5 NA NA NA NA NA NA
29 NA NA NA NA NA NA NA
.....
To study vocal activity I am checking the quantile of bat vocal activity per species
apply(Dataset[,9:15],2,quantile, na.rm=TRUE, type=7, c(0.02,0.25,0.5,0.75,0.98))
Bba Ese Hsa Mda Mda.Mca Mema Mpu
2% 1.00 1.00 1.00 1.00 1.00 1.00 1
25% 1.00 1.00 2.00 2.00 2.00 1.00 1
50% 3.00 4.00 6.00 4.00 3.00 2.00 2
75% 9.75 12.00 18.00 12.00 20.00 4.00 6
98% 53.86 69.88 166.12 313.32 159.04 27.28 44
To test the impact of sampling effort (number of nights) on my quantile estimates, I want to do a bootstrap. More specifically, I want to calculate the mean bat activity when I take only 3 nights per species, using 1000 random samples with replacement. And I want to repeat this for every sample size from 3 to 70 nights.
This is what I have so far (for one species):
Bbana<-as.data.frame(Bbana)
L= length(Bbana[,1])
B= 1000
m<-list()
for (j in 3:70) {
for (i in 1 : B) {
idx<-sample(1:L, j, replace=TRUE)
data_idx<-Bbana[idx, ]
m[i]<-mean(data_idx)
}}
Somehow it didn't give me what I expected: 68 lists (one per sample size from 3 to 70), each with 1000 means of bat activity.
Could anyone help me ?
(I don't know if it's clear enough...)
Thanks in advance
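If you want to stick to loops and lists:
for (j in 3:70) {
# one row per bootstrap replicate, one column per species
mat = matrix(NA, nrow = B, ncol = ncol(Bbana))
for (i in 1 : B) {
idx<-sample(1:L, j, replace=TRUE)
# drop = FALSE keeps a data frame even if Bbana has a single column
data_idx<-Bbana[idx, , drop = FALSE]
mat[i,] = colMeans(data_idx, na.rm = TRUE)
}
# m[[1]] and m[[2]] stay empty because j starts at 3
m[[j]] = mat
}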
Otherwise, this option should work (and should be more efficient / convenient to use):
sample.fun = function(nb.nights, dataset){
# randomly select nb.nights rows, with replacement as in a bootstrap
selected.rows = sample(1:nrow(dataset), nb.nights, replace = TRUE)
# return a vector with the column means
return(colMeans(dataset[selected.rows, , drop = FALSE], na.rm = TRUE))
}
sapply(3:70, function(nights) replicate(1000, sample.fun(nights, Bbana), simplify = 'array'), simplify = FALSE)
This returns a list of 68 elements, one per sample size from 3 to 70, each holding the 1000 bootstrap means per species.
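To make the shape of the result concrete, here is a small self-contained sketch of the same bootstrap on made-up data. The data frame activity, the helper boot_means, and the reduced settings (sample sizes 3 to 6, 200 replicates) are assumptions chosen only so the example runs quickly; they are not from the question.

```r
set.seed(1)

# Hypothetical activity counts for two species, with unsampled nights as NA
activity <- data.frame(
  Bba = c(3, NA, 2, 1, 1, 6, 12, NA, 5, 1),
  Hsa = c(NA, 1, 9, 10, NA, 1, 2, 1, NA, 4)
)

# Hypothetical helper: B bootstrap means for one sample size
boot_means <- function(data, n_nights, B = 1000) {
  # replicate() returns species x B, so transpose to B x species
  t(replicate(B, {
    rows <- sample(nrow(data), n_nights, replace = TRUE)
    colMeans(data[rows, , drop = FALSE], na.rm = TRUE)
  }))
}

nights <- 3:6
res <- lapply(nights, function(n) boot_means(activity, n, B = 200))
names(res) <- nights

# One matrix per sample size: rows are replicates, columns are species
dim(res[["3"]])
```

A replicate in which every sampled night is NA for a species yields NaN for that species, so quantiles computed on the result may need na.rm = TRUE.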

Measure impact of store renovation on sales in R

I have a dataset that contains the sales of stores over the last few years, together with the year in which each shop was last renovated. My goal is to measure whether the renovation had an impact on sales after reopening, and how this impact evolved over the 4 years after the reopening.
My challenge is that the general trend in the dataset shows that all stores are losing about 2% of revenue per year. I therefore need to take that into account as well when measuring the effect.
My initial idea was to create dummies for each possible year of renovation, but this won't work given that I only have data for 35 shops. I therefore tried to create a variable counting the number of years since renovation, but I think I'm missing something:
library(data.table)
year_start = 2013
year_stop = 2017
n_years = year_stop - year_start+1
seed_sales = 100
year_decrease = 0.02
n_shops = 35
shops = paste0("Shop",seq(1,n_shops))
dt_sales <- data.table( Shop = sort(rep(shops, n_years)),
Year = rep(seq(year_start,year_stop), length(shops)),
Year_Renovation = round(rbinom(n_shops*n_years,1,0.3)*runif(1, year_start-10, year_stop))
)
dt_sales[, Sales := 100-(Year-year_start)*year_decrease*rnorm(n_shops*n_years,1)-ifelse(Year_Renovation==1,ifelse(Year-Year_Renovation<2,10,0)*rnorm(n_shops*n_years)+ifelse(Year-Year_Renovation>2,10*Year-Year_Renovation,0)*rnorm(n_shops*n_years),0)]
## Current thinking
dt_sales[, Is_renovated := ifelse(Year_Renovation == 0,0,1)]
dt_sales[Is_renovated==1 & Year-Year_Renovation>=0, Years_since_rennovation := Year-Year_Renovation]
lm = glm(Sales ~ Year + Is_renovated:Years_since_rennovation, data=dt_sales,family = gaussian(),na.action = na.omit)
summary(lm)
Output is:
(Intercept) 137.855325 9.679754 14.242 < 2e-16 ***
Year -0.018807 0.004803 -3.915 0.000279 ***
Years_since_rennovation NA NA NA NA
The yearly decline is captured, but the effect of renovation is apparently wrapped into the intercept, which goes up to 137 instead of 100 as I set it.
Where am I going wrong?
Thanks!
Stefano
What follows is an answer to your R question. If you have any questions about whether this is the proper modelling strategy, I would head to Cross Validated.
There are two problems. First, dt_sales$Years_since_rennovation is almost all NA:
dt_sales$Years_since_rennovation
[1] NA NA NA NA 2 NA NA NA 1 2 NA NA NA 1 2 NA NA NA 1 NA NA NA 0 1 NA
[26] NA NA 0 NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA 1 NA NA NA 0 1 NA
[51] NA NA NA NA NA NA NA NA NA NA NA NA 0 1 NA NA NA NA NA NA NA NA 0 NA NA
[76] NA NA NA 1 NA NA NA NA 1 NA NA NA 0 1 2 NA NA 0 NA NA NA NA NA NA NA
[101] NA NA 0 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA
[126] NA NA NA 1 2 NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA 0 NA 2
[151] NA NA NA NA NA NA NA NA NA NA NA NA 0 NA NA NA NA NA NA NA NA NA NA NA NA
Therefore, you see in the summary() output
(144 observations deleted due to missingness)
Second, if we examine dt_sales$Year and dt_sales$Years_since_rennovation for the remaining observations, we see there's perfect multicollinearity (Years_since_rennovation equals Year - 2015 in every remaining row):
dt_sales$Year[!is.na(dt_sales$Years_since_rennovation)] - 2015
# [1] 2 1 2 1 2 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 2 0 0 1 1 1 2 0 0 2 0
dt_sales$Years_since_rennovation[!is.na(dt_sales$Years_since_rennovation)]
# [1] 2 1 2 1 2 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 2 0 0 1 1 1 2 0 0 2 0
This makes it impossible for R to estimate both coefficients. So, R estimates the first coefficient and drops the second variable. If you would rather have R throw an error than silently drop the variable, pass singular.ok = FALSE (see help("glm")):
lm = glm(Sales ~ Year + Is_renovated:Years_since_rennovation, data=dt_sales,
family = gaussian(), na.action = na.omit, singular.ok = FALSE)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
singular fit encountered
As a side note, I'd avoid naming objects lm as that's also the name of the basic OLS function.
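The collinearity described above can be reproduced in isolation. The following is a minimal sketch on made-up data (the vectors year, since and sales are hypothetical, not the question's dt_sales): when one predictor is an exact linear function of another, the model fit reports NA for the later one.

```r
set.seed(42)

# Hypothetical data: `since` is an exact linear function of `year`
year  <- rep(2015:2017, times = 5)
since <- year - 2015
sales <- 100 - 2 * (year - 2015) + rnorm(length(year))

fit <- lm(sales ~ year + since)

# The coefficient of `since` cannot be identified and is reported as NA
coef(fit)
```

Breaking the exact dependence, for example by having shops renovated in different years, is what would let both effects be estimated.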

Why can't I change the contents of a column in R?

> data$Accepted.Final.round
[1] NA NA NA NA NA NA NA NA 1 NA NA NA NA 1 1 1 1 0 1 0 0 1 1 1 1 1 NA 1 1 1 1
[32] NA 1 1 0 1 1 1 1 1 1 NA 1 1 0 1 1 0 1 1 1 1 1 1 1 1 NA 1 NA NA NA NA
[63] NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[94] NA NA NA NA NA NA NA NA NA NA NA NA NA 1 NA NA NA NA NA
I have a dataset column that consists of NA, 1 and 0. However, when I try
data$Accepted.Final.round[data$Accepted.Final.round==NA]<-0
or
ifelse(data$Accepted.Final.round==1,1,0)
to replace NA with 0, neither line works.
Can you think of any way to fix this?
Use is.na() to determine if a value is NA. NA is contagious, meaning that doing operations with NA usually returns NA. That includes checking for equality with ==, i.e. x == NA will always return NA and not TRUE or FALSE.
x <- c(2, NA, 2)
x[is.na(x)] <- 0
The second attempt from OP was pretty close:
data$Accepted.Final.round <- ifelse(is.na(data$Accepted.Final.round),
0 ,data$Accepted.Final.round)
The documentation for ifelse explains:
Usages:
ifelse(test, yes, no)
yes will be evaluated if and only if any element of test is true, and
analogously for no.
Missing values (i.e. NA) in test give missing (NA) values in the result.
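A short sketch of the behaviour described above, using a made-up vector x rather than the question's column: comparing with == NA yields only NA, while is.na() gives a usable logical index.

```r
# Hypothetical vector standing in for data$Accepted.Final.round
x <- c(NA, 1, 0, NA, 1)

# Comparison with NA never yields TRUE or FALSE, only NA
cmp <- x == NA

# is.na() returns a proper logical vector that can drive the replacement
x_fixed <- replace(x, is.na(x), 0)
x_fixed
```

replace() returns a modified copy, so the result has to be assigned back to the column for the change to stick.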

convert a list of matrices into one large matrix in R

I have a list of matrices that have been extracted from a larger network using the egoextract function from R's 'network' package. I need to merge all the (square) matrices, which have different numbers of rows/columns into one larger matrix containing all the information of the previous ones and 'NA's for places where the information is missing (which is ok).
I tried cbindX function but it only does it along the column dimension and not the rows, so the product is a rectangular matrix.
This is what I've done so far
require(network)
require(statnet)
require(gdata)
samplenet <- as.network.numeric(100, directed = TRUE, density = 0.03)
plot(samplenet)
set.vertex.attribute(samplenet, "name", 1:100)
names <- get.vertex.attribute(samplenet, "name")
rv1 <- sample(names,1) #selects a random vertex
rv2 <- get.neighborhood(samplenet, rv1, type = c("combined"), na.omit = TRUE) #selects the neighborhood around selected vertex
rv <- unique(unlist(merge(rv1, rv2))) #combines vertex + neighborhood into one list without duplicates
extraction <- unique(ego.extract(samplenet, ego = rv, neighborhood = c("combined")))
df <- data.frame(extraction) # error due to different number of rows/columns
Alternative (leads to rectangular matrix, which is not sufficient)
df <- cbindX(extraction[[1]], extraction[[2]], extraction[[3]])
Thank you awesome R community!!!
Try this:
allnames <- unique(unlist(sapply(extraction, colnames)))
df <- do.call(rbind, lapply(extraction, function(mat) {
df <- data.frame(mat); colnames(df) <- colnames(mat)
df[setdiff(allnames, colnames(df))] <- NA; df
}))
head(df)
# 66 3 9 27 31 49 86 87 26 89 16 18 24 41 53 65 73 79 88 30 48 71 78 19 23 43 20 85 100 17 25 38 40 72 2 29 50 57
# 66 0 1 1 0 0 1 1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 3 0 0 0 0 1 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 9 0 1 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 27 1 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 31 1 0 0 0 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# 49 0 0 0 1 0 0 0 0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
If you want to order by column name:
head(df[, order(as.integer(colnames(df)))])
# 2 3 9 16 17 18 19 20 23 24 25 26 27 29 30 31 38 40 41 43 48 49 50 53 57 65 66 71 72 73 78 79 85 86 87 88 89 100
# 66 NA 1 1 NA NA NA NA NA NA NA NA NA 0 NA NA 0 NA NA NA NA NA 1 NA NA NA NA 0 NA NA NA NA NA NA 1 1 NA NA NA
# 3 NA 0 0 NA NA NA NA NA NA NA NA NA 0 NA NA 1 NA NA NA NA NA 0 NA NA NA NA 0 NA NA NA NA NA NA 0 0 NA NA NA
# 9 NA 1 0 NA NA NA NA NA NA NA NA NA 0 NA NA 0 NA NA NA NA NA 0 NA NA NA NA 0 NA NA NA NA NA NA 0 0 NA NA NA
# 27 NA 0 0 NA NA NA NA NA NA NA NA NA 0 NA NA 0 NA NA NA NA NA 0 NA NA NA NA 1 NA NA NA NA NA NA 0 0 NA NA NA
# 31 NA 0 0 NA NA NA NA NA NA NA NA NA 0 NA NA 0 NA NA NA NA NA 0 NA NA NA NA 1 NA NA NA NA NA NA 0 0 NA NA NA
# 49 NA 0 0 NA NA NA NA NA NA NA NA NA 1 NA NA 0 NA NA NA NA NA 0 NA NA NA NA 0 NA NA NA NA NA NA 0 0 NA NA NA
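If a single square matrix is wanted rather than the stacked data frame above, one possible approach is to allocate an NA matrix over the union of names and fill it block by block. This is a minimal sketch on two made-up named matrices (the list mats is hypothetical and stands in for extraction); it assumes every matrix carries row and column names, and where matrices overlap, later ones overwrite earlier ones.

```r
# Hypothetical square matrices with partly overlapping vertex names
mats <- list(
  matrix(c(0, 1, 1, 0), nrow = 2,
         dimnames = list(c("a", "b"), c("a", "b"))),
  matrix(c(0, 1, 0, 0, 0, 1, 1, 0, 0), nrow = 3,
         dimnames = list(c("b", "c", "d"), c("b", "c", "d")))
)

# Union of all vertex names, in order of first appearance
all_names <- unique(unlist(lapply(mats, rownames)))

# Square NA matrix covering every vertex
big <- matrix(NA_real_, nrow = length(all_names), ncol = length(all_names),
              dimnames = list(all_names, all_names))

# Copy each small matrix into its block, addressed by name
for (m in mats) {
  big[rownames(m), colnames(m)] <- m
}
big
```

Cells for vertex pairs that never appear together in any of the small matrices remain NA.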

How to populate matrix with values from another matrix in R?

How do you populate an empty matrix with the values of another matrix?
The empty matrix:
> m1 <- matrix(ncol=8, nrow=8)
> rownames(m1) <- c('a','b','c','d','e','f','g','h')
> colnames(m1) <- c('a','b','c','d','e','f','g','h')
> m1
a b c d e f g h
a NA NA NA NA NA NA NA NA
b NA NA NA NA NA NA NA NA
c NA NA NA NA NA NA NA NA
d NA NA NA NA NA NA NA NA
e NA NA NA NA NA NA NA NA
f NA NA NA NA NA NA NA NA
g NA NA NA NA NA NA NA NA
h NA NA NA NA NA NA NA NA
The matrix with the values to populate the empty matrix:
> m2 <- matrix(ncol=4, nrow=4)
> rownames(m2) <- c('b','e','h','x')
> colnames(m2) <- c('b','e','h','x')
> m2[,'b'] <- c(1,2,3,1)
> m2[,'e'] <- c(2,1,1,5)
> m2[,'h'] <- c(3,1,3,5)
> m2[,'x'] <- c(1,5,5,1)
> m2
b e h x
b 1 2 3 1
e 2 1 1 5
h 3 1 3 5
x 1 5 5 1
How do you merge the two matrices to get this result:
a b c d e f g h
a NA NA NA NA NA NA NA NA
b NA 1 NA NA 2 NA NA 3
c NA NA NA NA NA NA NA NA
d NA NA NA NA NA NA NA NA
e NA 2 NA NA 1 NA NA 1
f NA NA NA NA NA NA NA NA
g NA NA NA NA NA NA NA NA
h NA 3 NA NA 1 NA NA 3
Edit: added row/col x in m2, which is not in m1
Find the column (row) names that both matrices have in common
cols <- colnames(m1)[colnames(m1) %in% colnames(m2)]
rows <- rownames(m1)[rownames(m1) %in% rownames(m2)]
Then assign the appropriate values from m2 to m1
m1[rows, cols] <- m2[rows, cols]
m1
# a b c d e f g h
#a NA NA NA NA NA NA NA NA
#b NA 1 NA NA 2 NA NA 3
#c NA NA NA NA NA NA NA NA
#d NA NA NA NA NA NA NA NA
#e NA 2 NA NA 1 NA NA 1
#f NA NA NA NA NA NA NA NA
#g NA NA NA NA NA NA NA NA
#h NA 3 NA NA 1 NA NA 3
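The restriction to shared names matters because of the extra row/column x. A small sketch on reduced, made-up versions of m1 and m2 illustrates this: indexing m1 with a name it does not have raises an error, while intersect() (an equivalent way to get the common names) keeps only valid subscripts.

```r
# Hypothetical reduced versions of m1 and m2
m1 <- matrix(NA_real_, nrow = 3, ncol = 3,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
m2 <- matrix(1:4, nrow = 2, ncol = 2,
             dimnames = list(c("b", "x"), c("b", "x")))

# Using all of m2's names fails because "x" does not exist in m1
attempt <- tryCatch({
  m1[rownames(m2), colnames(m2)] <- m2
  "ok"
}, error = function(e) "error")

# Restricting to the shared names works
rows <- intersect(rownames(m1), rownames(m2))
cols <- intersect(colnames(m1), colnames(m2))
m1[rows, cols] <- m2[rows, cols]
m1
```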
