Fit one matrix into another in R - r

I have a matrix A[72][36] and I would like to fit the values of A in a bigger matrix B[360][180].
I constructed this data frame linking the col/row index of A to the new 'grid'.
> head(INDEX)
LonNew LatNew LonINT LatINT
1 -179.5 -89.5 1 1
2 -178.5 -88.5 1 1
3 -177.5 -87.5 1 1
4 -176.5 -86.5 1 1
5 -175.5 -85.5 1 1
6 -174.5 -84.5 2 2
7 -173.5 -83.5 2 2
8 -172.5 -82.5 2 2
9 -171.5 -81.5 2 2
10 -170.5 -80.5 2 2
Then I calculated the corresponding values of the new Lat/Lon couples
NEWVar <- array(NA, dim = length(INDEX$LonNew))
for (j in 1:length(INDEX$LonINT) ){
NEWVar[j] <- A[INDEX$LonINT[j],INDEX$LatINT[j]]
}
> head(NEWVar)
3 3 3 3 3 4 4 4 4 4
The problem is then that I don't know how to create the new 360x180 matrix where for each couple (LonNew,LatNew) I have the corresponding NEWVar.
Can someone help me?

I've created a smaller, complete reproducible example. Here's the smaller matrix.
A<-matrix(1:4, nrow=2)
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
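And let's say you want to scale that up to a larger grid using this index. The index has 25 rows but only three distinct values per axis, so the resulting matrix is 3x3.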
INDEX<-data.frame(
LonNew = rep(c(-2,-2,0,2,2), each=5),
LatNew = rep(c(-2,-2,0,2,2), 5),
LonInt = rep(c(1,1,1,2,2), each=5),
LatInt = rep(c(1,1,2,1,2), 5)
)
The easiest way to turn the new values of Lat and Lon into array indexes is via factor variables. So I created
NNF <- factor(INDEX$LonNew)
TNF <- factor(INDEX$LatNew)
And I create the new B matrix with
B<-matrix(NA, nrow=nlevels(NNF), ncol=nlevels(TNF),
dimnames=list(levels(NNF), levels(TNF)))
And then I do the assignment with
B[cbind(NNF, TNF)] <- A[cbind(INDEX$LonInt, INDEX$LatInt)]
and that returns
# -2 0 2
# -2 1 3 3
# 0 1 3 3
# 2 2 4 4
which has scaled up the matrix according to the index data. The trick here is to index our matrices with two-column matrices, so we can grab a different row and column value for each element.
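For the original 72x36 to 360x180 case, the same matrix-indexing idea can be sketched as follows. This is a minimal sketch that assumes every cell of A covers a regular 5x5 block of the new grid; the names row_idx, col_idx and pairs are made up for the illustration, and A is filled with arbitrary stand-in values.

```r
# Stand-in for the 72 x 36 source matrix (values are arbitrary).
A <- matrix(seq_len(72 * 36), nrow = 72, ncol = 36)

# Assumption: each cell of A covers a 5 x 5 block of the 360 x 180 grid.
row_idx <- rep(seq_len(nrow(A)), each = 5)   # 1 1 1 1 1 2 2 2 2 2 ...
col_idx <- rep(seq_len(ncol(A)), each = 5)

# One (row, column) pair of A for every cell of B, in column-major order
# (expand.grid varies its first argument fastest).
pairs <- as.matrix(expand.grid(row = row_idx, col = col_idx))

# Index A with the two-column matrix and reshape the result.
B <- matrix(A[pairs], nrow = length(row_idx), ncol = length(col_idx))
dim(B)
```

When the blocks are regular like this, A[row_idx, col_idx] should give the same matrix in one step; the two-column index is only needed when the mapping is irregular, as in the INDEX data frame.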

Related

Crosschecking numbers of a matrix in R

I'm currently working with a large two-column matrix, and I want to check whether each row (a combination of the two columns) is also present in a loaded data frame, which has two columns as well.
Example,
(obj_design <- matrix(c(2,5,4,7,6,6,20,12,4,0), nrow = 5, ncol = 2))
[,1] [,2]
[1,] 2 6
[2,] 5 20
[3,] 4 12
[4,] 7 4
[5,] 6 0
(refined_grid <- data.frame(i=1:4, j=1:12))
i j
1 1 1
2 2 2
3 3 3
4 4 4
5 1 5
6 2 6
7 3 7
8 4 8
9 1 9
10 2 10
11 3 11
12 4 12
In this reproducible example, the rows (2,6) and (4,12) would be selected.
I'm wondering if there's a function that I can use to check the whole matrix, and see if a specific line is in the dataframe, and (if possible) write separately (new dataset) which elements of the matrix it is in.
Any assistance would be wonderful.
Here is an option with match
i1 <- match(do.call(paste, as.data.frame(obj_design)),
do.call(paste, refined_grid), nomatch = 0)
refined_grid[i1,]
This code tells you which rows of the matrix exist in the data frame.
which(paste(obj_design[,1], obj_design[,2]) %in%
paste(refined_grid$i, refined_grid$j)
)
Then you can just assign it to a vector!
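To also write the matching rows to a separate object, the pasted keys can be reused as a logical filter. This is a sketch on the example data from the question; key_design, key_grid, hit and matched are illustrative names, and comparing pasted keys assumes the values print identically in both objects (true for whole numbers, less safe for decimals).

```r
obj_design <- matrix(c(2, 5, 4, 7, 6, 6, 20, 12, 4, 0), nrow = 5, ncol = 2)
refined_grid <- data.frame(i = rep(1:4, times = 3), j = 1:12)

# One text key per row on each side.
key_design <- paste(obj_design[, 1], obj_design[, 2])
key_grid   <- paste(refined_grid$i, refined_grid$j)

# TRUE for every row of the matrix that also occurs in the data frame.
hit <- key_design %in% key_grid

which(hit)                                # positions of the matching rows
matched <- obj_design[hit, , drop = FALSE]  # the matching rows themselves
matched
```

drop = FALSE keeps the result a matrix even when only one row matches.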

Select unique values from a list of 3

I would like to list all unique combinations of vectors of length 3 where each element of the vector can range between 1 to 9.
First I list all such combinations:
df <- expand.grid(1:9, 1:9, 1:9)
Then I would like to remove the rows that contain repetitions.
For example:
1 1 9
9 1 1
1 9 1
should only be included once.
In other words if two lines have the same numbers and the same number of each number then it should only be included once.
Note that
8 8 8 or
9 9 9 is fine as long as it only appears once.
Based on your approach and the idea to remove repetitions:
df <- expand.grid(1:2, 1:2, 1:2)
# Var1 Var2 Var3
# 1 1 1 1
# 2 2 1 1
# 3 1 2 1
# 4 2 2 1
# 5 1 1 2
# 6 2 1 2
# 7 1 2 2
# 8 2 2 2
df2 <- unique(t(apply(df, 1, sort))) #class matrix
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 2 2
# [4,] 2 2 2
df2 <- as.data.frame(df2) #class data.frame
There are probably more efficient methods, but if I understand you correctly, that is the result you want.
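One possibly cheaper variant, sketched here for the full 1:9 case, avoids sorting every row: keep only the rows that are already in non-decreasing order, since each combination has exactly one such ordering. The names res and ref are illustrative, and the sort-based result is computed only as a cross-check.

```r
df <- expand.grid(1:9, 1:9, 1:9)

# Each unordered combination has exactly one non-decreasing arrangement,
# so filtering on it keeps one representative per combination.
res <- df[df$Var1 <= df$Var2 & df$Var2 <= df$Var3, ]

# Cross-check against the sort-and-unique approach.
ref <- unique(t(apply(df, 1, sort)))

nrow(res)
nrow(ref)
choose(9 + 3 - 1, 3)   # number of multisets of size 3 drawn from 9 values
```

All three counts should agree.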
Maybe something like this (your data frame is not large, so the row-wise apply calls are cheap):
len <- apply(df, 1, function(x) length(unique(x)))
key <- apply(df, 1, function(x) paste(sort(x), collapse = " "))
res <- rbind(df[len == 1, ], df[len > 1 & !duplicated(key), ])
Here is what is done:
Get the number of unique elements per row (len) and a key per row built from its sorted values (key).
The result comprises two parts:
First argument of rbind: Rows with a single unique value (e.g. 1 1 1, 7 7 7, etc) can only occur once, so they are included in the final result res as they are.
Second argument of rbind: For rows with two or three unique values (e.g. 1 1 9, 3 5 3, 5 8 7, etc), we keep only the first row per key, because 3 3 5, 3 5 3 and 5 3 3 all share the key "3 3 5". The row product is not a safe key here, since different combinations such as 1 1 4 and 1 2 2 have the same product.

Repeat vector to fill down column in data frame

Seems like this very simple maneuver used to work for me, and now it simply doesn't. A dummy version of the problem:
df <- data.frame(x = 1:5) # create simple dataframe
df
x
1 1
2 2
3 3
4 4
5 5
df$y <- c(1:5) # adding a new column with a vector of the exact same length. Works out like it should
df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
df$z <- c(1:4) # trying to add a new column, this time with a vector with fewer elements than there are rows in the dataframe.
Error in `$<-.data.frame`(`*tmp*`, "z", value = 1:4) :
replacement has 4 rows, data has 5
I was expecting this to work with the following result:
x y z
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 1
I.e. the shorter vector should just start repeating itself automatically. I'm pretty certain this used to work for me (it's in a script that I've been running a hundred times before without problems). Now I can't even get the above dummy example to work like I want to. What am I missing?
If the vector can be evenly recycled into the data.frame, you do not get an error or a warning:
df <- data.frame(x = 1:10)
df$z <- 1:5
This may be what you were experiencing before.
You can get your vector to fit as you mention with rep_len:
df$y <- rep_len(1:3, length.out=10)
This results in
df
x z y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 1
5 5 5 2
6 6 1 3
7 7 2 1
8 8 3 2
9 9 4 3
10 10 5 1
Note that in place of rep_len, you could use the more common rep function:
df$y <- rep(1:3,len=10)
From the help file for rep:
rep.int and rep_len are faster simplified versions for two common cases. They are not generic.
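Applied to the five-row example from the question, this can be sketched as below. Using nrow(df) as the target length is a choice made for this illustration, so the column keeps fitting if the number of rows changes.

```r
df <- data.frame(x = 1:5)

# Recycle 1:4 explicitly to the number of rows; the last element wraps
# around to the start of the vector.
df$z <- rep_len(1:4, nrow(df))
df
```

The explicit call makes the recycling visible in the script instead of relying on the assignment to do it silently.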
If the total number of rows is a multiple of the length of your new vector, it works fine. When it is not, it does not work everywhere. In particular, you have probably used this type of recycling with matrices:
data.frame(1:6, 1:3, 1:4) # not a multiple
# Error in data.frame(1:6, 1:3, 1:4) :
# arguments imply differing number of rows: 6, 3, 4
data.frame(1:6, 1:3) # a multiple
# X1.6 X1.3
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 1
# 5 5 2
# 6 6 3
cbind(1:6, 1:3, 1:4) # works even with not a multiple
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 1 4
# [5,] 5 2 1
# [6,] 6 3 2
# Warning message:
# In cbind(1:6, 1:3, 1:4) :
# number of rows of result is not a multiple of vector length (arg 3)

How to transform a list of user ratings into a matrix in R

I am working on a collaborative filtering problem, and I am having problems reshaping my raw data into a user-rating matrix. I am given a rating database with columns 'movie', 'user' and 'rating'. From this database, I would like to obtain a matrix of size #users x #movies, where each row indicates a user's ratings.
Here is a minimal working example:
# given this:
ratingDB <- data.frame(rbind(c(1,1,1),c(1,2,NA),c(1,3,0), c(2,1,1), c(2,2,1), c(2,3,0),
c(3,1,NA), c(3,2,NA), c(3,3,1)))
names(ratingDB) <- c('user', 'movie', 'liked')
#how do I get this?
userRating <- matrix(data = rbind(c(1,NA,0), c(1,1,0), c(NA,NA,1)), nrow=3)
I can solve the problem using two for loops, but this of course doesn't scale well. Can anybody help me with a vectorized solution?
This can be done without any loop. It works with the function matrix:
# sort the 'liked' values (this is not necessary for the example data)
vec <- with(ratingDB, liked[order(user, movie)])
# create a matrix
matrix(vec, nrow = length(unique(ratingDB$user)), byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 NA 0
[2,] 1 1 0
[3,] NA NA 1
This will transform the vector stored in ratingDB$liked to a matrix. The argument byrow = TRUE allows arranging the data in rows (the default is by columns).
Update: What to do if the NA cases are not in the data frame?
(see comment by @steffen)
First, remove the rows containing NA:
subDB <- ratingDB[complete.cases(ratingDB), ]
user movie liked
1 1 1 1
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 0
9 3 3 1
The full data frame can be reconstructed. The function expand.grid is used to generate all combinations of user and movie:
full <- setNames(with(subDB, expand.grid(sort(unique(user)), sort(unique(movie)))),
c("user", "movie"))
user movie
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
Now, the information of the sub data frame subDB and the full combination data frame full can be combined with the merge function:
ratingDB_2 <- merge(full, subDB, all = TRUE)
user movie liked
1 1 1 1
2 1 2 NA
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 0
7 3 1 NA
8 3 2 NA
9 3 3 1
The result is identical to the original data frame ratingDB. Hence, the same procedure can be applied to transform it to a matrix of liked values:
matrix(ratingDB_2$liked, nrow = length(unique(ratingDB_2$user)), byrow = TRUE)
[,1] [,2] [,3]
[1,] 1 NA 0
[2,] 1 1 0
[3,] NA NA 1
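A possible shortcut that skips the expand.grid and merge steps is to pre-fill the matrix with NA and assign through a two-column index. This is a sketch, not part of the answer above, and it assumes that user and movie are integer IDs running from 1 upwards; n_user and n_movie are illustrative names.

```r
# Only the rated (user, movie) pairs, as in subDB above.
subDB <- data.frame(user  = c(1, 1, 2, 2, 2, 3),
                    movie = c(1, 3, 1, 2, 3, 3),
                    liked = c(1, 0, 1, 1, 0, 1))

# Assumption: IDs are 1..n, so they can be used directly as positions.
n_user  <- max(subDB$user)
n_movie <- max(subDB$movie)

# Start with all ratings missing, then fill in the known ones.
userRating <- matrix(NA_real_, nrow = n_user, ncol = n_movie)
userRating[cbind(subDB$user, subDB$movie)] <- subDB$liked
userRating
```

If the IDs are not consecutive integers, they would first have to be mapped to positions, for example with match(user, sort(unique(user))).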

Count and label observations per participant using loop

I have repeated-measures data.
I need to create a loop that will incrementally count each observation, within a participant, and label it.
I am new to writing loops. My logic was to say, for each item in the list of unique ids, count each row in that, and apply some function to that row.
Could someone point out what I am doing wrong?
data$Ob <- 0
for (i in unique(data$id)) {
count <- 1
for (u in data[data$id == i,]) {
data[data$id ==u,]$Ob <- count
count <- count + 1
print(count)
}
}
Thanks!
Justin
You can also use ave:
set.seed(1)
data <- data.frame(id = sample(4, 10, TRUE))
data$Ob = ave(data$id, data$id, FUN=seq_along)
data
id Ob
1 2 1
2 2 2
3 3 1
4 4 1
5 1 1
6 4 2
7 4 3
8 3 2
9 3 3
10 1 2
# Generate some dummy data
data <- data.frame(Ob=0, id=sample(4,20,TRUE))
# Go through every id value
for(i in unique(data$id)){
# Label observations
data$Ob[data$id == i] = 1:sum(data$id == i)
}
Be aware though that for loops are notoriously slow in R. In this simple case they work fine, but should you have millions and millions of rows in your data frame you'd better do something purely vectorized.
But you don't need a loop...
data <- data.frame (id = sample (4, 10, TRUE))
## id
## 1 3
## 2 4
## 3 1
## 4 3
## 5 3
## 6 4
## 7 2
## 8 1
## 9 1
## 10 4
data$Ob [order (data$id)] <- sequence (table (data$id))
## id Ob
## 1 3 1
## 2 4 1
## 3 1 1
## 4 3 2
## 5 3 3
## 6 4 2
## 7 2 1
## 8 1 2
## 9 1 3
## 10 4 3
(works also with character or factor IDs)
(isn't R just cool!?)
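With character IDs, a small adjustment to the ave call is worth sketching: ave assigns the results back into a copy of its first argument, so passing the character ids themselves would coerce the counts to character. Passing the row positions instead keeps them as integers. The example data and the name ob_seq are made up for this illustration.

```r
data <- data.frame(id = c("b", "a", "b", "c", "a", "b"),
                   stringsAsFactors = FALSE)

# Count within each id; the first argument only supplies an integer
# vector of the right length to hold the result.
data$Ob <- ave(seq_along(data$id), data$id, FUN = seq_along)

# The order/sequence approach on the same data, for comparison.
ob_seq <- integer(nrow(data))
ob_seq[order(data$id)] <- sequence(table(data$id))

data
all(ob_seq == data$Ob)
```

Both approaches number the observations in their order of appearance within each id.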
