How is it possible to make a matrix of arrays in R?

What I want is a matrix in which each element is itself an array.
The array is obtained by subsetting a data frame, but the example can be generalized to any array.
I tried:
My_matrix <- matrix(array(), nrow = NROW, ncol = NCOL)
for (i in 1:NROW){
for(j in 1:NCOL){
My_matrix[i,j] <- df[df$var1 == j & df$var2== i,]$var3
}
}
but I got this error message:
Error in My_matrix[i,j] <- df[df$var1== j & df$var2== i,]$var3 :
number of items to replace is not a multiple of replacement length
How should I define and access each element of the matrix and each element of the contained array?

I think I understand that: (1) the base array is 45x3; (2) each cell has a differently sized matrix; and (3) this is not known a priori. Gotcha. Not possible with an atomic array. An array (matrix) is always perfectly dimensioned, and while you can dynamically change one or more of the dimensions, the change applies to all cells.
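A minimal sketch of the limitation described above, using a small hypothetical matrix `m`: each cell of an atomic matrix holds exactly one value, so assigning a longer vector to a single cell fails with the same error as in the question.

```r
# Hypothetical 2 x 2 numeric matrix; every cell holds exactly one value
m <- matrix(NA_real_, nrow = 2, ncol = 2)
m[1, 1] <- 5  # fine: one value into one cell

# Trying to store three values in one cell reproduces the question's error
msg <- tryCatch({
  m[1, 2] <- c(1, 2, 3)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```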
Alternative: list-columns.
dat <- data.frame(x=1:3, y=11:13)
dat$z <- lapply(3:5, function(i) matrix(seq_len(i^2), nrow = i))
dat
# x y
# 1 1 11
# 2 2 12
# 3 3 13
# z
# 1 1, 2, 3, 4, 5, 6, 7, 8, 9
# 2 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
# 3 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
That doesn't look very appealing, but if you want a different presentation, you might consider converting it to a tibble::tbl_df (also available whenever dplyr is loaded). (Note that presentation is distinct from storage and accessibility.)
library(tibble)
as_tibble(dat)
# # A tibble: 3 x 3
# x y z
# <int> <int> <list>
# 1 1 11 <int[,3] [3 x 3]>
# 2 2 12 <int[,4] [4 x 4]>
# 3 3 13 <int[,5] [5 x 5]>
Subsetting is consistent:
dat$z[ dat$x == 2 & dat$y == 12 ]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
### note that you need an extra [[1]] to get to the real data
m <- dat$z[ dat$x == 2 & dat$y == 12 ][[1]]
m
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
m[3,4]
# [1] 15
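The same idea applied to the question's loop: a minimal sketch, assuming a hypothetical `df` with columns `var1`, `var2` and `var3` as in the question (the values below are made up). Instead of a matrix, each (i, j) combination becomes one row, and the matching `var3` values are kept whole in a list-column.

```r
# Hypothetical data shaped like the question's df
df <- data.frame(
  var1 = c(1, 1, 2, 2, 2, 1),
  var2 = c(1, 1, 1, 2, 2, 2),
  var3 = c(10, 11, 20, 30, 31, 40)
)

# One row per (i, j) "cell": i plays the role of var2, j of var1
cells <- expand.grid(i = 1:2, j = 1:2)

# Same subsetting as the question's loop, but each result is stored in a
# list-column, so different cells can hold vectors of different lengths
cells$vals <- mapply(function(i, j) df$var3[df$var1 == j & df$var2 == i],
                     cells$i, cells$j, SIMPLIFY = FALSE)

# Access the "cell" (i = 2, j = 2), then an element inside it
v <- cells$vals[cells$i == 2 & cells$j == 2][[1]]
v[2]
```

As in the answer above, the extra `[[1]]` is what turns the one-element list into the vector itself.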

Related

How to repeat a data list with two vectors in R

I have a list X with two vectors:
X[1]=(1,2,3,5,6,9,7,8)
X[2]=(2,3,4,5,6)
I want to get a new list Y:
Y[1]=(1,2,3,5,6,9,7,8,1,2,3,5,6,9,7,8), i.e. X[1] repeated
Y[2]=(2,3,4,5,6,2,3,4,5,6), i.e. X[2] repeated
I used Y <- rep(X, 2) but got:
Y[1]:(1,2,3,5,6,9,7,8)
Y[2]:(2,3,4,5,6)
Y[3]:(1,2,3,5,6,9,7,8)
Y[4]:(2,3,4,5,6)
How do I do it right? Many thanks.
Use sapply/lapply:
sapply(X, rep, 2)
#[[1]]
# [1] 1 2 3 5 6 9 7 8 1 2 3 5 6 9 7 8
#[[2]]
# [1] 2 3 4 5 6 2 3 4 5 6
data
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))
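One caveat with sapply(): it returns a list here only because the two repeated vectors have different lengths. When they happen to have the same length, sapply() simplifies the result to a matrix, while lapply() always returns a list. A small sketch with a hypothetical `X_equal`:

```r
# Hypothetical list whose vectors have equal length
X_equal <- list(c(1, 2, 3), c(4, 5, 6))

s <- sapply(X_equal, rep, 2)  # simplified: a 6 x 2 matrix, one column per vector
l <- lapply(X_equal, rep, 2)  # always a list of two vectors

dim(s)
l[[1]]
```

If the result must always be a list, lapply() (or purrr::map()) is the safer choice.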
You are having problems accessing the list elements: use [[1]] etc. X[1] returns a one-element sub-list, whereas X[[1]] returns the vector stored in it.
X <- list( c(1,2,3,5,6,9,7,8),
c(2,3,4,5,6))
Y = list(rep(X[[1]], 2),
rep(X[[2]], 2))
# R > Y
# [[1]]
# [1] 1 2 3 5 6 9 7 8 1 2 3 5 6 9 7 8
#
# [[2]]
# [1] 2 3 4 5 6 2 3 4 5 6
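A short sketch of the difference between `[` and `[[` on the question's list, plus the same fix written as a loop so it works for any number of vectors (the loop is an illustration, not part of the original answer):

```r
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))

class(X[1])    # "list": single brackets return a sub-list
class(X[[1]])  # "numeric": double brackets return the vector itself

# rep() on the whole list repeats the list's elements, hence 4 elements
length(rep(X, 2))

# Repeat the vector inside each element instead
Y <- vector("list", length(X))
for (k in seq_along(X)) {
  Y[[k]] <- rep(X[[k]], 2)
}
Y
```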
Using map() from purrr:
library(purrr)
map(X, rep, 2)
data
X <- list(c(1, 2, 3, 5, 6, 9, 7, 8), c(2, 3, 4, 5, 6))

How do I operate basic functions within a column in R?

I apologize for the poor phrasing of this question; I am still a beginner in R and still getting used to the proper terminology. I have provided sample data below:
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y=c(10, 12, 15, 19, 24))
View(mydata)
My intention is to find the x speed, and for this I would need to find the difference between 1 and 2, 2 and 7, 7 and 19, and so on. How would I do this?
You can use the diff function.
> diffs <- as.data.frame(diff(as.matrix(mydata)))
> diffs
x y
1 1 2
2 5 3
3 12 4
4 26 5
> mean(diffs$x)
[1] 11
You can use dplyr::lead() or dplyr::lag(), depending on how you want the calculations to line up:
library(dplyr)
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y=c(10, 12, 15, 19, 24))
View(mydata)
mydata %>%
mutate(x_speed_diff_lead = lead(x) - x
, x_speed_diff_lag = x - lag(x))
# x y x_speed_diff_lead x_speed_diff_lag
# 1 1 10 1 NA
# 2 2 12 5 1
# 3 7 15 12 5
# 4 19 19 26 12
# 5 45 24 NA 26
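If you would rather not load dplyr, the same two alignments can be sketched in base R by padding diff() with an NA at the end (lead-style) or at the start (lag-style); the column names simply mirror the dplyr example:

```r
mydata <- data.frame(x = c(1, 2, 7, 19, 45), y = c(10, 12, 15, 19, 24))

# diff() returns one value fewer than its input, so pad with NA to realign
mydata$x_speed_diff_lead <- c(diff(mydata$x), NA)  # next x minus current x
mydata$x_speed_diff_lag  <- c(NA, diff(mydata$x))  # current x minus previous x
mydata
```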

New variable with values depending on combination of other variables

I'm very inexperienced in R, and although this site has been tremendously helpful, I have a very specific situation and cannot find a solution. I imagine I need to write a function to accomplish this. However, my current time frame does not allow me to spend time on trial and error. (I apologize in advance for anything unclear.)
Here is an example of my current data:
UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num
1, 9, 5, 6, 1
1, 9, 7, 5, 2
2, 4, 3, 4, 1
2, 4, 5, 6, 2
3, 7, 4, 7, 1
3, 7, 6, 5, 2
I want to create a new variable: Time2.Feel1, which consists of the values of either Time2.Feel1.1 OR Time2.Feel1.2, depending on the value of Time2Num.
So, this:
UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num, Time2.Feel1
1, 9, 5, 6, 1, 5
1, 9, 7, 5, 2, 5
2, 4, 3, 4, 1, 3
2, 4, 5, 6, 2, 6
3, 7, 4, 7, 1, 4
3, 7, 6, 5, 2, 5
I need to do this 30 times (i.e., Time2Num takes the values 1:30 and there are 30 different Time2.Feel1 variables: Time2.Feel1.1 to Time2.Feel1.30).
I then want to calculate a correlation between Time1.Feel1 and Time2.Feel1 for EACH UniqueID, creating a new data frame with the variables UniqueID and the new correlations. This part is less of a concern; I think I've figured out how to do that, but if the combined steps could be done more simply, I'd prefer that.
Thanks in advance!
To expound on @thelatemail's comment, you could do this:
dat <- read.csv(text="UniqueID, Time1.Feel1, Time2.Feel1.1, Time2.Feel1.2, Time2Num
1, 9, 5, 6, 1
1, 9, 7, 5, 2
2, 4, 3, 4, 1
2, 4, 5, 6, 2
3, 7, 4, 7, 1
3, 7, 6, 5, 2")
dat$Time2.Feel1 <- dat[c("Time2.Feel1.1","Time2.Feel1.2")][cbind(seq(nrow(dat)),dat$Time2Num)]
# UniqueID Time1.Feel1 Time2.Feel1.1 Time2.Feel1.2 Time2Num Time2.Feel1
# 1 1 9 5 6 1 5
# 2 1 9 7 5 2 5
# 3 2 4 3 4 1 3
# 4 2 4 5 6 2 6
# 5 3 7 4 7 1 4
# 6 3 7 6 5 2 5
Doing that 30 times isn't very efficient, so you could use a loop:
## creating some example data which I think matches your format
nr <- nrow(dat)
set.seed(1)
dat1 <- lapply(1:15, function(ii)
matrix(c(sample(1:9, nr * 2, replace = TRUE),
sample(1:2, nr, replace = TRUE)), nrow = nr,
dimnames = list(NULL, c(paste0('Time2.Feel1.', 1 + 2 * (ii - 1)),
paste0('Time2.Feel1.', 2 + 2 * (ii - 1)),
sprintf('Time%sNum', 2 + 2 * (ii - 1))))))
dat1 <- data.frame(do.call('cbind', dat1))
# Time2.Feel1.1 Time2.Feel1.2 Time2Num Time2.Feel1.3 Time2.Feel1.4 Time4Num
# 1 3 9 2 4 3 1
# 2 4 6 1 7 4 2
# 3 6 6 2 9 1 1
# 4 9 1 1 2 4 1
# 5 2 2 2 6 8 2
# 6 9 2 2 2 4 2
# Time2.Feel1.5 Time2.Feel1.6 Time6Num Time2.Feel1.7 Time2.Feel1.8 Time8Num
# 1 8 8 2 1 9 1
# 2 1 5 2 1 3 2
# 3 7 5 1 3 5 1
# 4 4 8 2 5 3 2
# 5 8 1 1 6 6 1
# 6 6 5 1 4 3 2
# Time2.Feel1.9 Time2.Feel1.10 Time10Num Time2.Feel1.11 Time2.Feel1.12 Time12Num
# 1 4 7 2 3 5 1
# 2 4 9 1 1 4 2
# 3 5 4 2 6 8 2
# 4 9 7 1 8 6 1
# 5 8 4 1 8 6 1
# 6 4 3 1 8 4 1
etc., etc.
So you can start here. First, make the input vectors:
xx holds the first "choice" columns: Time2.Feel1.1, Time2.Feel1.3, Time2.Feel1.5, etc.
yy holds the second "choice" columns: Time2.Feel1.2, Time2.Feel1.4, Time2.Feel1.6, etc.
zz holds the "decision" columns: Time2Num, Time4Num, Time6Num, etc.
nn holds the names of the result columns: Time2.Feel1, Time4.Feel1, Time6.Feel1, etc.
Then use mapply to do the indexing above as a 1-1 mapping over xx, yy and zz, which all have the same length.
n <- 30
xx <- paste0('Time2.Feel1.', seq(1, n - 1, by = 2))
yy <- paste0('Time2.Feel1.', seq(2, n, by = 2))
zz <- sprintf('Time%sNum', seq(2, n, by = 2))
nn <- sprintf('Time%s.Feel1', seq(2, n, by = 2))
res <- mapply(function(x, y, z) dat1[, c(x, y)][cbind(1:nr, dat1[, z])],
xx, yy, zz, SIMPLIFY = FALSE)
res <- `colnames<-`(do.call('cbind', res), nn)
# Time2.Feel1 Time4.Feel1 Time6.Feel1 Time8.Feel1 Time10.Feel1 Time12.Feel1
# [1,] 9 4 8 1 7 3
# [2,] 4 4 5 3 4 4
# [3,] 6 9 7 3 4 8
# [4,] 9 2 8 3 9 8
# [5,] 2 8 8 6 8 8
# [6,] 2 4 6 3 4 8
And then you can combine the results back. You would need to reorder the columns if their order is important to you.
## combine results into original data
cbind(dat1, res)
When searching for the error I received when trying the answer from @user12202013, I came across this solution using ifelse, found here: Conditional assignment of one variable to the value of one of two other variables
# NA (rather than "") keeps the result numeric when Time2Num is neither 1 nor 2
Time2.Feel1 <- ifelse(Time2Num == 1, Time2.Feel1.1,
                      ifelse(Time2Num == 2, Time2.Feel1.2, NA))
Although it is definitely not the most efficient solution, particularly because I need to nest it 30 times and I need to do it for 9 items, it solved my problem. A simpler answer is still welcome, though!
Thanks for your answers!
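For the 30-column case, the matrix-indexing answer above generalizes without any nesting: in each row, pick the column whose position is given by Time2Num. A minimal sketch on the question's example data, assuming the choice columns are named Time2.Feel1.1, Time2.Feel1.2, ... in order and that Time2Num gives the position of the column to use:

```r
dat <- data.frame(
  UniqueID      = c(1, 1, 2, 2, 3, 3),
  Time1.Feel1   = c(9, 9, 4, 4, 7, 7),
  Time2.Feel1.1 = c(5, 7, 3, 5, 4, 6),
  Time2.Feel1.2 = c(6, 5, 4, 6, 7, 5),
  Time2Num      = c(1, 2, 1, 2, 1, 2)
)

# Choice columns in order; with the full data this would use 1:30
choice_cols <- paste0("Time2.Feel1.", 1:2)

# A two-column index matrix (row, column) picks exactly one cell per row
idx <- cbind(seq_len(nrow(dat)), dat$Time2Num)
dat$Time2.Feel1 <- as.matrix(dat[choice_cols])[idx]
dat$Time2.Feel1
```

Only `choice_cols` changes when there are 30 choice columns; the indexing line stays the same.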
You want to do something like:
Time2.Feel1 <- rep(NA, length(Time2Num))
pick1 <- which(Time2Num == 1)  # which() drops NAs in Time2Num
pick2 <- which(Time2Num == 2)
Time2.Feel1[pick1] <- Time2.Feel1.1[pick1]  # subset both sides by the same rows
Time2.Feel1[pick2] <- Time2.Feel1.2[pick2]
This says to create a vector called Time2.Feel1, which we initialize with NA values. Then, where Time2Num is 1, we fill in the values from Time2.Feel1.1, and where Time2Num is 2, we fill in the values from Time2.Feel1.2; the right-hand side is subset by the same rows, so the lengths match. If there is any place where Time2Num is neither 1 nor 2, Time2.Feel1 will have an NA value.
Edit:
I'm not sure what the error message is referring to, since I am able to do this:
# reproducible example
set.seed(1)
A <- letters
B <- sample(c(0, 1, NA), 26, TRUE)
A[B == 1] <- '5' # assignment where subscript contains NAs
A[B == 0] <- NA # assigning NA values
A
[1] NA "5" "5" "d" NA "f" "g" "5" "5" NA NA NA "m" "5" "o" "5" "q" "r" "5" "t" "u" NA "5" NA NA "5"
I would need to see more complete code to know what is causing the error.

Map numbers to smallest in a vector of numbers in R

Given a vector of numbers, I'd like to map each one to the smallest number in a separate vector that it does not exceed. For example:
# Given these
v1 <- 1:10
v2 <- c(2, 5, 11)
# I'd like to return
result <- c(2, 2, 5, 5, 5, 11, 11, 11, 11, 11)
Try
cut(v1, c(0, v2), labels = v2)
[1] 2 2 5 5 5 11 11 11 11 11
Levels: 2 5 11
which can be converted to a numeric vector using as.numeric(as.character(...)).
Another way (thanks for the edit, @Ananda):
v2[findInterval(v1, v2 + 1) + 1]
# [1] 2 2 5 5 5 11 11 11 11 11
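Note that the `v2 + 1` trick relies on v1 holding integers: a value such as 5.2 would be mapped to 5 even though it exceeds 5. A sketch that avoids the trick uses findInterval()'s `left.open = TRUE` argument, so each interval is (v2[i], v2[i + 1]]; values larger than max(v2) come out as NA:

```r
v1 <- c(1:10, 4.5, 5.2)
v2 <- c(2, 5, 11)  # must be sorted in increasing order

# With left.open = TRUE, findInterval() counts the v2 values strictly below
# each x, so adding 1 gives the index of the smallest v2 value >= x
result <- v2[findInterval(v1, v2, left.open = TRUE) + 1]
result
```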

Extract a numeric vector from data frame in R

I have a data.frame like the following example. I want to write a single function in R that does two tasks: first, extract the values that are the same in x and y and save them as a numeric vector; second, return the remaining rows as a data frame.
d = data.frame(x = c(1,7, 2, 9, 11),y=c(6, 7, 8, 9,10))
v = c(7, 9)
w = data.frame(x=c(1, 2, 11), y=c(6, 8, 10))
My desired result is as follows:
> result
$v
[1] 7 9
$w
x y
1 1 6
2 2 8
3 11 10
Maybe with() is what you want?
with(d, list(v = x[x==y] ,w=d[x!=y,]))
$v
[1] 7 9
$w
x y
1 1 6
3 2 8
5 11 10
Something along these lines should do this too:
splitdf <- function(df) {
if (ncol(df) != 2) stop("df must have 2 columns")
ind <- do.call("==", df)
list(v = df[ind, 1], w = df[!ind, ])
}
d <- data.frame(x = c(1, 7, 2, 9, 11), y = c(6, 7, 8, 9, 10))
splitdf(d)
## $v
## [1] 7 9
## $w
## x y
## 1 1 6
## 3 2 8
## 5 11 10
df <- data.frame(x = c(1, 7, 2, 9, 11), z = c(7, 8, 10, 9, 12))
splitdf(df)
## $v
## [1] 9
## $w
## x z
## 1 1 7
## 2 7 8
## 3 2 10
## 5 11 12
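Both answers keep the original row names (1, 3, 5), whereas the desired output numbers the rows of w as 1, 2, 3. A small variant of splitdf() that renumbers them, and that treats rows with an NA in either column as "not equal"; the name `splitdf_reset` is hypothetical:

```r
splitdf_reset <- function(df) {
  if (ncol(df) != 2) stop("df must have 2 columns")
  same <- df[[1]] == df[[2]]
  same[is.na(same)] <- FALSE        # rows containing an NA stay in w
  w <- df[!same, , drop = FALSE]
  rownames(w) <- NULL               # renumber rows 1..n as in the desired output
  list(v = df[[1]][same], w = w)
}

d <- data.frame(x = c(1, 7, 2, 9, 11), y = c(6, 7, 8, 9, 10))
result <- splitdf_reset(d)
result
```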
