Joining two data frames of different lengths - r

I have a data frame with 25 weeks of sales data, and I have computed a lagged moving average. Say x <- c(1,2,3,4) and the moving average is y <- c(NaN,1,1.5,2,2.5).
If I use z <- data.frame(x,y), I get an error because the lengths do not match. Is there any way to join them into a data frame by inserting an NA value at the end of the x column?
Is the same thing possible when x is a data frame with n rows and m columns, and I want to append a column of length (n+1) to the right of it?

Yet another way of doing it
data.frame(x[1:length(y)], y)
If x is a data frame, you can use
data.frame(x[1:length(y), ], y)
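A minimal sketch of the data frame case, assuming a hypothetical two-column data frame x (the column names a and b are made up for illustration). Indexing rows past the end of a data frame returns rows of NA, which is what pads x to the length of y:

```r
# Hypothetical example data; the column names a and b are assumptions
x <- data.frame(a = 1:4, b = 5:8)
y <- c(NaN, 1, 1.5, 2, 2.5)

# Rows beyond nrow(x) come back filled with NA
padded <- x[seq_along(y), , drop = FALSE]
rownames(padded) <- NULL   # reset the row names created by the padding

z <- data.frame(padded, y = y)
z
```

This only pads when y is at least as long as x; if y were shorter, the same indexing would truncate x instead.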

You could do this
> lst <- list(x = x, y = y)
> m <- max(sapply(lst, length))
> as.data.frame(lapply(lst, function(x){ length(x) <- m; x }))
# x y
# 1 1 NaN
# 2 2 1.0
# 3 3 1.5
# 4 4 2.0
# 5 NA 2.5
In response to your comment, if x is a matrix and y is a vector, it would depend on the number of columns in x. But for this example
cbind(append(x, rep(NA, length(y)-length(x))), y)
If x has multiple columns, you could use some variety of
apply(x, 2, append, NA)
But again, it depends on what's in the columns and what's in y
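For the matrix case, one possible sketch (assuming x is a numeric matrix and y is at least as long as x has rows; n_missing and x_padded are names introduced here) is to bind a block of NA rows underneath x before attaching y:

```r
x <- matrix(1:8, ncol = 2)
y <- c(NaN, 1, 1.5, 2, 2.5)

# Number of rows needed to bring x up to length(y)
n_missing <- length(y) - nrow(x)

# Append n_missing rows of NA, then add y as an extra column
x_padded <- rbind(x, matrix(NA, nrow = n_missing, ncol = ncol(x)))
z <- cbind(x_padded, y)
z
```

Unlike apply(x, 2, append, NA), this pads by however many rows are missing rather than by exactly one.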

Maybe this also helps:
x<- 1:4
x1 <- matrix(1:8,ncol=2)
y <- c(NaN,1,1.5,2,2.5)
do.call(`merge`, c(list(x,y),by=0,all=TRUE))[,-1]
# x y
# 1 1 NaN
# 2 2 1.0
# 3 3 1.5
# 4 4 2.0
# 5 NA 2.5
do.call(`merge`, c(list(x1,y),by=0,all=TRUE))[,-1]
# V1 V2 y
#1 1 5 NaN
#2 2 6 1.0
#3 3 7 1.5
#4 4 8 2.0
#5 NA NA 2.5

Related

Correlation between two matrices of different dimensions

I'm very new to R. I have two matrices of different dimensions, C (3 rows, 79 columns) and T (3 rows, 215 columns). I want my code to calculate the Spearman correlation between the first column of C and all the columns of T and return the maximum correlation together with the indexes of the columns. Then the same for the second column of C and all the columns of T, and so on. In other words, I want to find the columns of the two matrices which are most correlated. I hope that is clear.
What I did was a nested for loop, but the result is not what I am looking for.
for (i in 1:79){
for(j in 1:215){
print(max(cor(C[,i],T[,j],method = c("spearman"))))
}
}
You don't have to loop over the columns.
x <- cor(C,T,method = c("spearman"))
out <- data.frame(MaxCorr = apply(x,1,max), T_ColIndex=apply(x,1,which.max),C_ColIndex=1:nrow(x))
head(out)
gives,
MaxCorr T_ColIndex C_ColIndex
1 1 8 1
2 1 1 2
3 1 2 3
4 1 1 4
5 1 11 5
6 1 4 6
Fake Data:
C <- matrix(rnorm(3*79),nrow=3)
T <- matrix(rnorm(3*215),nrow=3)
Maybe something like the function below can solve the problem.
pairwise_cor <- function(x, y, method = "spearman"){
ix <- seq_len(ncol(x))
iy <- seq_len(ncol(y))
t(sapply(ix, function(i){
m <- sapply(iy, function(j) cor(x[,i], y[,j], method = method))
setNames(c(i, which.max(m), max(m)), c("col_x", "col_y", "max"))
}))
}
set.seed(2021)
C <- matrix(rnorm(3*5), nrow=3)
T <- matrix(rnorm(3*7), nrow=3)
pairwise_cor(C, T)
# col_x col_y max
#[1,] 1 1 1.0
#[2,] 2 2 1.0
#[3,] 3 2 1.0
#[4,] 4 3 0.5
#[5,] 5 5 1.0
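One thing to keep in mind with only 3 rows: without ties in the data, a Spearman correlation can only be -1, -0.5, 0.5 or 1, so several columns of T will usually share the maximum and which.max reports only the first of them. A small sketch that lists every tied column (the fake data, the seed and the name ties are assumptions for illustration):

```r
set.seed(1)
C <- matrix(rnorm(3 * 5), nrow = 3)
T <- matrix(rnorm(3 * 7), nrow = 3)

# Rows of x correspond to columns of C, columns of x to columns of T
x <- cor(C, T, method = "spearman")

# For each column of C, all columns of T that reach the maximum
# (small tolerance to guard against floating point noise)
ties <- lapply(seq_len(nrow(x)), function(i) {
  which(x[i, ] >= max(x[i, ]) - 1e-8)
})
ties
```

With more rows per column, ties become much less likely and which.max alone is usually enough.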

Selection of argument within a function based on the comparison of two vectors

Given is a dataframe with the vectors x1 and y1:
x1 <- c(1,1,2,2,3,4)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
Also, I have a dataframe with the different values from the vector y1 and a corresponding probability:
y <- c(0,1,2)
p <- c(0.1,0.6,0.9)
df2 <- data.frame(y,p)
The following function compares a given probability (p) with a random number (runif(1)). Based on the result of the comparison, the value of df1$x1 is changed and stored in df1$x2 (a new random number has to be drawn for each value of x1):
example_function <- function(x,p){
if(runif(1) <= p) return(x + 1)
return(x)
}
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,example_function,0.5))
> df1$x2
[1] 2 1 3 2 3 5
Here is my problem: In the example above I chose 0.5 for the argument "p" (manually). Instead, I would like to select a probability p from df2 based on the values for y1 associated with x1 in df1. Accordingly, I want p in
df1$x2 <- unlist(lapply(df1$x1,example_function,p))
to be derived from df2.
For example, df1$x1[3], which is a 2, belongs to df1$y1[3], which is a 1. df2 shows that a 1 for y is associated with p = 0.6. In that case, the argument p for df1$x1[3] in "example_function" should be 0.6. How can this kind of lookup for the value p be integrated into the described function?
df1$x2 <- unlist(lapply(df1$x1,
function(z) {
example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z][1])
}))
df1
# x1 y1 x2
# 1 1 0 1
# 2 2 0 2
# 3 3 1 4
# 4 4 1 4
# 5 5 2 6
# 6 6 2 7
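There is no need to do anything complicated here. You can get what you want using vectorised expressions.
To pick the probability for each row given p and y1, simply subscript. Because R indexes from 1 and the values of y are 0, 1 and 2, shift the index by one (a plain p[y1] silently drops the zeros):
> p[y1 + 1]
[1] 0.1 0.1 0.6 0.6 0.9 0.9
and then compute x2 from x1, drawing one random number per element:
> ifelse(runif(length(x1)) <= p[y1 + 1], x1 + 1, x1)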
One way to solve the problem is working with "merge" and "mapply" instead of "lapply":
df_new <- merge(df1, df2, by.x = 'y1', by.y = 'y')
set.seed(123)
df1$x2 <- mapply(example_function,df1$x1,df_new$p)
> df1
x1 y1 x2
1 1 0 1
2 1 0 1
3 2 1 3
4 2 1 2
5 3 2 3
6 4 2 5
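Note that merge sorts its result by the key column, so df_new$p lines up with df1$x1 here only because df1 is already ordered by y1. A sketch that does not depend on row order, assuming every value of y1 appears exactly once in df2$y (p_per_row is a name introduced for illustration):

```r
x1 <- c(1, 1, 2, 2, 3, 4)
y1 <- c(0, 0, 1, 1, 2, 2)
df1 <- data.frame(x1, y1)
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

example_function <- function(x, p) {
  if (runif(1) <= p) return(x + 1)
  x
}

# match() gives, for each row of df1, the row of df2 holding its y value
p_per_row <- df2$p[match(df1$y1, df2$y)]

set.seed(123)
df1$x2 <- mapply(example_function, df1$x1, p_per_row)
df1
```

For the data above, p_per_row is identical to df_new$p, so both approaches pass the same probabilities to example_function.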

how to do calculation between a list and a matrix in r

I have a list and a Matrix as per below:
List Y:
$`1`
V1 V2
1 1 1
2 1 2
3 2 1
4 2 2
$`2`
V1 V2
5 5 5
6 11 2
$`3`
V1 V2
7 10 1
8 10 2
9 11 1
10 5 6
Matrix Z:
[,1][,2][,3][,4][,5][,6]
[1,] 2 1 5 5 10 1
I consider below as points1, points2 and points3 in Matrix Z respectively
points1 -(2,1)
[,1][,2]
[1,] 2 1
points2 - (5,5)
[,3][,4]
[1,] 5 5
points3 - (10,1)
[,5][,6]
[1,] 10 1
I want to calculate the sum of distances between all points in list Y[[1]] and points1, all points in List Y[[2]] and points2 and all points in List Y[[3]] and points 3 in r. How can I do this?
rowsums(|y-z|^2)
Based on the description,
Map(function(y, z) rowSums(abs(y - z[col(y)])^2),
Y, split(Z, as.numeric(gl(ncol(Z), 2, ncol(Z)))))
Try the following. It uses Map to apply a function to every vector of the two lists passed to Map. Note that we cannot simply do
Map('-', Y, Z2)
because R would do the subtractions columnwise, not row by row.
f <- function(x, y){
for(i in seq_len(nrow(x)))
x[i, ] <- x[i, ] - y
x
}
Z2 <- split(Z, rep(1:3, each = 2))
Map(f, Y, Z2)
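The snippets above return per-row differences or squared distances rather than a single total per group. If "distance" is taken to mean Euclidean distance (an assumption, since the question does not say), a sketch that returns one sum per list element could look like this; sum_dist is a hypothetical helper name:

```r
# Data reconstructed from the question
Y <- list(
  `1` = data.frame(V1 = c(1, 1, 2, 2), V2 = c(1, 2, 1, 2)),
  `2` = data.frame(V1 = c(5, 11), V2 = c(5, 2)),
  `3` = data.frame(V1 = c(10, 10, 11, 5), V2 = c(1, 2, 1, 6))
)
Z <- matrix(c(2, 1, 5, 5, 10, 1), nrow = 1)

# Split Z into the three reference points (2,1), (5,5), (10,1)
Z2 <- split(Z, rep(1:3, each = 2))

# Sum of Euclidean distances from every row of pts to the point ref
sum_dist <- function(pts, ref) {
  d <- sweep(as.matrix(pts), 2, ref)   # subtract ref from each row
  sum(sqrt(rowSums(d^2)))
}

res <- mapply(sum_dist, Y, Z2)
res
```

For squared distances instead, drop the sqrt() call inside sum_dist.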

How to calculate an element-wise quotient of two data frames?

> A <- data.frame(x = c(1,2,3), y = c(4,5,6), z = c(7,8,9))
> B <- data.frame(x = c(1,1,1), y = c(2,2,2), z = c(3,3,3))
> A
x y z
1 1 4 7
2 2 5 8
3 3 6 9
> B
x y z
1 1 2 3
2 1 2 3
3 1 2 3
What I would like to do is calculate a new data frame C which is defined as:
C[i,j] := A[i,j] / B[i,j]
for all coordinates i,j possible.
Is there a clean and quick way to do it without resorting to loops and without referencing individual columns or rows?
(Application of data.table, plyr is fine)
Simple: do A/B:
R> C <- A/B
R> C
x y z
1 1 2.0 2.33333
2 2 2.5 2.66667
3 3 3.0 3.00000
R>
R really is a vectorised language.
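One caveat worth checking on your own data: as far as I know, arithmetic between two data frames pairs the columns by position, not by name, and requires both to have the same dimensions. A small sketch, assuming B might arrive with its columns in a different order (B_shuffled is a made-up name for illustration):

```r
A <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
B <- data.frame(x = c(1, 1, 1), y = c(2, 2, 2), z = c(3, 3, 3))

# Same content as B, but with the columns in a different order
B_shuffled <- B[c("z", "x", "y")]

# Reorder the columns to match A before dividing
C <- A / B_shuffled[names(A)]
C
```

Indexing with names(A) puts the columns of the second data frame back in the order of the first, so the division lines up as intended.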

r "slot" two columns into one (like a zip)

Given two columns (perhaps from a data frame) of equal length N, how can I produce a column of length 2N with the odd entries from the first column and the even entries from the second column?
Suppose I have the following data frame
df.1 <- data.frame(X = LETTERS[1:10], Y = 2*(1:10)-1, Z = 2*(1:10))
How can I produce this data frame df.2?
i <- 1
j <- 0
XX <- NA
while (i <= 10){
XX[i+j] <- LETTERS[i]
XX[i+j+1]<- LETTERS[i]
i <- i+1
j <- i-1
}
df.2 <- data.frame(X.X = XX, Y.Z = c(1:20))
ggplot2 has an unexported function interleave which does this.
Whilst unexported it does have a help page (?ggplot2:::interleave)
with(df.1, ggplot2:::interleave(Y,Z))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
If I understand you right, you want to create a new vector twice the length of the vectors X, Y and Z in your data frame and then want all the elements of X to occupy the odd indices of this new vector and all the elements of Y the even indices. If so, then the code below should do the trick:
foo<-vector(length=2*nrow(df.1), mode='character')
foo[seq(from = 1, to = 2*length(df.1$X), by=2)]<-as.character(df.1$X)
foo[seq(from = 2, to = 2*length(df.1$X), by=2)]<-df.1$Y
Note, I first create an empty vector foo of length 20, then fill it in with elements of df.1$X and df.1$Y.
Cheers,
Danny
You can use melt from reshape2:
library(reshape2)
foo <- melt(df.1, id.vars='X')
> foo
X variable value
1 A Y 1
2 B Y 3
3 C Y 5
4 D Y 7
5 E Y 9
6 F Y 11
7 G Y 13
8 H Y 15
9 I Y 17
10 J Y 19
11 A Z 2
12 B Z 4
13 C Z 6
14 D Z 8
15 E Z 10
16 F Z 12
17 G Z 14
18 H Z 16
19 I Z 18
20 J Z 20
Then you can sort and pick the columns you want:
foo[order(foo$X), c('X', 'value')]
Another solution using base R.
First, index the character vector of the data.frame using the vector [1,1,2,2 ... 10,10] and store the result as X.X. Next, rbind the data.frame vectors Y and Z, effectively "zipping" them, and store the result in Y.Z.
> res <- data.frame(
+ X.X = df.1$X[c(rbind(1:10, 1:10))],
+ Y.Z = c(rbind(df.1$Y, df.1$Z))
+ )
> head(res)
X.X Y.Z
1 A 1
2 A 2
3 B 3
4 B 4
5 C 5
6 C 6
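The rbind trick works because c() reads a matrix column by column, so stacking two vectors as rows and flattening alternates their elements. A minimal sketch as a reusable helper for two atomic vectors of equal length (interleave2 is a hypothetical name, not an existing function):

```r
# Interleave two equal-length atomic vectors: a[1], b[1], a[2], b[2], ...
interleave2 <- function(a, b) {
  stopifnot(length(a) == length(b))
  c(rbind(a, b))   # rbind makes a 2-row matrix; c() flattens it column-wise
}

odds  <- c(1, 3, 5)
evens <- c(2, 4, 6)
zipped <- interleave2(odds, evens)
zipped
# [1] 1 2 3 4 5 6
```

Factors would be reduced to their integer codes by rbind, so convert them with as.character first.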
A two-liner in base R:
test <- data.frame(X.X=df.1$X,Y.Z=unlist(df.1[c("Y","Z")]))
test[order(test$X.X),]
This assumes that you want what you asked for in the first paragraph, and that the rest of what you posted is your attempt at solving it:
a=df.1[df.1$Y%%2>0,1:2]
b=df.1[df.1$Z%%2==0,c(1,3)]
names(a)=c("X.X","Y.Z")
names(b)=names(a)
df.2=rbind(a, b)
If you want to group them by X.X as shown in your example, you can do:
library(plyr)
arrange(df.2, X.X)
