How to calculate an element-wise quotient of two data frames? - r

> A <- data.frame(x = c(1,2,3), y = c(4,5,6), z = c(7,8,9))
> B <- data.frame(x = c(1,1,1), y = c(2,2,2), z = c(3,3,3))
> A
  x y z
1 1 4 7
2 2 5 8
3 3 6 9
> B
  x y z
1 1 2 3
2 1 2 3
3 1 2 3
What I would like to do is calculate a new data frame C which is defined as:
C[i,j] := A[i,j] / B[i,j]
for all coordinates i,j possible.
Is there a clean and quick way to do it without resorting to loops and without referencing individual columns or rows?
(Using data.table or plyr is fine.)

Simple: do A/B:
R> C <- A/B
R> C
  x   y       z
1 1 2.0 2.33333
2 2 2.5 2.66667
3 3 3.0 3.00000
R>
R really is a vectorised language.
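As a small check of what A/B is doing, the sketch below rebuilds the question's data frames and compares the result with an explicit column-by-column division via Map(). It assumes both data frames have the same dimensions and column order; the name C_by_column is only illustrative.

```r
A <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
B <- data.frame(x = c(1, 1, 1), y = c(2, 2, 2), z = c(3, 3, 3))

# Element-wise quotient: cell [i, j] of C is A[i, j] / B[i, j]
C <- A / B

# The same result built explicitly, dividing one column pair at a time
C_by_column <- as.data.frame(Map(`/`, A, B))

# Compare the two results as plain numeric matrices
all.equal(as.matrix(C), as.matrix(C_by_column))
```

The other arithmetic operators (+, -, *) behave the same way on data frames of matching shape.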

Related

extract and format data from dataset into matrix in R

I want to make this dataframe
into this matrix
I have tried:
x <- read.csv("sample1.csv")
ax <- matrix(c(x[1,1],x[2,1],x[1,3],x[1,1],x[3,1],x[1,4],x[1,1],x[4,1],x[1,5],x[1,1],x[5,1],x[1,6],x[1,1],x[6,1],x[1,7],x[2,1],x[1,1],x[2,2],x[2,1],x[3,1],x[2,4],x[2,1],x[4,1],x[2,5],x[2,1],x[5,1],x[2,6],x[3,1],x[6,1],x[2,7],x[3,1],x[1,1],x[3,2],x[3,1],x[2,1],x[3,3],x[3,1],x[4,1],x[3,5],x[3,1],x[5,1],x[3,6],x[3,1],x[6,1],x[3,7],x[4,1],x[1,1],x[4,2],x[4,1],x[2,1],x[4,3],x[4,1],x[3,1],x[4,4],x[4,1],x[5,1],x[4,6],x[4,1],x[6,1],x[4,7],x[5,1],x[1,1],x[2,2],x[5,1],x[2,1],x[2,4],x[5,1],x[3,1],x[2,5],x[5,1],x[4,1],x[2,6],x[5,1],x[6,1],x[2,7],x[6,1],x[1,1],x[2,2],x[6,1],x[2,1],x[2,4],x[6,1],x[3,1],x[2,5],x[6,1],x[4,1],x[2,6],x[6,1],x[5,1],x[2,7]),10,3, byrow=TRUE)
bx <- ax[order(ax[,3], decreasing = TRUE),]
But it's not pretty at all, and it would be a lot of work to redo for different sample data.
So I would like to simplify it if possible. Any suggestions?
This can be achieved by using the melt() function from the reshape2 package:
> a = matrix(c(1:9), nrow = 3, ncol = 3, dimnames = list(LETTERS[1:3], letters[1:3]))
> a
  a b c
A 1 4 7
B 2 5 8
C 3 6 9
> library(reshape2)
> melt(a, na.rm = TRUE)
  Var1 Var2 value
1    A    a     1
2    B    a     2
3    C    a     3
4    A    b     4
5    B    b     5
6    C    b     6
7    A    c     7
8    B    c     8
9    C    c     9
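If reshape2 is not available, base R can produce a similar long layout. The sketch below is an alternative to melt(), not the answer's own method: it converts the matrix to a table and then to a data frame, and sorts by value in decreasing order as the question's order() call did. The name long is illustrative.

```r
a <- matrix(1:9, nrow = 3, ncol = 3,
            dimnames = list(LETTERS[1:3], letters[1:3]))

# as.table() followed by as.data.frame() gives one row per matrix cell;
# responseName renames the default "Freq" column to "value"
long <- as.data.frame(as.table(a), responseName = "value")

# Largest values first, mirroring order(..., decreasing = TRUE) in the question
long <- long[order(long$value, decreasing = TRUE), ]
head(long, 3)
```

Var1 and Var2 come back as factors here; wrap them in as.character() if plain strings are needed.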

R - find clusters of group 2 (pairs)

I am looking for a way to find clusters of size 2 (pairs).
Is there a simple way to do that?
Imagine I have some kind of data where I want to match on x and y, like
library(cluster)
set.seed(1)
df = data.frame(id = 1:10, x_coord = sample(10,10), y_coord = sample(10,10))
I want to find the closest pair of distances between the x_coord and y_coord:
d = stats::dist(df[,c(1,2)], diag = T)
h = hclust(d)
plot(h)
I get a dendrogram like the one below. What I would like is for the pairs (9,10), (1,3), (6,7) and (4,5) to be grouped together, and for cases 8 and 2 to be left unpaired and removed.
Maybe there is a more effective alternative for doing this than clustering.
Ultimately, I would like to remove the unmatched ids, keep the pairs, and end up with a dataset like this one:
id x_coord y_coord pair_id
 1       9       3       1
 3       7       5       1
 4       1       8       2
 5       2       2       2
 6       5       6       3
 7       3      10       3
 9       6       4       4
10       8       7       4
You could use the element h$merge. Any row of this two-column matrix in which both entries are negative represents a pairing of singletons. Therefore you can do:
pairs <- -h$merge[apply(h$merge, 1, function(x) all(x < 0)),]
df$pair <- (match(df$id, c(pairs)) - 1) %% nrow(pairs) + 1
df <- df[!is.na(df$pair),]
df
#>    id x_coord y_coord pair
#> 1   1       9       3    4
#> 3   3       7       5    4
#> 4   4       1       8    1
#> 5   5       2       2    1
#> 6   6       5       6    2
#> 7   7       3      10    2
#> 9   9       6       4    3
#> 10 10       8       7    3
Note that the pair numbers follow the order of the merge "height" on the dendrogram, so pair 1 is the closest pair. If you want them in ascending order of their appearance in the data frame, you can add the line
df$pair <- as.numeric(factor(df$pair, levels = unique(df$pair)))
Anyway, if we repeat your plotting code on our newly modified df, we can see there are no unpaired singletons left:
d = stats::dist(df[,c(1,2)], diag = T)
h = hclust(d)
plot(h)
And we can see the method scales nicely:
df = data.frame(id = 1:50, x_coord = sample(50), y_coord = sample(50))
d = stats::dist(df[,c(1,2)], diag = T)
h = hclust(d)
pairs <- -h$merge[apply(h$merge, 1, function(x) all(x < 0)),]
df$pair <- (match(df$id, c(pairs)) - 1) %% nrow(pairs) + 1
df <- df[!is.na(df$pair),]
d = stats::dist(df[,c(1,2)], diag = T)
h = hclust(d)
plot(h)
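To see how the negative entries of h$merge identify pairs, here is a minimal sketch on five hand-picked points, so the result does not depend on a random seed. The data frame pts and its coordinates are made up for illustration, and drop = FALSE is an added guard (not in the answer above) that keeps the result a matrix even if only one pair is found.

```r
# Two tight pairs (ids 1-2 and 3-4) plus one distant point (id 5)
pts <- data.frame(id = 1:5,
                  x = c(0, 0.1, 5, 5.2, 20),
                  y = c(0, 0,   5, 5,   20))

h <- hclust(dist(pts[, c("x", "y")]))

# In h$merge, a negative entry -k means "original observation k".
# A row with two negative entries is therefore a merge of two singletons.
is_pair <- apply(h$merge, 1, function(r) all(r < 0))
pairs <- -h$merge[is_pair, , drop = FALSE]

# c(pairs) lists the first column, then the second, so the position
# modulo nrow(pairs) recovers the row (pair) each id belongs to
pts$pair <- (match(pts$id, c(pairs)) - 1) %% nrow(pairs) + 1

# Keep only the paired observations
paired <- pts[!is.na(pts$pair), ]
paired
```

Here ids 1 and 2 end up in pair 1, ids 3 and 4 in pair 2, and id 5 is dropped because it only ever merges with an existing cluster.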

Joining two data frames of different lengths

I have a data frame with 25 weeks of sales data, and I have computed a lagged moving average. Now, say x <- c(1,2,3,4) and the moving average is y <- c(NaN,1,1.5,2,2.5).
If I use z <- data.frame(x,y), I get an error because the lengths do not match. Is there any way to join them as a data frame by inserting an NA value at the end of the x column?
Is the same thing possible when x is a data frame with n rows and m columns and I want to append a column of length (n+1) to the right of it?
Yet another way of doing it
data.frame(x[1:length(y)], y)
If x is a data frame, you can use
data.frame(x[1:length(y), ], y)
You could do this
> lst <- list(x = x, y = y)
> m <- max(sapply(lst, length))
> as.data.frame(lapply(lst, function(x){ length(x) <- m; x }))
#    x   y
# 1  1 NaN
# 2  2 1.0
# 3  3 1.5
# 4  4 2.0
# 5 NA 2.5
In response to your comment, if x is a matrix and y is a vector, it would depend on the number of columns in x. But for this example
cbind(append(x, rep(NA, length(y)-length(x))), y)
If x has multiple columns, you could use some variety of
apply(x, 2, append, NA)
But again, it depends on what's in the columns and what's in y
Maybe this also helps:
x<- 1:4
x1 <- matrix(1:8,ncol=2)
y <- c(NaN,1,1.5,2,2.5)
do.call(`merge`, c(list(x,y),by=0,all=TRUE))[,-1]
# x y
# 1 1 NaN
# 2 2 1.0
# 3 3 1.5
# 4 4 2.0
# 5 NA 2.5
do.call(`merge`, c(list(x1,y),by=0,all=TRUE))[,-1]
# V1 V2 y
#1 1 5 NaN
#2 2 6 1.0
#3 3 7 1.5
#4 4 8 2.0
#5 NA NA 2.5
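The answers above all pad the shorter object with NA. A minimal sketch of that idea follows, covering both the vector case and the data frame case; pad_to() is a hypothetical helper written for this example, not a base R function, and xdf is made-up data.

```r
x <- c(1, 2, 3, 4)
y <- c(NaN, 1, 1.5, 2, 2.5)

# pad_to() (hypothetical helper): extend a vector to length n, filling with NA
pad_to <- function(v, n) {
  length(v) <- n
  v
}

n <- max(length(x), length(y))
z <- data.frame(x = pad_to(x, n), y = pad_to(y, n))

# Data frame case: indexing rows past the last one yields all-NA rows,
# so a 4-row data frame can be stretched to 5 rows before adding y
xdf <- data.frame(a = 1:4, b = 5:8)
zdf <- data.frame(xdf[seq_len(n), ], y = y)
```

Both z and zdf have five rows, with NA in the last row of the columns that came from the shorter object.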

Replacing header in data frame based on values in second data frame

Say I have a data frame which looks like this:
df.A
A B C
x 1 3 4
y 5 4 6
z 8 9 1
And I want to replace the column names in the first based on column values in a second:
df.B
Low High
A D
B F
C G
Such that I get:
df.A
D F G
x 1 3 4
y 5 4 6
z 8 9 1
How would I do it?
I have tried extracting the vector df.B$High from df.B and using this in names(df.A), but everything is in alphabetical order and shifted over one. Furthermore, this only works if the order of columns in df.A is conserved with respect to the elements in df.B$High, which is not always the case (and in my real example there is no numeric or alphabetical way to sort the two to the same order). So I think I need an rbind-type argument for matching elements, but I'm not sure.
Thanks!
You can use rename from plyr:
library(plyr)
dat <- read.table(text = " A B C
x 1 3 4
y 5 4 6
z 8 9 1",header = TRUE,sep = "")
> new <- read.table(text = "Low High
A D
B F
C G",header = TRUE,sep = "")
> rename(dat,replace = setNames(new$High,new$Low))
D F G
x 1 3 4
y 5 4 6
z 8 9 1
Using match:
df.A <- read.table(sep=" ", header=T, text="
A B C
x 1 3 4
y 5 4 6
z 8 9 1")
df.B <- read.table(sep=" ", header=T, text="
Low High
A D
B F
C G")
df.C <- df.A
names(df.C) <- df.B$High[match(names(df.A), df.B$Low)]
df.C
# D F G
# x 1 3 4
# y 5 4 6
# z 8 9 1
You can play games with the row names of df.B to make a lookup more convenient:
rownames(df.B) <- df.B$Low
names(df.A) <- df.B[names(df.A),"High"]
df.A
## D F G
## x 1 3 4
## y 5 4 6
## z 8 9 1
Here's an approach abusing factor:
f <- factor(names(df.A), levels=df.B$Low)
levels(f) <- df.B$High
f
## [1] D F G
## Levels: D F G
names(df.A) <- f
## Desired results
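Since the question mentions that the columns of df.A are not always in the same order as df.B, here is a small sketch of the lookup idea with the columns deliberately shuffled. It uses a named character vector as the lookup table; the name lookup and the shuffled column order are assumptions made for illustration.

```r
# Columns of df.A in a different order from the rows of df.B
df.A <- data.frame(C = c(4, 6, 1), A = c(1, 5, 8), B = c(3, 4, 9),
                   row.names = c("x", "y", "z"))
df.B <- data.frame(Low = c("A", "B", "C"), High = c("D", "F", "G"),
                   stringsAsFactors = FALSE)

# Named character vector: names are the old headers, values are the new ones
lookup <- setNames(df.B$High, df.B$Low)

# Index the lookup by the current names, so column order does not matter
names(df.A) <- unname(lookup[names(df.A)])
names(df.A)
```

Any column name in df.A that is missing from df.B$Low would become NA with this approach, so it is worth checking anyNA(names(df.A)) afterwards.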

Create a new vector formed by a list of vectors

Suppose I have m vectors: a_1 = (a_{11}...a_{1n}) ... a_m = (a_{m1}...a_{mn})
I want a new vector b of length mn such that
b = (a_{11}...a_{m1} a_{12}...a_{m2}...a_{1n}...a_{mn})
I can think of a for loop, for example:
>a<-c(1,1,1);b<-c(2,2,2);c<-c(3,3,3)
>x<-NULL
>for (i in 1:3) {x<-c(x,c(a[i],b[i],c[i]))}
>x
[1] 1 2 3 1 2 3 1 2 3
Is there a better way?
Or using mapply...
c( mapply( c , a , b , c ) )
[1] 1 2 3 1 2 3 1 2 3
Or via a matrix with one vector per row, read back column by column:
c(matrix(c(a, b, c), ncol=length(a), byrow=TRUE))
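When the number of vectors differs from their length, it is easier to see that the interleaving is right. The sketch below uses three vectors of length two and rbind(), which stacks the vectors as rows; c() then flattens the matrix column by column. The third vector is called c_vec here (an illustrative name) to avoid confusion with the function c().

```r
a <- c(1, 2)
b <- c(3, 4)
c_vec <- c(5, 6)

# rbind() puts a, b and c_vec in rows; c() reads the result column by column,
# giving a[1], b[1], c_vec[1], a[2], b[2], c_vec[2]
x <- c(rbind(a, b, c_vec))
x
# [1] 1 3 5 2 4 6

# The same for vectors held in a list
vecs <- list(a, b, c_vec)
x_from_list <- c(do.call(rbind, vecs))
```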
