subset one list on another - r

I have been looking at mapply documentation but I cannot find an example close enough to help me get started.
I have lists foo and bar:
set.seed(123)
f <- data.frame(y=1:10,x=sample(LETTERS,10))
foo <- list(f,f)
b <- data.frame(x=c("J","U","A"))
ba <- data.frame(x=c("J","W"))
bar <- list(b,ba)
I can subset f with b using:
result <- f[f$x %in% b$x ,]
I want to do this subset for each pair of elements in foo and bar, i.e. subset foo[[1]] by matching foo[[1]]["x"] against bar[[1]], foo[[2]] by matching foo[[2]]["x"] against bar[[2]], and so on.
The result would be:
>foo
[[1]]
y x
3 3 J
4 4 U
6 6 A
[[2]]
y x
3 3 J
5 5 W

Like so...?
mapply(merge,foo,bar,SIMPLIFY = FALSE)
[[1]]
x y
1 A 6
2 J 3
3 U 4
[[2]]
x y
1 J 3
2 W 5
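If the original column order and row names matter (as in the expected output in the question), a minimal sketch with Map and %in% may be closer to what was asked than merge, which reorders columns and resets row names. The data below is a hand-written stand-in rather than the sample() call from the question, because the letters sample() draws depend on the R version; the name res is only illustrative.

```r
# Stand-in data: x values chosen by hand so the result is reproducible
f <- data.frame(y = 1:6, x = c("C", "S", "J", "U", "W", "A"))
foo <- list(f, f)
bar <- list(data.frame(x = c("J", "U", "A")),
            data.frame(x = c("J", "W")))

# Pairwise subset: keep the rows of each foo element whose x
# appears in the matching bar element
res <- Map(function(d, k) d[d$x %in% k$x, , drop = FALSE], foo, bar)
res
```

Map is shorthand for mapply(..., SIMPLIFY = FALSE), so the result is always a list with one data frame per pair.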

Related

How do I make a function that returns the equation from the argument?

For example, I have a function with the arguments "x", "d" and "equation". The "x" argument is my data frame, the "d" argument is a numeric column of that data frame, and in the "equation" argument I want to pass the equation "d * 5 ^ 0.02".
As an output, I need to have a new column "V" in the data frame, with the result of the argument equation.
My idea went wrong:
myfunction <- function(x, d, equation, ...){
x$V <- equation
}
myfunction(x=x, d = x$d, equation = c("d"*5^0.02))
I assume you want equation applied to x$d, and the result to be written to x$V.
Providing an "equation" like that is very unusual, and prone to error. Consider creating a function: f <- function(x) x * 5^0.02, and then doing the following.
# dummy data
x <- data.frame(d = 1:5)
# your equation
f <- function(x) x * 5^0.02
g <- function(x, d, f) {
# call function f with column d as its argument
x$V <- f(x[[d]])
return(x)
}
g(x, "d", f)
d V
1 1 1.032712
2 2 2.065425
3 3 3.098137
4 4 4.130850
5 5 5.163562
Similar to other answers. However, functions that work by side effect rule out things like assigning the result to a new variable (one answer suggests a way to do this) or using the function within a pipeline (e.g., %>%).
I suggest not relying on side effects (<<- and assign).
myfunction <- function(x, d, equation, ...) {
x$V <- eval(substitute(equation), envir = x)
x
}
x <- data.frame(d = 1:5)
myfunction(x, x$d, d*5^0.02)
# d V
# 1 1 1.032712
# 2 2 2.065425
# 3 3 3.098137
# 4 4 4.130850
# 5 5 5.163562
The original x is unchanged. One advantage of a functional style over a side-effect style is that it flows better in (say) pipes:
library(dplyr)
x %>%
myfunction(d, d*5^0.02)
# d V
# 1 1 1.032712
# 2 2 2.065425
# 3 3 3.098137
# 4 4 4.130850
# 5 5 5.163562
whereas a side-effect version might not modify the x that you intended. For example:
x %>%
filter(between(d, 2, 4)) %>%
myfunction(d, d*5^0.02)
# d V
# 1 2 2.065425
# 2 3 3.098137
# 3 4 4.130850
(This does not work when side effects are used.)
Alternatively, we already have a function in base R for that:
within(x, { V = d*5^0.02 })
# d V
# 1 1 1.032712
# 2 2 2.065425
# 3 3 3.098137
# 4 4 4.130850
# 5 5 5.163562
transform(x, V = d*5^0.02 )
# d V
# 1 1 1.032712
# 2 2 2.065425
# 3 3 3.098137
# 4 4 4.130850
# 5 5 5.163562
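To see why the eval(substitute(...)) version of myfunction works, here is a small sketch of the two steps in isolation. The helper name capture is hypothetical and only used for illustration.

```r
# substitute() returns the unevaluated expression the caller typed
capture <- function(expr) substitute(expr)
e <- capture(d * 5^0.02)
e          # the call d * 5^0.02, not a number

# eval() then looks up `d` among the columns of the data frame
x <- data.frame(d = 1:5)
v <- eval(e, envir = x)
v
```

Because the expression is only evaluated inside eval(), there does not need to be a variable called d in the calling environment.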
Did you have something like this in mind?
myfunction <- function(x, d, equation, ...) x$v <<- eval(substitute(equation))
x <- data.frame(d = 1:5)
myfunction(x=x, d = x$d, equation = d*5^0.02)
x
#> d v
#> 1 1 1.032712
#> 2 2 2.065425
#> 3 3 3.098137
#> 4 4 4.130850
#> 5 5 5.163562
After pondering this a bit, I wonder if you are trying to reinvent within?
within(x, v <- d*5^0.02)
#> d v
#> 1 1 1.032712
#> 2 2 2.065425
#> 3 3 3.098137
#> 4 4 4.130850
#> 5 5 5.163562
Created on 2020-05-27 by the reprex package (v0.3.0)
Here you can write the result directly to the global environment:
x <- data.frame(c(1:2,5:6),c(7:10))
x
colnames(x) <- c("V1","d")
myfunction <- function(x, d, equation,named.df="NA" ,...){
x$V <- equation
assign(named.df,x, envir=.GlobalEnv)
}
myfunction(x=x, d = x$d, equation = c(x$d*5^0.02),"function result" )

how to do calculation between a list and a matrix in r

I have a list and a Matrix as per below:
List Y:
$`1`
V1 V2
1 1 1
2 1 2
3 2 1
4 2 2
$`2`
V1 V2
5 5 5
6 11 2
$`3`
V1 V2
7 10 1
8 10 2
9 11 1
10 5 6
Matrix Z:
[,1][,2][,3][,4][,5][,6]
[1,] 2 1 5 5 10 1
I consider below as points1, points2 and points3 in Matrix Z respectively
points1 -(2,1)
[,1][,2]
[1,] 2 1
points2 - (5,5)
[,3][,4]
[1,] 5 5
points3 - (10,1)
[,5][,6]
[1,] 10 1
I want to calculate the sum of distances between all points in list Y[[1]] and points1, all points in List Y[[2]] and points2 and all points in List Y[[3]] and points 3 in r. How can I do this?
rowsums(|y-z|^2)
Based on the description,
Map(function(y, z) rowSums(abs(y - z[col(y)])^2),
Y, split(Z, as.numeric(gl(ncol(Z), 2, ncol(Z)))))
Try the following. It uses Map to apply a function to each pair of corresponding elements of the two lists passed to Map. Note that we cannot simply do
Map('-', Y, Z2)
because R would do the subtractions columnwise, not row by row.
f <- function(x, y){
for(i in seq_len(nrow(x)))
x[i, ] <- x[i, ] - y
x
}
Z2 <- split(Z, rep(1:3, each = 2))
Map(f, Y, Z2)
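The answers above stop at per-row differences or per-row squared distances, while the question asks for a sum per group. A hedged sketch that goes the rest of the way, using the data from the question and base R's sweep; the names pts, sq, total_sq and total_euclid are illustrative, and since it is unclear whether squared or Euclidean distance was meant, both are computed.

```r
Y <- list(`1` = data.frame(V1 = c(1, 1, 2, 2),   V2 = c(1, 2, 1, 2)),
          `2` = data.frame(V1 = c(5, 11),        V2 = c(5, 2)),
          `3` = data.frame(V1 = c(10, 10, 11, 5), V2 = c(1, 2, 1, 6)))
Z <- matrix(c(2, 1, 5, 5, 10, 1), nrow = 1)

# Cut Z into consecutive (x, y) pairs: (2,1), (5,5), (10,1)
pts <- split(as.vector(Z), rep(seq_len(ncol(Z) / 2), each = 2))

# Squared distance from every row of a group to that group's point;
# sweep() subtracts p[1] from column 1 and p[2] from column 2
sq <- Map(function(y, p) rowSums(sweep(as.matrix(y), 2, p)^2), Y, pts)

total_sq     <- sapply(sq, sum)                      # sum of squared distances
total_euclid <- sapply(sq, function(s) sum(sqrt(s))) # sum of Euclidean distances
total_sq
total_euclid
```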

Replacing header in data frame based on values in second data frame

Say I have a data frame which looks like this:
df.A
A B C
x 1 3 4
y 5 4 6
z 8 9 1
And I want to replace the column names in the first based on column values in a second:
df.B
Low High
A D
B F
C G
Such that I get:
df.A
D F G
x 1 3 4
y 5 4 6
z 8 9 1
How would I do it?
I have tried extracting the vector df.B$High from df.B and using this in names(df.A), but everything is in alphabetical order and shifted over one. Furthermore, this only works if the order of columns in df.A is conserved with respect to the elements in df.B$High, which is not always the case (and in my real example there is no numeric or alphabetical way to sort the two to the same order). So I think I need an rbind-type argument for matching elements, but I'm not sure.
Thanks!
You can use rename from plyr:
library(plyr)
dat <- read.table(text = " A B C
x 1 3 4
y 5 4 6
z 8 9 1",header = TRUE,sep = "")
new <- read.table(text = "Low High
A D
B F
C G",header = TRUE,sep = "")
rename(dat,replace = setNames(new$High,new$Low))
D F G
x 1 3 4
y 5 4 6
z 8 9 1
using match:
df.A <- read.table(sep=" ", header=T, text="
A B C
x 1 3 4
y 5 4 6
z 8 9 1")
df.B <- read.table(sep=" ", header=T, text="
Low High
A D
B F
C G")
df.C <- df.A
names(df.C) <- df.B$High[match(names(df.A), df.B$Low)]
df.C
# D F G
# x 1 3 4
# y 5 4 6
# z 8 9 1
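The question mentions that the columns of df.A are not always in the same order as the lookup table. A minimal sketch of the match approach under that condition, with one extra column (Q, a made-up name) that has no entry in df.B and therefore keeps its original name:

```r
# Columns deliberately out of order, plus one with no mapping
df.A <- data.frame(C = c(4, 6, 1), A = c(1, 5, 8), Q = c(0, 0, 0),
                   row.names = c("x", "y", "z"))
df.B <- data.frame(Low = c("A", "B", "C"), High = c("D", "F", "G"),
                   stringsAsFactors = FALSE)

# Position of each column name in the lookup table (NA if absent)
idx <- match(names(df.A), df.B$Low)

# Use the mapped name where one exists, otherwise keep the old name
names(df.A) <- ifelse(is.na(idx), names(df.A), df.B$High[idx])
df.A
```

match works by value rather than by position, so no sorting of either table is needed.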
You can play games with the row names of df.B to make a lookup more convenient:
rownames(df.B) <- df.B$Low
names(df.A) <- df.B[names(df.A),"High"]
df.A
## D F G
## x 1 3 4
## y 5 4 6
## z 8 9 1
Here's an approach abusing factor:
f <- factor(names(df.A), levels=df.B$Low)
levels(f) <- df.B$High
f
## [1] D F G
## Levels: D F G
names(df.A) <- f
## Desired results

How to calculate an element-wise quotient of two data frames?

> A <- data.frame(x = c(1,2,3), y = c(4,5,6), z = c(7,8,9))
> B <- data.frame(x = c(1,1,1), y = c(2,2,2), z = c(3,3,3))
> A
x y z
1 1 4 7
2 2 5 8
3 3 6 9
> B
x y z
1 1 2 3
2 1 2 3
3 1 2 3
What I would like to do is calculate a new data frame C which is the defined as:
C[i,j] := A[i,j] / B[i,j]
for all coordinates i,j possible.
Is there a clean and quick way to do it without resorting to loops and without referencing individual columns or rows?
(Application of data.table, plyr is fine)
Simple: do A/B:
R> C <- A/B
R> C
x y z
1 1 2.0 2.33333
2 2 2.5 2.66667
3 3 3.0 3.00000
R>
R really is a vectorised language.
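As a quick sanity check of that claim, the sketch below compares one cell of A/B against the definition in the question and against the same division done on plain matrices. The names cell_ok and C_mat are illustrative only.

```r
A <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
B <- data.frame(x = c(1, 1, 1), y = c(2, 2, 2), z = c(3, 3, 3))

# Arithmetic operators work cell by cell on data frames of equal shape
C <- A / B

# Spot-check against C[i,j] := A[i,j] / B[i,j]
cell_ok <- isTRUE(all.equal(C[2, 3], A[2, 3] / B[2, 3]))

# The same computation on matrices gives the same numbers
C_mat <- as.matrix(A) / as.matrix(B)
```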

r "slot" two columns into one (like a zip)

Given two columns (perhaps from a data frame) of equal length N, how can I produce a column of length 2N with the odd entries from the first column and the even entries from the second column?
Suppose I have the following data frame
df.1 <- data.frame(X = LETTERS[1:10], Y = 2*(1:10)-1, Z = 2*(1:10))
How can I produce this data frame df.2?
i <- 1
j <- 0
XX <- NA
while (i <= 10){
XX[i+j] <- LETTERS[i]
XX[i+j+1]<- LETTERS[i]
i <- i+1
j <- i-1
}
df.2 <- data.frame(X.X = XX, Y.Z = c(1:20))
ggplot2 has an unexported function interleave which does this.
Whilst unexported it does have a help page (?ggplot2:::interleave)
with(df.1, ggplot2:::interleave(Y,Z))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
If I understand you right, you want to create a new vector twice the length of the vectors X, Y and Z in your data frame and then want all the elements of X to occupy the odd indices of this new vector and all the elements of Y the even indices. If so, then the code below should do the trick:
foo<-vector(length=2*nrow(df.1), mode='character')
foo[seq(from = 1, to = 2*length(df.1$X), by=2)]<-as.character(df.1$X)
foo[seq(from = 2, to = 2*length(df.1$X), by=2)]<-df.1$Y
Note, I first create an empty vector foo of length 20, then fill it in with elements of df.1$X and df.1$Y.
Cheers,
Danny
You can use melt from reshape2:
library(reshape2)
foo <- melt(df.1, id.vars='X')
> foo
X variable value
1 A Y 1
2 B Y 3
3 C Y 5
4 D Y 7
5 E Y 9
6 F Y 11
7 G Y 13
8 H Y 15
9 I Y 17
10 J Y 19
11 A Z 2
12 B Z 4
13 C Z 6
14 D Z 8
15 E Z 10
16 F Z 12
17 G Z 14
18 H Z 16
19 I Z 18
20 J Z 20
Then you can sort and pick the columns you want:
foo[order(foo$X), c('X', 'value')]
Another solution using base R.
First index the character vector of the data.frame using the vector [1,1,2,2 ... 10,10] and store as X.X. Next, rbind the data.frame vectors Y & Z, effectively "zipping" them, and store in Y.Z.
> res <- data.frame(
+ X.X = df.1$X[c(rbind(1:10, 1:10))],
+ Y.Z = c(rbind(df.1$Y, df.1$Z))
+ )
> head(res)
X.X Y.Z
1 A 1
2 A 2
3 B 3
4 B 4
5 C 5
6 C 6
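The rbind trick works because R stores matrices column by column, so flattening a two-row matrix alternates between its rows. A small sketch with made-up vectors a and b:

```r
a <- c(1, 3, 5)
b <- c(2, 4, 6)

# Two rows: a on top, b below
m <- rbind(a, b)

# Flattening reads down each column in turn: a[1], b[1], a[2], b[2], ...
zipped <- as.vector(m)
zipped

# For repeating each element twice, rep(..., each = 2) is an alternative
# to indexing with c(rbind(1:n, 1:n))
doubled <- rep(c("A", "B", "C"), each = 2)
doubled
```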
A two-liner in base R:
test <- data.frame(X.X=df.1$X,Y.Z=unlist(df.1[c("Y","Z")]))
test[order(test$X.X),]
Assuming that you want what you asked for in the first paragraph, and that the rest of what you posted is your attempt at solving it:
a=df.1[df.1$Y%%2>0,1:2]
b=df.1[df.1$Z%%2==0,c(1,3)]
names(a)=c("X.X","Y.Z")
names(b)=names(a)
df.2=rbind(a, b)
If you want to group them by X.X as shown in your example, you can do:
library(plyr)
arrange(df.2, X.X)
