Multiplication in FUN argument - r

I have this dataframe
x <- data.frame(
matrix(
c(letters[1:3], c("x", "x", "y") ,
sample(c(rep(1,100),0), size = 1),
sample(c(rep(1,100),0), size = 1),
sample(c(rep(1,100),0), size = 1)), ncol = 3)
)
I would like to multiply the values within each group (x and y).
My attempt:
agg <- aggregate(x$X3,
by = list(x$X2),
FUN = *)
I would like to use something like sum or mean, but to multiply.

+ is to sum as * is to prod (for product).
Your sample data follows the anti-pattern of data.frame(matrix()). A matrix can only hold one data type. You mix character and numeric data in the matrix, so the matrix coerces everything to character, and you can't do math on characters. Here's proper sample data and a demonstration that the solution works. Also note that using by = x["X2"] instead of by = list(x$X2) gives a nicer column name in the result.
(x <- data.frame(
X1 = letters[1:3],
X2 = c("x", "x", "y") ,
X3 = 2:4
))
# X1 X2 X3
# 1 a x 2
# 2 b x 3
# 3 c y 4
aggregate(x$X3, by = x["X2"], FUN = prod)
# X2 x
# 1 x 6
# 2 y 4
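To see the coercion described above, here is a minimal sketch. The names m, bad and good are made up for illustration and do not come from the question.

```r
# Mixing letters and numbers in one matrix coerces every cell to character
m <- matrix(c(letters[1:3], 2:4), ncol = 2)
bad <- data.frame(m)
sapply(bad, class)   # neither column is numeric

# Building the columns directly lets each column keep its own type
good <- data.frame(X1 = letters[1:3], X3 = 2:4)
sapply(good, class)  # X3 stays numeric, so prod() works on it
prod(good$X3)
```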

Either use prod or use Reduce with *. Also convert X3 to numeric and use single brackets as shown to preserve the names. Alternatively, use the aggregate formula method, shown here only for prod but equally applicable to Reduce.
xx <- transform(x, X3 = as.numeric(X3))
aggregate(xx["X3"], by = xx["X2"], FUN = prod)
aggregate(xx["X3"], by = xx["X2"], FUN = Reduce, f = `*`) # same
aggregate(X3 ~ X2, xx, FUN = prod)
A better example might be to use mtcars that comes with R:
aggregate(mtcars["mpg"], by = mtcars["cyl"], FUN = prod)
aggregate(mtcars["mpg"], by = mtcars["cyl"], FUN = Reduce, f = `*`) # same
aggregate(mpg ~ cyl, mtcars, FUN = prod)
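As a quick check that prod and Reduce with * agree, here is a small sketch on a plain vector. The vector v is an illustrative assumption.

```r
v <- c(2, 3, 4)

prod(v)          # product of all elements
Reduce(`*`, v)   # same result, built by folding `*` over v

# Reduce can also return the running product, which prod cannot
Reduce(`*`, v, accumulate = TRUE)
```

For a plain product, prod is the more direct choice; Reduce is mainly useful when the combining function has no built-in vectorised counterpart.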

Converting a Nested For Loop into `sapply()` in R

I have been trying to create a series of coplots using a nested for loop, but the loop takes too long to run (the original data set is very big). I have looked at similar questions and they suggest using the sapply function, but I am still unclear about how to convert between the two. I understand I need to create a plotting function to use (see below), but what I don't understand is how the i's and j's of the nested for loop translate into sapply arguments.
Below are some sample data, the nested for loop that I have been using, and the plotting function I created. Could someone walk me through how to convert my nested for loop into sapply arguments? I have been doing all of this in R. Many thanks.
y = rnorm(n = 200, mean = 10, sd = 2)
x1 = rnorm(n = 200, mean = 5, sd = 2)
x2 = rnorm(n = 200, mean = 2.5, sd = 2)
x3 = rep(letters[1:4], each = 50)
x4 = rep(LETTERS[1:8], each = 25)
dat = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, x4 = x4)
for(i in dat[, 2:3]){
for(j in dat[, 4:5]){
coplot(y ~ i | j, rows = 1, data = dat)
}
}
coplop_fun = function(data, x, y, z, na.rm = TRUE){
coplot(.data[[y]] ~ .data[[x]] | .data[[z]], data = data, rows = 1)
}
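I think you can use mapply here rather than sapply. mapply is similar to sapply but iterates over several inputs in parallel instead of one. Note that it pairs the inputs element-wise, so the call below plots x1 given x3 and x2 given x4 rather than all four combinations produced by the nested loop.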
y = rnorm(n = 200, mean = 10, sd = 2)
x1 = rnorm(n = 200, mean = 5, sd = 2)
x2 = rnorm(n = 200, mean = 2.5, sd = 2)
x3 = rep(letters[1:4], each = 50)
x4 = rep(LETTERS[1:8], each = 25)
dat = data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, x4 = x4)
for(i in dat[, 2:3]){
for(j in dat[, 4:5]){
coplot(y ~ i | j, rows = 1, data = dat)
}
}
mapply(function(x,j){coplot(dat[["y"]]~x|j,rows =1)}, dat[,2:3],dat[,4:5])
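The difference between mapply's element-wise pairing and the nested loop's full set of combinations can be sketched without plotting anything. Here the formulas are only built as strings, and the helper make_eq and the vectors xs and zs are illustrative names.

```r
xs <- c("x1", "x2")
zs <- c("x3", "x4")
make_eq <- function(x, z) paste("y ~", x, "|", z)

# mapply walks both vectors in parallel: 2 results
paired <- mapply(make_eq, xs, zs, USE.NAMES = FALSE)
paired

# expanding to every combination first reproduces the nested loop: 4 results
grid <- expand.grid(x = xs, z = zs, stringsAsFactors = FALSE)
all_combos <- mapply(make_eq, grid$x, grid$z, USE.NAMES = FALSE)
all_combos
```

Each string could then be passed to coplot() through as.formula() if all four plots are wanted.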
We can use a combination of functions expand.grid, formula and apply to accept character column names into coplot.
# combinations of column names for plotting
vars <- expand.grid(y = "y", x = c("x1", "x2"), z = c("x3", "x4"))
# cycle through column name variations, construct formula for each combination
apply(vars, MARGIN = 1,
FUN = function(x) coplot(
formula = formula(paste(x[1], "~", x[2], "|", x[3])),
data = dat, rows = 1
)
)
Here's a tidyverse version of @nya's solution with expand.grid() and apply(). Each row in ds_plot_parameters represents a single plot. The equation variable is the string eventually passed to coplot().
Each equation is passed to purrr::walk(), which then calls coplot() to produce one graph each. as.formula() converts the string to a formula.
ds_plot_parameters <-
tidyr::expand_grid(
v = c("x1", "x2"),
w = c("x3", "x4")
) |>
dplyr::mutate(
equation = paste0("y ~ ", v, " | ", w),
)
ds_plot_parameters$equation |>
purrr::walk(
\(e) coplot(as.formula(e), rows = 1, data = dat)
)
Gravy:
If you want to pass more inputs to the graph, expand ds_plot_parameters to include other things such as graph and axis titles.
ds_plot_parameters <-
tidyr::expand_grid(
v = c("x1", "x2"),
w = c("x3", "x4")
) |>
dplyr::mutate(
equation = paste0("y ~ ", v, " | ", w),
label_y = "Outcome (mL)",
label_x = paste(v, "(log 10)")
)
ds_plot_parameters |>
dplyr::select(
# Make sure this order exactly matches the function signature
equation,
label_x,
label_y,
) |>
purrr::pwalk(
.f = \(equation, label_x, label_y) {
coplot(
formula = as.formula(equation),
xlab = label_x,
ylab = label_y,
rows = 1,
data = dat
)
}
)
ds_plot_parameters
# # A tibble: 4 x 5
# v w equation label_y label_x
# <chr> <chr> <chr> <chr> <chr>
# 1 x1 x3 y ~ x1 | x3 Outcome (mL) x1 (log 10)
# 2 x1 x4 y ~ x1 | x4 Outcome (mL) x1 (log 10)
# 3 x2 x3 y ~ x2 | x3 Outcome (mL) x2 (log 10)
# 4 x2 x4 y ~ x2 | x4 Outcome (mL) x2 (log 10)

custom function does not work on column named "x" unless specified by .$x in summarise() dplyr R

I wanted to create a custom function to calculate confidence intervals of a column by creating two columns called lower.bound and upper.bound. I also wanted this function to be able to work within dplyr::summarize() function.
The function works as expected in all tested circumstances, except when the column is named "x". In that case it raises a warning and returns NaN values. It only works when the column is explicitly referenced as .$x. Here is an example of the code. I don't understand the nuance... could you point me in the right direction to understand this?
library(dplyr)
set.seed(12)
# creates random data frame
z <- data.frame(
x = runif(100),
y = runif(100),
z = runif(100)
)
# creates function to calculate confidence intervals
conf.int <- function(x, alpha = 0.05) {
sample.mean <- mean(x)
sample.n <- length(x)
sample.sd <- sd(x)
sample.se <- sample.sd / sqrt(sample.n)
t.score <- qt(p = alpha / 2,
df = sample.n - 1,
lower.tail = FALSE)
margin.error <- t.score * sample.se
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
as.data.frame(cbind(lower.bound, upper.bound))
}
# This works as expected
z %>%
summarise(x = mean(y), conf.int(y))
# This does not
z %>%
summarise(x = mean(x), conf.int(x))
# This does
z %>%
summarise(x = mean(x), conf.int(.$x))
Thanks!
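This is a "feature" of dplyr: within a single summarise() call, later expressions see the columns created by earlier ones. By the time conf.int(x) is evaluated, x already holds the mean (a single value), so the standard deviation of that length-one vector is undefined and the interval comes out as NaN.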
Possible options are -
Change the name of the variable to store the mean value
library(dplyr)
z %>% summarise(x1 = mean(x), conf.int(x))
# x1 lower.bound upper.bound
#1 0.4797154 0.4248486 0.5345822
Change the order
z %>% summarise(conf.int(x), x = mean(x))
# lower.bound upper.bound x
#1 0.4248486 0.5345822 0.4797154
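The sequential evaluation can be seen in isolation with a tiny example. This is a sketch that assumes dplyr is installed; the data frame d and the column n_seen are made up for illustration.

```r
library(dplyr)

d <- data.frame(x = c(2, 4, 6, 8))

# The second expression sees the x created by the first one
res <- d %>% summarise(x = mean(x), n_seen = length(x))
res$n_seen    # 1, not 4

# Computing the length first avoids the shadowing
res2 <- d %>% summarise(n_seen = length(x), x = mean(x))
res2$n_seen   # 4
```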

how to apply functions on data frame in r

How can I apply the following function rt to each and every value l in df?
x and y have the following values.
x<-9
y<-1
rt<-function(x,y,l) min(x,max(0,l-y))
df
a b c
5 6 7
1 4 1
2 4 3
Probably the simplest approach, if you'd like to stick with data frames, is to use apply with the MARGIN parameter set to c(1,2), which applies the function over both rows and columns (i.e., to every cell).
x <- 9
y <- 1
rt <- function(x, y, l) min(x, max(0, l-y))
df <- data.frame(a = c(5, 1, 2),
b = c(6, 4, 4),
c = c(7, 1, 3))
rt_df <- as.data.frame(apply(df, c(1,2), rt, x = x, y = y))
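A possible alternative, sketched here as an assumption rather than as part of the answer above, is to replace min and max with their vectorised counterparts pmin and pmax and process one whole column at a time. The name rt_vec is hypothetical.

```r
x <- 9
y <- 1
df <- data.frame(a = c(5, 1, 2),
                 b = c(6, 4, 4),
                 c = c(7, 1, 3))

# pmin/pmax work element-wise, so each column is handled in one call
rt_vec <- df
rt_vec[] <- lapply(df, function(col) pmin(x, pmax(0, col - y)))
rt_vec
```

Assigning into rt_vec[] keeps the data frame structure and column names while replacing the values.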

Matrix version of rasterToPoints?

Anyone know of a non-raster method to achieve the following?
require(raster)
d = data.frame(rasterToPoints(raster(volcano)))
head(d)
x y layer
1 0.008196721 0.9942529 100
2 0.024590164 0.9942529 100
3 0.040983607 0.9942529 101
4 0.057377049 0.9942529 101
5 0.073770492 0.9942529 101
6 0.090163934 0.9942529 101
Cheers.
One way would be to use the row() and col() functions:
library(raster)
data(volcano)
df <- data.frame(
x = as.vector(col(volcano)),
y = (yy <- as.vector(row(volcano)))[length(yy):1],
val = as.vector(volcano)
)
raster rescales the range to 0 - 1 unless specified otherwise, so we have to do this too:
## rescale
df$x <- with(df, (x - min(x)) / (max(x) - min(x)))
df$y <- with(df, (y - min(y)) / (max(y) - min(y)))
Finally, let's check that the results are the same:
## Using raster
df1 <- data.frame(rasterToPoints(raster(volcano)))
cols <- colorRampPalette(c('white', "blue", 'red'))
df$col <- cols(20)[as.numeric(cut(df$val, breaks = 20))]
df1$col <- cols(20)[as.numeric(cut(df1$layer, breaks = 20))]
par(mfrow = c(1, 2))
plot(df[, 1:2], col = df$col, pch = 20, main = "matrix")
plot(df1[, 1:2], col = df1$col, pch = 20, main = "raster")
Note:
While the results appear the same visually, they are not. The resolution of the raster command is most likely different, and hence there are different nrows for df and df1.
Faster for large matrices:
data.frame(
x = rep(1:ncol(m), each=nrow(m)),
y = rep(nrow(m):1, ncol(m)),
val = as.vector(m)
)
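As a sanity check, the rep()-based construction can be compared with the row()/col() construction on the built-in volcano matrix. This is a sketch: taking m to be volcano is an assumption, and by_index and by_rep are illustrative names.

```r
m <- volcano  # assumed input matrix (ships with base R's datasets package)

# row()/col() version: index matrices flattened column by column
by_index <- data.frame(
  x = as.vector(col(m)),
  y = rev(as.vector(row(m))),
  val = as.vector(m)
)

# rep() version: same columns without building the index matrices
by_rep <- data.frame(
  x = rep(1:ncol(m), each = nrow(m)),
  y = rep(nrow(m):1, ncol(m)),
  val = as.vector(m)
)

all.equal(by_index, by_rep)
```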

Create R Function with flexibility to reference different datasets

I am trying to create a simple function in R that can reference multiple datasets and multiple variable names. Using the following code, I get an error, which I believe is due to referencing:
set.seed(123)
dat1 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
dat2 <- data.frame(x = sample(10), y = sample(10), z = sample(10))
table(dat1$x, dat1$y)
table(dat2$x, dat2$y)
fun <- function(dat, sig, range){print(table(dat$sig, dat$range))}
fun(dat = dat1, sig = x, range = y)
fun(dat = dat2, sig = x, range = y)
Any idea how to adjust this code so that it can return the table appropriately?
The [[ ]] operator on a data frame is similar to $, but it lets you pass an object whose value is used as the column name. When calling the function, pass the column name as a string, e.g. "x" for sig. If you leave out the quotes, R will look for an object named x.
fun <- function(dat, sig, range){print(table(dat[[sig]], dat[[range]]))}
fun(dat = dat1, sig = "x", range = "y")
fun(dat = dat2, sig = "x", range = "y")
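The difference between $ and [[ ]] can be seen directly on a small data frame. This is a minimal sketch with made-up data.

```r
dat <- data.frame(x = 1:3, y = 4:6)
sig <- "x"

dat$sig     # NULL: $ looks for a column literally named "sig"
dat[[sig]]  # the x column: [[ ]] uses the value stored in sig
```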
