How to pass options to a function using dplyr mutate_at - r

I want to center, but not standardize, a set of variables in a data frame. I tried doing that with mutate_at, but the scale function defaults to scale = TRUE, and I can't figure out how to set it to scale = FALSE. This centers the desired variables, but also standardizes them:
centdata <- mydat %>%
mutate_at(.vars = c(1, 2, 3, 4, 5, 6, 7, 8, 14),
.funs = list("scaled" = scale))

You can use a purrr-style formula or an anonymous function here.
library(dplyr)
cols <- c(1, 2, 3, 4, 5, 6, 7, 8, 14)
centdata <- mydat %>%
mutate_at(.vars = cols,
.funs = list("scaled" = ~scale(., scale = FALSE)))
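Since mutate_at has been superseded, you can use across instead. Wrap the external vector of positions in all_of() so that across treats it as a column selection.
centdata <- mydat %>%
mutate(across(all_of(cols), list("scaled" = ~scale(., scale = FALSE))))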
In base R -
mydat[paste0(names(mydat)[cols], '_scaled')] <- lapply(mydat[cols], scale, scale = FALSE)
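scale also works on a data frame directly. Remember to pass scale = FALSE here as well, otherwise the columns are standardized.
mydat[paste0(names(mydat)[cols], '_scaled')] <- scale(mydat[cols], scale = FALSE)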
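
One caveat with the approaches above: scale() returns a matrix, so each new column created from it holds a one-column matrix rather than a plain numeric vector. A minimal sketch of an alternative, assuming a toy data frame `toy` standing in for `mydat` (its values are made up for illustration), subtracts the column mean directly:

```r
library(dplyr)

# Toy data standing in for `mydat` (hypothetical values)
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60), id = c("x", "y", "z"))
cols <- c(1, 2)

# Center each selected column by subtracting its mean; no division by the sd.
# `.names` appends "_scaled" so the original columns are kept.
centered <- toy %>%
  mutate(across(all_of(cols), ~ .x - mean(.x), .names = "{.col}_scaled"))

centered
```

The new columns are ordinary numeric vectors, which avoids surprises when the result is later passed to functions that do not expect matrix columns.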

Related

R ave with multiple arguments / rank by group with weighting

I am using ave for ranking values within groups in a dataset in R. In the example 'data' is a data.frame with the cols raw, group and others, for example
data <- data.frame(raw = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), weight = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
The ranking works fine with
data$rank <- ave(data$raw, data$group, FUN = function(x) {rank(x)})
I would like to generalize this approach by applying weights. The weights are available as another col in the data.frame. The weighted ranking is a self defined function that needs both the raw scores and the weights vector. It is available via the cNORM package, code: https://github.com/WLenhard/cNORM/blob/master/R/utilities.R
Is it possible to use ave with multiple input variables, e. g.
data$rank <- ave(x = data$raw, data$group, y = data$weight, FUN = function(x, y) {weighted.rank(x, weights = y)})
so that x and y are both the corresponding subsets for each level of the grouping variable? I guess packages like dplyr have functions for that. Is there a way to do that with base R as well, without changing the order of the rows in the original data frame?
Many thanks!
Edit: The solution from Ronak Shah perfectly solves the problem. Thanks!
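You can use by for a base R option. Note that by returns its results one group after another, so this assignment lines up with the rows only when the data is already sorted by group, as it is in the example.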
library(cNORM)
data$rank <- unlist(by(data, data$group, function(x) weighted.rank(x$raw, x$weight)))
In dplyr you could do:
library(dplyr)
data %>% group_by(group) %>% mutate(rank = weighted.rank(raw, weight))
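
If the rows are not sorted by group, a common base R idiom is to pass row indices to ave and look up both vectors inside FUN, which leaves the row order untouched. The sketch below is an illustration under assumptions: `wrank` is a hypothetical stand-in for a two-argument ranking function (it is not the cNORM implementation), and the data values are made up.

```r
# Hypothetical stand-in for a ranking function that needs scores and weights
wrank <- function(x, w) rank(x * w)

# Groups are deliberately interleaved to show that row order is preserved
data <- data.frame(raw    = c(5, 1, 4, 2, 3, 6),
                   group  = c(1, 2, 1, 2, 1, 2),
                   weight = c(1, 1, 2, 2, 1, 1))

# ave() splits the row indices by group; FUN uses them to subset both columns
data$rank <- ave(seq_len(nrow(data)), data$group,
                 FUN = function(i) wrank(data$raw[i], data$weight[i]))

data
```

Because ave writes each group's result back into the positions it came from, no sorting or re-merging step is needed.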

How to apply a function to grouped set and bind columns to existing dataframe

I'm looking to run a function on each group of a dataset, and bind the output to the existing set inside the tidyverse environment. After the example set, I've added how I do it right now, which requires splitting the set and running lapply (I want to move everything towards the tidyverse).
library(TTR)
test = data.frame('high'=rnorm(100,10,0.1),'low'=rnorm(100,0,0.1), 'close'=rnorm(100,5,0.1))
stoch(test,
nFastK = 14, nFastD = 3, nSlowD = 3,
maType=list(list(SMA), list(SMA), list(SMA)),
bounded = TRUE,
smooth = 1)
Here is how it used to be done with lists:
get_stoch = function(dat_) {
stochs = stoch(dat_ %>% select(-ticker), nFastK = 14, nFastD = 3, nSlowD = 3,
maType=list(list(SMA), list(SMA), list(SMA)),
bounded = TRUE, smooth = 1)
dat_ = cbind(dat_,stochs)
}
test = data.frame('ticker'=c(rep('A',50),rep('B',50)),
'high'=rnorm(100,10,0.1),'low'=rnorm(100,0,0.1), 'close'=rnorm(100,5,0.1)) %>%
split(.,.$ticker) %>%
lapply(.,get_stoch) %>%
bind_rows
If you want to translate your code to tidyverse, you can use:
library(dplyr)
library(purrr)
df %>% group_split(ticker) %>% map_dfr(get_stoch)
You can use plyr::ddply to run a split-apply-bind method in tidyverse-like language:
df <- data.frame(ticker = c(rep('A', 50), rep('B', 50)),
high = rnorm(100, 10, 0.1),
low = rnorm(100, 0, 0.1),
close = rnorm(100, 5, 0.1))
test1 <- df %>%
split(.,.$ticker) %>%
lapply(.,get_stoch) %>%
bind_rows
test2 <- df %>%
plyr::ddply("ticker", get_stoch)
identical(test1, test2)
#> [1] TRUE
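
Another option that stays within dplyr is group_modify, which applies a function to each group and binds the results back together. The sketch below is only an illustration: `add_features` is a hypothetical stand-in for get_stoch so that it runs without TTR, and the data values are made up.

```r
library(dplyr)

df <- data.frame(ticker = c("A", "A", "B", "B"),
                 high   = c(10, 11, 12, 13),
                 low    = c(1, 2, 3, 4),
                 close  = c(5, 6, 7, 8))

# Hypothetical per-group function: receives the group's rows (without the
# grouping column) and the group key, and must return a data frame
add_features <- function(dat, key) {
  dat$range     <- dat$high - dat$low
  dat$cum_close <- cumsum(dat$close)
  dat
}

result <- df %>%
  group_by(ticker) %>%
  group_modify(add_features) %>%
  ungroup()

result
```

Since group_modify drops the grouping column from the data it passes in, a function written for it does not need the select(-ticker) step used in get_stoch.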

Dynamic filtering with dplyr

For every numeric field in a Shiny app I dynamically add a slider. Now I also want to add a dynamic filter for the data based on the slider input.
To illustrate the problem, here is some code with data and static filtering:
library(glue)
library(tidyverse)
data <-
tibble(
a = c(61, 7, 10, 2, 5, 7, 23, 60),
b = c(2, 7, 1, 9, 6, 7, 3, 6),
c = c(21, 70, 1, 4, 6, 2, 3, 61)
)
input <- list("a" = c(2, 10),
"b" = c(7, 10),
"c" = c(1, 5))
data %>% filter(
between(a, input$a[1], input$a[2]),
between(b, input$b[1], input$b[2]),
between(c, input$c[1], input$c[2])
)
Is there a way to implement dynamic filtering?
I built myself a dynamic filter, which basically works:
query <- data %>% select_if(is.numeric) %>% colnames() %>% map(function(feature) {
glue("between({feature}, input${feature}[1], input${feature}[2])")
}) %>% paste0(collapse = ", ")
eval(parse(text = glue("data %>% filter({query})")))
Is there a dplyr way?
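
One way to avoid building and evaluating a string is to compute one logical vector per slider and combine them before filtering. This is a minimal sketch, assuming `input` is a plain named list as in the example above (in a real Shiny app the values would be read from the reactive `input` object instead):

```r
library(dplyr)

data <- tibble(
  a = c(61, 7, 10, 2, 5, 7, 23, 60),
  b = c(2, 7, 1, 9, 6, 7, 3, 6),
  c = c(21, 70, 1, 4, 6, 2, 3, 61)
)
input <- list(a = c(2, 10), b = c(7, 10), c = c(1, 5))

# One logical vector per slider: TRUE where the column lies inside its range
checks <- Map(function(col, rng) between(data[[col]], rng[1], rng[2]),
              names(input), input)

# Combine with AND, mirroring comma-separated conditions in filter()
keep <- Reduce(`&`, checks)

filtered <- data %>% filter(keep)
filtered
```

Adding or removing a slider then only changes the contents of `input`; the filtering code stays the same.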

OR operator in filter()?

I want to use the filter() function to find the types that have an x value less than or equal to 4, OR a y value greater than 5. I think this might be a simple fix, but I can't find much info on it in ?filter. I think I almost have it:
x = c(1, 2, 3, 4, 5, 6)
y = c(3, 6, 1, 9, 1, 1)
type = c("cars", "bikes", "trains")
df = data.frame(x, y, type)
df2 = df %>%
filter(x<=4)
Use the | operator inside filter():
df %>%
filter(x <= 4 | y > 5)
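
For contrast, conditions separated by commas in filter() are combined with AND, while | gives OR. A small sketch using the data from the question (the names `either` and `both` are just illustrative):

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(3, 6, 1, 9, 1, 1),
                 type = c("cars", "bikes", "trains"))

# OR: keep rows where at least one condition holds
either <- df %>% filter(x <= 4 | y > 5)

# AND: comma-separated conditions must all hold
both <- df %>% filter(x <= 4, y > 5)

nrow(either)
nrow(both)
```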

How to use dplyr to make modifications in a dataframe equivalent to the use of a 'which'?

If I have a data frame, say
df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 7), z = c(3, 6, 10))
then I can modify entries with the which function:
w <- which(df[,"y"] == 7)
df[w,c("y", "z")] <- data.frame(6, 9)
One way I see to do this with the package dplyr is the following:
df <- df %>%
mutate(W = (y==7),
y = ifelse(W, 6, y),
z = ifelse(W, 9, z)) %>%
select(-W)
But I find it a bit inelegant, and I am not sure it would replace every use of which. Ideally I would imagine something like:
df <- df %>%
keep(y == 7) %>%
mutate(y = 6) %>%
unkeep()
where keep would provisionally select rows where modifications are to be made, and unkeep would unselect them to recover the full data frame.
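
A minimal sketch of one way to do this without the helper column, using if_else inside a single mutate. The order of the assignments is a deliberate choice here: z is updated first because the condition depends on y, which would already be changed if y were updated first.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 7), z = c(3, 6, 10))

df2 <- df %>%
  mutate(
    z = if_else(y == 7, 9, z),  # uses the original y
    y = if_else(y == 7, 6, y)   # y is changed last
  )

df2
```

This covers the case of modifying selected rows in place; it is not meant as a general replacement for every use of which.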
