Dynamic filtering with dplyr - r

In a Shiny app, I dynamically add a slider for every numeric field. Now I also want to add
a dynamic filter for the data based on the slider input.
To illustrate the problem, here is some code with data and static filtering:
library(glue)
library(tidyverse)
data <-
tibble(
a = c(61, 7, 10, 2, 5, 7, 23, 60),
b = c(2, 7, 1, 9, 6, 7, 3, 6),
c = c(21, 70, 1, 4, 6, 2, 3, 61)
)
input <- list("a" = c(2, 10),
"b" = c(7, 10),
"c" = c(1, 5))
data %>% filter(
between(a, input$a[1], input$a[2]),
between(b, input$b[1], input$b[2]),
between(c, input$c[1], input$c[2])
)
Is there a way to implement dynamic filtering?

I built myself a dynamic filter, which basically works:
# select_if() keeps the numeric columns; filter_if() would need a row predicate
query <- data %>% select_if(is.numeric) %>% colnames() %>% map(function(feature) {
  glue("between({feature}, input${feature}[1], input${feature}[2])")
}) %>% paste0(collapse = ", ")
eval(parse(text = glue("data %>% filter({query})")))
Is there a dplyr way?
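One possible dplyr-style approach, sketched here under the assumption that `input` is a named list (or Shiny `input` object) with one `c(min, max)` entry per numeric column, is to apply one `filter()` per column with `purrr::reduce()` and the `.data` pronoun. This avoids building and parsing code strings; the names `numeric_cols` and `filtered` are only illustrative.

```r
library(dplyr)
library(purrr)

data <- tibble(
  a = c(61, 7, 10, 2, 5, 7, 23, 60),
  b = c(2, 7, 1, 9, 6, 7, 3, 6),
  c = c(21, 70, 1, 4, 6, 2, 3, 61)
)
# Stand-in for the Shiny `input` object: one c(min, max) pair per column
input <- list(a = c(2, 10), b = c(7, 10), c = c(1, 5))

# Names of all numeric columns
numeric_cols <- data %>% select(where(is.numeric)) %>% names()

# Apply one between() filter per column, feeding each result into the next
filtered <- reduce(
  numeric_cols,
  function(d, col) {
    filter(d, between(.data[[col]], input[[col]][1], input[[col]][2]))
  },
  .init = data
)
filtered
```

Because each step only narrows the previous result, the outcome is the same as listing all `between()` conditions in a single `filter()` call.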

Related

Create a new categorical column merging two categorical columns

I have this df
df <- data.table(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21))
I would like to create a third column being:
df <- data.table(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21),
var3=c("0_22","4_4","5_6","6_25","99_6","3_70","5_75","5_23","23_24","0_21"))
where the value of each cell will be "var1 underscore var2".
Var1 and Var2 are categorical variables as they represent medications. Var3 would be to represent a combination of medications.
How can I do this?
Thanks!
Load packages
library(data.table)
library(dplyr)
Create dataframe
df <- data.table(
id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
var1 = c(0, 4, 5, 6, 99, 3, 5, 5, 23, 0),
var2 = c(22, 4, 6, 25, 6, 70, 75, 23, 24, 21)
)
Add new variable
By means of dplyr package and sprintf
df <- df %>%
mutate(var3 = sprintf("%d_%d", var1, var2))
By means of dplyr package and paste0
df <- df %>%
mutate(var3 = paste0(var1, "_", var2))
By means of base package and sprintf
df$var3 <- sprintf("%d_%d", df$var1, df$var2)
By means of base package and paste0
df$var3 <- paste0(df$var1, "_", df$var2)
As @Wimpel says, the solution is df$var3 <- paste(df$var1, df$var2, sep = "_")
thanks!!
You can also do this with the tidyverse, using the unite() function from tidyr:
library(tidyverse)
df <- tibble(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21)) %>%
# create new variable
unite(var3, c(var1, var2), sep = "_", remove = FALSE)
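Since the question starts from a data.table, the same column can also be added with data.table's own `:=` operator. This is only a sketch of that idiom using the data from the question; it is not taken from any of the answers above.

```r
library(data.table)

df <- data.table(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  var1 = c(0, 4, 5, 6, 99, 3, 5, 5, 23, 0),
  var2 = c(22, 4, 6, 25, 6, 70, 75, 23, 24, 21)
)

# `:=` adds the column by reference, so `df` does not need to be reassigned
df[, var3 := paste(var1, var2, sep = "_")]
df
```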

how to pass options to a function using dplyr mutate_at

I want to center, but not standardize, a set of variables in a data frame. I tried doing that with mutate_at, but the scale function uses scale = TRUE by default, and I can't figure out how to set it to scale = FALSE. This creates the scaled variables, but it standardizes them in addition to centering:
centdata <- mydat %>%
mutate_at(.vars = c(1, 2, 3, 4, 5, 6, 7, 8, 14),
.funs = list("scaled" = scale))
You can use a purrr-style formula or an anonymous function here.
library(dplyr)
cols <- c(1, 2, 3, 4, 5, 6, 7, 8, 14)
centdata <- mydat %>%
mutate_at(.vars = cols,
.funs = list("scaled" = ~scale(., scale = FALSE)))
Since mutate_at has been superseded, you can use across instead.
centdata <- mydat %>%
  mutate(across(all_of(cols), list("scaled" = ~scale(., scale = FALSE))))
In base R:
mydat[paste0(names(mydat)[cols], '_scaled')] <- lapply(mydat[cols], scale, scale = FALSE)
scale also works on a data frame directly.
mydat[paste0(names(mydat)[cols], '_scaled')] <- scale(mydat[cols], scale = FALSE)
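To check that extra arguments really reach `scale()`, here is a small self-contained sketch. The tibble `mydat` below is a hypothetical stand-in (the real data is not shown in the question), and the columns are selected by name rather than by position. Wrapping the result in `as.numeric()` is an optional extra step, because `scale()` returns a one-column matrix.

```r
library(dplyr)

# Hypothetical stand-in for `mydat`; the original data is not shown
mydat <- tibble(
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40),
  g = c("a", "b", "a", "b")
)
cols <- c("x", "y")

centdata <- mydat %>%
  mutate(across(
    all_of(cols),
    # center only; as.numeric() turns the one-column matrix into a plain vector
    list(scaled = ~ as.numeric(scale(.x, scale = FALSE)))
  ))
centdata
```

The new columns are named `x_scaled` and `y_scaled`, and each has mean zero while keeping its original spread.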

How do you align time-date data across multiple subjects in R

I am trying to scale data from multiple subjects onto the same time-scale. The current data files have 3 months of data for each subject, but the time-stamps for each event for each subject reflect different begin-end dates.
df$ID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
df$Time <- c("2:34:00", "2:55:13", "5:23:23", "7:23:04", "9:18:18", "3:22:12", "4:23:02", "5:23:22", "9:30:02")
df$Date <- c("7/13/16", "7/13/16", "7/13/16", "7/14/16", "7/14/16", "1/02/14", "1/02/14", "1/03/14", "1/05/14")
df$widgets <- c(4, 6, 9, 18, 3, 3, 7, 9, 12)
I want to change the df to have a common time scale so that I have a date index that allows me to keep the same format like below:
df$ScaleDate <- c(1,1,1,2,2,1,1,2,4) #time scale is within-ID
First, for reference, here's the data I used:
df <- data.frame(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
Time = c("2:34:00", "2:55:13", "5:23:23", "7:23:04", "9:18:18", "3:22:12", "4:23:02", "5:23:22", "9:30:02"),
Date = c("7/13/16", "7/13/16", "7/13/16", "7/14/16", "7/14/16", "1/02/14", "1/02/14", "1/03/14", "1/05/14"),
widgets = c(4, 6, 9, 18, 3, 3, 7, 9, 12))
We can use dplyr syntax (not mandatory, but legible and simple) together with lubridate functions to do this:
library(dplyr)
library(lubridate)
df %>%
mutate(Date = mdy(Date)) %>%
group_by(ID) %>%
mutate(ScaleDate = as.numeric((Date - Date[1]) + 1))
mdy converts the values to Date objects, so we can then do arithmetic with them. If two dates are equal, their difference is "0 days". Hence, we add 1 and convert the result to numeric to get the indexes.
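The same index can be computed without extra packages. The sketch below is a base R variant, not part of the original answer: it parses the dates with `as.Date()` and uses `ave()` to subtract each subject's earliest date. Using `min()` instead of the first row is an assumption that makes the result independent of row order.

```r
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Date = c("7/13/16", "7/13/16", "7/13/16", "7/14/16", "7/14/16",
           "1/02/14", "1/02/14", "1/03/14", "1/05/14"),
  widgets = c(4, 6, 9, 18, 3, 3, 7, 9, 12),
  stringsAsFactors = FALSE
)

# Parse month/day/two-digit-year strings into Date objects
dates <- as.Date(df$Date, format = "%m/%d/%y")

# Within each ID, count days since that subject's earliest date, starting at 1
df$ScaleDate <- ave(as.numeric(dates), df$ID, FUN = function(d) d - min(d) + 1)
df
```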

Loop through each variable and collect output R

I have a data frame that looks like this. The names and number of columns will NOT be consistent (sometimes 'C' will not be present, other times 'D', 'E', 'F' may be present, etc.)
# name and number of columns varies...so need flexible process
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9 , 67, 6)
ABC <- data.frame(A, B, C)
I want to loop through each variable and collect various information. This is a simple example, but what I am doing will be more complicated. I say that so that somebody doesn't just recommend some sort of summary() type solution.
maximum_value <- max(A)
mean_value <- mean(A)
# lots of other calculations for A
ID = 'A'
tempA <- data.frame(ID, maximum_value, mean_value)
maximum_value <- max(B)
mean_value <- mean(B)
# lots of other calculations for B
ID = 'B'
tempB <- data.frame(ID, maximum_value, mean_value)
maximum_value <- max(C)
mean_value <- mean(C)
# lots of other calculations for C
ID = 'C'
tempC <- data.frame(ID, maximum_value, mean_value)
output <- rbind(tempA, tempB, tempC)
Here is my attempt at creating a loop to go through the variables one by one and aggregate output. I can't figure out how to get [i] to point at an individual column of the data frame ABC.
# initialize data frame
data__ <- data.frame(ID__ = as.character(),
max__ = as.numeric(),
mean__ = as.numeric())
# loop through A, then B, then C
for(i in A:C) {
ID__ <- '[i]'
max__ <- maximum[i]
mean__ <- mean[i]
data__temp <- (ID__, max__, mean__)
data__ <- rbind(data__, data__temp)
}
If I were doing this in SAS, I would use a select into within proc sql to create a list of the variable names, then write an array, and then I could loop through them that way, but there's something I'm missing here.
How would I tell R to do this process for each variable in the data frame?
If you use the tidyverse dplyr and tidyr packages, you can do
library(dplyr)
library(tidyr)
# funs() is deprecated; a named list of functions does the same job
ABC %>% gather(ID, value) %>% group_by(ID) %>% summarize_all(list(mean = mean, max = max))
or
ABC %>% gather(ID, value) %>% group_by(ID) %>%
  summarize(maximum_value = max(value), mean_value = mean(value))
If you'd rather use base functions and there are a lot of "weird" functions, you can use purrr's map2_df function
library(purrr)
map2_df(ABC, names(ABC), function(a, n) {
  # tibble() replaces the deprecated data_frame()
  tibble(ID = n, max_val = max(a), mean_val = mean(a))
})
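For comparison with the hand-written loop in the question, here is a base R sketch of the same idea: iterate over the column names, build one summary row per column, and bind the rows together. The helper name `summarise_column` is hypothetical; any other per-column calculations would go inside it.

```r
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9, 67, 6)
ABC <- data.frame(A, B, C)

# Hypothetical helper: returns a one-row summary for the column called `name`
summarise_column <- function(name, df) {
  x <- df[[name]]  # [[ ]] extracts a single column by its name
  data.frame(ID = name, maximum_value = max(x), mean_value = mean(x))
}

# names(ABC) adapts automatically to whichever columns are present
output <- do.call(rbind, lapply(names(ABC), summarise_column, df = ABC))
output
```

The key step the original loop was missing is `df[[name]]`, which looks a column up by a name stored in a variable.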

OR operator in filter()?

I want to use the filter() function to find the types that have an x value less than or equal to 4, OR a y value greater than 5. I think this might be a simple fix, but I can't find much info in ?filter. I think I almost have it:
x = c(1, 2, 3, 4, 5, 6)
y = c(3, 6, 1, 9, 1, 1)
type = c("cars", "bikes", "trains")
df = data.frame(x, y, type)
df2 = df %>%
filter(x<=4)
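Try combining the two conditions with the | (OR) operator. Note that "greater than 5" is y > 5, not y >= 5:
df %>%
  filter(x <= 4 | y > 5)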
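To make the difference concrete, the sketch below runs both forms on the data from the question: conditions separated by a comma inside `filter()` are combined with AND, while `|` combines them with OR. The names `either` and `both` are only illustrative.

```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3, 6, 1, 9, 1, 1),
  type = c("cars", "bikes", "trains")  # recycled to fill all six rows
)

# OR: keep rows where at least one condition holds
either <- df %>% filter(x <= 4 | y > 5)

# AND: comma-separated conditions must all hold
both <- df %>% filter(x <= 4, y > 5)

either
both
```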
