'Macro Variables' in R - r

I am trying to build a process that accepts a user input parameter and then produces output accordingly.
I need to be able to:
1. Input a variable
2. Pull max date for that variable
3. Pull all data less than or equal to that date
dates <- c('2001-01-08', '2015-01-07', '2013-03-03', '2001-01-01', '2013-07-25', '2000-09-20', '2017-02-20')
groups <- c('A', 'A', 'A', 'B', 'B', 'C', 'D')
dat <- data.frame(groups, dates)
dat$dates <- as.Date(dat$dates)
The following piece works for what I want to do....
querydate <- sqldf(
"SELECT max(dates) as x
FROM dat
WHERE groups == 'A'")
But I want to edit this to do something like this....where I specify a value and query references...
group_i_want <- 'A'
querydate <- sqldf(
"SELECT max(dates) as x
FROM dat
WHERE groups == group_i_want")
How can I get R to recognize this value?

You can look into using sprintf to do string formatting on values you collect at runtime. For example:
g <- "A"
if (invalid.input(g)) stop("Error") # invalid.input() is a placeholder: substitute your own validation of g
query <- sprintf("SELECT max(dates) as x FROM dat WHERE groups == '%s'", g)
querydate <- sqldf(query)
Here the %s is replaced by the string contained in g. You can also substitute numbers with specific formatting; see ?sprintf for details.
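If the SQL step is not essential, the three steps from the question (input a value, find its max date, pull everything up to that date) can also be sketched in base R, which avoids assembling a query string from user input. This is only a sketch under the question's sample data; `pull_up_to_max_date` is a hypothetical helper name, not part of any package.

```r
dates <- c('2001-01-08', '2015-01-07', '2013-03-03', '2001-01-01',
           '2013-07-25', '2000-09-20', '2017-02-20')
groups <- c('A', 'A', 'A', 'B', 'B', 'C', 'D')
dat <- data.frame(groups, dates)
dat$dates <- as.Date(dat$dates)

# Hypothetical helper: returns every row dated on or before the
# latest date recorded for the requested group.
pull_up_to_max_date <- function(dat, group) {
  # Basic input check: exactly one value, and it must exist in the data
  stopifnot(length(group) == 1, group %in% dat$groups)
  max_date <- max(dat$dates[dat$groups == group])
  dat[dat$dates <= max_date, ]
}

group_i_want <- 'A'
res <- pull_up_to_max_date(dat, group_i_want)
```

Because the group is passed as an ordinary R value, no quoting or escaping of the input is needed.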

Related

How to retain class of variable in `tapply`?

Suppose my data frame is set up like so:
X <- data.frame(
id = c('A', 'A', 'B', 'B'),
dt = as.Date(c('2020-01-01', '2020-01-02', '2021-01-01', '2021-01-02'))
)
and I want to populate a variable of the id-specific minimum value of date dt
Doing X$dtmin <- with(X, tapply(dt, id, min)[id]) gives a numeric, because the default simplify=TRUE in tapply has cast the value to numeric. Why has it done this? Setting simplify=FALSE returns a list in which each element has the desired class, but populating the variable in my data frame X casts these back to numeric. Calling as.Date(<output>, origin='1970-01-01') works but seems needlessly verbose. How can I retain the class of dt?
We may use
X$dtmin <- with(X, do.call("c", tapply(dt, id, min, simplify = FALSE)[id]))
Or use dplyr
library(dplyr)
X %>%
mutate(dtmin = min(dt), .by = "id")
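As for why the class is lost: the same effect can be reproduced with unlist(), which drops attributes such as the Date class and appears to be what the simplification step relies on. Combining the pieces with c() instead keeps the class, because c() dispatches to the Date method. The sketch below uses split() and lapply() rather than tapply() to make each step visible; the variable names `mins`, `flat` and `per_row` are just illustrative.

```r
X <- data.frame(
  id = c('A', 'A', 'B', 'B'),
  dt = as.Date(c('2020-01-01', '2020-01-02', '2021-01-01', '2021-01-02'))
)

# One minimum per id, kept as a named list of Date values
mins <- lapply(split(X$dt, X$id), min)

# unlist() flattens to bare day counts: the Date class is gone
flat <- unlist(mins)

# Look up each row's group minimum, then combine with c(),
# which uses the Date method and so preserves the class
per_row <- unname(mins[as.character(X$id)])
X$dtmin <- do.call("c", per_row)
```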

Generalizable function to select and filter dataframe r - using shiny input

I am building a shiny app. The user will need to be able to reduce the data by selecting variables and filtering on specific values for those variables. I am stuck trying to get a generalizable function that can work based on all possible selections.
Here is an example - I skip the shiny code because I think the problem is with the function:
#sample dataframe
df <- data.frame('date' = c(1, 2, 3, 2, 2, 3, 1),
'time' = c('a', 'b', 'c', 'e', 'b', 'a', 'e'),
'place' = c('A', 'A', 'A', 'H', 'A', 'H', 'H'),
'result' = c('W', 'W', 'L', 'W', 'W', 'L', 'L'))
If the user selected date and result for the date values 1, 2; and the result values W, I would do the following:
out <- df %>%
select(date, result) %>%
filter(date %in% c(1,2)) %>%
filter(result %in% c('W'))
The challenge I am having is that the user can select any unique combination of variables and values. Using the input$ values from my shiny app, I can get the selected variables into a vector and the selected values into a list of vectors, positionally matching the selected variables. For example:
selected_variables <- c('date', 'result')
selected_values <- list(c(1,2), c('W'))
What I think I then need is a generalizable function that will match up the filter calls with the correct variables. Something like:
#function that takes data frame, vector of selected variables, list of vectors of chosen values for each variable
#Returns a reduced table of selected variables, filtered values
table_reducer <- function(df, select_var, filter_values) {
#select the variables
out <- df %>%
select(all_of(select_var))
#now filter each variable by the values contained in the list
out <- [for loop that iterates over select_var, filter_values, filtering accordingly]
out #return out
}
My thinking would be to use an equivalent of Python's zip, but all my searching on that just points me to mapply, and I can't see how to use that within the for loop (which I also know is not always approved of in R, but I am talking about a relatively small number of iterations). If there is a better solution to this, I would welcome it.
Here's a 1-liner table_reducer function in base R -
table_reducer <- function(df, select_var, filter_values) {
subset(df, Reduce(`&`, Map(`%in%`, df[select_var], filter_values)))
}
selected_variables <- c('date', 'result')
selected_values <- list(c(1,2), c('W'))
table_reducer(df, selected_variables, selected_values)
#  date time place result
#1    1    a     A      W
#2    2    b     A      W
#4    2    e     H      W
#5    2    b     A      W
Map is a wrapper over mapply so you were right in thinking that you should use mapply for this task. This answer is also free of dreaded for loops.
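Note that the one-liner keeps every column of df, whereas the question also asked for only the selected variables to be returned. A minimal sketch of a variant that does both is given here, written with an explicit loop to show the zip-like pairing of each variable with its values; `table_reducer_select` is a hypothetical name chosen to avoid clashing with the function above.

```r
df <- data.frame(date = c(1, 2, 3, 2, 2, 3, 1),
                 time = c('a', 'b', 'c', 'e', 'b', 'a', 'e'),
                 place = c('A', 'A', 'A', 'H', 'A', 'H', 'H'),
                 result = c('W', 'W', 'L', 'W', 'W', 'L', 'L'))

table_reducer_select <- function(df, select_var, filter_values) {
  # Start by keeping every row, then narrow down one variable at a time
  keep <- rep(TRUE, nrow(df))
  for (i in seq_along(select_var)) {
    # Pair the i-th variable with the i-th vector of allowed values
    keep <- keep & df[[select_var[i]]] %in% filter_values[[i]]
  }
  # Return only the matching rows and only the selected columns
  df[keep, select_var, drop = FALSE]
}

out <- table_reducer_select(df, c('date', 'result'), list(c(1, 2), c('W')))
```

With the question's example selection, this keeps the same four rows as the output above but returns only the date and result columns.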

How to OR Loop in R

I have a data set with 100 values and want to pick only specific items from that data set. That's how I do it right now:
df.match <- subset(df.raw.csv, value == "UC9d" | value == "UCenoM")
It's working but I want to solve it with a loop. I tried this but I only get one match. Although I know both values are in the data set.
for (ID in c("UC9d" , "UCenoM")){df.match <- subset(df.raw.csv, value == ID)}
Any suggestions?
My suggestion would be not to use loops in R:
library(dplyr)
mydata <- mutate(mydata, TOBEINCL = 0) #rename according to your data
Create a character vector of patterns to match against mydata$ID (^ and $ anchor each pattern so that only exact matches count):
toMatch <- c("^UC9d$", "^UCenoM$")
Use pattern matching from base R to flag the matching rows:
mydata$TOBEINCL[grep(paste(toMatch, collapse = "|"), mydata$ID, ignore.case = FALSE)] <- 1
Select data:
mydataINCL <- mydata[(mydata$TOBEINCL==1) , ]
mydataINCL$ID <- factor(mydataINCL$ID) #sometimes R sticks with the old values
An option:
df.match <- subset(df.raw.csv, value %in% c("UC9d", "UCenoM"))
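The loop in the question returns only one match because df.match is overwritten on every iteration, so only the result for the last ID survives. If a loop is really wanted, the matches have to be accumulated. The sketch below uses a small made-up data frame (the column n and the non-matching values are assumptions for illustration) and compares the loop with the vectorised %in% form.

```r
# Made-up stand-in for the real data
df.raw.csv <- data.frame(value = c("UC9d", "abc", "UCenoM", "xyz"),
                         n = 1:4,
                         stringsAsFactors = FALSE)
ids <- c("UC9d", "UCenoM")

# Loop version: start from an empty data frame and append each ID's rows
df.match <- df.raw.csv[0, ]
for (ID in ids) {
  df.match <- rbind(df.match, df.raw.csv[df.raw.csv$value == ID, ])
}

# Vectorised version: one pass, no loop
df.match2 <- df.raw.csv[df.raw.csv$value %in% ids, ]
```

Both versions select the same rows here; the %in% form is shorter and avoids growing a data frame inside a loop.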

Calculate length of each object in R

I would like to calculate the length of many objects in R and return those objects with the name-prefix 'length_'. However, when I type this code:
A <- c('A', 'B', '3')
B <- c('A', '2')
files <- ls()
for (i in 1:length(files)) assign(paste("length_",files[i], sep = ""), length(unlist(files[i])))
This returns the vectors length_A and length_B, but each with the value 1 and not 3 and 2.
Thank you for any help,
Paul
p.s. I actually would like to apply this to a different function instead of length (GC.content from package ape to calculate GC content of DNA-sequences), but with that function I have the same problem as with the abovementioned example.
In R 3.2.0, the lengths function was introduced, which calculates the length of each element of a list. Using this function, as @docendo-discimus notes in the comments, a very compact (and R-like) solution is
lengths(mget(ls()))
which returns a named vector
A B
3 2
mget returns a named list of the objects whose names it is given; think of it as a "multiple get."
A <- c('A', 'B', '3')
B <- c('A', '2')
files <- ls()
for (i in 1:length(files)) assign(paste("length_",files[i], sep = ""), length(get(files[i])))
This creates length_A with value 3 and length_B with value 2.
A <- c('A', 'B', '3')
B <- c('A', '2')
files <- list(A,B)
sapply(files,length)
This will give you the answer, but I don't know if it's what you want.
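The original loop returns 1 each time because files[i] is the object's name, a single character string, rather than the object itself. For applying some other function (such as the GC.content case mentioned in the question), one possible pattern is to fetch the objects by name with mget and apply the function over the resulting list. In this sketch `obj_names` and `res` are illustrative names, and length stands in for whatever function is actually needed.

```r
A <- c('A', 'B', '3')
B <- c('A', '2')
obj_names <- c("A", "B")

# The name itself is one string, hence the 1s in the original loop
name_len <- length(obj_names[1])

# Fetch the objects by name from the current environment
objs <- mget(obj_names, envir = environment())

# Apply any function to each object; swap length for another function as needed
res <- sapply(objs, length)
names(res) <- paste0("length_", names(res))
```

This yields one named vector rather than separate length_A and length_B objects, which is usually easier to work with than variables created through assign.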

Case usage in R: Count number of events from Table 2 when case in Table 1 satisfy specific restrictions

The DF for Table 1 is like this:
df1 <- data.frame(ID = c('001','001','002','003', '003', '003'),
date = c('2015-05-23', '2015-07-29', '2015-08-08', '2015-06-10', '2015-10-12', '2015-11-15'),
date_last = c('2015-01-20', '2015-05-23', '2015-05-15', '2015-01-20', '2015-06-10', '2015-10-12'))
And the DF for Table 2 is like this:
df2 <- data.frame(Event = c('A', 'B', 'C', 'D', 'E'),
Event_date = c('2015-01-21', '2015-01-21', '2015-03-29', '2015-08-12', '2015-10-12'))
What I want is, for each row of df1, to count the events in df2 that satisfy df1$date_last < df2$Event_date < df1$date, counting each such event as 1 and summing them over the time period. The ideal result would look like the following:
df3 <- data.frame(ID = c('001','001','002','003', '003', '003'),
date = c('2015-05-23', '2015-07-29', '2015-08-08', '2015-06-10', '2015-10-12', '2015-11-15'),
date_last = c('2015-01-20', '2015-05-23', '2015-05-15', '2015-01-20', '2015-06-10', '2015-10-12'),
number_of_events = c(3,1,0,3,1,0))
Anyone know the R code for this? Thank you so much!
Make sure that all your dates are of class Date. You can do this by wrapping the date columns in as.Date() when creating the data frames.
First define a function with x being a vector with end and start date respectively, and y being a vector with dates that should be checked.
nr_events_in_between <- function(x, y) sum(x[2] < y & x[1] > y)
Apply this to all rows in df1 and you get the number_of_events column.
apply(df1[ ,c('date', 'date_last')], 1, nr_events_in_between, df2[,'Event_date'])
(Note that for the second row the value is 0 not 1 as you state in the example for df3)
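Putting the pieces together, a self-contained sketch with the dates converted up front might look like this. It iterates over row indices with vapply instead of apply, so the Date columns are compared directly rather than going through a character matrix; the strict inequalities follow the condition stated in the question.

```r
df1 <- data.frame(
  ID = c('001', '001', '002', '003', '003', '003'),
  date = as.Date(c('2015-05-23', '2015-07-29', '2015-08-08',
                   '2015-06-10', '2015-10-12', '2015-11-15')),
  date_last = as.Date(c('2015-01-20', '2015-05-23', '2015-05-15',
                        '2015-01-20', '2015-06-10', '2015-10-12'))
)
df2 <- data.frame(
  Event = c('A', 'B', 'C', 'D', 'E'),
  Event_date = as.Date(c('2015-01-21', '2015-01-21', '2015-03-29',
                         '2015-08-12', '2015-10-12'))
)

# For each row of df1, count events strictly between date_last and date
df1$number_of_events <- vapply(
  seq_len(nrow(df1)),
  function(i) sum(df2$Event_date > df1$date_last[i] &
                  df2$Event_date < df1$date[i]),
  integer(1)
)
```

Events that fall exactly on date_last or date are not counted; change < and > to <= and >= if the boundaries should be included.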

Resources