R: Create a function with optional parameters without if statements

I want to create a function with optional parameters. I am working on a dataset with income, states, and a few flags. The function takes the data, a state, and a flag, and filters the data accordingly. If both state and flag are missing, it should return the entire dataset; if only one is missing, it should filter on the other. So far I have the code below, which works fine. However, I want to know if there is a simpler way to write it.
myfun <- function(data, states, flag) {
  if (missing(states)) {
    if (missing(flag)) {
      new_data <- data
    } else {
      new_data <- subset(data, data$Flag == flag)
    }
  } else if (missing(flag)) {
    new_data <- subset(data, data$State == states)
  } else {
    new_data <- subset(data, (data$State == states & data$Flag == flag))
  }
  temp_data <- toJSON(new_data)
  save(temp_data, file = "Mydata.JSON")
  return(new_data)
}
I know we can make a parameter optional by giving it a default value, such as flag = "Y". But how do I pass several values as the default, i.e. both "Y" and "N" for the flag, or all 50 states? I am new to R and any help would be greatly appreciated.
Update: Got the solution I was looking for, thanks to Daniel Winkler:
test2 <- function(data, states = unique(data$State), flag = c("Y", "N")) {
  my_data <- subset(data, (data$State %in% states & data$Flag %in% flag))
}

For passing multiple values to the function:
myfun <- function(data, states, flag) {
  if (missing(states)) {
    if (missing(flag)) {
      new_data <- data
    } else {
      new_data <- subset(data, data$Flag %in% flag)
    }
  } else if (missing(flag)) {
    new_data <- subset(data, data$State %in% states)
  } else {
    new_data <- subset(data, (data$State %in% states & data$Flag %in% flag))
  }
  temp_data <- toJSON(new_data)
  save(temp_data, file = "Mydata.JSON")
  return(new_data)
}
Then call with:
all_states <- unique(mydata$State)
myfun(data = mydata, states = all_states, flag = c('Y', 'N'))
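To drop the nested if statements entirely, here is a minimal sketch along the lines of the update above. Default arguments in R are evaluated lazily inside the function, so a default can be computed from `data`. The toy data frame `income` and the function name `filter_income` are made up for illustration, and the JSON export is left out.

```r
# Toy data standing in for the real dataset (hypothetical values).
income <- data.frame(
  State  = c("NY", "NY", "CA", "TX"),
  Flag   = c("Y", "N", "Y", "N"),
  Income = c(50, 60, 70, 80),
  stringsAsFactors = FALSE
)

# Defaults are evaluated lazily, so they can be computed from `data`.
# Leaving an argument out therefore means "keep every value".
filter_income <- function(data,
                          states = unique(data$State),
                          flag = unique(data$Flag)) {
  data[data$State %in% states & data$Flag %in% flag, , drop = FALSE]
}

all_rows  <- filter_income(income)                           # no filtering
ny_only   <- filter_income(income, states = "NY")            # one state, any flag
ny_ca_yes <- filter_income(income, states = c("NY", "CA"), flag = "Y")
```

Note that `save()` writes R's own serialized format whatever the file extension is. If the goal is a JSON text file, writing the string returned by `toJSON()` with something like `writeLines(temp_data, "Mydata.json")` is probably closer to the intent.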

Related

Time conversion doesn't work on a data frame column

I have some data from an event producer. The "created_at" column contains mixed types of datetime values:
some NA, some ISO 8601-like strings, and some POSIX timestamps with and without milliseconds.
I built a function that should handle all of them, meaning it leaves NA and ISO 8601 values as they are and converts POSIX timestamps to ISO 8601.
library(anytime)
convert_time <- function(x) {
nb_char = nchar(x)
if (is.na(x)) return(x)
else if (nb_char == 10 | nb_char == 13) {
num_x = as.numeric(x)
if (nb_char == 13) {
num_x = round(num_x / 1000, 0)
}
return(anytime(num_x))
}
return(x)
}
If I pass a single problematic value:
convert_time("1613488656")
"2021-02-16 15:17:36 UTC"
it works well.
Now, applied to the whole column:
df_offer2$created_at = df_offer2$created_at %>% sapply(convert_time)
I still have the problematic values.
Any tips?
I would suggest the following small changes...
convert_time <- function(x) {
nb_char = nchar(x)
if (is.na(x)) return(x)
else if (nb_char == 10 | nb_char == 13) {
num_x = as.numeric(x)
if (nb_char == 13) {
num_x = round(num_x / 1000, 0)
}
return(num_x) #remove anytime from here
}
return(x)
}
df_offer2$created_at = df_offer2$created_at %>%
sapply(convert_time) %>% anytime() #put it back in at this point
Two things that have worked for me:
col1<-seq(from=1,to=10)
col2<-rep("1613488656",10)
df <- data.frame(cbind(col1,col2))
colnames(df)<-c("index","created_at")
df <- df%>%
  mutate(converted = convert_time(df$created_at))
Alternatively:
col1<-seq(from=1,to=10)
col2<-rep("1613488656",10)
df <- data.frame(cbind(col1,col2))
colnames(df)<-c("index","created_at")
df$created_at <- convert_time(df$created_at)
Both emit warnings but appear to make the correction properly.
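The warnings arise because `if (is.na(x))` receives a whole column, while `if` expects a single TRUE or FALSE. Since R 4.2.0 a condition longer than one element is an error rather than a warning. A vectorized sketch that avoids `if` altogether is shown here. It swaps anytime() for base R's as.POSIXct() so the example has no package dependency, it assumes the epoch values are in UTC, and the name `convert_time_vec` is hypothetical.

```r
# Vectorized sketch: convert 10-digit (seconds) and 13-digit (milliseconds)
# epoch strings to ISO 8601 text, leaving NA and everything else untouched.
convert_time_vec <- function(x) {
  x <- as.character(x)
  n <- nchar(x)

  # TRUE only for non-missing, purely numeric strings of length 10 or 13.
  is_epoch <- !is.na(x) & grepl("^[0-9]+$", x) & n %in% c(10, 13)

  secs <- rep(NA_real_, length(x))
  secs[is_epoch] <- as.numeric(x[is_epoch])

  # Millisecond timestamps are scaled down to seconds.
  is_ms <- is_epoch & n == 13
  secs[is_ms] <- round(secs[is_ms] / 1000)

  out <- x
  out[is_epoch] <- format(
    as.POSIXct(secs[is_epoch], origin = "1970-01-01", tz = "UTC"),
    "%Y-%m-%dT%H:%M:%SZ",
    tz = "UTC"
  )
  out
}

created_at <- c("1613488656", "1613488656000", NA, "2021-02-16T15:17:36Z")
converted <- convert_time_vec(created_at)
```

Because the result is a plain character vector, it can be assigned straight back to the column (for example `df_offer2$created_at <- convert_time_vec(df_offer2$created_at)`) without the class loss that `sapply()` causes when it simplifies mixed results.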

Best practice using multiple null arguments in writing R function

I'm writing a function that subsets a data frame based on the variables passed to it. I read in Advanced R that the is_null function should be used to check for null arguments. I've added two arguments, which already requires an if/else if/else if/else chain. I'm afraid that if I add many more arguments, the readability of the code will greatly suffer. Is my method best practice?
add_scores <- function(data,
                       study = NULL,
                       therapeutic_area = NULL) {
  if (is_null(study) & is_null(therapeutic_area)) {
    temp <- data
  } else if (!is_null(study) & is_null(therapeutic_area)) {
    temp <- data %>%
      filter(BC == study)
  } else if (is_null(study) & !is_null(therapeutic_area)) {
    temp <- data %>%
      filter(PPDDIVISIONPRI == therapeutic_area)
  } else {
    temp <- data %>%
      filter(
        BC == study &
          PPDDIVISIONPRI == therapeutic_area
      )
  }
  return(
    temp %>%
      mutate(
        ENROLLMENTRANK = dense_rank(desc(ENROLLMENTRATE)),
        CYCLETIMERANK = dense_rank(CYCLETIME) * 2,
        TOTALRANK = dense_rank(ENROLLMENTRANK + CYCLETIMERANK)
      ) %>%
      arrange(TOTALRANK, ENROLLMENTRANK, CYCLETIMERANK)
  )
}
Edited:
In your specific issue, you can separate out the if tests:
if(!is.null(study)) data <- filter(data, BC==study)
if(!is.null(therapeutic_area)) data <- filter(data, PPDDIVISIONPRI==therapeutic_area)
Otherwise, as you point out, the number of permutations will rapidly increase!
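The same idea scales to any number of optional filters if they are collected in a named list and applied one after another. The sketch below uses base R rather than dplyr so that it runs without packages. The helper name `apply_filters` and the toy data frame `trials` are hypothetical.

```r
# Apply each non-NULL filter in turn. `filters` is a named list whose names
# are column names and whose values are the allowed values for that column.
apply_filters <- function(data, filters) {
  for (col in names(filters)) {
    allowed <- filters[[col]]
    if (!is.null(allowed)) {
      data <- data[data[[col]] %in% allowed, , drop = FALSE]
    }
  }
  data
}

# Toy data (made-up values) using the column names from the question.
trials <- data.frame(
  BC             = c("A", "A", "B"),
  PPDDIVISIONPRI = c("Onc", "Neuro", "Onc"),
  stringsAsFactors = FALSE
)

no_filter  <- apply_filters(trials, list(BC = NULL, PPDDIVISIONPRI = NULL))
study_only <- apply_filters(trials, list(BC = "A", PPDDIVISIONPRI = NULL))
both       <- apply_filters(trials, list(BC = "A", PPDDIVISIONPRI = "Onc"))
```

Adding another optional argument then means adding one more entry to the list instead of doubling the number of branches.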

How to check if data table has empty rows?

I am writing a function that filters data from a data table named random according to an id value, but I want it to first check whether the data table actually contains any rows. I ended up writing an if statement using is.null, but the condition is not working: execution falls through to the else branch and then gives me an error.
The code is posted below. Please help.
new.filterID <- function(DataTable, id) {
  if (DataTable == is.null) {
    return(print("No Data Available: ", id))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
The error I get when I run this is
Error in aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...) :
no rows to aggregate
The empty data table random is shown below:
Value YEAR_WW
I think you can use nrow if you just want to check whether the number of rows is zero:
new.filterID <- function(DataTable, id) {
  if (nrow(DataTable) == 0) {
    # paste0() builds the message; print()'s second argument is `digits`, not text
    return(print(paste0("No Data Available: ", id)))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
However, if you want to check whether the data.table is null, you can check whether it has any columns:
new.filterID <- function(DataTable, id) {
  # && is the scalar "and", which is what an if condition expects
  if (nrow(DataTable) == 0 && length(names(DataTable)) == 0) {
    return(print(paste0("No Data Available: ", id)))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
data.table does not yet have a method to check whether a table is null.
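Since aggregate() raises the "no rows to aggregate" error whenever the data it receives is empty, it may also be worth checking the filtered subset and not only the input. A minimal base R sketch under that assumption follows. The function name `filter_mean_by_week` and the toy table are hypothetical, and returning NULL for "no data" is just one possible convention.

```r
# Return the mean Value per YEAR_WW for one ID, or NULL when there is
# nothing to aggregate (either the input or the filtered subset is empty).
filter_mean_by_week <- function(tbl, id) {
  if (is.null(tbl) || nrow(tbl) == 0) {
    message("No Data Available: ", id)
    return(NULL)
  }
  # which() drops rows where ID is NA instead of turning them into NA rows.
  filtered <- tbl[which(tbl$ID == id), , drop = FALSE]
  if (nrow(filtered) == 0) {
    message("No rows for ID: ", id)
    return(NULL)
  }
  aggregate(Value ~ YEAR_WW, data = filtered, FUN = mean)
}

# Toy data (made-up values).
toy <- data.frame(
  ID      = c(213, 213, 213, 7),
  Value   = c(1, 3, 5, 9),
  YEAR_WW = c("2020_01", "2020_01", "2020_02", "2020_01"),
  stringsAsFactors = FALSE
)
empty <- toy[0, ]

weekly_means <- filter_mean_by_week(toy, 213)
from_empty   <- filter_mean_by_week(empty, 213)
unknown_id   <- filter_mean_by_week(toy, 999)
```

The caller can then test the result with `is.null()` before using it.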

Passing argument to subset() and unique()

I am using the phyloseq package.
test <- function( ...){
bar <- unique(sampleData[,'pH'])
foo <- subset_samples(phyloseqObject, pH == as.numeric(bar[1]@.Data))
print(foo)
}
test(pH)
I want to pass pH as an argument to test() but unique() won't accept it as valid. I can pass 'pH' to test() but subset_samples() won't accept that as valid. I have tried coercing the argument to several different types with no luck.
Source of subset_samples:
subset_samples <- function(physeq, ...){
if( is.null(sample_data(physeq)) ){
cat("Nothing subset. No sample_data in physeq.\n")
return(physeq)
} else {
oldDF <- as(sample_data(physeq), "data.frame")
newDF <- subset(oldDF, ...)
if( class(physeq) == "sample_data" ){
return(sample_data(newDF))
} else {
sample_data(physeq) <- sample_data(newDF)
return(physeq)
}
}
}
Try this instead:
test=function(x,...){
bar=unique(mtcars[,x])
foo=subset(mtcars,mtcars[,x]==bar[1])
return(foo)
}
Building on what @desc said, I managed to solve it like this:
test <- function(...){
bar <- unique(sampleData[,...])
foo <- subset_samples(phyloseqObject, eval(parse(bar@names)) == as.numeric(bar[1]))
print(foo)
}
test('pH')

Quirky behavior with rvest Wikipedia scraping

Sorry, this is a weird question, but I just can't seem to figure it out myself. The good news is: I think it's totally reproducible.
I'm trying to build a simple R function to use {rvest} to scrape wikipedia for the hometown of musicians. Basically, the function I wrote works, but for certain artists, it doesn't work (returns NULL). (Randy Newman is one such, so I'll use him as an example.)
When I just run the whole thing (below) and then findHome("randy newman") I get NULL but then when I attempt to debug, I run the tableMusic() function and then artist <- "randy newman" and then run all the guts of the artistData() function line by line, IT WORKS!
AND THEN, once I've done that, I can run findHome("randy newman") and it will work right. What gives?! Do I have something in the wrong order or something? I can't seem to figure it out.
Any help is much appreciated. Here is the code:
library(rvest)
findHome <- function(artist) {
##function to look for the table with the right info
tableMusic <- function(data) {
if(!any(grepl("years active|labels|instruments", data[,1], ignore.case=T))) {
for (i in 2:5) {
data <- try(url %>% html %>% html_nodes(xpath=paste('//*[@id="mw-content-text"]/table[', i, ']', sep="")) %>% html_table(fill=T), silent=T)
if(!class(data)=="try-error" & length(data)>0) {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
if(any(grepl("years active|labels|instruments", data[,1], ignore.case=T))) {
break
}
}
}
}
if(class(data)=="try-error" | length(data)<1) {
data <- NULL
} else if (!any(grepl("years active|labels|instruments", data[,1], ignore.case=T))) {
data <- NULL
}
data
}
#function to pull data and try different pages if the first is wrong
artistData <- function(artist) {
artist <- gsub(" ", "_", artist)
artist <- gsub("'", "%27", artist)
##first try getting the data
url <- paste("https://en.wikipedia.org/wiki/", artist, sep="")
data <- try(url %>% html %>% html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>% html_table(fill=T), silent=T)
##check if it's the right page (deal with disambiguation issues)
if(!class(data)=="try-error" & length(data)>0) {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
data <- tableMusic(data)
}
## if try-error or musicTable==NULL, try _(band)
if(class(data)=="try-error" | is.null(data) | length(data)<1) {
url <- paste("https://en.wikipedia.org/wiki/", artist, "_(band)", sep="")
data <- try(url %>% html %>% html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>% html_table(fill=T), silent=T)
if(class(data)=="try-error"){
data <- NULL
} else {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
data <- tableMusic(data)
}
} else {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
}
## if try-error or musicTable==NULL, try _(musician)
if(class(data)=="try-error" | is.null(data) | length(data)<1) {
url <- paste("https://en.wikipedia.org/wiki/", artist, "_(musician)", sep="")
data <- try(url %>% html %>% html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>% html_table(fill=T), silent=T)
if(class(data)=="try-error"){
data <- NULL
} else {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
data <- tableMusic(data)
}
} else {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
}
data
}
## first try finding data
data <- artistData(artist)
## try finding with and/&
if(is.null(data)){data <- artistData(unlist(strsplit(artist, " and| &"))[1])}
## if no matches return ""
if(class(data)=="try-error" | is.null(data)) {
data <- ""
return()
} else {
if(class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=F)}
}
## if we have a matching page, pull the relevant data
origin <- data[data[,1]=="Origin",2]
if(length(origin)>0) {
home <- origin
} else {
born <- data[data[,1]=="Born",2]
if (length(born)>0) {
home <- unlist(strsplit(born, "age.[0-9]+)"))[2]
} else {
home <- ""
}
}
home
}
findHome("randy newman")
I figured it out. I had to add a url parameter to the tableMusic() function. As written, it was recycling the url from past searches. Thanks for the suggestion.
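The behavior follows from R's lexical scoping: a variable created inside one function is not visible to a sibling function, which instead looks it up where it was defined and then in the global environment. In the original code a free `url` would presumably resolve either to a leftover global `url` from a debugging session or to base R's `url()` function. A stripped-down sketch of the pattern and the fix follows; all names here (`lookup_buggy`, `lookup_fixed`, `page_url`) are hypothetical.

```r
# Buggy pattern: `helper` uses `page_url`, but that variable is created
# inside `fetch`, a sibling function, so `helper` cannot see it.
lookup_buggy <- function(name) {
  helper <- function() paste("looking up", page_url)
  fetch <- function(name) {
    page_url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", name))
    helper()
  }
  fetch(name)
}

# In a clean session this call fails because `page_url` is not found. If a
# global `page_url` is left over from debugging, that stale value is
# silently used instead, which matches the symptoms in the question.
buggy_result <- tryCatch(
  lookup_buggy("randy newman"),
  error = function(e) conditionMessage(e)
)

# Fix: pass the value explicitly so the helper never depends on outside state.
lookup_fixed <- function(name) {
  helper <- function(page_url) paste("looking up", page_url)
  fetch <- function(name) {
    page_url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", name))
    helper(page_url)
  }
  fetch(name)
}

fixed_result <- lookup_fixed("randy newman")
```

Passing the value as a parameter, as the poster did with tableMusic(), makes the result independent of whatever happens to exist in the global environment.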
