How to check if data table has empty rows? - r

I am writing a function that filters some data out of a data table called random according to an id value, but I want it to first check whether the data table actually contains any rows. I ended up writing an if statement using is.null for the condition, but it is not working: it falls through to the else branch and then gives me an error.
The code is posted below; please help me.
new.filterID <- function(DataTable, id) {
  if (DataTable == is.null) {
    return(print("No Data Available: ", id))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
The error I get when I run this is
Error in aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...) :
no rows to aggregate
Also, here is the empty data table random:
Value YEAR_WW

I think you can use nrow if you just want to check whether the number of rows is zero:
new.filterID <- function(DataTable, id) {
  if (nrow(DataTable) == 0) {
    return(print(paste("No Data Available:", id)))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
However, if you want to check whether the data.table is null, you can additionally check whether it has any columns:
new.filterID <- function(DataTable, id) {
  if (nrow(DataTable) == 0 && length(names(DataTable)) == 0) {
    return(print(paste("No Data Available:", id)))
  } else {
    filtered <- subset(DataTable, ID == id)
    return(aggregate(Value ~ YEAR_WW, filtered, mean))
  }
}
filteredData <- new.filterID(random, 213)
data.table does not yet have a dedicated method for checking whether it is null.
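As a rough illustration of the difference between the two checks (a minimal sketch, assuming the data.table package is loaded), a table with columns but no rows fails only the first test, while a completely empty table fails both:
library(data.table)

dt_no_rows <- data.table(Value = numeric(), YEAR_WW = character())
nrow(dt_no_rows)           # 0 -> caught by the nrow() check
length(names(dt_no_rows))  # 2 -> it still has columns

dt_null <- data.table()
nrow(dt_null)              # 0
length(names(dt_null))     # 0 -> caught by the stricter combined check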

Related

Iterate over unique values in dataframe, skip some in R

I want to iterate over unique values in a dataframe in R; here is an extract:
for (id in unique(df$event_id)) {
  df_id = df %>% filter(event_id == id)
  if (!any(df_id$value == "test")) {
    next
  }
  # function and bind_rows based on current id
  segments = get_segments(df_id)
  all_segments <- bind_rows(all_segments, segments)
}
I get the following error for one unique ID:
Error in if (!any(df_id$value == "test")) { : Missing value
where TRUE/FALSE is needed
The relevant row for this error has an NA in the df_id$value column. How can I skip this without an error message? I have to change the if condition somehow.
Try modifying the if statement to account for NA values, for example by dropping them from the any() check:
if (!any(df_id$value == "test", na.rm = TRUE)) {
  next
}
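The underlying issue is that any() propagates NA; a small sketch of the behaviour (values made up for illustration):
vals <- c("a", NA, "b")
any(vals == "test")                # NA    -> if() fails with "missing value where TRUE/FALSE is needed"
any(vals == "test", na.rm = TRUE)  # FALSE -> safe to negate inside if()
any(vals %in% "test")              # FALSE -> %in% treats NA as a non-match, so this works too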

Check if a data frame contains at least one zero value inside an if statement in R

I have a dataframe in R as follows
df <- as.data.frame(cbind(c(1,2,3,4,5), c(0,1,2,3,4,5), c(1,2,4,5,6)))
and I have a function in which I want the procedure to stop and display a message if the input df contains at least one 0 value. I tried the following but can't make it work properly. What is the correct if() statement I should use?
my_function <- function(df) {
  if (all(df == 0) == 'TRUE')
    stop(paste("invalid input df"))
}
We could use %in%
my_function <- function(df) {
  if (0 %in% unlist(df)) {
    stop("invalid input df")
  }
}
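A quick usage sketch: the df from the question contains a zero, so the call stops, while a zero-free data frame passes through silently:
my_function(df)                                    # Error: invalid input df
my_function(data.frame(a = c(1, 2), b = c(3, 4)))  # no error, returns NULL invisibly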

Best practice using multiple null arguments in writing R function

I'm writing a function that subsets a dataframe based on the variables passed to it. I read in Advanced R to use the is_null function to check for null arguments. With just 2 arguments I already have an if/else if/else if/else chain; I'm afraid that if I add many more arguments, the readability of the code will greatly suffer. Is my method best practice?
add_scores <- function(data,
                       study = NULL,
                       therapeutic_area = NULL) {
  if (is_null(study) & is_null(therapeutic_area)) {
    temp <- data
  } else if (!is_null(study) & is_null(therapeutic_area)) {
    temp <- data %>%
      filter(BC == study)
  } else if (is_null(study) & !is_null(therapeutic_area)) {
    temp <- data %>%
      filter(PPDDIVISIONPRI == therapeutic_area)
  } else {
    temp <- data %>%
      filter(BC == study &
               PPDDIVISIONPRI == therapeutic_area)
  }
  return(
    temp %>%
      mutate(ENROLLMENTRANK = dense_rank(desc(ENROLLMENTRATE)),
             CYCLETIMERANK = dense_rank(CYCLETIME) * 2,
             TOTALRANK = dense_rank(ENROLLMENTRANK + CYCLETIMERANK)) %>%
      arrange(TOTALRANK, ENROLLMENTRANK, CYCLETIMERANK)
  )
}
Edited:
In your specific issue, you can separate out the if tests:
if(!is.null(study)) data <- filter(data, BC==study)
if(!is.null(therapeutic_area)) data <- filter(data, PPDDIVISIONPRI==therapeutic_area)
Otherwise, as you point out, the number of permutations will rapidly increase!
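Put together, a sketch of the whole function using that pattern (same column names as in the question; dplyr assumed to be attached) might look like this:
library(dplyr)

add_scores <- function(data,
                       study = NULL,
                       therapeutic_area = NULL) {
  # apply each filter only when its argument was supplied
  if (!is.null(study))            data <- filter(data, BC == study)
  if (!is.null(therapeutic_area)) data <- filter(data, PPDDIVISIONPRI == therapeutic_area)

  data %>%
    mutate(ENROLLMENTRANK = dense_rank(desc(ENROLLMENTRATE)),
           CYCLETIMERANK  = dense_rank(CYCLETIME) * 2,
           TOTALRANK      = dense_rank(ENROLLMENTRANK + CYCLETIMERANK)) %>%
    arrange(TOTALRANK, ENROLLMENTRANK, CYCLETIMERANK)
}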

How to write if/else statements if dataframe is empty in R

I am trying to do the following:
If there is nothing in the dataframe, print "no_match".
If there is something, bind it to the ID of dataframe df2:
if (df == []) {
  print("nomatch")
} else {
  cbind(df, df2$id2)
}
You could get information about the dimensions of your data frame via dim. For example, running the code:
data(mtcars)
dim(mtcars)
will show you the dimensions:
[1] 32 11
For a NULL object you would get:
mtcars <- NULL
dim(mtcars)
NULL
dim is quite flexible; in the case of a data.frame with no rows:
mtcars <- mtcars[-c(1:dim(mtcars)[1]),]
you will get
> dim(mtcars)
[1] 0 11
IF statements
Constructing if statements is very simple; depending on what you want to check, you can do the following.
Object is NULL
The object is NULL: no rows and no columns.
if (is.null(dim(df))) {
}
No rows
This data frame has columns but no observations.
if (dim(df)[1] == 0) {
}
No columns
The object is still of class data.frame but has no data.
if (dim(df)[2] == 0) {
}
You would construct such an object like this (if of interest):
data(mtcars)
mtcars <- mtcars[,-c(1:dim(mtcars)[2])]
Naturally, you can combine these conditions to check for either or both ways in which a data frame can be empty.
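As a minimal sketch of such a combined check applied to the original question (df and df2 as defined there; the helper name is just for illustration):
# "empty" here means NULL, or zero rows, or zero columns
is_empty_df <- function(df) {
  is.null(df) || nrow(df) == 0 || ncol(df) == 0
}

if (is_empty_df(df)) {
  print("no_match")
} else {
  result <- cbind(df, df2$id2)
}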
It depends: is your data.frame actually empty, or are all of its elements something you consider empty?
If the data.frame is empty you can use nrow as a simple check.
tmp <- data.frame(A = numeric())
nrow(tmp)
[1] 0
if (nrow(tmp) == 0) {
  print("data.frame is empty")
} else {
  print("data.frame contains data")
}
EDIT - OP asks about object existence
You can check if an object has been defined with exists
exists("tmp2")
[1] FALSE
exists("tmp")
[1] TRUE
Does max(dim(df)) == 0 do the trick?
if (max(dim(df)) == 0) {
  print("nomatch")
} else {
  cbind(df, df2$id2)
}

Delete data frame column within function

I have the following code:
df <- iris
library(svDialogs)
columnFunction <- function(x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length(column.D)) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    for (z in column.D) {
      x[[z]] <- NULL  # with this part I wanted to delete the selected columns
    }
  }
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically", so that x[[z]] <- NULL translates to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and that for every selected column of whatever data.frame is passed to the function.
Well, does anyone know how to achieve something like that? I tried several things with paste, sprintf, and deparse, but I didn't get it working. I also tried to address the data.frame as a global variable by using <<-, but that didn't help either (it's the first time I have even heard about that). It looks like I am missing the right way of transferring x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function(x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length(column.D)) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    x <- x[, !names(x) %in% column.D]
  }
  return(x)
}
df <- columnFunction(df)
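For comparison, the original x[[z]] <- NULL loop also works once the modified copy is returned and assigned back; a sketch along the same lines (svDialogs as above; the name columnFunction2 is just for illustration):
library(svDialogs)

columnFunction2 <- function(x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length(column.D)) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    for (z in column.D) {
      x[[z]] <- NULL  # drops the column from the local copy of x
    }
  }
  x  # return the (possibly modified) copy
}

df <- columnFunction2(df)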
