I want to iterate over the unique values of a column in a data frame in R. Here is an extract:
for(id in unique(df$event_id)) {
  df_id = df %>% filter(event_id == id)
  if(!any(df_id$value == "test")) {
    next
  }
  # function and bind_rows based on current id
  segments = get_segments(df_id)
  all_segments <- bind_rows(all_segments, segments)
}
I get the following error for one unique ID:
Error in if (!any(df_id$value == "test")) { : Missing value where TRUE/FALSE is needed
The row that triggers this error has an NA in the df_id$value column. How can I skip it without an error message? I presumably have to change the if condition somehow.
Try passing na.rm = TRUE to any() so that NA values are ignored and the condition is always a single TRUE or FALSE:
if(!any(df_id$value == "test", na.rm = TRUE)) {
  next
}
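To see why the original condition fails, here is a minimal sketch with a hypothetical `value` vector standing in for `df_id$value` (the vector contents are an assumption, not the asker's data). It compares the plain `any()` call with two NA-safe variants.

```r
# Hypothetical values for one event_id; the NA mimics the problematic row.
value <- c(NA, "other")

# Without na.rm, any() returns NA when no element is TRUE but some are NA,
# and if() cannot work with NA.
plain <- any(value == "test")

# With na.rm = TRUE, the NA comparisons are dropped before any() decides.
safe <- any(value == "test", na.rm = TRUE)

# %in% never returns NA, so it is another way to write the same test.
also_safe <- "test" %in% value

print(plain)
print(safe)
print(also_safe)
```

Either NA-safe form can replace the condition inside the loop; `isTRUE(any(...))` would be a third option.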
I have a dataframe in R as follows
df <-
as.data.frame(cbind(c(1,2,3,4,5), c(0,1,2,3,4),c(1,2,4,5,6)))
and I have a function in which I want the procedure to stop and display a message if the input df contains at least one 0 value. I tried the following but can't make it work properly. What is the correct if() statement I should use?
my_function <- function(df){
if (all(df == 0) == 'TRUE')
stop(paste("invalid input df"))
}
We could use %in%
my_function <- function(df) {
if(0 %in% unlist(df)) {
stop("invalid input df")
}
}
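An alternative to `%in%` is to compare the whole data frame with 0 and collapse the result with `any()`. The sketch below assumes a hypothetical helper name `check_no_zero` and two made-up data frames; `na.rm = TRUE` is added so that NA cells do not turn the condition into NA.

```r
# Hypothetical helper: stops when any cell of df equals 0.
check_no_zero <- function(df) {
  # df == 0 yields a logical matrix; any() collapses it to one value.
  # na.rm = TRUE keeps NA cells from making the result NA.
  if (any(df == 0, na.rm = TRUE)) {
    stop("invalid input df")
  }
  invisible(TRUE)
}

# Made-up inputs for illustration.
ok  <- data.frame(a = c(1, 2), b = c(3, NA))
bad <- data.frame(a = c(1, 0), b = c(3, 4))

check_no_zero(ok)  # passes silently

# Capture the error message instead of aborting the script.
result <- tryCatch(check_no_zero(bad),
                   error = function(e) conditionMessage(e))
print(result)
```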
I have a data frame that contains a column with binary variables (pointed or broad). To do my calculations I need to replace them with 0 or 1. I want to write a for loop which is doing this for me.
My code:
binary_To_Number<-function(df)
{
  for(i in df)
  {
    if(i=="pointed")
    {
      i<-1
    }
    else if(i=="broad")
    {
      i<-0
    }
    else if(is.na(i))
    {
      print("NA")
    }
    else
    {
    }
  }
}
binary_To_Number(town$shape)
I tried to use this piece of code. My first problem with it is that I don't know how to save the results. So my code is changing the i temporarily but won't save it in the df. I know that you can create an empty storage vector to store results in it, but can I replace the variable in my df immediately?
The second problem is that my code stops and gives me an error message if it comes to an i which contains NA.
Error in if (i == "pointed") { : missing value where TRUE/FALSE needed
Is there something I can do about it or do I need to replace the NA with a placeholder first?
You can also use dplyr (ensures 0 for not pointed):
library(dplyr)
df <- df %>%
mutate(
isPointed = as.integer(tolower(shape) == 'pointed')
)
Output:
shape isPointed
1 Pointed 1
2 broad 0
3 pointed 1
The dataframe I used:
df <- data.frame(
shape = c('Pointed', 'broad', 'pointed'),
stringsAsFactors = FALSE
)
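For the NA part of the question, a vectorized comparison in base R avoids `if()` altogether, so no placeholder is needed and the result can be stored directly in the data frame. This is a minimal sketch; the `town` data frame and the `shape_num` column name below are made up for illustration.

```r
# Hypothetical data with an NA, similar to the situation in the question.
town <- data.frame(
  shape = c("pointed", "broad", NA, "pointed"),
  stringsAsFactors = FALSE
)

# ifelse() works on the whole column at once; where the comparison is NA
# the result is NA, so missing values are carried through without error.
town$shape_num <- ifelse(town$shape == "pointed", 1L,
                         ifelse(town$shape == "broad", 0L, NA_integer_))

print(town)
```

Assigning to `town$shape` instead of `town$shape_num` would overwrite the original column in place.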
I am writing a function that filters data from a data table named random according to an id value, but I want it to check first whether the data table actually contains any rows. I wrote an if statement using is.null for the condition, but it does not work: the function falls through to the else branch and then throws an error.
The code is posted below, please help me
new.filterID <- function(DataTable,id) {
if(DataTable == is.null){
return(print("No Data Available: ",id))
} else { filtered <- subset(DataTable, ID == id)
return(aggregate(Value ~ YEAR_WW, filtered, mean))
}
}
filteredData <- new.filterID(random, 213)
The error I get when I run this is
Error in aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...) :
no rows to aggregate
Also below the empty data table random
Value YEAR_WW
I think you can use nrow if you just want to check whether the number of rows is zero:
new.filterID <- function(DataTable,id) {
if(nrow(DataTable) == 0){
return(print(paste("No Data Available:", id)))
} else {
filtered <- subset(DataTable, ID == id)
return(aggregate(Value ~ YEAR_WW, filtered, mean))
}
}
filteredData <- new.filterID(random, 213)
However, if you want to check whether the data.table is completely empty (no rows and no columns), you can also test whether it has any columns:
new.filterID <- function(DataTable,id) {
if(nrow(DataTable) == 0 && length(names(DataTable)) == 0){
return(print(paste("No Data Available:", id)))
} else {
filtered <- subset(DataTable, ID == id)
return(aggregate(Value ~ YEAR_WW, filtered, mean))
}
}
filteredData <- new.filterID(random, 213)
data.table does not yet have a dedicated method for checking whether a table is null.
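Note that "no rows to aggregate" can also occur when the table has rows but none of them match the id. A possible sketch that guards against both cases follows; the function name `filter_mean_by_id` and the sample data are assumptions for illustration, and `message()` is used instead of `print()` so that the function can return NULL.

```r
# Hypothetical variant of the function with two guards.
filter_mean_by_id <- function(DataTable, id) {
  # Guard 1: the input is NULL or has no rows at all.
  if (is.null(DataTable) || nrow(DataTable) == 0) {
    message("No Data Available: ", id)
    return(NULL)
  }
  filtered <- subset(DataTable, ID == id)
  # Guard 2: the table has rows, but none for this id.
  if (nrow(filtered) == 0) {
    message("No Data Available: ", id)
    return(NULL)
  }
  aggregate(Value ~ YEAR_WW, filtered, mean)
}

# Made-up sample data with the column names used in the question.
sample_data <- data.frame(
  ID      = c(213, 213, 7),
  Value   = c(1, 3, 10),
  YEAR_WW = c("2015_01", "2015_01", "2015_02")
)

res <- filter_mean_by_id(sample_data, 213)
print(res)
```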
I am trying to do the following:
If there is nothing in the dataframe, print "no_match".
If there is something, bind it to the ID of dataframe df2:
if(df == []){
print("nomatch")
}else{
cbind(df, df2$id2)
}
You could get the information about the dimensions of your data frame via dim. For example running the code:
data(mtcars)
dim(mtcars)
will show you the dimensions:
[1] 32 11
For a NULL object you would get:
mtcars <- NULL
dim(mtcars)
NULL
dim also handles a data.frame with no rows. After reloading the data and dropping every row:
data(mtcars)
mtcars <- mtcars[-c(1:dim(mtcars)[1]),]
you will get
> dim(mtcars)
[1] 0 11
IF statements
Constructing if statements is very simple; depending on what you want to check, you can do one of the following.
Object is NULL
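The object is NULL, so it has no rows and no columns. Use is.null here, because comparing with == NULL returns a zero-length result that if() cannot evaluate.
if (is.null(df)) {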
}
No rows
This data frame has columns but no observations.
if (dim(df)[1] == 0) {
}
No columns
The object is still of class data.frame but has no data.
if (dim(df)[2] == 0) {
}
If of interest, you can construct such an object like this:
data(mtcars)
mtcars <- mtcars[,-c(1:dim(mtcars)[2])]
Naturally, you can combine these conditions to check whether the data frame is empty in either or both senses.
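Combining the three checks could look like the sketch below. The helper name `is_empty_df` is hypothetical; the NULL test comes first so that `nrow()` and `ncol()` are only called on a real object.

```r
# Hypothetical helper that treats NULL, zero rows, or zero columns as empty.
is_empty_df <- function(df) {
  # || short-circuits, so nrow()/ncol() are skipped when df is NULL.
  is.null(df) || nrow(df) == 0 || ncol(df) == 0
}

data(mtcars)
print(is_empty_df(NULL))          # NULL object
print(is_empty_df(mtcars[0, ]))   # columns but no rows
print(is_empty_df(mtcars[, 0]))   # rows but no columns
print(is_empty_df(mtcars))        # regular data frame
```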
It depends: is your data.frame actually empty, or are all of its elements something you consider empty?
If the data.frame is empty you can use nrow as a simple check.
tmp <- data.frame(A = numeric())
nrow(tmp)
[1] 0
if(nrow(tmp) == 0){
print("data.frame is empty")
}else{
print("data.frame contains data")
}
EDIT - OP asks about object existence
You can check if an object has been defined with exists
exists("tmp2")
[1] FALSE
exists("tmp")
[1] TRUE
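Does max(dim(df)) == 0 do the trick? Note that it is TRUE only when the data frame has neither rows nor columns; for a data frame with columns but zero rows, nrow(df) == 0 is the check to use.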
if (max(dim(df)) == 0) {
print("nomatch")
} else {
cbind(df, df2$id2)
}
I've written the following code, which computes the mean over all rows that share the same ID. If I call pollutemean(directory,pollutant,id) with a single ID, I get a numeric result. However, if I pass a vector identifying more than one ID, for example pollutemean(directory,pollutant,1:15), the code breaks. How can I make it work for both cases?
CODE:
pollutemean <- function(directory,pollutant,id) {
alldata <- lapply(list.files(directory, full.names=TRUE),read.csv,header=TRUE)
alldatamerged<-do.call(rbind,alldata)
if (pollutant=="sulfate") {
allsulfatedata <- alldatamerged[c("Date","sulfate","ID")]
allsulfatedatabyid<-allsulfatedata[allsulfatedata$"ID"==id,]
completesulfatedatabyid<-na.omit(allsulfatedatabyid)
print(mean(completesulfatedatabyid$sulfate))
}
}
OUTPUT:
pollutemean("specdata","sulfate",8)
[1] 4.781354
pollutemean("specdata","sulfate",1:8)
[1] 4.252498
Warning message:
In allsulfatedata$ID == id :
longer object length is not a multiple of shorter object length
Try this:
allsulfatedatabyid<-allsulfatedata[allsulfatedata$ID %in% id, ]
Or, equivalently, with subset():
allsulfatedatabyid<-subset(allsulfatedata, ID %in% id)
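The warning in the question comes from `==` recycling the shorter vector element by element, which is also why the reported mean is computed on the wrong rows. A small sketch with made-up ID values shows the difference from `%in%`:

```r
# Made-up ID column and a vector of requested ids.
ID <- c(1, 2, 3, 1, 2, 3, 1)
id <- 1:2

# == recycles id as 1,2,1,2,1,2,1 and compares position by position,
# so matches depend on row order (warning suppressed for the demo).
eq_match <- suppressWarnings(ID == id)

# %in% tests each ID against the whole set of requested ids.
in_match <- ID %in% id

print(sum(eq_match))  # rows selected by ==
print(sum(in_match))  # rows selected by %in%
```

Here `==` selects 3 rows, while `%in%` selects all 5 rows whose ID is 1 or 2.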