Creating <= / > conditions with date format in R

I have the following data frame:
id<-c(1,2,3,4)
date<-c("23-01-08","01-11-07","30-11-07","17-12-07")
df<-data.frame(id,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
In a 4th column of my table I want to split my data into "calib" and "valid" based on date: where date2 <= 2007-12-16 the 4th column should be "calib", otherwise it should be "valid".
I have written the following lines:
for ( i in 1:4)
if (df[i,3]<=2007-12-16)(df[i,4]="calib")else (df[i,4]="valid")
The first problem is that after running this loop every cell in the 4th column becomes "valid"; it seems the date condition is not evaluated correctly. So my first question is: how can I solve this problem?
The second problem is that my real data frame has 600,000 rows, and running this loop takes hours. Is there any way to perform this operation faster, using the full CPU capacity?
Thank you!

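The loop gives "valid" everywhere because 2007-12-16 is not a date: R evaluates it as arithmetic (2007 - 12 - 16 = 1979) and compares that number with your dates, which are stored internally as days since 1970-01-01, so every comparison is FALSE. Use as.Date("2007-12-16") instead. Also, R is vectorised, so you can do the whole thing in a single statement with no loop: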
R> df <- within(df,state <- ifelse(date2<=as.Date("2007-12-16"),"calib","valid"))
R> df
id date date2 state
1 1 23-01-08 2008-01-23 valid
2 2 01-11-07 2007-11-01 calib
3 3 30-11-07 2007-11-30 calib
4 4 17-12-07 2007-12-17 valid
R>
If within, with, or transform seem strange, you can also do it directly:
R> df$state <- ifelse(df$date2<=as.Date("2007-12-16"),"calib","valid")
R> df
id date date2 state
1 1 23-01-08 2008-01-23 valid
2 2 01-11-07 2007-11-01 calib
3 3 30-11-07 2007-11-30 calib
4 4 17-12-07 2007-12-17 valid
R>
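A minimal sketch of both points, using only base R: it first shows what the loop's cutoff actually evaluated to, then applies the same vectorised ifelse() to a simulated 600,000-row frame. The simulated frame `big` and its random dates are hypothetical stand-ins for the real data.

```r
# What the loop compared against: plain arithmetic, not a date
cutoff_wrong <- 2007-12-16           # evaluates to the number 1979
cutoff <- as.Date("2007-12-16")      # a real Date

d <- as.Date(c("2008-01-23", "2007-11-01", "2007-11-30", "2007-12-17"))
as.numeric(d)                        # days since 1970-01-01, all far above 1979
d <= cutoff_wrong                    # all FALSE, hence "valid" everywhere
state <- ifelse(d <= cutoff, "calib", "valid")

# Hypothetical large input: 600,000 random dates in 2007-2008
set.seed(1)
big <- data.frame(date2 = as.Date("2007-01-01") + sample(0:730, 600000, replace = TRUE))
# One vectorised call labels every row; no loop or parallelism involved
big$state <- ifelse(big$date2 <= cutoff, "calib", "valid")
```

Because the comparison and ifelse() each run once over the whole column, there is no per-row interpreter overhead, which is what made the loop slow.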

Related

Adding two columns of a data frame where col1 contains dates and col2 contains days

I have a data frame with two columns, date and days, and I want to add the days to the date and show the result in another column.
Data frame 1 (the date column is in mm/dd/yyyy format):
date days
3/2/2019 8
3/5/2019 4
3/6/2019 4
3/21/2019 3
3/25/2019 7
and I want my output like this:
date days new-date
3/2/2019 8 3/10/2019
3/5/2019 4 3/9/2019
3/6/2019 4 3/10/2019
3/21/2019 3 3/24/2019
3/25/2019 7 4/1/2019
I was trying this:
as.Date("3/10/2019") +8
but I think it only works for a single value.
Convert to actual Date values and then add the days. You need to specify the actual format of date (see ?strptime) when converting it to Date.
as.Date(df$date, "%m/%d/%Y") + df$days
#[1] "2019-03-10" "2019-03-09" "2019-03-10" "2019-03-24" "2019-04-01"
If you want the output back in the same format, use format (note that %m and %d are zero-padded, so 3/10/2019 comes back as 03/10/2019):
df$new_date <- format(as.Date(df$date, "%m/%d/%Y") + df$days, "%m/%d/%Y")
df
# date days new_date
#1 3/2/2019 8 03/10/2019
#2 3/5/2019 4 03/09/2019
#3 3/6/2019 4 03/10/2019
#4 3/21/2019 3 03/24/2019
#5 3/25/2019 7 04/01/2019
If the different date formats are confusing, you can let lubridate's mdy() parse the month/day/year strings instead:
library(lubridate)
with(df, mdy(date) + days)
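A self-contained sketch built from the question's data, assuming the leading zeros really should be dropped as in the question's desired output (format() pads them). The gsub() pattern that strips those zeros is an illustrative choice, not part of the answers above.

```r
df <- data.frame(date = c("3/2/2019", "3/5/2019", "3/6/2019", "3/21/2019", "3/25/2019"),
                 days = c(8, 4, 4, 3, 7),
                 stringsAsFactors = FALSE)

# Parse month/day/year, then add the day offsets (vectorised over all rows)
new_date <- as.Date(df$date, "%m/%d/%Y") + df$days

# format() zero-pads month and day; drop a 0 at the start or right after a "/"
padded <- format(new_date, "%m/%d/%Y")
df$new_date <- gsub("(^|/)0", "\\1", padded, perl = TRUE)
df
```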

Change NA to blank in date time in R

This might be a simple question but I have tried a few things and they're not working.
I have a large data frame with date/time columns. An example of my data frame is:
Index FixTime1 FixTime2
1 2017-05-06 10:11:03 NA
2 NA 2017-05-07 11:03:03
I want to replace all NAs in the data frame with "" (blank). I have tried:
df[is.na(df)]<-""
but this gives the error:
Error in as.POSIXlt.character(value) :
character string is not in a standard unambiguous format
Again, this is probably very simple to fix, but I can't find how to do this while keeping each of these columns in date/time format.
We can use replace (note that this turns every column into character, since a date/time column cannot hold ""):
df[] <- replace(as.matrix(df), is.na(df), "")
df
# Index FixTime1 FixTime2
#1 1 2017-05-06 10:11:03
#2 2 2017-05-07 11:03:03
Here is a possible solution on a toy dataset; adapt this code to your needs:
df<-data.frame(date=c("01/01/2017",NA,"01/02/2017"))
df
date
1 01/01/2017
2 <NA>
3 01/02/2017
Convert from factor to character, and then replace NA:
df$date <- as.character(df$date)
df[is.na(df$date),]<-""
df
date
1 01/01/2017
2
3 01/02/2017
In your specific example, this could work:
df_2<-data.frame(Index=c(1,2),
                 FixTime1=c("2017-05-06 10:11:03",NA),
                 FixTime2=c(NA,"2017-05-07 11:03:03"))
df_2<-data.frame(lapply(df_2, as.character), stringsAsFactors=FALSE)
df_2[is.na(df_2$FixTime1),"FixTime1"]<-""
df_2[is.na(df_2$FixTime2),"FixTime2"]<-""
df_2
Index FixTime1 FixTime2
1 1 2017-05-06 10:11:03
2 2 2017-05-07 11:03:03
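Since "" cannot be stored in a POSIXct column, blanks only exist once the values are character. A minimal sketch, assuming the blanks are wanted for display or export: keep the original date/time columns intact and build a character copy (`out`, a hypothetical name) for presentation. If the blanks are only needed in a CSV file, write.csv(df, na = "") gives the same effect without changing the data.

```r
df <- data.frame(Index = 1:2,
                 FixTime1 = as.POSIXct(c("2017-05-06 10:11:03", NA), tz = "UTC"),
                 FixTime2 = as.POSIXct(c(NA, "2017-05-07 11:03:03"), tz = "UTC"))

time_cols <- c("FixTime1", "FixTime2")
out <- df
# Format each date/time column as text, using "" wherever the value is NA
out[time_cols] <- lapply(df[time_cols], function(x)
  ifelse(is.na(x), "", format(x, "%Y-%m-%d %H:%M:%S")))

str(df)   # FixTime1 and FixTime2 are still POSIXct here
out
```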

Removing multiple data entries based on a total number of entries per day

I start with a data frame named 'dat' in R that looks like the following:
datetime lat long id extra step
1 8/9/2014 13:00 31.34767 -81.39117 36 1 31.38946
2 8/9/2014 17:00 31.34767 -81.39150 36 1 11155.67502
3 8/9/2014 23:00 31.30683 -81.28433 36 1 206.33342
4 8/10/2014 5:00 31.30867 -81.28400 36 1 11152.88177
What I need to do is find out which days have fewer than 3 entries and remove all entries associated with those days from the original data.
I initially did this by the following:
library(plyr)
datetime<-dat$datetime
###strip the time down to only have the date no hh:mm:ss
date<- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2<-cbind(date, dat)
### count using just the date so you can ID which days have fewer than 3 points
datecount<- count(dat2, "date")
datecount<- subset(datecount, datecount$freq < 3)
This ends up producing the following:
row.names date freq
1 49 2014-09-26 1
2 50 2014-09-27 2
3 135 2014-12-21 2
Which is great, but I cannot figure out how to remove the entries for these days from the original 'dat', because this is a condensed summary of the original data frame.
So to try to deal with this, I have come up with another way of looking at the problem. I will use the strptime and cbind from above:
datetime<-dat$datetime
###strip the time down to only have the date no hh:mm:ss
date<- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2<-cbind(date, dat)
And I will use the column named "extra". I would like to create a new column containing the sum of the "extra" values for each simplified strptime date, with that sum repeated on every entry from that date, like the following:
date datetime lat long id extra extra_sum
1 2014-08-09 8/9/2014 13:00 31.34767 -81.39117 36 1 3
2 2014-08-09 8/9/2014 17:00 31.34767 -81.39150 36 1 3
3 2014-08-09 8/9/2014 23:00 31.30683 -81.28433 36 1 3
4 2014-08-10 8/10/2014 5:00 31.30867 -81.28400 36 1 4
5 2014-08-10 8/10/2014 13:00 31.34533 -81.39317 36 1 4
6 2014-08-10 8/10/2014 17:00 31.34517 -81.39317 36 1 4
7 2014-08-10 8/10/2014 23:00 31.34483 -81.39283 36 1 4
8 2014-08-11 8/11/2014 5:00 31.30600 -81.28317 36 1 2
9 2014-08-11 8/11/2014 13:00 31.34433 -81.39300 36 1 2
The code that creates the "extra_sum" column is what I am struggling with.
After creating this, I can simply subset my data to all entries with a value > 2. Any help using either my initial method or this new one to remove days with fewer than 3 entries from my initial data set would be much appreciated!
The plyr way.
library(plyr)
datetime <- dat$datetime
###strip the time down to only have the date no hh:mm:ss
date <- strptime(datetime, format = "%m/%d/%Y")
### bind the date to the old data
dat2 <-cbind(date, dat)
dat3 <- ddply(dat2, .(date), function(df){
if (nrow(df)>=3) {
return(df)
} else {
return(NULL)
}
})
I recommend using the data.table package:
library(data.table)
dat<-data.table(dat)
dat$Date<-as.Date(as.character(dat$datetime), format = "%m/%d/%Y")
dat_sum<-dat[, .N, by = Date ]
dat_3plus<-dat_sum[N>=3]
dat<-dat[Date%in%dat_3plus$Date]
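For the "extra_sum" column the question asks about, base R's ave() repeats a per-group summary on every row of that group. A minimal sketch using the nine rows shown in the question (only the datetime, id and extra columns, to keep it short); as.Date() is used here instead of strptime() so the grouping column is a plain Date.

```r
dat <- data.frame(
  datetime = c("8/9/2014 13:00", "8/9/2014 17:00", "8/9/2014 23:00",
               "8/10/2014 5:00", "8/10/2014 13:00", "8/10/2014 17:00",
               "8/10/2014 23:00", "8/11/2014 5:00", "8/11/2014 13:00"),
  id = 36, extra = 1, stringsAsFactors = FALSE)

# Day only; the trailing " hh:mm" part is ignored by the "%m/%d/%Y" format
dat$date <- as.Date(dat$datetime, format = "%m/%d/%Y")

# Sum of "extra" per day, repeated on every row of that day
dat$extra_sum <- ave(dat$extra, dat$date, FUN = sum)

# Keep only days with at least 3 entries
dat_kept <- dat[dat$extra_sum >= 3, ]
dat_kept
```

Since every "extra" value is 1 here, the per-day sum equals the per-day row count; ave(dat$extra, dat$date, FUN = length) would count rows directly without relying on that column.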

Extracting last date of the year from a date object

I have the following data set:
> d
x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012
I want:
> d
x date
1 1 31-12-2013
2 2 31-12-2010
3 3 31-12-2011
4 4 31-12-2012
i.e., the last day of the last month in the year of each date.
Please help!
You can also just use the ceiling_date function in the lubridate package.
You can do something like this, where date is a Date vector:
library(lubridate)
last_date <- ceiling_date(date,"year") - days(1)
ceiling_date(date, "year") gives you the first day of the next year; to get the last day of the current year, subtract 1 (or days(1)) from it.
Hope this helps.
Another option using lubridate package:
## using d from Roland's answer
transform(d,last =dmy(paste0('3112',year(dmy(date)))))
x date last
1 1 1-3-2013 2013-12-31
2 2 2-4-2010 2010-12-31
3 3 2-5-2011 2011-12-31
4 4 1-6-2012 2012-12-31
d <- read.table(text="x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012", header=TRUE)
d$date <- as.Date(d$date, "%d-%m-%Y")
d$date <- as.POSIXlt(d$date)
d$date$mon <- 11
d$date$mday <- 31
d$date <- as.Date(d$date)
# x date
#1 1 2013-12-31
#2 2 2010-12-31
#3 3 2011-12-31
#4 4 2012-12-31
1) cut.Date: Define cut_year to give the first day of the year. Adding 366 gets us into the next year, and applying cut_year again gets us to the first day of that next year. Finally, subtract 1 to get the last day of the year. The code uses base functionality only.
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))
transform(d, date = cut_year(cut_year(date) + 366) - 1)
2) format:
transform(d, date = as.Date(format(as.Date(date), "%Y-12-31")))
3) zoo: A "yearmon" class variable stores the date as the year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec. Thus taking its floor and adding 11/12 gets one to Dec, and as.Date.yearmon(..., frac = 1) uses the last day of the month instead of the first.
library(zoo)
transform(d, date = as.Date(floor(as.yearmon(as.Date(date))) + 11 / 12, frac = 1))
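Note: All three solutions expect date to be of class "Date" or a character string in the standard "%Y-%m-%d" form; the question's "1-3-2013" strings must first be converted with as.Date(d$date, "%d-%m-%Y"). The inner as.Date in cut_year and in the other two solutions can be omitted if it is known that date is already of "Date" class.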
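The solutions above return dates printed in ISO "%Y-%m-%d" form, while the question shows day-month-year. A minimal base R sketch, assuming the question's "%d-%m-%Y" input strings: parse them, build the 31 December date with format(), and convert back to the question's layout.

```r
d <- data.frame(x = 1:4,
                date = c("1-3-2013", "2-4-2010", "2-5-2011", "1-6-2012"),
                stringsAsFactors = FALSE)

# Parse the day-month-year strings into Date objects
dates <- as.Date(d$date, format = "%d-%m-%Y")

# Keep each year and force month/day to 12-31
last_day <- as.Date(format(dates, "%Y-12-31"))

# Back to the question's day-month-year layout (a character column)
d$date <- format(last_day, "%d-%m-%Y")
d
```

Keep last_day if further date arithmetic is needed; the formatted column is text and will not sort or subtract as dates.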

Finding the minimum value of a variable for each value of another variable in R

I'm fairly new to R and am trying to find the minimum date/time for each value of an ID number.
Below is an example of the data I'm working with
ID DATE
1 11/24/12 12:51
1 11/24/12 12:52
1 11/24/12 12:53
2 11/27/12 12:51
2 11/24/12 12:52
2 11/24/12 12:53
What I need to do is generate an object that shows the earliest date/time for each value of ID like so:
ID DATE
1 11/24/12 12:51
2 11/27/12 12:51
I've tried several approaches but am still struggling.
Any suggestions would be appreciated!
Try this (as Roland suggests), using base R functions:
DATE <- strptime(c("11/24/12 12:51", "11/24/12 12:52", "11/24/12 12:53",
"11/27/12 12:51", "11/24/12 12:52", "11/24/12 12:53"),
"%m/%d/%y %H:%M")
ID <- rep(1:2, each=3)
DF <- data.frame(ID, DATE)
aggregate(DATE ~ ID, min, data=DF)
ID DATE
1 1 2012-11-24 12:51:00
2 2 2012-11-24 12:52:00
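For ID 2 the earliest time in the sample is 11/24/12 12:52, not the 11/27/12 12:51 shown in the question's desired output, because 11/27 falls after 11/24; the aggregate() result above is correct. If you want whole rows rather than only the minimum (useful once the data has more columns), one base R sketch is to sort by ID and time and keep the first row per ID. The POSIXct column and the UTC time zone are illustrative assumptions.

```r
DATE <- as.POSIXct(c("11/24/12 12:51", "11/24/12 12:52", "11/24/12 12:53",
                     "11/27/12 12:51", "11/24/12 12:52", "11/24/12 12:53"),
                   format = "%m/%d/%y %H:%M", tz = "UTC")
DF <- data.frame(ID = rep(1:2, each = 3), DATE = DATE)

# Sort by ID, then by time, so each ID's earliest row comes first
sorted <- DF[order(DF$ID, DF$DATE), ]
# Keep the first row for each ID
earliest <- sorted[!duplicated(sorted$ID), ]
earliest
```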
