I have a dataframe in R, which has two variables that are dates and I need to calculate the difference in days between them. However, they are formatted as YYYYMMDD. How do I change it to a date format readable in R?
This should work
lubridate::ymd(given_date_format)
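If you would rather stay in base R, the same conversion can be sketched with as.Date() and an explicit format string. The data frame and the column names start_ymd and end_ymd below are made up for illustration; integer input has to go through as.character() first.

```r
# Hypothetical data: one YYYYMMDD column stored as integer, one as character
df <- data.frame(start_ymd = c(20170623L, 20170701L),
                 end_ymd   = c("20170701", "20170714"),
                 stringsAsFactors = FALSE)

# Parse both columns into Date objects using the %Y%m%d format
start_date <- as.Date(as.character(df$start_ymd), format = "%Y%m%d")
end_date   <- as.Date(df$end_ymd, format = "%Y%m%d")

# Subtracting two Date vectors gives a difftime in days
df$days_between <- as.numeric(end_date - start_date)
df
```

Subtracting Dates returns a difftime object, so as.numeric() is only needed if you want a plain number column.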
I like anydate() from the anytime package. Quick demo, with actual data:
R> set.seed(123) # be reproducible
R> data <- data.frame(inp=Sys.Date() + cumsum(runif(10)*10))
R> data$ymd <- format(data$inp, "%Y%m%d") ## as yyyymmdd
R> data$int <- as.integer(data$ymd) ## same as integer
R> library(anytime)
R> data$diff1 <- c(NA, diff(anydate(data$ymd))) # reads YMD
R> data$diff2 <- c(NA, diff(anydate(data$int))) # also reads int
R> data
inp ymd int diff1 diff2
1 2017-06-23 20170623 20170623 NA NA
2 2017-07-01 20170701 20170701 8 8
3 2017-07-05 20170705 20170705 4 4
4 2017-07-14 20170714 20170714 9 9
5 2017-07-24 20170724 20170724 10 10
6 2017-07-24 20170724 20170724 0 0
7 2017-07-29 20170729 20170729 5 5
8 2017-08-07 20170807 20170807 9 9
9 2017-08-13 20170813 20170813 6 6
10 2017-08-17 20170817 20170817 4 4
R>
Here the first column holds the actual dates we work from. Columns two and three are then generated to match OP's requirement: YMD, either as character or as integer.
We then compute differences on them, pad with NA for the first data point (which has no predecessor), and show that either input format works.
I have a data frame with two columns, date and days. I want to add days to date and show the result in another column.
data frame-1
The date column is in mm/dd/yyyy format
date days
3/2/2019 8
3/5/2019 4
3/6/2019 4
3/21/2019 3
3/25/2019 7
and I want my output to look like this
date days new-date
3/2/2019 8 3/10/2019
3/5/2019 4 3/9/2019
3/6/2019 4 3/10/2019
3/21/2019 3 3/24/2019
3/25/2019 7 4/1/2019
I was trying this
as.Date("3/10/2019") +8
but I think it will only work for a single value
Convert to actual Date values and then add Days. You need to specify the actual format of date (read ?strptime) while converting it to Date.
as.Date(df$date, "%m/%d/%Y") + df$days
#[1] "2019-03-10" "2019-03-09" "2019-03-10" "2019-03-24" "2019-04-01"
If you want the output back in the same format, use format()
df$new_date <- format(as.Date(df$date, "%m/%d/%Y") + df$days, "%m/%d/%Y")
df
# date days new_date
#1 3/2/2019 8 03/10/2019
#2 3/5/2019 4 03/09/2019
#3 3/6/2019 4 03/10/2019
#4 3/21/2019 3 03/24/2019
#5 3/25/2019 7 04/01/2019
If you find the different date format codes confusing, lubridate can do the parsing for you
library(lubridate)
with(df, mdy(date) + days)
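For completeness, here is a self-contained base R sketch that rebuilds the data frame from the question and runs the conversion end to end. The last step, which strips the leading zeros with gsub() to match the 3/10/2019 style asked for, is an extra and not part of the answer above.

```r
# Rebuild the example data from the question
df <- data.frame(date = c("3/2/2019", "3/5/2019", "3/6/2019",
                          "3/21/2019", "3/25/2019"),
                 days = c(8, 4, 4, 3, 7),
                 stringsAsFactors = FALSE)

# Parse the mm/dd/yyyy strings, add the day offsets, format back to text
parsed      <- as.Date(df$date, format = "%m/%d/%Y")
df$new_date <- format(parsed + df$days, "%m/%d/%Y")

# Optional: drop leading zeros from month and day (03/09/2019 -> 3/9/2019)
df$new_date_short <- gsub("(^|/)0", "\\1", df$new_date)
df
```

The addition is vectorised, so the whole column is handled at once with no loop.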
I would like to compare 2 time series by their time of day. These 2 series are from different dates (i.e., 2018-08-10 for the first series and 2018-09-10 for the second series) but have the same timestamps. Is it possible to do a cbind/merge between the 2 series that takes into account only the time of day, not the date of the timestamp?
Thanks
I have no idea what your data looks like, so next time please create a reproducible example. I created 2 data.frames that can serve as an example. You should know that xts needs a valid timeseries object and an hms timeseries is not a valid timeseries for xts.
That being said, you can always transform an xts object into a data.frame with:
my_df <- data.frame(times = index(my_xts), coredata(my_xts))
Now for the example:
I'm showing it via dplyr, but merge will work as well if you create an hms object in each data.frame. I'm using as.hms from the hms package (newer hms releases deprecate it in favour of as_hms) to create an hms object in the data.frames and join them together.
x <- Sys.time() + 1:10*60 # today
y <- x - 60*60*24 # same time yesterday
df1 <- data.frame(times = x, val1 = 1:10)
df2 <- data.frame(times = y, val2 = 10:1)
library(dplyr)
# create hms object in df1 and in df2 on the fly
df1 %>%
  mutate(times2 = hms::as.hms(times)) %>%
  inner_join(df2 %>% mutate(times2 = hms::as.hms(times)), by = "times2")
times.x val1 times2 times.y val2
1 2018-10-01 18:26:05 1 18:26:05.421764 2018-09-30 18:26:05 10
2 2018-10-01 18:27:05 2 18:27:05.421764 2018-09-30 18:27:05 9
3 2018-10-01 18:28:05 3 18:28:05.421764 2018-09-30 18:28:05 8
4 2018-10-01 18:29:05 4 18:29:05.421764 2018-09-30 18:29:05 7
5 2018-10-01 18:30:05 5 18:30:05.421764 2018-09-30 18:30:05 6
6 2018-10-01 18:31:05 6 18:31:05.421764 2018-09-30 18:31:05 5
7 2018-10-01 18:32:05 7 18:32:05.421764 2018-09-30 18:32:05 4
8 2018-10-01 18:33:05 8 18:33:05.421764 2018-09-30 18:33:05 3
9 2018-10-01 18:34:05 9 18:34:05.421764 2018-09-30 18:34:05 2
10 2018-10-01 18:35:05 10 18:35:05.421764 2018-09-30 18:35:05 1
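If you would rather not pull in hms and dplyr, a similar join can be sketched in base R by reducing each timestamp to an HH:MM:SS string and merging on that. This is only a sketch under assumptions: the data frames and the helper column name tod are made up, and the timestamps are whole seconds in UTC (sub-second values would need rounding or a different key first).

```r
# Two made-up series with the same times of day, one month apart
x <- as.POSIXct("2018-09-10 09:00:00", tz = "UTC") + 0:4 * 60
y <- x - 31 * 24 * 60 * 60          # 2018-08-10, same times of day

df1 <- data.frame(times = x, val1 = 1:5)
df2 <- data.frame(times = y, val2 = 5:1)

# Build a time-of-day key that ignores the date part
df1$tod <- format(df1$times, "%H:%M:%S", tz = "UTC")
df2$tod <- format(df2$times, "%H:%M:%S", tz = "UTC")

# Join on the key; the two original timestamp columns get suffixes .x and .y
merged <- merge(df1, df2, by = "tod", suffixes = c(".x", ".y"))
merged
```

A character key is less precise than an hms column, but it avoids comparing fractional seconds for exact equality.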
I have a data frame with 3262 rows and 10 columns. One of the columns has date with format YY-MM-DD. I want to store all the rows with 10 different dates in a different data frame so I tried :
newdata= df[df$Date %in% as.Date(c('2011-08-05','2012-1-13','2012-03-2','2014-04-01')),]
but nothing. I thought it might need to specify again the format so I tried:
df$Date <- as.Date( as.character(df$Date), "%d-%m-%y")
newdata= df[df$Date %in% as.Date(c('2011-08-05','2012-1-13','2012-03-2','2014-04-01')),]
All I get is an empty data frame saying no data available in table. Where did I make the mistake (something stupid, I guess)?
I created an example:
set.seed(1)
df=data.frame(col1=seq(1,10),
col2=seq(1,0),
Date=as.Date(floor(runif(min=15550,max=17000,n=10)),origin="1970-01-01"))
> df
col1 col2 Date
1 1 1 2013-08-17
2 2 0 2014-01-19
3 3 1 2014-11-06
4 4 0 2016-03-06
5 5 1 2013-05-17
6 6 0 2016-02-21
7 7 1 2016-04-28
8 8 0 2015-03-14
9 9 1 2015-01-27
10 10 0 2012-10-26
Using the same code you provided:
newdata= df[df$Date %in% as.Date(c('2013-08-17','2015-01-27')),]
Gives me
> newdata
col1 col2 Date
1 1 1 2013-08-17
9 9 1 2015-01-27
Are you sure that str(df$Date) shows you really have a Date format?
> str(df$Date)
Date[1:10], format: "2013-08-17" "2014-01-19" "2014-11-06" "2016-03-06" "2013-05-17" "2016-02-21" "2016-04-28" "2015-03-14" "2015-01-27" "2012-10-26"
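One possible cause, which I cannot confirm without seeing the data: if the column really holds YY-MM-DD strings as described, the format "%d-%m-%y" in the question reads the fields in the wrong order. The strings still parse, but to different dates, so %in% finds nothing. The values below are made up to illustrate this.

```r
# Hypothetical raw values in YY-MM-DD form, as described in the question
raw    <- c("11-08-05", "12-01-13", "12-03-02", "14-04-01", "13-07-30")
wanted <- as.Date(c("2011-08-05", "2012-01-13", "2012-03-02", "2014-04-01"))

# Format string from the question: day-month-year, so the fields are swapped
wrong <- as.Date(raw, format = "%d-%m-%y")
# Format string that matches the described layout: year-month-day
right <- as.Date(raw, format = "%y-%m-%d")

wrong                   # valid dates, but not the intended ones
sum(wrong %in% wanted)  # no matches
sum(right %in% wanted)  # all four wanted dates are found
```

Checking str(df$Date) and a few head(df$Date) values after the conversion should show quickly whether this is what happened.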
I am working on a data frame that contains 2 columns as follows:
time frequency
2014-01-06 13
2014-01-07 30
2014-01-09 56
My issue is that I am interested in counting the days for which frequency is 0. The data is pulled using RPostgreSQL/RSQLite, so there is no datetime given unless there is a value (i.e. unless frequency is at least 1). If I am interested in counting these dates that don't actually exist in the data frame, is there an easy way to go about doing it? I.e., if we consider the date range 2014-01-01 to 2014-01-10, I would want it to count 7.
My only thought was to brute force create a separate dataframe with every date (note that this is 4+ years of dates which would be an immense undertaking) and then merging the two dataframes and counting the number of NA values. I'm sure there is a more elegant solution than what I've thought of.
Thanks!
Sort by date and then look for gaps.
start <- as.Date("2014-01-01")
time <- as.Date(c("2014-01-06", "2014-01-07","2014-01-09"))
end <- as.Date("2014-01-10")
time <- sort(unique(time))
# Include start and end dates, so the missing dates are 1/1-1/5, 1/8, 1/10
d <- c(time[1]- start,
diff(time) - 1,
end - time[length(time)] )
d # [1] 5 0 1 1
sum(d) # 7 missing days
And now for which days are missing...
(gaps <- data.frame(gap_starts = c(start,time+1)[d>0],
gap_length = d[d>0]))
# gap_starts gap_length
# 1 2014-01-01 5
# 2 2014-01-08 1
# 3 2014-01-10 1
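for (g in seq_len(nrow(gaps))) {
  gap_start <- gaps$gap_starts[g]   # avoid overwriting `start` defined above
  gap_len <- gaps$gap_length[g]
  # `:` on Dates yields day numbers, so convert each one back to a Date
  for (i in gap_start:(gap_start + gap_len - 1)) {
    print(as.Date(i, origin = "1970-01-01"))
  }
}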
# [1] "2014-01-01"
# [1] "2014-01-02"
# [1] "2014-01-03"
# [1] "2014-01-04"
# [1] "2014-01-05"
# [1] "2014-01-08"
# [1] "2014-01-10"
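The "brute force" idea from the question is also cheap in practice: four years is only about 1,460 dates, and seq() builds them in one call. A minimal sketch, reusing the example dates from above (the variable names are mine):

```r
start    <- as.Date("2014-01-01")
end      <- as.Date("2014-01-10")
observed <- as.Date(c("2014-01-06", "2014-01-07", "2014-01-09"))

# Every calendar day in the range
all_days <- seq(start, end, by = "day")

# Days in the range that never appear in the data
missing_days <- all_days[!all_days %in% observed]

length(missing_days)  # number of zero-frequency days
missing_days
```

This returns the count and the missing dates themselves in one step, with no merge and no NA counting.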
I have a question that might be too basic, but here it is...
I want to extract monthly data from a dataset like this:
Date Obs
1 2001-01-01 120
2 2001-01-02 100
3 2001-01-03 150
4 2001-01-04 175
5 2001-01-05 121
6 2001-01-06 100
I just want to get the rows from the data where I have a certain month(e.g. January), this works perfectly:
output=which(strftime(dataset[,1],"%m")=="01",dataset[,1])
However, when I try to create a loop to go through all the months using a variable that is declared as character, it doesn't work and I only get "FALSE".
value=as.character(k)
output=which(strftime(dataset[,1],"%m")==value,dataset[,1])
Do not parse dates as strings. That is too error prone. Parse dates as dates, and do logical comparisons on them.
Here is one approach, creating January to March data and sub-setting February based on a comparison:
R> output <- data.frame(date=seq(as.Date("2011-01-01"), by=7, length=10),
+ value=cumsum(runif(10)*100))
R> output
date value
1 2011-01-01 8.29916
2 2011-01-08 44.82950
3 2011-01-15 72.08662
4 2011-01-22 134.19277
5 2011-01-29 221.67744
6 2011-02-05 245.77195
7 2011-02-12 314.82081
8 2011-02-19 396.34661
9 2011-02-26 437.14286
10 2011-03-05 442.41321
R> output[ output[,"date"] >= as.Date("2011-02-01") &
+ output[,"date"] <= as.Date("2011-02-28"), ]
date value
6 2011-02-05 245.772
7 2011-02-12 314.821
8 2011-02-19 396.347
9 2011-02-26 437.143
R>
Another approach uses the xts package:
R> library(xts)
R> oo <- xts(output[,"value"], order.by=output[,"date"])
R> oo
[,1]
2011-01-01 8.29916
2011-01-08 44.82950
2011-01-15 72.08662
2011-01-22 134.19277
2011-01-29 221.67744
2011-02-05 245.77195
2011-02-12 314.82081
2011-02-19 396.34661
2011-02-26 437.14286
2011-03-05 442.41321
R> oo["2011-02-01::2011-02-28"]
[,1]
2011-02-05 245.772
2011-02-12 314.821
2011-02-19 396.347
2011-02-26 437.143
R>
This works because xts has convenient date parsing for the index; see the package documentation for details.
I'm assuming k is an integer in 1:12. I suspect you may be better off using abbreviated month names:
value <- month.abb[k]
output <- which(strftime(dataset[,1],"%b")==value)
The reason your way isn't working is that the month number is zero-padded, and "1" != "01".
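If you want to keep comparing month numbers, the loop variable can be zero-padded with sprintf() so that it matches what "%m" produces. A small sketch; the dataset below is made up and only loosely follows the one in the question:

```r
# Made-up data spanning three months
dataset <- data.frame(Date = as.Date(c("2001-01-01", "2001-01-02",
                                       "2001-02-03", "2001-03-04")),
                      Obs  = c(120, 100, 150, 175))

# sprintf("%02d", k) turns 1 into "01", matching the output of format(..., "%m")
rows_by_month <- lapply(1:12, function(k) {
  which(format(dataset$Date, "%m") == sprintf("%02d", k))
})

rows_by_month[[1]]  # row indices for January
rows_by_month[[2]]  # row indices for February
```

Unlike month names from "%b", the padded numbers do not depend on the locale.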
You can also use dates as dates with POSIXlt()$mon
as.POSIXlt(output$date)$mon # Note that Jan = 0 and Feb=1
[1] 0 0 0 0 0 1 1 1 1 2
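Because $mon counts from 0, add 1 before comparing it with a month number in 1:12. A minimal sketch that selects one month this way, using a data frame shaped like the output example above (the values are placeholders, not the random numbers shown earlier):

```r
# Weekly dates from January to early March 2011, placeholder values
output <- data.frame(date  = seq(as.Date("2011-01-01"), by = 7, length.out = 10),
                     value = 1:10)

k <- 2  # month wanted, counted 1..12

# $mon is 0-based (Jan = 0), so shift by one before comparing
feb <- output[as.POSIXlt(output$date)$mon + 1 == k, ]
feb
```

Note that this matches the month in every year of the data, which is what you want for a loop over months but differs from the explicit date-range comparison shown earlier.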
There are several other packages such as chron, lubridate and gdata that provide date handling functions. I found the functions in lubridate particularly intuitive and less prone to errors in my clumsy hands.