I have a set of events that each have a start and end date, but they take place over the scope of a number of months. I would like to create a table that shows the number of days in each month for this event.
I have the following example.
event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date <- as.Date("07/02/2013", "%d/%m/%Y")
I would expect to get a table out as the following:
Oct-12 8
Nov-12 30
Dec-12 31
Jan-13 31
Feb-13 7
Does anybody know about a smart and elegant way of doing this or is creating a system of loops the only viable method?
Jochem
This is not necessarily efficient because it creates a sequence of days, but it does the job:
> library(zoo)
> table(as.yearmon(seq(event_start_date, event_end_date, "day")))
Oct 2012 Nov 2012 Dec 2012 Jan 2013 Feb 2013
9 30 31 31 7
If your time span is so large that this method is slow, you'll have to create a sequence of the first days of the months between your two (truncated) dates, take the diff, and do a little extra work for the end points.
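A minimal base R sketch of that idea, in case it helps. The helper name days_per_month is made up for illustration, and it counts both the start and the end date (so October comes out as 9, matching the table() result above rather than the 8 in the question):

```r
# Count days per month between two dates (both end points included)
# without building a day-by-day sequence.
# days_per_month is a hypothetical helper name, not from any package.
days_per_month <- function(start, end) {
  # First day of the month containing `start`
  first <- as.Date(format(start, "%Y-%m-01"))
  # First day of the month after the one containing `end`
  end_first <- as.Date(format(end, "%Y-%m-01"))
  last <- seq(end_first, by = "month", length.out = 2)[2]
  # Month boundaries, then the full length of every month touched
  bounds <- seq(first, last, by = "month")
  n <- as.integer(diff(bounds))
  # Trim the days before `start` in the first month
  n[1] <- n[1] - as.integer(start - bounds[1])
  # Trim the days after `end` in the last month
  k <- length(n)
  n[k] <- n[k] - as.integer(bounds[k + 1] - end - 1)
  # Label with year-month to stay independent of the locale
  names(n) <- format(bounds[-length(bounds)], "%Y-%m")
  n
}

event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date <- as.Date("07/02/2013", "%d/%m/%Y")
days_per_month(event_start_date, event_end_date)
```

The work here scales with the number of months rather than the number of days, which is the point of the suggestion.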
As DjSol already pointed out in his comment, you can just subtract two dates to get the number of days:
event_start_date <- as.Date("23/10/2012", "%d/%m/%Y")
event_end_date <- as.Date("07/02/2013", "%d/%m/%Y")
as.numeric(event_end_date - event_start_date)
Is that what you want? I have the feeling that you might have more of a problem to get the start and end date in such a format so you can easily subtract them because you mention a loop. If so, however, I guess we need more details on how your actual data looks.
I have a dataset for a time series spanning a couple of years with daily observations. I'm trying to smooth some clearly wrong data inserted there (for example, negative values when the variable cannot take values below zero) and what I came up with was trying to smooth it or "interpolate" it by using both the mean of the days around that observation and the mean of the same day or couple of days from previous years, as I have yearly seasonality (I'm still unsure about this part, any comment would be greatly appreciated).
So my question is whether I can easily access the same day across different years.
Here's a dummy example of my data:
library(tidyverse)
library(lubridate)
date value
2016-10-01 00:00:00 28
2016-10-02 00:00:00 25
2016-10-03 00:00:00 24
2016-10-04 00:00:00 22
2016-10-05 00:00:00 -6
2016-10-06 00:00:00 26
I have that for years 2016 through 2020. So in this example I would use the dates around 2016-10-05 AND I would like to use the dates around the 5th of October from years 2017 to 2020 to kind of maintain the seasonality, but maybe this is incorrect.
I tried to use +years() from lubridate, but I still have to do things manually and I would like to automate this.
If your question is solely "whether [you] can easily access the same day [across] different years", you could do that as follows:
# say your data frame is called df
library(lubridate)
day(df$date)
This will return the day part of the date for every entry in that column of your data frame.
Edit to reply to comment from asker:
This is a very basic way to specify the day and month for which you would like to obtain the corresponding rows in your data frame:
df[day(df$date) == 5 & month(df$date) == 10, ]
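If it helps, here is a rough base R sketch of the cross-year part of what the question describes: replacing an invalid value with the mean of the same calendar day in the other years. The data frame below is invented for illustration, and "invalid" is assumed to mean negative. It ignores the neighbouring-days part and any special handling for 29 February:

```r
# Hypothetical daily series spanning several years; the -6 is the bad value
df <- data.frame(
  date  = as.Date(c("2016-10-04", "2016-10-05", "2016-10-06",
                    "2017-10-05", "2018-10-05")),
  value = c(22, -6, 26, 24, 27)
)

# Month-day key, e.g. "10-05", shared by the same calendar day in every year
df$md <- format(df$date, "%m-%d")

# Mean of the valid (non-negative) values for each calendar day across years
valid <- df[df$value >= 0, ]
day_mean <- tapply(valid$value, valid$md, mean)

# Replace each invalid value with the cross-year mean for its calendar day
bad <- df$value < 0
df$value[bad] <- unname(day_mean[df$md[bad]])
df
```

Grouping on a month-day key avoids having to add years() by hand for each year.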
How can I remove the first elements from a variable, especially if the variable contains special characters? For example, I have the following column:
Date
01/01/2009
01/01/2010
01/01/2011
01/01/2012
I need to have a new column like the following:
Date
2009
2010
2011
2012
As discussed in the comments, this can be achieved by converting the entry into Date format and extracting the year, for instance like this:
format(as.Date(df1$Date, format="%d/%m/%Y"),"%Y")
library(lubridate)
a <- mdy(b)  # b is assumed to be the character vector of dates, in month/day/year order
year(a)
https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
http://vita.had.co.nz/papers/lubridate.pdf
When you convert your variable to Date:
date <- as.Date('10/30/2018','%m/%d/%Y')
you can then cut out the elements you want and make new variables, like year:
year <- as.numeric(format(date,'%Y'))
or month:
month <- as.numeric(format(date,'%m'))
If all your dates are the same width, you can put the dates in a vector and use substring:
Date
a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
substring(a,7,10) #This takes string and only keeps the characters beginning in position 7 to position 10
output
[1] "2009" "2010" "2011"
This is more advice than a specific answer, but my suggestion is to convert dates to date variables immediately, rather than keeping them as strings. This way you can use date (and time) functions on them, rather than trying to use very troublesome workarounds.
As pointed out, the lubridate package has nice extraction functions.
For some projects, I have found that piecing dates out from the start is helpful:
create year, month, day (of month) and day (of week) variables to start with.
This can simplify summaries, tables and graphs, because the extraction code is separate from the summary/table/graph code, and because if you need to change it, you don't have to roll out those changes in multiple spots.
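A small sketch of what that might look like in base R. The column names are only a suggestion, and the weekday coding is the one as.POSIXlt uses (0 = Sunday through 6 = Saturday):

```r
# Example dates in the question's day/month/year format
dates <- as.Date(c("01/01/2009", "15/06/2010"), format = "%d/%m/%Y")

# Split each date into its components once, up front
parts <- data.frame(
  date  = dates,
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  mday  = as.integer(format(dates, "%d")),   # day of month
  wday  = as.POSIXlt(dates)$wday             # day of week, 0 = Sunday
)
parts
```

Later summaries can then group on parts$year or parts$month directly.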
If you are using the date package, this can be done fairly easily.
library(date)
Date <- c("01/01/2009", "01/01/2010", "01/01/2011", "01/01/2012")
Date <- as.date(Date)
Date
# [1] 1Jan2009 1Jan2010 1Jan2011 1Jan2012
date.mdy(Date)$year
# [1] 2009 2010 2011 2012
## be aware that these are now integers and thus different methods may be invoked:
str(date.mdy(Date)$year)
# int [1:4] 2009 2010 2011 2012
summary(Date)
# First Last
# "1Jan2009" "1Jan2012"
summary(date.mdy(Date)$year)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2009 2010 2010 2010 2011 2012
For some time now, you have also been able to rely solely on the data.table package and its IDate class plus associated functions (see ?as.IDate).
require(data.table)
a <- c("01/01/2009", "01/01/2010" , "01/01/2011")
year(as.IDate(a, '%d/%m/%Y')) # all data.table functions
Within R, say I have a vector of some Lubridate dates:
> Date
"2012-01-01 UTC"
"2013-01-01 UTC"
Next, suppose I want to see what week number these days fall in:
> week(Date)
1
1
Lubridate is fantastic!
But wait...I'm dealing with a time series with 10,000 rows of data...and the data spans 3 years.
I've been struggling with finding some way to make this happen:
> result of awesome R code here
1
54
The question: is there a succinct way to coax out a list of week numbers over multiyear periods within Lubridate? More directly, I would like the first week of the second year to be represented as the 54th week, and the first week in the third year to be represented as the 107th week, ad nauseam.
So far, I've attempted a number of hackneyed schemes but cannot seem to create something not fastened together with scotch tape. Any advice would be greatly appreciated. Thanks in advance.
To get the interval from a particular date to another date, you can just subtract...
If tda is your vector of dates, then
tda - min(tda)
will be the difference in seconds between them.
To get the units out in weeks:
(tda - min(tda))/eweeks(1)
To do it from a particular date:
tda - ymd(19960101)
This gives the number of days from 1996 to each value.
From there, you can divide by days per week, or seconds per week.
(tda - ymd(19960101))/eweeks(1)
To get only the integer part, and starting from January 2012:
trunc((tda - ymd(20111225))/eweeks(1))
Test data:
tda = ymd(c(20120101, 20120106, 20130101, 20130108))
Output:
1 1 53 54
Since eweeks() is now deprecated, I thought I'd add to @beroe's answer.
If tda is your date vector, you can get the week numbers with:
weeknos <- (interval(min(tda), tda) %/% weeks(1)) + 1
where %/% causes integer division. ( 5 / 3 = 1.667; 5 %/% 3 = 1)
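For what it's worth, the same integer-division idea can be sketched in base R without lubridate, assuming the dates are stored as Date objects and that week 1 starts at the earliest date in the vector:

```r
# Same test dates as in the earlier answer, as base R Date objects
tda <- as.Date(c("2012-01-01", "2012-01-06", "2013-01-01", "2013-01-08"))

# Whole days elapsed since the earliest date
days_elapsed <- as.integer(tda - min(tda))

# Integer-divide by 7 to get completed weeks, then shift so counting starts at 1
weeknos <- days_elapsed %/% 7L + 1L
weeknos
# [1]  1  1 53 54
```

Note that this counts 7-day blocks from the first observation, so the first week of the second year lands on 53 or 54 depending on where the series starts.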
You can do something like this :
week(dat) +53*(year(dat)-min(year(dat)))
Given you like lubridate (as do I)
year_week <- function(x,base) week(x) - week(base) + 52*(year(x) - year(base))
test <- ymd(c(20120101, 20120106, 20130101, 20130108))
year_week(test, "2012-01-01")
Giving
[1] 0 0 52 53
I am wondering how to create a subset of data in R based on a list of dates, rather than by a date range.
For example, I have the following data set data which contains 3 years of 6-minute data.
date zone month day year hour minute temp speed gust dir
1 09/06/2009 00:00 PDT 9 6 2009 0 0 62 2 15 156
2 09/06/2009 00:06 PDT 9 6 2009 0 6 62 13 16 157
I have used breeze<-subset(data, ws>=15 & wd>=247.5 & wd<=315, select=date:dir) to select the rows which meet my criteria for a sea breeze, which is fine, but what I want to do is create a subset of the days which contain those times that meet my criteria.
I have used...
as.character(breeze$date)
trimdate<-strtrim(breeze$date, 10)
breezedate<-as.Date(trimdate, "%m/%d/%Y")
breezedate<-format(breezedate, format="%m/%d/%Y")
...to extract the dates from each row that meets my criteria, so I have a variable called breezedate that contains a list of the dates that I want (not the most elegant way to code this, I'm sure). There are about two hundred dates in the list. What I am trying to do with the next command is to create a subset of my original dataset data which contains only those days which meet the sea breeze criteria, not just the specific times.
breezedays<-(data$date==breezedate)
I think one of my issues here is that I am comparing one value to a list of values, but I am not sure how to make it work.
Let's assume your breezedate list looks like this and data$date is a simple string:
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))
This is probably what you want (note the trailing comma, which selects rows rather than columns):
breezedays <- data[as.Date(data$date, '%m/%d/%Y') %in% breezedate, ]
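To make the difference from == concrete, here is a self-contained sketch with invented data. The == operator compares element by element (recycling the shorter vector), whereas %in% asks, for each row, whether its date appears anywhere in the list:

```r
# Hypothetical observations; the date strings carry a time, as in the question
data <- data.frame(
  date  = c("09/06/2009 00:00", "09/06/2009 00:06",
            "09/07/2009 00:00", "10/01/2009 12:00"),
  speed = c(2, 13, 5, 20),
  stringsAsFactors = FALSE
)

# Days that met the sea breeze criteria
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))

# as.Date() parses only the part matched by the format, so the time is dropped
obs_day <- as.Date(data$date, format = "%m/%d/%Y")

# TRUE for every row whose day is in the list, whatever the time of day
keep <- obs_day %in% breezedate

# Logical row index, with the comma so rows (not columns) are selected
breezedays <- data[keep, ]
breezedays
```

This keeps all observations from 6 September and 1 October and drops the row from 7 September.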
The intersect() function will allow you to compare one vector to another and return the values that appear in both.
To use, run the following:
breezedays <- intersect(data$date, breezedate) # returns the values that are shared between data$date and breezedate
I feel like there's a pretty simple way to do this, but I'm not finding it easily...
I am working with R to extract data from a dataset and then summarize it by a number of different characteristics. One of them is the month in which an event is scheduled / has occurred. We have the exact date of the event in the database, something like this:
person_id date_visit
1 2012-05-03
2 2012-08-13
3 2012-12-12
...
I would like to use the table() function to generate a summary table that would look something like this:
Month Freq
Jan 12 1
Feb 12 2
Mar 12 1
Apr 12 3
...
My issue is this. I've read the data in and used as.Date() to convert character strings to dates. I can use format.Date() to get the dates formatted as Jan 12, Mar 12, etc. But when you use format.Date(), you end up with character strings again. This means when you apply table() to them, they come out in alphabetical order (my current set is Aug 12, Jul 12, Jun 12, Mar 12, and so forth).
I know that in SAS, you could use a format to change the appearance of a date, while preserving it as a date (so you could still do date operators on it). Can the same thing be done using R?
My plan is to build a nice data frame through a number of steps, and then (after making sure that all the dates are converted to strings, for compatibility reasons) use xtable() to make a nice LaTeX output.
Here's my code at present.
load("temp.RData")
ds$date_visit <- as.Date(ds$date_visit,format="%Y-%m-%d")
table(format.Date(ds$date_visit,format="%b %Y"))
ETA: I'd prefer to just do it in Base R if I can, but if I have to I can always use an additional package.
You could use the yearmon class from the zoo package
require("zoo")
ds <- data.frame(person_id=1:3, date_visit=c("2012-05-03", "2012-08-13", "2012-12-12"))
ds$date_visit <- as.yearmon(ds$date_visit)
ds
person_id date_visit
1 1 May 2012
2 2 Aug 2012
3 3 Dec 2012
month.abb is a constant vector in R and can be used to sort on the first three letters of the names in the table.
ds <- data.frame(person_id=1:3, date_visit=as.Date(c("2012-05-03", "2012-08-13", "2012-12-12")))
table(format( ds$date_visit, format="%b %Y"))
tbl <- table(format( ds$date_visit, format="%b %Y"))
tbl[order( match(substr(names(tbl), 1,3), month.abb) )]
May 2012 Aug 2012 Dec 2012
1 1 1
With additional years you would see the "May"s all together so this would be needed:
tbl[order( substr(names(tbl), 5,8), match(substr(names(tbl), 1,3), month.abb) )]
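Another base R option, sketched here with made-up visits, is to turn the formatted labels into a factor whose levels are already in chronological order, so that table() keeps that order by itself. The month abbreviations produced by "%b" depend on the locale:

```r
# Hypothetical visits, deliberately out of order and spanning two years
ds <- data.frame(
  person_id  = 1:4,
  date_visit = as.Date(c("2012-08-13", "2012-05-03", "2013-05-20", "2012-05-30"))
)

# Character labels such as "May 2012"
labels <- format(ds$date_visit, "%b %Y")

# Levels taken from the labels sorted by the underlying dates
lev <- unique(labels[order(ds$date_visit)])

# table() follows the factor's level order, not alphabetical order
tbl <- table(factor(labels, levels = lev))
tbl
```

The dates themselves stay as Date objects in ds; only the labels used for the table are converted.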