I'm trying to create a function that gets all the months between two months as a list:
date1<- 201305
date2<- 201511
months <- function(date1,date2){}
And I want it to return a list like this:
201305
201306
201307
...
201509
201510
201511
First, we need to create dates. What you supply is not yet a date because it lacks a day, so we add one:
R> d1 <- as.Date(paste0("201305","01"), "%Y%m%d")
R> d2 <- as.Date(paste0("201511","01"), "%Y%m%d")
Given two dates, getting a sequence of dates is trivial: a call to seq(). Formatting the result the way you want is equally trivial:
R> dat <- format(seq(d1,d2,by="month"), "%Y%m")
We check the beginning and end:
R> head(dat)
[1] "201305" "201306" "201307" "201308" "201309" "201310"
R> tail(dat)
[1] "201506" "201507" "201508" "201509" "201510" "201511"
R>
Now, as a function:
datseq <- function(t1, t2) {
    format(seq(as.Date(paste0(t1, "01"), "%Y%m%d"),
               as.Date(paste0(t2, "01"), "%Y%m%d"), by = "month"),
           "%Y%m")
}
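As a quick check, the function can be called with the numeric inputs from the question. This is a minimal sketch; the as.integer() step is an assumption for the case where numbers rather than character strings are wanted.

```r
# datseq() as defined in the answer
datseq <- function(t1, t2) {
    format(seq(as.Date(paste0(t1, "01"), "%Y%m%d"),
               as.Date(paste0(t2, "01"), "%Y%m%d"), by = "month"),
           "%Y%m")
}

res <- datseq(201305, 201511)   # numeric input is coerced to text by paste0()
length(res)                     # number of months, both endpoints included
as.integer(res)                 # optional: numeric yyyymm instead of character
```

The result is a character vector, so the conversion is only needed if the values will be compared or sorted as numbers.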
This can be done using the yearmon class in the zoo package:
library(zoo)
ym1 <- as.yearmon(as.character(date1), "%Y%m") # convert to yearmon
ym2 <- as.yearmon(as.character(date2), "%Y%m") # ditto
s <- seq(ym1, ym2, 1/12) # create yearmon sequence
as.numeric(format(s, "%Y%m")) # convert to numeric yyyymm
giving:
[1] 201305 201306 201307 201308 201309 201310 201311 201312 201401 201402
[11] 201403 201404 201405 201406 201407 201408 201409 201410 201411 201412
[21] 201501 201502 201503 201504 201505 201506 201507 201508 201509 201510
[31] 201511
Alternatively, you might prefer to use s itself, a yearmon object that prints as shown here, sorts correctly, and can be used in plotting:
> s
[1] "May 2013" "Jun 2013" "Jul 2013" "Aug 2013" "Sep 2013" "Oct 2013"
[7] "Nov 2013" "Dec 2013" "Jan 2014" "Feb 2014" "Mar 2014" "Apr 2014"
[13] "May 2014" "Jun 2014" "Jul 2014" "Aug 2014" "Sep 2014" "Oct 2014"
[19] "Nov 2014" "Dec 2014" "Jan 2015" "Feb 2015" "Mar 2015" "Apr 2015"
[25] "May 2015" "Jun 2015" "Jul 2015" "Aug 2015" "Sep 2015" "Oct 2015"
[31] "Nov 2015"
For example this works:
plot(seq(31) ~ s)
I know this is an old post, but in case someone is having a similar issue to mine, I hope this is helpful. If you're looking for the months between two dates, let's say
d1 = as.Date("2022-01-25")
d2 = as.Date("2022-02-13")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-25"
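It does not give back the desired output of both January and February. This happens because seq() steps forward from the first date by whole months, and the next step (2022-02-25) falls after d2; February is included only when the day of the first date is less than or equal to the day of the second date.
What I opted to do was reset the first date to the first of its month, like so: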
d1 = paste0(substr(d1, 1,7), "-01")
seq(as.Date(d1), as.Date(d2), by = "month")
[1] "2022-01-01" "2022-02-01"
This way you always get the year-months you want, although the day portion no longer matches the original date.
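The same idea can be wrapped in a small helper that snaps both endpoints to the first of their month before calling seq(). This is only a sketch: months_between is a hypothetical name, and returning "YYYY-MM" strings is an assumption about the desired output.

```r
# Hypothetical helper: every calendar month touched by the range d1..d2
months_between <- function(d1, d2) {
    # format() with a literal "-01" moves each date to the first of its month
    first1 <- as.Date(format(as.Date(d1), "%Y-%m-01"))
    first2 <- as.Date(format(as.Date(d2), "%Y-%m-01"))
    format(seq(first1, first2, by = "month"), "%Y-%m")
}

months_between("2022-01-25", "2022-02-13")
months_between("2021-11-30", "2022-02-01")
```

Because both endpoints land on day 1, the result no longer depends on which day of the month the inputs fall on.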
Related
I have a set of dates represented as strings that have the following format:
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
My aim is to parse these strings into date format. I have tried anytime() and something like as.Date(dates_strings, format = "%A% %d %m %Y"). I wonder whether there is a lubridate-type solution similar to dmy() that would consider the day of the week as well.
You need to use "%B" for the month name, not "%m" (which expects a month number).
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
as.Date(dates_strings, format = "%A %d %B %Y")
[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"
We can use
as.Date(dates_strings, "%a %d %B %Y")
#[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"
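If the weekday in the string matters, it can be compared with the weekday of the parsed date. The sketch below assumes English day and month names (it forces the C locale for that reason) and assumes the parser itself may not cross-check the weekday name against the date, which the answers here do not state either way.

```r
# Assumption: the strings use English names, so parse and print in the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

dates_strings <- c("Monday 27 March 2017", "Friday 24 March 2017",
                   "Wednesday 22 March 2017", "Monday 20 March 2017",
                   "Wednesday 15 March 2017")

parsed <- as.Date(dates_strings, format = "%A %d %B %Y")

# The weekday as written is the first word of each string
stated <- sub(" .*", "", dates_strings)

# TRUE where the written weekday agrees with the calendar
agrees <- stated == weekdays(parsed)
agrees
```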
I'm still a newbie and need some help with my dataset in R.
I have a dataset containing daily observations for weekdays. In this dataset I want to add the dates for the missing weekends and change the format of the date to "2010-03-04".
The dataset looks as follows:
Date Price
2392 Mar 04, 2010 1,132.60
2393 Mar 03, 2010 1,142.70
2394 Mar 02, 2010 1,136.90
2395 Mar 01, 2010 1,117.80
2396 Feb 26, 2010 1,118.30
2397 Feb 25, 2010 1,107.80
2398 Feb 24, 2010 1,096.50
I use the following to change the format:
as.Date(gold_future$Date, format = '%b %d, %Y')
Date Price
2392 <NA> 1,132.60
2393 <NA> 1,142.70
2394 <NA> 1,136.90
2395 <NA> 1,117.80
2396 2010-02-26 1,118.30
2397 2010-02-25 1,107.80
2398 2010-02-24 1,096.50
What happens is that some dates are changed to the correct format but for others I get NA's. Furthermore after formatting the "Date" column I would like to add additional rows for the missing weekends. Any suggestions how I can solve the problem with the dates and include the missing rows? Btw the date column is of class factor.
Thanks in advance!
An option would be
library(dplyr)
library(lubridate)
df1 %>%
mutate(Date = mdy(Date))
# Date Price
#1 2010-03-04 1,132.60
#2 2010-03-03 1,142.70
#3 2010-03-02 1,136.90
#4 2010-03-01 1,117.80
#5 2010-02-26 1,118.30
#6 2010-02-25 1,107.80
#7 2010-02-24 1,096.50
as.Date also works:
as.Date(df1$Date, "%b %d, %Y")
#[1] "2010-03-04" "2010-03-03" "2010-03-02" "2010-03-01" "2010-02-26" "2010-02-25" "2010-02-24"
data
df1 <- structure(list(Date = c("Mar 04, 2010", "Mar 03, 2010", "Mar 02, 2010",
"Mar 01, 2010", "Feb 26, 2010", "Feb 25, 2010", "Feb 24, 2010"
), Price = c("1,132.60", "1,142.70", "1,136.90", "1,117.80",
"1,118.30", "1,107.80", "1,096.50")), class = "data.frame",
row.names = c("2392",
"2393", "2394", "2395", "2396", "2397", "2398"))
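The question also mentions a factor column, NA values for some months, and missing weekend rows. A possible base R sketch follows. It assumes the NAs come from a non-English LC_TIME locale (a guess; the question does not confirm it), and that the weekend rows should carry NA prices. The shortened data frame and the names all_days and filled are illustrative only.

```r
# Assumption: month abbreviations are English, so parse them in the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Shortened illustrative data with a factor Date column, as in the question
df1 <- data.frame(
    Date  = factor(c("Mar 04, 2010", "Mar 01, 2010", "Feb 26, 2010")),
    Price = c("1,132.60", "1,117.80", "1,118.30")
)

# Convert the factor to character before parsing, then strip thousands separators
df1$Date  <- as.Date(as.character(df1$Date), "%b %d, %Y")
df1$Price <- as.numeric(gsub(",", "", df1$Price))

# Build every calendar day in the range and left-join the observations onto it
all_days <- data.frame(Date = seq(min(df1$Date), max(df1$Date), by = "day"))
filled   <- merge(all_days, df1, by = "Date", all.x = TRUE)
filled
```

Days without an observation, weekends included, end up with NA in Price and can be filled afterwards if needed.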
I have a vector b of strings as
> b
[1] "Jan 01 2016 00:26:00" "Jan 01 2016 03:06:00" "Jan 01 2016 22:36:00" "Jan 01 2016 17:46:00"
[5] "Jan 01 2016 18:06:00" "Jan 01 2016 23:16:00" "Jan 01 2016 03:16:00" "Jan 01 2016 09:46:00"
[9] "Jan 01 2016 00:06:00" "Jan 01 2016 03:56:00"
I want to convert them into Date/Time object. I tried:
> as.Date(b, "%b %d %Y %H:%M:%S")
[1] "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01" "2016-01-01"
[9] "2016-01-01" "2016-01-01"
Why do I not get H:M:S? By the way, each element of vector b was extracted with substring from a string of the form "Fri Jan 01 00:26:00 UTC 2016".
A solution to convert directly from string "Fri Jan 01 00:26:00 UTC 2016" to a date/time object of the format "2016-01-01 23:59:00" would be helpful. I will use this date/time column to order the entire dataframe.
I would try using as.POSIXct() instead of as.Date() to account for the time component; as.Date() keeps only the date part. See ?as.POSIXct for the documentation.
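A minimal sketch of that suggestion, assuming English month and weekday names (hence the C locale) and assuming the times are meant as UTC, as the raw string in the question suggests:

```r
# Assumption: English names in the input, so parse in the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

b <- c("Jan 01 2016 00:26:00", "Jan 01 2016 22:36:00", "Jan 01 2016 03:06:00")

# as.POSIXct keeps the time of day; tz = "UTC" avoids local-time surprises
dt <- as.POSIXct(b, format = "%b %d %Y %H:%M:%S", tz = "UTC")

# The raw string can also be parsed directly; "UTC" is matched as literal text
raw  <- "Fri Jan 01 00:26:00 UTC 2016"
dt2  <- as.POSIXct(raw, format = "%a %b %d %H:%M:%S UTC %Y", tz = "UTC")
shown <- format(dt2, "%Y-%m-%d %H:%M:%S")

# POSIXct values sort chronologically, so they can order a data frame
ord <- order(dt)
```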
I have time-stamps in one column of my dataframe. They look like
"Tue May 14 21:57:04 +0000 2013"
I want to replace the whole timestamp with only the month name. How can I do it in R? Let's say the column name is "timestamp" and the dataframe name is "Df".
Below is the sample of some more entries.
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
I want these to look like
Jul
Apr
Jul
Mar
Feb
Your help will be highly appreciated.
Assign the source data using Akrun's string
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse using the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Re-format just the desired month
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
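Note that the parsed values above are printed in the session's local time zone (CDT/CST) even though the input carries a +0000 offset. If that matters, one option is to read the offset with %z and request UTC explicitly. This is a sketch under the assumption that the names are English (C locale) and that UTC is the zone you want to work in.

```r
# Assumption: English day/month names, so parse in the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
           "Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
           "Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")

# %z consumes the "+0000" offset; tz = "UTC" labels the result as UTC
pt_utc <- strptime(dates, "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

mon <- format(pt_utc, "%b")   # abbreviated month names
mon
```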
You can use strptime along with format.
Assuming you have character strings, we can first convert them to the "POSIXlt" "POSIXt" class and then extract the month (%b) part:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
We can use sub. Match one or more non-white space characters(\\S+) followed by one or more white space (\\s+), then capture the non-white space as a group ((\\S+)) followed by characters until the end of the string and replace it with the backreference (\\1) for the captured group.
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
It may be better to use DateTime conversions (as @DirkEddelbuettel mentioned in the comments) if we know how to get the format correct.
data
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
Assuming your timestamp is text:
df <- data.frame(timestamp = c("Tue May 14 21:57:04 +0000 2013",
                               "Fri Apr 20 01:46:59 +0000 2012",
                               "Sat Mar 16 02:12:30 +0000 2013"),
                 stringsAsFactors = FALSE)
df$month <- sapply(df$timestamp, function(sx) strsplit(sx, split = " ")[[1]][2])
df
> df
timestamp month
1 Tue May 14 21:57:04 +0000 2013 May
2 Fri Apr 20 01:46:59 +0000 2012 Apr
3 Sat Mar 16 02:12:30 +0000 2013 Mar
1) The month name is always in character positions 5 through 7 inclusive of the timestamp column, so this replaces the timestamp column with a character column of months:
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
The output is:
timestamp
1 Jul
2 Apr
3 Jul
4 Mar
5 Feb
2) If you wanted a factor column instead then use this variation which ensures that the factor levels are Jan=1, Feb=2, etc. rather than being assigned alphabetically:
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
Note: We have assumed input in the following reproducible form:
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
"Sat Feb 16 02:29:11 +0000 2013", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013"))
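To illustrate why the factor levels matter, the sketch below compares calendar ordering (levels = month.abb) with the default alphabetical levels, using the same DF. The variable names mon_cal and mon_alpha are illustrative only.

```r
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
    "Sat Feb 16 02:29:11 +0000 2013", "Sat Jul 07 17:56:34 +0000 2012",
    "Sat Mar 16 02:12:30 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013"))

abbr <- substr(DF$timestamp, 5, 7)

# Levels taken from the built-in month.abb constant: Jan = 1, Feb = 2, ...
mon_cal   <- factor(abbr, levels = month.abb)
# Default levels are the sorted unique values, i.e. alphabetical
mon_alpha <- factor(abbr)

as.integer(mon_cal)               # month numbers
as.character(sort(mon_cal))       # calendar order
as.character(sort(mon_alpha))     # alphabetical order
```

With month.abb as the levels, as.integer() doubles as a month number and sort() follows the calendar.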
I have a date field called date.timestamp in a database, which has date values like "Fri Nov 27 20:17:01 IST 2015". There are a lot of records with a date.timestamp field. I need to display it as Nov 2015 for all those records. How can I do that?
Assuming IST is Indian Standard Time, which is 5:30 hours ahead of Coordinated Universal Time, one would replace IST with +0530 and use the %z format of strptime.
vec <- "Fri Nov 27 20:17:01 IST 2015"
format(strptime(sub("IST", "+0530", vec), "%a %b %d %H:%M:%S %z %Y"), "%b %Y")
# [1] "Nov 2015"
vec <- c("Fri Nov 27 20:17:01 IST 2015","Mon Nov 30 20:17:01 IST 2015")
format(strptime(sub("IST", "+0530", vec), "%a %b %d %H:%M:%S %z %Y"), "%b %Y")
# [1] "Nov 2015" "Nov 2015"
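One caveat: when an offset is read with %z, the result is expressed in the time zone given by tz (the session's zone by default), so a timestamp close to a month boundary can land in a different month. A hedged sketch follows that pins the zone to "Asia/Kolkata"; this assumes the months should be reported in Indian time and that this zone name is available on the system. The second timestamp is made up for illustration.

```r
# Assumption: English day/month names, so parse in the C locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Second value is a hypothetical timestamp just after midnight on 1 December
vec <- c("Fri Nov 27 20:17:01 IST 2015", "Tue Dec 01 02:00:00 IST 2015")
fixed <- sub("IST", "+0530", vec)
fmt <- "%a %b %d %H:%M:%S %z %Y"

# Reported in Indian time: the month matches what the string says
in_ist <- format(strptime(fixed, fmt, tz = "Asia/Kolkata"), "%b %Y")

# Reported in UTC: 02:00 IST on 1 December is still 30 November
in_utc <- format(strptime(fixed, fmt, tz = "UTC"), "%b %Y")

in_ist
in_utc
```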