I have a series of dates as follows
25 September 2019
27 April 2020
1994
28 February 2021
1986
Now I want to convert the 1994 and 1986 to:
01 January 1994
01 January 1986
Other full dates should be left as they are.
Any help is appreciated especially using the tidyverse way.
A regex solution, which identifies the "only-year" values using the anchors ^ (for string start position) and $ (for string end position) as well as backreference \\1 to recollect the "only-year" values:
library(dplyr)
df %>%
mutate(dates = sub("^(\\d{4})$", "01 January \\1", dates))
dates
1 25 September 2019
2 27 April 2020
3 01 January 1994
4 28 February 2021
5 01 January 1986
base R:
df$dates <- sub("^(\\d{4})$", "01 January \\1", df$dates)
Data:
df <- data.frame(
dates = c("25 September 2019",
"27 April 2020",
"1994",
"28 February 2021",
"1986")
)
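Once the year-only values are padded, every string has the same "day month year" shape, so the column can be parsed into real Date objects if that is the end goal. The following is a minimal base R sketch under that assumption; the names padded and parsed are illustrative only, and %B relies on an English LC_TIME locale.

```r
# Sample data copied from the question
dates <- c("25 September 2019", "27 April 2020", "1994",
           "28 February 2021", "1986")

# Pad year-only entries so every string reads "day month year"
padded <- sub("^(\\d{4})$", "01 January \\1", dates)

# Parse into Date objects; %B matches the full month name in the current locale
parsed <- as.Date(padded, format = "%d %B %Y")
print(parsed)
```

Entries that do not match the format come back as NA, which makes malformed input easy to spot.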
Given some vector d of dates and years:
> d
[1] "25 September 2019" "27 April 2020" "1994"
[4] "28 February 2021" "1986"
Replace any entries that are exactly 4 characters long with those four characters with "01 January " pasted in front:
> d[nchar(d)==4] = paste0("01 January ",d[nchar(d)==4])
Giving:
> d
[1] "25 September 2019" "27 April 2020" "01 January 1994"
[4] "28 February 2021" "01 January 1986"
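The length test treats any four-character string as a year. If the vector might contain other short values, a stricter test is one option. This is a sketch only: the "n.a." entry is a hypothetical value added for illustration and is not part of the original data.

```r
# Question data plus one hypothetical four-character non-year entry
d <- c("25 September 2019", "27 April 2020", "1994",
       "28 February 2021", "1986", "n.a.")

# TRUE only for strings made of exactly four digits
year_only <- grepl("^[0-9]{4}$", d)

# Prefix the year-only entries and leave everything else untouched
d[year_only] <- paste0("01 January ", d[year_only])
print(d)
```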
I have a set of dates represented as strings that have the following format:
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
My aim is to parse these strings into date format. I have tried anytime() and something like as.Date(dates_strings, format = "%A% %d %m %Y"). I wonder whether there is a lubridate-type solution similar to dmy() that would consider the day of the week as well.
You need to use "%B" for the full month name rather than "%m", which expects a numeric month. The stray "%" after "%A" also has to go:
dates_strings = c("Monday 27 March 2017", "Friday 24 March 2017" , "Wednesday 22 March 2017", "Monday 20 March 2017" , "Wednesday 15 March 2017")
as.Date(dates_strings, format = "%A %d %B %Y")
[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"
We can use "%a" for the weekday, which on input also matches the full weekday name:
as.Date(dates_strings, "%a %d %B %Y")
#[1] "2017-03-27" "2017-03-24" "2017-03-22" "2017-03-20" "2017-03-15"
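Since the weekday is redundant once day, month and year are known, another option is to drop it before parsing and recompute it afterwards as a consistency check. This is a sketch assuming an English LC_TIME locale; the names no_weekday, parsed and recomputed are illustrative.

```r
dates_strings <- c("Monday 27 March 2017", "Friday 24 March 2017",
                   "Wednesday 22 March 2017", "Monday 20 March 2017",
                   "Wednesday 15 March 2017")

# Remove the leading weekday word and the whitespace after it
no_weekday <- sub("^\\S+\\s+", "", dates_strings)

# Parse what is left: day, full month name, four-digit year
parsed <- as.Date(no_weekday, format = "%d %B %Y")

# Recompute the weekday from the parsed date to compare with the original text
recomputed <- weekdays(parsed)
print(data.frame(parsed, recomputed))
```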
I have a data frame of World Cup matches that includes date, location, match_name, etc.
In this data frame I want to convert the date column to a date in a format like "2018-05-06".
Here is my file:
date match_name price
1 Thu Jun 14 Russia v Saudi Arabia €453.92
2 Fri Jun 15 Egypt v Uruguay €90.00
3 Tue Jun 19 Russia v Egypt €297.45
4 Wed Jun 20 Uruguay v Saudi Arabia €95.00
and here is my expected output (June is month 06):
date match_name price
1 2018-06-14 Russia v Saudi Arabia €453.92
2 2018-06-15 Egypt v Uruguay €90.00
3 2018-06-19 Russia v Egypt €297.45
4 2018-06-20 Uruguay v Saudi Arabia €95.00
This is certainly not the easiest way to do it, but I wanted to give you a quick answer.
library(stringr)
library(dplyr)
Data=data.frame(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20"),match_name=c("a","b","c","d"),price=c(1,2,3,4))
Data$date=as.character(Data$date)
regexp <- "[[:digit:]]+"
Data=mutate(Data,datenum=str_extract(Data$date, regexp))
Data=mutate(Data,monthnum=str_extract(Data$date, regexp))
Data=mutate(Data,monthname=str_extract(Data$date,"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"))
# if() tests only a single value, so look the month number up by its position in month.abb instead
Data=mutate(Data,monthnum=sprintf("%02d",match(monthname,month.abb)))
mutate(Data,Final_Date=paste0("2018-",monthnum,"-",datenum))
Resulting in
date match_name price datenum monthnum monthname Final_Date
1 Thu Jun 14 a 1 14 06 Jun 2018-06-14
2 Fri Jun 15 b 2 15 06 Jun 2018-06-15
3 Tue Jun 19 c 3 19 06 Jun 2018-06-19
4 Wed Jun 20 d 4 20 06 Jun 2018-06-20
OK, let's say you have this data.frame:
myDF <-as.data.frame(x=list(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20")))
Which constructs the following data.frame:
date
1 Thu Jun 14
2 Fri Jun 15
3 Tue Jun 19
4 Wed Jun 20
Assuming that each game is in 2018:
#for handling month abbreviations in English:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
myDF$date <- as.Date(paste0(substr(myDF$date,5,10),", 2018"),format="%b %d, %Y")
The resulting myDF:
date
1 2018-06-14
2 2018-06-15
3 2018-06-19
4 2018-06-20
You can change 2018 to any year you like where necessary.
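Sys.setlocale changes a session-wide setting, and the locale name may not exist on every system. A locale-independent alternative is to look the abbreviation up in the built-in month.abb vector. This is a sketch under the same assumption that every match is in 2018; date_chr, month_num and day_num are illustrative names.

```r
date_chr <- c("Thu Jun 14", "Fri Jun 15", "Tue Jun 19", "Wed Jun 20")

# Characters 5 to 7 hold the English month abbreviation; match() gives its position (Jun -> 6)
month_num <- match(substr(date_chr, 5, 7), month.abb)

# Characters 9 to 10 hold the day of the month
day_num <- as.integer(substr(date_chr, 9, 10))

# Assemble an ISO string (year 2018 is assumed) and convert it to Date
parsed <- as.Date(sprintf("2018-%02d-%02d", month_num, day_num))
print(parsed)
```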
To convert the "date" variable to a format like '2018-06-14', you can use the following function:
conv_date <- function(var, year){
var <- as.Date(paste0(var, " ", year), '%a %b %d %Y')
return(var)
}
where:
var - variable in your data table (e.g. 'date')
year - the year you need
Example:
yours_df$date <- conv_date(yours_df$date, 2018)
Hi and thanks in advance,
So I'm currently trying to read a list of test dates from a text file that I want to try and plot on a graph with some test values(known as quantity).
The issue I'm having is that when the times are read from the file, they are not read correctly. When I plot them onto a graph, the values are very distorted and incorrect, and don't display anything remotely resembling a date.
Here is my code:
Frame <- read.table("....path.../Frame.txt")
Frame$Time <- as.Date(Frame$Time)
TheForecast <- naive(Frame)
plot(TheForecast, xlab="Time",ylab="Quantity",main="Stock Quantity vs Time",type='l')
I have tried all the different formats of dates in the text file that I can think of, but they all return the same issue or worse ones.
Here's what I've tried:
Time <- c("01/01/2010", "07/02/2010", "08/03/2010", "02/04/2011", "11/05/2011", "12/06/2011", "06/07/2012", "08/30/2012", "04/16/2013", "03/18/2013", "02/22/2014", "01/27/2014", "12/15/2015", "09/28/2015", "05/04/2016", "11/07/2017", "09/22/2017", "04/04/2017")
Time <- c("2010-01-01", "2010-07-02", "2010-08-03", "2011-02-04", "2011-11-05", "2011-12-06", "2012-06-07", "2012-08-30", "2013-04-16", "2013-03-18", "2014-02-22", "2014-01-27", "2015-12-15", "2015-09-28", "2016-05-04", "2017-11-07", "2017-09-22", "2017-04-04")
Time <- c("1 January 2010", "7 February 2010", "8 March 2010", "2 April 2011", "11 May 2011", "12 June 2011", "6 July 2012", "30 August 2012", "16 April 2013", "18 March 2013", "22 February 2014", "27 January 2014", "15 December 2015", "28 September 2015", "4 May 2016", "7 November 2017", "22 September 2017", "4 April 2017")
Here's the test values for the y (quantity) axis, just for reference:
Quantity <- c(5,3,8,4,0,5,2,7,4,2,6,8,4,7,8,9,4,6)
Here is an example of the file before reading:
Time Quantity
1 2010-01-01 5
2 2010-07-02 3
3 2010-08-03 8
4 2011-02-04 4
5 2011-11-05 0
6 2011-12-06 5
7 2012-06-07 2
8 2012-08-30 7
9 2013-04-16 4
10 2013-03-18 2
11 2014-02-22 6
12 2014-01-27 8
13 2015-12-15 4
14 2015-09-28 7
15 2016-05-04 8
16 2017-11-07 9
17 2017-09-22 4
18 2017-04-04 6
Thank you.
The naive function needs an object of class ts or time-series to display graphics properly.
file_path <- paste0(getwd(),"/data/Frame.txt")
Frame <- read.table(file_path, stringsAsFactors = FALSE)
Frame$Time <- as.Date(Frame$Time, format = "%Y-%m-%d")
Here is your missing step:
Frame <- ts(Frame$Quantity, start = 1, end = NROW(Frame), frequency = 1)
Then proceed with the rest and your time scale should be much more accurate:
library(forecast)
TheForecast <- naive(Frame)
plot(TheForecast, xlab="Time",ylab="Quantity",main="Stock Quantity vs Time",type='l')
The following looks OK to me:
## Your Data
df = read.table(text="Time Quantity
1 2010-01-01 5
2 2010-07-02 3
3 2010-08-03 8
4 2011-02-04 4
5 2011-11-05 0
6 2011-12-06 5
7 2012-06-07 2
8 2012-08-30 7
9 2013-04-16 4
10 2013-03-18 2
11 2014-02-22 6
12 2014-01-27 8
13 2015-12-15 4
14 2015-09-28 7
15 2016-05-04 8
16 2017-11-07 9
17 2017-09-22 4
18 2017-04-04 6",
header=TRUE, stringsAsFactors=FALSE)
## Convert string to date
df$Time = as.Date(df$Time, format="%Y-%m-%d")
plot(df, pch=20)
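Each of the three layouts tried in the question needs its own format string, because as.Date does not guess layouts such as month/day/year on its own. The sketch below uses three unambiguous entries from the question and assumes the slash-separated values are month/day/year (as "08/30/2012" suggests) and that the locale is English. The variable names are illustrative.

```r
us_style   <- c("08/30/2012", "04/16/2013", "03/18/2013")
iso_style  <- c("2012-08-30", "2013-04-16", "2013-03-18")
long_style <- c("30 August 2012", "16 April 2013", "18 March 2013")

# One explicit format string per layout
from_us   <- as.Date(us_style,   format = "%m/%d/%Y")
from_iso  <- as.Date(iso_style,  format = "%Y-%m-%d")
from_long <- as.Date(long_style, format = "%d %B %Y")

# All three routes should produce the same Date values
print(identical(from_us, from_iso) && identical(from_iso, from_long))
```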
I have time-stamps in one column of my dataframe. They look like
"Tue May 14 21:57:04 +0000 2013"
I want to replace the whole timestamp with only the month name. How can I do it in R? Let's say the column name is "timestamp" and the data frame name is "Df".
Below is the sample of some more entries.
"Wed Jul 10 01:30:36 +0000 2013"
"Fri Apr 20 01:46:59 +0000 2012"
"Sat Jul 07 17:56:34 +0000 2012"
"Sat Mar 16 02:12:30 +0000 2013"
"Sat Feb 16 02:29:11 +0000 2013"
I want these to look like
Jul
Apr
Jul
Mar
Feb
Your help will be highly appreciated.
Assign the source data using Akrun's string
R> dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
R> dates
[1] "Tue May 14 21:57:04 +0000 2013"
[2] "Wed Jul 10 01:30:36 +0000 2013"
[3] "Fri Apr 20 01:46:59 +0000 2012"
[4] "Sat Jul 07 17:56:34 +0000 2012"
[5] "Sat Mar 16 02:12:30 +0000 2013"
[6] "Sat Feb 16 02:29:11 +0000 2013"
R>
Parse using the appropriate strptime format:
R> pt <- strptime(dates, "%a %b %d %H:%M:%S +0000 %Y")
R> pt
[1] "2013-05-14 21:57:04 CDT" "2013-07-10 01:30:36 CDT"
[3] "2012-04-20 01:46:59 CDT" "2012-07-07 17:56:34 CDT"
[5] "2013-03-16 02:12:30 CDT" "2013-02-16 02:29:11 CST"
R>
Re-format just the desired month
R> strftime(pt, "%m")
[1] "05" "07" "04" "07" "03" "02"
R> strftime(pt, "%b")
[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
R> strftime(pt, "%B")
[1] "May" "July" "April" "July" "March"
[6] "February"
R>
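The format string above matches the text "+0000" literally, so the times are interpreted in the local time zone (CDT/CST in the output shown). If the offset should be honoured, one option is to read it with %z and pin the result to UTC. This is a sketch, assuming an English locale for the %a and %b names.

```r
dates <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
           "Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
           "Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")

# %z reads the signed UTC offset; tz = "UTC" keeps the result in UTC
pt <- as.POSIXct(dates, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Abbreviated month name of each timestamp
months_abb <- format(pt, "%b")
print(months_abb)
```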
You can use strptime along with format.
Assuming you have character strings, we can first convert them to the "POSIXlt" "POSIXt" class and then extract the month (%b) part:
format(strptime(x, "%a %b %d %H:%M:%S +0000 %Y"), "%b")
#[1] "Jul" "Apr" "Jul" "Mar" "Feb"
We can use sub. Match one or more non-white space characters(\\S+) followed by one or more white space (\\s+), then capture the non-white space as a group ((\\S+)) followed by characters until the end of the string and replace it with the backreference (\\1) for the captured group.
sub("\\S+\\s+(\\S+).*", "\\1", v1)
#[1] "May" "Jul" "Apr" "Jul" "Mar" "Feb"
It may be better to use DateTime conversions (as @DirkEddelbuettel mentioned in the comments) if we know how to get the format correct.
data
v1 <- c("Tue May 14 21:57:04 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Sat Feb 16 02:29:11 +0000 2013")
Assuming your timestamp is text:
df<-data.frame(timestamp=c("Tue May 14 21:57:04 +0000 2013",
"Fri Apr 20 01:46:59 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013"),stringsAsFactors = F)
df$month<-sapply(df$timestamp,function(sx)strsplit(sx,split=" ")[[1]][2])
df
> df
timestamp month
1 Tue May 14 21:57:04 +0000 2013 May
2 Fri Apr 20 01:46:59 +0000 2012 Apr
3 Sat Mar 16 02:12:30 +0000 2013 Mar
1) The month name is always in character positions 5 through 7 inclusive of the timestamp column, so this replaces the timestamp column with a character column of months:
transform(DF, timestamp = format(substr(timestamp, 5, 7)))
The output is:
timestamp
1 Apr
2 Feb
3 Jul
4 Mar
5 Jul
2) If you wanted a factor column instead then use this variation which ensures that the factor levels are Jan=1, Feb=2, etc. rather than being assigned alphabetically:
transform(DF, timestamp = factor(substr(timestamp, 5, 7), levels = month.abb))
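The practical effect of setting levels = month.abb shows up when sorting or tabulating. The sketch below compares the two orderings; the standalone timestamp vector mirrors the sample input and the other names are illustrative.

```r
timestamp <- c("Fri Apr 20 01:46:59 +0000 2012", "Sat Feb 16 02:29:11 +0000 2013",
               "Sat Jul 07 17:56:34 +0000 2012", "Sat Mar 16 02:12:30 +0000 2013",
               "Wed Jul 10 01:30:36 +0000 2013")

month_chr <- substr(timestamp, 5, 7)
month_fac <- factor(month_chr, levels = month.abb)

# Character vectors sort alphabetically
alphabetical <- sort(month_chr)

# Factors sort by level order, which here is calendar order
calendar <- as.character(sort(month_fac))

print(alphabetical)
print(calendar)
```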
Note: We have assumed input in the following reproducible form:
DF <- data.frame(timestamp = c("Fri Apr 20 01:46:59 +0000 2012",
"Sat Feb 16 02:29:11 +0000 2013", "Sat Jul 07 17:56:34 +0000 2012",
"Sat Mar 16 02:12:30 +0000 2013", "Wed Jul 10 01:30:36 +0000 2013"))
I need to create a new data frame from my original one, whose format is given below.
MonthFrom MonthTo
Jan 2010 May 2010
Mar 2010 Jan 2012
Jan 2011 Jun 2011
Mar 2010 Jun 2010
Feb 2012 Mar 2012
Feb 2013 Feb 2013 #please note that these two months are the same.
The example data set above is from my data. I want to create a data frame as below.
Month NumberofMonth
Jan 5
Jan 12
Feb 1
Feb 2
Mar 16
Mar 4
So, generally, the function should count the number of months between two dates (of class yearmon) and assign this number to the corresponding starting month. For example, if the number of months in the first row is 5 and the MonthFrom in the first row is January, the function should assign 5 to January. Can anyone help me, please?
Given that the zoo type yearmon you're using allows for basic math manipulation and month name extraction with format(), the following should work for you (unless I've missed something in your requirements):
library(zoo)
my.df <- data.frame(
MonthFrom=as.yearmon(c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")),
MonthTo=as.yearmon(c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")))
print(my.df)
## MonthFrom MonthTo
## 1 Jan 2010 May 2010
## 2 Mar 2010 Jan 2012
## 3 Jan 2011 Jun 2011
## 4 Mar 2010 Jun 2010
## 5 Feb 2012 Mar 2012
## 6 Feb 2013 Feb 2013
new.df <- data.frame(
Month=format(my.df$MonthFrom, "%b"),
NumberOfMonth= round((my.df$MonthTo - my.df$MonthFrom) * 12) + 1)
print(new.df)
## Month NumberOfMonth
## 1 Jan 5
## 2 Mar 23
## 3 Jan 6
## 4 Mar 4
## 5 Feb 2
## 6 Feb 1
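A yearmon value is stored as a year plus a fraction of a year, so the difference multiplied by 12 is a floating-point number. If exact integer counts matter, the same calculation can be done with integer month indices in base R. This is a sketch that does not use zoo; to_lt and month_index are hypothetical helper names, and %b assumes an English locale.

```r
month_from <- c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")
month_to   <- c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")

# Hypothetical helper: treat "Mon YYYY" as the first day of that month
to_lt <- function(x) as.POSIXlt(as.Date(paste("01", x), format = "%d %b %Y"))

# Hypothetical helper: months elapsed since year 0, as an integer
month_index <- function(lt) (lt$year + 1900L) * 12L + lt$mon

from_lt <- to_lt(month_from)
to_end  <- to_lt(month_to)

# Inclusive month count, so identical start and end months give 1
new_df <- data.frame(
  Month = format(from_lt, "%b"),
  NumberOfMonth = month_index(to_end) - month_index(from_lt) + 1L)
print(new_df)
```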