I need to create a new data frame from my original one, whose format is given below.
MonthFrom MonthTo
Jan 2010 May 2010
Mar 2010 Jan 2012
Jan 2011 Jun 2011
Mar 2010 Jun 2010
Feb 2012 Mar 2012
Feb 2013 Feb 2013 # note that these two months are the same
The example data set above is taken from my data. I want to create a data frame like the one below.
Month NumberofMonth
Jan 5
Jan 12
Feb 1
Feb 2
Mar 16
Mar 4
So, in general, the function should count the number of months between the two dates (both of class yearmon) and assign this number to the corresponding MonthFrom month. For example, if the number of months in the first row is 5 and MonthFrom in the first row is January, the function assigns 5 to January. Can anyone help me, please?
Given that the zoo class yearmon you're using supports basic arithmetic and month-name extraction with format(), the following should work for you (unless I've missed something in your requirements):
library(zoo)
my.df <- data.frame(
MonthFrom=as.yearmon(c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")),
MonthTo=as.yearmon(c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")))
print(my.df)
## MonthFrom MonthTo
## 1 Jan 2010 May 2010
## 2 Mar 2010 Jan 2012
## 3 Jan 2011 Jun 2011
## 4 Mar 2010 Jun 2010
## 5 Feb 2012 Mar 2012
## 6 Feb 2013 Feb 2013
new.df <- data.frame(
Month=format(my.df$MonthFrom, "%b"),
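# yearmon differences are fractions of a year; round() guards against
# floating-point results such as 3.9999999 before adding 1 to count both end months
NumberOfMonth = round((my.df$MonthTo - my.df$MonthFrom) * 12) + 1)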
print(new.df)
## Month NumberOfMonth
## 1 Jan 5
## 2 Mar 23
## 3 Jan 6
## 4 Mar 4
## 5 Feb 2
## 6 Feb 1
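If you would rather avoid floating-point arithmetic on yearmon values (or avoid zoo altogether), a base R sketch can turn each "Mon YYYY" string into an integer month index, year * 12 + month, and subtract. The helper name month_index is hypothetical, and the code assumes every string has the form "Mon YYYY" with an English three-letter month abbreviation. month.abb is R's built-in vector of English abbreviations, so the lookup does not depend on the locale.

```r
# Hypothetical helper: "Jan 2010" -> 2010 * 12 + 1, an integer month index.
# Assumes every string is "Mon YYYY" with an English month abbreviation.
month_index <- function(x) {
  mon  <- match(substr(x, 1, 3), month.abb)  # 1..12, independent of locale
  year <- as.integer(substr(x, 5, 8))
  year * 12L + mon
}

month_from <- c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")
month_to   <- c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")

new_df <- data.frame(
  Month         = substr(month_from, 1, 3),
  # +1 makes the count inclusive, so "Feb 2013" to "Feb 2013" counts as 1
  NumberOfMonth = month_index(month_to) - month_index(month_from) + 1L
)
print(new_df)
```

This should reproduce the NumberOfMonth column shown above (5, 23, 6, 4, 2, 1), but as exact integers.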
Related
I am trying to convert the dates in the created_at column into a new column, created_at_dt, holding the number of seconds since the Unix epoch, using POSIXct.
created_at
<chr>
Fri May 26 17:30:01 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:04 +0000 2017
Fri May 26 17:30:12 +0000 2017
Example of what I want to achieve:
created_at_dt
<dbl>
1495819801
1495819805
1495819805
1495819804
1495819812
I tried the following line but got only NA values:
tweets <- tweets %>%
mutate(created_at_dt = asPOSIXct(as.numeric('created_at')))
Any help would be much appreciated. Thank you!
You just need to specify the correct format string for as.POSIXct.
Also, created_at should not be in quotes for mutate().
library(dplyr)
tweets <- tweets %>%
mutate(created_at_dt = as.POSIXct(created_at,
format = "%a %B %d %H:%M:%S %z %Y") %>%
as.numeric())
Result:
created_at created_at_dt
1 Fri May 26 17:30:01 +0000 2017 1495819801
2 Fri May 26 17:30:05 +0000 2017 1495819805
3 Fri May 26 17:30:05 +0000 2017 1495819805
4 Fri May 26 17:30:04 +0000 2017 1495819804
5 Fri May 26 17:30:12 +0000 2017 1495819812
The data:
tweets <- structure(list(created_at = c("Fri May 26 17:30:01 +0000 2017",
"Fri May 26 17:30:05 +0000 2017", "Fri May 26 17:30:05 +0000 2017",
"Fri May 26 17:30:04 +0000 2017", "Fri May 26 17:30:12 +0000 2017"
)), class = "data.frame", row.names = c(NA, -5L))
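On input, %a and %b are matched against day and month names in the current LC_TIME locale, so the format string above can return NA on a machine set to, say, a German or French locale. A locale-independent sketch translates the English month abbreviation with month.abb and rebuilds an ISO-ordered string before parsing. parse_twitter_time is a hypothetical name, and the code assumes every value has exactly the six space-separated fields shown in the question.

```r
# Hypothetical helper; assumes "Wkd Mon DD HH:MM:SS +ZZZZ YYYY" strings.
parse_twitter_time <- function(x) {
  parts <- do.call(rbind, strsplit(x, " ", fixed = TRUE))  # one row per value
  iso <- sprintf("%s-%02d-%s %s %s",
                 parts[, 6],                    # year
                 match(parts[, 2], month.abb),  # month number, locale-free
                 parts[, 3],                    # day of month
                 parts[, 4],                    # time
                 parts[, 5])                    # UTC offset, read by %z
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
}

created_at <- c("Fri May 26 17:30:01 +0000 2017", "Fri May 26 17:30:12 +0000 2017")
as.numeric(parse_twitter_time(created_at))
```

Inside the pipeline it would be used as mutate(created_at_dt = as.numeric(parse_twitter_time(created_at))).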
I have a series of dates as follows
25 September 2019
27 April 2020
1994
28 February 2021
1986
Now I want to convert 1994 and 1986 to:
01 January 1994
01 January 1986
Other full dates should be left as they are.
Any help is appreciated, especially a tidyverse solution.
A regex solution, which identifies the "only-year" values using the anchors ^ (for string start position) and $ (for string end position) as well as the backreference \\1 to recollect the "only-year" values:
library(dplyr)
df %>%
mutate(dates = sub("^(\\d{4})$", "01 January \\1", dates))
dates
1 25 September 2019
2 27 April 2020
3 01 January 1994
4 28 February 2021
5 01 January 1986
In base R:
df$dates <- sub("^(\\d{4})$", "01 January \\1", df$dates)
Data:
df <- data.frame(
dates = c("25 September 2019",
"27 April 2020",
"1994",
"28 February 2021",
"1986")
)
Given some vector d of dates and years:
> d
[1] "25 September 2019" "27 April 2020" "1994"
[4] "28 February 2021" "1986"
Replace any entries that are exactly four characters long with those four characters, prefixed by "01 January ":
> d[nchar(d)==4] = paste0("01 January ",d[nchar(d)==4])
Giving:
> d
[1] "25 September 2019" "27 April 2020" "01 January 1994"
[4] "28 February 2021" "01 January 1986"
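Once every entry has a day, month and year, the strings can be converted to actual Date values. One sketch, assuming full English month names as in the sample, looks the month up in R's built-in month.name vector instead of relying on %B and the current locale:

```r
d <- c("25 September 2019", "27 April 2020", "1994", "28 February 2021", "1986")

# Fill in year-only entries, as in the answers above
full <- sub("^(\\d{4})$", "01 January \\1", d)

# Split "DD Month YYYY" into its three parts (one row per entry)
parts <- do.call(rbind, strsplit(full, " ", fixed = TRUE))
dates <- as.Date(sprintf("%s-%02d-%02d",
                         parts[, 3],                     # year
                         match(parts[, 2], month.name),  # month number
                         as.integer(parts[, 1])))        # day
dates
```

A month name that is not in month.name gives NA from match(), which then makes as.Date() fail or return NA, so misspellings do not slip through silently.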
I'm still a newbie and need some help with my dataset in R.
I have a dataset containing daily observations for weekdays. In this dataset I want to add the dates for the missing weekends and change the format of the date to "2010-03-04".
The dataset looks as follows:
Date Price
2392 Mar 04, 2010 1,132.60
2393 Mar 03, 2010 1,142.70
2394 Mar 02, 2010 1,136.90
2395 Mar 01, 2010 1,117.80
2396 Feb 26, 2010 1,118.30
2397 Feb 25, 2010 1,107.80
2398 Feb 24, 2010 1,096.50
I use the following to change the format:
as.Date(gold_future$Date, format = '%b %d, %Y')
Date Price
2392 <NA> 1,132.60
2393 <NA> 1,142.70
2394 <NA> 1,136.90
2395 <NA> 1,117.80
2396 2010-02-26 1,118.30
2397 2010-02-25 1,107.80
2398 2010-02-24 1,096.50
Some dates are converted to the correct format, but for others I get NAs. Furthermore, after formatting the "Date" column I would like to add rows for the missing weekends. Any suggestions on how to fix the dates and include the missing rows? By the way, the Date column is of class factor.
Thanks in advance!
An option would be
library(dplyr)
library(lubridate)
df1 %>%
mutate(Date = mdy(Date))
# Date Price
#1 2010-03-04 1,132.60
#2 2010-03-03 1,142.70
#3 2010-03-02 1,136.90
#4 2010-03-01 1,117.80
#5 2010-02-26 1,118.30
#6 2010-02-25 1,107.80
#7 2010-02-24 1,096.50
as.Date also works:
as.Date(df1$Date, "%b %d, %Y")
#[1] "2010-03-04" "2010-03-03" "2010-03-02" "2010-03-01" "2010-02-26" "2010-02-25" "2010-02-24"
data
df1 <- structure(list(Date = c("Mar 04, 2010", "Mar 03, 2010", "Mar 02, 2010",
"Mar 01, 2010", "Feb 26, 2010", "Feb 25, 2010", "Feb 24, 2010"
), Price = c("1,132.60", "1,142.70", "1,136.90", "1,117.80",
"1,118.30", "1,107.80", "1,096.50")), class = "data.frame",
row.names = c("2392",
"2393", "2394", "2395", "2396", "2397", "2398"))
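The question also asks for rows for the missing weekends, which the answer above does not cover. A base R sketch: parse the dates, build the complete daily sequence with seq(), and left-join the data onto it with merge(all.x = TRUE), so weekend rows appear with NA prices. Two assumptions are worth flagging. First, the month is looked up in month.abb rather than parsed with %b; the NAs appearing only for the March dates are consistent with a non-English LC_TIME locale (German, for instance, abbreviates March as "Mär"), but that is a guess the question does not confirm. Second, the weekend prices are left as NA; filling them (for example with the previous trading day's price) is a separate decision.

```r
df1 <- data.frame(
  Date  = c("Mar 04, 2010", "Mar 03, 2010", "Mar 02, 2010", "Mar 01, 2010",
            "Feb 26, 2010", "Feb 25, 2010", "Feb 24, 2010"),
  Price = c("1,132.60", "1,142.70", "1,136.90", "1,117.80",
            "1,118.30", "1,107.80", "1,096.50")
)

# Parse "Mon DD, YYYY" without depending on the locale; as.character()
# also handles a factor column like the one in the question
d   <- as.character(df1$Date)
mon <- match(substr(d, 1, 3), month.abb)
df1$Date  <- as.Date(sprintf("%s-%02d-%s", substr(d, 9, 12), mon, substr(d, 5, 6)))
df1$Price <- as.numeric(gsub(",", "", df1$Price, fixed = TRUE))  # drop thousands separator

# One row per calendar day; days missing from df1 (the weekend) get NA prices
all_days <- data.frame(Date = seq(min(df1$Date), max(df1$Date), by = "day"))
filled   <- merge(all_days, df1, by = "Date", all.x = TRUE)
filled
```

merge() sorts by Date, so the result runs from 2010-02-24 to 2010-03-04, with NA prices on 2010-02-27 and 2010-02-28.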
This question already has an answer here:
R convert string date (e.g. "October 1, 2014") to Date format
I have a data frame about World Cup matches that includes date, location, match_name, etc.
In this data frame I want to convert the date column to a Date in the format "2018-05-06".
Here is my file:
date match_name price
1 Thu Jun 14 Russia v Saudi Arabia €453.92
2 Fri Jun 15 Egypt v Uruguay €90.00
3 Tue Jun 19 Russia v Egypt €297.45
4 Wed Jun 20 Uruguay v Saudi Arabia €95.00
And here is what I expect:
date match_name price
1 2018-06-14 Russia v Saudi Arabia €453.92
2 2018-06-15 Egypt v Uruguay €90.00
3 2018-06-19 Russia v Egypt €297.45
4 2018-06-20 Uruguay v Saudi Arabia €95.00
This is certainly not the easiest way to do it, but I wanted to give you a quick answer.
library(stringr)
library(dplyr)
Data=data.frame(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20"),match_name=c("a","b","c","d"),price=c(1,2,3,4))
Data$date=as.character(Data$date)
regexp <- "[[:digit:]]+"
Data=mutate(Data,datenum=str_extract(Data$date, regexp))
Data=mutate(Data,monthnum=str_extract(Data$date, regexp))
Data=mutate(Data,monthname=str_extract(Data$date,"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"))
# Vectorised lookup of the month number in R's built-in month.abb vector;
# if() only accepts a single TRUE/FALSE, so it cannot test a whole column
# (that is an error in R >= 4.2)
Data=mutate(Data,monthnum=sprintf("%02d", match(monthname, month.abb)))
mutate(Data,Final_Date=paste0("2018-",monthnum,"-",datenum))
Resulting in
date match_name price datenum monthnum monthname Final_Date
1 Thu Jun 14 a 1 14 06 Jun 2018-06-14
2 Fri Jun 15 b 2 15 06 Jun 2018-06-15
3 Tue Jun 19 c 3 19 06 Jun 2018-06-19
4 Wed Jun 20 d 4 20 06 Jun 2018-06-20
OK, let's say you have this data.frame:
myDF <-as.data.frame(x=list(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20")))
Which constructs the following data.frame:
date
1 Thu Jun 14
2 Fri Jun 15
3 Tue Jun 19
4 Wed Jun 20
Assuming that each game is in 2018:
#for handling month abbreviations in English:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
myDF$date <- as.Date(paste0(substr(myDF$date,5,10),", 2018"),format="%b %d, %Y")
The resulting myDF:
date
1 2018-06-14
2 2018-06-15
3 2018-06-19
4 2018-06-20
You can change 2018 to any year you like where necessary.
To convert a variable "date" to the format '2018-05-14', you can use the following function:
conv_date <- function(var, year){
var <- as.Date(paste0(var, " ", year), '%a %b %d %Y')
return(var)
}
where:
var - variable in your data table (i.e. 'date')
year - the year you need
Example:
yours_df$date <- conv_date(yours_df$date, 2018)
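All three answers have to assume the year (2018), because the strings do not contain one. Since the strings do include the weekday, that assumption can be checked. The sketch below builds the dates locale-independently with month.abb, then compares the weekday R computes for each date with the one in the string. POSIXlt's wday field runs from 0 (Sunday) to 6 (Saturday), so the check does not depend on locale-specific day names either; year_ok is just an illustrative variable name.

```r
# Year-less match dates; the year is an assumption, as in the answers above
raw  <- c("Thu Jun 14", "Fri Jun 15", "Tue Jun 19", "Wed Jun 20")
year <- 2018L

parts <- do.call(rbind, strsplit(raw, " ", fixed = TRUE))  # weekday, month, day
dates <- as.Date(sprintf("%d-%02d-%02d", year,
                         match(parts[, 2], month.abb),
                         as.integer(parts[, 3])))

# Check the assumed year: POSIXlt$wday is 0 = Sunday ... 6 = Saturday
wday_abb <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
year_ok  <- wday_abb[as.POSIXlt(dates)$wday + 1] == parts[, 1]
dates
all(year_ok)
```

With year <- 2017L every check fails (June 14, 2017 was a Wednesday), which is a quick way to catch a wrong guess.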
Currently, I have a lot of data, and associated with the data I also have dates. Unfortunately, the dates are in the following format: weekday (Monday-Sunday), month (January-December), day of month (1-31), Hour:Minute:Second, time zone, year. I would like to convert this into just Month/Day(1-31)/Year. Here is some sample data:
created_data
Sat Jun 20 23:45:03 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:08 +0000 2015
Sat Jun 20 23:45:11 +0000 2015
Sat Jun 20 23:45:13 +0000 2015
Sat Jun 20 23:45:14 +0000 2015
Sat Jun 20 23:45:15 +0000 2015
This is currently stored in a data frame. The format I am trying to get is the following:
Results
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Here is the code I have tried, but the result was just NA:
strptime(x = created_data, format = "%m/%d/%Y")
Result = NA
First you have to convert your character string to something that R knows how to deal with such as a POSIXct object.
Given your format, you can use as.POSIXct(created_data, format = "%a %b %d %X %z %Y", tz = "UTC"). Setting tz = "UTC" keeps each timestamp on its UTC date, matching the +0000 offset in the data.
Once it is in that format, you can convert it back to a character string in the format you want using format(), for example:
format(as.POSIXct(created_data, format = "%a %b %d %X %z %Y", tz = "UTC"), format = "%b %d %Y")
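Whether you get June 20 or June 21 depends on the tz argument. Without it, as.POSIXct() works in the session's local time zone, so a timestamp at 23:45 UTC shows up as the next day anywhere ahead of UTC. A small illustration, using the same instant written in ISO order so it does not depend on locale-specific month names (Asia/Tokyo is just an example zone and assumes the system's time zone database is available):

```r
x   <- "2015-06-20 23:45:03 +0000"
fmt <- "%Y-%m-%d %H:%M:%S %z"

utc   <- as.POSIXct(x, format = fmt, tz = "UTC")
tokyo <- as.POSIXct(x, format = fmt, tz = "Asia/Tokyo")

format(utc, "%Y-%m-%d")    # "2015-06-20"
format(tokyo, "%Y-%m-%d")  # "2015-06-21": 23:45 UTC is 08:45 the next morning in Tokyo (UTC+9)
as.numeric(utc) == as.numeric(tokyo)  # TRUE: same instant, different calendar date
```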
The following should work, assuming the datetimes are stored in a character vector.
library("stringr")
library("dplyr")
dates <- c("Sat Jun 20 23:45:03 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:08 +0000 2015",
"Sat Jun 20 23:45:11 +0000 2015",
"Sat Jun 20 23:45:13 +0000 2015",
"Sat Jun 20 23:45:14 +0000 2015",
"Sat Jun 20 23:45:15 +0000 2015")
str_split_fixed(dates, pattern = " ", n=6) %>%
as.data.frame() %>%
mutate(new.date = as.Date(paste(V2, V3, V6), format = "%b %d %Y"))
The basic idea is to split the string into its individual pieces with str_split_fixed() and then recombine the relevant pieces in as.Date().
A base R solution, without any other packages:
x <- "Sat Jun 20 23:45:03 +0000 2015"
x1 <- format(strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"), "%b %d %Y")
x1
[1] "Jun 20 2015"