Changing the format from factor to date in R - r

I'm still a newbie and need some help with my dataset in R.
I have a dataset containing daily observations for weekdays. In this dataset I want to add the dates for the missing weekends and change the format of the date to "2010-03-04".
The dataset looks as follows:
Date Price
2392 Mar 04, 2010 1,132.60
2393 Mar 03, 2010 1,142.70
2394 Mar 02, 2010 1,136.90
2395 Mar 01, 2010 1,117.80
2396 Feb 26, 2010 1,118.30
2397 Feb 25, 2010 1,107.80
2398 Feb 24, 2010 1,096.50
I use the following to change the format:
as.Date(gold_future$Date, format = '%b %d, %Y')
Date Price
2392 <NA> 1,132.60
2393 <NA> 1,142.70
2394 <NA> 1,136.90
2395 <NA> 1,117.80
2396 2010-02-26 1,118.30
2397 2010-02-25 1,107.80
2398 2010-02-24 1,096.50
What happens is that some dates are changed to the correct format but for others I get NA's. Furthermore after formatting the "Date" column I would like to add additional rows for the missing weekends. Any suggestions how I can solve the problem with the dates and include the missing rows? Btw the date column is of class factor.
Thanks in advance!

An option would be
library(dplyr)
library(lubridate)
df1 %>%
mutate(Date = mdy(Date))
# Date Price
#1 2010-03-04 1,132.60
#2 2010-03-03 1,142.70
#3 2010-03-02 1,136.90
#4 2010-03-01 1,117.80
#5 2010-02-26 1,118.30
#6 2010-02-25 1,107.80
#7 2010-02-24 1,096.50
as.Date also works:
as.Date(df1$Date, "%b %d, %Y")
#[1] "2010-03-04" "2010-03-03" "2010-03-02" "2010-03-01" "2010-02-26" "2010-02-25" "2010-02-24"
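If some months parse and others come back as NA, one possible cause is the locale: %b matches abbreviated month names in the current LC_TIME locale, so "Feb" may match while "Mar" does not (in a German locale, for example). The question does not confirm this, so treat the following as a sketch under that assumption. The vector dates_fct is a made-up stand-in for the factor column.

```r
# Hypothetical stand-in for the factor column from the question
dates_fct <- factor(c("Mar 04, 2010", "Feb 26, 2010"))

# Remember the current time locale, then switch to the "C" locale,
# where month abbreviations are the English ones (Jan, Feb, Mar, ...)
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Convert the factor to character explicitly before parsing
parsed <- as.Date(as.character(dates_fct), format = "%b %d, %Y")

# Restore the previous locale
invisible(Sys.setlocale("LC_TIME", old_locale))

parsed
```

If the NAs disappear after switching the locale, the month names in the data simply did not match the ones the session expected.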
data
df1 <- structure(list(Date = c("Mar 04, 2010", "Mar 03, 2010", "Mar 02, 2010",
"Mar 01, 2010", "Feb 26, 2010", "Feb 25, 2010", "Feb 24, 2010"
), Price = c("1,132.60", "1,142.70", "1,136.90", "1,117.80",
"1,118.30", "1,107.80", "1,096.50")), class = "data.frame",
row.names = c("2392",
"2393", "2394", "2395", "2396", "2397", "2398"))
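For the second part of the question (adding rows for the missing weekends), one way is to build the full sequence of calendar days and left-join the observations onto it. This is only a sketch: it uses a three-row subset of the data, assumes English month abbreviations in the current locale, and the names all_days and filled are made up for illustration.

```r
# Subset of the question's data, for illustration only
df1 <- data.frame(
  Date  = c("Mar 01, 2010", "Feb 26, 2010", "Feb 25, 2010"),
  Price = c("1,117.80", "1,118.30", "1,107.80"),
  stringsAsFactors = FALSE
)

# Parse the dates (assumes English month abbreviations in LC_TIME)
df1$Date <- as.Date(df1$Date, format = "%b %d, %Y")

# Strip the thousands separator so Price becomes numeric
df1$Price <- as.numeric(gsub(",", "", df1$Price))

# Every calendar day between the first and last observation
all_days <- data.frame(Date = seq(min(df1$Date), max(df1$Date), by = "day"))

# Left join: days without an observation (the weekend) get NA prices
filled <- merge(all_days, df1, by = "Date", all.x = TRUE)
filled
```

Whether the weekend rows should stay NA or carry the last observed price forward depends on the analysis; the sketch leaves them as NA.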

Related

How can I change date string to POSIXct format?

I am trying to convert the dates in the created_at column into a created_at_dt column that holds the number of seconds since the epoch, using POSIXct.
created_at
<chr>
Fri May 26 17:30:01 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:05 +0000 2017
Fri May 26 17:30:04 +0000 2017
Fri May 26 17:30:12 +0000 2017
Example of what I want to achieve:
created_at_dt
<dbl>
1495819801
1495819805
1495819805
1495819804
1495819812
I tried the following line but got only NA values introduced.
tweets <- tweets %>%
mutate(created_at_dt = asPOSIXct(as.numeric('created_at')))
Any help would be much appreciated. Thank you!
You just need to specify the correct format string for as.POSIXct.
Also, created_at should not be in quotes for mutate().
library(dplyr)
tweets <- tweets %>%
mutate(created_at_dt = as.POSIXct(created_at,
format = "%a %b %d %H:%M:%S %z %Y") %>%
as.numeric())
Result:
created_at created_at_dt
1 Fri May 26 17:30:01 +0000 2017 1495819801
2 Fri May 26 17:30:05 +0000 2017 1495819805
3 Fri May 26 17:30:05 +0000 2017 1495819805
4 Fri May 26 17:30:04 +0000 2017 1495819804
5 Fri May 26 17:30:12 +0000 2017 1495819812
The data:
tweets <- structure(list(created_at = c("Fri May 26 17:30:01 +0000 2017",
"Fri May 26 17:30:05 +0000 2017", "Fri May 26 17:30:05 +0000 2017",
"Fri May 26 17:30:04 +0000 2017", "Fri May 26 17:30:12 +0000 2017"
)), class = "data.frame", row.names = c(NA, -5L))
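To see what the numeric value means, here is a small base R sketch on a single string (assuming English day and month abbreviations in the current locale). The %z in the format consumes the +0000 offset; tz = "UTC" only affects how the result is printed.

```r
x <- "Fri May 26 17:30:01 +0000 2017"

# Parse the string; %z reads the UTC offset from the input
dt <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(dt)
secs

# Round trip: turn the seconds back into a date-time
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(back, "%Y-%m-%d %H:%M:%S")
```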

Converting irregular dates in R and the Tidyverse

I have a series of dates as follows
25 September 2019
27 April 2020
1994
28 February 2021
1986
Now I want to convert the 1994 and 1986 to:
01 January 1994
01 January 1986
Other full dates should be left as they are.
Any help is appreciated especially using the tidyverse way.
A regex solution, which identifies the "only-year" values using the anchors ^ (for string start position) and $ (for string end position) as well as backreference \\1 to recollect the "only-year" values:
library(dplyr)
df %>%
mutate(dates = sub("^(\\d{4})$", "01 January \\1", dates))
dates
1 25 September 2019
2 27 April 2020
3 01 January 1994
4 28 February 2021
5 01 January 1986
base R:
df$dates <- sub("^(\\d{4})$", "01 January \\1", df$dates)
Data:
df <- data.frame(
dates = c("25 September 2019",
"27 April 2020",
"1994",
"28 February 2021",
"1986")
)
Given some vector d of dates and years:
> d
[1] "25 September 2019" "27 April 2020" "1994"
[4] "28 February 2021" "1986"
Replace any entries that are only 4 characters long with those four characters with "01 January " pasted in front:
> d[nchar(d)==4] = paste0("01 January ",d[nchar(d)==4])
Giving:
> d
[1] "25 September 2019" "27 April 2020" "01 January 1994"
[4] "28 February 2021" "01 January 1986"
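Once every entry has the same "day month year" shape, the strings can be parsed into real Date values if that is the end goal. A minimal sketch, assuming English month names in the current locale and using a shortened version of the sample vector:

```r
d <- c("25 September 2019", "1994", "28 February 2021")

# Prefix year-only entries with "01 January "
d <- sub("^(\\d{4})$", "01 January \\1", d)

# %d = day of month, %B = full month name, %Y = four-digit year
parsed <- as.Date(d, format = "%d %B %Y")
parsed
```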

As.Date returns error when applied to column

I have a dataset with about 20000 observations. I need to convert one of the columns to a different date format.
head(df$created_at)
[1] Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020
[3] Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020
[5] Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019
I can apply as.Date to an individual row:
as.Date(df$created_at[1], format = '%a %b %d %H:%M:%S %z %Y')
[1] "2020-03-31"
But when I try to use as.Date on the entire column, I get:
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
Error in strptime(x, format, tz = "GMT") : input string is too long
What am I doing wrong? Is there another command I'm missing here?
(Too long for a comment.)
It works fine for the data you've shown us. There must be something wrong later in your column. You could locate the problem by trying the command on subsets of your data, e.g. tmp <- as.Date(df[1:(round(nrow(df)/2)), "created_at"], ...) - then bisect to find the problem, e.g. if the problem doesn't occur in the first half of the data set then try rows 1:(round(0.75*nrow(df))) and so on ...
You could also try plotting nchar(df$created_at) to see if anything pops out.
df <- data.frame(created_at=c(
"Tue Mar 31 13:42:58 +0000 2020 Sat Mar 14 05:15:56 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020 Tue Mar 24 09:06:12 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020 Thu Oct 24 18:47:10 +0000 2019"))
df$dates = as.Date(df$created_at, format = '%a %b %d %H:%M:%S %z %Y')
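The nchar() idea can also be used without a plot: every well-formed value in the sample has 30 characters, so anything with a different length is a candidate for the bad row. A minimal sketch with a made-up malformed entry (the long "x" string is hypothetical, not taken from the question's data):

```r
created_at <- c("Tue Mar 31 13:42:58 +0000 2020",
                "Sat Mar 14 05:15:56 +0000 2020",
                paste(rep("x", 500), collapse = ""),  # hypothetical bad entry
                "Thu Oct 24 18:47:10 +0000 2019")

# nchar() needs character input, so convert in case the column is a factor
n <- nchar(as.character(created_at))

# Distribution of string lengths: outliers stand out immediately
table(n)

# Positions of entries that do not have the expected 30 characters
bad <- which(n != 30)
bad
```

Inspecting created_at[bad] then shows what the offending values actually contain.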
Absent issues with your data as alluded to by Ben, here is a solution using parse_date_time from the lubridate package which parses the date variable into POSIXct date-time.
library(tibble)
df <- tibble(date = c("Tue Mar 31 13:42:58 +0000 2020",
"Sun Apr 05 14:02:10 +0000 2020",
"Tue Apr 28 01:14:28 +0000 2020",
"Sat Mar 14 05:15:56 +0000 2020",
"Tue Mar 24 09:06:12 +0000 2020",
"Thu Oct 24 18:47:10 +0000 2019"))
library(lubridate)
df$date <- parse_date_time(df$date, "%a %b %d %H:%M:%S %z %Y")
date
<dttm>
1 2020-03-31 13:42:58
2 2020-04-05 14:02:10
3 2020-04-28 01:14:28
4 2020-03-14 05:15:56
5 2020-03-24 09:06:12
6 2019-10-24 18:47:10
Created on 2020-11-13 by the reprex package (v0.3.0)
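If plain dates rather than date-times are wanted in the end, the parsed values can be passed through as.Date(). A base R sketch on two of the sample strings, assuming English day and month abbreviations in the current locale:

```r
x <- c("Tue Mar 31 13:42:58 +0000 2020",
       "Thu Oct 24 18:47:10 +0000 2019")

# Parse to POSIXct first; %z reads the +0000 offset
dt <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Drop the time of day; tz = "UTC" keeps the calendar day from shifting
dates <- as.Date(dt, tz = "UTC")
dates
```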

How to transform the date using R

Currently, I have a lot of data. Associated with the data, I also have dates. Unfortunately, the dates are in the following format (day (Monday-Sunday), month (January-December) date (1-31) Hour:Minute:Second timezone Year). I would like to convert this into just Month/Day(1-31)/Year. Following is the sample data.
created_data
Sat Jun 20 23:45:03 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:06 +0000 2015
Sat Jun 20 23:45:08 +0000 2015
Sat Jun 20 23:45:11 +0000 2015
Sat Jun 20 23:45:13 +0000 2015
Sat Jun 20 23:45:14 +0000 2015
Sat Jun 20 23:45:15 +0000 2015
This is currently in the form of a dataframe. The format in which I am trying to see the dataframe is the following:
Results
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Jun 20 2015
Following is the code that I have tried but the result was just NA
strptime(x = created_data, format = "%m/%d/%Y")
Result = NA
First you have to convert your character string to something that R knows how to deal with such as a POSIXct object.
Given your format you can do as.POSIXct(created_data, format = "%a %b %d %X %z %Y")
Once it is in that format you can convert it back to a character string of the format you want using format such as...
format(as.POSIXct(created_data, format = "%a %b %d %X %z %Y", tz = "UTC"), format = "%b %d %Y")
The following should work, assuming the datetimes are stored in a character vector.
library("stringr")
library("dplyr")
dates <- c("Sat Jun 20 23:45:03 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:06 +0000 2015",
"Sat Jun 20 23:45:08 +0000 2015",
"Sat Jun 20 23:45:11 +0000 2015",
"Sat Jun 20 23:45:13 +0000 2015",
"Sat Jun 20 23:45:14 +0000 2015",
"Sat Jun 20 23:45:15 +0000 2015")
str_split_fixed(dates, pattern = " ", n=6) %>%
as.data.frame() %>%
mutate(new.date = as.Date(paste(V2, V3, V6), format = "%b %d %Y"))
The basic idea being to split the string into its individual pieces using str_split_fixed(), then recombine the pieces in as.Date()
Just a base R solution without other packages.
x <- "Sat Jun 20 23:45:03 +0000 2015"
x1 <- format(strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "GMT"), "%b %d %Y")
x1
[1] "Jun 20 2015"
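The same idea applies to a whole data frame column at once, since the parsing functions are vectorized. A sketch assuming the column is called created_data as in the question and that the locale uses English abbreviations. The column names Results and as_date are made up for illustration.

```r
df <- data.frame(
  created_data = c("Sat Jun 20 23:45:03 +0000 2015",
                   "Sat Jun 20 23:45:15 +0000 2015"),
  stringsAsFactors = FALSE
)

# Parse the whole column in one call
parsed <- as.POSIXct(df$created_data,
                     format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Character column in the requested "Jun 20 2015" layout
df$Results <- format(parsed, "%b %d %Y")

# Optional: a real Date column, useful for sorting or date arithmetic
df$as_date <- as.Date(parsed, tz = "UTC")
df
```

Keeping a Date column next to the formatted text avoids re-parsing the strings later.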

Counting number of month between two dates whose class is yearmon?

I need to create a new data frame from my original one, whose format is given below.
MonthFrom MonthTo
Jan 2010 May 2010
Mar 2010 Jan 2012
Jan 2011 Jun 2011
Mar 2010 Jun 2010
Feb 2012 Mar 2012
Feb 2013 Feb 2013 # please note that these two months are the same.
The example data set above is from my data. I want to create a data frame as below.
Month NumberofMonth
Jan 5
Jan 12
Feb 1
Feb 2
Mar 16
Mar 4
In general, the function should count the number of months between two dates (of class yearmon) and assign this number to the month of the start date. For example, if the number of months in the first row is 5 and MonthFrom in the first row is January, the function assigns 5 to January. Can anyone help me, please?
Given that the zoo type yearmon you're using allows for basic math manipulation and month name extraction with format(), the following should work for you (unless I've missed something in your requirements):
library(zoo)
my.df <- data.frame(
MonthFrom=as.yearmon(c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")),
MonthTo=as.yearmon(c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")))
print(my.df)
## MonthFrom MonthTo
## 1 Jan 2010 May 2010
## 2 Mar 2010 Jan 2012
## 3 Jan 2011 Jun 2011
## 4 Mar 2010 Jun 2010
## 5 Feb 2012 Mar 2012
## 6 Feb 2013 Feb 2013
new.df <- data.frame(
Month=format(my.df$MonthFrom, "%b"),
NumberOfMonth= round((my.df$MonthTo - my.df$MonthFrom) * 12) + 1)
print(new.df)
## Month NumberOfMonth
## 1 Jan 5
## 2 Mar 23
## 3 Jan 6
## 4 Mar 4
## 5 Feb 2
## 6 Feb 1
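A note on the arithmetic: a yearmon value is stored as year plus a fraction of a year (month index divided by 12), so the difference between two values is a fractional number of years. Multiplying by 12 can leave tiny floating-point residue, which is why rounding before adding 1 is a safe habit. A minimal sketch on three of the rows, assuming the zoo package is installed and English month abbreviations are in use:

```r
library(zoo)

from <- as.yearmon(c("Jan 2010", "Mar 2010", "Feb 2013"))
to   <- as.yearmon(c("May 2010", "Jan 2012", "Feb 2013"))

# Difference in years -> months; round() removes floating-point residue,
# and + 1 makes the count inclusive of both end months
n_months <- round(12 * (as.numeric(to) - as.numeric(from))) + 1

result <- data.frame(Month = format(from, "%b"), NumberOfMonth = n_months)
result
```

With the inclusive count, Jan 2010 to May 2010 gives 5 and a range that starts and ends in the same month gives 1.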
