I've got a data frame with a lot of dates in it that were generated by the date() command in R, resembling the first data frame below. On my computer with this version of R, the date values are formatted like this: "Thu Mar 18 11:15:23 2021". I believe this is all base R.
I want to strip the weekday, the hours, minutes, and seconds away, and then transform it so that it looks like this "2021-03-18". My goal dataframe is the second dataframe below. I've tried various as.Date() or strftime functions to no avail.
df <- data.frame(date=c(date(),date()),value = c(1,2))
df <- data.frame(date =c("2021-03-18","2021-03-18"), value = c(1,2))
If you don't need strings, you can skip the strftime call and only use as.Date
df <- data.frame(
date=c(date(),date()),
value = c(1,2),
stringsAsFactors = FALSE
)
df$date <- strftime(as.Date(df$date, "%c"), "%Y-%m-%d")
https://stat.ethz.ch/R-manual/R-patched/library/base/html/strptime.html
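One caveat with the call above: %c (like %a and %b) matches day and month names in the current locale, so it can return NA on a non-English system, whereas date() prints English abbreviations as far as I know. A minimal locale-independent sketch, assuming the input always has the five whitespace-separated fields shown in the question (parse_date_string is a hypothetical helper name, not part of base R):

```r
# Hypothetical helper: parse strings shaped like "Thu Mar 18 11:15:23 2021"
# without relying on locale-specific month names.
parse_date_string <- function(x) {
  # Split on runs of whitespace (single-digit days are padded with a space).
  parts <- strsplit(trimws(x), "[[:space:]]+")
  # month.abb is the built-in vector of English month abbreviations.
  month <- match(vapply(parts, `[`, character(1), 2), month.abb)
  day   <- as.integer(vapply(parts, `[`, character(1), 3))
  year  <- as.integer(vapply(parts, `[`, character(1), 5))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

df <- data.frame(
  date  = c("Thu Mar 18 11:15:23 2021", "Fri Dec  3 09:00:00 2021"),
  value = c(1, 2),
  stringsAsFactors = FALSE
)
df$date <- parse_date_string(df$date)       # column of class Date
df$date_chr <- format(df$date, "%Y-%m-%d")  # same dates as character strings
```

Keeping the Date column and formatting only for display means the dates can still be sorted and subtracted.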
Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format.
Find the number of days elapsed since Independence as of 15th August 2018.
Need to code in R language.
DATE1 <- c("15Aug1947")
DATE2 <- c("15Aug2018")
X <- as.Date(DATE1, "%d/%m/%y") - as.Date(DATE2 , "%d/%m/%y")
print(X)
You are close, but are missing a small detail. The second argument to as.Date must describe exactly the format your dates are in. Right now, the format "%d/%m/%y" says your dates look like 15/08/47. Two things are wrong with this: your dates have no slashes, and the month is not a number but an abbreviated month name. In addition, the year has four digits, which calls for %Y rather than %y. The correct way to parse these dates would be
> ps <- "%d%b%Y"
> DATE1 <- c("15Aug1947")
> DATE2 <- c("15Aug2018")
> X <- as.Date(DATE1, ps) - as.Date(DATE2 , ps)
>
> print(X)
Time difference of -25933 days
For more information on how to construct the string for parsing, see ?strptime.
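As a small follow-up sketch: subtracting in the other order gives a positive count, and as.numeric() turns the difftime into a plain number. This assumes an English locale, since %b matches locale-specific month abbreviations; the variable names are only illustrative.

```r
fmt <- "%d%b%Y"  # day, abbreviated month name, four-digit year

independence <- as.Date("15Aug1947", format = fmt)
as_of        <- as.Date("15Aug2018", format = fmt)

elapsed      <- as_of - independence                 # object of class difftime
elapsed_days <- as.numeric(elapsed, units = "days")  # plain number of days
print(elapsed_days)
```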
You can use a package to parse dates automatically, such as lubridate.
The following code may help!
#Create a variable of value 15Aug1947 and 15Aug2018 in POSIX Date format
#tz = "UTC" keeps daylight-saving or offset changes from affecting the difference
dt <- as.POSIXct(c("15Aug1947", "15Aug2018"), format = "%d%b%Y", tz = "UTC")
#Finding the number of days elapsed
difftime(dt[2], dt[1], units = "days")
#Time difference of 25933 days
I have the following data
data_sample
date Sum
1 Feb 2015 3322.01
2 Mar 2015 6652.77
3 Apr 2015 3311.12
etc
I need to convert to time series for forecasting
> data <- xts(data_sample[,-1], order.by=as.Date(data_sample[,1], "%Y %m"))
Error in 1 - frac : non-numeric argument to binary operator
> data <- xts(data_sample[,-1], order.by=as.Date(data_sample[,1], "%m %Y"))
Error in 1 - frac : non-numeric argument to binary operator
> ts_ts(ts_long(data_sample))
Error in guess_time(x) :
No [time] column detected. To be explict, name time column as 'time'.
If you want to use as.Date(), you have to specify full dates.
Simply add 01 at the end of each entry.
date <- c("Feb 2015", "Mar 2015", "Apr 2015")
date <- as.Date(paste(date, "01"), format="%b %Y %d")
You can convert them back as follows,
format(date, "%b %Y")
or use as.yearmon from zoo library,
library("zoo")
as.yearmon(date)
Some examples here: Converting Date formats in R
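If the goal is forecasting with base R only, a regular monthly ts object may be enough. A minimal sketch, assuming the rows are consecutive months with no gaps and the month abbreviations are English (the start year and month are read from the first row; the variable names are illustrative):

```r
data_sample <- data.frame(
  date = c("Feb 2015", "Mar 2015", "Apr 2015"),
  Sum  = c(3322.01, 6652.77, 3311.12),
  stringsAsFactors = FALSE
)

# Split the first entry into month abbreviation and year.
first       <- strsplit(data_sample$date[1], " ")[[1]]
start_month <- match(first[1], month.abb)  # "Feb" -> 2
start_year  <- as.integer(first[2])        # 2015

# frequency = 12 marks the series as monthly.
sum_ts <- ts(data_sample$Sum, start = c(start_year, start_month), frequency = 12)
print(sum_ts)
```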
R has multiple ways of representing time series. Since you are working with only Date and Sum, I have created a sample time series for you. I chose random dates and numbers.
Call for Packages
library(xts)
Create a Data Frame
data_sample <- data.frame(
date = as.Date(c("2012-01-01","2013-01-01","2014-01-01")),
sum1 = c(3322.01, 6652.77, 3311.12))
head(data_sample)
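Convert the date to a format that R understands. The column above is already of class Date, so this call leaves it unchanged; a format string is only needed when the dates are stored as text.
rdate <- as.Date(data_sample$date)
# fix(rdate) # optional: opens an interactive editor on rdate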
Plot the graph
plot(data_sample$sum1~rdate,type="l",col="red")
Running the code above produces a red line plot of sum1 against the dates.
Assuming data_sample is as shown reproducibly in the Note at the end, convert to a time series of class zoo using read.zoo and then either use it in that form or convert it to some other class such as xts or ts using the appropriate as.* function. Here we used yearmon class to represent the index as that directly represents year and month without day. This class will be used in zoo and xts and when converting to ts it will be converted appropriately.
library(xts) # this also loads zoo
z <- read.zoo(data_sample, FUN = as.yearmon, format = "%b %Y")
as.xts(z)
as.ts(z)
Date
It is also possible to use Date class for the index in zoo and xts but that does not work well with ts class. Using Date class implies that the distance between consecutive points varies according to the number of days per month as opposed to being a regularly spaced series so using Date for monthly data is normally not useful for forecasting.
zd <- aggregate(z, as.Date, c)
xd <- as.xts(zd)
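The uneven spacing mentioned above is easy to check in base R. A small sketch using the first-of-month dates for the three sample months:

```r
month_starts <- as.Date(c("2015-02-01", "2015-03-01", "2015-04-01"))

# Gaps between consecutive month starts, in days.
gaps <- as.numeric(diff(month_starts))
print(gaps)  # 28 31
```

With a yearmon index the same three observations are treated as one month apart, which is usually what a monthly forecasting model expects.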
Note
Input in reproducible form
Lines <- "date,Sum
1,Feb 2015,3322.01
2,Mar 2015,6652.77
3,Apr 2015,3311.12 "
data_sample <- read.csv(text = Lines)
air1 <- type.convert(.preformat.ts(AirPassengers))
airpassengers <- as.data.frame(air1)
View(airpassengers)
class(airpassengers)
[1] "data.frame"
This converts the time series to a data frame with one row per year and one column per month.
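If a long layout (one row per observation) is more convenient, a sketch using only time() and cycle() from base R could look like this; the column names are my own choice:

```r
# AirPassengers is a monthly ts shipped with the datasets package.
air_long <- data.frame(
  # time() gives fractional years; the small offset guards against rounding.
  year       = as.integer(floor(time(AirPassengers) + 1e-8)),
  # cycle() gives the position within the year (1 = January).
  month      = as.integer(cycle(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)
head(air_long)
```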
I would like to find the intersection of two dataframes based on the date column.
Previously, I have been using this command to find the intersect of a yearly date column (where the date only contained the year)
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
But now my date column for df1 is of type date and looks like this:
1985-01-01
1985-04-01
1985-07-01
1985-10-01
My date column for df2 is also of type date and looks like this (notice the days are different)
1985-01-05
1985-04-03
1985-07-07
1985-10-01
The above command works fine when I keep the format like this (i.e. year, month and day), but since my days are different and I am interested in the monthly intersection, I dropped the days like this. That produces an error when I look for the intersection:
df1$Date <- format(as.Date(df1$Date), "%Y-%m")
common_rows <-as.Date(intersect(df1$Date, df2$Date), origin = "1970-01-01")
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there a way to find the intersection of the two datasets, based on the year and month, while ignoring the day?
The problem is the as.Date() call wrapping your final output: a string such as "1985-01" is not a complete date, so as.Date() cannot parse it. Note also that both columns need to be reduced to "%Y-%m" before intersecting, not just df1$Date. If you are fine with plain strings, use common_rows <- intersect(df1$Date, df2$Date). Otherwise, append a day before converting:
common_rows <- as.Date(paste0(intersect(df1$Date, df2$Date), "-01"))
Try this:
date1 <- c('1985-01-01','1985-04-01','1985-07-01','1985-10-01')
date2 <- c('1985-01-05','1985-04-03','1985-07-07','1985-10-01')
# keep only the year-month part (substr is vectorized, so no sapply is needed)
date1 <- substr(date1, 1, 7)
date2 <- substr(date2, 1, 7)
print(intersect(date1, date2))
[1] "1985-01" "1985-04" "1985-07" "1985-10"
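To go from the common months back to the matching rows, one option is to keep the Date columns untouched and compare a formatted copy. A sketch with made-up value columns a and b, and with one month in df2 changed to August so that the intersection is not trivially every row:

```r
df1 <- data.frame(
  Date = as.Date(c("1985-01-01", "1985-04-01", "1985-07-01", "1985-10-01")),
  a    = 1:4
)
df2 <- data.frame(
  Date = as.Date(c("1985-01-05", "1985-04-03", "1985-08-07", "1985-10-01")),
  b    = 5:8
)

# Year-month keys as strings; the original Date columns stay intact.
ym1 <- format(df1$Date, "%Y-%m")
ym2 <- format(df2$Date, "%Y-%m")
common_months <- intersect(ym1, ym2)

# Rows of each data frame that fall in a shared month.
df1_common <- df1[ym1 %in% common_months, ]
df2_common <- df2[ym2 %in% common_months, ]
```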
I got a dataset in CSV format that has two columns: Date and Value. There are hundreds of rows in the file. Date format in the file is given as YYYY-MM-DD. When I imported this dataset, the Date column got imported as a factor, so I cannot run a regression between those two variables.
I am very new to R, but I understand that lubridate can help me convert the data in the Date column. Could someone provide some suggestions on what command should I use to do so? The file name is: Test.csv.
Next time please provide some test data and show what you did. For variations see ?as.Date and ?read.csv . The following does not use any packages:
# test data
Lines <- "Date,Value
2000-01-01,12
2001-01-01,13"
# DF <- read.csv("myfile.csv")
DF <- read.csv(text = Lines)
DF$Date <- as.Date(DF$Date)
plot(Value ~ Date, DF, type = "o")
giving:
> DF
Date Value
1 2000-01-01 12
2 2001-01-01 13
Note: Since your data is a time series you might want to use a time series representation. In this case read.zoo automatically converts the first column to "Date" class:
library(zoo)
# z <- read.zoo("myfile.csv", header = TRUE, sep = ",")
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
plot(z)
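For the case described in the question, where the column arrives as a factor, a sketch of the conversion followed by a simple regression might look like this. stringsAsFactors = TRUE is set here only to reproduce the factor import, and the third data row is made up so that the fit has more than two points:

```r
Lines <- "Date,Value
2000-01-01,12
2001-01-01,13
2002-01-01,15"

DF <- read.csv(text = Lines, stringsAsFactors = TRUE)
was_factor <- is.factor(DF$Date)  # TRUE: Date was read as a factor

# Go through character so the labels, not the factor codes, are parsed.
DF$Date <- as.Date(as.character(DF$Date), format = "%Y-%m-%d")

# Regress Value on the date expressed as days since 1970-01-01.
fit <- lm(Value ~ as.numeric(Date), data = DF)
slope_per_day <- unname(coef(fit)[2])
```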
I am trying to understand why R behaves differently with the "aggregate" function. I wanted to average 15m-data to hourly data. For this, I passed the 15m-data together with a pre-designed "hour" array (4 times the same date per hour, taking the original POSIXct array) to the aggregate function.
After some time, I realized that the function was behaving oddly (well, probably the data was odd, but why?) when I passed the date array as
strftime(data.15min$posix, format="%m/%d/%y %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3
bla <- read.table("test_daten.dat",header=TRUE,sep=",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates,tz="UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format="%Y-%m-%d %H"),":00:00",sep=""),tz="Universal"))
agg.mean <- function (xx, yy, rm.na = T)
# xx: parameter that determines the aggregation: list(xx), e.g. hour etc.
# yy: parameter that will be aggregated
{
aa <- yy
out.mean <- aggregate(aa, list(xx), FUN = mean, na.rm=rm.na)
out.mean <- out.mean[,2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format="%m/%d/%y %H"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(hourtimes,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(hourtimes,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Correct
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector of date strings that do not sort chronologically: they are compared as text, and the month comes first with a two-digit year at the end. So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
generates a factor whose levels are in chronological order, so the groups sort properly. So the first row corresponds to the date: 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case you would have gotten the same result as using cut(...)
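The effect can be reproduced with a tiny made-up series (four readings, two in late 1995 and two in early 1996; the values are invented for illustration):

```r
posix <- as.POSIXct(
  c("1995-11-04 13:00:00", "1995-11-04 13:15:00",
    "1996-01-01 00:00:00", "1996-01-01 00:15:00"),
  tz = "UTC"
)
o3 <- c(10, 20, 30, 40)

# Month-first keys: sorted as text, January 1996 lands before November 1995.
by_mdy <- aggregate(o3, list(hour = format(posix, "%m/%d/%y %H", tz = "UTC")), mean)

# Year-first keys: text order and chronological order agree.
by_ymd <- aggregate(o3, list(hour = format(posix, "%Y-%m-%d %H", tz = "UTC")), mean)

# cut() returns a factor with chronologically ordered levels.
by_cut <- aggregate(o3, list(hour = cut(posix, "1 hour")), mean)

print(by_mdy$x)  # 35 15
print(by_ymd$x)  # 15 35
```

The hourly means are the same in every case; only the row order differs, which matters as soon as the means are plotted against a separately built, chronologically ordered time vector.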