Prophet Date Format in R

year_month amount_usd
201501 -390217.24
201502 230944.09
201503 367259.69
201504 15000.00
201505 27000.21
201506 38249.65
df <- structure(list(year_month = 201501:201506, amount_usd = c(-390217.24,
230944.09, 367259.69, 15000, 27000.21, 38249.65)), class = "data.frame", row.names = c(NA,
-6L))
I want to convert it to DD/MM/YYYY format so it can be used in Prophet forecasting code.
This is what I have tried so far:
for (loopitem in loopvec){
df2 <- subset(df, account_id==loopitem)
df3 <- df2[,c("year_month","amount_usd")]
df3$year_month <- as.Date(df3$year_month, format="YYYY-MM", origin="1/1/1970")
try <- prophet(df3, seasonality.mode = 'multiplicative')
}
Error in fit.prophet(m, df, ...) :
Dataframe must have columns 'ds' and 'y' with the dates and values respectively.

You need to paste a day number (I'm just using the first of the month) onto the year_month values; then you can use the ymd() function from lubridate to convert the column to a Date object.
library(dplyr)
library(lubridate)
mutate_at(df, "year_month", ~ymd(paste(., "01")))
year_month amount_usd
1 2015-01-01 -390217.24
2 2015-02-01 230944.09
3 2015-03-01 367259.69
4 2015-04-01 15000.00
5 2015-05-01 27000.21
6 2015-06-01 38249.65
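The error message in the question also says that Prophet expects columns named 'ds' and 'y'. A minimal base R sketch of the conversion plus the renaming follows; the name prophet_df is only illustrative, and the prophet() call is left commented out so the snippet runs without the package installed.

```r
# Sample data from the question
df <- data.frame(
  year_month = 201501:201506,
  amount_usd = c(-390217.24, 230944.09, 367259.69, 15000, 27000.21, 38249.65)
)

# Append "01" so each value becomes a full YYYYMMDD string, then parse it.
# The columns are named ds (dates) and y (values), as the error message requires.
prophet_df <- data.frame(
  ds = as.Date(paste0(df$year_month, "01"), format = "%Y%m%d"),
  y  = df$amount_usd
)

str(prophet_df)

# Call from the question, not run here:
# m <- prophet(prophet_df, seasonality.mode = 'multiplicative')
```

Prophet reads the dates as Date values, so the DD/MM/YYYY display format is not needed for the model itself.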

Related

Converting Date to Name

I have dates in a data frame with corresponding sampling values, as shown in this sample data frame:
Date Temp
2016-06-11 5
2017-08-19 12
2018-01-21 13
2019-04-28 7
The date column is in numeric format currently. I want to convert the numeric month (e.g. 06) into its full name (e.g. June) but am having trouble with the conversion.
I did check the converting dates to names question but was confused by the select DATENAME.
You may simply use months(). Example:
d <- transform(d, date.m=months(date))
d
# date x date.m
# 1 2020-10-01 -1.1390886 October
# 2 2020-11-01 -0.6872151 November
# 3 2020-12-01 1.0632769 December
# 4 2021-01-01 1.7351265 January
Note: If your date column is not of class "Date", you also need to wrap it in as.Date():
d <- transform(d, date.m=months(as.Date(date)))
Data:
d <- structure(list(date = structure(c(18536, 18567, 18597, 18628), class = "Date"),
x = c(-1.13908860117162, -0.687215137639502, 1.06327693201579,
1.73512650928455)), class = "data.frame", row.names = c(NA,
-4L))
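Applied to the sample data from the question, the same idea might look like the sketch below. It assumes the Date column holds "YYYY-MM-DD" strings; the column names month_name and month_num are illustrative. Note that months() returns names in the current locale, so the English names appear only in an English locale.

```r
# Sample data from the question (Date assumed to be stored as text)
df <- data.frame(
  Date = c("2016-06-11", "2017-08-19", "2018-01-21", "2019-04-28"),
  Temp = c(5, 12, 13, 7)
)

dates <- as.Date(df$Date)            # parse the ISO-formatted strings
df$month_name <- months(dates)       # full month name, locale-dependent
df$month_num  <- format(dates, "%m") # two-digit month number as text

df
```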

Calculate the difference between two date columns of a dataframe

How can I get the difference between Date1 and Date2 columns of my dataframe?
Date1 Tfd Date2 Sex
13/08/1936 3 09/01/2013 M
25/04/1948 2 14/05/2014 M
26/01/1939 1 03/07/2015 F
13/02/1935 8 03/08/2012 F
I have tried:
age<-apply(df[, c("Date1", "Date2")], function(x, y) difftime(strptime(y, format = "%d.%m.%Y"), strptime(x, format = "%d.%m.%Y"),units="years"))
but I get this error:
Error in strptime(y, format = "%d.%m.%Y") :
argument "y" is missing, with no default
Do you know how I can solve this?
You don't need apply here:
as.numeric(as.Date(df$Date2, "%d/%m/%Y") - as.Date(df$Date1, "%d/%m/%Y"))
#[1] 27908 24125 27917 28296
difftime() does not support 'years' as a unit; the largest unit it offers is weeks. You can divide the number of weeks by about 52.18 (365.25 / 7) to get years, or use lubridate's time_length() function.
Or using dplyr with difftime
library(dplyr)
library(lubridate)
df %>%
mutate_at(vars(starts_with('date')), lubridate::dmy) %>%
mutate(diff = time_length(difftime(Date2, Date1), 'years'))
# Date1 Tfd Date2 Sex diff
#1 1936-08-13 3 2013-01-09 M 76.4
#2 1948-04-25 2 2014-05-14 M 66.1
#3 1939-01-26 1 2015-07-03 F 76.4
#4 1935-02-13 8 2012-08-03 F 77.5
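The week-based route mentioned in the first answer can also be sketched in base R. The snippet assumes an average year of 365.25 days, so the result is an approximation; the variable names are illustrative.

```r
# Sample data from the question
df <- data.frame(
  Date1 = c("13/08/1936", "25/04/1948", "26/01/1939", "13/02/1935"),
  Date2 = c("09/01/2013", "14/05/2014", "03/07/2015", "03/08/2012")
)

d1 <- as.Date(df$Date1, format = "%d/%m/%Y")
d2 <- as.Date(df$Date2, format = "%d/%m/%Y")

# difftime() supports units up to "weeks"
days  <- as.numeric(difftime(d2, d1, units = "days"))
weeks <- as.numeric(difftime(d2, d1, units = "weeks"))

# Assumption: an average year has 365.25 days, i.e. 365.25 / 7 weeks
years <- weeks / (365.25 / 7)

days[1]            # 27908, as in the first answer
round(years[1], 1) # 76.4, matching the dplyr output
```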

Harmonizing dates

I have a data frame with dates and the time in it.
Now I want to convert each date into the correct month. How can I do this?
Now it looks like this:
1 01.01.2019 00:00:20.747000
2 21.04.2019 00:00:21.362000
3 31.08.2019 00:00:21.422000
I need it in a format like this:
1 01.01.2019
2 21.04.2019
3 31.08.2019
or eventually like this:
1 January
2 April
3 August
With base R, you can do the following.
First, I wasn't sure whether the initial data frame was already in POSIXct format, so I converted it for my example.
Then you can use format() to extract the month number or month name.
lubridate is also a great package for various date manipulations and has a month() function.
df$datetime <- as.POSIXct(df$datetime, format = "%d.%m.%Y %H:%M:%OS")
df$date_only <- as.Date(df$datetime)
df$month_num <- format(df$datetime, "%m")
df$month <- format(df$datetime, "%B")
df
Output
datetime date_only month_num month
1 2019-01-01 00:00:20 2019-01-01 01 January
2 2019-04-21 00:00:21 2019-04-21 04 April
3 2019-08-31 00:00:21 2019-08-31 08 August
Data
df <- structure(list(datetime = c("01.01.2019 00:00:20.747000", "21.04.2019 00:00:21.362000",
"31.08.2019 00:00:21.422000")), class = "data.frame", row.names = c(NA,
-3L))
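For the lubridate route mentioned above, a short sketch might look like this. It assumes the lubridate package is installed; the variable names are illustrative, and the month labels depend on the locale.

```r
library(lubridate)

x <- c("01.01.2019 00:00:20.747000", "21.04.2019 00:00:21.362000",
       "31.08.2019 00:00:21.422000")

dt <- dmy_hms(x)                                 # parse day.month.year plus time
month_num  <- month(dt)                          # numeric month
month_name <- month(dt, label = TRUE, abbr = FALSE) # ordered factor of month names
date_only  <- format(as.Date(dt), "%d.%m.%Y")    # back to the dd.mm.yyyy layout

month_num
month_name
date_only
```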
Try:
df$date <- lubridate::dmy_hms(df$date)
df$date <- format(df$date, "%d.%m.%Y")
Result (dput of df after the conversion):
structure(list(date = c("01.01.2019", "21.04.2019", "31.08.2019"
)), row.names = c(NA, -3L), class = "data.frame")

Convert quarter year to last date of quarter in R

I have an issue when I use as.Date(as.yearqtr(test[,1],format ="%qQ%Y"),frac =1): it returns an error, and the quarter-year values are not converted to dates. The error is:
error in as.yearqtr(as.numeric(x)) (list) object cannot be coerced to type 'double'
This is my dataframe in R.
TIME VALUE
1Q2019 1
2Q2019 2
3Q2019 3
4Q2019 4
The ideal output is
TIME VALUE
2019-03-31 1
2019-06-30 2
2019-09-30 3
2019-12-31 4
We can convert to Date with zoo and get the last date of the quarter with frac. We use a regular expression to rearrange the values into a format zoo accepts ("2019 Q1"):
library(zoo)
df$TIME <- as.Date(as.yearqtr(gsub("(\\d)(Q)(\\d{1,})","\\3 Q\\1",df$TIME)),frac = 1)
df
TIME VALUE
1 2019-03-31 1
2 2019-06-30 2
3 2019-09-30 3
4 2019-12-31 4
Data:
df <-structure(list(TIME = structure(1:4, .Label = c("1Q2019", "2Q2019",
"3Q2019", "4Q2019"), class = "factor"), VALUE = 1:4), class = "data.frame", row.names = c(NA,
-4L))
Here is a function that will return a vector of dates, given an input vector in the form of 1Q2019...
dateStrings <- c("1Q2019","2Q2019","3Q2019","4Q2019","1Q2020")
library(lubridate)
lastDayOfQuarter <- function(x){
# Last month and last day of each quarter (Q1 to Q4)
months <- c(3,6,9,12)
days <- c(31,30,30,31)
# The first character is the quarter, characters 3 to 6 are the year
qtr <- as.numeric(substr(x,1,1))
# Build "month-day-year" strings and parse them all in one vectorized call
mdy(paste(months[qtr],days[qtr],substr(x,3,6),sep="-"))
}
lastDayOfQuarter(dateStrings)
and the output:
> lastDayOfQuarter(dateStrings)
[1] "2019-03-31" "2019-06-30" "2019-09-30" "2019-12-31" "2020-03-31"
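If neither zoo nor lubridate is available, the same result can be sketched in base R by taking the first day of the month after the quarter and stepping back one day. The function name quarter_end is hypothetical, and the input is assumed to follow the "1Q2019" pattern exactly.

```r
quarter_end <- function(x) {
  x <- as.character(x)                 # also accepts factor input
  qtr  <- as.integer(substr(x, 1, 1))  # quarter number, 1 to 4
  year <- as.integer(substr(x, 3, 6))  # four-digit year

  # First month after the quarter; month 13 rolls over to January of next year
  next_month <- qtr * 3L + 1L
  next_year  <- year + (next_month > 12L)
  next_month <- ifelse(next_month > 12L, 1L, next_month)

  # First day of that month minus one day is the last day of the quarter
  as.Date(sprintf("%04d-%02d-01", next_year, next_month)) - 1
}

quarter_end(c("1Q2019", "2Q2019", "3Q2019", "4Q2019", "1Q2020"))
```

Stepping back from the following month avoids hard-coding the number of days in each quarter's last month.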

How to subset rows for a specific range of dates in r?

I am new to R and currently working on some rainfall data. I have two data frames named df1 and df2.
df1
Date Duration_sum
5/28/2014 110
5/31/2014 20
5/31/2014 20
6/1/2014 10
6/1/2014 110
6/3/2014 140
6/4/2014 40
6/5/2014 60
6/12/2014 10
6/14/2014 100
df2
Date PercentRemoval
6/2/2014 25.8
6/5/2014 78.58
6/6/2014 15.6
6/13/2014 70.06
I want to look up the dates from df2 in df1. For example, if the 1st date from df2 is available in df1, I want to subset rows in df1 within the range of that specific date and 3 days prior to that. If that date is not available, then just look for the previous 3 days.
If data for the previous 3 days are not all available, it should extract as many days as are available, up to a maximum of 3 days before the specific date in df2. If none of the dates are available in df1, that date is ignored and the next date in df2 is checked. Also, for example, the 3 days prior to 6/6/2014 are available in df1, but we have already considered those days for 6/5/2014. So, 6/6/2014 is ignored.
The resulted data frame should look something like this:
df3
col_1 Date Duration_sum
5/31/2014 20
5/31/2014 20
6/1/2014 10
6/2/2014 6/1/2014 110
6/3/2014 140
6/4/2014 40
6/5/2014 6/5/2014 60
6/13/2014 6/12/2014 10
I have used this code:
df3 <- df1[df1$Date %in% as.Date(c(df2)),]
This code gives me the results for the specific dates but not for the previous 3 days. I would really appreciate it if someone could help me out with this code or suggest some other code. Thanks in advance.
This may be one way to do the task. If I am reading your question correctly, you want to drop any date in df2 that falls fewer than 3 days after the previous df2 date. This avoids the overlap issue you mentioned in your question: the 6th of June, 2014 is removed. Once the dates in df2 are filtered, the lapply() part subsets df1 for each remaining date. The output is a list, so you assign names to each data frame in the list. Finally, you bind all the data frames together.
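library(dplyr)
# Parse the character dates in df1
mutate(df1, Date = as.Date(Date, format = "%m/%d/%Y")) -> df1
# Parse df2 and keep only dates at least 3 days after the previous df2 date.
# An explicit Date default avoids relying on 0 being coerced to a Date.
mutate(df2, Date = as.Date(Date, format = "%m/%d/%Y")) %>%
filter(!(Date - lag(Date, default = as.Date("1970-01-01")) < 3)) -> df2
# For each remaining df2 date, take the df1 rows from 3 days before up to that date
lapply(df2$Date, function(x){
filter(df1, between(Date, x-3, x))
}) -> temp
names(temp) <- as.character(df2$Date)
bind_rows(temp, .id = "df2.date")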
# df2.date Date Duration_sum
#1 2014-06-02 2014-05-31 20
#2 2014-06-02 2014-05-31 20
#3 2014-06-02 2014-06-01 10
#4 2014-06-02 2014-06-01 110
#5 2014-06-05 2014-06-03 140
#6 2014-06-05 2014-06-04 40
#7 2014-06-05 2014-06-05 60
#8 2014-06-13 2014-06-12 10
DATA
df1 <- structure(list(Date = c("5/28/2014", "5/31/2014", "5/31/2014",
"6/1/2014", "6/1/2014", "6/3/2014", "6/4/2014", "6/5/2014", "6/12/2014",
"6/14/2014"), Duration_sum = c(110L, 20L, 20L, 10L, 110L, 140L,
40L, 60L, 10L, 100L)), .Names = c("Date", "Duration_sum"), class = "data.frame", row.names = c(NA,
-10L))
df2 <- structure(list(Date = c("6/2/2014", "6/5/2014", "6/6/2014", "6/13/2014"
), PercentRemoval = c(25.8, 78.58, 15.6, 70.06)), .Names = c("Date",
"PercentRemoval"), class = "data.frame", row.names = c(NA, -4L
))
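For readers without dplyr, the same filtering and windowing logic can be sketched in base R. The names targets, pieces, and df2_date are illustrative; the 3-day window and the rule for skipping close-together dates follow the answer above.

```r
df1 <- data.frame(
  Date = as.Date(c("5/28/2014", "5/31/2014", "5/31/2014", "6/1/2014", "6/1/2014",
                   "6/3/2014", "6/4/2014", "6/5/2014", "6/12/2014", "6/14/2014"),
                 format = "%m/%d/%Y"),
  Duration_sum = c(110L, 20L, 20L, 10L, 110L, 140L, 40L, 60L, 10L, 100L)
)
targets <- as.Date(c("6/2/2014", "6/5/2014", "6/6/2014", "6/13/2014"),
                   format = "%m/%d/%Y")

# Keep a target date only if it is at least 3 days after the previous one
targets <- targets[c(TRUE, as.numeric(diff(targets)) >= 3)]

# For each remaining target, collect df1 rows from 3 days before up to the target
pieces <- lapply(seq_along(targets), function(i) {
  x <- targets[i]
  rows <- df1[df1$Date >= x - 3 & df1$Date <= x, ]
  if (nrow(rows) == 0) return(NULL)  # no rainfall rows in this window
  data.frame(df2_date = x, rows, row.names = NULL)
})

# Drop empty windows and stack the rest
df3 <- do.call(rbind, Filter(Negate(is.null), pieces))
df3
```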
