How to set the datetime index frequency in a workdays time series (pandas, datetime)

I have a time series pandas DataFrame with a date index (working days, Monday to Friday). When I tried to change the index frequency from None to 'D' for a daily time series, I got the following error:
ValueError: Inferred frequency None from passed values does not conform to passed frequency D
This is what my dataframe looks like:
And this is my code for setting the frequency:
df.index.freq = 'D'

A Mon-Fri index skips weekends, so it cannot conform to calendar-day ('D') frequency. Use asfreq with business-day frequency instead:
df = df.asfreq('B')
'B' stands for business day frequency (Mon-Fri).
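For example, a minimal sketch with made-up dates (any Mon-Fri index behaves the same way):
import pandas as pd

# A workday index parsed from data typically comes in with freq=None
idx = pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04",
                      "2023-01-05", "2023-01-06", "2023-01-09"])
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)
print(df.index.freq)  # None

# Conform the index to business-day frequency; any business days missing
# between the first and last date would appear as NaN rows
df = df.asfreq("B")
print(df.index.freq)  # <BusinessDay>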

Related

Time Series Changing Values

So I wanted to forecast month-over-month increases for four columns to the end of the year; however, upon creating my dataset through ts, it replaced the values of my imported dataset. Is there a reason for this that I can avoid, or should it have come out in such a manner?
Month            2022-03-01  2022-04-01  2022-05-01  2022-06-01  2022-07-01
Visits                71893       40683       32455       34898       49834
Revenue               87036       23846       34575       39732       45632
Orders                 3488        6578        4345        5644        6543
Conversion Rate        .35%        .33%        .43%        .39%
However, it is returning the following: does this have an actual meaning, or is the month column causing this?
Month            2022-03-01  2022-04-01  2022-05-01  2022-06-01  2022-07-01
Visits                    5           1           2           3           4
Revenue                   5           3           4           1           2
Orders                    1           2           3           4           5
Conversion Rate           1           2           3           4           5

How to create intervals of 1 hour

How do I create hourly timestamps for every date?
So, for example, from 00:00 until 23:59; one of the resulting values could be 10:00. I read on the internet that a loop could work, but we couldn't make it fit.
Data sample:
df = data.frame( id = c(1, 2, 3, 4),
                 Date = c(2021-04-18, 2021-04-19, 2021-04-21 07:07:08.000, 2021-04-22))
A few points:
The input shown in the question is not valid R syntax, so we assume what we have is the data frame shown reproducibly in the Note at the end.
The question did not describe the specific output desired, so we will assume that what is wanted is a POSIXct vector of hourly values: in (1) below we assume it runs from the first hour of the minimum date to the last hour of the maximum date in the current time zone, and in (2) below we assume that we only want hourly sequences for the dates in df, also in the current time zone.
We assume that any times in the input should be dropped.
We assume that the id column of the input should be ignored.
No packages are used.
1) This calculates hour 0 of the first date and hour 0 of the day after the last date, giving rng. as.Date takes the date part; range extracts the smallest and largest dates into a vector of two components; adding 0:1 adds 0 to the first date, leaving it as is, and 1 to the second date, converting it to the day after the last date. format ensures that the dates are converted to POSIXct in the current time zone rather than UTC. It then creates an hourly sequence from those endpoints and uses head to drop the last value, since it would fall on the day after the input's last date.
rng <- as.POSIXct(format(range(as.Date(df$Date)) + 0:1))
head(seq(rng[1], rng[2], "hour"), -1)
2) Another possibility is to paste together each date with each hour from 0 to 23 and then convert that to POSIXct. This will give the same result if the input dates are sequential; otherwise, it will give the hours only for those dates provided.
with(expand.grid(Date = as.Date(df$Date), hour = paste0(0:23, ":00:00")),
     sort(as.POSIXct(paste(Date, hour))))
Note
df <- data.frame(id = c(1, 2, 3, 4),
                 Date = c("2021-04-18", "2021-04-19", "2021-04-21 07:07:08.000", "2021-04-22"))

Convert R dataframe to timeseries

I am new to ML/time series, so I'm not sure if this question is very basic.
I have the following dataframe:
week:     1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, ... (weeks 1-145)
numOrder: 120, 110, 100, ...
There is no set frequency, i.e. the number of records for each week can be the same or different.
How do I convert this dataframe to a time series object?
A simple tm = ts(dataframe) gives an object of classes "mts", "ts", "matrix" with week as column 1 and numOrder as column 2, but plot(tm[,2]) plots the series with an x axis that does not show time as weeks (1, 2, 3, ...).
Please advise on how to convert this dataframe to a time series object.

Using pandas groupby to fill in timeseries based upon weekday and weekend

Thanks to whoever can help with this:
I have an annual time series at half-hour intervals. There are NaN values littered throughout. What I am looking to do is first fill in the NaN values with the average of the same day of the week within the same month.
This is what I have so far:
def fill_mean(VAH_data):
    # function which replaces NaN values with the mean of a particular grouping
    return VAH_data.fillna(VAH_data.mean())

VAH_data_filled = \
    VAH_data_rs.groupby([lambda x: x.month, lambda x: x.weekday(), lambda x: x.hour],
                        group_keys=False).apply(fill_mean)
This fills in most of the NaNs, but there are still gaps of varying size. I would like to instead use the weekday() function inside the grouping to generalize between weekdays and weekends.
I found the following thread:
in pandas how can I groupby weekday() for a datetime column?
but I am unsure how to implement it within the group-by.
SOLUTION
I've figured out one way for people to look at:
# Create a copy of the index column to be used to group day types
VAH_data_fill1['date_temp'] = VAH_data_fill1.index

# Create a separate column that indicates the specific day type
VAH_data_fill1['weekday'] = VAH_data_fill1['date_temp'].apply(lambda x: x.weekday())

# Create a function to differentiate between weekdays and weekends
# Days are defined as: Monday = 0 to Sunday = 6
dayLog = []
def day_differentiate(VAH_data2):
    if VAH_data2 < 5:
        dayLog.append(1)
    else:
        dayLog.append(0)

# Apply the differentiate function to sort weekdays and weekends
VAH_data_fill1['weekday'].apply(day_differentiate)

# Add a column of the logged day types
dayType = {'dayType': pd.Series(dayLog)}
dayType = pd.DataFrame(dayType)
dayType = dayType.set_index(VAH_data_fill1.index)
VAH_data_fill1 = pd.concat([VAH_data_fill1, dayType], axis=1)

VAH_data_fill2 = \
    VAH_data_fill1.groupby([lambda x: x.month, 'dayType', lambda x: x.hour],
                           group_keys=False).apply(fill_mean)
Cheers,
Chris
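For comparison, the same grouping can be written more compactly by deriving the weekday/weekend flag directly from the DatetimeIndex instead of logging it into a separate list and column. The following is only a sketch with made-up half-hourly data; the column name "load" and the variable names are hypothetical:
import numpy as np
import pandas as pd

# Hypothetical half-hourly series with scattered NaNs, standing in for the real data
idx = pd.date_range("2022-01-01", "2022-12-31 23:30", freq="30min")
vah = pd.DataFrame({"load": np.random.rand(len(idx))}, index=idx)
vah.loc[vah.sample(frac=0.05, random_state=0).index, "load"] = np.nan

# Boolean day-type flag: True for Mon-Fri, False for Sat/Sun
is_weekday = vah.index.weekday < 5

# Fill each NaN with the mean of its (month, day type, hour) group
filled = (vah.groupby([vah.index.month, is_weekday, vah.index.hour],
                      group_keys=False)
             .apply(lambda g: g.fillna(g.mean())))
This keeps the day-type logic inside the groupby call, so no temporary columns need to be added and cleaned up afterwards.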

R - fill in values for all dates

I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But I'm not sure what to do next. If this were Python, I'd loop through one of the date arrays, setting or getting the related quantity; but this being R, I suspect there's a better way.
Make a data frame with all_dates as a column, then merge it with quantitybydate, using the grouping column as by.y and all.x = TRUE. Then replace the NAs with 0.
