Convert R dataframe to timeseries - r

I am new to ML/time series, so I am not sure whether this question is very basic.
I have the following dataframe:
week 1,1,1,1,,2,2,2,2,2,2,2,2,3,3,3,3,4,4...(1 - 145 weeks) numOrder 120,110,100.....
There is no set frequency, i.e. the number of records for each week can be the same or different.
How do I convert this dataframe to a timeseries object?
A simple tm=ts(dataframe name) gives an object of class "mts", "ts", "matrix" with week as column 1 and numOrder as column 2. However, plot(tm[,2]) draws a time series whose x axis does not show time as weeks (1,2,3).
Please guide me on how to convert this dataframe to a timeseries object.
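One possible approach, sketched under assumptions: because each week has a varying number of records, the data can first be collapsed to one value per week and then passed to ts() with the week number as the time index. The toy values below are made up, and summing the orders per week is an assumption (a mean would work the same way). The sketch also assumes every week from the first to the last is present.

```r
# Toy data standing in for the real data frame (values are made up)
orders <- data.frame(
  week     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4),
  numOrder = c(120, 110, 100, 90, 80, 70, 60, 50, 40, 30)
)

# Collapse to one row per week; using sum here is an assumption
weekly <- aggregate(numOrder ~ week, data = orders, FUN = sum)

# Univariate series whose time index is the week number
# (assumes no week is missing between the first and the last)
tm <- ts(weekly$numOrder, start = min(weekly$week), frequency = 1)

# The x axis now runs over the weeks
plot(tm, xlab = "Week", ylab = "Orders per week")
```

With one observation per week, frequency = 1 keeps the time axis in units of weeks.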

Related

Changing period of dates to standard date to do line graph

I'm trying to plot a line graph with R using the dataset that can be found here. I'm looking specifically at how to plot the number of cases in each region (north east, north west, etc.) against the period of time.
However, as the date is a period of a week rather than a standard date, how can I convert it to make the line graph actually possible? For example, right now it has the dates as 01/09/2020 - 07/09/2020. How can I use this for a line graph?
Sorry if my explanation isn't clear, here is a picture below.
I assume you're trying to plot a time series? You could just trim each period to the date the week begins on and label the time axis as "Week beginning on date". You could do this with substr() in base R, keeping the first 10 characters.
substr(data$column,1,10)
You may also want to format it as a date, easiest with the lubridate package, something like dmy() (day month year).
Here is the full code you would want:
library(tidyverse)
# dmy() comes from lubridate; older tidyverse versions do not attach it
library(lubridate)
# Read in data
data <- read.csv("/Users/sabrinaxie/Downloads/covid19casesbysociodemographiccharacteristicengland1sep2020to10dec20213.csv")
# Rename the first column to Period and remove the extraneous top rows
data <- data %>%
  rename(Period=Table.9..Weekly.estimates.of.age.standardised.COVID.19.case.rates..per.100.000.person.weeks..by.region..England..1.September.2020.to.6.December.20211.2.3) %>%
  slice(3:n())
# Keep the first 10 characters of Period (the week's start date), overwriting the column
data$Period <- substr(data$Period,1,10)
# Parse as date (day/month/year)
data$Period <- dmy(data$Period)
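For a quick check without the original file, here is a minimal base R sketch of the same idea. The period strings follow the format quoted in the question, the case rates are made up, and as.Date() with an explicit format is swapped in for dmy() so that no extra package is needed.

```r
# Period strings in the format from the question; rates are made-up values
cases <- data.frame(
  Period = c("01/09/2020 - 07/09/2020", "08/09/2020 - 14/09/2020"),
  rate   = c(12.5, 20.1)
)

# Keep the first 10 characters (the start of the week) and parse as day/month/year
week_start <- as.Date(substr(cases$Period, 1, 10), format = "%d/%m/%Y")
cases$week_start <- week_start

# Line graph against a proper date axis
plot(cases$week_start, cases$rate, type = "l",
     xlab = "Week beginning on", ylab = "Case rate")
```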

Assign column variables by date (R)

I hope we're all doing great
I have several decades of daily rainfall data from several monitoring stations. The data all begin at different dates. I have combined them into a single data frame with the date in the first column and the rainfall depth in the second column. I want to sort the variable 'Total' by the variable 'Date and time' (please see the links below).
ms1 <- read.csv('ms1.csv')
ms2 <- read.csv('ms2.csv')
etc.etc
df <- merge(ms1, ms2 etc. etc, by = "Date and Time")
The problem is that the range of dates differs for each monitoring station (csv file). There may also be missing dates within a range. Is there a way around this?
Would I have to create a separate vector with the greatest possible date range? Or would R automatically detect the earliest start date from the imported data?
for monitoring station 1 (ms1)
for monitoring station 2 (ms2)
Note: the data continues to the current date
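A minimal sketch of one way around this, assuming each station is a data frame with a date column and a Total column. The toy values and the station column names (ms1, ms2) are made up. Note that read.csv() by default turns a header such as "Date and Time" into Date.and.Time. A full outer merge (all = TRUE) keeps every date seen in any station, so the earliest start date is picked up from the data; a separate full date sequence is only needed for days missing from every station.

```r
# Toy stand-ins for the imported csv files (values are made up)
ms1 <- data.frame(Date.and.Time = as.Date(c("2000-01-01", "2000-01-02", "2000-01-05")),
                  Total = c(0.0, 1.2, 3.4))
ms2 <- data.frame(Date.and.Time = as.Date(c("2000-01-02", "2000-01-03")),
                  Total = c(5.0, 0.6))

# Give each rainfall column a station-specific name before merging
names(ms1)[2] <- "ms1"
names(ms2)[2] <- "ms2"

# merge() joins two data frames at a time, so fold it over a list of stations;
# all = TRUE keeps dates that appear in only some of the stations
stations <- list(ms1, ms2)
df <- Reduce(function(x, y) merge(x, y, by = "Date.and.Time", all = TRUE), stations)

# Add days that are missing from every station (here 2000-01-04)
all_days <- data.frame(Date.and.Time = seq(min(df$Date.and.Time),
                                           max(df$Date.and.Time), by = "day"))
df <- merge(all_days, df, by = "Date.and.Time", all.x = TRUE)
```

The result has one row per day, sorted by date, with NA wherever a station has no reading.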

Difficulty in generating time series in R for my data set

I am trying to generate a time series for my dataset in R but am having difficulty doing so. My dataset has two columns: one for the date and the other for the price of a material. There are many dates which don't have a price and hence are not in the dataset. The data covers roughly a year. I am having difficulty setting the frequency and start time for the time series. Is there any way to set the start from the dataset and have the time series automatically incorporate the missing data points?
The following is for a dataframe df with two columns called "date" and "price", where date is of class Date.
This will create the missing dates and fill the missing price for each of those dates with the preceding price. You can replace fill('price') to fill with other specified values.
library(tidyverse)
df <- df %>%
  complete(date = seq.Date(min(date), max(date), by="day")) %>%
  fill('price')
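For comparison, a minimal base R sketch of the same two steps (create the missing dates, then carry the preceding price forward). The toy prices are made up, and the cummax() index trick is just one way to carry the last observation forward; it is swapped in here for complete() and fill().

```r
# Toy data with gaps on 2021-01-03 and 2021-01-04 (values are made up)
df <- data.frame(
  date  = as.Date(c("2021-01-01", "2021-01-02", "2021-01-05")),
  price = c(10, 11, 12)
)

# One row per calendar day between the first and last observed date
full <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
filled <- merge(full, df, by = "date", all.x = TRUE)

# For each row, find the index of the most recent non-missing price
idx <- cummax(ifelse(is.na(filled$price), 0L, seq_along(filled$price)))
filled$price <- filled$price[idx]
```

The first row is never missing because the sequence starts at the earliest observed date, so every index in idx is at least 1.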

ggplot x axis order days

I have a dataset currently sorted by date and time. It has a column called 'day', which is just the day of the month in numerical form, i.e. 1-31.
I have a 14-day stretch that I want to plot; however, it runs from the 30th of one month to the 13th of the next.
When I try to plot it, the x axis is ordered 1-13,30,31.
How can I plot the x axis as it is found within the dataframe?
Thanks.
Make sample data with columns day and value.
library(ggplot2)
df <- data.frame(day=c(30,31,1,2,3,4,5,6,7,8), value=rnorm(10))
If the column day contains just the day values as numbers, you can convert it to a factor and set the levels to the original order of the values.
ggplot(df, aes(factor(day, levels=df$day), value, group=1)) + geom_line()

R - fill in values for all dates

I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date=as.Date(min(dates))
last_date=as.Date(max(dates))
all_dates=seq(first_date, by=1, to=last_date)
And I can aggregate the sales data by sale date:
quantitybydate=aggregate(quantity, by=list(as.Date(dates)), sum)
But I am not sure what to do next. If this were Python I'd loop through one of the date arrays, setting or getting the related quantity. But this being R, I suspect there's a better way.
Make a dataframe with all_dates as a column, then merge it with quantitybydate, using the grouping column of quantitybydate as by.y and setting all.x=TRUE. Then replace the NAs with 0.
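The steps above can be sketched as follows. The sample sales are made up, and the column name date is an assumption chosen so that both data frames share one key. The formula form of aggregate() is used here so that quantity and totalprice are summed in a single call.

```r
# Toy sales data: dates repeat and some days have no sales (values are made up)
dates      <- c("2021-03-01", "2021-03-01", "2021-03-03", "2021-03-06")
quantity   <- c(2, 3, 1, 4)
totalprice <- c(20, 30, 15, 40)
sales <- data.frame(date = as.Date(dates), quantity, totalprice)

# Sum both measures by sale date
bydate <- aggregate(cbind(quantity, totalprice) ~ date, data = sales, FUN = sum)

# Every date from the first to the last sale, exactly once
all_dates <- data.frame(date = seq(min(sales$date), max(sales$date), by = "day"))

# Left join: keep all dates, leaving NA where there were no sales
result <- merge(all_dates, bydate, by = "date", all.x = TRUE)

# Replace the NAs with zeros
result$quantity[is.na(result$quantity)]     <- 0
result$totalprice[is.na(result$totalprice)] <- 0
```

result then has one row per day, and result$quantity is the vector of sales by date with zeros on days without sales.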
