Define the week as 6 days in length in R

I want to create a time series with date and quantity as variables. However, I always have zero observations on Sundays, so I want to define the week as 6 days long in R. Any suggestions?
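One possible approach, sketched here under the assumption that Sundays can simply be dropped before the series is built: remove the Sunday rows and tell ts() that one seasonal cycle has 6 observations. The data and the names df6 and y are made up for illustration.

```r
# Hypothetical daily data covering two weeks, starting on a Monday,
# with zero quantity on both Sundays
dates <- seq(as.Date("2023-01-02"), by = "day", length.out = 14)
qty   <- c(5, 7, 6, 8, 9, 4, 0, 6, 8, 7, 9, 10, 5, 0)
df    <- data.frame(date = dates, quantity = qty)

# Drop Sundays; format(x, "%u") gives the weekday as "1" (Monday) to "7" (Sunday)
df6 <- df[format(df$date, "%u") != "7", ]

# frequency = 6 tells ts() that one cycle (a week) has 6 observations
y <- ts(df6$quantity, frequency = 6)
print(y)
```

The resulting series no longer carries calendar dates, so the date column of df6 has to be kept alongside it if the dates are needed for plotting.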

Related

Time Series Changing Values

I wanted to forecast month-over-month increases for four columns to the end of the year. However, when I created my dataset through ts, it replaced the values of my imported dataset. Is there a reason for this, and can I avoid it? Or is it supposed to come out this way?
Month - 2022-03-01, 2022-04-01,2022-05-01,2022-06-01,2022-07-01
Visits- 71893, 40683,32455,34898,49834
Revenue- 87036,23846,34575,39732,45632
Orders- 3488,6578,4345,5644,6543
Conversion Rate- .35%,.33%,.43%,.39%
However, it is returning the following. Does this have an actual meaning, or is the month column causing it?
Month - 2022-03-01, 2022-04-01,2022-05-01,2022-06-01,2022-07-01
Visits- 5,1,2,3,4
Revenue- 5,3,4,1,2
Orders- 1,2,3,4,5
Conversion Rate- 1,2,3,4,5
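A likely cause, although the question does not show the import code, is that the columns were read as text or factors rather than numbers. ts() passes a data frame through data.matrix(), which replaces text with integer factor codes, and that produces small integers like the ones above. The sketch below reproduces the effect with made-up column names and a made-up fifth conversion rate, then converts to numeric before calling ts().

```r
# Hypothetical import in which the numbers arrived as text
raw <- data.frame(
  Visits   = c("71893", "40683", "32455", "34898", "49834"),
  ConvRate = c(".35%", ".33%", ".43%", ".39%", ".41%"),  # last value is invented
  stringsAsFactors = FALSE
)

# Text columns are turned into factor codes, not into the original numbers
bad <- ts(raw, start = c(2022, 3), frequency = 12)

# Convert to numeric first (stripping the percent sign), then build the series
clean <- data.frame(
  Visits   = as.numeric(raw$Visits),
  ConvRate = as.numeric(sub("%", "", raw$ConvRate, fixed = TRUE)) / 100
)
good <- ts(clean, start = c(2022, 3), frequency = 12)

print(bad)
print(good)
```

Checking str() of the imported data frame before calling ts() shows whether any column is character or factor.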

Year-month-week expression

I have data in which dates are written in a specific format. To simplify, here is an example I made:
df<-data.frame(date=c(2012034,2012044,2012051,2012063,2012074),
math=c(100,100,23,46,78))
2012034 means the 4th week of March 2012. Likewise, 2012044 means the 4th week of April 2012. I am trying to turn the values of date into something that expresses their order. I need this because, when I don't convert them to time values, the x axis of the scatter plot looks really weird.
My goal is this:
Find the oldest date in the date column and number it 1. In this case, 2012034 should be 1. Next, find the second oldest date and calculate how many weeks have passed since the oldest date. The second oldest date is 2012044, which is 5 weeks after the oldest date 2012034, so it should become 1+5=6. Likewise, I want to number every date by how many weeks have passed since the oldest date.
One way to do it is to also specify the day of the week and subtract it at the end, i.e.
as.Date(paste0(df$date, '-1'), '%Y%m%U-%u') - 1
#[1] "2012-03-22" "2012-04-22" "2012-05-01" "2012-06-15" "2012-07-22"

function in R that creates dummies for given time period

There is a data frame like this:
The first two columns in the df describe the start date (month and year) and the end date (month and year). Column names describe every single month and year of a certain time period.
I need a function/loop that inserts "1" or "0" in each cell - "1" when the date from the given column name is within the period described by the first two columns, and "0" if not.
I would appreciate any help.
You want to do two different things: (a) create a dummy variable and (b) see if a particular date is in an interval.
Making a dummy variable is the easier one; in base R you can use ifelse. For example, in the iris data frame:
iris$dummy <- ifelse(iris$Sepal.Width > 2.5, 1, 0)
Working with dates is more complicated. In this answer we will use the lubridate library. First you need to convert all those dates from the 'Month Year' format to something that R can understand. For example, for February you could do:
library(lubridate)
new_format_february_2016 <- interval(ymd('2016-02-01'), ymd('2016-03-01') - dseconds(1))
#[1] 2016-02-01 UTC--2016-02-29 23:59:59 UTC
This is February: the interval of time from the 1st of February to one second before the 1st of March. You can do the same with your start date column and your end date column.
To compare two intervals of time (that is, to see if a particular month falls into your other intervals) you can do:
int_overlaps(new_format_february_2016, other_interval)
If this returns TRUE, the two intervals (one particular month and another one) overlap. This is not the same as one being inside the other, but in your case it will work. Using this you can iterate over the columns and rows and build your dummy variable.
But before doing so, I would recommend cleaning your data, as your current format is complicated to work with. To get all the power that vector types in R provide, ideally you want one row per observation and one variable per column. This does not seem to be the case with your data frame. Take a look at the chapter 'Tidy data' of 'R for Data Science', especially the spreading and gathering subsection:
Tidy data
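The interval check described above can also be sketched in base R without lubridate. This is only an illustration under assumptions: the original data frame is not shown, so the column names start and end, the "YYYY-MM" format, and the helper month_num are all hypothetical.

```r
# Hypothetical input: one row per subject, start and end given as "YYYY-MM"
df <- data.frame(
  start = c("2016-01", "2016-03"),
  end   = c("2016-02", "2016-04"),
  stringsAsFactors = FALSE
)

# Months that should become the 0/1 columns (assumed range)
months <- format(seq(as.Date("2016-01-01"), as.Date("2016-04-01"), by = "month"),
                 "%Y-%m")

# Hypothetical helper: "YYYY-MM" -> day number of the first day of that month
month_num <- function(x) as.numeric(as.Date(paste0(x, "-01")))

# outer() compares every row against every month in one step:
# a month is inside the period if start <= month and end >= month
inside <- outer(month_num(df$start), month_num(months), "<=") &
          outer(month_num(df$end),   month_num(months), ">=")

# Turn TRUE/FALSE into 1/0 and attach the month names as column names
dummies <- matrix(as.integer(inside), nrow = nrow(df),
                  dimnames = list(NULL, months))
result <- cbind(df, dummies)
print(result)
```

Because the comparison is vectorised, no explicit loop over rows and columns is needed.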

gnuplot, calculating and plotting monthly averages

I have a datafile with several months of minute data with lines like "2016-02-02 13:21(\t)value(\n)".
I need to plot the data (no problem with that) and calculate + plot an average for each month.
Is it possible in gnuplot?
I am able to get an overall average using
fit a "datafile" using 1:3 via a
I am also able to specify some time range for the fit using
fit [now_secs-3600*24*31:now_secs] b "datafile" using 1:3 via b
... and then plot them with
plot a t "Total average",b t "Last 31 days"
But I have no idea how to calculate and plot an average for each month (= one stepped line showing each month's average).
Here is a way to do it purely in gnuplot. This method can be adapted (with considerable effort) to work with files that cross a year boundary or span more than one year. It works whether or not the data starts in January. It computes the ordinary average for each month (the arithmetic mean), treating each data point as one value for the month. With somewhat significant modification, it can be used to work with weighted averages as well.
This makes significant use of the stats function to compute values. It is a little long, partly because I commented it heavily. It uses 5.0 features (NaN for undefined values and in-memory datablocks instead of temporary files), but comments note how to change these for earlier versions.
Note: This script must be run before setting time mode. The stats function will not work in time mode. Time conversions are handled by the script functions.
data_time_format = "%Y-%m-%d %H:%M" #date format in file
date_cols = 2 # Number of columns consumed by date format
# get numeric month value of time - 1=January, 12=December
get_month(x) = 0+strftime("%m",strptime(data_time_format,x))
# get numeric year value of time
get_year(x) = 0+strftime("%Y",strptime(data_time_format,x))
# get internal time representation of day 1 of month x in year y
get_month_first(x,y) = strptime("%Y-%m-%d",sprintf("%d-%d-01",y,x))
# get internal time representation of date
get_date(x) = strptime(data_time_format,x)
# get date string in file format corresponding to day y in month x of year z
get_date_string(x,y,z) = strftime(data_time_format,strptime("%Y-%m-%d",sprintf("%04d-%02d-%02d",z,x,y)))
# determine if date represented by z is in month x of year y
check_valid(x,y,z) = (get_date(z)>=get_month_first(x,y))&(get_date(z)<get_month_first(x+1,y))
# Determine year and month range represented by file
year = 0
stats datafile u (year=get_year(strcol(1)),get_month(strcol(1))) nooutput
month_min = STATS_min
month_max = STATS_max
# list of average values for each month
aves = ""
# fill missing months at beginning of year with 0
do for[i=1:(month_min-1)] {
    aves = sprintf("%s %d",aves,0)
}
# compute average of each month and store it at the end of aves
do for[i=month_min:month_max] {
    # In versions prior to 5.0, replace NaN with 1/0
    stats datafile u (check_valid(i,year,strcol(1))?column(date_cols+1):NaN) nooutput
    aves = sprintf("%s %f",aves,STATS_mean)
}
# day on which to plot average
baseday = 15
# In version prior to 5.0, replace $k with a temporary file name
set print $k
# Change this to start at 1 if we want to fill in prior months
do for [i=month_min:month_max] {
    print sprintf("%s %s",get_date_string(i,baseday,year),word(aves,i))
}
set print
This script will create either an in-memory datablock or, for earlier versions (with the noted changes), a temporary file. It has the same layout as the original file but contains one entry per month, holding the monthly average.
At the beginning we need to define our date format and the number of columns that the date format consumes. From then on it is assumed that the data file is structured as datetime value. Several functions are defined which make extensive use of the strptime function (to convert a date string to the internal time representation) and the strftime function (to convert the internal representation to a string). Some of these functions convert both ways in order to extract the necessary values. Note the addition of 0 in the get_month and get_year functions to convert a string value to an integer.
The script performs three steps on the data in order to build the resulting datablock/file:
1. Use the stats function to compute the first and last month and the year. We are assuming only one year is present. This step needs to be modified heavily if we need to work with more than one year. In particular, months in a second year would need to be numbered 13 - 24, in a third year 25 - 36, and so on. We would need to modify this line to capture multiple years as well. Probably two passes would be needed.
2. Build up a string which contains space-separated average values, one for each month. This is done by applying the stats function once for each month. The check_valid function checks if a value is in the month of interest; a value that isn't is assigned NaN, which causes the stats function to ignore it.
3. Loop over the months of interest and build a datablock/temporary file with one entry for each month holding the average value for that month. In this case, the average value is assigned to the start of the 15th day of the month. This can easily be changed to any other desired time. The get_date_string function is used for assigning the value to a time.
Now to demonstrate this, suppose that we have the following data
2016-02-03 15:22 95
2016-02-20 18:03 23
2016-03-10 16:03 200
2016-03-15 03:02 100
2016-03-18 02:02 200
We wish to plot this data along with the average value for each month. We can run the above script, and we will get a datablock $k (make the commented change near the bottom to use a temporary file instead) containing the following
2016-02-15 00:00 59.000000
2016-03-15 00:00 166.666667
These are exactly the average values for each month. Now we can plot with
set xdata time
set timefmt data_time_format
set key outside top right
plot $k u 1:3 w points pt 7 t "Monthly Average",\
datafile u 1:3 with lines t "Original Data"
Here, just for illustration, I used points with the averages. Feel free to use any style that you want. If you choose to use steps, you will very likely want to adjust the day that is assigned† in the datablock/temporary file (probably the first or last day in the month depending on how you want to do it).
It is usually easier with a task like this to do some outside preprocessing, but this demonstrates that it is possible in pure gnuplot.
† Regarding changing the day that is assigned, using any specific day in the month is easy, as long as it is a day that occurs in every month (dates from the 1st to the 28th) - just change baseday. For other days, the get_date_string function needs to be modified.
For example, to use the last day, the function can be defined as
get_date_string(x,y,z) = strftime(data_time_format,strptime("%Y-%m-%d",sprintf("%04d-%02d-01",z,x+1))-24*60*60)
This version actually computes the first day of the next month, and then subtracts one whole day from that. The second argument is ignored in this version, but preserved to allow it to be used without having to make any additional changes to the script.
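For comparison, the outside preprocessing mentioned above could be done in R. This is a minimal sketch, assuming the tab-separated layout from the question and reusing the sample data shown earlier; the names d and monthly and the output file name are made up.

```r
# Sample data from the answer above; timestamp and value are tab separated
lines <- c("2016-02-03 15:22\t95",
           "2016-02-20 18:03\t23",
           "2016-03-10 16:03\t200",
           "2016-03-15 03:02\t100",
           "2016-03-18 02:02\t200")
d <- read.table(text = lines, sep = "\t", col.names = c("time", "value"),
                stringsAsFactors = FALSE)

# The first 7 characters of the timestamp ("YYYY-MM") identify the month
d$month <- substr(d$time, 1, 7)

# One arithmetic mean per month
monthly <- aggregate(value ~ month, data = d, FUN = mean)
print(monthly)

# To hand the result to gnuplot, it could be written to a file
# (hypothetical file name):
# write.table(monthly, "monthly.dat", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```

The monthly means agree with the values in the datablock above (59 for February and about 166.67 for March).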
With a recent version of gnuplot, you have the stats command and can do something like this:
stats "datafile" using 1:3 name "m0"
month_sec = 3600*24*30.5
do for [month=1:12] {
    stats [now_secs-month*month_sec:now_secs-(month-1)*month_sec] "datafile" using 1:3 name sprintf("m%d", month)
}
This gives you m0_mean_y for the total mean, and m1_mean_y, m2_mean_y, and so on for the previous months, all defined as gnuplot variables (with two columns in using, stats appends _x and _y to the variable names).
Finally, to plot them you can do something like:
plot 'datafile', for [month=0:12] value(sprintf("m%d_mean_y", month))
See help stats, help for, help value and help sprintf for more information on the above commands.

Linking characters from one data.frame to other datasets

I have a data.frame with two columns. The first column contains various specific times during a day. The second column contains the animal behavior (behavior period) that I observed at each specific time:
Time; Behavior
10:20; feeding
10:25; feeding
10:30; resting
...
For each of those behavior periods I have an additional dataset (TimeSeries) which contains data about the actual animal movement (output from a movement sensor). Each TimeSeries has about 100 rows:
Time; Var1; Var2
10:20:01; 1345; 5232
10:20:02; 1423; 5271
...
Now I would like to link each TimeSeries with the behavior from the first dataset. So, that R knows that "feeding" is related to the TimeSeries of 10:20 and 10:25 and that "resting" is related to the TimeSeries of 10:30 and so on.
Afterwards I want to use this "knowledge" to calculate mean and sd from each TimeSeries. So I will have all the means and sd's from all TimeSeries for each behavior.
It is not clear whether your times are currently characters, factors, POSIXct values, etc. So you should first convert them (possibly in a new column) to a numeric variable, something like the number of seconds since midnight. Functions like strptime, difftime, and as.numeric may help.
Add a column to the first data frame that is just 1:nrow(firstdf). Then add a column to the second dataframe that is computed by the findInterval function:
seconddf$newcol <- findInterval( seconddf$seconds, firstdf$seconds )
Now you can merge the 2 data frames on the new columns and the finer grained times will be associated with the activity from the most recent time.
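The steps above can be sketched as follows. The sensor rows beyond the two shown in the question, the helper to_seconds, and the column name period are assumptions made for illustration.

```r
# Behavior log (first data frame), as in the question
firstdf <- data.frame(Time = c("10:20", "10:25", "10:30"),
                      Behavior = c("feeding", "feeding", "resting"),
                      stringsAsFactors = FALSE)

# Sensor readings (second data frame); rows 3 to 5 are invented
seconddf <- data.frame(
  Time = c("10:20:01", "10:20:02", "10:25:10", "10:30:05", "10:30:06"),
  Var1 = c(1345, 1423, 1500, 900, 950),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM" or "HH:MM:SS" -> seconds since midnight
to_seconds <- function(x) {
  sapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p <- c(p, 0)  # no seconds given
    sum(p * c(3600, 60, 1))
  })
}

firstdf$seconds  <- to_seconds(firstdf$Time)
seconddf$seconds <- to_seconds(seconddf$Time)

# Index of the most recent behavior period for every sensor reading;
# findInterval() needs firstdf$seconds to be sorted in increasing order
firstdf$period  <- seq_len(nrow(firstdf))
seconddf$period <- findInterval(seconddf$seconds, firstdf$seconds)

# Attach the behavior to each sensor reading
merged <- merge(seconddf, firstdf[, c("period", "Behavior")], by = "period")

# Mean and sd of Var1 for each behavior
means <- tapply(merged$Var1, merged$Behavior, mean)
sds   <- tapply(merged$Var1, merged$Behavior, sd)
print(means)
print(sds)
```

Grouping by merged$period instead of merged$Behavior gives one mean and sd per TimeSeries rather than per behavior.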
