Difficulty in splitting my data set based on desired date frame - r

I want to split my data into 6 data frames for time series analysis. Example:
Time period 1: 23/3/2015 to 23/4/2015.
Time period 2: 23/3/2016 to 23/4/2016.
I have been stuck at this stage for a while.
I tried using the splitByDate function from the openair package like this:
n_data <- splitByDate(n_data, dates = "23/3/2015", "23/4/2015",
  labels = c("March 2015", "April 2015"))
Error message:
Error in cut.default(as.numeric(mydata$date), breaks = c(0, as.numeric(dates), :
lengths of 'breaks' and 'labels' differ
In addition: Warning messages:
1: In cut(as.numeric(mydata$date), breaks = c(0, as.numeric(dates), :
NAs introduced by coercion
2: In sort.int(as.double(breaks)) : NAs introduced by coercion
Then I converted the date column of my data frame and used the selectByDate function:
data_1$date <- as.Date(data_1$date, format = "%d/%m/%y")
H_15 <- selectByDate(data_1, start = "23/3/2015", end = "24/4/2015")
The resulting data frame is empty. Here is a sample of my data:
structure(list(NO2 = c(10.04, 12.74, 16.95, 13.96, 12.68, 9.91,
8.48, 7.46, 7.24, 7.35), PM10 = c(28.1, 22.7, 22.3, 25.5, 21.8,
20, 15.2, 12.1, 14.2, 16.7), PM2.5 = c(24.4, 14.7, 16, 15.5,
13.4, 11.8, 7.5, 7.4, 8.3, 10.1), O3 = c(53.15, 50.24, 46.95,
51.49, 53.98, 57.08, 58.97, 61.22, 59.12, 57.78), date = c("01/01/2015",
"01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015",
"01/01/2015", "01/01/2015", "01/01/2015", "01/01/2015"), time = c("00:00:00",
"01:00:00", "02:00:00", "03:00:00", "04:00:00", "05:00:00", "06:00:00",
"07:00:00", "08:00:00", "09:00:00")), row.names = c(NA, 10L), class = "data.frame")
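A possible cause of the empty result, offered as a sketch rather than a confirmed diagnosis: the format string uses %y (two-digit year), while the dates carry four-digit years, so "2015" is typically read as "20" and ends up as 2020. The example below uses %Y instead and builds one data frame per period with base R only. The toy rows, the periods table, and the names pieces, period_1 and period_2 are assumptions made for illustration; they are not from the original post.

```r
# Toy data in the same shape as the question's sample (rows invented for illustration).
data_1 <- data.frame(
  NO2  = c(10.0, 12.7, 16.9, 13.9),
  date = c("01/01/2015", "23/03/2015", "23/04/2015", "25/03/2016"),
  stringsAsFactors = FALSE
)

# %Y is the four-digit year; %y would only consume the first two digits.
data_1$date <- as.Date(data_1$date, format = "%d/%m/%Y")

# Hypothetical lookup table: one (start, end) pair per desired period.
periods <- data.frame(
  label = c("period_1", "period_2"),
  start = as.Date(c("2015-03-23", "2016-03-23")),
  end   = as.Date(c("2015-04-23", "2016-04-23"))
)

# Build a named list holding one data frame per period (bounds inclusive).
pieces <- lapply(seq_len(nrow(periods)), function(k) {
  in_window <- data_1$date >= periods$start[k] & data_1$date <= periods$end[k]
  data_1[in_window, , drop = FALSE]
})
names(pieces) <- periods$label
```

Extending the periods table to six rows would give the six data frames, accessible as pieces$period_1 and so on.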


How to transform average hourly data per month into time series?

I have a data set with peculiar granularity: each month has 24 hourly averages, and the time span is Jan 2012 to Dec 2019 (see image). I am interested in the data in column "Ws".
I tried, unsuccessfully, to transform it into a time series with this R code:
data <- read.csv("c:/bib/test-1.csv", dec = ".", header = TRUE)
inds <- seq(as.Date("2012-01-01"), as.Date("2019-12-31"), by = "hour")
## Create a time series object
myData <- ts(data,
             start = c(2012-1, as.numeric(format(inds[1], "%j"))),
             frequency = 2304)
Output of 'dput(head(data, 20))' code:
structure(list(V1 = c(8.4, 8.2, 8.2, 8, 7.8, 7.5, 7.3, 7.2, 8.2,
8.8, 9.2, 9.5, 9.7, 9.9, 10, 10.1, 9.9, 9.5, 8.9, 8.6)), row.names = c(NA, 20L), class = "data.frame")
Could someone help me with this?
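One way to read this layout, sketched under assumptions: 96 months times 24 hourly means gives 2304 values, so 2304 is the length of the series rather than its frequency. The example below assumes the values are stored month by month, with the 24 hours of each month in order, and uses a simulated vector ws in place of the real "Ws" column. It shows two hypothetical representations: a single series with a 24-observation cycle, and a monthly multivariate series with one column per hour of day.

```r
# Simulated stand-in for the "Ws" column: 96 months x 24 hourly means.
n_months <- 96L                      # Jan 2012 to Dec 2019
set.seed(1)
ws <- runif(n_months * 24L, min = 5, max = 12)

# Option 1: one long series where each cycle of 24 observations is one month.
# The time index counts months from 1, not calendar dates.
ws_ts <- ts(ws, start = c(1, 1), frequency = 24)

# Option 2: one monthly series per hour of day, on a real calendar axis.
# byrow = TRUE assumes the data are ordered month by month.
ws_mat <- matrix(ws, ncol = 24, byrow = TRUE)
ws_monthly <- ts(ws_mat, start = c(2012, 1), frequency = 12)
```

Which form is appropriate depends on the analysis: the first keeps the daily profile as the seasonal cycle, the second makes the yearly cycle explicit for each hour.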

For Loop and Appending to Vector in R

I am trying to create a For Loop in R to fill a Vector with Forecasted values, generated via the auto.arima function.
I am new to R, so I am not sure if this is done correctly.
The code I am using is the following:
dfts <- ts(df$Price_REG1)
for (i in 0:7) {
  modArima <- auto.arima(dfts[0+(i*24):168+(i*24)])
  forecast <- forecast(modArima, h=24)
  forecast_values <- forecast$mean
  fc <- append(fc, forecast_values)
}
I use longer sets in reality, but made it smaller here to make it more understandable.
What I am trying to achieve is to use the first week of data (168 hours in one week) to estimate the coefficients for the model. Then I want to put the generated predictions for the first 24 hours after the training set in the Vector fc.
I then want to move the window one day, reestimate the coefficients and generate the forecasts for the following day and saving them into the Vector.
I am a bit unsure about the dfts[0+(i*24):168+(i*24)] part, since df <- df[0:168] doesn't work and needs df <- df[0:168,] instead. But if I put dfts[0+(i*24):168+(i*24)] I get
Error in `[.default`(dfts, 0 + (i * 24):874 + (i * 24), ) : incorrect number of dimensions
Sample of Data:
structure(c(28.78, 28.45, 27.9, 27.52, 27.54, 26.55, 25.83, 25.07,
25.65, 26.15, 26.77, 27.4, 28.08, 28.69, 29.37, 29.97, 30.46,
30.39, 30.06, 29.38, 27.65, 27.33, 25.88, 24.81, 12.07, 13.13,
19.07, 21.12, 24.29, 26.27, 27.74, 28.39, 29.37, 29.95, 29.91,
29.96, 29.94, 29.94, 30.18, 30.96, 31.2, 30.98, 30.35, 29.27,
28.17, 28.02, 27.69, 24.39, 18.93, 9.98, 1.53, 0.14, 0.85, 9.92,
24.48, 26.68, 28.12, 28.58, 28.16, 28.78, 28.31, 28.44, 28.96,
29.86, 30.15, 30.07, 29.54, 29.11, 27.91, 27.03, 25.7, 22.04,
21.73, 15.95, 16.23, 6.45, 3.83, 4.03, 4.04, 19.07, 17.49, 24.18,
24.94, 25.11, 24.94, 24.95, 25.25, 26.33, 27.36, 28.88, 29.58,
29.42, 27.71, 27.4, 27.37, 25.77, 26.65, 27.13, 27.11, 27.42), tsp = c(1,
5.125, 24), class = "ts")
Here is an example with the built-in data set AirPassengers showing how to run a rolling forecast with package forecast.
The code below makes use of these time series functions:
window, to subset objects of class "ts";
frequency and start, to get those attributes.
The output vector is created beforehand, not extended in the loop with append.
library(forecast)
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
data("AirPassengers", package = "datasets")

# pre-allocate the output series with the same time attributes as the input
fc <- ts(
  data = rep(NA, length(AirPassengers)),
  start = start(AirPassengers),
  frequency = frequency(AirPassengers)
)
start <- start(AirPassengers)[1]
freq <- frequency(AirPassengers)
# the first year has no forecast, so copy the observed values
i_fc <- seq_len(freq)
fc[i_fc] <- AirPassengers[i_fc]
for(i in 1:11) {
  # fit on one year, forecast the following year
  w <- window(AirPassengers, start = start + i - 1L, end = c(start + i - 1L, freq))
  modArima <- auto.arima(w)
  y <- forecast(modArima, h = freq)$mean
  i_fc <- i_fc + freq
  fc[i_fc] <- y
}
plot(cbind(AirPassengers, fc))
I believe that the code below forecasts the next day given a certain initial number of days.
library(forecast)
# copy the first weeks * week_days periods of x into an otherwise empty series
fill_first_periods <- function(x, weeks = 1L, week_days) {
  if(missing(week_days)) week_days <- 7L
  fc <- ts(
    data = rep(NA, length(x)),
    start = start(x),
    frequency = frequency(x)
  )
  i_fc <- seq_len(frequency(x) * week_days * weeks)
  fc[i_fc] <- x[i_fc]
  fc
}
# not enough data to run an example for 1 week
# three days only
weeks <- 1L
week_days <- 3L
fc <- fill_first_periods(dfts, weeks = weeks, week_days)
n <- length(fc)
i_last <- length(fc[!is.na(fc)])
h <- frequency(fc)
curr_start <- start(fc)
curr_end <- c(curr_start[1] + weeks*week_days - 1L, frequency(fc))
for(i in 2:(end(fc)[1] - 1L)) {
  if(n - i_last < h) {
    # fewer than h observations left: forecast only the remaining tail
    h <- end(fc)[2]
    i_fc <- tail(seq_len(n), h)
  } else {
    i_fc <- (i_last + 1L):(i_last + h)
    i_last <- i_last + h
  }
  w <- window(dfts, start = curr_start, end = curr_end)
  modArima <- auto.arima(w)
  fc[i_fc] <- forecast(modArima, h = h)$mean
  # slide the training window forward by one day
  curr_start[1] <- curr_start[1L] + 1L
  curr_end <- c(curr_end[1L] + 1L, h)
}
plot(cbind(dfts, fc))
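A side note on the indexing expression in the question, as a small base R sketch. In R the colon operator binds more tightly than +, so 0+(i*24):168+(i*24) is evaluated as 0 + ((i*24):168) + (i*24), which is not a 168-hour window. R indices also start at 1, and a "ts" object built from a single column is a plain vector, so it takes one index and no comma. The helper window_index and its train and step arguments are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: 1-based indices of the i-th training window.
window_index <- function(i, train = 168L, step = 24L) {
  (1L + i * step):(train + i * step)
}

i <- 1L
# `:` binds tighter than `+`, so this is 0 + ((i * 24):168) + (i * 24).
as_written <- 0 + (i * 24):168 + (i * 24)
# Explicit parentheses give the intended window, shifted by i days.
intended <- window_index(i)

range(as_written)    # 48 192, but only 145 values
range(intended)      # 25 192, the full 168 values
length(as_written)   # 145
length(intended)     # 168
```

With this helper the training data for step i would be dfts[window_index(i)], assuming dfts is at least 168 + i * 24 observations long.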
First Derivative of Scatter Plot R

Hello, I am working with sigmoidal data and am attempting to plot two scatter plots on top of each other: the raw data and the first derivative of the raw data. My issue doesn't lie in plotting the data, but rather in finding a function that will create an accurate representation of the first derivative.
What I have tried: creating a function that calculates the slope between the current and next point, (y2-y1)/(x2-x1), and assigning the value to the current temperature.
dput() of Data Frame:
structure(list(Temperature = c(4.98, 5.49, 6.01, 6.5, 7.02, 7.52, 8.03, 8.52, 9.03, 9.54, 10.04, 10.54, 11.05, 11.55, 12.05, 12.55, 13.05, 13.56, 14.06, 14.57, 15.07, 15.57, 16.07, 16.59, 17.08, 17.59, 18.08, 18.59, 19.09, 19.6, 20.1, 20.64, 21.12, 21.63, 22.13, 22.62, 23.13, 23.63, 24.13, 24.63, 25.11, 25.62, 26.11, 26.68, 27.19, 27.7, 28.2, 28.71, 29.21, 29.71, 30.21, 30.7, 31.21, 31.69, 32.19, 32.69, 33.19, 33.7, 34.19, 34.68, 35.19, 35.68, 36.19, 36.69, 37.19, 37.7, 38.19, 38.7, 39.2, 39.7, 40.21, 40.7, 41.22, 41.71, 42.21, 42.71, 43.21, 43.72, 44.22, 44.72, 45.22, 45.73, 46.23, 46.73, 47.23, 47.97, 48.71, 49.23, 49.74, 50.23, 50.73, 51.23, 51.73, 52.24, 52.75, 53.24, 53.75, 54.24, 54.75, 55.26, 55.75, 56.25, 56.75, 57.24, 57.75, 58.27, 58.77, 59.26, 59.77, 60.26, 60.78, 61.27, 61.79, 62.27, 62.77, 63.29, 63.79, 64.27, 64.78, 65.3, 65.8, 66.27, 66.8, 67.3, 67.8, 68.31, 68.78, 69.3, 69.8, 70.32, 70.81, 71.32, 71.81, 72.33, 72.82, 73.31, 73.83, 74.33, 74.82, 75.32, 75.83, 76.34, 76.84, 77.35, 77.82, 78.34, 78.85, 79.36, 79.84, 80.35, 80.85, 81.36, 81.86, 82.37, 82.86, 83.37, 83.88, 84.36, 84.88, 85.38, 85.88, 86.38, 86.89, 87.38, 87.89, 88.39, 88.89, 89.4, 89.9, 90.39, 90.9, 91.4, 91.91, 92.37, 92.89, 93.4, 93.91, 94.41, 94.91, 95.42), Absorbance = c(1.401351929, 1.403320313, 1.405181885, 1.406326294, 1.407440186, 1.409118652, 1.410095215, 1.410797119, 1.411560059, 1.412918091, 1.413970947, 1.414245605, 1.416000366, 1.415435791, 1.41809082, 1.4190979, 1.419677734, 1.420150757, 1.421966553, 1.420333862, 1.422637939, 1.422790527, 1.423461914, 1.426513672, 1.426315308, 1.426071167, 1.426467896, 1.428710938, 1.428070068, 1.428817749, 1.429733276, 1.432144165, 1.432434082, 1.433227539, 1.434616089, 1.435806274, 1.434814453, 1.436096191, 1.436096191, 1.436447144, 1.437896729, 1.4375, 1.438934326, 1.440139771, 1.440139771, 1.441741943, 1.442108154, 1.443969727, 1.444778442, 1.443862915, 1.444534302, 1.445648193, 1.444473267, 1.446395874, 1.447219849, 1.446151733, 
1.449569702, 1.449066162, 1.448852539, 1.4503479, 1.451385498, 1.45111084, 1.451217651, 1.453125, 1.452560425, 1.455047607, 1.455093384, 1.456665039, 1.457977295, 1.457336426, 1.458648682, 1.46043396, 1.462158203, 1.464813232, 1.463531494, 1.468048096, 1.468643188, 1.470748901, 1.471878052, 1.476257324, 1.478057861, 1.482040405, 1.484466553, 1.486129761, 1.48815918, 1.496520996, 1.499786377, 1.504302979, 1.507217407, 1.512985229, 1.517471313, 1.524108887, 1.528198242, 1.534637451, 1.539169312, 1.546142578, 1.554611206, 1.55809021, 1.56854248, 1.572875977, 1.580307007, 1.585739136, 1.592514038, 1.600067139, 1.609222412, 1.616607666, 1.622375488, 1.631469727, 1.635635376, 1.642929077, 1.649780273, 1.655014038, 1.661483765, 1.663742065, 1.671859741, 1.677200317, 1.677108765, 1.683380127, 1.684082031, 1.687438965, 1.694595337, 1.694961548, 1.696685791, 1.696685791, 1.699768066, 1.702514648, 1.703613281, 1.705093384, 1.70022583, 1.707595825, 1.707962036, 1.709075928, 1.705276489, 1.71055603, 1.709259033, 1.70916748, 1.709732056, 1.710189819, 1.710281372, 1.711868286, 1.711883545, 1.713104248, 1.713760376, 1.711120605, 1.709716797, 1.711776733, 1.712814331, 1.714324951, 1.711120605, 1.713378906, 1.712432861, 1.716125488, 1.710006714, 1.710845947, 1.711502075, 1.711120605, 1.710006714, 1.70980835, 1.708602905, 1.708236694, 1.710189819, 1.707672119, 1.706939697, 1.710006714, 1.706192017, 1.706573486, 1.706207275, 1.705734253, 1.706207275, 1.705184937, 1.70954895, 1.705841064, 1.702972412, 1.703979492, 1.703063965, 1.709350586, 1.703338623, 1.700408936, 1.705276489, 1.705368042)), row.names = 1621:1800, class = "data.frame")
Code for my attempt:
library(ggplot2)
raw = "<insert dput line>>"
columns = c("Temperature","Absorbance")
first = data.frame(matrix(nrow=0,ncol=2))
colnames(first) = columns
for (i in 1:nrow(raw)) {
  if(i != nrow(raw)) {
    cAbs = raw[i,2]
    nextAbs = raw[i+1,2]
    cT = raw[i,1]
    nextT = raw[i+1,1]
    Temperature = raw[i,1]
    # slope between the current and the next point
    Absorbance =((nextAbs-cAbs)/(nextT-cT))
    t <- data.frame(Temperature,Absorbance)
    names(t) <- names(raw)
    first <- rbind(first, t)
  }
}
ggplot() +
  geom_point(data=raw, aes(x=Temperature,y=Absorbance), color = "red") +
  geom_point(data = first, aes(x=Temperature,y = Absorbance), color = "blue")
What I was expecting
I was expecting an output that had the shape of something like so:
library(dplyr); library(ggplot2)
df %>%
  arrange(Temperature) %>%
  mutate(slope = (Absorbance - lag(Absorbance))/
           (Temperature - lag(Temperature))) %>%
  ggplot(aes(Temperature)) +
  geom_line(aes(y= Absorbance, color = "Absorbance"), size = 1.2) +
  geom_point(aes(y= slope * 20 + 1.4, color = "slope")) +
  geom_smooth(aes(y= slope * 20 + 1.4, color = "slope"), se = FALSE, size = 0.8) +
  scale_y_continuous(sec.axis = sec_axis(trans = ~(.x - 1.4)/20, name = "slope"))
If the data is even a little noisy, calculating the derivative by first differencing can be very noisy.
You can get a better estimate by fitting a smoothing spline function and calculating the derivative of the spline function. By differentiating a smooth function, you get a smooth derivative.
In most cases, smooth.spline with default arguments is fine, but I recommend taking a look at the result and possibly tuning the smooth.spline parameters for more or less smoothing, depending on your judgment.
edit: I learned this approach from the Numerical Recipes textbook.
library(tidyverse)
df <- tibble(
  x = seq(1, 15, by = 0.1),
  y = sin(x) + runif(length(x), -0.2, 0.2),
  # first difference: noisy
  d1_diff = c(NA, diff(y) / diff(x)),
  # derivative of a fitted smoothing spline: smooth
  d1_spline = smooth.spline(x, y) %>% predict(x, deriv = 1) %>% pluck("y")
)
df %>%
  pivot_longer(-x) %>%
  mutate(name = factor(name, unique(name))) %>%
  ggplot() + aes(x, value, color = name) + geom_point() + geom_line() +
  facet_wrap(~name, ncol = 1)
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
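For data shaped more like the melting curve in the question, the same idea can be sketched in base R alone. The curve below is synthetic: the logistic parameters (baseline 1.40, height 0.31, midpoint 55, rate 0.2) and the noise level are assumptions chosen to resemble the posted data, not values fitted to it.

```r
set.seed(42)
# Synthetic sigmoid loosely shaped like the absorbance curve in the question.
temperature <- seq(5, 95, by = 0.5)
absorbance <- 1.40 + 0.31 / (1 + exp(-0.2 * (temperature - 55))) +
  rnorm(length(temperature), sd = 0.001)

# Forward difference, assigned to the current point (the last point has no successor).
d1_diff <- c(diff(absorbance) / diff(temperature), NA)

# Derivative of a fitted smoothing spline, evaluated at the same temperatures.
fit <- smooth.spline(temperature, absorbance)
d1_spline <- predict(fit, temperature, deriv = 1)$y

# Temperature at which the smoothed slope peaks, an estimate of the inflection point.
peak_temperature <- temperature[which.max(d1_spline)]
```

Plotting d1_diff and d1_spline against temperature should show the difference in noise; the amount of smoothing can be adjusted through the spar or df arguments of smooth.spline.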
Calculate correlation on a monthly/weekly level

I am having trouble calculating the correlation coefficient between electricity prices of different countries on monthly/ weekly level. The dataset (https://github.com/Argiro1983/prices_df.git) looks like this:
prices_df<-structure(list(DATETIME = structure(c(1609459200, 1609462800,
1609466400, 1609470000, 1609473600, 1609477200, 1609480800, 1609484400,
1609488000, 1609491600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
GR = c(50.87, 48.19, 44.68, 42.92, 40.39, 20.96, 39.63, 40.1,
20, 40.74), IT = c(50.87, 48.19, 44.68, 42.92, 40.39, 40.2,
39.63, 40.09, 41.27, 41.67), BG = c(49.95, 48.05, 49.62,
46.73, 45.39, 44.25, 36.34, 19.97, 20, 20.43), HU = c(45.54,
41.59, 40.05, 36.9, 34.47, 32.82, 27.7, 15, 8.43, 20.77),
TR = c(26.31, 24.06, 24.21, 23.2, 23.2, 26.31, 24.98, 26.31,
24.04, 26.31), SR = c(38.89, 34.86, 33.62, 28.25, 29.03,
29.22, 29.71, 1.08, 1.1, 36.07)), row.names = c(NA, 10L), class = "data.frame")
I have tried converting it to xts and using apply.monthly (or apply.weekly) as follows, but it does not work.
SEE_prices <- xts(x = prices_df, order.by = DATETIME)
storage.mode(SEE_prices) <- "numeric"
SEE_prices <- na.locf(SEE_prices)
apply.monthly(SEE_prices, cor(SEE_prices$GR, SEE_prices$SR))
Another way I tried to get correlation on weekly level was to use the dplyr package, but it also did not work:
prices_df %<>% mutate( DATETIME = ymd_hms(DATETIME) )
table1<- prices_df %>% group_by( year( DATETIME ), isoweek( DATETIME ) ) %>%
summarise( DateCount = n_distinct(date(DATETIME)), correlation = cor(prices_df$GR, prices_df$SR))
Does anybody have an idea on how to calculate weekly/monthly correlation on a dataset?
Thank you in advance.
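Don't use $ inside dplyr pipes: prices_df$GR refers to the whole column and ignores the grouping, so every group gets the same value. To calculate the correlation per group, try:
library(dplyr)
library(lubridate)
prices_df %>%
  mutate(DATETIME = ymd_hms(DATETIME),
         year = year(DATETIME), week = isoweek(DATETIME)) %>%
  group_by(year, week) %>%
  summarise(DateCount = n_distinct(date(DATETIME)),
            correlation = cor(GR, SR), .groups = 'drop')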
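The same grouping idea can be sketched without any packages, which may help to see what happens per group. The small data frame prices below is made up so that the two months have known correlations; it is not taken from the linked dataset. Months are keyed with format(), and cor() is applied to each piece separately.

```r
# Made-up hourly prices spanning two months (not from the original dataset).
prices <- data.frame(
  DATETIME = as.POSIXct(
    c("2021-01-01 00:00:00", "2021-01-01 01:00:00", "2021-01-01 02:00:00",
      "2021-02-01 00:00:00", "2021-02-01 01:00:00", "2021-02-01 02:00:00"),
    tz = "UTC"
  ),
  GR = c(1, 2, 3, 1, 2, 3),
  SR = c(2, 4, 6, 6, 4, 2)
)

# One key per row identifying its month.
month_key <- format(prices$DATETIME, "%Y-%m")

# Split into one data frame per month and correlate within each piece.
monthly_cor <- sapply(split(prices, month_key), function(d) cor(d$GR, d$SR))
monthly_cor
```

Here $ is fine because d is already the per-month subset. For the xts attempt in the question, the likely issue is similar: apply.monthly expects a function as its second argument, for example function(x) cor(x$GR, x$SR), rather than a value computed once on the whole series.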

How to make months of the year my x-axis using xyplot

Here is my data
my code
dt1 =read.csv("C:/Users/My DELL/Documents/R_data/machine learning/dt1.csv")
dt1$month <- seq(nrow(dt1))
mm <- melt(subset(dt1,select=c(month,EgbeNa,UrejeNa,EroNa,RefNa,EgbeMg,UrejeMg,EroMg,RefMg
xyplot(value ~ month|variable,data=mm,type="l",
dt_repr = structure(list(Date = c("01-11-17", "01-12-17", "01-01-18", "01-02-18",
"01-03-18", "01-04-18", "01-05-18", "01-06-18", "01-07-18", "01-08-18",
"01-09-18", "01-10-18", "01-11-18", "01-12-18", "01-01-19", "01-02-19",
"01-03-19", "01-04-19", "01-05-19", "01-06-19", "01-07-19", "01-08-19",
"01-09-19", "01-10-19"), month = 1:24, EgbeNa = c(27.4, 29.25,
31.1, 20.4, 13.55, 14, 16.25, 18.5, 24.95, 16.2, 30.15, 28.6,
35.1, 36.5, 28.45, 31.5, 38.1, 28, 32.55, 30.5, 33.2, 30.8, 13,
24.3), UrejeNa = c(10.45, 9, 7.55, 13.35, 11.6, 12.475, 20.1625,
27.85, 21.5, 32.05, 17.65, 15.15, 25.7, 18.8, 26.85, 20.65, 23.5,
26.45, 30.2, 25.75, 28.3, 31.45, 44.4, 39.6), EroNa = c(44.45,
40.55, 36.65, 43, 39.825, 36.825, 44.1, 51.65, 44.2, 56.1, 61.3,
66.05, 15.75, 19.15, 13.05, 12.2, 21.7, 17.9, 14.6, 33.3, 21.2,
19.6, 32.7, 25.1), RefNa = c(10.55, 9.75, 12.35, 19.65, 10.6,
13.74, 22.62, 25.82, 20.4, 31.2, 16.95, 14.25, 15.03, 17.15,
12.75, 13.5, 20.45, 16.8, 15.5, 25.4, 19.5, 19.8, 26.7, 25.1),
EgbeMg = c(4.118, 4.7155, 5.313, 4.4865, 5.1535, 5.1295,
5.113, 5.103, 5.721, 5.285, 3.8575, 4.128, 5.4205, 6.2975,
5.134, 5.4605, 5.124, 4.203, 5.2635, 5.135, 6.092, 5.575,
4.139, 4.8645), UrejeMg = c(3.6655, 3.977, 4.288, 4.192,
4.676, 4.434, 4.7005, 4.966, 5.3895, 5.7165, 4.881, 4.1015,
3.743, 6.132, 6.0785, 6.1775, 6.3135, 6.028, 5.739, 6.126,
4.5155, 4.716, 5.2165, 5.678), EroMg = c(2.472, 2.31425,
2.1565, 2.2115, 2.184, 2.135, 4.135, 6.2005, 5.457, 5.981,
5.784, 5.885, 5.406, 5.248, 4.967, 4.449, 5.058, 5.1675,
5.667, 6.966, 5.17, 4.8965, 7.201, 6.538), RefMg = c(3.75,
3.87, 4.82, 4.132, 3.98, 4.23, 4.57, 5.01, 5.02, 4.67, 4.18,
4.51, 5.21, 5.18, 4.76, 4.29, 4.95, 5.07, 5.45, 5.86, 5.11,
4.79, 6.01, 5.24)), class = "data.frame", row.names = c(NA,
-24L)) #This data is reproducible
and the output
I want to use Date as my x-axis. The Date column covers 24 months: it starts at 01-11-17 and ends at 01-10-19. Can anyone help, please?
It is difficult to provide answers without using your data. You need to provide your data in a usable format, as @r2evans says above. However, you can convert your Date column, which appears to be a string, to Date type and use that as your x-axis. You can control how the date is displayed by adding the format in the scales list.
For example, in your case:
x = list(format = "%m-%Y") # or whatever format you need
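Here is one way you could achieve your task:
library(dplyr)
library(tidyr)
library(lubridate)
library(lattice)
# reshape to long format, then parse the day-month-year strings as dates
df <- dt_repr %>%
  pivot_longer(
    cols = c(-Date, -month),
    names_to = "names",
    values_to = "values"
  ) %>%
  mutate(Date = dmy(Date))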
xyplot(values ~ Date|names,data=df,type="l",
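As a self-contained sketch of the two suggestions combined (a real Date column plus a format entry in scales), the example below uses only base R and lattice. It takes the first four rows of two columns from the data posted above; the reshaping with stack() and the y-axis setting relation = "free" are choices made here for illustration, not part of the original answers.

```r
library(lattice)

# First four rows of two of the posted columns.
dt <- data.frame(
  Date   = c("01-11-17", "01-12-17", "01-01-18", "01-02-18"),
  EgbeNa = c(27.4, 29.25, 31.1, 20.4),
  EgbeMg = c(4.118, 4.7155, 5.313, 4.4865)
)

# Parse day-month-two-digit-year strings into Date objects.
dt$Date <- as.Date(dt$Date, format = "%d-%m-%y")

# Long format: stack() returns the columns `values` and `ind`.
long <- data.frame(
  Date = rep(dt$Date, times = 2),
  stack(dt[c("EgbeNa", "EgbeMg")])
)

# One panel per variable, with month-year tick labels on the x-axis.
p <- xyplot(values ~ Date | ind, data = long, type = "l",
            scales = list(x = list(format = "%m-%Y"),
                          y = list(relation = "free")))
print(p)
```

The explicit print(p) matters when the code runs inside a script or a function, where lattice plots are not drawn automatically.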
I got the solution using this set of instructions:
# From "Painless way to install a new version of R?"
# Run in the old version of R (or via RStudio)
packages <- installed.packages()[,"Package"]
save(packages, file="Rpackages")
if(!require(installr)) { install.packages("installr"); require(installr)} #load / install+load installr
# See here for more on installr: https://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/
# step by step functions:
check.for.updates.R() # tells you if there is a new version of R or not.
install.R() # download and run the latest R installer
# Install library - run in the new version of R. This calls package names and installs them from repos, thus all packages should be correct to the most recent version
load("Rpackages")
for (p in setdiff(packages, installed.packages()[,"Package"]))
  install.packages(p)
# Installr includes a package migration tool but this simply copies packages, it does not update them
copy.packages.between.libraries() # copy your packages to the newest R installation from the one version before it (if ask=T, it will ask you between which two versions to perform the copying)
After that, all the error messages were gone, the missing packages tidyverse and ggplot2 came back, and I have my desired plot with the expected x-axis.
