I am using monthly time series data that is in fact an xts object. My aim is to convert the monthly data to daily data, such that each day in a month has the value of that particular month.
For example:
library("xts")
observation_dates <- as.Date(c("01.12.1993", "01.01.1994",
"01.02.1994", "01.03.1994", "01.04.1994", "01.05.1994",
"01.06.1994", "01.07.1994", "01.08.1994", "01.09.1994",
"01.10.1994", "01.11.1994", "01.12.1994"), format = "%d.%m.%Y")
air_data <- zoo(matrix(c(21, 21, 21, 30, 35.5, 36, 38.5,
33, 37, 37, 30, 24, 21), ncol = 1), observation_dates)
colnames(air_data) = "air_temperature"
The series is as shown above.
I want all 31 days in December 1993 to have a value of 21 (air temperature), so that the average of the month still remains 21. I want to proceed similarly for the rest of the months shown.
I have tried using to.period(x, period="days") but nothing changes.
Does anyone have any idea?
Your help would be appreciated.
Thank you so much for your response. However, I was able to solve the problem. The approach I used is similar to the one suggested by Ekatef. In my case I created an empty xts object containing all the daily dates and converted all the variables in the empty xts to numeric using lapply().
Then I merged the empty xts with the monthly data series using:
merge(x,y,fill=na.locf). Here na.locf carries the last observation in the monthly series forward to all the days in that month, and does the same for each following month.
The xts function to.period() isn't applicable to your problem; according to its help page:
It is not possible to convert a series from a lower periodicity to a
higher periodicity - e.g. weekly to daily or daily to 5 minute bars,
as that would require magic
It seems the approx() function may be the best solution if interpolation is desired:
# emulation of the original monthly dates
observation_dates <- as.Date(c("01.12.1993", "01.01.1994",
"01.02.1994", "01.03.1994", "01.04.1994", "01.05.1994",
"01.06.1994", "01.07.1994", "01.08.1994", "01.09.1994",
"01.10.1994", "01.11.1994", "01.12.1994"), format = "%d.%m.%Y")
t_air <- c(21, 23, 20, 30, 35.5, 36, 38.5, 33, 37, 37, 30, 24, 27)
# target dates
seq_date <- seq(from = as.Date("01.12.1993", format = "%d.%m.%Y"),
to = as.Date("31.12.1994", format = "%d.%m.%Y"), by = 1)
ans <- approx(observation_dates, y = t_air, xout = seq_date)
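By default approx() interpolates linearly between the monthly points. If a step function is wanted instead (every day takes the value of its month), approx() also accepts method = "constant". A minimal sketch, reusing the dates and values above; rule = 2 is an assumption made here so that the days after the last observation date get the last value rather than NA:

```r
# Monthly observation dates (first of each month) and values, as above
observation_dates <- seq(as.Date("1993-12-01"), as.Date("1994-12-01"), by = "month")
t_air <- c(21, 23, 20, 30, 35.5, 36, 38.5, 33, 37, 37, 30, 24, 27)

# Daily target dates
seq_date <- seq(as.Date("1993-12-01"), as.Date("1994-12-31"), by = "day")

# method = "constant" (with the default f = 0) holds each monthly value
# until the next observation date; rule = 2 extends the last value to
# the days after 1994-12-01.
step <- approx(as.numeric(observation_dates), t_air,
               xout = as.numeric(seq_date),
               method = "constant", rule = 2)

daily <- data.frame(date = seq_date, air_t = step$y)
head(daily, 3)
```

This only works as a "value of the month" lookup because every observation falls on the first day of its month.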
If only one value for each month should be used, I would solve the problem using two data frames. The first one, obs_data, keeps the observation data together with a column of the dates in a convenient "year-month" format:
ym_dates <- format(observation_dates, "%Y-%m")
t_air <- c(21, 23, 20, 30, 35.5, 36, 38.5, 33, 37, 37, 30, 24, 27)
obs_data <- data.frame(observation_dates, ym_dates, t_air)
The second one, res_df, keeps the target dates seq_date at daily resolution. The column air_t is filled with NA at first:
res_df <- data.frame(seq_date, ym = format(seq_date, "%Y-%m"),
stringsAsFactors = FALSE, air_t = NA)
Then fill the air_t column with data from obs_data, using matching year and month as the condition:
dates_to_int <- unique(res_df$ym)
for (i in seq(along.with = dates_to_int))
{
res_df[which(res_df$ym %in% dates_to_int[i]), "air_t"] <-
obs_data[which(obs_data$ym_dates %in% dates_to_int[i]), "t_air"]
}
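The same year-month lookup can also be written without a loop. The following is only a sketch of an alternative, not the approach above: it uses match() on the "year-month" strings, and the names idx and monthly_means are made up for illustration.

```r
# Monthly observations, as above
observation_dates <- seq(as.Date("1993-12-01"), as.Date("1994-12-01"), by = "month")
t_air <- c(21, 23, 20, 30, 35.5, 36, 38.5, 33, 37, 37, 30, 24, 27)

# Daily target dates
seq_date <- seq(as.Date("1993-12-01"), as.Date("1994-12-31"), by = "day")

# For every day, find the position of its month among the observations
idx <- match(format(seq_date, "%Y-%m"), format(observation_dates, "%Y-%m"))

res_df <- data.frame(seq_date, air_t = t_air[idx])

# Check: the mean of each month should equal the monthly value again
monthly_means <- tapply(res_df$air_t, format(res_df$seq_date, "%Y-%m"), mean)
```

Because every day of a month receives the same value, the monthly averages are unchanged, which is what the question asks for.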
Hope it helps :)
I have two data frames. The first one holds observations for a specific ID in a time interval with a given DateStart and DateEnd. The second data frame holds observations for the same IDs, but each on a specific date.
ID <- c(86041, 87371, 98765, 90010)
DateStart <- as.Date(c("2022-02-04", "2022-02-04", "2022-02-08", "2022-02-08"))
DateEnd <- as.Date(c("2022-02-07", "2022-02-10", "2022-02-11", "2022-02-11"))
Interaction <- c(122, 73, 105, 82)
df1 <- data.frame(ID, DateStart, DateEnd, Interaction)
ID <- c(86041, 86041, 87371, 87371, 98765, 98765, 90010, 90010)
date <- as.Date(c("2022-02-04", "2022-02-05", "2022-02-06", "2022-02-09", "2022-02-09", "2022-02-11", "2022-02-08", "2022-02-10"))
view <- c(25, 67, 21, 36, 43, 61, 14, 34)
read <- c(13, 37, 29, 15, 37, 51, 9, 25)
df2 <- data.frame(ID, date, view, read)
I want to sum all the events from the second data frame for a specific ID within the interval between DateStart and DateEnd, and add the result as another column in the first data frame.
So I tried writing a function that aggregates "view" for a certain ID in a specific time interval and applying it to df1, but it only returns 0 and it does not look very elegant:
calc_view <- function(ID, StartDate, EndDate) {
sum(df2$view[which(df2$ID == ID &
df2$date >= StartDate &
df2$date <= EndDate)])
}
df1$view <- apply(df1, 1, calc_view, StartDate = df1$StartDate, EndDate = df1$EndDate)
The desired output should present the aggregated counts of "view" and "read" for a specific ID in the interval between DateStart and DateEnd given in df1. So something like this:
ID DateStart DateEnd Interaction view read
1 86041 2022-02-04 2022-02-07 122 92 50
2 87371 2022-02-04 2022-02-10 73 57 44
3 98765 2022-02-08 2022-02-11 105 104 88
4 90010 2022-02-08 2022-02-11 82 48 34
I'm quite new to R and suppose there's a better option, so any help is highly appreciated.
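One possible way to get that table in base R is sketched below. Two things in the attempt above look like likely culprits: df1 has no StartDate or EndDate column (they are called DateStart and DateEnd), and apply() converts the data frame to a character matrix before passing each row on. The sketch avoids both by looping over row indices; the helper name sum_in_interval is made up for illustration.

```r
# Data from the question
df1 <- data.frame(
  ID = c(86041, 87371, 98765, 90010),
  DateStart = as.Date(c("2022-02-04", "2022-02-04", "2022-02-08", "2022-02-08")),
  DateEnd = as.Date(c("2022-02-07", "2022-02-10", "2022-02-11", "2022-02-11")),
  Interaction = c(122, 73, 105, 82)
)
df2 <- data.frame(
  ID = c(86041, 86041, 87371, 87371, 98765, 98765, 90010, 90010),
  date = as.Date(c("2022-02-04", "2022-02-05", "2022-02-06", "2022-02-09",
                   "2022-02-09", "2022-02-11", "2022-02-08", "2022-02-10")),
  view = c(25, 67, 21, 36, 43, 61, 14, 34),
  read = c(13, 37, 29, 15, 37, 51, 9, 25)
)

# Hypothetical helper: sum one column of df2 over the rows that match
# the ID and fall inside the date interval of row i of df1
sum_in_interval <- function(i, column) {
  in_range <- df2$ID == df1$ID[i] &
    df2$date >= df1$DateStart[i] &
    df2$date <= df1$DateEnd[i]
  sum(df2[[column]][in_range])
}

rows <- seq_len(nrow(df1))
df1$view <- vapply(rows, sum_in_interval, numeric(1), column = "view")
df1$read <- vapply(rows, sum_in_interval, numeric(1), column = "read")
df1
```

For much larger data a join-based approach would probably scale better, but for a handful of intervals the row-wise loop keeps the logic easy to follow.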
I've got thousands of location points (latitude and longitude) with timestamps (YYYY-MM-DD HH:MM:SS), and I need the sunrise and sunset times for each position.
I tried doing this with the R package "suncalc", but the examples in the vignette accompanying the package are not practical real-world examples and give no obvious solution for my specific need.
First I tried the following code, which works great for just one date and one location:
> getSunlightTimes(date = date("2019-05-12"), lat = 24, lon = 28, keep = c("sunrise", "sunset"), tz = "CET")
date lat lon sunrise sunset
1 2019-05-12 24 28 2019-05-12 05:28:29 2019-05-12 18:42:55
Then I tried to run it with a few more dates and coordinates:
data <- data.frame(date = c("2019-05-12", "2019-05-13", "2019-05-14"),
lat = c(-24, -25, -26),
lon = c(28, 29, 20))
getSunlightTimes(data = data,
keep = c("sunrise", "sunset"), tz = "CET")
I would expect to get a result with the sunrise and sunset times for each of the three locations (e.g. one result for -24, 28 on 2019-05-12, another for -25, 29 on 2019-05-13, etc.). Instead I get:
Error in getSunlightTimes(data = data, keep = c("sunrise", "sunset"), : date must to be a Date object (class Date)
Anyone?
You need to use as.Date so that the date column is of class Date rather than character:
data <- data.frame(date = as.Date(c("2019-05-12", "2019-05-13", "2019-05-14")),
lat = c(-24, -25, -26), lon = c(28, 29, 20))
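The difference between the two data frames can be checked in base R before calling suncalc at all. A small sketch; the names data_chr and data_ok are made up for illustration, and the final call is left commented out because it needs the suncalc package installed:

```r
# Dates given as plain strings: the column is not of class Date
data_chr <- data.frame(date = c("2019-05-12", "2019-05-13", "2019-05-14"),
                       lat = c(-24, -25, -26), lon = c(28, 29, 20))
inherits(data_chr$date, "Date")   # FALSE

# Dates converted with as.Date: the column has class Date
data_ok <- data.frame(date = as.Date(c("2019-05-12", "2019-05-13", "2019-05-14")),
                      lat = c(-24, -25, -26), lon = c(28, 29, 20))
inherits(data_ok$date, "Date")    # TRUE

# With suncalc installed, the call from the question should then work:
# suncalc::getSunlightTimes(data = data_ok,
#                           keep = c("sunrise", "sunset"), tz = "CET")
```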
The output of a time-series looks like a data frame:
ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
Jan Feb Mar Apr May Jun Jul ...
1981 14.064085 21.664250 14.800249 -5.773095 16.477470 1.129674 16.747669 ...
1982 23.973620 17.851890 21.387944 28.451552 24.177141 25.212271 19.123179 ...
1983 19.801210 11.523906 8.103132 9.382778 4.614325 21.751529 9.540851 ...
1984 15.394517 21.021790 23.115453 12.685093 -2.209352 28.318686 10.159940 ...
1985 20.708447 13.095117 32.815273 9.393895 19.551045 24.847337 18.703991 ...
It would be handy to transform it into a data frame with columns Jan, Feb, Mar... and rows 1981, 1982, ... and then back. What's the most elegant way to do this?
Here are two ways. The first creates dimnames for the matrix about to be created, then strings the data out into a matrix, transposes it and converts it to a data frame. The second creates a list of year and month grouping variables and uses tapply on that, later converting to a data frame and adding names.
# create test data
set.seed(123)
tt <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
1) matrix. This solution requires that we have whole consecutive years:
dmn <- list(month.abb, unique(floor(time(tt))))
as.data.frame(t(matrix(tt, 12, dimnames = dmn)))
If we don't care about the nice names, it is just as.data.frame(t(matrix(tt, 12))).
We could replace the dmn <- line with the following simpler line, using @thelatemail's comment:
dmn <- dimnames(.preformat.ts(tt))
2) tapply. A more general solution using tapply is the following:
Month <- factor(cycle(tt), levels = 1:12, labels = month.abb)
tapply(tt, list(year = floor(time(tt)), month = Month), c)
Note: To invert this, suppose X is any of the solutions above. Then try:
ts(c(t(X)), start = 1981, freq = 12)
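To see that the round trip really gives back the original series, the matrix solution and the inversion can be combined. A minimal sketch using the test data from above; tt2 is just an illustrative name:

```r
# Test data, as above
set.seed(123)
tt <- ts(rnorm(12 * 5, 17, 8), start = c(1981, 1), frequency = 12)

# ts -> data frame with one row per year and one column per month
dmn <- list(month.abb, unique(floor(time(tt))))
X <- as.data.frame(t(matrix(tt, 12, dimnames = dmn)))
dim(X)   # 5 rows (years), 12 columns (months)

# data frame -> ts: transpose back so the values run month by month
tt2 <- ts(c(t(X)), start = 1981, frequency = 12)

# Compare the original and the rebuilt series
all.equal(tt, tt2)
```

The transpose matters: c() reads a matrix column by column, so t(X) puts the twelve months of each year back in consecutive order.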
Update
Improvement motivated by comments of @latemail below.
Example with the AirPassengers dataset:
Make the data available and check its type:
data(AirPassengers)
class(AirPassengers)
Convert Time-Series into a data frame:
df <- data.frame(AirPassengers, year = trunc(time(AirPassengers)),
month = month.abb[cycle(AirPassengers)])
Redo the creation of the Time-Series object:
tsData = ts(df$AirPassengers, start = c(1949,1), end = c(1960,12), frequency = 12)
Plot the results to ensure correct execution:
components.ts = decompose(tsData)
plot(components.ts)
Try the package "tsbox":
library(tsbox)
ts = ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)
df = ts_df(ts)
str(df)
'data.frame': 60 obs. of 2 variables:
$ time : Date, format: "1981-01-01" "1981-02-01"
$ value: num 23.15 22.77 5.1 1.05 13.87
I have found many answers regarding the week number of a particular date. What I want is a week number that runs over 2 years, i.e. for the first year it gives weeks 1 to 53, and for the second year it keeps counting on from 53 instead of starting at 1 again. Is this possible in R? Example data is shown below:
We can use rep to add 53 to the second year's values in the vector ('vN2'), after finding the number of observations for each year.
vN2 + rep(c(0, 53), tapply(vN2, cumsum(c(TRUE, diff(vN2) < 0)), FUN = length))
data
set.seed(24)
vN <- rep(1:53, sample(1:5, 53, replace=TRUE))
vN1 <- rep(1:53, sample(1:6, 53, replace=TRUE))
vN2 <- c(vN, vN1)
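The idea is easier to follow on a small hand-made vector. The sketch below uses made-up week numbers and compares the rep() expression above with an equivalent formulation that would also cover more than two years; the names weeks, year_id, running and running_rep are illustrative only.

```r
# Toy week numbers: the end of one year followed by the start of the next
weeks <- c(51, 52, 52, 53, 1, 1, 2, 3)

# A drop in the week number marks the start of a new year
year_id <- cumsum(c(TRUE, diff(weeks) < 0))   # 1 1 1 1 2 2 2 2

# Add 53 for every completed year
running <- weeks + 53 * (year_id - 1)
running   # 51 52 52 53 54 54 55 56

# The rep() version from above gives the same result for two years
running_rep <- weeks + rep(c(0, 53), tapply(weeks, year_id, FUN = length))
```

Both versions assume that every year is counted as 53 weeks and that the week numbers within a year never decrease.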