Divide observation by period mean. Help to simplify code - r

Link to data:
http://dl.dropbox.com/u/56075871/data.txt
I want to divide each observation by the mean for that hour. Example:
2012-01-02 10:00:00 5.23
2012-01-03 10:00:00 5.28
2012-01-04 10:00:00 5.29
2012-01-05 10:00:00 5.29
2012-01-09 10:00:00 5.28
2012-01-10 10:00:00 5.33
2012-01-11 10:00:00 5.42
2012-01-12 10:00:00 5.55
2012-01-13 10:00:00 5.68
2012-01-16 10:00:00 5.53
The mean of these values is 5.388. Next I want to divide each observation by that mean: 5.23/5.388, 5.28/5.388, ... up to the last one, 5.53/5.388.
I have hourly timeseries for 10 stocks:
S1.1h S2.1h S3.1h S4.1h S5.1h S6.1h S7.1h S8.1h S9.1h S10.1h
2012-01-02 10:00:00 64.00 110.7 5.23 142.0 20.75 34.12 32.53 311.9 7.82 5.31
2012-01-02 11:00:00 64.00 110.8 5.30 143.2 20.90 34.27 32.81 312.0 7.97 5.34
2012-01-02 12:00:00 64.00 111.1 5.30 142.8 20.90 34.28 32.70 312.4 7.98 5.33
2012-01-02 13:00:00 61.45 114.7 5.30 143.1 21.01 34.35 32.85 313.0 7.96 5.35
2012-01-02 14:00:00 61.45 116.2 5.26 143.7 21.10 34.60 32.99 312.9 7.95 5.36
2012-01-02 15:00:00 63.95 116.2 5.26 143.2 21.26 34.72 33.00 312.6 7.99 5.37
2012-01-02 16:00:00 63.95 117.3 5.25 143.3 21.27 35.08 33.04 312.7 7.99 5.36
2012-01-02 17:00:00 63.95 117.8 5.24 144.7 21.25 35.40 33.10 313.6 7.99 5.40
2012-01-02 18:00:00 63.95 117.9 5.23 145.0 21.20 35.50 33.17 312.5 7.98 5.35
2012-01-03 10:00:00 63.95 115.5 5.28 143.5 21.15 35.31 33.05 311.7 7.94 5.37
...
I want to divide each observation by the mean for its hour of the day (a periodic mean).
I have some code. Code to make means:
#10:00:00, 11:00:00, ... 18:00:00
times <- paste(seq(10, 18), ":00:00", sep="")
#means - matrix of means for timeseries and hour
means <- matrix(ncol = ncol(time_series), nrow = length(times))
for (t in 1:length(times)) {
  #t is time 10 to 18
  for (i in 1:ncol(time_series)) {
    #i is stock 1 to 10
    # hour mean for each observation in data
    means[t,i] <- mean(time_series[grep(times[t], index(time_series)), i])
  }
}
And my function to get "things done":
for (t in 1:length(times)) {
  # get all dates with times[t] hour
  hours <- time_series[grep(times[t], index(time_series))]
  ep <- endpoints(hours, "hours")
  out <- rbind(out, period.apply(hours, INDEX=ep, FUN=function(x) {
    x/means[t,]
  }))
}
I know this is awful, but it works. How can I simplify the code?

Here's one way to do it:
# Split the xts object into chunks by hour
# .indexhour() returns the hourly portion for each timestamp
s <- split(time_series, .indexhour(time_series))
# Use sweep to divide each value of x by colMeans(x) for each group of hours
l <- lapply(s, function(x) sweep(x, 2, colMeans(x), FUN="/"))
# rbind everything back together
r <- do.call(rbind, l)

The scale function can do that. Used with ave, you could restrict the calculations to within each hour. Post the results of dput on that xts/zoo object and you will get rapid replies.
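To illustrate the ave idea, here is a minimal base R sketch on a plain data frame rather than an xts object. The data frame layout, the column names, and the 11:00 values on 2012-01-03 are assumptions made up for the example; the other values are taken from the S3.1h and S10.1h columns in the question.

```r
# Toy data frame standing in for the xts object (layout is an assumption)
df <- data.frame(
  time = as.POSIXct(c("2012-01-02 10:00:00", "2012-01-02 11:00:00",
                      "2012-01-03 10:00:00", "2012-01-03 11:00:00"),
                    tz = "UTC"),
  S3  = c(5.23, 5.30, 5.28, 5.40),   # last value is hypothetical
  S10 = c(5.31, 5.34, 5.37, 5.38)    # last value is hypothetical
)

# Hour of day is the grouping key
df$hour <- format(df$time, "%H")

# Divide every column by its own mean within each hour
stocks <- c("S3", "S10")
rel <- lapply(df[stocks], function(col) {
  ave(col, df$hour, FUN = function(v) v / mean(v))
})
df[paste0(stocks, "_rel")] <- rel
```

After this, the values in each `_rel` column average to 1 within every hour, which is a quick way to check the result.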

Related

Mean over a certain time-period in R

I have hourly data of CO2 values and I would like to know what is the CO2 concentration during the night (e.g. 9pm-7am). A reproducible example:
library(tidyverse); library(lubridate)
times <- seq(ymd_hms("2020-01-01 08:00:00"),
ymd_hms("2020-01-04 08:00:00"), by = "1 hours")
values <- runif(length(times), 1, 15)
df <- tibble(times, values)
How do I get mean nighttime values (e.g. between 9pm and 7am)? Of course I can filter like this:
df <- df %>%
filter(!hour(times) %in% c(8:20))
And then give id to each observation during the night
df$ID <- rep(LETTERS[1:round(nrow(df)/11)],
times = 1, each = 11)
And finally group and summarise
df_grouped <- df %>%
group_by(., ID) %>%
summarise(value_mean =mean(values))
But this is not a good way I am sure. How to do this better? Especially the part where we give ID to the nighttime values
You can use data.table::frollmean to get the mean over a rolling window. In your case you want the mean over the last 10 hours, so we set the n argument of the function to 10:
> df$means <- data.table::frollmean(df$values, 10)
> df
> head(df, 20)
# A tibble: 20 x 3
times values means
<dttm> <dbl> <dbl>
1 2020-01-01 08:00:00 4.15 NA
2 2020-01-01 09:00:00 6.24 NA
3 2020-01-01 10:00:00 5.17 NA
4 2020-01-01 11:00:00 9.20 NA
5 2020-01-01 12:00:00 12.3 NA
6 2020-01-01 13:00:00 2.93 NA
7 2020-01-01 14:00:00 9.12 NA
8 2020-01-01 15:00:00 9.72 NA
9 2020-01-01 16:00:00 12.0 NA
10 2020-01-01 17:00:00 13.4 8.41
11 2020-01-01 18:00:00 10.2 9.01
12 2020-01-01 19:00:00 1.97 8.59
13 2020-01-01 20:00:00 11.9 9.26
14 2020-01-01 21:00:00 8.84 9.23
15 2020-01-01 22:00:00 10.1 9.01
16 2020-01-01 23:00:00 3.76 9.09
17 2020-01-02 00:00:00 9.98 9.18
18 2020-01-02 01:00:00 5.56 8.76
19 2020-01-02 02:00:00 5.22 8.09
20 2020-01-02 03:00:00 6.36 7.39
Each row of the means column is the mean of that row's value and the 9 preceding values. The first 9 rows are NA because their windows are incomplete.
You may also want to take a look at the tsibble package, which is built for manipulating time series.
You can derive the window length from the times you want, but the observations need to be evenly spaced in your data for this solution to work:
n <- diff(which(grepl('20:00:00|08:00:00', df$times))) + 1
n <- unique(n)
df$means <- data.table::frollmean(df$values, n)
> head(df, 20)
# A tibble: 20 x 3
times values means
<dttm> <dbl> <dbl>
1 2020-01-01 08:00:00 11.4 NA
2 2020-01-01 09:00:00 7.03 NA
3 2020-01-01 10:00:00 7.15 NA
4 2020-01-01 11:00:00 6.91 NA
5 2020-01-01 12:00:00 8.18 NA
6 2020-01-01 13:00:00 4.70 NA
7 2020-01-01 14:00:00 13.8 NA
8 2020-01-01 15:00:00 5.16 NA
9 2020-01-01 16:00:00 12.3 NA
10 2020-01-01 17:00:00 3.81 NA
11 2020-01-01 18:00:00 3.09 NA
12 2020-01-01 19:00:00 9.89 NA
13 2020-01-01 20:00:00 1.24 7.28
14 2020-01-01 21:00:00 8.07 7.02
15 2020-01-01 22:00:00 5.59 6.91
16 2020-01-01 23:00:00 5.77 6.81
17 2020-01-02 00:00:00 10.7 7.10
18 2020-01-02 01:00:00 3.44 6.73
19 2020-01-02 02:00:00 10.3 7.16
20 2020-01-02 03:00:00 4.61 6.45
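If what you need is one mean per night rather than a rolling mean, a possible alternative is to derive the night ID from the timestamp itself. This is a base R sketch, not part of the answer above: it assumes the night runs from 21:00 to 07:00 inclusive (the same hours the filter in the question keeps) and shifts every timestamp back by 8 hours so that all hours of one night fall on the same calendar date. The name night_id and the fixed seed are illustrative choices.

```r
# Same shape of data as in the question, built with base R only
times <- seq(as.POSIXct("2020-01-01 08:00:00", tz = "UTC"),
             as.POSIXct("2020-01-04 08:00:00", tz = "UTC"),
             by = "1 hour")
set.seed(1)  # only to make the example repeatable
values <- runif(length(times), 1, 15)
df <- data.frame(times, values)

# Keep the night hours: 21, 22, 23, 0, ..., 7
hr <- as.integer(format(df$times, "%H"))
night <- df[hr >= 21 | hr <= 7, ]

# Shift back 8 hours so 21:00-07:00 maps onto a single date,
# then use that date as the night's ID
night$night_id <- as.Date(night$times - 8 * 3600, tz = "UTC")

# One mean per night
night_means <- aggregate(values ~ night_id, data = night, FUN = mean)
```

Because the ID comes from the timestamps, this does not depend on every night having exactly 11 rows, so missing observations do not shift the groups.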

how to convert irregular timestamps into date format

I have the following time series data:
Date duration Volume
1 1-Jul 12am-2am 0.80
2 1-Jul 2am-4am 0.80
3 1-Jul 4am-6am 0.80
4 1-Jul 6am-8am 5.00
5 1-Jul 8am-10am 14.00
6 1-Jul 10am-12pm 3.40
7 1-Jul 12pm-2pm 0.80
8 1-Jul 2pm-4pm 0.80
9 1-Jul 4pm-6pm 2.40
10 1-Jul 6pm-8pm 12.00
11 1-Jul 8pm-10pm 14.00
12 1-Jul 10pm-12am 3.40
13 1-Jul 12am-2am 0.60
14 1-Jul 2am-4am 0.60
15 1-Jul 4am-6am 0.60
16 1-Jul 6am-8am 5.50
17 1-Jul 8am-10am 14.00
18 1-Jul 10am-12pm 4.00
19 1-Jul 12pm-2pm 0.80
20 1-Jul 2pm-4pm 0.65
21 1-Jul 4pm-6pm 6.30
22 1-Jul 6pm-8pm 19.50
23 1-Jul 8pm-10pm 19.45
24 1-Jul 10pm-12am 9.00
I would like to convert the 'Date' and 'duration' columns into R date format. Also, is it possible to combine these two columns into a single 'date_time' column to make it easy to forecast using auto.arima()?
Thanks
Wanted to challenge myself and stumbled upon this question, this is my solution to it.
#first lets create the sample data
date <- c("1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","1-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul","2-Jul")
duration <- c("12am-2am","2am-4am","4am-6am","6am-8am","8am-10am","10am-12pm","12pm-2pm","2pm-4pm","4pm-6pm","6pm-8pm","8pm-10pm","10pm-12am","12am-2am","2am-4am","4am-6am","6am-8am","8am-10am","10am-12pm","12pm-2pm","2pm-4pm","4pm-6pm","6pm-8pm","8pm-10pm","10pm-12am")
volume <- c("0.80","0.80","0.80","5.00","14.00","3.40","0.80","0.80","2.40","12.00","14.00","3.40","0.60","0.60","0.60","5.50","14.00","4.00","0.80","0.65","6.30","19.50","19.45","9.00")
df <- data.frame(date,duration,volume, stringsAsFactors = F)
bla <- t(as.data.frame(sapply(df$duration, strsplit, "-")))
rownames(bla) <- c(1:nrow(bla))
default_year <- "2020"
#separate the timestamps into start and end times
df <- cbind(df,bla)
#add current year to the date and make it a as.Date
df$date <- as.Date(paste0(default_year,"-",df$date),format='%Y-%d-%b')
#convert "am" and "pm" to 24h mode
df$`1` <- gsub("12am","00:00",df$`1`)
df$`1` <- gsub("am",":00",df$`1`)
df$`1`[grep("pm",df$`1`)] <- paste0(ifelse(as.numeric(gsub("pm","",df$`1`[grep("pm",df$`1`)]))==12,12,as.numeric(gsub("pm","",df$`1`[grep("pm",df$`1`)]))+12),":00")
df$`2` <- gsub("12am","00:00",df$`2`)
df$`2` <- gsub("am",":00",df$`2`)
df$`2`[grep("pm",df$`2`)] <- paste0(ifelse(as.numeric(gsub("pm","",df$`2`[grep("pm",df$`2`)]))==12,12,as.numeric(gsub("pm","",df$`2`[grep("pm",df$`2`)]))+12),":00")
#paste date and time vectors together
df$t_start <- paste0(df$date,"-",df$`1`)
df$t_end <- paste0(df$date,"-",df$`2`)
#make them posix (df$date prints as year-month-day, so the format is %Y-%m-%d)
df$t_start <- as.POSIXct(df$t_start, format='%Y-%m-%d-%H:%M')
df$t_end <- as.POSIXct(df$t_end, format='%Y-%m-%d-%H:%M')
#an interval that ends at 12am ends on the following day
rollover <- df$t_end <= df$t_start
df$t_end[rollover] <- df$t_end[rollover] + 24*60*60
#end one second before the next interval starts
df$t_end <- df$t_end - 1
#save it
new_df <- data.frame(df$t_start,df$t_end,df$volume)
new_df
df.t_start df.t_end df.volume
1 2020-07-01 00:00:00 2020-07-01 01:59:59 0.80
2 2020-07-01 02:00:00 2020-07-01 03:59:59 0.80
3 2020-07-01 04:00:00 2020-07-01 05:59:59 0.80
4 2020-07-01 06:00:00 2020-07-01 07:59:59 5.00
5 2020-07-01 08:00:00 2020-07-01 09:59:59 14.00
6 2020-07-01 10:00:00 2020-07-01 11:59:59 3.40
7 2020-07-01 12:00:00 2020-07-01 13:59:59 0.80
8 2020-07-01 14:00:00 2020-07-01 15:59:59 0.80
9 2020-07-01 16:00:00 2020-07-01 17:59:59 2.40
10 2020-07-01 18:00:00 2020-07-01 19:59:59 12.00
11 2020-07-01 20:00:00 2020-07-01 21:59:59 14.00
12 2020-07-01 22:00:00 2020-07-01 23:59:59 3.40
13 2020-07-02 00:00:00 2020-07-02 01:59:59 0.60
14 2020-07-02 02:00:00 2020-07-02 03:59:59 0.60
15 2020-07-02 04:00:00 2020-07-02 05:59:59 0.60
16 2020-07-02 06:00:00 2020-07-02 07:59:59 5.50
17 2020-07-02 08:00:00 2020-07-02 09:59:59 14.00
18 2020-07-02 10:00:00 2020-07-02 11:59:59 4.00
19 2020-07-02 12:00:00 2020-07-02 13:59:59 0.80
20 2020-07-02 14:00:00 2020-07-02 15:59:59 0.65
21 2020-07-02 16:00:00 2020-07-02 17:59:59 6.30
22 2020-07-02 18:00:00 2020-07-02 19:59:59 19.50
23 2020-07-02 20:00:00 2020-07-02 21:59:59 19.45
24 2020-07-02 22:00:00 2020-07-02 23:59:59 9.00
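A shorter route may be to let the date parser handle the am/pm conversion through the %I and %p format codes. This is only a sketch under a few assumptions: the year 2020 is a placeholder as in the answer above, the three rows are a small subset of the sample data, and parsing of %b and %p depends on the locale (it should work in English or C locales, but that is worth checking on your machine).

```r
# Small subset of the sample data
date     <- c("1-Jul", "1-Jul", "2-Jul")
duration <- c("12am-2am", "10am-12pm", "10pm-12am")
year     <- "2020"  # assumed, the source data has no year

# Split "start-end" into two character vectors
parts     <- strsplit(duration, "-", fixed = TRUE)
start_txt <- vapply(parts, `[`, character(1), 1)
end_txt   <- vapply(parts, `[`, character(1), 2)

# %I is the 12-hour clock hour, %p the am/pm marker
fmt     <- "%Y %d-%b %I%p"
t_start <- as.POSIXct(paste(year, date, start_txt), format = fmt, tz = "UTC")
t_end   <- as.POSIXct(paste(year, date, end_txt),   format = fmt, tz = "UTC")

# An interval ending at 12am ends on the next day
wraps <- t_end <= t_start
t_end[wraps] <- t_end[wraps] + 24 * 3600

date_time <- data.frame(t_start, t_end)
```

For forecasting, t_start alone can serve as the single 'date_time' column, since the intervals are evenly spaced at two hours.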

How to index one minute intraday data in xts?

I have worked with daily stock data using quantmod. Quantmod automatically downloads data from the Google/Yahoo finance sites and converts it to an xts object with the date as the index.
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
2014-10-01 100.59 100.69 98.70 99.18 51491300 97.09741
2014-10-02 99.27 100.22 98.04 99.90 47757800 97.80230
2014-10-03 99.44 100.21 99.04 99.62 43469600 97.52818
2014-10-06 99.95 100.65 99.42 99.62 37051200 97.52818
2014-10-07 99.43 100.12 98.73 98.75 42094200 96.67644
2014-10-08 98.76 101.11 98.31 100.80 57404700 98.68340
2014-10-09 101.54 102.38 100.61 101.02 77376500 98.89877
Now I am working with one-minute intraday data (CSV format), which I converted to a data frame (df) with six columns.
Date Time Open High Low Close
1 20150408 09:17:00 7.15 7.15 7.10 7.10
2 20150408 09:18:00 7.15 7.15 7.15 7.15
3 20150408 09:19:00 7.10 7.10 7.10 7.10
4 20150408 09:20:00 7.10 7.10 7.05 7.10
5 20150408 09:21:00 7.10 7.15 7.10 7.10
6 20150408 09:22:00 7.10 7.10 7.05 7.10
How can I convert this data frame to a time series so that I can use it with the default quantmod functions such as Cl(), Op(), OHLC(), etc.?
Elementary, dear Watson: combine date and time into a POSIXct, use that.
Untested as you supplied no reproducible data:
pt <- as.POSIXct(paste(X$Date, X$Time), format="%Y%m%d %H:%M:%S")
N <- xts(X[, -(1:2)], order.by=pt)
Here X is your current data.frame, and N is a new xts object formed from the data of X (minus date and time) using pt as the index.
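The index-building step can be tried on its own in base R before involving xts. The sketch below uses three rows copied from the question; the integer type of Date and the explicit tz = "UTC" are assumptions (a CSV reader will typically read 20150408 as a number, and you may want your exchange's time zone instead).

```r
# Three rows from the question; Date is assumed to have been read as integer
X <- data.frame(
  Date  = c(20150408L, 20150408L, 20150408L),
  Time  = c("09:17:00", "09:18:00", "09:19:00"),
  Close = c(7.10, 7.15, 7.10),
  stringsAsFactors = FALSE
)

# paste() coerces the integer date to text, so the same format string applies
pt <- as.POSIXct(paste(X$Date, X$Time), format = "%Y%m%d %H:%M:%S", tz = "UTC")

# A failed parse yields NA, and a time index cannot contain NA
stopifnot(!anyNA(pt))

# Spacing between consecutive bars, in seconds
step <- as.numeric(diff(pt), units = "secs")
```

Checking for NA and for the expected 60-second spacing before calling xts() makes format mismatches easier to spot.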

Compute column average based on date and time in R

I have a matrix, which looks a bit like this:
Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492
It contains a year's worth of data, with each day having 96 rows (15-minute intervals from 00:00 to 23:45). I'd like to average the Data column for each day over a time range that I specify. For example, if I wanted to average over 06:00 - 08:00 for each day, then for the rows above I should get 1.0964 for the date 04/09/2014.
I have no idea how to do this using the date and time columns as filters, and wondered if someone could help?
To make things even more complicated, I would also like to compute 45 minute rolling averages for each day, within a different time period, say 04:00 - 09:00. Again, as this is for each day, it would be good to get the result in a matrix for which each row is a certain date, then the columns would represent the rolling averages from say, 04:00 - 04:45, 04:15 - 05:00...
Any ideas?!
Check the following code and let me know if anything is unclear.
data = read.table(header = T, stringsAsFactors = F, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492")
library("magrittr")
data$parsed.timestamp = paste(data$Date, data$Time) %>% strptime(., format = "%d/%m/%Y %H:%M:%S")
# Hourly Average
desiredGroupingUnit = cut(data$parsed.timestamp, breaks = "hour") #You can use substr for that also
aggregate(data$Data, by = list(desiredGroupingUnit), FUN = mean )
# Group.1 x
# 1 2014-09-04 05:00:00 0.908
# 2 2014-09-04 06:00:00 1.286
# 3 2014-09-04 07:00:00 0.951
# 4 2014-09-04 08:00:00 0.937
# 5 2014-09-04 09:00:00 1.113
# 6 2014-09-04 10:00:00 0.492
# Moving average
getAvgBetweenTwoTimeStamps = function(data, startTime, endTime) {
avergeThoseIndcies = which(data$parsed.timestamp >= startTime & data$parsed.timestamp <= endTime)
return(mean(data$Data[avergeThoseIndcies]))
}
movingAvgWindow = 45*60 # 45 minutes, expressed in seconds
movingAvgTimestamps = data.frame(from = data$parsed.timestamp, to = data$parsed.timestamp + movingAvgWindow)
movingAvgTimestamps$movingAvg =
apply(movingAvgTimestamps, MARGIN = 1,
FUN = function(x) getAvgBetweenTwoTimeStamps(data = data, startTime = x["from"], endTime = x["to"]))
print(movingAvgTimestamps)
# from to movingAvg
# 1 2014-09-04 05:45:00 2014-09-04 06:30:00 1.1010000
# 2 2014-09-04 06:00:00 2014-09-04 06:45:00 1.2860000
# 3 2014-09-04 06:15:00 2014-09-04 07:00:00 1.3550000
# 4 2014-09-04 06:30:00 2014-09-04 07:15:00 1.2530000
# 5 2014-09-04 06:45:00 2014-09-04 07:30:00 1.1000000
# 6 2014-09-04 07:00:00 2014-09-04 07:45:00 0.9510000
# 7 2014-09-04 07:15:00 2014-09-04 08:00:00 0.8900000
# 8 2014-09-04 07:30:00 2014-09-04 08:15:00 0.9120000
# 9 2014-09-04 07:45:00 2014-09-04 08:30:00 0.9500000
# 10 2014-09-04 08:00:00 2014-09-04 08:45:00 0.9370000
# 11 2014-09-04 08:15:00 2014-09-04 09:00:00 1.0620000
# 12 2014-09-04 08:30:00 2014-09-04 09:15:00 1.1320000
# 13 2014-09-04 08:45:00 2014-09-04 09:30:00 1.0880000
# 14 2014-09-04 09:00:00 2014-09-04 09:45:00 1.1130000
# 15 2014-09-04 09:15:00 2014-09-04 10:00:00 0.8810000
# 16 2014-09-04 09:30:00 2014-09-04 10:15:00 0.8626667
# 17 2014-09-04 09:45:00 2014-09-04 10:30:00 0.7960000
# 18 2014-09-04 10:00:00 2014-09-04 10:45:00 0.4920000
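For the first part of the question (one average per day over a chosen time range), a base R sketch could look like the following. It relies on the Time column being zero-padded HH:MM:SS text, so that plain string comparison orders the times correctly; the window bounds and the name daily_avg are illustrative, and only a subset of the sample rows is used.

```r
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656")

# Time-of-day window, both ends inclusive
from <- "06:00:00"
to   <- "08:00:00"
in_window <- data$Time >= from & data$Time <= to

# One mean per date, using only rows inside the window
daily_avg <- aggregate(Data ~ Date, data = data[in_window, ], FUN = mean)
```

With a full year of data the result has one row per date, and for 04/09/2014 the value is about 1.0964, matching the figure in the question.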

Loading a csv file as a ts

Below are monthly prices of a particular stock;
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2008 46.09 50.01 48 48 50.15 43.45 41.05 41.67 36.66 25.02 22.98 22
2009 20.98 15 13.04 14.4 26.46 14.32 14.6 11.83 14 14.4 13.07 13.6
2010 15.31 15.71 18.97 15.43 13.5 13.8 14.21 12.73 12.35 13.17 14.59 15.01
2011 15.3 15.22 15.23 15 15.1 14.66 14.8 12.02 12.41 12.9 11.6 12.18
2012 12.45 13.33 12.4 14.16 13.99 13.75 14.4 15.38 16.3 18.02 17.29 19.49
2013 20.5 20.75 21.3 20.15 22.2 19.8 19.75 19.71 19.99 21.54 21.3 27.4
2014 23.3 20.5 20 22.7 25.4 25.05 25.08 24.6 24.5 21.2 20.52 18.41
2015 16.01 17.6 20.98 21.15 21.44 0 0 0 0 0 0 0
I want to decompose the data into seasonal and trend data but I am not getting a result.
How can I load the data as a "ts" class data so I can decompose it?
Here is a solution using tidyr, which is fairly accessible.
library(dplyr); library(tidyr)
data %>% gather(month, price, -Year) %>% # 1 row per year-month pair, name the value "price"
mutate(synth_date_txt= paste(month,"1,",Year), # combine month and year into a date string
date=as.Date(synth_date_txt,format="%b %d, %Y")) %>% # convert date string to date
select(date, price) # keep just the date and price
# date price
# 1 2008-01-01 46.09
# 2 2009-01-01 20.98
# 3 2010-01-01 15.31
# 4 2011-01-01 15.30
# 5 2012-01-01 12.45
This gives you an answer with date format (even though you didn't specify a date, just a month and year). It should work for your time series analysis, but if you really need a timestamp you can just use as.POSIXct(date)
Mike,
The program is R and below is the code I have tried.
sev=read.csv("X7UPM.csv")
se=ts(sev,start=c(2008, 1), end=c(2015,1), frequency=12)
se
se=se[,1]
S=decompose(se)
plot(se,col=c("blue"))
plot(decompose(se))
S.decom=decompose(se,type="mult")
plot(S.decom)
trend=S.decom$trend
trend
seasonal=S.decom$seasonal
seasonal
ts.plot(cbind(trend,trend*seasonal),lty=1:2)
plot(stl(se,"periodic"))
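One likely reason the attempt above gives no usable result is that ts() is applied to the whole wide table, so the Year column is treated as the series. A possible fix, sketched here in base R, is to flatten the twelve month columns row by row into one vector first. The sketch assumes the trailing zeros in 2015 are placeholders for months not yet observed, so the series is cut off at May 2015; the name wide is made up for the example.

```r
wide <- read.table(header = TRUE, text = "Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2008 46.09 50.01 48 48 50.15 43.45 41.05 41.67 36.66 25.02 22.98 22
2009 20.98 15 13.04 14.4 26.46 14.32 14.6 11.83 14 14.4 13.07 13.6
2010 15.31 15.71 18.97 15.43 13.5 13.8 14.21 12.73 12.35 13.17 14.59 15.01
2011 15.3 15.22 15.23 15 15.1 14.66 14.8 12.02 12.41 12.9 11.6 12.18
2012 12.45 13.33 12.4 14.16 13.99 13.75 14.4 15.38 16.3 18.02 17.29 19.49
2013 20.5 20.75 21.3 20.15 22.2 19.8 19.75 19.71 19.99 21.54 21.3 27.4
2014 23.3 20.5 20 22.7 25.4 25.05 25.08 24.6 24.5 21.2 20.52 18.41
2015 16.01 17.6 20.98 21.15 21.44 0 0 0 0 0 0 0")

# Drop the Year column and read the matrix row by row:
# t() puts each year in a column, as.vector() then reads column-wise
prices <- as.vector(t(as.matrix(wide[, -1])))

# Monthly series starting January 2008
se <- ts(prices, start = c(2008, 1), frequency = 12)

# Assumption: the zeros after May 2015 are missing months, not real prices
se <- window(se, end = c(2015, 5))

# Multiplicative decomposition into trend, seasonal and random parts
dec <- decompose(se, type = "multiplicative")
```

From here, plot(dec) and dec$trend or dec$seasonal behave as in the code above, since se is now a univariate monthly ts object.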

Resources