The smallest date of `v` which makes the difference `w-v` positive

From these vectors of dates
v<-c("2019-12-06 01:32:30 UTC","2019-12-31 18:44:31 UTC","2020-01-29 22:18:25 UTC","2020-03-22 16:44:29 UTC")
v<-as.POSIXct(v)
w<-c("2019-12-07 00:11:46","2020-01-01 05:29:45","2019-12-08 02:54:10","2020-03-23 07:48:26","2020-02-02 16:58:16","2020-01-31 06:46:46")
w<-as.POSIXct(w)
I would like to obtain a data frame with two columns. One column is just w; the other is built from entries of v so that each row contains the date of v that yields the smallest positive difference w-v. For example, the difference
w-rep(v[1],length(w))
Time differences in hours
[1] 22.65444 627.95417 49.36111 2598.26556 1407.42944 1349.23778
So if the second column of the desired dataframe is w, the first one has in its first row the date 2019-12-06 01:32:30 UTC. The operation should be:
date <- w-rep(v[1],length(w))
v[date==min(date[date>0])]
Then the first row of the dataframe should be
2019-12-06 01:32:30 UTC, 2019-12-07 00:11:46
How could I build the other rows without using loops?

How about this:
o <- outer(w, v, `-`)
o
# Time differences in hours
# [,1] [,2] [,3] [,4]
# [1,] 22.65444 -594.54583 -1294.11083 -2559.54528
# [2,] 627.95417 10.75389 -688.81111 -1954.24556
# [3,] 49.36111 -567.83917 -1267.40417 -2532.83861
# [4,] 2597.26556 1980.06528 1280.50028 15.06583
# [5,] 1407.42944 790.22917 90.66417 -1174.77028
# [6,] 1349.23778 732.03750 32.47250 -1232.96194
We don't want negative values, so
o[o < 0] <- NA
o
# Time differences in hours
# [,1] [,2] [,3] [,4]
# [1,] 22.65444 NA NA NA
# [2,] 627.95417 10.75389 NA NA
# [3,] 49.36111 NA NA NA
# [4,] 2597.26556 1980.06528 1280.50028 15.06583
# [5,] 1407.42944 790.22917 90.66417 NA
# [6,] 1349.23778 732.03750 32.47250 NA
Now just apply which.min on each row, then subset v on this value:
apply(o, 1, which.min)
# [1] 1 2 1 4 3 3
v[apply(o, 1, which.min)]
# [1] "2019-12-06 01:32:30 EST" "2019-12-31 18:44:31 EST" "2019-12-06 01:32:30 EST" "2020-03-22 16:44:29 EDT"
# [5] "2020-01-29 22:18:25 EST" "2020-01-29 22:18:25 EST"
data.frame(w=w, v2=v[apply(o, 1, which.min)])
# w v2
# 1 2019-12-07 00:11:46 2019-12-06 01:32:30
# 2 2020-01-01 05:29:45 2019-12-31 18:44:31
# 3 2019-12-08 02:54:10 2019-12-06 01:32:30
# 4 2020-03-23 07:48:26 2020-03-22 16:44:29
# 5 2020-02-02 16:58:16 2020-01-29 22:18:25
# 6 2020-01-31 06:46:46 2020-01-29 22:18:25
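Since v happens to be sorted in ascending order, the same result can be had without building a length(w)-by-length(v) matrix: `findInterval()` returns, for each w, the index of the last v not exceeding it. A matrix-free sketch, assuming v is sorted and there are no exact ties:

```r
# v sorted ascending; findInterval(w, v) gives, for each w,
# the index of the largest v that is <= w (0 if none precedes it)
v <- as.POSIXct(c("2019-12-06 01:32:30", "2019-12-31 18:44:31",
                  "2020-01-29 22:18:25", "2020-03-22 16:44:29"), tz = "UTC")
w <- as.POSIXct(c("2019-12-07 00:11:46", "2020-01-01 05:29:45",
                  "2019-12-08 02:54:10", "2020-03-23 07:48:26",
                  "2020-02-02 16:58:16", "2020-01-31 06:46:46"), tz = "UTC")

idx <- findInterval(w, v)
# idx is 1 2 1 4 3 3, matching the which.min result above
data.frame(w = w, v2 = v[idx])
```

For long vectors this scales much better than outer(), which allocates every pairwise difference.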


Linear interpolation R

I have this data.frame (12x2) called df_1, which represents monthly values:
month df_test
[1,] 1 -1.4408567
[2,] 2 -1.0007642
[3,] 3 2.1454113
[4,] 4 1.6935537
[5,] 5 0.1149219
[6,] 6 -1.3205144
[7,] 7 1.0277486
[8,] 8 1.0323482
[9,] 9 -0.1442319
[10,] 10 -0.2091197
[11,] 11 -0.6803158
[12,] 12 0.5965196
and this data.frame (8760x2) called df_2, where each row represents a value associated with a one-hour interval of a day. This data.frame contains hourly values for one year:
time df_time
1 2015-01-01 00:00:00 -0.4035650
2 2015-01-01 01:00:00 0.1800579
3 2015-01-01 02:00:00 -0.3770589
4 2015-01-01 03:00:00 0.2573456
5 2015-01-01 04:00:00 1.2000178
6 2015-01-01 05:00:00 -0.4276127
...........................................
time df_time
8755 2015-12-31 18:00:00 1.3540119
8756 2015-12-31 19:00:00 0.4852843
8757 2015-12-31 20:00:00 -0.9194670
8758 2015-12-31 21:00:00 -1.0751814
8759 2015-12-31 22:00:00 1.0097749
8760 2015-12-31 23:00:00 -0.1032468
I want to obtain a value of df_1 for each hour of each day. The problem is that not all months have the same number of days.
Finally, we should obtain a data.frame called df_3 (8760x2) that has values interpolated between the values of df_1.
Thanks for help!
Here's how it's done with zoo. I'm assuming that each monthly value is associated with a specific datetime stamp (middle of the month, at midnight); you have to decide that. If you want a different datetime stamp, just change the value.
library(zoo)
library(dplyr)
library(tidyr)
df_3 <- df_1 %>%
  mutate(time = paste(2015, month, "15 00:00:00", sep = "-"),
         time = as.POSIXct(strptime(time, "%Y-%m-%d %H:%M:%S"))) %>%
  full_join(df_2) %>%
  arrange(time) %>%
  mutate(df_test = na.approx(df_test, rule = 2))
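The same interpolation can be sketched in base R with `approx()`, under the same assumption that each monthly value is anchored at the 15th at midnight (df_1's values are abbreviated here):

```r
# Monthly values (df_1, abbreviated) anchored mid-month in 2015
month_val <- c(-1.44, -1.00, 2.15, 1.69, 0.11, -1.32,
               1.03, 1.03, -0.14, -0.21, -0.68, 0.60)
anchor <- as.POSIXct(sprintf("2015-%02d-15", 1:12), tz = "UTC")

# One timestamp per hour of 2015 (8760 rows)
hourly <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
              by = "hour", length.out = 8760)

# Linear interpolation; rule = 2 carries the end values past the anchors,
# mirroring na.approx(..., rule = 2)
out  <- approx(x = anchor, y = month_val, xout = hourly, rule = 2)
df_3 <- data.frame(time = hourly, df_test = out$y)
```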

Applying setNames to a list of data frames

I'm running into an issue applying new names to a list of data frames. I'm using quantmod to pull stock data, and then calculating the 7-Day moving average in this example. I can create the new columns within the list of data frames, but when I go to rename them using lapply and setNames it is only returning the newly renamed column and not any of the old data in each data frame.
require(quantmod)
require(zoo)
# Select Symbols
symbols <- c('AAPL','GOOG')
# Set start Date
start_date <- '2017-01-01'
# Get data and put data xts' into a list. Create empty list and then loop through to add all symbol data
stocks <- list()
for (i in 1:length(symbols)) {
  stocks[[i]] <- getSymbols(symbols[i], src = 'google', from = start_date, auto.assign = FALSE)
}
##### Create the 7 day moving average for each stock in the stocks list #####
stocks <- lapply(stocks, function(x) cbind(x, rollmean(x[,4], 7, align = "right")))
Sample Output:
[[1]]
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Close.1
2017-01-03 115.80 116.33 114.76 116.15 28781865 NA
2017-01-04 115.85 116.51 115.75 116.02 21118116 NA
2017-01-05 115.92 116.86 115.81 116.61 22193587 NA
2017-01-06 116.78 118.16 116.47 117.91 31751900 NA
2017-01-09 117.95 119.43 117.94 118.99 33561948 NA
2017-01-10 118.77 119.38 118.30 119.11 24462051 NA
2017-01-11 118.74 119.93 118.60 119.75 27588593 117.7914
2017-01-12 118.90 119.30 118.21 119.25 27086220 118.2343
2017-01-13 119.11 119.62 118.81 119.04 26111948 118.6657
[[2]]
GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Close.1
2017-01-03 778.81 789.63 775.80 786.14 1657268 NA
2017-01-04 788.36 791.34 783.16 786.90 1072958 NA
2017-01-05 786.08 794.48 785.02 794.02 1335167 NA
2017-01-06 795.26 807.90 792.20 806.15 1640170 NA
2017-01-09 806.40 809.97 802.83 806.65 1274645 NA
2017-01-10 807.86 809.13 803.51 804.79 1176780 NA
2017-01-11 805.00 808.15 801.37 807.91 1065936 798.9371
2017-01-12 807.14 807.39 799.17 806.36 1353057 801.8257
2017-01-13 807.48 811.22 806.69 807.88 1099215 804.8229
I would like to change the "AAPL.Close.1" and "GOOG.Close.1" to say "AAPL.Close.7.Day.MA" and "GOOG.Close.7.Day.MA" respectively (for however many symbols that I choose at the top).
The closest that I've gotten is:
stocks <- lapply(stocks[], function(x) setNames(x[,6], paste0(names(x[,4]), ".7.Day.MA")))
This is correctly naming the new columns, but now my stocks list only contains that single column for each ticker:
[[1]]
AAPL.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 117.7914
2017-01-12 118.2343
2017-01-13 118.6657
[[2]]
GOOG.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 798.9371
2017-01-12 801.8257
2017-01-13 804.8229
Why is the setNames function removing the original columns?
Almost there:
N = 10 #number of pseudorandom numbers
df1 <- data.frame(a=runif(N),b=sample(N))#1st data frame
df2 <- data.frame(c=rnorm(N),google=df1$b^2,e=df1$a^3)#2nd data frame
stocks<-list(df1,df2)# create the list
lapply(stocks,names) # get the names of each list element (data.frame)
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google" "e"
Since we are modifying stocks inside a function, we need to use <<- in order to overwrite the initial object stocks.
lapply(seq_along(stocks), function(x)
  names(stocks[[x]]) <<- gsub(pattern = "google", replacement = "google2",
                              x = names(stocks[[x]])))  # replace the string "google"
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google2" "e"
Additionally (verification) stocks contains the new names:
> stocks
[[1]]
a b
1 0.73826897 3
2 0.35627664 8
3 0.89060134 7
4 0.72629312 10
5 0.97069742 4
6 0.12530931 2
7 0.65744257 9
8 0.06218019 1
9 0.67322891 6
10 0.66128204 5
[[2]]
c google2 e
1 -0.5272267 9 0.402386917
2 0.6993945 64 0.045223278
3 0.3707304 49 0.706398932
4 -0.2371541 100 0.383120861
5 1.5073834 16 0.914643019
6 0.4098821 4 0.001967660
7 -0.3014211 81 0.284166886
8 0.3248919 1 0.000240412
9 1.2757740 36 0.305132358
10 1.5938208 25 0.289174620
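A version of the original quantmod task that avoids `<<-` altogether: rename the column inside the function and return the whole object, so nothing is dropped. Toy data frames stand in for the downloaded xts objects here; only the column layout matters:

```r
# Stand-ins for the downloaded data: six columns, moving average in column 6
aapl <- data.frame(AAPL.Open = 1, AAPL.High = 2, AAPL.Low = 0,
                   AAPL.Close = 1.5, AAPL.Volume = 10, AAPL.Close.1 = NA)
goog <- data.frame(GOOG.Open = 1, GOOG.High = 2, GOOG.Low = 0,
                   GOOG.Close = 1.5, GOOG.Volume = 10, GOOG.Close.1 = NA)
stocks <- list(aapl, goog)

# Rename column 6 in place and return the WHOLE object, not just x[, 6]
stocks <- lapply(stocks, function(x) {
  colnames(x)[6] <- paste0(colnames(x)[4], ".7.Day.MA")
  x
})

colnames(stocks[[1]])[6]  # "AAPL.Close.7.Day.MA"
```

The original setNames(x[, 6], ...) returned only the renamed sixth column; returning x keeps all six.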

Add POSIXlt as a new column to a dataframe

I am creating some random numbers:
data <- matrix(runif(10, 0, 1), ncol = 2)
dataframe <- data.frame(data)
> dataframe
X1 X2
1 0.7981783 0.13233858
2 0.9592338 0.05512942
3 0.1812384 0.74571334
4 0.1447498 0.96656930
5 0.1735390 0.37345575
and I want to create a corresponding timestamp column and bind that to the above data frame.
time <- as.POSIXlt(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
This creates 10 values.
> time
[1] "2017-05-05 13:00:27 EEST" "2017-05-05 13:00:02 EEST" "2017-05-05 13:00:26 EEST" "2017-05-05 13:00:25 EEST" "2017-05-05 13:00:28 EEST"
[6] "2017-05-05 13:00:17 EEST" "2017-05-05 13:00:35 EEST" "2017-05-05 13:00:08 EEST" "2017-05-05 13:00:29 EEST" "2017-05-05 13:00:32 EEST"
Now, I want to bind it to the dataframe, so I thought first to make it a matrix:
time <- matrix(time, nrow = 5, ncol = 2)
but this gives me:
Warning message:
In matrix(time, nrow = 5, ncol = 2) :
data length [11] is not a sub-multiple or multiple of the number of rows [5]
The reason is that POSIXlt stores the datetime as a list of components (seconds, minutes, hours, and so on), whereas POSIXct does not. So it is better to use as.POSIXct:
time <- as.POSIXct(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
If we need to store it, that can be done as columns of a data.frame
data.frame(date1= time[1:5], date2 = time[6:10])
without converting to a matrix, since 'Datetime' values get coerced to their numeric storage mode when converted to a matrix.
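For the original goal, there is no need for a matrix at all: a POSIXct vector can be attached to a data frame as an ordinary column. A minimal sketch:

```r
# Random data as in the question
df <- data.frame(matrix(runif(10, 0, 1), ncol = 2))
tm <- as.POSIXct(runif(5, 0, 60), origin = "2017-05-05 10:00:00", tz = "UTC")

# POSIXct stays a single atomic column; no coercion to its numeric storage
df$time <- tm
str(df)
```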
Suppose we proceed with POSIXlt; then we find the list of components:
time1 <- as.POSIXlt(runif(10, 0, 60), origin = "2017-05-05 10:00:00")
unclass(time1)
#$sec
# [1] 13.424695 40.860449 57.756890 59.072140 24.425521 39.429729 58.309546
# [8] 6.294982 46.613436 25.444415
#$min
# [1] 30 30 30 30 30 30 30 30 30 30
#$hour
# [1] 15 15 15 15 15 15 15 15 15 15
#$mday
# [1] 5 5 5 5 5 5 5 5 5 5
#$mon
# [1] 4 4 4 4 4 4 4 4 4 4
#$year
# [1] 117 117 117 117 117 117 117 117 117 117
#$wday
# [1] 5 5 5 5 5 5 5 5 5 5
#$yday
# [1] 124 124 124 124 124 124 124 124 124 124
#$isdst
# [1] 0 0 0 0 0 0 0 0 0 0
#$zone
# [1] "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST" "IST"
#$gmtoff
# [1] 19800 19800 19800 19800 19800 19800 19800 19800 19800 19800
#attr(,"tzone")
#[1] "" "IST" "IST"
With POSIXct, it is the underlying numeric storage values (seconds since the epoch) that can be found by unclass:
unclass(time)
#[1] 1493978445 1493978451 1493978432 1493978402 1493978447 1493978441
#[7] 1493978445 1493978450 1493978419 1493978425
#attr(,"tzone")
#[1] ""

as.xts from matrix adds time and timezone info

For some reason I do not understand, when I run as.xts to convert a matrix with dates in its rownames, the operation generates a datetime index. Since this is different from the original index, merge/cbind will not work.
Can someone point me what am I doing wrong?
> class(x)
[1] "xts" "zoo"
> head(x)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 0.042255791 0.017219585 0.17841600 0.010806168 0.04960026
2005-08-31 0.034117087 0.009951766 0.18476766 0.015245222 0.03825968
2005-09-30 -0.029594066 0.008697349 0.22851906 0.009769765 0.02944754
2005-10-31 -0.015653740 0.019966664 0.09314327 -0.012705172 0.01640395
2005-11-30 -0.005593003 0.005932542 0.05437377 -0.005209811 0.03173972
2005-12-31 0.005084193 0.021293537 0.05672958 0.002592639 0.04045477
> head(index(x))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> temp=t(apply(-x, 1, rank, na.last = "keep"))
> class(temp)
[1] "matrix"
> head(temp)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(rownames(temp))
[1] "2005-07-31" "2005-08-31" "2005-09-30" "2005-10-31" "2005-11-30" "2005-12-31"
> y=as.xts(temp)
> class(y)
[1] "xts" "zoo"
> head(y)
XLY.Adjusted XLP.Adjusted XLE.Adjusted AGG.Adjusted IVV.Adjusted
2005-07-31 3 4 1 5 2
2005-08-31 3 5 1 4 2
2005-09-30 5 4 1 3 2
2005-10-31 5 2 1 4 3
2005-11-30 5 3 1 4 2
2005-12-31 4 3 1 5 2
> head(index(y))
[1] "2005-07-31 BST" "2005-08-31 BST" "2005-09-30 BST" "2005-10-31 GMT" "2005-11-30 GMT" "2005-12-31 GMT"
as.xts.matrix has a dateFormat argument that defaults to "POSIXct", so it assumes the rownames of your matrix are datetimes. If you want them to simply be dates, specify dateFormat="Date" in your as.xts call.
y <- as.xts(temp, dateFormat="Date")
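A minimal sketch of the difference, assuming xts is installed:

```r
library(xts)

m <- matrix(1:4, nrow = 2,
            dimnames = list(c("2005-07-31", "2005-08-31"), c("a", "b")))

class(index(as.xts(m)))                       # POSIXct index by default
class(index(as.xts(m, dateFormat = "Date")))  # plain Date index
```

With a Date index, merge/cbind against the original daily-indexed xts lines up again.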

Aggregating 15-minute time series data to daily

This is the data in my text file: (I have shown 10 rows out of 10,000)
Index holds the row names, temp is the time series, and m contains the values in mm.
"Index" "temp" "m"
1 "2012-02-07 18:15:13" "4297"
2 "2012-02-07 18:30:04" "4296"
3 "2012-02-07 18:45:10" "4297"
4 "2012-02-07 19:00:01" "4297"
5 "2012-02-07 19:15:07" "4298"
6 "2012-02-07 19:30:13" "4299"
7 "2012-02-07 19:45:04" "4299"
8 "2012-02-07 20:00:10" "4299"
9 "2012-02-07 20:15:01" "4300"
10 "2012-02-07 20:30:07" "4301"
Which I import in r using this:
x2=read.table("data.txt", header=TRUE)
I tried using the following code for aggregating the time series to daily data :
c=aggregate(ts(x2[, 2], freq = 96), 1, mean)
I have set the frequency to 96 because with 15-minute data, 24 hours are covered by 96 values.
It returns this:
Time Series:
Start = 1
End = 5
Frequency = 1
[1] 5366.698 5325.115 5311.969 5288.542 5331.115
But I want the same format as my original data, i.e. I also want the timestamps next to the values.
I need help achieving that.
Use apply.daily from the xts package after converting your data to an xts object.
Something like this should work:
x2 = read.table(header=TRUE, text=' "Index" "temp" "m"
1 "2012-02-07 18:15:13" "4297"
2 "2012-02-07 18:30:04" "4296"
3 "2012-02-07 18:45:10" "4297"
4 "2012-02-07 19:00:01" "4297"
5 "2012-02-07 19:15:07" "4298"
6 "2012-02-07 19:30:13" "4299"
7 "2012-02-07 19:45:04" "4299"
8 "2012-02-07 20:00:10" "4299"
9 "2012-02-07 20:15:01" "4300"
10 "2012-02-07 20:30:07" "4301"')
x2$temp = as.POSIXct(strptime(x2$temp, "%Y-%m-%d %H:%M:%S"))
require(xts)
x2 = xts(x = x2$m, order.by = x2$temp)
apply.daily(x2, mean)
## [,1]
## 2012-02-07 20:30:07 4298.3
Update: Your problem in a reproducible format (with fake data)
We don't always need the actual dataset to be able to help troubleshoot....
set.seed(1) # So you can get the same numbers as I do
x = data.frame(datetime = seq(ISOdatetime(1970, 1, 1, 0, 0, 0),
                              length = 384, by = 900),
               m = sample(2000:4000, 384, replace = TRUE))
head(x)
# datetime m
# 1 1970-01-01 00:00:00 2531
# 2 1970-01-01 00:15:00 2744
# 3 1970-01-01 00:30:00 3146
# 4 1970-01-01 00:45:00 3817
# 5 1970-01-01 01:00:00 2403
# 6 1970-01-01 01:15:00 3797
require(xts)
x2 = xts(x$m, x$datetime)
head(x2)
# [,1]
# 1970-01-01 00:00:00 2531
# 1970-01-01 00:15:00 2744
# 1970-01-01 00:30:00 3146
# 1970-01-01 00:45:00 3817
# 1970-01-01 01:00:00 2403
# 1970-01-01 01:15:00 3797
apply.daily(x2, mean)
# [,1]
# 1970-01-01 23:45:00 3031.302
# 1970-01-02 23:45:00 3043.250
# 1970-01-03 23:45:00 2896.771
# 1970-01-04 23:45:00 2996.479
Update 2: A workaround alternative
(Using the fake data I've provided in the above update.)
data.frame(time = x[seq(96, nrow(x), by = 96), 1],
           mean = aggregate(ts(x[, 2], freq = 96), 1, mean))
# time mean
# 1 1970-01-01 23:45 3031.302
# 2 1970-01-02 23:45 3043.250
# 3 1970-01-03 23:45 2896.771
# 4 1970-01-04 23:45 2996.479
This would be a way to do it in base R:
x2 <- within(x2, {
  temp <- as.POSIXct(temp, format = '%Y-%m-%d %H:%M:%S')
  days <- as.POSIXct(cut(temp, breaks = 'days'))
  m <- as.numeric(m)
})
with(x2, aggregate(m, by=list(days=days), mean))
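Equivalently, `as.Date()` can serve directly as the grouping key in aggregate's formula interface. A self-contained sketch with a few synthetic rows spread over two days (hypothetical values, not the question's data):

```r
x2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
temp m
"2012-02-07 18:15:13" 4297
"2012-02-07 18:30:04" 4296
"2012-02-08 18:45:10" 4297
"2012-02-08 19:00:01" 4299')

# Collapse the timestamps to calendar days and average m within each day
daily <- aggregate(m ~ as.Date(temp), data = x2, FUN = mean)
daily
#   as.Date(temp)      m
# 1    2012-02-07 4296.5
# 2    2012-02-08 4298.0
```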
