How to make a temporal variogram - r

How can I make a temporal variogram from a set of 20 datetime values of rainfall at a 15-minute frequency? How should I convert the datetime values to temporal distances?
datetime R
0 2011-08-05 14:45:00 0.000
1 2011-08-05 15:00:00 0.000
2 2011-08-05 15:15:00 0.000
3 2011-08-05 15:30:00 0.000
4 2011-08-05 15:45:00 4.318
5 2011-08-05 16:00:00 3.302
6 2011-08-05 16:15:00 6.604
7 2011-08-05 16:30:00 0.000
...
19 2011-08-05 19:30:00 0.000
I have already computed some spatial and directional variograms in RStudio, but I am struggling with the temporal variogram. Does anyone have suggestions?
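As a sketch in base R (no gstat needed): convert the datetimes to numeric temporal distances in minutes, then average the squared differences for each lag by hand. The short series below is a hypothetical stand-in for the question's data (only the first eight shown values are used); the same approach applies to the full 20-point series. One could presumably also feed time as a one-dimensional coordinate into gstat's variogram(), but the manual computation makes the datetime-to-distance conversion explicit.

```r
# Hypothetical 15-minute rainfall series standing in for the question's data
datetime <- seq(as.POSIXct("2011-08-05 14:45:00", tz = "UTC"),
                by = "15 min", length.out = 8)
R <- c(0, 0, 0, 0, 4.318, 3.302, 6.604, 0)

# Convert datetimes to temporal distances: minutes since the first observation
t_min <- as.numeric(difftime(datetime, datetime[1], units = "mins"))

# All pairwise temporal lags and squared value differences
h  <- outer(t_min, t_min, function(a, b) abs(a - b))
d2 <- outer(R, R, function(a, b) (a - b)^2)

# Empirical semivariance per lag: gamma(h) = mean((z_i - z_j)^2) / 2
keep  <- upper.tri(h)
gamma <- tapply(d2[keep] / 2, h[keep], mean)

vgm <- data.frame(lag_minutes  = as.numeric(names(gamma)),
                  semivariance = as.numeric(gamma))
plot(vgm, type = "b")
```

Each row of vgm is one temporal lag (15, 30, ... minutes) with its empirical semivariance; with only 20 observations the estimates at long lags rest on very few pairs, so it is usual to only trust lags up to roughly half the series length.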

Related

Aggregate/ Grouping function in R with time as posixct object and multiple operations

I have a problem converting this data frame, called "mergeddata", from
time                 Score  comms_num  close     volume  returns  GMEHolding
2021-01-28 10:00:00    795         63  335.0000  216927   0.0547  Yes
2021-01-28 10:00:00    250         16  335.0000  216927   0.0547  No
2021-01-28 11:00:00   3295         86  456.0000  711496   0.396   No
2021-01-28 11:00:00    561         86  456.0000  711496   0.396   No
2021-01-28 11:00:00    978         86  456.0000  711496   0.396   Yes
2021-01-28 11:00:00   3212         86  456.0000  711496   0.396   No
2021-01-28 12:00:00   2147        156  445.0000  754078   0.234   No
2021-01-28 12:00:00     39         66  445.0000  754078   0.234   No
to the outcome below, where Score, comms_num, close, volume and returns are aggregated as medians, but GMEHolding is calculated as the ratio of "Yes" posts within the hour.
For instance line 1 should look like this:
time                 Score  comms_num  close     volume  returns  GMEHolding
2021-01-28 10:00:00    795         63  335.0000  216927   0.0547  0.5
2021-01-28 11:00:00   2095         86  456.0000  711496   0.396   0.25
I tried the aggregate function but cannot correctly combine the two operations (median and ratio) within it: "aggregate(x = mergeddata$GMEHolding,"... Furthermore, time is a POSIXct object, which also seems to make things more difficult. Can anyone help me out here?
mergeddata %>%
  group_by(time) %>%
  summarize(across(comms_num:returns, median),
            GMEHolding = mean(GMEHolding == "Yes"))
leads to this result, but the first group should not return 0 for GMEHolding.
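For reference, a base-R sketch of the same aggregation on a cut-down toy version of mergeddata (only the Score column, to keep it short): compute the per-hour medians and the per-hour Yes-share separately, then merge.

```r
# Cut-down toy version of mergeddata (Score column only)
mergeddata <- data.frame(
  time = as.POSIXct(rep(c("2021-01-28 10:00:00", "2021-01-28 11:00:00"),
                        times = c(2, 4)), tz = "UTC"),
  Score = c(795, 250, 3295, 561, 978, 3212),
  GMEHolding = c("Yes", "No", "No", "No", "Yes", "No")
)

# Median of the numeric column(s) per hour
med <- aggregate(Score ~ time, mergeddata, median)

# Share of "Yes" posts per hour
ratio <- aggregate(GMEHolding ~ time, mergeddata,
                   function(x) mean(x == "Yes"))

out <- merge(med, ratio, by = "time")
out
```

This gives Yes-shares of 0.5 for the 10:00 hour and 0.25 for 11:00, matching the expected output in the question (the Score column comes out as the true medians, 522.5 and 2095).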

How can I calculate median between my changepoint locations?

I have a data frame and applied the changepoint.np package to it. Now I want to calculate the median or display a trendline between these changepoint locations (red lines) to it.
Any ideas how to do this?
My dataframe df1
date amount
2012-07-01 20.0000000
2012-08-01 11.1111111
2012-09-01 0.0000000
2012-10-01 0.0000000
2012-11-01 4.7619048
2012-12-01 4.7619048
2013-01-01 7.8947368
2013-02-01 0.0000000
2013-03-01 0.0000000
2013-04-01 1.8181818
2013-05-01 0.0000000
2013-06-01 0.0000000
2013-07-01 0.0000000
2013-08-01 0.0000000
2013-09-01 1.7543860
2013-10-01 0.6410256
2013-11-01 3.0534351
2013-12-01 2.6143791
2014-01-01 7.6023392
2014-02-01 2.7777778
2014-03-01 5.2884615
2014-04-01 2.7237354
2014-05-01 2.3255814
2014-06-01 2.6627219
2014-07-01 2.0710059
2014-08-01 2.7522936
2014-09-01 4.6413502
2014-10-01 4.4077135
2014-11-01 3.4759358
2014-12-01 4.3333333
2015-01-01 8.0128205
2015-02-01 9.3632959
2015-03-01 4.3771044
2015-04-01 4.0650407
2015-05-01 3.7500000
2015-06-01 4.6189376
2015-07-01 3.6764706
2015-08-01 2.4561404
2015-09-01 2.9090909
2015-10-01 2.1084337
And my code for the changepoint:
library(changepoint.np)
out <- cpt.np(df1$amount, method = 'PELT')
plot(out)
median(df1$amount) gives the overall median; for the trendline you would first have to tell us the actual values. Lines can be added to a plot with the lines() function, i.e. by supplying the coordinates of the first and last point of each line.
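As a sketch of the per-segment medians (base R; the changepoint indices are hard-coded here as a hypothetical stand-in for changepoint::cpts(out), which returns the last index of each segment before the final one):

```r
# Toy data standing in for df1$amount; substitute your own series
amount <- c(5, 6, 5, 7, 20, 22, 21, 19, 2, 1, 3, 2)

# Hypothetical changepoint indices; in practice use cpts(out)
cp <- c(4, 8)

# Start and end index of each segment
starts <- c(1, cp + 1)
ends   <- c(cp, length(amount))

# Median of each segment
seg_medians <- mapply(function(s, e) median(amount[s:e]), starts, ends)

# Draw the medians as horizontal trendlines over the series
plot(amount, type = "l")
segments(starts, seg_medians, ends, seg_medians, col = "blue", lwd = 2)
```

With the toy series this gives segment medians of 5.5, 20.5 and 2; applied to df1$amount and the cpt.np output, each red-line interval in your plot would get its own horizontal median line.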

R - Gap fill a time series

I am trying to fill in the gaps in one of my time series by merging a full day time series into my original time series. But for some reason I get duplicate entries and all the rest of my data is NA.
My data looks like this:
> head(data)
TIME Water_Temperature
1 2016-08-22 00:00:00 81.000
2 2016-08-22 00:01:00 80.625
3 2016-08-22 00:02:00 85.000
4 2016-08-22 00:03:00 80.437
5 2016-08-22 00:04:00 85.000
6 2016-08-22 00:05:00 80.375
> tail(data)
TIME Water_Temperature
1398 2016-08-22 23:54:00 19.5
1399 2016-08-22 23:55:00 19.5
1400 2016-08-22 23:56:00 19.5
1401 2016-08-22 23:57:00 19.5
1402 2016-08-22 23:58:00 19.5
1403 2016-08-22 23:59:00 19.5
In between, some minutes are missing (1403 rows instead of 1440). I tried to fill them in using:
data.length <- length(data$TIME)
time.min <- data$TIME[1]
time.max <- data$TIME[data.length]
all.dates <- seq(time.min, time.max, by="min")
all.dates.frame <- data.frame(list(TIME=all.dates))
merged.data <- merge(all.dates.frame, data, all=T)
But that gives me a result of 1449 rows instead of 1440. The first eight minutes are duplicates in the time stamp column and all other values in Water_Temperature are NA. Looks like this:
> merged.data[1:25,]
TIME Water_Temperature
1 2016-08-22 00:00:00 NA
2 2016-08-22 00:00:00 81.000
3 2016-08-22 00:01:00 NA
4 2016-08-22 00:01:00 80.625
5 2016-08-22 00:02:00 NA
6 2016-08-22 00:02:00 85.000
7 2016-08-22 00:03:00 NA
8 2016-08-22 00:03:00 80.437
9 2016-08-22 00:04:00 NA
10 2016-08-22 00:04:00 85.000
11 2016-08-22 00:05:00 NA
12 2016-08-22 00:05:00 80.375
13 2016-08-22 00:06:00 NA
14 2016-08-22 00:06:00 80.812
15 2016-08-22 00:07:00 NA
16 2016-08-22 00:07:00 80.812
17 2016-08-22 00:08:00 NA
18 2016-08-22 00:08:00 80.937
19 2016-08-22 00:09:00 NA
20 2016-08-22 00:10:00 NA
21 2016-08-22 00:11:00 NA
22 2016-08-22 00:12:00 NA
23 2016-08-22 00:13:00 NA
24 2016-08-22 00:14:00 NA
25 2016-08-22 00:15:00 NA
> tail(merged.data)
TIME Water_Temperature
1444 2016-08-22 23:54:00 NA
1445 2016-08-22 23:55:00 NA
1446 2016-08-22 23:56:00 NA
1447 2016-08-22 23:57:00 NA
1448 2016-08-22 23:58:00 NA
1449 2016-08-22 23:59:00 NA
Does anyone have an idea what's going wrong?
EDIT:
I am now using the xts and zoo packages to do the job:
library(xts)
library(zoo)
df1.zoo<-zoo(data[,-1],data[,1])
df2 <- as.data.frame(as.zoo(merge(as.xts(df1.zoo),
                                  as.xts(zoo(, seq(start(df1.zoo), end(df1.zoo), by = "min"))))))
Very easy and effective!
Instead of merge, use rbind, which gives you an irregular time series without NAs to start with. If you really want a regular time series with a frequency of, say, 1 minute, you can build a time-based sequence as an index, merge it with your data (after using rbind), and fill the resulting NAs with na.locf. Hope this helps.
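A base-R sketch of that idea on a toy three-row series (the locf() helper below stands in for zoo::na.locf; merging with an explicit by = "TIME" also avoids the duplicated rows from the question, which typically appear when the two TIME columns do not compare equal, e.g. because of differing time zones):

```r
# Toy irregular minute series with two missing minutes (00:01 and 00:03)
data <- data.frame(
  TIME = as.POSIXct(c("2016-08-22 00:00", "2016-08-22 00:02",
                      "2016-08-22 00:04"), tz = "UTC"),
  Water_Temperature = c(81.0, 85.0, 80.4)
)

# Regular one-minute index spanning the observed range
all.dates <- data.frame(TIME = seq(min(data$TIME), max(data$TIME), by = "min"))

# Left join onto the regular index; missing minutes become NA
merged <- merge(all.dates, data, by = "TIME", all.x = TRUE)

# Last-observation-carried-forward, standing in for zoo::na.locf
locf <- function(x) {
  i <- cumsum(!is.na(x))
  i[i == 0] <- NA                 # leading NAs stay NA
  x[!is.na(x)][i]
}
merged$Water_Temperature <- locf(merged$Water_Temperature)
```

After this, merged has one row per minute with the gaps filled forward (81, 81, 85, 85, 80.4).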
You can try merging with full_join from the tidyverse. This works for me with two dataframes (daily values) sharing a column named Date:
big_data <- my_data %>%
  reduce(full_join, by = "Date")

R: calculate average over a specific time window in a time series data frame

My dataset is a bit noisy at a 1-min interval, so I'd like to use an average value for each hour, taken from minute 25 to minute 35, to stand for that hour at minute 30.
For example, an average at: 00:30 (averaged from 00:25 to 00:35), 01:30 (averaged from 01:25 to 01:35), 02:30 (averaged from 02:25 to 02:35), etc.
Can you suggest a good way to do this in R?
Here is my dataset:
set.seed(1)
DateTime <- seq(as.POSIXct("2010/1/1 00:00"), as.POSIXct("2010/1/5 00:00"), "min")
value <- rnorm(n=length(DateTime), mean=100, sd=1)
df <- data.frame(DateTime, value)
Thanks a lot.
Here's one way
library(dplyr)
df %>%
  filter(between(as.numeric(format(DateTime, "%M")), 25, 35)) %>%
  group_by(hour = format(DateTime, "%Y-%m-%d %H")) %>%
  summarise(value = mean(value))
I think that the existing answers are not general enough as they do not take into account that a time interval could fall within multiple midpoints.
I would instead use shift from the data.table package.
library(data.table)
setDT(df)
First set the interval argument based on the sequence you chose above. This calculates an average ten rows (minutes) around every row in your table:
df[, ave_val :=
     Reduce('+', c(shift(value, 0:5L, type = "lag"),
                   shift(value, 1:5L, type = "lead"))) / 11
]
Then generate the midpoints you want:
mids <- seq(as.POSIXct("2010/1/1 00:00"), as.POSIXct("2010/1/5 00:00"), by = 60*60) + 30*60 # every hour starting at 0:30
Then filter accordingly:
setkey(df,DateTime)
df[J(mids)]
Since you want to average on just a subset of each period, I think it makes sense to first subset the data.frame, then aggregate:
aggregate(
value~cbind(time=strftime(DateTime,'%Y-%m-%d %H:30:00')),
subset(df,{ m <- strftime(DateTime,'%M'); m>='25' & m<='35'; }),
mean
);
## time value
## 1 2010-01-01 00:30:00 99.82317
## 2 2010-01-01 01:30:00 100.58184
## 3 2010-01-01 02:30:00 99.54985
## 4 2010-01-01 03:30:00 100.47238
## 5 2010-01-01 04:30:00 100.05517
## 6 2010-01-01 05:30:00 99.96252
## 7 2010-01-01 06:30:00 99.79512
## 8 2010-01-01 07:30:00 99.06791
## 9 2010-01-01 08:30:00 99.58731
## 10 2010-01-01 09:30:00 100.27202
## 11 2010-01-01 10:30:00 99.60758
## 12 2010-01-01 11:30:00 99.92074
## 13 2010-01-01 12:30:00 99.65819
## 14 2010-01-01 13:30:00 100.04202
## 15 2010-01-01 14:30:00 100.04461
## 16 2010-01-01 15:30:00 100.11609
## 17 2010-01-01 16:30:00 100.08631
## 18 2010-01-01 17:30:00 100.41956
## 19 2010-01-01 18:30:00 99.98065
## 20 2010-01-01 19:30:00 100.07341
## 21 2010-01-01 20:30:00 100.20281
## 22 2010-01-01 21:30:00 100.86013
## 23 2010-01-01 22:30:00 99.68170
## 24 2010-01-01 23:30:00 99.68097
## 25 2010-01-02 00:30:00 99.58603
## 26 2010-01-02 01:30:00 100.10178
## 27 2010-01-02 02:30:00 99.78766
## 28 2010-01-02 03:30:00 100.02220
## 29 2010-01-02 04:30:00 99.83427
## 30 2010-01-02 05:30:00 99.74934
## 31 2010-01-02 06:30:00 99.99594
## 32 2010-01-02 07:30:00 100.08257
## 33 2010-01-02 08:30:00 99.47077
## 34 2010-01-02 09:30:00 99.81419
## 35 2010-01-02 10:30:00 100.13294
## 36 2010-01-02 11:30:00 99.78352
## 37 2010-01-02 12:30:00 100.04590
## 38 2010-01-02 13:30:00 99.91061
## 39 2010-01-02 14:30:00 100.61730
## 40 2010-01-02 15:30:00 100.18539
## 41 2010-01-02 16:30:00 99.45165
## 42 2010-01-02 17:30:00 100.09894
## 43 2010-01-02 18:30:00 100.04131
## 44 2010-01-02 19:30:00 99.58399
## 45 2010-01-02 20:30:00 99.75524
## 46 2010-01-02 21:30:00 99.94079
## 47 2010-01-02 22:30:00 100.26533
## 48 2010-01-02 23:30:00 100.35354
## 49 2010-01-03 00:30:00 100.31141
## 50 2010-01-03 01:30:00 100.10709
## 51 2010-01-03 02:30:00 99.41102
## 52 2010-01-03 03:30:00 100.07964
## 53 2010-01-03 04:30:00 99.88183
## 54 2010-01-03 05:30:00 99.91112
## 55 2010-01-03 06:30:00 99.71431
## 56 2010-01-03 07:30:00 100.48585
## 57 2010-01-03 08:30:00 100.35096
## 58 2010-01-03 09:30:00 100.00060
## 59 2010-01-03 10:30:00 100.03858
## 60 2010-01-03 11:30:00 99.95713
## 61 2010-01-03 12:30:00 99.18699
## 62 2010-01-03 13:30:00 99.49216
## 63 2010-01-03 14:30:00 99.37762
## 64 2010-01-03 15:30:00 99.68642
## 65 2010-01-03 16:30:00 99.84921
## 66 2010-01-03 17:30:00 99.84039
## 67 2010-01-03 18:30:00 99.90989
## 68 2010-01-03 19:30:00 99.95421
## 69 2010-01-03 20:30:00 100.01276
## 70 2010-01-03 21:30:00 100.14585
## 71 2010-01-03 22:30:00 99.54110
## 72 2010-01-03 23:30:00 100.02526
## 73 2010-01-04 00:30:00 100.04476
## 74 2010-01-04 01:30:00 99.61132
## 75 2010-01-04 02:30:00 99.94782
## 76 2010-01-04 03:30:00 99.44863
## 77 2010-01-04 04:30:00 99.91305
## 78 2010-01-04 05:30:00 100.25428
## 79 2010-01-04 06:30:00 99.86279
## 80 2010-01-04 07:30:00 99.63516
## 81 2010-01-04 08:30:00 99.65747
## 82 2010-01-04 09:30:00 99.57810
## 83 2010-01-04 10:30:00 99.77603
## 84 2010-01-04 11:30:00 99.85140
## 85 2010-01-04 12:30:00 100.82995
## 86 2010-01-04 13:30:00 100.26138
## 87 2010-01-04 14:30:00 100.25851
## 88 2010-01-04 15:30:00 99.92685
## 89 2010-01-04 16:30:00 100.00825
## 90 2010-01-04 17:30:00 100.24437
## 91 2010-01-04 18:30:00 99.62711
## 92 2010-01-04 19:30:00 99.93999
## 93 2010-01-04 20:30:00 99.82477
## 94 2010-01-04 21:30:00 100.15321
## 95 2010-01-04 22:30:00 99.88370
## 96 2010-01-04 23:30:00 100.06657

Compute column average based on date and time in R

I have a matrix, which looks a bit like this:
Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492
It contains a year's worth of data, with each day having 96 rows (15-minute intervals from 00:00 to 23:45). My question is that I'd like to average the data column, for each day, based on a time range I specify. For example, if I wanted to average over times 06:00 - 08:00 for each day, in the data above I should get an answer of 1.0964 for the date 04/09/2014.
I have no idea how to do this using the date and time columns as filters, and wondered if someone could help?
To make things even more complicated, I would also like to compute 45 minute rolling averages for each day, within a different time period, say 04:00 - 09:00. Again, as this is for each day, it would be good to get the result in a matrix for which each row is a certain date, then the columns would represent the rolling averages from say, 04:00 - 04:45, 04:15 - 05:00...
Any ideas?!
Check the following code and let me know if anything is unclear.
data = read.table(header = T, stringsAsFactors = F, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492")
library("magrittr")
data$parsed.timestamp = paste(data$Date, data$Time) %>% strptime(., format = "%d/%m/%Y %H:%M:%S")
# Hourly Average
desiredGroupingUnit = cut(data$parsed.timestamp, breaks = "hour") #You can use substr for that also
aggregate(data$Data, by = list(desiredGroupingUnit), FUN = mean )
# Group.1 x
# 1 2014-09-04 05:00:00 0.908
# 2 2014-09-04 06:00:00 1.286
# 3 2014-09-04 07:00:00 0.951
# 4 2014-09-04 08:00:00 0.937
# 5 2014-09-04 09:00:00 1.113
# 6 2014-09-04 10:00:00 0.492
# Moving average
getAvgBetweenTwoTimeStamps = function(data, startTime, endTime) {
  averageTheseIndices = which(data$parsed.timestamp >= startTime & data$parsed.timestamp <= endTime)
  return(mean(data$Data[averageTheseIndices]))
}
movingAvgWindow = 45*60 # 45 minutes, expressed in seconds
movingAvgTimestamps = data.frame(from = data$parsed.timestamp, to = data$parsed.timestamp + movingAvgWindow)
movingAvgTimestamps$movingAvg =
apply(movingAvgTimestamps, MARGIN = 1,
FUN = function(x) getAvgBetweenTwoTimeStamps(data = data, startTime = x["from"], endTime = x["to"]))
print(movingAvgTimestamps)
# from to movingAvg
# 1 2014-09-04 05:45:00 2014-09-04 06:30:00 1.1010000
# 2 2014-09-04 06:00:00 2014-09-04 06:45:00 1.2860000
# 3 2014-09-04 06:15:00 2014-09-04 07:00:00 1.3550000
# 4 2014-09-04 06:30:00 2014-09-04 07:15:00 1.2530000
# 5 2014-09-04 06:45:00 2014-09-04 07:30:00 1.1000000
# 6 2014-09-04 07:00:00 2014-09-04 07:45:00 0.9510000
# 7 2014-09-04 07:15:00 2014-09-04 08:00:00 0.8900000
# 8 2014-09-04 07:30:00 2014-09-04 08:15:00 0.9120000
# 9 2014-09-04 07:45:00 2014-09-04 08:30:00 0.9500000
# 10 2014-09-04 08:00:00 2014-09-04 08:45:00 0.9370000
# 11 2014-09-04 08:15:00 2014-09-04 09:00:00 1.0620000
# 12 2014-09-04 08:30:00 2014-09-04 09:15:00 1.1320000
# 13 2014-09-04 08:45:00 2014-09-04 09:30:00 1.0880000
# 14 2014-09-04 09:00:00 2014-09-04 09:45:00 1.1130000
# 15 2014-09-04 09:15:00 2014-09-04 10:00:00 0.8810000
# 16 2014-09-04 09:30:00 2014-09-04 10:15:00 0.8626667
# 17 2014-09-04 09:45:00 2014-09-04 10:30:00 0.7960000
# 18 2014-09-04 10:00:00 2014-09-04 10:45:00 0.4920000
