How to pick hourly values from dataset?


I need help with this issue:
I have a dataset of water level values recorded every 30 minutes, but I only need the hourly values. I tried the aggregate() function, but because it requires a summary function (FUN), my result ends up being a mean or a median, and I don't want to apply any statistical function.
This is an example of my data frame:
06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996

We convert the 'RefDateTimeRef' column (the first column, holding the date-time) to POSIXct, extract the minutes and seconds with format(), and compare them with "00:00" to get a logical vector, which we use to subset the rows.
df1[format(as.POSIXct(df1[,1], format = "%m/%d/%Y %H:%M"), "%M:%S")=="00:00",]
# RefDateTimeRef Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148
Or with lubridate
library(lubridate)
df1[ minute(mdy_hm(df1[,1]))==0,]
# RefDateTimeRef Data
#10 04/14/2016 09:00 0.153
#22 04/14/2016 08:00 0.148
Or use sub() to remove everything up to and including the hour, then compare the remaining minutes with "00" using == to get the logical vector for subsetting the rows.
df1[ sub(".*\\s+\\S{2}:", "", df1[,1])=="00",]
NOTE: I would advise against using sub or substr, as matching on the raw text depends on its exact layout and can sometimes lead to incorrect answers.
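A minimal base-R sketch applied to the question's own three-column layout (date, time, value). It assumes the readings have no DST shifts, so UTC is used as the time zone (an assumption), and it keeps rows whose minute and second are both zero instead of comparing formatted strings.

```r
# A few rows from the question: date, time, value (half-hourly readings)
df <- read.table(text = "06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172",
                 col.names = c("Date", "Time", "Value"),
                 stringsAsFactors = FALSE)

# Combine date and time into one POSIXct; tz = "UTC" is an assumption
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only rows that fall exactly on the hour
lt <- as.POSIXlt(df$DateTime)
hourly <- df[lt$min == 0 & lt$sec == 0, ]
hourly
```

No summary function is involved: rows are simply selected, so each hourly value is the original reading.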

df <- read.table(text = '06/16/2015 02:00:00 0.036068
06/16/2015 02:30:00 0.008916
06/16/2015 03:00:00 -0.008622
06/16/2015 03:30:00 -0.014057
06/16/2015 04:00:00 -0.011172
06/16/2015 04:30:00 0.002401
06/16/2015 05:00:00 0.029632
06/16/2015 05:30:00 0.061902002
06/16/2015 06:00:00 0.087366998
06/16/2015 06:30:00 0.105176002
06/16/2015 07:00:00 0.1153
06/16/2015 07:30:00 0.126197994
06/16/2015 08:00:00 0.144154996')
colnames(df) <- c('Date','Time','Value')
# substring() from character 4 drops the hour, leaving "MM:SS"; keep rows where it is "00:00"
index <- substring(df$Time, 4) == "00:00"
final_df <- df[index, ]

Related

Update year only in column timestamp date field SQLITE

I want to update only the year, to 2025, without changing the month, day, and time.
What I have:
2027-01-01 09:30:00
2012-03-06 12:00:00
2014-01-01 17:24:00
2020-07-03 04:30:00
2020-01-01 05:50:00
2021-09-03 06:30:00
2013-01-01 23:30:00
2026-01-01 08:30:00
2028-01-01 09:30:00
What I require:
2025-01-01 09:30:00
2025-03-06 12:00:00
2025-01-01 17:24:00
2025-07-03 04:30:00
2025-01-01 05:50:00
2025-09-03 06:30:00
2025-01-01 23:30:00
2025-01-01 08:30:00
2025-01-01 09:30:00
I am using DB Browser for SQLite.
What I have tried, but it didn't work:
update t set
d = datetime(strftime('%Y', datetime(2059)) || strftime('-%m-%d', d));

You may update via a substring operation:
UPDATE yourTable
SET ts = '2025-' || SUBSTR(ts, 6, 14);
Note that SQLite does not have a dedicated timestamp/datetime type. Date-time values are usually stored as TEXT (or as REAL/INTEGER numbers), and since these are stored as text, we can do a substring operation on them.
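As a quick check of the offsets outside the database, the same string operation can be mirrored in R, the language used elsewhere on this page. SQLite's SUBSTR(X, Y, Z) takes a start position and a length, while R's substr() takes a start and a stop position; both count from 1. The three timestamps are taken from the question.

```r
# Timestamps stored as ISO-8601 text, as in the question
ts <- c("2027-01-01 09:30:00", "2012-03-06 12:00:00", "2014-01-01 17:24:00")

# SUBSTR(ts, 6, 14) = 14 characters from position 6, i.e. positions 6..19
new_ts <- paste0("2025-", substr(ts, 6, 19))
new_ts
```

Only the first four characters (the year) change; month, day, and time are carried over unchanged.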

Split out time interval in time series in r

I have a time-series dataset.
Data below:
Col 1(End):
2018.01.01 01:00:00
2018.01.01 02:00:00
2018.01.01 03:00:00
2018.01.01 04:00:00
2018.01.01 05:00:00
2018.01.01 06:00:00
2018.01.01 07:00:00
2018.01.01 08:00:00
2018.01.01 09:00:00
2018.01.01 10:00:00
2018.01.01 11:00:00
2018.01.02 01:00:00
2018.01.02 02:00:00
2018.01.02 03:00:00
2018.01.02 04:00:00
Col 2(Price-indexed)
55.09
44.02
44.0
33
43
43
33
33
I want to select the 11:00 value for every day.
I tried using a sequence, but because of daylight saving time the selected hour changes to 12:00 in October for 2019 and 2020, which is not correct.
datos_2019_2020<-read.csv("DayaheadPricesfull_2019_2020.csv")
#price variable changed to numeric
datos_2019_2020$Price_indexed=as.numeric(datos_2019_2020$Price)
time_index_2019_2020 <- seq(from = as.POSIXct("2019-01-01 00:00"), to = as.POSIXct("2020-12-31 23:00"), by = "hour",tz="GMT")
eventdata_2019_2020 <- as.xts(datos_2019_2020$Price_indexed, drop = FALSE,order.by = time_index_2019_2020)
df.new_2019_2020 = eventdata_2019_2020[seq(12, nrow(eventdata_2019_2020), 24), ]

Using the xts object x shown reproducibly in the Note at the end:
x[format(time(x), format = "%H:%M:%S") == "11:00:00"]
giving this xts object:
[,1]
2018-01-01 11:00:00 NA
Time zone problems are often specific to a particular installation, but usually the problem is between local time and GMT, or due to the switch between standard and daylight saving time. In these cases it is often easiest to set the entire session to GMT, making the local time GMT. Then there is no confusion between local time and GMT, since they are both GMT, and GMT has no daylight saving time.
Sys.setenv(TZ = 'GMT')
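An alternative to changing the session is to pass the time zone explicitly wherever times are created and formatted. In the question's code, tz = "GMT" is passed to seq() rather than to as.POSIXct(), so the endpoints appear to be parsed in the session's local time zone. The sketch below (dates chosen to span the late-October DST change, an illustrative assumption) builds the index in GMT and selects by clock time instead of taking every 24th row.

```r
# Hourly index built explicitly in GMT, spanning the end of European DST
idx <- seq(from = as.POSIXct("2019-10-20 00:00", tz = "GMT"),
           to   = as.POSIXct("2019-11-02 23:00", tz = "GMT"),
           by   = "hour")

# Select by clock time, formatting in the same zone the index was built in
keep <- format(idx, "%H:%M", tz = "GMT") == "11:00"
sel_times <- idx[keep]
sel_times
```

With an xts object the same logical vector can be used for subsetting, e.g. x[keep], provided x is indexed by idx.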
Note
Lines1 <- "
2018.01.01 01:00:00
2018.01.01 02:00:00
2018.01.01 03:00:00
2018.01.01 04:00:00
2018.01.01 05:00:00
2018.01.01 06:00:00
2018.01.01 07:00:00
2018.01.01 08:00:00
2018.01.01 09:00:00
2018.01.01 10:00:00
2018.01.01 11:00:00
2018.01.02 01:00:00
2018.01.02 02:00:00
2018.01.02 03:00:00
2018.01.02 04:00:00"
Lines2 <- "
55.09
44.02
44.0
33
43
43
33
33"
library(xts)
col1 <- read.table(text = Lines1, sep = ",")
col2 <- read.table(text = Lines2)
# merge col1 and col2 using NA's to fill in
m <- merge(col1, col2, by = 0, all.x = TRUE)
z <- read.zoo(m[-1], tz = "", format = "%Y.%m.%d %H:%M:%S")
x <- as.xts(z)

Match all the dates in a dataframe that are equal to one of the dates in a vector

I have a data frame with a timeDate column and a separate vector of dates. I want to add a new column to my data frame that is 1 for every row whose date equals one of the dates in the vector. I could use a double for loop, but there should be a faster way, right? The dataset is very large.
test <- c("2009-01-01 00:00:00 UTC", "2009-01-02 01:00:00 UTC",
"2009-01-01 02:00:00 UTC", "2010-12-25 03:00:00 UTC",
"2009-01-02 04:00:00 UTC", "2009-01-09 05:00:00 UTC")
df <- as.data.frame.POSIXlt(test)
dvec <- as.POSIXlt(c("2009-01-01","2010-12-25"), tz = "GMT")

You can compare the dates in test with the dates in dvec:
df$flag <- +(as.Date(df$test) %in% as.Date(dvec))
df
# test flag
#1 2009-01-01 00:00:00 1
#2 2009-01-02 01:00:00 0
#3 2009-01-01 02:00:00 1
#4 2010-12-25 03:00:00 1
#5 2009-01-02 04:00:00 0
#6 2009-01-09 05:00:00 0
The + at the beginning of the command changes the logical values (TRUE/FALSE) returned from %in% to integer values (1/0) respectively.
data
test <- as.POSIXlt(c("2009-01-01 00:00:00 UTC", "2009-01-02 01:00:00 UTC",
"2009-01-01 02:00:00 UTC", "2010-12-25 03:00:00 UTC",
"2009-01-02 04:00:00 UTC", "2009-01-09 05:00:00 UTC"), tz = "GMT")
df <- as.data.frame(test)
dvec <- as.POSIXlt(c("2009-01-01","2010-12-25"), tz = "GMT")
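A minimal base-R variant, assuming the timestamps are meant in GMT. Passing tz to as.Date() pins the calendar day so that values near midnight are not shifted, and as.integer() is a more explicit alternative to the unary +. Building dvec directly as Date is a simplification of the POSIXlt vector above.

```r
test <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-02 01:00:00",
                     "2009-01-01 02:00:00", "2010-12-25 03:00:00",
                     "2009-01-02 04:00:00", "2009-01-09 05:00:00"), tz = "GMT")
df <- data.frame(test = test)
dvec <- as.Date(c("2009-01-01", "2010-12-25"))

# One vectorised %in% lookup replaces the double for loop
df$flag <- as.integer(as.Date(df$test, tz = "GMT") %in% dvec)
df
```

%in% is a hashed lookup, so it scales to large data frames far better than nested loops.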

You can also use dplyr:
library(tidyverse)
df %>%
dplyr::mutate(valid = as.Date(test) %in% as.Date(dvec))
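#> test valid
#> 1 2009-01-01 00:00:00 TRUE
#> 2 2009-01-02 01:00:00 FALSE
#> 3 2009-01-01 02:00:00 TRUE
#> 4 2010-12-25 03:00:00 TRUE
#> 5 2009-01-02 04:00:00 FALSE
#> 6 2009-01-09 05:00:00 FALSE
If test is created in a local time zone ahead of UTC rather than in GMT, as.Date() can map midnight values to the previous day (row 1 would then be FALSE), so create both vectors in the same time zone.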

R time series missing values

I am working with an hourly time-series dataset. The data contains a few missing values, so I created a data frame (time_seq) with the complete hourly sequence and merged it with the original data so that the missing values become NA.
> data
date value
7980 2015-03-30 20:00:00 78389
7981 2015-03-30 21:00:00 72622
7982 2015-03-30 22:00:00 65240
7983 2015-03-30 23:00:00 47795
7984 2015-03-31 08:00:00 37455
7985 2015-03-31 09:00:00 70695
7986 2015-03-31 10:00:00 68444
# converting the date in the data to POSIXct format.
> data$date <- format.POSIXct(data$date,'%Y-%m-%d %H:%M:%S')
# creating a dataframe with the correct sequence of dates.
> time_seq <- seq(from = as.POSIXct("2014-05-01 00:00:00"),
to = as.POSIXct("2015-04-30 23:00:00"), by = "hour")
> df <- data.frame(date=time_seq)
> df
date
8013 2015-03-30 20:00:00
8014 2015-03-30 21:00:00
8015 2015-03-30 22:00:00
8016 2015-03-30 23:00:00
8017 2015-03-31 00:00:00
8018 2015-03-31 01:00:00
8019 2015-03-31 02:00:00
8020 2015-03-31 03:00:00
8021 2015-03-31 04:00:00
8022 2015-03-31 05:00:00
8023 2015-03-31 06:00:00
8024 2015-03-31 07:00:00
# merging with the original data
> a <- merge(data,df, x.by = data$date, y.by = df$date ,all=TRUE)
> a
date value
4005 2014-07-23 07:00:00 37003
4006 2014-07-23 07:30:00 NA
4007 2014-07-23 08:00:00 37216
4008 2014-07-23 08:30:00 NA
The values I get after merging are incorrect and they contain half-hourly values. What would be the correct approach for solving this?
Why is the merge result in 30-minute intervals when both of my data frames are hourly?
PS: I looked into the question "Fastest way for filling-in missing dates for data.table" and followed the steps, but it didn't help.

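You can use the padr package to solve this problem. pad() needs the date column to be a date-time class such as POSIXct (not character) and inserts the missing hours with NA values; fill_by_value() then replaces those NAs (with 0 by default), so leave it out if you want to keep the NAs.
library(padr)
library(dplyr) # for the pipe operator
data %>%
pad() %>%
fill_by_value()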
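A base-R sketch of the merge approach from the question, using a few rows from the question's data and assuming UTC (an assumption that avoids DST gaps). Two points seem relevant to the odd result: format() returns character, not POSIXct, and merge() names its key arguments by.x and by.y, so x.by and y.by are silently ignored and the join falls back to the common column names. Mixing a character key with a POSIXct key, or POSIXct values in different time zones, can produce keys that do not line up.

```r
# A few rows from the question, with a gap between 23:00 and 08:00
data <- data.frame(
  date  = c("2015-03-30 22:00:00", "2015-03-30 23:00:00",
            "2015-03-31 08:00:00", "2015-03-31 09:00:00"),
  value = c(65240, 47795, 37455, 70695),
  stringsAsFactors = FALSE
)

# Parse to POSIXct (not format(), which returns character); tz is assumed
data$date <- as.POSIXct(data$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Complete hourly grid over the observed range, in the same time zone
grid <- data.frame(date = seq(min(data$date), max(data$date), by = "hour"))

# Left join on the shared "date" column; missing hours get NA in value
full <- merge(grid, data, by = "date", all.x = TRUE)
full
```

Because both key columns are POSIXct in the same zone, every grid hour matches either an observation or nothing, and no half-hour rows appear.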

subset by vector in r

I am trying to subset an xts object of OHLC hourly data with a vector.
If I create the vector myself with the following command:
lookup = c("2012-01-12", "2012-01-31", "2012-03-05", "2012-03-19")
testdfx[lookup]
I get the correct data, showing all the hours that match the dates in the vector (00:00 to 23:00).
> head(testdfx[lookup])
open high low close
2012-01-12 00:00:00 1.27081 1.27217 1.27063 1.27211
2012-01-12 01:00:00 1.27212 1.27216 1.27089 1.27119
2012-01-12 02:00:00 1.27118 1.27166 1.27017 1.27133
2012-01-12 03:00:00 1.27134 1.27272 1.27133 1.27261
2012-01-12 04:00:00 1.27260 1.27262 1.27141 1.27183
2012-01-12 05:00:00 1.27183 1.27230 1.27145 1.27165
> tail(testdfx[lookup])
open high low close
2012-03-19 18:00:00 1.32451 1.32554 1.32386 1.32414
2012-03-19 19:00:00 1.32417 1.32465 1.32331 1.32372
2012-03-19 20:00:00 1.32373 1.32415 1.32340 1.32372
2012-03-19 21:00:00 1.32373 1.32461 1.32366 1.32376
2012-03-19 22:00:00 1.32377 1.32424 1.32359 1.32366
2012-03-19 23:00:00 1.32364 1.32406 1.32333 1.32336
However, when I extract the dates from an object and use that vector for subsetting, I only get the hours 00:00 to 19:00 in my subset.
> head(testdfx[dates])
open high low close
2007-01-05 00:00:00 1.3092 1.3093 1.3085 1.3088
2007-01-05 01:00:00 1.3087 1.3092 1.3075 1.3078
2007-01-05 02:00:00 1.3079 1.3091 1.3078 1.3084
2007-01-05 03:00:00 1.3083 1.3084 1.3073 1.3074
2007-01-05 04:00:00 1.3073 1.3080 1.3061 1.3071
2007-01-05 05:00:00 1.3070 1.3072 1.3064 1.3069
> tail(euro[nfp.releases])
open high low close
2014-01-10 14:00:00 1.35892 1.36625 1.35728 1.36366
2014-01-10 15:00:00 1.36365 1.36784 1.36241 1.36743
2014-01-10 16:00:00 1.36742 1.36866 1.36693 1.36719
2014-01-10 17:00:00 1.36720 1.36752 1.36579 1.36617
2014-01-10 18:00:00 1.36617 1.36663 1.36559 1.36624
2014-01-10 19:00:00 1.36630 1.36717 1.36585 1.36702
I have compared both objects containing the required dates, and they appear to be the same.
> class(lookup)
[1] "character"
> class(nfp.releases)
[1] "character"
> str(lookup)
chr [1:4] "2012-01-12" "2012-01-31" "2012-03-05" "2012-03-19"
> str(nfp.releases)
chr [1:86] "2014-02-07" "2014-01-10" "2013-12-06" "2013-11-08" ..
I am new to R but have tried everything over the past 3 days to get this to work. If I can't do it this way, I will have to create the vector by hand, but as it has 86 dates, this may take some time.
Thanks in advance.

I cannot reproduce your problem:
library(xts)
lookup = c("2012-01-12", "2012-01-31", "2012-03-05", "2012-03-19")
time_index <- seq(from = as.POSIXct("2012-01-01 07:00"), to = as.POSIXct("2012-05-17 18:00"), by = "hour")
set.seed(1)
value <- matrix(rnorm(n = 4*length(time_index)),length(time_index),4)
testdfx <- xts(value, order.by = time_index)
testdfx[lookup[1]]
testdfx["2012-01-12"]

Thanks for the responses, guys. I actually thought I had deleted this thread, but obviously not.
The problem in the case above was found about 3 feet from the computer. When looking through the data I was only interested in Fridays, and the FX market closes for the weekend on Friday, which is why those subsets stop at 19:00.
Sorry to have wasted your time; admin, please remove.
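A quick way to spot this kind of truncation is to report the first and last timestamp returned for each looked-up date. The sketch below uses hypothetical data (not the asker's): a complete Thursday and a Friday that stops at 19:00, as an FX feed might before the weekend, with the index in UTC by assumption.

```r
library(xts)

# Hypothetical hourly index: 2014-01-09 is complete, 2014-01-10 ends at 19:00
idx <- seq(as.POSIXct("2014-01-09 00:00", tz = "UTC"),
           as.POSIXct("2014-01-10 23:00", tz = "UTC"), by = "hour")
lt <- as.POSIXlt(idx)
idx <- idx[!(lt$mday == 10 & lt$hour > 19)]
x <- xts(seq_along(idx), order.by = idx, tzone = "UTC")

# For each looked-up date, show the first and last hour actually present
dates <- c("2014-01-09", "2014-01-10")
coverage <- t(sapply(dates, function(d) {
  format(range(index(x[d])), "%H:%M", tz = "UTC")
}))
colnames(coverage) <- c("first", "last")
coverage
```

If the "last" column stops early only on certain weekdays, the gap is in the data rather than in the subsetting vector.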
