rnoaa air pressure looping - r

The rnoaa package only allows you to gather 30 days' worth of air pressure data at a time (https://cran.r-project.org/web/packages/rnoaa/rnoaa.pdf). I'm looking to create a function/for loop to pull data from the package one month at a time. The date format it requires is specific: YYYYMMDD, with no - or /. I started with a function, but lapply doesn't seem to apply the function that calls the air pressure data.
I have tried loops in many ways and can't seem to get it working. Here's an example.
for (i in dates)) {
air_pressure[i] <- coops_search(begin_date = start[i], end_date = end[i],
station_name = 8727520, product= "air_pressure", units = "metric", time_zone = "gmt")
print(air_pressure[i])
}
start<-seq(as.Date("2015/01/01"), by = "month", length.out = 100)
start <- as.numeric(gsub("-","",start))
end<-seq(as.Date("2015/02/01"), by = "month", length.out = 100)
end <- as.numeric(gsub("-","",end))
pressure_function<- function(air_pressure) {
coops_search(station_name = 8727520, begin_date = starting,
end_date = ending, product = "air_pressure")
}
lapply(pressure_function, starting= start, ending= end, FUN= sum)
There are no real error messages; the results just don't populate, and the function doesn't run.

There are some pretty fundamental things wrong here. First, your for loop has too many closing parentheses. Second, your lapply call passes the function as the first argument; that does not work: lapply(X, FUN, ...) takes the data first and the function second. And more ....
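(Side note: format() can produce the required YYYYMMDD string directly, so the gsub() calls below are just one way to do it; a minimal sketch:)
# strftime-style formatting yields YYYYMMDD with no separators
format(as.Date("2015-01-01"), "%Y%m%d")
#> [1] "20150101"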
Anyway, try this:
library(rnoaa)
fun <- function(begin, end) {
  coops_search(station_name = 8727520, begin_date = gsub("-", "", begin),
               end_date = gsub("-", "", end), product = "air_pressure")
}
start_dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "month")
end_dates <- seq(as.Date("2015-02-01"), as.Date("2016-01-01"), by = "month") - 1
res <- Map(fun, start_dates, end_dates)
df <- dplyr::bind_rows(lapply(res, "[[", "data"))
head(df)
#> t v f
#> 1 2015-01-01 00:00:00 1025.3 0,0,0
#> 2 2015-01-01 00:06:00 1025.4 0,0,0
#> 3 2015-01-01 00:12:00 1025.5 0,0,0
#> 4 2015-01-01 00:18:00 1025.6 0,0,0
#> 5 2015-01-01 00:24:00 1025.6 0,0,0
#> 6 2015-01-01 00:30:00 1025.6 0,0,0
NROW(df)
#> [1] 87600

Related

how to break a time range into monthly queries?

Consider this simple example:
library(lubridate)

bogus <- function(start_time, end_time){
  print(paste('hey this starts on', start_time, 'until', end_time))
}
start_time <- ymd('2018-01-01')
end_time <- ymd('2018-05-01')
> bogus(start_time, end_time)
[1] "hey this starts on 2018-01-01 until 2018-05-01"
Unfortunately, doing so with a long time range does not work with my real-life bogus function, so I need to break my original time range into monthly pieces.
In other words the first call would be bogus(ymd('2018-01-01'), ymd('2018-01-31')), the second one bogus(ymd('2018-02-01'), ymd('2018-02-28')), etc.
Is there a simple way to do this using purrr and lubridate?
Thanks
Are you looking for something like:
library(lubridate)
seq_dates <- seq(start_time, end_time - 1, by = "month")
lapply(seq_dates, function(x)
  print(paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1)))
You could also do a short bogus function like:
bogus <- function(start_var, end_var) {
  require(lubridate)
  seq_dates <- seq(as.Date(start_var), as.Date(end_var) - 1, by = "month")
  printed_statement <- lapply(seq_dates, function(x)
    paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1))
  for (i in printed_statement) { print(i) }
}
And call it like:
bogus("2018-01-01", "2018-05-01")
Output:
[1] "hey this starts on 2018-01-01 until 2018-01-31"
[1] "hey this starts on 2018-02-01 until 2018-02-28"
[1] "hey this starts on 2018-03-01 until 2018-03-31"
[1] "hey this starts on 2018-04-01 until 2018-04-30"
This way you can just give minimum start and maximum end date and get everything in-between.
With base:
seqdate <- seq.Date(start_time, end_time, by = "1 month")
# negative indexing avoids the precedence trap in seqdate[1:length(seqdate)-1],
# which actually evaluates as seqdate[(1:length(seqdate)) - 1]
dateranges <- data.frame(start.dates = seqdate[-length(seqdate)],
                         end.dates = seqdate[-1] - 1)
start.dates end.dates
1 2018-01-01 2018-01-31
2 2018-02-01 2018-02-28
3 2018-03-01 2018-03-31
4 2018-04-01 2018-04-30
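Since the question asked about purrr: with the dateranges frame above (or the seq_dates from the first answer), map2 will call bogus() once per month. A sketch, assuming the bogus() defined in the question:
library(purrr)
# calls bogus(start, end) for each monthly pair
map2(dateranges$start.dates, dateranges$end.dates, bogus)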

Fastest way of matching observations within time difference

I'm calculating price differences between trades that have a specific time difference (say 60 seconds). I need this done for several assets and several trades, but I could not figure out a way to do it without an eternal for loop.
Let's create some random prices:
library(birk)
library(tictoc)
library(dplyr)
initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
last.date <- as.POSIXct('2018-10-28 17:00:00', tz = 'GMT')
PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec')
TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
ndf <- data.frame(PriorityDateTime, TradePrice)
ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = length(ndf$PriorityDateTime))
ndf$id <- seq_len(nrow(ndf))
My main function is the following:
For each trade (in the TradePrice column) I need to find the closest trade that falls within the 60-second interval.
calc.spread <- function(df, c = 60){
  n <- length(df$PriorityDateTime)
  difft <- dspread <- spread <- rep(0, n)
  TimeF <- as.POSIXct(NA)
  for (k in 1:n){
    diffs <- as.POSIXct(df$PriorityDateTime) - as.POSIXct(df$PriorityDateTime[k])
    idx <- which.closest(diffs, x = c)
    TimeF[k] <- as.POSIXct(df$PriorityDateTime[idx])
    difft[k] <- difftime(time1 = TimeF[k], time2 = df$PriorityDateTime[k], units = 'sec')
    dspread[k] <- abs(df$TradePrice[k] - df$TradePrice[idx])
    spread[k] <- 2*abs(log(df$TradePrice[k]) - log(df$TradePrice[idx]))
  }
  df <- data.frame(spread, dspread, difft, TimeF,
                   PriorityDateTime = df$PriorityDateTime, id = df$id)
}
The function which.closest is just a wrapper for which.min(abs(vec - x)). As I have a data frame with multiple assets, I run:
c=60
spreads <- ndf %>% group_by(InstrumentSymbol) %>% do(calc.spread(.,c=c))
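(which.closest comes from the birk package loaded above; if you'd rather avoid that dependency, the one-liner the question describes is enough. A sketch:)
# drop-in stand-in for birk::which.closest, per the description above
which.closest <- function(vec, x) which.min(abs(vec - x))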
The problem is that I need to run this on 3-million-row data frames. I have searched the forum but couldn't find a way to make this code run faster; plyr's ddply is a little slower than dplyr.
Is there any suggestion?
Being quite unsatisfied with my own previous answer, I asked here for help, and it turns out there is at least one way in data.table that is clearly faster. I also made a dplyr-related question here.
library(data.table)
library(lubridate)

s <- Sys.time()
initial.date <- as.POSIXct('2018-10-27 10:00:00', tz = 'GMT')
last.date <- as.POSIXct('2018-12-28 17:00:00', tz = 'GMT')
PriorityDateTime <- seq.POSIXt(from = initial.date, to = last.date, by = '30 sec'); length(PriorityDateTime)
TradePrice <- seq(from = 1, to = length(PriorityDateTime), by = 1)
ndf <- data.frame(PriorityDateTime, TradePrice)
ndf$InstrumentSymbol <- rep_len(x = c('asset1', 'asset2'), length.out = length(ndf$PriorityDateTime))
ndf$id <- seq_len(nrow(ndf))
ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
res <- data.table(ndf)
res2 <- setDT(res)
res2 <- res2[, `:=` (min_60 = datetime - 60, plus_60 = datetime + 60, idx = .I)][
res2, on = .(InstrumentSymbol = InstrumentSymbol, datetime >= min_60, datetime <= plus_60), allow.cartesian = TRUE][
idx != i.idx, .SD[which.min(abs(i.TradePrice - TradePrice))], by = id][
, .(id, minpricewithin60 = i.TradePrice, index.minpricewithin60 = i.idx)][
res, on = .(id)][, `:=` (min_60 = NULL, plus_60 = NULL, idx = NULL)]
res2[]
e <- Sys.time()
> e-s
Time difference of 1.23701 mins
You can then apply your calc.spread function directly to the minpricewithin60 column.
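For instance, a vectorised version of the log-spread from calc.spread() could then be computed in a single data.table step. A sketch, assuming the res2 result above:
# same 2*|log price difference| formula used inside calc.spread()
res2[, spread := 2 * abs(log(TradePrice) - log(minpricewithin60))]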
You might have made a mistake, in the sense that you are not looking for the minimum price difference within 60 seconds as described; instead you are looking for the trade that took place as close as possible to 60 seconds away, past or future:
idx <- which.closest(diffs,x=c)
With this, a trade that took place 1 second ago would be discarded in favour of a trade that happened closer to 60 seconds away, which I don't think is what you want. You probably want the lowest price difference among all trades within 60 seconds, which can be done by:
res$idx[i] <<- which.min(pricediff)[1]
See the code below:
library(lubridate)
library(dplyr)

ndf$datetime <- ymd_hms(ndf$PriorityDateTime)
res <- ndf %>% data.frame(stringsAsFactors = FALSE)
res$dspread <- res$idx <- res$spread <- NA
sapply(1:nrow(res), function(i){
  # note: difftime()'s third positional argument is tz, so units must be named
  within60 <- abs(difftime(ndf$datetime[i], ndf$datetime, units = "secs")) <= 60
  samesymbol <- res$InstrumentSymbol[i] == res$InstrumentSymbol
  isdifferenttrade <- 1:nrow(res) != i
  pricediff <- ifelse(within60 & samesymbol & isdifferenttrade,
                      abs(res$TradePrice[i] - res$TradePrice), Inf)
  res$dspread[i] <<- min(pricediff)
  res$idx[i] <<- which.min(pricediff)[1] # in case several elements have the same price
  res$spread[i] <<- 2*abs(log(res$TradePrice[i]) - log(res$TradePrice[res$idx[i]]))
})
head(res)
What I used was sapply, which is similar to (and can even be slower than) a for loop. If this is any faster on your real data, it is because the operations are done in fewer steps.
Let me know; otherwise you can try the same in a for loop, or we'd have to try data.table, which I am less familiar with. These approaches are generally time-consuming because conditions have to be evaluated against every row of the data.
     PriorityDateTime TradePrice InstrumentSymbol id            datetime    spread idx dspread
1 2018-10-27 10:00:00          1           asset1  1 2018-10-27 10:00:00 2.1972246   3       2
2 2018-10-27 10:00:30          2           asset2  2 2018-10-27 10:00:30 1.3862944   4       2
3 2018-10-27 10:01:00          3           asset1  3 2018-10-27 10:01:00 2.1972246   1       2
4 2018-10-27 10:01:30          4           asset2  4 2018-10-27 10:01:30 1.3862944   2       2
5 2018-10-27 10:02:00          5           asset1  5 2018-10-27 10:02:00 1.0216512   3       2
6 2018-10-27 10:02:30          6           asset2  6 2018-10-27 10:02:30 0.8109302   4       2

time differences between rows without night in R

A subject was measured at several time points over several days. I have a column "resptime_s" (the time at which the subject answered the beep on his smartphone). Now I want to know the mean time between those rows, with the night taken out (night is always from 22:30 till 7:30). Take as an example:
The R script:
setwd("C:/Users/Hanne/Desktop/")
dat <- read.csv(file = "datnew2.csv", sep = ";", header = TRUE)
rows <- c(1:388) # time points
columns <- c(2, 60) # datum and time
nVariables = 2
newdata <- dat[rows, columns]
head(newdata)
fun2 <- function(x){
  bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
  f <- cumsum(c(FALSE, diff(bt) < 0))
  d <- rep(as.Date("2018-01-01"), length.out = length(bt))
  bt <- as.POSIXct(paste(d, x))
  res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1])))
  unname(unlist(res))
}
fun2(newdata$resptime_s)
But the result isn't correct.
And with:
dput(head(newdata, 30))
I obtained this output:
Using the different functions for working with time intervals in lubridate gives the most elegant and easy-to-understand solution.
library(tidyverse)
library(lubridate)
data <- tribble(
  ~time_point, ~beeptime,
  1, "08:30",
  2, "11:13",
  3, "12:08",
  4, "17:20",
  5, "22:47",
  6, "7:36",
  7, "9:40"
) %>%
  mutate(beeptime = as_datetime(hm(beeptime)))
1. Define the daytime interval
day <- interval(
  as_datetime(hm("07:30")),
  as_datetime(hm("22:30"))
)
2. Keep daytime beeps and estimate the time (interval) between them
# %--% is basically the same as interval() above.
data_interval <-
  data %>%
  filter(beeptime %within% day) %>%
  mutate(beep_interval = lag(beeptime) %--% beeptime)
3. Take the average
# You can use as.numeric() to extract (e.g.) minutes, which you can
# just pass to mean().
data_interval$beep_interval %>%
  as.numeric("minutes") %>%
  abs() %>%
  mean(na.rm = TRUE)
#> [1] 247.6
Try the following. It pastes a date that increments every time the next hour is less than the previous one. Then difftime works as expected.
fun <- function(x){
  bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
  inx <- as.logical(cumsum(c(FALSE, diff(bt) < 0)))
  d <- rep(as.Date("2018-01-01"), length.out = length(bt))
  d[inx] <- d[inx] + 1
  beeptime <- as.POSIXct(paste(d, x))
  difftime(beeptime[-1], beeptime[1])
}
fun(newdata$beeptime)
#Time differences in hours
#[1] 2.716667 3.633333 8.833333 14.283333 23.100000 25.166667
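To see how the date increment works on the sample data: the extracted hours are 8, 11, 12, 17, 22, 7, 9, and the cumulative sum flags every row after the overnight drop in the hour sequence:
bt <- c(8, 11, 12, 17, 22, 7, 9) # hours pulled out of newdata$beeptime
cumsum(c(FALSE, diff(bt) < 0)) # 0 0 0 0 0 1 1: rows 6-7 get date + 1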
Data.
newdata <- structure(list(
  time_point = 1:7,
  beeptime = structure(1:7, .Label = c("08:30", "11:13", "12:08", "17:20",
                                       "22:47", "7:36", "9:40"), class = "factor")),
  class = "data.frame", row.names = c(NA, -7L))
Edit.
I believe I misunderstood the question. The OP does not want differences between the first hour and all the others; what is needed is differences that restart from zero every night.
If this is the case, the following function will do it.
fun2 <- function(x){
  bt <- as.integer(sub("(^\\d{1,2}):.*", "\\1", x))
  f <- cumsum(c(FALSE, diff(bt) < 0))
  d <- rep(as.Date("2018-01-01"), length.out = length(bt))
  bt <- as.POSIXct(paste(d, x))
  res <- sapply(split(bt, f), function(b) c(0, difftime(b[-1], b[1])))
  unname(unlist(res))
}
fun2(newdata$beeptime)
#[1] 0.000000 2.716667 3.633333 8.833333 14.283333 0.000000 2.066667
Another approach could be to convert beeptime to an offset (in seconds) from midnight using the lubridate package.
We can then write functions to calculate the time difference excluding night time (22:30 - 7:30).
Before we start on the solution, let's look at the offsets in seconds from midnight for 7:30 and 22:30.
library(lubridate)
as.numeric(seconds(hm("7:30")))
# [1] 27000
as.numeric(seconds(hm("22:30")))
# [1] 81000
I have written two functions to calculate the difference between two times:
# Checks an individual time and shifts it to the night boundary,
# so that time overnight can be excluded.
checkNightBoundry <- function(val){
  if(val < 27000){
    val = 27000
  } else if(val > 81000) {
    val = 81000
  }
  val
}
# Arguments are offsets from midnight in seconds.
# Calculate the difference between two times, excluding night time.
calcDifftime <- function(currVal, prevVal){
  diffTime <- 0
  currVal = checkNightBoundry(currVal)
  prevVal = checkNightBoundry(prevVal)
  if(currVal > prevVal){
    diffTime = currVal - prevVal
  } else if(currVal < prevVal){
    diffTime = (81000 - prevVal) + (currVal - 27000)
  }
  diffTime
}
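As a quick sanity check of the overnight branch (row 6 of the output below, 22:47 followed by 7:36): the previous offset 82020 is clamped to 81000, so only the morning portion counts:
calcDifftime(27360, 82020) / 3600 # (81000 - 81000) + (27360 - 27000) = 360 sec
# [1] 0.1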
Now, use above functions:
library(dplyr)
library(lubridate)
df %>%
  mutate(beeptimeOffset = as.numeric(seconds(hm(beeptime)))) %>%
  mutate(diffTime = mapply(calcDifftime, beeptimeOffset,
                           lag(beeptimeOffset, default = first(beeptimeOffset))) / 3600)
# timepoint beeptime beeptimeOffset(sec) diffTime(hrs)
# 1 1 08:30 30600 0.0000000
# 2 2 11:13 40380 2.7166667
# 3 3 12:08 43680 0.9166667
# 4 4 17:20 62400 5.2000000
# 5 5 22:47 82020 5.1666667
# 6 6 7:36 27360 0.1000000
# 7 7 9:40 34800 2.0666667
Data:
df <- read.table(text =
"timepoint beeptime
1 08:30
2 11:13
3 12:08
4 17:20
5 22:47
6 7:36
7 9:40",
header = TRUE, stringsAsFactors = FALSE)

Compare date intervals within the same data frame

I have searched around and found similar questions, but I can't make it work for my data.
I have a data frame with start and end dates, as well as several other factors. Ideally, the start date of a row should come after the end date of any previous row, but the data has duplicated starts or ends, and sometimes the date intervals overlap.
I tried to make a reproducible example:
df = data.frame(
  start = c("2018/04/15 9:00:00", "2018/04/15 9:00:00", "2018/04/16 10:20:00",
            "2018/04/16 15:30:00", "2018/04/17 12:40:00", "2018/04/17 18:50:00"),
  end = c("2018/04/16 8:00:00", "2018/04/16 7:10:00", "2018/04/17 18:20:00",
          "2018/04/16 16:30:00", "2018/04/17 16:40:00", "2018/04/17 19:50:00"),
  value = c(10, 15, 11, 13, 14, 12))
I was able to remove the duplicated start or end dates, but I can't remove the overlapping intervals. I want to create a loop that "cleans" every interval contained within any larger interval, so the result looks like this:
result = df[c(1,3,6),]
I thought I could make a loop that would "clean" both duplicates and overlapping intervals, but I can't make it work.
Any suggestions?
The data.table package is suited to this kind of problem, with the overlapping-join function foverlaps (inspired by the findOverlaps function from the Bioconductor package IRanges) followed by an anti-join (data.table syntax: B[!A, on]) to remove those inner intervals.
library(data.table)
cols <- c("start", "end")
setDT(df)
df[, (cols) := lapply(.SD, function(x) as.POSIXct(x, format="%Y/%m/%d %H:%M:%S")), .SDcols=cols]
setkeyv(df, cols)
anti <- foverlaps(df, df, type="within")[start!=i.start | end!=i.end | value!=i.value]
df[!anti, on=.(start=i.start, end=i.end, value=i.value)]
# start end value
# 1: 2018-04-15 09:00:00 2018-04-16 08:00:00 10
# 2: 2018-04-16 10:20:00 2018-04-17 18:20:00 11
# 3: 2018-04-17 18:50:00 2018-04-17 19:50:00 12
An alternative approach is to use %within% from the lubridate package:
library(lubridate)
# transform characters to dates
start_time <- as_datetime(df[ , "start"], tz = "UTC")
end_time <- as_datetime(df[ , "end"], tz = "UTC")
# construct intervals
start_end_intrvls <- interval(start_time, end_time)
# find indices of the non-within intervals
not_within <- !(sapply(FUN = function(i) any(start_end_intrvls[i] %within% start_end_intrvls[-i]),
                       X = seq(along.with = df[ , "start"])))
df[not_within, ]
# start end value
# 1 2018/04/15 9:00:00 2018/04/16 8:00:00 10
# 3 2018/04/16 10:20:00 2018/04/17 18:20:00 11
# 6 2018/04/17 18:50:00 2018/04/17 19:50:00 12
Update
The as_datetime() function causes an error when applied to a tibble:
as_datetime(tibble("2018/04/15 9:00:00"), tz = "UTC")
Error in as.POSIXct.default(x) :
do not know how to convert 'x' to class “POSIXct”
The solution above can be modified to resolve this issue by substituting as.POSIXlt() for as_datetime():
df_tibble <- tibble(
  start = c("2018/04/15 9:00:00", "2018/04/15 9:00:00", "2018/04/16 10:20:00",
            "2018/04/16 15:30:00", "2018/04/17 12:40:00", "2018/04/17 18:50:00"),
  end = c("2018/04/16 8:00:00", "2018/04/16 7:10:00", "2018/04/17 18:20:00",
          "2018/04/16 16:30:00", "2018/04/17 16:40:00", "2018/04/17 19:50:00"),
  value = c(10, 15, 11, 13, 14, 12))
start_time_lst <- lapply(FUN = function(i) as.POSIXlt(as.character(df_tibble[i, "start"]), tz = "UTC"),
                         X = seq(along.with = unlist(df_tibble[ , "start"])))
end_time_lst <- lapply(FUN = function(i) as.POSIXlt(as.character(df_tibble[i, "end"]), tz = "UTC"),
                       X = seq(along.with = unlist(df_tibble[ , "end"])))
start_end_intrvls <- lapply(FUN = function(i) interval(start_time_lst[[i]], end_time_lst[[i]]),
                            X = seq(along.with = unlist(df_tibble[ , "start"])))
not_within <- sapply(FUN = function(i) !(any(unlist(Map(`%within%`, start_end_intrvls[[i]],
                                                        start_end_intrvls[-i])))),
                     X = seq(along.with = unlist(df_tibble[ , "start"])))
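That said, a lighter workaround may be enough: the error comes, I believe, from df_tibble[ , "start"] returning a one-column tibble instead of a character vector, so extracting the column with [[ lets the original as_datetime() code run unchanged. A sketch:
# [[ returns a plain character vector, which as_datetime() accepts
start_time <- as_datetime(df_tibble[["start"]], tz = "UTC")
end_time <- as_datetime(df_tibble[["end"]], tz = "UTC")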

Expanding R Matrix on Date

I have the following R matrix:
Date MyVal
2016 1
2017 2
2018 3
....
2026 10
What I want to do is "blow it up" so that it goes like this (where monthly values are linearly interpolated):
Date MyVal
01/01/2016 1
02/01/2016 ..
....
01/01/2017 2
....
01/01/2026 10
I realize I can easily generate the sequence using:
DateVec <- seq(as.Date(paste(minYear, "/01/01", sep = "")),
               as.Date(paste(maxYear, "/01/01", sep = "")), by = "month")
And I can use that to make a large matrix and then fill things in using a for loop over DateVec, but I wonder if there's a more elegant R way to do this?
You can use stats::approx:
library(stats)
ipc <- approx(df$Date, df$MyVal, xout = DateVec,
              rule = 1, method = "linear", ties = mean)
You probably need to first convert the data in your original data frame to include a month and day, and to be in POSIXct or Date format.
Based on what you provided, this works:
# Make the reference data-frame for interpolation:
DateVec <- seq(min(df$Date, na.rm = TRUE),
               max(df$Date, na.rm = TRUE), by = "month")
# Interpolation:
intrpltd_df <- approx(df$Date, df$MyVal, xout = DateVec,
                      rule = 1, method = "linear", ties = mean)
# x y
# 1 2016-01-01 1.000000
# 2 2016-02-01 1.084699
# 3 2016-03-01 1.163934
# 4 2016-04-01 1.248634
# 5 2016-05-01 1.330601
# 6 2016-06-01 1.415301
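One detail worth noting: approx() returns a plain list with components x and y, so to get the data frame printed above you would wrap the call in as.data.frame(). A sketch:
# as.data.frame() turns the list(x, y) into the two-column frame shown above
intrpltd_df <- as.data.frame(approx(df$Date, df$MyVal, xout = DateVec,
                                    rule = 1, method = "linear", ties = mean))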
Data:
# reproducing the data-frame:
Date <- seq(2016, 2026)
MyVal <- 1:11
Date <- data.frame(as.Date(paste0(Date, "/01/01"))) # yyyy-mm-dd format
df <- cbind(Date, MyVal)
df <- as.data.frame(df)
colnames(df) <- c("Date", "MyVal") # changing column names
