How to trace where the Error occurs when executing a `mapply()`? - r

I have a data.table of 600,000 rows and execute the following command on it:
ranges <- mapply(function(mi, ma) {seq(from=mi, to=ma, by="days")}, mi=Moves$Start, ma=Moves$End)
I get the following error message after a while:
Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument
I have tested my code with a smaller dataset and that seems to be working fine. This leads me to think that the error message is the result of the values in the dataset. Can anybody recommend an efficient way to trace the problem row(s) in the data.table? Needless to say, manually checking 600k rows is a bit too much.
Your suggestions for finding the problem rows in the data.table are appreciated!

The obvious solution is to turn the anonymous function into a first class, fully named function, and then you can debug the function. Or turn on the recover option and then you can step into the evaluation frames for the current stack and see the state of the variables at the point the error was raised.
myFun <- function(mi, ma) {
seq(from=mi, to=ma, by="days")
}
gets you a named function, which you can debug via
debug(myFun)
or
debugonce(myFun)
To turn on error recovery do
op <- options(error = recover)
(you can reset that afterwards with options(op) or options(error = NULL))
In this case I suspect that mi is greater than ma:
> myFun(Sys.Date(), Sys.Date()-1)
Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument
so you could alter myFun to see if that is the case:
myFun <- function(mi, ma) {
if(mi > ma)
stop("`mi` is greater than `ma`")
seq(from=mi, to=ma, by="days")
}
That way you get a more informative error message.
If that fails I'd use options(error = recover) and then drop into the evaluation call corresponding to the function and see what the values of mi and ma are.
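With 600,000 rows it can also help to let the loop finish and collect the failing rows afterwards. The following is a minimal sketch of that idea, not part of the original answer: the three-row Moves table is hypothetical toy data, and safe_seq is an illustrative name for a wrapper that returns NULL where seq() raises an error.

```r
# Hypothetical toy data standing in for the real 600,000-row table
Moves <- data.frame(
  Start = as.Date(c("2017-04-17", "2018-03-01", "2019-04-01")),
  End   = as.Date(c("2017-06-23", "2018-02-14", "2018-04-24"))
)

# Illustrative wrapper: return NULL instead of stopping on an error
safe_seq <- function(mi, ma) {
  tryCatch(
    seq(from = mi, to = ma, by = "days"),
    error = function(e) NULL
  )
}

# SIMPLIFY = FALSE keeps the result as a list with one element per row
ranges <- mapply(safe_seq, mi = Moves$Start, ma = Moves$End, SIMPLIFY = FALSE)

# Indices of the rows for which no sequence could be built
bad_rows <- which(vapply(ranges, is.null, logical(1)))
bad_rows
Moves[bad_rows, ]
```

Inspecting Moves[bad_rows, ] then shows the offending Start and End values directly, without stepping through the debugger.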

Overview
seq.Date()'s error message is trying to tell you that a date in Moves$End (e.g. February 14, 2018) occurs before the corresponding date in Moves$Start (e.g. March 1, 2018). Because by = "day" steps forward in time, seq.Date() requires each date in from to be on or before the matching date in to; otherwise the error stops the function from proceeding.
To identify where this occurs, use which() to identify which dates in Moves$End are less than Moves$Start. From there, update those dates so that they occur after Moves$Start.
# load necessary data
Moves <- data.frame( Start = as.Date( x = c("2017-04-17", "2018-03-01", "2019-04-01") )
, End = as.Date( x = c("2017-06-23", "2018-02-14", "2018-04-24") )
, stringsAsFactors = FALSE )
# try to create a sequence of dates
date.ranges <-
mapply( FUN = function( mi, ma )
seq.Date( from = mi
, to = ma
, by = "day" )
, Moves$Start
, Moves$End
, SIMPLIFY = FALSE )
# identify the instance
# where the End date occurs
# before the Start date
wrong.end.date <-
which( Moves$End < Moves$Start )
# view results
wrong.end.date
# [1] 2 3
# correct those End Dates
# so that they occur
# after the Start date
Moves$End[ wrong.end.date ] <-
as.Date( x = c("2019-02-14", "2019-04-24") )
# rerun the mapply() function
date.ranges <-
mapply( FUN = function( mi, ma )
seq.Date( from = mi
, to = ma
, by = "day" )
, Moves$Start
, Moves$End
, SIMPLIFY = FALSE )
# end of script #

Related

Why does rasterToPoints generate an error on first call but not second?

I have some code that loops over a list of study IDs (ids) and turns them into separate polygons/spatial points. On the first execution of the loop it produces the following error:
Error in (function (x) : attempt to apply non-function
This is from the raster::rasterToPoints function. I've looked at the examples in the help section for this function and passing fun=NULL seems to be an acceptable method (filters out all NA values). All the values are equal to 1 anyways so I tried passing a simple function like it suggests such as function(x){x==1}. When this didn't work, I also tried to just suppress the error message but without any luck using try() or tryCatch().
Main questions:
1. Why does this produce an error at all?
2. Why does it only display the error on the first run through the loop?
Reproducible example:
library(ggplot2)
library(raster)
library(sf)
library(dplyr)
pacific <- map_data("world2")
pac_mod <- pacific
coordinates(pac_mod) <- ~long+lat
proj4string(pac_mod) <- CRS("+init=epsg:4326")
pac_mod2 <- spTransform(pac_mod, CRS("+init=epsg:4326"))
pac_rast <- raster(pac_mod2, resolution=0.5)
values(pac_rast) <- 1
all_diet_density_samples <- data.frame(
lat_min = c(35, 35),
lat_max = c(65, 65),
lon_min = c(140, 180),
lon_max = c(180, 235),
sample_replicates = c(38, 278),
id= c(1,2)
)
ids <- all_diet_density_samples$id
for (idnum in ids){
poly1 = all_diet_density_samples[idnum,]
pol = st_sfc(st_polygon(list(cbind(c(poly1$lon_min, poly1$lon_min, poly1$lon_max, poly1$lon_max, poly1$lon_min), c(poly1$lat_min, poly1$lat_max, poly1$lat_max, poly1$lat_min, poly1$lat_min)))))
pol_sf = st_as_sf(pol)
x <- rasterize(pol_sf, pac_rast)
df1 <- raster::rasterToPoints(x, fun=NULL, spatial=FALSE) #ERROR HERE
df2 <- as.data.frame(df1)
density_poly <- all_diet_density_samples %>% filter(id == idnum) %>% pull(sample_replicates)
df2$density <- density_poly
write.csv(df2, paste0("pol_", idnum, ".csv"))
}
Any help would be greatly appreciated!
These are error messages, but not errors in the strict sense as the script continues to run, and the results are not affected. They are related to garbage collection (removal from memory of objects that are no longer in use) and this makes it tricky to pinpoint what causes it (below you can see a slightly modified example that suggests another culprit), and why it does not always happen at the same spot.
Edit (Oct 2022)
These annoying messages
Error in x$.self$finalize() : attempt to apply non-function
Error in (function (x) : attempt to apply non-function
will disappear with the next release of Rcpp, which is planned for Jan 2023. You can also install the development version of Rcpp like this:
install.packages("Rcpp", repos="https://rcppcore.github.io/drat")

how to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Please can anyone tell me what I am doing wrong/ how to go around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
t(meta) yields a character matrix, and in R versions before 4.0.0 (you are on 3.6.1) as.data.frame() defaults to stringsAsFactors = TRUE. So when meta.ts is created, all strings are automatically converted to factors, and this includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blogpost for information on the stringsAsFactors parameter. Strings being converted to factors automatically is a common source of confusion.
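To see why a factor breaks a numeric time vector, here is a small stand-alone sketch. The day values are made up for illustration and the snippet does not use seqtime at all; it only shows that as.integer() on a factor returns the internal level codes rather than the numbers printed in the labels.

```r
# Made-up day labels stored as strings
day_chr <- c("15", "27", "9")

# What stringsAsFactors = TRUE effectively does to such a column
day_fac <- factor(day_chr)

# Level codes, not the original numbers
codes <- as.integer(day_fac)
codes

# Going through character first recovers the real values
values <- as.integer(as.character(day_fac))
values
```

This is why the fix above both sets stringsAsFactors = FALSE and converts the row to integer explicitly.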

Try statement issue R

I'm having an issue with this simple try() statement. All I want is for the loop to move on to the next game number if a file is missing or an error comes up. I'm new to R; I have data in certain folders, but some numbers in the range are missing.
library(readr)
season <- c(2014:2014)
gamenumbers <- c(20300:21271)
#############################################
# TEAM NULL DF's
season_teamstatsadj5v5 <- NULL
print('NUll DFs Created')
##############################################
for(game in gamenumbers){
try(
print('Start Team')
print(as.character(game))
###################################################################################################################
# team_stats_adj_5v5_df Bind
teamstatsadj5v5<-paste0('//LVS_DB/Users/Mike/Desktop/NHL_PBP/', season,'/', game, '/', game, '_teamstatsadj5v5.csv')
teamstatsadj5v5_df <- read_delim(teamstatsadj5v5, delim = ',')
season_teamstatsadj5v5 <- rbind(season_teamstatsadj5v5, teamstatsadj5v5_df)
)
}
Here is a corrected version of the code you shared. Multiple statements must be wrapped in braces to form a single expression, and the handler for the thrown exception should be passed as the error argument of the tryCatch() call:
library(readr)
season <- c(2014:2014)
gamenumbers <- c(20300:21271)
#############################################
# TEAM NULL DF's
season_teamstatsadj5v5 <- NULL
print('NUll DFs Created')
##############################################
for(game in gamenumbers){
tryCatch({
print('Start Team')
print(as.character(game))
###################################################################################################################
# team_stats_adj_5v5_df Bind
teamstatsadj5v5<-paste0('//LVS_DB/Users/Mike/Desktop/NHL_PBP/', season,'/', game, '/', game, '_teamstatsadj5v5.csv')
teamstatsadj5v5_df <- read_delim(teamstatsadj5v5, delim = ',')
season_teamstatsadj5v5 <- rbind(season_teamstatsadj5v5, teamstatsadj5v5_df)
}, error = function(e) {message(conditionMessage(e))})
}
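The same skip-on-error pattern can be tried without the network share. The sketch below is only an illustration under stated assumptions: it writes two small made-up CSV files to a temporary directory, uses base read.csv() instead of readr, and read_one is a hypothetical helper name.

```r
# Temporary folder with files for games 1 and 3; game 2 is missing on purpose
tmp_dir <- tempfile("games_")
dir.create(tmp_dir)
write.csv(data.frame(game = 1, goals = 3),
          file.path(tmp_dir, "1.csv"), row.names = FALSE)
write.csv(data.frame(game = 3, goals = 5),
          file.path(tmp_dir, "3.csv"), row.names = FALSE)

# Hypothetical helper: read one game's file, or return NULL if that fails
read_one <- function(game) {
  path <- file.path(tmp_dir, paste0(game, ".csv"))
  tryCatch(
    read.csv(path),
    error = function(e) {
      message("Skipping game ", game, ": ", conditionMessage(e))
      NULL
    }
  )
}

pieces <- lapply(1:3, read_one)

# Drop the failed reads, then bind the rest once at the end
pieces <- Filter(Negate(is.null), pieces)
season_df <- do.call(rbind, pieces)
season_df
```

Collecting the pieces in a list and binding once avoids growing the data frame inside the loop. R may additionally print a warning about the file it could not open.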

Error in getSymbols, must use auto.assign=TRUE for multiple symbol requests

I'm trying to write a program that would take a .csv file of stock symbols and test them against each other for things like cointegration. However, when I run the following code, quantmod gives me an error about having to use auto.assign = TRUE for multiple symbol requests.
getprices<-function(sym){
#get prices from last 7 years
prices<-getSymbols(sym, from = Sys.Date() - (365*7), auto.assign=FALSE)
#extract closing prices
prices<-Cl(prices)
return(prices)}
symbols1 <- c('TSN', 'MSFT')
symbols2 <- c('AAPL', 'NFLX')
container<-c()
addprices <- function(symbols1, symbols2){
for (i in symbols1){
for (g in symbols2){
i<-getprices(i)
g<-getprices(g)
container <- i+g
}
}
return(container)
}
When I run addprices(symbols1, symbols2) I get this error:
Error in getSymbols(sym, from = Sys.Date() - (365 * 7), auto.assign = FALSE) :
must use auto.assign=TRUE for multiple Symbols requests
Calls: addprices -> getprices -> getSymbols
I know when I do this I should get that error, and I believe this is what the error is referring to:
getSymbols(sym, from = Sys.Date() - (365 * 7), auto.assign = FALSE)
However, what I'm doing isn't that, so what gives? Any advice? Is there a work around?
I googled this and there really weren't any relevant questions/answers.
The problem is that you're overwriting the iterator i inside the g for loop. The first iteration of g works fine, but in the second iteration i is no longer symbols1[1]; it's the output from getprices(i), so getSymbols() is handed a whole price series instead of a single symbol.
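One way to avoid the clash is to keep the loop variables and the fetched prices under different names. The sketch below is an assumption-laden illustration: fake_prices is a hypothetical stand-in for getprices() that returns made-up numbers so the snippet runs without quantmod or a network connection, and the list keyed by symbol pair is just one possible way to store the results.

```r
# Hypothetical stand-in for getprices(): accepts exactly one symbol
fake_prices <- function(sym) {
  stopifnot(is.character(sym), length(sym) == 1L)
  rep(nchar(sym), 3)  # made-up "prices", for illustration only
}

symbols1 <- c("TSN", "MSFT")
symbols2 <- c("AAPL", "NFLX")

addprices <- function(symbols1, symbols2) {
  container <- list()
  for (s1 in symbols1) {
    p1 <- fake_prices(s1)        # s1 itself is never overwritten
    for (s2 in symbols2) {
      p2 <- fake_prices(s2)
      container[[paste(s1, s2, sep = "+")]] <- p1 + p2
    }
  }
  container
}

result <- addprices(symbols1, symbols2)
names(result)
```

With the real getprices() in place of fake_prices(), each call would again receive a single character symbol.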

NA Error min Function in R

I am running into an error using R's min() function.
wip <- read.csv("WIP-01-11-11.csv") # Get WIP CSV
wip <- transform(wip, End.Date=as.Date(wip$End.Date,format='%d-%b-%y', na.rm=T))
wip <- transform(wip, Start.Date=as.Date(wip$Start.Date,format='%d-%b-%y', na.rm=T))
wip2 <- transform(wip, duration=ifelse(
round((wip3$End.Date - wip3$Start.Date)/30, digits = 0)==0,
1,
round((wip3$End.Date - wip3$Start.Date)/30, digits = 0)))
# At this point, I get NAs
wip3 <- transform(wip2, monthsRec=min( (
(2011*12+11) - as.numeric(format(wip3$Start.Date, '%Y'))*12 +
as.numeric(format(wip3$Start.Date, '%m'))),
wip3$duration)
)
Why am I getting NAs in the "duration" calculation for wip2 when End.Date and Start.Date have no NAs?
Thanks,
wip3 = list()
wip3$Start.Date = as.Date('2011-01-01')
wip3$duration = 10
> min(((2011*12+11) - as.numeric(format(wip3$Start.Date, '%Y'))*12+as.numeric(format(wip3$Start.Date, '%m'))),wip3$duration)
[1] 10
Works fine for me. Do you have any NAs in your data? If so, you probably want to use the na.rm=T flag to min().
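For reference, a tiny made-up vector shows the effect of that flag:

```r
# Made-up values with one missing entry
x <- c(3, NA, 1)

with_na    <- min(x)                # NA, because one value is missing
without_na <- min(x, na.rm = TRUE)  # missing values are ignored
with_na
without_na
```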
I reproduced your problem with @John Colby's example by using the wrong case for wip3$start.Date:
wip3 = list()
wip3$start.Date = as.Date('2011-01-01')
wip3$duration = 10
min(((2011*12+11) - as.numeric(format(wip3$Start.Date, '%Y'))*12+as.numeric(format(wip3$Start.Date, '%m'))),wip3$duration)
Which produces
[1] NA
Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
I suspect that since you have wip3$duration, you probably have wip3$start.Date too, but you accessed it as wip3$Start.Date in your code. That returns NULL, which doesn't work well with the rest...
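A quick way to confirm a case mismatch like this is to compare the name you typed against names(). The sketch reuses the toy list from above; nothing in it comes from the original CSV.

```r
wip3 <- list(start.Date = as.Date("2011-01-01"), duration = 10)

# `$` is case-sensitive and silently returns NULL for a name that is not there
wrong_case <- wip3$Start.Date
is.null(wrong_case)

# Checking the available names makes the typo visible
names(wip3)
"Start.Date" %in% names(wip3)
"start.Date" %in% names(wip3)
```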
