Strange behaviour while accessing elements in an XTS - r

I have an XTS that is part of a list returns$sig and from that XTS, I pull out a set of elements based on some conditions and store the Index in a variable tstart.
> tstart <- index(returns$sig[which(returns$sig != lag(returns$sig,1) & returns$sig != 0)])
> length(tstart)
[1] 599
When I try to access the returns$sig XTS again with the dates in tstart, I get an XTS with a different length:
> length(returns$sig[tstart])
[1] 478
It should return something with length 599. If I try and access the XTS in a different way, I do get something of the same length:
> length(returns$sig[match(tstart,index(returns$sig))])
[1] 599
Spent hours on this and haven't found a resolution. Is there something obvious that I am doing wrong? And to make matters worse, I swear that length(returns$sig[tstart]) returned 599 yesterday and everything was working fine.

Sorry for the poorly formed question. I couldn't reproduce the error with a short example and didn't want to post all my code. I have finally figured out the issue. It seems to be related to a bug that some people (myself included) have been reporting with XTS. All I needed to do was specify a timezone for my system with Sys.setenv(TZ = "GMT").
For those that are interested, I am using xts_0.8-8.
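A minimal sketch of the fix described above, plus an illustration of why the time zone can matter when subsetting by date. The illustration is my own and is not taken from the xts bug reports, so whether this is exactly what went wrong inside xts is an assumption. It uses base R only and assumes the "America/New_York" zone exists in the system's time zone database.

```r
# Set the session time zone as described above and confirm it took effect
Sys.setenv(TZ = "GMT")
Sys.getenv("TZ")

# Illustration only: one instant can fall on two different calendar dates
x <- as.POSIXct("2012-01-01 00:30:00", tz = "UTC")
d_utc <- as.Date(x, tz = "UTC")
d_ny  <- as.Date(x, tz = "America/New_York")
format(d_utc)
format(d_ny)
```

If an index is built under one time zone and looked up under another, some dates may no longer match, which would be consistent with the shorter result seen above.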

Related

Accessing API with for-loop randomly has encoding error, which breaks loop in R

I'm trying to access an API from iNaturalist to download some citizen science data. I'm using the package rinat to get this done (see vignette). The loop below is, essentially, pulling all observations for one species, in one state, in one year iteratively on a per-month basis, then summing the number of observations for that year (input parameters subset from my actual script for convenience).
require(rinat)
state_ids <- c(18, 14)
bird_ids <- c(14886, 1409)
months <- 1:12
final_nums <- vector()
for (i in seq_along(state_ids)) {
  total_count <- vector()
  for (j in seq_along(months)) {
    # one species, one state, one month of 2019
    monthly <- get_inat_obs(place_id = state_ids[i],
                            taxon_id = bird_ids[i],
                            year = 2019,
                            month = months[j])
    # accumulate this month's observation count
    total_count <- append(total_count, length(monthly$scientific_name))
    print(paste("done with month", months[j], "in state", state_ids[i]))
  }
  final_nums <- append(final_nums, sum(total_count))
  print(paste("done with state", state_ids[i]))
}
Occasionally, and seemingly randomly, I get the following error:
No encoding supplied: defaulting to UTF-8.
Error in if (!x$headers$`content-type` == "text/csv; charset=utf-8") { :
argument is of length zero
This ends up breaking the loop or makes the loop run without actually pulling any real data. Is this an issue with my script, or the API, or something else? I've tried manually supplying encoding information to the get_inat_obs() function, but it doesn't accept that as an argument. Thank you in advance!
I don't believe this is an error in your script. The issue is most likely with the API.
The error "argument is of length zero" is common when the condition passed to if() has length zero. For example:
if(logical(0) == "TEST") print("WORKED!!")
#Error in if (logical(0) == "TEST") print("WORKED!!") :
# argument is of length zero
I did a few greps on their source code to see where this if statement is, and it seems to be within inat_handle, line 211 in get_inat_obs.R.
This would suggest that the authors did not expect
!x$headers$`content-type` == 'text/csv; charset=utf-8'
to evaluate to logical(0), but more specifically
x$headers$`content-type`
to be NULL.
I would suggest making a bug report on their GitHub and recommend they change the specified line to:
if(is.null(x$headers$`content-type`) || !x$headers$`content-type` == 'text/csv; charset=utf-8'){
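A small base R sketch of why the guarded condition behaves differently. The object x here is a hypothetical stand-in for a response whose headers have no content-type entry. It is not a real rinat or httr response.

```r
# Hypothetical stand-in for a response with no content-type header
x <- list(headers = list())

# Unguarded condition: comparing NULL yields logical(0), which if() rejects
unguarded <- !x$headers$`content-type` == "text/csv; charset=utf-8"
length(unguarded)

# Guarded condition: is.null() is evaluated first and || short-circuits,
# so the comparison on NULL is never reached
guarded <- is.null(x$headers$`content-type`) ||
  !x$headers$`content-type` == "text/csv; charset=utf-8"
guarded
```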
A bug report is usually better received if it includes a reproducible example.
Also, you could make this change yourself locally by cloning the git repository, editing the file, rebuilding the package, and then confirming that you no longer get the error in your code.
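Until the package is patched, one possible workaround is to catch the error and retry, so that a single bad response does not break the loop. This is only a sketch: with_retry and flaky are hypothetical names introduced here, and flaky stands in for a call such as get_inat_obs(...) so the example runs without network access.

```r
# Hypothetical helper: call fun() up to `attempts` times,
# returning NULL if every attempt raises an error
with_retry <- function(fun, attempts = 3, wait = 0) {
  for (k in seq_len(attempts)) {
    result <- tryCatch(fun(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    Sys.sleep(wait)  # pause before the next attempt
  }
  NULL
}

# Stand-in for a flaky API call: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("argument is of length zero")
  data.frame(scientific_name = c("a", "b"))
}

monthly <- with_retry(flaky)
# Record NA rather than a misleading zero when all attempts fail
n_obs <- if (is.null(monthly)) NA_integer_ else nrow(monthly)
n_obs
```

In the loop above, the call would become something like with_retry(function() get_inat_obs(...)), with a non-zero wait to be polite to the API.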

Automate Response at Prompt in R interactive

Please see below my reference to a previous question asked along these lines.
I am running the library taxize in R. Taxize includes a function for getting a stable number associated with a scientific name, get_tsn().
I can run this in interactive mode or non-interactive mode so that I am either prompted or not, respectively, to choose among multiple hits.
Interactive:
> tax.num <- get_tsn("Acer rubrum", ask=TRUE)
Retrieving data for taxon 'Acer rubrum'
     tsn                      target      commonNames    nameUsage
1  28728                 Acer rubrum        red maple     accepted
2  28730 Acer rubrum ssp. drummondii               NA not accepted
3 526853 Acer rubrum var. drummondii Drummond's maple     accepted
...
More than one TSN found for taxon 'Acer rubrum'!
Enter rownumber of taxon (other inputs will return 'NA'):
Non-interactive:
> tax.num <- get_tsn("Acer rubrum", ask=FALSE)
Retrieving data for taxon 'Acer rubrum'
Warning message:
> 1 result; no direct match found
I need to run this library in interactive mode so that I do not get an empty result when there is more than one match. However, babysitting this script is totally unrealistic for the size of my data, which are in the millions of scientific names. Thus, I want to automate a response to the prompt so that the answer is always 1. This will be the right answer for probably 99% of cases and will ultimately still lead to the right answer downstream in 100% of cases for reasons that are probably beyond the scope of this question.
Thus, how can I automate the response to always be 1?
I looked at this question and tried modifying my code accordingly.
options(httr_oauth_cache=T)
tax.num <- get_tsn("Acer rubrum",ask=T)
However, this gave the same result shown for interactive mode above.
Your help is appreciated.
UPDATE: Ignore below. Obviously Nathan Werth posted the best answer in a comment above.
tax.num <- get_tsn_(searchterm = "Acer rubrum", rows = 1)
works wonderfully!
...
I decided to modify the source code to handle this. I suspect that there is a more desirable solution, but this one meets my needs.
Thus, in the file get_tsn.R from the source, I replaced the following block of code
# prompt
message("\n\n")
print(tsn_df)
message("\nMore than one TSN found for taxon '", x, "'!\n
Enter rownumber of taxon (other inputs will return 'NA'):\n")
# prompt
take <- scan(n = 1, quiet = TRUE, what = 'raw')
with
take <- 1
I could also have deleted the other bits that echo to the screen, which are now unnecessary and no longer accurate.
The revised function, which I tested using trace("get_tsn",edit=TRUE), returns as follows:
> print(tax.num)
[1] "28728"
attr(,"match")
[1] "found"
attr(,"multiple_matches")
[1] TRUE
attr(,"pattern_match")
[1] FALSE
attr(,"uri")
[1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=28728"
attr(,"class")
[1] "tsn"
I will recompile and install it on Linux now with the edit for use with this particular project.
I still welcome other, better answers.

window function R code

Fine people of Stack Overflow, I have become stuck on a rather simple part of my program and was wondering if you could help me.
library(nonlinearTseries)
tt<-c(0,500,1000)
mm<-rep(0,2)
for (j in 1:2){mm[j]=estimateEmbeddingDim(window(rnorm(1000), start=tt[j],end=tt[j+1]), number.points=(tt[j+1]-tt[j]),do.plot=FALSE)}
Warning message:
In window.default(rnorm(1000), start = tt[j], end = tt[j + 1]) :
'start' value not changed
If I plug in the values directly (tt[1], tt[2], tt[3]), it works but I also get a warning
estimateEmbeddingDim(window(rnorm(1000), start=tt[1],end=tt[2]), number.points=(tt[2]-tt[1]),do.plot=FALSE)
[1] 9
Warning message:
In window.default(rnorm(1000), start = tt[1], end = tt[2]) :
'start' value not changed
Thanks, Matt.
The problem seems to be with this call:
window(rnorm(1000), start=tt[j],end=tt[j+1])
First of all, window is only meant to be used with a time series object (class=="ts"). In this case, rnorm(1000) simply returns a numeric vector; there are no dates associated with this object. So I'm not sure what you think this function does. Did you only want to extract the values that were between 0-500 and 500-1000? If so, that seems a bit odd, because with a standard normal variable the maximum of 1000 samples isn't likely to be much over 4, let alone 500.
So be sure to use a proper "ts" object with dates and everything to get this to work.
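A minimal sketch of what that could look like, assuming the intent was to split 1000 observations into two halves by position. The time index here simply runs from 1 to 1000. My guess is that the warning comes from start = 0 lying before the first index, which is why the breakpoints below start at 1 rather than 0.

```r
set.seed(1)
# Time series with index 1, 2, ..., 1000
x <- ts(rnorm(1000), start = 1, frequency = 1)

tt <- c(1, 500, 1000)
# Observations 1..500 and 501..1000, selected by time index
first_half  <- window(x, start = tt[1], end = tt[2])
second_half <- window(x, start = tt[2] + 1, end = tt[3])

length(first_half)
length(second_half)
```

Each piece could then be passed to estimateEmbeddingDim() in place of the window(rnorm(1000), ...) call.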

`data.table` error: "reorder received irregular lengthed list" in setkey

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt)]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run :
sapply(my.dt, length)
I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.
If so, there is this bug fix in v1.8.1 :
o rbind() of DT with an irregular list() now recycles the list items
correctly, #2003. Test added.
Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
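As a sketch of the check suggested above, the following uses a plain list as a stand-in for my.dt, so it runs without data.table. It reports which columns differ in length from the first one. The names cols, lens and bad are just for illustration.

```r
# Plain list standing in for a table whose columns have unequal lengths
cols <- list(a = 6:1, b = 4:1, c = letters[1:6])

# Length of every column
lens <- sapply(cols, length)
lens

# Names of columns whose length differs from the first column
bad <- names(lens)[lens != lens[1]]
bad
```

Running the same sapply(..., length) after each step that builds the table should show where the irregular column first appears.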
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)
Error in setkeyv(x, cols, verbose = verbose) :
Column 2 is length 4 which differs from length of column 1 (6). Invalid
data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
not already reported and fixed, please report to datatable-help.
NB: This method of creating a data.table is not recommended because it lets you create an invalid data.table. Use it only if you are really sure the list is regular and you really do need the speed (i.e. you want to avoid the checks that as.data.table() and data.table() do), or if you need to demonstrate an invalid data.table, as I'm doing here.

Can't get end value of an IRange in R/Bioconductor

I am new to the IRanges package and am having trouble getting the end value of an IRange. I am able to get the start and width values with no problem, which has me a bit baffled, and my case/spelling of end match the header line. Has anyone else run into this or can please spot what I am doing wrong? Thanks and it is much appreciated!
library(IRanges)
> test=IRanges(100645,100664)
> test
IRanges of length 1
start end width
[1] 100645 100664 20
> test@start
[1] 100645
> test@width
[1] 20
> test@end
Error: no slot of name "end" for this object of class "IRanges"
The easiest way to access the fields of an IRanges object is to use the accessor functions start(), end() and width(). Each returns a vector with all the elements of the corresponding column.
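The error in the question is consistent with the end value being computed from start and width rather than stored as a slot of its own. The toy S4 class below illustrates that idea in base R. ToyRange and toy_end are hypothetical names and not part of IRanges, and the real class has more slots than this.

```r
library(methods)

# Hypothetical class that stores only start and width, as the error suggests
setClass("ToyRange", representation(start = "integer", width = "integer"))

# Accessor that derives the end instead of reading a slot
toy_end <- function(x) x@start + x@width - 1L

r <- new("ToyRange", start = 100645L, width = 20L)
r@start
r@width
toy_end(r)
```

With the real package, the equivalent calls would be start(test), width(test) and end(test).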
No experience with the package, but based on ?"class:Ranges":
end(test$ranges[1])
It would also help in the future to provide reproducible sample data.
