Error in Sys.sleep(reset + 2) : invalid 'time' value - r

I am currently trying to download data using the search_tweets command from the rtweet package. I have a list of over 400 requests that I want to loop over. The code seems to run without problems, yet after a while this happens:
retry on rate limit...
waiting about 16 minutes... #which it then counts down#
Error in Sys.sleep(reset + 2) : invalid 'time' value
Searching for related questions, I only found this: https://github.com/ropensci/rtweet/issues/277. There it says that this issue has been solved in the latest rtweet version, but I am already using the latest R and rtweet versions.
Has someone experienced a similar issue? And how have you been able to solve it?
This is the code I am using. Please don't hesitate to tell me if a mistake in the code is causing the problem. I was wondering, for example, whether it is possible to include a condition that only runs the next request once the previous one has been fully downloaded (see the sketch after the code below).
for (x in seq_along(mdbs)) {
  mdbName <- mdbs[x]
  print(mdbName)
  # note the space before lang:de so the name and the operators stay separate search terms
  myMDBSearch <- paste0(mdbName, " lang:de -filter:retweets")
  print(myMDBSearch)
  req <- search_tweets(
    q = myMDBSearch,
    n = 100000,
    retryonratelimit = TRUE,
    token = bearer_token(),
    parse = TRUE
  )
  data_total <- rbind(data_total, req)
  print("sleeping 5 seconds")
  Sys.sleep(5)
}
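As a sketch of the idea mentioned above (not an answer from the original thread), each request could be wrapped in tryCatch() so that a failure such as the Sys.sleep() error is caught, logged, and skipped instead of aborting the whole loop; only functions already used in the question are assumed:
for (x in seq_along(mdbs)) {
  myMDBSearch <- paste0(mdbs[x], " lang:de -filter:retweets")
  req <- tryCatch(
    search_tweets(q = myMDBSearch, n = 100000, retryonratelimit = TRUE,
                  token = bearer_token(), parse = TRUE),
    error = function(e) {
      message("request for ", mdbs[x], " failed: ", conditionMessage(e))
      NULL  # return NULL so the loop can continue with the next name
    }
  )
  if (!is.null(req)) data_total <- rbind(data_total, req)
  Sys.sleep(5)  # short pause between requests
}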

Related

STRINGdb r environment; error in plot_network

I'm trying to use STRINGdb in R and I'm getting the following error when I try to plot the network:
Error in if (grepl("The document has moved", res)) { : argument is of length zero
code:
library(STRINGdb)
# specify organism
string_db <- STRINGdb$new(version = "10", species = 9606, score_threshold = 0)
filt_mapped <- string_db$map(filt, "GeneID", removeUnmappedRows = TRUE)
head(filt_mapped)
# (I have columns titled GeneID, logFC, FDR, STRING_id, with 156 rows)
filt_mapped_hits <- filt_mapped$STRING_id
head(filt_mapped_hits)
# (156 observations)
string_db$plot_network(filt_mapped_hits, add_link = FALSE)
Error in if (grepl("The document has moved", res)) { : argument is of length zero
You are using a version of Bioconductor, and by extension the STRINGdb package, that is quite a few years old.
If you update to the newest one, it will work. However, the updated package only supports the latest version of STRING (currently version 11), so the underlying network may change a bit.
The more detailed reason is this:
STRING's hardware infrastructure recently underwent major changes, which forced a different server setup.
All the old calls are now forwarded to a different URL; however, the cURL call, as it was implemented, does not follow our redirects, which breaks the STRINGdb package functionality.
We cannot update the old Bioconductor package, and our server setup can't really be changed.
That said, the fix for an old version is relatively simple.
In the STRINGdb library there is a script, "rstring.r", containing all the methods.
In there you'll find the "get_png" method. In it, replace this line:
urlStr = paste("http://string-db.org/version_", version, "/api/image/network", sep="" )
With this line:
urlStr = paste("http://version", version, ".string-db.org/api/image/network", sep="" )
Load the library again and it should create the PNG, as before.
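For reference, a quick check (not part of the original answer) of what the replacement line evaluates to for the version used in the question:
version <- "10"
urlStr <- paste("http://version", version, ".string-db.org/api/image/network", sep = "")
urlStr
# [1] "http://version10.string-db.org/api/image/network"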

blogdown::new_post not working with date=""

The function blogdown::new_post() recently stopped working for me with the parameter date="".
here's the line of code:
blogdown:::new_post("home", kind = "default-frontpage", open=F, date="", subdir="", ext = ".Rmd")
it gives this error:
Error in if (tryCatch(date > Sys.Date(), error = function(e) FALSE)) warning("The date of the post is in the future: ", : missing value where TRUE/FALSE needed
Someone else who works in the lab has tried to reproduce this error on their computer, but wasn't able to. All of my R packages are up to date, according to RStudio.
When I call Sys.Date(), it returns today's date as: "2020-11-29"
I can specify a date in new_post(), but it would require re-writing a substantial amount of our code, and it seems like this changes the automatically generated title of the post.
Could anyone suggest a next step?
Thanks a bunch!
Caleb
I just fixed this issue on Github, and you may try
remotes::install_github('rstudio/blogdown')
Thanks for the report!
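A minimal check after installing the patched version, assuming the same call as in the question:
remotes::install_github('rstudio/blogdown')
# re-run the original call; with the fix it should no longer hit the date comparison error
blogdown:::new_post("home", kind = "default-frontpage", open = FALSE,
                    date = "", subdir = "", ext = ".Rmd")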

R curl::has_internet() FALSE even though there is an internet connection

My problem arose when downloading data from EuroSTAT using the R package eurostat:
# Population data by NUTS3
pop_data <- subset(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"),
                   (age == "TOTAL") & (sex == "T") &
                   (nchar(trimws(geo)) == 5))[, c("time", "geo", "values")]
# Error in eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num") :
#   You have no internet connection, please reconnect!
Searching, I have found out that it is this statement (in the eurostat package code) that causes the problem:
if (!curl::has_internet()) { stop("You have no internet connection, please reconnect!") }
However, I do have an internet connection and can, for example, ping www.eurostat.eu.
I have tried curl::has_internet() on different computers, all with an internet connection. On some it works (returns TRUE), on others it doesn't.
I have talked with our IT department, and we checked whether it could be a firewall problem. Removing the firewall did not solve the problem.
Unfortunately, I am ignorant about network settings, so when trying to read the documentation for the curl package I am lost.
Downloading data from EuroSTAT using the command above has worked for at least the last two years; for me the problem arose at the start of 2020 (January 7).
I hope someone can help with this, as downloading population data from EuroSTAT is a mandatory part of much of my/our regular work.
In the special case of curl::has_internet, you don't need to modify the function to return a specific value. It has its own enclosing environment, from which it reads a state variable indicating whether a proxy connection exists. You can modify that state variable instead.
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
curl::has_internet() # will always be TRUE
# [1] TRUE
It's difficult to tell without knowing your settings but there are a couple of things to try. This issue has been noted and possibly addressed in a development version which you can install with
install.packages("https://github.com/jeroen/curl/archive/master.tar.gz", repos = NULL)
You could also try updating libcurl, the underlying C library for which the R package acts as an interface. The problem you describe seems to be more common with older versions of libcurl.
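To see which libcurl version your installation is built against, curl's own curl_version() can be checked (a quick diagnostic, not from the original answer):
curl::curl_version()$version   # libcurl version string
str(curl::curl_version())      # full build details, including supported protocols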
If all else fails, you could overwrite the curl::has_internet function like this:
remove_has_internet <- function() {
  unlockBinding(sym = "has_internet", asNamespace("curl"))
  assign("has_internet", function() return(TRUE), envir = asNamespace("curl"))
  lockBinding(sym = "has_internet", asNamespace("curl"))
}
Now if you run remove_has_internet(), any call to curl::has_internet() will return TRUE for the remainder of your R session. However, this will only work if other curl functionality is working properly with your network settings. If it isn't then you will get other strange errors and should abandon this approach.
If, for any reason, you want to restore the functionality of the original curl::has_internet without restarting an R session, you can do this:
restore_has_internet <- function() {
  unlockBinding(sym = "has_internet", asNamespace("curl"))
  # curl::nslookup is referenced explicitly so this works even when curl is not attached
  assign("has_internet",
         function() !is.null(curl::nslookup("r-project.org", error = FALSE)),
         envir = asNamespace("curl"))
  lockBinding(sym = "has_internet", asNamespace("curl"))
}
I just ran into this problem, so here's an additional solution blending both previous answers. It's reversible and checks whether we actually have internet, to avoid bigger problems later.
# old value
op <- get("has_internet_via_proxy", environment(curl::has_internet))
# check for internet
np <- !is.null(curl::nslookup("r-project.org", error = FALSE))
assign("has_internet_via_proxy", np, environment(curl::has_internet))
Within a function, this line can be added to automatically revert the process:
on.exit(assign("has_internet_via_proxy", op, environment(curl::has_internet)))
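Putting the pieces above together, a small helper could look like the following sketch; with_proxy_internet is a hypothetical name, not part of curl or eurostat, and it simply applies the override, registers the revert with on.exit(), and then evaluates the supplied expression:
with_proxy_internet <- function(expr) {
  # remember the old value and compute the real connectivity status
  op <- get("has_internet_via_proxy", environment(curl::has_internet))
  np <- !is.null(curl::nslookup("r-project.org", error = FALSE))
  assign("has_internet_via_proxy", np, environment(curl::has_internet))
  # revert automatically when the function exits
  on.exit(assign("has_internet_via_proxy", op, environment(curl::has_internet)))
  expr
}
# usage, e.g. wrapping the eurostat call from the question:
# pop_data <- with_proxy_internet(eurostat::get_eurostat("demo_r_pjangrp3", time_format = "num"))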

R - arules apriori. Error in length(obj) : Method length not implemented for class rules

I understand there is another question already for this...I am new and thus unable to comment on it. Additionally, I don't believe the question was answered.
Anyway, I am running apriori from the arules package.
I am using the following parameters:
testbasket_rules <- apriori(testbasket_txn, parameter = list(sup = 0.1, conf = 0.5, maxlen = 100))
I get 2 rules back, but also the error:
Error in length(obj) : Method length not implemented for class rules...
So I can't even inspect the 2 rules that were generated.
I can mess around with sup and conf and get more or fewer rules back, but I always get the length error.
I checked my max basket length and it is not more than 100, and you can see that I set maxlen to 100.
Does anybody have any ideas how to resolve this?
I shut down my computer when I went home; upon restarting the next day, opening up R, and trying the script again, I got the desired output. I'm not sure what exactly was causing the issue, perhaps something with packages not loading correctly, but it seems to be resolved now.
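If the "packages not loading correctly" hunch above is right, a minimal thing to try before a full restart (a sketch, not from the original answer) is reattaching arules so its S4 methods for the rules class are registered, then re-running the call from the question:
library(arules)  # registers length(), inspect(), etc. for the 'rules' class
testbasket_rules <- apriori(testbasket_txn,
                            parameter = list(sup = 0.1, conf = 0.5, maxlen = 100))
inspect(testbasket_rules)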

How do I close unused connections after read_html in R

I am quite new to R and am trying to access some information on the internet, but am having problems with connections that don't seem to be closing. I would really appreciate it if someone here could give me some advice...
Originally I wanted to use the WebChem package, which theoretically delivers everything I want, but when some of the output data is missing from the webpage, WebChem doesn't return any data from that page. To get around this, I have taken most of the code from the package but altered it slightly to fit my needs. This worked fine for about the first 150 usages, but now, although I have changed nothing, when I use the command read_html I get the warning message "closing unused connection 4 (http:...". Although this is only a warning message, read_html doesn't return anything after the warning is generated.
I have written simplified code, given below, which has the same problem.
Closing R completely (or even rebooting my PC) doesn't seem to make a difference; the warning message now appears the second time I use the code. I can run the queries one at a time, outside of the loop, with no problems, but as soon as I try the loop, the error occurs again on the 2nd iteration.
I have tried to vectorise the code, and again it returned the same error message.
I tried showConnections(all=TRUE), but only got connections 0-2 for stdin, stdout, stderr.
I have tried searching for ways to close the html connection, but I can't define the url as a con, and close(qurl) and close(ttt) also don't work. (They return the errors "no applicable method for 'close' applied to an object of class "character"" and "no applicable method for 'close' applied to an object of class "c('xml_document', 'xml_node')"", respectively.)
Does anybody know a way to close these connections so that they don't break my routine? Any suggestions would be very welcome. Thanks!
PS: I am using R version 3.3.0 with RStudio Version 0.99.902.
library(xml2)  # provides read_html(), xml_find_all(), xml_text()

CasNrs <- c("630-08-0", "463-49-0", "194-59-2", "86-74-8", "148-79-8")
tit <- character()
for (i in 1:length(CasNrs)) {
  CurrCasNr <- as.character(CasNrs[i])
  baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
  qurl <- paste0(baseurl, CurrCasNr, '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  ttt <- try(read_html(qurl), silent = TRUE)
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
}
After researching the topic I came up with the following solution:
url <- "https://website_example.com"
con <- url(url, "rb")   # open an explicit connection
html <- read_html(con)
close(con)              # close the connection yourself
# + whatever you want to do with the html, since it's already saved!
I haven't found a good answer for this problem. The best work-around that I came up with is to include the function below, with Secs = 3 or 4. I still don't know why the problem occurs or how to stop it without building in a large delay.
CatchupPause <- function(Secs) {
  Sys.sleep(Secs)         # pause to let the connection work
  closeAllConnections()   # close any connections still hanging around
  gc()                    # release the associated resources
}
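For example (not part of the original answer), dropped into the loop from the question, reusing the same variables defined there:
baseurl <- 'http://chem.sis.nlm.nih.gov/chemidplus/rn/'
tit <- character()
for (i in 1:length(CasNrs)) {
  qurl <- paste0(baseurl, CasNrs[i], '?DT_START_ROW=0&DT_ROWS_PER_PAGE=50')
  ttt <- try(read_html(qurl), silent = TRUE)
  tit[i] <- xml_text(xml_find_all(ttt, "//head/title"))
  CatchupPause(3)  # pause and close connections before the next request
}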
I found this post as I was running into the same problems when I tried to scrape multiple datasets in the same script. The script would get progressively slower and I feel it was due to the connections. Here is a simple loop that closes out all of the connections.
for (i in seq_along(df$URLs)) {
  # ... scrape df$URLs[i] here ...
  closeAllConnections()   # closeAllConnections() takes no arguments; call it once per iteration
}
