httr GET function running out of space when downloading a large file - r

i'm trying to download a file that's 1.1 gigabytes with httr but i'm hitting the following error:
x <- GET( extract.path )
Error in curlPerform(curl = handle$handle, .opts = curl_opts$values) :
cannot allocate more space: 1728053248 bytes
my C drive has 400GB free..
in the RCurl package, i see the maxfilesize and maxfilesize.large options when using getCurlOptionsConstants() but i don't understand if/how these might be passed to httr through config or set_config.. or if i need to switch over to RCurl for this.. and even if i do need to switch, will increasing the maximum filesize work?
here's my sessionInfo..
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.96-1.1 httr_0.2
loaded via a namespace (and not attached):
[1] digest_0.6.0 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.0
..and (this is not recommended, just because it will take you a while) if you want to reproduce my error, you can go to https://usa.ipums.org/usa-action/samples, register for a new account, choose the 2011 5-year acs extract, add about a hundred variables, and then wait for the extract to be ready. then edit the first three lines and run the code below. (again, not recommended)
your.email <- "email#address.com"
your.password <- "password"
extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"
require(httr)
values <-
list(
"login[email]" = your.email ,
"login[password]" = your.password ,
"login[is_for_login]" = 1
)
POST( "https://usa.ipums.org/usa-action/users/validate_login" , body = values )
GET( "https://usa.ipums.org/usa-action/extract_requests/download" , query = values )
# this line breaks
x <- GET( extract.path )

FYI - this has been added in the write_disk() control in httr:
https://github.com/hadley/httr/blob/master/man/write_disk.Rd

GET calls httr:::make_request this sets the curl options defined in config = list(). However it appears the writefunction otpion is hard coded in 'httr'
opts$writefunction <- getNativeSymbolInfo("R_curl_write_binary_data")$address
You will probably need to use RCurl and define an appropriate `writefunction'. The following
solution Create a C-level file handle in RCurl for writing downloaded files from #Martin Morgan appears to be the way to go.

Related

data.table `:=` call stuck

I updated all my packages today via the RStudio package feature, which I haven't done for a while. After doing so, a line using data.table's := operator got stock in an endless loop. Please find a reproducible example below:
fun <- function(x, N, shift=0) {
if (is.na(N==length(x))) {
return(NA)
}
if (N==length(x)) {
return(NA)
} else {
x[(N+1+shift):(length(x)+shift)]
}
}
library(data.table)
DT <- data.table(ID = rep(1:1000, each=100),
QTR = rep(1:100, times=1000),
vec = list(1:100))
#Stuck in endless calculation
DT[, vec2:=list(list(fun(x=unlist(vec), N=QTR, shift=-1))), by=list(ID,QTR)]
# Takes less than a second
DT[1:nrow(DT), vec2:=list(list(fun(x=unlist(vec), N=QTR, shift=-1))), by=list(ID,QTR)]
The first call keeps running and running for me within RStudio. If I select all rows in the second call (hence, in reality, not changing anything at all), it runs in less than a second.
My sessionInfo:
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.13.0
loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3
My questions:
Is this issue specific to my set up or is someone else able to reproduce it?
In the latter case, is it because my code is incorrect or is that a bug in a new data.table version? I notice a note from data.table about it using 2 threads since 1.13.0, but looking at ?getDTthreads()` it is not clear to me if this is related and what I could/should change to make it work again.
Glad the update resolved it!
Comment from the post:
In the change log https://rdatatable.gitlab.io/data.table/news/index.html#unreleased-data-table-v1-13-1-in-development-, bug fix 3
mentions a regression in performance with lists. Try upgrading to
1.13.1 perhaps. I have 1.12.8 and also can't replicate it

Displaying UTF-8 encoded characters in R

I am using the RODBC package to read data from SQL server. R is reading the Chinese characters as "?????"
I have passed the parameter DBMSencoding = "UTF-8" to the odbcConnect function.
Following is the sample code I am using:
Connection <- odbcConnect("abc", uid = "123", pwd = "123",
DBMSencoding = "UTF-8", readOnlyOptimize=T)
Var1 <- sqlQuery(Connection, query, errors = TRUE, stringsAsFactors=F)
May be I didn't pass the arguments the way I am supposed to?
sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RODBC_1.3-12
loaded via a namespace (and not attached):
[1] tools_3.2.3
odbcGetInfo(mainConnection)
DBMS_Name DBMS_Ver Driver_ODBC_Ver Data_Source_Name Driver_Name
"Microsoft SQL Server" "10.50.4000" "03.52" "SQLSRV32.DLL"
Driver_Ver ODBC_Ver Server_Name
"06.01.7601" "03.80.0000"
Check the database's character encoding:
select userenv('language') from dual;
SIMPLIFIED CHINESE_CHINA.AL32UTF8
Change your Environment Variable NLS_LANG before connecting to the database:
Sys.setenv(NLS_LANG="SIMPLIFIED CHINESE_CHINA.AL32UTF8")
Connection <- odbcConnect("abc", uid = "123", pwd = "123", DBMSencoding = "UTF-8", readOnlyOptimize=T)
R on Windows has a lot of problems displaying characters outside of ASCII, even though it is often faithfully representing them internally. There is a lot of information in this answer about why this is the case, and some simple diagnostics in this answer. First try plotting, like:
# first, make sure plotting Chinese works in general
# (i.e., you have an appropriate font)
hanzi <- "漢字"
plot(1, 1, type="n")
text(1, 1, hanzi)
If that works, replace the hanzi <- "漢字" line with your sql query line to get some Chinese text from your database into a string variable, and try plotting that. If it shows up on the plot, then the characters are being read fine and represented internally fine, and the problem is just displaying them in the console. If plotting worked for the "漢字" string variable but doesn't work for your SQL-extracted string, then at least you know that the problem is actually with the SQL part and not just with display in the console.
I got the same problem and successfully solved it. It was quite simple. Go to Control Panel --> Region and Language --> Administrative --> Change system locale --> Chinese.

Error: "could not find function "lengths"

While running:
reg_model = glmer(modeling1 ~ pTOT_VIOPT_CT + (1|STUDY_CODE_ALIAS),
data=data_CACZ, family=binomial)
I am getting the error:
Error in .fixupDimnames(.Object#Dimnames) :
could not find function "lengths"
Not sure what could be the possible reason.
FYI: I am putting below the sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcpp_0.11.5 lme4_1.1-10 Matrix_1.2-3 car_2.0-21 MASS_7.3-29
loaded via a namespace (and not attached):
[1] grid_3.0.2 lattice_0.20-29 minqa_1.2.4 nlme_3.1-118 nloptr_1.0.4 nnet_7.3-7 splines_3.0.2 tools_3.0.2
(From the maintainer of 'Matrix'): This is indeed really strange,
but differently than #nicola suggests.
It is okay to update Recommended packages such as Matrix without updating R. That's indeed one feature of these.
Matrix 1.2-3 is indeed the current version of Matrix on CRAN, which is good.
However, since Matrix 1.2-2, there is the following code in Matrix/R/zzz.R :
if(getRversion() >= "3.2.0") {
......
} else {
......
lengths <- function (x, use.names = TRUE) vapply(x, length, 1L, USE.NAMES = use.names)
}
But that code only works correctly for you, when the Matrix package is built from the source with a correct version of R, i.e. in your case with a version of R before 3.2.0.
This (correctly built) happens for all platforms but Windows (the broken by choice one).
If you really don't want (or can't easily) upgrade your version of R,
you have two other options:
downgrade your version of Matrix to the one that comes with your version of R. As you probably have installed the new Matrix replacing the old one, you need to get that and replace the newer version of Matrix. HOWEVER, I guess you would also have to downgrade lme4, and in the end you suffer too much.
Safer and in principle correct: Install the 'Rtools' collection for Windows (you need the one that matches your version of R !)
including C compiler etc, until the following works correctly
install.packages("Matrix", type = "source")
The work to do this may well be worth it. You will be able to install other R packages from their source, which is very useful anyway
(I would not want to live without this possibility if I had to work with Windows).

Using special characters in Rstudio

I am working with some special characters in Rstudio. It coverts them into plain letters.
print("Safarzyńska2013")
[1] "Safarzynska2013"
x <- "Māori"
x
[1] "Maori"
Is there any way to read in the exact original characters.
Following info might be helpful:
Rstudio default encoding is UTF-8
sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.1.1
This not an exclusively RStudio problem.
Typing print("Safarzyńska2013") on the console of RGui also converts them to plain letters. Running this code from an UTF-8 encoded Script in RGui returns [1] "Safarzy?ska2013".
I don't think that it is a good idea to type such special chars on the console. x <- "SomeString"; Encoding(x) returns "unknown" and that is probably the problem: R has no idea what encoding you are using on the console and probably has no chance to get your original encoding.
I put "Safarzyńska2013\nMāori\n" in a text file encoded with UTF-8. Then the following works fine:
tbl <- read.table('c:/test1.txt', encoding = 'UTF-8', stringsAsFactors = FALSE)
tbl[1,1]
tbl[2,1]
Encoding(tbl[1,1]) # returns "UTF-8"
If you really want to use the console, you probably will have to mask the special chars. In ?Encoding we find the following example to create a word with special chars:
x <- "fa\xE7ile"
Encoding(x)
Actually I don't know at the moment how to get these codes for your special chars and ?Encoding has also no hints...
Go to the label File of RStudio, them click on Save with encoding... , Choose Encoding
UTF-8 , Set as default encoding for source file and save.
Hope this helps

Problems using package 'minet' - could not find function mutinformation

When trying to run the sample code that is in the Minet paper / vignette I am experiencing a nunber of issues, e.g.
mim <- build.mim(discretize(syn.data), estimator)
Error in build.mim(dataset, estimator, disc, nbins):
could not find function "mutinformation"
I have also received other errors such as "unknown estimator" when trying methods prefixed with "mi." e.g. "mi.empirical."
I am running windows 8.1. Any help would be much appreciated!
EDIT 1: Additional Information
After playing around some more, the main problem I am having is when trying to use the discretize function like so:
> data(syn.data)
> disc <- "equalwidth"
> nbins <- sqrt(nrow(syn.data))
> ew.data <- discretize(syn.data, disc, nbins)
Error: could not find function "discretize"
This causes the same Error in all functions e.g. build.mim or minet which utilise discretize. I can run build.mim successfully without including discretize.
In addition, I am getting errors if I use minet (whilst excluding the discretize argument) with any of the mi.* estimation methods, e.g.
> res<-minet(syn.data,"mrnet","mi.empirical","equal width",10)
Error in build.mim(dataset, estimator, disc, nbins) :
could not find function "mutinformation"
However, running the same function with the "spearman" estimator works fine.
Edit 2: Output of sessionInfo()
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] minet_3.20.0
loaded via a namespace (and not attached):
[1] tools_3.1.0
discretize() is a function from the package infotheo that you can find in CRAN. There are some references to that package in the minet documentation. Maybe the authors of minet have moved some functionality to the infotheo package but since it is not a dependency it does not get installed automatically. It may be worth contacting the authors about this.
library(infotheo)
data(syn.data)
disc <- "equalwidth"
nbins <- sqrt(nrow(syn.data))
ew.data <- discretize(syn.data, disc, nbins)
The same applies for the multiinformation() function(). It is part of the infotheo package.

Resources