wget options to get output straight to R

I have a wget_string and commands made as follows:
wget_string <- paste("wget --user=" , u_name , " --password=", p_word, " ", my_urls,' -qO ', file_name, sep="")
system(wget_string)
readLines(file_name)
This works, but then I have to use readLines() to read the file into R. I would like to run the command so that the file is available directly in R, without being saved to the hard disk and then loaded from it.
I'm hoping to save resources by loading the file straight into R from the web. I can't use readLines() on the URL directly because the server is secured (it needs the user name and password).

The system() function has an intern = TRUE parameter, which captures the output of the command and returns it as a character vector. Combine it with the right wget options to print to stdout:
> wget_string="wget -qO- http://www.google.com"
> s = system(wget_string,intern=TRUE)
If the returned data is a CSV file, you can wrap it in textConnection() and feed that to read.csv().
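A minimal sketch of that CSV step, assuming the lines have already been captured with system(..., intern = TRUE). The sample vector s below is made up so the example runs without wget or a network connection. It also shows read.csv(text = ...), which does the textConnection() wrapping for you.

```r
# Sketch: turn CSV lines captured from wget's stdout into a data frame.
# `s` is simulated here; in practice it would come from something like
#   s <- system("wget -qO- <your-url>", intern = TRUE)
s <- c("id,value", "1,10", "2,20")

# Option 1: wrap the character vector in a textConnection()
con <- textConnection(s)
df1 <- read.csv(con)
close(con)  # read.csv() does not close a connection that was already open

# Option 2: read.csv() can take the lines directly via `text =`
df2 <- read.csv(text = s)

str(df1)
```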

If you use the -O - option of wget, you set the output to standard output (i.e., it is written directly to the screen). This way you can read directly from the output of the wget command.
E.g.
wget -O - http://www.address.com
This downloads the web page and prints it directly to standard output, so you can read it directly from the output of system(wget_string, intern = TRUE).
From the wget man page:
-O file
--output-document=file
The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If - is used as file, documents will be printed to standard output, disabling link conversion. (Use ./- to print to a file literally named -.)
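Putting the two answers together for the authenticated download in the question: a sketch, assuming wget is on the PATH. The helper names build_wget_cmd() and fetch_lines() are hypothetical; --user, --password, -q and -O - are the wget options already used above. shQuote() is added so that a password containing shell metacharacters does not break the command.

```r
# Hypothetical helper: the question's wget call, but writing to stdout ("-O -")
# instead of to a file on disk. shQuote() guards against special characters.
build_wget_cmd <- function(u_name, p_word, url) {
  paste("wget",
        paste0("--user=", shQuote(u_name)),
        paste0("--password=", shQuote(p_word)),
        "-q -O -",
        shQuote(url))
}

# Run the command; intern = TRUE returns stdout as a character vector (one line each)
fetch_lines <- function(u_name, p_word, url) {
  system(build_wget_cmd(u_name, p_word, url), intern = TRUE)
}

cmd <- build_wget_cmd("alice", "secret", "https://example.com/data.csv")
print(cmd)
# lines <- fetch_lines(u_name, p_word, my_urls)   # needs wget and network access
```

Nothing touches the disk: the result of fetch_lines() can go straight into read.csv(text = ...).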

Related

Default R output with a command line parameter

I am trying to provide a way of running unattended R scripts through Rscript. The problem I am having is that the default output for graphics is a PDF file in the current directory. I would like to redirect this output to a separate folder but without having to change the script too much.
If my script is something as simple as:
args <- commandArgs(trailingOnly = TRUE)
mydata <- data.frame(x = 1:10, y = (1:10)^2)  # placeholder: get some data frame somehow
plot(mydata)
And I execute the following commandline:
Rscript.exe --vanilla --default-packages=RODBC,graphics,grDevices sample.R > C:\temp\sample.Rout
I get an Rplots.pdf file in the current folder and the sample.Rout file in the C:\temp\ folder.
Is there a way to specify an output folder and have Rscript put all output there? I have tried using pdf.options(...) to prepend a default folder to the file parameter, but without success.

OK, apparently it was easier than I thought: there is no need for pdf.options(), just call pdf() at the top of the script (after reading the arguments):
pdf(paste0(args[1], "MyFile.pdf"))
or, for multiple files:
pdf(paste0(args[1], "MyFile%03d.pdf"), onefile=FALSE)
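A slightly more defensive version of the same idea, as a sketch: paste0(args[1], ...) only works if the folder argument ends with a path separator, whereas file.path() inserts one. The fallback to tempdir() when no argument is given is an assumption added so the script also runs without arguments.

```r
# Sketch: send all graphics from an Rscript run to the folder given as the
# first command-line argument (falls back to tempdir() if none is given).
args <- commandArgs(trailingOnly = TRUE)
out_dir <- if (length(args) >= 1) args[1] else tempdir()
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# onefile = FALSE plus %03d: every new page goes to its own numbered file
pdf(file.path(out_dir, "MyFile%03d.pdf"), onefile = FALSE)
plot(1:10)   # page 1 -> MyFile001.pdf
plot(10:1)   # page 2 -> MyFile002.pdf
invisible(dev.off())

list.files(out_dir, pattern = "^MyFile\\d{3}\\.pdf$")
```

Usage would be along the lines of Rscript.exe --vanilla sample.R C:\temp\plots.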

R: unzipping large compressed .csv yields "zip file is corrupt" warning

I am downloading a 78MB zip file from the UN FAO, which contains a 2.66GB csv. I am able to unzip the downloaded file from a folder using WinZip, but have been unable to unzip it using unzip() in R:
Warning - 78MB download!
url <- "http://fenixservices.fao.org/faostat/static/bulkdownloads/FoodBalanceSheets_E_All_Data_(Normalized).zip"
path <- file.path(getwd(), "zipped_data.zip")
download.file(url, path, mode = "wb")
unzipped_data <- unzip(path)
This results in a warning and a failure to unzip the file:
Warning message:
In unzip(path) : zip file is corrupt
In the ?unzip documentation I see
"It does have some support for bzip2 compression and > 2GB zip files (but not >= 4GB files pre-compression contained in a zip file: like many builds of unzip it may truncate these, in R's case with a warning if possible)"
This makes me believe that unzip() should be able to handle my file (the csv is 2.66GB, below the 4GB limit). Also, this same process has successfully downloaded, unzipped, and read several other, smaller tables from FAOSTAT. Is there a chance that the size of my csv is the source of this error? If so, what is the workaround?

I can't test my solution, and it also depends on your installation, but hopefully it will work or at least point you to a suitable solution:
You can run WinZip from the command line; this page shows the structure of the call.
You can also run command lines from R, with system() or shell() (which is just a wrapper for system() and is only available on Windows).
The command line general structure to extract would be:
winzip32 -e [options] filename[.zip] folder
So we build a string with this structure from your input paths, and wrap it in a function that mimics unzip() with the parameters zipfile and exdir:
unzip_wz <- function(zipfile, exdir) {
  dir.create(exdir, recursive = FALSE, showWarnings = FALSE) # I don't know how/if unzip creates folders, you might want to tweak or remove this line altogether
  # cmd.exe does not treat single quotes as quotes, so quote the paths with double quotes
  str1 <- sprintf("winzip32 -e %s %s", shQuote(zipfile, type = "cmd"), shQuote(exdir, type = "cmd"))
  shell(str1, wait = TRUE) # set to FALSE if you want R to keep running while unzipping; proceed with caution, but in some cases that could be an improvement over your current solution
}
You can try to use this function in place of unzip(). It assumes that winzip32 has been added to your system PATH variable; if it hasn't, either add it or replace it with the full path to the executable, so you have something like:
str1 <- sprintf('"C:/probably/somewhere/in/program/files/winzip32.exe" -e "%s" "%s"', zipfile, exdir)
PS: use full paths! The command line doesn't know your working directories (we could implement that in the function if needed).

I had the same problem running unzip() on Ubuntu Server 20.04. Setting the argument unzip = "/usr/bin/unzip" in unzip(...), instead of the default unzip = "internal", did the trick.
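A portable variant of that fix, as a sketch: look up an external unzip program with Sys.which() and fall back to the internal method if none is found. The helper names choose_unzip() and unzip_large() are hypothetical; the unzip = argument is the one used above.

```r
# Sketch: prefer an external unzip program on the PATH, else R's internal one.
choose_unzip <- function() {
  ext <- Sys.which("unzip")          # "" when no unzip executable is found
  if (nzchar(ext)) unname(ext) else "internal"
}

unzip_large <- function(zipfile, exdir = ".") {
  utils::unzip(zipfile, exdir = exdir, unzip = choose_unzip())
  # Per ?unzip, only the "internal" method returns the extracted paths,
  # so list the output folder instead.
  list.files(exdir, full.names = TRUE)
}

choose_unzip()
```

For the question this would be something like unzip_large(path, exdir = "fao_data"), where "fao_data" is just an example folder name.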

Error with fread in R--embedded nul in string: '\0'

I am trying to read a csv file larger than 4GB. However, when I use fread() it produces an error:
library(data.table)
csv1 <- fread("cleaned.csv",sep = ",",colClasses = "character",showProgress = TRUE)
Error: embedded nul in string: '\0'
After some searching I found that the sed command could be used, as in this Stack Overflow question, but I have no idea how to apply it to my scenario. Please help!
UPDATE:
I have tried the sed commands suggested in the comments below, but they throw an error:
sed couldn't flush stdout no space left on device
UPDATE2:
I have solved it with the help of some colleagues. However, I am still looking to automate this, since I had to repeat the process for each file. Ideally the automation would run either from within R or from a Bash script. Any suggestions?

The csv files contained ^@ characters (NUL bytes) in place of the blank values. Somehow they couldn't be found or replaced with sed commands, so to solve the problem I used the following workaround.
On Linux, go to the file's directory and open the file in vim:
vim filename.csv
:%s/CTRL+2//g # type Ctrl+2 where it says CTRL+2; this enters the ^@ character
ESC # make sure you are back in normal mode
:wq # save the file and quit
I had to do this manually for every file. However, I am still looking for a way to automate this, either within R or with a Bash script.
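One way to automate this from within R, as a sketch: stream each file in binary chunks and drop every zero byte, so a file larger than 4GB never has to fit in memory. strip_nul() is a hypothetical helper, and the "data" folder in the batch comment is an example. From Bash, the usual one-liner is tr -d '\000' < in.csv > out.csv, and data.table's fread() also accepts a cmd argument, so a command like that can be piped straight into it.

```r
# Sketch: remove NUL bytes ('\0') from a file, streaming it in chunks so a
# multi-GB CSV never has to fit in memory at once.
strip_nul <- function(infile, outfile, chunk_size = 64 * 1024^2) {
  con_in  <- file(infile,  open = "rb")
  con_out <- file(outfile, open = "wb")
  on.exit({ close(con_in); close(con_out) }, add = TRUE)
  removed <- 0
  repeat {
    bytes <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(bytes) == 0L) break            # reached end of file
    keep <- bytes != as.raw(0)
    removed <- removed + sum(!keep)
    writeBin(bytes[keep], con_out)
  }
  invisible(removed)                          # number of NUL bytes dropped
}

# Small self-check with a temporary file containing one NUL byte
tmp_in  <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b\n1,"), as.raw(0), charToRaw("\n2,3\n")), tmp_in)
tmp_out <- tempfile(fileext = ".csv")
n_removed <- strip_nul(tmp_in, tmp_out)

# Hypothetical batch run over every .csv in a folder named "data":
# for (f in list.files("data", pattern = "\\.csv$", full.names = TRUE))
#   strip_nul(f, sub("\\.csv$", "_clean.csv", f))
```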

Converting .Rd file to plain text

I'm trying to convert R documentation files (extension .Rd) into plain text. I am aware that RdUtils contains a tool called Rdconv, but as far as I know it can only be used from the command line. Is there a way to access Rdconv (or a similar conversion tool) from within an R session?

Try
tools::Rd2txt("path/to/file.Rd")
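Rd2txt() also has an out argument that takes a file name, so the plain text can go straight to disk. A sketch with a tiny, made-up Rd file written to a temporary location:

```r
# Sketch: render an .Rd file to a plain-text file from inside R.
# The Rd content below is invented purely for illustration.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{foo}",
  "\\alias{foo}",
  "\\title{Count Things}",
  "\\description{Counts things.}"
), rd_file)

txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd_file, out = txt_file)   # `out` can be a file name or a connection
cat(readLines(txt_file), sep = "\n")
```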

You can always invoke a system command, e.g. with the system2() function:
input <- '~/Projekty/stringi/man/stri_length.Rd'
output <- '/tmp/out.txt'
system2('R', paste('CMD Rdconv -t txt', input, '-o', output))
readLines(output)
## [1] "Count the Number of Characters"
## ...
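Make sure that R is in your system's search path. If it isn't, replace the first argument of system2() above with the full path, e.g. C:\Program Files\R\3.1\bin\R.exe, or with file.path(R.home("bin"), "R"), which points to the R installation of the running session.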

Print plain text of help file to console [duplicate]

I'd like to be able to write the contents of a help file in R to a file from within R.
The following works from the command-line:
R --slave -e 'library(MASS); help(survey)' > survey.txt
This command writes the help file for the survey data set to a file, where:
--slave hides both the initial prompt and the commands entered from the resulting output (since R 4.0.0, --no-echo is the preferred name for this option)
-e '...' sends the command to R
> survey.txt writes the output of R to the file survey.txt
However, this does not seem to work:
library(MASS)
sink("survey.txt")
help(survey)
sink()
How can I save the contents of a help file to a file from within R?

It looks like the two functions you need are tools:::Rd2txt and utils:::.getHelpFile. This prints the help file to the console, but you may need to fiddle with the arguments to get it to write to a file the way you want.
For example:
library(MASS)  # provides the survey data set
hs <- help(survey)
tools:::Rd2txt(utils:::.getHelpFile(as.character(hs)))
Since utils:::.getHelpFile isn't exported, I would not recommend relying on it for any production code. It would be better to use it as a guide to create your own stable implementation.
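To get an actual file, as the question asks, Rd2txt() takes an out argument that accepts a file name. A sketch that still relies on the unexported utils:::.getHelpFile and assumes the MASS package is installed:

```r
# Sketch: write the plain-text help page for MASS::survey to survey.txt.
library(MASS)                                   # provides the survey data set
hs <- help(survey)                              # path of the installed help page
rd <- utils:::.getHelpFile(as.character(hs))    # parsed Rd object (unexported!)
tools::Rd2txt(rd, out = "survey.txt")           # render as plain text to a file
head(readLines("survey.txt"))
```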

While Joshua's instructions work perfectly, I stumbled upon another strategy for saving an R help file, so I thought I'd share it. It works on my computer (Ubuntu), where less is the R pager. It essentially just involves saving the file from within less.
help(survey)
Then follow these instructions to save the less buffer to a file, i.e., type g|$tee survey.txt, where:
g goes to the top of the less buffer, if you aren't already there
|$ pipes the text from the current screen to the end of the buffer ($)
tee survey.txt is the shell command receiving that text; tee copies its standard input to the file survey.txt
