How to dump png to stdout? [duplicate]

Is it possible to get R to write a plot in bitmap format (e.g. PNG) to standard output? If so, how?
Specifically I would like to run Rscript myscript.R | other_prog_that_reads_a_png_from_stdin. I realise it's possible to create a temporary file and use that, but it's inconvenient as there will potentially be many copies of this pipeline running at the same time, necessitating schemes for choosing unique filenames and removing them afterwards.
I have so far tried setting outf <- file("stdout") and then running either bitmap(file=outf, ...) or png(filename=outf, ...), but both complain ('file' must be a non-empty character string and invalid 'filename' argument, respectively), which is in line with the official documentation for these functions.
Since I was able to persuade R's read.table() function to read from standard input, I'm hoping there's a way. I wasn't able to find anything relevant here on SO by searching for [r] stdout plot, or any of the variations with stdout replaced by "standard output" (with or without double quotes), and/or plot replaced by png.
Thanks!

Unfortunately, the {grDevices} package (and, by implication, {ggplot2}) seems fundamentally not to support this.
The obvious approach to work around this is: let a graphics device write to a temporary file, and then read that temporary file back into the R session and write it to stdout.
But this fails because, on the one hand, the data cannot be read into a string: character strings in R do not support embedded null characters (if you try, you’ll get an error such as “nul character not allowed”). On the other hand, readBin and writeBin fail because writeBin categorically refuses to write to any connection that’s hooked up to stdout, which is in text mode (ignoring the fact that, on POSIX systems, text mode and binary mode are identical).
This can only be circumvented in incredibly hacky ways, e.g. by opening a binary pipe to a command such as cat:
dev_stdout = function (underlying_device = png, ...) {
  filename = tempfile()
  underlying_device(filename, ...)
  filename
}
dev_stdout_off = function (filename) {
  dev.off()
  on.exit(unlink(filename))
  fake_stdout = pipe('cat', 'wb')
  on.exit(close(fake_stdout), add = TRUE)
  writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}
To use it:
tmp_dev = dev_stdout()
contour(volcano)
dev_stdout_off(tmp_dev)
On systems where /dev/stdout exists (which are most but not all POSIX systems), the dev_stdout_off function can be simplified slightly by removing the command redirection:
dev_stdout_off = function (filename) {
  dev.off()
  on.exit(unlink(filename))
  fake_stdout = file('/dev/stdout', 'wb')
  on.exit(close(fake_stdout), add = TRUE)
  writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}

This might not be a complete answer, but it's the best I've got: can you open a connection using the stdout() command? I know that png() will change the output device to a file connection, but that's not what you want, so it might work to simply substitute png by stdout. I don't know enough about standard outputs to test this theory, however.
The help page suggests that this connection might be text-only. In that case, a solution might be to generate a random string to use as a filename, and pass the name of the file through stdout so that the next step in your pipeline knows where to find your file.
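The fallback suggested above (write the plot to a uniquely named file and print only its path on stdout) might look roughly like the sketch below. The helper name plot_to_named_file and its draw, device and ext parameters are hypothetical, not part of grDevices. One assumption worth stating loudly: the file is created in the parent of tempdir() rather than in tempdir() itself, because R deletes its per-session temporary directory when the session exits, which would remove the file before the next pipeline stage could read it.

```r
# Hypothetical helper (not part of grDevices): draw into a uniquely named
# file and print only that file's path on stdout.
plot_to_named_file <- function(draw, device = png, ext = ".png", ...) {
  # tempdir() is removed when the R session ends, so place the file in its
  # parent directory (assumed writable, e.g. /tmp). Including the process ID
  # keeps names apart across concurrently running pipelines.
  path <- tempfile(
    pattern = paste0("plot_", Sys.getpid(), "_"),
    tmpdir  = dirname(tempdir()),
    fileext = ext
  )
  device(path, ...)
  # Close the device even if the drawing code fails.
  tryCatch(draw(), finally = dev.off())
  cat(path, "\n", sep = "")
  invisible(path)
}

# Usage inside myscript.R (commented out so the sketch has no side effects):
# plot_to_named_file(function() contour(volcano))
```

With this approach the consuming program reads one line from stdin, opens that file, and is responsible for deleting it afterwards.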

Related

Pass result of Julia `download` to memory instead of file?

With Julia 1.6's download function, the typical behavior is to output to a file. How can I save the result directly to something in memory?
E.g. I'd like something like:
result = download(url)
contains(result,"hello")
As suggested by the help text for download, use the Downloads library; download can take an IOBuffer. Example:
using Downloads
result = String(take!(Downloads.download(url, IOBuffer())))
Julia uses the curl library, or something similar, for the download function, and that library writes to a file by default, or to stdout, not to a C or Julia string. Consider that many downloads may be large, perhaps larger than system RAM, to see why.
You could easily extend Julia to download, create a string for the download, and remove the temp file:
import Base.download
# Dispatch on the type String itself, so that download(url, String) selects this method
function Base.download(url::AbstractString, ::Type{String})
    tmpfile = download(url)
    str = read(tmpfile, String)
    rm(tmpfile)
    return str
end
Watch out for big files though :)
You can also use the UrlDownload.jl library. It uses HTTP.jl instead of curl, so it always keeps the result in memory, and you can process it on the fly.
julia> using UrlDownload
julia> url = "https://raw.githubusercontent.com/Arkoniak/UrlDownload.jl/master/data/ext.csv"
"https://raw.githubusercontent.com/Arkoniak/UrlDownload.jl/master/data/ext.csv"
julia> urldownload(url, parser = x -> String(x))
"x,y\n1,2\n3,4\n"

R: filename list result not recognized for actually reading the file (filename character encoding problem)

I get .xlsx files from various sources, to read and analyse the data in R, which works great. The files are big, 10+ MB. So far, readxl::read_xlsx was the only solution that worked. xlsx::read.xls produced only error messages (Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.OutOfMemoryError: GC overhead limit exceeded).
Problem: some files have non-standard letters in the filename, e.g. displayed in Windows 10 Explorer as '...ü...xlsx' (the character 'ü' somewhere in the filename). When I read all filenames in the folder in R, I get '...u"...xlsx'. I check for duplicate filenames across folders before I actually read the files. However, when it comes to reading the above file, I get the error message '... file does not exist', no matter whether I use
the path/filename character variable directly obtained from list.files (showing '...u"...xlsx')
the string constant '...u"...xlsx'
the string constant '...ü...xlsx'
As far as I understand, the problem arises from equivalent, yet not identical, Unicode compositions. I have no influence on how these characters are originally encoded. Therefore I see no way to read the file, other than (so far manually) renaming the file in Windows Explorer, changing an 'ü' coded as 'u+"' to 'ü'.
Questions:
is there a workaround within R? (Keep in mind the requirement to use read_xlsx, unless a yet unknown package works with huge files.)
if not possible within R, what would be the best option to change filenames automatically ('u+"' to 'ü')? I need to keep the 'ü' (or ä, ö, and others) in order to connect the analysis results back to the input, preferably without additional (non-standard) software (e.g. command shell).
EDIT:
To read the list of files, dir_ls works (as suggested), but it returns an even stranger filename: 'ö' instead of 'ö', which in turn cannot be read (found) by read_xlsx either.
Try using the fs library. My workflow looks something like this:
library(tidyverse)
library(lubridate)
library(fs)
library(readxl)
directory_to_read <- getwd()
file_names_to_read <- dir_ls(path = directory_to_read,
                             recurse = FALSE, # set this to TRUE to read all subdirectories
                             glob = "*.xls*",
                             ignore.case = TRUE) %>% # This is to ignore upper/lower case extensions
  # Use this to weed out temp files - I constantly have this problem
  str_subset(string = .,
             regex(pattern = "\\/~\\$", ignore_case = TRUE), # use \\ before $ else it will not work
             negate = TRUE) # TRUE returns non-matching patterns
map(file_names_to_read[4], read_excel)
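For the renaming part of the question, a minimal base-R sketch is given below. It assumes the problematic names contain a base letter followed by the combining diaeresis (U+0308), which is one common form of the "u+\"" encoding described above; that is an assumption, not something the question confirms. The names decomposed, precomposed, compose_umlauts and rename_to_composed are hypothetical, only German umlauts are covered, and whether list.files and file.rename handle such names correctly on a given Windows setup is untested here. A general Unicode normalisation function such as stringi::stri_trans_nfc could replace the hand-written table.

```r
# Base letter + combining diaeresis (U+0308) and the matching precomposed
# characters. Only German umlauts are covered in this sketch.
decomposed  <- c("a\u0308", "o\u0308", "u\u0308", "A\u0308", "O\u0308", "U\u0308")
precomposed <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")

# Replace every decomposed umlaut in x by its precomposed equivalent.
compose_umlauts <- function(x) {
  for (i in seq_along(decomposed)) {
    x <- gsub(decomposed[i], precomposed[i], x, fixed = TRUE)
  }
  x
}

# Hypothetical helper: rename all files in `dir` whose names change under
# compose_umlauts(). Returns the new names invisibly.
rename_to_composed <- function(dir) {
  old <- list.files(dir)
  new <- compose_umlauts(old)
  changed <- old != new
  file.rename(file.path(dir, old[changed]), file.path(dir, new[changed]))
  invisible(new[changed])
}

# The two spellings look the same on screen but are different strings:
identical("u\u0308", "\u00fc")
```

Because the renamed files keep their umlauts, the analysis results can still be matched back to the input names.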

R: possible truncation of >= 4GB file

I have a 370MB zip file and the content is a 4.2GB csv file.
I did:
unzip("year2015.zip", exdir = "csv_folder")
And I got this message:
1: In unzip("year2015.zip", exdir = "csv_folder") :
possible truncation of >= 4GB file
Have you experienced that before? How did you solve it?
I agree with @Sixiang.Hu's answer: R's unzip() won't work reliably with files greater than 4GB.
As for "How did you solve it?": I've tried a few different tricks, and in my experience the result of anything using R's built-ins is (almost) invariably an incorrect identification of the end-of-file (EOF) marker before the actual end of the file.
I deal with this issue in a set of files I process on a nightly basis, and to deal with it consistently and in an automated fashion, I wrote the function below to wrap the UNIX unzip. This is basically what you're doing with system(unzip()), but gives you a bit more flexibility in its behavior, and allows you to check for errors more systematically.
decompress_file <- function(directory, file, .file_cache = FALSE) {
  if (.file_cache == TRUE) {
    print("decompression skipped")
  } else {
    # Set working directory for decompression
    # simplifies unzip directory location behavior
    wd <- getwd()
    setwd(directory)
    # Run decompression
    decompression <-
      system2("unzip",
              args = c("-o", # include override flag
                       file),
              stdout = TRUE)
    # uncomment to delete archive once decompressed
    # file.remove(file)
    # Reset working directory
    setwd(wd); rm(wd)
    # Test for success criteria
    # change the search depending on
    # your implementation
    if (grepl("Warning message", tail(decompression, 1))) {
      print(decompression)
    }
  }
}
Notes:
The function does a few things, which I like and recommend:
uses system2 over system because the documentation says "system2 is a more portable and flexible interface than system"
separates the directory and file arguments, and moves the working directory to the directory argument; depending on your system, unzip (or your choice of decompression tool) gets really finicky about decompressing archives outside the working directory
it's not pure, but resetting the working directory is a nice step toward the function having fewer side effects
you can technically do it without this, but in my experience it's easier to make the function more verbose than have to deal with generating filepaths and remembering unzip CLI flags
I set it to use the -o flag to automatically overwrite when rerun, but you could supply any number of arguments
includes a .file_cache argument which allows you to skip decompression
this comes in handy if you're testing a process which runs on the decompressed file, since 4GB+ files tend to take some time to decompress
commented out in this instance, but if you know you don't need the archive after decompressing, you can remove it inline
the system2 command redirects the stdout to decompression, a character vector
an if + grepl check at the end looks for warnings in the stdout, and prints the stdout if it finds that expression
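As a variation on the wrapper above, the sketch below checks the exit status that system2 attaches to its result instead of searching the captured output for a phrase. The function name unzip_checked and its parameters are hypothetical. It assumes an Info-ZIP style unzip on the PATH that accepts -o (overwrite) and -d (target directory). Whether a particular unzip build signals truncation of >= 4GB members through its exit status is not something this thread confirms, so treat the check as a first line of defence only.

```r
# Hypothetical wrapper: run the system's unzip and stop on a non-zero exit status.
unzip_checked <- function(zipfile, exdir = ".", unzip_cmd = "unzip") {
  # With stdout = TRUE, system2 returns the output as a character vector and
  # stores a non-zero exit status in the "status" attribute (with a warning,
  # which is suppressed here because the status is inspected directly).
  out <- suppressWarnings(
    system2(unzip_cmd,
            args = c("-o", shQuote(zipfile), "-d", shQuote(exdir)),
            stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("unzip exited with status ", status, ":\n",
         paste(out, collapse = "\n"))
  }
  invisible(out)
}

# Usage (commented out; needs the archive from the question):
# unzip_checked("year2015.zip", exdir = "csv_folder")
```

Passing the target directory with -d avoids changing the working directory, so there is nothing to restore afterwards.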
Checking ?unzip, I found the following comment in the Note section:
It does have some support for bzip2 compression and > 2GB zip files
(but not >= 4GB files pre-compression contained in a zip file: like
many builds of unzip it may truncate these, in R's case with a warning
if possible).
You can try to unzip it outside of R (using 7-Zip for example).
To add to the list of possible solutions, in case you have Java (JDK) available on your machine, you can wrap jar xf into an R function similar to utils::unzip() in interface, a very simple example:
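unzipLarge <- function(zipfile, exdir = getwd()) {
  # Resolve the archive path before changing directory, so that a
  # relative zipfile is still found after setwd()
  zipfile <- normalizePath(zipfile, mustWork = TRUE)
  oldWd <- getwd()
  on.exit(setwd(oldWd))
  setwd(exdir)
  system2("jar", args = c("xf", shQuote(zipfile)))
}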
And then use:
unzipLarge("year2015.zip", exdir = "csv_folder")

Raw text strings for file paths in R

Is it possible to use a prefix when specifying a filepath string in R to ignore escape characters?
For example if I want to read in the file example.csv when using windows, I need to manually change \ to / or \\. For example,
'E:\DATA\example.csv'
becomes
'E:/DATA/example.csv'
data <- read.csv('E:/DATA/example.csv')
In Python I can prefix my string with r to avoid doing this (e.g. r'E:\DATA\example.csv'). Is there a similar command in R, or an approach that I can use to avoid having this problem? (I move between Windows, Mac and Linux; this is obviously just a problem on Windows.)
You can use file.path to construct the correct file path, independent of operating system.
file.path("E:", "DATA", "example.csv")
[1] "E:/DATA/example.csv"
It is also possible to convert a file path to the canonical form for your operating system, using normalizePath:
zz <- file.path("E:", "DATA", "example.csv")
normalizePath(zz)
[1] "E:\\DATA\\example.csv"
But in direct response to your question: I am not aware of a way to ignore the escape sequence using R. In other words, I do not believe it is possible to copy a file path from Windows and paste it directly into R.
However, if what you are really after is a way of copying and pasting from the Windows clipboard and getting a valid R string, try readClipboard().
For example, if I copy a file path from Windows Explorer, then run the following code, I get a valid file path:
zz <- readClipboard()
zz
[1] "C:\\Users\\Andrie\\R\\win-library\\"
It is now possible with R version 4.0.0. See ?Quotes for more.
Example
r"(c:\Program files\R)"
## "c:\\Program files\\R"
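To make the equivalence concrete, here is a small sketch (requires R >= 4.0.0; older versions fail to parse the r"(...)" syntax). It uses the path from the question. The variable names raw_path, escaped_path and forward are illustrative only.

```r
# Requires R >= 4.0.0.
raw_path     <- r"(E:\DATA\example.csv)"   # raw string: backslashes taken literally
escaped_path <- "E:\\DATA\\example.csv"    # ordinary string: backslashes doubled

# Both literals denote the same 19-character string.
identical(raw_path, escaped_path)

# If forward slashes are wanted, convert without touching the file system.
forward <- gsub("\\", "/", raw_path, fixed = TRUE)
forward
```

The raw literal only changes how the source code is written; the resulting value is an ordinary character string, so functions such as read.csv treat it exactly like the escaped form.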
1) If E:\DATA\example.csv is on the clipboard then do this:
example.csv <- scan("clipboard", what = "")
## Read 1 item
example.csv
## [1] "E:\\DATA\\example.csv"
Now you can copy "E:\\DATA\\example.csv" from the output above onto the clipboard and then paste that into your source code if you need to hard code the path.
Similar remarks apply if E:\DATA\example.csv is in a file.
2) If the file exists then another thing to try is:
example.csv <- file.choose()
and then navigate to it and continue as in 1) above (except the file.choose line replaces the scan statement there).
3) Note that it's not true that you need to change the backslashes to forward slashes for read.csv on Windows. If for some reason you truly need that translation and the file exists, the following will translate backslashes to forward slashes (if the file does not exist it gives an annoying warning, so you might want to use one of the other approaches below):
normalizePath(example.csv, winslash = "/")
and these translate backslashes to forward slashes even if the file does not exist:
gsub("\\", "/", example.csv, fixed = TRUE)
## [1] "E:/DATA/example.csv"
or
chartr("\\", "/", example.csv)
## [1] "E:/DATA/example.csv"
4) In 4.0+ the following syntax is supported. ?Quotes discusses additional variations.
r"{E:\DATA\example.csv}"
EDIT: Added more info on normalizePath.
EDIT: Added (4).
A slightly different approach I use with a custom made function that takes a windows path and corrects it for R.
pathPrep <- function() {
  cat("Please enter the path:\n\n")
  oldstring <- readline()
  chartr("\\", "/", oldstring)
}
Let's try it out!
When prompted paste the path into console or use ctrl + r on everything at once
(x <- pathPrep())
C:/Users/Me/Desktop/SomeFolder/example.csv
Now you can feed it to a function
shell.exec(x) #this piece would work only if
# this file really exists in the
# location specified
But as others pointed out what you want is not truly possible.
No, this is not possible with R versions before 4.0.0. Sorry.
I know this question is old, but for people stumbling upon it more recently: as of R 4.0.0 it is possible to write raw strings. The syntax is r"()". Note that the string goes inside the parentheses.
Example:
> r"(C:\Users)"
[1] "C:\\Users"
Source: https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
jump to section: significant user-visible changes.
Here's an incredibly ugly one-line hack to do this in base R, with no packages necessary:
setwd(gsub(", ", "", toString(paste0(read.table("clipboard", sep="\\", stringsAsFactors=F)[1,], sep="/"))))
Usable in its own little wrapper function thus (using suppressWarnings for peace of mind):
> getwd()
[1] "C:/Users/username1/Documents"
> change_wd=function(){
+ suppressWarnings(setwd(gsub(", ", "", toString(paste0(read.table("clipboard", sep="\\", stringsAsFactors=F)[1,], sep="/")))))
+ getwd()
+ }
Now you can run it:
#Copy your new folder path to clipboard
> change_wd()
[1] "C:/Users/username1/Documents/New Folder"
To answer the actual question, "Can I parse a raw string in R without having to double-escape backslashes?" (a good question, with many uses beyond the clipboard use case):
I have found a package that appears to provide this functionality:
https://github.com/trinker/pathr
See "win_fix".
The use-case specified in the docs is exactly the use-case you just stated, however I haven't investigated whether it handles more flexible usage scenarios yet.
