Pass result of Julia `download` to memory instead of file?

Pass result of Julia `download` to memory instead of file? - http

With Julia 1.6's download function, the typical behavior is to output to a file. How can I save the result directly to something in memory?
E.g. I'd like something like:
result = download(url)
contains(result,"hello")

As suggested by the help text for download, use the Downloads library; download can take an IOBuffer. Example:
result = String(take!(Downloads.download(url,IOBuffer())))

Julia uses the curl library, or something similar, for the download function, and that library writes to a file by default, or to stdout, not to a C or Julia string. Consider that many downloads may be large, perhaps larger than system RAM, to see why.
You could easily extend Julia to download, create a string for the download, and remove the temp file:
import Base.download
function Base.download(url::AbstractString, String)
tmpfile = download(url)
str = read(tmpfile, String)
rm(tmpfile)
return str
end
Watch out for big files though :)

You can also use UrlDownload.jl library. It uses HTTP.jl instead of curl, so it always keep the result in memory, and you can process it on the fly.
julia> using UrlDownload
julia> url = "https://raw.githubusercontent.com/Arkoniak/UrlDownload.jl/master/data/ext.csv"
"https://raw.githubusercontent.com/Arkoniak/UrlDownload.jl/master/data/ext.csv"
julia> urldownload(url, parser = x -> String(x))
"x,y\n1,2\n3,4\n"

Related

How to dump png to stdout? [duplicate]

Is it possible to get R to write a plot in bitmap format (e.g. PNG) to standard output? If so, how?
Specifically I would like to run Rscript myscript.R | other_prog_that_reads_a_png_from_stdin. I realise it's possible to create a temporary file and use that, but it's inconvenient as there will potentially be many copies of this pipeline running at the same time, necessitating schemes for choosing unique filenames and removing them afterwards.
I have so far tried setting outf <- file("stdout") and then running either bitmap(file=outf, ...) or png(filename=outf, ...), but both complain ('file' must be a non-empty character string and invalid 'filename' argument, respectively), which is in line with the official documentation for these functions.
Since I was able to persuade R's read.table() function to read from standard input, I'm hoping there's a way. I wasn't able to find anything relevant here on SO by searching for [r] stdout plot, or any of the variations with stdout replaced by "standard output" (with or without double quotes), and/or plot replaced by png.
Thanks!

Unfortunately the {grDevices} (and, by implication, {ggplot2}) seems to fundamentally not support this.
The obvious approach to work around this is: let a graphics device write to a temporary file, and then read that temporary file back into the R session and write it to stdout.
But this fails because, on the one hand, the data cannot be read into a string: character strings in R do not support embedded null characters (if you try you’ll get an error such as “nul character not allowed”). On the other hand, readBin and writeBin fail because writeBin categorically refuses to write to any device that’s hooked up to stdout, which is in text mode (ignoring the fact that, on POSIX system, the two are identical).
This can only be circumvented in incredibly hacky ways, e.g. by opening a binary pipe to a command such as cat:
dev_stdout = function (underlying_device = png, ...) {
filename = tempfile()
underlying_device(filename, ...)
filename
}
dev_stdout_off = function (filename) {
dev.off()
on.exit(unlink(filename))
fake_stdout = pipe('cat', 'wb')
on.exit(close(fake_stdout), add = TRUE)
writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}
To use it:
tmp_dev = dev_stdout()
contour(volcano)
dev_stdout_off(tmp_dev)
On systems where /dev/stdout exists (which are most but not all POSIX systems), the dev_stdout_off function can be simplified slightly by removing the command redirection:
dev_stdout_off = function (filename) {
dev.off()
on.exit(unlink(filename))
fake_stdout = file('/dev/stdout', 'wb')
on.exit(close(fake_stdout), add = TRUE)
writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}

This might not be a complete answer, but it's the best I've got: can you open a connection using the stdout() command? I know that png() will change the output device to a file connection, but that's not what you want, so it might work to simply substitute png by stdout. I don't know enough about standard outputs to test this theory, however.
The help page suggests that this connection might be text-only. In that case, a solution might be to generate a random string to use as a filename, and pass the name of the file through stdout so that the next step in your pipeline knows where to find your file.

Textract in R (paws) without S3Object

When using textract from the paws package in R the start_document_analysis call requires the path to a S3Object in DocumentLocation.
textract$start_document_analysis(
DocumentLocation = list(
S3Object = list(Bucket = bucket, Name = file)
)
)
Is it possible to use DocumentLocation without a S3Object? I would prefer to just provide the path to a local PDF.

The start_document_analysis api only supports providing an s3 object as input, and not a base64 encoded string like the analyze_document api (see also CLI docs on https://docs.aws.amazon.com/cli/latest/reference/textract/start-document-analysis.html)
So unfortunately you have to use S3 as a place to (temporarily) store your data. Of course you can write your own logic to do that :). Great tutorial on that can be found at
https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-r/
Since you have already set up credentials etc. you can skip a lot of the steps and start at step 3 for example.

How to Read Data from .rda with read.table [duplicate]

I am trying to load an .rda file in r which was a saved dataframe. I do not remember the name of it though.
I have tried
a<-load("al.rda")
which then does not let me do anything with a. I get the error
Error:object 'a' not found
I have also tried to use the = sign.
How do I load this .rda file so I can use it?
I restared R with load("al.rda) and I know get the following error
Error: C stack usage is too close to the limit

Use 'attach' and then 'ls' with a name argument. Something like:
attach("al.rda")
ls("file:al.rda")
The data file is now on your search path in position 2, most likely. Do:
search()
ls(pos=2)
for enlightenment. Typing the name of any object saved in al.rda will now get it, unless you have something in search path position 1, but R will probably warn you with some message about a thing masking another thing if there is.
However I now suspect you've saved nothing in your RData file. Two reasons:
You say you don't get an error message
load says there's nothing loaded
I can duplicate this situation. If you do save(file="foo.RData") then you'll get an empty RData file - what you probably meant to do was save.image(file="foo.RData") which saves all your objects.
How big is this .rda file of yours? If its under 100 bytes (my empty RData files are 42 bytes long) then I suspect that's what's happened.

I had to reinstall R...somehow it was corrupt. The simple command which I expected of
load("al.rda")
finally worked.

I had a similar issue, and it was solved without reinstall R. for example doing
load("al.rda) works fine, however if you do
a <- load("al.rda") will not work.

The load function does return the list of variables that it loaded. I suspect you actually get an error when you load "al.rda". What exactly does R output when you load?
Example of how it should work:
d <- data.frame(a=11:13, b=letters[1:3])
save(d, file='foo.rda')
a <- load('foo.rda')
a # prints "d"
Just to be sure, check that the load function you actually call is the original one:
find("load") # should print "package:base"
EDIT Since you now get an error when you load the file, it is probably corrupt in some way. Try this and say what it prints:
file.info("a1.rda") # Prints the file size etc...
readBin("a1.rda", "raw", 50) # reads first 50 bytes from the file
Without having access to the file, it's hard to investigate more... Maybe you could share the file somehow (http://www.filedropper.com or similar)?

I usually use save to save only a single object, and I then use the following utility method to retrieve that object into a given variable name using load, but into a temporary namespace to avoid overwriting existing objects. Maybe it will be helpful for others as well:
load_first_object <- function(fname){
e <- new.env(parent = parent.frame())
load(fname, e)
return(e[[ls(e)[1]]])
}
The method can of course be extended to also return named objects and lists of objects, but this simple version is for me the most useful.

Writing a plot in bitmap format (e.g. PNG) to standard output

Is it possible to get R to write a plot in bitmap format (e.g. PNG) to standard output? If so, how?
Specifically I would like to run Rscript myscript.R | other_prog_that_reads_a_png_from_stdin. I realise it's possible to create a temporary file and use that, but it's inconvenient as there will potentially be many copies of this pipeline running at the same time, necessitating schemes for choosing unique filenames and removing them afterwards.
I have so far tried setting outf <- file("stdout") and then running either bitmap(file=outf, ...) or png(filename=outf, ...), but both complain ('file' must be a non-empty character string and invalid 'filename' argument, respectively), which is in line with the official documentation for these functions.
Since I was able to persuade R's read.table() function to read from standard input, I'm hoping there's a way. I wasn't able to find anything relevant here on SO by searching for [r] stdout plot, or any of the variations with stdout replaced by "standard output" (with or without double quotes), and/or plot replaced by png.
Thanks!

Unfortunately the {grDevices} (and, by implication, {ggplot2}) seems to fundamentally not support this.
The obvious approach to work around this is: let a graphics device write to a temporary file, and then read that temporary file back into the R session and write it to stdout.
But this fails because, on the one hand, the data cannot be read into a string: character strings in R do not support embedded null characters (if you try you’ll get an error such as “nul character not allowed”). On the other hand, readBin and writeBin fail because writeBin categorically refuses to write to any device that’s hooked up to stdout, which is in text mode (ignoring the fact that, on POSIX system, the two are identical).
This can only be circumvented in incredibly hacky ways, e.g. by opening a binary pipe to a command such as cat:
dev_stdout = function (underlying_device = png, ...) {
filename = tempfile()
underlying_device(filename, ...)
filename
}
dev_stdout_off = function (filename) {
dev.off()
on.exit(unlink(filename))
fake_stdout = pipe('cat', 'wb')
on.exit(close(fake_stdout), add = TRUE)
writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}
To use it:
tmp_dev = dev_stdout()
contour(volcano)
dev_stdout_off(tmp_dev)
On systems where /dev/stdout exists (which are most but not all POSIX systems), the dev_stdout_off function can be simplified slightly by removing the command redirection:
dev_stdout_off = function (filename) {
dev.off()
on.exit(unlink(filename))
fake_stdout = file('/dev/stdout', 'wb')
on.exit(close(fake_stdout), add = TRUE)
writeBin(readBin(filename, 'raw', file.info(filename)$size), fake_stdout)
}

This might not be a complete answer, but it's the best I've got: can you open a connection using the stdout() command? I know that png() will change the output device to a file connection, but that's not what you want, so it might work to simply substitute png by stdout. I don't know enough about standard outputs to test this theory, however.
The help page suggests that this connection might be text-only. In that case, a solution might be to generate a random string to use as a filename, and pass the name of the file through stdout so that the next step in your pipeline knows where to find your file.

Test package function that writes to disk

I am trying to write a test for a package function in R.
Let's say we have a function that simply writes a string x to disk using writeLines():
exporting_function <- function(x, file) {
writeLines(x, con = file)
invisible(NULL)
}
One way of testing it would be to check if a file exists. Typically, it should not exist at first, but after the exporting function was run it should. Also, you might want to test the file size to be greater than 0:
library(testthat)
test_that("file is written to disk", {
file = 'output.txt'
expect_false(file.exists(file))
exporting_function("This is a test",
file = file)
expect_true(file.exists(file))
expect_gt(file.info('output.txt')$size, 0)
})
Is this a good way to test it? In the CRAN Repository Policy it states that Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory. Would this test violate this constraint?
There is a expect_output_file function. From the documentation and examples I am not sure if this is a more appropriate expectation to test the function. It requires a.o. an object argument which should be the object to test. What is the object to test in my case?

That looks as if it violates CRAN policy. Why not simply write to the temporary directory, using
file <- tempfile()
in place of
file = 'output.txt'
?
As to whether it is a good test: wouldn't it be better to try reading the file back in, and confirming that what was read matches what was written? That's easy in your toy example. It might be harder in the real one, but having an import function paired with your export function is always a good idea.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Pass result of Julia `download` to memory instead of file? - http

With Julia 1.6's download function, the typical behavior is to output to a file. How can I save the result directly to something in memory? E.g. I'd like something like: result = download(url) contains(result,"hello")

As suggested by the help text for download, use the Downloads library; download can take an IOBuffer. Example: result = String(take!(Downloads.download(url,IOBuffer())))

Related

How to dump png to stdout? [duplicate]

Textract in R (paws) without S3Object

How to Read Data from .rda with read.table [duplicate]

Writing a plot in bitmap format (e.g. PNG) to standard output

Test package function that writes to disk

Categories

Resources