I have a list that looks like this:
> indices
$`48-168`
$`48-168`$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`60-180`
$`60-180`$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`180-300`
$`180-300`$`1`
[1] 1 2
$`180-300`$`4`
[1] 4 5 6 7 8 9 10
$`180-300`$`3`
[1] 3
I want to write it to a file somehow and then recover the same list later.
I thought about printing the object returned by unlist(as.relistable(obj)) and using relist() later, but then I do not know how to recover the information from the file.
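Given that your data is not particularly well structured, you might want to just use save() here and save the original R list object. Note that save() writes R's binary format, not plain text, so an .RData extension describes the file better than .txt:
save(indices, file="/path/to/your/file.RData")
When you want to load indices again, use the load() function, which restores the object under its original name, indices:
load(file="/path/to/your/file.RData")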
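Since the question asks to "print" the list into a file, two other base R pairs may also fit: dput()/dget() write and read back a human-readable R expression, and saveRDS()/readRDS() store a single object and let you pick the variable name when reading it. A minimal sketch, rebuilding indices from the printout above and assuming its vectors are integers (dget() evaluates the file's contents as code, so only use it on files you wrote yourself):

```r
# Rebuild `indices` from the printout above (assumption: the vectors are integers)
indices <- list(
  `48-168`  = list(`1` = 1:10),
  `60-180`  = list(`1` = 1:10),
  `180-300` = list(`1` = 1:2, `4` = 4:10, `3` = 3L)
)

# Option 1: human-readable text. dput() writes R code that rebuilds the object;
# dget() parses and evaluates that code again.
txt_file <- tempfile(fileext = ".R")
dput(indices, file = txt_file)
restored_txt <- dget(txt_file)

# Option 2: single-object binary file. readRDS() returns the object,
# so you choose the variable name when reading it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(indices, file = rds_file)
restored_rds <- readRDS(rds_file)

identical(restored_txt, indices)  # TRUE
identical(restored_rds, indices)  # TRUE
```

The dput() file can be opened in any text editor, which makes it handy for small lists; for large objects, saveRDS() is more compact.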
Related
Say, I have a dataframe df in R as follows,
id inflam
1 1 0.03093764
2 2 0.50115406
3 3 0.82153770
4 4 0.01985961
5 5 0.04994588
6 6 0.91714810
7 7 0.83438400
8 8 0.80832225
9 9 0.12360681
10 10 0.08490079
I can access the entirety of the inflam column by indexing as df[,2] or df[2]. However, typeof(df[,2]) returns double, whereas typeof(df[2]) returns list. The comma seems to be the differentiator, but why is this the case? What is going on under the hood?
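A short illustration of the difference, based on the documented behaviour of `[.data.frame`: with a single index, df[j], a data frame is indexed as the list of columns it is, so the result is again a (one-column) data frame. With the matrix-style form df[i, j], the drop argument defaults to TRUE when only one column is selected, so the result is simplified to the bare column vector. The small df below is a stand-in for the one in the question, with rounded values:

```r
# A small stand-in for the data frame in the question (values rounded)
df <- data.frame(id = 1:3, inflam = c(0.0309, 0.5012, 0.8215))

# Matrix-style indexing: drop = TRUE by default, so one column becomes a vector
typeof(df[, 2])               # "double"

# List-style indexing: `[` on a list returns a sub-list, here a data frame
typeof(df[2])                 # "list"
class(df[2])                  # "data.frame"

# Turning off dropping keeps the data-frame shape even with the comma
class(df[, 2, drop = FALSE])  # "data.frame"

# `[[` extracts one element of the list, i.e. the column itself
typeof(df[[2]])               # "double"
```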
I found a very surprising and unpleasant feature of R - it completes list item names!!! See the following code:
a <- list(cov_spring = "spring")
a$cov <- c()
a$cov
# spring ## I expect it to be empty!!! I've set it empty!
a$co
# spring
a$c
I don't know what to do with that.... I need to be able to set $cov to NULL and have $cov_spring there at the same time!!! And use $cov separately!! This is annoying!
My question:
What is going on here? How is this possible, what is the logic behind?
Is there some easy fix, how to turn this completion off? I need to use list items cov_spring and cov independently as if they are normal variables. No damn completion please!!!
From help("$"):
'x$name' is equivalent to 'x[["name", exact = FALSE]]'
When you scroll back and read up on exact=:
exact: Controls possible partial matching of '[[' when extracting by
a character vector (for most objects, but see under
'Environments'). The default is no partial matching. Value
'NA' allows partial matching but issues a warning when it
occurs. Value 'FALSE' allows partial matching without any
warning.
So this provides you partial matching capability in both $ and [[ indexing:
mtcars$cy
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[["cy"]]
# NULL
mtcars[["cy", exact=FALSE]]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
I see no way to disable the exact=FALSE default for $ (unless you want to mess with formals, which I do not recommend for the sake of reproducibility and consistent behavior).
Programmatic use of frames and lists (for defensive purposes) should prefer [[ over $ for precisely this reason. (It's rare, but I have been bitten by this permissive behavior.)
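While partial matching by $ cannot be switched off, base R does have an option, warnPartialMatchDollar (see ?options), that makes it at least visible: $ then issues a warning whenever it falls back to a partial match. A minimal sketch using the list from the question:

```r
a <- list(cov_spring = "spring")

# Ask R to warn whenever `$` silently falls back to partial matching
old_opts <- options(warnPartialMatchDollar = TRUE)

# Capture the warning as a condition object instead of printing it
w <- tryCatch(a$cov, warning = function(w) w)
conditionMessage(w)  # reports the partial match of "cov" to "cov_spring"

# Restore the previous setting
options(old_opts)

# `[[` uses exact = TRUE by default, so it does not partially match
exact_result <- a[["cov"]]
is.null(exact_result)  # TRUE
```

Setting the option in a test setup or in .Rprofile is one way to catch accidental partial matches early without changing any code that uses $.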
Edit:
For clarity on that last point:
mtcars$cyl becomes mtcars[["cyl"]]
mtcars$cyl[1:3] becomes mtcars[["cyl"]][1:3]
mtcars[,"cy"] is not a problem (single-bracket indexing never matches partially; it fails with "undefined columns selected"), nor is mtcars[1:3,"cy"]
You can use [ or [[ instead.
a["cov"] will return a list with a NULL element.
a[["cov"]] will return the NULL element directly.
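If the aim is to keep an element named cov that holds NULL alongside cov_spring, here is a minimal sketch: assigning NULL with $<- removes an element (or, as in the question, never creates it), whereas assigning list(NULL) with single brackets stores an element whose value is NULL. Once cov exists, $ finds it by exact match, and an exact match takes precedence over a partial one:

```r
a <- list(cov_spring = "spring")

# Assigning NULL via `$<-` deletes an element; here "cov" never gets created
a$cov <- c()
names(a)          # "cov_spring"

# Single-bracket assignment of list(NULL) stores an element whose value is NULL
a["cov"] <- list(NULL)
names(a)          # "cov_spring" "cov"

# "cov" now matches exactly, so `$` no longer falls back to "cov_spring"
is.null(a$cov)    # TRUE
a$cov_spring      # "spring"
```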
Probably a similar situation has already been solved but I could not find it.
I have a mapper data frame like the following
mapper
bucket_label bucket_no
1 (-Inf; 9.99) 1
2 (25.01; 29.99) 1
3 (29.99; 30.01) 1
4 (30.01; Inf) 1
5 (19.99; 20.01) 2
6 (20.01; 24.99) 2
7 (24.99; 25.01) 2
8 (9.99; 10.11) 3
9 (10.11; 14.99) 3
10 (14.99; 15.01) 3
11 (15.01; 19.99) 3
and a vector x with random data
x <- rnorm(100)*100
I need a fast way to assign the corresponding bucket to each entry of x; findInterval and cut do not seem to help here.
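findInterval() may still work here once the labels are parsed into sorted lower bounds. A minimal sketch, assuming that the intervals are contiguous (each starts where another ends, as in the mapper shown), that every label has the form "(lower; upper)", and that a value exactly on a boundary belongs to the upper interval (findInterval's left-closed convention). The helper name bucket_for is hypothetical:

```r
# Rebuild the mapper shown above
mapper <- data.frame(
  bucket_label = c("(-Inf; 9.99)", "(25.01; 29.99)", "(29.99; 30.01)", "(30.01; Inf)",
                   "(19.99; 20.01)", "(20.01; 24.99)", "(24.99; 25.01)",
                   "(9.99; 10.11)", "(10.11; 14.99)", "(14.99; 15.01)", "(15.01; 19.99)"),
  bucket_no = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Extract the lower bound of each label, e.g. "(9.99; 10.11)" -> 9.99
# (as.numeric() understands "-Inf")
lower <- as.numeric(sub("^\\(([^;]+);.*$", "\\1", mapper$bucket_label))

# Sort the bounds, and reorder the bucket numbers the same way
ord <- order(lower)
breaks <- lower[ord]
buckets_sorted <- mapper$bucket_no[ord]

# Hypothetical helper: findInterval() returns, for each x, the index of the
# largest break <= x; since the first break is -Inf, every finite x gets one
bucket_for <- function(x) buckets_sorted[findInterval(x, breaks)]

x <- rnorm(100) * 100
head(data.frame(x = x, bucket = bucket_for(x)))
```

Because findInterval() is vectorised and runs a binary search over the sorted breaks, this avoids looping over the entries of x.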
I am wondering how to test functions that produce graphics. I have a simple
plotting function img:
img <- function() {
plot(1:10)
}
In my package I would like to create a unit test for this function using testthat.
Because plot and its friends in base graphics just return NULL, a simple
expect_identical does not work:
library("testthat")
## example for a successful test
expect_identical(plot(1:10), img()) ## equal (as expected)
## example for a test failure
expect_identical(plot(1:10, col="red"), img()) ## DOES NOT FAIL!
# (because both return NULL)
First I thought about plotting into a file and compare the md5 checksums to
ensure that the output of the functions is equal:
md5plot <- function(expr) {
file <- tempfile(fileext=".pdf")
on.exit(unlink(file))
pdf(file)
expr
dev.off()
unname(tools::md5sum(file))
}
## example for a successful test
expect_identical(md5plot(img()),
md5plot(plot(1:10))) ## equal (as expected)
## example for a test failure
expect_identical(md5plot(img()),
md5plot(plot(1:10, col="red"))) ## not equal (as expected)
That works well on Linux but not on Windows. Surprisingly, on Windows
md5plot(plot(1:10)) results in a new md5sum at each call.
Aside from this problem, I need to create a lot of temporary files.
Next I used recordPlot (first creating a null device, then calling the plotting
function and recording its output). This works as expected:
recPlot <- function(expr) {
pdf(NULL)
on.exit(dev.off())
dev.control(displaylist="enable")
expr
recordPlot()
}
## example for a successful test
expect_identical(recPlot(plot(1:10)),
recPlot(img())) ## equal (as expected)
## example for a test failure
expect_identical(recPlot(plot(1:10, col="red")),
recPlot(img())) ## not equal (as expected)
Does anybody know a better way to test the graphical output of functions?
EDIT: regarding the points @josilber raised in his comments.
While the recordPlot approach works well, you have to rewrite the whole plotting function in the unit test. That becomes complicated for complex plotting functions. It would be nice to have an approach that stores a file (*.RData or *.pdf, ...) containing a reference image against which future tests could compare. The md5sum approach isn't working because the md5sums differ between platforms. Via recordPlot you could create an *.RData file, but you could not rely on its format (from the recordPlot manual page):
The format of recorded plots may change between R versions.
Recorded plots can not be used as a permanent storage format for
R plots.
Maybe it would be possible to store an image file (*.png, *.bmp, etc), import it and compare it pixel by pixel ...
EDIT2: The following code illustrates the desired reference-file approach using svg as output. First the needed helper functions:
## plot to svg and return the file content as a character vector
plot_image <- function(expr) {
file <- tempfile(fileext=".svg")
on.exit(unlink(file))
svg(file)
expr
dev.off()
readLines(file)
}
## the IDs differ at each `svg` call, so we simply remove them
ignore_svg_id <- function(lines) {
gsub(pattern = "(xlink:href|id)=\"#?([a-z0-9]+)-?(?<![0-9])[0-9]+\"",
replacement = "\\1=\"\\2\"", x = lines, perl = TRUE)
}
## compare svg character vs reference
expect_image_equal <- function(object, expected, ...) {
stopifnot(is.character(expected) && file.exists(expected))
expect_equal(ignore_svg_id(plot_image(object)),
ignore_svg_id(readLines(expected)), ...)
}
## create reference image
create_reference_image <- function(expr, file) {
svg(file)
expr
dev.off()
}
A test would be:
create_reference_image(img(), "reference.svg")
## create tests
library("testthat")
expect_image_equal(img(), "reference.svg") ## equal (as expected)
expect_image_equal(plot(1:10, col="red"), "reference.svg") ## not equal (as expected)
Sadly, this does not work across platforms. The order (and the names)
of the svg elements differ completely between Linux and Windows.
Similar problems exist for png, jpeg and recordPlot: the resulting files
differ from platform to platform.
Currently the only working solution is the recPlot approach above, but with it
I need to rewrite the whole plotting functions in my unit tests.
P.S.:
I am completely confused about the different md5sums on Windows. They seem to depend on the creation time of the temporary files:
# on Windows
table(sapply(1:100, function(x)md5plot(plot(1:10))))
#4693c8bcf6b6cb78ce1fc7ca41831353 51e8845fead596c86a3f0ca36495eacb
# 40 60
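One plausible explanation for the changing md5sums: R's pdf() device writes the time of creation into each file (/CreationDate and /ModDate entries, with one-second resolution), so two otherwise identical plots hash differently once the clock has moved on. A minimal sketch, assuming those timestamps are the only varying part, that writes an uncompressed PDF and drops the timestamp lines before comparing. pdf_lines_without_dates is a hypothetical helper, and this does not address genuine cross-platform differences in the drawn output:

```r
# Hypothetical helper: plot to an uncompressed PDF and return its text lines,
# dropping the timestamp entries that change from one call to the next
pdf_lines_without_dates <- function(expr) {
  file <- tempfile(fileext = ".pdf")
  on.exit(unlink(file))
  pdf(file, compress = FALSE)  # uncompressed, so the file can be read as text
  expr                         # force the (lazily evaluated) plotting code
  dev.off()
  lines <- readLines(file, warn = FALSE)
  # useBytes = TRUE avoids encoding problems with the binary comment line
  # that PDF files start with
  lines[!grepl("CreationDate|ModDate", lines, useBytes = TRUE)]
}

# Usage: identical plots compare equal even when created a second apart
p1 <- pdf_lines_without_dates(plot(1:10))
Sys.sleep(1.1)
p2 <- pdf_lines_without_dates(plot(1:10))
identical(p1, p2)
```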
Mango Solutions have published an open source package, visualTest, that does fuzzy matching of plots, to address this use case.
The package is on GitHub, so install it with:
devtools::install_github("MangoTheCat/visualTest")
library(visualTest)
Then use the function getFingerprint() to extract a fingerprint for each plot, and compare using the function isSimilar(), specifying a suitable threshold.
First, create some plots on file:
png(filename = "test1.png")
img()
dev.off()
png(filename = "test2.png")
plot(1:11, col="red")
dev.off()
The fingerprint is a numeric vector:
> getFingerprint(file = "test1.png")
[1] 4 7 4 4 10 4 7 7 4 7 7 4 7 4 5 9 4 7 7 5 6 7 4 7 4 4 10
[28] 4 7 7 4 7 7 4 7 4 3 7 4 4 3 4 4 5 5 4 7 4 7 4 7 7 7 4
[55] 7 7 4 7 4 7 5 6 7 7 4 8 6 4 7 4 7 4 7 7 7 4 4 10 4 7 4
> getFingerprint(file = "test2.png")
[1] 7 7 4 4 17 4 7 4 7 4 7 7 4 5 9 4 7 7 5 6 7 4 7 7 11 4 7
[28] 7 5 6 7 4 7 4 14 4 3 4 7 11 7 4 7 5 6 7 7 4 7 11 7 4 7 5
[55] 6 7 7 4 8 6 4 7 7 4 4 7 7 4 10 11 4 7 7
Compare using isSimilar():
> isSimilar(file = "test2.png",
+ fingerprint = getFingerprint(file = "test1.png"),
+ threshold = 0.1
+ )
[1] FALSE
You can read more about the package at http://www.mango-solutions.com/wp/products-services/r-services/r-packages/visualtest/
It's worth noting that the vdiffr package also supports comparing plots. A nice feature is that it integrates with the testthat package (it is actually used for testing in ggplot2), and it has an add-in for RStudio to help manage your test suite.