Receiving an unexpected error when plotting by group

Sorry for the massive data dump, but I can't reproduce this on the subsets of the data I've tried. I've copy-pasted the dput() output of the data (165 obs., not crazy) to this Gist.
I'm trying to plot the data in DT by sport, as follows:
1. Create an empty plot with proper limits to accommodate all the data.
2. Plot the column gini as a scatterplot, with colors varying by sport.
3. Plot the column five_yr_ma as a line, with the color matching that in step 2.
This should be simple and I've done things like it before. Here's what should work:
#empty plot with proper axes
DT[ , plot(
NA, ylim = range(gini), xlim = range(season),
xlab = "Season", ylab = "Gini",
main = "Comparison of Gini Coefficient Across Sports"
)]
#pick colors for each sport
cols <- c(NHL="black", NBA="red")
DT[ , by = sport, {
#add points to current plot
points(season, gini, col = cols[.BY$sport])
#add lines to current plot
lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)
}]
But this gives me output/error:
# Empty data.table (0 rows) of 1 col: sport
Error: x and y lengths differ in plot.xy()
This is strange. If we skip the grouping and just do it manually, it works perfectly fine:
all_sports[sport == "NBA", {
points(season, gini, col = "red")
lines(season, five_yr_ma, col = "red", lwd = 3)
}]
all_sports[sport == "NHL", {
points(season, gini, col = "black")
lines(season, five_yr_ma, col = "black", lwd = 3)
}]
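For comparison, the same per-group drawing can be written in base R with split(), which hands each sport's rows to points() and lines() as an ordinary data frame instead of going through data.table's by. This is a minimal sketch with made-up toy data that only borrows the question's column names (sport, season, gini, five_yr_ma); it is an alternative pattern, not a fix for the underlying device problem.

```r
# Made-up toy data; only the column names mirror the question
all_sports <- data.frame(
  sport      = rep(c("NHL", "NBA"), each = 5),
  season     = rep(2001:2005, times = 2),
  gini       = c(0.30, 0.32, 0.31, 0.29, 0.33, 0.45, 0.47, 0.44, 0.46, 0.48),
  five_yr_ma = c(0.30, 0.31, 0.31, 0.31, 0.31, 0.45, 0.46, 0.45, 0.46, 0.46)
)
cols <- c(NHL = "black", NBA = "red")

# Empty plot whose limits cover every group (points and lines)
plot(NA, xlim = range(all_sports$season),
     ylim = range(all_sports$gini, all_sports$five_yr_ma),
     xlab = "Season", ylab = "Gini",
     main = "Comparison of Gini Coefficient Across Sports")

# One data frame per sport; each group's columns are plain vectors
groups <- split(all_sports, all_sports$sport)
for (s in names(groups)) {
  g <- groups[[s]]
  points(g$season, g$gini, col = cols[[s]])
  lines(g$season, g$five_yr_ma, col = cols[[s]], lwd = 3)
}

# Rows drawn per sport, as a sanity check on group sizes
n_per_sport <- vapply(groups, nrow, integer(1))
print(n_per_sport)
```

Because split() materialises each group as its own data frame, every plotting call receives vectors whose lengths match by construction.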
Moreover, even with grouping, it's unclear why plot.xy() received arguments of different lengths: if I make the following adjustment to force R to record the inputs just before they're passed on, there doesn't appear to be any issue:
all_sports[ , {
cat("\n\nPlotting for sport: ", .BY$sport)
points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
cat("\npoints/season: ",length(x1),
"\npoints/gini: ", length(y1),
"\nlines/season: ", length(x2),
"\nlines/five_yr_ma: ", length(y2))},
by = sport]
Has output:
# Plotting for sport: NHL
# points/season: 98
# points/gini: 98
# lines/season: 98
# lines/five_yr_ma: 98
# Plotting for sport: NBA
# points/season: 67
# points/gini: 67
# lines/season: 67
# lines/five_yr_ma: 67
What could be going on?
Since this doesn't seem to happen on every machine, here's my sessionInfo():
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.7
loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4

Indeed, as @Arun points out, this seems to be a resurfacing of the (as yet unsolved) issue that caused the error in this question:
Values of the wrong group are used when using plot() within a data.table() in RStudio
As @Arun discovered there, RStudio's native graphics device seems to get tripped up by the changing pointers data.table uses for the different subgroups when evaluating j with by. That suggests the workaround of simply copying the columns (or all of .SD) each time, like:
points(copy(season), copy(gini),
col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma),
col = cols[.BY$sport], lwd = 3)
Or
x <- copy(.SD)
with(x, {points(season, gini, col = cols[.BY$sport]);
lines(copy(season), copy(five_yr_ma),
col = cols[.BY$sport], lwd = 3)})
Both worked for me. Since the subgroups are so small, there's no computational efficiency concern here; we can copy away without noticeably affecting performance.
This is issue #1524 on the data.table GitHub page, and I've filed a bug report with RStudio Support; I will update this answer if a fix is pushed.

Related

Coloring according to numeric vector doesn't work when using "reorder" in x axis

library(dplyr)
library(plotly)
# data frame
Title = c('Titanic', 'Avatar', 'Jurassic World')
Profit = c(458672302, 523505847, 502177271)
df = data.frame(Title, Profit)
Basically, I'm trying to color the bars according to the Profit column. When the x axis is not ordered, it works (figure 1):
# X axis not ordered (working)
plot_ly(df, x = ~Title,
y = ~Profit,
color = ~Profit,
type = 'bar')
But when I try to reorder the x axis (to look like this), it returns an error, probably because it's a factor:
# X axis reordered (not working)
plot_ly(df, x = ~reorder(Title, -Profit) %>% as.character(),
y = ~Profit,
color = ~Profit,
type = 'bar')
Error in Summary.factor(c(3L, 1L, 2L), na.rm = TRUE) :
‘range’ not meaningful for factors
In addition: Warning message:
textfont.color doesn't (yet) support data arrays
Does anyone have any tips?
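The factor suspicion can be checked in base R without plotly. This sketch uses the question's Title and Profit vectors and shows three facts: reorder() returns a factor whose levels are sorted by the score, as.character() gives the values back in their original row order (so the level ordering is discarded), and range() refuses factors with the same message as in the error. It does not trace plotly's own code path, which is an open assumption here.

```r
# Data from the question
Title  <- c("Titanic", "Avatar", "Jurassic World")
Profit <- c(458672302, 523505847, 502177271)

# reorder() returns a factor whose levels are sorted by the score (-Profit here)
f <- reorder(Title, -Profit)
print(class(f))   # "factor"
print(levels(f))  # "Avatar" "Jurassic World" "Titanic"

# as.character() returns the values in their original row order,
# so the level order produced by reorder() is discarded
print(as.character(f))  # "Titanic" "Avatar" "Jurassic World"

# range() is not defined for factors: same message as in the question
err <- tryCatch(range(f), error = function(e) conditionMessage(e))
print(err)
```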
Your code works fine for me:
library(dplyr)
library(plotly)
# data frame
Title = c('Titanic', 'Avatar', 'Jurassic World')
Profit = c(458672302, 523505847, 502177271)
df = data.frame(Title, Profit)
# X axis reordered (works for me)
plot_ly(df, x = ~reorder(Title, -Profit) %>% as.character(),
y = ~Profit,
color = ~Profit,
type = 'bar')
Output:
This is my sessionInfo():
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.3.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] nl_NL.UTF-8/nl_NL.UTF-8/nl_NL.UTF-8/C/nl_NL.UTF-8/nl_NL.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plotly_4.10.0 ggplot2_3.3.5 dplyr_1.0.8
So check your session by running sessionInfo() in your console and make sure your package versions are up to date.

Obtaining an error when running exact code from a blog

I am following a tutorial here. A few days ago I was able to run this code without error, and also to run it on my own data set (though whether I got this error was always a bit hit and miss). Now, however, whenever I run the code I always get the same error.
Error in solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 2) :
constraints are inconsistent, no solution!
I get that the solver cannot solve the equations, but I am a little confused as to why it worked previously and now does not. The author of the article has this code working.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (note that the values are explicitly set between the minimum and maximum of the expected returns per asset)
sd_vals <- sapply(er_vals, function(er) {
op <- portfolio.optim(as.matrix(df), er)
return(op$ps)
})
SessionInfo:
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] lpSolve_5.6.13.1 data.table_1.12.0 tseries_0.10-46 rugarch_1.4-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 MASS_7.3-51.1 mclust_5.4.2
[4] lattice_0.20-38 quadprog_1.5-5 Rsolnp_1.16
[7] TTR_0.23-4 tools_3.5.3 xts_0.11-2
[10] SkewHyperbolic_0.4-0 GeneralizedHyperbolic_0.8-4 quantmod_0.4-13.1
[13] spd_2.0-1 grid_3.5.3 KernSmooth_2.23-15
[16] yaml_2.2.0 numDeriv_2016.8-1 Matrix_1.2-15
[19] nloptr_1.2.1 DistributionUtils_0.6-0 ks_1.11.3
[22] curl_3.3 compiler_3.5.3 expm_0.999-3
[25] truncnorm_1.0-8 mvtnorm_1.0-8 zoo_1.8-4
tseries::portfolio.optim disallows short selling by default; see the argument shorts. If shorts = FALSE, asset weights may not go below 0. And since the weights must sum to 1, no individual asset weight can be above 1 either. There's no leverage.
(Possibly the default in an earlier version of tseries was shorts = TRUE. That would explain why it previously worked for you.)
Without short selling, your target return (pm) cannot exceed the highest mean return of any of the input assets, nor fall below the lowest one.
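The constraint argument above can be checked numerically without tseries: with weights that are non-negative and sum to 1, a portfolio's expected return is a weighted average of the asset means, so it always lies between the smallest and the largest of them. A minimal base-R sketch; the return data, random_long_only_weights() and feasible_targets() are all hypothetical.

```r
set.seed(1)
# Made-up returns: 60 periods (rows) for 4 hypothetical assets (columns)
asset_means <- c(0.010, 0.020, 0.005, 0.015)
returns <- matrix(rnorm(60 * 4, mean = asset_means, sd = 0.05),
                  ncol = 4, byrow = TRUE)
mu <- colMeans(returns)

# A long-only, fully invested portfolio: weights >= 0 that sum to 1
random_long_only_weights <- function(k) {
  w <- runif(k)
  w / sum(w)
}

# Its expected return is a weighted average of the asset means
port_means <- replicate(1000, sum(random_long_only_weights(ncol(returns)) * mu))

print(range(mu))          # the feasible interval for a target return
print(range(port_means))  # falls inside that interval

# Hypothetical helper: keep only targets a long-only portfolio can reach
feasible_targets <- function(targets, mu) {
  targets[targets >= min(mu) & targets <= max(mu)]
}
print(feasible_targets(seq(-0.01, 0.03, length.out = 9), mu))
```

Any target outside range(mu) is exactly the case that makes the quadratic program report inconsistent constraints when short selling is disallowed.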
Solution 1: Allow short selling, but remember that that's a different efficient frontier. (For reference, see any lecture or book discussing Markowitz optimization. There's a mathematical solution to the problem without short-selling restriction.)
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
Solution 2: Limit the target returns between the worst and the best asset's return.
er_vals <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
Here's a plot of the obtained efficient frontiers.
Here's the full script that gives both solutions.
library(tseries)
library(data.table)
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))
df_table <- melt(df)[, .(er = mean(value),
sd = sd(value)), by = variable]
# er_vals <- seq(from = min(df_table$er), to = max(df_table$er), length.out = 1000)
er_vals1 <- seq(from = 0, to = 0.15, length.out = 1000)
er_vals2 <- seq(from = min(colMeans(df)), to = max(colMeans(df)), length.out = 1000)
# find an optimal portfolio for each possible expected return
# (er_vals2 is restricted to lie between the minimum and maximum of the expected returns per asset; er_vals1 is not, because short selling is allowed there)
sd_vals1 <- sapply(er_vals1, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = TRUE)
return(op$ps)
})
sd_vals2 <- sapply(er_vals2, function(er) {
op <- portfolio.optim(as.matrix(df), er, shorts = FALSE)
return(op$ps)
})
plot(x = sd_vals1, y = er_vals1, type = "l", col = "red",
xlab = "sd", ylab = "er",
main = "red: allowing short-selling;\nblue: disallowing short-selling")
lines(x = sd_vals2, y = er_vals2, type = "l", col = "blue")

plot_grid + tikzDevice + shared legend with LaTeX markup

I've been trying to follow this vignette on how to make a shared legend for multiple ggplot2 plots. The given examples work perfectly as is, but in my case I'm using tikzDevice to export a tikzpicture environment. The main problem seems to be that the widths of the legend keys are not correctly captured by plot_grid.
I came up with a minimal R code that reproduces the problem:
require(ggplot2)
require(grid)
require(gridExtra)
require(cowplot)
require(tikzDevice)
tikz(file = "./tmp.tex", width = 5.6, height = 2.2, standAlone = T )
mpg2 <- mpg
mpg2$cyl = as.factor(mpg2$cyl)
levels(mpg2$cyl) <- c("\\textbf{\\textsc{four}}",
"\\textbf{\\textsc{five}}",
"\\textbf{\\textsc{six}}",
"\\textbf{\\textsc{seven}}",
"\\textbf{\\textsc{eight}}")
plot.mpg <- ggplot(mpg2, aes(x=cty, colour=cyl, y = hwy)) +
geom_point() +
theme(legend.position='none')
legend <- get_legend(plot.mpg + theme(legend.position = "top"))
print(plot_grid(legend,
plot.mpg, nrow=2, ncol=1,align='h',
rel_heights = c(.1, 1)))
dev.off()
The generated PDF file (after compiling tmp.tex) looks like this:
As we can see, the first legend key (four) is only partially displayed and the last key (eight) is completely invisible. I tried changing the width argument of tikz(), to no avail.
I also suspect that the reason behind the problem is that plot_grid incorrectly measures the length of the legend keys if they contain LaTeX markup. To show that this is the cause of the problem, consider changing the levels of mpg2$cyl to the following:
levels(mpg2$cyl) <- c("four",
"five",
"six",
"seven",
"eight")
This should result in the following plot with a perfect legend:
Please note that the example above is just meant to reproduce the problem and is not what I'm trying to do. Instead, I have four plots for which I want a shared common legend.
Can anyone tell me how to fix the legend when it contains LaTeX markup?
By the way, here is my sessionInfo():
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] tikzDevice_0.10-1 dplyr_0.5.0 gdata_2.17.0 cowplot_0.7.0
[5] gridExtra_2.2.1 ggplot2_2.2.0
loaded via a namespace (and not attached):
[1] gtools_3.5.0 colorspace_1.2-6 DBI_0.5 RColorBrewer_1.1-2
[5] plyr_1.8.4 munsell_0.4.3 gtable_0.2.0 labeling_0.3
[9] Rcpp_0.12.6 scales_0.4.1 filehash_2.3 digest_0.6.10
[13] tools_3.3.2 magrittr_1.5 lazyeval_0.2.0 tibble_1.1
[17] assertthat_0.1 R6_2.1.3
Thank you all.
ggplot2 appears to calculate the string widths based on the raw string, as opposed to the box sizes that TeX would return. I'm guessing it's a calculation done too early (i.e. not at drawing time) in the guides code.
As a workaround, you could edit the relevant widths manually in the gtable by calling getLatexStrWidth() explicitly. Note that I also added a package (bold-extra) to the preamble; otherwise the default font doesn't show bold small caps.
require(ggplot2)
require(grid)
require(gridExtra)
require(tikzDevice)
setTikzDefaults(overwrite = TRUE)
preamble <- options("tikzLatexPackages")
options("tikzLatexPackages" = c(preamble$tikzLatexPackages, "\\usepackage{bold-extra}"))
tikz(file = "./tmp.tex", width = 5.6, height = 2.2, standAlone = TRUE )
mpg2 <- mpg
mpg2$cyl = as.factor(mpg2$cyl)
levels(mpg2$cyl) <- c("\\textbf{\\textsc{four}}",
"\\textbf{\\textsc{five}}",
"\\textbf{\\textsc{six}}",
"\\textbf{\\textsc{seven}}",
"\\textbf{\\textsc{eight}}")
p <- ggplot(mpg2, aes(x=cty, colour=cyl, y = hwy)) +
geom_point() +
theme(legend.position='none')
leg <- cowplot::get_legend(p + theme(legend.position = "top"))
ids <- grep("label",leg$grobs[[1]]$layout$name)
pos <- leg$grobs[[1]]$layout$l[ids]
wl <- sapply(leg$grobs[[1]][["grobs"]][ids], function(g) getLatexStrWidth(g[["label"]]))
leg$grobs[[1]][["widths"]][pos] <- unit(wl, "pt")
grid.arrange(p, top=leg)
dev.off()
Alternatively, it may be easier to introduce temporary markup that barely affects the string width, and to post-process the .tex file between the R and LaTeX steps:
levels(mpg2$cyl) <- c("$four$",
"$five$",
"$six$",
"$seven$",
"$eight$")
[...]
tmp <- readLines("tmp.tex")
gs <- gsub("\\$(four|five|six|seven|eight)\\$", "\\\\textsc{\\1}", tmp, perl=TRUE)
cat(paste(gs, collapse="\n"), file="tmp2.tex")
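The post-processing step can be wrapped into two small functions so it is easy to rerun after every tikz() export. This is a sketch using the same regular expression as above; the function names and the sample lines standing in for tikz() output are hypothetical, and writeLines() is used in place of cat(paste(...)).

```r
# Replace the temporary $...$ markers with the intended LaTeX markup
postprocess_tex <- function(lines) {
  gsub("\\$(four|five|six|seven|eight)\\$", "\\\\textsc{\\1}", lines, perl = TRUE)
}

# File-level wrapper around the readLines()/gsub()/write steps
postprocess_tex_file <- function(infile, outfile) {
  writeLines(postprocess_tex(readLines(infile)), outfile)
}

# Hypothetical lines standing in for what tikz() writes to tmp.tex
demo <- c("\\node at (1.0,2.0) {$four$};",
          "\\node at (3.0,4.0) {$eight$};",
          "% lines without markers are left alone")
fixed <- postprocess_tex(demo)
writeLines(fixed)

# Round trip through temporary files
infile <- tempfile(fileext = ".tex")
outfile <- tempfile(fileext = ".tex")
writeLines(demo, infile)
postprocess_tex_file(infile, outfile)
```

In the replacement string, the doubled backslash produces one literal backslash and \1 inserts the captured word, so $four$ becomes \textsc{four}.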

Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found (new)

I know that the title of this question duplicates this question and this question, but the solutions there don't work for me, and the error message is (slightly) different:
Error in grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found
(note the missing part about the missing font)
I tried all the suggestions I found (updating/reinstalling all loaded graphics packages, ggplot2, GGally, and scales; reinitialising the fonts on Mac OS X by starting in safe mode; moving the fonts from /Fonts/ (Disabled) back into /Fonts...), but none of them resolved the problem.
The error seems to occur when I plot a ggplot graph with
scale_y_continuous(label=scientific_10)
where scientific_10 is defined as
scientific_10 <- function(x) {
parse(text = gsub("e", " %*% 10^", scientific_format()(x)))
}
Therefore I suspect that the scales package has something to do with it.
The most puzzling part is that the error occurs only intermittently, maybe every 3rd or 5th time I try to plot the same graph...
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.0.0 scales_0.3.0 broom_0.4.0 tidyr_0.3.1 ggplot2_1.0.1 GGally_0.5.0 dplyr_0.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.11.5 magrittr_1.5 MASS_7.3-43 mnormt_1.5-1 munsell_0.4.2 colorspace_1.2-6 lattice_0.20-33 R6_2.0.1
[9] stringr_0.6.2 plyr_1.8.1 tools_3.2.2 parallel_3.2.2 grid_3.2.2 gtable_0.1.2 nlme_3.1-121 psych_1.5.8
[17] DBI_0.3.1 htmltools_0.2.6 lazyeval_0.1.10 yaml_2.1.13 assertthat_0.1 digest_0.6.8 reshape2_1.4.1 rmarkdown_0.8.1
[25] labeling_0.3 reshape_0.8.5 proto_0.3-10
traceback()
35: grid.Call(L_textBounds, as.graphicsAnnot(x$label), x$x, x$y,
resolveHJust(x$just, x$hjust), resolveVJust(x$just, x$vjust),
x$rot, 0)
34: widthDetails.text(x)
33: widthDetails(x)
32: (function (x)
{
widthDetails(x)
})(list(label = expression(5 %*% 10^+5, 7.5 %*% 10^+5, 1 %*%
10^+6, 1.25 %*% 10^+6, 1.5 %*% 10^+6), x = 1, y = c(0.0777214770341215,
0.291044141334423, 0.504366805634725, 0.717689469935027, 0.931012134235329
), just = "centre", hjust = 1, vjust = 0.5, rot = 0, check.overlap = FALSE,
name = "axis.text.y.text.8056", gp = list(fontsize = 9.6,
col = "black", fontfamily = "", lineheight = 0.9, font = 1L),
vp = NULL))
31: grid.Call.graphics(L_setviewport, vp, TRUE)
30: push.vp.viewport(X[[i]], ...)
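The traceback shows that the axis labels reaching grid are a plotmath expression vector (label = expression(5 %*% 10^+5, ...)), which widthDetails.text() measures via L_textBounds. The sketch below reproduces that measuring step with a base-R stand-in for scientific_10(), assuming format() in place of scales::scientific_format(); it only shows what gets measured and does not reproduce the intermittent failure. pdf(NULL) opens a device that writes no file but can still be queried for text sizes.

```r
library(grid)

# Base-R stand-in for scientific_10(): "5e+05" becomes 5 %*% 10^+05
scientific_10_base <- function(x) {
  parse(text = gsub("e", " %*% 10^", format(x, scientific = TRUE)))
}

labs <- scientific_10_base(c(5e5, 1e6))
print(labs)
print(class(labs))   # "expression": plotmath, not plain character labels

# Measuring the labels goes through widthDetails.text(), as in the traceback
pdf(NULL)
w <- convertWidth(grobWidth(textGrob(labs)), "pt", valueOnly = TRUE)
dev.off()
print(w)
```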
I solved it by installing the extrafont package, importing a specific set of fonts, and forcing ggplot to use only these fonts:
require(extrafont)
# need only do this once!
font_import(pattern="[Aa]rial", prompt=FALSE)
require(ggplot2)
# extending the help file example
df <- data.frame(gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30))
ds <- plyr::ddply(df, "gp", plyr::summarise, mean = mean(y), sd = sd(y))
plotobj <- ggplot(df, aes(gp, y)) +
geom_point() +
geom_point(data = ds, aes(y = mean), colour = 'red', size = 3) +
theme(text=element_text(size=16, family="Arial"))
print(plotobj)
I experienced the same issue when trying to plot ggplot/grid output to the plot window in RStudio. However, plotting to an external graphics device seems to work fine.
The external device of choice depends on your system, but the script below, paraphrased from this blog, works for most systems:
a = switch(tolower(Sys.info()["sysname"]),
"darwin" = "quartz",
"linux" = "x11",
"windows" = "windows")
options("device" = a)
graphics.off()
rm(a)
To switch back to the RStudio plot window:
options("device"="RStudioGD")
graphics.off()
Note that by switching, you lose any existing plots.
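The switch above can be packaged as a function that also returns the previous setting, so restoring it is a single options() call. A minimal sketch: use_external_device() is a hypothetical name, and unlike the script above it stops with an error on operating systems not listed, rather than silently setting the option to NULL.

```r
# Switch R's default graphics device to the platform's native screen device.
# Returns the previous option invisibly so it can be restored with options().
use_external_device <- function() {
  sysname <- tolower(Sys.info()[["sysname"]])
  dev <- switch(sysname,
                darwin  = "quartz",
                linux   = "x11",
                windows = "windows",
                stop("no external device configured for ", sysname))
  old <- options(device = dev)
  graphics.off()  # closes every open device, so existing plots are lost
  invisible(old)
}

old <- use_external_device()
switched_to <- getOption("device")
print(switched_to)
# ... draw the ggplot/grid output here ...
options(old)      # restore the previous default, e.g. "RStudioGD" in RStudio
graphics.off()
```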
A lot of solutions for this particular error direct you to look under the hood of your computer, but this error can also be caused by a scripting error in which R expects to match elements from two data structures but cannot.
For me, the error was caused by calling a fairly complex graphing function (see below) that read an ordered character vector as well as a matrix whose row names were each supposed to match a value in that vector. The problem was that some of my values contained dashes, and R's read.table() function translated those dashes to periods (e.g. "HLA-DOA" became "HLA.DOA").
I was using the ComplexHeatmap package with a call like this:
oncoPrint(mat,
get_type = function(x) strsplit(x, ";")[[1]],
alter_fun_list = alter_fun_list,
col = col,
row_order = my_order,
column_title = "OncoPrint",
heatmap_legend_param = list(title = "Alternations", at = c("AMP", "HOMDEL", "MUT"), labels = c("Amplification", "Deep deletion", "Mutation"))
)
In this call:
- mat was a matrix whose row names had dashes swapped out for periods
- my_order was a character vector containing the same values as the row names of mat, except that the dashes remained
- every other argument is essential to the call but irrelevant to this post
To help R find this elusive "polygon edge", I just converted the periods back to dashes so that both sets of names match:
rownames(mat) <- gsub("\\.", "-", rownames(mat))
If you've tried reinstalling packages, restarting your computer, and re-enabling fonts, check whether your call has some faulty character matching going on.
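One plausible route for the dash-to-period mangling, assuming the gene names were column headers in the file (later transposed into row names): read.table() defaults to check.names = TRUE, which passes the column names through make.names(). This base-R sketch uses a made-up two-gene table and shows the mismatch and how check.names = FALSE avoids it.

```r
# Made-up file content: header has one field fewer, so column 1 becomes row names
txt <- "HLA-DOA TP53\nsample1 1 2\nsample2 3 4"

mangled <- read.table(text = txt, header = TRUE)
print(colnames(mangled))   # "HLA.DOA" "TP53"

kept <- read.table(text = txt, header = TRUE, check.names = FALSE)
print(colnames(kept))      # "HLA-DOA" "TP53"

# Genes as rows, as in an oncoPrint matrix; then look for unmatched names
mat <- t(as.matrix(mangled))
my_order <- c("TP53", "HLA-DOA")
print(setdiff(my_order, rownames(mat)))   # "HLA-DOA" cannot be matched
```

Checking setdiff() between the ordering vector and the row names before plotting is a cheap way to catch this class of mismatch.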
I got this error after setting the font family of the theme text, with this added line:
p <- p + theme(text = element_text(family = "宋体"))
("宋体" is the SimSun font.) When I removed that setting, the plot worked again.
Actually, I have the same problem on my Mac and couldn't solve it reliably. Since it also happens roughly every 5th or 10th execution, I decided to wrap the whole ggplot command in a tryCatch() call and execute it until it doesn't fail.
The code looks like this:
repeat {
  error_appeared <- FALSE  # reset the flag on every attempt
  tryCatch({ # we put everything into a tryCatch block, because sometimes we get an error
    gscat <- ggplot() # my ggplot command, which sometimes fails
    ggsave('file.pdf', gscat, width = 8, height = 8)
    plot(gscat)
  },
  error = function(e) {
    print('redo the ratioscatterplot.')
    error_appeared <<- TRUE  # <<- sets the flag outside the handler function
  })
  if (!error_appeared) {
    break
  }
}
Actually, I figured out that only drawing/plotting the figure causes problems! Saving always works.
Maybe this helps someone, since I couldn't find a solution that actually fixes the whole thing!
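An unbounded repeat loop never stops if the error turns out to be permanent. A sketch of a bounded variant: retry_plot() and the flaky test function are hypothetical names, and this still works around the intermittent error rather than fixing it.

```r
# Call plot_fun() up to max_tries times; return TRUE as soon as one call succeeds
retry_plot <- function(plot_fun, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    ok <- tryCatch({
      plot_fun()
      TRUE
    }, error = function(e) {
      message(sprintf("attempt %d failed: %s", attempt, conditionMessage(e)))
      FALSE
    })
    if (ok) return(invisible(TRUE))
  }
  invisible(FALSE)  # every attempt failed
}

# Hypothetical flaky plotting function that fails on its first two calls
calls <- 0
flaky_plot <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("polygon edge not found")
  plot(1:10)
}

result <- retry_plot(flaky_plot)
print(result)  # TRUE, after two failed attempts
```

With the objects from the post, a call might look like retry_plot(function() plot(gscat)), run after ggsave(), since saving reportedly always works.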
Additional:
If anyone wants to experiment with a reproducible example, the code below throws on average 2 errors in 20 iterations of the loop.
library(scales)
library(ggplot2)
df <- data.frame(
log2.Ratio.H.L.normalized.rev = c(2.53861265542646, 0.402176424979483, 0.438931541934545, 0.639695233399582, 0.230203013366421,
2.88223218956399, 1.23051046036618, 2.56554843533357, 0.265436896049098,
1.32866415755805, -0.92108963514092, 0.0976107966264223, -0.43048946484291,
-0.558665259531966, 4.13183638727079, 0.904580434921318, -0.0733780789564803,
-0.621932351219966, 1.48594198341242, -0.365611185917855, 1.21088754922081,
-2.3717583289898, 2.95160644380282, 3.71446534016249),
Intensity = c(5951600000, 2.4433e+10, 1.1659e+10, 2273600000, 6.852e+10, 9.8746e+10, 5701600000,
1758500000, 987180000, 3.4167e+11, 1.5718e+10, 6.8888e+10, 5.5936e+10,
8702900000, 1093500000, 4426200000, 1.3681e+11, 7.773e+09, 5860400000,
1.2861e+12, 2017900000, 2061300000, 240520000, 1382700000),
my_label = c("RPL18",
"hCG_2024613", "NOL7", "PRPF4B", "HIST1H2BC", "XRCC1", "C9orf30",
"CABIN1", "MGC3731", "XRCC6", "RPL23", "RPL27", "RPL17", "RPL32",
"XPC", "RPL15", "GNL3", "RPL29", "JOSD3", "PARP1", "DNAPTP6",
"ORC2L", "NCL", "TARDBP"))
unlink("figures", recursive=TRUE)
if(!dir.exists('figures')) dir.create('figures')
for (i in 1:20) {
  repeat {
    error_appeared <- FALSE  # reset the flag on every attempt
    tryCatch({ # we put everything into a tryCatch block, because sometimes we get an error
      gscat <-
        ggplot(df, aes_string("log2.Ratio.H.L.normalized.rev", 'Intensity')) +
        geom_point(data = df[abs(df[["log2.Ratio.H.L.normalized.rev"]]) < 1, ],
                   color = 'black', alpha = .3, na.rm = TRUE) +
        scale_y_log10(labels = scales::trans_format("log10", scales::math_format()))
      ggsave(file.path('figures', paste0('intensity_scatter_', i, '.pdf')),
             gscat, width = 8, height = 8)
      plot(gscat)
    },
    error = function(e) {
      # print(e)
      print(sprintf('%s redo the ratioscatterplot.', i))
      error_appeared <<- TRUE  # <<- sets the flag outside the handler function
    })
    if (!error_appeared) {
      break
    }
  }
}

R programming ggvis histogram versus hist - How to size the buckets, and define X axis spacing (ticks)

I am learning to use ggvis and wanted to understand how to create the equivalent histogram to that produced by hist. Specifically, how do you set the bin widths and upper and lower bounds of x in ggvis histograms? What am I missing?
Question: How do I get the ggvis histogram output to match the hist output?
Let me provide an example:
require(psych)
require(RCurl)
require(ggvis)
if ( !exists("impact") ) {
url <- "https://dl.dropboxusercontent.com/u/8272421/stat/stat_one.txt"
myCsv <- getURL(url, ssl.verifypeer = FALSE)
impact <- read.csv(textConnection(myCsv), sep = "\t")
impact$subject <- factor(impact$subject)
}
describe(impact)
hist(impact$verbal_memory_baseline,
main = "Distribution of verbal memory baseline scores",
xlab = "score", ylab = "frequency")
OK, let's try to reproduce it with ggvis... the output does not match:
impact %>%
ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
layer_histograms(width = 5) %>%
add_axis("x", title = "score") %>%
add_axis("y", title = "frequency")
How do I get the ggvis output to match the hist output?
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] psych_1.5.6 knitr_1.11 ggvis_0.4.2.9000 setwidth_1.0-4 colorout_1.1-1 vimcom_1.2-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 digest_0.6.8 dplyr_0.4.3.9000 assertthat_0.1 mime_0.3
[6] R6_2.1.1 jsonlite_0.9.16 xtable_1.7-4 DBI_0.3.1 magrittr_1.5
[11] lazyeval_0.1.10.9000 rstudioapi_0.3.1 rmarkdown_0.7 tools_3.2.2 shiny_0.12.2
[16] httpuv_1.3.3 yaml_2.1.13 parallel_3.2.2 rsconnect_0.4.1.4 mnormt_1.5-3
[21] htmltools_0.2.6
Try
impact %>%
ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
layer_histograms(width = 5, boundary = 5) %>%
add_axis("y", title = "frequency") %>%
add_axis("x", title = "score", ticks = 5)
Which gives:
The official documentation is a bit cryptic about how boundary and center work. Have a look at DataCamp's How to Make a Histogram with ggvis in R:
The width argument already set the bin width to 5, but where do bins start and where do they end? You can use the center or boundary argument for this. center should refer to one of the bins’ center value, which automatically determines the other bins location. The boundary argument specifies the boundary value of one of the bins. Here again, specifying a single value fixes the location of all bins. As these two arguments specify the same thing in a different way, you should set at most one of center or boundary.
If you want the same result using center instead of boundary, try:
impact %>%
ggvis( x = ~verbal_memory_baseline, fill := "white") %>%
layer_histograms(width = 5, center = 77.5) %>%
add_axis("y", title = "frequency") %>%
add_axis("x", title = "score", ticks = 5)
Here you specify the center of a bin (77.5), and it determines all the others automatically.
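The equivalence between boundary = 5 and center = 77.5 (with width = 5) can be made concrete by computing the bin edges explicitly, which also gives base hist() the same bins. This is a sketch of the semantics quoted above, not ggvis's own implementation; bin_breaks() and the scores vector are made up for illustration. Note that base hist() uses right-closed bins by default (right = TRUE).

```r
# Bin edges of a fixed width, anchored at one bin boundary or one bin center
bin_breaks <- function(x, width, boundary = NULL, center = NULL) {
  if (!is.null(boundary) && !is.null(center))
    stop("specify at most one of center and boundary")
  if (!is.null(center)) boundary <- center - width / 2
  if (is.null(boundary)) stop("specify center or boundary")
  # Move the anchor by whole bin widths until the edges cover range(x)
  lo <- boundary + floor((min(x) - boundary) / width) * width
  hi <- boundary + ceiling((max(x) - boundary) / width) * width
  seq(lo, hi, by = width)
}

# Made-up scores standing in for impact$verbal_memory_baseline
scores <- c(59, 63.5, 71, 77, 82, 85.5, 91, 97)
b_boundary <- bin_breaks(scores, width = 5, boundary = 5)
b_center   <- bin_breaks(scores, width = 5, center = 77.5)
print(b_boundary)   # edges from 55 to 100 in steps of 5

# Passing the same edges to base hist() fixes its bins as well
h <- hist(scores, breaks = b_boundary, plot = FALSE)
print(h$counts)

# With the question's data this would be, e.g.:
# hist(impact$verbal_memory_baseline,
#      breaks = bin_breaks(impact$verbal_memory_baseline, width = 5, boundary = 5))
```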
Steven's answer is correct.
His pointers allowed me to read the documentation much more closely:
layer_histograms():
http://www.rdocumentation.org/packages/ggvis/functions/layer_histograms
Boundary
A boundary between two bins. As with center, things are shifted when boundary is outside the range of the data. For example, to center on integers, use width = 1 and boundary = 0.5, even if 1 is outside the range of the data. At most one of center and boundary may be specified.
add_axis()
http://www.rdocumentation.org/packages/ggvis/functions/add_axis
ticks
A desired number of ticks. The resulting number may be different so that values are "nice" (multiples of 2, 5, 10) and lie within the underlying scale's range.
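The "desired, not exact" behaviour of ticks can be illustrated with base R's pretty(), which also picks nice values (1, 2, 5 or 10 times a power of ten). ggvis passes ticks on to vega, whose tick algorithm is not reproduced here; pretty() is only an analogous sketch.

```r
# Ask for about 5 ticks on an axis spanning 0 to 100
ticks <- pretty(c(0, 100), n = 5)
print(ticks)      # values 0, 20, 40, 60, 80, 100
length(ticks)     # 6: the requested count is only a target

# Base graphics equivalent: draw the axis at the chosen positions
plot(c(0, 100), c(0, 1), type = "n", xaxt = "n", xlab = "score", ylab = "")
axis(1, at = ticks)
```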
