legend in R plot display all lines - r

I have the following problem.
I want to put a legend into my graph. My code:
plot(Lc(`BEL_2016_final.csv`$value),col="red",lwd=2,
xaxt="n", yaxt="n", cex.lab = 1.5)
axis(side=1, at=axTicks(1), cex.axis = 1.5)
axis(side=2, at=axTicks(2), cex.axis = 1.5)
par(new=TRUE)
plot(Lc(`CRO_2016_final.csv`$value),col="blue",lwd=2,
xaxt="n", yaxt="n", cex.lab = 1.5)
axis(side=1, at=axTicks(1), cex.axis = 1.5)
axis(side=2, at=axTicks(2), cex.axis = 1.5)
legend(x = "topleft", legend=paste0(c("Belgium, Gini "),
round(Gini(`BEL_2016_final.csv`$value), digits = 2),
c("Croatia, Gini "),
round(Gini(`CRO_2016_final.csv`$value), digits = 2)),
col=c("red", "blue"), lty=1:2, cex=1, lwd=1.5)
However, the legend looks like this:
When I try:
legend=paste0(c("Belgium, Gini ", "Croatia, Gini "),
round(c(Gini(`BEL_2016_final.csv`$value)),
Gini(`CRO_2016_final.csv`$value)),
digits = 2)
I got this result:
which is wrong, because Gini index for Croatia is 0.73.
How can I modify my code to display both lines (red and blue) in the legend, both on a new line? Thanks a lot.

Your parentheses are mismatched. Whatever IDE/editor you are using I encourage the use of matching (sometimes "rainbow") parentheses. For example, in RStudio, if the cursor is the _ symbol (and accepting RStudio's insistence on its indentation preference):
Notice that the ( next to paste0 is highlighted, which tells you that digits=2 is being parsed as the last argument of paste0. That is not what you intended. Another hint comes from RStudio's indentation preference (highlight the block and press Ctrl-I, the default keypress for "Reindent Lines"): the second Gini lines up with c(, not with the first Gini, meaning that c( and the second Gini are at the same level, whereas I would expect the second Gini to be nested within the c(.
To validate what is going on, I'll replace the Gini(.) calls with your 0.52 and 0.73 values, verbatim (but please keep them as Gini(.) in your code):
paste0(c("Belgium, Gini ", "Croatia, Gini "),
round(c(0.52),
0.73),
digits = 2)
# [1] "Belgium, Gini 0.52" "Croatia, Gini 0.52"
Looking at it this way, it appears as if the first right-paren after 0.52 might have been intended to be after the 0.73, since grouping 0.52 and 0.73 makes sense.
Here is corrected code, where all I do is remove one right-paren from after the first-Gini, and add one right-paren to the very end of this expression:
legend=paste0(c("Belgium, Gini ", "Croatia, Gini "),
round(c(Gini(`BEL_2016_final.csv`$value),
Gini(`CRO_2016_final.csv`$value)),
digits = 2) )
and the associated matching-paren highlighting (again, _ is the current cursor):
<soapbox>
PS: I am not saying that one must use the RStudio IDE for R work. In fact, I don't, I use emacs/ess. There are other editors to use as well. However, as much as indentation and similar can be viewed as style and therefore not important for programming, I argue that indentation and some editor functionality like matching-parens can help in readability as well as troubleshooting code before you even get to a mistake; for instance, a consistent indentation style alone here hints to improper paren-closure, and the matching-paren-highlighter confirms it. Use what you prefer, but some programming styles are actually beneficial functionally (and therefore pragmatic).
</soapbox>

Correct solution is:
legend(x = "topleft", legend=paste0(c("Belgium, Gini ", "Croatia, Gini "),
c(round(Gini(`BEL_2016_final.csv`$value), digits = 2),
round(Gini(`CRO_2016_final.csv`$value), digits = 2)
)),
col=c("red", "blue"), lty=1:2, cex=1, lwd=1.5)
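Both answers rely on paste0() being vectorised: given two vectors of length two it returns two labels, whereas the original call passed four length-one arguments and so produced a single string (one legend entry). A minimal sketch, assuming hard-coded stand-in values (`gini_values` is hypothetical and replaces the Gini(.) calls so the code runs without the data files):

```r
# Stand-in values for the two Gini(.) calls (0.52 and 0.73 after rounding).
gini_values <- c(0.5213, 0.7349)

# Four length-one arguments are glued into ONE string, hence one legend entry.
one_string <- paste0("Belgium, Gini ", round(gini_values[1], digits = 2),
                     "Croatia, Gini ", round(gini_values[2], digits = 2))

# Two length-two vectors are pasted element-wise, giving TWO labels.
labels <- paste0(c("Belgium, Gini ", "Croatia, Gini "),
                 round(gini_values, digits = 2))

# Empty plot just to show the legend; lty = 1 matches two solid curves.
plot(0:1, 0:1, type = "n", xlab = "p", ylab = "L(p)")
legend("topleft", legend = labels, col = c("red", "blue"), lty = 1, lwd = 1.5)
```

Here length(one_string) is 1 and length(labels) is 2. Note that the question's lty=1:2 would draw the second legend entry dashed even though both curves are plotted solid, so lty = 1 may be closer to what is wanted.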

Related

R: plot() Function with type="h" Misrepresents Small Numbers ( For Larger Values of "lwd" )

I am trying to generate a plot showing the probabilities of a Binomial(10, 0.3) distribution.
I'd like to do this in base R.
The following code is the best I have come up with,
plot(dbinom(1:10, 10, 0.3), type="h", lend=2, lwd=20, yaxs="i")
My issue with the above code is the small numbers get disproportionately large bars. (See below) For example P(X = 8) = 0.00145 but the height in the plot looks like about 0.025.
It seems to be an artifact created by wanting wider bars, if the lwd = 20 argument is removed you get tiny bars but their heights seem to be representative.
I think the problem is your choice of lend (line-end) parameter. The 'round' (0) and 'square' (2) choices are intended for when you want a little extra extension beyond the end of a segment so that adjacent segments join nicely, e.g. when plotting line segments that should form a connected line (see example below).
f <- function(le) plot(dbinom(1:10, 10, 0.3),
type="h", lend = le, lwd=20, yaxs="i", main = le)
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f))
"round", "butt", and "square" could also be specified (less mnemonically) as 0, 1, and 2 ...
x <- 1:5; y <- c(1,4,2,3,5)
f2 <- function(le) {
plot(x,y, type ="n", main = le)
segments(x[-length(x)], y[-length(x)], x[-1], y[-1],
lwd = 20, lend = le)
}
par(mfrow=c(1,3))
invisible(lapply(c("round", "butt", "square"), f2))
Here you can see that the round end caps work well, whereas both 'butt' and 'square' have issues. (I can't think offhand of a use case for "square", but I'm sure one exists ...) There is a good description of line-drawing parameters here (although it also doesn't suggest use cases ...)
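Applied to the original question, a minimal sketch with lend = "butt" could look like this. It assumes the full support 0:10 is plotted explicitly (the question plotted 1:10 against the index) and pins the lower y limit at zero; both choices are assumptions made for this sketch, not part of the answer.

```r
# Binomial(10, 0.3) probabilities over the full support 0..10.
x <- 0:10
p <- dbinom(x, size = 10, prob = 0.3)

# "butt" caps stop exactly at the segment ends, so wide bars keep their true height.
plot(x, p, type = "h", lend = "butt", lwd = 20,
     yaxs = "i", ylim = c(0, max(p) * 1.05),
     xlab = "x", ylab = "P(X = x)")
```

With the caps no longer extending past the segment ends, the bar for P(X = 8) should sit close to the axis rather than appearing to reach about 0.025.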

How to put statistical information on the output diagram with hist()?

I made the following histogram. I want to display a stats box with a border, like the one in the image on this site, and write various information in it. The solution should also allow legends for multiple histograms.
As a result of research, I found that ROOT has a similar function, but I would like to realize this with R. How can I do this with R?
The xlsx file I used is large, so I uploaded it here.
library(readxl)
file = read_excel("./data.xlsx")
summary(file)
data = file[["foo[%]"]]
par(las=1, family="Century gothic", xaxs="i", yaxs="i", cex.main=3, cex.lab=1.1, cex.axis=1.1, font.lab=2, font.axis=2)
h = hist(data, yaxt="n", tck=0.03, breaks=seq(18,18.35,0.01), main=NA, xlab="xaxis", ylab="freq", border=NA, col="#00000000", ylim=c(0,100))
axis(2, tck=0.03, at=c(0,10,20,30,40,50,60,70,80,90,100))
grid()
lines(rep(h$breaks, each=2)[-c(1,2*length(h$breaks))], rep(h$counts, each=2), lwd=3)
box()
@jay.sf's suggestion is a little tricky to implement. Here's an example that comes close, but it's not perfect: the spacing is wrong. Maybe someone can improve it?
statname <- expression("Entries", "Mean", "Std Dev", chi^2/"ndf", "Prob", "Constant", "Slope")
statvalue <- expression(5000, 19.59, 18.61, 131.1/115, 0.1447, 5.54 %+-% 0.04,
-0.0514 %+-% 0.0011)
plot(1)
legend("topright", title = "hB",
legend = c(statname, statvalue),
ncol = 2,
x.intersp = 0)
Created on 2021-02-21 by the reprex package (v0.3.0)
It uses expression() for the entries because you have math in them; you might be able to do it reasonably well with special characters instead. I couldn't find a way to set the spacing of the two columns differently: your example had the names flush left and the values flush right. Nor could I set the column widths separately. Probably if you really want to fine tune this, you'll have to write your own function based on the legend function.
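One way to approximate flush-left names and flush-right values without rewriting legend() is to pad each row to a fixed width and draw it in a monospaced font. This is a different technique from the expression()-based example above, and it gives up plotmath (so no chi-squared or plus-minus symbols). A rough sketch, assuming simulated stand-in data `x` and a hypothetical helper `stat_labels`:

```r
# Simulated stand-in for the histogram data (the real xlsx file is not used here).
set.seed(1)
x <- rnorm(5000, mean = 18.17, sd = 0.05)

# Hypothetical helper: left-align names, right-align values in fixed-width columns.
stat_labels <- function(stats, name_width = 8, value_width = 10) {
  values <- formatC(stats, digits = 4, format = "g")
  sprintf(paste0("%-", name_width, "s %", value_width, "s"), names(stats), values)
}

stats  <- c(Entries = length(x), Mean = mean(x), "Std Dev" = sd(x))
labels <- stat_labels(stats)

hist(x, main = NA, xlab = "xaxis", ylab = "freq")
old <- par(family = "mono")   # monospaced font keeps the padded columns aligned
legend("topright", legend = labels, title = "hB", bty = "o")
par(old)                      # restore the previous font family
```

The alignment depends on the device actually providing a monospaced "mono" family, so the result may vary between graphics devices.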

How to fix text using mtext function in R

I struggle to fix text in my plot using mtext
Assuming this is my data:
df<-rnorm(100,12,2)
The code used is:
plot(df)
mtext(col="red",side=3,line=1,at=39, paste(round(12,4)))
mtext('text here=',col="dark green", side=3, line=1, at=10)
With this code, I get a gap between 'text here=' and the value '12'. Even when I adjust the positions to close it, the gap comes back as soon as I resize the plot area in RStudio.
I want the text to read text here= 12 and to stay that way when I resize the plot.
It would also be good if the code could be simplified.
You could use a phantom expression with bquote for that:
Edit:
To adjust the position, use adj and padj.
df<-rnorm(100,12,2)
plot(df)
txt1 <- bquote(expression("text here = " * phantom(.(round(12,4)))))
txt2 <- bquote(expression(phantom("text here = ") * .(round(12,4))))
mtext(eval(txt1), col = "dark green", adj=0, padj=-1)
mtext(eval(txt2), col = "red", adj=0, padj=-1)
Created on 2020-03-28 by the reprex package (v0.3.0)
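Since the question also asks for simpler code, the same phantom() idea can be written without expression() and eval(), because mtext() accepts the call returned by bquote() directly. A sketch, where `val` and `label` are names introduced only for illustration:

```r
set.seed(42)
df    <- rnorm(100, 12, 2)
val   <- round(12, 4)       # the number to display
label <- "text here = "

plot(df)
# Each call lays out the full string but hides one half with phantom(),
# so the two colours stay adjacent however the plot is resized.
txt_label <- bquote(.(label) * phantom(.(val)))
txt_value <- bquote(phantom(.(label)) * .(val))
mtext(txt_label, col = "dark green", adj = 0, padj = -1)
mtext(txt_value, col = "red",        adj = 0, padj = -1)
```

Keeping the label and the value in variables means only those two lines need to change from graph to graph.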
My answer will look like a hack because it is a hack:
plot(df)
mtext(col=c("red","blue"), side=3, line=1, at=10,
c('text here = ', paste0(c(rep(" ", 23), 12), collapse = "")))
You will have to find how many spaces you have to use (here is 23) before the number you want to appear (12). Resizing the plot did not change the relative positions between the text and number.
Of course, this will be difficult to adapt if the text varies from graph to graph.
I hope someone else comes with a better answer.
Great answer by @user12728748!

Setting col= parameter using par()

I want to illustrate R's par() graphics parameter command with multiple graphs, so I made a simple 2×2 layout, which plots fine. I added a single par(col = "green") command to color the one barplot() and three hist()ograms green, but it did nothing that I could see.
Here's my R script, which should be safe since I save and restore your graphics settings at the top and bottom. Apologies for the long dput() but I want you to have the data I have.
savedGraphicsParams <- par(no.readonly=TRUE)
layout(matrix(c(1, 2, 3, 4), nrow=2, byrow=TRUE))
par(col = "green") # doesn't work
attach(Lakes)
# GRAPH 1:
barplot(table(N_of_Fish), main="Fish", xlab = "No. of Fish")
# GRAPH 2:
hist(Elevation, main = "Elevation", xlab = "ft")
# GRAPH 3
hist(Surface_Area, main="Surface Area", xlab = parse(text="ft^2"))
# GRAPH 4
hist(Maximum_Depth, main="Max Depth", xlab = "ft")
detach(Lakes)
par(savedGraphicsParams) # Reset the graphics
tl;dr Unfortunately, as far as I know, you just can't do this; you have to use col= in the individual plot calls. Picking through ?par, we find:
Several parameters can only be set by a call to ‘par()’: ...
The remaining parameters can also be set as arguments (often via
‘...’) to high-level plot functions ...
However, see the comments on ‘bg’, ‘cex’, ‘col’, ‘lty’,
‘lwd’ and ‘pch’ which may be taken as arguments to certain plot
functions rather than as graphical parameters.
(emphasis added).
I interpret this as meaning that bg et al. cannot be set globally by a call to par() (even though they're described and discussed in ?par), but must be set as arguments to individual plotting calls. I would write the code this way (also avoiding the use of attach(), which is advised against even in its own manual page ...)
plot_col <- "green"
with (Lakes,
{
barplot(table(N_of_Fish), main="Fish", xlab = "No. of Fish", col=plot_col)
hist(Elevation, main = "Elevation", xlab = "ft", col=plot_col)
hist(Surface_Area, main="Surface Area", xlab = parse(text="ft^2"), col=plot_col)
hist(Maximum_Depth, main="Max Depth", xlab = "ft", col=plot_col)
})
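Because the dput() data is not reproduced here, the following sketch builds a small made-up `Lakes` data frame (all values are hypothetical) so the with() version can be run as is. It uses par(mfrow = ) instead of layout() for the 2×2 grid, and saves and restores that setting in the spirit of the question.

```r
# Made-up stand-in for the Lakes data; column names follow the question.
set.seed(1)
Lakes <- data.frame(
  N_of_Fish     = rpois(50, 3),
  Elevation     = runif(50, 1000, 9000),
  Surface_Area  = rlnorm(50, 10, 1),
  Maximum_Depth = runif(50, 5, 200)
)

old_par  <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the old setting
plot_col <- "green"
with(Lakes, {
  barplot(table(N_of_Fish), main = "Fish", xlab = "No. of Fish", col = plot_col)
  hist(Elevation, main = "Elevation", xlab = "ft", col = plot_col)
  hist(Surface_Area, main = "Surface Area", xlab = expression(ft^2), col = plot_col)
  hist(Maximum_Depth, main = "Max Depth", xlab = "ft", col = plot_col)
})
par(old_par)                      # restore the previous layout
```

Holding the colour in one variable keeps the "set it once" convenience the question was after, even though it is passed to each plotting call.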

(R) Axis widths in gbm.plot

Hoping for some pointers or some experienced insight, as I'm literally losing my mind over this: I've been trying for 2 full days to find the right settings to get clean, simple line plots out of the gbm.plot function (packages dismo & gbm).
Here's where I start. bty=n in par to turn off the box & leave me with only left & bottom axes. Gbm.plot typically spits out one plot per explanatory variable, so usually 6 plots etc, but I'm tweaking it to do one per variable & looping it. I've removed the loop & lots of other code so it's easy to see what's going on.
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1, #this is part of the multiple plots thing, calls the explanatory variable
lwd=8, #this controls the width of the main result line ONLY
rug=F)
dev.off()
So this is what the starting condition looks like. Aim: make the axes & ticks thicker. That's it.
Putting "lwd=20" in par does nothing.
Adding axes=F into gbm.plot() turns the axes and their numbers off. So I conclude that the control of these axes is handled by gbm.plot, not par. Here's where it gets frustrating and crap. Accepted wisdom from searches says that lwd should control this, but it only controls the wiggly centre line as per my note above. So maybe I could add axis(side=1, lwd=8) into gbm.plot()?
It runs but inexplicably adds a smoother! (which is very thin & hard to see on the web but it's there, I promise). It adds these warnings:
In if (smooth & is.vector(predictors[[j]])) { ... :
the condition has length > 1 and only the first element will be used
Fine, R's going to be a dick for seemingly no reason, I'll keep plugging the leaks as they come up. New code with axis as before and now smoother turned off:
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F,
axis(side=1,lwd=8))
dev.off()
Gives error:
Error in axis(side = 1, lwd = 8) : plot.new has not been called yet
So it's CLEARLY drawing axes within plot since I can't affect the axes from par and I can turn them off in plot. I can do what I want and make one axis bold, but that results in a smoother and warnings. I can turn the smoother off, but then it fails because it says plot.new hadn't been called. And this doesn't even account for the other axis I have to deal with, which also causes the plot.new failure if I call 2 axis sequentially and allow the smoother.
Am I the butt of a big joke here, or am I missing something obvious? It took me long enough to work out that par is supposed to be before all plots unless you're outputting them with png etc in which case it has to be between png & plot - unbelievably this info isn't in ?par. I know I'm going off topic by ranting, sorry, but yeah, 2 full days. Has this been everyone's experience of plotting in R?
I'm going to open the vodka in the freezer. I appreciate I've not put the full reproducible code here, apologies, I can do if absolutely necessary, but it's such a huge timesuck to get to reproducible stage and I'm hoping someone can see a basic logical/coding failure screaming out at them from what I've given.
Thanks guys.
EDIT: reproducibility
core data csv: https://drive.google.com/file/d/0B6LsdZetdypkWnBJVDJ5U3l4UFU
(I've tried to make these data reproducible before and I can't work out how to do so)
samples<-read.csv("data.csv", header = TRUE, row.names=NULL)
my_gbm_model<-gbm.step(data=samples, gbm.x=1:6, gbm.y=7, family = "bernoulli", tree.complexity = 2, learning.rate = 0.01, bag.fraction = 0.5)
Here's what will widen your axis ticks:
..... , lwd.ticks=4 , ...
I predict (on the basis of no testing, because I keep getting errors with what limited code you have provided) that it will get handled correctly in either gbm.plot or in a subsequent axis call. There will need to be a subsequent axis call, two of them in fact (because as you noted 'lwd' gets passed around indiscriminately):
png(filename = "whatever.png",width=4*480, height=4*480, units="px", pointsize=80, bg="white", res = NA, family="", type="cairo-png")
par(mar=c(2.6,2,0.4,0.5), fig=c(0,1,0.1,1), las=1, bty="n", mgp=c(1.6,0.5,0))
gbm.plot(my_gbm_model,
n.plots=1,
plot.layout = c(1,1),
y.label = "",
write.title=F,
variable.no = 1,
lwd=8,
rug=F,
smooth=F,
axes=FALSE) # suppress the default axes so they can be redrawn below
axis(1, lwd.ticks=4, lwd=4)
# the only way to prevent `lwd` from also affecting plot line
axis(2, lwd.ticks=4, lwd=4)
dev.off()
This is what I see with a simple example:
png(); Speed <- cars$speed
Distance <- cars$dist
plot(Speed, Distance,
panel.first = lines(stats::lowess(Speed, Distance), lty = "dashed"),
pch = 0, cex = 1.2, col = "blue", axes=FALSE)
axis(1, lwd.ticks=4, lwd=4)
axis(2, lwd.ticks=4, lwd=4)
dev.off()
