Bug in dotchart pch? - r

I think there may be a bug in the way the pch parameter is read within the dotchart function, but would appreciate peer confirmation before reporting it.
In the following, I would like both colour and symbol to vary with the group. Colour works fine, as expected, but not symbol.
foo <- data.frame(Specimen=paste("Specimen", 1:18),
Group=c(rep("Benign", 4),
rep("In-situ", 6),
rep("Invasive", 8)),
Outcome=rweibull(18, 5) + (1:18 / 18))
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[Group],
xlab="Outcome measure /bar",
labels = Specimen))
There is an easy but rather bizarre workaround: reversing the "Group" vector used to index pch:
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[rev(Group)],
xlab="Outcome measure /bar",
labels = Specimen))
However, I cannot see a single legitimate reason why the vector for pch should have to be reversed, particularly since colour seems to work entirely as expected. Thoughts?
Incidentally, the reason I generally try to vary the symbol as well as the colour for different groups in a chart is for the benefit of colour blind readers. Granted, it is not so important in this case.

I agree this may be a bug (something I am genuinely cautious about claiming for base R functions like this).
Specifically, dotchart reorders the color and lcolor (line color) arguments here:
o <- sort.list(as.numeric(groups), decreasing = TRUE)
x <- x[o]
groups <- groups[o]
color <- rep_len(color, length(groups))[o]
lcolor <- rep_len(lcolor, length(groups))[o]
...and those are used in the subsequent abline and points calls, but pch is passed on unchanged. The fix would likely be to simply add the line,
pch <- rep_len(pch, length(groups))[o]
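The effect of that reordering can be sketched outside dotchart. The snippet below is only an illustration: it rebuilds the Group factor from the question, applies the same sort.list() call quoted above, and compares a pch vector that is passed through unchanged with one that is reordered like color. The variable names are made up for the example, and whether a given R version still shows the problem inside dotchart is not checked here.

```r
# Stand-in for the question's Group column (same group sizes)
groups <- factor(c(rep("Benign", 4), rep("In-situ", 6), rep("Invasive", 8)))
pch_in <- c(16, 15, 17)[groups]            # what the caller passes as pch

# Same ordering step as in the quoted dotchart source
o <- sort.list(as.numeric(groups), decreasing = TRUE)
groups_sorted <- groups[o]

pch_as_passed <- pch_in                                   # pch left untouched
pch_reordered <- rep_len(pch_in, length(groups))[o]       # the proposed fix
pch_workaround <- c(16, 15, 17)[rev(groups)]              # the rev() trick

# Side-by-side view: only the reordered versions line up with the sorted groups
print(data.frame(group = groups_sorted,
                 as_passed = pch_as_passed,
                 reordered = pch_reordered,
                 workaround = pch_workaround))
```

With this particular data the rev() workaround happens to produce the same vector as the reordered pch, because the observations are already sorted by group; that would not hold for data in arbitrary order.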
If I wanted to put my pedantic hat on (which is a good idea before submitting a bug report), I would note that the documentation for ?dotchart specifies:
color the color(s) to be used for points and labels.
for the color argument, but only:
pch the plotting character or symbol to be used.
for the pch argument. Some may argue that this "clearly" implies that only color is intended to take multiple values, and so in that sense this isn't a "bug".

This definitely looks like a bug. I have a dataset where samples have a fairly complex 4*4 color+pch coding corresponding to things that are also in the sample names, on top of groups, and the pch values just don't seem to be reordered at all during group reordering. I'll try to submit a bug report in the next few weeks. I am using R 3.6.1.

Related

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize through the plot(swiss) function.
After this, I want to add outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Question: is there any restriction on how the points() function can be used for a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
ggpairs(columns = 1:6,
mapping = aes(color = is_outlier),
upper = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
lower = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When plotting a data frame R produces many plots and aligns them. What you're actually seeing there is 6 by 6 = 36 individual plots which have all been aligned to look nice.
When you call points(), it places the points on the current plot, which doesn't really make sense when you have 36 plots, at least not the way you want it to.
ggplot2 is a really powerful tool in R; it provides far greater customisability. For example, you could set up the data frame to include your outliers, label them as "outlier", and show them in each plot that you have set up as facets. The more you explore it, the more likely you are to find plots that suit your needs even better.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, looking up the function ggpairs from the GGally package would be a good place to start.
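For completeness, a base-graphics route that stays close to the original attempt can be sketched as follows. It is not taken from the answers above; it relies on the fact that plot() on a data frame draws a pairs() matrix and that pairs() forwards col and pch to every panel. The names is_outlier, point_col and point_pch are chosen here for illustration, and the threshold of 10 is the one used in the question.

```r
# Mahalanobis distance of each row of the built-in swiss data from the column means
d <- mahalanobis(swiss, colMeans(swiss), cov(swiss))
is_outlier <- d > 10   # same cut-off as in the question

# One colour and one symbol per observation
point_col <- ifelse(is_outlier, "red", "black")
point_pch <- ifelse(is_outlier, 4, 1)   # 4 = cross, 1 = open circle

# pairs() passes col and pch on to each of the scatterplot panels
pairs(swiss, col = point_col, pch = point_pch)
```

Because the styling is supplied when the matrix is drawn, no later points() call is needed, which avoids the "current plot" problem described above.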
A quick google image search shows some funky plots akin to what you're after and then some!

How do I exclude parameters from an RDA plot

I'm still relatively inexperienced manipulating plots in R, and am in need of assistance. I ran a redundancy analysis in R using the rda() function, but now I need to simplify the figure to exclude unnecessary information. The code I'm currently using is:
abio1516<-read.csv("1516 descriptors.csv")
attach(abio1516)
bio1516<-read.csv("1516habund.csv")
attach(bio1516)
rda1516<-rda(bio1516[,2:18],abio1516[,2:6])
anova(rda1516)
RsquareAdj(rda1516)
summary(rda1516)
varpart(bio1516[,2:18],~Distance_to_source,~Depth, ~Veg._cover, ~Surface_area,data=abio1516)
plot(rda1516,bty="n",xaxt="n",yaxt="n",main="1516; P=, R^2=",
ylab="Driven by , Var explained=",xlab="Driven by , Var explained=")
The produced plot looks like this:
Please help me modify my code to: exclude the sites (sit#), all axes, and the internal dashed lines.
I'd also like to either expand the size of the field, or move the vector labels to all fit in the plotting field.
Updated as per the responses; working code below this point:
plot(rda1516,bty="n",xaxt="n",yaxt="n",type="n",main="xxx",ylab="xxx",xlab="xxx
Overall best:xxx")
abline(h=0,v=0,col="white",lwd=3)
points(rda1516,display="species",col="blue")
points(rda1516,display="cn",col="black")
text(rda1516,display="cn",col="black")
Start by plotting the rda with type = "n", which generates an empty plot to which you can add the things you want. The dotted lines are hard coded into the plot.cca function, so you need to either make your own version or use abline to hide them (then use box to cover up the holes in the axes).
require(vegan)
data(dune, dune.env)
rda1516 <- rda(dune~., data = dune.env)
plot(rda1516, type = "n")
abline(h = 0, v = 0, col = "white", lwd = 3)
box()
points(rda1516, display = "species")
points(rda1516, display = "cn", col = "blue")
text(rda1516, display = "cn", col = "blue")
If the text labels are not in the correct position, you can use the argument pos to move them: make a vector as long as the number of arrows you have, with the integers 1 to 4 to move each label down, left, up, or right. (There might be better solutions to this.)
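The meaning of the pos codes can be seen with plain base graphics. The sketch below uses made-up coordinates and labels and the ordinary text() function; whether vegan's text method forwards a pos vector in exactly the same way is the answer's suggestion and is not verified here.

```r
# Four illustrative points on a horizontal line
x <- 1:4
y <- c(2, 2, 2, 2)

# Empty plot first, then add points and labels separately
plot(x, y, type = "n", xlim = c(0, 5), ylim = c(1, 3))
points(x, y, pch = 16)

# One pos value per label: 1 = below, 2 = left, 3 = above, 4 = right
label_pos <- c(1, 2, 3, 4)
text(x, y, labels = c("below", "left", "above", "right"), pos = label_pos)
```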

Color option in xtsExtra

I am having trouble adjusting the colors of a multiple time series plot using xtsExtra.
This is the code of a minimal example:
require("xtsExtra")
n <- 50
data <- replicate(2, rnorm(n))
my.ts <- as.xts(ts(data, start=Sys.Date()-n, end=Sys.Date()))
plot.zoo(my.ts, col = c('blue', 'green'))
plot.xts(my.ts, col = c('blue', 'green'))
The plot.zoo command yields one plot (image not shown), whereas the plot command from the xtsExtra package results in another (image not shown).
In the second plot, the two time series are nicely overlaid, but seem insensitive to the col option.
I'm using the latest version 0.0-1 of the xtsExtra package (rev. 862).
It is my understanding that the xts and xtsExtra packages are designed as extensions of zoo and should work with the same arguments (plus many additional ones). Even though I can get the same overlay behavior in plot.zoo using the screens option, I cannot really resort to using it because the call to plot.xts that causes my problems is within the quantstrat package (functions chart.forward.training and chart.forward.testing for example) which I'd loathe to modify. (Incidentally, the dev.new() in these functions is causing me trouble as well.)
Question: Why does plot from the xtsExtra package seem not to respond to the col= option and what can be done about it, if modifying the call to the function is not a real option?
Q1. If you take the time to read the help text for plot.xts, you will see that the function does not have a col argument. Together with the fact that partial matching of argument names doesn't seem to be allowed in the function, this explains why plot.xts does not respond to col =.
Compare with a case where partial matching works:
plot(x = 1:2, y = 1:2, type = "b"); plot(x = 1:2, y = 1:2, ty = "b")  # "ty" partially matches "type"
See here: "If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched".
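One common reason partial matching is unavailable is that the formal argument sits after ... in the function definition; arguments placed after ... must be named in full. The toy functions below are hypothetical and are not the real plot.xts signature; they only show the mechanism that could make col = disappear silently.

```r
# Hypothetical functions for illustration only
plain_fun  <- function(type = "p") type               # no dots
no_dots    <- function(colorset = 1:12) colorset      # no dots
dots_first <- function(..., colorset = 1:12) colorset # formal comes after ...

plain_fun(ty = "b")            # "ty" partially matches "type"
no_dots(col = "blue")          # "col" partially matches "colorset"
dots_first(col = "blue")       # "col" is absorbed by ...; colorset keeps its default
dots_first(colorset = "blue")  # the full name is required after ...
```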
Q2. Instead you may use the colorset argument:
"color palette to use, set by default to rational choices" (colorset = 1:12).
plot.xts(my.ts, colorset = c('blue', 'green'))

R legend pch mix of character and numeric

Is it possible to use a mix of character and number as plotting symbols in R legend?
plot(x=c(2,4,8),y=c(5,4,2),pch=16)
points(x=c(3,5),y=c(2,4),pch="+")
legend(7,4.5,pch=c("+",16),legend=c("A","B")) #This is the problem
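The root of the problem in the legend call can be seen by inspecting the vector itself: c() cannot hold a character and a number side by side, so it coerces everything to character, and 16 is no longer a numeric symbol code. A short check, with variable names chosen for illustration:

```r
# Mixing a string and a number in c() coerces the number to a string
mixed <- c("+", 16)
class(mixed)   # character
mixed          # the second element is now the string "16"

# A purely numeric vector keeps 16 as a symbol code
all_numeric <- c(43, 16)
class(all_numeric)
```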
Use the numerical equivalent of the "+" character:
plot(x=c(2,4,8),y=c(5,4,2),pch=16)
points(x=c(3,5),y=c(2,4),pch="+")
legend(7,4.5,pch=c(43,16),legend=c("A","B"))
There are actually numerical equivalents for all symbols!
Source: Dave Roberts
The pch code is the concatenation of the Y and X coordinates of the above plot.
For example, the + symbol is in row (Y) 4 and column (X) 3, and therefore can be drawn using pch = 43.
Example:
plot(x=c(2,4,8),y=c(5,4,2),pch=16)
points(x=c(3,5),y=c(2,4),pch="+")
legend(7,4.5,pch=c(43,16),legend=c("A","B"))
My first thought is to plot the legend twice, once to print the character symbols and once to print the numeric ones:
plot(x=c(2,4,8),y=c(5,4,2),pch=16)
points(x=c(3,5),y=c(2,4),pch="+")
legend(7,4.5,pch=c(NA,16),legend=c("A","B")) # NA means don't plot pt. character
legend(7,4.5,pch=c("+",NA),legend=c("A","B"))
NOTE: Oddly, this works in R's native graphical device (on Windows) and in pdf(), but not in bmp() or png() devices ...
I bumped into this issue several times, so I wrote a tiny function (below). You can use it to specify the pch value, e.g.
pch=c(15:17,s2n("|"))
String to Numeric
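The body of s2n does not appear in this text. As a guess at what such a helper might look like, and not the answerer's actual code, a minimal version can be built on base R's utf8ToInt(), which returns the code point of a character:

```r
# Hypothetical reconstruction: convert a single character to its numeric code
s2n <- function(ch) {
  stopifnot(is.character(ch), length(ch) == 1, nchar(ch) == 1)
  utf8ToInt(ch)
}

# Mix ordinary symbol codes with a character-derived one
pch_values <- c(15:17, s2n("|"))

plot(1:4, rep(1, 4), pch = pch_values)
legend("topright", legend = c("A", "B", "C", "D"), pch = pch_values)
```

Keeping the whole vector numeric means the same pch_values can be handed to both plot() and legend().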
As noted in previous answers, you can simply use the numerical equivalents of the character symbols you want to plot alongside the numeric ones.
However, just a related aside: if you want to plot larger numbers (e.g., > 100) or strings (e.g., 'ABC') as symbols, you need to use a totally different approach based on using text().
plot(x, y, type = 'n'); text(x, y, labels = c(100, 'ABC'))
Creating a legend in this case is more complicated, and the best approach I've come up with is to stack legends on top of each other and use the legend argument for both the pch symbol and the description:
pchs <- c(100,'ABC','540',sum(13+200),'SO77')
plot(1:5,1:5,type='n',xlim=c(1,5.1))
text(1:5,1:5,labels = pchs)
legend(3.5,3,legend = pchs,bty='n',title = '')
legend(3.5,3,legend = paste(strrep(' ',12),'ID#',pchs),bty='n',title='Legend')
rect(xleft = 3.7, ybottom = 1.5, xright = 5.1, ytop = 3)
This uses strrep to concatenate spaces in order to shift the text over from the "symbols", and it uses rect to retroactively fit a box around the printed legend text.

lattice auto.key - how to adjust lines and points?

When I use barchart(), I get something like this (I know the image is not a bar chart, but my auto.key produces the same legend):
I would like to fill the points and make them larger or set them to rectangles with the corresponding color.
When I use densityplot(), I get something like this:
I would like to make the lines "thicker" if possible.
See ?xyplot. Some details:
For your first question about changing colors, use the col argument, e.g.
barplot(table(mtcars$am, mtcars$gear), col = c("green", "yellow"))
But if you want to deal with a scatterplot instead of a barplot (I'm confused here) with modified symbols, then auto.key is unfortunately not an option, but something like this would work without problems:
xyplot(mtcars$hp ~ mtcars$wt, groups = mtcars$gear,
key = list(text = list(as.character(unique(mtcars$gear))),
points = list(pch = 10:12, col = 12:14)), pch = 10:12, col = 12:14)
For your second question use lwd:
densityplot(mtcars$hp, lwd = 3)
I just spent a good chunk of time on essentially this same problem. For some reason, the @daroczig-style approach wasn't working for changing line types (including for the key) in a densityplot.
In any case, I think the "right" approach is to use trellis.par.set along with auto.key like so:
# Maybe we'll want this later
old.pars <- trellis.par.get()
trellis.par.set(superpose.symbol=list(pch = 10:12, col = 12:14))
xyplot(hp ~ wt, data=mtcars, groups = gear, auto.key=TRUE)
# Optionally put things back how they were
trellis.par.set(old.pars)
There's actually less typing this way (especially if you don't count my saving and restoring the original trellis pars), and less redundancy (allowing for DRY coding). Also, for the life of me, I can't figure out how to easily make multiple columns using key, but you can add columns as one of the elements of the auto.key list.
Also, make sure you're changing the right element! For example, if you changed plot.symbol (which sure sounds like the right thing), it would not do anything. Generally, for things based on xyplot, I believe superpose.* are the right elements to actually modify the symbols, lines, etc.
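The same superpose.* idea can be applied to the thicker-lines part of the question. The sketch below is one possible variant rather than the method from the answers: it uses the par.settings argument of lattice's high-level functions, so the change is scoped to a single plot instead of being set globally, and it assumes the built-in mtcars data with gear as the grouping variable.

```r
library(lattice)

# Thicker density lines for grouped data; the key picks up the same setting
p <- densityplot(~ hp, data = mtcars, groups = gear,
                 par.settings = list(superpose.line = list(lwd = 3)),
                 auto.key = list(lines = TRUE, points = FALSE, columns = 3))
print(p)
```

Since the setting travels with the plot object, there is nothing to save and restore afterwards.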
daroczig's answer is what I typically do when I face this kind of situation. In general, however, I prefer to use lattice default colors instead of specifying my own colors.
You can do that by doing this:
lattice.theme <- trellis.par.get()
col <- lattice.theme$superpose.symbol$col
pl <- xyplot(X ~ Y, groups=Z, data=dframe, pch=1:nlevels(dframe$Z),
type='o', key=list(text=list(levels(dframe$Z)), space='top',
points=list(pch=1:nlevels(dframe$Z), col=col),
lines=list(col=col),
columns=nlevels(dframe$Z)))

Resources