lattice auto.key - how to adjust lines and points? - r

When I use barchart(), I get something like this (I know the image is not a bar chart, but my auto.key produces the same legend):
I would like to fill the points and make them larger, or set them to rectangles with the corresponding color.
When I use densityplot(), I get something like this:
I would like to make the lines "thicker" if possible.

See ?xyplot. Some details:
For your first question, about changing colors, use the col argument, e.g.
barplot(table(mtcars$am, mtcars$gear), col = c("green", "yellow"))
But if you want to work with a scatterplot rather than a bar plot (I'm confused here) with modified symbols, then auto.key is unfortunately not an option, but something like this works without problems:
xyplot(mtcars$hp ~ mtcars$wt, groups = mtcars$gear,
key = list(text = list(as.character(unique(mtcars$gear))),
points = list(pch = 10:12, col = 12:14)), pch = 10:12, col = 12:14)
For your second question use lwd:
densityplot(mtcars$hp, lwd = 3)

I just spent a good chunk of time on essentially this same problem. For some reason, @daroczig's approach wasn't working for changing line types (including in the key) in a densityplot.
In any case, I think the "right" approach is to use trellis.par.set along with auto.key like so:
# Maybe we'll want this later
old.pars <- trellis.par.get()
trellis.par.set(superpose.symbol=list(pch = 10:12, col = 12:14))
xyplot(hp ~ wt, data=mtcars, groups = gear, auto.key=TRUE)
# Optionally put things back how they were
trellis.par.set(old.pars)
There's actually less typing this way (especially if you don't count my saving and restoring the original trellis pars), and less redundancy (allowing for DRY coding). Also, for the life of me, I can't figure out how to easily make multiple columns using key, but you can add columns as one of the elements of the auto.key list.
Also, make sure you're changing the right element! For example, if you changed plot.symbol (which sure sounds like the right thing), it would not do anything. Generally, for things based on xyplot, I believe superpose.* are the right elements to actually modify the symbols, lines, etc.
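As a rough sketch of the same idea without touching the global settings: lattice's high-level functions accept a par.settings argument, which applies the listed trellis parameters to that one plot only, so there is nothing to restore afterwards. The colours, pch values and object names below (cols, my.settings, p_points, p_bars, p_dens) are illustrative assumptions, not taken from the answers above.

```r
library(lattice)

# Illustrative settings: filled, larger symbols; thicker lines; matching bar fills
cols <- c("red", "blue", "darkgreen")
my.settings <- list(
  superpose.symbol  = list(pch = 15:17, cex = 1.5, col = cols),
  superpose.line    = list(lwd = 3, col = cols),
  superpose.polygon = list(col = cols)
)

# Scatterplot: the auto.key legend picks up the symbols from superpose.symbol
p_points <- xyplot(hp ~ wt, data = mtcars, groups = gear,
                   par.settings = my.settings,
                   auto.key = list(columns = 3))

# Bar chart: ask for rectangles instead of points in the legend
counts <- as.data.frame(table(gear = mtcars$gear, am = mtcars$am))
p_bars <- barchart(Freq ~ am, data = counts, groups = gear,
                   par.settings = my.settings,
                   auto.key = list(rectangles = TRUE, points = FALSE, columns = 3))

# Density plot: thicker lines in both the panel and the legend
p_dens <- densityplot(~ hp, data = mtcars, groups = gear, plot.points = FALSE,
                      par.settings = my.settings,
                      auto.key = list(lines = TRUE, points = FALSE, columns = 3))

print(p_points)
print(p_bars)
print(p_dens)
```

Because the legend built by auto.key reads the same superpose.* settings as the panels, the key and the plot stay in sync without repeating pch or col anywhere.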

daroczig's answer is what I typically do when I face this kind of situation. In general, however, I prefer to use lattice default colors instead of specifying my own colors.
You can do that by doing this:
lattice.theme <- trellis.par.get()
col <- lattice.theme$superpose.symbol$col
pl <- xyplot(X ~ Y, groups=Z, data=dframe, pch=1:nlevels(dframe$Z),
type='o', key=list(text=list(levels(dframe$Z)), space='top',
points=list(pch=1:nlevels(dframe$Z), col=col),
lines=list(col=col),
columns=nlevels(dframe$Z)))

Related

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize through the plot(swiss) function.
After this, I want to add the outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Question: is there any restriction on how the points() function can be used with a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
ggpairs(columns = 1:6,
mapping = aes(color = is_outlier),
upper = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
lower = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When plotting a data frame R produces many plots and aligns them. What you're actually seeing there is 6 by 6 = 36 individual plots which have all been aligned to look nice.
When you call points(), it places the points on the current plot, which doesn't really make sense when you have 36 plots, at least not in the way you want.
ggplot2 is a really powerful tool in R; it provides far greater customizability. For example, you could set up the data frame to include your outliers, label them as "outlier", and show them in each plot that you have set up as facets. The more you explore it, the more likely you are to find plots that suit your needs better.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, looking up the ggpairs function from the GGally package would be a good place to start.
A quick Google image search for ggpairs shows some funky plots akin to what you're after, and then some!
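For completeness, here is a minimal base-graphics sketch that avoids points() altogether: pairs() (which plot() dispatches to for a data frame) passes col and pch on to every panel, so a per-row vector marks the same observations in all of them. The threshold of 10 comes from the question; the names d2 and is_outlier are made up for this example. Note that mahalanobis() returns squared distances.

```r
# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(swiss, colMeans(swiss), cov(swiss))
is_outlier <- d2 > 10

# One colour and one symbol per row; each panel receives the same vectors
pairs(swiss,
      col = ifelse(is_outlier, "red", "black"),
      pch = ifelse(is_outlier, 4, 1))
```

This keeps the original scatterplot matrix and only changes how the flagged rows are drawn, rather than trying to add a layer after the fact.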

How do I exclude parameters from an RDA plot

I'm still relatively inexperienced manipulating plots in R, and am in need of assistance. I ran a redundancy analysis in R using the rda() function, but now I need to simplify the figure to exclude unnecessary information. The code I'm currently using is:
abio1516<-read.csv("1516 descriptors.csv")
attach(abio1516)
bio1516<-read.csv("1516habund.csv")
attach(bio1516)
rda1516<-rda(bio1516[,2:18],abio1516[,2:6])
anova(rda1516)
RsquareAdj(rda1516)
summary(rda1516)
varpart(bio1516[,2:18],~Distance_to_source,~Depth, ~Veg._cover, ~Surface_area,data=abio1516)
plot(rda1516,bty="n",xaxt="n",yaxt="n",main="1516; P=, R^2=",
ylab="Driven by , Var explained=",xlab="Driven by , Var explained=")
The produced plot looks like this:
Please help me modify my code to: exclude the sites (sit#), all axes, and the internal dashed lines.
I'd also like to either expand the size of the field, or move the vector labels to all fit in the plotting field.
updated as per responses, working code below this point
plot(rda,bty="n",xaxt="n",yaxt="n",type="n",main="xxx",ylab="xxx",xlab="xxx
Overall best:xxx")
abline(h=0,v=0,col="white",lwd=3)
points(rda,display="species",col="blue")
points(rda,display="cn",col="black")
text(rda,display="cn",col="black")
Start by plotting the rda with type = "n", which generates an empty plot to which you can add the things you want. The dotted lines are hard-coded into the plot.cca function, so you need to either make your own version or use abline to hide them (then use box to cover up the holes in the axes).
require(vegan)
data(dune, dune.env)
rda1516 <- rda(dune~., data = dune.env)
plot(rda1516, type = "n")
abline(h = 0, v = 0, col = "white", lwd = 3)
box()
points(rda1516, display = "species")
points(rda1516, display = "cn", col = "blue")
text(rda1516, display = "cn", col = "blue")
If the text labels are not in the correct position, you can use the pos argument to move them: make a vector as long as the number of arrows you have, with the integers 1 to 4 placing each label below, to the left of, above, or to the right of its point. (There might be better solutions to this.)

Bug in dotchart pch?

I think there may be a bug in the way the pch parameter is read within the dotchart function, but would appreciate peer confirmation before reporting it.
In the following, I would like both colour and symbol to vary with the group. Colour works fine, as expected, but not symbol.
foo <- data.frame(Specimen=paste("Specimen", 1:18),
Group=c(rep("Benign", 4),
rep("In-situ", 6),
rep("Invasive", 8)),
Outcome=rweibull(18, 5) + (1:18 / 18))
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[Group],
xlab="Outcome measure /bar",
labels = Specimen))
There is an easy but rather bizarre workaround by reversing the "Group" column encoding pch :
with(foo, dotchart(Outcome,
groups = Group,
color = c("green", "orange", "red")[Group],
pch=c(16, 15, 17)[rev(Group)],
xlab="Outcome measure /bar",
labels = Specimen))
However, I cannot see a single legitimate reason why the vector for pch should have to be reversed, particularly since colour seems to work entirely as expected. Thoughts?
Incidentally, the reason I generally try to vary the symbol as well as the colour for different groups in a chart is for the benefit of colour blind readers. Granted, it is not so important in this case.
I agree this may be a bug (which I am genuinely cautious about in base R functions like this).
Specifically, dotchart reorders the color and lcolor (line color) arguments here:
o <- sort.list(as.numeric(groups), decreasing = TRUE)
x <- x[o]
groups <- groups[o]
color <- rep_len(color, length(groups))[o]
lcolor <- rep_len(lcolor, length(groups))[o]
...and those are used in the subsequent abline and points calls, but pch is passed on unchanged. The fix would likely be to simply add the line,
pch <- rep_len(pch, length(groups))[o]
If I wanted to put my pedantic hat on (which is a good idea before submitting a bug report), I would note that the documentation for ?dotchart specifies:
color the color(s) to be used for points and labels.
for the color argument, but only:
pch the plotting character or symbol to be used.
for the pch argument. Some may argue that this "clearly" implies that only color is intended to take multiple values, and so in that sense this isn't a "bug".
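To see why the rev(Group) workaround happens to line up, the permutation quoted above can be reproduced outside of dotchart. This is only an illustration of the reordering step, using the group sizes from the question; the names pch_in and pch_needed are made up here, and whether a given R version applies this permutation to pch itself is something to check in that version's dotchart source.

```r
# Same group layout as in the question: 4 Benign, 6 In-situ, 8 Invasive
Group <- factor(rep(c("Benign", "In-situ", "Invasive"), times = c(4, 6, 8)))

# pch as the user supplies it, in the original row order
pch_in <- c(16, 15, 17)[Group]

# The permutation dotchart applies to x, groups, color and lcolor
o <- sort.list(as.numeric(Group), decreasing = TRUE)

# What pch would have to be after the same permutation
pch_needed <- pch_in[o]

# The points are drawn in group order Invasive, In-situ, Benign
print(as.character(Group[o]))

# The workaround from the question yields exactly the permuted vector
print(identical(pch_needed, c(16, 15, 17)[rev(Group)]))
```

The match relies on the rows already being sorted by group, so reversing the factor reproduces the decreasing group order; with unsorted data, rev() would not be a safe substitute for indexing by o.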
This definitely looks like a bug. I have a dataset where samples have a fairly complex 4*4 color+pch coding corresponding to things that are also in the sample names, on top of groups, and the pch values just don't seem to be reordered at all during group reordering. I'll try to submit a bug report in the next weeks. I have R 3.6.1

Dimple dPlot color x-axis bar values in R

I'm attempting to set manual colors for Dimple dPlot line values and having some trouble.
d1 <- dPlot(
x="Date",
y="Count",
groups = "Category",
data = AB_DateCategory,
type = 'line'
)
d1$xAxis(orderRule = "Date")
d1$yAxis(type = "addMeasureAxis")
d1$xAxis(
type = "addTimeAxis",
inputFormat = "%Y-%m-%d",
outputFormat = "%Y-%m-%d"
)
The plot comes out looking great, but I would like to manually set the "Category" colors. Right now, it's set to the defaults and I cannot seem to find a method of manually setting a scale.
I have been able to set the defaults using brewer.pal, but I want to match other colors in my report:
d1$defaultColors(brewer.pal(n=4,"Accent"))
Ideally, these are my four colors - the category values I'm grouping on are R, D, O and U.
c("#377EB8", "#4DAF4A", "#E41A1C", "#984EA3")
If I understand correctly, you want to make sure R is #377EB8, etc. To match R, D, O, U consistently to the colors especially across multiple charts, you will need to do something like this.
d1$defaultColors = "#!d3.scale.ordinal().range(['#377EB8', '#4DAF4A', '#E41A1C', '#984EA3']).domain(['R','D','O','U'])!#"
This is on my list of things to make easier.
Let me know if this doesn't work.
The issue with the accepted answer above is that defining an ordinal scale will not guarantee that specific colors are bound to specific categories R, D, O and U. The color mapping will change depending on the input data. To assign each color specifically you can use assignColor like this
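d1$setTemplate(afterScript = '<script>
myChart.assignColor("R","#377EB8");
myChart.assignColor("D","#4DAF4A");
myChart.assignColor("O","#E41A1C");
myChart.assignColor("U","#984EA3");
myChart.draw();
</script>')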

Plotting three densities on the same graph in different line patterns with titles etc

I am very, very new to R so please forgive the basic nature of my question. In short, I have done a lot of Google searching to try to answer this, but I find that even the basic guides available, and simple discussions on forums are assuming more prior knowledge than I have, especially when it comes to outlining what all of the coding terms are and what changing them means for a plot.
In short I have a tab formatted table with three columns of data that I wish to plot densities for on a single graph. I would like the lines to be different patterns (dotted, dashed etc. whatever makes it easy to tell them apart, I cannot use colours as my supervisor is colour blind).
I have code that reads in the data and makes accessible the columns I am interested in:
mydata <- read.table("c:/Users/Demon/Desktop/Thesis/Fst_all_genome.txt", header=TRUE,
sep="\t")
fstdata <- data.frame(Fst_ceu_mkk =rnorm(10),
Fst_ceu_yri =rnorm(10),
Fst_mkk_yri =rnorm(10))
Where do I go from here?
Appendix A of 'An Introduction to R' has a nice walkthrough tutorial you can do in ten minutes; it teaches, among other things, about line types.
After that, plotting densities has been explained dozens of times here too; search in the search box above for e.g. '[r] density'. There is also the R Graph Gallery (possibly down right now) and more.
A nice, free guide I often recommend is John Verzani's simpleR which stresses graphs a lot and will teach you what you need here.
Two options for you to explore using high-level graphics.
# dummy data
d = data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
You first need to reshape the data from wide to long format,
require(reshape2)
m = melt(d)
ggplot2 graphics
require(ggplot2)
ggplot(data = m, mapping = aes(x = value, linetype = variable)) +
geom_line(stat = "density")
Lattice graphics
Using the same melt()ed data,
require(lattice)
densityplot( ~ value, data = m, groups = variable,
auto.key = TRUE, par.settings = col.whitebg())
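Since the question rules out colour, here is a hedged variant of the lattice call that separates the curves by line type only. It is a sketch under a few assumptions: base R's stack() stands in for melt() so no extra package is needed (its output columns are named values and ind), and the lty and lwd values are arbitrary choices.

```r
library(lattice)

set.seed(42)
# Dummy data in wide format, as in the answer above
d <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

# Wide to long with base R: columns 'values' and 'ind'
m <- stack(d)

# All lines black, distinguished only by line type
p <- densityplot(~ values, data = m, groups = ind, plot.points = FALSE,
                 par.settings = list(
                   superpose.line = list(col = "black", lty = 1:3, lwd = 2)),
                 auto.key = list(lines = TRUE, points = FALSE, columns = 3))
print(p)
```

Setting superpose.line through par.settings changes the panel and the legend together, which is why the key shows the same three line types as the curves.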
If you need something very simple, you could do simply:
plot(density(mydata$col_1))
lines(density(mydata$col_2), lty = 2)
lines(density(mydata$col_3), lty = 3)
If the second and third density curves are far away from the first, you'll need to define the x and y limits of the plotting region explicitly:
dens1 <- density(mydata$col_1)
dens2 <- density(mydata$col_2)
dens3 <- density(mydata$col_3)
plot(dens1, xlim = range(dens1$x, dens2$x, dens3$x),
ylim = range(dens1$y, dens2$y, dens3$y))
lines(dens2, lty = 2)
lines(dens3, lty = 3)
Hope this helps.
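Putting the base-graphics pieces together with a title and a legend might look roughly like this. It is a sketch, assuming the three columns from the question's fstdata (filled with random numbers here, as in the question); the title and axis label text are placeholders.

```r
set.seed(1)
# Stand-in data with the column names from the question
fstdata <- data.frame(Fst_ceu_mkk = rnorm(10),
                      Fst_ceu_yri = rnorm(10),
                      Fst_mkk_yri = rnorm(10))

# One density estimate per column
dens <- lapply(fstdata, density)

# Axis limits wide enough for all three curves
xr <- range(sapply(dens, function(d) range(d$x)))
yr <- range(sapply(dens, function(d) range(d$y)))

# First curve sets up the plot; the others are added with different line types
plot(dens[[1]], xlim = xr, ylim = yr, lty = 1,
     main = "Fst densities", xlab = "Fst")
for (i in 2:3) lines(dens[[i]], lty = i)

# Legend keyed by line type only, no colour needed
legend("topright", legend = names(dens), lty = 1:3, bty = "n")
```

Using lapply() over the columns means a fourth comparison only needs another column and a longer lty vector.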
