Colouring a subset of data in lattice plots in R

I have plotted an xyplot in lattice of shellfish catch rates by year, grouped by survey area, using the code below:
xyplot(catch.rate ~ Year | Area, data, xlab = "Year", ylab = "Catch rate",
col ="black", par.settings = list(strip.background=list(col="white")))
I have one year of data that I would like to highlight on the plot in a different colour (e.g. red). I created a subset of this data with:
subset <- grep("^0214A",data$Haul_ID,ignore.case=TRUE)
I have done something similar with the standard R plots using points before, but I am new to lattice and not sure how to do this with this package.

For plots without conditioning variables, the col= argument accepts a vector parallel to the points being plotted, so for instance
xyplot(mpg~disp, mtcars, col=mtcars$cyl, pch=20, cex=4)
colors points by the number of cylinders. For your data, something like this would keep the points black and draw the matching hauls in red:
cols <- c("black", "red")[grepl("^0214A", data$Haul_ID, ignore.case=TRUE) + 1L]
For plots with conditioning variables, write a panel function that accepts the col vector and subscripts, an index into the data identifying the rows plotted in the current panel. The panel function passes its arguments on to panel.xyplot(), subsetting col so that each point keeps its own color. Here's the panel function
panel <- function(..., col, subscripts) {
panel.xyplot(..., col=col[subscripts])
}
and in action
xyplot(mpg ~ disp | factor(cyl), mtcars, col=mtcars$cyl,
panel=panel, pch=20, cex=4)
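Putting the two pieces together, here is a minimal sketch of highlighting a subset in a conditioned plot. It uses mtcars as a stand-in for the asker's data; the "^Merc" pattern, point_cols and highlight_panel are illustrative names, not part of the original question.

```r
library(lattice)

# Stand-in for grepl("^0214A", data$Haul_ID): flag the Mercedes models in mtcars
highlight <- grepl("^Merc", rownames(mtcars), ignore.case = TRUE)

# One colour per row of the data: black by default, red for the flagged rows
point_cols <- c("black", "red")[highlight + 1L]

# Panel function that keeps only the colours of the rows drawn in this panel
highlight_panel <- function(..., col, subscripts) {
  panel.xyplot(..., col = col[subscripts])
}

p <- xyplot(mpg ~ disp | factor(cyl), data = mtcars,
            col = point_cols, panel = highlight_panel, pch = 20,
            par.settings = list(strip.background = list(col = "white")))
print(p)
```

For the shellfish data the same pattern should apply with catch.rate ~ Year | Area and the colour vector built from Haul_ID.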

Related

Select data and name when pointing it chart with ggplotly

I built everything in ggplot and it was all working well. Now I need the chart to show data when I point at a data point: in this example, the model (to identify the point) and disp and wt (the data on the axes).
For this I mapped shape to the model column (always the same shape, as I do not actually want different shapes) and asked ggplot not to show shape in the legend. Then I converted the chart to plotly. I succeeded in showing the data when I point at the circles, but now the legend shows colors and shapes together, separated by a comma...
I did not want to rebuild it from scratch in plotly, as I have no experience with plotly and this is part of a much larger shiny project, where the chart automatically adjusts the axis scales and adds trend lines, among other things (not included here for simplicity) that I do not know how to do in plotly.
Many thanks in advance. I have tried countless approaches over a couple of days without success.
# choose mtcars data and add rowname as column as I want to link it to shapes in ggplot
data1 <- mtcars
data1$model <- rownames(mtcars)
# I turn cyl data to character as when charting it showed (Error: Continuous value supplied to discrete scale)
data1$cyl <- as.character(data1$cyl)
# linking colors with cylinders and shapes with models
ccolor <- c("#E57373","purple","green")
cylin <- c(6,4,8)
# I actually do not want shapes to be different, only want to show data of model when I point the data point.
models <- data1$model
sshapes <- rep(16,length(models))
# I am going to chart, do not want legend to show shape
graff <- ggplot(data1,aes(x=disp, y=wt,shape=model,col=cyl)) +
geom_point(size = 1) +
ylab ("eje y") + xlab('eje x') +
scale_color_manual(values= ccolor, breaks= cylin)+
scale_shape_manual(values = sshapes, breaks = models)+
guides(shape='none') # do not want shapes to show in legend
graff
The chart is fine, but when converting it with ggplotly, I am having trouble with the legend:
graffPP <- ggplotly(graff)
graffPP
The legend is not the same as it was in ggplot. I succeeded in showing the model and the axis data when I point at a data point in the chart, but now I am having problems with the legend.
To the best of my knowledge there is no easy out-of-the-box solution to achieve your desired result.
Using pure plotly you could achieve your result by assigning legend groups, which to the best of my knowledge is not possible through ggplotly itself. However, you can assign the legend groups manually by manipulating the plotly object returned by ggplotly.
Adapting my answer on this post to your case, you could achieve your desired result like so:
library(plotly)
p <- ggplot(data1, aes(x = disp, y = wt, shape = model, col = cyl)) +
geom_point(size = 1) +
ylab("eje y") +
xlab("eje x") +
scale_color_manual(values = ccolor, breaks = cylin) +
scale_shape_manual(values = sshapes, breaks = models) +
guides(shape = "none")
gp <- ggplotly(p = p)
# Get the names of the legend entries
df <- data.frame(id = seq_along(gp$x$data), legend_entries = unlist(lapply(gp$x$data, `[[`, "name")))
# Extract the group identifier, i.e. the number of cylinders from the legend entries
df$legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", df$legend_entries)
# Add an indicator for the first entry per group
df$is_first <- !duplicated(df$legend_group)
for (i in df$id) {
# Is the layer the first entry of the group?
is_first <- df$is_first[[i]]
# Assign the group identifier to the name and legendgroup arguments
gp$x$data[[i]]$name <- df$legend_group[[i]]
gp$x$data[[i]]$legendgroup <- gp$x$data[[i]]$name
# Show the legend only for the first layer of the group
if (!is_first) gp$x$data[[i]]$showlegend <- FALSE
}
gp
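To see what the grouping step in the loop works from, here is a small base R sketch of the name parsing on its own. The legend names below are made-up examples of the "(cyl,model)" form that ggplotly appears to generate in this case; check names in your own gp$x$data before relying on the pattern.

```r
# Assumed legend names of the form "(cyl,model)" (illustrative values)
legend_entries <- c("(4,Datsun 710)", "(4,Fiat 128)",
                    "(6,Mazda RX4)", "(8,Duster 360)")

# Keep only the leading number, i.e. the cylinder count
legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", legend_entries)

# TRUE for the first trace of each group, which is the one that keeps its legend entry
is_first <- !duplicated(legend_group)

print(data.frame(legend_entries, legend_group, is_first))
```

Every trace after the first in a group gets showlegend set to FALSE, so the legend ends up with one entry per cylinder count.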

R:par(cex.lab=2) doesn't work in plot(effect(),…)

I fitted a linear regression on a data set that includes group (1=smoke, 2=control), gender (1=m, 2=f) and a dependent variable such as weight. I want to see the interaction between group and gender in a plot. I need to change the size of the axis labels, but it doesn't work with par(). The code is like this:
lin <- lm(weight ~ group + gender + group:gender, data=data)
par(cex.lab = 2, cex.axis = 2)
library(effects)
plot(effect("group:gender",lin,,list(gender=c(1,2))),multiline=T)
The size doesn't change. And if I try to remove the axes like this:
plot(effect("group:gender",lin,,list(gender=c(1,2))),multiline=T,axes=FALSE)
It gives me this error:
$ operator is invalid for atomic vectors
How can I solve this?
I don't know exactly why this happens. My guess is that effect() returns an object of class "eff", whose plot method does not pick up the settings made with par(). To work around this, convert the object to a data.frame and draw the plot with base graphics, where par() does apply.
With that approach, changing the par options changes the font size, as in the graphs mentioned at the end of this answer.
You can do this:
library(effects)
lin <- lm(mpg ~ cyl + am + am:cyl, data=mtcars)
par(cex.lab=1.2, cex.axis=1.2, cex.main=1.2, cex.sub=1.2) # change these par options and the font size will increase or decrease
effect1 <- data.frame(effect("cyl:am",lin,,list(cyl=c(4,6,8))))
effects <- effect1[,c("cyl","am", "fit")] ##Keeping only the required columns
You can plot the effects using all three columns (cyl, am and fit). However, plotting them in a single call joins the lines together, and I am not aware of a base R equivalent of ggplot's group aesthetic, so I split the data by cyl and plot each piece separately.
xvals <- split(effects$am,effects$cyl) #split x-axis basis cyl
yvals <- split(effects$fit,effects$cyl) #split y-axis basis cyl
plot(1:max(unlist(xvals)),xlim = c(0,max(unlist(xvals))),ylim=(c(0,max(unlist(yvals)))),type="n", main="plot b/w mpg, am * cyl",
xlab="am", ylab="mpg") #adding header, labels and xlim and ylim to the graphs
Map(lines,xvals,yvals,col=c("red","blue","black"),pch=1:3,type="o") #plotting one line per cyl level (4, 6, 8) using Map
legend("bottomright", legend=c("4", "6", "8"),
col=c("red", "blue", "black"), lty=1, pch=1:3, cex=0.8) #adding the legend, in the same order as the split levels
Output (images not shown): one plot with the par options fixed at 1.2 and one with the par options fixed at 1.5.
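A variant of the same workaround that avoids the effects package altogether is sketched below. It is a swapped-in technique, not the answerer's method: the fitted values come from predict() on a grid of the two predictors, and the names grid, by_cyl and line_cols are illustrative.

```r
# Same model as above, written with the * shorthand for main effects plus interaction
lin <- lm(mpg ~ cyl * am, data = mtcars)

# Fitted values at every combination of the two predictors
grid <- expand.grid(cyl = c(4, 6, 8), am = c(0, 1))
grid$fit <- predict(lin, newdata = grid)

# Base graphics honour par(), so the label sizes change here
old_par <- par(cex.lab = 1.5, cex.axis = 1.5)

plot(range(grid$am), range(grid$fit), type = "n", xaxt = "n",
     xlab = "am", ylab = "mpg")
axis(1, at = 0:1)

# One line per cyl level, in the order returned by split()
by_cyl <- split(grid, grid$cyl)
line_cols <- c("red", "blue", "black")
for (i in seq_along(by_cyl)) {
  lines(by_cyl[[i]]$am, by_cyl[[i]]$fit, col = line_cols[i], type = "o", pch = i)
}
legend("topright", legend = names(by_cyl), col = line_cols,
       lty = 1, pch = seq_along(by_cyl), title = "cyl")

par(old_par)  # restore the previous graphical parameters
```

Taking the legend labels from names(by_cyl) keeps the colours and labels in the same order by construction.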

Differentiating each Line with different type in `ggsurv` plots (or in `plot`)

I am using RStudio. I am using the ggsurv function from the GGally package to draw Kaplan-Meier curves for my data (for survival analysis), following the tutorial here. I am using it instead of plot because ggsurv takes care of legends by itself.
As shown in the link, multiple curves are differentiated by color. I want to differentiate them by linetype instead. The tutorial does not seem to have any option for that. This is my command:
surv1 <- survfit(Surv(DaysOfTreatment,Survived)~AgeOnFirstContactGroup)
print(ggsurv(surv1, lty.est = 3)+ ylim(0, 1))
lty.est=3 (or 2) gives the same dashed line for all the curves. I want a differently dashed line for each curve. Using lty=type gives the error: object 'type' not found. lty=type would work in ggplot, but ggplot does not deal directly with survfit objects.
Please show me how to differentiate curves by linetype in either ggsurv or a simple plot (although I would prefer ggsurv because it takes care of legends).
From the documentation for ggsurv
lty.est: linetype of the survival curve(s). Vector length should be
either 1 or equal to the number of strata.
So, to get a different line type for each stratum, set lty.est equal to a vector of the same length as the number of lines you are plotting, with each value corresponding to a different line type.
For example, using the lung data from the survival package
library(GGally)
library(survival)
data(lung)
surv1 <- survfit(Surv(time,status) ~ sex, data = lung)
ggsurv(surv1, lty.est=c(1,2), surv.col = 1)
Gives the following plot
You can add ggplot themes or other ggplot elements to the plot too. For example, we can improve the appearance using the cowplot theme as follows
library(ggplot2)
library(cowplot)
ggsurv(surv1, lty.est=c(1,2), surv.col = 1) + theme_cowplot()
If you need to change the legend labels after differentiating by linetype, then you can do it this way
ggsurv(surv1, lty.est=c(1,2), surv.col = 1) +
guides(colour = "none") +
scale_linetype_discrete(name = 'Sex', breaks = c(1,2), labels = c('Male', 'Female'))
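Since the question also allows a simple plot, here is a minimal base-graphics sketch of the same idea using only the survival package. The legend has to be added by hand in this case; ltys is an illustrative name.

```r
library(survival)

# Same fit as above, on the lung data shipped with the survival package
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One line type per stratum
ltys <- seq_along(fit$strata)

plot(fit, lty = ltys, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(fit$strata), lty = ltys, title = "Stratum")
```

The stratum names come straight from the fitted object, so the legend follows the order in which the curves are drawn.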

create a boxplot in R that labels a box with the sample size (N)

Is there a way to create a boxplot in R that will display with the box (somewhere) an "N=(sample size)"? The varwidth logical adjusts the width of the box on the basis of sample size, but that doesn't allow comparisons between different plots.
FWIW, I am using the boxplot command in the following fashion, where 'f1' is a factor:
boxplot(xvar ~ f1, data=frame, xlab="input values", horizontal=TRUE)
Here's some ggplot2 code. It's going to display the sample size at the sample mean, making the label multifunctional!
First, a simple function for fun.data
give.n <- function(x){
# fun.data should return a data frame; y places the label, label holds the count
return(data.frame(y = mean(x), label = length(x)))
}
Now, to demonstrate with the diamonds data
ggplot(diamonds, aes(cut, price)) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text")
You may have to play with the text size to make it look good, but now you have a label for the sample size which also gives a sense of the skew.
You can use the names parameter to write the n next to each factor name.
If you don't want to calculate the n yourself you could use this little trick:
# Do the boxplot but do not show it
b <- boxplot(xvar ~ f1, data=frame, plot=0)
# Now b$n holds the counts for each factor, we're going to write them in names
boxplot(xvar ~ f1, data=frame, xlab="input values", names=paste(b$names, "(n=", b$n, ")"))
To get the n above each box, you could use text with the stats returned by boxplot as follows
b <- boxplot(xvar ~ f1, data=frame) # draw the plot and keep the returned stats
text(seq_along(b$n), b$stats[5,]+1, paste("n=", b$n))
The stats field of b is
a matrix, each column contains the extreme of the lower whisker, the lower hinge, the median, the upper hinge and the extreme of the upper whisker for one group/plot.
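As a self-contained illustration of the text approach, here is a sketch on mtcars, since xvar, f1 and frame are not available here. The ylim values are chosen by hand to leave room for the labels above the upper whiskers.

```r
# Draw the boxplot and keep the summary statistics it returns
b <- boxplot(mpg ~ cyl, data = mtcars, xlab = "cyl", ylab = "mpg",
             ylim = c(10, 37))

# b$stats[5, ] is the end of each upper whisker; put the label just above it
text(seq_along(b$n), b$stats[5, ] + 1, paste0("n=", b$n))
```

For a horizontal boxplot the x and y arguments to text would need to be swapped.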
The gplots package provides boxplot.n, which according to the documentation produces a boxplot annotated with the number of observations.
I figured out a workaround using the EnvStats package. The package needs to be installed and then loaded with:
library(EnvStats)
The stripChart function (different from stripchart) adds some values to the chart, such as the n values. First I plotted my boxplot, then I called stripChart with add=T. Most of the stripChart elements are switched off in the call so that they do not show up on the boxplot. Here is the code I used for the stripChart to hide most items.
Boxplot with integrated stripChart to show n values:
stripChart(data.frame(T0_G1,T24h_G1,T96h_G1,T7d_G1,T11d_G1,T15d_G1,T30d_G1), show.ci=F,axes=F,points.cex=0,n.text.line=1.6,n.text.cex=0.7,add=T,location.scale.text="none")
So boxplot
boxplot(data.frame(T0_G1,T24h_G1,T96h_G1,T7d_G1,T11d_G1,T15d_G1,T30d_G1),main="All Rheometry Tests on Egg Plasma at All Time Points at 0.1Hz,0.1% and 37 Set 1,2,3", names=c("0h","24h","96h","7d ", "11d", "15d", "30d"),boxwex=0.6,par(mar=c(8,4,4,2)))
Then stripChart
stripChart(data.frame(T0_G1,T24h_G1,T96h_G1,T7d_G1,T11d_G1,T15d_G1,T30d_G1), show.ci=F,axes=F,points.cex=0,n.text.line=1.6,n.text.cex=0.7,add=T,location.scale.text="none")
You can always adjust the height of the numbers (n values) so that they fit where you want.

how to script in R over a factor's levels

I have a data frame with a quantitative variable, x, and several different factors, f1, f2, ...,fn. The number of levels is not constant across factors.
I want to create a (single) plot of densities of x by factor level fi.
I know how to hand code this for a specific factor. For example, here is the plot for a factor with two levels.
# set up the background plot
plot(density(frame$x[frame$f1=="level1"]))
# add curves
lines(density(frame$x[frame$f1=="level2"]))
I could also do this like so:
# set up the background plot
plot(NA)
# add curves
lines(density(frame$x[frame$f1=="level1"]))
lines(density(frame$x[frame$f1=="level2"]))
What I'd like to know is how can I do this if I only specify the factor as input. I don't even know how to write a for loop that would do what I need, and I have the feeling that the 'R way' would avoid for loops.
Bonus: For the plots, I would like to specify limiting values for the axes. Right now I do this in this way:
xmin=min(frame$x[frame$f1=="level1"],frame$x[frame$f1=="level2"])
How can I include this type of calculation in my script?
I'm assuming your data is in the following format (a data frame called df):
f1   f2   f3   ...   fn   value
A    ...                  value 1
A    ...                  value 2
...
B    ...                  value n-1
B    ...                  value n
In that case, lattice (or ggplot2) will be very useful.
library(lattice)
densityplot(~value, groups = f1, data = df, plot.points = FALSE)
This should get you close to what you are looking for, I think.
Greg
You could also do:
# create an empty plot. You may want to add xlab, ylab etc
# EDIT: also add some appropriate axis limits with xlim and ylim
plot(0, 0, "n", xlim=c(0, 10), ylim=c(0, 2))
levels <- unique(frame$f1)
for (l in levels)
{
lines(density(frame$x[frame$f1==l]))
}
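The loop above can be wrapped in a function that takes only the values and the factor, which also covers the bonus question: the axis limits are computed from the density estimates themselves. This is a sketch; plot_densities is a hypothetical helper name and iris stands in for the asker's data frame.

```r
# Hypothetical helper: overlay one density curve per level of f
plot_densities <- function(x, f) {
  # One density estimate per level (empty levels are dropped)
  dens <- lapply(split(x, f, drop = TRUE), density)

  # Axis limits that cover every curve
  xlim <- range(unlist(lapply(dens, `[[`, "x")))
  ylim <- range(unlist(lapply(dens, `[[`, "y")))

  # Empty plot with the right limits, then one line per level
  plot(0, 0, type = "n", xlim = xlim, ylim = ylim,
       xlab = "x", ylab = "Density")
  for (i in seq_along(dens)) {
    lines(dens[[i]], col = i)
  }
  legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)

  invisible(dens)
}

# Example call with a built-in data set
dens <- plot_densities(iris$Sepal.Length, iris$Species)
```

With the asker's data the call would be plot_densities(frame$x, frame$f1), and the same function works unchanged for f2, ..., fn.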
ggplot2 code
library(ggplot2)
ggplot(df, aes(value, colour = f1)) +
stat_density(geom = "line", position = "identity") # draw lines so the colour mapping is visible
