Using two data frames in ggplot and having trouble with the legend - r

I have plotted x and y variables in a scatter plot with a legend that assigns a shape and color to each point based on a single sample ID variable. I want to overlay points from a second data frame. However, when I try to add those points, I get an error saying it can't find the variable that I used to specify the color and shape of the points from the original data frame. I am using this code:
p=ggplot(HebWater, aes(x = PCSrCa.24, y = SrIso, group = Location,
color = Location, shape = Location)) +
geom_point(size=6) +
scale_shape_manual(values = 1:17) +
theme(panel.grid.major=element_line(colour="white"),
panel.grid.minor=element_blank(),panel.background=element_rect(colour="black",fill="white"))
p
p1=p+geom_errorbar(aes(ymin=SrIso-SDSrIso*3, ymax=SrIso+SDSrIso*3))+
geom_errorbarh(aes(xmin=PCSrCa.24-SDSrCa*3, xmax=PCSrCa.24+SDSrCa*3))
p1
p2=p1+geom_point(data=HebOto, aes(SrCa,SrIso))
p2
Everything works fine until I try to run the code for graph p2. I had previously managed to plot both data frames on the same graph, but I could not get the legend to display properly no matter how I changed the shape and color parameters using this code:
ggplot(HebWater, aes(PCSrCa.24, SrIso))+
geom_errorbar(aes(ymin=SrIso-SDSrIso*3, ymax=SrIso+SDSrIso*3))+
scale_shape_manual(values=1:17)+
geom_errorbarh(aes(xmin=PCSrCa.24-SDSrCa*3, xmax=PCSrCa.24+SDSrCa*3))+
geom_point(data=HebOto, aes(SrCa,SrIso))+
geom_point(data=HebWater, show_guide=TRUE, shape=c(1:17), colour=c(1:17), size=6)+
theme(panel.grid.major=element_line(colour="white"),
panel.grid.minor=element_blank(),
panel.background=element_rect(colour="black",fill="white"))
My data frames are both organized in this fashion:
> head(HebWater)
Location SrIso PCSrCa.24 SDSrIso SDSrCa PCSrCa.28
1 Gib (baseflow) 0.70966 0.2911440 0.000643719 0.0308056 0.3396680
2 Fire (baseflow) 0.71006 0.1119312 0.000643719 0.0308056 0.1305864
3 Mad R (runoff) 0.71052 0.2043264 0.000643719 0.0308056 0.2383808

library(data.table)
HebWater <- data.table(
SrIso = runif(20),
SDSrIso = runif(20),
PCSrCa.24 = runif(20),
SDSrCa = runif(20)
)
HebOto <- data.table(
SrCa = runif(20),
SrIso = runif(20)
)
library(ggplot2)
ggplot(HebWater, aes(PCSrCa.24, SrIso))+
geom_errorbar(aes(ymin=SrIso-SDSrIso*3, ymax=SrIso+SDSrIso*3))+
scale_shape_manual(values=1:17)+
geom_errorbarh(aes(xmin=PCSrCa.24-SDSrCa*3, xmax=PCSrCa.24+SDSrCa*3))+
geom_point(data=HebOto, aes(SrCa,SrIso))+
geom_point(data=HebWater, show_guide=TRUE, shape=c(1:17), colour=c(1:17), size=6)
Produces this error:
Error: Incompatible lengths for set aesthetics: shape, colour, size
This seems logical to me, since you fix the shape, colour and size of the points in the second geom_point(). There is no legend to draw because those settings do not depend on the data at all! If you want a legend, you have to use aes(shape=some_variable1, colour=some_variable2, size=some_variable3) and then, if necessary, force the rendering with scale_xxxxxx_manual().
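A minimal sketch of that advice, using made-up stand-ins for HebWater and HebOto (the values and the five "Site" locations are assumptions, not the asker's data). Location is mapped inside aes() of the first geom_point() only, so the legend is built from it, while the second layer supplies its own data and never looks for a Location column:

```r
library(ggplot2)

# Made-up stand-ins for the two data frames in the question
set.seed(1)
HebWater <- data.frame(
  Location  = paste("Site", 1:5),
  SrIso     = runif(5, 0.709, 0.711),
  PCSrCa.24 = runif(5, 0.1, 0.3)
)
HebOto <- data.frame(
  SrCa  = runif(10, 0.1, 0.3),
  SrIso = runif(10, 0.709, 0.711)
)

p <- ggplot(HebWater, aes(x = PCSrCa.24, y = SrIso)) +
  # colour and shape are mapped in this layer only, so they get a legend
  geom_point(aes(colour = Location, shape = Location), size = 6) +
  # the second data frame only needs x and y; its colour is a fixed setting
  geom_point(data = HebOto, aes(x = SrCa, y = SrIso), colour = "grey40") +
  scale_shape_manual(values = 1:5)
p
```

Because colour and shape are not part of the plot-level aes(), the HebOto layer does not inherit them, which should avoid the "can't find the variable" error described in the question.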

Related

Select data and name when pointing it chart with ggplotly

I built everything in ggplot, and it was all working well. Now I need the chart to show data when I hover over a data point: in this example, the model (to identify the point) and disp and wt (the data on the axes).
For this I mapped shape to the model data (every model gets the same shape; I do not actually want different shapes) and asked ggplot not to show shape in the legend. Then I converted the plot to plotly. I succeeded in showing the data when I hover over the circles, but now the legend shows colors and shapes separated with a comma...
I did not want to build it again from scratch in plotly, as I have no experience with plotly, and this is part of a much larger shiny project in which the chart automatically adjusts the axis scales and adds trend lines, among other things (not included here for simplicity) that I do not know how to do in plotly.
Many thanks in advance. I have tried a million ways over a couple of days now and did not succeed.
# choose mtcars data and add rowname as column as I want to link it to shapes in ggplot
data1 <- mtcars
data1$model <- rownames(mtcars)
# I turn cyl data to character as when charting it showed (Error: Continuous value supplied to discrete scale)
data1$cyl <- as.character(data1$cyl)
# linking colors with cylinders and shapes with models
ccolor <- c("#E57373","purple","green")
cylin <- c(6,4,8)
# I actually do not want shapes to be different, only want to show data of model when I point the data point.
models <- data1$model
sshapes <- rep(16,length(models))
# I am going to chart, do not want legend to show shape
graff <- ggplot(data1,aes(x=disp, y=wt,shape=model,col=cyl)) +
geom_point(size = 1) +
ylab ("eje y") + xlab('eje x') +
scale_color_manual(values= ccolor, breaks= cylin)+
scale_shape_manual(values = sshapes, breaks = models)+
guides(shape='none') # do not want shapes to show in legend
graff
The chart is fine, but when converting to ggplotly, I am having trouble with the legend:
graffPP <- ggplotly(graff)
graffPP
The legend is not the same as it was in ggplot.
I succeeded in showing the model and the axis data when I hover over a data point in the chart, but now I am having problems with the legend.
To the best of my knowledge, there is no easy out-of-the-box solution to achieve your desired result.
Using pure plotly you could achieve your result by assigning legendgroups, which to my knowledge are not available through ggplotly. However, you could assign the legend groups manually by manipulating the plotly object returned by ggplotly.
Adapting my answer on this post to your case, you could achieve your desired result like so:
library(plotly)
p <- ggplot(data1, aes(x = disp, y = wt, shape = model, col = cyl)) +
geom_point(size = 1) +
ylab("eje y") +
xlab("eje x") +
scale_color_manual(values = ccolor, breaks = cylin) +
scale_shape_manual(values = sshapes, breaks = models) +
guides(shape = "none")
gp <- ggplotly(p = p)
# Get the names of the legend entries
df <- data.frame(id = seq_along(gp$x$data), legend_entries = unlist(lapply(gp$x$data, `[[`, "name")))
# Extract the group identifier, i.e. the number of cylinders from the legend entries
df$legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", df$legend_entries)
# Add an indicator for the first entry per group
df$is_first <- !duplicated(df$legend_group)
for (i in df$id) {
# Is the layer the first entry of the group?
is_first <- df$is_first[[i]]
# Assign the group identifier to the name and legendgroup arguments
gp$x$data[[i]]$name <- df$legend_group[[i]]
gp$x$data[[i]]$legendgroup <- gp$x$data[[i]]$name
# Show the legend only for the first layer of the group
if (!is_first) gp$x$data[[i]]$showlegend <- FALSE
}
gp
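To make the grouping step easier to follow, here is a small base-R sketch of what the gsub() and duplicated() calls do. The trace names below are assumed examples of the "(cyl,model)" pattern that the regular expression expects; they are not copied from an actual ggplotly object:

```r
# Assumed examples of trace names in the "(cyl,model)" form
legend_entries <- c(
  "(4,Datsun 710)",
  "(6,Mazda RX4)",
  "(6,Mazda RX4 Wag)",
  "(8,Hornet Sportabout)"
)

# Keep only the leading number inside the parentheses (the cylinder count)
legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", legend_entries)

# TRUE for the first trace of each group, FALSE for the rest
is_first <- !duplicated(legend_group)

data.frame(legend_entries, legend_group, is_first)
```

Only the traces flagged as first keep their legend entry, so each cylinder group appears once in the legend while every trace still carries its own hover information.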

How to incorporate data into plot which was constructed in ggplot2 using data from another file (R)?

Using a dataset, I have created the following plot:
I'm trying to create the following plot:
Specifically, I am trying to incorporate Twitter names over the first image. To do this, I have a dataset with each name in and a value that corresponds to a point on the axes. A snippet looks something like:
Name Score
#tedcruz 0.108
#RealBenCarson 0.119
Does anyone know how I can plot this data (from one CSV file) over my original graph (which is constructed from data in a different CSV file)? The reason that I am confused is because in ggplot2, you specify the data you want to use at the start, so I am not sure how to incorporate other data.
Thank you.
Your question about combining several data sources in ggplot to plot different elements is answered in this post here.
Now, I don't know for sure how this will apply to your specific data, so here is an example that might help you move forward.
Imagine we have two data.frames (see below) and we want to obtain a plot similar to the one you presented.
data1 <- data.frame(list(
x=seq(-4, 4, 0.1),
y=dnorm(x = seq(-4, 4, 0.1))))
data2 <- data.frame(list(
"name"=c("name1", "name2"),
"Score" = c(-1, 1)))
The first step is to find the "y" coordinates of the names in the second data.frame (data2). To do this I added a y column to data2. y is defined here as a sequence of points running from the max value of y to the min value of y, with some space left at each end for aesthetics.
range_y = max(data1$y) - min(data1$y)
space_y = range_y * 0.05
data2$y <- seq(from = max(data1$y)-space_y, to = min(data1$y)+space_y, length.out = nrow(data2))
Then we can use ggplot() to plot data1 and data2 following some plot designs. For the current example I did this:
library(ggplot2)
p <- ggplot(data=data1, aes(x=x, y=y)) +
geom_point() + # for the data1 just plot the points
geom_pointrange(data=data2, aes(x=Score, y=y, xmin=Score-0.5, xmax=Score+0.5)) +
geom_text(data = data2, aes(x = Score, y = y+(range_y*0.05), label=name))
p
which gave the following plot:
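As a rough sketch of the file-reading side, the snippet from the question can be parsed and layered over a curve as shown here. The curve is made up, the inline text stands in for the second CSV file, and the label heights are arbitrary choices; with real files, each data frame would come from its own read.csv() call. Note comment.char = "", since names starting with "#" would otherwise be read as comments:

```r
library(ggplot2)

# Made-up stand-in for the data behind the original plot
curve_df <- data.frame(x = seq(-1, 1, 0.01))
curve_df$y <- dnorm(curve_df$x, sd = 0.3)

# Stand-in for the second file, parsed from inline text
labels_df <- read.table(
  text = "Name Score
#tedcruz 0.108
#RealBenCarson 0.119",
  header = TRUE, comment.char = "", stringsAsFactors = FALSE
)
# Arbitrary heights at which to draw the labels
labels_df$y <- max(curve_df$y) * c(0.9, 0.8)

p <- ggplot(curve_df, aes(x = x, y = y)) +
  geom_line() +
  # this layer uses its own data frame instead of the one given to ggplot()
  geom_text(data = labels_df, aes(x = Score, y = y, label = Name))
p
```

The data given to ggplot() is only a default; any layer can override it through its own data argument.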

How does one control the appearance (e.g. line size, line type, colour) of mqgam plots produced using plot.mgamViz from the "mgcViz" package?

I am using quantile regression in R with the qgam package and visualising them using the mgcViz package, but I am struggling to understand how to control the appearance of the plots. The package effectively turns gams (in my case mqgams) into ggplots.
Simple reprex:
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
plot.mgamViz(getViz(egfit))
I am able to control things that can be added, for example the axis labels and theme of the plot, but I'm struggling to affect things that would normally be addressed in the aes() or geom_x() functions.
How would I control the thickness of the line? If this were a normal geom_smooth() or geom_line() I'd simply put size = 1 inside of the geoms, but I cannot see how I'd do so here.
How can I control the linetype of these lines? The "id" is continuous and one cannot supply a linetype to a continuous scale. If this were a normal plot I would convert "id" to a character, but I can't see a way of doing so with the plot.mgamViz function.
How can I supply a new colour scale? It seems as though if I provide it with a new colour scale it invents new ID values to put on the legend that don't correlate to the actual "id" values, e.g.
plot.mgamViz(getViz(egfit)) + scale_colour_viridis_c()
I fully expect this to be relatively simple and that I'm missing something obvious, and I imagine the answers to all three of these subquestions are very similar to one another. Thanks in advance.
You need to extract your ggplot element using this:
p1 <- plot.mgamViz(getViz(egfit))
p <- p1$plots[[1]]$ggObj
Then, id should be as.factor:
p$data$id <- as.factor(p$data$id)
Now you can play with ggplot elements as you prefer:
library(mgcViz)
egfit <- mqgam(data = iris,
Sepal.Length ~ s(Petal.Length),
qu = c(0.25,0.5,0.75))
p1 <- plot.mgamViz(getViz(egfit))
# Taking gg infos and convert id to factor
p <- p1$plots[[1]]$ggObj
p$data$id <- as.factor(p$data$id)
# Changing ggplot attributes
p <- p +
geom_line(linetype = 3, size = 1)+
scale_color_brewer(palette = "Set1")+
labs(x="Petal Length", y="s(Petal Length)", color = "My ID labels:")+
theme_classic(14)+
theme(legend.position = "bottom")
p
Here is the generated plot:
Hope it is useful!
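To illustrate why the conversion to a factor matters, here is a sketch in plain ggplot2 with a made-up data frame (quant_df and its columns are assumptions that only mimic the quantile "id" column; no mgcViz object is involved). Once id is a factor, it can drive discrete scales such as linetype and a Brewer palette:

```r
library(ggplot2)

# Made-up lines, one per quantile id
quant_df <- expand.grid(x = 1:10, id = c(0.25, 0.5, 0.75))
quant_df$y <- quant_df$x * quant_df$id

# A numeric id cannot be mapped to linetype; a factor can
quant_df$id <- as.factor(quant_df$id)

p <- ggplot(quant_df, aes(x = x, y = y, colour = id, linetype = id)) +
  geom_line() +
  scale_colour_brewer(palette = "Set1")
p
```

The legend then shows exactly the three id levels rather than breaks invented by a continuous colour scale.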

Apply colours to a line graph using ggplot

I have produced a line graph using ggplot. The data contains two groups with 9 samples each that were followed up over 11 time points (x-values). Now, I have tried to give each sample line of one group an individual colour, while giving only a single colour to the samples of the other group (here: black).
Here is the important part of my script.
data <- read.csv2("140929 example.csv",check.names = FALSE)
library(reshape2)
data.m <- melt(data)
library(ggplot2)
ggplot(data.m, aes(x = variable, y = value, group = Group,colour = Group))+
geom_line()+
theme_bw()
This produces a graph with individual colours for all lines.
How can I improve? Thank you for your help.
This is a bit hard to tell without data or a picture of your current plot, but you can try adding a new variable to data.m to control colour: set it up as a sequence, then give it the same value throughout the group that should have a single colour.
data.m$mycolor <- 1:nrow(data.m)
# somegroup is a placeholder for the value that identifies the single-colour group
data.m[data.m$Group == somegroup,]$mycolor <- 0
Then in your aesthetic use colour = mycolor.
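A runnable variation on this idea, with made-up data (the Sample column, the group labels "A" and "B", and the values are all assumptions). Instead of a numeric sequence it uses a character key per line, so that scale_colour_manual() can pin the shared key to black:

```r
library(ggplot2)

# Made-up long-format data: 6 samples followed over 11 time points
set.seed(42)
data.m <- expand.grid(Sample = paste0("S", 1:6), variable = 1:11)
data.m$Group <- ifelse(data.m$Sample %in% c("S1", "S2", "S3"), "A", "B")
data.m$value <- rnorm(nrow(data.m))

# One colour key per sample in group A, a single shared key for group B
data.m$mycolor <- ifelse(data.m$Group == "B", "Group B",
                         as.character(data.m$Sample))

# Default hues for every key, then override the shared key with black
keys <- sort(unique(data.m$mycolor))
cols <- setNames(scales::hue_pal()(length(keys)), keys)
cols["Group B"] <- "black"

p <- ggplot(data.m, aes(x = variable, y = value,
                        group = Sample, colour = mycolor)) +
  geom_line() +
  scale_colour_manual(values = cols) +
  theme_bw()
p
```

Grouping by Sample keeps one line per sample, while the colour key decides which lines share a colour.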

Plotting error while using ggplot faceting function in R

I am trying to do the comparison of my observed and modeled data sets for two stations. One station is called station "red" and another is called "blue". I was able to create the facets but when I tried to add two series in one facet, only one facet got updated while other didn't.
This means for blue only one series is plotted and for red two series are plotted.
The code I used is as follows:
# install.packages("RCurl", dependencies = TRUE)
require(RCurl)
out <- postForm("https://dl.dropbox.com/s/ainioj2nn47sis4/watersurf1.csv?dl=1", format="csv")
watersurf <- read.csv(textConnection(out))
watersurf[1:100,]
watersurf$coupleid <- factor(rep(unlist(by(watersurf$id,watersurf$group1,
function(x) {ave(as.numeric(unique(x)),FUN=seq_along)}
)),each=6239))
p <- ggplot(data=watersurf,aes(x=time,y=data,group=id))+geom_line(aes(linetype=group1),size=1)+facet_wrap(~coupleid)
p
Is it also possible to add a third series to the graph that has a different length (i.e. not sampled at the same interval)?
The output is
I followed the example on this page to create the graphs.
http://www.ats.ucla.edu/stat/r/faq/growth.htm
Is this what you are looking for?
ggplot(data = watersurf, aes(x = time, y = data)) +
  geom_line(aes(linetype = group1, colour = group1), size = 0.2) +
  facet_wrap(~ id)
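Since the original CSV is not available here, the following sketch uses made-up data to show the same faceting idea and one possible way to add a third series of unequal length. The station column, the "extra" data frame and all values are assumptions:

```r
library(ggplot2)

# Made-up observed and modeled series for two stations
watersurf <- expand.grid(
  time    = 1:50,
  group1  = c("observed", "modeled"),
  station = c("red", "blue")
)
watersurf$data <- sin(watersurf$time / 5) +
  ifelse(watersurf$group1 == "modeled", 0.2, 0)

# A third, shorter series with irregular times, for the red station only
extra <- data.frame(
  time    = c(5, 20, 42),
  data    = c(0.5, -0.3, 0.9),
  station = "red"
)

p <- ggplot(watersurf, aes(x = time, y = data)) +
  geom_line(aes(linetype = group1, colour = group1)) +
  # a layer with its own data frame can have any number of rows
  geom_point(data = extra) +
  facet_wrap(~ station)
p
```

The extra layer only needs the columns used by the aesthetics and the faceting variable, so it does not have to match the length or sampling interval of the main data.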
