R: ggplot, legend control using scale_shape_manual and one data frame - r

Using scale_shape_manual in ggplot, I assigned a different shape to each of three types of factory (squares, triangles, and circles), corresponding to North, South, and West respectively. Is it possible to have the North/South/West labels in the legend without creating three different data frames for each region? Can I add these labels to the original data frame?
I have one data frame for a plot (as recommended by the ggplot2 book), and with my code below, the default legend lists every row in my data frame, which is repetitive and not what I want.
Basically, I would like to know the best way to label these regions in the plot. The only reason I would like to maintain one data frame is because the code will be easy to use over and over again by just switching the data frame (the benefit of one df mentioned in the ggplot2 book).
I think part of the problem is that I am using scale_shape_manual to assign values to each point individually. Should I put the North/South/West labels in my data frame and alter my scale_shape_manual call? If so, what is the best way to accomplish this?
Please let me know if my question is unclear. My code is below, and it replicates my plot as it stands. Thanks.
#Data frame
points <- c(3,5,4,7,12)
bars <- c(.8,1.2,1.4,2.1,4)
points_df<-data.frame(points)
row.names(points_df) <- c( "Factory 1","Factory 2","Factory 3","Factory 4","Factory 5" )
df<-data.frame(Output=points,Errors=bars,lev.names= rownames(points_df))
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Output)])
# GGPLOT #
library(ggplot2)
library(scales)
p2 <- ggplot(df,aes(lev.names,Output,shape=lev.names))
p2 <- p2 +geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors), width=0,color="gray40", lty=1, size=0)
p2 <- p2 + geom_point(aes(size=2))
p2 <- p2 + scale_shape_manual(values=c(6,7,6,1,1))
p2 <- p2 + theme_bw() + xlab(" ") + ylab("Output")
p2 <- p2 + opts(title = expression("Production"))
p2 <- p2+ coord_flip()
print(p2)

Yes, put the location in your data.frame and use it in the aes mapping:
df$location <- c("North","South","North","West","West")
p2 <- ggplot(df,aes(lev.names,Output,shape=location)) +
geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors),
width=0,color="gray40", lty=1, size=0) +
geom_point(size=3) +
theme_bw() + xlab(" ") + ylab("Output") +
ggtitle(expression("Production")) +
coord_flip()
print(p2)
I've also fixed some other stuff (e.g., opts is deprecated and you don't want to map size, but to set it).
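If you still want to control which symbol each region gets, scale_shape_manual keeps working once shape is mapped to location. A minimal sketch, assuming the square/triangle/circle assignment described in the question; the shape codes 0, 2 and 1 and the legend title "Region" are my choices, not taken from the original code:

```r
library(ggplot2)

# Same data as in the question, plus the location column from the answer
df <- data.frame(
  lev.names = paste("Factory", 1:5),
  Output    = c(3, 5, 4, 7, 12),
  Errors    = c(0.8, 1.2, 1.4, 2.1, 4),
  location  = c("North", "South", "North", "West", "West")
)
# Order the factories by output, as in the question
df$lev.names <- factor(df$lev.names, levels = df$lev.names[order(df$Output)])

# Assumed shape codes: 0 = open square, 2 = open triangle, 1 = open circle
region_shapes <- c(North = 0, South = 2, West = 1)

p <- ggplot(df, aes(lev.names, Output, shape = location)) +
  geom_errorbar(aes(ymin = Output - Errors, ymax = Output + Errors),
                width = 0, colour = "gray40") +
  geom_point(size = 3) +                        # size is set, not mapped
  scale_shape_manual(values = region_shapes, name = "Region") +
  theme_bw() +
  labs(title = "Production", x = NULL, y = "Output") +
  coord_flip()
print(p)
```

Because the vector is named, the shapes follow the region names rather than the order of the factor levels, so swapping in another data frame with the same location values should give the same legend.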

Related

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, I get misaligned results towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both plots are reading the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you. It looks good, but it is very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want x to be a numeric variable, and you have it as a string. If that is not what you want I can change it, but this will transform your bucket column into a numeric vector:
bucket.list <- strsplit(unlist(DATA2$bucket), "[^0-9]+")
x=numeric()
for (i in 1:length(bucket.list)) {
x[i] <- bucket.list[[i]][2]
}
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
This gives me the area and the line tracking each other. I hope that's what you needed.
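The conversion loop above can probably be written without a loop. This is only a sketch: the labels below are hypothetical, since the real file is not shown, and it assumes each label starts with non-digit text followed by the number you want (in that case the loop's second split element is the first run of digits):

```r
# Hypothetical bucket labels; the real file's format is not shown in the question
bucket <- c("bucket 1", "bucket 10", "bucket 25")

# Keep only the first run of digits in each label, then convert to numeric
bucket_num <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", bucket))
bucket_num
```

If the labels start with a digit instead (for example a range such as 10-20), the loop and this one-liner pick different numbers, so check a few values before plotting.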
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')

Plotting matrices of different sizes in one window (in R)

I’m trying to create color matrices to illustrate the change in the standardized values of several variables over 25 years. I’ve divided up the variables into a few subcategories and want to show the results for each subcategory in different plots in one window with one colorkey and title. I tried to do this using reshape and ggplot2 with the following code. Because each of the categories has a different number of variables, however, this produces a lot of empty space in the plots.
library(reshape)
library(ggplot2)
v1 <- replicate(7,rnorm(25))
v2 <- replicate(15, rnorm(25))
v3 <- replicate(11, rnorm(25))
v4 <- replicate(9, rnorm(25))
v5 <- replicate(9, rnorm(25))
v <- list(v1,v2,v3, v4, v5)
ggplot(melt(v), aes(x=X1, y=X2)) + facet_wrap(~ L1, ncol=1) +
geom_tile(aes(fill=value)) + ggtitle("Title") +
theme(plot.title = element_text(lineheight=2, face="bold"))
What is a better way of producing plots I need in one window without all the unnecessary blank space? Note that I originally tried to do this using the levelplot function in the lattice package. However, the only way I could figure out was to print each individual levelplot, which produced a color key and title for each plot (not what I wanted).
Is this what you are looking for?
You can get rid of the blank space using scales="free_y" in the call to facet_wrap(...). This forces each facet to have its own y-axis, but does not force the display of a separate x-axis on each facet. I also added a different color scale (take it out if you prefer the default).
library(ggplot2)
library(reshape2)
library(RColorBrewer)
# reshape2::melt() names the matrix dimensions Var1 and Var2
# (the older reshape package used X1 and X2)
ggplot(melt(v), aes(x=Var1, y=Var2)) +
  facet_wrap(~ L1, ncol=1, scales="free_y") +
  geom_tile(aes(fill=value)) + ggtitle("Title") +
  scale_fill_gradientn(colours=rev(brewer.pal(9,"Spectral"))) +
  theme(plot.title = element_text(lineheight=2, face="bold"))
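Note that scales="free_y" drops the unused y range, but facet_wrap still gives every panel the same height, so tiles are taller in panels with fewer variables. If equal tile heights matter, one option (an addition, not part of the original answer) is facet_grid with space = "free_y". The sketch below builds the long data in base R so it does not depend on reshape; the column names year, variable and group are made up for illustration:

```r
library(ggplot2)

set.seed(1)
n_vars <- c(7, 15, 11, 9, 9)   # number of variables per subcategory, as in the question

# One long data frame: a row per (year, variable) cell, tagged with its subcategory
long <- do.call(rbind, lapply(seq_along(n_vars), function(i) {
  m <- replicate(n_vars[i], rnorm(25))            # 25 years x n_vars[i] variables
  data.frame(
    year     = rep(seq_len(nrow(m)), times = ncol(m)),
    variable = rep(seq_len(ncol(m)), each = nrow(m)),
    value    = as.vector(m),                      # column-major, matches the two rep() calls
    group    = i
  )
}))

p <- ggplot(long, aes(x = year, y = variable, fill = value)) +
  geom_tile() +
  # free scales drop empty rows; free space makes panel height follow the row count
  facet_grid(group ~ ., scales = "free_y", space = "free_y") +
  ggtitle("Title") +
  theme(plot.title = element_text(face = "bold"))
print(p)
```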

want to layer aes in ggplot2

I would like to plot another series of data on top of a current graph. The additional data only contains information for 3 (out of 6) spp, which are used for the facet wrapping.
The other series of data is currently a column (in the same data file).
Current graph:
ped.num <- ggplot(data, aes(ped.length, seeds.inflorstem))
ped.num + geom_point(size=2) + theme_bw() + facet_wrap(~spp, scales = "free_y")
Additional layer would be:
aes(ped.length, seeds.filled)
I feel I should be able to plot them using the same y-axis, because they have just slightly smaller values. How do I go about adding this layer?
@ialm's solution should work fine, but I recommend calling the aes function separately in each geom_* because it makes the code easier to read.
ped.num <- ggplot(data) +
geom_point(aes(x=ped.length, y=seeds.inflorstem), size=2) +
theme_bw() +
facet_wrap(~spp, scales="free_y") +
geom_point(aes(x=ped.length, y=seeds.filled))
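With two separate layers the points look identical and there is no legend. One common idiom is to map colour to a constant label inside each aes() call. The sketch below uses made-up data, since none was posted; the species names, values and colours are all hypothetical:

```r
library(ggplot2)

# Hypothetical data: six species, seeds.filled recorded for only three of them
set.seed(1)
dat <- data.frame(
  spp              = rep(LETTERS[1:6], each = 10),
  ped.length       = runif(60, 5, 30),
  seeds.inflorstem = rpois(60, 40)
)
dat$seeds.filled <- ifelse(dat$spp %in% c("A", "B", "C"),
                           round(dat$seeds.inflorstem * 0.8), NA)

p <- ggplot(dat, aes(x = ped.length)) +
  # A constant string inside aes() creates a legend entry for each layer
  geom_point(aes(y = seeds.inflorstem, colour = "seeds.inflorstem"), size = 2) +
  geom_point(aes(y = seeds.filled, colour = "seeds.filled"),
             size = 2, na.rm = TRUE) +            # species without data are skipped quietly
  scale_colour_manual(values = c(seeds.inflorstem = "black",
                                 seeds.filled = "red"),
                      name = NULL) +
  theme_bw() +
  facet_wrap(~spp, scales = "free_y") +
  ylab("seeds")
print(p)
```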
(You'll always get better answers if you include example data, but I'll take a shot in the dark)
Since you want to plot two variables that are on the same data.frame, it's probably easiest to reshape the data before feeding it into ggplot:
library(reshape2)
# Melting data gives you exactly one observation per row - ggplot likes that
dat.melt <- melt(dat,
id.var = c("spp", "ped.length"),
measure.var = c("seeds.inflorstem", "seeds.filled")
)
# Plotting is slightly different - instead of explicitly naming each variable,
# you'll refer to "variable" and "value"
ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
geom_point(size=2) +
theme_bw() +
facet_wrap(~spp, scales = "free_y")
The seeds.filled values should plot only on the facets for the corresponding species.
I prefer this to Drew's (totally valid) approach of explicitly mapping different layers because you only need a single geom_point() whether you have two variables or twenty and it's easy to map a variety of aesthetics to variable.

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the ggplot function and split the detection data by tagging site using the col=f1 argument. We'll plot the detection data as boxplots, and then plot the depths of the stations as colored points (assuming each station only has one depth). We specify the two layers by passing the data to each geom function, instead of specifying the data inside the main ggplot call. It should look something like this:
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
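The dummy data from the answer is not preserved here, so the following is a hypothetical stand-in with the same column names (f1, f2, depth). The station names, site names and depth values are invented purely to make the two-layer plot runnable:

```r
library(ggplot2)

set.seed(42)
stations <- factor(paste0("S", 1:4))

# Invented detection data: tagging site, station, and detection depth
df <- data.frame(
  f1    = factor(sample(c("Site A", "Site B"), 200, replace = TRUE)),
  f2    = sample(stations, 200, replace = TRUE),
  depth = runif(200, 0, 100)
)
# Invented receiver depths: one value per station
df2 <- data.frame(f2 = stations, depth = c(60, 90, 120, 140))

p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, colour = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +                             # depth increases downwards
  labs(x = "MPS Station", y = "Depth (m)", colour = NULL)
print(p)
```

Both layers share the x and y scales because they are added to the same ggplot object, which is what keeps the station depths and the detection depths on one axis.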

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails I will accept an answer that tells me how to get the legend back onto the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me the same plots with different sets of colors.
Pass a vector of colours to the values argument of scale_color_manual to map the discrete group values to manually chosen colours:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
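With more groups (the question mentions seven), it may be safer to name each colour after its factor level so the assignment does not depend on level order. A sketch using simulated data in place of the question's input files, which are not available; the colours and the legend title "Group" are arbitrary:

```r
library(ggplot2)

# Simulated stand-in for the question's data frame (columns grp and val)
set.seed(45)
df <- data.frame(
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100)),
  val = c(rnorm(100), rnorm(100, mean = 2, sd = 2))
)

# Each colour is named after the factor level it belongs to
grp_colours <- c("1. Candidates" = "red", "2. NonCands" = "blue")

p <- ggplot(df, aes(x = val, colour = grp)) +
  geom_density() +
  scale_colour_manual(values = grp_colours, name = "Group")
print(p)
```

The legend is kept because colour is still mapped inside aes(); only the scale decides which colour each level gets.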
