I'm having trouble with position_dodge when using colours and shapes together.
I want to graph results from an experiment in which two treatments are replicated at many sites and I would like to emphasize certain data points graphically.
As the x-axis is a factor, I used position_dodge to separate the treatments. So far so good; see graph 1 below.
However, if I emphasize a particular data point by changing its shape (see graph 2), the data points are split into three columns rather than two.
Any suggestions on how to make a graph like the one pictured in the third panel below?
library(ggplot2)
site <- rep(c("site1"), times = 6)
# each = 3 alone gives six values; adding times = 2 would give twelve
treatment <- rep(c("one", "two"), each = 3)
set.seed(21)
response <- c(rnorm(3, mean = 4),
              rnorm(3, mean = 5))
special <- as.factor(c(0, 1, 0, 0, 0, 0))
mydata <- data.frame(site, treatment, response, special)
#graph 1
ggplot() +
  geom_point(data = mydata,
             aes(x = site,
                 y = response,
                 colour = treatment),
             size = 4,
             position = position_dodge(1))
#graph 2
ggplot() +
  geom_point(data = mydata,
             aes(x = site,
                 y = response,
                 colour = treatment,
                 shape = special),
             size = 4,
             position = position_dodge(0.5))
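One possible way to get the layout of the third panel, offered as a sketch rather than a confirmed fix: position_dodge separates points by group, and ggplot2 builds the default group from every discrete variable mapped in the layer, so mapping shape = special creates a third group. Setting group = treatment explicitly should keep the points in two columns. The data repeats the example above (with treatment built using each = 3 only); the names p_dodge and dodged_x are introduced just for this sketch.

```r
library(ggplot2)

# Same example data as in the question
site <- rep("site1", times = 6)
treatment <- rep(c("one", "two"), each = 3)
set.seed(21)
response <- c(rnorm(3, mean = 4), rnorm(3, mean = 5))
special <- as.factor(c(0, 1, 0, 0, 0, 0))
mydata <- data.frame(site, treatment, response, special)

# group = treatment makes the dodge depend on treatment only,
# while shape still distinguishes the emphasized point
p_dodge <- ggplot(mydata,
                  aes(x = site, y = response,
                      colour = treatment, shape = special,
                      group = treatment)) +
  geom_point(size = 4, position = position_dodge(0.5))

# Distinct x positions actually used after dodging
dodged_x <- unique(as.numeric(layer_data(p_dodge, 1)$x))
```

Printing p_dodge draws the plot; dodged_x is only there to check how many columns the points end up in.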
Related
I have multiple sources of data over three decades.
The data is discontiguous and overlaps in multiple places. I would like to plot the points for each data source in a different color but then add a single trendline that uses all of the data sources.
The included code has some sample data and two plot examples. The first call to ggplot plots a single trendline for all of the data. The second ggplot call plots each source distinctly in a different color with its own trendline.
library(ggplot2)
the.data <- read.table( header=TRUE, sep=",",
text="source,year,value
S1,1976,56.98
S1,1977,55.26
S1,1978,68.83
S1,1979,59.70
S1,1980,57.58
S1,1981,61.54
S1,1982,48.65
S1,1983,53.45
S1,1984,45.95
S1,1985,51.95
S1,1986,51.85
S1,1987,54.55
S1,1988,51.61
S1,1989,52.24
S1,1990,49.28
S1,1991,57.33
S1,1992,51.28
S1,1993,55.07
S1,1994,50.88
S2,1993,54.90
S2,1994,51.20
S2,1995,52.10
S2,1996,51.40
S3,2002,57.95
S3,2003,47.95
S3,2004,48.15
S3,2005,37.80
S3,2006,56.96
S3,2007,48.91
S3,2008,44.00
S3,2009,45.35
S3,2010,49.40
S3,2011,51.19")
ggplot( the.data, aes( the.data$year, the.data$value ) ) + geom_point() + geom_smooth()
#ggplot( the.data, aes( the.data$year, the.data$value, color=the.data$source ) ) + geom_point() + geom_smooth()
The second call displays the colored data points and I would like to add a single contiguous trendline representing all of the years.
Like this:
ggplot(the.data, aes(x = year, y = value)) +
  geom_point(aes(colour = source)) +
  geom_smooth(aes(group = 1))
A few notes:
Don't map aesthetics to an isolated vector like the.data$year. (Until you really know what you're doing, and know when to break that rule.) Just use the column names.
Map the aesthetics that you want in separate layers in their respective geom calls. In this case, I want the points colored differently, but for the smooth line, I want the data grouped all together (group = 1).
Let me first introduce my dataset and my preliminary result to make my question easier to understand. My dataset looks like this:
Place Species Size Conc.
A BT 24 0.2
A ST 76 1.4
...
B BT 45 1.2
B ST 21 0.7
...
I want to make a scatterplot of Size against Conc. for each Species at each Place. So far I have used ggplot2 to make a graph as below:
scatterplot <- ggplot(mydata, aes(x = Size, y = Conc., color = Species)) +
  geom_point(shape = 1)
Though this graph plots each species group in a different color, it pools all of the data in the dataset and does not plot the places separately.
I think the code below
scatterplot <- ggplot(mydata[mydata$Place == "A", ], aes(x = Size, y = Conc., color = Species)) + geom_point(shape = 1)
works for plotting just place A, and I could do this for the different places one by one. However, in my real dataset the Place variable has a very large number of different places, and I can't type them all out manually. So my question is: how can I have R make those plots for the different places automatically, all at once?
Try:
ggplot(ddf)+geom_point(aes(Size, Conc.))+facet_grid(Place~Species)
If there are too many places:
ggplot(ddf)+geom_point(aes(Size, Conc., color=Place))+facet_grid(.~Species)
Or, in one graph:
ggplot(ddf)+geom_point(aes(Size, Conc., color=Place,shape=Species), size=5)
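If separate plots per place are wanted rather than facets, one option is to split the data by Place and build one plot per subset. This is a minimal sketch, assuming a small made-up data frame ddf with the columns shown in the question (all values are invented for illustration); plots_by_place is a name introduced here.

```r
library(ggplot2)

# Made-up data with the same columns as the question's table
ddf <- data.frame(
  Place   = rep(c("A", "B", "C"), each = 4),
  Species = rep(c("BT", "ST"), times = 6),
  Size    = c(24, 76, 30, 60, 45, 21, 50, 33, 12, 80, 41, 55),
  Conc.   = c(0.2, 1.4, 0.5, 1.1, 1.2, 0.7, 0.9, 0.4, 0.3, 1.6, 0.8, 1.0)
)

# One ggplot object per place, stored in a named list
plots_by_place <- lapply(split(ddf, ddf$Place), function(d) {
  ggplot(d, aes(x = Size, y = Conc., colour = Species)) +
    geom_point(shape = 1) +
    ggtitle(paste("Place", d$Place[1]))
})

# Draw a single place, e.g.:
# print(plots_by_place[["A"]])
```

The list is named by place, so individual plots can be printed or saved in a loop without typing each place by hand.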
I cannot figure out how to add multiple barcharts (or, even better, piecharts) to one plot.
The simplest case would be to add two barcharts at different x,y locations onto a plane.
An application example would be to illustrate both the number of people living in a certain area and the number of migrants (for lack of a better example) living there.
By packaging this population information with spatial information, I hope to convey the corresponding information efficiently.
Solutions involving ggmaps are fine, however, I do not require them (displaying the data without a map layer in the background is acceptable).
To be more precise, here is some code, that is not working as I would like it to. In particular, the bar-charts are replaced by rectangles, which are not stacked, but overlap each other, leading to wrongly displayed information.
Furthermore, at each location, the total height of each bar in the bar chart (or size of the pie, for that matter) should correspond to the sum of both parts.
require(ggplot2)
x <- c(1,2,3)
y <- c(3,2,4)
pop <- c(1,7,8)
mig <- c(1,5,2)
df <- rbind(x,y,pop,mig)
df <- t(df)
df <- data.frame(df)
# bring data in long format
require(reshape2)
tmp <- melt(df, id.vars = c("x","y"))
p <- ggplot(tmp, aes(x = x, y = y, fill = variable))
p <- p + geom_rect(aes(xmin = x, xmax = x + 0.1,
                       ymin = y, ymax = y + value))
print(p)
Eventually, this should serve as an input into a larger animation, that visualizes temporal development of the variables.
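The overlap arises because every rectangle starts at y. One way to stack them, sketched here under the assumption that each bar should sit at its (x, y) location and grow upward, is to compute a per-location offset before calling geom_rect. The columns ymin and ymax and the name p_stacked are introduced for this sketch, which repeats the example data and builds the long format with base R instead of reshape2.

```r
library(ggplot2)

df <- data.frame(x = c(1, 2, 3), y = c(3, 2, 4),
                 pop = c(1, 7, 8), mig = c(1, 5, 2))

# Long format by hand: one row per location and variable
tmp <- data.frame(
  x = rep(df$x, times = 2),
  y = rep(df$y, times = 2),
  variable = rep(c("pop", "mig"), each = nrow(df)),
  value = c(df$pop, df$mig)
)

# The pop segment starts at y; the mig segment starts where pop ends
tmp$ymin <- tmp$y + ifelse(tmp$variable == "mig", rep(df$pop, times = 2), 0)
tmp$ymax <- tmp$ymin + tmp$value

p_stacked <- ggplot(tmp, aes(fill = variable)) +
  geom_rect(aes(xmin = x, xmax = x + 0.1, ymin = ymin, ymax = ymax))
```

With this layout the total height of each bar above its y position equals pop + mig for that location.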
I am totally new to R, so maybe the answer to this question is trivial, but I couldn't find a solution after searching the net for days.
I am using ggplot2 to create graphs showing the mean of my samples with the confidence interval as a ribbon (I can't post the picture, but it looks something like this: S1).
I have a data frame (df) with time in the first column and the values of the variable measured in the other columns (each column is a replicate of the measurement).
I do the following:
mdf <- melt(df, id='time', variable_name="samples")
p <- ggplot(data=mdf, aes(x=time, y=value)) +
  geom_point(size=1, colour="red")
# wrapper around stat_summary; extra arguments are passed through
stat_sum_df <- function(fun, geom="crossbar", ...) {
  stat_summary(fun.data=fun, geom=geom, colour="red", ...)
}
p + stat_sum_df("mean_cl_normal", geom = "smooth")
and I get the graph I have shown at the beginning.
My question is: if I have two different data frames, each with a different variable measured in the same sample at the same time, how can I draw both graphs in the same plot? Everything I have tried ends up computing the statistics on both data sets pooled together, or on just one of them, but not on each separately. Is it possible simply to overlay the plots?
And a second small question: is it possible to change the colour of the ribbon?
Thanks!
Something like this:
library(ggplot2)
a <- data.frame(x = rep(c(1,2,3,5,7,10,15,20), 5),
                y = rnorm(40, sd=2) + rep(c(4,3.5,3,2.5,2,1.5,1,0.5), 5),
                g = rep(c('a', 'b'), each = 20))
ggplot(a, aes(x=x, y=y, group = g, colour = g)) +
  geom_point(aes(colour = g)) +
  geom_smooth(aes(fill = g))
I'd suggest reading up on the basics of ggplot. Check ?ggplot2 for help on ggplot, as well as the available help topics here, particularly how the group aesthetic may be manipulated.
You'll find the discussion group at Google Groups useful, and you may want to join it. Quick-R also has a lot of examples of ggplot graphs, as does Stack Overflow, obviously.
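For the case of two separate data frames, one approach is to stack them into a single long data frame with a column identifying the source, so that the grouping shown above applies. The ribbon colour can then be set through the fill scale. This is a sketch with invented data: the names df_a, df_b, measure, both, and p_both are assumptions for illustration, and geom_smooth draws a fitted smoother with its band rather than the mean_cl_normal summary used in the question.

```r
library(ggplot2)

# Invented data: two variables measured at the same times, 3 replicates each
set.seed(1)
time <- rep(1:10, times = 3)
df_a <- data.frame(time = time, value = rnorm(30, mean = 5))
df_b <- data.frame(time = time, value = rnorm(30, mean = 8))

# Tag each data frame, then stack them
df_a$measure <- "a"
df_b$measure <- "b"
both <- rbind(df_a, df_b)

# colour groups the points and lines; fill controls the ribbon colour
p_both <- ggplot(both, aes(x = time, y = value,
                           colour = measure, fill = measure)) +
  geom_point(size = 1) +
  geom_smooth() +
  scale_fill_manual(values = c(a = "orange", b = "grey50"))
```

Because measure is mapped to both colour and fill, the statistics are computed separately for each original data frame, and the two results are overlaid in one plot.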
I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
                 f2=factor(data$Station), data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(height = 0, width = 0.75), size = 3)
plot1 + xlab("MPS Station") + ylab("Depth(m)") +
  theme(legend.title=element_blank()) + scale_y_reverse() +
  coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2 + scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
  xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different layers by passing the data to each 'geom' function, instead of specifying the data inside the main 'ggplot' call. It should look something like this:
ggplot() +
  geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) +
  geom_point(data=df2, aes(x=f2, y=depth), colour="blue") +
  scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
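The dummy data behind that chart is not shown, so the following is an independent sketch with invented values that follows the two steps above. The station names, tagging locations, depths, and the name p_combined are all assumptions for illustration.

```r
library(ggplot2)

set.seed(42)
stations <- paste0("St", 1:4)

# Invented detection depths: 4 stations x 2 tagging locations x 10 detections
df <- data.frame(
  f1 = factor(rep(c("north", "south"), each = 40)),
  f2 = factor(rep(rep(stations, each = 10), times = 2)),
  depth = runif(80, min = 0, max = 100)
)

# Invented station depths, one value per station
df2 <- data.frame(f2 = factor(stations), depth = c(110, 120, 130, 140))

# Each layer gets its own data frame; both share the reversed y axis
p_combined <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse()

# Data drawn by the point layer (second layer)
station_points <- layer_data(p_combined, 2)
```

Printing p_combined draws the boxplots with one blue point per station; axis labels and limits can be added as in the original plot1 and plot2.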