Wrong linking of points with lines in ggplot - r

I don't know what I'm missing, but I cannot figure out a very simple task. This is a small piece of my dataframe:
dput(df)
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "SOU55", class = "factor"), Depth = c(2L, 4L,
6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L), Value = c(211.8329815,
278.9603866, 255.6111086, 212.6163368, 193.7281895, 200.9584658,
160.9289157, 192.0664419, 174.5951019, 7.162682425)), .Names = c("ID",
"Depth", "Value"), class = "data.frame", row.names = c(NA, -10L
))
What I'm trying to do is simply plot Depth versus Value with ggplot; this is the code:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_line()
and this is the result:
But it is quite different from what I really want. This is the plot made with LibreOffice:
It seems that ggplot doesn't connect the values correctly. What am I doing wrong?
Thanks to all!

You need geom_path() to connect the observations in the original order. geom_line() sorts the data according to the x-aesthetic before plotting:
ggplot(df, aes(Value, Depth))+
geom_point()+
geom_path()
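A quick way to see the difference is to compare the data each layer actually draws. This sketch rebuilds the posted data frame (dropping the constant ID column) and uses ggplot2's layer_data() to inspect the built connecting layer: the geom_line() layer comes back sorted by x (Value), while the geom_path() layer keeps the original Depth order.

```r
library(ggplot2)

# The posted data, minus the constant ID column
df <- data.frame(
  Depth = c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20),
  Value = c(211.8329815, 278.9603866, 255.6111086, 212.6163368, 193.7281895,
            200.9584658, 160.9289157, 192.0664419, 174.5951019, 7.162682425)
)

p_line <- ggplot(df, aes(Value, Depth)) + geom_point() + geom_line()
p_path <- ggplot(df, aes(Value, Depth)) + geom_point() + geom_path()

# Layer 2 is the connecting layer in both plots
line_data <- layer_data(p_line, 2)
path_data <- layer_data(p_path, 2)

# geom_line() reorders rows by x, so Depth is visited in order of increasing Value
line_data$y  # 20 14 18 16 10 12 2 8 6 4

# geom_path() keeps the row order of df
path_data$y  # 2 4 6 8 10 12 14 16 18 20
```

Because the rows are already ordered by Depth, geom_path() traces the profile top to bottom, which is what the LibreOffice chart shows.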

Related

What is the best way to use agricolae to do ANOVAs on a split plot design?

I'm trying to run some ANOVAs on data from a split plot experiment, ideally using the agricolae package. It's been a while since I've taken a stats class and I wanted to be sure I'm analyzing this data correctly, so I did some searching online and couldn't really find consistency in the way people were analyzing their split plot experiments. What is the best way for me to do this?
Here's the head of my data:
dput(head(rawData))
structure(list(Plot = 2111:2116, Variety = structure(c(5L,
4L, 3L, 6L, 1L, 2L), .Label = c("Burbank", "Hodag", "Lamoka",
"Norkotah", "Silverton", "Snowden"), class = "factor"), Rate = c(4L,
4L, 4L, 4L, 4L, 4L), Rep = c(1L, 1L, 1L, 1L, 1L, 1L), totalTubers = c(594L,
605L, 656L, 729L, 694L, 548L), totalOzNoCulls = c(2544.18, 2382.07,
2140.69, 2401.56, 2440.56, 2503.5), totalCWTacNoCulls = c(461.76867,
432.345705, 388.535235, 435.88314, 442.96164, 454.38525), avgLWratio = c(1.260615419,
1.287949374, 1.111981583, 1.08647584, 1.350686661, 1.107173509
), Hollow = c(14L, 15L, 22L, 25L, 14L, 13L), Double = c(10L,
13L, 15L, 22L, 11L, 9L), Knob = c(86L, 80L, 139L, 156L, 77L,
126L), Researcher = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Wang", class = "factor"),
CullsPounds = c(1.75, 1.15, 4.7, 1.85, 0.8, 5.55), CullsOz = c(28,
18.4, 75.2, 29.6, 12.8, 88.8), totalOz = c(2572.18, 2400.47,
2215.89, 2431.16, 2453.36, 2592.3), totalCWTacCulls = c(466.85067,
435.685305, 402.184035, 441.25554, 445.28484, 470.50245)), row.names = c(NA,
6L), class = "data.frame")
For these data, the whole plot is Rate, the split plot is Variety, the block is Rep, and for discussion's sake here, we can look at totalCWTacNoCulls as the response.
Any help would be very much appreciated! I am still getting the hang of Stack Overflow, so if I have made any mistakes or shared my data wrong, please let me know and I'll change it. Thank you!
You can do this with the agricolae package as follows:
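library(agricolae)
# Convert the design variables to factors inside the data frame (avoids attach())
rawData$Rate <- factor(rawData$Rate)
rawData$Variety <- factor(rawData$Variety)
rawData$Rep <- factor(rawData$Rep)
# Arguments: block, main-plot factor, sub-plot factor, response
with(rawData, sp.plot(Rep, Rate, Variety, totalCWTacNoCulls))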
According to the agricolae documentation, the usage is
sp.plot(block, pplot, splot, Y)
where block is the replication (block) factor, pplot is the main-plot factor, splot is the sub-plot factor, and Y is the response variable.
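If you want to cross-check the sp.plot() table without agricolae, base R's aov() can fit the same split-plot layout with an Error() term: Rate is tested against the Rep:Rate (whole-plot) error, while Variety and Rate:Variety are tested against the within-plot error. This is a swapped-in technique, not what sp.plot() does internally. The posted head() contains only one Rep and one Rate, so the sketch uses a small simulated data set; the column names mirror rawData, but the values, the two rates and the three reps are assumptions.

```r
# Hypothetical balanced design: 3 reps x 2 rates x 6 varieties (simulated response)
set.seed(1)
sim <- expand.grid(
  Variety = c("Burbank", "Hodag", "Lamoka", "Norkotah", "Silverton", "Snowden"),
  Rate = c(2, 4),
  Rep = 1:3
)
sim$Rate <- factor(sim$Rate)
sim$Rep <- factor(sim$Rep)
sim$totalCWTacNoCulls <- 430 + rnorm(nrow(sim), sd = 25)

# Whole plots are Rate within Rep; Variety is the split-plot factor
fit <- aov(totalCWTacNoCulls ~ Rate * Variety + Error(Rep/Rate), data = sim)

# One ANOVA table per error stratum: Rep, Rep:Rate (whole plot), Within (split plot)
summary(fit)
```

With real data, replace sim with rawData (after converting Rep and Rate to factors) and compare the F tests with the sp.plot() output.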

Have two colour scales ggplot [duplicate]

This question already has answers here:
Assign color to 2 different geoms and get 2 different legends
(3 answers)
Closed 4 years ago.
I am trying to have separate colours for my lines and points. My data is split by Arm, so at each time point there should be two dots, with lines connecting them to the previous and next time points.
I can get the line and dot colours to change together, but I would like the lines to be a different colour, still based on Arm. For example, I want the lines to be light blue for Arm = 1 and yellow for Arm = 2, while the dots keep the colours shown below. Is this possible with ggplot?
Any help would be much appreciated.
What I have:
Code:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm))) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
Data:
TOT <- structure(list(Arm = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
VisitNo = structure(c(0L, 6L, 12L, 16L, 24L, 36L, 0L, 6L, 12L, 16L, 24L, 36L),
label = "VisitNo", class = c("labelled", "integer")),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("PWB", "SWB", "EWB", "FWB", "AC"), class = "factor"),
Mean = c(25.3025326086957, 25.4365119047619, 25.8333333333333, 21.3452380952381,
26, 26.8235294117647, 25.2272727272727, 25.6172839506173,
25.6805555555556, 21.625976744186, 26.24, 26)),
row.names = c(NA, 12L), class = "data.frame")
If you just want the lines to be a bit lighter than the points, you can set alpha to make the lines partly transparent:
ggplot(head(TOT, 12), aes(x=VisitNo, y=Mean)) +
geom_line(size=1.5, aes(color=as.factor(Arm)), alpha = 0.4) +
geom_point(size=3, aes(color=as.factor(Arm))) +
scale_colour_manual(values = c("blue", "orange")) +
theme_bw()
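If you need genuinely different colours for lines and points (light blue/yellow lines, blue/orange points), one common approach is to give each geom its own scale: map the lines to colour and the points to fill, using a filled point shape (shape = 21) so that fill controls the dot colour. This is a sketch, not the linked duplicate's exact code; it uses a trimmed copy of TOT with VisitNo as a plain number, and linewidth assumes ggplot2 3.4.0 or later (older versions use size for line thickness).

```r
library(ggplot2)

# Trimmed version of the posted TOT data (VisitNo kept as a plain number)
TOT <- data.frame(
  Arm = rep(1:2, each = 6),
  VisitNo = rep(c(0, 6, 12, 16, 24, 36), times = 2),
  Mean = c(25.3025326086957, 25.4365119047619, 25.8333333333333, 21.3452380952381,
           26, 26.8235294117647, 25.2272727272727, 25.6172839506173,
           25.6805555555556, 21.625976744186, 26.24, 26)
)

p <- ggplot(TOT, aes(x = VisitNo, y = Mean)) +
  # Lines are coloured through the colour scale
  geom_line(aes(colour = factor(Arm)), linewidth = 1.5) +
  # Shape 21 is filled; mapping fill gives the points their own scale
  geom_point(aes(fill = factor(Arm)), shape = 21, size = 3, colour = "black") +
  scale_colour_manual(values = c("lightblue", "yellow")) +
  scale_fill_manual(values = c("blue", "orange")) +
  theme_bw()
p
```

The trade-off is that you get two legends (one for colour, one for fill); name them with labs(colour = ..., fill = ...) or hide one with guides().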

Plotting multiple effect plots from logistic regression

I have a number of logistic regression models with different response variables but the same predictor variables. I want to use grid.arrange (or anything else) to make a single figure with all these effect plots that were made with the effects package. I followed the advice here to make such a graph: grid.arrange with John Fox's effects plots
library(effects)
library(gridExtra)
data <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L,1L, 1L, 2L, 2L, 2L), .Label = c("group1", "group2"), class = "factor"),obs = c(1L, 1L, 4L, 4L, 6L, 12L, 26L, 1L, 10L, 6L),responseA = c(1L, 1L, 2L, 0L, 1L, 10L, 20L, 0L, 3L, 2L), responseB = c(0L, 0L, 2L, 4L, 6L, 4L, 8L, 1L, 8L, 5L)), .Names = c("group", "obs", "responseA","responseB"), row.names = c(53L, 54L, 55L, 56L, 57L, 58L,59L, 115L, 116L, 117L), class = "data.frame")
model1<-glm(cbind(responseA,(obs-responseA))~group,family=binomial, data=data)
model2<-glm(cbind(responseB,(obs-responseB))~group,family=binomial, data=data)
ef1 <-allEffects(model1)[[1]]
ef2 <- allEffects(model2)[[1]]
elist <- list( ef1,ef2)
class(elist) <- "efflist"
plot(elist, col=2)
The problem is that in the models the response variable is in the form cbind(responseA, obs - responseA), but for the figure I would like to change it to something cleaner (like "Response A"). I tried changing the y labels by passing a vector of labels, but got a warning, and both labels became "Response A".
plot(elist, ylab=c("response A","response B"),col=2)
Then I tried the second suggested method, changing the class to "trellis", but got an error, so grid.arrange didn't work either.
p1<-plot(allEffects(model1),ylab="Response A")
p2<-plot(allEffects(model2),ylab="Response B")
class(p1) <- class(p2) <- "trellis"
grid.arrange(p1, p2, ncol=2)
Can anyone provide a method to change each y-axis label separately?
With the ef1 and ef2 variables you created, you can try the following:
plot1 <- plot(ef1, ylab = "Response A")
plot2 <- plot(ef2, ylab = "Response B")
grid.arrange(plot1, plot2, ncol=2)

Add symbol on top of ggplot2 boxplots to indicate value of variable

Working with the following subset of a much larger dataset,
ex <- structure(list(transect_id = c(1L, 1L, 1L, 1L, 1L, 15L, 15L,
15L, 15L, 15L, 15L), number_f = c(2L, 2L, 2L, 2L, 2L, 0L, 0L,
0L, 0L, 0L, 0L), years_f = c(1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L,
6L, 6L, 6L), b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348,
5.790089607, 10.67796993, 9.371051788, 10.54364777, 6.904324532,
7.203606129, 9.1611166)), .Names = c("transect_id", "number_f",
"years_f", "b"), class = "data.frame", row.names = c(1L, 2L,
3L, 4L, 5L, 2045L, 2046L, 2047L, 2048L, 2049L, 2050L))
I've plotted the distributions of "b" for each of the groups indicated by "transect_id" and have colored them by "number_f", which I do here:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) + geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')
What I need to do for each of the "transect_id" groups is stack symbols (asterisks or some other symbol) on top of each boxplot to indicate the value of "years_f" that corresponds to each "transect_id". In the data subset above, "years_f" is 1 and 6 for transect_ids 1 and 15, respectively. I'd like to see something like this, which I mocked up manually.
Also keep in mind that the dataset I'm working with is very large, so I'll need a loop or some other way of doing this automatically. I also welcome other ideas for indicating the value of "years_f" that would not overburden the figure as much as stacked symbols, which will be a particular problem for larger values of "years_f".
Try adding
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
to the end of your plot like so:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = c(1, 2), y = 3, label = paste0('Year_F =', unique(ex$years_f)))
To use it on a bigger dataset you would have to adjust the x and y arguments, but this might be a decent alternative. One possibility for the y coordinate is something like 0.9 * min(ex$b).
Edit, in response to your comment:
You could first count how many levels of transect_id there are, to specify x:
len.levels <- length(levels(as.factor(ex$transect_id)))
Then you could create a summary table of the unique years_f value for each transect_id:
sum.table <- aggregate(years_f~reorder(ex$transect_id, ex$b, median),
data = ex, FUN = unique)
reorder(ex$transect_id, ex$b, median) years_f
1 1 1
2 15 6
and then plot as follows:
ggplot(aes(x=reorder(transect_id, b, FUN=median), y=b), data=ex) +
geom_boxplot(aes(fill=as.factor(number_f))) + xlab('Transect ID')+
annotate('text', x = 1:len.levels, y = .9 * min(ex$b),
label = paste0('Year_F =', sum.table[,2]))
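Instead of computing x positions with len.levels, another option is to put the labels in their own small data frame and draw them with geom_text(), so each label lands on its own box automatically, however many transects there are. This is a sketch of that alternative, not the method above: the helper column transect and the data frame lab are names introduced here, and placing each label just above the transect's maximum b is an arbitrary choice.

```r
library(ggplot2)

ex <- data.frame(
  transect_id = c(1, 1, 1, 1, 1, 15, 15, 15, 15, 15, 15),
  number_f = c(2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0),
  years_f = c(1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6),
  b = c(5.036625862, 6.468666553, 8.028989792, 4.168409348, 5.790089607,
        10.67796993, 9.371051788, 10.54364777, 6.904324532, 7.203606129, 9.1611166)
)

# Order transects by median b once, and reuse that factor for boxes and labels
ex$transect <- reorder(factor(ex$transect_id), ex$b, FUN = median)

# One row per transect: its years_f value and the top of its data (max b)
lab <- aggregate(b ~ transect + years_f, data = ex, FUN = max)

p <- ggplot(ex, aes(x = transect, y = b)) +
  geom_boxplot(aes(fill = factor(number_f))) +
  # x and y are inherited from the plot, so each label sits above its own box
  geom_text(data = lab, aes(label = paste0("years_f = ", years_f)), vjust = -0.5) +
  xlab("Transect ID")
p
```

Because lab is built from the data, nothing needs editing when more transects are added, and a single text label per box stays readable even for large values of years_f.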

GGPlot geom_text coloring with facets

Hopefully someone here will be able to help me with a problem that I'm having with a ggplot script I'm trying to get right. The script will be used many times with different data, so it needs to be relatively flexible. I've got it almost where I want it, but I've come across a problem I haven't been able to solve.
The script is for a line graph with labels for each line in the right hand margin. Sometimes the graph is faceted, other times it is not.
The piece I'm having trouble with is that I would like to color code the labels in the right margin: black if there was no significant change over time, green if there was positive change, and red if there was negative change. I've got a script that does this when I only have a single facet, but as soon as I have multiple facets in the graph, the color coding of the labels gives the following error:
Error: Incompatible lengths for set aesthetics:
Below is the script with data with multiple facets. The problem seems to be in the way that I'm specifying color in the geom_text line. If I delete the color call in the geom_text line in the script, then I get the attributes printed in the correct place, just not colored. I'm really at a loss on this one. This is my first post here, so let me know if I've done anything wrong with my post.
WITH MULTIPLE FACETS (DOES NOT WORK)
require(ggplot2)
require(grid)
require(zoo)
require(reshape)
require(reshape2)
require(directlabels)
time.data<-structure(list(Attribute = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L), .Label = c("Taste 1", "Taste 2", "Taste 3",
"Use 1", "Use 2", "Use 3"), class = "factor"), Attribute.Category = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Nutritional/Usage",
"Taste/Quality"), class = "factor"), Attribute.Order = c(1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), Category.Order = c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), Color = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("#084594",
"#2171B5", "#4292C6", "#6A51A3", "#807DBA", "#9E9AC8"), class = "factor"),
value = c(75L, 78L, 90L, 95L, 82L, 80L, 43L, 40L, 25L, 31L,
84L, 84L), Date2 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L), .Label = c("1/1/2013", "9/1/2012"), class = "factor")), .Names = c("Attribute",
"Attribute.Category", "Attribute.Order", "Category.Order", "Color",
"value", "Date2"), class = "data.frame", row.names = c(NA, -12L
))
label.data<-structure(list(7:12, Attribute = structure(1:6, .Label = c("Taste 1",
"Taste 2", "Taste 3", "Use 1", "Use 2", "Use 3"), class = "factor"),
Attribute.Category = structure(c(2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Nutritional/Usage",
"Taste/Quality"), class = "factor"), Attribute.Order = 1:6,
Category.Order = c(1L, 1L, 1L, 2L, 2L, 2L), Color = structure(1:6, .Label = c("#084594",
"#2171B5", "#4292C6", "#6A51A3", "#807DBA", "#9E9AC8"), class = "factor"),
Significance = structure(c(2L, 3L, 1L, 1L, 3L, 2L), .Label = c("neg",
"neu", "pos"), class = "factor"), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "1/1/2013", class = "factor"),
value = c(78L, 95L, 80L, 40L, 31L, 84L), Date2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "2013-01-01", class = "factor"),
label.color = structure(c(1L, 2L, 3L, 3L, 2L, 1L), .Label = c("black",
"forestgreen", "red"), class = "factor")), .Names = c("",
"Attribute", "Attribute.Category", "Attribute.Order", "Category.Order",
"Color", "Significance", "variable", "value", "Date2", "label.color"
), class = "data.frame", row.names = c(NA, -6L))
color.palette<-as.character(unique(time.data$Color))
time.data$Date2<-as.Date(time.data$Date2,format="%m/%d/%Y")
plot<-ggplot()+
geom_line(data=time.data,aes(as.numeric(time.data$Date2),time.data$value,group=time.data$Attribute,color=time.data$Color),size=1)+
geom_text(data=label.data,aes(x=Inf, y=label.data$value, label=paste(" ",label.data$Attribute)),
color=label.data$label.color,
size=4,vjust=0, hjust=0,na.rm=T)+
facet_grid(Attribute.Category~.,space="free")+
theme_bw()+
scale_x_continuous(breaks=as.numeric(unique(time.data$Date2)),labels=format(unique(time.data$Date2),format = "%b %Y"))+
theme(strip.background=element_blank(),
strip.text.y=element_blank(),
legend.text=element_blank(),
legend.title=element_blank(),
plot.margin=unit(c(1,5,1,1),"cm"),
legend.position="none")+
scale_colour_manual(values=color.palette)
gt3 <- ggplot_gtable(ggplot_build(plot))
gt3$layout$clip[gt3$layout$name == "panel"] <- "off"
grid.draw(gt3)
Some problems:
Inside your aesthetic declarations, you should not be referencing the data columns as time.data$Date2, but just as Date2. The data argument specifies where to look for that information (which needs to all be in the same data.frame for a given layer, but, as you take advantage of, can vary layer to layer).
In the geom_text call, color was not inside the aes call; if you are mapping it to a column of the data.frame, it has to be inside aes(). If you fixed only the first part but left color outside aes() as color=label.color, you would get a different error: outside aes(), ggplot does not know to look inside label.data, so it cannot find label.color at all.
With those fixed, scale_colour_manual() complains that there are 9 colour values but only 6 were supplied. That is because there are 6 colours from the lines and 3 from the text. Since the Color and label.color columns already contain actual colour specifications (hex codes and colour names), you can just use scale_colour_identity().
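To see what scale_colour_identity() does in isolation, here is a tiny sketch with made-up data (d and col are hypothetical names): the values in the colour column are passed straight through to the drawn layer, so lines and labels can share the colour aesthetic without a manual palette that has to list every value.

```r
library(ggplot2)

# Hypothetical toy data: the colour column already holds colour names / hex codes
d <- data.frame(x = 1:3, y = 1:3, col = c("black", "forestgreen", "#084594"))

p <- ggplot(d, aes(x, y, colour = col)) +
  geom_point() +
  scale_colour_identity()  # use the values as-is; no palette lookup, no legend by default

layer_data(p)$colour  # "black" "forestgreen" "#084594"
```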
Putting this all together:
plot <- ggplot()+
geom_line(data=time.data, aes(as.numeric(Date2), value,
group=Attribute, color=Color),
size=1)+
geom_text(data=label.data, aes(x=Inf, y=value,
label=paste(" ",Attribute),
color=label.color),
size=4,vjust=0, hjust=0)+
facet_grid(Attribute.Category~.,space="free") +
scale_x_continuous(breaks=as.numeric(unique(time.data$Date2)),
labels=format(unique(time.data$Date2),format = "%b %Y")) +
scale_colour_identity() +
theme_bw()+
theme(strip.background=element_blank(),
strip.text.y=element_blank(),
legend.text=element_blank(),
legend.title=element_blank(),
plot.margin=unit(c(1,5,1,1),"cm"),
legend.position="none")
gt3 <- ggplot_gtable(ggplot_build(plot))
gt3$layout$clip[gt3$layout$name == "panel"] <- "off"
grid.draw(gt3)
To give an idea of how much you can strip down your example, this is much closer to minimal:
time.data <-
structure(list(Attribute = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L), .Label = c("Taste 1", "Taste 2", "Use 1", "Use 2"), class = "factor"),
Attribute.Category = structure(c(2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L), .Label = c("Nutritional/Usage", "Taste/Quality"), class = "factor"),
Color = c("#084594", "#084594", "#2171B5", "#2171B5", "#6A51A3",
"#6A51A3", "#807DBA", "#807DBA"), value = c(75L, 78L, 90L,
95L, 43L, 40L, 25L, 31L), Date2 = structure(c(15584, 15706,
15584, 15706, 15584, 15706, 15584, 15706), class = "Date")), .Names = c("Attribute",
"Attribute.Category", "Color", "value", "Date2"), row.names = c(NA,
-8L), class = "data.frame")
label.data <-
structure(list(value = c(78L, 95L, 40L, 31L), Attribute = structure(1:4, .Label = c("Taste 1",
"Taste 2", "Use 1", "Use 2"), class = "factor"), label.color = c("black",
"forestgreen", "red", "forestgreen"), Attribute.Category = structure(c(2L,
2L, 1L, 1L), .Label = c("Nutritional/Usage", "Taste/Quality"), class = "factor"),
Date2 = structure(c(15706, 15706, 15706, 15706), class = "Date")), .Names = c("value",
"Attribute", "label.color", "Attribute.Category", "Date2"), row.names = c(NA,
-4L), class = "data.frame")
ggplot() +
geom_line(data = time.data,
aes(x=Date2, y=value, group=Attribute, colour=Color)) +
geom_text(data = label.data,
aes(x=Date2, y=value, label=Attribute, colour=label.color),
hjust = 1) +
facet_grid(Attribute.Category~.) +
scale_colour_identity()
The theme settings (and making the labels visible outside the plot) aren't relevant to the question, nor is converting the x-axis from Date to numeric to handle Inf. I also trimmed the data to just the needed columns and reduced each Attribute.Category to only two attributes.
