ggplot2: overlay control group line on graph panel set - r

I have a stacked areaplot made with ggplot2:
dists.med.areaplot<-qplot(starttime,value,fill=dists,facets=~groupname,
geom='area',data=MDist.median, stat='identity') +
labs(y='median distances', x='time(s)', fill='Distance Types')+
opts(title=subt) +
scale_fill_brewer(type='seq') +
facet_wrap(~groupname, ncol=2) + grect #grect adds the grey/white vertical bars
It looks like this:
I want to add an overlay of the profile of the control graph (bottom right) to all the graphs in the output (groupname==rowH is the control).
So far my best efforts have yielded this:
cline<-geom_line(aes(x=starttime,y=value),
data=subset(dists.med,groupname=='rowH'),colour='red')
dists.med.areaplot + cline
I need the 3 red lines to be 1 red line that skims the top of the dark blue section. And I need that identical line (the rowH line) to overlay each of the panels.
The dataframe looks like this:
> str(MDist.median)
'data.frame': 2880 obs. of 6 variables:
$ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
$ fCycle : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ fPhase : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
$ starttime: num 0.3 60 120 180 240 300 360 420 480 540 ...
$ dists : Factor w/ 3 levels "inadist","smldist",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 110 117 115 113 114 ...
The red line should be calculated as the sum of the value at each starttime, where groupname='rowH'. I have tried creating cline the following ways. Each results in an error or incorrect output:
#sums the entire y for all points and makes horizontal line
cline<-geom_line(aes(x=starttime,y=sum(value)),data=subset(dists.med,groupname=='rowH'),colour='red')
#using related dataset with pre-summed y's
> cline<-geom_line(aes(x=starttime,y=tot_dist),data=subset(t.med,groupname=='rowH'))
> dists.med.areaplot + cline
Error in eval(expr, envir, enclos) : object 'dists' not found
Thoughts?
ETA:
It appears that the issue I was having with 'dists' not found has to do with the fact that the initial plot, dists.med.areaplot was created via qplot. To avoid this issue, I can't build on a qplot. This is the code for the working plot:
cline.data <- subset(
ddply(MDist.median, .(starttime, groupname), summarize, value = sum(value)),
groupname == "rowH")
cline<-geom_line(data=transform(cline.data,groupname=NULL), colour='red')
dists.med.areaplot<-ggplot(MDist.median, aes(starttime, value)) +
grect + nogrid +
geom_area(aes(fill=dists),stat='identity') +
facet_grid(~groupname)+ scale_fill_brewer(type='seq') +
facet_wrap(~groupname, ncol=2) +
cline
resulting in this graphset:

This Learning R blog post should be of some help:
http://learnr.wordpress.com/2009/12/03/ggplot2-overplotting-in-a-faceted-scatterplot/
It might be worth computing the summary outside of ggplot with plyr.
cline.data <- ddply(MDist.median, .(starttime, groupname), summarize, value = sum(value))
cline.data.subset <- subset(cline.data, groupname == "rowH")
Then add it to the plot with
last_plot() + geom_line(data = transform(cline.data.subset, groupname = NULL), color = "red")
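The summary step does not strictly depend on plyr. As a rough sketch, the same per-time totals can be computed with base R's aggregate(). The toy data frame below is invented for illustration (including the third dists label) and only mimics the column names of MDist.median:

```r
# Toy stand-in for MDist.median (values and the "lardist" label are invented)
toy <- data.frame(
  groupname = rep(c("rowA", "rowH"), each = 6),
  starttime = rep(rep(c(0, 60), each = 3), times = 2),
  dists     = rep(c("inadist", "smldist", "lardist"), times = 4),
  value     = c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60)
)

# Sum the stacked components at each time point within each group
totals <- aggregate(value ~ starttime + groupname, data = toy, FUN = sum)

# Keep only the control group, then drop the faceting column so that
# the layer is not tied to a single facet panel
control <- subset(totals, groupname == "rowH")
control$groupname <- NULL
print(control)
```

Dropping groupname is the important part: a layer whose data lacks the faceting variable is drawn in every panel, which is why the answer above uses transform(..., groupname = NULL).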

Related

Why is my boxplot on R appearing with 1 box when the factor has 3 levels?

I am trying to carry out a nested ANOVA test on some data and have been following an R tutorial. To visualise the data to start, I am creating a boxplot, but only 1 box is appearing on the x axis for "location" when there are 3 locations in the data.
All data has been turned into factors using "as.factor"
> str(ANOVADATArobin)
tibble [105 × 4] (S3: tbl_df/tbl/data.frame)
$ location : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
$ site : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
$ repeat : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
$ flighttime: Factor w/ 73 levels "0","3","5","6",..: 73 73 31 57 73 56 73 65 73 73 ...
> boxplot(flighttime-location, xlab="location", ylab="flighttime")
This appeared with 1 box in the boxplot.
Adding "x=factor(location)"
> boxplot(flighttime-location, xlab="location", ylab="flighttime")
created a second line.
My aim is to create a boxplot with a separate box for each location.
Make sure you type a "~" instead of a "-" between the x and y variables.
## Making a reproducible example
location <- c(rep(1:3,length.out=30))
flighttime <- c(sample(5:78,size=30))
ANOVEDATArobin <- data.frame(location, flighttime)
Here is what you wrote, and you get one big box:
boxplot(flighttime - location, xlab="location", ylab="flighttime")
This is what I wrote to get three boxes:
boxplot(flighttime ~ location, xlab="location", ylab="flighttime")
Even better, why not try the ggplot2 package?
library(ggplot2)
ggplot(ANOVEDATArobin)+
aes(x = location, y = flighttime, group = location)+
geom_boxplot()
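Separately from the ~ versus - typo, the str() output in the question shows flighttime stored as a factor with 73 levels, so it probably needs converting back to numbers before a boxplot is meaningful. A minimal sketch with invented data (the name robin is hypothetical):

```r
# Invented data that mimics the question: both columns stored as factors
set.seed(1)
robin <- data.frame(
  location   = factor(rep(1:3, length.out = 30)),
  flighttime = factor(sample(5:78, size = 30))
)

# Convert the factor labels (not the internal level codes) back to numbers
robin$flighttime <- as.numeric(as.character(robin$flighttime))

# The formula interface (~) splits flighttime by location;
# plot = FALSE returns the summary statistics without drawing anything
stats <- boxplot(flighttime ~ location, data = robin, plot = FALSE)
print(stats$names)
```

Removing plot = FALSE draws the plot, with one box per entry in stats$names.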

Where should I reorder on a bar graph to make the bar groups follow the same sequence as the dataframe

I have a dataframe like this:
> str(mydata6)
'data.frame': 6 obs. of 4 variables:
$ Comparison : Factor w/ 6 levels "Decreased_Adult",..: 5 2 6 3 4 1
$ differential_IR_number: num 446 305 965 599 1799 ...
$ Stage : Factor w/ 3 levels "AdultvsE11","E14vsE11",..: 2 2 3 3 1 1
$ Change : Factor w/ 2 levels "Decrease","Increase": 2 1 2 1 2 1
Columns 1, 3 and 4 are factors and column 2 is numeric.
I used the following code to do a bargraph:
ggplot(mydata6, aes(x=Stage, y=differential_IR_number, fill=Change)) + #don't need to use "" for x= and y, comparing to the above code
geom_bar(stat = "identity", position = "stack") + #using stack to make decrease and increase stack with each other
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + #using theme function to change the labeling to be vertical
geom_text(aes(label=differential_IR_number), position=position_stack(vjust=0.5))
The result is following:
But I want the order to be E14vsE11, E18vsE11 and AdultvsE11. I tried to reorder/sort at different positions but none of them worked.
Why does it not follow the order of my dataframe?
The order is the one of the levels of the factor. You can set the order you want as follows:
mydata6$Stage <- factor(mydata6$Stage, levels = c("E14vsE11", "E18vsE11", "AdultvsE11"))
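To see why the bars come out alphabetically, it may help to inspect the factor levels directly. The sketch below uses an invented data frame (mydata is a hypothetical stand-in for mydata6) and also shows one way to take the level order from the row order instead of typing it out:

```r
# Invented rows with the same Stage labels as in the question
mydata <- data.frame(
  Stage = c("E14vsE11", "E14vsE11", "E18vsE11", "E18vsE11",
            "AdultvsE11", "AdultvsE11"),
  stringsAsFactors = TRUE
)

# Default levels are sorted alphabetically; this is the order ggplot2 uses
default_levels <- levels(mydata$Stage)
print(default_levels)

# Re-level in order of first appearance in the data frame
mydata$Stage <- factor(mydata$Stage,
                       levels = unique(as.character(mydata$Stage)))
print(levels(mydata$Stage))
```

Using unique() this way assumes the rows are already in the desired order, as they are in the question's data.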

ggplot boxplot only shows one box instead of 10, how to fix?

My data is in this format
   Responder_status  variable  value
1  good              AHSP      0.01
2  good              AHSP      1.16
3  poor              AHSP      0.00
4  good              HBB       0.25
It keeps going for all 10 variables, a row for each cell (792 cells). So in total I have 7920 rows. Here's the output of str.
'data.frame': 7920 obs. of 3 variables:
$ Responder_status: Factor w/ 3 levels "good","poor",..: 1 1 1 1 1 1 1 1 1 1 ...
$ variable : Factor w/ 10 levels "AHSP","APOC1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 8.76 1.62 10.35 2.58 0 ...
When I plot a boxplot for it like this:
library(ggplot2)
ggplot(data, aes(x=factor(variable), y=value))+ geom_boxplot(aes(fill=factor(Responder_status)))
or like this:
ggplot(data, aes(x=factor(variable), y=value, fill=factor(Responder_status))) + geom_boxplot()
I get the following plot:
Why do I only get the box for my final variable and not for all of them (what I want)?
You can try wrapping fill inside the aesthetic function like below:
library(ggplot2)
ggplot(data, aes(x=factor(variable), y=value, fill=factor(Responder_status)))+
geom_boxplot()

R ggplot - Error stat_bin requires continuous x variable

My table is data.combined with following structure:
'data.frame': 1309 obs. of 12 variables:
$ Survived: Factor w/ 3 levels "0","1","None": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 1307 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : num 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 929 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 187 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked: Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
$ Title : Factor w/ 4 levels "Master.","Miss.",..: 3 3 2 3 3 3 3 1 3 3 ...
I want to draw a graph to reflect the relationship between Title and Survived, categorized by Pclass. I used the following code:
ggplot(data.combined[1:891,], aes(x=Title, fill = Survived)) +
geom_histogram(binwidth = 0.5) +
facet_wrap(~Pclass) +
ggtitle ("Pclass") +
xlab("Title") +
ylab("Total count") +
labs(fill = "Survived")
However this results in error: Error: StatBin requires a continuous x variable the x variable is discrete. Perhaps you want stat="count"?
If I change variable Title into numeric: data.combined$Title <- as.numeric(data.combined$Title) then the code works but the label in the graph is also numeric (below). Please tell me why it happens and how to fix it. Thanks.
Btw, I use R 3.2.3 on Mac El Capitan.
Graph: Instead of Mr, Miss,Mrs the x axis shows numeric values 1,2,3,4
To sum up the answers from the comments above:
1 - Replace geom_histogram(binwidth=0.5) with geom_bar(). However this way will not allow binwidth customization.
2 - Using stat_count(width = 0.5) instead of geom_bar() or geom_histogram(binwidth = 0.5) would solve it.
extractTitle <- function(Name) {
Name <- as.character(Name)
if (length(grep("Miss.", Name)) > 0) {
return ("Miss.")
} else if (length(grep("Master.", Name)) > 0) {
return ("Master.")
} else if (length(grep("Mrs.", Name)) > 0) {
return ("Mrs.")
} else if (length(grep("Mr.", Name)) > 0) {
return ("Mr.")
} else {
return ("Other")
}
}
titles <- NULL
for (i in 1:nrow(data.combined)){
titles <- c(titles, extractTitle(data.combined[i, "Name"]))
}
data.combined$title <- as.factor(titles)
# rows 1:891 match the subset used in the question
ggplot(data.combined[1:891,], aes(x = title, fill = Survived))+
geom_bar(width = 0.5) +
facet_wrap("Pclass")+
xlab("Title")+
ylab("total count")+
labs(fill = "Survived")
As stated above, use geom_bar() instead of geom_histogram(). See the sample code below (I wanted a separate graph for each month of birth date data):
ggplot(data = pf,aes(x=dob_day))+
geom_bar()+
scale_x_discrete(breaks = 1:31)+
facet_wrap(~dob_month,ncol = 3)
I had the same issue but none of the above solutions worked. Then I noticed that the column of the data frame I wanted to use for the histogram wasn't numeric:
df$variable<- as.numeric(as.character(df$variable))
Taken from here
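The as.character() step matters because as.numeric() applied directly to a factor returns the internal level codes rather than the labels. That is also why converting Title with as.numeric() in the question put 1, 2, 3, 4 on the x axis. A small sketch with invented values:

```r
# A factor whose labels look like numbers (invented values)
f <- factor(c("10", "3", "25", "3"))

# as.numeric() on a factor returns the internal level codes...
codes <- as.numeric(f)

# ...while going through as.character() recovers the labelled values
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```

The codes follow the sorted level order ("10", "25", "3"), so they bear no relation to the numeric values of the labels.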
I had the same error. In my original code, I read my .csv file with read_csv(). After I changed the file into .xlsx and read it with read_excel(), the code ran smoothly.

Scatterplots in R using lattice and cloud, how to determine colors by factors?

I am still struggling with R plots and colors -- some results are as I expected, some not.
I have a 2-million point data set, generated by a simulation process. There are several variables in the dataset, but I am interested in three of them and in a factor that describes the class of each data point.
Here is a short snippet of code that reads the points and gets some basic statistics on them:
library(lattice)
library(plyr)
myData <- read.table("dados - b1000 n10000 var 0,2 - MAX40.txt",
col.names=c("Class","Thet1Thet2","Thet3Thet2","Thet3Thet1",
"K12","K23","delta","w_1","w_2","w_3"))
count (myData$Class)
That gives me
## x freq
## 1 A 8030
## 2 B 17247
## 3 C 4999
## 4 D 16495
## 5 E 1949884
## 6 N 3345
(the input file is quite large, cannot add it as a link)
I want to see these points in a scatterplot matrix, so I use the code
colors=c("red","green","blue","cyan","magenta","yellow")
# Let's try with a very small dot size, see if we can visualize the inners of the cube.
cloud(myData$delta ~ myData$K12 + myData$K23, xlab="K12", ylab="K23", zlab="delta",
cex=0.001,main="All Classes",col.point = colors[myData$Class])
Here is the result. As expected, points from class E are in vast majority, so I cannot see points of other classes. The problem is that I expected the points to be plotted in magenta (classes are A, B, C, D, E, N; colors are red, green, blue, cyan, magenta, yellow).
When I do the plot class by class it works as expected, see two examples:
data <- subset(myData, Class=="A")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class A",
col.point = colors[data$Class])
gives this:
And this snippet of code
data <- subset(myData, Class=="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class E",
col.point = colors[data$Class])
gives this:
This also seems as expected: a plot of points of all classes except E.
data <- subset(myData, Class!="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,
cex=0.01,main="All Classes (except E)",col.point = colors[data$Class])
The question is, why are the points on the first plot blue instead of magenta?
This question is somewhat similar to Color gradient for elevation data in a XYZ plot with R and Lattice but now I am using factors to determine colors on the scatterplot.
I've also read Changing default colours of a lattice plot by factor -- grouping plots by a factor (using the parameter groups.factor=myData$Class) does not solve my problem, plots are still in blue but separated by class.
Edited to add more information: this fake data set can be used for tests.
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
# This is ugly but works!
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
When I plot it with
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",main="All Classes",
col.point = colors[data$Class])
I get the plot below. All points are in blue.
JeremyCG found the problem. Here is the complete code that works, for future reference.
library(lattice)
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
That showed the issue:
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: chr "A" "A" "B" "B" ...
Class must be a factor. This solved it:
data$Class <- as.factor(data$Class)
str(data)
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: Factor w/ 5 levels "A","B","C","D",..: 1 1 2 2 3 3 4 4 5 5 ...
Then plot it:
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",
pch=20,main="All Classes",col = colors[data$Class])
Here is the result:
Thanks @jeremycg!
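For anyone wondering why the factor conversion matters, here is a small base R sketch of what colors[data$Class] evaluates to in each case. The cls vector is invented. Lattice falling back to its default blue when every colour is NA is my assumption about the cause, not something confirmed in the thread:

```r
colors <- c("red", "green", "blue", "cyan", "magenta", "yellow")
cls <- c("A", "A", "B", "E")

# Indexing an unnamed vector by character looks up element names,
# and colors has none, so every result is NA
by_chr <- colors[cls]

# Indexing by a factor uses its integer codes (A = 1, B = 2, ..., E = 5)
by_fct <- colors[factor(cls, levels = c("A", "B", "C", "D", "E"))]

print(by_chr)
print(by_fct)
```

An alternative that should work with a character Class column is a named colour vector, e.g. c(A = "red", B = "green", ...), since character indexing matches on names.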
