I'm building a bar plot with ggplot2 and the code works fine until I add error bars with geom_errorbar. My dataset consists of two factors [Sex (two levels) and Time (seven levels)] and several continuous dependent variables. ABA.mean is the mean and ABA.se is the standard error.
Data structure
Here's the code for the plot (I made sure Sex and Time were factors).
p<- ggplot(data=sex.data1, aes(x=Time, y=ABA.mean, ymin=ABA.mean-ABA.se, ymax=ABA.mean+ABA.se))
p1<-p + geom_bar(aes(fill=Sex), stat="identity",
position="dodge")+ geom_errorbar(aes(color=Sex), position="dodge")
And here's the plot:
output of bar plot with error bars:
Here's also some data (not showing all data to facilitate comprehension)
dput(sex.data1)
structure(list(Sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor"),Time = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,4L, 5L, 6L, 7L), .Label = c("1", "2", "3", "4", "5", "6",
"7"), class = "factor"), RWC.mean = c(46.87233333, 56.971,
5.884, 6.562666667, 10.30466667, 80.95266667, 79.22333333,
72.04366667, 80.87166667, 77.15266667, 6.962, 8.733, 86.051,
84.586), ABA.mean = c(9.532666667, 322.969, 28.4, 30.15066667,
45.529, 46.298, 18.60933333, 13.838, 46.31466667, 202.3803333,
10.5005, 16.637, 17.64466667, 6.595333333),RWC.se = c(6.428766324,19.39234553, 2.152576673, 0.328793924, 1.972588936, 1.542849888,4.434089322, 8.443211501, 3.087210679, 5.593021853, 0.574815043,NA, 9.684611522, 1.546559515), ABA.se = c(2.654699878, 89.919,11.59730729, 10.52325178, 24.42691451, 29.76969347, 8.154232119,4.295445767, 21.57449026, 132.4679665, 1.1755, NA, 9.29181176,3.315272605)
However, when I build the plot without geom_errorbar, the bars appear.
p<-ggplot(sex.data1, aes(x=Time, y=ABA.mean, fill=Sex))
p+geom_bar(stat="identity", position=position_dodge())
I'm guessing there's something wrong with the code of geom_errorbar.
Many thanks in advance!
Your plotting code looks fine to me, but your dput formatting was a bit strange. I had to fix the syntax in the data, so that might have been the issue (i.e. the format/syntax of your input data). The code you posted produces the plot just fine:
library(ggplot2)
ggplot(data=sex.data1, aes(x=Time, y=ABA.mean, ymin=ABA.mean-ABA.se, ymax=ABA.mean+ABA.se)) +
geom_bar(aes(fill=Sex), stat="identity", position="dodge") +
geom_errorbar(aes(color=Sex), position="dodge")
data:
sex.data1 <- data.frame(
Sex = c("F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "M"),
Time = c("1", "2", "3", "4", "5", "6", "7"),
RWC.mean = c(46.87233333, 56.971, 5.884, 6.562666667, 10.30466667, 80.95266667, 79.22333333, 72.04366667, 80.87166667, 77.15266667, 6.962, 8.733, 86.051, 84.586),
ABA.mean = c(9.532666667, 322.969, 28.4, 30.15066667, 45.529, 46.298, 18.60933333, 13.838, 46.31466667, 202.3803333, 10.5005, 16.637, 17.64466667, 6.595333333),
RWC.se = c(6.428766324,19.39234553, 2.152576673, 0.328793924, 1.972588936, 1.542849888,4.434089322, 8.443211501, 3.087210679, 5.593021853, 0.574815043,NA, 9.684611522, 1.546559515),
ABA.se = c(2.654699878, 89.919,11.59730729, 10.52325178, 24.42691451, 29.76969347, 8.154232119,4.295445767, 21.57449026, 132.4679665, 1.1755, NA, 9.29181176,3.315272605))
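If the error bars still do not sit on top of their bars, one common pattern is to give both layers the same explicit dodge width. The following is only a minimal sketch on made-up data; the names toy, m, se, dodge and p_toy are illustrative assumptions, not part of the original post.

```r
library(ggplot2)

# Toy data (made up for illustration): two groups at three time points
toy <- data.frame(
  Sex  = factor(rep(c("female", "male"), each = 3)),
  Time = factor(rep(1:3, times = 2)),
  m    = c(10, 30, 20, 12, 25, 18),
  se   = c(2, 5, NA, 3, 4, 2)  # one missing standard error, as in the question's data
)

# One position object shared by both layers so bars and error bars use the same offsets
dodge <- position_dodge(width = 0.9)

p_toy <- ggplot(toy, aes(x = Time, y = m, fill = Sex)) +
  geom_col(position = dodge) +                      # geom_col() is geom_bar(stat = "identity")
  geom_errorbar(aes(ymin = m - se, ymax = m + se),
                position = dodge, width = 0.25)     # narrow whiskers centred on each bar
p_toy
```

Because fill = Sex is set in the top-level aes(), both layers are grouped by Sex, which is what lets the dodge place each error bar over the matching bar.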
Dataframe
df <- data.frame(
structure(list(biological_group = structure(1:15, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
"O"), class = "factor"), norm_expression = c(2L, 3L, 4L, 6L,
1L, 5L, 7L, 8L, 9L, 3L, 2L, 6L, 7L, 8L, 1L), SE = c(0.171499719,
0.089692493, 0.153777208, 0.188012958, 0.153776128, 0.192917199,
0.224766056, 0.231338325, 0.121716906, 0.094763028, 0.09635363,
0.069986333, 0.113681329, 0.094614957, 0.391182473), Group = structure(c(1L,
1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"Plant Products", "Sugars"), class = "factor")), class = "data.frame", row.names = c(NA,
-15L), .Names = c("biological_group", "norm_expression", "SE",
"Group")))
Sample code
library(ggplot2)
DFplot <- ggplot(df, aes(biological_group,norm_expression)) +
ylab("Relative Normalized\nExpression (∆∆Cq)") +
geom_bar(fill="black",stat="identity") +
theme(axis.title.x = element_blank())
DFplot2 <- DFplot+geom_errorbar(aes(ymin=norm_expression-SE,ymax=norm_expression+SE),width=0.5) +
geom_boxplot() +
facet_grid(~Group, scales = "free_x", switch = "x", space = "free_x") +
scale_y_continuous(expand = c(0,0), labels = scales::number_format(accuracy = 0.1)) +
theme_classic()
That gives me this graph:
I'd like to remove the vertical lines from the strip text labels (specified by strip.background), like this:
I realize I could just use Photoshop or something similar, but I have several graphs to make, so it would be easier if I could specify it in the code.
This answer was helpful.
In your case, you want to find the strip-b grobs (your bottom strips) and substitute their background.
Edit: Replaced top bar of bottom strip that was removed accidentally.
library(grid)
q <- ggplotGrob(DFplot2)
lg <- linesGrob(x=unit(c(1,0),"npc"), y=unit(c(1,1),"npc"), gp=gpar(col="black", lwd=2))
for (k in grep("strip-b",q$layout$name)) {
q$grobs[[k]]$grobs[[1]]$children[[1]] <- lg
}
grid.draw(q)
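Before replacing anything inside the grob tree, it can help to check which layout entries the grep actually matches. This is a rough sketch on a made-up two-facet plot (toy, p, g and strip_idx are illustrative names); the internal structure of each strip grob may differ between ggplot2 versions, so inspect it before indexing into it.

```r
library(ggplot2)

# Made-up data with two facet groups
toy <- data.frame(x = c("a", "b", "c", "d"),
                  y = c(1, 3, 2, 4),
                  grp = c("G1", "G1", "G2", "G2"))

p <- ggplot(toy, aes(x, y)) +
  geom_col() +
  facet_grid(~grp, scales = "free_x", switch = "x", space = "free_x")

g <- ggplotGrob(p)

# Indices of the bottom strips in the gtable layout
strip_idx <- grep("strip-b", g$layout$name)
g$layout[strip_idx, c("name", "t", "l")]

# Look at what one strip contains before replacing any of its children
str(g$grobs[[strip_idx[1]]], max.level = 2)
```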
I am a newbie to R and have been struggling like crazy to visualize a 3 way table as a heat map using geom_tile in R. I can easily do this in Excel, but cannot find any examples of how to do this in R. I have looked at using Mosaics but this is not what I want and I have found hundreds of examples of two way tables, but seems there are no examples of three way tables.
I want the output to look like this:
My data set looks like this (it's a small snapshot of 30,000 records):
xxx <- structure(list(rfm_score = c(111, 112, 113, 114, 115, 121), n = c(2624L,
160L, 270L, 23L, 5L, 650L), rec = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
freq = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), mon = structure(c(1L,
2L, 3L, 4L, 5L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
It is essentially an RFM analysis of customer shopping behavior (Recency, Frequency and Monetary). The output heat map (that I want) should be the count of customers in each RFM segment. In the heat map I supplied, you will see there are two variables on the left, R = Recency (quintile ranges 1 to 5) and F = Frequency (quintile ranges 1 to 5), and at the top of the heat map is the M = Monetary variable (quintile ranges 1 to 5). So, for instance, the segment RFM = 555 has a count of 2511 customers.
I have tried the following code and variations of it, but just get errors.
library(ggplot2)
library(RColorBrewer)
library(dplyr)
cols <- rev(brewer.pal(11, 'RdYlBu'))
ols <- brewer.pal(9, 'RdYlGn')
ggplot(xxx)+ geom_tile(aes(x= mon, y = reorder(freq, desc(freq)), fill = n)) +
theme_change +
facet_grid(rec~.) +
# geom_text(aes(label=n)) +
# scale_fill_gradient2(midpoint = (max(xxx$n)/2), low = "red", mid = "yellow", high = "darkgreen") +
# scale_fill_gradient(low = "red", high = "blue") + scale_fill_gradientn(colours = cols) +
# scale_fill_brewer() +
labs(x = "monetary", y= "frequency") +
scale_x_discrete(expand = c(0,0)) + scale_y_discrete(expand = c(0,0)) +
coord_fixed(ratio= 0.5)
I have no idea how to create this heat map in R. Can anyone please help me?
Kind regards
Heinrich
You can use the DT and formattable packages to make a table with conditional colour formatting:
library(DT)
library(formattable)
xxx <- data.frame(rfm_score = c(111, 112, 113, 114, 115, 121),
n = c(2624L, 160L, 270L, 23L, 5L, 650L),
rec = c(1L, 1L, 1L, 1L, 1L, 1L),
freq = c(1L, 1L, 1L, 1L, 1L, 2L),
mon = c(1L, 2L, 3L, 4L, 5L, 1L))
xxx_dt <- formattable(
xxx,
list(
rfm_score = color_tile("pink", "light blue"),
n = color_tile("pink", "light blue"),
rec = color_tile("pink", "light blue"),
freq = color_tile("pink", "light blue"),
mon = color_tile("pink", "light blue")))
as.datatable(xxx_dt)
Output:
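For a layout closer to the heat map described in the question, a geom_tile approach could look roughly like the sketch below. It is not the answer's method: the counts are randomly generated, and the names rfm and heat are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical RFM counts: every combination of the three quintile scores
set.seed(1)
rfm <- expand.grid(rec = factor(1:5), freq = factor(1:5), mon = factor(1:5))
rfm$n <- sample(5:3000, nrow(rfm), replace = TRUE)  # made-up customer counts

# Reverse the frequency levels so that 1 is drawn at the top of each band
rfm$freq <- factor(rfm$freq, levels = rev(levels(rfm$freq)))

heat <- ggplot(rfm, aes(x = mon, y = freq, fill = n)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = n), size = 3) +
  facet_grid(rec ~ ., switch = "y") +       # one band of rows per recency quintile
  scale_x_discrete(position = "top") +      # monetary quintiles along the top
  scale_fill_gradient(low = "yellow", high = "darkgreen") +
  labs(x = "monetary", y = "recency / frequency")
heat
```

Faceting on rec stacks five frequency-by-monetary grids, which gives the two-variables-on-the-left layout without needing a true three-way table.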
I am trying to make a ggplot. When I had shape in aesthetics, the code was working just fine. However, I need to put shape in geom_point() because I'm trying to reproduce a figure. And when I added shape to geom_point() it gave me the following error:
Aesthetics must be either length 1 or the same as the data (6): shape
I've looked for other answers here, but nothing seems to work for me. Above I've provided an image of what my data looks like. There are 17000 entries.
Below is my code:
summarised_data <-ddply(mammals,c('mammals$chr','mammals$Species','mammals$chrMark'),
function (x) c(median_rpkm = median(x$RPKM), median = median(x$dNdS)))
ggplot(summarised_data,aes(x = summarised_data$median_rpkm, y = summarised_data$median,
color = summarised_data$`mammals$Species`)) + geom_smooth(se = FALSE, method = "lm") +
geom_point(shape = summarised_data$`mammals$chrMark`) + xlab("median RPKM") + ylab("dNdS")
"ENSG00000213221", "ENSG00000213341", "ENSG00000213380", "ENSG00000213424",
"ENSG00000213533", "ENSG00000213551", "ENSG00000213619", "ENSG00000213626",
"ENSG00000213699", "ENSG00000213782", "ENSG00000213949", "ENSG00000214013",
"ENSG00000214338", "ENSG00000214357", "ENSG00000214367", "ENSG00000214517",
"ENSG00000214814", "ENSG00000215203", "ENSG00000215305", "ENSG00000215367",
"ENSG00000215440", "ENSG00000215897", "ENSG00000221947", "ENSG00000222011",
"ENSG00000224051", "ENSG00000225830", "ENSG00000225921", "ENSG00000239305",
"ENSG00000239474", "ENSG00000239900", "ENSG00000241058", "ENSG00000242247",
"ENSG00000242612", "ENSG00000243646", "ENSG00000244038", "ENSG00000244045"),
class = "factor"), Species = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Chimp", "Gori", "Human", "Maca",
"Mouse", "Oran"), class = "factor"), labs = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Chimp-A", "Chimp-X",
"Gori-A", "Gori-X", "Human-A", "Human-X", "Maca-A", "Maca-X",
"Mouse-A", "Mouse-X", "Oran-A", "Oran-X"), class = "factor"),
chrMark = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("A", "X"), class = "factor"), chr = structure(c(27L,
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L), .Label = c("1",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"2", "20", "21", "22", "2a", "2A", "2b", "2B", "3", "4",
"5", "6", "7", "8", "9", "X"), class = "factor"), dN = c(3.00669,
3.27182, 7.02044, 1.01784, 3.0363, 2.32786, 4.92959, 3.03753,
3.0776, 1.02147), dS = c(3.15631, 5.87147, 3.13716, 2.05438,
4.10205, 5.24764, 4.2014, 3.18086, 5.4942, 3.02169), dNdS = c(0.9525965447,
0.5572403504, 2.2378329444, 0.4954487485, 0.7401908802, 0.4436013141,
1.1733207978, 0.954939859, 0.5601543446, 0.3380459279), RPKM = c(31.6,
13.9, 26.3, 9.02, 11.3, 137, 242, 1.05, 59.4, 10.1), Tau = c(0.7113820598,
0.8391023102, 0.3185943152, 0.6887167806, 0.9120531859, 0.6254200542,
0.7165302682, 0.7257435312, 0.2586613298, 0.6493567251),
GC3 = c(0.615502, 0.622543, 0.393064, 0.490141, 0.461592,
0.626407, 0.490305, 0.482853, 0.346424, 0.466484)), .Names = c("gene",
"Species", "labs", "chrMark", "chr", "dN", "dS", "dNdS", "RPKM",
"Tau", "GC3"), row.names = c(NA, 10L), class = "data.frame")
There are a few things wrong with your code and with how it uses ggplot's non-standard evaluation; I'd recommend reading a ggplot tutorial or the docs. Having columns within summarised_data called 'mammals$Species' and 'mammals$chrMark' is going to cause lots of problems.
If we change these to something more sensible...
names(summarised_data)[names(summarised_data) == "mammals$Species"] <- "mammals_species"
names(summarised_data)[names(summarised_data) == "mammals$chrMark"] <- "mammals_chrMark"
We can make the ggplot code more friendly. Note that shape has to be within aes, as you're mapping it to your data.
ggplot(summarised_data, aes(x = median_rpkm, y = median)) +
geom_smooth(se = FALSE, method = "lm") +
geom_point(aes(shape = mammals_chrMark,
color = mammals_species)) +
xlab("median RPKM") + ylab("dNdS")
Hopefully this should work, or at least get you somewhere closer to an answer.
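To make the difference between mapping and setting an aesthetic concrete, here is a small sketch on made-up data (pts, mark, mapped and fixed are illustrative names, not from the question):

```r
library(ggplot2)

# Made-up points with a two-level marker column
pts <- data.frame(x = 1:6,
                  y = c(2, 4, 3, 6, 5, 7),
                  mark = factor(c("A", "X", "A", "X", "A", "X")))

# Mapping: the shape varies with a column, so it goes inside aes()
mapped <- ggplot(pts, aes(x, y)) + geom_point(aes(shape = mark))

# Setting: one fixed shape for every point, so it goes outside aes()
fixed <- ggplot(pts, aes(x, y)) + geom_point(shape = 17)
```

A value passed outside aes() is treated as a constant for the layer rather than as a column to map, which is why a data column belongs inside aes().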
I am trying to create this type of chart from the data on the left (arbitrary values for simplicity):
The goal is to plot variable X on the x-axis with the mean on the Y-axis and error bars equal to the standard error se.
The problem is that the values 1-10 should each be represented individually (blue curve), and that the values for A and B should be plotted at each of the 1-10 values (green and red lines).
I can draw the curve if I manually save the data and manually copy the values for A and B to each value for X but this is not very time efficient. Is there a more elegant way to do this?
Thanks in advance!
EDIT: As suggested the code:
df <- structure(list(X = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 2L, 11L, 12L), .Label = c("1", "10", "2", "3", "4", "5",
"6", "7", "8", "9", "A", "B"), class = "factor"), mean = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 5.5, 6.5), sd = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), se = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("X", "mean", "sd", "se"), class = "data.frame", row.names = c(NA,-12L))
df<-as.data.frame(df)
df$X<-factor(df$X)
plot <- ggplot(df, aes(x=df$X, y=df$mean)) + geom_point() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1)
plot
I'm afraid I don't know ggplot, but hopefully this is what you want (it might also help others understand your question).
You want a ggplot with three lines plus error bars:
1. df$X,df$mean
2. df$X,df$row_A_mean
3. df$X,df$row_B_mean
4. error bars of the SE column
df <- structure(list(X = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 2L, 11L, 12L), .Label = c("1", "10", "2", "3", "4", "5",
"6", "7", "8", "9", "A", "B"), class = "factor"), mean = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 5.5, 6.5), sd = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), se = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("X", "mean", "sd", "se"), class = "data.frame", row.names = c(NA,-12L))
df<-as.data.frame(df)
df$X<-factor(df$X)
plot <- ggplot(df, aes(x=df$X, y=df$mean)) + geom_point() + geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1)
plot
#row A mean
df$row_A_mean<-rep(df[11,]$mean,nrow(df))# note that this could also be replaced by a horizontal line, unless the mean changes
#row A sd
df$row_A_sd<-rep(df[11,]$sd,nrow(df))
plot(as.numeric(df$X),df$mean,type="p",col="red")
lines(as.numeric(df$X),df$row_A_mean,col="green")
If we use a subset to define the data elements of the ggplot, we can come up with one solution using geom_hline:
theme_set(theme_bw())
ggplot(data = df[1:10,])+
geom_errorbar(aes(x = X, ymin = mean - se, ymax = mean + se))+
geom_point(aes(x = X, y = mean))+
geom_line(aes(x = X, y = mean), group = 1)+
geom_hline(data = df[11,], aes(yintercept = mean, colour = 'A'))+
geom_hline(data = df[12,], aes(yintercept = mean, colour = 'B'))
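One thing to watch with this kind of data is that X is a factor whose levels sort as text ("1", "10", "2", ...), so the x-axis may not come out in numeric order. A possible variation, sketched here on rebuilt toy data with illustrative names (dat, series, refs, ref_plot), converts the 1-10 series to numbers and selects the reference rows by label rather than by row position:

```r
library(ggplot2)

# Rebuilt toy data in the same shape as the question's df
dat <- data.frame(X = factor(c(1:10, "A", "B")),
                  mean = c(1:10, 5.5, 6.5),
                  se = 1)

is_ref <- dat$X %in% c("A", "B")

# Main series: convert factor labels to real numbers so 10 sorts after 9
series <- dat[!is_ref, ]
series$X <- as.numeric(as.character(series$X))

# Reference rows: keep the label as plain text for the colour legend
refs <- dat[is_ref, ]
refs$label <- as.character(refs$X)

ref_plot <- ggplot(series, aes(x = X, y = mean)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1) +
  geom_line() +
  geom_point() +
  geom_hline(data = refs, aes(yintercept = mean, colour = label)) +
  scale_x_continuous(breaks = 1:10)
ref_plot
```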
It's helpful to reorient your data into long form so that you can really utilize the aesthetic part of ggplot. Generally I would use reshape2::melt for this, but your data the way it's currently formatted doesn't really lend itself to it. I'll show you what I mean by long form and you can get the idea what we're shooting for:
#setting variables for your classes so it's a bit more scalable - reset as applicable
x.seriesLength <- 10
x.class.name <- "X" #name of the main series class; X in your example
a.vec <- c(5.5, 1, 1, "A")
b.vec <- c(6.5, 1, 1, "B")
#trimming df so we can reshape
df <- df[1:x.seriesLength, 2:4]
df$class <- x.class.name #adding class column
#converting your static A and B values to long form, sending to a data.frame and adding to df
add <- matrix(c(rep(a.vec, times = x.seriesLength),
rep(b.vec, times = x.seriesLength)),
byrow = TRUE,
ncol = 4)
colnames(add) <- c("mean", "sd", "se", "class")
df <- rbind(df, add)
print(df)
Then we need to do a bit more cleaning:
df$rownum <- rep(1:x.seriesLength, times = 3)
df[,1:3] <- sapply(df[,1:3], as.numeric) #casting as numeric
df$barmin <- df$mean - df$se #error bars use the standard error, as the question asks
df$barmax <- df$mean + df$se
Now we have a long form data frame with the required data. We can then use the new class column to plot and color multiple series.
#use class column to tell ggplot which points belong to which series
g <- ggplot(data = df) +
geom_point(aes(x = rownum, y = mean, color = class)) +
geom_errorbar(aes(x = rownum, ymin=barmin, ymax=barmax, color = class), width=.1)
g
Edit: If you want lines instead of points, just replace geom_point with geom_line.
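The same long-form table can also be assembled from data frames directly, which keeps the numeric columns numeric and avoids the later cast. This is only a sketch with hypothetical names (series, ref_rows, long, long_plot) and the arbitrary values from the question:

```r
library(ggplot2)

n_points <- 10

# Main series: one row per x position
series <- data.frame(rownum = 1:n_points, mean = 1:n_points, se = 1, class = "X")

# Hypothetical helper: repeat one reference value across every x position
ref_rows <- function(value, se, label) {
  data.frame(rownum = 1:n_points, mean = value, se = se, class = label)
}

long <- rbind(series, ref_rows(5.5, 1, "A"), ref_rows(6.5, 1, "B"))

long_plot <- ggplot(long, aes(x = rownum, y = mean, colour = class)) +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1)
long_plot
```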
I am trying to streamline a process by which I select and copy two columns from an Excel worksheet and import them into R, where I further subset them. Here is my issue:
The excel data has multiple sets of data in the same column. So for example: column 1 is [V,1,2,3,4,V,1,2,3,4] and column two is [A,2,4,6,10,A,3,6,9,12] where V and A are the column headers. I tried copying the two relevant columns, then running the following code in R:
testing<-read.clipboard(header=TRUE, sep=" ")
testinga<-testing[1:4,]
The resulting table looks fine, but when plotted in ggplot
ggplot(testing, aes(V,A))+geom_point()
the resulting graph orders my data points by their first digit (i.e. the 10 is plotted as a 1).
This is NOT an issue if I simply copy the first data set and import it using read.clipboard
What is going on here, and how do I get around it?
Edit:
# from dput()
testing <- structure(list(V = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), .Label = c("1", "2", "3", "4", "V"), class = "factor"), A = structure(c(3L, 5L, 6L, 1L, 8L, 4L, 6L, 7L, 2L), .Label = c("10", "12", "2", "3", "4", "6", "9", "A"), class = "factor")), .Names = c("V", "A"), class = "data.frame", row.names = c(NA, -9L))
Your problem is that the big data.frame's columns get converted to factors (not numerics) if there are things other than numbers in them, like more column names. You just need to convert back to numeric.
testinga <- testing[1:4, ]
testinga[] <- lapply(testinga, function(x) as.numeric(as.character(x))) # assigning into testinga[] keeps it a data frame for ggplot
Then you should be able to plot just fine.
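The detour through as.character matters because as.numeric on a factor returns the internal level codes rather than the printed values. A short sketch follows; the data frame raw is rebuilt by hand from the example columns in the question, and clean and set are illustrative names.

```r
# as.numeric() on a factor gives level codes, not the labels
f <- factor(c("2", "4", "6", "10"))
as.numeric(f)                # 2 3 4 1  (levels sort as "10", "2", "4", "6")
as.numeric(as.character(f))  # 2 4 6 10

# Both data sets at once: drop the repeated header row, then convert every column
raw <- data.frame(V = c("1", "2", "3", "4", "V", "1", "2", "3", "4"),
                  A = c("2", "4", "6", "10", "A", "3", "6", "9", "12"),
                  stringsAsFactors = TRUE)

clean <- raw[raw$V != "V", ]                                       # remove embedded header rows
clean[] <- lapply(clean, function(x) as.numeric(as.character(x)))  # stays a data frame
clean$set <- rep(1:2, each = 4)                                    # label which block each row came from
```

Keeping a set column makes it possible to colour or facet by data set instead of subsetting by row ranges.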