I have example data below:
eg_data <- data.frame(
period = c(sample( c("1 + 2"), 1000, replace = TRUE)),
max_sales = c(sample( c(1:10), 1000, replace = TRUE, prob =
c(.05, .10, .15, .25, .25, .10, .05, .02, .02, .01))))
I want to make a scatter (jitter, actually) plot and add horizontal lines at different points along the y-axis. I want to be able to customize the percentiles at which I add the lines, but for now, something like R's summary function would work just fine.
summary(eg_data$max_sales)
I have the code for a jitter plot below. It runs and produces the graph, but I keep getting the error message:
Each group consists of only one observation. Do you need to adjust the
group aesthetic?
jitter <- (
(ggplot(data = eg_data, aes(x=period, y=max_sales, group = 1)) +
geom_jitter(stat = "identity", width = .15, color = "blue", alpha = .4)) +
scale_y_continuous(breaks= seq(0,12, by=1)) +
geom_line(stat = 'summary', fun.y = "quantile", fun.args=list(probs=0.1)) +
ggtitle("Distribution of Sales by Period") + xlab("Period") + ylab("Sales") +
theme(plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(color = "black", size = 12, face = "bold"),
axis.title.y = element_text(color = "black", size = 12, face = "bold")) +
labs(fill = "Period") )
jitter
I tried looking at this question -
ggplot2 line chart gives "geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
It suggests making all variables numeric. My period variable is a character, I'd like to keep it that way, but even when I convert it to numeric, it still gives me the error.
Any help would be appreciated. Thank you!
Instead of geom_line what you want is geom_hline. In particular, replacing geom_line with
stat_summary(fun.y = "quantile", fun.args = list(probs = c(0.1, 0.2)),
geom = "hline", aes(yintercept = ..y..))
gives horizontal lines at the 10% and 20% quantiles, where indeed
quantile(eg_data$max_sales, c(0.1, 0.2))
# 10% 20%
# 2 3
It also eliminates the warning you were getting.
I don't know if this is the most elegant solution, but you can always calculate the summary statistics separately and add them to the plot. This also gives a bit more control over what is happening (for my taste).
hline_coordinates= data.frame(Quantile_Name=names(summary(eg_data$max_sales)),
quantile_values=as.numeric(summary(eg_data$max_sales)))
jitter <- (
(ggplot(data = eg_data, aes(x=period, y=max_sales)) + #removed group=1
geom_jitter(stat = "identity", width = .15, color = "blue", alpha = .4)) +
scale_y_continuous(breaks= seq(0,12, by=1)) +
geom_hline(data=hline_coordinates,aes(yintercept=quantile_values)) +
ggtitle("Distribution of Sales by Period") + xlab("Period") + ylab("Sales") +
theme(plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
axis.title.x = element_text(color = "black", size = 12, face = "bold"),
axis.title.y = element_text(color = "black", size = 12, face = "bold")) +
labs(fill = "Period") )
jitter
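The same approach extends to custom percentiles by swapping summary() for quantile(). A minimal sketch, in which the probs vector, the seed, the column names, and the linetype mapping are my own illustrative choices rather than part of the original code:

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
eg_data <- data.frame(
  period = rep("1 + 2", 1000),
  max_sales = sample(1:10, 1000, replace = TRUE,
                     prob = c(.05, .10, .15, .25, .25, .10, .05, .02, .02, .01))
)

# Hypothetical choice of percentiles; change this vector to move the lines
probs <- c(0.10, 0.50, 0.90)
q <- quantile(eg_data$max_sales, probs)

# One row per horizontal line: a label ("10%", ...) and its y position
hline_coordinates <- data.frame(
  quantile_name  = names(q),
  quantile_value = as.numeric(q)
)

p <- ggplot(eg_data, aes(x = period, y = max_sales)) +
  geom_jitter(width = .15, color = "blue", alpha = .4) +
  geom_hline(data = hline_coordinates,
             aes(yintercept = quantile_value, linetype = quantile_name))
p
```

Mapping linetype to the quantile name is optional; it only adds a legend that tells the lines apart.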
I would like to plot the congruence effects (incongruent minus congruent) as a violin plot per combination of stimulus age and response type. This is what my code looks like so far. I am not yet satisfied with the representation. How can I change it so that for each of the four conditions (adult frown, adult smile, child frown, child smile) I get the corresponding violin plot horizontally next to each other? Thanks in advance for the help. Attached is the code and an excerpt from the data frame.
violin plot
dataset$congruency_effect <- ifelse(dataset$congruency == "congruent", dataset$avgAmplitude, -dataset$avgAmplitude)
p <- ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect, fill = congruency_effect, group = stimulusAge)) +
geom_violin() +
geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary", fun.y = "mean") +
scale_fill_manual(values = c("#F8766D", "#00BFC4")) +
ggtitle("Conventional EEG 350-450 ms") +
scale_y_continuous(limits = c(-5, 5)) +
facet_wrap(~stimulusAge, scales = "free_x")
EEG_Conventional450_age_response <- p + theme(
# Set the plot title and axis labels to APA style
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(face = "bold", size = 14),
# Set the axis tick labels to APA style
axis.text = element_text(size = 12),
# Set the legend title and labels to APA style
legend.title = element_text(face = "bold", size = 14),
legend.text = element_text(size = 12),
# Set the plot and panel backgrounds to white
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "white")
)
EEG_Conventional450_age_response
excerpt data frame
I tried several permutations of the arguments in ggplot.
This has to do with the grouping aesthetic. Remove it, and your plot works.
library(ggplot2)
set.seed(42)
dataset <- data.frame(stimulusResponse = rep(c("frown", "smile"), each = 20),
congruency_effect = rnorm(40),
stimulusAge = rep(c("baby", "adult"), 20))
## removed group = stimulusAge
ggplot(dataset, aes(x = stimulusResponse, y = congruency_effect)) +
geom_violin() +
geom_point(position = position_dodge(width = 0.75), size = 3, stat = "summary") +
facet_wrap(~stimulusAge, scales = "free_x")
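If the four conditions should sit next to each other in a single panel rather than in facets, one option is to put the combination of both factors on the x axis. A minimal sketch on the same simulated data; the condition column is a hypothetical helper I introduce here, and the fun argument assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

set.seed(42)
dataset <- data.frame(stimulusResponse = rep(c("frown", "smile"), each = 20),
                      congruency_effect = rnorm(40),
                      stimulusAge = rep(c("baby", "adult"), 20))

# Hypothetical helper column: one level per age x response combination
dataset$condition <- interaction(dataset$stimulusAge,
                                 dataset$stimulusResponse, sep = " ")

p <- ggplot(dataset, aes(x = condition, y = congruency_effect,
                         fill = stimulusAge)) +
  geom_violin() +
  # Mean of each condition drawn as a point on top of its violin
  stat_summary(fun = mean, geom = "point", size = 3)
p
```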
I have a ggplot that displays significance with downward brackets, but my boss wants them to have tick marks going up and down (see image below). My boss also wants the "ns" to be capitalized. Is there a good way to control this without having to do it manually? I am making 10 plots total.
p <- ggplot(LessCountS, aes(x=Type, y=DC)) +
geom_dotplot(aes(fill= Type), binwidth = 4.5,
binaxis = "y",
stackdir = "center",
color = "black"
) +
theme_gray ()+
labs(x="", y = "Cell Count (cells/uL)")+
ggtitle("Dendritic cells") +
scale_x_discrete(labels=c("HC" = "Controls", "IN" = "Inpatients",
"OUT" = "Outpatients")) +
theme(plot.title = element_text(hjust = 0.5, size = 14), legend.text=element_text(size=12),
legend.position = "none", text = element_text(family = "Calibri"), axis.text = element_text(size=14),
axis.title = element_text(size = 14)) +
#scale_fill_brewer(palette="Purples") +
scale_fill_manual(values=c("#CCCCCC", "#990066", "#3366CC")) +
stat_summary(fun.y = median, fun.ymin = median, fun.ymax = median,
geom = "crossbar", width = 0.35) +
stat_compare_means(comparisons = my_comparisons, label.y = , label = "p.signif", size = 5)
p
I would like the plot to look more like this:
Notice the difference in the significance denotations. I want to make the dendritic plot look like the neutrophil plot.
Thanks in advance,
S
I am trying to generate a barplot with dual Y-axis and error bars. I have successfully generated a plot with error bars for one variable but I don't know how to add error bars for another one. My code looks like this. Thanks.
library(ggplot2)
#Data generation
Year <- c(2014, 2015, 2016)
Response <- c(1000, 1100, 1200)
Rate <- c(0.75, 0.42, 0.80)
sd1<- c(75, 100, 180)
sd2<- c(75, 100, 180)
df <- data.frame(Year, Response, Rate,sd1,sd2)
df
# The errorbars overlapped, so use position_dodge to move them horizontally
pd <- position_dodge(0.7) # dodge width of 0.7
png("test.png", units="in", family="Times", width=2, height=2.5, res=300) #pointsize is font size| increase image size to see the key
ggplot(df) +
geom_bar(aes(x=Year, y=Response),stat="identity", fill="tan1", colour="black")+
geom_errorbar(aes(x=Year, y=Response, ymin=Response-sd1, ymax=Response+sd1),
width=.2, # Width of the error bars
position=pd)+
geom_line(aes(x=Year, y=Rate*max(df$Response)),stat="identity",color = 'red', size = 2)+
geom_point(aes(x=Year, y=Rate*max(df$Response)),stat="identity",color = 'black',size = 3)+
scale_y_continuous(name = "Left Y axis", expand=c(0,0),limits = c(0, 1500),breaks = seq(0, 1500, by=500),sec.axis = sec_axis(~./max(df$Response),name = "Right Y axis"))+
theme(
axis.title.y = element_text(color = "black"),
axis.title.y.right = element_text(color = "blue"))+
theme(
axis.text=element_text(size=6, color = "black",family="Times"),
axis.title=element_text(size=7,face="bold", color = "black"),
plot.title = element_text(color="black", size=5, face="bold.italic",hjust = 0.5,margin=margin(b = 5, unit = "pt")))+
theme(axis.text.x = element_text(angle = 360, hjust = 0.5, vjust = 1.2,color = "black" ))+
theme(axis.line = element_line(size = 0.2, color = "black"),axis.ticks = element_line(colour = "black", size = 0.2))+
theme(axis.ticks.length = unit(0.04, "cm"))+
theme(plot.margin=unit(c(1,0.1,0.1,0.4),"mm"))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 4, b = 0, l = 0)))+
theme(axis.title.x = element_text(margin = margin(t = 0, r = 4, b = 2, l = 0)))+
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank())+
ggtitle("SRG3")+
theme(legend.position="top")+
theme( legend.text=element_text(size=4),
#legend.justification=c(2.5,1),
legend.key = element_rect(size = 1.5),
legend.key.size = unit(0.3, 'lines'),
legend.position=c(0.79, .8), #width and height
legend.direction = "horizontal",
legend.title=element_blank())
dev.off()
and my plot is as follows:
A suggestion for future questions: your example is far from being a minimal reproducible example. All the visuals and annotations are unrelated to your problem but make the code overly complex, which makes it harder for others to work with.
The following would be sufficient:
ggplot(df) +
geom_bar(aes(x = Year, y = Response),
stat = "identity", fill = "tan1",
colour = "black") +
geom_errorbar(aes(x = Year, ymin = Response - sd1, ymax = Response + sd1),
width = .2,
position = pd) +
geom_line(aes(x = Year, y = Rate * max(df$Response)),
color = 'red', size = 2) +
geom_point(aes(x = Year, y = Rate * max(df$Response)),
color = 'black', size = 3)
(Notice that I've removed stat = "identity" from geom_line() and geom_point() because it is their default; geom_bar() keeps it, since its default stat is "count". Furthermore, y is not a valid aesthetic for geom_errorbar(), so I omitted that, too.)
Assuming that the additional variable you would like to plot error bars for is Rate * max(df$Response) and that the relevant standard deviation is sd2, you may simply append
+ geom_errorbar(aes(x = Year, ymin = Rate * max(df$Response) - sd2,
ymax = Rate * max(df$Response) + sd2),
colour = "green",
width = .2)
to the code chunk above. This yields the output below.
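If the standard deviation of Rate is instead recorded in Rate units, it has to be rescaled by the same factor as the line itself. A minimal sketch of that case, assuming a hypothetical sd_rate column (not in the original data) and using geom_col() as shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

df <- data.frame(
  Year     = c(2014, 2015, 2016),
  Response = c(1000, 1100, 1200),
  Rate     = c(0.75, 0.42, 0.80),
  sd1      = c(75, 100, 180),
  sd_rate  = c(0.05, 0.08, 0.10)  # hypothetical: SD of Rate, in Rate units
)

# Factor that maps the secondary (Rate) axis onto the primary (Response) axis
k <- max(df$Response)

p <- ggplot(df, aes(x = Year)) +
  geom_col(aes(y = Response), fill = "tan1", colour = "black") +
  geom_errorbar(aes(ymin = Response - sd1, ymax = Response + sd1),
                width = .2) +
  geom_line(aes(y = Rate * k), colour = "red") +
  geom_point(aes(y = Rate * k)) +
  # Scale both ends of the Rate interval, not just the centre
  geom_errorbar(aes(ymin = (Rate - sd_rate) * k, ymax = (Rate + sd_rate) * k),
                colour = "green", width = .2) +
  # The secondary axis undoes the scaling so its labels read in Rate units
  scale_y_continuous(name = "Left Y axis",
                     sec.axis = sec_axis(~ . / k, name = "Right Y axis"))
p
```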
I'm trying to plot a 2D density plot with ggplot, with added marginal histograms. Problem is that the polygon rendering is stupid and needs to be given extra padding to render values outside your axis limits (e.g. in this case I set limits between 0 and 1, because values outside this range have no physical meaning). I still want the density estimate though, because often it's much cleaner than a blocky 2D heatmap.
Is there a way around this problem, besides scrapping ggMarginal entirely and spending another 50 lines of code trying to align histograms?
Unsightly lines:
Now rendering works, but ggMarginal ignores coord_cartesian(), which demolishes the plot:
Data here:
http://pasted.co/b581605a
dataset <- read.csv("~/Desktop/dataset.csv")
library(ggplot2)
library(ggthemes)
library(ggExtra)
plot_center <- ggplot(data = dataset, aes(x = E,
y = S)) +
stat_density2d(aes(fill=..level..),
bins= 8,
geom="polygon",
col = "black",
alpha = 0.5) +
scale_fill_continuous(low = "yellow",
high = "red") +
scale_x_continuous(limits = c(-1,2)) + # Render padding for polygon
scale_y_continuous(limits = c(-1,2)) + #
coord_cartesian(ylim = c(0, 1),
xlim = c(0, 1)) +
theme_tufte(base_size = 15, base_family = "Roboto") +
theme(axis.text = element_text(color = "black"),
panel.border = element_rect(colour = "black", fill=NA, size=1),
legend.text = element_text(size = 12, family = "Roboto"),
legend.title = element_blank(),
legend.position = "none")
ggMarginal(plot_center,
type = "histogram",
col = "black",
fill = "orange",
margins = "both")
You can solve this problem by using xlim() and ylim() instead of coord_cartesian.
dataset <- read.csv("~/Desktop/dataset.csv")
library(ggplot2)
library(ggthemes)
library(ggExtra)
plot_center <- ggplot(data = dataset, aes(x = E,
y = S)) +
stat_density2d(aes(fill=..level..),
bins= 8,
geom="polygon",
col = "black",
alpha = 0.5) +
scale_fill_continuous(low = "yellow",
high = "red") +
scale_x_continuous(limits = c(-1,2)) + # Replaced by xlim() below (ggplot2 keeps only the last x scale)
scale_y_continuous(limits = c(-1,2)) + # Replaced by ylim() below
xlim(c(0,1)) +
ylim(c(0,1)) +
theme_tufte(base_size = 15, base_family = "Roboto") +
theme(axis.text = element_text(color = "black"),
panel.border = element_rect(colour = "black", fill=NA, size=1),
legend.text = element_text(size = 12, family = "Roboto"),
legend.title = element_blank(),
legend.position = "none")
ggMarginal(plot_center,
type = "histogram",
col = "black",
fill = "orange",
margins = "both")
I have a continuous variable on the y axis and a categorical variable on the x axis. The order of the categorical variable is meaningful, so it would make sense to fit a regression against its index: instead of c('a', 'b', 'c'), use the indices (order(c('a', 'b', 'c')), which is c(1, 2, 3)) and fit the model against those. However, ggplot refuses to fit a geom_smooth(method = lm) if one variable is not numeric. OK, then I can tell it to use the order:
geom_smooth(aes(x = order(hgcc), y = rtmean), method = lm)
But then it takes the indices from the whole column of the data frame, which does not work well when faceting with scales = 'free', where only a subset of the levels of the x variable appears in each panel. The indices in the whole data frame are much higher on average, so the regression is plotted far to the right:
Here is a minimal working example:
require(ggplot2)
load(url('http://www.ebi.ac.uk/~denes/54b510889336eb2591d8beff/sample_data.RData'))
ggplot(adata12cc, aes(x = hgcc, y = rtmean, color = cls, size = log10(intensity))) +
geom_point(stat = 'sum', alpha = 0.33) +
geom_smooth(
aes(x = order(hgcc), y = rtmean),
method = 'glm') +
facet_wrap( ~ uhgroup, scales = 'free') +
scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
scale_color_discrete(guide = guide_legend(title = 'Class')) +
xlab('Carbon count unsaturation') +
ylab('Mean RT [min]') +
ggtitle('RT vs. carbon count & unsaturation by headgroup') +
theme(axis.title = element_text(size = 24),
axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
axis.text.y = element_text(size = 11),
plot.title = element_text(size = 21),
strip.text = element_text(size = 18),
panel.grid.minor.x = element_blank())
I know this is not the nicest way of doing things, but ggplot would make life so much easier if I could refer to the variables as they are subsetted by faceting and do something with them.
I think I've got a solution, but I'm not sure what you want...
The main problem is that your x-value labels are already split by uhgroup.
If you look at the factor levels, they are PC-O(38.7), PC(38.7), etc.
So the first thing is to create a new hgcc value for the x axis:
adata12cc$hgcc_value <-as.factor(substr(adata12cc$hgcc, (nchar(levels(adata12cc$hgcc)[adata12cc$hgcc])-5), nchar(levels(adata12cc$hgcc)[adata12cc$hgcc])))
Then another problem is that you have different x axes for geom_point and geom_smooth: one is hgcc, the other is order(hgcc_value).
The solution is to use the same value for both. Here I use as.numeric(hgcc_value) (instead of order()) and specify the break labels in scale_x_continuous.
ggplot(adata12cc, aes(x = as.numeric(hgcc_value), y = rtmean, color = cls, size = log10(intensity))) +
geom_point(stat = 'sum', alpha = 0.33) +
geom_smooth(
aes(x = as.numeric(hgcc_value), y = rtmean),
method = 'glm') +
facet_wrap( ~ uhgroup, scales = 'free') +
scale_radius(guide = guide_legend(title = 'Intensity (log)')) +
scale_color_discrete(guide = guide_legend(title = 'Class')) +
scale_x_continuous(name = "Carbon count unsaturation",
breaks=as.numeric(adata12cc$hgcc_value),
labels = adata12cc$hgcc_value,
minor_breaks = NULL)+
ylab('Mean RT [min]') +
ggtitle('RT vs. carbon count & unsaturation by headgroup') +
theme(axis.title = element_text(size = 24),
axis.text.x = element_text(angle = 90, vjust = 0.5, size = 9, hjust = 1),
axis.text.y = element_text(size = 11),
plot.title = element_text(size = 21),
strip.text = element_text(size = 18),
panel.grid.minor.x = element_blank())
Is this what you were looking for?
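To illustrate why as.numeric() on a factor is used here rather than order(), a small sketch with made-up data (the vector x is purely hypothetical): order() returns the permutation that would sort the vector, whereas as.numeric() returns one level code per observation, which is what a regression on the category index needs.

```r
# Made-up factor; levels are sorted alphabetically: a = 1, b = 2, c = 3
x <- factor(c("b", "a", "c", "a"))

# Positions of the elements in sorted order, not a code per observation
order(x)
# [1] 2 4 1 3

# Integer level code of each observation, aligned with the data rows
as.numeric(x)
# [1] 2 1 3 1
```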