I am trying to plot a histogram of a large data set (>50,000 data points). There is a very clear distribution: an initial peak followed by a second, smaller peak. However, when I log-scale the y-axis, the first peak appears smaller than the second, even though it should remain the higher value after the log transform. FYI, I will be viewing my data zoomed in; I have only zoomed out here to demonstrate the effect better.
ggplot(sman,aes(x=V1, fill = V3)) +
geom_histogram(color ="#000000", center=100,binwidth = 100)+
scale_fill_manual(values = met.brewer("NewKingdom",4)) +
xlab("Feature Length (bp)") + ylab("Frequency") +
theme(axis.text = element_text(size=14)) + theme(axis.title = element_text(size=20)) +
theme(legend.text = element_text(face="italic")) +
theme(strip.text = element_text(size =18, face="italic")) + labs(fill="Species") +
theme(axis.text = element_text(size=18), axis.title = element_text(size=22, face="bold"))
ggplot(sman,aes(x=V1, fill = V3)) +
geom_histogram(color ="#000000", center=100,binwidth = 100)+
scale_fill_manual(values = met.brewer("NewKingdom",4)) +
xlab("Feature Length (bp)") + ylab("Frequency") + scale_y_continuous(trans="log10") +
theme(axis.text = element_text(size=14)) + theme(axis.title = element_text(size=20)) +
theme(legend.text = element_text(face="italic")) +
theme(strip.text = element_text(size =18, face="italic")) + labs(fill="Species") +
theme(axis.text = element_text(size=18), axis.title = element_text(size=22, face="bold"))
The data is formatted as follows. Note that there are no 0 values in the dataset (I added 1 to V1 to transform them):
V1 V2 V3
1 S. mansoni TE
16 S. mansoni noTE
etc..
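One possible explanation, offered as a sketch rather than a confirmed diagnosis: geom_histogram() stacks the fill groups by default, and with a log10-transformed y scale the counts are transformed before the bars are stacked, so bar heights add in log space. The counts below are made up to show the arithmetic, and the plot uses simulated data (not the sman data set) to draw one unstacked line per group with geom_freqpoly() instead.

```r
library(ggplot2)

# Hypothetical counts for two fill groups that fall in the same bin
counts <- c(TE = 1000, noTE = 10)

true_total     <- sum(counts)         # what a stacked bar should show
stacked_log    <- sum(log10(counts))  # top of the bar when stacking in log10 space
apparent_total <- 10^stacked_log      # value read off the log axis

# Simulated stand-in for the real data (two peaks, values shifted by +1)
set.seed(1)
toy <- data.frame(
  V1 = c(rexp(5000, 1 / 300), rnorm(2000, mean = 2500, sd = 300)) + 1,
  V3 = rep(c("TE", "noTE"), times = c(5000, 2000))
)

# One line per group, so nothing is stacked on the log axis
p_unstacked <- ggplot(toy, aes(x = V1, colour = V3)) +
  geom_freqpoly(binwidth = 100, center = 100) +
  scale_y_continuous(trans = "log10") +
  xlab("Feature Length (bp)") + ylab("Frequency")
```

In this toy case a bin holding 1000 and 10 observations would read as 10000 on the log axis rather than 1010, which could inflate bins where several groups overlap. Faceting by V3 is another way to avoid stacking.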
I have created an R visualisation in Power BI and would like it to have only one grid line, with the horizontal axis crossing the vertical axis at the value 1.
I am not sure I have explained this well in words, so please see the screenshots below for a better idea of what I want to achieve.
Any help is greatly appreciated.
The first screenshot is from Excel, where I was able to do it; I want to replicate the same thing in the R chart (second screenshot).
library(ggplot2)
ggplot(unique(dataset), aes(x = reorder(Condition, Rate), y = Rate)) +
labs(x = "Condition")+
geom_point(size = 5, stroke = 0, shape = 18, colour="brown") +
geom_point() + geom_line() +
geom_errorbar(aes(ymin = LL, ymax = UL), width=.2, position=position_dodge(.9), colour="brown", alpha=0.6, size=.7) +
theme_bw()+
theme(panel.grid.major = element_blank()) +
theme(axis.text.x = element_text(angle=90, hjust = 1))+
theme(axis.text.x = element_text(size = 10))
Let p be your original ggplot object.
Step 1: remove the original x-axis
p + theme(axis.line.x = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()) +
labs(x = '') -> p1
Step 2: add a horizontal line at y = 1
p1 + geom_hline(yintercept = 1, color = "black")
As you have not provided any data, I am using the iris dataset. You can use the following code:
library(ggplot2)
ggplot(unique(iris), aes(x = Species, y = Petal.Width)) +
labs(x = "Condition")+
geom_point(size = 5, stroke = 0, shape = 18, colour="brown") +
geom_point() + geom_line() +
theme_bw()+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
theme(axis.text.x = element_text(angle=90, hjust = 1))+
theme(axis.text.x = element_text(size = 10)) +
geom_abline(slope=0, intercept=1, col = "darkblue",lty=1,size = 0.5)
I am making a plot in ggplot, and when I add the geom_line() layer it includes 2 lines instead of one. Can anyone help me understand why it's doing this?
Code:
library(ggplot2)
a <- data.frame(SubjectId=c(1:3, 1:3, 1:3, 1:3),
Cycle=c(1,1.1,1.2, 2,2.1,2.2, 3,3.1,3.2, 4,4.1,4.2),
Dose=c(sort(rep(1:3,3)), 3,3,3),
DLT=c("No","No","Yes","No","No","No","No","Yes","Yes","No","Yes","Yes"))
ggplot(aes(x=Cycle, y=Dose, fill=DLT), data = a) +
scale_fill_manual(values = c("white", "black")) +
geom_line(colour="grey20", size=1) +
geom_point(shape=21, size=5) +
xlim(1, 4.5) +
ylim(1, 4) +
ylab("Dose Level") +
theme_classic() +
theme(axis.text =element_text(size=10),
axis.title =element_text(size=12, face="bold", colour="grey20"),
legend.text =element_text(size=10),
legend.title=element_text(size=12, face="bold", colour="grey20"))
I just want one line to go through the points in order of Cycle, but sorting a by Cycle doesn't change the line at all. What am I doing wrong?
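Put fill=DLT in the geom_point() layer, not in the top-level ggplot() call. A discrete aesthetic mapped at the top level is inherited by every layer and used for grouping, so geom_line() draws one line per DLT value. E.g.: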
ggplot(aes(x=Cycle, y=Dose), data = a) +
scale_fill_manual(values = c("white", "darkred")) +
geom_line(colour="grey20", size=1) +
geom_point(shape=23, size=5, aes(fill=DLT)) +
xlim(1, 4.5) +
ylim(1, 4) +
ylab("Dose Level") +
theme_classic() +
theme(axis.text =element_text(size=10),
axis.title =element_text(size=12, face="bold", colour="grey20"),
legend.text =element_text(size=10),
legend.title=element_text(size=12, face="bold", colour="grey20"))
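An alternative sketch, assuming you would rather keep fill=DLT in the top-level aes(): override the grouping for the line layer only with group = 1, so that all points are joined by a single line. This reuses the data frame a from the question; the names p and line_groups are only for illustration.

```r
library(ggplot2)

a <- data.frame(SubjectId = c(1:3, 1:3, 1:3, 1:3),
                Cycle = c(1, 1.1, 1.2, 2, 2.1, 2.2, 3, 3.1, 3.2, 4, 4.1, 4.2),
                Dose = c(sort(rep(1:3, 3)), 3, 3, 3),
                DLT = c("No", "No", "Yes", "No", "No", "No",
                        "No", "Yes", "Yes", "No", "Yes", "Yes"))

p <- ggplot(a, aes(x = Cycle, y = Dose, fill = DLT)) +
  scale_fill_manual(values = c("white", "black")) +
  # group = 1 puts every row in the same group for this layer only
  geom_line(aes(group = 1), colour = "grey20") +
  geom_point(shape = 21, size = 5)

# Groups actually used by the line layer (layer 1)
line_groups <- unique(layer_data(p, 1)$group)
```

The points still pick up fill=DLT from the top-level mapping, while the line layer sees a single group.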
This is my small script for making a batch of plots with ggplot:
by(database_per_grafici, database_per_grafici$variable, function(i){
# build the plot first, then save it explicitly (labels = percent needs the scales package)
p <- ggplot(subset(i, !is.na(value)))+
stat_ecdf(aes(x=value, color=gruppo), size=2)+
scale_y_continuous(labels = percent) +
theme_bw() +
theme(panel.grid.major = element_line(colour = "black", size= 0.3))+
theme(legend.text = element_text(size = 60)) +
theme(legend.justification=c(1,0), legend.position=c(1,0))+
theme(legend.title=element_blank()) +
theme(axis.text.x = element_text(size=16)) +
theme(axis.title.x = element_text(size=20))+
xlab("mg/kg")+
theme(axis.text.y = element_text(size=16)) +
theme(axis.title.y = element_blank()) +
guides(colour = guide_legend(override.aes = list(size=10)))
# width, height and units are arguments of ggsave(), not of sprintf()
ggsave(sprintf("%s.png", unique(i$variable)), plot = p, width = 30,
height = 20, units = "cm")
})
However, there is a problem with the legend: in some plots it overlays the plot itself.
Is there a way to set the legend position according to the dataframe values?
Thanks
I need to gather two facet columns into one column with ggplot2.
In the following example, I need to overlay the content of the two columns DEG and RAN into one, while giving different colours to the DEG and RAN data (small points and smooth line) and providing the corresponding legend (so I can distinguish them once they are overlaid).
I feel my code is not too far from what I need, but the relative complexity of the dataset blocks me. How can I achieve this in ggplot2?
Here's my code so far:
require(reshape2)
library(ggplot2)
library(RColorBrewer)
fileName = paste("./4.csv", sep = "") # csv file available here: https://www.dropbox.com/s/bm9hd0t5ak74k89/4.csv?dl=0
mydata = read.csv(fileName,sep=",", header=TRUE)
dataM = melt(mydata,c("id"))
dataM = cbind(dataM,colsplit(dataM$variable,pattern = "_",names = c("NM", "ORD", "CAT")))
dataM$variable <- NULL
dataM <- dcast(dataM, ... ~ CAT, value.var = "value")
my_palette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
ggplot(dataM, aes(x=NR ,y= ASPL)) +
geom_point(size = .4,alpha = .5) +
stat_smooth(se = FALSE, size = .5) +
theme_bw() +
theme(plot.background = element_blank(),
axis.line = element_blank(),
legend.key = element_blank(),
legend.title = element_blank()) +
scale_y_continuous("ASPL", expand=c(0,0), limits = c(1, 7)) +
scale_x_continuous("NR", expand=c(0,0), limits = c(0, 100)) +
theme(legend.position="bottom") +
theme(axis.title.x = element_text(vjust=-0.3, face="bold", size=12)) +
theme(axis.title.y = element_text(vjust=1.5, face="bold", size=12)) +
ggtitle("Title") + theme(plot.title = element_text(lineheight=.8, face="bold")) +
theme(title = element_text(vjust=2)) +
facet_grid(NM ~ ORD)
Here's what it gives me right now:
Extra question: how come DEG/SF doesn't show a smooth line?
You can use the group aesthetic to define that data points with the same value of ORD belong together. You can also map the shape and color aesthetics to this variable. In facet_grid(), you can use . to specify that the facets are not split along a specific dimension.
I have made the changes to your code below after transforming NR and ASPL to numeric variables:
dataM$NR <- as.integer(dataM$NR)
dataM$ASPL <- as.numeric(dataM$ASPL)
ggplot(dataM, aes(x=NR ,y= ASPL, group=ORD, color=ORD)) +
geom_point(size = .7,alpha = .5, aes(shape=ORD)) + ## increased size
stat_smooth(se = FALSE, size = .5) +
theme_bw() +
theme(plot.background = element_blank(),
axis.line = element_blank(),
legend.key = element_blank(),
legend.title = element_blank()) +
scale_y_continuous("ASPL", expand=c(0,0), limits = c(1, 7)) +
scale_x_continuous("NR", expand=c(0,0), limits = c(0, 100)) +
theme(legend.position="bottom") +
theme(axis.title.x = element_text(vjust=-0.3, face="bold", size=12)) +
theme(axis.title.y = element_text(vjust=1.5, face="bold", size=12)) +
ggtitle("Title") + theme(plot.title = element_text(lineheight=.8, face="bold")) +
theme(title = element_text(vjust=2)) +
facet_grid(NM ~.)
I made a ton of figures with ggplot 0.8.9 (which is what I'm still running). Now I need to modify these figures to include legends. I am running into all sorts of problems that are hard to solve because I am getting really confused about theme and opts, and many SO answers apply to later versions.
At this point, it seems like I need to update ggplot2 and rewrite all of my code just so I can have legends on my figures. Is this true? I've read the ggplot2 transition guide, and it makes it seem true.
Here's what the old code looks like (it does not produce a legend). The data needed to reproduce it are mean10v2 and stderr10.
me10<-read.table("mean10v2.txt", header=TRUE)
se10<-read.table("stderr10.txt", header=TRUE)
ggplot() +
geom_ribbon(aes(x = me10[me10$trt=="CC", "tu"], ymin=(me10[me10$trt=="CC", "biomassA"]-
se10[se10$trt=="CC", "biomassA"]), ymax=(me10[me10$trt=="CC",
"biomassA"]+se10[se10$trt=="CC", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="CC", "tu"], y=me10[me10$trt=="CC", "biomassA"]), size=1)+
geom_ribbon(aes(x = me10[me10$trt=="PF", "tu"], ymin=(me10[me10$trt=="PF", "biomassA"]-
se10[se10$trt=="PF", "biomassA"]), ymax=(me10[me10$trt=="PF",
"biomassA"]+se10[se10$trt=="PF", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="PF", "tu"], y=me10[me10$trt=="PF", "biomassA"]),
colour="red2", linetype="dashed", size=1) +
geom_ribbon(aes(x = me10[me10$trt=="P", "tu"], ymin=(me10[me10$trt=="P", "biomassA"]-
se10[se10$trt=="P", "biomassA"]), ymax=(me10[me10$trt=="P",
"biomassA"]+se10[se10$trt=="P", "biomassA"])), alpha=0.25) +
geom_line(aes(me10[me10$trt=="P", "tu"], y=me10[me10$trt=="P", "biomassA"]),
colour="blue3", linetype="dotted", size=1) +
opts(panel.grid.minor = theme_blank()) +
opts(panel.grid.major = theme_blank()) +
opts(panel.background = theme_blank()) +
opts(axis.line = theme_segment()) +
opts(legend.position=c(.5,.5)) +
opts(axis.title.x = theme_text(size=12,vjust=-0.5)) +
opts(axis.title.y = theme_text(size=12,angle=90)) +
opts(axis.text.x = theme_text(colour="black", size=16)) +
opts(axis.text.y = theme_text(colour="black", size=16)) +
annotate("text", x = -Inf, y = Inf, label = "a", face="bold", hjust = -5, vjust=2, size = 9) +
ylab("") +
xlab("") +
ylim(0,2200)
Updating the theme parts is actually quite simple. You really just need to change opts() to theme() and replace theme_* with element_*. Some other names have changed, like you'll use element_line instead of theme_segment.
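As a minimal sketch of that renaming (using the built-in mtcars data purely as a stand-in, and the illustrative name p_theme), the old calls are shown as comments next to their current equivalents:

```r
library(ggplot2)

# Old style (ggplot2 0.8.9), shown only for comparison:
#   opts(panel.grid.minor = theme_blank()) +
#   opts(axis.line = theme_segment()) +
#   opts(axis.title.x = theme_text(size = 12, vjust = -0.5))

# Current style: a single theme() call with element_*() functions
p_theme <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme(panel.grid.minor = element_blank(),
        axis.line = element_line(),
        axis.title.x = element_text(size = 12, vjust = -0.5))
```

The setting names themselves (panel.grid.minor, axis.line, axis.title.x) carry over unchanged in this example; only the wrapper and the element constructors differ.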
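But more generally, you're using ggplot all wrong. It is designed to work from a single data frame, with aesthetics mapped to its columns rather than to vectors subset inside aes():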
my_df <- me10[,c('trt','tu','biomassA')]
my_se <- setNames(se10[,c('trt','tu','biomassA')],c('trt','tu','se'))
my_df <- merge(my_df,my_se)
ggplot(data = my_df,aes(x = tu,y = biomassA)) +
geom_ribbon(aes(group = trt,ymin = biomassA - se,ymax = biomassA + se),alpha = 0.25) +
geom_line(aes(group = trt,linetype = trt,colour = trt)) +
labs(x = "",y = "") +
ylim(0,2200) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(),
legend.position=c(.5,.5),
axis.title.x = element_text(size=12,vjust=-0.5),
axis.title.y = element_text(size=12,angle=90),
axis.text.x = element_text(colour="black", size=16),
axis.text.y = element_text(colour="black", size=16)) +
annotate("text", x = -Inf, y = Inf, label = "a", fontface="bold", hjust = -5, vjust=2, size = 9)
Notice how much cleaner that is, and putting the data into an appropriate form only took three lines. Also note that there is absolutely no need to keep repeating the opts() or theme() calls for every....single....thing...you....set.
And then if you want to choose specific colors/linetypes for each group, you do that using the scale functions, not by setting them individually:
+ scale_colour_manual(values = c('black','red2','blue3')) +
scale_linetype_manual(values = c('solid','dashed','dotted'))
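As a self-contained sketch of that idea (the toy data frame is made up; only the treatment names CC, PF and P come from the question), naming the values vector ties each colour and linetype to a specific group. With an unnamed vector, the values are assigned in the order of the factor levels, which is alphabetical by default (CC, P, PF).

```r
library(ggplot2)

# Made-up data in the same long format as my_df
toy <- data.frame(
  trt = rep(c("CC", "PF", "P"), each = 4),
  tu = rep(1:4, times = 3),
  biomassA = c(100, 400, 900, 1500, 80, 300, 700, 1200, 60, 250, 600, 1000)
)

p_scales <- ggplot(toy, aes(x = tu, y = biomassA, group = trt)) +
  geom_line(aes(colour = trt, linetype = trt)) +
  # named vectors: each treatment gets exactly the value listed for it
  scale_colour_manual(values = c(CC = "black", PF = "red2", P = "blue3")) +
  scale_linetype_manual(values = c(CC = "solid", PF = "dashed", P = "dotted"))

# Colours actually assigned to the line layer
line_colours <- unique(layer_data(p_scales, 1)$colour)
```

Because the mapping is done through scales, the legend is generated automatically from the same trt column.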