I have the following dataframe and I am using ggplot to plot the ind vs values.
ggplot(data=stats,aes(x=ind,y=values,fill=ind))+geom_bar(stat="identity")+coord_flip()+scale_fill_brewer()
stats
values ind
1 238970950 testdb_i
2 130251496 testdb_b
3 314350612 testdb_s
4 234212341 testdb_m
5 222281421 testdb_e
6 183681071 testdb_if
7 491868567 testdb_l
8 372612463 testdb_p
The y-axis labels are shown as 0e+00, 1e+08, 2e+08 and so on, but I need them in the form 100M (one hundred million), 200M (two hundred million), etc. How can I get the desired axis labels in ggplot?
You may try
ggplot(data=stats,aes(x=ind,y=values,fill=ind))+
geom_bar(stat="identity")+
coord_flip()+
scale_fill_brewer()+
scale_y_continuous(labels=function(x) paste0(x/1e6,"M"))
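To check what the label function produces before wiring it into the plot, here is a minimal sketch in base R. The name label_millions and the sample break values are my own, chosen for illustration; the function body is the same expression as in the answer above.

```r
# Hypothetical helper name; same expression as the anonymous function above.
label_millions <- function(x) paste0(x / 1e6, "M")

# Example break values in the range of this data (assumed, not taken from ggplot).
breaks <- c(0, 1e8, 2e8, 3e8, 4e8, 5e8)
labels <- label_millions(breaks)
print(labels)
```

Passing label_millions as labels= to scale_y_continuous() should give the same axis as the anonymous function. If you have the scales package installed, I believe label_number(scale = 1e-6, suffix = "M") is another option worth checking.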
I am currently stuck on formatting a grouped bar chart.
I have a dataframe, which I would like to visualize:
iteration position value
1 1 eEP_SRO 20346
2 1 eEP_drift 22410
3 1 eEP_hole 29626
4 2 eEP_SRO 35884
5 2 eEP_drift 39424
6 2 eEP_hole 51491
7 3 eEP_SRO 51516
8 3 eEP_drift 55523
9 3 eEP_hole 74403
The position should be shown as color and the value should be represented in the height of the bar.
My code is:
fig <- ggplot(df_eEP_Location_plot, aes(fill=position, y=value, x=iteration, order=position)) +
geom_bar(stat="identity")
which gives me this result:
I would like to have a correct y-axis labelling and would also like to sort my bars from largest to smallest (ignoring the iteration number). How can I achieve this?
Thank you very much for your help!
I would recommend using fct_reorder from the forcats package to reorder your iterations along the specified values prior to plotting in ggplot. See the following with the sample data you've provided:
library(ggplot2)
library(forcats)
iteration <- factor(c(1,1,1,2,2,2,3,3,3))
position <- factor(rep(c("eEP_SRO","eEP_drift","eEP_hole"), times = 3))
value <- c(20346,22410,29626,35884,39424,51491,51516,55523,74403)
df_eEP_Location_plot <- data.frame(iteration, position, value)
df_eEP_Location_plot$iteration <- fct_reorder(df_eEP_Location_plot$iteration,
-df_eEP_Location_plot$value)
fig <- ggplot(df_eEP_Location_plot, aes(y=value, x=iteration, fill=position)) +
geom_bar(stat="identity")
fig
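To see which order this produces without drawing anything, the sketch below does the equivalent in base R. It assumes that fct_reorder() summarises each level with the median (its default) and mirrors that with reorder(..., FUN = median); the variable names med and reordered are my own.

```r
# Sample data from the question.
iteration <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))
value <- c(20346, 22410, 29626, 35884, 39424, 51491, 51516, 55523, 74403)

# One summary value per iteration (the median of its three values).
med <- tapply(value, iteration, median)

# Base R counterpart: order the levels by the median of -value,
# so the iteration with the largest values comes first.
reordered <- reorder(iteration, -value, FUN = median)
print(med)
print(levels(reordered))
```

Note that this reorders whole iterations (the stacked bars), not the individual position segments inside each bar.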
I am trying the following plotting.
I have this data set:
Pathway Value Col.Code
AKTSig 1 r
HRAS 2 r
Lbind 3 h
GPCRact 4 r
ACHsig 5 h
ACEest -2 r
MRNAspl -3 h
Notch -4 h
Delta -5 r
Sonic -6 r
I would like to plot a graph that has these columns with pathway along the x axis, value up the y axis and the columns coloured by the Col.Code column. I have tried geom_col() from ggplot2 but this always rearranges the columns into a random order i.e. not highest value to most negative. I have also tried geom_bar() but this creates counts for the pathways and doesn't plot what I have described above.
You can use this:
library(ggplot2)
# reorder() comes with base R (stats), so no extra package is needed for it
ggplot(data,aes(x=reorder(Pathway,-Value),y=Value,fill=Col.Code))+geom_bar(stat='identity')
One other approach is with fct_reorder from the forcats package:
library(forcats)
ggplot(data,aes(x=fct_reorder(Pathway,-Value),y=Value,fill=Col.Code)) +
geom_bar(stat='identity') +
labs(x = "Pathway")
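The order in the question is not really random: a character column is turned into a factor whose levels are sorted alphabetically, and the bars follow the levels. A small sketch of what reorder() changes, with the question's data typed in by hand (default_levels and sorted_levels are names I made up):

```r
# Data from the question, typed in by hand.
data <- data.frame(
  Pathway  = c("AKTSig", "HRAS", "Lbind", "GPCRact", "ACHsig",
               "ACEest", "MRNAspl", "Notch", "Delta", "Sonic"),
  Value    = c(1, 2, 3, 4, 5, -2, -3, -4, -5, -6),
  Col.Code = c("r", "r", "h", "r", "h", "r", "h", "h", "r", "r")
)

# Without reordering, the levels (and so the bars) are alphabetical.
default_levels <- levels(factor(data$Pathway))

# reorder() sorts the levels by -Value, i.e. highest value first.
sorted_levels <- levels(reorder(data$Pathway, -data$Value))
print(default_levels)
print(sorted_levels)
```

Since each pathway appears only once here, the summary function that reorder() applies per level (the mean by default) does not matter.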
I want to make a simple histogram which involves two vectors:
values <- c(1,2,3,4,5,6,7,8)
freq <- c(4,6,4,4,3,2,1,1)
df <- data.frame(values,freq)
Now the data.frame df contains the following values:
values freq
1 4
2 6
3 4
4 4
5 3
6 2
7 1
8 1
Now I want to draw a simple histogram, in which values are on the x axis and freq is on y axis. I am trying to use the hist function, but I am not able to give two variables. How can I make a simple histogram from this data?
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = values, y = freq)) +
geom_bar(stat="identity")
Since you have the frequencies already, what you really want is a bar plot:
barplot(df$freq,names.arg=df$values)
If you've got your heart set on using hist, you should do:
hist(rep(df$values,df$freq))
Please read ?barplot and ?hist for further plotting options.
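To make the rep() trick less magical: it expands the frequency table back into raw observations, which is the input hist() expects. A minimal sketch follows; the explicit breaks are my own addition, on the assumption that you want one bin per value (with the default breaks, 1 and 2 can end up in the same bin).

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8)
freq   <- c(4, 6, 4, 4, 3, 2, 1, 1)

# Repeat each value as many times as its frequency says.
expanded <- rep(values, freq)
print(length(expanded))

# Tabulating the expanded vector recovers the original frequencies.
recovered <- as.vector(table(expanded))
print(recovered)

# One bin per value: breaks halfway between the integers (my assumption).
h <- hist(expanded, breaks = seq(0.5, 8.5, by = 1), plot = FALSE)
print(h$counts)
```

Drop plot = FALSE to actually draw the histogram.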
Also, because I'm somewhat of a zealot, I think the code looks cleaner if you use data.table:
library(data.table)
setDT(df) #convert df to a data.table by reference
df[,barplot(freq,names.arg=values)]
and
df[,hist(rep(values,freq))]
I am looking to scale the x axis on my barplot to time, so as to accurately represent when measurements were taken.
I have these data frames:
> Botcv
Date Average SE
1 2014-09-01 4.0 1.711307
2 2014-10-02 5.5 1.500000
> Botc1
Date Average SE
1 2014-10-15 2.125 0.7180703
2 2014-11-12 1.000 0.4629100
3 2014-12-11 0.500 0.2672612
> Botc2
Date Average SE
1 2014-10-15 3.375 1.3354708
2 2014-11-12 1.750 0.4531635
3 2014-12-11 0.625 0.1829813
I use this code to produce a grouped barplot:
covaverage <- c(Botcv$Average,NA,NA,NA)
c1average <- c(NA,NA, Botc1$Average)
c2average <- c(NA,NA, Botc2$Average)
date <- c(Botcv$Date, Botc1$Date)
averagematrix <- matrix(c(covaverage,c1average, c2average), nrow=3, ncol=5, byrow=TRUE)
barplot(averagematrix,date, xlab="Date", ylab="Average", axis.lty=1, space=NULL,width=3,beside=T, ylim=c(0.00,6.00))
R plots the bars equal distances apart by default and I have been trying to find a workaround for this. I have seen several other solutions that utilise ggplot2 but I am producing plots for my masters thesis and would like to keep the appearance of my barplots in line with other graphs that I have created using base R graphics. I also want to add error bars to the plot. If anyone could provide a solution then I would be very grateful!! Thanks!
Perhaps you can use this as a start. It would probably be easier with boxplots, since boxplot() can place boxes at given x positions via its at argument. Base barplot() cannot do this, but you can draw the bars with rect() instead to replicate the barplot look. Error bars can be added using arrows() or segments().
bar_w = 1 # width of bars
offset = c(-1,1) # offset to avoid overlapping
cols = grey.colors(2) # colors for different types
# combine into a single data frame
d = data.frame(rbind(Botc1, Botc2), 'type' = c(1,1,1,2,2,2))
# set up empty plot with sensible x and y lims
plot(as.Date(d$Date), d$Average, type='n', ylim=c(0,4))
# draw data of data frame 1 and 2
for (i in unique(d$type)){
dd = d[d$type==i, ]
x = as.Date(dd$Date)
y = dd$Average
# rectangles
rect(xleft=x-bar_w+offset[i], ybottom=0, xright=x+bar_w+offset[i], ytop=y, col=cols[i])
# error bars (each one spans 0.5 * SE above and below the mean)
arrows(x0=x+offset[i], y0=y-0.5*dd$SE, x1=x+offset[i], y1=y+0.5*dd$SE, col=1, angle=90, code=3, length = 0.1)
}
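The positioning works because adding or subtracting a number from a Date shifts it by that many days, so each bar ends up 2 * bar_w days wide around its (offset) date. Here is a rough sketch of just that arithmetic, pulled out into a hypothetical helper (bar_corners is my name, not a base R function):

```r
# Hypothetical helper: corner coordinates that rect() would receive
# for bars positioned at real dates.
bar_corners <- function(dates, heights, bar_w = 1, offset = 0) {
  x <- as.Date(dates)
  data.frame(
    xleft   = x - bar_w + offset,  # Date minus a number = earlier date, in days
    xright  = x + bar_w + offset,
    ybottom = 0,
    ytop    = heights
  )
}

# Botc1 from the question, shifted one day to the left as in the loop above.
corners <- bar_corners(c("2014-10-15", "2014-11-12", "2014-12-11"),
                       c(2.125, 1.000, 0.500), bar_w = 1, offset = -1)
print(corners)
```

With bar_w = 1 and offsets of -1 and +1, the two groups sit side by side without overlapping; wider bars would need larger offsets.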
If all you want is a ggplot2 theme that resembles base graphics, adding + theme_bw() will get you close:
data(mtcars)
require(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
theme_bw()
The base graphics alternative, for comparison:
boxplot(mpg~cyl,data=mtcars)
If, as you said, all you want is a similar look and you already have a working plot in ggplot2, theme_bw() should produce plots that look close to what the standard plotting functions give you. If you feel so inclined, you can tweak minor details such as font sizes, the thickness of the plot borders, or how outliers are drawn.
I want to plot 2 graphs in 1 frame. Basically I want to compare the results.
Anyways, the code I tried is:
plot(male,pch=16,col="red")
lines(male,pch=16,col="red")
par(new=TRUE)
plot(female,pch=16,col="green")
lines(female,pch=16,col="green")
When I run it, I DO get 2 plots in a frame BUT it changes my y-axis. Added my plot below. Anyways, y-axis values are -4,-4,-3,-3,...
It's like both of the plots display their own axis.
Please help.
Thanks
You don't need the second plot. Just use
plot(male,pch=16,col="red")
lines(male, pch=16, col = "red")
lines(female, pch=16, col = "green")
points(female, pch=16, col = "green")
Note: that will set the frame boundaries based on the first data set, so some data from the second plot could be outside the boundaries of the plot. You can fix it by e.g. setting the limits of the first plot yourself.
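One way to set the limits yourself is to compute a common y-range from both series and pass it as ylim to the first plot() call. A minimal sketch, using made-up vectors since the original male and female data were not posted:

```r
# Hypothetical sample vectors; substitute your own male and female data.
male   <- c(-0.43, 0.88, -0.61, 0.23, 3.51)
female <- c(1.32, 2.00, 0.82, 1.21, -0.90)

# Common y-range covering both series.
y_lim <- range(male, female)

# type = "b" draws both points and lines in one call.
plot(male, type = "b", pch = 16, col = "red", ylim = y_lim, ylab = "value")
lines(female, type = "b", pch = 16, col = "green")
```

If the two vectors have different lengths, the x-range needs the same treatment, e.g. via xlim.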
For this kind of plot I usually like the plotting with ggplot2 much better. The main reason: It generalizes nicely to more than two lines without a lot of code.
The drawback for your sample data is that it is not available as a data.frame, which ggplot2 requires. Furthermore, in any case you need an x-variable to plot against. So let us first create a data.frame out of your data.
dat <- data.frame(index=rep(1:10, 2), vals=c(male, female), group=rep(c('male', 'female'), each=10))
Which leaves us with
> dat
index vals group
1 1 -0.4334269341 male
2 2 0.8829902521 male
3 3 -0.6052638138 male
4 4 0.2270191965 male
5 5 3.5123679143 male
6 6 0.0615821014 male
7 7 3.6280155376 male
8 8 2.3508890457 male
9 9 2.9824432680 male
10 10 1.1938052833 male
11 1 1.3151289227 female
12 2 1.9956491556 female
13 3 0.8229389822 female
14 4 1.2062726250 female
15 5 0.6633392820 female
16 6 1.1331669670 female
17 7 -0.9002109636 female
18 8 3.2137052284 female
19 9 0.3113656610 female
20 10 1.4664434215 female
Note that my command assumes you have 10 data values each. That command would have to be adjusted according to your actual data.
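If you would rather not hard-code the 10, a variant along these lines should adapt to whatever lengths you have. The vectors below are made up for illustration and deliberately differ in length:

```r
# Hypothetical vectors of unequal length to show the general case.
male   <- c(-0.43, 0.88, -0.61, 0.23)
female <- c(1.32, 2.00, 0.82)

dat <- data.frame(
  index = c(seq_along(male), seq_along(female)),  # 1..n within each group
  vals  = c(male, female),
  group = rep(c("male", "female"), times = c(length(male), length(female)))
)
print(dat)
```

The resulting dat has the same three columns as above, so the plotting code does not need to change.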
Now we may use the mighty power of ggplot2:
library(ggplot2)
ggplot(dat, aes(x=index, y=vals, color=group)) + geom_point() + geom_line()
The call above has three elements: ggplot() initializes the plot, tells R to use dat as the data source, and defines the plot aesthetics, i.e. which aesthetic properties of the plot (such as color, position, size, etc.) are driven by your data. We map x and y as expected and additionally map the color aesthetic to the grouping variable, which makes ggplot automatically draw the two groups in different colors. Finally, we add two geometries that do what their names say: geom_point() draws points and geom_line() draws lines.
The result:
If you have your data saved in the standard way in R (in a data.frame), you end up with one line of code. And if, after some thousands of years of evolution, you want to add another gender, it is still one line of code.