using ggplot2 to show multiple number columns in barplot? - r

Could anybody please help me use ggplot2 in R to draw a barplot with the columns (first, second, third, fourth, fifth) on the x-axis and their values on the y-axis, without showing the column "uname"?
> head(golQ1Grades)
qname uname first second third fourth fifth
1 onlinelernen_quiz_1 xxx 100 0 0 0 0
2 onlinelernen_quiz_2 xxxx 100 0 0 0 0
3 onlinelernen_quiz_4 xxxx 42 71 0 0 0
4 onlinelernen_quiz_7 xxxx 85 100 0 0 0
5 onlinelernen_quiz_1 xxx 85 100 0 0 0
6 onlinelernen_quiz_3 xxxx 71 0 0 0 0
Thanks in advance for the help.

It is my guess that you would like to display the mean value on the Y-axis.
library(ggplot2)
Data
dat<-data.frame(c(100,100,42,85,85,71), c(0,0,71,100,100,0), c(0,0,0,0,0,0), c(0,0,0,0,0,0), c(0,0,0,0,0,0))
names(dat)<-NULL
Compute mean and get new data
v1<-apply(dat, 2, mean)
nv1<-c("first","second","third", "fourth","fifth")
ndat<-data.frame(nv1, v1)
Plot
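# set the levels explicitly so the bars keep the column order instead of alphabetical order
p <- ggplot(ndat, aes(factor(nv1, levels = nv1), v1))
p + geom_bar(stat="identity")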

I think the better option is dplyr and tidyr. For example (I changed the data.frame a little):
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(qname = letters[1:10],
first = seq(1,10,1),
second = seq(10,100,10),
third = seq(2,20,2))
Then reshape it to long format with gather:
df <- df %>%
gather(variable, value, -qname)
In your case it will be:
df <- golQ1Grades %>%
gather(variable,value, -qname, -uname)
Furthermore, instead of computing the average value, facet_grid is also extremely helpful:
ggplot(df, aes(factor(qname),value))+
geom_bar(stat = "identity")+
facet_grid(.~variable)
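Note that gather() is superseded in current tidyr releases in favor of pivot_longer(). A minimal sketch of the same reshape on the question's sample rows, assuming tidyr 1.0.0 or later and ggplot2 3.3.0 or later (for the fun argument of stat_summary). The names long, variable and value are chosen here for illustration, and the mean on the y-axis follows the guess made in the first answer:

```r
library(tidyr)
library(ggplot2)

# The six sample rows shown in the question
golQ1Grades <- data.frame(
  qname  = c("onlinelernen_quiz_1", "onlinelernen_quiz_2", "onlinelernen_quiz_4",
             "onlinelernen_quiz_7", "onlinelernen_quiz_1", "onlinelernen_quiz_3"),
  uname  = c("xxx", "xxxx", "xxxx", "xxxx", "xxx", "xxxx"),
  first  = c(100, 100, 42, 85, 85, 71),
  second = c(0, 0, 71, 100, 100, 0),
  third  = 0,
  fourth = 0,
  fifth  = 0
)

# One row per (quiz attempt, column); qname and uname are carried along untouched
long <- pivot_longer(golQ1Grades, cols = first:fifth,
                     names_to = "variable", values_to = "value")

# Keep the original column order on the x-axis instead of alphabetical order
long$variable <- factor(long$variable,
                        levels = c("first", "second", "third", "fourth", "fifth"))

# One bar per column, bar height = mean of that column
p <- ggplot(long, aes(variable, value)) +
  stat_summary(fun = mean, geom = "bar")
print(p)
```

Because uname is never mapped to an aesthetic, it does not appear in the plot even though it stays in the data.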

Related

How to create a dot plot from multiple columns in one plot?

Consider a df that I would like to plot.
The exemplary df:
df
Entry A. B. C. D. Value
O60701 1 1 1 0 2.7181970
Q8WZ42 1 1 1 1 3.6679832
P60981 1 1 0 0 2.2974231
Q15047 1 0 0 0 0.5535473
Q9UER7 1 0 0 0 4.1030394
I want Entry to be on y axis and Value on x axis. Do you have any ideas how to create a plot, so that if a protein is found (==1) let us say in column A it would be a dot on a plot? Since we have four columns (A-D), there can be maximum 4 dots. Hence, I would like to be able to distinguish which dot (or any other shape) comes from which column.
Here is what I have so far:
ggplot(df, aes(x=Value, y=Entry)) +
geom_point(size=1) +
theme_ipsum()
library(tidyverse)
df %>%
pivot_longer(cols = A:D) %>%
# by default, pivot_longer creates `name` column with either A/B/C/D,
# and a `value` column holding the original 0/1 value from those columns
filter(value == 1) %>% # only plot if protein found (A/B/C/D==1)
ggplot(aes(Value, Entry, color = name)) +
geom_jitter(height = 0.1, width = 0.1) + # since you have multiple points at the same locations
hrbrthemes::theme_ipsum()

no. of geom_point matches the value

I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like :
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this :
As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr 1.0.0 or later, which allows summarize to return more than one row per group.
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = T)
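In dplyr 1.1.0 and later, returning more than one row per group from summarize() is deprecated in favor of reframe(). A sketch of the same transformation under that assumption (dplyr 1.1.0 or later installed), using the sample data above:

```r
library(dplyr)
library(ggplot2)

dd <- data.frame(
  over    = 1:5,
  runs    = c(12, 8, 9, 3, 6),
  wickets = c(0, 0, 2, 1, 0)
)

# One row per wicket, stacked 1, 2, ... units above the top of the bar
wicket_data <- dd %>%
  filter(wickets > 0) %>%
  group_by(over) %>%
  reframe(wicket_y = runs + seq_len(wickets))

p <- ggplot(dd, aes(x = over)) +
  geom_col(aes(y = runs), fill = "#A6C6FF") +
  geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
  theme_bw()
print(p)
```

reframe() always returns an ungrouped data frame, so no ungroup() call is needed before plotting.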

how to make the boxplots with dot points and labels?

I have a dataframe as below
G1 G2 G3 G4 group
S_1 0 269.067 0.0817233 243.22 N
S_2 0 244.785 0.0451406 182.981 N
S_3 0 343.667 0.0311259 351.329 N
S_4 0 436.447 0.0514887 371.236 N
S_5 0 324.709 0 293.31 N
S_6 0 340.246 0.0951976 393.162 N
S_7 0 382.889 0.0440337 335.208 N
S_8 0 368.021 0.0192622 326.387 N
S_9 0 267.539 0.077784 225.289 T
S_10 0 245.879 0.368655 232.701 T
S_11 0 17.764 0 266.495 T
S_12 0 326.096 0.0455578 245.6 T
S_13 0 271.402 0.0368059 229.931 T
S_14 0 267.377 0 248.764 T
S_15 0 210.895 0.0616382 257.417 T
S_16 0.0401525 183.518 0.0931699 245.762 T
S_17 0 221.535 0.219924 203.275 T
Now I want to make a multi-panel boxplot with all 4 genes in columns. The first 8 rows are normal samples and the remaining 9 rows are tumor samples, so for each gene I should be able to make 2 boxplots labeled by tissue. I am able to make individual boxplots, but how should I put all 4 genes in one plot, label the tissue for each boxplot, and overlay the stripchart points? Is there an easy way to do it? I can only make individual plots using the row and column names, but cannot mark the labels based on column groups in the plot or plot the points with the stripchart. Any help will be appreciated. Thanks.
With facet_wrap:
head(df)
G1 G2 G3 G4 group
S_1 0 269.067 0.0817233 243.220 N
S_2 0 244.785 0.0451406 182.981 N
S_3 0 343.667 0.0311259 351.329 N
S_4 0 436.447 0.0514887 371.236 N
S_5 0 324.709 0.0000000 293.310 N
S_6 0 340.246 0.0951976 393.162 N
library(reshape2)
df <- melt(df)
library(ggplot2)
ggplot(df, aes(x = variable,y = value, group=group, col=group)) +
facet_wrap(~variable, scales = 'free') + geom_boxplot()
I'm not sure what you mean by stripchart points; I assumed you wanted to visualize the actual points overlaid on the boxplots. Would the following suffice?
library(ggplot2)
library(dplyr)
library(reshape2)
melt(df) %>%
ggplot(aes(x = variable, y = value, col = group)) +
geom_boxplot() +
geom_jitter()
Where df is the above data frame. Result:
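If each gene should get its own panel, with one box per tissue labeled on the x-axis and the individual samples drawn on top, something like the following may be closer to what was asked. This is a sketch on a subset of the rows above (four normal, four tumor), assuming tidyr 1.0.0 or later. The column name sample and the labels "Normal" and "Tumor" are illustrative choices, not part of the original data:

```r
library(tidyr)
library(ggplot2)

# Subset of the question's data; the row names are stored in a `sample` column here
df <- read.table(text = "sample G1 G2 G3 G4 group
S_1 0 269.067 0.0817233 243.22 N
S_2 0 244.785 0.0451406 182.981 N
S_3 0 343.667 0.0311259 351.329 N
S_4 0 436.447 0.0514887 371.236 N
S_9 0 267.539 0.077784 225.289 T
S_10 0 245.879 0.368655 232.701 T
S_11 0 17.764 0 266.495 T
S_12 0 326.096 0.0455578 245.6 T", header = TRUE)

# Long format: one row per (sample, gene)
long <- pivot_longer(df, cols = G1:G4, names_to = "gene", values_to = "value")

p <- ggplot(long, aes(x = group, y = value, colour = group)) +
  geom_boxplot(outlier.shape = NA) +                         # hide outliers; the raw points are drawn below
  geom_point(position = position_jitter(width = 0.15, height = 0)) +
  scale_x_discrete(labels = c(N = "Normal", T = "Tumor")) +  # tissue labels under each box
  facet_wrap(~gene, scales = "free_y")                       # one panel per gene, own y-range
print(p)
```

Jittering only horizontally (height = 0) keeps each point at its true expression value, which is the closest ggplot2 analogue to a base stripchart.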

data order and color in ggplot

I have assets by manager in a data frame
Date C B A E D
2011-06-30 20449251 2011906 0 0 0
2011-09-30 20766092 1754940 0 0 0
2011-12-31 15242138 1921684 0 0 0
2012-03-31 15811841 2186571 0 0 0
2012-06-30 16221813 2026042 2423039 2419517 0
2012-09-30 16155686 2261729 2563734 1160693 0
2012-12-31 16297839 2231341 2592015 1151989 0
2013-03-31 14627046 2441132 2769681 1249464 0
2013-06-30 14186185 2763985 2615053 1260893 0
2013-09-30 14039954 2780167 2698988 1264244 0
2013-12-31 13832117 3081687 2962113 1318903 0
2014-03-31 14177177 3133202 3077684 1353243 0
2014-06-30 14503900 3235089 3196623 1415319 0
2014-09-30 12561057 3227862 3048216 1413446 2073068
I then melt and plot to get a stacked area graph
library('ggplot2')
library('reshape2')
colorscheme = scale_fill_brewer(type="qual",palette = 2)
df = melt(data,id.var="Date",variable.name="Manager")
df[,3] = as.numeric(df[,3])
#Stacked Area
layout(c(1,1))
p = ggplot(df,aes(x=Date,y=value,group=Manager,fill=Manager))+
geom_area(position="fill") + colorscheme
print(p)
and this works great:
Now I want a pie chart of the last row (i.e, current date)
df1 = data[nrow(data),-1]
df1 = as.data.frame(t(df1))
colnames(df1) = "AUM"
p = ggplot(df1,aes(x=1,y=df1$AUM,fill=rownames(df1))) +
geom_bar(stat="identity") + colorscheme + coord_polar(theta="y")
plot(p)
and I get the following:
Ignoring the formatting, my question is about the color selection. The colors don't match by manager: the color used for Manager A in the area graph is now the color for Manager C. I realize this is because the pie chart is sorted by Manager name, whereas the Manager order in data isn't sorted.
I don't have control over how I receive data. Is there a way to reorder data and/or df (data melted) so that the first graph is in manager order? Or to change the way data is sent to the pie chart?
Thanks,
Rather than messing around with factor levels, wouldn't it be easier to subset df by the Date from the last row in data?
ggplot(df[df$Date==tail(data,1)$Date,],aes(x=1,y=value,fill=Manager)) +
geom_bar(stat="identity") + colorscheme + coord_polar(theta="y")
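The subset approach works because Manager in the melted df is a factor whose levels follow the column order of data, so both plots assign colors to the same levels in the same order. If the pie chart has to be built from the wide data instead, the factor-level route would look roughly like this. It is a sketch using only the last two rows of the table above; manager_order is a name introduced here for illustration:

```r
library(ggplot2)

# Last two rows of the question's table
data <- data.frame(
  Date = as.Date(c("2014-06-30", "2014-09-30")),
  C = c(14503900, 12561057),
  B = c(3235089, 3227862),
  A = c(3196623, 3048216),
  E = c(1415319, 1413446),
  D = c(0, 2073068)
)

# Column order of the incoming data, which is also the level order melt() would produce
manager_order <- names(data)[-1]

last <- data[nrow(data), -1]
df1 <- data.frame(
  Manager = factor(names(last), levels = manager_order),  # same level order as the area chart
  AUM     = unlist(last, use.names = FALSE)
)

p <- ggplot(df1, aes(x = 1, y = AUM, fill = Manager)) +
  geom_col() +
  scale_fill_brewer(type = "qual", palette = 2) +
  coord_polar(theta = "y")
print(p)
```

With the levels fixed this way, the fill scale hands out palette colors in the order C, B, A, E, D in both charts.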

how to do boxplot in ggplot2

I need to create some box plots showing the abundance of some bacterial taxa in different samples.
My data looks like:
my.data <- "Taxon 06.TO.VG 21.TO.V 02.TO.VG 41.TO.VG 30.TO.V 04.BA.V 34.TO.VG 01.BA.V 28.TO.VG 18.TO.O 44.TO.V 08.BA.O 07.BA.O 06.BA.V 11.TO.V 06.BA.VG 07.BA.VG 05.BA.VG 07.BA.V 05.BA.V 06.BA.O 02.BA.O 04.BA.O 01.BA.O 05.BA.O 03.BA.O 02.BA.VG 03.BA.V 02.BA.V 04.BA.VG 03.BA.VG 01.BA.VG 15.TO.O 31.TO.O 09.TO.O 27.TO.V 42.TO.VG 08.TO.VG 16.TO.O 07.TO.V 13.TO.O 32.TO.V 29.TO.VG 10.TO.V 25.TO.V 05.TO.VG 20.TO.O 19.TO.V 17.TO.O 35.TO.V 43.TO.O 24.TO.V 26.TO.VG 01.TO.VG 37.TO.O 04.TO.VG 33.TO.O 39.TO.VG 14.TO.O 12.TO.O 38.TO.VG 22.TO.O
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877 0.097387173 0.086673889 0.432662192 0.060246679 0.269535674 0.152713335 0.014511873 0.063421323 0.091253905 0.139856373 0.013677012 0.200847907 0.180712032 0.21332737 0.031756181 0.272166702 0.019861211 0.133804422 0.168692685 0.100862392 0.152431791 0.104702194 0.119352089 0.410334347 0.024104844 0.0493905 0.068065382 0.047854785 0.011860175 0.168986083 0.015748031 0.407974482 0.264409881 0.250364431 0.330547112 0.536443695 0.578045113 0.400459167 0.204446209 0.357879234 0.242751388 0.488863722 0.521495803 0.001852281 0.045638126 0.503566932 0.069072806 0.171181339 0.183629007 0.371751412 0.385231317 0.023690205 0.255697356 0.104054054 0.242741552 0.043973941 0.221033868 0.004587156
Prevotella 0.073080791 0.302011096 0.586048042 0.487603306 0.290973872 0.014897075 0 0.333254269 0.029445074 0 0.153034301 0.002399726 0.025658188 0.090664273 0.440294582 0.100688924 0 0 0 0 0 0.000227946 0.093623374 0 0.000197707 0.115987461 0.076442171 0 0.047507606 0.000210172 0.000243962 0.042079208 0.52184769 0 0.394750656 0 0 0.235787172 0 0.000936856 0.000300752 0 0.051607781 0 0 0 0.002289494 0.735586941 0.023828756 0 0.011200996 0 0.046374105 0 0.00044484 0.085421412 0.000455789 0.306756757 0 0.11970684 0.008912656 0.371559633"
I'm wondering about using ggplot2 to do the box plot, but I'm not sure how the data has to be formatted.
I tried this:
df <- read.csv("my.data", header=T)
ggplot(data = df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Taxon))
but it gave me an error saying that the variable was not found...
Can anyone help me?
Many thanks
Francesca
A quick example of how to format your data:
categs = sample(LETTERS[1:3], 120, TRUE)
y = c(rnorm(40), rnorm(40, 3, 2), rnorm(40, 5, 3))
# example dataset
dados = data.frame(categs, y)
require(ggplot2)
ggplot(dados) + geom_boxplot(aes(x = categs, y = y))
# categs y
#1 B 0.7392673
#2 B -0.1694076
#3 A -2.3804024
#4 B 0.5999949
#5 A 0.5816400
#6 A 2.1263669
See also http://ggplot2.org/
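Applied to the question's own table, the same idea means reading the text from the string (not from a file called "my.data") and reshaping it so that each row holds one Taxon, one sample and one abundance. A sketch using only the first four sample columns of the data above, assuming tidyr 1.0.0 or later; the column names variable and value are chosen to match the attempted ggplot call:

```r
library(tidyr)
library(ggplot2)

# First four sample columns of the question's data
my.data <- "Taxon 06.TO.VG 21.TO.V 02.TO.VG 41.TO.VG
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877
Prevotella 0.073080791 0.302011096 0.586048042 0.487603306"

# text = reads from the string; check.names = FALSE keeps names such as "06.TO.VG" unchanged
df <- read.table(text = my.data, header = TRUE, check.names = FALSE)

# Long format: one row per (Taxon, sample)
long <- pivot_longer(df, cols = -Taxon, names_to = "variable", values_to = "value")

# One box per taxon, summarizing abundance across samples
p <- ggplot(long, aes(x = Taxon, y = value, fill = Taxon)) +
  geom_boxplot()
print(p)
```

The "variable not found" error in the question came from the wide table having no columns named variable or value; those only exist after the reshape.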
