ggplot - use numeric values to fill stacked bar charts - r

I would like to build a stacked bar chart in which fill is driven by numeric values rather than categories.
This is my graph so far:
In the ggplot example for the stacked bar chart, fill corresponds to the cut column of the diamonds dataset.
The class of this column is:
> class(diamonds$cut)
[1] "ordered" "factor"
Therefore, I think that the frequency of the different terms
> head(diamonds$cut)
[1] Ideal Premium Good Premium Good Very Good
Levels: Fair < Good < Very Good < Premium < Ideal
is calculated and used to fill the bars.
In my case, the value of each bar displayed on X (tot in my dataframe) is made up of two kinds of values, up and down. These correspond to columns in my dataframe:
> head(cyt.4)
  COG                                                            tot up down
1 [C] Energy production and conversion                            17 16    1
2 [D] Cell cycle control, cell division, chromosome partitioning   0  0    0
3 [E] Amino acid transport and metabolism                         34 30    4
4 [F] Nucleotide transport and metabolism                         11  9    2
5 [G] Carbohydrate transport and metabolism                       13  9    4
6 [H] Coenzyme transport and metabolism                            3  3    0
For example, a bar with an X (tot) value of 10 can be divided into up=7 and down=3. If I assign red to up and green to down, I would like 70% of the bar (7 out of 10) to be filled red and 30% (3 out of 10) green.
I have been struggling with this for days without getting any valid results.

Convert your data from "wide" to "long" format, for example using the reshape package. Then things get easier in ggplot. The restructured dataframe contains a column, variable, with the values "down" and "up". This can be given to fill= as an ordered or unordered factor.
Below is a minimal example that mimics your data:
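library(ggplot2)
library(reshape)
x <- c(14,11,9,17)
# Wide data: one row per COG, with down + up = tot
dfr <- data.frame(COG=letters[1:4], down=1:4, up=x-1:4, tot=x)
# Drop tot (column 4) and melt to long format, keeping COG as the identifier
dfr <- melt(dfr[,-4], id.vars="COG")
# stat="identity" uses the values themselves as bar lengths instead of counting rows
ggplot(dfr, aes(x=COG, y=value, fill=factor(variable))) +
  geom_bar(stat="identity") +
  coord_flip() +
  scale_fill_manual(values=c("green3","red3"))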
Cheers!
Edit: If the levels get mixed up in your dataset, it's because melt creates the levels of variable in the order in which it finds the columns. To change the order, either reorder the columns of your dataset (as I did) and let melt take care of it, or leave it and use ordered to make the factor follow an order that you specify.
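As a rough sketch of the second option, the level order can be set explicitly with factor(levels=...). The data frame name long and the choice to put "up" first are assumptions made for illustration; the data mimics the melted example above.

```r
library(ggplot2)

# Hypothetical long-format data, shaped like the output of melt() in the answer
long <- data.frame(
  COG      = rep(letters[1:4], times = 2),
  variable = rep(c("down", "up"), each = 4),
  value    = c(1:4, c(14, 11, 9, 17) - 1:4)
)

# Set the level order explicitly instead of relying on the column order
long$variable <- factor(long$variable, levels = c("up", "down"))

# A named colour vector ties each colour to a level, whatever the level order
p <- ggplot(long, aes(x = COG, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c(up = "red3", down = "green3"))
print(p)
```

Naming the colours means that changing the level order later does not silently swap red and green.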

Related

Plotting mean values of groups in a dataframe in R

I have conducted a study with triplicates (SampleID) for each sample (Sample) at different time points.
Now I want to plot the means of the triplicates for the characteristic "Aerobic".
For example, I want to plot how the amount of aerobic bacteria develops over time. To do that, I need to calculate the means (and the standard deviations) of the triplicates and then plot these means in the graph. I could imagine using a geom_line or geom_point diagram for this.
SampleID Sample Aerobic Anaerobic Day
[Factor] [Factor] [num] [num] [num]
1 V1.1.K1 V1.1.K 0.610063430 0.05146154 1
2 V1.1.K2 V1.1.K 0.740887757 0.02115290 1
3 V1.1.K3 V1.1.K 0.683726217 0.04270182 1
4 V1.1.N1 V1.1.N 0.432019752 0.35722350 1
5 V1.1.N2 V1.1.N 0.515792694 0.41357935 1
6 V1.14.K16 V1.14.K 0.038141335 0.84496088 14
7 V1.14.K17 V1.14.K 0.042078682 0.76523093 14
8 V1.14.K18 V1.14.K 0.009594763 0.90767637 14
9 V1.14.N0 V1.14.N 0.513100502 0.10618731 14
10 V1.14.W16 V1.14.W 0.483710571 0.32765968 14
How should I do this?
I tried it with the following code:
plot <- mydata %>%
group_by(Sample) %>%
mutate(Mean=mean(Aerobic)) %>%
ggplot(aes(x=Day, y=Aerobic)) +
geom_point()
If I google the question, I only find information about how to calculate the mean on its own, not how to set up a new table with the means for the different variables.
Is there something like
calc_mean_by_group ??
You would help me a lot :)
Simple base-R solution for calculating the means:
tapply(X = foo$Aerobic, INDEX = foo$Sample, FUN = mean)
("foo" being the name of your data.frame)
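If a new table of means and standard deviations is wanted for plotting, one possible sketch uses dplyr's summarise() instead of mutate(). The rows below are copied from the question; the column names Mean and SD and the error-bar width are assumptions for illustration, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)

# A few rows copied from the question; "foo" is the placeholder name used above
foo <- data.frame(
  Sample  = c("V1.1.K", "V1.1.K", "V1.1.K", "V1.1.N", "V1.1.N",
              "V1.14.K", "V1.14.K", "V1.14.K"),
  Aerobic = c(0.610063430, 0.740887757, 0.683726217, 0.432019752, 0.515792694,
              0.038141335, 0.042078682, 0.009594763),
  Day     = c(1, 1, 1, 1, 1, 14, 14, 14)
)

# summarise() collapses each group to one row; mutate() would keep every replicate
means <- foo %>%
  group_by(Sample, Day) %>%
  summarise(Mean = mean(Aerobic), SD = sd(Aerobic), .groups = "drop")

# Plot the group means with error bars of one standard deviation
p <- ggplot(means, aes(x = Day, y = Mean, colour = Sample)) +
  geom_point() +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.5)
print(p)
```

Because the Sample names in the question differ between days, a geom_line() layer would need a separate column that identifies the same treatment across days; that column is not present in the data shown.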

How to plot profiles in R with ggplot2

I have a large data set with protein IDs and corresponding abundance profiles across a number of gel fractions. I want to plot these profiles of abundances across the fractions.
The data looks like this
IDs<- c("prot1", "prot2", "prot3", "prot4")
fraction1 <- c(3,4,2,4)
fraction2<- c(1,2,4,1)
fraction3<- c(6,4,6,2)
plotdata<-data.frame(IDs, fraction1, fraction2, fraction3)
> plotdata
IDs fraction1 fraction2 fraction3
1 prot1 3 1 6
2 prot2 4 2 4
3 prot3 2 4 6
4 prot4 4 1 2
I want it to look like this:
Every protein has a profile. Every fraction has a corresponding abundance value per protein. I want to have multiple proteins per plot.
I tried figuring out ggplot2 using the cheat sheet and failed. I don't know what the input df should look like and what method I should use to get these profiles.
I would use Excel, but a bug draws the wrong profile of my data depending on the order of the data, so I can't trust it to do what I want.
First, you'll have to reorganize your data.frame for ggplot2. You can do it in one step with reshape2::melt. You can change the default 'variable' and 'value' column names with the variable.name and value.name arguments.
library(reshape2)
library(dplyr)
library(ggplot2)
data2 <- melt(plotdata, id.vars = "IDs")
Then, we'll group the data by protein:
data2 <- group_by(data2, IDs)
Finally, you can plot it quite simply:
ggplot(data2) +
geom_line(aes(variable, value, group = IDs,
color = IDs))
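For comparison, the same reshaping can be sketched with tidyr::pivot_longer(), assuming the tidyr package is available. The column names fraction and abundance are chosen here for illustration and are not from the original answer.

```r
library(tidyr)
library(ggplot2)

# The example data from the question
plotdata <- data.frame(
  IDs       = c("prot1", "prot2", "prot3", "prot4"),
  fraction1 = c(3, 4, 2, 4),
  fraction2 = c(1, 2, 4, 1),
  fraction3 = c(6, 4, 6, 2)
)

# Wide to long: one row per protein/fraction pair
long <- pivot_longer(plotdata, cols = -IDs,
                     names_to = "fraction", values_to = "abundance")

# group = IDs draws one line per protein across the fractions
p <- ggplot(long, aes(x = fraction, y = abundance, group = IDs, colour = IDs)) +
  geom_line()
print(p)
```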

Creating stacked barplots in R using different variables

I am a novice R user, hence the question. I refer to the solution on creating stacked barplots from "R programming: creating a stacked bar graph, with variable colors for each stacked bar".
My issue is slightly different. My data has 4 columns. The last column is the sum of the first 3 columns. I want to plot a bar chart showing 1) the summed total value (i.e. the 4th column) and 2) each bar split by the relative contributions of the three columns.
I was hoping someone could help.
Regards,
Bernard
If I understood correctly, this may do the trick.
The following code works well for this example df dataframe:
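# Read the example table; the sum column is the row total of a, b and c
df <- read.table(header = TRUE, text = "
a b c sum
1 9 8 18
3 6 2 11
1 5 4 10
23 4 5 32
5 12 3 20
2 24 1 27
1 2 4 7
")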
As you don't want to plot a count of observations but the actual values in your dataframe, you need to use geom_bar(stat="identity") in ggplot2. Some data manipulation is necessary too. You also don't need a sum column, because ggplot stacks the values and so does the sum for you.
library(reshape2)
library(ggplot2)
df <- df[,-ncol(df)] # drop the last column (assumed to be the sum)
df$event <- seq.int(nrow(df)) # row index, so that values from the same row end up in the same bar
df <- melt(df, id.vars='event') # reshape the dataframe to long format so ggplot can read it
px <- ggplot(df, aes(x = event, y = value, fill = variable)) + geom_bar(stat = "identity")
print(px)
This code generates the plot below:

r- hist.default, 'x' must be numeric

Just picking up R and I have the following question:
Say I have the following data.frame:
v1 v2 v3
3 16 a
44 457 d
5 23 d
34 122 c
12 222 a
...and so on
I would like to create a histogram or barchart for this in R, but instead of having the x-axis be one of the numeric values, I would like a count by v3. (2 a, 1 c, 2 d...etc.)
If I do hist(dataFrame$v3), I get the error that 'x' must be numeric.
Why can't it count the instances of each different string like it can for the other columns?
What would be the simplest code for this?
OK. First of all, you should know exactly what a histogram is. It is not a plot of counts. It is a visualization for continuous variables that estimates the underlying probability density function. So do not try to use hist on categorical data. (That's why hist tells you that the value you pass must be numeric.)
If you just want counts of discrete values, that's just a basic bar plot. You can calculate counts of values in R for discrete data using table and then plot that with the basic barplot() command.
barplot(table(dataFrame$v3))
If you want to require a minimum number of observations, try
tbl<-table(dataFrame$v3)
atleast <- function(i) {function(x) x>=i}
barplot(Filter(atleast(10), tbl))
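Since several of the threads here use ggplot2, a rough ggplot2 counterpart may be useful: geom_bar() counts the rows per category by default. The data frame below is rebuilt from the five rows shown in the question; the remaining rows elided by "...and so on" are not reproduced.

```r
library(ggplot2)

# The five rows shown in the question
dataFrame <- data.frame(
  v1 = c(3, 44, 5, 34, 12),
  v2 = c(16, 457, 23, 122, 222),
  v3 = c("a", "d", "d", "c", "a")
)

# table() gives the counts used by barplot() above
counts <- table(dataFrame$v3)

# geom_bar() with only an x aesthetic counts the occurrences of each value
p <- ggplot(dataFrame, aes(x = v3)) + geom_bar()
print(p)
```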

Create a barplot of two tables of differing length

I cannot seem to figure out how to get a nice barplot that contains the data from two tables with a different number of columns.
The tables in question are something like (snipped some data from the end):
> tab1
1 2 3 6 8 31
5872 1525 831 521 299 4
> tab2
1 2 3 4 22
7874 422 2 5 1
Note the column names and sizes are different. When I just do barplot() on one of these tables it comes out with the plot I'd like (showing the column names as the X-axis, frequencies on Y-axis). But, I would like these two side by side.
I've gotten as far as creating a data frame containing both variables as columns and the different row names in the first column (with data.frame() and merge()), but when I plot this the X-axis seems to be all wrong. Attempting to reorder the columns gives me an error about lengths differing.
Code:
combined <- merge(data.frame(tab1), data.frame(tab2), by = c('Var1'), all=T)
barplot(t(combined[,2:3]), names.arg = combined[,1], beside=T)
This shows a plot, but not all labels are present and the value for position 26 is plotted after 33.
Is there any simple way to get this plot working? A ggplot2 solution would be nice.
You can put all your data in one data frame, as in this example:
df<-data.frame(group=rep(c("A","B"),times=c(2,3)),
values=c(23,56,345,6,7),xval=c(1,2,1,2,8))
  group values xval
1     A     23    1
2     A     56    2
3     B    345    1
4     B      6    2
5     B      7    8
Then ggplot() with geom_bar() can be used to plot the data.
ggplot(df,aes(xval,values,fill=group))+
geom_bar(stat="identity",position="dodge")
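Applied to the two tables from the question, the same idea might look like the sketch below. The named vectors stand in for the real tab1 and tab2 (which were snipped in the question), and the name combined and the group labels are chosen for illustration.

```r
library(ggplot2)

# Stand-ins for the (snipped) tables in the question
tab1 <- c("1" = 5872, "2" = 1525, "3" = 831, "6" = 521, "8" = 299, "31" = 4)
tab2 <- c("1" = 7874, "2" = 422, "3" = 2, "4" = 5, "22" = 1)

# Stack both tables into one long data frame with a group column
combined <- rbind(
  data.frame(group = "tab1", xval = as.integer(names(tab1)), values = unname(tab1)),
  data.frame(group = "tab2", xval = as.integer(names(tab2)), values = unname(tab2))
)

# Make x a factor with levels in numeric order, so "22" does not sort before "3"
combined$xval <- factor(combined$xval, levels = sort(unique(combined$xval)))

p <- ggplot(combined, aes(x = xval, y = values, fill = group)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

Treating x as an ordered factor gives every category a label and keeps the bars in numeric order, which addresses the label and ordering problems described in the question.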
