Creating stacked barplots in R using different variables


I am a novice R user, hence the question. I am referring to the solution for creating stacked barplots given in "R programming: creating a stacked bar graph, with variable colors for each stacked bar".
My issue is slightly different. My data has four columns, and the last column is the sum of the first three. I want to plot a bar chart in which 1) the height of each bar is the summed total (i.e. the 4th column), and 2) each bar is split according to the relative contributions of the first three columns.
I was hoping someone could help.
Regards,
Bernard

If I understood correctly, this may do the trick.
The following code works well for this example data frame df:
df <- read.table(header = TRUE, text = "
a b c sum
1 9 8 18
3 6 2 11
1 5 4 10
23 4 5 32
5 12 3 20
2 24 1 27
1 2 4 7
")
As you want to plot the actual values in your data frame rather than counts of observations, you need to use geom_bar(stat = "identity") in ggplot2. Some data manipulation is necessary too. You also don't need the sum column: stacking the three values gives a bar whose height is the sum.
library(ggplot2)
library(reshape2) # provides melt(); assumed here, the older reshape package also has it
df <- df[, -ncol(df)] # drop the last column (assumed to be the sum)
df$event <- seq_len(nrow(df)) # row index, so values from the same row end up in the same bar
df <- melt(df, id.vars = "event") # reshape to long format, which ggplot expects
px <- ggplot(df, aes(x = event, y = value, fill = variable)) + geom_bar(stat = "identity")
print(px)
This code generates a stacked bar plot with one bar per row of the original data frame.
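For comparison, here is a minimal base-R sketch of the same plot without ggplot2 or reshaping. It assumes the example values above; barplot() stacks the rows of a matrix by default, so the three columns are transposed into rows. The name parts is only illustrative.

```r
# Example values copied from the answer above
df <- data.frame(a = c(1, 3, 1, 23, 5, 2, 1),
                 b = c(9, 6, 5, 4, 12, 24, 2),
                 c = c(8, 2, 4, 5, 3, 1, 4))
df$sum <- rowSums(df)  # reproduces the question's fourth column

# barplot() draws one stacked bar per matrix column, so transpose:
# rows become a, b, c and columns become the observations
parts <- t(as.matrix(df[, c("a", "b", "c")]))
barplot(parts, names.arg = seq_len(ncol(parts)), legend.text = TRUE,
        xlab = "event", ylab = "value")
```

The height of each bar equals the corresponding value in the sum column, which is why that column is not needed for plotting.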

Related

Plotting mean values of groups in a dataframe in R

I have conducted a study with triplicates (SampleID) for each sample (Sample) at different time points.
Now I want to plot the means of the triplicates for the characteristic "Aerobic".
For example, I want to plot the development of the amount of aerobic bacteria over time. Therefore, I need to calculate the means (and the standard deviations) of the triplicates and then plot these means in the graph. I could imagine using geom_line or geom_point for this.
SampleID Sample Aerobic Anaerobic Day
[Factor] [Factor] [num] [num] [num]
1 V1.1.K1 V1.1.K 0.610063430 0.05146154 1
2 V1.1.K2 V1.1.K 0.740887757 0.02115290 1
3 V1.1.K3 V1.1.K 0.683726217 0.04270182 1
4 V1.1.N1 V1.1.N 0.432019752 0.35722350 1
5 V1.1.N2 V1.1.N 0.515792694 0.41357935 1
6 V1.14.K16 V1.14.K 0.038141335 0.84496088 14
7 V1.14.K17 V1.14.K 0.042078682 0.76523093 14
8 V1.14.K18 V1.14.K 0.009594763 0.90767637 14
9 V1.14.N0 V1.14.N 0.513100502 0.10618731 14
10 V1.14.W16 V1.14.W 0.483710571 0.32765968 14
How should I do this?
I tried the following code:
plot <- mydata %>%
  group_by(Sample) %>%
  mutate(Mean=mean(Aerobic)) %>%
  ggplot(aes(x=Day, y=Aerobic)) +
  geom_point()
When I google the question, I only find information about how to calculate a single mean, but not how to set up a new table with the means for the different groups.
Is there something like
calc_mean_by_group ??
You would help me a lot :)

Simple base-R solution for calculating the means:
tapply(X = foo$Aerobic, INDEX = foo$Sample, FUN = mean)
("foo" being the name of your data.frame)
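As a possible extension of the tapply() idea, the sketch below uses base R's aggregate() to build a new table with both the mean and the standard deviation per Sample and Day, and then plots the means. The data rows are copied from the question; the names mydata and stats are assumptions for illustration. Groups with a single measurement get NA as standard deviation.

```r
# Rows copied from the question's example table
mydata <- data.frame(
  Sample  = c("V1.1.K", "V1.1.K", "V1.1.K", "V1.1.N", "V1.1.N",
              "V1.14.K", "V1.14.K", "V1.14.K", "V1.14.N", "V1.14.W"),
  Aerobic = c(0.610063430, 0.740887757, 0.683726217, 0.432019752, 0.515792694,
              0.038141335, 0.042078682, 0.009594763, 0.513100502, 0.483710571),
  Day     = c(1, 1, 1, 1, 1, 14, 14, 14, 14, 14)
)

# One row per Sample/Day combination; the function returns mean and sd together
stats <- aggregate(Aerobic ~ Sample + Day, data = mydata,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))
# aggregate() stores the two results in a matrix column; flatten it into
# ordinary columns named Aerobic.mean and Aerobic.sd
stats <- do.call(data.frame, stats)
print(stats)

# Plot the group means against the day
plot(stats$Day, stats$Aerobic.mean, xlab = "Day", ylab = "Mean Aerobic")
```

The resulting stats data frame can also be passed to ggplot(), for example with geom_point() or geom_line() on Day and Aerobic.mean.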

Scatterplot with R from text file with log scale

I have data saved in a text file with a couple of thousand lines. Each line has only one value, like this:
52312
2
3
4
5
7
9
4
5
3
The first value is always roughly 10,000 times bigger than all the other values.
I can read in the data with data<-read.table("data.txt")
When I just use plot(data), all the data points have the same y-value, resulting in a line, and the x-values just represent the values from the data.
What I want, however, is for the x-value to represent the line number and the y-value the actual data. So for the above example my points would be (1,52312), (2,2), (3,3), (4,4), (5,5), (6,7), (7,9), (8,4), (9,5), (10,3).
Also, since the first value is much higher than all the other values, I'd like to use a log scale for the y-axis.
Sorry, very new to R.

set.seed(1000)
df <- data.frame(a = c(9999999, sample(2:78, 77, replace = FALSE)))
plot(x = 1:nrow(df), y = log(df$a))
i) set.seed(1000) makes sample() return the same random numbers each time you run this code, so the example is reproducible.
ii) Type ?sample in the R console for its documentation.
iii) Since you wanted the x-axis to be the line number, I create it with the ":" operator: 1:3 gives 1, 2, 3. Similarly, 1:nrow(df) creates an index that matches the number of rows in your data.
iv) For the log scale, simply apply log() to the values (note that log() is the natural logarithm). Read more in ?plot about its parameters.
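The same idea can be sketched for data that really comes from a file. This example assumes one number per line, as in the question; it writes the question's sample values to a temporary file as a stand-in for data.txt. Instead of transforming the values, it uses the log = "y" argument of plot(), which keeps the original values on the axis labels.

```r
# Stand-in for data.txt: the sample values from the question, one per line
path <- tempfile(fileext = ".txt")
writeLines(c("52312", "2", "3", "4", "5", "7", "9", "4", "5", "3"), path)

# scan() reads the file into a plain numeric vector
values <- scan(path, quiet = TRUE)

# x is the line number, y is the value, drawn on a logarithmic y-axis
plot(seq_along(values), values, log = "y",
     xlab = "Line number", ylab = "Value")
```

A log axis only works for strictly positive values, so zeros or negative numbers in the file would need separate handling.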

Try this:
df
x y
1 1 52312
2 2 2
3 3 3
4 4 4
5 5 5
6 6 7
7 7 9
8 8 4
9 9 5
10 10 3
library(ggplot2)
ggplot(df, aes(x, y)) + geom_point(size=2) + scale_y_log10()

ggplot - use numeric values to fill stacked bar charts

I would like to build a stacked bar chart where fill is given numeric values rather than categories.
This is my graph so far:
In the ggplot example for the stacked bar chart, fill corresponds to the column cut of the diamonds dataset.
This column is an ordered factor:
> class(diamonds$cut)
[1] "ordered" "factor"
Therefore, I think that the frequency of the different levels
> head(diamonds$cut)
[1] Ideal Premium Good Premium Good Very Good
Levels: Fair < Good < Very Good < Premium < Ideal
is calculated and used to fill the bars.
In my case, the total value of each bar (tot in my dataframe) is made up of two types of values: up and down. These correspond to columns in my dataframe:
> head(cyt.4)
COG tot up down
1 [C] Energy production and conversion 17 16 1
2 [D] Cell cycle control, cell division, chromosome partitioning 0 0 0
3 [E] Amino acid transport and metabolism 34 30 4
4 [F] Nucleotide transport and metabolism 11 9 2
5 [G] Carbohydrate transport and metabolism 13 9 4
6 [H] Coenzyme transport and metabolism 3 3 0
For example, a bar with a tot value of 10 can be divided into up=7 and down=3. If I assign red to up and green to down, I would like 70% of the bar (7 out of 10) to be filled red and 30% (3 out of 10) to be filled green.
I have been struggling with this for days and have not obtained any valid results.

Convert your data from "wide" to "long" format, for example using the reshape package. Then things get easier in ggplot. The restructured dataframe contains variable with values "down" and "up". This can be given to fill= as an ordered or unordered factor.
Below is a minimal example, that mimics your data:
library(ggplot2)
library(reshape)
x <- c(14,11,9,17)
dfr <- data.frame(COG=letters[1:4], down=1:4, up=x-1:4, tot=x)
dfr <- melt(dfr[,-4], id.vars="COG") # drop tot, then reshape to long format
ggplot(dfr, aes(x=COG, y=value, fill=factor(variable))) +
geom_bar(stat="identity") +
coord_flip() +
scale_fill_manual(values=c("green3","red3"))
Cheers!
Edit: If the levels get mixed up in your dataset, it's because melt creates the levels of variable in the order in which it finds the columns. To change the order, either reorder the columns of your dataset (as I did) and let melt take care of it, or leave it and set the levels explicitly, for example with ordered, so that the factor follows the order you specify.
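A small sketch of setting the level order explicitly, using only base R. The vector variable is a made-up stand-in for the column that melt produces. The order of the levels is what decides the stacking order and which colour from scale_fill_manual() goes with which value.

```r
# Hypothetical stand-in for the melted column
variable <- c("up", "down", "up", "down")

# For a character vector, factor() sorts the levels alphabetically by default
f_default <- factor(variable)
levels(f_default)

# Passing levels= fixes the order; the first level gets the first fill colour
f_custom <- factor(variable, levels = c("up", "down"))
levels(f_custom)
```

With f_custom, values = c("red3", "green3") in scale_fill_manual() would assign red to up and green to down.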

R: graph multiple columns on one line

This seems simple, but I've tried multiple variations of matplot, ggplot2, regular old plot...I can't get any to do what I need.
I have a gigantic dataframe of years, months, and observations. I simplified it down to number of observations per month, per year, see below. I'm not sure why it read in with the "X" in front of each column heading, but if it's not going to affect the code, right now I don't care.
head(storms)
X Month X1992 X1993 X1994
1 1 1 2 1
2 2 2 4 1
3 3 3 26 10
4 4 4 47 26
5 5 5 969 615
The full (simplified) set is 10 columns of years (1992-2001), each with 12 months/rows of totals (1 storm in Jan 1992, 26 storms in March 1993...). I need simply to plot these all on an x-axis 120 months long, # of observations per month on the y-axis. It could be a line or bars or vertical lines. I've seen many ways to plot 20 lines with 12 months on the x-axis; that is not what I'm going for. I also need to label the years every 12 months, but I think I can figure that out after I get this block out of the way.
In other words (I hope this is more clear if the previous is not):
y axis: # of storms, ylim=c(0,1000)
x axis: 10 sets of months (Jan-Dec, 1992-2001, 120 months total). The only labels will be the years, every 12 months of course.
I know I'm just thinking about it wrong, could someone please set my head straight?
(first post; please also tell me if I'm not formatting or inquiring properly!)

Is this something you are looking for? If I am not mistaken, you may want to rearrange your data frame: make it longer rather than wider. Then you can draw a figure. Keep in mind that you have 120 months, so you may need to think about plot space. But this example should at least let you move forward. I hope it helps.
library(tidyr)
library(ggplot2)
# Create sample data
month <- rep(1:12, times = 2)
ninetytwo <- runif(24, 0, 20)
ninetythree <- runif(24, 0, 20)
# Create a data frame
ana <- data.frame(month, ninetytwo, ninetythree)
# Make the data longer rather than wider.
bob <- gather(ana, year, value, -month)
bob$month <- as.factor(bob$month)
# Draw a figure
cathy <- ggplot(bob, aes(x = year, y = value, fill = month)) + geom_bar(stat = "identity", position = "dodge")
cathy

Here's an example using base R :
# create an example data
set.seed(123)
df <- data.frame(Month=1:12)
for (y in 1992:2001) {
  tmp <- data.frame(X = as.integer(abs(rnorm(12, mean = 2, sd = 10))))
  colnames(tmp) <- paste("X", y, sep = "")
  df <- cbind(df, tmp)
}
# reshape to long format (one column with n. of storms, and period columns)
long <- reshape(df[, -1], idvar = "Month", ids = df$Month,
                times = names(df[, -1]), timevar = "Year",
                varying = list(names(df[, -1])),
                direction = "long", v.names = "Storms")
# remove the "X" from the year
long$Year <- substr(long$Year,2,nchar(long$Year))
nYears <- length(unique(long$Year))
# plot the line
plot(x=1:nrow(long),y=long$Storms,type="l",
xaxt="n",main="Monthly Storms",
xlab="Period",ylab="Storms",col="RoyalBlue")
# add custom labels
axis(1,at=((1:nYears)*12)-6,labels=unique(long$Year))
# add vertical lines
abline(v=c(0.5,((1:nYears)*12)+0.5),col="Gray80",lty=2)
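Result: a single line across all 120 months, with the year labels centred under each 12-month block and dashed vertical lines separating the years.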
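As an alternative sketch for the reshaping step, base R's stack() can turn the year columns into one long column. The storm counts below are random placeholders, not real data, and the names storms, long and Period are assumptions for illustration. The "X" prefix in the question most likely comes from read.csv() or read.table() with check.names = TRUE, which makes column names that start with a digit syntactically valid; it is stripped here with sub().

```r
# Placeholder data shaped like the question's table: Month plus X1992..X2001
set.seed(123)
storms <- data.frame(Month = 1:12)
for (y in 1992:2001) {
  storms[[paste0("X", y)]] <- rpois(12, lambda = 20)  # made-up counts
}

# stack() concatenates the year columns in order: 'values' holds the counts,
# 'ind' holds the name of the column each count came from
long <- stack(storms[, -1])
long$Year   <- as.integer(sub("^X", "", as.character(long$ind)))
long$Month  <- rep(storms$Month, times = ncol(storms) - 1)
long$Period <- seq_len(nrow(long))  # running month index 1..120

# One continuous line over all 120 months, labelled by year
plot(long$Period, long$values, type = "l", xaxt = "n",
     xlab = "Year", ylab = "Storms")
axis(1, at = seq(6, nrow(long), by = 12), labels = unique(long$Year))
```

Because stack() keeps the columns in their original order, the Period index runs from January 1992 to December 2001 without any extra sorting.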

Create a barplot of two tables of differing length

I can not seem to figure out how to get a nice barplot that contains the data from two tables that contain a different number of columns.
The tables in question are something like (snipped some data from the end):
> tab1
1 2 3 6 8 31
5872 1525 831 521 299 4
> tab2
1 2 3 4 22
7874 422 2 5 1
Note the column names and sizes are different. When I just do barplot() on one of these tables it comes out with the plot I'd like (showing the column names as the X-axis, frequencies on Y-axis). But, I would like these two side by side.
I've gotten as far as creating a data frame containing both variables as comments and the different row names in the first column (with data.frame()and merge()), but when I plot this the X-axis seems to be all wrong. Attempting to reorder the columns gives me an exception about lengths differing.
Code:
combined <- merge(data.frame(tab1), data.frame(tab2), by = c('Var1'), all=T)
barplot(t(combined[,2:3]), names.arg = combined[,1], beside=T)
This shows a plot, but not all labels are present and the value for position 26 is plotted after 33.
Is there any simple way to get this plot working? A ggplot2 solution would be nice.

You can put all your data in one data frame (as in this example).
library(ggplot2)
df <- data.frame(group = rep(c("A", "B"), times = c(2, 3)),
                 values = c(23, 56, 345, 6, 7), xval = c(1, 2, 1, 2, 8))
group values xval
1 A 23 1
2 A 56 2
3 B 345 1
4 B 6 2
5 B 7 8
Then ggplot() with geom_bar() can be used to plot the data.
ggplot(df,aes(xval,values,fill=group))+
geom_bar(stat="identity",position="dodge")
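A base-R sketch of the same side-by-side plot, using the (snipped) counts shown in the question. The two tables are written as named vectors here, which is an assumption about their structure. The idea is to align both on the union of their category names, fill the missing categories with 0, and let barplot() draw the rows next to each other.

```r
# Counts copied from the question's tab1 and tab2 (the question notes they are snipped)
tab1 <- c("1" = 5872, "2" = 1525, "3" = 831, "6" = 521, "8" = 299, "31" = 4)
tab2 <- c("1" = 7874, "2" = 422, "3" = 2, "4" = 5, "22" = 1)

# Union of all categories, sorted numerically rather than as text
cats <- sort(unique(as.integer(c(names(tab1), names(tab2)))))
key  <- as.character(cats)

# Look up each category in both tables; categories that are absent give NA
counts <- rbind(tab1 = tab1[key], tab2 = tab2[key])
colnames(counts) <- key
counts[is.na(counts)] <- 0

# beside = TRUE puts the two tables next to each other for every category
barplot(counts, beside = TRUE, legend.text = TRUE,
        xlab = "Category", ylab = "Frequency")
```

Sorting the categories as integers avoids the ordering problem described in the question, where labels end up in text order instead of numeric order.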
