I was trying to use ggplot2 to create a percentage barplot.
An example dataframe
sample mapped(%) unmapped(%) reads
sample1 96.5 3.5 1320
sample2 97.4 2.6 1451
sample3 92.1 7.9 1824
sample4 98.7 1.3 1563
and I used the following code to create the barplot:
df <- algin %>% gather(col,reads,mapped:reads)
ggplot(df,aes(x=sample, y=reads, fill=col)) + geom_col(position = position_stack()) + coord_flip() + scale_fill_manual("legend", values = c("mapped" = "darkred", "unmapped" = "red", "reads"="darkblue"))
Although the barplot created here is close to what I want to display, it does not seem correct: the legend should show "mapped" in darkblue and "unmapped" in darkred.
I ended up with the values above after trying different settings, and only this combination gave me the desired visual effect.
For example, I also tried
ggplot(df, aes(x = sample, y = reads, fill = col)) +
geom_col(position = position_stack()) +
coord_flip() +
scale_fill_manual(
"legend",
values = c("mapped" = "darkblue", "unmapped" = "darkred", "reads" = "red")
)
Then the plot looks like...
What I want to see is
the bar length represents the reads (sequencing reads) of each sample, with every x-axis value labelled in M units, e.g. 500M, 1000M, etc.;
the darkblue color corresponds to the percentage of reads that were aligned (i.e. mapped) to the reference genome;
the darkred color corresponds to the percentage of reads that were not aligned (i.e. unmapped) to the reference genome;
legend: mapped and unmapped only; it would be better to remove reads, as it is not necessary there.
An example of the desired plot as follows
Solutions appreciated!
Thanks!
Assuming these data:
algin <- tribble(
~sample, ~mapped, ~unmapped, ~reads,
"sample1", 96.5, 3.5, 1320,
"sample2", 97.4, 2.6, 1451,
"sample3", 92.1, 7.9, 1824,
"sample4", 98.7, 1.3, 1563
)
We can create the plotting df like this:
df <- algin %>%
transmute(
sample,
mapped = reads * mapped / 100,
unmapped = reads * unmapped / 100
) %>%
gather(mapping, n, -sample)
And then plot what is pretty close to what you showed:
df %>%
ggplot(
aes(sample, n,
# Factor levels control the order of the colors
fill = factor(mapping, levels = c( "unmapped","mapped")))
) +
geom_col() +
scale_fill_manual(
# Control the shade with the colors of your example
values = c("mapped" = "#427BB0", "unmapped" = "#B0064C"),
# Control the text shown for each color in the legend
# We could have directly named the new columns with CamelCase too
labels = c("mapped" = "Mapped", "unmapped" = "Unmapped"),
# Control the order in the legend
breaks = c("mapped", "unmapped")
) +
# Flip sideways
coord_flip() +
# To not have the grey background
theme_minimal() +
theme(
# Your example didn't have horizontal lines
panel.grid.major.y = element_blank(),
# Self explanatory
legend.position = "bottom"
) +
# Add M to everything except 0
scale_y_continuous(labels = as_mapper(~ifelse(. == 0, "0",paste0(., "M")))) +
labs(
# Your example has no x axis label
x = NULL,
y = "# Reads",
# The values are self explanatory
fill = NULL
)
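The `as_mapper()` call in the axis scale above comes from purrr, so it only works when that package is loaded. A minimal base R sketch of the same labelling rule is shown below; `label_m` is a hypothetical helper name chosen for illustration, not a ggplot2 function.

```r
# Hypothetical helper: append "M" to every axis break except zero.
label_m <- function(x) {
  ifelse(x == 0, "0", paste0(x, "M"))
}

# Quick check on a few typical breaks
label_m(c(0, 500, 1000, 1500))
```

Assuming the rest of the plot is unchanged, the helper could be passed as `scale_y_continuous(labels = label_m)`.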
Your table:
df <- structure(list(sample = structure(1:4, .Label = c("sample1",
"sample2", "sample3", "sample4"), class = "factor"), `mapped(%)` = c(96.5,
97.4, 92.1, 98.7), `unmapped(%)` = c(3.5, 2.6, 7.9, 1.3), reads = c(1320L,
1451L, 1824L, 1563L)), class = "data.frame", row.names = c(NA,
-4L))
You need to calculate the number of mapped and unmapped reads from the percentages, then reshape the result into long format with pivot_longer(), which is similar to the gather() you used. We keep only the columns we need.
library(tidyverse)
plotdf <- df %>%
mutate(mapped=`mapped(%)`*reads/100,
unmapped=`unmapped(%)`*reads/100) %>%
select(sample,mapped,unmapped) %>%
pivot_longer(-sample) %>%
mutate(name = factor(name, levels = c("unmapped","mapped")))
Then we set the colors as you described (mapped in dark blue, unmapped in dark red) and define the axis breaks. The plot itself reuses most of what you already have:
COLS <- alpha(c("mapped" = "darkblue", "unmapped" = "darkred"),0.7)
BR <- seq(0,1750,by=250)
ggplot(plotdf,aes(x=sample,y=value,fill=name)) +
scale_y_continuous(breaks=BR,labels=paste(BR,"M",sep=""))+
geom_col() + coord_flip() + scale_fill_manual("legend", values = COLS)+
theme_light()+
theme(legend.position = "bottom")+
ylab("#Reads")+xlab("")
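Both answers above rely on the same step: turning the two percentages into read counts so that the stacked segments add up to the total bar length. The sketch below repeats that arithmetic in base R on the question's data and checks that the two segments sum to `reads`; the name `counts` is only illustrative.

```r
# The question's table, typed in as a plain data frame
algin <- data.frame(
  sample   = paste0("sample", 1:4),
  mapped   = c(96.5, 97.4, 92.1, 98.7),
  unmapped = c(3.5, 2.6, 7.9, 1.3),
  reads    = c(1320, 1451, 1824, 1563),
  stringsAsFactors = FALSE
)

# Convert percentages into read counts
counts <- data.frame(
  sample   = algin$sample,
  mapped   = algin$reads * algin$mapped / 100,
  unmapped = algin$reads * algin$unmapped / 100,
  stringsAsFactors = FALSE
)

# The two segments of each stacked bar should add up to the sample's reads
total <- counts$mapped + counts$unmapped
stopifnot(isTRUE(all.equal(total, algin$reads)))
counts
```

If the percentages did not sum to 100 for some sample, the check would fail, which is a cheap way to catch a mistyped row before plotting.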
Related
I am plotting max_temperature (mean_tmax) against rainfall (mean_rain) in a mirrored barplot: max temp displayed upwards, rain values downwards on the negative scale. These two are stored in the "name" variable.
To highlight the highest values out of the 32 years plotted, I created two vectors colVecTmax, colVecRain. They return a color vector of length 32 each, with the index of max values marked differently.
But when adding these two vectors to fill within geom_bar(), it turns out that ggplot stops counting the top after 16 bars, and moves down to the negative scale to continue. So it does not count by the name (mean_tmax, or mean_rain) variable.
This messes up the plot, and I am not sure how to get ggplot count through on the top bars for max_temperature first, coloring by colVecTmax, and then move down to do the same for rain on the negative scale with colVecRain.
Can anyone give a hint on how to solve this?
colVecTmax <- rep("orange",32)
colVecTmax[which.max(as.numeric(unlist(df.long[df.long$place=="sheffield" & df.long$name == "mean_tmax",4])))] <- "blue"
colVecRain <- rep("grey",32)
colVecRain[which.max(as.numeric(unlist(df.long[df.long$place=="sheffield" & df.long$name == "mean_rain",4])))] <- "blue"
ggplot(df.long[df.long$name %in% c('mean_rain', 'mean_tmax'), ] %>% filter(place== "sheffield")%>%
group_by(name) %>% mutate(value = case_when(
name == 'mean_rain' ~ value/10 * -1,
TRUE ~ value)) %>% mutate(place==str_to_sentence(placenames)) %>%
mutate(name = recode(name,'mean_rain' = "rainfall" , "mean_tmax" = "max temp"))
, aes(x = yyyy, y = value, fill=name))+
geom_bar(stat="identity", position="identity", fill=c(colVecTmax,colVecRain))+
labs(x="Year", y=expression("Rain in cm, temperature in ("*~degree *C*")"))+
geom_smooth(colour="black", lwd=0.5,se=F)+
scale_y_continuous(breaks = seq(-30, 30 , 5))+
scale_x_continuous(breaks = seq(1990, 2025, 5))+
guides(fill= guide_legend(title=NULL))+
scale_fill_discrete(labels=c("Max temperature", "Rainfall"))+
guides(fill=guide_legend(reverse=T), res=96)
With ggplot2 there are much easier and less error-prone ways to assign colors. Instead of creating color vectors that you pass to the color or fill argument, you can map a variable to the aesthetic (which you have basically already done) and assign your desired colors with a manual scale, e.g. scale_fill_manual. The same approach works when you want to highlight some values. To this end, you can create additional categories: in the code below I append "_max" to the name for the observations with the maximum temperature or rainfall and assign your desired "blue" color to these categories. Because this adds extra categories, I use the breaks argument of scale_fill_manual so that the max categories do not show up in the legend.
Using some fake random example data:
# Create example data
set.seed(123)
df.long <- data.frame(
name = rep(c("mean_rain", "mean_tmax"), each = 30),
place = "sheffield",
yyyy = rep(1991:2020, 2),
value = c(runif(30, 40, 100), runif(30, 12, 16))
)
library(ggplot2)
library(dplyr)
df_plot <- df.long %>%
filter(name %in% c("mean_rain", "mean_tmax")) |>
filter(place == "sheffield") %>%
mutate(value = case_when(
name == "mean_rain" ~ -value / 10,
TRUE ~ value
)) |>
# Maximum values
group_by(name) |>
mutate(name = ifelse(abs(value) >= max(abs(value)), paste(name, "max", sep = "_"), name))
ggplot(df_plot, aes(x = yyyy, y = value, fill = name)) +
geom_col(position = "identity") +
geom_smooth(colour = "black", lwd = 0.5, se = F) +
scale_y_continuous(breaks = seq(-30, 30, 5), labels = abs) +
scale_x_continuous(breaks = seq(1990, 2025, 5)) +
scale_fill_manual(
values = c(
mean_rain = "grey", mean_tmax = "orange",
mean_rain_max = "blue", mean_tmax_max = "blue"
),
labels = c(mean_tmax = "Max temperature", mean_rain = "Rainfall"),
breaks = c("mean_rain", "mean_tmax")
) +
labs(x = "Year", y = expression("Rain in cm, temperature in (" * ~ degree * C * ")"), fill = NULL) +
guides(fill = guide_legend(reverse = TRUE))
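The key step in the answer above is relabelling the row with the largest absolute value in each series so that it gets its own fill category. A small base R sketch of that tagging step, on made-up toy data (the names `toy`, `group_max` and `fill_key` are assumptions for illustration), might look like this:

```r
set.seed(123)
# Toy data: rainfall already flipped to the negative scale, temperature positive
toy <- data.frame(
  name  = rep(c("mean_rain", "mean_tmax"), each = 5),
  value = c(-runif(5, 4, 10), runif(5, 12, 16)),
  stringsAsFactors = FALSE
)

# Largest absolute value within each series, repeated for every row of that series
group_max <- ave(abs(toy$value), toy$name, FUN = max)

# Rows that reach the group maximum get a "_max" suffix, all others keep their name
toy$fill_key <- ifelse(abs(toy$value) >= group_max,
                       paste(toy$name, "max", sep = "_"),
                       toy$name)
table(toy$fill_key)
```

Mapping `fill_key` to `fill` and listing only the two base names in `breaks` keeps the highlighted bars out of the legend, as described above.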
I am trying to make a certain alluvial plot with different widths specified in different columns. Let me try to explain it by drawing it, as I am not sure how to do this in ggalluvial.
Notice that the width of the flow from the Male box represents 3 units, while it represents 10 in box 3. Is it possible to create such graphs in ggalluvial? Or how can one construct such a graph in R?
I haven't drawn the other flows just to focus on the flow from male to 3.
Here is some data that could be used to create such a graph:
test_data <- data.table(`2018 - Gender` = c("Male", "Female", "Female", "Male"),
`2018 - Value` = c(10, 20, 30, 20),
`2019 - Gender` = c("Male", "Female", "Male", "Female"),
`2019 - Value` = c(20, 30, 10, 10)
)
Notice that the column names determine the "columns" in the graph (i.e. the x-axis), while the Gender variable determines the blocks. The value from 2018 is the starting width, while the value from 2019 is the ending width of the strata.
As some have pointed out, I need to focus my question more: how can I make flow graphs with different starting and ending widths?
Perhaps the following dummy example gives you a better idea. Please check if your data is in alluvial form with is_alluvia_form(), before you plot it.
library(dplyr)
library(ggplot2)
library(ggalluvial)
c <- c(LETTERS[1:4], LETTERS[2:6], LETTERS[3:7], LETTERS[3:8])
t <- c(rep("Fortnight 1",4), rep("Fortnight 2",5), rep("Fortnight 3",5), rep("Fortnight 4",6))
s <- c(rep(c("Female","Male"),10))
ag <- c(2,3,4,6,11,13)
f <- rnorm(20,20,99)
df <- data.frame(Timeframe=t,Code=c,Sex=s,Freq=round(abs(f))) %>% mutate(Organization=ifelse((row_number() %in% ag), "Agencia2","Agencia1" ))
alluvial_data <- as.data.frame(df %>%select(Organization, Timeframe, Code, Freq, Sex))
alluvial_data <- alluvial_data %>% mutate(id = row_number())
#Remove duplicates
alluvial_data <- alluvial_data %>%
distinct(Organization, Timeframe, Code, Sex, .keep_all = TRUE)
#levels(alluvial_data$Timeframe)
# Convert Timeframe to Factor - Categorical Variable
alluvial_data$Timeframe <-as.factor(alluvial_data$Timeframe)
# Convert Code to String
alluvial_data$Code <-as.character(alluvial_data$Code)
library(RColorBrewer)
# Define the number of colors you want
nb.cols <- 10
mycolors <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
mycolor2 <- colorRampPalette(brewer.pal(2, "Set2"))(nb.cols)
# Chart
ggplot(alluvial_data,
aes(y = Freq, axis1 = Organization, axis2 = Timeframe, axis3 = Code,fill=Sex)) +
#scale_fill_brewer(type = "qual", palette = "Set2") +
scale_x_discrete(limits=c("Organization","Timeframe","Code"), expand=c(0.05,0.05)) +
scale_fill_manual(values = mycolors) +
geom_flow(stat = "alluvium", lode.guidance = "frontback" #, color="grey"
) +
geom_stratum(width = 1/4, fill = "cyan", color = "grey") +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
theme(legend.position = "bottom") +
ggtitle("Organizations") +
guides(fill=guide_legend(override.aes = list(color=mycolors[1:2])))+
labs(fill=NULL)
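The dummy example above draws flows whose width is the same at both ends. For the case asked about, one flow with a different starting and ending width, a hand-drawn sketch that does not use ggalluvial at all is given below. It interpolates the lower edge and the width of a single flow and draws it with `geom_ribbon`. The helper `flow_ribbon`, the vertical positions `y0` and `y1`, and the cosine easing are all assumptions made for illustration.

```r
# Hypothetical helper: outline of one flow whose width changes from w0 to w1.
flow_ribbon <- function(x0, x1, y0, w0, y1, w1, n = 50) {
  t <- seq(0, 1, length.out = n)
  s <- (1 - cos(pi * t)) / 2        # smooth easing curve running from 0 to 1
  ymin  <- y0 + (y1 - y0) * s       # lower edge of the flow
  width <- w0 + (w1 - w0) * s       # width interpolated between both ends
  data.frame(x = x0 + (x1 - x0) * t, ymin = ymin, ymax = ymin + width)
}

# Row 1 of test_data: Male in 2018 (value 10) to Male in 2019 (value 20).
# y0 = 0 and y1 = 5 are arbitrary positions picked for this sketch.
ribbon <- flow_ribbon(x0 = 1, x1 = 2, y0 = 0, w0 = 10, y1 = 5, w1 = 20)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ribbon, aes(x = x, ymin = ymin, ymax = ymax)) +
    geom_ribbon(fill = "steelblue", alpha = 0.6) +
    scale_x_continuous(breaks = 1:2, labels = c("2018", "2019"))
  print(p)
}
```

A full chart would need one such ribbon per row of the data plus rectangles for the boxes, with the `y0` and `y1` offsets computed so that flows stack inside each box.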
I have a dataframe like so:
set.seed(453)
year= as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp= c(letters[seq(from = 1, to = 20 )], c('a','b','c'),letters[seq(from =8, to = 20 )])
freq= c(sample(1:15, 20, replace=T), sample(1:18, 16,replace=T))
df<-data.frame(year, lepsp, freq)
df<-
df %>%
group_by(year) %>%
mutate(rank = dense_rank(-freq))
Frequencies freq of each lepsp within each year are ranked in the rank column. Larger freq values correspond to the smallest rank value and smaller freq values have the largest rank values. Some rankings are repeated if levels of lepsp have the same abundance.
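As a small illustration of the ranking described above, the following base R sketch reproduces what `dense_rank(-freq)` does on a toy vector: ties share a rank and no rank numbers are skipped. The vector values are made up for the example.

```r
freq <- c(5, 10, 10, 3)

# Dense rank of -freq: position of each value among the sorted distinct values
dense <- match(-freq, sort(unique(-freq)))
dense
# 2 1 1 3
```

The two entries with `freq` 10 both get rank 1, and the next value gets rank 2 rather than 3.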
I would like to split the df into multiple subsets by year. Then I would like to plot each subsetted dataframe in a multipanel figure. Essentially this is to create species abundance curves. The x-axis would be rank and the yaxis needs to be freq.
In my real dataframe I have 22 years of data. I would prefer the graphs to be displayed as 2 columns of 4 rows for a total of 8 graphs per page. Essentially I would have to repeat the solution offered here 3 times.
I also need to demarcate the 25%, 50% and 75% quartiles with vertical lines to look like this (desired result):
It would be great if each graph specified the year to which it belongs, but since all axes have the same names, I do not want the x and y labels to be repeated for each graph.
I have tried to plot multiple lines on the same graph but it gets messy.
year.vec<-unique(df$year)
plot(sort(df$freq[df$year==year.vec[1]],
decreasing=TRUE),bg=1,type="b", ylab="Abundance", xlab="Rank",
pch=21, ylim=c(0, max(df$freq)))
for (i in 2:22){
points(sort(df$freq[df$year==year.vec[i]], decreasing=TRUE), bg=i,
type="b", pch=21)
}
legend("topright", legend=year.vec, pt.bg=1:22, pch=21)
I have also tried a loop, however it does not produce an output and is missing some of the arguments I would like to include:
jpeg('pract.jpg')
par(mfrow = c(6, 4)) # 4 rows and 2 columns
for (i in unique(levels(year))) {
plot(df$rank,df$freq, type="p", main = i)
}
dev.off()
Update
(Attempted result)
I found the following code after my post which gets me a little closer, but is still missing all the features I would like:
library(reshape2)
library(ggplot2)
library (ggthemes)
x <- ggplot(data = df2, aes(x = rank, y = rabun)) +
geom_point(aes(fill = "dodgerblue4")) +
theme_few() +
ylab("Abundance") + xlab("Rank") +
theme(axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
plot.title = element_blank(), # we don't want individual plot titles as the facet "strip" will give us this
legend.position = "none", # we don't want a legend either
panel.border = element_rect(fill = NA, color = "darkgrey", size = 1.25, linetype = "solid"),
axis.ticks = element_line(colour = 'darkgrey', size = 1.25, linetype = 'solid')) # here, I just alter to colour and thickness of the plot outline and tick marks. You generally have to do this when faceting, as well as alter the text sizes (= element_text() in theme also)
x
x <- x + facet_wrap( ~ year, ncol = 4)
x
I prefer base R to modify graph features, and have not been able to find a method using base R that meets all my criteria above. Any help is appreciated.
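For reference, one possible base R layout for these requirements is sketched below: a 4 by 2 grid of panels, one panel per year with the year as its title, dashed vertical lines at the 25%, 50% and 75% quantiles of rank, and a single shared label per axis. The function name `plot_rank_abundance` and the margin settings are assumptions made for this sketch.

```r
set.seed(453)
year <- factor(c(rep("1998", 20), rep("1999", 16)))
freq <- c(sample(1:15, 20, replace = TRUE), sample(1:18, 16, replace = TRUE))
df   <- data.frame(year, freq)

# Dense rank within each year, largest freq first
df$rank <- ave(-df$freq, df$year, FUN = function(v) match(v, sort(unique(v))))

plot_rank_abundance <- function(d, rows = 4, cols = 2) {
  old <- par(mfrow = c(rows, cols), mar = c(2, 2, 2, 1), oma = c(4, 4, 1, 1))
  on.exit(par(old))
  qs <- list()
  for (yr in levels(d$year)) {
    sub <- d[d$year == yr, ]
    plot(sub$rank, sub$freq, pch = 21, bg = "grey40", main = yr,
         xlab = "", ylab = "", xlim = range(d$rank), ylim = c(0, max(d$freq)))
    # Quartiles of rank for this year, drawn as dashed vertical lines
    qs[[yr]] <- quantile(sub$rank, probs = c(0.25, 0.5, 0.75))
    abline(v = qs[[yr]], lty = 2)
  }
  # One shared label per axis instead of one per panel
  mtext("Rank", side = 1, outer = TRUE, line = 2)
  mtext("Abundance", side = 2, outer = TRUE, line = 2)
  invisible(qs)
}

quartiles <- plot_rank_abundance(df)
```

With 22 years, the function could be called once per block of eight years (after `droplevels()` on each subset), each call inside its own `jpeg()` and `dev.off()` pair.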
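Here's a ggplot approach. First off, I made some more data to get the 3x2 layout. Note that year is a factor in your data, so it has to be converted to a number before it can be shifted:
library(dplyr)
library(tidyr)
df = df %>% ungroup() %>% mutate(year = as.numeric(as.character(year)))
df = rbind(df, mutate(df, year = year + 4), mutate(df, year = year + 8))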
Then we do a little manipulation to generate the quantiles and labels by group:
df_summ =
df %>% group_by(year) %>%
do(as.data.frame(t(quantile(.$rank, probs = c(0, 0.25, 0.5, 0.75)))))
names(df_summ)[2:5] = paste0("q", 0:3)
df_summ_long = gather(df_summ, key = "q", value = "value", -year) %>%
inner_join(data.frame(q = paste0("q", 0:3), lab = c("Common", "Rare-75% -->", "Rare-50% -->", "Rare-25% -->"), stringsAsFactors = FALSE))
With the data in good shape, plotting is fairly simple:
library(ggthemes)
library(ggplot2)
ggplot(df, aes(x = rank, y = freq)) +
geom_point() +
theme_few() +
labs(y = "Abundance (% of total)", x = "Rank") +
geom_vline(data = df_summ_long[df_summ_long$q != "q0", ], aes(xintercept = value), linetype = 4, size = 0.2) +
geom_text(data = df_summ_long, aes(x = value, y = Inf, label = lab), size = 3, vjust = 1.2, hjust = 0) +
facet_wrap(~ year, ncol = 2)
There's some work left to do - mostly in the rarity text overlapping. It might not be such an issue with your actual data, but if it is you could pull the max y values into df_summ_long and stagger them a little bit, actually using y coordinates instead of just Inf to get it at the top like I did.
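The staggering idea mentioned above could be sketched as follows, using toy stand-ins for `df` and `df_summ_long`. Each label gets a real y coordinate derived from the tallest point in its panel, and successive labels are dropped a little lower. The 8% step and the name `labs_df` are arbitrary choices for this sketch.

```r
# Toy stand-ins for the plotting data and the label table
df <- data.frame(year = rep(c(1998, 1999), each = 3),
                 freq = c(15, 7, 2, 18, 9, 4))
labs_df <- expand.grid(q = paste0("q", 0:3), year = c(1998, 1999),
                       stringsAsFactors = FALSE)

# Tallest point in each panel
ymax <- tapply(df$freq, df$year, max)

# Drop each successive label by 8% of the panel height
step <- as.integer(sub("q", "", labs_df$q))
labs_df$y <- as.numeric(ymax[as.character(labs_df$year)]) * (1 - 0.08 * step)
labs_df
```

The `y` column would then replace `y = Inf` in the `geom_text()` layer, so that labels sharing nearly the same x position no longer sit on one line.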
Say I have this data frame:
treatment <- c(rep("A",6),rep("B",6),rep("C",6),rep("D",6),rep("E",6),rep("F",6))
year <- as.numeric(c(1999:2004,1999:2004,2005:2010,2005:2010,2005:2010,2005:2010))
variable <- c(runif(6,4,5),runif(6,5,6),runif(6,3,4),runif(6,4,5),runif(6,5,6),runif(6,6,7))
se <- c(runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5),runif(6,0.2,0.5))
id <- 1:36
df1 <- as.data.table(cbind(id,treatment,year,variable,se))
df1$year <- as.numeric(df1$year)
df1$variable <- as.numeric(df1$variable)
df1$se <- as.numeric(df1$se)
As I mentioned in a previous question (draw two lines with the same origin using ggplot2 in R), I wanted to use ggplot2 to display my data in a specific way.
I managed to do so using the following script:
y1 <- df1[df1$treatment=='A'&df1$year==2004,]$variable
y2 <- df1[df1$treatment=='B'&df1$year==2004,]$variable
y3 <- df1[df1$treatment=='C'&df1$year==2005,]$variable
y4 <- df1[df1$treatment=='D'&df1$year==2005,]$variable
y5 <- df1[df1$treatment=='E'&df1$year==2005,]$variable
y6 <- df1[df1$treatment=='F'&df1$year==2005,]$variable
p <- ggplot(df1,aes(x=year,y=variable,group=treatment,color=treatment))+
geom_line(aes(y = variable, group = treatment, linetype = treatment, color = treatment),size=1.5,lineend = "round") +
scale_linetype_manual(values=c('solid','solid','solid','dashed','solid','dashed')) +
geom_point(aes(colour=factor(treatment)),size=4)+
geom_errorbar(aes(ymin=variable-se,ymax=variable+se),width=0.2,size=1.5)+
guides(colour = guide_legend(override.aes = list(shape=NA,linetype = c("solid", "solid",'solid','dashed','solid','dashed'))))
p+labs(title="Title", x="years", y = "Variable 1")+
theme_classic() +
scale_x_continuous(breaks=c(1998:2010), labels=c(1998:2010),limits=c(1998.5,2010.5))+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y3),colour='blue1',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y1, xend=2005, yend=y4),colour='blue1',size=1.5,linetype='dashed')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y5),colour='red3',size=1.5,linetype='solid')+
geom_segment(aes(x=2004, y=y2, xend=2005, yend=y6),colour='red3',size=1.5,linetype='dashed')+
scale_color_manual(values=c('blue1','red3','blue1','blue1','red3','red3'))+
theme(text = element_text(size=12))
As you can see I used both geom_line and geom_segment to display the lines for my graph.
It's almost perfect but if you look closely, the segments that are drawn (between 2004 and 2005) do not display the same line size, even though I used the same arguments values in the script (i.e. size=1.5 and linetype='solid' or dashed).
Of course I could change manually the size of the segments to get similar lines, but when I do that, segments are not as smooth as the lines using geom_line.
Also, I get the same problem (different line shapes) by including the size or linetype arguments within the aes() argument.
Do you have any idea what causes this difference, and how I can get exactly the same shapes for both my segments and lines?
It seems to be an anti-aliasing issue with geom_segment, but drawing the connecting pieces as separate segments seems like a somewhat cumbersome approach to begin with. I think I have resolved your issue by duplicating the A and B treatments in the original data frame, so that every line is drawn by geom_line alone.
# First we are going to duplicate and rename the 'shared' treatments
library(dplyr)
library(ggplot2)
df1 %>%
filter(treatment %in% c("A", "B")) %>%
mutate(treatment = ifelse(treatment == "A",
"AA", "BB")) %>%
bind_rows(df1) %>% # This rejoins with the original data
# Now we create `treatment_group` and `line_type` variables
mutate(treatment_group = ifelse(treatment %in% c("A", "C", "D", "AA"),
"treatment1",
"treatment2"), # This variable will denote color
line_type = ifelse(treatment %in% c("AA", "BB", "D", "F"),
"type1",
"type2")) %>% # And this variable denotes the line type
# Now pipe into ggplot
ggplot(aes(x = year, y = variable,
group = interaction(treatment_group, line_type), # grouping by both linetype and color
color = treatment_group)) +
geom_line(aes(x = year, y = variable, linetype = line_type),
size = 1.5, lineend = "round") +
geom_point(size=4) +
# The rest here is more or less the same as what you had
geom_errorbar(aes(ymin = variable-se, ymax = variable+se),
width = 0.2, size = 1.5) +
scale_color_manual(values=c('blue1','red3')) +
scale_linetype_manual(values = c('dashed', 'solid')) +
labs(title = "Title", x = "Years", y = "Variable 1") +
scale_x_continuous(breaks = c(1998:2010),
limits = c(1998.5, 2010.5))+
theme_classic() +
theme(text = element_text(size=12))
Which will give you the following
My numbers are different since they were randomly generated.
You can then modify the legend to your liking, but my recommendation is to label the lines directly with something like geom_text and set check_overlap = TRUE (check_overlap is an argument of geom_text).
Hope this helps!
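To see why duplicating A and B works, it can help to count the rows that end up in each color and line type combination. The base R sketch below rebuilds the grouping on simulated data (values are random, and `paths` and `dup` are illustrative names); every combination should cover all twelve years from 1999 to 2010, which is what lets `geom_line` draw each path in one piece.

```r
set.seed(1)
df1 <- data.frame(
  treatment = rep(LETTERS[1:6], each = 6),
  year      = c(rep(1999:2004, 2), rep(2005:2010, 4)),
  variable  = runif(36, 3, 7),
  stringsAsFactors = FALSE
)

# Duplicate the shared treatments A and B under new names AA and BB
dup <- df1[df1$treatment %in% c("A", "B"), ]
dup$treatment <- paste0(dup$treatment, dup$treatment)
paths <- rbind(dup, df1)

# Same grouping rules as in the answer
paths$treatment_group <- ifelse(paths$treatment %in% c("A", "C", "D", "AA"),
                                "treatment1", "treatment2")
paths$line_type <- ifelse(paths$treatment %in% c("AA", "BB", "D", "F"),
                          "type1", "type2")

# Each of the four paths should contain one row per year, 1999 to 2010
table(paths$treatment_group, paths$line_type)
```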
I am new to R and ggplot2. I have searched a lot for this but could not find a solution.
Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
Sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909
I am planning to plot a bar plot with ggplot2. I want to plot the first three columns as a bar plot "dodge" and label the observation3 bar with the percentage. I could plot the bars as below but I could not use geom_text() to add the label.
data1 <- read.table("readStats.txt", header=T)
data1.long <- melt(data1)
ggplot(data1.long[1:36,], aes(data1.long$Sample[1:36],y=data1.long$value[1:36], fill=data1.long$variable[1:36])) + geom_bar(stat="identity", width=0.5, position="dodge")
Transform data1 to long form with the observation columns as the measure variables and the Sample and percentage columns as the id variables. Compute the maximum value, mx, to be used to place the percentages. Then perform the plot. Note that geom_bar uses data1.long but geom_text uses data1. We have colored the text giving the percentages the same color as the observation3 bars. (See this post for how to specify default colors.) Both inherit aes(x = Sample) but use different y and other aesthetics. We clean up the X axis labels by removing all lower case letters and underscores from the data1$Sample (optional).
library(ggplot2)
library(reshape2)
data1.long <- melt(data1, measure = 2:4) # cols 2:4 are observation1, ..., observation3
mx <- max(data1.long$value) # maximum observation value
ggplot(data1.long, aes(x = Sample, y = value)) +
geom_bar(aes(fill = variable), stat = "identity", width = 0.5, position = "dodge") +
geom_text(aes(y = mx, label = paste0(round(percentage), "%")), data = data1,
col = "#619CFF", vjust = -0.5) +
scale_x_discrete(labels = gsub("[a-z_]", "", data1$Sample))
Note: We used the data below. The one entry that was capitalized in the question, Sample1_B, was changed to sample1_B with a lower case s:
Lines <- "Sample observation1 observation2 observation3 percentage
sample1_A 163453473 131232689 61984186 30.6236955883
sample1_B 170151351 137202212 59242536 26.8866816109
sample2_A 194102849 162112484 89158170 40.4183031852
sample2_B 170642240 141888123 79925652 41.7493687378
sample3_A 192858504 161227348 90532447 41.8068248626
sample3_B 177174787 147412720 81523935 40.5463120438
sample4_A 199232380 174656081 118115358 55.6409038531
sample4_B 211128931 186848929 123552556 54.7201927527
sample5_A 186039420 152618196 87012356 40.9656544833
sample5_B 145855252 118225865 66265976 39.5744515254
sample6_A 211165202 186625116 112710053 48.5457722338
sample6_B 220522502 193191927 114882014 47.238670909"
data1 <- read.table(text = Lines, header = TRUE)
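The two string operations used in the plot above can be checked on their own. The sketch below shows what the `gsub()` pattern does to the axis labels, including why the capitalized `Sample1_B` from the question would come out differently, and how the percentage label is built.

```r
# Axis labels: strip lower-case letters and underscores
gsub("[a-z_]", "", c("sample1_A", "sample1_B"))
# "1A" "1B"

# A capital S is not matched by [a-z_], so it survives
gsub("[a-z_]", "", "Sample1_B")
# "S1B"

# Percentage label for the first sample
paste0(round(30.6236955883), "%")
# "31%"
```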
UPDATE: minor improvements
It might be that G. Grothendieck's answer is a better solution, but here's my suggestion (code below)
# install.packages("ggplot2", dependencies = TRUE)
require(ggplot2)
df <- structure(list(Sample = structure(1:12, .Label = c("sample1_A",
"Sample1_B", "sample2_A", "sample2_B", "sample3_A", "sample3_B",
"sample4_A", "sample4_B", "sample5_A", "sample5_B", "sample6_A",
"sample6_B"), class = "factor"), observation1 = c(163453473L,
170151351L, 194102849L, 170642240L, 192858504L, 177174787L, 199232380L,
211128931L, 186039420L, 145855252L, 211165202L, 220522502L),
observation2 = c(131232689L, 137202212L, 162112484L, 141888123L,
161227348L, 147412720L, 174656081L, 186848929L, 152618196L,
118225865L, 186625116L, 193191927L), observation3 = c(61984186L,
59242536L, 89158170L, 79925652L, 90532447L, 81523935L, 118115358L,
123552556L, 87012356L, 66265976L, 112710053L, 114882014L),
percentage = c(30.6236955883, 26.8866816109, 40.4183031852,
41.7493687378, 41.8068248626, 40.5463120438, 55.6409038531,
54.7201927527, 40.9656544833, 39.5744515254, 48.5457722338,
47.238670909)), .Names = c("Sample", "observation1", "observation2",
"observation3", "percentage"), class = "data.frame", row.names = c(NA,
-12L))
# install.packages("reshape2", dependencies = TRUE)
require(reshape2)
# Keep percentage as an id variable so that it survives the melt
data1.long <- melt(df, id = c("Sample", "percentage"), measure.vars = c("observation1", "observation2", "observation3"))
data1.long$percentage <- paste(round(data1.long$percentage, 2), "%", sep="")
# Blank the label for every bar except observation3
data1.long[data1.long$variable == "observation1" | data1.long$variable == "observation2", "percentage"] <- ""
ggplot(data1.long, aes(x = Sample, y = value, fill=variable)) +
geom_bar(stat="identity", width=0.5, position="dodge") +
geom_text(aes(label = percentage), vjust=2.10, size=2, hjust=-.06, angle = 90)