Change Bar Colours in a Grouped Bar Plot - r

My data consist of numerical values between 100 and 2000 grouped into 3 different drug treatment groups, which are then subdivided into 3 groups (based on their anatomical location in an organism, termed "Inner", "Middle", "Outer"). The final plot should be 3 groups of 3 bars (each representing the mean values of cell survival in each of the 3 locations). So far I have managed to make individual barplots, but I want to combine them. Below is a small excerpt from the data set, followed by the code I have so far.
Treatment Inner Middle Outer
RAD 317 373 354
RAD 323 217 174
RAD 236 255 261
HUTS 1411 1844 1978
HUTS 1922 1756 1856
HUTS 1478 1711 1433
RGD 1433 1489 1633
RGD 1400 1500 1544
RGD 1222 1333 1444
With some help, I've been able to create a grouped bar plot using the code:
df %>%
gather(key = group, value = value, -Treatment) %>%
ggplot(aes(x = Treatment, y = value, fill = group)) +
stat_summary(fun = mean, geom = "col", position = position_dodge())
Now, however, I want to be able to choose the colours of the bars.
Any help would be really appreciated!
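One possible approach, sketched here under assumptions: map the fill colours by hand with scale_fill_manual(). The colour names in bar_colours are arbitrary placeholders, not anything from the question, and the sketch uses pivot_longer() and the fun argument (the current name for what older ggplot2 versions called fun.y, available from ggplot2 3.3.0).

```r
library(tidyr)
library(ggplot2)

# Excerpt of the data shown in the question
df <- read.table(header = TRUE, text = "
Treatment Inner Middle Outer
RAD 317 373 354
RAD 323 217 174
RAD 236 255 261
HUTS 1411 1844 1978
HUTS 1922 1756 1856
HUTS 1478 1711 1433
RGD 1433 1489 1633
RGD 1400 1500 1544
RGD 1222 1333 1444")

# Hypothetical colour choices; any valid R colour names or hex codes work
bar_colours <- c(Inner = "steelblue", Middle = "darkorange", Outer = "forestgreen")

# Long format: one row per Treatment/location/value combination
df_long <- pivot_longer(df, -Treatment, names_to = "group", values_to = "value")

p <- ggplot(df_long, aes(x = Treatment, y = value, fill = group)) +
  stat_summary(fun = mean, geom = "col", position = position_dodge()) +
  scale_fill_manual(values = bar_colours)  # named vector: colours matched by group
p
```

Naming the vector elements ties each colour to a group regardless of the order of the factor levels.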

Related

Show columns as percentage in R ggplot

I need help with a graph I am trying to build in R.
This is the data:
Location  Total Number of Employees  Local Number  Remote Number
L1        150                        50            100
L2        355                        148           207
L3        477                        106           371
L4        234                        82            152
L5        987                        523           464
L6        4564                       2504          2060
L7        2342                       1425          917
L8        754                        415           339
And this is what I am aiming for
[1]: https://i.stack.imgur.com/eoVxL.jpg
So, basically I want to present the "Total Number of Employees" column in a 0-100% range and since L6 has the highest number of employees, 4564 should be 100%. The legend should show the local and remote number, where the "Local" column should be shown in the positive grid and the "Remote" column in the negative one. The locations should be ordered from min to max.
Something like this?
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
mutate(across(Local.Number:Remote.Number, ~ .x / max(Total.Number.of.Employees)),
Remote.Number = -Remote.Number) %>%
pivot_longer(-c(Location, Total.Number.of.Employees)) %>%
ggplot() +
aes(x = Location, y = value, fill = name) +
geom_col() +
scale_y_continuous(labels = scales::label_percent()) +
theme_bw()
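The question also asks for the locations to be ordered from min to max. A minimal sketch of one way to do that, assuming the dotted column names used in the answer above and rebuilding the data frame by hand from the question's table; reorder() sets the factor levels of Location by the total head count. The helper columns Local and Remote are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the question, with dotted column names (an assumption)
df <- data.frame(
  Location = paste0("L", 1:8),
  Total.Number.of.Employees = c(150, 355, 477, 234, 987, 4564, 2342, 754),
  Local.Number = c(50, 148, 106, 82, 523, 2504, 1425, 415),
  Remote.Number = c(100, 207, 371, 152, 464, 2060, 917, 339)
)

plot_df <- df %>%
  mutate(
    # Factor levels sorted by total head count, smallest first
    Location = reorder(Location, Total.Number.of.Employees),
    # Shares relative to the largest location; Remote is plotted downwards
    Local = Local.Number / max(Total.Number.of.Employees),
    Remote = -Remote.Number / max(Total.Number.of.Employees)
  ) %>%
  pivot_longer(c(Local, Remote))

p <- ggplot(plot_df, aes(x = Location, y = value, fill = name)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  theme_bw()
p
```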

Plotting each value of columns for a specific row

I am struggling to plot a specific row from a dataframe. Below is the graph I am trying to plot. I have tried using ggplot and the base plot function, but I cannot figure it out.
Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
1 48 59 95 82 141 157 168 183
2 59 68 102 102 140 168 174 170
3 61 77 93 107 145 162 172 177
4 54 43 104 104 146 159 176 171
5 100 145 185 247 150 158 168 175
6 68 82 95 118 142 140 178 189
7 68 95 109 111 139 171 176 175
Above is the data frame I am trying to plot. Each row holds the measurements for one bear, so row 1 is bear 1. How would I plot only the Wt columns for bear 1 against an x-axis that goes from year 2 to year 5?
You can pivot your data frame into a longer format:
First add a column with the row number (bear number):
df = cbind("Bear"=as.factor(1:nrow(df)), df)
It needs to be factor so we can pass it as a group variable to ggplot. Now pivot:
df2 = tidyr::pivot_longer(df[,1:5], cols=2:5,
names_to="Year", values_to="Weight", names_prefix="Wt")
df2$Year = as.numeric(df2$Year)
We ignore the Length columns with df[,1:5]; say that we only want to pivot the weight columns with cols=2:5; then name the columns we want to create with names_to and values_to; and lastly names_prefix="Wt" removes the "Wt" before the column names, leaving only the year number. That leaves a character column, so we convert it to numeric with as.numeric().
Then plot:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) + geom_line()
Output (PS: I created my own data, so the actual numbers are off):
Just an addition: if you don't want to specify the columns of your dataset explicitly, you can do:
df2 = df[, grep("Wt|Bear", colnames(df))]
df2 = tidyr::pivot_longer(df2, cols=grep("Wt", colnames(df2)),
names_to="Year", values_to="Weight", names_prefix="Wt")
Edit: one plot for each group
You can use facet_wrap:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) +
facet_wrap(~Bear, nrow=2, ncol=4) +
geom_line()
Output:
You can change nrow and ncol as you wish, and you can remove linetype from aes() since the facets already differentiate the bears, but that is not mandatory.
You can also change the levels of the categorical variable to improve the label on each panel, for example with levels(df2$Bear) = paste("Bear", 1:7) (or do that when creating the column).
Try
ggplot(mapping = aes(x = seq.int(2, 5), y = c(48, 59, 95, 82))) +
geom_point(color = "blue") +
geom_line(color = "blue") +
xlab("Year") +
ylab("Weight")
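To avoid typing the values by hand, the weights for bear 1 can also be pulled out of the reshaped data. This is a sketch that combines the two answers above, assuming the data frame from the question is read in as df; the names wt_long and bear1 are introduced here for illustration.

```r
library(tidyr)
library(ggplot2)

# Data frame from the question (row names dropped; one row per bear)
df <- read.table(header = TRUE, text = "
Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
48 59 95 82 141 157 168 183
59 68 102 102 140 168 174 170
61 77 93 107 145 162 172 177
54 43 104 104 146 159 176 171
100 145 185 247 150 158 168 175
68 82 95 118 142 140 178 189
68 95 109 111 139 171 176 175")

# Bear number taken from the row position
df$Bear <- factor(seq_len(nrow(df)))

# Pivot only the weight columns; strip the "Wt" prefix to get the year
wt_long <- pivot_longer(df, cols = starts_with("Wt"),
                        names_to = "Year", names_prefix = "Wt",
                        values_to = "Weight")
wt_long$Year <- as.numeric(wt_long$Year)

# Keep the rows for bear 1 only
bear1 <- subset(wt_long, Bear == "1")

p <- ggplot(bear1, aes(x = Year, y = Weight)) +
  geom_line() +
  geom_point()
p
```

Changing the value in subset() selects a different bear without touching the rest of the code.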

Violin plot from summary data

I'd like to use a violin plot to visualise the number of archaeological artefacts by site (A and B) and by century with data in the following format (years are Before Present):
Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
22700 12 0
...
24800 112 32
There are some 6000 artefacts in total. In ggplot2, it would seem as if the preferred data entry format is of one line per observation (artefact) for a violin plot:
Site Year
A 22400
A 22400
... (356 times)
A 22400
B 22400
B 22400
... (182 times)
A 22500
A 22500
... (234 times)
A 22500
... ... ... (~5000 lines)
B 24800
B 24800
... (32 times)
B 24800
Is there an effective way of converting summary dataframe (1st grey box) into an observation-by-observation dataframe (2nd grey box) for use in a violin plot?
Alternatively, is there a way of making violin plots from data formatted as in the first grey box?
Update:
With the answer provided by eipi10, if either Site A or B has zero artefacts (as in the updated example above for the year 22,700), I get the following error:
Error in data.frame(Year = rep(dat$Year[i], dat$value[i]), Site = dat$key[i]) :
arguments imply differing number of rows: 0, 1
The plot would look like this:
How about this:
library(tidyverse)
dat = read.table(text="Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
24800 112 32", header=TRUE, stringsAsFactors=FALSE)
dat = gather(dat, key, value, -Year)
dat.long = data.frame(Year = rep(dat$Year, dat$value), Site=rep(dat$key, dat$value))
ggplot(dat.long, aes(Site, Year)) +
geom_violin()
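An alternative sketch for the expansion step, assuming tidyr 1.0 or later: uncount() repeats each row according to a count column and drops rows whose count is zero, so the 22700 row with no Site B artefacts is handled too. The names counts and obs are introduced here for illustration.

```r
library(tidyr)
library(ggplot2)

# Summary data from the question, including the zero count for Site B
dat <- read.table(header = TRUE, text = "
Year SiteA SiteB
22400 356 182
22500 234 124
22600 144 231
22700 12 0
24800 112 32")

# One row per Year/Site combination with its artefact count
counts <- pivot_longer(dat, c(SiteA, SiteB), names_to = "Site",
                       names_prefix = "Site", values_to = "n")

# One row per artefact; rows with n == 0 simply disappear
obs <- uncount(counts, n)

p <- ggplot(obs, aes(x = Site, y = Year)) +
  geom_violin()
p
```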

Dotplot with two categorical variables and two quantitative variables

I have a problem making a dotplot. I have a data frame "distribution_tab" with 4 columns and 6 rows. The first two columns are quantitative variables and the other two are categorical:
read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria
I want to make only one dotplot out of this data frame, with read.length in the x axis and percentage.GC in the y axis. The strand "forward" has to be represented with a dot and the strand reverse with a triangle (or with whatever two other different symbols). The organism "bacteria" has to be represented in pink and the organism "plant" in green.
So for instance, if one data is "forward and bacteria", it has to be represented with a pink dot in the dotplot, and if it is "reverse and plant" it has to be a green triangle.
I really don't know how to do this (or if it is possible at all). For the moment I have made a dotplot with the two quantitative variables:
plot(distribution_tab$read.length ~ distribution_tab$percentage.GC)
I have no idea how to distinguish them in the plot according to their organism and strand values.
distribution_tab <- read.table(header = TRUE, text = "read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria ")
plot(percentage.GC ~ read.length, data = distribution_tab,
pch = c(17,19)[(strand %in% 'forward') + 1L],
col = c('pink', 'green')[(organism %in% 'plant') + 1L])
Or use ifelse(), although the indexing method above is more flexible:
plot(percentage.GC ~ read.length, data = distribution_tab,
pch = ifelse(strand %in% 'forward', 19, 17),
col = ifelse(organism %in% 'plant', 'green', 'pink'))
Using ggplot:
library(ggplot2)
df$col <- ifelse(df$organism == "bacteria", "pink", "green")
ggplot(df, aes(read.length, percentage.GC, shape = strand, col = col)) +
geom_point(size = 4) +
scale_color_identity()
Data:
#dummy data
df <- read.table(text=" read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria ", header = TRUE)
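As a variation on the ggplot answer, the colours and shapes can be assigned through manual scales instead of a helper column, which also keeps the organism names in the legend. This is a sketch using the data from the question; the shape codes 19 (filled circle) and 17 (filled triangle) match those in the base R answer.

```r
library(ggplot2)

distribution_tab <- read.table(header = TRUE, text = "
read.length percentage.GC strand organism
203 63.0 forward bacteria
250 33.0 forward plant
205 72.0 reverse bacteria
240 36.0 reverse plant
210 33.5 forward plant
230 63.5 reverse bacteria")

p <- ggplot(distribution_tab,
            aes(x = read.length, y = percentage.GC,
                shape = strand, colour = organism)) +
  geom_point(size = 4) +
  # Named vectors tie each category to its symbol or colour
  scale_shape_manual(values = c(forward = 19, reverse = 17)) +
  scale_colour_manual(values = c(bacteria = "pink", plant = "green"))
p
```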

Draw plot for comparing each row? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I would like to draw a plot for following table.
T6 T26 D6 D26
ENSMUSG00000026427 420 170 197 249
ENSMUSG00000026436 27 21 54 77
ENSMUSG00000018189 513 246 429 484
ENSMUSG00000026470 100 55 82 73
ENSMUSG00000026696 147 73 182 283
ENSMUSG00000026568 3620 1571 1264 1746
ENSMUSG00000026504 95 60 569 428
I want to compare the rows with each other and show each column in a different colour.
X.lab= Gene name
y.Lab= Counts
I think that the appropriate plotting choice depends on the characteristics of your full dataset, and from what I can tell, on the number of possible unique values of IDs ("ENSMUSG*") and the possible number of variables ("T26", "D26", ...). What is clear however, is that the variables have different scales, so should not be combined on the same plot, and so I have chosen a faceted grid plot below.
Here is some code that makes an appropriate choice based on the sample of the data that you have chosen to show us:
library(ggplot2)
library(dplyr)
library(tidyr)
df_foo = read.table(textConnection(
"T6 T26 D6 D26
ENSMUSG00000026427 420 170 197 249
ENSMUSG00000026436 27 21 54 77
ENSMUSG00000018189 513 246 429 484
ENSMUSG00000026470 100 55 82 73
ENSMUSG00000026696 147 73 182 283
ENSMUSG00000026568 3620 1571 1264 1746
ENSMUSG00000026504 95 60 569 428"
))
# plot the data
df_foo %>%
tibble::rownames_to_column(var = "ID") %>%
gather(key = Variable, value = Value, -ID) %>%
ggplot(aes(x = ID, y = Value, fill = Variable)) +
geom_bar(stat = "identity") +
theme_bw() +
facet_wrap(~ Variable, scales = "free_y") +
theme(axis.text.x = element_text(angle = 50, hjust = 1))
# save the plot
ggsave("results/faceted_bar.png", dpi = 600)
Note that mapping the fill aesthetic above is not strictly required, given that we are faceting by Variable anyway. Here is what the above code produces:
It can be easily argued that this is not the appropriate chart for your data given more context and knowledge about your data. You should add more detail to the question as others have commented.
