I'm trying to plot the seasonality of nesting and hatching for turtles on one graph - with the count of how many nests were laid/hatched for every day of the season (01/05/2021-30/09/2021). Some of my data is as follows:
Date - Laid Green - Hatched Green
14/05/2021 - 0 - 0
15/05/2021 - 0 - 0
16/05/2021 - 0 - 0
17/05/2021 - 0 - 0
18/05/2021 - 0 - 0
19/05/2021 - 0 - 0
20/05/2021 - 1 - 0
21/05/2021 - 2 - 0
22/05/2021 - 0 - 0
23/05/2021 - 1 - 0
24/05/2021 - 2 - 0
25/05/2021 - 0 - 0
26/05/2021 - 1 - 0
27/05/2021 - 4 - 0
When I then try to plot it with ggplot using:
ggplot(seasonality,aes(x=Date,y=seasonality$Laid Green))+geom_bar(stat="identity",width=1)
I get this:
I want to pool my data so that the plot is visually more pleasing, perhaps into 5-day bins, but I'm unsure how to do this. I am also trying to plot the hatched green nests on the same graph, with nesting and hatching in two different colours.
Any help is appreciated!
You can use the lubridate package to round each date down to the start of its week. dplyr (part of the tidyverse) can then sum the counts per week.
library(lubridate)
library(tidyverse)
# so our random dataframes look the same
set.seed(123)
# fake data
seasonality <- tibble(date = sample(seq(as.Date('2021-04-01'), as.Date('2021-06-01'), by = "day"),
                                    size = 100,
                                    replace = TRUE),
                      laid_green = sample(c(0:1),
                                          size = 100,
                                          replace = TRUE),
                      hatched_green = sample(c(0:1),
                                             size = 100,
                                             replace = TRUE)
                      ) %>%
  arrange(date)
# plot
seasonality %>%
  mutate(week = floor_date(date,
                           unit = 'week')
         ) %>%
  group_by(week) %>%
  summarise(laid_green = sum(laid_green),
            hatched_green = sum(hatched_green)) %>%
  pivot_longer(-week) %>%
  ggplot(aes(x = week, y = value, fill = name)) +
  geom_col(position = 'dodge')
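If you want 5-day bins rather than calendar weeks, one possible approach is to integer-divide the number of days since the season start by 5. The sketch below is only an illustration: it uses a few rows from the question, and the names season_start, bin_start, event and nests are my own choices, not anything from the original data. It also assumes the Date column was read in as dd/mm/yyyy text.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# A few rows in the question's layout (dates as dd/mm/yyyy strings)
seasonality <- data.frame(
  Date = c("20/05/2021", "21/05/2021", "22/05/2021", "23/05/2021",
           "24/05/2021", "25/05/2021", "26/05/2021", "27/05/2021"),
  laid_green = c(1, 2, 0, 1, 2, 0, 1, 4),
  hatched_green = 0
)

# Assumed first day of the season (taken from the question text)
season_start <- as.Date("2021-05-01")

pooled <- seasonality %>%
  mutate(
    date = as.Date(Date, format = "%d/%m/%Y"),
    # days since season start, integer-divided by 5, gives the bin index;
    # multiplying back by 5 gives the first day of that bin
    bin_start = season_start + 5 * (as.integer(date - season_start) %/% 5)
  ) %>%
  group_by(bin_start) %>%
  summarise(laid_green = sum(laid_green),
            hatched_green = sum(hatched_green)) %>%
  pivot_longer(-bin_start, names_to = "event", values_to = "nests")

# One bar per bin and event, side by side in two colours
p <- ggplot(pooled, aes(x = bin_start, y = nests, fill = event)) +
  geom_col(position = "dodge")
print(p)
```

Changing the 5 to another number gives a different bin width without touching the rest of the pipeline.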
I have a data frame with multiple columns. I want to create two boxplots, one for "secretary" and one for "driver", but the result is not satisfying, as the picture shows. This is my data and code:
profession ve.count.descrition euse.count.description Qualitative.result
secretary 0 1 -0.5
secretary 0 2 1
driver 1 1 -1
driver 0 2 0.3
data %>%
  mutate(Qualitative.result = factor(Qualitative.result)) %>%
  ggplot(aes(x = Profession , fill = Qualitative.result)) +
  geom_boxplot()
You should not convert Qualitative.result to a factor; it has to stay numeric so it can be mapped to the y axis. Maybe you want something like this:
library(tidyverse)
data %>%
ggplot(aes(x = Profession, y = Qualitative.result, fill = Profession)) +
geom_boxplot()
I have a dataset in wide format that describes lenders' characteristics for a banking credit system. I want to make a scatter plot using ggplot where the colours represent the purpose of the credit. My table looks like this, where 1 marks the purpose of the credit:
lending_duration lending_amount Car Furniture TV/RADIO House
1 month          2000           0   1         0        0
16 months        15600          1   0         0        0
4 month          13094          0   0         0        1
etc...
I tried:
ggplot(Data, aes(x = DURATION, y = AMOUNT))+ geom_point(aes(color = c(Car, Furniture, 'TV/Ratio', House))+ scale_color_viridis_c()
This does not work. Another question: how can I escape the / in a variable name, for example in TV/Radio here? I tried using '' to escape the / in the variable name, but that does not seem to work.
Can someone help me here? Much appreciated!
Here's a solution for both questions. You can refer to columns whose names contain special characters by wrapping the name in backticks, and then rename them if you like:
library(tidyverse)
library(RColorBrewer)
# your sample data in a df
df <- tibble(lending_duration = c("1 month", "16 month", "4 month"),
             lending_amount = c(2000, 15600, 13094),
             Car = c(0, 1 ,0),
             furniture = c(1,0,0),
             `TV/Radio` = c(0, 0, 0),
             House = c(0, 0, 1))
df %>% rename(TV_or_Radio = `TV/Radio`) %>%
  pivot_longer(cols = c(Car, furniture, TV_or_Radio, House)) %>%
  filter(value != 0) %>%
  # split string in lending_duration and use only first part converted to numeric,
  # allows to plot durations in increasing order
  mutate(lending_duration = as.numeric(str_split(lending_duration, " ") %>% map_chr(., 1))) %>%
  ggplot(aes(lending_duration, lending_amount, color = name)) +
  geom_point(size = 3) +
  scale_color_viridis_d() +
  xlab("lending_duration in month")
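On the escaping part of the question: single or double quotes create a character string, not a reference to the column, which is why 'TV/Radio' did not work. Backticks are what R uses for names that are not syntactically valid. A small sketch with a made-up two-row data frame (the values are hypothetical, only the column name comes from the question):

```r
library(dplyr)

# check.names = FALSE stops data.frame() from rewriting "TV/Radio" to "TV.Radio"
df <- data.frame(`TV/Radio` = c(0, 1), House = c(1, 0), check.names = FALSE)

# Backticks refer to the column; quotes would just be the text "TV/Radio"
tv_values <- df$`TV/Radio`
tv_rows <- df %>% filter(`TV/Radio` == 1)

print(tv_values)
print(tv_rows)
```

The same backtick syntax works inside aes(), so renaming the column is a convenience rather than a requirement.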
Consider a df that I would like to plot.
The exemplary df:
df
Entry A B C D Value
O60701 1 1 1 0 2.7181970
Q8WZ42 1 1 1 1 3.6679832
P60981 1 1 0 0 2.2974231
Q15047 1 0 0 0 0.5535473
Q9UER7 1 0 0 0 4.1030394
I want Entry to be on y axis and Value on x axis. Do you have any ideas how to create a plot, so that if a protein is found (==1) let us say in column A it would be a dot on a plot? Since we have four columns (A-D), there can be maximum 4 dots. Hence, I would like to be able to distinguish which dot (or any other shape) comes from which column.
Here is what I have so far:
ggplot(df, aes(x=Value, y=Entry)) +
geom_point(size=1) +
theme_ipsum()
library(tidyverse)
df %>%
  pivot_longer(cols = A:D) %>%
  # by default, pivot_longer creates `name` column with either A/B/C/D,
  # and a `value` column holding the original 0/1 value from those columns
  filter(value == 1) %>% # only plot if protein found (A/B/C/D==1)
  ggplot(aes(Value, Entry, color = name)) +
  geom_jitter(height = 0.1, width = 0.1) + # since you have multiple points at the same locations
  hrbrthemes::theme_ipsum()
Could anybody please help me use ggplot2 in R to draw a barplot that shows the columns (first, second, third, fourth, fifth) on the x axis and their values on the y axis, without showing the column "uname"?
> head(golQ1Grades)
qname uname first second third fourth fifth
1 onlinelernen_quiz_1 xxx 100 0 0 0 0
2 onlinelernen_quiz_2 xxxx 100 0 0 0 0
3 onlinelernen_quiz_4 xxxx 42 71 0 0 0
4 onlinelernen_quiz_7 xxxx 85 100 0 0 0
5 onlinelernen_quiz_1 xxx 85 100 0 0 0
6 onlinelernen_quiz_3 xxxx 71 0 0 0 0
Thanks in advance for the help.
It is my guess that you would like to display the mean value on the Y-axis.
library(ggplot2)
Data
dat<-data.frame(c(100,100,42,85,85,71), c(0,0,71,100,100,0), c(0,0,0,0,0,0), c(0,0,0,0,0,0), c(0,0,0,0,0,0))
names(dat)<-NULL
Compute mean and get new data
v1<-apply(dat, 2, mean)
nv1<-c("first","second","third", "fourth","fifth")
ndat<-data.frame(nv1, v1)
Plot
p <- ggplot(ndat, aes(factor(nv1), v1))
p + geom_bar(stat="identity")
I think the better option is dplyr and tidyr. For example (I changed the data.frame a little):
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(qname = letters[1:10],
first = seq(1,10,1),
second = seq(10,100,10),
third = seq(2,20,2))
Then use gather() to reshape the data into long format:
df <- df %>%
gather(variable, value, -qname)
In your case it will be:
df <- golQ1Grades %>%
gather(variable,value, -qname, -uname)
Furthermore, instead of computing the average value, facet_grid can also be very helpful:
ggplot(df, aes(factor(qname), value)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ variable)
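For what it's worth, tidyr now marks gather() as superseded in favour of pivot_longer(). Below is a sketch of the mean-per-column barplot using pivot_longer() on the six rows shown in the question. The names attempt, score and mean_score are just my own labels, and I am assuming (like the first answer) that the mean is what should go on the y axis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The six rows printed by head(golQ1Grades) in the question
golQ1Grades <- data.frame(
  qname = c("onlinelernen_quiz_1", "onlinelernen_quiz_2", "onlinelernen_quiz_4",
            "onlinelernen_quiz_7", "onlinelernen_quiz_1", "onlinelernen_quiz_3"),
  uname = c("xxx", "xxxx", "xxxx", "xxxx", "xxx", "xxxx"),
  first = c(100, 100, 42, 85, 85, 71),
  second = c(0, 0, 71, 100, 100, 0),
  third = 0, fourth = 0, fifth = 0
)

attempt_levels <- c("first", "second", "third", "fourth", "fifth")

means <- golQ1Grades %>%
  # only the five score columns are reshaped, so uname never reaches the plot
  pivot_longer(all_of(attempt_levels), names_to = "attempt", values_to = "score") %>%
  # explicit factor levels keep the bars in first..fifth order, not alphabetical
  mutate(attempt = factor(attempt, levels = attempt_levels)) %>%
  group_by(attempt) %>%
  summarise(mean_score = mean(score))

p <- ggplot(means, aes(x = attempt, y = mean_score)) +
  geom_col()
print(p)
```

Setting the factor levels explicitly matters here because a plain character or factor(nv1) x axis would be sorted alphabetically (fifth, first, fourth, ...).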
I need to create some box plots showing the abundance of some bacterial taxa in different samples.
My data looks like:
my.data <- "Taxon 06.TO.VG 21.TO.V 02.TO.VG 41.TO.VG 30.TO.V 04.BA.V 34.TO.VG 01.BA.V 28.TO.VG 18.TO.O 44.TO.V 08.BA.O 07.BA.O 06.BA.V 11.TO.V 06.BA.VG 07.BA.VG 05.BA.VG 07.BA.V 05.BA.V 06.BA.O 02.BA.O 04.BA.O 01.BA.O 05.BA.O 03.BA.O 02.BA.VG 03.BA.V 02.BA.V 04.BA.VG 03.BA.VG 01.BA.VG 15.TO.O 31.TO.O 09.TO.O 27.TO.V 42.TO.VG 08.TO.VG 16.TO.O 07.TO.V 13.TO.O 32.TO.V 29.TO.VG 10.TO.V 25.TO.V 05.TO.VG 20.TO.O 19.TO.V 17.TO.O 35.TO.V 43.TO.O 24.TO.V 26.TO.VG 01.TO.VG 37.TO.O 04.TO.VG 33.TO.O 39.TO.VG 14.TO.O 12.TO.O 38.TO.VG 22.TO.O
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877 0.097387173 0.086673889 0.432662192 0.060246679 0.269535674 0.152713335 0.014511873 0.063421323 0.091253905 0.139856373 0.013677012 0.200847907 0.180712032 0.21332737 0.031756181 0.272166702 0.019861211 0.133804422 0.168692685 0.100862392 0.152431791 0.104702194 0.119352089 0.410334347 0.024104844 0.0493905 0.068065382 0.047854785 0.011860175 0.168986083 0.015748031 0.407974482 0.264409881 0.250364431 0.330547112 0.536443695 0.578045113 0.400459167 0.204446209 0.357879234 0.242751388 0.488863722 0.521495803 0.001852281 0.045638126 0.503566932 0.069072806 0.171181339 0.183629007 0.371751412 0.385231317 0.023690205 0.255697356 0.104054054 0.242741552 0.043973941 0.221033868 0.004587156
Prevotella 0.073080791 0.302011096 0.586048042 0.487603306 0.290973872 0.014897075 0 0.333254269 0.029445074 0 0.153034301 0.002399726 0.025658188 0.090664273 0.440294582 0.100688924 0 0 0 0 0 0.000227946 0.093623374 0 0.000197707 0.115987461 0.076442171 0 0.047507606 0.000210172 0.000243962 0.042079208 0.52184769 0 0.394750656 0 0 0.235787172 0 0.000936856 0.000300752 0 0.051607781 0 0 0 0.002289494 0.735586941 0.023828756 0 0.011200996 0 0.046374105 0 0.00044484 0.085421412 0.000455789 0.306756757 0 0.11970684 0.008912656 0.371559633"
I'm wondering about using ggplot2 to do the box plot, but I'm not sure how the data have to be formatted.
I tried this:
df <- read.csv("my.data", header=T)
ggplot(data = df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Taxon))
but it gave me an error saying that the variable was not found...
Can anyone help me?
Many thanks
Francesca
A quick example of how to format your data:
categs = sample(LETTERS[1:3], 120, TRUE)
y = c(rnorm(40), rnorm(40, 3, 2), rnorm(40, 5, 3))
# example dataset
dados = data.frame(categs, y)
require(ggplot2)
ggplot(dados) + geom_boxplot(aes(x = categs, y = y))
# categs y
#1 B 0.7392673
#2 B -0.1694076
#3 A -2.3804024
#4 B 0.5999949
#5 A 0.5816400
#6 A 2.1263669
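To get from the wide table in the question to that long layout, something like the following might work. It is only a sketch: I have kept just the first four sample columns from the question to keep it short, and the column names sample and abundance are my own. It assumes the data is whitespace-separated text held in a string, so read.table(text = ...) is used instead of read.csv("my.data"), which would look for a file literally named my.data.

```r
library(tidyr)
library(ggplot2)

# First four sample columns from the question, kept short for illustration
my.data <- "Taxon 06.TO.VG 21.TO.V 02.TO.VG 41.TO.VG
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877
Prevotella 0.073080791 0.302011096 0.586048042 0.487603306"

# text = parses the string itself; check.names = FALSE keeps sample
# names such as "06.TO.VG" exactly as written
wide <- read.table(text = my.data, header = TRUE, check.names = FALSE)

# One row per taxon/sample pair, which is the shape geom_boxplot() expects
long <- pivot_longer(wide, cols = -Taxon,
                     names_to = "sample", values_to = "abundance")

p <- ggplot(long, aes(x = Taxon, y = abundance, fill = Taxon)) +
  geom_boxplot()
print(p)
```

The "variable not found" error in the question came from asking ggplot for columns named variable and value before any reshaping had created them.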
See also http://ggplot2.org/