I have a dataset that looks like this:
01/02/2013 02/02/2013 03/02/2013 04/02/2013
1 2 3 3
2 1 6 7
3 3 4 2
4 1 1 8
I want to make a graph with n boxplots according to the number of the columns in my dataset, where each boxplot only contains one variable which is its corresponding column. So in this case, there would be 4 boxplots.
I used the boxplot() function and it worked for my data, but I want to use geom_jitter() from ggplot2 to beautify my plots. ggplot2 requires both an x and a y variable, which my dataset doesn't really have in its current shape.
This is what I want for my plot:
Bring your data into long format with pivot_longer() from the tidyr package (part of the tidyverse).
Use ggplot() from the ggplot2 package (also part of the tidyverse).
Add geom_boxplot(), and geom_jitter() if needed.
library(tidyverse)
df %>%
mutate(id = row_number()) %>%
pivot_longer(
cols = starts_with("X"),
names_to = "names",
values_to = "values"
) %>%
ggplot(aes(x=names, y=values, fill=names))+
geom_boxplot() +
geom_jitter(aes(y=values))
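The answer's starts_with("X") relies on R having prefixed the date headers with "X" when the file was read (the default behaviour of read.table() and read.csv() for names that start with a digit). Here is a minimal self-contained sketch that sidesteps that by selecting every column instead. The data frame is my own retyping of the question's table, assuming all four columns are data and there are no row names.

```r
library(tidyr)
library(ggplot2)

# Assumption: the question's table retyped by hand; check.names = FALSE
# keeps the date strings as column names instead of prefixing them with "X".
df <- data.frame(
  "01/02/2013" = c(1, 2, 3, 4),
  "02/02/2013" = c(2, 1, 3, 1),
  "03/02/2013" = c(3, 6, 4, 1),
  "04/02/2013" = c(3, 7, 2, 8),
  check.names = FALSE
)

# One row per (column, value) pair: the column name becomes the x variable.
long <- pivot_longer(df, cols = everything(),
                     names_to = "names", values_to = "values")

# outlier.shape = NA hides the boxplot's own outlier points so they are not
# drawn twice; height = 0 jitters the points horizontally only.
p <- ggplot(long, aes(x = names, y = values, fill = names)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0)

print(p)
```

Jittering only along x keeps every point at its true y value, which matters with integer data like this.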
I have this data that I would like to plot, with all the lines on the same graph:
> ndiversity
Quadrant nta.shannon ntb.shannon ntc.shannon
1 1 2.188984 0.9767274 1.8206140
2 2 1.206955 1.3240481 1.3007058
3 3 1.511083 0.5805081 0.7747041
4 4 1.282976 1.4222243 0.4843907
5 5 1.943930 1.7337267 1.5736545
6 6 2.030524 1.8604619 1.6860711
7 7 2.043356 1.5707110 1.5957869
8 8 1.421275 1.4363365 1.5456799
Here is the code that I am using to try to plot it:
ggplot(ndiversity,aes(x=Quadrant,y=Diversity,colour=Transect))+
geom_point()+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
But all I am getting is the error
data must be a data frame, or other object coercible by fortify(), not a numeric vector.
Can someone tell me what I'm doing wrong?
Typically, rather than using multiple geom_line calls, we would only have a single call, by pivoting your data into long format. This would create a data frame of three columns: one for Quadrant, one containing labels nta.shannon, ntb.shannon and ntc.shannon, and a column for the actual values. This allows a single geom_line call, with the label column mapped to the color aesthetic, which automatically creates an instructive legend for your plot too.
library(tidyverse)
as.data.frame(ndiversity) %>%
pivot_longer(-1, names_to = 'Type', values_to = 'Shannon') %>%
mutate(Type = substr(Type, 1, 3)) %>%
ggplot(aes(Quadrant, Shannon, color = Type)) +
geom_line(size = 1.5) +
theme_minimal(base_size = 16) +
scale_color_brewer(palette = 'Set1')
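To see what the pivot produces before plotting, here is a small sketch using only the first three rows of the question's data. It assumes ndiversity is a numeric matrix, which is my guess based on the as.data.frame() call above; the question does not say how the object was created.

```r
library(dplyr)
library(tidyr)

# Assumption: ndiversity is a numeric matrix (first three rows only).
ndiversity <- cbind(
  Quadrant    = 1:3,
  nta.shannon = c(2.188984, 1.206955, 1.511083),
  ntb.shannon = c(0.9767274, 1.3240481, 0.5805081),
  ntc.shannon = c(1.8206140, 1.3007058, 0.7747041)
)

long <- as.data.frame(ndiversity) %>%
  # Keep column 1 (Quadrant) as the identifier and stack the rest.
  pivot_longer(-1, names_to = "Type", values_to = "Shannon") %>%
  # Shorten "nta.shannon" to "nta", and so on.
  mutate(Type = substr(Type, 1, 3))

print(long)
```

Each quadrant now appears once per transect, so a single geom_line() with color = Type draws all three lines.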
For posterity:
Convert to a data frame:
ndiversity <- as.data.frame(ndiversity)
Get rid of the excess code:
ggplot(ndiversity,aes(x=Quadrant))+
geom_line(aes(y=nta.shannon),colour="red")+
geom_line(aes(y=ntb.shannon),colour="blue")+
geom_line(aes(y=ntc.shannon),colour="green")
profit
not the prettiest graph I ever made
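If you stay with the wide format and three geom_line() calls, one way to still get a legend is to map colour to a label inside aes() and set the actual colours with scale_colour_manual(). This is only a sketch: the labels "nta", "ntb", "ntc" and the axis title are my own choices, and the data frame holds just the first three rows of the question's data.

```r
library(ggplot2)

# First three rows of the question's data, retyped as an example.
ndiversity <- data.frame(
  Quadrant    = 1:3,
  nta.shannon = c(2.188984, 1.206955, 1.511083),
  ntb.shannon = c(0.9767274, 1.3240481, 0.5805081),
  ntc.shannon = c(1.8206140, 1.3007058, 0.7747041)
)

# A constant string inside aes() becomes a legend entry; the manual scale
# then assigns the real colour to each label.
p <- ggplot(ndiversity, aes(x = Quadrant)) +
  geom_line(aes(y = nta.shannon, colour = "nta")) +
  geom_line(aes(y = ntb.shannon, colour = "ntb")) +
  geom_line(aes(y = ntc.shannon, colour = "ntc")) +
  scale_colour_manual(
    name = "Transect",
    values = c(nta = "red", ntb = "blue", ntc = "green")
  ) +
  labs(y = "Shannon diversity")

print(p)
```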
I have the following data frame in R:
> data <- data.frame(tbi_military[0:4])
> data
Severity Active Guard Reserve
1 Penetrating 189 33 12
2 Severe 102 26 11
3 Moderate 709 177 63
4 Mild 5896 1332 541
5 Not Classifiable 122 29 12
And when I do barplot(as.matrix(data)) I get the following output:
[Image: barplot output]
Is there a way for me to get rid of the severity on the x-axis to only have Active, Guard, Reserve? Thanks
One option is to send only the data you want to plot to the plotting function. In this case you want all columns from the second to the last (number four), so a small adjustment to your function call does the job:
barplot(as.matrix(data[, 2:4]))
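A variation on the same idea, as a sketch: select the columns by name and move Severity into the row names, so barplot() can use it for a legend rather than plotting it. The data frame below is retyped from the question's printed output; the name counts is my own.

```r
# Data retyped from the question's printed output.
data <- data.frame(
  Severity = c("Penetrating", "Severe", "Moderate", "Mild", "Not Classifiable"),
  Active   = c(189, 102, 709, 5896, 122),
  Guard    = c(33, 26, 177, 1332, 29),
  Reserve  = c(12, 11, 63, 541, 12)
)

# Keep only the numeric columns, then label the rows with the severity.
counts <- as.matrix(data[, c("Active", "Guard", "Reserve")])
rownames(counts) <- data$Severity

# legend.text = TRUE makes barplot() use the row names as legend labels.
barplot(counts, legend.text = TRUE)
```

Selecting by name rather than by position keeps the call working if the column order changes.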
A solution within the tidyverse (dplyr, tidyr and ggplot2) would be this:
library(dplyr)
library(tidyr)
library(ggplot2)
data %>%
# get data in tidy format to be able to use ggplot2 efficiently
tidyr::pivot_longer(-Severity, names_to = "Type", values_to = "Value") %>%
# set up the plot by assigning variable to plot
ggplot2::ggplot(aes(Type, Value, fill = Severity)) +
# put out a bar chart with stat parameter set for stacked barchart
ggplot2::geom_bar(stat = "identity")
I have a dataframe with a column of 'Y' or 'N' for 2 groups, e.g.:
drug<-c("Y","Y","N","Y","Y","Y","N","N","N","N","N","Y","Y","Y","N","N")
group<-c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)
df<-data.frame(drug,group)
I want to make barplots of the 'Y'/'N' for both groups with the two groups beside each other.
I've tried various things with ggbarplot and get weird plots out:
ggbarplot(my_matches, x = "group", y = "drug",
color = "group", palette = c("#00AFBB", "#FC4E07"))
I have also tried making tables and plotting them as barplots, like this:
counts0<-df[which(df$group==0),]
counts1<-df[which(df$group==1),]
grp0<-table(counts0$drug)
grp1<-table(counts1$drug)
s<- as.data.frame(t(rbind(grp0,grp1)))
barplot(s$grp0, s$grp1,beside=T)
As you can tell, I'm a beginner and have been driving myself mad trying to solve this. Please help!
First, there's no need to create the vectors separately before building the data frame, and df is not a great variable name (there's a function of the same name). Create your data frame in one step like this:
mydata <- data.frame(drug = c("Y","Y","N","Y","Y","Y","N","N","N","N","N","Y","Y","Y","N","N"),
group = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1))
Second: if you're working with data frames, it's worth learning dplyr. So install it, along with ggplot2, then load:
library(dplyr)
library(ggplot2)
Now we can count Y/N by group:
mydata %>%
count(group, drug)
# A tibble: 4 x 3
group drug n
<dbl> <fct> <int>
1 0 N 3
2 0 Y 5
3 1 N 5
4 1 Y 3
And plot counts versus group. We need to convert the groups to factors, since group is a categorical variable:
mydata %>%
count(group, drug) %>%
mutate(group = factor(group)) %>%
ggplot(aes(group, n)) +
geom_col(aes(fill = drug))
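The code above stacks Y and N within each group. Since the question asks for the two groups beside each other, here is a possible variant that puts drug on the x-axis and dodges the bars by group. Treat it as a sketch; the name counts is my own.

```r
library(dplyr)
library(ggplot2)

mydata <- data.frame(
  drug  = c("Y","Y","N","Y","Y","Y","N","N","N","N","N","Y","Y","Y","N","N"),
  group = c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)
)

# One row per group/drug combination, with group as a factor.
counts <- mydata %>%
  count(group, drug) %>%
  mutate(group = factor(group))

# position = "dodge" draws the two groups side by side instead of stacked.
p <- ggplot(counts, aes(x = drug, y = n, fill = group)) +
  geom_col(position = "dodge")

print(p)
```

Swapping drug and group in aes() gives the other grouping: Y and N side by side within each group.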
I am trying to create a stacked bar chart in ggplot2 to display the percentage of values corresponding to each categorical variable. Here's an example of the data that I am trying to work with.
sampledf <- data.frame("Death" = rep(0:1, each = 5),
"HabitA" = rep(0:1, c(3, 7)),
"HabitB" = rep(1:2, c(4, 6)),
"HabitC" = rep(0:1, c(6, 4)))
Each of the habits are the columns that I am using to create the stacked bar chart, and I want to use the Death column in facet_grid. I'm looking to show the percentage of values for each habit in the bar chart.
The output data I think I need for the chart should show that, under Death = 0, 60% of the HabitA values are 0 and 40% are 1, while under Death = 1, 100% of the HabitA values are 1.
I have produced charts like this using ggplot and group_by, summarise for only one attribute, but I am not sure how this works with multiple categorical attributes in the data.
sampledf %>%
group_by(Death, HabitA) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
This produces what I want for just one variable, but when I include another attribute in the group_by call, it returns counts and percentages for a combination of all 3 attributes, which is not what I am looking for. I tried using summarise_at/mutate_at, but it doesn't seem to work.
sampledf %>%
group_by(Death) %>%
mutate_at(c("HabitA", "HabitB"), Counts = n())
Is there a straightforward way to do this in R, and use the resulting data as input for ggplot2?
Edit:
I tried reshaping the data and using the long form to build my plot. Here's what I have.
long <- melt(sampledf, id.vars = c("Death"))
The resulting data is in this format.
Death variable value
1 0 HabitA 0
2 0 HabitA 0
3 0 HabitA 0
4 0 HabitA 1
5 0 HabitA 1
6 1 HabitA 1
7 1 HabitA 1
I'm not sure how to use the value attribute to build the plot, because the ggplot I am currently trying to build is counting the total number of times each level occurs in the variable column.
ggplot(long, aes(x = variable, fill = variable)) +
geom_bar(stat = "count", position = "dodge") + facet_grid(~ Death)
Try this; it may not be the most straightforward approach, but it works. It reshapes the data with gather, as @aosmith suggested, then counts the observations in each Death/habitat/count group, converts those counts to a proportion within each Death/habitat group, and finally summarizes to keep one row per unique combination.
sampledf_edited <- sampledf %>%
tidyr::gather("habitat", "count", 2:4) %>%
group_by(Death, habitat, count) %>%
mutate(observation = n()) %>%
ungroup() %>%
group_by(Death, habitat) %>%
mutate(percent = observation/n()) %>%
ungroup() %>%
group_by(Death, habitat, count, percent) %>%
summarize()
It is necessary to make count a factor.
sampledf_edited$count <- as.factor(sampledf_edited$count)
Plot with ggplot.
ggplot(sampledf_edited, aes(habitat, percent, fill = count)) +
geom_bar(stat = "identity") +
facet_grid(~ Death)
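The same proportions can probably be reached with fewer steps using pivot_longer() and count(). This is a sketch rather than a drop-in replacement: the column names habit, level and percent are my own, and the percent axis labels come from scales::percent.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

percents <- sampledf %>%
  # Stack the three habit columns into habit/level pairs.
  pivot_longer(-Death, names_to = "habit", values_to = "level") %>%
  # Count each level within each Death/habit combination.
  count(Death, habit, level) %>%
  # Turn counts into proportions within each Death/habit combination.
  group_by(Death, habit) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  # A factor gives discrete fill colours instead of a gradient.
  mutate(level = factor(level))

p <- ggplot(percents, aes(habit, percent, fill = level)) +
  geom_col() +
  facet_grid(~ Death) +
  scale_y_continuous(labels = scales::percent)

print(p)
```

For Death = 0, HabitA comes out as 60% zeros and 40% ones, matching the figures in the question.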
If your question has been answered, please make sure to accept an answer for further references.
EDIT: Added the boxplot generated with standard boxplot() function.
Given the iris dataset, the following code:
boxplot(iris[,])
Creates a boxplot with five boxes, one for each variable, without splitting them into categories such as, for instance, species. While this is simple enough, I have been unable to do the same in ggplot2.
My question, then, is simple: how can I achieve this?
Species is a factor with three levels (setosa, versicolor and virginica). I think it doesn't make sense if you plot it with the other variables.
It makes more sense to plot the other 4 variables (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) in one plot, as below:
library(dplyr)
library(tidyr)
library(ggplot2)
iris %>% dplyr::select(Species, everything()) %>% tidyr::gather("id", "value",2:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
If you want to plot all 5 variables in the same plot, you need to convert Species to numeric:
iris %>% dplyr::mutate(Species = as.numeric(Species)) %>% tidyr::gather("id", "value",1:5) %>%
ggplot(., aes(x = id, y = value))+geom_boxplot()
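gather() still works, but tidyr now marks it as superseded by pivot_longer(). A rough equivalent of the four-variable plot, selecting the numeric columns by type rather than by position, might look like this (the name iris_long is my own):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stack every numeric column; Species is a factor, so it is left alone.
iris_long <- iris %>%
  pivot_longer(where(is.numeric), names_to = "id", values_to = "value")

p <- ggplot(iris_long, aes(x = id, y = value)) +
  geom_boxplot()

print(p)
```

Because Species stays in the long data, adding fill = Species to aes() would split each box by species.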