I want to arrange my bars by beetle number in this particular order: 0, then 1-5, 6-10, 11-15, and Above 15. I also want Village placed first, then Municipality. The plots should also be arranged by the age of the building: Under 5 years first, then 5-10 years, followed by Above 10 years.
ggplot(g,aes(x=Locality.Division))+
geom_bar(aes(fill=Number.of.Beetle),position="dodge")+
facet_wrap(~Building.Age)
#> Error in ggplot(g, aes(x = Locality.Division)): could not find function "ggplot"
Created on 2021-05-30 by the reprex package (v2.0.0)
The order of the bars is determined by the order of the factor levels of the variable.
The Number.of.Beetle variable in your data is a character variable. ggplot() converts this to a factor variable with factor(), which by default sorts character values alphabetically. To specify a different order, convert the variable to a factor yourself before plotting:
library(dplyr)
# the levels must match the values in your data exactly, otherwise they become NA
g <- mutate(g,
Number.of.Beetle = factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+"))
)
If the order is shown backwards, then also use forcats::fct_rev() to reverse the order:
g <- mutate(g,
Number.of.Beetle = forcats::fct_rev(factor(Number.of.Beetle, levels = c("1-5", "6-10", "11-15", "15+")))
)
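A minimal, self-contained sketch of the same idea applied to all three variables from the question. The data frame `g_demo` is simulated, and the exact label strings (such as "Above 15" or "Under 5 years") are assumptions taken from the question text. They have to match the values in the real data exactly, otherwise `factor()` turns them into NA.

```r
library(ggplot2)

# Label strings assumed from the question; adjust to the values in your data.
beetle_levels   <- c("0", "1-5", "6-10", "11-15", "Above 15")
locality_levels <- c("Village", "Municipality")
age_levels      <- c("Under 5 years", "5-10 years", "Above 10 years")

# Simulated stand-in for the real data frame `g` (hypothetical values).
set.seed(1)
g_demo <- data.frame(
  Number.of.Beetle  = sample(beetle_levels, 60, replace = TRUE),
  Locality.Division = sample(locality_levels, 60, replace = TRUE),
  Building.Age      = sample(age_levels, 60, replace = TRUE)
)

# Fix the order of each variable by setting its factor levels explicitly.
g_demo$Number.of.Beetle  <- factor(g_demo$Number.of.Beetle, levels = beetle_levels)
g_demo$Locality.Division <- factor(g_demo$Locality.Division, levels = locality_levels)
g_demo$Building.Age      <- factor(g_demo$Building.Age, levels = age_levels)

# Bars, x-axis categories and facet panels now follow the level order.
p <- ggplot(g_demo, aes(x = Locality.Division)) +
  geom_bar(aes(fill = Number.of.Beetle), position = "dodge") +
  facet_wrap(~Building.Age)
p
```

Setting the levels once on the data, rather than inside the plotting call, keeps the same order in every later plot or table built from it.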
I hope the following helps to get you started. You did not provide a minimal reproducible example, so I simulated some data. I also adapted the variable names.
A key strategy to control the order of a variable is to make it a factor. I do this when plotting.
Note: with the values used here, the default order of the beetle counts is only roughly sorted (for example, "6-10" sorts after "15+"). You could convert it to a factor with explicit levels as well, if needed.
library(ggplot2)
set.seed(666) # fix random picks for replicability
# simulate data of 30 buildings
df <- data.frame(
Building = 1:30
, Building.Age = sample(x = c("U5","5-10","A10"), size = 30, replace = TRUE)
, Nbr.Beetle = sample(x = c("1-5","6-10","11-15","15+"), size = 30, replace = TRUE)
, Locality = sample(x = c("A","B","C"), size = 30, replace = TRUE))
# plot my example
ggplot(data = df, aes(x=Locality)) +
geom_bar(aes(fill=Nbr.Beetle),position="dodge") +
# --------------------- control the sequence of panels by forcing level sequence of factor
facet_wrap(. ~ factor( Building.Age, levels = c("U5","5-10","A10") ) )
What I want to do
My dataset consists of several cases (id) with different outcomes (outcome) for a given number of repeated measures (cycle). Each cycle should be counted as 1 (val) or be visualized with equal length.
The plot I want to end up with is a stacked bar chart, where each cycle of each case has the same length. The sequence of cycles must be continuous. The sequence of the outcomes depends on the corresponding cycles.
My Problem
The sample code below produces a bar chart that sums up the cycles (even though cycle is a factor). However, using the val column instead of cycle messes with the sequence of the outcomes, which must not change.
# setup
library(ggplot2)
library(dplyr)
set.seed(0)
# test data
data.frame(
cycle=factor(rep(1:8,2),levels=1:8),
val=1,
id=factor(rep(1:2,each=8)),
outcome=factor(paste("Outcome",sample(1:8,16,T)),levels=paste("Outcome",1:8))) %>%
# plot
ggplot(.,aes(id,cycle,fill=outcome))+
geom_bar(stat="identity",position=position_stack(reverse=T),width=0.99)+
coord_flip()
My Question
Is it possible to make cycles count as 1 for each id, keeping the outcome sequence?
Thank you in advance!
The Plots
This is what I get when using the above code:
This is what I get, when using val instead of cycle:
The goal is to keep the outcome sequence, while counting each cycle as 1 or making them appear of the same length for each id.
If I understand correctly, you could achieve your desired result using geom_tile:
library(ggplot2)
set.seed(0)
dat <- data.frame(
cycle = factor(rep(1:8, 2), levels = 1:8),
val = 1,
id = factor(rep(1:2, each = 8)),
outcome = factor(paste("Outcome", sample(1:8, 16, T)), levels = paste("Outcome", 1:8))
)
ggplot(dat, aes(cycle, id, fill = outcome)) +
geom_tile()
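If a stacked bar is preferred over tiles, one possible sketch is to stack `val` and set `group = cycle`, since `position_stack()` stacks by group rather than by fill. This reuses the `dat` data frame from the answer above (rebuilt here so the block runs on its own); `p_bar` is a hypothetical name, and the approach has not been checked against the asker's real data.

```r
library(ggplot2)

set.seed(0)
dat <- data.frame(
  cycle = factor(rep(1:8, 2), levels = 1:8),
  val = 1,
  id = factor(rep(1:2, each = 8)),
  outcome = factor(paste("Outcome", sample(1:8, 16, TRUE)),
                   levels = paste("Outcome", 1:8))
)

# Each row contributes a segment of length val = 1. Grouping by cycle makes
# the stacking follow the cycle sequence instead of the outcome levels.
p_bar <- ggplot(dat, aes(id, val, fill = outcome, group = cycle)) +
  geom_col(position = position_stack(reverse = TRUE), width = 0.99) +
  coord_flip()
p_bar
```

Compared with `geom_tile()`, this keeps a numeric axis that counts the cycles, which may or may not be wanted.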
I'm very new to R so I'm sorry if this is something really simple.
I've had a look on a bunch of cheat sheets and can't see anything obvious.
I have a simple set of data that has date, temperature, and 4 different factors (based on the bloom of a tree // 1 = "", 2 = "bloom", 3 = "full", 4 = "scatter")
What I want to do, but have no idea how, is to do a scatter plot of the date and temperature of each factor individually.
One approach is to use ggplot2 with facet_wrap. First, be sure to set the level names of the Bloom factor so the plots will label usefully.
Then we use ggplot to plot the data, grouping and coloring by the Bloom factor. Finally, we add facet_wrap with the formula . ~ Bloom, which draws one panel per level of Bloom.
library(ggplot2)
levels(TreeData$Bloom) <- c("None","Bloom","Full","Scatter")
ggplot(TreeData, aes(x=Date,y=Temp,group = Bloom, color = Bloom)) +
geom_point(show.legend = FALSE) +
facet_wrap(. ~ Bloom)
Per your comment, if you wanted individual graphs you could use base R subsetting with TreeData[TreeData$Bloom == "Full",]. Note that "Full" is the factor level we set earlier.
ggplot(TreeData[TreeData$Bloom == "Full",], aes(x=Date,y=Temp)) +
geom_point() + labs(title="Full Bloom")
Data
set.seed(1)
TreeData <- data.frame(
Date = rep(seq.Date(from = as.Date("2019-04-01"), to = as.Date("2019-08-01"), by = "week"), each = 10),
Temp = round(runif(n = 180, min = 22, max = 38)),
Bloom = as.factor(sample(1:4, 180, replace = TRUE))
)
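If a separate plot object per bloom stage is needed, a short sketch is to split the data by `Bloom` and build one plot per piece. It recreates the simulated `TreeData` so it runs on its own; `pieces` and `plots` are hypothetical names.

```r
library(ggplot2)

set.seed(1)
TreeData <- data.frame(
  Date  = rep(seq.Date(from = as.Date("2019-04-01"), to = as.Date("2019-08-01"),
                       by = "week"), each = 10),
  Temp  = round(runif(180, min = 22, max = 38)),
  Bloom = factor(sample(1:4, 180, replace = TRUE), levels = 1:4,
                 labels = c("None", "Bloom", "Full", "Scatter"))
)

# split() returns one data frame per factor level, named after the level.
pieces <- split(TreeData, TreeData$Bloom)

# Build one scatter plot per bloom stage, titled with the stage name.
plots <- lapply(names(pieces), function(stage) {
  ggplot(pieces[[stage]], aes(x = Date, y = Temp)) +
    geom_point() +
    labs(title = stage)
})
names(plots) <- names(pieces)

plots[["Full"]]  # display a single stage
```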
I've tried to search for an answer, but can't seem to find the right one that does the job for me.
I have a dataset (data) with two variables: people's ages (age) and number of awards (awards)
My objective is to plot the number of awards against age in R. FYI, a person can have multiple awards and people can have the same age.
I tried to plot a histogram and barplot, but the problem with that is that it counts the number of observations instead of summing the number of awards.
A sample dataset:
age <- c(21,22,22,25,30,34,45,26,37,46,49,21)
awards <- c(0,3,2,1,0,0,1,3,1,1,1,1)
data <- data.frame(cbind(age,awards))
What I'm looking for is a histogram (or barplot) that represents this data.
Ideally, I'd want the ages to be split into age groups. For example,
20-30, 31-40, 41-50 and then the total number of awards for each group.
The age group would be on the x-axis and the total number of awards for each age group would be on the y-axis.
Thanks!
We can use the aggregate function and then use the ggplot2 package. I don't make too many barplots in base R these days so I'm not sure of the best way to do it without loading ggplot2:
create sample data
#data
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
awards = rpois(200, 3))
head(dat)
  age awards
1  28      2
2  44      6
3  32      3
4  47      3
5  49      2
6  21      5
By age
#aggregate
sum_by_age <- aggregate(awards ~ age, data = dat, FUN = sum)
library(ggplot2)
ggplot(sum_by_age, aes(x = age, y = awards))+
geom_bar(stat = 'identity')
By age group
#create groups
dat$age_group <- ifelse(dat$age <= 30, '20-30',
ifelse(dat$age <= 40, '31-40',
'41-50'))
sum_by_age_group <- aggregate(awards ~ age_group, data = dat, FUN = sum)
ggplot(sum_by_age_group, aes(x = age_group, y = awards))+
geom_bar(stat = 'identity')
Note
We could skip the aggregate step altogether and just use:
ggplot(dat, aes(x = age, y = awards)) + geom_bar(stat = 'identity')
but I don't prefer that way because I think having an intermediate data step may be useful within your analytical pipeline for comparisons other than visualizing.
For completeness, I am adding the base R solution to @bouncyball's great answer. I will use their synthetic data, but I will use cut to create the age groups before aggregation.
# Create data for plotting
set.seed(123)
dat <- data.frame(age = sample(20:50, 200, replace = TRUE),
awards = rpois(200, 3))
# Create a new column containing the age groups
dat[["ageGroups"]] <- cut(dat[["age"]], c(-Inf, 20, 30, 40, Inf),
right = FALSE)
cut divides a numeric vector into intervals based on the breaks given in the second argument. right = FALSE makes the intervals closed on the left and open on the right, so each group includes its lower bound rather than its upper one (i.e. 20 <= x < 30 rather than the default of 20 < x <= 30). The groups do not have to be equally spaced. If you do not want to include data above or below a certain value, remove the Inf from the end or the -Inf from the beginning, respectively, and cut will return <NA> for those values instead. If you would like to give your groups names, you can do so with the labels argument.
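The boundary behaviour described above can be checked on a few hand-picked values. This is only an illustrative sketch; the vector `ages` and the label strings are made up for the example.

```r
ages <- c(20, 29, 30, 39, 40, 50)

# right = FALSE: intervals are closed on the left, so 30 falls in [30,40).
left_closed <- cut(ages, breaks = c(20, 30, 40, Inf), right = FALSE,
                   labels = c("20-29", "30-39", "40+"))

# Default right = TRUE: intervals are closed on the right, so 30 falls in (20,30].
# The lowest break itself (20) is excluded unless include.lowest = TRUE.
right_closed <- cut(ages, breaks = c(20, 30, 40, Inf))

# Without Inf as the last break, values of 40 and above become NA.
no_upper <- cut(ages, breaks = c(20, 30, 40), right = FALSE)

# Side-by-side comparison of the three groupings.
data.frame(ages, left_closed, right_closed, no_upper)
```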
Now we can aggregate based on the groups we created.
(summedGroups <- aggregate(awards ~ ageGroups, dat, FUN = sum))
  ageGroups awards
1   [20,30)    188
2   [30,40)    212
3 [40, Inf)    194
Finally, we can plot these data using the barplot function. The key here is to use names for the age groups.
barplot(summedGroups[["awards"]], names.arg = summedGroups[["ageGroups"]])
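As a rough sketch, the same steps applied to the sample data from the question, with the age groups the asker listed (20-30, 31-40, 41-50). The break points 19, 30, 40, 50 are one possible choice, relying on the default right-closed intervals; `totals` is a hypothetical name.

```r
age    <- c(21, 22, 22, 25, 30, 34, 45, 26, 37, 46, 49, 21)
awards <- c(0, 3, 2, 1, 0, 0, 1, 3, 1, 1, 1, 1)
data   <- data.frame(age, awards)

# Breaks at 19, 30, 40, 50 with the default right-closed intervals give
# (19,30], (30,40], (40,50], labelled as the groups asked for.
data$age_group <- cut(data$age, breaks = c(19, 30, 40, 50),
                      labels = c("20-30", "31-40", "41-50"))

# Sum the awards within each age group.
totals <- aggregate(awards ~ age_group, data = data, FUN = sum)
totals

# Age group on the x-axis, total awards on the y-axis.
barplot(totals$awards, names.arg = as.character(totals$age_group),
        xlab = "Age group", ylab = "Total awards")
```

For this sample the totals come out as 10, 1 and 3 awards for the three groups.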
I have 1000 categorical observations, collected over 5 years, which I can simulate as
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
The cases are numbers (1, 2, 3, 4), all within one column: the first 181 integers are for year1, the next 211 for year2, the next 205 for year3, the next 185 for year4, and the last 218 for year5. I want to draw a grouped bar chart with year on the x-axis (with the cases 1, 2, 3, 4 as sub-bars within each year) and the frequency of occurrence on the y-axis.
I want to know how many 1's in year1, year2, year3, year4 and also know how many 2s,3s,4s in each year.
My MWE, which does not produce the chart I want:
barplot(senerio, legend = c("1","2","3","4"), beside=TRUE)
This is how I want the grouped chart to look:
Using ggplot is the likely solution. First, though, you will need to declare the years in your data. Below is a verbose example that manually creates a data frame and a years column, followed by a quick ggplot example. I'm not 100% sure I nailed your expected output. However, this is a common question, so this should give you a start for exploring similar questions.
library(tidyverse)
senerio <- as.integer(runif(1000, min = 1, max = (4+1)))
senerio <- data.frame(senerio)
colnames(senerio) <- "value"
senerio$value <- as.factor(senerio$value)
senerio$years <- 0
senerio$years[1:181] <- 1
senerio$years[182:392] <- 2
senerio$years[393:597] <- 3
senerio$years[598:782] <- 4
senerio$years[783:1000] <- 5
ggplot(senerio,aes(years,fill=value)) + geom_bar(position=position_dodge())
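To also read off how many 1s, 2s, 3s and 4s fall in each year, a base R sketch with `table()` may help. It assumes the block sizes given in the question; the seed, `year_sizes` and `counts` are hypothetical choices made for this example, and `sample()` stands in for the `runif()` call.

```r
set.seed(42)  # arbitrary seed so the simulated counts are reproducible

# Simulated cases 1-4, as in the question (1000 observations).
senerio <- sample(1:4, 1000, replace = TRUE)

# Year labels built from the block sizes given in the question.
year_sizes <- c(181, 211, 205, 185, 218)
years <- rep(paste0("year", 1:5), times = year_sizes)

# Rows are cases, columns are years; each cell is a frequency.
counts <- table(case = senerio, year = years)
counts

# Base R grouped bar chart of the same counts.
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "Year", ylab = "Frequency")
```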
Use ggplot2:
library(ggplot2)
dat1 <- data.frame(
gender = factor(c("Female","Female","Male","Male")),
time = factor(c("Lunch","Dinner","Lunch","Dinner"), levels=c("Lunch","Dinner")),
total_bill = c(13.53, 16.81, 16.24, 17.42)
)
ggplot(data=dat1, aes(x=time, y=total_bill, fill=gender)) +
geom_bar(stat="identity", position=position_dodge())
The position argument is the important one here: position_dodge() places the bars side by side, which is the grouped layout you need.
Say I want to plot percentages of "yes" answers to a question, across different age groups in ggplot. These age groups are obviously factors, but I want them to be shown in a scale-like fashion, so want to use a line graph.
Here's some data:
mydata <- data.frame(
age_group = c("young", "middle", "old"),
question = sample(c("yes", "no"), 99, replace = TRUE))
mydata$age_group = factor(mydata$age_group, levels = c("young", "middle", "old"))
mydata$question = factor(mydata$question, levels = c("yes", "no"))
So far, I have been using this code to generate a stacked barplot:
ggplot(mydata, aes(age_group, fill = question)) + geom_bar(position = "fill")
How could I change this into a line graph, with just the frequency counts of the "yes" answers? Mark in the answers suggests a workaround which produces the right output:
But I was hoping there was a way to do this automatically in one line of code, rather than creating this summary table first.
If I understood correctly, this does what you want:
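# in ggplot2 2.0.0 and later, stat_count() handles a discrete x (stat_bin() requires a continuous x)
ggplot(mydata) +
stat_count(aes(x=age_group, color=question, group=question), geom="line")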
Note this doesn't look exactly the same as yours in terms of yes/no because you didn't set a seed for the random numbers.
If you just want the percentages of "yes" for each category, I suggest changing your data to the following:
  question age_group value   percent
1      yes     young    14 0.4242424
3      yes    middle    17 0.5151515
5      yes       old    20 0.6060606
Using this code to summarize the data:
library(reshape)
mydata.summary = melt(xtabs(~question+age_group,data=mydata))
mydata.summary2 = mydata.summary[mydata.summary$question=="yes",]
mydata.summary2$percent <- mydata.summary2$value/melt(xtabs(~age_group,data=mydata))$value
ggplot(mydata.summary2, aes(age_group,percent, group = question, colour=question)) + geom_line()
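One possible way to skip the summary table is to let ggplot compute the proportion itself with `stat_summary()`. This is a sketch under assumptions: the data are re-simulated with an arbitrary seed, `p_yes` is a hypothetical name, and the `fun` argument needs ggplot2 3.3.0 or later (older versions used `fun.y`).

```r
library(ggplot2)

set.seed(2)  # arbitrary seed, only for reproducibility
mydata <- data.frame(
  age_group = factor(rep(c("young", "middle", "old"), 33),
                     levels = c("young", "middle", "old")),
  question  = sample(c("yes", "no"), 99, replace = TRUE)
)

# Map y to a 0/1 indicator for "yes"; its mean per age group is the
# proportion of "yes" answers. group = 1 joins the points into one line.
p_yes <- ggplot(mydata, aes(age_group, as.numeric(question == "yes"), group = 1)) +
  stat_summary(fun = mean, geom = "line") +
  labs(y = "Proportion answering yes")
p_yes
```

The trade-off is the one raised earlier in this thread: without an intermediate summary table, the computed percentages are not available for anything other than the plot.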