First time asking here so forgive me if I'm not clear enough.
So far I have seen many replies to similar questions that explain how to sort bars by some field of a data frame, but I have not been able to find how to sort them by the default stat "count" of geom_bar (which is obviously NOT a field of the data frame).
For example, I run this code:
library(ggplot2)
library(plotly)  # provides ggplotly()
Name <- c( 'Juan','Michael','Andrea','Charles','Jonás','Juan','Donata','Flavia' )
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)
df = data.frame( Name,City,City.Id )
a <- ggplot( df,aes( x = City, text=paste("City.Id=",City.Id)) ) +
geom_bar()
ggplotly(a)
And then I would like to visualize the resulting bars ordered by their height (=count.) Note that I must keep the "City.Id" info to show in the final plot. How can this be done?
Given that you're already using ggplot2, I'd suggest looking into what else the tidyverse can offer, namely the forcats package for working with factors.
forcats has a nice function, fct_infreq(), which (re)sets the levels of a factor so that they are in order of decreasing frequency. If the data is a character vector rather than a factor (as City may be in your data), it is first converted to a factor and then the levels are put in frequency order.
Try this code:
# Load packages
library(ggplot2)
library(forcats)
# Create data
Name <- c( 'Juan','Michael','Andrea','Charles','Jonás','Juan','Donata','Flavia' )
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)
df = data.frame( Name,City,City.Id )
# Create plot
a <- ggplot(df, aes(x = fct_infreq(City), text=paste("City.Id=",City.Id)) ) +
geom_bar()
a
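To see what the frequency ordering amounts to, here is a minimal base R sketch that builds the same kind of factor by hand. The name city_by_freq is just for illustration, and ties between equally frequent cities may come out in a different order than with fct_infreq():

```r
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')

# Count each city and sort the counts from most to least frequent
counts <- sort(table(City), decreasing = TRUE)

# Use the sorted names as the factor levels; ggplot2 draws bars in level order
city_by_freq <- factor(City, levels = names(counts))
levels(city_by_freq)
```

Mapping x to a factor like this in aes() should give bars running from the tallest to the shortest.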
One could use reorder():
df$City <- reorder(df$City,df$City.Id,length)
and then plot with the code in the question.
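Note that reorder() sorts the levels in ascending order of the summary value, so the line above puts the shortest bar first. A small sketch of both directions, assuming you want the tallest bar first (the names asc and desc are only for illustration):

```r
City <- c('Madrid','New York','Madrid','Liverpool','Madrid','Buenos Aires','Rome','Liverpool')
City.Id <- c(1,2,1,3,1,4,5,3)

# Ascending: levels ordered by the number of rows per city
asc <- reorder(City, City.Id, FUN = length)

# Descending: negate the count so the most frequent city sorts first
desc <- reorder(City, City.Id, FUN = function(x) -length(x))

levels(asc)
levels(desc)
```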
I am trying to sort the y-axis numerically according to population values. I have tried other Stack Overflow answers that suggested reorder() or converting columns to a numeric data type (as.numeric), but those solutions do not seem to work for me.
Without using reorder, the plot is sorted alphabetically:
Using reorder, the plot is sorted as such:
The code I am using:
library(ggplot2)
library(ggpubr)
library(readr)
library(tidyverse)
library(lemon)
library(dplyr)
pop_data <- read_csv("respopagesextod2011to2020.csv")
temp2 <- pop_data %>% filter(`Time` == '2019')
ggplot(data=temp2,aes(x=reorder(PA, Pop),y=Pop)) + geom_bar(stat='identity') + coord_flip()
How should I go about sorting my y-axis? Any help will be much appreciated. Thanks!
I am using data filtered from: https://www.singstat.gov.sg/-/media/files/find_data/population/statistical_tables/singapore-residents-by-planning-areasubzone-age-group-sex-and-type-of-dwelling-june-20112020.zip
The functions are all working as intended. You don't see the expected result because reorder(PA, Pop) is applied to the unsummarized rows: by default it orders the levels of PA by the mean of Pop within each PA, whereas the bars you are plotting stack every row for a PA, so their height is the sum of Pop.
The easiest solution is probably to summarize first, then reorder and plot the summarized dataset. This way, the ordering reflects the summarized values, which is what you want.
temp3 <- pop_data %>% filter(`Time` == '2019') %>%
group_by(PA) %>%
summarize(Pop = sum(Pop))
ggplot(data=temp3, aes(x=reorder(PA, Pop),y=Pop)) +
geom_bar(stat='identity') + coord_flip()
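The difference between the two orderings can be seen on a tiny made-up data set. The toy data frame below is hypothetical and only stands in for the CSV; it is a sketch of the mechanism, not of the real data:

```r
# Hypothetical stand-in for the population data: several rows per planning area
toy <- data.frame(
  PA  = c("A", "A", "A", "A", "B", "B"),
  Pop = c(10, 10, 10, 10, 15, 15)
)

# Default FUN is mean: A = 10, B = 15, so A sorts before B
by_mean <- levels(reorder(toy$PA, toy$Pop))

# With FUN = sum: A = 40, B = 30, so B sorts before A, matching the stacked bar heights
by_sum <- levels(reorder(toy$PA, toy$Pop, FUN = sum))

by_mean
by_sum
```

So another option, assuming you would rather not summarize first, should be to pass FUN = sum inside the aes() call, i.e. reorder(PA, Pop, FUN = sum).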
I have several datasets and my end goal is to make a graph from them, with each line representing the yearly variation of a given variable. I have finally joined and combined my data (it was in a per-month structure) into a table that contains just the yearly means for each item I want to graph (one column for the year and one column each for the yearly values of 4 different elements).
I have one factor, the year, and 4 different variables that record yearly variation, and I would like to graph them in the same plot. I had the idea of joining the 4 columns into one (collapsing to one observation per row, with the year and the variable name alongside it) but I seem unable to do that. My thought is that this would give a structure to my y-axis. I would like some advice, and to know whether my approach to the problem is effective. I am trying ggplot2 but it does not seem to work without a defined (or predefined range for the) y-axis. Thanks
I would suggest the following approach. You have to reshape your data from wide to long, as in the example below. That way all variables can be shown in one plot. As no data was provided, this solution is sketched using dummy data. You can also swap the lines for another geom, such as points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
ggplot(aes(x=factor(year),y=value,group=name,color=name))+
geom_line()+
theme_bw()
Output:
We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.vars = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
-output plot
Or with matplot from base R
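matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
# the default x-axis is suppressed above, so label it with the years
axis(1, at = seq_len(nrow(df)), labels = df$year)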
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))
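For reference, this is roughly what the "long" layout looks like that both pivot_longer() and melt() produce. The sketch below uses base R's stack() on a cut-down version of the dummy data (only v1 and v2), so the column names values and ind differ from the ones the other functions create:

```r
set.seed(123)
df <- data.frame(year = 1990:2000,
                 v1 = rnorm(11, 2, 1),
                 v2 = rnorm(11, 3, 2))

# stack() puts all measurements in one column (values) and the
# original column name in another (ind); the years are repeated per variable
long <- data.frame(year = rep(df$year, times = 2),
                   stack(df[c("v1", "v2")]))

head(long)
```

Each row is now one observation (year, value, variable name), which is the shape ggplot2 needs to draw one line per variable.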
I am currently learning R and I have a data frame containing data I scraped from a football website.
There are 58 columns (variables, attributes) for each row. Out of these, I wish to plot 3 in a single bar chart. The 3 important variables are 'Name', 'Goals.with.right.foot' and 'Goals.with.left.foot'.
What I want to build is a bar chart with each 'Name' on the x-axis and 2 independent bars representing the other 2 variables.
Sample row entry:
{......., RONALDO, 10(left), 5(right),............}
I have tried playing around a lot with ggplot2 geom_bar with no success.
I have also searched for similar questions, but I cannot understand the answers. Can anyone explain simply how to solve this problem?
My data frame is called 'Forwards' and holds the strikers in a game of football. They have the attributes Name, Goals.with.left.foot and Goals.with.right.foot.
barplot(counts, main="Goals",
xlab="Goals", col=c("darkblue","red"),
legend = rownames(counts))
You could try it this way:
I simulated a frame as a stand in for yours, just replace it with a frame containing the columns you're interested in:
df <- data.frame(names = letters[1:5], r.foot = runif(5,1,10), l.foot = runif(5,1,10))
# transform your df to long format
library(reshape2)
plotDf <- melt(df, id.vars = 'names', variable.name = 'footing', value.name = 'goals')
# plot it
library(ggplot2)
ggplot(plotDf, aes(x = names, y = goals, group = footing, fill = footing)) +
geom_col(position = position_dodge()) #does the same as geom_bar, but uses stat_identity instead of stat_count
Results in this plot:
This works because ggplot expects one variable containing the values for the y-axis and one or more variables containing the grouping factor(s).
With the melt() function, your data frame is reshaped into the so-called 'long format', which is exactly the layout of data that is needed.
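If you would rather stay with the base R barplot() call from the question, a grouped chart should also be possible by passing a matrix and beside = TRUE. This is only a sketch: the three players and their goal counts below are made up, and the column names are the ones given in the question:

```r
# Hypothetical stand-in for the Forwards data frame
Forwards <- data.frame(
  Name = c("A", "B", "C"),
  Goals.with.left.foot = c(10, 3, 7),
  Goals.with.right.foot = c(5, 12, 4)
)

# barplot() wants a matrix with one row per bar colour and one column per group
counts <- t(as.matrix(Forwards[c("Goals.with.left.foot", "Goals.with.right.foot")]))
colnames(counts) <- Forwards$Name

barplot(counts, beside = TRUE, main = "Goals",
        xlab = "Name", col = c("darkblue", "red"),
        legend.text = rownames(counts))
```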
Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that the data to be plotted on the y-axis are all in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
coord_cartesian(ylim = c(0,6)) sets the visible y-axis range to 0 to 6 without dropping any data points.
If your data are in separate columns, you can get them into long format using melt from reshape2 or gather from tidyr (since superseded by pivot_longer). Other methods are also available.
You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
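If base graphics are acceptable, reshaping may not even be needed: boxplot() accepts a data frame directly and draws one box per column. A minimal sketch with made-up data in the 0 to 6 range (the name stats is just for illustration):

```r
set.seed(1)
# Hypothetical data: three unrelated numeric columns
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One boxplot per column, side by side, with a fixed y-axis range
stats <- boxplot(dat, ylim = c(0, 6))

# The (invisibly) returned list records which columns were drawn
stats$names
```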
I made a grouped bar chart in R using the ggplot2 package. I used the following code:
ggplot(completedDF,aes(year,value,fill=variable)) + geom_bar(position=position_dodge(),stat="identity")
And the graph looks like this:
The problem is that I want the 1999-2008 data to be at the end.
Is there any way to move it?
Thanks any help appreciated.
ggplot will follow the order of the levels in a factor. If you didn't set the order of the levels yourself, they are sorted alphabetically by default.
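A quick base R check of that default, using the four labels from the question (the variable names are only for illustration):

```r
year <- c("1999-2008", "1999-2002", "2002-2005", "2005-2008")

# Without explicit levels, factor() sorts the distinct values
f <- factor(year)
levels(f)

# The underlying integer codes follow that sorted order, not the input order
as.integer(f)
```

Here "1999-2008" ends up as the second level, which is why its bars appear second on the axis.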
If you want your "1999-2008" level to be at the end, just set the order of the factor levels explicitly using
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
For example :
library(ggplot2)
# Create a sample data set
set.seed(2014)
years_labels <- c( "1999-2008","1999-2002", "2002-2005", "2005-2008")
variable_labels <- c("pointChangeVector", "nonPointChangeVector",
"onRoadChangeVector", "nonRoadChangeVector")
years <- rbinom(n=1000, size=3,prob=0.3)
variables <- rbinom(n=1000, size=3,prob=0.3)
year <- factor(x=years , levels=0:3, labels=years_labels)
variable <- factor(x=variables , levels=0:3, labels=variable_labels)
completed <- data.frame( year, variable)
# Plot
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
# change the order
completed$year <- factor(x=completed$year,
levels=c("1999-2002", "2002-2005", "2005-2008", "1999-2008"))
ggplot(completed,aes(x=year, fill=variable)) + geom_bar(position=position_dodge())
A further benefit of this approach is that your results will also come out in a sensible order in other functions such as summary or plot.
Does it help?
Yeah, this is a real problem in ggplot. It always sorts non-numeric (character) values alphabetically.
The easiest way to solve it is to add scale_x_discrete in this way:
p <- ggplot(completedDF,aes(year,value,fill=variable))
p <- p + geom_bar(position=position_dodge(),stat="identity")
p <- p + scale_x_discrete(limits = c("1999-2002","2002-2005","2005-2008","1999-2008"))