Here is some basic code for a column plot:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarise(
count = n()
) %>%
ggplot(
aes(
x = cut,
y = count,
fill = count
)
) +
geom_col() +
scale_fill_viridis_c(
option = "plasma"
)
I could not find any examples of what I would like to do, so I will try to explain it as best I can. I have applied a colour gradient to the fill aesthetic, which colours each whole column a single colour. Is it possible to have each column of the plot contain the full colour spectrum up to its respective value?
By which I mean the "Ideal" column of my plot would look exactly like the key in the legend. Then the "Premium" column would look like the key in the legend but cut off ~2/3 of the way up.
Thanks
You can do this fairly easily with a bit of data manipulation. Within each group of your original data frame, give every row a sequential number that you can map to the fill scale, and add another column with the value 1. Then you plot one unit-height segment per row and stack the segments using position_stack:
library(ggplot2)
library(dplyr)
diamonds %>%
group_by(cut) %>%
mutate(fill_col = seq_along(cut), height = 1) %>%
ggplot(aes(x = cut, y = height, fill = fill_col)) +
geom_col(position = position_stack()) +
scale_fill_viridis_c(option = "plasma")
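To see what the mutate step produces, here is a minimal sketch on a hypothetical five-row data frame (toy is not part of the original question; it only stands in for diamonds). Each row gets a running index within its group and a height of 1, so stacking the rows rebuilds the bar while the index drives the gradient:

```r
library(dplyr)

# Hypothetical toy data standing in for diamonds: 3 rows of "a", 2 rows of "b"
toy <- tibble(cut = c("a", "a", "a", "b", "b"))

stacked <- toy %>%
  group_by(cut) %>%
  mutate(fill_col = seq_along(cut),  # running index within each group
         height = 1) %>%             # every row becomes a unit-height segment
  ungroup()

# Summing the unit heights per group recovers the original bar heights
bar_heights <- stacked %>% count(cut, wt = height, name = "total")
```

With the real diamonds data this draws one small rectangle per diamond, so expect it to render more slowly than a plain geom_col on summarised counts.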
I have an uncolored geom_col and would like it to display information about another (continuous) variable by displaying different shades of color in the bars.
Example
Starting with a geom_col
library(dplyr)
library(ggplot2)
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n()) %>%
ggplot(aes(Species, n)) +
geom_col()
Suppose we want to color the bars according to how low/high mean(Sepal.Width) is in each group
(note: I don't know if there's a way to provide 'continuous' colors to a ggplot, but, if not, the following colors would be fine to use)
library(RColorBrewer)
display.brewer.pal(n = 3, name= "PuBu")
brewer.pal(n = 3, name = "PuBu")
[1] "#ECE7F2" "#A6BDDB" "#2B8CBE"
The end result should be the same geom_col as above but with the bars colored according to how low/high mean(Sepal.Width) is.
Notes
This answer shows something similar but is highly manual. It is okay for 3 bars, but not sustainable for many plots with a high number of bars (since it would require too many case_when conditions to be set manually)
This is similar but the coloring is based on a variable already displayed in the plot, rather than another variable
Note also that in the example I provide above there are 3 bars and I provide 3 colors. This is somewhat manual, and if there's a better (i.e. less manual) way to designate colors I would be glad to learn it
What I've tried
I thought this would work, but it seems to ignore the colors I provide
library(RColorBrewer)
# fill info from: https://stackoverflow.com/questions/38788357/change-bar-plot-colour-in-geom-bar-with-ggplot2-in-r
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(colors = brewer.pal(n = 3, name = "PuBu")) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = colors)) +
geom_col()
Do the following
add fill = sep_mean to aes()
add + scale_fill_gradient()
remove mutate(colors = brewer.pal(n = 3, name = "PuBu")) since the previous step takes care of colors for you
set.seed(124)
iris[sample(1:150, 50), ] %>%
group_by(Species) %>%
summarise(n=n(), sep_mean = mean(Sepal.Width)) %>%
arrange(desc(n)) %>%
mutate(Species=factor(Species, levels=Species)) %>%
ggplot(aes(Species, n, fill = sep_mean, label=sprintf("%.2f", sep_mean))) +
geom_col() +
scale_fill_gradient() +
labs(fill="Sepal Width\n(mean cm)") +
geom_text()
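If you would rather keep the PuBu palette from the question, scale_fill_distiller interpolates a ColorBrewer palette into a continuous gradient. A sketch, assuming the same sampled iris data as above (direction = 1 is my own choice here, so that higher means should get the darker shades):

```r
library(dplyr)
library(ggplot2)

set.seed(124)
p <- iris[sample(1:150, 50), ] %>%
  group_by(Species) %>%
  summarise(n = n(), sep_mean = mean(Sepal.Width)) %>%
  ggplot(aes(Species, n, fill = sep_mean)) +
  geom_col() +
  # Continuous gradient built from the ColorBrewer "PuBu" palette
  scale_fill_distiller(palette = "PuBu", direction = 1) +
  labs(fill = "Sepal Width\n(mean cm)")

p
```

This scales to any number of bars, since the colors come from the fill variable rather than from a hand-written vector.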
data1=data.frame("Grade"=c(1,1,1,2,2,2,3,3,3),
"Class"=c(1,2,3,1,2,3,1,2,3),
"Score"=c(6,9,9,7,7,4,9,6,6))
I apologise if this has already been posted, but I did not see it. I wish to prepare a stacked bar plot where the X axis is 'Grade' and each Grade is one bar. Every bar contains three color shades because there are three classes ('Class'). Finally, the height of each segment is 'Score', and the segments always stack from low class to high. So it will look something like this, although this is not to proper scale
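We can use xtabs to convert the data to wide format and then apply barplot. Because barplot draws one stacked bar per column of the matrix, Class goes first in the formula (rows) and Grade second (columns)
barplot(xtabs(Score ~ Class + Grade, data1), legend = TRUE,
col = c('yellow', 'red', 'orange'))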
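It can help to inspect the table before plotting, because barplot turns each column of a matrix into one stacked bar and each row into one segment. A small sketch using the data from the question (tab and bar_totals are names I made up for illustration), with Class in the rows and Grade in the columns so that each Grade becomes one bar:

```r
data1 <- data.frame(Grade = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                    Class = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                    Score = c(6, 9, 9, 7, 7, 4, 9, 6, 6))

# First term of the formula -> rows (segments), second term -> columns (bars)
tab <- xtabs(Score ~ Class + Grade, data1)

# Total height of each Grade bar is the column sum
bar_totals <- colSums(tab)

barplot(tab, legend = TRUE, col = c("yellow", "red", "orange"),
        xlab = "Grade", ylab = "Score")
```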
Or using ggplot
library(dplyr)
library(ggplot2)
data1 %>%
mutate_at(vars(Grade, Class), factor) %>%
ggplot(aes(x = Grade, y = Score, fill = Class)) +
geom_col()
If we want to control the stacking order of 'Class', convert it to a factor with its levels specified in the order of the 'Score' values
data1 %>%
mutate(Class = factor(Class, levels = unique(Class[order(Score)])),
Grade = factor(Grade)) %>%
ggplot(aes(x = Grade, y = Score, fill = Class)) +
geom_col()
I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
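gather has since been superseded in tidyr by pivot_longer. A roughly equivalent sketch, assuming tidyr 1.0.0 or later; the column names key and value are kept only to match the code above, and long is a hypothetical name:

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the diamonds data set

long <- diamonds %>%
  # Stack every numeric column into key / value pairs;
  # the categorical columns (cut, color, clarity) are left untouched
  pivot_longer(where(is.numeric), names_to = "key", values_to = "value")
```

Because the categorical columns survive the reshape, any of them can then be mapped to fill or colour in a faceted plot.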
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')
I have the following example data frame:
Parameter<-c("As","Hg","Pb")
Loc1<-c("1","10","12")
Loc2<-c("3","14","9")
Loc3<-c("5","12","8")
Loc4<-c("9","20","6")
x<-data.frame(Parameter,Loc1,Loc2,Loc3,Loc4)
x$Loc1<-as.numeric(x$Loc1)
x$Loc2<-as.numeric(x$Loc2)
x$Loc3<-as.numeric(x$Loc3)
x$Loc4<-as.numeric(x$Loc4)
The Parameter column holds the names of the heavy metal and Loc1 to Loc4 columns hold the measured value of the heavy metal at the individual location.
I need a plot with one boxplot for each heavy metal at each location. The location is the grouping value. I tried the following:
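library(reshape2)
library(ggplot2)
# named "melted" so the result does not shadow reshape2::melt
melted<-melt(x, id=c("Parameter"))
ggplot(melted)+
geom_boxplot (aes(x=Parameter, y=value, colour=variable))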
However, the resulting plot did not group the boxplots by location.
A boxplot with one observation per Parameter per Location makes little sense (see my example at the end of my post). I assume you are in fact after a barplot.
You can do something like this
library(tidyverse)
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity")
Or with dodged bars
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, fill = Location)) +
geom_bar(stat = "identity", position = "dodge")
To demonstrate why a boxplot makes little sense, let's show the plot
x %>%
gather(Location, Value, -Parameter) %>%
ggplot(aes(x = Parameter, y = Value, colour = Location)) +
geom_boxplot()
Note that a single observation per group results in each box being reduced to a single horizontal line. This is probably not what you want to show.
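A quick way to confirm this is to count the observations in each Parameter / Location cell after reshaping. A sketch, assuming the example data from the question entered directly as numbers and using pivot_longer in place of gather (obs_per_group is a hypothetical name):

```r
library(dplyr)
library(tidyr)

x <- data.frame(Parameter = c("As", "Hg", "Pb"),
                Loc1 = c(1, 10, 12),
                Loc2 = c(3, 14, 9),
                Loc3 = c(5, 12, 8),
                Loc4 = c(9, 20, 6))

obs_per_group <- x %>%
  pivot_longer(-Parameter, names_to = "Location", values_to = "Value") %>%
  count(Parameter, Location)  # n = number of observations per box
```

Every n is 1 here, so each box has no spread to summarise.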
I'm trying to change my (stacked) bar width according to the counts (or proportions) of the categories. As an example I used the diamonds dataset. I want to see a varying width according to the frequency of each category (of the variable cut). I first created a variable cut_prop and then plotted with the following code
library(tidyverse)
cut_prop = diamonds %>%
group_by(cut) %>%
summarise(cut_prop = n()/nrow(diamonds))
diamonds = left_join(diamonds, cut_prop)
ggplot(data = diamonds,
aes(x = cut, fill = color)) +
geom_bar(aes(width=cut_prop), position = "fill") +
theme_minimal() +
coord_flip()
Which gave me the following barplot:
R gives the warning "Ignoring unknown aesthetics: width" and, as the plot shows, does not take the proportion of each category into account for the width of the bars. Can anyone help me out here? Thanks!
I think this works. Starting where you left off...
df <- diamonds %>%
count(cut, color, cut_prop) %>%
group_by(cut) %>%
mutate(freq = n / sum(n)) %>%
ungroup()
ggplot(data = df,
aes(x = cut, fill = color, y = freq, width = cut_prop)) +
geom_bar(stat = "identity") +
theme_minimal() +
coord_flip()
Essentially, I calculate the proportions myself instead of using position = "fill", then use stat = "identity" rather than the default stat = "count", and map width in the plot-level aes().
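The same idea can be sketched without first joining cut_prop onto diamonds. This assumes the unmodified ggplot2::diamonds data; both proportions are computed in one pipeline and geom_col stands in for geom_bar(stat = "identity"):

```r
library(dplyr)
library(ggplot2)

df <- diamonds %>%
  count(cut, color) %>%
  group_by(cut) %>%
  mutate(freq = n / sum(n),                        # share of each color within a cut
         cut_prop = sum(n) / nrow(diamonds)) %>%   # share of each cut overall
  ungroup()

p <- ggplot(df, aes(x = cut, y = freq, fill = color, width = cut_prop)) +
  geom_col() +
  theme_minimal() +
  coord_flip()

p
```

Since freq sums to 1 within each cut, every bar spans the full length, which reproduces the look of position = "fill" while leaving the width free to vary.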