How to remove extra column in facet_wrap plot with ggplot2? - r

I am trying to generate a faceted plot with facet_wrap from unbalanced grouped data, and the result has extra blank columns on the x axis.
As the picture showed, I want to generate a plot without the blank rightmost axis columns.
Here is an example code:
library(ggplot2)
library(dplyr)  # provides %>%
name <- c(factor(letters[1:4]),factor(LETTERS[1:3]))
room <- rep(c('A','B'),c(4,3))
goal <- c(rnorm(7,mean=60,sd=10))
test <- data.frame(name,goal,room)
test %>% ggplot(aes(name, goal))+
facet_wrap(~factor(room))+
geom_bar(stat = "identity")
With scales = "free" the y-axis limits are set automatically; can they be set manually?
facetted_pos_scales() in ggh4x, developed by @teunbrand, solved the problem, thanks! Here is the supplementary code:
library(ggh4x)
# one y scale per panel, in panel order (room A, then room B)
scales <- list(
scale_y_continuous(limits = c(0, 100)),
scale_y_continuous(limits = c(0, 80))
)
test %>% ggplot(aes(name, goal))+
facet_wrap(~factor(room), scales="free")+
geom_bar(stat = "identity")+
facetted_pos_scales(y=scales)

Update in response to the OP's comment:
Does this help? You can use coord_cartesian(ylim = c(0, 100)) to set the y limits (note that it applies the same limits to every panel):
test %>% ggplot(aes(name, goal))+
geom_bar(stat = "identity")+
coord_cartesian(ylim = c(0, 100)) +
facet_wrap(~factor(room), scales="free")
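Use scales = "free" in facet_wrap() so that each panel only shows its own x values (scales = "free_x" would free only the x axis and keep a common y axis):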
library(ggplot2)
library(dplyr)  # provides %>%
name <- c(factor(letters[1:4]),factor(LETTERS[1:3]))
room <- rep(c('A','B'),c(4,3))
goal <- c(rnorm(7,mean=60,sd=10))
test <- data.frame(name,goal,room)
test %>% ggplot(aes(name, goal))+
facet_wrap(~factor(room), scales="free")+
geom_bar(stat = "identity")
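A minimal sketch of two related options, assuming ggplot2 3.3 or later and with the factor levels set explicitly (c() on two factors only returns a combined factor from R 4.1.0 onwards; older versions return integer codes). scales = "free_x" drops each panel's unused x levels while keeping one shared y axis, and facet_grid() with space = "free_x" additionally sizes each panel by its number of bars, so all bars keep the same width. The set.seed() value is arbitrary.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducible example values
test <- data.frame(
  # explicit factor levels instead of c() on two factors
  name = factor(c(letters[1:4], LETTERS[1:3]),
                levels = c(letters[1:4], LETTERS[1:3])),
  goal = rnorm(7, mean = 60, sd = 10),
  room = rep(c("A", "B"), c(4, 3))
)

# Free only the x axis: each panel keeps just its own names,
# while both panels still share one y axis
p <- ggplot(test, aes(name, goal)) +
  geom_col() +
  facet_wrap(~ room, scales = "free_x")

# facet_grid() can also make panel widths proportional to the
# number of x levels, so every bar has the same width
p_grid <- ggplot(test, aes(name, goal)) +
  geom_col() +
  facet_grid(. ~ room, scales = "free_x", space = "free_x")
```

With free x scales, panel A only trains on a to d and panel B only on A to C, so no blank slots remain.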

Related

Normal curves on multiple histograms on a same plot

My example dataframe:
sample1 <- seq(100,157, length.out = 50)
sample2 <- seq(113, 167, length.out = 50)
sample3 <- seq(95,160, length.out = 50)
sample4 <-seq(88, 110, length.out = 50)
df <- as.data.frame(cbind(sample1, sample2, sample3, sample4))
I have managed to create histograms for these four variables, which share the same y-axis. Now I need to overlay a normal curve. Based on previous posts, I've managed to add a density curve, but this is not what I want. It comes close, but I'd like a smooth line...
This is my current code for plotting:
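library(data.table)  # as.data.table(), melt()
library(plyr)        # ddply(), summarise()
library(ggplot2)
df$sample <- seq_len(nrow(df))  # row id column, used as id.vars by melt() below
df <- as.data.table(df)
new.df<-melt(df,id.vars="sample")
names(new.df)=c("sample","type","value")
cdat <- ddply(new.df, "type", summarise, value.mean=mean(value))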
ggplot(data = new.df,aes(x=value)) +
geom_histogram(aes(x = value), bins = 15, colour = "black", fill = "gray") +
facet_wrap(~ type) + geom_density(aes(x = value),alpha=.2, fill="#FF6666") +
geom_vline(data=cdat, aes(xintercept=value.mean),
linetype="dashed", size=1, colour="black") +
theme_classic() +
theme(text = element_text(size = 15), line = element_line(size = 0.5), aspect.ratio = 0.75 )
And I found the following code, which I hoped would do the trick, but this gives me nothing:
stat_function(fun = dnorm, args = list(mean = mean(df$value), sd = sd(df$value)))
Unfortunately, stat_function doesn't play nicely with facets: it overlays the same function on each facet without taking account of the faceting variable.
One of the most common reasons I see for people posting ggplot questions on Stack Overflow is that they get lost while trying to coerce ggplot to do too much of their data manipulation. Functions like geom_smooth and geom_function are useful helpers for common tasks, but if you want to do something that is complex or uncommon, it is best to produce the data you want to plot, then plot it.
In fact, the main author of ggplot2 recommends this approach for a very similar problem to yours in this thread, saying:
I think you are better off generating the data outside of ggplot2 and then plotting it. See https://speakerdeck.com/jennybc/row-oriented-workflows-in-r-with-the-tidyverse to get started.
Hadley Wickham, 26 April 2018
So here's one way of doing that using the tidyverse: create a data frame of dnorm values for each sample, then plot these with plain old geom_line.
Note that your histograms are counts, so you either need to change them to densities, or multiply the dnorm output by the number of observations times the binwidth; otherwise you will just get an apparently "flat" line along the x axis, since the dnorm values are all tiny compared with the counts:
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
dfn <- df %>%
pivot_longer(everything()) %>%
ddply("name", function(x) {
xvar <- seq(min(x$value), max(x$value), length.out = 100)
data.frame(value = xvar,
y = 5 * nrow(x) * dnorm(xvar, mean(x$value), sd(x$value)))  # 5 = histogram binwidth
})
df %>%
pivot_longer(everything()) %>%
group_by(name) %>%
mutate(mean = mean(value), sd = sd(value)) %>%
ggplot(aes(value)) +
geom_histogram(aes(x = value), binwidth = 5,
colour = "black", fill = "gray") +
facet_wrap(~ name) +
geom_vline(aes(xintercept = mean),
linetype = "dashed", size=1, colour="black") +
geom_line(data = dfn, aes(y = y)) +
theme_classic() +
theme(text = element_text(size = 15), line = element_line(size = 0.5),
aspect.ratio = 0.75 )
Created on 2020-12-07 by the reprex package (v0.3.0)
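The other route mentioned above, drawing the histograms on the density scale so that the normal curves need no rescaling, could look like the sketch below. It assumes ggplot2 3.3.0 or later (for after_stat()) and dplyr 0.8.1 or later (for group_modify()); the object names long and curves are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  sample1 = seq(100, 157, length.out = 50),
  sample2 = seq(113, 167, length.out = 50),
  sample3 = seq(95, 160, length.out = 50),
  sample4 = seq(88, 110, length.out = 50)
)
long <- pivot_longer(df, everything())  # columns: name, value

# Normal curve for each sample on a 100-point grid. No rescaling is
# needed because the histogram below is drawn on the density scale.
curves <- long %>%
  group_by(name) %>%
  group_modify(~ {
    grid <- seq(min(.x$value), max(.x$value), length.out = 100)
    data.frame(value = grid,
               y = dnorm(grid, mean(.x$value), sd(.x$value)))
  }) %>%
  ungroup()

p <- ggplot(long, aes(value)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 colour = "black", fill = "gray") +
  geom_line(data = curves, aes(y = y)) +
  facet_wrap(~ name) +
  theme_classic()
```

The trade-off is the y axis: it now shows densities rather than counts, which is why the curve can be plotted unscaled.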

R ggplot: how to plot only the variables > 0?

I am plotting the number of positive COVID-19 PCR tests in the towns of my province. The problem is that many towns haven't had any positive PCR tests. I need a way to plot only the towns with at least one positive PCR.
This is my code:
library(tidyverse)
library('data.table')
dfcsv1 <- read.csv("https://dadesobertes.gva.es/datastore/dump/ee17a346-a596-4866-a2ac-a530eb811737?bom=True",
encoding = "UTF-8", header = TRUE, sep = ",")
colnames(dfcsv1) <- c("code","code2","Municipio", "PCR", "TasaPCR", "PCR14",
"TasaPCR14", "Muertos", "TasaMuertos")
dfcsv1$TasaMuertos = as.numeric(gsub(",","\\.",dfcsv1$TasaMuertos))
dfcsv1$TasaPCR = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR))
dfcsv1$TasaPCR14 = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR14))
dfcsv1 %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) +
coord_flip() +
geom_text(data=dfcsv1, aes(y=PCR14,label=PCR14),vjust=1)+
scale_fill_gradient(low="steelblue", high="red")
As others have said in the comments, you need to filter the data to keep only the rows where PCR14 is greater than 0 before reordering the factor levels. However, you will also need to remove the data argument from geom_text, otherwise all those factor levels come back and you will have a big mess. It's already a bit crowded with the zero levels removed.
I think you should also change the vjust to an hjust to put the text in a nicer position since you have flipped the coordinates, with a compensating increase in the (flipped) y axis range to accommodate it:
dfcsv1 %>%
filter(PCR14 > 0) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x = Municipio, y = PCR14, fill = TasaPCR14)) +
geom_bar(stat = "identity", width = 0.6) +
coord_flip() +
geom_text(aes(y = PCR14,label = PCR14), hjust= -0.5) +
scale_fill_gradient(low = "steelblue", high = "red") +
ylim(c(0, 45))
Incidentally, it looks a lot better with the towns that have only one positive PCR removed too:
dfcsv1 %>%
filter(PCR14 > 1) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) + coord_flip() +
geom_text(aes(y=PCR14,label=PCR14),hjust=-0.5)+
scale_fill_gradient(low="steelblue", high="red") +
ylim(c(0, 45))
As a general rule, regardless of the type of plot or whether you are using ggplot, lattice or the base plot function, subsetting should happen first:
plot(x[y>0] , y[y>0])
The rest is aesthetics.
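To illustrate the subset-first rule on the ggplot side as well, here is a sketch on a small hypothetical data frame (the town names and counts are made up, standing in for the downloaded CSV). Because no layer gets its own data argument, the towns with zero positives cannot reappear through geom_text().

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data standing in for the downloaded CSV
towns <- data.frame(
  Municipio = c("Town A", "Town B", "Town C", "Town D", "Town E"),
  PCR14     = c(0, 12, 0, 3, 27)
)

# Subset first, then reorder and plot; every layer inherits the
# filtered data because no layer is given its own `data` argument
p <- towns %>%
  filter(PCR14 > 0) %>%
  mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
  ggplot(aes(x = Municipio, y = PCR14)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = PCR14), hjust = -0.5) +
  coord_flip()

# The same idea in base graphics: index both vectors with the condition
with(towns, barplot(PCR14[PCR14 > 0], names.arg = Municipio[PCR14 > 0]))
```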

Plot many variables

From a dataframe like this one:
data <- data.frame(year = c(2010,2011,2012,2010,2011,2012),
name = c("stock1","stock1","stock1","stock2","stock2","stock2"),
value = c(0,3,1,4,1,3))
I would like to create a plot and I use this:
library(ggplot2)
ggplot(data=data, xName="year", groupName="name", brewerPalette="Blues")
but I don't get a plot. Is anything wrong with the call?
I think you need something like this:
library(ggplot2)
library(dplyr)
library(RColorBrewer)
data %>%
group_by(name) %>%
ggplot(aes(year,value,fill=name))+
geom_col()+
scale_fill_brewer(palette = "Blues")
If you want a grouped bar plot (as I guessed from your code), this code may be helpful:
ggplot(data = data, aes(x = as.factor(year), y = value, fill = name)) +
geom_bar(stat = "identity", position = position_dodge(0.8), width = 0.7) +
scale_fill_brewer(palette = "Blues")
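If the aim is to follow each stock over the years rather than compare bar heights, one line per stock may read more easily. This is a sketch, not a fix for the original call; the "Dark2" palette is my substitution, since with only two groups the sequential "Blues" palette may give two very light shades that are hard to see as lines.

```r
library(ggplot2)

data <- data.frame(year = c(2010, 2011, 2012, 2010, 2011, 2012),
                   name = c("stock1", "stock1", "stock1",
                            "stock2", "stock2", "stock2"),
                   value = c(0, 3, 1, 4, 1, 3))

# One line per stock; mapping colour to name also defines the groups
p <- ggplot(data, aes(x = year, y = value, colour = name)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(data$year)) +  # whole years only
  scale_colour_brewer(palette = "Dark2")
```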

Split data to plot histograms side-by-side in R

I am learning R with the Australian athletes data set.
By using ggplot, I can plot a histogram like this.
library(DAAG)
library(ggplot2)
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5)
According to summary(ais$wt), the 3rd quartile is 84.12. Now I want to split the data at wt = 84.12 and plot two similar histograms side by side, one for each part.
The split is:
ais1 = ais$wt[which(ais$wt>=0 & ais$wt<=84.12)]
ais2 = ais$wt[which(ais$wt>84.12)]
But I don't know how to use them in the plot. I tried this, but it doesn't work:
ggplot(ais1, aes(wt, fill = sex)) +...
How can I plot the histograms (2 similar histograms accordingly, side by side)?
Add the split as a column to your data:
ais$wt_3q = ifelse(ais$wt <= 84.12, "Quartiles 1-3", "Quartile 4")
Then use facets:
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
facet_wrap(~ wt_3q)
The created variable is a character vector, which ggplot converts to a factor with alphabetically sorted levels. If you make it a factor yourself and specify the order of the levels, you can order the facets differently (lots of questions on here show how if you search for them; it is the same as reordering bars in a ggplot bar plot). You can also let the scales vary: see ?facet_wrap for more details.
Generally, you shouldn't create more data frames. Creating ais1 and ais2 is usually avoidable, and your life will be simpler if you use a single data frame for a single data set. Adding a new column for grouping makes it easy to keep things organized.
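A sketch combining both suggestions: the cut-off is computed with quantile() instead of being hard-coded, and the grouping column is built as a factor so the facet order is explicit. Since DAAG may not be installed, ais_demo below is a hypothetical simulated stand-in with only wt and sex columns; with DAAG loaded you would use ais directly. quantile() with its default type is what summary() uses for the "3rd Qu." value.

```r
library(ggplot2)

# Hypothetical stand-in for DAAG::ais (only wt and sex are used)
set.seed(1)
ais_demo <- data.frame(
  wt  = round(rnorm(202, mean = 75, sd = 13), 1),
  sex = factor(sample(c("f", "m"), 202, replace = TRUE))
)

# Cut-off computed from the data rather than typed in
q3 <- quantile(ais_demo$wt, 0.75)

# Grouping column as a factor, so the facet order is under our control
ais_demo$wt_3q <- factor(
  ifelse(ais_demo$wt <= q3, "Quartiles 1-3", "Quartile 4"),
  levels = c("Quartiles 1-3", "Quartile 4")
)

p <- ggplot(ais_demo, aes(wt, fill = sex)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ wt_3q, scales = "free_x")  # let each panel's x range vary
```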
We can also do this with ggarrange() from ggpubr, building one plot object per subset and arranging them side by side:
library(DAAG)
library(ggplot2)
library(ggpubr)
library(dplyr)  # %>% and filter()
p2 <- ais %>%
filter(wt>=0, wt<=84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
p1 <- ais %>%
filter(wt>84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
ggarrange(p1, p2, ncol =2, nrow = 1, labels = c("p1", "p2"))

Grouping data outside limits in histogram using ggplot2

I am trying to make a histogram zoomed in on part of the data. My problem is that I would like to group everything that is outside the range into a last category, "10+". Is it possible to do this using ggplot2?
Sample code:
library(ggplot2)
library(scales)  # percent()
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
scale_y_continuous(labels = percent) +
coord_cartesian(xlim=c(0, 10)) +
scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now:
And here is how I would like it to look:
It is probably possible to do this by nesting ifelse() calls, but since my real problem has more cases, is there a way for ggplot to do it?
You could use forcats and dplyr to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
library(scales)  # percent()
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
mutate(x_grp = cut(x, breaks = seq(0, 15, 1))) %>%
# collapse the intervals (10,11] ... (14,15] into a single "10+" level
mutate(x_grp = fct_collapse(x_grp, `10+` = levels(x_grp)[11:15])) %>%
group_by(x_grp) %>%
dplyr::summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count/10000)) +
geom_bar(stat = "identity", colour = "grey50") +
scale_y_continuous(labels = percent)
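The resulting graph looks very different from your example, but I think it is correct: the data are drawn from a uniform distribution on (0, 15), so the collapsed last category pools several one-unit bins and is much taller than the others.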
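For the nested-ifelse() concern in the question, another option is to cap the values before binning: floor() gives the integer bin and pmin() folds everything at or above 10 into a single bin, however many extra cases there are. A sketch, assuming ggplot2 3.3.0 or later for after_stat(); note that here each label is the lower edge of a [k, k + 1) interval, whereas cut() above produces (k, k + 1] intervals.

```r
library(ggplot2)
library(scales)  # percent()

set.seed(1)
x <- data.frame(x = runif(10000, 0, 15))

# Integer bin = floor(x); everything at or above 10 is capped into
# bin 10, which is then labelled "10+"
bin <- pmin(floor(x$x), 10)
x$x_grp <- factor(bin, levels = 0:10, labels = c(0:9, "10+"))

p <- ggplot(x, aes(x = x_grp)) +
  geom_bar(aes(y = after_stat(count / sum(count))), colour = "grey50") +
  scale_y_continuous(labels = percent) +
  labs(x = NULL, y = NULL)
```

Changing the cap only means changing the 10 in pmin() and the labels, not adding another ifelse() branch.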
