Using a boxplot in R

I am trying to make a density plot out of 4000 rows (height.values), with 4 different categories (height.ind). This is the code I used:
library(ggplot2)
library(dplyr) # provides the %>% pipe
plom %>%
  ggplot(aes(x = height.values, color = height.ind)) +
  geom_density() +
  labs(title = "height alimony")
I am able to get a density plot, but there are a lot of lines instead of the 4 I want.
Does anyone have an idea how to fix it?
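The question has no answer here. A minimal sketch, assuming the extra lines come from height.ind holding more distinct values than the expected 4 (for example stray spaces or inconsistent spelling): ggplot2 draws one density curve per distinct value of a discrete colour variable, so counting the distinct values is a quick first check. The data frame below is a hypothetical stand-in for `plom`, not the asker's data.

```r
library(ggplot2)

# Hypothetical stand-in for `plom`: 4000 heights in 4 categories.
set.seed(42)
plom <- data.frame(
  height.values = rnorm(4000, mean = 170, sd = 10),
  height.ind    = rep(c("a", "b", "c", "d"), each = 1000)
)

# How many categories does the column really contain?
# Every distinct value becomes its own density line.
n_groups <- length(unique(plom$height.ind))

# Mapping a factor to colour gives one curve per factor level.
p <- ggplot(plom, aes(x = height.values, color = factor(height.ind))) +
  geom_density() +
  labs(title = "height alimony", color = "height.ind")

# layer_data() returns the computed data; each curve has its own group id.
n_lines <- length(unique(layer_data(p)$group))
```

If `n_groups` is larger than 4 on the real data, cleaning the labels (for example with `trimws()`) before plotting should bring the number of lines down.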

Related

box plots with individual observations

very beginner question here:
I have a dataset of 4 columns of values and I need to create a graph with 4 boxplots showing average and standard deviation, and I wanted to know how to also show the individual observations as points (with ggplot2).
Thank you for your help!!!!
This is relatively simple, as you can add multiple geoms to a ggplot.
Here is a small example that combines geom_boxplot with geom_jitter.
To highlight outliers in the box plot (if that is what you want), you can give them a colour or a different point shape, e.g. geom_boxplot(outlier.colour = "red").
library(tidyverse)
iris %>%
ggplot(aes(x = Species, y = Sepal.Length)) +
geom_boxplot(outlier.colour = "red") + # Add the boxplot geom
geom_jitter(width = 0.1) # Add the points with a random jitter on the X-axis
Created on 2022-08-11 by the reprex package (v2.0.0)
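The question describes 4 separate columns and asks for mean and standard deviation. Note that a boxplot shows the median and quartiles, not the mean and SD. A minimal sketch, assuming hypothetical columns `a` to `d`: reshape the wide data to long format with base R's `stack()`, draw the boxplots and jittered points, and overlay mean ± 1 SD with `stat_summary()`. The helper `mean_sd()` is a hypothetical name defined here, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical wide data: 4 columns of 20 observations each.
set.seed(1)
wide <- data.frame(a = rnorm(20, 10), b = rnorm(20, 12),
                   c = rnorm(20, 9),  d = rnorm(20, 11))

# stack() gives two columns: `values` and `ind` (the source column name).
long <- stack(wide)

# Hypothetical helper: mean and mean +/- 1 SD in the format stat_summary() expects.
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

p <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot(outlier.shape = NA) +   # hide boxplot outliers; jitter already shows every point
  geom_jitter(width = 0.1) +           # individual observations
  stat_summary(fun.data = mean_sd, geom = "pointrange", colour = "red")
```

Hiding the boxplot outliers avoids drawing the same observation twice once all points are jittered on top.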

Partially "free_y" Facet Wrap with ggplot

My goal is to produce a column graph showing different element concentrations.
There is a very wide range, so I want to customise the scale of my faceted graph into 3 groups.
That way the graphs can show the variation between samples for each element and still be comparable between elements,
so ideally I would have 3 different scales for Groups 1, 2, and 3 in the graph below.
This is the code to make the above graph:
library(ggplot2)
library(scales) # provides comma() for the axis labels
ggplot(binded) +
  aes(y = mean,
      x = sample,
      group = id) +
  geom_col(aes(fill = element)) +
  geom_errorbar(aes(ymin = mean - sd,
                    ymax = mean + sd)) +
  facet_wrap(rang ~ element) +
  scale_x_continuous(breaks = seq(1, 15, by = 1),
                     name = "Sample ID") +
  scale_y_continuous(name = "Elemental Conc. (mg/kg)", labels = comma) +
  theme(legend.position = "none")
and the data used is below.
If I switch the faceting to facet_wrap(rang~element, scales = "free_y"), then I get:
Is there any way to make the scales free only within each group of rang?
I suspect I'm going to have to just create 3 separate graphs.
Thanks to Danlooo for suggesting patchwork: using that package to combine 3 separate graphs, plus another one for the y-axis label, proved successful.
I produced several graphs with the original code, each from a data frame filtered to a different concentration range, and used the following patchwork code to combine them into the final graph:
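library(patchwork)
p5 <- (p1 | p2) / p3 + plot_layout(heights = c(1, 2)) # p1 and p2 side by side, p3 below
(p4 + p5) + plot_layout(widths = c(1, 25))            # p4 is the narrow y-axis label plot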
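The same idea can be written without building p1, p2 and p3 by hand. A minimal sketch, assuming a hypothetical stand-in for `binded` (the element names and values below are invented for illustration): split the data by `rang`, build one faceted plot per group with a shared y scale inside the group, and stack the list with patchwork's `wrap_plots()`. The helper `make_group_plot()` is a hypothetical name.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for `binded`: 3 concentration groups,
# 2 elements per group, 3 samples per element.
set.seed(1)
binded <- data.frame(
  rang    = rep(c("Group 1", "Group 2", "Group 3"), each = 6),
  element = rep(c("Ca", "K", "Fe", "Mg", "Cu", "Zn"), each = 3),
  sample  = rep(1:3, times = 6),
  mean    = c(runif(6, 5000, 9000), runif(6, 200, 900), runif(6, 1, 30))
)
binded$sd <- binded$mean * 0.1

# One plot per group: facet_wrap() keeps its default fixed scales, so the
# y axis is shared by the elements of a group but differs between groups.
make_group_plot <- function(d) {
  ggplot(d, aes(x = sample, y = mean)) +
    geom_col(aes(fill = element)) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
    facet_wrap(~ element, nrow = 1) +
    labs(title = unique(d$rang), x = "Sample ID", y = "Elemental Conc. (mg/kg)") +
    theme(legend.position = "none")
}

plots <- lapply(split(binded, binded$rang), make_group_plot)

# wrap_plots() accepts a list of ggplot objects and arranges them.
combined <- wrap_plots(plots, ncol = 1)
```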

ggplot boxplot: too many outliers?

The dataset is available here, but I am only using the subset from 2010 to 2016: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results/
I am trying to plot height by gender with a boxplot, and it returns this plot:
I don't think it is correct, since there are far too many outliers (mean = 175, min = 133, max = 221).
Do I need to adjust the y-axis to include more data points in this boxplot? If so, how can I do that?
Here is my code:
library(ggplot2)
ggplot(data = olympics, aes(x = Sex, y = Height)) + # aes() needs its closing parenthesis before +
  geom_boxplot() +
  labs(title = "Height Distribution of Olympics Athletes by Gender")
Also, is it possible to plot such a graph in base R as well? Thank you!
Welcome to Stack Overflow @VanLindert. The best way to get help is to give us code to run that replicates the problem. The datapasta and reprex packages make this easy to do. https://reprex.tidyverse.org/articles/articles/datapasta-reprex.html
What I suspect is going on is that you are adjusting the y-axis limits, and that changes the boxplot. When you use plot + scale_y_continuous(limits = c(130, 225)) or the shorthand plot + ylim(130, 225), ggplot removes values below 130 or above 225 before computing the statistics, so the quartiles are recalculated. If you just want to zoom in on a specific range, you can use
plot + coord_cartesian(ylim = c(130, 225))
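The difference can be checked directly. A minimal sketch with hypothetical data (not the Olympics dataset): group "M" has values above the limit of 120, so scale limits drop those rows and shift its median, while `coord_cartesian()` only zooms. `layer_data()` exposes the computed boxplot statistics. The last lines answer the base R follow-up with `boxplot()` and its formula interface.

```r
library(ggplot2)

# Hypothetical data: 100 values per group; part of group "M" lies above 120.
df <- data.frame(
  Sex    = rep(c("F", "M"), each = 100),
  Height = c(1:100, 51:150)
)

base <- ggplot(df, aes(x = Sex, y = Height)) + geom_boxplot()

# Scale limits remove out-of-range rows before the statistics are computed
# (ggplot warns about the removed rows), so the median of "M" changes.
p_limits <- base + scale_y_continuous(limits = c(0, 120))

# coord_cartesian() only zooms the view; the statistics use all the data.
p_zoom <- base + coord_cartesian(ylim = c(0, 120))

median_limits <- layer_data(p_limits)$middle  # F: 50.5, M: 85.5
median_zoom   <- layer_data(p_zoom)$middle    # F: 50.5, M: 100.5

# Base R: boxplot() with a formula draws one box per level of Sex.
boxplot(Height ~ Sex, data = df,
        main = "Height Distribution by Gender", xlab = "Sex", ylab = "Height")

# plot = FALSE returns the statistics instead; row 3 of $stats holds the medians.
stats <- boxplot(Height ~ Sex, data = df, plot = FALSE)$stats
```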

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date A B C D
01-01-2011 11 0 11 1
08-01-2011 12 0 3 3
15-01-2011 9 0 2 6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
labs(title = "Number in Category A") +
ylab("Number") +
xlab("Date") +
theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms, you're just plotting a bar chart that looks kind of like a histogram. I personally think this is a good case for faceting:
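library(ggplot2)
library(reshape2) # for melt()
melt_df <- melt(df, id.vars = "Date") # long format: Date, variable, value
head(melt_df) # so you can see it
ggplot(melt_df, aes(Date, value, fill = Date)) +
  geom_col() + # bar heights come from `value`; a plain geom_bar() would count rows instead
  facet_wrap(~ variable)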
However, I think that, in general, changes over time are better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
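For the side-by-side version the question asked about, a minimal sketch using the three example rows from the question: parse the day-month-year strings into dates, reshape with base R's `stack()` (swapped in here for `melt()` so no extra package is needed), and draw dodged bars with `geom_col()`. On a Date axis the bar width is measured in days, so `width = 7` makes the weekly groups touch and removes the gaps.

```r
library(ggplot2)

# The example rows from the question (weekly counts per category).
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9),
  B = c(0, 0, 0),
  C = c(11, 3, 2),
  D = c(1, 3, 6)
)
df$Date <- as.Date(df$Date, format = "%d-%m-%Y")  # day-month-year strings to Date

# Long format: stack() stacks the columns in order (all of A, then B, ...),
# so repeating the dates 4 times lines them up with the stacked values.
long <- data.frame(Date = rep(df$Date, times = 4), stack(df[c("A", "B", "C", "D")]))
names(long) <- c("Date", "Number", "Category")

# position = "dodge" places the categories side by side instead of stacking.
p <- ggplot(long, aes(x = Date, y = Number, fill = Category)) +
  geom_col(position = "dodge", width = 7) +
  labs(title = "Number per Category", x = "Date", y = "Number")
```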

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems: c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails, I will accept an answer that tells me how to get the legend back onto the second plot. However, if someone can tell me how to do it properly, I will be very happy.
library(ggplot2)
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
This gives the same plot with two different colour sets.
Provide a vector of colours to the "values" argument to map the discrete groups to manually chosen colours:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
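With 7 groups it can be safer to name the vector, so each colour is tied to a specific group level rather than to the order of the levels; the legend is kept either way. A minimal sketch, using simulated values in place of the question's files (which are not available here):

```r
library(ggplot2)

# Simulated stand-ins for the question's cands and non vectors.
set.seed(45)
cands <- rnorm(200, mean = 5)
non   <- rnorm(300, mean = 3)
df <- data.frame(
  grp = factor(c(rep("1. Candidates", length(cands)),
                 rep("2. NonCands",   length(non)))),
  val = c(cands, non)
)

# Names must match the factor levels exactly.
group_colours <- c("1. Candidates" = "red", "2. NonCands" = "blue")

p <- ggplot(df, aes(x = val, color = grp)) +
  geom_density() +
  scale_color_manual(values = group_colours)
```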
