Box plot with ggplot2 using data from read.table

Box plot with ggplot2 using data from read.table - r

I am plotting a box plot that shows the height of students. However I am unsure what I use as x and y. I have only measurments, so one should be height and the other one amount of students that have that height.
x=N, y=Height
My code:
# Library
library(ggplot2)
library(tidyverse)
# 1. Read data (comma separated)
data = read.table(text = "184,180,183,184,184,160,173",
sep=",",stringsAsFactors=F, na.strings="unknown")
# 2. Print table
print(data)
# 3. Plot box plot
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")

I think the best plot to represent a vector of data is an histogram. However you could use the boxplot by create a dummy factor that group your observation. i.e.
data %>%
pivot_longer(cols = everything()) %>%
mutate(type="student") %>%
ggplot(aes(x=type, y=value)) +
geom_boxplot() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")
if you want a histogram (I think much better for your situation), you don'ty need the dummy factor and you could do something like :
data %>%
pivot_longer(cols = everything()) %>%
ggplot(aes(x=value)) +
geom_histogram() +
theme_classic() +
xlab("Students") +
ylab("Height") +
ggtitle("Height of students")

To use a boxplot correctly, you have to have one categorical variable and one continuous. Put the categorical (e.g. make, female, etc.) on the x-axis and the continuous on the y-axis (height in your case).

Related

Combine scale_x_upset with scale_y_break

I made an upset plot using the ggupset package and added a break to the y axis with scale_y_break from the ggbreakpackage.
However, when I add scale_y_break, the combination matrix under the bar plot disappears.
Is there a way to combine the combination matrix of the plot made without scale_y_break with the bar plot portion of a plot made with scale_y_break? I can't seem to be able to access the grobs of these plots or use any other workaround. If anyone could help, I would greatly appreciate it!
Example with scale_x_upset and scale_y_break:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
I would like to combine the barplot portion of the plot created with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
with the combination matrix portion of the plot made with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)
Thanks!

Can you add a count to the legend for each level of a factor in ggplot?

I am producing a scatterplot using ggplot, and will be colouring the data points by a given factor. The legend that is produced, details the colour assigned to each level of the factor, but is it possible for it to also count the number of points in each factor.
For example, I have included the code for the cars data set:
p <- ggplot(mtcars, aes(wt, mpg))
p + geom_point(aes(colour = factor(cyl)))
In this plot, I would be looking to have the count for each number of cylinders. So 4(Count 1), 6(Count 2) and 8(Count 3).
Thanks in advance.

you can try something like this
mtcars %>%
group_by(cyl) %>%
mutate(label = paste0(cyl, ' (Count ', n(), ')')) %>%
ggplot(aes(wt, mpg)) +
geom_point(aes(colour = factor(label)))

ggplot for each column in a data

I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making plots for each column separately. I was wondering if there was a easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is having 1 box plot 2*2 and 1 2*2 scatter plot with ggplot2. I understand there is facet grid which I have failed to understand how to implement.(I believe this can be achieved easily with par(mfrow()) and base R plots. I saw somewhere else using using widening the data? which i didn't understand.

In cases like this the solution is almost always to reshape the data from wide to long format.
economics %>%
select(-date) %>%
tidyr::gather(variable, value, -pop) %>%
ggplot(aes(x = pop, y = value)) +
geom_point(size = 0.5) +
facet_wrap(~ variable, scales = "free_y")
economics %>%
tidyr::gather(variable, value, -date) %>%
ggplot(aes(y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y")

Displaying a Cross-tabulation As a Plot on RStudio

I'm trying to visualize a cross-tabulation on RStudio using ggplot2. I've been able to create plots in the past, and have a cross-tabulation done as well, but can't crack this. Can anyone help?
Here's my code for an x-tab:
library(dplyr)
data_dan %>%
group_by(Sex, Segment) %>%
count(variant) %>%
mutate(prop = prop.table(n))
and here's what I've got for creating a plot:
#doing a plot
variance_art_new.plot = ggplot(data_dan, aes(Segment, fill=variant)) +
geom_bar(position="fill")+
theme_classic()+
scale_fill_manual(values = c("#fc8d59", "#ffffbf", "#99d594"))
variance_art_new.plot
Here's a sample of the data I'm operating with:
Word Segment variant Position Sex
1 LIKE K R End Female
2 LITE T S End Male
3 CRACK K R End Female
4 LIKE K R End Male
5 LIPE P G End Female
6 WALK K G End Female
My aim is to have the independent variables of 'Sex', 'Segment' plotted on a boxplot against the dependent variable 'variant'.
I included the first code to show that I can create a table to show this cross-tabulation and the second bit is what I normally do for running a box plot for just one independent variable.

I'm still not sure if this gets all the way to what you are asking, but if you are asking for counts (or portions) within two separate variable, you can use facet_wrap to separate the two groups.
(Note, all of these are run with theme_set(theme_bw()) because I prefer it for this type of plot.)
Working with the builtin dataset mtcars you can get counts with:
mtcars %>%
ggplot(aes(x = factor(cyl), fill = factor(gear))) +
geom_bar() +
facet_wrap(~vs)
Or with the sorting reversed with:
mtcars %>%
ggplot(aes(x = factor(vs), fill = factor(gear))) +
geom_bar() +
facet_wrap(~cyl, labeller = label_both)
You can also plot the within-group distribution by using position = "fill"
mtcars %>%
ggplot(aes(x = factor(vs), fill = factor(gear))) +
geom_bar(position = "fill") +
facet_wrap(~cyl, labeller = label_both) +
scale_y_continuous(name = "Within group Percentage"
, labels = scales::percent)

Plot including one categorical variable and two numeric variables

How can I show the values of AverageTime and AverageCost for their corresponding type on a graph. The scale of the variables is different since one of them is the average of time and another one is the average of cost. I want to define type as x and y refers to the value of AverageTime and AverageCost. (In this case, I will have two line plots just in one graph)
Type<-c("a","b","c","d","e","f","g","h","i","j","k")
AverageTime<-c(12,14,66,123,14,33,44,55,55,6,66)
AverageCost<-c(100,10000,400,20000,500000,5000,700,800,400000,500,120000)
df<-data.frame(Type,AverageTime,AverageCost)

This could be done using facet_wrap and scales="free_y" like so:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
mutate(AverageCost=as.numeric(AverageCost), AverageTime=as.numeric(AverageTime)) %>%
gather(variable, value, -Type) %>%
ggplot(aes(x=Type, y=value, colour=variable, group=variable)) +
geom_line() +
facet_wrap(~variable, scales="free_y")
There you can compare the two lines even though they are different scales.
HTH

# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
p <- ggplot(df, aes(AverageTime, AverageCost, colour=Type)) + geom_point()
p + geom_abline()

To show both lines in the same plot it will be hard since there are on different scales. You also need to convert AverageTime and AverageCost into a numeric variable.
library(ggplot2)
library(reshape2)
library(plyr)
to be able to plot both lines in one graph and take the average of the two, you need to some reshaping.
df_ag <- melt(df, id.vars=c("Type"))
df_ag_sb <- df_ag %>% group_by(Type, variable) %>% summarise(meanx = mean(as.numeric(value), na.rm=TRUE))
ggplot(df_ag_sb, aes(x=Type, y=as.numeric(meanx), color=variable, group=variable)) + geom_line()

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Box plot with ggplot2 using data from read.table - r

To use a boxplot correctly, you have to have one categorical variable and one continuous. Put the categorical (e.g. make, female, etc.) on the x-axis and the continuous on the y-axis (height in your case).

Related

Combine scale_x_upset with scale_y_break

Can you add a count to the legend for each level of a factor in ggplot?

ggplot for each column in a data

Displaying a Cross-tabulation As a Plot on RStudio

Plot including one categorical variable and two numeric variables

Categories

Resources