Boxplot in R is showing a vertical straight line - r

I have a data frame with multiple columns. I want to create two boxplots, one for each of the two professions "secretary" and "driver", but the result is not satisfying, as the picture shows. This is my data and my code:
profession ve.count.descrition euse.count.description Qualitative.result
secretary 0 1 -0.5
secretary 0 2 1
driver 1 1 -1
driver 0 2 0.3
data %>%
mutate(Qualitative.result = factor(Qualitative.result)) %>%
ggplot(aes(x = Profession , fill = Qualitative.result)) +
geom_boxplot()

You should not convert Qualitative.result to a factor: a boxplot summarises a numeric variable, so it has to stay numeric and be mapped to y. Maybe you want something like this:
library(tidyverse)
data %>%
ggplot(aes(x = Profession, y = Qualitative.result, fill = Profession)) +
geom_boxplot()
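To make the suggestion above reproducible, here is a minimal sketch that rebuilds the four sample rows from the question. The column name Profession (capitalised) is an assumption taken from the plotting code; the printed table spells it in lower case, so adjust it to match your data.

```r
library(ggplot2)

# Sample rows copied from the question; the column name `Profession`
# is assumed from the plotting code
data <- data.frame(
  Profession = c("secretary", "secretary", "driver", "driver"),
  Qualitative.result = c(-0.5, 1, -1, 0.3)
)

# Keep Qualitative.result numeric and map it to y,
# so each profession gets its own box
p <- ggplot(data, aes(x = Profession, y = Qualitative.result, fill = Profession)) +
  geom_boxplot()
print(p)
```

With only two values per profession the boxes are very coarse, but each profession now gets a box spanning its own values.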

Related

How to create a dot plot from multiple columns in one plot?

Consider a df that I would like to plot.
The exemplary df:
df
Entry A. B. C. D. Value
O60701 1 1 1 0 2.7181970
Q8WZ42 1 1 1 1 3.6679832
P60981 1 1 0 0 2.2974231
Q15047 1 0 0 0 0.5535473
Q9UER7 1 0 0 0 4.1030394
I want Entry to be on y axis and Value on x axis. Do you have any ideas how to create a plot, so that if a protein is found (==1) let us say in column A it would be a dot on a plot? Since we have four columns (A-D), there can be maximum 4 dots. Hence, I would like to be able to distinguish which dot (or any other shape) comes from which column.
Here is what I have so far:
ggplot(df, aes(x=Value, y=Entry)) +
geom_point(size=1) +
theme_ipsum()
library(tidyverse)
df %>%
pivot_longer(cols = A:D) %>%
# by default, pivot_longer creates `name` column with either A/B/C/D,
# and a `value` column holding the original 0/1 value from those columns
filter(value == 1) %>% # only plot if protein found (A/B/C/D==1)
ggplot(aes(Value, Entry, color = name)) +
geom_jitter(height = 0.1, width = 0.1) + # since you have multiple points at the same locations
hrbrthemes::theme_ipsum()

no. of geom_point matches the value

I have an existing ggplot with geom_col and some observations from a dataframe. The dataframe looks something like :
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this :
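As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr version 1.0.0 or later, which allows summarize to return more than one row per group. (From dplyr 1.1.0 on, reframe() is the recommended function for this, and summarize warns when a group yields several rows.)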
You can adjust the spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = T)
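If you would rather not depend on a particular dplyr version, the same one-row-per-wicket table can be built in base R. This is only a sketch of an alternative to the summarize step above, not a replacement for the answer; it relies on rep() and sequence() and reuses the dd sample data.

```r
dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

# Keep only overs in which at least one wicket fell
w <- dd[dd$wickets > 0, ]

# Repeat each over once per wicket and stack the points above the bar:
# sequence(c(2, 1)) gives 1, 2, 1
wicket_data <- data.frame(
  over = rep(w$over, w$wickets),
  wicket_y = rep(w$runs, w$wickets) + sequence(w$wickets)
)
wicket_data
```

The resulting wicket_data has the same over and wicket_y columns as before, so it can be passed to geom_point() unchanged.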

Looping over columns of data frame to create plots with ggplot2

I am trying to overcome this. Can't get any further.
I have a dataframe with factor and numeric variables. Herewith displayed are the first few rows and columns.
# A tibble: 6 x 5
cluster SEV_D SEV_M OBS PAN
<int> <dbl> <dbl> <fct> <fct>
1 1 5 1 0 1
2 2 6 1 0 0
3 1 5 1 0 1
4 2 4 2 0 0
5 1 4 1 1 1
6 1 4 2 1 0
cluster=as.factor(c(1,2,1,2,1,1))
SEV_D=as.numeric(c(5,6,5,4,4,4))
SEV_M=as.numeric(c(1,1,1,2,1,2))
OBS=as.factor(c(0,0,0,0,1,1))
PAN=as.factor(c(1,0,1,0,1,0))
data<-data.frame(cluster,SEV_D,SEV_M,OBS,PAN)
I split the dataframe like this, in numeric and factor variables, keeping 'cluster' in both subsets since I need it for grouping.
data_fact <- data[, sapply(data, class) == 'factor']
data_cont <- data[, sapply(data, class) == 'numeric' | names(data)
== "cluster"]
The two following snippets of code would produce the plots I want.
data_fact %>% group_by(cluster,OBS)%>%summarise(total.count=n()) %>%
ggplot(., aes(x=cluster, y=total.count, fill=OBS)) +
geom_bar(position = 'dodge', stat='identity') +
geom_text(aes(label=total.count),
position=position_dodge(width=0.9), vjust=-0.2)
data_cont %>%
group_by(cluster) %>%
dplyr::summarise(mean = mean(SEV_D), sd = sd(SEV_D)) %>%
ggplot(., aes(x = cluster, y = mean)) +
geom_bar(position = position_dodge(), stat = "identity", colour = "black", size = .3) +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), size = .3, width = .4, position = position_dodge(.4)) +
ggtitle("SEV_D")
My goal is to create as many graphs as variables in the data frame, looping over columns and to store such graphs in one single sheet.
My attempt was
col<-names(data_fact)[!names(data_fact)%in%"cluster"]
for(i in col) {
data_fact %>% group_by(cluster,i)%>%summarise(total.count=n()) %>%
ggplot(., aes(x=cluster, y=total.count, fill=i)) + geom_bar(position
= 'dodge', stat='identity') + geom_text(aes(label=total.count),
position=position_dodge(width=0.9), vjust=-0.2)
}
But it throws this error:
Error in grouped_df_impl(data, unname(vars), drop) :
Column i is unknown
On top of that, that code would not display all graphs in one sheet I am afraid. Any help would be much appreciated!!!
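RStudio's tidyeval cheatsheet is a good reference for this: https://github.com/rstudio/cheatsheets/raw/master/tidyeval.pdf
The error occurs because group_by() and aes() look for a column literally named i. To use the column whose name is stored in i, turn the string into a symbol and unquote it, here with the !!ensym(i) construct (sym() is the more usual choice for a string held in a variable). You also need to call print() on each plot explicitly, because plots are not printed automatically inside a for loop.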
library(dplyr) # provides %>%, group_by() and summarise()
library(ggplot2)
col<-names(data_fact)[!names(data_fact)%in%"cluster"]
for(i in col) {
print(i)
g<-data_fact %>% group_by(cluster, !!ensym(i)) %>% summarise(total.count=n()) %>%
ggplot(., aes(x=cluster, y=total.count, fill=!!ensym(i))) +
geom_bar(position = 'dodge', stat='identity') +
geom_text(aes(label=total.count), position = position_dodge(width=0.9), vjust=-0.2) +
labs(title=i)
print(g)
}
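The loop above prints one plot per page, which leaves the "all graphs in one sheet" part of the question open. One possible approach, sketched here under the assumption that the gridExtra package is installed, is to collect the plots in a list and lay them out with grid.arrange(). The sketch also uses the .data pronoun instead of !!ensym(i), which is another common way to refer to a column by a string; the data frame is rebuilt from the factor columns in the question.

```r
library(dplyr)
library(ggplot2)

# Factor columns rebuilt from the question
data_fact <- data.frame(
  cluster = factor(c(1, 2, 1, 2, 1, 1)),
  OBS = factor(c(0, 0, 0, 0, 1, 1)),
  PAN = factor(c(1, 0, 1, 0, 1, 0))
)

cols <- setdiff(names(data_fact), "cluster")

# Build one plot per column; .data[[v]] looks up the column named by the string v
plots <- lapply(cols, function(v) {
  data_fact %>%
    count(cluster, .data[[v]], name = "total.count") %>%
    ggplot(aes(x = cluster, y = total.count, fill = .data[[v]])) +
    geom_col(position = "dodge") +
    geom_text(aes(label = total.count),
              position = position_dodge(width = 0.9), vjust = -0.2) +
    labs(title = v, fill = v)
})
names(plots) <- cols

# Assumption: gridExtra is available; this draws all plots on one page
gridExtra::grid.arrange(grobs = plots, ncol = 2)
```

Keeping the plots in a named list also makes it easy to save them individually later, for example with ggsave().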

plot count of values by factor in ggplot

New to R, stuck googling this (probably easy) thing for too long.
I want to plot the proportion of males that fathered offspring, according to whether they have a nest or not. (I don't want the information of how many offspring they fathered). This is my dataset called "males"
fishID nest off
fish1 1 25
fish2 0 0
fish3 0 5
fish4 1 15
fish5 1 0
fish6 0 2
fish7 0 0
fish8 1 4
I've used the following code to change the values of offspring to 0 and 1 (though this feels clumsy already)...
#converts the values in offspring to 0 and 1s
vars=c("off")
males[males$off != "0", vars]="1"
males
...and I can plot proportions using...
ggplot(males,aes(x = males$nest,fill = males$off)) +
geom_bar(position = "fill")
...but I would like to colour them so that 0 (no nest) is one colour and 1 (nest) is another colour, then the proportion of males that didn't father offspring is a paler version of each colour. The above produces colours according to "offspring", irrespective of "nest".
Tips welcome.
(Mac OS X, R 3.0.3 GUI 1.63 Snow Leopard build (6660))
Is this what you're looking for?
library(ggplot2)
males$nest <- as.factor(males$nest)
males$off <- as.factor(males$off)
ggplot(males, aes(x = nest, fill = off)) +
geom_bar(width = 0.25) +
scale_fill_manual(values = c('green', 'darkgreen'))
Done it! Thank you. It was the fill by interaction I was missing.
library(ggplot2)
# recode offspring counts to 0 (none) / 1 (at least one)
males$off <- factor(as.numeric(males$off != 0))
males$nest <- as.factor(males$nest)
# refer to columns by name inside aes() rather than with males$...
ggplot(males, aes(x = nest, fill = interaction(nest, off))) +
geom_bar(width = 0.25) +
scale_fill_manual(values = c('deepskyblue3', 'tomato3', 'deepskyblue', 'tomato'))
(Eventually needed the same number of lines of code as days googling...)
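For the recoding step and the proportions themselves, here is a minimal sketch using the sample data from the question. The column name fathered is introduced here for illustration only, and the alpha values are arbitrary; mapping alpha to a logical column is one possible way to get a paler shade of each nest colour.

```r
library(ggplot2)

males <- data.frame(
  fishID = paste0("fish", 1:8),
  nest = c(1, 0, 0, 1, 1, 0, 0, 1),
  off = c(25, 0, 5, 15, 0, 2, 0, 4)
)

# TRUE if the male fathered at least one offspring (hypothetical helper column)
males$fathered <- males$off > 0

# Proportion of fathers among males without (0) and with (1) a nest
prop <- tapply(males$fathered, males$nest, mean)
prop

# One colour per nest status; non-fathers drawn in a paler shade via alpha
p <- ggplot(males, aes(x = factor(nest), fill = factor(nest), alpha = fathered)) +
  geom_bar(position = "fill") +
  scale_alpha_manual(values = c(`FALSE` = 0.4, `TRUE` = 1)) +
  labs(x = "nest", y = "proportion", fill = "nest")
print(p)
```

With this data, half of the males without a nest and three quarters of the males with a nest fathered offspring.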

barplots in R comparing data from two columns

I have the following:
> ArkHouse2014 <- read.csv(file="C:/Rwork/ar14.csv", header=TRUE, sep=",")
> ArkHouse2014
DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349
What I would like to do is make a barplot (or series of barplots) to compare the totals in the second and third columns on the y-axis while the x-axis would display the information in the first column.
It seems like this should be very easy to do, but most of the information on making barplots that I can find has you make a table from the data and then barplot that, e.g.,
> table(ArkHouse2014$GOP)
2,936 3,258 3,508 3,573 3,581 3,588 3,638 3,830 3,899 3,951 4,133 4,166 4,319 4,330 4,345 4,391 4,396 4,588
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4,969 5,130 5,177 5,343 5,425 5,466 5,710 5,991 6,070 6,100 6,234 6,490 6,550 6,980 7,847 8,846
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
I don't want the counts of how many have each total, I'd like to just represent the quantities visually. I feel pretty stupid not being able to figure this out, so thanks in advance for any advice you have to offer me.
Here's an option using libraries reshape2 and ggplot2:
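I first read your data (with dec = "," so that the values are parsed as numbers; note that this treats the comma as a decimal mark, so 3,951 is read as 3.951 and the y-axis ends up in thousands of votes):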
df <- read.table(header=TRUE, text="DISTRICT GOP DEM
1 AR-60 3,951 4,001
2 AR-61 3,899 4,634
3 AR-62 5,130 4,319
4 AR-100 6,550 3,850
5 AR-52 5,425 3,019
6 AR-10 3,638 5,009
7 AR-32 6,980 5,349", dec = ",")
Then reshape it to long format:
library(reshape2)
df_long <- melt(df, id.var = "DISTRICT")
Then create a barplot using ggplot:
library(ggplot2)
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
or if you want the bars stacked:
ggplot(df_long, aes(x = DISTRICT, y = value, fill = variable)) +
geom_bar(stat = "identity")
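If you want the actual vote totals rather than values in thousands, the commas can be stripped before converting to numbers. This is a sketch of one way to do it, reading the columns as text first; the object name ark is made up for this example, and the row numbers are left out of the sample text for simplicity.

```r
# Read every column as character so "3,951" is kept intact
ark <- read.table(header = TRUE, colClasses = "character", text = "DISTRICT GOP DEM
AR-60 3,951 4,001
AR-61 3,899 4,634
AR-62 5,130 4,319
AR-100 6,550 3,850
AR-52 5,425 3,019
AR-10 3,638 5,009
AR-32 6,980 5,349")

# Remove the thousands separators, then convert to numeric
ark$GOP <- as.numeric(gsub(",", "", ark$GOP))
ark$DEM <- as.numeric(gsub(",", "", ark$DEM))
str(ark)
```

The resulting data frame can be reshaped with melt() and plotted exactly as shown above.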
