ggplot statistical differences in plot labels (reproducible code included)

I have code that generates two plots (in reality from different datasets), like this one:
library(ggplot2)
library(patchwork) # assumed: `p1 + p2` needs patchwork to combine ggplot objects
#Plot 1
p1 <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = scales::percent(after_stat(count / sum(count)))),
            stat = "count", position = position_fill(vjust = 0.5))
#Plot 2
p2 <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  geom_text(aes(label = scales::percent(after_stat(count / sum(count)))),
            stat = "count", position = position_fill(vjust = 0.5))
plot <- p1 + p2
plot
Is it possible, using ggplot or another library, to test for statistical differences among factors and, if there are significant differences, to change a label such as "25%" to something like "25% ↑" or "25% **"? In other words, I want to compare the values and change the labels to indicate statistical differences. In my example the values are the same, but in reality the plots come from different datasets.

As MrFlick mentioned, ggplot might not be the right tool to do the calculations. But once you have your calculations, you could do something like this:
library(magrittr) # for %>%
# some data with pre-calculated significance levels
dplyr::tibble(YEAR = rep(c(2019, 2020), each = 3),
              GRP = rep(c("A", "B", "C"), 2),
              VAL = c(20, 100, 30, 25, 70, 30),
              SIG = rep(c("*", "***", ""), 2)) %>%
  # create labels: compare each year's value with the other year's in the same group
  dplyr::group_by(GRP) %>%
  dplyr::mutate(LABEL = dplyr::case_when(VAL / sum(VAL) < 0.5 ~ paste("<", SIG),
                                         VAL / sum(VAL) > 0.5 ~ paste(">", SIG),
                                         TRUE ~ "")) %>%
  dplyr::ungroup() %>%
  # convert values to percentages within each year
  dplyr::group_by(YEAR) %>%
  dplyr::mutate(VAL = VAL / sum(VAL)) %>%
  dplyr::ungroup() %>%
  # plot data: combine percentages and significance levels in the label
  ggplot2::ggplot(ggplot2::aes(x = YEAR,
                               y = VAL,
                               fill = GRP,
                               label = glue::glue("{scales::percent(VAL)} {LABEL}"))) +
  ggplot2::geom_col() +
  ggplot2::geom_text(position = ggplot2::position_fill(vjust = 0.5))
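The answer above assumes the significance levels are already computed. One way to get them, sketched here under assumptions, is to compare the two data sets cell by cell: for each cyl bar, test whether the share of a given gear (what `position = "fill"` draws) differs between the data sets. The helpers `sig_stars()` and `compare_shares()` are hypothetical names, Fisher's exact test on a 2 x 2 table of counts is just one reasonable choice of test, and the star cut-offs are a common convention rather than anything stated in the question. Splitting `mtcars` by `am` only provides two stand-in data sets.

```r
# Map p-values to significance stars (conventional cut-offs, an assumption)
sig_stars <- function(p) {
  as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", ""), include.lowest = TRUE))
}

# Hypothetical helper: for every cyl x gear cell, compare the within-cyl share
# of that gear between data sets d1 and d2 with Fisher's exact test
compare_shares <- function(d1, d2) {
  cyl_lv <- sort(unique(c(d1$cyl, d2$cyl)))
  gear_lv <- sort(unique(c(d1$gear, d2$gear)))
  t1 <- table(factor(d1$cyl, levels = cyl_lv), factor(d1$gear, levels = gear_lv))
  t2 <- table(factor(d2$cyl, levels = cyl_lv), factor(d2$gear, levels = gear_lv))
  out <- expand.grid(cyl = cyl_lv, gear = gear_lv)
  out$share1 <- NA_real_
  out$share2 <- NA_real_
  out$p <- NA_real_
  for (i in seq_len(nrow(out))) {
    r <- as.character(out$cyl[i])
    g <- as.character(out$gear[i])
    x1 <- t1[r, g]; n1 <- sum(t1[r, ])
    x2 <- t2[r, g]; n2 <- sum(t2[r, ])
    if (n1 > 0 && n2 > 0) {  # cells missing from one data set stay NA
      out$share1[i] <- x1 / n1
      out$share2[i] <- x2 / n2
      # If the gear is absent (or universal) in both, the shares are identical
      out$p[i] <- if (x1 + x2 == 0 || x1 + x2 == n1 + n2) 1 else
        fisher.test(matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2),
                    conf.int = FALSE)$p.value
    }
  }
  out$stars <- sig_stars(out$p)
  # Label for the second plot, e.g. "25% **"; trimws() drops a trailing space
  out$label <- trimws(paste(sprintf("%.0f%%", 100 * out$share2), out$stars))
  out
}

# Illustration with two stand-in data sets
res <- compare_shares(mtcars[mtcars$am == 0, ], mtcars[mtcars$am == 1, ])
res[, c("cyl", "gear", "share1", "share2", "p", "label")]
```

The `label` column could then be joined (by `cyl` and `gear`) to per-cell counts of the second data set and used in `geom_text(aes(label = label))` instead of the plain percentages. With many cells, adjusting the p-values for multiple comparisons (for example with `p.adjust()`) before calling `sig_stars()` is worth considering.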

Related

Calculating Wilson interval to plot binomial proportions in R?

I have a data frame consisting of six variables: one two-level grouping variable indicating treatment status and five binary (0/1) variables. I would like to plot the proportion of successes, with 95% confidence intervals as error bars, for each binary variable, using separate dots and colors for each treatment group.
I'm currently plotting these as shown below.
library(dplyr)
library(tidyr)
library(ggplot2)
df2 <-
df %>%
select(., c(q1_active, # select variables
q2_appt,
q2_trmt,
q2_img,
q2_tele,
q2_trav))
df3 <-
df2 %>%
pivot_longer(cols = starts_with("q2"),
names_to = "variable",
names_prefix = "q2",
values_to = "values")
se <- function(x) sqrt(var(x)/length(x)) #creates function to calculate standard error of the mean
df4 <-
df3 %>%
group_by(variable, q1_active) %>% # group by both binom variable and treatment status
mutate(means=mean(values)) %>% # calculate proportions for binomial variables
mutate(se=se(values)) %>% # calculates std error
distinct(means, .keep_all=TRUE) %>%
ungroup() %>%
drop_na() # there is one "NA" group in the treatment variable I do not need
pos <- position_dodge(.5)
p2 <-
df4 %>%
ggplot(., aes(x=variable, y=means)) +
geom_point(aes(colour=as.factor(q1_active)),position=pos) +
geom_errorbar(aes(ymin=means-(1.96*se), ymax=means+(1.96*se),
colour=as.factor(q1_active),
group=as.factor(q1_active)),
width=.2, position=pos) +
labs(title="Title Here",
subtitle="Subtitle Here",
x="",
y="")
The plot looks okay. I know the proportions are correct because I've double-checked the "means" variable.
However, I'm unsure whether I'm calculating the standard error correctly for these proportions. Additionally, as you can likely see, when I run the plot one proportion has zero frequency. I would like to calculate and plot the Wilson interval for these proportions instead of the standard error.
Could someone guide me on how to correctly calculate the Wilson (or "exact") confidence interval for these binomial proportions, either before or after I pivot my data frame, and how to plot it using ggplot?
I'm relatively new to coding and R, so please forgive any sloppy code or misunderstandings. And please let me know if you need clarification on anything. Thank you in advance.
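A minimal sketch of the Wilson score interval, assuming the data can be summarised as `x` successes out of `n` trials per group. The helper name `wilson_ci()` is hypothetical; the formula is the standard Wilson score interval, which is also what `prop.test(x, n, correct = FALSE)$conf.int` reports for a single proportion. For the Clopper-Pearson ("exact") interval, `binom.test(x, n)$conf.int` can be used instead. Unlike `means +/- 1.96*se`, neither interval collapses to zero width when a proportion is 0.

```r
# Wilson score interval for a binomial proportion, vectorised over x and n
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- (z / denom) * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  # Clamp to [0, 1] to guard against floating-point overshoot at p = 0 or 1
  data.frame(estimate = p,
             lower = pmax(0, centre - half),
             upper = pmin(1, centre + half))
}

# Example: 0 successes out of 25 still gets an interval with a positive upper bound
wilson_ci(x = c(0, 7), n = c(25, 20))
```

In the pipeline from the question, one could (after dropping NA rows) summarise per `variable` and `q1_active` with `x = sum(values)` and `n = n()`, add the `lower` and `upper` columns from `wilson_ci(x, n)`, and use them as `ymin` and `ymax` in `geom_errorbar()`.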

How to make plots scales the same or turn them into Log scales in ggplot

I am using this script to plot chemical elements using ggplot2 in R:
library(ggplot2)
library(reshape2) # assumed source of melt()
# Load the same data set under a different name, because it is only used for plotting elements as a well log:
Core31B1 <- read.csv('OilSandC31B1BatchResultsCr.csv', header = TRUE)
#
# Calculating the ratios of Ca.Ti, Ca.K, Ca.Fe:
C31B1$Ca.Ti.ratio <- (C31B1$Ca/C31B1$Ti)
C31B1$Ca.K.ratio <- (C31B1$Ca/C31B1$K)
C31B1$Ca.Fe.ratio <- (C31B1$Ca/C31B1$Fe)
C31B1$Fe.Ti.ratio <- (C31B1$Fe/C31B1$Ti)
#C31B1$Si.Al.ratio <- (C31B1$Si/C31B1$Al)
#
# Create a subset of ratios and depth
core31B1_ratio <- C31B1[-2:-18]
#
# Removing the totCount column:
Core31B1 <- Core31B1[-9]
#
# Melting the data set by depth, to have only three columns: depth, element and count
C31B1_melted <- melt(Core31B1, id.vars="depth")
#ratio melted
C31B1_ra_melted <- melt(core31B1_ratio, id.vars="depth")
#
# Eliminating the NA data from the data set
C31B1_melted<-na.exclude(C31B1_melted)
# ratios
C31B1_ra_melted <-na.exclude(C31B1_ra_melted)
#
# Rename the columns:
colnames(C31B1_melted) <- c("depth","element","counts")
# ratios
colnames(C31B1_ra_melted) <- c("depth","ratio","percentage")
#
# Plotting the data in well-log format using ggplot2:
Core31B1_Sp <- ggplot(C31B1_melted, aes(x=counts, y=depth)) +
theme_bw() +
geom_path(aes(linetype = element))+ geom_path(size = 0.6) +
labs(title='Core 31 Box 1 Bioturbated sediments') +
scale_y_reverse() +
facet_grid(. ~ element, scales='free_x') #rasterImage(Core31Image, 0, 1515.03, 150, 0, interpolate = FALSE)
#
# View the plot:
Core31B1_Sp
I got the following image (as you can see, the plot has seven element plots, each with its own scale; please ignore the shading and the image at the far left):
My question is: is there a way to make these scales the same, for example by using log scales? If so, what should I change in my code?

It is not clear what you mean by "the same", because putting the panels on a common scale will not give you the same result as log-transforming the values. Here is how to get the log transformation, which, combined with not using free_x, will give you the plot I think you are asking for.
First, since you didn't provide any reproducible data (see here for more on how to ask good questions), here are some data with at least some of the features I think yours has. I am using tidyverse (specifically dplyr and tidyr) to do the construction:
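library(dplyr)
library(tidyr)
forRatios <-
  names(iris)[1:3] %>%
  combn(2, paste, collapse = " / ")
# turn the strings into named expressions (mutate_() with .dots is deprecated)
ratioExprs <- rlang::parse_exprs(forRatios)
names(ratioExprs) <- forRatios
toPlot <-
  iris %>%
  mutate(!!!ratioExprs) %>%
  select(contains("/")) %>%
  mutate(yLocation = 1:n()) %>%
  pivot_longer(-yLocation, names_to = "Comparison", values_to = "Ratio") %>%
  mutate(logRatio = log2(Ratio))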
Note that the last line takes the log base 2 of the ratio. This allows ratios in each direction (above and below 1) to plot meaningfully: a ratio of 2 becomes 1 and a ratio of 1/2 becomes -1, so they sit symmetrically around 0. I think that step is what you need. You can accomplish something similar with myDF$logRatio <- log2(myDF$ratio) if you don't want to use dplyr.
Then, you can just plot that:
ggplot(
toPlot
, aes(x = logRatio
, y = yLocation) ) +
geom_path() +
facet_wrap(~Comparison)
Gives:
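An alternative to transforming the data is to let ggplot2 transform the axis with `scale_x_log10()`, so the tick labels stay in the original count units. The sketch below uses hypothetical toy data (`toy`, with made-up lognormal counts) in place of `C31B1_melted`, and drops `scales = 'free_x'` so all facets share one log-scaled x axis. A log scale cannot represent zero or negative counts, so such values would need to be removed or offset first.

```r
library(ggplot2)

# Hypothetical toy data standing in for C31B1_melted: three elements whose
# counts differ by orders of magnitude
set.seed(1)
toy <- data.frame(
  depth = rep(1:50, times = 3),
  element = rep(c("Ca", "Fe", "Ti"), each = 50),
  counts = c(rlnorm(50, meanlog = 8), rlnorm(50, meanlog = 5), rlnorm(50, meanlog = 2))
)

# Log-scale the x axis instead of the data; the shared (fixed) x scale
# makes the facets directly comparable
p <- ggplot(toy, aes(x = counts, y = depth)) +
  geom_path() +
  scale_y_reverse() +
  scale_x_log10() +
  facet_grid(. ~ element)
p
```

Transforming the axis rather than the data keeps the labels readable (for example 10, 100, 1000 instead of 1, 2, 3), which may matter when the plot is compared against other well logs.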

ggplot grouped barchart based on marginal proportions

I am trying to create a grouped barplot that uses marginal (row) proportions rather than cell proportions and can't figure out how to change:
y = (..count..)/sum(..count..)
in ggplot to do this.
Using the mtcars dataset as an example, consider two categorical variables, cyl and am (purely for the sake of the example, cyl is taken as the response and am as the explanatory variable). Can anyone help me do this:
library(ggplot2)
data(mtcars)
# Get Proportions
mtcars_xtab <- table(mtcars$cyl,mtcars$am)
mtcars_xtab
margin.table(mtcars_xtab, 1) # cyl frequencies (summed over am)
margin.table(mtcars_xtab, 2) # am frequencies (summed over cyl)
prop.table(mtcars_xtab) # cell percentages - THIS IS WHAT'S USED IN THE PLOT
prop.table(mtcars_xtab, 1) # row percentages - THESE ARE WHAT I WANT TO USE IN THE PLOT
# Make Plot
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)
ggplot(mtcars, aes(x=am, fill=cyl)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = "dodge") +
scale_fill_brewer(palette="Set2")
Thank you.
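One way to plot marginal proportions, sketched here as one option rather than the only one, is to compute them before plotting and draw them with `geom_col()` instead of computing them inside `aes()`. With `x = am` and `fill = cyl`, bars that sum to 1 within each `am` group correspond to `prop.table(tab, 2)` for `tab <- table(cyl, am)`, i.e. the distribution of cyl within each level of am (cyl as response, am as explanatory). If you really want `prop.table(mtcars_xtab, 1)` (am within each cyl), switch to `margin = 1` and probably swap `x` and `fill`.

```r
library(ggplot2)

# Proportions of cyl within each level of am: margin = 2 because am is the
# column variable of table(cyl, am), so each am column sums to 1
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
props <- as.data.frame(prop.table(tab, margin = 2), responseName = "prop")

p <- ggplot(props, aes(x = am, y = prop, fill = cyl)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Proportion within am group")
p
```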

Graphing a histogram overlaid with a fitted 2 parameter Weibull function

I would like to plot both a histogram and a fitted Weibull function on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code for the fitted Weibull function (it needs the MASS package) is:
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've attached the data set.
Also, bonus to anyone who can provide code that achieves the same thing but creates a graph for each column of data (there are many columns in the data set). It would be nice to have all graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0

Here is the code
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import the dataset and drop the column "treeno"
# Use namespace dplyr:: explicitly because of conflict with MASS:: for function "select"
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function to provide the Weibull distribution for each column
# The distribution is calculated based on the estimated scale and shape parameters of the input
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
return(dweibull(x, scale=fitparam$estimate[1], shape=fitparam$estimate[2]))
}
# Apply function for each column then consolidate all in a data.frame
fitdata <-data %>%
apply(2, as.list) %>%
lapply(FUN = fitweibull) %>%
data.frame()
# Display graphs
multiplyingFactor<-10
ggplot() +
geom_histogram(data=gather(data), aes(x=value, group=key, fill=key), alpha=0.2) +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=multiplyingFactor*value, group=key, color=key))
And the output figure
Variant: with ggplot2 you can also draw the graphs in separate panels just by adding this final line of code:
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
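For a single column, the original base-graphics approach also works once the histogram is on the density scale (`prob = TRUE`): the fitted density can be drawn over it with `curve(..., add = TRUE)`. The sketch below uses simulated diameters (`diam`, a hypothetical stand-in because the CSV is not reproduced here) and `fitdistr(..., "weibull")`, which chooses its own starting values. The same idea should apply to the ggplot2 version above: mapping `aes(y = after_stat(density))` in `geom_histogram()` puts the histograms on the density scale, so the fitted curves should line up without the ad hoc `multiplyingFactor`.

```r
library(MASS)  # fitdistr()

# Hypothetical stand-in for data$grddia2
set.seed(42)
diam <- rweibull(200, shape = 2, scale = 3)

# Maximum-likelihood fit of a 2-parameter Weibull
fit <- fitdistr(diam, densfun = "weibull")

# Histogram on the density scale, then overlay the fitted density
# (inside curve(), `x` is the grid of points curve() evaluates on)
hist(diam, prob = TRUE, breaks = 5, main = "Diameter with fitted Weibull")
curve(dweibull(x, shape = fit$estimate["shape"], scale = fit$estimate["scale"]),
      add = TRUE, col = "red", lwd = 2)
```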

Creating a histogram in R that shows the difference in the number of errors made by three groups

I have to create a histogram in RStudio showing the number of errors for three data groups: GroupA, GroupB and GroupC. Each of them has 4 variables, one of which is the "errors" variable, so it's like GroupA$errors, etc.
How can I combine these 3 groups and make a plot that shows 3 bars on the x axis (one for each group) and the number of errors on the y axis?
dput: http://pastebin.com/vGEPDNFf

With your data:
myData <- data.frame(case = 1:48,
group = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),
age = c(70,68,61,68,77,72,64,65,69,67,71,75,73,68,65,69,63,70,78,73,76,78,65,68,75,65,62,69,70,71,60,69,60,66,75,70,62,63,79,79,66,76,64,61,70,67,69,63),
errors = c(9,6,7,8,10,11,4,5,5,6,12,8,9,3,7,6,8,6,12,7,13,10,8,8,11,5,9,6,9,6,9,7,5,3,6,6,7,5,9,8,6,6,3,4,7,5,4,5))
Here is the code you have to run in R:
library(ggplot2)
library(dplyr)
myData %>%
  group_by(group) %>%
  summarize(total.errors = sum(errors)) %>%
  ggplot(aes(x = factor(group), y = total.errors)) + geom_col()
It gives you the following figure:
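If you prefer base R, `aggregate()` and `barplot()` give the same per-group totals without dplyr or ggplot2. The data below are the `group` and `errors` columns of `myData` from above, with `group` written compactly as `rep(1:3, each = 16)`. Because every group has 16 cases, totals and means rank the groups the same way; with unequal group sizes, `FUN = mean` would be the fairer comparison.

```r
# group and errors columns from myData above (16 cases per group)
myData <- data.frame(
  group = rep(1:3, each = 16),
  errors = c(9, 6, 7, 8, 10, 11, 4, 5, 5, 6, 12, 8, 9, 3, 7, 6,
             8, 6, 12, 7, 13, 10, 8, 8, 11, 5, 9, 6, 9, 6, 9, 7,
             5, 3, 6, 6, 7, 5, 9, 8, 6, 6, 3, 4, 7, 5, 4, 5)
)

# Total errors per group (the same quantity summarize(sum(errors)) computes)
totals <- aggregate(errors ~ group, data = myData, FUN = sum)
totals

# Base-graphics bar chart: one bar per group
barplot(totals$errors, names.arg = paste("Group", totals$group),
        ylab = "Total errors")
```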
