I have a data frame consisting of six variables -- one two-level grouping variable indicating treatment status and five binary (0/1) variables. I would like to plot the proportion of successes with 95% confidence intervals as error bars for each binary variable, with separate dots and colors for each treatment group.
I'm currently plotting these as shown below.
df2 <-
  df %>%
  select(., c(q1_active, # select variables
              q2_appt,
              q2_trmt,
              q2_img,
              q2_tele,
              q2_trav))
df3 <-
  df2 %>%
  pivot_longer(cols = starts_with("q2"),
               names_to = "variable",
               names_prefix = "q2",
               values_to = "values")
se <- function(x) sqrt(var(x)/length(x)) #creates function to calculate standard error of the mean
df4 <-
  df3 %>%
  group_by(variable, q1_active) %>% # group by both binom variable and treatment status
  mutate(means=mean(values)) %>% # calculate proportions for binomial variables
  mutate(se=se(values)) %>% # calculates std error
  distinct(means, .keep_all=TRUE) %>% # keep one row per group
  ungroup() %>%
  drop_na() # there is one "NA" group in the treatment variable I do not need
pos <- position_dodge(.5)
p2 <-
  df4 %>% # the summarised data frame created above
  ggplot(., aes(x=variable, y=means)) +
  geom_point(aes(colour=as.factor(q1_active)),position=pos) +
  geom_errorbar(aes(ymin=means-(1.96*se), ymax=means+(1.96*se),
                    colour=as.factor(q1_active),
                    group=as.factor(q1_active)),
                width=.2, position=pos) +
  labs(title="Title Here",
       subtitle="Subtitle Here",
       x="",
       y="")
The plot looks okay. I know the proportions are correct because I've double-checked the "means" variable.
However, I'm unsure that I'm calculating the standard error correctly for these proportions. Additionally (and as you can likely see), when I run the plot, I have one proportion with zero frequency. I would like to calculate and plot the Wilson interval for these proportions instead of the standard error as I have done.
Could someone guide me on how to correctly calculate the Wilson (or "exact") confidence interval for these binomial proportions -- either before or after I pivot my data frame -- and how to plot the intervals using ggplot?
I'm relatively new to coding and R, so please forgive any sloppy code or misunderstandings. And please let me know if you need clarification on anything. Thank you in advance.
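For reference, here is a minimal sketch of one way the Wilson interval could be computed per group and passed to geom_errorbar. It is not from the original post: the helper wilson_ci() and the simulated data frame toy are hypothetical stand-ins (toy plays the role of df3), and the formula is the standard Wilson score interval, which is also what prop.test(x, n, correct = FALSE) reports for a single proportion.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: Wilson score interval for x successes out of n trials.
# Vectorised over x and n; returns a data frame with columns lower and upper.
wilson_ci <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  denom <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  data.frame(lower = pmax(0, centre - half), upper = pmin(1, centre + half))
}

# Simulated long-format data standing in for df3 (an assumption, not the real data).
set.seed(1)
toy <- data.frame(
  q1_active = rep(c(0, 1), each = 30),
  variable  = rep(rep(c("appt", "trmt", "img"), each = 10), times = 2),
  values    = rbinom(60, 1, 0.4)
)
# Force one group to have zero successes, as in the question.
toy$values[toy$variable == "img" & toy$q1_active == 1] <- 0

# One row per variable x treatment group: count, successes, proportion.
summ <- toy %>%
  group_by(variable, q1_active) %>%
  summarise(n = n(), successes = sum(values), .groups = "drop") %>%
  mutate(prop = successes / n) %>%
  as.data.frame()
summ <- cbind(summ, wilson_ci(summ$successes, summ$n))

pos <- position_dodge(.5)
p <- ggplot(summ, aes(x = variable, y = prop, colour = as.factor(q1_active))) +
  geom_point(position = pos) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = .2, position = pos)
p
```

In this sketch the group with zero successes gets a lower bound of 0 and an upper bound above 0, rather than the zero-width bar that the mean plus or minus 1.96 standard errors produces.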
I am using this script to plot chemical elements using ggplot2 in R:
# Load the same data set under a different name, because it is just for plotting elements as a well log:
Core31B1 <- read.csv('OilSandC31B1BatchResultsCr.csv', header = TRUE)
#
# Calculating the ratios of Ca.Ti, Ca.K, Ca.Fe:
C31B1$Ca.Ti.ratio <- (C31B1$Ca/C31B1$Ti)
C31B1$Ca.K.ratio <- (C31B1$Ca/C31B1$K)
C31B1$Ca.Fe.ratio <- (C31B1$Ca/C31B1$Fe)
C31B1$Fe.Ti.ratio <- (C31B1$Fe/C31B1$Ti)
#C31B1$Si.Al.ratio <- (C31B1$Si/C31B1$Al)
#
# Create a subset of ratios and depth
core31B1_ratio <- C31B1[-2:-18]
#
# Removing the totCount column:
Core31B1 <- Core31B1[-9]
#
# Melting the data set based on the depth values, to have only three columns: depth, element and count
C31B1_melted <- melt(Core31B1, id.vars="depth")
#ratio melted
C31B1_ra_melted <- melt(core31B1_ratio, id.vars="depth")
#
# Eliminating the NA data from the data set
C31B1_melted<-na.exclude(C31B1_melted)
# ratios
C31B1_ra_melted <-na.exclude(C31B1_ra_melted)
#
# Rename the columns:
colnames(C31B1_melted) <- c("depth","element","counts")
# ratios
colnames(C31B1_ra_melted) <- c("depth","ratio","percentage")
#
# Plotting the data in well-log format using ggplot2:
Core31B1_Sp <- ggplot(C31B1_melted, aes(x=counts, y=depth)) +
theme_bw() +
geom_path(aes(linetype = element))+ geom_path(size = 0.6) +
labs(title='Core 31 Box 1 Bioturbated sediments') +
scale_y_reverse() +
facet_grid(. ~ element, scales='free_x') #rasterImage(Core31Image, 0, 1515.03, 150, 0, interpolate = FALSE)
#
# View the plot:
Core31B1_Sp
I got the following image (as you can see the plot has seven element plots, and each one has its scale. Please ignore the shadings and the image at the far left):
My question is: is there a way to make these scales the same, for example by using log scales? If so, what should I change in my code to change the scales?
It is not clear what you mean by "the same", because simply sharing one scale will not give you the same result as log-transforming the values. Here is how to get the log transformation, which, when combined with not using free_x, will give you the plot I think you are asking for.
First, since you didn't provide any reproducible data (see here for more on how to ask good questions), here is some that has at least some of the features that I think your data has. I am using tidyverse (specifically dplyr and tidyr) to do the construction:
forRatios <-
names(iris)[1:3] %>%
combn(2, paste, collapse = " / ")
toPlot <-
iris %>%
mutate_(.dots = forRatios) %>%
select(contains("/")) %>%
mutate(yLocation = 1:n()) %>%
gather(Comparison, Ratio, -yLocation) %>%
mutate(logRatio = log2(Ratio))
Note that the last line takes the log base 2 of the ratio. This allows ratios in each direction (above and below 1) to plot meaningfully. I think that step is what you need. You can accomplish something similar with myDF$logRatio <- log2(myDF$ratio) if you don't want to use dplyr.
Then, you can just plot that:
ggplot(
toPlot
, aes(x = logRatio
, y = yLocation) ) +
geom_path() +
facet_wrap(~Comparison)
Gives:
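To make the log2 step concrete without dplyr, here is a small sketch on a made-up melted data frame. The column names mirror C31B1_ra_melted from the question, but the data frame ratios and its numbers are invented for illustration. Ratios of 0.5 and 2 land at -1 and +1, so panels that share one x scale stay comparable.

```r
library(ggplot2)

# Hypothetical stand-in for C31B1_ra_melted (invented values)
ratios <- data.frame(
  depth      = rep(1:5, times = 2),
  ratio      = rep(c("Ca.Ti.ratio", "Ca.K.ratio"), each = 5),
  percentage = c(0.25, 0.5, 1, 2, 4, 8, 4, 2, 1, 0.5)
)

# Base-R version of the log2 transformation
ratios$logRatio <- log2(ratios$percentage)

# Facet without scales = 'free_x' so every panel shares the same x axis
p <- ggplot(ratios, aes(x = logRatio, y = depth)) +
  geom_path() +
  scale_y_reverse() +
  facet_grid(. ~ ratio)
p
```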
I am trying to create a grouped barplot that uses marginal (row) proportions rather than cell proportions and can't figure out how to change:
y = (..count..)/sum(..count..)
in ggplot to do this.
I am using the mtcars dataset as an example and considering two categorical variables, cyl and am (purely for the sake of the example, taking cyl as the response and am as the explanatory variable). Can anyone help me to do this?
data(mtcars)
# Get Proportions
mtcars_xtab <- table(mtcars$cyl,mtcars$am)
mtcars_xtab
margin.table(mtcars_xtab, 1) # A frequencies (summed over B)
margin.table(mtcars_xtab, 2) # B frequencies (summed over A)
prop.table(mtcars_xtab) # cell percentages - THIS IS WHAT'S USED IN THE PLOT
prop.table(mtcars_xtab, 1) # row percentages - THESE ARE WHAT I WANT TO USE IN THE PLOT
# Make Plot
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)
ggplot(mtcars, aes(x=am, fill=cyl)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position = "dodge") +
scale_fill_brewer(palette="Set2")
Thank you.
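One possible approach, sketched here under the assumption that pre-computing the proportions is acceptable: build the row-proportion table with prop.table(..., 1), convert it to a data frame, and plot the ready-made values with geom_col() instead of asking ggplot to count. The object names xtab, row_props and p are only illustrative.

```r
library(ggplot2)

data(mtcars)

# Row proportions: within each cyl level, the share of each am level
xtab <- table(cyl = mtcars$cyl, am = mtcars$am)
row_props <- as.data.frame(prop.table(xtab, 1))  # columns: cyl, am, Freq

# Plot the pre-computed proportions directly
p <- ggplot(row_props, aes(x = am, y = Freq, fill = cyl)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Proportion within cyl")
p
```

Because the proportions are computed per row of the table, the bars for each cyl level sum to 1 across the two am values.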
I would like to plot both a histogram and a fitted Weibull function on the same graph. The code to plot the histogram is:
hist(data$grddia2, prob=TRUE,breaks=5)
The code for the fitted Weibull function is (this needs the MASS package):
fitdistr(data$grddia2,densfun=dweibull,start=list(scale=1,shape=2))
How do I plot both together on the same graph? I've attached the data set.
Also, bonus to anyone who can provide code that achieves the same thing but creates a graph for each column of data, since there are many columns within the data set. It would be nice to have all graphs on the same page.
https://www.dropbox.com/s/ra9c2kkk49vyyyc/Diameter%20Distribution.csv?dl=0
Here is the code
library("ggplot2")
library("dplyr")
library("tidyr")
library("MASS")
# Import dataset and filter the column "treeno"
# Use namespace dplyr:: explicitly because of conflict with MASS:: for function "select"
data <- read.csv("Diameter Distribution.csv") %>%
dplyr::select(-treeno)
# Function to provide the Weibull distribution for each column
# The distribution is calculated based on the estimated scale and shape parameters of the input
fitweibull <- function(column) {
x <- seq(0,7,by=0.01)
fitparam <- column %>%
unlist %>%
fitdistr(densfun=dweibull,start=list(scale=1,shape=2))
return(dweibull(x, scale=fitparam$estimate[1], shape=fitparam$estimate[2]))
}
# Apply function for each column then consolidate all in a data.frame
fitdata <-data %>%
apply(2, as.list) %>%
lapply(FUN = fitweibull) %>%
data.frame()
# Display graphs
multiplyingFactor<-10
ggplot() +
geom_histogram(data=gather(data), aes(x=value, group=key, fill=key), alpha=0.2) +
geom_line(data=gather(fitdata), aes(x=rep(seq(0,7,by=0.01),ncol(fitdata)), y=multiplyingFactor*value, group=key, color=key))
And the output figure
Variant: thanks to the wonderful ggplot2 package you can also have the graphs apart just by adding this final line of code
+ facet_wrap(~ key) + theme(legend.position = "none")
Which gives you this other figure:
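For the single-column case asked about in the question, a base-graphics sketch is also possible: draw the histogram on the density scale and overlay the fitted curve with curve(). The data below are simulated because the linked CSV is not reproduced here; with the real file, x would be data$grddia2. Working on the density scale means no hand-picked multiplying factor is needed.

```r
library(MASS)

# Simulated stand-in for data$grddia2 (an assumption, not the real data)
set.seed(42)
x <- rweibull(200, shape = 2, scale = 3)

# Fit the Weibull parameters by maximum likelihood
fit <- fitdistr(x, densfun = "weibull")

# Histogram on the density scale, then the fitted density on top
hist(x, prob = TRUE, breaks = 10, main = "Histogram with fitted Weibull")
curve(dweibull(x, shape = fit$estimate["shape"], scale = fit$estimate["scale"]),
      add = TRUE, lwd = 2)
```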
I have to create a histogram in RStudio with the number of errors of three (3) data groups: GroupA, GroupB and GroupC. Each one of them has 4 variables, and one of them is the "errors" variable, so it's like GroupA$errors etc.
How can I combine these 3 groups and make a plot in which the x axis shows 3 bars (one for each group) and the y axis shows the number of errors?
dput: http://pastebin.com/vGEPDNFf
With your data:
myData <- data.frame(case = 1:48,
group = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),
age = c(70,68,61,68,77,72,64,65,69,67,71,75,73,68,65,69,63,70,78,73,76,78,65,68,75,65,62,69,70,71,60,69,60,66,75,70,62,63,79,79,66,76,64,61,70,67,69,63),
errors = c(9,6,7,8,10,11,4,5,5,6,12,8,9,3,7,6,8,6,12,7,13,10,8,8,11,5,9,6,9,6,9,7,5,3,6,6,7,5,9,8,6,6,3,4,7,5,4,5))
Here is the code you have to run in R:
library(ggplot2)
library(dplyr)
myData %>%
group_by(group) %>%
summarize(total.errors=sum(errors)) %>%
ggplot(aes(x=factor(group), y=total.errors)) + geom_bar(stat = "identity")
It gives you the following figure:
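If the three groups are still separate data frames, as the question describes, they first need to be stacked into one. A sketch, assuming each of GroupA, GroupB and GroupC has an errors column (the values below are invented for illustration, and the names combined and totals are hypothetical):

```r
library(ggplot2)

# Invented stand-ins for the three separate data frames
GroupA <- data.frame(errors = c(9, 6, 7))
GroupB <- data.frame(errors = c(8, 6, 12))
GroupC <- data.frame(errors = c(5, 3, 6))

# Stack them, adding a column that records which group each row came from
combined <- rbind(
  cbind(group = "A", GroupA),
  cbind(group = "B", GroupB),
  cbind(group = "C", GroupC)
)

# Total errors per group, then one bar per group
totals <- aggregate(errors ~ group, data = combined, FUN = sum)
p <- ggplot(totals, aes(x = group, y = errors)) + geom_col()
p
```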