I am a very basic user of R, so I apologize beforehand for the simplicity of the question, or if the formulation is lacking.
I have a large data set, where I have one continuous numerical variable and two factors with 2 levels each.
This is (more or less) a reconstruction of my data based on generated/artificial data:
wordhigh.mu <- -2
wordlow.mu <- -2.5
pswordhigh.mu <- -1.5
pswordlow.mu <- -1.5
sigma <- 0.3
wordshigh <- rnorm(50,mean = wordhigh.mu,sd=sigma)
wordslow <- rnorm(50,mean = wordlow.mu,sd=sigma)
pswordshigh <- rnorm(50,mean = pswordhigh.mu,sd=sigma)
pswordslow <- rnorm(50,mean = pswordlow.mu,sd=sigma)
value <- c(wordshigh,wordslow,pswordshigh,pswordslow)
LexicalitySample <- c(rep("Word",100),rep("Pseudoword",100))
FrequencySample <- c(rep("High",50),rep("Low",50),rep("High",50),rep("Low",50))
new.table <- data.frame(ErpMinAv=value,Lexicality=LexicalitySample,Frequency=FrequencySample)
I managed to plot my data using ggplot:
library(ggplot2)
ExampleBarPlot <- ggplot(new.table, aes(Lexicality, ErpMinAv, fill = Frequency)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Lexicality") +
  ylab("Microvolts") +
  labs(title = "Frequency effect for singular nouns and pseudoword controls") +
  scale_y_continuous("Microvolts", breaks = round(seq(0, -20, by = -0.5), 1)) +
  guides(fill = guide_legend(title = "Frequency")) +
  # the bars are coloured through the fill aesthetic, so the manual scale must be a fill scale
  scale_fill_manual(values = c("blue", "red"))
The plot looks like this:
What I would like to do now is to show that the difference in frequency between pseudowords is not significant, but it is significant between words. For that it would be great to have significance statistics (t values in my case) and also confidence intervals. I know how to compute these, but I don't know how to add them to the barplot.
I have looked extensively on the Internet but I could not find an example that resulted in what I want to see.
All assistance is much appreciated.
I'm going to assume that you want means and confidence intervals.
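Currently you are not plotting means. With stat = "identity" every observation is drawn as its own bar, and with position = "dodge" the bars within each group are drawn on top of one another. We can see that when adding a border color: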
We can use stat_summary() to calculate means and bootstrap confidence intervals (mean_cl_boot needs the Hmisc package to be installed):
library(ggplot2)
ggplot(new.table, aes(Lexicality, ErpMinAv, fill = Frequency)) +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(geom = 'bar', fun = mean, position = position_dodge(0.9)) +
  stat_summary(
    geom = 'errorbar',
    fun.data = mean_cl_boot,
    position = position_dodge(0.9),
    width = 0.5
  ) +
  scale_y_continuous("Microvolts", breaks = round(seq(0, -20, by = -0.5), 1))
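The question also asks for t values. Here is a minimal sketch of how they could be computed per level of Lexicality, using base R only. It assumes data simulated as in the question (the seed is arbitrary), and the object names sim and stats are made up for this illustration:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Same design as new.table in the question: 50 values per cell
sim <- data.frame(
  ErpMinAv   = c(rnorm(50, -2, 0.3), rnorm(50, -2.5, 0.3),
                 rnorm(50, -1.5, 0.3), rnorm(50, -1.5, 0.3)),
  Lexicality = rep(c("Word", "Pseudoword"), each = 100),
  Frequency  = rep(rep(c("High", "Low"), each = 50), times = 2)
)

# One Welch two-sample t-test (High vs. Low) within each level of Lexicality
stats <- do.call(rbind, lapply(split(sim, sim$Lexicality), function(d) {
  tt <- t.test(ErpMinAv ~ Frequency, data = d)
  data.frame(
    Lexicality = d$Lexicality[1],
    t          = unname(tt$statistic),
    p          = tt$p.value,
    label      = sprintf("t = %.2f", tt$statistic)
  )
}))
print(stats)
```

The label column could then be drawn above each pair of bars, for example with geom_text(data = stats, aes(x = Lexicality, y = 0.2, label = label), inherit.aes = FALSE), where y = 0.2 is just a guess at a sensible height.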
Unfortunately, I don't know the name of this kind of plot or calculation method.
I have the outcomes of the single runs of a Monte Carlo simulation (MCS). To show convergence, I want to plot in R the mean and/or variance after each additional run.
E.g.
outcome <- c(1,1.2,0.8,0.9)
In the graph: mean = 1 at step 1, mean = 1.1 at step 2, and so on. What is the standard method? How do I compute these running means/variances?
Thanks!
OK, you are looking for the cumulative mean.
Here is a ggplot version (cummean() comes from dplyr):
library(dplyr)
library(ggplot2)
outcome <- c(1, 1.2, 0.8, 0.9)
cumulative_mean <- cummean(outcome)
# Create a data frame with the cumulative means and their indices
df <- data.frame(cumulative_mean = cumulative_mean, index = seq_along(cumulative_mean))
# Plot the cumulative means
ggplot(df, aes(x = index, y = cumulative_mean)) +
  geom_point() +
  xlab("Index") +
  ylab("Cumulative Mean") +
  ggtitle("Cumulative Mean Plot") +
  theme_minimal()
And here is base R version:
plot(cumulative_mean, type = "p", xlab = "Step", ylab = "Cumulative Mean")
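The question also mentions the variance. A minimal base R sketch of both running statistics, without any packages; the names cum_mean and cum_var are made up here, and the running variance is simply recomputed on the first i values at every step, which is fine for short series:

```r
outcome <- c(1, 1.2, 0.8, 0.9)

# Running mean: cumulative sum divided by the number of values so far
cum_mean <- cumsum(outcome) / seq_along(outcome)

# Running sample variance: var() of the first i values
# (NA at step 1, because the variance of a single value is undefined)
cum_var <- sapply(seq_along(outcome), function(i) var(outcome[seq_len(i)]))

print(data.frame(step = seq_along(outcome), cum_mean, cum_var))
```

Either vector can be plotted against the step index in the same way as cumulative_mean above.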
I am using Mclust to estimate probability of component membership, but "density" is not included in the output from me.weighted(). Consequently, I am unable to plot probability density. The following code is lengthy because I want to clearly illustrate my purpose and problem, but I clearly indicate where my problem/question arises. My last chunk of code is my attempt at a solution, but it probably only highlights my ignorance of probability densities.
In this research project, my first objective is to compute an index of age-1 fish abundance for subsequent analysis. For that, I want to estimate the proportion of age-1 fish at specific lengths (i.e. an age-length key). It is reasonable to assume that the smaller mode is mostly age-1 fish and the larger mode is of age-2+ fish. My data are fish body length (fork length, cm) and abundance as a proportion of total (i.e., weighted univariate). Note, some outlying large lengths with small proportions were omitted; thus, sum(dat.df$proportions) < 1.
My specific aim here is to illustrate the probability densities superimposed on fish size composition, which reflects two age groups. Basically, in the last chunk of ggplot code, I want to swap out the estimated probability of membership to each (red) or either (green) component with probability densities, because it would make a nice, informative figure in my manuscript.
I have read relevant articles (Murphy; Scrucca et al.; Mignan; R-Bloggers, etc), but found no answer.
So, I would greatly appreciate any help on how to compute the probability densities for each component and also the component-combined probability density.
Packages
library(ggplot2)
library(mclust)
Data
dat.df <- data.frame(flcm = 15:33, proportion = c(0.0043, 0.0114, 0.0296, 0.0519, 0.0540, 0.0403, 0.0294, 0.0152, 0.0257, 0.0793, 0.1458, 0.1505, 0.1277, 0.0909, 0.0389, 0.0308, 0.0121, 0.0101, 0.0085), z1 = c(rep(1,9), rep(0,10)), z2 = c(rep(0,9), rep(1,10)))
Plot data
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
WITHOUT WEIGHTS (i.e., ignore dat.df$proportion)
Fit mixture model without weights
mod1 <- densityMclust(dat.df[, "flcm"], modelName = "V")
Plot probability density
plot(mod1, what = "density", data = dat.df$flcm, breaks = 5)
WITH WEIGHTS (i.e., include dat.df$proportion)
Refit model with weights
mod1_w <- me.weighted(modelName = "V",
data = dat.df$flcm,
z = cbind(dat.df$z1, dat.df$z2),
weights = dat.df$proportion)
Plot data with estimated fractional membership (updated z)
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,1] * dat.df$proportion)),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,2] * dat.df$proportion)),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = (mod1_w$z[,1] * dat.df$proportion) +
mod1_w$z[,2] * dat.df$proportion),
color = "green") +
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
Plot probability density - Here's where my problem/question arises
plot(mod1_w, what = "density", data = dat.df$flcm, breaks = 5)
Here's my attempted solution. Basically, for each component (age1, age2), multiply probabilities and scale to proportional abundance:
#age1 probability density
age1 <- mod1_w$z[,1]* #probability of age1 membership multiplied by
dnorm(dat.df$flcm, mod1_w$parameters$mean[1], #probability of flcm given age1
mod1_w$parameters$variance$sigmasq[1])*
sum(mod1_w$z[,1]*mod1_w$weights) #and scaled to proportional abundance of age1
#age2 probability density
age2 <- mod1_w$z[,2]* #probability of age2 membership multiplied by
dnorm(dat.df$flcm, mod1_w$parameters$mean[2],
mod1_w$parameters$variance$sigmasq[2])* #probability of flcm given age2
sum(mod1_w$z[,2]*mod1_w$weights) #and scaled to proportional abundance of age2
#combined ages probability density
age_all <- age1 + age2
#looks bad - the probability densities don't correspond well with proportional abundance
ggplot()+
geom_bar(aes(x=dat.df$flcm, y=dat.df$proportion),
fill = "gray", position="dodge", stat="identity")+
geom_line(aes(x = dat.df$flcm,
y = age1),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = age2),
color = "red") +
geom_line(aes(x = dat.df$flcm,
y = age_all),
color = "green") +
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
I am posting a solution to my own question; hopefully, this will help others. Basically, I switched to the package mixdist, which gives me the desired output as indicated in the following code.
library(mixdist)
#input data (dat.df is created by code in my original question)
dat.mx <- as.mixdata(dat.df[, 1:2])
#preliminary plot
plot(dat.mx)
#define initial parameters
dat_parms <- data.frame(pi=c(0.3, 0.7), mu=c(18, 26), sigma=c(2, 3))
#fit the model
fit1 <- mix(dat.mx, dat_parms, "gamma", constr=mixconstr(consigma="CCV"))
#plot default
plot(fit1)
#replot using ggplot for greater flexibility over appearance
z <- fitted(fit1)
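# the last (open-ended) length class has an infinite upper bound; give it a finite value for plotting
dat.mx[is.infinite(dat.mx$flcm), "flcm"] <- 34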
ggplot()+
geom_bar(aes(x=dat.mx$flcm, y=dat.mx$proportion),
fill = "gray", position="dodge", stat="identity")+
geom_line(aes(x = dat.mx$flcm,
y = z$joint[,1]),
color = "red") +
geom_line(aes(x = dat.mx$flcm,
y = z$joint[,2]),
color = "red") +
geom_line(aes(x = dat.mx$flcm,
y = z$mixed),
color = "green") +
xlab("Fork length (cm)")+
ylab("Probability density")+
theme_bw()
#conditional probabilities are output
z$conditprob
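For reference, the component densities of a mixture can also be computed by hand once the parameters are known. The sketch below is a normal mixture in base R, not the gamma mixture fitted above, and the parameter values are illustrative (the initial values from dat_parms), not fitted estimates. Two points matter for the attempt in the question: dnorm() expects a standard deviation, so a variance such as sigmasq has to go through sqrt(), and each component density is weighted by its mixing proportion rather than by the membership probabilities z:

```r
x     <- 15:33         # fork length classes (cm), as in dat.df$flcm
pro   <- c(0.3, 0.7)   # hypothetical mixing proportions
mu    <- c(18, 26)     # hypothetical component means
sigsq <- c(2, 3)^2     # hypothetical component variances

# Weighted density of each component: pro_k * N(x; mu_k, sigma_k^2)
comp <- sapply(1:2, function(k) pro[k] * dnorm(x, mu[k], sqrt(sigsq[k])))

# Combined (mixture) density is the sum over components
mixed <- rowSums(comp)

# Membership probabilities follow from the same quantities
cond <- comp / mixed

print(round(cbind(x, comp, mixed), 4))
```

Because the length classes here are 1 cm wide, these density values are on roughly the same scale as the proportions in the bar plot; for other bin widths they would need to be multiplied by the width.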
I'm currently finishing off my Masters project and need to include some graphics for the write-up. Without boring you too much, I have some data which is associated with AR(1) parameters ranging from 0.1 to 0.9 by 0.1 increments. As such I thought of doing a faceted histogram like the one below (worry not about the hideous fruit salad of colours, it will not be used).
I used this code.
ggplot(opt_lens_geom,aes(x=l_1024,fill=factor(rho))) + geom_histogram()+coord_flip()+facet_grid(.~rho,scales = "free_x")
I would also like to draw a trend line for the median values, since the AR(1) parameter is continuous. In a later iteration I deleted the padding and made it "look" like it was one graph, but I have had issues with the endpoints matching up, since each facet is drawn as a separate panel. Can anyone give me some advice on how to do this? I am not particularly partial to the faceting, so if it is not needed I can do away with it.
I will try to upload sample data, but simulating 100 values for each of the 9 rhos would work just to get started, like:
opt_lens_geom <- data.frame(rho= rep(seq(0.1,0.9,by=0.1),each=100),l_1024=rnorm(900))
You might consider ggridges. I've assumed here that you want a median value for each value of rho.
library(ggplot2)
library(ggridges)
library(dplyr)
set.seed(1001)
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
l_1024 = rnorm(900))
opt_lens_geom %>%
mutate(rho_f = factor(rho)) %>%
ggplot(aes(l_1024, rho_f)) +
stat_density_ridges(quantiles = 2, quantile_lines = TRUE)
You can add scale = 1 as a parameter to stat_density_ridges if you don't like the amount of overlap in the result.
Try the following. It uses a pre-computed data frame of the medians.
library(ggplot2)
df <- iris[c(1, 5)]
names(df) <- c("val", "rho")
# plyr is not attached, so summarise must be namespaced as well
med <- plyr::ddply(df, "rho", plyr::summarise, m = median(val))
ggplot(data = df, aes(x = val, fill = factor(rho))) +
geom_histogram() +
coord_flip() +
geom_vline(data = med, aes(xintercept = m), colour = 'black') +
facet_wrap(~ factor(rho))
You could do a variant on this using geom_violin instead of using histograms, although you wouldn't get labelled counts, just an idea of the relative density. Example with made up data:
df = data.frame(
rho = rep(c(0.1, 0.2, 0.3), each = 50),
val = sample(1:10, 150, replace = TRUE)
)
df$val = df$val + (5 * (df$rho == 0.2)) + (8 * (df$rho == 0.3))
ggplot(df, aes(x = rho, y = val, fill = factor(rho))) +
  geom_violin() +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(aes(group = 1), colour = "black",
               geom = "line", fun = "median")
This produces a violin for each value of rho, and joins the medians for each violin.
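If the medians themselves are needed, for example to draw the trend line on a plot that is not faceted, they can be computed up front. A minimal base R sketch on data simulated as suggested in the question; the column name median_l_1024 is made up here:

```r
set.seed(1001)  # arbitrary seed, only so the sketch is reproducible
opt_lens_geom <- data.frame(rho = rep(seq(0.1, 0.9, by = 0.1), each = 100),
                            l_1024 = rnorm(900))

# One median of l_1024 per value of rho
med <- aggregate(l_1024 ~ rho, data = opt_lens_geom, FUN = median)
names(med)[2] <- "median_l_1024"
print(med)
```

With rho on a continuous axis, this data frame could be added as its own layer, e.g. geom_line(data = med, aes(x = rho, y = median_l_1024), inherit.aes = FALSE).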
Sorry if this is not well asked; this is my first ever question.
Aim: to calculate the bone mineral density (BMD) T-score (+/- 2.5 SD of the sex- and age-specific BMD value), in order to say whether a patient is osteoporotic or not.
I am trying to do this graphically using ggplot2 and geom_smooth.
I am using the NHANES dataset (https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DXXFEM_H.htm#DXXINBMD) which is accessed through nhanesA package.
Load packages:
library(nhanesA)
library(ggplot2)
I am interested only in the intertrochanteric BMD, age and sex.
Load data:
nhanesTableVars('EXAM', "DXXFEM_D")
DXXFEM_D <- nhanes('DXXFEM_D')
fem_d <- DXXFEM_D
demo_d <- nhanes('DEMO_D')
demo_d <- nhanesTranslate('DEMO_D', 'RIAGENDR', data=demo_d)
DXXFEM_D_vars <- nhanesTableVars('EXAM', 'DXXFEM_D', namesonly=TRUE)
DXXFEM_D <- nhanesTranslate('DXXFEM_D', DXXFEM_D_vars, data=DXXFEM_D)
FEM_demo <- merge(demo_d, DXXFEM_D)
FEM_demo_1 <- FEM_demo[,c(5,6,55)]
I then attempted the plot, but the level argument of geom_smooth() does not work when set to 2.5.
Plot BMD with SD:
ggplot(data = FEM_demo_1, aes(x = RIDAGEYR, y = DXXINBMD, group = RIAGENDR, color = RIAGENDR)) +
geom_smooth(se = TRUE, level = 2.5) +
scale_x_continuous(minor_breaks = seq(0,85,1), breaks = seq(0,85,5))
1) I would ideally like a plot which shows the mean, -1SD (which refers to osteopaenia) and -2SD (the cut-off for osteoporosis), so that it can be used to translate BMD into clinical criteria. Is there a way to do this?
2) Is there any way to do this numerically?
Thanks
Here is the code for a plot with the mean, -1SD and -2SD. You can add styling to your liking. The mean and SD are calculated beforehand, per gender and age, in a data frame.
data <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=mean, na.rm=TRUE)
names(data) <- c("gender", "age", "mean")
data[,"sd"] <- aggregate(FEM_demo_1$DXXINBMD, by=list(FEM_demo_1$RIAGENDR, FEM_demo_1$RIDAGEYR), FUN=sd, na.rm=TRUE)[3]
ggplot(data=data, aes(x=age, group=gender))+
geom_smooth(se = FALSE, aes(y=mean))+
geom_smooth(se = FALSE, aes(y=mean-sd))+
geom_smooth(se = FALSE, aes(y=mean-(2*sd)))
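For the numerical part of the question, the same group means and SDs can be turned into a standardized score per patient. This is only a sketch under assumptions: the data are simulated (the column names sex, age and bmd are made up, standing in for RIAGENDR, RIDAGEYR and DXXINBMD), and the -1 SD and -2 SD cut-offs are the ones named in the question:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Simulated stand-in for FEM_demo_1
bmd <- data.frame(
  sex = rep(c("Male", "Female"), each = 200),
  age = sample(c(50, 60, 70), 400, replace = TRUE),
  bmd = rnorm(400, mean = 1.1, sd = 0.15)
)

# Mean and SD of BMD within each sex-by-age group, repeated for every row
grp_mean <- ave(bmd$bmd, bmd$sex, bmd$age, FUN = mean)
grp_sd   <- ave(bmd$bmd, bmd$sex, bmd$age, FUN = sd)

# Number of group SDs each patient lies above or below the group mean
bmd$score <- (bmd$bmd - grp_mean) / grp_sd

# Classify using the cut-offs from the question
bmd$category <- cut(bmd$score, breaks = c(-Inf, -2, -1, Inf),
                    labels = c("below -2 SD", "-2 to -1 SD", "above -1 SD"))
print(table(bmd$sex, bmd$category))
```

With the real data, rows with missing BMD would need na.rm handling inside the two ave() calls, as in the aggregate() calls above.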
I have a collection of data over several studies. For each study I am interested in the mean of a variable by gender, and whether it differs significantly. For each study I have the mean and 95% confidence intervals for both males and females.
What I would like to do is something similar to this:
I have used several flavours of dotplots (dotplot, dotplot2, Dotplot) but did not quite get there.
Using Dotplot from Hmisc I managed to plot one series and its error bars, but I am at a loss as to how to add the second series.
I used Dotplot and got the vertical ending of the error bars following advice given here.
Here is a working example of the code I am using:
data<-data.frame(ID=c("Study1","Study2","Study3"),avgm=c(2,3,3.5),avgf=c(2.5,3.3,4))
data$lowerm <- data$avgm*0.9
data$upperm <- data$avgm*1.1
data$lowerf <- data$avgf*0.9
data$upperf <- data$avgf*1.1
# Create the customized panel function
mypanel.Dotplot <- function(x, y, ...) {
panel.Dotplot(x,y,...)
tips <- attr(x, "other")
panel.arrows(x0 = tips[,1], y0 = y,
x1 = tips[,2], y1 = y,
length = 0.05, unit = "native",
angle = 90, code = 3)
}
library(Hmisc)
Dotplot(data$ID ~ Cbind(data$avgm,data$lowerm,data$upperm), col="blue", pch=20, panel = mypanel.Dotplot,
xlab="measure",ylab="study")
This plots three columns of data: the average for males (avgm), and the lower and upper bounds of the 95% confidence interval (lowerm and upperm). I have three other series, for the same studies, that do the same job for the female subjects (avgf, lowerf, upperf).
The results I have look like this:
What is missing, in a nutshell:
adding a second series (avgf) with means and confidence intervals defined on three other variables for the same studies
adding some vertical jitter so that they are not one on top of the other but the reader can see both even when they overlap.
Unfortunately I can't help you with Dotplot, but I find it fairly straightforward using ggplot. You just need to rearrange the data slightly.
library(ggplot2)
# grab data for males
df_m <- data[ , c(1, 2, 4, 5)]
df_m$sex <- "m"
names(df_m) <- c("ID", "avg", "lower", "upper", "sex")
df_m
# grab data for females
df_f <- data[ , c(1, 3, 6, 7)]
df_f$sex <- "f"
names(df_f) <- c("ID", "avg", "lower", "upper", "sex")
df_f
# bind the data together
df <- rbind(df_m, df_f)
# plot
ggplot(data = df, aes(x = ID, y = avg, ymin = lower, ymax = upper, colour = sex)) +
geom_point(position = position_dodge(width = 0.2)) +
geom_errorbar(position = position_dodge(width = 0.2), width = 0.1) +
coord_flip() +
# named values, so that males are blue as in the original Dotplot call
scale_colour_manual(values = c(m = "blue", f = "red")) +
theme_classic()
# if you want horizontal grid lines, you can replace the last line with:
theme_bw() +
theme(panel.grid.major.y = element_line(colour = "grey", linetype = "dashed"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank())