I am trying to replicate the published code for ML-NMR from the multinma package, which is published here:
https://cran.r-project.org/web/packages/multinma/vignettes/example_plaque_psoriasis.html#ref-methods_paperl.
When I reach the following steps, I run into problems applying the dlogitnorm function from the logitnorm package. I assume this is due to package dependencies. I've replaced the last line of the code, since the function takes 'mu' and 'sigma' as arguments instead of 'mean' and 'sd'. However, the plot I produce doesn't match the published one. Any ideas how to match the published histogram?
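library(multinma)
library(dplyr)   # %>%, group_by(), summarise_at(), mutate(), recode()
library(tidyr)   # pivot_longer(), unnest()
library(ggplot2)
library(logitnorm) # required to use logit-Normal distribution
# Get mean and sd of covariates in each study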
ipd_summary <- pso_ipd %>%
group_by(studyc) %>%
summarise_at(vars(weight, durnpso, bsa), list(mean = mean, sd = sd, min = min, max = max)) %>%
pivot_longer(weight_mean:bsa_max, names_sep = "_", names_to = c("covariate", ".value")) %>%
# Assign distributions
mutate(dist = recode(covariate,
bsa = "dlogitnorm",
durnpso = "dgamma",
weight = "dgamma")) %>%
# Compute density curves
group_by(studyc, covariate) %>%
mutate(value = if_else(dist == "dlogitnorm",
list(seq(0, 1, length.out = 101)),
list(seq(min*0.8, max*1.2, length.out = 101)))) %>%
unnest(cols = value) %>%
# Note: this line was edited from the original code to work around an error caused by dlogitnorm(), which takes 'mu' and 'sigma' as arguments
# mutate(dens = eval(call(first(dist), x = value, mean = first(mean), sd = first(sd))))
mutate(dens = ifelse(dist != "dlogitnorm", eval(call(first(dist), x = value, mean = first(mean), sd = first(sd))), NA),
dens = ifelse(dist == "dlogitnorm", eval(call(first(dist), x = value, mu = first(mean), sigma = first(sd), log = FALSE)), dens))
# Plot histograms and assumed densities
pso_ipd %>%
pivot_longer(c(weight, durnpso, bsa), names_to = "covariate", values_to = "value") %>%
ggplot(aes(x = value)) +
geom_histogram(aes(y = stat(density)),
binwidth = function(x) diff(range(x)) / nclass.Sturges(x),
boundary = 0,
fill = "grey50") +
geom_line(aes(y = dens), data = ipd_summary,
colour = "darkred", size = 0.5) +
facet_wrap(~studyc + covariate, scales = "free", ncol = 3) +
theme_multinma()
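One possible reason for the mismatch, offered as an assumption rather than a confirmed diagnosis: in logitnorm::dlogitnorm, mu and sigma describe the underlying normal on the logit scale, so passing the sample mean and sd of a proportion directly would describe a different curve. The base-R sketch below writes the logit-normal density out by hand and numerically searches for the logit-scale parameters that reproduce a given mean and sd. The helper names (dlogitnorm_manual, logitnorm_moments, match_logitnorm) and the target values 0.3 and 0.1 are made up for illustration.

```r
# Logit-normal density written out by hand: X = plogis(Z), Z ~ Normal(mu, sigma)
dlogitnorm_manual <- function(x, mu, sigma) {
  dnorm(qlogis(x), mean = mu, sd = sigma) / (x * (1 - x))
}

# Mean and sd on the (0, 1) scale implied by logit-scale parameters,
# obtained by numerical integration over mu +/- 8 sigma
logitnorm_moments <- function(mu, sigma) {
  lo <- mu - 8 * sigma
  hi <- mu + 8 * sigma
  m1 <- integrate(function(z) plogis(z) * dnorm(z, mu, sigma), lo, hi)$value
  m2 <- integrate(function(z) plogis(z)^2 * dnorm(z, mu, sigma), lo, hi)$value
  c(mean = m1, sd = sqrt(max(m2 - m1^2, 0)))
}

# Search for logit-scale parameters that reproduce a target mean and sd
# (sigma is optimised on the log scale so that it stays positive)
match_logitnorm <- function(target_mean, target_sd) {
  loss <- function(par) {
    mom <- logitnorm_moments(par[1], exp(par[2]))
    sum((mom - c(target_mean, target_sd))^2)
  }
  fit <- optim(c(qlogis(target_mean), 0), loss)
  c(mu = fit$par[1], sigma = exp(fit$par[2]))
}

# Hypothetical summary statistics for a proportion-valued covariate
pars <- match_logitnorm(target_mean = 0.3, target_sd = 0.1)

# Density curve on the interior of (0, 1) using the matched parameters
x <- seq(0.001, 0.999, length.out = 101)
dens <- dlogitnorm_manual(x, mu = pars[["mu"]], sigma = pars[["sigma"]])
```

If this reading is right, the matched mu and sigma (rather than the raw mean and sd) would be the values to pass to a density function parameterised on the logit scale.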
I'm using ggplot geom_vline in combination with a custom function to plot certain values on top of a histogram.
The example function below e.g. returns a vector of three values (the mean and x sds below or above the mean). I can now plot these values in geom_vline(xintercept) and see them in my graph.
#example function
sds_around_the_mean <- function(x, multiplier = 1) {
mean <- mean(x, na.rm = TRUE)
sd <- sd(x, na.rm = TRUE)
tibble(low = mean - multiplier * sd,
mean = mean,
high = mean + multiplier * sd) %>%
pivot_longer(cols = everything()) %>%
pull(value)
}
Reproducible data
#data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")
df %>%
ggplot(aes(x = data)) +
geom_histogram(bins = 100) +
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
linetype = "dashed", color = "red") +
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
linetype = "dashed")
The problem is that, as you can see, I have to specify df$data in several places.
This becomes more error-prone when I apply any change to the original df that I pipe into ggplot, e.g. filtering out outliers before plotting. I would have to apply the same changes again at multiple places.
E.g.
df %>% filter(type == "normal")
#also requires
df$data
#to be changed to
df$data[df$type == "normal"]
#in geom_vline to obtain the correct input values for the xintercept.
So instead, how could I replace the df$data argument with the respective column of whatever has been piped into ggplot() in the first place? Something similar to the "." operator, I assume. I've also tried stat_summary with geom = "vline" to achieve this, but without the desired effect.
You can enclose the ggplot part in curly brackets and reference the incoming dataset with the . symbol both in the ggplot command and when calculating the sds_around_the_mean. This will make it dynamic.
df %>%
{ggplot(data = ., aes(x = data)) +
geom_histogram(bins = 100) +
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
linetype = "dashed", color = "red") +
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
linetype = "dashed")}
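To illustrate why the braces make this dynamic, here is a small sketch in which a filter() step is added before the plot and the reference lines follow the filtered data without any further edits. It uses a simplified base-R version of sds_around_the_mean (an assumption made to keep the example self-contained) and rebuilds the example data with data.frame().

```r
library(dplyr)
library(ggplot2)

# Simplified stand-in for the question's helper: mean and mean +/- k sd
sds_around_the_mean <- function(x, multiplier = 1) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(m - multiplier * s, m, m + multiplier * s)
}

set.seed(123)
df <- bind_rows(
  data.frame(type = "normal", data = rnorm(1000, mean = 100, sd = 5)),
  data.frame(type = "outliers", data = runif(5, min = 150, max = 200))
)

# The filter is written once; `.` inside the braces is the filtered data
p <- df %>%
  filter(type == "normal") %>%
  {
    ggplot(data = ., aes(x = data)) +
      geom_histogram(bins = 100) +
      geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
                 linetype = "dashed", color = "red")
  }
```

Printing p draws the histogram of the filtered rows only, with the dashed lines computed from those same rows.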
I have the evolution of the mean values for two groups. Because the number of valid observations changes at each timepoint, I want to add the number of valid values for each group to the graph at each timepoint. The aim is to show the reader that the means over time are not calculated on the same number of individuals.
mydata<-data.frame(
ID=1:10,
groupe=c(rep("A",5),rep("B",5)),
value1=c(50,49,47,46,44,39,37,36,30,30),
value2=c(43,40,42,36,25,37,36,35,30,28),
value3=c(32,30,38,32,NA,34,36,32,27,NA),
value4=c(24,25,30,NA,NA,30,32,28,NA,28),
value5=c(24,22,NA,NA,NA,25,27,NA,NA,NA)
)
library(dplyr)
library(tidyr)   # pivot_longer()
library(ggplot2)
mydata2<-mydata %>%
group_by(groupe) %>%
summarise(mean_value1 = mean(value1),
mean_value2 = mean(value2),
mean_value3 = mean(value3,na.rm=T),
mean_value4 = mean(value4,na.rm=T),
mean_value5 = mean(value5,na.rm=T)
)
mydata2Lg<-mydata2%>%pivot_longer(
cols = mean_value1 :mean_value5,
names_to = "time",values_to = "mean",
names_prefix = "mean_value"
)
mydata2Lg$groupe<-as.factor(mydata2Lg$groupe)
ggplot(mydata2Lg,aes(x=time, y=mean, group=groupe,color=groupe))+
geom_line(aes(linetype=groupe),size=1)+
geom_point(aes(shape=groupe))
I'm sorry for not giving a clear visual indication of what I want. I hope you understand what I mean.
A typical way to show the uncertainty caused by different sample size is to use error bars or a ribbon to indicate the standard error. This gives a nice visual intuition of the uncertainty introduced by both the spread of the data and the sample size. However, you can also add labels of counts too. You just need to summarize your data appropriately.
For completeness, here is your data represented with both a standard error ribbon and labels of the number of samples at each time point:
library(tidyverse)
mydata %>%
pivot_longer(value1:value5) %>%
group_by(groupe, name) %>%
summarize(count = sum(!is.na(value)),
mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE)) %>%
mutate(time = as.numeric(gsub("\\D", "", name)),
upper = mean + sd/sqrt(count),
lower = mean - sd/sqrt(count)) %>%
ggplot(aes(time, mean, color = groupe)) +
geom_ribbon(aes(ymin = lower, ymax = upper, fill = groupe),
color = NA, alpha = 0.2) +
geom_point() +
geom_line() +
geom_label(aes(label = paste0("n = ", count),
y = mean + ifelse(groupe == "A", 1,-1)),
key_glyph = draw_key_blank) +
scale_color_manual(values = c("orangered3", "deepskyblue4")) +
scale_fill_manual(values = c("orangered3", "deepskyblue4")) +
labs(title = 'Mean values for each group over time \u00B1 standard error',
subtitle = expression(italic("Labels show sample size at each point"))) +
theme_light(base_size = 16)
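As a quick check of what the ribbon and labels represent, the calculation for a single point can be done by hand. The sketch below takes group A at value3 from the question's data (one missing value) and computes the count, mean and standard error in base R; the variable names are chosen here for illustration.

```r
# Group A at value3, copied from the question's data
value3_A <- c(32, 30, 38, 32, NA)

n_valid <- sum(!is.na(value3_A))                       # number of non-missing values
mean_A  <- mean(value3_A, na.rm = TRUE)                # mean of the valid values
se_A    <- sd(value3_A, na.rm = TRUE) / sqrt(n_valid)  # standard error of the mean

c(n = n_valid, mean = mean_A, se = se_A)
# n = 4, mean = 33, se is about 1.73
```

With fewer valid values the denominator sqrt(n) shrinks, so the ribbon widens even when the spread of the data stays the same.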
Is it possible to order error plots in R by their variance? So that they are from greatest variance to least?
Code:
library(ggplot2)
df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df, 3)
library(dplyr)
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
)
df.summary
f <- ggplot(
df.summary,
aes(x = dose, y = len, ymin = len-sd, ymax = len+sd)
)
f + geom_pointrange()
# Standard error bars
f + geom_errorbar(width = 0.2) +
geom_point(size = 1.5)
Any help at all would be greatly appreciated!
Continue the pipe, coercing dose to an ordered factor whose levels follow the numeric sd in decreasing order. Then plot as in the question. All that needs to change is the mutate below.
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
) %>%
mutate(i = order(sd, decreasing = TRUE),
dose = ordered(dose, levels = dose[i])) %>%
select(-i)
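An alternative sketch, not part of the answer above: base R's reorder() can do the same reordering directly inside aes(), which leaves the summary data frame untouched. The summary is rebuilt here with tapply() only to keep the example self-contained.

```r
library(ggplot2)

df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Per-dose mean and sd, computed with base R
df.summary <- data.frame(
  dose = factor(levels(df$dose), levels = levels(df$dose)),
  len  = as.numeric(tapply(df$len, df$dose, mean)),
  sd   = as.numeric(tapply(df$len, df$dose, sd))
)

# reorder(dose, -sd) sorts the factor levels by decreasing sd
p <- ggplot(df.summary,
            aes(x = reorder(dose, -sd), y = len,
                ymin = len - sd, ymax = len + sd)) +
  geom_errorbar(width = 0.2) +
  geom_point(size = 1.5) +
  labs(x = "dose")
```

Using sd instead of -sd would order the bars from least to greatest variability.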
I want to add a power curve with confidence intervals to my diameter-weight relationship, which clearly follows a y=a*x^b regression. So far, I have used the geom_smooth "loess" version, but this is not quite right. Any suggestion on how to add a power regression line would be much appreciated. Below is the code used:
p2<-ggplot(Data,aes(x=Diameter,y=Wet_weight,colour=Site))+
geom_point(size=3.5,alpha=0.3)+
geom_smooth(aes(group=Species),method=loess,colour="black")+
labs(x="\nUmbrella diameter (mm)",y="Wet weight (mg)\n")+theme_classic()+
scale_colour_manual(values=c("black","dark blue","blue","dark green","green"))+
theme(axis.title.x=element_text(size=20),
axis.text.x=element_text(size=18,colour="black"),
axis.title.y=element_text(size=20),
axis.text.y=element_text(size=18,colour="black"),
axis.ticks=element_line(colour="black",size=1),
axis.line=element_line(colour="black",size=1,linetype="solid"),
legend.position=c(0.18,0.75),
legend.text=element_text(colour="black",size=17),
legend.title=element_text(colour="black",size=18))
p2
Thank you!
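I used this to get the equations, R2 values, and plots for many groups at once. It needs the tidyverse packages; broom is called via broom::tidy().
library(tidyverse)
# df <- ...   # change your data frame so it fits the current code
variables <- c("group", "year") # if you have multiple groups/seasons/years/elements add them here
# df$y <- ... # which variable will be your y
# df$x <- ... # which variable will be your x
# No changes needed: get the equations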
text=df %>%
group_by(across(all_of(variables))) %>% #your grouping variables
do(broom::tidy(lm(log(y) ~ log(x), data = .))) %>%
ungroup() %>%
mutate(y = round(ifelse(term=='(Intercept)',exp(estimate),estimate),digits = 2)) %>% #your equation values rounded to 2
select(-estimate,-std.error,-statistic ,-p.value) %>%
pivot_wider(names_from = term,values_from = y) %>%
rename(.,a=`(Intercept)`,b=`log(x)`)
#CHANGE before running!! add your grouping variables
rsq=df %>%
split(list(.$group,.$year)) %>% #---- HERE add the names after $
map(~lm(log(y) ~ log(x), data = .)) %>%
map(summary) %>%
map_dbl("r.squared") %>%
data.frame()
#Join the R2 and y results for the plot in a single data frame and write the equations
labels.df=mutate(rsq,groups=row.names(rsq)) %>%
separate(col = groups,into = c(variables),sep = "[.]",
convert = TRUE, remove = T, fill = "right") %>%
rename("R"='.') %>%
left_join(text,.) %>%
mutate(R=round(R,digits = 4), #round your R2 digits
eq= paste('y==',a,"~x^(",b,")", sep = ""),
rsql=paste("R^2==",R),
full= paste('y==',a,"~x^(",b,")","~~R^2==",R, sep = ""))
# plot
ggplot(df,aes(x = x,y = y)) +
geom_point(size=4,mapping = aes(
colour=factor(ifelse(is.na(get(variables[2])),"",(get(variables[2])))), #points colour
shape=get(variables[1]))) + # different shapes
facet_wrap(get(variables[1])~ifelse(is.na(get(variables[2])),"",get(variables[2])),
scales = "free",labeller = labeller(.multi_line = F))+ #for multiple groups; join text in one line
stat_smooth(mapping=aes(colour=get(variables[1])), #colours for our trend
method = 'nls', formula = 'y~a*x^b',
method.args = list(start=c(a=1,b=1)),se=FALSE) +
geom_text(labels.df,x = Inf, y = Inf,size=5, mapping = aes(label = (eq)), parse = T,vjust=1, hjust=1)+
geom_text(labels.df,x = Inf,y = Inf,size=5, mapping = aes(label = (rsql)), parse = T,vjust=2.5, hjust=1)+
#scale_y_log10() + #add this to avoid problems with big y values
labs(x="Your x label",y="your y label")+
theme_bw(base_size = 16) +
theme(legend.position = "none",
strip.background = element_rect(fill="#b2d6e2"))
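The plot above draws the nls curve with se=FALSE, so it does not show the confidence band the question asks for. One way to get a band, sketched here under the assumption that the errors are multiplicative (so that a straight-line fit on the log-log scale is appropriate), is to fit lm(log(y) ~ log(x)), predict with a confidence interval, and back-transform with exp(). The simulated data and all object names below are invented for illustration.

```r
library(ggplot2)

# Simulated data following y = a * x^b with multiplicative noise (a = 2, b = 1.5)
set.seed(42)
sim <- data.frame(x = runif(100, min = 1, max = 50))
sim$y <- 2 * sim$x^1.5 * exp(rnorm(100, sd = 0.1))

# Fit the power law as a straight line on the log-log scale
loglog_fit <- lm(log(y) ~ log(x), data = sim)
a_hat <- exp(coef(loglog_fit)[["(Intercept)"]])
b_hat <- coef(loglog_fit)[["log(x)"]]

# Predict on a grid and back-transform the fit and its 95% confidence limits
grid <- data.frame(x = seq(min(sim$x), max(sim$x), length.out = 200))
pred <- exp(predict(loglog_fit, newdata = grid,
                    interval = "confidence", level = 0.95))
curve_df <- cbind(grid, as.data.frame(pred))  # columns: x, fit, lwr, upr

p <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = 0.3) +
  geom_ribbon(aes(x = x, ymin = lwr, ymax = upr), data = curve_df,
              inherit.aes = FALSE, alpha = 0.3) +
  geom_line(aes(x = x, y = fit), data = curve_df, inherit.aes = FALSE)
```

The back-transformed band is a confidence interval for the curve on the log scale mapped back to the original scale, so it is not symmetric around the fitted line.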
I have a dataframe that I would like to plot, generated by the following code.
df_rn1 = as.data.frame(cbind(rnorm(40, 1, 1), rep("rn1", 40)))
df_rn2 = as.data.frame(cbind(rnorm(40, 10, 1), rep("rn2", 40)))
df_rn3 = as.data.frame(cbind(rnorm(40, 100, 1), rep("rn3", 40)))
df_test = rbind(df_rn1, df_rn2, df_rn3)
colnames(df_test) <- c("value", "type")
I would like to plot the dataframe with each type normalized by its first observation, so that the series are on a comparable scale. However, I am not getting further than this:
ggplot(aes(x = rep(1:40, 3), y=as.numeric(as.character(value)), color = type), data = df_test) +
geom_line()
Is it possible to do the normalization by types directly in the ggplot code?
Thx
How about this?
library(tidyverse);
df_test %>%
group_by(type) %>%
mutate(
value = as.numeric(as.character(value)),
value.scaled = (value - mean(value)) / sd(value),
idx = 1:n()) %>%
ggplot(aes(idx, value.scaled, colour = type)) + geom_line()
Note that values are scaled within type. I'm not sure what you're after; for global scaling, see @ManishSaraswat's answer.
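You can use the scale function to normalize the values. Because value was created through cbind with a character column, convert it to numeric first.
library(dplyr)
library(ggplot2)
df_test %>%
mutate(value = as.numeric(scale(as.numeric(as.character(value))))) %>%
ggplot(aes(x = rep(1:40, 3), y = value, color = type)) +
geom_line()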
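Both answers standardize the values, whereas the question literally asks to normalize by the first observation of each type. A minimal sketch of that reading divides each series by its own first value inside a grouped mutate. The data frame is rebuilt here with numeric columns, and the names idx and value_rel are chosen for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
df_test <- data.frame(
  value = c(rnorm(40, 1, 1), rnorm(40, 10, 1), rnorm(40, 100, 1)),
  type  = rep(c("rn1", "rn2", "rn3"), each = 40)
)

# Divide every series by its own first observation, so each one starts at 1
df_norm <- df_test %>%
  group_by(type) %>%
  mutate(idx = row_number(),
         value_rel = value / first(value)) %>%
  ungroup()

p <- ggplot(df_norm, aes(x = idx, y = value_rel, colour = type)) +
  geom_line()
```

Note that this only behaves well when the first observation is not close to zero, which the rn1 series (mean 1, sd 1) does not guarantee.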