Visually unequal spaced log2 scaled axis with geom_smooth

I have a dataset containing descending HR values (Y-axis) over Age (X-axis), stratified into two groups.
https://www.dropbox.com/s/l2p24llxcvndljl/reproducible_data_for_log2plot.txt?dl=0
I am trying to draw two geom_smooth(method = "loess") curves with the Y-axis (HR) on a log2 scale, with unevenly spaced breaks that I specify, ranging from 1 to 70.
Reading the data:
aLotOfHRs <- read.table("https://www.dropbox.com/s/l2p24llxcvndljl/reproducible_data_for_log2plot.txt?dl=1" , header = TRUE , sep = "\t")
My first attempt:
p <- ggplot(aLotOfHRs, aes(x = Age,y = HR,fill=Quantiles,color =Quantiles)) +
geom_smooth(method = "loess" , formula = y~x) +
# geom_point() +
theme_minimal() +
ylab("HR per 1-SD (log2 scale)") +
xlim(c(40,72)) +
scale_y_continuous(trans = scales::log2_trans())
1st Try
Unfortunately the axis breaks are visually evenly spaced, although the log2 plot itself is correct. If I specify breaks, labels and limits:
p <- ggplot(aLotOfHRs, aes(x = Age,y = HR,fill=Quantiles,color =Quantiles)) +
geom_smooth(method = "loess" , formula = y~x) +
# geom_point() +
theme_minimal() +
ylab("HR per 1-SD (log2 scale)") +
xlim(c(40,72)) +
scale_y_continuous(trans = scales::log2_trans()
, breaks = log2(c(1.1,2,4,8,16,32,64))
, labels = c(1.1,2,4,8,16,32,64)
, limits = c(log2(1.1) , log2(70)))
# Note: I can't extend to 1 or it returns
# Error in seq.default(a, b, length.out = n + 1) :'from' must be a finite number
2nd Try
I obtain the right scale but the wrong plot.
The best result so far has been obtained by transforming the coordinates last, but I still can't set the breaks the way I want them:
p <- ggplot(aLotOfHRs, aes(x = Age,y = HR,fill=Quantiles,color =Quantiles)) +
# geom_point() +
theme_minimal() +
ylab("HR per 1-SD of PRS (log2 scale)") +
xlim(c(40,70)) +
geom_smooth(method = "loess" , formula = y~x) +
ylim(c(1,70) ) +
coord_trans( y = "log2")
3rd Try
Any suggestions?

This might be what you are after:
aLotOfHRs <- read.table("https://www.dropbox.com/s/l2p24llxcvndljl/reproducible_data_for_log2plot.txt?dl=1" , header = TRUE , sep = "\t")
p <- ggplot(aLotOfHRs, aes(x = Age,y = HR,fill=Quantiles,color =Quantiles)) +
# geom_point() +
theme_minimal() +
ylab("HR per 1-SD of PRS (log2 scale)") +
xlim(c(40,70)) +
geom_smooth(method = "loess" , formula = y~x)
p + scale_y_continuous(trans = scales::log2_trans(), breaks = c(1, 2, 4, 8, 16, 32, 64), limits = c(1, 100), expand = c(0, 0))
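This is your 3rd attempt, but with ylim() and coord_trans() replaced by a scale_y_continuous() call at the end. The breaks and limits are given in the original HR units, not as log2() values, because the scale applies the transformation itself.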
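As a rough illustration of why the breaks go in as plain HR values, here is a sketch on simulated data. The `sim` data frame, its group labels and the decay constants are made up for this example and are not taken from the Dropbox file; it also assumes ggplot2 and scales are installed.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed; the data are simulated for illustration only
sim <- data.frame(
  Age = rep(seq(40, 70, by = 0.5), times = 2),
  Quantiles = rep(c("high", "low"), each = 61)
)
# Hypothetical HRs that fall with age and stay well inside the limits 1..100
start_hr <- ifelse(sim$Quantiles == "high", 48, 12)
sim$HR <- start_hr * exp(-0.06 * (sim$Age - 40)) * exp(rnorm(nrow(sim), sd = 0.1))

# The scale applies log2() itself, so powers of two land on evenly spaced positions
tick_positions <- scales::log2_trans()$transform(c(1, 2, 4, 8, 16, 32, 64))
tick_positions  # 0 1 2 3 4 5 6

p_sim <- ggplot(sim, aes(x = Age, y = HR, colour = Quantiles, fill = Quantiles)) +
  geom_smooth(method = "loess", formula = y ~ x) +
  scale_y_continuous(
    trans = scales::log2_trans(),
    breaks = c(1, 2, 4, 8, 16, 32, 64),  # HR units, not log2(...) values
    limits = c(1, 100),
    expand = c(0, 0)
  )
p_sim
```

With coord_trans(y = "log2") the loess fit is computed on the raw HR values and only the drawing is warped, whereas the scale transformation fits the smoother to log2(HR). That is why the two approaches can give different curves.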

Related

How to count how many data points there are for a scattergraph?

Plot_Carnivore <- ggplot2::ggplot(data = dat1_carnivore,
aes(x = AdultBodyMass_g,
y = Range_Area_km2)) +
geom_point() +
geom_smooth(method = lm) +
scale_x_continuous(trans = "log10", labels = scales::comma) +
scale_y_continuous(trans = "log10", labels = scales::comma) +
ggtitle("Adult Body Size vs Geographic Range Size Of Carnivora") +
labs(x = "Adult Body Size In Grams", y = "Range Size In Km^2")
How do I count how many data points are actually plotted in my scattergraph? Some rows were removed because they do not contain a value in my data set, so they cannot be plotted. Please help.
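One possible way to get that count, sketched on a hypothetical stand-in for dat1_carnivore (the `dat_example` data frame and its values are invented here): a point is drawn only when both variables are present, and on log10 axes both also have to be positive.

```r
# Hypothetical stand-in for dat1_carnivore, with some missing and unusable values
dat_example <- data.frame(
  AdultBodyMass_g = c(1200, NA, 35000, 4500, 0, 80000),
  Range_Area_km2  = c(15000, 2300, NA, 410000, 900, 1250000)
)

# A point is drawn only if both values are present and positive
# (log10 of zero or of a negative number is not finite)
plotted <- with(dat_example,
                !is.na(AdultBodyMass_g) & !is.na(Range_Area_km2) &
                  AdultBodyMass_g > 0 & Range_Area_km2 > 0)

sum(plotted)   # number of points that appear in the plot: 3
sum(!plotted)  # number of rows that are dropped: 3
```

Applied to the real data, the same logical test would use dat1_carnivore in place of dat_example.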

How do you replace the points on a box plot with the point's corresponding row number index?

I have a data frame that looks like this:
Train_Table_Time_Power <- data.frame(
Skew = runif(250),
Crest = runif(250),
Kurt = runif(250),
Impulse = runif(250),
TI = sample(c("0.05", "0.10", "0.15", "0.20"), 10, replace = TRUE)
)
I then created a box plot with points using the code below:
Crest_BoxPlot <- ggplot(Train_Table_Time_Power, aes(x = TI, y = Crest, color = TI)) +
geom_boxplot(notch = T, id=TRUE) +
stat_summary(fun = mean, geom="point", shape=19, color="red", size=2) +
geom_jitter(shape=16, position = position_jitter(0.2), size = 0.3) +
labs(title = "Crest_Time", x = "TI", y = "Normalized Magnitude") +
theme_minimal() + theme_Publication()
I would like to somehow have the individual points of the boxplot be replaced with their row number index, however, I can't seem to figure out a way how. Could someone direct me on how to do this, if it is indeed possible?

Just use geom_text() instead of geom_jitter(), but be aware that readability is limited because the labels overlap.
library(ggplot2)
library(tibble)
# add the row number as a column named rowid
Train_Table_Time_Power <- rowid_to_column(Train_Table_Time_Power)
ggplot(Train_Table_Time_Power, aes(x = TI, y = Crest, color = TI, label = rowid)) +
geom_boxplot(notch = TRUE) +
stat_summary(fun = mean, geom="point", shape=19, color="red", size=2) +
geom_text(position = position_jitter(0.2)) +
labs(title = "Crest_Time", x = "TI", y = "Normalized Magnitude") +
theme_minimal()

Customize formula in geom-smooth / ggplot2 / R

I want to customize the formula used in geom_smooth like this:
library(MASS)
library(ggplot2)
data("Cars93", package = "MASS")
str(Cars93)
Cars93.log <- transform(Cars93, log.price = log(Price))
log.model <- lm(log.price ~ Horsepower*Origin, data = Cars93.log)
summary(log.model)
plot(log.model)
p <- ggplot(data = Cars93.log, aes(x = Horsepower, y = log.price, colour = Origin)) +
geom_point(aes(shape = Origin, color = Origin)) + # Punkte
facet_grid(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
lm.mod <- function(df) {
y ~ x*Cars93.log$Origin
}
p_smooth <- by(Cars93.log, Cars93.log$Origin,
function(x) geom_smooth(data=x, method = lm, formula = lm.mod(x)))
p + p_smooth
However, I receive an error saying that the computation failed because the variables have different lengths.
length(Cars93.log$log.price)
length(Cars93.log$Origin)
length(Cars93.log$Horsepower)
But when I check the length for each variable they're all the same... Any ideas, what's wrong?
Thanks a lot, Martina

I agree with @Rui Barradas: the issue seems to be the lines defining lm.mod and p_smooth, together with the by() call.
Once you distinguish by Origin (e.g., with facet_wrap or color = Origin), geom_smooth automatically fits a separate model for each of those groups.
p <- ggplot(data = Cars93.log,
aes(x = Horsepower, y = log.price, color = Origin)) +
geom_point(aes(shape = Origin)) +
facet_wrap(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
p + geom_smooth(method = lm, formula = y ~ x)
You can convince yourself that this is the same as the output of log.model by extending the x-axis limits to see where the geom_smooth line would cross the y-axis (e.g., + coord_cartesian(xlim = c(0, 300))).
You can also see the difference in the graph if you don't pass color = Origin to the geom_smooth function (essentially what is happening if you comment this out from the first ggplot() initialization):
p <- ggplot(data = Cars93.log,
aes(x = Horsepower, y = log.price)) + # color = Origin)) +
geom_point(aes(shape = Origin)) +
#facet_wrap(~ Origin) +
theme(axis.title.x = element_text(margin=margin(15,0,0,0)),
axis.title.y = element_text(margin=margin(0,15,0,0))) +
scale_y_continuous(n.breaks = 7) +
scale_colour_manual(values = c("USA" = "red","non-USA" = "black")) +
scale_shape_manual(values = c(16,16)) +
ylab("Price(log)")
p + geom_smooth(method = lm, formula = y ~ x)
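The claim that the per-group smooths reproduce log.model can also be checked numerically. The sketch below fits one regression per Origin, which is what geom_smooth(method = lm) does for each group, and compares the predictions with those of the interaction model. The helper names `per.group`, `pred.interaction` and `pred.per.group` are introduced only for this illustration.

```r
library(MASS)  # provides the Cars93 data set

data("Cars93", package = "MASS")
Cars93.log <- transform(Cars93, log.price = log(Price))

# One model with an interaction term, as in the question
log.model <- lm(log.price ~ Horsepower * Origin, data = Cars93.log)

# One simple regression per Origin, mirroring what geom_smooth() fits per group
per.group <- lapply(split(Cars93.log, Cars93.log$Origin),
                    function(df) lm(log.price ~ Horsepower, data = df))

# Predictions of the interaction model for every car
pred.interaction <- unname(predict(log.model, newdata = Cars93.log))

# Predictions of the matching per-group model for every car
pred.per.group <- numeric(nrow(Cars93.log))
for (g in names(per.group)) {
  rows <- Cars93.log$Origin == g
  pred.per.group[rows] <- predict(per.group[[g]], newdata = Cars93.log[rows, ])
}

# The fitted lines coincide up to numerical tolerance
all.equal(pred.interaction, pred.per.group)
```

Only the fitted lines are compared here. The confidence bands need not match exactly, because the interaction model pools the residual variance across both groups while the per-group fits estimate it separately.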

How can I add a layer showing the distribution on a conditional variable in a probability plot in R studio?

I am fitting the following regression:
model <- glm(DV ~ conditions + predictor + conditions*predictor, family = binomial(link = "probit"), data = d)
I use 'sjPlot' (and 'ggplot2') to make the following plot:
library("ggplot2")
library("sjPlot")
plot_model(model, type = "pred", terms = c("predictor", "conditions")) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
But I can't figure out how to add a layer showing the distribution on the conditioning variable like I can easily do by setting "hist = TRUE" using 'interplot':
library("interplot")
interplot(model, var1 = "conditions", var2 = "predictor", hist = TRUE) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
I have tried a bunch of layers using just ggplot as well, with no success:
ggplot(d, aes(x=predictor, y=DV, color=conditions))+
geom_smooth(method = "glm") +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
I am open to any suggestions!

I've obviously had to try to recreate your data to get this to work, so it won't be faithful to your original, but if we assume your plot is something like this:
p <- plot_model(model, type = "pred", terms = c("predictor [all]", "conditions")) +
xlab("Xlab") +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title")
p
Then we can add a histogram of the predictor variable like this:
p + geom_histogram(data = d, inherit.aes = FALSE,
aes(x = predictor, y = after_stat(count) / 1000),
fill = "gray85", colour = "gray50", alpha = 0.3)
And if you wanted to do the whole thing in ggplot, you need to remember to tell geom_smooth that your glm is a probit model, otherwise it will just fit a normal linear regression. I've copied the color palette over too for this example, though note the smoothing lines for the groups start at their lowest x value rather than extrapolating back to 0.
ggplot(d, aes(x = predictor, y = DV, color = conditions))+
geom_smooth(method = "glm", aes(fill = conditions),
method.args = list(family = binomial(link = "probit")),
alpha = 0.15, size = 0.5) +
xlab("Xlab") +
scale_fill_manual(values = c("#e41a1c", "#377eb8")) +
scale_colour_manual(values = c("#e41a1c", "#377eb8")) +
ylab("Ylab") +
theme_minimal() +
ggtitle("Title") +
geom_histogram(aes(y = after_stat(count) / 1000),
fill = "gray85", colour = "gray50", alpha = 0.3)
Data
set.seed(69)
n_each <- 500
predictor <- rgamma(2 * n_each, 2.5, 3)
predictor <- 1 - predictor/max(predictor)
log_odds <- c((1 - predictor[1:n_each]) * 5 - 3.605,
predictor[n_each + 1:n_each] * 0 + 0.57)
DV <- rbinom(2 * n_each, 1, exp(log_odds)/(1 + exp(log_odds)))
conditions <- factor(rep(c("A", "B"), each = n_each))  # two distinct placeholder group labels
d <- data.frame(DV, predictor, conditions)
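To see why the family has to be passed through method.args in the geom_smooth() call above, here is a small base R sketch on separately simulated data. The `d_sim` data frame and its coefficients are assumptions made for this example only. With no family, glm() falls back to gaussian(), which gives the same coefficients as lm().

```r
set.seed(1)  # arbitrary seed for the simulated data
x <- runif(400)
y <- rbinom(400, 1, pnorm(-1 + 2.5 * x))  # binary outcome from a true probit model
d_sim <- data.frame(x, y)

# glm() without a family argument defaults to gaussian(), i.e. least squares
fit_default <- glm(y ~ x, data = d_sim)
fit_lm <- lm(y ~ x, data = d_sim)
all.equal(coef(fit_default), coef(fit_lm))

# The probit fit models P(y = 1) through the normal CDF
fit_probit <- glm(y ~ x, family = binomial(link = "probit"), data = d_sim)

new_x <- data.frame(x = c(-2, 0.5, 3))
predict(fit_lm, newdata = new_x)                         # linear: can leave [0, 1]
predict(fit_probit, newdata = new_x, type = "response")  # stays between 0 and 1
```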

ggplot error bar legend

I am having difficulty adding a legend to my error bar plot. I tried several commands that I've seen in other questions, but unfortunately they don't work (I am sure I'm missing something, but I can't figure out what).
library(ggplot2)
errors=matrix(c(-3.800904,-3.803444,-3.805985,-3.731204,-3.743969,
-3.756735,-3.742510,-3.764961,-3.787413,-3.731204,-3.743969,-3.756735,
-3.711420,-3.721589,-3.731758,-3.731204,-3.743969,-3.756735,-3.636346,
-3.675159,-3.713971,-3.731204,-3.743969,-3.756735),nrow=4,byrow=TRUE)
modelName=c("model 1","model 2","model 3","model 0")
boxdata=data.frame(errors,modelName)
colnames(boxdata)=c("icp","pred","icm","icp_obs","obs","icm_obs","model")
qplot(boxdata$model,boxdata$pred,
main = paste("confidance level 95% for age ", age_bp + start_age - 1,sep="")) +
geom_errorbar(aes(x=boxdata$model, ymin=boxdata$icm, ymax=boxdata$icp), width=0.20,col='deepskyblue') +
geom_point(aes(x=boxdata$model,y=boxdata$obs),shape=4,col="orange") +
geom_errorbar(aes(x=boxdata$model, ymin=boxdata$icm_obs, ymax=boxdata$icp_obs), width=0.20,col='red') +
scale_shape_manual(name="legend", values=c(19,4)) +
scale_color_manual(name="legend", values = c("black","orange")) +
xlab("models") +
ylab("confidence level")

The problem is that you are using wide form data rather than long form data. You need to convert the data from wide to long before plotting if you want to get a legend.
library(ggplot2)
errors=matrix(c(-3.800904,-3.803444,-3.805985,-3.731204,-3.743969,
-3.756735,-3.742510,-3.764961,-3.787413,-3.731204,-3.743969,-3.756735,
-3.711420,-3.721589,-3.731758,-3.731204,-3.743969,-3.756735,-3.636346,
-3.675159,-3.713971,-3.731204,-3.743969,-3.756735),nrow=4,byrow=TRUE)
errors = rbind(errors[, 1:3], errors[,4:6]) # manually reshaping the data
modelName=c("model 1","model 2","model 3","model 0")
type = rep(c("model", "obs"), each = 4)
boxdata=data.frame(errors,modelName, type)
colnames(boxdata)=c("icp","pred","icm","model", "type")
ggplot(boxdata, aes(x = model, y = pred, ymax = icp, ymin = icm,
group = type, colour = type, shape = type)) +
geom_errorbar(width=0.20) +
geom_point() +
scale_shape_manual(values=c(19, 4)) +
scale_color_manual(values = c("black","orange")) +
xlab("models") +
ylab("confidence level")
Output that looks closer to your original plot can be generated with:
ggplot(boxdata, aes(x = model, y = pred, ymax = icp, ymin = icm,
group = type, colour = type, shape = type)) +
geom_errorbar(width=0.20) +
geom_point(colour = rep(c("black","orange"), each = 4)) +
scale_shape_manual(values=c(19, 4)) +
scale_color_manual(values = c("deepskyblue", "red")) +
xlab("models") +
ylab("confidence level")
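The manual rbind() reshaping above relies on column positions. As an alternative sketch in base R, the same wide-to-long conversion can be written with named columns, which makes it easier to see which wide column ends up where. The data frame names `wide` and `long` are introduced here for illustration.

```r
# Wide data as in the question: one row per model, with separate columns
# for the predicted and the observed values
errors <- matrix(c(-3.800904, -3.803444, -3.805985, -3.731204, -3.743969,
                   -3.756735, -3.742510, -3.764961, -3.787413, -3.731204,
                   -3.743969, -3.756735, -3.711420, -3.721589, -3.731758,
                   -3.731204, -3.743969, -3.756735, -3.636346, -3.675159,
                   -3.713971, -3.731204, -3.743969, -3.756735),
                 nrow = 4, byrow = TRUE)
wide <- data.frame(errors, model = c("model 1", "model 2", "model 3", "model 0"))
colnames(wide)[1:6] <- c("icp", "pred", "icm", "icp_obs", "obs", "icm_obs")

# Long data: one row per model and type, sharing the columns icp / pred / icm
long <- rbind(
  data.frame(model = wide$model, type = "model",
             icp = wide$icp, pred = wide$pred, icm = wide$icm),
  data.frame(model = wide$model, type = "obs",
             icp = wide$icp_obs, pred = wide$obs, icm = wide$icm_obs)
)
long
```

The `long` data frame has eight rows and can be passed to the ggplot() calls above in place of boxdata, since it uses the same column names.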
