I have succeeded in building my first function in R.
I would now like to improve it, but don't know how.
My dataset contains many variables that have "mirror" variables with almost the same name. The only naming difference is that the "mirror" variable has a "c" in front of the name.
The function plots comparisons of a variable (VAR) and its "mirror" (cVAR).
Simplified dataset and simplified function code that reproduces the challenge:
library(ggplot2)
df <- data.frame(
X = 1:10+rnorm(10,mean=1,sd=0.5),
cX = 1:10+rnorm(10,mean=1,sd=0.5),
Y = 1:10+rnorm(10,mean=1,sd=0.5),
cY = 1:10-rnorm(10,mean=1,sd=0.5))
compare <- function(VAR, cVAR) {
VAR <- deparse(substitute(VAR))
cVAR <- deparse(substitute(cVAR))
ggplot(df, aes_string(x=VAR, y=cVAR))+
geom_point()+
geom_smooth(method="lm")+
geom_abline(intercept = 0, slope = 1)
}
compare(Y, cY)
I would like the function to do exactly the same as it does above, but I would like to just have to write compare(Y) instead.
In STATA I would try something like this:
y=c`VAR'
but I can't find a similar approach in R.
How about this
compare <- function(VAR, cVAR) {
VAR <- deparse(substitute(VAR))
cVAR <- if(missing(cVAR)) {
paste0("c", VAR)
} else {
deparse(substitute(cVAR))
}
stopifnot(all(c(VAR, cVAR) %in% names(df)))
ggplot(df, aes_string(x=VAR, y=cVAR))+
geom_point()+
geom_smooth(method="lm")+
geom_abline(intercept = 0, slope = 1)
}
Basically we just use paste0() to add in the "c" to the first parameter when the second parameter is not specified.
Then you can run any of these
compare(Y) # to cY
compare(X) # to cX
compare(Y, cY)
compare(Y, cX)
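In recent ggplot2 releases aes_string() is deprecated. A minimal sketch of the same idea using the .data pronoun inside aes() is shown here; the name compare2, the explicit data argument and the prefix argument are assumptions made for illustration, not part of the answer above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  Y  = 1:10 + rnorm(10, mean = 1, sd = 0.5),
  cY = 1:10 - rnorm(10, mean = 1, sd = 0.5)
)

# Hypothetical variant of compare(): the data frame is passed explicitly
# and the mirror name is built from a configurable prefix.
compare2 <- function(data, VAR, prefix = "c") {
  var  <- deparse(substitute(VAR))   # unquoted name -> string
  cvar <- paste0(prefix, var)        # e.g. "Y" -> "cY"
  stopifnot(all(c(var, cvar) %in% names(data)))
  ggplot(data, aes(x = .data[[var]], y = .data[[cvar]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    geom_abline(intercept = 0, slope = 1)
}

p <- compare2(df, Y)
```

Passing the data frame as an argument also removes the dependency on a global df.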
I hope this is what you wanted. I simply paste0() the prefix myLetter onto VAR, and pass VAR to compare() as a character string.
compare <- function(VAR, myLetter = "c") {
library(ggplot2)
VAR2 <- paste0(myLetter, VAR)
ggplot(df, aes_string(VAR, VAR2))+
geom_point() +
geom_smooth(method = "lm")+
geom_abline(intercept = 0, slope = 1)
}
compare("Y")
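Because the variable name is an ordinary string here, the same paste0() construction can be used to find every variable that has a mirror. This is a small sketch on a made-up data frame; the column Z and the name has_mirror are assumptions for illustration.

```r
# Toy data: X and Y have mirrors, Z does not
df <- data.frame(X = 1:3, cX = 4:6, Y = 7:9, cY = 10:12, Z = 0)

vars <- names(df)
# keep the names whose "c"-prefixed counterpart is also a column
has_mirror <- vars[paste0("c", vars) %in% vars]
has_mirror
```

The resulting vector could then be passed to lapply() together with compare() to draw one plot per pair.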
Related
I want to wrap the following code in a function:
ez <- function(x,a) {
z<-x^3+1
return(z)
}
Q1 <- c(1,2,3,4,5)
ggplot(tibble(x = c(-10, 10)), aes(x)) +
map(1:length(Q1),
~stat_function(fun = ez, aes(color = paste0("sand ", .)), args=list(a = Q1[.])))
This code draws multiple curves. They overlap, but that does not matter here.
I want to generate a function like this
plot <- function(a) {
Q1 <- c(1,2,3,4,5)
ggplot(tibble(x = c(-6, 6)), aes(x)) +
map(1:length(Q1),
~stat_function(fun = ez, aes(color = paste0("sand ", .)), args=list(a = Q1[.])))
}
plot(2, 4, 3,6)
Maybe this is what you are looking for. As far as I understand, you want a function that takes a vector of parameters and returns a plot of curves for the chosen parameters. To this end:
Pass the parameter values as a vector, i.e. put them in c(...)
In your plot function, simply loop over a
Note: I adjusted the function ez to give different values (and non-overlapping curves) depending on a
ez <- function(x,a) {
z<-x^3+a^3
return(z)
}
library(ggplot2)
library(tibble)
library(purrr)
plot <- function(a) {
ggplot(tibble(x = c(-6, 6)), aes(x)) +
map(a,
~stat_function(fun = ez, aes(color = paste0("sand ", .)), args=list(a = .)))
}
plot(c(2, 4, 3, 6))
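If the call should look like the one in the question, with the values passed as separate arguments, one option is to collect them with `...`. The sketch below is a variation under assumptions: the name plot_curves is hypothetical (chosen to avoid masking base::plot()), and lapply() stands in for purrr::map().

```r
library(ggplot2)

ez <- function(x, a) x^3 + a^3

# Hypothetical wrapper: gathers all arguments into one vector
plot_curves <- function(...) {
  a <- c(...)
  ggplot(data.frame(x = c(-6, 6)), aes(x)) +
    # a list of layers can be added to a plot in one step
    lapply(a, function(ai) {
      stat_function(fun = ez,
                    aes(color = paste0("sand ", ai)),
                    args = list(a = ai))
    })
}

p <- plot_curves(2, 4, 3, 6)
```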
I have the following equation: y = 1 - cx, where c is a real number.
I'm trying to make something where I can pick the range of values for c and plot all the graphs of every function with the corresponding c.
Here's what I got as of now:
p <- ggplot(data = data.frame(x = 0), mapping = aes(x = x))
statfun1 <- c()
for (i in 1:3){
c <- i
fun1.i <- function(x){1 - c*x}
fun1.i.plot <- stat_function(fun = fun1.i, color="red")
statfun1 <- statfun1 + fun1.i.plot
}
p + statfun1 + xlim(-5, 5)
The p is basically what you need in ggplot2 to plot a function. I then loop over the values 1, 2 and 3 for c and try to add the layers together at the end, but this does not seem to work. Can anyone help me out or put me on the right track?
Define your function
fun1.i <- function(x, c){1 - c*x}
Now from ?`+.gg`
You can add any of the following types of objects:
...
You can also supply a list, in which case each element of the list will be added in turn.
So you might use lapply
p + xlim(-5, 5) + lapply(1:3, function(c) {
stat_function(fun = fun1.i, args = list(c = c), geom = "line", color="red")
})
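Passing c through args also sidesteps a second pitfall in the original loop: functions defined in a loop look up c when they are called, not when they are defined, so they would all end up using its last value. The base R sketch below illustrates this; the names k, funs and make_line are hypothetical.

```r
# Closures created in a loop share one variable k
funs <- list()
for (i in 1:3) {
  k <- i
  funs[[i]] <- function(x) 1 - k * x
}
loop_vals <- sapply(funs, function(f) f(1))   # every function sees k = 3

# A factory function gives each closure its own copy of k
make_line <- function(k) {
  force(k)
  function(x) 1 - k * x
}
funs2 <- lapply(1:3, make_line)
factory_vals <- sapply(funs2, function(f) f(1))

loop_vals      # -2 -2 -2
factory_vals   #  0 -1 -2
```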
I'm trying to write a function that can be called using the '+'-based ggplot2 syntax.
myplot + myfunction
Specifically, the function I'm writing symmetrizes the y-axis about zero, so it needs to determine the y-axis range for the input plot.
So let,
ylim_sym <- function(p){
get_y_range <- function(p){
ggplot2::ggplot_build(p)$layout$panel_ranges[[1]]$y.range
}
max_offset <- max(abs(get_y_range(p)))
p + ylim(- max_offset, max_offset)
}
With this function, the following works:
qplot(x = 1:10, y = exp(rnorm(10))) %>% ylim_sym()
But this doesn't work because of some precedence issue between +.gg and %>%:
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) %>%
ylim_sym()
(I could write the latter (all_my_ggplot_pipeline) %>% ylim_sym() but it's pretty ugly syntax).
Ideally, I'd like to be able to write ylim_sym such that it can be piped like so,
qplot(x = 1:10, y = exp(rnorm(10))) + ylim_sym()
but I can't work out how to access the plot on the LHS of + within ylim_sym
Any ideas?
I was able to solve it by doing the following.
StatSymYLim <- ggproto(
"StatSymYLim", Stat,
compute_group = function(data, scales) {
out <- data.frame(
x = median(data$x),
y = c(-1, 1) * max(abs(data$y))
)
out
},
required_aes = c("x", "y")
)
ylim_sym <- function(...){
geom_blank(..., stat = StatSymYLim)
}
Then the following works as required:
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) +
ylim_sym()
My understanding of ggplot2 internals is pretty shaky to be fair, so this might be a naive solution.
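To see what the stat contributes, the body of compute_group can be run on its own. The sketch below copies that logic into a plain function (the name sym_rows is hypothetical) and applies it to a toy data frame. As far as I can tell, the blank layer only sees the data it inherits from the plot, so layers that bring their own data may not be taken into account.

```r
# Same computation as compute_group above, outside of ggproto
sym_rows <- function(data) {
  data.frame(
    x = median(data$x),
    y = c(-1, 1) * max(abs(data$y))   # two invisible points at -max|y| and +max|y|
  )
}

out <- sym_rows(data.frame(x = 1:5, y = c(0.5, 2, -3, 1, 0.2)))
out
```

The two rows extend the y scale to a symmetric range when the plot is built, which is why no explicit ylim() call is needed.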
Note: your function needs an update, as the structure of the built plot object has changed slightly (panel_ranges is now panel_params).
Using package ggfun this would work:
# devtools::install_github("moodymudskipper/ggfun")
library(ggfun)
ylim_sym <- function(p){
get_y_range <- function(p){
ggplot2::ggplot_build(p)$layout$panel_params[[1]]$y.range
}
max_offset <- max(abs(get_y_range(p)))
p + ylim(- max_offset, max_offset)
}
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) +
ylim_sym
What are possible reasons as to why this is happening? It always happens after the value 10.
A subset of the dataset around the area of interest before and after the regression was applied:
Before
After
Dataset to reproduce graph
This is the ggplot2 call that I am using to generate the graph. The smoothing span used is 0.05.
dat <- read.csv("before_loess.csv", stringsAsFactors = FALSE)
smoothed.data <- applyLoessSmooth(dat, 0.05) # dat is the dataset before being smoothed
scan.plot.data <- melt(smoothed.data, id.vars = "sample.diameters", variable.name = 'series')
scan.plot <- ggplot(data = scan.plot.data, aes(sample.diameters, value)) +
geom_line(aes(colour = series)) +
xlab("Diameters (nm)") +
ylab("Concentration (dN#/cm^2)") +
theme(plot.title = element_text(hjust = 0.5))
Function used to apply the loess filter:
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
## response
vars <- colnames(raw.data)
## covariate
id <- 1:nrow(raw.data)
## define a loess filter function (fitting loess regression line)
loess.filter <- function (x, given.data, span) loess(formula = as.formula(paste(x, "id", sep = "~")),
data = given.data,
degree = 1,
span = span)$fitted
## apply filter column-by-column
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data, span = smoothing.span),
col.names = colnames(raw.data))
sample.rows <- length(loess.graph.data[1])
loess.graph.data <- loess.graph.data %>% mutate("sample.diameters" = raw.data$sample.diameters[1:nrow(raw.data)])
}
The first problem is simply that your data is rounded to three significant figures. Below 10, the values on your x axis scan.plot.data$sample.diameters increase in 0.01 increments, which produces a smooth curve on the chart, but after 10 they increase in 0.1 increments, which shows up as visible steps on the chart.
The second problem is that you should be regressing against the values of sample.diameters, rather than against the row numbers id. I think this is causing there to be multiple smoothed values for each distinct value of x - hence the steps. Here are a couple of suggested small modifications to your function...
applyLoessSmooth <- function(raw.data, smoothing.span) {
raw.data <- raw.data[complete.cases(raw.data),]
vars <- colnames(raw.data)
vars <- vars[vars != "sample.diameters"] #you are regressing against this, so exclude it from vars
loess.filter <- function (x, given.data, span) loess(
formula = as.formula(paste(x, "sample.diameters", sep = "~")), #not 'id'
data = given.data,
degree = 1,
span = span)$fitted
loess.graph.data <- as.data.frame(lapply(vars, loess.filter, given.data = raw.data,
span = smoothing.span),
col.names = vars) #final argument edited
loess.graph.data$sample.diameters <- raw.data$sample.diameters #simplified
return(loess.graph.data)
}
All of which seems to do the trick...
Of course, you could have just done this...
dat.melt <- melt(dat, id.vars = "sample.diameters", variable.name = 'series')
ggplot(data = dat.melt, aes(sample.diameters, value, colour=series)) +
geom_smooth(method="loess", span=0.05, se=FALSE)
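The effect of regressing on row numbers can be checked on synthetic data. The sketch below uses made-up values (a noisy sine curve with tied x values standing in for the rounded diameters) and counts how many distinct fitted values each x value receives under the two formulas; all names are assumptions for illustration.

```r
set.seed(42)
# x values repeat, mimicking diameters rounded to three significant figures
x  <- rep(seq(10, 12, by = 0.1), each = 3)
y  <- sin(x) + rnorm(length(x), sd = 0.05)
id <- seq_along(x)

fit_id <- loess(y ~ id, degree = 1, span = 0.5)$fitted  # regress on row number
fit_x  <- loess(y ~ x,  degree = 1, span = 0.5)$fitted  # regress on x itself

# number of distinct fitted values per distinct x
n_distinct <- function(fit) {
  vapply(split(fit, x), function(v) length(unique(round(v, 8))), integer(1))
}
n_id <- n_distinct(fit_id)
n_x  <- n_distinct(fit_x)
```

With the row-number fit, tied x values generally get several different smoothed values, which plot as vertical steps; with the fit on x there should be one value per x.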
I have a data.frame (which I melted using the melt function), from which I produce multiple scatter plots and fit a regression line using the following:
ggplot(dat, aes(id, value)) + geom_point() + geom_smooth(method="lm", se=FALSE) + facet_wrap(variable~var1, scales="free")
I would like to add the regression equation and the R^2 in each of these scatter plots for the relevant regression (i.e. the one produced by geom_smooth in each scatter plot).
var1 above is just the name of one of the id columns of the melted data and I am facing the same question with facet_grid instead of facet_wrap.
I actually solved this, please see below a worked out example where the dependent variable is var1. This was a time series dataset, please ignore the date part if not relevant for your problem.
library(plyr)
library(reshape2) # provides melt()
library(ggplot2)
rm(dat)
dat <- read.table("data.txt", header = TRUE, sep = ",")
dat <- transform(dat, date = as.POSIXct(strptime(date, "%Y-%m-%dT%H:%M:%OS")))
rm(dat.m)
dat.m <- melt(dat, id = c('ccy','date','var1'))
lm_eqn = function(df){
m = lm(var1 ~ value, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(unname(coef(m)[1]), digits = 2), # unname() keeps "(Intercept)" out of the label
b = format(unname(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
mymax = function(df){
max(df$value)
}
rm(regs)
regs <- ddply(dat.m, .(ccy,variable), lm_eqn)
regs.xpos <- ddply(dat.m, .(variable), function(df) (min(df$value)+max(df$value))/2)
regs.ypos <- ddply(dat.m, .(ccy,variable), function(df) min(df$var1) + 0.05*(max(df$var1)-min(df$var1)))
regs$y <- regs.ypos$V1
regs$x <- regs.xpos$V1
rm(gp)
gp <- ggplot(data=dat.m, aes(value, var1)) + geom_point(size = 1, alpha=0.75) + geom_smooth() + geom_smooth(method="lm", se=FALSE, color="red") + geom_text(data=regs, size=3, color="red", aes(x=x, y=y, label=V1), parse=TRUE) + facet_grid(ccy~variable, scales="free")
ggsave("data.png", gp, scale=1.5, width=11, height=8)
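The label that lm_eqn returns is a plotmath string, which geom_text(parse = TRUE) turns back into an expression. The base R sketch below builds one such label on made-up data to show what is passed along; the data and the name label are assumptions, and unname() is used so that coefficient names do not leak into the string.

```r
set.seed(1)
toy <- data.frame(value = 1:20)
toy$var1 <- 2 + 3 * toy$value + rnorm(20)

m <- lm(var1 ~ value, toy)
eq <- substitute(
  italic(y) == a + b %.% italic(x) * "," ~ ~italic(r)^2 ~ "=" ~ r2,
  list(a  = format(unname(coef(m)[1]), digits = 2),
       b  = format(unname(coef(m)[2]), digits = 2),
       r2 = format(summary(m)$r.squared, digits = 3))
)
label <- as.character(as.expression(eq))

label                           # one character string per model
parsed <- parse(text = label)   # what parse = TRUE does with it
```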
Nice solution. I'm surprised ggplot doesn't have a function built in to do this... I needed to display equations and R2 values from polynomial fits (generated by the ns(x,order) function in the splines package), and have expanded your lm_eqn function to accommodate polynomials of varying orders.
Disclaimer: I'm still quite new to R coding, and I'm aware that this code is very messy. There must be a nicer way to do it, and I'm going to start another thread to ask people to refine the code, and possibly expand it to other fit models... You can follow it here: https://groups.google.com/forum/?fromgroups#!forum/ggplot2
lm_eqn = function(df,x.var,y.var,signif.figs,eq.plot=T,model.type,order){
if(missing(x.var) || missing(y.var) || !is.character(x.var) || !is.character(y.var)) stop('x.var and y.var must be the names of the columns you want to use as x and y as a character string.' )
if(missing(model.type)) stop("model.type must be 'lin' (linear y~x model) or 'poly' (polynomial y~ns(x,order) model, generated by splines package).")
if(model.type=='poly' && missing(order)) stop("order must be specified if poly method is used.")
if(eq.plot==T) {
# Linear y=mx+c equation
if(model.type=='lin') {
fit = lm(df[[y.var]] ~ df[[x.var]]);
eq <- substitute(italic(y) == c + m %.% italic(x)*","~~italic(r)^2~"="~r2,
list(c = unname(signif(coef(fit)[1], signif.figs)),
m = unname(signif(coef(fit)[2], signif.figs)),
r2 = signif(summary(fit)$r.squared, signif.figs)))
eq <- as.character(as.expression(eq)) # store the string so return(eq) yields it
}
# polynomial expression generated with the ns(x,order) function [splines package]
if(model.type=='poly') {
fit = lm(df[[y.var]] ~ ns(df[[x.var]],order));
base = gsub('!c!',signif(coef(fit)[1],signif.figs),"italic(y) == !c! + ")
element.1 = "!m! %.% italic(x)~"
element.2 = " + !m! %.% italic(x)^!o!~"
element.r2 = gsub('!r2!',signif(summary(fit)$r.squared,signif.figs),"~~italic(r)^2~\"=\"~!r2!")
eq=""
for(o in 1:(order)) {
if(o==1) {
if(coef(fit)[(o+1)]<0) tmp=gsub("[+]","",base) else tmp=base
eq=paste(tmp,gsub('!m!',signif(coef(fit)[(o+1)],signif.figs),element.1),sep="")
}
if(o>1) {
if(coef(fit)[(o+1)]<0) tmp=gsub("[+]","",element.2) else tmp=element.2
eq=paste(eq,gsub('!o!',o,gsub('!m!',signif(coef(fit)[(o+1)],signif.figs),tmp)),sep="")
}
if(o==(order)) eq=paste(eq,"\",\"",element.r2,sep="")
}
}
}
if(eq.plot==F) {
# Linear y=mx+c equations
if(model.type=='lin') {
fit = lm(df[[y.var]] ~ df[[x.var]]);
eq <- substitute(italic(r)^2~"="~r2,
list(r2 = signif(summary(fit)$r.squared, signif.figs)))
eq <- as.character(as.expression(eq)) # store the string so return(eq) yields it
}
# polynomial expression generated with the ns() function [splines package]
if(model.type=='poly') {
fit = lm(df[[y.var]] ~ ns(df[[x.var]],order));
eq = gsub('!r2!',signif(summary(fit)$r.squared,signif.figs),"italic(r)^2~\"=\"~!r2!")
}
}
return(eq)
}
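One caveat when printing these coefficients as an equation: ns() builds a natural spline basis (its second argument is the degrees of freedom), so its coefficients generally do not correspond to powers of x. If literal polynomial coefficients are wanted, poly(x, order, raw = TRUE) is one alternative. The sketch below compares the two on made-up data with known coefficients; it is an illustration, not a drop-in change to the function above.

```r
library(splines)

set.seed(1)
x <- seq(-2, 2, length.out = 50)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(50, sd = 0.01)   # true coefficients: 1, 2, -0.5

# raw polynomial: coefficients multiply 1, x, x^2 directly
coef_poly <- unname(coef(lm(y ~ poly(x, 2, raw = TRUE))))

# natural spline: coefficients multiply spline basis functions instead
coef_ns <- unname(coef(lm(y ~ ns(x, 2))))

round(coef_poly, 1)
round(coef_ns, 1)
```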