I did some non-linear regression and retrieved the fitting coefficients/parameters b0,b1,b2, and b3 in a data frame df. I would like to plot the same function with parameters from each row as an overlay with the initial data points (from df2) that I have used for fitting. All contained in one single graph. Any help is appreciated. Thank you in advance!
group  b0   b1   b2    b3
abc    6.3  8.9  1.66  0.025
def    5.1  8.9  1.48  1.27
ghi    5.5  9.0  1.41  1.03
group<-c("abc","def","ghi")
b0 <- c(6.3,5.1,5.5)
b1 <- c(8.9,8.9,9.0)
b2 <- c(1.66,1.48,1.41)
b3 <- c(0.025,1.27,1.03)
df <- data.frame(group,b0, b1, b2,b3)
I have tried to apply the solution from other posts but couldn't make it work.
The following just connects the points of all 3 functions without distinguishing between the group, as it does for geom_point()
f <- function(x, b0,b1,b2,b3) b0*exp(-0.5*((x-b1)/b2)^2) + b3
ggplot(df2, aes(x = x, y = y, color=group))+
geom_point() +
stat_function(fun = f, args = list(b0 = df$b0, b1 = df$b1, b2 = df$b2, b3 = df$b3))
And approach 2 of the solution from here returns:
1: Computation failed in stat_function(): non-numeric argument to
binary operator
coeflines <-
alply(as.matrix(df), 1, function(df) {
stat_function(fun=function(x){df[2]*exp(-0.5*((x-df[3])/df[4])^2) + df[5]}, colour="grey")
})
ggplot(df2, aes(x=x, y=y, color=group)) +
scale_x_continuous(limits=c(0,15)) +
scale_y_continuous(limits=c(0,15)) +
coeflines +
geom_point()
Thank you for your help!
Making some assumptions about the value of x here. It looks like you're trying to use stat_function in a way that isn't intended. It doesn't handle grouping and isn't meant to take a data set as input, so you would need repeated calls for each set of parameters. Instead, we can just evaluate the function for each set of variables, store the results, and plot those. I'm also sampling a few points from each function to simulate your request to put the original data on the plot as well.
library(tidyverse)
f <- function(x, b0,b1,b2,b3) b0*exp(-0.5*((x-b1)/b2)^2) + b3
df.func <- df %>%
group_by(group, b0, b1, b2, b3) %>%
summarize(
x = seq(0, 20, length = 100),
y = f(x, b0, b1, b2, b3)
)
# A tibble: 300 × 7
# Groups: group, b0, b1, b2 [3]
group b0 b1 b2 b3 x y
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 abc 6.3 8.9 1.66 0.025 0 0.0250
2 abc 6.3 8.9 1.66 0.025 0.202 0.0250
3 abc 6.3 8.9 1.66 0.025 0.404 0.0250
4 abc 6.3 8.9 1.66 0.025 0.606 0.0250
5 abc 6.3 8.9 1.66 0.025 0.808 0.0250
6 abc 6.3 8.9 1.66 0.025 1.01 0.0251
7 abc 6.3 8.9 1.66 0.025 1.21 0.0251
8 abc 6.3 8.9 1.66 0.025 1.41 0.0252
9 abc 6.3 8.9 1.66 0.025 1.62 0.0254
10 abc 6.3 8.9 1.66 0.025 1.82 0.0257
# … with 290 more rows
df.points <- df.func %>%
sample_n(10)
ggplot(df.func, aes(x = x, y = y, color = group))+
geom_line() +
geom_point(data = df.points)
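For readers without the tidyverse, the same evaluate-then-plot idea can be sketched in base R. This is only a sketch: the names x_grid and curves are made up for illustration, and the 0 to 20 range for x is the same assumption made above.

```r
f <- function(x, b0, b1, b2, b3) b0 * exp(-0.5 * ((x - b1) / b2)^2) + b3

df <- data.frame(
  group = c("abc", "def", "ghi"),
  b0 = c(6.3, 5.1, 5.5),
  b1 = c(8.9, 8.9, 9.0),
  b2 = c(1.66, 1.48, 1.41),
  b3 = c(0.025, 1.27, 1.03)
)

# Assumed x range; replace with the range of the real data
x_grid <- seq(0, 20, length.out = 100)

# Evaluate f once per parameter row and stack the results in long format
curves <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(
    group = df$group[i],
    x = x_grid,
    y = f(x_grid, df$b0[i], df$b1[i], df$b2[i], df$b3[i])
  )
}))

head(curves)
```

curves has one row per group and x value, so it can be passed to geom_line() in place of df.func.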
I have a set of 500 equations listed in a single column of a .csv file. The equations are written as text like this (for example):
15+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2
(this is the "right" side of the equation, which equals "y", but the text "y=" does not appear in the .csv file)
These are general linear models that have been written to a .csv file by someone else. Not all models have the same number of variables.
I would like to read these functions into R and format them in a way that will allow for using them to (iteratively) make simple line plots (one for each n = 500 models) of "y" across a range of values for A (shown on the x-axis), given values of B, C, and D.
Does anyone have any suggestions for how to do this?
I thought of something based on this [post][1]. It is not the best solution, but it seems to work.
Equations
Created two equations for an example
models <- c("15+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2","50+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2")
models_names <- c("model1","model2")
Data
Random data as an example
data <-
tibble(
A = rnorm(100),
B = rnorm(100),
C = rnorm(100),
D = rnorm(100)
)
Function
Then I created a function that turns a text equation into a function, applies it to the data, and returns the values
text_model <- function(formula){
eval(parse(text = paste('f <- function(A,B,C,D) { return(' , formula , ')}', sep='')))
out <- f(data$A,data$B,data$C,data$D)
return(out)
}
Applied equations
Finally, I apply each equation for the data, binding both.
data %>%
bind_cols(
map(.x = models,.f = text_model) %>%
set_names(models_names) %>%
bind_rows(.id = "model")
)
# A tibble: 100 x 6
A B C D model1 model2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.0633 1.18 -0.409 2.01 9.52 54.9
2 -0.00207 1.35 1.28 1.59 9.16 40.3
3 0.798 -0.141 1.58 -0.123 20.6 63.2
4 -0.162 -0.0795 0.408 0.663 14.3 52.0
5 -1.11 0.788 -1.37 1.20 4.71 46.0
6 2.80 1.84 -0.850 0.161 24.4 68.7
7 1.03 0.550 0.907 -1.92 19.0 60.8
8 0.515 -0.179 -0.980 0.0437 19.0 48.9
9 -0.353 0.0643 1.39 1.30 12.5 55.3
10 -0.427 -1.01 -1.11 -0.547 16.7 39.3
# ... with 90 more rows
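Since the question mentions that not all models use the same variables and that y should be drawn across a range of A with B, C and D held fixed, here is a rough base R sketch of that step. The names make_model, model_fns and grid, and the fixed values for B, C and D, are made up for illustration. Note that eval(parse()) runs whatever code the text contains, so it should only be used on files you trust.

```r
# Turn one text equation into a function of a named list or data frame.
# The variables are looked up by name, so the equation may use any subset.
make_model <- function(txt) {
  expr <- parse(text = txt)[[1]]
  function(vars) eval(expr, envir = vars)
}

models <- c("15+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2",
            "50+6.2*A-4.3*B+3.7*C-7.9*B*C+2*D^2")
model_fns <- lapply(models, make_model)

# Which variables does each equation actually use?
lapply(models, function(m) all.vars(parse(text = m)))

# Vary A and hold B, C, D fixed (values chosen arbitrarily here)
grid <- data.frame(A = seq(-2, 2, length.out = 5), B = 1, C = 0, D = 2)
y1 <- model_fns[[1]](grid)
y2 <- model_fns[[2]](grid)
```

From here, plot(grid$A, y1, type = "l") draws one line, and looping over model_fns gives one plot per model.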
I'm working with this data.frame and I would like to create a new column called "predicted" whose values are calculated with this formula:
rbeta(1,alfa,beta)
Here's some example data:
data<-structure(list(mu = c(0.548403436247893, 0.944576856539307, 0.72167558981069,
0.721610257581108, 0.987386739865525), kappa = c(77.8230430114621,
26.2939905325391, 28.0123299600893, 24.5166019567386, 42.8769003810988
), alfa = c(42.6784242067533, 24.8366949231, 20.2158147459191,
17.6914314530156, 42.336082882832), beta = c(35.1446188047087,
1.45729560943902, 7.7965152141702, 6.82517050372298, 0.540817498266786
)), class = "data.frame", row.names = c(NA, -5L))
Thanks
The first argument to rbeta is the number of values you want - use the number of rows of data, not 1.
data$predicted = with(data, rbeta(nrow(data), alfa, beta))
(rbeta is vectorized over the shape1 and shape2 parameters).
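A quick way to convince yourself of the vectorization is to draw many values and compare the averages with the theoretical mean alfa / (alfa + beta). This is just a sketch; the rounded parameter values, the seed and the names one_each and sims are chosen for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

alfa <- c(42.7, 24.8, 20.2)
beta <- c(35.1, 1.46, 7.80)

# One draw per (alfa, beta) pair
one_each <- rbeta(length(alfa), alfa, beta)

# Many draws: the shape vectors are recycled, so filling a 3-row matrix
# column by column puts every draw for pair i in row i
sims <- matrix(rbeta(3 * 10000, alfa, beta), nrow = 3)

# Row means should sit close to the theoretical means
cbind(simulated = rowMeans(sims), theoretical = alfa / (alfa + beta))
```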
Or with dplyr:
library(dplyr)
data %>%
mutate(predicted = rbeta(n(), alfa, beta))
We can use package purrr to iterate over 2 columns:
library(purrr)
data %>%
mutate(predicted = map2_dbl(alfa, beta, ~ rbeta(1, .x, .y)))
mu kappa alfa beta predicted
1 0.5484034 77.82304 42.67842 35.1446188 0.5618492
2 0.9445769 26.29399 24.83669 1.4572956 0.9805548
3 0.7216756 28.01233 20.21581 7.7965152 0.7686036
4 0.7216103 24.51660 17.69143 6.8251705 0.8851859
5 0.9873867 42.87690 42.33608 0.5408175 0.9991376
If you use tibble instead of data.frame you can do it at the same time you're defining the mu, kappa, alfa and beta columns:
library(tibble)
data <- tibble(mu = c(0.548403436247893, 0.944576856539307, 0.72167558981069, 0.721610257581108, 0.987386739865525),
kappa = c(77.8230430114621, 26.2939905325391, 28.0123299600893, 24.5166019567386, 42.8769003810988),
alfa = c(42.6784242067533, 24.8366949231, 20.2158147459191, 17.6914314530156, 42.336082882832),
beta = c(35.1446188047087, 1.45729560943902, 7.7965152141702, 6.82517050372298, 0.540817498266786),
predicted = rbeta(length(alfa), alfa, beta)  # n() only works inside dplyr verbs
)
data
## A tibble: 5 x 5
# mu kappa alfa beta predicted
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.548 77.8 42.7 35.1 0.534
#2 0.945 26.3 24.8 1.46 0.846
#3 0.722 28.0 20.2 7.80 0.797
#4 0.722 24.5 17.7 6.83 0.653
#5 0.987 42.9 42.3 0.541 0.991
tibble is an enhanced version of data.frame and may be worth a look: https://r4ds.had.co.nz/tibbles.html
I'm trying to carry out the following action on the columns of a dataframe (df1):
term1+term2+term3*req_no
req_no is a range of numbers: 20:24
df1:
ID term1 term2 term3
X299 1.2 2.3 0.12
X300 1.4 0.6 2.4
X301 0.3 1.6 1.2
X302 0.9 0.6 0.4
X303 0.3 1.8 0.3
X304 1.3 0.3 2.1
I need help to get the output below; my attempt follows it.
Required output:
ID 20 21 22 23 24
X299 5.9 6.02 6.14 6.26 6.38
X300 50 52.4 54.8 57.2 59.6
X301 25.9 27.1 28.3 29.5 30.7
X302 9.5 9.9 10.3 10.7 11.1
X303 8.1 8.4 8.7 9 9.3
X304 43.6 45.7 47.8 49.9 52
Here's my attempt:
results <- list()
req_no <- 20:25
for(i in 1:nrow(df1){
for(j in rq_no){
res <- term1+term2+term3*j
results[j] <- res
}
results[[i]]
}
results2 <- do.call("rbind",result)
Help will be appreciated.
Here are a couple different approaches, though neither as succinct as Parfait's. Sample data:
df <- data.frame(ID=c("X299", "X300"),
term1=c(1.2, 1.4),
term2=c(2.3, 0.6),
term3=c(0.12, 2.4))
req_no <- 20:25
Loop approach
Your initial approach is headed in the right direction, but in the future, it would help to specify exactly what your error or problem is. For an iterated and perhaps easier-to-read approach, here's one answer:
results <- matrix(data=NA, nrow=nrow(df), ncol=length(req_no)) # Empty matrix to store our results
colnames(results) <- req_no # Optional; name columns based off of req_no values
for(i in 1:nrow(df)) {
# Do the calculation we want; returns a vector of length 6 (one value per req_no)
res <- df[i,]$term1 + df[i,]$term2 + (df[i,]$term3 * req_no)
# Save results for row i of df into row i of results matrix
results[i,] <- res
}
# Now bind the columns (named 20 through 25) to the respective rows of df
output <- cbind(df, results)
output
From your initial attempt, note:
- We only do one loop, since it is easy to multiply by a vector in R.
- There are a few ways to subset data from a data frame in R. In this case, df[i,] gets everything in the i-th row, while $termX gets the value in the column named termX.
- Using a results matrix instead of a list makes it very easy to copy the temporary computations (for each row) into rows of the matrix.
- Rather than rbind() (row bind), we want cbind() (column bind) to bind those results to new columns of the original rows.
Output:
ID term1 term2 term3 20 21 22 23 24 25
1 X299 1.2 2.3 0.12 5.9 6.02 6.14 6.26 6.38 6.5
2 X300 1.4 0.6 2.40 50.0 52.40 54.80 57.20 59.60 62.0
Dplyr/purrr functions
This could also be solved using tidy functions. In essence it's a pretty similar approach to Parfait's answer, but I've made the steps a bit more verbose to see what's going on.
library(dplyr)
library(purrr)
library(tidyr)
# Use purrr's map functions to do the computation we want
nested_df <- df %>%
# Make new column holding term3 * req_no (stores a vector in each new cell)
mutate(term3r = map(term3, ~ .x * req_no)) %>%
# Make new column which sums the three columns of interest (stores a vector in each new cell)
mutate(sum = pmap(list(term1, term2, term3r), ~ ..1 + ..2 + ..3))
# "Unnest" those vectors which store our sums, and keep only those and ID
output <- nested_df %>%
# Creates six new columns (named ...1 to ...6) with the elements of each sum
unnest_wider(sum) %>%
# Keeps only the output data and IDs
select(ID, ...1:...6)
output
Output:
# A tibble: 2 x 7
ID ...1 ...2 ...3 ...4 ...5 ...6
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 X299 5.9 6.02 6.14 6.26 6.38 6.5
2 X300 50 52.4 54.8 57.2 59.6 62
Consider directly assigning new columns with sapply using your formula:
df[paste0(req_no)] <- sapply(req_no, function(r) with(df, term1 + term2 + term3 * r))
df
# ID term1 term2 term3 20 21 22 23 24
# 1 X299 1.2 2.3 0.12 5.9 6.02 6.14 6.26 6.38
# 2 X300 1.4 0.6 2.40 50.0 52.40 54.80 57.20 59.60
# 3 X301 0.3 1.6 1.20 25.9 27.10 28.30 29.50 30.70
# 4 X302 0.9 0.6 0.40 9.5 9.90 10.30 10.70 11.10
# 5 X303 0.3 1.8 0.30 8.1 8.40 8.70 9.00 9.30
# 6 X304 1.3 0.3 2.10 43.6 45.70 47.80 49.90 52.00
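Another base R option, sketched here with the full df1 from the question and req_no taken as 20:24 to match the required output, is outer(). The names res and out are made up for illustration.

```r
df1 <- data.frame(
  ID    = c("X299", "X300", "X301", "X302", "X303", "X304"),
  term1 = c(1.2, 1.4, 0.3, 0.9, 0.3, 1.3),
  term2 = c(2.3, 0.6, 1.6, 0.6, 1.8, 0.3),
  term3 = c(0.12, 2.4, 1.2, 0.4, 0.3, 2.1)
)
req_no <- 20:24

# outer() gives a 6 x 5 matrix holding term3[i] * req_no[j].
# Adding the length-6 vector term1 + term2 recycles it down each column,
# so row i of the matrix gets term1[i] + term2[i] added.
res <- df1$term1 + df1$term2 + outer(df1$term3, req_no)
colnames(res) <- req_no

# Keep only the ID next to the computed columns
out <- cbind(df1["ID"], res)
out
```

This avoids both the explicit loop and the anonymous function, at the cost of being a little less obvious to read than the sapply version.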
I'd like to make a boxplot that shows the mean instead of the median. I would also like the whiskers to stop at the 5% (lower) and 95% (upper) quantiles. Here is the code:
ggplot(data, aes(x=Cement, y=Mean_Gap, fill=Material)) +
geom_boxplot(fatten = NULL,aes(fill=Material), position=position_dodge(.9)) +
xlab("Cement") + ylab("Mean cement layer thickness") +
stat_summary(fun=mean, geom="point", aes(group=Material), position=position_dodge(.9),color="black")
I'd like to change geom to errorbar, but this doesn't work. I tried middle = mean(Mean_Gap), but this doesn't work either. I tried ymin = quantile(y,0.05), but nothing was changing. Can anyone help me?
The standard boxplot using ggplot. fill is Material:
Here is how you can create the boxplot using custom parameters for the box and whiskers. It's the solution shown by @lukeA at stackoverflow.com/a/34529614/6288065, but this one will also show you how to make several boxes by group.
The R built-in data set called "ToothGrowth" is similar to your data structure so I will use that as an example. We will plot the length of tooth growth (len) for each vitamin C supplement group (supp), separated/filled by dosage level (dose).
# "ToothGrowth" at a glance
head(ToothGrowth)
# len supp dose
#1 4.2 VC 0.5
#2 11.5 VC 0.5
#3 7.3 VC 0.5
#4 5.8 VC 0.5
#5 6.4 VC 0.5
#6 10.0 VC 0.5
library(dplyr)
library(ggplot2)
# recreate the data structure with specific "len" coordinates to plot for each group
df <- ToothGrowth %>%
group_by(supp, dose) %>%
summarise(
y0 = quantile(len, 0.05),
y25 = quantile(len, 0.25),
y50 = mean(len),
y75 = quantile(len, 0.75),
y100 = quantile(len, 0.95))
df
## A tibble: 6 x 7
## Groups: supp [2]
# supp dose y0 y25 y50 y75 y100
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 OJ 0.5 8.74 9.7 13.2 16.2 19.7
#2 OJ 1 16.8 20.3 22.7 25.6 26.9
#3 OJ 2 22.7 24.6 26.1 27.1 30.2
#4 VC 0.5 4.65 5.95 7.98 10.9 11.4
#5 VC 1 14.0 15.3 16.8 17.3 20.8
#6 VC 2 19.8 23.4 26.1 28.8 33.3
# boxplot using the mean for the middle line and the 5%/95% quantiles for the whiskers
ggplot(df, aes(supp, fill = as.factor(dose))) +
geom_boxplot(
aes(ymin = y0, lower = y25, middle = y50, upper = y75, ymax = y100),
stat = "identity"
) +
labs(y = "len", title = "Boxplot with Mean Middle Line") +
theme(plot.title = element_text(hjust = 0.5))
In the figure above, the boxplot on the left is the standard boxplot with regular median line and regular min/max whiskers. The boxplot on the right uses the mean middle line and 5%/95% quantile whiskers.
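The five statistics can also be wrapped in a small helper, which makes it easy to check them for a single group before plotting. This is a sketch in base R; box_stats and vc_low are hypothetical names introduced here.

```r
# Hypothetical helper: five numbers for one group, with the mean as the
# middle line and the 5%/95% quantiles as the whisker ends
box_stats <- function(y) {
  q <- quantile(y, c(0.05, 0.25, 0.75, 0.95), names = FALSE)
  data.frame(
    ymin = q[1], lower = q[2], middle = mean(y),
    upper = q[3], ymax = q[4]
  )
}

# Check against the VC / 0.5 row of the summary table above
vc_low <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]
box_stats(vc_low)
```

As far as I know, a function that returns these five columns can also be handed to stat_summary(fun.data = box_stats, geom = "boxplot"), so that ggplot computes the statistics per group itself; treat that as something to verify on your own data.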
Having problems with quoting and unquoting variable names within an R function that uses dplyr. Have been through this site as well as Hadley's Programming with dplyr site and it's still getting the best of me.
The function code that doesn't work is:
gcreatedata <- function(dataframe,depvar,iv1,iv2){
depvar <- enquo(depvar)
iv1 <- enquo(iv1)
iv2 <- enquo(iv2)
newdata <- dataframe %>%
mutate(!!iv1 := factor(!!iv1)) %>%
group_by(!!iv1, !!iv2) %>%
summarise(TheMean = mean(!!depvar,na.rm=TRUE),
TheSD = sd(!!depvar,na.rm=TRUE),
TheSEM = sd(!!depvar,na.rm=TRUE)/sqrt(length(!!depvar)),
CI95Muliplier = qt(.95/2 + .5, length(!!depvar)-1))
return(as_tibble(newdata))
}
Calling it with mtcars would be:
sss <- gcreatedata(mtcars,mpg,am,cyl)
I'm simply trying to convert the variable am to a factor for use downstream in a ggplot. Yes I know I could do it before I enter the function but I want it generic. The function works minus the factor step just fine which you can see if you run this version.
gcreatedata <- function(dataframe,depvar,iv1,iv2){
depvar <- enquo(depvar)
iv1 <- enquo(iv1)
iv2 <- enquo(iv2)
newdata <- dataframe %>%
mutate(foo := factor(!!iv1)) %>%
group_by(foo, !!iv2) %>%
summarise(TheMean = mean(!!depvar,na.rm=TRUE),
TheSD = sd(!!depvar,na.rm=TRUE),
TheSEM = sd(!!depvar,na.rm=TRUE)/sqrt(length(!!depvar)),
CI95Muliplier = qt(.95/2 + .5, length(!!depvar)-1))
return(as_tibble(newdata))
}
sss <- gcreatedata(mtcars,mpg,am,cyl)
It returns what I want, except that am has become foo. How do I get the name right in this line of code: mutate(!!iv1 := factor(!!iv1)) %>% ? Right now I'm getting an Error: LHS must be a name or string message, and despite trying every combination I could think of, no dice.
Thanks in advance.
Your situation is described in the tutorial part here: http://dplyr.tidyverse.org/articles/programming.html#different-input-and-output-variable
The following code works for me:
> library(dplyr)
>
> gcreatedata <- function(dataframe,depvar,iv1,iv2){
+ depvar <- enquo(depvar)
+ iv1_q <- enquo(iv1)
+ iv2 <- enquo(iv2)
+
+ iv1_name <- paste0("mean_", quo_name(iv1_q))
+
+ newdata <- dataframe %>%
+ mutate(!!iv1_name := factor(!!iv1_q)) %>%
+ group_by(!!iv1_q, !!iv2) %>%
+ summarise(TheMean = mean(!!depvar,na.rm=TRUE),
+ TheSD = sd(!!depvar,na.rm=TRUE),
+ TheSEM = sd(!!depvar,na.rm=TRUE)/sqrt(length(!!depvar)),
+ CI95Muliplier = qt(.95/2 + .5, length(!!depvar)-1))
+ return(as_tibble(newdata))
+ }
> sss <- gcreatedata(mtcars,mpg,am,cyl)
> sss
# A tibble: 6 x 6
# Groups: am [?]
am cyl TheMean TheSD TheSEM CI95Muliplier
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 4.00 22.9 1.45 0.839 4.30
2 0 6.00 19.1 1.63 0.816 3.18
3 0 8.00 15.0 2.77 0.801 2.20
4 1.00 4.00 28.1 4.48 1.59 2.36
5 1.00 6.00 20.6 0.751 0.433 4.30
6 1.00 8.00 15.4 0.566 0.400 12.7
Hope that helps!
Well 24 hours brought me a clearer head. Here's the answer should anyone else need it in the future...
gcreatedata <- function(dataframe,depvar,iv1,iv2){
depvar <- enquo(depvar)
iv1 <- enquo(iv1)
iv2 <- enquo(iv2)
newdata <- dataframe %>%
mutate(!!quo_name(iv1) := factor(!!iv1), !!quo_name(iv2) := factor(!!iv2)) %>%
group_by(!!iv1, !!iv2) %>%
summarise(TheMean = mean(!!depvar,na.rm=TRUE),
TheSD = sd(!!depvar,na.rm=TRUE),
TheSEM = sd(!!depvar,na.rm=TRUE)/sqrt(length(!!depvar)),
CI95Muliplier = qt(.95/2 + .5, length(!!depvar)-1))
return(as_tibble(newdata))
}
to test it on common data...
gcreatedata(mtcars,mpg,am,vs)
# A tibble: 4 x 6
# Groups: am [?]
am vs TheMean TheSD TheSEM CI95Muliplier
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 0 0 15.0 2.77 0.801 2.20
2 0 1 20.7 2.47 0.934 2.45
3 1 0 19.8 4.01 1.64 2.57
4 1 1 28.4 4.76 1.80 2.45
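For anyone reading this later: with more recent versions of dplyr (1.0.0 or later, as far as I know) the same thing can be written with the embrace operator {{ }} and across(), which removes the need for enquo(), !! and quo_name(). The sketch below is not the code from the answers above; gcreatedata2 and the N column are names made up here, and N counts non-missing values rather than using length().

```r
library(dplyr)

gcreatedata2 <- function(dataframe, depvar, iv1, iv2) {
  dataframe %>%
    # Convert both grouping variables to factors, keeping their names
    mutate(across(c({{ iv1 }}, {{ iv2 }}), factor)) %>%
    group_by({{ iv1 }}, {{ iv2 }}) %>%
    summarise(
      TheMean = mean({{ depvar }}, na.rm = TRUE),
      TheSD   = sd({{ depvar }}, na.rm = TRUE),
      N       = sum(!is.na({{ depvar }})),
      .groups = "drop"
    )
}

sss <- gcreatedata2(mtcars, mpg, am, vs)
sss
```

The standard error and the t multiplier from the original function can be added to the summarise() call in the same way.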