Create a function that generates plots from data


I have a data set with 8 variables (x1, y1, x2, y2, x3, y3, x4, y4), and I need to write a function that generates 4 plots: x1 vs y1, x2 vs y2, x3 vs y3 and x4 vs y4.
I first tried doing this one plot at a time, creating a new data frame with the two variables and then generating the plot:
minidata<-select(alldata,x1,y1)
ggplot(minidata,aes(x1,y1))+geom_point()+ggtitle("m VS n")
This works, but when I try to put it in a function:
graph<-function(m,n){
minidata<-select(alldata,m,n)
ggplot(minidata,aes(x=m,y=n))+geom_point()+ggtitle("m VS n")
}
graph(y1,x1)
This doesn't work; it fails with "Error in FUN(X[[i]], ...) : object 'y1' not found".
What can I do to write a function that creates the 4 plots?

There are a number of ways to do this. One approach is to pass the column names as strings:
library(ggplot2)
# example data
minidata <- data.frame(x1 = 1:20,
                       y1 = rnorm(20),
                       x2 = 1:20,
                       y2 = runif(20))
myGraph <- function(df, x, y) {
  # x and y are column names given as character strings
  mdf <- df[, c(x, y)]
  names(mdf) <- c("x", "y")
  ggplot(mdf, aes(x = x, y = y)) +
    geom_point() +
    ggtitle(paste(y, "~", x)) +
    labs(x = x, y = y)
}
# call the function by passing the column names via names()
myGraph(minidata, names(minidata)[1], names(minidata)[2])
# or simply by giving the names directly
myGraph(minidata, "x2", "y2")
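The graph(y1, x1) call in the question most likely fails because m and n are looked up by name inside the function, and the bare y1 and x1 passed in exist only as columns of alldata, not as objects R can find. As an alternative to renaming the columns, here is a minimal sketch that keeps the column names as strings and looks them up with the .data pronoun inside aes() (this assumes ggplot2 3.0.0 or later). The helper plot_pair and the object plots are made up for the illustration, and alldata is simulated.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (assumption: eight numeric columns)
set.seed(1)
alldata <- data.frame(
  x1 = rnorm(20), y1 = rnorm(20),
  x2 = rnorm(20), y2 = rnorm(20),
  x3 = rnorm(20), y3 = rnorm(20),
  x4 = rnorm(20), y4 = rnorm(20)
)

# plot_pair() is a hypothetical helper: xcol and ycol are column names as strings
plot_pair <- function(df, xcol, ycol) {
  ggplot(df, aes(x = .data[[xcol]], y = .data[[ycol]])) +
    geom_point() +
    labs(x = xcol, y = ycol, title = paste(xcol, "vs", ycol))
}

# Build the four plots x1 vs y1, ..., x4 vs y4 in one go
xcols <- paste0("x", 1:4)
ycols <- paste0("y", 1:4)
plots <- Map(function(x, y) plot_pair(alldata, x, y), xcols, ycols)

# Printing an element draws that plot, e.g. the first one
print(plots[[1]])
```

Map() returns a list named after its first vector argument, so plots[["x3"]] would give the x3 vs y3 plot.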

Related

Regression in R using a function

I am trying to smooth out my data for each variable in the data frame. Let's say it looks like this:
data <- data.frame(v1 = c(0.5,1.1,2.9,3.4,4.1,5.7,6.3,7.4,6.9,8.5,9.1),
v2 = c(0.1,0.8,0.5,1.1,1.9,2.4,0.8,3.4,2.9,3.1,4.2),
v3 = c(1.3,2.1,0.8,4.1,5.9,8.1,4.3,9.1,9.2,8.4,7.4))
data$x <- 1:nrow(data)
I then specify my x and y variables as:
x <- data$x
y <- data$v1
I can fit the predicted line I want (and I am happy with the process):
f <- function (x,a,b,d) {(a*x^2) + (b*x) + d}
order_two <- nls(y ~ f(x,a,b,d), start = c(a=1, b=1, d=1))
co2 <- coef(order_two)
data$order_two_predicted_v1 <- (co2[1] * (data$x)^2) + (co2[2] * data$x) + co2[3]
I therefore end up with an appropriately titled new variable (the predicted values for v1). I now want to do this for each of the other 100 variables in my data frame (v2 and v3 in this example).
I tried using a function to do this but can't get it to work as intended. Here is my attempt:
myfunction <- function(xaxis,yaxis){
# Specify my "y" and "x"
x <- data$xaxis
y <- data$yaxis
f <- function (x,a,b,d) {(a*x^2) + (b*x) + d}
order_two <- nls(y ~ f(x,a,b,d), start = c(a=1, b=1, d=1))
co2 <- coef(order_two)
data$order_two_predicted_yaxis <- (co2[1] * (data$x)^2) + (co2[2] * data$x) + co2[3]
}
myfunction(x,v1)
myfunction(x,v2)
myfunction(x,v3)
Not only does the function not work as intended, I would like to avoid calling the function 100 times for each variable and instead somehow loop through it.
This is really simple to do in SAS using macros but I am struggling to get this to work in R.

You can model your data directly with the lm() function:
data <- data.frame(v1 = c(0.5,1.1,2.9,3.4,4.1,5.7,6.3,7.4,6.9,8.5,9.1),
                   v2 = c(0.1,0.8,0.5,1.1,1.9,2.4,0.8,3.4,2.9,3.1,4.2),
                   v3 = c(1.3,2.1,0.8,4.1,5.9,8.1,4.3,9.1,9.2,8.4,7.4))
x <- 1:nrow(data)
# initialize a list to store the models
models <- vector("list", length = ncol(data))
# loop over the columns of data, fitting a second-order polynomial to each
for (i in seq_len(ncol(data))) {
  models[[i]] <- lm(data[, i] ~ poly(x, 2, raw = TRUE))
}
You can also use lapply instead of the for-loop, as stated in the comments.
Use predict() to get the fitted values from a model:
smoothed_v1 <- predict(models[[1]], newdata = data.frame(x = x))
Edit:
Regarding your comment, you can store the new values in data with:
for (i in length(models):1) {
  # prepend the predictions of model i as a new first column
  data <- cbind(predict(models[[i]], newdata = data.frame(x = x)), data)
  # set the name for the new column
  names(data)[1] <- paste("pred_v", i, sep = "")
}
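The lapply variant mentioned above could look roughly like the following sketch. It fits the same second-order polynomial to every column and appends the fitted values as new columns. The names preds and result, and the pred_ prefix, are chosen here for illustration only.

```r
# Same example data as in the answer above
data <- data.frame(v1 = c(0.5,1.1,2.9,3.4,4.1,5.7,6.3,7.4,6.9,8.5,9.1),
                   v2 = c(0.1,0.8,0.5,1.1,1.9,2.4,0.8,3.4,2.9,3.1,4.2),
                   v3 = c(1.3,2.1,0.8,4.1,5.9,8.1,4.3,9.1,9.2,8.4,7.4))
x <- seq_len(nrow(data))

# Fit one quadratic model per column; lapply() iterates over the columns
models <- lapply(data, function(y) lm(y ~ poly(x, 2, raw = TRUE)))

# fitted() returns the model predictions at the original x values
preds <- as.data.frame(lapply(models, fitted))
names(preds) <- paste0("pred_", names(data))

# Append the smoothed columns to the original data
result <- cbind(data, preds)
head(result)
```

Because lapply() keeps the column names, the models can be accessed by name (for example models$v2), which avoids keeping track of positions when there are 100 variables.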

Creating a boxplot loop with ggplot2 for only certain variables

I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots

I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links:
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
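I kept most of your code the same and only changed what needed to be changed. The function now uses its arguments instead of hard-coded column names, and it returns the plot so that it can be stored.
library(ggplot2)
myboxplot2 <- function(mydata = df, myexposure, myoutcome) {
  # build the aesthetics from the column names passed as strings
  bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
    geom_boxplot()
  # return the plot object rather than relying on the value of print()
  bp
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_() when the column names are stored in strings. Again, read the links above. (In recent ggplot2 releases aes_() is deprecated, and aes(.data[[myexposure]], .data[[myoutcome]]) does the same job.)
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did, storing the plot object returned by each call.
allplots <- list()
for (i in seq_along(outcomes)) {
  allplots[[i]] <- myboxplot2(myexposure = "label", myoutcome = outcomes[i])
}
allplots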

Sending dataframes within list to a plot function

I'm trying to make multiple ggplot charts from multiple data frames. I have developed the code below but the final loop doesn't work.
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x) {
x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b)) +
ggsave(paste0(substitute(x),".png"))
}
ll <- list(df1,df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]])
}
I know it's something to do with
ll[[i]]
but I don't understand why, because when I type that in the console it gives the data frame I want. Also, is there a way to do this the tidyverse way, with the map functions instead of a for loop?

I assume you want to see two files called df1.png and df2.png at the end.
You need to somehow pass the names of the data frames on to the function. One way of doing this is through a named list, passing the name along with the content of each list element.
library(ggplot2)
library(purrr)
library(tibble)  # provides tibble()
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x, nm) {
p <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(nm,".png"), p, device = "png")
}
ll <- list(df1=df1,df2=df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]], names(ll[i]))
}
In tidyverse you could just replace the loop with the following command without modifying the function.
purrr::walk2(ll, names(ll),chart_it)
or simply
purrr::iwalk(ll, chart_it)
There are also imap and lmap, but they return their results and so leave some output in the console, which is probably not what you want.
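To see why the name has to be passed explicitly, here is a small base R sketch of what substitute() captures. It returns the expression exactly as it was written at the call site, so inside a loop it sees ll[[i]] rather than df1. The helper show_name is hypothetical and exists only for this demonstration; plain data frames stand in for the tibbles.

```r
# show_name() is a hypothetical helper that reports how its argument was written
show_name <- function(x) deparse(substitute(x))

df1 <- data.frame(a = 1:3)
df2 <- data.frame(a = 4:6)

# Called with a bare variable, the captured expression is just the variable name
direct <- show_name(df1)

# Called on a list element, the captured expression is the indexing call itself
ll <- list(df1 = df1, df2 = df2)
in_loop <- character(0)
for (i in seq_along(ll)) {
  in_loop[i] <- show_name(ll[[i]])
}

print(direct)     # "df1"
print(in_loop)    # "ll[[i]]" "ll[[i]]"

# The list names carry the labels that substitute() cannot recover
print(names(ll))  # "df1" "df2"
```

This is why the named list together with names(ll), walk2 or iwalk is needed to get files called df1.png and df2.png.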

The problem is in your chart_it function. It doesn't return a ggplot. Try saving the result of the pipe into a variable and return() that (or place it as the last statement in the function).
Something along the lines of
chart_it <- function(x) {
  chart <- x %>% ggplot() +
    geom_line(mapping = aes(y = a, x = b))
  # save the chart explicitly; by default ggsave() saves the last ggplot figure
  ggsave(paste0(substitute(x), ".png"), plot = chart)
  return(chart)
}

Boxplot sublists of a list

I'm new to R.
I'm trying to draw a boxplot of the data (df, list) in each sub-list using lapply.
I have written this function:
group.box <- function(x) {
lapply(X = x, FUN = boxplot)
}
Running it on the list that contains 6 sub-lists gives me 6 individual boxplot graphs (6 separate graphs) and this text:
$sublist1
NULL
$sublist2
NULL
$sublist3
NULL
...
I tried to combine these graphs into one picture with 6 graphs:
par(mfrow=c(2,3))
group.box(data)
dev.off()
But then I only get the text (as displayed above) with no graphs.
I thought maybe I should just export these 6 graphs into one pdf file.
Thank you!

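You could try
data <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100), d = rnorm(100), e = rnorm(100), f = rnorm(100))
group.box <- function(x, plot_row, plot_col) {
  quartz()  # opens a new graphics window (macOS only)
  # arrange the plots in a plot_row x plot_col grid on the new device
  par(mfrow = c(plot_row, plot_col))
  # invisible() suppresses the printed return value of lapply()
  invisible(lapply(X = x, FUN = boxplot))
}
group.box(data, 2, 3)
You can of course use png(...) or pdf(...) etc. instead of quartz(). In the question, dev.off() closed the graphics device right after plotting, which is probably why no graphs were shown.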
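Since the question mentions exporting the six graphs into one PDF file, here is a minimal sketch of the pdf() variant. The helper box_to_pdf, its arguments and the output file name are invented for this example, and the data is a simulated list of six numeric vectors standing in for the real sub-lists.

```r
# Simulated stand-in for the asker's list (assumption: each sub-list is numeric)
set.seed(42)
data <- lapply(1:6, function(i) rnorm(100))
names(data) <- paste0("sublist", 1:6)

# box_to_pdf() is a hypothetical helper; file, n_row and n_col are illustrative
box_to_pdf <- function(x, file, n_row, n_col) {
  pdf(file)                        # open the PDF device first
  on.exit(dev.off())               # close it when the function exits
  par(mfrow = c(n_row, n_col))     # grid layout applies to the open device
  for (nm in names(x)) {
    boxplot(x[[nm]], main = nm)    # one panel per sub-list, titled by its name
  }
  invisible(file)
}

out_file <- file.path(tempdir(), "boxplots.pdf")
box_to_pdf(data, out_file, 2, 3)
file.exists(out_file)
```

The device is closed only after all six panels are drawn, so the PDF ends up with one page holding the 2 x 3 grid.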

Is there a mapply equivalent in dplyr?

Using dplyr: is there a way to loop over variables in a data frame and pass both the data and the variable name to a custom function?
I have a solution for this using mapply in base R. In the interest of learning I am wondering if there is a neat dplyr-way to achieve the same result.
Here is a small example, where each column in a data frame is transformed by adding a constant. The constant I wish to add is different for each variable, as listed in myconstants.
library(tidyverse)
mydata <- tibble(
a = 1:5,
b = 1:5,
c = 1:5
)
myconstants <- tibble(
a = 10,
b = 20
)
custom_function <- function (x, y, k) {
constant <- if (is.null(k[[y]])) 0 else k[[y]]
x + constant
}
# solution in base R
foo <- mapply(
custom_function,
mydata,
names(mydata),
MoreArgs = list(k = myconstants)
) %>%
as_tibble()
