Boxplot sublists of a list - r

I'm new to R.
I'm trying to boxplot the data (df, list) in each sub-list using lapply.
I have written this function:
group.box <- function(x) {
lapply(X = x, FUN = boxplot)
}
Running it on a list that contains 6 sub-lists gives me 6 individual boxplots (6 separate graphs) and this text:
$sublist1
NULL
$sublist2
NULL
$sublist3
NULL
...
I tried to combine these graphs into one picture with 6 graphs:
par(mfrow=c(2,3))
group.box(data)
dev.off()
But then I only get the text (as displayed above) with no graphs.
I thought maybe I should just export these 6 graphs into one pdf file.
Thank you!

You could try
data <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100), d = rnorm(100), e = rnorm(100), f = rnorm(100))
group.box <- function(x, plot_row, plot_col) {
quartz()
par(mfrow=c(plot_row,plot_col))
lapply(X = x, FUN = boxplot)
}
group.box(data, 2,3)
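You can of course use png(...) or pdf(...) etc. instead of quartz(), which is only available on macOS. With a file device, call dev.off() after plotting so the file is written.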
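For the "export into one pdf file" idea from the question, here is a minimal sketch using only base R. The example list and the helper name group_box_pdf are assumptions made for illustration. It opens the device first and closes it only after all panels are drawn; calling dev.off() right after plotting on a screen device, as in the question, is a likely reason the graphs disappeared.

```r
# Hypothetical example data: a named list of six numeric vectors.
set.seed(1)
data <- replicate(6, rnorm(100), simplify = FALSE)
names(data) <- paste0("sublist", 1:6)

# Draw one boxplot per list element on a grid and save the page as a PDF.
group_box_pdf <- function(x, file, plot_row, plot_col) {
  pdf(file)                            # open the file device first
  on.exit(dev.off())                   # close it once all panels are drawn
  par(mfrow = c(plot_row, plot_col))   # grid layout applies to this device
  for (nm in names(x)) {
    boxplot(x[[nm]], main = nm)
  }
  invisible(file)                      # return the path without printing it
}

out_file <- group_box_pdf(data, tempfile(fileext = ".pdf"), 2, 3)
```

Using a for loop (or wrapping lapply() in invisible()) also avoids the list of return values being printed to the console.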

Related

Problem converting tbl_cube with list measure into a tibble

I'd like to convert a tbl_cube with a list-valued measure into a tibble, but am running into problems.
For example
Y <- array(1:4, dim = c(2,2))
is a "cube" of integers (it only has two axes so it's really a matrix) which can be turned into a tbl_cube as follows
Y_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(y = Y))
which in turn can be converted into a tibble as follows as_tibble(Y_cube).
However, if we start with a list-valued array then we can create the tbl_cube:
X <- array(lapply(1:4, seq_len), dim = c(2,2))
X_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(x = X))
but as_tibble(X_cube) generates an error:
Error: Can't subset elements that don't exist.
x The locations 3 and 4 don't exist.
i There are only 2 elements.
I'm using dplyr version dplyr_0.8.5.

Creating a boxplot loop with ggplot2 for only certain variables

I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
[Image: Boxplot Format]
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
geom_boxplot()
print(bp)
# return the plot object itself so that the caller can store it
invisible(bp)
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_(). Again, read the links above.
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot object though.
allplots <- list()
for (i in seq_along(outcomes)){
allplots[[i]] <- myboxplot2(myexposure = "label", myoutcome = outcomes[i])
}
allplots

create a function that generate plots from a data

I have a dataset with 8 variables (x1,y1,x2,y2,x3,y3,x4,y4), and I need to write a function that generates 4 plots: x1 vs y1, x2 vs y2, x3 vs y3 and x4 vs y4.
So I tried doing it one pair at a time, creating a new dataset with the two variables and then generating the plot.
minidata<-select(alldata,x1,y1)
ggplot(minidata,aes(x1,y1))+geom_point()+ggtitle("m VS n")
This works, but when I try to put it in a function
graph<-function(m,n){
minidata<-select(alldata,m,n)
ggplot(minidata,aes(x=m,y=n))+geom_point()+ggtitle("m VS n")
}
graph(y1,x1)
This doesn't work; it says "Error in FUN(X[[i]], ...) : object 'y1' not found"
What could I do to write a function that creates the 4 plots?
There are a number of ways of doing this. One approach is:
minidata <- data.frame( x1 = 1:20,
y1 = rnorm(20),
x2 = 1:20,
y2 = runif(20))
myGraph <- function( df, x, y ){
mdf <- df[ ,c(x,y)]
names(mdf) <- c("x","y")
ggplot(mdf,aes(x=x,y=y))+geom_point() + ggtitle(paste(y,"~",x)) + labs (x =x, y = y)
}
# call function by passing names of the column using names() function
myGraph (minidata, names(minidata)[1], names(minidata)[2])
# or simply giving a name
myGraph (minidata, "x2", "y2")
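To produce all four plots in one go, a possible sketch is shown below. It assumes ggplot2 3.0.0 or later (for the .data pronoun), a made-up stand-in for the question's alldata, and a hypothetical helper named pair_plot. Column names are passed as strings, which avoids the "object 'y1' not found" error from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the question's `alldata` (random values).
set.seed(42)
alldata <- data.frame(
  x1 = rnorm(20), y1 = rnorm(20),
  x2 = rnorm(20), y2 = rnorm(20),
  x3 = rnorm(20), y3 = rnorm(20),
  x4 = rnorm(20), y4 = rnorm(20)
)

# Build one scatter plot from two column names given as strings.
pair_plot <- function(df, x_col, y_col) {
  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point() +
    labs(x = x_col, y = y_col, title = paste(x_col, "VS", y_col))
}

x_cols <- paste0("x", 1:4)
y_cols <- paste0("y", 1:4)

# Map() walks both name vectors in parallel and returns a list of 4 plots.
allplots <- Map(function(x_col, y_col) pair_plot(alldata, x_col, y_col),
                x_cols, y_cols)
```

Printing allplots draws the four plots one after another; a single plot can be shown with print(allplots[["x1"]]).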

R: object y not found in function (x,y) [function to pass through data frames in r]

I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames whose names start with df., so I really need this to work as a function, but R returns an error saying that df.y is not found. y is meant to run through a list of the names of all 4,000 data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame through a dynamically built name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the get call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
temp_df <- get(paste("df",y,sep = "."))
x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
"LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
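Applied to the toy example from the question, the same get() idea could look like the sketch below. The helper name split_row and the choice to collect the results in a named list (rather than creating df.11, df.12, df.13 as separate variables) are assumptions for illustration.

```r
# Toy data frame from the question.
df.1 <- data.frame(x = 1:3, b = 5)

# Hypothetical helper: look up "df.<y>" by name, keep row i as `x`,
# and list the remaining x values in column `c`.
split_row <- function(i, y) {
  d <- get(paste("df", y, sep = "."))
  data.frame(x = d$x[i], c = d$x[-i], b = d$b[-i])
}

# One result per row of df.1, collected in a named list.
results <- lapply(seq_len(nrow(df.1)), split_row, y = 1)
names(results) <- paste0("df.1", seq_along(results))
```

With thousands of data frames, keeping them in one named list and indexing it by name is usually easier to manage than looking up global variables with get().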

How to avoid gaps due to missing values in matplot in R?

I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger and the number of rows, columns and the position of the non-overlapping missing values may change between different calls, so I'd like to find a solution that could handle this in a flexible way. I also need to use matlines.
I was thinking of filling in the gaps with interpolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
# seq_len(n) is empty when n == 0 (an all-NA column), whereas 1:n would be c(1, 0)
x[seq_len(n)] <- x[bool]
x[(n + 1):length(x)] <- NA
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
You could use the na.interpolation function from the imputeTS package (renamed na_interpolation in imputeTS 3.0 and later):
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
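If adding a package is not an option, the same linear fill can be sketched with base R's approx(). The helper name fill_linear is made up for this example, and NAs at the very start or end of a column are left as NA because approx() does not extrapolate by default.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical helper: linearly interpolate the NA entries of y over x,
# using only the observed (non-NA) points as support.
fill_linear <- function(x, y) {
  ok <- !is.na(y)
  approx(x[ok], y[ok], xout = x)$y
}

# Fill every series column (everything except the first column, x).
filled <- test
filled[-1] <- lapply(test[-1], fill_linear, x = test$x)

# Plot the filled series against x.
matplot(filled$x, as.matrix(filled[-1]), type = "l")
```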
I also had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
draw the graphical part of the window.
range(data, na.rm=TRUE) sets the y-axis limits so that all data points fit in the plot.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The function head of mymatplot provides the ... argument for optionally passing typical plot parameters such as lty, lwd, cex etc. Those are passed on to lines().
A last word on the choice of colors: they are up to your taste.
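Note that mymatplot treats each row of data as one series, while the question's test data holds one series per column with x in the first column. A column-wise variant of the same idea might look like the sketch below; the function name plot_spanning_gaps and its return value are assumptions for illustration, and it uses lines() rather than matlines().

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical column-wise variant: draw each column of y against x,
# skipping NA points so that each line spans its gaps.
plot_spanning_gaps <- function(x, y, ...) {
  y <- as.matrix(y)
  # empty plot whose limits cover all observed values
  plot(range(x), range(y, na.rm = TRUE), type = "n", xlab = "x", ylab = "y")
  drawn <- lapply(seq_len(ncol(y)), function(j) {
    keep <- !is.na(y[, j])
    lines(x[keep], y[keep, j], col = j, ...)
    data.frame(x = x[keep], y = y[keep, j])
  })
  names(drawn) <- colnames(y)
  invisible(drawn)  # the points actually drawn, per series
}

drawn <- plot_spanning_gaps(test$x, test[-1], type = "b")
```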
