R: Increase space between multiple boxplots to avoid omitted x-axis labels

Let's say I generate 5 sets of random data and want to visualize them using boxplots and save those to a file "boxplots.png". Using the code
png("boxplots.png")
data <- matrix(rnorm(25),5,5)
boxplot(data, names = c("Name1","Name2","Name3","Name4","Name5"))
dev.off()
there are 5 boxplots created as desired in "boxplots.png"; however, the names of the second ("Name2") and fourth ("Name4") boxplots are omitted. Resizing the window of my PNG viewer makes no difference. How can I avoid this behavior?
Thank you!

Your code does not produce overlapping labels in my setup, but that point is relatively moot: labels go missing because axis() skips any label that would overlap one it has already drawn, so what you want is a way to give the labels more space.
One (brute-force-ish) way to fix the symptom is to alternate putting them on separate lines:
set.seed(42)
data <- matrix(rnorm(25),5,5)
nms <- c("Name1","Name2","Name3","Name4","Name5")
oddnums <- which(seq_along(nms) %% 2 == 1)   # positions 1, 3, 5
evennums <- which(seq_along(nms) %% 2 == 0)  # positions 2, 4
(There's got to be a better way to do that, but it works.)
From here:
png("boxplot.png", height = 240)
boxplot(data, names = FALSE)
mtext(nms[oddnums], side = 1, line = 2, at = oddnums)
mtext(nms[evennums], side = 1, line = 1, at = evennums)
dev.off()
(The use of png is not important here; I only used it because of your edit.)
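The odd/even bookkeeping can be generalized to any number of label rows. This is a minimal sketch; stagger_lines() and staggered_boxplot() are hypothetical helper names, and the sketch suppresses the default labels with xaxt = "n" (then redraws unlabelled ticks) instead of names = FALSE.

```r
# Assign each label to a margin line, cycling 1, 2, ..., nlines, 1, 2, ...
stagger_lines <- function(n, nlines = 2) {
  ((seq_len(n) - 1) %% nlines) + 1
}

# Hypothetical helper: draw a boxplot, then write its group labels
# on alternating margin lines so neighbouring labels cannot collide
staggered_boxplot <- function(data, labels, nlines = 2) {
  boxplot(data, xaxt = "n")                               # no x-axis labels or ticks
  axis(1, at = seq_along(labels), labels = FALSE)         # ticks only
  lines_used <- stagger_lines(length(labels), nlines)
  for (i in seq_along(labels)) {
    mtext(labels[i], side = 1, line = lines_used[i], at = i)
  }
  invisible(lines_used)
}
```

Usage mirrors the answer: png("boxplots.png"); staggered_boxplot(data, nms); dev.off(). If vertical labels are acceptable, boxplot(data, names = nms, las = 2) is a simpler alternative, since las = 2 draws the axis labels perpendicular to the axis.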

Related

Save multiple ggplots from a for loop in a single plot in a particular layout

I am trying to plot a single image that contains 35 ggplots. The order of the plots in the image is fixed and is shown below.
I also want blank cells as shown in the grid image. Each cell should contain the plot for a particular drug number. I have a data frame "drug_dctv2", which I split into a list to feed the data into the for loop.
The problem is that every element of plot_list[[i]] (i from 1 to 35) ends up showing the last plot, so the same plot is saved 35 times. I am also not sure how to arrange the plots in the particular order shown in the grid.
Through my internet searches, I found packages like cowplot and gridExtra, but I couldn't find a proper way to use them.
I made a layout file that lists the drug names in the order shown in the grid image, with "tab" in place of the blank cells, but I can't see how to proceed from there.
I am new to R. Any help and suggestions will be appreciated.
The data set looks like the sample below; each drug has 10 data points.
Drug_name conc viab
Drug_1 1 1.0265
Drug_1 0.1 1.2365
Drug_1 0.01 0.5896
-- -- --
Drug_2 1 2.0584
Drug_2 0.1 1.0277
Drug_2 0.01 1.5696
-- -- --
split <- split(file,rep(1:35,each=10)) #### this will be used in the for loop
plot_list = list()
for(i in 1:length(split))
{
  data <- split[[i]]
  c <- data$conc
  v <- data$viab
  p = ggplot(data = data, aes(x = c, y = v)) + geom_point() + ylim(0, 1.5) +
    scale_x_continuous(trans = 'log10') +
    theme(axis.text = element_blank(), axis.title = element_blank()) +
    geom_line(data = line_data, aes(x = x, y = y2), color = "red", size = 1)
  plot_list[[i]] = p
}
Thank you in advance!
ggplot2, like many tidyverse packages, uses delayed non-standard evaluation: the expressions you pass to aes() are not evaluated until the plot is built (e.g., when it is printed or saved).
The aes() call in your question refers to the vectors c and v defined in the for loop. These vectors change on each iteration, but aes() only stores the expressions c and v together with a reference to the environment in which the for loop runs, not their values. All 35 plots therefore look up c and v in that same environment when they are printed or saved, and by then c and v hold the values from the last iteration.
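The same mechanism can be seen without ggplot2 by using plain closures in place of aes(). This is an analogy rather than ggplot2's actual internals: like an aes() mapping, each closure below remembers an environment, not a value.

```r
# Closures created in a for loop all share the loop's environment,
# so each one sees whatever value `i` has when it is finally called
loop_fns <- list()
for (i in 1:3) {
  loop_fns[[i]] <- function() i
}
loop_results <- sapply(loop_fns, function(f) f())

# Closures created by lapply() each get a fresh function environment,
# so each one keeps its own `i`
lapply_fns <- lapply(1:3, function(i) function() i)
lapply_results <- sapply(lapply_fns, function(f) f())

print(loop_results)    # prints [1] 3 3 3
print(lapply_results)  # prints [1] 1 2 3
```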
As mentioned in the comments, you can instead refer to the columns of the data frame directly, e.g. aes(x = conc, y = viab). ggplot() stores the data frame in the plot object when it is called, so each plot keeps its own data.
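A minimal sketch of that column-based fix, keeping the for loop. It uses a small made-up data set in place of drug_dctv2 (random, illustrative values), leaves out the red reference line because line_data is not shown in the question, and uses scale_x_log10(), the shorthand for a log10-transformed continuous x scale.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: 3 drugs x 10 concentrations
set.seed(1)
drugs <- data.frame(
  Drug_name = rep(paste0("Drug_", 1:3), each = 10),
  conc = rep(10^(0:-9), times = 3),
  viab = runif(30, 0, 1.5)
)
drug_split <- split(drugs, rep(1:3, each = 10))

plot_list <- list()
for (i in seq_along(drug_split)) {
  # conc and viab are columns of this iteration's data frame, which ggplot()
  # stores in the plot object, so later iterations cannot overwrite them
  plot_list[[i]] <- ggplot(drug_split[[i]], aes(x = conc, y = viab)) +
    geom_point() +
    ylim(0, 1.5) +
    scale_x_log10() +
    theme(axis.text = element_blank(), axis.title = element_blank())
}
```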
Alternatively, if you want to keep using c and v, make sure each iteration runs in its own environment, so that each plot's references to c and v point to that iteration's values. One way to do this is to replace the for loop with an lapply() call:
plot_list <- lapply(split, function(data_drug) {
  c <- data_drug$conc
  v <- data_drug$viab
  ggplot(data = data_drug, aes(x = c, y = v)) + geom_point() + ylim(0, 1.5) +
    scale_x_continuous(trans = 'log10') +
    theme(axis.text = element_blank(), axis.title = element_blank()) +
    geom_line(data = line_data, aes(x = x, y = y2), color = "red", size = 1)
})
This is a nice example of a for loop and an lapply() call producing different results, and a good lesson in non-standard evaluation and environments.
To combine the plots, look at cowplot::plot_grid(): https://wilkelab.org/cowplot/articles/plot_grid.html
Something like this should work:
library(cowplot)
plot_grid(
plot_list[[35]], plot_list[[5]], plot_list[[3]], plot_list[[2]],
plot_list[[34]], plot_list[[1]], plot_list[[4]], plot_list[[6]],
plot_list[[32]], plot_list[[8]], NULL, NULL,
plot_list[[30]], plot_list[[7]], plot_list[[33]] , NULL,
labels = "AUTO", ncol = 4
)
Instead of typing out every plot, you can put all the function arguments in a list and use do.call() to call the function with them:
plot_order <- c(
35, 5, 3, 2,
34, 1, 4, 6,
32, 8, NA, NA
)
plot_grid_args <- c(plot_list[plot_order], list(ncol = 4))
do.call(plot_grid, plot_grid_args)
Finally, I was able to solve this problem.
I created a layout variable holding the position of each drug in the split list. For example, drug_35 has to come first in the grid and is the 35th element of split, so 35 comes first in layout, and so on.
I made a text file with the grid layout shown in the image above, read that file into the R script, and built the layout variable from it with a few lines of code. For simplicity I am not showing those lines here, but I hope the concept is clear.
lay <- read.delim("layout.txt", stringsAsFactors = FALSE, sep = "\t", header = FALSE)
lay1 = c(t(lay))
col_n = ncol(lay)
row_n = nrow(lay)
split <- split(file,rep(1:35,each=10))
## layout = 35 5 3 2 34 1 4 6 32 8 0 0 30 7 33 .....
## 0 means blank spaces
png("PLOT.png", width = 6, height = 10, units = "in", res = 400)
par(mfrow=c(row_n,col_n),mar=c(2,0.7,1.5,0.5)) ## margins: bottom, left, top and right
for(i in layout)
{
  if(i == 0) { frame(); next }
  ## If i is 0, frame() advances to the next panel without drawing (a blank cell in the grid), and next skips the rest of this iteration
  data <- split[[i]]
  c <- data$conc
  v <- data$viab
  plot(c, v, xlab = "", ylab = "", axes = FALSE, log = "x")
}
dev.off()
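The post does not show how the numeric layout vector was built from layout.txt. One possible way is match() with nomatch = 0, sketched here under the assumptions that the split list is ordered Drug_1, Drug_2, ..., Drug_35 and that the file holds drug names with "tab" in blank cells. The 2 x 4 grid is made up for illustration, and the result is called layout_idx so that it does not mask graphics::layout().

```r
# Hypothetical layout grid as read from layout.txt: drug names, "tab" = blank cell
lay <- matrix(c("Drug_35", "Drug_5", "Drug_3", "Drug_2",
                "Drug_34", "Drug_1", "tab",    "tab"),
              nrow = 2, byrow = TRUE)

# Assumption: split() produced the drugs in the order Drug_1 ... Drug_35
drug_order <- paste0("Drug_", 1:35)

# Flatten row by row, as c(t(lay)) does in the post
lay1 <- c(t(lay))

# Position of each name in drug_order; "tab" has no match and becomes 0
layout_idx <- match(lay1, drug_order, nomatch = 0)
print(layout_idx)  # prints [1] 35  5  3  2 34  1  0  0
```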

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is the axis ranges: they are set by the first plot() call and are not adjusted automatically to fit data you add later with points(), lines(), and similar functions.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops, only three show up because the axes only cover set1's range
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
  xlim = range(set1$date1, set2$date2),
  ylim = range(set1$temp1, set2$temp2),
  type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
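The question also asks for two y-axes, which the answer does not cover. One common base-graphics approach, sketched here on the same fake data, is to draw the second series as a new plot on top of the first with par(new = TRUE), sharing the x range but with its own y scale, and then add that scale on the right with axis(side = 4). With the real files, date_time would first need converting to a date-time class such as POSIXct; that step is assumed, not shown.

```r
# Same fake data as above
set.seed(1234)
set1 <- data.frame("date1" = seq(1, 10), "temp1" = rnorm(10))
set2 <- data.frame("date2" = seq(8, 17), "temp2" = rnorm(10, 1, 1))

# Shared x range so both series line up on one x-axis
xr <- range(set1$date1, set2$date2)

# Widen the right margin (bottom, left, top, right) for the second y-axis
op <- par(mar = c(5, 4, 4, 5))

# First series with the usual left-hand y-axis
plot(set1$date1, set1$temp1, type = "b", xlim = xr,
     xlab = "date", ylab = "temp1")

# Draw the second series without clearing the plot: same x range, own y range
par(new = TRUE)
plot(set2$date2, set2$temp2, type = "b", col = "red", xlim = xr,
     axes = FALSE, xlab = "", ylab = "")

# Right-hand axis and label for the second series
axis(side = 4, col.axis = "red")
mtext("temp2", side = 4, line = 3, col = "red")

par(op)
```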

Using loops to set layout dimensions R

I am not sure this is possible. Basically, I'm trying to create a plot loop where, if more than 5 plots are to be drawn, a second row is started for the remaining plots (the number of plots minus 5).
data=matrix(rbinom(10*1000, 1, .5), ncol=10)
subdata1 = data[1:5,]
subdata2 = data[1:7,]
if (nrow(subdata1) <= 5){
par(mfrow = c(1, nrow(subdata1)))
for (i in 1:nrow(subdata1)){
plot(as.numeric(subdata1[i,1:5]), as.numeric(subdata1[i,6:10]))
}
}else{
## need to figure out how to bind layout based on nrows
## i.e. subdata2
return(NULL)
}
Basically, I'm building a Shiny app where, depending on the user's selections, there could be anywhere from 1 to 10 plots, and I want to display them as nicely as possible.
If you want the layout to be as nice as possible, the easiest option is the n2mfrow() function, which takes the number of plots and returns a suitable c(rows, columns) combination for par(mfrow = ...). With your example, run par(mfrow = n2mfrow(nrow(subdata2))) before your for loop. However, this will not fix the layout to 5 columns.
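If you specifically want at most 5 plots per row, a small helper can compute the mfrow dimensions directly. layout_dims() is a hypothetical name, and this is a sketch of the layout logic only, not a drop-in for the Shiny app.

```r
# Hypothetical helper: at most `max_cols` plots per row, as many rows as needed
layout_dims <- function(n, max_cols = 5) {
  c(ceiling(n / max_cols), min(n, max_cols))
}

set.seed(1)
data <- matrix(rbinom(10 * 1000, 1, 0.5), ncol = 10)
subdata2 <- data[1:7, ]

# mfrow fills row by row: 7 plots give 2 rows x 5 columns,
# so the second row holds 2 plots and 3 empty cells
op <- par(mfrow = layout_dims(nrow(subdata2)))
for (i in seq_len(nrow(subdata2))) {
  plot(subdata2[i, 1:5], subdata2[i, 6:10])
}
par(op)
```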

R - Histogram Doesn't show density due to magnitude of the Data

I have a vector called data of length approximately 444,000, and almost all of its values are between 1 and 100. I want to draw the histogram and overlay the appropriate density on it. However, when I draw the histogram I get this:
hist(data,freq=FALSE)
What can I do to see a more detailed histogram? I tried the breaks argument and it helped, but the histogram is really hard to see because it's so small. For example, I used breaks = 2000 and got this:
Is there something that I can do? Thanks!
Since you don't show data, I'll generate some random data:
d <- c(rexp(1e4, 100), runif(100, max=5e4))
hist(d)
When dealing with outliers like this, you can display the histogram of the logs, but that may be difficult to interpret.
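A minimal sketch of the log approach on the same kind of simulated d (seed added for reproducibility). To keep it a little easier to read, the default x-axis is suppressed and the ticks are relabelled in the original units; breaks = 50 is an arbitrary choice.

```r
set.seed(1)
d <- c(rexp(1e4, 100), runif(100, max = 5e4))

# All values are positive, so log10 is defined everywhere
ld <- log10(d)

# Suppress the default x-axis so it can be relabelled in original units
h <- hist(ld, breaks = 50, xaxt = "n",
          main = "Histogram of d (log10 scale)", xlab = "d")

# Tick positions on the log scale, labelled with the corresponding values of d
ticks <- pretty(range(ld))
axis(1, at = ticks, labels = format(10^ticks, scientific = TRUE))
```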
If you are okay with showing a subset of the data, you can filter out the outliers either dynamically (perhaps using quantile()) or manually. The important thing when showing this visualization in your analysis is that, if you must remove data for the plot, you are up-front about the removal. (This is terse; it would also be informative to include the range and/or other properties of the omitted data, but that's subjective and will differ based on the actual data.)
quantile(d, seq(0, 1, len=11))
d2 <- d[ d < quantile(d, 0.90) ]
hist(d2)
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
d3 <- d[ d < 10 ]
hist(d3)
txt <- sprintf("(%d points shown, %d excluded)", length(d3), length(d) - length(d3))
mtext(txt, side = 1, line = 3, adj = 1)
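The question also asks for a density curve on top of the histogram. One way to combine that with the trimming above is a small wrapper; hist_trimmed() is a hypothetical name, and its 0.90 default simply mirrors the quantile used above. freq = FALSE puts the histogram on the density scale so that lines(density(...)) overlays correctly.

```r
# Hypothetical wrapper: trim the upper tail, plot on the density scale,
# overlay a kernel density estimate, and report how much was excluded
hist_trimmed <- function(x, upper = 0.90, ...) {
  cutoff <- quantile(x, upper)
  kept <- x[x < cutoff]
  hist(kept, freq = FALSE, ...)
  lines(density(kept), col = "red")
  txt <- sprintf("(%d points shown, %d excluded)",
                 length(kept), length(x) - length(kept))
  mtext(txt, side = 1, line = 3, adj = 1)
  invisible(kept)
}

set.seed(1)
d <- c(rexp(1e4, 100), runif(100, max = 5e4))
kept <- hist_trimmed(d, main = "Lower 90% of d")
```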

spplot() - make color.key look nice

I'm afraid I have a spplot() question again.
I want the colors in my spplot() to represent absolute values, not automatic values as spplot does it by default.
I achieve this by making a factor out of the variable I want to draw (using cut()). This works fine, but the color key doesn't look good at all.
See for yourself:
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
meuse.grid$random[meuse.grid$random < 0] <- 0
meuse.grid$random[meuse.grid$random > 10] <- 10
# making a factor out of meuse.grid$ random to have absolute values plotted
meuse.grid$random <- cut(meuse.grid$random, seq(0, 10, 0.1))
spplot(meuse.grid, c("random"), col.regions = rainbow(100, start = 4/6, end = 1))
How can I make the color key on the right look good? I'd like fewer ticks and fewer labels (maybe just one label at each end of the color key).
Thank you in advance!
[edit]
To make clear what I mean by absolute values: imagine a map where I want to display sea height. Sea height = 0 (the minimum value) should always be displayed blue. Sea height = 10 (which, just for the sake of the example, is the maximum value) should always be displayed red. This shouldn't change even if there is no sea in the regions displayed on the map.
I achieve this with the cut() command in my example. So this part works fine.
THIS IS WHAT MY QUESTION IS ABOUT
What I don't like is the color description on the right side. There are 100 ticks and each tick has a label. I want fewer ticks and fewer labels.
The way to go is the colorkey argument. For example:
## labels
labelat = c(1, 2, 3, 4, 5)
labeltext = c("one", "two", "three", "four", "five")
## plot
spplot(meuse.grid,
c("random"),
col.regions = rainbow(100, start = 4/6, end = 1),
colorkey = list(
labels=list(
at = labelat,
labels = labeltext
)
)
)
First, it's not at all clear what you want here. There are many ways to make the color key look "nice", and the first step is to understand the data being passed to spplot and what is being asked of it. cut() produces fully formatted interval labels like (2.3,5.34], which need different handling: larger plot margins, specific formatting and spacing for the labels, and so on. This just may not be what you ultimately want.
Perhaps you just want integer values, rounded from the input values?
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
Round the values (or use trunc(), ceiling(), or floor() on them):
meuse.grid$rclass <- round(meuse.grid$random)
spplot(meuse.grid, c("rclass"), col.regions = rainbow(100, start = 4/6, end = 1))
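A different route, not taken in either answer: keep random numeric (clamped to [0, 10] as in the question) and fix the colour breaks with the at argument, which spplot passes on to lattice's levelplot(). Because the breaks are fixed at 0, 0.1, ..., 10, a given value always gets the same colour regardless of which values happen to be present, and the colour key can then be labelled in data units. This sketch assumes the sp package is installed; the label positions 0, 5 and 10 are an arbitrary choice.

```r
library(sp)
data(meuse.grid)
gridded(meuse.grid) <- ~x + y

# Same simulated variable as the question, clamped to [0, 10] but kept numeric
set.seed(1)
meuse.grid$random <- pmin(pmax(rnorm(nrow(meuse.grid), 7, 2), 0), 10)

# 100 fixed intervals: values near 0 always get the first colour, near 10 the last
brks <- seq(0, 10, by = 0.1)

p <- spplot(meuse.grid, "random",
            at = brks,
            col.regions = rainbow(length(brks) - 1, start = 4/6, end = 1),
            # Only three labelled ticks on the key, placed in data units
            colorkey = list(labels = list(at = c(0, 5, 10),
                                          labels = c("0", "5", "10"))))
print(p)
```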
