R ggplot2 boxplot from 10 files - r

I have 4 files called 0_X_cell.csv, 0_S_cell.csv, 15_X_cell.csv and 15_S_cell.csv, each in the format:
p U:0 U:1 U:2 Tracer Tracer_0 U_0:0
-34.014 0.15268 -3.7907 -0.20155 10.081 10.032 0.12454
-33.836 0.07349 -2.1457 -0.30531 27.706 27.278 0.076542
I'd like to create boxplots of the values of Tracer/3600 and put them on the same graph using ggplot2, but I'm finding it less straightforward than expected. Any suggestions would be much appreciated.
I'm thinking it might be something like this:
Import data from all files into separate variables:
Extract Tracer from each one and put into a data.frame
Plot the boxplots of every column Tracer/3600. But each column will be called Tracer...
What would the correct procedure be?

Here's one way to do it (if I understood you correctly):
`0_X_cell.csv` <- `0_S_cell.csv` <- `15_X_cell.csv` <- `15_S_cell.csv` <- read.table(header=T, text="
p U:0 U:1 U:2 Tracer Tracer_0 U_0:0
-34.014 0.15268 -3.7907 -0.20155 10.081 10.032 0.12454
-33.836 0.07349 -2.1457 -0.30531 27.706 27.278 0.076542")
lst <- mget(grep("cell.csv", ls(), fixed=TRUE, value=TRUE))
df <- stack(lapply(lapply(lst, "[", "Tracer"), unlist))
df$ind <- sub("^(\\d+_[A-Z]).*$", "\\1", df$ind)
library(ggplot2)
ggplot(df, aes(ind, values/3600)) + geom_boxplot()

To read in the data from your dir:
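z <- list.files(pattern = ".*cell\\.csv$")
z <- lapply(z, function(f) {
  # file names look like "0_X_cell.csv": time first, treatment second
  chars <- strsplit(f, "_")[[1]]
  # read.csv() expects comma-separated files; if yours are whitespace-separated
  # like the sample in the question, use read.table(f, header = TRUE) instead
  data.frame(Tracer = read.csv(f)$Tracer, time = chars[1], treatment = chars[2])
})
z <- do.call(rbind, z)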
Then plot it:
library(ggplot2)
ggplot(z, aes(y = Tracer/3600, x = factor(time))) +geom_boxplot(aes(fill = factor(treatment))) + ylab("Tracer")
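For anyone who wants to try the read-and-combine step without the real files, here is a minimal sketch. It assumes whitespace-separated files as in the question's sample and simulates them in a temporary directory; the names read_tracer and tracer_long are made up for this example.

```r
# Simulate the four files in a temporary directory (illustration only)
tmp <- tempfile("cells_")
dir.create(tmp)
example <- data.frame(p = c(-34.014, -33.836), Tracer = c(10.081, 27.706))
for (f in c("0_X_cell.csv", "0_S_cell.csv", "15_X_cell.csv", "15_S_cell.csv")) {
  write.table(example, file.path(tmp, f), row.names = FALSE, quote = FALSE)
}

files <- list.files(tmp, pattern = "_cell\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and tag its rows with the
# time and treatment parsed from the file name
read_tracer <- function(path) {
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  d <- read.table(path, header = TRUE)  # assumes whitespace-separated columns
  data.frame(Tracer = d$Tracer, time = parts[1], treatment = parts[2])
}

# One long data frame: one row per Tracer value
tracer_long <- do.call(rbind, lapply(files, read_tracer))

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(tracer_long,
                       ggplot2::aes(x = time, y = Tracer / 3600, fill = treatment)) +
    ggplot2::geom_boxplot()
}
```

Because every row carries its own time and treatment, the fact that each file's column is called Tracer stops being a problem.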

Related

Plot multiple graphs from multiple csv and follow up analysis

I have hundreds of csv files and would like to plot a graph for each of them. I've searched through the forum and found something that I can use, but it still needs some editing.
The code is originally from Plotting multiple graphs from multiple .csv files using R.
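library(dplyr)
library(ggplot2)
# pattern is a regular expression, not a glob; full.names = TRUE lets read.csv() find the files
list_of_dfs = lapply(list.files('path/to/files', pattern = '\\.csv$', full.names = TRUE),
  function(x) {
    dat = read.csv(x)
    dat$fname = basename(x)  # keep only the file name, without the directory
    return(dat)
  })
one_big_df = list_of_dfs %>% bind_rows()
one_big_df %>% ggplot(aes(x = x, y = y)) + geom_point() + facet_wrap(~ fname)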
It works fine except I need to save all the graphs separately.
I also need to analyse the graphs by overlaying them according to their suffixes. Is it possible to incorporate that into the code?
Example file names:
MAX_C1-B3.csv
MAX_C2-B3.csv
MAX_C1-B4.csv
MAX_C2-B4.csv
...
So the ones with B3 should be in one graph and the ones with B4 in another graph.
Thanks for your help in advance!
I am not sure that the following is what the question is asking for.
The main method is always the same:
split the data with base function split. This creates a named list;
pipe the resulting list to seq_along to get index numbers into the list. This allows for access to the list's names attribute and to compose filenames according to them;
pipe the numbers to purrr::map and plot each list member separately;
save the results to disk.
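As a rough base R sketch of the first two steps (split, then walk the indices to reach the names), assuming a toy data frame with an fname column like the question's; toy, by_file and out_files are names made up for this example:

```r
# Toy stand-in for one_big_df (values are arbitrary)
toy <- data.frame(
  x = 1:6,
  y = c(2, 4, 1, 3, 5, 6),
  fname = rep(c("MAX_C1-B3.csv", "MAX_C2-B3.csv"), each = 3)
)

# split() returns a list named after the grouping values
by_file <- split(toy, toy$fname)

# Walking the indices gives access to both the names and the data;
# here only the output file names are composed
out_files <- vapply(seq_along(by_file), function(i) {
  paste0(tools::file_path_sans_ext(names(by_file)[i]), ".pdf")
}, character(1))
```

The plotting and saving steps would go inside the same function, using by_file[[i]] as the data.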
First load the packages needed.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
library(purrr)
})
This is a common function to save the plots.
save_plot <- function(graph, graph_name, type = "") {
  # file name depends on suffix and on directory structure
  # the files are to be saved to a temp directory
  # (it's just a code test)
  # drop a trailing extension so that "MAX_C1-B3.csv" is saved as "MAX_C1-B3.pdf"
  graph_name <- tools::file_path_sans_ext(graph_name)
  if(type != "")
    graph_name <- paste0(graph_name, "_", type)
  filename <- paste0(graph_name, ".pdf")
  filename <- file.path("~/Temp", filename)
  ggsave(filename, graph, device = "pdf")
}
1. Plot all graphs separately
From the question:
It works fine except I need to save all the graphs separately.
Does this mean that the graphs corresponding to each file are to be saved separately? If yes, then the following code plots and saves them in files with filenames with the extension .csv changed to .pdf.
list_dfs_by_fname <- split(one_big_df, one_big_df$fname)
list_dfs_by_fname %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_fname)[i]
DF <- list_dfs_by_fname[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y)) +
geom_point()
save_plot(graph, graph_name)
})
2. Plot by suffix
First create a new column with either the suffix "B3" or the suffix "B4". Then split the data by groups so defined. The split data is needed for the two plots that follow.
inx <- grepl("B4(\\.csv)?$", one_big_df$fname)  # also matches names that still end in ".csv"
one_big_df$group <- c("B3", "B4")[inx + 1L]
list_dfs_by_suffix <- split(one_big_df, one_big_df$group)
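If there are more suffixes than just B3 and B4, the group can probably be pulled straight out of the file name instead of being hard-coded. A small sketch, assuming the suffix is whatever follows the last hyphen (the file names below are invented for illustration):

```r
fname <- c("MAX_C1-B3.csv", "MAX_C2-B3.csv", "MAX_C1-B4.csv",
           "MAX_C2-B4", "MAX_C1-B10.csv")

# Capture the text after the last "-", ignoring an optional ".csv"
group <- sub("^.*-([^-.]+)(\\.csv)?$", "\\1", fname)

# Grouping by the extracted suffix
files_by_group <- split(fname, group)
```

With a data frame the same expression can fill the group column, e.g. one_big_df$group <- sub(..., one_big_df$fname).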
2.1. Plot by suffix, overlapped
To have the groups of fname overlap, map that variable to the color aesthetic.
list_dfs_by_suffix %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_suffix)[i]
DF <- list_dfs_by_suffix[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y, color = fname)) +
geom_point()
save_plot(graph, graph_name, type = "overlapped")
})
2.2. Plot by suffix, faceted
If the plots are faceted by fname, the code is copied and pasted from the question's with added scales = "free".
list_dfs_by_suffix %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_suffix)[i]
DF <- list_dfs_by_suffix[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
facet_wrap( ~ fname, scales = "free")
save_plot(graph, graph_name, "faceted")
})
Test data
Use built-in data sets iris and mtcars to test the code.
Only the last two instructions matter to the question: they check the column names of one_big_df and the values in fname.
suppressPackageStartupMessages({
library(dplyr)
})
df1 <- iris[3:5]
df2 <- mtcars[c("hp", "qsec", "cyl")]
names(df1) <- c("x", "y", "categ")
names(df2) <- c("x", "y", "categ")
df2$categ <- factor(df2$categ)
sp1 <- split(df1[1:2], df1$categ)
sp2 <- split(df2[1:2], df2$categ)
names(sp1) <- sprintf("MAX_C%d-B3", seq_along(sp1))
names(sp2) <- sprintf("MAX_C%d-B4", seq_along(sp2))
list_of_dfs <- c(sp1, sp2)
list_of_dfs <- lapply(seq_along(list_of_dfs), \(i) {
list_of_dfs[[i]]$fname <- names(list_of_dfs)[i]
list_of_dfs[[i]]
})
one_big_df <- list_of_dfs %>% dplyr::bind_rows()
names(one_big_df)
#> [1] "x" "y" "fname"
unique(one_big_df$fname)
#> [1] "MAX_C1-B3" "MAX_C2-B3" "MAX_C3-B3" "MAX_C1-B4" "MAX_C2-B4" "MAX_C3-B4"
Created on 2022-05-31 by the reprex package (v2.0.1)

Sending dataframes within list to a plot function

I'm trying to make multiple ggplot charts from multiple data frames. I have developed the code below but the final loop doesn't work.
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x) {
x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b)) +
ggsave(paste0(substitute(x),".png"))
}
ll <- list(df1,df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]])
}
I know it's something to do with
ll[[i]]
but I don't understand why, because when I put that in the console it gives the dataframe I want. Also, is there a way to do this the tidyverse way, with the map functions instead of a for loop?
I assume you want to see two files called df1.png and df2.png at the end.
You need to somehow pass on the names of the dataframes to the function. One way of doing it would be through named list, passing the name along with the content of the list element.
library(ggplot2)
library(purrr)
library(tibble)  # for tibble()
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x, nm) {
p <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(nm,".png"), p, device = "png")
}
ll <- list(df1=df1,df2=df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]], names(ll[i]))
}
In tidyverse you could just replace the loop with the following command without modifying the function.
purrr::walk2(ll, names(ll),chart_it)
or simply
purrr::iwalk(ll, chart_it)
There's also imap and lmap, but they return their results and so leave some output in the console, which is probably not what you want.
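To see why substitute(x) cannot recover the name df1 inside the loop, here is a tiny base R illustration (name_of_arg is a made-up helper, not part of any package):

```r
# Returns the text of the expression the caller passed as x
name_of_arg <- function(x) deparse(substitute(x))

df1 <- data.frame(a = 1:3)
ll <- list(df1 = df1)

direct <- name_of_arg(df1)       # called with the variable itself
i <- 1
indexed <- name_of_arg(ll[[i]])  # called the way the for loop calls it
```

direct is "df1", but indexed is the literal text "ll[[i]]": the function only ever sees the expression used in the call, never the name the data frame had before it went into the list. That is why the name has to be passed along explicitly.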
The problem is in your chart_it function: it doesn't return a ggplot. Try saving the result of the pipe into a variable and return() that (or place it as the last statement in the function).
Something along the lines of
chart_it <- function(x) {
  chart <- x %>% ggplot() +
    geom_line(mapping = aes(y=a,x=b))
  # deparse(substitute(x)) is the text of the caller's argument; inside the loop that is
  # "ll[[i]]" for every data frame, so pass the name explicitly if you need df1.png, df2.png
  ggsave(paste0(deparse(substitute(x)), ".png"), plot = chart)
  return(chart)
}

How to print and save multiple ggplot graphs without using for-loops?

I have a dataframe with lots of possible combinations of variables, and for exploratory purposes I need to see the univariate distributions for these combinations. I succeeded in doing it with for loops but would like to find a better and faster way. Does anybody have an idea?
I have produced the following code:
library(ggplot2)
library(dplyr)
SubjectID <- c(3772113,3772468)
Group <- c("Easy","Hard")
Object <- c("A","B")
dat <- data.frame(expand.grid(SubjectID,Group,Object))
dat$RT <- rnorm(8,1500,700)
colnames(dat) <- c("SubjectID","Group","Object","RT")
# GGplot function
pl <- function(x,group, object){
  x <- filter(x, Group==group, Object==object)
  # sep = "" was being passed to ggtitle() instead of paste(); paste0() needs no sep
  print(ggplot(x,aes(x=RT)) +
    geom_histogram(binwidth = 0.05) +
    xlab("Reactions per second") +
    ggtitle(paste0(as.character(group),"_",as.character(object))))
  ggsave(paste0(as.character(group),"_",as.character(object),".png"), path = "...")
}
for (group in unique(dat$Group)){
for (object in unique(dat$Object)){
pl(dat,group,object)
}
}
How can I replace the nested for loops in this graph printing?
You can try with lapply:
all_comb <- with(dat, expand.grid(levels(Group), levels(Object)))
lapply(1:nrow(all_comb),
function(i) pl(dat, group = all_comb[i, 1], object=all_comb[i, 2]))
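Another option that avoids both the nested loops and the filtering inside pl is to split the data by the combination of the two factors first. A hedged base R sketch using the question's variable names (pieces and out_files are invented for this example):

```r
dat <- expand.grid(SubjectID = c(3772113, 3772468),
                   Group = c("Easy", "Hard"),
                   Object = c("A", "B"))
set.seed(1)
dat$RT <- rnorm(nrow(dat), 1500, 700)

# One list element per Group/Object combination, named like "Easy_A"
pieces <- split(dat, interaction(dat$Group, dat$Object, sep = "_"))

# The names double as file names for the plots
out_files <- paste0(names(pieces), ".png")
```

Each element of pieces can then be handed to a plotting function with lapply or Map, together with its name.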

How to extract the endpoints of an interval in R?

I've searched, but I cannot find an answer. I want to further process the data of a plot I've created in R with geom_bin2d. I've extracted the bins (intervals) from such a plot using
> library(ggplot2)
> my_plot <- ggplot(diamonds, aes(x = x, y = y))+ geom_bin2d(bins=3)
> plot_data <- ggplot_build(my_plot)
> data <- plot_data$data[[1]]
> data$xbin[[1]]
[1] [0,3.58]
Levels: [0,3.58] (3.58,7.16] (7.16,10.7] (10.7,14.3]
Nothing I tried worked, including min and mean. How do I access the endpoints of such an interval like data$xbin[[1]]?
(Update: I turned the example into a complete test case based on a built-in data set.)
Something like
library(stringr)
x <- cut(1:5, breaks = 2)
as.numeric(unlist(str_extract_all(as.character(x[1]), "\\d+\\.*\\d*")))
or, in your example,
library(ggplot2)
my_plot <- ggplot(diamonds, aes(x = x, y = y))+ geom_bin2d(bins=3)
plot_data <- ggplot_build(my_plot)
data <- plot_data$data[[1]]
x <- data$xbin[[1]]
as.numeric(unlist(str_extract_all(as.character(x), "\\d+\\.*\\d*")))[2]
[1] 3.58
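If the bins can have negative endpoints or endpoints printed in scientific notation, a digits-only pattern will lose the sign or the exponent. A base R sketch that strips the brackets and splits on the comma instead (interval_endpoints is a made-up helper name):

```r
# Turn labels such as "(3.58,7.16]" into a two-column numeric matrix
interval_endpoints <- function(x) {
  s <- gsub("[][()]", "", as.character(x))    # drop [ ] ( )
  parts <- strsplit(s, ",", fixed = TRUE)     # "lower,upper" -> two strings
  out <- do.call(rbind, lapply(parts, as.numeric))
  colnames(out) <- c("lower", "upper")
  out
}

labels <- c("[0,3.58]", "(3.58,7.16]", "(-1,2.5]", "(2.5,1e+05]")
ep <- interval_endpoints(labels)
```

Keep in mind that the labels are rounded when they are created, so the recovered endpoints are only as precise as the printed text.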

Organizing data from physics experiments for ggplot2

I am currently trying to use ggplot2 to visualize results from simple current-voltage experiments. I managed to achieve good results for one set of data of course.
However, I have a number of current-voltage datasets, which I input in R recursively to get the following organisation (see minimal code) :
data.frame(cbind(batch(string list), sample(string list), dataset(data.frame list)))
Edit: My data are stored in text files named batchname_samplenumber.txt, with voltage and current columns. The code I use to import them is:
require(plyr)
require(ggplot2)
#VARIABLES
regex <- "([[:alnum:]_]+).([[:alpha:]]+)"
regex2 <- "G5_([[:alnum:]]+)_([[:alnum:]]+).([[:alpha:]]+)"
#FUNCTIONS
getJ <- function(list, k) llply(list, function(i) llply(i, function(i, indix) getElement(i,indix), indix = k))
#FILES
files <- list.files("Data/",full.names= T)
#NAMES FOR FILES
paths <- llply(llply(files, basename),function(i) regmatches(i,regexec(regex,i)))
paths2 <- llply(llply(files, basename),function(i) regmatches(i,regexec(regex2,i)))
names <- llply(llply(getJ(paths, 2)),unlist)
batches <- llply(llply(getJ(paths2, 2)),unlist)
samples <- llply(llply(getJ(paths2, 3)),unlist)
#SETS OF DATA, NAMED
sets <- llply(files,function(i) read.table(i,skip = 0, header = F))
names(sets) <- names
for (i in as.list(names)) names(sets[[i]]) <- c("voltage","current")
df<-data.frame(cbind(batches,samples,sets))
A minimal data set can be generated via:
require(plyr)
batch <- list("A","A","B","B")
sample <- list(1,2,1,2)
set <- list(data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)))
df<-data.frame(cbind(batch,sample,set))
My question is: is it possible to use the data as is, and plot it with code similar to the following (which does not work)?
ggplot(data, aes(x = dataset$current, y = dataset$voltage, colour = sample)) + facet_wrap(~batch)
The more general version would be: is ggplot2 capable of handling raw physical data, as opposed to discrete statistical data (like diamonds, cars)?
With the newly-defined problem (two-column files named "batchname_samplenumber.txt"), I would suggest the following strategy:
library(plyr)  # for ldply()
read_custom <- function(f, ...) {
  d <- read.table(f, ...)
  names(d) <- c("V", "I")
  ## extract sample and batch from the base filename
  ids <- strsplit(sub("\\.txt$", "", basename(f)), "_")
  d$batch <- ids[[1]][1]
  d$sample <- ids[[1]][2]
  d
}
## list files to read
files <- list.files(pattern="\\.txt$")
## read them all in a single data.frame
m <- ldply(files, read_custom)
It's not clear how the sample names are defined with respect to the dataset. The general idea for ggplot2 is that you should group all your data in the form of a melted (long format) data.frame.
library(ggplot2)
library(plyr)
library(reshape2)
l1 <- list(batch="b1", sample=paste("s", 1:4, sep=""),
dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))
l2 <- list(batch="b2", sample=paste("s", 1:4, sep=""),
dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))
l3 <- list(batch="b3", sample=paste("s", 1:4, sep=""),
dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))
list_to_df <- function(l, n=10){
m <- l[["dataset"]]
m$batch <- l[["batch"]]
m$sample <- rep(l[["sample"]], each=n)
m
}
## list_to_df(l1)
m <- ldply(list(l1, l2, l3), list_to_df)
ggplot(m) + facet_wrap(~batch)+
geom_path(aes(current, voltage, colour=sample))
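The same long-format idea can be applied to the parallel lists from the question's minimal example. This is only a sketch: batch_ids, sample_ids and datasets are stand-ins for the question's batch, sample and set lists, and the values are random.

```r
batch_ids  <- list("A", "A", "B", "B")
sample_ids <- list(1, 2, 1, 2)
set.seed(42)
datasets <- lapply(1:4, function(i) {
  data.frame(voltage = runif(10), current = runif(10))
})

# Attach batch and sample to every row of each dataset, then stack them
long <- do.call(rbind, Map(function(b, s, d) {
  d$batch  <- b
  d$sample <- as.character(s)
  d
}, batch_ids, sample_ids, datasets))
long$sample <- factor(long$sample)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long) +
    ggplot2::facet_wrap(~batch) +
    ggplot2::geom_path(ggplot2::aes(current, voltage, colour = sample))
}
```

Once the data are in this shape, colour = sample and facet_wrap(~batch) work directly, with no list columns involved.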

Resources