To minimize third-party package dependencies and preserve the ability to parallelize the code, the reproducible example below is intended to create one PNG image per row step of a plot using R's base graphics (no tidyverse or ggplot2).
However, it plots the entire series in every image instead of building the plot up one point at a time as intended:
#
setwd("///images")
data(mtcars) # load DF
frames = 50 # set image qty rate
for(i in 1:frames){
# creating a name for each plot file with leading zeros
if (i < 10) {name = paste('000',i,'plot.png',sep='')}
if (i < 100 && i >= 10) {name = paste('00',i,'plot.png', sep='')}
if (i >= 100) {name = paste('0', i,'plot.png', sep='')}
png(name)
# plot(mtcars$mpg,type="l")
plot(mtcars$mpg)
dev.off()
}
my_cmd <- 'convert *.png -delay 5 -loop 5 mpg.gif'
system(my_cmd)
#
My unsuccessful attempts to resolve the issue include:
1) Removing the frame iteration and using nrow(mtcars) to control the loop
2) Referencing the row index somehow in each plot call
3) Inserting a Sys.sleep() call inside the loop after each plot
4) Using the apply() function instead of a loop
Any pointers, or alternative code that is more idiomatic R and makes this work as intended?
Thanks.
This code creates one .png file per plot, where each successive plot has one more point on it than the previous one:
# load data
data(mtcars)
# specify number of files to create (one per row of mtcars)
frames <- nrow(mtcars)
# figure out how many leading zeros will be needed in filename
ndigits <- nchar(as.character(frames))
for(i in 1:frames){
# name each file
zeros <- ndigits - nchar(as.character(i))
ichar <- paste0(strrep('0',zeros), i)
name <- paste0(ichar, 'plot.png')
# plot as .png
png(filename = name)
plot(x=1:i, y=mtcars$mpg[1:i], pch=20, col="blue",
xlim=c(0,frames), ylim=range(mtcars$mpg))
dev.off()
}
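The leading-zero logic above can also be written with sprintf(). The following is only a sketch under a few assumptions: the files go to a temporary directory (out_dir is a name introduced here), the width of four digits follows the file names in the question, and type = "l" is used to draw a growing line rather than points.

```r
# Sketch: build zero-padded file names with sprintf() instead of counting digits.
data(mtcars)
frames <- nrow(mtcars)
out_dir <- tempdir()  # assumption: write the frames to a temporary directory

# "%04d" pads the frame number to four digits: 0001plot.png, 0002plot.png, ...
frame_names <- file.path(out_dir, sprintf("%04dplot.png", seq_len(frames)))

for (i in seq_len(frames)) {
  png(filename = frame_names[i])
  # Only the first i values are plotted, so each frame adds one more value
  plot(x = seq_len(i), y = mtcars$mpg[seq_len(i)], type = "l", col = "blue",
       xlim = c(0, frames), ylim = range(mtcars$mpg),
       xlab = "row", ylab = "mpg")
  dev.off()
}
```

Because the names are zero-padded to the same width, a shell glob such as *.png lists the frames in the order they were drawn.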
Summary: Despite a complicated lead-up, the solution was very simple. To plot a row of a data frame as a line instead of a lattice, I needed to transpose the data, turning one observation of many variables into many observations of one variable.
I am using RStudio on a Windows 10 computer.
I am using scientific equipment to write measurements to a csv file. Then I ZIP several files and read to R using read.csv. However, the data frame behaves strangely. Commands "length" and "dim" disagree and the "plot" function throws errors. Because I can create simulated data that doesn't throw the errors, I think the problem is either in how the machine wrote the data or in my loading and processing of the data.
Two ZIP files are located in my stackoverflow repository (with "Monterey Jack" in the name):
https://github.com/baprisbrey/stackoverflow
Here is my code for reading and processing them:
library(stringr)  # provides str_remove()
library(magrittr) # provides %>%
# Unzip the folders
unZIP <- function(folder){
orig.directory <- getwd()
setwd(folder)
zipped.folders <- list.files(pattern = ".*zip")
for (i in zipped.folders){
unzip(i)}
setwd(orig.directory)
}
folder <- "C:/Users/user/Documents/StackOverflow"
unZIP(folder)
# Load the data into a list of lists
pullData <- function(folder){
orig.directory <- getwd()
setwd(folder)
#zipped.folders <- list.files(pattern = ".*zip")
#unzipped.folders <- list.files(folder)[!(list.files(folder) %in% zipped.folders)]
unzipped.folders <- list.dirs(folder)[-1] # Removing itself as the first directory.
oData <- vector(mode = "list", length = length(unzipped.folders))
names(oData) <- str_remove(unzipped.folders, paste(folder,"/",sep=""))
for (i in unzipped.folders) {
filenames <- list.files(i, pattern = "*.csv")
#setwd(paste(folder, i, sep="/"))
setwd(i)
files <- lapply(filenames, read.csv, skip = 5, header = TRUE, fileEncoding = "UTF-16LE") #Note unusual encoding
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- vector(mode="list", length = length(files))
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- files
}
setwd(orig.directory)
return(oData)
}
theData <- pullData(folder) #Load the data into a list of lists
# Process the data into frames
bigFrame <- function(bigList) {
#where bigList is theData is the result of pullData
#initialize the holding list of frames per set
preList <- vector(mode="list", length = length(bigList))
names(preList) <- names(bigList)
# process the data
for (i in 1:length(bigList)){
step1 <- lapply(bigList[[i]], t) # transpose each data
step2 <- do.call(rbind, step1) # roll it up into its own matrix. Original error that wasn't reproduced: length(step2) was 24048 when i = 1, while dim(step2) was 48 501. Any comments on why?
firstRow <- step2[1,] #holding onto the first row to become the names
step3 <- as.data.frame(step2) # turn it into a frame
step4 <- step3[grepl("µA", rownames(step3)),] # Get rid of all those excess name rows
rownames(step4) <- 1:(nrow(step4)) # change the row names to rowID's
colnames(step4) <- firstRow # change the column names to the first row steps
step4$ID <- rep(names(bigList[i]),nrow(step4)) # Add an I.D. column
step4$Class[grepl("pos",tolower(step4$ID))] <- "Yes" # Add "Yes" class
step4$Class[grepl("neg",tolower(step4$ID))] <- "No" # Add "No" class
preList[[i]] <- step4
}
# bigFrame <- do.call(rbind, preList) #Failed due to different number of measurements (rows that become columns) across all the data sets
# return(bigFrame)
return(preList) # Works!
}
frameList <- bigFrame(theData)
monterey <- rbind(frameList[[1]],frameList[[2]])
# Odd behaviors
dim(monterey) #48 503
length(monterey) #503 #This is not reproducing my original error of length = 24048
rowOne <- monterey[1,1:(ncol(monterey)-2)]
plot(rowOne) #Error in plot.new() : figure margins too large
#describe the data
quantile(rowOne, seq(0, 1, length.out = 11) )
quantile(rowOne, seq(0, 1, length.out = 11) ) %>% plot #produces undesired lattice plot
# simulate the data
doppelganger <- sample(1:20461,501,replace = TRUE)
names(doppelganger) <- names(rowOne)
# describe the data
plot(doppelganger) #Successful scatterplot. (With my non-random data, I want a line where the numbers in colnames are along the x-axis)
quantile(doppelganger, seq(0, 1, length.out = 11) ) #the random distribution is mildly different
quantile(doppelganger, seq(0, 1, length.out = 11) ) %>% plot # a simple line of dots as desired
# investigating structure
str(rowOne) # results in a dataframe of 1 observation of 501 variables. This is a correct interpretation.
str(as.data.frame(doppelganger)) # results in 501 observations of 1 variable. This is not a correct interpretation but creates the plot that I want.
How do I convert the rowOne to plot like doppelganger?
It looks like one of my errors is not reproducing, where calls to "dim" and "length" apparently disagree.
However, I'm confused as to why the "plot" function is producing a lattice plot on my processed data and a line of dots on my simulated data.
What I would like is to plot each row of data as a line. (Next, and out of the scope of this question, is I would like to classify the data with adaboost. My concern is that if "plot" behaves strangely then the classifier won't work.)
Any tips or suggestions or explanations or advice would be greatly appreciated.
Edit: Investigating the structure with ("str") of the two examples explains the difference between plots. I guess my modified question is, how do I switch between the two structures to enable plotting a line (like doppelganger) instead of a lattice (like rowOne)?
I am answering my own question.
I am leaving behind the part about the discrepancy between "length" and "dim" since I can't provide a reproducible example. However, I'm happy to leave up for comment.
The answer is that in order to produce my plot, I simply have to transpose the row as follows:
rowOne %>% t() %>% as.data.frame() %>% plot
This inverts the structure from one observation of 501 variables to 501 obs of one variable as follows:
rowOne %>% t() %>% as.data.frame() %>% str()
#'data.frame': 501 obs. of 1 variable:
# $ 1: num 8712 8712 8712 8712 8712 ...
Because of the unusual encoding I used, and the strange "length" result, I failed to see a simple solution to my "plot" problem.
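A possible alternative to transposing is to flatten the row into a plain numeric vector. The sketch below also suggests one plausible reading of the "length" versus "dim" puzzle, although it cannot be confirmed without the original session: length() of a matrix counts elements (48 * 501 = 24048), while length() of a data frame counts columns. The data here are simulated, and pdf(NULL) is used only so the sketch does not write a file.

```r
# Sketch with simulated data standing in for the 48 x 501 measurements.
set.seed(1)
m <- matrix(sample(1:20461, 48 * 501, replace = TRUE), nrow = 48)
colnames(m) <- seq_len(501)

length(m)                        # a matrix counts elements: 48 * 501
df <- as.data.frame(m)
length(df)                       # a data frame counts columns

rowOne <- df[1, ]                # one observation of 501 variables
y <- unlist(rowOne)              # flatten the row to a numeric vector
x <- as.numeric(names(rowOne))   # use the column names as x positions

pdf(NULL)                        # assumption: null device, nothing is written to disk
plot(x, y, type = "l", xlab = "step", ylab = "measurement")
dev.off()
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so the line plot appears in the plot pane.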
My pipeline reads a CSV into a data frame, assigns row names, removes a column, performs a PCA, plots the PCA, and extracts the meaningful variables from the PCA, which are also plotted.
Here is my current code, which only goes as far as the first plot:
library(ggplot2)
library(ggrepel)
tsv = read.csv('matrix.tsv', sep='\t')
bell= read.csv('bell.tsv', sep='\t')
tail= read.csv('tail.tsv', sep='\t')
dfList = list(tail, tsv, bell)
#process csv's
dfList = lapply(dfList, function(dum){
rownames(dum) = dum[,1]
dum[,1] = NULL
dum$X = NULL
dum = dum[, -grep('un', colnames(dum))]
})
#create pca's of dataframes
pcaList = lapply(dfList, function(pca){
prin_comp = prcomp(pca, scale. = T)
})
#plot top 2 principal components in the pca
plotList = lapply(pcaList, function(prin_comp){
t = qplot(x=prin_comp$rotation[,1], y=prin_comp$rotation[,2]) + geom_text_repel(aes(label=row.names(prin_comp$rotation)))
})
#this plots the 3 plots, one for each pca, but they are un-named
plotList
The problem is that the plots don't have meaningful names/titles. I don't know how to keep that information present, passed from function to function.
I know there must be a more elegant way of doing this. And I have spent a day reading similar and not so similar questions regarding processing multiple csv files. But either they weren't applicable or didn't work for my case.
And as the title of this question implies, I would prefer to do this on one csv at a time, not all 3 at a time, as the csv's in question are very large, over 5GB each, so keeping each dataframe and pca in memory at the same time is impossible.
You just need to keep a string you want to use as the title somewhere and add ggtitle(YOUR_TITLE) to your plot, but this is not so easy with your current code. Instead of performing each step of the analysis for each CSV before going to the next step, why don't you just perform all steps for one CSV at a time?
Your code could look like:
library(ggplot2)
library(ggrepel)
csvs <- c("matrix.tsv","bell.tsv","tail.tsv")
for (i in csvs) {
# read file
df <- read.csv(i, sep='\t')
# process file
rownames(df) <- df[,1]
df[,1] <- NULL
df$X = NULL
df = df[, -grep('un', colnames(df))]
# create pca
pca <- prcomp(df, scale. = TRUE)
# plot pca
pcaPlot <- qplot(x=pca$rotation[,1], y=pca$rotation[,2]) +
geom_text_repel(aes(label=row.names(pca$rotation))) +
ggtitle(i)
print(pcaPlot)
# extract and plot meaningful variables
# ...
}
Basically, I just put everything you do in the lapply calls inside a single for loop. This approach also processes one CSV at a time, so only one data frame and one PCA need to be in memory at once.
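The same one-file-at-a-time idea can be wrapped in a function. This is a sketch only: it uses base graphics instead of ggplot2 so that it runs without extra packages, the small simulated TSV files stand in for the real ones, and plot_pca is a hypothetical helper name.

```r
# Sketch: process one file at a time and carry the file name along as the title.
# Assumption: small simulated TSV files stand in for matrix.tsv, bell.tsv and tail.tsv.
set.seed(42)
tmp <- tempdir()
files <- file.path(tmp, c("matrix.tsv", "bell.tsv", "tail.tsv"))
for (f in files) {
  sim <- data.frame(id = paste0("s", 1:10), a = rnorm(10), b = rnorm(10), c = rnorm(10))
  write.table(sim, f, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Hypothetical helper: everything for one file happens inside this function,
# so the data frame and PCA are released before the next file is read.
plot_pca <- function(path) {
  df <- read.csv(path, sep = "\t", row.names = 1)  # first column becomes row names
  pca <- prcomp(df, scale. = TRUE)
  out <- sub("\\.tsv$", "_pca.pdf", path)
  pdf(out)
  plot(pca$rotation[, 1], pca$rotation[, 2], xlab = "PC1", ylab = "PC2",
       main = basename(path))                      # the file name becomes the title
  text(pca$rotation[, 1], pca$rotation[, 2],
       labels = rownames(pca$rotation), pos = 3)
  dev.off()
  out
}

outputs <- vapply(files, plot_pca, character(1))
```

Returning the output path rather than the PCA object keeps the result of the loop small, which matters when each input is several gigabytes.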
I have a list containing 75 named matrices. I want to make a plot for each matrix and save each plot under the name of its matrix.
My code makes the plots in a loop and it works: I get 75 correct plots. The problem is that each plot's file name looks like a vector, "c(99,86,94....)", which is too long and doesn't tell me which plot is which.
This is the code I'm using; it's probably not the best. I'm a beginner, and I have been looking for a solution for a week without success.
for (i in ssamblist) {
svg(paste("Corr",i,".svg", sep=""),width = 45, height = 45)
pairs(~CDWA+CDWM+HI+NGM2+TKW+YIELD10+GDD_EA,
data=i,lower.panel=panel.smooth, upper.panel=panel.cor,
pch=0, main=i)
dev.off()}
How can I give each plot its own name?
I tried changing "i" to names(i), but then the name was that of the first column and only one plot was created. I also tried lapply, but couldn't get it to work.
PS: the plots are huge, so I have to expand the margins. I'm using RStudio.
Thank you!
Loop over the names of the list rather than its elements, using either a for loop or sapply:
# dummy data
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])
# using for loop
for(i in names(ssamblist)) {
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()}
# using sapply
sapply(names(ssamblist), function(i){
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()})
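To see why the original loop produced file names like "c(99,86,94....)", it may help to look at what the loop variable holds. The sketch below uses a two-element dummy list and only builds the names, without opening a graphics device.

```r
# Sketch: why `for (i in ssamblist)` loses the names.
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4])

for (i in ssamblist) {
  # i is the data frame itself, so paste() turns each column into deparsed text
  bad_name <- paste("Corr", i, ".svg", sep = "")
}
bad_name[1]   # begins with "Corrc(" followed by the values of the first column

# Iterating over the names keeps the label, and [[ ]] recovers the data
good_names <- paste0("Corr_", names(ssamblist), ".svg")
good_names
```

Iterating over a list yields its elements and drops their names, which is why the names have to be looped over explicitly.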
I have a data frame data with information on TIFFs, including a column txt describing the content of each TIFF. Unfortunately, txt is not always correct and we need to correct it by hand. I therefore want to loop over each row in data, show the TIFF, and ask for feedback, which is then put into data$txt.cor.
setwd(file.choose())
Some test TIFFs (with nonsense inside, but enough to show the idea):
txt <- sample(100:199, 5)
for (i in 1:length(txt)){
tiff(paste0(i, ".tif"))
plot(txt[i], ylim = c(100, 200))
dev.off()
}
and the dataframe:
pix.files <- list.files(getwd(), pattern = "*.tif", full.names = TRUE)
pix.file.info <- file.info(pix.files)
data <- cbind(txt, pix.file.info)
data$file <- row.names(pix.file.info)
data$txt.cor <- ""
data$txt[5] <- 200 # wrong one
My feedback function (error handling stripped):
read.number <- function(){
n <- readline(prompt = "Enter the value: ")
n <- as.character(n) #Yes, character. Sometimes we have alphanumerical data or leading zeros
}
Now the loop, for which help would be very much appreciated:
for (i in nrow(data)){
file.show(data[i, "file"]) # show the image file
data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the dataframe
}
In my very first attempts I was thinking of the plot.lm idea, where you go through the diagnostic plots after pressing return. I suspect that plot and tiffs are not big friends. file.show turned out to be easier. But now I am having a hard time with that loop...
Your problem is that you don't loop over the data: for (i in nrow(data)) iterates over the single value nrow(data), so only the last row is evaluated. Simply write 1:nrow(data) to iterate over all rows.
To display your tiff images in R you can use the package rtiff:
library(rtiff)
for (i in 1:nrow(data)){
tif <- readTiff(data[i,"file"]) # read in the tiff data
plot(tif) # plot the image
data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the dataframe
}
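The difference between the two loop headers can be checked without any image files. This is a minimal sketch with a made-up five-row data frame; seq_len(nrow(data)) is shown as a variant of 1:nrow(data) that also behaves sensibly when the data frame has zero rows.

```r
# Sketch: which rows does each loop header visit?
data <- data.frame(txt = c(101, 102, 103, 104, 200), txt.cor = "")

visited_wrong <- integer(0)
for (i in nrow(data)) {            # nrow(data) is the single number 5
  visited_wrong <- c(visited_wrong, i)
}

visited_right <- integer(0)
for (i in seq_len(nrow(data))) {   # 1, 2, 3, 4, 5
  visited_right <- c(visited_right, i)
}

visited_wrong
visited_right
```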
I've made a loop to create multiple boxplots. The thing is, I want to save all the boxplots without overwriting each other. Any suggestions?
This is my current code:
boxplot <- list()
for (x in 1:nrow(checkresults)){
boxplots <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x], EV[,x],
main=colnames(PIM)[x],
xlab="PIM, MYC, OBX, WDR, EV")
}
Do you want to save them to files, or keep them so that you can look at them in separate windows?
If it is the former, you can call png, pdf, or a similar device function inside your for loop:
for (i in 1:5) {
png(file=paste("plot",i,".png",sep=""))
plot(rnorm(10))
dev.off()
}
If you want to display them in separate windows, just use dev.new:
for (i in 1:5) {
dev.new()
plot(rnorm(10))
}
Just to add to @juba's answer: if you want to save the plots to a multi-page PDF file, you don't need the paste command that @juba suggested. This code
pdf("myboxplots.pdf")
for (x in seq_len(nrow(checkresults))){ # same range as the loop in the question
boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
dev.off()
creates a single multi-page pdf document, where each page is a boxplot. If you want to store the boxplots in separate pdf documents, then use the file=paste command.
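A self-contained version of the multi-page idea might look like the sketch below. The matrices are simulated stand-ins for PIM and MYC, and the output path in a temporary directory is an assumption made so the example can run anywhere.

```r
# Sketch: one PDF file, one page per boxplot.
set.seed(7)
PIM <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("g1", "g2", "g3")))
MYC <- matrix(rnorm(30), ncol = 3)
out <- file.path(tempdir(), "myboxplots.pdf")

pdf(out)                         # open the device once, before the loop
for (x in seq_len(ncol(PIM))) {
  # each high-level plot call starts a new page in the open PDF
  boxplot(PIM[, x], MYC[, x], names = c("PIM", "MYC"), main = colnames(PIM)[x])
}
dev.off()                        # close it once, after the loop
```

The key point is that pdf() and dev.off() sit outside the loop; moving them inside would require a different file name per iteration.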
First, create a list of the right length - it just makes things easier and is good practice to allocate storage before filling objects in via a loop:
boxplots <- vector(mode = "list", length = nrow(checkresults))
Then we can loop over the data you want, assigning to each component of the boxplots list as we go, using the [[x]] notation:
for (x in seq_along(boxplots)){
boxplots[[x]] <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
Before, your code was overwriting the previous boxplot info during subsequent iterations.
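Note that what gets stored is the list of summary statistics that boxplot() returns, not the picture itself. The sketch below uses simulated matrices and a null PDF device (both assumptions for illustration) to show what ends up in each list element.

```r
# Sketch: inspect what boxplot() returns when stored in a preallocated list.
set.seed(7)
PIM <- matrix(rnorm(30), ncol = 3)
MYC <- matrix(rnorm(30), ncol = 3)

pdf(NULL)                                   # assumption: draw to a null device
boxplots <- vector(mode = "list", length = ncol(PIM))
for (x in seq_along(boxplots)) {
  boxplots[[x]] <- boxplot(PIM[, x], MYC[, x], names = c("PIM", "MYC"))
}
dev.off()

names(boxplots[[1]])      # components of the returned statistics list
dim(boxplots[[1]]$stats)  # five summary values for each of the two groups
```

A stored element can be redrawn later with bxp(), for example bxp(boxplots[[1]]).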