Using multiple datasets for one graph - r

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.

As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
xlim = c(min(set1$date1,set2$date2),max(set1$date1,set2$date2)),
ylim = c(min(set1$temp1,set2$temp2),max(set1$temp1,set2$temp2)),
type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
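The question also asks for two y-axes over a shared date-time x-axis. A minimal sketch of that with base graphics follows. The data frames reuse the names grewl and kbll from the question, but the values are made up, and the colors and margin settings are arbitrary choices.

```r
# Hypothetical stand-ins for the two CSV files (values are made up)
set.seed(42)
grewl <- data.frame(
  date_time = seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  temp_c = rnorm(48, mean = 5, sd = 2)
)
kbll <- data.frame(
  date_time = seq(as.POSIXct("2020-01-02 00:00", tz = "UTC"),
                  by = "hour", length.out = 48),
  temp_c = rnorm(48, mean = 20, sd = 3)
)

# Shared x-range covering the dates of both files
xlim <- range(c(grewl$date_time, kbll$date_time))

par(mar = c(5, 4, 4, 4) + 0.1)    # extra room on the right for the second axis
plot(grewl$date_time, grewl$temp_c, type = "l", col = "blue",
     xlim = xlim, xlab = "date_time", ylab = "grewl temp_c")

par(new = TRUE)                    # draw the next plot on top of the first
plot(kbll$date_time, kbll$temp_c, type = "l", col = "red",
     xlim = xlim, axes = FALSE, xlab = "", ylab = "")
axis(side = 4)                     # right-hand axis for the second series
mtext("kbll temp_c", side = 4, line = 3)
legend("topleft", legend = c("grewl", "kbll"),
       col = c("blue", "red"), lty = 1)
```

Both plot() calls receive the same xlim, so the two lines share one time axis while each keeps its own y scale. If the two temperature ranges are comparable, a single y-axis with lines() as shown above is usually easier to read.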

Related

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid state conductive films. These scans are saved as .txt files (name scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. The LINEST line plotting current v. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance v. temperature to see the temperature dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (Rstudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns))assign(allruns[i],read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign x to be the voltage column from the first file
x<-run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of all the .txt files, so I assign y# to each. These are entered one at a time..
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
So that I can get the eqn for each LINEST (one LINEST for each scan and plotted with abline later). Again entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all LINEST (one for each scan ) on the same plot, without the data points showing up, I have been using this pattern of coding to first get all data points on a single plot in separate series:
plot(x,y1,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y2,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y3,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# ...
par(new=TRUE)
plot(x,yn,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
#To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1,col="", lwd=1)
abline(run2,col="",lwd=1)
abline(run3,col="",lwd=1)
# ...
abline(runn,col="",lwd=1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet, along with the corresponding scan temperature, which I have recorded separately. I then graph the conductance v. temperature points for the film as an X-Y scatter with smooth lines to give the temperature-dependent conductance curve. This gives me a single plot of LINEST lines in R and the conductance v. temperature curve in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it could be done even more quickly and efficiently! Also, if I need to change something, the entire process has to be re-executed with whatever change is necessary. This process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrix or a list of data.frame is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Others will suggest -- very justifiably -- that combining things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into this latter technique here, not yet.)
The for(...) assign(..., read.csv(...)) pattern has long been discouraged; you have the important part done, so replacing it is relatively easy. (Note that pattern is a regular expression, not a glob, so the dot is escaped and the match is anchored to the end of the name.)
allruns <- sapply(list.files(pattern = "\\.txt$"), read.table, simplify = FALSE)
(The use of sapply(..., simplify=FALSE) is similar to lapply(...), but it has a nice side-effect of naming the individual list-ified elements with, in this case, each filename. It may not be critical here but is quite handy elsewhere.)
Extracting your invariant and variable data is simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x. Though you might wonder why, I argue for keeping it like this for two reasons: (1) JUST IN CASE the V1 variable is ever even one-row-different, this will save you; (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
Plotting: I think I understand why you are using par(new=TRUE), and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(unlist(lapply(allruns, `[[`, "V2")))
# this next plot just sets the box and axes, no points
plot(NA, type = "n", xlim = xlim, ylim = ylim)
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline) # add col = ..., lwd = ..., and other abline options here
Copying all models into Excel, again, using lists:
out <- do.call(rbind, lapply(allLMs, function(m) summary(m)$coefficients[,1]))
This will now be a single matrix with one row per model and the two coefficients (intercept and slope) in two columns. (Feel free to use similar techniques to extract the other model summary attributes, including std err, t.value, or Pr(>|t|) (in the $coefficients); or $r.squared, $adj.r.squared, etc.)
write.table(out, file="clipboard", sep="\t", col.names=NA)
and paste into Excel. (Writing to "clipboard" like this works on Windows. Better yet, save it to a CSV file with write.csv and import that, since you might want to keep it around.)
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have to deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again, just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
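To tie the list-based steps together, here is a minimal end-to-end sketch that also replaces the copy-and-paste step for the slopes. It is only an illustration: the scans are fake data written to a temporary folder, and the temps vector stands in for the temperatures the question says were recorded separately.

```r
# Fake scans written to a temporary folder so the sketch is self-contained
set.seed(1)
scan_dir <- tempfile("scans")
dir.create(scan_dir)
temps <- c(300, 310, 320)                 # hypothetical scan temperatures
for (i in seq_along(temps)) {
  V <- seq(-1, 1, length.out = 50)
  I <- (temps[i] / 1e5) * V + rnorm(50, sd = 1e-5)
  write.table(data.frame(V, I),
              file.path(scan_dir, paste0("run", i, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

# Read every file into one named list, then fit one model per file
files   <- list.files(scan_dir, pattern = "\\.txt$", full.names = TRUE)
allruns <- sapply(files, read.table, simplify = FALSE)
allLMs  <- lapply(allruns, function(d) lm(V2 ~ V1, data = d))

# One slope (conductance) per scan, kept alongside its temperature
slopes <- vapply(allLMs, function(m) unname(coef(m)["V1"]), numeric(1))
result <- data.frame(file = basename(files), temp = temps,
                     conductance = slopes, row.names = NULL)

# Conductance v. temperature, without leaving R
plot(conductance ~ temp, data = result, type = "b")
```

One thing to watch: list.files() returns names in alphabetical order, so run10.txt sorts before run2.txt. The temperatures need to be matched to files by name rather than by position once there are more than nine runs.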
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing. Here's a dummy example for illustration:
library(tidyverse)
# creating dummy data files
dummy <- function(T) {
V <- seq(-5, 5, length=20)
I <- jitter(T*V + T, factor = 1)
write.table(data.frame(V=V, I = I),
file = paste0(T,".txt"),
row.names = FALSE)
}
purrr::walk(300:320, dummy)
# reading
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))}
m <- purrr::map_df(lf, read_one, header = TRUE, .id="id")
head(m)
ggplot(m, aes(V, I, group = T)) +
facet_wrap( ~ T) +
geom_point() +
geom_smooth(se = FALSE)
models <- m %>%
split(.$T) %>%
map(~lm(I ~ V, data = .))
coefs <- models %>% map_df(broom::tidy, .id = "T")
ggplot(coefs, aes(as.numeric(T), estimate)) +
geom_line() +
facet_wrap(~term, scales = "free")

How to put 2 boxplot in one graph in R without additional libraries?

I have this kind of dataset
Defect.found Treatment Program
1 Testing Counter
1 Testing Correlation
0 Inspection Counter
3 Testing Correlation
2 Inspection Counter
I would like to create two boxplots in one graph: one of detected defects per program and one of detected defects per technique.
Meaning having:
boxplot(exp$Defect.found ~ exp$Treatment)
boxplot(exp$Defect.found ~ exp$Program)
In a joined graph.
Searching on Stack Overflow, I was able to create it, but only with the lattice library, by typing:
bwplot(exp$Treatment + exp$Program ~ exp$Defect.found)
but I would like to know if it's possible to create the graph without additional libraries like ggplot2 and lattice.
Prepare the plot window to receive two plots in one row and two columns (default is obviously one row and one column):
par(mfrow = c(1, 2))
My suggestion is to avoid using the word exp, because it is already used for the exponential function. Use for instance mydata.
Defects found against treatment (frame = FALSE suppresses the external box):
with(mydata, plot(Defect.found ~ Treatment, frame = FALSE))
Defects found against program (ylab = NA suppresses the y label because it is already shown in the previous plot):
with(mydata, plot(Defect.found ~ Program, frame = FALSE, ylab = NA))
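For reference, a self-contained sketch of the side-by-side layout is below. The data frame is made up to match the three columns in the question. It calls boxplot() directly instead of plot(), on the assumption that this is safer when the grouping columns might be read in as character strings rather than factors.

```r
# Small made-up data set with the same three columns as in the question
mydata <- data.frame(
  Defect.found = c(1, 1, 0, 3, 2, 2, 0, 4),
  Treatment = factor(c("Testing", "Testing", "Inspection", "Testing",
                       "Inspection", "Inspection", "Testing", "Inspection")),
  Program = factor(c("Counter", "Correlation", "Counter", "Correlation",
                     "Counter", "Correlation", "Counter", "Correlation"))
)

op <- par(mfrow = c(1, 2))        # one row, two columns
boxplot(Defect.found ~ Treatment, data = mydata, main = "By treatment")
boxplot(Defect.found ~ Program, data = mydata, main = "By program", ylab = "")
par(op)                           # restore the previous layout
```

Saving the value returned by par() and restoring it afterwards keeps later plots from unexpectedly landing in the two-panel layout.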

Issue: ggplot2 replicates last plot of a list in grid

I have some 16 plots. I want to arrange all of them in a grid with ggplot2. But whenever I plot, I get a grid in which all the plots are the same, i.e., the last plot saved in the list gets drawn at all 16 places of the grid. To reproduce the issue, here is a simple example with two files. Although the data are entirely different, the plots drawn are identical.
library(ggplot2)
library(grid)
library(gridExtra)
library(scales)
set.seed(1006)
date1<- as.POSIXct(seq(from=1443709107,by=3600,to=1446214707),origin="1970-01-01")
power <- rnorm(length(date1),100,5)#with normal distribution
write.csv(data.frame(date1,power),"file1.csv",row.names = FALSE,quote = FALSE)
# Now another dataset with uniform distribution
write.csv(data.frame(date1,power=runif(length(date1))),"file2.csv",row.names = FALSE,quote = FALSE)
path=getwd()
files=list.files(path,pattern="*.csv")
plist<-list()# for saving intermediate ggplots
for(i in 1:length(files))
{
dframe<-read.csv(paste(path,"/",files[i],sep = ""),head=TRUE,sep=",")
dframe$date1= as.POSIXct(dframe$date1)
plist[[i]]<- ggplot(dframe)+aes(dframe$date1,dframe$power)+geom_line()
}
grid.arrange(plist[[1]],plist[[2]],ncol = 1,nrow=2)
You need to remove the dframe$ prefix from your call to aes. You should do that anyway because you have provided a data argument. In this case it's even more important: although you save the ggplot object, the aesthetics are not evaluated until the plot is drawn by plot/grid.arrange. At that point, dframe$date1 and dframe$power are looked up using the current value of dframe, which is the last dataset in your iteration.
You need to plot with:
ggplot(dframe)+aes(date1,power)+geom_line()
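A minimal sketch of the corrected pattern follows, using two made-up data frames held in a list instead of CSV files on disk (the names dfs, normal, and uniform are illustrative only). Because aes() uses bare column names, each plot reads the columns from the data frame stored with that plot.

```r
library(ggplot2)

set.seed(1006)
# Two made-up data frames standing in for the CSV files
dfs <- list(
  normal  = data.frame(date1 = 1:100, power = rnorm(100, 100, 5)),
  uniform = data.frame(date1 = 1:100, power = runif(100))
)

# Bare column names inside aes() are looked up in each plot's own data
plist <- lapply(dfs, function(d) ggplot(d, aes(date1, power)) + geom_line())
```

The resulting list can then be handed to gridExtra, for example with grid.arrange(grobs = plist, ncol = 1), and each panel should show its own data.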

Assigning "beanplot" object to variable in R

I have found that the beanplot is the best way to represent my data. I want to look at multiple beanplots together to visualize my data. Each of my plots contains 3 variables, so each one looks something like what would be generated by this code:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
beanplot(a, b ,c ,ylim = c(-4, 4), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
(Would have just included an image but my reputation score is not high enough, sorry)
I have 421 of these that I want to put into one long PDF (EDIT: One plot per page is fine, this was just poor wording on my part). The approach I have taken was to first generate the beanplots in a for loop and store them in a list at each iteration. Then I will use the multiplot function (from the R Cookbook page on multiplot) to display all of my plots on one long column so I can begin my analysis.
The problem is that the beanplot function does not appear to be set up to assign plot objects as a variable. Example:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
plot1 <- beanplot(a, b, ylim = c(-5,5), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
plot1
If you then type plot1 into the R console, you will get back two of the plot parameters but not the plot itself. This means that when I store the plots in the list, I am unable to graph them with multiplot. It will simply return the plot parameters and a blank plot.
This behavior does not seem to be the case with qplot for example which will return a plot when you recall the stored plot. Example:
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100)
plot2 <- qplot(a,b)
plot2
There is no equivalent to the beanplot that I know of in ggplot. Is there some sort of workaround I can use for this issue?
Thank you.
You can simply open a PDF device with pdf() and keep the default parameter onefile=TRUE. Then call all your beanplot()s, one after the other, and close the device with dev.off(). They will all be in one PDF document, each one on a separate page.
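A minimal sketch of that approach is below. The output path is a hypothetical temporary file, and boxplot() is used only as a stand-in so the example runs without the beanplot package; the beanplot() call from the question would go in its place.

```r
out_file <- file.path(tempdir(), "all_plots.pdf")   # hypothetical output path

set.seed(1)
pdf(out_file, onefile = TRUE)     # every new plot starts a new page
for (i in 1:3) {
  x1 <- rnorm(100)
  x2 <- rnorm(100)
  x3 <- rnorm(100)
  # boxplot() is a stand-in here; replace it with the beanplot() call
  boxplot(x1, x2, x3, ylim = c(-4, 4), main = paste("Plot", i))
}
dev.off()                         # close the device so the file is written
```

Nothing has to be stored in a list with this approach: each plot is drawn straight onto its own page as the loop runs.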

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them connected as lines (a line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
## firstpoint is the y value of the first point in the series
lastpt <- firstpoint
lastptind <- 1
addpoint <- function(nextpt, ...){
pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
points(pts, ... )
lastpt <<- nextpt
lastptind <<- lastptind + 1
}
return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0, y0, x1, y1. Note that segments adds to an existing plot, so the empty plot from the question must be drawn first:
npups <-length(pupil_data$PupilLeft)
segments(1:(npups-1), pupil_data$PupilLeft[-npups], # the starting points
2:npups, pupil_data$PupilLeft[-1] ) # the ending points
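A self-contained version of the segments() approach, as a sketch: it uses the first few values from the question's data and sets up an empty plot first. The color and axis label are arbitrary choices.

```r
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437, 3.632779)
npups <- length(PupilLeft)

# segments() only adds to an existing plot, so set up empty axes first
plot(seq_len(npups), PupilLeft, type = "n", xlab = "Index")
segments(x0 = 1:(npups - 1), y0 = PupilLeft[-npups],   # starting points
         x1 = 2:npups,       y1 = PupilLeft[-1],       # ending points
         col = "red")
```

Dropping the last element for the starting points and the first element for the ending points pairs each value with its successor, which gives npups - 1 segments in total.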
