How to use an if/else statement to plot different colored lines within a plotting for-loop (in R)

I created a for loop to plot 38 lines (which are the rows of my matrix, results.summary.evap, and correspond to 38 total samples). I'd like to make these lines different colors, based on a characteristic pertaining to each sample: age. I can access the age in my input matrix: surp.data$Age_ka.
However, the matrix I am looping over (results.summary.evap) does not have sample age or sample name, though each sample should be located in the same rows for both surp.data and results.summary.evap.
Here is the for loop I created to plot 38 lines, one corresponding to each sample. In this case, results.summary.evap is what I am plotting from, and this matrix is derived from information in the surp.data input file.
```
par(mfrow=c(3,1))
par(mar=c(3, 4, 2, 0.5))
plot(NA,NA,xlim=c(0,10),ylim=c(0,2500), ylab = "Evaporation (mm/yr)", xlab = "Relative Humidity")
for(i in 1:range){
  lines(rh.sens,results.summary.evap[i,])
}
```
I'd like to plot lines in different colors based on the age associated with each sample. I tried to incorporate an if/else statement into the loop that would plot in a different color if the corresponding sample age was greater than 20 in the input file.
```
for(i in 1:range){
  if surp.data$Age_ka[i]>20 {
    lines(rh.sens,results.summary.evap[i,], col = 'red')
  } else {
    lines(rh.sens,results.summary.evap[i,], col = 'black')
  }
}
```
This for loop won't run (due to issues with parentheses). I'm not sure if what I am doing is fundamentally wrong, or if I've just missed a parenthesis somewhere. I'm also not sure how to make this a bit more robust; for example, by plotting in 6-8 different colors based on age ranges, rather than just two.
Thank you!

You're missing parentheses around the condition of your if statement. In R, the condition has to be wrapped in parentheses:
```
for(i in 1:range){
  if(surp.data$Age_ka[i]>20){
    lines(rh.sens,results.summary.evap[i,], col = 'red')
  } else {
    lines(rh.sens,results.summary.evap[i,], col = 'black')
  }
}
```
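For the second part of the question (six to eight colors by age range), one option is to bin the ages with cut() and look up one color per bin, which avoids a long if/else chain. This is only a sketch: the simulated age.ka, rh.sens and evap objects stand in for surp.data$Age_ka, rh.sens and results.summary.evap, and the break points and colors are arbitrary assumptions.

```r
# Simulated stand-ins for the real data (assumptions, not the asker's values)
set.seed(1)
n.samples <- 38
age.ka  <- runif(n.samples, 0, 60)        # plays the role of surp.data$Age_ka
rh.sens <- seq(0, 10, length.out = 11)    # x values shared by all lines
evap    <- matrix(runif(n.samples * length(rh.sens), 0, 2500),
                  nrow = n.samples)       # plays the role of results.summary.evap

# Bin ages into six ranges; the break points are arbitrary
age.breaks <- c(0, 10, 20, 30, 40, 50, Inf)
age.bin    <- cut(age.ka, breaks = age.breaks, include.lowest = TRUE)

# One color per bin, then one color per sample via the bin index
bin.cols  <- c("black", "blue", "darkgreen", "orange", "red", "purple")
line.cols <- bin.cols[as.integer(age.bin)]

plot(NA, NA, xlim = c(0, 10), ylim = c(0, 2500),
     ylab = "Evaporation (mm/yr)", xlab = "Relative Humidity")
for (i in seq_len(n.samples)) {
  lines(rh.sens, evap[i, ], col = line.cols[i])
}
legend("topright", legend = levels(age.bin), col = bin.cols, lty = 1, cex = 0.7)
```

The sketch loops over seq_len(n.samples) rather than 1:range because range is also the name of a base R function, so a differently named counter is less likely to cause confusion.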

Related

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
```
# for the same result each time
set.seed(1234)
# make data
set1 <- data.frame("date1" = seq(1,10),
                   "temp1" = rnorm(10))
set2 <- data.frame("date2" = seq(8,17),
                   "temp2" = rnorm(10, 1, 1))
# first attempt fails
# plot one
plot(set1$date1, set1$temp1, type = "b")
# add points - oops, only three showed up because the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
# second attempt
# adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
     xlim = c(min(set1$date1,set2$date2),max(set1$date1,set2$date2)),
     ylim = c(min(set1$temp1,set2$temp2),max(set1$temp1,set2$temp2)),
     type = "b")
# now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
```
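The question also asked for two y-axes, which the example above does not cover. One common base R approach is to draw the second series on top of the first with par(new = TRUE) and put its axis on the right with axis(side = 4). The following is a sketch with simulated data; the set1/set2 names and the temperature values are made up for illustration.

```r
set.seed(1234)
# Simulated data on very different scales, so a second axis is actually useful
set1 <- data.frame(date1 = seq(1, 10), temp1 = rnorm(10))
set2 <- data.frame(date2 = seq(8, 17), temp2 = rnorm(10, 20, 5))

# Shared x range so both series line up on the same date axis
x.lim <- range(set1$date1, set2$date2)

par(mar = c(5, 4, 2, 4))                  # leave room for the right-hand axis
plot(set1$date1, set1$temp1, type = "b",
     xlim = x.lim, xlab = "date", ylab = "temp1")

par(new = TRUE)                           # next plot() draws over the current one
plot(set2$date2, set2$temp2, type = "b", col = "red",
     xlim = x.lim, axes = FALSE, xlab = "", ylab = "")
axis(side = 4, col.axis = "red")          # second y-axis on the right
mtext("temp2", side = 4, line = 2.5, col = "red")
```

Both plot() calls must use the same xlim, otherwise the two series will not share the x-axis even though they appear in the same frame.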

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize through the plot(swiss) function.
After this, I want to add the outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
My question: is there any restriction on how the points() function can be used for a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
```
library(dplyr)
library(GGally)
swiss %>%
  bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
  mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
  ggpairs(columns = 1:6,
          mapping = aes(color = is_outlier),
          upper = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          lower = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          axisLabels = "internal")
```
Unfortunately this isn't possible the way you're currently doing things. When you plot a data frame, R produces many plots and aligns them. What you're actually seeing is 6 by 6 = 36 individual panels, aligned to look nice.
When you call points(), it adds the points to the current plot, which doesn't really make sense when you have 36 plots, at least not in the way you want.
ggplot2 is a really powerful tool in R and provides far greater customizability. For example, you could set up the data frame to include your outliers, labelled as "outlier", and show them in each plot that you have set up as facets. The more you explore it, the more likely you are to find plots that suit your needs better.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, looking up the function ggpairs from the GGally package would be a good place to start.
A quick Google image search shows some funky plots akin to what you're after, and then some!
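If staying in base R is preferred, another option (a sketch, not taken from the answers above) is to redraw the scatterplot matrix with pairs() and pass one color and one plotting symbol per row. The outliers are then marked in every panel without any call to points(). The threshold of 10 comes from the question; the names distance and is.outlier are chosen here for illustration.

```r
# swiss ships with base R (datasets package)
distance   <- mahalanobis(swiss, colMeans(swiss), cov(swiss))
is.outlier <- distance > 10

# col and pch hold one entry per row, so each observation keeps its
# style in every panel of the scatterplot matrix
pairs(swiss,
      col = ifelse(is.outlier, "red", "black"),
      pch = ifelse(is.outlier, 4, 1))
```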

How to apply a chunk of code (not only a single function) to all columns in dataset

I would like to apply this chunk of code to each column in a dataset. I can run all columns individually, but it is tedious to make repeated code for 75 different columns and change all of the names in the code to match each column name. Is there a way that I can run all columns individually at once without making code for each column individually?
max.Width =lmer(mergeCowpeaTEST$max.Width ~ (1|Genotype) + (1|Year) + (1|Genotype:Year) + (1|Rep:Year), data=mergeCowpeaTEST,na.action = na.omit)
model.a_max.Width <-lmer(max.Width~ (1|Genotype) + (1|Year) + (1|Genotype:Year) + (1|Rep:Year), data=mergeCowpeaTEST)
alt.est.a_max.Width <- influence(model.a_max.Width, obs=TRUE)
cooks<-cooks.distance(alt.est.a_max.Width)
plot(alt.est.a_max.Width, which="cook", sort=FALSE,main="cook's distance plot of max.Width")
which(residuals(max.Width)>0.10)
which(residuals(max.Width)<(-0.10))
boxplot(residuals(max.Width))
myboxplot<-boxplot(residuals(max.Width))
myboxplot$out
hist(residuals(max.Width))
qqnorm(residuals(max.Width))
pdf("Widiv_max.Width_residual_graphs.pdf",height=8,width=10)
plot(fitted(max.Width),residuals(max.Width), xlab="Predicted values", ylab="Residuals", main="Residual Plot of widiv max.Width")
abline(h=0, col="red")
hist(resid(max.Width),main="histogram of max.Width residuals")
qqnorm(residuals(max.Width), main="Residuals Q-Q Plot");qqline(resid(max.Width))
qqnorm(ranef(max.Width)$Genotype$"(Intercept)", main="Genotypes Q-Q Plot"); qqline(ranef(max.Width)$Genotype$"(Intercept)")
qqnorm(ranef(max.Width)$"Genotype:Year"$"(Intercept)", main="Genotype by Year Q-Q Plot"); qqline(ranef(max.Width)$"Genotype:Year"$"(Intercept)")
plot(alt.est.a_max.Width, which="cook", sort=FALSE,main="cook's distance plot of max.Width")
dev.off()
The key to this is your describing it as "only" a single function. A single function can run an arbitrary number of things. You can have it print something, then do something, then output something. Or do lots of things. Or play Global Thermonuclear War. All in a single function.
```
apply( ChickWeight, 2, function(clmn) {
  cat("Hi")
  cat("Low")
  cat("The only way to win is not to play at all")
} )
```
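To make that concrete for a modelling workflow, one possible pattern is to wrap the whole chunk in a function that takes a column name, build the formula from that name, and lapply() over the names. The sketch below deliberately swaps in lm() and the built-in iris data for the asker's lmer() model and mergeCowpeaTEST data, so it runs without extra packages; analyse_column and the residual cutoff of 1 are illustrative assumptions.

```r
# Stand-in example: iris and lm() replace the asker's data and lmer() model
analyse_column <- function(response, data) {
  # Build e.g. Sepal.Length ~ Species from the column name
  model.formula <- reformulate("Species", response = response)
  fit <- lm(model.formula, data = data)
  res <- residuals(fit)

  # Several plots per column, all inside the one function
  hist(res, main = paste("Histogram of", response, "residuals"))
  qqnorm(res, main = paste(response, "residuals Q-Q plot"))
  qqline(res)

  # Return whatever is needed later
  list(fit = fit, large.resid = which(abs(res) > 1))
}

# Every column except the grouping variable is treated as a response
responses <- setdiff(names(iris), "Species")
results   <- lapply(setNames(responses, responses), analyse_column, data = iris)
```

Each element of results is named after its column, so results$Sepal.Length$fit retrieves one fitted model. The same structure should carry over to a mixed model by replacing the lm() call and the right-hand side of the formula, though that has not been tested here.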

plotting specific points within a spreadsheet in r

Currently I am plotting 2000-some lines on a single plot in R. I am using data from a spreadsheet which I cannot disclose because it contains sensitive information, but I'll try to illustrate how it is arranged.
x1/x1/x1/x1/x1/x1/etc.
y1/y1/y1/y1/y1/y1/etc.
x2/x2/x2/x2/x2/x2/etc.
y2/y2/y2/y2/y2/y2/etc.
...
x4436/x4436/x4436/etc.
y4436/y4436/y4436/etc.
where each x1,y1 is a point on a separate line. I need to plot a point at the endpoint of each line and I cannot seem to get my code to work. Currently I am using this to generate the points:
```
for (k in (1:2218)*2) {
  q <- unlist(e_web_clear[2*k])
  w <- unlist(e_web_clear[716])
  points(w, q, col = "lightblue")
}
```
The way I imagined it, it would loop over every other row to get only the y values for each line, and it would take the values from only the last column of my data (column 716).
Needless to say, it did not work as intended. Any suggestions?
EDIT:
spreadsheet with just a small portion of values here
and the code used to generate the lines:
```
for (j in (1:2218)*2) {
  x <- unlist(e_web_clear[2*j-1,])
  y <- unlist(e_web_clear[2*j,])
  lines(x,y,'l',lwd=.00000000001, col="black")
}
```
The data was imported as a text file.
Edit2:
This is the graph I am getting.
The graph I want would have the endpoint of each line highlighted in light blue. I believe it should look something like this: http://imgur.com/13b9MZL
Figured it out.
I edited my line loop to place a point on the last vector point of each line.
```
for (j in (1:2218)*2) {
  x <- unlist(e_web_clear[2*j-1,])
  y <- unlist(e_web_clear[2*j,])
  lines(x,y,'l',lwd=.00000000001, col="black")
  points(x[358], y[358], lwd = 1.5, cex = .1, col = "lightblue")
}
```
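The indexing idea can be seen on a tiny made-up example. Assuming e_web_clear is a data frame, e_web_clear[2*k] without a comma selects a column rather than a row, which would explain why the first attempt did not pick out the y values of one line. The sketch below uses a small hypothetical matrix e_demo with alternating x and y rows, and assumes every line has the same number of points, so the endpoint is always in the last column.

```r
# Hypothetical layout: rows alternate x values and y values, one pair of rows per line
e_demo <- rbind(1:5, c(1, 2, 3, 4, 5),
                1:5, c(5, 4, 3, 2, 1),
                1:5, c(3, 3, 3, 3, 3))

x.rows <- seq(1, nrow(e_demo), by = 2)   # rows holding x values
y.rows <- x.rows + 1                     # rows holding the matching y values
last   <- ncol(e_demo)                   # endpoint = last column

plot(NA, NA, xlim = range(e_demo[x.rows, ]), ylim = range(e_demo[y.rows, ]),
     xlab = "x", ylab = "y")
for (k in seq_along(x.rows)) {
  lines(e_demo[x.rows[k], ], e_demo[y.rows[k], ])
}

# All endpoints in one vectorized call: row index before the comma, column after
points(e_demo[x.rows, last], e_demo[y.rows, last], col = "lightblue", pch = 19)
```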

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them as lines (line graph).
Here's the code:
```
pupil_data <- read.table("C:/a1t_left_test.dat", header=TRUE, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y))
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
  points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
```
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
```
makeaddpoint <- function(firstpoint){
  ## firstpoint is the y value of the first point in the series
  lastpt <- firstpoint
  lastptind <- 1
  addpoint <- function(nextpt, ...){
    pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
    points(pts, ... )
    lastpt <<- nextpt
    lastptind <<- lastptind + 1
  }
  return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
  myaddpoint(PupilLeft[i], type = "o")
}
```
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
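For comparison, a version without a closure might look like the sketch below, where the previous value is stored by hand and each pass draws one two-point line. The names prev.y and n.drawn are chosen here for illustration, and the data are the values from the question.

```r
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437, 3.632779, 3.614373,
               3.605981, 3.595985, 3.630766, 3.590724, 3.626535, 3.62386, 3.619688,
               3.595711, 3.627841, 3.623596, 3.650569, 3.64876)

plot(NA, NA, xlim = c(0, length(PupilLeft)), ylim = c(2, max(PupilLeft)),
     xlab = "Index", ylab = "PupilLeft")

prev.y  <- PupilLeft[1]   # y value of the previously drawn point
n.drawn <- 0              # number of segments drawn so far
for (i in 2:length(PupilLeft)) {
  # Connect the previous point (i - 1, prev.y) to the current one (i, PupilLeft[i])
  lines(c(i - 1, i), c(prev.y, PupilLeft[i]), type = "o", col = "red")
  prev.y  <- PupilLeft[i]
  n.drawn <- n.drawn + 1
}
```

Any test that decides whether to draw a given point would go inside the loop around the lines() call, at the cost of keeping prev.y (and the matching x index) up to date manually.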
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0,y0, x1,y1
npups <-length(pupil_data$PupilLeft)
segments(1:(npups-1), pupil_data$PupilLeft[-npups], # the starting points
2:npups, pupil_data$PupilLeft[-1] ) # the ending points
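Because segments() adds to an existing plot, a plot has to be set up before it is called. A minimal runnable sketch, using the first few values from the question in a plain vector (pupil and the x0/y0/x1/y1 helpers are names chosen here for illustration):

```r
pupil <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437, 3.632779)
n <- length(pupil)

# Empty plot with suitable limits
plot(NA, NA, xlim = c(1, n), ylim = range(pupil),
     xlab = "Index", ylab = "PupilLeft")

# One segment between each pair of neighboring points
x0 <- 1:(n - 1); y0 <- pupil[-n]   # starting points
x1 <- 2:n;       y1 <- pupil[-1]   # ending points
segments(x0, y0, x1, y1, col = "red")
```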
