R - If else statement within for loop

I have a data frame with 3 columns of data that I would like to plot separately - 3 plots. The data has NA in it (in different places within the 3 columns). I basically want to interpolate the missing values and plot that segment of the line (multiple sections) in red and the remainder of the line black.
I have managed to use 'zoo' to create the interpolated data but am unsure how to then plot these data points in a different colour. I found the question "Elegant way to select the color for a particular segment of a line plot?" and was thinking I could use a for loop with an if else statement to create the colour column as advised there. I would need 3 separate colour columns as I have 3 datasets.
Appreciate any help - cannot really provide an example as I'm unsure where to start! Thanks

This is my solution. It assumes that the NAs are still present in the original data. These will be omitted in the first plot() command. The function then loops over just the NAs.
You will probably get finer control if you take the plot() command out of the function. As written, "..." gets passed to plot() and a type = "b" graph is mimicked - but it's trivial to change it to whatever you want.
# Function to plot interpolated values in specified colours.
PlotIntrps <- function(exxes, wyes, int_wyes, int_pt = "red", int_ln = "grey",
                       goodcol = "darkgreen", ptch = 20, ...) {
  plot(exxes, wyes, type = "b", col = goodcol, pch = ptch, ...)
  nas <- which(is.na(wyes))
  enn <- length(wyes)
  # seq_along() is safe even when there is only one NA (or none)
  for (idx in seq_along(nas)) {
    points(exxes[nas[idx]], int_wyes[idx], col = int_pt, pch = ptch)
    lines(
      x = c(exxes[max(nas[idx] - 1, 1)], exxes[nas[idx]],
            exxes[min(nas[idx] + 1, enn)]),
      y = c(wyes[max(nas[idx] - 1, 1)], int_wyes[idx],
            wyes[min(nas[idx] + 1, enn)]),
      col = int_ln, type = "c")
    # Only needed if you have 2 (or more) contiguous NAs (interpolations)
    wyes[nas[idx]] <- int_wyes[idx]
  }
}
# Dummy data (jitter() for some noise)
x_data <- 1:12
y_data <- jitter(c(12, 11, NA, 9:7, NA, NA, 4:1), factor = 3)
interpolations <- c(10, 6, 5)
PlotIntrps(exxes = x_data, wyes = y_data, int_wyes = interpolations,
main = "Interpolations in pretty colours!",
ylab = "Didn't manage to get all of these")
Cheers.
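For the colour-column idea from the question, here is a minimal sketch. It uses base R's approx() as a stand-in for the zoo interpolation, and the names y_filled, is_interp, pt_col and seg_col are made up for illustration. A vectorised ifelse() replaces the for loop with if/else; with three columns you would repeat it once per column.

```r
# Dummy data with gaps (same shape as the example above, without jitter)
y <- c(12, 11, NA, 9, 8, 7, NA, NA, 4, 3, 2, 1)
x <- seq_along(y)

# Linear interpolation at every x; approx() ignores the NA pairs when fitting
y_filled <- approx(x, y, xout = x)$y

# Flag the interpolated points and derive one colour per point
is_interp <- is.na(y)
pt_col <- ifelse(is_interp, "red", "black")

# Segment i joins point i and point i + 1; colour it red if either end is interpolated
n <- length(y)
seg_col <- ifelse(is_interp[-n] | is_interp[-1], "red", "black")

plot(x, y_filled, type = "n")
segments(x[-n], y_filled[-n], x[-1], y_filled[-1], col = seg_col)
points(x, y_filled, col = pt_col, pch = 20)
```

segments() accepts one colour per segment, which is why a colour vector works here where a single lines() call would not.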

Related

How to change the color of outliers of certain category in boxplot()?

Put simply, I want to color outliers, but only if they belong to specific category, i.e. I want
boxplot(mydata[,2:3], col=c("chartreuse","gold"), outcol="red")
but red only for those elements for which mydata[,1] is M .
It appears that outcol only specifies one color per variable (box). However, you can use points to overplot individual points any way that you want. You need to figure out the relevant x and y coordinates to use for plotting. When you make a boxplot with a statement like boxplot(mydata[,2:3]) the first variable (column 2) is plotted at x=1 and the second variable (column 3) is plotted at x=2. By capturing the return value of boxplot you can figure out the y values. Since you do not provide any data, I will illustrate with randomly generated data.
## Data
set.seed(42)
NumPts = 400
a = rnorm(NumPts)
b = rnorm(NumPts)
c = rnorm(NumPts)
CAT = sample(c("M", "N"), NumPts, replace=TRUE)
mydata = data.frame(a,b,c, CAT)
## Find outliers
BP = boxplot(mydata[,2:3], col=c("chartreuse","gold"))
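## BP$out pools the outliers of both boxes; BP$group says which box each one belongs to
OUT2 = which(mydata[,2] %in% BP$out[BP$group == 1])
OUT3 = which(mydata[,3] %in% BP$out[BP$group == 2])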
## Find outliers with category == M
M_OUT2 = OUT2[which(mydata$CAT[OUT2] == "M")]
M_OUT3 = OUT3[which(mydata$CAT[OUT3] == "M")]
## Plot desired points
points(rep(1, length(M_OUT2)),mydata[M_OUT2, 2], col="red")
points(rep(2, length(M_OUT3)),mydata[M_OUT3, 3], col="red")
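To see what the captured return value of boxplot() contains, here is a small sketch with made-up data (the data frame d and its planted extreme values are only for illustration). The out component holds the outlier values of all boxes and group records which box each one came from.

```r
set.seed(1)
# Two columns, each with one planted extreme value
d <- data.frame(u = c(rnorm(50), 10), v = c(rnorm(50), -10))

# plot = FALSE returns the statistics without drawing anything
bp <- boxplot(d, plot = FALSE)

print(bp$out)    # outlier values, pooled over both boxes
print(bp$group)  # box index (1 or 2) for each outlier

# Outliers separated per box
out_by_box <- split(bp$out, bp$group)
print(out_by_box)
```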

How to avoid gaps due to missing values in matplot in R?

I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger and the number of rows, columns and the position of the non-overlapping missing values may change between different calls, so I'd like to find a solution that could handle this in a flexible way. I also need to use matlines
I was thinking of maybe filling in the gaps with interpolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
  n <- sum(bool)
  if(n < length(x)){
    # seq_len(n) also handles n == 0 (a column that is entirely NA)
    x[seq_len(n)] <- x[bool]
    x[(n + 1):length(x)] <- NA
  }
  return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
  y2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(col_i, !is.na(col_i)))
  })
  x2 <- apply(y, 2, function(col_i){
    return(shift_vec_na(x, !is.na(col_i)))
  })
  return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
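The shifting step can be checked on the example vector from the explanation. The helper shift_na_to_end below is a hypothetical, compact restatement of the idea rather than the function above. It moves the non-NA values to the front and pads the tail with NA.

```r
# Hypothetical helper: non-NA values first, then as many NAs as were removed
shift_na_to_end <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

shifted <- shift_na_to_end(c(1, 2, NA, NA, 5))
print(shifted)

# The x positions are shifted using the NA pattern of the y column,
# so each remaining y value keeps its original x coordinate
y <- c(1, 2, NA, NA, 5)
x <- 1:5
x_shifted <- c(x[!is.na(y)], rep(NA, sum(is.na(y))))
print(x_shifted)
```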
You could use the na.interpolation function from the imputeTS package (newer imputeTS releases name it na_interpolation):
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
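If adding a package is not an option, a rough base R equivalent can be sketched with approx(). The helper name fill_linear is made up here. The sketch only fills interior gaps, so leading or trailing NAs would stay NA.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical helper: linear interpolation of one column against x
fill_linear <- function(y, x) approx(x, y, xout = x)$y

# Apply to every column except x, whatever the number of columns
test[-1] <- lapply(test[-1], fill_linear, x = test$x)

matplot(test$x, test[-1], type = "l")
```

Because the helper is applied with lapply(), it copes with a varying number of columns and with gaps in different rows per column.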
I also had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
  #graphical set up of the window
  plot.new()
  plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
  mtext(text = xlab,side = 1, line = 3)
  mtext(text = ylab,side = 2, line = 3)
  mtext(text = main,side = 3, line = 0)
  axis(1L)
  axis(2L)
  #plot the data
  for(i in 1:nrow(data)){
    nin.na <- !is.na(data[i,])
    lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
  }
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
set up and draw the graphical part of the window.
range(data, na.rm=TRUE) makes the plot region large enough to include all data points.
mtext(...) is used to label the axes and to provide the main title. The axes themselves are drawn by the axis(...) command.
The for-loop that follows plots the data.
The function head of mymatplot provides the ... argument so that typical plot parameters such as lty, lwd, cex etc. can optionally be passed in. Those will be passed on to lines().
A last word on the choice of colors: they are up to your taste.
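The x=which(nin.na) trick can be seen in isolation on a single series. This is only an illustrative sketch with made-up values, not part of the function above.

```r
# One series with a gap at positions 2 and 3
series <- c(3, NA, NA, 6, 7)
keep <- !is.na(series)

# which(keep) gives the original positions of the non-missing values,
# so the line jumps straight from x = 1 to x = 4 and spans the gap
plot(seq_along(series), series, type = "n")
lines(x = which(keep), y = series[keep])
```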

R: generate legend from dataframe variables

I am trying to generate a legend in R with reference to the following post.
I have the following MWE, which more or less represents what I'm working with. The objects a, b and c are generated over the course of an R script, along with the colours (there might be more, as the groups are generated by a loop).
a <- density(rnorm(100,mean = 5, sd = 1))
b <- density(rnorm(100,mean = 10, sd = 1))
c <- density(rnorm(100,mean = 7, sd = 1))
plot(c,col = "#FFCC00FF")
lines(b, col = "#FF6600FF")
lines(a, col = "#FF0000FF")
legendDataFrame <- data.frame(Group = c("A","B","C"), Colour = c("#FF0000FF","#FF6600FF", "#FFCC00FF"))
legend("topleft",legend=unique(legendDataFrame$Group), pch=1, col=unique(legendDataFrame$Colour))
print(legendDataFrame)
But I get an image with incorrect colours. Any suggestions?
Try this:
legendDataFrame <- data.frame(stringsAsFactors=FALSE, Group = c("A","B","C"), Colour = c("#FF0000FF","#FF6600FF", "#FFCC00FF"))
P.S.
I smashed my head on data.frame(stringsAsFactors=TRUE) at least 1000 times. And I'm in good company:
http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-td921891.html
http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/
http://adv-r.had.co.nz/Data-structures.html
Instead of explicitly listing the colors, you can also try this if you want to maintain the dynamic functions:
legend("topleft",
legend=unique(legendDataFrame$Group),
pch=1,
col=as.vector(unique(legendDataFrame$Colour)))
It adds as.vector to convert the factor (unique(legendDataFrame$Colour)) into a character vector.
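A small sketch of why the factor matters. This assumes the column was created as a factor, which was the data.frame() default before R 4.0. A factor stores integer codes, and base graphics may read those codes as palette indices instead of the hex strings.

```r
# What the Colour column looks like when it is stored as a factor
colour_factor <- factor(c("#FF0000FF", "#FF6600FF", "#FFCC00FF"))

# The underlying integer codes, which are not colours in themselves
print(as.integer(colour_factor))

# Converting back recovers the hex strings that legend() needs
colour_strings <- as.vector(colour_factor)
print(colour_strings)
```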

R - visualising data over time

I'm trying to plot a dataset over time (timeframe of ms/s). I need to show the order of events, the type of event and the duration of each event + the time between events. The dataset consists of a start time, end time and category.
I got close with this code someone used to answer a similar question back in '11 but found that I couldn't get it to colour the events according to the category, and I don't understand what the code is doing well enough to fix the issue.
zucchini <- function(st, en, mingap=1)
{
  i <- order(st, en-st);
  st <- st[i];
  en <- en[i];
  last <- r <- 1
  while( sum( ok <- (st > (en[last] + mingap)) ) > 0 )
  {
    last <- which(ok)[1];
    r <- append(r, last);
  }
  if( length(r) == length(st) )
    return( list(c = list(st[r], en[r]), n = 1 ));
  ne <- zucchini( st[-r], en[-r]);
  return(list( c = c(list(st[r], en[r]), ne$c), n = ne$n+1));
}
coliflore <- function(st, en)
{
  zu <- zucchini(st, en, mingap = 1);
  plot.new();
  plot.window( xlim=c(min(st), max(en)), ylim = c(0, zu$n+1));
  box(); axis(1);
  for(i in seq(1, 2*zu$n, 2))
  {
    x1 <- zu$c[[i]];
    x2 <- zu$c[[i+1]];
    for(j in 1:length(x1))
      rect( x1[j], (i+1)/2, x2[j], (i+1)/2+0.5, col=data$Type, border="black");
    legend('bottomright', legend = levels(data$Type), col = 1:10, cex = 0.8, pch = 1)
  }
}
st <- data$Time
en <- data$End
coliflore(st,en)
The current code produces a plot in which, as best as I can tell, all boxes are assigned the same colour, that of the category of the first data point.
Does anyone know either: how to get this code to assign colours to the boxes based on a category, or how to accomplish this kind of plotting another way?
It's a little hard for me to see what's going on without a toy dataset for your example. For maximum control over coloring in plots I like to add a color column to the data frame, or create a vector to store color values for use in plotting, instead of using the factor levels to generate colors (e.g. data$Type). For instance, if I want factors 1:3 to be red, green, and blue:
# create data frame with X,Y coordinates and 3 factor levels
toy_data<- data.frame (X= 1:9, Y=9:1, Factor = rep(1:3, times=3))
# create a vector of colors to use for plotting
# color function
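colFxn<-function(val){
  # stringsAsFactors = FALSE keeps the colours as character strings (needed in R < 4.0)
  cw_df<-data.frame(value=1:3, color = c("red", "green", "blue"), stringsAsFactors = FALSE)
  return(cw_df[cw_df$value %in% val,]$color)
}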
col_vec<-sapply (toy_data$Factor, colFxn)
#plot
plot(toy_data$X, toy_data$Y, col=col_vec)
I prefer this option because of the control I have over my colors. This can also be expanded to transparent colors by changing the alpha value using the rgb() function, or by using a color palette available through many packages.
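A shorter variant of the same idea, as a sketch: when the factor levels are the integers 1:3, the colour vector can be built by plain indexing. For transparency this uses adjustcolor() from grDevices in place of a direct rgb() call. The names pal, col_vec and col_alpha are illustrative.

```r
toy_data <- data.frame(X = 1:9, Y = 9:1, Factor = rep(1:3, times = 3))

# Lookup by position: level 1 -> red, 2 -> green, 3 -> blue
pal <- c("red", "green", "blue")
col_vec <- pal[toy_data$Factor]

# Same colours at 50% opacity
col_alpha <- adjustcolor(col_vec, alpha.f = 0.5)

plot(toy_data$X, toy_data$Y, col = col_alpha, pch = 19)
```

Functions such as rect() also accept a colour vector, so the same lookup can be used to colour boxes by category, one colour per box.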

R, how to plot multiple plots from a multiple column table?

I have a table made of 10 rows and 6 columns, where each entry is a real value.
After applying the kmeans algorithm, I would like R to plot 6*(6-1) = 30 plots, in which each pair of columns forms the axes in turn.
When I do it with the original data, everything works fine. But if I try to quantile-normalize the data, it does not work anymore and the system just shows the plot for the first pair.
Here are the data (data.csv):
chrName-chrStart-chrEnd,gm12878,h1-hesc,hela-s3,hepg2,huvec,k562
chr1-66660-66810,0,0,2.825,0.75,0,0.85
chr1-564520-564670,15.6356435644,4.5469879518,57.7813793103,130.2263636364,5.8088888889,101.680952381
chr1-568060-568210,17.9069767442,3.6970588235,15.962745098,34.8866666667,4.1,31.0394736842
chr1-568900-569050,41.7029411765,7.4568181818,28.3984615385,59.464957265,8.5194444444,44.6583333333
chr1-601040-601190,0.4,0.75,0.5333333333,0.4,0.3,0.3
chr1-662500-662650,0,3.45,0.25,63,0.9923076923,5.7469879518
chr1-714040-714190,115.0871428571,125.6707142857,80.8081632653,153.9737931034,70.0197080292,166.5101351351
chr1-730400-730550,1.3730769231,0,0,0.9,7.6690140845,0.76
chr1-753400-753550,1.3517241379,4.1,0.4818181818,0,0.3,1.4285714286
chr1-762820-762970,43.6430769231,17.875,21.2659574468,123.1888888889,14.5743589744,56.7931034483
Here's my working code:
dnaseSignalFile = "data.csv"
originalDataTab <- read.csv(dnaseSignalFile, header=TRUE, sep=",")
originalDataTabSubMatrixChromSel_onlyData <- originalDataTab [,2:7]
cl0 <- kmeans(originalDataTabSubMatrixChromSel_onlyData , 2)
plot(originalDataTabSubMatrixChromSel_onlyData , col = cl0$cluster)
points(cl0$centers, col = 1:2, pch = 8, cex = 2)
It then correctly shows this image:
And that's fine!
But if I try to run a quantile-normalization, things do not work anymore:
library("slam"); library("preprocessCore"); library("nnet");
normQuant<- normalize.quantiles(as.matrix(originalDataTabSubMatrixChromSel_onlyData), copy=TRUE)
roundNormQuant <- round(normQuant)
roundNormQuantTab <- as.data.frame(roundNormQuant)
colnames(roundNormQuantTab) <- colnames(originalDataTabSubMatrixChromSel_onlyData)
roundNormQuantTab <- normQuant
colnames(roundNormQuantTab) <- colnames(originalDataTabSubMatrixChromSel_onlyData)
rownames(roundNormQuantTab) <- rownames(originalDataTabSubMatrixChromSel_onlyData)
dev.new()
cl <- kmeans(roundNormQuantTab, 2)
plot(roundNormQuantTab, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)
'Cause the only thing that I see is the following picture:
Why can't I get the six plots in the second case, too?
What's different between the former case and the latter one?
How could I solve this problem?
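One possible explanation, offered as a sketch and not as a confirmed diagnosis: in the second snippet roundNormQuantTab is overwritten with normQuant, which is a matrix. plot() on a data frame with more than two columns draws a scatterplot matrix, whereas plot() on a matrix only plots the first two columns. The toy data below is made up, so that the sketch does not need preprocessCore.

```r
set.seed(1)
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
m <- as.matrix(df)

plot(df)                # data frame: scatterplot matrix of all column pairs
plot(m)                 # matrix: only column 1 against column 2

# Two ways to get all pairs from a matrix
m_df <- as.data.frame(m)
plot(m_df)
pairs(m)
```

Under that assumption, converting the normalized matrix back with as.data.frame(), or calling pairs() on it, should bring back the full set of panels.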
