R: generate legend from dataframe variables


I am trying to generate a legend in R with reference to the following post.
I have the following MWE, which more or less represents what I'm working with. The objects a, b and c are generated over the course of an R script, along with their colours. (There might be more, as the groups are generated by a loop.)
a <- density(rnorm(100,mean = 5, sd = 1))
b <- density(rnorm(100,mean = 10, sd = 1))
c <- density(rnorm(100,mean = 7, sd = 1))
plot(c,col = "#FFCC00FF")
lines(b, col = "#FF6600FF")
lines(a, col = "#FF0000FF")
legendDataFrame <- data.frame(Group = c("A","B","C"), Colour = c("#FF0000FF","#FF6600FF", "#FFCC00FF"))
legend("topleft",legend=unique(legendDataFrame$Group), pch=1, col=unique(legendDataFrame$Colour))
print(legendDataFrame)
But I get an image like this, with incorrect colours. Any suggestions?

Try this:
legendDataFrame <- data.frame(stringsAsFactors=FALSE, Group = c("A","B","C"), Colour = c("#FF0000FF","#FF6600FF", "#FFCC00FF"))
P.S.
I have banged my head against data.frame(stringsAsFactors=TRUE) at least 1000 times, and I'm in good company:
http://r.789695.n4.nabble.com/stringsAsFactors-FALSE-td921891.html
http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/
http://adv-r.had.co.nz/Data-structures.html

Instead of explicitly listing the colors, you can also try this if you want to maintain the dynamic functions:
legend("topleft",
legend=unique(legendDataFrame$Group),
pch=1,
col=as.vector(unique(legendDataFrame$Colour)))
as.vector() converts the factor returned by unique(legendDataFrame$Colour) into a character vector, so legend() receives the hex strings rather than the factor's integer codes.
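A small sketch of why the colours come out wrong, assuming the Colour column was stored as a factor (the default before R 4.0.0; stringsAsFactors = TRUE is set explicitly here so the effect is visible on any version). The name legend_df is only for illustration.

```r
# Assumption: Colour is a factor, as it was by default before R 4.0.0.
legend_df <- data.frame(
  Group  = c("A", "B", "C"),
  Colour = c("#FF0000FF", "#FF6600FF", "#FFCC00FF"),
  stringsAsFactors = TRUE
)

is.factor(legend_df$Colour)

# Base graphics treat a factor passed as 'col' as its integer codes,
# i.e. as indices into palette(), not as the hex strings it displays.
codes <- as.integer(unique(legend_df$Colour))

# Converting to character recovers the hex strings that were intended.
hex <- as.character(unique(legend_df$Colour))

print(codes)
print(hex)
```

Passing hex (or as.vector(...), as above) to col= gives the expected legend colours.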

Related

Plotting spectral data in one plot using R

I have multiple data frames in which the first column (padded with NAs at the end) is the wavenumber and the other columns hold my variables at that wavenumber for multiple observations.
Is it possible to plot the columns so that the first column supplies the x-axis values and the others are drawn into one big plot with their respective y-values?
I already tried "matplot" (which draws digits instead of points),
matplot(df[,1],df[,3:5],xlab = "Wavelength [nm]", ylab = "Absorbance")
and different variants of "xyplot" (no way to give more than one y variable), but none seems to work (at my level of R knowledge).
The final result should look like this:
Thanks for any help!

You could always write your own function to do this; I make such functions on a regular basis when nothing really fits my needs.
I put this together rather quickly but you can adapt it to your needs.
# generate data
set.seed(6)
n <- 50
dat <- data.frame(x1=seq(1,100, length.out = n),
x2=seq(1,20, length.out = n)+rnorm(n),
x3=seq(1,20, length.out = n)+rnorm(n, mean = 3),
x4=seq(1,20, length.out = n)+rnorm(n, mean = 5))
# make some NAs at the end
dat[45:n,2] <- NA
dat[30:n,3] <- NA
plot_multi <- function(df, x=1, y=2, cols=y,
xlim=range(df[,x], na.rm = TRUE),
ylim=range(df[,y], na.rm = TRUE),
main="", xlab="", ylab="", ...){
# setup plot frame
plot(NULL,
xlim=xlim,
ylim=ylim,
main=main, xlab=xlab, ylab=ylab)
# plot all your y's against your x
pb <- sapply(seq_along(y), function(i){
points(df[,c(x, y[i])], col=cols[i], ...)
})
}
plot_multi(dat, y=2:4, type='l', lwd=3, main = ":)",
xlab = "Wavelength", ylab = "Absorbance")
Results in :
EDIT
I actually found your dataset online by chance, so I'll include how to plot it as well using my code above.
file <- 'http://openmv.net/file/tablet-spectra.csv'
spectra <- read.csv(file, header = FALSE)
# remove box label
spectra <- spectra[,-1]
# add the 'wavelength' and rotate the df
# (i didn't find the actual wavelength values, but hey).
spectra <- cbind(1:ncol(spectra), t(spectra))
plot_multi(spectra, y=2:ncol(spectra), cols = rainbow(ncol(spectra)),
type='l', main=":))", ylab="Absorbance", xlab = "'Wavelength'")
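For comparison, here is a sketch of the same kind of plot with base R's matplot(), which the question had already tried. By default matplot() draws each column with a digit as its plotting symbol; asking for type = "l" draws lines instead. This is an alternative to plot_multi, not part of it, and the colour choices (line_cols) are arbitrary.

```r
# Same simulated data as in the answer above
set.seed(6)
n <- 50
dat <- data.frame(x1 = seq(1, 100, length.out = n),
                  x2 = seq(1, 20, length.out = n) + rnorm(n),
                  x3 = seq(1, 20, length.out = n) + rnorm(n, mean = 3),
                  x4 = seq(1, 20, length.out = n) + rnorm(n, mean = 5))
dat[45:n, 2] <- NA
dat[30:n, 3] <- NA

# Arbitrary colours, one per y column
line_cols <- c("black", "red", "blue")

# type = "l" replaces the default digit symbols with lines;
# trailing NAs simply end the corresponding line early.
matplot(dat[, 1], dat[, 2:4], type = "l", lty = 1, lwd = 2, col = line_cols,
        xlab = "Wavelength", ylab = "Absorbance")
legend("topleft", legend = names(dat)[2:4], col = line_cols, lty = 1, lwd = 2)
```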

You could use the pavo R package, which is made to deal with spectral data (full disclosure, I'm one of the maintainers):
library(pavo)
df <- t(read.csv("http://openmv.net/file/tablet-spectra.csv", header = FALSE))
df <- df[-1, ]
df <- apply(df, 2, as.numeric)
df <- cbind(wl = seq_len(nrow(df)),
df)
df <- as.rspec(df)
#> wavelengths found in column 1
plot(df, ylab = "Absorbance", col = rainbow(3))
Created on 2019-07-26 by the reprex package (v0.3.0)

T-Test For Genes using Apply Function in Dataframe

I’m trying to run a t.test on two data frames.
The data frames (which I carved out of a larger data.frame) hold the data I need in rows 1:143. I’ve already created sub-variables, as I needed to calculate rowMeans.
> c.mRNA<-rowMeans(c007[1:143,(4:9)])
> h.mRNA<-rowMeans(c007[1:143,(10:15)])
I’m simply trying to run a t.test for each row, and then plot the p-values as histograms. This is what I thought would work…
Pvals<-apply(mRNA143.data,1,function(x) {t.test(x[c.mRNA],x[h.mRNA])$p.value})
But I keep getting an error:
Error in t.test.default(x[c.mRNA], x[h.mRNA]) :
not enough 'x' observations
I’ve got something off in my syntax and cannot figure it out for the life of me!
EDIT: I've created a data.frame so it's now just two columns, I need a p-value for each row. Below is a sample of my data...
c.mRNA h.mRNA
1 8.224342 8.520142
2 9.096665 11.762597
3 10.698863 10.815275
4 10.666233 10.972130
5 12.043525 12.140297
I tried this...
pvals=apply(mRNA143.data,1,function(x) {t.test(mRNA143.data[,1],mRNA143.data[, 2])$p.value})
But I can tell from my plot that I'm off (the plots are in a straight line).

A reproducible example would go a long way. In preparing it, you might have realized that you are trying to subset columns based on mean, which doesn't make sense, really.
What you want to do is go through rows of your data, subset columns belonging to a certain group, repeat for the second group and pass that to t.test function.
This is how I would do it.
group1 <- matrix(rnorm(50, mean = 0, sd = 2), ncol = 5)
group2 <- matrix(rnorm(50, mean = 5, sd = 2), ncol = 5)
xy <- cbind(group1, group2)
# this is just a visualization of the test you're performing
plot(0, 0, xlim = c(-5, 11), ylim = c(0, 0.25), type = "n")
curve(dnorm(x, mean = 5, sd = 2), add = TRUE)
curve(dnorm(x, mean = 0, sd = 2), add = TRUE)
out <- apply(xy, MARGIN = 1, FUN = function(x) {
# x is a vector, e.g. xy[i, ] or xy[1, ]
t.test(x = x[1:5], y = x[6:10])$p.value
})
out
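Applied to the layout described in the question (rows 1:143, one group in columns 4:9 and the other in columns 10:15), the same idea could look like the sketch below. The data frame c007_demo is simulated and purely hypothetical; only the column positions are taken from the question. Note that the row means cannot be used for this: a t-test needs the individual replicates in each row, and a test on one mean against one mean has nothing to estimate a variance from.

```r
set.seed(1)

# Hypothetical stand-in for c007: 3 annotation columns followed by
# 6 columns for one group and 6 columns for the other.
n_genes <- 143
expr <- matrix(rnorm(n_genes * 12, mean = 10), nrow = n_genes)
c007_demo <- data.frame(id = seq_len(n_genes), chr = "chr1", pos = 0, expr)

c_cols <- 4:9    # first group, as in the question
h_cols <- 10:15  # second group, as in the question

# Keep only the numeric columns, then test row by row.
vals <- as.matrix(c007_demo[1:143, c(c_cols, h_cols)])
pvals <- apply(vals, 1, function(x) {
  # x holds the 12 replicate values of one gene
  t.test(x[1:6], x[7:12])$p.value
})

hist(pvals, breaks = 20, main = "Per-gene t-test p-values", xlab = "p-value")
```

The attempt in the EDIT ignores x inside the function, so every row gets the p-value of one and the same test on the two full columns, which is why the plot is a straight line.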

How to avoid gaps due to missing values in matplot in R?

I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the dimension of the matrix is bigger, and the number of rows and columns and the positions of the non-overlapping missing values may change between calls, so I'd like to find a solution that handles this flexibly. I also need to use matlines.
I was thinking of filling in the gaps with interpolated data, but maybe there is a better solution.

I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
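As a worked illustration of the shifting idea on a single column (a sketch in plain base R; the variable names are only for this example):

```r
y <- c(1, 2, NA, NA, 5)   # one column of the y matrix
x <- 1:5                  # the shared x values

keep <- !is.na(y)

# Move the non-missing values to the front and pad the end with NA.
y_shifted <- c(y[keep], rep(NA, sum(!keep)))

# Shift x with the same mask so each y value keeps its own x position.
x_shifted <- c(x[keep], rep(NA, sum(!keep)))

print(y_shifted)
print(x_shifted)
```

Because the point (5, 5) now directly follows (2, 2), a line drawn through the shifted vectors spans the gap instead of breaking at the NAs.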
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
# build the result in one step so that n == 0 (an all-NA column) also works
x <- c(x[bool], rep(NA, length(x) - n))
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")

You could use the na.interpolation function from the imputeTS package (newer releases of the package call it na_interpolation):
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
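If adding a package is not an option, a similar linear fill can be sketched with base R's approx(). This is a swapped-in alternative, not what imputeTS does internally, and fill_linear is a hypothetical helper name.

```r
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA

# Hypothetical helper: linearly interpolate the NAs of one column against x.
# NAs before the first or after the last observed value stay NA
# (approx() does not extrapolate with its default rule = 1).
fill_linear <- function(x, y) {
  ok <- !is.na(y)
  approx(x[ok], y[ok], xout = x)$y
}

filled <- test
filled[-1] <- lapply(test[-1], fill_linear, x = test$x)

matplot(filled$x, filled[, -1], type = "n")
matlines(filled$x, filled[, -1])
```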

I had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did; I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
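A tiny example of what that indexing does for one row (values chosen only for illustration):

```r
row_vals <- c(3, NA, NA, 7, 8)   # one row of the data, with a gap
nin.na <- !is.na(row_vals)

x_pos <- which(nin.na)     # positions of the observed values
y_val <- row_vals[nin.na]  # the observed values themselves

print(x_pos)
print(y_val)
```

lines(x = x_pos, y = y_val) then joins position 1 directly to position 4, so the gap is bridged while every point stays at its true x index.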
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
draw the graphical part of the window.
range(data, na.rm=TRUE) adapts the plot to a proper size being able to include all data points.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The function head of mymatplot provides the ... argument, through which typical plot parameters such as lty, lwd, cex etc. can optionally be passed. Those will be passed on to lines().
A last word on the choice of colors: they are up to your taste.

R, how to plot multiple plots from a multiple column table?

I have a table made of 10 rows and 6 columns, where each entry is a real value.
After applying the kmeans algorithm, I would like R to draw 6*(6-1) = 30 plots, in which each pair of columns serves as the axes in turn.
When I do it with the original data, everything works fine. But if I try to quantile-normalize the data, it no longer works and only the plot for the first pair is shown.
Here are the data (data.csv):
chrName-chrStart-chrEnd,gm12878,h1-hesc,hela-s3,hepg2,huvec,k562
chr1-66660-66810,0,0,2.825,0.75,0,0.85
chr1-564520-564670,15.6356435644,4.5469879518,57.7813793103,130.2263636364,5.8088888889,101.680952381
chr1-568060-568210,17.9069767442,3.6970588235,15.962745098,34.8866666667,4.1,31.0394736842
chr1-568900-569050,41.7029411765,7.4568181818,28.3984615385,59.464957265,8.5194444444,44.6583333333
chr1-601040-601190,0.4,0.75,0.5333333333,0.4,0.3,0.3
chr1-662500-662650,0,3.45,0.25,63,0.9923076923,5.7469879518
chr1-714040-714190,115.0871428571,125.6707142857,80.8081632653,153.9737931034,70.0197080292,166.5101351351
chr1-730400-730550,1.3730769231,0,0,0.9,7.6690140845,0.76
chr1-753400-753550,1.3517241379,4.1,0.4818181818,0,0.3,1.4285714286
chr1-762820-762970,43.6430769231,17.875,21.2659574468,123.1888888889,14.5743589744,56.7931034483
Here's my working code:
dnaseSignalFile = "data.csv"
originalDataTab <- read.csv(dnaseSignalFile, header=TRUE, sep=",")
originalDataTabSubMatrixChromSel_onlyData <- originalDataTab [,2:7]
cl0 <- kmeans(originalDataTabSubMatrixChromSel_onlyData , 2)
plot(originalDataTabSubMatrixChromSel_onlyData , col = cl0$cluster)
points(cl0$centers, col = 1:2, pch = 8, cex = 2)
It then correctly shows this image:
And that's fine!
But if I run a quantile normalization, things no longer work:
library("slam"); library("preprocessCore"); library("nnet");
normQuant<- normalize.quantiles(as.matrix(originalDataTabSubMatrixChromSel_onlyData), copy=TRUE)
roundNormQuant <- round(normQuant)
roundNormQuantTab <- as.data.frame(roundNormQuant)
colnames(roundNormQuantTab) <- colnames(originalDataTabSubMatrixChromSel_onlyData)
roundNormQuantTab <- normQuant
colnames(roundNormQuantTab) <- colnames(originalDataTabSubMatrixChromSel_onlyData)
rownames(roundNormQuantTab) <- rownames(originalDataTabSubMatrixChromSel_onlyData)
dev.new()
cl <- kmeans(roundNormQuantTab, 2)
plot(roundNormQuantTab, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)
The only thing I see is the following picture:
Why can't I get the six plots in the second case, too?
What's different between the former case and the latter one?
How could I solve this problem?
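One likely cause, judging only from the code shown: the line roundNormQuantTab <- normQuant overwrites the data frame with the plain matrix returned by normalize.quantiles. plot() on a data frame with more than two numeric columns draws a scatterplot matrix, whereas plot() on a matrix uses only its first two columns. The sketch below shows the difference with a small made-up matrix m (not the data from the question).

```r
# Made-up 4 x 3 numeric matrix with column names
m <- matrix(c(1, 2, 3, 4,
              2, 4, 1, 3,
              9, 7, 8, 6),
            ncol = 3, dimnames = list(NULL, c("a", "b", "c")))

# plot(m) would draw a single scatterplot of column "a" against column "b".
m_is_df <- is.data.frame(m)

# Converting back to a data frame restores the scatterplot-matrix behaviour.
df <- as.data.frame(m)
df_is_df <- is.data.frame(df)

plot(df)  # one panel per ordered pair of columns
```

Under that assumption, dropping the overwriting line, or wrapping the matrix in as.data.frame() before calling plot(), should bring back all the panels.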

R - If else statement within for loop

I have a data frame with 3 columns of data that I would like to plot separately - 3 plots. The data has NA in it (in different places within the 3 columns). I basically want to interpolate the missing values and plot that segment of the line (multiple sections) in red and the remainder of the line black.
I have managed to use 'zoo' to create the interpolated data but am unsure how to then plot those data points in a different colour. I have found the following question: "Elegant way to select the color for a particular segment of a line plot?"
I was thinking I could use a for loop with an if else statement to create the colour column as advised in the link. I would need 3 separate colour columns, as I have 3 datasets.
I'd appreciate any help. I cannot really provide an example as I'm unsure where to start! Thanks

This is my solution. It assumes that the NAs are still present in the original data. These will be omitted in the first plot() command. The function then loops over just the NAs.
You will probably get finer control if you take the plot() command out of the function. As written, "..." gets passed to plot() and a type = "b" graph is mimicked - but it's trivial to change it to whatever you want.
# Function to plot interpolated values in specified colours.
PlotIntrps <- function(exxes, wyes, int_wyes, int_pt = "red", int_ln = "grey",
goodcol = "darkgreen", ptch = 20, ...) {
plot(exxes, wyes, type = "b", col = goodcol, pch = ptch, ...)
nas <- which(is.na(wyes))
enn <- length(wyes)
for (idx in seq_along(nas)) {
points(exxes[nas[idx]], int_wyes[idx], col = int_pt, pch = ptch)
lines(
x = c(exxes[max(nas[idx] - 1, 1)], exxes[nas[idx]],
exxes[min(nas[idx] + 1, enn)]),
y = c(wyes[max(nas[idx] - 1, 1)], int_wyes[idx],
wyes[min(nas[idx] + 1, enn)]),
col = int_ln, type = "c")
# Only needed if you have 2 (or more) contiguous NAs (interpolations)
wyes[nas[idx]] <- int_wyes[idx]
}
}
# Dummy data (jitter() for some noise)
x_data <- 1:12
y_data <- jitter(c(12, 11, NA, 9:7, NA, NA, 4:1), factor = 3)
interpolations <- c(10, 6, 5)
PlotIntrps(exxes = x_data, wyes = y_data, int_wyes = interpolations,
main = "Interpolations in pretty colours!",
ylab = "Didn't manage to get all of these")
Cheers.
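The interpolated values and a colour column can also be derived rather than typed in by hand. The sketch below uses base R's approx() for the linear interpolation (the question used zoo; this is a stand-in) and a vectorised ifelse() in place of the for loop with if/else. The data are the dummy values from above without the jitter.

```r
x_data <- 1:12
y_data <- c(12, 11, NA, 9, 8, 7, NA, NA, 4, 3, 2, 1)

missing_pts <- is.na(y_data)

# Linear interpolation at the x positions where y is missing
interpolations <- approx(x_data[!missing_pts], y_data[!missing_pts],
                         xout = x_data[missing_pts])$y

# One colour per point: red where the value was interpolated, black otherwise
pt_col <- ifelse(missing_pts, "red", "black")

print(interpolations)
print(pt_col)
```

With three data columns, the same two steps can be repeated per column to get the three colour vectors the question asks for.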
