Manipulating contrast etc. within a vector of colours in R

I'm seeking any efficient way to perform simple manipulations on colour vectors in R, such as brightness and contrast. I have a hacky method that converts hex-string to numerical values and adjusts these by, for example, increasing/decreasing values for lightness, or rescaling them for lightness contrast, before converting back to hex. It works but is too slow to run interactively, and I can't see any libraries (e.g. colorspace) that have this functionality. Does anyone know of an alternative method? Thanks in advance.
This illustrates the flow for darkening (the simplest manipulation):
cols = rainbow(100)
d = data.frame(x = 1:100, y0 = rep(0, 100), y1 = rep(100, 100))
plot_cols = function(colours){
  plot.new(); plot.window(xlim = c(0, 100), ylim = c(0, 100))
  segments(d$x, d$y0, d$x, d$y1, col = colours, lwd = 5)
}
plot_cols(cols)
cols_num = t(col2rgb(cols))/255
plot_cols( rgb(cols_num * .5) )
Contrast effects require the standard deviation (sd()), which I think is the bottleneck.
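A minimal vectorized sketch of the idea, assuming brightness is a multiplicative scale on the channels and contrast is a rescaling around the mean channel value (adjust_cols is a hypothetical helper, not from any package):
# Hypothetical helper: vectorized brightness/contrast for a colour vector.
# brightness scales all channels; contrast stretches them around the
# overall mean; results are clamped to [0, 1] before converting to hex.
adjust_cols = function(cols, brightness = 1, contrast = 1) {
  m = t(col2rgb(cols)) / 255          # n x 3 channel matrix in [0, 1]
  m = m * brightness
  mu = mean(m)
  m = (m - mu) * contrast + mu
  rgb(pmin(pmax(m, 0), 1))            # rgb() accepts the 3-column matrix
}
plot_cols(adjust_cols(cols, brightness = 0.8, contrast = 1.5))
Since col2rgb(), the arithmetic, and rgb() are all vectorized, this should be fast enough for interactive use. (If a packaged option is acceptable, recent versions of colorspace also provide lighten() and darken().)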

Related

uniroot gives multiple answers to equation with 1 unknown

I want to create a column in a data frame in which each row is the solution to an equation with one unknown (x). The other variables in the equation are provided in the other columns. In another Stack Overflow question, @flodel provided a solution, which I have tried to adapt. However, the output data frame omits some observations entirely, while others are duplicated with two different solutions to the same equation.
Sample of my data frame:
Time    id     V1         V2          V3            V4
199304  79330    259.721    224.5090   0.040140442  0.08100474
201004  77520   5062.200   3245.6921   0.037812662  0.08509553
196804  23018    202.897    842.6852   0.154956206  0.12982818
197804  12319    181.430    341.4415   0.052389156  0.14196588
199404  18542  14807.000  16537.0873  -0.001394388  0.08758791
Here is the code with the equation I want to solve. I have simplified the equation, but the issue occurs with this simple version too.
library(plyr)
library(rootSolve)
set.seed(1)
df <- adply(df, 1, summarize,
            x = uniroot.all(function(x) V1 * ((V4 - V3)/(x - V3)) - V2,
                            interval = c(-10, 10)))
How can I achieve this? If possible, it would be great to do it efficiently, as my actual data frame has >1,000,000 rows.
The previous answer by @StefanoBarbi was pointing in the right direction.
Here are the plots of the functions implied by each row of your example data frame, with the solution superimposed as a red vertical line (so that we can see that yes, you're right that there is a root in the interval ...) [code below]
The problem is that the algorithm underlying uniroot() is only guaranteed to find the root of a function that is continuous on the interval. Your functions have discontinuities/singularities. (Even for a continuous function I'm sure that the algorithm could be broken with a function that was sufficiently weird to cause problems with floating-point math ...)
Even a bisection algorithm, which is more robust than Brent's method (the algorithm underlying uniroot) since it makes fewer assumptions about continuity of the derivative, could easily fail on this kind of discontinuous function. (It could be made to work for a function that is discontinuous but monotonic, but your example is neither continuous nor monotonic ...)
Obviously your real problem is more complex than this (or you would just be using the easy analytical solution you referred to); what this means is that you need to find some way to "tame" your function. In this example, if you rearrange the function to avoid dividing by x-V3 (but without completely solving the equation) then uniroot() should work ...
f1 <- function(L) with(L, (V1/V2)*(V4 - V3) + V3)
f1(df[1,])

png("badfit.png")
par(mfrow = c(2,3), bty = "l", las = 1)
for (i in 1:nrow(df)) {
  with(df[i,],
       curve(V1 * ((V4 - V3)/(x - V3)) - V2,
             from = -10, to = 10,
             ylab = "", xlab = ""))
  abline(v = f1(df[i,]), col = 2)
  abline(h = 0, col = 4)
}
dev.off()
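For completeness, a minimal sketch of the rearrangement idea; this is my own illustration, multiplying both sides by (x - V3) so that the function passed to uniroot.all() is linear, hence continuous, on the interval:
library(rootSolve)
# Assumes `df` holds the V1..V4 columns shown above. Multiplying the
# original equation through by (x - V3) removes the singularity at x = V3:
roots <- apply(df[, c("V1", "V2", "V3", "V4")], 1, function(r) {
  uniroot.all(function(x) r[["V1"]] * (r[["V4"]] - r[["V3"]]) - r[["V2"]] * (x - r[["V3"]]),
              interval = c(-10, 10))
})
Each rearranged function is a straight line, so uniroot.all() finds exactly one root per row (when it lies inside the interval), matching the analytical solution f1() above.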

How to smooth a curve in R?

location_difference <- c(0, 0.5, 1, 1.5, 2)
Power <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
# Note: as posted, these vectors have different lengths (5 vs 6),
# so plot() will fail until one of them is corrected.
plot(location_difference, Power)
The author of the paper said he smoothed the curve using a weighted moving average with weight vector w = (0.25, 0.5, 0.25), but he did not explain how he did this or which function he used. I am really confused.
Up front, as @MartinWettstein cautions, be careful about when you smooth data and what you do with it (what you infer from it). Having said that, a simple weighted moving average might look like this.
# replacement data
x <- seq(0, 2, len=5)
y <- c(0, 0.02, 0.65, 1, 1)
# smoothed
ysm <- zoo::rollapply(c(NA, y, NA), 3,
                      function(a) Hmisc::wtd.mean(a, c(0.25, 0.5, 0.25), na.rm = TRUE),
                      partial = FALSE)
# plot
plot(x, y, type = "b", pch = 16)
lines(x, ysm, col = "red")
Notes:
- The zoo package provides a rolling window (3-wide here), calling the function once for indices 1-3, then again for indices 2-4, then 3-5, 4-6, etc.
- Rolling-window operations can be center-aligned (the default for zoo::rollapply) or left/right-aligned. There are some good explanations here: How to calculate 7-day moving average in R?
- I surround the y data with NAs so that I can mimic a partial window. Normally with rolling-window ops, if k=3, the resulting vector is length(y) - (k-1) long. I'm inferring that you want to include data on the ends, so the first smoothed point is effectively (0.5*0 + 0.25*0.02)/0.75, the second (0.25*0 + 0.5*0.02 + 0.25*0.65)/1, and the last (0.25*1 + 0.5*1)/0.75; that is, omitting the 0.25 weight on a missing data point. That's a guess and can easily be adjusted to your real needs.
- I'm using Hmisc::wtd.mean, though it is trivial to write this weighted-mean function yourself (a minimal sketch follows this list).
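For instance, a hand-rolled version might look like this; wmean is a hypothetical name, and the NA handling is meant to mimic wtd.mean's na.rm = TRUE by dropping missing values and renormalizing by the remaining weights:
# Hypothetical drop-in for Hmisc::wtd.mean(a, w, na.rm = TRUE)
wmean <- function(a, w) {
  keep <- !is.na(a)
  sum(a[keep] * w[keep]) / sum(w[keep])
}
ysm2 <- zoo::rollapply(c(NA, y, NA), 3, wmean, w = c(0.25, 0.5, 0.25))
all.equal(ysm, ysm2)  # expected TRUE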
This is suggestive only, and not meant to be authoritative. Just to help you begin exploring your smoothing processes.

How to programmatically overlap arbitrary stat_functions in ggplot?

I am looking for a way to automatically plot an arbitrary number of stat_function objects in a single ggplot, each one with a different set of parameters, and coloring them.
Initially I thought of having one big data.table with a large number of samples from each distribution, each set associated with an index, and using geom_density, grouping and coloring by the index.
This is, however, very inefficient. There is, in my opinion, no need to spend time and memory to produce and keep large sets of values if we already have parameters that perfectly describe each distribution.
I present my initial solution below, but is there a more elegant and/or practical way of doing this?
library(data.table)
library(ggplot2)

distrData.dt <- data.table(Shape = c(2.1, 2.2, 2.3), Scale = c(1.1, 1.2, 1.3), time = c(1, 2, 3))
ggplot(data.table(x = c(0:15)), aes(x)) +
  apply(distrData.dt, 1, FUN = function(x)
    stat_function(fun = dgamma, args = list(shape = as.numeric(x[1]), scale = as.numeric(x[2])),
                  mapping = aes_string(color = x[3]))) +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")
This is the current result:
It produces the main result, that is, it will plot as many "perfect" densities as the number of parameter sets you give it. However, I am not using aesthetics to pass parameters from the column names ("Shape" and "Scale") or to get the color of each line. As far as I understand, that is not possible, but is there another way?
First of all, your solution is absolutely fine by me: it does the job, and it does it elegantly. I just wanted to both expand on @joran's comment and show a useful trick called a "function factory", which is perfectly suited to a case like yours.
So I'm building a function that returns a function with fixed parameters. Note that force() prevents shape and scale from being lazily evaluated, which is necessary since we'll be using a for loop.
I'm using data.frame instead of data.table, but there shouldn't be a significant difference. The vector("list", n) construction preallocates space for a list, as described in ?vector. I don't think it's obligatory in this particular case (significant overhead appears for lengths of, say, >100, unlikely here), but it's always better to avoid iteratively growing objects; that's bad practice.
As a last remark, check the stat_function call: it seems reasonably readable; at least you can see what the mapping is and what relates to the dgamma parameters.
dgamma_factory <- function(shape, scale) {
  force(shape)
  force(scale)
  function(x) dgamma(x, shape = shape, scale = scale)
}

l <- vector("list", nrow(distrData.dt))
for (i in seq.int(nrow(distrData.dt))) {
  params <- distrData.dt[i, ]
  l[[i]] <- stat_function(
    fun = dgamma_factory(params$Shape, params$Scale),
    mapping = aes_string(color = params$time))
}
ggplot(data.frame(x = c(0:15)), aes(x)) +
  l +
  scale_colour_gradient("Time Step", low = "blue", high = "red", space = "Lab")
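As an aside, the same layer list can be built with lapply() instead of the explicit loop; this is just a stylistic sketch of the identical logic (each callback gets its own evaluation frame, and the factory's force() keeps the parameters fixed either way):
# Equivalent construction of the layer list with lapply():
l <- lapply(seq_len(nrow(distrData.dt)), function(i) {
  params <- distrData.dt[i, ]
  stat_function(fun = dgamma_factory(params$Shape, params$Scale),
                mapping = aes_string(color = params$time))
})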

Graphing a polynomial output of calc.poly

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for the polynom package how to solve it. For one year out of several, I have two vectors of x (d, day of year) and y (e, an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I then want to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x: an object of class "polynomial".
len: size of vector at which evaluations are to be made.
xlim, ylim: the range of x and y values, with sensible defaults.
come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)
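If you do want the curve superimposed on the raw data, one option is to evaluate the polynomial on a fine grid with predict() (which polynom supplies for "polynomial" objects) and draw it with lines(). Be warned that a high-degree interpolating polynomial oscillates wildly between the data points, so most of the curve will leave the plotting region unless you clip the y-axis:
xs <- seq(min(d), max(d), length.out = 400)
plot(d, e)
lines(xs, predict(egg1996, xs), lty = 2)  # clip with ylim if needed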

Trying to determine why my heatmap made using heatmap.2 and using breaks in R is not symmetrical

I am trying to cluster a protein-DNA interaction dataset and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the data set I am using after it is run through Pearson correlation: DataSet
Here is the complete process that I am following to generate these graphs: generate a distance matrix using some correlation (Pearson in my case), then pass that matrix to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is that once I use breaks to help control the color separation, the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoffs for the sequences to make sure, for instance, that one sequence does not end exactly where the other begins, but I have not been able to solve this problem. I would like to use the breaks to help bring out the clusters more.
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigating, I noticed that after running my matrix through heatmap or heatmap.2, the values were changing. For example, the interaction between Pacdh-2 and pegg-2 in the provided data set had a value of 0.0250313 before the matrix was sent to heatmap. When I then looked at the matrix values using result$carpet, the two cells for that interaction were -0.224333135 and -1.09805379.
So I decided to reorder the original matrix based on the dendrogram from the clustered matrix, so that I could be sure the values would be the same. I used the following Stack Overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
I then used another program, matrix2png, to draw the heatmap. I still have to play around with the colors, but at least now the heatmap is symmetrical and clustered.
Looking into it even more, the issue seems to be that I was running scale(matrix_a); when I change my code to just mtscaled <- as.matrix(matrix_a), the result now looks symmetrical.
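That explanation fits: scale() centres and rescales each column independently, so a symmetric matrix generally stops being symmetric. A minimal illustration with a toy matrix of my own (not from the data set):
m <- matrix(c(1, 0.5, 0.2,
              0.5, 2, 0.8,
              0.2, 0.8, 3), nrow = 3)
s <- scale(m)              # standardizes each column separately
all.equal(c(m), c(t(m)))   # TRUE: the original is symmetric
all.equal(c(s), c(t(s)))   # not TRUE: column-wise scaling breaks symmetry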
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another thought, which should return the full range of red that the above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) asking for a blue-to-red palette for a set of values from -1 to 3 with white at the center. This is the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)
