R - Subtracting two smoothScatter plots - r

I have two smoothScatter plots and hope to subtract them. See below:
par(mfrow=c(1,2))
set.seed(3)
x1 = rnorm(1000)
y1 = rnorm(1000)
smoothScatter(x1,y1,nrpoints=length(x1),cex=3)
x2 = rnorm(200)
y2 = rnorm(200)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp=colorRampPalette(c("white","red")))
My hope is that I can produce a 3rd plot which is a colorful subtraction of the 1st plot from the 2nd plot. That is, there will be areas which are blue, red, and then if possible I'd like to make the overlapped areas gray. But I'd like the colors to be consistent with the new densities. For instance, the center of the new plot would be almost fully gray, whereas the outsides may have some gray but also patches of blue and red. Note that the two plots have different numbers of points. How could I do such a thing?
The only way I can think of doing this is to go pixel by pixel and subtract the colors from one plot to another. The problem is, I don't know how to grab the color intensities at each pixel to do this. However, even if I were to achieve this, white minus white would probably give black, which I wouldn't want.
Thanks in advance!

You might consider using slightly transparent colors
#helper function to make transparent ramps
alpharamp <- function(c1, c2, alpha=128) {
stopifnot(alpha >= 0, alpha <= 255) # alpha must fit in two hex digits
function(n) paste0(colorRampPalette(c(c1, c2))(n), format(as.hexmode(alpha), width=2, upper.case=TRUE))
}
And then we can overplot the two graphs with
smoothScatter(x1,y1,nrpoints=length(x1),cex=3, colramp=alpharamp("white",blues9))
par(new=T)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp= alpharamp("white","red"), axes=F, ann=F)
Here's what this code produces.
If you still want to get at the actual color values in the plot, that's a bit tricky. You'd have to call grDevices:::.smoothScatterCalcDensity directly with your data, then transform the returned fhat values by taking the 4th root and rescaling to 0-1. Those values (let's call them z) are converted to indexes using the formula floor((256 - 1e-05) * z + 1e-07)+1, and the indexes are then used to look up a value among the 256 colors generated from the ramp you supply. It's all a bit crazy, but you can read the source of smoothScatter and image.default to see how it really happens.
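If the goal is a difference map rather than the exact smoothScatter colors, one option is to skip the pixel colors and subtract the two estimated densities on a shared grid. The sketch below is only an illustration under assumptions: it swaps in MASS::kde2d for smoothScatter's internal estimator (so the bandwidths differ and no 4th-root scaling is applied), and the grid size and blue-white-red palette are arbitrary choices.

```r
library(MASS)  # kde2d(); MASS is a recommended package shipped with R

set.seed(3)
x1 <- rnorm(1000); y1 <- rnorm(1000)
x2 <- rnorm(200);  y2 <- rnorm(200)

# Shared limits so both densities are evaluated on the same grid
lims <- c(range(c(x1, x2)), range(c(y1, y2)))
d1 <- kde2d(x1, y1, n = 100, lims = lims)
d2 <- kde2d(x2, y2, n = 100, lims = lims)

# Second density minus the first; positive cells are where set 2 is denser
dz <- d2$z - d1$z

# Symmetric limits keep a difference of zero at the middle color (white)
zmax <- max(abs(dz))
image(d1$x, d1$y, dz, zlim = c(-zmax, zmax),
      col = colorRampPalette(c("blue", "white", "red"))(255),
      xlab = "x", ylab = "y")
```

Because each estimated density integrates to roughly 1, the different numbers of points in the two sets do not bias the difference. Using "gray" as the middle color would also turn the empty background gray, since a difference of zero cannot distinguish "both dense" from "both empty".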

Related

Plot a curve with different color for each point in R

I have a curve, for instance
y_curve=c(1,2,5,6,9,1).
and the colors for each curve point
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000").
In theory I want to plot a curve where the first half has one color (except for the first point which is blue) and the second half has another color. In my example the dataset has more than 3000 observations so it makes sense.
For some reason, if I plot the data just using the command
plot(y_curve,col=colors), the color of points is plotted correctly.
Nevertheless, if I add the option type="l", the plotted curve has only one color - the blue, which is the first color in the vector colors ("#0000FF").
Does anyone know what I am doing wrong?
So the code is
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
plot(y_curve,col=colors,type="l")
Thank you all in advance.
I avoid using ggplot since this part of code is inside an already complicated function and I prefer using the base R commands.
The line option for the plot function does not accept multiple colors.
There is the segments() function that we can use to manually draw in each separate segment individually with a unique color.
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
#create a mostly blank plot
plot(y_curve,col=NA)
# Use this to show the points:
#plot(y_curve,col=colors)
#index variable
x = seq_along(y_curve)
#draw the segments
# segments() has no type argument; segment i (point i to point i+1) gets colors[i]
segments(head(x,-1), head(y_curve,-1), x[-1], y_curve[-1], col=colors)
This answer is based on the solution to this question:
How do I plot a graph in R, with the first values in one colour and the next values in another colour?

Coloring scatter plot in terms of an intensity in R

We would like to explain a continuous variable Y in terms of X1,X2,X3,X4,X5 (continuous grades from 0/20 to 20/20).
When plotting Y vs. X1, I would like to color the points in terms of the means (X1+X2+X3+X4+X5)/5 to see if the candidates that are bad in X1 are globally bad.
So the mean, varying from 0 to 20, would go from blood red gradually to bright green (and this would also be indicated in a legend). How could one proceed?
Here is how I usually draw my scatter plots:
scatter.smooth(x=data$X1, y=data$Y1, main="Y1 ~ X1", xlab="X1", ylab="Y1")
Even better would be that each point is colored in terms of its X1 value, and has a colored "ring" around it corresponding to the color of its mean.
If you have no objection to ggplot, it'll make these things easy.
To shade by a continuous variable, aes(color=myvar) has the behavior you want straight out of the box.
Customize the colors with + scale_color_gradient(low='red', high='green').
To do the rings, draw two sets of points: first one with size 3 (or whatever) in the ring color, then dot the centers with a point of size 1.
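The suggestions above can be sketched as follows. This is a minimal illustration, not the asker's real data: the data frame grades, its random values, and the column avg are made up for the example, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data: five grades on a 0-20 scale and a response Y
set.seed(1)
n <- 60
grades <- data.frame(matrix(runif(5 * n, 0, 20), ncol = 5,
                            dimnames = list(NULL, paste0("X", 1:5))))
grades$Y <- 2 * grades$X1 + rnorm(n, sd = 5)
grades$avg <- rowMeans(grades[, paste0("X", 1:5)])

p <- ggplot(grades, aes(x = X1, y = Y)) +
  geom_point(aes(color = avg), size = 3) +  # outer "ring": overall mean
  geom_point(aes(color = X1), size = 1) +   # center: the X1 grade itself
  scale_color_gradient(name = "grade", low = "red", high = "green",
                       limits = c(0, 20))
print(p)
```

Both layers share one color scale here, which works because X1 and the mean live on the same 0-20 range; the legend is drawn automatically from that scale.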

Simplifying noisy data for plotting and changing plot dimensions (horizontal)

Data Background: I have a large data frame (50,000 values, 10,000 when removing NAs) for a single chromosome. I am trying to plot a fixation index (Y-range: 0-1)(data$'N:S') across chromosomal positions (X-range: 0-250,000,000)(data$'Pos'). I used a program (popoolation 2) to calculate sliding window averages for a window size of 50,000 and a step size of 10,000, resulting in my data. However, in R this is too noisy and it comes out looking like a blob. When I zoom in by changing the x-axis so the ticks are 500,000 apart, you can see the trends nicely. I think I can fix this at the whole-chromosome scale by increasing the area of the x-axis and finding a way to simplify the data.
Currently I have: All my data plotted, simple mean, StandDevs (color coded)
I am trying to figure out two things.
1 Is there a way to extend the X-axis to stretch out the length of it. I don't want to change the markers on it or what it displays, I want to make the actual length longer. (Example, if I had a graph on a piece of paper that showed an x-Axis of 1-10 on a 2" area, I would want to increase the area to 5", not change the defined limits to say 1-100. so, not xlim function)
2 Simplify the data in some way. I was thinking the easiest would be a smoothed or rolling mean across the data. When I use rollmean() or smooth(), it separates my data from the x-axis, so it only extends to the 8,000 points, and when I plot it, it doesn't go across the whole chromosomal graph with the rest of my data. Someone mentioned there may be a way to instead randomly sample the data to simplify it?
2B If I get a trendline to work, can I color code it so that part of it that is 1 or 2 standard deviations above the mean can be a different color if I mute my actual background data and remove its color.
R Code
Image 1-Plotting All Positions
plot(data$'Pos',data$'N:S', ylim=c(0,0.5), col=data$Colour)
Image 3-I tried both
lines(smooth(datatest$`N:S`), type="l", col = "blue", lwd = 1)
and
rolling = rollmean(datatest$`N:S`, 9)
lines(rolling, type="b", col = "purple", lwd = 1)
Image 2-Plotting a Nice Subsection-- why I want to extend X-axis
plot(data$'Pos',data$'N:S', ylim=c(0,0.5), xlim=c(163000000,165000000), col=data$Colour)
Notes:
If it matters, my graph has colored points due to color coded regions related to means and Standard Dev.
data$Colour[data$'N:S'>=data_SD1above]="orange"
Also, the only difference between data and datatest was that datatest had NA values removed.
Image 1: All Positions-Messy
Image 2: Zoomed In to see trends
Image 3: All positions with the two attempted trendlines
So it seems that you want to resize the width of the graph for the visualization.
If you use RStudio, there is an output option which changes the width and height of the graph.
If you use the console, you can save your plot with a given width and height (in pixels by default for png()). For example:
png("mychromosome.png", width=1000, height=300)
plot(..blah..blah..)
dev.off()
I hope it will help you.
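As a rough sketch of both points in the question (a wider plot and a trend line that stays aligned with the positions), the example below uses made-up data; pos, fst, the window size k and the output path are all hypothetical stand-ins. It uses base R's stats::filter for a centered rolling mean instead of rollmean(), which is a substitution, not what the asker ran.

```r
# Hypothetical stand-in for the real data: positions and a noisy statistic
set.seed(1)
pos <- seq(1e4, 2.5e8, by = 1e5)
fst <- 0.1 + 0.05 * sin(pos / 1e7) + rnorm(length(pos), sd = 0.05)
fst <- pmin(pmax(fst, 0), 1)  # clamp to the 0-1 range of the index

# Centered rolling mean over k windows; same length as the input (NA at the ends)
k <- 25
roll <- as.numeric(stats::filter(fst, rep(1 / k, k), sides = 2))

out_file <- file.path(tempdir(), "mychromosome.png")  # hypothetical output path
png(out_file, width = 3000, height = 400)             # wide device, in pixels
plot(pos, fst, ylim = c(0, 0.5), col = "grey70", pch = 16, cex = 0.4)
lines(pos, roll, col = "blue", lwd = 2)               # x = pos keeps the line aligned
dev.off()
```

Calling lines() with only the smoothed values plots them against their index (1, 2, 3, ...), which is why the trend line ends up squeezed at the left of a plot whose x-axis is in base pairs; passing the positions as x avoids that. With zoo::rollmean, passing fill = NA keeps the result the same length as the input so it can be paired with the positions in the same way.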

How to draw a layered scatterplot in R?

I'm learning R, and want to draw a scatterplot of a large dataframe (~55000 rows). I'm using the scatterplot in car:
library(car)
d=read.csv("patches.csv", header=T)
scatterplot(energy ~ homogenity | label, data=d,
ylab="energy", xlab="homogenity ",
main="Scatter Plot",
labels=row.names(d))
where patches.csv contains the dataframe (below)
I want to show the two label sets differently. With a large volume of data, the plot is very dense, so I get the result below right (mostly red data visible). The image takes a while to render, so I can see the black labelled data fleetingly (below left) before it gets hidden in the final diagram.
Can I control R to plot the data with red first, or is there a better way to achieve my goal?
Here's a sample of my data:
label,channel,x,y,contrast,energy,entropy,homogenity
1,21,460,76,0.991667,0.640399,0.421422,0.939831
1,22,460,76,0.0833333,0.62375,0.364379,0.969445
1,23,460,76,0.129167,0.422908,0.589938,0.935417
1,24,460,76,0,1,0,1
1,25,460,76,0,1,0,1
1,26,460,76,0.0875,0.789627,0.253649,0.967361
1,27,460,76,2.4,0.528516,0.700859,0.845558
1,28,460,76,0.120833,0.562066,0.392998,0.945139
1,29,460,76,0.0125,0.975234,0.0329461,0.99375
1,30,460,76,0,1,0,1
1,31,460,76,0.1625,0.384662,0.5859,0.929861
0,0,483,82,0.404167,0.309505,0.61573,0.947222
0,1,483,82,0.0166667,0.728559,0.221967,0.991667
0,2,483,82,0,1,0,1
0,3,483,82,0.416667,0.327083,0.644057,0.940972
0,4,483,82,0.0208333,0.919054,0.0940364,0.989583
0,5,483,82,0.416667,0.327083,0.644057,0.940972
0,6,483,82,0,1,0,1
0,7,483,82,0.0333333,0.794479,0.192471,0.983333
0,8,483,82,0,1,0,1
0,9,483,82,0,1,0,1
0,10,483,82,0.0208333,0.958984,0.0502502,0.989583
If you want to change the order of the coloring, pass the parameter col=2:1 to scatterplot; then you would be plotting red before black. You can use the function alpha from the scales package to make your points translucent (it takes a vector of colors and a vector of alpha values, so each color can be given a different opacity).
## More data
d <- data.frame(homogeneity=(x=rnorm(10000, 0.85, sd=0.15)),
label=factor((lab=1:2)),
energy=rnorm(10000, lab^1.8*x^2-lab, sd=x))
library(car)
library(scales) # for alpha
opacity <- c(0.3, 0.1) # opacity for each color
col <- 1:2 # black then red
scatterplot(energy ~ homogeneity | label, data=d,
ylab="energy", xlab="homogenity ",
main=paste0(palette()[col], "(", opacity, ")", collapse=","),
col=alpha(col, opacity),
labels=row.names(d))
Similar to what bunk said about alpha: if you have lots of points, identifying individual points is no longer meaningful. Instead, you probably want a representation of the density. For that, use smoothScatter(x,y) and overlay highlighted points with the usual points(morex,morey). You obviously know how to use points (same parameters as plot), so it's very easy for you to implement and requires very little extra knowledge on your part.
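A minimal sketch of that idea follows, using made-up data shaped like the question: the variables label, homogenity and energy are simulated here, and which group counts as "highlighted" is an assumption for the example.

```r
# Hypothetical data: many background rows (label 0) and a smaller
# set of rows to highlight (label 1)
set.seed(1)
n <- 20000
label <- rbinom(n, 1, 0.05)
homogenity <- pmin(rnorm(n, 0.9 - 0.1 * label, 0.05), 1)
energy <- pmin(pmax(rnorm(n, 0.6 - 0.2 * label, 0.2), 0), 1)
hi <- label == 1

# Density of the bulk of the data, drawn first
smoothScatter(homogenity[!hi], energy[!hi],
              xlab = "homogenity", ylab = "energy")
# Individual points of the smaller group, drawn on top
points(homogenity[hi], energy[hi], pch = 20, cex = 0.5, col = "red")
```

Because points() is called after smoothScatter(), the highlighted group is always drawn last and cannot be hidden by the denser group.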

Plot With Blocks

I have been searching for hours, but I can't find a function that does this.
How do I generate a plot like
Let's say I have an array x1 = c(2,13,4) and y2=c(5,23,43). I want to create 3 blocks with height from 2-5,13-23...
How would I approach this problem? I'm hoping that I could be pointed in the right direction as to what built-in function to look at?
I have not used your data because you say you are working with an array, but you gave us two vectors. Moreover, the data you showed us is overlapping. This means that if you chart three bars, you only see two.
Based on the little image you provided, you have three ranges you want to plot for each individual or date. With time series, this kind of chart is commonly used to plot the min/max, the standard deviation and the current data.
The trick is to chart the series as layers. The first series is the one with the largest range (the beige band in this example). In the following example, I chart an empty plot first and I add three layers of rectangles, one for beige, one for gray and one for red.
#Create data.frame
n=100
df <-data.frame(1:n,runif(n)*10,60+runif(n)*10,25+runif(n)*10,40+runif(n)*10,35-runif(n)*10,35+runif(n)*10)
colnames(df) <-c("id","beige.min","beige.max","gray.min","gray.max","red.min","red.max")
#Create chart
plot(x=df$id,y=NULL,ylim=range(df[,-1]), type="n") #blank chart, ylim is the range of the data
rect(df$id-0.5,df[,2],df$id+0.5,df[,3],col="beige", border=FALSE) #first layer
rect(df$id-0.5,df[,4],df$id+0.5,df[,5],col="gray", border=FALSE) #second layer
rect(df$id-0.5,df[,6],df$id+0.5,df[,7],col="darkred", border=FALSE) #third layer
It's not entirely clear what you want based on the png, but based on what you've written:
x1 <- c(2,13,4)
y2 <- c(5,23,43)
foo <- data.frame(id=1:3, x1, y2)
library(ggplot2)
ggplot(data=foo) + geom_rect(aes(ymin=x1, ymax=y2, xmin=id-0.4, xmax=id+0.4))
