How to draw a layered scatterplot in R?

I'm learning R, and want to draw a scatterplot of a large dataframe (~55000 rows). I'm using the scatterplot in car:
library(car)
d=read.csv("patches.csv", header=T)
scatterplot(energy ~ homogenity | label, data=d,
ylab="energy", xlab="homogenity ",
main="Scatter Plot",
labels=row.names(d))
where patches.csv contains the dataframe (below)
I want to show the two label sets differently. With a large volume of data, the plot is very dense, so I get the result below right (mostly red data visible). The image takes a while to render, so I can see the black labelled data fleetingly (below left) before it gets hidden in the final diagram.
Can I control R to plot the data with red first, or is there a better way to achieve my goal?
Here's a sample of my data:
label,channel,x,y,contrast,energy,entropy,homogenity
1,21,460,76,0.991667,0.640399,0.421422,0.939831
1,22,460,76,0.0833333,0.62375,0.364379,0.969445
1,23,460,76,0.129167,0.422908,0.589938,0.935417
1,24,460,76,0,1,0,1
1,25,460,76,0,1,0,1
1,26,460,76,0.0875,0.789627,0.253649,0.967361
1,27,460,76,2.4,0.528516,0.700859,0.845558
1,28,460,76,0.120833,0.562066,0.392998,0.945139
1,29,460,76,0.0125,0.975234,0.0329461,0.99375
1,30,460,76,0,1,0,1
1,31,460,76,0.1625,0.384662,0.5859,0.929861
0,0,483,82,0.404167,0.309505,0.61573,0.947222
0,1,483,82,0.0166667,0.728559,0.221967,0.991667
0,2,483,82,0,1,0,1
0,3,483,82,0.416667,0.327083,0.644057,0.940972
0,4,483,82,0.0208333,0.919054,0.0940364,0.989583
0,5,483,82,0.416667,0.327083,0.644057,0.940972
0,6,483,82,0,1,0,1
0,7,483,82,0.0333333,0.794479,0.192471,0.983333
0,8,483,82,0,1,0,1
0,9,483,82,0,1,0,1
0,10,483,82,0.0208333,0.958984,0.0502502,0.989583

If you want to change the order of the coloring, pass the parameter col=2:1 to scatterplot; you would then be plotting red before black. You can also use the alpha function from the scales package to make your points translucent: it takes a vector of colors and a vector of alpha values, so each color can be given a different opacity.
## More data
x <- rnorm(10000, 0.85, sd=0.15)
lab <- 1:2 # recycled so the two labels alternate over the 10000 rows
d <- data.frame(homogeneity=x,
label=factor(rep(lab, length.out=10000)),
energy=rnorm(10000, lab^1.8*x^2-lab, sd=x))
library(car)
library(scales) # for alpha
opacity <- c(0.3, 0.1) # opacity for each color
col <- 1:2 # black then red
scatterplot(energy ~ homogeneity | label, data=d,
ylab="energy", xlab="homogenity ",
main=paste0(palette()[col], "(", opacity, ")", collapse=","),
col=alpha(col, opacity),
labels=row.names(d))

Similar to what bunk said about alpha: if you have lots of points, identifying individual points is no longer meaningful. Instead, you probably want a representation of the density. For that, use smoothScatter(x, y) and overlay the highlighted points with the usual points(morex, morey). points takes the same parameters as plot, so this is easy to implement and requires very little extra knowledge on your part.
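The suggestion above can be sketched roughly as follows. This is only an illustration: the data frame is synthetic (the column names follow the question, the values are made up), and which label is shown as a density and which as points is an arbitrary choice. smoothScatter relies on the KernSmooth package, which normally ships with R.

```r
# Synthetic stand-in for patches.csv (values are invented for illustration)
set.seed(42)
n <- 5000
d <- data.frame(
  label      = rep(c(0, 1), each = n),
  homogenity = c(rnorm(n, 0.90, 0.05), rnorm(n, 0.85, 0.08)),
  energy     = c(rnorm(n, 0.70, 0.15), rnorm(n, 0.50, 0.20))
)

dense     <- d[d$label == 1, ]   # group shown as a smoothed density
highlight <- d[d$label == 0, ]   # group drawn on top as individual points

# Density of the first group; nrpoints = 0 suppresses the extra outlier points
smoothScatter(dense$homogenity, dense$energy, nrpoints = 0,
              xlab = "homogenity", ylab = "energy")

# Overlay a random subset of the second group so it stays readable
idx <- sample(nrow(highlight), 500)
points(highlight$homogenity[idx], highlight$energy[idx],
       pch = 20, cex = 0.4, col = "red")
```

Because points() is called after smoothScatter(), the highlighted group always ends up on top, which also settles the drawing-order problem from the question.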

Related

How to set the height between grid rows in a line graph with ggplot (R)?

I'm trying to plot a line graph using the ggplot2 library in R. The plot itself is fine, but I need to reduce the spacing (height) between the rows of grid lines, because there is a big separation between the lines.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1)+
xlab("iteraciones") +
ylab("valores")
and this is my current graph:
...the first line is very far from the second. How can I reduce the height?
Regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
Change the scale (e.g. convert the plot to a log scale), although this can make the numbers harder for people to interpret. It also changes the shape of each line, not just the space between the lines. I'm guessing this isn't what you will want, ultimately.
Normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
Graph each line separately, as stated above. The main drawback here is that you need 3 graphs where 1 might do.
Not recommended:
I know that some graphs have a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2) because it masks the true separation between the data points. If you really do want a gap, I would look at this post: "axis.break and ggplot2 or gap.plot? plot may be too complexe".
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
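As a rough sketch of the normalization option, assuming long-format data with the same column names as in the question (NRO, leyenda, DTF; the numbers below are made up), the standardization can be done per series in base R with ave():

```r
# Made-up long-format data in the shape produced by melt() in the question:
# one row per (NRO, leyenda) pair, with the value in DTF.
chart_data <- data.frame(
  NRO     = rep(1:5, times = 3),
  leyenda = rep(c("a", "b", "c"), each = 5),
  DTF     = c(800, 820, 810, 830, 825,
              2500, 2600, 2550, 2650, 2620,
              3100, 3050, 3120, 3080, 3110)
)

# Standardize DTF within each level of leyenda: subtract the group mean
# and divide by the group standard deviation.
chart_data$DTF_std <- ave(chart_data$DTF, chart_data$leyenda,
                          FUN = function(v) (v - mean(v)) / sd(v))

# Check: every series now has mean 0 (up to rounding)
aggregate(DTF_std ~ leyenda, data = chart_data, FUN = mean)
```

Plotting DTF_std instead of DTF on the y-axis then puts all series on a common scale, at the cost of losing the original units.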
You could use an axis transformation that maps your data to the screen in a non-linear fashion:
fun_trans <- function(x){
d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
model1 <- lm(y~poly(x,2), data=d)
model2 <- lm(x~poly(y,2), data=d)
scales::trans_new("fun",
function(x) as.vector(predict(model1,data.frame(x=x))),
function(x) as.vector(predict(model2,data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")

Rescaling colors palette in r

In R I have a cloud of data around zero and some data around 1. I want to "rescale" my heat colors to distinguish the lower numbers. This has to be done with a continuous, rainbow-like ramp; I don't want "discrete colors". I tried using breaks in image.plot, but it doesn't work.
image.plot(X,Y,as.matrix(mymatrix),col=heat.colors(800),asp=1,scale="none")
I tried:
lowerbreak=seq(min(values),quantile2,len=80)
highbreak=seq(quantile2+0.0000000001,max(values),len=20)
brks=c(lowerbreak,highbreak) # "break" is a reserved word in R, so another name is needed
ii <- cut(values, breaks = brks,
include.lowest = TRUE)
colors <- colorRampPalette(c("lightblue", "blue"))(99)[ii]
Here's an approach using the "squash" library. With makecmap(), you specify your colour values and breaks, and you can also specify that it should be log stretched using the base parameter. It's a bit complex, but gives you granular control. I use it to colorize skewed data, where I need more definition in the "low end".
To achieve the rainbow palette, I used the built-in "jet" colour function, but you can use any colour set - I give an example for creating a greyscale ramp with "colorRampPalette".
Whatever ramp you use, it will take some playing with the base value to optimize for your data.
install.packages("squash")
library("squash")
#choose your colour thresholds - outliers will be RED
minval=0 #lowest value to get a colour
maxval=2.0 #highest value to get a colour
n.cols=100 #how many colours do you want in your palette?
col.int=1/n.cols
#create your palette
colramp=makecmap(x=seq(minval,maxval,col.int),
n=n.cols,
breaks=prettyLog,
symm=F,
base=10,#to give ramp a log(base) stretch
colFn=jet,
col.na="red",
right=F,
include.lowest=T)
# If you don't like the colFn options in "makecmap", define your own!
# Here's an example in greyscale; pass this to "colFn" above
user.colfn=colorRampPalette(c("black","white"))
Example for using colramp in a plot (assuming you've already created colramp as above somewhere in your program):
varx=1:100
vary=1:100
plot(varx,vary,col=colramp$colors) #colors is the 2nd vector in the colramp list
To select specific colours, subset from the list via, e.g., colors[1:20] (if you try this with the example above, the first colors will repeat 5 times - not really useful but you get the logic and can play around).
In my case, I had a grid of values that I wanted to turn into a coloured raster image (i.e. colour mapping some continuous data). Here's example code for that, using a made up matrix:
#create a "dummy matrix"
matx=matrix(data=c(rep(2,50),rep(0,500),rep(0.5,500),rep(1,500),rep(1.5,500)),nrow=50,ncol=41,byrow=F)
#transpose the matrix
# the output of "savemat" is rotated 90 degrees to the left
# so savemat(maty) will be a colorized version of (matx)
maty=t(matx)
#savemat creates an image using colramp
savemat(x=maty,
filename="/Users/KeeganSmith/Desktop/matx.png",
map=colramp,
outlier="red",
dev="png",
do.dev.off=T)
When using colorRampPalette, you can set the bias argument to emphasise low (or high) values.
Something like colorRampPalette(heat.colors(100), bias=3) will focus the 'ramp' on the lower values, helping them to be more visually distinguishable. Note that colorRampPalette returns a function, so call the result with the number of colors you want.
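A small sketch of the bias idea, using base graphics only. The matrix is invented (mostly values near 0, a few near 1) purely to mimic the kind of skewed data described in the question; the variable names are placeholders.

```r
# colorRampPalette() returns a function; call it with the number of colours.
ramp_even   <- colorRampPalette(heat.colors(100))            # default bias = 1
ramp_biased <- colorRampPalette(heat.colors(100), bias = 3)  # more colour change at the low end

cols_even   <- ramp_even(100)
cols_biased <- ramp_biased(100)

# Made-up skewed data: mostly close to 0, a few values close to 1
set.seed(1)
z <- matrix(c(rexp(380, rate = 20), runif(20, 0.9, 1)), nrow = 20)

# Side-by-side comparison of the two palettes on the same data
op <- par(mfrow = c(1, 2))
image(z, col = cols_even,   main = "bias = 1")
image(z, col = cols_biased, main = "bias = 3")
par(op)
```

The right bias value depends on how skewed the data are, so it usually takes a few tries to find one that separates the low values well.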

Display groups with different borders in histogram with panel.superpose

This answer shows how to use groups and panel.superpose to display overlapping histograms in the same panel, assigning different colors to each histogram. In addition, I want to give each histogram a different border color. (This will allow me to display one histogram as solid bars without a border, overlaid with a transparent, all-border histogram. The example below is a little different for the sake of clarity.)
Although it's possible to use border= to use different border colors in the plot, they are not assigned to groups as fill colors are with col=. If you give border= a sequence of colors, it seems to cycle through them one bar at a time. If the two histograms overlap, the effect is a bit silly (see below).
Is there a way to give each group a specific border color?
# This illustrates the problem: Assignment of border colors to bars ignores grouping:
library(lattice)
# make some data
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
# plot it
histogram(~ x, groups=cat, data=foo.df, ylim=c(0,75), breaks=seq(-3, 5, 0.5), lwd=2,
panel=function(...)panel.superpose(..., panel.groups=panel.histogram,
col=c("transparent", "cyan"),
border=c(rep("black", 3), rep("red", 3))))
Note that you can't just count how many bars there are in each group and provide those numbers to rep in the border setting. If the two histograms overlap, at least one of the histograms will use two border colors.
(It's the panel.superpose code that places the groups on the same panel and that assigns the colors. I don't have a deep understanding of it.)
panel.histogram() doesn't have a formal groups= argument, and if you examine its code, you'll see that it handles any supplied groups= argument differently and in a less standard way than panel.*() functions that do. The upshot of that design decision is that (as you've found) it is generally not easy to pass it vectors of graphical parameters specifying per-group appearance.
As a workaround, I'd suggest using latticeExtra's +() and as.layer() functions to overlay a number of separate histogram() plots, one for each group. Here's how you might do that:
library(lattice)
library(latticeExtra)
## Split your data by group into separate data.frames
foo.df <- data.frame(x=c(rnorm(10),rnorm(10)+2), cat=c(rep("A", 10),rep("B", 10)))
foo.A <- subset(foo.df, cat=="A")
foo.B <- subset(foo.df, cat=="B")
## Use calls to `+ as.layer()` to layer each group's histogram onto previous ones
histogram(~ x, data=foo.A, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="transparent", border="black") +
as.layer(
histogram(~ x, data=foo.B, ylim=c(0,75), breaks=seq(-3, 5, 0.5),
lwd=2, col="cyan", border="red")
)

R - Subtracting two smoothScatter plots

I have two smoothScatter plots and hope to subtract them. See Below:
par(mfrow=c(1,2))
set.seed(3)
x1 = rnorm(1000)
y1 = rnorm(1000)
smoothScatter(x1,y1,nrpoints=length(x1),cex=3)
x2 = rnorm(200)
y2 = rnorm(200)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp=colorRampPalette(c("white","red")))
My hope is that I can produce a 3rd plot which is a colorful subtraction of the 1st plot from the 2nd plot. That is, there will be areas which are blue, red, and then if possible I'd like to make the overlapped areas gray. But I'd like the colors to be consistent with the new densities. For instance, the center of the new plot would be almost fully gray, whereas the outsides may have some gray but also patches of blue and red. Note that the two plots have different numbers of points. How could I do such a thing?
The only way I can think of doing this is to go pixel by pixel and subtract the colors from one plot to another. The problem is, I don't know how to grab the color intensities at each pixel to do this. However, even if I were to achieve this, white minus white would probably give black, which I wouldn't want.
Thanks in advance!
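You might consider using slightly transparent colors:
#helper function to make transparent ramps
alpharamp <- function(c1, c2, alpha=128) {
# alpha must fit in two hex digits (00 to FF), so the upper bound is 255
stopifnot(alpha>=0 & alpha<=255)
# width=2 pads small alpha values so the result is always a valid #RRGGBBAA color
function(n) paste(colorRampPalette(c(c1,c2))(n), format(as.hexmode(alpha), width=2, upper.case=TRUE), sep="")
}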
And then we can overplot the two graphs with
smoothScatter(x1,y1,nrpoints=length(x1),cex=3, colramp=alpharamp("white",blues9))
par(new=T)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp= alpharamp("white","red"), axes=F, ann=F)
Here's what this code produces.
If you still want to get at the actual color values in the plot, that's a bit tricky. You'd have to call grDevices:::.smoothScatterCalcDensity directly with your data. Then you'd have to transform the returned fhat values by taking the 4th root and rescaling to 0-1. Those values (let's call them z) are then converted to indexes using the formula floor((256 - 1e-05) * z + 1e-07)+1, and the indexes are used to look up a value among the 256 colors generated from the ramp you supply. It's all a bit crazy, but you can read the source of smoothScatter and image.default to see how it really happens.
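If the goal is the difference of the two densities rather than the exact pixel colors, one possible route is to skip smoothScatter's internals and estimate both densities on a shared grid. The sketch below swaps in MASS::kde2d for the density estimate (an assumption on my part, not what smoothScatter uses internally) and plots the difference with image(); the palette and grid size are arbitrary choices.

```r
library(MASS)  # kde2d(); MASS is a recommended package that normally ships with R

set.seed(3)
x1 <- rnorm(1000); y1 <- rnorm(1000)
x2 <- rnorm(200);  y2 <- rnorm(200)

# Evaluate both densities on the same grid so they can be subtracted cell by cell
lims <- c(range(c(x1, x2)), range(c(y1, y2)))
d1 <- kde2d(x1, y1, n = 100, lims = lims)
d2 <- kde2d(x2, y2, n = 100, lims = lims)

# Each estimate is a normalized density, so the different sample sizes
# (1000 vs 200) do not dominate the difference
dz <- d2$z - d1$z

# Symmetric color scale: blue where set 1 is denser, red where set 2 is
# denser, gray where the two are about equal
m   <- max(abs(dz))
pal <- colorRampPalette(c("blue", "gray90", "red"))(101)
image(d1$x, d1$y, dz, zlim = c(-m, m), col = pal, xlab = "x", ylab = "y")
```

Note that in this sketch gray marks every cell where the difference is close to zero, which includes empty regions as well as regions where both sets are equally dense.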

Circling a particular box in R boxplot

Is it possible to circle a particular box in a boxplot in R? The assumption here is that I know beforehand which of the boxes it is that I have to highlight.
I heartily second #csgillespie's suggestion to just make it a different color.
That said, I played around a bit, and this is what I came up with (using #Marc's data):
df <- data.frame(s1=rnorm(100), s2=rnorm(100, mean=2), s3=rnorm(100, mean=-2))
Plot the boxplot and keep the stats for plotting the ellipse:
foo <- boxplot(df, border=c(8,8,1), lwd=c(1,1,3))
Set semimajor and semiminor axes:
aa <- 0.5
bb <- foo$stats[4,3]-foo$stats[2,3]
Plot a parameterized ellipse around the third box:
tt <- seq(0,2*pi,by=.01)
lines(3+aa*cos(tt),foo$stats[3,3]+bb*sin(tt))
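For comparison, the simpler suggestion of giving the box a different color can be sketched like this. The data mirror the example above; the highlight index and the color names are placeholders.

```r
set.seed(1)
df <- data.frame(s1 = rnorm(100), s2 = rnorm(100, mean = 2), s3 = rnorm(100, mean = -2))

highlight <- 3                       # index of the box to emphasise (assumed known beforehand)
fill <- rep("lightgray", ncol(df))   # default fill for every box
fill[highlight] <- "tomato"          # different fill for the box of interest

# boxplot() accepts one fill color per box
boxplot(df, col = fill)
```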
If you want a somewhat hand-drawn look and can do some parts interactively (for example, when creating a presentation where one slide just shows the plot and the next slide adds the circle around the box of interest), you can:
Use the locator function to click on points that surround the part of the plot that is of interest. You might want to set type='l' so you can see the shape that you are making (but you will then need to recreate the plot without the added lines).
Pass the return value from above to the xspline function with other options.
Example:
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
tmp <- locator(type='l') # click on plot around box of interest
boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
xspline(tmp, open=FALSE, border='red', lwd=3)
