How to color different groups in qqplot? - r

I'm plotting some Q-Q plots using the qqplot function. It's very convenient to use, except that I want to color the data points based on their IDs. For example:
library(qualityTools)
n=(rnorm(n=500, m=1, sd=1) )
id=c(rep(1,250),rep(2,250))
myData=data.frame(x=n,y=id)
qqPlot(myData$x, "normal",confbounds = FALSE)
So the plot looks like:
I need to color the dots based on their "id" values, for example blue for the ones with id=1, and red for the ones with id=2. I would greatly appreciate your help.

You can try setting col = myData$y. I'm not sure how the qqPlot function works from that package, but if you're not stuck with using that function, you can do this in base R.
Using base R functions, it would look something like this:
# The example data, as generated in the question
n <- rnorm(n=500, m=1, sd=1)
id <- c(rep(1,250), rep(2,250))
myData <- data.frame(x=n,y=id)
# The plot
qqnorm(myData$x, col = myData$y)
qqline(myData$x, lty = 2)
Not sure how helpful the colors will be due to the overplotting in this particular example.

Not used qqPlot before, but it you want to use it, there is a way to achieve what you want. It looks like the function invisibly passes back the data used in the plot. That means we can do something like this:
# Use qqPlot - it generates a graph, but ignore that for now
plotData <- qqPlot(myData$x, "normal",confbounds = FALSE, col = sample(colors(), nrow(myData)))
# Given that you have the data generated, you can create your own plot instead ...
with(plotData, {
plot(x, y, col = ifelse(id == 1, "red", "blue"))
abline(int, slope)
})
Hope that helps.

Related

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize through the plot(swiss) function.
After this, want to add outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Ask: is there any restriction to how the points() function can be used for a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
ggpairs(columns = 1:6,
mapping = aes(color = is_outlier),
upper = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
lower = list(continuous = function(data, mapping, ...) {
ggally_points(data = data, mapping = mapping) +
scale_colour_manual(values = c("black", "red"))
}),
axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When plotting a data frame R produces many plots and aligns them. What you're actually seeing there is 6 by 6 = 36 individual plots which have all been aligned to look nice.
When you use the dots command, it tells it to place the dots on the current plot. Which doesn't really make sense when you have 36 plots, at least not the way you want it to.
ggplot is a really powerful tool in R, it provides far greater combustibility. For example you could set up the dataframe to include your outliers, but have them labelled as "outlier" and place it in each plot that you have set up as facets. The more you explore it you might find there are better plots which suit your needs as well.
Plotting a dataframe in base R is a good exploratory tool. You could set up those outliers as a separate dataframe and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If you're goal is to produce exactly as you've described, the ggplot2 package will help you create something more professional. As #Gregor suggested in the comments, looking up the function ggpairs from the GGally package would be a good place to start.
A quick google image search shows some funky plots akin to what you're after and then some!
Find it here

Multiple histograms with title and mean as a line?

I'm struggeling with the histogram function in my exploratory analysis. I would like to run a couple of variables in my dataset through a histogram function and for each add the title and a line at the arithmetic mean. This is how far I've got (but the main title is still missing):
histo.abline <-function(x){
hist(x)
abline(v = mean(x, na.rm = TRUE), col = "blue", lwd = 4)}
sapply(dataset[c(7:10)], histo.abline)
I tried to add a main argument in the histogram function but it just doesn't pick the right variable name of my dataset vector. When I put main=x there, it says returns NULL for each variable. Colnames, names and other functions didn't work either. Could you help me?
you can try to do it with ggplot:
library(ggplot)
histo.abline <-function(dataset,colnum){
p<-ggplot(dataset,aes(dataset[,colnum]))+geom_histogram(bins=5,fill=I("blue"),col=I("red"), alpha=I(.2))+
geom_vline(xintercept = mean(dataset[,colnum], na.rm = TRUE))+xlab(as.character(names(dataset)[colnum]))
return(p)
}
since you have not provided data lets work with mtcars and create a list of histograms
dataset=mtcars
listOfHistograms<-lapply(3:7,function(x) histo.abline(dataset,x))
your list has 5 histograms that you can plot for instance the first by:
print(listOfHistograms[[1]])
More histogram options for ggplot here: https://www.r-bloggers.com/how-to-make-a-histogram-with-ggplot2/
hope this helps
EDIT: Multiple Plot in one graph
One way to do it is through cowplot library:
library(cowplot)
plot_grid(plotlist=listOfHistograms[1:4])

Adding a specific line to a scatter plot

I have a plot of spectra vs frequency and I am trying to add a specific line through the data and what I have right now is
plot(freq, spc, log='xy', type='l')
y.loess <- loess(spc ~ freq, span=0.8, data.frame(x=freq, y=spc))
y.predict <- predict(y.loess, data.frame(x=freq))
lines(freq,y.predict)
lines(freq,y.predict, col='red')
This gives me the following
The black part of the graph is correct and what I need but the red line is incorrect what I need should look something like
I thought loess would work but it's not quite what I am going for. How do I add a line to my data to make it look like the second picture?
I would pre-scale the values and try a kernel smoother:
Ks <- ksmooth(log(freq),log(spc),kernel = "normal",bandwidth=0.3)
lines(Ks,col="red")
You can play around with the bandwidth or base it on standard deviation of your log(data). Look at this Wikipedia article for alternative using npreg.
You do not have a reproducible example, so, I'll be very simplistic in my answer. You can try out the existing function scatter.smooth:
require(graphics)
with(cars, scatter.smooth(speed, dist))
## You can elaborate more on the line and dots as your will:
with(cars, scatter.smooth(speed, dist, lpars =
list(col = "red", lwd = 3, lty = 3)))

colorRamp returns 0

I'm trying to plot lines and color the lines based on the probability of that connection. Given a vector of probabilities, I use:
colfunc <- colorRamp(c("white", "red"))
colors <- colfunc(probs)
colors is then an nx3 matrix of rgb values. However, colfunc quite often returns a 0 value, so when i attempt to plot using these colors, R complains
Error in col2rgb(colors) : numerical color values must be positive
Is there an error in the way I am defining my color function?
Your function works fine, I think, but it doesn't return colors you can use with plot, because plot wants a color, not RGB values in a matrix.
There's probably a better way, but you can simply covert the matrix:
probs <- runif(10)
colors <- colfunc(probs)
my_col = apply(colors, MARGIN = 1, function(x) rgb(x[1]/255, x[2]/255, x[3]/255))
plot(1:10, 1:10, col = my_col) # should work fine
or you could just wrap your function
better_colfunc <- function(x, ramp = colorRamp(c("white", "red"))) {
colors <- ramp(x)
colors = apply(colors, MARGIN = 1, function(x) rgb(x[1]/255, x[2]/255, x[3]/255))
return(colors)
}
plot(1:10, 1:10, col = better_colfunc(probs, ramp = colfunc))
As for "colfunc quite often returns a 0 value", and other issues, you'll need to share both some data (what do your probs look like?) as well as perhaps the actual plotting code. See here for tips on making reproducible questions.
I am a bit confused what you are trying to do...the col2rgb function returns rgb values, so if you already have those then what do you want?
Or if you want rgb, why not use:
col2rgb(c("white", "red"))

contour plot of a custom function in R

I'm working with some custom functions and I need to draw contours for them based on multiple values for the parameters.
Here is an example function:
I need to draw such a contour plot:
Any idea?
Thanks.
First you construct a function, fourvar that takes those four parameters as arguments. In this case you could have done it with 3 variables one of which was lambda_2 over lambda_1. Alpha1 is fixed at 2 so alpha_1/alpha_2 will vary over 0-10.
fourvar <- function(a1,a2,l1,l2){
a1* integrate( function(x) {(1-x)^(a1-1)*(1-x^(l2/l1) )^a2} , 0 , 1)$value }
The trick is to realize that the integrate function returns a list and you only want the 'value' part of that list so it can be Vectorize()-ed.
Second you construct a matrix using that function:
mat <- outer( seq(.01, 10, length=100),
seq(.01, 10, length=100),
Vectorize( function(x,y) fourvar(a1=2, x/2, l1=2, l2=y/2) ) )
Then the task of creating the plot with labels in those positions can only be done easily with lattice::contourplot. After doing a reasonable amount of searching it does appear that the solution to geom_contour labeling is still a work in progress in ggplot2. The only labeling strategy I found is in an external package. However, the 'directlabels' package's function directlabel does not seem to have sufficient control to spread the labels out correctly in this case. In other examples that I have seen, it does spread the labels around the plot area. I suppose I could look at the code, but since it depends on the 'proto'-package, it will probably be weirdly encapsulated so I haven't looked.
require(reshape2)
mmat <- melt(mat)
str(mmat) # to see the names in the melted matrix
g <- ggplot(mmat, aes(x=Var1, y=Var2, z=value) )
g <- g+stat_contour(aes(col = ..level..), breaks=seq(.1, .9, .1) )
g <- g + scale_colour_continuous(low = "#000000", high = "#000000") # make black
install.packages("directlabels", repos="http://r-forge.r-project.org", type="source")
require(directlabels)
direct.label(g)
Note that these are the index positions from the matrix rather than the ratios of parameters, but that should be pretty easy to fix.
This, on the other hand, is how easilyy one can construct it in lattice (and I think it looks "cleaner":
require(lattice)
contourplot(mat, at=seq(.1,.9,.1))
As I think the question is still relevant, there have been some developments in the contour plot labeling in the metR package. Adding to the previous example will give you nice contour labeling also with ggplot2
require(metR)
g + geom_text_contour(rotate = TRUE, nudge_x = 3, nudge_y = 5)

Resources