Adding a specific line to a scatter plot - r

I have a plot of spectra vs. frequency and I am trying to add a specific line through the data. What I have right now is:
plot(freq, spc, log='xy', type='l')
y.loess <- loess(spc ~ freq, span=0.8, data.frame(x=freq, y=spc))
y.predict <- predict(y.loess, data.frame(x=freq))
lines(freq,y.predict)
lines(freq,y.predict, col='red')
This gives me the following:
The black part of the graph is correct and is what I need, but the red line is incorrect. What I need should look something like this:
I thought loess would work but it's not quite what I am going for. How do I add a line to my data to make it look like the second picture?

I would pre-scale the values and try a kernel smoother:
Ks <- ksmooth(log(freq),log(spc),kernel = "normal",bandwidth=0.3)
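# Ks holds log-scale values; if the plot was drawn with log='xy',
# back-transform instead: lines(exp(Ks$x), exp(Ks$y), col="red")
lines(Ks,col="red")
You can play around with the bandwidth or base it on the standard deviation of your log(data). Look at this Wikipedia article for an alternative using npreg.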
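As a minimal sketch of the log-scale smoothing idea: the freq and spc vectors below are made up for illustration (the question has no data), and the power-law shape is only an assumption. The smoothing is done in log-log space and the result is back-transformed with exp() before drawing, because a plot made with log="xy" still expects coordinates in the original units.

```r
# Hypothetical data standing in for the spectrum in the question
set.seed(1)
freq <- 10^seq(-2, 1, length.out = 500)         # log-spaced frequencies
spc  <- freq^(-2) * exp(rnorm(500, sd = 0.5))   # noisy power law (assumed shape)

plot(freq, spc, log = "xy", type = "l")

# Smooth in log-log space; the bandwidth is in units of log(freq)
Ks <- ksmooth(log(freq), log(spc), kernel = "normal", bandwidth = 0.3)

# Back-transform before drawing on the log-scaled axes
lines(exp(Ks$x), exp(Ks$y), col = "red", lwd = 2)
```

A larger bandwidth gives a smoother red line; a smaller one follows the noise more closely.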

You do not have a reproducible example, so I'll keep my answer very simple. You can try the existing function scatter.smooth:
require(graphics)
with(cars, scatter.smooth(speed, dist))
## You can customize the line and points as you wish:
with(cars, scatter.smooth(speed, dist,
                          lpars = list(col = "red", lwd = 3, lty = 3)))

Related

Why aren't any points showing up in the qqcomp function when using plotstyle="ggplot"?

I want to compare the fit of different distributions to my data in a single plot. The qqcomp function from the fitdistrplus package pretty much does exactly what I want to do. The only problem I have however, is that it's mostly written using base R plot and all my other plots are written in ggplot2. I basically just want to customize the qqcomp plots to look like they have been made in ggplot2.
From the documentation (https://www.rdocumentation.org/packages/fitdistrplus/versions/1.0-14/topics/graphcomp) I get that this is totally possible by setting plotstyle="ggplot". If I do this however, no points are showing up on the plot, even though it worked perfectly without the plotstyle argument. Here is a little example to visualize my problem:
library(fitdistrplus)
library(ggplot2)
set.seed(42)
vec <- rgamma(100, shape=2)
fit.norm <- fitdist(vec, "norm")
fit.gamma <- fitdist(vec, "gamma")
fit.weibull <- fitdist(vec, "weibull")
model.list <- list(fit.norm, fit.gamma, fit.weibull)
qqcomp(model.list)
This gives the following output:
While this:
qqcomp(model.list, plotstyle="ggplot")
gives the following output:
Why are the points not showing up? Am I doing something wrong here or is this a bug?
EDIT:
So I haven't figured out why this doesn't work, but there is a pretty easy workaround. The call qqcomp(model.list, plotstyle="ggplot") still returns a ggplot object, which includes the data used to make the plot. Using that data, one can easily write a custom plotting function that does exactly what one wants. It's not very elegant, but until someone finds out why it's not working as expected I will just use this method.
I was able to reproduce your error and, indeed, it's really intriguing. You may want to contact the developers of this package to report this bug.
Otherwise, you can reproduce this Q-Q plot with ggplot2 and stat_qq by passing the corresponding distribution function and its parameters (stored in $estimate):
library(ggplot2)
df = data.frame(vec)
ggplot(df, aes(sample = vec))+
stat_qq(distribution = qgamma, dparams = as.list(fit.gamma$estimate), color = "green")+
stat_qq(distribution = qnorm, dparams = as.list(fit.norm$estimate), color = "red")+
stat_qq(distribution = qweibull, dparams = as.list(fit.weibull$estimate), color = "blue")+
geom_abline(slope = 1, color = "black")+
labs(title = "Q-Q Plots", x = "Theoretical quantiles", y = "Empirical quantiles")
Hope it will help you.

Run points() after plot() on a dataframe

I'm new to R and want to plot specific points over an existing plot. I'm using the swiss data frame, which I visualize through the plot(swiss) function.
After this, I want to add outliers given by the Mahalanobis distance:
mu_hat <- apply(swiss, 2, mean); sigma_hat <- cov(swiss)
mahalanobis_distance <- mahalanobis(swiss, mu_hat, sigma_hat)
outliers <- swiss[names(mahalanobis_distance[mahalanobis_distance > 10]),]
points(outliers, pch = 'x', col = 'red')
but this last line has no effect, as the outlier points aren't added to the previous plot. I see that if I repeat this procedure on a pair of variables, say
plot(swiss[2:3])
points(outliers[2:3], pch = 'x', col = 'red')
the red points are added to the plot.
Question: is there any restriction on how the points() function can be used for a multivariate data frame?
Here's a solution using GGally::ggpairs. It's a little ugly as we need to modify the ggally_points function to specify the desired color scheme.
I've assumed that mu_hat = colMeans(swiss) and sigma_hat = cov(swiss).
library(dplyr)
library(GGally)
swiss %>%
  bind_cols(distance = mahalanobis(swiss, colMeans(swiss), cov(swiss))) %>%
  mutate(is_outlier = ifelse(distance > 10, "yes", "no")) %>%
  ggpairs(columns = 1:6,
          mapping = aes(color = is_outlier),
          upper = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          lower = list(continuous = function(data, mapping, ...) {
            ggally_points(data = data, mapping = mapping) +
              scale_colour_manual(values = c("black", "red"))
          }),
          axisLabels = "internal")
Unfortunately this isn't possible the way you're currently doing things. When you plot a data frame, R produces many plots and aligns them. What you're actually seeing is 6 by 6 = 36 individual plots that have been arranged in a grid.
When you call points(), it places the points on the current plot, which doesn't really make sense when you have 36 plots, at least not in the way you want it to.
ggplot2 is a really powerful tool in R and offers far greater customizability. For example, you could set up the data frame to include your outliers, label them as "outlier", and show them in each plot you have set up as facets. The more you explore it, the more likely you are to find other plots that suit your needs as well.
Plotting a data frame in base R is a good exploratory tool. You could set up those outliers as a separate data frame and plot it, so you can see each of the 6 by 6 plots side by side and compare. It all depends on your goal. If your goal is to produce exactly what you've described, the ggplot2 package will help you create something more professional. As @Gregor suggested in the comments, the ggpairs function from the GGally package would be a good place to start.
A quick google image search shows some funky plots akin to what you're after and then some!
Find it here
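For what it's worth, here is a base R sketch that may get close to the original goal, assuming it is acceptable to redraw the scatterplot matrix rather than add points to an existing one. plot() on a data frame hands off to pairs(), which forwards arguments such as col and pch to every panel, so per-row colours can mark the outliers. The names md, is_outlier, point_col and point_pch are illustrative only.

```r
# Flag outliers by Mahalanobis distance (threshold 10, as in the question)
md <- mahalanobis(swiss, colMeans(swiss), cov(swiss))
is_outlier <- md > 10

# One colour and one symbol per row; pairs() reuses them in every panel
point_col <- ifelse(is_outlier, "red", "black")
point_pch <- ifelse(is_outlier, 4, 1)   # 4 is a cross, 1 an open circle

pairs(swiss, col = point_col, pch = point_pch)
```

This colours the outliers inside each panel instead of overlaying them afterwards, which sidesteps the "current plot" limitation of points().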

How to color different groups in qqplot?

I'm plotting some Q-Q plots using the qqplot function. It's very convenient to use, except that I want to color the data points based on their IDs. For example:
library(qualityTools)
n=(rnorm(n=500, m=1, sd=1) )
id=c(rep(1,250),rep(2,250))
myData=data.frame(x=n,y=id)
qqPlot(myData$x, "normal",confbounds = FALSE)
So the plot looks like:
I need to color the dots based on their "id" values, for example blue for the ones with id=1, and red for the ones with id=2. I would greatly appreciate your help.
You can try setting col = myData$y. I'm not sure how the qqPlot function works from that package, but if you're not stuck with using that function, you can do this in base R.
Using base R functions, it would look something like this:
# The example data, as generated in the question
n <- rnorm(n=500, m=1, sd=1)
id <- c(rep(1,250), rep(2,250))
myData <- data.frame(x=n,y=id)
# The plot
qqnorm(myData$x, col = myData$y)
qqline(myData$x, lty = 2)
Not sure how helpful the colors will be due to the overplotting in this particular example.
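If named colours are preferred over the numeric codes, one possible variation (a sketch on regenerated example data, not the asker's actual values) is to let qqnorm() compute the coordinates and then draw them with plot(). qqnorm() returns its coordinates in the original data order, so a colour vector built from id should line up with the points.

```r
set.seed(1)
x  <- rnorm(500, mean = 1, sd = 1)
id <- rep(1:2, each = 250)

# Compute the Q-Q coordinates without drawing anything yet
qq <- qqnorm(x, plot.it = FALSE)

# Blue for id 1, red for id 2, as asked in the question
plot(qq$x, qq$y, col = ifelse(id == 1, "blue", "red"),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
qqline(x, lty = 2)
```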
I haven't used qqPlot before, but if you want to use it, there is a way to achieve what you want. It looks like the function invisibly passes back the data used in the plot. That means we can do something like this:
# Use qqPlot - it generates a graph, but ignore that for now
plotData <- qqPlot(myData$x, "normal",confbounds = FALSE, col = sample(colors(), nrow(myData)))
# Given that you have the data generated, you can create your own plot instead ...
with(plotData, {
  plot(x, y, col = ifelse(id == 1, "red", "blue"))
  abline(int, slope)
})
Hope that helps.

function lines() is not working

I have a problem with the function lines.
this is what I have written so far:
model.ew<-lm(Empl~Wage)
summary(model.ew)
plot(Empl,Wage)
mean<-1:500
lw<-1:500
up<-1:500
for(i in 1:500){
mean[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[1]
lw[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[2]
up[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[3]
}
plot(Wage,Empl)
lines(mean,type="l",col="red")
lines(up,type="l",col="blue")
lines(lw,type="l",col="blue")
My problem is that no line appears on my plot and I cannot figure out why.
Can somebody help me?
You really need to read some introductory manuals for R. Go to this page, and select one that illustrates using R for linear regression: http://cran.r-project.org/other-docs.html
First we need to make some data:
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
Now we run your regression and plot the lines:
model.ew <- lm(Empl~Wage)
summary(model.ew)
plot(Empl~Wage) # Note. You had the axes flipped here
Your first problem was that you flipped the axes. The dependent variable (Empl) goes on the vertical axis. That is the main reason you didn't get any lines on the plot. Getting the prediction lines requires no loops at all, only a single call to matlines() after the plot:
xval <- seq(min(Wage), max(Wage), length.out=101)
conf <- predict(model.ew, data.frame(Wage=xval),
                interval="confidence", level=.90)
matlines(xval, conf, col=c("red", "blue", "blue"))
That's all there is to it.
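Another thing that may have contributed in the loop version: lines() given a single vector plots it against the index 1, 2, 3, ..., not against Wage. A sketch with the same simulated data (an assumption, not your data) that draws each band with lines() and an explicit x vector:

```r
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
model.ew <- lm(Empl ~ Wage)

# Predict on a grid that spans the observed Wage values
xval <- seq(min(Wage), max(Wage), length.out = 101)
conf <- predict(model.ew, data.frame(Wage = xval),
                interval = "confidence", level = 0.90)

plot(Empl ~ Wage)
# Pass x explicitly: lines(y) alone would plot y against 1, 2, 3, ...
lines(xval, conf[, "fit"], col = "red")
lines(xval, conf[, "lwr"], col = "blue")
lines(xval, conf[, "upr"], col = "blue")
```

The columns of conf are named fit, lwr and upr, which is what matlines() draws in one go.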

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I came across online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
  unique.vals <- length(unique(x))
  N <- length(x)
  N.val <- min(N/20,unique.vals)
  if(unique.vals>N.val){
    x <- ave(x,cut(x,N.val),FUN=min)
    x <- signif(x,4)
  }
  # construct the outline of the plot
  outline <- as.vector(table(x))
  outline <- outline/max(outline)
  # determine some correction to make the V shape,
  # based on the range
  y.corr <- diff(range(x))*0.05
  # Get the unique values
  yval <- sort(unique(x))
  plot(c(-1,1),c(min(yval),max(yval)),
       type="n",xaxt="n",xlab="")
  for(i in 1:length(yval)){
    n <- sum(x==yval[i])
    x.plot <- seq(-outline[i],outline[i],length=n)
    y.plot <- yval[i]+abs(x.plot)*y.corr
    points(x.plot,y.plot,pch=19,cex=0.5)
  }
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, which bears some similarity.
The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points.
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's, but I think this is more than a stem plot: the y value in each series is the absolute value of the distance to the in-bin mean, and the x value reflects whether the value is lower or higher than the mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
  x<-sort(x);
  #Cutting in bins
  cut(x,N)->p;
  #Calculate the means over bins
  sapply(levels(p),function(i) mean(x[p==i]))->meansl;
  means<-meansl[p];
  #Calculate the mins over bins
  sapply(levels(p),function(i) min(x[p==i]))->minl;
  mins<-minl[p];
  #Each dot is one value.
  #X is the order of a value inside its bin, shifted so that values lower than the bin mean go below 0
  X<-rep(0,length(x));
  for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
  #Y is the bin minimum + the absolute value of the difference between the value and its bin mean
  plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet, there is also ggbeeswarm, a relatively new R package based on ggplot2. It adds another geom to ggplot2 that can be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Also noteworthy is the package vipor (VIolin POints in R), which produces plots using standard R graphics and is in fact used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()
