Wrong coordinates in R plot (package e1071)

I'm trying to add a single point to a plot.tune plot, but it ends up in the wrong place.
I'm calling
plot(radial.tune2, color.palette = topo.colors)
points(radial.tune2$best.parameters, pch=20, col='red')
where radial.tune2 is a tune object.
The parameters are as follows:
> radial.tune2$best.parameters
    cost   gamma
52 0.385 0.04125
I think this happens because the scale bar is not taken into account: when I re-scaled the plot, the red dot appeared in different places relative to the axes.
I looked through the documentation but found nothing about adding points to plots of tune objects.
I found this question, which seems to have a similar origin but does not address the problem properly.
Edit: here is a simplified example of the same problem.
It produces a different plot, but with the same shifting-to-the-right property.
library(mlbench)
library(e1071)
data(BreastCancer)
BreastCancer$Id <- NULL
BreastCancer <- na.omit(BreastCancer)
data <- as.data.frame(lapply(BreastCancer, as.numeric))
data$Class <- BreastCancer$Class
names(data) <- names(BreastCancer)
C.range <- seq(0.1, 0.7, by=0.05)
gamma.range <- seq(0.001, 0.30, by=0.05)
set.seed(1)
radial.tune <- tune(svm, train.x=data[,-10], # 10 -> Class
                    train.y=data[,"Class"],
                    kernel="radial",
                    ranges=list(cost=C.range, gamma=gamma.range),
                    tunecontrol=tune.control(sampling="bootstrap"))
plot(radial.tune, color.palette = topo.colors)
points(radial.tune$best.parameters, pch=20, col='red')

I found out that this is actually a problem with filled.contour, which is called under the hood.
See this thread for more info.
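For reference, the filled.contour help page notes that the function sets up two separate coordinate systems (the contour panel and the colour key) which are lost once it returns, and suggests putting annotation commands in its plot.axes argument instead. Below is a minimal sketch of that workaround on a synthetic error surface. The `err` values are made up for illustration, and whether plot.tune forwards plot.axes through `...` is not checked here.

```r
# Synthetic "error" surface over a cost/gamma grid (hypothetical values,
# not the output of tune())
cost  <- seq(0.1, 0.7, by = 0.05)
gam   <- seq(0.001, 0.30, by = 0.05)
err   <- outer(cost, gam, function(cst, g) (cst - 0.4)^2 + (g - 0.1)^2)

# Grid location of the smallest error
best       <- which(err == min(err), arr.ind = TRUE)[1, ]
best.cost  <- cost[best["row"]]
best.gamma <- gam[best["col"]]

# Draw the point inside plot.axes so that it uses the coordinate
# system of the contour panel rather than that of the whole figure
filled.contour(cost, gam, err, color.palette = topo.colors,
               xlab = "cost", ylab = "gamma",
               plot.axes = {
                 axis(1)
                 axis(2)
                 points(best.cost, best.gamma, pch = 20, col = "red")
               })
```

Because plot.axes replaces the default axis drawing, axis(1) and axis(2) have to be called explicitly.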

Related

Having trouble with paths returned by flowPath in raster R-package

Paths returned by the flowPath function in the raster package consist of segments parallel to the x- and y-axes.
Starting with the Vector Field Plots example in the rasterVis documentation (https://oscarperpinan.github.io/rastervis/), I try to find the flow-path from a starting point on the surface, but the path output is incorrect.
library(raster)
library(rasterVis)
proj <- CRS('+proj=longlat +datum=WGS84')
df <- expand.grid(x = seq(-2, 2, .01), y = seq(-2, 2, .01))
df$z <- with(df, (3*x^2 + y)*exp(-x^2-y^2))
r <- rasterFromXYZ(df, crs=proj)
# Up to this point we follow the example in the rasterVis documentation
# Now attempt to find the path from a point on the surface
contour(r$z)
r.fd<-terrain(r,opt='flowdir')
r.c<-cellFromXY(r,cbind(-1,0))
r.p<-flowPath(r.fd,r.c)
p.xy<-xyFromCell(r.fd,r.p)
lines(p.xy,col='green')
Flow path from point (-1,0) depicting undesired behavior.
As you can see above, the flow path proceeds to the minimum at approximately (0,-.8) by moving towards +x and then -y. I have been unable to construct a data set that does not exhibit this problem. However, the example included in the flowPath documentation (in the raster package, using the volcano data) produces the output one might expect and does not exhibit this problem.
What am I doing incorrectly that I cannot extend the example in the rasterVis documentation?
Addendum: My reason for questioning the output may be more a misunderstanding of what flowPath is supposed to return. I expected the kind of path a droplet might follow as it moves downhill. Like this:
Expected flowPath
This was computed using a simple steepest-descent walk. However, if (as stated by respondent Hijmans) flowPath is working as intended, then I may need to find another function which provides the path droplets would follow moving downhill.
Why is the path incorrect? It looks good to me. This is illustrated below by aggregating the raster and labeling the cells.
library(raster)
proj <- CRS('+proj=longlat +datum=WGS84')
df <- expand.grid(x = seq(-2, 2, .01), y = seq(-2, 2, .01))
df$z <- with(df, (3*x^2 + y)*exp(-x^2-y^2))
r <- rasterFromXYZ(df, crs=proj)
r <- aggregate(r, 25) * 10
r.fd <- terrain(r, opt='flowdir')
r.p <- flowPath(r.fd, cbind(-1,0))
p.xy <- xyFromCell(r.fd,r.p)
plot(r)
lines(p.xy,col='green', lwd=2)
text(r)
Add some noise to get a more wiggly path
set.seed(01234)
r <- rasterFromXYZ(df, crs=proj)
r <- aggregate(r, 10) * 10
r <- r + runif(ncell(r), 1, 2)
r.fd <- terrain(r, opt='flowdir')
r.p <- flowPath(r.fd, cbind(-1,0))
p.xy <- xyFromCell(r.fd,r.p)
plot(r)
lines(p.xy,col='green', lwd=2)
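For comparison, the "droplet" path described in the question can be approximated by a plain steepest-descent walk on the surface itself rather than on grid cells. This is only a sketch under assumptions: it is not the asker's code, it uses the analytic surface from the example with a central-difference gradient, and the helper names (`z`, `grad`, `descend`) and the step size are chosen for illustration.

```r
# Surface from the example above
z <- function(x, y) (3 * x^2 + y) * exp(-x^2 - y^2)

# Central-difference approximation of the gradient at point p = c(x, y)
grad <- function(p, h = 1e-5) {
  c((z(p[1] + h, p[2]) - z(p[1] - h, p[2])) / (2 * h),
    (z(p[1], p[2] + h) - z(p[1], p[2] - h)) / (2 * h))
}

# Fixed-step steepest descent; stops when the gradient is nearly zero
descend <- function(start, step = 0.05, max.iter = 5000, tol = 1e-6) {
  path <- matrix(NA_real_, nrow = max.iter + 1, ncol = 2)
  path[1, ] <- start
  n <- 1
  for (i in seq_len(max.iter)) {
    g <- grad(path[i, ])
    if (sqrt(sum(g^2)) < tol) break
    path[i + 1, ] <- path[i, ] - step * g
    n <- i + 1
  }
  path[seq_len(n), , drop = FALSE]
}

path <- descend(c(-1, 0))

# Overlay the walk on a contour plot of the same surface
xs <- seq(-2, 2, 0.05)
contour(xs, xs, outer(xs, xs, z))
lines(path, col = "red", lwd = 2)
```

Unlike flowPath, which moves from cell to neighbouring cell, this walk is not restricted to the grid, so it can follow the gradient at any angle.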

rarecurve() plotted with Standard Error

Does rarecurve() (vegan) accept standard error for plotting?
If so, how can I plot such a curve?
I am following a classical script for this, with the BCI dataset:
S <- specnumber(BCI)
(raremax <- min(rowSums(BCI)))
Srare <- rarefy(BCI, raremax)
plot(S, Srare, xlab = "Observed No. of Species", ylab = "Rarefied No. of Species")
abline(0, 1)
rarecurve(BCI, step = 20, sample = raremax, col = "blue", cex = 0.6)
Statistically speaking, a function like this would be helpful to most vegan users.
Thank you!
André
rarecurve does not give you SE. The reason is obvious and already given to you: there is enough clutter without extra curves. If you really want to do this, you must do it manually. That is not too complicated, because the rarefy function accepts a vector of sample sizes and gives you all the numbers you need. The following draws a basic plot using one site of the Barro Colorado data set:
library(vegan)
data(BCI)
sum(BCI[1,]) # site 1, 448 tree stems
N <- seq(2, 448, by=8)
S <- rarefy(BCI[1,], N, se = TRUE)
plot(N, S[1,], type="l", lwd=3)
lines(N, S[1,] + 2*S[2,]) ## 2*SE is good enough for 95% CI
lines(N, S[1,] - 2*S[2,])
Statistically speaking, this gives you only the error caused by the subsampling process assuming that the observed data have no random variation. To me this makes little sense, and I find the rarefaction SE's misleading and meaningless. That does not stop me providing them in vegan.
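To make "the error caused by the subsampling process" concrete: classical rarefaction is the expected number of species in a random subsample of n individuals drawn without replacement from the observed counts. Below is a minimal base-R sketch of that expectation. The function name `rarefy.sketch` and the counts are hypothetical, and this is not vegan's implementation.

```r
# Expected number of species in a subsample of n individuals, drawn
# without replacement from a community with the given species counts
rarefy.sketch <- function(counts, n) {
  counts <- counts[counts > 0]
  N <- sum(counts)
  # choose(N - counts, n) / choose(N, n) is the probability that a
  # species with that count is missing from the subsample
  sum(1 - choose(N - counts, n) / choose(N, n))
}

counts <- c(10, 5, 3, 1, 1)      # hypothetical site with 5 species, 20 stems

rarefy.sketch(counts, 1)         # one individual: exactly one species
rarefy.sketch(counts, 10)        # somewhere between 1 and 5
rarefy.sketch(counts, 20)        # the full sample: all 5 species
```

The only randomness here comes from which individuals end up in the subsample; the observed counts themselves are treated as fixed, which is the limitation described above.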

What does autoplot.microbenchmark actually plot?

According to the docs, microbenchmark:::autoplot "Uses ggplot2 to produce a more legible graph of microbenchmark timings."
Cool! Let's try the example code:
library("microbenchmark")
library("ggplot2")
tm <- microbenchmark(rchisq(100, 0),
                     rchisq(100, 1),
                     rchisq(100, 2),
                     rchisq(100, 3),
                     rchisq(100, 5), times=1000L)
autoplot(tm)
I don't see anything about the...squishy undulations in the documentation, but my best guess from this answer by the function creator is that this is like a smoothed series of boxplots of the time taken to run, with the upper and lower quartiles connected over the body of the shape. Maybe? These plots look too interesting not to find out what is going on here.
What is this a plot of?
The short answer is a violin plot:
It is a box plot with a rotated kernel density plot on each side.
The longer, more interesting(?) answer: when you call the autoplot function, you are actually calling
## class(tm) is microbenchmark
autoplot.microbenchmark
We can then inspect the actual function call via
R> getS3method("autoplot", "microbenchmark")
function (object, ..., log = TRUE, y_max = 1.05 * max(object$time))
{
y_min <- 0
object$ntime <- convert_to_unit(object$time, "t")
plt <- ggplot(object, ggplot2::aes_string(x = "expr", y = "ntime"))
## Another ~6 lines or so after this
The key line is + stat_ydensity(). Looking at ?stat_ydensity, you come to the help page on violin plots.
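To see what the outline of a violin consists of, here is a rough base-R sketch that mirrors a kernel density estimate around a vertical axis. It only illustrates the idea and is not what ggplot2 does internally; the sample data and the half-width of 0.4 are arbitrary choices.

```r
set.seed(1)
x <- rchisq(1000, 3)             # arbitrary sample standing in for timings

d    <- density(x)               # kernel density estimate of the values
half <- d$y / max(d$y) * 0.4     # density rescaled to a half-width of 0.4

# Empty plot, then the density drawn on both sides of x = 0
plot(NA, xlim = c(-0.5, 0.5), ylim = range(d$x),
     xaxt = "n", xlab = "", ylab = "value")
polygon(c(-half, rev(half)), c(d$x, rev(d$x)), col = "grey80")
```

The widest part of the shape marks the values that occur most often.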

function lines() is not working

I have a problem with the function lines.
This is what I have written so far:
model.ew<-lm(Empl~Wage)
summary(model.ew)
plot(Empl,Wage)
mean<-1:500
lw<-1:500
up<-1:500
for(i in 1:500){
  mean[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[1]
  lw[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[2]
  up[i]<-predict(model.ew,data.frame(Wage=i*100),interval="confidence",level=0.90)[3]
}
plot(Wage,Empl)
lines(mean,type="l",col="red")
lines(up,type="l",col="blue")
lines(lw,type="l",col="blue")
My problem is that no line appears on my plot, and I cannot figure out why.
Can somebody help me?
You really need to read some introductory manuals for R. Go to this page, and select one that illustrates using R for linear regression: http://cran.r-project.org/other-docs.html
First we need to make some data:
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
Now we run your regression and plot the lines:
model.ew <- lm(Empl~Wage)
summary(model.ew)
plot(Empl~Wage) # Note. You had the axes flipped here
Your first problem was that you flipped the axes. The dependent variable (Empl) goes on the vertical axis. That is the main reason you didn't get any lines on the plot. To get the prediction lines requires no loops at all and only a single plot call using matlines():
xval <- seq(min(Wage), max(Wage), length.out=101)
conf <- predict(model.ew, data.frame(Wage=xval),
interval="confidence", level=.90)
matlines(xval, conf, col=c("red", "blue", "blue"))
That's all there is to it.
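A related detail that may also matter in the original code: when lines() is given a single vector, it plots that vector against its index 1, 2, ..., n, not against the predictor. A minimal sketch with the simulated data from above (the names `fit` and `idx` are only for illustration):

```r
set.seed(42)
Wage <- rnorm(100, 50)
Empl <- Wage + rnorm(100, 0)
model.ew <- lm(Empl ~ Wage)

xval <- seq(min(Wage), max(Wage), length.out = 101)
fit  <- predict(model.ew, data.frame(Wage = xval))

# With a single vector, the x values default to the index 1, 2, ..., n
idx <- xy.coords(fit)$x

plot(Empl ~ Wage)
lines(fit, col = "grey")         # drawn against x = 1..101, not against Wage
lines(xval, fit, col = "red")    # drawn at the intended x positions
```

So a call such as lines(mean) in the question would be drawn over x = 1..500, whatever the range of Wage on the plot is.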

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And I wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I've come across online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
  unique.vals <- length(unique(x))
  N <- length(x)
  N.val <- min(N/20,unique.vals)
  if(unique.vals>N.val){
    x <- ave(x,cut(x,N.val),FUN=min)
    x <- signif(x,4)
  }
  # construct the outline of the plot
  outline <- as.vector(table(x))
  outline <- outline/max(outline)
  # determine some correction to make the V shape,
  # based on the range
  y.corr <- diff(range(x))*0.05
  # Get the unique values
  yval <- sort(unique(x))
  plot(c(-1,1),c(min(yval),max(yval)),
       type="n",xaxt="n",xlab="")
  for(i in 1:length(yval)){
    n <- sum(x==yval[i])
    x.plot <- seq(-outline[i],outline[i],length=n)
    y.plot <- yval[i]+abs(x.plot)*y.corr
    points(x.plot,y.plot,pch=19,cex=0.5)
  }
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, which bears some similarity.
The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points.
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's; still, I think this is more than a stem plot. By this I mean that the y value in each series is the absolute value of the distance to the in-bin mean, and the x value reflects whether the value is lower or higher than the mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
  x<-sort(x);
  #Cutting in bins
  cut(x,N)->p;
  #Calculate the means over bins
  sapply(levels(p),function(i) mean(x[p==i]))->meansl;
  means<-meansl[p];
  #Calculate the mins over bins
  sapply(levels(p),function(i) min(x[p==i]))->minl;
  mins<-minl[p];
  #Each dot is one value.
  #X is the rank of a value inside its bin, shifted so that values below the bin mean get X below 0
  X<-rep(0,length(x));
  for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
  #Y is the bin minimum + the absolute difference between the value and its bin mean
  plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet, there is also ggbeeswarm, a relatively new R package based on ggplot2.
It adds another geom to ggplot, to be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Also noteworthy is the package vipor (VIolin POints in R), which produces plots using standard R graphics and is in fact used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()

Resources