GraphPad Mann Whitney scatter plot in R - r

I am trying to make a plot similar to the top three plots of this:
I found a partial answer here, however I am unsure how to add the p-values in the scatter-plot.
Any tips?

You've already got a partial answer. If you just want to know how to put p-values on the plot, use text() (the coordinates below are read off graph C).
text(x = 1.5, y = 73, 'p = 0.03')
If you want the p-values and the lines underneath, assuming you also want those caps on the lines, use arrows instead of segments. Give both end points and keep length (the size of the caps, in inches) small.
arrows(1, 70, 2, 70, length = 0.05, angle = 90, code = 3)
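Putting the two calls together, here is a minimal base R sketch. The two groups are simulated data invented for illustration, and the bracket height is simply placed a little above the largest value; adjust both to your own plot.

```r
# Simulated example data (hypothetical, not from the question)
set.seed(42)
groups <- list(A = rnorm(10, mean = 50, sd = 8),
               B = rnorm(10, mean = 60, sd = 8))

# Mann-Whitney (Wilcoxon rank-sum) test between the two groups
p <- wilcox.test(groups$A, groups$B)$p.value

# Leave head room above the data for the bracket and label
top <- max(unlist(groups))
stripchart(groups, vertical = TRUE, method = "jitter", pch = 16,
           ylim = c(min(unlist(groups)), top * 1.15))

# Bracket with flat caps from group 1 to group 2, p-value centred above it
arrows(1, top * 1.05, 2, top * 1.05, length = 0.05, angle = 90, code = 3)
text(x = 1.5, y = top * 1.10, labels = sprintf("p = %.3f", p))
```

stripchart() places the groups at x = 1 and x = 2, which is why the bracket runs between those positions.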
If you're sticking with solving this in base R that's a great learning exercise and can give you full control over your plot. However, if you just want to get it done I'd suggest the beeswarm package (you're making beeswarm plots).
As an aside, this prompted me to investigate why you get those upward curving lines in beeswarm plots. It's a consequence of the typical algorithm. The line curves upward because the positions are calculated in order of increasing y-value. If the next y-value is so close that the points would overlap along the y-axis, it is plotted at an angle off the x position. Many points close together on Y result in upward curving lines until you get far enough along Y to go back to the central x position. Smaller points should alleviate that. Also, the beeswarm package in R has several optional algorithms that avoid that as well.
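To make the mechanism concrete, here is a simplified sketch of that kind of placement rule in base R. It is not the beeswarm package's implementation; the function name simple_swarm and the point diameter d (in the same units as y) are assumptions made for illustration.

```r
# Place points in order of increasing y; each point takes the x offset
# closest to the centre line that keeps it at least d away from the
# points already placed.
simple_swarm <- function(y, d = 1) {
  x <- numeric(length(y))
  placed <- integer(0)
  for (i in order(y)) {
    # Only already-placed points within d vertically can collide
    near <- placed[abs(y[placed] - y[i]) < d]
    if (length(near) > 0) {
      # x offsets at which point i would just touch each neighbour
      dx <- sqrt(d^2 - (y[near] - y[i])^2)
      cand <- c(0, x[near] + dx, x[near] - dx)
      # Keep candidates that do not overlap any neighbour
      ok <- sapply(cand, function(cx)
        all((cx - x[near])^2 + (y[near] - y[i])^2 >= d^2 - 1e-9))
      cand <- cand[ok]
      x[i] <- cand[which.min(abs(cand))]
    }
    placed <- c(placed, i)
  }
  x
}

set.seed(1)
y <- rnorm(30)
x <- simple_swarm(y, d = 0.3)
plot(x, y, asp = 1, pch = 16)
```

Because each new point is slightly higher than its neighbours and is pushed sideways only as far as needed, runs of close y-values trace the upward curving arms described above.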

Related

Trying to find point of intersection using approx function, results are correct on y but off on x axis

Working in R, I am attempting to plot stream cross sections, interpolate a point at an intersection opposite an identified "bankful" point, and calculate the area under the bankful line. It is part of a loop that is processing many cross sections. The best solution I have come up with is using the approx function; however, not all of the points land exactly on the point of intersection, and I have not been able to figure out what I am doing wrong.
It is hard to provide sample data since it is part of a loop, but the sample of code below produces the result in the image. The blue triangle is supposed to be at the point of intersection between the dashed "bankful" line and the solid cross section perimeter line.
###sample data
stn.sub.sort <- data.frame(dist = c(0,1.222,2.213,2.898,4.453,6.990,7.439,7.781,8.753,10.824,10.903,13.601,17.447), depth=c(-0.474,-0.633,0,-0.349,-1.047,-2.982,-2.571,-3.224,-3.100,-3.193,-2.995,-0.065,-0.112), Bankful = c(0,0,0,0,1,0,0,0,0,0,0,0,0))
###plot cross section with identified bankful
plot(stn.sub.sort$dist,
as.numeric(stn.sub.sort$depth),
type="b",
col=ifelse(stn.sub.sort$Bankful==1,"red","black"),
ylab="Depth (m)",
xlab="Station (m)",
ylim=range(stn.sub.sort$depth),
xlim=range(stn.sub.sort$dist),
main="3")
###visualize bankful line of intersection
abline(h=stn.sub.sort$depth[stn.sub.sort$Bankful==1],
lty=2,
col="black")
###approximate point at intersection
index.bf=which(stn.sub.sort$Bankful==1)
index.approx<-which(stn.sub.sort$dist>stn.sub.sort$dist[index.bf])
sbf <- approx(stn.sub.sort$depth[index.approx],
stn.sub.sort$dist[index.approx],
xout=stn.sub.sort$depth[index.bf])
###plot opposite bankful points
points(sbf$y,sbf$x,pch=2,col="blue")
So your description leaves many questions about the nature of the data that you will have to deal with. I am going to assume that it will be roughly like your example - down from the first bankful point, then going back up with the curve crossing the depth of the bankful point just one more time.
With that assumption, it is easy to find the point before and the point after the crossing point. You just need to draw the line between those two points and solve for the correct dist value. I do this below by using approxfun to get the inverse function of the line connecting the two points. Then we can just plug in to get the dist-value of the crossing point.
BankfulDepth = stn.sub.sort$depth[stn.sub.sort$Bankful==1]
Low = max(which(stn.sub.sort$depth < BankfulDepth))
InvAF = approxfun(stn.sub.sort$depth[c(Low,Low+1)],
stn.sub.sort$dist[c(Low,Low+1)])
points(InvAF(BankfulDepth), BankfulDepth, pch=2,col="blue")
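Since the question mentions a loop over many cross sections, the same idea can be wrapped in a function. This is a sketch under the same assumption as above (the profile crosses the bankful depth once more, to the right of its deepest part); the function name opposite_bank and the NA return for profiles without a crossing are choices made for illustration.

```r
# Sample cross section from the question
stn.sub.sort <- data.frame(
  dist  = c(0, 1.222, 2.213, 2.898, 4.453, 6.990, 7.439, 7.781, 8.753,
            10.824, 10.903, 13.601, 17.447),
  depth = c(-0.474, -0.633, 0, -0.349, -1.047, -2.982, -2.571, -3.224,
            -3.100, -3.193, -2.995, -0.065, -0.112),
  Bankful = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0))

# Station (dist) at which the profile climbs back through the bankful depth
opposite_bank <- function(dist, depth, bf_index) {
  bf_depth <- depth[bf_index]
  below <- which(depth < bf_depth)
  if (length(below) == 0) return(NA_real_)       # never goes below bankful
  low <- max(below)                              # last point below the line
  if (low == length(depth)) return(NA_real_)     # never climbs back up
  # Interpolate dist as a function of depth on the one crossing segment
  approx(depth[c(low, low + 1)], dist[c(low, low + 1)], xout = bf_depth)$y
}

bf <- which(stn.sub.sort$Bankful == 1)
x_cross <- opposite_bank(stn.sub.sort$dist, stn.sub.sort$depth, bf)
```

Restricting approx to the single segment that straddles the bankful depth avoids interpolating over a depth vector that is not monotonic.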

Area of polygon in ordiellipse is NaN - why?

I'm trying to add ellipses onto my NMDS plot created with Vegan package on R, but although the code goes through without an error, no polygons get drawn onto my graph. After using the summary() function, I found that the area of the polygon is NaN, hence why no polygons get drawn. I'm not sure why I don't have an area - is it something to do with my data?
My data can be found here: https://docs.google.com/spreadsheets/d/1uxWbKAvhdVqnorIMXURvYLrDZuoqejJpUsc9N6wSDxA/edit?usp=sharing
Three transects were done in three types of habitat - Interior forest, edge of the forest and disturbed habitat. Each dragonfly and damselfly seen was counted.
My R code is as follows:
OdonateNMDSdata <- read.csv(file.choose(), header=TRUE)
Odonaterownames <- row.names(OdonateNMDSdata) <- c("Interior", "Edge", "Disturbed")
library(vegan)
OdonateNMDS <- metaMDS(OdonateNMDSdata, k=2)
ordiplot(OdonateNMDS,type="n")
orditorp(OdonateNMDS,display="species",col="red",air=0.01)
orditorp(OdonateNMDS,display="sites",cex=1.25,air=0.01)
Ellipse <- ordiellipse(OdonateNMDS, groups=Odonaterownames, kind = "ehull", draw="polygon", col="blue", cex=0.7, conf=0.95)
summary(Ellipse)
Thanks
You have three points, and you want to draw three ellipses, one for each point. You need more than one point for each ellipse (and even for two points the enclosing ellipse would be a line connecting the points).
However, it seems that with enclosing ellipse (kind = "ehull") we give NaN as the area of one-point-ellipse, whereas with other kinds we give the area as 0 for one point. I'll change that.
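A quick way to catch this before plotting is to count the sites in each group. The sketch below uses only base R; the threshold of three sites per group is an assumption that follows from the point above (one point has no ellipse, two points give a line).

```r
# Grouping vector as used in the question: one site per habitat
groups <- c("Interior", "Edge", "Disturbed")

# Number of sites in each group
sites_per_group <- table(groups)

# Groups with too few sites for an ellipse with non-zero area
too_small <- names(sites_per_group)[sites_per_group < 3]
if (length(too_small) > 0) {
  message("Too few sites for an ellipse in: ",
          paste(too_small, collapse = ", "))
}
```

With the data from the question, every group is flagged, which matches the NaN areas reported by summary().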

R - locate intersection of two curves

There are a number of questions in this forum on locating intersections between a fitted model and some raw data. However, in my case, I am in an early stage project where I am still evaluating data.
To begin with, I have created a data frame that contains a ratio value whose ideal value should be 1.0. I have plotted the data frame and also used abline() function to plot a horizontal line at y=1.0. This horizontal line and the plot of ratios intersect at some point.
plot(a$TIME.STAMP, a$PROCESS.RATIO,
xlab='Time (5s)',
ylab='Process ratio',
col='darkolivegreen',
type='l')
abline(h=1.0,col='red')
My aim is to locate the intersection point, say x and draw two vertical lines at x±k, as abline(v=x-k) and abline(v=x+k) where, k is certain band of tolerance.
Applying a grid on the plot is not really an option because this plot will be a part of a multi-panel plot. And, because ratio data is very tightly laid out, the plot will not be too readable. Finally, the x±k will be quite valuable in my discussions with the domain experts.
Can you please guide me how to achieve this?
Here are two solutions. The first one uses locator() and will be useful if you do not have too many charts to produce:
x <- 1:5
y <- log(1:5)
df1 <-data.frame(x= 1:5,y=log(1:5))
k <-0.5
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
locator()
By clicking on the intersection (and stopping the locator top left of the chart), you will get the intersection:
> locator()
$x
[1] 2.765327
$y
[1] 1.002495
You would then add abline(v=2.765327).
If you need a more programmable way of finding the intersection, we will have to estimate the function of your data. Unfortunately, you haven’t provided us with PROCESS.RATIO, so we can only guess what your data looks like. Hopefully, the data is smooth. Here’s a solution that should work with nonlinear data. As you can see in the previous chart, all R does is draw a line between the dots. So, we have to fit a curve in there. Here I’m fitting the data with a polynomial of order 2. If your data is less linear, you can try increasing the order (2 here). If your data is linear, use a simple lm.
fit <-lm(y~poly(x,2))
newx <-data.frame(x=seq(0,5,0.01))
fitline = predict(fit, newdata=newx)
est <-data.frame(newx,fitline)
plot(df1,type="o",lwd=2)
abline(h=1, col="red")
lines(est, col="blue",lwd=2)
Using this fitted curve, we can then find the closest point to y=1. Once we have that point, we can draw vertical lines at the intersection and at +/-k.
cross <- est[which.min(abs(1 - est$fitline)), ] # row of est closest to y = 1
plot(df1, type="o", lwd=2)
abline(h=1)
abline(v=cross$x, col="green")
abline(v=cross$x - k, col="purple")
abline(v=cross$x + k, col="purple")
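If you would rather not fit a curve at all, a third option is to interpolate linearly along the plotted line itself, since that is exactly what type = "l" draws. The sketch below uses the same toy data; the helper name crossings is hypothetical, and points lying exactly on the level are ignored for simplicity.

```r
# x positions where the polyline through (x, y) crosses a horizontal level
crossings <- function(x, y, level = 1) {
  s <- sign(y - level)
  # Segments whose end points lie strictly on opposite sides of the level
  i <- which(s[-1] * s[-length(s)] < 0)
  # Linear interpolation within each such segment
  x[i] + (level - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
}

x <- 1:5
y <- log(1:5)
k <- 0.5
xc <- crossings(x, y, level = 1)

plot(x, y, type = "o", lwd = 2)
abline(h = 1, col = "red")
abline(v = xc, col = "green")
abline(v = c(xc - k, xc + k), col = "purple")
```

The function returns every crossing, so noisy data that wanders across the line several times will give several x values to choose from.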

Using Matlab, how does the visual geometric angle of a regression line change as I alter the axes of the graph?

I know that you can adjust the scale of the x and y axes to change the geometric angle of a regression line. For example, if you plotted a regression line with slope of b=0.3, perhaps the default settings of axes length etc. would create a regression angle of 35 degrees.
If you adjust the axes, you will change the angle the regression line makes with the x-axis so that it is greater or less than 35 degrees-WITHOUT changing the mathematical value of the slope--it will still stay as b=0.3.
What systematic equation/set of equations is there that allows me to know how the geometric angle of the regression line will be changed as I change the axes of the graph itself?
I have spent a lot of time on the internet looking for the answer to this and have not yet succeeded. For some reason statistics and geometry do not overlap much.
Refer to this web page: http://www.mathworks.in/help/matlab/ref/axis.html
Based on the data you have, set the same ranges for all the axes in your plot. Then the regression line would have the same angle for both the datasets.
Hope this helps!
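As for the relationship asked about: the on-screen angle depends on the slope and on how many physical units (pixels, centimetres) each axis devotes to one data unit. A minimal sketch of that arithmetic follows, written in R to match the other examples on this page; the same formula applies in MATLAB. The function name and arguments are hypothetical, and width and height refer to the physical size of the plotting box, not the figure window.

```r
# Visual angle (degrees) of a line with slope b drawn in a plotting box of
# physical size width x height that spans xrange and yrange in data units
visual_angle <- function(b, xrange, yrange, width, height) {
  x_scale <- width / xrange    # physical units per data unit along x
  y_scale <- height / yrange   # physical units per data unit along y
  atan(b * y_scale / x_scale) * 180 / pi
}

# Equal scaling on both axes: the angle is simply atan(b)
visual_angle(0.3, xrange = 10, yrange = 10, width = 4, height = 4)

# Halving the y range doubles the y scale, so the line looks steeper
visual_angle(0.3, xrange = 10, yrange = 5, width = 4, height = 4)
```

The slope b never changes; only the ratio y_scale / x_scale does, which is why fixing the same ranges and box size across plots keeps the angles comparable.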

Calculating the volume under a surface

I have created a 3D plot (a surface) using the wireframe function. I wonder if there are any functions with which I can calculate the volume under the surface in a 3D plot?
Here is a sample of my data plus the wireframe syntax I used to create my 3D (surface) plot:
x1<-c(13,27,41,55,69,83,97,111,125,139)
x2<-c(27,55,83,111,139,166,194,222,250,278)
x3<-c(41,83,125,166,208,250,292,333,375,417)
x4<-c(55,111,166,222,278,333,389,445,500,556)
x5<-c(69,139,208,278,347,417,487,556,626,695)
x6<-c(83,166,250,333,417,500,584,667,751,834)
x7<-c(97,194,292,389,487,584,681,779,876,974)
x8<-c(111,222,333,445,556,667,779,890,1001,1113)
x9<-c(125,250,375,500,626,751,876,1001,1127,1252)
x10<-c(139,278,417,556,695,834,974,1113,1252,1391)
df<-data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10)
df.matrix<-as.matrix(df)
library(lattice)
wireframe(df.matrix,
aspect = c(61/87, 0.4),scales=list(arrows=FALSE,cex=.5,tick.number="10",z=list(arrows=T)),ylim=c(1:10),xlab=expression(phi1),ylab="Percentile",zlab=" Loss",main="Random Classifier",
light.source = c(10,10,10),drape=T,col.regions = rainbow(100, s = 1, v = 1, start = 0, end = max(1,100 - 1)/100, alpha = 1),screen=list(z=-60,x=-60))
Note: my real data is a 100X100 matrix
Thanks
The data you are feeding to wireframe is a grid of values. Hence one estimate of the volume of whatever underlying surface this is approximating is the sum of the grid values multiplied by the grid cell areas. This is just like adding up the heights of histogram bars to get the number of values in your histogram.
The problem I see with you doing this on your data is that the cell areas are going to be in odd units - percentiles on one axis, phi on the other has unknown units, so your volume is going to have units of loss times units of percentile times units of phi.
This isn't a problem if you want to compare volumes of similar things on exactly the same grid, but if you have surfaces on different grids (different values of phi, or different percentiles) then you need to be careful.
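The cell-sum estimate described above takes one line. In this sketch the function name grid_volume and the cell sizes dx and dy are assumptions; with the default of 1 the result is in "loss times grid cells", which is only meaningful for comparing surfaces on the same grid.

```r
# Histogram-style estimate: every grid value is a tower of height z
# standing on a cell of area dx * dy
grid_volume <- function(z, dx = 1, dy = 1) {
  sum(z) * dx * dy
}

# Small made-up grid for illustration
z <- matrix(c(1, 2, 3, 4), nrow = 2)
grid_volume(z)                     # unit cells
grid_volume(z, dx = 0.5, dy = 2)   # cells 0.5 wide and 2 deep
```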
Now, noting that wireframe doesn't draw like a 3d histogram would (looking like square tower blocks), this gives us another way to estimate the volume. Your 10x10 matrix is plotted as 9x9 squares. Divide each of those squares into two triangles and then compute the volume of the resulting 162 truncated triangular prisms (each stands on a right-triangle base and has one sloping end). The volume of each prism is its base area times the mean of its three vertical edge heights, which is the height of the sloping end above the centroid of the base triangle.
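A sketch of the triangular-prism estimate for a regular grid follows. The function name prism_volume, the grid spacings dx and dy, and the choice of which diagonal splits each square are all assumptions; for a curved surface the other diagonal gives a slightly different number.

```r
# Volume under the piecewise-linear surface through a regular grid of
# heights z, splitting every grid square into two right triangles
prism_volume <- function(z, dx = 1, dy = 1) {
  nr <- nrow(z)
  nc <- ncol(z)
  # Heights at the four corners of every grid square
  z00 <- z[-nr, -nc, drop = FALSE]
  z10 <- z[-1,  -nc, drop = FALSE]
  z01 <- z[-nr, -1,  drop = FALSE]
  z11 <- z[-1,  -1,  drop = FALSE]
  tri_area <- dx * dy / 2
  # Each prism: base area times the mean of its three edge heights
  sum(tri_area * ((z00 + z10 + z01) / 3 + (z10 + z01 + z11) / 3))
}

# Check on a plane, where the piecewise-linear surface is exact:
# the integral of x + y over the unit square is 1
g <- seq(0, 1, length.out = 11)
z <- outer(g, g, "+")
prism_volume(z, dx = 0.1, dy = 0.1)
```

Unlike the cell-sum estimate, this treats the grid values as heights at the corners of the squares, so an n x n matrix covers (n - 1) x (n - 1) cells.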
I thought maybe this would be in the raster package, but it isn't. There's code for computing the surface area but not the volume! I'm sure the raster maintainer would be happy to have some code for this!
If the points are arbitrary (i.e., don't follow a smooth function), it seems like you're looking for the volume of the convex hull (minimum surface) surrounding these points. One package to help you calculate this is alphashape3d.
You'll need a 3-column matrix of the coordinates to form the right type of object to make the calculation but it seems rather straight-forward.
