Determining the mean center and standard distance of a dataset in R - r

I have a dataset called mypoints and I have created a polygon and plotted the points as below:
mypoints=read.csv("d:\\data\\venus.csv",header = T)
mypoints
minx=min(mypoints[,1])
maxx=max(mypoints[,1])
miny=min(mypoints[,2])
maxy=max(mypoints[,2])
mypolygon=cbind(c(minx,maxx,maxx,minx),c(miny,miny,maxy,maxy))
plot(mypoints)
polygon(mypolygon)
I now want to write a function that calculates both the mean center and standard distance for mypoints. I then need to plot the standard distance as a circle centered on the mean center of all points with the radius equal to the standard distance. Note that the last expression evaluated in a function becomes the return value, the result of invoking the function.
So far:
#I think this how I calculate the mean center for x and y:
x1=sum(mypoints[,1])/length(mypoints[,1])
y1=sum(mypoints[,2])/length(mypoints[,2])
#This is the formula I was shown for standard distance:
sd.mypoints=sqrt(sum(x1+y1)/n)
#This is the formula I was shown for creating the circle:
symbols(sd.mypoints[1],sd.mypoints[2],sd.mypoints$sd,add=T,inches=F)
#This is the error that I get when I run the circle formula:
Error in sd.mypoints$sd : $ operator is invalid for atomic vectors
I have found it easier to find the Nearest Neighbor, do KDE, Ghat, and Fhat for this dataset than trying to figure this out. I am sure there is a easy solution for this but I just can't seem to get it. Third class in R and it has been a lot of fun up to this point.

You have the line
symbols(sd.mypoints[1],sd.mypoints[2],sd.mypoints$sd,add=T,inches=F)
in your code. As said in the comments, sd.mypoint is not a data.frame, so subsetting it with sd.mypoint$sd` causes the error you see.
From the documentation of symbols, which you can access with ?symbols you'll see that for circles the circles argument is mandatory, so the function can differentiate what sort of figure is drawing.
EDIT:
Also, please notice that you are using x and y points to symbols different to the ones you already calculated. So you need to replace that line with:
symbols(x1, y1,circles = sd.mypoints,add=T,inches=F)
Notice the use of x1 and y1. I can see the plot now.

Related

How to find whole distance between two points in a curved line in R?

I have a similar line graph plotted using R plot function (plot(df))
I want to get distance of the whole line between two points in the graph (e.g., between x(1) and x(3)). How can I do this?
If your function is defined over a fine grid of points, you can compute the length of the line segment between each pair of points and add them. Pythagoras is your friend here:
To the extent that the points are not close enough together that the function is essentially linear between the points, it will tend to (generally only slightly) underestimate the arc length.
Note that if your x-values are stored in increasing order, these ẟx and ẟy values can be obtained directly by differencing (in R that's diff)
If you have a functional form for y as a function of x you can apply the integral for the arc length -- i.e. integrate
∫ √[1+(dy/dx)²] dx
between a and b. This is essentially just the calculation in 1 taken to the limit.
If both x and y are parametric functions of another variable (t, say) you can simplify the parametric form of the above integral (if we don't forget the Jacobian) to integrating
∫ √[(dx/dt)²+(dy/dt)²] dt
between a and b
(Note the direct parallel to 1.)
if you don't have a convenient-to-integrate functional form in 2. or 3. you can use numerical quadrature; this can be quite efficient (which can be handy when the derivative function is expensive to evaluate).

How to cleanly use interpolation between points to generate a mean in R

I am having issues trying to generate a code that will cleanly produce a mean (specifically a weighted average) based on a simple plot of points using interpolation.
For Example;
ex=c(1,2,3,4,5)
why=c(2,5,9,15,24)
This shows the kind of information I am working with.
plot(ex, why, type="o")
At this point, I want to actually have each point "binned" so the lines between them are straight. To do this, I have been adding points to the x values manually in excel as (x+0.01).
This is the new output:
why=c(2,2,5,5,9,9,15,15,24,24)
ex=c(1,2,2.01,3,3.01,4,4.01,5,5.01,6)
plot(ex, why, type="o")
So this is where my question comes in to play. I have to do this many times and do not want to generate a ton of new vectors and objects. To get a weighted average, I have been interpolating y values for increments of x at 0.01 using interpolation into a new object. I am then able to go into this new object and get a mean when a point falls between the actual ex values, i.e.
mean(newy[1:245])
Because I made new y values for 100 increments of x that (basically) follow a straight line, I am getting a weighted average here for x= 1 to 2.45.
Is there an easier and more elegant way to embed the interpolate code into the mean code so I could just say "average of interpolated y for nonreal x to nonreal x?"
It doesn't do exactly what you want, but you should consider the stepfun function -- this creates a step function out of two series.
plot(stepfun(ex[-1], why))
stepfun is handy because it gives you a function defined over that interval, so you can easily interpolate just by evaluating anywhere. The downside to it is that it is not strictly defined on the range given (which is why we have to cut off the first value in ex).
Based on your second plotting example, I think you are probably looking for this:
library(ggplot2)
qplot(ex, why, geom="step")
this gives:
Or if you want the line to go vertical first, you can use:
qplot(ex, why, geom="step", direction = "vh")
which gives:

How can i plot a single point using maxima/wxmaxima?

Im writing a basic script to find the min distance between f(x):=log(x)-x and the origin. I would like to be able to plot the point closest to the origin over the top of a plot of f(x), but i can not figure out how to plot a single point.
Here is what i have written. Any ideas?
f(x):=log(x)-x;
d(x):=sqrt(x^2+f(x)^2)$
find_root(diff(d(x),x),x,0.01,5)$
a:%;
f(a);
print("min distance from f(x) to (0,0)")$
d(a);
print("passes second derivative test if next value greater than zero")$
g(x):=''(diff(d(x),x,2))$
g(a);
wxplot2d([f(x)], [x,.01,5], [y,-6,0])$
Use the discrete option as a second curve, and then use points in the style option.
Replacing your last line with
wxplot2d([f(x), [discrete, [a], [f(a)]]], [x,.01,5], [y,-6,0],
[style, lines, points],
[legend, "log(x)-x", "closest point to origin"],
[point_type, circle],
[gnuplot_preamble, "set key bottom"])$
gives you this:

How to count line segment occurrences by pixel in R?

I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0,140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indictor of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there is a straight-forward way to do this in R? Otherwise would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as python or matlab?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted as e.g:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")

dimensions of kde object from ks package, R

I am using the ks package from R to estimate 2d space utilization using distance and depth information. What I would like to do is to use the 95% contour output to get the maximum vertical and horizontal distance. So essentially, I want to be able to get the dimensions or measurements of the resulting 95% contour.
Here is a piece of code with as an example,
require(ks)
dist<-c(1650,1300,3713,3718)
depth<-c(22,19.5,20.5,8.60)
dd<-data.frame(cbind(dist,depth))
## auto bandwidth selection
H.pi2<-Hpi(dd,binned=TRUE)*1
ddhat<-kde(dd,H=H.pi2)
plot(ddhat,cont=c(95),lwd=1.5,display="filled.contour2",col=c(NA,"palegreen"),
xlab="",ylab="",las=1,ann=F,bty="l",xaxs="i",yaxs="i",
xlim=c(0,max(dd[,1]+dd[,1]*0.4)),ylim=c(60,-3))
Any information about how to do this will be very helpful. Thanks in advance,
To create a 95% contour polygon from your 'kde' object:
library(raster)
im.kde <- image2Grid (list(x = ddhat$eval.points[[1]], y = ddhat$eval.points[[2]], z = ddhat$estimate))
kr <- raster(im.kde)
It is likely that one will want to resample this raster to a higher resolution before constructing polygons, and include the following two lines, before creation of the polygon object:
new.rast <- raster(extent(im.kde),res = c(50,50))
kr <- resample(kr, new.rast)
bin.kr <- kr
bin.kr[bin.kr < contourLevels(k, prob = 0.05)]<-NA
bin.kr[bin.kr > 0]<-1
k.poly<-rasterToPolygons(bin.kr,dissolve=T)
Note that the results are similar, but not identical, to Hawthorne Beier's GME function 'kde'. He does use the kde function from ks, but must do something slightly different for the output polygon.
At the moment I'm going for the "any information" prize rather than attempting a final answer. The ks:::plot.kde function dispatches to ks:::plotkde.2d in this case. It works its magic through side effects and I cannot get these functions to return values that can be inspected in code. You would need to hack the plotkde.2d function to return the values used to plot the contour lines. You can visualize what is in ddhat$estimate with:
persp(ddhat$estimate)
It appears that contourLevels examines the estimate-matrix and finds the value at which greater than the specified % of the total density will reside.
> contourLevels(ddhat, 0.95)
95%
1.891981e-05
And then draws the contout based on which values exceed that level. (I just haven't found the code that does that yet.)

Resources