Re-classifying a random matrix - r

I am brand new to R and in desperate need of help. I have created a random matrix and need to re-classify it. Each pixel is randomly generated from 0-255 and I need to be able to classify the 0-255 values into 8 classes. How would I do this? Any help would be greatly appreciated; I have placed my code below. I know I could use a raster but I am unsure how to use them.
Thanks
par(mar=rep(0,4))
m=matrix(runif(100),10,10)
image(m,axes=FALSE,col=grey(seq(0,1,length=255)))

I didn't think your example adequately fit your description of the problem (since runif only ranges from 0-1 if the limits are not specified), so I modified it to match the description:
m=matrix(runif(100, 0, 255),10,10)
m[] <- findInterval(m, seq(0, 256, length=9) )  # 9 break points give 8 classes
image(m,axes=FALSE,col=grey(seq(0,1,length=255)))
The "[]" with no indices preserves the matrix structure of the m object. The findInterval function lets you do the same sort of binning as cut, but it returns a numeric vector rather than the factor that cut would give.
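To make the comparison with cut concrete, here is a minimal sketch, assuming eight equal-width classes over 0-255 (which needs nine break points). The names breaks, classes and via_cut are illustrative only and do not come from the question.

```r
set.seed(1)
m <- matrix(runif(100, 0, 255), 10, 10)

# Nine break points delimit eight left-closed intervals [0,32), [32,64), ...
breaks <- seq(0, 256, length.out = 9)

# findInterval() returns the interval number; "[]" keeps the 10 x 10 shape
classes <- m
classes[] <- findInterval(m, breaks)

# cut() does the same binning but returns a factor, so convert its codes
via_cut <- as.integer(cut(as.vector(m), breaks = breaks, right = FALSE))

table(classes)                         # class counts, labelled 1 to 8
all(as.vector(classes) == via_cut)     # both approaches agree
```

With right = FALSE, cut uses the same left-closed intervals as findInterval, which is why the two results match.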

Related

Point pattern classification with spatstat: how to choose the right bandwidth?

I'm still trying to find the best way to classify bivariate point patterns:
Point pattern classification with spatstat: what am I doing wrong?
I have now analysed 110 samples of my dataset using @Adrian's suggestion with sigma=bw.diggle (as I wanted automatic bandwidth selection). f is a "resource selection function" (RSF) which describes the relationship between the intensity of the Cancer point process and the covariate (here the kernel density of Immune):
Cancer <- split(cells)[["tumor"]]
Immune <- split(cells)[["bcell"]]
Dimmune <- density(Immune,sigma=bw.diggle)
f <- rhohat(Cancer, Dimmune)
I am in doubt about some of the results I got. About a dozen of the rho functions looked weird (disrupted, single peak). After changing to the default sigma=NULL or to sigma=bw.scott (both of which give smoother estimates) the functions became "better" (see examples below). I also experimented with the following manipulations:
cells # bivariate point pattern with marks "tumor" and "bcell"
o.marks<-cells$marks # original marks
#A) randomly re-assign original marks
a.marks <- sample(cells$marks)
#B) replace marks randomly with a 50/50 proportion
b.marks<-as.factor(sample(c("tumor","bcell"), replace=TRUE, size=length(o.marks)))
#C) random (homogeneous?) pattern with the original number of points
randt<-runifpoint(npoints(subset(cells,marks=="tumor")),win=cells$window)
randb<-runifpoint(npoints(subset(cells,marks=="bcell")),win=cells$window)
cells<-superimpose(tumor=randt,bcell=randb)
#D) tumor points are associated with bcell points (is "clustered" a right term?)
Cancer<-rpoint(npoints(subset(cells,marks=="tumor")),Dimmune,win=cells$window)
#E) tumor points are segregated from bcell points
reversedD<-Dimmune
density.scale.v<-sort(unique((as.vector(Dimmune$v)[!is.na(as.vector(Dimmune$v))]))) # density scale
density.scale.v.rev<-rev(density.scale.v)# reversed density scale
new.image.v<-Dimmune$v
# Loop over matrix
for(row in 1:nrow(Dimmune$v)) {
for(col in 1:ncol(Dimmune$v)) {
if (is.na(Dimmune$v[row, col])==TRUE){next}
number<-which(density.scale.v==Dimmune$v[row, col])
new.image.v[row, col]<-density.scale.v.rev[number]}
}
reversedD$v<-new.image.v # reversed density
Cancer<-rpoint(npoints(subset(cells,marks=="tumor")),reversedD,win=cells$window)
A better way to generate inverse density heatmaps is given by @Adrian in his post below.
I could not generate rpoint patterns for the bw.diggle density as it produced negative numbers. Thus I replaced the negatives with Dimmune$v[which(Dimmune$v<0)]<-0 and could then run rpoint. As @Adrian explained in the post below, this is normal and can be solved more easily by using the density.ppp option positive=TRUE.
I first used bw.diggle because hopskel.test indicated "clustering" for all my patterns. Now I'm going to use bw.scott for my analysis, but can this decision be somehow justified? Is there a better criterion than "the RSF function looks weird"?
Some examples (images not shown): sample10, sample20, sample110.
That is a lot of questions!
Please try to ask only one question per post.
But here are some answers to your technical questions about spatstat.
Negative values:
The help for density.ppp explains that small negative values can occur because of numerical effects. To force the density values to be non-negative, use the argument positive=TRUE in the call to density.ppp. For example density(Immune, bw.diggle, positive=TRUE).
Reversed image: to reverse the ordering of values in an image Z you can use the following code:
V <- Z
A <- order(Z[])
V[][A] <- Z[][rev(A)]
Then V is the order-reversed image.
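To see why the order/rev indexing works, here is the same idea on a plain numeric vector rather than a spatstat image. This is only a sketch with made-up values; the names z, a and v are illustrative.

```r
z <- c(0.2, 1.5, 0.7, 3.1)   # made-up "pixel" values

a <- order(z)                # positions of the values from smallest to largest
v <- z
v[a] <- z[rev(a)]            # smallest position gets the largest value, and so on

v                            # same values as z, but with the ranking reversed
order(v)                     # equals rev(order(z))
```

The set of values is unchanged; only their assignment to positions is flipped, so high-density locations become low-density ones and vice versa.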
Tips for your code:
To generate a random point pattern with the same number of points and in the same window as an existing point pattern X, use Y <- runifpoint(ex=X).
To extract the marks of a point pattern X, use a <- marks(X). To assign new marks to a point pattern X, use marks(X) <- b.
To randomly permute the marks attached to the points in a point pattern X, use Y <- rlabel(X).
To assign new marks to a point pattern X where the new marks are drawn randomly-with-replacement from a given vector of values m, use Y <- rlabel(X, m, permute=FALSE).

how to intersect an interpolated surface z=f(x,y) with z=z0 in R

I found some posts and discussions about the above, but I'm not sure... could someone please check if I am doing anything wrong?
I have a set of N points of the form (x,y,z). The x and y coordinates are independent variables that I choose, and z is the output of a rather complicated (and of course non-analytical) function that uses x and y as input.
My aim is to find a set of values of (x,y) where z=z0.
I looked up this kind of problem in R-related forums, and it appears that I need to interpolate the points first, perhaps using a package like akima or fields.
However, it is less clear to me: 1) if that is necessary, or the basic R functions that do the same are sufficiently good; 2) how I should use the interpolated surface to generate a correct matrix of the desired (x,y,z=z0) points.
E.g. this post seems somewhat related to the problem I am describing, but it looks extremely complicated to me, so I am wondering whether my simpler approach is correct.
Please see below some example code (not the original one, as I said the generating function for z is very complicated).
I would appreciate if you could please comment / let me know if this approach is correct / suggest a better one if applicable.
df <- merge(data.frame(x=seq(0,50,by=5)),data.frame(y=seq(0,12,by=1)),all=TRUE)
df["z"] <- (df$y)*(df$x)^2
ta <- xtabs(z~x+y,df)
contour(ta,nlevels=20)
contour(ta,levels=c(1000))
#why are the x and y axes [0,1] instead of showing the original values?
#and how accurate is the algorithm that draws the contour?
li2 <- as.data.frame(contourLines(ta,levels=c(1000)))
#this extracts the contour data, but all (x,y) values are wrong
require(akima)
s <- interp(df$x,df$y,df$z)
contour(s,levels=c(1000))
li <- as.data.frame(contourLines(s,levels=c(1000)))
#at least now the axis values are in the right range; but are they correct?
require(fields)
image.plot(s)
#fancier, but same problem - are the values correct? better than the akima ones?
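One possible explanation for the [0,1] axes in the code above: when contour() or contourLines() receives only a matrix, it places the rows and columns on an evenly spaced 0-1 grid. Passing the x and y vectors explicitly keeps the original units. The sketch below assumes the data lie on a regular grid, as in the example, and uses base R only; the names cl, pts and resid are illustrative.

```r
x <- seq(0, 50, by = 5)
y <- seq(0, 12, by = 1)

# z as a matrix with rows indexed by x and columns by y
z <- outer(x, y, function(x, y) y * x^2)

# Supplying x and y keeps the contour coordinates in the original units
cl <- contourLines(x, y, z, levels = 1000)

# Collect the (x, y) points of all contour segments at z = 1000
pts <- do.call(rbind, lapply(cl, function(l) data.frame(x = l$x, y = l$y)))

# How far each contour point is from the true surface value
resid <- pts$y * pts$x^2 - 1000
summary(resid)
```

As far as I know, the contour points are obtained by linear interpolation between neighbouring grid values, so on a coarse grid of a strongly curved surface the residuals can be noticeable; a finer grid in x and y should reduce them.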

Converting "ppp" to multitype

I have been running two unmarked planar point pattern data sets through a series of spatstat functions. Now I would like to use the Kcross.inhom function to describe interaction between the two, but Kcross only works with marked data, so I have combined all x-y data into one csv file and added a column that distinguishes the two. I have established the following point pattern object, but do not understand how to edit the subsequent example of Kcross for my purposes. Or, perhaps there is a better way? Thanks for your help!
# read in data & create ppp
collisionspotholes<-read.csv("cpmulti.csv")
cp<-ppp(collisionspotholes[,3],collisionspotholes[,4],c(40.50390735,40.91115166),c(-74.25262139,-73.7078596))
# synthetic example
pp <- runifpoispp(50)
pp <- pp %mark% factor(sample(0:1, npoints(pp), replace=TRUE))
K <- Kcross(pp, "0", "1")
K <- Kcross(pp, 0, 1) # equivalent
I am not really clear as to what the problem is that you are having. You seem to me to "be there" essentially. However let me, for completeness, spell out the procedure that you should follow:
Let X and Y be your two point patterns (observed, presumably, in the same window).
Put these together into a single pattern:
XY <- superimpose(X=X,Y=Y)
Note that there is no need to dick around with your csv files; it is much more efficient to use the facilities provided by spatstat.
The foregoing syntax produces a multitype point pattern with marks being a factor with levels "X" and "Y". (If you want the levels to be denoted by other symbols you can easily arrange this.)
Then just calculate the inhomogeneous Kcross function:
Ki <- Kcross.inhom(XY,"X","Y")
That is all that there is to it.
Note that the foregoing uses the default method of estimating the intensities of the two patterns, namely leave-one-out kernel smoothing with bandwidth chosen by bw.diggle(). There may be better ways of estimating the intensities, perhaps by fitting a parametric model. This depends on the nature of the information available to you.
Interpreting the output of Kcross.inhom() is, IMHO, subtle and difficult.
Be cautious in any conclusions that you draw.
Rolf Turner's answer is correct. However, you say that
I have combined all x-y data into one csv file and added a column that distinguishes the two.
OK, suppose the data frame is called df and it has columns named x and y giving the spatial coordinates and h which is a character vector identifying whether the corresponding point is a pothole (h="p") or a collision (h="c"). Then you could do
X <- ppp(df$x, df$y, xlim, ylim, marks=factor(df$h))
where xlim, ylim are the limits for the spatial coordinates. Or more elegantly
X <- with(df, ppp(x, y, xlim, ylim, marks=factor(h)))
Note the use of factor to ensure that the marks are categorical values. Then type
X
to check that you've got a 'multitype point pattern'.
Then you can do, e.g.
K <- Kcross(X)
Ki <- Kcross.inhom(X)
Please read the help files for Kcross, Kcross.inhom for advice about how to use these functions and how to interpret the results.
Incidentally, please do not send the same question to multiple forums at the same time. That is difficult for those who have to answer.

Is there a way of forcing the image function in R not to normalize coordinates?

When using the image function in R, it normalizes the dimensions of the input matrix so that the X and Y axes go from 0 to 1.
Is there a way of telling the image function not to normalize these numbers?
I need to do so in order to overlay different kinds of data and normalizing all these coordinates into the [0,1] space is very tedious.
EDIT: The answer provided by Greg explains the situation.
A reproducible example would be very helpful here. Generally, if you only give image a z matrix then the function chooses default x and y values that work; I think this is what you are seeing. On the other hand, if you give image an x vector and a y vector then it uses that information to construct the graph. If the x/y vectors have a length equal to the corresponding dimension of z then those values represent the centers of the rectangles; if x/y is 1 longer than the corresponding dimension of z then they represent the corners of the rectangles. This gives you a lot of control over the things that you mention.
If this does not answer the question then give us a self contained reproducible example to work with.
I am going to answer my question based on the answer Greg Snow provided in order to follow the best practice of this site as anything that provides information should be an answer.
If you provide neither the x nor the y parameter to the image() function, then the axes range from 0 to 1, as in the next example.
> image(volcano)
Then, if you want to locate a point of interest in the matrix in use, for the element of the matrix with [x,y] coordinates of [10,40] you need to do something like:
> points(x=(10-1)/(nrow(volcano)-1), y=(40-1)/(ncol(volcano)-1))
If the x and y parameters are specified, and (as Greg mentioned) they fit the dimensions of the matrix, then the axes will range within the specified x and y vectors.
> dim(volcano)
[1] 87 61
> image(x=1:87, y=1:61, z=volcano)
> points(10,40)
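The centers-versus-corners distinction that Greg describes can be sketched as follows, using the built-in volcano matrix. The vector names are illustrative, and the unit cell size is an assumption made for the example.

```r
z <- volcano

# Same length as the matching dimension of z: values are cell centers
x_centers <- seq_len(nrow(z))
y_centers <- seq_len(ncol(z))
image(x_centers, y_centers, z)
points(10, 40)               # marks the center of cell [10, 40]

# One longer than the matching dimension of z: values are cell boundaries
x_corners <- seq(0.5, nrow(z) + 0.5, by = 1)
y_corners <- seq(0.5, ncol(z) + 0.5, by = 1)
image(x_corners, y_corners, z)
points(10, 40)               # same location, since cell 10 spans 9.5 to 10.5

# With no x or y, cell centers are spread evenly over [0, 1]
default_x <- seq(0, 1, length.out = nrow(z))
default_x[10]                # default coordinate of row 10
```

Both explicit calls should draw the same picture; the corner form is mainly useful when the cells are not equally sized.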

Graphing results of dbscan in R

Your comments, suggestions, or solutions are/will be greatly appreciated, thank you.
I'm using the fpc package in R to do a dbscan analysis of some very dense data (3 sets of 40,000 points in the range -3 to 6).
I've found some clusters, and I need to graph just the significant ones. The problem is that I have a single cluster (the first) with about 39,000 points in it. I need to graph all other clusters but this one.
The dbscan() creates a special data type to store all of this cluster data in. It's not indexed like a data frame would be (but maybe there is a way to represent it as such?).
I can graph the dbscan type using a basic plot() call. But, like I said, this will graph the irrelevant 39,000 points.
tl;dr:
how do I graph only specific clusters of a dbscan data type?
If you look at the help page (?dbscan) it is organized like all others into sections labeled Description, Usage, Arguments, Details and Value. The Value section describes what the function dbscan returns. In this case it is simply a list (a standard R data type) with a few components.
The cluster component is simply an integer vector, whose length is equal to the number of rows in your data, that indicates which cluster each observation is a member of. So you can use this vector to subset your data to extract only those clusters you'd like and then plot just those data points.
For example, if we use the first example from the help page:
library(fpc)  # provides dbscan()
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10)+rnorm(n, sd=0.2),
           runif(10, 0, 10)+rnorm(n, sd=0.2))
ds <- dbscan(x, 0.2)
We can then use the result, ds, to plot only the points in clusters 1-3:
#Plot only clusters 1, 2 and 3
plot(x[ds$cluster %in% 1:3,])
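The same subsetting idea can be used to drop the single dominant cluster instead of listing the clusters to keep. The sketch below uses base R only: the vector cluster is a hypothetical stand-in for ds$cluster, and it assumes that noise points are coded as 0, which is my understanding of the fpc convention.

```r
set.seed(42)

# Made-up data: one large group and two small ones
x <- rbind(
  matrix(rnorm(400, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40,  mean = 3,  sd = 0.3), ncol = 2),
  matrix(rnorm(40,  mean = -3, sd = 0.3), ncol = 2)
)

# Hypothetical stand-in for ds$cluster: one integer label per row of x
cluster <- rep(c(1L, 2L, 3L), times = c(200, 20, 20))

# Find the label of the largest cluster
sizes   <- table(cluster)
largest <- as.integer(names(sizes)[which.max(sizes)])

# Keep everything except the largest cluster and (assumed) noise label 0
keep <- cluster != largest & cluster != 0
plot(x[keep, ], col = cluster[keep], pch = 19, xlab = "x", ylab = "y")
```

With a real dbscan result, replacing cluster by ds$cluster should be all that is needed, provided x is the matrix that was clustered.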
Without knowing the specifics of dbscan, I can recommend that you look at the function smoothScatter. It is very useful for examining the main patterns in a scatterplot when you otherwise would have too many points to make sense of the data.
Probably the most sensible way of plotting DBSCAN results is using alpha shapes, with the radius set to the epsilon value. Alpha shapes are closely related to convex hulls, but they are not necessarily convex. The alpha radius controls the amount of non-convexity allowed.
This is quite closely related to the DBSCAN cluster model of density connected objects, and as such will give you a useful interpretation of the set.
As I'm not using R, I don't know about the alpha shape capabilities of R. There supposedly is a package called alphahull, from a quick check on Google.

Resources