raster: modifications to the methods in the resample() function

I'd like to know whether it is possible to make some modifications to the resample() function in the raster package. First, the "bilinear" method by default assigns a weighted average of the four nearest cells; is it possible to change this to a different number of nearest cells? Second, is it also possible to add a mean method that calculates the arithmetic average of the n nearest cells?
For example, in the first case with 25 cells: resample(myraster, myresolution, window=matrix(nrow=5, ncol=5), method="bilinear"), and in the second case: resample(myraster, myresolution, window=matrix(nrow=5, ncol=5), fun=mean).

You cannot do all of that with resample() itself, but you can apply the focal() function to the input data before calling resample().
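For completeness, a minimal sketch of that workaround (assuming myraster is the input and target is a Raster object with the desired geometry; the 5 x 5 window is only an example):
library(raster)
# arithmetic mean of the 5 x 5 neighbourhood (25 nearest cells)
smoothed <- focal(myraster, w = matrix(1 / 25, nrow = 5, ncol = 5))
# then transfer the smoothed values onto the target grid
res_mean <- resample(smoothed, target, method = "bilinear")
# or, to avoid a second interpolation step, use nearest-neighbour transfer
res_ngb <- resample(smoothed, target, method = "ngb")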

Related

How can I use a for loop to stack new rasters?

I am trying to create new rasters by calculating the difference in values between existing rasters. I want to find the difference between each of the existing rasters and one specific raster, and then stack all of these difference rasters. I typed out the entire calculation for 60 rasters, but I want to know the faster way using a for loop.
Change<- stack(
AMTs$X1950-AMTs$X1950,
AMTs$X1951-AMTs$X1950,
AMTs$X1952-AMTs$X1950,
AMTs$X1953-AMTs$X1950,
AMTs$X1954-AMTs$X1950,
AMTs$X1955-AMTs$X1950,
AMTs$X1956-AMTs$X1950,
AMTs$X1957-AMTs$X1950,
AMTs$X1958-AMTs$X1950,
AMTs$X1959-AMTs$X1950,
AMTs$X1960-AMTs$X1950,
AMTs$X1961-AMTs$X1950,
AMTs$X1962-AMTs$X1950,
AMTs$X1963-AMTs$X1950,
AMTs$X1964-AMTs$X1950,
AMTs$X1965-AMTs$X1950,
AMTs$X1966-AMTs$X1950,
AMTs$X1967-AMTs$X1950,
AMTs$X1968-AMTs$X1950,
AMTs$X1969-AMTs$X1950,
AMTs$X1970-AMTs$X1950,
AMTs$X1971-AMTs$X1950,
AMTs$X1972-AMTs$X1950,
AMTs$X1973-AMTs$X1950,
AMTs$X1974-AMTs$X1950,
AMTs$X1975-AMTs$X1950,
AMTs$X1976-AMTs$X1950,
AMTs$X1977-AMTs$X1950,
AMTs$X1978-AMTs$X1950,
AMTs$X1979-AMTs$X1950,
AMTs$X1980-AMTs$X1950,
AMTs$X1981-AMTs$X1950,
AMTs$X1982-AMTs$X1950,
AMTs$X1983-AMTs$X1950,
AMTs$X1984-AMTs$X1950,
AMTs$X1985-AMTs$X1950,
AMTs$X1986-AMTs$X1950,
AMTs$X1987-AMTs$X1950,
AMTs$X1988-AMTs$X1950,
AMTs$X1989-AMTs$X1950,
AMTs$X1990-AMTs$X1950,
AMTs$X1991-AMTs$X1950,
AMTs$X1992-AMTs$X1950,
AMTs$X1993-AMTs$X1950,
AMTs$X1994-AMTs$X1950,
AMTs$X1995-AMTs$X1950,
AMTs$X1996-AMTs$X1950,
AMTs$X1997-AMTs$X1950,
AMTs$X1998-AMTs$X1950,
AMTs$X1999-AMTs$X1950,
AMTs$X2000-AMTs$X1950,
AMTs$X2001-AMTs$X1950,
AMTs$X2002-AMTs$X1950,
AMTs$X2003-AMTs$X1950,
AMTs$X2004-AMTs$X1950,
AMTs$X2005-AMTs$X1950,
AMTs$X2006-AMTs$X1950,
AMTs$X2007-AMTs$X1950,
AMTs$X2008-AMTs$X1950,
AMTs$X2009-AMTs$X1950
)
Is this what you're looking for? Create a function that subtracts the 1950 layer from every layer from 1950 onward:
minus <- function(dd, cc) {
  return(dd - cc)
}
# now use overlay() from the raster package to create the new raster object
change <- overlay(AMTs[[1:60]], AMTs$X1950, fun = minus)
Breakdown:
the x argument of overlay(), AMTs[[1:60]], plays the role of dd in the minus function, and the y argument, AMTs$X1950, plays the role of cc.
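If you specifically want the for loop that the title asks about, a rough equivalent (assuming the layers are named X1950 ... X2009, as in the code above) would be:
library(raster)
# build the differences one layer at a time, then stack them
change_list <- vector("list", nlayers(AMTs))
for (i in 1:nlayers(AMTs)) {
  change_list[[i]] <- AMTs[[i]] - AMTs$X1950
}
Change <- stack(change_list)
Raster arithmetic such as AMTs - AMTs$X1950 may also work directly, but the loop makes the per-layer logic explicit.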

K-means: setting the initial (distinct) cluster centres

I don't know if the k-means algorithm is the appropriate approach, but I have the following example. I want to have 6 groups so that the values in each group fall in these ranges: group 1 = 0%, group 2 = 0-20%, group 3 = 20-40%, group 4 = 40-60%, group 5 = 60-80%, group 6 = 80-100% of the data. Now I want to know if it is possible to set the range of the cluster centres so that each value of the data will be assigned to one of the groups. I know I can recode the values, but I was hoping for a better approach.
data<-c(27,14,16,0,0,10,7,9,10,19)
k_means<-kmeans(data, centers = ??)
The solution turned out to be straightforward. (I did not set the values according to the percentages above; they are just for demonstration purposes.)
data <- c(27, 14, 16, 0, 0, 10, 7, 9, 10, 19)
k_means <- kmeans(data, centers = c(4, 5, 8, 10))
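For reference, a small check of what kmeans() does with fixed starting centres; note that the algorithm will still move the centres during its iterations, the supplied values are only the starting points:
data <- c(27, 14, 16, 0, 0, 10, 7, 9, 10, 19)
k_means <- kmeans(data, centers = c(4, 5, 8, 10))
k_means$cluster   # which cluster each value was assigned to
k_means$centers   # the final (updated) cluster centres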

How to use pointDistance with a very large vector

I've got a big problem.
I have a large raster (rows = 180, columns = 480, number of cells = 86400).
First I binarized it (so that it contains only 1s and 0s) and then I labelled the clusters (cells that are 1 and connected to each other get the same label).
Now I need to calculate all the distances between the cells that are NOT 0.
There are quite a lot of them, and that's my big problem.
I did this to get the coordinates of the cells I'm interested in (i.e. the cell numbers of the cells that are not 0):
V <- getValues(label)
Vu <- 1:max(V)
pos <- which(V %in% Vu)
XY <- xyFromCell(label, pos)
This works very well. So XY is a matrix that contains all the coordinates (of cells that are not 0). But now I'm struggling: I need to calculate the distances between ALL of these coordinates and then put each distance into one of 43 bins, for example:
0 < x < 0.2: bin 1
0.2 < x < 0.4: bin 2
When I use this:
pD <- pointDistance(XY, lonlat = FALSE)
R says it cannot allocate a vector of that size; it is getting too large.
Then I thought I could do this instead (create an empty accumulator df, or something like that, and let pointDistance() run over every single row of XY):
df <- rep(0, 43)  # accumulator for the 43 bin counts
for (i in 1:nrow(XY)) {
  pD <- pointDistance(XY, XY[i, ], lonlat = FALSE)
  pDbin <- as.matrix(table(cut(pD, breaks = seq(0, 8.6, by = 0.2), labels = 1:43)))
  df <- cbind(df, pDbin)
  df <- apply(df, 1, FUN = function(x) sum(x))
}
This works when I try it with, e.g., the first 50 rows of XY. But when I use it on the whole XY matrix it takes far too much time (sometimes the XY matrix contains 10000 xy-coordinates).
Does anyone have an idea how to do this faster?
I don't know whether this will be fast enough, but I recommend you try the following.
Say you have a matrix (or data frame) with a value of 0 or 1 in each cell. To find the coordinates of the non-zero cells, all you have to do is:
cord_matrix <- which(dataframe == 1, arr.ind = TRUE)
Now you have a coordinate matrix with the row and column index of every cell that is 1.
To find the Euclidean distances, use the dist() function:
dist_vector <- dist(cord_matrix)
It returns the lower triangle of the distance matrix, which can be converted to a vector or a full symmetric matrix. Now all you have to do is compute the bins according to your requirements.
Let me know if this fits within your memory constraints.
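Building on that, a hedged sketch of the binning step, reusing the XY coordinate matrix and the 43 bins of width 0.2 from the question; note that dist() on ~10000 points still produces about 50 million distances, so memory may remain tight:
dist_vector <- dist(XY)   # lower-triangle pairwise Euclidean distances
bin_counts <- table(cut(as.vector(dist_vector),
                        breaks = seq(0, 8.6, by = 0.2),
                        labels = 1:43))
bin_counts                # counts of distances falling in each of the 43 bins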

In R: sort the maximum dissimilarity between rows in a matrix

I have a matrix with 100 rows and 10 columns. I want to compare the dissimilarity between rows, sort them, and then select the 10 most dissimilar rows. Which method can I use?
set.seed(123)
mat <- matrix(runif(100 * 10), nrow = 100, ncol = 10)
My initial idea was to calculate the similarity between pairs of rows (e.g. the Tanimoto coefficient or similar: http://en.wikipedia.org/wiki/Jaccard_index), take dissimilarity = 1 - similarity, and then compare the dissimilarity values. Finally I would sort all the dissimilarity values and select the 10 largest. But the result would be a 100 * 100 matrix, so a more efficient method may be needed when there are a large number of rows. This is just my idea; it may not be right, so I need help.
[update]
After looking through the literature, I found a definition of the maximum dissimilarity method:
Maximum dissimilarity method: it begins by randomly choosing a data record as the first cluster centre. The record maximally distant from the first point is selected as the next cluster centre, and the record maximally distant from both current points is selected after that. The process repeats until there is a sufficient number of cluster centres.
Here, in my question, the sufficient number would be 10.
Thanks.
First of all, the Jaccard index is not right for you. From the Wikipedia page:
The Jaccard coefficient measures similarity between finite sample sets...
Your matrix contains samples of floats, so you have a different problem (note that the index in question is defined in terms of set intersections; that should be a red flag right there :-).
So you have to decide what you mean by dissimilarity. One natural interpretation is to say that row A is more dissimilar from the data set than row B if it has a greater Euclidean distance to the center of mass of the data set. You can think of the center of mass as the vector you get by taking the mean of each of the columns and putting them together (apply(mat, 2, mean)).
With this, you can take the distance of each row to that central vector, and then get an ordering on those distances. From that you can work back to the rows you desire from the original matrix.
All together:
center <- apply(mat, 2, mean)
# not quite the distances, actually, but their squares; that is fine for us,
# since the ordering will still be the same
dists <- apply(mat, 1, function(row) sum((row - center) ^ 2))
# this gives us the row indices in order of least to greatest dissimilarity
dist.order <- order(dists)
# Now we just grab the 10 most dissimilar of those
most.dissimilar.ids <- dist.order[91:100]
# and use them to get the corresponding rows of the matrix
most.dissimilar <- mat[most.dissimilar.ids,]
If I were actually writing this, I would probably have compressed the last three lines into most.dissimilar <- mat[order(dists)[91:100], ], but hopefully having it broken up like this makes it a little easier to see what's going on.
Of course, if distance from the center of mass doesn't make sense as the best way of thinking of "dissimilarity" in your context, then you'll have to amend with something that does.
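If you do want the iterative "maximum dissimilarity" selection described in the question's update, here is a minimal sketch under the common max-min (farthest-first) reading, using Euclidean distances between rows; it is not optimised and assumes the full distance matrix fits in memory:
set.seed(123)
d <- as.matrix(dist(mat))          # pairwise Euclidean distances between rows
selected <- sample(nrow(mat), 1)   # random first record
while (length(selected) < 10) {
  # distance from every row to its nearest already-selected row
  min.d <- apply(d[, selected, drop = FALSE], 1, min)
  selected <- c(selected, which.max(min.d))
}
most.dissimilar <- mat[selected, ]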

Dealing with a data table with redundant rows

The title is not precisely stated, but I could not come up with other words that summarize what exactly I am going to ask.
I have a table of the following form:
value (0 < v < 1)    # of events
0.5677               100000
0.5688               5000
0.1111               6000
...                  ...
0.5688               200000
0.1111               35000
Here are some of the things I would like to do with this table: draw a histogram, compute the mean, fit a distribution, etc. So far, I have only figured out how to do this with vectors like
v = c(0.5677, ..., 0.5688, ..., 0.1111, ...)
but not with tables.
Since the number of possible values is huge (the values are almost continuous), I guess making a new table would not be very effective, so I would much prefer to do this without modifying the original table or building another one. But if it has to be done that way, that's okay. Thanks in advance.
Appendix: what I want to figure out is how to treat this table as an ordinary data vector.
If I had the following vector representing the exact same data as above:
v = c(0.5677, ..., 0.5677,  0.5688, ..., 0.5688,  0.1111, ..., 0.1111, ...)
     (100000 times)         (5000+200000 times)   (6000+35000 times)
then I would just need to apply basic functions like plot or mean to get what I want. I hope this makes my question clearer.
Your data consist of a value and a count for that value, so you are looking for functions that use the count to weight the value. Type ?weighted.mean for information on a function that computes the mean of weighted (grouped) data. For density plots, use the weights= argument of the density() function. For a histogram, use cut() to combine values into a small number of groups and then aggregate() to sum the counts for all the values in each group. You will also find a variety of weighted statistical measures in the Hmisc package (wtd.mean, wtd.var, wtd.quantile, etc.).
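A small sketch of that approach, assuming the two columns are called value and count (hypothetical names for your table), using the numbers shown in the question:
tab <- data.frame(value = c(0.5677, 0.5688, 0.1111, 0.5688, 0.1111),
                  count = c(100000, 5000, 6000, 200000, 35000))
weighted.mean(tab$value, w = tab$count)                          # weighted mean
plot(density(tab$value, weights = tab$count / sum(tab$count)))   # weighted density
# histogram-style summary: bin the values with cut(), then sum the counts per bin
bins <- cut(tab$value, breaks = seq(0, 1, by = 0.1))
agg  <- aggregate(tab$count, by = list(bin = bins), FUN = sum)
barplot(agg$x, names.arg = as.character(agg$bin))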

Resources