Include polygon ID when extracting raster values to polygons in R

I followed How do I extract raster values from polygon data then join into spatial data frame? (which was helpful) to create a matrix (then data frame) of mean raster values per polygon. The problem now is that I want to know which polygon is which. My SpatialPolygonsDataFrame has an ID value in p$Block_ID. Is there a way to bring that over in the extract() code?
Alternatively, does the extract() function report output in the order it was input (that would make sense)? i.e. the order of p$Block_ID will be preserved in the output? I looked through the documentation and it was not clear one way or the other. If so it is easy enough to add an ID column to the extract() output.
Here is my generalized code for reference. NOTE: not reproducible, because I don't think it really needs to be at this point. Here r is a raster and p is the polygons:
extract(r, p, small = TRUE, fun = mean, na.rm = TRUE, df = TRUE, nl = 1)
Thoughts?

The values are returned in order, as one would expect in R, and as stated in the manual (?extract): "The order of the returned values corresponds to the order of object y".
Thus you can do (reproducible example from ?extract)
e <- extract(r, p)
ee <- data.frame(ID=p$Block_ID, e)
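For completeness, a minimal self-contained sketch along those lines (a toy raster and two toy polygons; the Block_ID values are invented purely for illustration):
library(raster)
library(sp)
# toy raster covering the default -180..180 / -90..90 extent
r <- raster(ncol = 36, nrow = 18)
values(r) <- 1:ncell(r)
# two toy polygons with made-up Block_ID values
p1 <- Polygons(list(Polygon(rbind(c(-170, -10), c(-100, 40), c(-60, 0), c(-170, -10)))), "1")
p2 <- Polygons(list(Polygon(rbind(c(80, 0), c(100, 60), c(120, 0), c(80, 0)))), "2")
p  <- SpatialPolygonsDataFrame(SpatialPolygons(list(p1, p2)),
                               data.frame(Block_ID = c("A", "B"), row.names = c("1", "2")))
# mean raster value per polygon; rows of the returned data frame follow the order of p
e  <- extract(r, p, small = TRUE, fun = mean, na.rm = TRUE, df = TRUE)
ee <- data.frame(Block_ID = p$Block_ID, e)
ee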

I could not get R. Hijmans' answer working for me. I found that this works:
# use fun and df = TRUE (as in the question) so extract() returns a data frame with an ID column
e <- extract(r, p, fun = mean, na.rm = TRUE, df = TRUE)
# the numeric IDs follow the polygon order, so they can be relabelled with Block_ID
e$ID <- as.factor(e$ID)
levels(e$ID) <- levels(p$Block_ID)

Related

Kohonen map - How to find the position of one data point?

I have a dataframe df with my data of interest
I rescale with
df.sc <- scale(df)
and make my Kohonen map with
df.grid <- somgrid(15, 10, "hexagonal")
df.som <- som(df.sc, rlen=700, grid = df.grid)
That works fine and I get a nice map.
Now I have an extra datapoint
extra.sc <- as.matrix(-0.29985191, -0.35905786, -0.260923297, -0.2415673150,
-0.259426676, -0.330404078)
It is scaled exactly the same way as df.sc
Now I want to find the position of the winning unit in the Kohonen map, given df.som, for extra.sc.
map(df.som,extra.sc)
does not give me what I want.
How can I determine the position of extra.sc within df.som? And preferably also, how can I mark it on the map?
Maybe you defined your new data incorrectly, i.e. it did not have the same dimensions as the training data. Check the output of extra.sc by wrapping it in parentheses: (extra.sc). I recommend that you provide the number of rows and columns in the definition of extra.sc using the matrix() and c() functions instead of as.matrix(). For example:
extra.sc <- matrix(c(-0.29985191, -0.35905786, -0.260923297, -0.2415673150, -0.259426676, -0.330404078), nrow = 1, ncol = 6)
and observe the result:
(extra.sc)
It is one row and six columns. If you do not provide the shape of your data, then R will regard them as one column and six rows.
extra.sc <- matrix(c(-0.29985191, -0.35905786, -0.260923297, -0.2415673150, -0.259426676, -0.330404078))
(extra.sc)
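To also mark the new observation on the map (the second part of the question), here is a hedged sketch with random data standing in for df.sc; it assumes the kohonen package, where map() returns the index of the winning unit in unit.classif and the unit coordinates are stored in df.som$grid$pts:
library(kohonen)
# random data standing in for the scaled training set df.sc (6 numeric columns)
set.seed(1)
df.sc <- scale(matrix(rnorm(600), ncol = 6))
df.grid <- somgrid(15, 10, "hexagonal")
df.som  <- som(df.sc, rlen = 700, grid = df.grid)
# the new observation as a 1 x 6 matrix, scaled like df.sc
extra.sc <- matrix(c(-0.29985191, -0.35905786, -0.260923297,
                     -0.2415673150, -0.259426676, -0.330404078), nrow = 1)
# winning unit for the new observation
unit <- map(df.som, newdata = extra.sc)$unit.classif
# mark that unit on the mapping plot
plot(df.som, type = "mapping")
points(df.som$grid$pts[unit, 1], df.som$grid$pts[unit, 2],
       pch = 4, cex = 2, lwd = 3, col = "red")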

Problem in plotCountDepth R function. How to solve it?

I'm dealing with a data frame that I called GBM, which contains single-cell measurements. I'm relying on the SCnorm package for the normalization process and for a preliminary check of my data, using the plotCountDepth() function.
This is my pipeline:
sce <- SingleCellExperiment::SingleCellExperiment(assays = list('counts' = GBM))
sce <- plotCountDepth(Data = sce,
                      Conditions = Label,
                      FilterCellProportion = .1,
                      NCores = 3)
I do not really understand why I keep getting this error:
Error in colSums(Data[, which(Conditions == Levels[x])]) :
'x' must be an array of at least two dimensions
even though I'm applying the same criteria I found on Bioconductor.
For more context: Label is a vector matching the dimension of GBM (a G x S matrix), containing a series of labels that distinguish each cell group.
Thank you in advance.
PS: GBM is a matrix whose columns are named after the various cells, while the rows are of course the genes.
As the vignette stated:
Data: can be a matrix of single-cell expression with cells where rows
are genes and columns are samples. Gene names should not be a column
in this matrix, but should be assigned to rownames(Data).
Below I provide a minimum working example and I suggest you check whether you specified the rownames correctly:
library(SingleCellExperiment)
library(SCnorm)
GBM = matrix(rpois(10000,20),ncol=50)
rownames(GBM) = paste0("Gene",1:200)
colnames(GBM) = paste0("Sample",1:50)
Label=rep(c("X","Y"),each=25)
sce <- SingleCellExperiment(assays = list('counts' = GBM))
This function works, but it is not very well written: it prints the ggplot object yet gives you no way of storing it:
plt <- plotCountDepth(Data = sce, Conditions = Label,
                      FilterCellProportion = .1, NCores = 3)
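If the error persists with real data, note that one common cause of that colSums() error is a subset that collapses to a single column; a few quick sanity checks (object names taken from the question) are:
stopifnot(length(Label) == ncol(GBM))  # one condition label per cell (column)
table(Label)                           # every level should cover more than one cell
stopifnot(!is.null(rownames(GBM)))     # gene names must be in rownames, not a column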

Performing HCPC on the columns (i.e. variables) instead of the rows (i.e. individuals) after (M)CA

I would like to perform a HCPC on the columns of my dataset, after performing a CA. For some reason I also have to specify at the start that all of my columns are of type 'factor', just to loop over them again afterwards and convert them to numeric. I don't know why exactly, because if I check the type of each column (without specifying them as factor) they appear to be numeric... When I don't load and convert the data like this, however, I get an error like the following:
Error in eigen(crossprod(t(X), t(X)), symmetric = TRUE) : infinite or
missing values in 'x'
Could this be due to the fact that there are columns in my dataset that only contain 0's? If so, how come it works perfectly fine when I read everything in as factor first and then convert it to numeric before applying the CA, instead of just performing the CA directly?
The original issue with the HCPC, then, is the following:
# read in data; 40 x 267 data frame
data_for_ca <- read.csv("./data/data_clean_CA_complete.csv",row.names=1,colClasses = c(rep('factor',267)))
# loop over first 267 columns, converting them to numeric
for(i in 1:267)
data_for_ca[[i]] <- as.numeric(data_for_ca[[i]])
# perform CA
data.ca <- CA(data_for_ca,graph = F)
# perform HCPC for rows (i.e. individuals); up until here everything works just fine
data.hcpc <- HCPC(data.ca,graph = T)
# now I start having trouble
# perform HCPC for columns (i.e. variables); use their coordinates that are stocked in the CA-object that was created earlier
data.cols.hcpc <- HCPC(data.ca$col$coord,graph = T)
The code above shows me a dendrogram in the last case and even lets me cut it into clusters, but then I get the following error:
Error in catdes(data.clust, ncol(data.clust), proba = proba, row.w =
res.sauv$call$row.w.init) : object 'data.clust' not found
It's worth noting that when I perform MCA on my data and try to perform HCPC on my columns in that case, I get the exact same error. Would anyone have any clue as to how to fix this, or what I am doing wrong exactly? For completeness, the original post included a screenshot of the upper-left corner of my dataset (not reproduced here).
Thanks in advance for any possible help!
I know this is old, but because I've been troubleshooting this problem for a while today:
HCPC says that it accepts a data frame, but any time I try to simply pass it $col$coord or $colcoord from a standard ca object, it returns this error. My best guess is that there's some metadata it actually needs/is looking for that isn't in a data frame of coordinates, but I can't figure out what that is or how to pass it in.
The current version of FactoMineR will actually just allow you to give HCPC the whole CA object and tell it whether to cluster the rows or columns. So your last line of code should be:
data.cols.hcpc <- HCPC(data.ca, cluster.CA = "columns", graph = T)
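For reference, a minimal self-contained sketch of that workflow (with a made-up count table standing in for data_for_ca; nb.clust = -1 just cuts the tree automatically so the example runs non-interactively):
library(FactoMineR)
# toy count table standing in for data_for_ca (names and sizes are invented)
set.seed(42)
tab <- matrix(rpois(40 * 20, lambda = 5), nrow = 40,
              dimnames = list(paste0("ind", 1:40), paste0("var", 1:20)))
data.ca <- CA(as.data.frame(tab), graph = FALSE)
# cluster the rows (individuals)
data.hcpc <- HCPC(data.ca, nb.clust = -1, graph = FALSE)
# cluster the columns (variables) by passing the whole CA object
data.cols.hcpc <- HCPC(data.ca, cluster.CA = "columns", nb.clust = -1, graph = FALSE)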

R function to count coordinates

Trying to get it done via mapply or something similar, without iterating - I have a spatial data frame in R and would like to subset all the more complicated shapes, i.e. shapes with 10 or more coordinates. The shapefile is substantial (10k shapes) and a method that is fine for a small sample is very slow for a big one. The iterative method is:
Street$cc <- 0
i <- 1
while (i <= nrow(Street)) {
  Street$cc[i] <- length(coordinates(Street)[[i]][[1]]) / 2
  i <- i + 1
}
How can I get the same effect in a vectorized, array-style way? I have a problem with accessing a few levels down from the top (Shapefile/lines/Lines/coords).
I tried:
Street$cc <- lapply(slot(Street, "lines"),
function(x) lapply(slot(x, "Lines"),
function(y) length(slot(y, "coords"))/2))
(division by 2 because each coordinate is a pair of two values)
but it still returns a list with the number of items per row, not an integer telling me how many items are there. How can I get the number of coordinates for each shape in a spatial data frame? Sorry, I do not have a reproducible example, but you can check on any spatial file - it is more about accessing a low-level property than a very specific issue.
EDIT:
I resolved the issue using the function tail().
Here is a reproducible example. It is slightly different from yours, because you did not provide data, but the principle is the same. The 'principle' when drilling down into complex S4 structures is to pay attention to whether each level is a list or a slot, using [[ ]] to access lists and @ to access slots.
First let's get a spatial polygon. I'll use the US state boundaries:
library(maps)
library(maptools)  # provides map2SpatialPolygons()
local.map = map(database = "state", fill = TRUE, plot = FALSE)
IDs = sapply(strsplit(local.map$names, ":"), function(x) x[1])
states = map2SpatialPolygons(map = local.map, IDs = IDs)
Now we can subset the polygons with fewer than 200 vertices like this:
# Note: next line assumes that only interested in one Polygon per top level polygon.
# I.e. assumes that we have only single part polygons
# If you need to extend this to work with multipart polygons, it will be
# necessary to also loop over values of lower level Polygons
lengths = sapply(1:length(states), function(i)
  NROW(states@polygons[[i]]@Polygons[[1]]@coords))
simple.states = states[which(lengths < 200)]
plot(simple.states)
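Applied back to the original Street object (a SpatialLinesDataFrame), the same slot-drilling idea avoids the while() loop. This is only a sketch under the question's single-part-lines assumption, reusing the object and column names from the question:
# one coordinate count per feature; nrow() of the coords matrix gives the
# number of coordinate pairs directly, so no division by 2 is needed
Street$cc <- sapply(slot(Street, "lines"),
                    function(x) nrow(slot(slot(x, "Lines")[[1]], "coords")))
# subset the more complex shapes (10 or more coordinate pairs)
complex <- Street[Street$cc >= 10, ]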

Returning function output on array of values in R

I'm a beginner R programmer struggling with a multivariate array problem.
I'm attempting to input an array of 4 parameter values, say a=1:10, b=1:10, p=1:10, q=1:10, into a function y=f(x|a, b, p, q) that calculates values of y based on my dataset, x, and every possible combination of the given 4 parameters [(a=1,b=1,p=1,q=1),(a=2,b=1,p=1,q=1),...,(a=10,b=1,p=1,q=1),...,(a=10,b=10,p=10,q=10)] = 10^4 = 10,000 possible combinations and therefore 10,000 y values.
Ideally I'd like the output to be in an array format which I can then graph in R, allowing each parameter to be plotted as a separate axis.
If anyone could point me in the right direction it would be much appreciated!
Thanks,
Robert
I agree with JD Long that the request is too vague to allow a final answer, but there is an answer to the first part:
all.comb.dfrm <- expand.grid(a=1:10, b=1:10, p=1:10, q=1:10)
all.comb.dfrm$Y <- with(all.comb.dfrm, f(a,b,p,q) )
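For illustration only, with a made-up vectorised f standing in for the real function: because expand.grid() varies its first argument fastest, which matches R's column-major filling, the resulting Y column can also be reshaped into a 4-dimensional array with one dimension per parameter.
# hypothetical vectorised function standing in for f(x | a, b, p, q)
f <- function(a, b, p, q) a * b + p / q
all.comb.dfrm <- expand.grid(a = 1:10, b = 1:10, p = 1:10, q = 1:10)
all.comb.dfrm$Y <- with(all.comb.dfrm, f(a, b, p, q))
# same values as a 10 x 10 x 10 x 10 array;
# the first dimension indexes a, the second b, the third p, the fourth q
Y.array <- array(all.comb.dfrm$Y, dim = c(10, 10, 10, 10))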
