In R, how do I join and subset a SpatialPolygonsDataFrame?

I'm trying to figure out how to perform operations in R that are very easy in a GIS.
Let's take an example polygon dataset from the spdep package:
library("spdep")
library("maptools")  # readShapePoly() is provided by maptools
c <- readShapePoly(system.file("etc/shapes/columbus.shp", package="spdep")[1])
plot(c)
I've managed to figure out that I can choose polygons with logical statements using subset. For instance:
cc <- subset(c, c@data$POLYID < 5)
plot(cc)
Now, let's suppose I have another data frame that I'd like to join to my spatial data:
POLYID=1:9
TO.LINK =101:109
link.data <- data.frame(POLYID=POLYID, TO.LINK=TO.LINK)
Using these two datasets, how can I get two spatial data frames:
First, one consisting of the polygons whose IDs appear in the second data frame;
Second, one consisting of the opposite set: the polygons that do not appear in the second data frame.
How could I get to this point?

This will probably work. First, you want your relevant IDs.
myIDs <- link.data$POLYID
Then, use subset as you've pointed out:
subset(c, POLYID %in% myIDs)
subset(c, !(POLYID %in% myIDs))
Note that this assumes that your spatial data frame, c, also has a column called POLYID in its attribute data (c@data).
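The question also asks about joining. Below is a minimal base-R sketch of both steps, using a made-up data frame called poly_data as a stand-in for c@data (an assumption, so the example runs without spdep or sp): %in% builds the match/no-match split, and match() attaches TO.LINK without reordering the rows.

```r
# Hypothetical stand-in for c@data: one row per polygon, in polygon order
poly_data <- data.frame(POLYID = 1:20, AREA = (1:20) / 10)
link.data <- data.frame(POLYID = 1:9, TO.LINK = 101:109)

# TRUE where the polygon's ID appears in link.data
in_link <- poly_data$POLYID %in% link.data$POLYID

inside  <- poly_data[in_link, ]   # polygons with a matching ID
outside <- poly_data[!in_link, ]  # polygons without one

# Join: look up TO.LINK by ID; unmatched polygons get NA.
# match() keeps the original row order, which must stay aligned with the geometry.
poly_data$TO.LINK <- link.data$TO.LINK[match(poly_data$POLYID, link.data$POLYID)]
```

With the real object, the same idea should carry over (untested sketch): compute in_link from c$POLYID, index the object directly with c[in_link, ], and assign the looked-up values into c@data$TO.LINK. A plain merge() on the attribute table is best avoided here, because by default it sorts the result by the key and can break the row-to-polygon alignment.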

Related

Sub-setting a spatial point data frame in R

I am trying to subset a spatial point data frame using the function subset as follows:
data(puechabonsp)
Chou.subset <- subset(puechabonsp,puechabonsp$relocs$Name=="Chou")
I was expecting to get all rows of the individual named "Chou" but instead, I got an empty list.
Obviously I am doing it wrong and would appreciate some help.
Thanks!
Idan
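The puechabonsp object is a list with two parts: a map in $map and the relocations (tracks) in $relocs. Because subset() is applied to that outer list rather than to $relocs, it does not select rows the way you expected. If you only want the locations of a specific animal, index $relocs directly: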
Chou.subset <- puechabonsp$relocs[puechabonsp$relocs$Name=='Chou',]
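The same idea in a self-contained form: a made-up list called tracks stands in for puechabonsp (its contents are invented), and both bracket indexing and subset() are applied to the $relocs component rather than to the outer list.

```r
# Hypothetical stand-in for puechabonsp: a list with a map part and a relocs part
tracks <- list(
  map    = matrix(0, nrow = 2, ncol = 2),   # placeholder for the map
  relocs = data.frame(Name = c("Chou", "Other", "Chou", "Other"),
                      X = c(1, 2, 3, 4), Y = c(4, 3, 2, 1))
)

# Bracket indexing on the relocs component
chou1 <- tracks$relocs[tracks$relocs$Name == "Chou", ]

# subset() also works once it is given relocs: Name is looked up inside it
chou2 <- subset(tracks$relocs, Name == "Chou")
```

For the real SpatialPointsDataFrame, sp's subset() method should accept the equivalent call, subset(puechabonsp$relocs, Name == "Chou").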

Looping through groups with deldir() in R

I have read in some data consisting of three columns: X, Y and Group.
I am looking to get the underlying data for a Voronoi diagram for each group.
By using
a=deldir(Test.data$X,Test.data$Y,rw=c(0,1,0,1))
I successfully created the Voronoi data for the entire dataset. However, I do not know how to repeat this process for each of the groups in the dataset.
Does anyone have any ideas? I have experience with ggplot and know that there I can simply map a third variable, something like
ggplot(Test.data,aes(x=X,y=Y,colour=Group))
Is there a way I can get a similar effect with the deldir() function?
Thanks in advance for your help.
Ben
Consider getting the distinct groups and filtering the dataset for each one. Below, lapply() creates a list of deldir objects, one for each distinct group:
library(deldir)
groups <- unique(Test.data$Group)
deldirList <- lapply(groups, function(g) {
  temp <- Test.data[Test.data$Group == g, ]  # rows belonging to group g
  deldir(temp$X, temp$Y, rw = c(0, 1, 0, 1))
})
names(deldirList) <- groups  # look up each result by group name
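split() can do the per-group filtering in one step and returns a list already named by group. The sketch below uses made-up Test.data and, so that it runs without the deldir package, a stand-in per-group function (the convex hull vertices from base R's chull()); swapping in the deldir() call is the intended use.

```r
set.seed(1)
# Made-up data with the same three columns as in the question
Test.data <- data.frame(X = runif(30), Y = runif(30),
                        Group = rep(c("a", "b", "c"), each = 10))

# One data frame per group, named by group
by_group <- split(Test.data, Test.data$Group)

# Stand-in for deldir(): returns the convex hull vertices of each group.
# With deldir installed: function(d) deldir::deldir(d$X, d$Y, rw = c(0, 1, 0, 1))
per_group <- lapply(by_group, function(d) {
  d[chull(d$X, d$Y), c("X", "Y")]
})
```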

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am attempting to come to terms with the 'apply' family of functions, which I now need to use because of the complexity of my data sets.
I have a large, ragged data frame that I wish to reshape before running a series of regression analyses. It is further complicated by interlaced rows of descriptive data (characters).
My approach so far has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), remove the trailing empty columns, make two new matching lists (one of data and one of characters), use reshape to produce a common number of columns, and then recombine the sets in each list. A simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set, and I can split the data frame into two sets from the interlaced rows, but I am stumped by the syntax for applying the following to every element of the list. Shouldn't lapply() with seq_along() do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try this, which keeps only the columns that have at least one non-empty cell:
lapply(split(myDF, myDF$V1), function(x) x[colSums(x != '') > 0])
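To check this against the question's own data, the sketch below rebuilds myDF, applies the trimming with lapply() over the split list, and also wraps the asker's all(x == "") test in a function applied to every list element, which is the lapply() version asked about (no seq_along() needed). Both should keep 11 columns for v1 and 7 for v2.

```r
# The question's example data
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))))
myDF[, 1] <- as.factor(myDF[, 1])
myList <- split(myDF, myDF[, 1])

# Keep columns with at least one non-empty cell
trimmed <- lapply(myList, function(x) x[colSums(x != "") > 0])

# The asker's per-set code, applied to every element of the list
trimmed2 <- lapply(myList, function(DF) {
  empty <- sapply(DF, function(col) all(col == ""))
  DF[, !empty, drop = FALSE]
})

sapply(trimmed, ncol)  # v1: 11 columns, v2: 7 columns
```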

Manipulating cutree object in R to segment original dataframe

I'm using R's built-in correlation matrix and hierarchical clustering methods to segment daily sales data into 10 clusters. Then, I'd like to create agglomerated daily sales data by cluster. I've got as far as creating a cutree() object, but am stumped on extracting only the column names in the cutree object where the cluster number is 1, for example.
For simplicity's sake, I'll use the EuStockMarkets data set and cut the tree into 2 segments; bear in mind that I'm working with thousands of columns here, so the solution needs to be scalable:
data=as.data.frame(EuStockMarkets)
corrMatrix<-cor(data)
dissimilarity<-round(((1-corrMatrix)/2), 3)
distSimilarity<-as.dist(dissimilarity)
hirearchicalCluster<-hclust(distSimilarity)
treecuts<-cutree(hirearchicalCluster, k=2)
Now I get stuck. I want to extract only the column names from treecuts where the cluster number is equal to 1, for example. But the object that cutree() returns is not a data frame, which makes subsetting difficult. I've tried to convert treecuts into a data frame, but R does not create a column for the row names; all it does is coerce the numbers into a row named treecuts.
I would want to do the following operations:
....Code that converts treecuts into a data frame called "treeIDs" with the
columns "Index" and "Cluster"......
cluster1Columns<-colnames(treeIDs[Cluster==1, ])
cluster1DF<-data[ , (colnames(data) %in% cluster1Columns)]
rowSums(cluster1DF)
...and voila, I'm done.
Thoughts/suggestions?
Index the names of treecuts with a logical condition; unlike hard-coding treecuts[1:4], this works for any number of columns:
names(treecuts)[treecuts == 1]
[1] "DAX" "SMI" "FTSE"
If you also want, say, cluster 2 (or more clusters), use %in%:
names(treecuts)[treecuts %in% c(1, 2)]
[1] "DAX" "SMI" "CAC" "FTSE"
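Why not skip the names entirely? treecuts has one entry per column of data, in the same order, so you can select the columns directly (adding it as data$clusterID would not work, because data has one row per day, not one per column):
cluster1DF <- data[, treecuts == 1, drop = FALSE]
rowSums(cluster1DF)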
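Since thousands of columns were mentioned, it may be simpler to build the aggregates for every cluster at once. A sketch reusing the question's EuStockMarkets steps: split() groups the column names by cluster ID, and sapply() computes the row sums for each group (cols_by_cluster and cluster_sums are illustrative names, not from the question).

```r
data <- as.data.frame(EuStockMarkets)
distSimilarity <- as.dist(round((1 - cor(data)) / 2, 3))
treecuts <- cutree(hclust(distSimilarity), k = 2)

# Named list: cluster ID -> column names in that cluster
cols_by_cluster <- split(names(treecuts), treecuts)

# One aggregated daily series per cluster (columns "1", "2", ...)
cluster_sums <- sapply(cols_by_cluster, function(cols) {
  rowSums(data[, cols, drop = FALSE])
})
```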

Data organisation in R: vectors of differing lengths as a single object

I'm trying to plot overlaid density plots for two vectors on the same figure. As far as I know, I can't do so unless they are in the same object.
In order to plot the data, I need to have a data.frame() with two columns; one for the value, and one to specify which vector each value belongs to.
My first vector contains 400 values and the second contains 1200. My current (somewhat inelegant) solution concatenates the two vectors into one column of a new data frame and adds a second column containing 400 'a's and 1200 'b's to indicate which vector each value came from. This only works because I know how many values there were in each original vector.
Surely there must be a more efficient way to do this?
Let's say my original data are from dframe1$vector and dframe2$vector. I'm looking to create a new object called dframe3 which contains the columns $value and $original_vector_number. How do I do this?
You're trying to solve a problem you don't need to solve. You don't need to have them in the same object to plot their densities. Just use lines.
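x <- rnorm(400, 0, 1)
y <- rnorm(1200, 2, 2)
dx <- density(x)
dy <- density(y)
# set the axis limits from both curves so the second one is not clipped
plot(dx, xlim = range(dx$x, dy$x), ylim = range(dx$y, dy$y))
lines(dy, col = "red")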
Use library(reshape) and melt if you don't want to do this by hand:
library(reshape)
dframe <- data.frame(a = rnorm(400,1,1),b = rnorm(1200,1.2,2))
df.m <- melt(dframe)
library(ggplot2)
ggplot(df.m,aes(x = value,color = variable)) + geom_density()
Note that this does not give quite the right answer: data.frame() recycles the shorter vector (here, a is repeated three times) to match the longer one, and it fails outright if the longer length is not a multiple of the shorter. The correct way to build the data and plot it in ggplot is the following, by hand:
vecA <- data.frame(rnorm(400,1,1),'a')
vecB <- data.frame(rnorm(1200,1.2,2),'b')
names(vecA) <- c('value','name')
names(vecB) <- c('value','name')
dtf <- rbind(vecA,vecB)
library(ggplot2)
ggplot(dtf,aes(x=value,color=name))+geom_density()
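For the literal question, building dframe3 without hard-coding the lengths, base R's stack() turns a named list of vectors of any lengths into a long data frame (columns values and ind). A sketch with made-up dframe1 and dframe2; the rep(..., times = lengths(...)) version below it is an equivalent by-hand alternative.

```r
set.seed(42)
# Made-up stand-ins for the question's data frames
dframe1 <- data.frame(vector = rnorm(400, 1, 1))
dframe2 <- data.frame(vector = rnorm(1200, 1.2, 2))

# stack(): values in one column, the list element name in a factor column
dframe3 <- stack(list(a = dframe1$vector, b = dframe2$vector))
names(dframe3) <- c("value", "original_vector_number")

# By hand: each label is repeated according to its vector's length
vals <- list(a = dframe1$vector, b = dframe2$vector)
dframe3b <- data.frame(value = unlist(vals, use.names = FALSE),
                       original_vector_number = rep(names(vals), times = lengths(vals)),
                       stringsAsFactors = FALSE)

# Plot as before, e.g. with ggplot2 loaded:
# ggplot(dframe3, aes(x = value, colour = original_vector_number)) + geom_density()
```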
