I am trying to extract values from a RasterStack and append them to an existing dataframe. The values are a collection of metrics (PatchStat from the R package SDMTools) which I am able to extract into list format, but I am stuck trying to bind the values to my existing dataframe.
Input data:
library(sp)
library(sf)
library(raster)
library(dplyr)
library(SDMTools)
mydata <- read.table(header=TRUE, text = "
animal X Y ord.year
1 pb_20414 157978.9 2323819 2009168
2 pb_20414 156476.3 2325586 2009168
3 pb_06817 188512.0 2299679 2006263
4 pb_06817 207270.9 2287248 2006264")
# add rasters
s <- stack(system.file("external/rlogo.grd", package="raster"))
names(s) <- c('masie_ice_r00_v01_2009168_4km', 'masie_ice_r00_v01_2006263_4km', 'masie_ice_r00_v01_2006264_4km')
# Create sp object
projection <- CRS('+proj=stere +lat_0=90 +lat_ts=60 +lon_0=-80 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +datum=WGS84 +no_defs +towgs84=0,0,0') # matches MASIE raster
coords <- cbind(mydata$X, mydata$Y)
mydata.sp <- SpatialPointsDataFrame(coords = coords, data = mydata, proj4string = projection)
# Create sf object
mydata.sf <- st_as_sf(mydata.sp) # convert the sp object (keeps coordinates and CRS)
mydata.buf30 <- st_buffer(mydata.sf, 30000)
My goal is to match each GPS point (X, Y) with the correct GeoTIFF by date (mydata$ord.year), crop the raster to a (spatially explicit) 30 km buffer, run PatchStat from the SDMTools package for R, and append the results to the original dataframe. The catch is that PatchStat returns its results as a dataframe, so I am having trouble matching those results to my existing dataframe.
Here is an example of results provided when I run PatchStat:
patchID n.cell n.core.cell n.edges.perimeter n.edges.internal area core.area perimeter
2 3 73 13 86 206 73 13 86
perim.area.ratio shape.index frac.dim.index core.area.index
2 1.178082 2.388889 1.430175 0.1780822
Here is what I have been able to do so far:
# separate date component of TIF name to correspond to mydata$ord.year
stack <- list()
date <- vector()
for (i in 1:length(rasterlist)) {   # rasterlist: character vector of GeoTIFF file names (not included here)
  stack[[i]] <- raster(rasterlist[i])
  tt <- unlist(strsplit(names(stack[[i]]), "[_]"))
  date[i] <- tt[which(nchar(tt) == max(nchar(tt)))]   # the longest piece of the file name is the date
}
st <- stack(stack) # create RasterStack object
# crop raster to buffer
mydata.sp <- as(mydata.buf30, 'Spatial') # buffered polygons back to an sp object
# pull raster data from GeoTIFF that corresponds to ordinal date
pat <- list()
for (i in 1:nrow(mydata.sp)) {
  st2 <- st[[which(date == mydata.sp$ord.year[i])]]
  GeoCrop <- raster::crop(st2, mydata.sp[i, ])
  GeoCrop_mask <- raster::mask(GeoCrop, mydata.sp[i, ])
  pat[[i]] <- PatchStat(GeoCrop_mask)
}
Additionally, I have eliminated one of the two land cover types so that each element in the list has only one row:
pat2 <- lapply(pat, `[`, -1, ) # remove first row in each list element so only one row remains
Now, I would like to match these rows to my original dataframe, so that pat2[[1]] is appended to mydata.sp[1, ], like this (assuming a, b, and c are columns of metadata within my original SpatialPointsDataFrame). I would like all the columns of data from PatchStat added, but to save time and space I only include the first three here:
a b c PatchID n.cell n.core.cell
1 2 3 3 73 13
Note: If possible, I would love for this whole process to be included in the for loop to minimize room for error and also processing time.
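Roughly, what I imagine is something like this (untested sketch, building on the loop above; the single remaining PatchStat row is attached to each point's attributes inside the loop):
pat.rows <- list()
for (i in 1:nrow(mydata.sp)) {
  st2 <- st[[which(date == mydata.sp$ord.year[i])]]
  GeoCrop <- raster::crop(st2, mydata.sp[i, ])
  GeoCrop_mask <- raster::mask(GeoCrop, mydata.sp[i, ])
  ps <- PatchStat(GeoCrop_mask)[-1, , drop = FALSE]  # keep the single remaining land-cover row
  pat.rows[[i]] <- cbind(mydata.sp@data[i, ], ps)    # attach it to this point's attributes
}
mydata.out <- do.call(rbind, pat.rows)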
Thanks so much!
Thanks for your effort to provide example data, but it is still incomplete (it refers to files that we do not have). You could do this:
library(raster)
library(SDMTools)
s <- stack(system.file("external/rlogo.grd", package="raster"))
s <- round(s / 50) # to have fewer patches
names(s) <- c('masie_ice_r00_v01_2009168_4km', 'masie_ice_r00_v01_2006263_4km', 'masie_ice_r00_v01_2006264_4km')
df <- data.frame(ord.year=c("2009168", "2009168", "2006263", "2006264"))
pts <- SpatialPoints(cbind(c(20,40,60,80), c(20,40,60,20)))
crs(pts) <- crs(s)
pts <- SpatialPointsDataFrame(pts, df)
Make a buffer
b <- buffer(pts, 15, dissolve=FALSE)
Get matching names
nms <- names(s)
nms <- gsub('masie_ice_r00_v01_', '', nms)
nms <- gsub('_4km', '', nms)
Loop to match names, and put results in a list
p <- list()
for (i in 1:length(b)) {
j <- which(b$ord.year[i] == nms)
r <- s[[j]]
z <- crop(r, b[i,])
z <- mask(z, b[i,])
p[[i]] <- PatchStat(z)
}
Note that each element of p has a data.frame with multiple rows and columns.
p[[1]]
#patchID n.cell n.core.cell n.edges.perimeter n.edges.internal area core.area perimeter perim.area.ratio shape.index frac.dim.index core.area.index
#1 1 53 5 68 144 53 5 68 1.2830189 2.266667 1.427207 0.09433962
#2 2 123 8 182 310 123 8 182 1.4796748 3.956522 1.586686 0.06504065
#3 3 149 31 190 406 149 31 190 1.2751678 3.800000 1.543074 0.20805369
#4 4 54 2 114 102 54 2 114 2.1111111 3.800000 1.679578 0.03703704
#5 5 337 206 146 1202 337 206 146 0.4332344 1.972973 1.236172 0.61127596
If you only want the first rows
pp <- do.call(rbind, lapply(p, function(i) i[1,]))
Combining this with the original data.frame is now trivial
dfpp <- cbind(df, pp)
Well I did this very ugly thing and got what I wanted. But I don't like it. If anyone has a better idea I'd love to hear it!
# Change objects to df
pat2 <- lapply(pat, `[`, -1,) # remove first row in each list element
library(plyr) # ldply command
pat3 <- ldply(pat2, data.frame)
pat4 <- bind_cols(mydata, pat3) # bind_cols() is from dplyr; mydata is the original dataframe
Related
There are five polygons for five different cities (see the attached file in the link; it is called bound.shp). I also have a point file, "points.csv", with longitude and latitude where, for each point, I know the proportion of people belonging to group m and group h.
I am trying to calculate the spatial segregation proposed by Reardon and O'Sullivan in "Measures of Spatial Segregation".
There is a package called "seg" that should allow me to do this, but so far I have had no success.
Here is the link to the example file: LINK. After downloading the "example", this is what I do:
setwd("~/example")
library(seg)
library(sf)
bound <- st_read("bound.shp")
points <- st_read("points.csv", options=c("X_POSSIBLE_NAMES=x","Y_POSSIBLE_NAMES=y"))
#I apply the following formula
seg::spseg(bound, points[ ,c(group_m, group_h)] , smoothing = "kernel", sigma = bandwidth)
Error: 'x' must be a numeric matrix with two columns
Can someone help me solve this issue? Or is there an alternate method which I can use?
Thanks a lot.
I don't know exactly what the spseg function does, but looking at the spseg documentation in the seg package:
The first argument, x, should be a data.frame or an object of class Spatial.
The second argument, data, should be a matrix or data.frame.
Looking at the examples for spseg, note that data should have the same number of rows as there are ids in the Spatial object. In your sample, the ids are the cities, each with its own polygon.
First, let's examine the bound data:
setwd("~/example")
library(seg)
library(sf)
#For the fortify function
library(ggplot2)
bound <- st_read("bound.shp")
bound <- as_Spatial(bound)
class(bound)
"SpatialPolygonsDataFrame"
attr(,"package")
"sp"
tail(fortify(bound))
Regions defined for each Polygons
long lat order hole piece id group
5379 83.99410 27.17326 972 FALSE 1 5 5.1
5380 83.99583 27.17339 973 FALSE 1 5 5.1
5381 83.99705 27.17430 974 FALSE 1 5 5.1
5382 83.99792 27.17552 975 FALSE 1 5 5.1
5383 83.99810 27.17690 976 FALSE 1 5 5.1
5384 83.99812 27.17700 977 FALSE 1 5 5.1
So you have 5 ids in your SpatialPolygonsDataFrame. Now, let's read points.csv with read.csv, since the data needs to be in matrix format for the spseg function.
points <- read.csv("points.csv")
tail(points)
group_m group_h x y
950 4.95 78.49000 84.32887 26.81203
951 5.30 86.22167 84.27448 26.76932
952 8.68 77.85333 84.33353 26.80942
953 7.75 82.34000 84.35270 26.82850
954 7.75 82.34000 84.35270 26.82850
955 7.75 82.34000 84.35270 26.82850
The documentation and its example state clearly that the number of rows of the points data (which has two attributes, group_m and group_h, here) should equal the number of ids (the cities). You could, for example, calculate the mean (or some other statistic) per city so that you end up with a single value for each polygon.
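A rough sketch of that idea (untested; it assumes the polygons carry a city id column, here called city, and uses the sf object returned by st_read before the as_Spatial conversion):
bound_sf <- st_read("bound.shp")                                  # sf version of the polygons
pts_sf <- st_as_sf(points, coords = c("x", "y"), crs = st_crs(bound_sf))
pts_sf <- st_join(pts_sf, bound_sf)                               # attach polygon attributes to each point
agg <- aggregate(cbind(group_m, group_h) ~ city,                  # one mean value per city (assumed id column)
                 data = st_drop_geometry(pts_sf), FUN = mean)
# check that the row order of agg matches the polygon order in bound before calling spseg
spseg(bound, as.matrix(agg[, c("group_m", "group_h")]))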
That said, just to show that the function works when fed a matrix with 5 rows and 2 groups:
sample_spseg <- spseg(bound, as.matrix(points[1:5,c("group_m", "group_h")]))
print(sample_spseg)
Reardon and O'Sullivan's spatial segregation measures
Dissimilarity (D) : 0.0209283
Relative diversity (R): -0.008781
Information theory (H): -0.0066197
Exposure/Isolation (P):
group_m group_h
group_m 0.07577679 0.9242232
group_h 0.07516285 0.9248372
--
The exposure/isolation matrix should be read horizontally.
Read 'help(spseg)' for more details.
First: I do not have experience with the seg package and its functions.
What I read from your question is that you want to run the spseg function on the points within each area?
If so, here is a possible approach:
library(sf)
library(tidyverse)
library(seg)
library(mapview) # for quick viewing only
# read polygons, make valid to avoid problems later on
areas <- st_read("./temp/example/bound.shp") %>%
sf::st_make_valid()
# read points and convert to sf object
points <- read.csv("./temp/example/points.csv") %>%
sf::st_as_sf(coords = c("x", "y"), crs = 4326) %>%
# spatial join to add the city (could also use st_intersection())
sf::st_join(areas)
# what do we have so far??
mapview::mapview(points, zcol = "city")
# get the coordinates back into a data.frame
mydata <- cbind(points, st_coordinates(points))
# drop the geometry, we do not need it anymore
st_geometry(mydata) <- NULL
# looks like...
head(mydata)
# group_m group_h city X Y
# 1 8.02 84.51 2 84.02780 27.31180
# 2 8.02 84.51 2 84.02780 27.31180
# 3 8.02 84.51 2 84.02780 27.31180
# 4 5.01 84.96 2 84.04308 27.27651
# 5 5.01 84.96 2 84.04622 27.27152
# 6 5.01 84.96 2 84.04622 27.27152
# Split to a list by city
L <- split(mydata, mydata$city)
# loop over list and perform spseg function
final <- lapply(L, function(i) spseg(x = i[, 4:5], data = i[, 1:2]))
# test for the first city
final[[1]]
# Reardon and O'Sullivan's spatial segregation measures
#
# Dissimilarity (D) : 0.0063
# Relative diversity (R): -0.0088
# Information theory (H): -0.0067
# Exposure/Isolation (P):
# group_m group_h
# group_m 0.1160976 0.8839024
# group_h 0.1157357 0.8842643
# --
# The exposure/isolation matrix should be read horizontally.
# Read 'help(spseg)' for more details.
spplot(final[[1]], main = "Equal")
I have a data frame (called "events") of camera trap data with coordinates and want to extract the habitat type using a raster file for each location and add the habitat type to my data frame. How do I extract this using the raster and the data frame coordinates? How do I add this to another main data frame afterwards?
## Creating the raster file from a shapefile
myfile <- shapefile("dpky.lc5.shp")
myfile@data$VALUE <- as.numeric(myfile@data$VALUE) # VALUE gives the numeric code for habitat type.
sr <- "+init=EPSG:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
r <- raster(myfile, res=100, crs=sr)
myraster<-rasterize(myfile,r,field="VALUE")
myras_spdf <- as(myraster, "SpatialPixelsDataFrame")
myras_df <- as.data.frame(myras_spdf)
## Data frame with coordinates
events <- read.csv("DPKY.Clean.csv",h=T,sep=";")
events.sp<-SpatialPoints(events[,c("Longitude","Latitude")],proj4string = CRS("+init=EPSG:4326"))
events.sp
I have not been able to find any code for this problem yet specific to my problem. I did manage using another .gri file but that code doesn't work for this.
It appears that you have points and polygons, and want to query the polygons' values with the points. In other words, extract the values for the points from the polygons. If that is the case, it makes no sense to create a RasterLayer (and/or a SpatialPixels) object.
Always provide some example data (p has the polygons, d is a data.frame with coordinates)
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
set.seed(10)
d <- coordinates(spsample(p, 4, "regular"))
colnames(d) <- c("lon", "lat")
d <- data.frame(id=1:nrow(d), d)
Solution
x <- extract(p, d[,c("lon", "lat")])
Now you can do
cbind(d, x[,c(4,6)])
# id lon lat NAME_1 NAME_2
#1 1 5.889636 49.53576 Luxembourg Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg Luxembourg
#3 3 5.889636 49.82246 Diekirch Redange
#4 4 6.176340 49.82246 Diekirch Diekirch
Or something like this
d$NAME_2 <- x$NAME_2
d
# id lon lat NAME_2
#1 1 5.889636 49.53576 Esch-sur-Alzette
#2 2 6.176340 49.53576 Luxembourg
#3 3 5.889636 49.82246 Redange
#4 4 6.176340 49.82246 Diekirch
With some effort and help from the stackers, I have been able to parse a webpage and save it as a dataframe. I want to repeat the same operation on multiple xml files and rbind the list. Here is what I tried and did successfully:
library(XML)
xml.url <- "http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml"
doc <- xmlParse(xml.url)
x <- xmlToDataFrame(getNodeSet(doc,"//SAMPLE_ATTRIBUTE"))
x$UNITS <- NULL
x_t <- t(x)
x_t <- as.data.frame(x_t)
names(x_t) <- as.matrix(x_t[1, ])
x_t <- x_t[-1, ]
x_t[] <- lapply(x_t, function(x) type.convert(as.character(x)))
The above code works well. Now, when I try to apply a function to do the same for multiple XML files:
ERS_ID <- c("ERS445758","ERS445759", "ERS445760", "ERS445761", "ERS445762")
xml_url_test = as.vector(sprintf("http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml",
ERS_ID))
XML_parser <- function(XML_url){
doc <- xmlParse(XML_url)
x <- xmlToDataFrame(getNodeSet(doc,"//SAMPLE_ATTRIBUTE"))
x$UNITS <- NULL
x_t <- t(x)
x_t <- as.data.frame(x_t)
names(x_t) <- as.matrix(x_t[1, ])
x_t <- x_t[-1, ]
x_t[] <- lapply(x_t, function(x) type.convert(as.character(x)))
return(x_t)
}
major_test <- sapply(xml_url_test, XML_parser)
It runs, but gives me a long list that is not in the same data frame format as I got for the single XML file.
Finally I would like to also add a column to the final dataframe that has the ERS number from the ERS_ID vector
Something like x_t$ERSid <- ERS_ID in the function
Can someone point out what am I missing in the function as well as any better ways to do the task?
Thanks!
Your main issue is using sapply over lapply(): the latter returns a list, while the former attempts to simplify the result to a vector or matrix (here, a matrix).
major_test <- lapply(xml_url_test, XML_parser)
Of course, sapply is a wrapper for lapply and can also return a list: sapply(..., simplify=FALSE):
major_test <- sapply(xml_url_test, XML_parser, simplify=FALSE)
However, a few other items came up:
At the beginning, you are not concatenating your ERS_ID to the URL stem with sprintf's %s operator, so right now the same URL is repeated.
At the end, you are not binding your list of data frames into a single final dataframe.
Add the new ERS column inside your function, passing in the ERS_ID vector; while creating the column, also remove the ERS prefix with gsub.
R code (adjusted)
XML_parser <- function(eid) {
XML_url <- as.vector(sprintf("http://www.ebi.ac.uk/ena/data/view/%s&display=xml", eid))
doc <- xmlParse(XML_url)
x <- xmlToDataFrame(getNodeSet(doc,"//SAMPLE_ATTRIBUTE"))
x$UNITS <- NULL
x_t <- t(x)
x_t <- as.data.frame(x_t)
names(x_t) <- as.matrix(x_t[1, ])
x_t <- x_t[-1, ]
x_t[] <- lapply(x_t, function(x) type.convert(as.character(x)))
x_t$ERSid <- gsub("ERS", "", eid) # ADD COL, REMOVE ERS
x_t <- x_t[, c(ncol(x_t), 1:(ncol(x_t)-1))] # MOVE NEW COL TO FIRST
return(x_t)
}
major_test <- lapply(ERS_ID, XML_parser)
# major_test <- sapply(ERS_ID, XML_parser, simplify=FALSE)
# BIND DATA FRAMES TOGETHER
finaldf <- do.call(rbind, major_test)
# RESET ROW NAMES
row.names(finaldf) <- seq(nrow(finaldf))
Using xml2 and the tidyverse you can do something like this:
require(xml2)
require(purrr)
require(tidyr)
urls <- rep("http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml", 2)
identifier <- LETTERS[seq_along(urls)] # Take a unique identifier per url here
parse_attribute <- function(x){
out <- data.frame(tag = xml_text(xml_find_all(x, "./TAG")),
value = xml_text(xml_find_all(x, "./VALUE")), stringsAsFactors = FALSE)
spread(out, tag, value)
}
doc <- map(urls, read_xml)
out <- doc %>%
map(xml_find_all, "//SAMPLE_ATTRIBUTE") %>%
set_names(identifier) %>%
map_df(parse_attribute, .id="url")
This gives you a 2x36 data.frame. To parse the column types I would suggest using readr::type_convert(out).
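For example (a one-line sketch; out is the data.frame built above):
out <- readr::type_convert(out)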
out looks as follows:
url age body product body site body-mass index chimera check collection date
1 A 28 mucosa Sigmoid colon 16.95502 ChimeraSlayer; Usearch 4.1 database 2009-03-16
2 B 28 mucosa Sigmoid colon 16.95502 ChimeraSlayer; Usearch 4.1 database 2009-03-16
disease status ENA-BASE-COUNT ENA-CHECKLIST ENA-FIRST-PUBLIC ENA-LAST-UPDATE ENA-SPOT-COUNT
1 remission 627051 ERC000015 2014-12-31 2016-10-21 1668
2 remission 627051 ERC000015 2014-12-31 2016-10-21 1668
environment (biome) environment (feature) environment (material) experimental factor
1 organism-associated habitat organism-associated habitat mucus microbiome
2 organism-associated habitat organism-associated habitat mucus microbiome
gastrointestinal tract disorder geographic location (country and/or sea,region) geographic location (latitude)
1 Ulcerative Colitis India 72.82807
2 Ulcerative Colitis India 72.82807
geographic location (longitude) host subject id human gut environmental package investigation type
1 18.94084 1 human-gut metagenome
2 18.94084 1 human-gut metagenome
medication multiplex identifiers pcr primers phenotype project name
1 ASA;Steroids;Probiotics;Antibiotics TGATACGTCT 27F-338R pathological BMRP
2 ASA;Steroids;Probiotics;Antibiotics TGATACGTCT 27F-338R pathological BMRP
sample collection device or method sequence quality check sequencing method sequencing template sex target gene
1 biopsy software pyrosequencing DNA male 16S rRNA
2 biopsy software pyrosequencing DNA male 16S rRNA
target subfragment
1 V1V2
2 V1V2
purrr is really helpful here, as you can iterate over a vector of URLs or a list of XML files with map, or within nested elements with at_depth, and simplify the results with the *_df forms and flatten.
library(tidyverse)
library(xml2)
# be kind, don't call this more times than you need to
x <- c("ERS445758","ERS445759", "ERS445760", "ERS445761", "ERS445762") %>%
sprintf("http://www.ebi.ac.uk/ena/data/view/%s&display=xml", .) %>%
map(read_xml) # read each URL into a list item
df <- x %>% map(xml_find_all, '//SAMPLE_ATTRIBUTE') %>% # for each item select nodes
at_depth(2, as_list) %>% # convert each (nested) attribute to list
map_df(map_df, flatten) # flatten items, collect pages to df, then all to one df
df
## # A tibble: 175 × 3
## TAG VALUE UNITS
## <chr> <chr> <chr>
## 1 investigation type metagenome <NA>
## 2 project name BMRP <NA>
## 3 experimental factor microbiome <NA>
## 4 target gene 16S rRNA <NA>
## 5 target subfragment V1V2 <NA>
## 6 pcr primers 27F-338R <NA>
## 7 multiplex identifiers TGATACGTCT <NA>
## 8 sequencing method pyrosequencing <NA>
## 9 sequence quality check software <NA>
## 10 chimera check ChimeraSlayer; Usearch 4.1 database <NA>
## # ... with 165 more rows
You can retrieve multiple IDs with a single REST URL using a comma-separated list or a range like ERS445758-ERS445762, and avoid multiple queries to the ENA.
This code gets all 5 samples into a node set and then applies functions using a leading dot in the XPath string so it is relative to that node.
ERS_ID <- c("ERS445758","ERS445759", "ERS445760", "ERS445761", "ERS445762")
url <- paste0( "http://www.ebi.ac.uk/ena/data/view/", paste(ERS_ID, collapse=","), "&display=xml")
doc <- xmlParse(url)
samples <- getNodeSet( doc, "//SAMPLE")
## check the first node
samples[[1]]
## get the sample attribute node set and apply xmlToDataFrame to that
x <- lapply( lapply(samples, getNodeSet, ".//SAMPLE_ATTRIBUTE"), xmlToDataFrame)
# labels for bind_rows
names(x) <- sapply(samples, xpathSApply, ".//PRIMARY_ID", xmlValue)
library(dplyr)
y <- bind_rows(x, .id="sample")
z <- subset(y, TAG %in% c("age","sex","body site","body-mass index") , 1:3)
sample TAG VALUE
15 ERS445758 age 28
16 ERS445758 sex male
17 ERS445758 body site Sigmoid colon
19 ERS445758 body-mass index 16.9550173
50 ERS445759 age 58
51 ERS445759 sex male
...
library(tidyr)
z %>% spread( TAG, VALUE)
sample age body site body-mass index sex
1 ERS445758 28 Sigmoid colon 16.9550173 male
2 ERS445759 58 Sigmoid colon 23.22543185 male
3 ERS445760 26 Sigmoid colon 20.76124567 female
4 ERS445761 30 Sigmoid colon 0 male
5 ERS445762 36 Sigmoid colon 0 male
I have the following tab-separated .txt file with 9796 lines:
https://www.dropbox.com/s/fnrbmaw8odm2rqs/Kommunale_N%C3%B8gletal.txt?dl=0
I would like to read the file into R; however, the file is not in a classic table format. Instead, each variable of interest has 279 rows and 16 columns, where the first row gives the variable name, the first 2 columns give a municipality name and code, and the following 14 give the years 1993-2006. Each variable is separated by a blank row. The file includes 35 variables.
I would like to read the data into a data.frame with one column each for the municipality name, the municipality code, and the year, and one column for each of the 35 variables.
In case you are not comfortable following links or prefer a smaller sample, the following illustrates the dataset (2 variables and 3 years of observations):
Indbyggertal 1 januar
Københavns Kommune 101 466129 467253 471300
Frederiksberg Kommune 147 87173 87466 88002
Ballerup Kommune 151 45427 45293 45356
Andel 0-17-årige
Københavns Kommune 101 14.0 14.1 14.4
Frederiksberg Kommune 147 12.4 12.5 12.6
Ballerup Kommune 151 21.2 21.1 21.3
The first 3 lines of the preferred output should look like this:
Municipality name Municipality code Year Indbyggertal 1 januar Andel 0-17-årige … Ældreudg (netto) pr 65+/67+-årig
Københavns Kommune 101 1993 466129 14 35350
Frederiksberg Kommune 147 1993 87173 12.4 33701
Ballerup Kommune 151 1993 45427 21.2 31126
There are probably more ways to do this, but the trick I used below is to read all the data in as text, then determine the positions where new blocks begin, and finally loop through all blocks, reading each one and storing it in a list:
lines <- readLines("Kommunale_Nøgletal.txt", encoding = "latin1")
# Find empty lines; these start a new block
start <- c(0, grep("^[\t]+$", lines))
# Read titles
headers <- lines[start + 1]
headers <- gsub("\t", "", headers)
# Determine beginning and end of data blocks
begin <- start + 2
end <- c(start[-1]-1, length(lines))
# Read each of the data blocks into a list
data <- vector(mode = "list", length(headers))
for (i in seq_along(headers)) {
block <- lines[begin[i]:end[i]]
data[[i]] <- read.table(textConnection(block), sep="\t", na.strings=c("U","M","-"))
}
names(data) <- headers
Setting the correct headers in each of the data sets should be simple after this, and combining them into one data.frame can be done using bind_rows from the dplyr package (rbind_all is deprecated). Below is an example:
# Set columnnames in data
# Add variable name to data
for (i in names(data)) {
names(data[[i]]) <- c("municipality", "code", paste0("Y", 1993:2006))
data[[i]]$var = i
}
# Merge the different datasets into one data.frame
library(dplyr)
data <- bind_rows(data)
# Transpose the data
library(reshape2)
m <- melt(data, id.vars = c("municipality", "code", "var"))
res <- dcast(m, municipality + code + variable ~ var)
# Fix the year variable
names(res)[3] <- "year"
res$year <- as.numeric(gsub("Y", "", res$year))
I want to dissolve some polygons, and I am doing the following:
1. Reading in the shapefile (DA.shp - sensitive information, hence only the first two sample records and the first three columns are shown)
2. Reading in a csv file called zone.csv that has the information for dissolving, and joining zone.csv to DA (first five records shown due to sensitivity)
3. Dissolving the joined shapefile
4. Creating row IDs to make the dissolved shapefile into a polygon dataframe for export
It all goes smoothly; however, I want to carry the zed and Criteria fields into my dissolved polygons, as one can in GIS. I have searched in vain, so any help will be appreciated.
library(rgeos)
library(rgdal)
library(sp)
# set working directory
wd <- setwd("c:/Personal/R")
# read DA shapefile
da <- readOGR(wd, "DA")
plot(da)
crs.shp <- proj4string(da)
da@data[1:2,1:3] # check first two records
OBJECTID DAUID CDUID
0 3 35204831 3520
1 5 35180720 3518
# batchin text file with zone numbers
zones.csv <- read.csv(file="c:/personal/R/Variant1.csv", header=TRUE, sep=",")
zones.csv$DAUID <- as.character(zones.csv$DAUID) # make DAUID as character for join
zones.csv[1:5,]
DAUID zed Criteria
1 35140110 3102 GGHM zones
2 35140111 3102 GGHM zones
3 35140112 3102 GGHM zones
4 35140113 3102 GGHM zones
5 35140114 3102 GGHM zones
da1 <- da # save a copy
da1@data$DAUID <- as.character(da1@data$DAUID) # make character field for join
da1@data <- merge(da1@data, zones.csv, by.x = "DAUID", by.y = "DAUID", all.x=T, sort=F)
# Now dissolve
zone.shp <- gUnaryUnion(da1, id = da1@data$zed.x)
plot(zone.shp)
# extract zone Id's to make dataframe
Gid <- sapply(slot(zone.shp, "polygons"), function(x) slot(x, "ID"))
# Create dataframe with correct rownames
z.df <- data.frame( ID=1:length(zone.shp), row.names = Gid)
# make Polygondataframe to export as shapefile
zone.shp.pdf <- SpatialPolygonsDataFrame(zone.shp, data=z.df)
zone.shp.pdf@data$crit <- # ??? this is where I am stuck - how do I carry Criteria (and zed) over?
proj4string(zone.shp.pdf) <- CRS(proj4string(da))
Here is a self-contained reproducible example with some SpatialPolygons:
library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
Create a data.frame, and so on.
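For instance, a toy zone table to merge and dissolve on (the zone values below are invented purely for illustration):
# invented lookup: assign each commune (NAME_2) to one of three made-up zones
zones <- data.frame(NAME_2 = p$NAME_2, zed = rep(1:3, length.out = nrow(p)))
p2 <- merge(p, zones, by = "NAME_2")
z <- aggregate(p2, by = "zed")  # dissolve; the zed value is carried into the result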
Anyway, I think you can use raster::aggregate to solve your problem. Below is a simplified and improved script, but I cannot check if it works as I do not have your data.
library(raster)
da <- shapefile("c:/Personal/DA.shp")
zones <- read.csv("c:/personal/R/Variant1.csv", stringsAsFactors=FALSE)
da1 <- merge(da, zones, by="DAUID", all.x=TRUE)
# Now dissolve
zone.shp <- aggregate(da1, c('zed', 'Criteria'))
If you want to write this to a shapefile:
shapefile(zone.shp, 'file.shp')