Selecting and renaming columns in SpatialPointsDataFrame - r

I'm working with a feature class dataset extracted from a geodatabase, which I've filtered to my area of interest and intersected with a SpatialPointsDataFrame. To export it to a shapefile with writeOGR I need to format the attribute names, and I also want to select only specific columns to export in my final shapefile. I have been running into a lot of errors using standard select or base R subsetting techniques. For some reason R doesn't seem to recognize the column names when I try to select. I've tried lots of different methods and can't figure out where I'm going wrong.
```
bfcln %>%
  select(STATEFP, DP2_HC03_V, DP2_HC03V.1)
Error in tolower(use) : object 'STATEFP' not found
```
```
# create a spatial join between bf_pop and or_acs
# check CRS
crsbf <- bf_pop@proj4string
# change acs CRS to match bf_pop
oracs_reprj <- spTransform(or_acs, crsbf)
# join by spatial attributes
bf_int <- raster::intersect(bf_pop, oracs_reprj)
# truncate field names to 10 characters for ESRI formatting
names(bf_int) <- strtrim(names(bf_int), 10)
# remove duplicates from attribute table
bfcln <- bf_int[which(!duplicated(bf_int$id)), ]
```
After failing with the select() method multiple times, I tried renaming columns.
```
# rename variables of interest
bfcln1 <- bfcln %>%
  select(DP2_HC03_V) %>%
  rename(DP2_HC03_V=pcntunmar) %>%
  select(DP2_HC03_V.1) %>%
  rename(DP2_HC03_V.1=pcntirsh)
Error in tolower(use) : object 'DP2_HC03_V' not found
```

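To use dplyr verbs such as rename() on Spatial*DataFrame objects, you'll need to install and load the spdplyr package.
As with dplyr, the new name goes on the left and the existing name on the right:
library(spdplyr)
df <- df %>%
  rename(newName = oldName)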
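If you'd rather not add a dependency, base R indexing may be enough. The sketch below uses a plain data frame with invented values as a stand-in for the attribute table; the column names come from the question, and the new names are the ones the asker wanted. sp objects accept the same `[` column indexing and `names<-` assignment (the question already uses `names(bf_int) <- ...`), so the same two steps should carry over to bfcln.

```r
# Stand-in for the attribute table of bfcln (values are invented for illustration)
attrs <- data.frame(
  STATEFP      = c("41", "41"),
  DP2_HC03_V   = c(12.5, 30.1),
  DP2_HC03_V.1 = c(4.2, 7.8),
  id           = c(1, 2)
)

# Keep only the columns of interest, selecting by name
keep <- c("STATEFP", "DP2_HC03_V", "DP2_HC03_V.1")
attrs_sel <- attrs[, keep]

# Rename with a lookup vector: names are the old names, values the new ones
new_names <- c(DP2_HC03_V = "pcntunmar", DP2_HC03_V.1 = "pcntirsh")
hit <- names(attrs_sel) %in% names(new_names)
names(attrs_sel)[hit] <- new_names[names(attrs_sel)[hit]]

print(names(attrs_sel))
```

Selecting by a character vector of names also sidesteps the non-standard evaluation that produced the "object 'STATEFP' not found" error.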

Related

R: Read specific columns of .dta file and converting variable names to lower case without reading whole file

I have a folder with multiple .dta files and I'm using the read_dta() function of the haven library to bind them. The problem is that some of the files have their column names in lower case and others have them in upper case.
I was wondering if there is a way to only read the specific columns by changing their name to lower case in every case without reading the whole file and then selecting the columns, since the files are really large and this would take forever.
I was hoping that by using the .name_repair = element in the read_dta() function I could do this, but I really don't know how.
I'm trying something like this:
#Set working directory:
setwd("T:/")
#List of .dta file names to bind:
list_names<-list_names[grepl("_sdem.dta", list_names)]
#Variable names to select from those files:
vars_select<-c("r_def", "c_res", "ur", "con", "n_hog", "v_sel", "n_pro_viv","fac", "n_ren", "upm","eda", "clase1", "clase2", "clase3", "ent", "sex", "e_con", "niv_ins", "eda7c", "tpg_p8a","emp_ppal", "tue_ppal", "sub_o" )
#Read and bind ONLY the selected variables from the list of files
dataset <- data.frame()
for (i in 1:length(list_names)){
  temp_data <- read_dta(list_names[i], col_select = vars_select)
  dataset <- rbind(dataset, temp_data)
}
The problem is that when a file has its variable names in upper case, those names don't match the entries in vars_select, and the following error appears:
Error: Can't subset columns that don't exist.
x Columns `r_def`, `c_res`, `n_hog`, `v_sel`, `n_pro_viv`, etc. don't exist.
I was trying to use the .name_repair = element in the read_dta() function to try to correct this, by using the tolower() function.
I was trying something like this with a specific file that has an upper case variable name format:
example_data <- read_dta("T:/2017_2_sdem.dta", col_select = vars_select, .name_repair = tolower(names()))
But the same error appears:
Error: Can't subset columns that don't exist.
x Columns `r_def`, `c_res`, `n_hog`, `v_sel`, `n_pro_viv`, etc. don't exist.
Thanks so much for your help!
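One way around this, offered as a sketch rather than a tested solution: read only the header of each file, match the wanted names against it ignoring case, and pass the file's own spelling to col_select. The helper name match_cols_ignore_case is made up for this example, the header below is invented, and the commented usage assumes that read_dta() with n_max = 0 returns a zero-row tibble that still carries all column names.

```r
# Hypothetical helper: return the names in `available` that match `wanted`,
# ignoring case, in the spelling actually used by the file
match_cols_ignore_case <- function(available, wanted) {
  available[tolower(available) %in% tolower(wanted)]
}

# Demonstration with an invented upper-case header
file_cols    <- c("R_DEF", "C_RES", "UR", "EXTRA_VAR")
vars_select  <- c("r_def", "c_res", "ur")
cols_in_file <- match_cols_ignore_case(file_cols, vars_select)
print(cols_in_file)

# Assumed usage with haven (not run here; `path` is one entry of list_names):
# header    <- names(haven::read_dta(path, n_max = 0))
# wanted    <- match_cols_ignore_case(header, vars_select)
# temp_data <- haven::read_dta(path, col_select = dplyr::all_of(wanted))
# names(temp_data) <- tolower(names(temp_data))
```

Lower-casing the names after reading keeps the columns aligned when the files are bound together.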

Replacing column values based on related column in R

I'm currently working on a dataset which has an address and a zip code column. I'm trying to deal with the invalid/missing data in zip code by finding a different record with same address, and then filling the corresponding zip code to the invalid zip code. What would be the best approach to go about doing this?
Step 1. Using the non-missing addresses and zip codes construct a dictionary
data frame of sorts. For example, in a data frame "df" with an "address"
column and a "zip_code" column, you could get this via:
library(dplyr)
zip_dictionary <- na.omit(select(df, address, zip_code))
zip_dictionary <- distinct(zip_dictionary)
This assumes there is only one unique value of "zip_code" for each "address"
in your data. If not, you need to figure out which value to use and filter or
recode it accordingly.
Step 2. Install the {elucidate} package from GitHub and use the translate()
function to fill in the missing zip codes using the extracted dictionary from
step 1:
remotes::install_github("bcgov/elucidate")
library(elucidate)
df <- df %>%
  mutate(zip_code = if_else(is.na(zip_code),
                            translate(address,
                                      old = zip_dictionary$address,
                                      new = zip_dictionary$zip_code),
                            zip_code) # keep the existing zip code when it is not missing
  )
Disclaimer: I am the author of the {elucidate} package.
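For comparison, the same dictionary lookup can be sketched in base R with match(), with no extra packages. The data below are invented, and the sketch makes the same assumption as step 1: each address has only one valid zip code.

```r
# Invented example data: two addresses, each with one record missing its zip code
df <- data.frame(
  address  = c("1 Main St", "1 Main St", "9 Oak Ave", "9 Oak Ave"),
  zip_code = c("97201", NA, NA, "97330"),
  stringsAsFactors = FALSE
)

# Dictionary of known address/zip pairs (assumes one zip code per address)
zip_dictionary <- unique(df[!is.na(df$zip_code), c("address", "zip_code")])

# Look up each missing zip code by address; addresses with no match stay NA
missing <- is.na(df$zip_code)
lookup  <- match(df$address[missing], zip_dictionary$address)
df$zip_code[missing] <- zip_dictionary$zip_code[lookup]

print(df)
```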

How to Subset One Spatial Data Set into Multiple Spatial Datasets Based on Feature Name in R

I have a spatial dataset, polyline, that contains 115 line features and am trying to figure out if it is possible to select and save each line feature out into individual shape files using a loop or similar function?
I understand how to do this individually using subset (example below), however repeating this process 115 times seems like a waste of time and the power of R.
I am including an example of the data below:
trailname <- c("trail1", "trail2", "trail3")
trailtype <- c("mountain", "flat", "hilly")
parking <- c("no", "yes", "no")
shapelength <- c("835", "5728", "367")
trails <- data.frame(trailname, trailtype, parking, shapelength)
Here is a single subset example:
trail1 <- subset(trails, trailname == "trail1")
I would like to select each trail, and save it out as the name that appears under the "trail name" column i.e., trail1.shp
In base R, couldn't you use the assign function in a for loop to do this?
trailname <- c("trail1", "trail2", "trail3")
trailtype <- c("mountain", "flat", "hilly")
parking <- c("no", "yes", "no")
shapelength <- c("835", "5728", "367")
trails <- data.frame(trailname, trailtype, parking, shapelength)
for(i in 1:nrow(trails)){
  name <- as.character(trails$trailname[[i]])
  assign( name, subset(trails, trailname == trails$trailname[[i]]) )
}
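A variant of the same idea, as a sketch: split() returns a named list with one data frame per trail, which avoids creating 115 separate objects in the workspace with assign(). The example rebuilds the same trails data frame so it can be run on its own.

```r
trails <- data.frame(
  trailname   = c("trail1", "trail2", "trail3"),
  trailtype   = c("mountain", "flat", "hilly"),
  parking     = c("no", "yes", "no"),
  shapelength = c("835", "5728", "367")
)

# One list element per unique trail name; element names come from trailname
trail_list <- split(trails, trails$trailname)

print(names(trail_list))
print(trail_list[["trail2"]])
```

Each element can then be processed in a loop or with lapply(), using names(trail_list) for the output file names.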
EDITED TO ANSWER OP'S COMMENT
This should be doable with a few tweaks. One item to note is that the example you provided is a data frame, while the writeOGR function takes...
SpatialPointsDataFrame, SpatialLinesDataFrame, or SpatialPolygonsDataFrame objects as defined in the sp package.
These types of objects contain data frames, but also other attributes that are likely of interest. Let's assume your data is in one of these accepted types. I'll use the rgdal cities data as an example. If all we care about is saving the files outside of our R session, then skip the assign function and pass the subset directly to the writeOGR function:
library('rgdal')
#loading in data
cities <- readOGR(system.file("vectors", package = "rgdal")[1], "cities")
#taking only first two rows for this example
shap <- cities[1:2,]
#where you want to save these files. This places it on your current working directory
location <- getwd()[[1]]
for(i in 1:nrow(shap)){
  # name of file
  name <- as.character(shap$NAME[[i]])
  # change shap to your 'SpatialPointsDataFrame'
  writeOGR(subset(shap, NAME == shap$NAME[[i]]), location, name, driver="ESRI Shapefile")
}
There is an R package called ShapePattern. Look up the function shpsplitter; it seems to do what you want. Otherwise you can do it in other GIS software, see https://gis.stackexchange.com/questions/25709/splitting-shapefile-into-separate-files-for-each-feature-using-qgis-gdal-saga

R spCbind error

I have successfully added information to shapefiles before (see my post on http://rusergroup.swansea.ac.uk/Healthmap.ashx?HL=map ).
However, I just tried to do it again with a slightly different shapefile (new local health boards for Wales) and the code fails at spCbind with a "row names not identical" error.
o <- match(wales.lonlat$NEW_LABEL, wds$HB_CD)
wds.xtra <- wds[o,]
wales.ncchd <- spCbind(wales.lonlat, wds.xtra)
My rows did have different names before and that didn't cause any problems. I relabeled the column in wds.xtra to match "NEW_LABEL" and that doesn't help.
The labels and order of labels do match exactly between wales.lonlat and wds.xtra.
(I'm using Revolution R 5.0, which is built on R 2.13.2)
I use match to merge data to the sp data slot based on rownames (or any other common ID). This avoids the necessity of maptools for the spCbind function.
# Based on rownames
sdata@data <- data.frame(sdata@data, new.df[match(rownames(sdata@data), rownames(new.df)), ])
# Based on common ID
sdata@data <- data.frame(sdata@data, new.df[match(sdata@data$ID, new.df$ID), ])
# where sdata is your sp object and new.df is a data.frame object that you want to merge to sdata.
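To see why match() keeps the merged rows lined up with the geometries, here is a small sketch on plain data frames. The names attrs and new.df and all values are invented: attrs plays the role of the data slot, and new.df is deliberately stored in a different row order. For brevity it merges a single column with cbind().

```r
# Invented stand-ins: `attrs` plays the role of the data slot, `new.df` the table to merge
attrs  <- data.frame(ID = c("A", "B", "C"), area = c(10, 20, 30))
new.df <- data.frame(ID = c("C", "A", "B"), rate = c(0.3, 0.1, 0.2))

# match() gives, for each row of attrs, the position of the same ID in new.df,
# so the merged column follows the row order of attrs
idx    <- match(attrs$ID, new.df$ID)
merged <- cbind(attrs, rate = new.df$rate[idx])

print(merged)
```

IDs that are absent from new.df produce NA in idx, and therefore NA in the merged column, rather than shifting the remaining rows.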
I had the same error and resolved it by dropping all the data that I did not actually want to add. I suppose the extra fields confused spCbind, because the matching was attempted on all row elements, not only the one given. In my example, I used
xtra2 <- data.frame(xtra$ID_3, xtra$COMPANY)
to extract the relevant fields and fed them to spCbind afterwards
gadm <- spCbind(gadm, xtra2)

colnames intgroup argument of arrayQualityMetrics package of Biobase

I am using a package from Biobase, arrayQualityMetrics, to create plots for visualizing microarray data.
My data is stored in an ExpressionSet.
One of the columns of phenoData(ExpressionSet) is named "Tissue", but when I run the following command:
arrayQualityMetrics(ExpressionSet,intgroup = "Tissue")
it gives me the following error:
Error in prepdata(expressionset, intgroup = intgroup, do.logtransform = do.logtransform) :
all elements of 'intgroup' should match column names of 'pData(expressionset)'.
I don't understand why I am getting this error, although my ExpressionSet contains a column named "Tissue" in its phenoData.
It's been a while since you asked this question, but this is likely due to arrayQualityMetrics() having to trim down the data frame in your pData() slot to a limited number of fields for display in the metadata table at the beginning of the report.
Try something like:
tmp <- pData(ExpressionSet)
pData(ExpressionSet) <- tmp[,c("Tissue", "SomeOtherInterestingField")] # swap out
arrayQualityMetrics(ExpressionSet,intgroup="Tissue")
pData(ExpressionSet) <- tmp # replace with your original full pData() data frame
