Aggregating all SpatialPolygonsDataFrame objects from a list into one SpatialPolygonsDataFrame in R

I'm not looking to edit topology, merely to aggregate all polygons into one sp object of type SpatialPolygonsDataFrame (spdf). There is only one polygon per spdf.
Data (Dropbox link to data; file size 1.1 KB; dput() is not appropriate in this instance):
list_of_spdf <- unlist(readRDS("data.Rds"))
I get the desired result with:
one_spdf <- rbind(list_of_spdf[1][[1]], list_of_spdf[2][[1]], list_of_spdf[3][[1]], makeUniqueIDs = TRUE)
# when plotting can see two polygons (third object is a repeat for sake of testing)
plot(one_spdf)
Since I have hundreds of objects (though only one polygon per spdf), I need to do the rbind programmatically, so I tried lapply:
list_of_spdf <- lapply(list_of_spdf, rbind, makeUniqueIDs = TRUE)
Obviously, this returns a list, which is not what I'm looking for.
So I wrote a function:
rbindSPDF <- function(lst) {
  # Create empty spdf objects
  pol <- SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
  pols <- SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
  # loop for rbind
  for (i in 1:length(lst)) {
    pol[i] <- lst[i][[1]]
    if (length(pols) == 0) {
      pols <- pol[i]
    } else {
      pols <- rbind(pols, pol[i], makeUniqueIDs = TRUE)
    }
  }
  return(pols)
}
However, when using rbindSPDF:
single_spdf <- rbindSPDF(list_of_spdf)
I get:
Error in as.vector(data) :
no method for coercing this S4 class to a vector
Not sure what I'm doing wrong here.
Plus, I'm guessing I probably don't even need to use my own function.
Note: on top of many other packages, I'm using sp and rgdal for spatial data and would rather avoid using yet another one because of attaching/detaching time and masking.

To get a programmatic version of
one_spdf <- rbind(list_of_spdf[1][[1]],
list_of_spdf[2][[1]],
list_of_spdf[3][[1]],
...
makeUniqueIDs = TRUE)
for a very long list in list_of_spdf, would something like the following work?
# generate list containing list_of_spdf[i][[1]]
list.df <- lapply(seq_along(list_of_spdf),
function(i){list_of_spdf[i][[1]]})
# apply rbind to the list
one_spdf2 <- do.call("rbind",
c(args = list.df, makeUniqueIDs = TRUE))
> all.equal(one_spdf, one_spdf2)
[1] TRUE
The results seem equivalent on my machine.
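The key point is that do.call() takes its arguments as a list, so an extra named argument can be appended to the list of objects with c(). A minimal base-R sketch of the same pattern, assuming hypothetical data frames in place of the spdf objects (so it runs without sp) and using rbind.data.frame's make.row.names argument (R >= 3.3.0) in the role that makeUniqueIDs plays for sp:

```r
# Hypothetical stand-ins for list_of_spdf: three one-row data frames
dfs <- list(
  data.frame(id = 1, value = "a", stringsAsFactors = FALSE),
  data.frame(id = 2, value = "b", stringsAsFactors = FALSE),
  data.frame(id = 3, value = "c", stringsAsFactors = FALSE)
)

# c() appends the named argument to the list of objects; do.call() then
# evaluates rbind(dfs[[1]], dfs[[2]], dfs[[3]], make.row.names = FALSE)
combined <- do.call(rbind, c(dfs, make.row.names = FALSE))
print(combined)
```

Since list_of_spdf[i][[1]] is simply list_of_spdf[[i]], the intermediate lapply() above is probably not needed when list_of_spdf is an unnamed list; if it has names, c() would pass them on to rbind as argument names, so unname() it first.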

Related

Error in rowSums(out_pathway) : 'x' must be an array of at least two dimensions

I am trying to solve this problem; can you help me?
kegg_brite_map <- read.table("E:\\Path\\KoG1\\picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("E:\\Path\\KoG1\\test_ko.tsv", header=TRUE, sep="\t", row.names=1)
##Run function
### Reproducing the categorize by function (level 3) functionality in plain-text tables.
### Doing this because adding a column of KEGG Pathways to a table and then converting
### that table to BIOM is difficult.
categorize_by_function_l3 <- function(in_ko, kegg_brite_mapping) {
  # Function to create identical output as categorize_by_function.py script,
  # but with R objects instead of BIOM objects in Python.
  # Input KO table is assumed to have rownames as KOs and sample names as columns.
  out_pathway <- data.frame(matrix(NA, nrow=0, ncol=(ncol(in_ko) + 1)))
  colnames(out_pathway) <- c("pathway", colnames(in_ko))
  for(ko in rownames(in_ko)) {
    # Skip KO if not in KEGG BRITE mapping df
    # (this occurs with newer KOs that weren't present in PICRUSt1).
    if(! ko %in% rownames(kegg_brite_mapping)) {
      next
    }
    pathway_list <- strsplit(kegg_brite_mapping[ko, "metadata_KEGG_Pathways"], "\\|")[[1]]
    for(pathway in pathway_list) {
      pathway <- strsplit(pathway, ";")[[1]][3]
      new_row <- data.frame(matrix(c(NA, as.numeric(in_ko[ko,])), nrow=1, ncol=ncol(out_pathway)))
      colnames(new_row) <- colnames(out_pathway)
      new_row$pathway <- pathway
      out_pathway <- rbind(out_pathway, new_row)
    }
  }
  out_pathway = data.frame(aggregate(. ~ pathway, data = out_pathway, FUN=sum))
  rownames(out_pathway) <- out_pathway$pathway
  out_pathway <- out_pathway[, -which(colnames(out_pathway) == "pathway")]
  if(length(which(rowSums(out_pathway) == 0)) > 0) {
    out_pathway <- out_pathway[-which(rowSums(out_pathway) == 0), ]
  }
  return(out_pathway)
}
#Run function to categorize all KOs by level 3 in BRITE hierarchy
test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
#ERROR
Error in rowSums(out_pathway) :
'x' must be an array of at least two dimensions
Called from: rowSums(out_pathway)
Without this question being reproducible (see the comment from @jogo), it is difficult to tell where the issue in your code is, but the error is telling you that the argument 'x' to rowSums() must have at least two dimensions. This makes sense: you need rows in order to take the sum of each of them.
out_pathway is the object you are passing to rowSums(), so this is the place to start. At that point it apparently does not have two dimensions, most likely because of the aggregating and column deletion in the lines above: if only one sample column remains once the pathway column is removed, out_pathway[, -which(...)] simplifies the result to a plain vector (drop = TRUE is the default).
If out_pathway is just a one-dimensional object, you can use sum() instead (bearing in mind that it returns one overall total, not one sum per row), or keep the object two-dimensional by adding drop = FALSE to that subsetting call.
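A minimal sketch of how the dimensions get lost, assuming a hypothetical aggregated table with a single sample column named S1 (mimicking out_pathway after aggregate()):

```r
# Hypothetical aggregated table with only one sample column ("S1")
out_pathway <- data.frame(pathway = c("A", "B", "C"), S1 = c(5, 0, 2))
rownames(out_pathway) <- out_pathway$pathway

# Default drop = TRUE: removing "pathway" leaves a plain numeric vector
dropped <- out_pathway[, -which(colnames(out_pathway) == "pathway")]
is.null(dim(dropped))  # TRUE, so rowSums(dropped) would fail

# drop = FALSE keeps a one-column data frame, so rowSums() still works
kept <- out_pathway[, -which(colnames(out_pathway) == "pathway"), drop = FALSE]
# Logical indexing also avoids the if() guard around -which(...)
kept <- kept[rowSums(kept) != 0, , drop = FALSE]
print(kept)
```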

Loop Changing to Matrix then Running tests

I have a data frame with ~9000 rows of human-coded data, with two coders per item, so about 4500 unique pairs. I want to split the dataset into these pairs (~4500 data frames), run kripp.alpha on the scores that were assigned, and then save the results into a coder sheet I have made. I cannot get a loop to do this.
I can get it to work individually, using this:
example.m <- as.matrix(example.m)
s <- kripp.alpha(example.m)
example$alpha <- s$value
However, when I try a loop, I get either "Error in get(v) : object 'NA' not found" when running this:
for (i in items) {
v <- i
v <- v[c("V1","V2")]
v <- assign(v, as.matrix(get(v)))
s <- kripp.alpha(v)
i$alpha <- s$value
}
Or I get the warning "In i$alpha <- s$value : Coercing LHS to a list" when running:
for (i in items) {
i.m <- i[c("V1","V2")]
i.m <- as.matrix(i.m)
s <- kripp.alpha(i.m)
i$alpha <- s$value
}
Here is an example set of data. Items is a list of individual dataframes.
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
items <- c("l","t")
I am sure this is a basic question, but what I want is for each file, i, to add a column with the alpha score at the end. Thanks!
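Your problem is with scoping and with extracting objects through their names when they are referenced as strings: items holds strings such as "l", not the data frames, so i[c("V1","V2")] just returns NA values, and i$alpha <- ... tries to turn a character string into a list. Assigning to the loop variable i would not change the original data frames anyway. You'd need get() (or eval()) to look up the objects by name to make your current approach work.
Here's another solution: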
library("irr") # For kripp.alpha
# Produce the data
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
# Collect the data as a list right away
items <- list(l, t)
Now you can sapply() directly over the elements in the list.
sapply(items, function(v) {
kripp.alpha(as.matrix(v[c("V1","V2")]))$value
})
which produces
[1] 0.0 -0.5
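The question also asks to add the alpha value as a column to each data frame. Because assigning to a for-loop variable does not change the original objects, one option is to have lapply() return modified copies. A minimal sketch, assuming the example data above; add_alpha() and its score_fun parameter are hypothetical names, and the sketch uses mean() as a stand-in statistic so it runs without irr:

```r
# Example data from the question
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1), nrow = 2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1), nrow = 2))

# Hypothetical helper: score columns V1 and V2 of each data frame with
# score_fun and store the result in a new "alpha" column
add_alpha <- function(dfs, score_fun) {
  lapply(dfs, function(d) {
    d$alpha <- score_fun(as.matrix(d[c("V1", "V2")]))
    d  # return the modified copy; lapply() collects the copies in a list
  })
}

# Stand-in statistic so the sketch runs without irr; with irr loaded,
# pass function(m) kripp.alpha(m)$value instead
items <- add_alpha(list(l = l, t = t), score_fun = function(m) mean(m))
items$l
```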

R - Use names in a list to feed named objects to a loop?

I have a data frame of some 90 financial symbols (I will use 3 here for simplicity):
> View(syM)
symbol
1 APPL
2 YAHOO
3 IBM
I created a function that gets JSON data for these symbols and produces an output. Basically:
nX <- function(x) {
#get data for "x", format it, and store it in "nX"
nX <- x
return(nX)
}
I used a loop to get the data and store each zoo series in an object named after its symbol:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,]),
value = nX(x = syM[i,]))
Sys.sleep(time = 1)
}
Which results in:
[1] "APPL" "YAHOO" "IBM"
Each is a zoo series with 5 columns of data.
Next, I want to plot each series and output the result, preferably using a for loop or something better.
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yN <- y[,2:3]
return(yN)
}
Following a similar logic to my previous loop I tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
But only the name of the symbol is being passed to the function, not its data, so I naturally get:
y[,2:3] : incorrect number of dimensions
I have also tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = ls(pattern = paste0(syM[i,]))))
}
With similar results. When I input the name of the series manually, it does save the plot of the first symbol as "APPL.Plot":
assign(paste0(syM[1,], ".Plot"),
value = yN(y = APPL))
Consider lapply with setNames to create a named list of nX returned objects:
nX_list <- setNames(lapply(syM$symbol, nX), syM$symbol)
# OUTPUT ZOO OBJECTS BY NAMED INDEX
nX_list$APPL
nX_list$YAHOO
nX_list$IBM
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(nX_list, envir=.GlobalEnv)
For the plot function, first add a get() inside the function to retrieve an object by its string name, then similarly run lapply with setNames:
yN <- function(y) {
  # plot "y" series, columns 2 and 3, and store it in "yN"
  yobj <- nX_list[[y]]  # IF USING ABOVE LIST (already the object, so no get() needed)
  # yobj <- get(y)      # IF USING SEPARATE OBJECTS INSTEAD
  yN <- yobj[,2:3]
  return(yN)
}
plot_list <- setNames(lapply(syM$symbol, yN), paste0(syM$symbol, ".plot"))
# OUTPUT PLOTS BY NAMED INDEX
plot_list$APPL.plot
plot_list$YAHOO.plot
plot_list$IBM.plot
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(plot_list, envir=.GlobalEnv)
As you note, you're calling yN with a character argument in:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
paste0(syM[i,]) resolves to a character string, not the zoo object you appear to be trying to reference. Instead, use something like get():
for (i in 1:nrow(syM)) {
  assign(x = paste0(syM[i,], ".plot"),
         value = yN(y = get(paste0(syM[i,]))))
}
Or perhaps just store your zoo objects in a list in the first place and then operate on all elements of the list with something like lapply()...
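If the series already exist as separate objects, base R's mget() can collect them into a named list in one call, after which lapply() replaces the loop. A sketch assuming hypothetical 4 x 5 matrices in place of the zoo series (so it runs without zoo), kept in a dedicated environment to stay self-contained; for objects created by the question's assign() loop, the environment would be .GlobalEnv:

```r
# Hypothetical stand-ins for the zoo series: one 4 x 5 matrix per symbol,
# stored in their own environment
syM <- data.frame(symbol = c("APPL", "YAHOO", "IBM"), stringsAsFactors = FALSE)
store <- new.env()
for (s in syM$symbol) assign(s, matrix(seq_len(20), nrow = 4), envir = store)

# Same column selection as the question's yN()
yN <- function(y) y[, 2:3]

# mget() looks up each name and returns a named list of the objects themselves
series <- mget(syM$symbol, envir = store)
plot_list <- lapply(series, yN)
names(plot_list) <- paste0(names(plot_list), ".plot")
str(plot_list)
```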

Checking the class of an object of a package in R

I want to check which objects in the package "datasets" are data frames and, among those, print the ones whose dimensions are 248 observations and 8 variables.
This is the code I have tried, but I am unable to get the output.
library(datasets)
lsf.str("package:datasets")
listname <- as.list((ls("package:datasets")))
lst=c()
for(i in listname){
lst <- is.data.frame(as.name(listname[[i]]))
}
Any help would be much appreciated.
What you basically need are the get() and Filter() functions.
datasets <- as.environment("package:datasets")
dataset_names <- ls(envir = datasets)
filter_func <- function(x){
y <- get(x, envir = datasets)
is.data.frame(y) && all(dim(y) == c(248, 8))
}
Filter(filter_func, dataset_names)
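Since the question also asks to print the matching objects, not only their names, here is a variant of the same idea: base R's mget() fetches every object by name into one named list, and Filter() then runs on the objects themselves, so the result can be printed directly. This is a sketch; it loads every dataset into memory, which is unproblematic for a package this small:

```r
library(datasets)
datasets_env <- as.environment("package:datasets")

# mget() fetches every object in the package by name into one named list
all_objects <- mget(ls(datasets_env), envir = datasets_env)

# Keep only data frames with exactly 248 rows and 8 columns
matches <- Filter(function(x) is.data.frame(x) && all(dim(x) == c(248, 8)),
                  all_objects)

names(matches)  # names of the matching datasets
print(matches)  # the matching data frames themselves
```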

Using lapply to subset rows from data frames -- incorrect number of dimensions error

I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames, but I am getting an error message. The function and my attempt at using it with lapply are below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
You have 2 solutions. You can either
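(a) assign the result to a new list: newList = lapply(scenbase, function(x) { x[33:152, , drop = FALSE] })
(b) use the <<- operator, which assigns your trimmed data in place: lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152, , drop = FALSE] })
Your call does not change scenbase because lapply() never modifies the list it is given: the i inside trim() is local to that function, and the trimmed copies only exist in the list that lapply() returns. You can work around that either by creating a new trimmed list, or by using the <<- operator, which assigns to the first variable of that name it finds in successive parent environments. (The "incorrect number of dimensions" error itself, however, is what [ raises on an object with fewer than two dimensions, so at least one element of scenbase is likely not a data frame or matrix; str(scenbase) will show which.)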
Here is some code that reproduces solution (a):
listOfDfs = list()
for (i in 1:10) { listOfDfs[[i]] = data.frame("x" = sample(letters, 200, replace = TRUE), "y" = sample(letters, 200, replace = TRUE)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152, , drop = FALSE] })
Here is some code that reproduces solution (b):
listOfDfs = list()
for (i in 1:10) { listOfDfs[[i]] = data.frame("x" = sample(letters, 200, replace = TRUE), "y" = sample(letters, 200, replace = TRUE)) }
invisible(lapply(seq_along(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152, , drop = FALSE] }))
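The "incorrect number of dimensions" error is what [ raises when it is given both row and column indices on an object without two dimensions, such as a plain character vector. A quick sketch, assuming a hypothetical list that mixes two data frames with a file-name string, shows how to spot such elements before trimming:

```r
# Hypothetical list: two data frames plus a character string (e.g. a file name)
scenbase <- list(
  s1 = data.frame(x = 1:200, y = rnorm(200)),
  s2 = data.frame(x = 1:200, y = rnorm(200)),
  s3 = "soil11base.txt"
)

# Elements without exactly two dimensions would make x[33:152, ] fail
not_2d <- vapply(scenbase, function(x) length(dim(x)) != 2, logical(1))
names(scenbase)[not_2d]

# Trim only the two-dimensional elements and assign the result to a new list
trimmed <- lapply(scenbase[!not_2d], function(x) x[33:152, , drop = FALSE])
sapply(trimmed, nrow)
```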
