I have a SpatialPolygonsDataFrame containing several species distribution polygons. I would like to trim each polygon by elevation and replace the original polygon in the object. However, I get an error when I try to replace the polygon after the trim. My plan is to run the following loop for each polygon in the dataset, but when I try dist[i,] <- temp3, I get the following error:
Error in match(value, lx) : 'match' requires vector arguments
In addition: Warning message:
In checkNames(value) :
attempt to set invalid names: this may lead to problems later on. See ?make.names
Any suggestions?
# Load packages
library(raster);
# Load SpatialPolygonsDataFrame
dist <- rgdal::readOGR('critterDistributions.shp');
# Load elevational ranges
rangeElevation <- read.csv(file = 'elevationRanges.csv');
# Load altitude data
elevation <- raster('ETOPO1_Bed_g_geotiff.tif');
# Tidy up CRSes
crs(elevation) <- crs(dist);
# Run loop
for (i in 1:length(dist)){
  subjName <- as.character(dist@data$Species[i]);
  if (!(subjName %in% rangeElevation$ï..Species_name)){
    message(paste0(subjName, ' does not exist in the elevational range database.'));
  }
  else{
    erNameMatch <- match(subjName, rangeElevation$ï..Species_name);
    # Keep only cells within the species' elevational range
    temp <- raster::reclassify(elevation, rcl = c(-Inf,rangeElevation[erNameMatch,2],NA,
                                                  rangeElevation[erNameMatch,2],rangeElevation[erNameMatch,3],1,
                                                  rangeElevation[erNameMatch,3],Inf,NA));
    temp2 <- dist[i,];
    temp <- mask(temp, temp2);
    temp <- crop(temp, temp2);
    temp3 <- rasterToPolygons(temp, na.rm = T, dissolve = T);
    names(temp3) <- make.names(names(temp2), unique = T);
    temp3@data <- temp2@data;
    dist[i,] <- temp3; # <<<< This is the line of code that doesn't work.
  }
}
Upon further thought, I came up with a workaround: initializing a list, then using rbind after the loop to unite everything back into a single object. I'm still interested in finding out why dist[i,] <- temp3 doesn't work, but at least I was able to get the job done.
oneSPDFtoRuleThemAll <- vector(mode = "list", length = length(dist));
for (i in 1:length(dist)){
  subjName <- as.character(dist@data$Species[i]);
  if (!(subjName %in% rangeElevation$ï..Species_name)){
    message(paste0(subjName, ' does not exist in the elevational range database.'));
  }
  else{
    erNameMatch <- match(subjName, rangeElevation$ï..Species_name);
    temp <- raster::reclassify(elevation, rcl = c(-Inf,rangeElevation[erNameMatch,2],NA,
                                                  rangeElevation[erNameMatch,2],rangeElevation[erNameMatch,3],1,
                                                  rangeElevation[erNameMatch,3],Inf,NA));
    temp2 <- dist[i,];
    temp <- mask(temp, temp2);
    temp <- crop(temp, temp2);
    temp3 <- rasterToPolygons(temp, na.rm = T, dissolve = T);
    names(temp3) <- make.names(names(temp2), unique = T);
    temp3@data <- temp2@data;
    oneSPDFtoRuleThemAll[[i]] <- temp3; # <<<< This replaces the assignment that didn't work.
  }
}
# Drop the NULL slots left by skipped species, then combine the remaining pieces
finalSPDF <- do.call(rbind, Filter(Negate(is.null), oneSPDFtoRuleThemAll));
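As for why dist[i,] <- temp3 fails in the first place: this is only a guess without the data, but a common stumbling block with sp objects is that rasterToPolygons() gives temp3 its own feature ID (and data row name), which no longer matches the ID of row i in dist. A hedged, untested sketch of a direct replacement would be to align the IDs first and then swap just the geometry slot:

library(sp)
# Untested sketch: give the trimmed polygon the same feature ID as the row it replaces
temp3 <- spChFIDs(temp3, row.names(temp2));
# Replace only the geometry; the attribute data were already copied from temp2
dist@polygons[[i]] <- temp3@polygons[[1]];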
I have 1000 JSON files that I would like to read in parallel. I have 4 CPU cores.
I have a character vector with the names of all the files:
cik_files <- list.files("./data/", pattern = ".json")
Using this vector, I load each file, extract the data, and add it to the following list:
data <- list()
Below is the code for extracting the data:
for(i in 1:1000){
  data1 <- fromJSON(paste0("./data/", cik_files[i]), flatten = TRUE)
  if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
    data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
    data1 <- data1[grep("CY20[0-9]{2}$", data1$frame), c(3, 9)]
    try({if(nrow(data1) > 0){
      data1$cik <- strtrim(cik_files[i], 13)
      data[[length(data) + 1]] <- data1
    }}, silent = TRUE)
  }
}
This, however, takes quite a lot of time, so I was wondering how I could run the code within the for loop in parallel.
Thanks in advance.
Here is an attempt to solve the problem in the question. Untested, since there is no data.
Step 1
First of all, rewrite the loop in the question as a function.
f <- function(i, path = "./data", cik_files){
  filename <- file.path(path, cik_files[i])
  data1 <- fromJSON(filename, flatten = TRUE)
  if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
    data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
    found <- grep("CY20[0-9]{2}$", data1$frame)
    if(length(found) > 0){
      tryCatch({
        out <- data1[found, c(3, 9)]
        out$cik <- strtrim(cik_files[i], 13)
        out
      },
      error = function(e) e,
      warning = function(w) w)
    } else NULL
  } else NULL
}
Step 2
Now load the package parallel and run one of the following, depending on OS.
library(parallel)
library(jsonlite)

# Not on Windows: fork-based parallelism
json_list <- mclapply(seq_along(cik_files), f, cik_files = cik_files)

# Windows: socket cluster
ncores <- detectCores()
cl <- makeCluster(ncores - 1L)
clusterExport(cl, "cik_files")
clusterEvalQ(cl, library(jsonlite))
json_list <- parLapply(cl, seq_along(cik_files), f, cik_files = cik_files)
stopCluster(cl)
Step 3
Extract the data from the returned list json_list.
err <- sapply(json_list, inherits, "error")
warn <- sapply(json_list, inherits, "warning")
ok <- !(err | warn)
json_list[ok] # correctly read in
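If every element that was read successfully is a data frame with the same columns (as in the original loop), the pieces can then be stacked into a single table; cik_data is just an illustrative name:

# Stack the successfully parsed pieces into one data frame
cik_data <- do.call(rbind, json_list[ok])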
I am trying to solve this problem, can you help me?
kegg_brite_map <- read.table("E:\\Path\\KoG1\\picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("E:\\Path\\KoG1\\test_ko.tsv", header=TRUE, sep="\t", row.names=1)
##Run function
### Reproducing the categorize by function (level 3) functionality in plain-text tables.
### Doing this because adding a column of KEGG Pathways to a table and then converting
### that table to BIOM is difficult.
categorize_by_function_l3 <- function(in_ko, kegg_brite_mapping) {
  # Function to create identical output as categorize_by_function.py script,
  # but with R objects instead of BIOM objects in Python.
  # Input KO table is assumed to have rownames as KOs and sample names as columns.
  out_pathway <- data.frame(matrix(NA, nrow=0, ncol=(ncol(in_ko) + 1)))
  colnames(out_pathway) <- c("pathway", colnames(in_ko))
  for(ko in rownames(in_ko)) {
    # Skip KO if not in KEGG BRITE mapping df
    # (this occurs with newer KOs that weren't present in PICRUSt1).
    if(! ko %in% rownames(kegg_brite_mapping)) {
      next
    }
    pathway_list <- strsplit(kegg_brite_mapping[ko, "metadata_KEGG_Pathways"], "\\|")[[1]]
    for(pathway in pathway_list) {
      pathway <- strsplit(pathway, ";")[[1]][3]
      new_row <- data.frame(matrix(c(NA, as.numeric(in_ko[ko,])), nrow=1, ncol=ncol(out_pathway)))
      colnames(new_row) <- colnames(out_pathway)
      new_row$pathway <- pathway
      out_pathway <- rbind(out_pathway, new_row)
    }
  }
  out_pathway <- data.frame(aggregate(. ~ pathway, data = out_pathway, FUN=sum))
  rownames(out_pathway) <- out_pathway$pathway
  out_pathway <- out_pathway[, -which(colnames(out_pathway) == "pathway")]
  if(length(which(rowSums(out_pathway) == 0)) > 0) {
    out_pathway <- out_pathway[-which(rowSums(out_pathway) == 0), ]
  }
  return(out_pathway)
}
#Run function to categorize all KOs by level 3 in BRITE hierarchy
test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
#ERROR
Error in rowSums(out_pathway) :
'x' must be an array of at least two dimensions
Called from: rowSums(out_pathway)
Without this question being reproducible (see the comment from @jogo), it is difficult to tell where the issue in the code is, but the error is telling you that the argument x to rowSums() must have at least two dimensions. This makes sense, because you need rows in order to take the sum of each of them.
out_pathway is the object you are passing to rowSums(), so this is the place to start. It must not have two dimensions at that point, possibly because of the aggregating and column-deleting in the lines above.
If out_pathway is just a one-dimensional object, you can simply use sum() instead.
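Without the data this is only a guess, but one very common way to end up there is the column-dropping step: removing columns from a data frame leaves a plain vector when a single column remains, unless drop = FALSE is used. A minimal illustration:

# Toy example: a one-sample table collapses to a vector once the pathway column is removed
df <- data.frame(pathway = c("a", "b"), sample1 = c(1, 2))
x <- df[, -which(colnames(df) == "pathway")]                # numeric vector, no dimensions
# rowSums(x)                                                # errors: 'x' must be an array of at least two dimensions
x <- df[, -which(colnames(df) == "pathway"), drop = FALSE]  # stays a data frame
rowSums(x)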
I tried to combine two lists into one dataframe:
all_stas <- list()
for(i in vid_id){
  stas <- get_stats(video_id = i)
  all_stas <- rbind(all_stas, stas)
}
View(all_stas)

all_detail <- list()
for(i in vid_id){
  detail1 <- get_video_details(video_id = i)
  all_detail <- rbind(all_detail, detail1)
}
View(all_detail)

df <- data.frame(all_stas, all_detail)
write.csv(df, file = "new_file.csv")
Afterwards I would like to store it in a CSV file.
When I run it, it gives me the following warning message:
Warning message:
In rbind(all_stas, stas) :
number of columns of result is not a multiple of vector length (arg 2)
Does anyone know how I can make the code work?
The block below is triggering the error:
all_stas <- list()
for(i in vid_id){
  stas <- get_stats(video_id = i)
  all_stas <- rbind(all_stas, stas)
}
If I understand your code correctly, you can get around that error with:
all_stas <- list()
for(i in vid_id){
  all_stas[[i]] <- get_stats(video_id = i)
}
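If the goal is still a single data frame and a CSV at the end, the collected list can then be combined after the loop, assuming get_stats() returns a one-row data frame (or consistently named fields) for every video; all_stas_df is just an illustrative name:

# Combine the per-video results collected above into one data frame in a single step
all_stas_df <- do.call(rbind, all_stas)
write.csv(all_stas_df, file = "new_file.csv", row.names = FALSE)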
corr <- function(directory, threshold) {
  files <- list.files(directory, full.names = TRUE)
  nu <- numeric()
  for(i in length(files)) {
    my_data <- read.csv(files[i])
    if (sum(complete.cases(my_data)) >= threshold) {
      vec_sul <- my_data[complete.cases(my_data),]$sulfate
      vec_nit <- my_data[complete.cases(my_data),]$nitrate
      nu <- c(nu, cor(vec_sul, vec_nit))
    }
  }
  nu
}
I have a list of .csv files sitting inside the directory I wish to pass as an argument to the function illustrated above, and I pass a threshold value as the second argument. The objective is to read through all the files in the directory and check whether each file has more complete cases than the threshold value.
Files that pass this criterion are then examined further: the correlation between the two variables inside them, sulfate and nitrate, is computed, and each of these correlation values is appended to a numeric vector. At the end of the loop, I want the function to return the vector containing the correlation values computed inside the if block.
cr <- corr("specdata", 150)
When I run the above line of code in the console, I get an empty numeric vector (numeric(0)). Could someone help me fix the code?
Though this kind of error has been seen so many times, it still happens. You want
i in 1:length(files)
You get numeric(0) (the "numeric null" you mention) because your loop only reads in the final file. Presumably that final file does not satisfy sum(complete.cases(my_data)) >= threshold, so nothing is added to nu, which was initialized as numeric(0).
Also, I would like to point out that
vec_sul <- my_data[complete.cases(my_data),]$sulfate
vec_nit <- my_data[complete.cases(my_data),]$nitrate
nu <- c(nu, cor(vec_sul, vec_nit))
can be replaced by
nu <- c(nu, with(my_data, cor(sulfate, nitrate, use = "complete.obs")))
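Putting both points together, a minimal corrected version of the loop in the question might look like this (untested here, since the specdata files are not at hand):

corr <- function(directory, threshold) {
  files <- list.files(directory, full.names = TRUE)
  nu <- numeric()
  for (i in seq_along(files)) {   # was: for (i in length(files))
    my_data <- read.csv(files[i])
    if (sum(complete.cases(my_data)) >= threshold) {
      nu <- c(nu, with(my_data, cor(sulfate, nitrate, use = "complete.obs")))
    }
  }
  nu
}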
Consider lapply() across the list of files, which avoids growing a preset vector inside a loop. The only adjustment is that lapply() returns a result of the same length as the input list, files, so an else branch is added to fill in NA for data frames that do not meet the threshold condition. Outside the loop, these NAs are then removed from nu.
corr <- function(directory, threshold) {
  files <- list.files(directory, full.names = TRUE)
  nu <- lapply(files, function(i) {
    my_data <- read.csv(i)
    if (sum(complete.cases(my_data)) >= threshold) {
      vec_sul <- my_data[complete.cases(my_data),]$sulfate
      vec_nit <- my_data[complete.cases(my_data),]$nitrate
      temp <- cor(vec_sul, vec_nit)
    } else {
      temp <- NA  # SET NAs
    }
    return(temp)
  })
  nu <- nu[!is.na(nu)]  # REMOVE NAs
  return(nu)
}
Alternatively, try vapply() (arguably slightly faster), which lets you specify a numeric vector return:
corr <- function(directory, threshold) {
  files <- list.files(directory, full.names = TRUE)
  nu <- vapply(files, function(i) {
    my_data <- read.csv(i)
    if (sum(complete.cases(my_data)) >= threshold) {
      vec_sul <- my_data[complete.cases(my_data),]$sulfate
      vec_nit <- my_data[complete.cases(my_data),]$nitrate
      temp <- cor(vec_sul, vec_nit)
    } else {
      temp <- NA  # SET NAs
    }
    return(temp)
  }, numeric(1))
  nu <- nu[!is.na(nu)]  # REMOVE NAs
  return(nu)
}
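One small side effect worth knowing: because files is a character vector, vapply() names the result by the file paths (USE.NAMES = TRUE is the default); wrap the call in unname() if a bare numeric vector is preferred:

cr <- corr("specdata", 150)   # directory from the question
cr <- unname(cr)              # drop the file-path names added by vapply()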
I came across this function a while back that was created for fixing PCA values. The problem with the function was that it wasn't compatible with xts time series objects.
amend <- function(result) {
  result.m <- as.matrix(result)
  n <- dim(result.m)[1]
  delta <- apply(abs(result.m[-1,] - result.m[-n,]), 1, sum)
  delta.1 <- apply(abs(result.m[-1,] + result.m[-n,]), 1, sum)
  signs <- c(1, cumprod(rep(-1, n-1) ^ (delta.1 <= delta)))
  zoo(result * signs)
}
The full example can be found at https://stats.stackexchange.com/questions/34396/im-getting-jumpy-loadings-in-rollapply-pca-in-r-can-i-fix-it
The problem is that applying the function to an xts object with multiple columns and rows won't solve the problem. Is there an elegant way of applying the algorithm to a matrix of xts objects?
My current solution, given a single column with multiple rows, is to loop through row by row... which is slow and tedious. Imagine having to do it column by column as well.
Thanks,
Here is some code to get one started:
rm(list=ls())
require(RCurl)
sit = getURLContent('https://github.com/systematicinvestor/SIT/raw/master/sit.gz', binary=TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)
con = gzcon(rawConnection(sit, 'rb'))
source(con)
close(con)
load.packages('quantmod')
data <- new.env()
tickers<-spl("VTI,IEF,VNQ,TLT")
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted=T)
bt.prep(data, align='remove.na', dates='1990::2013')
prices<-data$prices[,-10] #don't include cash
retmat<-na.omit(prices/mlag(prices) - 1)
rollapply(retmat, 500, function(x) summary(princomp(x))$loadings[, 1], by.column = FALSE, align = "right") -> princomproll
require(lattice)
xyplot(amend(princomproll))
Plotting princomproll will get you jumpy loadings...
It isn't very obvious how the amend function relates to the script below it (since it isn't called there), or what you are trying to achieve. There are a couple of small changes that can be made. I haven't profiled the difference, but it's a little more readable if nothing else.
You remove the first and last rows of the result twice.
rowSums might be slightly more efficient for getting the row sums than apply.
rep.int is a little bit faster than rep.
amend <- function(result) {
  result <- as.matrix(result)
  n <- nrow(result)
  without_first_row <- result[-1,]
  without_last_row <- result[-n,]
  delta_minus <- rowSums(abs(without_first_row - without_last_row))
  delta_plus <- rowSums(abs(without_first_row + without_last_row))
  signs <- c(1, cumprod(rep.int(-1, n-1) ^ (delta_plus <= delta_minus)))
  zoo(result * signs)
}
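As for applying it to the whole multi-column rollapply result: since amend() coerces its input to a matrix and the sign vector is recycled row-wise, it should be possible to call it once on the full object rather than looping over rows or columns. A rough, untested sketch using the princomproll object from the question (the zoo() wrapper may drop the original time index, so it is reattached explicitly):

library(zoo)
library(lattice)

# Apply the sign correction to every column of the rolling-PCA loadings at once
fixed <- amend(princomproll)
# Reattach the original dates in case the zoo() wrapper dropped them
fixed <- zoo(coredata(fixed), order.by = index(princomproll))
xyplot(fixed)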