Checking the class of an object of a package in R - r

I want to check for which all objects are data frames in package called "datasets" and if found for them I want to check for the conditions like if dimensions of those data frames is 248 observations and 8 variables then print those object
This is the code I have tried, but unable to get the output.
library(datasets)
lsf.str("package:datasets")
listname <- as.list((ls("package:datasets")))
lst=c()
for(i in listname){
lst <- is.data.frame(as.name(listname[[i]]))
}
Any help would be much appreciated.

What you bascially need is the get function and the Filter function.
datasets <- as.environment("package:datasets")
dataset_names <- ls(envir = datasets)
filter_func <- function(x){
y <- get(x, envir = datasets)
is.data.frame(y) && all(dim(y) == c(248, 8))
}
Filter(filter_func, dataset_names)

Related

How can I automatically load all data sets from my own package

Assume I have two packages
package1
data sets d1, d2, d3
package2 which should use data sets from package1
package1 contains data sets, which I want to use for testing.
I can access each of these via e.g. package1::d1. But how can I load all of them in an automated way?
Something like
ds <- data(package = "package1") # you can try e.g. "carData"
ds$results[1, 3] # gives the first entry
mydataset <- load(ds$results[1, 3]) # this does not work
Other will use both packages, so it should work for others and on different platforms (Windows, Mac)
Any ideas?
ds$results[1, 3]
# Item
# "d1"
looks promising, but
data(ds$results[1, 3])
# Warning message:
# In data(ds$results[1, 3]) : data set ‘ds$results[1, 3]’ not found
As indicated by #dcarlson, you could extract the names of all datasets in your package and give them back as list to the data() function as argument list. However this solution only returns a promise for each dataset and not the actual dataset.
my_package <- "datasets"
name_of_all_datasets <- data.frame(data(package = my_package)$results)$Item
data(list = name_of_all_datasets, package = my_package)
I just realized, there are two options:
You want to load external files from inst/extdata
pathExtData <- system.file("extdata", package = "myPackage")
allFilenames <- list.files(pathExtData, full.names = TRUE)
# e.g. in case of Excel files
datalist <- list()
for (i in 1:length(allFilenames)) {
datalist[[i]] <- readxl::read_xlsx(path = allFilenames[i], sheet = "mySheet")
}
You want to load RData from a package you can use
ds <- data(package = "myPackage")
datalist <- list()
for (i in 1:length(ds$results[, 3])) {
eval(parse(text = paste0("datalist[[", i, "]] <- myPackage::", ds$results[i, 3])))
}

Error in rowSums(out_pathway) : 'x' must be an array of at least two dimensions

i am trying to solve this problem an you help me:
kegg_brite_map <- read.table("E:\\Path\\KoG1\\picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("E:\Path\KoG1\test_ko.tsv", header=TRUE, sep="\t", row.names=1)
##Run function
### Reproducing the categorize by function (level 3) functionality in plain-text tables.
### Doing this because adding a column of KEGG Pathways to a table and then converting
### that table to BIOM is difficult.
categorize_by_function_l3 <- function(in_ko, kegg_brite_mapping) {
# Function to create identical output as categorize_by_function.py script,
# but with R objects instead of BIOM objects in Python.
# Input KO table is assumed to have rownames as KOs and sample names as columns.
out_pathway <- data.frame(matrix(NA, nrow=0, ncol=(ncol(in_ko) + 1)))
colnames(out_pathway) <- c("pathway", colnames(in_ko))
for(ko in rownames(in_ko)) {
# Skip KO if not in KEGG BRITE mapping df
# (this occurs with newer KOs that weren't present in PICRUSt1).
if(! ko %in% rownames(kegg_brite_mapping)) {
next
}
pathway_list <- strsplit(kegg_brite_mapping[ko, "metadata_KEGG_Pathways"], "\\|")[[1]]
for(pathway in pathway_list) {
pathway <- strsplit(pathway, ";")[[1]][3]
new_row <- data.frame(matrix(c(NA, as.numeric(in_ko[ko,])), nrow=1, ncol=ncol(out_pathway)))
colnames(new_row) <- colnames(out_pathway)
new_row$pathway <- pathway
out_pathway <- rbind(out_pathway, new_row)
}
}
out_pathway = data.frame(aggregate(. ~ pathway, data = out_pathway, FUN=sum))
rownames(out_pathway) <- out_pathway$pathway
out_pathway <- out_pathway[, -which(colnames(out_pathway) == "pathway")]
if(length(which(rowSums(out_pathway) == 0)) > 0) {
out_pathway <- out_pathway[-which(rowSums(out_pathway) == 0), ]
}
return(out_pathway)
}
#Run function to categorize all KOs by level 3 in BRITE hierarchy
test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
#ERROR
Error in rowSums(out_pathway) :
'x' must be an array of at least two dimensions
Called from: rowSums(out_pathway)
Without this question being reproducible (see comment from #jogo), it is difficult to tell where you have issues in the code, but the error is telling you that your argument 'x' to the function rowSums() must be at least two dimensions. This makes sense because you need rows, to be able to take the sums of each of them.
out_pathway is the object that you are passing to rowSums(), so this is the place to start. This object must not have two dimensions, possibly because you are doing some aggregating, and then deleting columns in the lines above.
If out_pathway is just a one-dimensional object, you can just use sum()

Aggregating all SpatialPolygonsDataFrame objects from list into one SpatialPolygonsDataFrame

Not looking to editing topology, merely aggregating all polygons into one sp object of type SpatialPolygonsDataFrame (spdf). There is only one polygon per spdf.
Data (dropbox link to data) (filesize 1.1KB) ( dput() not appropriate in this instance):
list_of_spdf <- unlist(readRDS("data.Rds"))
I get the desired result with:
one_spdf <- rbind(list_of_spdf[1][[1]], list_of_spdf[2][[1]], list_of_spdf[3][[1]], makeUniqueIDs = TRUE)
# when plotting can see two polygons (third object is a repeat for sake of testing)
plot(one_spdf)
Having hundreds of objects (though only one polygon per spdf), I need to do the rbind programatically. So I tried lapply
list_of_spdf <- lapply(list_of_spdf, rbind, makeUniqueIDs = TRUE)
Obviously, this returns a list and therefore not what I'm looking for.
So I wrote a function:
rbindSPDF <- function(lst) {
# Create empty spdf objects
pol <-
SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
pols <-
SpatialPolygonsDataFrame(SpatialPolygons(list()), data = data.frame())
# loop for rbind
for (i in 1:length(lst)) {
pol[i] <- lst[i][[1]]
if (length(pols) == 0) {
pols <- pol[i]
} else {
pols <- rbind(pols, pol[i], makeUniqueIDs = TRUE)
}
}
return(pols)
}
However, when using rbindSPDF:
single_spdf <- rbindSPDF(list_of_spdf)
I get:
Error in as.vector(data) :
no method for coercing this S4 class to a vector
Not sure what I'm doing wrong here.
Plus, I'm guessing I probably don't even need to use my own function.
Note: On top of many other packages, I'm using spand rgdal for spatial data and would rather avoid using yet another one due to attaching/detaching time and masking.
To have a programmatical version of
one_spdf <- rbind(list_of_spdf[1][[1]],
list_of_spdf[2][[1]],
list_of_spdf[3][[1]],
...
makeUniqueIDs = TRUE)
for a very long list in list_of_spdf, would something like the following work?
# generate list containing list_of_spdf[i][[1]]
list.df <- lapply(seq_along(list_of_spdf),
function(i){list_of_spdf[i][[1]]})
# apply rbind to the list
one_spdf2 <- do.call("rbind",
c(args = list.df, makeUniqueIDs = TRUE))
> all.equal(one_spdf, one_spdf2)
[1] TRUE
The results seem equivalent on my machine.

How to loop through input variables for a function in R?

I have a function, biovars(prec,tmin,tmax) with three variables as input. prec,tmin,tmax are multi bandGeoTiffs. However there are several Tiffs stored in different directories, that needed to be stacked before the can be passed to the function:
stack("/path1/to/file/prec1.tif")
stack("/path2/to/file/tmin1.tif")
stack("/path3/to/file/tmax1.tif")
How can I loop through each directory, stack the first file and use the result of the loop as input for the function:
biovars(prec1,tmin1,tmax1)
biovars(prec2,tmin2,tmax2)
...
biovars(precn,tminn,tmaxn)
Is there an easy way for doing this? I don't think it is possible with three for loops...
From your description, I think you can do something like this:
library(dismo)
# setwd(root of data files")
fprec <- list.files(".", pattern="prec", recursive=TRUE, full.names=TRUE)
ftmin <- gsub("prec", "tmin", fprec)
ftmax <- gsub("prec", "tmax", fprec)
fbio <- gsub("prec", "bio", fprec)
for (i in 1:length(fprec)) {
prec <- brick(fprec[i])
tmin <- brick(ftmin[i])
tmax <- brick(ftmax[i])
b <- biovars(prec, tmin, tmax, filename=fbio[i])
}
I always prefer to use *apply functions and avoid get. It also leads to easily read code (IMO).
prec_files = c("/path/prec1.tif", "/path/prec2.tif", ..., "/path/precn.tif")
tmin_files = c("/path/tmin1.tif", "/path/tmin2.tif", ..., "/path/tminn.tif")
tmax_files = c("/path/tmax1.tif", "/path/tmax2.tif", ..., "/path/tmaxn.tif")
prec_stacks <- lapply(prec_files, stack)
tmin_stacks <- lapply(tmin_files, stack)
tmax_stacks <- lapply(tmax_files, stack)
results <- mapply(FUN = biovar, prec_stacks, tmin_stacks, tmax_stacks)
If biovars returns a useful object, you can store the results of the mapply call. If you hit memory limitations by (I assume) loading all the rasters at once, you can wrap the loading and biovar call into one function:
stack_and_biovar <- function(prec_path, tmin_path, tmax_path) {
prec <- stack(prec_path)
tmin <- stack(tmin_path)
tmax <- stack(tmax_path)
biovar(prec, tmin, tmax)
}
mapply(FUN = stack_and_biovar, prec_files, tmin_files, tmax_files)
If I understand you correctly then there is a simple way to do it using either pmap or pwalk from purrr package.
Basically, you list all the files with given patterns then use pmap/pwalk to loop through that list and pass corresponding file names to raster::stack and biovars functions.
library(tidyverse)
pathGIS <- "/path1/to/file/"
filePattern <- list("^prec*", "^tmax*", "^tmin*")
# List files and sort them numerically using mixedsort from gtools
fileList <- map(filePattern, ~ gtools::mixedsort(list.files(path = pathGIS, recursive = TRUE,
pattern = .x, full.names = TRUE)))
head(fileList)
# ..1, ..2, etc. correspond to the sublist order within fileList
pwalk(fileList, ~ biovars(raster::stack(..1), raster::stack(..2), raster::stack(..3)))
pmap(fileList, ~ biovars(raster::stack(..1), raster::stack(..2), raster::stack(..3)))
Simple example
fileList <- list(c("./GIS/prec1.tif", "./GIS/prec2.tif"),
c("./GIS/tmax1.tif", "./GIS/tmax2.tif"),
c("./GIS/tmin1.tif", "./GIS/tmin2.tif"))
dummy_function <- function(precFile, tmaxFile, tminFile) {
cat("\n-----\n")
print(precFile)
print(tmaxFile)
print(tminFile)
cat("-----\n")
}
pwalk(fileList, ~ dummy_function(..1, ..2, ..3))
-----
[1] "./GIS/prec1.tif"
[1] "./GIS/tmax1.tif"
[1] "./GIS/tmin1.tif"
-----
-----
[1] "./GIS/prec2.tif"
[1] "./GIS/tmax2.tif"
[1] "./GIS/tmin2.tif"
-----

rbind multiple dataframes within a function

I found this code line below on SO and it worked as a charm outside a function to identify the list of dataframes and join them using rbind.
mylist<-ls(pattern='leg_')
mleg <- do.call(rbind, lapply(mylist, get))
But when I enclose this within a loop, I am getting an error message. I have tried to troubleshoot at various steps in the functions and those work but I might be missing something that is causing this error.
for(i in 1:(length(blg_idx))){
assign(paste(deparse(substitute(leg_)),i,sep=''),l_merge(get(paste(deparse(substitute(blg)),i,sep='')),get(paste(deparse(substitute(bsg)),i,sep=''))))
}
mylist<-ls(pattern='leg_')
#return(mylist) # this returns a good list of dataframes
#mlegleg <- rbind(leg_1,leg_2) # this works
mleg <- do.call(rbind, lapply(mylist, get))
return(mleg)
} #end function read_leg
Error in FUN(c("leg_1", "leg_2")[[1L]], ...) :
object 'leg_1' not found
When I return mylist from the function, it is able to identify all the dataframes and list them. The function is able to return leg_1 or leg_2 dataframe when I choose to return those in debugging.
[1] "leg_1" "leg_2"
Any help?
update
I found another of achieving what I need but I am sure it is inefficient although my list of dataframes is a maximum of 4
for(i in 1:(length(blg_idx))){
assign(paste(deparse(substitute(leg_)),i,sep=''),l_merge(get(paste(deparse(substitute(blg)),i,sep='')),get(paste(deparse(substitute(bsg)),i,sep=''))))
}
mylist<-ls(pattern='leg_')
#return(mylist)
#mlegleg <- rbind(leg_1,leg_2) # this works
# mleg <- do.call(rbind, lapply(mylist, get))
mleg <- leg_1
for(i in 2:(length(blg_idx))){
mleg <- rbind(leg,get(paste(deparse(substitute(leg_)),i,sep='')))
}
return(mleg)
} #end read_leg
update 2
Here is the reproducible example for the issue I am facing. For some reason do.call & get is unable to process the mylist parameter generated for dataframes generated within a function.
read_date <- function(x){
pur_1 <- data.frame(sku=859, X = sample(1:10),Y = sample(c("yes", "no"), 10, replace = TRUE))
pur_2 <- data.frame(sku=859, X = sample(11:20),Y = sample(c("yes", "no","na"), 10, replace = TRUE))
mylist<-ls(pattern='pur_')
pur_final <- do.call(rbind, lapply(mylist, get))
#fancier version that I want to achieve is below
#assign(paste('pur_',eval(pur_1$sku[1]),sep=''),do.call(rbind, lapply(mylist, get)))
return(pur_final)
}
read_date()
Error message is
read_date()
Error in FUN(c("pur_1", "pur_2")[[1L]], ...) : object 'pur_1' not found
update 3
I am sorry for unconventional management of this post but I will get better with my next ones.
Here is what I stumbled upon that is working for me with an exception.
pur_final <- do.call(rbind, mget(paste0("pur_", 1:2),envir = as.environment(-1)))
But the next not so big issue is to suppress the row.names that get added to the dataframe. Any suggestions to suppress the row.names in this context.
> read_date()
sku X Y
pur_1.1 859 8 yes
pur_1.2 859 4 no
pur_1.3 859 3 yes
....
pur_2.8 859 14 na
pur_2.9 859 13 na
pur_2.10 859 19 no
>
You do not have a reproducible example with which to test this solution but take a look at the help page for get and try this:
mleg <- do.call(rbind, lapply(mylist, get, envir = globalenv() ))
The answer above contains the key to your question: envir = globalenv()
It took me a while to realize that R will create a private environment for each function. And within that private environment your other variables don't exist. That is, unless you tell your function to look in the Global Environment by using the envir argument.
Here's a function that should take a character string as input and then identify all variables (e.g. data frames) in Global Environment that include that string of text in their name. Then it will try to bind those variables (data frames).
If all variables are data frames with the same column names, then it should return a single binded data frame. myBindedDF <- mergeCompatibleTables("mypattern")
bindCompatibleTables <- function(x){
if(is.character(x)){
mylist <- grep(x, ls(pos = 1), value=T)
mergedDF <- do.call(rbind, mget(mylist,envir = as.environment(1)))
return(bindedDF)
} else {
stop("Input is not a character string")
}
}
A late response but I just faced a similar issue to the update 2 posting where "object 'pur_1' not found".
If you want to use the following within a function when you have an unknown number of dataframes starting with "pur_", for example:
mylist <- ls(pattern='pur_')
mleg <- do.call(rbind, lapply(mylist, get))
Then you need to point to the correct environment within the function:
mylist <- ls(pattern='pur_')
mleg <- do.call(rbind, lapply(mylist, get, env=environment()))

Resources