How to get R to read a gdb file?

I am trying to get R to read in a .gdb file. The first thing I did was to find out its layers, which I did by running:
ogrListLayers("my_data.gdb")
It turns out my_data has two large layers. I have tried opening both but have had no success. Here is what I have tried so far:
1)
Wont_open <- readOGR(dsn = "D:/my_data.gdb", layer = "layer_1", dropNULLGeometries = F)
I have tried the above with and without the dropNULLGeometries argument and for both layers in my_data. When running this, I get the following error:
Error in readOGR(dsn = "D:/my_data.gdb", :
Unsupported field type: Binary
2)
Wont_open <- st_read(dsn = "D:/my_data.gdb", layer = "layer_1")
I have tried the above for both layers in my_data. When I run this, R simply stops working after about 1 hour of having started the process.
3)
read_GDB_Layer <- function(dsn, layerName, overwrite = T){
  conversionDir <- tempdir()
  gdalUtils::ogr2ogr(src_datasource_name = dsn, dst_datasource_name = conversionDir,
                     f = "ESRI Shapefile", layer = layerName, verbose = T, overwrite = overwrite)
  df <- foreign::read.dbf(file.path(conversionDir, paste0(layerName, ".gdbtable")))
  return(df)
}
Then,
Wont_open <- read_GDB_Layer(dsn = "D:/my_data.gdb", layerName = "layer_1")
I tried this for both layers, and also changed the .gdbtable extension in the function to .dbf to run it on both layers, and it still did not work. I got the following warning messages:
1: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
No GDAL installation found. Please install 'gdal' before continuing:
- www.gdal.org (no HDF4 support!)
- trac.osgeo.org/osgeo4w/ (with HDF4 support RECOMMENDED)
- www.fwtools.maptools.org (with HDF4 support)
2: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
If you think GDAL is installed, please run:
gdal_setInstallation(ignore.full_scan=FALSE)
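These warnings mean gdalUtils never located a GDAL installation, so ogr2ogr() could not run at all. If GDAL is in fact installed, a possible fix is to register it explicitly before converting (a sketch; the search_path below is an example for an OSGeo4W install on Windows and will differ on your system):
# Tell gdalUtils where the GDAL binaries live, then verify it was found
gdalUtils::gdal_setInstallation(search_path = "C:/OSGeo4W64/bin", rescan = TRUE)
getOption("gdalUtils_gdalPath")  # should no longer be NULL once GDAL is located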

The st_read() function worked for me in the end, as pointed out by @sven-brandt.
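For reference, a minimal sketch of that working approach with the sf package (the path and layer names are the ones from this question; adjust them to your own data):
library(sf)
# List the layers contained in the file geodatabase
st_layers("D:/my_data.gdb")
# Read one layer; sf generally copes better with field types such as
# Binary that rgdal's readOGR rejects
opened <- st_read("D:/my_data.gdb", layer = "layer_1")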

Related

Redefine arguments within an R core package function

My fingers are starting to tire of typing update.packages(checkBuilt = T, ask = F). I was wondering whether it's possible to redefine the default parameters within the update.packages() function. So far, I've tried adding the following to my .Rprofile file:
utils::assignInNamespace(
  "update.packages",
  function(checkBuilt = TRUE, ask = FALSE, ...) {
    update.packages(checkBuilt = checkBuilt, ask = ask, ...)
  },
  "utils"
)
But when I try to use the function in R I get the following error:
update.packages()
Error: C stack usage 7976404 is too close to the limit
I've also just tried using formals() with the following in the .Rprofile:
local({
  args_new <- alist(lib.loc = .libPaths(), ask = FALSE, checkBuilt = TRUE)
  ind <- which(methods::formalArgs(update.packages) %in% names(args_new))
  formals(update.packages)[ind] <- args_new
})
But that results in the following error upon launching R:
Error in formals(update.packages) : object 'update.packages' not found
As @Roland said in the comments, your definition is recursive. You shouldn't bother with assignInNamespace: keeping the new function in your workspace is good enough. Then you can use utils::update.packages in its definition, e.g.
update.packages <- function(checkBuilt = TRUE, ask = FALSE, ...)
  utils::update.packages(checkBuilt = checkBuilt, ask = ask, ...)
You should avoid using assignInNamespace for the reasons listed in its help page.
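As for the formals() attempt: .Rprofile is sourced before the default packages (including utils) are attached, which is why update.packages cannot be found at that point. A minimal sketch of the wrapper placed in ~/.Rprofile, assuming you want it available in every session (note the explicit utils:: prefix):
# ~/.Rprofile -- objects defined here end up in the global workspace.
# utils is not attached yet when this file runs, hence the utils:: prefix.
update.packages <- function(checkBuilt = TRUE, ask = FALSE, ...)
  utils::update.packages(checkBuilt = checkBuilt, ask = ask, ...)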

get_file() error in Google Colab R environment

I have this error:
Error in get_file(fname = "flores.zip", origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv", : argument "file" is missing, with no default
Here is my code:
data_dir <- get_file(fname = "flores.zip",
                     origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
                     extract = TRUE)
data_dir <- file.path(dirname(data_dir), "flores")
images <- list.files(data_dir, pattern = ".jpg", recursive = TRUE)
length(images)
It works in my desktop RStudio, but not in the Google Colab R environment.
Can anybody help me?
Thanks!
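Going only by the error text, the get_file() that Colab's R environment picks up seems to require an argument named file rather than fname. A guess at a fix (untested; the argument name is inferred from the error message, not from documentation):
data_dir <- get_file(file = "flores.zip",
                     origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
                     extract = TRUE)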

How can I correctly use the cluster plan in the R future (furrr) package

I am currently using furrr to create a more organized execution of my model. I use a data.frame to pass parameters to a function in an orderly way, and then use furrr::future_map() to map the function across all the parameters. The function works flawlessly when using the sequential and multicore futures on my local machine (OSX).
Now, I want to test my code by creating my own cluster of AWS instances (just as shown here).
I created a function using the linked article code:
make_cluster_ec2 <- function(public_ip){
  ssh_private_key_file <- Sys.getenv('PEM_PATH')
  github_pac <- Sys.getenv('PAC')
  cl_multi <- future::makeClusterPSOCK(
    workers = public_ip,
    user = "ubuntu",
    rshopts = c(
      "-o", "StrictHostKeyChecking=no",
      "-o", "IdentitiesOnly=yes",
      "-i", ssh_private_key_file
    ),
    rscript_args = c(
      "-e", shQuote("local({p <- Sys.getenv('R_LIBS_USER'); dir.create(p, recursive = TRUE, showWarnings = FALSE); .libPaths(p)})"),
      "-e", shQuote("install.packages('devtools')"),
      "-e", shQuote(glue::glue("devtools::install_github('user/repo', auth_token = '{github_pac}')"))
    ),
    dryrun = FALSE)
  return(cl_multi)
}
Then, I create the cluster object and check that it is connected to the right instance:
public_ids <- c('public_ip_1', 'public_ip_2')
cls <- make_cluster_ec2(public_ids)
plan(cluster, workers = cls)  # needed so future() is resolved on the remote cluster
f <- future(Sys.info())
And when I fetch the value of f, I get the specs of one of my remote instances, which indicates the socket is correctly connected:
> value(f)
sysname          "Linux"
release          "4.15.0-1037-aws"
version          "#39-Ubuntu SMP Tue Apr 16 08:09:09 UTC 2019"
nodename         "ip-xxx-xx-xx-xxx"
machine          "x86_64"
login            "ubuntu"
user             "ubuntu"
effective_user   "ubuntu"
But when I run my code using my cluster plan:
plan(list(tweak(cluster, workers = cls), multisession))

parameter_df %>%
  mutate(model_traj = furrr::future_pmap(list('lat' = latitude,
                                              'lon' = longitude,
                                              'height' = stack_height,
                                              'name_source' = facility_name,
                                              'id_source' = facility_id,
                                              'duration' = duration,
                                              'days' = seq_dates,
                                              'daily_hours' = daily_hours,
                                              'direction' = 'forward',
                                              'met_type' = 'reanalysis',
                                              'met_dir' = here::here('met'),
                                              'exec_dir' = here::here("Hysplit4/exec"),
                                              'cred' = list(creds)),
                                         dirtywind::hysplit_trajectory,
                                         .progress = TRUE)
  )
I get the following error:
Error in file(temp_file, "a") : cannot open the connection
In addition: Warning message:
In file(temp_file, "a") :
cannot open file '/var/folders/rc/rbmg32js2qlf4d7cd4ts6x6h0000gn/T//RtmpPvdbV3/filecf23390c093.txt': No such file or directory
I cannot figure out what is happening under the hood, and I cannot traceback() the error from my remote machines either. I have tested the connection with the examples in the article and things seem to run correctly. I am also wondering why it is trying to create a temporary file during the execution. What am I missing here?
(This is also an issue in the furrr repo)
Disable the progress bar, i.e. don't specify .progress = TRUE.
This is because .progress = TRUE assumes your R workers can write to a temporary file that the main R process created. This is typically only possible when you parallelize on the same machine.
A smaller example of this error is:
library(future)
## Set up a cluster with one worker running on another machine
cl <- makeClusterPSOCK(workers = "node2")
plan(cluster, workers = cl)
y <- furrr::future_map(1:2, identity, .progress = FALSE)
str(y)
## List of 2
## $ : int 1
## $ : int 2
y <- furrr::future_map(1:2, identity, .progress = TRUE)
## Error in file(temp_file, "a") : cannot open the connection
## In addition: Warning message:
## In file(temp_file, "a") :
## cannot open file '/tmp/henrik/Rtmp1HkyJ8/file4c4b864a028ac.txt': No such file or directory
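If you still want progress updates from remote workers, one alternative (a sketch using the progressr package, which relays progress via conditions instead of a shared temp file; not part of the original answer) is:
library(future)
library(progressr)
plan(cluster, workers = cl)
with_progress({
  p <- progressor(steps = 2)  # one progress signal per element
  y <- furrr::future_map(1:2, function(x) {
    p()                       # signalled on the worker, relayed to the main session
    x
  })
})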

Ignoring the symbol ° in R devtools function document()

I would like to create a package for internal use (not for distribution). One of my functions contains the line
if (data$unit[i] != "°C") {
It works perfectly in the script, but if I want to create the documentation for my package using document() from devtools, I get the error
Error in parse(text = lines, keep.source = TRUE, srcfile = srcfilecopy(file, path_to_my_code: unexpected INCOMPLETE_STRING
279: if (! is.na(data$unit[i]){
280: if (data$unit[i] != "
In addition: Warning message:
In readLines(con, warn = FALSE, n = n, ok = ok, skipNul = skipNul) :
invalid input found on input connection 'path_to_my_code'
If I delete the °-character, document() works. But I need this character there, so this is not an option.
When using a double backslash in the if-clause, my function doesn't detect °C anymore, as shown here:
test <- c("mg/l", "°C")
"\\°C" %in% test
[1] FALSE
If I use tryCatch, the documentation is also not created.
Replacing "°C" by gsub(pattern = '\\\\', replacement = "", x = '\\°C') causes the function to crash at the double backslash.
How can I tell document() that everything is fine and it should just create the files?
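One workaround worth sketching (the root cause is usually the file not being parsed as UTF-8, so treat this as one option, not the only fix): write the degree sign with its Unicode escape, which keeps the source file pure ASCII while producing the identical string at run time:
# "\u00b0" is the Unicode escape for the degree sign, so "\u00b0C" equals "°C"
test <- c("mg/l", "°C")
"\u00b0C" %in% test
## [1] TRUE
if (data$unit[i] != "\u00b0C") {
Alternatively, re-save the source file with UTF-8 encoding and declare Encoding: UTF-8 in the package's DESCRIPTION file.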

How to customize clustering in mclust?

I am trying to use the mclust package of R. I want to cluster some data.
Here are the steps of what I have done:
Reading data:
mydata <- read.table("\Users......", row.names= 1, sep = "\t", header = TRUE)
Using mclust:
library(mclust)
mydataModel <- Mclust(mydata)
summary(mydataModel)
It breaks the data into 7 clusters. However, I want my data to be broken into only 2 clusters. Please help on how to do this.
As mentioned by MrFlick, you should read the documentation by prefixing the function name with ?.
In your case, run ?Mclust in your R console to see how the default parameters have been set up.
This is what will show up once you run ?Mclust:
Mclust(data, G = NULL, modelNames = NULL,
       prior = NULL,
       control = emControl(),
       initialization = NULL,
       warn = mclust.options("warn"), ...)
All you need to do is:
Mclust(mydata, 2)
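Spelled out with the named argument (equivalent, just more explicit), followed by the same summary as before:
mydataModel <- Mclust(mydata, G = 2)  # G fixes the number of mixture components
summary(mydataModel)
You can also pass a range such as G = 1:3 and let Mclust choose among those by BIC.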
