R: Check if R object exists before creating it

I am trying to skip the steps that load data from large files when they have already been run earlier in the session. Since the data ends up in (for example) mydf, I thought I could do:
if (!exists(mydf)) {
  # ... steps to do loading here.
}
I got this from How to check if object (variable) is defined in R? and https://stat.ethz.ch/R-manual/R-devel/library/base/html/exists.html
However, RStudio simply complains with
Error in exists(mydf) : object 'mydf' not found
Why does it complain instead of just returning TRUE or FALSE? Any tips appreciated.

You should use exists("mydf") instead of exists(mydf): exists() expects the object's name as a character string, not the object itself.
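For example, a minimal sketch of the pattern in the question (the file name and loading step are placeholders):
# Only run the expensive load if no object named "mydf" exists yet.
# Note the quotes: exists() takes the name as a character string.
if (!exists("mydf")) {
  mydf <- read.csv("big_file.csv")  # placeholder for the real loading steps
}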

Related

Object not found error although column is in the table (data.table format)

I have defined the following function:
counter <- function(data, varname) {
  data[is.na(varname), .N]
}
When I pass the arguments:
counter(df,ip_address_ts)
I get the error:
Error in .checkTypos(e, names_x) : Object 'ip_address_ts' not found. Perhaps you intended ip_address_ts, email_address_ts
ip_address_ts is in df, so why does this not work?
Your code is looking for the object ip_address_ts, not the string "ip_address_ts":
counter(df, "ip_address_ts")
The solution is to use get() and pass the column name as a string:
counter <- function(data, varname) {
  data[is.na(get(varname)), .N]
}
counter(df,"ip_address_ts")
For this and other tips check out this link:
http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#3-functions
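As a quick sanity check of the get()-based counter() above, here is a toy example (the data values are made up for illustration):
library(data.table)

df <- data.table(ip_address_ts = c("1.2.3.4", NA, NA),
                 email_address_ts = c(NA, "a@b.com", "c@d.com"))

counter(df, "ip_address_ts")   # returns 2: two NA values in that column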
I happened to encounter the same error while working in RStudio, having checked that the column exists in the data frame. Two things helped:
Restarting the session
Installing and loading the right package before running the code (in my case the 'dplyr' package, in order to use the filter() function; see the sketch below)
I hope this helps
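A minimal sketch of that kind of call once dplyr is attached (the data frame and column here are made up):
library(dplyr)

df <- data.frame(ip_address_ts = c("1.2.3.4", NA))
filter(df, !is.na(ip_address_ts))   # works once dplyr is loaded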

R tryCatch RODBC function issue

We have a number of MS Access databases on a server which are copies from remote locations which are updated overnight. We collate some of the data from these machines for reporting purposes on a daily basis. Sometimes the overnight update fails, meaning we don’t have access to all of the databases, so I am attempting to write an R script which will test if we can connect (using a list of the database paths), and output an updated version of the list including only those which we can connect to. This will then be used to run a further script which will only update the data related to the available databases.
This is what I have so far (I am new to R but reasonably proficient in SAS and SQL – attempting to use R both as a learning exercise and for potential cost savings):
{
# Create Store data locations listing
A=matrix(c(1000,1,"One","//Server/Comms1/Access.mdb"
,2000,2,"Two","//Server/Comms2/Access.mdb"
,3000,3,"Three","//Server/Comms3/Access.mdb"
)
,nrow=3,ncol=4,byrow=TRUE)
# Add column names
colnames(A)<-c("Ref1","Ref2","Ref3","Location")
#Create summary for testing connections (Ref1 and Location)
B<-A[,c(1,4)]
ConnectionTest<-function(Ref1,Location)
{
out<-tryCatch({ch<-odbcDriverConnect(paste("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",Location))
sqlQuery(ch,paste("select ",Ref1," as Ref1,COUNT(variable) as Count from table"))}
,error=matrix(c(Ref1,0),nrow=1,ncol=2,byrow=TRUE)
)
return(out)
}
#Run function, using 'B' to provide arguments
C<-apply(B,1,function(x)do.call(ConnectionTest,as.list(x)))
#Convert to matrix and add column names
D<-matrix(unlist(C),ncol=2,byrow=T)
colnames(D)<-c("Ref1","Count")
}
When I run the script I get the following error message;
Error in value[3L] : attempt to apply non-function
I am guessing this is because I am using tryCatch incorrectly inside the UDF?
Does anyone have any advice on what I am doing incorrectly, or even if this is the best way to do what I am attempting?
Thanks
(apologies if this is formatted incorrectly, having to post on my phone due to Stackoverflow posting being blocked)
Edit - I think I fixed the 'Error in value[3L]' issue by adding function(e) {} around the matrix function in the error part of the tryCatch.
The issue now is that the script just fails if it can't reach one of the databases, rather than doing the matrix function. Do I need to add something else to make it ignore the error?
Edit 2 - it seems tryCatch does now work - it processes the alternate function upon error, but also shows warnings about the error, which makes sense.
As mentioned in the edit above, using 'function(e) {}' to wrap the matrix function in the error section of the tryCatch fixed the 'Error in value[3L]' issue, so the script now works, but it displays error messages if it can't access a particular channel. I am guessing the 'warning' section of the tryCatch can be used to adjust these as necessary; a sketch of the corrected function is below.
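A minimal sketch of the corrected function, assuming RODBC is installed and keeping the placeholder query and paths from the question (untested against a real Access database):
library(RODBC)

ConnectionTest <- function(Ref1, Location) {
  out <- tryCatch({
    ch <- odbcDriverConnect(paste0(
      "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=", Location))
    res <- sqlQuery(ch, paste("select", Ref1, "as Ref1, COUNT(variable) as Count from table"))
    odbcClose(ch)
    res
  },
  # The error handler must be a function of the condition object;
  # on failure it returns a one-row matrix with a zero count instead of stopping.
  error = function(e) matrix(c(Ref1, 0), nrow = 1, ncol = 2, byrow = TRUE)
  )
  return(out)
}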

rxDataStep in RevoScaleR package crashing

I am trying to create a new factor column on an .xdf data set with the rxDataStep function in RevoScaleR:
rxDataStep(nyc_lab1
, nyc_lab1
, transforms = list(RatecodeID_desc = factor(RatecodeID, levels=RatecodeID_Levels, labels=RatecodeID_Labels))
, overwrite=T
)
where nyc_lab1 is a pointer to a .xdf file. I know that the file is fine because I imported it into a data table and successfully created the new factor column.
However, I get the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERROR: The sample data set for the analysis has no variables.
What could be wrong?
First, RevoScaleR has some warts when it comes to replacing data. In particular, overwriting the input file with the output can sometimes cause rxDataStep to fail for unknown reasons.
Even if it works, you probably shouldn't do it anyway. If there is a mistake in your code, you risk destroying your data. Instead, write to a new file each time, and only delete the old file once you've verified you no longer need it.
Second, any object you reference that isn't part of the dataset itself has to be passed in via the transformObjects argument. See ?rxTransform. Basically, the rx* functions are meant to be portable to distributed computing contexts, where the R session that runs the code isn't necessarily the same as your local session. In this scenario, you can't assume that objects in your global environment will exist in the session where the code executes.
Try something like this:
nyc_lab2 <- RxXdfData("nyc_lab2.xdf")
nyc_lab2 <- rxDataStep(nyc_lab1, nyc_lab2,
transforms=list(
RatecodeID_desc=factor(RatecodeID, levels=.levs, labels=.labs)
),
transformObjects=list(
.levs=RatecodeID_Levels,
.labs=RatecodeID_Labels
)
)
Or, you could use dplyrXdf which will handle all this file management business for you:
nyc_lab2 <- nyc_lab1 %>% factorise(RatecodeID)

R CMD check: no visible binding for global variable ‘mypkgdata’ [duplicate]

This question already has answers here:
No visible binding for global variable Note in R CMD check
(5 answers)
Closed 4 years ago.
This question is slightly different from others on this subject -- I do indeed have a variable called "mypkgdata":
I am writing a package which ships with a data set. This data set is needed for calculations from within the package. In the DESCRIPTION file, I have specified "LazyData" for that purpose, such that the data set is always around when anyone loads the package. When I run the check, however, I get:
.getmodules2: no visible binding for global variable ‘mypkgdata’
What is the correct way of solving this problem?
If you have LazyData: TRUE in your DESCRIPTION file, then the following should work:
x <- MyPackageName::mypkgdata
# ... your calculations using x
I also get the note if I try calling it without the MyPackageName:: prefix.
Here is how I have solved it. I created a custom environment in the package, loaded the data set into this environment, and wrote a function that returns the data set:
pkgEnv <- new.env(parent = emptyenv())

if (!exists("mypkgdata", pkgEnv)) {
  data("mypkgdata", package = "mypkg", envir = pkgEnv)
}

getMyPkgData <- function() {
  pkgEnv[["mypkgdata"]]
}
And in the function that utilizes "mypkgdata", I write:
mypkgdata <- getMyPkgData()
Also, I gave up on lazy loading the data, as it is no longer necessary.
I think data from a package should not be flagged as having no visible binding. However, a workaround is:
if (getRversion() >= "2.15.1") utils::globalVariables("mypkgdata")
Compare https://stackoverflow.com/a/17807914/3805440

Kindly check the R command

I am doing the following with the Cooccur library in R.
> fb<-read.table("Fb6_peaks.bed")
> f1<-read.table("F16_peaks.bed")
Everything is OK with the first two commands, and I can also display the data:
> fb
> f1
But when I run the next command, as given below,
> explore_pairs(c("fb", "f1"))
I get an error message:
Error in sum(sapply(tf1_s, score_sample, tf2_hits = tf2_s, hit_list = hit_l)) :
invalid 'type' (list) of argument
Could anyone suggest something?
Despite promising in the article they published over a year ago to release a version to the Bioconductor repository, the authors have still not delivered. The gz file that is attached to the article is not of a form that my installation recognizes. You really should be corresponding with the authors for this question.
The nature of the error message suggests that the function is expecting a different data class. You should be looking at the specification for the arguments in the help(explore_pairs) file. If it is expecting 2 matrices, then wrapping data.matrix around the arguments may solve the problem, but if it is expecting a class created by one of that package's functions then you need to take the necessary steps to construct the right objects.
The help file for explore_pairs does exist (at least in the man directory) and says the first argument should be a character vector, with further provisos:
\arguments{
\item{factornames}{an vector of character strings, each naming a GFF-like
data frame containing the binding profile of a DNA-binding factor.
There is also a load utility, load_GFF, which I assume is designed for creation of such files.
Try renaming the columns of your data frame:
names(fb) <- c("seq", "start", "end")
Check the example datasets. The column names are as above. I set the names and it worked.
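Putting that together, a minimal sketch of the fix applied to both data frames before the call (assuming explore_pairs looks the objects up by the names passed in the character vector, as in the question):
fb <- read.table("Fb6_peaks.bed")
f1 <- read.table("F16_peaks.bed")

# Give both data frames the column names used by the package's example data.
names(fb) <- c("seq", "start", "end")
names(f1) <- c("seq", "start", "end")

explore_pairs(c("fb", "f1"))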
