Minus in row name makes the name sometimes inaccessible - R

I have an error I can't recreate with smaller examples, so I hope someone has an idea of where to look.
The problem
As described in the code comments: rownamesX is not found in the rownames of the matrix (but of course they are there). If I print the not-found names, something like this comes out:
hsa−miR−00
It should be
hsa-miR-00
Further, I tested some different approaches:
Code works if I source the subscript directly in RStudio in the console (Ctrl-Shift-S hotkey)
Code works if I call the function in the console (Ctrl-Enter on the line)
Code does not work if the subscript is sourced in the main script (Ctrl-Enter on the line)
Code does not work if the whole main.R is sourced (Ctrl-Shift-S hotkey)
My environment:
The data matrix
~200k elements
rownames in the form of "type-type2-number"
colnames (=samples) : "S1", "S2", ...
The call:
A main script
sources a subscript
sources a function
calls the function with the data matrix as a parameter (sketched below)
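A hypothetical sketch of that call chain (the file names, dataMatrix and the row-name vector are assumptions for illustration, not the original code):

# main.R
source("subscript.R")

# subscript.R
source("myFunction.R")                          # defines myFunction()
resp <- myFunction(rownamesX = c("hsa-miR-00"),
                   mat = dataMatrix)            # dataMatrix: the ~200k-element matrix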
The function:
myFunction <- function(rownamesX = c("type-type2-number"), mat) {
  indexes <- which(rownames(mat) %in% rownamesX)  # This is empty
  mat.part <- mat[indexes, ]                      # therefore this is empty
  resp <- mat.part[1, ] - mat.part[2, ]           # therefore this yields an error
}

The mistake was quite simple:
There is more than one "-" character:
− (the Unicode minus sign, U+2212)
- (the ASCII hyphen-minus, U+002D)
These two look even more alike in RStudio than they do here. So I was searching for the first (wider) one while the second (narrower) one was what actually appeared in the row names.
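A minimal sketch of how to spot and fix this (the row name below is made up for illustration): inspect the code points to see the difference, then normalise the Unicode minus to an ASCII hyphen before matching.

rn <- "hsa\u2212miR\u221200"          # contains U+2212, prints as "hsa−miR−00"
utf8ToInt(substr(rn, 4, 4))           # 8722 (U+2212), not 45 (ASCII "-")

rn_clean <- gsub("\u2212", "-", rn)   # replace the minus sign with a hyphen
rn_clean %in% c("hsa-miR-00")         # TRUE after normalisation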

Related

Issue with summary() function in R

I am new to programming and trying to learn R using swirl.
In one of the exercises I was told to use the summary function on a dataset. However I encountered a discrepancy in the way the summary was printed:
Instead of summarizing the counts of the categorical variable's values, it says something about length, class and mode.
I searched for why this might be happening, to no avail, but I did manage to find what the output is supposed to look like.
Any help would be greatly appreciated!
This behaviour is due to the option stringsAsFactors, which defaults to FALSE as of R 4.0. Previously it defaulted to TRUE:
From the R 4.0.0 news: "R now uses a stringsAsFactors = FALSE default, and hence by default no longer converts strings to factors in calls to data.frame() and read.table()."
A way to return to the previous behaviour with the same code is to run options(stringsAsFactors = TRUE) before building data frames. However, there is a warning saying this option will eventually be removed, as explained here.
For your new code, you can use the stringsAsFactors parameter, for instance data.frame(..., stringsAsFactors = TRUE).
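A quick illustration of the difference, using a toy data frame (the column name is made up):

df <- data.frame(animal = c("cat", "dog", "cat"))
summary(df$animal)                    # character column: Length / Class / Mode
#    Length     Class      Mode
#         3 character character

df2 <- data.frame(animal = c("cat", "dog", "cat"), stringsAsFactors = TRUE)
summary(df2$animal)                   # factor column: counts per level
# cat dog
#   2   1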
If you already have data frames and want to convert them, you could use this function to convert all character variables (adapt it if only some variables need conversion):
to.factors <- function(df) {
  i <- sapply(df, is.character)
  df[i] <- lapply(df[i], as.factor)
  df
}
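For example, assuming an existing data frame df:

df <- to.factors(df)   # all character columns are now factors
summary(df)            # factor columns now show level counts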

R automatically executes code following "unexpected symbol"-errors

In recent months I have noticed a very annoying behaviour on both Windows and Unix installations of R with RStudio.
After an error, R auto-executes all the code it finds following the line that produced the error (here: "unexpected symbol"). Here is an example:
vec1 <- c("Hallo", "World"
vec2 <- c(1,2,3)
print(vec2)
print(vec1)
In the first line:
vec1 <- c("Hallo", "World"
the closing ")" is missing. After erroneously running this line, the following happens:
vec1 <- c("Hallo", "World"
+
+ vec2 <- c(1,2,3)
Error: unexpected symbol in:
"
vec2"
>
> print(vec2)
Error in print(vec2) : object 'vec2' not found
>
> print(vec1)
Error in print(vec1) : object 'vec1' not found
>
R apparently does look for a closing bracket, finds one, gives the expected "unexpected symbol" error, but instead of stopping it tries to execute the next line (and everything else that follows) as well.
Is this related to R or to RStudio, and how can I stop it?
Edit:
I should clarify what the problem is, based on the comments. This behaviour is not intended, nor did I plan to include faulty lines in my code!
Sometimes one simply forgets a bracket, or a comma, or whatever, but still submits such a line. Then, at least for me, R has this very annoying behaviour of running through the entire rest of the code. Here is a real-life example: somewhat later in the same situation, model objects were overwritten, which was very annoying.
So again, I don't want you to correct the code; I would like to learn why R behaves as described and how to stop it.
It sounds like you're expecting R to stop when it finds an error. After all, that's what traditional compiled languages like C and Java do. But R isn't a compiled language. Each line of code is interpreted in order. This is an inherent part of R and doesn't have anything to do with RStudio. In your example, it's really hard for R to figure out where the call to c() ends because you're missing the close parenthesis.
One RStudio feature that I find useful for preventing this specific type of error is the auto-formatter (CTRL-SHIFT-A). When formatting the code sample you provide, it becomes obvious that something's not right when you look at the indentation.
The code changes from this...
vec1 <- c("Hallo", "World"
vec2 <- c(1, 2, 3)
print(vec2)
print(vec1)
To this..
vec1 <- c("Hallo", "World"
vec2 <- c(1, 2, 3)
print(vec2)
print(vec1)
The fact that the bottom three lines are indented so far to the right gives me a warning that I might have missed a closing parenthesis.
Generally
If your question is about broader error handling, you can often use a function to prevent R from continuing when it encounters an error. This won't work with your example since the parentheses are wrong, but it gives an answer to the broader question of when you can get R to stop upon encountering a problem.
Let's generate an error.
stop("This is an error")
print("The code keeps running!")
Notice how the second line runs after the error. Now let's wrap that code in a function.
demo_function <- function() {
  stop("This is an error")
  print("The code keeps running!")
}
demo_function()
The function throws an error and halts execution.
It's a good idea to put high-risk code inside of a function for exactly this reason. With the example you provided, R will throw an error as soon as you try to define the function, which might help you catch an error earlier in the development process.
As per RStudio customer support, this behaviour is RStudio-related and can be stopped by unticking "Execute all lines in a statement" under Global Options -> Editing -> Execution. Sorry for bothering.
You have to add some commas (',') and parentheses to your syntax; try:
> vec1 <- c("Hello", "World")
> vec2 <- c(1,2,3)
> print(vec2)
> print(vec1)
It should work.

Calling objects from list

I'm having some trouble calling an object from a list, from a created variable within my for loop.
for (i in 1:10) {
  # create variables and run through function
  varName <- paste("var", i, sep = "")
  assign(varName, rnmf(data, k = i, showprogress = FALSE))
  # create new variable using object 3 from varName output
  varNF <- paste("varNF", i, sep = "")
  assign(varNF, (data - varName[[3]])^2)
}
My problem is with the second part of my for loop. I am attempting to use the third object from the output of my first created variable, in the calculation of my second variable. If I use varName[[3]] I get "subscript out of bounds", and if I use varName$fit, I get "$ operator is invalid for atomic vectors".
It looks like varName[[3]] in the second part is not referring to the incrementing variables I am creating (var1, var2, var3, ...), but to the character string stored in varName itself. To try to get around that, I instead tried
assign(varNF, (data - get(paste("var", i, "[[3]]", sep = "")))^2)
which gave me the error "object 'var1[[3]]' not found". But if I simply run var1[[3]] in my R console, it does exist. I'm not quite sure where to go from here. Any help would be great!
A very useful rule of thumb in R is:
If you find yourself using either assign() or get() in your code, it's a strong indicator that you are approaching the problem with the wrong tools. If you still think you should use those functions, think again. The tools that you are missing are most likely R lists and subsetting of lists.
(and tell everyone you know about this rule of thumb)
In your case, I would do something like:
library("rNMF")
[...]
var <- list()
varNF <- list()
for (i in 1:10) {
res <- rnmf(data, k = i, showprogress = FALSE)
var[[i]] <- res
varNF[[i]] <- (data - res$fit)^2
}
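The individual results can then be pulled out by index, for example (assuming, as above, that rnmf() returns a list with a fit component):

var[[3]]$fit    # the fit component of the k = 3 run
varNF[[3]]      # the corresponding squared-error matrix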

how to figure out which statement in lapply fails in R

I often have the situation like this:
result <- lapply(1:length(mylist), function(x) {
  doSomething(x)
})
However, if it fails, I have no idea which element in the list failed on doSomething().
So then I end up recoding it as a for loop:
for (i in 1:length(mylist)) {
  doSomething(mylist[[i]])
}
I can then see the last value of i and what happened. There must be a better way to do this right?? Thanks!
Notice how the error includes 5L
> lapply(1:10, function(i) if (i == 5) stop("oops"))
Error in FUN(1:10[[5L]], ...) : oops
indicating that the 5th iteration failed.
One simple option is to run the code:
options(error = recover)
before running lapply (see ?recover for details).
Then when/if an error occurs you will instantly be put into the recover mode that will let you examine which function you are in, what arguments were passed to that function, etc. so you can see which step you are on and what the possible reason for the error is.
You can also use try or tryCatch as mentioned in the comments to either skip elements that produce an error or print out information on where they occur.
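A minimal sketch of the tryCatch approach (mylist and doSomething here are made-up stand-ins): each failing element is reported with its index and replaced by NA, so the rest of the list still gets processed.

mylist <- as.list(1:10)                                       # toy input
doSomething <- function(x) if (x == 5) stop("oops") else x^2

result <- lapply(seq_along(mylist), function(i) {
  tryCatch(
    doSomething(mylist[[i]]),
    error = function(e) {
      message("Element ", i, " failed: ", conditionMessage(e))
      NA                                                      # placeholder for the failed element
    }
  )
})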

Why can't I pass a dataset to a function?

I'm using the package glmulti to fit models to several datasets. Everything works if I fit one dataset at a time.
So for example:
output <- glmulti(y~x1+x2,data=dat,fitfunction=lm)
works just fine.
However, if I create a wrapper function like so:
analyze <- function(dat) {
  out <- glmulti(y ~ x1 + x2, data = dat, fitfunction = lm)
  return(out)
}
it simply doesn't work. The error I get is
error in evaluating the argument 'data' in selecting a method for function 'glmulti'
Unless there is a data frame named dat in my workspace, it doesn't work. If I use results <- lapply(list_of_datasets, analyze), it doesn't work either.
So what gives? Without my said wrapper, I can't lapply a list of datasets through this function. If anyone has thoughts or ideas on why this is happening or how I can get around it, that would be great.
example 2:
dat=list_of_data[[1]]
analyze(dat)
works fine. So in a sense it is ignoring the argument and just literally looking for a data frame named dat. It behaves the same no matter what I call it.
I guess this is yet another problem due to the definition of environments in the parse tree of S4 methods (one of the reasons why I am not a big fan of S4...).
It can be shown by adding quotes around dat:
> analyze <- function(dat)
+ {
+ out<- glmulti(y~x1+x2,data="dat",fitfunction=lm)
+ return (out)
+ }
> analyze(test)
Initialization...
Error in eval(predvars, data, env) : invalid 'envir' argument
You should in the first place send this information to the maintainers of the package, as they know how they deal with the environments internally. They'll have to adapt the functions.
A very dirty workaround is to put dat in the global environment and delete it afterwards:
analyze <- function(dat) {
  assign("dat", dat, envir = .GlobalEnv)  # put dat in the global env
  out <- glmulti(y ~ x1 + x2, data = dat, fitfunction = lm)
  remove(dat, envir = .GlobalEnv)         # delete dat again from the global env
  return(out)
}
EDIT:
Just for clarity, this is really about the worst solution possible, but I couldn't manage to find anything better. If somebody else gives you a solution where you don't have to touch your global environment, by all means use that one.
