I'm looking for a simple way to move on to the next iteration in a for loop in R if the operation inside the for loop errors.
I've recreated a simple case below:
for(i in c(1, 3)) {
test <- try(i+1, silent=TRUE)
calc <- if(class(test) %in% 'try-error') {next} else {i+1}
print(calc)
}
This correctly gives me the following calc values.
[1] 2
[1] 4
However, once I change the vector in i to include a non-numeric value:
for(i in c(1, "a", 3)) {
test <- try(i+1, silent=TRUE)
calc <- if(class(test) %in% 'try-error') {next} else {i+1}
print(calc)
}
This for loop doesn't work. I was hoping for the same calc values as above with the vector excluding the non-numeric value in i.
I tried using tryCatch as follows:
for(i in c(1, "a", 3)) {
calc <- tryCatch({i+1}, error = function(e) {next})
print(calc)
}
However, I get the following error:
Error in value[[3L]](cond) : no loop for break/next, jumping to top level
Could someone please help me understand how I could achieve this using a for loop in R?
As Dason noted, an atomic vector is not a good way to store mixed data types: c(1, "a", 3) is coerced to a character vector, so every element becomes a string. Lists are for that. Consider the following:
l = list(1, "sunflower", 3)
for(i in seq_along(l)) {
this.e = l[[i]]
test <- try(this.e + 1, silent=TRUE)
calc <- if(class(test) %in% 'try-error') {next} else {this.e + 1}
print(calc)
}
[1] 2
[1] 4
In other words, your former loop "worked". Every element of i was a character string, so i+1 failed on every iteration and the loop always went on to the next one.
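For the tryCatch attempt in the question, one possible workaround is sketched below. It is a minimal sketch, not the only approach: the error handler returns a sentinel (NULL here) and next is called from the loop body, because the handler is a separate function with no enclosing loop of its own. The list input and the results vector are illustrative assumptions.

```r
# Sketch: the handler only signals failure; `next` runs in the loop body.
results <- c()
for (x in list(1, "a", 3)) {
  # On error, return NULL instead of calling `next` inside the handler.
  calc <- tryCatch(x + 1, error = function(e) NULL)
  if (is.null(calc)) next  # legal here: we are inside the for loop again
  print(calc)
  results <- c(results, calc)
}
```

NULL works as a sentinel here only because x + 1 never legitimately returns NULL; a different marker would be needed otherwise.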
Here is a solution using the "purrr" package that might be helpful.
It goes through your list or vector and returns the elements that cause errors.
library(purrr)
#Wrap the function you want to use in the adverb "safely"
safetest <- safely(function(x){ifelse(is.na(as.numeric(x)),
x+1,
as.numeric(x)+1)})
myvect<-c(1,"crumbs",3) #change to list if you want a list
#Use the safe version to find where the errors occur
check <- myvect %>%
map(safetest) %>%
transpose %>% .$result %>%
map_lgl(is_null)
myvect[check]
#This returns the results that did not throw an error
#first remove NULL elements then flatten to double.
#The two flatten expressions can be replaced by a single unlist
myvect %>%
map(safetest) %>%
transpose %>% .$result %>%
flatten()%>%flatten_dbl()
See https://blog.rstudio.org/2016/01/06/purrr-0-2-0/ for the original example.
I am trying to write a loop that joins 5 data frames, like this:
c <- list(EC_Pop, EC_GDP, EC_Inflation, ST_Tech_Exp, ST_Res_Jour)
for (i in seq_along(c))
{
if (i < 2)
{
EC_New <- c[i] %>%
left_join(c[i+1], by = c("Country","Year"))
}
else if(i > 1 & i < 4)
{
EC_New <- EC_New %>%
left_join(c[i+1], by = c("Country","Year"))
}
else
{
EC_New
}
}
But I get an error: UseMethod ("left_join") error: No Applicable method for 'left_join' applied to object of class "list"
Can somebody explain the reason? The way I wrote it seems logical to me.
According to the documentation of left_join, both x and y must be data frames.
Your c is a list, and so is c[i].
However, c[[i]] is a data frame. So change your code to include two square brackets.
EC_New <- c[[i]] %>%
left_join(c[[i+1]], by = c("Country","Year"))
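The difference between single and double brackets can be checked on a small example. This is a sketch with made-up data frames; the names dfs, single and double are illustrative only.

```r
# Two toy data frames stored in a list (hypothetical data).
dfs <- list(
  data.frame(Country = "A", Year = 2000, Pop = 1),
  data.frame(Country = "A", Year = 2000, GDP = 2)
)

single <- dfs[1]    # still a list, of length 1
double <- dfs[[1]]  # the data frame stored in the first slot

print(class(single))
print(class(double))
```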
I think you can also replace your code using Reduce:
EC_New2 <- Reduce(left_join, c)
Then check:
identical(EC_New, EC_New2) # should be TRUE
But I'm not sure since I don't have your data. It should work if the common columns are only "Country" and "Year".
And thanks to this answer, you can use the following command if the "Country" and "Year" are not the only common columns.
EC_New2 <- Reduce(function(x, y) left_join(x, y, by=c("Country","Year")), c)
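To see how Reduce folds a list of data frames into one, here is a small self-contained sketch. It swaps in base R's merge(..., all.x = TRUE) as a stand-in for left_join so that it runs without dplyr, and the three toy data frames are assumptions, not the asker's data.

```r
# Toy inputs (hypothetical); the third one has no row for country "B".
pop <- data.frame(Country = c("A", "B"), Year = c(2000, 2000), Pop = c(10, 20))
gdp <- data.frame(Country = c("A", "B"), Year = c(2000, 2000), GDP = c(1.5, 2.5))
inf <- data.frame(Country = "A", Year = 2000, Inflation = 0.03)
frames <- list(pop, gdp, inf)

# Reduce applies the join pairwise: ((pop join gdp) join inf).
# merge(all.x = TRUE) keeps every row of the left-hand table, like a left join.
joined <- Reduce(
  function(x, y) merge(x, y, by = c("Country", "Year"), all.x = TRUE),
  frames
)
print(joined)
```

Rows of the first data frame that have no match in a later one are kept, with NA in the columns that came from the later data frame.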
By the way, try not to use function names such as c to name your R objects. While R allows this, it can lead to confusion later. For example, if you want to concatenate x and y but accidentally type c[x, y] instead of c(x, y), R may not return an error but something totally unexpected.
I'm building a package that interfaces with a git repository and works with historical versions of R functions. The trouble is that sometimes, these old functions are expecting the input data.frame to have columns it doesn't have. These columns don't affect the functionality, but they used to be in the data and they were hard-coded in these old functions. So of course, I'm getting an "undefined columns selected" error.
I want to use tryCatch to see which columns are missing and add them as dummies to my data.frame. For example,
old_fn <- function(x) {
print(x[, "c"])
return(x)
}
df <- data.frame(a = c(1,2,3), b = c(3,4,5))
result <- 0
while(result == 0) {
result <- tryCatch(
old_fn(df),
error = function(cond) {
if (grepl("undefined columns selected", conditionMessage(cond), fixed = TRUE)) {
missing_cols <- # ????
for (col in missing_cols) {
df[[eval(col)]] <- NA
}
return(0)
} else {
return(1)
}
}
)
}
I've tried calling traceback() and grepping the missing_cols from there but that doesn't seem to work during runtime the way I'd expect. Is there no way to see which columns are undefined?
Here's one way you could do this,
but I would feel very uncomfortable about doing it in an R package that's meant to be used by others.
I don't know if R's CMD check would flag it.
You can see the default function used to subset data frames by typing `[.data.frame` in the console.
There you can see the formal arguments and the body.
You would see that the default formals are function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1).
You could then use trace to inject an expression that would be evaluated at the start of the function evaluation:
create_missing_cols <- function(x, j) {
missing_cols <- setdiff(j, colnames(x))
if (length(missing_cols) > 0L) {
for (column in missing_cols) {
x[[column]] <- NA
}
}
# return
x
}
trace(`[.data.frame`,
print = FALSE,
tracer = quote(x <- create_missing_cols(x, j)))
df <- data.frame(a = 1:2)
df[, c("a", "b", "c")]
a b c
1 1 NA NA
2 2 NA NA
untrace(`[.data.frame`)
This assumes that you will be using it only when j is a character vector.
EDIT: if you do end up using this,
definitely consider using on.exit(untrace(`[.data.frame`)) right after the call to trace,
so that the function is untraced even if errors occur.
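One way to arrange the on.exit suggestion is to wrap the trace/untrace pair in a helper, so the tracing is always undone. This is only a sketch under the same assumption as above (j is a character vector); with_padded_columns is a hypothetical name, and create_missing_cols is repeated so the block is self-contained.

```r
create_missing_cols <- function(x, j) {
  # Add an NA column for every requested name that x lacks.
  for (column in setdiff(j, colnames(x))) {
    x[[column]] <- NA
  }
  x
}

# Hypothetical helper: evaluate `expr` while `[.data.frame` is traced.
with_padded_columns <- function(expr) {
  trace(`[.data.frame`,
        print = FALSE,
        tracer = quote(x <- create_missing_cols(x, j)))
  # Registered right after trace(), so it runs even if `expr` errors.
  on.exit(untrace(`[.data.frame`), add = TRUE)
  force(expr)  # the lazily evaluated argument runs here, under the trace
}

df <- data.frame(a = 1:2)
out <- with_padded_columns(df[, c("a", "b")])
print(out)
```

While the trace is active, any data frame subsetting that omits j (for example df[1, ]) would fail inside the tracer, so the wrapped expression should be kept as small as possible.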
In my global environment, I have variables that have the following names:
filtered_A
unfiltered_A
filtered_B
unfiltered_B
and so on...
The variable filtered_A is a subset of unfiltered_A, which is a data frame with one column containing gene names. I am trying to add a new column to unfiltered_A holding one of two strings: "Passed" or "notPassed". "Passed" marks the genes that also exist in filtered_A. So basically I am matching the two data frames and writing "Passed" where a gene is found in filtered_A and "notPassed" where it is not.
I wrote the following code:
```{r unfiltered}
setwd("/home/alaa/Documents/Analysis/genes/WES/unfiltered")
# list that contains sample names
vcfFiles <- list.files(getwd(), recursive = TRUE)
for (i in vcfFiles) {
print(i)
assign(paste0("unfiltered_", i), read.table(i))
}
```
```{r filtered}
setwd("/home/alaa/Documents/Analysis/genes/WES/filtered")
for (i in vcfFiles) {
print(i)
assign(paste0("filtered_", i), read.table(i))
}
```
```{r matching}
for (i in vcfFiles){
y <- grep(i, ls())
filterd <- get(ls()[y[1]])
unfilterd <- get(ls()[y[2]])
name_filterd <- ls()[y[1]]
name_unfilterd <- ls()[y[2]]
assign(name_unfilterd, cbind(unfilterd, apply(unfilterd, 1, function(x) ifelse(any(x[1] == filterd), 'Passed','notPassed'))))
}
for (i in ls()){
if (is.data.frame(get(i)) && ncol(get(i)) == 2 && grepl(pattern = "unfiltered_", x = i)) {
print(i)
j <- get(i)
colnames(j)[2] <- "Situation"
assign(i, j)
}
}
#rm(i, j, filterd, unfilterd, name_filterd, name_unfilterd, y)
```
If I run this code, it fails the first time with:
Error in apply(unfilterd, 1, function(x) ifelse(any(x[1] == filterd), :
dim(X) must have a positive length
I do understand that this is due to unfilterd being dimensionless. However, if I rerun this code, it works without any problem.
Can someone please explain what is wrong and why it fails on the first attempt?
In case you are wondering why I am working with the global environment and ls(), it's because I have many data frames to match.
Thanks in advance.
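The matching described in the question can also be expressed without get(), assign() or ls(), which avoids depending on what happens to be in the global environment. This is a sketch only: it assumes the data frames are kept in two named lists (unfiltered and filtered, hypothetical names) keyed by sample, and the gene names are made up. It does not diagnose the first-run failure itself.

```r
# Hypothetical stand-ins for the data frames read from the two directories.
unfiltered <- list(
  A = data.frame(gene = c("BRCA1", "TP53", "EGFR")),
  B = data.frame(gene = c("MYC", "KRAS"))
)
filtered <- list(
  A = data.frame(gene = "TP53"),
  B = data.frame(gene = c("MYC", "KRAS"))
)

for (s in names(unfiltered)) {
  # %in% tests each unfiltered gene for membership in the filtered set.
  passed <- unfiltered[[s]][[1]] %in% filtered[[s]][[1]]
  unfiltered[[s]]$Situation <- ifelse(passed, "Passed", "notPassed")
}
print(unfiltered$A)
```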
This code produces the following error:
Error in subset.default(sos1, grepl(m, sos1)) : 'subset' must be logical
unik contains c("900-12004-2501-000", "900-12004-2510-000", "900-12005-0120-000")
sos1 contains c("900-12004-2501-0008000FOX1 SFOX1", "900-12004-2510-0008000FOX1 SFOX1", "900-12005-0120-0008000FOX1 SFOX")
Please help.
x <- nrow(miss)
unik <- unique(miss$Material.Number)
unik1 <- as.character(unik)
sos <- read.xlsx("trprod.xlsx", sheet = 1)
sos1 <- as.character(sos$Source.of.Supply)
output <- c()
for (i in 1:x)
{
m <- (unik1[i])
result <- subset(sos1, grepl(m, sos1))
if (length(result) == 0 ){
print('in if')
output <- c(output, m)
}
}
You get the error because the loop variable i runs from 1 to nrow(miss), but unik1 is shorter than that because unique() has removed the duplicates. Once i exceeds the length of unik1, m becomes NA, and grepl then returns a vector of NAs of type integer rather than logical. That's where the error comes from.
You can either change x to x <- length(unik1) or, if you really need to loop over all rows of miss, change the subset operation to
result <- subset(sos1, as.logical(grepl(m, sos1)))
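The first option can be sketched as follows, looping over the unique values with seq_along so that m is never NA. The last element of unik1 is a made-up value added so that the example has something to report, and fixed = TRUE is my own choice to treat each material number as a literal string rather than a regular expression.

```r
unik1 <- c("900-12004-2501-000", "900-12004-2510-000",
           "900-99999-0000-000")  # last value is hypothetical
sos1 <- c("900-12004-2501-0008000FOX1 SFOX1",
          "900-12004-2510-0008000FOX1 SFOX1")

output <- c()
for (i in seq_along(unik1)) {  # never runs past the end of unik1
  m <- unik1[i]
  result <- subset(sos1, grepl(m, sos1, fixed = TRUE))
  if (length(result) == 0) {
    output <- c(output, m)  # m does not occur in any element of sos1
  }
}
print(output)
```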
I have a "hit list" of genes in a matrix. Each row is a hit, and the format is "chromosome(character) start(a number) stop(a number)." I would like to see which of these hits overlap with genes in the fly genome, which is a matrix with the format "chromosome start stop gene"
I have the following function that works (prints a list of genes from column 4 of dmelGenome):
geneListBuild <- function(dmelGenome='', hitList='', binSize='', saveGeneList='')
{
genomeColumns <- c('chr', 'start', 'stop', 'gene')
genome <- read.table(dmelGenome, header=FALSE, col.names = genomeColumns)
chr <- genome[,1]
startAdjust <- genome[,2] - binSize
stopAdjust <- genome[,3] + binSize
gene <- genome[,4]
genome <- data.frame(chr, startAdjust, stopAdjust, gene)
hits <- read.table(hitList, header=TRUE)
chrHits <- hits[hits$chr == "chr3R",]
chrGenome <- genome[genome$chr == "chr3R",]
genes <- c()
for(i in 1:length(chrHits[,1]))
{
for(j in 1:length(chrGenome[,1]))
{
if( chrHits[i,2] >= chrGenome[j,2] && chrHits[i,3] <= chrGenome[j,3] )
{
print(chrGenome[j,4])
}
}
}
genes <- unique(genes[is.finite(genes)])
print(genes)
fileConn<-file(saveGeneList)
write(genes, fileConn)
close(fileConn)
}
However, when I substitute print() with:
genes[j] <- chrGenome[j,4]
R returns a vector that has some values that are present in chrGenome[,1]. I don't know how it chooses these values, because they aren't in rows that seem to fulfill the if statement. I think it's an indexing issue?
Also I'm sure that there is a more efficient way of doing this. I'm new to R, so my code isn't very efficient.
This is similar to the question "writing the results from a nested loop into another vector in R," but I couldn't fix it with the information in that thread.
Thanks.
I believe the inner loop could be replaced with:
gene.in <- ifelse( chrHits[i,2] >= chrGenome[,2] & chrHits[i,3] <= chrGenome[,3],
TRUE, FALSE)
Then you can use that logical vector to select what you want. Doing
which(gene.in)
might also be of use to you.
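Putting this together, a minimal sketch of the outer loop might look like the following. The toy chrHits and chrGenome tables are assumptions standing in for the real files, matches are appended to genes rather than stored at position j, and as.character() is used in case the gene column was read in as a factor.

```r
# Hypothetical stand-ins for the filtered hit list and the adjusted genome.
chrHits <- data.frame(chr = "chr3R", start = c(150, 900), stop = c(180, 950))
chrGenome <- data.frame(chr = "chr3R",
                        startAdjust = c(100, 500, 120),
                        stopAdjust = c(200, 600, 190),
                        gene = c("geneA", "geneB", "geneC"),
                        stringsAsFactors = FALSE)

genes <- character(0)
for (i in seq_len(nrow(chrHits))) {
  # One logical value per genome row: does hit i lie inside that gene's window?
  gene.in <- chrHits[i, 2] >= chrGenome[, 2] & chrHits[i, 3] <= chrGenome[, 3]
  # Append the matching gene names instead of assigning to genes[j].
  genes <- c(genes, as.character(chrGenome[gene.in, 4]))
}
genes <- unique(genes)
print(genes)
```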