I have a simple question, but it has cost me hours. I would like to cbind() a matrix and a data frame. The catch is that they don't have the same number of rows.
matrix:
condition
             [,1]
ILMN_1666845 TRUE
ILMN_1716400 TRUE
Data.frame:
a
  t1 t2 t3 t4
1  0  1  1  1
If I use cbind() without a loop, everything is ok and this is the result:
b<-cbind(condition,a)
b
condition t1 t2 t3 t4
ILMN_1666845 TRUE 0 1 1 1
ILMN_1716400 TRUE 0 1 1 1
But in a for loop I get the following error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 1
Can anyone help me? Thanks!
For loop code:
for (p in 1:nrow(outcomes)) {
id <- apply(regulationtable, 1, function(i)
sum(i[1:length(regulationtable)] != outcomes[p,])==0)
idd<-as.matrix(id)
condition = subset(idd, idd[,1]==TRUE)
a<-as.data.frame(t(outcomes[p,]))
b<-cbind(condition,a)
write.table(b, "file.txt", append=TRUE)}
As far as I can tell from your code, you are trying to cbind a possibly empty object, which never works. That's also what the error is telling you: one argument has 0 rows and the other has 1. Probably at some point condition is empty because no rows match. So just add a condition
if(sum(id) !=0) { ... }
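A minimal sketch of that guard, using made-up stand-ins for the objects in the question (the empty condition matrix and the one-row a below are assumptions for illustration). It first reproduces the error, then skips the cbind() when nothing matched:

```r
# Hypothetical stand-ins: no gene matched, so condition has zero rows
condition <- matrix(logical(0), nrow = 0, ncol = 1)
a <- data.frame(t1 = 0, t2 = 1, t3 = 1, t4 = 1)

# Without a guard, cbind() fails because 0 rows cannot be recycled to 1 row
msg <- tryCatch(cbind(condition, a), error = function(e) conditionMessage(e))
print(msg)

# With the guard, the empty case is simply skipped
b <- NULL
if (nrow(condition) > 0) {
  b <- cbind(condition, a)
}
```

Checking nrow(condition) > 0 is equivalent to the sum(id) != 0 test above, since condition only holds the rows where id is TRUE.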
You could benefit quite a lot from rewriting your code to take this into account. I tried to guess what you wanted to do, and this code does exactly the same :
xx <- apply(outcomes,1,function(p){
id <- apply(regulationtable,1,function(i)
sum(i != p ) == 0)
if(sum(id) !=0)
cbind(as.data.frame(id[id]),t(p))
})
write.table(do.call(rbind,xx),file="file")
It returns a list xx with, for every possible outcome, the genes that have the same regulation pattern. This was tested with:
outcomes <- expand.grid(c(0,1),c(0,1),c(0,1),c(0,1))
regulationtable <- data.frame(
t1=sample(0:1,10,replace=T),
t2=sample(0:1,10,replace=T),
t3=sample(0:1,10,replace=T),
t4=sample(0:1,10,replace=T)
)
rownames(regulationtable) <- paste("Gene",1:10,sep="-")
For example, I have a data frame that has nothing inside, but I need it to run through the full code because the code usually expects there to be data. I tried this, but it did not work:
ifelse(dim(df_empty)[1]==0,rbind(Shots1B_empty,NA))
Maybe something like this:
df_empty <- data.frame(x=integer(0), y = numeric(0), a = character(0))
if(nrow(df_empty) == 0){
df_empty <- rbind(df_empty, data.frame(x=NA, y=NA, a=NA))
}
df_empty
# x y a
#1 NA NA NA
Simple question, OP, but actually pretty interesting. All the elements of your code should work, but the issue is that when you run as is, it will return a list, not a data frame. Let me show you with an example:
growing_df <- data.frame(
A=rep(1, 3),
B=1:3,
c=LETTERS[4:6])
df_empty <- data.frame()
If we evaluate as you have written you get:
df <- ifelse(dim(df_empty)[1]==0, rbind(growing_df, NA))
with df resulting in a List:
> class(df)
[1] "list"
> df
[[1]]
[1] 1 1 1 NA
The code "worked", but the resulting class of df is wrong. It's odd because this works:
> rbind(growing_df, NA)
A B c
1 1 1 D
2 1 2 E
3 1 3 F
4 NA NA <NA>
The answer is to use if and else, rather than ifelse(), just as @akrun noted in their answer. The reason is found if you dig into the documentation of ifelse():
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
Since the test, dim(df_empty)[1] == 0 (or nrow(df_empty) == 0), is a vector of length 1, ifelse() returns a result of length 1. The yes argument here is a data frame, which is internally a list of columns, so only its first element (the first column) is kept and the result is stored as a list. That's why if {} works here, but ifelse() does not. rbind() normally returns a data frame, but the shape of what ifelse() returns is decided by the test element, not by the yes or no element. Compare that to an if {} statement, whose result is whatever the expression inside {} evaluates to.
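To make the difference concrete, here is a small side-by-side sketch using the same growing_df and df_empty as above (the names via_ifelse and via_if are only illustrative):

```r
growing_df <- data.frame(A = rep(1, 3), B = 1:3, c = LETTERS[4:6])
df_empty <- data.frame()

# ifelse(): the result takes the shape of the length-1 test
via_ifelse <- ifelse(nrow(df_empty) == 0, rbind(growing_df, NA), growing_df)

# if/else: the result is whatever the chosen branch evaluates to
via_if <- if (nrow(df_empty) == 0) rbind(growing_df, NA) else growing_df

class(via_ifelse)   # a plain list holding only the first column
class(via_if)       # a data frame with the extra NA row
```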
We may need if/else instead of ifelse. ifelse requires all arguments to be of the same length, which obviously will not be the case when we rbind:
if(nrow(df_empty) == 0) Shots1B_empty <- rbind(Shots1B_empty, NA)  # assigning inside the if avoids overwriting with NULL when the test is FALSE
For Loop in R including paste and ifelse
Dear Community,
I'm trying to combine a paste command in combination with an ifelse command in a for loop with R Studio - but I'm receiving the error message Error: Unexpected '}' in "}".
Download simplified Dataset:
I have created a simple dataset to illustrate my problem. You can download via the following link:
https://www.dropbox.com/scl/fi/jxvcfvxnv7pf8e74tulzq/Data.xlsx?dl=0&rlkey=de7b3e2d0ge8ju6tsvz2bkzuv
Background:
I would like to evaluate empirical data from a test. Therefore I need to create a new column (called Point.[i]) for every item (i = 1-4). If the value in the "Answer.[i]" column matches the value in the "Right.Answer.[i]" column, I would like to give 1 point (otherwise 0 points) for that task.
My Code
This is easy for one column:
data1 <- mutate(data, Point.1 = ifelse(Answer.1 == Right.Answer.1, 1, 0))
Now I would like to write a for loop doing this for all columns, but the following code is not working:
for (i in 1:4) {
data[i] <- mutate(data, paste("Answer.",i) = ifelse(paste("Answer.",i) == paste("Answer.",i), 1, 0))
print(data)
}
I would be very grateful for any advice. Thanks in advance!
Karla
Try with this:
library(dplyr)
#Code
for (i in 1:3) {
var <- sym(paste0("Answer.",i))
var2 <- sym(paste0("Right.Answer.",i))
data <- data %>% mutate(!!var := ifelse(!!var == !!var2, 1, 0))
print(data)
}
Output:
data
Person Answer.1 Right.Answer.1 Answer.2 Right.Answer.2 Answer.3 Right.Answer.3
1 1 1 A 1 B 0 B
2 2 0 A 0 B 1 B
3 3 1 A 0 B 1 B
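Note that the loop above overwrites each Answer.[i] column with the score. If separate Point.[i] columns are wanted, as described in the question, a base R sketch could look like the following. The data frame here is invented for illustration (the original dataset is only available via the link), and paste0() is used because paste() would insert a space, giving "Answer. 1":

```r
# Made-up data with two items; not the dataset from the question
data <- data.frame(
  Person = 1:3,
  Answer.1 = c("A", "B", "A"), Right.Answer.1 = "A",
  Answer.2 = c("B", "A", "C"), Right.Answer.2 = "B",
  stringsAsFactors = FALSE
)

for (i in 1:2) {
  answer <- paste0("Answer.", i)        # column names built as strings
  right  <- paste0("Right.Answer.", i)
  # [[ ]] accepts a column name held in a variable
  data[[paste0("Point.", i)]] <- as.integer(data[[answer]] == data[[right]])
}

print(data)
```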
I'm working with multiple big data frames in R and I'm trying to write functions that can modify each of them (given a set of common parameters). One function is giving me trouble (shown below).
RawData <- function(x)
{
for(i in 1:nrow(x))
{
if(grep(".DERIVED", x[i,]) >= 1)
{
x <- x[-i,]
}
}
for(i in 1:ncol(x))
{
if(is.numeric(x[,i]) != TRUE)
{
x <- x[,-i]
}
}
return(x)
}
The objective of this function is twofold: first, to remove any rows that contain a ".DERIVED" string in any one of their cells (using grep), and second, to remove any columns that are non-numeric (using is.numeric). I get an error on the following condition:
if(grep(".DERIVED", x[i,]) >= 1)
The error states the "argument is of zero length", which I believe is usually associated with NULL values in a vector. However, I've used is.null on the entire data frame that is giving me errors, and it confirmed that there are no null values in the DF. I'm sure I'm missing something relatively simple here. Any advice would be greatly appreciated.
If you can use non-base-R functions, this should address your issue. df is the data.frame in question here. It will also be faster than looping over rows (generally not advised if avoidable).
library(dplyr)
library(stringr)
df %>%
filter_all(all_vars(!str_detect(., '\\.DERIVED'))) %>%  # the predicate must be wrapped in all_vars() or any_vars()
select_if(is.numeric)
You can make it a function just as you would anything else:
mattsFunction <- function(dat){
dat %>%
filter_all(all_vars(!str_detect(., '\\.DERIVED'))) %>%  # the predicate must be wrapped in all_vars() or any_vars()
select_if(is.numeric)
}
You should probably give it a better name, though.
The error is from the line
if(grep(".DERIVED", x[i,]) >= 1)
When grep doesn't find the term ".DERIVED", it returns integer(0), a vector of zero length, so your inequality doesn't return TRUE or FALSE but logical(0). The error is telling you that the if statement cannot evaluate a condition of length zero.
A simple example:
if(grep(".DERIVED", "1234.DERIVEDabcdefg") >= 1) {print("it works")} # Works nicely, since the inequality can be evaluated
if(grep(".DERIVED", "1234abcdefg") >= 1) {print("no dice")} # Error: argument is of length zero
You can replace that line with if(length(grep(".DERIVED", x[i,])) != 0)
There's something else you haven't noticed yet: you're removing rows/columns inside a loop. Say you remove the 5th column; the next loop iteration (when i = 6) will then be handling what was originally the 7th column! (This will end in an error along the lines of Error in `[.data.frame`(x, , i) : undefined columns selected.)
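One way to sidestep both problems is to compute the rows and columns to keep up front and subset once. This is only a sketch on invented data; the name raw_data and the sample columns are assumptions, not the asker's real objects:

```r
# Made-up data for illustration only
x <- data.frame(
  id = c("s1", "s2.DERIVED", "s3", "s4"),
  value = c(1.5, 2.5, 3.5, 4.5),
  label = c("ok", "ok", "x.DERIVED", "ok"),
  count = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

raw_data <- function(x) {
  # TRUE for every row in which at least one cell contains ".DERIVED"
  has_derived <- apply(x, 1, function(row) any(grepl("\\.DERIVED", row)))
  x <- x[!has_derived, , drop = FALSE]
  # Keep numeric columns only; drop = FALSE keeps a data frame even for one column
  x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
}

cleaned <- raw_data(x)
print(cleaned)
```

Because nothing is removed while iterating, the indices never shift, and grepl() always returns TRUE or FALSE, so there is no zero-length condition to trip over.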
I prefer using dplyr, but if you need to use base R functions there are ways to do this without if statements.
Notice that you should consider using the regex version of "\\.DERIVED" and not ".DERIVED" which would mean "any character followed by DERIVED".
I don't have example data or output, so here's my best go...
# Made up data
test <- data.frame(a = c("data","data.DERIVED","data","data","data.DERIVED"),
b = (c(1,2,3,4,5)),
c = c("A","B","C","D","E"),
d = c(2,5,6,8,9),
stringsAsFactors = FALSE)
# Note: The following code assumes that the column class is numeric because the
# example code provided assumed that the column class was numeric. It will not
# detect a character column whose values all happen to be numbers.
# Using the base subset command
test2 <- subset(test,
subset = !grepl("\\.DERIVED",test$a),
select = sapply(test,is.numeric))
# > test2
# b d
# 1 1 2
# 3 3 6
# 4 4 8
# Trying to use []. Note: If only 1 column is numeric this will return a vector
# instead of a data.frame
test2 <- test[!grepl("\\.DERIVED",test$a),]
test2 <- test2[,sapply(test,is.numeric)]
# > test2
# b d
# 1 1 2
# 3 3 6
# 4 4 8
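Regarding the note about a single numeric column: passing drop = FALSE to [ keeps the result a data frame. A short sketch on a cut-down, made-up version of the test data:

```r
# Cut-down, made-up data with only one numeric column
test <- data.frame(a = c("data", "data.DERIVED", "data"),
                   b = c(1, 2, 3),
                   stringsAsFactors = FALSE)

rows <- !grepl("\\.DERIVED", test$a)
cols <- vapply(test, is.numeric, logical(1))

as_vector <- test[rows, cols]                # collapses to a plain numeric vector
as_frame  <- test[rows, cols, drop = FALSE]  # stays a one-column data frame
```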
I have two questions.
for (k in 1:iterations) {
corr <- cor(df2_prod[,k], df2_qa[,k])
ifelse(is.numeric(corr), next,
ifelse((all(df2_prod[,k] == df2_qa[,k])) ), (corr <- 1), (corr <- 0))
correlation[k,] <- rbind(names(df2_prod[k]), corr)
}
This is my requirement: I want to calculate the correlation for variables in a loop using the code corr <- cor(df2_prod[,k], df2_qa[,k]). If I receive a numeric correlation value, I have to keep the value as it is.
Sometimes, if two columns have the same values, I receive "NA" as output for the vector "corr".
x y
1 1
1 1
1 1
1 1
1 1
corr
[,1]
[1,] NA
I am trying to handle this so that if "NA" is received, I replace the value with "1" or "0".
My questions are:
When I check the class of the "corr" vector, I get "matrix". I want to check whether it is a number or not. Is there any way other than checking is.numeric(corr)?
> class(corr)
[1] "matrix"
I want to check whether two columns have the same values or not, with something like the code below. If it returns TRUE, I want to proceed. But the way I have put the code in the loop is wrong. Could you please help me with how this can be improved:
all(df2_prod[,k] == df2_qa[,k])
Is there any effective way to do this?
I sincerely apologize to the readers for the poorly framed question / logic. If you can give me pointers that would improve the code, I would be really thankful.
1. You basically want to avoid NAs, right? So you could check the result with is.na().
a <- rep(1, 5)
b <- rep(1, 5)
if(is.na(cor(a, b))) cor.value <- 1
2. You could count how many times an element of a equals the corresponding element of b with sum(a==b), and check whether that count equals the number of elements in a (or b), i.e. length(a).
if(sum(a==b) == length(a)) cor.value <- 1
An example to explain how the cor function works:
set.seed(123)
df1 <- data.frame(v1=1:10, v2=rnorm(10), v3=rnorm(10), v4=rnorm(10))
df2 <- data.frame(w1=rnorm(10), w2=1:10, w3=rnorm(10))
Here, the first variable of df1 is equal to the second variable of df2. Function cor directly applied on the first 3 variables of each data.frame gives:
cor(df1[, 1:3], df2[, 1:3])
# w1 w2 w3
#v1 -0.4603659 1.0000000 0.1078796
#v2 0.6730196 -0.2602059 -0.3486367
#v3 0.2713188 -0.3749826 -0.2520174
As you can notice, the correlation coefficient between w2 and v1 is 1, not NA.
So, in your case, cor(df2_prod[, 1:k], df2_qa[, 1:k]) should provide you the desired output.
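One detail worth adding: cor() returns NA (with a warning) when one of the inputs has zero standard deviation, as in the constant x and y columns shown in the question, not merely because two columns are equal. A hedged sketch of the fallback the question asks for, assuming x and y are plain numeric vectors (safe_cor is a hypothetical helper name):

```r
safe_cor <- function(x, y) {
  # cor() warns and returns NA when either input has zero standard deviation
  r <- suppressWarnings(as.numeric(cor(x, y)))
  if (!is.na(r)) {
    return(r)
  }
  # Fallback requested in the question: 1 for identical columns, 0 otherwise
  if (isTRUE(all(x == y))) 1 else 0
}

safe_cor(rep(1, 5), rep(1, 5))     # constant and identical
safe_cor(rep(1, 5), 1:5)           # constant but different
safe_cor(c(1, 2, 3), c(2, 4, 7))   # ordinary case: the plain cor() value
```

Inside the loop, something like safe_cor(df2_prod[[k]], df2_qa[[k]]) could then replace the nested ifelse() calls, again assuming both columns are numeric.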
A New Year's quandary for the Stack Overflow community, which has been quite a help through its posts and answers in the past (this is my first question). I've found a workaround, but I'm wondering if other approaches/solutions might be suggested.
I am attempting to remove trailing NA's from a large data.frame, but those NA's are only found in a few of the columns of the data.frame and I would like to retain all columns in the output. Here is a representative data subset.
df=data.frame(var1=rep("A", 8), var2=c("a","b","c","d","e","f","g","h"), var3=c(0,1,NA,2,3,NA,NA,NA), var4=c(0,0,NA,4,5,NA,NA,NA), var5=c(0,0,NA,0,2,4,NA,NA))
Goals of the process:
Trim trailing NAs based on NA presence in var3,var4 and var5
Retain all columns in final output
Only remove trailing NAs (i.e. row 3 remains in record as a placeholder)
Only trim if all columns have an NA (i.e. row 7 and 8, but not row 6)
Based on these goals, the solution should remove the last two rows of df:
df.output = df[-c(7,8),]
The behaviour of na.trim (in the zoo package) is ideal (as it limits removal to those NA's at the end of the data.frame, with sides="right"), and my work-around involved altering the na.trim.default function to include a subset term.
Any suggestions? Many thanks for any help.
EDIT: Just to complete this question, below is the function I created from the na.trim.default code which also works, but as noted, does require loading the zoo package.
na.trim.multiplecols <- function (object, colrange, sides = c("both", "left", "right"), is.na = c("any","all"),...)
{
is.na <- match.arg(is.na)
nisna <- if (is.na == "any" || length(dim(object[,colrange])) < 1) {
complete.cases(object[,colrange])
}
else rowSums(!is.na(object[,colrange])) > 0
idx <- switch(match.arg(sides), left = cumsum(nisna) > 0,
right = rev(cumsum(rev(nisna) > 0) > 0), both = (cumsum(nisna) >
0) & rev(cumsum(rev(nisna)) > 0))
if (length(dim(object)) < 2)
object[idx]
else object[idx, , drop = FALSE]
}
Something based on max(which(!is.na())) will work. We use this to find the largest index of non-missing data from the columns of interest.
Using your df
ind <- max(max(which(!is.na(df$var3))),
max(which(!is.na(df$var4))),
max(which(!is.na(df$var5))))
df[1:ind, ]
var1 var2 var3 var4 var5
1 A a 0 0 0
2 A b 1 0 0
3 A c NA NA NA
4 A d 2 4 0
5 A e 3 5 2
6 A f NA NA 4
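The same idea can be wrapped into a small base R helper that works for any set of columns. This is a sketch, with trim_trailing_na as a hypothetical name; it keeps everything up to the last row that has at least one non-missing value in the chosen columns:

```r
df <- data.frame(var1 = rep("A", 8),
                 var2 = c("a", "b", "c", "d", "e", "f", "g", "h"),
                 var3 = c(0, 1, NA, 2, 3, NA, NA, NA),
                 var4 = c(0, 0, NA, 4, 5, NA, NA, NA),
                 var5 = c(0, 0, NA, 0, 2, 4, NA, NA))

trim_trailing_na <- function(data, cols) {
  # TRUE for rows with at least one non-missing value in the chosen columns
  has_value <- rowSums(!is.na(data[, cols, drop = FALSE])) > 0
  # If every row is all-NA, return the empty frame rather than fail on max()
  if (!any(has_value)) {
    return(data[0, , drop = FALSE])
  }
  data[seq_len(max(which(has_value))), , drop = FALSE]
}

trimmed <- trim_trailing_na(df, c("var3", "var4", "var5"))
print(trimmed)
```

Interior all-NA rows such as row 3 are kept, because only rows after the last non-missing row are dropped.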
Edit: First solution using base rle and apply
t <- rle(apply(as.matrix(df[,3:5]), 1, function(x) all(is.na(x))))
r <- ifelse(t$values[length(t$values)] == TRUE, t$lengths[length(t$lengths)], 0)
df[seq_len(nrow(df) - r), ]  # head(df, -r) would return zero rows when r is 0
Second solution using Rle from package IRanges:
require(IRanges)
t <- min(sapply(df[,3:5], function(x) {
o <- Rle(x)
val <- runValue(o)
if (is.na(val[length(val)])) {
len <- runLength(o)
out <- len[length(len)]
} else {
out <- 0
}
}))
df[seq_len(nrow(df) - t), ]  # head(df, -t) would return zero rows when t is 0