Assign a value in an R dataframe without checking if the index is empty

df = data.frame(A=c(1,1),B=c(2,2))
df$C = NA
df[is.na(df$B),]$C=5
Each time I want to assign a new value and the index turns out to be empty, as is.na(df$B) is here, R raises the error "replacement has 1 row, data has 0".
Is there a way to make R simply assign nothing in this case instead of raising an error?

We can do this in a single line instead of assigning 'C' to NA and then subsetting the data.frame. The code below assigns 5 to 'C' wherever 'B' is NA; otherwise 'C' stays NA.
df$C[is.na(df$B)] <- 5
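
To confirm the behaviour the question asks about, here is a quick check with the same toy data as above: when the logical index matches nothing, the single-line assignment is simply a no-op rather than an error.

df <- data.frame(A = c(1, 1), B = c(2, 2))
df$C <- NA
df$C[is.na(df$B)] <- 5   # zero-length index: nothing is replaced, no error is raised
df
#   A B  C
# 1 1 2 NA
# 2 1 2 NA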

Related

Combine table and matrix with R

I am performing an analysis in R. I want to fill the first row of a matrix with the content of a table. The problem I have is that the content of the table varies depending on the data, so sometimes certain identifiers that appear in the matrix do not appear in the table.
> random.evaluate
     DNA LINE LTR/ERV1 LTR/ERVK LTR/ERVL LTR/ERVL-MaLR other SINE
[1,]  NA   NA       NA       NA       NA            NA    NA   NA
> y
DNA LINE LTR/ERVK LTR/ERVL LTR/ERVL-MaLR SINE
  1    1        1        1             1    4
Due to this, when I try to join the data of the matrix with the data of the table, I get the following error
random.evaluate[1,] <- y
Error in random.evaluate[1, ] <- y :
number of items to replace is not a multiple of replacement length
Could someone help me fix this bug? I have found solutions to this error but in my case they do not work for me.
First check which of the table's names exist among the matrix's column names.
For the names that do match, just set the values as usual.
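A minimal sketch of that idea, assuming 'random.evaluate' and 'y' are exactly as printed in the question (treating names(y) as returning the table's identifiers is an assumption about its structure):

# assign only into the matrix columns whose names also appear in the table
common <- intersect(colnames(random.evaluate), names(y))
random.evaluate[1, common] <- y[common]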

Trying to find movies without directors in a dataset in R

This is the code I'm trying to run to find rows where director is NA:
nodir <- subset(x, director=="NA",
                select = c(titles))
Your problem is director=="NA". This logical comparison is defined to return NA. Because NA codes a missing value, NA == NA can be neither TRUE nor FALSE. You want is.na(director).
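A small self-contained illustration of that point; the data frame 'x' and its values here are made up, only the column names 'director' and 'titles' come from the question:

x <- data.frame(titles = c("Film A", "Film B", "Film C"),
                director = c("Jones", NA, "Smith"))

# director == "NA" matches the *string* "NA" (and yields NA for missing entries),
# so it finds nothing here; is.na() is the correct test for missing values
nodir <- subset(x, is.na(director), select = c(titles))
nodir
#   titles
# 2 Film B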

Excluding a number of answers from an R dataframe

I'm looking for a way to exclude a number of answers from a length function.
This is a follow-on question from Getting R Frequency counts for all possible answers. In SQL the syntax could be:
select * from someTable
where variableName not in ( 0, null )
Given
Id <- c(1,2,3,4,5)
ClassA <- c(1,NA,3,1,1)
ClassB <- c(2,1,1,3,3)
R <- c(5,5,7,NA,9)
S <- c(3,7,NA,9,5)
df <- data.frame(Id,ClassA,ClassB,R,S)
ZeroTenNAScale <- c(0:10,NA);
R.freq = setNames(nm=c('R','freq'),data.frame(table(factor(df$R,levels=ZeroTenNAScale,exclude=NULL))));
S.freq = setNames(nm=c('S','freq'),data.frame(table(factor(df$S,levels=ZeroTenNAScale,exclude=NULL))));
length(S.freq$freq[S.freq$freq!=0])
# 5
How would I change
length(S.freq$freq[S.freq$freq!=0])
to get an answer of 4 by excluding 0 and NA?
We can use colSums:
colSums(!is.na(S.freq)[S.freq$freq!=0,])[[1]]
#[1] 4
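Unpacked step by step, the one-liner above does the following (same objects as in the question):

keep <- S.freq$freq != 0           # rows with a nonzero count
m    <- !is.na(S.freq[keep, ])     # logical matrix: TRUE where an entry is not NA
colSums(m)                         # per-column totals of non-NA entries
colSums(m)[[1]]                    # first column ('S') -> 4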
You can use sum to add up the counts. If the NAs were in the column being summed you could simply pass na.rm = TRUE, but because the NA sits in a different column (S) you first need to remove the row containing it.
Our solution is as follows: we remove the rows containing NA by subsetting with S.freq[!is.na(S.freq$S), ], and we also pick out the second column, freq:
sum(S.freq[!is.na(S.freq$S), "freq"])
# 4
You can try na.omit (to remove NAs) and subset (to get rid of all rows where freq equals 0):
subset(na.omit(S.freq), freq != 0)
     S freq
4    3    1
6    5    1
8    7    1
10   9    1
From here, that's straightforward:
length(subset(na.omit(S.freq), freq != 0)$freq)
[1] 4
Does it solve your problem?
Just add !is.na(S.freq$S) as a second filter:
length(S.freq$freq[S.freq$freq!=0 & !is.na(S.freq$S)])
If you want to extend it with other conditions, you could make an index vector first for readability:
idx <- S.freq$freq!=0 & !is.na(S.freq$S)
length(S.freq$freq[idx])
You're looking for values with frequency > 0; in other words, you're looking for the unique values. You can get this information directly from the vector S:
length(unique(df$S))
and leaving NA aside you get answer 4 by:
length(unique(df$S[!is.na(df$S)]))
Regarding your question on how to exclude a number of items based on their value:
In R this is easily done with logical vectors, as you already used in your code:
length(S.freq$freq[S.freq$freq!=0])
You can combine different conditions into one logical vector and use it for subsetting, e.g.
length(S.freq$freq[S.freq$freq!=0 & !is.na(S.freq$freq)])

Conditional searching which omits NA values

I'm doing a conditional search of part of a dataset that has multiple NA values within each row.
Something like this (a preview)..
    time1 time2 time3 time4 slice1 slice2 slice3 slice4
pt1     1     3    NA    NA     NA      1      3      5
pt2    NA     1     3     5      5      2      2      4
I want to do some conditional searching which applies a condition (comparing whether one column within a row is larger than another) for each row. I want to find all the rows (pt's) where a variable column (e.g. time1) is smaller than the corresponding column (e.g. slice 1).
all.smaller<-subset(patientdata, time1>slice1 & time2>slice2 & time3>slice3 & time4>slice4, na.rm=TRUE, select=c(1))
When I use this code (on a larger table of the same format), it only returns the rows without any NAs, i.e. the rows where every value is filled in. This makes sense given the use of '&'.
My question is: is there a way to run my conditional search so that it ignores the NAs, i.e. returns the rows where, for every pair of columns in which values are provided, time1 > slice1, time2 > slice2, etc.?
Any help is appreciated. Thanks.
You can make a function that takes a boolean (possibly NA) and maps it to TRUE if it is NA and its value otherwise.
na.true <- function(x) ifelse(is.na(x), TRUE, x)
You can then replace your subset with
na.true(time1 > slice1) & na.true(time2 > slice2) & na.true(time3 > slice3) & na.true(time4 > slice4)
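Putting it together, a sketch of the full call, assuming 'patientdata' looks like the preview above (mirroring the select = c(1) from the original attempt):

na.true <- function(x) ifelse(is.na(x), TRUE, x)

all.smaller <- subset(patientdata,
                      na.true(time1 > slice1) & na.true(time2 > slice2) &
                      na.true(time3 > slice3) & na.true(time4 > slice4),
                      select = c(1))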
You could try this.
n=1:4
cond <- paste0('((is.na(time',n,')|is.na(slice',n,'))|(time',n,'>slice',n,'))')
conds <- paste(cond, collapse=' & ')
all.smaller <- subset( patientdata, eval(parse(text=conds)) )
Essentially this checks whether either time or slice is NA and forces a TRUE in that case, and if not, checks whether time is greater than slice (individually for each index). It becomes clearer if you print out conds to see what it looks like.
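For reference, printing conds (e.g. with cat(conds)) shows the expression that gets parsed and evaluated; wrapped here for readability:

((is.na(time1)|is.na(slice1))|(time1>slice1)) &
((is.na(time2)|is.na(slice2))|(time2>slice2)) &
((is.na(time3)|is.na(slice3))|(time3>slice3)) &
((is.na(time4)|is.na(slice4))|(time4>slice4))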

dataframe where one column only has na values omitted

I have a data frame "accdata".
dim(accdata)
[1] 6496 188
One of the variables, "VAL", is of interest to me. I must calculate the number of instances where VAL is equal to 24.
I tried a few functions that returned error messages. After some research it seems I need to remove the NA values from VAL first.
I would try something like nonaaccdaa <- na.omit(accdata) except this removes instances of NA in any variable, not just VAL.
I tried nonaval <- na.omit(accdata[accdata$VAL]) but when I then checked the number of rows using nrow the result was null. I had expected a value between 1 and 6,496.
What's up here?
This should do the trick:
sum(accdata$VAL == 24, na.rm=TRUE)
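Why this works: a logical comparison yields TRUE/FALSE/NA, each TRUE counts as 1 in a sum, and na.rm = TRUE drops the NAs. A tiny made-up illustration:

VAL <- c(24, NA, 3, 24, NA)
VAL == 24                      # TRUE NA FALSE TRUE NA
sum(VAL == 24, na.rm = TRUE)   # 2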
