Rank order row values in R while keeping NA values

I'm trying to convert values in a data frame to rank order values by row. So take this:
df = data.frame(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
When I do this:
t(apply(df, 1, rank))
I get this:
[1,] 1 3 2
[2,] 2 1 3
[3,] 3 2 1
But I want the NA values to continue showing as NA, like so:
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1

Try setting the argument na.last to "keep":
t(apply(df, 1, rank, na.last='keep'))
Output:
A B C
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1
As mentioned in the documentation of rank:
na.last:
for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.
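To make the four settings concrete, here is a small sketch on a single vector (the vector v is made up for illustration):

```r
v <- c(10, NA, 20)

rank(v, na.last = TRUE)    # NA ranked last:  1 3 2
rank(v, na.last = FALSE)   # NA ranked first: 2 1 3
rank(v, na.last = NA)      # NA dropped:      1 2
rank(v, na.last = "keep")  # NA kept as NA:   1 NA 2
```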

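Here is a dplyr approach. Note that across() applies rank() within each column, not within each row; for this particular data the column-wise and row-wise ranks happen to coincide.
Libraries
library(dplyr)
Data
df <- tibble(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
Code
df %>%
  mutate(across(everything(), ~ rank(.x, na.last = "keep")))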
Output
# A tibble: 3 x 3
A B C
<dbl> <dbl> <dbl>
1 1 NA 2
2 2 1 NA
3 NA 2 1
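When the data is less symmetric, ranking within rows and ranking within columns give different answers. Below is a minimal base R sketch of a row-wise ranking that keeps the data frame shape. The data frame df2 and the names col_ranks and row_ranks are hypothetical, chosen only so that the two results differ:

```r
# Hypothetical data, chosen so that row-wise and column-wise ranks differ
df2 <- data.frame(A = c(1, 5, NA), B = c(2, 4, 9), C = c(3, NA, 7))

# Column-wise ranks: each column is ranked on its own
col_ranks <- sapply(df2, rank, na.last = "keep")

# Row-wise ranks: apply() over rows returns one column per input row,
# so the result is transposed back before being written into the copy
row_ranks <- df2
row_ranks[] <- t(apply(df2, 1, rank, na.last = "keep"))

row_ranks
```

Assigning into row_ranks[] keeps the original column names and the data.frame class, which a bare t(apply(...)) call would turn into a matrix.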

Related

How to verify if when a column is NA the other is not?

I have a data frame with two columns. I need to check whether, wherever one column is NA, the other is not. Thanks
Edit:
I would like to know, for each row of the data frame, whether both columns are non-NA.
You can use the following code to find the rows that have no NA values:
df <- data.frame(x = c(1, NA),
                 y = c(2, NA))
which(rowSums(is.na(df)) == 0)
Output:
[1] 1
As you can see, the first row has no NA values, so both of its columns are non-NA.
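For the question as first asked (is exactly one of the two columns NA in a given row?), a per-row logical vector may be more direct than which(). A small sketch on made-up data, where chk, both_present and exactly_one_na are illustrative names:

```r
chk <- data.frame(x = c(1, NA, 3, NA),
                  y = c(2, 5, NA, NA))

# TRUE where neither column is NA
both_present <- complete.cases(chk)

# TRUE where exactly one of the two columns is NA
exactly_one_na <- xor(is.na(chk$x), is.na(chk$y))

data.frame(chk, both_present, exactly_one_na)
```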
Here's some simple code that adds a column with the NA count for each row (no seed is set, so your values will differ; only the first rows of the output are shown):
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)
df$NA_Count <- apply(df, 1, function(x) sum(is.na(x)))
df
x y NA_Count
1 NA 1 1
2 NA NA 2
3 1 NA 1
4 1 NA 1
5 NA NA 2
6 1 NA 1
7 1 1 0
8 1 1 0
9 1 1 0
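The same count can also be computed without apply(), since is.na() on a data frame returns a logical matrix that rowSums() can add up directly. A short sketch, with a seed added purely so the example is repeatable (the seed value and the names na_apply and na_rowsums are arbitrary):

```r
set.seed(1)  # arbitrary seed, only for repeatability
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)

# Row-by-row version, as in the answer above
na_apply <- apply(df, 1, function(r) sum(is.na(r)))

# Vectorised version
na_rowsums <- rowSums(is.na(df))

all(na_apply == na_rowsums)
```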

Delete records containing more than 5 null values?

I would like to know how I can remove from a dataset the records that have more than 5 null values in the columns that define them. The following code allows you to delete records with any NA in any column. However, how can I modify it to do exactly what I ask? Any ideas?
df[complete.cases(df), ]
Here is an example data frame. One of the rows has 6 NA values.
We sum the NA values by row in a new column, filter where the number of NA is less than or equal to 5, then remove the new column.
df <- data.frame(a = c(1,NA,1,1),
b = c(1, NA, NA, 1),
c = c(1, NA, NA, NA),
d = c(1, NA, NA ,NA),
e = c(1, NA, NA, NA),
f = c(1, NA, NA, NA))
a b c d e f
1 1 1 1 1 1 1
2 NA NA NA NA NA NA
3 1 NA NA NA NA NA
4 1 1 NA NA NA NA
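library(dplyr)
df %>%
  mutate(count = rowSums(is.na(df))) %>%  # number of NA values in each row
  filter(count <= 5) %>%                  # keep rows with at most 5 NA values
  select(-count)                          # drop the helper column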
a b c d e f
1 1 1 1 1 1 1
2 1 NA NA NA NA NA
3 1 1 NA NA NA NA
I'm assuming that by null values you mean NA, which marks a missing value in your data. NULL is something different: it is returned by expressions and functions whose value is undefined. First create some reproducible data:
set.seed(42)
vals <- sample.int(1000, 250)
idx <- sample.int(250, 100)
vals[idx] <- NA
example <- as.data.frame(matrix(vals, 25))
Now compute the number of missing values by row and exclude the rows with more than 5 missing values:
na.count <- rowSums(is.na(example))
example[na.count<=5, ]
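If the threshold needs to vary, the same idea can be wrapped in a small function. This is only a sketch; drop_sparse_rows and its max_na argument are hypothetical names, not part of any package:

```r
# Hypothetical helper: keep rows with at most `max_na` missing values
drop_sparse_rows <- function(data, max_na = 5) {
  data[rowSums(is.na(data)) <= max_na, , drop = FALSE]
}

small <- data.frame(a = c(1, NA, 1, 1),
                    b = c(1, NA, NA, 1),
                    c = c(1, NA, NA, NA),
                    d = c(1, NA, NA, NA),
                    e = c(1, NA, NA, NA),
                    f = c(1, NA, NA, NA))

kept <- drop_sparse_rows(small)
rownames(kept)  # "1" "3" "4": row 2, with 6 NA values, is dropped
```

The drop = FALSE argument keeps the result a data frame even when the input has a single column.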

Fill in values between different start and end values

As the title suggests, my issue is the following. I have one variable identifying the beginning of an event and another variable indicating the end time of the same event. I want a variable indicating whether an event took place or not.
dat <-
data.frame(
"t" = c(1:10),
"id1" = c(1, NA, NA, NA, 2, 3, NA, NA, 4, NA),
"id2" = c(NA, 1, NA, NA, NA, 2, NA, 3, 4, NA),
"desiredoutcome" = c(1, 1, 0, 0, 1, 1, 1, 1,1, 0)
)
Here, the variable desiredoutcome takes the value 1 whenever the row lies between the rows where the same value appears in id1 and in id2. Consider e.g. row 6: it is the end of event 2 and the start of event 3, so the dummy should be 1.
Any idea how I can achieve this?
How about this ?
#Position of non-NA index in `id1`
inds1 <- which(!is.na(dat$id1))
#Corresponding position of non-NA index in `id2`
inds2 <- match(dat$id1[inds1], dat$id2)
#Initialise the result column to 0
dat$result <- 0
#create a sequence between inds1 and inds2 and assign value as 1.
dat$result[unique(unlist(Map(seq, inds1, inds2)))] <- 1
dat
# t id1 id2 desiredoutcome result
#1 1 1 NA 1 1
#2 2 NA 1 1 1
#3 3 NA NA 0 0
#4 4 NA NA 0 0
#5 5 2 NA 1 1
#6 6 3 2 1 1
#7 7 NA NA 1 1
#8 8 NA 3 1 1
#9 9 4 4 1 1
#10 10 NA NA 0 0
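The approach above relies on every start in id1 having a matching end in id2; match() returns NA otherwise, and seq() would then fail. Here is a hedged sketch that exposes the intermediate indices and skips unmatched starts. The ok filter and the names starts and ends are additions for illustration, not part of the original answer:

```r
dat <- data.frame(
  t   = 1:10,
  id1 = c(1, NA, NA, NA, 2, 3, NA, NA, 4, NA),
  id2 = c(NA, 1, NA, NA, NA, 2, NA, 3, 4, NA)
)

starts <- which(!is.na(dat$id1))            # 1 5 6 9
ends   <- match(dat$id1[starts], dat$id2)   # 2 6 8 9

# Assumption: starts without a matching end are simply ignored
ok <- !is.na(ends)

dat$result <- 0
dat$result[unique(unlist(Map(seq, starts[ok], ends[ok])))] <- 1
dat$result
```

Note that match() returns the first matching position only, so this sketch also assumes each event id appears at most once in id2.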
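Here is one way to do it, although it only flags rows where a start or an end is recorded; a row that lies strictly between a start and its end (row 7 in the example) gets 0 rather than the desired 1.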
First convert NA to 0,
dat[is.na(dat)] <- 0
Then we use ifelse
ifelse((dat$id1 == 0 & dat$id2 == 0), 0, 1)
In the form of dataframe,
dat = cbind(dat, ifelse((dat$id1 == 0 & dat$id2 == 0), 0, 1))

Conditional filter with if statements

My data consists of columns and rows. Each column has "NA" and different numbers.
For example column1 is:
2
1
1
NA
1
NA
NA
NA
I want to assign a column id to the numbers in each column.
for(j in 1:54){
if(!(col[j] <-"NA")){
col[j] <- i
}
}
Expected result for column1:
1
1
NA
NA
NA
1
NA
NA
1
Expected result for column2:
2
2
NA
NA
NA
2
NA
NA
2
You can use
v <- c(2, 1, NA, NA, 4, 5, NA)
id <- ifelse(!is.na(v), 1, NA)
id
[1]  1  1 NA NA  1  1 NA
This means you don't need the for loop here. When a function can be applied to a whole vector at once, prefer that over an explicit for loop.
Also, please provide your data so that others can actually use it (like in my code above).
EDIT
According to the comments you have multiple columns. You can use the same code for each column:
df <- data.frame(a= c(2, 1, NA, NA, 4, 5, NA), b= c(3, NA, NA, NA, 5, NA, 6))
id <- sapply(1:ncol(df), function(i){
ifelse(!is.na(df[ , i]), i, NA)})
id
     [,1] [,2]
[1,] 1 2
[2,] 1 NA
[3,] NA NA
[4,] NA NA
[5,] 1 2
[6,] 1 NA
[7,] NA 2
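An alternative that avoids looping over columns altogether is to start from a matrix of column numbers and blank out the cells that are NA in the data. A minimal sketch using base R's col(); the intermediate name m is arbitrary:

```r
df <- data.frame(a = c(2, 1, NA, NA, 4, 5, NA),
                 b = c(3, NA, NA, NA, 5, NA, 6))

m <- as.matrix(df)
id <- col(m)               # every cell holds its own column number
id[is.na(m)] <- NA         # blank out the cells that are NA in the data
colnames(id) <- names(df)  # carry the column names over
id
```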

Issue with local variables in r custom function

I've got a dataset
>view(interval)
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 2 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
>dput(interval)
structure(list(V1 = c(NA, 2, 3, 4, NA),
V2 = c(1, 2, NA, 2, 5),
V3 = c(2, 3, 1, 2, 1), ID = 1:5), row.names = c(NA, -5L), class = "data.frame")
I would like to extract the previous non-NA value (or the next one, if the NA is in the first row) for every row, and store it as a local variable in a custom function, because I have to perform other operations on every row based on this value (which should change for every row I'm applying the function to).
I've written this function to print the local variables, but when I apply it the output is not what I want
myFunction<- function(x){
position <- as.data.frame(which(is.na(interval), arr.ind=TRUE))
tempVar <- ifelse(interval$ID == 1, interval[position$row+1,
position$col], interval[position$row-1, position$col])
return(tempVar)
}
I was expecting to get something like this
# [1] 2
# [2] 2
# [3] 4
But I get something pretty messed up instead.
Here's attempt number 1:
dat <- read.table(header=TRUE, text='
V1 V2 V3 ID
NA 1 2 1
2 2 3 2
3 NA 1 3
4 2 2 4
NA 5 1 5')
myfunc1 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# catch first-row NA
ind[,1] <- ifelse(ind[,1] == 1L, 2L, ind[,1] - 1L)
x[ind]
}
myfunc1(dat)
# [1] 2 2 4
The problem with this is when there is a second "stacked" NA:
dat2 <- dat
dat2[2,1] <- NA
dat2
# V1 V2 V3 ID
# 1 NA 1 2 1
# 2 NA 2 3 2
# 3 3 NA 1 3
# 4 4 2 2 4
# 5 NA 5 1 5
myfunc1(dat2)
# [1] NA NA 2 4
One fix/safeguard against this is to use zoo::na.locf, which implements "last observation carried forward". Since the top row is a special case, we do it twice, the second time in reverse. This gives us the nearest non-NA value in the column (above or below, depending on where the NA sits).
library(zoo)
myfunc2 <- function(x) {
ind <- which(is.na(x), arr.ind=TRUE)
# since it appears you want them in row-first sorted order
ind <- ind[order(ind[,1], ind[,2]),]
# this is to guard against stacked NA
x <- apply(x, 2, zoo::na.locf, na.rm = FALSE)
# this special-case is when there are one or more NAs at the top of a column
x <- apply(x, 2, zoo::na.locf, fromLast = TRUE, na.rm = FALSE)
x[ind]
}
myfunc2(dat2)
# [1] 3 3 2 4
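If adding zoo as a dependency is not an option, the same fill can be sketched in base R. The helpers fill_down and fill_both below are hypothetical names written for this example, and they are only meant to mirror the two na.locf passes above:

```r
# Carry the last non-NA value forward; leading NAs stay NA
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))       # position of the most recent non-NA value
  c(NA, v[!is.na(v)])[idx + 1]   # slot 1 is NA, used while idx is still 0
}

# Forward pass, then a reversed pass to fill NAs at the top of a column
fill_both <- function(v) rev(fill_down(rev(fill_down(v))))

dat2 <- data.frame(V1 = c(NA, NA, 3, 4, NA),
                   V2 = c(1, 2, NA, 2, 5),
                   V3 = c(2, 3, 1, 2, 1),
                   ID = 1:5)

ind <- which(is.na(dat2), arr.ind = TRUE)
ind <- ind[order(ind[, 1], ind[, 2]), , drop = FALSE]  # row-first order

filled <- sapply(dat2, fill_both)
filled[ind]
# [1] 3 3 2 4
```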
