Using purrr functions to replace NAs with is.na - r

I'm looking for a way to replace NAs in various list items using the purrr::map() suite of functions in R. It seems like it should be an easy task but I can't get it to work.
The following works:
vec1 <- c(3,6,7,NaN)
vec1[is.na(vec1)] <- 0
But when I try to do this for a list of vectors using map() it doesn't work:
library(purrr)
vec1 <- c(3,6,7,NaN)
vec2 <- c(2,3,4)
vec3 <- c(1,6,NaN,NaN,1)
veclist <- list(a = vec1,
b = vec2,
c = vec3)
veclistnew <- map(veclist, function(vec){vec[is.na(vec)] <- 0})
Thoughts? I would like the output to be a list of the original vectors with the NAs replaced by 0s.

You can do the following:
na_to_y <- function(x, y){
x[is.na(x)] <- y
x # you need to return the vector after replacement
}
map(veclist, na_to_y, 0)
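To see why the original anonymous function fails, here is a minimal sketch in base R (lapply() stands in for map() so it runs without extra packages; the names broken and fixed are made up for illustration). A function returns the value of its last expression, and the value of an assignment is the value that was assigned.

```r
# Same data as in the question.
veclist <- list(a = c(3, 6, 7, NaN), b = c(2, 3, 4), c = c(1, 6, NaN, NaN, 1))

# The last expression is an assignment, so the function returns the
# assigned value (0) rather than the modified vector.
broken <- function(vec) { vec[is.na(vec)] <- 0 }

# Naming the vector as the last expression returns the modified copy.
fixed <- function(vec) { vec[is.na(vec)] <- 0; vec }

broken_result <- lapply(veclist, broken)  # every element is just 0
fixed_result <- lapply(veclist, fixed)    # vectors with NaN replaced by 0
```

The same two functions can be passed to map() unchanged.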

Another option is replace
library(purrr)
veclist %>%
map(~replace(., is.nan(.), 0))
#$a
#[1] 3 6 7 0
#$b
#[1] 2 3 4
#$c
#[1] 1 6 0 0 1
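One caveat with the is.nan() version: it only matches NaN, whereas is.na() matches both NA and NaN. The vectors in the question only contain NaN, so either works there. A small sketch with a made-up vector x shows the difference:

```r
x <- c(1, NA, NaN)

na_flags <- is.na(x)    # TRUE for both NA and NaN
nan_flags <- is.nan(x)  # TRUE only for NaN

only_nan <- replace(x, is.nan(x), 0)  # the NA in position 2 survives
both <- replace(x, is.na(x), 0)       # NA and NaN are both replaced
```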

You could also use coalesce from dplyr:
library(dplyr)
veclistnew <- map(veclist, ~coalesce(., 0))
> veclistnew
$a
[1] 3 6 7 0
$b
[1] 2 3 4
$c
[1] 1 6 0 0 1

Related

Extract the single value from a 1 x 1 data.frame produced with dplyr as a vector?

I am trying to extract the value from a 1 x 1 data.frame produced with dplyr as a vector
Example
Suppose we have
library(dplyr)
df <- iris %>% summarise(ifelse(tally(.) == 150, 1, 0))
df
# n
# 1 1
I expected df[1,1] to return the desired result [1] 1 (i.e. a vector), but, instead it returns a matrix.
> df[1,1]
n
[1,] 1
Notes
Somewhat strangely, when we create a similar data.frame manually, we can retrieve the value as a vector with .[1,1]
> data.frame(n=1) -> b
> b[1,1]
[1] 1
You can get the vector using df[[1,1]]
Output
> df[[1,1]]
[1] 1
Here is a simple example that explains how it works using test data
df1 <- data.frame(a = c(1,2,3), b = c(4,5,6))
Output
> df1['a']
a
1 1
2 2
3 3
> df1[['a']]
[1] 1 2 3
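Putting the three forms side by side on the same test data may help (the variable names here are only for illustration): single brackets keep the data.frame wrapper, double brackets pull out what is stored inside.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

col_df <- df1["a"]     # still a data.frame with one column
col_vec <- df1[["a"]]  # the underlying numeric vector
cell <- df1[[2, "b"]]  # a single element: row 2 of column "b"
```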
In addition to @sachin's answer, two additional methods may also work
df %>% as.numeric
[1] 1
and
df %>% unlist %>% unname
[1] 1

Writing a function in R

I am doing an exercise to practice writing functions.
I'm trying to figure out the general code before writing the function that reproduces the output from the table function. So far, I have the following:
set.seed(111)
vec <- as.integer(runif(10, 5, 20))
x <- sort(unique(vec))
for (i in x) {
c <- length(x[i] == vec[i])
print(c)
}
But this gives me the following output:
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
I don't think I'm subsetting correctly in my loop. I've been watching videos, but I'm not quite sure where I'm going wrong. Would appreciate any insight!
Thanks!
We can sum the logical vector and concatenate the result to count
count <- c()
for(number in x) count <- c(count, sum(vec == number))
count
#[1] 3 1 4 1 5 4 3 2 7
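In the OP's for loop, i takes the values of 'x' rather than positions in 'x' (seq_along(x)), so x[i] and vec[i] are indexed by value. In addition, length() of a single comparison is always 1, which is why every iteration prints 1.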
If we do
for(number in x) count <- c(count, length(vec[vec == number]))
it should work as well
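If you would rather loop over positions, a sketch along these lines avoids growing the vector with c() on every iteration. The data below is made up (it is not the OP's vec), and the result is checked against table() only in the sense that the counts should agree.

```r
vec <- c(5, 7, 5, 9, 7, 5)  # made-up data for illustration
x <- sort(unique(vec))

# Preallocate one slot per unique value, then fill by position.
count <- integer(length(x))
for (i in seq_along(x)) {
  count[i] <- sum(vec == x[i])  # TRUE counts as 1 when summed
}
names(count) <- x  # label each count with its value, like table() does
```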
You can try sapply + setNames to achieve the same result as table, i.e.,
count <- sapply(x, function(k) setNames(sum(k==vec),k))
or
count <- sapply(x, function(k) setNames(length(na.omit(match(vec,k))),k))
such that
> count
1 2 3 4 5 6 7 8 9
3 1 4 1 5 4 3 2 7
Here is a solution without using unique and with one pass through the vector (if only R was fast with for loops!):
count = list()
for (i in vec) {
val = as.character(i)
if (is.null(count[[val]]))
count[[val]] = 1
else
count[[val]] = count[[val]] + 1
}
unlist(count)
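The same one-pass idea can be wrapped in a function that keeps the counts in a named integer vector instead of a list. This is only a sketch: count_one_pass is a made-up name, and it assumes the input has no NA values.

```r
count_one_pass <- function(v) {
  counts <- integer(0)
  for (val in as.character(v)) {
    if (is.na(counts[val])) {
      counts[val] <- 1L                # first time this value is seen
    } else {
      counts[val] <- counts[val] + 1L  # seen before: increment
    }
  }
  counts  # named by value, in order of first appearance
}

res <- count_one_pass(c(5, 7, 5, 9, 7, 5))
```

Unlike table(), the result is ordered by first appearance rather than sorted.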

R finding values in a data frame using | operator vs %in%

I'm trying to find all instances of certain values in a data frame, and replace them with NA. I tried this two different ways that I thought were equivalent, but I get different results. For example:
df <- data.frame(a=c(1,2),b=c(3,4))
df[df == 1 | df == 4] <- NA
gives me the expected result:
df
# a b
# 1 NA 3
# 2 2 NA
whereas
df <- data.frame(a=c(1,2),b=c(3,4))
df[df %in% c(1,4)] <- NA
does nothing:
df
# a b
# 1 1 3
# 2 2 4
This seems to be because if I use the "|" operator, it searches the data frame element by element, whereas if I use %in% it searches the data frame vector by vector (column by column), but I don't understand why.
df <- data.frame(a=c(1,2),b=c(3,4))
df == 1 | df == 4
# a b
# [1,] TRUE FALSE
# [2,] FALSE TRUE
df %in% c(1,4)
# [1] FALSE FALSE
If we look at the code for %in%
function (x, table)
match(x, table, nomatch = 0L) > 0L
So, it is basically doing a match. The output of match would be
match(df, c(1,4), nomatch = 0L) > 0L
#[1] FALSE FALSE
%in% is applied on vectors instead of data.frame. So, we loop through the columns using lapply, then do the %in%
lapply(df, `%in%`, c(1, 4))
If we need a logical matrix, then use sapply
df[sapply(df, `%in%`, c(1, 4))] <- NA
We can check that match works on each column (a vector)
sapply(df, match, table = c(1,4), nomatch = 0L) > 0
# a b
#[1,] TRUE FALSE
#[2,] FALSE TRUE
%in% is only for vectors. In order to perform it on a dataframe you would have to use sapply to apply a function across each of the columns.
df[sapply(df, function(x) x %in% c(1, 4))] <- NA
df
# a b
# 1 NA 3
# 2 2 NA
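Another way to build the element-wise mask, sketched here under the assumption that all columns have the same type: flatten the data frame with unlist(), apply %in% to the resulting vector, and fold the result back into a matrix with the original number of rows. The names targets and mask are just for illustration.

```r
df <- data.frame(a = c(1, 2), b = c(3, 4))
targets <- c(1, 4)

# unlist() walks the columns in order and matrix() fills column by column,
# so the mask lines up with the data frame cell for cell.
mask <- matrix(unlist(df) %in% targets, nrow = nrow(df))

df[mask] <- NA  # a logical matrix can index a data frame for assignment
```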

Sorting a list of unequal-size vectors in r

Suppose I have several vectors - maybe they're stored in a list, but if there's a better data structure that's fine too:
ll <- list(c(1,3,2),
c(1,2),
c(2,1),
c(1,3,1))
And I want to sort them, using the first number, then the second number to resolve ties, then the third number to resolve remaining ties, etc.:
c(1,2)
c(1,3,1)
c(1,3,2)
c(2,1)
Are there any built in functions that will allow me to do this or do I need to roll my own solution?
(For those who know Python, what I'm after is something that mimics the behavior of sort in Python)
ll <- list(c(1,3,2),
c(1,2),
c(2,1),
c(1,3,1))
I'd prefer using NA for missing values and using rbind.data.frame instead of paste:
sortfun <- function(l) {
l1 <- lapply(l, function(x, n) {
length(x) <- n
x
}, n = max(lengths(l)))
l1 <- do.call(rbind.data.frame, l1)
l[do.call(order, l1)] #order's default is na.last = TRUE
}
sortfun(ll)
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 1 3 1
#
#[[3]]
#[1] 1 3 2
#
#[[4]]
#[1] 2 1
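One thing to watch if you want to match Python exactly: in Python a list that is a strict prefix of another sorts first ([1, 3] < [1, 3, 1]), while padding with NA and keeping order's default na.last = TRUE would put the shorter vector last. The example list has no such pair, so the output above is unaffected. A sketch of a variant with na.last = FALSE follows; sort_like_python is a made-up name, and it assumes numeric vectors with no genuine NA values.

```r
sort_like_python <- function(l) {
  n <- max(lengths(l))
  # Pad every vector to length n; `length<-` fills the tail with NA.
  padded <- lapply(l, function(v) { length(v) <- n; v })
  # One sort key per position: all first elements, all second elements, ...
  keys <- lapply(seq_len(n), function(j) {
    vapply(padded, function(v) v[j], numeric(1))
  })
  # na.last = FALSE ranks the padding before any real value.
  l[do.call(order, c(keys, list(na.last = FALSE)))]
}

res <- sort_like_python(list(c(1, 3, 1), c(1, 3), c(2), c(1, 2)))
```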
Here's an approach that uses data.table.
The result is a rectangular data.table with the rows ordered in the form you described. NA values are filled in where the list item was a different length.
library(data.table)
setorderv(data.table(do.call(cbind, transpose(ll))), paste0("V", 1:max(lengths(ll))))[]
# V1 V2 V3
# 1: 1 2 NA
# 2: 1 3 1
# 3: 1 3 2
# 4: 2 1 NA
This is ugly, but you can use the result on your list with something like:
ll[setorderv(
data.table(
do.call(cbind, transpose(ll)))[
, ind := seq_along(ll)][],
paste0("V", seq_len(max(lengths(ll)))))$ind]

How to extract a number into digits using R?

Suppose I have a number: 4321
and I want to extract it into digits: 4, 3, 2, 1
How do I do this?
With strsplit:
x <- as.character(4321)
as.numeric(unlist(strsplit(x, "")))
[1] 4 3 2 1
Use substring to extract character at each index and then convert it back to integer:
x <- 4321
as.integer(substring(x, seq(nchar(x)), seq(nchar(x))))
[1] 4 3 2 1
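Both string-based approaches depend on how the number is converted to text. As far as I know, as.character() can switch to scientific notation for some doubles (100000 becomes "1e+05"), which would break the split. A hedged sketch that sidesteps this with format(); split_digits is a made-up name, and it assumes a single non-negative whole number:

```r
split_digits <- function(x) {
  # format() with scientific = FALSE keeps the plain decimal digits.
  s <- format(x, scientific = FALSE)
  as.integer(strsplit(s, "")[[1]])
}

d1 <- split_digits(4321)
d2 <- split_digits(100000)
```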
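For real fun, here's an absurd method:
digspl <- function(x) {
x <- trunc(x) # just in case x is not a whole number
mj <- trunc(log10(x)) # number of digits minus one
y <- trunc(x / 10^mj) # leading digit
for (j in seq_len(mj)) { # seq_len() skips the loop for single-digit x
y[j + 1] <- trunc((x - y[j] * 10^(mj - j + 1)) / 10^(mj - j))
x <- x - y[j] * 10^(mj - j + 1)
}
return(y)
}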
For fun, here's an alternative:
x <- 4321
read.fwf(textConnection(as.character(x)), rep(1, nchar(x)))
# V1 V2 V3 V4
# 1 4 3 2 1
The only advantage I can think of is the possibility of exploding your input into varying widths, though I guess you can do that with substring too.
An alternative solution, using modulo operator:
get_digit <- function(x, d) {
# digits from the right
# i.e.: first digit is the ones, second is the tens, etc.
(x %% 10^d) %/% (10^(d-1))
}
# for one number
get_all_digit <- function(x) {
get_digit_x <- function(d) get_digit(x,d)
sapply(nchar(x):1, get_digit_x)
}
# for a vector of numbers
digits <- function(x) {
out <- lapply(x, get_all_digit)
names(out) <- x
out
}
Example:
> digits(100:104)
$`100`
[1] 1 0 0
$`101`
[1] 1 0 1
$`102`
[1] 1 0 2
$`103`
[1] 1 0 3
$`104`
[1] 1 0 4
