Apply a function to two vectors the "R" way? - r

There are two vectors x and y. If x contains an NA I want the NA to be replaced by a value from "y" with the corresponding index. Here is some example code that works:
x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
combineVector <- function(x, y)
{
  for (i in 1:length(x)) {
    if (is.na(x[i]) && !is.na(y[i])) {
      x[i] = y[i]
    }
  }
  return(x)
}
combineVector(x,y)
# [1] 1 2 3 9 5
I could have written this in almost any programming language. Is there a more "R" way to perform this task?

x <- c(1,2,3,NA,5)
y <- c(6,7,8,9,10)
x[is.na(x)] <- y[is.na(x)]
See the above. Using is.na() on x returns a logical vector that is TRUE for the NA elements of x. Using it as the selector for both x and y selects only those positions. Using it in an assignment replaces the NA elements of x with the corresponding elements of y.
That will be much faster than looping as the vector gets large.
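As a minimal sketch, the same idea can be wrapped in a small function that computes the NA mask once. The name fill_na_from is made up for illustration and is not part of base R. Note that if y is also NA at a given position, the result stays NA there.

```r
# Hypothetical helper (not a base R function): fill the NA entries of x
# with the values found at the same positions in y.
fill_na_from <- function(x, y) {
  stopifnot(length(x) == length(y))
  na_idx <- is.na(x)      # logical mask, TRUE where x is NA
  x[na_idx] <- y[na_idx]  # vectorized replacement, no explicit loop
  x
}

x <- c(1, 2, 3, NA, 5)
y <- c(6, 7, 8, 9, 10)
fill_na_from(x, y)
# [1] 1 2 3 9 5
```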

Try this code:
x[is.na(x)] <- y[is.na(x)]
By subsetting the x vector with is.na(x), you replace only those elements of x which are NA with the values at the corresponding indices of the y vector.

To generate a new vector taking x and y as input, you can use the ifelse function:
x<-c(1,2,3,NA,NA)
y<-c(6,7,8,9,NA)
ifelse(is.na(x), y, x)
# [1] 1 2 3 9 NA

Related

How can I find and replace a specific sequence of numbers in a vector in R?

I need to replace the sequence "1,0,1" with "1,1,1" whenever it is found in a vector. How can I do this?
x <- c(1,2,3,4,1,0,1)
Edit:
This search needs to be dynamic. If after changing from 1,0,1 to 1,1,1 another 1,0,1 occurs, this must also be replaced.
Considering:
x <- c (1,2,3,4,1,0,1,0,1,2)
I want the algorithm to do:
x <- c (1,2,3,4,1,1,1,0,1,2)
And after:
x <- c (1,2,3,4,1,1,1,1,1,2)
Here is a function that deals dynamically with the length of the sub-vector being sought. Solutions that convert to/from strings are going to be hugely inefficient asymptotically. Solutions that hard-code a sub-vec of length 3 are limited to sub-vecs of length 3. This one handles any sub-vec, as long as the source vector is at least as long as the sub-vec to be found.
#' Find a matching sub-vector
#'
#' Given a vector (`invec`) and a no-larger sub-vector (`subvec`),
#' determine if the latter occurs perfectly.
#' @param invec vector
#' @param subvec vector
#' @return integer positions, length 0 or more
find_subvec <- function(invec, subvec) {
  sublen <- seq_along(subvec) - 1L
  if (length(subvec) > length(invec)) return(integer(0))
  which(
    sapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + sublen]))
  )
}
Use:
find_subvec(c(1,2,3,4,1,0,1), c(1,0,1))
# [1] 5
find_subvec(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7
A literal replacement.
x <- c(1,2,3,4,1,0,1)
y <- c(1,0,1)
z <- c(1,1,1)
ind <- find_subvec(x, y)
for (i in ind) x[i + seq_along(y) - 1] <- z
x
# [1] 1 2 3 4 1 1 1
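The question also asks for the search to be repeated until no match remains. A rough sketch of that, assuming the pattern and its replacement have the same length: replace the first match, search again, and stop when nothing is found. The names replace_until_stable and max_iter are hypothetical; the iteration cap is only a safety guard against a replacement that keeps recreating the pattern. find_subvec is redefined here (with vapply) so the block runs on its own.

```r
# Same search as find_subvec above, repeated so this block is self-contained.
find_subvec <- function(invec, subvec) {
  if (length(subvec) > length(invec)) return(integer(0))
  offsets <- seq_along(subvec) - 1L
  starts <- seq_len(length(invec) - length(subvec) + 1L)
  which(vapply(starts,
               function(i) all(subvec == invec[i + offsets]),
               logical(1)))
}

# Hypothetical helper: keep replacing the first match until none is left.
replace_until_stable <- function(x, pattern, replacement, max_iter = 1000L) {
  stopifnot(length(pattern) == length(replacement))
  for (iter in seq_len(max_iter)) {
    ind <- find_subvec(x, pattern)
    if (length(ind) == 0L) break          # nothing left to replace
    x[ind[1] + seq_along(pattern) - 1L] <- replacement
  }
  x
}

replace_until_stable(c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2), c(1, 0, 1), c(1, 1, 1))
# [1] 1 2 3 4 1 1 1 1 1 2
```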
There could be edge cases, as mentioned by @Onyambu, where the expected results are not clear, but one option could be:
x + (x == 0 & c(NA, head(x, -1)) == 1 & c(tail(x, -1), NA) == 1)
# [1] 1 2 3 4 1 1 1
Here, it is not treating x as a string, but it is assessing whether the lag and lead values are 1 and the value in the middle is 0.
This should work well enough
library(tidyverse)
x <- c(1,2,3,4,1,0,1,0,1)
x %>%
reduce(str_c) %>%
str_replace_all("(?<=1)0(?=1)","1")
#> [1] "123411111"
Created on 2020-06-14 by the reprex package (v0.3.0)

Why are these functions different?

I am not sure why I get different results from these functions.
change_it1 <- function(x) {
  x[x == 5] <- -10
}
change_it2 <- function(x) {
  x[x == 5] <- -10
  x
}
x <- 1:5
x <- change_it1(x)
x
x <- 1:5
x <- change_it2(x)
x
Why don't both functions change x in the same way as this?
x[x==5] <- -10
The assignment operator <- is really a function that has the side effect of changing a variable's value. But as a function, it also invisibly returns the value that was used on the right-hand side of the assignment. We can force the invisible value to be seen with print(). For example:
x <- 1:2
print(names(x) <- c("a","b"))
# [1] "a" "b"
or again with subsetting
print(x[1] <- 10)
# [1] 10
print(x[2] <- 20)
# [1] 20
x
# a b
# 10 20
See in each case the assignment returned the right-hand-side value and not the updated value of x. Functions will return whatever value was returned by the last expression. In the first case, you are returning the value returned by the assignment (which is just the value -10) and in the second case you are explicitly returning the updated x value.
The functions both change x in the same way (at least in the scope of the function), but you are just not returning the updated x value in both cases.
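To make this concrete, here is a small sketch using the two functions from the question. It relies on base R's withVisible() to show that the first function's result is returned invisibly, and it captures both return values for comparison.

```r
change_it1 <- function(x) {
  x[x == 5] <- -10   # last expression is an assignment: returns -10 invisibly
}
change_it2 <- function(x) {
  x[x == 5] <- -10
  x                  # last expression is x: returns the modified vector
}

res1 <- change_it1(1:5)
res2 <- change_it2(1:5)

res1
# [1] -10
res2
# [1]   1   2   3   4 -10

# The result of change_it1 is not auto-printed at the top level:
withVisible(change_it1(1:5))$visible
# [1] FALSE
```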

Vectorized (non-loop) solution returns wrong result (solution with for-loop returns correct result)

I have two solutions that should be identical in theory: one is vectorized and the other uses a for-loop. But the vectorized solution returns the wrong result, and I want to understand why. The logic is simple: replace each NA with the previous non-NA value in the vector.
# vectorized
f1 <- function(x) {
  idx <- which(is.na(x))
  x[idx] <- x[ifelse(idx > 1, idx - 1, 1)]
  x
}
# non-vectorized
f2 <- function(x) {
  for (i in 2:length(x)) {
    if (is.na(x[i]) && !is.na(x[i - 1])) {
      x[i] <- x[i - 1]
    }
  }
  x
}
v <- c(NA,NA,1,2,3,NA,NA,6,7)
f1(v)
# [1] NA NA 1 2 3 3 NA 6 7
f2(v)
# [1] NA NA 1 2 3 3 3 6 7
The two pieces of code are different.
The first one replaces an NA with the previous element of the original vector, so an NA that follows another NA stays NA.
The second one replaces an NA with the previous element if that element is not NA, and the previous element can itself be the result of an earlier NA substitution.
Which one is correct really depends on you. The second behaviour is more difficult to vectorize, but there are some already implemented functions like zoo::na.locf.
Or, if you only want to use base packages, you could have a look at this answer.
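For reference, one possible base-R sketch of the second behaviour (last observation carried forward) without an explicit loop is shown below. The function name locf_base is made up for illustration and this is not the implementation used by zoo::na.locf. The idea: cumsum(!is.na(x)) counts how many non-NA values have been seen so far, and that count indexes into the non-NA values, with a leading NA standing in for "none seen yet".

```r
# Hypothetical helper: carry the last non-NA value forward (base R only).
locf_base <- function(x) {
  seen <- cumsum(!is.na(x))         # 0 while only leading NAs have been seen
  c(NA, x[!is.na(x)])[seen + 1L]    # position 1 holds NA for the leading gap
}

v <- c(NA, NA, 1, 2, 3, NA, NA, 6, 7)
locf_base(v)
# [1] NA NA  1  2  3  3  3  6  7
```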
These two solutions are not equivalent. The first function is rather like:
f2_as_f1 <- function(x) {
  y <- x # a copy of x
  for (i in 2:length(x)) {
    if (is.na(y[i])) {
      x[i] <- y[i - 1]
    }
  }
  x
}
Note the usage of the y vector.

Turning a couple of vectors into a list of vectors

Suppose I have a collection of independent vectors, of the same length. For example,
x <- 1:10
y <- rep(NA, 10)
and I wish to turn them into a list whose length is that common length (10 in the given example), in which each element is a vector whose length is the number of independent vectors that were given. In my example, assuming output is the output object, I'd expect
> str(output)
List of 10
$ : num [1:2] 1 NA
...
> output
[[1]]
[1] 1 NA
...
What's the common method of doing that?
Use mapply and c:
mapply(c, x, y, SIMPLIFY=FALSE)
[[1]]
[1] 1 NA
[[2]]
[1] 2 NA
..<cropped>..
[[10]]
[1] 10 NA
Another option:
split(cbind(x, y), seq(length(x)))
or even:
split(c(x, y), seq(length(x)))
or even (assuming x has no duplicate values as in your example):
split(c(x, y), x)
Here is a solution that allows you to zip arbitrary number of equi-length vectors into a list, based on position of the element
merge_by_pos <- function(...) {
  dotlist <- list(...)
  # iterate over element positions, not over the number of vectors
  lapply(seq_along(dotlist[[1]]), function(i) {
    Reduce('c', lapply(dotlist, '[[', i))
  })
}
x <- 1:10
y <- rep(NA, 10)
z <- 21:30
merge_by_pos(x, y, z)
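As a cross-check, the same position-wise zip can be sketched with base R's Map, which is mapply with SIMPLIFY = FALSE. The result should have one element per position (10 here), each holding one value from every input vector.

```r
x <- 1:10
y <- rep(NA, 10)
z <- 21:30

# Map(c, ...) combines the i-th elements of all inputs into one vector.
zipped <- Map(c, x, y, z)

length(zipped)
# [1] 10
zipped[[1]]
# [1]  1 NA 21
```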

How do I print values in a list that are greater than a certain number along with the row name in R?

I am painfully new to R. I have a list of data, and I wrote a loop to find which values are greater than a certain number:
for (i in listname) {
  if (i > x)
    print(i)
}
I would like for the printed values to also include the row name... how would I go about doing that?
Thanks for your patience.
Strangely, when the item itself is the iterator, the name is lost. If you instead iterate over the number of the item, print works as expected:
for (i in 1:length(listname)) {
  if (listname[i] > x) {
    print(listname[i]) # value with name
  }
}
Once you've learned more about R, you will probably want to do this in a "vectorized" way, instead of using a loop:
idx <- which(listname > x) # row numbers
listname[idx] # values with names
or with logical subsetting
gt_x <- listname > x # TRUE or FALSE
listname[gt_x] # values with names
Example: Try this with
listname <- 1:10
names(listname) <- letters[1:10]
x <- 4
idx <- which(listname > x) # row numbers
listname[idx] # values with names
# e f g h i j
# 5 6 7 8 9 10
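The example above uses a named vector. If the data really is an R list (one value per element), a similar sketch can use base R's Filter, which keeps the names of the elements that pass the test. The data and the names scores and threshold below are made up for illustration.

```r
# Hypothetical data: a named list with one number per element.
scores <- list(a = 1, b = 4, c = 7)
threshold <- 4

# Keep only the elements greater than the threshold; names are preserved.
above <- Filter(function(v) v > threshold, scores)

names(above)
# [1] "c"
above$c
# [1] 7
```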
