Merging two vectors with an 'or' - r

I have 2 vectors, each of which has some NA values.
a <- c(1, 2, NA, 3, 4, NA)
b <- c(NA, 6, 7, 8, 9, NA)
I'd like to combine these two with a result that uses the value from a if it is non-NA, otherwise the value from b.
So the result would look like:
c <- c(1, 2, 7, 3, 4, NA)
How can I do this efficiently in R?

How about:
> c <- ifelse(is.na(a), b, a)
> c
[1] 1 2 7 3 4 NA

Try
a[is.na(a)] <- b[is.na(a)]
a
## [1] 1 2 7 3 4 NA
Or, if you don't want to overwrite a, just do
c <- a
c[is.na(c)] <- b[is.na(c)]
c
## [1] 1 2 7 3 4 NA
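If there are more than two vectors to fall back through, the same indexing trick can be folded over a list. This is only a sketch: coalesce_vec is a made-up helper name, and it assumes all inputs have the same length.

```r
# Hypothetical helper (not from the answers above): element-wise first non-NA
# value across any number of equal-length vectors.
coalesce_vec <- function(...) {
  Reduce(function(x, y) {
    miss <- is.na(x)      # positions still missing so far
    x[miss] <- y[miss]    # fill them from the next vector
    x
  }, list(...))
}

a <- c(1, 2, NA, 3, 4, NA)
b <- c(NA, 6, 7, 8, 9, NA)
result <- coalesce_vec(a, b)
print(result)
# [1]  1  2  7  3  4 NA
```

Positions that are NA in every input stay NA, as in the last element here.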

Related

For each value in a vector get the corresponding next smallest value

For each element in a vector, I want the corresponding next smaller value in the vector, without changing the original order of the elements.
For example, suppose the given vector is:
c(4, 5, 5, 10, 3, 7)
Then the result would be:
c(3, 4, 4, 7, 0, 5)
Note that since 3 does not have any smaller value, I want it to be replaced with 0.
Any help will be much appreciated. Thank you.
We may use
sapply(v1, function(x) sort(v1)[match(x, sort(v1))-1][1])
[1] 3 4 4 7 NA 5
Or use a vectorized option
v2 <- unique(v1)
v3 <- sort(v2)
v4 <- v3[-length(v3)]
i1 <- match(v1, v3) - 1
i1[i1 == 0] <- NA
v4[i1]
[1] 3 4 4 7 NA 5
data
v1 <- c(4, 5, 5, 10, 3, 7)
We can try the code below using outer + max.col (here v is the input vector, i.e. v1 from the data above)
> m <- outer(v, u <- sort(unique(v)), `>`)
> replace(u[max.col(m, ties.method = "last")], rowSums(m) == 0, NA)
[1] 3 4 4 7 NA 5
Using findInterval, with x as the input vector:
sx = sort(x)
i = findInterval(x, sx, left.open = TRUE)
sx[replace(i, i == 0, NA)]
# [1] 3 4 4 7 NA 5
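The answers above all return NA for the smallest element, while the question asked for 0 there. A minimal sketch of the last step, reusing the findInterval approach and assuming the input itself contains no NA values:

```r
x <- c(4, 5, 5, 10, 3, 7)

sx <- sort(x)
i <- findInterval(x, sx, left.open = TRUE)  # count of sorted values strictly below each x
res <- sx[replace(i, i == 0, NA)]           # NA where nothing smaller exists
res[is.na(res)] <- 0                        # the question wants 0 in that case
print(res)
# [1] 3 4 4 7 0 5
```

If the input could contain NA, those positions would also become 0 here, so they would need separate handling.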

Back tracing parents/paths of two-column data of a tree

I have tree data serialized as follows:
Relationship: P to C is "one-to-many", and C to P is "one-to-one". So column P may have duplicate values, but column C has unique values.
P, C
1, 2
1, 3
3, 4
2, 5
4, 6
# in data.frame
df <- data.frame(P=c(1,1,3,2,4), C=c(2,3,4,5,6))
1. How do I efficiently implement a function func so that:
func(df, val) returns a vector containing the full path from the root (1 in this case) to val.
For example:
func(df, 3) returns c(1,3)
func(df, 5) returns c(1,2,5)
func(df, 6) returns c(1,3,4,6)
2. Alternatively, quickly transforming df to a lookup table like this also works for me:
C, Paths
2, c(1,2)
3, c(1,3)
4, c(1,3,4)
5, c(1,2,5)
6, c(1,3,4,6)
Here is a solution using igraph
library(igraph)
g <- graph_from_data_frame(df)
df <- within(df,
  Path <- sapply(match(as.character(C), names(V(g))),
    function(k) toString(names(unlist(all_simple_paths(g, 1, k))))))
such that
> df
P C Path
1 1 2 1, 2
2 1 3 1, 3
3 3 4 1, 3, 4
4 2 5 1, 2, 5
5 4 6 1, 3, 4, 6
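Since every child has exactly one parent, the path can also be built in base R by walking upward from the node. This is a rough sketch rather than a tuned solution: path_to_root is a made-up name, and it assumes the data really is a tree (no cycles).

```r
df <- data.frame(P = c(1, 1, 3, 2, 4), C = c(2, 3, 4, 5, 6))

# Hypothetical helper: follow child -> parent links until a node has no parent.
path_to_root <- function(df, val) {
  path <- val
  repeat {
    parent <- df$P[match(path[1], df$C)]  # NA when path[1] is the root
    if (is.na(parent)) break
    path <- c(parent, path)
  }
  path
}

path_to_root(df, 6)
# [1] 1 3 4 6

# Lookup table keyed by child, as in option 2 of the question
paths <- lapply(setNames(df$C, df$C), function(v) path_to_root(df, v))
```

Each call does one match() per level, so it should be fine for shallow trees. For very deep or very large trees the igraph route may scale better.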

How to select a random vector

I have 4 vectors that contain integers.
I want to perform calculations based on 2 of the vectors, selected randomly.
I tried creating a new vector containing all the vectors, but sample() only gives me the first element of each vector.
My vectors if it helps:
A <- c(4, 4, 4, 4, 0, 0)
B <- c(3, 3, 3, 3, 3, 3)
C <- c(6, 6, 2, 2, 2, 2)
D <- c(5, 5, 5, 1, 1, 1)
The output I wanted is for example: A, B or B, D or D, A etc.
A thousand thanks in advance!
This is easier to do if you store your vectors in a list:
vecs <- list(
A = c(4, 4, 4, 4, 0, 0),
B = c(3, 3, 3, 3, 3, 3),
C = c(6, 6, 2, 2, 2, 2),
D = c(5, 5, 5, 1, 1, 1)
)
idx <- sample(seq_along(vecs), 2, replace = FALSE)
sampled <- vecs[idx]
sampled
$D
[1] 5 5 5 1 1 1
$B
[1] 3 3 3 3 3 3
You can then access your two sampled vectors, regardless of their names, with sampled[[1]] and sampled[[2]].
You first need to make a list or a data frame, which you can then pass to sample(). The size argument sets the number of vectors you want in each sample, which is 2 here.
LIST
> LIST <- list(A, B, C, D)
> sample(LIST, size = 2)
[[1]]
[1] 3 3 3 3 3 3
[[2]]
[1] 4 4 4 4 0 0
Dataframe
> df <- data.frame(A, B, C, D)
> sample(df, size = 2)
B C
1 3 6
2 3 6
3 3 2
4 3 2
5 3 2
6 3 2
I think you were sampling on the wrong object.
Make a list:
LIST = list(A,B,C,D)
names(LIST) = c("A","B","C","D")
This gives you a sample of 2 from the list
sample(LIST,2)
To add them for example, do:
Reduce("+",sample(LIST,2))
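To see why sampling on the combined vector did not work, it may help to compare c() with list(). The sketch below assumes the four vectors from the question. The seed value is arbitrary and is only there to make the draw repeatable.

```r
A <- c(4, 4, 4, 4, 0, 0)
B <- c(3, 3, 3, 3, 3, 3)
C <- c(6, 6, 2, 2, 2, 2)
D <- c(5, 5, 5, 1, 1, 1)

flat <- c(A, B, C, D)                     # c() flattens: one vector of 24 numbers
vecs <- list(A = A, B = B, C = C, D = D)  # list() keeps four separate elements

length(flat)  # 24, so sample(flat, 2) draws two single numbers
length(vecs)  # 4, so sample(vecs, 2) draws two whole vectors

set.seed(1)                       # arbitrary seed, assumed only for reproducibility
picked <- sample(names(vecs), 2)  # two distinct names, e.g. for labelling results
pair <- vecs[picked]
```

Sampling the names rather than the list itself makes it easy to report which two vectors were chosen.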

Replacing NAs in a data frame with values from a different column

I would like to replace NAs in my data frame with values from another column. For example:
a1 <- c(1, 2, 4, NA, 2, NA)
b1 <- c(3, NA, 4, 4, 4, 3)
c1 <- c(NA, 3, 3, 4, 2, 3)
a2 <- c(2, 3, 5, 5, 3, 4)
b2 <- c(1, 2, 4, 5, 6, 3)
c2 <- c(3, 3, 2, 3, 4, 3)
df <- as.data.frame(cbind(a1, b1, c1, a2, b2, c2))
> df
a1 b1 c1 a2 b2 c2
1 1 3 NA 2 1 3
2 2 NA 3 3 2 3
3 4 4 3 5 4 2
4 NA 4 4 5 5 3
5 2 4 2 3 6 4
6 NA 3 3 4 3 3
I would like to replace the NAs in df$a1 with the values from the corresponding row in df$a2, the NAs in df$b1 with the values from the corresponding row in df$b2, and the NAs in df$c1 with the values from the corresponding row in df$c2, so that the new data frame looks like:
> df
a1 b1 c1
1 1 3 3
2 2 2 3
3 4 4 3
4 5 4 4
5 2 4 2
6 4 3 3
How can I do this? I have a large data frame with many columns, so it would be great to find an efficient way to do this (I've already seen Replace missing values with a value from another column). Thank you!
An extensible option:
df2 <- df[c('a1','b1','c1')]
df2[] <- mapply(function(x,y) ifelse(is.na(x), y, x),
df[c('a1','b1','c1')], df[c('a2','b2','c2')],
SIMPLIFY=FALSE)
df2
# a1 b1 c1
# 1 1 3 3
# 2 2 2 3
# 3 4 4 3
# 4 5 4 4
# 5 2 4 2
# 6 4 3 3
It's easy enough to extend this to arbitrary column pairs: the first column in the first subset (df[c('a1','b1','c1')]) is paired with the first column of the second subset; second column first subset, second column second subset; etc. It can even be generalized with df[grepl('1$',colnames(df))] and df[grepl('2$',colnames(df))], assuming they don't mis-match.
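A sketch of that grepl generalization, assuming the column names end in 1 and 2 and come in the same order. The stopifnot line is an added safety check for the mismatch case, not part of the original answer.

```r
df <- data.frame(
  a1 = c(1, 2, 4, NA, 2, NA), b1 = c(3, NA, 4, 4, 4, 3), c1 = c(NA, 3, 3, 4, 2, 3),
  a2 = c(2, 3, 5, 5, 3, 4),   b2 = c(1, 2, 4, 5, 6, 3),  c2 = c(3, 3, 2, 3, 4, 3)
)

first  <- df[grepl("1$", colnames(df))]  # columns to fill
second <- df[grepl("2$", colnames(df))]  # columns to fill from

# Guard against mis-matched pairs: the stems (a, b, c) must line up.
stopifnot(identical(sub("1$", "", names(first)), sub("2$", "", names(second))))

first[] <- Map(function(x, y) ifelse(is.na(x), y, x), first, second)
first
```

Map is used here in place of mapply(..., SIMPLIFY = FALSE); the two behave the same for this purpose.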
coalesce in dplyr is meant to do exactly this: it replaces NAs in the first vector with the non-NA elements of a later one. For example:
coalesce(df$a1,df$a2)
[1] 1 2 4 5 2 4
It can be used with sapply to do the whole dataset in an efficient and easily extensible manner:
sapply(c("a","b","c"),function(x) coalesce(df[,paste0(x,1)],df[,paste0(x,2)]))
a b c
[1,] 1 3 3
[2,] 2 2 3
[3,] 4 4 3
[4,] 5 4 4
[5,] 2 4 2
[6,] 4 3 3
dfnew <- ifelse(is.na(df$a1), df$a2, df$a1)
as.data.frame(dfnew)
This handles only the a1 column; you'll have to run it for each of the a, b and c columns and cbind the results. If there are many columns, a loop is probably the best option, in my opinion.
You can use hutils::coalesce. It should be slightly faster, especially if it can 'cheat' -- if any columns have no NAs and so don't need to change, coalesce will skip them:
a1 <- c(1, 2, 4, NA, 2, NA)
b1 <- c(3, NA, 4, 4, 4, 3)
c1 <- c(NA, 3, 3, 4, 2, 3)
a2 <- c(2, 3, 5, 5, 3, 4)
b2 <- c(1, 2, 4, 5, 6, 3)
c2 <- c(3, 3, 2, 3, 4, 3)
s <- function(x) {
  sample(x, size = 1e6, replace = TRUE)
}
df <- as.data.frame(cbind(a1 = s(a1), b1 = s(b1), c1 = s(c1),
                          a2 = s(a2), b2 = s(b2), c2 = s(c2)))
library(microbenchmark)
library(hutils)
library(data.table)
dt <- as.data.table(df)
old <- paste0(letters[1:3], "1") # you will need to specify
new <- paste0(letters[1:3], "2")
dplyr_coalesce <- function(df) {
  ans <- df
  for (j in seq_along(old)) {
    o <- old[j]
    n <- new[j]
    ans[[o]] <- dplyr::coalesce(ans[[o]], df[[n]])
  }
  ans
}
hutils_coalesce <- function(df) {
  ans <- df
  for (j in seq_along(old)) {
    o <- old[j]
    n <- new[j]
    ans[[o]] <- hutils::coalesce(ans[[o]], df[[n]])
  }
  ans
}
microbenchmark(dplyr = dplyr_coalesce(df),
               hutils = hutils_coalesce(df))
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> dplyr 45.78123 61.76857 95.10870 69.21561 87.84774 1452.0800 100 b
#> hutils 36.48602 46.76336 63.46643 52.95736 64.53066 252.5608 100 a
Created on 2018-03-29 by the reprex package (v0.2.0).

how to replace values in a dataframe from a second smaller dataframe?

I am relatively new to R and I have a problem with a dataframe.
I have a very long dataframe (df1) with x and y coordinates and a value z. I have a shorter dataframe (df2) with the same columns but fewer rows. I want to replace the z values in df1 wherever its x and y match a row in df2.
x = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
y = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
z = c(8, 5, 3, 1, 2, 6, 8, 5, 3, 2, 8, 4, 4, 6, 2, 1)
df1 = data.frame(x, y, z)
x1=c(1,3,4)
y1=c(2,1,4)
z1=c(58,37,23)
df2=data.frame(x1,y1,z1)
names(df2) <- c("x", "y", "z")
I thought that I might use the ifelse function like this:
df1$znew<-ifelse((df1[,1]== df2[,1])&(df1[,2]==df2[,2]), df2[,3], df1[,3])
But the two objects do not have the same dimensions.
I have also tried using loops that go through each row, compare x and y, and then decide which z to use, but I can't make them work.
At the end I would like to have a dataframe with a new variable of z to compare the values and corroborate that it really changed the values. My final dataframe would look like:
znew = c(8,58,3,1,2,6,8,5,37,2,8,4,4,6,2,23)
I really appreciate any help, and I am sorry if somebody else has posted a similar question. I have spent all day trying to figure this out and I can't find any example that suits my case.
Assuming the two data frames do in fact have the same column names (probably just a typo in your question), you might do this with merge:
tmp <- merge(df1,df2,all.x = TRUE,by = c('x','y'))
tmp$z.x[!is.na(tmp$z.y)] <- tmp$z.y[!is.na(tmp$z.y)]
> tmp
x y z.x z.y
1 1 1 8 NA
2 1 2 58 58
3 1 3 3 NA
4 1 4 1 NA
5 2 1 2 NA
6 2 2 6 NA
7 2 3 8 NA
8 2 4 5 NA
9 3 1 37 37
10 3 2 2 NA
11 3 3 8 NA
12 3 4 4 NA
13 4 1 4 NA
14 4 2 6 NA
15 4 3 2 NA
16 4 4 23 23
Then just remove the extra column and rename the columns.
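One way that cleanup could look, as a sketch built on the question's data. It keeps the original z and writes the replaced values to znew, as the question asked. Note that merge sorts rows by the key columns by default, so the row order can differ from df1 when df1 is not already sorted by x and y.

```r
x <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
y <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
z <- c(8, 5, 3, 1, 2, 6, 8, 5, 3, 2, 8, 4, 4, 6, 2, 1)
df1 <- data.frame(x, y, z)
df2 <- data.frame(x = c(1, 3, 4), y = c(2, 1, 4), z = c(58, 37, 23))

tmp <- merge(df1, df2, all.x = TRUE, by = c("x", "y"))
tmp$znew <- ifelse(is.na(tmp$z.y), tmp$z.x, tmp$z.y)  # df2 value where one exists
tmp$z.y <- NULL                                       # drop the helper column
names(tmp)[names(tmp) == "z.x"] <- "z"                # restore the original name

tmp$znew
# [1]  8 58  3  1  2  6  8  5 37  2  8  4  4  6  2 23
```

Keeping both z and znew side by side makes it easy to check which rows actually changed, for example with tmp[tmp$z != tmp$znew, ].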
