Replace columns that have the same column name in R

I have used the mice package in R to impute some missing values in my data, but not for all variables. Now I would like to replace the columns in the original data with the columns from the imputed data wherever their column names are equal. Here is my function:
replace_imp <- function(data,impdata) {
for(i in 1:length(impdata)){
for(k in 1:length(data)){
if(colnames(impdata)[i]==colnames(data)[k]){
data[,k] <- imp_data[,i]
}
}
}
}
But it does not seem to work, any help?

Starting with a minimal data set:
original <- data.frame(X=c(1, 1, 1), Y=c(2, 2, 2), Z=c(3, 3, 3))
imputed <- data.frame(A=c(2, 2, 2), Y=c(5, 5, 5), Z=c(1, 1, 1))
We should expect the original data frame's 'Y' and 'Z' columns to take the imputed values. Let's create a function that finds all matching column names and, for every match, replaces the original's values with the imputed ones.
replace_imputed <- function(original, imputed){
namestoChange <- intersect(colnames(original), colnames(imputed)) # names present in both data frames
for(i in seq_along(namestoChange)){ # seq_along() is safe when nothing matches
original[namestoChange[i]] <- imputed[namestoChange[i]]
}
return(original)
}
> replace_imputed(original, imputed)
X Y Z
1 1 5 1
2 1 5 1
3 1 5 1
Is this more or less what you were looking for?

original <- data.frame(X=c(1, 1, 1), Y=c(2, 2, 2), Z=c(3, 3, 3))
imputed <- data.frame(A=c(2, 2, 2), Y=c(5, 5, 5), Z=c(1, 1, 1))
original[names(imputed)] <- imputed
X Y Z A
1 5 1 2
1 5 1 2
1 5 1 2
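If only the shared columns should be overwritten, without appending columns such as A that exist only in the imputed data, a minimal sketch could use intersect(). The name replace_shared is made up for illustration. As far as one can tell from the code, the function in the question does nothing visible because it assigns to a local copy of data, refers to imp_data rather than its impdata argument, and never returns the result.

```r
# Hypothetical helper (name chosen for illustration): overwrite only the
# columns whose names occur in both data frames.
replace_shared <- function(original, imputed) {
  shared <- intersect(names(original), names(imputed))
  original[shared] <- imputed[shared]
  original  # the function works on a copy, so the result must be returned
}

original <- data.frame(X = c(1, 1, 1), Y = c(2, 2, 2), Z = c(3, 3, 3))
imputed  <- data.frame(A = c(2, 2, 2), Y = c(5, 5, 5), Z = c(1, 1, 1))
result <- replace_shared(original, imputed)
names(result)
# [1] "X" "Y" "Z"
```

The caller has to assign the returned value, for example original <- replace_shared(original, imputed).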

Related

Permute the position of a subset of a vector

I want to permute a subset of a vector.
For example, say I have a vector (x) and I select a random subset of the vector (e.g., 40% of its values).
What I want to do is output a new vector (x2) that is identical to (x) except the positions of the values within the random subset are randomly swapped.
For example:
x = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
random subset = 1, 4, 5, 8
x2 could be = 4, 2, 3, 8, 1, 6, 7, 5, 9, 10
Here's an example vector (x) and how I'd select the indices of a random subset of 40% of its values. Any help making (x2) would be appreciated!
x <- seq(1,10,1)
which(x%in%sample(x)[seq_len(length(x)*0.40)])
First draw a sample of proportion p from the indices, then shuffle the elements at those indices and assign them back.
f <- \(x, p=0.4) {
r <- sample(seq_along(x), length(x)*p)
x[r] <- sample(x[r])
`attr<-`(x, 'subs', r) ## add attribute w/ indices that were sampled
}
set.seed(42)
f(x)
# [1] 8 2 3 4 1 5 7 10 6 9
# attr(,"subs")
# [1] 1 5 10 8
Data:
x <- 1:10
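When the subset is already known (as with positions 1, 4, 5, 8 in the question), the same idea can be written with the indices passed in. This is only a sketch; permute_at and idx are names assumed here, not part of the answer above. It shuffles the indices with sample.int() rather than calling sample() on the values, because sample() treats a single number n as 1:n.

```r
# Hypothetical variant: shuffle only the values at the given positions.
permute_at <- function(x, idx) {
  # Reorder the positions, then read the values in that new order
  x[idx] <- x[idx[sample.int(length(idx))]]
  x
}

x <- 1:10
idx <- c(1, 4, 5, 8)
x2 <- permute_at(x, idx)

all(x2[-idx] == x[-idx])                # positions outside idx are unchanged
identical(sort(x2[idx]), sort(x[idx]))  # same values, possibly reordered
```

Both checks should return TRUE regardless of the random seed.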
There is surely faster code for this, but one solution would be:
x <- seq(1,10,1)
y <- which(x%in%sample(x)[seq_len(length(x)*0.40)]) # Defined as "y" the vector of the random subset
# required libraries
library(combinat)
permutation <- permn(y) # permn() generates a list of all permutations of the elements of its argument (here y).
# https://www.geeksforgeeks.org/calculate-combinations-and-permutations-in-r/
permutation_sampled <- sample(permutation,1) # Sample one of the permutations.
x[y] <- x[permutation_sampled[[1]]] # y holds indices, so look up the values at the permuted indices and write them back into positions y.

Multivariate cumulative sum

Assume one wished to calculate a cumulative sum based on a multivariate condition, all(Z[i] <= x), for all i over a multivariate grid x. One may obviously implement this naively
cSums <- numeric(nrow(x))
for(i in seq(nrow(x))){
for(j in seq(nrow(Z))){
if(all(Z[j, ] <= x[i, ]))
cSums[i] <- cSums[i] + R[j] # <== R is a single vector to be summed
}
}
which would be somewhere around O(n^2 * p) for n rows and p columns, or slightly faster by iteratively subsetting the columns
cSums <- numeric(nrow(x))
for(i in seq(nrow(x))){
indx <- seq(nrow(Z))
for(j in seq(ncol(Z))){
indx <- indx[which(Z[indx, j] <= x[i, j])]
}
cSums[i] <- sum(R[indx])
}
but this is still, in the worst case, as slow as the naive implementation. How could one improve this to achieve faster performance, while still allowing an undefined number of columns to be compared?
Dummy data and Reproducible example
var1 <- c(3,3,3,5,5,5,4,4,4,6)
var2 <- rep(seq(1,5), each = 2)
Z <- cbind(var1, var2)
x <- Z
R <- rep(1, nrow(x))
# Result using either method.
#[1] 2 2 3 4 6 6 5 5 6 10
outer is your friend, just Vectorize your comparison. colSums yields the desired result then. Should be fast.
f <- Vectorize(function(k, l) all(Z[k, ] <= x[l, ]))
res <- colSums(outer(1:nrow(Z), 1:nrow(x), f))
res
# [1] 2 2 3 4 6 6 5 5 6 10
Data
x <- Z <- structure(c(3, 3, 3, 5, 5, 5, 4, 4, 4, 6, 1, 1, 2, 2, 3, 3, 4,
4, 5, 5), .Dim = c(10L, 2L), .Dimnames = list(NULL, c("var1",
"var2")))
We can use apply row-wise to compare every row with every other row and count how many of them satisfy the criterion.
apply(Z, 1, function(x) sum(rowSums(Z <= as.list(x)) == length(x)))
#[1] 2 2 3 4 6 6 5 5 6 10
A similar approach can also be written using sapply + split
sapply(split(Z, seq_len(nrow(Z))), function(x)
sum(rowSums(Z <= as.list(x)) == length(x)))
data
var1 <- c(3,3,3,5,5,5,4,4,4,6)
var2 <- rep(seq(1,5), each = 2)
Z <- data.frame(var1, var2)
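The answers above count matching rows, which equals the requested sum only because R is all ones in the dummy data. A sketch that keeps the weights R and still works for any number of columns, assuming Z and x are numeric matrices with the same columns (cond_sum is a made-up name):

```r
var1 <- c(3, 3, 3, 5, 5, 5, 4, 4, 4, 6)
var2 <- rep(seq(1, 5), each = 2)
Z <- cbind(var1, var2)
x <- Z
R <- rep(1, nrow(x))

# Hypothetical helper: for each row of x, sum R over the rows of Z that are
# <= that row in every column.
cond_sum <- function(Z, x, R) {
  tZ <- t(Z)  # columns of tZ are rows of Z, so x[i, ] recycles down each column
  vapply(seq_len(nrow(x)), function(i) {
    sum(R[colSums(tZ <= x[i, ]) == ncol(Z)])
  }, numeric(1))
}

cond_sum(Z, x, R)
# [1]  2  2  3  4  6  6  5  5  6 10
```

This still performs every pairwise comparison, so the asymptotic cost is unchanged; the gain, if any, comes from vectorizing the inner work.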

finding values in a range in r and sum the number of values

I have a question. I have the following data:
c(1, 2, 4, 5, 1, 8, 9)
I set l = 2 and u = 6.
I want to find all the values in the range (3,7)
How can I do this?
In base R we can use comparison operators to create a logical vector and use that for subsetting the original vector
x[x > 2 & x <= 6]
#[1] 3 5 6
Or using a for loop, initialize an empty vector, loop through the elements of 'x', if the value is between 2 and 6, then concatenate that value to the empty vector
v1 <- c()
for(i in x) {
if(i > 2 && i <= 6) v1 <- c(v1, i) # && is the scalar "and" meant for if() conditions
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
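Since the title also asks for the number of values, here is a sketch with the question's data and the bounds stored in l and u. It assumes the same convention as the answer above (lower bound exclusive, upper bound inclusive); adjust the operators if the intended range differs.

```r
x <- c(1, 2, 4, 5, 1, 8, 9)
l <- 2
u <- 6

in_range <- x > l & x <= u   # logical vector, TRUE where l < x <= u
x[in_range]                  # values in the range
# [1] 4 5
sum(in_range)                # TRUE counts as 1, so this is the number of matches
# [1] 2
```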

Extract first continuous sequence in vector

I have a vector:
as <- c(1,2,3,4,5,9)
I need to extract the first continuous sequence in the vector, starting at index 1, such that the output is the following:
1 2 3 4 5
Is there a smart function for doing this, or do I have to do something not so elegant like this:
a <- c(1,2,3,4,5,9)
is_continunous <- c()
for (i in 1:length(a)) {
if(a[i+1] - a[i] == 1) {
is_continunous <- c(is_continunous, i)
} else {
break
}
}
continunous_numbers <- c()
if(is_continunous[1] == 1) {
is_continunous <- c(is_continunous, length(is_continunous)+1)
continunous_numbers <- a[is_continunous]
}
It does the trick, but I would expect that there is a function that can already do this.
It isn't clear whether you need the indices of the continuous sequence only if it starts at index one, or the indices of the first such sequence wherever it begins.
In both cases, you need to start by checking the difference between adjacent elements:
d_as <- diff(as)
If you need the first sequence only if it starts at index 1:
if(d_as[1]==1) 1:(rle(d_as)$lengths[1]+1) else NULL
# [1] 1 2 3 4 5
rle gives the lengths and values of each run of consecutive identical values.
If you need the first continuous sequence, whatever the starting index is:
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
Examples (for the second option):
as <- c(1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
#[1] 1 2 3 4 5
as <- c(4,3,1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 3 4 5 6 7
as <- c(1, 2, 3, 6, 7, 8)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 1 2 3
A simple way to catch the sequence would be to find the diff of your vector and grab all elements with diff == 1 plus the very next element, i.e.
d1<- which(diff(as) == 1)
as[c(d1, d1[length(d1)]+1)]
NOTE
This will only work if you have only one sequence in your vector. However, if we want to make it more general, then I'd suggest creating a function like so:
get_seq <- function(vec){
d1 <- which(diff(vec) == 1) # use the function argument, not the global 'as'
if(all(diff(d1) == 1)){
return(c(d1, d1[length(d1)]+1))
}else{
d2 <- split(d1, cumsum(c(1, diff(d1) != 1)))[[1]]
return(c(d2, d2[length(d2)]+1))
}
}
#testing it
as <- c(3, 5, 1, 2, 3, 4, 9, 7, 5, 4, 5, 6, 7, 8)
get_seq(as)
#[1] 3 4 5 6
as <- c(8, 9, 10, 11, 1, 2, 3, 4, 7, 8, 9, 10)
get_seq(as)
#[1] 1 2 3 4
as <- c(1, 2, 3, 4, 5, 6, 11)
get_seq(as)
#[1] 1 2 3 4 5 6
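For the case in the question, where the run has to start at index 1, a shorter sketch is to find the first position at which the step to the next element is not 1. The name first_run is an assumption for illustration, and the sketch assumes a non-empty numeric vector. Unlike get_seq above, it returns the values rather than their indices.

```r
# Hypothetical helper: leading run of values that increase by exactly 1.
first_run <- function(v) {
  # Append Inf so the last element always ends the run
  end <- which(c(diff(v), Inf) != 1)[1]
  v[seq_len(end)]
}

first_run(c(1, 2, 3, 4, 5, 9))
# [1] 1 2 3 4 5
first_run(c(1, 2, 3, 6, 7, 8))
# [1] 1 2 3
```

If the first two elements do not differ by 1, the result is just the first element.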

Variable sample upper value in R

I have the following matrix
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <-data.frame(m)
I am trying to create a random number in each row where the upper limit is a number based on a variable value (in this case 1*Y, based on each row's value for Z)
I currently have:
samp<-function(x){
sample(0:1,1,replace = TRUE)}
m$randoms <- apply(m,1,samp)
which works well, applying the sample function independently to each row, but I always get an error when I try to alter the x in sample. I thought I could do something like this:
samp<-function(x){
sample(0:m$Z,1,replace = TRUE)}
m$randoms <- apply(m,1,samp)
but I guess that was wishful thinking.
Ultimately I want the result:
Y Z randoms
2 5 4
4 7 7
3 9 3
5 3 1
1 7 6
Any ideas?
The following will sample from 0 to x$Y for each row, and store the result in randoms:
x$randoms <- sapply(x$Y + 1, sample, 1) - 1
Explanation:
The sapply takes each value in x$Y separately (let's call this y), and calls sample(y + 1, 1) on it.
Note that (e.g.) sample(y+1, 1) will sample 1 random integer from the range 1:(y+1). Since you want a number from 0 to y rather than 1 to y + 1, we subtract 1 at the end.
Also, just pointing out - no need for replace=T here because you are only sampling one value anyway, so it doesn't matter whether it gets replaced or not.
Based on @mathematical.coffee's suggestion and my edited example, this is the slick final result:
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <-data.frame(m)
samp<-function(z){
sample(z + 1, 1) - 1} # one draw from 0..z
m$randoms <- sapply(m$Z, samp)
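A quick way to convince yourself that each draw respects its own row's limit is to check the bounds afterwards. The sketch below builds the same data directly as a data frame and uses vapply with sample.int, which is an alternative chosen here for illustration rather than the code from the answer.

```r
m <- data.frame(Y = c(2, 4, 3, 5, 1), Z = c(5, 7, 9, 3, 7))

# One draw per row from 0..Z; sample.int(n, 1) draws from 1..n, hence the shift
m$randoms <- vapply(m$Z, function(z) sample.int(z + 1, 1) - 1L, integer(1))

all(m$randoms >= 0 & m$randoms <= m$Z)  # every draw respects its row's bound
```

vapply also guarantees that the result is an integer vector with one value per row.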
