How to make a generalized function update the value of a vector? - r

I have been trying to write a generalized function that multiplies each value in each row of a matrix by the value at the same position in a vector (i.e. matrix[1,1]*vector[1], matrix[1,2]*vector[2], etc.) and then sums the products. Note that the vector and the rows of the matrix always have the same length, so in each row the first value of the vector is multiplied by the first value of the matrix row. Also note that the matrix is square (it has as many rows as columns). The sum for each row should be assigned to a different, existing vector whose length equals the number of rows.
This is the matrix and vector:
a <- c(4, -9, 2, -1)
b <- c(-1, 3, -8, 2)
c <- c(5, 2, 6, 3)
d <- c(7, 9, -2, 5)
matrix <- cbind(a,b,c,d)
      a  b c  d
[1,]  4 -1 5  7
[2,] -9  3 2  9
[3,]  2 -8 6 -2
[4,] -1  2 3  5
vector <- c(1, 2, 3, 4)
These are the basic functions that I have to generalize for the rows and columns of a matrix and a vector of length "n":
f.1 <- function() {
(matrix[1,1]*vector[1]
+ matrix[1,2]*vector[2]
+ matrix[1,3]*vector[3]
+ matrix[1,4]*vector[4])
}
f.2 <- function() {
(matrix[2,1]*vector[1]
+ matrix[2,2]*vector[2]
+ matrix[2,3]*vector[3]
+ matrix[2,4]*vector[4])
}
and so on...
This is the function I have written:
ncells = 4
f = function(x) {
i = x
result = 0
for(j in 1:ncells) {
result = result + vector[j] * matrix[i][j]
}
return(result)
}
Calling the function:
result.cell = function() {
for(i in 1:ncells) {
new.vector[i] = f(i)
}
}
The vector to which this result should be assigned (i.e. new.vector) has been defined beforehand:
new.vector <- c()
I expected that the sum for each row would be assigned to the corresponding position of the vector (e.g. if the sums for the rows were 1, 2, 3, 4, etc., then new.vector would be (1, 2, 3, 4, etc.)), but that did not happen.
(Edit) When I do this with the basic functions, the assignment works:
new.vector[1] <- f.1()
new.vector[2] <- f.2()
This does not however work with the generalized function:
new.vector[1:ncells] <- result cell[1:ncells]
(End Edit)
I have also tried setting the length of new.vector to be equal to ncells, but I don't think it did any good:
length(new.vector) = ncells
My question is how can I make the new vector take the resulting sums of the multiplied elements of a row of a matrix by the corresponding value of a vector.
I hope I have been clear and thanks in advance!

There is no need for a loop here: we can use R's vectorized element-wise multiplication and then sum the rows with rowSums. Note that m and v are used as names for the matrix and the vector to avoid conflicts with the functions of those names.
nr <- nrow(m)
rowSums(m * matrix(rep(v, nr), nr, byrow = TRUE))
# [1] 45 39 -4 32
However, if the vector v is always going to be the column number, we can simply use the col function as our multiplier.
rowSums(m * col(m))
# [1] 45 39 -4 32
Data:
a <- c(4, -9, 2, -1)
b <- c(-1, 3, -8, 2)
c <- c(5, 2, 6, 3)
d <- c(7, 9, -2, 5)
m <- cbind(a, b, c, d)
v <- 1:4
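As a complement to the rowSums approach, here is a minimal sketch of two other routes. The first uses the matrix-vector product `%*%`, which computes exactly these row-wise sums of products. The second repairs the loop from the question: `matrix[i][j]` takes the i-th element of the matrix and then the j-th element of that single value (NA for j > 1), so the fix is `m[i, j]`; and assigning to new.vector inside a function only changes a local copy, so the function should return the vector instead. The name row_dot is made up for illustration.

```r
m <- cbind(a = c(4, -9, 2, -1),
           b = c(-1, 3, -8, 2),
           c = c(5, 2, 6, 3),
           d = c(7, 9, -2, 5))
v <- 1:4

# Matrix-vector product: element i of the result is sum(m[i, ] * v)
by_product <- drop(m %*% v)

# Loop version (hypothetical helper): index with m[i, j] and return the result
row_dot <- function(m, v) {
  out <- numeric(nrow(m))          # preallocate one slot per row
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      out[i] <- out[i] + m[i, j] * v[j]
    }
  }
  out                              # returned, not assigned to a global
}

new.vector <- row_dot(m, v)
new.vector
# [1] 45 39 -4 32
```

Both by_product and new.vector should match the rowSums output shown above.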

Related

Permute the position of a subset of a vector

I want to permute a subset of a vector.
For example, say I have a vector (x) and I select a random subset of the vector (e.g., 40% of its values).
What I want to do is output a new vector (x2) that is identical to (x) except the positions of the values within the random subset are randomly swapped.
For example:
x = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
random subset = 1, 4, 5, 8
x2 could be = 4, 2, 3, 8, 1, 6, 7, 5, 9, 10
Here's an example vector (x) and how I'd select the indices of a random subset of 40% of its values. Any help making (x2) would be appreciated!
x <- seq(1,10,1)
which(x%in%sample(x)[seq_len(length(x)*0.40)])
First draw a sample of proportion p from the indices, then shuffle the elements at those indices and assign them back.
f <- \(x, p=0.4) {
r <- sample(seq_along(x), length(x)*p)
x[r] <- sample(x[r])
`attr<-`(x, 'subs', r) ## add attribute w/ indices that were sampled
}
set.seed(42)
f(x)
# [1] 8 2 3 4 1 5 7 10 6 9
# attr(,"subs")
# [1] 1 5 10 8
Data:
x <- 1:10
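One edge case may be worth guarding against: when the sampled subset has a single element and that element is a number of at least 1, `sample(x[r])` is interpreted as `sample(1:x[r])` rather than as a shuffle of one value. The sketch below, under the assumption that a list return value is acceptable, permutes by position with `sample.int` instead; the name permute_subset is made up for illustration.

```r
# Hypothetical helper: shuffle a random subset of positions of x
permute_subset <- function(x, p = 0.4) {
  r <- sample.int(length(x), floor(length(x) * p))  # positions to shuffle
  x[r] <- x[r][sample.int(length(r))]               # permute positions, not values
  list(x = x, subs = r)
}

set.seed(1)
x <- 1:10
out <- permute_subset(x)

# Elements outside the sampled positions are untouched
identical(out$x[-out$subs], x[-out$subs])

# A subset of length one leaves the vector unchanged
one <- permute_subset(c(7, 9), p = 0.5)
```

Because the shuffle is random, the subset may occasionally come back in its original order.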
There is surely faster code for what you are asking, but one solution would be:
x <- seq(1,10,1)
y <- which(x%in%sample(x)[seq_len(length(x)*0.40)]) # Defined as "y" the vector of the random subset
# required libraries
library(combinat)
permutation <- permn(y) # permn() generates a list of all permutations of the elements of y.
# https://www.geeksforgeeks.org/calculate-combinations-and-permutations-in-r/
permutation_sampled <- sample(permutation,1) # Sample one of the permutations.
x[y] <- permutation_sampled[[1]] # Substitute the selected permutation in x using y as the index of the elements that should be substituted.

Applying an existing multi-argument function to multiple dataframes, row by row, with a joint output dataframe

I have a function taking four arguments,
h(a, b, c, d)
Where a and b are the i-th and the i+1-th row of df1 and c and d are the i-th and i+1-th row of df2, and the output has four variables and i-1 results.
The idea is the following: I want to apply the function h to each combination of these four arguments where i is common, so:
- for the first iteration it will take the 1st and 2nd row of df1 and 1st and 2nd row of df2
- for the second iteration it will take the 2nd and 3rd row of df1 and 2nd and 3rd row of df2
...
Afterward, perfectly, the results will be stored in a separate data frame, with 4 columns and i-1 rows.
I tried making use of apply function and of a for loop, yet my attempts failed me. I don't necessarily need a readymade solution, a hint would be nice. Thanks!
EDIT: reproducible example:
df1 <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
df2 <- data.frame(c = c(4, 3, 2, 1), d = c(8, 7, 6, 5))
h <- function (a, b, c, d) {
vector <- (a + b) / (c - d)
vector
}
I would like to get a function that uses h until b and d reach the last row of df1/df2 (they have the same number of rows), and for each such combination generate vector and add it to some new data frame as a next row.
With apply you could do something like this:
df1 <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
df2 <- data.frame(c = c(4, 3, 2, 1), d = c(8, 7, 6, 5))
h <- function (a, b, c, d) {
(a + b) / (c - d)
}
apply(cbind(df1, df2), 1, function(x) h(x["a"], x["b"], x["c"], x["d"]))
[1] -1.5 -2.0 -2.5 -3.0
If h is a vectorized function (as in your example), it would be better to use:
do.call(h, cbind(df1, df2))
Of course, I am not assuming that h is that simple, in which case (df1$a + df1$b) / (df2$c - df2$d) would suffice.
However, I advise learning about the purrr package. It is great for this kind of situation, mainly because you can declare the type of output you expect (with purrr::map_*) to ensure consistency and avoid unexpected results.
For multiple arguments of a dataframe, use purrr::pmap_*:
# `pmap` returns a list
purrr::pmap(cbind(df1, df2), h)
[[1]]
[1] -1.5
[[2]]
[1] -2
[[3]]
[1] -2.5
[[4]]
[1] -3
# `pmap_dbl` returns a double vector or throws an error otherwise
purrr::pmap_dbl(cbind(df1, df2), h)
[1] -1.5 -2.0 -2.5 -3.0
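The question also describes a sliding-window variant, where a and b are rows i and i+1 of df1 and c and d are rows i and i+1 of df2. A minimal base R sketch of that reading follows, assuming h accepts numeric vectors (as the example h does); m1, m2, idx, rows and res are names made up for illustration.

```r
df1 <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
df2 <- data.frame(c = c(4, 3, 2, 1), d = c(8, 7, 6, 5))
h <- function(a, b, c, d) (a + b) / (c - d)

# Work on matrices so that m1[i, ] is a plain numeric vector
m1 <- as.matrix(df1)
m2 <- as.matrix(df2)

# One call per pair of consecutive rows: i = 1, ..., nrow - 1
idx <- seq_len(nrow(m1) - 1)
rows <- lapply(idx, function(i) {
  h(m1[i, ], m1[i + 1, ], m2[i, ], m2[i + 1, ])
})

# Stack the per-iteration results into one data frame
res <- as.data.frame(do.call(rbind, rows))
res
```

With this example h, each iteration yields two values, so res has nrow(df1) - 1 rows and two columns; a function returning four values would give four columns.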

finding values in a range in r and sum the number of values

I have a question. I have the following data:
c(1, 2, 4, 5, 1, 8, 9)
I set a lower bound l = 2 and an upper bound u = 6
I want to find all the values in the range (3,7)
How can I do this?
In base R we can use comparison operators to create a logical vector and use it to subset the original vector:
x[x > 2 & x <= 6]
#[1] 3 5 6
Or use a for loop: initialize an empty vector, loop through the elements of 'x', and if the value is greater than 2 and at most 6, append it to that vector:
v1 <- c()
for(i in x) {
if(i > 2 && i <= 6) v1 <- c(v1, i)
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
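The title also asks for the number of values in the range. A minimal sketch, reusing the answer's data and the bounds l and u from the question: since TRUE counts as 1, summing the logical vector gives the count. The names in_range and n_in_range are made up for illustration.

```r
x <- c(3, 5, 6, 8, 1, 2, 1)
l <- 2
u <- 6

in_range <- x > l & x <= u     # logical vector, one entry per element of x
n_in_range <- sum(in_range)    # number of values in the range
n_in_range
# [1] 3
```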

Replacing values in vector by 0 except for sample and looping this

I have a vector, in this case "dist_SLA" for which I want to do the following:
I want to take samples of increasing size, from size = 1 until all values of "dist_SLA" are sampled (so size = 1, size = 2, size = 3, ..., size = length of "dist_SLA"). I'll call the sample vectors sample.i.
Then I want to transform all the sample vectors "sample.i" to new ones using this method: The vector should be transformed so that all values from "dist_SLA" that were not sampled in sample.i are replaced by 0, so that it gives me a vector which includes the sampled values and zeros. I'll call the new vectors "sp.i"
Lastly, I want to make a list which combines all calculated R-squares of lm of all different transformed vectors "sp.i" and "dist_SLA" (So R-square of sp.1 with "dist_SLA" + R-square of sp.2 with "dist_SLA", etc)
I have tried the following:
dist_SLA <- c(1, 4, 9, 3, 4, 6)
for (i in 1:NROW(dist_SLA)){
sample_[i] <- sample(dist_SLA, size = i )
sp_[i] <- ifelse(dist_SLA == sample_[i], yes = sample_[i], no = "0")
lm_[i] <- lm(dist_SLA ~ sp_[i])
fit_[i] <- summary(lm_[i])$r.squared
}
But this gives me a few problems:
The "ifelse" function gives me a vector in which all values that are identical to the value(s) of the sample won't get replaced by 0 in "sp_1". I therefore want a vector in which only the sample value(s) is/are not replaced by 0 but the others are.
The loop does not work in this way but I cannot figure out how.
How can I fix this?
I believe the following does what you want.
Note that you don't need to keep the sample.i vectors; only the R-squared values are saved, so all you need is one vector to store them in.
set.seed(3520) # Make the results reproducible
dist_SLA <- c(1, 4, 9, 3, 4, 6)
n <- length(dist_SLA)
fit <- numeric(length(dist_SLA))
for (i in seq_along(dist_SLA)){
smpl <- sample(n, size = i)
sp <- numeric(length(dist_SLA))
sp[smpl] <- dist_SLA[smpl]
lmi <- lm(dist_SLA ~ sp)
fit[i] <- summary(lmi)$r.squared
}
fit
#[1] 0.6480000 0.0200000 0.1739130 0.7667327 0.8711111 1.0000000
Try this:
set.seed(123)
dist_SLA <- c(1, 4, 9, 3, 4, 6)
sample_ <- sample(dist_SLA, size = 3)
sample_
[1] 4 3 6
Then this will give you
dist_SLA==sample_
[1] FALSE FALSE FALSE FALSE FALSE TRUE
Whereas using %in% gives:
dist_SLA %in% sample_
[1] FALSE TRUE FALSE TRUE TRUE TRUE
And
ifelse(dist_SLA %in% sample_, dist_SLA, 0)
[1] 0 4 0 3 4 6
So your loop, depending on what you want to save for later use, could look like this:
set.seed(123)
dist_SLA <- c(1, 4, 9, 3, 4, 6)
lm_ <- vector(mode = "list", length = length(dist_SLA))
fit_ <- vector(mode = "numeric", length = length(dist_SLA))
for(x in 1 : length(dist_SLA)){
sample_ <- sample(dist_SLA, size = x)
spi <- ifelse(dist_SLA %in% sample_, dist_SLA, 0)
lm_[[x]] <- lm(dist_SLA ~ spi)
fit_[x] <- summary(lm_[[x]])$r.squared
}
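One caveat for vectors with repeated values, such as the two 4s in dist_SLA: matching by value with %in% keeps every occurrence of a sampled value, whereas sampling positions keeps only the sampled elements. A small deterministic sketch of the difference, using a fixed position idx (an illustrative stand-in for a sampled index):

```r
dist_SLA <- c(1, 4, 9, 3, 4, 6)
idx <- 2                                   # pretend position 2 (value 4) was sampled

# Matching by value also keeps the other 4, at position 5
by_value <- ifelse(dist_SLA %in% dist_SLA[idx], dist_SLA, 0)
by_value
# [1] 0 4 0 0 4 0

# Matching by position keeps only the sampled element
by_index <- numeric(length(dist_SLA))
by_index[idx] <- dist_SLA[idx]
by_index
# [1] 0 4 0 0 0 0
```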

Complement of empty index vector is empty index vector

I am removing values from a vector by using - (minus sign) in front of the index vector. Like this:
scores <- scores[-indexes.to.delete]
Sometimes the indexes.to.delete vector is empty (there is nothing to delete), so the scores vector should remain unchanged. However, I get an empty scores vector when indexes.to.delete is empty.
Example:
x <- c(1, 2, 3);
y <- c(4, 5, 6);
indexes.to.delete <- which(y < x); # will return empty vector
y <- y[-indexes.to.delete]; # returns empty y vector, but I want y stay untouched
I could code an if statement checking whether indexes.to.delete is empty, but I am wondering if there is a simpler way?
Maybe use:
x <- c(1, 2, 3)
y <- c(4, 5, 6)
y[!(y < x)]
[1] 4 5 6
x <- c(1, 2, 3)
y <- c(4, 1, 6)
y[!(y < x)]
[1] 4 6
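The reason for the empty result is that negating an empty index vector gives another empty index vector, and indexing with that selects nothing. If the indices come from somewhere other than a simple comparison, a sketch of one workaround is to build the set of positions to keep with setdiff; the names keep, y_kept and y2_kept are made up for illustration.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
indexes.to.delete <- which(y < x)          # integer(0): nothing to delete

# Positions to keep = all positions minus the ones to delete
keep <- setdiff(seq_along(y), indexes.to.delete)
y_kept <- y[keep]
y_kept
# [1] 4 5 6

# The same code when there is something to delete
y2 <- c(4, 1, 6)
y2_kept <- y2[setdiff(seq_along(y2), which(y2 < x))]
y2_kept
# [1] 4 6
```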
