Applying a function to a random sample of vector elements - r

I have been learning R for the past few days, and want to find out whether the problem below can be solved in a better manner (compacter code perhaps) than my solution.
Problem: A vector V of N (~ 1000) numeric elements, needs to be transformed in the following way.
Choose M (~ 100) elements at random.
Replace each such element x with f(x).
My Solution: for (i in sample(1:N, M)) V[i] = f(V[i])
Edit: The function f takes as input a single numeric value, and also outputs a single numeric value. Something like: f <- function (x) x^3 + 2
Edit: Thanks for everyone's contributions! I now understand the power of vectorized functions. :)

How about this
i <- sample(1:N, M)
V[i] <- f(V[i])
No need for loop since [<- is a vectorized function. See ?"[<-" to get further details on that.

It depends on the type of your function. If f is vectorised then
V <- f(V) # V is a vector with random numbers
will do the job. If f takes and returns a single value then:
V <- sapply(V, f)
Thankfully, in R most of the function are vectorised, so the first approach would work quite often.

Related

Which loop to use, R language?

We have to create function(K) that returns vector which has all items smaller than or equal to K from fibonacci sequence. We can assume K is fibonacci item. For example if K is 3 the function would return vector (1,1,2,3).
In general, a for loop is used when you know how many iterations you need to do, and a while loop is used when you want to keep going until a condition is met.
For this case, it sounds like you get an input K and you want to keep going until you find a Fibonacci term > K, so use a while loop.
ans <- function(n) {
x <- c(1,1)
while (length(x) <= n) {
position <- length(x)
new <- x[position] + x[position-1]
x <- c(x,new)
}
return(x[x<=n])
}
`
Tried many different loops, and this is closest I get. It works with every other number but ans(3) gives 1,1,2 even though it should give 1,1,2,3. Couldn't see what is wrong with this.

R: Apply function based on vector B to vector A

My first question...
I have two vectors, q and n. I want to perform a function on q based on the corresponding value in n (specifically binom.test(q[t],n[t],0.5)).
I've made a loop to do it, which works OK, but I'd like to know how to use apply functions to do it faster if such a thing is possible. I'm new to r, so please forgive my ignorance and probably sloppy formatting.
This is my loop:
q = ...
n = ...
p = c()
for(t in c(1:30)) {p = c(p,binom.test(q[t],n[t],0.5)$p.value)}
Thanks!
You can do this with sapply like this:
sapply(1:length(q), function(t) binom.test(q[t], n[t], 0.5)$p.value)

Indexing variables in R

I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Because you R is not zero indexed add one
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
This will work and I think is what you were trying to do, BUT is not great R code. I strongly recommend using the lapply family instead of for loops, learning to use data.table and parallelisation if you need to get things to scale. Additionally if you want to read more about indexing in R and subsetting Hadley Wickham has a comprehensive break down here.
Hope this helps!
Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, you get an error. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
X <- ifelse(x > 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.

Easiest way in R to get vector of frequencies of elements in vector

Suppose I have a vector of values v. What is the easiest way to get a vector f of length equal to v, where the ith element of f is the frequency of the ith element of v in v?
The only way I know to do it seems unnecessarily complicated:
v = sample(1:10,100,replace=TRUE)
D = data.frame( idx=1:length(v), v=v )
E = merge( D, data.frame(table(v)) )
E = E[ with(E,order(idx)), ]
f = E$Freq
Surely there's a simpler way to do this, along the lines of "frequencies(v)"?
For a vector of small positive integers v, as in the question, the expression
tabulate(v)[v]
is particularly simple as well as speedy.
For more general numerical vectors v you can persuade ecdf to help you out, as in
w <- sapply(v, ecdf(v)) * length(v)
tabulate(w)[w]
It's probably better to do the coding of the underlying algorithm yourself, though--and it certainly avoids the floating point rounding error implicit in the preceding solution:
frequencies <- function(x) {
i <- order(x)
v <- x[i]
w <- cumsum(c(TRUE, v[-1] != v[-length(x)]))
f <- tabulate(w)[w]
return(f[order(i)])
}
This algorithm sorts the data, assigns sequential identifiers 1, 2, 3, ... to the values as it encounters them (by summing a binary indicator of when the values change), uses the preceding tabulate()[] trick to obtain the frequencies efficiently, and then unsorts the results to make the output match the input, component by component.
I think the best solution here is:
ave(v,v,FUN=length)
It is simply ave()'s design to replicate and map the return value of FUN() back to every index of the input vector whose element was part of the group for which that particular invocation of FUN() was performed.
Something like this works for me:
sapply(v, function(elmt, vec) sum(vec == elmt), vec=v)
i would suggest you use table and as.vector:
as.vector(table(dataInVector))

mapply - passing row and column of element as argument

I'm new to R programming and I know I could write a loop to do this, but everything I read says that for simplicity its best to avoid loops and use apply instead.
I have a matrix and i would like to run this function on each element in the matrix.
cellresidue <- function(i,j){
result <- (cluster[i,j] - cluster.I[i,] - cluster.J[j,] - cluster.IJ)/(cluster.N*cluster.M)
return (result)
}
i= element row
j= element column
cluster.J is a matrix of column means
cluster.I is a matrix of row means
cluster.IJ is the mean of the entire matrix named cluster
What I can't figure out is how do I get the row and column of the element (I think should use row() and column col() functions) that mapply is working with and how do pass those arguments to mapply or apply?
There is no need for loops or *apply functions. You can just use plain matrix operations:
nI <- nrows(cluster)
nJ <- ncols(cluster)
cluster.I <- matrix(rowMeans(cluster), nI, nJ, byrow = FALSE)
cluster.J <- matrix(rowMeans(cluster), nI, nJ, byrow = TRUE)
cluster.IJ <- matrix( mean(cluster), nI, nJ)
residue.mat <- (cluster - cluster.I - cluster.J - cluster.IJ) /
(cluster.N * cluster.M)
(You did not explain what cluster.N and cluster.M are but I assume they are scalars)
It is not clear from your question what you are trying to do. It is best on this site to provide some mock data (preferably generated by the code, not pasted), and then show what form the end result should look like. It seems that the apply family is not what you seek.
Quick disambiguation between apply, sapply and mapply:
#providing data for examples
X=matrix(rnorm(9),3,3)
apply: apply a function to either columns (2) or rows (1) of a matrix or array
#here, sum by columns, same as colSums(X)
apply(X, 2, sum)
sapply: apply a function against (usually) a list of objects
#create a list with three vectors
mylist=list(1:4, 5:10, c(1,1,1))
#get the mean of each vector
sapply(mylist, mean)
#remove 2 to each element of X, same as c(X-2)
sapply(X, FUN=function(x) x-2)
mapply: a multivariate version of sapply, taking an arbitrary number of arguments. Never had much use of it… Some rock-bottom examples:
#same as c(1,2,3,4) + c(15,16,17,18)
mapply(sum, 1:4, 15:18)
#same as c(X+X), the vectorized matrix sum
mapply(sum, X, X)
Side note: It's perfectly ok to use loops in R; use whichever suits the best your thoughts. The issue is that if you have a "really big" number of iterations, this is where you could meet bottlenecks, depending on your patience. There are two solutions to this: rewrite your function in C/FORTRAN (and boost speed), or use built-in functions if applicable (which are, by the way, often writen in C or FORTRAN).

Resources