R: Build custom cumsum function in sapply - r

I'm trying to build a more custom version of cumsum to use on a data.table, but I'm failing at the first step:
numbers <- data.table(num=1:10)
sum <- 0
cumFunct <- function(n) {
sum <<- sum+n
return(sum)
}
numbers[, cum:=sapply(num, cumFunct)]
While this works, it is very unclean. It also requires sum to be set to 0 before I run the function.
Now, how do I write this in a cleaner way? Essentially, how can I pass the intermediate result to the next iteration of cumFunct without using global variables?
Thanks very much!

One way to do this would be to use the data.table numbers within the function:
numbers <- data.table(num=1:10)
cumFunct <- function(n) {
  # n is used as a row index here, which works because num is 1:10
  sum(numbers$num[1:n])
}
numbers[, cum:=sapply(num, cumFunct)]
This is not the most efficient way, since it re-sums the first n values for every row, but depending on what your custom code does, it can be improved.

An answer that is also a question: is this a pattern that will work here?
complicated.wizardry <- function(a,b){
a+b
}
cumlist <- function(sofar, remaining, myfn){
if(length(remaining)==1)return(c(sofar, myfn(sofar[length(sofar)],remaining[1])))
return ( cumlist( c(sofar, myfn(sofar[length(sofar)],remaining[1])),remaining[2:length(remaining)],myfn))
}
cumlist(0,1:10,complicated.wizardry)
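Base R's Reduce with accumulate = TRUE expresses the same pattern without recursion or a global variable: each intermediate result is passed to the next call. A minimal sketch, reusing complicated.wizardry as a stand-in for the custom step (the names nums, cum and cum_seeded are only illustrative):

```r
# Stand-in for the custom step: takes the running result and the next value.
complicated.wizardry <- function(a, b) {
  a + b
}

nums <- 1:10

# accumulate = TRUE keeps every intermediate result, not just the last one.
cum <- Reduce(complicated.wizardry, nums, accumulate = TRUE)
cum
# 1 3 6 10 15 21 28 36 45 55

# With a seed value, the result gains one leading element (the seed itself).
cum_seeded <- Reduce(complicated.wizardry, nums, 0, accumulate = TRUE)
```

On the data.table from the question this would presumably be written as numbers[, cum := Reduce(complicated.wizardry, num, accumulate = TRUE)].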

Related

Using for loop to append vectors of variable length

I am trying to create a vector or list of values based on the output of a function performed on individual elements of a column.
library(hpoPlot)
xyz_hpo <- c("HP:0003698", "HP:0007082", "HP:0006956")
getallancs <- function(hpo_col) {
for (i in 1:length(hpo_col)) {
anc <- get.ancestors(hpo.terms, hpo_col[i])
output <- list()
output[[length(anc) + 1]] <- append(output, anc)
}
return(anc)
}
all_ancs <- getallancs(xyz_hpo)
get.ancestors outputs a character vector of variable length depending on each term. How can I loop through hpo_col adding the length of each ancs vector to the output vector?
Welcome to Stack Overflow :) Great job on providing a minimal reproducible example!
As mentioned in the comments, you need to move the output <- list() outside of your for loop, and return it after the loop. At present it is being reset for each iteration of the loop, which is not what you want. I also think you want to return a vector rather than a list, so I have changed the type of output.
Also, in your original question, you say that you want to return the length of each anc vector in the loop, so I have changed the function to output the length of each iteration, rather than the whole vector.
getallancs <- function(hpo_col) {
  output <- numeric()
  # seq_along() is safe even when hpo_col is empty
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- append(output, length(anc))
  }
  return(output)
}
If you are only doing this for a few cases, such as your example, this approach will be fine. However, growing a vector inside a loop is typically quite slow in R, and it is better to vectorise this style of calculation if possible. This matters most when you run it over a large number of elements, where computation will take more than a few seconds.
For example, one way the function above could be vectorised is like so:
all_ancs <- sapply(xyz_hpo, function(x) length(get.ancestors(hpo.terms, x)))
If in fact you did mean to output the whole vector of anc, not just the lengths, the original function would look like this:
getallancs <- function(hpo_col) {
  output <- character()
  # seq_along() is safe even when hpo_col is empty
  for (i in seq_along(hpo_col)) {
    anc <- get.ancestors(hpo.terms, hpo_col[i])
    output <- c(output, anc)
  }
  return(output)
}
Or a vectorised version could be
all_ancs <- unlist(lapply(xyz_hpo, function(x) get.ancestors(hpo.terms, x)))
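Both vectorised variants can come from a single lapply call: keep the list, then take either the element lengths or the flattened vector. The sketch below is only an illustration. fake_ancestors and its lookup table are hypothetical stand-ins for get.ancestors(hpo.terms, x), so that the example runs without hpoPlot:

```r
# Hypothetical stand-in for get.ancestors(hpo.terms, x): returns a
# character vector whose length varies by term.
fake_ancestors <- function(term) {
  lookup <- list(A = c("root", "A"), B = c("root", "A", "B"), C = "root")
  lookup[[term]]
}

terms <- c("A", "B", "C")

# One list element per term, each of variable length.
anc_list <- lapply(terms, fake_ancestors)
names(anc_list) <- terms

# lengths() gives the size of each element without an explicit loop.
anc_lengths <- lengths(anc_list)

# unlist() flattens everything into a single character vector.
all_ancs <- unlist(anc_list, use.names = FALSE)
```

Keeping the intermediate list means the ancestors are only looked up once, even if both the lengths and the full vector are needed.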
Hope that helps. If it solves your problem, please mark this as the answer.

how to print product of vector in R

So this is my current code:
vec_prod <- function(x){
out <- 1
for(i in 1:length(x)){
out <- out*x[i]
}
out
}
However, I want to print the product of the vector [2,3,5], but it does not accept those values. I can only input (1:3) or (1:4).
I'm new to R programming, so any help is appreciated. I do not want to use any other functions.
The issue isn't in the function code, but probably in the way it was called: the values have to be passed as a single vector, e.g. c(2,3,5). Here is a cleaner version, but you should really just use prod.
vec_prod <- function(x){
out <- 1
for(i in x){
out <- out*i
}
out
}
vec_prod(c(4,5,6))
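For the vector from the question, the values need to be wrapped in c(). The sketch below keeps the index-based loop but uses seq_along(x) instead of 1:length(x), because 1:length(x) counts down through 1 and 0 when x is empty:

```r
vec_prod <- function(x) {
  out <- 1
  # seq_along(x) yields no iterations for an empty vector
  for (i in seq_along(x)) {
    out <- out * x[i]
  }
  out
}

vec_prod(c(2, 3, 5))   # 30
vec_prod(numeric(0))   # 1, the empty product
```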

Repeat codeline in R

How can I repeat a code-line in R?
I have created a function called 'func1' and I want ‘data’ to run through ‘func1’ 10 times in a row.
This is what I have now:
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
This is what I would like to have:
solution
data<-func1(data,times=10)
Thanks in advance
Jannik
A simple loop would do this:
for(i in 1:10) {
data <- func1(data)
}
You can write a higher-order function which, given a function f, a seed value s, and an integer n, computes
f(f(f(...(s)...)))
(with n function evaluations):
iterate <- function(f,s,n){
if(n == 0){
s
}
else{
f(iterate(f,s,n-1))
}
}
Then you seem to want data <- iterate(func1,data,10)
You can also write iterate using a loop (in a way similar to the excellent answer of @JamesElderfield), but the recursive approach given above is fairly common in the functional programming paradigm (which is one of R's native paradigms).
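A loop-based version of the same idea might look like the sketch below. The names iterate_loop and double_it are only illustrative. It avoids nesting n recursive calls, which can matter for large n:

```r
# Apply f to s, n times in a row, feeding each result back in.
iterate_loop <- function(f, s, n) {
  for (i in seq_len(n)) {
    s <- f(s)
  }
  s
}

# Illustrative function: doubling 1 ten times gives 2^10.
double_it <- function(x) x * 2
result <- iterate_loop(double_it, 1, 10)
result
# 1024
```

With the names from the question, the call would be data <- iterate_loop(func1, data, 10).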

Deleting a row from a data set

I am trying to create a function that deletes n rows from a data set in R. The rows that I want to delete are the minimum values from the column time in the data set my_data_set.
I currently have
delete_data <- function(n)
{
k=1
while(k <= n)
{
my_data_set = my_data_set[-(which.min(my_data_set$time)),]
k=k+1
}
}
When I input these lines manually (without the use of the while loop) it works perfectly but I am not able to get the loop to work.
I am calling the function by:
delete_data(n = 2)
Any help is appreciated!
Thanks
Try:
my_data_set[ ! my_data_set$time == min(my_data_set$time), ]
Or if you are using data.table and wish to use the more direct syntax that data.table provides:
library(data.table)
setDT( my_data_set )
my_data_set[ ! time == min(time) ]
Then review how R works. R is a vectorized language that pretty much does what you mean without having to resort to complicated loops.
Also try:
my_data_set <- my_data_set[which(my_data_set$time > min(my_data_set$time)),]
By the way, which.min() will only pick up the first record if there is more than one record matching the minimum value.
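The loop in the question most likely appears to do nothing because the assignment inside delete_data only changes a local copy of my_data_set, and the function never returns it. A minimal sketch that removes the n rows with the smallest time values and returns the result (delete_smallest and the sample data are hypothetical, not from the question):

```r
# Return df without the n rows that have the smallest values in df$time.
delete_smallest <- function(df, n) {
  n <- min(n, nrow(df))
  if (n == 0) return(df)
  # order() sorts ascending, so the first n positions are the n smallest
  drop <- order(df$time)[seq_len(n)]
  df[-drop, , drop = FALSE]
}

# Hypothetical sample data with a tie at the minimum.
my_data_set <- data.frame(id = 1:5, time = c(3, 1, 4, 1, 5))

# The result has to be assigned back; the function does not modify its input.
my_data_set <- delete_smallest(my_data_set, 2)
```

Unlike the min()-based filters above, this drops exactly n rows, even when several rows share the minimum value.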

How to use extract function in a for loop?

I am using the extract function in a loop. See below.
for (i in 1:length(list_shp_Tanzania)){
LU_Mod2000<- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Where maj function is:
maj <- function(x){
y <- as.numeric(names(which.max(table(x))))
return(y)
}
I was expecting to get i outputs, but I get only one output once the loop is done. Does anybody know what I am doing wrong? Thanks.
One solution in this kind of situation is to create a list and then assign the result of each iteration to the corresponding element of the list:
LU_Mod2000 <- vector("list", length(list_shp_Tanzania))
for (i in seq_along(list_shp_Tanzania)){
  LU_Mod2000[[i]] <- extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj)
}
Do not do
LU_Mod2000 <- c(LU_Mod2000, extract(x=rc_Mod2000_LC, y=list_shp_Tanzania[[i]], fun=maj))
inside the loop. This will create unnecessary copies and will take a long time to run. Use the list method and, after the loop, convert the list of results to the desired format (usually using do.call(<some function>, LU_Mod2000)).
Alternatively, you could replace the for loop with lapply, which is what many people seem to prefer:
LU_Mod2000 <- lapply(list_shp_Tanzania, function(z) extract(x=rc_Mod2000_LC, y=z, fun=maj))
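The preallocate-then-combine pattern can be tried without any raster data. In the sketch below, cell_values is a hypothetical stand-in for the values that extract would hand to maj for each polygon, and do.call(c, ...) is just one possible way of combining the results:

```r
# maj as defined in the question: the most frequent value in x.
maj <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Hypothetical stand-in for the raster cell values under each polygon.
cell_values <- list(c(1, 2, 2, 3), c(4, 4, 5), c(7, 6, 6, 6))

# Preallocate the result list, then fill one slot per iteration.
results <- vector("list", length(cell_values))
for (i in seq_along(cell_values)) {
  results[[i]] <- maj(cell_values[[i]])
}

# Combine the per-element results after the loop.
combined <- do.call(c, results)
combined
# 2 4 6
```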
