Operate on multiple rows of dataframe simultaneously in R

I'm sure someone has asked this (very basic) question before, but I must be searching for the wrong thing because I can't find an answer:
I frequently need to perform operations that involve combining data from multiple rows of the same dataframe. I know how to do this with a looping construct, e.g.
for (i in 2:nrow(df)) { df$result[i] <- df$data[i] - df$data[i-1] }
for (i in 12:nrow(df)) { j <- i - 11; df$result[i] <- prod(df$data[j:i]) }
Is there a general solution for these types of operations that does not involve looping? Or is looping actually the best way to do it in R?

You can replace the loop with vectorized subsetting. For example, this:
for (i in 2:nrow(df)) { df$result[i] <- df$data[i] - df$data[i-1] }
becomes:
df$result[2:nrow(df)] <- df$data[2:nrow(df)] - df$data[1:(nrow(df)-1)]
Note: nrow() is a function, so call it with parentheses, not square brackets. Also, the : operator binds more tightly than -, so the upper bound needs its own parentheses: 1:(nrow(df)-1).

In base R:
df$result[2:nrow(df)] = diff(df$data)
df$result2[13:nrow(df)] = diff(df$data,12)
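Or with dplyr (lag() only shifts the column, so subtract the shifted values from the original to get the difference):
df$result = df$data - dplyr::lag(df$data)
df$result2 = df$data - dplyr::lag(df$data, 12)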
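The second loop in the question takes a product over a 12-row window, which diff() does not cover. Here is a minimal base-R sketch of one way to do it, assuming a numeric column named data as in the question. The sample values and the helper name roll_prod are made up for illustration. embed() lays consecutive windows out as matrix rows, so a single apply() call stands in for the loop:

```r
# Hypothetical sample data; only the column name `data` comes from the question.
df <- data.frame(data = seq(1, 2, length.out = 20))
n <- nrow(df)
k <- 12  # window length

# Row-to-row differences: result[i] = data[i] - data[i - 1], NA for the first row.
df$result <- c(NA, diff(df$data))

# Rolling product over the current row and the k - 1 rows before it.
# embed(x, k) returns a matrix with n - k + 1 rows; row r holds x[r + k - 1], ..., x[r].
roll_prod <- function(x, k) apply(embed(x, k), 1, prod)
df$result2 <- c(rep(NA, k - 1), roll_prod(df$data, k))
```

Padding with NA keeps both new columns the same length as the data frame, matching what the original loops leave in the rows they skip.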

Related

Expand dataframe in R with rbind (union)

I need to scale up a set of files for a proof of concept in my company. Essentially, I have several 1000-row files with around 200 columns each, and I want to rbind them until I reach the desired scale. This might be 1 million rows or more.
The output will essentially be a repetition of data (which sounds a bit silly, I'm aware), but I just need to prove something.
I used a while loop in R similar to this:
while(nrow(df) < 1000000) {df <- rbind(df,df);}
This seems to work, but it looks computationally heavy. It might take 10-15 minutes.
I thought of creating a function (below) and using an "apply" family function on the df, but couldn't succeed:
scaleup_function <- function(x)
{
while(nrow(df) < 1000)
{
x <- rbind(df, df)
}
}
Is there a quicker and more efficient way of doing it (it doesn't need to be with rbind) ?
Many thanks,
Joao
This should do the trick:
df <- matrix(0,nrow=1000,ncol=200)
reps_needed <- ceiling(1000000 / nrow(df))
df_scaled <- df[rep(1:nrow(df),reps_needed),]
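The same row-index trick works on an actual data frame, and the result can be trimmed to an exact target size. This is a small sketch only: the three-row data frame and the target of 10 rows are made-up stand-ins for the real files and the real scale.

```r
# Hypothetical small data frame standing in for one of the 1000-row files.
df <- data.frame(id = 1:3, value = c("a", "b", "c"))
target_rows <- 10

# Recycle the row indices until exactly target_rows indices are produced.
idx <- rep(seq_len(nrow(df)), length.out = target_rows)
df_scaled <- df[idx, , drop = FALSE]
rownames(df_scaled) <- NULL  # drop the duplicated-row labels such as "1.1"
```

Indexing once builds the scaled data frame in a single step, whereas calling rbind in a loop copies the growing data frame on every pass.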

Repeat codeline in R

How can I repeat a code-line in R?
I have created a function called 'func1' and I want 'data' to run through 'func1' 10 times in a row.
This is what I have now:
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
data<-func1(data);data<-func1(data);data<-func1(data);data<-func1(data)
This is what I would like to have:
solution
data<-func1(data,times=10)
Thanks in advance
Jannik
A simple loop would do this,
for(i in 1:10) {
data <- func1(data)
}
You can write a higher-order function which, given a function f, a seed value s, and an integer n, computes
f(f(f(...f(s)...)))
(with n function evaluations):
iterate <- function(f,s,n){
if(n == 0){
s
}
else{
f(iterate(f,s,n-1))
}
}
Then you seem to want data <- iterate(func1,data,10)
You can also write iterate using a loop (similar to the excellent answer of @JamesElderfield), but the recursive approach given above is fairly common in the functional programming paradigm (which is one of R's native paradigms).
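For comparison, here is a sketch of the loop-based variant mentioned above, plus an equivalent one-liner with base R's Reduce(). The name iterate_loop and the doubling func1 are hypothetical stand-ins, since the real func1 is not shown in the question.

```r
# Loop-based version: apply f to s, n times in a row.
iterate_loop <- function(f, s, n) {
  for (i in seq_len(n)) {
    s <- f(s)
  }
  s
}

# Hypothetical stand-in for func1: doubles its input.
func1 <- function(x) x * 2

res_loop <- iterate_loop(func1, 1, 10)

# Same idea with Reduce(); the index argument i is ignored.
res_reduce <- Reduce(function(acc, i) func1(acc), seq_len(10), 1)
```

Using seq_len(n) means n = 0 returns the seed unchanged, and the loop version avoids the nesting limit that deep recursion can hit for large n.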

="x" & i in loop in R

I am quite new to R and I know this is very simple, but I got stuck.
Could you please tell me how to write the equivalent of the Excel formula ="X" & i (for i from 1 to 10, for instance) inside a loop in R?
For example, assume I have two data frames with a single column each, "SUBSET1" and "SUBSET2". What I want is to save the sum of each column in two different data frames.
For a reproducible example, please refer to the EDIT part below:
Illustration:
for (i in 1:2)
{
assign(paste0("sum_results", i),"")
}
for (i in 1:2)
{
sum_results & i<-sum(subset & i) ----something which works in this way
}
I would be very grateful for any hint.
EDIT: Proper example:
Let's assume I have the following data frames
a<-c(2,3,4)
b<-c(2,3,5)
subset1<-data.frame(a,b)
a<-c(2,7,5)
b<-c(4,8,15)
subset2<-data.frame(a,b)
So desired output is that I have two data frames: sum_results1 & sum_results2, where sum_results1
is the sum of the column "a" of the subset1, and sum_results2 is the sum of the column "a" of the subset2.
for (i in 1:2)
{
assign(paste0("sum_results", i),"")
}
for (i in 1:2)
{
sum_results & i<-sum(subset & i)$a --that is where the problem is
}
You were very close. Assuming I understand your question correctly, try this:
for (i in 1:2)
{
assign(paste0("sum_results", i),sum(get(paste0("subset",i))))
}
Generally, you want to avoid loops in R. See the comments on your question regarding lapply. There are probably much more efficient ways of solving this, but you have not provided a reproducible example, as also mentioned in the comments. Let me know if this helps!
EDIT: Below is how you would use sapply, followed by my solution above to name the results. sapply lets you use a more complicated function that could potentially work with more than one column. You will have to be specific.
N <- 2
res <- sapply(1:N, function(i) sum(get(paste0("subset",i))))
for (i in 1:N)
{
assign(paste0("sum_results", i),res[i])
}
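A sketch of the list-based alternative, using the data frames from the question's EDIT and summing column a as requested there. The names subsets and sum_results are chosen here for illustration; keeping the data frames in one named list avoids assign() and get() altogether.

```r
subset1 <- data.frame(a = c(2, 3, 4), b = c(2, 3, 5))
subset2 <- data.frame(a = c(2, 7, 5), b = c(4, 8, 15))

# Collect the data frames in a named list instead of looking them up by name.
subsets <- list(subset1 = subset1, subset2 = subset2)

# One sum of column `a` per data frame, returned as a named numeric vector.
sum_results <- sapply(subsets, function(d) sum(d$a))

# Individual results are then addressed by name rather than by variable.
first_sum <- sum_results[["subset1"]]
```

If the data frames already exist as separate variables, mget(paste0("subset", 1:2)) is one way to gather them into such a list.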

R stats - generate incremental variables

I am quite new in R programming and facing a simple problem but can't find any solution.
In other programming languages, it is possible to generate incrementally named variables in a loop. Is this possible in R? How?
For example, I would like to save the output of an operation into a new variable each time a loop is done:
for(i in 1:5) {
var_[i] <- i + pi
}
In this way, I would end up with var_1, var_2,..., var_5.
Thank you in advance for any help.
The literal version of what you're attempting is generally considered bad practice in R.
We generally avoid creating large collections of isolated data structures. It is much cleaner to put them all in a list and then set their names attribute:
x <- vector("list",5)
for (i in seq_len(5)){
x[[i]] <- i + pi
}
names(x) <- paste0("var_",1:5)
And then you'd refer to them via things like:
x[["var_1"]]
While this is possible in R, it is highly discouraged. It is better to work with named lists or vectors, or to accumulate results. For example, here you can store them in a vector.
myvar<-1:5+pi
# myvar[1] == 4.141593
# myvar[5] == 8.141593
or if you wanted to create a list you could use
myvar <- lapply(1:5, function(x) {x+pi})
names(myvar)<-paste("var", 1:5, sep="_")
# myvar[["var_1"]] == myvar[[1]] == 4.141593
But if you really need to create a bunch of variables (and you don't), you can use the assign() function.
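For completeness, a minimal sketch of the assign() route, using the var_ prefix from the question. It is shown only to illustrate the mechanics; the list and vector approaches above remain the recommended ones.

```r
# Create var_1 ... var_5 in the current environment (discouraged, shown for completeness).
for (i in 1:5) {
  assign(paste0("var_", i), i + pi)
}

# Read one back by its constructed name.
third <- get("var_3")

# Or collect them all back into a named list.
vars <- mget(paste0("var_", 1:5))
```

Every later access needs get() or mget() with a pasted name, which is the main reason a single list is easier to work with.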

using a for loop to add columns to a data frame

I am new to R and it seems like this shouldn't be a difficult task but I cannot seem to find the answer I am looking for. I am trying to add multiple vectors to a data frame using a for loop. This is what I have so far and it works as far as adding the correct columns but the variable names are not right. I was able to fix them by using rename.vars but was wondering if there was a way without doing that.
for (i in 1:5) {
if (i==1) {
alldata<-data.frame(IA, rand1) }
else {
alldata<-data.frame(alldata, rand[[i]]) }
}
Instead of the variable names being rand2, rand3, rand4, rand5, they show up as rand..i.., rand..i...1, rand..i...2, and rand..i...3.
Any Suggestions?
You can set variable names using the colnames function. Therefore, your code would look something like:
newdat <- cbind(IA, rand1, rand[2:5])
colnames(newdat) <- c(colnames(IA), paste0("rand", 1:5))
If you're creating your variables in a loop, you can assign the names during the loop
alldata <- data.frame(IA)
for (i in 1:5) {alldata[, paste0('rand', i)] <- rand[[i]]}
However, growing a data frame one column at a time in a loop is slow in R, so if you are trying to do this with tens of thousands of columns, the cbind-and-rename approach will be much faster.
Just do cbind(IA, rand1, rand[2:5]).
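A self-contained sketch of the no-loop approach. The question does not show IA or rand, so both are made up here, and this assumes all five vectors live in one list (the question keeps rand1 separately). Naming the list elements first means the column names come out right without a rename step.

```r
IA <- c(10, 20, 30, 40)                    # hypothetical existing column
rand <- lapply(1:5, function(i) runif(4))  # hypothetical list of five vectors

# Name the list elements first; the names carry over as column names.
names(rand) <- paste0("rand", seq_along(rand))

# Convert the list to a data frame and bind it to IA in one call.
alldata <- data.frame(IA = IA, as.data.frame(rand))
col_names <- names(alldata)
```

The columns of alldata are then IA followed by rand1 through rand5.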
