Using loop to create new variables - r

My data frame is called Subs.
My variables are REV_4, REV_5, REV_6 etc
I want to create new variables to calculate percentage change of revenue.
Eg: d.rev.5 <- Subs$REV_5/Subs$REV_4 - 1
I would like to use a loop to create these new variables. I've tried this:
for(i in 5:10){
Subs$d.data.[i] <- Subs$REV_[i]/Subs$REV_[i-1] - 1 }
But it doesn't work.
I suspect it's not recognizing the i as part of the variable name.
Is there any way to get around this? Thank you so much.

You can't reference columns the way you're attempting (Subs$REV_[i]); you need to create a string to represent the column name.
What I think you're trying to do is the following (in the absence of your data I've created my own):
set.seed(123)
Subs <- data.frame(rev_1 = rnorm(10, 0, 1),
rev_2 = rnorm(10, 0, 1),
rev_3 = rnorm(10, 0, 1),
rev_4 = rnorm(10, 0, 1))
for(i in 2:4){
## looping over columns 2-4
col1 <- paste0("rev_", i)
col2 <- paste0("rev_", i - 1)
col_new <- paste0("d.rev.", i)
Subs[, col_new] <- Subs[, col1] / Subs[, col2] - 1 ## percentage change, as in the question
}
## A note on subsetting a data.frame
Subs$rev_1 ## works
i <- 1
Subs$rev_[i] ## doesn't work
Subs[, rev_[i]] ## doesn't work
Subs[, "rev_1"] ## works
Subs[, paste0("rev_", i)] ## works
## because
paste0("rev_", i) ## creates the string:
[1] "rev_1"
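Applied to the column names in the question, the same idea might look like the sketch below. The data here are made up, and the range REV_4 to REV_10 is an assumption, since the question only shows the naming pattern. It subtracts 1 so that each new column holds a proportional change, as in the question's d.rev.5 example.

```r
# Hypothetical stand-in for the asker's data: columns REV_4 ... REV_10
set.seed(123)
Subs <- as.data.frame(matrix(runif(7 * 5, min = 50, max = 150), nrow = 5))
names(Subs) <- paste0("REV_", 4:10)

for (i in 5:10) {
  cur  <- paste0("REV_", i)       # e.g. "REV_5"
  prev <- paste0("REV_", i - 1)   # e.g. "REV_4"
  # [[ ]] accepts the column name as a string, unlike $
  Subs[[paste0("d.rev.", i)]] <- Subs[[cur]] / Subs[[prev]] - 1
}

names(Subs)
```

Using [[ ]] with a string is equivalent to the Subs[, col_new] form above for a single column; which one to use is mostly a matter of taste.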

Related

How to run function on individual columns instead of data frame?

Hello everyone. I have two data frames and am trying to do bootstrapping with script1 below. In script1 I take the number of rows from data frames one and two. Instead of taking the row count from the entire data frame, I want to split each column out as its own data frame, remove the zero values, then take the row count and do the bootstrapping with the script below. So I am trying script2, where I create the individual data frames in a for loop. As I am new to R, I am a bit confused about how to efficiently add the script1 logic to it.
Please advise. Below I provide script1, which runs, and script2, where I am trying to subset each column into an individual data frame.
Script1
set.seed(2)
m1 <- matrix(sample(c(0, 1:10), 100, replace = TRUE), 10)
m2 <- matrix(sample(c(0, 1:5), 50, replace = TRUE), 5)
m1 <- as.data.frame(m1)
m2 <- as.data.frame(m2)
nboot <- 1e3
n_m1 <- nrow(m1); n_m2 <- nrow(m2)
temp<- c()
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
value <- colSums(m2)/colSums(m1[boot,])
temp <- rbind(temp, value)
}
boot_data<- apply(temp, 2, median)
script2
for (i in colnames(m1)){
m1_subset=(m1[m1[[i]] > 0, ])
m1_subset=m1_subset[i]
m2_subset=m2[m2[[i]] >0, ]
m2_subset=m2_subset[i]
num_m1 <- nrow(m1_subset); n_m2 <- nrow(m2_subset)# after this wanted add above script changing input
}
If I understand correctly, you want to do the sampling and calculation on each column individually, after removing the 0 values. I modified your code to work on a single vector instead of a data frame (i.e., using length() instead of nrow() and sum() instead of colSums()). I also suggest creating the empty matrix for your results ahead of time and filling it in; it will be faster.
temp <- matrix(nrow = nboot, ncol = ncol(m1))
for (i in seq_along(m1)){
m1_subset = m1[m1[,i] > 0, i]
m2_subset = m2[m2[,i] > 0, i]
n_m1 <- length(m1_subset); n_m2 <- length(m2_subset)
for (j in seq_len(nboot)) {
boot <- sample(x = seq_len(n_m1), size = n_m2, replace = TRUE)
temp[j, i] <- sum(m2_subset)/sum(m1_subset[boot])
}
}
boot_data <- apply(temp, 2, median)
boot_data <- setNames(data.frame(t(boot_data)), names(m1))
boot_data
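As a possible variation, the per-column work can be wrapped in a small function and applied to matching columns with mapply. This is only a sketch: boot_ratio_median is a hypothetical helper name, and because the random draws happen in a different order, the numbers will not match the loop above exactly.

```r
# Bootstrap the ratio sum(y) / sum(resampled x) for one pair of vectors,
# after dropping zeros, and return the median over nboot resamples.
# boot_ratio_median is a hypothetical helper, not part of the original post.
boot_ratio_median <- function(x, y, nboot = 1000) {
  x <- x[x > 0]
  y <- y[y > 0]
  ratios <- vapply(seq_len(nboot), function(j) {
    # resample positions of x, drawing as many as there are values in y
    idx <- sample(seq_along(x), size = length(y), replace = TRUE)
    sum(y) / sum(x[idx])
  }, numeric(1))
  median(ratios)
}

set.seed(2)
m1 <- as.data.frame(matrix(sample(c(0, 1:10), 100, replace = TRUE), 10))
m2 <- as.data.frame(matrix(sample(c(0, 1:5), 50, replace = TRUE), 5))

# mapply pairs column i of m1 with column i of m2
boot_data <- mapply(boot_ratio_median, m1, m2)
boot_data
```

The result is a named numeric vector with one median per column. A column that contains only zeros would need special handling, which this sketch does not attempt.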

How to apply a multistep function to several datasets and then combine the results into a new dataset

I would like to apply a set of functions to multiple datasets to 1) edit the datasets (e.g. add new variables and remove NAs), 2) concatenate the resulting datasets, and 3) calculate summary statistics on the new datasets and combine the resulting statistics in a table.
Below is my current code to manually apply the functions to the datasets (which all contain the same variable structure, but have different lengths)
library(tibble)
library(doBy)
library(plyr)
# STEP 0: create example datasets.
# In reality, I have numerous sets of data that are systematically named, such as
# a1 = population A, scenario 1
# b1 = population B, scenario 1
# b2 = population B, scenario 2
# etc...
a1 <- tibble(min = c(10:14, NA), count = c(10, 25, 0, 29, 36, 5)); a1
b2 <- tibble(min = c(8:11, NA, NA), count = c(10, 5, 0, 23, 36, 5)); b2
## STEP 1: choose which datasets to process
tab <- a1
## STEP 2: Add identifying variable to each dataset (in preparation for rbinding). Ideally this is integrated into step 3 below and it would be automatically generated based on the original dataset name (or some type of for loop) Currently, I do it manually below. In the last step, I apply the summaryBy function using pop and scenario vars.
tab$pop <- "a"
tab$scenario <- "1"
## STEP 3: Apply below steps to each population dataset. I will include many more steps, but for brevity I only include a couple here
tab <- tab[!is.na(tab$min), ] # remove rows with NA values for min var
tab$min05 <- tab$min * 0.5 # create new var
a1_new <- tab # save as new edited dataset
## STEP 4: Repeat STEP 1-3 for each dataset.
tab <- b2
tab$pop <- "b"
tab$scenario <- "2"
tab <- tab[!is.na(tab$min), ]
tab$min05 <- tab$min * 0.5
b2_new <- tab
## STEP 5: Concatenate the edited population datasets
dt0 <- rbind(a1_new, b2_new); dt0
## STEP 6: Create summary statistics table
sumTbl <- summaryBy(min + min05 ~ pop + scenario, data = dt0,
FUN = function(x) { c(
min = min(x),
median = median(x),
mean = mean(x),
max=max(x)
) } )
sumTbl
Below is my attempt to create a function and apply the function over a list of the datasets.
my.list = list(a1, b2)
myfxn <- function(duck){
tab <- as.data.frame(duck)
tab$pop <- substr(deparse(substitute(duck)), 1, 1) # DOES NOT WORK
tab$scenario <- substr(deparse(substitute(duck)), 2, 2) # DOES NOT WORK
tab <- tab[!is.na(tab$min), ] # remove rows with NA values for min var
tab$min05 <- tab$min * 0.5 # create new var
return(tab)
}
all.lst <- lapply(my.list, myfxn)
However, there are several issues with my approach:
I am not able to properly extract the correct characters from dataset name to create the pop and scenario vars
The resulting "all.lst" is a list of data frames and I'm not sure how to combine them (in order to run the summaryBy function).
Could I add the summary stat function into myfxn and then run a separate ldply function to combine the results?
Many thanks in advance for your help. I've already searched many posts, but apologies in advance if I've missed something!
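One possible approach, sketched here with base R only: inside lapply the function sees an expression like X[[i]] rather than the original object name, so deparse(substitute(duck)) cannot recover "a1" or "b2". Naming the list elements and passing the name in as a second argument avoids this. The helper name process_one is hypothetical, and plain data frames stand in for the tibbles.

```r
# Example datasets from the question, rebuilt as plain data frames
a1 <- data.frame(min = c(10:14, NA), count = c(10, 25, 0, 29, 36, 5))
b2 <- data.frame(min = c(8:11, NA, NA), count = c(10, 5, 0, 23, 36, 5))

# Name the list elements so each dataset name is available as a string
my.list <- list(a1 = a1, b2 = b2)

# process_one is a hypothetical helper: it takes the data and its name
process_one <- function(dat, nm) {
  dat <- as.data.frame(dat)
  dat$pop <- substr(nm, 1, 1)        # first character, e.g. "a"
  dat$scenario <- substr(nm, 2, 2)   # second character, e.g. "1"
  dat <- dat[!is.na(dat$min), ]      # remove rows with NA in min
  dat$min05 <- dat$min * 0.5         # create new variable
  dat
}

# Map() passes each dataset together with its name
all.lst <- Map(process_one, my.list, names(my.list))

# Stack the list of data frames into one data frame
dt0 <- do.call(rbind, all.lst)
rownames(dt0) <- NULL
dt0
```

The combined dt0 can then be passed to summaryBy exactly as in STEP 6 above.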

for loop to make new variables in r

I want to create 9 new variables called bank1, bank2, through bank9. These will be the column names. The values will be a full column of 1 for bank1, 2 for bank2, and so on. I was reading about loops and I have code that does the loop, but I do not know how to store these values. This is what I have so far. I want to add these columns to the Subs data frame.
set.seed(3)
Subs <- data.frame(value = rnorm(10, 0, 1))
for(i in 1:9){
Subs <- assign(paste("bank", i, sep = ""), i)
}
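A minimal sketch of one way to do this. The loop above does not work because assign() creates a separate object named bank1, bank2, ... and returns the value i, so Subs itself is overwritten with a single number. Building the column name as a string and assigning into the data frame avoids that.

```r
set.seed(3)
Subs <- data.frame(value = rnorm(10, 0, 1))

for (i in 1:9) {
  # Build the column name as a string and assign into the data frame;
  # the single value i is recycled to fill every row
  Subs[[paste0("bank", i)]] <- i
}

head(Subs, 3)
```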

Writing a for loop with the output as a data frame in R

I am currently working my way through the book 'R for Data Science'.
I am trying to solve this exercise question (21.2.1 Q1.4) but have not been able to determine the correct output before starting the for loop.
Write a for loop to:
Generate 10 random normals for each of μ= −10, 0, 10 and 100.
Like the previous questions in the book I have been trying to insert into a vector output but for this example, it appears I need the output to be a data frame?
This is my code so far:
values <- c(-10,0,10,100)
output <- vector("double", 10)
for (i in seq_along(values)) {
output[[i]] <- rnorm(10, mean = values[[i]])
}
I know the output is wrong but am unsure how to create the format I need here. Any help much appreciated. Thanks!
There are many ways of doing this. Here is one. See inline comments.
set.seed(357) # to make things reproducible, set random seed
N <- 10 # number of loops
xy <- vector("list", N) # create an empty list into which values are to be filled
# run the loop N times and on each loop...
for (i in 1:N) {
# generate a data.frame with 4 columns, and add a random number into each one
# random number depends on the mean specified
xy[[i]] <- data.frame(um10 = rnorm(1, mean = -10),
u0 = rnorm(1, mean = 0),
u10 = rnorm(1, mean = 10),
u100 = rnorm(1, mean = 100))
}
# result is a list of data.frames with 1 row and 4 columns
# you can bind them together into one data.frame using do.call
# rbind means they will be merged row-wise
xy <- do.call(rbind, xy)
         um10         u0       u10      u100
1  -11.241117 -0.5832050 10.394747 101.50421
2   -9.233200  0.3174604  9.900024 100.22703
3  -10.469015  0.4765213  9.088352  99.65822
4   -9.453259 -0.3272080 10.041090  99.72397
5  -10.593497  0.1764618 10.505760 101.00852
6  -10.935463  0.3845648  9.981747 100.05564
7  -11.447720  0.8477938  9.726617  99.12918
8  -11.373889 -0.3550321  9.806823  99.52711
9   -7.950092  0.5711058 10.162878 101.38218
10  -9.408727  0.5885065  9.471274 100.69328
Another way would be to pre-allocate a matrix, add in values and coerce it to a data.frame.
xy <- matrix(NA, nrow = N, ncol = 4)
for (i in 1:N) {
xy[i, ] <- rnorm(4, mean = c(-10, 0, 10, 100))
}
# note that I set the column names after the fact (post festum)
colnames(xy) <- c("um10", "u0", "u10", "u100")
xy <- as.data.frame(xy)
As this is a learning question I will not provide the solution directly.
> values <- c(-10,0,10,100)
> for (i in seq_along(values)) {print(i)} # Checking we iterate by position
[1] 1
[1] 2
[1] 3
[1] 4
> output <- vector("double", 10)
> output # Checking the place where the output will be
[1] 0 0 0 0 0 0 0 0 0 0
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
Error in output[[i]] <- rnorm(10, mean = values[[i]]) :
more elements supplied than there are to replace
As you can see, the error says there are more elements to insert than there is space for: each iteration generates 10 random numbers (40 in total) and you only have 10 slots. Consider using a data structure that can store several values for each iteration.
So that:
> output <- ??
> for (i in seq_along(values)) { # Testing the full code
+ output[[i]] <- rnorm(10, mean = values[[i]])
+ }
> output # Should have length 4 and each element all the 10 values you created in the loop
# set the number of rows
rows <- 10
# vector with the values
means <- c(-10,0,10,100)
# generating output matrix
output <- matrix(nrow = rows,
ncol = 4)
# setting seed and looping through the number of rows
set.seed(222)
for (i in 1:rows){
output[i,] <- rnorm(length(means),
mean=means)
}
#printing the output
output
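For completeness, here is a sketch that stays close to the code in the question and changes only the output container to a list, so each slot can hold all 10 draws for one mean. The column names are borrowed from the first answer, and the seed is arbitrary.

```r
values <- c(-10, 0, 10, 100)

# A list can hold a length-10 vector in each slot
output <- vector("list", length(values))

set.seed(1)
for (i in seq_along(values)) {
  output[[i]] <- rnorm(10, mean = values[[i]])
}

# Optional: name the elements and convert to a data frame (10 rows, 4 columns)
names(output) <- c("um10", "u0", "u10", "u100")
output_df <- as.data.frame(output)
dim(output_df)
```

Here each mean gets its own column, whereas the matrix versions above fill one row per iteration; both end up with the same 10 by 4 shape.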

Using a for variable in column names to be added to a data frame in R

I have a for loop in R that I want to create 10 different variables in a data frame named rand1, rand2, rand3, etc... Here is what I tried first:
for (rep in 1:10) {
assign(paste('alldata200814$rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
}
but that doesn't work - no error/warning message so I don't know why but when I try to submit
alldata200814$rand1
it says it is NULL.
So then I changed the for loop to:
for (rep in 1:10) {
assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
}
and it creates the variables rand1 - rand10, but now I want to attach them to my data frame. So I tried:
for (rep in 1:10) {
assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
alldata200814 <- cbind(alldata200814, paste('rand', rep, sep=""))
}
but that just creates columns with 'rand1', 'rand2', 'rand3', etc... in every row. Then I got really close by doing this:
for (rep in 1:10) {
data<-assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
alldata200814 <- cbind(alldata200814, data)
}
but that names all 10 columns of random numbers "data" when I want them to be "rand1", "rand2", "rand3", etc... and I'm not sure how to rename them within the loop. I previously had this programmed in 10 different lines like:
alldata200814$rand1<-runif(nrow(alldata200814), 0, 1)
but I may have to do this 100 times instead of only 10 so I need to find a better way to do this. Any help is appreciated and let me know if you need more information. Thanks!
for (i in 1:10){
alldata200814[,paste0("rand",i)] <- runif(nrow(alldata200814), 0, 1)
}
Stop using assign. Period. And until you're confident that you'll know when to use it, distrust anyone telling you to use it.
The other idiom that is important to know (and far preferable to anything involving assign) is that you can create objects and then modify the names after the fact. For instance,
new_col <- matrix(runif(nrow(alldata200814) * 10,0,1),ncol = 10)
alldata200814 <- cbind(alldata200814,new_col)
And now you can alter the column names in place using names(alldata200814) <- column_names. You can even use subsetting to only assign to specific column names, like this:
df <- data.frame(x = 1:5,y = 1:5)
names(df)[2] <- 'z'
> df
  x z
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
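Putting the two ideas together for the rand1 to rand10 case, a sketch might look like this. The small data frame is a made-up stand-in for alldata200814, and n_new is a hypothetical variable for the number of columns to add.

```r
# Hypothetical stand-in for alldata200814
alldata200814 <- data.frame(id = 1:6)

n_new <- 10
new_col <- matrix(runif(nrow(alldata200814) * n_new, 0, 1), ncol = n_new)

# Name the new columns before binding, so cbind carries the names over
colnames(new_col) <- paste0("rand", seq_len(n_new))
alldata200814 <- cbind(alldata200814, new_col)

names(alldata200814)
```

Changing n_new to 100 is all it takes to scale this up, with no loop and no assign.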
