I want to create 9 new variables called bank1, bank2, through bank9. These will be the column names. Each column should be filled with a single value: 1 for bank1, 2 for bank2, and so on. I was reading about loops and I have code that does the loop, but I do not know how to store these values. This is what I've got so far. I want to add these columns to the Subs data frame.
set.seed(3)
Subs <- data.frame(value = rnorm(10, 0, 1))
for(i in 1:9){
Subs <- assign(paste("bank", i, sep = ""), i)
}
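One way this could be done, as a minimal sketch rather than the only approach: build each column name as a string with paste0() and assign into the data frame by name. assign() creates a free-standing object and returns its value, so `Subs <- assign(...)` overwrites Subs with the number i instead of adding a column.

```r
set.seed(3)
Subs <- data.frame(value = rnorm(10, 0, 1))

for (i in 1:9) {
  # Build the column name ("bank1", "bank2", ...) and assign by name;
  # the single value i is recycled to fill every row.
  Subs[[paste0("bank", i)]] <- i
}

str(Subs)
```

After the loop, Subs has the original value column plus bank1 through bank9.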
Like the title says, I wish to use lapply instead of a for loop to parse data from a data frame and put it into an empty data frame. My motivation is that the data frame I'm parsing contains thousands of genes and I've read that the apply functions are faster at iterating through large tables.
### My data table ###
rawCounts <- data.frame(ensembl_gene_id_version = c('ENSG00000000003.15', 'ENSG00000000005.6', 'ENSG00000000419.14'),
HS1 = c(1133, 0, 1392),
HS2 = c(900, 0, 1155),
HS3 = c(1251, 0, 2011),
HS4 = c(785, 0, 1022),
stringsAsFactors = FALSE)
## Function
extract_counts <- function(df, esdbid){
counts <- data.frame()
plyr::ldply(esdbid, function(i) {counts <- df[grep(pattern = i, x = df),] %>% rbind()})
return(counts)
}
## Call the first one
extract_counts(df = rawCounts, esdbid = c('ENSG00000000003.15'))
I want this to return a data frame, so I used the plyr::ldply function from this post - Extracting outputs from lapply to a dataframe
However, it isn't returning anything. Eventually I want to scale up my esdbid vector to include multiple values; such as any combination of gene IDs to quickly retrieve the gene counts.
Strangely, when I run this in the console it appears to work as intended for a vector of length 1, i.e.:
esdbid <- 'ENSG00000000003.15'
plyr::ldply(esdbid, function(i) {counts <- rawCounts[grep(pattern = i, x = rawCounts),] %>% rbind()})
Returns a data frame with the correct values. However, when I increase the length of the vector it returns only the first value for each row. For example if esdbid <- c('ENSG00000000003.15', 'ENSG00000000005.6', 'ENSG00000000419.14') then the console code will return the values for ENSG00000000003.15 three times.
Maybe subset can handle this more effectively?
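extract_counts <- function(.data, esdbid) {
  # Match against the ID column. Calling grepl() on the whole data frame
  # tests one string per column, not one per row.
  subset(.data, grepl(esdbid, ensembl_gene_id_version, fixed = TRUE))
}
esdbid <- "ENSG00000000003.15"
rawCounts |> extract_counts(esdbid)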
Then you can use lapply if you want a list with all dataframe subsets:
lapply(
  unique(rawCounts$ensembl_gene_id_version),
  function(id) { rawCounts |> extract_counts(id) }
)
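If the goal is to pull out several exact gene IDs at once, a loop or lapply may not be needed at all. The sketch below assumes the IDs should match exactly (not as patterns) and uses %in% on the ID column; the helper name extract_counts_exact is made up for this example.

```r
rawCounts <- data.frame(
  ensembl_gene_id_version = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14"),
  HS1 = c(1133, 0, 1392),
  HS2 = c(900, 0, 1155),
  HS3 = c(1251, 0, 2011),
  HS4 = c(785, 0, 1022),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose ID is in the requested vector.
extract_counts_exact <- function(df, esdbid) {
  df[df$ensembl_gene_id_version %in% esdbid, , drop = FALSE]
}

hits <- extract_counts_exact(rawCounts, c("ENSG00000000003.15", "ENSG00000000419.14"))
print(hits)
```

This returns one data frame containing all requested genes, in the order they appear in rawCounts.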
Results = rep(0,15)
number <- 1:500
for(i in 1:100) {
Results[i] <- sum(sample(number, 20, replace = F))/20}
I wish to use sample sizes of 20, 10, and 15 in the same loop. How can I do it in a single loop?
You can create a loop within a loop.
Continuing from what you started is awkward, because making sure each value goes into the correct cell of a pre-allocated structure takes some bookkeeping. I would instead start with a data frame with 0 rows and use rbind to add the result of each iteration.
First create an empty data frame with 0 rows:
Results2 = data.frame(samplesize=c(), sum = c())
Set some parameters:
numbersToSampleFrom <- 1:500
samplesizes <- c(10, 20, 50)
NumberOfIterations <- 100
Loop it together:
for(s in seq_along(samplesizes)){
  for(i in 1:NumberOfIterations) {
    # Sample from numbersToSampleFrom (defined above) and store the sample mean
    Results2 <- rbind(Results2, data.frame(samplesize = samplesizes[s],
      sum = sum(sample(numbersToSampleFrom, samplesizes[s], replace = F))/samplesizes[s])
    )
  }
}
For each iteration it picks a sample size, does the sampling and creates a dataframe with one row, that you then bind to the Results2 dataframe.
There will be lots of ways to do this, this one is an easy modification from your code (I think).
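For comparison, here is a sketch of one of those other ways: build one data frame per sample size with lapply() and replicate(), then bind them once at the end, which avoids growing the data frame on every iteration. The seed and the name Results3 are assumptions added only for this example; the column is still called sum to match the code above, although it holds the sample mean.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
numbersToSampleFrom <- 1:500
samplesizes <- c(10, 20, 50)
NumberOfIterations <- 100

# One data frame per sample size; each holds NumberOfIterations sample means.
pieces <- lapply(samplesizes, function(s) {
  means <- replicate(NumberOfIterations,
                     mean(sample(numbersToSampleFrom, s, replace = FALSE)))
  data.frame(samplesize = s, sum = means)
})

# Bind all pieces in a single step.
Results3 <- do.call(rbind, pieces)
head(Results3)
```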
How can I append a column in a dataframe?
I'm iterating over my datamatrix and if some data agree with a threshold I've set, I want to store them in a 1-row dataframe so I can print at the end of the loop.
My code looks like this:
for (i in 1:nrow(my.data.frame)) {
# Store gene name in a variable and use it as row name for the 1-row dataframe.
gene.symbol <- rownames(my.data.frame)[i]
# init the dataframe to output
gene.matrix.of.truth <- data.frame(matrix(ncol = 0, nrow = 0))
for (j in 1:ncol(my.data.frame)) {
if (my.data.frame[i,j] < threshold) {
str <- paste(colnames(my.data.frame)[j], ';', my.data.frame[i,j], sep='')
# And I want to append this str to the gene.matrix.of.truth
# I tried gene.matrix.of.truth <- cbind(gene.matrix.of.truth, str) But didn't get me anywhere.
}
}
# Ideally I want to print the dataframe here.
# but, no need to print if nothing met my requirements.
if (ncol(gene.matrix.of.truth) != 0) {
write.table(gene.matrix.of.truth, paste('out_',gene.symbol,sep=''), row.names = T, col.names = F, sep='|', quote = F)
}
}
I do this sort of thing all the time, but with rows instead of columns. Start with
gene.matrix.of.truth = data.frame(x = character(0))
instead of the gene.matrix.of.truth <- data.frame(matrix(ncol = 0, nrow = 0)) you have at initiation. Your append step inside the for j loop will be
gene.matrix.of.truth = rbind(gene.matrix.of.truth, data.frame(x = str))
(i.e. create a dataframe around str and append it to gene.matrix.of.truth).
Obviously, your final if statement will be if(nrow(...)) instead of if(ncol(...)), and if you want the final table as one wide row you'll need t() to transpose your data frame at printing time.
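Put together, the approach might look like the sketch below. The toy data, the threshold value, and the collected list (kept only so the result can be inspected afterwards) are assumptions for illustration; the output is printed to the console rather than written to a file.

```r
# Toy data: values and threshold are made up for this sketch.
my.data.frame <- data.frame(s1 = c(0.2, 0.9), s2 = c(0.1, 0.8), s3 = c(0.7, 0.95),
                            row.names = c("geneA", "geneB"))
threshold <- 0.5
collected <- list()  # hypothetical: stores each printed row for later inspection

for (i in seq_len(nrow(my.data.frame))) {
  gene.symbol <- rownames(my.data.frame)[i]
  # Start each gene with an empty one-column data frame.
  gene.matrix.of.truth <- data.frame(x = character(0))
  for (j in seq_len(ncol(my.data.frame))) {
    if (my.data.frame[i, j] < threshold) {
      str <- paste(colnames(my.data.frame)[j], ";", my.data.frame[i, j], sep = "")
      # Wrap str in a data frame and append it as a new row.
      gene.matrix.of.truth <- rbind(gene.matrix.of.truth, data.frame(x = str))
    }
  }
  if (nrow(gene.matrix.of.truth) > 0) {
    # t() turns the one-column data frame into a one-row character matrix.
    one.row <- t(gene.matrix.of.truth)
    rownames(one.row) <- gene.symbol
    write.table(one.row, sep = "|", quote = FALSE, col.names = FALSE)
    collected[[gene.symbol]] <- one.row
  }
}
```

Only geneA has values below the threshold here, so only one row is printed.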
My data frame is called Subs.
My variables are REV_4, REV_5, REV_6 etc
I want to create new variables to calculate percentage change of revenue.
Eg: d.rev.5 <- Subs$REV_5/Subs$REV_4 - 1
I would like to use a loop to create these new variables. I've tried this:
for(i in 5:10){
Subs$d.data.[i] <- Subs$REV_[i]/Subs$REV_[i-1] - 1 }
But it doesn't work.
I suspect it's not recognizing the i as part of the variable name.
Is there any way to get around this? Thank you so much.
You can't reference columns the way you're attempting (Subs$REV_[i]); you need to build a string that represents the column name.
What I think you're trying to do is this (in the absence of your data I've created my own):
set.seed(123)
Subs <- data.frame(rev_1 = rnorm(10, 0, 1),
rev_2 = rnorm(10, 0, 1),
rev_3 = rnorm(10, 0, 1),
rev_4 = rnorm(10, 0, 1))
for(i in 2:4){
  ## looping over columns 2-4
  col1 <- paste0("rev_", i)
  col2 <- paste0("rev_", i - 1)
  col_new <- paste0("d.rev.", i)
  ## percentage change relative to the previous column
  Subs[, col_new] <- Subs[, col1] / Subs[, col2] - 1
}
## A note on subsetting a data.frame
Subs$rev_1 ## works
i <- 1
Subs$rev_[i] ## doesn't work
Subs[, rev_[i]] ## doesn't work
Subs[, "rev_1"] ## works
Subs[, paste0("rev_", i)] ## works
## because
paste0("rev_", i) ## creates the string:
[1] "rev_1"
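Applied to the naming in the question (REV_4 through REV_10), the same idea could look like the sketch below. The revenue values are randomly generated stand-ins, and `[[` is used in place of `[, ]`, which is an equivalent way to select a single column by name.

```r
set.seed(123)
# Made-up revenue data using the question's REV_4 ... REV_10 column names.
Subs <- as.data.frame(matrix(runif(70, 50, 150), nrow = 10))
names(Subs) <- paste0("REV_", 4:10)

for (i in 5:10) {
  current  <- paste0("REV_", i)
  previous <- paste0("REV_", i - 1)
  # Percentage change of revenue relative to the previous period.
  Subs[[paste0("d.rev.", i)]] <- Subs[[current]] / Subs[[previous]] - 1
}

names(Subs)
```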
I have a for loop in R that I want to create 10 different variables in a data frame named rand1, rand2, rand3, etc... Here is what I tried first:
for (rep in 1:10) {
assign(paste('alldata200814$rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
}
but that doesn't work - no error/warning message so I don't know why but when I try to submit
alldata200814$rand1
it says it is NULL.
So then I changed the for loop to:
for (rep in 1:10) {
assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
}
and it creates the variables rand1 - rand10, but now I want to attach them to my data frame. So I tried:
for (rep in 1:10) {
assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
alldata200814 <- cbind(alldata200814, paste('rand', rep, sep=""))
}
but that just creates columns with 'rand1', 'rand2', 'rand3', etc... in every row. Then I got really close by doing this:
for (rep in 1:10) {
data<-assign(paste('rand', rep, sep=""), runif(nrow(alldata200814), 0, 1))
alldata200814 <- cbind(alldata200814, data)
}
but that names all 10 columns of random numbers "data" when I want them to be "rand1", "rand2", "rand3", etc... and I'm not sure how to rename them within the loop. I previously had this programmed in 10 different lines like:
alldata200814$rand1<-runif(nrow(alldata200814), 0, 1)
but I may have to do this 100 times instead of only 10 so I need to find a better way to do this. Any help is appreciated and let me know if you need more information. Thanks!
for (i in 1:10){
alldata200814[,paste0("rand",i)] <- runif(nrow(alldata200814), 0, 1)
}
Stop using assign. Period. And until you're confident that you'll know when to use it, distrust anyone telling you to use it.
The other idiom that is important to know (and far preferable to anything involving assign) is that you can create objects and then modify the names after the fact. For instance,
new_col <- matrix(runif(nrow(alldata200814) * 10,0,1),ncol = 10)
alldata200814 <- cbind(alldata200814,new_col)
And now you can alter the column names in place using names(alldata200814) <- column_names. You can even use subsetting to only assign to specific column names, like this:
df <- data.frame(x = 1:5,y = 1:5)
names(df)[2] <- 'z'
> df
x z
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
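Combining the two steps for the rand columns might look like this sketch. The small alldata200814 below is a stand-in, since the real data frame isn't shown, and n_new and new_idx are names introduced just for this example.

```r
set.seed(42)
# Stand-in for alldata200814; the real data frame is not shown in the question.
alldata200814 <- data.frame(id = 1:5)

n_new <- 10
# Generate all random columns at once as a matrix, then bind them on.
new_col <- matrix(runif(nrow(alldata200814) * n_new, 0, 1), ncol = n_new)
alldata200814 <- cbind(alldata200814, new_col)

# Rename only the columns that were just appended.
new_idx <- (ncol(alldata200814) - n_new + 1):ncol(alldata200814)
names(alldata200814)[new_idx] <- paste0("rand", seq_len(n_new))

names(alldata200814)
```

Changing n_new to 100 is all it takes to scale this up.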