Looping through dataframes in R

I am trying to loop through all the rows of a column in a data frame. I read in the CSV using data.table. I am new to R and was wondering how I would go about doing something like this:
for i in row_2_of_dataframe:
    if i == 0:
        # Do something to that value
    else:
        # Leave it the way it is
Any help would be great.

I would recommend using the ifelse() function. For example:
mydf$column_name <- ifelse(mydf$column_name == 0, "do something", mydf$column_name)
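A minimal, self-contained sketch of the ifelse() approach; the data frame mydf and its values are made up for illustration. ifelse() tests every element of the column at once, so no explicit loop is needed. One thing to watch: the result takes a single common type, so replacing zeros with a string such as "do something" turns a numeric column into a character one.

```r
# Hypothetical example data: a numeric column containing some zeros
mydf <- data.frame(column_name = c(0, 3, 0, 7))

# Vectorised replacement: zeros become -1, every other value is kept
mydf$column_name <- ifelse(mydf$column_name == 0, -1, mydf$column_name)
# mydf$column_name is now c(-1, 3, -1, 7)

# Mixing a character replacement with numeric values coerces the whole result to character
as_text <- ifelse(c(0, 3, 0, 7) == 0, "do something", c(0, 3, 0, 7))
# as_text is c("do something", "3", "do something", "7")
```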

frame <- data.frame(x = rep("bye", 11),
                    y = as.character(0:10),
                    stringsAsFactors = FALSE)
for (i in seq_along(frame[, 2])) {
  if (frame[, 2][i] == 0) {
    frame[, 2][i] <- "hi"
  }
}
You don't really need an else statement: values that fail the test are simply left as they are.
Furthermore, frame[, 2] selects the second column and returns it as a vector, frame[, 1] would select the first column, and frame[1, ] would select the first row (as a one-row data frame). And so on.
Cheers.
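For comparison, a sketch of the same replacement without a loop, using logical indexing (a swapped-in technique, not part of the answer above). It rebuilds the example frame so it runs on its own; the names looped and vectorised are just for illustration.

```r
# Rebuild the example data frame from the answer
frame <- data.frame(x = rep("bye", 11),
                    y = as.character(0:10),
                    stringsAsFactors = FALSE)

# Loop version, as in the answer
looped <- frame
for (i in seq_len(nrow(looped))) {
  if (looped[i, 2] == "0") {
    looped[i, 2] <- "hi"
  }
}

# Vectorised version: build one logical index and assign to all matching rows at once
vectorised <- frame
vectorised[vectorised[, 2] == "0", 2] <- "hi"

identical(looped$y, vectorised$y)  # TRUE: both approaches give the same column
```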

Related

Create new columns in data table with for loop

Is there a simple way to create columns within a for loop? I know this question has been asked here multiple times, and I have tried this solution adjusted to my case:
for (i in 1:100) {
  eval(parse(text = paste0('a$colum', i, ' <- whatever_you_want_your_column_to_contain')))
}
from one of the posts, but it did not help. I have an existing data table, data, and I am trying to create columns P_1 to P_30 within a for loop and then assign them NULL (I am just trying to pre-define the columns). I have tried this:
for (i in 1:30) {
  eval(parse(text = paste0('data$P_', i, ' <- NULL')))
}
but without any success. Can you please suggest any approach that would work?
A related question: if I again have columns P_i, where i goes from 1 to 30, how do I refer to data$P_i within a loop?
Edit:
Here is an example data table:
library(data.table)
customer_id <- c("1", "1", "1", "2", "2", "2", "2", "3", "3", "3")
account_id <- as.character(c(11, 11, 11, 55, 55, 55, 55, 38, 38, 38))
obs_date <- as.Date(c("2017-01-01", "2017-02-01", "2017-03-01",
                      "2017-12-01", "2018-01-01", "2018-02-01",
                      "2018-03-01", "2018-04-01", "2018-05-01",
                      "2018-06-01"), "%Y-%m-%d")
variable <- c(87, 90, 100, 120, 130, 150, 12, 13, 15, 14)
data <- data.table(customer_id, account_id, obs_date, variable)
I found out that the problem really lies in assigning NULL to those columns, because when I do this following the post's advice:
for (i in 1:30) {
  eval(parse(text = paste0('data$P_', i, ' <- 1')))
}
it works, but with NULL instead of 1 it does not. So the advice is not bad; it just does not work with NULL.
Here's a data.table answer. I think you were close; you just didn't have the right syntax to append a column to a data table:
for (i in 1:30) {
  data[, paste0("P_", i) := "whatever_you_want_your_column_to_contain"]
}
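On the NULL problem: in R, assigning NULL to a data frame column removes that column (and does nothing if the column does not exist yet), which is why the NULL version of the loop appears to have no effect. A minimal base-R sketch, using a plain data.frame with made-up values rather than the data.table from the question, pre-defines the columns with NA instead and shows how to refer to P_i inside a loop with [[ ]] and a name built as a string.

```r
# Plain data.frame standing in for the question's data table (values made up)
data <- data.frame(variable = c(87, 90, 100))

# Assigning NULL deletes a column; for columns that do not exist it is a no-op
for (i in 1:30) data[[paste0("P_", i)]] <- NULL
n_after_null <- ncol(data)  # still 1: no columns were created

# Pre-define P_1 ... P_30 with NA (recycled to every row) instead of NULL
for (i in 1:30) data[[paste0("P_", i)]] <- NA

# Refer to column P_i inside another loop by building its name as a string
for (i in 1:30) {
  col <- paste0("P_", i)
  data[[col]] <- data$variable * i
}
# data$P_2 is now c(174, 180, 200)
```

With a data.table, the by-reference equivalents are data[, (paste0("P_", 1:30)) := NA] or set(data, j = col, value = NA) inside the loop, and get(col) reads a column whose name is held in a string inside j.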

How to create a sorted vector in R

I have a list of elements in random order. I want to read each element one at a time and insert it into another list in sorted order. How can I do this in R? I tried the code below.
lst = list()
x = c(2, 3, 1, 4, 5)
for (i in 1:length(x)) ## for reading the elements from x
{
  if (lst == NULL)
  {
    lst = x[i]
  }
  else
  {
    lst = x[i]
    print(lst)
    for (k in 2:length(lst)) ## For sorting the elements in a list
    {
      value = lst[k]
      j = k - 1
      while (j >= 1 && lst[j] > value)
      {
        lst[j + 1] = lst[j]
        j = j - 1
      }
      lst[j + 1] = value
    }
  }
  print(lst)
}
But I get the error:
Error in if (lst == NULL) { : argument is of length zero
For big datasets with lots of columns, you can use do.call:
df1 <- df[do.call(order, df), ]
To check, order by naming the columns explicitly:
df2 <- df[with(df, order(V1, V2, V3, V4)), ]
identical(df1, df2)
#[1] TRUE
If you need to order in the reverse direction:
df[do.call(order, c(df, decreasing = TRUE)), ]
Example data:
set.seed(24)
df <- as.data.frame(matrix(sample(letters, 10 * 4, replace = TRUE), ncol = 4))
First off, as commenters have pointed out, you could use sort() or order(). But I believe you are trying to solve an assignment.
Your problem is the test lst == NULL. Try executing this in a console:
lst <- list()
lst == NULL
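The last line evaluates to a zero-length logical vector (logical(0)), which if() cannot interpret as either TRUE or FALSE. Instead you are interested in
is.null(lst)
which will return TRUE or FALSE. Note, however, that an empty list is not NULL: is.null(list()) is FALSE, so to test whether lst is still empty, use length(lst) == 0 (or start from lst <- NULL).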
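Since this looks like an insertion-sort exercise, here is a minimal sketch of the intended approach with two changes to the original code: the sorted result starts as an empty numeric vector (numeric(0)) rather than list(), so no NULL test is needed, and each new element is appended and then shifted left into place instead of overwriting lst with x[i]. The helper name insert_sorted is hypothetical.

```r
# Insert one value into an already sorted vector, keeping it sorted
insert_sorted <- function(sorted, value) {
  sorted <- c(sorted, value)      # grow by one slot at the end
  j <- length(sorted) - 1         # last index of the old, already sorted part
  # Shift larger elements one position to the right (&& stops before sorted[0])
  while (j >= 1 && sorted[j] > value) {
    sorted[j + 1] <- sorted[j]
    j <- j - 1
  }
  sorted[j + 1] <- value          # drop the new value into the gap
  sorted
}

x <- c(2, 3, 1, 4, 5)
lst <- numeric(0)                 # empty numeric vector, not list()
for (v in x) {                    # read the elements of x one at a time
  lst <- insert_sorted(lst, v)
  print(lst)
}
# lst is now c(1, 2, 3, 4, 5)
```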

R: How to convert from loops and rbinds to efficient code?

I'm new to R. I have a problem to solve, and a working function below that solves it nicely (in decent time). But from what I'm reading in R tutorials and here on SO, I feel like I'm doing way too much work to solve it. Is there some fancy R way to collapse this all into a few lines?
The problem to solve: given a CSV file of character data and a "flag" argument, extract the value at position [row, 1]. Within each unique value of "InterestingColumn", "row" is the row with the minimum value of column 12 for flag "a", the row with the maximum value of column 12 for flag "b", or the n-th row for a numeric flag n. The output should be grouped by the unique values of "InterestingColumn", and the returned result should be a data frame. The column schema is known, but the length of the file is not.
My instinct is that I should be able to get rid of the for loop altogether, and also that rebuilding the matrix with rbind on every iteration is inefficient. Any tutelage would be appreciated, thanks!
myfunc <- function(flag = "a") {
  csv <- read.csv("data.csv", colClasses = "character")
  col <- unique(csv$InterestingColumn)
  output <- NULL
  for (i in seq_along(col)) {
    sub <- subset(csv, InterestingColumn == col[i])
    vals <- as.numeric(sub[, 12])
    if (flag == "a") {
      output <- rbind(output, matrix(c(sub[which.min(vals), 1], col[i]), ncol = 2))
    } else if (flag == "b") {
      output <- rbind(output, matrix(c(sub[which.max(vals), 1], col[i]), ncol = 2))
    } else if (is.numeric(flag)) {
      output <- rbind(output, matrix(c(sub[flag, 1], col[i]), ncol = 2))
    }
  }
  # Name and convert once, after all groups have been collected
  colnames(output) <- c("data", "col")
  as.data.frame(output)
}
Say that column 12 is named Col12. Then aggregate may be in order. Everything after the read.csv call in the function should be handled by the following expression (but you may want to set the names of the resulting data frame):
aggregate(Col12 ~ InterestingColumn, data = csv, FUN = function(x) {
  x <- as.numeric(x)  # the file was read with colClasses = "character"
  if (flag == "a") {
    min(x)
  } else if (flag == "b") {
    max(x)
  } else if (is.numeric(flag)) {
    x[flag]
  }
})
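One caveat about the aggregate() version: it returns the Col12 value itself (the minimum, maximum or n-th value), whereas the original function returns the column-1 value from the selected row. A minimal base-R sketch that keeps the original semantics uses split() and lapply() and binds the pieces once with do.call(rbind, ...) instead of growing a matrix inside the loop. The function name pick_per_group, the column name Col12 and the toy data are assumptions for illustration.

```r
pick_per_group <- function(csv, flag = "a") {
  # One data frame per unique value of InterestingColumn
  groups <- split(csv, csv$InterestingColumn)
  rows <- lapply(names(groups), function(g) {
    sub <- groups[[g]]
    vals <- as.numeric(sub$Col12)  # columns were read as character
    idx <- if (identical(flag, "a")) {
      which.min(vals)
    } else if (identical(flag, "b")) {
      which.max(vals)
    } else if (is.numeric(flag)) {
      flag
    } else {
      stop("flag must be \"a\", \"b\" or a number")
    }
    data.frame(data = sub[idx, 1], col = g, stringsAsFactors = FALSE)
  })
  # Bind all per-group rows in a single step
  do.call(rbind, rows)
}

# Toy data standing in for data.csv (all columns character, as with colClasses = "character")
csv <- data.frame(id = c("r1", "r2", "r3", "r4", "r5"),
                  InterestingColumn = c("A", "A", "B", "B", "B"),
                  Col12 = c("10", "9", "3", "30", "7"),
                  stringsAsFactors = FALSE)
pick_per_group(csv, "a")  # data column: "r2", "r3"
pick_per_group(csv, "b")  # data column: "r1", "r4"
pick_per_group(csv, 2)    # data column: "r2", "r4"
```

The as.numeric() step matters: compared as strings, "10" sorts before "9", so min(c("10", "9")) would pick "10".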

Logical "Except" operator for If statements in R

FYI, I'm new to using R, so my code is likely quite clunky. I've done my homework on this but haven't been able to find an "Except" logical operator for R, and I really need something like that in my code. My input data is a .csv file containing integers and null values, with 12 columns and 1440 rows.
oneDayData <- read.csv("data.csv") # Loading data
oneDayMatrix <- data.matrix(oneDayData, rownames.force = NA) # turning data frame into a matrix
rowBefore <- data.frame(oneDayData[i-1, 10], stringsAsFactors = FALSE) # Creating a variable to be used in the if statement, represents cell before the cell in the loop
ctr <- 0 # creating a counter and zeroing it
for (i in 1:nrow(oneDayMatrix)) {
  if ((oneDayMatrix[i, 10] == -180) & (oneDayMatrix[i, 4] == 0)) { # Makes sure that there is missing data matched with a zero in activityIn
    impute1 <- replace(oneDayMatrix[, 10], oneDayMatrix[i, 10], rowBefore)
    ctr <- (ctr + 1) # Populating the counter with how many rows get changed
  }
  else {
    print("No data fit this criteria.")
  }
}
print(paste(ctr, "rows have been changed.")) # Printing the counter and number of rows that got changed
I would like to add some kind of EXCEPT condition to my if statement (or an equivalent) that says something like: apply the two existing conditions (see the if statement in the code) EXCEPT when oneDayMatrix[i-1, 4] > 0. I would really appreciate any help with this, and thank you in advance!
"Except" is equivalent to "if not". The "not" operator in R is !. So to add that oneDayMatrix[i-1, 4] > 0 exception, you just need to modify your if statement as follows:
if ((oneDayMatrix[i, 10] == -180) &
    (oneDayMatrix[i, 4] == 0) &
    !(oneDayMatrix[i-1, 4] > 0)) { ... }
or equivalently:
if ((oneDayMatrix[i, 10] == -180) &
    (oneDayMatrix[i, 4] == 0) &
    (oneDayMatrix[i-1, 4] <= 0)) { ... }
In addition, a couple of fixes need to be made to your code:
as I pointed out, rowBefore is not defined properly: it is defined in terms of i, which does not exist yet at that point. Inside your for loop, just replace rowBefore with oneDayMatrix[i-1, 10].
as @noah pointed out, you need to start your loop at the second index: for (i in 2:nrow(oneDayMatrix)).
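Putting these changes together, a minimal sketch of the corrected loop. Beyond the two fixes listed, it writes the imputed value back into the matrix (m[i, 10] <- m[i - 1, 10]) instead of calling replace() with a data value as the index, uses && for the scalar tests, and wraps each comparison in isTRUE() so that missing values (NA) do not make if() fail; treating an NA in the previous row as "not > 0" is an assumption. The function name impute_previous and the toy matrix are hypothetical.

```r
impute_previous <- function(m) {
  ctr <- 0
  # Start at row 2 so that row i - 1 always exists
  for (i in seq_len(nrow(m))[-1]) {
    missing_value <- isTRUE(m[i, 10] == -180)
    no_activity   <- isTRUE(m[i, 4] == 0)
    # The "except" part: skip rows whose previous activity value is > 0
    prev_active   <- isTRUE(m[i - 1, 4] > 0)
    if (missing_value && no_activity && !prev_active) {
      m[i, 10] <- m[i - 1, 10]  # carry the previous value forward
      ctr <- ctr + 1
    }
  }
  list(matrix = m, changed = ctr)
}

# Toy 4 x 12 matrix: only columns 4 and 10 matter here
m <- matrix(0, nrow = 4, ncol = 12)
m[, 4]  <- c(3, 0, 0, 0)
m[, 10] <- c(5, -180, 8, -180)
res <- impute_previous(m)
print(paste(res$changed, "rows have been changed."))
# Column 10 becomes c(5, -180, 8, 8): row 2 is skipped because row 1 has activity 3
```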

R: create vector from nested for loop

I have a "hit list" of genes in a matrix. Each row is a hit, with the format "chromosome (character), start (number), stop (number)". I would like to see which of these hits overlap with genes in the fly genome, which is a matrix with the format "chromosome start stop gene".
I have the following function that works (prints a list of genes from column 4 of dmelGenome):
geneListBuild <- function(dmelGenome = '', hitList = '', binSize = '', saveGeneList = '')
{
  genomeColumns <- c('chr', 'start', 'stop', 'gene')
  genome <- read.table(dmelGenome, header = FALSE, col.names = genomeColumns)
  chr <- genome[, 1]
  startAdjust <- genome[, 2] - binSize
  stopAdjust <- genome[, 3] + binSize
  gene <- genome[, 4]
  genome <- data.frame(chr, startAdjust, stopAdjust, gene)
  hits <- read.table(hitList, header = TRUE)
  chrHits <- hits[hits$chr == "chr3R", ]
  chrGenome <- genome[genome$chr == "chr3R", ]
  genes <- c()
  for (i in 1:length(chrHits[, 1]))
  {
    for (j in 1:length(chrGenome[, 1]))
    {
      if (chrHits[i, 2] >= chrGenome[j, 2] && chrHits[i, 3] <= chrGenome[j, 3])
      {
        print(chrGenome[j, 4])
      }
    }
  }
  genes <- unique(genes[is.finite(genes)])
  print(genes)
  fileConn <- file(saveGeneList)
  write(genes, fileConn)
  close(fileConn)
}
However, when I substitute print() with:
genes[j] <- chrGenome[j,4]
R returns a vector that has some values that are present in chrGenome[,1]. I don't know how it chooses these values, because they aren't in rows that seem to fulfill the if statement. I think it's an indexing issue?
Also I'm sure that there is a more efficient way of doing this. I'm new to R, so my code isn't very efficient.
This is similar to the question "writing the results from a nested loop into another vector in R", but I couldn't fix it with the information in that thread.
Thanks.
I believe the inner loop could be replaced with:
gene.in <- chrHits[i, 2] >= chrGenome[, 2] & chrHits[i, 3] <= chrGenome[, 3]
Then you can use that logical vector to select what you want, for example chrGenome[gene.in, 4]. Doing
which(gene.in)
might also be of use to you: it returns the indices of the matching rows.
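A sketch that drops both loops, assuming chrHits and chrGenome are data frames laid out as in the question (start in column 2, stop in column 3, gene name in column 4). outer() compares every hit with every gene window at once, and a gene is kept if at least one hit lies inside it. The names are converted with as.character(): if the gene column was read as a factor (the read.table() default before R 4.0.0), copying single elements into genes[j] may store integer codes rather than names. Indexing genes by j also leaves NA gaps, and is.finite() is FALSE for character values, so genes[is.finite(genes)] would drop every gene name. The function name genes_for_hits and the toy data are hypothetical. The hits-by-genes matrix can get large; for genome-scale data, Bioconductor's GenomicRanges package (findOverlaps()) is built for this kind of query.

```r
genes_for_hits <- function(chrHits, chrGenome) {
  # contained[i, j] is TRUE when hit i lies inside gene window j
  contained <- outer(chrHits[, 2], chrGenome[, 2], ">=") &
               outer(chrHits[, 3], chrGenome[, 3], "<=")
  # Keep genes that contain at least one hit
  keep <- colSums(contained) > 0
  unique(as.character(chrGenome[keep, 4]))
}

# Hypothetical toy data on a single chromosome
chrGenome <- data.frame(chr = "chr3R",
                        startAdjust = c(100, 500, 900),
                        stopAdjust  = c(300, 700, 1200),
                        gene = c("g1", "g2", "g3"))
chrHits <- data.frame(chr = "chr3R",
                      start = c(150, 950, 400),
                      stop  = c(200, 1000, 450))
genes_for_hits(chrHits, chrGenome)  # "g1" "g3": the third hit fits no window
```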
