Nested loop in R: columns then rows

I am trying to write a nested for loop in R, but am running into problems. I have researched as much as possible but can't find (or understand) the help I need. I am fairly new to R, so any advice on this looping would be appreciated, or if there is a simpler, more elegant way!
I have generated a file of daily temperatures for many many locations (I'll call them sites), and the file columns are set up like this:
year month day unix_time site_a site_b site_c site_d ... on and on
For each site (within each column), I want to run through the temperature values and create new columns (or a new data frame) with a number (a physiological rate) that corresponds with a range of those temperatures. (for example, temperatures less than 6.25 degrees have a rate of -1.33, temperatures between 6.25 and 8.75 have a rate of 0.99, etc). I have created a loop that does this for a single column of data. For example:
for(i in 1:dim(data)[1]){
if (data$point_a[i]<6.25) data$rate_point_a[i]<--1.33 else
if (data$point_a[i]>=6.25 && data$point_a[i]<8.75) data$rate_point_a[i]<-0.99 else
if (data$point_a[i]>=8.75 && data$point_a[i]<11.25) data$rate_point_a[i]<-3.31 else
if (data$point_a[i]>=11.25 && data$point_a[i]<13.75) data$rate_point_a[i]<-2.56 else
if (data$point_a[i]>=13.75 && data$point_a[i]<16.25) data$rate_point_a[i]<-1.81 else
if (data$point_a[i]>=16.25 && data$point_a[i]<18.75) data$rate_point_a[i]<-2.78 else
if (data$point_a[i]>=18.75 && data$point_a[i]<21.25) data$rate_point_a[i]<-3.75 else
if (data$point_a[i]>=21.25 && data$point_a[i]<23.75) data$rate_point_a[i]<-1.98 else
if (data$point_a[i]>=23.75 && data$point_a[i]<26.25) data$rate_point_a[i]<-0.21
}
The above code gives me a new column called "rate_point_a" that has my physiological rates. What I am having trouble doing is nesting this loop into another loop that runs through all of the columns. I have tried things such as:
for (i in 1:ncol(data)){
#for each row in that column
for (s in 1:length(data)){
if ([i]<6.25) rate1[s]<--1.33 else ...
I guess I don't know how to make the "if else" statement refer to the correct places. I know that I can't add the "rate" columns onto the existing data frame, as this would increase my ncol as I go through the loop, so need to put them into another data frame (though don't think this is my main issue). I am going to have many many many points to work through and would rather not have to do them one at a time, hence my attempt at a nested loop.
Any help would be much appreciated. Here is a link to some sample data if that is helpful. http://dl.dropbox.com/u/17903768/AVHRR_output.txt Thanks in advance!

Use ifelse which is vectorized:
ifelse(data$point < 6.25, -1.33, ifelse(data$point < 8.75, 0.99, ifelse(data$point < 11.25, 3.31, ..... and so on until finished.
For instance:
datap <- read.table('http://dl.dropbox.com/u/17903768/AVHRR_output.txt', header = TRUE)
# apply() returns a matrix of rates, one column per site column
rates <- apply(datap[, 5:9], 2, function(x) {
  ifelse(x < 6.25, -1.33,
  ifelse(x < 8.75, 0.99,
  ifelse(x < 11.25, 3.31,
  ifelse(x < 13.75, 2.56,
  ifelse(x < 16.25, 1.81,
  ifelse(x < 18.75, 2.78,
  ifelse(x < 21.25, 3.75,
  ifelse(x < 23.75, 1.98, 0.21))))))))
})

Andres's answer is great for the apply part, which gets you through all the "temperature" columns. I'm stuck here without a copy of R (at work) to experiment with, but I suspect if you create a vector of your cutoff values
xcut <- c(0,6.25,8.75,11.25,...
and just do
x <- xcut[(which(x>xcut))]
you'll have a much simpler bit of code, and easier to edit as well. (note: I added the 0 value to avoid problems with small x values :-) )
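For what it's worth, here is a minimal sketch of the cutoff-vector idea using base R's cut(), not the exact indexing shown above. The cutoffs and rates come from the question. The -Inf lower bound, the helper name temp_to_rate and the small temps data frame are assumptions made for illustration. Temperatures of 26.25 or more come back as NA, since the question gives no rate for them.

```r
# Cutoffs and rates from the question; -Inf as the lower bound is an assumption
xcut  <- c(-Inf, seq(6.25, 26.25, by = 2.5))
rates <- c(-1.33, 0.99, 3.31, 2.56, 1.81, 2.78, 3.75, 1.98, 0.21)

# Hypothetical helper: map a vector of temperatures to rates
temp_to_rate <- function(x) {
  # right = FALSE makes each bin [lower, upper), matching the >= / < tests
  bin <- cut(x, breaks = xcut, right = FALSE, labels = FALSE)
  rates[bin]  # NA where x falls outside every bin
}

# Made-up data standing in for the site columns
temps <- data.frame(site_a = c(5, 7, 12), site_b = c(20, 24, 30))
rate_cols <- as.data.frame(lapply(temps, temp_to_rate))
names(rate_cols) <- paste0("rate_", names(temps))
```

Editing the rates or cutoffs then means changing two vectors rather than a chain of conditions.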

here's another way using just logicals:
DAT <- read.table("http://dl.dropbox.com/u/17903768/AVHRR_output.txt",header=TRUE,as.is=TRUE)
recodecolumn <- function(x){
out <- vector(length=length(x))
out[x < 6.25] <- -1.33
out[x >= 6.25 & x < 8.75] <- .99
out[x >= 8.75 & x < 11.25] <- 3.31
out[x >= 11.25 & x < 13.75] <- 2.56
out[x >= 13.75 & x < 16.25] <- 1.81
out[x >= 16.25 & x < 18.75] <- 2.78
out[x >= 18.75 & x < 21.25] <- 3.75
out[x >= 21.25 & x < 23.75] <- 1.98
out[x >= 23.75 & x < 26.25] <- 0.21
out
}
NewCols <- apply(DAT[,5:9],2,recodecolumn)
colnames(NewCols) <- paste("rate",1928:1932,sep="_")
DAT <- cbind(DAT,NewCols)

I find that findInterval is useful in situations like this instead of nested if else statements as it is already vectorized and returns the position within a vector of cutoff points.
DAT <- read.table("http://dl.dropbox.com/u/17903768/AVHRR_output.txt",header=TRUE,as.is=TRUE)
recode.fn <- function(x){
cut.vec <- c(-Inf, seq(6.25,26.25,by = 2.5),Inf) # -Inf so sub-zero temperatures land in the first bin
recode.val <- c(-1.33, 0.99, 3.31, 2.56,1.81,2.78,3.75,1.98, 0.21)
cut.interval <- findInterval(x, cut.vec, FALSE)
return(recode.val[cut.interval])
}
# Add on recoded data to existing data frame
DAT[,10:14] <- sapply(DAT[,5:9],FUN=recode.fn)

Related

Calculate elements of time series for each variable (loop?)

I need to calculate each component of the time series for each X (50 levels) and Y (80 levels) from my dataset (df).
I wanted to go with something akin to the code below, where I tried to just get the seasonality. If I can get this it should be the same for the trend and random component of the decompose.
P <- df$X
for(y in 1:length(P)) {
OneP <- P[y]
AllS <- unique(df$Y[df$X== OneP])
for(i in 1:length(AllS)) {
OneS<- AllS[i]
df$TS[df$Y == OneS & df$X== OneP] <- ts(df$Mean[df$Y == OneS & df$X
== OneP], start = c(1999, 1), end = c(2015, 12), frequency = 12)
df$Dec[df$Y == OneS & df$X== OneP] <- decompose(ts(df$TS[df$Y == OneS &
df$X== OneP], frequency = 12), type = c("additive"))
df$Decomposition_seasonal[df$Y == OneS & df$X== OneP] <- df$Dec([df$Y == OneS & df$X== OneP], Dec$seasonal)
}
But this is not working. Error message is:
Error: attempt to apply non-function
I understand that the problem might come from my attempt to put decomposition output in a column. But how else to do it? Make a new dataset for every dev in every X * Y combination?
I know that the first lines of the code work as I used it before for something else. And I know this will run and give me TS and decomposition. It's the individual components bit that I am struggling with. Any advice is deeply appreciated.
Similar data:
X Y Mean Date(mY)
Tru A 35.6 02.2015
Fle A 15 05.2010
Srl C 67.1 05.1999
Tru A 13.2 08.2006
Srl B 89 08.2006
Tru B 14.8 12.2001
Fle A 21.5 11.2001
Lub D 34.8 03.2000
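The "attempt to apply non-function" error most likely comes from `df$Dec(...)`, which R parses as a call to a function named Dec. One possible way around storing decompose() output in a column is to split the data by X and Y, decompose each piece, and bind the components back together. The sketch below uses made-up data in place of df and assumes each X/Y combination has one row per month, already in date order, with no gaps.

```r
# Made-up stand-in for df: two X/Y combinations, monthly means for 1999-2015
set.seed(1)
n_months <- 17 * 12
df <- data.frame(
  X = rep(c("Tru", "Fle"), each = n_months),
  Y = rep(c("A", "B"), each = n_months),
  Mean = rnorm(2 * n_months, mean = 30, sd = 5) +
    rep(10 * sin(2 * pi * (1:12) / 12), times = 2 * 17)
)

# One data frame per X/Y combination that actually occurs
groups <- split(df, list(df$X, df$Y), drop = TRUE)

decomposed <- lapply(groups, function(g) {
  series <- ts(g$Mean, start = c(1999, 1), frequency = 12)
  parts <- decompose(series, type = "additive")
  # Keep each component as an ordinary numeric column
  data.frame(X = g$X, Y = g$Y, Mean = g$Mean,
             seasonal = as.numeric(parts$seasonal),
             trend = as.numeric(parts$trend),
             random = as.numeric(parts$random))
})

result <- do.call(rbind, decomposed)
```

The trend and random columns are NA for the first and last six months of each series, because decompose() estimates the trend with a centred moving average.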

Efficient way to bin data ranges in R

I have several hundred variables in a data frame which need to be binned into buckets.
Currently, I'm using code similar to the following:
idx <- list()
idx[[1]] <- which(df$myVariable < 628 & df$myVariable >= 0)
idx[[2]] <- which(df$myVariable < 774 & df$myVariable >= 628)
idx[[3]] <- which(df$myVariable < 885 & df$myVariable >= 774)
idx[[4]] <- which(df$myVariable <= Inf & df$myVariable >= 4819)
idx[[5]] <- which(df$myVariable < 0)
df$myVariable[idx[[1]]] = 1
df$myVariable[idx[[2]]] = 2
df$myVariable[idx[[3]]] = 3
df$myVariable[idx[[4]]] = 4
df$myVariable[idx[[5]]] = 0
In reality, there are 21 ranges of values for each of the variables, and the cut points may vary between the variables. So, in full, this code is over 30,000 lines long (I have a script which generates it).
Is there a better way to represent this code? Ideally it would make use of dplyr, since I intend to run this code in sparklyr, but if that is not possible, native R code is fine (thanks to the spark_apply function).
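One way this could be shortened in plain R is to keep the cut points in a named list and loop over the variables with findInterval(). This is only a sketch: the column names and break values below are made up, and with these breaks everything at or above the last cut point goes into the top bin. Being plain R, it would presumably have to go through spark_apply under sparklyr, as the question already anticipates.

```r
# Hypothetical data and per-variable cut points (names are illustrative only)
df <- data.frame(var_a = c(-5, 100, 700, 800, 5000),
                 var_b = c(0, 10, 25, 40, -1))
breaks <- list(var_a = c(0, 628, 774, 885),
               var_b = c(0, 10, 20, 30))

for (v in names(breaks)) {
  # findInterval() returns 0 below the first break, then 1, 2, ... for each
  # left-closed interval [break_k, break_k+1)
  df[[v]] <- findInterval(df[[v]], breaks[[v]])
}
```

Values below the first break get 0, which matches the `< 0` case in the code above.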

For Loop with If/If Else statement with simple math functions need help storing

I am trying to create a simple For Loop that will run through a column of numbers in my dataframe and perform three different simple math functions on the column of numbers based on simple conditions.
Basically, if my RiB value is greater than the positive constant or less than the negative constant (pos_RiB <- 0.011, neg_RiB <- -0.011), I want the function I wrote to do the math on the RiB value and store the result; if the RiB value falls in between the constants, I want it stored as it is.
Here is my code:
#Set empty storage vector
vec <- vector()
#Set positive and negative RiB constants for loop
#Create functions for different atmospheric stability conditions
RiB <- sp3_join_a$RiB
Unstable <- function(RiB){
(1-(16*RiB))^0.75
}
Stable <- function(RiB){
(1-(5*RiB))^2
}
Neutral <- function(RiB){
RiB == RiB
}
#Condition constants
pos_RiB <- 0.011
neg_RiB <- -0.011
#For Loop
for(i in (sp3_join_a$RiB)){
if (sp3_join_a$RiB > pos_RiB){
vec[i] <- Unstable(sp3_join_a$RiB[i])
}
else if (sp3_join_a$RiB < neg_RiB){
vec[i] <- Stable(sp3_join_a$RiB[i])
}
else (sp3_join_a$RiB < pos_RiB && sp3_join_a > neg_RiB)
vec[i] <- Neutral(sp3_join_a$RiB[i])
}
sp3_join_a$vec <- vec
In my dataframe sp3_join_a$RiB values are numeric and look like (0.15099768 0.13389330 0.08309406 0.06137715 0.06234167 0.05491064 0.04332422 0.05927553 0.03774791 0.04653331).
Maybe you can try nested ifelse
sp3_join_a <- within(sp3_join_a,
  vec <- ifelse(RiB > pos_RiB,
                Unstable(RiB),
                ifelse(RiB < neg_RiB,
                       Stable(RiB),
                       RiB))) # keep RiB as is; Neutral() as defined returns TRUE (RiB == RiB), not the value
Using data.table you can chain expressions and do conditional subsetting and sub assign values by reference, avoiding a for loop
library(data.table)
# dummy data
sp3_join_a <- data.table(RiB = runif(100, -0.015, 0.015))
pos_RiB <- 0.011
neg_RiB <- -0.011
The solution is then:
# Chaining expressions with (sub)assignment on conditional subsets
sp3_join_a[, vec := RiB][RiB > pos_RiB, vec := (1-(16*RiB))^0.75][RiB < neg_RiB, vec := (1-(5*RiB))^2]
Note that you can use standard base R syntax for a very similar approach (with sp3_join_a as a plain data.frame); I just used data.table because it's more efficient.
sp3_join_a[,"vec"] <- sp3_join_a[,"RiB"]
sp3_join_a[sp3_join_a[,"RiB"] > pos_RiB, "vec"] <- Unstable(sp3_join_a[sp3_join_a[,"RiB"] > pos_RiB, "RiB"])
sp3_join_a[sp3_join_a[,"RiB"] < neg_RiB, "vec"] <- Stable(sp3_join_a[sp3_join_a[,"RiB"] < neg_RiB, "RiB"])
To keep your loop, I'd suggest a few adjustments:
# send RiB to vec column as is before the loop
sp3_join_a$vec <- sp3_join_a$RiB
# i needs to reference by position (1:nRows)
for(i in 1:length(sp3_join_a$RiB)){
# assign straight to the dataframe
if (sp3_join_a$RiB[i] > pos_RiB){
sp3_join_a$vec[i] <- Unstable(sp3_join_a$RiB[i])
}
if (sp3_join_a$RiB[i] < neg_RiB){
sp3_join_a$vec[i] <- Stable(sp3_join_a$RiB[i])
}
}

Conditional Max/Min values within group_by in R

I have been searching for an answer to this for a while without much luck so fingers crossed someone can help me!
I am dealing with cyclical data and I am trying to find the associated value of the two peaks and two troughs - this doesn't necessarily equate to the max/min and second max/min values but rather the max/min and then the second max/min values conditional on the value being larger/smaller than the preceding and subsequent values.
This is an example of one cycle
x <- c(3.049, 3.492, 3.503, 3.429, 3.013, 2.881, 2.29, 1.785, 1.211, 0.890, 0.859, 0.903, 1.165, 1.634, 2.073, 2.477, 3.162, 3.207, 3.177, 2.742, 2.24, 1.827, 1.358, 1.111, 1.063, 1.098, 1.287, 1.596, 2.169, 2.292)
I have 1000's of cycles so I am using group_by in dplyr to group the cycles and then hoped to apply the conditional max/min argument within groups.
I would appreciate any advice with this,
Thanks in advance
Edit
I have since used the below function with just a slight edit on the last line
return(data.frame(Data.value=x, Time=y, Date=z,HHT=peak, LLT=trough))
where x is my original x above, y is a time var and z is a date var. This allowed me to do some extra calculations on the results (I needed the time at which the value was min/max as well as the value itself).
So now I have a dataframe with everything I need but it is only for one date - I still can't get this run through the whole dataset using the group_by function. I have tried sub-setting by date using
subsets<-split(data, data$datevar, drop=TRUE)
But still need a way to somehow run the findminmax function (and my few extra lines of code) for each subset. Any ideas?
Consider the following custom function that you can pass in a dplyr group_by() procedure. Essentially, the function iterates through the list of cyclical values and compares each one with the neighbors before and after it. Peaks have neighbors that are both lower than themselves, and troughs have neighbors that are both larger.
findminmax <- function(x){
peak <- list(NA, NA) # INITIALIZE TEMP LISTS AND ITERATORS
p <- 1
trough <- list(NA, NA)
t <- 1
for (i in 1:length(x)){
if (i != 1 & i != length(x)){ # LEAVES OUT FIRST AND LAST VALUES
if ((x[i] > x[i-1]) & (x[i] > x[i+1])) { # COMPARES IF GREATER THAN NEIGHBORS
peak[p] <- x[i]
p <- p + 1
}
if ((x[i] < x[i-1]) & (x[i] < x[i+1])){ # COMPARES IF LESS THAN NEIGHBORS
trough[t] <- x[i]
t <- t + 1
}
}
}
return(list(peak1=peak[[1]], peak2=peak[[2]],
trough1=trough[[1]], trough2=trough[[2]]))
}
result <- findminmax(x)
#$peak1
#[1] 3.503
#$peak2
#[1] 3.207
#$trough1
#[1] 0.859
#$trough2
#[1] 1.063
For dplyr's group_by:
finaldf <- originaldf %>%
group_by(z) %>%
summarise(Time = mean(y),
HHT1 = findminmax(x)$peak1,
HHT2 = findminmax(x)$peak2,
LLT1 = findminmax(x)$trough1,
LLT2 = findminmax(x)$trough2)
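As for running this over the output of split(), here is a rough base R sketch. It swaps the loop for a vectorised test on the sign of the slope, and applies it to each subset with lapply(). The function name find_turning_points, the column names datevar and value, and the two-date data set are assumptions for illustration. Plateaus (equal neighbouring values) are not detected.

```r
# A peak is where the slope goes from positive to negative, a trough the reverse
find_turning_points <- function(x) {
  flips <- diff(sign(diff(x)))
  list(peaks   = x[which(flips == -2) + 1],
       troughs = x[which(flips ==  2) + 1])
}

# The example cycle from the question
x <- c(3.049, 3.492, 3.503, 3.429, 3.013, 2.881, 2.29, 1.785, 1.211, 0.890,
       0.859, 0.903, 1.165, 1.634, 2.073, 2.477, 3.162, 3.207, 3.177, 2.742,
       2.24, 1.827, 1.358, 1.111, 1.063, 1.098, 1.287, 1.596, 2.169, 2.292)

# Hypothetical data set: the same cycle on two dates, shifted up by 1 on the second
cycles <- data.frame(datevar = rep(c("d1", "d2"), each = length(x)),
                     value = c(x, x + 1))

# One list of peaks and troughs per date
by_date <- lapply(split(cycles$value, cycles$datevar), find_turning_points)
```

Here by_date$d1 holds the same peaks and troughs that findminmax(x) returns above.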

speeding up boolean logic loop in R

I am very new to R but I am interested in learning more and improving.
I have a dataset with around 40,000+ rows containing the length of neuron segments. I want to compare the length trends of neurons of different groups. The first step in this analysis involves sorting the measurements into 1 of 6 different categories such as '<10' '10-15', '15-20', '20-25', '25-30', and '>30'.
I created these categories as appended columns using 'mutate' from the 'dplyr' package and now I am trying to write a boolean function to determine where the measurement fits by applying a value of '1' to the corresponding column if it fits, and a '0' if it doesn't.
Here is what I wrote:
for (i in 1:40019) {
{if (FinalData$Length[i] <=10)
{FinalData$`<10`[i]<-1
} else {FinalData$`<10`[i]<-0}} #Fills '<10'
if (FinalData$Length[i] >=10 & FinalData$Length[i]<15){
FinalData$`10-15`[i]<-1
} else{FinalData$`10-15`[i]<-0} #Fills'10-15'
if (FinalData$Length[i] >=15 & FinalData$Length[i]<20){
FinalData$`15-20`[i]<-1
} else{FinalData$`15-20`[i]<-0} #Fills '15-20'
if (FinalData$Length[i] >=20 & FinalData$Length[i]<25) {
FinalData$`20-25`[i]<-1
} else{FinalData$`20-25`[i]<-0} #Fills '20-25'
if(FinalData$Length[i] >=25 & FinalData$Length[i]<30){
FinalData$`25-30`[i]<-1
} else{FinalData$`25-30`[i]<-0} #Fills '25-30'
if(FinalData$Length[i] >=30){
FinalData$`>30`[i]<-1
} else{FinalData$`>30`[i]<-0} #Fills '>30'
}
This seems to work, but it takes a long time:
system.time(source('~/Desktop/Home/Programming/R/Boolean Loop R.R'))
user system elapsed
94.408 19.147 118.203
The way I coded this seems very clunky and inefficient. Is there a faster and more efficient way to code something like this or am I doing this appropriately for what I am asking for?
Here is an example of some of the values I am testing:
'Length': 14.362, 12.482337, 8.236, 16.752, 12.045
If I am not being clear about how the dataframe is structured, here is a screenshot (image: how my data frame is organized).
You can use the cut function in R. It is used to convert numeric values to factors:
x<-c(1,2,4,2,3,5,6,5,6,5,8,0,5,5,4,4,3,3,3,5,7,9,0,5,6,7,4,4)
cut(x = x,breaks = c(0,3,6,9,12),labels = c("grp1","grp2","grp3","grp4"),right=FALSE)
Set right = TRUE or right = FALSE as needed.
You can vectorise that as follows (I made a sample of some data called DF)
DF <- data.frame(1:40000,sample(letters,40000,replace=TRUE),"Length"=sample(1:40,40000,replace=TRUE))
MyFunc <- function(x) {
# write labels to a separate vector so x stays numeric for every comparison
out <- character(length(x))
out[x < 10] <- "<10"
out[x >= 10 & x < 15] <- "10-15"
out[x >= 15 & x < 20] <- "15-20"
out[x >= 20 & x < 25] <- "20-25"
out[x >= 25 & x < 30] <- "25-30"
out[x >= 30] <- ">30"
return(out)
}
DF$Group <- MyFunc(DF[,3])
If it has to be 6 columns like that, you can modify the above to return a one or zero for the appropriate size and everything else, respectively, for each of the 6 columns.
Edit: I guess a series of ifelse might be best if it really has to be 6 columns like that.
e.g.
DF$'<10' <- ifelse(DF$Length < 10, 1, 0)
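If all six 0/1 columns are needed, one possible shortcut is to bin once with cut() and derive the indicator columns from the resulting factor. This is a sketch on the five sample lengths from the question. The boundary handling (exactly 10 goes to "10-15", exactly 30 to ">30") is an assumption about what is intended.

```r
Length <- c(14.362, 12.482337, 8.236, 16.752, 12.045)  # sample values from the question
labels <- c("<10", "10-15", "15-20", "20-25", "25-30", ">30")

# right = FALSE gives bins [lower, upper), so 10 falls in "10-15" and 30 in ">30"
bins <- cut(Length, breaks = c(-Inf, 10, 15, 20, 25, 30, Inf),
            labels = labels, right = FALSE)

# One 0/1 column per category; each row has exactly one 1
indicators <- sapply(labels, function(l) as.integer(bins == l))
```

The indicators matrix can then be attached to the data frame with cbind(), and the comparison runs once per category instead of once per row.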
