Thanks for all the help I got from just reading stuff.
I'm not happy with my R loops when I am only dealing within one data.frame because I have to write down the name of the dataframe over and over again which bloats up my R code.
Here is a silly example:
x<- rep(NA,10)
y <- 1:10
dat <- data.frame(x,y)
for(i in 2:nrow(dat)){
dat$x[i] <- dat$y[i] + dat$y[i-1]
}
So what I want to get rid of is that dat$ bit. Outside loops this can neatly be done with within(), but I am not sure whether that also works with a loop inside it. I tried it anyway:
remove(x,y) # In order to avoid accidental usage of the initial vectors
within(dat,{
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}})
The output looks like this:
x y i
1 NA 1 10
2 3 2 10
3 5 3 10
4 7 4 10
5 9 5 10
6 11 6 10
7 13 7 10
8 15 8 10
9 17 9 10
10 19 10 10
So the loop did actually work, it's just that there is a new magical column.
Does anyone know (1) what is going on here and (2) how to elegantly deal with that kind of loop? (A more complicated example, wrapping within() around a loop with several if() statements and calculations, failed, by the way.)
Thanks a lot in advance!
skr
Ben answered your main question, by noting that i is being assigned to by the for loop. You can see that that is so by trying something like this:
for(j in 1:3) cat("hi\n")
hi
hi
hi
> j
[1] 3
One option is just to remove the unwanted i variable by making its value NULL:
within(dat,{
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}
i <- NULL
})
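Since within() turns the variables left in its evaluation environment into columns, removing the loop counter with rm() should have the same effect as setting it to NULL. A minimal sketch, rebuilding dat from the question (using length(y) instead of nrow(dat) is my own choice here):

```r
# Rebuild the example data from the question
dat <- data.frame(x = rep(NA, 10), y = 1:10)

res <- within(dat, {
  for (i in 2:length(y)) {
    x[i] <- y[i] + y[i - 1]
  }
  rm(i)  # drop the loop counter so it does not become a column
})

names(res)
```

Whether rm(i) or i <- NULL reads better is a matter of taste; both keep the counter out of the result.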
Another is to use with() instead of within():
dat$x <- with(dat, {
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}
x
})
Finally, though I realize yours was a toy example, the best solution will very often be to avoid for loops altogether:
d <- data.frame(y=1:10)
within(d, {x = y + c(NA, head(y, -1))})
# y x
# 1 1 NA
# 2 2 3
# 3 3 5
# 4 4 7
# 5 5 9
# 6 6 11
# 7 7 13
# 8 8 15
# 9 9 17
# 10 10 19
Related
Using R 3.3.2.
My equation is:
Lt_1<-Lt +(Linf-Lt)*(1-exp(-K))
Values for equation parameters:
Lt<-40 #only value that changes
Linf<-139.086
K<-0.413
year=c(0:10)
I want to loop through the equation once per element of year. Lt should start at 40, and each later step should use the value of Lt_1 from the previous calculation.
What I have tried:
#dataframe to hold output
new<- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
predict_length<-for(i in seq_along(year)){
Lt_1<-Lt +(Linf-Lt)*(1-exp(-K))
new[1,2]<-Lt
new[i,2]<-Lt_1
new[i,1]<-year[i]
}
new
Output:
V1 V2
1 0 40.00000
2 1 73.52453
3 2 73.52453
4 3 73.52453
5 4 73.52453
6 5 73.52453
7 6 73.52453
8 7 73.52453
9 8 73.52453
10 9 73.52453
11 10 73.52453
The loop isn't working: the second Lt_1 value is repeated for the remainder of the data frame.
Consider using Reduce, the common higher order function found in other languages, since you are essentially nesting equation calls together:
eq <- function(Lt) Lt + (Linf-Lt)*(1-exp(-K))
Lt_1 <- eq(Lt)
Lt_2 <- eq(eq(Lt))
Lt_3 <- eq(eq(eq(Lt)))
Hence, wrap Reduce inside an sapply call: for each year i, rep supplies i+1 copies of the initial Lt value, so Reduce applies eq exactly i times:
new <- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
new$V1 <- year
new$V2 <- sapply(year, function(i)
Reduce(function(x, y) eq(x), rep(40, i+1)))
new
# V1 V2
# 1 0 40.00000
# 2 1 73.52453
# 3 2 95.70645
# 4 3 110.38339
# 5 4 120.09456
# 6 5 126.52008
# 7 6 130.77161
# 8 7 133.58468
# 9 8 135.44598
# 10 9 136.67754
# 11 10 137.49241
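A possible variation, offered as a sketch rather than as part of the answer above: Reduce can keep its intermediate results via accumulate = TRUE, so the whole column can be built in one pass instead of re-running the chain for every year. The names V2 and res below are my own.

```r
Linf <- 139.086
K <- 0.413
year <- 0:10

eq <- function(Lt) Lt + (Linf - Lt) * (1 - exp(-K))

# Start from 40 and apply eq() once per remaining year;
# accumulate = TRUE keeps every intermediate value.
V2 <- Reduce(function(prev, step) eq(prev), year[-1],
             accumulate = TRUE, init = 40)

res <- data.frame(V1 = year, V2 = V2)
res
```

The second argument of the anonymous function is ignored; year[-1] only determines how many times eq() is applied.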
The OP's main question is why the loop does not work. The reason is that every iteration uses the unchanged Lt in the formula, so the same value is computed each time.
Two changes are needed in the for loop:
1. Initialise Lt_1 before the for loop.
2. Use Lt_1 in place of Lt in the formula.
The modified code will be:
Lt<-40 #only value that changes
Linf<-139.086
K<-0.413
year=c(0:10)
#dataframe to hold output
new<- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
new[1,1]<- 0
new[1,2]<- Lt
Lt_1 <- Lt;
for(i in 2:length(year)){
Lt_1<-Lt_1 +(Linf-Lt_1)*(1-exp(-K))
new[i,2]<-Lt_1
new[i,1]<- year[i]
}
new
# V1 V2
#1 0 40.00000
#2 1 73.52453
#3 2 95.70645
#4 3 110.38339
#5 4 120.09456
#6 5 126.52008
#7 6 130.77161
#8 7 133.58468
#9 8 135.44598
#10 9 136.67754
#11 10 137.49241
Here's a quick and dirty solution:
lt0 <- 40
cfunc <- function(lt){
linf <- 139.086
k <- 0.413
lt + (linf-lt)*(1-exp(-k)) # return the next length
}
year = 0:10
val <- c(cfunc(lt0), rep(0, length(year)-1))
for(i in 2:length(year)){
val[i] <- cfunc(val[i-1])
}
output:
> val
[1] 73.52453 95.70645 110.38339 120.09456 126.52008 130.77161 133.58468 135.44598 136.67754 137.49241
[11] 138.03158
Note that val starts at the first updated value (73.52453) rather than at the initial 40. There ought to be a better way than using a for loop, though.
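For this particular recurrence the loop can be avoided altogether. Subtracting both sides of Lt_1 = Lt + (Linf - Lt) * (1 - exp(-K)) from Linf gives Linf - Lt_1 = (Linf - Lt) * exp(-K), so the gap to Linf shrinks by a factor of exp(-K) each year. After n years the length is therefore Linf - (Linf - L0) * exp(-K * n). A sketch that checks this closed form against the loop (L0, closed and looped are names I introduced):

```r
Linf <- 139.086
K <- 0.413
L0 <- 40          # starting length
year <- 0:10

# Closed form: vectorized over all years at once
closed <- Linf - (Linf - L0) * exp(-K * year)

# Loop version for comparison
looped <- numeric(length(year))
looped[1] <- L0
for (i in 2:length(year)) {
  looped[i] <- looped[i - 1] + (Linf - looped[i - 1]) * (1 - exp(-K))
}

all.equal(closed, looped)
```

The closed form only applies because each step is a fixed linear update; a recurrence with conditions inside would still need a loop or Reduce.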
#this is for setting
A <- c(1,1,2,2,2,3,4,4,5,5)
B <- c(1:10)
C <- c(11:20)
ABC <- data.frame(A,B,C)
#so, I made up my own ABC like this
A B C
1 1 1 11
2 1 2 12
3 2 3 13
4 2 4 14
5 2 5 15
6 3 6 16
7 4 7 17
8 4 8 18
9 5 9 19
10 5 10 20
With this data, I want to get the mean of (B) or (C) for the rows where (A) meets a specific condition.
For example, if (A) is in 2:4, get mean(B) and mean(C):
new_ABC <- subset(ABC, A >= 2 & A <= 4)
mean(new_ABC$B)
mean(new_ABC$C)
and it works.
But I want to turn this into a function like the one below. I have tried for several days and have no idea how...
getMeanB <- function(condition){
for(i in min(condition) : max(condition)){
# I do not really know what to do..
}
}
Any help would be much appreciated!
If the argument 'condition' is a vector, then we can do it like this:
getMean <- function(data, condition, cName) {
minC <- min(condition)
maxC <- max(condition)
i1 <- data[[cName]] >= minC & data[[cName]] <= maxC
colMeans(data[i1,setdiff(names(data), cName)], na.rm = TRUE)
}
getMean(ABC, 2:4, "A")
# B C
# 5.5 15.5
NOTE: The 'data' and 'cName' arguments are added to make the function more general, so that it can be applied to other datasets with different column names.
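If 'condition' is meant as a set of exact values rather than a range, a variant based on %in% might be closer to what is wanted. The sketch below is an assumption about that use case; getMeanIn is a hypothetical name, and drop = FALSE is added so that colMeans still receives a data frame when only one other column remains.

```r
ABC <- data.frame(A = c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5),
                  B = 1:10,
                  C = 11:20)

# Hypothetical variant: keep rows whose cName value is one of `values`
getMeanIn <- function(data, values, cName) {
  keep <- data[[cName]] %in% values
  colMeans(data[keep, setdiff(names(data), cName), drop = FALSE],
           na.rm = TRUE)
}

getMeanIn(ABC, 2:4, "A")      # same rows as the range version
getMeanIn(ABC, c(2, 4), "A")  # skips the rows where A is 3
```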
I have a data frame in which the numbering of the months is all jumbled. I need to change the values as shown below, but I'm struggling to see an easy way to do it. I'm aware that this code changes the data, but it's a case of working out the puzzle without adding columns together.
data$column[data$column == "0"] <- "7"
0 -> 7
1 -> 8
2 -> 9
3 -> 10
4 -> 1
5 -> 2
6 -> 3
7 -> 4
8 -> 5
9 -> 6
Thank you
Maybe plyr::mapvalues() can help you here:
library(plyr)
data$column <- mapvalues(data$column, from = 0:9, to = c(7:10, 1:6))
Use modular arithmetic (assuming the column is numeric):
data$column <- (data$column + 6) %% 10 + 1
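Another base R option is a plain lookup vector indexed by the old code. The sketch below uses a small made-up vector (old) in place of the real column and assumes the codes are the numbers 0 to 9; a character column would need as.numeric() first.

```r
# Made-up sample of old month codes (stand-in for data$column)
old <- c(0, 3, 4, 9, 1)

# Position k holds the new value for old code k - 1
lookup <- c(7:10, 1:6)

new <- lookup[old + 1]
new
```

Because the lookup reads only from the original values, no replacement can overwrite a value that a later replacement still needs, which is the trap with changing the codes one at a time in place.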
Suppose the data frame is like this:
df <- data.frame(x = c(1,7,8,15,24,100,9,19,128))
How do I create a new variable that satisfies the following condition:
y = 1 if 1<=x<=7
y = 2 if 8<=x<=14
y = 3 if 15<=x<=21
...
y = k if 1+7*(k-1)<= x<= 7+7*(k-1)
so that I can have the new data frame like this
df <- data.frame(y = c(1,1,2,3,4,15, 2,3, 19))
I am wondering if a for loop can be applied in this case.
Via simple algebra, you can do:
df$y <- floor((df$x+6)/7)
df
# x y
# 1 1 1
# 2 7 1
# 3 8 2
# 4 15 3
# 5 24 4
# 6 100 15
# 7 9 2
# 8 19 3
# 9 128 19
In R you will often find it easier (less typing and less thinking) to use vectorized operators than for loops for simple computations like this. In this case we performed calls to +, /, and floor over a whole vector instead of looping and using them on each element.
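To make the comparison concrete, here is a sketch that computes the same column with an explicit loop and with a vectorized expression. It uses ceiling(x / 7), which gives the same result as floor((x + 6) / 7) for positive whole numbers; y_loop and y_vec are names I made up.

```r
df <- data.frame(x = c(1, 7, 8, 15, 24, 100, 9, 19, 128))

# Loop version: one element at a time
y_loop <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  y_loop[i] <- floor((df$x[i] + 6) / 7)
}

# Vectorized version: one call over the whole column
y_vec <- ceiling(df$x / 7)

all(y_loop == y_vec)
```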
The idea is to replace each character in a df column with its position in a reference vector, for example:
L<-LETTERS[1:25]
A<-c(1:25)
df<-data.frame(L,A)
Compare<-c(LETTERS[sample(1:25, 25)])
df[] <- lapply(df, as.character)
for (i in 1:nrow(df)){
df[i,1]<-which(df[i,1]==Compare)
}
head(df)
L A
1 14 1
2 12 2
3 2 3
This works well but scales very badly, like all for loops. Any ideas using apply or dplyr?
Thanks
Just use match
Your data (use set.seed when providing data using sample)
df <- data.frame(L = LETTERS[1:25], A = 1:25)
set.seed(1)
Compare <- LETTERS[sample(1:25, 25)]
Solution
df$L <- match(df$L, Compare)
head(df)
# L A
# 1 10 1
# 2 23 2
# 3 12 3
# 4 11 4
# 5 5 5
# 6 21 6
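For reference, match() returns the position of the first match of each element in the lookup vector and NA where there is none. A small sketch with made-up vectors (ref, L, loop_pos are illustrative names) that also checks it against the loop from the question:

```r
# Tiny illustration: "Z" is not in ref, so its position is NA
ref <- c("C", "A", "B")
match(c("A", "B", "Z"), ref)

# Same check on shuffled letters, compared with the loop approach
set.seed(1)
Compare <- sample(LETTERS[1:25])
L <- LETTERS[1:25]

loop_pos <- integer(length(L))
for (i in seq_along(L)) {
  loop_pos[i] <- which(L[i] == Compare)
}

identical(loop_pos, match(L, Compare))
```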