I am looking for a way to compute a rolling product inside an ifelse statement, where the condition is based on an additional column.
My data looks like this
A B C
1 1 1
2 3 1
3 5 0
4 7 0
The Excel formula equivalent would be
C3 = IF(B3=0,(1+A3/10)*C2,1)
I tried using
ifelse(B==0,cumprod(c(1,(A[-1]/10+1))),1)
I couldn't get it working for this case as it is always referring to just the data in column A.
I would expect the following results
A B C
1 1 1 1
2 3 1 1
3 5 0 1.5
4 7 0 2.55
thanks in advance
Try this:
df$C <- cumprod(with(df, ifelse(B==0, A/10+1, 1)))
Or using Reduce:
df$C <- Reduce('*', with(df, ifelse(B==0, A/10+1, 1)), accumulate = TRUE)
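For reference, here is a minimal sketch of the cumprod approach on the sample data from the question. The second half is my own assumption, not part of the answer above: it restarts the product whenever B is non-zero, which is what the Excel formula does if read literally. The names run and C_reset are hypothetical. On the sample data both versions give the same result.

```r
# Sample data from the question
df <- data.frame(A = c(1, 3, 5, 7), B = c(1, 1, 0, 0))

# Multiply by (1 + A/10) where B == 0, and by 1 elsewhere
df$C <- cumprod(with(df, ifelse(B == 0, A / 10 + 1, 1)))

# Hypothetical variant: restart the product at every row with B != 0,
# mirroring IF(B3=0,(1+A3/10)*C2,1) literally
run <- cumsum(df$B != 0)  # run id that increases at each non-zero B
df$C_reset <- ave(ifelse(df$B == 0, df$A / 10 + 1, 1), run, FUN = cumprod)

print(df)
```

The two columns only differ when a non-zero B appears again after a stretch of zeros: cumprod alone keeps the accumulated product, while the variant drops back to 1.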
All of the variables in the data.frame are on the same 1-5 scale.
Example of data.frame
rpi_invert
A B C D
5 2 4 1
3 5 5 2
1 1 3 4
I would like to change every value that equals 5 to 1,
every 4 to 2,
every 2 to 4,
and every 1 to 5.
Example of data.frame after values have been changed.
rpi_invert
A B C D
1 4 2 5
3 1 1 4
5 5 3 2
What I have tried.
for(b in colnames(rpi_invert)){
rpi_invert[[b]][rpi_invert[[b]] == 5] <- 1
rpi_invert[[b]][rpi_invert[[b]] == 4] <- 2
rpi_invert[[b]][rpi_invert[[b]] == 2] <- 4
rpi_invert[[b]][rpi_invert[[b]] == 1] <- 5
}
This will only change the values in the first row and not the second column.
for(b in colnames(rpi_invert)){
rpi_invert <- ifelse(rpi_invert[[b]] == 5,1,
ifelse(rpi_invert[[b]] == 4,2,
ifelse(rpi_invert[[b]] == 2,4,
ifelse(rpi_invert[[b]] == 1,5,rpi_invert[[b]]))))
}
But this gives me the error:
Error in rpi_invert[[b]] : subscript out of bounds
If I try the same methods on an individual column instead of looping through the data.frame, both methods work, so I am not sure what the problem is.
I am sure this can be done more efficiently without a for loop, probably with some type of apply function, but I am not sure how.
Any help will be appreciated. Please let me know if further information is needed.
You can try (if your data.frame is df):
3-(df-3)
# A B C D
#1 1 4 2 5
#2 3 1 1 4
#3 5 5 3 2
Or, the same thing written a bit differently: 6-df
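One likely reason the first loop in the question misbehaves is that the replacements run in sequence: every 5 becomes 1, and then the last line turns every 1 (including the new ones) back into 5. Below is a sketch of the arithmetic version on the example data. It also shows a lookup vector for recodings that are not a simple reversal; the lookup vector and the names inverted and inverted2 are my own assumptions.

```r
# Example data from the question
rpi_invert <- data.frame(A = c(5, 3, 1), B = c(2, 5, 1),
                         C = c(4, 5, 3), D = c(1, 2, 4))

# Arithmetic reversal of a 1-5 scale, applied to every cell at once
inverted <- 6 - rpi_invert

# Hypothetical lookup vector: position i holds the new value for old value i
lookup <- c(5, 4, 3, 2, 1)
inverted2 <- rpi_invert
inverted2[] <- lapply(rpi_invert, function(col) lookup[col])

print(inverted)
```

Both versions read all the old values before writing any new ones, so no value is recoded twice.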
I was writing a loop with an if statement in R. The table looks like this:
ID category
1 a
1 b
1 c
2 a
2 b
3 a
3 b
4 a
5 a
I want to use a for loop with an if statement to add another column that numbers the rows within each ID group, like the Count column below:
ID category Count
1 a 1
1 b 2
1 c 3
2 a 1
2 b 2
3 a 1
3 b 2
4 a 1
5 a 1
My code is (output is the table name):
for (i in 2:nrow(output1)){
if(output1[i,1] == output[i-1,1]){
output1[i,"rn"]<- output1[i-1,"rn"]+1
}
else{
output1[i,"rn"]<-1
}
}
But in the result, every value in the count column is 1.
ID category Count
1 a 1
1 b 1
1 c 1
2 a 1
2 b 1
3 a 1
3 b 1
4 a 1
5 a 1
Please help me out... Thanks
There are packages and vectorized ways to do this task, but if you are practicing with loops try:
output1$rn <- 1
for (i in 2:nrow(output1)){
if(output1[i,1] == output1[i-1,1]){
output1[i,"rn"]<- output1[i-1,"rn"]+1
}
else{
output1[i,"rn"]<-1
}
}
With your original code, when you made the call output1[i-1,"rn"]+1 in the third line of your loop, you were referencing a column that didn't exist on the first pass. By first creating the column and filling it with the value 1, you give the loop something explicit to refer to.
output1
# ID category rn
# 1 1 a 1
# 2 1 b 2
# 3 1 c 3
# 4 2 a 1
# 5 2 b 2
# 6 3 a 1
# 7 3 b 2
# 8 4 a 1
# 9 5 a 1
With the package dplyr you can accomplish it quickly with:
library(dplyr)
output1 %>% group_by(ID) %>% mutate(rn = 1:n())
Or with data.table:
library(data.table)
setDT(output1)[,rn := 1:.N, by=ID]
With base R you can also use:
output1$rn <- with(output1, ave(seq_along(ID), ID, FUN=seq_along))
There are vignettes and tutorials on the two packages mentioned, and you can run ?ave in the R console to read about the last approach.
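As a quick check of the base R route, here is a small sketch that rebuilds the example table and numbers the rows within each ID with ave(). The data frame is reconstructed from the table in the question, and passing seq_along as FUN is my own choice for getting an integer result.

```r
# Example table from the question
output1 <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 4, 5),
                      category = c("a", "b", "c", "a", "b", "a", "b", "a", "a"))

# Number the rows within each ID. The first argument only supplies a vector
# of the right length and type; the values come from FUN applied per group.
output1$rn <- ave(seq_along(output1$ID), output1$ID, FUN = seq_along)

print(output1)
```

Unlike the loop, this does not rely on rows with the same ID being next to each other.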
A looping solution will be painfully slow for bigger data. Here is a one-line solution using data.table:
require(data.table)
a<-data.table(ID=c(1,1,1,2,2,3,3,4,5),category=c('a','b','c','a','b','a','b','a','a'))
a[,':='(category_count = 1:.N),by=.(ID)]
What you want here matches the factor level of category, as long as every ID's categories always run a, b, c, ... in order with none skipped. Do this:
df$count=as.numeric(df$category)
This will give the output:
ID category count
1 1 a 1
2 1 b 2
3 1 c 3
4 2 a 1
5 2 b 2
6 3 a 1
7 3 b 2
8 4 a 1
9 5 a 1
provided your category is already a factor. If not, first convert it to a factor:
df$category=as.factor(df$category)
df$count=as.numeric(df$category)
I've looked on the internet but I haven't found the answer I'm looking for, though I'm sure it's out there...
I have a data frame, and I want to divide (or apply any other operation to) every cell of a row by the value in the second column of that row.
So for the first row, every cell from col3 to the last column is divided by the value of col2 in that row, and so on for every row.
I have solved this with a for loop, where col2 (delta) is now a vector and col3 to the end is a data.frame (mu). The results are appended to a new data frame using rbind.
My question: I'm pretty sure this can be done with apply, sapply or similar, but so far I have not gotten the correct results that I get with the for loop. How can I do it without a for loop?
In summary, I want to divide each mu by the delta value of its own row.
The for loop I've been using so far:
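RA <- NULL  # start empty so that rbind has something to append to
for (i in seq_len(nrow(mu))){
  RA_row <- mu[i,]/delta[i]  # divide row i by its own delta
  RA <- rbind(RA, RA_row)
}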
transcript delta mu_5 mu_15 mu_25 mu_35 mu_45 mu_55 mu_65
1 YAL001C 0.066702720 2.201787e-01 1.175731e-01 2.372506e-01 0.139281317 0.081723456 1.835414e-01 1.678318e-01
2 YAL002W 0.106000180 3.685822e-01 1.326865e-01 2.887973e-01 0.158207858 0.193476082 1.867039e-01 1.776946e-01
3 YAL003W 0.022119345 2.271518e+00 2.390637e+00 1.651997e+00 3.802739732 2.733559839 2.772454e+00 3.571712e+00
Thanks
It appears as though you want just:
mu2 <- mu[-(1:2)]/mu[[2]]
# same as mu[, -(1:2)]/mu[['delta']]
That should produce a new dataframe with the division by row. Somewhat more dangerous would be to do the division "in place".
mu[-(1:2)] <- mu[-(1:2)]/mu[[2]]
> mu <- data.frame(a=1,b=1:10, c=rnorm(10), d=rnorm(10) )
> mu
a b c d
1 1 1 -1.91435943 0.45018710
2 1 2 1.17658331 -0.01855983
3 1 3 -1.66497244 -0.31806837
4 1 4 -0.46353040 -0.92936215
5 1 5 -1.11592011 -1.48746031
6 1 6 -0.75081900 -1.07519230
7 1 7 2.08716655 1.00002880
8 1 8 0.01739562 -0.62126669
9 1 9 -1.28630053 -1.38442685
10 1 10 -1.64060553 1.86929062
> (mu2 <- mu[-(1:2)]/mu[[2]])
c d
1 -1.914359426 0.450187101
2 0.588291656 -0.009279916
3 -0.554990812 -0.106022792
4 -0.115882600 -0.232340537
5 -0.223184021 -0.297492062
6 -0.125136500 -0.179198716
7 0.298166649 0.142861258
8 0.002174452 -0.077658337
9 -0.142922281 -0.153825205
10 -0.164060553 0.186929062
> (mu[-(1:2)] <- mu[-(1:2)]/mu[[2]] )
> mu
a b c d
1 1 1 -1.914359426 0.450187101
2 1 2 0.588291656 -0.009279916
3 1 3 -0.554990812 -0.106022792
4 1 4 -0.115882600 -0.232340537
5 1 5 -0.223184021 -0.297492062
6 1 6 -0.125136500 -0.179198716
7 1 7 0.298166649 0.142861258
8 1 8 0.002174452 -0.077658337
9 1 9 -0.142922281 -0.153825205
10 1 10 -0.164060553 0.186929062
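The division works row by row because each column of a data frame is a vector, so dividing by a vector with one entry per row pairs element i of every column with element i of the divisor. Here is a sketch with toy values (illustrative numbers, not the data from the question) and column names borrowed from the question. The sweep() call is an alternative I am adding that states the row-wise intent explicitly.

```r
# Toy data with the same layout as in the question (values are made up)
mu <- data.frame(transcript = c("t1", "t2", "t3"),
                 delta = c(2, 4, 5),
                 mu_5  = c(10, 20, 30),
                 mu_15 = c(1, 2, 3))

# Drop the first two columns, then divide row i of the rest by delta[i]
RA <- mu[-(1:2)] / mu$delta

# Equivalent with sweep(): MARGIN = 1 applies the statistic across rows
RA_sweep <- sweep(as.matrix(mu[-(1:2)]), 1, mu$delta, "/")

print(RA)
```

Note that sweep() is applied to a matrix here, so RA_sweep is a matrix while RA stays a data frame.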
I have a data frame in R,
df <- data.frame(a=c(1,1,1,2,2,5,5,5,5,5,6,6), b=c(0,1,0,0,0,0,0,1,0,0,0,1))
Within each group of identical a values, I want to remove the rows where b equals 0 that occur after a row where b equals 1.
So the output I am looking for is,
df.out <- data.frame(a=c(1,1,2,2,5,5,5,6,6), b=c(0,1,0,0,0,0,1,0,1))
Is there a way to do this in R?
This should do the trick:
ind = intersect(which(df$b==0), which(df$b==1)+1)
df.out = df[-ind,]
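which(df$b==1) returns the row indices of df where b==1. Add one to these and intersect with the indices where b==0. Note that this only removes a 0 that immediately follows a 1, and it does not take a into account, so on the example data row 10 is kept.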
How about
df[ ave(df$b, df$a, FUN=function(x) x>=cummax(x))==1, ]
# a b
# 1 1 0
# 2 1 1
# 4 2 0
# 5 2 0
# 6 5 0
# 7 5 0
# 8 5 1
# 11 6 0
# 12 6 1
Here we use ave to look within each level of a and we test to see if we've seen a 1 yet with cummax.
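To make the cummax idea concrete, here is a short sketch that runs it on the data frame from the question and stores the filter in an intermediate vector. The name keep is my own.

```r
# Data from the question
df <- data.frame(a = c(1, 1, 1, 2, 2, 5, 5, 5, 5, 5, 6, 6),
                 b = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1))

# Within each value of a, cummax(b) becomes 1 once a 1 has been seen.
# A row is kept as long as b has not dropped below that running maximum.
keep <- ave(df$b, df$a, FUN = function(x) x >= cummax(x)) == 1
df.out <- df[keep, ]

print(df.out)
```

Because ave() returns a numeric vector, the logical result of the comparison comes back as 0/1, which is why it is compared with 1 afterwards.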
I'm trying to select the column with the highest value for each row in a data.frame. So for instance, the data is set up as such.
> df <- data.frame(one = c(0:6), two = c(6:0))
> df
one two
1 0 6
2 1 5
3 2 4
4 3 3
5 4 2
6 5 1
7 6 0
Then I'd like to set another column based on those rows. The data frame would look like this.
> df
one two rank
1 0 6 2
2 1 5 2
3 2 4 2
4 3 3 3
5 4 2 1
6 5 1 1
7 6 0 1
I imagine there is some sort of way that I can use plyr or sapply here but it's eluding me at the moment.
There might be a more efficient solution, but
ranks <- apply(df, 1, which.max)
ranks[which(df[, 1] == df[, 2])] <- 3
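A runnable sketch of this on the example data, assuming (as in the question) that ties should get the code 3. The max.col() line is an alternative I am adding. It is a base R function, and with ties.method = "first" it picks the same column as which.max.

```r
# Data from the question
df <- data.frame(one = 0:6, two = 6:0)

# Index of the largest column in each row (the first column wins ties)
ranks <- unname(apply(df, 1, which.max))

# Vectorised alternative that avoids the row-wise apply
ranks2 <- max.col(as.matrix(df), ties.method = "first")

# Mark rows where both columns are equal with the tie code 3
ranks[df$one == df$two] <- 3
df$rank <- ranks

print(df)
```

The tie check compares the two columns directly, so it would need to be generalised for a data frame with more than two columns.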