Here is what I want to do (it is clear and easy in Excel, but I have a problem doing it in R):
Column A 1 2 3 4 5
Column B 0 9 2 1 7
This is my actual "algorithm":
Column C
(first value) = mean(Column A) = 3
(second value) = ((mean(Column A)*4) + 0)/5 = 2.4
(third value) = ((second value*4) + 9)/5 = 3.72
and so on.
So we have:
# A B C
# 1 1 0 3
# 2 2 9 2.4
# 3 3 2 3.72
# 4 4 1 3.376
# 5 5 7 2.9008
This is my actual code, using your suggestion:
a <- c(1:5)
b <- c(0,9,0,1,7,0)
matrix <- data.frame(A=a,B=b)
matrix <- c(mean(matrix$A), (cumsum(matrix$B) + (mean(matrix$A)*4))/5)
This is the result: 2.4 4.2 4.2 4.4 5.8 (wrong!)
Of course, R also gives me the error "replacement has 6 rows, data has 5", but that isn't the point. I only want to know how I should do this.
You could use ?cumsum:
a <- 1:5
b <- c(0, 9, 2, 1, 7)
mean(a) + cumsum(b)
# [1] 3 12 14 15 22
UPDATE:
It seems you want to run a (weighted) moving average. Maybe you should have a look at the TTR package.
Please find an easy approach below:
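wma <- function(b, startValue, a=4/5) {
  # m[1] holds the start value, so the result has length(b) + 1 entries
  m <- double(length(b)+1)
  m[1] <- startValue
  for (i in seq_along(b)) {
    # new value = a * previous value + (1 - a) * current b
    m[i+1] <- a * m[i] + (1-a) * b[i]
  }
  return(m)
}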
wma(b, mean(a))
# [1] 3.00000 2.40000 3.72000 3.37600 2.90080 3.72064
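The same recurrence can also be sketched without an explicit loop, using base R's Reduce() with accumulate = TRUE. This is only an illustration: the helper name wma_reduce and the variables a_col, b_col and c_col are made up here and are not part of the answer above. Since the question's column C has only five entries, the last accumulated value is dropped at the end.

```r
# Hypothetical helper: same recurrence as wma(), expressed with Reduce().
wma_reduce <- function(b, start_value, a = 4/5) {
  step <- function(m, x) a * m + (1 - a) * x
  # accumulate = TRUE keeps every intermediate value; with init supplied,
  # the result has length(b) + 1 entries, starting at start_value.
  Reduce(step, b, init = start_value, accumulate = TRUE)
}

a_col <- 1:5
b_col <- c(0, 9, 2, 1, 7)

res <- wma_reduce(b_col, mean(a_col))
c_col <- head(res, length(b_col))  # keep five values to match column C
c_col
```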
This solves your issue:
mydf<-data.frame(A=1:5, B=c(0,9,2,1,7))
mydf$C<-cumsum(mydf$B)+mean(mydf$A)
mydf
# A B C
# 1 1 0 3
# 2 2 9 12
# 3 3 2 14
# 4 4 1 15
# 5 5 7 22
Hope it helps.
I need to split my data into subsets of 6 rows each. How can I do that in R?
data:
col1 : 1,2,3,4,5,6,7,8,9,10
col2 : a1,a2,a3,a4,a5,a6,a7,a8,a9,a10
The first subset should contain rows 1:6, and the next subset rows 7:nrow(data). I have tried using the seq function:
seqData <- seq(1,nrow(data),6)
Output: this only gives the row numbers 1 and 7, but I want rows 1 to 6 first, and then rows 7 to nrow(data).
How can I get output like that?
Will this work?
set.seed(1)
dat <- data.frame(c1 = sample(1:5, 12, TRUE),
                  c2 = sample(1:5, 12, TRUE))
dat
c1 c2
1 1 2
2 4 2
3 1 1
4 2 5
5 5 5
6 3 1
7 2 1
8 3 5
9 3 5
10 1 2
11 5 2
12 5 1
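# length.out keeps the grouping vector valid when nrow(dat) is not a multiple of 6
split(dat, rep(1:ceiling(nrow(dat)/6), each = 6, length.out = nrow(dat)))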
$`1`
c1 c2
1 1 2
2 4 2
3 1 1
4 2 5
5 5 5
6 3 1
$`2`
c1 c2
7 2 1
8 3 5
9 3 5
10 1 2
11 5 2
12 5 1
The function below creates a numeric vector of integers that increase by 1 every n rows, and uses this vector to split the data as needed.
data <- data.frame(col1 = 1:10, col2 = paste0("a", 1:10))
split_nrows <- function(x, n){
  f <- c(1, rep(0, n - 1))
  f <- rep(f, length.out = NROW(x))
  f <- cumsum(f)
  split(x, f)
}
split_nrows(data, 6)
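As a cross-check on the idea of a group index that steps up every n rows, here is a minimal sketch that builds the same index with integer division instead of cumsum(). The names split_every, dat10 and chunks are hypothetical and only used for this illustration.

```r
# Hypothetical helper: chunk index from integer division of the row number.
split_every <- function(x, n) {
  # rows 1..n get index 1, rows n+1..2n get index 2, and so on
  grp <- (seq_len(NROW(x)) - 1) %/% n + 1
  split(x, grp)
}

dat10 <- data.frame(col1 = 1:10, col2 = paste0("a", 1:10))
chunks <- split_every(dat10, 6)
sapply(chunks, nrow)  # number of rows in each chunk
```

With 10 rows and n = 6, the last chunk simply holds the remaining 4 rows, so no padding or NA rows are produced.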
Here's a simple example with mtcars that yields a list of 6 subset dfs.
nrows <- nrow(mtcars)
breaks <- seq(1, nrows, 6)
listdfs <- lapply(breaks, function(x) mtcars[x:(x+5), ]) # increment by 5 not 6
listdfs[[6]] <- listdfs[[6]][1:2, ] #last df: remove 4 NA rows (36 - 32)
Given a two-column data.frame, with one column containing group labels and the other containing integer values ordered from smallest to largest, how can the data be expanded into pairs of combinations of the integer column?
I'm not sure of the best way to state this. I'm not interested in all possible ordered pairs, but in all unique combinations, starting from the lowest value.
In R, the combn function gives the desired output when groups are not considered, for example:
t(combn(seq(1:4),2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 3
[5,] 2 4
[6,] 3 4
Since the first value is 1, we get the unique combination (1,2) and not the additional combination (2,1), which I don't need. How would one then apply a similar method by group?
For example, given a data.frame:
test <- data.frame(Group = rep(c("A","B"),each=4),
Val = c(1,3,6,8,2,4,5,7))
test
Group Val
1 A 1
2 A 3
3 A 6
4 A 8
5 B 2
6 B 4
7 B 5
8 B 7
I was able to come up with this solution that gives the desired output:
library(dplyr)  # provides filter()
test <- data.frame(Group = rep(c("A","B"),each=4),
                   Val = c(1,3,6,8,2,4,5,7))
j=1
for(i in unique(test$Group)){
  if(j==1){
    one <- filter(test,i == Group)
    two <- data.frame(t(combn(one$Val,2)))
    test1 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
    j=j+1
  }else{
    one <- filter(test,i == Group)
    two <- data.frame(t(combn(one$Val,2)))
    test2 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
    test1 <- rbind(test1,test2)
  }
}
test1
Group Val1 Val2
1 A 1 3
2 A 1 6
3 A 1 8
4 A 3 6
5 A 3 8
6 A 6 8
7 B 2 4
8 B 2 5
9 B 2 7
10 B 4 5
11 B 4 7
12 B 5 7
However, this is not elegant and is really slow as the number of groups and length of each group become large. It seems like there should be a more elegant and efficient solution but so far I have not come across anything on SO.
I would appreciate any ideas!
Here is a data.table approach:
library( data.table )
#make test a data.table
setDT(test)
#split by group
L <- split( test, by = "Group")
#get unique combinations of 2 Vals
L2 <- lapply( L, function(x) {
  as.data.table( t( combn( x$Val, m = 2, simplify = TRUE ) ) )
})
#merge them back together
data.table::rbindlist( L2, idcol = "Group" )
# Group V1 V2
# 1: A 1 3
# 2: A 1 6
# 3: A 1 8
# 4: A 3 6
# 5: A 3 8
# 6: A 6 8
# 7: B 2 4
# 8: B 2 5
# 9: B 2 7
#10: B 4 5
#11: B 4 7
#12: B 5 7
You can set simplify = FALSE in combn() and then use unnest_wider() from tidyr.
library(dplyr)
library(tidyr)
test %>%
  group_by(Group) %>%
  summarise(Val = combn(Val, 2, simplify = FALSE)) %>%
  unnest_wider(Val, names_sep = "_")
# Group Val_1 Val_2
# <chr> <dbl> <dbl>
# 1 A 1 3
# 2 A 1 6
# 3 A 1 8
# 4 A 3 6
# 5 A 3 8
# 6 A 6 8
# 7 B 2 4
# 8 B 2 5
# 9 B 2 7
# 10 B 4 5
# 11 B 4 7
# 12 B 5 7
library(tidyverse)
# split Val by Group, then build all pairs within each group
df2 <- split(test$Val, test$Group) %>%
  map(~gtools::combinations(n = length(.x), r = 2, v = .x)) %>%
  map(~as_tibble(.x, .name_repair = "unique")) %>%
  bind_rows(.id = "Group")
I'm trying to apply a function in R to each row, while updating each row with the output of the function from the previous row. I know that's a mouthful, but here's an example. Let's say I had a data frame, df:
df<- data.frame(a=c(10,15,20,25,30), b=c(2,4,5,7,10))
And I had a function, funR, that just took the difference between column a and column b:
funR<- function(argA, argB){
  c<- argA-argB
  return(c)
}
Now a simplified version of what I'd be going for is let's say I apply the function to the first row and get 10 - 2 = 8. I would then want to replace the second row of column a with this output before applying the function to that row, so instead of 15 - 4 I'd be doing 8 - 4. I would then replace 20 in row 3 with 4, and so on and so on.
Edit to show expected output:
a b
1 10 2
2 8 4
3 4 5
4 -1 7
5 -8 10
Any help would be greatly appreciated!
This is really a one-liner in base R:
Method 1:
for (i in seq_len(nrow(df) - 1)) df$a[i + 1] <- df$a[i] - df$b[i]
df
# a b
#1 10 2
#2 8 4
#3 4 5
#4 -1 7
#5 -8 10
Here we implement the recursion relation a[i+1] = a[i] - b[i] in a simple for loop. The for loop will be very fast, as we directly overwrite existing entries in df.
Method 2
Or alternatively:
df$a <- df$a[1] - cumsum(c(0, df$b))[1:length(df$a)]
df
# a b
#1 10 2
#2 8 4
#3 4 5
#4 -1 7
#5 -8 10
This is based on the expanded recursion relation, where you can see that e.g. a[4] = a[1] - (b[1] + b[2] + b[3]), and so on.
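To see that the loop and the cumulative-sum form really compute the same thing, here is a small sketch that runs both on a fresh copy of the example data and compares them. The names df0, loop_a and closed_a are made up for this check.

```r
df0 <- data.frame(a = c(10, 15, 20, 25, 30), b = c(2, 4, 5, 7, 10))

# Method 1: explicit recursion a[i + 1] = a[i] - b[i]
loop_a <- df0$a
for (i in seq_len(nrow(df0) - 1)) loop_a[i + 1] <- loop_a[i] - df0$b[i]

# Method 2: expanded form a[k] = a[1] - (b[1] + ... + b[k - 1])
closed_a <- df0$a[1] - cumsum(c(0, head(df0$b, -1)))

all.equal(loop_a, closed_a)  # TRUE if both methods agree
```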
We can also do this with accumulate() from purrr:
library(purrr)
library(dplyr)
df %>%
  mutate(a = accumulate(b[-n()], `-`, .init = a[1]))
# a b
#1 10 2
#2 8 4
#3 4 5
#4 -1 7
#5 -8 10
Here is a faster version if you want to maintain the use of the function funR.
df<- data.frame(a=c(10,15,20,25,30), b=c(2,4,5,7,10))
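funR<- function(argA, argB){
  # running difference: argA[1], argA[1] - argB[1], argA[1] - argB[1] - argB[2], ...
  argC = c(argA[1], argB)
  accumdiff <- function(x){
    Reduce(function(x1,x2) x1-x2, x, accumulate=TRUE)}
  argC = c(argA[1],accumdiff(argC)[c(-1)])
  # drop the last value so the result has the same length as argA
  rev(rev(argC)[-1])
}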
df$a <- funR(df$a, df$b)
df
# a b
# 1 10 2
# 2 8 4
# 3 4 5
# 4 -1 7
# 5 -8 10
I have a dataframe:
df <- data.frame('a'=c(1,2,3,4,5), 'b'=c(1,20,3,4,50))
df
a b
1 1 1
2 2 20
3 3 3
4 4 4
5 5 50
and I want to create a new column based on existing columns. Something like this:
if (df[['a']] == df[['b']]) {
df[['c']] <- df[['a']] + df[['b']]
} else {
df[['c']] <- df[['b']] - df[['a']]
}
The problem is that the if condition is checked only for the first row. If I wrap the if statement above in a function and call it with apply() (or mapply()), the result is the same.
In Python/pandas I can use this:
df['c'] = df[['a', 'b']].apply(lambda x: x['a'] + x['b'] if (x['a'] == x['b']) \
else x['b'] - x['a'], axis=1)
I want something similar in R. So the result should look like this:
a b c
1 1 1 2
2 2 20 18
3 3 3 6
4 4 4 8
5 5 50 45
One option is ifelse, which is a vectorized version of if/else. The row-by-row if/else shown in the OP's pandas code could instead be done in a for loop or with lapply/sapply, but that would be inefficient in R.
df <- transform(df, c= ifelse(a==b, a+b, b-a))
df
# a b c
#1 1 1 2
#2 2 20 18
#3 3 3 6
#4 4 4 8
#5 5 50 45
This can otherwise be written as
df$c <- with(df, ifelse(a==b, a+b, b-a))
to create the 'c' column in the original dataset.
As the OP wants a similar option in R using if/else:
df$c <- apply(df, 1, FUN = function(x) if(x[1]==x[2]) x[1]+x[2] else x[2]-x[1])
Here is a slightly more confusing algebraic method:
df$c <- with(df, b + ((-1)^((a==b)+1) * a))
df
a b c
1 1 1 2
2 2 20 18
3 3 3 6
4 4 4 8
5 5 50 45
The idea is that the "minus" operator is turned on or off based on the test a==b.
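To make the sign trick concrete, here is a short sketch that computes the sign factor on its own before using it. It works on plain vectors rather than the data frame, and the names a, b, sgn and res are chosen only for this illustration.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(1, 20, 3, 4, 50)

# (a == b) is TRUE/FALSE, which arithmetic coerces to 1/0,
# so the exponent is 2 where the values are equal and 1 otherwise.
sgn <- (-1)^((a == b) + 1)  # +1 where a == b, -1 where a != b

res <- b + sgn * a          # a + b where equal, b - a otherwise
res
```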
If you want an apply-style method, another way is to create a function and apply it with mapply:
fun1 <- function(x, y) if (x == y) {x + y} else {y-x}
df$c <- mapply(fun1, df$a, df$b)
df
# a b c
#1 1 1 2
#2 2 20 18
#3 3 3 6
#4 4 4 8
#5 5 50 45
Using dplyr package:
library(dplyr)
df <- df %>%
  mutate(c = if_else(a == b, a + b, b - a))
df
# a b c
# 1 1 1 2
# 2 2 20 18
# 3 3 3 6
# 4 4 4 8
# 5 5 50 45
A solution with apply:
myFunction <- function(x){
  a <- x[1]
  b <- x[2]
  #further values ignored (if there are more than 2 columns)
  value <- if(a==b) a + b else b - a
  #or more complicated stuff
  return(value)
}
df$c <- apply(df, 1, myFunction)
I need to add the quartile number to a table, but identical values must not be split across quartiles. An example would be:
table<-data.frame(num1=c(0,1,1,2,2,3,3,3,3,6), num2=seq(20,30,len=10))
If I use ntile from dplyr:
library(dplyr)
table$cuartil<-ntile(table$num1, 4)
I get an overlapping result:
> table
num1 num2 cuartil
0 20.00000 1
1 21.11111 1
1 22.22222 1
2 23.33333 2
2 24.44444 2
3 25.55556 3
3 26.66667 3
3 27.77778 3
3 28.88889 4
6 30.00000 4
Instead of that I would need:
num1 num2 cuartil
0 20 1
1 21.11111 1
1 22.22222 1
2 23.33333 2
2 24.44444 2
3 25.55556 3
3 26.66667 3
3 27.77778 3
3 28.88889 3
6 30 4
So the value 3 falls in only one group rather than two (3 and 4).
Is there another function that calculates percentiles without splitting equal values across groups the way ntile does?
You could just write your own function using quantile():
quartile <- function(x) {
  qrt <- quantile(x)
  y <- x
  y[x<=qrt[5]] <- 4
  y[x<=qrt[4]] <- 3
  y[x<=qrt[3]] <- 2
  y[x<=qrt[2]] <- 1
  y[x<=qrt[1]] <- 0
  return(y)
}
table$cuartil <- quartile(table$num1)
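Note that the function above labels the minimum value 0, because its last assignment matches x <= qrt[1]. If the minimum should fall into quartile 1, as in the table requested in the question, one possible sketch uses cut() on the quantile breaks. The helper name quartile_cut is hypothetical, and the sketch assumes the quantile breaks are all distinct, since cut() stops with an error on duplicated breaks.

```r
# Hypothetical helper: bin values by their quartile breaks.
quartile_cut <- function(x) {
  brks <- quantile(x)  # 0%, 25%, 50%, 75%, 100%
  # include.lowest = TRUE puts the minimum into the first bin;
  # labels = FALSE returns the bin number instead of a factor label.
  cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
}

num1 <- c(0, 1, 1, 2, 2, 3, 3, 3, 3, 6)
quartile_cut(num1)
```

Because equal values always fall between the same pair of breaks, a value such as 3 cannot be split across two groups.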
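Using Sam's function, a generic version might be:
quantG <- function(x, m) {
  # m + 1 break points: 0, 1/m, ..., 1
  qrt <- quantile(x, probs = seq(0, 1, 1/m))
  y <- x
  # assign labels from the top break (m) down to the minimum (0)
  for (i in 1:(m + 1)) {
    y[x <= qrt[m + 2 - i]] <- m + 1 - i
  }
  return(y)
}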
And:
table$cuartil <- quantG (table$num1,4)
gets the same result.