I have a simple for-loop that works as I would like on vectors. I would like to apply it to one column of a data frame, grouped by another column of the same data frame, e.g.:
# here is my for-loop working as expected on a simple vector:
vect <- c(0.5, 0.7, 0.1)
res <- vector(mode = "numeric", length = 3)
for (i in 1:length(vect)) {
res[i] <- sum(exp(-2 * (vect[i] - vect[-i])))
}
res
[1] 1.9411537 0.9715143 5.5456579
And here is pseudo-code trying to do it on a column of a data frame:
#Example data
my.df <- data.frame(let = rep(LETTERS[1:3], each = 3),
num1 = 1:3, vect = c(0.5, 0.7, 0.1), num3 = NA)
my.df
let num1 vect num3
1 A 1 0.5 NA
2 A 2 0.7 NA
3 A 3 0.1 NA
4 B 1 0.5 NA
5 B 2 0.7 NA
6 B 3 0.1 NA
7 C 1 0.5 NA
8 C 2 0.7 NA
9 C 3 0.1 NA
# My attempt:
require(tidyverse)
my.df <- my.df %>%
group_by(let) %>%
mutate(for (i in 1:length(vect)) {
num3[i] <- sum(exp(-2 * (vect[i] - vect[-i])))
})
What the result should look like (my pseudo-code above doesn't work):
let num1 vect num3
1 A 1 0.5 1.9411537
2 A 2 0.7 0.9715143
3 A 3 0.1 5.5456579
4 B 1 0.5 1.9411537
5 B 2 0.7 0.9715143
6 B 3 0.1 5.5456579
7 C 1 0.5 1.9411537
8 C 2 0.7 0.9715143
9 C 3 0.1 5.5456579
I feel like I am not using tidyverse logic by putting a for-loop inside mutate. Any suggestions are much appreciated.
The simple solution is to create a custom function and pass that to mutate. A working solution:
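custom_func <- function(vec) {
  # one result per element of the vector passed in (here: one group's values)
  res <- numeric(length(vec))
  for (i in seq_along(vec)) {
    # compare element i against every other element of the same vector
    res[i] <- sum(exp(-2 * (vec[i] - vec[-i])))
  }
  res
}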
library(tidyverse)
my.df %>%
group_by(let) %>%
mutate(num3 = custom_func(vect))
#> # A tibble: 9 x 4
#> # Groups: let [3]
#> let num1 vect num3
#> <fct> <int> <dbl> <dbl>
#> 1 A 1 0.5 1.94
#> 2 A 2 0.7 0.972
#> 3 A 3 0.1 5.55
#> 4 B 1 0.5 1.94
#> 5 B 2 0.7 0.972
#> 6 B 3 0.1 5.55
#> 7 C 1 0.5 1.94
#> 8 C 2 0.7 0.972
#> 9 C 3 0.1 5.55
I'm wondering whether a more elegant version of the custom function is possible - perhaps someone smarter than me can tell you whether purrr::map, for example, could provide an alternative.
We can use map_dbl from purrr and apply the formula for calculation.
library(dplyr)
library(purrr)
my.df %>%
group_by(let) %>%
mutate(num3 = map_dbl(seq_along(vect), ~ sum(exp(-2 * (vect[.] - vect[-.])))))
# let num1 vect num3
# <fct> <int> <dbl> <dbl>
#1 A 1 0.5 1.94
#2 A 2 0.7 0.972
#3 A 3 0.1 5.55
#4 B 1 0.5 1.94
#5 B 2 0.7 0.972
#6 B 3 0.1 5.55
#7 C 1 0.5 1.94
#8 C 2 0.7 0.972
#9 C 3 0.1 5.55
You can turn your for-loop into a sapply-call and then use it in mutate.
sapply takes a function and applies it to each element of a vector or list. In this case I'm looping over the row indices of each group, from 1 to n().
my.df %>%
group_by(let) %>%
mutate(num3 = sapply(1:n(), function(i) sum(exp(-2 * (vect[i] - vect[-i])))))
# A tibble: 9 x 4
# Groups: let [3]
# let num1 vect num3
# <fct> <int> <dbl> <dbl>
# 1 A 1 0.5 1.94
# 2 A 2 0.7 0.972
# 3 A 3 0.1 5.55
# 4 B 1 0.5 1.94
# 5 B 2 0.7 0.972
# 6 B 3 0.1 5.55
# 7 C 1 0.5 1.94
# 8 C 2 0.7 0.972
# 9 C 3 0.1 5.55
This is essentially equivalent to the very wrong-looking for-loop inside a mutate call shown below. In this case, however, I'd prefer the custom function provided by A. Stam.
my.df %>%
group_by(let) %>%
mutate(num3 = {
res <- numeric(length = n())
for (i in 1:n()) {
res[i] <- sum(exp(-2 * (vect[i] - vect[-i])))
}
res
})
You can also replace sapply with purrr's map_dbl.
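As a further base R variant, vapply behaves like sapply but declares the expected return type, so a malformed result raises an error instead of silently changing shape. This is a minimal sketch on the plain vector from the question, not part of the original answers; it uses seq_along() so that an empty input simply yields an empty result.

```r
vect <- c(0.5, 0.7, 0.1)

# For each position i, sum exp(-2 * (vect[i] - vect[j])) over all j != i.
# numeric(1) tells vapply that every call must return exactly one number.
res <- vapply(seq_along(vect),
              function(i) sum(exp(-2 * (vect[i] - vect[-i]))),
              numeric(1))
res
```

The same expression should drop into a grouped mutate in place of the sapply call, since vect then refers to the current group's values.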
Or using data.table
library(data.table)
setDT(my.df)[, num3 := unlist(lapply(seq_len(.N),
function(i) sum(exp(-2 * (vect[i] - vect[-i]))))), let]
my.df
# let num1 vect num3
#1: A 1 0.5 1.9411537
#2: A 2 0.7 0.9715143
#3: A 3 0.1 5.5456579
#4: B 1 0.5 1.9411537
#5: B 2 0.7 0.9715143
#6: B 3 0.1 5.5456579
#7: C 1 0.5 1.9411537
#8: C 2 0.7 0.9715143
#9: C 3 0.1 5.5456579
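The loop can also be avoided entirely. The sketch below is an alternative to the answers above, not one of them: it builds the matrix of pairwise differences with outer() and subtracts the diagonal term, which is always exp(0) = 1. The helper name pair_sum and its argument k are made up for illustration, and grouping is done with base R's ave() so no packages are needed. Note that outer() allocates an n-by-n matrix per group, so this assumes groups of moderate size.

```r
# Hypothetical helper: for each i, sum over j != i of exp(-k * (v[i] - v[j])).
pair_sum <- function(v, k = 2) {
  # outer(v, v, "-")[i, j] equals v[i] - v[j]; each row sum includes the
  # diagonal term exp(0) = 1, which is removed again by subtracting 1.
  rowSums(exp(-k * outer(v, v, "-"))) - 1
}

my.df <- data.frame(let = rep(LETTERS[1:3], each = 3),
                    num1 = 1:3, vect = c(0.5, 0.7, 0.1))

# ave() applies pair_sum separately within each level of `let`
# and returns the results in the original row order.
my.df$num3 <- ave(my.df$vect, my.df$let, FUN = pair_sum)
my.df
```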
I have a dataframe:
date comp ei
1 1/1/73 A NA
2 1/4/73 A 0.6
3 1/7/73 A 0.7
4 1/10/73 A 0.9
5 1/1/74 A 0.4
6 1/4/74 A 0.5
7 1/7/74 A 0.7
8 1/10/74 A 0.7
9 1/1/75 A 0.4
10 1/4/75 A 0.5
11 1/1/73 B 0.8
12 1/4/73 B 0.8
13 1/7/73 B 0.5
14 1/10/73 B 0.6
15 1/1/74 B 0.3
16 1/4/74 B 0.2
17 1/1/73 C NA
18 1/4/73 C 0.6
19 1/7/73 C 0.4
20 1/10/73 C 0.8
21 1/1/74 C 0.7
22 1/4/74 C 0.9
23 1/7/74 C 0.4
24 1/10/74 C 0.3
I want to calculate the rolling standard deviation of ei, grouped by comp. The window should cover the last 8 lines, but if only 6 lines exist so far, it should still compute the standard deviation of those. So I use width = 8 and partial = 6 in this code:
library(zoo)
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df <- transform(df, roll = ave(ei, comp, FUN = roll))
However, because some of my 'ei' values are NA, the partial part of the function doesn't help: whenever an NA falls within the past 8 lines, sd returns NA, so the result is still NA after 6 lines. Only for comp = B does partial = 6 work. The results are shown below:
date comp ei roll
1 1/1/73 A NA NA
2 1/4/73 A 0.6 NA
3 1/7/73 A 0.7 NA
4 1/10/73 A 0.9 NA
5 1/1/74 A 0.4 NA
6 1/4/74 A 0.5 NA
7 1/7/74 A 0.7 NA
8 1/10/74 A 0.7 NA
9 1/1/75 A 0.4 0.1726888
10 1/4/75 A 0.5 0.1772811
11 1/1/73 B 0.8 NA
12 1/4/73 B 0.8 NA
13 1/7/73 B 0.5 NA
14 1/10/73 B 0.6 NA
15 1/1/74 B 0.3 NA
16 1/4/74 B 0.2 0.2503331
17 1/1/73 C NA NA
18 1/4/73 C 0.6 NA
19 1/7/73 C 0.4 NA
20 1/10/73 C 0.8 NA
21 1/1/74 C 0.7 NA
22 1/4/74 C 0.9 NA
23 1/7/74 C 0.4 NA
24 1/10/74 C 0.3 NA
I would rather have my results look like the table below, where the first std. dev. for comp A is calculated in line 7 from the previous 6 non-NA values, and where comp C has a std. dev. in lines 23 and 24:
date comp ei roll
1 1/1/73 A NA NA
2 1/4/73 A 0.6 NA
3 1/7/73 A 0.7 NA
4 1/10/73 A 0.9 NA
5 1/1/74 A 0.4 NA
6 1/4/74 A 0.5 NA
7 1/7/74 A 0.7 0.1751190
8 1/10/74 A 0.7 0.1618347
9 1/1/75 A 0.4 0.1726888
10 1/4/75 A 0.5 0.1772811
11 1/1/73 B 0.8 NA
12 1/4/73 B 0.8 NA
13 1/7/73 B 0.5 NA
14 1/10/73 B 0.6 NA
15 1/1/74 B 0.3 NA
16 1/4/74 B 0.2 0.2503331
17 1/1/73 C NA NA
18 1/4/73 C 0.6 NA
19 1/7/73 C 0.4 NA
20 1/10/73 C 0.8 NA
21 1/1/74 C 0.7 NA
22 1/4/74 C 0.9 NA
23 1/7/74 C 0.4 0.2065591
24 1/10/74 C 0.3 0.2267787
How can I do this without running a na.omit code before calculating the rolling std. dev? The reason why I don't want to remove NA's is that I need the lines with comp and dates (plus other columns in my real dataset). Also, removing my NA values might, in my real dataset, lead to removing NA's in the middle of a period so that the rolling std. dev. function won't fit with the dates and my results will be wrong.
Is there a way to deal with this without removing the NA values?
1) FUN computes sd if there are at least 6 non-NAs and otherwise returns NA.
Then proceed as in the question.
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
FUN <- function(x) if (length(na.omit(x)) >= 6) sd(x, na.rm = TRUE) else NA
roll <- function(z) rollapplyr(z, width = 8, FUN = FUN,
fill = NA, partial = 6)
transform(df, roll = ave(ei, comp, FUN = roll))
2) The other possibility is to use na.omit and then merge the result back with the original data frame.
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df_roll_0 <- transform(na.omit(df), roll = ave(ei, comp, FUN = roll))
df_roll_m <- merge(df, df_roll_0, all = TRUE)
o <- with(df_roll_m, order(comp, date))
df_roll <- df_roll_m[o, ]
2a) This could also be expressed using dplyr/tidyr:
library(dplyr)
library(tidyr)
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df_roll_0 <- df %>%
drop_na %>%
group_by(comp) %>%
mutate(roll = roll(ei)) %>%
ungroup
df %>%
left_join(df_roll_0)
Note
Lines <- " date comp ei
1 1/1/73 A NA
2 1/4/73 A 0.6
3 1/7/73 A 0.7
4 1/10/73 A 0.9
5 1/1/74 A 0.4
6 1/4/74 A 0.5
7 1/7/74 A 0.7
8 1/10/74 A 0.7
9 1/1/75 A 0.4
10 1/4/75 A 0.5
11 1/1/73 B 0.8
12 1/4/73 B 0.8
13 1/7/73 B 0.5
14 1/10/73 B 0.6
15 1/1/74 B 0.3
16 1/4/74 B 0.2
17 1/1/73 C NA
18 1/4/73 C 0.6
19 1/7/73 C 0.4
20 1/10/73 C 0.8
21 1/1/74 C 0.7
22 1/4/74 C 0.9
23 1/7/74 C 0.4
24 1/10/74 C 0.3"
df <- read.table(text = Lines)
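To see what the window function in approach 1) does, the rolling calculation can be reproduced by hand for a single company. The sketch below uses only base R and the ei values of comp A from the question; the names sd_min6 and roll_manual are made up for illustration, and the explicit window indexing is a stand-in for rollapplyr, not its implementation.

```r
# Standard deviation of the non-NA values, but only if at least 6 are present.
sd_min6 <- function(x) {
  if (sum(!is.na(x)) >= 6) sd(x, na.rm = TRUE) else NA_real_
}

# ei values for comp A, in date order
ei_A <- c(NA, 0.6, 0.7, 0.9, 0.4, 0.5, 0.7, 0.7, 0.4, 0.5)

# Right-aligned window of up to 8 values ending at position i
roll_manual <- sapply(seq_along(ei_A), function(i) {
  sd_min6(ei_A[max(1, i - 7):i])
})
roll_manual
```

Positions 1 to 6 stay NA because fewer than 6 non-NA values are available, and position 7 is the first with a value, which matches the desired output in the question.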
Simple question, but can't seem to find the answer.
I am trying to divide all cells in a column by the first cell of that column within each group.
V1=c(4,5,6,3,2,7)
V2= c(2,4,5,8,7,9)
group=c(1,1,1,2,2,2)
D= data.frame(V1=V1, V2=V2, group=group)
D
V1 V2 group
1 4 2 1
2 5 4 1
3 6 5 1
4 3 8 2
5 2 7 2
6 7 9 2
This is what I would like to get:
V1 V2 group
1 1.0 1.0 1
2 1.3 2.0 1
3 1.5 2.5 1
4 1.0 1.0 2
5 0.7 0.9 2
6 2.3 1.1 2
A dplyr option:
library(dplyr)
D %>%
group_by(group) %>%
mutate_at(c("V1", "V2"), ~./first(.))
# A tibble: 6 x 3
# Groups: group [2]
V1 V2 group
<dbl> <dbl> <dbl>
1 1 1 1
2 1.25 2 1
3 1.5 2.5 1
4 1 1 2
5 0.667 0.875 2
6 2.33 1.12 2
Here is a one-liner base R solution,
D[-3] <- sapply(D[-3], function(i) ave(i, D$group, FUN = function(i)i / i[1]))
D
# V1 V2 group
#1 1.0000000 1.000 1
#2 1.2500000 2.000 1
#3 1.5000000 2.500 1
#4 1.0000000 1.000 2
#5 0.6666667 0.875 2
#6 2.3333333 1.125 2
A dplyr way:
library(dplyr)
D %>%
group_by(group) %>%
mutate_all(~ round(. / first(.), 1))
A data.table approach:
library(data.table)
setDT(D)[, lapply(.SD, function(x) round(x / x[1], 1)), by = group]
A base R solution:
split(D, D$group) <- lapply(split(D, D$group),
function(.) {
.[,1:2] <- as.data.frame(t(t(.[, 1:2]) / unlist(.[1,1:2])))
.
})
D
# V1 V2 group
# 1 1.0000000 1.000 1
# 2 1.2500000 2.000 1
# 3 1.5000000 2.500 1
# 4 1.0000000 1.000 2
# 5 0.6666667 0.875 2
# 6 2.3333333 1.125 2
An option with base R
by(D[-3], D[3], FUN = function(x) x/unlist(x[1,])[col(x)])
I have a correlation dataset that looks like this:
V1 V2 R2
1 2 0.4
1 3 0.5
3 5 0.3
I want to convert it to two-column data in which each R2 value (y, in column R2) is repeated for every integer between V1 and V2 (x, in column V), for scatter plotting. It would look like this:
V R2
1 0.4
2 0.4
1 0.5
2 0.5
3 0.5
3 0.3
4 0.3
5 0.3
How can I do this in R?
In the tidyverse, you can make a list column of the required vectors with purrr::map2 to iterate seq over each pair of start and end points, and then expand with tidyr::unnest:
df <- data.frame(V1 = c(1L, 1L, 3L),
V2 = c(2L, 3L, 5L),
R2 = c(0.4, 0.5, 0.3))
library(tidyverse)
df %>% transmute(V = map2(V1, V2, seq), R2) %>% unnest()
#> R2 V
#> 1 0.4 1
#> 2 0.4 2
#> 3 0.5 1
#> 4 0.5 2
#> 5 0.5 3
#> 6 0.3 3
#> 7 0.3 4
#> 8 0.3 5
In base R, there isn't a simple equivalent of unnest, so it's easier to use Map (the multivariate lapply, roughly equivalent to purrr::map2 above) to build a list of data frames, complete with the R2 value (recycled by data.frame), which can then be do.call(rbind, ...)ed into a single data frame:
do.call(rbind,
Map(function(v1, v2, r2){data.frame(V = v1:v2, R2 = r2)},
df$V1, df$V2, df$R2))
#> V R2
#> 1 1 0.4
#> 2 2 0.4
#> 3 1 0.5
#> 4 2 0.5
#> 5 3 0.5
#> 6 3 0.3
#> 7 4 0.3
#> 8 5 0.3
Check out the intermediate products of each to get a feel for how they work.
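For the base R version, the intermediate product is the list returned by Map before it is bound together. A short sketch of how one might inspect it (the name pieces is introduced here for illustration only):

```r
df <- data.frame(V1 = c(1L, 1L, 3L),
                 V2 = c(2L, 3L, 5L),
                 R2 = c(0.4, 0.5, 0.3))

# One small data frame per input row; R2 is recycled to the length of v1:v2.
pieces <- Map(function(v1, v2, r2) data.frame(V = v1:v2, R2 = r2),
              df$V1, df$V2, df$R2)

length(pieces)  # one list element per row of df
pieces[[2]]     # the expansion of the second row (V1 = 1, V2 = 3)

# Stacking the pieces gives the final long data frame.
combined <- do.call(rbind, pieces)
```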
Here is one option using data.table, where df1 holds the same data as df above:
library(data.table)
setDT(df1)[, .(V = V1:V2, R2), by = .(grp = 1:nrow(df1))][, grp := NULL][]
# V R2
#1: 1 0.4
#2: 2 0.4
#3: 1 0.5
#4: 2 0.5
#5: 3 0.5
#6: 3 0.3
#7: 4 0.3
#8: 5 0.3
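A fully vectorised base R alternative is also possible with rep() and sequence(). This is a sketch that is not taken from the answers above, and it assumes V1 <= V2 in every row with integer-valued endpoints; the names n and out are illustrative.

```r
df <- data.frame(V1 = c(1L, 1L, 3L),
                 V2 = c(2L, 3L, 5L),
                 R2 = c(0.4, 0.5, 0.3))

# Number of integers in each range V1..V2
n <- df$V2 - df$V1 + 1L

# sequence(n) counts 1..n[1], 1..n[2], ...; shifting by the repeated start
# value turns those counters into the actual V values.
out <- data.frame(V  = rep(df$V1, n) + sequence(n) - 1L,
                  R2 = rep(df$R2, n))
out
```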
I have two tables, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen according to the values of B and C in the same row of df.
For example
in the first data row of df, A = 20, B = 4 and C = 2, so the correct coefficient is 0.3 (row B = 4, column C = 2 of cf),
and the result is 20 * 0.3 = 6
Is there a simple way to do that in R?
Thanks in advance!!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
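The vectorised line works because R lets you index a matrix with a two-column matrix, where each row of the index is one (row, column) pair. A minimal sketch spelling this out, assuming cf is stored as a plain numeric matrix (the names idx, coef and result are illustrative):

```r
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))

# Each row of idx is one (B, C) pair, i.e. one (row, column) position in cf.
idx <- cbind(df$B, df$C)

# Indexing with a two-column matrix picks one element per pair.
coef <- cf[idx]

result <- df$A * coef
result
```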
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=T, stringsAsFactors=F)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
I'm desperately trying to lag a variable by group. I found this post that deals with essentially the same problem I'm facing, but the solution does not work for me, no idea why.
This is my problem:
library(dplyr)
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df %>%
arrange(monthvec) %>%
mutate(growth=ifelse(monthvec==1, 0.3,
ifelse(monthvec==2, 0.5,
ifelse(monthvec==3, 0.7,
ifelse(monthvec==4, 0.1,
ifelse(monthvec==5, 0.6,NA))))))
df%>%
group_by(monthvec) %>%
mutate(lag.growth = lag(growth, order_by=monthvec))
Source: local data frame [13 x 3]
Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 0.3
3 2 0.5 NA
4 2 0.5 0.5
5 3 0.7 NA
6 3 0.7 0.7
7 3 0.7 0.7
8 4 0.1 NA
9 4 0.1 0.1
10 4 0.1 0.1
11 5 0.6 NA
12 5 0.6 0.6
13 5 0.6 0.6
This is what I'd like it to be in the end:
df$lag.growth <- c(NA, NA, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,0.7,0.7, 0.1,0.1,0.1)
monthvec growth lag.growth
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
I believe that one problem is that my groups are not of equal length...
Thanks for helping out.
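Here is an idea. In your attempt, grouping by monthvec makes lag() shift values within each month, so every row only ever sees its own month's growth. Instead, we group by monthvec to get the number of rows (cnt) in each group, ungroup, and use the first value of cnt as the size of the lag. We then regroup on monthvec and replace the values in each group with the first value of that group.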
library(dplyr)
df %>%
group_by(monthvec) %>%
mutate(cnt = n()) %>%
ungroup() %>%
mutate(lag.growth = lag(growth, first(cnt))) %>%
group_by(monthvec) %>%
mutate(lag.growth = first(lag.growth)) %>%
select(-cnt)
which gives,
# A tibble: 13 x 3
# Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
You may join your original data with a dataframe with a shifted "monthvec".
left_join(df, df %>% mutate(monthvec = monthvec + 1) %>% unique(), by = "monthvec")
# monthvec growth.x growth.y
# 1 1 0.3 NA
# 2 1 0.3 NA
# 3 2 0.5 0.3
# 4 2 0.5 0.3
# 5 3 0.7 0.5
# 6 3 0.7 0.5
# 7 3 0.7 0.5
# 8 4 0.1 0.7
# 9 4 0.1 0.7
# 10 4 0.1 0.7
# 11 5 0.6 0.1
# 12 5 0.6 0.1
# 13 5 0.6 0.1
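The same shifted-lookup idea can be sketched in base R with match(). This is an illustration and not one of the answers above; it assumes that growth is constant within each month and that "previous month" means monthvec - 1. The name lookup is made up for the example.

```r
monthvec <- rep(1:5, times = c(2, 2, 3, 3, 3))
df <- data.frame(monthvec = monthvec,
                 growth = c(0.3, 0.5, 0.7, 0.1, 0.6)[monthvec])

# One row per month, since growth does not vary within a month.
lookup <- unique(df[c("monthvec", "growth")])

# For every row, find the lookup row of the previous month;
# match() returns NA where no previous month exists (month 1).
df$lag.growth <- lookup$growth[match(df$monthvec - 1, lookup$monthvec)]
df
```

Because the lookup is keyed on the month value rather than on row position, unequal group sizes do not matter here.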