I have two data frames, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen by the values of B and C in df.
For example,
in the first row of df, A = 20, B = 4 and C = 2, so the correct coefficient is 0.3 (row 4, column 2 of cf),
and the result is 20 * 0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized version, which indexes cf with a two-column matrix of (row, column) pairs:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
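To see why the matrix indexing works, here is a self-contained sketch in base R. It assumes cf is stored as a plain numeric matrix (the question does not say whether it is a matrix or a data frame), and the names idx and coefs are mine:

```r
# Data from the question; cf is assumed to be a numeric matrix here
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)

# Each row of idx is one (row, column) pair into cf
idx <- cbind(df$B, df$C)

# Indexing a matrix with a two-column matrix returns one element per pair
coefs <- cf[idx]

# Element-wise product of A with the coefficient picked for each row
result <- df$A * coefs
result
```

Keeping coefs as a separate vector makes it easy to check which coefficient was picked for each row before multiplying.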
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=TRUE, stringsAsFactors=FALSE)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
I have a dataframe:
date comp ei
1 1/1/73 A NA
2 1/4/73 A 0.6
3 1/7/73 A 0.7
4 1/10/73 A 0.9
5 1/1/74 A 0.4
6 1/4/74 A 0.5
7 1/7/74 A 0.7
8 1/10/74 A 0.7
9 1/1/75 A 0.4
10 1/4/75 A 0.5
11 1/1/73 B 0.8
12 1/4/73 B 0.8
13 1/7/73 B 0.5
14 1/10/73 B 0.6
15 1/1/74 B 0.3
16 1/4/74 B 0.2
17 1/1/73 C NA
18 1/4/73 C 0.6
19 1/7/73 C 0.4
20 1/10/73 C 0.8
21 1/1/74 C 0.7
22 1/4/74 C 0.9
23 1/7/74 C 0.4
24 1/10/74 C 0.3
I want to calculate the rolling standard deviation of ei grouped by comp. It should cover the last 8 rows, but if only 6 rows exist so far, it should still be calculated from those. So I use width = 8 and partial = 6 in this code:
library(zoo)  # provides rollapplyr
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df <- transform(df, roll = ave(ei, comp, FUN = roll))
However, because some of my ei values are NA, the partial part of the function doesn't work whenever there is an NA among the past 8 rows: sd returns NA for any window that contains an NA. So after 6 rows the standard deviation is still NA. Only for comp = B does partial = 6 work. The results are shown below:
date comp ei roll
1 1/1/73 A NA NA
2 1/4/73 A 0.6 NA
3 1/7/73 A 0.7 NA
4 1/10/73 A 0.9 NA
5 1/1/74 A 0.4 NA
6 1/4/74 A 0.5 NA
7 1/7/74 A 0.7 NA
8 1/10/74 A 0.7 NA
9 1/1/75 A 0.4 0.1726888
10 1/4/75 A 0.5 0.1772811
11 1/1/73 B 0.8 NA
12 1/4/73 B 0.8 NA
13 1/7/73 B 0.5 NA
14 1/10/73 B 0.6 NA
15 1/1/74 B 0.3 NA
16 1/4/74 B 0.2 0.2503331
17 1/1/73 C NA NA
18 1/4/73 C 0.6 NA
19 1/7/73 C 0.4 NA
20 1/10/73 C 0.8 NA
21 1/1/74 C 0.7 NA
22 1/4/74 C 0.9 NA
23 1/7/74 C 0.4 NA
24 1/10/74 C 0.3 NA
I would rather have my results look like the table below, where the first standard deviation for comp A is calculated in row 7 from the previous 6 non-NA values, and where comp C has a standard deviation in rows 23 and 24:
date comp ei roll
1 1/1/73 A NA NA
2 1/4/73 A 0.6 NA
3 1/7/73 A 0.7 NA
4 1/10/73 A 0.9 NA
5 1/1/74 A 0.4 NA
6 1/4/74 A 0.5 NA
7 1/7/74 A 0.7 0.1751190
8 1/10/74 A 0.7 0.1618347
9 1/1/75 A 0.4 0.1726888
10 1/4/75 A 0.5 0.1772811
11 1/1/73 B 0.8 NA
12 1/4/73 B 0.8 NA
13 1/7/73 B 0.5 NA
14 1/10/73 B 0.6 NA
15 1/1/74 B 0.3 NA
16 1/4/74 B 0.2 0.2503331
17 1/1/73 C NA NA
18 1/4/73 C 0.6 NA
19 1/7/73 C 0.4 NA
20 1/10/73 C 0.8 NA
21 1/1/74 C 0.7 NA
22 1/4/74 C 0.9 NA
23 1/7/74 C 0.4 0.2065591
24 1/10/74 C 0.3 0.2267787
How can I do this without running na.omit before calculating the rolling standard deviation? I don't want to remove the NAs because I need the rows with comp and date (plus other columns in my real dataset). Also, in my real dataset, removing NA values might drop rows in the middle of a period, so the rolling window would no longer line up with the dates and my results would be wrong.
Is there a way to deal with this without removing the NA values?
1) FUN computes sd if there are at least 6 non-NAs and otherwise returns NA.
Then proceed as in the question.
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
FUN <- function(x) if (length(na.omit(x)) >= 6) sd(x, na.rm = TRUE) else NA
roll <- function(z) rollapplyr(z, width = 8, FUN = FUN,
fill = NA, partial = 6)
transform(df, roll = ave(ei, comp, FUN = roll))
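For anyone who wants to check the logic of (1) without zoo, here is a rough base-R sketch of the same idea. The helper names sd_min_obs and roll_sd are made up for this illustration, and the loop is only a stand-in for rollapplyr, not a reimplementation of it:

```r
# Hypothetical helper: sd of the non-NA values, or NA when fewer
# than min_obs non-NA values are available
sd_min_obs <- function(x, min_obs = 6) {
  ok <- x[!is.na(x)]
  if (length(ok) >= min_obs) sd(ok) else NA_real_
}

# Base-R stand-in for a right-aligned rolling window of the given width;
# early windows are simply shorter than width
roll_sd <- function(z, width = 8, min_obs = 6) {
  sapply(seq_along(z), function(i) {
    window <- z[max(1, i - width + 1):i]
    sd_min_obs(window, min_obs)
  })
}

# ei values for comp A from the question
ei_A <- c(NA, 0.6, 0.7, 0.9, 0.4, 0.5, 0.7, 0.7, 0.4, 0.5)
roll_A <- roll_sd(ei_A)
round(roll_A, 7)
```

For comp A this leaves the first six positions as NA and starts producing values at position 7, which is the behaviour asked for in the question.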
2) The other possibility is to use na.omit and then merge the result back with the original data frame.
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df_roll_0 <- transform(na.omit(df), roll = ave(ei, comp, FUN = roll))
df_roll_m <- merge(df, df_roll_0, all = TRUE)
o <- with(df_roll_m, order(comp, date))
df_roll <- df_roll_m[o, ]
2a) This could also be expressed using dplyr/tidyr:
library(dplyr)
library(tidyr)
library(zoo)
df$date <- as.Date(df$date, "%d/%m/%y")
roll <- function(z) rollapplyr(z, width = 8, FUN = sd, fill = NA, partial = 6)
df_roll_0 <- df %>%
drop_na %>%
group_by(comp) %>%
mutate(roll = roll(ei)) %>%
ungroup
df %>%
left_join(df_roll_0)
Note: the input in reproducible form is:
Lines <- " date comp ei
1 1/1/73 A NA
2 1/4/73 A 0.6
3 1/7/73 A 0.7
4 1/10/73 A 0.9
5 1/1/74 A 0.4
6 1/4/74 A 0.5
7 1/7/74 A 0.7
8 1/10/74 A 0.7
9 1/1/75 A 0.4
10 1/4/75 A 0.5
11 1/1/73 B 0.8
12 1/4/73 B 0.8
13 1/7/73 B 0.5
14 1/10/73 B 0.6
15 1/1/74 B 0.3
16 1/4/74 B 0.2
17 1/1/73 C NA
18 1/4/73 C 0.6
19 1/7/73 C 0.4
20 1/10/73 C 0.8
21 1/1/74 C 0.7
22 1/4/74 C 0.9
23 1/7/74 C 0.4
24 1/10/74 C 0.3"
df <- read.table(text = Lines)
I'm desperately trying to lag a variable by group. I found this post that deals with essentially the same problem I'm facing, but the solution does not work for me, no idea why.
This is my problem:
library(dplyr)
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df %>%
arrange(monthvec) %>%
mutate(growth=ifelse(monthvec==1, 0.3,
ifelse(monthvec==2, 0.5,
ifelse(monthvec==3, 0.7,
ifelse(monthvec==4, 0.1,
ifelse(monthvec==5, 0.6,NA))))))
df%>%
group_by(monthvec) %>%
mutate(lag.growth = lag(growth, order_by=monthvec))
Source: local data frame [13 x 3]
Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 0.3
3 2 0.5 NA
4 2 0.5 0.5
5 3 0.7 NA
6 3 0.7 0.7
7 3 0.7 0.7
8 4 0.1 NA
9 4 0.1 0.1
10 4 0.1 0.1
11 5 0.6 NA
12 5 0.6 0.6
13 5 0.6 0.6
This is what I'd like it to be in the end:
df$lag.growth <- c(NA, NA, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,0.7,0.7, 0.1,0.1,0.1)
monthvec growth lag.growth
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
I believe that one problem is that my groups are not of equal length...
Thanks for helping out.
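Here is an idea. We group by monthvec to get the number of rows (cnt) in each group, then ungroup and use the first value of cnt as the size of the lag. Finally we regroup on monthvec and replace the values in each group with the group's first value. Note that this relies on every group before the last having at least as many rows as the first one, so that each group's first row is lagged into the previous group.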
library(dplyr)
df %>%
group_by(monthvec) %>%
mutate(cnt = n()) %>%
ungroup() %>%
mutate(lag.growth = lag(growth, first(cnt))) %>%
group_by(monthvec) %>%
mutate(lag.growth = first(lag.growth)) %>%
select(-cnt)
which gives,
# A tibble: 13 x 3
# Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
You can join your original data with a copy of itself in which monthvec is shifted by one, so that each month picks up the previous month's growth (growth.y below).
left_join(df, df %>% mutate(monthvec = monthvec + 1) %>% unique(), by = "monthvec")
# monthvec growth.x growth.y
# 1 1 0.3 NA
# 2 1 0.3 NA
# 3 2 0.5 0.3
# 4 2 0.5 0.3
# 5 3 0.7 0.5
# 6 3 0.7 0.5
# 7 3 0.7 0.5
# 8 4 0.1 0.7
# 9 4 0.1 0.7
# 10 4 0.1 0.7
# 11 5 0.6 0.1
# 12 5 0.6 0.1
# 13 5 0.6 0.1
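The same "look up the previous month" idea can be sketched in base R with match, which does not depend on group sizes. This assumes growth is constant within each monthvec, as in the question; the name lookup is mine:

```r
# Rebuild the example data from the question
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df[order(df$monthvec), , drop = FALSE]
df$growth <- c(0.3, 0.5, 0.7, 0.1, 0.6)[df$monthvec]

# One row per month: the growth value attached to that month
lookup <- unique(df[, c("monthvec", "growth")])

# For every row, find the previous month in the lookup table;
# months without a predecessor get NA
df$lag.growth <- lookup$growth[match(df$monthvec - 1, lookup$monthvec)]
df
```

Because the lookup is keyed on the month itself, a gap in the months (say, month 3 missing) would give NA for month 4 rather than silently borrowing from month 2.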
Here is a sample data set:
sample1 <- data.frame(Names=letters[1:10], Values=sample(seq(0.1,1,0.1)))
When I reorder the data set, the row names no longer run in order:
sample1[order(sample1$Values), ]
Names Values
7 g 0.1
4 d 0.2
3 c 0.3
9 i 0.4
10 j 0.5
5 e 0.6
8 h 0.7
6 f 0.8
1 a 0.9
2 b 1.0
Desired output:
Names Values
1 g 0.1
2 d 0.2
3 c 0.3
4 i 0.4
5 j 0.5
6 e 0.6
7 h 0.7
8 f 0.8
9 a 0.9
10 b 1.0
Try
rownames(Ordersample2) <- 1:10
or more generally
rownames(Ordersample2) <- NULL
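A self-contained sketch of the NULL approach, using the sample data from the question. The name ordered is mine, and set.seed is only there so the shuffle is repeatable:

```r
set.seed(1)  # assumption: any seed works, it just fixes the shuffle
sample1 <- data.frame(Names = letters[1:10],
                      Values = sample(seq(0.1, 1, 0.1)))

# Reordering keeps the original row names attached to each row
ordered <- sample1[order(sample1$Values), ]

# Assigning NULL resets the row names to 1, 2, ..., nrow(ordered)
rownames(ordered) <- NULL
ordered
```

Using NULL avoids hard-coding the number of rows, so it keeps working if the data frame grows or shrinks.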
For a dplyr use case, I used:
df %>% as.data.frame(row.names = 1:nrow(.))
I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want to do is reshape this matrix into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix file has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package, but I can't seem to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
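Another base-R variant, in case you want integer row and column numbers (as.data.frame.table returns them as factors). This is only a sketch; the column names RowNum, ColumnNum and Value are taken from the desired output in the question, and long is a name I made up:

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# row(m) and col(m) give the row and column index of every cell;
# as.vector() flattens all three in the same column-major order
long <- data.frame(RowNum    = as.vector(row(m)),
                   ColumnNum = as.vector(col(m)),
                   Value     = as.vector(m))
long
```

For a 100 x 90 matrix the same three lines give a 9000-row data frame without any extra packages.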
I have this data frame:
df <- data.frame(A=c("a","b","c","d","e","f","g","h","i"),
B=c("1","1","1","2","2","2","3","3","3"),
C=c(0.1,0.2,0.4,0.1,0.5,0.7,0.1,0.2,0.5))
> df
A B C
1 a 1 0.1
2 b 1 0.2
3 c 1 0.4
4 d 2 0.1
5 e 2 0.5
6 f 2 0.7
7 g 3 0.1
8 h 3 0.2
9 i 3 0.5
I would like to add 1000 further columns and fill these columns with the values generated by:
transform(df, D=ave(C, B, FUN=function(b) sample(b, replace=TRUE)))
I've tried with a for loop but it does not work:
for (i in 4:1000){
df[, 4:1000] <- NA
df[,i] = transform(df, D=ave(C, B, FUN=function(b) sample(b, replace=TRUE)))
}
For efficiency reasons, I suggest running sample only once for each group. This can be achieved as follows:
sample2 <- function(x, size)
{
if(length(x)==1) rep(x, size) else sample(x, size, replace=TRUE)
}
new_df <- do.call(rbind, by(df, df$B,
function(d) cbind(d, matrix(sample2(d$C, length(d$C)*1000),
ncol=1000))))
Notes:
I've created sample2 in case there is a group with only one C value. Check ?sample to see what I mean.
The names of the columns will be numbers, from 1 to 1000. This can be changed as in the answer by @agstudy.
The row names are also changed. "Fixing" them is similar, just use row.names instead of col.names.
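To illustrate the first note: when sample is given a single number n that is at least 1, it draws from 1:n instead of returning that number (this is described in ?sample). A small sketch, reusing sample2 from above with made-up inputs; the names plain and safe are mine:

```r
sample2 <- function(x, size) {
  if (length(x) == 1) rep(x, size) else sample(x, size, replace = TRUE)
}

set.seed(1)  # only so the draws are repeatable

# A length-one numeric x >= 1 is treated as 1:x by sample()
plain <- sample(5, 10, replace = TRUE)   # values come from 1:5

# sample2 repeats the single value instead
safe <- sample2(5, 10)                   # ten copies of 5
```

With the data in the question every group has three values, so the guard never triggers, but it protects against a group that happens to contain a single value.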
Using replicate for example:
cbind(df,replicate(1000,ave(df$C, df$B,
FUN=function(b) sample(b, replace=TRUE))))
To add 4 columns for example:
cbind(df,replicate(4,ave(df$C, df$B,
FUN=function(b) sample(b, replace=TRUE))))
A B C 1 2 3 4
1 a 1 0.1 0.2 0.2 0.1 0.2
2 b 1 0.2 0.4 0.2 0.4 0.4
3 c 1 0.4 0.1 0.1 0.1 0.1
4 d 2 0.1 0.1 0.5 0.5 0.1
5 e 2 0.5 0.7 0.1 0.5 0.1
6 f 2 0.7 0.1 0.7 0.7 0.7
7 g 3 0.1 0.2 0.5 0.2 0.2
8 h 3 0.2 0.2 0.1 0.2 0.1
9 i 3 0.5 0.5 0.5 0.1 0.5
You may want to rename the columns with something like:
gsub('([0-9]+)','D\\1',colnames(res))
[1] "A" "B" "C" "D1" "D2" "D3" "D4"
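Putting the replicate call and the renaming together, one possible end-to-end sketch looks like this. It names the new columns directly with paste0 instead of gsub; n_new, draws and res are names I chose, and set.seed is only there for repeatability:

```r
set.seed(42)  # assumption: fixes the random draws for the example
df <- data.frame(A = letters[1:9],
                 B = rep(c("1", "2", "3"), each = 3),
                 C = c(0.1, 0.2, 0.4, 0.1, 0.5, 0.7, 0.1, 0.2, 0.5))

n_new <- 4  # use 1000 for the case in the question

# Each replicate resamples C with replacement within the groups of B
draws <- replicate(n_new,
                   ave(df$C, df$B, FUN = function(b) sample(b, replace = TRUE)))

# Name the new columns D1, D2, ... before binding them to df
colnames(draws) <- paste0("D", seq_len(n_new))
res <- cbind(df, draws)
colnames(res)
```

Naming the matrix before cbind means there is no need to pattern-match on the column names afterwards.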