Suppose that I have the following.
A table with input data
table <- data.frame(id=c(1,2,3,4,5,6),
cost=c(100,200,300,400,500,600))
A list of possible outcomes, each with an associated probability
values<-list(c(1),
c(0.5),
c(0))
A simulation of different scenarios
esc<-sample(1:3,100,replace=T)
How can I add a new column that contains the following formula?
id cost final
1 100 100*ifelse(esc[1]==1,values[[1]],ifelse(esc[1]==2,values[[2]],values[[3]]))
2 200 200*ifelse(esc[2]==1,values[[1]],ifelse(esc[2]==2,values[[2]],values[[3]]))
Convert esc to a factor, using values as the labels, then convert the result back to numeric. This maps each scenario code in esc to its corresponding value.
esc <- as.numeric ( as.character( factor( esc, levels = sort( unique( esc )), labels = values) ) )
# [1] 1.0 0.5 0.5 0.0 1.0 0.0 0.0 0.5 0.5 1.0 1.0 1.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.0 1.0 0.5 1.0 1.0 0.5 1.0 0.5 0.0 0.5 0.5 0.5 0.5 1.0 0.0 0.0 0.0
# [38] 1.0 0.0 0.5 0.0 0.5 0.0 0.5 0.5 0.0 1.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.0 1.0 0.5 1.0 0.5 1.0 0.5 0.0 1.0 0.0 0.5 0.0 0.5 0.5
# [75] 0.5 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.5 0.0 0.0 0.0 0.5 0.5 0.0 0.5
table$esc <- esc[ 1: nrow(table) ] # add esc to table
Now multiply cost by esc to get final:
within( table, final <- cost * esc)
# id cost esc final
# 1 1 100 1.0 100
# 2 2 200 0.5 100
# 3 3 300 0.5 150
# 4 4 400 0.0 0
# 5 5 500 1.0 500
# 6 6 600 0.0 0
Data:
table <- data.frame(id=c(1,2,3,4,5,6), cost=c(100,200,300,400,500,600))
values <- c(1, 0.5, 0)
set.seed(1L)
esc <- sample(1:3,100,replace=T)
esc
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3 3 1 2 1 1 2 1 2 3 2 2 2 2 1 3 3 3 1 3 2 3 2 3 2 2 3 1 2 3 3 2 3 2 1 1 1 1 2 2 2 3 1 2 1 2 1 2 3 1 3 2 3 2 2 2
# [76] 3 3 2 3 3 2 3 2 1 3 1 3 1 1 1 1 1 2 3 3 3 2 2 3 2
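Since esc only holds the positions 1 to 3, plain vector indexing may be enough here. This is a minimal sketch, not the method from the answer above: it assumes values is an atomic vector as in the Data block (with the list from the question you would need unlist(values) first), it uses the hypothetical name cost_tbl to avoid masking base::table, and it hard-codes the first six draws printed above instead of calling sample(), because the draws depend on the R version.

```r
# Copy of the input data under a hypothetical name (cost_tbl).
cost_tbl <- data.frame(id = 1:6, cost = c(100, 200, 300, 400, 500, 600))
values <- c(1, 0.5, 0)

# First six scenario codes as printed in the answer above.
esc_first6 <- c(1, 2, 2, 3, 1, 3)

# values[esc_first6] picks the multiplier for each row by position.
cost_tbl$final <- cost_tbl$cost * values[esc_first6]
cost_tbl
```

With a full simulated esc, esc[seq_len(nrow(cost_tbl))] would take the place of esc_first6.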
Related
I have a file with codes like this:
V1 V2
1 1.0000000
2 0.2000000
3 0.5000000
4 0.0000000
And one matrix with the codes like the following:
1 1 1 1 1 1 1
3 3 3 3 3 3 3
4 2 4 2 4 2 4
1 1 1 1 1 1 1
I would like to use a loop to make a new matrix in which each value is the value of the codes as follows:
1.0 1.0 1.0 1.0 1.0 1.0 1.0
0.5 0.5 0.5 0.5 0.5 0.5 0.5
0.0 0.2 0.0 0.2 0.0 0.2 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0
Any ideas?
Here you can do it somewhat easier because of labels, but in general you can use match:
data:
df_map <- data.frame(
V1 = c(1, 2, 3, 4),
V2 = c(1, 0.2, 0.5, 0)
)
m_codes <- matrix(sample(1:4, 32, TRUE), nrow = 4)
solution:
m_values <- matrix(df_map$V2[match(m_codes, df_map$V1)], nrow = nrow(m_codes))
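To check the match approach against the exact matrix from the question, here is a small sketch. The only change from the solution above is the m_values[] <- ... assignment, which is one way to keep the dimensions of the code matrix without calling matrix() again.

```r
# Lookup table: code (V1) -> value (V2), as in the question.
df_map <- data.frame(V1 = c(1, 2, 3, 4), V2 = c(1, 0.2, 0.5, 0))

# The code matrix from the question, entered row by row.
m_codes <- matrix(c(1, 1, 1, 1, 1, 1, 1,
                    3, 3, 3, 3, 3, 3, 3,
                    4, 2, 4, 2, 4, 2, 4,
                    1, 1, 1, 1, 1, 1, 1),
                  nrow = 4, byrow = TRUE)

# match() returns, for every code, its row position in df_map$V1;
# assigning into m_values[] keeps the 4 x 7 shape.
m_values <- m_codes
m_values[] <- df_map$V2[match(m_codes, df_map$V1)]
m_values
```

No loop is needed, because match() is vectorised over the whole matrix.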
I have just one dataframe as below.
df=data.frame(o=c(rep("a",12),rep("b",3)), d=c(0,0,1,0,0.3,0.6,0,1,2,3,4,0,0,1,0))
> df
o d
1 a 0.0
2 a 0.0
3 a 1.0
4 a 0.0
5 a 0.3
6 a 0.6
7 a 0.0
8 a 1.0
9 a 2.0
10 a 3.0
11 a 4.0
12 a 0.0
13 b 0.0
14 b 1.0
15 b 0.0
I want to add a new column that counts frequency based on both columns 'o' and 'd'.
The count should start again from 1 whenever the value of column 'd' is zero, as in the hand-made result below.
> df_result
o d freq
1 a 0.0 1
2 a 0.0 2
3 a 1.0 2
4 a 0.0 3
5 a 0.3 3
6 a 0.6 3
7 a 0.0 5
8 a 1.0 5
9 a 2.0 5
10 a 3.0 5
11 a 4.0 5
12 a 0.0 1
13 b 0.0 2
14 b 1.0 2
15 b 0.0 1
In base R, use ave:
df$freq <- with(df, ave(d, cumsum(d == 0), FUN = length))
df
# o d freq
#1 a 0.0 1
#2 a 0.0 2
#3 a 1.0 2
#4 a 0.0 3
#5 a 0.3 3
#6 a 0.6 3
#7 a 0.0 5
#8 a 1.0 5
#9 a 2.0 5
#10 a 3.0 5
#11 a 4.0 5
#12 a 0.0 1
#13 b 0.0 2
#14 b 1.0 2
#15 b 0.0 1
With dplyr (this also keeps the helper column grp):
library(dplyr)
df %>% add_count(grp = cumsum(d == 0), name = "freq")
Using data.table and @Ronak Shah's approach:
df=data.frame(o=c(rep("a",12),rep("b",3)), d=c(0,0,1,0,0.3,0.6,0,1,2,3,4,0,0,1,0))
library(data.table)
setDT(df)[, freq := .N, by = cumsum(d == 0)]
df
#> o d freq
#> 1: a 0.0 1
#> 2: a 0.0 2
#> 3: a 1.0 2
#> 4: a 0.0 3
#> 5: a 0.3 3
#> 6: a 0.6 3
#> 7: a 0.0 5
#> 8: a 1.0 5
#> 9: a 2.0 5
#> 10: a 3.0 5
#> 11: a 4.0 5
#> 12: a 0.0 1
#> 13: b 0.0 2
#> 14: b 1.0 2
#> 15: b 0.0 1
Created on 2021-02-26 by the reprex package (v1.0.0)
One more answer using rle()
df$freq <- with(rle(cumsum(df$d == 0)), rep(lengths, lengths))
df
o d freq
1 a 0.0 1
2 a 0.0 2
3 a 1.0 2
4 a 0.0 3
5 a 0.3 3
6 a 0.6 3
7 a 0.0 5
8 a 1.0 5
9 a 2.0 5
10 a 3.0 5
11 a 4.0 5
12 a 0.0 1
13 b 0.0 2
14 b 1.0 2
15 b 0.0 1
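The answers above group only on cumsum(d == 0) and never look at 'o'. That works for this data because the "b" block happens to start with a zero. If the count also has to restart when 'o' changes, one possible sketch is to build a combined key. The names run and key are made up for illustration.

```r
df <- data.frame(o = c(rep("a", 12), rep("b", 3)),
                 d = c(0, 0, 1, 0, 0.3, 0.6, 0, 1, 2, 3, 4, 0, 0, 1, 0))

# Every zero in d starts a new run, so cumsum(d == 0) is a run id.
run <- cumsum(df$d == 0)

# Combine the run id with o so a run never spans two values of o.
key <- paste(df$o, run)

# ave() returns the group size for every row of the group.
df$freq <- ave(df$d, key, FUN = length)
df
```

For the sample data this gives the same freq column as the answers above.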
I want to index duplicates with respect to certain variables in R in a separate, new variable.
Let's assume that I have the following dataset:
a <- seq(from=0, to=1, by=.4)
b <- seq(from=0, to=1, by=.4)
c <- seq(from=0, to=1, by=.4)
d <- seq(from=0, to=1, by=.4)
df <- expand.grid(a=a, b=b, c=c, d=d)
> df[1:20,]
a b c d
1 0.0 0.0 0.0 0
2 0.4 0.0 0.0 0
3 0.8 0.0 0.0 0
4 0.0 0.4 0.0 0
5 0.4 0.4 0.0 0
6 0.8 0.4 0.0 0
7 0.0 0.8 0.0 0
8 0.4 0.8 0.0 0
9 0.8 0.8 0.0 0
10 0.0 0.0 0.4 0
11 0.4 0.0 0.4 0
12 0.8 0.0 0.4 0
13 0.0 0.4 0.4 0
14 0.4 0.4 0.4 0
15 0.8 0.4 0.4 0
16 0.0 0.8 0.4 0
17 0.4 0.8 0.4 0
18 0.8 0.8 0.4 0
19 0.0 0.0 0.8 0
20 0.4 0.0 0.8 0
In this case, the first entry and the tenth entry are identical with respect to a and b. How can I assign a value, e.g. "0.00-0.00", to a new variable for all rows that have this combination (row 19 as well), and do the same for all other combinations (e.g. rows 2, 11 and 20)?
Thanks a lot in advance!
Get the duplicated rows (the 10th, 11th, ...):
duplicated(df[,c(1,2)])
Get the original rows as well (the 1st, 2nd, ...):
duplicated(df[,c(1,2)], fromLast = TRUE)
Assign the range to the originals as well as the duplicates in a new column e:
dup <- duplicated(df[,c(1,2)], fromLast = TRUE) | duplicated(df[,c(1,2)])
df[dup,"e"] <- paste0(df[dup,1],"-",df[dup,2])
> head(df)
a b c d e
1 0.0 0.0 0 0 0-0
2 0.4 0.0 0 0 0.4-0
3 0.8 0.0 0 0 0.8-0
4 0.0 0.4 0 0 0-0.4
5 0.4 0.4 0 0 0.4-0.4
6 0.8 0.4 0 0 0.8-0.4
Note: in this example every row meets the original/duplicate criteria, so the range is assigned to all rows.
Try this
df$e <- paste(df$a,df$b)
Let me know if you were looking for something else
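If the label has to look exactly like "0.00-0.00" as in the question, sprintf can fix the number of decimals. This is only a sketch of that formatting step. The two-decimal format is taken from the example string in the question, and vals is a made-up helper name.

```r
# Same grid as in the question.
vals <- seq(from = 0, to = 1, by = 0.4)
df <- expand.grid(a = vals, b = vals, c = vals, d = vals)

# Build one label per row from a and b, always with two decimals.
df$e <- sprintf("%.2f-%.2f", df$a, df$b)
head(df$e)
```

Rows that share the same a and b then share the same label, so e can be used directly as a grouping variable.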
I have a dataset (df) like this:
Iso conc. rep time OD
1 1 1 0 0.2
1 1.5 2 0 0.2
1 2 3 0 0.2
2 1 1 0 0.3
2 1.5 2 0 0.25
2 2 3 0 0.3
1 1 1 1 0.4
1 1.5 2 1 0.35
1 2 3 1 0.38
2 1 1 1 0.4
2 1.5 2 1 0.45
2 2 3 1 0.43
And I want to get the result growth = OD(time=1) - OD(time=0), based on Iso, conc., and rep.
The output would be like this:
Iso conc. rep time growth
1 1 1 1 0.2
1 1.5 2 1 0.15
1 2 3 1 0.18
2 1 1 1 0.1
2 1.5 2 1 0.2
2 2 3 1 0.13
I have been thinking to use data.table to calculate the growth.
DT <- as.data.table(df)
DT[, , by = .(Iso,conc.,rep,set)]
But I don't know how to write the part before the two commas. Could somebody help me?
Using data.table you can simply do:
DT[,.(growth = OD[time==1]-OD[time==0]),.(Iso,conc.,rep)]
# Iso conc. rep growth
#1: 1 1.0 1 0.20
#2: 1 1.5 2 0.15
#3: 1 2.0 3 0.18
#4: 2 1.0 1 0.10
#5: 2 1.5 2 0.20
#6: 2 2.0 3 0.13
You can do this with:
DT[, list(growth = OD[time == 1] - OD[time == 0]), by=.(Iso,conc.,rep)]
Or alternatively, if you are sure there are only two values in each group:
DT[order(time), list(growth = diff(OD)), by=.(Iso,conc.,rep)]
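For comparison, the same subtraction can be sketched in base R without data.table by merging the time 0 rows with the time 1 rows. This is an alternative technique, not what the answers above do. The data frame is rebuilt here from the table in the question, and t0, t1 and res are made-up names.

```r
# Data from the question (12 rows: 6 at time 0, then 6 at time 1).
df <- data.frame(Iso   = rep(rep(1:2, each = 3), 2),
                 conc. = rep(c(1, 1.5, 2), 4),
                 rep   = rep(1:3, 4),
                 time  = rep(0:1, each = 6),
                 OD    = c(0.2, 0.2, 0.2, 0.3, 0.25, 0.3,
                           0.4, 0.35, 0.38, 0.4, 0.45, 0.43))

# Split by time point, then pair rows on the three key columns.
t0 <- df[df$time == 0, ]
t1 <- df[df$time == 1, ]
res <- merge(t1, t0, by = c("Iso", "conc.", "rep"), suffixes = c(".t1", ".t0"))

# Growth is the OD at time 1 minus the OD at time 0.
res$growth <- res$OD.t1 - res$OD.t0
res <- res[, c("Iso", "conc.", "rep", "growth")]
res
```

Like the diff() variant, this assumes exactly one row per group at each of the two time points.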
I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want to do is reshape this matrix file into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix file has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package, but I do not seem to be able to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
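Two details of that output may matter: the value column is called Freq, and row and col come back as factors by default. A possible sketch that gets closer to the layout asked for in the question uses the responseName and stringsAsFactors arguments of as.data.frame.table. The dimnames RowNum and ColumnNum are taken from the question's desired header, and long is a made-up name.

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(RowNum = 1:4, ColumnNum = 1:3))

# responseName renames the value column; stringsAsFactors = FALSE
# returns the dimnames as character instead of factor.
long <- as.data.frame.table(m, responseName = "Value", stringsAsFactors = FALSE)

# Dimnames are stored as character, so convert them to integer indices.
long$RowNum <- as.integer(long$RowNum)
long$ColumnNum <- as.integer(long$ColumnNum)
long
```

The integer conversion only makes sense when the dimnames really are numbers, as they are here.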
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
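One caveat with the test==test trick: a cell holding NA is dropped, because NA == NA is NA and which() skips it. If every cell should be kept, a sketch with row() and col() avoids the comparison entirely. The column names follow the question, and long is a made-up name.

```r
test <- matrix(c(0.1, 0.2, 0.2,
                 0.2, 0.4, 0.2,
                 0.3, 0.5, 0.3,
                 0.2, 0.5, 0.1),
               nrow = 4, byrow = TRUE)

# row() and col() give the index of every cell; c() flattens each
# matrix column by column, so the three vectors line up.
long <- data.frame(RowNum    = c(row(test)),
                   ColumnNum = c(col(test)),
                   Value     = c(test))
long
```

For a 100 x 90 matrix the same three lines give the 9000-row data frame described in the question.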