I'm trying to generate a dataframe of parameter values for a sensitivity analysis, where each row is one parameter set. I'd like to automate the generation of the dataframe so that each parameter in turn is varied by -10% and +10% whilst all the other values are kept the same (see the example of the desired df below). Does anyone know how I can do this? I feel like the answer is obvious, but I really can't see what it is!
Example of desired df:
a <- c(10,9,11,10,10,10,10,10,10)
b <- c(20,20,20,18,22,20,20,20,20)
c <- c(30,30,30,30,30,27,33,30,30)
d <- c(40,40,40,40,40,40,40,36,44)
parms <- data.frame(a,b,c,d)
I think the function expand.grid is what you are looking for.
a <- c(9,10,11)
b <- c(18,20,22)
c <- c(27,30,33)
d <- c(36,40,44)
test <- expand.grid(a,b,c,d)
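Two caveats, in case they matter: expand.grid() builds the full factorial (3^4 = 81 rows here) rather than only the one-at-a-time rows in your example, and it names the columns Var1 to Var4 unless you pass named arguments, e.g.:
test <- expand.grid(a = a, b = b, c = c, d = d)
nrow(test) # 81 combinations, columns named a, b, c, d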
To automate the first part (variation by 10% around center value) you may use this approach:
library(magrittr)
vary_around_center <- function(center) {
  c(center * 0.9, center, center * 1.1)
}

c(10, 20, 30, 40) %>%
  lapply(vary_around_center) %>%
  expand.grid()
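The same result is available in base R without magrittr; passing a named list to expand.grid() also keeps the parameter names as column names (the object names vals and grid are just illustrative):
# Reuse vary_around_center() from above on a named parameter vector
vals <- lapply(c(a = 10, b = 20, c = 30, d = 40), vary_around_center)
grid <- expand.grid(vals) # columns a, b, c, d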
I think this will get you the "one parameter changing at a time" pattern you showed in your example.
library(purrr)
library(dplyr)

params <- c(a = 10, b = 20, c = 30, d = 40)

builder_func <- function(params) {
  # baseline, -10% and +10% candidates for each parameter
  opts <- map_df(params, ~ c(., . * 0.9, . * 1.1))
  # baseline values repeated, used to hold the other parameters fixed
  stocks <- map_df(params, ~ rep(., 3))
  # vary one parameter at a time, keep the rest at baseline,
  # then drop the duplicated baseline rows
  map_df(names(opts),
         ~ bind_cols(
           opts[.],
           stocks[. != names(stocks)]
         )) %>%
    unique()
}

builder_func(params)
# A tibble: 9 x 4
a b c d
<dbl> <dbl> <dbl> <dbl>
1 10 20 30 40
2 9 20 30 40
3 11 20 30 40
4 10 18 30 40
5 10 22 30 40
6 10 20 27 40
7 10 20 33 40
8 10 20 30 36
9 10 20 30 44
Sorry I missed that nuance the first time I read your question. Let me know if something isn't quite right...
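If you would rather avoid purrr and dplyr, a minimal base-R sketch of the same one-at-a-time pattern could look like this (the helper name build_oat is just illustrative):
build_oat <- function(params) {
  # start from the baseline row, then add a -10% and a +10% row for each parameter
  rows <- list(as.data.frame(as.list(params)))
  for (nm in names(params)) {
    for (f in c(0.9, 1.1)) {
      p <- params
      p[nm] <- p[nm] * f
      rows[[length(rows) + 1]] <- as.data.frame(as.list(p))
    }
  }
  do.call(rbind, rows)
}
build_oat(c(a = 10, b = 20, c = 30, d = 40))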
I don't know how or where to start, but I hope someone can help. It's the first time I've used R like this, so even a keyword or a recommendation of where to look would be helpful.
My dataframe looks like this:
set.seed(1)
df <- data.frame(
X = sample(c(1, 2, 3), 50, replace = TRUE),
Y = sample(c(1, 2, 3), 50, replace = TRUE))
And I would like to get a cross table counting how often each combination of X and Y occurs. Using something like
length(which(df$X == 1 & df$Y == 1))
for each combination, I could calculate the counts in R and fill them into my Excel sheet, but there has to be a better option.
Thank you in advance.
Try this base R solution:
#Data
set.seed(1)
df <- data.frame(
X = sample(c(1, 2, 3), 50, replace = TRUE),
Y = sample(c(1, 2, 3), 50, replace = TRUE))
#Code
addmargins(table(df$X,df$Y))
Output:
1 2 3 Sum
1 6 7 5 18
2 4 6 9 19
3 5 5 3 13
Sum 15 18 17 50
You can also change the order of your variables like this:
#Code2
addmargins(table(df$Y,df$X))
Output:
1 2 3 Sum
1 6 4 5 15
2 7 6 5 18
3 5 9 3 17
Sum 18 19 13 50
In order to export to MS Excel, you can use this code:
library(xlsx)
#Transform to dataframe
d1 <- as.data.frame.matrix(addmargins(table(df$X,df$Y)))
#Export
write.xlsx(d1, file = 'myexample.xlsx', sheetName = 'Sheet1')
If the data have only two columns, just pass the data.frame object to table.
addmargins(table(df))
If the data include more than two columns, you can subset its variables before passing it to table().
addmargins(table(df[c("X", "Y")]))
You can also pass a formula to xtabs().
addmargins(xtabs( ~ X + Y, df))
All of the above give
Y
X 1 2 3 Sum
1 5 6 3 14
2 2 6 6 14
3 13 4 5 22
Sum 20 16 14 50
To export the table to an Excel file, you can use write.xlsx() from openxlsx.
library(openxlsx)
tab <- addmargins(xtabs( ~ X + Y, df))
write.xlsx(tab, "foo.xlsx")
I have a vector of TRUE and FALSE values. The length of the vector is 1000.
vect <- [T T F T F F..... x1000]
I want loop over the first 100 (i.e 1:100) values and calculate the count of true and false values and store the result into some variable (e.g. True <- 51, False <- 49). Then loop over the next 100 values (101:200) and do the same computation as before, and so on till I reach 1000.
The code below is pretty standard but, instead of working on 100-element slices, it counts over the entire vector:
count_True <- 0
count_False <- 0
for (i in vect){
  if (i == TRUE){
    count_True <- count_True + 1
  } else {
    count_False <- count_False + 1
  }
}
I am aware you can split the vector with
vect_splt <- split(vect, 10)
but is there a way to combine these to do what I wanted or any other way?
Does something like this work (using dplyr):
library(dplyr)

set.seed(42)
vect <- sample(rep(c(T, F), 500))
vect <- tibble(vect)

vect %>%
  mutate(seq = row_number() %/% 100) %>%
  group_by(seq) %>%
  summarise(n_TRUE = sum(vect),
            n_FALSE = sum(!vect))
# A tibble: 11 x 3
seq n_TRUE n_FALSE
<dbl> <int> <int>
1 0 42 57
2 1 56 44
3 2 50 50
4 3 55 45
5 4 43 57
6 5 48 52
7 6 48 52
8 7 54 46
9 8 51 49
10 9 53 47
11 10 0 1
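As the output shows, row_number() %/% 100 produces eleven uneven groups (rows 1 to 99 land in group 0 and row 1000 ends up alone in group 10). If you want exactly ten chunks of 100, one small tweak is to shift the row number before the integer division:
vect %>%
  mutate(seq = (row_number() - 1) %/% 100) %>%
  group_by(seq) %>%
  summarise(n_TRUE = sum(vect),
            n_FALSE = sum(!vect))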
We can use split() together with table(). With a grouping index created by gl(), split the vector into a list of vectors and get the counts with table(), storing the result in a list:
out <- lapply(split(vect, as.integer(gl(length(vect), 100, length(vect)))), table)
It can be converted to a single dataset by rbinding
out1 <- do.call(rbind, out)
data
set.seed(24)
vect <- sample(c(TRUE, FALSE), 1000, replace = TRUE)
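Since the chunks in this example are equal-sized (ten chunks of 100), the same counts can also be read off directly from a two-way table, for instance:
# rows are the 100-element chunks, columns are the FALSE/TRUE counts
table(chunk = gl(10, 100), value = vect)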
I have a data frame which holds activity (A) data across time (T) for a number of subjects (S) in different groups (G). The activity data were sampled every 10 minutes. What I would like to do is to re-bin the data into, say, 30-minute bins (either adding or averaging values) keeping the subject Id and group information.
Example. I have something like this:
S G T A
1 A 30 25
1 A 40 20
1 A 50 15
1 A 60 20
1 A 70 5
1 A 80 20
2 B 30 10
2 B 40 10
2 B 50 10
2 B 60 20
2 B 70 20
2 B 80 20
And I'd like something like this:
S G T A
1 A 40 20
1 A 70 15
2 B 40 10
2 B 70 20
Whether time is the average time (as in the example) or the first/last time point and whether the activity is averaged (again, as in the example) or summed is not important for now.
I will appreciate any help you can provide on this. I was thinking about creating a script in Python to re-bin this particular dataframe, but I thought that there may be a way of doing it in R in a way that may be applied to any dataframe with differing numbers of columns, etc.
There are several ways to arrive at the desired dataframe.
I have reproduced your dataframe:
df <- data.frame(S = c(rep(1,6),rep(2,6)),
G = c(rep("A",6),rep("B",6)),
T = rep(seq(30,80,10),2),
A = c(25, 20, 15, 20, 5, 20, 10, 10, 10, 20, 20, 20))
The classical way could be:
df[df$T == 40 | df$T == 70,]
The more modern tidyverse way is
library(tidyverse)
df %>% filter(T == 40 | T == 70)
If you want to get the average of each group of G filtered for T==40 and 70:
df %>% filter(T == 40 | T == 70) %>%
group_by(G) %>%
mutate(A = mean(A))
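Filtering on T == 40 and T == 70 reproduces the example rows, but it does not generalise to arbitrary times. For genuine re-binning into 30-minute windows (assuming T is in minutes and all subjects share the same time origin), a sketch along these lines groups every three 10-minute samples per subject and averages both the time and the activity:
df %>%
  group_by(S, G, bin = (T - min(T)) %/% 30) %>% # 0 for T = 30-50, 1 for T = 60-80, ...
  summarise(T = mean(T), A = mean(A), .groups = "drop") %>%
  select(S, G, T, A)
This reproduces the example output (T = 40, A = 20 and T = 70, A = 15 for subject 1).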
If I have 5 data frames in the global environment, such as a, b, c, d, and e.
I want the data frame a to be compared with e, and if R finds any common elements in a and e, delete those elements from a. Then I want the data frame b to be compared with e and delete the common elements, and so on.
Actually I have 20 tables that need to be compared with e.
Can anyone suggest an elegant way to handle this problem? I'm thinking of a loop or a function but can't work out the details.
Thanks everybody and have a nice day!
The easiest would be to put all the dataframes you want to compare in a list, then use lapply to loop over this list:
# create list of data.frames
dlist <- list(df1 = data.frame(var1 = 1:10), df2 = data.frame(var1 = 11:20),
df3 = data.frame(var1 = 21:30), df4 = data.frame(var1 = 31:40))
# create master-data.frame
set.seed(1)
df <- data.frame(var1 = sample(1:100, 30))
# use lapply() to loop over the data and exclude all elements that are in the master-data.frame
dlist <- lapply(dlist, function(x){
x <- x[!x$var1 %in% df$var1, , drop = FALSE]
})
Result:
> dlist
$df1
var1
2 2
3 3
4 4
5 5
7 7
8 8
9 9
$df2
var1
1 11
2 12
3 13
4 14
5 15
8 18
$df3
var1
2 22
3 23
4 24
6 26
10 30
$df4
var1
1 31
3 33
5 35
6 36
8 38
9 39
10 40
If you absolutely need the dataframes in your global environment, you could use list2env:
list2env(dlist, envir = .GlobalEnv)
I would like to write a function that finds the k largest cells and their locations, given a two-dimensional table.
For example, take the following two-dimensional table:
table_ex
A B C
F 99 693 515
I 722 583 37
M 186 817 525
the function should give the following result:
function(table_ex, 2)
817, M B
722, I A
In the case described above, since k = 2, the function returns the two largest cells and their locations.
You can coerce to data.frame then just sort using order:
getTopCells <- function(tab, n) {
  sort_df <- as.data.frame(tab)
  sort_df <- sort_df[order(-sort_df$Freq), ]
  sort_df[1:n, ]
}
Example:
tab <- table(sample(c('A', 'B'), 200, replace=T),
rep(letters[1:5], 40))
# returns:
# a b c d e
# A 20 23 19 21 23
# B 20 17 21 19 17
getTopCells(tab, 3)
# returns:
# Var1 Var2 Freq
# 3 A b 23
# 9 A e 23
# 6 B c 21
A solution using only base R and without coercing into a data.frame:
First let's create a table:
set.seed(123)
tab <- table(sample(c('A', 'B'), 200, replace=T),
rep(letters[1:5], 40))
a b c d e
A 15 13 18 20 22
B 25 27 22 20 18
and now:
for (i in 1:nrow(tab)) {
  cat(dimnames(tab)[[1]][i], which.max(tab[i, ]), max(tab[i, ]), '\n')
}
A 5 22
B 2 27
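Note that this loop reports the maximum of each row rather than the k largest cells overall. A base-only sketch for the true top k (here k = 2, using the tab created above) could rely on order() and arrayInd():
k <- 2
# linear indices of the k largest cells, converted to row/column positions
idx <- arrayInd(order(tab, decreasing = TRUE)[1:k], dim(tab))
data.frame(value = tab[idx],
           row = rownames(tab)[idx[, 1]],
           col = colnames(tab)[idx[, 2]])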
I'm using a reshaping approach here. The key is to save your table in a data.frame format and then save your row names as another column in that data.frame. Then you can use something like:
df = read.table(text="
names A B C
F 99 693 515
I 722 583 37
M 186 817 525", header=T)
library(tidyr) # to reshape your dataset
library(dplyr) # to join commands
df %>%
  gather(names2, value, -names) %>% # reshape your dataset to long format
  arrange(desc(value)) %>%          # sort by the value column, largest first
  slice(1:2)                        # pick the top 2 rows
# names names2 value
# 1 M B 817
# 2 I A 722
PS: In case you don't want to use any packages, or don't want to use data.frames but your original table, I'm sure you'll find some great alternative replies here.