With the following x vector, I would like to create a vector grt1 of size 6 under two conditions. For each unique value of x, store the sum computed as below. The other positions of grt1 (here 3, 5 and 6) should be filled with draws from the standard normal distribution.
h=c(1:6)
grt1=numeric(length(h)) # vector of zeros
x=c(1,2,2,4,4,4)
for (i in unique(x)){
f=rep(x[x==i],3)
grt1[i]=sum(f)
} ##Condition-1
for( j in c(3,5,6))
{
grt1[j]=rnorm(1)
} ##Condition 2
The above code works, but I want to generalize it so that the second condition does not hard-code c(3,5,6).
Any help is appreciated.
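Here is an option with split (it fills grt1 from scratch, so the loop is not needed):
v1 <- sapply(split(x, x), function(x) sum(rep(x, 3)))
grt1 <- numeric(length(h))
grt1[as.integer(names(v1))] <- v1
grt1[grt1 == 0] <- rnorm(sum(grt1 == 0))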
grt1
#[1] 3.0000000 12.0000000 0.3993774 36.0000000 -0.8149975 -1.2508617
If you want to keep the OP's loop, get the indices of the elements that are 0 (because grt1 is initialized as a numeric vector of 0s) with which, then loop over them and assign each one with rnorm:
for(j in which(grt1 == 0)) grt1[j] <- rnorm(1)
grt1
#[1] 3.000000 12.000000 -0.765245 36.000000 -1.350172 -1.081518
NOTE: rnorm is vectorized, so a for loop is not really needed here.
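To illustrate the note, here is a small check, assuming the default RNG settings: with the same seed, one vectorized call rnorm(n) returns the same draws as n separate rnorm(1) calls, because both consume the random stream in the same order.

```r
# Draw three standard normal values in a single vectorized call
set.seed(42)
v_once <- rnorm(3)

# Draw three values one at a time in a loop, starting from the same seed
set.seed(42)
v_loop <- numeric(3)
for (j in 1:3) v_loop[j] <- rnorm(1)

identical(v_once, v_loop)
```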
Or with tidyverse
library(dplyr)
library(tidyr)
tibble(x) %>%
group_by(x) %>%
summarise(Sum = sum(rep(x, 3)), .groups = 'drop') %>%
complete(x = h) %>%
transmute(Sum = replace(Sum, is.na(Sum), rnorm(sum(is.na(Sum))))) %>%
pull(Sum)
#[1] 3.0000000 12.0000000 -1.9279778 36.0000000 0.7900143 -0.2506099
which(duplicated(x)) gives c(3, 5, 6) for this x. Try:
inds <- which(duplicated(x))
for (i in unique(x)) {
f=rep(x[x==i],3)
grt1[i]=sum(f)
}
for( j in inds){
grt1[j]=rnorm(1)
}
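Note that which(duplicated(x)) matches the empty slots here only because the duplicated positions happen to coincide with them. For example, with x = c(2, 2, 3) and h = 1:3, which(duplicated(x)) is 2 (already filled), while the empty slot is 1. A sketch that targets the unfilled positions directly, assuming (as in the question) that grt1 is indexed by the values of x, uses setdiff():

```r
h <- 1:6
x <- c(1, 2, 2, 4, 4, 4)
grt1 <- numeric(length(h))

# Condition 1: for each unique value i of x, store sum(rep(x[x == i], 3)) at position i
for (i in unique(x)) {
  grt1[i] <- sum(rep(x[x == i], 3))
}

# Condition 2: every position of h that is not a unique value of x gets a
# standard normal draw; rnorm() is vectorized, so one call fills them all
fill <- setdiff(h, unique(x))
grt1[fill] <- rnorm(length(fill))
grt1
```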
I have simulation and data structures as follows (just a toy example):
foo = function(mu=0,lambda=1){
x1 = rnorm(1,mu) #X~N(μ,1)
y1 = rexp(1,lambda) #Y~Exp(λ)
list(x=x1,y=y1)
}
mu = 1; lambda = 2 #true values: E(X)=μ=1; E(Y)=1/λ=0.5
set.seed(0); out = replicate(1000, foo(mu,lambda), simplify=FALSE)
# str(out)
This gives a list out of length 1000, each element of which has components x and y.
I want to compute the mean of the 1000 x values and the mean of the 1000 y values.
Of course, I can reach my goal in a clumsy way:
m = c() #for storing simulated values
for(i in 1:2){
s = sapply( 1:1000, function(j)out[[j]][i] )
m[i] = mean( as.numeric(s) )
}
m
# [1] 0.9736922 0.4999028
Is there a simpler and more efficient way to compute the means? I also tried lapply(out, mean)
and Reduce("+",out)/1000, but both failed.
This is another option if the sublists are always the same length:
> rowMeans(matrix(unlist(out),2))
[1] 0.9736922 0.4999028
Or, re-running the simulation (call set.seed(0) first to reproduce the numbers above):
> rowMeans(replicate(1000,unlist(foo(mu,lambda))))
x y
0.9736922 0.4999028
An option is to use purrr::transpose
library(purrr)
out %>% transpose() %>% map(~ mean(unlist(.x)[1:1000]))
# Or: out[1:1000] %>% transpose() %>% map(~ mean(unlist(.x)))
#$x
#[1] 0.9736922
#
#$y
#[1] 0.4999028
Or a base R solution using lapply (which is essentially the same as your explicit for loop):
lapply(c("x", "y"), function(var) mean(sapply(out[1:1000], "[[", var)))
#[[1]]
#[1] 0.9736922
#
#[[2]]
#[1] 0.4999028
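Another base R sketch, assuming every element of out is a list with the same numeric components x and y: flatten each element with unlist(), stack the results with do.call(rbind, ...), and take colMeans(). The foo() below is a condensed copy of the question's function.

```r
foo <- function(mu = 0, lambda = 1) {
  list(x = rnorm(1, mu), y = rexp(1, lambda))  # X ~ N(mu, 1), Y ~ Exp(lambda)
}
set.seed(0)
out <- replicate(1000, foo(1, 2), simplify = FALSE)

# unlist() turns each list(x = ..., y = ...) into a named numeric vector,
# and rbind() stacks the 1000 vectors into a 1000 x 2 matrix
m <- colMeans(do.call(rbind, lapply(out, unlist)))
m
```

Because the column names come from the list components, the result is a named vector with entries x and y.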
I want to use rmultinom(), combined with a transition matrix, to generate whole-number outputs whose sum equals the sum of the original values. However, I can't figure out how to do it without iterating over the columns of the matrix. Here is an example:
a = matrix(runif(16),nrow=4,ncol=4)
a = apply(a,2,FUN = function(x) x/sum(x))
b = c(5,7,5,9)
out = c(0,0,0,0) # initialize
for (i in 1:ncol(a)){
tmp = rmultinom(1,b[i],a[,i])
out = tmp + out
}
sum(out) == sum(b) ## Should eval to true
a is a transition matrix whose columns each sum to 1, and b is a starting vector of integers. The loop iterates over the columns and accumulates, in out, an integer vector whose total equals sum(b). How can I do this without a loop? The result would be similar to a %*% b, but that gives floating-point values.
You could map over the columns with purrr::map() and then use rowSums() (the result is stochastic):
library(magrittr)
set.seed(1)
a = matrix(runif(16),nrow=4,ncol=4)
a = apply(a,2,FUN = function(x) x/sum(x))
b <- c(5,7,5,9)
out <- purrr::map(1:4, ~rmultinom(1, b[.x], a[,.x])) %>%
unlist() %>%
matrix(nrow = 4) %>%
rowSums()
out
[1] 7 7 9 3
sum(out)
[1] 26
sum(b)
[1] 26
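A base R sketch of the same map-and-rowSums idea, using vapply() instead of purrr and magrittr. Each draw rmultinom(1, size = b[i], prob = a[, i]) has expected value b[i] * a[, i], so the expected total is a %*% b, while every draw is a whole number and the counts from column i sum exactly to b[i].

```r
set.seed(1)
a <- matrix(runif(16), nrow = 4, ncol = 4)
a <- apply(a, 2, function(col) col / sum(col))  # each column now sums to 1
b <- c(5, 7, 5, 9)

# One multinomial draw per column i: distribute b[i] counts over the rows
# with probabilities a[, i]. vapply() returns a 4 x 4 integer matrix whose
# columns are those draws; rowSums() adds them up.
draws <- vapply(seq_along(b),
                function(i) as.vector(rmultinom(1, size = b[i], prob = a[, i])),
                integer(nrow(a)))
out <- rowSums(draws)
out
sum(out) == sum(b)
```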
I want my function to be able to take a value or a column name. How can I do this with data.table?
library(data.table)
df <- data.table(a = c(1:5),
b = c(5:1),
c = c(1, 3, 5, 3, 1))
myfunc <- function(val) {
df[a >= val]
}
# This works:
myfunc(2)
# This does not work:
myfunc("c")
If I define my function as:
myfunc <- function(val) {
df[a >= get(val)]
}
# This doesn't work:
myfunc(2)
# This works:
myfunc("c")
What is the best way to resolve this?
Edit: To be clear, I want the results to be the same as:
# myfunc(2)
df %>%
filter(a >= 2)
# myfunc("c")
df %>%
filter(a >= c)
EDIT:
Thanks, all, for the responses; I think I like dww's answer best.
I wish it were as easy as in dplyr, where I can do:
myfunc <- function(val) {
df %>%
filter(a >= {{val}})
}
# Both work:
myfunc(2)
myfunc(c)
If you build and parse the whole expression, then you can evaluate it in its entirety. For example
myfunc <- function(val) {
df[eval(parse(text=paste("a >= ", val)))]
}
Be aware, though, that relying on a function that mixes values and variable names in the same parameter can be dangerous, especially when you actually want to match on character values rather than on variable names. If you pass in the whole expression instead, you can do:
myfunc <- function(expr) {
expr <- substitute(expr)
df[eval(expr)]
}
myfunc(a>=3)
myfunc(a>=c)
The question does not fully define the desired behavior, so we assume that df is a data.table, that if a character string is passed then the column of that name should be returned, and that if a number is passed then the rows whose a column exceeds that number should be returned.
Define an S3 generic and methods for character and default.
myfunc <- function(x, data = df) UseMethod("myfunc")
myfunc.character <- function(x, data = df) data[[x]]
myfunc.default <- function(x, data = df) data[a > x]
myfunc(2)
## a b c
## 1: 3 3 5
## 2: 4 2 3
## 3: 5 1 1
myfunc("c")
## [1] 1 3 5 3 1
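A sketch that covers both cases from the question's edit without parse(), assuming a character argument should always be read as a column name (the same ambiguity that makes mixing values and names risky): branch on is.character() and use get() only for strings. The function name and the data argument follow the answers here and are illustrative.

```r
library(data.table)
df <- data.table(a = 1:5, b = 5:1, c = c(1, 3, 5, 3, 1))

# A character string is treated as a column name and looked up with get();
# anything else is compared as a plain value
myfunc <- function(val, data = df) {
  if (is.character(val)) {
    data[a >= get(val)]
  } else {
    data[a >= val]
  }
}

myfunc(2)    # rows where a >= 2
myfunc("c")  # rows where a >= c
```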
I have the following loop, which finds the number of distinct cases belonging to each of 4 different factors. test is a list containing 4 data frames.
for (i in test){
i<-i%>%distinct(FileNumber)%>%nrow()
print(i)
}
When I run this, I get the following output:
[1] 38
[1] 129
[1] 1868
[1] 277
However, I want this output saved into another vector called my_vector, so that my_vector is
38 129 1868 277
So I tried the following, based on this answer I found:
Saving results from for loop as a vector in r
library(dplyr)
my_vector<-vector("numeric",4L)
for (i in test){
my_vector[i]<-i%>%distinct(FileNumber)%>%nrow()
}
However, when I run this I get the following message:
Error in my_vector[i] <- i %>% distinct(FileNumber) %>% nrow() :
invalid subscript type 'list'
How do I get the earlier output I listed saved into a vector?
You are trying to index my_vector with a list-like object.
For instance:
mylist <- list(mtcars, mtcars)
myvec <- numeric(length(mylist))
for (i in mylist) {
myvec[i] <- nrow(distinct(i, cyl))
}
On the first (and, in this example, second) iteration, i is a data frame, so myvec[i] is equivalent to myvec[mtcars], which does not make sense.
Instead, loop over the indices of the list of data frames, like this:
library(dplyr)
mylist <- list(mtcars, mtcars)
myvec <- numeric(length(mylist))
for (i in seq_along(mylist)) {
  myvec[i] <- mylist[[i]] %>% distinct(cyl) %>% nrow()
}
myvec
# [1] 3 3
or just do something like:
sapply(mylist, function(l) l %>% distinct(cyl) %>% nrow())
# [1] 3 3
BTW, this is just as easy in base R:
sapply(mylist, function(l) length(unique(l[["cyl"]])))
# [1] 3 3
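Applied to the question's own setup, a vapply() sketch: it checks that each call returns exactly one integer and keeps the list's names in the result. The small list below is hypothetical stand-in data for test, with a FileNumber column as in the question.

```r
# Hypothetical stand-in for the question's `test` list
test <- list(
  g1 = data.frame(FileNumber = c(1, 1, 2)),
  g2 = data.frame(FileNumber = c(5, 6, 7, 7))
)

# Count distinct FileNumber values in each data frame;
# vapply() errors if any call does not return a single integer
my_vector <- vapply(test, function(d) length(unique(d$FileNumber)), integer(1))
my_vector
```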
This should work with a list of data frames or matrices
d <- list(a = matrix(rnorm(100), nrow = 20),
b = matrix(rnorm(100), nrow = 10),
c = matrix(rnorm(100), nrow = 50))
my_vect <- c()
for (i in seq_along(d)){
n <- nrow(d[[i]])
my_vect[i] <- n
}
my_vect
[1] 20 10 50
Use unlist() and if that doesn't work, then add as.vector() in your pipe:
for (i in test){
i<-i %>% distinct(FileNumber) %>% nrow() %>% unlist()
print(i)
}
If that does not come out as a vector then:
for (i in test){
i<-i %>% distinct(FileNumber) %>% nrow() %>% unlist() %>% as.vector()
print(i)
}
I am working on filtering a data frame using dplyr. The problem is that the predicates differ between columns.
Please find below a minimal example with three columns and three predicates:
library(tidyverse)
set.seed(123)
dframe <- rerun(3, rnorm(5)) %>%
set_names(paste0("var", 1:3)) %>%
data.frame
cond <- c(2, 1, -1.4)
dframe %>% filter(var1 < cond[1] & var2 < cond[2] & var3 > cond[3])
Is there any way to filter the data set without explicitly stating the predicates in filter?
Edit: One possible solution is a for loop; see the code below. However, there might be more elegant solutions.
dframe_help <- dframe
cond <- c(2, 1, -1.4)
isSmaller <- c(TRUE, TRUE, FALSE)
for(i in seq_along(cond)) {
if (isSmaller[i])
dframe_help <- dframe_help %>% filter_at(.vars = vars(num_range(prefix = "var", range = i)),
.vars_predicate = all_vars(. < cond[i]))
else
dframe_help <- dframe_help %>% filter_at(.vars = vars(num_range(prefix = "var", range = i)),
.vars_predicate = all_vars(. > cond[i]))
}
You need some sort of object to specify whether to use < or >. I've created one called less, which is 1 for < and 0 for >.
require(purrr); require(magrittr)
filter2 <- function(dframe, cond, less){
rows <- pmap(list(cond, less, dframe),
function(cond, less, x) if(less) x < cond else x > cond
) %>%
pmap_lgl(all)
dframe[rows,]
}
dframe %>% filter2(cond = c(2, 1, -1.4), less = c(1, 1, 0))
Or, explicitly pass the function you want to use for each variable.
filter3 <- function(df, y, fun){
df[pmap(list(df, y, fun), function(x, y, fun) fun(x, y)) %>%
pmap_lgl(all)
,]
}
dframe %>% filter3(y = c(2, 1, -1.4), fun = list(`<`, `<`, `>`))
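The same idea as filter3() can be sketched in base R without purrr: Map() pairs each column with its operator and threshold, and Reduce() combines the resulting logical vectors with &. The data frame below is built with data.frame() as a stand-in for the question's rerun() construction, so its values are only for illustration.

```r
set.seed(123)
dframe <- data.frame(var1 = rnorm(5), var2 = rnorm(5), var3 = rnorm(5))

cond <- c(2, 1, -1.4)
ops  <- list(`<`, `<`, `>`)  # one comparison function per column

# Map() gives a list of logical vectors, one per column:
# ops[[i]](dframe[[i]], cond[[i]]); Reduce(`&`, ...) keeps rows where all hold
keep <- Reduce(`&`, Map(function(col, op, value) op(col, value), dframe, ops, cond))
dframe[keep, ]
```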
Not sure what you mean by 'automating' this process, but here are a couple of options.
If you want to filter along multiple features with some extra clarity, you can create a standalone filtering function:
cond <- c(2, 1, -1.4)
filter_using_conditions <- function(df) {
df[df$var1 < cond[1] & df$var2 < cond[2] & df$var3 > cond[3],]
}
dframe %>%
filter_using_conditions()
var1 var2 var3
2 0.4978505 -0.2179749 0.8377870
3 -1.9666172 -1.0260044 0.1533731
4 0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393 1.2538149
If you want to implement a solution using vectors of operators and values, you can do some string manipulation and use base::eval() or glue::evaluate() to generate a logical vector for subsetting your data frame. Here's an example using purrr::map and purrr::map2 (it's not very elegant, but hopefully gets the point across):
cond <- c(2, 1, -1.4)
operators <- c("<", "<", ">")
filter_conditions <- function(dframe, conds, operators) {
x <- paste(operators, conds, sep = " ")
rows_to_use <- map2(dframe, x, paste) %>%
map(map_lgl, glue::evaluate, NULL) %>%
as_tibble() %>%
na_if(FALSE) %>%
complete.cases()
dframe[rows_to_use,]
}
filter_conditions(dframe, cond, operators)
var1 var2 var3
2 0.4978505 -0.2179749 0.8377870
3 -1.9666172 -1.0260044 0.1533731
4 0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393 1.2538149
This example uses purrr::map2() to generate an individual string for each data point from the specified operator-condition pairs, and then uses purrr::map() with purrr::map_lgl() and glue::evaluate() to execute those strings as commands and return logical vectors. dplyr::na_if() turns FALSE into NA, so that complete.cases() returns a logical vector that is TRUE only for the rows where every condition holds.
map2(dframe, x, paste)
$var1
[1] "1.78691313680308 < 2" "0.497850478229239 < 2" "-1.96661715662964 < 2" "0.701355901563686 < 2"
[5] "-0.472791407727934 < 2"
$var2
[1] "-1.06782370598685 < 1" "-0.217974914658295 < 1" "-1.02600444830724 < 1" "-0.72889122929114 < 1"
[5] "-0.625039267849257 < 1"
$var3
[1] "-1.68669331074241 > -1.4" "0.837787044494525 > -1.4" "0.153373117836515 > -1.4"
[4] "-1.13813693701195 > -1.4" "1.25381492106993 > -1.4"
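For the string-based route, a base R sketch using eval(parse()) in place of glue: build one expression per column, such as "var1 < 2", and evaluate it with the data frame as the environment. As with any parse()-based approach, only use it on strings you construct yourself. The data frame is again a data.frame() stand-in for the question's data.

```r
set.seed(123)
dframe <- data.frame(var1 = rnorm(5), var2 = rnorm(5), var3 = rnorm(5))

cond <- c(2, 1, -1.4)
operators <- c("<", "<", ">")

# One expression string per column, e.g. "var1 < 2"
exprs <- paste(names(dframe), operators, cond)

# Evaluate each string with the columns in scope, then keep the rows
# where every condition is TRUE
keep <- Reduce(`&`, lapply(exprs, function(e) eval(parse(text = e), envir = dframe)))
dframe[keep, ]
```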