rmultinom() – but transposed? - r

I want a multinomially distributed data frame of dummies. The probabilities should be applied to the columns. I have the following code, which seems a bit awkward. Does anyone have a better idea?
set.seed(1234)
data.table::transpose(data.frame(rmultinom(10, 1, c(1:5)/5)))
# V1 V2 V3 V4 V5
# 1 0 0 0 1 0
# 2 0 0 0 0 1
# 3 0 0 0 0 1
# 4 0 1 0 0 0
# 5 0 0 0 0 1
# 6 0 0 0 0 1
# 7 0 0 0 1 0
# 8 0 1 0 0 0
# 9 0 0 0 0 1
# 10 0 0 0 1 0

A little shorter, and it doesn't involve multiple coercions:
data.frame(t(rmultinom(10, 1, c(1:5)/5)))
or
library(data.table)
data.table(t(rmultinom(10, 1, c(1:5)/5)))
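The transpose is needed because rmultinom(n, size, prob) returns a matrix with one row per category and one column per draw. A minimal sketch that checks the shapes; the names probs, draws and dummies are illustrative and not from the question:

```r
set.seed(1234)
probs <- (1:5) / 5            # rmultinom() rescales prob internally to sum to 1
draws <- rmultinom(10, size = 1, prob = probs)
dim(draws)                    # 5 categories in rows, 10 draws in columns

dummies <- data.frame(t(draws))
dim(dummies)                  # 10 draws in rows, 5 dummy columns
rowSums(dummies)              # with size = 1, each row holds exactly one 1
```

With size = 1 every draw is a one-hot vector, so after t() each row is one observation.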

Related

Best way to convert List to Matrix or Tibble format?

I am seeking a decent way to convert the output of a function, which is a list, into a matrix or tibble format.
The following tibble feeds into a function. The function returns a list. In this simple example, the returned list happens to contain the same values as the function input tibble.
# # A tibble: 6 x 15
# rev CoS gm sga ebitda bd ebit ie ii gain ebt chg_DTL current tax ni
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This is the list that is returned from the function.
> ni_out
$rev
[1] 0 0 0 0 0 0
$CoS
[1] 0 0 0 0 0 0
$gm
[1] 0 0 0 0 0 0
$sga
[1] 0 0 0 0 0 0
$ebitda
[1] 0 0 0 0 0 0
$bd
[1] 0 0 0 0 0 0
$ebit
[1] 0 0 0 0 0 0
$ie
[1] 0 0 0 0 0 0
$ii
[1] 0 0 0 0 0 0
$gain
[1] 0 0 0 0 0 0
$ebt
[1] 0 0 0 0 0 0
$chg_DTL_net
[1] 0 0 0 0 0 0
$current
[1] 0 0 0 0 0 0
$tax
[1] 0 0 0 0 0 0
$ni
[1] 0 0 0 0 0 0
I desire to convert that back into something more pleasing to look at such as the original tibble format or a matrix.
I obtain the dimensions of the list output.
lengths(ni_out)[[1]]
# [1] 6
> length(ni_out)
# [1] 15
However, my unsuccessful attempt at a matrix appears as the following.
as.matrix(unlist(ni_out), nrow = lengths(ni_out)[[1]], ncol = length(ni_out))
# [,1]
# rev1 0
# rev2 0
# rev3 0
# rev4 0
# rev5 0
# rev6 0
# CoS1 0
# CoS2 0
# CoS3 0
# CoS4 0
# CoS5 0
# CoS6 0
# gm1 0
# gm2 0
# gm3 0
# gm4 0
# gm5 0
# gm6 0
# sga1 0
# sga2 0
# sga3 0
# sga4 0
# sga5 0
# sga6 0
# ebitda1 0
# ebitda2 0
# etc.
Any thoughts on how to get a matrix or tibble format?
Next time please provide a reproducible example.
If your list is called mylist I would try data.table::rbindlist(mylist)
Please see an example below including the conversion of vectors to data.frames.
dat <- 0:5
mylist <- list(dat, dat, dat)
mylist <- lapply(mylist, function(x) data.frame(t(x)))
data.table::rbindlist(mylist)
> data.table::rbindlist(mylist)
X1 X2 X3 X4 X5 X6
1: 0 1 2 3 4 5
2: 0 1 2 3 4 5
3: 0 1 2 3 4 5
EDIT: it seems you want to cbind instead of rbind, so I would use the below in that case.
dat <- 0:5
mylist <- list(dat, dat, dat)
mylist <- lapply(mylist, function(x) data.frame(x))
dplyr::bind_cols(mylist)
x...1 x...2 x...3
1 0 0 0
2 1 1 1
3 2 2 2
4 3 3 3
5 4 4 4
6 5 5 5
As you can see the answer is different depending on what you want and therefore it's important to provide an example.
You can use the do.call function like this:
a <- list(data.frame(x=1:5),data.frame(y=1:5))
do.call("cbind",a)
Check cbindlist function too.
Simply call data.frame or as_tibble on the list:
l <- list(x=rep(0,6),y=rep(0,6), z=rep(0,6), t=rep(0,6))
data.frame(l)
x y z t
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
as_tibble(l)
# A tibble: 6 x 4
x y z t
<dbl> <dbl> <dbl> <dbl>
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
As for a matrix, convert the list to a data.frame first and then to a matrix:
as.matrix(data.frame(l))
x y z t
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
[5,] 0 0 0 0
[6,] 0 0 0 0
Another option with as.data.table
library(data.table)
as.data.table(l)
data
l <- list(x=rep(0,6),y=rep(0,6), z=rep(0,6), t=rep(0,6))
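The attempt in the question returns a single column because as.matrix() on a plain vector ignores nrow and ncol; matrix() is the function that uses them. A minimal sketch, assuming every list element has the same length. The list below is a small stand-in for ni_out with made-up values, and m1 and m2 are illustrative names:

```r
# Small stand-in for ni_out (hypothetical values, three of the fifteen elements)
ni_out <- list(rev = 1:6, CoS = 7:12, gm = 13:18)

# matrix() honours nrow/ncol and fills column by column,
# so each list element becomes one column
m1 <- matrix(unlist(ni_out),
             nrow = lengths(ni_out)[[1]],
             ncol = length(ni_out),
             dimnames = list(NULL, names(ni_out)))

# cbind over the list elements gives the same layout and keeps the names
m2 <- do.call(cbind, ni_out)
```

Both results are 6 x 3 matrices with the element names as column names.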

Using any in nested ifelse statement

data:
library(dplyr)     # mutate() and %>%
library(magrittr)  # %<>%
set.seed(1337)
m <- matrix(sample(c(0,0,0,1),size = 50,replace=TRUE),ncol=5) %>% as.data.frame
colnames(m)<-LETTERS[1:5]
code:
m %<>%
mutate(newcol = ifelse(A==1&(B==1|C==1)&(D==1|E==1),1,
ifelse(any(A,B,C,D,E),0,NA)),
desiredResult= ifelse(A==1&(B==1|C==1)&(D==1|E==1),1,
ifelse(!(A==0&B==0&C==0&D==0&E==0),0,NA)))
looks like:
A B C D E newcol desiredResult
1 0 1 1 1 0 0 0
2 0 1 0 0 1 0 0
3 0 1 0 0 0 0 0
4 0 0 0 0 0 0 NA
5 0 1 0 1 0 0 0
6 0 0 1 0 0 0 0
7 1 1 1 1 0 1 1
8 0 1 1 0 0 0 0
9 0 0 0 0 0 0 NA
10 0 0 1 0 0 0 0
question
I want newcol to be the same as desiredResult.
Why can't I use any in that "stratified" manner inside ifelse? Is there a function like any that would work in that situation?
possible workaround
I could define a function
any_vec <- function(...) {apply(cbind(...),1,any)} but this does not make me smile too much.
As suggested in the answer, pmax works on this 0/1 data like a vectorized any: it returns the row-wise maximum, which is nonzero exactly when at least one column is 1.
m %>%
mutate(pmaxResult = ifelse(A==1& pmax(B,C) & pmax(D,E),1,
ifelse(pmax(A,B,C,D,E),0,NA)),
desiredResult= ifelse(A==1&(B==1|C==1)&(D==1|E==1),1,
ifelse(!(A==0&B==0&C==0&D==0&E==0),0,NA)))
Here's an alternative approach. I converted to logical at the beginning and back to integer at the end:
m %>%
mutate_all(as.logical) %>%
mutate(newcol = A & pmax(B,C) & pmax(D, E) ,
newcol = replace(newcol, !newcol & !pmax(A,B,C,D,E), NA)) %>%
mutate_all(as.integer)
# A B C D E newcol
# 1 0 1 1 1 0 0
# 2 0 1 0 0 1 0
# 3 0 1 0 0 0 0
# 4 0 0 0 0 0 NA
# 5 0 1 0 1 0 0
# 6 0 0 1 0 0 0
# 7 1 1 1 1 0 1
# 8 0 1 1 0 0 0
# 9 0 0 0 0 0 NA
# 10 0 0 1 0 0 0
I basically replaced the any with pmax.
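The underlying reason any fails here is that it reduces all of its arguments to one logical value, which ifelse then recycles over every row. A small sketch of the difference on toy data; the data frame d and the result names are hypothetical and not part of the question:

```r
# Toy 0/1 data (hypothetical, smaller than m above)
d <- data.frame(A = c(0, 0, 1), B = c(1, 0, 1), C = c(0, 0, 0))

# any() collapses everything to a single TRUE/FALSE
scalar_any <- any(d$A == 1, d$B == 1, d$C == 1)

# Element-wise alternatives return one value per row
row_any_pmax   <- pmax(d$A, d$B, d$C) > 0
row_any_sums   <- rowSums(d) > 0
row_any_reduce <- Reduce(`|`, lapply(d, function(col) col == 1))
```

The pmax and rowSums variants rely on the columns being non-negative; the Reduce variant only relies on the comparison col == 1.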

Building a symmetric binary matrix

I have a matrix that is for example like this:
rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3
I want to make a symmetric binary matrix whose dimnames are the row names of the matrix above. An entry should be 1 when the two variables have the same number in V1 and 0 otherwise. The matrix would look like this:
dimnames
a c b d y q i j r
a 1 0 0 0 0 0 1 1 0
c 0 1 0 0 0 0 0 0 1
b 0 0 1 0 1 0 0 0 0
d 0 0 0 1 0 1 0 0 0
y 0 0 1 0 1 0 0 0 0
q 0 0 0 1 0 1 0 0 0
i 1 0 0 0 0 0 1 1 0
j 1 0 0 0 0 0 1 1 0
r 0 1 0 0 0 0 0 0 1
Does anybody know how I can do that?
Use dist:
DF <- read.table(text = "rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3", header = TRUE)
res <- as.matrix(dist(DF$V1)) == 0L
#alternatively:
#res <- !as.matrix(dist(DF$V1))
#diag(res) <- 0L #for the first version of the question, i.e. a zero diagonal
res <- +(res) #for the second version, i.e. to coerce to an integer matrix
dimnames(res) <- list(DF$rownames, DF$rownames)
# 1 2 3 4 5 6 7 8 9
#1 1 0 0 0 0 0 1 1 0
#2 0 1 0 0 0 0 0 0 1
#3 0 0 1 0 1 0 0 0 0
#4 0 0 0 1 0 1 0 0 0
#5 0 0 1 0 1 0 0 0 0
#6 0 0 0 1 0 1 0 0 0
#7 1 0 0 0 0 0 1 1 0
#8 1 0 0 0 0 0 1 1 0
#9 0 1 0 0 0 0 0 0 1
You can do this using table and tcrossprod.
tcrossprod(table(DF))
# rownames
# rownames a b c d i j q r y
# a 1 0 0 0 1 1 0 0 0
# b 0 1 0 0 0 0 0 0 1
# c 0 0 1 0 0 0 0 1 0
# d 0 0 0 1 0 0 1 0 0
# i 1 0 0 0 1 1 0 0 0
# j 1 0 0 0 1 1 0 0 0
# q 0 0 0 1 0 0 1 0 0
# r 0 0 1 0 0 0 0 1 0
# y 0 1 0 0 0 0 0 0 1
If you want the row and column order as they are found in the data, rather than alphanumerically, you can subset
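tcrossprod(table(DF))[as.character(DF$rownames), as.character(DF$rownames)]  # as.character() guards against factor columns being used as integer codes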
or use factor
tcrossprod(table(factor(DF$rownames, levels=unique(DF$rownames)), DF$V1))
If your data is large or sparse, you can use the sparse matrix algebra in xtabs, with similar ways to change the order of the resulting table as before.
Matrix::tcrossprod(xtabs(data=DF, ~ rownames + V1, sparse=TRUE))
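For small inputs, the definition in the question (1 where two entries share the same V1 value) can also be written directly as a pairwise comparison with outer. This is a different technique from the answers above, shown only as a sketch; the name same is illustrative:

```r
DF <- data.frame(rownames = c("a", "c", "b", "d", "y", "q", "i", "j", "r"),
                 V1 = c(1, 3, 2, 4, 2, 4, 1, 1, 3))

# Compare every pair of V1 values; unary + turns TRUE/FALSE into 1/0
same <- +outer(DF$V1, DF$V1, "==")
dimnames(same) <- list(DF$rownames, DF$rownames)
same
```

Because outer builds the full n x n matrix in memory, the sparse xtabs approach is likely the better fit for large data.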

sum or group specific columns based on clusters in r

So I have a data set of species and abundances, here is a sample of it:
aca.qua aca.bah aca.chi achi.lin alb.vul alu.mon ani.vir arc.rho asp.lun aux.roc bag.bag bag.mar bal.cap cal.cal cal.pen
1 0 0 0 0 5 0 57 0 0 0 0 0 0 0 16
2 0 0 1 0 2 0 3 0 0 0 0 8 0 0 0
3 0 0 0 0 1 0 3 0 0 0 0 0 0 0 3
4 0 0 0 0 5 0 0 0 22 0 0 94 0 0 0
5 0 0 0 0 1 0 0 0 0 2 3 2 0 0 1
6 0 0 0 0 0 0 0 1 0 0 2 2 0 0 0
I ran a cluster analysis on some of the species traits and came up with clusters to which each species should be assigned:
aca.qua aca.bah aca.chi achi.lin alb.vul alu.mon ani.vir arc.rho asp.lun aux.roc bag.bag bag.mar bal.cap cal.cal cal.pen
1 1 1 2 3 1 4 4 1 5 4 4 1 1 1
"aca.qua" should be in cluster 1, as well as "aca.bah", "aca.chi" and "alu.mon", etc. "achi.lin" in cluster two and so on.
I was trying to come up with code that uses the references in the second data frame to group the columns by cluster and sum them. I tried to do so with dplyr, mutate and some loops, but I never found a good way of doing it. I tried adding the clusters as a row, then using t() to transpose and select(), then transposing back, and so on, but it was getting way too complicated.
Is there any way that I can use the vector containing the names of the species and its clusters as a reference to sum the respective columns of each cluster?
The idea is to end up with something like this, but for all the clusters:
V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 cluster1
1 1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 1
4 1 0 0 0 0 0 0 0 0 0 0
5 0 0 1 0 0 0 0 1 0 0 22
6 0 1 0 0 0 0 0 0 0 0 0
Here I used the following code:
teste4 <- teste3 %>%
filter(V1 == 1) %>%
select(-1)
teste5 <- teste4 %>%
mutate(cluster1 = rowSums(teste4[, 1:rowSums(teste4)]))
The point here is that I will also try several different cluster methods and models. Therefore, I need to make this more automatic for when I come up with new cluster combinations, instead of manually selecting each column (the original dataset is much larger).
Try summing, row by row, the columns that belong to each cluster with rowSums. We can wrap it in an lapply call to cycle through each unique cluster:
lst <- lapply(1:max(df2[1,]), function(x) rowSums(df1[,df2[1,] == x, drop=F]))
setNames(data.frame(lst),paste0("clust",1:length(lst)))
# clust1 clust2 clust3 clust4 clust5
# 1 16 0 5 57 0
# 2 1 0 2 11 0
# 3 3 0 1 3 0
# 4 22 0 5 94 0
# 5 1 0 1 5 2
# 6 0 0 0 5 0
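Since the clustering will be rerun with different methods, it may help to keep the assignment as a named vector and let base R's rowsum do the grouping. This swaps in rowsum for the lapply/rowSums approach above and is only a sketch; abund, clusters and by_cluster are hypothetical names, and the values are copied from the sample in the question:

```r
species <- c("aca.qua", "aca.bah", "aca.chi", "achi.lin", "alb.vul",
             "alu.mon", "ani.vir", "arc.rho", "asp.lun", "aux.roc",
             "bag.bag", "bag.mar", "bal.cap", "cal.cal", "cal.pen")

# Abundance sample from the question, one row per site
abund <- matrix(c(0, 0, 0, 0, 5, 0, 57, 0,  0, 0, 0,  0, 0, 0, 16,
                  0, 0, 1, 0, 2, 0,  3, 0,  0, 0, 0,  8, 0, 0,  0,
                  0, 0, 0, 0, 1, 0,  3, 0,  0, 0, 0,  0, 0, 0,  3,
                  0, 0, 0, 0, 5, 0,  0, 0, 22, 0, 0, 94, 0, 0,  0,
                  0, 0, 0, 0, 1, 0,  0, 0,  0, 2, 3,  2, 0, 0,  1,
                  0, 0, 0, 0, 0, 0,  0, 1,  0, 0, 2,  2, 0, 0,  0),
                nrow = 6, byrow = TRUE, dimnames = list(NULL, species))

# Cluster id per species, named so it can be matched by column name
clusters <- setNames(c(1, 1, 1, 2, 3, 1, 4, 4, 1, 5, 4, 4, 1, 1, 1), species)

# rowsum() sums rows by group, so transpose, group the species, transpose back
by_cluster <- t(rowsum(t(abund), group = clusters[colnames(abund)]))
colnames(by_cluster) <- paste0("clust", colnames(by_cluster))
by_cluster
```

Trying a new clustering then only means replacing the clusters vector; indexing it by colnames(abund) keeps the species order aligned.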

Creating a factor/categorical variable from 4 dummies

I have a data frame with four columns, let's call them V1-V4 and ten observations. Exactly one of V1-V4 is 1 for each row, and the others of V1-V4 are 0. I want to create a new column called NEWCOL that takes on the value of 3 if V3 is 1, 4 if V4 is 1, and is 0 otherwise.
I have to do this for MANY sets of variables V1-V4 so I would like the solution to be as short as possible so that it will be easy to replicate.
This does it for 4 columns to add a fifth using matrix multiplication:
> cbind( mydf, newcol=data.matrix(mydf) %*% c(0,0,3,4) )
V1 V2 V3 V4 newcol
1 1 0 0 0 0
2 1 0 0 0 0
3 0 1 0 0 0
4 0 1 0 0 0
5 0 0 1 0 3
6 0 0 1 0 3
7 0 0 0 1 4
8 0 0 0 1 4
9 0 0 0 1 4
10 0 0 0 1 4
It generalizes to building multiple columns; we just need the rules. Make a weight matrix with as many rows as there are columns in the original data and one column of weights for each new variable. The example below builds one new column as 3 times the third column plus 4 times the fourth, and another as 1 times the first plus 2 times the second.
> cbind( mydf, newcol=data.matrix(mydf) %*% matrix(c(0,0,3,4, # first set of factors
1,2,0,0), # second set
ncol=2) )
V1 V2 V3 V4 newcol.1 newcol.2
1 1 0 0 0 0 1
2 1 0 0 0 0 1
3 0 1 0 0 0 2
4 0 1 0 0 0 2
5 0 0 1 0 3 0
6 0 0 1 0 3 0
7 0 0 0 1 4 0
8 0 0 0 1 4 0
9 0 0 0 1 4 0
10 0 0 0 1 4 0
An example data set:
mydf <- data.frame(V1 = c(1, 1, rep(0, 8)),
V2 = c(0, 0, 1, 1, rep(0, 6)),
V3 = c(rep(0, 4), 1, 1, rep(0, 4)),
V4 = c(rep(0, 6), rep(1, 4)))
# V1 V2 V3 V4
# 1 1 0 0 0
# 2 1 0 0 0
# 3 0 1 0 0
# 4 0 1 0 0
# 5 0 0 1 0
# 6 0 0 1 0
# 7 0 0 0 1
# 8 0 0 0 1
# 9 0 0 0 1
# 10 0 0 0 1
Here's an easy approach to generate the new column:
mydf <- transform(mydf, NEWCOL = V3 * 3 + V4 * 4)
# V1 V2 V3 V4 NEWCOL
# 1 1 0 0 0 0
# 2 1 0 0 0 0
# 3 0 1 0 0 0
# 4 0 1 0 0 0
# 5 0 0 1 0 3
# 6 0 0 1 0 3
# 7 0 0 0 1 4
# 8 0 0 0 1 4
# 9 0 0 0 1 4
# 10 0 0 0 1 4
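Because the same recoding has to be repeated for many sets of dummies, the matrix-multiplication idea can be wrapped in a small function. The helper add_coded_col and its arguments are hypothetical names introduced here for illustration:

```r
# Hypothetical helper: new column = weighted sum of the chosen dummy columns
add_coded_col <- function(df, cols, weights, new_name = "NEWCOL") {
  stopifnot(length(cols) == length(weights))
  df[[new_name]] <- as.vector(data.matrix(df[cols]) %*% weights)
  df
}

mydf <- data.frame(V1 = c(1, 1, rep(0, 8)),
                   V2 = c(0, 0, 1, 1, rep(0, 6)),
                   V3 = c(rep(0, 4), 1, 1, rep(0, 4)),
                   V4 = c(rep(0, 6), rep(1, 4)))

# Weights 0, 0, 3, 4 reproduce the NEWCOL rule from the question
out <- add_coded_col(mydf, paste0("V", 1:4), c(0, 0, 3, 4))
out$NEWCOL
```

For another set of dummies, only the cols and weights arguments change.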
