I have a list l and an integer n. I would like to pass l to expand.grid n times.
Is there a better way than writing expand.grid(l, l, ..., l) with l repeated n times?
The function rep seems to do what you want.
n <- 3 #number of repetitions
x <- list(seq(1,5))
expand.grid(rep(x,n)) #gives a data.frame of 125 rows and 3 columns
x2 <- list(a = seq(1,5), b = seq(6, 10))
expand.grid(rep(x2,n)) #gives a data.frame of 15625 rows and 6 columns
If the solution by @Phann doesn't fit your situation, you can try the following "evil trio" (eval, parse, paste) solution:
l <- list(height = seq(60, 80, 5), weight = seq(100, 300, 50), sex = c("male", "female"))
n <- 4
eval(parse(text = paste("expand.grid(",
paste(rep("l", times = n), collapse = ","), ")")))
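For comparison, here is a minimal sketch of the same call without building and parsing a string. do.call passes the elements of a list as separate arguments, so do.call(expand.grid, rep(list(l), n)) should be equivalent to writing expand.grid(l, l, ..., l) with l repeated n times. The names grid_parse and grid_call are illustrative only.

```r
l <- list(height = seq(60, 80, 5), weight = seq(100, 300, 50), sex = c("male", "female"))
n <- 4

# "Evil trio" version: build the call as text, parse it, then evaluate it
grid_parse <- eval(parse(text = paste("expand.grid(",
                                      paste(rep("l", times = n), collapse = ","), ")")))

# do.call version: rep(list(l), n) is a list holding n copies of l,
# and do.call hands each copy to expand.grid as a separate argument
grid_call <- do.call(expand.grid, rep(list(l), n))

identical(grid_parse, grid_call)
nrow(grid_call)  # 3^4 = 81: l has 3 components and is passed 4 times
```

Because l is itself a list here, each column of either result is a list column whose entries are the components of l; to get the grid of the values inside l instead, pass l directly, as in expand.grid(l).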
I think the easiest way to solve the original question is to nest the list and repeat it with rep.
Wrap l in another list, use rep to repeat that nested list n times, and then pass the resulting list as the only argument to expand.grid.
# Example list
l <- list(1, 2, 3)
# Times required
n <- 3
# Expand as many times as needed
m <- rep(list(l), n)
# Expand away
expand.grid(m)
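A small sketch of why the extra list() around l matters, using a hypothetical numeric vector v in place of l: rep(list(v), n) is a list of n copies of v, which expand.grid treats as n separate arguments, whereas rep(v, n) is just one longer vector and yields a single column.

```r
v <- c(0, 1)
n <- 3

# n copies of v inside a list: one grid column per copy, 2^3 = 8 rows
g <- expand.grid(rep(list(v), n))

# Without list(): a single vector of length 6, so only one column
flat <- expand.grid(rep(v, n))

dim(g)     # 8 3
dim(flat)  # 6 1
```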
If you want the function to act (repeatedly) on the elements of the list freely, i.e. with the list members detached from the list structure itself, flatten the repeated list with unlist first:
l <- list(1:5, "s") # A list with numerics and characters
n <- 3 # number of repetitions
expand.grid(unlist(rep(l, n))) # the result is:
Var1
1 1
2 2
3 3
4 4
5 5
6 s
7 1
8 2
9 3
10 4
11 5
12 s
13 1
14 2
15 3
16 4
17 5
18 s
I want to turn combinations of columns into some kind of interpretable variable. For each id, there are three columns, each holding one of the 3 levels of a factor. For every combination of values between the variables I would like to get a label, and once I have these labels, I want to know how many times each combination occurs. For example, when q1 and q2 are the same, it should return "A", and then I want to know how many times A appears. Does anyone have suggestions? Thanks!!
id <- 1:10
set.seed(1)
q1 <- sample(1:3, 10, replace=TRUE)
set.seed(2)
q2 <- sample(1:3, 10, replace=TRUE)
set.seed(2)
q3 <- sample(1:3, 10, replace=TRUE)
df <- data.frame(id,q1,q2,q3)
df
id q1 q2 q3
1 1 1 1 1
2 2 2 3 3
3 3 2 2 2
4 4 3 1 1
5 5 1 3 3
6 6 3 3 3
7 7 3 1 1
8 8 2 3 3
9 9 2 2 2
10 10 1 2 2
if df$q1=="1" & df$q2=="1" print A
if df$q1=="1" & df$q2=="2" print B
if df$q1=="1" & df$q2=="3" print C
if df$q1=="2" & df$q2=="3" print D
if df$q1=="2" & df$q2=="2" print E
if df$q1=="3" & df$q2=="3" print F
if df$q2=="1" & df$q2=="1" print G
if df$q2=="1" & df$q2=="2" print H
response <- save(print A, print B, print C and so on....)
length(A)
length(B)
and so on...
I think this does what you want, using base R; I hope I understood your desired output. I combined each pair of q columns into its own variable (comb.var[, combo]) and then combined that with the corresponding pair of column indices to create another variable, output$fct. Finally, I relabeled this new variable, which represents each combination of column pair and value pair, and counted the occurrences of each combination with summary().
code:
# dimensions of df
n = nrow(df) # rows
p = ncol(df) # columns (the first one is id)
# unique pairs of q columns
pairs.n = choose(p - 1, 2) # number of unique pairs
pairs = combn(1:(p - 1), 2) # 2 x pairs.n matrix, one pair per column
# matrix of NAs of the proper size
comb.var <- matrix(NA, nrow = n, ncol = pairs.n)
for(combo in 1:ncol(pairs)){
  i = pairs[1, combo]
  j = pairs[2, combo]
  # get the right 2 columns from df (offset by 1 to skip id)
  qi = df[, i + 1]
  qj = df[, j + 1]
  # combine into 1 variable
  comb.var[, combo] <- paste(qi, qj, sep = "_")
}
# clean up the output: turn comb.var into a vector and add id and pair-index columns
output = data.frame(id = rep(df$id, times = pairs.n),
                    qi = rep(pairs[1, ], each = n),
                    qj = rep(pairs[2, ], each = n),
                    val = as.vector(comb.var))
# combine variables again
output$fct = with(output, paste(qi, qj, val, sep = "."))
# count number of different outputs
uniq.n = length(unique(output$fct))
# re-label the factor (assumes at most 26 distinct combinations, one per capital letter)
output$fct <- factor(output$fct, labels = LETTERS[1:uniq.n])
# count the group members
summary(output$fct)
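If only the counts for one pair of columns are needed, a shorter base-R sketch is to cross-tabulate that pair with table(), or to map each value pair to a label with a named lookup vector and count the labels. The df below is rebuilt from the values printed in the question rather than from sample(), whose output can differ between R versions; the lookup covers only the six unambiguous q1/q2 rules (A to F) from the question, so pairs without a rule become NA.

```r
# Values copied from the printed df in the question (q3 equals q2 there)
df <- data.frame(id = 1:10,
                 q1 = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1),
                 q2 = c(1, 3, 2, 1, 3, 3, 1, 3, 2, 2))
df$q3 <- df$q2

# Cross-tabulation: cell [i, j] counts the rows with q1 == i and q2 == j
tab_q1q2 <- table(q1 = df$q1, q2 = df$q2)
print(tab_q1q2)

# Named lookup vector: key "q1_q2" -> label, following the question's rules A-F
lookup <- c("1_1" = "A", "1_2" = "B", "1_3" = "C",
            "2_3" = "D", "2_2" = "E", "3_3" = "F")
df$label <- unname(lookup[paste(df$q1, df$q2, sep = "_")])

# How often each label occurs (pairs without a rule, e.g. 3_1, stay NA)
label_counts <- table(df$label, useNA = "ifany")
print(label_counts)
```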
I have a simple problem that can be solved in a dirty way, but I'm looking for a clean way using data.table.
I have the following data.table with n columns belonging to m groups of unequal size. Here is an example of my data.table:
dframe <- as.data.frame(matrix(rnorm(60), ncol=30))
cletters <- rep(c("A","B","C"), times=c(10,14,6))
colnames(dframe) <- cletters
A A A A A A
1 -0.7431185 -0.06356047 -0.2247782 -0.15423889 -0.03894069 0.1165187
2 -1.5891905 -0.44468389 -0.1186977 0.02270782 -0.64950716 -0.6844163
A A A A B B B
1 -1.277307 1.8164195 -0.3957006 -0.6489105 0.3498384 -0.463272 0.8458673
2 -1.644389 0.6360258 0.5612634 0.3559574 1.9658743 1.858222 -1.4502839
B B B B B B B
1 0.3167216 -0.2919079 0.5146733 0.6628149 0.5481958 -0.01721261 -0.5986918
2 -0.8104386 1.2335948 -0.6837159 0.4735597 -0.4686109 0.02647807 0.6389771
B B B B C C
1 -1.2980799 0.3834073 -0.04559749 0.8715914 1.1619585 -1.26236232
2 -0.3551722 -0.6587208 0.44822253 -0.1943887 -0.4958392 0.09581703
C C C C
1 -0.1387091 -0.4638417 -2.3897681 0.6853864
2 0.1680119 -0.5990310 0.9779425 1.0819789
What I want to do is take a random subset of the columns of a specific size, keeping the same number of columns per group (if the chosen sample size is larger than the number of columns in a group, take all of that group's columns).
I have tried an updated version of the method mentioned in this question:
sample rows of subgroups from dataframe with dplyr
but I'm not able to map the column names to the by argument.
Can someone help me with this?
Here's another approach, IIUC:
idx <- split(seq_along(dframe), names(dframe))
keep <- unlist(Map(sample, idx, pmin(7, lengths(idx))))
dframe[, keep]
Explanation:
The first step splits the column indices according to the column names:
idx
# $A
# [1] 1 2 3 4 5 6 7 8 9 10
#
# $B
# [1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#
# $C
# [1] 25 26 27 28 29 30
In the next step we use
pmin(7, lengths(idx))
#[1] 7 7 6
to determine the sample size in each group and apply this to each list element (group) in idx using Map. We then unlist the result to get a single vector of column indices.
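One caveat with passing sample directly to Map: when a group has exactly one column, its element of idx is a single number k, and sample(k, 1) draws from 1:k instead of returning k. A minimal sketch of a guard, using a hypothetical helper sample_idx and a hypothetical extra one-column group D:

```r
set.seed(1)
dframe <- as.data.frame(matrix(rnorm(62), ncol = 31))
# Same groups as in the question, plus a one-column group "D"
colnames(dframe) <- rep(c("A", "B", "C", "D"), times = c(10, 14, 6, 1))

# Hypothetical helper: draw k elements of x by position, so a length-1 x is returned as-is
sample_idx <- function(x, k) x[sample.int(length(x), k)]

idx <- split(seq_along(dframe), names(dframe))
keep <- unlist(Map(sample_idx, idx, pmin(7, lengths(idx))))
sub <- dframe[, keep]

table(names(dframe)[keep])  # A: 7, B: 7, C: 6, D: 1
```

With plain sample in place of sample_idx, the D group would pick a random index from 1:31 instead of column 31.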
Not sure if you want a solution with dplyr, but here's one with just lapply:
dframe <- as.data.frame(matrix(rnorm(60), ncol=30))
cletters <- rep(c("A","B","C"), times=c(10,14,6))
colnames(dframe) <- cletters
# Number of columns to sample per group
nc <- 8
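res <- do.call(cbind,
  lapply(unique(colnames(dframe)), function(x) {
    cols <- which(colnames(dframe) == x)
    # small groups are kept whole; larger ones are sampled down to nc columns
    if (length(cols) > nc) cols <- sample(cols, nc)
    # drop = FALSE keeps a one-column group as a data frame
    dframe[, cols, drop = FALSE]
  }))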
It might look complicated, but it just takes all columns of a group if the group has nc or fewer columns, and samples nc random columns if it has more than nc.
And to restore your original column-name scheme, gsub does the trick:
colnames(res) <- gsub('.[[:digit:]]','',colnames(res))
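A note on the pattern '.[[:digit:]]': it matches any single character followed by one digit, so it strips the .1, .2, ... suffixes R adds to duplicated column names only while they stay below 10. With 11 or more columns per group, a name like A.10 would become A0. An anchored pattern that removes a literal dot plus all trailing digits avoids this, as a quick check with make.unique shows:

```r
nms <- make.unique(rep("A", 12))  # "A" "A.1" ... "A.11"

# Unanchored pattern: removes ".1" from "A.10", leaving "A0"
gsub('.[[:digit:]]', '', nms)[11]

# Anchored pattern: removes the whole ".<digits>" suffix at the end of each name
gsub("\\.[0-9]+$", "", nms)
```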
I'm trying to create a data frame in R by generating rows and appending them one by one. I am doing the following:
# create an empty data frame.
x <- data.frame ()
# Create 2 lists.
l1 <- list (a = 9, b = 2, c = 4)
l2 <- list (a = 7, b = 2, c = 3)
# Append and print.
x <- rbind (x, l1)
x
a b c
2 9 2 4
# append l2
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
# Append again
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
3 7 2 3
# Append again.
x <- rbind (x, l2)
x
a b c
2 9 2 4
21 7 2 3
3 7 2 3
4 7 2 3
My question is: when I print x, what is the significance of the values printed at the beginning of each row (i.e. the values 2, 21, 3, 4, ...), and why do they appear like this? I expected them to be 1, 2, 3, 4, and so on, showing the index of each row.
Please help.
The numbers at the start of each row are the row names of x. I think your issue is that you are trying to rbind a data.frame with a list. If you change your rbind commands to this:
x <- rbind (x, as.data.frame(l1))
you won't have an issue.
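A minimal sketch of the same fix applied to all rows at once, assuming the lists are first collected in a list (rows is an illustrative name): convert each list to a one-row data frame with as.data.frame and bind them in a single rbind call via do.call. This also avoids growing x inside a loop.

```r
l1 <- list(a = 9, b = 2, c = 4)
l2 <- list(a = 7, b = 2, c = 3)

# Collect the rows first, then bind them in one call instead of growing x step by step
rows <- list(l1, l2, l2, l2)
x <- do.call(rbind, lapply(rows, as.data.frame))
x

# If odd row names do appear, they can be reset to 1..n
rownames(x) <- NULL
```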
If you have many lists, may I suggest the data.table package, which is very convenient and fast. An example follows:
library(data.table)
n <- 100
V <- vector("list", n)
for (i in 1:n) {
  V[[i]] <- list(a = runif(1), b = runif(1), c = runif(1))
}
V <- rbindlist(V)
V
Thanks.
You won't have strange row names if you avoid initializing an empty data frame.
x <- as.data.frame(l1)
x <- rbind (x, l1)
x <- rbind (x, l2)
x <- rbind (x, l2)
x
If you want to bind rows more efficiently, I recommend the rbindlist function from the data.table package.
So, I have several data frames like this:
1 2 a
2 3 b
3 4 c
4 5 d
3 5 e
......
1 2 j
2 3 i
3 4 t
3 5 r
.......
2 3 t
2 4 g
6 7 i
8 9 t
......
What I want is to merge all of these files into one single file that shows, for each pair of values in columns 1 and 2, the values of the third column from every file, with 0 where that pair is not present in a file.
Since there are three files here (there are more in reality), the output would be:
1 2 aj0
2 3 bit
3 4 ct0
4 5 d00
3 5 er0
6 7 00i
8 9 00t
......
What I did was read all my .txt files into a single list, L.
Then,
L <- lapply(seq_along(L), function(i) {
L[[i]][, paste0('DF', i)] <- 1
L[[i]]
})
This adds an indicator column to each data frame that marks the presence of a value when we merge them.
I don't know how to proceed further. Any input would be great. Thanks!
Here is one way to do it with Reduce
# function to generate dummy data
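gen_data <- function(){
  data.frame(
    x = 1:3,
    y = 2:4,
    z = sample(LETTERS, 3, replace = TRUE),
    # keep z as character so ifelse()/paste0() see letters, not factor codes (default was TRUE before R 4.0.0)
    stringsAsFactors = FALSE
  )
}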
# generate list of data frames to merge
L <- lapply(1:3, function(x) gen_data())
# function to merge by x and y and concatenate z
f <- function(x, y){
d <- merge(x, y, by = c('x', 'y'), all = TRUE)
# set merged column to zero if no match is found
d[['z.x']] = ifelse(is.na(d[['z.x']]), 0, d[['z.x']])
d[['z.y']] = ifelse(is.na(d[['z.y']]), 0, d[['z.y']])
d$z <- paste0(d[['z.x']], d[['z.y']])
d['z.x'] <- d['z.y'] <- NULL
return(d)
}
# merge data frames
Reduce(f, L)
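One thing to watch with the pairwise Reduce above: each merge step pastes only two values, so a pair that is missing from several earlier files still gets a single 0 (with the question's data, 6 7 would come out as 0i rather than 00i). A minimal sketch of an alternative, assuming every data frame has exactly three columns in the order key, key, value: keep one value column per file through all the merges and paste them only at the end, so each missing file contributes its own 0.

```r
# The three example data frames from the question
df1 <- data.frame(x = c(1, 2, 3, 4, 3), y = c(2, 3, 4, 5, 5),
                  z = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(x = c(1, 2, 3, 3), y = c(2, 3, 4, 5),
                  z = c("j", "i", "t", "r"), stringsAsFactors = FALSE)
df3 <- data.frame(x = c(2, 2, 6, 8), y = c(3, 4, 7, 9),
                  z = c("t", "g", "i", "t"), stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)

# Give each file's value column its own name (z1, z2, ...) so none are merged away
dfs <- Map(function(d, i) setNames(d, c("x", "y", paste0("z", i))), dfs, seq_along(dfs))

# Full outer join of all files on the (x, y) pair
merged <- Reduce(function(a, b) merge(a, b, by = c("x", "y"), all = TRUE), dfs)

# Every file that lacks a pair contributes its own "0"; then paste the columns row-wise
zcols <- grep("^z", names(merged), value = TRUE)
merged[zcols][is.na(merged[zcols])] <- "0"
merged$z <- do.call(paste0, merged[zcols])
merged[, c("x", "y", "z")]
```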
I have one matrix of mutation counts, say "counts". This matrix has column names V1, V2, ..., Vi, ..., Vn, where not every i is present, so the names can jump, e.g. V1, V2, V5. Furthermore, most of the columns contain a 0.
I need to create a sum matrix, called "answer", where element (i, j) is the sum of the counts at i and j. The (i, i) element just shows the count at i.
Here's a quick data setup. I already have the correctly dimensioned matrix "answer" set up in my code, so what I need to automate are the last several lines, where I fill in the matrix.
counts <- matrix(data = c(0,2,0,5,0,6,0), nrow = 1, ncol = 7, dimnames=list("",c("V1","V2","V3","V4","V5","V6","V7")))
answer <- matrix(data =0, nrow = 3, ncol = 3, dimnames = list(c("V2","V4","V6"),c("V2","V4","V6")))
answer[1,1] <- 2
answer[1,2] <- 7
answer[1,3] <- 8
answer[2,1] <- 7
answer[2,2] <- 5
answer[2,3] <- 11
answer[3,1] <- 8
answer[3,2] <- 11
answer[3,3] <- 6
I understand I can do this with 2 nested for loops, but surely there must be a better way? Thanks!
This could be done with the right use of expand.grid and rowSums:
n = counts[, counts > 0]
answer = matrix(rowSums(expand.grid(n, n)), nrow=length(n), dimnames=list(names(n), names(n)))
diag(answer) = n
To show how it works, n would end up being:
V2 V4 V6
2 5 6
and expand.grid(n, n) would be:
Var1 Var2
1 2 2
2 5 2
3 6 2
4 2 5
5 5 5
6 6 5
7 2 6
8 5 6
9 6 6
The last line (diag) is necessary because otherwise the diagonal would be twice the original vector (adding 2+2, 5+5, or 6+6).
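The same matrix can also be built with outer(), which applies + to every pair of elements and takes the dimnames from names(n). A minimal sketch reusing the counts data from the question; the explicit counts[1, ] subset is used here to get a plain named vector.

```r
counts <- matrix(data = c(0, 2, 0, 5, 0, 6, 0), nrow = 1, ncol = 7,
                 dimnames = list("", c("V1", "V2", "V3", "V4", "V5", "V6", "V7")))

# Keep only the non-zero counts as a named vector: V2 = 2, V4 = 5, V6 = 6
n <- counts[1, counts[1, ] > 0]

# answer[i, j] = n[i] + n[j]; dimnames come from names(n)
answer <- outer(n, n, "+")

# The diagonal should hold the single count, not twice the count
diag(answer) <- n
answer
```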