Apply a function to a list of data frames in R

I need help managing lists iteratively.
I have the following list, list1, which is composed of several data frames with the same columns but different numbers of rows. Example:
[[1]]
id InpatientDays ERVisits OfficeVisits Narcotics
1 a 0 0 18 1
2 b 1 1 6 1
3 c 0 0 5 3
4 d 0 1 19 0
5 e 8 2 19 3
6 f 2 0 9 2
[[2]]
id InpatientDays ERVisits OfficeVisits Narcotics
7 a 16 1 8 1
8 b 2 0 8 0
9 c 2 1 4 3
10 d 4 2 0 2
11 e 6 5 20 2
12 a 0 0 7 4
I would like to apply a function that gets all possible pairwise combinations of the ids for each data frame in the list.
I tried something like lapply(list1, function(x) combn(unique(list1[x]$id))), which of course does not work. I expect to get something like:
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] "a" "a" "a" "a" "a" "b" "b" "b" "b" "c" "c" "c" "d" "d" "e"
[2,] "b" "c" "d" "e" "f" "c" "d" "e" "f" "d" "e" "f" "e" "f" "f"
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "a" "a" "a" "a" "b" "b" "b" "c" "c" "d"
[2,] "b" "c" "d" "e" "c" "d" "e" "d" "e" "e"
Is this possible? I know for sure this works for a single data frame df:
combn(unique(df$id),2)

We need to use unique(x$id) (and combn() also needs m = 2, because m has no default):
lapply(list1, function(x) combn(unique(x$id),2))
The OP's code loops over list1 with lapply. Inside the anonymous function (function(x)), x is each data.frame in the list, not an index, so we just need x$id (or x[['id']]) to extract the 'id' column. If we do need to subset by index, we have to loop through the sequence of list1 (or, if the list elements are named, through its names):
lapply(seq_along(list1), function(i) combn(unique(list1[[i]]$id), 2))
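A self-contained sketch of both variants. It rebuilds the question's data frames with only the id and InpatientDays columns to keep it short; pairs_by_element and pairs_by_index are illustrative names, not from the original answer:

```r
# Two data frames with the same columns but different numbers of rows
# (trimmed to two columns from the question's example)
list1 <- list(
  data.frame(id = c("a", "b", "c", "d", "e", "f"),
             InpatientDays = c(0, 1, 0, 0, 8, 2),
             stringsAsFactors = FALSE),
  data.frame(id = c("a", "b", "c", "d", "e", "a"),
             InpatientDays = c(16, 2, 2, 4, 6, 0),
             stringsAsFactors = FALSE)
)

# Loop over the elements: x is a data frame, so x$id is its id column
pairs_by_element <- lapply(list1, function(x) combn(unique(x$id), 2))

# Loop over positions: i is an index, so list1[[i]] is the data frame
pairs_by_index <- lapply(seq_along(list1),
                         function(i) combn(unique(list1[[i]]$id), 2))

dim(pairs_by_element[[1]])  # 2 15 (6 unique ids)
dim(pairs_by_element[[2]])  # 2 10 ("a" appears twice, so 5 unique ids)
```

Both calls return the same list of matrices; looping over the elements is the simpler choice unless the index itself is needed.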

Related

Using rbind to combine two one-column variables

I am trying to use rbind to combine two columns, and I use the following (all variables are city names, and R treats them as factors):
firstcitynames <- rcffull$X1CityName
secondcitynames <- rcffull$X2CityName
allcitynames <- rbind(firstcitynames, secondcitynames)
allcitynames
Then when I get to View(allcitynames), all I get is a bunch of numbers instead of names:
[,2276] [,2277] [,2278] [,2279] [,2280] [,2281]
[,2282] [,2283] [,2284] [,2285] [,2286] [,2287]
Any suggestions?
You need to convert the factors to character with as.character(df$var): rbind() on factors uses their underlying integer codes, which is why you see numbers.
Here's an illustration:
a <- factor(letters[1:10])
b <- factor(LETTERS[1:10])
rbind(a,b)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## a 1 2 3 4 5 6 7 8 9 10
## b 1 2 3 4 5 6 7 8 9 10
rbind(as.character(a), as.character(b))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
## [2,] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
Assuming firstcitynames and secondcitynames are factors,
you can try this (note that levels() returns each distinct name once, in sorted order, not one value per row):
rbind(levels(firstcitynames),levels(secondcitynames))
This one also worked (note that it stacks the two columns into a single column rather than producing two rows):
firstcitynames <- as.tibble(rcffull$X1CityName)
secondcitynames <- as.tibble(rcffull$X2CityName)
allcitynames <- rbind(firstcitynames, secondcitynames)
allcitynames
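A small sketch of the as.character() fix applied to the question's variables. Here rcffull is a hypothetical three-row stand-in (the real data isn't shown), built with stringsAsFactors = TRUE so the columns are factors as in the question:

```r
# Hypothetical stand-in for the question's rcffull data frame
rcffull <- data.frame(X1CityName = c("Paris", "Rome", "Oslo"),
                      X2CityName = c("Lima", "Paris", "Kyiv"),
                      stringsAsFactors = TRUE)

firstcitynames <- rcffull$X1CityName
secondcitynames <- rcffull$X2CityName

# The problem: rbind() on factors keeps only the integer level codes
codes <- rbind(firstcitynames, secondcitynames)

# Converting to character first keeps the city names (two rows)
side_by_side <- rbind(as.character(firstcitynames),
                      as.character(secondcitynames))

# Or stack both columns into one long character vector
stacked <- c(as.character(firstcitynames), as.character(secondcitynames))
```

Whether side_by_side (two rows, as rbind() gives) or stacked (one long vector, like the tibble answer) is wanted depends on what "combine" means for your data.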

Count the repeated combinations of elements in R

Suppose I have two vectors like this:
l1 = c('C','D','E','F')
l2 = c('G','C','D','F')
I generate all combinations of two elements using the combn function:
l1_vector = t(combn(l1,2))
l2_vector = t(combn(l2,2))
> l1_vector
[,1] [,2]
[1,] "C" "D"
[2,] "C" "E"
[3,] "C" "F"
[4,] "D" "E"
[5,] "D" "F"
[6,] "E" "F"
> l2_vector
[,1] [,2]
[1,] "G" "C"
[2,] "G" "D"
[3,] "G" "F"
[4,] "C" "D"
[5,] "C" "F"
[6,] "D" "F"
Now I want to count the pairs that appear in both l1_vector and l2_vector. In my example, there should be 3 such pairs: ["C","D"], ["C","F"] and ["D","F"].
How can I do that without using a loop?
As mentioned in the comments, you can use the merge function for this. Since the default behavior of merge is to use all of the available columns, it will return only those rows that are perfect matches.
> merge(l1_vector, l2_vector)
V1 V2
1 C D
2 C F
3 D F
>
> nrow(merge(l1_vector, l2_vector))
[1] 3
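While merge is perfectly fine for your case, there is a workaround that starts from l1 and l2 directly: the pairs the two vectors share are the pairs formed from their common elements.
If you just need the number of repeated pairs: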
choose(length(intersect(l1, l2)), 2)
[1] 3
If you need the repeated pairs themselves:
t(combn(intersect(l1, l2), 2))
[,1] [,2]
[1,] "C" "D"
[2,] "C" "F"
[3,] "D" "F"
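merge() compares ordered pairs, so ("C","D") and ("D","C") would not match, whereas the choose()/intersect() shortcut treats pairs as unordered. If order within a pair should not matter, one sketch (sort_pairs is a hypothetical helper) is to sort each row before merging:

```r
l1 <- c("C", "D", "E", "F")
l2 <- c("G", "C", "D", "F")
l1_vector <- t(combn(l1, 2))
l2_vector <- t(combn(l2, 2))

# Hypothetical helper: put the two letters of each row in alphabetical order,
# so ("D","C") and ("C","D") compare as the same pair
sort_pairs <- function(m) {
  as.data.frame(t(apply(m, 1, sort)), stringsAsFactors = FALSE)
}

common <- merge(sort_pairs(l1_vector), sort_pairs(l2_vector))
nrow(common)  # 3
```

With l2 reordered as c("G", "F", "D", "C"), a plain merge() finds no matches, while the sorted version still finds the same 3 pairs.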

Exclude rows where an element has already appeared N times

I have the following input data:
# [,1] [,2]
#[1,] "A" "B"
#[2,] "A" "C"
#[3,] "A" "D"
#[4,] "B" "C"
#[5,] "B" "D"
#[6,] "C" "D"
Next, I want to exclude rows where the first or second element has already appeared N times. For example, if N = 2, I need to exclude the following rows:
#[3,] "A" "D" - element "A" has already appeared 2 times
#[5,] "B" "D" - element "B" has already appeared 2 times
#[6,] "C" "D" - element "C" has already appeared 2 times
Note: exclusions must be taken into account immediately, i.e. only rows that are kept count towards N. For example, if an element appears 5 times in the input but only once among the rows kept so far, the next row with this element must be kept, because it then appears only 2 times.
Example (N=2):
Input data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[3,] "A" "D"
[4,] "A" "E"
[5,] "B" "C"
[6,] "B" "D"
[7,] "B" "E"
[8,] "C" "D"
[9,] "C" "E"
[10,] "D" "E"
Output data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[5,] "B" "C"
[10,] "D" "E"
There are probably more elegant solutions, but this seems to work:
v <- c("A", "B", "C", "D", "E")
cmb <- t(combn(v, 2))
n <- 2
# Go through each letter
for (l in v)
{
# Find the combinations using that letter
rows <- apply(cmb, 1, function(x){l %in% x})
rows.2 <- which(rows)
if (length(rows.2)>n)
rows.2 <- rows.2[1:n]
# Take the first n rows containing the letter,
# then append all the ones not containing it
# (drop = FALSE keeps single-row subsets as matrices)
cmb <- rbind(cmb[rows.2, , drop = FALSE], cmb[!rows, , drop = FALSE])
}
cmb
which outputs the same rows as the expected output, in a different order:
[,1] [,2]
[1,] "D" "E"
[2,] "B" "C"
[3,] "A" "C"
[4,] "A" "B"
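A single-pass alternative that keeps the rows in their original order, as in the question's expected output. It assumes the rule is "keep a row only if both of its elements have appeared fewer than N times among the rows kept so far", which reproduces the N = 2 example; keep and counts are illustrative names:

```r
v <- c("A", "B", "C", "D", "E")
cmb <- t(combn(v, 2))
n <- 2

# Running count of how often each letter appears among the rows kept so far
counts <- integer(length(v))
names(counts) <- v
keep <- logical(nrow(cmb))

for (i in seq_len(nrow(cmb))) {
  pair <- cmb[i, ]
  # Keep the row only if neither letter has reached the limit yet;
  # dropped rows do not increase the counts
  if (all(counts[pair] < n)) {
    keep[i] <- TRUE
    counts[pair] <- counts[pair] + 1L
  }
}

cmb[keep, , drop = FALSE]
which(keep)  # 1 2 5 10
```

It still loops, but only once over the rows, and the kept row numbers match the [1,], [2,], [5,], [10,] rows of the expected output.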

Put each list element's name in its last column

I would like to add each list element's name as its last column. What is the best way to do that efficiently?
lst <- list(a=matrix(runif(10), nrow=5, ncol=2), b=matrix(runif(6), nrow=3, ncol=2))
$a
[,1] [,2]
[1,] 0.5257330 0.52673079
[2,] 0.2103107 0.23357179
[3,] 0.3745236 0.03687697
[4,] 0.9731074 0.15569480
[5,] 0.2248541 0.60258915
$b
[,1] [,2]
[1,] 0.9901820 0.3648310
[2,] 0.8922225 0.4285105
[3,] 0.6963518 0.5795353
I would like the following result, with each element's name added as its last column:
$a
[,1] [,2] [,3]
[1,] "0.52573303761892" "0.526730791199952" "a"
[2,] "0.210310699883848" "0.233571790158749" "a"
[3,] "0.374523550504819" "0.0368769748602062" "a"
[4,] "0.973107369150966" "0.155694802291691" "a"
[5,] "0.224854125175625" "0.602589153219014" "a"
$b
[,1] [,2] [,3]
[1,] "0.990182007197291" "0.36483103595674" "b"
[2,] "0.892222490161657" "0.42851050500758" "b"
[3,] "0.696351842954755" "0.579535307129845" "b"
Any help will be appreciated.
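A solution that keeps the names from the original list (SIMPLIFY = FALSE makes sure a list is returned even when all matrices have the same dimensions):
mapply(function(x, y) cbind(x, y), lst, names(lst), SIMPLIFY = FALSE)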
Here's a solution that gives you exactly what you asked for. Based on your expected output, it seems like you're aware that by doing so, you're coercing the numbers in the matrix to characters.
lapply(names(lst), function(x) {
`colnames<-`(cbind(lst[[x]], x), NULL)
} )
# [[1]]
# [,1] [,2] [,3]
# [1,] "0.497699242085218" "0.934705231105909" "a"
# [2,] "0.717618508264422" "0.212142521282658" "a"
# [3,] "0.991906094830483" "0.651673766085878" "a"
# [4,] "0.380035179434344" "0.125555095961317" "a"
# [5,] "0.777445221319795" "0.267220668727532" "a"
#
# [[2]]
# [,1] [,2] [,3]
# [1,] "0.386114092543721" "0.86969084572047" "b"
# [2,] "0.0133903331588954" "0.34034899668768" "b"
# [3,] "0.382387957070023" "0.482080115471035" "b"
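A sketch that combines both ideas: Map() (which is mapply(..., SIMPLIFY = FALSE)) keeps the names $a and $b, and unname() drops the column names that cbind() adds. The data-frame variant at the end is an assumption about what you might want if the numbers should stay numeric:

```r
set.seed(1)
lst <- list(a = matrix(runif(10), nrow = 5, ncol = 2),
            b = matrix(runif(6), nrow = 3, ncol = 2))

# Pair each matrix with its name; the result is a list named like lst.
# cbind() with a character value coerces the whole matrix to character.
res <- Map(function(m, nm) unname(cbind(m, nm)), lst, names(lst))

# Alternative that keeps the numbers numeric: one data frame per element,
# with the name in an extra column called "name"
res_df <- Map(function(m, nm) data.frame(m, name = nm), lst, names(lst))
```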

Combination without repetition in R

I am trying to get all the possible combinations of length 3 of the elements of a variable. Although it partly worked with combn(), I did not quite get the output I was looking for. Here's my example:
x <- c("a","b","c","d","e")
t(combn(c(x,x), 3))
The output I get starts like this:
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "b" "d"
[3,] "a" "b" "e"
I am not really happy with this command, for two reasons. First, I want output like "a+b+c", "a+b+b", ..., but I wasn't able to reformat the output with paste() or something similar.
Second, I want only one combination for each set of letters; that is, I get either "a+b+c" or "b+a+c", but not both.
Try something like:
x <- c("a","b","c","d","e")
d1 <- combn(x,3) # All combinations
d1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "a" "a" "a" "a" "a" "a" "b" "b" "b" "c"
# [2,] "b" "b" "b" "c" "c" "d" "c" "c" "d" "d"
# [3,] "c" "d" "e" "d" "e" "e" "d" "e" "e" "e"
nrow(unique(t(d1))) == nrow(t(d1))
# [1] TRUE
d2 <- expand.grid(x,x,x) # All ordered triples (permutations with repetition)
d2
# Var1 Var2 Var3
# 1 a a a
# 2 b a a
# 3 c a a
# 4 d a a
# 5 e a a
# 6 a b a
# 7 b b a
# 8 c b a
# 9 d b a
# ...
nrow(unique(d2)) == nrow(d2)
# [1] TRUE
Try this:
x <- c("a","b","c","d","e")
expand.grid(rep(list(x), 3))
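Neither answer shows the "a+b+c" formatting. A sketch that uses combn()'s FUN argument (extra arguments such as collapse are passed on to FUN) for combinations without repetition and, since the question's "a+b+b" suggests repeats are allowed, a sort-and-unique approach for combinations with repetition; no_rep and with_rep are illustrative names:

```r
x <- c("a", "b", "c", "d", "e")

# Without repetition: paste each 3-letter combination into one string
no_rep <- combn(x, 3, FUN = paste, collapse = "+")

# With repetition (e.g. "a+b+b"): build every ordered triple, sort the letters
# within each triple so "b+a+c" and "a+b+c" coincide, then keep unique strings
g <- expand.grid(rep(list(x), 3), stringsAsFactors = FALSE)
with_rep <- unique(apply(g, 1, function(r) paste(sort(r), collapse = "+")))

length(no_rep)    # 10
length(with_rep)  # 35
```

The counts match choose(5, 3) = 10 and choose(5 + 3 - 1, 3) = 35, the number of 3-element multisets drawn from 5 letters.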
