I have the following issue: I am trying to convert this list of data frames into a single data frame where each unique element of the list is a column of its own.
This is what I have right now:
> head(data$egg_groups)
[[1]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[2]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[3]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[4]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
[[5]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
[[6]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
What I would like to have is a data frame where each one of those entries (just name) is a column of its own.
Something like this:
Plant Monster Dragon
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
I have tried the plyr library and also unlist, but so far nothing has worked. Any tips would be appreciated. Thanks
EDIT: This is the dput pastebin link:
dput
I would suggest using mtabulate from the "qdapTools" package. First, just loop through the list and extract the relevant column as a vector, and use the resulting list as the input for mtabulate, something like this:
library(qdapTools)
head(mtabulate(lapply(L, `[[`, "name")))
# Bug Ditto Dragon Fairy Flying Ground Human-like Indeterminate Mineral Monster
# 1 0 0 0 0 0 0 0 0 0 1
# 2 0 0 0 0 0 0 0 0 0 1
# 3 0 0 0 0 0 0 0 0 0 1
# 4 0 0 1 0 0 0 0 0 0 1
# 5 0 0 1 0 0 0 0 0 0 1
# 6 0 0 1 0 0 0 0 0 0 1
# Plant Undiscovered Water1 Water2 Water3
# 1 1 0 0 0 0
# 2 1 0 0 0 0
# 3 1 0 0 0 0
# 4 0 0 0 0 0
# 5 0 0 0 0 0
# 6 0 0 0 0 0
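If installing an extra package is not an option, the same idea (extract the name column, then tabulate each vector against a common set of levels) can be sketched in base R. This is a minimal sketch, not the qdapTools implementation: the toy list L below is a hypothetical stand-in for data$egg_groups, and the names nm, lev, counts and res are made up for illustration.

```r
# Hypothetical toy data standing in for data$egg_groups
L <- list(
  data.frame(name = c("Plant", "Monster")),
  data.frame(name = c("Dragon", "Monster"))
)

# Extract the 'name' column of each data frame as a character vector
nm <- lapply(L, function(d) as.character(d[["name"]]))

# Common set of column names across all list elements
lev <- sort(unique(unlist(nm)))

# Count how often each level occurs in each list element
counts <- t(vapply(
  nm,
  function(v) tabulate(match(v, lev), nbins = length(lev)),
  integer(length(lev))
))
colnames(counts) <- lev
res <- as.data.frame(counts)
res
```

Each row of res corresponds to one list element and each column to one distinct name, which is the same layout mtabulate returns.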
You can use rbindlist() from data.table v1.9.5 as follows:
(Using @lukeA's example)
require(data.table) # 1.9.5+
dt = rbindlist(l, idcol="id")
# id x y
# 1: 1 a 1
# 2: 1 b 2
# 3: 2 b 2
# 4: 2 c 3
dcast(dt, id ~ x, fun.aggregate = length)
# id a b c
# 1: 1 1 1 0
# 2: 2 0 1 1
You can install it by following the instructions here.
Here's one way to do it:
(l <- list(data.frame(x = letters[1:2], y = 1:2), data.frame(x = letters[2:3], y = 2:3)))
# [[1]]
# x y
# 1 a 1
# 2 b 2
#
# [[2]]
# x y
# 1 b 2
# 2 c 3
df <- do.call(rbind, lapply(1:length(l), function(x) cbind(l[[x]], id = x) ))
# x y id
# 1 a 1 1
# 2 b 2 1
# 3 b 2 2
# 4 c 3 2
library(reshape2)
dcast(df, id~x, fun.aggregate = function(x) if (length(x)) "1" else "" )[-1]
# a b c
# 1 1 1
# 2 1 1
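For comparison, here is a base R sketch of the same reshaping that avoids reshape2: build an id vector that records which list element each row came from, then cross-tabulate id against x. The variable names id, x and tab are only illustrative, and the result holds 0/1 counts rather than the "1"/"" strings used above.

```r
l <- list(data.frame(x = letters[1:2], y = 1:2),
          data.frame(x = letters[2:3], y = 2:3))

# One id per row, repeated according to the number of rows in each element
id <- rep(seq_along(l), vapply(l, nrow, integer(1)))

# All x values in the same order as the ids
x <- unlist(lapply(l, function(d) as.character(d$x)))

# Cross-tabulate: rows are list elements, columns are distinct x values
tab <- table(id, x)
as.data.frame.matrix(tab)
```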
Similar questions have been asked about counting pairs, however none seem to be specifically useful for what I'm trying to do.
What I want is to count the number of pairs across multiple list elements and turn it into a matrix. For example, if I have a list like so:
myList <- list(
a = c(2,4,6),
b = c(1,2,3,4),
c = c(1,2,5,7),
d = c(1,2,4,5,8)
)
We can see that the pair 1:2 appears 3 times (once each in b, c, and d). The pair 1:3 appears only once, in b. The pair 1:4 appears 2 times (once each in b and d)... etc.
I would like to count the number of times a pair appears and then turn it into a symmetrical matrix. For example, my desired output would look something like the matrix I created manually (where each element of the matrix is the total count for that pair of values):
> myMatrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 3 1 2 2 0 1 1
[2,] 3 0 1 3 2 1 1 1
[3,] 1 1 0 1 0 0 0 0
[4,] 2 3 1 0 1 1 0 1
[5,] 2 2 0 1 0 0 1 1
[6,] 0 1 0 1 0 0 0 0
[7,] 1 1 0 0 1 0 0 0
[8,] 1 1 0 1 1 0 0 0
Any suggestions are greatly appreciated
Inspired by @akrun's answer, I think you can use a cross product to get this very quickly and simply:
out <- tcrossprod(table(stack(myList)))
diag(out) <- 0
# values
#values 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
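Why this works: table(stack(myList)) is an incidence matrix with one row per value and one column per list element, so the product of that matrix with its own transpose counts, for every pair of values, the number of list elements containing both. The sketch below checks this against a brute-force count; it assumes each value occurs at most once within a list element, and the names inc and chk are just illustrative.

```r
myList <- list(a = c(2, 4, 6), b = c(1, 2, 3, 4),
               c = c(1, 2, 5, 7), d = c(1, 2, 4, 5, 8))

# Incidence matrix: values in rows, list elements in columns
inc <- table(stack(myList))
out <- tcrossprod(inc)
diag(out) <- 0

# Brute-force check: enumerate every pair in every list element
vals <- sort(unique(unlist(myList)))
chk <- matrix(0, length(vals), length(vals), dimnames = list(vals, vals))
for (v in myList) {
  for (p in combn(v, 2, simplify = FALSE)) {
    i <- as.character(p[1])
    j <- as.character(p[2])
    chk[i, j] <- chk[i, j] + 1
    chk[j, i] <- chk[j, i] + 1
  }
}

all(out == chk)
```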
Original answer:
Use combn to get the combinations, as well as reversing each combination.
Then convert to a data.frame and table the results.
tab <- lapply(myList, \(x) combn(x, m=2, FUN=\(cm) rbind(cm, rev(cm)), simplify=FALSE))
tab <- data.frame(do.call(rbind, unlist(tab, rec=FALSE)))
table(tab)
# X2
#X1 1 2 3 4 5 6 7 8
# 1 0 3 1 2 2 0 1 1
# 2 3 0 1 3 2 1 1 1
# 3 1 1 0 1 0 0 0 0
# 4 2 3 1 0 1 1 0 1
# 5 2 2 0 1 0 0 1 1
# 6 0 1 0 1 0 0 0 0
# 7 1 1 0 0 1 0 0 0
# 8 1 1 0 1 1 0 0 0
We could loop over the list, get the pairwise combinations with combn, stack them into a two-column dataset, convert the 'values' column to a factor with levels specified as 1 to 8, get the frequency count (table), do a cross product (crossprod), and convert the output to logical. Then Reduce the list elements by adding them elementwise, and finally set the diagonal elements to 0. (If needed, set the names attribute of dimnames to NULL.)
out <- Reduce(`+`, lapply(myList, function(x)
crossprod(table(transform(stack(setNames(
combn(x,
2, simplify = FALSE), combn(x, 2, paste, collapse="_"))),
values = factor(values, levels = 1:8))[2:1]))> 0))
diag(out) <- 0
names(dimnames(out)) <- NULL
-output
> out
1 2 3 4 5 6 7 8
1 0 3 1 2 2 0 1 1
2 3 0 1 3 2 1 1 1
3 1 1 0 1 0 0 0 0
4 2 3 1 0 1 1 0 1
5 2 2 0 1 0 0 1 1
6 0 1 0 1 0 0 0 0
7 1 1 0 0 1 0 0 0
8 1 1 0 1 1 0 0 0
I thought of a solution based on @TarJae's answer. It is not an elegant one, but it was a fun challenge!
Libraries
library(tidyverse)
Code
map_df(myList,function(x) as_tibble(t(combn(x,2)))) %>%
count(V1,V2) %>%
{. -> temp_df} %>%
bind_rows(
temp_df %>%
rename(V2 = V1, V1 = V2)
) %>%
full_join(
expand_grid(V1 = 1:8,V2 = 1:8)
) %>%
replace_na(replace = list(n = 0)) %>%
arrange(V2,V1) %>%
pivot_wider(names_from = V1,values_from = n) %>%
as.matrix()
Output
V2 1 2 3 4 5 6 7 8
[1,] 1 0 3 1 2 2 0 1 1
[2,] 2 3 0 1 3 2 1 1 1
[3,] 3 1 1 0 1 0 0 0 0
[4,] 4 2 3 1 0 1 1 0 1
[5,] 5 2 2 0 1 0 0 1 1
[6,] 6 0 1 0 1 0 0 0 0
[7,] 7 1 1 0 0 1 0 0 0
[8,] 8 1 1 0 1 1 0 0 0
First, get all possible pairwise combinations of each vector in the list as a tibble. Then bind them into one tibble and count the combinations.
library(tidyverse)
a <- as_tibble(t(combn(myList[[1]],2)))
b <- as_tibble(t(combn(myList[[2]],2)))
c <- as_tibble(t(combn(myList[[3]],2)))
d <- as_tibble(t(combn(myList[[4]],2)))
bind_rows(a,b,c,d) %>%
count(V1, V2)
V1 V2 n
<dbl> <dbl> <int>
1 1 2 3
2 1 3 1
3 1 4 2
4 1 5 2
5 1 7 1
6 1 8 1
7 2 3 1
8 2 4 3
9 2 5 2
10 2 6 1
11 2 7 1
12 2 8 1
13 3 4 1
14 4 5 1
15 4 6 1
16 4 8 1
17 5 7 1
18 5 8 1
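The four manual combn calls can be replaced by a loop over the list. The following is a base R sketch of that generalisation (it swaps the tidyverse count for table plus as.data.frame, so it is not the code used above); the names pairs and counts are illustrative.

```r
myList <- list(a = c(2, 4, 6), b = c(1, 2, 3, 4),
               c = c(1, 2, 5, 7), d = c(1, 2, 4, 5, 8))

# One row per pair, for all list elements at once
pairs <- do.call(rbind, lapply(myList, function(x) t(combn(x, 2))))

# Count identical pairs and keep only those that actually occur
counts <- as.data.frame(table(V1 = pairs[, 1], V2 = pairs[, 2]),
                        responseName = "n")
counts <- counts[counts$n > 0, ]
counts
```

Note that V1 and V2 come back as factors here, so convert them with as.numeric(as.character(...)) if numeric columns are needed.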
I'm just starting to use R. I have a dataset with unique identifiers (1958 patients) in the first column and 0's and 1's in columns 2-35.
For example:
Patient A: 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA NA
I want to change this to:
Patient A: 0 1 0 1 0 1
Thanks in advance.
We can use tapply, grouping our variable based on whether it changes value or not, i.e.
tapply(x[!is.na(x)], cumsum(c(TRUE, diff(x[!is.na(x)]) != 0)), FUN = unique)
#1 2 3 4 5 6
#0 1 0 1 0 1
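The same collapsing of consecutive duplicates can also be sketched with base R's run-length encoding. The vector x below is a shortened, made-up version of the patient row from the question, not the original data.

```r
# Shortened example row (hypothetical values), ending in NAs
x <- c(0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, NA, NA)

# Drop the NAs, then keep one value per run of identical values
res <- rle(x[!is.na(x)])$values
res
```

To apply this to every patient, something like apply(dat[, -1], 1, function(r) rle(r[!is.na(r)])$values) would return a list with one vector per row, where dat is a placeholder name for the full data frame.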
Based on your example, it is not clear whether NA's can also occur in the middle, and how you would want to deal with that situation (e.g. turn 1 NA 1 into 1 1 and hence combine the two 1's (option 1), or treat NA as a boundary and keep both 1's (option 2)).
That determines at which point to remove NA's in the code.
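The difference between the two treatments is easiest to see on a tiny vector. This base R sketch uses a made-up vector x with an NA in the middle; the helper variables keep and grp are illustrative only.

```r
x <- c(0, 1, NA, 1, 0)

# Remove NAs first: the 1's on either side of the NA merge into one run
merged <- rle(x[!is.na(x)])$values

# Treat NA as a boundary: collapse runs separately on each side of an NA
keep <- !is.na(x)
grp <- cumsum(is.na(x))   # group id increases by one at every NA
split_runs <- unlist(
  tapply(x[keep], grp[keep], function(v) rle(v)$values),
  use.names = FALSE
)

merged      # 0 1 0
split_runs  # 0 1 1 0
```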
You could use S4Vectors run length encoding, which would allow you to have more than just 0 and 1.
library(S4Vectors)
## create example data
set.seed(1)
x <- sample(c(0,1), (1958*34), replace=TRUE, prob=c(.4, .6))
x[sample(length(x), 200)] <- NA
x <- matrix(x, nrow=1958, ncol=34)
df <- data.frame(patient.id = paste0("P", seq_len(1958)), x, stringsAsFactors = FALSE)
## define functions to handle NA values
# option 2: NA marks a boundary, so runs on either side of an NA stay separate
fun.NA.boundary <- function(x) {
  a <- runValue(Rle(x))
  a[!is.na(a)]
}
# option 1: remove NAs first, so runs on either side of an NA are merged
fun.NA.remove <- function(x) runValue(Rle(x[!is.na(x)]))
## calculate results (pick one option; df[, -1] drops the patient.id column)
# option 2
reslist <- apply(df[, -1], 1, fun.NA.boundary)
# option 1
reslist <- apply(df[, -1], 1, fun.NA.remove)
names(reslist) <- df$patient.id
head(reslist)
#> $P1
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P2
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P3
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P4
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P5
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P6
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
In my data, I have 74 observations (rows) and 128 variables (columns), where each variable takes either 0 or 1 as its value. In R, I am trying to write code that, for each row, finds the variables whose value is 1, picks 80% of them, and changes their value from 1 to 0. I can calculate 80% of the number of 1s in each row, but I am not able to pick those variables in each row and change their value from 1 to 0.
data # data frame with 74 observations and 128 variables
row1 <- data[1,]
count1 <- length(which(data[1,] == 1)) # number of 1s in row 1
print(count1)
perform <- 80/100*count1 # 80% of count1
The code below works for one row:
test <- t(apply(data[1,], 1, function(x,n){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
If I specify all the rows, the code does not work:
test <- t(apply(data[1:74,], 1, function(x,n){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
Example of desired output:
original data frame
df
a b c d e f
1 1 1 1 1 1 1
2 1 0 1 1 0 1
3 1 1 1 0 1 1
When the code is applied to all three rows in df, the output should look like this (80% of the 1s in each row replaced by 0):
a b c d e f
1 1 0 0 0 1 0
2 0 0 1 0 0 0
3 0 1 1 0 0 0
Any suggestions would be appreciated.
Thank you,
Priya
A solution is to use apply row-wise and get the indices where the value is 1 using which. Afterwards, pick 80% of those indices using sample and replace the values at those positions with 0.
t(apply(df, 1, function(x){
onesInX <- which(x==1)
# Randomly select 80% of 1 and change to 0
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
# a b c d e f
# [1,] 0 0 0 1 0 0
# [2,] 0 0 0 1 0 0
# [3,] 0 0 1 0 0 1
# [4,] 0 1 0 0 0 0
# [5,] 0 1 0 0 0 0
# [6,] 1 0 0 0 0 0
# [7,] 0 0 0 0 0 1
# [8,] 0 0 1 0 0 0
# [9,] 0 0 1 0 1 0
# [10,] 0 0 0 0 0 1
Sample Data:
set.seed(1)
df <- data.frame(a = sample(c(0,1,1,1), 10, replace = TRUE),
b = sample(c(0,1,1,1), 10, replace = TRUE),
c = sample(c(0,1,1,1), 10, replace = TRUE),
d = sample(c(0,1,1,1), 10, replace = TRUE),
e = sample(c(0,1,1,1), 10, replace = TRUE),
f = sample(c(0,1,1,1), 10, replace = TRUE))
df
# a b c d e f
# 1 1 0 1 1 1 1
# 2 1 0 0 1 1 1
# 3 1 1 1 1 1 1
# 4 1 1 0 0 1 0
# 5 0 1 1 1 1 0
# 6 1 1 1 1 1 0
# 7 1 1 0 1 0 1
# 8 1 1 1 0 1 1
# 9 1 1 1 1 1 1
# 10 0 1 1 1 1 1
# Answer on OP's data
t(apply(df1, 1, function(x){
onesInX <- which(x==1)
x[sample(onesInX, floor(length(onesInX)*.8))] <- 0
x
}))
# a b c d e f
# 1 1 1 0 0 0 0 <- .8*6 = 4.8 => 4 have been converted to 0
# 2 0 0 0 1 0 0 <- .8*4 = 3.2 => 3 have been converted to 0
# 3 0 1 0 0 0 0 <- .8*5 = 4.0 => 4 have been converted to 0
# Data from OP
df1 <- read.table(text="
a b c d e f
1 1 1 1 1 1 1
2 1 0 1 1 0 1
3 1 1 1 0 1 1",
header = TRUE)
df1
# a b c d e f
# 1 1 1 1 1 1 1 <- No of 1 = 6
# 2 1 0 1 1 0 1 <- No of 1 = 4
# 3 1 1 1 0 1 1 <- No of 1 = 5
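Because the selection is random, the exact positions change from run to run, but the number of 1s left in each row is fixed and can be checked. The sketch below wraps the same logic in a hypothetical helper, zero_out (with an assumed frac argument), and indexes through sample.int so that a row with a single 1 is never misread by sample as a range.

```r
# Hypothetical helper: set floor(frac * number of 1s) randomly chosen 1s to 0
zero_out <- function(x, frac = 0.8) {
  ones <- which(x == 1)
  k <- floor(length(ones) * frac)
  x[ones[sample.int(length(ones), k)]] <- 0
  x
}

# The three-row example from the question
df1 <- data.frame(a = c(1, 1, 1), b = c(1, 0, 1), c = c(1, 1, 1),
                  d = c(1, 1, 0), e = c(1, 0, 1), f = c(1, 1, 1))

set.seed(42)
res <- t(apply(df1, 1, zero_out))

# Remaining 1s per row: 6 - 4, 4 - 3 and 5 - 4
rowSums(res)
```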
I'm new to R and excited about all the possibilities for data management and presentation.
I currently have a problem for which I have not found a solution:
I have built a data frame with:
require(BMS)
n = c(1, 2, 3, 4, 5, 6, 7, 8)
s = c("55aa55aa", "aa55aa55", "12345678", "9ABCDEF0", "55aa55aa", "aa55aa55", "12345678", "9ABCDEF0")
df = data.frame(n, s)
df$s <- as.character(df$s)
df
# n s
# 1 1 55aa55aa
# 2 2 aa55aa55
# 3 3 12345678
# 4 4 9ABCDEF0
# 5 5 55aa55aa
# 6 6 aa55aa55
# 7 7 12345678
# 8 8 9ABCDEF0
Column s is a 32-bit hex value, which I want to add to the data frame as the actual bit string in a new column, sbin.
It should look like this afterwards:
df
# n s sbin
# 1 1 55aa55aa 01010101101010100101010110101010
# 2 2 aa55aa55 10101010010101011010101001010101
# 3 3 12345678 00010010001101000101011001111000
# 4 4 9ABCDEF0 .......
# 5 5 55aa55aa ......
# 6 6 aa55aa55
# 7 7 12345678
# 8 8 9ABCDEF0
For the conversion I would like to use the "hex2bin" function from the "BMS" package.
I tried this
lapply(df$s, hex2bin)
# [[1]]
# [1] 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0
# [[2]]
# [1] 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1
# [[3]]
# [1] 0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0
# .....
but did not get the required output.
In the end I would like to access each bit in the data frame rows. So I would like to get 32 vectors with 8 bits each in this example.
How about this?
df$sbin <- sapply(df$s, FUN = function(x) { paste(hex2bin(x), collapse = "") })
# n s sbin
# 1 1 55aa55aa 01010101101010100101010110101010
# 2 2 aa55aa55 10101010010101011010101001010101
# 3 3 12345678 00010010001101000101011001111000
# 4 4 9ABCDEF0 10011010101111001101111011110000
# 5 5 55aa55aa 01010101101010100101010110101010
# 6 6 aa55aa55 10101010010101011010101001010101
# 7 7 12345678 00010010001101000101011001111000
# 8 8 9ABCDEF0 10011010101111001101111011110000
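To reach individual bits, as the question asks at the end, a matrix with one column per bit position may be more convenient than a string. The following is a base R sketch that does not rely on BMS: hex_to_bits is a hypothetical helper written for this illustration, and it assumes every string has exactly 8 hex digits.

```r
# Hypothetical helper: expand a hex string into a vector of 0/1 bits
hex_to_bits <- function(h) {
  # One integer between 0 and 15 per hex digit
  digits <- strtoi(strsplit(h, "")[[1]], base = 16L)
  # Four bits per digit, most significant bit first
  as.vector(vapply(
    digits,
    function(d) (d %/% c(8L, 4L, 2L, 1L)) %% 2L,
    integer(4)
  ))
}

s <- c("55aa55aa", "aa55aa55", "12345678", "9ABCDEF0",
       "55aa55aa", "aa55aa55", "12345678", "9ABCDEF0")

# 8 x 32 matrix: one row per hex string, one column per bit position
bits <- t(vapply(s, hex_to_bits, integer(32)))
dim(bits)

# Column j holds bit j of all 8 rows, e.g. the first bit of every value
bits[, 1]
```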
Suppose I have a column in a matrix or data.frame as follows:
df <- data.frame(col1=sample(letters[1:3], 10, TRUE))
I want to expand this out to multiple columns, one for each level in the column, with 0/1 entries indicating the presence or absence of that level in each row:
newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
for (i in 1:length(levels(df$col1))) {
curLetter <- levels(df$col1)[i]
newdf[which(df$col1 == curLetter), curLetter] <- 1
}
newdf
I know there's a simple clever solution to this, but I can't figure out what it is.
I've tried expand.grid on df, which returns itself as is. Similarly melt in the reshape2 package on df returned df as is. I've also tried reshape but it complains about incorrect dimensions or undefined columns.
Obviously, model.matrix is the most direct candidate here, but I'll present three alternatives: table, lapply, and dcast (the last one since this question is tagged reshape2).
table
table(sequence(nrow(df)), df$col1)
#
# a b c
# 1 1 0 0
# 2 0 1 0
# 3 0 1 0
# 4 0 0 1
# 5 1 0 0
# 6 0 0 1
# 7 0 0 1
# 8 0 1 0
# 9 0 1 0
# 10 1 0 0
lapply
newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
newdf[] <- lapply(names(newdf), function(x)
{ newdf[[x]][df[,1] == x] <- 1; newdf[[x]] })
newdf
# a b c
# 1 1 0 0
# 2 0 1 0
# 3 0 1 0
# 4 0 0 1
# 5 1 0 0
# 6 0 0 1
# 7 0 0 1
# 8 0 1 0
# 9 0 1 0
# 10 1 0 0
dcast
library(reshape2)
dcast(df, sequence(nrow(df)) ~ df$col1, fun.aggregate=length, value.var = "col1")
# sequence(nrow(df)) a b c
# 1 1 1 0 0
# 2 2 0 1 0
# 3 3 0 1 0
# 4 4 0 0 1
# 5 5 1 0 0
# 6 6 0 0 1
# 7 7 0 0 1
# 8 8 0 1 0
# 9 9 0 1 0
# 10 10 1 0 0
It's very easy with model.matrix
model.matrix(~ df$col1 + 0)
The term + 0 means that the intercept is not included. Hence, you receive a dummy variable for each factor level.
The result:
df$col1a df$col1b df$col1c
1 0 0 1
2 0 1 0
3 0 0 1
4 1 0 0
5 0 1 0
6 1 0 0
7 1 0 0
8 0 1 0
9 1 0 0
10 0 1 0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`df$col1`
[1] "contr.treatment"
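If the goal is a plain data frame with columns named a, b and c as in the question, the model matrix can be post-processed. This is a sketch under the assumption that col1 is a factor (it is set explicitly here, since data.frame no longer converts strings to factors by default in recent R versions); mm and newdf are illustrative names.

```r
set.seed(1)
df <- data.frame(
  col1 = factor(sample(letters[1:3], 10, TRUE), levels = letters[1:3])
)

# Passing data = df gives column names col1a, col1b, col1c
mm <- model.matrix(~ col1 + 0, data = df)

# Rename the columns to the bare factor levels and convert to a data frame
colnames(mm) <- levels(df$col1)
newdf <- as.data.frame(mm)
newdf
```

Every row of newdf contains exactly one 1, in the column matching that row's level.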