How to unlist a very messy list in R

I have a very messy list with multiple levels in the form of:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "D" "B" "A"
[[1]][[1]][[2]]
[1] "E" "B" "A"
[[1]][[2]]
[[1]][[2]][[1]]
[1] "D" "C" "A"
[[1]][[3]]
[[1]][[3]][[1]]
[1] "B" "D" "A"
....
[[5]][[2]][[2]]
[1] "D" "B" "E"
[[5]][[3]]
[1] "C" "E"
...
What is the easiest way to get a flat list of the lowest-level character vectors, so that the first element would be "D" "B" "A", the next would be "E" "B" "A", and so forth?
Thanks!
Edit:
Here's my list in dput format as requested. However, the nesting structure can change and the number of levels can increase. Thus any solution that works by using a presupposed number of levels is no good.
> dput(myResults)
list(list(list(c("D", "B", "A"), c("E", "B", "A")), list(c("D",
"C", "A")), list(c("B", "D", "A"), c("C", "D", "A"), c("E", "D",
"A")), list(c("B", "E", "A"), c("D", "E", "A"))), list(list(c("D",
"A", "B"), c("E", "A", "B")), c("C", "B"), list(c("A", "D", "B"
), c("E", "D", "B")), list(c("A", "E", "B"), c("D", "E", "B"))),
list(list(c("D", "A", "C")), c("B", "C"), list(c("A", "D",
"C")), c("E", "C")), list(list(c("B", "A", "D"), c("C", "A",
"D"), c("E", "A", "D")), list(c("A", "B", "D"), c("E", "B",
"D")), list(c("A", "C", "D")), list(c("A", "E", "D"), c("B",
"E", "D"))), list(list(c("B", "A", "E"), c("D", "A", "E")),
list(c("A", "B", "E"), c("D", "B", "E")), c("C", "E"),
list(c("A", "D", "E"), c("B", "D", "E"))))

Edit
The rlist package has a function, list.flatten, that does this:
library(rlist)
list.flatten(yourLst)
A recursive solution (the order changes, though: the least nested elements come out first):
unlst <- function(lst) {
  # vapply always returns a logical vector, even for an empty list
  inds <- vapply(lst, is.list, logical(1))
  if (!any(inds)) return(lst)
  c(lst[!inds], unlst(unlist(lst[inds], recursive = FALSE)))
}
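If the original order matters, one possible variation is to peel off one level of nesting at a time while keeping every element in place. This is only a sketch: flatten_iter is a made-up name, not a function from rlist or base R, and it assumes the leaves are atomic vectors (a data frame would be treated as a list and split into columns).

```r
# Hypothetical helper: flatten a nested list while preserving leaf order.
flatten_iter <- function(x) {
  # Repeat until no element of x is itself a list
  while (any(vapply(x, is.list, logical(1)))) {
    # Wrap leaves in a list so that dropping one level leaves them intact
    wrapped <- lapply(x, function(el) if (is.list(el)) el else list(el))
    x <- unlist(wrapped, recursive = FALSE)
  }
  x
}

nested <- list(c("D", "B"), list("E", list(c("A", "C"))))
flat <- flatten_iter(nested)
str(flat)
```

Because each pass removes exactly one level, the loop runs as many times as the deepest nesting level, whatever that depth is.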

Try this function please.
unlist_messy_list <- function(cur_list) {
  if (is.atomic(cur_list)) {
    list(cur_list)
  } else {
    cl <- lapply(cur_list, unlist_messy_list)
    Reduce(c, cl)
  }
}
As you have not provided sample data, I tested it with some cases I made up myself, and it works.
unlist_messy_list(list())
unlist_messy_list(list(c(1,2,3), c(4,5,6), c(7,8,9)))
unlist_messy_list(list(c(1,2,3), list(c(4,5,6), c(7,8,9))))
unlist_messy_list(list(c(1,2,3), c(4,5,6), list(c(7,8,9), c(10,11,12))))
unlist_messy_list(list(c(1,2,3), list(c(4,5,6), c(7,8,9), list(10, c(11,12,13), 14, list(c(15,16))))))
I just tested it on your newly provided data, and it works fine. The output is (after dput):
list(c("D", "B", "A"), c("E", "B", "A"), c("D", "C", "A"), c("B", "D", "A"), c("C", "D", "A"), c("E", "D", "A"), c("B", "E", "A"), c("D", "E", "A"), c("D", "A", "B"), c("E", "A", "B"), c("C", "B"), c("A", "D", "B"), c("E", "D", "B"), c("A", "E", "B"), c("D", "E", "B"), c("D", "A", "C"), c("B", "C"), c("A", "D", "C"), c("E", "C"), c("B", "A", "D"), c("C", "A", "D"), c("E", "A", "D"), c("A", "B", "D"), c("E", "B", "D"), c("A", "C", "D"), c("A", "E", "D"), c("B", "E", "D"),c("B", "A", "E"), c("D", "A", "E"), c("A", "B", "E"), c("D", "B", "E"), c("C", "E"), c("A", "D", "E"), c("B", "D", "E"))

Related

Converting multiple columns to factors and releveling with mutate(across)

dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")
I have a dataframe that looks something like the above (but with many other columns I don't want to change).
The columns I am interested in changing contain letter grades, but they are currently character vectors and not in the right order.
I need to convert each of these columns into factors with the correct order. I've been able to get this to work using the code below:
factordat <-
dat %>%
mutate(Comp1Letter = factor(Comp1Letter, levels = GradeLevels)) %>%
mutate(Comp2Letter = factor(Comp2Letter, levels = GradeLevels)) %>%
mutate(Comp3Letter = factor(Comp3Letter, levels = GradeLevels))
However, this is very verbose and takes up a lot of space.
Looking at some other questions, I've tried to use a combination of mutate() and across(), as seen below:
factordat <-
dat %>%
mutate(across(c(Comp1Letter, Comp2Letter, Comp3Letter) , factor(levels = GradeLetters)))
However when I do this the vectors remain character vectors.
Could someone please tell me what I'm doing wrong or offer another option?
You can pass an anonymous function to across() like this:
dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")
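library(dplyr)  # also provides the %>% pipe

dat %>%
  tibble::as_tibble() %>%
  # factor() is wrapped in a formula-style anonymous function; .x is the current column
  dplyr::mutate(dplyr::across(c(Comp1Letter, Comp2Letter, Comp3Letter), ~ factor(.x, levels = GradeLevels)))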
# # A tibble: 8 × 3
# Comp1Letter Comp2Letter Comp3Letter
# <fct> <fct> <fct>
# 1 A B D
# 2 B C A
# 3 D E C
# 4 F U D
# 5 U A F
# 6 A* C D
# 7 B A* C
# 8 C E A
You were close: all that was left was to wrap factor() in an anonymous function. That can be done either with ~ and .x (or .) in the tidyverse, or with function(x) and x in base R.
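For comparison, the same conversion can be sketched in base R with lapply and an ordinary function(x). The shortened data frame and the extra column named other are made up for illustration; grade_cols is just a helper variable holding the columns to convert.

```r
# Shortened, made-up version of the question's data
dat <- data.frame(Comp1Letter = c("A", "B", "U", "A*"),
                  Comp2Letter = c("B", "C", "A", "E"),
                  other = 1:4)
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

# Convert only the listed columns; every other column is left untouched
grade_cols <- c("Comp1Letter", "Comp2Letter")
dat[grade_cols] <- lapply(dat[grade_cols], function(x) factor(x, levels = GradeLevels))

str(dat)
```

Assigning into dat[grade_cols] keeps the data frame structure and the column order.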

Repeating list elements in R

I have this list:
my_list <- list(V1 = c("A", "B", "C"), V2 = c("A", "B", "D"), V3 = c("A", "B", "E"), V4 = c("A", "B", "F"), V5 = c("A", "B", "G"))
I want to repeat each list element 4 times to get this list:
output <- list(V1 = c("A", "B", "C"), V2 = c("A", "B", "C"), V3 = c("A", "B", "C"), V4 = c("A", "B", "C"), V5 = c("A", "B", "D"), V6 = c("A", "B", "D"), V7 = c("A", "B", "D"), V8 = c("A", "B", "D"), V9 = c("A", "B", "E"), V10 = c("A", "B", "E"), V11 = c("A", "B", "E"), V12 = c("A", "B", "E"), V13 = c("A", "B", "F"), V14 = c("A", "B", "F"), V15 = c("A", "B", "F"), V16 = c("A", "B", "F"), V17 = c("A", "B", "G"), V18 = c("A", "B", "G"), V19 = c("A", "B", "G"), V20 = c("A", "B", "G"))
unlist(rep(list(my_list), each = 4), recursive = F) doesn't do the trick because it repeats the entire list 4 times instead of repeating each individual element 4 times.
If you specify the same index multiple times, R respects the duplication:
letters[c(1, 1, 1)]
[1] "a" "a" "a"
Therefore all we need is a set of indices like c(1, 1, 1, 1, 2, 2, 2, ...). We can create exactly this with rep's "each" argument, and then rename them with names and paste0:
my_list <- list(V1 = c("A", "B", "C"), V2 = c("A", "B", "D"), V3 = c("A", "B", "E"), V4 = c("A", "B", "F"), V5 = c("A", "B", "G"))
list_repeated <- my_list[rep(seq_along(my_list), each = 4)]
names(list_repeated) <- paste0("V", seq_along(list_repeated))
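The same indexing idea can be wrapped in a small helper so that the repeat count is a parameter. This is a sketch only; rep_each and its prefix argument are names invented here, not existing functions.

```r
# Hypothetical helper: repeat each list element `times` times and rename
rep_each <- function(lst, times, prefix = "V") {
  idx <- rep(seq_along(lst), each = times)  # e.g. 1 1 1 1 2 2 2 2 ...
  out <- lst[idx]                           # duplicated indices duplicate elements
  names(out) <- paste0(prefix, seq_along(out))
  out
}

my_list <- list(V1 = c("A", "B", "C"), V2 = c("A", "B", "D"))
res <- rep_each(my_list, 4)
names(res)
```

Using seq_along rather than 1:length(x) means an empty input list gives an empty result instead of the indices 1 and 0.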

How to Split an R List Containing Character Vectors of Varying Lengths Into Specific Columns?

I have some data in JSON format, that using jsonlite I was able to read into a data frame in R. The data I'm working with is in lists, where each list contains character vectors of different lengths. For example:
values
<list>
1 A
2 B
3 character(0)
4 C
5 c(A, C)
6 D
7 c(B, C)
8 c(D, E)
Or, to reproduce in full:
structure(list(values1 = list("C", "E", character(0), "C", character(0),
"C", c("D", "A"), c("D", "A"), "D", "D", character(0), "D",
"A", "E", "E", "A", "A", "A", "B", "A", "A", "A", "A", "D",
"E", "E", "A", character(0), "E", character(0), character(0),
"B", character(0), "C", "C", "C", "C", "C", character(0),
character(0), character(0), character(0), character(0), character(0),
character(0), character(0), "E", c("E", "D"), c("E", "D"),
"B", "E", "E", "A", "A", "B", "B", "B", "B", "B", "D", "D",
character(0), character(0), character(0), character(0), "B",
c("C", "A"), character(0), "A", "B", "B", "B", "B", "B",
"C", "C", character(0), character(0), character(0), character(0),
"E", "E", character(0), character(0), "B", "E", "A", "C",
"B", "C", "A", "C", "C", "C", "C", "C", "A", character(0),
"A", character(0), "A", "D", "B", "A", "C", "A", "A", "A",
"C", "A", "A", "B", "D", "D", character(0), character(0),
character(0), character(0), character(0), character(0), "C",
"B", character(0), "B", character(0), "B", "E", "D", c("C",
"E"), c("C", "E"), "D", "D", "C", "C", character(0), "C",
character(0), "C", "C", "D", "E", "E", "B", "B", "C", "C",
"B", "B", "E", character(0), character(0), character(0),
character(0), "B", "B", "E", "A", character(0), "B", "A",
character(0), "A", "D", "D", c("D", "A"), c("D", "A"), c("D",
"B"), c("D", "B"), character(0), "E", character(0), "E",
"E", "E", "E", character(0), "D", character(0), "E", "A",
"A", "A", "A", "A", "D", "D", c("B", "A"), c("B", "A"), "C",
character(0), character(0), "B", "E", "E", "B", c("E", "B"
), "A", "A", "B", "B", "D", "D", "A", "A", character(0),
"A", "C", character(0), "C", "C", "B", "B", "A", "A", "B",
"B", "A", "E", "C", "C", "D", "D", "D", c("C", "E"), character(0),
character(0), character(0), character(0), "E", c("E", "A"
), "E", character(0), character(0), "A", "D", "D", c("D",
"A"), c("D", "A"), character(0), character(0), character(0),
character(0), character(0), character(0), "B", "C", "C",
"C", "C", "B", "B", c("C", "E"), c("C", "E"), "E", "C", "C",
"C", c("E", "D", "B", "A"), c("E", "D", "B", "A"), character(0),
"A", character(0), "A", c("C", "A"), c("C", "A"), c("C",
"A"), "E", "E", "A", character(0), "C", c("E", "D"), c("E",
"D"), character(0), character(0), character(0), character(0),
"A", "A", "A", "A", "D", "E", c("C", "D"), "E", character(0),
character(0), character(0), "D", "D", character(0), "A",
"B", character(0), character(0), character(0), character(0),
"D", "D", "D", "E", "E", "D", "D", "B", "B", "B", "E", "D",
"C", "D", "C", "C", "E", "E", "A", character(0), character(0),
"B", character(0), "B", "B", "B", "B", character(0), "A",
"C", "C", "C", "D", "D", "D", character(0), "D", character(0),
"D", "B", "A", character(0), "B", "D", "A", "A", character(0),
"A", "D", "D", "E", "E", "B", character(0), character(0),
character(0), "C", "C", "C", "B", "B", "A", "D", c("C", "B"
), character(0), "D", "C", "C", character(0), character(0),
"D", "D", "D", c("B", "A"), "E", "A", "A", character(0),
"E", "C", "B", character(0), character(0), character(0),
character(0), "E", "E", "D", "C", "C", "E", "E", "E", "E",
character(0), "E", "E", "A", "B", "A", "A", "D", "E", "E",
"B", "B", character(0), character(0), "D", "D", "C", "D",
"D", "E", character(0), "E", character(0), "E", c("D", "B"
), character(0), "B", character(0), character(0), "D", character(0),
"D", "D", "D", "C", character(0), "E", "E", c("E", "B"),
c("E", "B"), "E", "E", "D", "D", "B", c("E", "A"), c("E",
"A"), c("C", "D"), c("C", "D"), c("C", "B"), c("C", "B"),
character(0), "C", "B"), values2 = list("C", "E", "C",
"C", "C", "C", c("D", "A"), c("D", "A"), "D", "D", "D", "D",
"A", "E", "E", "A", "A", "A", "B", "A", "A", "A", "A", "D",
"E", "E", "A", "E", "E", character(0), "B", "B", "C", "C",
"C", "C", "C", "C", c("E", "A"), c("E", "A"), c("E", "A"),
c("E", "A"), c("C", "A"), c("C", "A"), c("C", "A"), c("C",
"A"), "E", c("E", "D"), c("E", "D"), "B", "E", "E", "A",
"A", "B", "B", "B", "B", "B", "D", "D", c("C", "B"), c("C",
"B"), c("C", "B"), c("C", "B"), "B", c("C", "A"), character(0),
"A", "B", "B", "B", "B", "B", "C", "C", c("E", "D"), c("E",
"D"), c("E", "D"), c("E", "D"), "E", "E", character(0), character(0),
"B", "E", "A", "C", "B", "C", "A", "C", "C", "C", "C", "C",
"A", "A", "A", "A", "A", "D", "B", "A", "C", "A", "A", "A",
"C", "A", "A", "B", "D", "D", "E", "E", "E", "E", character(0),
character(0), "C", "B", "B", "B", "B", "B", "E", "D", c("C",
"E"), c("C", "E"), "D", "D", "C", "C", "C", "C", "C", "C",
"C", "D", "E", "E", "B", "B", "C", "C", "B", "B", "E", "B",
"B", "B", "B", "B", "B", "E", "A", "B", "B", "A", "A", "A",
"D", "D", c("D", "A"), c("D", "A"), c("D", "B"), c("D", "B"
), "E", "E", "E", "E", "E", "E", "E", "D", "D", "E", "E",
"A", "A", "A", "A", "A", "D", "D", c("B", "A"), c("B", "A"
), "C", character(0), character(0), "B", "E", "E", "B", c("E",
"B"), "A", "A", "B", "B", "D", "D", "A", "A", "A", "A", "C",
"C", "C", "C", "B", "B", "A", "A", "B", "B", "A", "E", "C",
"C", "D", "D", "D", c("C", "E"), "D", "D", "D", "D", "E",
c("E", "A"), "E", character(0), character(0), "A", "D", "D",
c("D", "A"), c("D", "A"), c("D", "A"), c("D", "A"), c("D",
"A"), c("D", "A"), c("D", "A"), c("D", "A"), "B", "C", "C",
"C", "C", "B", "B", c("C", "E"), c("C", "E"), "E", "C", "C",
"C", c("E", "D", "B", "A"), c("E", "D", "B", "A"), "A", "A",
"A", "A", c("C", "A"), c("C", "A"), c("C", "A"), "E", "E",
"A", "C", "C", c("E", "D"), c("E", "D"), "A", "A", "A", "A",
"A", "A", "A", "A", "D", "E", c("C", "D"), "E", character(0),
character(0), character(0), "D", "D", character(0), "A",
"B", c("D", "B"), c("D", "B"), c("D", "B"), c("D", "B"),
"D", "D", "D", "E", "E", "D", "D", "B", "B", "B", "E", "D",
"C", "D", "C", "C", "E", "E", "A", character(0), "B", "B",
"B", "B", "B", "B", "B", "A", "A", "C", "C", "C", "D", "D",
"D", "D", "D", "D", "D", "B", "A", "B", "B", "D", "A", "A",
"A", "A", "D", "D", "E", "E", "B", character(0), character(0),
character(0), "C", "C", "C", "B", "B", "A", "D", c("C", "B"
), "D", "D", "C", "C", character(0), "D", "D", "D", "D",
c("B", "A"), "E", "A", "A", character(0), "E", "C", "B",
"C", "C", "C", "C", "E", "E", "D", "C", "C", "E", "E", "E",
"E", "E", "E", "E", "A", "B", c("C", "E", "D", "B", "A"),
c("C", "E", "D", "B", "A"), "D", "E", "E", "B", "B", character(0),
character(0), "D", "D", "C", "D", "D", "E", "E", "E", "E",
"E", c("D", "B"), "B", "B", character(0), "D", "D", "D",
"D", "D", "D", "C", "E", "E", "E", c("E", "B"), c("E", "B"
), "E", "E", "D", "D", "B", c("E", "A"), c("E", "A"), c("C",
"D"), c("C", "D"), c("C", "B"), c("C", "B"), "C", "C", "B")), row.names = c(NA,
445L), class = "data.frame")
I would like to split this data up so that each value gets its own column:
1 2 3 4 5
<chr> <chr> <chr> <chr> <chr>
1 A
2 B
3
4 C
5 A C
6 D
7 B C
8 D E
Then, ultimately, get the data into a tidy format so that it's easy to filter by a column:
A B C D E
<logi> <logi> <logi> <logi> <logi>
1 TRUE FALSE FALSE FALSE FALSE
2 FALSE TRUE FALSE FALSE FALSE
3 FALSE FALSE FALSE FALSE FALSE
4 FALSE FALSE TRUE FALSE FALSE
5 TRUE FALSE TRUE FALSE FALSE
6 FALSE FALSE FALSE TRUE FALSE
7 FALSE TRUE TRUE FALSE FALSE
8 FALSE FALSE FALSE TRUE TRUE
That last step should be simple with mutate, it's the splitting I can't figure out. I'm aware of both tidyr separate and unnest_wider, but as far as I can tell those don't let me control which columns the vector is split into.
Assuming your data is something like this:
df <- structure(list(values = list("A", "B", character(0), "C", c("A",
"C"), "D", c("B", "C"), c("D", "E"))),
row.names = c(NA, -8L), class = "data.frame")
You can do:
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
unnest(values) %>%
complete(row = 1:max(row)) %>%
mutate(val = TRUE) %>%
pivot_wider(names_from = values, values_from = val, values_fill = FALSE) %>%
dplyr::select(-`NA`, -row)
# A B C D E
# <lgl> <lgl> <lgl> <lgl> <lgl>
#1 TRUE FALSE FALSE FALSE FALSE
#2 FALSE TRUE FALSE FALSE FALSE
#3 FALSE FALSE FALSE FALSE FALSE
#4 FALSE FALSE TRUE FALSE FALSE
#5 TRUE FALSE TRUE FALSE FALSE
#6 FALSE FALSE FALSE TRUE FALSE
#7 FALSE TRUE TRUE FALSE FALSE
#8 FALSE FALSE FALSE TRUE TRUE
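As a cross-check without tidyr, the same logical table can be sketched in base R by testing each row's vector against the set of possible values. This assumes the set of values is taken from the data itself (lvls below); if some value never occurs, it gets no column.

```r
values <- list("A", "B", character(0), "C", c("A", "C"), "D", c("B", "C"), c("D", "E"))

# All distinct values, in alphabetical order, become the columns
lvls <- sort(unique(unlist(values)))

# One logical vector per row: which of the levels appear in that row?
flags <- t(vapply(values, function(v) lvls %in% v, logical(length(lvls))))
colnames(flags) <- lvls
flags_df <- as.data.frame(flags)

flags_df
```

Rows holding character(0) come out as all FALSE without any special handling, because %in% against an empty vector is FALSE for every level.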
Based on the dput data (stored here as df1), we can do:
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn) %>%
unnest(value) %>%
pivot_wider(names_from = value, values_from = name,
values_fill = FALSE, values_fn = list(name = ~ length(.) > 0)) %>%
select(-rn)
# A tibble: 422 x 5
# C E D A B
# <lgl> <lgl> <lgl> <lgl> <lgl>
# 1 TRUE FALSE FALSE FALSE FALSE
# 2 FALSE TRUE FALSE FALSE FALSE
# 3 TRUE FALSE FALSE FALSE FALSE
# 4 TRUE FALSE FALSE FALSE FALSE
# 5 TRUE FALSE FALSE FALSE FALSE
# 6 TRUE FALSE FALSE FALSE FALSE
# 7 FALSE FALSE TRUE TRUE FALSE
# 8 FALSE FALSE TRUE TRUE FALSE
# 9 FALSE FALSE TRUE FALSE FALSE
#10 FALSE FALSE TRUE FALSE FALSE
# … with 412 more rows

Reorder dataset following Multiple qualitative criteria (Alphabetic Order)

I have a dataset in a list that I called reord composed of 521 vectors. Each vector has 14 elements. Only four columns (elements of each single vector) are important to sort this dataset.
I want to create a new dataset that is sorted alphabetically by these columns: reord[[i]][10], then reord[[i]][2], then reord[[i]][3], and then reord[[i]][6].
In the 10th column we have 9 levels: CAD,CHF,...,...SEK,USD
In the 2nd column we have 3 levels: A,D,Q
In the 3rd column we have 6 levels: A,D,I,R,S,T
In the 6th column we have 5 levels: A,B,C,K,U
Each of these columns defines a grouping. The 10th column is the most important (it sets the main order), the 2nd column follows the 10th (second order), the 3rd column follows the 2nd (third order), and the 6th column follows the 3rd (fourth order).
An example of the output index (for reordering the original dataset) that I'm looking for:
CAD A A A
CAD A A B
CAD A A C
CAD A A K
...
CAD A D A
CAD A D B
CAD A D C
CAD A D K
...
...
CAD Q T U
CHF A A A
CHF A A B
...
...
...
USD Q T U
This is the dataset:
reord = list(c("H", "A", "A", "B", "5J", "A", "5J", "A", "TO1", "USD",
"A", "A", "3", "C"), c("H", "D", "R", "B", "5J", "C", "5J", "A",
"TO1", "CAD", "A", "A", "3", "C"), c("H", "A", "I", "B", "5J",
"A", "5J", "A", "TO1", "JPY", "A", "A", "3", "C"), c("H", "A",
"R", "B", "5J", "C", "5J", "A", "TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK",
"A", "A", "3", "C"), c("H", "D", "I", "B", "5J", "U", "5J", "A",
"TO1", "JPY", "A", "A", "3", "C"), c("H", "A", "D", "B", "5J",
"C", "5J", "A", "TO1", "EUR", "A", "A", "3", "C"), c("H", "D",
"R", "B", "5J", "A", "5J", "A", "TO1", "SEK", "A", "A", "3",
"C"), c("H", "Q", "C", "B", "5J", "B", "5J", "A", "TO1", "USD",
"A", "A", "3", "A"), c("H", "D", "S", "B", "5J", "U", "5J", "A",
"TO1", "JPY", "A", "A", "3", "A"), c("H", "A", "R", "B", "5J",
"U", "5J", "A", "TO1", "SEK", "A", "A", "3", "C"), c("H", "A",
"R", "B", "5J", "B", "5J", "A", "TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK",
"A", "A", "3", "C"), c("H", "A", "S", "B", "5J", "B", "5J", "A",
"TO1", "JPY", "A", "A", "3", "A"), c("H", "D", "D", "B", "5J",
"U", "5J", "A", "TO1", "JPY", "A", "A", "3", "C"), c("H", "D",
"S", "B", "5J", "A", "5J", "A", "TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK",
"A", "A", "3", "A"), c("H", "D", "S", "B", "5J", "A", "5J", "A",
"TO1", "GBP", "A", "A", "3", "A"), c("H", "D", "I", "B", "5J",
"K", "5J", "A", "TO1", "CAD", "A", "A", "3", "C"), c("H", "D",
"R", "B", "5J", "K", "5J", "A", "TO1", "CHF", "A", "A", "3",
"C"), c("H", "A", "T", "B", "5J", "K", "5J", "A", "TO1", "JPY",
"A", "A", "3", "A"), c("H", "D", "T", "B", "5J", "U", "5J", "A",
"TO1", "CAD", "A", "A", "3", "A"), c("H", "Q", "C", "B", "5J",
"A", "5J", "A", "TO1", "USD", "A", "A", "3", "A"), c("H", "A",
"D", "B", "5J", "B", "5J", "A", "TO1", "EUR", "A", "A", "3",
"C"), c("H", "A", "S", "B", "5J", "C", "5J", "A", "TO1", "EUR",
"A", "A", "3", "A"), c("H", "D", "D", "B", "5J", "K", "5J", "A",
"TO1", "CAD", "A", "A", "3", "C"), c("H", "D", "R", "B", "5J",
"B", "5J", "A", "TO1", "SEK", "A", "A", "3", "C"), c("H", "D",
"R", "B", "5J", "K", "5J", "A", "TO1", "TO1", "A", "A", "3",
"C"), c("H", "D", "S", "B", "5J", "B", "5J", "A", "TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK",
"A", "A", "3", "A"), c("H", "D", "S", "B", "5J", "C", "5J", "A",
"TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK", "A", "A", "3",
"A"), c("H", "D", "I", "B", "5J", "A", "5J", "A", "TO1", "GBP",
"A", "A", "3", "C"), c("H", "A", "D", "B", "5J", "A", "5J", "A",
"TO1", "JPY", "A", "A", "3", "C"), c("H", "A", "D", "B", "5J",
"K", "5J", "A", "TO1", "$TO1+TO1-USD-EUR-JPY-GBP-CHF-CAD-SEK",
"A", "A", "3", "C"))
These are the grouped elements of each column:
##tenth column
tenth = sapply(reord, `[`, 10)
idx10 = split(seq_along(tenth), tenth)
##second column
second = sapply(reord, `[`, 2)
idx2 = split(seq_along(second), second)
##third column
third = sapply(reord, '[', 3)
idx3 = split(seq_along(third), third)
##sixth column
sixth = sapply(reord, '[', 6)
idx6 = split(seq_along(sixth), sixth)
How can I obtain this type of index to reorder the dataset? Thank you.
The following function implements the ordering described in the question.
If the columns to order by are not columns 10, 2, 3 and 6 as in the question, override the default cols argument.
fun_order <- function(X, cols = c(10, 2, 3, 6)) {
  # Bind the vectors into a data frame, one row per vector
  X <- do.call(rbind.data.frame, X)
  X[] <- lapply(X, as.character)
  names(X) <- seq_along(X)
  # Sort by the key columns, in the order given in cols
  i <- do.call(order, X[cols])
  outcols <- c(cols, seq_len(ncol(X))[-cols])
  Y <- X[i, outcols]
  row.names(Y) <- NULL
  list(index = i, cols = cols, data = Y)
}
fun_order(reord)
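If only the index is needed, so that the original list can be reordered directly, the core of the function can be sketched on a matrix instead. The four short records below are made up for illustration; with the real data, key_cols would be c(10, 2, 3, 6).

```r
# Made-up records with three fields each
recs <- list(c("USD", "A", "A"),
             c("CAD", "D", "R"),
             c("CAD", "A", "I"),
             c("JPY", "A", "D"))
key_cols <- c(1, 2, 3)  # most important key first

# One row per record, then order by each key column in turn
m <- do.call(rbind, recs)
idx <- do.call(order, lapply(key_cols, function(j) m[, j]))

sorted_recs <- recs[idx]
idx
```

order() breaks ties in the first key with the second key, and so on, which is the nested ordering the question describes. Note that character ordering follows the current locale.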

R - Bin according to factor

I have a dataset where I'd like to run classIntervals(df$vol, 3, style="jenks") for every group and type combination within it.
The data looks somewhat like this.
data_sam <- data.frame( "group"=c( "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"
), "type"=c( "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B" ), "index"=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59
), "vol"=c(52,272,374,408,498,480,451,644,715,659,820,713,810,676,840,589,594,998,782,483,351,494,377,261,637,379,706,530,619,724,333,189,246,82,39,85,159,143,125,118,79,39,110,190,264,101,70,46,0,27,71,69,172,464,132,0,156,167,142,45,51,10,0,14,67,20,2,12,1,0,6,2,2,17,22,7,0,2,9,5,12,15,7,0,12,18,4,3,12,9,12,13,14,8,9,11,10,5,4,1,4,10,4,4,3,5,5,1,3,0,2,3,2,4,2,3,3,0,0,1,1,1,0,0,1,1,2,0,1,1,0,1,1,0,0,1,0,0,0,0,1,2,0,1,1
))
I would like to be able to see the bin results per group-type.
As per the data above, the following results are what should I get when I run classIntervals:
group A - type A
style: jenks
one of 2,628 possible partitions of this variable into 3 classes
[0,190] (190,530] (530,998]
53 17 16
group A - type B
style: jenks
one of 66 possible partitions of this variable into 3 classes
[0,2] (2,5] (5,14]
34 15 10
Is there a way that I can loop through the group types within data_sam for the bins? And, ideally view the results into a data.frame formatted in the following way.
group type count1 count2 count3 boundary1 boundary2 boundary3
A A 53 17 16 [0,190] (190,530] (530,998]
A B 34 15 10 [0,2] (2,5] (5,14]
Alternatively, I'm happy to see even the breaks within each group attached to every row on the data_sam.
I'm not sure what's possible here so please let me know.
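Consider by(), an object-oriented wrapper around tapply() that runs a function on subsets of a data frame defined by one or more factors. Here each call can return a one-row data frame, and the rows are bound together at the end.
According to the classInt documentation, the brks component returned by classIntervals() holds the class break points, not the counts. The code below therefore passes those breaks to cut() and tabulates the result, which gives a named vector where the names are the boundaries and the values are the counts.
library(classInt)

df_list <- by(data_sam, data_sam[, c("group", "type")], function(sub) {
  tryCatch({
    brks <- classIntervals(sub$vol, n = 3, style = "jenks")$brks
    # Count the observations per class; names are labels such as "[0,190]"
    res <- table(cut(sub$vol, breaks = brks, include.lowest = TRUE))
    data.frame(group = sub$group[1],
               type = sub$type[1],
               count1 = res[[1]],
               count2 = res[[2]],
               count3 = res[[3]],
               boundary1 = names(res)[1],
               boundary2 = names(res)[2],
               boundary3 = names(res)[3])
  }, error = function(e) NULL  # rbind() skips NULL, so failed groups are dropped
  )
})
final_df <- do.call(rbind, df_list)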
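The by() plus do.call(rbind, ...) pattern can be tried without classInt installed. The sketch below uses a made-up toy data frame and replaces the Jenks classification with two plain summaries (n and max_vol), purely to show how the per-group rows are combined.

```r
# Made-up data: one group, two types
toy <- data.frame(group = rep("A", 8),
                  type  = rep(c("A", "B"), each = 4),
                  vol   = c(1, 5, 9, 20, 0, 2, 2, 7))

# One single-row data frame per group/type combination
summaries <- by(toy, toy[, c("group", "type")], function(sub) {
  data.frame(group = sub$group[1],
             type = sub$type[1],
             n = nrow(sub),
             max_vol = max(sub$vol))
})

# Stack the rows into one data frame
result <- do.call(rbind, summaries)
result
```

Any per-group computation that returns a one-row data frame with the same columns can be dropped into the function body in the same way.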
