Convert dataframe to unnamed list [duplicate] - r

This question already has an answer here:
mapply over two lists [closed]
(1 answer)
Closed 6 years ago.
I have a dataframe df looking like this
A B C D
1 78 12 43 12
2 23 12 42 13
3 14 42 11 99
4 49 94 27 72
I need the first two columns converted into a list which looks exactly like this:
[[1]]
[1] 78 12
[[2]]
[1] 23 12
[[3]]
[1] 14 42
[[4]]
[1] 49 94
Basically what
list(c(78, 12), c(23, 12), c(14, 42), c(49, 94))
would do. I tried this
lapply(as.list(1:dim(df)[1]), function(x) df[x[1],])
as well as
lapply(as.list(1:nrow(df)), function(x) df)
But that's slightly different.
Any suggestions?

You can try Map:
Map(c, df$A, df$B)
[[1]]
[1] 78 12
[[2]]
[1] 23 12
[[3]]
[1] 14 42
[[4]]
[1] 49 94
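If the columns are not always named A and B, or more than two are needed, a hedged variant of the same idea passes whichever columns you pick to Map through do.call (a sketch, assuming df holds only atomic columns):
# unname() drops the column names so the resulting vectors stay unnamed,
# matching the requested output exactly
cols <- unname(df[c("A", "B")])   # or df[1:2] for "the first two columns"
do.call(Map, c(list(c), cols))    # equivalent to Map(c, df$A, df$B)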

In case this is of interest, it is possible to accomplish this with the foreach package:
library(foreach)
foreach(i = seq.int(nrow(df))) %do% c(df[i, 1], df[i, 2])
foreach returns a list by default. The code runs down the rows and pulls elements from the first and second columns.
A more compact version that also scales beyond two columns:
foreach(i = seq.int(nrow(df))) %do% unlist(df[i, 1:2], use.names = FALSE)
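If you would rather not index at all, the iterators package (a foreach companion, not part of the original answer) can walk the data frame row by row; this is only a sketch:
library(iterators)
# iter(..., by = "row") yields each row as a one-row data frame;
# unlist() turns it into the plain unnamed vector the question asks for
foreach(r = iter(df[, 1:2], by = "row")) %do% unlist(r, use.names = FALSE)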

Another option is lapply:
lapply(seq_len(nrow(df)), function(i) unlist(df[i, 1:2], use.names=FALSE))
#[[1]]
#[1] 78 12
#[[2]]
#[1] 23 12
#[[3]]
#[1] 14 42
#[[4]]
#[1] 49 94
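On R >= 3.6.0 there is also asplit(), which splits a matrix by rows. This sketch goes through as.matrix(), so it assumes the selected columns share a common type:
# asplit(, 1) splits the matrix by rows; the two unname() calls drop the
# row names and column names that asplit() keeps
unname(lapply(asplit(as.matrix(df[, 1:2]), 1), unname))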

Related

Convert vector of years to single string with range intervals [duplicate]

This question already has answers here:
Create grouping variable for consecutive sequences and split vector
(5 answers)
Closed 3 years ago.
If I have a vector as such:
dat <- c(1,2,3,4,5,19,20,21,56,80,81,92)
How can I break it up into a list as:
[[1]]
1 2 3 4 5
[[2]]
19 20 21
[[3]]
56
[[4]]
80 81
[[5]]
92
Just use split in conjunction with diff:
> split(dat, cumsum(c(1, diff(dat) != 1)))
$`1`
[1] 1 2 3 4 5
$`2`
[1] 19 20 21
$`3`
[1] 56
$`4`
[1] 80 81
$`5`
[1] 92
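One small difference from the requested output: split() names the list elements "1", "2", and so on. Wrap the call in unname() if an unnamed list is needed:
unname(split(dat, cumsum(c(1, diff(dat) != 1))))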
Not exactly what you asked for, but the "R.utils" package has a couple of related fun functions:
library(R.utils)
seqToIntervals(dat)
# from to
# [1,] 1 5
# [2,] 19 21
# [3,] 56 56
# [4,] 80 81
# [5,] 92 92
seqToHumanReadable(dat)
# [1] "1-5, 19-21, 56, 80-81, 92"
I think Robert Krzyzanowski is correct. So here is a tidyverse approach that involves placing the vector into a tibble (data frame).
library(tidyverse)
# library(dplyr)
# library(tidyr)
df <- c(1,2,3,4,5,19,20,21,56,80,81,92) %>%
  tibble(dat = .)
# using lag()
df %>%
  group_by(seq_id = cumsum(dat != lag(dat) + 1 | is.na(dat != lag(dat) + 1))) %>%
  nest()
# using diff()
df %>%
  group_by(seq_id = cumsum(c(1, diff(dat)) != 1)) %>%
  nest()
Of course, you need not nest the resulting groups into list-columns, and can instead perform some kind of summary operation.
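If a plain list of vectors (rather than a nested tibble) is the goal, a hedged follow-up is group_split(), assuming dplyr 0.8.0 or later:
df %>%
  group_by(seq_id = cumsum(c(1, diff(dat)) != 1)) %>%
  group_split(.keep = FALSE) %>%   # one tibble per run, grouping column dropped
  lapply(function(x) x$dat)        # pull the dat column out of each piece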

R : Convert nested list into a one level list [duplicate]

This question already has answers here:
How to flatten a list of lists?
(3 answers)
Closed 5 years ago.
I have the following nested list y:
x1=c(12,54,2)
x2=c(2,88,1)
x3=c(4,8)
y=list()
y[[1]]=x1
y[[2]]=list(x2,x3)
y
[[1]]
[1] 12 54 2
[[2]]
[[2]][[1]]
[1] 2 88 1
[[2]][[2]]
[1] 4 8
I would like to extract all elements from this nested list and put them into a one level list, so my expected result should be :
y_one_level_list
[[1]]
[1] 12 54 2
[[2]]
[1] 2 88 1
[[3]]
[1] 4 8
Obviously my real problem involves a more deeply nested list; how would you solve it? I tried rapply but failed.
Try lapply together with rapply:
lapply(rapply(y, enquote, how="unlist"), eval)
#[[1]]
#[1] 12 54 2
#[[2]]
#[1] 2 88 1
#[[3]]
#[1] 4 8
It also works for more deeply nested lists.
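For comparison, the plain rapply() call the question presumably tried flattens all the way down to a single vector, which is why the enquote()/eval() pair is needed to stop at the vector level:
rapply(y, identity, how = "unlist")
#[1] 12 54  2  2 88  1  4  8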
You can try this:
# Recursively flatten: descend while an element is itself a list, and wrap
# leaf vectors in list() so that c() concatenates them at the top level
flatten <- function(lst) {
  do.call(c, lapply(lst, function(x) if (is.list(x)) flatten(x) else list(x)))
}
flatten(y)
#[[1]]
#[1] 12 54 2
#[[2]]
#[1] 2 88 1
#[[3]]
#[1] 4 8
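As a quick check with a deeper, hypothetical nesting (reusing x1, x2 and x3 from the question), the recursion keeps working:
y_deep <- list(list(x1, list(x2)), list(list(list(x3))))
flatten(y_deep)
# returns the same three-element flat list as above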

Calculate in deeper list levels R

Imagine I have a list of two levels:
lll <- list()
lll[[1]] <- list(1:10, 1:5, 1:2)
lll[[2]] <- list(10:20, 20:30)
lll
[[1]]
[[1]][[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[1]][[2]]
[1] 1 2 3 4 5
[[1]][[3]]
[1] 1 2
[[2]]
[[2]][[1]]
[1] 10 11 12 13 14 15 16 17 18 19 20
[[2]][[2]]
[1] 20 21 22 23 24 25 26 27 28 29 30
I want to calculate the means of these sequences. I have written a little function, which works fine:
func <- function(list.list){
  lapply(1:length(list.list), function(i) mean(list.list[[i]]))
}
lapply(lll, func)
What I don't like about this function is that I have to use an anonymous function.
It gets even more complicated when I have a list of 3 levels.
Maybe you know better ways to do the calculation that don't involve an anonymous function? Should I use higher-order functions (Map, Reduce)?
I know how to write a for loop, but in this case it isn't an option.
Here's a possible solution (using rapply, i.e. recursive apply) that works at any level of depth:
lll <- list()
lll[[1]] <- list(1:10, 1:5, 1:2)
lll[[2]] <- list(10:20, 20:30)
res <- rapply(lll,mean,how='replace')
> res
[[1]]
[[1]][[1]]
[1] 5.5
[[1]][[2]]
[1] 3
[[1]][[3]]
[1] 1.5
[[2]]
[[2]][[1]]
[1] 15
[[2]][[2]]
[1] 25
Setting the argument how='unlist' instead, you will get:
res <- rapply(lll, mean, how='unlist')
> res
[1] 5.5 3.0 1.5 15.0 25.0
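To answer the "no anonymous function" wish for the original two-level case directly, a hedged alternative is to nest the apply functions themselves:
lapply(lll, lapply, mean)   # keeps the two-level list structure
lapply(lll, sapply, mean)   # collapses each inner list to a numeric vector of means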

Keep column name when filtering matrix columns

I have a matrix, like the one generated with this code:
> m = matrix(data=c(1:50), nrow= 10, ncol = 5);
> colnames(m) = letters[1:5];
If I filter the columns and the result has more than one column, the new matrix keeps the names. For example:
> m[, colnames(m) != "a"];
b c d e
[1,] 11 21 31 41
[2,] 12 22 32 42
[3,] 13 23 33 43
[4,] 14 24 34 44
[5,] 15 25 35 45
[6,] 16 26 36 46
[7,] 17 27 37 47
[8,] 18 28 38 48
[9,] 19 29 39 49
[10,] 20 30 40 50
Notice that here, the class is still matrix:
> class(m[, colnames(m) != "a"]);
[1] "matrix"
But when the filter leaves only one column, the result is a vector (an integer vector in this case) and the column name is lost.
> m[, colnames(m) == "a"]
[1] 1 2 3 4 5 6 7 8 9 10
> class(m[, colnames(m) == "a"]);
[1] "integer"
The name of the column is very important.
I would like to keep both the matrix structure (a one-column matrix) and the column's name.
But the column's name is more important.
I already know how to solve this the long way (by keeping track of every case). I'm wondering if there is an elegant, enlightening solution.
You need to set drop = FALSE. This is good practice for programmatic use. From the documentation of `[`:
drop
For matrices and arrays. If TRUE the result is coerced to the lowest possible dimension (see the examples).
m[, 'a', drop = FALSE]
This will retain the names as well.
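A quick check of the result (a sketch; m_a is just an illustrative name, and on R >= 4.0.0 the class of a matrix prints as c("matrix", "array")):
m_a <- m[, colnames(m) == "a", drop = FALSE]
class(m_a)      # still a matrix, not a bare integer vector
colnames(m_a)   # "a"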
You can also use subset:
m.a = subset(m, select = colnames(m) == "a")
