Split dataframe columns into vectors in R

I have a dataframe as such:
Number <- c(1,2,3)
Number2 <- c(10,12,14)
Letter <- c("A","B","C")
df <- data.frame(Number,Number2,Letter)
I would like to split the df into its respective three columns, each one becoming a vector with the respective column name. In essence, the output should look exactly like the original three input vectors in the above example.
I have tried the split function and also a for loop, but without success.
Any ideas? Thank you.

We may use unclass, since a data.frame is a list with additional attributes. Unclassing removes the data.frame class and leaves the underlying list of columns:
unclass(df)
Another option is asplit with MARGIN specified as 2, which splits by column:
asplit(df, 2)
NOTE: Both of them return a named list. If we intend to create new objects in the global environment, use list2env (not recommended, though).
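One practical difference between the two options is worth checking on your own data. The sketch below assumes a recent R version, where asplit converts a data frame to a matrix before splitting; with a mix of numeric and character columns, every column then comes back as character, while unclass keeps each column's type.

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))

# unclass() hands back the columns exactly as the data frame stores them
by_unclass <- unclass(df)
is.numeric(by_unclass$Number)

# asplit() goes through as.matrix(), which needs one type for all cells
by_asplit <- asplit(df, 2)
is.character(by_asplit[[1]])
```

If the numeric columns need to stay numeric, unclass or as.list is the safer choice.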

We can use c or as.list:
> c(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
> as.list(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"

Assuming you are trying to create these as vectors in the global environment, use list2env:
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))
list2env(df, .GlobalEnv)
## <environment: R_GlobalEnv>
ls()
## [1] "df" "Letter" "Number" "Number2"

list2env is clearly the easiest way, but the same result can also be achieved with a for loop.
The "tricky" part is creating a new vector named after each column inside the loop. If you just write
names(df[i]) <- input
no vector is created.
A workaround is to use paste to build a string containing the new vector's name and the expression for its contents, then evaluate that string with eval(parse(text = ...)).
It may not be the most elegant solution, but it seems to work.
for (i in colnames(df)) {
  vector_name <- names(df[i])
  expression_to_be_evaluated <- paste(vector_name, "<- df[[i]]")
  eval(parse(text = expression_to_be_evaluated))
}
> Letter
[1] A B C
Levels: A B C
> Number
[1] 1 2 3
> Number2
[1] 10 12 14
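The same loop can be written without building code as a string. This is a minimal sketch using base R's assign; the environment name target is made up for illustration, and a fresh environment is used here so the example does not overwrite anything in the global environment (pass envir = .GlobalEnv to get the behaviour asked for in the question).

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))

# 'target' is a hypothetical name; any environment works here
target <- new.env()

for (nm in names(df)) {
  # create an object called <nm> holding the column of that name
  assign(nm, df[[nm]], envir = target)
}

ls(target)
get("Number2", envir = target)
```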

Related

Indirectly defined objects to a list

I have a variable whose name and value are both determined dynamically. I want to append that variable to a list, so I need to express it in the form list(x). My difficulty is defining what x should be.
So, for example, if the value of the name is given by the variable a, and the value is b, I have both
a <- "name"
b <- 3
But then I get this result:
list(a = b)
$a
[1] 3
The value is correct but the name is not. I want the list to look behind the variable a to its current value, which is "name".
How do I do that, please?
Use lst from dplyr: the name on the left-hand side of = is taken literally, so the value stored in a is never looked up. With lst you can unquote a with !! and assign with :=.
library(dplyr)
lst(!!a := b)
$name
[1] 3
Or with setNames/names<- from base R
setNames(list(b), a)
$name
[1] 3
`names<-`(list(b), a)
$name
[1] 3
Or create the list first and then rename
lst1 <- list(b)
names(lst1) <- a
We may use structure.
a <- 'name'; b <- 3
structure(as.list(b), names=a)
# $name
# [1] 3
It generalizes well for multiple pair-wise values.
a <- c('name1', 'name2'); b <- c(3, 4)
structure(as.list(b), names=a)
# $name1
# [1] 3
#
# $name2
# [1] 4
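Since the question is about appending to a list that already exists, it may also help to see that [[<- accepts the name as a character value. A small sketch in base R; the list existing and its element first are made up for illustration.

```r
a <- "name"
b <- 3

# 'existing' stands in for whatever list is being built up
existing <- list(first = 1)

# the string stored in a becomes the element name
existing[[a]] <- b

names(existing)
```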

How to use grep to search for patterns matches within a list of data frames using a second list of character vectors in R

I have two lists in R. One is a list of data frames with rows that contain strings (List 1). The other is a list (of the same length) of characters (List 2). I would like to go through the lists in a parallel fashion taking the character string from List 2 and searching for it to get its position (using grep) in the data frame at the corresponding element in List 1. Here is a toy example to show what my lists look like:
List1 <- list(data.frame(a = c("other","other","dog")),
data.frame(a = c("cat","other","other")),
data.frame(a = c("other","other","bird")))
List2 <- list("a" = c("dog|xxx|xxx"),
"a" = c("cat|xxx|xxx"),
"a" = c("bird|xxx|xxx"))
The output I would like to get would be a list of the position in each data frame in List 1 of the pattern match i.e. in this example the positions would be 3, 1 & 3. So the list would be:
[[1]]
[1] 3
[[2]]
[1] 1
[[3]]
[1] 3
I cannot seem to figure out how to do this.
I tried lapply:
NewList1 <- lapply(1:length(List1),
function(x) grep(List2[[x]]))
But that does not work. I also tried purrr:map2:
NewList2<-map2(List2, List1, grep(List2$A, List1))
This also does not work. I would be very grateful for any suggestions as to how to fix this. Many thanks to anyone willing to wade in!
Try Map + unlist
> Map(grep, List2, unlist(List1, recursive = FALSE))
$a
[1] 3
$a
[1] 1
$a
[1] 3
Using Map you can do -
Map(function(x, y) grep(y, x$a), List1, List2)
#[[1]]
#[1] 3
#[[2]]
#[1] 1
#[[3]]
#[1] 3
The map2 attempt was close, but you need to refer to the two lists as .x and .y inside the function:
purrr::map2(List2, List1, ~grep(.x, .y$a))
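For completeness, the lapply attempt from the question can be repaired in the same spirit: grep needs both the pattern and the vector to search, so the index has to be used on both lists. A minimal base R sketch, reusing the toy data from the question:

```r
List1 <- list(data.frame(a = c("other", "other", "dog")),
              data.frame(a = c("cat", "other", "other")),
              data.frame(a = c("other", "other", "bird")))
List2 <- list("a" = c("dog|xxx|xxx"),
              "a" = c("cat|xxx|xxx"),
              "a" = c("bird|xxx|xxx"))

# walk both lists by position: pattern from List2, column a from List1
NewList1 <- lapply(seq_along(List1),
                   function(i) grep(List2[[i]], List1[[i]]$a))
NewList1
```

seq_along(List1) is used instead of 1:length(List1) so that an empty list gives zero iterations.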

Storing unique values of each column (of a df) in list

It is straightforward to obtain the unique values of a column using unique. However, I am looking to do the same for multiple columns in a dataframe and store the results in a list, all using base R. Importantly, it is not combinations I need but simply the unique values of each individual column. I currently have the below:
# dummy data
df = data.frame(a = LETTERS[1:4]
,b = 1:4)
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols)
{
x = unique(i)
unique_values_by_col[[i]] = x
}
The problem comes when displaying unique_values_by_col, as it shows as empty. I believe the problem is that i is being passed to the loop as text, not as a variable.
Any help would be greatly appreciated. Thank you.
Why not avoid the for loop altogether using lapply:
lapply(df, unique)
Resulting in:
> $a
> [1] A B C D
> Levels: A B C D
> $b
> [1] 1 2 3 4
Alternatively, apply is designed to run a function over the rows or columns of a data frame (MARGIN = 2 means columns):
apply(df,2,unique)
result:
> apply(df,2,unique)
a b
[1,] "A" "1"
[2,] "B" "2"
[3,] "C" "3"
[4,] "D" "4"
However, if you want a list, lapply is probably the better choice, since it returns one directly; apply here coerces everything to a character matrix.
Your for loop is almost right, just needs one fix to work:
# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
x = unique(df[[i]])
unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
#
# $b
# [1] 1 2 3 4
i is just a character string, the name of a column within df, so unique(i) returns that name rather than the column's unique values.
Anyhow, the most standard way for this task is lapply() as shown by demirev.
Could this be what you're trying to do?
Map(unique,df)
Result:
$a
[1] A B C D
Levels: A B C D
$b
[1] 1 2 3 4

Make a named list from two columns with multiple values per name

I want to create a named list where each name has multiple values. I can only find how to do this if for each name there is one value. My solution that I am using now is
df <- data.frame(col1=c('a','a','b','b'), col2=c(1,2,3,4))
l <- list()
for(letter in unique(df$col1)){
l[[letter]] <- df[df$col1==letter,]$col2
}
> l
$a
[1] 1 2
$b
[1] 3 4
but what is a better way to do this?
We can use split to return a named list of vectors
split(df$col2, df$col1)
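As a quick check that split gives the same result as the loop in the question, the two can be compared directly. A small sketch reusing the question's data; by_loop and by_split are names chosen here for illustration.

```r
df <- data.frame(col1 = c("a", "a", "b", "b"), col2 = c(1, 2, 3, 4))

# the loop from the question
by_loop <- list()
for (letter in unique(df$col1)) {
  by_loop[[letter]] <- df[df$col1 == letter, ]$col2
}

# the one-liner: values first, grouping variable second
by_split <- split(df$col2, df$col1)

isTRUE(all.equal(by_loop, by_split))
```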

Paste column values together in a data frame

I am trying to paste together the row name along with the data in the desired columns. I wrote the following code but could not find a way to do it correctly.
The desired output will be: "a,1,11" "b,2,22" "c,3,33"
x = data.frame(cbind(f1 = c(1,2,3), f2 = c(5,6,7), f3=c(11,22,33)), row.names= c('a','b','c'))
x
# f1 f2 f3
# a 1 5 11
# b 2 6 22
# c 3 7 33
do.call("paste", c(rownames(x), x[c('f1','f3')], sep=","))
# [1] "a,b,c,1,11" "a,b,c,2,22" "a,b,c,3,33"
Two main points:
Use apply instead of do.call(paste, .)
Use cbind instead of c in this case.
If you would rather use c, you would need to coerce the row names to a list or column first, eg: c(list(rownames(x)), x)
Try the following:
apply(cbind(rownames(x), x[c('f1','f3')]), 1, paste, collapse=",")
a b c
"a,1,11" "b,2,22" "c,3,33"
Your do.call instructs R to paste the list c(rownames(x), x[c('f1','f3')]) together. But take a look at your list.
> c(rownames(x), x[c('f1','f3')])
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
$f1
[1] 1 2 3
$f3
[1] 11 22 33
The c command takes the elements of each argument and joins them together. This properly deconstructs x[c('f1','f3')] but also deconstructs rownames(x) in a way you don't want. Obeying the standard recycling rule, paste then takes an item from each list element and patches them together with sep=",".
You could fix this by encapsulating rownames(x) inside a list structure so that your list of arguments comes out properly:
do.call("paste", c(list(rownames(x)), x[c('f1','f3')], sep=","))
No need for do.call or apply:
paste(rownames(x), x[[1]], x[[3]], sep = ",")
[1] "a,1,11" "b,2,22" "c,3,33"
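If the set of columns varies, the do.call form with the row names wrapped in a list can be put into a small helper. This is only a sketch; paste_rows is a hypothetical name, not an existing function.

```r
x <- data.frame(f1 = c(1, 2, 3), f2 = c(5, 6, 7), f3 = c(11, 22, 33),
                row.names = c("a", "b", "c"))

# hypothetical helper: paste row names with the chosen columns, row by row
paste_rows <- function(data, cols, sep = ",") {
  # list(rownames(data)) keeps the row names together as ONE argument to paste
  do.call(paste, c(list(rownames(data)), data[cols], sep = sep))
}

paste_rows(x, c("f1", "f3"))
```

Because the columns are passed to paste as separate vectors, they are not first coerced to a common matrix type, which is what apply would do.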
