How to automatically transform columns into objects in R?

I need each column of my dataset to become an object named after that column and containing the column's values.
I know how to do this manually (see the "Process Question" heading below), but I need to automate and generalize the process with as few lines of code as possible.
Example data
library(tibble)
df <- tibble(a = 1, b = 2, c = 3, d = 4)
Input
> df
# A tibble: 1 x 4
      a     b     c     d
  <dbl> <dbl> <dbl> <dbl>
1     1     2     3     4
Process Question
How can I automate this part?
a <- df$a
b <- df$b
c <- df$c
d <- df$d
Output
> a;b;c;d
[1] 1
[1] 2
[1] 3
[1] 4

A tibble/data.frame is a list, so we can use list2env (or attach(), which puts the columns on the search path instead of creating objects in the global environment):
list2env(df, .GlobalEnv)
Checking:
> a
[1] 1
> b
[1] 2
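A minimal sketch of the same idea, using a base data.frame so no packages are needed (a tibble behaves the same way, since it is also a named list). It writes into a fresh environment instead of .GlobalEnv to keep the demo contained; the environment names e and e2 are illustrative, and the assign() loop is shown only for comparison, not as part of the answer above.

```r
# Same columns as the question, as a plain data.frame
df <- data.frame(a = 1, b = 2, c = 3, d = 4)

# list2env() creates one object per list element, i.e. one per column;
# a new environment is used here instead of .GlobalEnv for the demo
e <- list2env(df, envir = new.env())
print(sort(ls(e)))            # [1] "a" "b" "c" "d"

# Equivalent explicit loop with assign(), for comparison
e2 <- new.env()
for (nm in names(df)) assign(nm, df[[nm]], envir = e2)
print(get("c", envir = e2))   # [1] 3
```

Passing .GlobalEnv as the environment, as in the answer, puts a, b, c and d directly in the workspace.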

Related

R unpack a list of vectors in a dataframe

To start: I've seen this post and no, tidyr's unnest doesn't work here. I am doing an lapply where the applied function returns a list with named entries (see the example func at the bottom for clarity):
ls <- lapply(x, func)
Now if I look at ls, it is a list of lists, and in the RStudio data viewer it appears as having Name, Type, and Value columns.
Now, if I use
df <- bind_rows(ls)
I get exactly what I want, except that I then need to bind the data frame containing x to df. This is the problem: for each x, func returns a variable number of rows, so I need an equivalent of bind_rows that runs after ls has already been attached to my data frame.
Here is an example:
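library(dplyr)  # for bind_rows()
func <- function(x){
  res <- list()
  res$name <- 1:x
  res$val <- 1:x
  return(res)
}
# use `=` (not `<-`) inside data.frame() so the columns are named nums and letters
df <- data.frame(nums = 1:3, letters = c("A", "B", "C"), stringsAsFactors = FALSE)
ls <- lapply(df$nums, func)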
bind_rows(ls) gives:
   name   val
  <int> <int>
1     1     1
2     1     1
3     2     2
4     1     1
5     2     2
6     3     3
and the desired output is:
   name   val  nums letters
  <int> <int> <dbl> <chr>
1     1     1     1 A
2     1     1     2 B
3     2     2     2 B
4     1     1     3 C
5     2     2     3 C
6     3     3     3 C
Note that func here creates n rows for x = n, which is not the case for my actual function: func(n) can produce any positive number of rows.
Maybe you're looking for something more "canned", but you could write a function that would produce the desired output like this:
out <- function(data, varname){
  # apply func to each value and tag every result with the row it came from
  l <- lapply(data[[varname]], func)
  l <- lapply(seq_along(l), function(x) do.call(data.frame, c(l[[x]], zz_obs = x)))
  l <- do.call(rbind, l)
  data$zz_obs <- seq_len(nrow(data))
  if (!all(data$zz_obs %in% l$zz_obs)) warning("Not all rows of data in output\n")
  # join the original columns back on the row id, then drop the helper column
  data <- dplyr::full_join(l, data, by = "zz_obs")
  data[, names(data) != "zz_obs"]
}
out(df, "nums")
# name val nums letters
# 1 1 1 1 A
# 2 1 1 2 B
# 3 2 2 2 B
# 4 1 1 3 C
# 5 2 2 3 C
# 6 3 3 3 C
You can try mapply, which is similar to lapply but iterates over several vectors or lists in parallel:
library(dplyr)
func <- function(x, y){
res <- list()
res$name <- 1:x
res$val <- 1:x
res$let <- rep(y, x)
return(res)
}
df <- data.frame(nums = 1:3, letters = c("A", "B", "C"), stringsAsFactors = FALSE)
ls <- mapply(
  func,
  x = df$nums,
  y = df$letters,
  SIMPLIFY = FALSE
)
bind_rows(ls)
# A tibble: 6 x 3
# name val let
# <int> <int> <chr>
# 1 1 1 A
# 2 1 1 B
# 3 2 2 B
# 4 1 1 C
# 5 2 2 C
# 6 3 3 C
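A base R variant of the same idea, sketched here rather than taken from the answer: instead of changing func, wrap it so each result becomes a data frame that also carries the current row's nums and letters. It assumes func returns equal-length vectors, as in the example; Map() is simply mapply() with SIMPLIFY = FALSE, and the names pieces and res are illustrative.

```r
func <- function(x) {
  list(name = 1:x, val = 1:x)
}
df <- data.frame(nums = 1:3, letters = c("A", "B", "C"), stringsAsFactors = FALSE)

# Map() iterates over nums and letters in parallel; data.frame() expands the
# list returned by func() into columns and recycles the length-1 n and l
pieces <- Map(function(n, l) {
  data.frame(func(n), nums = n, letters = l, stringsAsFactors = FALSE)
}, df$nums, df$letters)

# Stack the pieces; do.call(rbind, ...) plays the role of bind_rows() here
res <- do.call(rbind, pieces)
print(res)
```

Because every piece already contains the original columns, no separate join step is needed afterwards.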
In the interim, the function I will be using to do this is:
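merge_and_flatten <- function(x, y){
  # for each row i of x, add x's columns to y[[i]], repeated to match its length
  for (i in seq_len(nrow(x))){
    y[[i]][names(x)] <- lapply(x[i, , drop = FALSE], rep, times = length(y[[i]][[1]]))
  }
  return(bind_rows(y))
}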
This is the cleanest solution I could come up with. Here, x serves as my df, and y serves as ls. It works by reducing the problem to bind_rows: it simply adds the columns of x to each element of ls, repeated to the right length. I would still welcome a cleaner solution, but this works for anyone who needs it.
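The same reduction can be sketched without modifying ls: count how many rows each element of ls produces, then repeat the matching row of df that many times and bind the columns side by side. This is a base R sketch assuming, as above, that each element of ls is a list of equal-length vectors; ls is called ls_out here only to avoid masking base::ls(), and parts, n_rows and res are illustrative names.

```r
func <- function(x) list(name = 1:x, val = 1:x)
df <- data.frame(nums = 1:3, letters = c("A", "B", "C"), stringsAsFactors = FALSE)
ls_out <- lapply(df$nums, func)

# Turn each result into a data frame and record how many rows it has
parts <- lapply(ls_out, as.data.frame)
n_rows <- vapply(parts, nrow, integer(1))

# Repeat row i of df n_rows[i] times, then put the two halves side by side
res <- cbind(do.call(rbind, parts),
             df[rep(seq_len(nrow(df)), n_rows), , drop = FALSE])
rownames(res) <- NULL
print(res)
```

Repeating row indices with rep() keeps every column of df, whatever its type, without looping over the rows.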

Counting character values in R

I have a data frame that looks like this:
data <- data.frame(Var_1 = c("A","B","C","A","B"))
Var_1
A
B
C
A
B
I need to add a count of each value, like this:
Var_1 Count
A 2
B 2
C 1
A 2
B 2
# sample data
df <- data.frame(Var_1 = c("A","B","C","A","B"))
# make a frequency table to determine the "count"
countsDF <- table(df$Var_1)
# use names to match each Var_1 value to its entry in countsDF, then assign
# the corresponding count (as.vector() drops the "table" class)
df$count <- as.vector(countsDF[match(df$Var_1, names(countsDF))])
I believe you can use table() and store the result as a data frame; from this data frame you can access the frequency of each value.
> data <- data.frame(Var_1 = c("A","B","C","A","B"))
> df <- as.data.frame(table(data))
> df$data
[1] A B C
Levels: A B C
> df$Freq
[1] 2 2 1
> df
data Freq
1 A 2
2 B 2
3 C 1
PS: I am not sure whether you mean to repeat the values (levels) of your data as shown in your question, but unless your case specifically requires it (which is not mentioned), I would report each repeated class or level only once.
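If the repeated layout from the question is what you want, a compact base R alternative (not from either answer above) is ave(), which applies a function within each group and returns a result aligned with the original rows.

```r
data <- data.frame(Var_1 = c("A", "B", "C", "A", "B"), stringsAsFactors = FALSE)

# length() of the group each row belongs to is the count of that value
data$Count <- ave(seq_along(data$Var_1), data$Var_1, FUN = length)
print(data$Count)   # [1] 2 2 1 2 2
```

seq_along() is used as the input only because ave() needs a vector to group; its values are discarded by length().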

Find the index of the row in data frame that contain one element in a string vector

Suppose I have a data.frame like this:
df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14])
df
col1 col2 col3
1 a 1 j
2 b 2 k
3 c 3 l
4 d 4 m
5 a 5 n
I want to get the indices of the rows that contain any of the elements of c("a", "k", "n"); in this example, the result should be 1, 2, 5.
If you have a large data frame and want to check all columns, try this:
x <- c("a", "k", "n")
Reduce(union, lapply(x, function(a) which(rowSums(df == a) > 0)))
# [1] 1 5 2
You can, of course, sort the result afterwards.
s <- c('a','k','n');
which(df$col1%in%s|df$col3%in%s);
## [1] 1 2 5
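Here's another solution. This one works on the entire data.frame, and happens to capture the search strings as element names (you can get rid of those via unname()). Note that the `[1]` keeps only the first matching row for each search string, so it can miss rows; row 5 appears here only because it contains "n":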
sapply(s,function(s) which(apply(df==s,1,any))[1]);
## a k n
## 1 2 5
Original second solution:
sort(unique(rep(1:nrow(df),ncol(df))[as.matrix(df)%in%s]));
## [1] 1 2 5
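Another way to check every column without apply() or a loop over the search strings (a sketch, not from the answers above): test each column with %in%, then combine the per-column results with a row-wise OR. Non-character columns such as col2 simply produce no matches; hits is an illustrative name.

```r
df <- data.frame(col1 = c(letters[1:4], "a"), col2 = 1:5, col3 = letters[10:14])
s <- c("a", "k", "n")

# One logical vector per column: does each cell appear in s?
hits <- lapply(df, `%in%`, s)
# A row matches if any of its columns matched
print(which(Reduce(`|`, hits)))   # [1] 1 2 5
```

Unlike the `[1]` version, this returns every matching row, already in ascending order.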

R: change data frame structure using values from one variable as new variable

df1 <- data.frame(
name = c("a", "b", "b", "c"),
score = c(1, 1, 2, 1)
)
How can I get a new data frame whose columns come from df1$name, each holding the corresponding df1$score values? I figure that it's actually a two-step problem:
First, I would need to make a list of (in this example) unequal-length vectors like this:
$a
[1] 1
$b
[1] 1 2
$c
[1] 1
Second, the vectors need to be padded with NAs to equal length before making the desired data frame, which would look like:
a b c
1 1 1 1
2 NA 2 NA
I cannot find any simple way to do this, but I'm sure there must be one!
A dplyr solution would be fantastic! Thanks!
To split the data:
(s <- split(df1$score, df1$name))
# $a
# [1] 1
#
# $b
# [1] 1 2
#
# $c
# [1] 1
To create the new data frame:
as.data.frame(sapply(s, `length<-`, max(vapply(s, length, 1L))))
# a b c
# 1 1 1 1
# 2 NA 2 NA
Using vapply in place of sapply is slightly more efficient:
len <- max(vapply(s, length, 1L))
as.data.frame(vapply(s, `length<-`, double(len), len))
# a b c
# 1 1 1 1
# 2 NA 2 NA
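For comparison, the reshape can also be sketched as a single assignment into a pre-allocated NA matrix: number each score within its name group, then use that number as the row and the name as the column. This is base R, not dplyr; idx, cols, m and res are illustrative names. The same within-group index is what you would add before a tidyr::pivot_wider() call, though that route is not shown here.

```r
df1 <- data.frame(name = c("a", "b", "b", "c"), score = c(1, 1, 2, 1),
                  stringsAsFactors = FALSE)

# Position of each score within its name group: 1, 1, 2, 1
idx <- ave(seq_along(df1$name), df1$name, FUN = seq_along)
cols <- sort(unique(df1$name))

# Start from an all-NA matrix, then fill cell (idx, column of name) with the score
m <- matrix(NA_real_, nrow = max(idx), ncol = length(cols),
            dimnames = list(NULL, cols))
m[cbind(idx, match(df1$name, cols))] <- df1$score
res <- as.data.frame(m)
print(res)   # a = 1, NA; b = 1, 2; c = 1, NA
```

The NA padding comes for free here, because unfilled cells keep their initial NA.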

Difference between `names(df[1]) <- ` and `names(df)[1] <- `

Consider the following:
df <- data.frame(a = 1, b = 2, c = 3)
names(df[1]) <- "d" ## First method
## a b c
##1 1 2 3
names(df)[1] <- "d" ## Second method
## d b c
##1 1 2 3
Neither method returned an error, but the first didn't change the column name, while the second did.
I thought it had something to do with the fact that I'm operating only on a subset of df, but then why does the following, for example, work fine?
df[1] <- 2
## a b c
##1 2 2 3
What I think is happening is that replacement into a data frame ignores the attributes of the data frame the values are drawn from. I am not 100% sure of this, but the following experiments appear to back it up:
df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7
df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12
df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent
Which produces:
# a b
# 1 10 5
# 2 11 6
# 3 12 7
Notice how the values of df changed, but not the names. Basically, the replacement operator `[<-` replaces only the values, which is why the name was not updated. I believe this explains all the issues.
In the scenario:
names(df[2]) <- "x"
You can think of the assignment as follows (this is a simplification, see end of post for more detail):
tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7
names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7
df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7
The last step is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.
But in the scenario:
names(df)[2] <- "x"
you can think of the assignment as (again, a simplification):
tmp <- names(df)
# [1] "a" "b"
tmp[2] <- "x"
# [1] "a" "x"
names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7
Notice that here we assign directly to the names, instead of assigning to df with `[<-`, which ignores attributes.
df[2] <- 2
works because we are assigning directly to the values, not the attributes, so there are no problems here.
EDIT: based on some comments from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):
Version 1: `names(df[2]) <- "x"` translates to:
df <- `[<-`(
df, 2,
value=`names<-`( # `names<-` here returns a re-named one column data frame
`[`(df, 2),
value="x"
) )
Version 2: `names(df)[2] <- "x"` translates to:
df <- `names<-`(
df,
`[<-`(
names(df), 2, "x"
) )
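Both translations can be run directly to check the claim. A small sketch using the question's one-row data frame; calling the replacement functions by name returns a modified copy, so the results are stored in new variables (df_v1 and df_v2 are illustrative names).

```r
df <- data.frame(a = 1, b = 2, c = 3)

# Version 1: names(df[1]) <- "d"
# The renamed one-column copy is passed to `[<-`, which keeps df's own names
df_v1 <- `[<-`(df, 1, value = `names<-`(df[1], value = "d"))
print(names(df_v1))  # [1] "a" "b" "c"

# Version 2: names(df)[1] <- "d"
# The names vector itself is modified, then reattached with `names<-`
df_v2 <- `names<-`(df, value = `[<-`(names(df), 1, value = "d"))
print(names(df_v2))  # [1] "d" "b" "c"
```

Only the second form ever hands a modified names vector to `names<-`, which is why only it renames the column.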
Also, it turns out this is "documented" in the R Inferno, Section 8.2.34 (thanks @Frank):
right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2
