Extract each column of a data frame into an object - r

I have a data frame with many columns, named foo, bar, etc.
I would like to extract each column of the data frame to separate objects called foo, bar and so on. Is there an automated way to do this in R?
Working example:
mock <- structure(list(
x = structure(1:3, .Label = c("1", "2", "3"), class = "factor"),
y = structure(1:3, .Label = c("A", "B", "C"), class = "factor"),
z = structure(c(1L, 1L, 2L), .Label = c("0", "1"), class = "factor")),
.Names = c("x", "y", "z"), row.names = c(NA, -3L), class = "data.frame")
Output:
> mock
x y z
1 1 A 0
2 2 B 0
3 3 C 1
How can I write a loop that creates objects x, y and z from the three columns of this data frame?

> for (i in 1:ncol(mock)) {
+ assign(names(mock)[i],mock[,i])
+ }
> x
[1] 1 2 3
Levels: 1 2 3
> y
[1] A B C
Levels: A B C
> z
[1] 0 0 1
Levels: 0 1
You should be careful with the use of assign, though. You can achieve almost the same result using attach(mock), which is reversible (detach()) and won't unintentionally overwrite existing variables (it just masks them).
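As a further option, list2env() can copy the columns into an environment of your choosing, so nothing in the workspace is overwritten. The following is a minimal sketch, assuming a simplified rebuild of mock; the environment name cols is hypothetical and not part of the answer above.

```r
# Simplified rebuild of the question's data frame (all columns are factors)
mock <- data.frame(
  x = factor(1:3),
  y = factor(c("A", "B", "C")),
  z = factor(c(0, 0, 1))
)

# A data frame is a list of columns, so list2env() can copy each column
# into a dedicated environment instead of the global workspace
cols <- new.env()
list2env(mock, envir = cols)

ls(cols)   # names of the objects that were created
cols$y     # the y column as a standalone factor
```

Passing envir = globalenv() instead would reproduce the effect of the assign() loop, including its risk of overwriting existing objects.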

Related

Creating new columns based on data in row separated by specific character in R

I've the following table
Owner  Pet              Housing_Type
A      Cats;Dog;Rabbit  3
B      Dog;Rabbit       2
C      Cats             2
D      Cats;Rabbit      3
E      Cats;Fish        1
The code is as follows:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))
Can anyone advise me how I can create new columns based on the data in Pet column by creating a new column for each animal separated by ; to look like the following table?
Owner  Cats  Dog  Rabbit  Fish  Housing_Type
A      Y     Y    Y       N     3
B      N     Y    Y       N     2
C      Y     N    N       N     2
D      Y     N    Y       N     3
E      Y     N    N       Y     1
Thanks!
One approach is to define a helper function that matches for a specific animal, then bind the columns to the original frame.
Note that some wrangling is done to get rid of whitespace to identify the unique animals to query.
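library(dplyr)
df <- Data_Pets  # the question's data, referred to as df below
# Return "Y"/"N" per row for each animal name passed in match
f <- Vectorize(function(string, match) {
  ifelse(grepl(match, string), "Y", "N")
}, c("match"))
df %>%
  bind_cols(
    f(df$Pets, unique(unlist(strsplit(trimws(as.character(df$Pets)), ";"))))
  )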
Owner Pets House_Type Cats Dog Rabbit Fish
1 A Cats;Dog;Rabbit 3 Y Y Y N
2 B Dog;Rabbit 2 N Y Y N
3 C Cats 2 Y N N N
4 D Cats;Rabbit 3 Y N Y N
5 E Cats;Fish 1 Y N N Y
Or, more generally, if you are not sure that the separator is ; or whether whitespace is present, stringi is useful:
dplyr::bind_cols(
df,
f(df$Pets, unique(unlist(stringi::stri_extract_all_words(df$Pets))))
)
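Because grepl() does pattern matching, a name such as Cat would also match Cats. The following base R sketch tests exact membership instead. It assumes ; is the separator and rebuilds the data with plain character columns; the names pets, animals, flags and result are illustrative and do not come from the answers here.

```r
# Rebuild of the question's data, including the trailing space in "Cats "
Data_Pets <- data.frame(
  Owner = c("A", "B", "C", "D", "E"),
  Pets = c("Cats;Dog;Rabbit", "Dog;Rabbit", "Cats ", "Cats;Rabbit", "Cats;Fish"),
  House_Type = c(3L, 2L, 2L, 3L, 1L)
)

# Split each row on ";" and trim whitespace from every token
pets <- lapply(strsplit(as.character(Data_Pets$Pets), ";", fixed = TRUE), trimws)

# Unique animal names in order of first appearance
animals <- unique(unlist(pets))

# One "Y"/"N" column per animal, using exact membership rather than regex
flags <- sapply(animals, function(a) {
  ifelse(vapply(pets, function(p) a %in% p, logical(1)), "Y", "N")
})

result <- cbind(Data_Pets["Owner"], flags, Data_Pets["House_Type"])
result
```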
You can use separate_rows and pivot_wider from the tidyr package:
library(tidyr)
library(dplyr)
Data_Pets %>%
separate_rows(Pets , sep = ";") %>%
mutate(Pets = trimws(Pets)) %>%
mutate(temp = row_number()) %>%
pivot_wider(names_from = Pets, values_from = temp) %>%
mutate(across(c(Cats:Fish), function(x) if_else(is.na(x), "N", "Y"))) %>%
dplyr::relocate(House_Type, .after = Fish)
which will generate:
# Owner Cats Dog Rabbit Fish House_Type
# <fct> <chr> <chr> <chr> <chr> <int>
# 1 A Y Y Y N 3
# 2 B N Y Y N 2
# 3 C Y N N N 2
# 4 D Y N Y N 3
# 5 E Y N N Y 1
Data:
Data_Pets = structure(list(Owner = structure(1:5, .Label = c("A", "B", "C", "D",
"E"), class = "factor"), Pets = structure(c(2L, 5L, 1L,4L, 3L), .Label = c("Cats ",
"Cats;Dog;Rabbit", "Cats;Fish","Cats;Rabbit", "Dog;Rabbit"), class = "factor"),
House_Type = c(3L,2L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA, -5L))

Distinct in dplyr does not work (sometimes)

I have the following data frame which I have obtained from a count. I have used dput to make the data frame available and then edited the data frame so there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to take distinct on Procedure and only keep the first A.
df %>%
distinct(Procedure, .keep_all=TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...
If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function.
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
One way to fix this is to drop the duplicated factor levels, which the factor function does when it rebuilds the factor. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
mutate(Procedure = factor(Procedure)) %>%
distinct(Procedure, .keep_all=TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
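The second option mentioned above, converting Procedure to character, can be sketched in base R as follows. This assumes a plain data frame rather than a tibble, and the name deduped is illustrative.

```r
# Same data as in the question, built as a plain data frame
df <- structure(list(
  Procedure = structure(c(4L, 1L, 2L, 3L),
                        .Label = c("A", "A", "C", "D", "-1"),
                        class = "factor"),
  n = c(10717L, 4412L, 2058L, 1480L)),
  class = "data.frame", row.names = c(NA, -4L))

# Converting to character discards the malformed level set entirely
df$Procedure <- as.character(df$Procedure)

# Keep the first row for each Procedure value
deduped <- df[!duplicated(df$Procedure), ]
deduped
```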
You have a duplicated value in the .Label argument: .Label = c("A", "A", "C", "D", "-1"). That is the issue. Also, your way of initializing a tibble seems unusual (I do not know exactly what your goal is, but still).
Why not use:
df <- tibble(
Procedure = c("D", "A", "A", "C"),
n = c(10717L, 4412L, 2058L, 1480L)
)

Change the column name of the dataframes in a list [duplicate]

I have the following list with multiple data frames.
> dput(dfs)
structure(list(a = structure(list(x = 1:4, a = c(0.114304427057505,
0.202305722748861, 0.247671527322382, 0.897279736353084)), .Names = c("x",
"a"), row.names = c(NA, -4L), class = "data.frame"), b = structure(list(
x = 1:3, b = c(0.982652948237956, 0.694535500137135, 0.0617770322132856
)), .Names = c("x", "b"), row.names = c(NA, -3L), class = "data.frame"),
c = structure(list(x = 1:2, c = c(0.792271690675989, 0.997932326048613
)), .Names = c("x", "c"), row.names = c(NA, -2L), class = "data.frame")), .Names = c("a",
"b", "c"))
Here I want to change the first column name of each data frame.
> dfs
$a
x a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
x b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
x c
1 1 0.7922717
2 2 0.9979323
I am using the following function
> lapply(dfs,function(x){ names(x)[1] <- 'sec';x})
$a
sec a
1 1 0.1143044
2 2 0.2023057
3 3 0.2476715
4 4 0.8972797
$b
sec b
1 1 0.98265295
2 2 0.69453550
3 3 0.06177703
$c
sec c
1 1 0.7922717
2 2 0.9979323
It works, but when I inspect the original list, the column names have not changed.
How do I assign the result to the original list?
Thank you.
You have to assign the result of lapply back to the variable, like this:
dfs <- lapply(dfs,function(x){
names(x)[1] <- 'sec'
return(x)
})
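If you prefer to modify the list in place rather than reassign it, a for loop over the list names has the same effect. This is a minimal sketch, assuming a smaller stand-in for dfs with made-up values:

```r
# Smaller stand-in for the list of data frames from the question
dfs <- list(
  a = data.frame(x = 1:4, a = c(0.11, 0.20, 0.25, 0.90)),
  b = data.frame(x = 1:3, b = c(0.98, 0.69, 0.06))
)

# Replacement functions can be nested, so this updates dfs itself
for (nm in names(dfs)) {
  names(dfs[[nm]])[1] <- "sec"
}

lapply(dfs, names)
```

The lapply version is usually preferred because it has no side effects until you assign the result.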

Matching two lists of unequal length

I am trying to match the values in two lists only where the variable names are the same between the lists. I would like the result to be a list the length of the longer list, filled with the count of total matches.
jac <- structure(list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5),
.Names = c("s1", "s2", "s3"))
larger <- structure(list(s1 = structure(c(1L, 1L, 1L), .Label = "a", class = "factor"),
s2 = structure(c(2L, 1L, 3L), .Label = c("b", "c", "d"), class = "factor"),
s3 = c(1, 2, 7)), .Names = c("s1", "s2", "s3"), row.names = c(NA, -3L), class = "data.frame")
I am using mapply(FUN = pmatch, jac, larger) which gives me a correct total but not in the format that I would like below:
s1 s2 s3 s1result s2result s3result
a  c  1  1        2        NA
a  b  2  1        1        NA
a  d  7  1        3        NA
However, I don't think pmatch will ensure the name matching in every situation so I wrote a function that I am still having issues with:
prodMatch <- function(jac,larger){
for(i in 1:nrow(larger)){
if(names(jac)[i] %in% names(larger[i])){
r[i] <- jac %in% larger[i]
r
}
}
}
Can anyone help out?
Another dataset, in which one is not a multiple of the other:
larger2 <-
structure(list(s1 = structure(c(1L, 1L, 1L), class = "factor", .Label = "a"),
s2 = structure(c(1L, 1L, 1L), class = "factor", .Label = "c"),
s3 = c(1, 2, 7), s4 = c(8, 9, 10)), .Names = c("s1", "s2",
"s3", "s4"), row.names = c(NA, -3L), class = "data.frame")
mapply returns a list of matching indices, which you can convert to a data frame using as.data.frame:
as.data.frame(mapply(match, jac, larger))
# s1 s2 s3
# 1 1 2 NA
# 2 1 1 NA
# 3 1 3 NA
Using cbind to combine the result with larger gives what you expected:
cbind(larger,
setNames(as.data.frame(mapply(match, jac, larger)),
paste(names(jac), "result", sep = "")))
# s1 s2 s3 s1result s2result s3result
#1 a c 1 1 2 NA
#2 a b 2 1 1 NA
#3 a d 7 1 3 NA
Update: To handle cases where the names of the two lists don't match, we can loop through larger and its names simultaneously and extract the corresponding elements from jac as follows:
as.data.frame(
mapply(function(col, name) {
m <- match(jac[[name]], col)
if(length(m) == 0) NA else m # if the name doesn't exist in jac return NA as well
}, larger, names(larger)))
# s1 s2 s3
#1 1 2 NA
#2 1 1 NA
#3 1 3 NA
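To check the name-based version against the second dataset from the question, where larger2 has a column s4 that jac lacks, the same call can be run as below. This is a sketch; larger2 is rebuilt with data.frame() for brevity and res is an illustrative name.

```r
jac <- list(s1 = "a", s2 = c("b", "c", "d"), s3 = 5)

# Rebuild of larger2 from the question: s4 has no counterpart in jac
larger2 <- data.frame(
  s1 = factor(rep("a", 3)),
  s2 = factor(rep("c", 3)),
  s3 = c(1, 2, 7),
  s4 = c(8, 9, 10)
)

res <- as.data.frame(
  mapply(function(col, name) {
    # jac[[name]] is NULL for names missing from jac, so match() returns
    # a zero-length vector, which is replaced by NA
    m <- match(jac[[name]], col)
    if (length(m) == 0) NA else m
  }, larger2, names(larger2)))

res
```

Here only "c" from jac$s2 is found in larger2$s2, and the s4 column is NA throughout because jac has no element with that name.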

List of data frames with names instead of numbers?

I am not sure if this question is too basic but as I haven't found an answer despite searching google for quite some time I have to ask here..
Suppose I want to create a list out of data frames (df1 and df2), how can I use the name of the data frame as the list "index"(?) instead of numbers? I.e., how do I get [[df1]] instead of [[1]] and [[df2]] instead of [[2]]?
list(structure(list(a = 1:10, b = 1:10), .Names = c("a", "b"), row.names = c(NA,
-10L), class = "data.frame"), structure(list(b = 1:10, a = 1:10), .Names = c("b",
"a"), row.names = c(NA, -10L), class = "data.frame"))
OK, entirely different way to ask this question to hopefully make things clearer ;)
I have three data frames
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
I want to create a list out of them..
li <- list(weguihl, raeg, awezilf)
But now I have the problem that - without remembering the order of the data frames - I do not know which data frame is which in the list..
> li
[[1]]
a b
1 1 1
2 2 2
3 3 3
[[2]]
b a
1 1 1
2 2 2
3 3 3
[[3]]
a b
1 1 1
2 2 2
3 3 3
Thus I'd prefer this output
> li
[[weguihl]]
a b
1 1 1
2 2 2
3 3 3
[[raeg]]
b a
1 1 1
2 2 2
3 3 3
[[awezilf]]
a b
1 1 1
2 2 2
3 3 3
How do I get there?
You could achieve this with mget on a clean global environment.
Something like the following. First, clean the global environment (note that this removes every object in the workspace):
rm(list = ls())
Your data frames:
weguihl <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
raeg <- structure(list(b = 1:3, a = 1:3), .Names = c("b", "a"), row.names = c(NA, -3L), class = "data.frame")
awezilf <- structure(list(a = 1:3, b = 1:3), .Names = c("a", "b"), row.names = c(NA, -3L), class = "data.frame")
Then run mget, which returns a named list of the data frames:
li <- mget(ls(), .GlobalEnv)
li
# $awezilf
# a b
# 1 1 1
# 2 2 2
# 3 3 3
#
# $raeg
# b a
# 1 1 1
# 2 2 2
# 3 3 3
#
# $weguihl
# a b
# 1 1 1
# 2 2 2
# 3 3 3
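If clearing the workspace is not an option, the list can also be named directly. The following is a minimal sketch of two ways to do that, assuming the three data frames from the question; li2 is an illustrative name.

```r
weguihl <- data.frame(a = 1:3, b = 1:3)
raeg    <- data.frame(b = 1:3, a = 1:3)
awezilf <- data.frame(a = 1:3, b = 1:3)

# Option 1: name the elements explicitly when building the list
li <- list(weguihl = weguihl, raeg = raeg, awezilf = awezilf)

# Option 2: look up only the selected objects by name
li2 <- mget(c("weguihl", "raeg", "awezilf"), envir = environment())

names(li)
li[["raeg"]]
```

Both variants keep the order you specify, whereas mget(ls()) returns the objects in alphabetical order.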

Resources