Modify a data frame converting colnames into factor - r

I'm analyzing some data structured like "df" in the example, and I need to convert it into something like the "example" object below:
a<- c(1:3)
b<- c(1:3)
c<- c(1:3)
df<- data.frame(a, b, c)
col1<- c("a","a","a", "b", "b", "b", "c", "c", "c")
col2<- rep(1:3,3)
example<- data.frame(col1, col2)

We can use pivot_longer from tidyr (by default the new columns are named name and value):
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = everything())
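As a minimal sketch (assuming tidyr 1.0 or later, and that the target column names col1 and col2 from the question are wanted), names_to and values_to set the new column names, and arrange groups the rows by the original column, as in the "example" object:

```r
library(dplyr)
library(tidyr)

df <- data.frame(a = 1:3, b = 1:3, c = 1:3)

long <- df %>%
  pivot_longer(cols = everything(),
               names_to = "col1",        # former column names
               values_to = "col2") %>%   # former cell values
  arrange(col1)                          # pivot_longer emits rows row by row (a, b, c, a, ...)
```

Here col1 comes back as a character column; if a factor is needed, it can be wrapped afterwards with factor(long$col1).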

A quick base R solution is stack:
stack(df)
values ind
1 1 a
2 2 a
3 3 a
4 1 b
5 2 b
6 3 b
7 1 c
8 2 c
9 3 c
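To get the column order and names of the "example" object, the stack result can be reordered and renamed. This is only a sketch; the names col1 and col2 are taken from the question, and stack already returns the former column names as a factor:

```r
df <- data.frame(a = 1:3, b = 1:3, c = 1:3)

long <- stack(df)                     # columns: values, ind (ind is a factor)
long <- long[, c("ind", "values")]    # put the former column names first
names(long) <- c("col1", "col2")      # rename to match the target object
```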

You can also use gather() from the tidyr package (now superseded by pivot_longer()):
gather(df, colnames(df), key = "col1", value = "col2")
key and value serve as the new column names in the resulting data frame. In tidyverse (pipe) syntax:
df %>%
gather(colnames(df), key = "col1", value = "col2")

Related

Produce all combinations of one element with all other elements within a single column in R

Suppose I have a data frame with a single column that contains letters a, b, c, d, e.
a
b
c
d
e
In R, is it possible to extract a single letter, such as 'a', and produce all possible paired combinations between 'a' and the other letters (with no duplications)? Could the combn command be used in this case?
a b
a c
a d
a e
We can use data.frame
data.frame(col1 = 'a', col2 = setdiff(df1$V1, "a"))
-output
col1 col2
1 a b
2 a c
3 a d
4 a e
data
df1 <- structure(list(V1 = c("a", "b", "c", "d", "e")),
class = "data.frame", row.names = c(NA,
-5L))
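Since the question asks about combn: it can be used, but it generates every pair first, so the pairs containing 'a' have to be filtered out afterwards. A rough sketch in base R, where the object names are illustrative and not from the original answers:

```r
df1 <- data.frame(V1 = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)

# combn returns a 2 x 10 matrix; transpose so that each row is one pair
all_pairs <- t(combn(df1$V1, 2))

# keep only the pairs that contain "a"
a_pairs <- all_pairs[all_pairs[, 1] == "a" | all_pairs[, 2] == "a", , drop = FALSE]

result <- data.frame(col1 = a_pairs[, 1], col2 = a_pairs[, 2],
                     stringsAsFactors = FALSE)
```

For a single fixed letter, the setdiff approach above avoids building the pairs that are thrown away.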
Update: with mutate's .before = 1 argument the code is shorter :-)
df %>%
mutate(col_a = first(col1), .before=1) %>%
slice(-1)
With dplyr you can:
library(dplyr)
df %>%
mutate(col2 = first(col1)) %>%
slice(-1) %>%
select(col2, col1)
Output:
col2 col1
<chr> <chr>
1 a b
2 a c
3 a d
4 a e
You could use
expand.grid(x=df[1,], y=df[2:5,])
which returns
x y
1 a b
2 a c
3 a d
4 a e

tidyverse alternative to left_join & rows_update when two data frames differ in columns and rows

There might be a *_join version for this that I'm missing here, but I have two data frames, where:
1. The merging should happen in the first data frame, hence left_join.
2. I not only want to add columns, but also update existing columns in the first data frame; more specifically, replace NAs in the first data frame with values from the second data frame.
3. The second data frame contains more rows than the first one.
Conditions #1 and #2 make left_join fail. Condition #3 makes rows_update fail. So I need to do some steps in between and am wondering if there's an easier solution to get the desired output.
x <- data.frame(id = c(1, 2, 3),
a = c("A", "B", NA))
id a
1 1 A
2 2 B
3 3 <NA>
y <- data.frame(id = c(1, 2, 3, 4),
a = c("A", "B", "C", "D"),
q = c("u", "v", "w", "x"))
id a q
1 1 A u
2 2 B v
3 3 C w
4 4 D x
and the desired output would be:
id a q
1 1 A u
2 2 B v
3 3 C w
I know I can achieve this with the following code, but it looks unnecessarily complicated to me. So is there maybe a more direct approach without having to do the intermediate pipes in the two commands below?
library(tidyverse)
x %>%
left_join(., y %>% select(id, q), by = c("id")) %>%
rows_update(., y %>% filter(id %in% x$id), by = "id")
You can left_join and use coalesce to replace missing values.
library(dplyr)
x %>%
left_join(y, by = 'id') %>%
transmute(id, a = coalesce(a.x, a.y), q)
# id a q
#1 1 A u
#2 2 B v
#3 3 C w
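The same idea can be sketched in base R, for readers who want to avoid the dplyr dependency. This assumes the x and y from the question (with stringsAsFactors = FALSE so that ifelse works on plain strings); the .x/.y suffixes are merge's defaults, spelled out here for clarity:

```r
x <- data.frame(id = c(1, 2, 3),
                a = c("A", "B", NA),
                stringsAsFactors = FALSE)
y <- data.frame(id = c(1, 2, 3, 4),
                a = c("A", "B", "C", "D"),
                q = c("u", "v", "w", "x"),
                stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of x and drops the extra row of y (a left join)
m <- merge(x, y, by = "id", all.x = TRUE, suffixes = c(".x", ".y"))

# take a from x where present, otherwise fall back to the value from y
m$a <- ifelse(is.na(m$a.x), m$a.y, m$a.x)

result <- m[, c("id", "a", "q")]
```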

Organize subgroup strings (text)

I am trying to convert something like this df format:
df <- data.frame(first = c("a", "a", "b", "b", "b", "c"),
words =c("about", "among", "blue", "but", "both", "cat"))
df
first words
1 a about
2 a among
3 b blue
4 b but
5 b both
6 c cat
into the following format:
df1
first words
1 a about, among
2 b blue, but, both
3 c cat
I have tried
aggregate(words ~ first, data = df, FUN = list)
first words
1 a 1, 2
2 b 3, 5, 4
3 c 6
and tidyverse:
df %>%
group_by(first) %>%
group_rows()
Any suggestions would be appreciated!
A data.table solution:
library(data.table)
df <- data.frame(first = c("a", "a", "b", "b", "b", "c"),
words =c("about", "among", "blue", "but", "both", "cat"))
df <- setDT(df)[, lapply(.SD, toString), by = first]
df
# first words
# 1: a about, among
# 2: b blue, but, both
# 3: c cat
# convert back to a data.frame if you want
setDF(df)
Using the tidyverse, after the group_by use summarise to either paste the words into a single string
library(dplyr)
df %>%
group_by(first) %>%
summarise(words = toString(words))
# A tibble: 3 x 2
# first words
# <fct> <chr>
#1 a about, among
#2 b blue, but, both
#3 c cat
or keep it as a list column
df %>%
group_by(first) %>%
summarise(words = list(words))
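The aggregate attempt from the question can also be made to work. A sketch, assuming the columns are character vectors rather than factors (hence stringsAsFactors = FALSE), with toString swapped in for list:

```r
df <- data.frame(first = c("a", "a", "b", "b", "b", "c"),
                 words = c("about", "among", "blue", "but", "both", "cat"),
                 stringsAsFactors = FALSE)

# toString collapses each group's words into one comma-separated string
df1 <- aggregate(words ~ first, data = df, FUN = toString)
```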

How do you clear column elements from an R data frame based on another column's elements in the same data frame?

I have the following data frame
>data.frame
col1 col2
A
x B
C
D
y E
I need a new data frame that looks like:
>new.data.frame
col1 col2
A
x
C
D
y
I just need a method that reads col1 and, if there are ANY characters in col1, clears the corresponding row value of col2. I was thinking about using an if statement and data.table for this, but I am unsure how to delete col2's values based on ANY characters being present in col1.
Something like this works:
# Create data frame
dat <- data.frame(col1=c(NA,"x", NA, NA, "y"), col2=c("A", "B", "C", "D", "E"))
# Create new data frame
dat_new <- dat
dat_new$col2[!is.na(dat_new$col1)] <- NA
# Check that it worked
dat
dat_new
This depends on what you mean by 'remove'. Here I'm assuming a blank string "". However, the same principle applies to NAs.
## create data frame
df <- data.frame(col1 = c("", "x", "","", "y"),
col2 = LETTERS[1:5],
stringsAsFactors = FALSE)
df
# col1 col2
# 1 A
# 2 x B
# 3 C
# 4 D
# 5 y E
## subset by non-blank values in col1, and blank out the values in col2
df[df$col1 != "",]$col2 <- ""
## or df$col2[df$col1 != ""] <- ""
df
# col1 col2
# 1 A
# 2 x
# 3 C
# 4 D
# 5 y
And as you mentioned data.table, the code for this would be
library(data.table)
setDT(df)
## filter by non-blank entries in col1, and update col2 by-reference (:=)
df[col1 != "", col2 := ""]
df
Using dplyr
library(dplyr)
df %>%
mutate(col2 = replace(col2, col1!="", ""))
# col1 col2
#1 A
#2 x
#3 C
#4 D
#5 y
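If it is unclear whether the empty cells in col1 are NA or blank strings, both cases can be handled at once. A minimal base R sketch on made-up data that mixes the two; the helper name has_value is illustrative:

```r
dat <- data.frame(col1 = c(NA, "x", "", NA, "y"),
                  col2 = LETTERS[1:5],
                  stringsAsFactors = FALSE)

# TRUE only where col1 is neither NA nor an empty string
has_value <- !is.na(dat$col1) & nzchar(dat$col1)

# clear col2 wherever col1 holds any characters
dat$col2[has_value] <- ""
```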

rbind tbl and df gives errors with filter

I am using dplyr and loving it, but found a strange behavior. I am cleaning some data from different sources and putting them together in a data frame. Part of it required more cleaning, done with dplyr and resulted in a tbl object. The other part was simpler, and I had a data.frame object. I rbind them together, and when I was doing analysis, trying to use dplyr filter function, it wouldn't work properly. Example:
df1 <- data.frame(
group = factor(rep(c("C", "G"), 5)),
value = 1:10)
df1 <- df1 %>% group_by(group) #df1 is now tbl
df2 <- data.frame(
group = factor(rep("G", 10)),
value = 11:20)
df3 <- rbind(df1, df2) #df2 is data.frame
df3 %>% filter(group == "C") #returns filtered rows in df1 and all rows of df2
Source: local data frame [15 x 2]
Groups: group
group value
1 C 1
2 C 3
3 C 5
4 C 7
5 C 9
6 G 11
7 G 12
8 G 13
9 G 14
10 G 15
11 G 16
12 G 17
13 G 18
14 G 19
15 G 20
If I do df3[df3$group == "C", ], it works properly. Bug?
This happens because group_by changes the structure of df1: it becomes a grouped data frame, and operations on it are performed group-wise. When you do the rbind
df3 <- rbind(df1, df2)
R tries to create df3 with the same structure as the first argument, i.e. df1. But since df1 and df2 are different types of data frame, the filter is applied group-wise only to the part that came from df1, which produces the erratic output.
If you check
df3<-rbind(df2,df1)
df3 is a normal data frame without groups, and filter gives the correct output.
You should delete the line 'df1 <- df1 %>% group_by(group) #df1 is now tbl'.
If you want to change a data.frame to a tbl_df, you ought to use
df1<-tbl_df(df1) # tbl_df() is deprecated since dplyr 1.0.0; as_tibble(df1) is the current equivalent
df1 <- data.frame(
group = factor(rep(c("C", "G"), 5)),
value = 1:10)
# df1 <- df1 %>% group_by(group) #df1 is now tbl
# df1<-tbl_df(df1)
df2 <- data.frame(
group = factor(rep("G", 10)),
value = 11:20)
df3 <- rbind(df1, df2) #df2 is data.frame
df3 %>% filter(group == "C") # now returns only the rows where group is "C"
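If the grouping on df1 is still needed earlier in the pipeline, another option is to drop it just before combining. A sketch assuming a reasonably recent dplyr, with ungroup and bind_rows used in place of rbind; the behaviour shown in the question may depend on the dplyr version:

```r
library(dplyr)

df1 <- data.frame(
  group = factor(rep(c("C", "G"), 5)),
  value = 1:10) %>%
  group_by(group)                      # grouped, as in the question

df2 <- data.frame(
  group = factor(rep("G", 10)),
  value = 11:20)

# remove the grouping before combining, so the result carries no group metadata
df3 <- bind_rows(ungroup(df1), df2)

res <- df3 %>% filter(group == "C")
```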
