How to transform table values to column headings - r

I'm trying to manipulate my data so that the values in each column become the column headings, and the values in each of these new columns are the names of the columns they came from.
My table can be produced with the following code:
library(tidyverse)
df <- tibble("#" = c(1,2,3),
"1" = c("a","a","b"),
"2" = c("b","b","c"),
"3" = c("c","c","a"))
My current table looks like this:
# 1 2 3
1 a b c
2 a b c
3 b c a
And I want it to look like this:
# a b c
1 1 2 3
2 1 2 3
3 3 1 2

You could use order applied to each row of the tibble:
## new column names
names(df) <- c("#", sort(unlist(df[1, -1], use.names = FALSE)))
## apply order to each row of df
df[, -1] <- t(apply(df[, -1], 1, order))
df
#> # A tibble: 3 x 4
#> `#` a b c
#> <dbl> <int> <int> <int>
#> 1 1 1 2 3
#> 2 2 1 2 3
#> 3 3 3 1 2
Disclaimer: this assumes that each row contains a permutation of all available values. If this is not the case (e.g. the letter a appears twice in a single row), there is no unambiguous way to assign a single value to each column.
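The permutation assumption can be checked before applying order. The following is only a sketch in base R: it rebuilds the question's data as a plain data frame, and the column names id, X1, X2, X3 and the helper variables vals, expected and is_perm are chosen here for illustration.

```r
# Same values as in the question, as a plain data frame
# (column names id, X1, X2, X3 are illustrative)
df <- data.frame(id = c(1, 2, 3),
                 X1 = c("a", "a", "b"),
                 X2 = c("b", "b", "c"),
                 X3 = c("c", "c", "a"))

vals <- as.matrix(df[, -1])
expected <- sort(unique(as.vector(vals)))

# TRUE for every row that contains each value exactly once
is_perm <- apply(vals, 1, function(r) identical(sort(unname(r)), expected))
stopifnot(all(is_perm))

# order() gives, for each row, the position of "a", then "b", then "c"
res <- t(apply(vals, 1, order))
colnames(res) <- expected
result <- cbind(df["id"], res)
result
```

If a row repeats a value, stopifnot() stops with an error instead of silently producing misleading positions.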

Related

How to convert a column to a different type using NSE?

I'm writing a function that takes a data frame and a column name as arguments, and returns the data frame with the indicated column converted to character type. However, I'm stuck at the non-standard evaluation part of dplyr.
My current code:
df <- tibble(id = 1:5, value = 6:10)
col <- "id"
mutate(df, "{col}" := as.character({{ col }}))
# # A tibble: 5 x 2
# id value
# <chr> <int>
# 1 id 6
# 2 id 7
# 3 id 8
# 4 id 9
# 5 id 10
As you can see, instead of transforming the contents of the column to character type, the column values are replaced by the column names. {{ col }} isn't evaluated like I expected it to be. What I want is a dynamic equivalent of this:
mutate(df, id = as.character(id))
# # A tibble: 5 x 2
# id value
# <chr> <int>
# 1 1 6
# 2 2 7
# 3 3 8
# 4 4 9
# 5 5 10
I've tried to follow the instructions provided in dplyr's programming vignette, but I'm not finding a solution that works. What am I doing wrong?
Use the .data pronoun -
library(dplyr)
df <- tibble(id = 1:5, value = 6:10)
col <- "id"
mutate(df, "{col}" := as.character(.data[[col]]))
# id value
# <chr> <int>
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Some other alternatives -
mutate(df, "{col}" := as.character(get(col)))
mutate(df, "{col}" := as.character(!!sym(col)))
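Since the goal is a reusable function, here is a minimal sketch of how the .data approach might be wrapped. The function names col_to_character and col_to_character_bare are made up for illustration. The second variant assumes the caller passes the column unquoted, which is the situation {{ }} is designed for; when col already holds a string, {{ col }} just inserts that string, which is why the original attempt filled the column with "id".

```r
library(dplyr)

# Hypothetical helper: `col` is a string holding the column name
col_to_character <- function(data, col) {
  mutate(data, "{col}" := as.character(.data[[col]]))
}

# Hypothetical helper: `col` is a bare (unquoted) column name
col_to_character_bare <- function(data, col) {
  mutate(data, "{{ col }}" := as.character({{ col }}))
}

df <- tibble(id = 1:5, value = 6:10)
out_string <- col_to_character(df, "id")
out_bare <- col_to_character_bare(df, id)
```

Both calls should leave value untouched and turn id into a character column.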
We may use across(), which can also do this on multiple columns:
library(dplyr)
df %>%
mutate(across(all_of(col), as.character))
# A tibble: 5 x 2
id value
<chr> <int>
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
data
df <- tibble(id = 1:5, value = 6:10)
col <- "id"
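To illustrate the multiple-column case, the sketch below passes a character vector of names to all_of(). The extra label column and the cols vector are assumptions added here so that there is a column left unconverted.

```r
library(dplyr)

# `label` is an extra column added for illustration only
df <- tibble(id = 1:5, value = 6:10, label = letters[1:5])
cols <- c("id", "value")

# Convert every column named in `cols`; other columns are left as they are
out <- df %>%
  mutate(across(all_of(cols), as.character))
```

all_of() raises an error if any name in cols is missing from the data, which is usually preferable to silently skipping it.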

How to detect class type and change in R

I have a dataset where the first line is the header, the second line is some explanatory data, and rows 3 onward are numbers. Because of this second explanatory row, the columns are automatically read in as factors (or as character, if I set stringsAsFactors = FALSE).
What I would like to do is remove the second row, and have a function that goes through all columns and detects if they're just numbers and change the class type to the appropriate type. Is there something like that available? Perhaps using dplyr? I have many columns so I'd like to avoid manually reassigning them.
A simplified example below
> df <- data.frame(A = c("col 1",1,2,3,4,5), B = c("col 2",1,2,3,4,5))
> df
A B
1 col 1 col 2
2 1 1
3 2 2
4 3 3
5 4 4
6 5 5
If all the values after the explanatory row are numbers, we can do this:
library(tidyverse)
df[-1, ] %>% mutate_all(as.numeric)
Depending on the task, it can also be done this way, converting any column that contains at least one number:
df <- tibble(A = c("col 1",1,2,3,4,5),
B = c("col 2",1,2,3,4,5),
C = c(letters[1:5], 6))
df[-1, ] %>% mutate_if(~ any(!is.na(as.numeric(.))), as.numeric)
A B C
<dbl> <dbl> <dbl>
1 1 1 NA
2 2 2 NA
3 3 3 NA
4 4 4 NA
5 5 5 6
Or this way, converting only the columns in which every value is a number:
df[-1, ] %>% mutate_if(~ all(!is.na(as.numeric(.))), as.numeric)
A B C
<dbl> <dbl> <chr>
1 1 1 b
2 2 2 c
3 3 3 d
4 4 4 e
5 5 5 6
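In base R, we can drop the explanatory row and then convert every column (assuming the columns are character, not factor):
df <- df[-1, ]
df[] <- lapply(df, as.numeric)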
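For automatic detection, base R's type.convert() may be worth a look: it guesses the most specific type for each column and leaves non-numeric columns as character when as.is = TRUE. This is a sketch on an extended version of the example; the column C and the variable name clean are assumptions added for illustration.

```r
# Example data; column C is added here to show a column that stays character
df <- data.frame(A = c("col 1", 1, 2, 3, 4, 5),
                 B = c("col 2", 1, 2, 3, 4, 5),
                 C = c("col 3", letters[1:4], 6),
                 stringsAsFactors = FALSE)

# Drop the explanatory row, then let type.convert() pick a type per column
clean <- df[-1, ]
clean[] <- lapply(clean, type.convert, as.is = TRUE)
rownames(clean) <- NULL

str(clean)
```

Columns A and B should come out as integer, while C stays character because not all of its values can be parsed as numbers.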

R reset counter based on two columns [duplicate]

I have the following kind of data, and I need the output shown in the second data frame:
a <- c(1,1,1,1,2,2,2,2,2,2,2)
b <- c(1,1,1,2,3,3,3,3,4,5,6)
d <- c(1,2,3,4,1,2,3,4,5,6,7)
df <- as.data.frame(cbind(a,b,d))
output <- c(1,1,1,2,1,1,1,1,2,3,4)
df_output <- as.data.frame(cbind(df,output))
I have tried cumsum and I am not able to get the desired results. Please guide. Regards, Enthu.
The counter should reset to 1 whenever the value in column a changes.
Within each value of a, rows with the same value of b should get the same counter value, starting at 1.
For example, in the 5th record column b has the value 3, so the counter resets to 1. Column b stays the same in rows 5 through 8, so the counter stays at 1 there, and each later change in b should increment it by 1.
We can group by column 'a' and then create the new column by matching each value of 'b' against the unique values of 'b' within that group:
library(dplyr)
df2 <- df %>%
group_by(a) %>%
mutate(out = match(b, unique(b)))
df2
# A tibble: 11 x 4
# Groups: a [2]
# a b d out
# <dbl> <dbl> <dbl> <int>
# 1 1 1 1 1
# 2 1 1 2 1
# 3 1 1 3 1
# 4 1 2 4 2
# 5 2 3 1 1
# 6 2 3 2 1
# 7 2 3 3 1
# 8 2 3 4 1
# 9 2 4 5 2
#10 2 5 6 3
#11 2 6 7 4
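Another option is to coerce 'b' to a factor and then to integer. Note that factor levels are sorted, so this numbers the values by sorted order rather than by order of appearance; the two agree for this data.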
df %>%
group_by(a) %>%
mutate(out = as.integer(factor(b)))
data
df <- data.frame(a, b, d)
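The same idea can be sketched in base R with ave(), which applies a function within each group of a. The column names out_match and out_runs are made up for illustration. The second variant uses cumsum, which the question mentions trying, and counts every change in b, so it would differ from the first if a value of b reappeared later in the same group.

```r
a <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)
b <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 6)
d <- c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7)
df <- data.frame(a, b, d)

# Number the distinct values of b in order of first appearance, per group of a
df$out_match <- ave(df$b, df$a, FUN = function(x) match(x, unique(x)))

# Increment the counter every time b changes from one row to the next, per group of a
df$out_runs <- ave(df$b, df$a, FUN = function(x) cumsum(c(1, diff(x) != 0)))

df
```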

Removing mirrored combinations of variables in a data frame

I'm looking to get each unique combination of two variables:
library(purrr)
cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`)
# A tibble: 6 x 2
id1 id2
<int> <int>
1 2 1
2 3 1
3 1 2
4 3 2
5 1 3
6 2 3
How do I remove the mirrored combinations? That is, I want only one of rows 1 and 3 in the data frame above, only one of rows 2 and 5, and only one of rows 4 and 6. My desired output would be something like:
# A tibble: 3 x 2
id1 id2
<int> <int>
1 2 1
2 3 1
3 3 2
I don't care if a particular id value is in id1 or id2, so the below is just as acceptable as the output:
# A tibble: 3 x 2
id1 id2
<int> <int>
1 1 2
2 1 3
3 2 3
A tidyverse version of Dan's answer:
cross_df(list(id1 = seq_len(3), id2 = seq_len(3)), .filter = `==`) %>%
mutate(min = pmap_int(., min), max = pmap_int(., max)) %>% # Find the min and max in each row
unite(check, c(min, max), remove = FALSE) %>% # Combine them in a "check" variable
distinct(check, .keep_all = TRUE) %>% # Remove duplicates of the "check" variable
select(id1, id2)
# A tibble: 3 x 2
id1 id2
<int> <int>
1 2 1
2 3 1
3 3 2
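A Base R approach (here df is the data frame of combinations from the question):
# create a string with the sorted elements of each row; the separator keeps multi-digit ids unambiguous
df$temp <- apply(df, 1, function(x) paste(sort(x), collapse = "_"))
# then keep only the rows whose sorted string has not been seen before
df[!duplicated(df$temp), 1:2]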
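A variation on the same idea avoids building strings by comparing the row-wise minimum and maximum directly. This is a sketch that rebuilds the combinations with expand.grid() rather than cross_df(); the variable names lo, hi and unique_pairs are chosen here for illustration.

```r
# All ordered pairs of 1:3, excluding pairs where both ids are equal
df <- expand.grid(id1 = 1:3, id2 = 1:3)
df <- df[df$id1 != df$id2, ]

# Mirrored rows share the same (smaller, larger) pair
lo <- pmin(df$id1, df$id2)
hi <- pmax(df$id1, df$id2)

# Keep the first row seen for each (lo, hi) pair
unique_pairs <- df[!duplicated(data.frame(lo, hi)), ]
rownames(unique_pairs) <- NULL
unique_pairs
```

If only the set of pairs matters and not which rows are kept, t(combn(3, 2)) produces the three unordered pairs directly.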

Split dataframe based on value in column - loop over list of id's [duplicate]

I'm trying to split a dataframe based on values in the id column.
what I have:
ids<-as.data.frame(c("a","a","a","b","b","b","c","c","c"))
unique_id<-unique(ids)
values<-as.data.frame(rep(1:3,3))
df<-as.data.frame(cbind(ids,values))
colnames(df)<-c("id","values")
and it looks like:
> df
id values
a 1
a 2
a 3
b 1
b 2
b 3
c 1
c 2
c 3
the code and error I'm getting is:
> for(id in unique_id){
+ paste0("value_for_",id)<-split(df, id = df$id)
+ }
Error in deparse(...) :
unused argument (id = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
what I want:
value_for_a
id value
a 1
a 2
a 3
value_for_b
id value
b 1
b 2
b 3
value_for_c
id value
c 1
c 2
c 3
I feel this should be fairly straightforward, but I'm fresh out of ideas. I am not opposed to using more sophisticated methods than a for loop.
You can use nest() for this.
library(dplyr)
library(tidyr)
df%>%
group_by(id)%>%
nest()
# A tibble: 3 x 2
id data
<fctr> <list>
1 a <tibble [3 x 1]>
2 b <tibble [3 x 1]>
3 c <tibble [3 x 1]>
Each tibble contains the values you're interested in.
df%>%
group_by(id)%>%
nest()%>%
.$data
[[1]]
# A tibble: 3 x 1
values
<int>
1 1
2 2
3 3
[[2]]
# A tibble: 3 x 1
values
<int>
1 1
2 2
3 3
[[3]]
# A tibble: 3 x 1
values
<int>
1 1
2 2
3 3
I would recommend splitting the data frame with the split() function, which does exactly what you want.
For example:
# Using OPs data
split(df, df$id)
This splits df by the id column. The output of the function is a list of data frames.
$a
id values
1 a 1
2 a 2
3 a 3
$b
id values
4 b 1
5 b 2
6 b 3
$c
id values
7 c 1
8 c 2
9 c 3
You can get the desired names with this command:
myList <- split(df, df$id)
names(myList) <- paste0("value_for_", names(myList))
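For completeness, here is a sketch of how the renamed list might be used. The original loop fails because paste0(...) cannot be the target of an assignment and split() has no id argument; keeping the pieces in a named list sidesteps both problems. The environment env is an assumption added here to show that list2env() can create separate variables if they are really needed.

```r
# Same data as in the question, built directly
df <- data.frame(id = rep(c("a", "b", "c"), each = 3),
                 values = rep(1:3, 3))

myList <- split(df, df$id)
names(myList) <- paste0("value_for_", names(myList))

# Access one piece by name
myList$value_for_b

# Optional: turn the list elements into variables inside an environment
env <- new.env()
list2env(myList, envir = env)
ls(env)
```

Working with the list directly (for example via lapply) is usually easier than creating one variable per id.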
