Ordering rows in a data.table in a specific order

Suppose I have the following data.table in R:
col_1 col_2
c 1
c 1
d 1
b 1
a 1
b 1
How would I use data.table functions to reorder rows in the particular order, c("b", "c", "b", "c", "a", "d"), so that the resulting data table will be the following?
col_1 col_2
b 1
c 1
b 1
c 1
a 1
d 1

An option using make.unique:
x <- make.unique(c("b", "c", "b", "c", "a", "d"))
DT[match(x, make.unique(col_1))]
output:
col_1 col_2
1: b 1
2: c 1
3: b 1
4: c 1
5: a 1
6: d 1
Reference:
I came across make.unique here:
"Set Difference" between two vectors with duplicate values
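To see why this works, it can help to look at the intermediate values. The following is a minimal base R sketch, assuming a plain data.frame in place of the data.table; the names `DF`, `target`, `keys` and `idx` are only illustrative. make.unique tags repeated values with a numeric suffix, so each duplicate gets its own key and match can pair them one to one.

```r
# Sample data from the question, as a plain data.frame for illustration
DF <- data.frame(col_1 = c("c", "c", "d", "b", "a", "b"), col_2 = 1,
                 stringsAsFactors = FALSE)

# Desired order; make.unique() tags repeats with .1, .2, ...
target <- make.unique(c("b", "c", "b", "c", "a", "d"))
# "b" "c" "b.1" "c.1" "a" "d"

# The same tagging applied to the existing column
keys <- make.unique(DF$col_1)
# "c" "c.1" "d" "b" "a" "b.1"

# Row positions that put the data in the desired order
idx <- match(target, keys)
# 4 1 6 2 5 3

result <- DF[idx, ]
```

Note that this pairs the first "b" in the target with the first "b" in the data, the second with the second, and so on.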

Related

How to table two variables which contain listed observations?

Two of the variables in the df I am working with may contain multiple values per observation. I want to table the frequencies of these variables, but can't use table() on type 'list'... I've created a sample df below:
col_a <- c("a", "b", "c", "a,b", "b,c")
col_b <- c("c", "b", "a", "a,a", "a,c")
df <- data.frame(col_a, col_b)
library(dplyr)
df <- df %>%
  mutate(col_a = strsplit(col_a, ","),
         col_b = strsplit(col_b, ","))
This outputs:
col_a col_b
1 a c
2 b b
3 c a
4 c("a", "b") c("a", "a")
5 c("b", "c") c("a", "c")
Now, table(df$col_a, df$col_b) returns Error in order(y) : unimplemented type 'list' in 'orderVector1'. In order to table the variables, I want to unlist the concatenated observations so that it looks like this:
col_a col_b
1 a c
2 b b
3 c a
4 a a
5 a a
6 b a
7 b a
8 b a
9 b c
10 c a
11 c c
Any ideas on how to accomplish this?

We may use separate_rows on the original data
library(tidyr)
library(dplyr)
df %>%
separate_rows(col_a) %>%
separate_rows(col_b)
Output:
# A tibble: 11 × 2
col_a col_b
<chr> <chr>
1 a c
2 b b
3 c a
4 a a
5 a a
6 b a
7 b a
8 b a
9 b c
10 c a
11 c c
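Since the goal in the question was a frequency table, here is a rough base R sketch of the same expansion followed by table(). It is not the separate_rows implementation, just one possible way to get the same rows without extra packages; the names `parts_a`, `parts_b`, `long` and `tab` are made up for this example.

```r
col_a <- c("a", "b", "c", "a,b", "b,c")
col_b <- c("c", "b", "a", "a,a", "a,c")

# Split each comma-separated cell into its individual values
parts_a <- strsplit(col_a, ",")
parts_b <- strsplit(col_b, ",")

# For every original row, build all (col_a, col_b) combinations
long <- do.call(rbind, Map(function(a, b) {
  # col_b is listed first so that it varies fastest, which gives the
  # same row order as the two separate_rows() calls
  expand.grid(col_b = b, col_a = a,
              stringsAsFactors = FALSE)[, c("col_a", "col_b")]
}, parts_a, parts_b))
rownames(long) <- NULL

# Cross-tabulate the expanded columns
tab <- table(long$col_a, long$col_b)
```

With the sample data, `long` has 11 rows and `tab` is a 3 x 3 table of counts.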

Replace column based on column names

I have 65 columns, but a sample of data could be as follows:
df<-read.table (text=" Name D A D E
Rose D D C B
Smith B A D D
Lora A A D D
Javid A D D B
Ahmed C A E A
Helen B A D D
Nadia A A D A
", header=TRUE)
I want to get the following table:
Name D A D E
Rose 2 1 1 1
Smith 1 2 2 1
Lora 1 2 2 1
Javid 1 1 2 1
Ahmed 1 2 1 1
Helen 1 2 2 1
Nadia 1 2 2 1
The numbers follow the first row (the header). For example, the second column is named D, so every D in it should become 2 and everything else 1. Likewise, the third column is named A, so every A should become 2 and everything else 1, and so on. Please bear in mind that I have 65 columns. I understand that the columns should have distinct names, but in this case I cannot change them.

With ifelse and sapply:
df[2:ncol(df)] <- sapply(2:ncol(df), function(i) ifelse(df[i] == colnames(df[i]), 2, 1))
output
#> df
Name D A D E
1 Rose 2 1 1 1
2 Smith 1 2 2 1
3 Lora 1 2 2 1
4 Javid 1 1 2 1
5 Ahmed 1 2 1 1
6 Helen 1 2 2 1
7 Nadia 1 2 2 1
data
df <- structure(list(Name = c("Rose", "Smith", "Lora", "Javid", "Ahmed",
"Helen", "Nadia"), D = c("D", "B", "A", "A", "C", "B", "A"),
A = c("D", "A", "A", "D", "A", "A", "A"), D = c("C", "D",
"D", "D", "E", "D", "D"), E = c("B", "D", "D", "B", "A",
"D", "A")), class = "data.frame", row.names = c(NA, -7L))

cols = names(df)[-1]
df[cols] = lapply(cols, \(x) (df[[x]] == x) + 1L)
# Name D A
# 1 Rose 2 1
# 2 Smith 1 2
# 3 Lora 1 2
# 4 Javid 1 1
# 5 Ahmed 1 2
# 6 Helen 1 2
# 7 Nadia 1 2
Simplified data (without repeated column names)
df <- data.frame(
Name = c("Rose", "Smith", "Lora", "Javid", "Ahmed", "Helen", "Nadia"),
D = c("D", "B", "A", "A", "C", "B", "A"),
A = c("D", "A", "A", "D", "A", "A", "A")
)

Another approach is to stack, replace and unstack:
stack_df <- stack(df[-1])
stack_df$values <- ifelse(stack_df$values == stack_df$ind, 2, 1)
cbind.data.frame(Name = df$Name, unstack(stack_df))
# Name D A E
#1 Rose 2 1 1
#2 Smith 1 2 1
#3 Lora 1 2 1
#4 Javid 1 1 1
#5 Ahmed 1 2 1
#6 Helen 1 2 1
#7 Nadia 1 2 1
DATA
structure(list(Name = c("Rose", "Smith", "Lora", "Javid", "Ahmed",
"Helen", "Nadia"), D = c("D", "B", "A", "A", "C", "B", "A"),
A = c("D", "A", "A", "D", "A", "A", "A"), E = c("B", "D",
"D", "B", "A", "D", "A")), row.names = c(NA, -7L), class = "data.frame")

A dplyr option with ifelse:
library(dplyr)
df %>%
mutate(across(D:E, ~ifelse(. == cur_column(), 2, 1)))
#> Name D A D.1 E
#> 1 Rose 2 1 1 1
#> 2 Smith 1 2 1 1
#> 3 Lora 1 2 1 1
#> 4 Javid 1 1 1 1
#> 5 Ahmed 1 2 1 1
#> 6 Helen 1 2 1 1
#> 7 Nadia 1 2 1 1
Created on 2022-09-19 with reprex v2.0.2
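In the output above, the D.1 column is all 1s because read.table (with its default check.names = TRUE) renamed the duplicated D header to D.1, so no value in that column equals its name any more. One possible workaround, sketched here in base R, is to strip the numeric suffix before comparing. This assumes that no genuine column name ends in a dot followed by digits; `base_names` is a name invented for this example.

```r
df <- read.table(text = "Name D A D E
Rose D D C B
Smith B A D D
Lora A A D D
Javid A D D B
Ahmed C A E A
Helen B A D D
Nadia A A D A", header = TRUE, stringsAsFactors = FALSE)

names(df)
# "Name" "D" "A" "D.1" "E"

# Remove the ".1", ".2", ... suffix added to duplicated headers
# (assumption: no real column name ends in ".<digits>")
base_names <- sub("\\.[0-9]+$", "", names(df))

# Compare every column (except Name) with its original header
df[-1] <- Map(function(col, nm) ifelse(col == nm, 2, 1),
              df[-1], base_names[-1])
```

After this, the D.1 column holds 1, 2, 2, 2, 1, 2, 2, as in the table the question asks for.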

Using dplyr:
The data:
df <- read.table (
text = " Name A B C D
Rose D D C B
Smith B A D D
Lora A A D D
Javid A D D B
Ahmed C A E A
Helen B A D D
Nadia A A D A
",
header = TRUE
)
> df
Name A B C D
1 Rose D D C B
2 Smith B A D D
3 Lora A A D D
4 Javid A D D B
5 Ahmed C A E A
6 Helen B A D D
7 Nadia A A D A
Note that I changed the column names.
df %>%
mutate(across(!c(Name),
.fns = ~ ifelse(.x == cur_column(), 2, 1)))
Name A B C D
1 Rose 1 1 2 1
2 Smith 1 1 1 2
3 Lora 2 1 1 2
4 Javid 2 1 1 1
5 Ahmed 1 1 1 1
6 Helen 1 1 1 2
7 Nadia 2 1 1 1
The mutate() command modifies columns in data frames. Using the across() function, we specify that the mutation should be applied to more than one column. Inside the across() call, we select every column except the name column using !c(Name). We then specify a function that compares the name of the column, cur_column(), with the values in the column, .x. If they are the same, the value is set to 2, otherwise to 1.
EDIT: used ifelse instead of case_when, as there is only one condition to check.

You can compare each row with the column names. Adding 1 to the logical values converts FALSE and TRUE into 1 and 2 respectively.
df[-1] <- t((t(df[-1]) == names(df)[-1]) + 1)
df
# Name D A D E
# 1 Rose 2 1 1 1
# 2 Smith 1 2 2 1
# 3 Lora 1 2 2 1
# 4 Javid 1 1 2 1
# 5 Ahmed 1 2 1 1
# 6 Helen 1 2 2 1
# 7 Nadia 1 2 2 1
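The double transpose is needed because R recycles the shorter vector down the columns of a matrix, not across its rows. Here is a small sketch on a hypothetical 2 x 2 matrix (the values and the names `m`, `hdr`, `wrong` and `right` are made up for illustration) that contrasts the direct comparison with the transposed one.

```r
# Hypothetical 2 x 2 example with column names D and A
m <- matrix(c("D", "B",
              "D", "A"),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("D", "A")))
hdr <- colnames(m)

# Direct comparison: hdr is recycled down each column, so the second
# row of column D is compared with "A" instead of "D"
wrong <- (m == hdr) + 1

# After transposing, each column of t(m) is one original row, so hdr
# lines up with the column names; transpose back afterwards
right <- t((t(m) == hdr) + 1)
```

Here `right` gives 2, 2 for column D and 1, 2 for column A, while `wrong` marks the second D as 1.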

Splitting single column into four columns and count repeated pattern in R

The aim of this project is to understand how information is acquired while looking at an object. Imagine an object has elements like a, b, c, d, e and f. A person might look at a, move on to b, and so forth. We wish to plot and understand how that person navigated across the different elements of a given stimulus. I have data that captured this movement in a single column, but I need to split it into a few columns to get the navigation pattern. Please find the example below.
I have a column extracted from a data frame. It has to be split into four columns based on its characteristics.
a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d", "d", "d", "e", "f", "f", "e", "e", "f")
a <- as.data.frame(a)
Expected output
from to countfrom countto
a b 1 3
b a 3 1
a c 1 1
c a 1 1
a b 1 1
b d 1 3
d e 3 1
e f 1 2
f e 2 2
e f 2 1
Note: I used dplyr to extract from the dataframe.

Use rle to get the relative runs of each letter, and then piece it together:
r <- rle(a$a)
## or maybe `r <- rle(as.character(a$a))` depending on your R version
setNames(
data.frame(lapply(r, head, -1), lapply(r, tail, -1)),
c("countfrom","from","countto","to")
)
## countfrom from countto to
##1 1 a 3 b
##2 3 b 1 a
##3 1 a 1 c
##4 1 c 1 a
##5 1 a 1 b
##6 1 b 3 d
##7 3 d 1 e
##8 1 e 2 f
##9 2 f 2 e
##10 2 e 1 f
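For readers who have not used rle before, this sketch spells out the same idea with explicit indexing. The column order follows the expected output in the question; the names `steps`, `n` and `transitions` are only illustrative.

```r
steps <- c("a", "b", "b", "b", "a", "c", "a", "b", "d",
           "d", "d", "e", "f", "f", "e", "e", "f")

# rle() collapses consecutive repeats into (value, run length) pairs
r <- rle(steps)
# r$values:  a b a c a b d e f e f
# r$lengths: 1 3 1 1 1 1 3 1 2 2 1

# Pair each run with the run that follows it
n <- length(r$values)
transitions <- data.frame(
  from      = r$values[-n],   # every run except the last
  to        = r$values[-1],   # every run except the first
  countfrom = r$lengths[-n],
  countto   = r$lengths[-1],
  stringsAsFactors = FALSE
)
```

With 11 runs there are 10 transitions, one per row of `transitions`.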

Or in the tidyverse
library(tidyverse)
a <- c( "a", "b", "b", "b", "a", "c", "a", "b", "d",
"d", "d", "e", "f", "f", "e", "e", "f")
foo <- rle(a)
answ <- tibble(from = foo$values, to = lead(foo$values),
fromCount = foo$lengths, toCount = lead(foo$lengths)) %>%
filter(!is.na(to))
# A tibble: 10 x 4
from to fromCount toCount
<chr> <chr> <int> <int>
1 a b 1 3
2 b a 3 1
3 a c 1 1
4 c a 1 1
5 a b 1 1
6 b d 1 3
7 d e 3 1
8 e f 1 2
9 f e 2 2
10 e f 2 1

Enumerate a grouping variable in a tibble

I would like to know how to use row_number or anything else to transform the variable group into an integer
tibble_test <- tibble(A = letters[1:10], group = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"))
# to get the enumeration inside each group of 'group'
tibble_test %>%
group_by(group) %>%
mutate(G1 = row_number())
But I would like to have this output:
# A tibble: 10 x 4
A group G1 G2
<chr> <chr> <dbl> <dbl>
1 a A 1 1
2 b A 2 1
3 c A 3 1
4 d B 1 2
5 e B 2 2
6 f C 1 3
7 g C 2 3
8 h C 3 3
9 i C 4 3
10 j D 1 4
My question is: how do I get this column G2? I know I could transform the 'group' variable into a factor and then an integer (after the tibble is arranged), but I would like to know if it can be done by counting.

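You just need one more step: include the group indices with group_indices(). (In dplyr 1.0.0 and later, cur_group_id() is the recommended replacement for calling group_indices() with no arguments inside mutate().) Be aware that how your data is arranged/sorted will affect the index.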
library(dplyr)
tibble_test <- tibble(A = letters[1:10], group = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"))
# to get the enumeration inside each group of 'group'
tibble_test %>%
group_by(group) %>%
mutate(G1 = row_number(),
G2 = group_indices())
# A tibble: 10 x 4
# Groups: group [4]
A group G1 G2
<chr> <chr> <int> <int>
1 a A 1 1
2 b A 2 1
3 c A 3 1
4 d B 1 2
5 e B 2 2
6 f C 1 3
7 g C 2 3
8 h C 3 3
9 i C 4 3
10 j D 1 4
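Since the question asks whether this can be done by counting, here is a base R sketch of both columns without dplyr. It is an alternative to the answer above, not what group_indices() does internally; the names `G2_alt` and friends are illustrative, and the cumsum variant assumes the rows of each group are contiguous.

```r
group <- c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D")

# G1: running count within each group
G1 <- ave(seq_along(group), group, FUN = seq_along)

# G2: number groups in order of first appearance
G2 <- match(group, unique(group))

# Counting variant: add 1 each time a new group value shows up
# (assumption: rows of the same group are next to each other)
G2_alt <- cumsum(!duplicated(group))
```

For this data, `G2` and `G2_alt` agree and both give 1, 1, 1, 2, 2, 3, 3, 3, 3, 4.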

How to add a boolean value to a column when 2 different dataframes match on 2 columns in R?

I have 2 different dataframes. I want to add a column to my second dataframe and have it assigned a value of 0 or 1. In the case where df1$code == df2$code & df1$date == df2$date, I want a 1 for these rows, and 0 otherwise. A visual and reproducible example may make it easier to understand.
df1 <- data.frame(code = c("A", "B", "C", "D"), date = c(1,2,3,4))
df2 <- data.frame(code = c("A", "B", "E", "R", "V", "F"), date = c(1,2,3,4,5,6))
df3 <- data.frame(code = c("A", "B", "E", "R", "V", "F"), date = c(1,2,3,4,5,6), value =c(1,1,0,0,0,0))
DF1
code date
1 A 1
2 B 2
3 C 3
4 D 4
DF2
code date
1 A 1
2 B 2
3 E 3
4 R 4
5 V 5
6 F 6
The resulting DF I want
code date value
1 A 1 1
2 B 2 1
3 E 3 0
4 R 4 0
5 V 5 0
6 F 6 0

We can use %in% to create a logical vector and then coerce it to binary with as.integer or +
df2$value <- +(df2$code %in% df1$code)
df2
# code date value
#1 A 1 1
#2 B 2 1
#3 E 3 0
#4 R 4 0
#5 V 5 0
#6 F 6 0
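Note that the line above only checks code. That is enough for the sample data, but the question asks for a match on both code and date. One way to check both, sketched in base R, is to build a combined key from the two columns. The extra row in `df2x` is hypothetical and only there to show the difference; the "|" separator is assumed not to occur in the data.

```r
df1 <- data.frame(code = c("A", "B", "C", "D"), date = c(1, 2, 3, 4))
df2 <- data.frame(code = c("A", "B", "E", "R", "V", "F"),
                  date = c(1, 2, 3, 4, 5, 6))

# Hypothetical extra row: the code exists in df1 but the date differs
df2x <- rbind(df2, data.frame(code = "C", date = 9))

# Combined keys (assumption: "|" never appears in code or date)
key1 <- paste(df1$code, df1$date, sep = "|")
key2 <- paste(df2x$code, df2x$date, sep = "|")

# 1 only when both code and date match a row of df1
df2x$value <- as.integer(key2 %in% key1)

# For comparison: matching on code alone flags the extra row too
df2x$code_only <- as.integer(df2x$code %in% df1$code)
```

For the original six rows both columns agree; they differ only on the added ("C", 9) row.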

I would do it like this:
library(dplyr)
df2 %>%
  left_join(mutate(df1, value = 1)) %>%
  mutate(value = coalesce(value, 0))
# Joining, by = c("code", "date")
# code date value
# 1 A 1 1
# 2 B 2 1
# 3 E 3 0
# 4 R 4 0
# 5 V 5 0
# 6 F 6 0
