Conditional operations between factor level pairs

I have a dataframe (df) that contains Start times and End times for observations of different IDs:
df <- structure(list(ID = 1:4, Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00", "2021-05-12 13:30:00", "2021-05-12 13:42:00"),
End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00", "2021-05-13 8:19:00", "2021-05-13 8:12:00")),
class = "data.frame", row.names = c(NA,
-4L))
I want to create a new dataframe that shows the latest Start time and the earliest End time for each possible pairwise comparison between the levels of ID.
I was able to accomplish this by making a duplicate column of ID called ID2, using tidyr::expand to expand them, and saving that in an object called Pairs:
library(dplyr)
library(tidyr) # expand() comes from tidyr, not dplyr
df$ID2 <- df$ID
Pairs <-
df%>%
expand(ID, ID2)
Making two new objects a and b that store the Start and End times for each comparison separately, and then combining them into df2:
a <- left_join(df, Pairs, by = 'ID')%>%
rename(StartID1 = Start, EndID1 = End, ID2 = ID2.y)%>%
select(-ID2.x)
b <- left_join(Pairs, df, by = "ID2")%>%
rename(StartID2 = Start, EndID2 = End)%>%
select(ID2, StartID2, EndID2)
df2 <- cbind(a,b)
df2 <- df2[,-4]
and finally using dplyr::if_else to find the LatestStart time and the EarliestEnd time for each of the comparisons:
df2 <-
df2%>%
mutate(LatestStart = if_else(StartID1 > StartID2, StartID1, StartID2),
EarliestEnd = if_else(EndID1 > EndID2, EndID2, EndID1))
This seems like such a simple task to perform. Is there a more concise way to achieve this from df without creating all of these extra objects?

For such computations, outer usually comes in handy:
df %>%
mutate(across(c("Start", "End"), lubridate::ymd_hms)) %>%
{
data.frame(
ID1 = rep(.$ID, each = nrow(.)),
ID2 = rep(.$ID, nrow(.)),
LatestStart = outer(.$Start, .$Start, pmax),
EarliestEnd = outer(.$End, .$End, pmin)
)
}
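The same pairwise idea can also be sketched in base R only, which makes the pairing explicit. This is a minimal sketch, not the answerer's code: it assumes the times are in UTC, it keeps self-pairs (ID1 equal to ID2) just as the expand() approach in the question does, and the names idx and pairs are chosen here for illustration.

```r
# Sample data from the question, parsed as POSIXct (UTC is an assumption).
df <- data.frame(
  ID = 1:4,
  Start = as.POSIXct(c("2021-05-12 13:22:00", "2021-05-12 13:25:00",
                       "2021-05-12 13:30:00", "2021-05-12 13:42:00"), tz = "UTC"),
  End = as.POSIXct(c("2021-05-13 8:15:00", "2021-05-13 8:17:00",
                     "2021-05-13 8:19:00", "2021-05-13 8:12:00"), tz = "UTC")
)

# Every ordered pair of row indices, self-pairs included.
idx <- expand.grid(i = seq_len(nrow(df)), j = seq_len(nrow(df)))

# pmax/pmin compare the two rows of each pair element by element.
pairs <- data.frame(
  ID1 = df$ID[idx$i],
  ID2 = df$ID[idx$j],
  LatestStart = pmax(df$Start[idx$i], df$Start[idx$j]),
  EarliestEnd = pmin(df$End[idx$i], df$End[idx$j])
)
```

To drop self-pairs and mirrored duplicates, one option is to keep only the rows of idx where i < j before building pairs.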

Related

How to rbind multiple dataframes with a while-loop?

I'm trying to rbind multiple loaded datasets (all of them have the same number of columns, named "num", "source" and "target"). In this case, I have ten dataframes, named "test1", "test2", "test3" and so on.
I thought that the solution below (creating an empty dataframe and looping through the others) would solve my problem, but I guess that I'm missing something in the second argument of the rbind function. I don't know if using paste0("test", i) to increment the variable (changing the name of the dataframe) is correct. I'm afraid that I'm just trying to rbind a dataframe with a string object (and getting an error). Is that right?
test = as.data.frame(matrix(ncol = 3, nrow = 0)) %>%
setNames(c("num", "source", "target"))
i=1
while (i < 11) {
test = rbind(test, paste0("test", i))
i = i + 1
}

We need replicate with simplify = FALSE so that it returns a list:
out <- setNames(replicate(10, test, simplify = FALSE),
paste0("test", seq_len(10)))
If the datasets already exist in the global environment, collect them into a list with mget and rbind them within do.call:
out <- do.call(rbind, mget(paste0("test", 1:10)))
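To make the point about strings concrete, here is a small sketch. The three data frames below are hypothetical stand-ins for the loaded datasets (the question has ten). paste0() only builds the name of an object; get() and mget() look the objects up by name.

```r
# Hypothetical stand-ins for the loaded datasets test1, test2, test3.
test1 <- data.frame(num = 1, source = "a", target = "b")
test2 <- data.frame(num = 2, source = "c", target = "d")
test3 <- data.frame(num = 3, source = "e", target = "f")

name <- paste0("test", 1)
is_string <- is.character(name)  # TRUE: this is just the text "test1"
first <- get(name)               # the data frame that the name refers to

# mget() fetches several objects at once and returns them as a named list.
frames <- mget(paste0("test", 1:3))
combined <- do.call(rbind, frames)
rownames(combined) <- NULL       # drop the "test1", "test2", ... row names
```

Inside the original while loop, replacing paste0("test", i) with get(paste0("test", i)) would also work, although building the list once and binding it in a single call avoids growing the data frame step by step.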

We could bind test1:test10 using the common pattern in the name:
library(dplyr)
result <- mget(ls(pattern="^test\\d+")) %>%
bind_rows()

If I understood correctly, this might help you
Libraries
library(dplyr)
Example data
list_of_df <-
list(
df1 = data.frame(a = "1"),
df2 = data.frame(a = "2"),
df3 = data.frame(a = "1"),
df4 = data.frame(a = "2")
)
Code
bind_rows(list_of_df,.id = "dataset")
Result
dataset a
1 df1 1
2 df2 2
3 df3 1
4 df4 2

Filter rows in dataset for distinct words in r

Goal: To filter rows in a dataset so that only distinct words remain. At the moment, I have used inner_join to retain the rows that appear in both datasets, which has left duplicate rows in the result.
Attempt 1: I have tried to use distinct to retain only those rows which are unique, but this has not worked. I may be using it incorrectly.
This is my code so far (output attached in png format):
# join warriner emotion lemmas by `word` column in collocations data frame to see how many word matches there are
warriner2 <- dplyr::inner_join(warriner, coll, by = "word") # join data; retain only rows in both sets (works both ways)
warriner2 <- distinct(warriner2)
warriner2
coll2 <- dplyr::semi_join(coll, warriner, by = "word") # join all rows in a that have a match in b
# There are 8166 lemma matches (including double-ups)
# There are XXX unique lemma matches

You can try :
library(dplyr)
warriner2 <- inner_join(warriner, coll, by = "word") %>%
distinct(word, .keep_all = TRUE)
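A likely reason plain distinct() appeared not to work is that it compares whole rows, so two rows with the same word but different values elsewhere both survive. The base R sketch below shows the difference on made-up data; joined and its score column are hypothetical, and unique() and duplicated() are used here as rough base R analogues of the two dplyr calls.

```r
# Hypothetical join result: "happy" appears twice with different scores.
joined <- data.frame(
  word = c("happy", "happy", "sad"),
  score = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Whole-row comparison (like distinct()): nothing is removed here.
whole_row <- unique(joined)

# Comparison on word only, keeping the first row per word
# (like distinct(word, .keep_all = TRUE)).
by_word <- joined[!duplicated(joined$word), ]
```

Which row is kept for each word depends on the row order, so sort the data first if a particular row should win.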

To clarify Ronak's answer further, here is an example with some mock data. Note that you can simply call distinct() at the end of the pipe to keep only the distinct rows, if that is what you want. Your problem may well have arisen because you performed two operations and assigned the result to the same name both times (warriner2).
library(dplyr)
# Here's a couple sample tibbles
name <- c("cat", "dog", "parakeet")
df1 <- tibble(
x = sample(5, 99, rep = TRUE),
y = sample(5, 99, rep = TRUE),
name = rep(name, times = 33))
df2 <- tibble(
x = sample(5, 99, rep = TRUE),
y = sample(5, 99, rep = TRUE),
name = rep(name, times = 33))
# It's much less confusing if you do this in one pipe
p <- df1 %>%
inner_join(df2, by = "name") %>%
distinct()

How do I aggregate data in R in a way that returns the entire row that satisfies the aggregation condition? [no dplyr]

I have data that looks like this:
ID FACTOR_VAR INT_VAR
1 CAT 1
1 DOG 0
I want to aggregate by ID such that the resulting dataframe contains the entire row that satisfies my aggregate condition. So if I aggregate by the max of INT_VAR, I want to return the whole first row:
ID FACTOR_VAR INT_VAR
1 CAT 1
The following will not work because FACTOR_VAR is a factor:
new_data <- aggregate(data[,c("ID", "FACTOR_VAR", "INT_VAR")], by=list(data$ID), FUN=max)
How can I do this? I know dplyr has a group by function, but unfortunately I am working on a computer for which downloading packages takes a long time. So I'm looking for a way to do this with just vanilla R.

If you want to keep all the columns, use ave instead :
subset(df, as.logical(ave(INT_VAR, ID, FUN = function(x) x == max(x))))
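For readers unfamiliar with ave, the sketch below spells out what it contributes. ave() returns a vector as long as its input in which every element is replaced by its group's summary, so it can be compared directly with the original column. The data are hypothetical and extend the question's example with a second ID.

```r
# Hypothetical data with two IDs.
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L),
  FACTOR_VAR = factor(c("CAT", "DOG", "CAT", "DOG")),
  INT_VAR = c(1L, 0L, 3L, 5L)
)

# Per-row copy of the group maximum: 1 1 5 5
group_max <- ave(df$INT_VAR, df$ID, FUN = max)

# Keep the complete rows that reach their group's maximum.
result <- df[df$INT_VAR == group_max, ]
```

If several rows in one group tie for the maximum, all of them are returned.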

You can use aggregate for this. If you want to retain all the columns, merge can be used with it.
merge(aggregate(INT_VAR ~ ID, data = df, max), df, all.x = TRUE)
# ID INT_VAR FACTOR_VAR
#1 1 1 CAT
data
df <- structure(list(ID = c(1L, 1L), FACTOR_VAR = structure(1:2, .Label = c("CAT", "DOG"), class = "factor"), INT_VAR = 1:0), class = "data.frame", row.names = c(NA,-2L))

We can do this in dplyr:
library(dplyr)
df %>%
group_by(ID) %>%
filter(INT_VAR == max(INT_VAR))
Or using data.table
library(data.table)
setDT(df)[, .SD[INT_VAR == max(INT_VAR)], by = ID]

create(mutate) column with a condition of another one

I have this data
COL
AABC1
AAAABD2
AAAAAABF3
I would like to make a certain column like this:
COL NEW_COL
AABC1 T1
AAAABD2 T2
AAAAAABF3 T3
If COL contains 'BC', NEW_COL will be T1
contains 'BD', it will be T2
contains 'BF', it will be T3.
I would like to use mutate and grepl, but I have 80 conditions (like BC > T1), so writing them all out in code is not workable.
With the table like:
CLASS NEW_COL
BC T1
BD T2
BF T3
Could I use mutate to create the new column from the lookup table above?

Here's your data:
DF <- data.frame(COL = c("AABC1",
"AAAABD2",
"AAAAABF3"),
stringsAsFactors = FALSE)
lookup_tbl <- data.frame(CLASS = c("BC", "BD", "BF"),
NEW_COL = c("T1", "T2", "T3"),
stringsAsFactors = FALSE)
Your problem is solved by a join, after some initial preparation.
To prepare DF, you need to add a column that extracts any instance of CLASS in the lookup table from COL in DF. Then you can join normally. In R:
library(dplyr)
DF %>%
mutate(CLASS = gsub(paste0("^.*(",
paste0(lookup_tbl[["CLASS"]], collapse = "|"),
").*$"),
"\\1",
COL)) %>%
# or inner_join as required
left_join(lookup_tbl, by = "CLASS")
How the solution should behave when COL matches zero or more than one instance in CLASS will need to be specified. The above handles both cases, but maybe not how you'd like.
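The extract-then-join idea can also be sketched in base R, which makes the zero-match case easy to see. This is an illustrative variant rather than the answer's code: the row "ZZZ9" is a hypothetical value added to show a non-match, and regexpr() picks the leftmost match when a string contains more than one class.

```r
DF <- data.frame(COL = c("AABC1", "AAAABD2", "AAAAAABF3", "ZZZ9"),
                 stringsAsFactors = FALSE)
lookup_tbl <- data.frame(CLASS = c("BC", "BD", "BF"),
                         NEW_COL = c("T1", "T2", "T3"),
                         stringsAsFactors = FALSE)

# One alternation pattern built from all classes: "BC|BD|BF".
pattern <- paste(lookup_tbl$CLASS, collapse = "|")

# regexpr() gives the position of the first match, or -1 when there is none.
hit <- regexpr(pattern, DF$COL)

# regmatches() returns the matched text for the matching rows only.
DF$CLASS <- NA_character_
DF$CLASS[hit > 0] <- regmatches(DF$COL, hit)

# match() plays the role of the join; unmatched rows stay NA.
DF$NEW_COL <- lookup_tbl$NEW_COL[match(DF$CLASS, lookup_tbl$CLASS)]
```

If the class codes could contain regular-expression metacharacters, they would need escaping before being pasted into the pattern.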

You can create a lookup table with your 80 conditions and write a little function to match against it. Here's an example (normally, you'd read in lookup_table from file, I'm guessing):
library(tidyverse)
lookup_table <- data.frame(
row.names = c('BC', 'BD', 'BF'),
new_col = c('T1', 'T2', 'T3'),
stringsAsFactors = FALSE)
lookup <- function(x, table) {
# return new_col for the first class found in x
for (class in rownames(table)) {
if (grepl(class, x)) {
return(table[class, 'new_col'])
}
}
NA_character_ # no class matched
}
tibble(col = c('AABC1', 'AAAABD2', 'AAAAAABF3')) %>%
rowwise() %>% mutate(new_col = lookup(col, lookup_table))
Note that this will take the first match it finds, so be sure your lookup table is ordered properly with respect to the priority you want to give the matching rules.

How to RBind First 4 Column one above Other with Tag

Below I have tried to reproduce my data in a representative form:
v <- data.frame(C1TEMP = c(3,6,1,8,9,2,2,9,1,23),
C1VIB = c(5,6,1,8,9,2,2,9,1,23),
C1DE = c(9,6,1,8,9,2,2,9,1,23),
C1NDE = c(8,6,1,8,9,2,2,9,1,23),
C2TEMP = c(5,6,1,8,9,2,2,9,1,23),
C2VIB = c(378,6,1,8,9,2,2,9,1,23),
C2DE = c(3,78,1,8,9,2,2,9,1,23),
C2NDE = c(3,6,1,8,9,2,2,9,1,23),
C3TEMP= c(3,6,89,8,9,2,2,9,1,23),
C3VIB = c(3,6,1,98,9,2,2,9,1,23),
C3DE = c(33,56,91,82,99,12,22,19,81,23),
C3NDE = c(13,76,91,88,59,42,22,39,21,23))
Here I want to rbind every block of 4 columns one above the other, with a tag number alongside identifying the block. The number of columns will always be divisible by 4. I am also attaching an image to give a clear picture of the expected result.
EXPECTED OUTPUT:

I agree with YCR's comment. Still, this is a way to tackle your problem. Use the following code:
# data frames need column headers, so convert to matrix
v01 <- as.matrix(v[, 1:4])
v02 <- as.matrix(v[, 5:8])
v03 <- as.matrix(v[, 9:12])
# remove columnnames
colnames(v01) <- NULL
colnames(v02) <- NULL
colnames(v03) <- NULL
# now you can use rbind and give the columnnames back
v2 <- rbind( v01, v02, v03)
colnames(v2) <- c("C1TEMP", "C1VIB", "C1DE", "C1NDE")
v2
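The code above hard-codes three blocks and does not add the tag. A more general sketch is shown below under two assumptions: every block is exactly 4 adjacent columns, and the column names follow the C<number><suffix> pattern from the question. The data are a shortened, hypothetical version of v with two blocks, and block_size, blocks and stacked are names chosen here.

```r
# Shortened, hypothetical data: two blocks of 4 columns, two rows each.
v <- data.frame(C1TEMP = c(3, 6), C1VIB = c(5, 6), C1DE = c(9, 6), C1NDE = c(8, 6),
                C2TEMP = c(5, 6), C2VIB = c(378, 6), C2DE = c(3, 78), C2NDE = c(3, 6))

block_size <- 4

# Column indices grouped into consecutive blocks: 1:4, 5:8, ...
blocks <- split(seq_along(v), ceiling(seq_along(v) / block_size))

stacked <- do.call(rbind, lapply(seq_along(blocks), function(k) {
  block <- v[, blocks[[k]], drop = FALSE]
  # Strip the "C1", "C2", ... prefix so all blocks share the same names.
  names(block) <- sub("^C[0-9]+", "", names(block))
  cbind(Tag = k, block)
}))
rownames(stacked) <- NULL
```

Because the blocks are derived from the column count, the same code applies unchanged to the 12-column v in the question.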

try this
It is a bit more convoluted than previous answers but it should be more adaptable to other data frames
# how many blocks have you got?
howMany <-table(gsub(names(v),pattern = "[0-9]",replacement = ""))[1]
# make a common name string
NAMES <- unique(gsub(names(v),pattern = "[0-9]",replacement = ""))
# create a list
list() -> V
for(i in 1:howMany){
# get the column with matching index number
v[,grep(names(v),pattern = i)] -> vi
names(vi) <- NAMES# change name
data.frame(Tag=i,vi) -> V[[i]]# put it in the list
}
# combine the tables in the list into one data frame
do.call(rbind,V)
Nils

The melt and reshape way:
This approach requires an identifier for each row:
v<- data.frame(C1TEMP = c(3,6,1,8,9,2,2,9,1,23),
C1VIB = c(5,6,1,8,9,2,2,9,1,23),
C1DE = c(9,6,1,8,9,2,2,9,1,23),
C1NDE = c(8,6,1,8,9,2,2,9,1,23),
C2TEMP = c(5,6,1,8,9,2,2,9,1,23),
C2VIB = c(378,6,1,8,9,2,2,9,1,23),
C2DE = c(3,78,1,8,9,2,2,9,1,23),
C2NDE = c(3,6,1,8,9,2,2,9,1,23),
C3TEMP= c(3,6,89,8,9,2,2,9,1,23),
C3VIB = c(3,6,1,98,9,2,2,9,1,23),
C3DE = c(33,56,91,82,99,12,22,19,81,23),
C3NDE = c(13,76,91,88,59,42,22,39,21,23),
id = 1:10
, stringsAsFactors = F)
library(tidyverse)
# melt the dataframe(reshape from wide to long format):
v_melt <- reshape2::melt(v, id.vars = "id")
# modify the aggregation variables
v_melt <- v_melt %>%
mutate(var = substr(as.character(variable), 3, 8),
group_id = paste0(substr(as.character(variable), 1, 2), "_", id))
# reshape the data frame in a wide format:
v_cast <- reshape2::dcast(v_melt, group_id ~ var, value.var = "value")
