Adding 2 DF's of Different Sizes Together

Adding 2 DF's of Different Sizes Together - r

I have two DF's:
passesComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 0, 2), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 1, 0, 0, 1), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 1, 1), `Good Pass` = c(2,
2, 1, 1, 0, 3, 6), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1, 0
), Turnover = c(0, 0, 0, 0, 0, 1, 0), totalEvents = c(2, 2, 2,
1, 0, 6, 7)), row.names = c("P1", "P2", "P3", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
of size 7x15, and
copyComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 2), P4_Good = c(0, 0, 0, 0, 0, 0), P5_Good = c(0,
0, 1, 0, 0, 1), P1_Bad = c(0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0,
0, 0, 0, 0), P3_Bad = c(0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 0,
0, 0, 0), P5_Bad = c(0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0,
0, 0, 1, 0), `Good Pass` = c(2, 1, 1, 0, 3, 4), `Intercepted Pass` = c(0,
0, 0, 0, 1, 0), Turnover = c(0, 0, 0, 0, 1, 0), totalEvents = c(2,
1, 1, 0, 6, 4)), row.names = c("P1", "P2", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
or simply,
copyComb <- passesComb
copyComb <- copyComb[-3,-3]
#Updating specific cells since [3,3] is removed
copyComb[2,11] <- 1
copyComb[2,14] <- 1
copyComb[6,8] <- 0
copyComb[6,3] <- 0
copyComb[6,10] <- 0
copyComb[6,11] <- 4
copyComb[6,14] <- 4
#This now equals the copyComb from dput() above
of size 6x14.
I am trying to combine/add these two df's together based on matching row/column names. I tried to achieve this using the code from the answer to this post
gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),
ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb))))
gamesComb[row.names(ifelse(nrow(passesComb) >= nrow(copyComb), passesComb, copyComb)),
colnames(ifelse(ncol(passesComb) >= ncol(copyComb), passesComb, copyComb))] <- passesComb
Here, I create a df, gamesComb and set the dimensions of whichever passesComb or copyComb is bigger. It does create a 7x15 df, but doesn't add the row/col names.
I also am trying to then add the 2 df's together based on the cell value if they have the same row/col name (same as in the post link above), i.e. passesComb["P2","P1_Good"] = 1 and copyComb["P2","P1_Good"] = 1, so gamesComb["P2","P1_Good"] should = 2, and same for all similar row/col names.
So the final result look like:
expectedOutput <- structure(list(P1_Good = c(0, 2, 0, 0, 0, 0, 2), P2_Good = c(4,
0, 0, 0, 0, 0, 4), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 2, 0, 0, 2), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 2, 1), `Good Pass` = c(4,
3, 1, 2, 0, 6, 10), `Intercepted Pass` = c(0, 0, 0, 0, 0, 2,
0), Turnover = c(0, 0, 0, 0, 0, 2, 0), totalEvents = c(4, 3,
2, 2, 0, 12, 11)), row.names = c("P1", "P2", "P3", "P4", "P5",
"Opponent", "VT"), class = "data.frame")

Here's a dplyr/tidyr approach where I reshape each table into a long format, then join them, sum, and pivot wider again.
library(dplyr); library(tidyr)
lengthen <- function(df) { df %>% rownames_to_column(var = "row") %>% pivot_longer(-row)}
full_join(lengthen(passesComb), lengthen(copyComb), by = c("row", "name")) %>%
mutate(new_val = coalesce(value.x, 0) + coalesce(value.y, 0)) %>%
select(-starts_with("value")) %>%
pivot_wider(names_from = name,values_from = new_val)

Another option is to stack them and then sum by rowname groups.
library(dplyr, warn.conflicts = FALSE)
library(tibble)
out <-
rownames_to_column(passesComb) %>%
bind_rows(rownames_to_column(copyComb)) %>%
# bind_rows(rownames_to_column(third_table)) %>% if you want to add another
select(rowname, names(passesComb)) %>%
group_by(rowname) %>%
summarise(across(everything(), sum, na.rm = T)) %>%
slice(match(rownames(passesComb), rowname)) %>%
column_to_rownames('rowname')
all.equal(out, expectedOutput)
#> [1] TRUE
Created on 2021-10-09 by the reprex package (v2.0.1)

Related

Aggregate similar constructs/ FA with binary variables

I would like to aggregate, in order to reduce the number of constructs, its following data frame containing only binary variables that correspond to "yes/no", its following data frame (first 10 row). The original data frame contains 169 rows.
outcome <-
structure(list(Q9_Automazione.processi = c(0, 0, 0, 0, 0, 0,
1, 1, 1, 0), Q9_Velocita.Prod = c(1, 0, 0, 1, 0, 0, 1, 1, 1,
0), Q9_Flessibilita.Prod = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1), Q9_Controllo.processi = c(0,
0, 0, 1, 0, 0, 1, 1, 0, 0), Q9_Effic.Magazzino = c(0, 0, 0, 1,
0, 0, 0, 0, 0, 0), Q9_Riduz.Costi = c(0, 1, 0, 0, 0, 0, 0, 0,
0, 1), Q9_Miglior.Sicurezza = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
Q9_Connett.Interna = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), Q9_Connett.Esterna = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Virtualizzazione = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Q9_Innov.Prod = c(0, 0, 0, 0, 0,
1, 0, 0, 0, 1), Q9_Person.Prod = c(0, 1, 0, 1, 0, 1, 0, 0,
0, 1), Q9_Nuovi.Mercati = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Q9_Nuovi.BM = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.Energ = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.SostAmb = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")
I have tried performing factor analysis via the tethracoric method on the obtained correlation matrix ( the obtained value from the KMO function turns out to be inadequate) both directly on the dataframe and then using tethracoric correletions in fafunction (using cor = "tet" I get a negative Tucker Lewis Index).
I have been reading up on this but cannot find a methodology that is adequate and of which I am certain of the correctness of the analysis.
So basically what I would like to achieve is to aggregate similar constructs, e.g., assess whether column 5 has value 1 (i.e., "yes") almost always when column 11 has value 1 and then aggregate.
Here the code that I try to used
library(psych)
tet <- tetrachoric(outcome)
corrplot(tet$rho, "ellipse", tl.cex = 0.75, tl.col = "black")
par(mfrow = c(1,2))
corr_matrix %>%
ggcorrplot(show.diag = F,
type="lower",
lab=TRUE,
lab_size=2)
KMO(corr_matrix)
cortest.bartlett(corr_matrix)
fa.parallel(corr_matrix, fm = "ml")
factor <- fa(corr_matrix, nfactors = 3, rotate = "oblimin", fm = "ml")
print(factor, cut = 0.3, digits = 3)
# -------- Pearson --------
cor(outcome, method = 'pearson', use = "pairwise.complete.obs") %>%
ggcorrplot(show.diag = F,
type="lower",
lab=TRUE,
lab_size=2)
KMO(outcome)
cortest.bartlett(outcome)
fa.parallel(outcome)
factor1 <- fa(outcome, nfactors = 3, rotate = "oblimin", cor = "tet", fm = "ml")
print(factor1, cut = 0.3, digits = 3)

R function to change value after a condition has been fulfilled

Participants in an experiment took a test that has a rule that says "once a participant has gotten 6 items wrong in a window of 8 items, you stop running the test". However, some experimenters kept testing past this point. I now need to find a way in which I can automatically see where the test should have been stopped, and change all values following the end to 0 (= item wrong). I am not even sure if this is something that can be done in R.
To be clear, I would like to go row by row (which are the participants) and once there are six 0s in a given window of 8 columns (items), I would need all values after the sixth 0 to be 0 too.
While the reproducible data is below, here is a visualization of what I would need, where the blue cells are the ones that should change to 0:
Pre-changes
Post-changes
Reproducible data:
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Any help is highly appreciated!

Here is a solution that involves some pivoting, rollsum, cumsum, if_else logic, then pivoting back. Let me know if it works.
library(tidyverse)
library(zoo)
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) %>%
as_tibble() %>%
pivot_longer(-1) %>%
group_by(Participant_ID) %>%
mutate(running_total = zoo::rollsumr(value==0, k = 8, fill = 0),
should_terminate = cumsum(running_total >= 6),
value = if_else(should_terminate > 0, 0, value)) %>%
ungroup() %>%
select(Participant_ID, name, value) %>%
pivot_wider(names_from = name, values_from = value)

Changing a character column into a continuous column, by dividing them into sections (1,2,3,4)

I have a data set I'm trying to run a glm regression on, however it contains characters as age limit, race, and comorbidity class. I would like to change those columns into a continuous variable so the regression can accept it. Data below, I want to change the TBI.irace2 into (Hispanic=1, Black=2, white=3, and other=4) same with age (age 18-28=1, 29-46=2, 47-64=3, and >64=4) and with NISS (NISS 0-10=1, NISS 11-20=2, NISS 21-30=3, and NISS 31-40=4, NISS41-50=5, NISS 51-60=6, NISS 61-70=7, NISS>70= 8)
Please find summary of data below
TBI.crani = c(0, 0, 0, 0, 0, 0), TBI.vte = c(0,
0, 0, 0, 0, 0), TBI.FEMALE = c(0, 0, 1, 0, 1, 0), TBI.iracecat2 = c("Whites",
"Whites", "Whites", "Hispanics", "Whites", "Blacks"), TBI.agecat = c("Age 47-64",
"Age 29-46", "Age > 64", "Age 29-46", "Age 18-28", "Age 18-28"
), TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30",
"NISS 11-20", "NISS 11-20", "NISS 0-10"), TBI.LOS = c(5, 8, 1,
3, 19, 1), TBI.hospitalteach = c(0, 0, 1, 1, 1, 1), TBI.largebedsize = c(1,
1, 1, 1, 1, 1), TBI.CM_ALCOHOL = c(0, 0, 0, 1, 0, 0), TBI.CM_ANEMDEF = c(0,
0, 0, 0, 0, 0), TBI.CM_BLDLOSS = c(0, 0, 0, 0, 0, 0), TBI.CM_CHF = c(1,
0, 0, 0, 0, 0), TBI.CM_CHRNLUNG = c(0, 0, 0, 0, 0, 0), TBI.CM_COAG = c(0,
0, 0, 0, 1, 0), TBI.CM_HYPOTHY = c(0, 0, 0, 0, 0, 0), TBI.CM_LYTES = c(0,
0, 0, 0, 0, 0), TBI.CM_METS = c(0, 0, 0, 0, 0, 0), TBI.CM_NEURO = c(0,
0, 0, 0, 0, 0), TBI.CM_OBESE = c(0, 0, 0, 0, 0, 0), TBI.CM_PARA = c(0,
0, 0, 0, 0, 0), TBI.CM_PSYCH = c(0, 1, 0, 0, 0, 0), TBI.CM_TUMOR = c(0,
0, 0, 0, 0, 0), TBI.CM_WGHTLOSS = c(0, 0, 0, 0, 0, 0), TBI.UTI = c(0,
0, 0, 0, 0, 0), TBI.pneumonia = c(0, 0, 0, 0, 0, 0), TBI.AMI = c(0,
0, 0, 0, 0, 0), TBI.sepsis = c(0, 0, 0, 0, 0, 0), TBI.arrest = c(0,
0, 0, 0, 0, 0), TBI.spineinjury = c(0, 0, 0, 0, 0, 0), TBI.legfracture = c(0,
0, 0, 0, 0, 0), TBI_time_to_surg.NEW = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
6L), class = "data.frame")

A small little tip, provide a small sample set that is just big enough to address your question.
library(data.table)
# took a small sample and changed one value to Asian
dt <- data.table(
TBI.FEMALE = c(0, 0, 1, 0, 1, 0),
TBI.iracecat2 = as.character(c("Whites", "Whites", "Asian", "Hispanics", "Whites", "Blacks"))
)
# define race groups, and note I did not define Asian
convert_race <- c("Hispanics" = 1, "Blacks" = 2, "Whites" = 3) # other will all be not defined
dt[, TBI.irace2 := lapply(TBI.iracecat2, function(x) convert_race[x]), by = TBI.iracecat2]
dt[is.na(TBI.irace2), TBI.irace2 := 4]
dt
# TBI.FEMALE TBI.iracecat2 TBI.irace2
# 1: 0 Whites 3
# 2: 0 Whites 3
# 3: 1 Asian 4
# 4: 0 Hispanics 1
# 5: 1 Whites 3
# 6: 0 Blacks 2

Combining Two Data Fames with Different Row/Col Names Together

I have this data frame:
dtMatrix <- structure(list(category = c("Opponent", "Opponent", "Opponent",
"Opponent", "P1", "P2", "P3", "P4", "P2", "Opponent", "Opponent",
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover",
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Good Pass",
"Intercepted Pass", "Bad Pass", "Good Pass"), Receiver = c(NA,
NA, NA, NA, "P2", "P3", "P4", "P5", "P1", NA, NA, "P2")), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
With this, I created a matrix
goodMatrix <- dtMatrix %>%
filter(Event == 'Good Pass' & !is.na(Receiver)) %>%
dplyr::count(category, Receiver) %>%
tidyr::complete(category = dfList, Receiver = dfList, fill = list(n = 0)) %>%
pivot_wider(names_from = Receiver, values_from = n) %>%
column_to_rownames('category')
This goodMatrix stores the combinations of good passes between P1-P5. In the dtMatrix, it also has other values in the Event column such as turnover/intercepted pass, and also accounts for the opponent. I would like to create a similar matrix as goodMatrix but for the events and opponent previously mentioned.
countTypes <- dtMatrix %>% dplyr::count(category, Event) Grabs all the counts of the events based on the category column. With that, I then did:
secondMatrix <- data.frame(matrix(ncol = length(unique(countTypes$Event)), nrow = length(unique(countTypes$category))))
rownames(secondMatrix) <- unique(countTypes$category)
colnames(secondMatrix) <- unique(countTypes$Event)
secondMatrix
test <- merge(goodMatrix, secondMatrix, by = "row.names")
To try and combine the two separate matrices together.
anotherMatrix <- dtMatrix %>%
dplyr::count(category, Event) %>%
tidyr::complete(category = dfList, Event = dfList, fill = list(n = 0)) %>%
pivot_wider(names_from = Event, values_from = n) %>%
column_to_rownames('category')
This also adds them into one, but does not keep the values from dtMatrix and instead resets them to 0.
My expected result should look as such:
expectedOutput <- structure(list(P1 = c(0, 1, 0, 0, 0, 0), P2 = c(2, 0, 0, 0, 0,
0), P3 = c(0, 1, 0, 0, 0, 0), P4 = c(0, 0, 1, 0, 0, 0), P5 = c(0,
0, 0, 1, 0, 0), `Good Pass` = c(2, 2, 1, 1, 0, 3), `Bad Pass` = c(0,
0, 0, 0, 0, 1), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1), Turnover = c(0,
0, 0, 0, 0, 1)), row.names = c("P1", "P2", "P3", "P4", "P5",
"Opponent"), class = "data.frame")
And anotherMatrix does half of this, while dtMatrix does the other half, but I am struggling on merging them into what I would like my result to be.
Edit
newTest <- test[,-1]
rownames(newTest) <- test[,1]
newTry <- merge(anotherMatrix, newTest, by = "row.names")
Just as an extra attempted method - this also gets close to my expected output, but does not include the opponent row, and also doubles every column.
dfList <- c("P1", "P2", "P3", "P4", "P5")
Edit 2
A quick follow up on combining 2 DF's with different row/col lengths, how would I go about combining passesComb + copyComb into gamesComb:
passesComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 0, 2), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 1, 0, 0, 1), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 1, 1), `Good Pass` = c(2,
2, 1, 1, 0, 3, 6), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1, 0
), Turnover = c(0, 0, 0, 0, 0, 1, 0), totalEvents = c(2, 2, 2,
1, 0, 6, 7)), row.names = c("P1", "P2", "P3", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
and
copyComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 2), P4_Good = c(0, 0, 0, 0, 0, 1), P5_Good = c(0,
0, 1, 0, 0, 1), P1_Bad = c(0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0,
0, 0, 0, 0), P3_Bad = c(0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 0,
0, 0, 1), P5_Bad = c(0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0,
0, 0, 1, 1), `Good Pass` = c(2, 2, 1, 0, 3, 6), `Intercepted Pass` = c(0,
0, 0, 0, 1, 0), Turnover = c(0, 0, 0, 0, 1, 0), totalEvents = c(2,
2, 1, 0, 6, 7)), row.names = c("P1", "P2", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
copyComb is the same as passesComb just with row/column 3 removed. I tried adapting from the code for the original answer.
gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),
ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb))))
gamesComb[row.names(ifelse(nrow(passesComb) >= nrow(copyComb), passesComb, copyComb)),
colnames(ifelse(ncol(passesComb) >= ncol(copyComb), passesComb, copyComb))] <- passesComb
but this only creates a 7x15 df and doesn't add the row/column names for some reason, in addition to not adding the cell values.

If the intention is to update 'anotherMatrix' with goodMatrix, use the row.names and colnames from 'goodMatrix' to subset the 'anotherMatrix' and assign the 'goodMatrix' to 'anotherMatrix'
anotherMatrix[row.names(goodMatrix), colnames(goodMatrix)] <- goodMatrix
Then, we just replace the NA with 0
anotherMatrix[is.na(anotherMatrix)] <- 0
-checking with 'expectedOutput
> identical(expectedOutput, anotherMatrix[names(expectedOutput)])
[1] TRUE

Hierarchical clustering of a time-series

I am struggling with hierarchical or clustering. I have the following time-series and I want to cluster to based on time. Would transpose function work for this?
structure(list(`04:00` = c(0, 0, 0, 0, 0, 0), `04:10` = c(0,
0, 0, 0, 0, 0), `04:20` = c(0, 0, 0, 0, 0, 0), `04:30` = c(0,
0, 0, 0, 0, 0), `04:40` = c(0, 0, 0, 0, 0, 0), `04:50` = c(0,
0, 0, 0, 0, 0), `05:00` = c(0, 0, 0, 0, 0, 0), `05:10` = c(0,
0, 0, 0, 0, 0), `05:20` = c(0, 0, 0, 0, 0, 0), `05:30` = c(0,
0, 0, 0, 0, 0), `05:40` = c(0, 0, 0, 0, 0, 0), `05:50` = c(1,
0, 0, 0, 0, 0), `06:00` = c(1, 0, 0, 0, 0, 0), `06:10` = c(1,
0, 0, 0, 0, 0), `06:20` = c(2, 0, 0, 0, 0, 0), `06:30` = c(0,
0, 0, 0, 0, 0), `06:40` = c(0, 1, 0, 0, 0, 0), `06:50` = c(0,
2, 0, 0, 0, 1), `07:00` = c(0, 0, 0, 0, 0, 2), `07:10` = c(0,
0, 1, 0, 0, 2), `07:20` = c(0, 0, 0, 0, 0, 2), `07:30` = c(0,
0, 1, 0, 0, 0), `07:40` = c(1, 0, 1, 0, 0, 0), `07:50` = c(1,
0, 0, 0, 2, 0), `08:00` = c(1, 0, 0, 0, 0, 0), `08:10` = c(1,
0, 0, 0, 0, 0), `08:20` = c(2, 0, 0, 0, 0, 0), `08:30` = c(2,
0, 0, 0, 0, 0), `08:40` = c(2, 0, 0, 0, 0, 0), `08:50` = c(2,
0, 0, 0, 0, 0), `09:00` = c(0, 0, 0, 0, 0, 0), `09:10` = c(0,
0, 0, 0, 0, 0), `09:20` = c(0, 1, 0, 0, 0, 0), `09:30` = c(0,
1, 0, 2, 0, 0), `09:40` = c(0, 1, 0, 0, 0, 0), `09:50` = c(0,
1, 0, 0, 0, 0), `10:00` = c(0, 0, 0, 0, 0, 0), `10:10` = c(0,
0, 0, 0, 0, 0), `10:20` = c(0, 1, 0, 0, 0, 0), `10:30` = c(0,
1, 0, 0, 0, 0), `10:40` = c(0, 0, 0, 0, 0, 0), `10:50` = c(0,
0, 0, 0, 0, 0), `11:00` = c(2, 0, 0, 1, 0, 0), `11:10` = c(0,
0, 0, 1, 0, 0), `11:20` = c(0, 0, 0, 1, 0, 1), `11:30` = c(0,
0, 0, 1, 0, 1), `11:40` = c(0, 0, 0, 1, 0, 1), `11:50` = c(0,
0, 0, 1, 0, 0), `12:00` = c(0, 0, 0, 1, 2, 0), `12:10` = c(0,
0, 0, 1, 0, 0), `12:20` = c(0, 0, 0, 1, 0, 0), `12:30` = c(0,
0, 0, 1, 0, 0), `12:40` = c(0, 0, 0, 1, 0, 0), `12:50` = c(0,
0, 0, 1, 1, 0), `13:00` = c(0, 0, 0, 0, 1, 0), `13:10` = c(0,
0, 0, 0, 1, 0), `13:20` = c(0, 0, 0, 0, 1, 0), `13:30` = c(0,
0, 0, 0, 1, 0), `13:40` = c(0, 0, 0, 0, 1, 0), `13:50` = c(0,
0, 0, 0, 1, 0), `14:00` = c(0, 0, 0, 0, 1, 0), `14:10` = c(0,
0, 0, 0, 1, 0), `14:20` = c(0, 0, 0, 0, 1, 0), `14:30` = c(0,
0, 0, 0, 1, 0), `14:40` = c(0, 0, 0, 0, 1, 0), `14:50` = c(0,
0, 0, 0, 0, 0), `15:00` = c(0, 0, 0, 0, 0, 0), `15:10` = c(0,
2, 0, 0, 0, 0), `15:20` = c(0, 2, 0, 0, 1, 0), `15:30` = c(0,
2, 0, 0, 1, 1), `15:40` = c(0, 2, 0, 0, 1, 0), `15:50` = c(0,
2, 0, 0, 1, 0), `16:00` = c(0, 2, 0, 0, 1, 0), `16:10` = c(0,
2, 0, 0, 1, 0), `16:20` = c(2, 2, 0, 0, 1, 0), `16:30` = c(2,
2, 0, 0, 1, 2), `16:40` = c(2, 2, 0, 0, 1, 1), `16:50` = c(2,
2, 0, 0, 0, 1), `17:00` = c(0, 2, 0, 0, 2, 0), `17:10` = c(0,
0, 0, 0, 2, 0), `17:20` = c(0, 0, 0, 0, 2, 0), `17:30` = c(0,
0, 0, 0, 2, 0), `17:40` = c(0, 0, 0, 0, 0, 0), `17:50` = c(0,
0, 0, 0, 0, 0), `18:00` = c(0, 2, 0, 0, 0, 2), `18:10` = c(0,
2, 0, 0, 0, 2), `18:20` = c(0, 0, 0, 0, 2, 2), `18:30` = c(0,
0, 0, 0, 0, 2), `18:40` = c(0, 0, 0, 0, 0, 2), `18:50` = c(1,
0, 0, 0, 0, 2), `19:00` = c(1, 0, 0, 1, 1, 0), `19:10` = c(1,
0, 0, 1, 1, 0), `19:20` = c(1, 0, 0, 1, 1, 0), `19:30` = c(1,
0, 1, 1, 1, 0), `19:40` = c(1, 0, 1, 1, 1, 1), `19:50` = c(1,
0, 1, 1, 1, 1), `20:00` = c(0, 0, 1, 1, 1, 1), `20:10` = c(0,
0, 1, 1, 1, 1), `20:20` = c(0, 0, 1, 1, 1, 1), `20:30` = c(0,
1, 2, 1, 1, 1), `20:40` = c(0, 1, 0, 1, 1, 1), `20:50` = c(0,
1, 0, 1, 1, 1), `21:00` = c(0, 1, 0, 1, 1, 1), `21:10` = c(0,
1, 0, 0, 1, 1), `21:20` = c(0, 1, 0, 0, 1, 1), `21:30` = c(0,
1, 1, 0, 1, 1), `21:40` = c(0, 1, 1, 0, 1, 1), `21:50` = c(0,
1, 1, 0, 0, 1), `22:00` = c(0, 1, 1, 0, 0, 0), `22:10` = c(0,
1, 0, 0, 0, 0), `22:20` = c(0, 1, 0, 0, 0, 0), `22:30` = c(0,
1, 0, 0, 0, 0), `22:40` = c(0, 1, 0, 0, 0, 0), `22:50` = c(0,
1, 0, 0, 0, 0), `23:00` = c(0, 0, 0, 0, 1, 0), `23:10` = c(0,
0, 0, 0, 0, 1), `23:20` = c(0, 0, 0, 0, 0, 1), `23:30` = c(0,
0, 0, 0, 0, 1), `23:40` = c(0, 0, 0, 0, 0, 1), `23:50` = c(0,
0, 0, 0, 0, 0), `00:00` = c(0, 0, 0, 0, 0, 0), `00:10` = c(0,
0, 0, 0, 0, 0), `00:20` = c(0, 0, 0, 0, 0, 0), `00:30` = c(0,
0, 0, 0, 0, 0), `00:40` = c(0, 0, 0, 0, 0, 0), `00:50` = c(0,
0, 0, 0, 0, 0), `01:00` = c(0, 0, 0, 0, 0, 0), `01:10` = c(0,
0, 0, 0, 0, 0), `01:20` = c(0, 0, 0, 0, 0, 0), `01:30` = c(0,
0, 0, 0, 0, 0), `01:40` = c(0, 0, 0, 0, 0, 0), `01:50` = c(0,
0, 0, 0, 0, 0), `02:00` = c(0, 0, 0, 0, 0, 0), `02:10` = c(0,
0, 0, 0, 0, 0), `02:20` = c(0, 0, 0, 0, 0, 0), `02:30` = c(0,
0, 0, 0, 0, 0), `02:40` = c(0, 0, 0, 0, 0, 0), `02:50` = c(0,
0, 0, 0, 0, 0), `03:00` = c(0, 0, 0, 0, 0, 0), `03:10` = c(0,
0, 0, 0, 0, 0), `03:20` = c(0, 0, 0, 0, 0, 0), `03:30` = c(0,
0, 0, 0, 0, 0), `03:40` = c(0, 0, 0, 0, 0, 0), `03:50` = c(0,
0, 0, 0, 0, 0)), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
I managed to run hierarchical clustering but only on cases and not on time
d_distance <- dist(as.matrix(df))
plot(hclust(d_distance))
The plot that I generated
As you can see on the plot the structure end points are indexes - how can I have instead of index time (maybe transpose)? Also I would like to plot time-series cluster separately like below plot. Would dtw be better than hierarchical clustering?

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Adding 2 DF's of Different Sizes Together - r

Related

Aggregate similar constructs/ FA with binary variables

R function to change value after a condition has been fulfilled

Changing a character column into a continuous column, by dividing them into sections (1,2,3,4)

Combining Two Data Fames with Different Row/Col Names Together

Hierarchical clustering of a time-series

Categories

Resources