Weighted mean in R

I was wondering how I can get the weighted average of my data. I have already looked on the internet, but when I try the weighted.mean function, I keep getting the same result, so I was wondering what I am doing wrong.
Below is some information about the dataset:
dput(head(new))
structure(list(comp.1 = c(0.5, 0.25, 0, 0.25, 0.31, 0.3), comp.2 = c(0.3,
0.15, 0, 0.15, 0, 0), comp.3 = c(0.2, 0.6, 1, 0.6, 0.69, 0.7),
genderMale = c(0, 1, 1, 1, 0, 0), SeniorCitizen = c(0, 0,
0, 0, 0, 0), PartnerYes = c(1, 0, 0, 0, 0, 0), DependentsYes = c(0,
0, 0, 0, 0, 0), tenure = c(-1.28015700354285, 0.064298112878097,
-1.23941593940889, 0.512449818351747, -1.23941593940889,
-0.994969554605076), MultipleLinesYes = c(0, 0, 0, 0, 0,
1), `InternetServiceFiber optic` = c(0, 0, 0, 0, 1, 1), OnlineSecurityYes = c(0,
1, 1, 1, 0, 0), OnlineBackupYes = c(1, 0, 1, 0, 0, 0), DeviceProtectionYes = c(0,
1, 0, 1, 0, 1), TechSupportYes = c(0, 0, 0, 1, 0, 0), StreamingTVYes = c(0,
0, 0, 0, 0, 1), StreamingMoviesYes = c(0, 0, 0, 0, 0, 1),
`ContractOne year` = c(0, 1, 0, 1, 0, 0), `ContractTwo year` = c(0,
0, 0, 0, 0, 0), PaperlessBillingYes = c(1, 0, 1, 0, 1, 1),
`PaymentMethodCredit card (automatic)` = c(0, 0, 0, 0, 0,
0), `PaymentMethodElectronic check` = c(1, 0, 0, 0, 1, 1),
`PaymentMethodMailed check` = c(0, 1, 1, 0, 0, 0), MonthlyCharges = c(-1.16161133177258,
-0.260859369930086, -0.363897417225722, -0.747797238601399,
0.196164226945719, 1.15840663636787), TotalCharges = c(1.47494433546539,
3.27634689625303, 2.03402652377511, 3.26499480914874, 2.18084241464668,
2.91407858538911)), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")
As you can see, I have 3 components (comp.1, comp.2, comp.3). All of these components have their posterior probabilities, and I am wondering how I can get the weighted averages for each of them and the final weighted average. I have tried:
weighted.mean(new$comp.1, new$SeniorCitizen)
weighted.mean(new$comp.2, new$SeniorCitizen)
weighted.mean(new$comp.3, new$SeniorCitizen)
This gave me the output 0.24, 0.14 and 0.61. But irrespective of which variable I use as the weights, I get the same output. What am I doing wrong?
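For reference, weighted.mean(x, w) computes sum(w * x) / sum(w), so one way to see what the weights are doing is to reproduce it by hand. A minimal check on the full data set (note that in the six rows shown, SeniorCitizen is all zero, which would give NaN):
x <- new$comp.1
w <- new$SeniorCitizen
c(by_hand = sum(w * x) / sum(w), builtin = weighted.mean(x, w))
# swapping in a different weight column should normally change the result,
# unless the new weights are proportional to the old ones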

Related

Aggregate similar constructs/ FA with binary variables

In order to reduce the number of constructs, I would like to aggregate the following data frame, which contains only binary variables corresponding to "yes/no" (first 10 rows shown). The original data frame contains 169 rows.
outcome <-
structure(list(Q9_Automazione.processi = c(0, 0, 0, 0, 0, 0,
1, 1, 1, 0), Q9_Velocita.Prod = c(1, 0, 0, 1, 0, 0, 1, 1, 1,
0), Q9_Flessibilita.Prod = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1), Q9_Controllo.processi = c(0,
0, 0, 1, 0, 0, 1, 1, 0, 0), Q9_Effic.Magazzino = c(0, 0, 0, 1,
0, 0, 0, 0, 0, 0), Q9_Riduz.Costi = c(0, 1, 0, 0, 0, 0, 0, 0,
0, 1), Q9_Miglior.Sicurezza = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
Q9_Connett.Interna = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), Q9_Connett.Esterna = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Virtualizzazione = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Q9_Innov.Prod = c(0, 0, 0, 0, 0,
1, 0, 0, 0, 1), Q9_Person.Prod = c(0, 1, 0, 1, 0, 1, 0, 0,
0, 1), Q9_Nuovi.Mercati = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Q9_Nuovi.BM = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.Energ = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.SostAmb = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")
I have tried performing factor analysis via the tetrachoric method, both on the obtained correlation matrix (the value returned by the KMO function turns out to be inadequate) and directly on the data frame using tetrachoric correlations in the fa function (with cor = "tet" I get a negative Tucker-Lewis Index).
I have been reading up on this but cannot find a methodology that is adequate and whose correctness I am sure of.
So basically what I would like to achieve is to aggregate similar constructs, e.g. assess whether column 5 has value 1 (i.e. "yes") almost always when column 11 has value 1, and then aggregate them.
Here is the code that I tried to use:
library(psych)
library(corrplot)
library(ggcorrplot)
library(magrittr)   # for the %>% pipe

tet <- tetrachoric(outcome)
corr_matrix <- tet$rho   # tetrachoric correlation matrix used below
corrplot(corr_matrix, "ellipse", tl.cex = 0.75, tl.col = "black")
par(mfrow = c(1, 2))
corr_matrix %>%
  ggcorrplot(show.diag = FALSE,
             type = "lower",
             lab = TRUE,
             lab_size = 2)
KMO(corr_matrix)
cortest.bartlett(corr_matrix, n = 169)             # 169 rows in the full data
fa.parallel(corr_matrix, n.obs = 169, fm = "ml")
factor <- fa(corr_matrix, nfactors = 3, rotate = "oblimin", fm = "ml", n.obs = 169)
print(factor, cut = 0.3, digits = 3)
# -------- Pearson --------
cor(outcome, method = "pearson", use = "pairwise.complete.obs") %>%
  ggcorrplot(show.diag = FALSE,
             type = "lower",
             lab = TRUE,
             lab_size = 2)
KMO(outcome)
cortest.bartlett(outcome)
fa.parallel(outcome)
factor1 <- fa(outcome, nfactors = 3, rotate = "oblimin", cor = "tet", fm = "ml")
print(factor1, cut = 0.3, digits = 3)
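This is not a validated factor-analytic methodology, but one rough way to get at "column 5 is almost always 1 when column 11 is 1" is to cluster the items on their tetrachoric correlations (reusing tet from above) and collapse each cluster into a single yes/no indicator. A sketch, intended for the full 169-row data (the constant columns in the 10-row sample give undefined correlations):
item_dist <- as.dist(1 - tet$rho)        # high correlation -> small distance
item_clust <- hclust(item_dist, method = "average")
plot(item_clust)                         # inspect which items group together
groups <- cutree(item_clust, k = 4)      # k = 4 clusters is an arbitrary choice
# aggregate within each cluster, e.g. "1 if any item in the cluster is 1"
agg <- sapply(split(names(groups), groups), function(cols)
  as.integer(rowSums(outcome[cols]) > 0))
head(agg)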

Changing character columns into numeric columns by dividing them into categories (1, 2, 3, 4)

I have a data set I'm trying to run a glm regression on; however, it contains character columns for age category, race, and comorbidity class. I would like to change those columns into numeric variables so the regression can accept them. Data below. I want to change TBI.iracecat2 into (Hispanic = 1, Black = 2, White = 3, and other = 4), and do the same with age (Age 18-28 = 1, 29-46 = 2, 47-64 = 3, and > 64 = 4) and with NISS (NISS 0-10 = 1, NISS 11-20 = 2, NISS 21-30 = 3, NISS 31-40 = 4, NISS 41-50 = 5, NISS 51-60 = 6, NISS 61-70 = 7, NISS > 70 = 8).
Please find a summary of the data below:
TBI.crani = c(0, 0, 0, 0, 0, 0), TBI.vte = c(0,
0, 0, 0, 0, 0), TBI.FEMALE = c(0, 0, 1, 0, 1, 0), TBI.iracecat2 = c("Whites",
"Whites", "Whites", "Hispanics", "Whites", "Blacks"), TBI.agecat = c("Age 47-64",
"Age 29-46", "Age > 64", "Age 29-46", "Age 18-28", "Age 18-28"
), TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30",
"NISS 11-20", "NISS 11-20", "NISS 0-10"), TBI.LOS = c(5, 8, 1,
3, 19, 1), TBI.hospitalteach = c(0, 0, 1, 1, 1, 1), TBI.largebedsize = c(1,
1, 1, 1, 1, 1), TBI.CM_ALCOHOL = c(0, 0, 0, 1, 0, 0), TBI.CM_ANEMDEF = c(0,
0, 0, 0, 0, 0), TBI.CM_BLDLOSS = c(0, 0, 0, 0, 0, 0), TBI.CM_CHF = c(1,
0, 0, 0, 0, 0), TBI.CM_CHRNLUNG = c(0, 0, 0, 0, 0, 0), TBI.CM_COAG = c(0,
0, 0, 0, 1, 0), TBI.CM_HYPOTHY = c(0, 0, 0, 0, 0, 0), TBI.CM_LYTES = c(0,
0, 0, 0, 0, 0), TBI.CM_METS = c(0, 0, 0, 0, 0, 0), TBI.CM_NEURO = c(0,
0, 0, 0, 0, 0), TBI.CM_OBESE = c(0, 0, 0, 0, 0, 0), TBI.CM_PARA = c(0,
0, 0, 0, 0, 0), TBI.CM_PSYCH = c(0, 1, 0, 0, 0, 0), TBI.CM_TUMOR = c(0,
0, 0, 0, 0, 0), TBI.CM_WGHTLOSS = c(0, 0, 0, 0, 0, 0), TBI.UTI = c(0,
0, 0, 0, 0, 0), TBI.pneumonia = c(0, 0, 0, 0, 0, 0), TBI.AMI = c(0,
0, 0, 0, 0, 0), TBI.sepsis = c(0, 0, 0, 0, 0, 0), TBI.arrest = c(0,
0, 0, 0, 0, 0), TBI.spineinjury = c(0, 0, 0, 0, 0, 0), TBI.legfracture = c(0,
0, 0, 0, 0, 0), TBI_time_to_surg.NEW = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
6L), class = "data.frame")
A small tip: provide a sample data set that is just big enough to address your question.
library(data.table)
# took a small sample and changed one value to Asian
dt <- data.table(
  TBI.FEMALE = c(0, 0, 1, 0, 1, 0),
  TBI.iracecat2 = c("Whites", "Whites", "Asian", "Hispanics", "Whites", "Blacks")
)
# define the race groups; note I did not define Asian
convert_race <- c("Hispanics" = 1, "Blacks" = 2, "Whites" = 3)  # anything else stays undefined
dt[, TBI.irace2 := lapply(TBI.iracecat2, function(x) convert_race[x]), by = TBI.iracecat2]
dt[is.na(TBI.irace2), TBI.irace2 := 4]
dt
#    TBI.FEMALE TBI.iracecat2 TBI.irace2
# 1:          0        Whites          3
# 2:          0        Whites          3
# 3:          1         Asian          4
# 4:          0     Hispanics          1
# 5:          1        Whites          3
# 6:          0        Blacks          2
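The same named-vector lookup works for the age and NISS columns. A sketch, assuming the data.table also holds the TBI.agecat and TBI.nisscategory columns from the full data (the labels for the higher NISS categories are taken from the question and should be checked against the actual data):
convert_age <- c("Age 18-28" = 1, "Age 29-46" = 2, "Age 47-64" = 3, "Age > 64" = 4)
convert_niss <- c("NISS 0-10" = 1, "NISS 11-20" = 2, "NISS 21-30" = 3, "NISS 31-40" = 4,
                  "NISS 41-50" = 5, "NISS 51-60" = 6, "NISS 61-70" = 7, "NISS > 70" = 8)
# look the labels up directly; unname() drops the character names from the result
dt[, TBI.age2  := unname(convert_age[TBI.agecat])]
dt[, TBI.niss2 := unname(convert_niss[TBI.nisscategory])]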

Adding 2 DF's of Different Sizes Together

I have two DF's:
passesComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 0, 2), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 1, 0, 0, 1), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 1, 1), `Good Pass` = c(2,
2, 1, 1, 0, 3, 6), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1, 0
), Turnover = c(0, 0, 0, 0, 0, 1, 0), totalEvents = c(2, 2, 2,
1, 0, 6, 7)), row.names = c("P1", "P2", "P3", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
of size 7x15, and
copyComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 2), P4_Good = c(0, 0, 0, 0, 0, 0), P5_Good = c(0,
0, 1, 0, 0, 1), P1_Bad = c(0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0,
0, 0, 0, 0), P3_Bad = c(0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 0,
0, 0, 0), P5_Bad = c(0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0,
0, 0, 1, 0), `Good Pass` = c(2, 1, 1, 0, 3, 4), `Intercepted Pass` = c(0,
0, 0, 0, 1, 0), Turnover = c(0, 0, 0, 0, 1, 0), totalEvents = c(2,
1, 1, 0, 6, 4)), row.names = c("P1", "P2", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
or simply,
copyComb <- passesComb
copyComb <- copyComb[-3,-3]
#Updating specific cells since [3,3] is removed
copyComb[2,11] <- 1
copyComb[2,14] <- 1
copyComb[6,8] <- 0
copyComb[6,3] <- 0
copyComb[6,10] <- 0
copyComb[6,11] <- 4
copyComb[6,14] <- 4
#This now equals the copyComb from dput() above
of size 6x14.
I am trying to combine/add these two df's together based on matching row/column names. I tried to achieve this using the code from the answer to this post
gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),
ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb))))
gamesComb[row.names(ifelse(nrow(passesComb) >= nrow(copyComb), passesComb, copyComb)),
colnames(ifelse(ncol(passesComb) >= ncol(copyComb), passesComb, copyComb))] <- passesComb
Here, I create a df, gamesComb, and set its dimensions to whichever of passesComb or copyComb is bigger. It does create a 7x15 df, but it doesn't add the row/col names.
I am also trying to then add the two df's together cell by cell where they have the same row/col name (as in the post linked above), i.e. passesComb["P2","P1_Good"] is 1 and copyComb["P2","P1_Good"] is 1, so gamesComb["P2","P1_Good"] should be 2, and the same for all matching row/col names.
So the final result should look like:
expectedOutput <- structure(list(P1_Good = c(0, 2, 0, 0, 0, 0, 2), P2_Good = c(4,
0, 0, 0, 0, 0, 4), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 2, 0, 0, 2), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 2, 1), `Good Pass` = c(4,
3, 1, 2, 0, 6, 10), `Intercepted Pass` = c(0, 0, 0, 0, 0, 2,
0), Turnover = c(0, 0, 0, 0, 0, 2, 0), totalEvents = c(4, 3,
2, 2, 0, 12, 11)), row.names = c("P1", "P2", "P3", "P4", "P5",
"Opponent", "VT"), class = "data.frame")
Here's a dplyr/tidyr approach where I reshape each table into a long format, then join them, sum, and pivot wider again.
library(dplyr); library(tidyr); library(tibble)  # rownames_to_column() is in tibble
lengthen <- function(df) { df %>% rownames_to_column(var = "row") %>% pivot_longer(-row) }
full_join(lengthen(passesComb), lengthen(copyComb), by = c("row", "name")) %>%
  mutate(new_val = coalesce(value.x, 0) + coalesce(value.y, 0)) %>%
  select(-starts_with("value")) %>%
  pivot_wider(names_from = name, values_from = new_val)
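Note that the result of pivot_wider() is a tibble that keeps the row labels in the row column rather than as row names; if you want the same shape as expectedOutput, you can pipe the result into tibble::column_to_rownames("row").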
Another option is to stack them and then sum by rowname groups.
library(dplyr, warn.conflicts = FALSE)
library(tibble)
out <-
  rownames_to_column(passesComb) %>%
  bind_rows(rownames_to_column(copyComb)) %>%
  # bind_rows(rownames_to_column(third_table)) %>%   # if you want to add another
  select(rowname, names(passesComb)) %>%
  group_by(rowname) %>%
  summarise(across(everything(), sum, na.rm = TRUE)) %>%
  slice(match(rownames(passesComb), rowname)) %>%
  column_to_rownames('rowname')
all.equal(out, expectedOutput)
#> [1] TRUE
Created on 2021-10-09 by the reprex package (v2.0.1)
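For completeness, a base R sketch also works here, assuming (as in this example) that every row and column name of copyComb also appears in passesComb: copy the larger frame and add the smaller one into the matching cells by name.
gamesComb <- passesComb
gamesComb[rownames(copyComb), colnames(copyComb)] <-
  gamesComb[rownames(copyComb), colnames(copyComb)] + copyComb
all.equal(gamesComb, expectedOutput)   # should be TRUE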

Hierarchical clustering of a time-series

I am struggling with hierarchical clustering. I have the following time series, and I want to cluster it based on time. Would the transpose function work for this?
structure(list(`04:00` = c(0, 0, 0, 0, 0, 0), `04:10` = c(0,
0, 0, 0, 0, 0), `04:20` = c(0, 0, 0, 0, 0, 0), `04:30` = c(0,
0, 0, 0, 0, 0), `04:40` = c(0, 0, 0, 0, 0, 0), `04:50` = c(0,
0, 0, 0, 0, 0), `05:00` = c(0, 0, 0, 0, 0, 0), `05:10` = c(0,
0, 0, 0, 0, 0), `05:20` = c(0, 0, 0, 0, 0, 0), `05:30` = c(0,
0, 0, 0, 0, 0), `05:40` = c(0, 0, 0, 0, 0, 0), `05:50` = c(1,
0, 0, 0, 0, 0), `06:00` = c(1, 0, 0, 0, 0, 0), `06:10` = c(1,
0, 0, 0, 0, 0), `06:20` = c(2, 0, 0, 0, 0, 0), `06:30` = c(0,
0, 0, 0, 0, 0), `06:40` = c(0, 1, 0, 0, 0, 0), `06:50` = c(0,
2, 0, 0, 0, 1), `07:00` = c(0, 0, 0, 0, 0, 2), `07:10` = c(0,
0, 1, 0, 0, 2), `07:20` = c(0, 0, 0, 0, 0, 2), `07:30` = c(0,
0, 1, 0, 0, 0), `07:40` = c(1, 0, 1, 0, 0, 0), `07:50` = c(1,
0, 0, 0, 2, 0), `08:00` = c(1, 0, 0, 0, 0, 0), `08:10` = c(1,
0, 0, 0, 0, 0), `08:20` = c(2, 0, 0, 0, 0, 0), `08:30` = c(2,
0, 0, 0, 0, 0), `08:40` = c(2, 0, 0, 0, 0, 0), `08:50` = c(2,
0, 0, 0, 0, 0), `09:00` = c(0, 0, 0, 0, 0, 0), `09:10` = c(0,
0, 0, 0, 0, 0), `09:20` = c(0, 1, 0, 0, 0, 0), `09:30` = c(0,
1, 0, 2, 0, 0), `09:40` = c(0, 1, 0, 0, 0, 0), `09:50` = c(0,
1, 0, 0, 0, 0), `10:00` = c(0, 0, 0, 0, 0, 0), `10:10` = c(0,
0, 0, 0, 0, 0), `10:20` = c(0, 1, 0, 0, 0, 0), `10:30` = c(0,
1, 0, 0, 0, 0), `10:40` = c(0, 0, 0, 0, 0, 0), `10:50` = c(0,
0, 0, 0, 0, 0), `11:00` = c(2, 0, 0, 1, 0, 0), `11:10` = c(0,
0, 0, 1, 0, 0), `11:20` = c(0, 0, 0, 1, 0, 1), `11:30` = c(0,
0, 0, 1, 0, 1), `11:40` = c(0, 0, 0, 1, 0, 1), `11:50` = c(0,
0, 0, 1, 0, 0), `12:00` = c(0, 0, 0, 1, 2, 0), `12:10` = c(0,
0, 0, 1, 0, 0), `12:20` = c(0, 0, 0, 1, 0, 0), `12:30` = c(0,
0, 0, 1, 0, 0), `12:40` = c(0, 0, 0, 1, 0, 0), `12:50` = c(0,
0, 0, 1, 1, 0), `13:00` = c(0, 0, 0, 0, 1, 0), `13:10` = c(0,
0, 0, 0, 1, 0), `13:20` = c(0, 0, 0, 0, 1, 0), `13:30` = c(0,
0, 0, 0, 1, 0), `13:40` = c(0, 0, 0, 0, 1, 0), `13:50` = c(0,
0, 0, 0, 1, 0), `14:00` = c(0, 0, 0, 0, 1, 0), `14:10` = c(0,
0, 0, 0, 1, 0), `14:20` = c(0, 0, 0, 0, 1, 0), `14:30` = c(0,
0, 0, 0, 1, 0), `14:40` = c(0, 0, 0, 0, 1, 0), `14:50` = c(0,
0, 0, 0, 0, 0), `15:00` = c(0, 0, 0, 0, 0, 0), `15:10` = c(0,
2, 0, 0, 0, 0), `15:20` = c(0, 2, 0, 0, 1, 0), `15:30` = c(0,
2, 0, 0, 1, 1), `15:40` = c(0, 2, 0, 0, 1, 0), `15:50` = c(0,
2, 0, 0, 1, 0), `16:00` = c(0, 2, 0, 0, 1, 0), `16:10` = c(0,
2, 0, 0, 1, 0), `16:20` = c(2, 2, 0, 0, 1, 0), `16:30` = c(2,
2, 0, 0, 1, 2), `16:40` = c(2, 2, 0, 0, 1, 1), `16:50` = c(2,
2, 0, 0, 0, 1), `17:00` = c(0, 2, 0, 0, 2, 0), `17:10` = c(0,
0, 0, 0, 2, 0), `17:20` = c(0, 0, 0, 0, 2, 0), `17:30` = c(0,
0, 0, 0, 2, 0), `17:40` = c(0, 0, 0, 0, 0, 0), `17:50` = c(0,
0, 0, 0, 0, 0), `18:00` = c(0, 2, 0, 0, 0, 2), `18:10` = c(0,
2, 0, 0, 0, 2), `18:20` = c(0, 0, 0, 0, 2, 2), `18:30` = c(0,
0, 0, 0, 0, 2), `18:40` = c(0, 0, 0, 0, 0, 2), `18:50` = c(1,
0, 0, 0, 0, 2), `19:00` = c(1, 0, 0, 1, 1, 0), `19:10` = c(1,
0, 0, 1, 1, 0), `19:20` = c(1, 0, 0, 1, 1, 0), `19:30` = c(1,
0, 1, 1, 1, 0), `19:40` = c(1, 0, 1, 1, 1, 1), `19:50` = c(1,
0, 1, 1, 1, 1), `20:00` = c(0, 0, 1, 1, 1, 1), `20:10` = c(0,
0, 1, 1, 1, 1), `20:20` = c(0, 0, 1, 1, 1, 1), `20:30` = c(0,
1, 2, 1, 1, 1), `20:40` = c(0, 1, 0, 1, 1, 1), `20:50` = c(0,
1, 0, 1, 1, 1), `21:00` = c(0, 1, 0, 1, 1, 1), `21:10` = c(0,
1, 0, 0, 1, 1), `21:20` = c(0, 1, 0, 0, 1, 1), `21:30` = c(0,
1, 1, 0, 1, 1), `21:40` = c(0, 1, 1, 0, 1, 1), `21:50` = c(0,
1, 1, 0, 0, 1), `22:00` = c(0, 1, 1, 0, 0, 0), `22:10` = c(0,
1, 0, 0, 0, 0), `22:20` = c(0, 1, 0, 0, 0, 0), `22:30` = c(0,
1, 0, 0, 0, 0), `22:40` = c(0, 1, 0, 0, 0, 0), `22:50` = c(0,
1, 0, 0, 0, 0), `23:00` = c(0, 0, 0, 0, 1, 0), `23:10` = c(0,
0, 0, 0, 0, 1), `23:20` = c(0, 0, 0, 0, 0, 1), `23:30` = c(0,
0, 0, 0, 0, 1), `23:40` = c(0, 0, 0, 0, 0, 1), `23:50` = c(0,
0, 0, 0, 0, 0), `00:00` = c(0, 0, 0, 0, 0, 0), `00:10` = c(0,
0, 0, 0, 0, 0), `00:20` = c(0, 0, 0, 0, 0, 0), `00:30` = c(0,
0, 0, 0, 0, 0), `00:40` = c(0, 0, 0, 0, 0, 0), `00:50` = c(0,
0, 0, 0, 0, 0), `01:00` = c(0, 0, 0, 0, 0, 0), `01:10` = c(0,
0, 0, 0, 0, 0), `01:20` = c(0, 0, 0, 0, 0, 0), `01:30` = c(0,
0, 0, 0, 0, 0), `01:40` = c(0, 0, 0, 0, 0, 0), `01:50` = c(0,
0, 0, 0, 0, 0), `02:00` = c(0, 0, 0, 0, 0, 0), `02:10` = c(0,
0, 0, 0, 0, 0), `02:20` = c(0, 0, 0, 0, 0, 0), `02:30` = c(0,
0, 0, 0, 0, 0), `02:40` = c(0, 0, 0, 0, 0, 0), `02:50` = c(0,
0, 0, 0, 0, 0), `03:00` = c(0, 0, 0, 0, 0, 0), `03:10` = c(0,
0, 0, 0, 0, 0), `03:20` = c(0, 0, 0, 0, 0, 0), `03:30` = c(0,
0, 0, 0, 0, 0), `03:40` = c(0, 0, 0, 0, 0, 0), `03:50` = c(0,
0, 0, 0, 0, 0)), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
I managed to run hierarchical clustering, but only on the cases and not on the time points:
d_distance <- dist(as.matrix(df))
plot(hclust(d_distance))
The plot that I generated
As you can see on the plot, the end points (leaves) are labelled with row indexes. How can I have times there instead of indexes (maybe by transposing)? I would also like to plot the time-series clusters separately, like the plot below. Would dtw be better than hierarchical clustering?
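To cluster on time rather than on cases, transposing first does work: each time point then becomes a row of the distance matrix, and the dendrogram leaves are labelled with the times. A minimal sketch, with df the data frame above:
time_dist <- dist(t(as.matrix(df)))   # distances between the 144 time columns
time_hc <- hclust(time_dist)
plot(time_hc, cex = 0.4)              # leaf labels are now the times, e.g. "04:00"
# cut the tree into a chosen number of clusters, e.g. 4, and list the times per cluster
split(colnames(df), cutree(time_hc, k = 4))
A DTW-based distance (e.g. via the dtw or dtwclust packages) is mainly useful when clustering the series themselves while allowing for shifts in time; for grouping the time columns, the plain Euclidean distance above is a reasonable starting point.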

Excluding one of the columns from the mean calculation

I have a data.frame like this:
> dput(head(dat))
structure(list(`Gene name` = c("at1g01050", "at1g01080", "at1g01090",
"at1g01220", "at1g01320", "at1g01420"), `1_1` = c(0, 0, 0, 0,
0, 0), `1_2` = c(0, 0, 0, 0, 0, 0), `1_3` = c(0, 2.2266502274762,
0, 0, 0, 0), `1_4` = c(0, 1.42835007256373, 0, 0, 0, 0), `1_5` = c(0,
1, 0, 0, 0, 0.680307288653971), `1_6` = c(0, 0.974694551708235,
0.0703315834738149, 0, 0, 1.5411058346636), `1_7` = c(1, 1.06166030205396,
0, 0, 0, 0), `1_8` = c(1, 1.07309874414745, 0.129442847788922,
0, 0, 0), `1_9` = c(1.83566164452602, 0.770848509662441, 1.16522133036595,
1.02360016370994, 0, 0), `1_10` = c(0, 0, 0.96367393959757, 0,
0, 0), `1_11` = c(0, 1, 1.459452636222, 0, 0.992067202742928,
0), `1_12` = c(0, 0, 0.670100384155585, 0, 0.461601636474094,
0), `1_13` = c(0, 0, 1.43074917909221, 0, 1.35246977730244, 0
), `1_14` = c(0, 0, 1.13052717277684, 0, 1.27971261718285, 0),
`1_15` = c(0, 0, 0, 0, 0, 0), `1_16` = c(0, 0, 1.02186950513655,
0, 0.937805171752374, 0), `1_17` = c(0, 0, 0, 0, 1.82226410514639,
0), `1_18` = c(0, 0, 1.2057581396188, 0, 1, 0), `1_19` = c(0,
0, 2.54080080087007, 0, 1.74014162763125, 0), `1_20` = c(0,
0, 0, 0, 0, 0), `1_21` = c(0, 0, 1.85335086627868, 0, 2.93605031878879,
0), `1_22` = c(0, 0, 0, 0, 0, 0), `1_23` = c(0, 0, 0, 0,
0, 0), `1_24` = c(0, 0.59685787388353, 4.74450895485671,
0, 1.64665192735547, 0), `1_25` = c(0, 0, 0, 0, 0, 0), `1_26` = c(0,
0, 0, 0, 0, 0), `1_27` = c(0, 1.70324142554566, 0, 0, 0,
0), `1_28` = c(0, 4.02915818089525, 0, 0, 0, 0), `1_29` = c(0,
1.10050253348262, 0, 0, 0, 1.78705663080963), `1_30` = c(0,
0, 0, 0, 0, 0), `1_31` = c(0.525193634811661, 1.19203674964562,
0, 0, 0, 0), `1_32` = c(0.949695564218912, 0.511935958918944,
0.698256748091399, 0.924419021307232, 0, 0), `1_33` = c(1,
0.392202418854686, 0.981531026331928, 1, 0, 0), `1_34` = c(0,
0, 1.04480642952605, 0, 0, 0), `1_35` = c(0.875709646300199,
0.416787083481068, 0.910412293707794, 0, 0.931813162802324,
0), `1_36` = c(0.235817844851986, 0, 0.695496044366791, 0,
0, 0), `1_37` = c(0, 0, 0, 0, 0, 0), `1_38` = c(0, 0, 0,
0, 0, 0), `1_39` = c(0, 0, 0, 0, 0, 0), `1_40` = c(0, 0.426301584359177,
1.05916031917965, 0, 1.11716924423855, 0), `1_41` = c(0,
0, 0, 0, 0, 0), `1_42` = c(0, 0, 0, 0, 0, 0), `1_43` = c(0,
0, 0, 0, 0, 0), `1_44` = c(0, 0.817605484758179, 1, 0, 1,
0), `1_45` = c(0, 0, 0, 0, 1.83706702696725, 0), `1_46` = c(0,
0, 0, 0, 0, 0), `1_48` = c(0, 0, 0, 0, 0, 0), `1_49` = c(0,
0, 0, 0, 0, 0), `1_50` = c(0, 0, 0, 0, 0, 0), `1_51` = c(0,
0.822966241998042, 0, 0, 0, 0), `1_52` = c(0, 1.38548267401525,
0, 0, 0, 0), `1_53` = c(0, 0.693090058304095, 0, 0, 0, 1.200664746484
), `1_54` = c(0, 7.58136662752864, 0, 0, 0, 0), `1_55` = c(0.519878111919004,
0.530809413647805, 0.343274113384907, 0, 0, 0), `1_56` = c(1.24511715957891,
0.545097856366912, 0.397440073804376, 0, 0, 0), `1_57` = c(1.26748496499576,
0.502893153188496, 1, 1.09278985531586, 0, 0), `1_58` = c(0.696198684496234,
0.68197003689249, 1.30108437738319, 0.778091049180591, 0.533017938104689,
0), `1_59` = c(1.15255606344999, 0.294294436704185, 1.07862692616479,
1, 0.250091116406616, 0), `1_60` = c(1.95634163405497, 0,
1.1602014253913, 0, 0, 0), `1_61` = c(1.09287167009628, 0,
2.05939536537347, 1.08165521287259, 0.68027384701565, 0),
`1_62` = c(0.791776166968497, 0, 0.846107162142824, 0, 0.77013323652256,
0), `1_63` = c(0.378787010943447, 0.391876271945063, 0.623223753921758,
0, 0.651918444771296, 0), `1_64` = c(0.189585762007804, 0.361452381684218,
0.799519726870751, 0, 1.06818683719768, 0), `1_65` = c(0,
0, 2.5212953775211, 0, 0, 0), `1_66` = c(0, 0, 0, 0, 0, 0
), `1_67` = c(0, 0, 0, 0, 2.44827717262786, 0), `1_68` = c(0,
0, 0, 0, 0, 0), `1_69` = c(0, 0, 0, 0, 0, 0), `1_70` = c(0,
0, 2.36142611074334, 0, 2.391093649557, 0), `1_71` = c(0,
0, 0.35565044656798, 0, 0, 0), `1_72` = c(0, 0, 5.86951313801941,
0, 0, 0)), .Names = c("Gene name", "1_1", "1_2", "1_3", "1_4",
"1_5", "1_6", "1_7", "1_8", "1_9", "1_10", "1_11", "1_12", "1_13",
"1_14", "1_15", "1_16", "1_17", "1_18", "1_19", "1_20", "1_21",
"1_22", "1_23", "1_24", "1_25", "1_26", "1_27", "1_28", "1_29",
"1_30", "1_31", "1_32", "1_33", "1_34", "1_35", "1_36", "1_37",
"1_38", "1_39", "1_40", "1_41", "1_42", "1_43", "1_44", "1_45",
"1_46", "1_48", "1_49", "1_50", "1_51", "1_52", "1_53", "1_54",
"1_55", "1_56", "1_57", "1_58", "1_59", "1_60", "1_61", "1_62",
"1_63", "1_64", "1_65", "1_66", "1_67", "1_68", "1_69", "1_70",
"1_71", "1_72"), row.names = c(NA, 6L), class = "data.frame")
That's the code I use for calculating the mean of the 3 replicates that I have in the data frame:
## Calculating the mean of 3 "replicates"
ind <- c(1, 25, 49)
dat2 <- dat[-1]
tbl_end <- cbind(dat[1], sapply(0:23, function(i) rowMeans(dat2[ind+i])))
That's the error that comes up:
Error in `[.data.frame`(dat2, ind + i) : undefined columns selected
Called from: eval(substitute(browser(skipCalls = pos), list(pos = 9 - frame)),
envir = sys.frame(frame))
I have 71 columns of results (it should be 72, because I have 24 fractions and 3 replicates, which gives 72 in total), but one column is missing. I have no idea why it's missing, but anyway I have to deal with it. There is no 1_47, which should go together with 1_23 and 1_71. Do you have any idea how I can edit my function to just ignore fraction 1_47 and still get the mean of 1_23 and 1_71?
Why not just add in a dummy column for 1_47? That will make your data more regular and make it much easier to extract the indexes you need. To do this, try:
dat2 <- cbind(dat[1:47], `1_47` = rep(NA, nrow(dat)), dat[48:72])
ind <- c(1, 25, 49)
tbl_end <- cbind(dat[1], sapply(0:23, function(i) rowMeans(dat2[ind + i + 1], na.rm = TRUE)))
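An alternative sketch that picks the replicate columns by name rather than by position, so any column that happens to be missing (here 1_47) is simply skipped. It assumes, as in the original code, that fraction f has replicates 1_f, 1_(f+24) and 1_(f+48):
# average replicates by column name, ignoring columns that are absent
frac_means <- sapply(1:24, function(f) {
  cols <- intersect(paste0("1_", f + 24 * 0:2), names(dat))
  rowMeans(dat[cols])
})
colnames(frac_means) <- paste0("fraction_", 1:24)
tbl_end <- cbind(dat[1], frac_means)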
