Reshape long to wide with multiple variables - r

I have a long df in R that looks like this:
structure(list(pta = c("636", "899", "989", "1007", "561"), cafta_similarity = c(0.81468368791454,
0.68814557488039, 0.96371483934995, 0.71527668922595, 0.69435303348955
), iso3n = c(124, 124, 124, 124, 152), ccode = c(20, 20, 20,
20, 155), country = c("Canada", "Canada", "Canada", "Canada",
"Chile"), year = c("1992", "2016", "2018", "2018", "1960"), gdppc = c(20879.8483300891,
42315.6037056806, 46548.6384108296, 46548.6384108296, 505.349325487754
), polity2 = c(10, 10, 10, 10, 5), openness = c(52.7380309449972,
65.3636199818813, 66.5818530921921, 66.5818530921921, 46.9745037862152
), hog_right = c(3, 0, 0, 0, 3), hog_left = c(0L, 1L, 1L, 1L,
0L), hog_center = c(0, 0, 0, 0, 0)), class = c("data.table",
"data.frame"), row.names = c(NA, -5L))
What I am trying to do is reshape it to wide format across all variables so that I can compute averages at the dyadic level. Basically, I want columns such as pta.1, pta.2, iso3n.1, iso3n.2, and so on. Does anyone know how I can do this?
I have looked at most answers here on reshaping data and tried some, but nothing seems to work.

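Perhaps this helps. dcast() from data.table accepts several value.var columns at once, and rowid(country, year) numbers the rows within each country-year pair, which produces the .1, .2 suffixes: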
library(data.table)
dcast(df, country + year ~ rowid(country, year), value.var = c("pta", "iso3n"), sep = ".")
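To see what the call above produces, here is a minimal sketch on a cut-down copy of the question's data (only the country, year, pta and iso3n columns, with values taken from the posted sample). It assumes a reasonably recent data.table; the object names df and wide are just for illustration.

```r
library(data.table)

# Cut-down version of the question's data: two rows share Canada/2018
df <- data.table(
  country = c("Canada", "Canada", "Canada", "Canada", "Chile"),
  year    = c("1992", "2016", "2018", "2018", "1960"),
  pta     = c("636", "899", "989", "1007", "561"),
  iso3n   = c(124, 124, 124, 124, 152)
)

# rowid() counts rows within each country-year pair (1, 2, ...),
# and each value.var is spread into one column per count
wide <- dcast(df, country + year ~ rowid(country, year),
              value.var = c("pta", "iso3n"), sep = ".")

names(wide)
wide
```

Country-year pairs with only one row get NA in the .2 columns, so any averages computed across the .1 and .2 columns would probably need na.rm = TRUE.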

Related

How do I combine data taken separately into a single dataset?

I have a dataset comprised of leaves which I've weighed individually in order of emergence (first emerged through final emergence), and I'd like to combine these masses so that I have the entire mass of all the leaves for each individual plant.
How would I add these up in R, or what should I search for to get started on figuring this out?
structure(list(Tray = c(1, 1, 1, 1, 1, 1), Plant = c(2, 2, 2,
2, 3, 3), Treatment = structure(c(4L, 4L, 4L, 4L, 4L, 4L), .Label = c("2TLH",
"E2TL", "EH", "WL"), class = "factor"), PreSwitch = c("Soil",
"Soil", "Soil", "Soil", "Soil", "Soil"), PostSwitch = c("Soil",
"Soil", "Soil", "Soil", "Soil", "Soil"), Pellet = c(1, 1, 1,
1, 1, 1), Rep = c(1, 1, 1, 1, 1, 1), Date = structure(c(1618963200,
1618963200, 1618963200, 1618963200, 1618963200, 1618963200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), DAP = c(60, 60, 60, 60, 60, 60), Position = c(2,
1, 3, 4, 4, 3), Whorl = structure(c(1L, 1L, 2L, 2L, 2L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), PetioleLength = c(1.229,
1.365, 1.713, 1.02, 0, 1.408), BladeLength = c(1.604, 1.755,
2.466, 2.672, 0.267, 2.662), BladeWidth = c(1.023, 1.185, 1.803,
1.805, 0.077, 1.771), BladeArea = c(1.289, 1.634, 3.492, 3.789,
0.016, 3.704), BladePerimeter = c(6.721, 7.812, 11.61, 12.958,
1.019, 14.863), BladeCircularity = c(0.359, 0.336, 0.326, 0.284,
0.196, 0.211), BPR = c(1.30512611879577, 1.28571428571429, 1.43957968476357,
2.61960784313725, NA, 1.890625), Leaf.Mass = c(9, 11, 31, 33,
32, 33), BladeAR = c(1.56793743890518, 1.48101265822785, 1.36772046589018,
1.4803324099723, 3.46753246753247, 1.50310559006211), Subirrigation = c(0,
0, 0, 0, 0, 0), Genotype = c(1, 1, 1, 1, 1, 1), Location = c(0,
0, 0, 0, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I may be missing something, but isn't this just a sum by Plant?
The first solution below sums the leaf masses for each plant into a separate table that holds only the totals. The second computes the same totals and adds them back to the main data set as a new column in a single step.
library(tidyverse)
# summary data set: one row per plant
plant_total <- df %>% group_by(Plant) %>% summarize(plant_weight = sum(Leaf.Mass, na.rm = TRUE))
# add a plant_weight column to the df data set (every leaf row keeps its plant's total)
df <- df %>% group_by(Plant) %>% mutate(plant_weight = sum(Leaf.Mass, na.rm = TRUE)) %>% ungroup()
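For anyone without the tidyverse installed, the same two results can be sketched in base R. This is only an illustration of the idea, not part of the original answer: it uses the Plant and Leaf.Mass values from the posted sample, and the name leaves is made up for the example.

```r
# Plant and Leaf.Mass columns copied from the question's sample data
leaves <- data.frame(
  Plant     = c(2, 2, 2, 2, 3, 3),
  Leaf.Mass = c(9, 11, 31, 33, 32, 33)
)

# Separate table with one total per plant
plant_total <- aggregate(Leaf.Mass ~ Plant, data = leaves, FUN = sum)
names(plant_total)[2] <- "plant_weight"

# Same totals added back to every leaf row of the main data
leaves$plant_weight <- ave(leaves$Leaf.Mass, leaves$Plant,
                           FUN = function(x) sum(x, na.rm = TRUE))

plant_total   # plant 2 totals 84, plant 3 totals 65
leaves
```

If a plant needs to be identified by more than the Plant column (for example Tray plus Plant), the grouping would have to include those columns as well.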

Label group of plots

I merged nine plots together and I would like to group them based on different characteristics (A, B, C). Is there a simple way to add labels or annotations at the bottom of the plots? When using cowplot or gridExtra I receive the following error:
In as_grob.default(plot) :
Cannot convert object of class list into a grob.
Sample data
list(list(stats = structure(c(43, 96.5, 297.5, 707.5, 778), .Dim = c(5L,
1L)), n = 36, conf = structure(c(136.603333333333, 458.396666666667
), .Dim = 2:1), out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(2, 10.5, 55.5, 102, 128), .Dim = c(5L,
1L)), n = 36, conf = structure(c(31.405, 79.595), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(1, 3, 5.5, 77, 88), .Dim = c(5L,
1L)), n = 36, conf = structure(c(-13.9866666666667, 24.9866666666667
), .Dim = 2:1), out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(531, 632.5, 701, 726.5, 786), .Dim = c(5L,
1L)), n = 36, conf = structure(c(676.246666666667, 725.753333333333
), .Dim = 2:1), out = c(485, 464, 446), group = c(1, 1, 1
), names = ""), list(stats = structure(c(104,
109.5, 113.5, 121, 125), .Dim = c(5L, 1L)), n = 36, conf = structure(c(110.471666666667,
116.528333333333), .Dim = 2:1), out = c(91, 91, 88, 84, 84,
79), group = c(1, 1, 1, 1, 1, 1), names = ""),
list(stats = structure(c(28, 53.5, 83.5, 88, 91), .Dim = c(5L,
1L)), n = 36, conf = structure(c(74.415, 92.585), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(80, 89, 102.5, 153, 236), .Dim = c(5L,
1L)), n = 36, conf = structure(c(85.6466666666667, 119.353333333333
), .Dim = 2:1), out = c(343, 318, 299, 257), group = c(1,
1, 1, 1), names = ""), list(stats = structure(c(7,
12, 22.5, 44, 72), .Dim = c(5L, 1L)), n = 36, conf = structure(c(14.0733333333333,
30.9266666666667), .Dim = 2:1), out = numeric(0), group = numeric(0),
names = ""), list(stats = structure(c(5,
5, 6, 12.5, 21), .Dim = c(5L, 1L)), n = 36, conf = structure(c(4.025,
7.975), .Dim = 2:1), out = numeric(0), group = numeric(0),
names = ""))
Many thanks
I agree with the idea of using ggplot2 graphics with facets, but given your plot objects, you could do something like this (to get you started). I used ggplotify instead of cowplot because I ran into trouble with the figure margins, but you might be able to fix that by changing the null device (not tested).
Edit:
Added individual labels and y axis labels, as well as outer margins. You might have to adjust some of that depending on the output size of your composite plot. This may show you how you could adjust those settings for individual plots. Still, using ggplot2 to generate the plots would make things quite a bit easier.
library(grid)
library(gridExtra)
library(ggplotify)
sdt <- list(list(stats = structure(c(43, 96.5, 297.5, 707.5, 778), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(136.603333333333, 458.396666666667), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(2, 10.5, 55.5, 102, 128), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(31.405, 79.595), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(1, 3, 5.5, 77, 88), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(-13.9866666666667, 24.9866666666667), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(531, 632.5, 701, 726.5, 786), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(676.246666666667, 725.753333333333), .Dim = 2:1),
out = c(485, 464, 446), group = c(1, 1, 1), names = ""),
list(stats = structure(c(104, 109.5, 113.5, 121, 125), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(110.471666666667, 116.528333333333), .Dim = 2:1),
out = c(91, 91, 88, 84, 84, 79), group = c(1, 1, 1, 1, 1, 1), names = ""),
list(stats = structure(c(28, 53.5, 83.5, 88, 91), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(74.415, 92.585), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(80, 89, 102.5, 153, 236), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(85.6466666666667, 119.353333333333), .Dim = 2:1),
out = c(343, 318, 299, 257), group = c(1,1, 1, 1), names = ""),
list(stats = structure(c(7, 12, 22.5, 44, 72), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(14.0733333333333, 30.9266666666667), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""),
list(stats = structure(c(5, 5, 6, 12.5, 21), .Dim = c(5L, 1L)),
n = 36, conf = structure(c(4.025, 7.975), .Dim = 2:1),
out = numeric(0), group = numeric(0), names = ""))
# Panel titles: A1..A3, B1..B3, C1..C3
sublabels <- paste0(rep(LETTERS[1:3], each = 3), 1:3)
# Turn each base-graphics boxplot into a grob so gridExtra can arrange it
gplts <- lapply(1:9, function(x) as.grob(function(y = sdt[[x]]) {
  par(oma = c(0, 3, 0, 3))
  bxp(y, ylab = "values", main = sublabels[x])
}))
# Draw three coloured frames, one per group, on a new page
grid.arrange(rectGrob(gp = gpar(col = "red")), rectGrob(gp = gpar(col = "green")),
             rectGrob(gp = gpar(col = "yellow")), nrow = 1, newpage = TRUE)
# Write a group label near the bottom of each frame
vp <- viewport(.33/2, 0.45, gp = gpar(col = "red"))
grid.text("Group A",
          y = .1, just = c("center", "bottom"),
          gp = gpar(fontsize = 20), vp = vp)
vp <- viewport(.5, .45, gp = gpar(col = "green"))
grid.text("Group B",
          y = .1, just = c("center", "bottom"),
          gp = gpar(fontsize = 20), vp = vp)
vp <- viewport(1 - (.33/2), .45, gp = gpar(col = "yellow"))
grid.text("Group C",
          y = .1, just = c("center", "bottom"),
          gp = gpar(fontsize = 20), vp = vp)
# Overlay the nine boxplots on the same page, on top of the frames
grid.arrange(grobs = gplts, nrow = 1, newpage = FALSE)
Created on 2021-03-25 by the reprex package (v1.0.0)

How to summarise (dplyr) user specified variables reactively in flexdashboard/shiny?

I am trying to develop a shiny dashboard app that is able to produce a bar graph for different outcome variables that can be selected by the user. To do so, I need to subset my data reactively to generate aggregate data frames. I am able to have the code below successfully filter my data reactively, but I am running into trouble when I try to use dplyr::summarise() reactively.
Here is my data
dput(head(df))
structure(
list(
geoid = c(
"01001020200",
"01001020300",
"01001020700",
"01001020802",
"01001021000",
"01001021100"
),
state = c(
"Alabama",
"Alabama",
"Alabama",
"Alabama",
"Alabama",
"Alabama"
),
county = c(
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County",
"Autauga County"
),
ozzone = structure(
c(1L, 1L, 2L, 1L, 1L, 1L),
.Label = c("non.oz", "oz"),
class = "factor"
),
tract_type = c(
"LICs",
"Contiguous",
"LICs",
"Contiguous",
"Contiguous",
"LICs"
),
investment_score_1_low_10_high = c(4,
6, 9, 10, 5, 6),
socioeconomic_change_flag_1_yes_blank_no = c(0,
0, 0, 0, 0, 0),
fips_county = c("01001", "01001", "01001", "01001",
"01001", "01001"),
total_empl = c(51809L, 51809L, 51809L, 51809L,
51809L, 51809L),
total_payroll = c(338395L, 338395L, 338395L,
338395L, 338395L, 338395L),
total_establishments = c(5090L, 5090L,
5090L, 5090L, 5090L, 5090L),
largest_employer = c(72L, 72L, 72L,
72L, 72L, 72L),
largest_employer_bypayroll = c(44L, 44L, 44L,
44L, 44L, 44L),
trend_employee_change = c(
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046,
2735.60000000046
),
trend_payroll_change = c(
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037,
23074.8000000037
),
trend_establishment_change = c(
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084,
53.4000000000084
),
damage_cost_weather_total = c(20000, 20000, 20000, 20000,
20000, 20000),
deaths_weather_total = c(0L, 0L, 0L, 0L, 0L, 0L),
medianrent = c(537, 633, 525, 680, 409, 303),
vacancyrate = c(
0.108200455580866,
0.113652113652114,
0.0436681222707424,
0.0512166859791425,
0.229962546816479,
0.21030303030303
),
total_pop = c(503, 827, 900, 2989, 740, 813),
undertwo_percent = c(
0.391650099403579,
0.351874244256348,
0.397777777777778,
0.17096018735363,
0.301351351351351,
0.263222632226322
),
mobility_rate = c(
0.133702166897188,
0.0737753882915173,
0.196514423076923,
0.172716680111141,
0.0641304347826087,
0.0681084570690769
),
unemploy_rate = c(
0.0176991150442478,
0.0273203592814371,
0.109881724532621,
0.0127906976744186,
0.0344982078853047,
0.0281910728269381
),
median_income = c(41287, 46806, 41250, 64439,
46607, 36450),
renter_percent = c(
0.337653478854025,
0.310596310596311,
0.331877729257642,
0.268110942458949,
0.328686327077748,
0.365986394557823
),
blackaa_percent = c(
0.5451197053407,
0.264697193500739,
0.145906432748538,
0.152916262243007,
0.258583690987124,
0.530922930542341
),
hispanic_percent = c(
0.0105893186003683,
0.0803545051698671,
0.0400584795321637,
0.0137651107385511,
0.00822603719599428,
0.00666032350142721
),
transit_score_mean = c(0, 0, 0, 0, 0, 0),
life_expectancy = c(75.67, 75.67, 75.67, 75.67, 75.67, 75.67),
trend_life_expectancy = c(5.1, 5.1, 5.1, 5.1, 5.1, 5.1),
median_monthly_housing_costs = c(885,
885, 885, 885, 885, 885),
pestilence_2018 = c(2, 2, 2, 2, 2,
2),
total_pop_county = c(6772, 6772, 6772, 6772, 6772, 6772),
deaths_weather_pop = c(0, 0, 0, 0, 0, 0),
cost_weather_pop = c(
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636,
2.95333727111636
),
Male_HSgrad = c(75, 68, 211, 189, 97,
42),
Male_SomeCollege = c(28, 18, 51, 111, 74, 38),
Male_AssocDeg = c(4,
6, 0, 63, 0, 21),
Male_BachDeg = c(7, 9, 0, 11, 0, 9),
Male_GradDeg = c(0,
0, 0, 29, 6, 0),
MaleEduAboveHS = c(114, 101, 262, 403, 177,
110),
Total_Male18.24 = c(145, 123, 285, 455, 202, 110),
MaleEduHSAbove_pop = c(
0.786206896551724,
0.821138211382114,
0.919298245614035,
0.885714285714286,
0.876237623762376,
1
),
Female_HSgrad = c(11, 60, 87, 156, 23, 83),
Female_SomeCollege = c(22,
25, 13, 47, 54, 65),
Female_AssocDeg = c(0, 0, 20, 82, 0,
0),
Female_BachDeg = c(5, 26, 0, 19, 0, 11),
Female_GradDeg = c(5,
16, 0, 0, 0, 0),
FemaleEduAboveHS = c(43, 127, 120, 304,
77, 159),
Total_Female18.24 = c(53, 127, 192, 581, 92, 198),
FemaleEduHSAbove_pop = c(
0.811320754716981,
1,
0.625,
0.523235800344234,
0.83695652173913,
0.803030303030303
)
),
row.names = c(NA,
6L),
class = "data.frame"
)
Here is my code
#List of potential outcome variables to be plotted
variables <- c("total_empl", "total_payroll", "total_establishments", "largest_employer", "largest_employer_bypayroll", "trend_employee_change", "trend_payroll_change", "trend_establishment_change", "damage_cost_weather_total", "deaths_weather_total", "medianrent", "vacancyrate", "total_pop", "undertwo_percent", "mobility_rate", "unemploy_rate", "median_income", "renter_percent", "blackaa_percent", "hispanic_percent", "median_monthly_housing_costs", "MaleEduHSAbove_pop", "FemaleEduHSAbove_pop")
# Define inputs
selectInput('state_name', label = 'Select a state', choices = lookup)
selectInput('DV', label = 'Outcome Measure', choices = variables)
#Filter data based on the State and outcome measure the user would like to investigate.
bar <- reactive({
st <- df %>%
filter(state == input$state_name)
bp <- st %>%
group_by(tract_type) %>%
summarise(Outcome = mean(st[,input$DV]))
return(bp)
})
bar
UPDATE
Right now, this code successfully filters the data by the input$state_name, but there is an issue with the calculation of means. The result is this:
# A tibble: 2 x 2
tract_type Outcome
<chr> <dbl>
1 Contiguous 468296.
2 LICs 468296.
As you can see, the calculated means are identical. In fact, these values correspond to the grand mean of whichever variable is chosen for input$DV. Therefore, the filtered st data is not being successfully grouped into the two levels of tract_type.
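I see what you are trying to do. The difference is that in your reactive part input$DV is a string, and taking the mean of a string won't work. Indexing with st[, input$DV] instead pulls the whole column from the ungrouped st, which is why every group shows the same grand mean. What you want to do is summarise one of the columns in df by providing its name.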
In the following example, I specify the summarising variable manually. Note that investment_score_1_low_10_high does not have quotes. investment_score_1_low_10_high is what is called a symbol in R.
st <- df %>%
filter(state == "Alabama") %>%
group_by(tract_type) %>%
summarise(Outcome = mean(investment_score_1_low_10_high))
But I think this should work:
bar <- reactive({
# Create a symbol from string.
mean_variable <- sym(input$DV)
bp <- df %>%
filter(state == input$state_name) %>%
group_by(tract_type) %>%
summarise(Outcome = mean(!! mean_variable, na.rm = TRUE))
return(bp)
})
Extra information about the use of !! and what it does can be found here: Here
And even better with examples Here
Solution derived by @dylanvanw:
bar <- reactive({
# Create a symbol from string.
mean_variable <- sym(input$DV)
bp <- df %>%
filter(state == input$state_name) %>%
group_by(tract_type) %>%
summarise(Outcome = mean(!! mean_variable, na.rm = TRUE))
return(bp)
})
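As an alternative to sym() and !!, dplyr's .data pronoun accepts a column name held in a string. The sketch below shows the idea outside of shiny so it can be run on its own: the helper summarise_by_tract() is hypothetical, its dv and st arguments stand in for input$DV and input$state_name, and the toy data frame reuses the medianrent values from the posted sample.

```r
library(dplyr)

# Toy stand-in for the question's df (values taken from the posted sample)
toy <- data.frame(
  state      = "Alabama",
  tract_type = c("LICs", "Contiguous", "LICs", "Contiguous", "Contiguous", "LICs"),
  medianrent = c(537, 633, 525, 680, 409, 303)
)

# Hypothetical helper: `st` plays the role of input$state_name,
# `dv` the role of input$DV (a column name given as a string)
summarise_by_tract <- function(data, st, dv) {
  data %>%
    filter(state == st) %>%
    group_by(tract_type) %>%
    # .data[[dv]] looks the column up by name inside each group
    summarise(Outcome = mean(.data[[dv]], na.rm = TRUE))
}

res <- summarise_by_tract(toy, "Alabama", "medianrent")
res
```

Inside the reactive, the summarise() line would then read summarise(Outcome = mean(.data[[input$DV]], na.rm = TRUE)). Either form should give one mean per tract_type rather than the grand mean.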

How to create differences between several pairs of columns?

I have a panel (cross-sectional time series) dataset. For each group (defined by (NAICS2, occ_type) in time ym) I have many variables. For each variable I would like to subtract each group's first (dplyr::first) value from every value of that group.
Ultimately, I am trying to take the Euclidean distance between each row's vector and the vector of its group's first entry, i.e. sqrt(c_1^2 + ... + c_k^2), where c_j is the difference in variable j.
I was able to create a column equal to the first entry for each group:
df2 <- df %>%
group_by(ym, NAICS2, occ_type) %>%
distinct(ym, NAICS2, occ_type, .keep_all = T) %>%
arrange(occ_type, NAICS2, ym) %>%
select(group_cols(), ends_with("_scf")) %>%
mutate_at(vars(-group_cols(), ends_with("_scf")),
list(first = dplyr::first))
I then tried to include variations of f.diff = . - dplyr::first(.) in the list, but none of those worked. I googled the dot notation for a while as well as first and lag in dplyr timeseries but have not been able to resolve this yet.
Ideally, I unite all variables into a vector for each row first and then take the difference.
df2 <- df %>%
group_by(ym, NAICS2, occ_type) %>%
distinct(ym, NAICS2, occ_type, .keep_all = T) %>%
arrange(occ_type, NAICS2, ym) %>%
select(group_cols(), ends_with("_scf")) %>%
unite(vector, c(-group_cols(), ends_with("_scf")), sep = ',') %>%
# TODO: DISTANCE_BETWEEN_ENTRY_AND_FIRST
mutate(vector.diff = ???)
I expect the output to be a numeric column that contains a distance measure of how different each group's row vector is from its initial row vector.
Here is a sample of the data:
structure(list(ym = c("2007-01-01", "2007-02-01"), NAICS2 = c(0L,
0L), occ_type = c("is_middle_manager", "is_middle_manager"),
Administration_scf = c(344, 250), Agriculture..Horticulture..and.the.Outdoors_scf = c(11,
17), Analysis_scf = c(50, 36), Architecture.and.Construction_scf = c(57,
51), Business_scf = c(872, 585), Customer.and.Client.Support_scf = c(302,
163), Design_scf = c(22, 17), Economics..Policy..and.Social.Studies_scf = c(7,
7), Education.and.Training_scf = c(77, 49), Energy.and.Utilities_scf = c(25,
28), Engineering_scf = c(90, 64), Environment_scf = c(19,
19), Finance_scf = c(455, 313), Health.Care_scf = c(105,
71), Human.Resources_scf = c(163, 124), Industry.Knowledge_scf = c(265,
174), Information.Technology_scf = c(467, 402), Legal_scf = c(21,
17), Maintenance..Repair..and.Installation_scf = c(194, 222
), Manufacturing.and.Production_scf = c(176, 174), Marketing.and.Public.Relations_scf = c(139,
109), Media.and.Writing_scf = c(18, 20), Personal.Care.and.Services_scf = c(31,
16), Public.Safety.and.National.Security_scf = c(14, 7),
Religion_scf = c(0, 0), Sales_scf = c(785, 463), Science.and.Research_scf = c(52,
24), Supply.Chain.and.Logistics_scf = c(838, 455), total_scf = c(5599,
3877)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), groups = structure(list(ym = c("2007-01-01",
"2007-02-01"), NAICS2 = c(0L, 0L), occ_type = c("is_middle_manager",
"is_middle_manager"), .rows = list(1L, 2L)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
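One possible way to get the distance described in the question, as a minimal sketch rather than a tested answer. It assumes that the "first" row of a group means the earliest ym within each (NAICS2, occ_type) pair, so ym is used for ordering and not for grouping, and that dplyr 1.0 or later is available for across(). The toy data and the column names a_scf and b_scf are invented for the example.

```r
library(dplyr)

# Invented toy panel: two (NAICS2, occ_type) groups, two months each
toy <- tibble(
  ym       = c("2007-01-01", "2007-02-01", "2007-01-01", "2007-02-01"),
  NAICS2   = c(0L, 0L, 1L, 1L),
  occ_type = "is_middle_manager",
  a_scf    = c(3, 0, 1, 1),
  b_scf    = c(4, 0, 2, 5)
)

res <- toy %>%
  arrange(occ_type, NAICS2, ym) %>%
  group_by(NAICS2, occ_type) %>%
  # c_j: each variable minus its group's first (earliest) value
  mutate(across(ends_with("_scf"), ~ .x - first(.x), .names = "{.col}_diff")) %>%
  ungroup() %>%
  # Euclidean norm of the difference vector, row by row
  mutate(dist_from_first = sqrt(rowSums(across(ends_with("_diff"))^2)))

res$dist_from_first   # 0 5 0 3
```

Each group's first row has distance 0 by construction. Grouping by ym as well, as in the question's code, would leave one row per group and make every distance 0, which may be why the earlier attempts did not behave as expected.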

Error in rowSums(Qf) : 'x' must be an array of at least two dimensions (msm::msm2Surv)

I am using the msm2Surv function in the msm package to convert longitudinal data to the format that the flexsurv package expects. The following is my sample, called tmp.
tmp <- structure(list(id = c(89, 90, 90, 91, 91, 91, 92, 92, 93, 93,
94, 94, 94, 95, 95, 96), days = c(9157, 0, 9156, 0, 8394, 9156,
0, 9156, 0, 8079, 0, 8933, 9003, 0, 8430, 0), event = c(1, 1,
1, 1, 2, 2, 1, 1, 1, 5, 1, 3, 6, 1, 4, 1)), row.names = c(NA,
-16L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = "id", drop = TRUE, indices = list(
0L, 1:2, 3:5, 6:7, 8:9, 10:12, 13:14, 15L), group_sizes = c(1L,
2L, 3L, 2L, 2L, 3L, 2L, 1L), biggest_group_size = 3L, labels = structure(list(
id = c(89, 90, 91, 92, 93, 94, 95, 96)), row.names = c(NA,
-8L), class = "data.frame", vars = "id", drop = TRUE, .Names = "id"), .Names = c("id",
"days", "event"))
The code I am running:
library(msm)
Q <- matrix(c(
0,1,1,1,1,0,
0,0,1,1,1,1,
0,0,0,1,1,1,
0,0,0,0,1,1,
0,0,0,0,0,0,
0,0,0,0,0,0
), nrow=6, ncol=6,
byrow=TRUE,
dimnames=list(from=1:6,to=1:6))
dat <- msm2Surv(data=tmp, subject="id", time="days", state="event", Q=Q)
It gives me the error: Error in rowSums(Qf) : 'x' must be an array of at least two dimensions.
I checked the dimensions of the data frame and they look fine to me, but the error is still there. Does anyone know how to resolve this error?
Many thanks!
We can convert the tbl_df to a data.frame and it should work:
out <- msm2Surv(data=as.data.frame(tmp), subject="id",
time="days", state="event", Q=Q)
dim(out)
#[1] 31 8
The "tmp" dataset is a grouped tbl_df and has lots of attributes. By converting to data.frame, we remove those attributes.
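The effect of the conversion can be checked on a small example. The sketch below rebuilds a few rows of tmp as a grouped tibble with dplyr (the name tmp_toy is made up) and compares it with the plain data.frame. It also shows one plausible, unconfirmed reason the conversion matters: selecting a single column with [ , ] returns a data frame for a tibble but a plain vector for a data.frame, and code written for data.frames often relies on the vector.

```r
library(dplyr)

# A few rows of the question's tmp, rebuilt as a grouped tibble
tmp_toy <- tibble(
  id    = c(89, 90, 90),
  days  = c(9157, 0, 9156),
  event = c(1, 1, 1)
) %>%
  group_by(id)

class(tmp_toy)          # includes "grouped_df" and "tbl_df"

plain <- as.data.frame(tmp_toy)
class(plain)            # "data.frame"

# Single-column indexing behaves differently
is.data.frame(tmp_toy[, "event"])   # TRUE: still a data frame
is.numeric(plain[, "event"])        # TRUE: dropped to a vector
```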
