I'm trying to run FAMD on a morphology-based dataset with 25 qualitative variables recording the presence or absence of fluorescence on a body part (binary) and six quantitative variables. I also have a few supplementary variables, such as sex, genus and depth.
First I ran the code for the FAMD on my data set after I had removed all missing values with na.omit():
library(FactoMineR)
res.famd1 <- FAMD(fluo_famd1, sup.var = c(1, 2, 28, 35), graph = FALSE, ncp = 5)
and retrieved a bunch of results like eigenvalues, scree plot etc.
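Note that na.omit() drops every row that has an NA in any column, including the supplementary columns, not only the active variables. In the posted subset, for example, six of the 15 rows contain at least one NA. A minimal base-R sketch on a hypothetical toy data frame shows which rows survive:

```r
# Hypothetical toy data frame with NAs in different columns
toy <- data.frame(
  depth         = c(NA, 10.3, 16.0, 16.1),
  carapace_fluo = c(0.10, NA, 0.46, 0.64),
  eyes          = factor(c("0", "1", "1", "0"))
)

complete <- na.omit(toy)        # keeps only rows with no NA in any column
nrow(complete)                  # 2 of the 4 rows survive
attr(complete, "na.action")     # indices of the dropped rows

# complete.cases() selects the same rows as a logical vector
same_rows <- identical(rownames(complete), rownames(toy)[complete.cases(toy)])
same_rows
```

This loss of rows is the reason an imputation approach such as imputeFAMD() can be attractive for data with scattered missing values.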
I then tried to plot my qualitative variables within the two dimensions like in this example:
[Example][1]
This is the code I used:
library(factoextra)
quali.var1 <- get_famd_var(res.famd1, "quali.var")
quali.var1
fviz_famd_var(res.famd1, "quali.var")
Instead of the category labels, R plots decimal numbers that I can't explain.
[Missing categories][2]
After this I tried running the FAMD on my data set with missing values using the code given in the package description:
library(missMDA)
library(FactoMineR)
res.impute <- imputeFAMD(fluo_famd2, ncp = 3)
res.famd2 <- FAMD(fluo_famd2, tab.disj = res.impute$tab.disj, sup.var = c(1, 2, 28))
When I plot the categories now, they do appear, but each one is shown twice, labelled with the suffixes _0 and _1.
[doubled categories][3]
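For context: FAMD handles the qualitative variables the way MCA does, through a complete disjunctive (indicator) table in which every level of a factor gets its own column, and therefore its own point on the category map. A binary presence/absence factor coded "0"/"1" thus contributes two categories. The base-R sketch below builds such a table by hand from hypothetical toy values; the variable_level column names are chosen to resemble the _0/_1 labels in the second plot and are not necessarily how missMDA names its columns.

```r
# Hypothetical toy data: two binary presence/absence factors coded "0"/"1"
fluo <- data.frame(
  eyes   = factor(c("0", "1", "1", "0", "1")),
  telson = factor(c("1", "0", "1", "1", "0"))
)

# Complete disjunctive table: one 0/1 indicator column per level of each factor
disj <- do.call(cbind, lapply(names(fluo), function(v) {
  x <- fluo[[v]]
  ind <- sapply(levels(x), function(l) as.integer(x == l))
  colnames(ind) <- paste(v, levels(x), sep = "_")
  ind
}))
disj
```

Each row has exactly one 1 per factor, so a binary factor always shows up as two complementary categories (e.g. eyes_0 and eyes_1).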
My questions are:
Can you identify an obvious mistake? Why would the categories be plotted twice in the graph? Does it have an impact on the overall analysis? Is FAMD suited for a data set like this?
[1]: https://i.stack.imgur.com/8UFlA.png
[2]: https://i.stack.imgur.com/qb3Cz.png
[3]: https://i.stack.imgur.com/O1Dff.png
Please find a subset of my data here:
structure(list(genus = structure(c(5L, 7L, 7L, 7L, 9L, 7L, 7L,
9L, 9L, 7L, 7L, 9L, 7L, 6L, 7L), .Label = c("Cryptochirus",
"Dacryomaia",
"Fizesereneia", "Fungicola", "Hapalocarcinus", "Hiroia",
"Lithoscaptus",
"Neotroglocarcinus", "Opecarcinus", "Pseudohapalocarcinus",
"Xynomaia"
), class = "factor"), sex = structure(c(1L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("f", "m"), class = "factor"),
frontal_dorsal = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class =
"factor"),
frontal_ventral = structure(c(1L, 2L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class =
"factor"),
mesogastric = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class =
"factor"),
cardial = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"),
branchial = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
ps1 = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
ps2 = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
ps3 = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
ps4 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
ps6 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
telson = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
eyes = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"),
eyestalk = structure(c(1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"),
antennules = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class =
"factor"),
anntenullar_peduncle = structure(c(1L, 1L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L), .Label = c("0", "1"), class = "factor"),
depth = c(NA, 10.3, 16, 16.1, 14.3, 12.8, 10.8, 12.6, 10.2,
11, 11.9, 13.1, 10.7, 10.1, 12.3), carapace_fluo = c(NA,
NA, 0.0999104660846311, 0.459446596994549, 0.639459602769835,
0.0157309627508303, NA, 0.792912115871697, 0.385646421420439,
0.0934932558564838, 0.118926192063408, 0.334765757290687,
NA, 0.712954991372207, 0.816431146170724), ap_fluo = c(NA,
0, 0.153709650160554, NA, 0.526410945516736, 0,
0.0572985597508758,
NA, 0.0105633802816901, 0.284174213022855, 0.305258467023173,
0.402286503491138, NA, 0, 0.0679211592610398), prod_fluo = c(NA,
0, 0, NA, 0.528576376861794, 0, 0, 0.15260360009031, 0,
0.0252962625341841,
0.241194486983155, 0.0717077570655442, NA, 0.479219143576826,
0), pol_fluo = c(NA, 0, 0, NA, 0, 0, 0, 0.118164567879938,
0, 0, 1, 0, NA, 0.299160251924423, 0), dac_fluo = c(NA, 0,
0, NA, 0, 0, 0, 0.102848534648042, 0, 0, 0.309536216779573,
0, NA, 0.0654761904761905, 0), sum_chel = c(NA, 0, 0, NA,
0.345118733509235, 0, 0, 0.14349725008088, 0, 0.0155266470835082,
0.347599820547331, 0.0451661774453177, NA, 0.32612422524067,
0)), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
I have a folder that serves as a container for a standardized report from a system. The report is run daily. However, it may need to be re-run for a certain date or range of dates, depending on user preferences and requests, so the file contents may change significantly.
I would like to create a script that puts the rows for each unique date from its latest run into one data frame, and the rows for the dates that are being revised into another data frame.
Here is a simplified version of the table:
df1 <- structure(list(Source = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), Date = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("11-Feb-20", "12-Feb-20"
), class = "factor"), FarmType = structure(c(3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L), .Label = c("AJSKJA",
"ASKJKA", "GHDGH", "KLKIUK", "KLSAKJ"), class = "factor"), FarmName = structure(c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L), .Label = c("",
"JJHGH", "JKJKK", "JUISO", "SDLLS"), class = "factor"), Perform = c(13.04144378,
1.230474165, 1.230474165, 13.9407486, 13.9407486, 13.04144378,
1.230474165, 1.230474165, 13.9407486, 13.9407486, 13.04144378,
15.26566, 1.230474165, 13.9407486), RunDate = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("02/14/2020",
"02/15/2020"), class = "factor")), class = "data.frame", row.names = c(NA,
-14L))
Please note that the number of columns does not change; however, after each re-run the number of rows may increase or decrease.
The idea is that the first group, based on the most recent run, represents the up-to-date information (corrections, revisions, etc.), while the second group shows what is being revised and how the numbers and data change.
Expected output for the first group:
structure(list(Source = c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L),
Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("11-Feb-20",
"12-Feb-20"), class = "factor"), FarmType = structure(c(3L,
4L, 5L, 1L, 3L, 4L, 5L, 1L, 2L), .Label = c("AJSKJA", "ASKJKA",
"GHDGH", "KLKIUK", "KLSAKJ"), class = "factor"), FarmName = structure(c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("", "JJHGH",
"JKJKK", "JUISO", "SDLLS"), class = "factor"), Perform = c(13.04144378,
15.26566, 1.230474165, 13.9407486, 13.04144378, 1.230474165,
1.230474165, 13.9407486, 13.9407486), RunDate = structure(c(2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("02/14/2020",
"02/15/2020"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
Expected output for the second group:
structure(list(Source = c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L),
Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "11-Feb-20", class = "factor"),
FarmType = structure(c(3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L
), .Label = c("AJSKJA", "ASKJKA", "GHDGH", "KLKIUK", "KLSAKJ"
), class = "factor"), FarmName = structure(c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L), .Label = c("", "JJHGH", "JKJKK",
"JUISO", "SDLLS"), class = "factor"), Perform = c(13.04144378,
1.230474165, 1.230474165, 13.9407486, 13.9407486, 13.04144378,
15.26566, 1.230474165, 13.9407486), RunDate = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("02/14/2020",
"02/15/2020"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
Thank you for your time. Please let me know if you have questions.
We can group by 'Date' and keep only the rows whose 'RunDate' is the latest within each group, after converting 'RunDate' to Date class with mdy():
library(lubridate)
library(dplyr)
# For each report Date, keep only the rows from the most recent run
new1 <- df1 %>%
  group_by(Date) %>%
  filter(mdy(RunDate) == max(mdy(RunDate)))
For the second set, we can keep the Date groups that have more than one distinct 'RunDate', i.e. the dates that have been re-run:
# Keep every row of a Date that appears in more than one run
new2 <- df1 %>%
  group_by(Date) %>%
  filter(n_distinct(RunDate) > 1)
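If dplyr and lubridate are not available, the same two filters can be sketched in base R with ave(), which applies a function within groups and returns a vector aligned with the original rows. This is a minimal sketch on a reduced copy of the question's table (FarmType, FarmName and Perform are omitted for brevity), assuming RunDate is always in month/day/year format:

```r
# Reduced copy of the question's table (only the columns the filters need)
df1 <- data.frame(
  Source  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  Date    = rep(c("11-Feb-20", "12-Feb-20", "11-Feb-20"), times = c(5, 5, 4)),
  RunDate = rep(c("02/14/2020", "02/15/2020"), times = c(10, 4))
)

# Run dates as numbers so they can be compared and maximised within groups
run <- as.numeric(as.Date(df1$RunDate, format = "%m/%d/%Y"))

# First group: rows from the latest run of each report Date
latest <- ave(run, df1$Date, FUN = max)
new1 <- df1[run == latest, ]

# Second group: every row of a Date that was run more than once
n_runs <- ave(run, df1$Date, FUN = function(x) length(unique(x)))
new2 <- df1[n_runs > 1, ]
```

Both results have 9 rows, matching the expected outputs in the question up to row order.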
I am very new to R (a few months' experience from online learning and reading) and had no coding experience before this.
I have been using a data set obtained from work (healthcare) for some practice. I wanted to demonstrate certain patient outcomes over time (by month) in this data set.
I've split the data by month into separate data frames, which I have stored in a list. I then narrowed each data frame in the list down to the 3 post-operative outcomes that I want to look at. All three outcomes are binary (Y or N).
I would like to know if there is any way to work out the percentage of "Y" for each of these outcomes by month, and then store the results in an object that I can plot to show the trend over time (by month).
Have I approached this problem completely wrongly? Should I not have used a list at all?
I managed to get to a point where I have a list of tables of Y's and N's but am now completely clueless as to what to do from there.
L <- list(structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("N", "Y"), class =
"factor"),
Catheter_rm_D1 = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
Diet_D1 = structure(c(2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("N", "Y"), class = "factor")), class =
"data.frame", row.names = 2:15),
structure(list(Mobilised_D1 = structure(c(1L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N",
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L
), .Label = c("N", "Y"), class = "factor"), Diet_D1 = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("N", "Y"), class = "factor")), class = "data.frame",
row.names = 16:31),
structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
Catheter_rm_D1 = structure(c(1L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"),
Diet_D1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("N", "Y"), class = "factor")), class =
"data.frame", row.names = 32:42),
structure(list(Mobilised_D1 = structure(c(2L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("N",
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(2L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("N", "Y"), class = "factor"), Diet_D1 =
structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("N", "Y"), class = "factor")), class = "data.frame",
row.names = 43:60),
structure(list(Mobilised_D1 = structure(c(1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 2L, NA), .Label = c("N",
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N",
"Y"), class = "factor"), Diet_D1 = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("N",
"Y"), class = "factor")), class = "data.frame", row.names = 61:74),
structure(list(Mobilised_D1 = structure(c(1L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L), .Label = c("N",
"Y"), class = "factor"), Catheter_rm_D1 = structure(c(1L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L
), .Label = c("N", "Y"), class = "factor"), Diet_D1 = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("N", "Y"), class = "factor")), class = "data.frame",
row.names = 75:90))
For each component of the input list L, take the proportion of "Y" in each column (the column means of the logical matrix x == "Y") and arrange the results into a multivariate time series with one row per month. Then plot all series on a single panel; remove facet = NULL if you want each series in a separate panel.
library(zoo)
library(ggplot2)
# One row per month, one column per outcome; na.rm = TRUE skips missing answers
series <- zoo( t(sapply(L, function(x) colMeans(x == "Y", na.rm = TRUE))) )
autoplot(series, facet = NULL) + geom_point()
Alternative
An alternative is to stack the components of L into a single data frame DF, derive a month vector from its row names, and aggregate by month. This relies on the fact that the row names of DF consist of the month number, a decimal point, and the original row name of each row within its component.
# Stack the monthly data frames; row names become "month.row", e.g. "1.2"
DF <- do.call("rbind", setNames(L, seq_along(L)))
month <- as.integer(rownames(DF))
series <- aggregate(zoo(DF == "Y"), month, mean, na.rm = TRUE)
autoplot(series, facet = NULL) + geom_point()
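If you prefer to avoid the zoo and ggplot2 dependencies, the same numbers can be computed and plotted in base R. A minimal sketch, using a small hypothetical list in place of the posted data and scaling the proportions to percentages:

```r
# Hypothetical toy input: one data frame per month, binary Y/N outcome factors
L <- list(
  data.frame(Mobilised_D1 = factor(c("Y", "N", "N", "Y")),
             Diet_D1      = factor(c("Y", "Y", "Y", "N"))),
  data.frame(Mobilised_D1 = factor(c("N", "Y", NA)),
             Diet_D1      = factor(c("Y", "Y", "N")))
)

# Percentage of "Y": one row per month, one column per outcome; NAs are ignored
pct <- t(sapply(L, function(x) 100 * colMeans(x == "Y", na.rm = TRUE)))
rownames(pct) <- paste("Month", seq_len(nrow(pct)))
pct

# One line per outcome, months on the x axis
matplot(seq_len(nrow(pct)), pct, type = "b", pch = seq_len(ncol(pct)),
        lty = 1, xlab = "Month", ylab = "% Y", xaxt = "n")
axis(1, at = seq_len(nrow(pct)))
legend("bottomright", legend = colnames(pct), pch = seq_len(ncol(pct)),
       col = seq_len(ncol(pct)), lty = 1)
```

The matrix pct is the object the question asks for: it can be stored, printed, or passed to any plotting function.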
I have several dataframes with data from the same survey. I want to combine them for analysis. The dataframes contain both unique variables and two variables (ID and Contest_no) that are shared across all the dataframes; the two shared variables contain information about the respondent and the contest number (1,2,3, as respondents were asked the same questions three times).
The difficulty is that the data frames have missing observations: not every ID/Contest_no combination appears in every data frame.
DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
V2 = factor(c("A", "B", "C", "D")),
ID = factor(c("x1", "x1", "y2", "y2")),
Contest_no = factor(c("1", "2", "1", "2")))
DF2 <- data.frame(V3 = factor(c("A", "C", "D")),
V4 = factor(c("A", "C", "D")),
ID = factor(c("x1", "y2", "y2")),
Contest_no = factor(c("1", "1", "2")))
DF3 <- data.frame(V5 = factor(c("A", "B", "C")),
V6 = factor(c("A", "B", "C")),
ID = factor(c("x1", "x1", "y2")),
Contest_no = factor(c("1", "2", "1")))
As a result, the respondent IDs and contest numbers aren't aligned across the data frames. I want to match the data by respondent ID and contest number so that the merged data frame looks like this:
DF_merged <- data.frame(V1 = factor(c("A", "B", "C", "D")),
V2 = factor(c("A", "B", "C", "D")),
V3 = factor(c("A", NA, "C", "D")),
V4 = factor(c("A", NA, "C", "D")),
V5 = factor(c("A", "B", "C", NA)),
V6 = factor(c("A", "B", "C", NA)),
ID = factor(c("x1", "x1", "y2", "y2")),
Contest_no = factor(c("1", "2", "1", "2")))
I thought that full_join would do the trick, but DF_merged <- full_join(DF1, DF2, DF3, by="ID") gives me nonsensical results.
How can disparate data like this be combined?
Update: here is a new example illustrating the problem of multiplied rows. In this example there are no missing values at all and both data frames have the same number of rows, yet the join still produces multiplied rows. First, the two data frames to be merged:
df1:
structure(list(ID = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("EE1", "EE101", "EE102"), class = "factor"),
Contest_no = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 3L,
3L), Option = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L), .Label = c("Option1", "Option2"), class = "factor"),
Chosen_option = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L), Combination = structure(c(5L, 5L, 6L, 6L, 4L, 4L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("V133", "V181", "V234",
"V252", "V32", "V67"), class = "factor"), Attribute1 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("has strong ties to the government",
"has weak ties to the government"), class = "factor"), Attribute2 = structure(c(1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("has strong ties to the local pastoralist community",
"has weak ties to the local pastoralist community"), class = "factor"),
Attribute3 = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 2L), .Label = c("is poor", "is wealthy"), class = "factor"),
Attribute4 = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L,
1L, 2L, 2L, 2L), .Label = c("has attained a high level of formal education (for example university degree)",
"has not attained a high level of formal education (for example never went to school or only attended primary school)"
), class = "factor")), .Names = c("ID", "Contest_no", "Option",
"Chosen_option", "Combination", "Attribute1", "Attribute2", "Attribute3",
"Attribute4"), class = "data.frame", row.names = c(NA, -12L))
df2:
structure(list(ID = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L), .Label = c("EE1", "EE101", "EE102"), class = "factor"),
Contest_no = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 2L, 2L, 3L,
3L), Option = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L), .Label = c("Option1", "Option2"), class = "factor"),
Chosen_option = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
0L, 1L), Combination = structure(c(6L, 6L, 4L, 4L, 1L, 1L,
3L, 3L, 5L, 5L, 2L, 2L), .Label = c("V150", "V249", "V252",
"V29", "V56", "V77"), class = "factor"), Attribute1 = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("has strong ties to the government",
"has weak ties to the government"), class = "factor"), Attribute2 = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("has strong ties to the local pastoralist community",
"has weak ties to the local pastoralist community"), class = "factor"),
Attribute3 = structure(c(2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L), .Label = c("is poor", "is wealthy"), class = "factor"),
Attribute4 = structure(c(2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 2L, 2L), .Label = c("has attained a high level of formal education (for example university degree)",
"has not attained a high level of formal education (for example never went to school or only attended primary school)"
), class = "factor")), .Names = c("ID", "Contest_no", "Option",
"Chosen_option", "Combination", "Attribute1", "Attribute2", "Attribute3",
"Attribute4"), class = "data.frame", row.names = c(NA, -12L))
And here is the unsuccessful attempt to combine the two data frames:
df_merge_attempt <- dplyr::full_join(df1, df2, by=c("ID","Contest_no"))
results in:
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), .Label = c("EE1", "EE101", "EE102"), class = "factor"), Contest_no = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), Option.x = structure(c(1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L), .Label = c("Option1", "Option2"), class = "factor"),
Chosen_option.x = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L),
Combination.x = structure(c(5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L), .Label = c("V133", "V181", "V234", "V252", "V32", "V67"
), class = "factor"), Attribute1.x = structure(c(1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("has strong ties to the government",
"has weak ties to the government"), class = "factor"), Attribute2.x = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("has strong ties to the local pastoralist community",
"has weak ties to the local pastoralist community"), class = "factor"),
Attribute3.x = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L), .Label = c("is poor", "is wealthy"), class = "factor"),
Attribute4.x = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("has attained a high level of formal education (for example university degree)",
"has not attained a high level of formal education (for example never went to school or only attended primary school)"
), class = "factor"), Option.y = structure(c(1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Option1", "Option2"), class = "factor"),
Chosen_option.y = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L),
Combination.y = structure(c(6L, 6L, 6L, 6L, 4L, 4L, 4L, 4L,
1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 2L, 2L, 2L,
2L), .Label = c("V150", "V249", "V252", "V29", "V56", "V77"
), class = "factor"), Attribute1.y = structure(c(2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("has strong ties to the government",
"has weak ties to the government"), class = "factor"), Attribute2.y = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("has strong ties to the local pastoralist community",
"has weak ties to the local pastoralist community"), class = "factor"),
Attribute3.y = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
2L), .Label = c("is poor", "is wealthy"), class = "factor"),
Attribute4.y = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("has attained a high level of formal education (for example university degree)",
"has not attained a high level of formal education (for example never went to school or only attended primary school)"
), class = "factor")), class = "data.frame", row.names = c(NA,
-24L), .Names = c("ID", "Contest_no", "Option.x", "Chosen_option.x",
"Combination.x", "Attribute1.x", "Attribute2.x", "Attribute3.x",
"Attribute4.x", "Option.y", "Chosen_option.y", "Combination.y",
"Attribute1.y", "Attribute2.y", "Attribute3.y", "Attribute4.y"
))
You can use dplyr::full_join with the argument by = c("ID", "Contest_no"), chaining one join per additional data frame:
library(dplyr)
df1 <- full_join(DF1, DF2, by=c("ID","Contest_no")) %>%
full_join(DF3, by=c("ID","Contest_no"))
df1
# V1 V2 V3 V4 V5 V6 ID Contest_no
#1 A A A A A A x1 1
#2 B B <NA> <NA> B B x1 2
#3 C C C C C C y2 1
#4 D D D D <NA> <NA> y2 2
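The same idea extends to any number of data frames. A base-R sketch: merge(..., all = TRUE) performs a full outer join, and Reduce() folds it over the list two tables at a time. Note that merge() puts the key columns first and sorts by them, so the column order differs from DF_merged.

```r
# The question's three data frames
DF1 <- data.frame(V1 = factor(c("A", "B", "C", "D")),
                  V2 = factor(c("A", "B", "C", "D")),
                  ID = factor(c("x1", "x1", "y2", "y2")),
                  Contest_no = factor(c("1", "2", "1", "2")))
DF2 <- data.frame(V3 = factor(c("A", "C", "D")),
                  V4 = factor(c("A", "C", "D")),
                  ID = factor(c("x1", "y2", "y2")),
                  Contest_no = factor(c("1", "1", "2")))
DF3 <- data.frame(V5 = factor(c("A", "B", "C")),
                  V6 = factor(c("A", "B", "C")),
                  ID = factor(c("x1", "x1", "y2")),
                  Contest_no = factor(c("1", "2", "1")))

# Full outer join on the shared keys, applied pairwise across the list
merged <- Reduce(function(a, b) merge(a, b, by = c("ID", "Contest_no"), all = TRUE),
                 list(DF1, DF2, DF3))
merged
```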
Update: for the new example, the Option column also has to be part of the join key:
df_merged <- full_join(df1, df2, by = c("ID", "Contest_no", "Option"))
Note: I had to tweak my dplyr code to match what @Gregor suggested in order to get the expected result.
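Why the rows multiplied: in the updated df1 and df2, each ID/Contest_no pair occurs twice (once for Option1, once for Option2), so joining on those two columns alone pairs every row of one table with both matching rows of the other. With 6 such pairs that gives 6 × 2 × 2 = 24 rows instead of 12. A base-R sketch on a hypothetical two-row excerpt shows the effect:

```r
# Hypothetical excerpt: one respondent/contest with two options in each table
a <- data.frame(ID = "EE1", Contest_no = 1, Option = c("Option1", "Option2"), x = 1:2)
b <- data.frame(ID = "EE1", Contest_no = 1, Option = c("Option1", "Option2"), y = 3:4)

# Key does not identify a single row: every row of a pairs with every row of b
too_many <- merge(a, b, by = c("ID", "Contest_no"))
nrow(too_many)    # 4 rows, with Option.x / Option.y columns

# Adding Option makes the key unique, so rows pair one-to-one
one_to_one <- merge(a, b, by = c("ID", "Contest_no", "Option"))
nrow(one_to_one)  # 2 rows
```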
I would like to make a "nested" table in R that mirrors the layout of a plot I can make with ggplot using facet_grid.
Here are some data and the code:
tabledata = structure(list(row = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,1L, 2L, 1L, 2L, 1L, 2L),
col = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
grp1 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("a", "b"), class = "factor"),
grp2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
.Label = c("g", "h"), class = "factor"),
value = c(9L, 9L, 14L, 8L, 10L, 9L, 8L, 15L, 2L, 1L, 3L, 4L, 1L, 5L, 2L, 4L)),
.Names = c("row", "col", "grp1", "grp2", "value"), class = "data.frame",
row.names = c(NA, -16L))
library(ggplot2)
ggplot(tabledata, aes(grp2, value, shape = grp1)) + geom_jitter() + facet_grid(row ~ col)
This produces the following plot:
Here is the table I would like to make (which can easily be done with a pivot table, but obviously that is not ideal):
A nested table can be made with the tabular() function from the tables package, using the following code:
library(tables)
tabular(
(Heading()*Factor(row)*Heading()*grp1)~
(Heading()*Factor(col)*Heading()*grp2)*Heading()*value*Heading()*identity,
data = tabledata)
The table can then be saved as a .csv file using write.csv.tabular().
RStudio recently released a table package called gt, which has this nested format built in (row groups and column spanners): https://blog.rstudio.com/2020/04/08/great-looking-tables-gt-0-2/
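A base-R alternative without extra packages: xtabs() builds a four-way table of value, and ftable() flattens it with grp1 nested within row on the left and grp2 nested within col on top, matching the facet_grid(row ~ col) layout. A minimal sketch; it relies on each row/col/grp1/grp2 combination occurring exactly once in tabledata (true here), because xtabs() sums duplicate combinations:

```r
# Compact reconstruction of the question's tabledata
tabledata <- data.frame(
  row   = rep(1:2, 8),
  col   = rep(c(1L, 1L, 2L, 2L), 4),
  grp1  = factor(rep(rep(c("a", "b"), each = 4), 2)),
  grp2  = factor(rep(c("g", "h"), each = 8)),
  value = c(9L, 9L, 14L, 8L, 10L, 9L, 8L, 15L, 2L, 1L, 3L, 4L, 1L, 5L, 2L, 4L)
)

# Four-way table; each cell holds the single value for that combination
xt <- xtabs(value ~ row + grp1 + col + grp2, data = tabledata)

# Rows: row, then grp1; columns: col, then grp2
ft <- ftable(xt, row.vars = c("row", "grp1"), col.vars = c("col", "grp2"))
ft
```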