Running contingency tables (cross-tabs) separately for each group in R - r

My dataset has 7 binary (categorical) variables (x1-x7).
The other variables are scale variables and are not used here.
mydat (part of it):
structure(list(city = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 1L), .Label = c("New-York", "Washington"), class = "factor"),
x1 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x2 = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x3 = c(0L, 0L, 1L, 1L,
0L, 0L, 0L, 1L, 1L, 0L), x4 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L,
1L, 1L, 0L), x5 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L
), x6 = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), x7 = c(0L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L), var1 = c(10L, 71L, 49L,
70L, 79L, 46L, 87L, 57L, 81L, 68L), var2 = c(34L, 17L, 28L,
63L, 95L, 99L, 40L, 63L, 24L, 90L), var3 = c(21L, 89L, 81L,
26L, 59L, 87L, 84L, 24L, 27L, 83L), var4 = c(86L, 70L, 45L,
40L, 95L, 94L, 39L, 97L, 89L, 30L)), .Names = c("city", "x1",
"x2", "x3", "x4", "x5", "x6", "x7", "var1", "var2", "var3", "var4"
), class = "data.frame", row.names = c(NA, -10L))
I wrote a function that performs cross-tabs between all pairs of binary variables.
Perhaps it will be useful to somebody.
It works.
mydat=read.csv(mydat)
library("gmodels")
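mult_crosstab <- function (data = cross) {
  # loop over every pair of columns (j < i)
  for (j in 1:(ncol(data)-1)) {
    for (i in (j+1):(ncol(data))) {
      x <- names(data)[j]
      y <- names(data)[i]
      # build the call CrossTable(x, y, chisq = TRUE) and evaluate it using the columns of data
      call <- call("CrossTable", as.name(x), as.name(y), chisq = TRUE)
      eval(call, data)
    }
  }
}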
mult_crosstab()
But the dataset mydat also has the variable city (Washington and New-York).
How can I use my function to calculate the cross-tabs for the two cities separately?

Split your data by city, then iterate over the pieces and run your function on each one:
lapply(split(cross, cross$city), mult_crosstab)
It seems that CrossTable prints its tables to the console, and your function does not return them as an object. If you want to keep the output, you can capture it as text with capture.output(code_above).
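If you would rather get the tables back as objects, the same split-then-apply idea can be sketched with base R's table() in place of CrossTable. This is only an illustration under assumptions: the toy data frame and the helper name pairwise_tables are made up for the example, and it tabulates only the columns you name instead of every column.

```r
# Toy data mirroring the question's layout (the values are made up)
cross <- data.frame(
  city = factor(c("Washington", "Washington", "Washington",
                  "New-York", "New-York", "New-York")),
  x1 = c(0L, 1L, 1L, 0L, 0L, 1L),
  x2 = c(0L, 1L, 0L, 1L, 0L, 1L),
  x3 = c(1L, 1L, 0L, 0L, 0L, 1L)
)

# Hypothetical helper: cross-tabulate every pair of the named columns
# and return the tables in a named list instead of printing them
pairwise_tables <- function(data, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  tabs <- lapply(pairs, function(p) table(data[[p[1]]], data[[p[2]]], dnn = p))
  names(tabs) <- vapply(pairs, paste, character(1), collapse = " vs ")
  tabs
}

# One list of tables per city
by_city <- lapply(split(cross, cross$city), pairwise_tables,
                  vars = c("x1", "x2", "x3"))
by_city[["Washington"]][["x1 vs x2"]]
```

Each element of by_city is a named list of tables, so a single table can be picked out by city and variable pair and passed on, for example to chisq.test().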

Related

R code of scatter plot for four variables

I tried plotting ASB vs YOI for each Child, grouped by Race.
This is what I have so far:
library(tidyverse)
Antisocial <- structure(list(Child = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L), ASB = c(1L, 1L, 1L, 0L, 0L, 0L, 5L, 5L, 5L, 2L), Race = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Y92 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), Y94 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), YOI = c(90L, 92L, 94L, 90L, 92L, 94L, 90L, 92L, 94L, 90L)), row.names = c(NA, 10L), class = "data.frame")
ggplot(data = Antisocial, aes(x = YOI, y = ASB)) +
geom_point( colour = "Black", size = 2) +
geom_line(data = Antisocial, aes(x= Child), size = 1) +
facet_grid(.~ Race)
Plot Image I generated: https://drive.google.com/file/d/1sZVsRFiGC0dIGg0GWhHhNDCaiW2iB-ky/view?usp=sharing
Full dataset- https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
I want two charts side by side (Race = 0 and Race = 1) that plot ASB vs YOI for each Child. The line, however, should only connect the dots of the same child; right now all the dots are connected. Furthermore, the scale of YOI should run from 90 to 94.
Can you suggest what I should change?
Thanks!
Thanks for providing the data. I changed 4 observations to race 0 to have some variation:
library(tidyverse)
Antisocial <- structure(list(Child = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L), ASB = c(1L, 1L, 1L, 0L, 0L, 0L, 5L, 5L, 5L, 2L), Race = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L), Y92 = c(0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L), Y94 = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L), YOI = c(90L, 92L, 94L, 90L, 92L, 94L, 90L, 92L, 94L, 90L)), row.names = c(NA, 10L), class = "data.frame")
ggplot(data = Antisocial, aes(x = YOI, y = ASB, group = Child)) +
geom_point( colour = "Black", size = 2) +
geom_line()+
facet_grid(.~ Race)
To connect the dots for each child, you need to include group = Child in aes(). I think this is what you want? Let me know if this solved your problem :)

How can I correct my for loop code to ensure that it is applied to each of my columns and the results are combined together?

I am trying to use a for loop to fit a model to each of my metabolites. There are 790 columns I need to apply this to. Each fit yields three values (estimate, std.error and p value), and my empty results matrix has 790 rows and three columns to hold them. The aim is to get an estimate, std.error and p value for each of the 790 metabolites, so that I can gauge whether a metabolite increases or decreases between control and disease, and whether the difference is statistically significant.
Please find below the code I have tried so far; any suggestions would be greatly appreciated.
results.out <- matrix(0, nrow=790, ncol=3)
require(lme4)
require(lmerTest)
for(i in 1:ncol(data[, 8:792])){
fit <- lmer (data1[,i] ~ Diseasestatus + BB + ACOG + WA + BMI + Age + (1|ParticpantID), data=data, REML=F, na.action=na.omit)
results<- summary(fit)$coef[2, c(1, 2, 5)][i]
results.out[, i] <- results
}
The error message I get with the above is:
Error in eval_f(x, ...) : Downdated VtV is not positive definite
(I am not sure whether this might be due to the presence of 0 and 1 in some columns. For instance, disease is 1 and control is 0; taking the medication BB, ACOG or WA is denoted 1, not taking it is 0.)
Trying one of the apply functions instead also gives an error:
output <- apply(data[,8:792], 2, function(i){
fit <- lmer (data[,i] ~ Diseasestatus + BB + ACOG + WA + BMI + Age + (1|ParticpantID), data=data, REML=F, na.action=na.omit)
results<- summary(fit)$coef[2, c(1, 2, 5)][i]
})
dplyr::bind_rows(output, .id="Metabolite")
The error message from the above is:
Error in model.frame.default(data = data, na.action =
na.omit, :
invalid type (list) for variable 'data[, i]'
A snapshot of my data, in case that is useful, can be found below:
structure(list(Participant_ID = c(34L, 35L, 119L, 157L, 158L,
208L, 209L, 1364L, 1365L, 127911L, 127912L, 154110L, 154120L,
167113L, 167123L, 171713L, 171724L, 184212L, 184213L), BB = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 1L), ACOG = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L), WA = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), BMI =
c(23.94688606,
25.87052536, 26.38413048, 24.10971069, 27.77280045, 24.93728065,
26.8804493, 23.90113258, 25.07429123, 27.60118484, 23.12600708,
26.39195442, 23.01516533, 31.3666172, 31.80447578, 24.03654861,
25.11828613, 24.17065239, 28.48728561), Age = c(76L, 76L, 68L,
68L, 68L, 57L, 57L, 56L, 56L, 60L, 60L, 44L, 44L, 58L, 58L, 71L,
71L, 56L, 56L), Diseasestatus = c(0L, 0L, 1L, 1L, 0L, 0L, 1L,
1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L), Met1 = c(0.326537646,
0.362137501, 0.403331692, 0.343789581, 0.437786804, 0.720648545,
0.974583105, 0.565800103, 0.613001417, 0.547743467, 0.337683125,
0.393250468, 0.465795971, 0.390206584, 0.172261362, 0.382496277,
0.435237338, 0.945312001, 0.321214419), Met2 = c(0.465736593,
0.540715637, 0.472693123, 0.681156674, 0.416291697, 0.487306504,
0.499092007, 0.634904337, 0.408109505, 0.808546214, 0.4113336,
0.924069141, 0.673204104, 0.693500596, 0.522794352, 0.373602067,
0.716407827, 0.649634492, 0.514429127), Met3 = c(0.902854296,
0.413241218, 0.418436978, 0.599698582, 0.806269489, 0.746859677,
0.461750237, 0.534943022, 0.511101841, 0.339406025, 0.235624644,
0.405761674, 0.312947287, 0.409833325, 0.026137354, 0.477175654,
0.387610389, 0.226427797, 0.19742037), Met4 = c(0.99425024, 0.923934731,
0.804677487, 0.31081605, 0.351561982, 0.529615606, 0.756342125,
0.968115646, 0.989016517, 0.938703504, 0.841777433, 0.103150219,
0.68397041, 0.903129097, 0.897388285, 0.905293975, 0.992337012,
0.358619626, 0.159601445), Met5 = c(0.527268407, 0.646332723,
0.646042578, 0.163344212, 0.202267074, 0.536976636, 0.789061409,
0.725657854, 0.697350164, 0.044081822, 0.959496477, 0.295039796,
0.120109301, 0.160817478, 0.901107461, 0.529179518, 0.573373775,
0.560701172, 0.325806613), Met6 = c(0.809497068, 0.614253411,
0.375421856, 0.446069992, 0.710859888, 0.474587655, 0.217817798,
0.464787031, 0.5540375, 0.62822217, 0.082906217, 0.294754096,
0.862216149, 0.427856328, 0.418944666, 0.516181576, 0.544516281,
0.519113772, 0.279522811), Met7 = c(0.419627992, 0.365954584,
0.434398151, 0.313441811, 0.368051981, 0.660614914, 0.825809828,
0.412109302, 0.545740249, 0.326247449, 0.373035298, 0.380623499,
0.428859232, 0.321044089, 0.24939936, 0.298372835, 0.387467105,
0.906034877, 0.147250125), Met8 = c(0.549683979, 0.347795497,
0.465729386, 0.625045713, 0.551784129, 0.348174756, 0.4334509,
0.594903245, 0.561353241, 0.621274979, 0.231389704, 0.308801446,
0.464799907, 0.401663011, 0.332966555, 0.109698561, 0.184359915,
0.091447702, 0.20568595), Met9 = c(0.605266628, 0.316564583,
0.166558136, 0.337470002, 0.458328756, 0.409329111, 0.269424154,
0.514746553, 0.408357879, 0.572246814, 0.264718681, 0.125162297,
0.211230627, 0.655667116, 0.034006203, 0.189685846, 0.243832622,
0.360657636, 0.259174139), Met10 = c(0.576174353, 0.214361265,
0.523133504, 0.549085457, 0.430400583, 0.53943429, 0.441563681,
0.401805576, 0.386025835, 0.514017513, 0, 0.330305736, 0.567380079,
0.50505895, 0.242814909, 0.306522744, 0.132950297, 0.207312191,
0.328760686)), class = "data.frame", row.names = c(NA, -19L))
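The per-metabolite loop described above could be sketched as follows. This is a hedged illustration, not a tested fix for the errors quoted: the toy data frame, the reduced covariate set (BB, ACOG and WA are left out for brevity) and the pattern "^Met" for finding metabolite columns are all assumptions. It differs from the attempts above in three ways: the columns are looped over by name, the formula is built from the column name so that lmer finds the response inside data, and each result is written to a row of the results matrix. Note also that apply() passes the column's values to the function, not a column index, which is why data[, i] fails in the second attempt.

```r
library(lmerTest)  # loads lme4; summary() then includes a Pr(>|t|) column

# Hypothetical toy data standing in for the real data set:
# 20 participants with 2 samples each and 3 metabolite columns
set.seed(1)
n_id <- 20
toy <- data.frame(
  Participant_ID = rep(seq_len(n_id), each = 2),
  Diseasestatus  = rep(c(0L, 1L), times = n_id),
  BMI            = rep(rnorm(n_id, mean = 26, sd = 2), each = 2),
  Age            = rep(sample(40:80, n_id, replace = TRUE), each = 2)
)
subject_effect <- rep(rnorm(n_id, sd = 0.2), each = 2)
for (k in 1:3) {
  toy[[paste0("Met", k)]] <- 0.5 + 0.1 * toy$Diseasestatus + subject_effect +
    rnorm(nrow(toy), sd = 0.1)
}

# One row per metabolite, one column per extracted statistic
met_cols <- grep("^Met", names(toy), value = TRUE)
results.out <- matrix(NA_real_, nrow = length(met_cols), ncol = 3,
                      dimnames = list(met_cols,
                                      c("estimate", "std.error", "p.value")))

for (m in met_cols) {
  # Build the formula from the column name so lmer looks the response up in data
  f <- as.formula(paste(m, "~ Diseasestatus + BMI + Age + (1 | Participant_ID)"))
  fit <- lmer(f, data = toy, REML = FALSE, na.action = na.omit)
  coefs <- summary(fit)$coefficients
  # Select the Diseasestatus row by name and store it in this metabolite's row
  results.out[m, ] <- coefs["Diseasestatus",
                            c("Estimate", "Std. Error", "Pr(>|t|)")]
}
results.out
```

Selecting the coefficient row and columns by name rather than by position keeps the extraction correct if the order of the covariates changes.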

How to mutate a column using dplyr with a value when any of the columns contain a 1 otherwise 0

events <- structure(list(ID = c(3049951, 3085397, 3204081, 3262134,
3467254), TVTProcedureStartDate = structure(c(16210, 16238, 16322,
16420, 16546), class = "Date"), DCDate = structure(c(16213, 16250,
16326, 16426, 16560), class = "Date"), CE_EventOccurred = c(0L,
0L, 0L, 0L, 0L), CE_EventDate = c(0L, 0L, 0L, 0L, 0L), `Annular Dissection (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Aortic Dissection (In Hospital)` = c(0L, 0L,
0L, 1L, 0L), `Atrial Fibrillation (In Hospital)` = c(0L, 1L,
0L, 0L, 1L), `Bleeding at Access Site (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Cardiac Arrest (In Hospital)` = c(1L, 0L, 0L,
0L, 0L), `Conduction/Native Pacer Disturbance Req ICD (In Hospital)` = c(0L,
0L, 1L, 0L, 0L), `Conduction/Native Pacer Disturbance Req Pacer (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Endocarditis (In Hospital)` = c(0L, 0L, 0L,
0L, 0L), `GI Bleed (In Hospital)` = c(0L, 0L, 0L, 0L, 0L), `Hematoma at Access Site (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Ischemic Stroke (In Hospital)` = c(0L, 0L,
0L, 0L, 0L), `Major Vascular Complications (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Minor Vascular Complication (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Mitral Leaflet Injury - detected during surgery (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Mitral Subvalvular Injury -detected during surgery (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `New Requirement for Dialysis (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Other Bleed (In Hospital)` = c(0L, 0L, 0L,
0L, 0L), `Perforation with or w/o Tamponade (In Hospital)` = c(1L,
0L, 0L, 0L, 0L), `Retroperitoneal Bleeding (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Single Leaflet Device Attachment (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Unplanned Other Cardiac Surgery or Intervention (In Hospital)` = c(0L,
0L, 0L, 0L, 0L), `Unplanned Vascular Surgery or Intervention (In Hospital)` = c(0L,
0L, 0L, 1L, 0L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), vars = "NCDRPatientID", labels = structure(list(
NCDRPatientID = c(3049951, 3085397, 3204081, 3262134, 3467254
)), class = "data.frame", row.names = c(NA, -5L), vars = "NCDRPatientID", labels = structure(list(
NCDRPatientID = c(3049951, 3085397, 3204081, 3262134, 3467254,
3467324, 3510387, 3586037, 3661089, 3668621, 3679485, 3737916,
3738064, 3960141, 4006862, 4018241, 4019056, 4025174, 4027490,
4050900, 4051101, 4096816, 4097119, 4097146, 4097180, 4098426,
4106410, 4109968, 4147466, 4198427, 4198450, 4198458, 4204554,
4208053, 4213116, 4218802, 4218854, 4223378, 4223415, 4243959,
4316979, 4341660, 4348676, 4413567, 4419513, 4421948, 4422768,
4426483, 4430159, 4431211, 4433156, 4433406, 4433988)), class = "data.frame", row.names = c(NA,
-53L), vars = "NCDRPatientID", labels = structure(list(NCDRPatientID = c(3049951,
3085397, 3204081, 3262134, 3467254, 3467324, 3510387, 3586037,
3661089, 3668621, 3679485, 3737916, 3738064, 3960141, 4006862,
4018241, 4019056, 4025174, 4027490, 4050900, 4051101, 4096816,
4097119, 4097146, 4097180, 4098426, 4106410, 4109968, 4147466,
4198427, 4198450, 4198458, 4204554, 4208053, 4213116, 4218802,
4218854, 4223378, 4223415, 4243959, 4316979, 4341660, 4348676,
4413567, 4419513, 4421948, 4422768, 4426483, 4430159, 4431211,
4433156, 4433406, 4433988)), class = "data.frame", row.names = c(NA,
-53L), vars = "NCDRPatientID", drop = TRUE), indices = list(0L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10:12, 13L, 14L, 15L,
16:17, 18L, 19:21, 22L, 23L, 24L, 25:26, 27L, 28L, 29:30,
31L, 32:33, 34L, 35:38, 39L, 40:41, 42L, 43L, 44L, 45L, 46L,
47L, 48:50, 51:53, 54L, 55L, 56L, 57L, 58L, 59:60, 61L, 62L,
63:64, 65:66, 67:68, 69L, 70L, 71:72, 73L), drop = TRUE, group_sizes = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 3L,
1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 4L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 2L, 1L), biggest_group_size = 4L), indices = list(0L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L,
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L,
51L, 52L), drop = TRUE, group_sizes = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), biggest_group_size = 1L), indices = list(0L, 1L, 2L, 3L, 4L), drop = TRUE, group_sizes = c(1L,
1L, 1L, 1L, 1L), biggest_group_size = 1L)
From this data, I need to create a column that has the value 1 if any of the columns whose name ends in "(In Hospital)" contains a 1, and 0 otherwise.
I tried several things, but each of them either doesn't work or throws an error:
Error in mutate_impl(.data, dots) : Evaluation error: NA/NaN argument.
event %>% mutate(TR = rowSums(select_(.,6:n)))
Error in mutate_impl(.data, dots) : Column `TR` must be length 1 (the group size), not 53
event %>% mutate(TR = rowSums(.[6:ncol(.)]))
I also tried some other variations to make sense of it, but I keep running into similar errors.
Another thing I tried was the following, which seems to compute the row sums, but it also adds the ID:
event %>% select(6:27) %>% rowSums()
It added the ID to the 1s and 0s from columns 6 to 27 for each row. I'm not sure why it does this.
I want the result as a data frame with the same data, plus a column that is 1 if any of the columns from 6 to 27 contains a 1 and 0 otherwise.
Before I developed my solution, I ran the following code to ungroup your data.
library(dplyr)
events <- events %>% ungroup()
Solution 1: rowSums with selected columns
The idea of this solution is to use rowSums to add all the numbers from the selected columns, determine if the sum is larger than 0, and then convert the logical vector to an integer vector (with 1 or 0).
There are many ways to select the columns. We can select based on column numbers.
events2 <- events %>% mutate(Col = as.integer(rowSums(select(., 6:27)) > 0))
events2$Col
# [1] 1 1 1 1 1
We can use ends_with.
events2 <- events %>% mutate(Col = as.integer(rowSums(select(., ends_with("(In Hospital)"))) > 0))
events2$Col
# [1] 1 1 1 1 1
We can use matches. In the regular expression \\(In Hospital\\)$, the $ anchors the match to the end of the column name.
events2 <- events %>% mutate(Col = as.integer(rowSums(select(., matches("\\(In Hospital\\)$"))) > 0))
events2$Col
# [1] 1 1 1 1 1
We can use contains, but notice that with contains the target string does not need to be at the end of the column names.
events2 <- events %>% mutate(Col = as.integer(rowSums(select(., contains("(In Hospital)"))) > 0))
events2$Col
# [1] 1 1 1 1 1
Solution 2: apply with max
Since the numbers in the target columns are all 1 or 0, we can use apply with max to get the row maximum, which will be 1 if there are any 1s and 0 otherwise. All the ways of using the select function shown above will also work here. Below is one way to do this.
events2 <- events %>% mutate(Col = apply(select(., ends_with("(In Hospital)")), 1, max))
events2$Col
# [1] 1 1 1 1 1
It is not a dplyr way, but it also works:
events$new_col <- 0
events$new_col[rowSums(events[, grep("In Hospital", colnames(events))]) >= 1] <- 1
A solution from base R using apply()
cols <- grep("in hospital", colnames(events), ignore.case = T)
apply(events[, cols], 1, function(x) ifelse(any(x == 1), 1, 0))
# [1] 1 1 1 1 1
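For comparison, newer versions of dplyr (1.0.4 or later) offer if_any(), which tests a condition across a selection of columns row by row. The sketch below is an illustration on a made-up three-row data frame, not on the events data from the question; the column names only imitate its naming scheme.

```r
library(dplyr)

# Made-up toy data; check.names = FALSE keeps the spaces and parentheses in the names
toy <- data.frame(
  ID = 1:3,
  `Cardiac Arrest (In Hospital)` = c(1L, 0L, 0L),
  `GI Bleed (In Hospital)`       = c(0L, 0L, 1L),
  check.names = FALSE
)

# if_any() is TRUE for a row when at least one selected column equals 1
toy2 <- toy %>%
  mutate(Col = as.integer(if_any(ends_with("(In Hospital)"), ~ .x == 1)))
toy2$Col
# [1] 1 0 1
```

On grouped data such as the events object in the question, ungrouping first (as above) is still a sensible precaution.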

How to superpose barplots in ggplot2

I would like to superpose three barplots.
Plot 1:
Plot 2:
Plot 3:
fzg <- structure(list(start = c(40L, 22L, 37L, 32L, 72L, 41L, 2L, 11L, 57L, 10L, 102L, 40L, 17L, 48L, 86L, 46L, 49L, 7L, 1L, 2L, 13L, 69L, 42L, 31L, 39L, 64L, 39L, 29L, 67L, 5L, 1L, 54L, 32L, 7L, 4L, 67L, 14L, 26L, 20L, 42L, 26L, 57L, 0L, 34L, 114L), period = 1:45, zug = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE), typ = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), dyn1 = c(0L, 0L, 0L, 0L, 203L, 0L, 0L, 0L, 111L, 0L, 112L, 0L, 0L, 0L, 191L, 0L, 95L, 0L, 0L, 0L, 0L, 92L, 0L, 0L, 0L, 176L, 0L, 0L, 135L, 0L, 0L, 60L, 0L, 0L, 0L, 110L, 0L, 0L, 0L, 0L, 0L, 185L, 0L, 0L, 148L), dyn2 = c(0L, 0L, 0L, 0L, 203L, 0L, 0L, 0L, 0L, 0L, 223L, 0L, 0L, 0L, 0L, 0L, 286L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 268L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 305L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 333L)), .Names = c("start", "period", "zug", "typ", "dyn1", "dyn2"), row.names = c(NA, -45L), class = "data.frame")
x_scale_max <- max(fzg$start, fzg$dyn1, fzg$dyn2)
ggplot(fzg, aes(x=period, y=start, fill=typ)) + geom_bar(stat="identity", position="dodge") + ylim(0,x_scale_max)
ggplot(fzg, aes(x=period, y=dyn1, fill=typ)) + geom_bar(stat="identity", position="dodge") + ylim(0,x_scale_max)
ggplot(fzg, aes(x=period, y=dyn2, fill=typ)) + geom_bar(stat="identity", position="dodge")+ ylim(0,x_scale_max)
The resulting barplot should
show all the small bars from plot 1 in color 0
show all the highlighted bars from plot 1 in color 1
show the added portions from plot 2 in color 2
show the added portions from plot 3 in color 3
I managed to get all in one plot
library(reshape2)
mdat <- melt(fzg[c("start", "period", "dyn1", "dyn2")], measured=c("start","dyn1","dyn2"), id="period")
ggplot(mdat, aes(x=period, y=value, fill=variable)) + geom_bar(stat="identity", position="stack") + ylim(0,x_scale_max)
But the color highlighting of the different steps does not work well.
If you are looking for this plot, just modify your code:
mdat <- melt(fzg[c("start", "period", "dyn1", "dyn2", "typ")], measure.vars=c("start","dyn1","dyn2"), id.vars=c("period", "typ"))
mdat <- mdat[mdat$value != 0,]
ggplot(mdat, aes(x=period, y=value, fill=interaction(variable,typ))) + geom_bar(stat = "identity")

R create variable IF ELSE leads to wrong values

I have a dataframe with:
"serial", the household number, where each household has a variable number of members ("head", "spouse", "parent", "child" or "grandchild"), and the total number of children in the house, "nchild".
I want to create a new variable (in the dput I added an example for clarity: withCM, 'living with male child', and withCF). I have tried various combinations, but I cannot discriminate on the sex of the child within the same "serial": withCM should be 1 only when the household has a row with relate=="child"&sex==1, but the 1 has to appear on a different row (that of the head, spouse or parent).
mydata$withCM<- ifelse(mydata$nchild>0&mydata$relate!="child",1,0)
mydata <- structure(list(serial = c(12345L, 12345L, 12345L, 12345L, 12346L,
12346L, 12347L, 12347L, 12347L, 12348L, 12348L, 12348L, 12348L,
12348L, 12348L, 12348L, 12349L, 12350L, 12350L, 12351L, 12351L,
12351L, 12352L, 12352L, 12352L, 12352L, 12352L, 12353L, 12354L,
12354L), age = c(45L, 44L, 13L, 11L, 29L, 28L, 65L, 61L, 35L,
68L, 61L, 35L, 34L, 6L, 2L, 1L, 62L, 54L, 52L, 67L, 67L, 12L,
49L, 50L, 28L, 21L, 22L, 70L, 89L, 55L), sex = c(1L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L), relate = structure(c(4L,
7L, 1L, 1L, 4L, 7L, 6L, 6L, 4L, 4L, 7L, 1L, 2L, 3L, 3L, 3L, 4L,
4L, 7L, 4L, 7L, 3L, 4L, 7L, 1L, 5L, 5L, 4L, 6L, 4L), .Label = c("child",
"childinlaw", "grandchild", "head", "nonrelative", "parent",
"spouse"), class = "factor"), nchild = c(2L, 2L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 1L, 1L, 3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L), conhija = c(1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L), conhijo = c(1L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("serial",
"age", "sex", "relate", "nchild", "conhija", "conhijo"), class = "data.frame", row.names = c(NA,
-30L))
You can tabulate the gender, family, and role-within-family as:
xtab <- table(mydata$serial, mydata$sex, mydata$relate)
And then choose the heads of the families (or, in the commented line, anyone who has the specific relationship), and alter their tallies as follows:
mydata$sex1 <- 0
mydata$sex2 <- 0
ind <- mydata$relate=="head"
#ind <- mydata$relate %in% c("head","spouse","parent")
mydata$sex1[ind] <- xtab[as.character(mydata$serial[ind]), "1", "child"]
mydata$sex2[ind] <- xtab[as.character(mydata$serial[ind]), "2", "child"]
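Use lapply to process each family in turn, then test whether each member is an adult and whether the unit contains at least one male child.
lives_with_boy <- function(serial)
{
  unit <- mydata[mydata$serial==serial,]
  # TRUE for adults in a unit that has at least one male child
  as.character(unit$relate) %in% c("head","spouse","parent") & any(unit$relate == "child" & unit$sex==1)
}
# assumes the rows of mydata are sorted by serial, so the per-family results line up with the rows
mydata$withCM <- as.integer(unlist(lapply(unique(mydata$serial),lives_with_boy )))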
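A variant that does not depend on the row order can be sketched with ave(), which computes a per-household value and returns it aligned with the original rows. The toy data frame below is made up for the illustration, and the rule applied (an adult is a head, spouse or parent; a boy is a row with relate "child" and sex 1) is the same simplification as in the answer above.

```r
# Made-up toy data: household 1 has a male child, household 2 has no children
toy <- data.frame(
  serial = c(1L, 1L, 1L, 1L, 2L, 2L),
  sex    = c(1L, 2L, 2L, 1L, 1L, 2L),
  relate = c("head", "spouse", "child", "child", "head", "spouse")
)

is_boy   <- as.integer(toy$relate == "child" & toy$sex == 1)
is_adult <- toy$relate %in% c("head", "spouse", "parent")

# ave() applies max within each serial and keeps the result in row order
has_boy <- ave(is_boy, toy$serial, FUN = max)

toy$withCM <- as.integer(is_adult & has_boy == 1)
toy$withCM
# [1] 1 1 0 0 0 0
```

Swapping sex == 1 for sex == 2 gives the corresponding flag for female children.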
