Converting to JSON (key, value) pairs using R

My data frame contains the following data:
Tester  W1  W2  W3   A    P    WD(%)  TS(Hrs.)  AT(Hrs.)  SU(%)
a       60  40  102  202  150  100    120       120       100
b       30  38  46   114  150  76     135       120       100
c       25  30  52   107  150  71     120       120       100
Using the jsonlite package, I converted it to JSON:
{
"Tester": [ "a", "b", "c" ],
"W1": [ 60, 30, 25],
"W2": [ 40, 38, 30 ],
"W3": [ 102, 46, 52 ],
"A": [ 202, 114, 107 ],
"P": [ 150, 150, 150 ],
"WD...": [ 100, 76, 71 ],
"TS.Hrs..": [ 120, 135, 120 ],
"AT.Hrs..": [ 120, 120, 120 ],
"SU...": [ 100, 100, 100 ]
}
But my requirement is to get JSON in this format:
[ {
"Tester": "a",
"W1": 60,
"W2": 40,
"W3": 102,
"A": 202,
"P": 150,
"WD(%)": 100,
"TS(Hrs.)": 120,
"AT(Hrs.)": 120,
"SU(%)": 100
}]
Can someone please help me?

jsonlite produces the column-wise output you're seeing when the data set is a plain list rather than a data frame:
library(jsonlite)
toJSON(as.list(head(iris)))
{"Sepal.Length":[5.1,4.9,4.7,4.6,5,5.4],"Sepal.Width":[3.5,3,3.2,3.1,3.6,3.9],"Petal.Length":[1.4,1.4,1.3,1.5,1.4,1.7],"Petal.Width":[0.2,0.2,0.2,0.2,0.2,0.4],"Species":["setosa","setosa","setosa","setosa","setosa","setosa"]}
Make sure that your data set is indeed a data frame and you will see the expected output:
library(jsonlite)
toJSON(head(iris), pretty = TRUE)
[
{
"Sepal.Length": 5.1,
"Sepal.Width": 3.5,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.9,
"Sepal.Width": 3,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.7,
"Sepal.Width": 3.2,
"Petal.Length": 1.3,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 4.6,
"Sepal.Width": 3.1,
"Petal.Length": 1.5,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 5,
"Sepal.Width": 3.6,
"Petal.Length": 1.4,
"Petal.Width": 0.2,
"Species": "setosa"
},
{
"Sepal.Length": 5.4,
"Sepal.Width": 3.9,
"Petal.Length": 1.7,
"Petal.Width": 0.4,
"Species": "setosa"
}
]
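The mangled keys in the question ("WD...", "TS.Hrs..") suggest that the column names were sanitized when the data frame was built or read. Here is a minimal sketch of one way to keep them, assuming the data is constructed by hand; the object name testers and the two-row subset are made up for illustration. It uses check.names = FALSE to preserve the original names and shows the row-wise versus column-wise serialization side by side:

```r
library(jsonlite)

# Hypothetical reconstruction of the first two rows of the question's data.
# check.names = FALSE stops data.frame() from rewriting "WD(%)" as "WD...".
testers <- data.frame(
  Tester = c("a", "b"),
  W1 = c(60, 30),
  `WD(%)` = c(100, 76),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A data frame is serialized row by row: one {key: value} object per row
json_rows <- toJSON(testers, pretty = TRUE)
json_rows

# The same data as a plain list is serialized column by column
json_cols <- toJSON(as.list(testers))
json_cols
```

If the data comes from read.csv() instead, the same check.names = FALSE argument applies there.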

Related

structuring JSON data in R

I'm new to JSON data and am having a bit of trouble getting my data into a single, combined data frame in R. Here is an example of the JSON data:
{
"id": "rub_al_khali",
"conversion_px": 0.0395882818685669,
"n_surfaces": 4,
"lithic_contours": [
{
"surface_id": 0,
"classification": "Ventral",
"total_area_px": 530565.5,
"total_area": 831.5,
"max_breadth": 22.4,
"max_length": 54,
"polygon_count": 7,
"scar_count": 0,
"percentage_detected_scars": 0,
"scar_contours": []
},
{
"surface_id": 1,
"classification": "Dorsal",
"total_area_px": 530503.5,
"total_area": 831.4,
"max_breadth": 22.4,
"max_length": 54,
"polygon_count": 7,
"scar_count": 4,
"percentage_detected_scars": 0.62,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 129337,
"total_area": 202.7,
"max_breadth": 10.3,
"max_length": 41.7,
"percentage_of_surface": 0.24,
"scar_angle": 1.85,
"polygon_count": 5
},
{
"scar_id": 1,
"total_area_px": 100130,
"total_area": 156.9,
"max_breadth": 7.2,
"max_length": 43,
"percentage_of_surface": 0.19,
"scar_angle": 357.36,
"polygon_count": 4
},
{
"scar_id": 2,
"total_area_px": 93162,
"total_area": 146,
"max_breadth": 6.5,
"max_length": 41.4,
"percentage_of_surface": 0.18,
"scar_angle": 5.01,
"polygon_count": 4
},
{
"scar_id": 3,
"total_area_px": 6148.5,
"total_area": 9.6,
"max_breadth": 4,
"max_length": 7.1,
"percentage_of_surface": 0.01,
"scar_angle": "NaN",
"polygon_count": 9
}
]
},
{
"surface_id": 2,
"classification": "Lateral",
"total_area_px": 176204,
"total_area": 276.2,
"max_breadth": 8.6,
"max_length": 54.2,
"polygon_count": 3,
"scar_count": 2,
"percentage_detected_scars": 0.33,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 44605,
"total_area": 69.9,
"max_breadth": 5,
"max_length": 50,
"percentage_of_surface": 0.25,
"scar_angle": "NaN",
"polygon_count": 3
},
{
"scar_id": 1,
"total_area_px": 12877,
"total_area": 20.2,
"max_breadth": 1.5,
"max_length": 22.3,
"percentage_of_surface": 0.07,
"scar_angle": "NaN",
"polygon_count": 2
}
]
},
{
"surface_id": 3,
"classification": "Platform",
"total_area_px": 55252.5,
"total_area": 86.6,
"max_breadth": 20.3,
"max_length": 6.6,
"polygon_count": 5,
"scar_count": 1,
"percentage_detected_scars": 0.42,
"scar_contours": [
{
"scar_id": 0,
"total_area_px": 23298.5,
"total_area": 36.5,
"max_breadth": 15,
"max_length": 4.1,
"percentage_of_surface": 0.42,
"scar_angle": "NaN",
"polygon_count": 4
}
]
}
]
}
So far I've used jsonlite to import the file into R with flatten = TRUE:
library(jsonlite)
dta <- fromJSON("~/rub_al_khali.json", flatten = TRUE)
While this gets me halfway there, the result isn't really a single, comprehensive data.frame. I think dta$lithic_contours might be causing the issue. Any help is much appreciated.
jsonlite::fromJSON() returns a list, but the element lithic_contours contains a data.frame. Just subset the list to get your data.frame:
# Subset the list on lithic_contours with $ ...
df <- jsonlite::fromJSON("~/rub_al_khali.json", flatten = TRUE)$lithic_contours
# ... and it's already a data.frame
class(df)
#> [1] "data.frame"
# Turning into a tibble for better printing
tibble::as_tibble(df)
#> # A tibble: 4 × 10
#> surface_id classification total_area_px total_area max_breadth max_length
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 0 Ventral 530566. 832. 22.4 54
#> 2 1 Dorsal 530504. 831. 22.4 54
#> 3 2 Lateral 176204 276. 8.6 54.2
#> 4 3 Platform 55252. 86.6 20.3 6.6
#> # … with 4 more variables: polygon_count <int>, scar_count <int>,
#> # percentage_detected_scars <dbl>, scar_contours <list>
Created on 2022-04-04 by the reprex package (v2.0.1)
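To make the list-versus-data-frame distinction concrete without the original file, here is a small sketch that parses a trimmed-down, hypothetical version of the JSON from an inline string. The last line, which copies the top-level id onto every row, is an optional extra and not part of the answer above:

```r
library(jsonlite)

# Hypothetical, trimmed-down version of the JSON in the question
json_txt <- '{
  "id": "rub_al_khali",
  "n_surfaces": 2,
  "lithic_contours": [
    {"surface_id": 0, "classification": "Ventral", "scar_contours": []},
    {"surface_id": 1, "classification": "Dorsal",
     "scar_contours": [{"scar_id": 0, "total_area": 202.7}]}
  ]
}'

dta <- fromJSON(json_txt, flatten = TRUE)

class(dta)                  # the top level is a list
names(dta)
class(dta$lithic_contours)  # the nested array of objects is a data frame

contours <- dta$lithic_contours
contours$id <- dta$id       # carry a top-level field along, if needed
```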
Update: unnesting list column
The scar_contours column of your data frame is a list column. This is often quite a convenient format for analysis, but if you want to expand it into regular columns you can use the function tidyr::unnest(). Note that surfaces with no scars (here, Ventral) are dropped from the result:
library(tidyr)
df %>% unnest(scar_contours, names_repair = "minimal")
#> # A tibble: 7 × 17
#> surface_id classification total_area_px total_area max_breadth max_length
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Dorsal 530504. 831. 22.4 54
#> 2 1 Dorsal 530504. 831. 22.4 54
#> 3 1 Dorsal 530504. 831. 22.4 54
#> 4 1 Dorsal 530504. 831. 22.4 54
#> 5 2 Lateral 176204 276. 8.6 54.2
#> 6 2 Lateral 176204 276. 8.6 54.2
#> 7 3 Platform 55252. 86.6 20.3 6.6
#> # … with 11 more variables: polygon_count <int>, scar_count <int>,
#> # percentage_detected_scars <dbl>, scar_id <int>, total_area_px <dbl>,
#> # total_area <dbl>, max_breadth <dbl>, max_length <dbl>,
#> # percentage_of_surface <dbl>, scar_angle <dbl>, polygon_count <int>
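The Ventral surface disappears from the output above because its scar_contours entry is empty. If those rows should be kept, unnest() has a keep_empty argument. The sketch below uses a hypothetical two-surface table (the names surfaces and scar_area are made up) rather than the real data:

```r
library(tidyr)

# Hypothetical two-surface example: the first surface has no scars
surfaces <- tibble::tibble(
  surface_id = c(0L, 1L),
  classification = c("Ventral", "Dorsal"),
  scar_contours = list(
    data.frame(scar_id = integer(), scar_area = numeric()),
    data.frame(scar_id = 0:1, scar_area = c(202.7, 156.9))
  )
)

# Default: the surface with an empty scar table is dropped
dropped <- unnest(surfaces, scar_contours)

# keep_empty = TRUE keeps it as a single row filled with NA
kept <- unnest(surfaces, scar_contours, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```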

ggplot2 pie chart : Repositioning ggrepel slice labels by moving them toward circumference of the pie possible?

I am trying to create a pie chart from the following data.table object using ggplot2:
> pietable
cluster N P p
1: 1 962 17.4 8.70
2: 3 611 11.1 22.95
3: 10 343 6.2 31.60
4: 12 306 5.5 37.45
5: 8 290 5.2 42.80
6: 5 288 5.2 48.00
7: 7 259 4.7 52.95
8: 18 210 3.8 57.20
9: 4 207 3.7 60.95
10: 9 204 3.7 64.65
11: 16 199 3.6 68.30
12: 17 201 3.6 71.90
13: 14 174 3.1 75.25
14: 22 159 2.9 78.25
15: 6 121 2.2 80.80
16: 21 106 1.9 82.85
17: 2 101 1.8 84.70
18: 26 95 1.7 86.45
19: 11 89 1.6 88.10
20: 24 84 1.5 89.65
21: 32 71 1.3 91.05
22: 13 65 1.2 92.30
23: 38 50 0.9 93.35
24: 25 41 0.7 94.15
25: 36 36 0.7 94.85
26: 20 39 0.7 95.55
27: 28 33 0.6 96.20
28: 23 30 0.5 96.75
29: 31 24 0.4 97.20
30: 15 21 0.4 97.60
31: 34 22 0.4 98.00
32: 30 14 0.3 98.35
33: 33 19 0.3 98.65
34: 40 10 0.2 98.90
35: 29 11 0.2 99.10
36: 37 9 0.2 99.30
37: 19 8 0.1 99.45
38: 27 6 0.1 99.55
39: 39 6 0.1 99.65
40: 35 3 0.1 99.75
N is the total count for a cluster, P is its proportion (in percent), and p is a cumulative position derived from P (the midpoint of each slice).
The ggplot2 code that creates the pie chart is as follows:
ggplot(pietable, aes("", P)) +
geom_bar(
stat = "identity",
aes(
fill = rev(fct_inorder(cluster)))) +
geom_label_repel(
data = pietable[!P<1],
aes(
label = paste0(P, "%"),
y = p1,
#col = rev(fct_inorder(cluster))
),
point.padding = NA,
max.overlaps = Inf,
nudge_x = 1,
color="red",
force = 0.5,
force_pull = 0,
segment.alpha=0.5,
arrow=arrow(length = unit(0.05, "inches"),ends = "last", type = "open"),
show.legend = F
)+
geom_label_repel(
data = pietable[!P<1],
aes(label = cluster,
y = p1),
size=2,
col="black",
force = 0,
force_pull = 0,
label.size = 0.01,
show.legend = F
)+
scale_fill_manual(values = P40) +
coord_polar(theta = "y")+
theme_void()
This generates a pie chart like this:
Slices with very small values are not labeled, for obvious reasons.
What I'd like to do is reposition the numeric slice labels by moving them toward the circumference of the pie.
An example of the pie chart I'd like to create looks like this:
I'd appreciate any suggestions on how to accomplish this.
This is a lot of labeling for one visualization, and you may want to consider a different design.
That said, you can add nudge_x = 0.33 and segment.color = 'transparent' to the black label geom to push its labels outward. To keep the red arrows aligned with the nudged black labels, also add x = after_stat(1.33) inside aes() of the red geom: 1.33 is the black geom's new nudge_x (0.33) plus the bar's x position (1.0). These lines are marked with # added comments in the code below.
I've made some light modifications to your plotting code (you seem to have variables in your workspace that are not available in your post), but this is the general idea:
library(ggplot2)
library(ggrepel)
ggplot(pietable, aes("", P)) +
geom_bar(
stat = "identity",
aes(
fill = factor(cluster))) +
geom_label_repel(
data = pietable[!pietable$P<1, ],
aes(
label = paste0(P, "%"),
y = p,
x = after_stat(1.33) # added
#col = rev(fct_inorder(cluster))
),
point.padding = NA,
max.overlaps = Inf,
nudge_x = 1,
color="red",
force = 0.5,
force_pull = 0,
segment.alpha=0.5,
arrow=arrow(length = unit(0.05, "inches"),ends = "last", type = "open"),
show.legend = F
)+
geom_label_repel(
data = pietable[!pietable$P<1,],
aes(label = cluster,
y = p),
size=2,
col="black",
nudge_x = 0.33, # added
force = 0,
force_pull = 0,
label.size = 0.01,
show.legend = F,
segment.color = 'transparent' # added
)+
coord_polar(theta = "y")+
theme_void()
And for reference, the data set I worked from:
pietable <- structure(list(cluster = c(1, 3, 10, 12, 8, 5, 7, 18, 4, 9, 16,
17, 14, 22, 6, 21, 2, 26, 11, 24, 32, 13, 38, 25, 36, 20, 28,
23, 31, 15, 34, 30, 33, 40, 29, 37, 19, 27, 39, 35), N = c(962,
611, 343, 306, 290, 288, 259, 210, 207, 204, 199, 201, 174, 159,
121, 106, 101, 95, 89, 84, 71, 65, 50, 41, 36, 39, 33, 30, 24,
21, 22, 14, 19, 10, 11, 9, 8, 6, 6, 3), P = c(17.4, 11.1, 6.2,
5.5, 5.2, 5.2, 4.7, 3.8, 3.7, 3.7, 3.6, 3.6, 3.1, 2.9, 2.2, 1.9,
1.8, 1.7, 1.6, 1.5, 1.3, 1.2, 0.9, 0.7, 0.7, 0.7, 0.6, 0.5, 0.4,
0.4, 0.4, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1), p = c(8.7,
22.95, 31.6, 37.45, 42.8, 48, 52.95, 57.2, 60.95, 64.65, 68.3,
71.9, 75.25, 78.25, 80.8, 82.85, 84.7, 86.45, 88.1, 89.65, 91.05,
92.3, 93.35, 94.15, 94.85, 95.55, 96.2, 96.75, 97.2, 97.6, 98,
98.35, 98.65, 98.9, 99.1, 99.3, 99.45, 99.55, 99.65, 99.75)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame"))
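Judging from the numbers, the p column used for the label positions appears to be the midpoint of each slice on the cumulative scale rather than a plain cumulative sum. This is an observation from the data, not something stated in the question. A short sketch with the first four proportions:

```r
# First four proportions from the table above
P <- c(17.4, 11.1, 6.2, 5.5)

# Midpoint of each slice on the cumulative scale:
# everything before the slice plus half of the slice itself
p <- cumsum(P) - P / 2
p
```

Placing labels at these midpoints is what centers each label on its slice once coord_polar(theta = "y") wraps the stacked bar into a pie.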

How can I fix the VECM warning message of "the condition has length > 1 and only the first element will be used"?

When I run a vector error correction model (VECM) using the VECM() function from the tsDyn package, I keep getting the following warnings, which I have not been able to fix:
Warning messages:
1: In if (class(x) == "numeric") return(noquote(r)) :
the condition has length > 1 and only the first element will be used
2: In if (class(x) == "matrix") return(matrix(noquote(r), ncol = ncol(x), :
the condition has length > 1 and only the first element will be used
3: In if (class(x) == "numeric") return(noquote(r)) :
the condition has length > 1 and only the first element will be used
4: In if (class(x) == "matrix") return(matrix(noquote(r), ncol = ncol(x), :
the condition has length > 1 and only the first element will be used
The code I use is as follows:
e6<-data1[,c("x8", "x1","x2","x3","x4","x5","x6")]
est_tsdyn <- VECM(e6, lag = 8, include = "both", estim = "ML", exogen = NULL)
summary(est_tsdyn)
The data is balanced, with 2060 rows and no missing values.
Because I am fitting a vector error correction model to balanced panel data, the function may not be handling the panel structure properly. However, I cannot find any alternative to the VECM function for running a vector error correction model on my panel data, and I do not think a panel VECM implementation exists in R either.
The first 50 rows, produced with dput(data1[1:50,]), are as follows:
> dput(data1[1:50,])
structure(list(Country = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Date = c(48,
49, 52, 53, 54, 57, 59, 60, 64, 65, 69, 71, 86, 87, 88, 92, 101,
102, 105, 106, 110, 113, 118, 119, 121, 123, 124, 125, 126, 127,
129, 132, 133, 136, 137, 143, 144, 148, 149, 151, 152, 155, 156,
157, 158, 161, 162, 166, 167, 168), x1 = c(0.014748522,
0.118574701, 0.014776643, 0.110949861, 0.01481079, 0.118697229,
0.109259581, 0.106920507, 0.09964718, 0.107359397, 0.100214624,
0.101336456, 0.084556183, 0.109388135, 0.049318414, 0.083084846,
0.101614654, 0.09898533, 0.08605765, 0.099262524, 0.097317145,
0.094441761, 0.088059271, 0.101287244, 0.102545664, 0.106297825,
0.097040955, 0.080330986, 0.103339081, 0.108313506, 0.100936735,
0.10794291, 0.11167398, 0.111364648, 0.108089542, 0.110835368,
0.112419189, 0.110474815, 0.112116887, 0.122428299, 0.114857692,
0.115030436, 0.119601122, 0.114017072, 0.114926991, 0.113645471,
0.117205805, 0.115805775, 0.11617135, 0.114326404), x2 = c(0.044647275,
0.053976585, 0.030403218, 0.044558117, 0.063132462, 0.103456438,
0.117170791, 0.104951921, 0.108145525, 0.107693444, 0.096528502,
0.095931022, 0.083300776, 0.080563349, 0.076819818, 0.084028311,
0.095892312, 0.096190825, 0.091091159, 0.090343147, 0.096242416,
0.085306606, 0.085667078, 0.09251297, 0.105269247, 0.095251763,
0.093446551, 0.096549008, 0.100387759, 0.101508899, 0.100509418,
0.107830747, 0.109448071, 0.110830736, 0.109078427, 0.109318996,
0.112848661, 0.110987973, 0.112196608, 0.115601933, 0.114478704,
0.116686745, 0.116382225, 0.113006561, 0.109417021, 0.114979708,
0.115397391, 0.115777083, 0.114273074, 0.111343996), x3 = c(25,
25, 41.67, 75, 88.89, 93.52, 93.52, 93.52, 93.52, 93.52, 93.52,
93.52, 90.74, 90.74, 90.74, 90.74, 90.74, 88.89, 88.89, 88.89,
88.89, 88.89, 88.89, 92.59, 92.59, 92.59, 92.59, 92.59, 92.59,
92.59, 92.59, 90.74, 90.74, 90.74, 90.74, 88.89, 87.96, 87.96,
87.96, 87.96, 87.96, 87.96, 87.96, 87.96, 87.96, 87.96, 87.96,
87.96, 87.96, 87.96), x4 = c(0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
x5 = c(4.815325122, 4.815325122, 4.815325122,
4.815325122, 4.815325122, 4.815325122, 4.815325122, 4.815325122,
4.815325122, 4.815325122, 4.815325122, 4.815325122, 4.815325122,
4.815325122, 4.815325122, 4.815325122, 4.815325122, 4.815325122,
4.815325122, 4.815325122, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309, 6.041347309, 6.041347309, 6.041347309,
6.041347309, 6.041347309), x6 = c(0.7935,
0.7303, 0.5763, 0.5331, 0.4907, 0.3064, 0.2461, 0.1939, 0.1127,
0.096, 0.0012, -0.0282, -0.2368, -0.2497, -0.2622, -0.3073,
-0.4152, -0.425, -0.4503, -0.461, -0.5089, -0.5376, -0.5856,
-0.5956, -0.6147, -0.6337, -0.6429, -0.652, -0.6779, -0.6863,
-0.7033, -0.7285, -0.7366, -0.7596, -0.7673, -0.8152, -0.8226,
-0.8511, -0.8582, -0.8817, -0.8897, -0.913, -0.9206, -0.9285,
-0.9366, -0.9632, -0.9714, -1.0053, -1.0137, -1.0223), x7 = c(38,
38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
38, 38, 38, 38), X8 = c(-4.397966662, -6.304929628,
0.488928104, -6.304929628, 2.54486109, -3.296545249, 1.344450099,
3.782659735, -0.844822382, 4.83150399, -6.304929628, 2.159834672,
1.420876501, -3.354324242, 3.589037795, 1.061780955, 4.228123326,
-0.404162634, -5.056291726, 0.010801841, -5.328349718, -1.493660218,
-0.696633142, -4.105707617, -0.871840445, 5.29044444, -1.962123959,
0.586428005, 1.138495764, 1.753597336, 0.275856688, 2.375667683,
3.884202996, 1.723158621, -1.047778386, -2.310359726, 0.175022741,
-4.057753192, 1.331212028, -4.328358106, 2.086407315, -1.432959593,
-0.337455739, -1.618003031, -3.500966569, -0.620899578, -3.649420293,
-0.459085095, 2.257504544, 0.745875601), X9 = c(-4.302658422,
-6.110280589, 0.490125308, -6.110280589, 2.577519125, -3.242801379,
1.353528468, 3.855112975, -0.841263786, 4.950123801, -6.110280589,
2.183327935, 1.431018931, -3.298690566, 3.654221238, 1.067437852,
4.318781661, -0.403346996, -4.930588828, 0.010802424, -5.188881247,
-1.482560447, -0.694212278, -4.022565186, -0.868050937, 5.432889579,
-1.942999592, 0.58815086, 1.145001292, 1.769063124, 0.276237523,
2.404111465, 3.960624404, 1.738090643, -1.04230831, -2.28387527,
0.175175995, -3.976528721, 1.340112104, -4.236021695, 2.108324957,
-1.422741592, -0.336886997, -1.604983674, -3.440391694, -0.61897598,
-3.583631679, -0.45803291, 2.283179015, 0.748664182), X10 = c(0.022036057,
0.022099114, 0.022148854, 0.022295818, 0.022296321, 0.022417636,
0.022468635, 0.022471382, 0.022464479, 0.022474524, 0.022565,
0.022556508, 0.022628762, 0.022632952, 0.022636849, 0.022625484,
0.022663127, 0.022660331, 0.022713486, 0.022710519, 0.022745041,
0.022848741, 0.022858749, 0.022866118, 0.022865227, 0.022874749,
0.022874749, 0.022874749, 0.022874749, 0.022874749, 0.022873025,
0.022861229, 0.022866133, 0.022853027, 0.022850894, 0.022853874,
0.022850921, 0.022855289, 0.022853114, 0.022862262, 0.022861413,
0.022849419, 0.022846619, 0.022845453, 0.022850036, 0.022871213,
0.022874749, 0.022860246, 0.022859786, 0.022857052), x11 = c(0.02205167,
0.022114713, 0.022164428, 0.022311364, 0.022311864, 0.022433137,
0.022484114, 0.022486855, 0.022479932, 0.022489972, 0.022580409,
0.022571904, 0.022644075, 0.022648261, 0.022652155, 0.022640772,
0.022678364, 0.022675565, 0.022728696, 0.022725727, 0.022760221,
0.022863891, 0.022873875, 0.02288124, 0.022880342, 0.022889387,
0.022889387, 0.022889387, 0.022889387, 0.022889387, 0.022888096,
0.022876286, 0.022881185, 0.022868066, 0.02286593, 0.022868884,
0.022865929, 0.022870278, 0.0228681, 0.022877231, 0.022876379,
0.022864371, 0.022861568, 0.022860399, 0.022864979, 0.022886138,
0.022889387, 0.022875151, 0.022874688, 0.022871951), x12 = c(0.021513181,
0.021571753, 0.021617452, 0.02174688, 0.021747569, 0.021882247,
0.021932113, 0.021935407, 0.021929198, 0.021940171, 0.022036504,
0.022028441, 0.022112581, 0.02211688, 0.022121171, 0.022110325,
0.022152497, 0.022149788, 0.022207397, 0.022204502, 0.022237638,
0.022350023, 0.022361011, 0.022368394, 0.022367831, 0.022392916,
0.022392916, 0.022392916, 0.022385136, 0.022383687, 0.022381105,
0.022369664, 0.022375024, 0.022362253, 0.02236023, 0.022365686,
0.022362796, 0.022367793, 0.022365675, 0.022375336, 0.022374587,
0.022363052, 0.022360332, 0.022359293, 0.022363957, 0.022387616,
0.022392877, 0.022377085, 0.02237674, 0.022374056), x13 = c(0.021528877,
0.021587435, 0.021633108, 0.021762508, 0.021763194, 0.021897824,
0.021947669, 0.021950955, 0.021944726, 0.021955694, 0.022051985,
0.022043909, 0.022127962, 0.022132257, 0.022136544, 0.02212568,
0.022167799, 0.022165088, 0.022222671, 0.022219773, 0.022252881,
0.022365232, 0.022376196, 0.022383574, 0.022383005, 0.022407741,
0.022407741, 0.022407741, 0.022400273, 0.022398821, 0.022396232,
0.022384778, 0.022390134, 0.022377348, 0.022375323, 0.022380752,
0.02237786, 0.022382837, 0.022380717, 0.022390361, 0.022389608,
0.02237806, 0.022375337, 0.022374295, 0.022378955, 0.022402595,
0.022407741, 0.022392044, 0.022391696, 0.022389009), x14 = c(355.7064977,
355.7064977, 355.7064977, 355.7064977, 355.7064977, 355.7064977,
355.7064977, 366.871849, 366.871849, 366.871849, 366.871849,
366.871849, 436.6764361, 436.6764361, 436.6764361, 436.6764361,
343.7874609, 343.7874609, 343.7874609, 343.7874609, 343.7874609,
343.7874609, 343.7874609, 343.7874609, 351.4579307, 351.4579307,
351.4579307, 351.4579307, 351.4579307, 351.4579307, 351.4579307,
351.4579307, 351.4579307, 351.4579307, 351.4579307, 313.8276295,
313.8276295, 313.8276295, 313.8276295, 313.8276295, 313.8276295,
313.8276295, 313.8276295, 313.8276295, 313.8276295, 299.7095158,
299.7095158, 299.7095158, 299.7095158, 299.7095158), x15 = c(13,
13, 13, 13, 13, 13, 13, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5,
-1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5,
-5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5,
-5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5,
-5.5, -5.5, -5.5, -5.5, -5.5, -5.5, -5.5), x16 = c(2, 2,
2, 2, 2, 2, 2, 3.3, 3.3, 3.3, 3.3, 3.3, 1.5, 1.5, 1.5, 1.5,
1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.2, 2.2, 2.2, 2.2, 2.2,
2.2, 2.2, 2.2, 2.2, 2.2, 2.2, 2.2, 1.9, 1.9, 1.9, 1.9, 1.9,
1.9, 1.9, 1.9, 1.9, 1.9, 2.7, 2.7, 2.7, 2.7, 2.7), x17 = c(53.9,
75.47, 75.91, 75.91, 72, 61, 57.08, 57.06, 46.7, 43.35, 40.11,
43.83, 33.04, 35.28, 32.61, 27.99, 25.66, 25.81, 27.57, 27.57,
33.47, 31.77, 31.78, 30.43, 27.68, 27.94, 29.43, 28.08, 32.19,
29.52, 28, 24.84, 24.32, 24.74, 25.44, 22.99, 22.65, 22.28,
22.13, 21.51, 22.54, 22.37, 22.03, 23.27, 24.47, 26.12, 26.57,
31.46, 28.81, 29.71), x18 = c(13.95348837, 40.01855288,
-8.199298585, 0.711368726, -5.820797907, -4.61297889, -12.9081477,
6.574523721, 3.227232538, -7.173447537, -1.787463271, 14.88859764,
19.84040624, 6.779661017, -7.568027211, -8.319685555, -4.396423249,
0.58456742, 6.819062379, 0, -0.594000594, -9.538724374, -8.494097322,
-4.247954688, -3.284416492, 0.939306358, 5.33285612, -4.587155963,
17.95529498, -8.294501398, 0.864553314, 1.553556827, -2.093397746,
-4.256965944, 2.829426031, -3.240740741, -1.478903871, -7.282563462,
-0.673249551, 0.74941452, 4.788470479, -0.754214729, -1.519892713,
5.628688153, 5.156854319, -1.098068913, 1.722817764, 2.308943089,
-8.423394787, 3.123915307)), row.names = c(NA, 50L), class = "data.frame")
Printing the first 50 rows of the raw data with data1[1:50,] gives:
Country Date x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18
1 48 0.01474852 0.04464728 25 0 4.815325 0.7935 38 -4.39796666 -4.30265842 0.02203606 0.02205167 0.02151318 0.02152888 355.7065 13 2 53.9 13.9534884
1 49 0.1185747 0.05397659 25 0 4.815325 0.7303 38 -6.30492963 -6.11028059 0.02209911 0.02211471 0.02157175 0.02158743 355.7065 13 2 75.47 40.0185529
1 52 0.01477664 0.03040322 41.67 0 4.815325 0.5763 38 0.4889281 0.49012531 0.02214885 0.02216443 0.02161745 0.02163311 355.7065 13 2 75.91 -8.1992986
1 53 0.11094986 0.04455812 75 0 4.815325 0.5331 38 -6.30492963 -6.11028059 0.02229582 0.02231136 0.02174688 0.02176251 355.7065 13 2 75.91 0.7113687
1 54 0.01481079 0.06313246 88.89 1 4.815325 0.4907 38 2.54486109 2.57751912 0.02229632 0.02231186 0.02174757 0.02176319 355.7065 13 2 72 -5.8207979
1 57 0.11869723 0.10345644 93.52 1 4.815325 0.3064 38 -3.29654525 -3.24280138 0.02241764 0.02243314 0.02188225 0.02189782 355.7065 13 2 61 -4.6129789
1 59 0.10925958 0.11717079 93.52 1 4.815325 0.2461 38 1.3444501 1.35352847 0.02246864 0.02248411 0.02193211 0.02194767 355.7065 13 2 57.08 -12.9081477
1 60 0.10692051 0.10495192 93.52 1 4.815325 0.1939 38 3.78265974 3.85511297 0.02247138 0.02248686 0.02193541 0.02195096 366.8718 -1.5 3.3 57.06 6.5745237
1 64 0.09964718 0.10814553 93.52 1 4.815325 0.1127 38 -0.84482238 -0.84126379 0.02246448 0.02247993 0.0219292 0.02194473 366.8718 -1.5 3.3 46.7 3.2272325
1 65 0.1073594 0.10769344 93.52 1 4.815325 0.096 38 4.83150399 4.9501238 0.02247452 0.02248997 0.02194017 0.02195569 366.8718 -1.5 3.3 43.35 -7.1734475
1 69 0.10021462 0.0965285 93.52 1 4.815325 0.0012 38 -6.30492963 -6.11028059 0.022565 0.02258041 0.0220365 0.02205198 366.8718 -1.5 3.3 40.11 -1.7874633
1 71 0.10133646 0.09593102 93.52 1 4.815325 -0.0282 38 2.15983467 2.18332793 0.02255651 0.0225719 0.02202844 0.02204391 366.8718 -1.5 3.3 43.83 14.8885976
1 86 0.08455618 0.08330078 90.74 1 4.815325 -0.2368 38 1.4208765 1.43101893 0.02262876 0.02264407 0.02211258 0.02212796 436.6764 -1.5 1.5 33.04 19.8404062
1 87 0.10938813 0.08056335 90.74 1 4.815325 -0.2497 38 -3.35432424 -3.29869057 0.02263295 0.02264826 0.02211688 0.02213226 436.6764 -1.5 1.5 35.28 6.779661
1 88 0.04931841 0.07681982 90.74 1 4.815325 -0.2622 38 3.58903779 3.65422124 0.02263685 0.02265216 0.02212117 0.02213654 436.6764 -1.5 1.5 32.61 -7.5680272
1 92 0.08308485 0.08402831 90.74 1 4.815325 -0.3073 38 1.06178095 1.06743785 0.02262548 0.02264077 0.02211033 0.02212568 436.6764 -1.5 1.5 27.99 -8.3196856
1 101 0.10161465 0.09589231 90.74 1 4.815325 -0.4152 38 4.22812333 4.31878166 0.02266313 0.02267836 0.0221525 0.0221678 343.7875 -1.5 1.5 25.66 -4.3964232
1 102 0.09898533 0.09619082 88.89 1 4.815325 -0.425 38 -0.40416263 -0.403347 0.02266033 0.02267557 0.02214979 0.02216509 343.7875 -1.5 1.5 25.81 0.5845674
1 105 0.08605765 0.09109116 88.89 1 4.815325 -0.4503 38 -5.05629173 -4.93058883 0.02271349 0.0227287 0.0222074 0.02222267 343.7875 -1.5 1.5 27.57 6.8190624
1 106 0.09926252 0.09034315 88.89 1 4.815325 -0.461 38 0.01080184 0.01080242 0.02271052 0.02272573 0.0222045 0.02221977 343.7875 -1.5 1.5 27.57 0
1 110 0.09731714 0.09624242 88.89 1 6.041347 -0.5089 38 -5.32834972 -5.18888125 0.02274504 0.02276022 0.02223764 0.02225288 343.7875 -1.5 1.5 33.47 -0.5940006
1 113 0.09444176 0.08530661 88.89 1 6.041347 -0.5376 38 -1.49366022 -1.48256045 0.02284874 0.02286389 0.02235002 0.02236523 343.7875 -1.5 1.5 31.77 -9.5387244
1 118 0.08805927 0.08566708 88.89 1 6.041347 -0.5856 38 -0.69663314 -0.69421228 0.02285875 0.02287387 0.02236101 0.0223762 343.7875 -1.5 1.5 31.78 -8.4940973
1 119 0.10128724 0.09251297 92.59 1 6.041347 -0.5956 38 -4.10570762 -4.02256519 0.02286612 0.02288124 0.02236839 0.02238357 343.7875 -5.5 2.2 30.43 -4.2479547
1 121 0.10254566 0.10526925 92.59 1 6.041347 -0.6147 38 -0.87184045 -0.86805094 0.02286523 0.02288034 0.02236783 0.02238301 351.4579 -5.5 2.2 27.68 -3.2844165
1 123 0.10629782 0.09525176 92.59 1 6.041347 -0.6337 38 5.29044444 5.43288958 0.02287475 0.02288939 0.02239292 0.02240774 351.4579 -5.5 2.2 27.94 0.9393064
1 124 0.09704095 0.09344655 92.59 1 6.041347 -0.6429 38 -1.96212396 -1.94299959 0.02287475 0.02288939 0.02239292 0.02240774 351.4579 -5.5 2.2 29.43 5.3328561
1 125 0.08033099 0.09654901 92.59 1 6.041347 -0.652 38 0.58642801 0.58815086 0.02287475 0.02288939 0.02239292 0.02240774 351.4579 -5.5 2.2 28.08 -4.587156
1 126 0.10333908 0.10038776 92.59 1 6.041347 -0.6779 38 1.13849576 1.14500129 0.02287475 0.02288939 0.02238514 0.02240027 351.4579 -5.5 2.2 32.19 17.955295
1 127 0.10831351 0.1015089 92.59 1 6.041347 -0.6863 38 1.75359734 1.76906312 0.02287475 0.02288939 0.02238369 0.02239882 351.4579 -5.5 2.2 29.52 -8.2945014
1 129 0.10093673 0.10050942 92.59 1 6.041347 -0.7033 38 0.27585669 0.27623752 0.02287303 0.0228881 0.0223811 0.02239623 351.4579 -5.5 2.2 28 0.8645533
1 132 0.10794291 0.10783075 90.74 1 6.041347 -0.7285 38 2.37566768 2.40411147 0.02286123 0.02287629 0.02236966 0.02238478 351.4579 -5.5 2.2 24.84 1.5535568
1 133 0.11167398 0.10944807 90.74 1 6.041347 -0.7366 38 3.884203 3.9606244 0.02286613 0.02288118 0.02237502 0.02239013 351.4579 -5.5 2.2 24.32 -2.0933977
1 136 0.11136465 0.11083074 90.74 1 6.041347 -0.7596 38 1.72315862 1.73809064 0.02285303 0.02286807 0.02236225 0.02237735 351.4579 -5.5 2.2 24.74 -4.2569659
1 137 0.10808954 0.10907843 90.74 1 6.041347 -0.7673 38 -1.04777839 -1.04230831 0.02285089 0.02286593 0.02236023 0.02237532 351.4579 -5.5 2.2 25.44 2.829426
1 143 0.11083537 0.109319 88.89 1 6.041347 -0.8152 38 -2.31035973 -2.28387527 0.02285387 0.02286888 0.02236569 0.02238075 313.8276 -5.5 1.9 22.99 -3.2407407
1 144 0.11241919 0.11284866 87.96 1 6.041347 -0.8226 38 0.17502274 0.175176 0.02285092 0.02286593 0.0223628 0.02237786 313.8276 -5.5 1.9 22.65 -1.4789039
1 148 0.11047482 0.11098797 87.96 1 6.041347 -0.8511 38 -4.05775319 -3.97652872 0.02285529 0.02287028 0.02236779 0.02238284 313.8276 -5.5 1.9 22.28 -7.2825635
1 149 0.11211689 0.11219661 87.96 1 6.041347 -0.8582 38 1.33121203 1.3401121 0.02285311 0.0228681 0.02236568 0.02238072 313.8276 -5.5 1.9 22.13 -0.6732496
1 151 0.1224283 0.11560193 87.96 1 6.041347 -0.8817 38 -4.32835811 -4.23602169 0.02286226 0.02287723 0.02237534 0.02239036 313.8276 -5.5 1.9 21.51 0.7494145
1 152 0.11485769 0.1144787 87.96 1 6.041347 -0.8897 38 2.08640732 2.10832496 0.02286141 0.02287638 0.02237459 0.02238961 313.8276 -5.5 1.9 22.54 4.7884705
1 155 0.11503044 0.11668674 87.96 1 6.041347 -0.913 38 -1.43295959 -1.42274159 0.02284942 0.02286437 0.02236305 0.02237806 313.8276 -5.5 1.9 22.37 -0.7542147
1 156 0.11960112 0.11638223 87.96 1 6.041347 -0.9206 38 -0.33745574 -0.336887 0.02284662 0.02286157 0.02236033 0.02237534 313.8276 -5.5 1.9 22.03 -1.5198927
1 157 0.11401707 0.11300656 87.96 1 6.041347 -0.9285 38 -1.61800303 -1.60498367 0.02284545 0.0228604 0.02235929 0.02237429 313.8276 -5.5 1.9 23.27 5.6286882
1 158 0.11492699 0.10941702 87.96 1 6.041347 -0.9366 38 -3.50096657 -3.44039169 0.02285004 0.02286498 0.02236396 0.02237895 313.8276 -5.5 1.9 24.47 5.1568543
1 161 0.11364547 0.11497971 87.96 1 6.041347 -0.9632 38 -0.62089958 -0.61897598 0.02287121 0.02288614 0.02238762 0.0224026 299.7095 -5.5 2.7 26.12 -1.0980689
1 162 0.1172058 0.11539739 87.96 1 6.041347 -0.9714 38 -3.64942029 -3.58363168 0.02287475 0.02288939 0.02239288 0.02240774 299.7095 -5.5 2.7 26.57 1.7228178
1 166 0.11580577 0.11577708 87.96 1 6.041347 -1.0053 38 -0.45908509 -0.45803291 0.02286025 0.02287515 0.02237709 0.02239204 299.7095 -5.5 2.7 31.46 2.3089431
1 167 0.11617135 0.11427307 87.96 1 6.041347 -1.0137 38 2.25750454 2.28317901 0.02285979 0.02287469 0.02237674 0.0223917 299.7095 -5.5 2.7 28.81 -8.4233948
1 168 0.1143264 0.111344 87.96 1 6.041347 -1.0223 38 0.7458756 0.74866418 0.02285705 0.02287195 0.02237406 0.02238901 299.7095 -5.5 2.7 29.71 3.1239153
May I get help on fixing this error please?
Looking at your code, there are potentially two issues:
Reading your dput structure, the column you refer to as "x8" is actually named "X8" (note the capital X). Column names in R are case-sensitive, so I think you need to change this line to:
e6<-data1[,c("X8", "x1","x2","x3","x4","x5","x6")] #note the upper case first x.
If that doesn't fix it, please upload your whole data somewhere. I ran your data and it executed fine without errors (I just had to lower the lag as I had fewer observations).
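A quick way to catch this kind of name mismatch before subsetting is to compare the requested names against names(data1). The sketch below uses a tiny stand-in data frame with made-up values, since the full data is not available; the variable names wanted and missing_cols are hypothetical:

```r
# Stand-in for data1 with the same capitalization issue (values are made up)
data1 <- data.frame(
  X8 = c(-4.4, -6.3),
  x1 = c(0.01, 0.12),
  x2 = c(0.04, 0.05)
)

wanted <- c("x8", "x1", "x2")

# Requested names that do not exist in the data, compared case-sensitively
missing_cols <- setdiff(wanted, names(data1))
missing_cols

# Case-insensitive lookup of the closest existing names, as a hint
names(data1)[tolower(names(data1)) %in% tolower(missing_cols)]
```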
The underlying problem is not in your code, but rather in the source code for the tsDyn package.
Diagnosis
In the definition of tsDyn:::myformat() within the file misc.R, we find these lines:
if(class(x)=="numeric")
return(noquote(r))
if(class(x)=="matrix")
return(matrix(noquote(r), ncol=ncol(x), nrow=nrow(x)))
This is a rudimentary way to check that the input x is the right kind of object. However, class() returns a character vector of class names: not only the class of the object itself, but also every class from which it inherits.
For example:
x <- matrix(1:9)
class(x)
#> [1] "matrix" "array"
x <- as.data.frame(x)
class(x)
#> [1] "data.frame"
x <- tibble::as_tibble(x)
class(x)
#> [1] "tbl_df" "tbl" "data.frame"
This means that when tsDyn:::myformat() performs its checks, it is getting multiple values:
x <- matrix(1:9)
class(x)
#> [1] "matrix" "array"
class(x) == "matrix"
#> [1] TRUE FALSE
class(x) == "numeric"
#> [1] FALSE FALSE
Now in R, an if statement expects a condition like so:
A length-one logical vector that is not NA. Conditions of length greater than one are currently accepted with a warning, but only the first element is used.
So when we run if statements with "conditions of length greater than one"
if(class(x) == "matrix") {
# ...
}
# Equivalently.
if(c(TRUE, FALSE)) {
# ...
}
we get your warning:
Warning message:
In if (class(x) == "matrix") { :
the condition has length > 1 and only the first element will be used
Verification
You should verify this is the case for your installation of tsDyn. You can do so by entering
View(tsDyn:::myformat)
and inspecting the source code for the problematic lines shown under Diagnosis.
Solution
This fork of the tsDyn package was fixed in November 2020. Its implementation of tsDyn:::myformat() now follows best practice, by using inherits() to check the class of x:
if(inherits(x, "numeric")) {
return(noquote(r))
} else if(inherits(x, "matrix")) {
return(matrix(noquote(r), ncol=ncol(x), nrow=nrow(x)))
}
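To see why inherits() is the safer check, here is a minimal sketch. The `describe()` helper is hypothetical and only for illustration; it is not part of tsDyn. The class vector is copied from the output shown earlier for R >= 4.0.

```r
# Class vector reported for a matrix in R >= 4.0 (copied from the output above)
cls <- c("matrix", "array")

# Comparing with == yields one result per class name, so the length is 2
cmp <- cls == "matrix"
print(cmp)

# inherits() always returns a single logical, which is what if() needs
m <- matrix(1:9, nrow = 3)
print(inherits(m, "matrix"))

# Hypothetical helper that branches on the class with length-one conditions
describe <- function(x) {
  if (inherits(x, "matrix")) {
    "matrix"
  } else if (is.numeric(x)) {
    "numeric vector"
  } else {
    "other"
  }
}

print(describe(m))
print(describe(c(1.5, 2)))
```

Note that the matrix branch is tested first, since is.numeric() is also TRUE for a numeric matrix.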
Given that the original version was last updated in 2011, you might want to reinstall from the aforementioned fork, which was last updated in March 2021:
devtools::install_github("MatthieuStigler/tsDyn/tsDyn", ref = "master")
Verification
You should inspect tsDyn:::myformat() once more
View(tsDyn:::myformat)
and verify that the problematic lines have been updated to use inherits().

How to create two columns that count the total number of two conditions

I have a diabetes dataset that has a column called Outcome and only has two values, 1 = Diabetes, 0 = Non-Diabetes. I want to count the total number of 1's and 0's based on age and then have a % of 1's based on age.
I have this code below:
by_age1 <-
diabetes.df %>%
select(Age, Outcome) %>%
group_by(Age,Outcome) %>%
summarize(Diabetes_Count = n()) %>%
filter(Outcome=="1"| Outcome == "0")
This code generates this table
Age | Outcome | Count
21 0 58
21 1 5
And so on
I want the table to look like this though
Age | Count_Outcome=1 | Count_Outcome=0
21 5 58
22 11 61
So I can eventually get to this
Age | Count_Outcome=1 | Count_Outcome=0 | Count_Outcome=1/Count_Outcome=0
21 5 58 0.086
22 11 61 0.180
Here is the dataset
Rows: 768
Columns: 23
$ Pregnancies <int> 6, 1, 8, 1, 0, 5, 3, 10, 2, 8, 4, 10, 10, 1, 5, 7, 0, 7, 1, 1, 3, 8, 7, 9, 11, 10, 7, 1, 13, 5, 5, 3, ...
$ Glucose <int> 148, 85, 183, 89, 137, 116, 78, 115, 197, 125, 110, 168, 139, 189, 166, 100, 118, 107, 103, 115, 126, ...
$ BloodPressure <int> 72, 66, 64, 66, 40, 74, 50, 0, 70, 96, 92, 74, 80, 60, 72, 0, 84, 74, 30, 70, 88, 84, 90, 80, 94, 70, ...
$ SkinThickness <int> 35, 29, 0, 23, 35, 0, 32, 0, 45, 0, 0, 0, 0, 23, 19, 0, 47, 0, 38, 30, 41, 0, 0, 35, 33, 26, 0, 15, 19...
$ Insulin <int> 0, 0, 0, 94, 168, 0, 88, 0, 543, 0, 0, 0, 0, 846, 175, 0, 230, 0, 83, 96, 235, 0, 0, 0, 146, 115, 0, 1...
$ BMI <dbl> 33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3, 30.5, 0.0, 37.6, 38.0, 27.1, 30.1, 25.8, 30.0, 45.8, 2...
$ DiabetesPedigreeFunction <dbl> 0.627, 0.351, 0.672, 0.167, 2.288, 0.201, 0.248, 0.134, 0.158, 0.232, 0.191, 0.537, 1.441, 0.398, 0.58...
$ Age <int> 50, 31, 32, 21, 33, 30, 26, 29, 53, 54, 30, 34, 57, 59, 51, 32, 31, 31, 33, 32, 27, 50, 41, 29, 51, 41...
$ Outcome <int> 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, ...
$ Skin.log <dbl> 3.555634, 3.367641, -4.605170, 3.135929, 3.555634, -4.605170, 3.466048, -4.605170, 3.806885, -4.605170...
$ Insulin.log <dbl> -2.302585, -2.302585, -2.302585, 4.544358, 5.124559, -2.302585, 4.478473, -2.302585, 6.297293, -2.3025...
$ DPF.log <dbl> -0.46680874, -1.04696906, -0.39749694, -1.78976147, 0.82767807, -1.60445037, -1.39432653, -2.00991548,...
$ Preg.log <dbl> 1.793424749, 0.009950331, 2.080690761, 0.009950331, -4.605170186, 1.611435915, 1.101940079, 2.30358459...
$ Age.log <dbl> 3.912023, 3.433987, 3.465736, 3.044522, 3.496508, 3.401197, 3.258097, 3.367296, 3.970292, 3.988984, 3....
$ G <dbl> 0.84777132, -1.12266474, 1.94245802, -0.99755769, 0.50372693, -0.15308509, -1.34160209, -0.18436186, 2...
$ BP <dbl> 0.14954330, -0.16044119, -0.26376935, -0.16044119, -1.50370731, 0.25287146, -0.98706650, -3.57027057, ...
$ S <dbl> 0.7143403, 0.6624894, -1.5365134, 0.5985804, 0.7143403, -1.5365134, 0.6896315, -1.5365134, 0.7836385, ...
$ I <dbl> -1.0157459, -1.0157459, -1.0157459, 0.8904827, 1.0520140, -1.0157459, 0.8721398, -1.0157459, 1.3785101...
$ D <dbl> 0.76534970, -0.13507072, 0.87292300, -1.28789940, 2.77441913, -1.00029287, -0.67417647, -1.62958283, -...
$ BM <dbl> 0.20387991, -0.68397621, -1.10253696, -0.49372133, 1.40882750, -0.81081280, -0.12589522, 0.41950211, -...
$ P <dbl> 0.6504082, -0.1684863, 0.7823084, -0.1684863, -2.2875506, 0.5668468, 0.3329083, 0.8846516, 0.1474983, ...
$ A <dbl> 1.43544387, -0.04590939, 0.05247453, -1.25279578, 0.14783077, -0.14751959, -0.59096525, -0.25257485, 1...
$ Segment <int> 4, 3, 2, 3, 5, 2, 3, 1, 4, 2, 2, 2, 2, 4, 4, 1, 5, 2, 3, 3, 4, 2, 2, 3, 4, 4, 2, 3, 4, 2, 4, 4, 3, 2, ...
Random data:
r <- function(x) {rnorm(x, 50, 2)}
set.seed(123)
diabetes.df <- data.frame(Age = round(r(10)), Outcome = as.character((r(10) < 50)*1))
> diabetes.df
Age Outcome
1 49 0
2 50 0
3 53 0
4 50 0
5 50 1
6 53 0
7 51 0
8 47 1
9 49 0
10 49 1
Then pivot_wider() will do what you want:
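library(dplyr) # group_by(), summarize(), filter(), %>%
library(tidyr) # pivot_wider()
df <- diabetes.df %>%
select(Age, Outcome) %>%
group_by(Age, Outcome) %>%
dplyr::summarize(Diabetes_Count = n()) %>%
filter(Outcome == "1" | Outcome == "0")
# One row per Age, one count column per Outcome value; missing combinations become 0
df <- pivot_wider(df, names_from = "Outcome", values_from = "Diabetes_Count", names_prefix = "Outcome_", values_fill = 0)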
> df
# A tibble: 5 x 3
# Groups: Age [5]
Age Outcome_1 Outcome_0
<dbl> <int> <int>
1 47 1 0
2 49 1 2
3 50 1 2
4 51 0 1
5 53 0 2
> df %>% mutate(`Outcome_1/Outcome_0` = Outcome_1 / Outcome_0)
# A tibble: 5 x 4
# Groups: Age [5]
Age Outcome_1 Outcome_0 `Outcome_1/Outcome_0`
<dbl> <int> <int> <dbl>
1 47 1 0 Inf
2 49 1 2 0.5
3 50 1 2 0.5
4 51 0 1 0
5 53 0 2 0
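If the reshaping step is not needed for anything else, the counts can also be computed in a single summarise() call by summing logical conditions. This is only a sketch on made-up toy data; the column names `Outcome_1`, `Outcome_0`, `ratio` and `share_1` are my own choices, and `share_1` is one possible reading of "% of 1's based on age".

```r
library(dplyr)

# Toy stand-in for diabetes.df (values are invented for illustration)
toy <- data.frame(
  Age     = c(21, 21, 21, 22, 22, 22, 22),
  Outcome = c(0, 0, 1, 1, 0, 1, 0)
)

by_age <- toy %>%
  group_by(Age) %>%
  summarise(
    Outcome_1 = sum(Outcome == 1),      # number of diabetes cases per age
    Outcome_0 = sum(Outcome == 0),      # number of non-diabetes cases per age
    ratio     = Outcome_1 / Outcome_0,  # 1's relative to 0's (Inf when there are no 0's)
    share_1   = Outcome_1 / n()         # 1's as a fraction of all rows for that age
  )

print(by_age)
```

Summing a logical vector counts its TRUE values, so no filter() or pivot is required, and ages with no 1's or no 0's still get a count of 0.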

ddply has different output depending on sorting/order of .variables used to apply function

My full data (results of dput()) is at the end of the question. I'm trying to make a tile plot with ggplot() and have unevenly spaced x and y measurements, so the tiles don't fill out the full area. Here's an example:
library(ggplot2)
ggplot(data, aes(x = x, y = -y, z = d)) + geom_tile(aes(fill = d))
I don't know for sure, but I think ggplot might default to a tile size of something like unique(data$x)[2] - unique(data$x)[1]. That would explain why tiles touch only in the rows of my data where this is indeed the distance between consecutive x or y measurements, and not elsewhere. I figured I'd make height and width columns for my data using plyr and ddply(), but am getting odd results.
For those who aren't going to load the full data, here's the structure:
head(data, 5)
x y d
1 2.0 0 0.28125
2 5.5 0 0.81250
3 11.5 0 0.56250
4 17.5 0 0.46875
5 23.5 0 0.40625
tail(data, 5)
x y d
191 47.5 80.5 0.000
192 53.5 80.5 0.125
193 59.5 80.5 0.000
194 65.5 80.5 0.000
195 71.0 80.5 0.000
So, I'm cycling through every value of x for each unique value of y. Here's how I tried setting a height/width column:
# for each unique value of y, calculate diff for the x's and then add on 1
data$width <- ddply(data, .(y), summarize, width = c(diff(x), 1))$width
# for each unique value of x, calculate diff for the y's and then add on 1
data$height <- ddply(data, .(x), summarize, height = c(diff(y), 1))$height
I just threw on a 1 at the end since the length of diff() for n values is n-1 and I thought I'd play with the correct value to concatenate later. Here's what I'm getting, though:
ggplot(data, aes(x = x, y = -y, z = d)) +
geom_tile(aes(fill = d, height = height, width = width))
The widths are correct, but not the heights. Upon investigating:
head(data, 5)
x y d height width
1 2.0 0 0.28125 5.5 3.5
2 5.5 0 0.81250 6.5 6.0
3 11.5 0 0.56250 6.0 6.0
4 17.5 0 0.46875 6.0 6.0
5 23.5 0 0.40625 6.0 6.0
So, we can see that the widths are correct: 2 -> 5.5 = 3.5, 5.5 -> 11.5 = 6, and so on.
But the heights are not, which we can see if we look at the output for a constant x value:
head(data[data$x == 2, ], 5)
x y d height width
1 2 0.0 0.28125 5.5 3.5
14 2 5.5 0.37500 4.5 3.5
27 2 12.0 0.37500 4.5 3.5
40 2 18.0 0.56250 6.0 3.5
53 2 24.0 0.25000 6.0 3.5
The first should be 5.5 (correct), but the second should be 6.5, then 6, and so on.
If I manually run my ddply function by subsetting myself, it seems to work:
c(diff(data[data$x == 2, "y"]), 1)
[1] 5.5 6.5 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 4.5 5.5 4.5 1.0
In re-examining the height values, they appeared to be the same, but re-arranged. Following that observation, I re-sorted my data by x first and then by y (as though I'd cycled through every y for each unique x, instead of the other way around), and then re-defined my height and width columns:
data_sort <- data[order(data$x, data$y), c("x", "y", "d")]
data_sort$width <- ddply(data_sort, .(y), summarize, width = c(diff(x), 1))$width
data_sort$height <- ddply(data_sort, .(x), summarize, height = c(diff(y), 1))$height
Heights are now correct, but widths are jumbled:
head(data_sort, 6)
x y d width height
1 2 0.0 0.28125 3.5 5.5
14 2 5.5 0.37500 6.0 6.5
27 2 12.0 0.37500 6.0 6.0
40 2 18.0 0.56250 6.0 6.0
53 2 24.0 0.25000 6.0 6.0
66 2 30.0 0.31250 6.0 6.0
What am I missing that ddply isn't keeping things in order when searching over unique but non-consecutive levels/values?
The data:
dput(data)
structure(list(x = c(2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5,
47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5,
41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5,
35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5,
29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5,
23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5,
17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2,
5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5,
71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5,
65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5,
59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5,
53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5,
47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5,
41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5,
35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5,
29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5,
23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71), y = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.5, 5.5, 5.5, 5.5, 5.5,
5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 18, 18, 18, 18, 18, 18, 18, 18, 18,
18, 18, 18, 18, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 36, 36,
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 48, 48, 48, 48, 48, 48, 48, 48,
48, 48, 48, 48, 48, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54,
54, 54, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 66,
66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 70.5, 70.5, 70.5,
70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 76,
76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 80.5, 80.5, 80.5,
80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5),
d = c(0.28125, 0.8125, 0.5625, 0.46875, 0.40625, 0.3125,
0.25, 0.125, 0.09375, 0.0625, 0.1875, 0.25, 0, 0.375, 0.46875,
0.5, 0.4375, 0.4375, 0.3125, 0.28125, 0.1875, 0.125, 0.0625,
0.1875, 0.3125, 0.5, 0.375, 0.25, 0.375, 0.4375, 0.375, 0.3125,
0.28125, 0.15625, 0.125, 0.0625, 0.1875, 0.3125, 0.5, 0.5625,
0.375, 0.4375, 0.40625, 0.375, 0.3125, 0.25, 0.15625, 0.09375,
0.0625, 0.125, 0.28125, 0.3125, 0.25, 0.34375, 0.40625, 0.40625,
0.375, 0.3125, 0.21875, 0.125, 0.09375, 0.0625, 0.125, 0.25,
0.3125, 0.3125, 0.375, 0.40625, 0.40625, 0.375, 0.3125, 0.21875,
0.09375, 0.0625, 0, 0.09375, 0.15625, 0.25, 0.28125, 0.34375,
0.40625, 0.4375, 0.4375, 0.375, 0.3125, 0.1875, 0.15625,
0.0625, 0.125, 0.25, 0.3125, 0.3125, 0.375, 0.4375, 0.46875,
0.46875, 0.4375, 0.375, 0.28125, 0.5625, 0.0625, 0.125, 0.25,
0.34375, 0.3125, 0.4375, 0.4375, 0.5, 0.5, 0.5, 0.4375, 0.34375,
0.21875, 0.0625, 0.125, 0.25, 0.34375, 0.3125, 0.4375, 0.4375,
0.46875, 0.5, 0.5, 0.4375, 0.34375, 0.21875, 0.09375, 0.15625,
0.3125, 0.34375, 0.25, 0.34375, 0.34375, 0.375, 0.375, 0.6875,
0.3125, 0.1875, 0.125, 0.0625, 0.125, 0.25, 0.3125, 0.125,
0.21875, 0.28125, 0.28125, 0.25, 0.25, 0.1875, 0.09375, 0.0625,
0.0625, 0.1875, 0.3125, 0.4375, 0, 0.125, 0.1875, 0.1875,
0.21875, 0.1875, 0.1875, 0.28125, 0.15625, 0.125, 0.125,
0.375, 0.625, 0, 0.0625, 0.09375, 0.09375, 0.21875, 0.21875,
0.21875, 0.21875, 0.1875, 0.15625, 0.4375, 0.625, 0, 0, 0,
0, 0.09375, 0.125, 0.125, 0.09375, 0.0625, 0, 0.125, 0, 0,
0)), .Names = c("x", "y", "d"), row.names = c(NA, -195L), class = "data.frame")
Silly, silly, silly.
ddply's output is arranged in the order in which it processes the groups, and I completely overlooked that when I extracted just the height column. So even though my data was sorted first by y and then by x, when I called ddply to compute something grouped by x, the output came back sorted by x and then by y.
Just to show this:
head(data)
x y d
1 2.0 0 0.28125
2 5.5 0 0.81250
3 11.5 0 0.56250
4 17.5 0 0.46875
5 23.5 0 0.40625
6 29.5 0 0.31250
Looking at the full output of my ddply call shows that the y's are grouped just as they appear in the original data, so assigning that column as data$width works fine:
widths <- ddply(data, .(y), summarize, width = c(diff(x), 1))
head(widths)
y width
1 0 3.5
2 0 6.0
3 0 6.0
4 0 6.0
5 0 6.0
6 0 6.0
But when I did that for the heights, the data was grouped by unique x's, which is not how my data is arranged:
heights <- ddply(data, .(x), summarize, height = c(diff(y), 1))
head(heights)
x height
1 2 5.5
2 2 6.5
3 2 6.0
4 2 6.0
5 2 6.0
6 2 6.0
Certainly didn't warrant a question -- by extracting just the column I wanted, I completely overlooked the form of the ddply output compared to my data.
To get around this, I probably should have created two data frames with both the x and y values along with height and width (calculated from diff()), and then merged them by unique combinations of x and y.
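The merge described above is one way around the misalignment. Another option, sketched here as an alternative rather than what I actually did, is base R's ave(), which applies a function per group but returns the results in the original row order. The small `grid` data frame is a made-up stand-in with the same layout as my data (x cycling within each y), and the trailing 1 is the same placeholder used in the question.

```r
# Toy grid laid out like the real data: x varies fastest within each y
grid <- expand.grid(x = c(2, 5.5, 11.5), y = c(0, 5.5, 12))

# Gap to the next measurement, with a placeholder of 1 for the last one
gap <- function(v) c(diff(v), 1)

# ave() computes per group but keeps each value on its original row
grid$width  <- ave(grid$x, grid$y, FUN = gap)  # gaps between x's within each y
grid$height <- ave(grid$y, grid$x, FUN = gap)  # gaps between y's within each x

print(grid)
```

Because the result is already aligned with the rows of `grid`, it can be assigned as a column directly, whichever way the data happens to be sorted.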

Resources