I'm trying to plot the following data frame with ggplot. This data frame holds an ordered summary of the top 10 scores from 13 separate clusters (10 x 13 = 130 observations in total).
Cell.Type Score
1 GN_Thio_PC 2677.80617009519
2 GN_UrAc_PC 2637.41032364779
3 Mo_6C+II-_Bl 2556.92913594902
4 GN_Arth_SynF 2391.45433001888
5 Mo_6C+II+_Bl 2315.52547519278
6 GN_Bl 2304.98201202492
7 MF_Thio5_II-480int_PC 2285.71825571867
8 MF_Thio5_II+480int_PC 2248.11270401521
9 MF_RP_Sp 2224.65822294734
10 GN_BM 2069.57828218951
11 T_8Mem_Sp 3650.92933141558
12 NK_b2m-_Sp 3623.07526981183
13 NK_DAP10-_Sp 3568.82957776803
14 T_8Nve_Sp_OT1 3535.57114525684
15 T_8Mem_Sp_OT1_d100_LisOva 3532.02217747173
16 NK_49H+_Sp 3519.49779859704
17 NK_DAP12-_Sp 3500.5532642101
18 T_8Eff_Sp_OT1_d10_LisOva 3448.56816636704
19 NKT_4+_Lv 3445.33162798595
20 T_8Nve_LN 3442.41228249856
21 DC_8+_Th 1384.56532273906
22 DC_8+_SLN 1333.48389922898
23 DC_8-_Th 1329.54597466529
24 DC_103+11b-_Lu 1293.2048614532
25 B_GC_Sp 1291.13567921318
26 DC_8+_MLN 1266.18747131352
27 DC_8+_Sp_ST 1251.44702553637
28 SC_MEP_BM 1229.9373063931
29 DC_8-4-11b-_Sp 1204.49733058435
30 DC_103+11b-_Lv 1196.79647629317
31 T_8Mem_Sp 4888.53608836612
32 T_8Nve_Sp_OT1 4844.62387372193
33 T_8Nve_LN 4803.08591833927
34 T_8Nve_MLN 4801.60498804064
35 T_4Nve_LN 4704.34332374882
36 T_8Nve_PP 4680.54098917638
37 T_8Nve_Sp 4668.51868809073
38 T_4Nve_PP 4654.6363768553
39 T_4Nve_MLN 4644.53632493487
40 T_4FP3-_Sp 4643.81716614074
41 MF_Thio5_II-480int_PC 2104.44279848771
42 MF_Thio5_II-480hi_PC 2051.91548567208
43 MF_PPAR-_Lu 1971.01987723135
44 MF_Thio5_II+480int_PC 1968.1818651747
45 MF_Lu 1941.36173402858
46 MF_II-480hi_PC 1821.13874693704
47 MF_Thio5_II+480lo_PC 1787.05712946341
48 GN_Thio_PC 1728.7523034795
49 MF_Microglia_CNS 1711.6599582643
50 MF_II+480lo_PC 1711.26332938833
51 Mo_6C+II-_Bl 2962.12603781126
52 Mo_6C-II-_Bl 2758.18375042302
53 Mo_6C-IIint_Bl 2638.16680094079
54 GN_Bl 2620.07541962536
55 GN_UrAc_PC 2537.2896087047
56 Mo_6C+II+_Bl 2435.36956544536
57 GN_BM 2387.13935841906
58 GN_Thio_PC 2337.29884997719
59 GN_Arth_BM 2311.2451915426
60 GN_Arth_SynF 2128.725307006
61 T_8Eff_Sp_OT1_48hr_LisOva 3285.38489328741
62 T_8Eff_Sp_OT1_24hr_LisOva 3069.01874851731
63 T_8Mem_Sp 3022.56013472619
64 T_ISP_Th 2983.45678085374
65 T_8Eff_Sp_OT1_12hr_LisOva 2964.79056150505
66 T_8Eff_Sp_OT1_d5_VSVOva 2950.29516615634
67 T_8Nve_Sp_OT1 2893.7887214778
68 T_8Eff_Sp_OT1_d6_LisOva 2891.41381948125
69 Tgd_vg3+24alo_e17_Th 2875.40895460188
70 T_8SP24-_Th 2858.05344865649
71 MF_Thio5_II-480int_PC 2783.70950776927
72 MF_Thio5_II-480hi_PC 2737.88858084566
73 MF_Thio5_II+480int_PC 2708.73493567958
74 MF_II-480hi_PC 2377.99196863673
75 MF_RP_Sp 2281.78751440853
76 MF_Microglia_CNS 2145.29897799
77 MF_PPAR-_Lu 2089.46703313723
78 Fi_Sk 2077.00426240616
79 MF_II+480lo_PC 2070.33177217184
80 MF_Thio5_II+480lo_PC 2049.00134439134
81 GN_Thio_PC 3158.65427762739
82 GN_UrAc_PC 2993.45396316058
83 Mo_6C+II-_Bl 2807.36027869234
84 GN_Arth_SynF 2783.56931762011
85 GN_Bl 2666.31559591767
86 Mo_6C+II+_Bl 2472.0977029947
87 GN_BM 2422.62741443588
88 Mo_6C-II-_Bl 2309.43925461481
89 Mo_6C-IIint_Bl 2238.0777055497
90 GN_Arth_BM 2215.70702972594
91 GN_Arth_SynF 3027.3451404511
92 GN_Thio_PC 2939.74912223882
93 GN_UrAc_PC 2694.04395500259
94 MF_Thio5_II-480hi_PC 2507.39045396954
95 GN_Bl 2407.18406139123
96 MF_Thio5_II-480int_PC 2380.65584862485
97 Fi_Sk 2211.83581518875
98 Fi_MTS15+_Th 2209.41411415371
99 MF_Microglia_CNS 2149.92996548155
100 MF_RP_Sp 2111.76895702472
101 T_8Eff_Sp_OT1_48hr_LisOva 4316.4227070348
102 T_ISP_Th 4280.25335061696
103 T_DPbl_Th 4102.01504953757
104 preT_DN3B_Th 3910.32665898991
105 preT_DN3-4_Th 3907.5798288054
106 T_DN4_Th 3840.18882533614
107 SC_MEP_BM 3780.57141700037
108 T_8Eff_Sp_OT1_d5_VSVOva 3757.01659494412
109 Tgd_vg3+24alo_e17_Th 3685.84648926922
110 Tgd_vg5+24ahi_Th 3616.99224871103
111 DC_IIhilang+103+11blo_SLN 4519.50952669406
112 DC_IIhilang+103-11b+_SLN 4415.97080261725
113 DC_IIhilang-103-11blo_SLN 4170.40873917108
114 DC_IIhilang-103-11b+_SLN 3963.46358118485
115 DC_8-4-11b-_MLN 3631.2974118135
116 DC_8-4-11b+_MLN 3386.67029828899
117 DC_8-4-11b+_SLN 3026.47679955977
118 Ep_MEChi_Th 2844.36034968535
119 DC_8-4-11b-_SLN 2835.94178377956
120 DC_8+_MLN 2183.62863550565
121 DC_pDC_8+_SLN 3785.05815189249
122 DC_pDC_8+_Sp 3767.7193092587
123 DC_pDC_8+_MLN 3758.58340817543
124 DC_pDC_8-_Sp 3747.60526193027
125 B_T1_Sp 2325.20093650829
126 B_T2_Sp 2316.21996448563
127 B_Fo_Sp 2253.78988549461
128 B_T3_Sp 2225.2075463753
129 B_Fo_MLN 2159.05742315915
130 B_Fo_PC 2142.51258891406
to obtain a graph like this (generated with the base graphics functions and some rather ugly code):
The problem I'm running into is that ggplot groups and reorders the data instead of keeping the original order. How can I stop this behavior?
On another note, is there a better way of structuring this data frame so that it works better with ggplot? I'd like to show the top 10 scores of all clusters visually separated but side by side (as in the image I attached above). My ultimate goal is to make my code scalable, so that it works just as well with a different number of starting clusters (e.g. 18 clusters instead of the 13 here) and a different number of top scores (e.g. top 5 instead of top 10) with minimal rewriting.
It is a little tricky because you have duplicate labels in the different groups, so the standard advice to convert the x-axis labels to a factor is not quite enough. Here I made a copy of Cell.Type before pasting Cell.Type and the group id together. The graph is plotted with these unique values, and the labels are then switched back to the originals. I used facets with free x scales so that each group shows its own labels.
The graph needs to be quite large to work.
library("tidyverse")

# Tag each block of 10 rows with a cluster id, keep a copy of the original label,
# and make Cell.Type unique per cluster so its levels can follow the row order
df2 <- df %>% mutate(id = rep(1:13, each = 10),
                     Cell.Type.label = Cell.Type,
                     Cell.Type = paste(Cell.Type, id, sep = "_"),
                     Cell.Type = factor(Cell.Type, levels = Cell.Type))

df2 %>%
  ggplot(aes(x = Cell.Type, y = Score, colour = as.factor(id))) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~id, nrow = 1, scales = "free_x") +
  scale_x_discrete(labels = df2$Cell.Type.label) +  # show the original labels again
  theme(panel.spacing = unit(x = 0, units = "pt"),
        axis.text.x = element_text(angle = 90, hjust = 1, size = 4))
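On the scalability part of the question: the hard-coded rep(1:13, each = 10) could be replaced by an id derived from the row position. The following is only a sketch in base R, assuming every cluster contributes exactly n_top consecutive rows; the helper name add_cluster_id and the toy data are made up for illustration.

```r
# Hypothetical helper: derive the cluster id from the row position,
# assuming each cluster occupies exactly `n_top` consecutive rows.
add_cluster_id <- function(df, n_top) {
  stopifnot(nrow(df) %% n_top == 0)
  df$id <- (seq_len(nrow(df)) - 1) %/% n_top + 1
  df$Cell.Type.label <- df$Cell.Type
  # Make the x values unique per cluster, then freeze the row order as factor levels
  df$Cell.Type <- paste(df$Cell.Type, df$id, sep = "_")
  df$Cell.Type <- factor(df$Cell.Type, levels = unique(df$Cell.Type))
  df
}

# Toy data: two clusters with the top 3 scores each, and repeated labels
toy <- data.frame(Cell.Type = c("A", "B", "C", "A", "C", "D"),
                  Score = c(9, 8, 7, 6, 5, 4))
toy2 <- add_cluster_id(toy, n_top = 3)
print(toy2)
```

With the real data this would be called as add_cluster_id(df, n_top = 10), and the plotting code should work unchanged for other cluster counts or top-n sizes as long as the equal-block assumption holds.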
data
df <- structure(list(Cell.Type = c("GN_Thio_PC", "GN_UrAc_PC", "Mo_6C+II-_Bl",
"GN_Arth_SynF", "Mo_6C+II+_Bl", "GN_Bl", "MF_Thio5_II-480int_PC",
"MF_Thio5_II+480int_PC", "MF_RP_Sp", "GN_BM", "T_8Mem_Sp", "NK_b2m-_Sp",
"NK_DAP10-_Sp", "T_8Nve_Sp_OT1", "T_8Mem_Sp_OT1_d100_LisOva",
"NK_49H+_Sp", "NK_DAP12-_Sp", "T_8Eff_Sp_OT1_d10_LisOva", "NKT_4+_Lv",
"T_8Nve_LN", "DC_8+_Th", "DC_8+_SLN", "DC_8-_Th", "DC_103+11b-_Lu",
"B_GC_Sp", "DC_8+_MLN", "DC_8+_Sp_ST", "SC_MEP_BM", "DC_8-4-11b-_Sp",
"DC_103+11b-_Lv", "T_8Mem_Sp", "T_8Nve_Sp_OT1", "T_8Nve_LN",
"T_8Nve_MLN", "T_4Nve_LN", "T_8Nve_PP", "T_8Nve_Sp", "T_4Nve_PP",
"T_4Nve_MLN", "T_4FP3-_Sp", "MF_Thio5_II-480int_PC", "MF_Thio5_II-480hi_PC",
"MF_PPAR-_Lu", "MF_Thio5_II+480int_PC", "MF_Lu", "MF_II-480hi_PC",
"MF_Thio5_II+480lo_PC", "GN_Thio_PC", "MF_Microglia_CNS", "MF_II+480lo_PC",
"Mo_6C+II-_Bl", "Mo_6C-II-_Bl", "Mo_6C-IIint_Bl", "GN_Bl", "GN_UrAc_PC",
"Mo_6C+II+_Bl", "GN_BM", "GN_Thio_PC", "GN_Arth_BM", "GN_Arth_SynF",
"T_8Eff_Sp_OT1_48hr_LisOva", "T_8Eff_Sp_OT1_24hr_LisOva", "T_8Mem_Sp",
"T_ISP_Th", "T_8Eff_Sp_OT1_12hr_LisOva", "T_8Eff_Sp_OT1_d5_VSVOva",
"T_8Nve_Sp_OT1", "T_8Eff_Sp_OT1_d6_LisOva", "Tgd_vg3+24alo_e17_Th",
"T_8SP24-_Th", "MF_Thio5_II-480int_PC", "MF_Thio5_II-480hi_PC",
"MF_Thio5_II+480int_PC", "MF_II-480hi_PC", "MF_RP_Sp", "MF_Microglia_CNS",
"MF_PPAR-_Lu", "Fi_Sk", "MF_II+480lo_PC", "MF_Thio5_II+480lo_PC",
"GN_Thio_PC", "GN_UrAc_PC", "Mo_6C+II-_Bl", "GN_Arth_SynF", "GN_Bl",
"Mo_6C+II+_Bl", "GN_BM", "Mo_6C-II-_Bl", "Mo_6C-IIint_Bl", "GN_Arth_BM",
"GN_Arth_SynF", "GN_Thio_PC", "GN_UrAc_PC", "MF_Thio5_II-480hi_PC",
"GN_Bl", "MF_Thio5_II-480int_PC", "Fi_Sk", "Fi_MTS15+_Th", "MF_Microglia_CNS",
"MF_RP_Sp", "T_8Eff_Sp_OT1_48hr_LisOva", "T_ISP_Th", "T_DPbl_Th",
"preT_DN3B_Th", "preT_DN3-4_Th", "T_DN4_Th", "SC_MEP_BM", "T_8Eff_Sp_OT1_d5_VSVOva",
"Tgd_vg3+24alo_e17_Th", "Tgd_vg5+24ahi_Th", "DC_IIhilang+103+11blo_SLN",
"DC_IIhilang+103-11b+_SLN", "DC_IIhilang-103-11blo_SLN", "DC_IIhilang-103-11b+_SLN",
"DC_8-4-11b-_MLN", "DC_8-4-11b+_MLN", "DC_8-4-11b+_SLN", "Ep_MEChi_Th",
"DC_8-4-11b-_SLN", "DC_8+_MLN", "DC_pDC_8+_SLN", "DC_pDC_8+_Sp",
"DC_pDC_8+_MLN", "DC_pDC_8-_Sp", "B_T1_Sp", "B_T2_Sp", "B_Fo_Sp",
"B_T3_Sp", "B_Fo_MLN", "B_Fo_PC"), Score = c(2677.80617009519,
2637.41032364779, 2556.92913594902, 2391.45433001888, 2315.52547519278,
2304.98201202492, 2285.71825571867, 2248.11270401521, 2224.65822294734,
2069.57828218951, 3650.92933141558, 3623.07526981183, 3568.82957776803,
3535.57114525684, 3532.02217747173, 3519.49779859704, 3500.5532642101,
3448.56816636704, 3445.33162798595, 3442.41228249856, 1384.56532273906,
1333.48389922898, 1329.54597466529, 1293.2048614532, 1291.13567921318,
1266.18747131352, 1251.44702553637, 1229.9373063931, 1204.49733058435,
1196.79647629317, 4888.53608836612, 4844.62387372193, 4803.08591833927,
4801.60498804064, 4704.34332374882, 4680.54098917638, 4668.51868809073,
4654.6363768553, 4644.53632493487, 4643.81716614074, 2104.44279848771,
2051.91548567208, 1971.01987723135, 1968.1818651747, 1941.36173402858,
1821.13874693704, 1787.05712946341, 1728.7523034795, 1711.6599582643,
1711.26332938833, 2962.12603781126, 2758.18375042302, 2638.16680094079,
2620.07541962536, 2537.2896087047, 2435.36956544536, 2387.13935841906,
2337.29884997719, 2311.2451915426, 2128.725307006, 3285.38489328741,
3069.01874851731, 3022.56013472619, 2983.45678085374, 2964.79056150505,
2950.29516615634, 2893.7887214778, 2891.41381948125, 2875.40895460188,
2858.05344865649, 2783.70950776927, 2737.88858084566, 2708.73493567958,
2377.99196863673, 2281.78751440853, 2145.29897799, 2089.46703313723,
2077.00426240616, 2070.33177217184, 2049.00134439134, 3158.65427762739,
2993.45396316058, 2807.36027869234, 2783.56931762011, 2666.31559591767,
2472.0977029947, 2422.62741443588, 2309.43925461481, 2238.0777055497,
2215.70702972594, 3027.3451404511, 2939.74912223882, 2694.04395500259,
2507.39045396954, 2407.18406139123, 2380.65584862485, 2211.83581518875,
2209.41411415371, 2149.92996548155, 2111.76895702472, 4316.4227070348,
4280.25335061696, 4102.01504953757, 3910.32665898991, 3907.5798288054,
3840.18882533614, 3780.57141700037, 3757.01659494412, 3685.84648926922,
3616.99224871103, 4519.50952669406, 4415.97080261725, 4170.40873917108,
3963.46358118485, 3631.2974118135, 3386.67029828899, 3026.47679955977,
2844.36034968535, 2835.94178377956, 2183.62863550565, 3785.05815189249,
3767.7193092587, 3758.58340817543, 3747.60526193027, 2325.20093650829,
2316.21996448563, 2253.78988549461, 2225.2075463753, 2159.05742315915,
2142.51258891406)), .Names = c("Cell.Type", "Score"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101", "102", "103", "104", "105", "106", "107", "108", "109",
"110", "111", "112", "113", "114", "115", "116", "117", "118",
"119", "120", "121", "122", "123", "124", "125", "126", "127",
"128", "129", "130"))
Related
Suppose a data:
df1 <- tibble::tribble(~"M1", ~"M2", ~"Beer, pints", ~"Coffee, oz", ~"Gasoline, galons", ~"Milk, galons", ~"Warehouse, square feet", ~"Nearest place, miles",
"NY", "22", "10", "12", "15", "100", "100", "20",
"NY", "20", "9", "10", "12", "100", "100", "20",
"NY", "18", "8", "9", "11", "100", "100", "20",
"M1", "M2", "Beer, liters", "Coffee, cups (120 ml)", "Gasoline, liters", "Milk, liters", "Warehouse, square meters", "Nearest place, kilometers",
"PR", "22", "7", "8", "9", "70", "67", "7",
"PR", "20", "6", "7", "8", "80", "75", "7",
"M1", "M2", "Beer, pints", "Coffee, oz", "Gasoline, liters", "Milk, liters", "Warehouse, square feet", "Nearest place, miles",
"KR", "22", "6", "6", "7", "60", "50", "9",
"KR", "20", "5", "6", "8", "55", "65", "9",
"KR", "18", "5", "6", "8", "50", "55", "9")
For visual representation:
Is there a nice method to recalculate all columns into the same units (e.g. if a column is in liters, the entire column should be in liters; if it is in miles rather than kilometers, the entire column should be in miles), based on the subheadings embedded in the data?
It would be great to think about the nicest methods to solve it.
PS: for information:
1 gallon = 3.78541 liters
1 pint = 0.473176 liters
1 oz = 0.0295735 liters
11 square feet = 1.02193 square meters
1 mile = 1.60934 kilometers
I have only just started thinking about a solution.
I am interested in possible nice solutions.
In addition, it would be interesting for the entire R community to think about the best methods to edit data by condition.
When the data is sloppy, we must also get our hands dirty. I thought of a way, with many steps.
Data
df1 <-
structure(list(m1 = c("M1", "NY", "NY", "NY", "M1", "PR", "PR",
"M1", "KR", "KR", "KR"), m2 = c("M2", "22", "20", "18", "M2",
"22", "20", "M2", "22", "20", "18"), beer = c("Beer, pints",
"10", "9", "8", "Beer, liters", "7", "6", "Beer, pints", "6",
"5", "5"), coffee = c("Coffee, oz", "12", "10", "9", "Coffee, cups (120 ml)",
"8", "7", "Coffee, oz", "6", "6", "6"), gasoline = c("Gasoline, galons",
"15", "12", "11", "Gasoline, liters", "9", "8", "Gasoline, liters",
"7", "8", "8"), milk = c("Milk, galons", "100", "100", "100",
"Milk, liters", "70", "80", "Milk, liters", "60", "55", "50"),
warehouse = c("Warehouse, square feet", "100", "100", "100",
"Warehouse, square meters", "67", "75", "Warehouse, square feet",
"50", "65", "55"), nearest_place = c("Nearest_place, miles",
"20", "20", "20", "Nearest place, kilometers", "7", "7",
"Nearest place, miles", "9", "9", "9")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
Convert function
convert_unit <- function(value, unit){
  # Multiplier that converts each imperial unit to its metric counterpart;
  # units that are already metric (or not recognised) get a factor of 1
  m <-
    case_when(
      unit == "galons" ~ 3.78541,
      unit == "pints" ~ 0.473176,
      unit == "oz" ~ 0.0295735,
      unit == "squarefeet" ~ 1.02193/11,
      unit == "miles" ~ 1.60934,
      TRUE ~ 1
    )
  output <- m*as.numeric(value)
  return(output)
}
Data preparation
First, I would add the header as the first row and also create better names.
library(dplyr)
library(stringr)
library(tidyr)
#remotes::install_github("vbfelix/relper")
library(relper)
or_names <- names(df1)
new_names <- str_to_lower(str_select(or_names,before = ","))
n_row <- nrow(df1)
# Shift the data down by one row and put the original header in row 1
df1[2:(n_row+1),] <- df1
df1[1,] <- as.list(or_names)
names(df1) <- new_names
Data manipulation
Then, I would create new columns with the units, and then apply the function to each one.
df1 %>%
  mutate(
    # Extract the unit from each subheading row; data rows yield ""
    across(.cols = -c(m1:m2),.fns = ~str_keep(str_select(.,after = ",")),.names = "{.col}_unit"),
    aux = beer_unit == "",
    across(.cols = ends_with("_unit"),~if_else(. == "",NA_character_,.))) %>%
  # Carry each subheading's unit down to the data rows below it
  fill(ends_with("_unit"),.direction = "down") %>%
  # Keep only the data rows
  filter(aux) %>%
  mutate(
    across(
      .cols = beer:nearest_place,
      .fns = ~convert_unit(value = .,unit = get(str_c(cur_column(),"_unit")))
    )
  ) %>%
  select(-aux,-ends_with("_unit"))
Output
# A tibble: 8 x 8
m1 m2 beer coffee gasoline milk warehouse nearest_place
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NY 22 4.73 0.355 56.8 379. 9.29 32.2
2 NY 20 4.26 0.296 45.4 379. 9.29 32.2
3 NY 18 3.79 0.266 41.6 379. 9.29 32.2
4 PR 22 7 8 9 70 67 7
5 PR 20 6 7 8 80 75 7
6 KR 22 2.84 0.177 7 60 4.65 14.5
7 KR 20 2.37 0.177 8 55 6.04 14.5
8 KR 18 2.37 0.177 8 50 5.11 14.5
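As a possible alternative to the case_when() chain, the conversion factors could live in a named vector that is indexed by unit. This is only a sketch in base R; the names to_metric and convert_unit_lookup are made up here, and the factors are the ones listed in the question.

```r
# Conversion factors to metric units, taken from the question.
# The unit names follow the spelling used in the data (e.g. "galons").
to_metric <- c(galons = 3.78541,
               pints = 0.473176,
               oz = 0.0295735,
               squarefeet = 1.02193 / 11,
               miles = 1.60934)

# Hypothetical lookup-based variant of the conversion function
convert_unit_lookup <- function(value, unit) {
  m <- unname(to_metric[unit])  # NA for units that are not in the table
  m[is.na(m)] <- 1              # leave metric or unknown units unchanged
  m * as.numeric(value)
}

converted <- convert_unit_lookup(c("10", "7", "20"), c("pints", "liters", "miles"))
print(converted)
```

Adding a new unit then only means adding one entry to the vector, which may be easier to maintain than extending the case_when() branches.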
I want to figure out how the grades are distributed, i.e. how many people's grades fall within each step interval, much like Excel's COUNTIF but in R. I tried to use sum() for this; the data follow below.
test
1 75
2 65
3 51
4 28
5 88
6 55
7 98
8 18
9 58
10 26
11 10
12 50
13 32
14 10
15 47
16 100
17 75
18 74
19 64
20 100
21 30
22 50
23 83
24 93
25 68
26 77
27 30
28 100
29 5
30 98
31 28
32 85
33 56
34 66
35 100
36 20
37 66
38 64
39 88
40 22
41 63
42 98
43 43
44 60
45 47
46 58
47 29
48 71
49 91
50 36
51 16
52 13
53 88
54 0
55 90
56 46
57 78
58 78
59 86
60 31
61 29
62 40
63 28
64 90
When I count how many people scored exactly 100 on the test, it works:
> sum(data$test == 100, na.rm = T)
[1] 4
But when I try to count those who scored at least 90 but below 100, I get:
> sum(data$test < 100 & data$test >= 90, na.rm = T)
[1] 0
That seems incorrect. However, when I change < 100 to != 100, it works:
> sum(data$test != 100 & data$test >= 90, na.rm = T)
[1] 7
Can anyone explain the reason? Thanks a lot!
I'm sure your "test" column is character; try coercing it with as.numeric().
sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 0
data$test <- as.numeric(data$test) ## coercion
sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 7
The reason why it works with == and != but not with < and >= is the following:
sort(c(10, 9, 11, 100, 1000))
# [1] 9 10 11 100 1000
sort(as.character(c(10, 9, 11, 100, 1000)))
# [1] "10" "100" "1000" "11" "9"
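Character strings are sorted alphabetically, whereas numerics are sorted by their values. With a character column, data$test < 100 converts 100 to "100" and compares the strings, so "90" < "100" is FALSE.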
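The coercion can be checked directly on a small vector. This is just an illustrative sketch; the vector x is made up and stands in for data$test.

```r
# A character vector standing in for the "test" column
x <- c("75", "100", "90", "98")

# Comparing character with numeric coerces the number to a string,
# so the comparison is done character by character
chr_cmp <- "90" < "100"          # "9" sorts after "1"
num_cmp <- 90 < 100

n_chr <- sum(x < 100 & x >= 90)  # string comparison
x_num <- as.numeric(x)
n_num <- sum(x_num < 100 & x_num >= 90)  # numeric comparison

print(c(chr_cmp = chr_cmp, num_cmp = num_cmp))
print(c(n_chr = n_chr, n_num = n_num))
```

Running class(data$test) or str(data) first is a quick way to see whether the column was read in as character.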
Data
data <- structure(list(test = c("75", "65", "51", "28", "88", "55", "98",
"18", "58", "26", "10", "50", "32", "10", "47", "100", "75",
"74", "64", "100", "30", "50", "83", "93", "68", "77", "30",
"100", "5", "98", "28", "85", "56", "66", "100", "20", "66",
"64", "88", "22", "63", "98", "43", "60", "47", "58", "29", "71",
"91", "36", "16", "13", "88", "0", "90", "46", "78", "78", "86",
"31", "29", "40", "28", "90")), row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59",
"60", "61", "62", "63", "64"), class = "data.frame")
A different approach would be to filter your dataset on the criteria and then count the number of rows left.
This assumes your data is named data and test is the variable to filter on. If you want us to diagnose your question exactly, then provide a reproducible example by using dput(data) and pasting the result into your question for us to read in as a starting point.
library(tidyverse)
data %>%
dplyr::filter(test >= 90, test < 100) %>%
nrow()
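Since the original goal was to count grades per step, binning all values at once may be closer to what is wanted than one sum() per range. The following is a sketch using base R's cut() and table(); the scores vector is made up, and the breaks (steps of 10, with 100 in its own bin) are an assumption about the intended intervals.

```r
# Made-up numeric scores standing in for data$test
scores <- c(75, 65, 51, 28, 88, 100, 90, 98, 0)

# Left-closed bins of width 10; the extra Inf break puts 100 in a bin of its own
breaks <- c(seq(0, 100, by = 10), Inf)
bins <- cut(scores, breaks = breaks, right = FALSE)

# One count per bin, including empty bins
counts <- table(bins)
print(counts)
```

With right = FALSE each interval includes its lower bound and excludes its upper bound, so 90 is counted in the 90 to 100 bin and 100 is not.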
I used your code and it worked for me.
I'm not sure why it didn't for you.
data <- structure(list(test = c(0, 5, 10, 10, 13, 16, 18, 20, 22, 26,
28, 28, 28, 29, 29, 30, 30, 31, 32, 36, 40, 43, 46, 47, 47, 50,
50, 51, 55, 56, 58, 58, 60, 63, 64, 64, 65, 66, 66, 68, 71, 74,
75, 75, 77, 78, 78, 83, 85, 86, 88, 88, 88, 90, 90, 91, 93, 98,
98, 98, 100, 100, 100, 100)), class = "data.frame", row.names = c(NA,
-64L))
sum(data$test < 100 & data$test >= 90, na.rm = T)
[1] 7
I am trying to get my data into a specific time-series format, similar to the following:
library(fpp)
data(ausbeer)
> str(ausbeer)
Time-Series [1:211] from 1956 to 2008: 284 213 227 308 262 228 236 320 272 233 ...
However, my data currently looks like this:
> str(wide_DF)
Time-Series [1:5, 1:53] from 1 to 5: 2008 2009 2010 2011 2012 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:53] "year" "1" "2" "3" ...
No matter what I try, I cannot seem to get it into the same format as the ausbeer data.
Any push in the right direction would be appreciated.
Data:
wide_DF <- structure(c(2008, 2009, 2010, 2011, 2012, 0.149697356812732,
0.506609550726262, 0.483701772054131, 0.340553948856928, 0.333626779091713,
0.0244998111324886, 0.112120844942705, 0.0900944558373256, 0,
0.0415224265012151, 0.0550537737139359, 0.17295508953821, 0.163929433720004,
0.0576641678554561, 0.0906564289945531, 0.0274508134712055, 0.16192922040378,
0.142555512496348, 0.0625454708565096, 0.0713307780915137, 0.213984247872558,
0.388163795230051, 0.164139722545731, 0.0770529539010844, 0.0938540183343052,
0.0783422018092716, 0.227206496783438, 0.35237013136599, 0.258504104665886,
0.321155525044278, 0.0310253280164216, 0.145802804370329, 0.235904612124217,
0.099505662356445, 0.184479613065924, 0.0541816334844162, 0.171606159981382,
0.115107342701831, 0.0741165588765761, 0.108561719279517, 0.0806492605345567,
0.154086848847159, 0.128553880389704, 0.0619227096142703, 0.102441840801919,
0.0877855302949306, 0.210374588670803, 0.168914894757668, 0.0892981276862553,
0.107796585571731, 0.152766825529036, 0.197064573460434, 0.157147609673816,
0.0794331221751312, 0.130178451495829, NA, 0.288013610146669,
0.218033903861127, 0.144165085504355, 0.265694549788369, 0.168423392180753,
0.220217969236187, 0.192778260148724, 0.0616202640553713, 0.208895233807108,
0.172908899350928, 0.273558409751774, NA, 0.131826476698887,
0.214943212753592, 0.185482743591095, NA, 0.264010141661686,
0.137209722798776, NA, 0.213353668598008, 0.288506341574192,
0.265934476984103, 0.166437178815794, 0.213012834405297, 0.229097493059307,
0.326273737259306, 0.209431740094857, NA, 0.240648159088921,
0.261158363124192, 0.317036580243605, 0.244681209115455, 0.166687664239444,
0.240465787771525, 0.282936314890266, 0.376241375996475, 0.288711990429523,
0.218930682309907, 0.294307615813644, 0.340039521860067, 0.381665974567176,
0.289509990005749, 0.222712288785976, 0.302326040229749, 0.592897079173477,
0.707453475415865, 0.315092875222347, 0.238746934161925, 0.360467454111782,
0.437728811188524, 0.485169961326965, 0.686681695697921, 0.513927986995597,
0.657805801598166, 0.413066850628898, 0.420451063363391, 0.452317417206126,
0.392680063685442, 0.467494633248041, 0.490885152462683, 0.449702773878119,
0.374537214449401, 0.314372316775567, 0.352543088557757, 0.456852949424961,
0.502821656395841, 0.473850571102317, 0.37271347773425, 0.468025614416299,
0.492964518353547, 0.491841956261615, 0.451832204837682, 0.330054166675406,
0.452103599554613, 0.972882256833953, 1, 0.836981605987354, 0.735454399633936,
0.625060089794185, 0.420276672512582, 0.44763479957363, 0.51920428542675,
0.484249008420553, 0.828415542650317, 0.439876590158875, 0.458798662510525,
0.446969106246101, 0.329267937698866, 0.402265340895058, 0.443357095278529,
0.48161107578401, 0.421502554574427, 0.35492302612805, 0.389391661815002,
0.480802216652516, 0.496614239968388, 0.41709701215027, 0.355395255525041,
0.427983230181801, 0.426624787626307, 0.47619764751241, 0.390323036410375,
0.346946500338582, 0.444962482661289, 0.398178487457366, 0.460418831412368,
0.365705653465875, 0.314414354295281, 0.404995279601097, 0.395484743345358,
0.447895106385658, 0.333904920716383, 0.315905256117267, 0.38580728350725,
0.61293865090702, 0.392285202440178, 0.300121453991199, 0.318457847197856,
0.382196506098525, 0.42777529076777, 0.655937896884758, 0.579486246422688,
0.512463359506227, 0.601431192394729, 0.283409977946298, 0.430264772601089,
0.321055545570556, 0.311027552565597, 0.419878449584049, 0.295947790026711,
0.323869738229137, 0.215519275318642, 0.192393768801782, 0.326484958316528,
0.317550712975473, 0.303764772399812, 0.215565915142833, 0.177813119709567,
0.288920671391334, 0.299640010568774, 0.258602815268962, 0.208650826721134,
0.192887375961921, 0.273866371013686, 0.300719638221296, 0.260930408982457,
0.214130384575884, 0.20094859121612, 0.262324215127644, 0.291610161608615,
0.240764266638331, 0.232400949526744, 0.190638711181672, 0.194923630854379,
0.247095733415861, 0.250696875411684, 0.17072512824086, 0.142654512656176,
0.221234530015598, 0.336036187889497, 0.337172813493932, 0.241964382857466,
0.188030459289294, 0.247565234387846, 0.228521023231508, 0.227452403443811,
0.156667771761189, 0.131392002677444, 0.229392396017928, 0.213955172137217,
0.229145352317625, 0.145988572682793, 0.1354966579701, 0.164468590746803,
0.273338090020996, 0.28521986301974, 0.199009246024986, 0.178427989941778,
0.218632123403024, 0.485758317106326, 0.478231444703654, 0.371723057102618,
0.358665186970456, 0.437144925882923, 0.143782632825279, 0.132760650342865,
0.0511910889931185, 0.0421970278185858, 0.0830346125807046, 0.169335703112876,
0.127452787871597, 0.0947298145120868, 0.0660661513870076, 0.136073219608577,
0.328666970899003, 0.296331593970631, 0.154669507656273, 0.12819972894051,
0.134629124753297, 0.491065758190125, 0.47654036029283, 0.357442986752192,
NA, 0.388556693139287), .Dim = c(5L, 53L), .Dimnames = list(NULL,
c("year", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
"41", "42", "43", "44", "45", "46", "47", "48", "49", "50",
"51", "52")), .Tsp = c(1, 5, 1), class = c("mts", "ts", "matrix"
))
EDIT:
ausbeer shape:
> head(ausbeer, 32)
Qtr1 Qtr2 Qtr3 Qtr4
1956 284 213 227 308
1957 262 228 236 320
1958 272 233 237 313
1959 261 227 250 314
1960 286 227 260 311
1961 295 233 257 339
1962 279 250 270 346
1963 294 255 278 363
I think I see what you're trying to do now. Let's just transpose, and fix the column names.
wide_df2 <- as.data.frame(t(as.data.frame(wide_DF)))
# we need as.data.frame() twice here to strip the ts class, and then add the data.frame class back after t()
Set the first row as column names:
names(wide_df2) <- wide_df2[1,]
Remove the column names from the data:
wide_df2 <- wide_df2[-1, ]
And convert back to ts:
wide_df2 <- ts(wide_df2)
2008 2009 2010 2011 2012
1 0.14969736 0.5066096 0.48370177 0.34055395 0.33362678
2 0.02449981 0.1121208 0.09009446 0.00000000 0.04152243
3 0.05505377 0.1729551 0.16392943 0.05766417 0.09065643
4 0.02745081 0.1619292 0.14255551 0.06254547 0.07133078
5 0.21398425 0.3881638 0.16413972 0.07705295 0.09385402
6 0.07834220 0.2272065 0.35237013 0.25850410 0.32115553
7 0.03102533 0.1458028 0.23590461 0.09950566 0.18447961
8 0.05418163 0.1716062 0.11510734 0.07411656 0.10856172
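If the aim is a single series shaped like ausbeer (one value per period, with the year as the start and the number of periods as the frequency), another option might be to flatten the rows into one vector. This is only a sketch on a toy matrix, assuming the values are available as a plain numeric matrix with a year column followed by one column per period; the ts/mts class of wide_DF would need to be stripped first, for example with as.matrix(as.data.frame(wide_DF)).

```r
# Toy stand-in for the wide data: one row per year, one column per period
wide <- cbind(year = 2008:2010,
              "1" = c(0.1, 0.5, 0.9),
              "2" = c(0.2, 0.6, 1.0),
              "3" = c(0.3, 0.7, 1.1),
              "4" = c(0.4, 0.8, 1.2))

# Transpose so that as.vector() reads the values year by year, period by period
values <- as.vector(t(wide[, -1]))

# Univariate series starting in period 1 of the first year
long_ts <- ts(values,
              start = c(wide[1, "year"], 1),
              frequency = ncol(wide) - 1)
print(long_ts)
```

For the data in the question the frequency would be 52, and the NA entries would simply stay in the series as missing values.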
I have data sets for NHL teams, over a certain number of years. I want to know the best way to join these data sets. For example, I have Chicago Blackhawks stats from 1991 and 1992, with Games Played (GP), Wins (W), Losses (L), etc.
How would I join these sets together, without creating two separate columns, GP.x and GP.y?
I've used dput() to get the first ten teams and their respective statistics:
# 1991 team stats - first ten teams
structure(list(Team = c("Chicago Blackhawks*", "St. Louis Blues*",
"Los Angeles Kings*", "Boston Bruins*", "Calgary Flames*",
"Montreal Canadiens*", "Pittsburgh Penguins*", "New York Rangers*",
"Washington Capitals*", "Buffalo Sabres*"),
GP = c("80", "80", "80", "80", "80", "80", "80", "80", "80", "80"),
W = c("49", "47", "46", "44", "46", "39", "41", "36", "37", "31"),
L = c("23", "22", "24", "24", "26", "30", "33", "31", "36", "30"),
T = c("8", "11", "10", "12", "8", "11", "6", "13", "7", "19"),
Pts = c("106", "105", "102", "100", "100", "89", "88", "85", "81","81"),
`Pts %` = c(".663", ".656", ".638", ".625", ".625", ".556", ".550",
".531", ".506", ".506"),
GF = c("284", "310", "340", "299", "344", "273", "342", "297", "258",
"292"),
GA = c("211", "250", "254", "264", "263", "249", "305", "265", "258",
"278"),
SRS = c("0.85", "0.70", "1.04",
"0.32", "0.98", "0.20", "0.42", "0.36", "0.00", "0.08"),
SOS = c("-0.06", "-0.05", "-0.04", "-0.12", "-0.03", "-0.10", "-0.04",
"-0.04", "0.00", "-0.09"),
`TG/G` = c("6.19", "7.00", "7.43", "7.04", "7.59", "6.53", "8.09",
"7.03", "6.45", "7.13"),
EVGF = c("177", "230", "252", "214", "236", "201", "241", "197", "181",
"204"),
EVGA = c("132", "177", "173", "192", "178", "185", "220", "182", "199",
"208"),
PP = c("87", "70", "80", "74", "91", "66", "89", "91", "64", "73"),
PPO = c("393", "348", "391", "351", "384", "357", "388", "389", "340",
"400"),
`PP%` = c("22.14", "20.11", "20.46", "21.08", "23.70", "18.49", "22.94",
"23.39", "18.82", "18.25"),
PPA = c("68", "55", "63", "64", "77", "54", "73", "73", "44", "62"),
PPOA = c("425", "339", "370", "368", "420", "282", "351", "362", "314",
"368"),
`PK%` = c("84.00", "83.78", "82.97", "82.61", "81.67", "80.85", "79.20",
"79.83", "85.99", "83.15"),
SH = c("20", "10", "8", "11", "17", "6", "12", "9", "13", "15"),
SHA = c("10", "18", "18", "8", "8", "10", "12", "10", "15", "8"),
`PIM/G` = c("29.9", "24.6", "27.6", "20.8", "27.1", "17.6", "20.4",
"23.4", "22.8", "21.3"),
`oPIM/G` = c("28.2", "25.3", "30.5", "23.3", "25.9", "19.5", "21.3",
"24.1", "25.3", "22.1"),
S = c("2564", "2550", "2410", "2512", "2604", "2385", "2416", "2444",
"2370", "2410"),
`S%` = c("11.1", "12.2", "14.1", "11.9", "13.2", "11.4", "14.2", "12.2",
"10.9", "12.1"),
SA = c("2214", "2345", "2412", "2240", "2200", "2316", "2723", "2550",
"2112", "2432"),
`SV%` = c(".905", ".893", ".895", ".882", ".880", ".892",
".888", ".896", ".878", ".886"),
PDO = c("", "", "", "", "", "", "", "", "", "")),
.Names = c("Team", "GP", "W", "L", "T", "Pts", "Pts %", "GF", "GA",
"SRS", "SOS", "TG/G", "EVGF", "EVGA", "PP", "PPO", "PP%", "PPA",
"PPOA", "PK%", "SH", "SHA", "PIM/G", "oPIM/G", "S", "S%", "SA",
"SV%", "PDO"),
row.names = 2:11, class = "data.frame")
# 1992 team stats - first ten teams
structure(list(Team = c("New York Rangers*", "Washington Capitals*",
"Detroit Red Wings*", "Vancouver Canucks*", "Montreal Canadiens*",
"Pittsburgh Penguins*", "Chicago Blackhawks*", "New Jersey Devils*",
"Boston Bruins*", "Los Angeles Kings*"),
GP = c("80", "80", "80",
"80", "80", "80", "80", "80", "80", "80"),
W = c("50", "45", "43", "42", "41", "39", "36", "38", "36", "35"),
L = c("25", "27", "25", "26", "28", "32", "29", "31", "32", "31"),
T = c("5", "8", "12", "12", "11", "9", "15", "11", "12", "14"),
Pts = c("105", "98", "98", "96", "93", "87", "87", "87", "84", "84"),
`Pts %` = c(".656", ".613", ".613", ".600", ".581", ".544", ".544",
".544", ".525", ".525"),
GF = c("321", "330", "320", "285", "267", "343", "257", "289", "270",
"287"),
GA = c("246", "275", "256", "250",
"207", "308", "236", "259", "275", "296"),
SRS = c("1.02", "0.78", "0.74", "0.31", "0.64", "0.52", "0.22", "0.48",
"-0.09", "-0.19"),
SOS = c("0.08", "0.09", "-0.06", "-0.13", "-0.12", "0.08", "-0.04",
"0.10", "-0.03", "-0.08"),
`TG/G` = c("7.09", "7.56", "7.20", "6.69", "5.93", "8.14", "6.16",
"6.85", "6.81", "7.29"),
EVGF = c("226", "224", "230", "188", "189", "235", "165", "215", "186",
"197"),
EVGA = c("174", "200", "171", "167", "142", "217", "150", "181", "189",
"208"),
PP = c("81", "92", "72", "85", "74", "92", "81", "59", "77", "79"),
PPO = c("387", "412", "386", "439", "379", "423", "467", "338", "406",
"411"),
`PP%` = c("20.93", "22.33", "18.65", "19.36", "19.53", "21.75", "17.34",
"17.46", "18.97", "19.22"),
PPA = c("60", "60", "78", "76", "60", "77", "76", "68", "72", "76"),
PPOA = c("395", "368", "419", "382", "320", "383", "482", "374", "363",
"417"),
`PK%` = c("84.81", "83.70", "81.38", "80.10", "81.25", "79.90",
"84.23", "81.82", "80.17", "81.77"),
SH = c("14", "14", "18",
"12", "4", "16", "11", "15", "7", "11"),
SHA = c("12", "15",
"7", "7", "5", "14", "10", "10", "14", "12"),
`PIM/G` = c("22.4", "21.8", "25.6", "25.7", "19.3", "23.7", "33.0",
"20.0", "21.8", "26.9"),
`oPIM/G` = c("24.1", "24.2", "23.9", "28.4", "22.0",
"23.9", "31.8", "20.4", "23.7", "25.6"),
S = c("2632", "2481", "2478", "2669", "2443", "2542", "2646", "2495",
"2664", "2419"),
`S%` = c("12.2", "13.3", "12.9", "10.7", "10.9", "13.5", "9.7",
"11.6", "10.1", "11.9"),
SA = c("2543", "2270", "2238", "2299", "2227", "2518", "2028", "2290",
"2339", "2663"),
`SV%` = c(".903", ".879", ".886", ".891", ".907", ".878", ".884",
".887", ".882", ".889"),
PDO = c("", "", "", "", "", "", "", "", "", "")),
.Names = c("Team", "GP", "W", "L", "T", "Pts", "Pts %", "GF", "GA",
"SRS", "SOS", "TG/G", "EVGF", "EVGA", "PP", "PPO", "PP%", "PPA",
"PPOA", "PK%", "SH", "SHA", "PIM/G", "oPIM/G", "S", "S%", "SA",
"SV%", "PDO"),
row.names = 2:11, class = "data.frame")
I understand joining these sets may be... tough, but any advice/thoughts would be great! Thanks!
The issue is that combining two data frames with identical column names side by side never ends well: merge and the join functions disambiguate the clashing columns with suffixes such as GP.x, which you said you don't want, while cbind leaves you with several columns sharing one name, which is ambiguous to select from and which some functions refuse to work with. You can verify that the two data frames have identical column names like so:
names(df91) == names(df92)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [29] TRUE
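To make the suffix behaviour concrete, here is a minimal sketch on two toy data frames. The names a, b and wide and all the values are made up for illustration; they stand in for df91 and df92 only in that they share every column name.

```r
# Two toy data frames with identical column names (hypothetical values).
a <- data.frame(Team = c("A", "B"), GP = c(80, 80), W = c(49, 47))
b <- data.frame(Team = c("A", "B"), GP = c(80, 80), W = c(36, 43))

# Joining on Team forces merge() to disambiguate every other shared column
# by appending its default suffixes, .x and .y.
wide <- merge(a, b, by = "Team")
names(wide)

# identical() gives a single TRUE/FALSE instead of one value per column.
identical(names(a), names(b))
```

If you only need a yes/no answer about the column names, identical() is easier to read than the element-wise comparison.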
It is also more logical to keep, for example, all your GP observations in a single GP column, with a year column marking when each observation occurred, rather than two GP columns and no clear way of knowing which belongs to which year. Side-by-side columns also wouldn't scale well: if you're scraping score data, I'm guessing at some point you'll want more than just these two years.
You instead want to be binding rows. You can give each data frame a column for the year, and use rbind in base R to bind them into one data frame.
df91$year <- 1991
df92$year <- 1992
df_base <- rbind(df91, df92)
head(df_base)
#> Team GP W L T Pts Pts % GF GA SRS SOS TG/G EVGF
#> 2 Chicago Blackhawks* 80 49 23 8 106 .663 284 211 0.85 -0.06 6.19 177
#> 3 St. Louis Blues* 80 47 22 11 105 .656 310 250 0.70 -0.05 7.00 230
#> 4 Los Angeles Kings* 80 46 24 10 102 .638 340 254 1.04 -0.04 7.43 252
#> 5 Boston Bruins* 80 44 24 12 100 .625 299 264 0.32 -0.12 7.04 214
#> 6 Calgary Flames* 80 46 26 8 100 .625 344 263 0.98 -0.03 7.59 236
#> 7 Montreal Canadiens* 80 39 30 11 89 .556 273 249 0.20 -0.10 6.53 201
#> EVGA PP PPO PP% PPA PPOA PK% SH SHA PIM/G oPIM/G S S% SA SV%
#> 2 132 87 393 22.14 68 425 84.00 20 10 29.9 28.2 2564 11.1 2214 .905
#> 3 177 70 348 20.11 55 339 83.78 10 18 24.6 25.3 2550 12.2 2345 .893
#> 4 173 80 391 20.46 63 370 82.97 8 18 27.6 30.5 2410 14.1 2412 .895
#> 5 192 74 351 21.08 64 368 82.61 11 8 20.8 23.3 2512 11.9 2240 .882
#> 6 178 91 384 23.70 77 420 81.67 17 8 27.1 25.9 2604 13.2 2200 .880
#> 7 185 66 357 18.49 54 282 80.85 6 10 17.6 19.5 2385 11.4 2316 .892
#> PDO year
#> 2 1991
#> 3 1991
#> 4 1991
#> 5 1991
#> 6 1991
#> 7 1991
Or you can do it in one step with dplyr's bind_rows, using mutate to create the year columns. Like rbind, bind_rows accepts any number of data frames, so this scales if you have more than just these two years' worth of data; it also matches columns by name and fills any column missing from one data frame with NA.
df_dplyr <- dplyr::bind_rows(
dplyr::mutate(df91, year = 1991),
dplyr::mutate(df92, year = 1992)
)
head(df_dplyr)
#> Team GP W L T Pts Pts % GF GA SRS SOS TG/G EVGF
#> 1 Chicago Blackhawks* 80 49 23 8 106 .663 284 211 0.85 -0.06 6.19 177
#> 2 St. Louis Blues* 80 47 22 11 105 .656 310 250 0.70 -0.05 7.00 230
#> 3 Los Angeles Kings* 80 46 24 10 102 .638 340 254 1.04 -0.04 7.43 252
#> 4 Boston Bruins* 80 44 24 12 100 .625 299 264 0.32 -0.12 7.04 214
#> 5 Calgary Flames* 80 46 26 8 100 .625 344 263 0.98 -0.03 7.59 236
#> 6 Montreal Canadiens* 80 39 30 11 89 .556 273 249 0.20 -0.10 6.53 201
#> EVGA PP PPO PP% PPA PPOA PK% SH SHA PIM/G oPIM/G S S% SA SV%
#> 1 132 87 393 22.14 68 425 84.00 20 10 29.9 28.2 2564 11.1 2214 .905
#> 2 177 70 348 20.11 55 339 83.78 10 18 24.6 25.3 2550 12.2 2345 .893
#> 3 173 80 391 20.46 63 370 82.97 8 18 27.6 30.5 2410 14.1 2412 .895
#> 4 192 74 351 21.08 64 368 82.61 11 8 20.8 23.3 2512 11.9 2240 .882
#> 5 178 91 384 23.70 77 420 81.67 17 8 27.1 25.9 2604 13.2 2200 .880
#> 6 185 66 357 18.49 54 282 80.85 6 10 17.6 19.5 2385 11.4 2316 .892
#> PDO year
#> 1 1991
#> 2 1991
#> 3 1991
#> 4 1991
#> 5 1991
#> 6 1991
Created on 2018-06-18 by the reprex package (v0.2.0).
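If the number of seasons grows, one way to avoid writing a mutate call per year is to keep the data frames in a named list and let bind_rows build the year column from the list names through its .id argument. This is only a sketch: the list seasons, the result all_years and the toy values are hypothetical, and it assumes each season has already been scraped into its own data frame.

```r
# Hypothetical per-season data frames, named by the year they cover.
seasons <- list(
  "1991" = data.frame(Team = c("A", "B"), Pts = c("106", "105")),
  "1992" = data.frame(Team = c("A", "B"), Pts = c("87", "98"))
)

# .id = "year" adds a column holding the list name each row came from.
all_years <- dplyr::bind_rows(seasons, .id = "year")

# List names are character, so convert the new column to integer.
all_years$year <- as.integer(all_years$year)
all_years
```

Adding a season then means adding one element to the list, with no change to the binding code.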
I have a data frame, the rows of which I would like to sort based on time stamp.
V1 V2 V3 V4 V5 V6
1 {"2014-08-01T01:00:00": "64", "2014-08-01T13:00:00": "53", "2014-08-01T01:20:00": "73",
2 {"2014-08-02T18:00:00": "37", "2014-08-02T22:00:00": "56", "2014-08-02T17:00:00": "24",
3 {"2014-08-03T17:50:00": "78", "2014-08-03T04:20:00": "83", "2014-08-03T00:20:00": "73",
4 {"2014-08-04T15:00:00": "37", "2014-08-04T21:00:00": "39", "2014-08-04T15:20:00": "43",
5 {"2014-08-05T19:20:00": "78", "2014-08-05T13:20:00": "46", "2014-08-05T00:00:00": "62",
6 {"2014-08-06T11:00:00": "45", "2014-08-06T09:00:00": "56", "2014-08-06T21:50:00": "68",
V7 V8 V9 V10 V11 V12
1 "2014-08-01T13:20:00": "57", "2014-08-01T13:50:00": "47", "2014-08-01T20:50:00": "44",
2 "2014-08-02T01:00:00": "56", "2014-08-02T17:20:00": "42", "2014-08-02T01:20:00": "68",
3 "2014-08-03T23:00:00": "81", "2014-08-03T00:00:00": "63", "2014-08-03T00:50:00": "73",
4 "2014-08-04T02:00:00": "81", "2014-08-04T18:00:00": "29", "2014-08-04T02:20:00": "88",
5 "2014-08-05T00:20:00": "72", "2014-08-05T00:50:00": "77", "2014-08-05T19:00:00": "75",
6 "2014-08-06T14:20:00": "53", "2014-08-06T14:00:00": "40", "2014-08-06T23:20:00": "77",
Desired output
The output of only one row is shown below.
{"2014-08-01T01:00:00": "64", "2014-08-01T01:20:00": "73", "2014-08-01T13:00:00": "53", "2014-08-01T13:20:00": "57", "2014-08-01T13:50:00": "47", "2014-08-01T20:50:00": "44",
We convert the datetime columns to POSIXct class by looping through them with lapply; df1[c(TRUE, FALSE)] selects every other column, because the logical vector is recycled across the columns. We then get the ordering within each row using apply with MARGIN=1 ('m1'); apply returns the result with one column per original row, which is why 'm1' is later split by col(m1). We split the time columns and the value columns by row to create two lists, 'l1' and 'l2', then use Map to reorder the elements of each row based on 'm1' and paste to concatenate them back into a single string. The result can be converted to a data.frame with one column.
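# Parse the key columns (odd positions) to POSIXct; the trailing ':' in the
# format matches the colon left at the end of each key
df2[c(TRUE, FALSE)] <- lapply(df1[c(TRUE, FALSE)], function(x)
as.POSIXct(sub('[{]', '', x), format = '%Y-%m-%dT%H:%M:%S:'))
# Ordering of the timestamps within each row (one column of m1 per row)
m1 <- apply(df2[c(TRUE, FALSE)], 1, order)
# Keys (l1) and values (l2) as lists with one element per row
l1 <- split(as.matrix(df1[c(TRUE, FALSE)]), row(df1[c(TRUE, FALSE)]))
l2 <- split(as.matrix(df2[c(FALSE, TRUE)]), row(df2[c( FALSE, TRUE)]))
# Reorder each row's pairs, put the quotes back, and paste into one string
data.frame(col1=unlist(Map(function(x,y,z) paste0('{',
paste(gsub('^\\{*(\\d+.*)(\\:)', '"\\1"\\2', x[z]),
gsub('(\\d+)', '"\\1"', y[z]), sep=' ', collapse=' ')),
l1, l2, split(m1, col(m1)))), stringsAsFactors=FALSE)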
col1
#1 {"2014-08-01T01:00:00": "64", "2014-08-01T01:20:00": "73", "2014-08-01T13:00:00": "53", "2014-08-01T13:20:00": "57", "2014-08-01T13:50:00": "47", "2014-08-01T20:50:00": "44",
#2 {"2014-08-02T01:00:00": "56", "2014-08-02T01:20:00": "68", "2014-08-02T17:00:00": "24", "2014-08-02T17:20:00": "42", "2014-08-02T18:00:00": "37", "2014-08-02T22:00:00": "56",
#3 {"2014-08-03T00:00:00": "63", "2014-08-03T00:20:00": "73", "2014-08-03T00:50:00": "73", "2014-08-03T04:20:00": "83", "2014-08-03T17:50:00": "78", "2014-08-03T23:00:00": "81",
#4 {"2014-08-04T02:00:00": "81", "2014-08-04T02:20:00": "88", "2014-08-04T15:00:00": "37", "2014-08-04T15:20:00": "43", "2014-08-04T18:00:00": "29", "2014-08-04T21:00:00": "39",
#5 {"2014-08-05T00:00:00": "62", "2014-08-05T00:20:00": "72", "2014-08-05T00:50:00": "77", "2014-08-05T13:20:00": "46", "2014-08-05T19:00:00": "75", "2014-08-05T19:20:00": "78",
#6 {"2014-08-06T09:00:00": "56", "2014-08-06T11:00:00": "45", "2014-08-06T14:00:00": "40", "2014-08-06T14:20:00": "53", "2014-08-06T21:50:00": "68", "2014-08-06T23:20:00": "77",
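The per-row logic can also be seen in isolation on a single row. The following is a minimal sketch rather than the code above: row1 and sort_pairs are hypothetical names, the input is assumed to have the same shape as one row of df1 (quotes already removed, keys and values alternating), and the time zone is fixed to UTC only to keep the example reproducible.

```r
# One toy row: timestamp keys in odd positions, values in even positions.
row1 <- c("{2014-08-01T01:00:00:", "64,",
          "2014-08-01T13:00:00:", "53,",
          "2014-08-01T01:20:00:", "73,")

sort_pairs <- function(x) {
  keys <- sub("^[{]", "", x[c(TRUE, FALSE)])  # drop the leading brace
  vals <- x[c(FALSE, TRUE)]
  # Parse the keys; the trailing ':' in the format matches the key's colon.
  stamps <- as.POSIXct(keys, format = "%Y-%m-%dT%H:%M:%S:", tz = "UTC")
  idx <- order(stamps)
  # rbind() + as.vector() interleaves the sorted keys and values again.
  as.vector(rbind(keys[idx], vals[idx]))
}

sorted1 <- sort_pairs(row1)
sorted1
```

Each value travels with its key because both are reordered by the same index. Restoring the quotes and the opening brace is left to the gsub and paste0 steps shown in the answer.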
data
lines <- readLines(textConnection('V1 V2 V3 V4 V5 V6
1 {"2014-08-01T01:00:00": "64", "2014-08-01T13:00:00": "53", "2014-08-01T01:20:00": "73",
2 {"2014-08-02T18:00:00": "37", "2014-08-02T22:00:00": "56", "2014-08-02T17:00:00": "24",
3 {"2014-08-03T17:50:00": "78", "2014-08-03T04:20:00": "83", "2014-08-03T00:20:00": "73",
4 {"2014-08-04T15:00:00": "37", "2014-08-04T21:00:00": "39", "2014-08-04T15:20:00": "43",
5 {"2014-08-05T19:20:00": "78", "2014-08-05T13:20:00": "46", "2014-08-05T00:00:00": "62",
6 {"2014-08-06T11:00:00": "45", "2014-08-06T09:00:00": "56", "2014-08-06T21:50:00": "68",'))
lines2 <- readLines(textConnection('V7 V8 V9 V10 V11 V12
1 "2014-08-01T13:20:00": "57", "2014-08-01T13:50:00": "47", "2014-08-01T20:50:00": "44",
2 "2014-08-02T01:00:00": "56", "2014-08-02T17:20:00": "42", "2014-08-02T01:20:00": "68",
3 "2014-08-03T23:00:00": "81", "2014-08-03T00:00:00": "63", "2014-08-03T00:50:00": "73",
4 "2014-08-04T02:00:00": "81", "2014-08-04T18:00:00": "29", "2014-08-04T02:20:00": "88",
5 "2014-08-05T00:20:00": "72", "2014-08-05T00:50:00": "77", "2014-08-05T19:00:00": "75",
6 "2014-08-06T14:20:00": "53", "2014-08-06T14:00:00": "40", "2014-08-06T23:20:00": "77",'))
d1 <- read.table(text=gsub('^\\d+\\s+|"', '', lines), header=TRUE, stringsAsFactors=FALSE)
d2 <- read.table(text=gsub('^\\d+\\s+|"', '', lines2), header=TRUE, stringsAsFactors=FALSE)
df1 <- cbind(d1, d2)
df2 <- df1