separate data in a column into multiple columns - r
I have some data which looks like:
> head(d)
V1 V2 V3 V4 V5 V6 V7 V8 V9
50 28 79 4 6 48 2 17 4 20
51 28 79 4 6 48 2 17 4 21
52 28 79 4 6 48 2 17 4 22
53 28 79 4 6 48 2 17 4 23
54 28 79 4 6 48 2 17 4 24
55 28 79 4 6 48 2 17 4 25
V10
50 000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V
51 000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.5V000.8V000.8V000.5V000.4V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.6V
52 000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.6V000.4V000.4V000.5V000.5V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.7V000.7V000.8V
53 000.7V000.5V000.4V000.5V000.5V000.4V000.4V000.4V000.4V000.5V000.5V000.4V000.4V000.3V000.4V000.4V000.4V000.3V000.4V000.3V000.4V000.9V001.0V001.0V
54 000.8V000.5V000.3V000.4V000.4V000.4V000.7V001.2V001.2V001.0N000.5V000.4V000.4V000.4V000.4V000.3V000.3V000.3V000.4V000.4V000.5V000.5V000.6V000.6V
55 000.5V000.5V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.6V000.5V000.5V000.4V000.3V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.3V000.4V
Column V10 is one long string holding 24 readings, each followed by a V or an N. The V's mark "valid" observations and the N's mark non-valid observations.
I want to split the V10 column into 24 columns. I tried the following, but it does not handle the N's:
sep_data <- df %>%
separate(V10, into = paste("x",1:24, sep = "_"), sep = "V")
Looking at the tail() of the sep_data:
V1 V2 V3 V4 V5 x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 x_11 x_12
174 28079008 6 48 2 17042 4002.0 001.1 000.6 000.4 000.4 000.4 000.4 000.5 000.7 000.9 000.7N000.5 000.5
175 28079008 6 48 2 17042 5000.7 000.5 000.4 000.3 000.3 000.3 000.3 000.4 000.5 000.6 000.5 000.5
176 28079008 6 48 2 17042 6000.4 000.3 000.3 000.3 000.3 000.3 000.3 000.4 000.5 000.6 000.5 000.4
177 28079008 6 48 2 17042 7000.3 000.3 000.2 000.2 000.2 000.2 000.2 000.3 000.4 000.5 000.4 000.4
178 28079008 6 48 2 17042 8000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.4 000.5 000.5 000.5 000.5
179 28079008 6 48 2 17042 9000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.3 000.3
x_13 x_14 x_15 x_16 x_17 x_18 x_19 x_20 x_21 x_22 x_23 x_24 nchar
174 000.4 000.5 000.5 000.5 000.5 000.5 000.6 000.6 000.6 000.6 000.7 145
175 000.5 000.5 000.5 000.5 000.5 000.5 000.5 000.6 000.6 000.6 000.6 000.5 145
176 000.4 000.4 000.5 000.5 000.5 000.5 000.5 000.5 000.5 000.4 000.4 000.3 145
177 000.4 000.4 000.4 000.4 000.4 000.5 000.5 000.6 000.5 000.5 000.5 000.4 145
178 000.4 000.4 000.4 000.5 000.5 000.5 000.5 000.4 000.4 000.4 000.4 000.4 145
179 000.4 000.3 000.3 000.3 000.3 000.3 000.4 000.5 000.5 000.5 000.6 000.5 145
One cell in row 174 still holds the unsplit value 000.7N000.5.
How can I use separate() to split on V OR N?
Data:
structure(list(V1 = c(28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 28), V2 = c(79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79,
79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79, 79), V3 = c(4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8), V4 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6), V5 = c(48, 48, 48, 48, 48, 48, 48,
48, 48, 48, 48, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38,
38, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48,
48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48, 48), V6 = c(2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2), V7 = c(17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
17, 17, 17, 17, 17, 17, 17), V8 = c(4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), V9 = c(20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29), V10 = c("000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.5V000.8V000.8V000.5V000.4V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.6V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.6V000.4V000.4V000.5V000.5V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.7V000.7V000.8V",
"000.7V000.5V000.4V000.5V000.5V000.4V000.4V000.4V000.4V000.5V000.5V000.4V000.4V000.3V000.4V000.4V000.4V000.3V000.4V000.3V000.4V000.9V001.0V001.0V",
"000.8V000.5V000.3V000.4V000.4V000.4V000.7V001.2V001.2V001.0N000.5V000.4V000.4V000.4V000.4V000.3V000.3V000.3V000.4V000.4V000.5V000.5V000.6V000.6V",
"000.5V000.5V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.6V000.5V000.5V000.4V000.3V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.3V000.4V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.5V000.4V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.5V000.4V000.3V000.3V000.3V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.3V000.4V000.4V000.4V000.4V000.4V000.4V000.3V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.4V000.4V000.4V000.3V000.4V000.4V000.4V000.3V",
"000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.4V000.5V000.6V000.6V",
"000.7V000.7V000.6V000.4V000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.4V",
"00003V00004V00002V00002V00001V00002V00004V00003V00005V00005V00005V00009V00011V00014V00014V00009V00008V00010V00011V00014V00011V00008V00009V00006V",
"00005V00005V00003V00003V00002V00003V00009V00023V00069V00068V00008V00013V00010V00007V00012V00008V00007V00004V00006V00005V00007V00006V00004V00004V",
"00002V00001V00002V00001V00001V00001V00003V00007V00018V00022V00042V00034V00021V00016V00018V00011V00009V00009V00014V00010V00010V00033V00083V00111V",
"00098V00046V00016V00010V00005V00013V00052V00217V00337V00485V00138V00029V00026V00020V00019V00011V00006V00012V00012V00008V00010V00007V00005V00004V",
"00002V00002V00001V00001V00001V00001V00002V00007V00013V00020V00012V00010V00010V00012V00009V00007V00006V00008V00010V00009V00010V00006V00004V00003V",
"00002V00001V00001V00001V00001V00002V00003V00009V00014V00015V00013V00014V00018V00017V00010V00009V00010V00009V00006V00007V00008V00036V00061V00007V",
"00005V00005V00002V00002V00002V00002V00003V00008V00055V00038V00020V00023V00022V00015V00017V00011V00007V00006V00009V00009V00015V00117V00082V00096V",
"00046V00031V00047V00042V00037V00033V00040V00044V00067V00093V00053V00009V00006V00006V00006V00004V00003V00004V00020V00016V00010V00020V00049V00104V",
"00088V00112V00055V00046V00013V00043V00023V00023V00040V00093V00046V00011V00007V00007V00007V00005V00008V00007V00010V00010V00025V00079V00059V00025V",
"00028V00038V00022V00017V00018V00028V00049V00152V00116V00158V00087V00030V00024V00017V00018V00008V00008V00008V00010V00013V00012V00022V00072V00083V",
"00060V00024V00032V00026V00014V00032V00078V00129V00199V00164V00107V00067V00052V00024V00012V00009V00015V00015V00036V00024V00019V00035V00056V00062V",
"00020V00046V00042V00019V00003V00004V00020V00091V00159V00137V00121V00052V00045V00039V00008V00004V00005V00004V00008V00009V00004V00002V00009V00005V",
"00070V00027V00026V00038V00020V00011V00016V00057V00073V00111V00053V00031V00018V00005V00003V00002V00002V00002V00010V00030V00033V00009V00005V00017V",
"00033V00059V00042V00019V00018V00015V00017V00024V00036V00063V00026V00008V00005V00003V00003V00002V00003V00005V00005V00009V00009V00007V00005V00006V",
"00009V00004V00001V00001V00002V00015V00012V00019V00031V00028V00013V00010V00008V00007V00004V00003V00003V00006V00017V00017V00014V00024V00012V00002V",
"00002V00002V00003V00002V00001V00001V00002V00002V00009V00003V00004V00004V00006V00007V00010V00005V00004V00003V00003V00002V00002V00005V00067V00080V",
"00063V00039V00025V00004V00003V00006V00015V00062V00144V00098V00079V00026V00017V00015V00007V00005V00004V00006V00004V00003V00003V00002V00003V00005V",
"00004V00001V00001V00001V00001V00001V00002V00005V00011V00014V00013V00012V00013V00020V00014V00013V00008V00007V00007V00006V00007V00006V00003V00002V",
"00002V00001V00001V00001V00001V00001V00002V00004V00008V00012V00025V00020V00026V00016V00017V00012V00011V00012V00017V00010V00012V00011V00006V00004V",
"00003V00002V00002V00001V00001V00002V00008V00013V00013V00019V00022V00015V00019V00018V00015V00015V00010V00011V00016V00015V00012V00011V00007V00005V",
"00003V00002V00002V00001V00001V00002V00004V00026V00145V00131V00027V00021V00020V00017V00019V00014V00011V00013V00012V00010V00007V00016V00025V00034V",
"00004V00003V00003V00002V00002V00003V00007V00024V00022V00046V00013V00011V00012V00016V00019V00011V00010V00010V00018V00022V00039V00060V00051V00121V",
"00085V00038V00024V00056V00039V00024V00022V00026V00040V00058V00041V00010V00010V00008V00013V00008V00008V00006V00013V00005V00004V00068V00079V00102V",
"00081V00019V00005V00015V00008V00048V00114V00179V00240V00153N00031V00021V00024V00018V00013V00008V00004V00006V00010V00014V00009V00013V00010V00032V",
"00018V00030V00006V00001V00001V00001V00003V00024V00038V00046V00024V00009V00010V00010V00006V00004V00004V00004V00004V00006V00004V00003V00002V00006V",
"00003V00002V00002V00001V00001V00001V00002V00012V00021V00025V00027V00011V00009V00006V00007V00004V00006V00005V00006V00020V00008V00004V00003V00004V",
"00002V00001V00001V00001V00001V00001V00002V00004V00007V00010V00011V00013V00013V00018V00010V00006V00005V00007V00010V00014V00016V00009V00006V00004V",
"00003V00002V00001V00001V00001V00001V00003V00006V00009V00012V00011V00015V00019V00029V00022V00016V00010V00013V00009V00008V00012V00012V00017V00009V",
"00009V00006V00003V00002V00003V00002V00003V00007V00007V00009V00010V00012V00011V00022V00013V00011V00011V00013V00019V00029V00022V00024V00037V00047V",
"00074V00082V00059V00021V00003V00002V00001V00002V00002V00002V00003V00003V00003V00004V00007V00007V00004V00004V00005V00010V00007V00009V00005V00005V",
"00025V00021V00010V00010V00008V00016V00032V00023V00023V00017V00013V00021V00024V00028V00027V00020V00020V00024V00029V00040V00048V00049V00043V00028V",
"00027V00033V00035V00022V00026V00036V00048V00063V00065V00059V00022V00029V00025V00022V00029V00028V00021V00017V00018V00021V00031V00030V00024V00020V",
"00012V00008V00008V00006V00005V00009V00018V00040V00059V00053V00063V00060V00044V00040V00043V00032V00029V00032V00052V00058V00071V00110V00128V00117V",
"00097V00074V00058V00049V00043V00043V00052V00094V00127V00160V00091V00050V00043V00038V00040V00031V00021V00038V00046V00043V00050V00061V00050V00022V",
"00017V00014V00019V00018V00008V00011V00014V00042V00050V00048V00027V00021V00019V00021V00016V00013V00016V00019V00028V00030V00034V00035V00021V00016V",
"00010V00008V00007V00006V00007V00009V00015V00040V00058V00045V00034V00032V00039V00032V00023V00024V00029V00029V00033V00039V00068V00112V00096V00047V",
"00030V00024V00018V00010V00010V00011V00019V00045V00071V00061V00050V00052V00049V00045V00046V00038V00027V00029V00041V00046V00073V00143V00131V00109V",
"00065V00054V00063V00067V00060V00055V00060V00062V00067V00075V00068V00036V00026V00025V00028V00020V00016V00020V00044V00056V00062V00098V00127V00116V",
"00091V00094V00079V00067V00040V00057V00056V00054V00054V00065V00059V00041V00027V00030V00026V00019V00023V00022V00032V00040V00060V00087V00079V00054V",
"00047V00061V00057V00050V00049V00049V00056V00085V00074V00086V00084V00062V00058V00048V00047V00027V00026V00026V00036V00064V00085V00096V00117V00106V",
"00084V00072V00062V00055V00042V00055V00066V00082V00099V00098V00096V00092V00089V00064V00045V00038V00036V00042V00075V00066V00066V00087V00088V00103V",
"00092V00090V00074V00059V00046V00038V00051V00076V00092V00091V00097V00084V00084V00081V00033V00017V00020V00018V00031V00045V00038V00041V00038V00065V",
"00113V00086V00082V00072V00059V00050V00050V00057V00058V00067V00072V00070V00061V00025V00009V00007V00010V00013V00028V00071V00084V00060V00044V00068V",
"00090V00084V00070V00061V00053V00049V00041V00041V00041V00048V00043V00030V00021V00012V00011V00007V00007V00009V00013V00021V00034V00044V00040V00041V",
"00057V00049V00015V00027V00032V00047V00041V00045V00046V00040V00034V00028V00023V00022V00009V00008V00007V00014V00033V00049V00056V00072V00039V00019V",
"00021V00017V00013V00008V00007V00011V00015V00013V00027V00010V00012V00011V00016V00023V00031V00021V00020V00017V00017V00021V00035V00073V00130V00115V",
"00102V00084V00071V00049V00041V00043V00045V00067V00083V00082V00083V00058V00048V00049V00028V00019V00015V00021V00025V00026V00033V00031V00042V00060V",
"00025V00009V00006V00010V00006V00008V00016V00038V00054V00047V00038V00037V00040V00056V00041V00041V00027V00027V00029V00034V00049V00065V00024V00012V",
"00010V00007V00004V00004V00003V00004V00010V00025V00041V00039V00051V00046V00051V00046V00046V00033V00031V00035V00040V00030V00036V00041V00035V00028V",
"00020V00015V00013V00014V00012V00017V00032V00056V00051V00057V00052V00037V00040V00039V00038V00040V00033V00036V00047V00050V00044V00046V00038V00032V",
"00021V00017V00015V00012V00010V00013V00024V00050V00101V00099V00053V00048V00043V00040V00044V00038V00035V00042V00043V00042V00040V00057V00060V00058V",
"00036V00030V00028V00021V00020V00035V00049V00067V00058V00070V00042V00038V00044V00053V00050V00044V00039V00043V00054V00059V00082V00098V00116V00119V",
"00101V00085V00076V00078V00066V00058V00054V00053V00049V00053V00058V00035V00033V00028V00032V00029V00027V00025V00047V00028V00049V00122V00135V00123V",
"00113V00077V00040V00041V00050V00056V00074V00091V00113V00097N00068V00055V00059V00044V00037V00024V00020V00028V00045V00058V00070V00082V00084V00102V",
"00082V00077V00044V00019V00012V00009V00029V00077V00084V00081V00051V00028V00031V00031V00022V00017V00014V00015V00015V00027V00037V00038V00020V00031V",
"00019V00010V00010V00009V00006V00010V00019V00048V00068V00058V00047V00030V00022V00018V00017V00014V00019V00018V00026V00059V00031V00019V00017V00014V",
"00009V00007V00003V00002V00002V00004V00008V00020V00030V00029V00024V00028V00028V00031V00022V00017V00016V00020V00028V00034V00039V00037V00029V00019V",
"00012V00009V00005V00004V00003V00003V00010V00025V00037V00034V00027V00031V00036V00043V00040V00037V00035V00040V00031V00027V00039V00038V00038V00032V",
"00031V00028V00022V00018V00021V00019V00021V00033V00036V00040V00041V00045V00040V00043V00039V00037V00041V00048V00053V00053V00052V00056V00055V00055V",
"00050V00044V00047V00048V00030V00011V00008V00011V00012V00010V00008V00014V00011V00017V00031V00022V00009V00008V00018V00038V00040V00043V00033V00027V",
"00004V00004V00004V00004V00003V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V",
"00004V00004V00004V00004V00004V00004V00004V00004V00004V00005V00005V00005V00005V00005V00005V00004V00005V00005V00005V00006V00006V00005V00005V00004V",
"00004V00004V00004V00004V00004V00004V00004V00004V00005V00005V00006V00006V00005V00005V00004V00005V00004V00005V00005V00005V00005V00005V00006V00008V",
"00006V00006V00006V00005V00005V00005V00004V00005V00008V00006V00006V00006V00005V00005V00005V00004V00004V00004V00004V00005V00006V00006V00006V00005V",
"00004V00004V00004V00004V00004V00004V00004V00004V00005V00005V00005V00006V00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00004V00004V",
"00004V00004V00004V00004V00004V00004V00004V00004V00005V00006V00005V00005V00005V00005V00005V00006V00005V00005V00005V00006V00007V00008V00007V00005V",
"00005V00004V00004V00004V00004V00004V00004V00004V00005V00005V00006V00006V00005V00005V00005V00005V00005V00005V00004V00004V00004V00004V00005V00004V",
"00004V00004V00005V00004V00005V00004V00004V00005V00006V00007V00005V00005V00005V00005V00005V00005V00005V00004V00004V00004V00004V00005V00005V00005V",
"00004V00004V00005V00005V00004V00004V00004V00005V00005V00006V00006V00005V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V00004V",
"00004V00005V00004V00004V00004V00004V00004V00005V00005V00006V00005V00005V00005V00006V00005V00004V00004V00005V00004V00004V00004V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V00005V00005V00005V00005V00004V00004V00004V00005V00004V00005V00006V00006V00005V00005V",
"00005V00005V00005V00005V00005V00004V00005V00005V00005V00006V00005V00006V00006V00005V00004V00004V00004V00004V00004V00004V00004V00004V00005V00005V",
"00005V00005V00005V00006V00006V00005V00005V00005V00006V00007V00007V00006V00006V00005V00004V00004V00005V00004V00004V00004V00004V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00006V00006V00006V00006V00005V00005V00005V00004V00004V00004V00004V00004V00004V00004V00004V00004V00005V",
"00005V00005V00005V00004V00004V00005V00005V00005V00005V00006V00006V00005V00005V00005V00004V00004V00005V00005V00005V00004V00004V00004V00005V00005V",
"00004V00005V00005V00004V00004V00004V00004V00004V00004V00004V00005V00005V00005V00005V00005V00005V00004V00005V00005V00004V00004V00005V00006V00006V",
"00008V00007V00005V00005V00005V00005V00005V00005V00005V00007V00006V00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00004V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00004V",
"00004V00004V00004V00004V00004V00004V00004V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00004V00005V00005V00005V00005V00005V00005V00005V00006V00006V00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V00007V",
"00006V00006V00006V00005V00005V00005V00006V00006V00006V00006V00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00006V00008V00011V",
"00009V00006V00005V00005V00005V00005V00005V00006V00007V00008V00006N00006V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V00006V00006V00006V00006V00006V00006V00006V00006V00006V00006V00007V00006V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V00007V00007V00007V00006V00006V00006V00006V00005V00005V00005V00005V00005V00005V00005V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00006V00006V00006V00006V00006V",
"00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V00005V",
"000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.5V000.4V000.4V000.5V000.5V000.6V000.6V000.6V000.5V",
"000.5V000.5V000.5V000.5V000.4V000.4V000.5V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.5V000.4V",
"000.4V000.4V000.3V000.3V000.3V000.3V000.4V000.4V000.6V000.7V000.6V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.5V000.5V000.6V000.7V000.8V001.1V",
"001.2V000.9V000.8V000.6V000.5V000.4V000.4V000.6V001.1V001.0V000.7V000.6V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.6V000.7V000.8V000.7V000.6V",
"000.5V000.4V000.4V000.4V000.3V000.3V000.4V000.5V000.6V000.6V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.6V000.6V000.6V000.5V000.4V",
"000.4V000.4V000.4V000.4V000.3V000.3V000.4V000.5V000.6V000.7V000.6V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.5V000.6V000.7V000.9V001.0V000.7V",
"000.5V000.4V000.4V000.4V000.3V000.3V000.4V000.5V000.7V000.8V000.6V000.6V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.7V000.7V000.7V",
"000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.5V000.6V000.7V000.7V",
"000.6V000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.5V",
"000.4V000.4V000.4V000.4V000.4V000.3V000.4V000.5V000.7V000.8V000.6V000.5V000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.7V000.9V000.7V",
"000.7V000.5V000.4V000.3V000.3V000.3V000.4V000.5V000.7V000.8V000.6V000.6V000.7V000.7V000.5V000.5V000.5V000.4V000.4V000.5V000.6V000.7V000.7V000.6V",
"000.6V000.5V000.4V000.4V000.4V000.3V000.4V000.5V000.6V000.7V000.7V000.6V000.6V000.5V000.6V000.5V000.5V000.5V000.5V000.5V000.6V000.6V000.7V000.7V",
"000.7V000.7V000.7V000.8V000.8V000.7V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.5V",
"000.6V000.6V000.5V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.3V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V",
"000.5V000.5V000.5V000.4V000.4V000.4V000.3V000.3V000.4V000.4V000.5V000.4V000.4V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.5V000.4V000.4V",
"000.4V000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.7V000.9V",
"001.1V001.2V000.7V000.4V000.3V000.3V000.3V000.4V000.6V000.7V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.6V",
"000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.6V000.5V000.5V000.5V000.5V000.4V000.4V000.4V000.3V000.4V000.5V000.6V000.7V000.5V000.4V",
"000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.5V000.5V000.5V000.6V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.4V000.6V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V",
"000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.6V000.7V000.6V000.5V002.7V000.8V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.5V000.5V",
"000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.5V000.6V000.8V000.8V",
"000.8V000.8V000.7V000.5V000.4V000.4V000.4V000.5V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.6V000.9V001.8V",
"002.0V001.1V000.6V000.4V000.4V000.4V000.4V000.5V000.7V000.9V000.7N000.5V000.5V000.4V000.5V000.5V000.5V000.5V000.5V000.6V000.6V000.6V000.6V000.7V",
"000.7V000.5V000.4V000.3V000.3V000.3V000.3V000.4V000.5V000.6V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.6V000.6V000.6V000.6V000.5V",
"000.4V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.6V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.5V000.5V000.5V000.4V000.4V000.3V",
"000.3V000.3V000.2V000.2V000.2V000.2V000.2V000.3V000.4V000.5V000.4V000.4V000.4V000.4V000.4V000.4V000.4V000.5V000.5V000.6V000.5V000.5V000.5V000.4V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.5V000.5V000.5V000.4V000.4V000.4V000.5V000.5V000.5V000.5V000.4V000.4V000.4V000.4V000.4V",
"000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.3V000.4V000.3V000.3V000.3V000.3V000.3V000.4V000.5V000.5V000.5V000.6V000.5V"
)), row.names = 50:179, class = "data.frame")
To split on V or N, give sep a regular expression: sep = "V|N".
But I'd suggest pre-processing. Use gsub or stringr::str_replace_all to turn the N entries into NA before separating, something like df$V10 = gsub("N\\d+(\\.)?\\d+", "VNA", df$V10). This replaces each N plus the number after it (NXXX.X) with VNA, so splitting on V then leaves NA in that position.
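As a rough illustration of the same idea without separate(), here is a base R sketch. It rests on an assumption that is not confirmed in the question: since every string ends in a flag, each V or N is taken to describe the reading just before it. The two input strings are shortened, made-up examples (4 readings instead of 24), and the names v10, values, flags and readings are only for illustration.

```r
# Two shortened example strings in the same layout as V10
# (made up for illustration: 4 readings per row instead of 24).
v10 <- c("000.3V000.4V000.5V000.4V",
         "001.2V001.0N000.5V000.4V")

# Split on either flag; strsplit() drops the empty piece after the last flag.
values <- do.call(rbind, strsplit(v10, "[VN]"))

# Extract the flags themselves, in the same order as the readings.
flags <- do.call(rbind, regmatches(v10, gregexpr("[VN]", v10)))

# Convert to numeric and set readings whose own flag is "N" to NA
# (assumption: a flag refers to the reading that precedes it).
readings <- matrix(as.numeric(values), nrow = nrow(values))
readings[flags == "N"] <- NA
colnames(readings) <- paste("x", seq_len(ncol(readings)), sep = "_")
readings
```

Keeping the flags in their own matrix means the validity information is not lost, and the result can be attached to the other columns with cbind().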
Related
tidyverse and dplyr: Conditional replacement of values in a column based on other column
I want to create a column A4 from A3, reducing the value of A3 by 1 if Total == 63. What am I doing wrong here?
tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1))
The complete code with data is here:
library(tidyverse)
tb1 <-
  structure(
    list(
      A1 = c(16, 11, 16, 18, 20, 19, 16, 18, 20, 15, 17, 19, 19, 19, 16, 19, 16, 15, 19, 19, 16, 18, 18, 19, 19, 18, 20, 18, 19, 19, 19, 19, 17, 19, 17, 16, 18, 19, 16, 18, 17, 19, 19, 20, 17, 16, 18, 16, 15, 19, 19, 17, 20, 18, 16, 19, 19, 15, 17, 17, 19, 19, 16, 17, 18, 19, 17, 19, 17, 15, 19, 16, 17 )
    , A2 = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 )
    , A3 = c(33, 34, 38, 36, 36, 34, 41, 36, 40, 38, 38, 41, 38, 34, 33, 36, 41, 40, 41, 38, 41, 33, 40, 38, 40, 38, 41, 41, 40, 41, 40, 38, 34, 40, 36, 41, 40, 40, 33, 38, 36, 41, 40, 40, 28, 41, 40, 41, 33, 41, 36, 36, 40, 34, 41, 41, 38, 38, 41, 38, 41, 41, 36, 40, 38, 38, 40, 41, 38, 22, 36, 34, 38 )
    , Total = c(57, 53, 62, 62, 64, 61, 65, 62, 68, 61, 63, 68, 65, 61, 57, 63, 65, 63, 68, 65, 65, 59, 66, 65, 67, 64, 69, 67, 67, 68, 67, 65, 59, 67, 61, 65, 66, 67, 57, 64, 61, 68, 67, 68, 53, 65, 66, 65, 56, 68, 63, 61, 68, 60, 65, 68, 65, 61, 66, 63, 68, 68, 60, 65, 64, 65, 65, 68, 63, 45, 63, 58, 63 )
    )
  , class = "data.frame"
  , row.names = c(NA, -73L)
  )
tb1 %>% filter(Total == 63)
#>   A1 A2 A3 Total
#> 1 17  8 38    63
#> 2 19  8 36    63
#> 3 15  8 40    63
#> 4 19  8 36    63
#> 5 17  8 38    63
#> 6 17  8 38    63
#> 7 19  8 36    63
#> 8 17  8 38    63
tb2 <- tb1 %>%
  mutate(A4 = replace(A3, Total == 63, A3-1)) %>%
  mutate(Total = A1 + A2 + A3)
#> Warning: Problem with `mutate()` input `A4`.
#> x number of items to replace is not a multiple of replacement length
#> ℹ Input `A4` is `replace(A3, Total == 63, A3 - 1)`.
tb2 %>% filter(Total == 62)
#>   A1 A2 A3 Total
#> 1 16  8 38    62
#> 2 18  8 36    62
#> 3 18  8 36    62
You are better off using ifelse here:
library(dplyr)
tb1 %>% mutate(A4 = ifelse(Total == 63, A3 - 1, A3))
As for why replace does not work, look at its source code:
replace
function (x, list, values)
{
    x[list] <- values
    x
}
It assigns values to the elements of x selected by list. When you use
tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1))
values has length length(tb1$A3), but list selects only sum(tb1$Total == 63) elements. The two do not match, so R uses just the first values it needs and warns that the number of items to replace is not a multiple of replacement length. The values that end up in A4 are therefore not the ones from the matching rows.
If you want to make replace work, you can try:
tb1 %>% mutate(A4 = replace(A3, Total == 63, A3[Total == 63] - 1))
but again, as I mentioned, it is easier to just use ifelse here.
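To make the length argument concrete, here is a small base R sketch. The vectors A3 and Total below are short made-up stand-ins for the columns of tb1, and hit is just an illustrative name for the logical index.

```r
# Short made-up stand-ins for tb1$A3 and tb1$Total.
A3    <- c(38, 36, 40, 33)
Total <- c(63, 62, 63, 57)
hit   <- Total == 63

# ifelse() works element by element, so test, yes and no always line up.
a4_ifelse <- ifelse(hit, A3 - 1, A3)

# replace() needs exactly one replacement value per TRUE in `hit`,
# so the replacement vector is subset with the same index.
a4_replace <- replace(A3, hit, A3[hit] - 1)

# Both approaches give the same result here.
identical(a4_ifelse, a4_replace)
```

Subsetting the replacement with the same logical index is what keeps each new value aligned with the row it belongs to.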
Having trouble using tidyr pivot_wider to spread data
I have a dataset comparing 15 hybrids, each with 5 separate measurements. I am trying to spread the data into a wider dataset using pivot_wider for a regression analysis, since spread() would not work (probably because of the repeated observations). The dataset I am working with is below: data <- structure(list(hybrid = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15), measurement = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5), value = c(245, 889, 450, 45, 515, 318, 956, 434, 29, 740, 156, 516, 767, 292, 753, 573, 636, 611, 777, 557, 408, 95, 482, 227, 495, 360, 55, 76, 393, 37, 667, 802, 724, 900, 885, 191, 79, 143, 531, 398, 324, 129, 172, 467, 25, 101, 476, 629, 915, 122, 
498, 649, 354, 527, 920, 788, 565, 552, 586, 127, 461, 307, 77, 552, 198, 240, 816, 144, 136, 781, 593, 421, 233, 264, 812, 407, 492, 932, 940, 139, 764, 200, 352, 754, 271, 506, 381, 973, 678, 848, 432, 358, 218, 736, 287, 411, 220, 264, 531, 669, 666, 727, 841, 792, 79, 460, 159, 426, 90, 395, 793, 507, 262, 814, 157, 641, 230, 870, 304, 591, 636, 277, 534, 783, 562, 938, 889, 68, 557, 892, 809, 157, 71, 54, 256, 246, 301, 823, 622, 953, 6, 66, 556, 902, 207, 832, 248, 540, 192, 65, 381, 712, 15, 323, 1, 193, 146, 637, 488, 158, 289, 839, 229, 237, 273, 978, 560, 969, 898, 204, 335, 930, 444, 968, 920, 398, 303, 318, 975, 182, 630, 4, 624, 271, 272, 438, 661, 728, 32, 106, 473, 465, 498, 33, 189, 918, 704, 605, 867, 240, 833, 497, 514, 241, 860, 228, 643, 791, 4, 898, 574, 225, 339, 365, 387, 548, 88, 604, 283)), class = "data.frame", row.names = c(NA, -219L)) I'm new to the pivot_wider function, so when I run my code, I get an error: data%>% pivot_wider(cols = -hybrid, names_to = c("1","2","3","4","5")) Error in pivot_wider(., cols = -hybrid, names_to = c("1", "2", "3", "4", : unused arguments (cols = -hybrid, names_to = c("1", "2", "3", "4", "5")) How can I spread this data so that I have 5 columns? Hybrid, 1, 2, 3, 4, 5 (with the values under the columns entitled 1:5).
My guess is that you are looking for this. cols and names_to are arguments of pivot_longer, which is why pivot_wider rejects them as unused; pivot_wider takes names_from and values_from instead. Because each hybrid/measurement pair occurs more than once, values_fn = sum collapses the repeats into a single cell:

library(tidyr)
pivot_wider(data, id_cols = hybrid, names_from = measurement, values_from = "value", values_fn = sum)
# # A tibble: 15 x 6
#    hybrid   `1`   `2`   `3`   `4`   `5`
#     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1      1  1584   878  1419  1412  1812
#  2      2  1820  1742   804   910   506
#  3      3  2193  1976   753   851   664
#  4      4  1206  1535  1530  2273  1265
#  5      5   845   990  1096  1795  1309
#  6      6  1831  1843  1306  1158  2499
#  7      7  1008  1434  1015  2062  1712
#  8      8  1045  1278  1583  1028  1765
#  9      9   913  1317  1500   957  1449
# 10     10  1037   556  1746  1025  1665
# 11     11  1620   638  1050   340  1283
# 12     12  1357  1488  2427  1469  2332
# 13     13  1019  1787   899  1371   866
# 14     14  1436  1140  2176  1570  1615
# 15     15  1662  1476   929  1023   887
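As a cross-check that needs no packages, the two possible readings of "5 columns" can be sketched in base R. The data frame toy and the names wide_sum and wide_rep are made up for illustration and are not part of the question's data. tapply collapses repeated hybrid/measurement pairs with sum, while adding a replicate counter before reshape keeps every observation on its own row, which may suit a regression better than totals:

```r
# Hypothetical toy data: 2 hybrids, 2 measurements, 2 replicates each
toy <- data.frame(
  hybrid      = c(1, 1, 1, 1, 2, 2, 2, 2),
  measurement = c(1, 1, 2, 2, 1, 1, 2, 2),
  value       = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Option A: one row per hybrid, replicates summed (same idea as values_fn = sum)
wide_sum <- tapply(
  toy$value,
  list(hybrid = toy$hybrid, measurement = toy$measurement),
  sum
)

# Option B: number the replicates within each hybrid/measurement pair,
# then widen so that each replicate keeps its own row
toy$rep <- ave(toy$value, toy$hybrid, toy$measurement, FUN = seq_along)
wide_rep <- reshape(
  toy,
  idvar     = c("hybrid", "rep"),
  timevar   = "measurement",
  direction = "wide"
)

print(wide_sum)
print(wide_rep)
```

Option A yields a hybrid-by-measurement matrix; Option B yields columns hybrid, rep, value.1 and value.2, with one row per replicate.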
Using dcast from data.table. Naming value.var explicitly avoids the message dcast prints when it has to guess the value column. Note that setDT converts data to a data.table in place.

library(data.table)
dcast(setDT(data), hybrid ~ measurement, sum, value.var = "value")
#     hybrid    1    2    3    4    5
#  1:      1 1584  878 1419 1412 1812
#  2:      2 1820 1742  804  910  506
#  3:      3 2193 1976  753  851  664
#  4:      4 1206 1535 1530 2273 1265
#  5:      5  845  990 1096 1795 1309
#  6:      6 1831 1843 1306 1158 2499
#  7:      7 1008 1434 1015 2062 1712
#  8:      8 1045 1278 1583 1028 1765
#  9:      9  913 1317 1500  957 1449
# 10:     10 1037  556 1746 1025 1665
# 11:     11 1620  638 1050  340 1283
# 12:     12 1357 1488 2427 1469 2332
# 13:     13 1019 1787  899 1371  866
# 14:     14 1436 1140 2176 1570 1615
# 15:     15 1662 1476  929 1023  887
Calculate row similarity percentage pairwise and add it as a new column
I have a data frame like the sample below. I would like to find similar (not duplicate) rows and calculate their pairwise similarity. I found this solution, but I would like to keep all my columns and add the similarity percentage as a new variable. My aim is to find the records with the highest similarity percentage. How could I do it?

Sample data set:

df <- tibble::tribble(
  ~date, ~user_id, ~Station_id, ~location_id, ~ind_id, ~start_hour, ~start_minute, ~start_second, ~end_hour, ~end_minute, ~end_second, ~duration_min,
  20191015, 19900234, 242, 2, "ac", 7, 25, 0, 7, 30, 59, 6,
  20191015, 19900234, 242, 2, "ac", 7, 31, 0, 7, 32, 59, 2,
  20191015, 19900234, 242, 2, "ac", 7, 33, 0, 7, 38, 59, 6,
  20191015, 19900234, 242, 2, "ac", 7, 39, 0, 7, 40, 59, 2,
  20191015, 19900234, 242, 2, "ac", 7, 41, 0, 7, 43, 59, 3,
  20191015, 19900234, 242, 2, "ac", 7, 44, 0, 7, 45, 59, 2,
  20191015, 19900234, 242, 2, "ac", 7, 47, 0, 7, 59, 59, 13,
  20191015, 19900234, 242, 2, "ad", 7, 47, 0, 7, 59, 59, 13,
  20191015, 19900234, 242, 2, "ac", 8, 5, 0, 8, 6, 59, 2,
  20191015, 19900234, 242, 2, "ad", 8, 5, 0, 8, 6, 59, 2,
  20191015, 19900234, 242, 2, "ac", 8, 7, 0, 8, 8, 59, 2,
  20191015, 19900234, 242, 2, "ad", 8, 7, 0, 8, 8, 59, 2,
  20191015, 19900234, 242, 2, "ac", 16, 26, 0, 16, 55, 59, 30,
  20191015, 19900234, 242, 2, "ad", 16, 26, 0, 16, 55, 59, 30,
  20191015, 19900234, 242, 2, "ad", 17, 5, 0, 17, 6, 59, 2,
  20191015, 19900234, 242, 2, "ac", 17, 5, 0, 17, 23, 59, 19,
  20191015, 19900234, 242, 2, "ad", 17, 7, 0, 17, 15, 59, 9,
  20191015, 19900234, 242, 2, "ad", 17, 16, 0, 17, 22, 59, 7,
  20191015, 19900234, 264, 2, "ac", 17, 24, 0, 17, 35, 59, 12,
  20191015, 19900234, 264, 2, "ad", 17, 25, 0, 17, 35, 59, 11,
  20191016, 19900234, 242, 1, "ac", 7, 12, 0, 7, 14, 59, 3,
  20191016, 19900234, 242, 1, "ad", 7, 13, 0, 7, 13, 59, 1,
  20191016, 19900234, 242, 1, "ac", 17, 45, 0, 17, 49, 59, 5,
  20191016, 19900234, 242, 1, "ad", 17, 46, 0, 17, 48, 59, 3,
  20191016, 19900234, 242, 2, "ad", 7, 14, 0, 8, 0, 59, 47,
  20191016, 19900234, 242, 2, "ac", 7, 15, 0, 8, 0, 59, 47
)

Function for comparing rows:

row_cf <- function(x, y, df){
  sum(df[x,] == df[y,])/ncol(df)
}

Function output:

library(dplyr)

# 1) Create all possible row combinations
# 2) Rename
# 3) Run through each row
# 4) Calculate similarity
expand.grid(1:nrow(df), 1:nrow(df)) %>%
  rename(row_1 = Var1, row_2 = Var2) %>%
  rowwise() %>%
  mutate(similarity = row_cf(row_1, row_2, df))

# A tibble: 676 x 3
   row_1 row_2 similarity
   <int> <int>      <dbl>
 1     1     1      1
 2     2     1      0.75
 3     3     1      0.833
 4     4     1      0.75
 5     5     1      0.75
 6     6     1      0.75
 7     7     1      0.75
 8     8     1      0.667
 9     9     1      0.583
10    10     1      0.5

Edit: I would like to find similar rows in the data like here
Take your "function output" and call it sim. Drop the self-comparisons, then keep the row with the highest similarity within each row_1 group:

sim = sim %>%
  filter(row_1 != row_2) %>%
  group_by(row_1) %>%
  slice(which.max(similarity))

Then you can add these to your original data:

df %>%
  mutate(row_1 = 1:n()) %>%
  left_join(sim, by = "row_1")

The row_2 column gives the row number of the most similar row, and similarity gives its similarity score. When several rows tie for the highest score, which.max keeps only the first of them. (You may want to improve these column names.)
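To see the idea end to end without dplyr, here is a minimal base R sketch on a made-up four-row data frame. The names toy, sim_mat, most_similar_row and similarity are illustrative assumptions, not columns from the question. It builds the full pairwise similarity matrix, blanks the diagonal so that a row cannot match itself, and attaches the best match to every original row:

```r
# Hypothetical toy data with mixed column types
toy <- data.frame(
  station = c(242, 242, 264, 264),
  ind_id  = c("ac", "ad", "ac", "ac"),
  hour    = c(7, 7, 17, 17),
  minute  = c(47, 47, 24, 25),
  stringsAsFactors = FALSE
)

n <- nrow(toy)

# sim_mat[i, j] = share of columns on which rows i and j agree
row_similarity <- function(i, j) mean(toy[i, ] == toy[j, ])
sim_mat <- outer(seq_len(n), seq_len(n), Vectorize(row_similarity))

# Ignore self-comparisons
diag(sim_mat) <- NA

# For each row: index of its most similar row (first one on ties) and the score
toy$most_similar_row <- apply(sim_mat, 1, which.max)
toy$similarity       <- apply(sim_mat, 1, max, na.rm = TRUE)

print(toy)
```

All original columns are kept, and sorting by similarity in decreasing order then surfaces the closest pairs first. The matrix holds n * n comparisons, so this sketch is only meant for data of modest size.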
cut2 splits into unequal buckets
I am currently doing some data manipulation and have been searching for a way to create deciles with equal number of observations in each group. I ran into the Hmisc package and the cut2 function and was under the impression it should split the data into 10 buckets with equal numbers of observations in each by specifying g=10. However the output from this function has been quite a bit off. Am I using cut2 incorrectly? The code I am using: library(Hmisc) testdata <- data.frame(rating= c(8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 6, 8, 8, 8, 8, 6, 8, 6, 8, 4, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 6, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 6, 8, 8, 6, 4, 8, 8, 8, 8, 8, 6, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 2, 8, 6, 8, 8, 8, 6, 8, 8, 6, 6, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 6, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 6, 8, 8, 8, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 8, 8, 8, 8, 8, 6, 8, 8, 8, 6) ,age=c(0, 0, 0, 0, 3, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 
25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 30, 30, 30, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 41, 41, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 48, 48, 48, 54, 54, 54, 56, 56, 58, 59, 59, 59, 59, 60, 60, 60, 61, 66, 66, 70, 72)) cutcutcut <- cut2(testdata$age,g=10) testtable <- table(cutcutcut) and the output of unequal observations in each bucket testtable [ 0,13) [13,15) [15,20) [20,24) [24,26) [26,28) [28,33) [33,40) [40,46) [46,72] 46 16 35 28 33 35 26 31 31 28
The answer to your question lies in looking at the distribution of your data:

table(testdata$age)
#  0  3  4  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#  4  1  4  6  4  3  4  2  2 16  9  7  5 10  6  7  7 13  4  2  9
# 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
# 23 10 18 17  8  5  3  2  8  2  2  5  9  5  5  3  2  8  7  3  6
# 45 46 47 48 54 56 58 59 60 61 66 70 72
#  5  4  3  3  3  2  1  4  3  1  2  1  1

We see that some ages have a large number of individuals at that age (e.g. there are 16 individuals with age 12 and 23 individuals with age 24). Since the cutting algorithm needs to put all individuals with the exact same age into the same bucket, this may lead to some imbalances in the buckets.

Since there are 309 total observations in your data and you seek 10 buckets, you would ideally want 31 observations in 9 of the buckets and 30 in the last. Right now the last bucket is defined as [46, 72], which contains 28 elements (too few). If you expanded this to [45, 72], it would contain 33 elements (too many). There is no way to split up the data to get exactly 30 or 31 observations in this last bucket because there are 5 elements with value 45.
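The arithmetic above can be checked in base R by rebuilding the age vector from the frequency table. The second half of the sketch is an assumption about what might be acceptable rather than something cut2 does: if identical ages are allowed to land in different buckets, ranking with ties broken by position gives groups of 30 or 31. The names ages, counts and decile are illustrative.

```r
# Ages and their frequencies, copied from the table(testdata$age) output
ages <- c(0, 3, 4, 6:48, 54, 56, 58, 59, 60, 61, 66, 70, 72)
counts <- c(
  4, 1, 4, 6, 4, 3, 4, 2, 2, 16, 9, 7, 5, 10, 6, 7, 7, 13, 4, 2, 9,   # ages 0 to 23
  23, 10, 18, 17, 8, 5, 3, 2, 8, 2, 2, 5, 9, 5, 5, 3, 2, 8, 7, 3, 6,  # ages 24 to 44
  5, 4, 3, 3, 3, 2, 1, 4, 3, 1, 2, 1, 1                               # ages 45 to 72
)
age <- rep(ages, times = counts)

n          <- length(age)     # 309 observations
last_if_46 <- sum(age >= 46)  # 28: last bucket defined as [46, 72]
last_if_45 <- sum(age >= 45)  # 33: last bucket defined as [45, 72]

# Assumption: splitting tied ages across buckets is acceptable.
# Rank every observation (ties broken by position), then cut the ranks into 10 groups.
decile <- ceiling(rank(age, ties.method = "first") * 10 / n)
bucket_sizes <- table(decile)

print(c(n = n, last_if_46 = last_if_46, last_if_45 = last_if_45))
print(bucket_sizes)
```

The trade-off is that people of the same age can end up in different deciles, so the bucket boundaries no longer correspond to clean age intervals.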
R - Keep reading line if 7 or more numbers are >= 10
I have a file foo.txt that looks like this:

7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6

I want to read the numbers in sets of 15, moving to the right one number at a time:

7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5

then

3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22

and so on. If 7 or more of those 15 numbers are >= 10, keep them in a growing object that ends when the condition is no longer met. So the first set to keep would be

3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13

because 7 of those 15 numbers are >= 10 (22, 18, 14, 23, 16, 18 and 13).

The output file would look like this:

3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3

So far I'm stuck at getting sets of 15 numbers, and I don't know how to express the condition "7 or more must be >= 10":

qual <- readLines("foo.txt", 1)
separados <- unlist(strsplit(qual, ", "))
for (i in 1:length(qual)) {
separados[(i):(i + 14)] -> numbers

I don't mind the language as long as it does the work.
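I've added two ='s to Vlo's solution and made this for you. Does this answer your question?

foo.txt <- c(7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6)
# install.packages("zoo")
library(zoo)
# One flag per window of 15: TRUE when at least 7 of its values are >= 10
bar <- rollapply(foo.txt, 15, function(x) sum(x >= 10) >= 7)
# bar[i] describes positions i to i + 14, so expand each TRUE to the positions
# its window covers; subsetting with foo.txt[bar] directly would recycle the
# shorter flag vector and keep only window start positions
keep <- sort(unique(unlist(lapply(which(bar), function(i) i:(i + 14)))))
(product <- foo.txt[keep])
 [1]  3  3  3  6  7  5  5 22 18 14 23 16 18  5 13 34 24 17 50 30 42 35 29 27
[25] 52 35 44 52 36 39 25 40 50 52 40  2 52 52 31 35 30 19 32 46 50 43 36 15
[49] 21 16 36 25  7  3  5  7  3  3  3  3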
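For readers who would rather not install zoo, here is a dependency-free sketch of the same sliding-window rule in base R. The parameter names width, min_hits and cutoff are illustrative. It counts the values at or above the cutoff in every window of 15, then keeps every position covered by at least one qualifying window, which is how the expected output in the question is built:

```r
x <- c(7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18,
       5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25,
       40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21,
       16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6)

width    <- 15  # window length
min_hits <- 7   # minimum number of values at or above the cutoff
cutoff   <- 10

n_windows <- length(x) - width + 1

# hits[i] = number of values >= cutoff in the window x[i:(i + width - 1)]
hits <- vapply(
  seq_len(n_windows),
  function(i) sum(x[i:(i + width - 1)] >= cutoff),
  integer(1)
)

# Start positions of the windows that satisfy the rule
starts <- which(hits >= min_hits)

# Every position covered by at least one qualifying window
keep <- sort(unique(unlist(lapply(starts, function(i) i:(i + width - 1)))))
kept <- x[keep]

cat(paste(kept, collapse = ", "), "\n")
```

On this input the qualifying windows start at positions 9 through 54, so positions 9 through 68 are kept. If the data contained several separate runs, keep would hold all of them; splitting on gaps in keep, for example with cumsum(c(1, diff(keep) != 1)), would separate the runs.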
I would do it in Python (you said you don't mind the language):

array = []
with open("foo.txt", "r") as f:
    for line in f:
        for num in line.strip().split(', '):
            array.append(int(num))

result = []
growing = False
while len(array) >= 15:
    # Count how many of the current 15 numbers are >= 10
    if sum(1 for x in array[:15] if x >= 10) >= 7:
        if growing:
            # The previous window also qualified: only the newest number is new
            result.append(array[14])
        else:
            # A new run starts: keep the whole window
            result.extend(array[:15])
            growing = True
    else:
        growing = False
    # Slide the window one number to the right
    del array[0]
print(str(result)[1:-1])

Short explanation: the for loop reads the lines of the file, strips the end of line, splits on ", " and appends each number to array. The while loop checks the first 15 numbers in array; if at least 7 of them are >= 10, it appends to result either all 15 numbers (when a new run starts) or just the newest one, array[14] (when the previous window also qualified). At the end of each iteration it removes the first number in array so that the loop continues with the next 15 numbers.