I have the following vector:
vec <- c(28, 44, 45, 46, 47, 48, 61, 62, 70, 71, 82, 83, 104, 105, 111, 115, 125, 136, 137, 138, 146, 147, 158, 159, 160, 185, 186, 187, 188, 189, 190, 191, 192, 193, 209, 263, 264, 265, 266, 267, 268, 280, 283, 284, 308, 309, 318, 319, 324, 333, 334, 335, 347, 354)
Now I would like to count the runs of consecutive integers in the vector that are at least two elements long.
So here this would be valid for the following cases:
44, 45, 46, 47, 48
61, 62
70, 71
82, 83
104, 105
136, 137, 138
146, 147
158, 159, 160
185, 186, 187, 188, 189, 190, 191, 192, 193
263, 264, 265, 266, 267, 268
283, 284
308, 309
318, 319
333, 334, 335
So there are 14 runs of consecutive numbers, and I just need the integer 14 as output.
Does anybody have an idea how to do that?
We can use the rle and diff functions:
a <- rle(diff(vec))
sum(a$values==1)
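To see why this works, here is a small sketch on a toy vector (x is made up for illustration, it is not the vector from the question). diff turns every run of consecutive integers into a run of 1s, and rle collapses each such run into a single entry, so counting the entries equal to 1 counts the runs.

```r
# Toy input: two runs of consecutive integers, (1, 2, 3) and (9, 10)
x <- c(1, 2, 3, 7, 9, 10)

d <- diff(x)   # differences between neighbours: 1 1 4 2 1
r <- rle(d)    # run-length encoding: values 1 4 2 1, lengths 2 1 1 1

n_runs <- sum(r$values == 1)                # 2
run_sizes <- r$lengths[r$values == 1] + 1   # 3 2 (elements per run)
```

Note that this assumes the input is sorted and has no duplicates; a run of k consecutive numbers shows up as k - 1 ones in the differences, hence the + 1.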
diff and split will help. This returns the runs themselves; the length of the result is the count you are after.
vec2 <- split(vec, cumsum(c(1, diff(vec) != 1)))
vec2[lengths(vec2) > 1]
$`2`
[1] 44 45 46 47 48
$`3`
[1] 61 62
$`4`
[1] 70 71
$`5`
[1] 82 83
$`6`
[1] 104 105
$`10`
[1] 136 137 138
$`11`
[1] 146 147
$`12`
[1] 158 159 160
$`13`
[1] 185 186 187 188 189 190 191 192 193
$`15`
[1] 263 264 265 266 267 268
$`17`
[1] 283 284
$`18`
[1] 308 309
$`19`
[1] 318 319
$`21`
[1] 333 334 335
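Brute force:
vec <- sort(vec)
nconsecutive <- 0   # number of runs found so far
consecutive <- 0    # consecutive steps seen in the current run
for (i in seq_len(length(vec) - 1)) {
  if ((vec[i + 1] - vec[i]) == 1) {
    consecutive <- consecutive + 1
  } else {
    # A run just ended: count it if it had at least one consecutive step
    if (consecutive > 0) {
      nconsecutive <- nconsecutive + 1
    }
    # Reset the step counter
    consecutive <- 0
  }
}
# Count a run that reaches the end of the vector
if (consecutive > 0) nconsecutive <- nconsecutive + 1
nconsecutive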
Here's another base R one-liner, using tapply:
sum(tapply(vec, cumsum(c(TRUE, diff(vec) != 1)), length) > 1)
#[1] 14
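If the minimum run length should be adjustable, the same cumsum/diff grouping can be wrapped in a small function. This is only a sketch: the name count_runs and the parameter min_len are made up here, and it assumes the input is sorted and free of duplicates.

```r
# Hypothetical helper: count runs of consecutive integers of length >= min_len.
# Assumes x is sorted in increasing order with no duplicates.
count_runs <- function(x, min_len = 2) {
  if (length(x) == 0) return(0L)
  # A new run starts at the first element and wherever the step is not 1
  run_id <- cumsum(c(TRUE, diff(x) != 1))
  # tabulate() gives the size of each run; keep those that are long enough
  sum(tabulate(run_id) >= min_len)
}

count_runs(c(1, 2, 3, 7, 9, 10))               # 2
count_runs(c(1, 2, 3, 7, 9, 10), min_len = 3)  # 1
```

Called on the vector from the question with the default min_len, this should agree with the one-liner above.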
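library(dplyr)  # tibble(), %>%, rowwise(), mutate()
library(xml2)   # read_xml(), xml_find_all(), as_list()
# Sample data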
df <- tibble(id=1:2, xml_str=c("<?xml version='1.0'?><!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.1//EN' 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'><svg version='1.1' xmlns='http://www.w3.org/2000/svg'>'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M171, 160 L171, 160, 168, 159, 164, 159, 163, 159, 162, 159, 161, 159, 161, 158, 162, 158, 162, 157, 163, 156, 165, 156'/>'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M172, 226 L172, 226, 171, 213, 170, 212, 171, 212, 172, 212, 173, 212, 173, 211, 172, 211, 171, 211, 171, 212, 171, 215'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M153, 94 L153, 94, 150, 90, 150, 89, 150, 88, 150, 87, 150, 86, 150, 85, 150, 84, 150, 82, 150, 81, 150, 80, 150, 79'/>'/>'/>'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M346, 84 L346, 84, 346, 79, 347, 78, 347, 77, 348, 77, 348, 76, 348, 75, 348, 76, 348, 77, 349, 77, 348, 78'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M314, 67 L314, 67, 311, 76, 309, 76, 308, 77, 307, 77, 307, 76, 306, 76, 305, 76, 305, 77, 306, 77, 307, 77, 306, 77, 305, 79, 304, 80'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M313, 57 L313, 57, 321, 56, 321, 57, 321, 58'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M332, 58 L332, 58, 332, 57, 331, 57, 333, 57, 334, 57, 335, 57, 336, 58, 337, 58, 338, 58, 339, 58, 340, 58, 341, 58, 341, 59, 340, 60, 339, 60, 338, 60, 337, 60, 336, 60, 335, 60, 334, 60, 333, 60, 332, 60, 331, 60, 331, 59, 333, 58, 334, 58'/></svg>", "<?xml version='1.0'?><!DOCTYPE svg PUBLIC '-//W3C//DTD SVG 1.1//EN' 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'><svg version='1.1' xmlns='http://www.w3.org/2000/svg'>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 80 L315, 80, 321, 79, 320, 79, 318, 79, 317, 79'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M334, 83 L334, 83, 334, 82'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 80 L315, 80, 315, 82, 315, 83, 
315, 84, 315, 85'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 72 L315, 72'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 69 L315, 69, 315, 70'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M332, 66 L332, 66, 332, 67'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 56 L315, 56'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 66 L315, 66, 315, 67'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 72 L315, 72'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M332, 72 L332, 72, 333, 75'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M315, 72 L315, 72'/>\n<path fill='none' stroke='#ff0000' stroke-width='5' d='M334, 73 L334, 73, 333, 73'/></svg>"))
df <- df %>%
  rowwise() %>%
  mutate(nodes = (xml_str %>% read_xml() %>% xml_find_all(., "//@d") %>% as_list()))
With the data frame above, I want to extract the d attribute of every path element in each XML string and store the results as a list in the same data frame, but I get this error:
Column nodes must be length 1 (the group size), not 7
The pipeline inside the mutate call does return a single list.
I can leave out rowwise(), but then the error simply expects length 2 instead of 1.
What am I missing here?
It's not exactly the way you're doing it, but you can use str_extract_all with a regex to pull out the d strings, giving one character vector per row:
ans <-
df %>%
dplyr::mutate(dnodes = stringr::str_extract_all(xml_str, "(?<=[d]=')[^']+(?='\\/)"))
ans$dnodes
# [[1]]
# [1] "M171, 160 L171, 160, 168, 159, 164, 159, 163, 159, 162, 159, 161, 159, 161, 158, 162, 158, 162, 157, 163, 156, 165, 156"
# [2] "M172, 226 L172, 226, 171, 213, 170, 212, 171, 212, 172, 212, 173, 212, 173, 211, 172, 211, 171, 211, 171, 212, 171, 215"
# [3] "M153, 94 L153, 94, 150, 90, 150, 89, 150, 88, 150, 87, 150, 86, 150, 85, 150, 84, 150, 82, 150, 81, 150, 80, 150, 79"
# [4] "M346, 84 L346, 84, 346, 79, 347, 78, 347, 77, 348, 77, 348, 76, 348, 75, 348, 76, 348, 77, 349, 77, 348, 78"
# [5] "M314, 67 L314, 67, 311, 76, 309, 76, 308, 77, 307, 77, 307, 76, 306, 76, 305, 76, 305, 77, 306, 77, 307, 77, 306, 77, 305, 79, 304, 80"
# [6] "M313, 57 L313, 57, 321, 56, 321, 57, 321, 58"
# [7] "M332, 58 L332, 58, 332, 57, 331, 57, 333, 57, 334, 57, 335, 57, 336, 58, 337, 58, 338, 58, 339, 58, 340, 58, 341, 58, 341, 59, 340, 60, 339, 60, 338, 60, 337, 60, 336, 60, 335, 60, 334, 60, 333, 60, 332, 60, 331, 60, 331, 59, 333, 58, 334, 58"
# [[2]]
# [1] "M315, 80 L315, 80, 321, 79, 320, 79, 318, 79, 317, 79" "M334, 83 L334, 83, 334, 82"
# [3] "M315, 80 L315, 80, 315, 82, 315, 83, 315, 84, 315, 85" "M315, 72 L315, 72"
# [5] "M315, 69 L315, 69, 315, 70" "M332, 66 L332, 66, 332, 67"
# [7] "M315, 56 L315, 56" "M315, 66 L315, 66, 315, 67"
# [9] "M315, 72 L315, 72" "M332, 72 L332, 72, 333, 75"
# [11] "M315, 72 L315, 72" "M334, 73 L334, 73, 333, 73"
You can then split each row's strings on ", " to get a single character vector of tokens per row:
ans <-
df %>%
dplyr::mutate(dnodes = stringr::str_extract_all(xml_str, "(?<=[d]=')[^']+(?='\\/)")) %>%
dplyr::mutate(dnodes = purrr::map(dnodes, ~unlist(strsplit(paste(.x, collapse=", "), ", "))))
ans$dnodes
# [[1]]
# [1] "M171" "160 L171" "160" "168" "159" "164" "159" "163" "159" "162"
# [11] "159" "161" "159" "161" "158" "162" "158" "162" "157" "163"
# [21] "156" "165" "156" "M172" "226 L172" "226" "171" "213" "170" "212"
# [31] "171" "212" "172" "212" "173" "212" "173" "211" "172" "211"
# [41] "171" "211" "171" "212" "171" "215" "M153" "94 L153" "94" "150"
# [51] "90" "150" "89" "150" "88" "150" "87" "150" "86" "150"
# [61] "85" "150" "84" "150" "82" "150" "81" "150" "80" "150"
# etc
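For reference, the same lookbehind/lookahead idea can be sketched in base R with gregexpr and regmatches. The toy string below is made up; the pattern is a simplified form of the one above (d=' on the left, '/ on the right) and would also match any other attribute whose name ends in d, so treat it as an illustration rather than a robust parser.

```r
# Toy SVG fragment with two path elements (not the data from the question)
x <- "<svg><path fill='none' d='M1, 2 L1, 2, 3, 4'/><path fill='none' d='M5, 6 L5, 6'/></svg>"

# (?<=d=') : preceded by d='   [^']+ : the attribute value   (?='/) : followed by '/
pattern <- "(?<=d=')[^']+(?='/)"
d_strings <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

# Pull the numbers out of one d string as a numeric vector
coords <- as.numeric(unlist(regmatches(d_strings[2], gregexpr("[0-9]+", d_strings[2]))))
```

Extracting the digits directly avoids tokens such as "160 L171" that appear when splitting on ", " alone.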
Does this do what you want? I usually wrap the right side of my mutate(name = right_side) in list() to accomplish this.
df <- df %>%
  rowwise() %>%
  mutate(nodes = list(xml_str %>% read_xml() %>% xml_find_all(., "//@d")))
class(df$nodes)
# [1] "list"
class(df$nodes[[1]])
# [1] "xml_nodeset"
Not sure if you want the xml_nodeset objects or perhaps CPak's solution with actual strings is better for you.
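The underlying issue is that each row has to yield exactly one value for the new column, so a result with several elements must be wrapped in a list. A minimal sketch of the same idea without dplyr, assuming the xml2 package is installed and that the XPath //@d (all d attributes) is what was intended; the toy data frame is made up:

```r
library(xml2)

# Toy data: two small SVG documents with 2 and 1 path elements
toy <- data.frame(
  id = 1:2,
  xml_str = c(
    "<svg xmlns='http://www.w3.org/2000/svg'><path d='M1, 2 L1, 2, 3, 4'/><path d='M5, 6 L5, 6'/></svg>",
    "<svg xmlns='http://www.w3.org/2000/svg'><path d='M7, 8 L7, 8'/></svg>"
  ),
  stringsAsFactors = FALSE
)

# One list element per row, each holding a character vector of d values
toy$d <- lapply(toy$xml_str, function(s) {
  xml_text(xml_find_all(read_xml(s), "//@d"))
})

lengths(toy$d)   # number of path elements found in each row
```

Because lapply always returns one list element per input string, the result has the right length for a list-column regardless of how many paths each document contains.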
I would like to turn a data.frame like this one:
dat = data.frame (
ConditionA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
ConditionB = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5),
X = c(460, 382, 468, 618, 421, 518, 655, 656, 621, 552, 750, 725, 337, 328, 342, 549, 569, 523, 469, 429),
Y = c(437, 305, 498, 620, 381, 543, 214, 181, 183, 387, 439, 351, 327, 268, 276, 178, 375, 393, 312, 302)
)
into a list of lists like this (or similar):
lst = list(
list(
c(460, 382, 468, 618),
c(437, 305, 498, 620)
),
list(
c(421, 518, 655, 656, 621),
c(381, 543, 214, 181, 183)
),
list(
c(552, 750, 725),
c(387, 439, 351)
),
list(
c(337, 328, 342, 549),
c(327, 268, 276, 178)
),
list(
c(569, 523, 469, 429),
c(375, 393, 312, 302)
)
)
> lst
[[1]]
[[1]][[1]]
[1] 460 382 468 618
[[1]][[2]]
[1] 437 305 498 620
[[2]]
[[2]][[1]]
[1] 421 518 655 656 621
[[2]][[2]]
[1] 381 543 214 181 183
[[3]]
[[3]][[1]]
[1] 552 750 725
[[3]][[2]]
[1] 387 439 351
. . .
What would be the most efficient way to make such a conversion?
We can split on the first and second columns, use drop = TRUE to remove the combinations with no rows, and convert each piece to a list:
lapply(split(dat[-(1:2)], dat[1:2], drop = TRUE), as.list)
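A small sketch of what this produces, on a toy data frame (made up here, with shortened column names A and B). The list elements are named after the group combinations, and the inner lists keep the column names; unname can strip both if the bare nested structure from the question is wanted.

```r
# Toy data: two groups, (A = 1, B = 1) and (A = 1, B = 2)
toy <- data.frame(
  A = c(1, 1, 1, 1, 1),
  B = c(1, 1, 2, 2, 2),
  X = c(460, 382, 421, 518, 655),
  Y = c(437, 305, 381, 543, 214)
)

res <- lapply(split(toy[-(1:2)], toy[1:2], drop = TRUE), as.list)
names(res)        # "1.1" "1.2"
res[["1.2"]]$X    # 421 518 655

# Drop the group names and the column names to get plain nested lists
plain <- unname(lapply(res, unname))
```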
Or, using the tidyverse:
library(tidyverse)
dat %>%
  group_by(ConditionA, ConditionB) %>%
  nest() %>%
  mutate(data = map(data, as.list)) %>%
  pull(data)
Maybe this, using data.table:
Data:
dat = data.frame (
ConditionA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
ConditionB = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5),
X = c(460, 382, 468, 618, 421, 518, 655, 656, 621, 552, 750, 725, 337, 328, 342, 549, 569, 523, 469, 429),
Y = c(437, 305, 498, 620, 381, 543, 214, 181, 183, 387, 439, 351, 327, 268, 276, 178, 375, 393, 312, 302)
)
Code:
library('data.table')
setDT(dat)
dat[, list(list(as.list(.SD))),by = .(ConditionA, ConditionB)][, V1]
or this
dat[, list(list(list(.SD))),by = .(ConditionA, ConditionB)][, V1]
Another base R option, using by (run this on the original data.frame, before setDT):
c(by(dat[3:4], dat[1:2], as.list))
[[1]]
[[1]]$X
[1] 460 382 468 618
[[1]]$Y
[1] 437 305 498 620
[[2]]
[[2]]$X
[1] 421 518 655 656 621
[[2]]$Y
[1] 381 543 214 181 183
[[3]]
[[3]]$X
[1] 552 750 725
[[3]]$Y
[1] 387 439 351
. . . .
I currently have data spread out across multiple columns in R. I am looking for a way to combine these columns into a single column that holds, for each row, a vector of that row's values.
Is there a function to do this?
For example, the data looks like this:
library(dplyr)
DF <- data.frame(id = LETTERS, replicate(26, sample(1001, 26)), Class = sample(c("Yes", "No"), 26, TRUE))
df <- select(DF, id, X1, X2, X23, Class)
How can I merge the columns X1, X2 and X23 into a single numeric vector for each id?
Like this?
library(reshape2)
melt(df) %>% dcast(id ~ ., fun.aggregate = list)
Using id, Class as id variables
id .
1 A 422, 74, 439
2 B 879, 443, 923
3 C 575, 901, 749
4 D 813, 747, 21
5 E 438, 526, 675
6 F 863, 562, 474
7 G 103, 713, 918
8 H 585, 294, 525
9 I 115, 76, 175
10 J 953, 379, 926
11 K 679, 439, 377
12 L 816, 624, 538
13 M 678, 226, 142
14 N 667, 369, 586
15 O 795, 422, 248
16 P 165, 22, 612
17 Q 294, 476, 746
18 R 968, 368, 290
19 S 238, 481, 980
20 T 921, 482, 741
21 U 550, 15, 296
22 V 121, 358, 625
23 W 213, 313, 242
24 X 92, 77, 58
25 Y 607, 936, 350
26 Z 660, 42, 275
A note though: I do not know your final use case, but this strikes me as something you probably do not want to have. It is often more advisable to stick to tidy data, see e.g. https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
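If a list-column is nevertheless what you need, a base R alternative is to build it with Map. This is only a sketch on a made-up three-row data frame (the column name merged is hypothetical), not the random data from the question.

```r
# Toy data with the three columns to be merged
toy <- data.frame(
  id  = c("A", "B", "C"),
  X1  = c(422, 879, 575),
  X2  = c(74, 443, 901),
  X23 = c(439, 923, 749)
)

# Map() walks the columns in parallel and combines each row's values with c()
toy$merged <- Map(c, toy$X1, toy$X2, toy$X23)

toy$merged[[1]]   # 422 74 439
```

Each element of toy$merged is a plain numeric vector, so it can be passed directly to functions such as mean or sum via sapply.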