I'm essentially given two samples, say:
X = c(14, 5, 2, 8 , 9, 10)
Y = c(7, 3, 4, 13, 11, 12)
If I sort and pool the two samples into, say, Z, what function can I use to count the number of runs in Z?
Z = c(2, 3, 4, 7, 8, 9, 10, 11, 12, 13)
so, labelling each value by the sample it came from, Z is now
Z = (X, Y, Y, Y, X, X, X, Y, Y, Y)
How do I count the number of X-runs, which in this case is 2, of sizes 1 and 3?
I've tried the rle() function, but I don't understand how to get the runs for X and Y separately.
To get the run lengths for X and for Y in Z, you can use rle(): first find which values of Z are in X (or Y), then keep only the run lengths whose value is TRUE.
rle(Z %in% X)$lengths[rle(Z %in% X)$values]
#[1] 1 3
rle(Z %in% Y)$lengths[rle(Z %in% Y)$values]
#[1] 3 3
Which, as @docendo discimus points out, can be written more compactly as
with(rle(Z %in% X), lengths[values])
with(rle(Z %in% Y), lengths[values])
Where
Z %in% X ## gives
TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
Running rle() on this TRUE/FALSE vector then gives the runs of TRUE and FALSE values:
rle(Z %in% X) ## gives
Run Length Encoding
lengths: int [1:4] 1 3 3 3
values : logi [1:4] TRUE FALSE TRUE FALSE
So we can take the lengths and values components separately and keep only the lengths where values is TRUE.
Data
X <- c(14, 5, 2, 8 , 9, 10)
Y <- c(7, 3, 4, 13, 11, 12)
Z <- c(2, 3, 4, 7, 8, 9, 10, 11, 12, 13)
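Note that the Z in the question leaves out 5 and 14, so its X/Y pattern is not the full pooled ordering of the two samples. A minimal sketch that builds the labelled sequence directly from X and Y instead of typing Z by hand, assuming no value occurs in both samples (order() keeps ties in their original order, so a shared value would get the X label first). The names labels, Z_labels, runs and x_runs are just illustrative:

```r
# Values from the question
X <- c(14, 5, 2, 8, 9, 10)
Y <- c(7, 3, 4, 13, 11, 12)

# Tag every value with its sample of origin, then put the tags in sorted order
labels <- rep(c("X", "Y"), times = c(length(X), length(Y)))
Z_labels <- labels[order(c(X, Y))]
Z_labels
#[1] "X" "Y" "Y" "X" "Y" "X" "X" "X" "Y" "Y" "Y" "X"

# Run-length encode the tags and keep only the X runs
runs <- rle(Z_labels)
x_runs <- runs$lengths[runs$values == "X"]
x_runs          # sizes of the X-runs
#[1] 1 1 3 1
length(x_runs)  # number of X-runs
#[1] 4
```

Because the labels are taken from the positions in c(X, Y), this also works if the samples are not known in advance to be disjoint sets of numbers, which %in% would need.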
I wish to print the following text,
x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
with the contents of the set {a,...,j} being an integer vector.
x = 1:10
print(c("x = {", x, "}"), quote=FALSE)
#[1] x = { 1 2 3 4 5 6 7 8 9 10 }
x = 1:10
noquote(paste(c("x = {",x,"}"),sep=","))
#[1] x = { 1 2 3 4 5 6 7 8 9 10 }
Both of these give the same output and have the same two problems: too many spaces and no commas between vector entries. I'm aware this is a very beginner question, but any tips?
print is not designed to paste elements together; use paste instead:
paste0("x = {", paste(x, collapse = ", "), "}")
#[1] "x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}"
A tidyverse alternative:
library(stringr)
library(glue)
glue("x = {{{str_flatten_comma(x)}}}")
#x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
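Base R also has toString(), which joins the elements with ", ", so the inner paste() call can be dropped. A small sketch; writeLines() is used here only to show the text without the [1] prefix and quotes:

```r
x <- 1:10

# toString() collapses the vector with ", " between elements
s <- paste0("x = {", toString(x), "}")
s
#[1] "x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}"

# Print the string itself, without quotes or the [1] index
writeLines(s)
#x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
```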
I am trying to extract a vector from a nested list based on the value of another element within the same nested list. Hopefully my example will explain what I'm trying to do.
To begin, I have a list of lists like so:
## Create the inner lists
# Inner list 1
listInner1 <- list(
value = c(0.25),
index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
left = c(),
right = c()
)
listInner1$left$index <- c(1, 2, 3, 4, 5)
listInner1$left$good <- TRUE
listInner1$right$index <- c(6, 7, 8, 8, 10)
listInner1$right$good <- TRUE
# Inner list 2
listInner2 <- list(
value = c(1.5),
index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
left = c(),
right = c()
)
listInner2$left$index <- c(1, 2, 3)
listInner2$left$good <- TRUE
listInner2$right$index <- c(4, 5, 6, 7, 8, 9, 10)
listInner2$right$good <- TRUE
# Inner list 3
listInner3 <- list(
value = c(0.5),
index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
left = c(),
right = c()
)
listInner3$left$index <- c(1, 2, 3, 4, 5)
listInner3$right$index <- c( 6, 7, 8, 9, 10)
listInner3$left$left$index <- c(2, 4, 6, 8, 10)
listInner3$left$right$index <- c(1, 3, 5 ,7, 9)
listInner3$left$left$good <- TRUE
listInner3$left$right$good <- TRUE
# put all inner lists into single list object
listMiddle <- list(listInner1, listInner2, listInner3)
# one more list for fun
listMaster <- list(listMiddle)
As you can see, some of the left and right elements of the nested lists contain the element good = TRUE and some don't.
What I'm trying to do is: if a particular nested list contains the element good = TRUE, extract the element index from that same nested list.
For example, manually creating my desired output for the above example would look something like this:
ans <- list(
index.1 = c(1, 2, 3, 4, 5),
index.2 = c(6, 7, 8, 8, 10),
index.3 = c(1, 2, 3),
index.4 = c(4, 5, 6, 7, 8, 9, 10),
index.5 = c(2, 4, 6, 8, 10),
index.6 = c(1, 3, 5 ,7, 9)
)
The object ans contains all the index vectors that are contained within a nested list that also contains good = TRUE.
Any suggestions as to how I could do this?
Here is an option: first bind the nested elements into a more approachable format with rrapply(), then get the positions of the 'good' columns and, looping with map2(), extract the corresponding 'index' elements (the column just before each 'good' one) wherever good is TRUE. Finally, transpose the list, keep only the elements with length greater than 0, flatten the list and set the names (if needed).
library(purrr)
library(rrapply)
library(stringr)
library(dplyr)
out <- rrapply(listMaster, how = 'bind')
i1 <- grep('good', names(out))
map2(out[i1-1], out[i1], `[`) %>%
transpose %>%
map( ~ keep(.x, lengths(.x) > 0)) %>%
flatten %>%
setNames(str_c('index.', seq_along(.)))
-output
$index.1
[1] 1 2 3 4 5
$index.2
[1] 6 7 8 8 10
$index.3
[1] 1 2 3
$index.4
[1] 4 5 6 7 8 9 10
$index.5
[1] 2 4 6 8 10
$index.6
[1] 1 3 5 7 9
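A dependency-free alternative is a small recursive function. collect_good() below is a hypothetical helper, and the sketch assumes the flag is stored under the exact name good and the vector under index, as in the question. To keep the block runnable on its own, it rebuilds listMaster compactly with the same contents:

```r
# Compact rebuild of the question's listMaster (same contents)
listMaster <- list(list(
  list(value = 0.25, index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
       left  = list(index = c(1, 2, 3, 4, 5), good = TRUE),
       right = list(index = c(6, 7, 8, 8, 10), good = TRUE)),
  list(value = 1.5, index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
       left  = list(index = c(1, 2, 3), good = TRUE),
       right = list(index = c(4, 5, 6, 7, 8, 9, 10), good = TRUE)),
  list(value = 0.5, index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
       left  = list(index = c(1, 2, 3, 4, 5),
                    left  = list(index = c(2, 4, 6, 8, 10), good = TRUE),
                    right = list(index = c(1, 3, 5, 7, 9), good = TRUE)),
       right = list(index = c(6, 7, 8, 9, 10)))
))

# Depth-first walk: collect `index` from every list whose `good` is TRUE.
# [[ ]] is used instead of $ to avoid partial name matching.
collect_good <- function(node) {
  if (!is.list(node)) return(list())
  found <- if (isTRUE(node[["good"]]) && !is.null(node[["index"]])) {
    list(node[["index"]])
  } else {
    list()
  }
  # Recurse into every element; non-list elements contribute nothing
  for (child in node) found <- c(found, collect_good(child))
  found
}

ans <- collect_good(listMaster)
names(ans) <- paste0("index.", seq_along(ans))
```

The traversal visits left before right at each level, so the elements come out in the same order as the desired ans in the question.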
I wrote a function using mapply to sample from a dataset, but it is slow and my real dataset is larger. Is there a faster way that avoids mapply? My goal is to use more matrix/vector operations and fewer lists.
# A list of sets of data to sample from
bl <- list(list(c(1, 2),c(2, 3), c(3, 4), c(4, 5), c(5, 6), c(6, 7), c(7, 8), c(8, 9)),
list(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 6), c(5, 6, 7), c(6, 7, 8)),
list(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6), c(3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(5, 6, 7, 8, 9)))
#Number of elements to be selected
kn <- c(5, 4, 3)
#Total number of elements in each set
nb <- c(8, 6, 5)
# This outputs a list, but I would prefer a matrix
bl_func <- function() mapply(function(x, y, z) {
x[sample.int(y, z, replace = TRUE)]
}, bl, nb, kn, SIMPLIFY = FALSE)
EDIT
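As suggested by @LMc, I tried parallel::mcmapply, but for this small example it is actually much slower (compare the medians below):
# Leave one core free; the value has to be passed to mcmapply() explicitly
# (mcmapply() forks, so on Windows only mc.cores = 1 is supported)
n_cores <- parallel::detectCores() - 1
bl_func <- function() parallel::mcmapply(function(x, y, z) {
x[sample.int(y, z, replace = TRUE)]
}, bl, nb, kn, SIMPLIFY = FALSE, mc.cores = n_cores)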
bl_func.0 <- function() mapply(function(x, y, z) {
x[sample.int(y, z, replace = TRUE)]
}, bl, nb, kn, SIMPLIFY = FALSE)
library(microbenchmark)
microbenchmark(
para = bl_func(),
nopara = bl_func.0(),
times = 100
)
Unit: microseconds
expr min lq mean median uq max neval
para 11601.12 18176.46 19901 20402.4 21872 26457 100
nopara 37.34 90.86 1275 246.5 1311 9159 100
I am still curious, though, about other ways to speed this up without parallel processing. Any ideas will be appreciated!
Use a tool designed for speed and large datasets, e.g. data.table.
To do this you would need to reshape your data from lists to a data.table, which is a good idea anyway.
Here is an attempt:
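library(data.table)
# Turn each set into a data.table with one row per vector (columns V1, V2, ...)
x = lapply(bl, function(x) data.table( t(data.frame(x) ) ) )
# Reshape to long format: one row per value
x = lapply(x, melt)
# Tag each table with the ID of the set it came from
for( i in seq_along(x) ) x[[i]][, group := i]
x = rbindlist(x)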
Now the original list of lists is stored in a data.table with 3 columns: value holds the actual data, variable gives the position of each value within its original vector, and group identifies the list it came from.
> head(x)
variable value group
1: V1 1 1
2: V1 2 1
3: V1 3 1
4: V1 4 1
5: V1 5 1
6: V1 6 1
data.table's by argument lets us sample rows of .SD within each group defined by one or more columns, like this:
x[, .SD[sample(.N, kn[.GRP], replace = TRUE)], by = group]
group variable value
1: 1 V2 6
2: 1 V2 5
3: 1 V1 6
4: 1 V1 7
5: 1 V1 3
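Since matrices are preferred and all vectors within a set have the same length in the example, another base R option is to convert each set to a matrix once (one row per vector) so that each draw is just row indexing. This is a sketch under that equal-length assumption; sample_rows() is a hypothetical helper, nrow(m) takes the place of nb, and the speed relative to mapply on the real data has not been measured:

```r
bl <- list(list(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6), c(6, 7), c(7, 8), c(8, 9)),
           list(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 6), c(5, 6, 7), c(6, 7, 8)),
           list(c(1, 2, 3, 4, 5), c(2, 3, 4, 5, 6), c(3, 4, 5, 6, 7), c(4, 5, 6, 7, 8), c(5, 6, 7, 8, 9)))
kn <- c(5, 4, 3)

# One-off conversion: each set becomes a matrix with one row per vector
mats <- lapply(bl, function(s) do.call(rbind, s))

# Draw k rows with replacement; drop = FALSE keeps a matrix even when k = 1
sample_rows <- function(m, k) m[sample.int(nrow(m), k, replace = TRUE), , drop = FALSE]

set.seed(1)
res <- Map(sample_rows, mats, kn)  # list of three matrices: 5 x 2, 4 x 3, 3 x 5
```

If the sampling is repeated many times, only the sample_rows() step runs inside the loop; the list-to-matrix conversion is paid once.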
I have three variables, and I want to create a new variable showing which column has the highest value. Data:
x= c(5, 1, 4, 5, 5, 1, 1)
y= c(1, 2, 4, 5, 1, 4, 1)
z= c(1, 1, 5, 3, 5, 4, 1)
data <-data.frame(x, y, z)
Importantly, if there is a tie I want this to be indicated too, so that:
1 = x is highest
2 = y is highest
3 = z is highest
4 = x and y are tied for highest
5 = x and z are tied for highest
6 = y and z are tied for highest
7 = x, y and z are all equally high.
I've tried the code below, but it doesn't handle ties correctly.
data$Highest <- apply(data, 1, which.max)
data
PS. The correct new variable that I would like to get from the data above should be:
correct= c(1, 2, 3, 4, 5, 6, 7)
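fun <- function(v) {
  stopifnot(length(v) == 3L)
  if (anyNA(v)) stop("NA values in input")
  # all three equal -> code 7
  if (length(unique(v)) == 1L) return(7L)
  # ties get averaged ranks, so a fractional rank signals a tie
  rk <- rank(v)
  # whole-number top rank: a single clear maximum -> code 1, 2 or 3
  if (max(rk) %% 1 == 0L) return(which.max(rk))
  # otherwise two values tie for the maximum; positions {1,2}, {1,3}, {2,3}
  # sum to 3, 4, 5, so adding 1 gives codes 4, 5, 6
  test <- rk %% 1 != 0L
  if (sum(test) == 2L) return(sum(which(test)) + 1L)
  stop("undefined case")
}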
apply(data, 1, fun)
#[1] 1 2 3 4 5 6 7
You can do:
library(plyr)
combn2 <- function(x, y) combn(y, x, paste, collapse="")
x = unlist(sapply(1:ncol(data), combn2, names(data)))
vec = alply(data, 1, function(u) which(paste(names(data)[max(u)==u], collapse='')==x))
#unlist(vec)
#1 2 3 4 5 6 7
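A vectorized sketch without apply(): mark which cells equal their row maximum, encode each row's pattern as a bit sum (x = 1, y = 2, z = 4), and translate the sums 1 to 7 into the codes from the question with a lookup vector. The bit weights and the lookup vector are this sketch's own choices, and the exact equality test assumes whole numbers like those in the example (computed doubles may need a tolerance):

```r
x <- c(5, 1, 4, 5, 5, 1, 1)
y <- c(1, 2, 4, 5, 1, 4, 1)
z <- c(1, 1, 5, 3, 5, 4, 1)
data <- data.frame(x, y, z)

m <- as.matrix(data[c("x", "y", "z")])
# TRUE where a cell equals its row maximum (the row-max vector recycles down each column)
top <- m == pmax(data$x, data$y, data$z)
# Bit sum per row: x only = 1, y only = 2, x+y = 3, z only = 4, x+z = 5, y+z = 6, all = 7
key <- top[, "x"] * 1 + top[, "y"] * 2 + top[, "z"] * 4
# Position = bit sum, value = code from the question (e.g. key 3, the x/y tie, -> 4)
lookup <- c(1, 2, 4, 3, 5, 6, 7)
data$Highest <- lookup[key]
data$Highest
#[1] 1 2 3 4 5 6 7
```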
I would like to extract information from a data frame parametrically.
That is:
A <- c(3, 10, 20, 30, 40)
B <- c(30, 100, 200, 300, 400)
DF <- data.frame(A, B)
DF[A%in%c(1, 2, 3, 4, 5), ] # it works
# But what if this is the case,
# which comes for example out of a user-menu selection:
m <- "A%in%"
k <- c(1, 2, 3, 4, 5)
# How can we make something like that work:
DF[eval(parse(text=c(m, k))), ]
This works:
DF[eval(parse(text = paste0(m, deparse(k)))), ]
# A B
#1 3 30
However, eval(parse()) is best avoided, especially with user-supplied text, because it will run whatever code the string contains. Maybe this would be an alternative for you?
x <- "A"
fun <- "%in%"
k <- c(1, 2, 3, 4, 5)
DF[getFunction(fun)(get(x), k), ]
# A B
#1 3 30
Also,
DF[eval(parse(text=paste(m, substitute(k)))),]
or
DF[eval(parse(text=paste(m, quote(k)))),]
or
DF[eval(parse(text=paste(m, "k"))),]
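If the menu only needs to supply a column name and an operator, here is a sketch without eval(parse()): the column is looked up with DF[[col]], which does not rely on a global variable called A, and the operator with match.fun(), after checking it against a whitelist. The names col, op and allowed_ops, and the operators in the whitelist, are illustrative assumptions about what the menu offers:

```r
A <- c(3, 10, 20, 30, 40)
B <- c(30, 100, 200, 300, 400)
DF <- data.frame(A, B)

# Hypothetical menu selections
col <- "A"
op  <- "%in%"
k   <- c(1, 2, 3, 4, 5)

# Only accept known columns and a fixed set of operators (illustrative list)
allowed_ops <- c("%in%", "==", "!=", "<", "<=", ">", ">=")
stopifnot(col %in% names(DF), op %in% allowed_ops)

# match.fun() turns the operator name into the function itself
res <- DF[match.fun(op)(DF[[col]], k), ]
res
```

Here res holds only the first row (A = 3, B = 30), the same as the eval(parse()) versions, but a string that is not a known column or operator is rejected instead of being executed.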