R find values from intervals - r

I have a data frame x:
  begin end
1     1   3
2     5   6
3    11  18
and a vector v <- c(1,2,5,9,10,11,17,20)
I'd like to find all values from the vector that fall within any of the intervals in the data frame, so the result should be the vector c(1,2,5,11,17). How can I do this?

To get the matching values row by row, use apply with MARGIN = 1 and intersect (the data frame in the question is named x):
apply(x, 1, function(a) intersect(v, a[1]:a[2]))
#[[1]]
#[1] 1 2
#[[2]]
#[1] 5
#[[3]]
#[1] 11 17
Or unlist the result to get a vector:
unlist(apply(x, 1, function(a) intersect(v, a[1]:a[2])))
#OR
intersect(v, unlist(apply(x, 1, function(a) a[1]:a[2]))) #as commented by akrun
#[1] 1 2 5 11 17

We can use Map to build the sequence between each pair of corresponding begin/end values as a list, unlist that list, and use intersect to keep the elements common to both vectors:
intersect(unlist(Map(`:`, x$begin, x$end)), v)
#[1] 1 2 5 11 17
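Both answers above expand every interval into an integer sequence, which assumes integer endpoints. A minimal sketch of an alternative, assuming the intervals are closed at both ends, compares each value against the begin/end columns directly. The helper name in_any_interval is made up for illustration and is not part of either answer.

```r
x <- data.frame(begin = c(1, 5, 11), end = c(3, 6, 18))
v <- c(1, 2, 5, 9, 10, 11, 17, 20)

# Hypothetical helper: TRUE when `val` lies inside at least one closed
# interval [begin, end] of the data frame
in_any_interval <- function(val, intervals) {
  any(val >= intervals$begin & val <= intervals$end)
}

# vapply guarantees a logical vector with one entry per element of v
keep <- vapply(v, in_any_interval, logical(1), intervals = x)
result <- v[keep]
print(result)
```

Because no sequences are built, this should also work for non-integer values such as 2.5.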

Related

Multiplication of selected columns by a given vector

I have been struggling for some time with a task in R that seems like it should be easy.
Suppose this is my sample data:
df <- data.frame(a=c(2,2,7),b=c(1,4,3),c=c(9,5,3))
v <- c(1,2,3)
Now I would like to multiply each column by the corresponding vector element, e.g. the first column by v[1], the second column by v[2], etc.
Expected output:
  a b  c
1 2 2 27
2 2 8 15
3 7 6  9
The target data is much larger and consists of integers and floating point numbers.
Thank you in advance!
You can use sweep:
sweep(df, 2, v, FUN="*")
Second option is mapply:
mapply(`*`, df, v)
Or with transposing:
t(t(df)*v)
You can try col:
v[col(df)] * df
#   a b  c
# 1 2 2 27
# 2 2 8 15
# 3 7 6  9
Or apply over the rows and transpose the result (the |> pipe needs R 4.1 or later):
apply(df, 1, function(x) x * v) |> t()
or
t(apply(df, 1, function(x) x * v))
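The options above do not all return the same type. A short check on the sample data from the question suggests that sweep keeps the data frame, while mapply and the transpose approach give back a matrix. The variable names below are illustrative only.

```r
df <- data.frame(a = c(2, 2, 7), b = c(1, 4, 3), c = c(9, 5, 3))
v <- c(1, 2, 3)

by_sweep     <- sweep(df, 2, v, FUN = "*")  # data frame in, data frame out
by_mapply    <- mapply(`*`, df, v)          # columns simplified to a matrix
by_transpose <- t(t(df) * v)                # t() converts to a matrix first

# The numbers agree across all three results
stopifnot(all(as.matrix(by_sweep) == by_mapply),
          all(by_mapply == by_transpose))

# Wrap in as.data.frame() if a data frame is needed downstream
result <- as.data.frame(by_mapply)
print(result)
```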

How to inverse subset in R?

I am trying to make non-overlapping subsets of a totally inclusive group in R. The first subset contains pairs of elements from the totally inclusive group. The other subset should be all of the elements in the totally inclusive group, but not in the first subset.
poplength <- 10
samples <- 7
numpair <- 2
totallyinclusivegroup <- sample(1:poplength, samples)
Subset1 <- sample(totallyinclusivegroup, size = numpair*2)
I don't know how to get a "Subset2" that includes everything in "totallyinclusivegroup" but not in Subset 1. I've tried using the "-" operator, with no success. For example,
Subset2 <- totallyinclusivegroup[-Subset1]
does not work, and includes elements from Subset1. Any advice/help is appreciated.
We can negate the logical vector from %in% with ! so that TRUE becomes FALSE and vice versa:
out <- totallyinclusivegroup[!totallyinclusivegroup %in% Subset1]
Output:
Subset1
#[1] 2 6 9 7
totallyinclusivegroup
#[1] 3 2 6 1 9 7 8
out
#[1] 3 1 8
Or an easier option is setdiff
setdiff(totallyinclusivegroup, Subset1)
#[1] 3 1 8
If there are duplicate elements, it is better to use vsetdiff from vecsets
library(vecsets)
vsetdiff(totallyinclusivegroup, Subset1)
Try:
#Code
Subset2 <- totallyinclusivegroup[-which(totallyinclusivegroup%in% Subset1 )]
Output:
totallyinclusivegroup
[1] 8 5 10 2 9 1 3
Subset1
[1] 5 10 3 9
Subset2
[1] 8 2 1
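The attempt in the question fails because a negative index drops positions, not values. A small sketch with fixed vectors (taken from the output shown in the first answer) illustrates this, along with one caveat of the -which() form: when nothing matches, which() returns an empty index and the subset comes back empty.

```r
totallyinclusivegroup <- c(3, 2, 6, 1, 9, 7, 8)
Subset1 <- c(2, 6, 9, 7)

# Negative indexing removes the elements at positions 2, 6, 9 and 7
# (position 9 does not exist and is ignored), not the values 2, 6, 9, 7
by_position <- totallyinclusivegroup[-Subset1]
print(by_position)

# Logical negation removes by value
by_value <- totallyinclusivegroup[!totallyinclusivegroup %in% Subset1]
print(by_value)

# Caveat: with no matches, -which() yields an empty index
nothing <- c(100, 200)
with_which <- totallyinclusivegroup[-which(totallyinclusivegroup %in% nothing)]
with_not   <- totallyinclusivegroup[!totallyinclusivegroup %in% nothing]
print(length(with_which))
print(length(with_not))
```

For that reason the ! form (or setdiff, when duplicates do not matter) is usually the safer choice.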

Duplicating R dataframe vector values using another vector as a guide

I have the following R data frame: df = data.frame(value=c(5,4,3,2,1), a=c(2,0,1,6,9), b=c(7,0,0,3,4)). I would like to repeat each value of a and b as many times as the number in the corresponding position of value. For example, expanding b would give b_ex = c(7,7,7,7,7,3,3,4). The counts of four and three from value would not be used for b_ex because b[2] and b[3] are zero. The expanded vectors would be assigned names and be stand-alone.
Thanks!
Maybe you are looking for:
result <- lapply(df[-1], function(x) rep(x[x != 0], df$value[x != 0]))
#$a
#[1] 2 2 2 2 2 1 1 1 6 6 9
#$b
#[1] 7 7 7 7 7 3 3 4
To have them as separate vectors in the global environment, use list2env:
list2env(result, .GlobalEnv)
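If writing into the global environment is not wanted, the same rep call can be wrapped in a small function and the results assigned explicitly. This is only a sketch; expand_nonzero is a hypothetical name, not part of the answer above.

```r
df <- data.frame(value = c(5, 4, 3, 2, 1),
                 a = c(2, 0, 1, 6, 9),
                 b = c(7, 0, 0, 3, 4))

# Hypothetical helper: repeat each non-zero entry of `x` as many times as
# the count in the matching position of `times`
expand_nonzero <- function(x, times) {
  keep <- x != 0
  rep(x[keep], times[keep])
}

a_ex <- expand_nonzero(df$a, df$value)
b_ex <- expand_nonzero(df$b, df$value)
print(a_ex)
print(b_ex)
```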

Split a vector into three vectors of unequal length in R

A question from a relative n00b: I'd like to split a vector into three vectors of different lengths, with the values assigned to each vector at random. For example, I'd like to split the vector of length 12 below into vectors of length 2, 3, and 7.
I can get three equal-sized vectors using this:
test<-1:12
split(test,sample(1:3))
Any suggestions on how to split test into vectors of length 2, 3, and 7 instead of three vectors of length 4?
You could use rep to create the indices for each group and then split based on that
split(1:12, rep(1:3, c(2, 3, 7)))
If you wanted the items to be randomly assigned, so that it's not just the first 2 items in the first vector, the next 3 items in the second vector, and so on, you could just add a call to sample:
split(1:12, sample(rep(1:3, c(2, 3, 7))))
If you don't have the specific lengths (2,3,7) in mind but just don't want it to be equal length vectors every time then SimonO101's answer is the way to go.
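The rep/sample idea generalizes to any vector of group sizes. A minimal sketch, assuming the sizes add up to the length of the input (the names sizes and groups are illustrative):

```r
set.seed(42)  # only to make the example repeatable
test <- 1:12
sizes <- c(2, 3, 7)
stopifnot(sum(sizes) == length(test))

# One label per element: 2 ones, 3 twos, 7 threes, then shuffled so that
# membership is random while the group sizes stay fixed
labels <- sample(rep(seq_along(sizes), sizes))
groups <- split(test, labels)

print(lengths(groups))
```

Whatever the seed, the group lengths stay at 2, 3 and 7; only the membership changes.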
How about using sample slightly differently...
set.seed(123)
test<-1:12
split( test , sample(3, 12 , replace = TRUE) )
#$`1`
#[1] 1 6
#$`2`
#[1] 3 7 9 10 12
#$`3`
#[1] 2 4 5 8 11
set.seed(1234)
test<-1:12
split( test , sample(3, 12 , replace = TRUE) )
#$`1`
#[1] 1 7 8
#$`2`
#[1] 2 3 4 6 9 10 12
#$`3`
#[1] 5 11
The first argument to sample is the number of groups to split the vector into, and the second is the number of elements in the vector. With replace = TRUE each successive element is assigned at random to one of the 3 groups, so the group sizes vary from run to run. For 4 groups just do split( test , sample(4, 12 , replace = TRUE) ).
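Because the labels are drawn with replacement, a group can end up with no members, in which case split leaves it out of the result. As a hedged sketch, wrapping the labels in a factor with fixed levels keeps every group in the result, empty or not. The exact draws depend on the seed and on the R version's sampling algorithm, so no output is shown.

```r
set.seed(1)
test <- 1:12
n_groups <- 3

labels <- sample(n_groups, length(test), replace = TRUE)

# Fixing the levels makes split() return one element per group,
# even for a group that received no items
groups <- split(test, factor(labels, levels = seq_len(n_groups)))

print(length(groups))
print(lengths(groups))
```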
It is easier than you think. To split the vector into three randomly chosen sets, run the following code:
test <- 1:12
split(sample(test), 1:3)
Each time you run this code you get a new random distribution across the three sets (perfect for k-fold cross-validation).
You get:
> split(sample(test), 1:3)
$`1`
[1] 5 8 7 3
$`2`
[1] 4 1 10 9
$`3`
[1] 2 11 12 6
> split(sample(test), 1:3)
$`1`
[1] 12 6 4 1
$`2`
[1] 3 8 7 5
$`3`
[1] 9 2 10 11
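The answer above mentions k-fold cross-validation. As a rough sketch of how such folds might be used (the loop and the variable names are illustrative only), each fold is held out in turn while the remaining elements serve as training data:

```r
set.seed(2024)
test <- 1:12
k <- 3

# rep_len spreads the fold labels evenly even when length(test)
# is not a multiple of k
folds <- split(sample(test), rep_len(seq_len(k), length(test)))

for (i in seq_len(k)) {
  held_out <- folds[[i]]
  training <- setdiff(test, held_out)
  cat("fold", i, "held out:", length(held_out),
      "training:", length(training), "\n")
}
```

Note that this gives equal-sized folds, so it does not address the 2/3/7 split asked for in the question.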
You could use an auxiliary vector to format the way you want to split your data. Example:
Data <- c(1,2,3,4,5,6)
Format <- c("X","Y","X","Y","Z","Z")
output <- split(Data,Format)
Will generate the output:
$X
[1] 1 3
$Y
[1] 2 4
$Z
[1] 5 6
