subset data.frame for every level of a factor


Given the following vectors to build a dataframe :
set.seed(1)
x <- sample( LETTERS[1:4], 100, replace=TRUE)
y <- runif(100,0,100)
df <- data.frame(x,y)
I would like, if possible, clean code (a loop, an apply call, or any other method) that subsets the data.frame using a different condition for each level of the factor x. For example:
level A y >30 | y <20
level B y >21 | y <12
level C y >42 | y <21
level D y >58 | y <13

A split apply combine approach where we use Map to iterate over the subsets and the conditions in parallel.
do.call(rbind,
Map(function(data, left, right) {
subset(x = data, subset = y > left | y < right)
},
data = split(df, df$x),
left = c(30, 21, 42, 58),
right = c(20, 12, 21, 13)
))
# x y
#A.5 A 63.349326
#A.10 A 59.876097
#A.11 A 97.617069
#A.12 A 73.179251
#A.22 A 49.559358
#A.24 A 17.344233
# ...
We split your data by x, subset each according to your conditions and combine the list to a single dataframe.
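The same per-level limits can also be applied without splitting, by looking up each row's thresholds by level name. This is a minimal sketch rather than part of the answer above; the names `left`, `right`, `lev` and `keep` are illustrative, and it assumes every value of x has an entry in both lookup vectors.

```r
set.seed(1)
x <- sample(LETTERS[1:4], 100, replace = TRUE)
y <- runif(100, 0, 100)
df <- data.frame(x, y)

# Per-level thresholds, named by level (values taken from the question)
left  <- c(A = 30, B = 21, C = 42, D = 58)  # keep rows with y above this ...
right <- c(A = 20, B = 12, C = 21, D = 13)  # ... or with y below this

# as.character() makes the lookup work for both factor and character x
lev  <- as.character(df$x)
keep <- df$y > left[lev] | df$y < right[lev]
result <- df[keep, ]
head(result)
```

Unlike the split/rbind version, this keeps the original row order and row names.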

What about something like this
df[df$x == 'A' & (df$y > 30 | df$y < 20),]
# x y
# 2 A 71.117606
# 3 A 44.438057
# 6 A 63.244699
# 7 A 54.185802
# 11 A 39.577617
# 13 A 8.681545
# 29 A 94.437431
# ...
# or depending on what you mean by '&'
df[df$x == 'A' & df$y > 30,]
# x y
# 2 A 71.11761
# 3 A 44.43806
# 6 A 63.24470
# 7 A 54.18580
# 11 A 39.57762
# 29 A 94.43743
# 31 A 54.17604
# ...
# and then accordingly for the other cases

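Using library(data.table) we can do
lower = c(20, 12, 21, 13)
upper = c(30, 21, 42, 58)
# factor(x) keeps the lookup working when x is a character column (the data.frame() default since R 4.0.0)
setDT(df)[!between(y, lower[factor(x)], upper[factor(x)]), .SD, keyby=x]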
# x y
# 1: A 63.349326
# 2: A 59.876097
# 3: A 97.617069
# 4: A 73.179251
# 5: A 49.559358
# 6: A 17.344233
# 7: A 51.116978
# ...

Related

R: Using loop to calculate reads per million (number multiplied by 10^6 divided by the sum of the numbers within a column)

I am a junior bioinformatician / data scientist. I am trying to improve my programming skills in R, especially loops.
I wanted to write a script in which I would calculate RPM (reads per million) - a normalized value for read counts in metagenomics. This value is calculated as follows: number of reads * 10^6 / total number of reads in a sample (thus in a column).
To make it more clear, here is the R code that may be used to calculate it manually (thus without a loop):
`dat <- data.frame(x = c(50, 35, 78, 66), y = c(45, 34, 10, 20)) # random data frame
dat # view the data frame
# x y
# 1 50 45
# 2 35 34
# 3 78 10
# 4 66 20
sum_of_the_column_x <- sum(dat$x) # calculate the sum of column x
sum_of_the_column_y <- sum(dat$y) # calculate the sum of column y
dat$rpm_x <- (dat$x * 10^6) / sum_of_the_column_x # calculate the rpm of x, add to the data frame
dat$rpm_y <- (dat$y * 10^6) / sum_of_the_column_y # calculate the rpm of y, add to the data frame
dat # view it again
# x y rpm_x rpm_y
# 1 50 45 218340.6 412844.04
# 2 35 34 152838.4 311926.61
# 3 78 10 340611.4 91743.12
# 4 66 20 288209.6 183486.24
# when you sum up the rpm_x or rpm_y column, it should give 10^6.`
I would like to perform this calculation on many columns at the same time (I have sometimes more than 100 samples, thus columns, in a dataframe). Because it is so many columns, I wanted to create a for loop. I tried a few ways, but none of them really worked... (see below)
I feel like I am very close to the answer, but I ran out of ideas. I would be grateful for hints on how to solve this issue. Also, please let me know if something is unclear in my question. Thank you so much!
`### WHAT IVE TRIED
dat <- data.frame(x = c(50, 35, 78, 66), y = c(45, 34, 10, 20)) # random data frame
dat # view the data frame
# x y
# 1 50 45
# 2 35 34
# 3 78 10
# 4 66 20
### TRY 1
newdata <- data.frame(x = NA, y = NA)
for (i in dat[,1:2]) {
newdata <- as.data.frame(cbind((i * 1e6 / sum(i))))
}
newdata
# V1
# 1 412844.04
# 2 311926.61
# 3 91743.12
# 4 183486.24
### for some reason only the second column got calculated...
### TRY 2
newdat <- NA
for (i in dat[,1:2]) {
for (j in dat[,1:2]) {
newdat[i] <- i * 10^6
}
}
### now it made a huge table with a lot of NAs.
### TRY 3
# now I had to make a "newdata" dataframe with four rows of NAs, otherwise it doesn't work
newdata <- data.frame(x = rep(NA, 4), y = rep(NA, 4))
for(i in dat[,1:2]) { # for every number i in column
for(j in colnames(newdata[,1:2])) { # for every column of newdata
for(k in colnames(dat[,1:2])) { # for every column of dat
newdata[j] <- i * 10^6 / sum(dat[[k]])
}
}
}
newdata
# x y
# 1 412844.04 412844.04
# 2 311926.61 311926.61
# 3 91743.12 91743.12
# 4 183486.24 183486.24
### Again wrong... Now it is the same column two times.
### TRY 4
newdata <- data.frame(x = rep(NA, 4), y = rep(NA, 4))
for(i in dat[,1:2]) { # for every number i in columns
for(j in colnames(dat[,1:2])) { # for columns of dat
newdata[j] <- i * 10^6 / sum(dat[[j]])
}
}
newdata
# x y
# 1 196506.55 412844.04
# 2 148471.62 311926.61
# 3 43668.12 91743.12
# 4 87336.24 183486.24
### Again wrong - the results of the x are incorrect`

Unnecessary for loops slow things down. There are many other ways of doing this. One is
newdat <- dat * 10^6 / rep(colSums(dat), each=nrow(dat))
which gives what I think you want
> newdat
x y
1 218340.6 412844.04
2 152838.4 311926.61
3 340611.4 91743.12
4 288209.6 183486.24
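Since the question asked specifically about loops, here is a minimal sketch of a working for loop next to an equivalent sweep() call. The key difference from the attempts in the question is that the loop iterates over column names and writes each result into the matching column of a copy. The names `rpm_loop` and `rpm_sweep` are illustrative.

```r
dat <- data.frame(x = c(50, 35, 78, 66), y = c(45, 34, 10, 20))

# Loop over column names, filling a copy of dat one column at a time
rpm_loop <- dat
for (col in names(dat)) {
  rpm_loop[[col]] <- dat[[col]] * 1e6 / sum(dat[[col]])
}

# Same calculation without a loop: divide each column by its sum
rpm_sweep <- sweep(dat, 2, colSums(dat), "/") * 1e6

colSums(rpm_loop)  # each column should sum to 1e6
```

Both versions scale to any number of sample columns, as long as every column in `dat` is numeric.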

Create a group of numbers that does not exceed 34

I need to create groups of numbers which summed up do not reach 34.
For example: I have an array x<-c(28,26,20,5,3,2,1) and I need to create the following groups: a=(28,5,1), b=(26,3,2), c=(20) because the sums of the groups a, b and c do not exceed 34.
Is it possible to implement this procedure in R?

If I understand correctly this is what you want to do:
create_groups <- function(input, threshold) {
input <- sort(input, decreasing = TRUE)
result <- vector("list", length(input))
sums <- rep(0, length(input))
for (k in input) {
i <- match(TRUE, sums + k <= threshold)
if (!is.na(i)) {
result[[i]] <- c(result[[i]], k)
sums[i] <- sums[i] + k
}
}
result[sapply(result, is.null)] <- NULL
result
}
create_groups(x, 34)
# [[1]]
# [1] 28 5 1
#
# [[2]]
# [1] 26 3 2
#
# [[3]]
# [1] 20
However it is not guaranteed that this greedy algorithm will output the optimal solution in terms of number of groups. For instance:
y <- c(18, 15, 11, 9, 8, 7)
create_groups(y, 34)
# [[1]]
# [1] 18 15
#
# [[2]]
# [1] 11 9 8
#
# [[3]]
# [1] 7
while the optimal solution in this case consists of only 2 groups: list(c(18, 9, 7), c(15, 11, 8)).
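One quick way to confirm that two groups is the best possible here: the total of all values divided by the threshold, rounded up, is a lower bound on the number of groups. A small sketch (the variable names are illustrative):

```r
y <- c(18, 15, 11, 9, 8, 7)
threshold <- 34

# No grouping can use fewer groups than this
lower_bound <- ceiling(sum(y) / threshold)
lower_bound

# The two-group solution quoted above respects the threshold
# and uses every value exactly once
candidate <- list(c(18, 9, 7), c(15, 11, 8))
sapply(candidate, sum)
identical(sort(unlist(candidate)), sort(y))
```

When the greedy result uses more groups than this bound, a better grouping may exist, although the bound itself is not always achievable.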

Assuming you want all possible combinations of subsets of x that meet this condition, you can use
x = c(28,26,20,5,3,2,1)
combos = lapply(seq_along(x), function(k) combn(x, k)) # all subsets of size k, one subset per column
le34 = lapply(combos, function(m) colSums(m) <= 34) # which subsets sum to at most 34
lapply(seq_along(combos), function(i) combos[[i]][, le34[[i]], drop=FALSE]) # subsets that meet the condition

Clustering a second vector according to a clustered first vector in R

I have a vector that has been divided into two clusters (as discussed in this question):
x <- c(1, 4, 5, 6, 9, 29, 32, 46, 55)
tree <- hclust(dist(x), method = "single")
split(x, cutree(tree, h = 19))
# $`1`
# [1] 1 4 5 6 9
#
# $`2`
# [1] 29 32 46 55
Now suppose I have another vector of the same length, which I wish to divide into the same number of clusters using the same indices as x. Take the following vector y as an example:
set.seed(77)
y = rnorm(9)
y
#[1] -0.54964 1.09105 0.63978 1.04258 0.16970 1.13780 -0.97055 -0.13183
#[9] 0.14623
The desired result should be like this:
# $`1`
# [1] -0.54964 1.09105 0.63978 1.04258 0.16970
#
# $`2`
# [1] 1.13780 -0.97055 -0.13183 0.14623

Just like you did for x:
split(y, cutree(tree, h = 19))
And since you are now using cutree(tree, h = 19) in multiple places, you might as well assign it to a variable:
groups <- cutree(tree, h = 19)
split(x, groups)
split(y, groups)
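The reason this works is that cutree() returns one cluster label per element of x, in the same order as x, so any vector of the same length can be split by those labels. A minimal sketch with a made-up second vector `z` (not from the question) whose values make the grouping easy to read:

```r
x <- c(1, 4, 5, 6, 9, 29, 32, 46, 55)
tree <- hclust(dist(x), method = "single")
groups <- cutree(tree, h = 19)  # one label per element of x

# Hypothetical second vector: 10 times the position, so positions stay visible
z <- seq_along(x) * 10
split(z, groups)

# Keeping everything in one data frame avoids the vectors getting out of sync
clustered <- data.frame(x = x, z = z, cluster = groups)
```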

How to write a data-frame with one column a list to a file?

Here is my dummy dataset:
dataset<-data.frame(a=c(1,2,3,4),b=c('a','b','c','d'), c=c("HI","DD","gg","ff"))
g=list(c("a","b"),c(2,3,4), c(44,33,11,22),c("chr","ID","i","II"))
dataset$l<-g
dataset
a b c l
1 1 a HI a, b
2 2 b DD 2, 3, 4
3 3 c gg 44, 33, 11, 22
4 4 d ff chr, ID, i, II
> mode(dataset$l)
[1] "list"
when I try to write the dataset to a file:
> write.table(dataset, "dataset.txt", quote=F, sep="\t")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
unimplemented type 'list' in 'EncodeElement'
How can I solve this problem?

I can think of a few options, depending on what you're trying to achieve.
If it is for display only, then you might simply want capture.output() or sink(); neither of these would be very convenient to read back into R:
capture.output(dataset, file="myfile.txt")
### Result is a text file that looks like this:
# a b c l
# 1 1 a HI a, b
# 2 2 b DD 2, 3, 4
# 3 3 c gg 44, 33, 11, 22
# 4 4 d ff chr, ID, i, II
sink("myfile.txt")
dataset
sink()
## Same result as `capture.output()` approach
If you want to be able to read the resulting table back into R (albeit without preserving the fact that column "l" is a list), you can take an approach similar to what @DWin suggested.
In the code below, the dataset2[sapply... line identifies which variables are lists and concatenates them into a single string. Thus, they become simple character variables, allowing you to use write.table().
dataset2 <- dataset # make a copy just to be on the safe side
dataset2[sapply(dataset2, is.list)] <- apply(dataset2[sapply(dataset2, is.list)],
1, function(x)
paste(unlist(x),
sep=", ", collapse=", "))
str(dataset2)
# 'data.frame': 4 obs. of 4 variables:
# $ a: num 1 2 3 4
# $ b: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
# $ c: Factor w/ 4 levels "DD","ff","gg",..: 4 1 3 2
# $ l: chr "a, b" "2, 3, 4" "44, 33, 11, 22" "chr, ID, i, II"
write.table(dataset2, "myfile.txt", quote=FALSE, sep="\t")
# can be read back in with: dataset3 <- read.delim("myfile.txt")
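If the list structure matters after reading the file back, the flattened strings can be split again. This is a sketch under the assumption that no element contains the separator ", " itself. Every element comes back as character, so numeric entries would need converting separately. It uses a smaller hypothetical data frame and a temporary file.

```r
dataset <- data.frame(a = 1:2)
dataset$l <- list(c("a", "b"), c(2, 3, 4))

# Flatten the list column to one string per row, then write as usual
flat <- dataset
flat$l <- vapply(dataset$l, function(v) paste(v, collapse = ", "), character(1))
path <- tempfile(fileext = ".txt")
write.table(flat, path, quote = FALSE, sep = "\t")

# Read back and split the strings into a list column again
back <- read.delim(path, stringsAsFactors = FALSE)
back$l <- strsplit(back$l, ", ", fixed = TRUE)
str(back)
```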

Output from save is unreadable. Output from dump or dput is ASCII and is readable to people who understand the structure of R objects, but I'm guessing you wanted it more conventionally arranged.
> apply(dataset, 1, function(x) paste(x, sep=",", collapse=","))
[1] "1,a,HI,c(\"a\", \"b\")"
[2] "2,b,DD,c(2, 3, 4)"
[3] "3,c,gg,c(44, 33, 11, 22)"
[4] "4,d,ff,c(\"chr\", \"ID\", \"i\", \"II\")"
The backslashes do not appear in the text-file output:
writeLines(con="test.txt", apply(dataset, 1, function(x) paste(x, sep=",", collapse=",")))
#-------output-----
1,a,HI,c("a", "b")
2,b,DD,c(2, 3, 4)
3,c,gg,c(44, 33, 11, 22)
4,d,ff,c("chr", "ID", "i", "II")

If one of the requirements is to preserve the formatting for excel, etc, this might help:
writableTable <- tableFlatten(dataset, filler="")
# a b c l.01 l.02 l.03 l.04
# 1 a HI a b
# 2 b DD 2 3 4
# 3 c gg 44 33 11 22
# 4 d ff chr ID i II
write.csv(writableTable, "myFile.csv")
tableFlatten uses a function listFlatten which, as the name implies, takes nested lists and flattens them.
However, if the elements within the lists are of different sizes, it adds filler (which can be NAs, blank spaces, or any other user defined option)
The code for it is below.
tableFlatten <- function(tableWithLists, filler="") {
# takes as input a table with lists and returns a flat table
# empty spots in lists are filled with value of `filler`
#
# depends on: listFlatten(.), findGroupRanges(.), fw0(.)
# index which columns are lists
listCols <- sapply(tableWithLists, is.list)
tableWithLists[listCols]
tableWithLists[!listCols]
# flatten lists into table
flattened <- sapply(tableWithLists[listCols], listFlatten, filler=filler, simplify=FALSE)
# fix names
for (i in 1:length(flattened)) colnames(flattened[[i]]) <- fw0(ncol(flattened[[i]]), 2)
# REASSEMBLE, IN ORDER
# find pivot point counts
pivots <- sapply(findGroupRanges(listCols), length)
#index markers
indNonList <- indList <- 1
# nonListGrp <- (0:(length(pivots)/2)) * 2 + 1
# ListGrp <- (1:(length(pivots)/2)) * 2
final <- data.frame(row.names=row.names(tableWithLists))
for (i in 1:length(pivots)) {
if(i %% 2 == 1) {
final <- cbind(final,
tableWithLists[!listCols][indNonList:((indNonList<-indNonList+pivots[[i]])-1)]
)
} else {
final <- cbind(final,
flattened[indList:((indList<-indList+pivots[[i]])-1)]
)
}
}
return(final)
}
#=====================================
listFlatten <- function(obj, filler=NA) {
## Flattens obj like rbind, but if elements are of different length, plugs in value filler
# Initialize Vars
bind <- FALSE
# IF ALL ELEMENTS ARE MATRIX-LIKE OR VECTORS, MAKE SURE SAME NUMBER OF COLUMNS
matLike <- sapply(obj, function(x) !is.null(dim(x)))
vecLike <- sapply(obj, is.vector)
# If all matrix-like.
if (all(matLike)) {
maxLng <- max(sapply(obj[matLike], ncol))
obj[matLike] <- lapply(obj[matLike], function(x) t(apply(x, 1, c, rep(filler, maxLng - ncol(x)))))
bind <- TRUE
# If all vector-like
} else if (all(vecLike)) {
maxLng <- max(sapply(obj[vecLike], length))
obj[vecLike] <- lapply(obj[vecLike], function(x) c(x, rep(filler, maxLng - length(x))))
bind <- TRUE
# If all are either matrix- or vector-like
} else if (all(matLike | vecLike)) {
maxLng <- max(sapply(obj[matLike], ncol), sapply(obj[vecLike], length))
# Add in filler's as needed
obj[matLike] <-
lapply(obj[matLike], function(x) t(apply(x, 1, c, rep(filler, maxLng - ncol(x)))))
obj[vecLike] <-
lapply(obj[vecLike], function(x) c(x, rep(filler, maxLng - length(x))))
bind <- TRUE
}
# If processed and ready to be returned, then just clean it up
if(bind) {
ret <- (do.call(rbind, obj))
colnames(ret) <- paste0("L", fw0(1:ncol(ret), digs=2))
return(ret)
}
# Otherwise, if obj is still a list, continue recursively
if (is.list(obj)) {
return(lapply(obj, listFlatten, filler=filler))
}
# If none of the above, return an error.
stop("Unknown object type")
}
#--------------------------------------------
findGroupRanges <- function(booleanVec) {
# returns list of indexes indicating a series of identical values
pivots <- which(sapply(2:length(booleanVec), function(i) booleanVec[[i]] != booleanVec[[i-1]]))
### THIS ISNT NEEDED...
# if (identical(pivots, numeric(0)))
# pivots <- length(booleanVec)
pivots <- c(0, pivots, length(booleanVec))
lapply(seq(2, length(pivots)), function(i)
seq(pivots[i-1]+1, pivots[i])
)
}
#--------------------------------------------
fw0 <- function(num, digs=NULL, mkSeq=TRUE) {
## formats digits with leading 0's.
## num should be an integer or range of integers.
## if mkSeq=T, then a num of length 1 will be expanded to seq(1, num).
# TODO 1: put more error check
if (is.list(num))
return(lapply(num, fw0, digs=digs, mkSeq=mkSeq))
if (!is.vector(num)) {
stop("num should be integer or vector")
}
# convert strings to numbers
num <- as.numeric(num)
# If num is a single number and mkSeq is T, expand to seq(1, num)
if(mkSeq && !length(num)>1)
num <- (1:num)
# number of digits is that of largest number or digs, whichever is max
digs <- max(nchar(max(abs(num))), digs)
# if there are a mix of neg & pos numbers, add a space for pos numbs
posSpace <- ifelse(sign(max(num)) != sign(min(num)), " ", "")
# return: paste appropriate 0's and preface neg/pos mark
sapply(num, function(x) ifelse(x<0,
paste0("-", paste0(rep(0, max(0, digs-nchar(abs(x)))), collapse=""), abs(x)),
paste0(posSpace, paste0(rep(0, max(0, digs-nchar(abs(x)))), collapse=""), x)
))
}
#-----------------------------------------------

You can use dput for this.
dput(dataset, "dataset.txt")
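dput() writes R source text, and dget() (also in base R) evaluates that text back into an object, so the list column survives the round trip. A minimal sketch using a temporary file and a reduced, hypothetical version of the dataset:

```r
dataset <- data.frame(a = 1:2)
dataset$l <- list(c("a", "b"), c(2, 3, 4))

path <- tempfile(fileext = ".txt")
dput(dataset, path)      # plain-text R representation of the object
restored <- dget(path)   # parse and evaluate it again

str(restored$l)          # still a list, with the original element types
```

The file is readable R code rather than a delimited table, so it is convenient for R but not for spreadsheets.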

You can also use save(), which writes a binary file that load() restores:
save(dataset, file="dataset.RData")

The answer provided by @Ananda is excellent; however, I ran into an issue when I had a data frame with two columns that were lists.
dataset<-data.frame(a=c(1,2,3,4),b=c('a','b','c','d'), c=c("HI","DD","gg","ff"))
g=list(c("a","b"),c(2,3,4), c(44,33,11,22),c("chr","ID","i","II"))
dataset$l<-g
dataset$l2<-g
dataset
a b c l l2
1 1 a HI a, b a, b
2 2 b DD 2, 3, 4 2, 3, 4
3 3 c gg 44, 33, 11, 22 44, 33, 11, 22
4 4 d ff chr, ID, i, II chr, ID, i, II
Using the original answer, both list columns contain the concatenated contents of both columns.
a b c l l2
1 1 a HI a, b, a, b a, b, a, b
2 2 b DD 2, 3, 4, 2, 3, 4 2, 3, 4, 2, 3, 4
3 3 c gg 44, 33, 11, 22, 44, 33, 11, 22 44, 33, 11, 22, 44, 33, 11, 22
4 4 d ff chr, ID, i, II, chr, ID, i, II chr, ID, i, II, chr, ID, i, II
Instead, try this modified version:
dataset2 <- dataset # make a copy just to be on the safe side
dataset2[sapply(dataset2, is.list)] <-
sapply(dataset2[sapply(dataset2, is.list)],
function(x)sapply(x, function(y) paste(unlist(y),collapse=", ") ) )
dataset2
a b c l l2
1 1 a HI a, b a, b
2 2 b DD 2, 3, 4 2, 3, 4
3 3 c gg 44, 33, 11, 22 44, 33, 11, 22
4 4 d ff chr, ID, i, II chr, ID, i, II

I stumbled across this and while there are a lot of great answers, I ended up doing something else. Sharing for posterity.
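library(dplyr) # across() needs dplyr >= 1.0.0
library(readr) # provides write_csv()
flatten_list = function(x){
if (typeof(x) != "list") {
return(x)
}
# collapse each list element into a single "a | b | c" string
sapply(x, function(y) paste(y, collapse = " | "))
}
data %>%
mutate(across(everything(), flatten_list)) %>%
write_csv("data.csv")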

Interpolate NA values

I have two sets of samples whose sampling times are independent of each other. I would like to merge them and calculate the missing values
for the times where I do not have values for both. Simplified example:
A <- cbind(time=c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
Avalue=c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- cbind(time=c(15, 30, 45, 60), Bvalue=c(100, 200, 300, 400))
C <- merge(A,B, all=TRUE)
time Avalue Bvalue
1 10 1 NA
2 15 NA 100
3 20 2 NA
4 30 3 200
5 40 2 NA
6 45 NA 300
7 50 1 NA
8 60 2 400
9 70 3 NA
10 80 2 NA
11 90 1 NA
12 100 2 NA
By assuming linear change between each sample, it is possible to calculate the missing NA values.
Intuitively it is easy to see that the A value at time 15 and 45 should be 1.5. But a proper calculation for B
for instance at time 20 would be
100 + (20 - 15) * (200 - 100) / (30 - 15)
which equals 133.33333.
The first parenthesis being the time between estimate time and the last sample available.
The second parenthesis being the difference between the nearest samples.
The third parenthesis being the time between the nearest samples.
How can I use R to calculate the NA values?

Using the zoo package:
library(zoo)
Cz <- zoo(C)
index(Cz) <- Cz[,1]
Cz_approx <- na.approx(Cz)
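For comparison, the same linear interpolation can be sketched in base R with approx(), which makes the arithmetic from the question easy to check. The helper name `fill_linear` is made up for this example. With rule = 1 (the default), times outside the sampled range stay NA rather than being extrapolated.

```r
A <- cbind(time = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
           Avalue = c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- cbind(time = c(15, 30, 45, 60), Bvalue = c(100, 200, 300, 400))
C <- merge(A, B, all = TRUE)

# Interpolate v at every time point, using only the non-missing samples
fill_linear <- function(v, time) {
  ok <- !is.na(v)
  approx(x = time[ok], y = v[ok], xout = time, rule = 1)$y
}

C$Avalue <- fill_linear(C$Avalue, C$time)
C$Bvalue <- fill_linear(C$Bvalue, C$time)
C[C$time == 20, ]  # Bvalue is 100 + (20 - 15) * (200 - 100) / (30 - 15)
```

Bvalue remains NA at time 10 and after time 60, because B has no samples on both sides of those times.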

The proper way to do this statistically and still get valid confidence intervals is to use Multiple Imputation. See Rubin's classic book, and there's an excellent R package for this (mi).

An ugly and probably inefficient Base R solution:
# Data provided:
A <- cbind(time=c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
Avalue=c(1, 2, 3, 2, 1, 2, 3, 2, 1, 2))
B <- cbind(time=c(15, 30, 45, 60), Bvalue=c(100, 200, 300, 400))
C <- merge(A,B, all=TRUE)
# Scalar valued at the minimum time difference: -> min_time_diff
min_time_diff <- min(diff(C$time))
# Adjust frequency of the series to hold all steps in range: -> df
df <- merge(C,
data.frame(time = seq(min_time_diff,
max(C$time),
by = min_time_diff)),
by = "time",
all = TRUE)
# Linear interpolation function handling ties,
# returns interpolated vector the same length
# as the input vector: -> vector
l_interp_vec <- function(na_vec){
approx(x = na_vec,
method = "linear",
ties = "constant",
n = length(na_vec))$y
}
# Applied to a dataframe, replacing NA values
# in each of the numeric vectors,
# with interpolated values.
# input is dataframe: -> dataframe()
interped_df <- data.frame(lapply(df, function(x){
if(is.numeric(x)){
# Store a scalar of min row where x isn't NA: -> min_non_na
min_non_na <- min(which(!(is.na(x))))
# Store a scalar of max row where x isn't NA: -> max_non_na
max_non_na <- max(which(!(is.na(x))))
# Store scalar of the number of rows needed to impute prior
# to first NA value: -> ru_lower
ru_lower <- ifelse(min_non_na > 1, min_non_na - 1, min_non_na)
# Store scalar of the number of rows needed to impute after
# the last non-NA value: -> ru_upper
ru_upper <- ifelse(max_non_na == length(x),
length(x) - 1,
(length(x) - (max_non_na + 1)))
# Store a vector of the ramp-up values: -> ramp_up
ramp_up <- as.numeric(
cumsum(rep(x[min_non_na]/(min_non_na), ru_lower))
)
# Apply the interpolation function on vector "x": -> y
y <- as.numeric(l_interp_vec(as.numeric(x[min_non_na:max_non_na])))
# Create a vector that combines the ramp_up vector
# and y if the first NA is at row 1: -> z
if(length(ramp_up) > 1 & max_non_na != length(x)){
# Create a vector interpolations if there are
# multiple NA values after the last value: -> lower_l_int
lower_l_int <- as.numeric(cumsum(rep(mean(diff(c(ramp_up, y))),
ru_upper+1)) +
as.numeric(x[max_non_na]))
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(ramp_up, y, lower_l_int))
}else if(length(ramp_up) > 1 & max_non_na == length(x)){
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(ramp_up, y))
}else if(min_non_na == 1 & max_non_na != length(x)){
# Create a vector interpolations if there are
# multiple NA values after the last value: -> lower_l_int
lower_l_int <- as.numeric(cumsum(rep(mean(diff(c(ramp_up, y))),
ru_upper+1)) +
as.numeric(x[max_non_na]))
# Store the linear interpolations in a vector: -> z
z <- as.numeric(c(y, lower_l_int))
}else{
# Store the linear interpolations in a vector: -> z
z <- as.numeric(y)
}
# Interpolate between points in x, return new x:
return(as.numeric(ifelse(is.na(x), z, x)))
}else{
x
}
}
)
)
# Subset interped df to only contain
# the time values in C, store a data frame: -> int_df_subset
int_df_subset <- interped_df[interped_df$time %in% C$time,]
