For example, suppose you have a function that applies some dplyr functions, but you can't expect the datasets passed to it to have the same column names.
For a simplified example of what I mean, say you had a data frame, arizona.trees:
arizona.trees
group arizona.redwoods arizona.oaks
A 23 11
A 24 12
B 9 8
B 10 7
C 88 22
and another very similar data frame, california.trees:
california.trees
group california.redwoods california.oaks
A 25 50
A 11 33
B 90 5
B 77 3
C 90 35
Suppose you want a function that returns the mean of a given type of tree for the given groups (A, B, ... Z) and that works for both of these data frames.
foo <- function(dataset, group1, group2, tree.type) {
column.name <- colnames(dataset[2])
result <- filter(dataset, group %in% c(group1, group2)) %>%
select(group, contains(tree.type)) %>%
group_by(group) %>%
summarize("mean" = mean(column.name))
return(result)
}
A desired output for a call of foo(california.trees, A, B, redwoods) would be:
result
mean
A 18
B 83.5
For some reason, the implementation of foo() above doesn't work. The function seems to compute the mean of the string stored in column.name, rather than retrieving that column and passing it to mean(). I'm not sure how to avoid this. It may also matter that the intermediate data frame inside the pipe can't be referenced directly.
Why is this? Is there some alternative implementation that would work?
We can use the quosure-based solution from the devel version of dplyr (soon to be released as 0.6.0):
foo <- function(dataset, group1, group2, tree.type){
group1 <- quo_name(enquo(group1))
group2 <- quo_name(enquo(group2))
colN <- rlang::parse_quosure(names(dataset)[2])
tree.type <- quo_name(enquo(tree.type))
dataset %>%
filter(group %in% c(group1, group2)) %>%
select(group, contains(tree.type)) %>%
group_by(group) %>%
summarise(mean = mean(UQ(colN)))
}
foo(california.trees, A, B, redwoods)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 18.0
#2 B 83.5
foo(arizona.trees, A, B, redwoods)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 23.5
#2 B 9.5
enquo takes the input arguments and converts them to quosures; quo_name then converts each quosure to a string for use with %in%. The second column name is converted from a string to a quosure with parse_quosure and then unquoted (UQ or !!) for evaluation within summarise.
NOTE: The above solution selects the column by position (the second column, as in the OP's code), so it may not work for other columns. Instead, we can match 'tree.type' and take the 'mean' of the matching columns:
foo1 <- function(dataset, group1, group2, tree.type){
group1 <- quo_name(enquo(group1))
group2 <- quo_name(enquo(group2))
tree.type <- quo_name(enquo(tree.type))
dataset %>%
filter(group %in% c(group1, group2)) %>%
select(group, contains(tree.type)) %>%
group_by(group) %>%
summarise_at(vars(contains(tree.type)), funs(mean = mean(.)))
}
The function can be tested for different columns in the two datasets
foo1(arizona.trees, A, B, oaks)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 11.5
#2 B 7.5
foo1(arizona.trees, A, B, redwood)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 23.5
#2 B 9.5
foo1(california.trees, A, B, redwood)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 18.0
#2 B 83.5
foo1(california.trees, A, B, oaks)
# A tibble: 2 × 2
# group mean
# <chr> <dbl>
#1 A 41.5
#2 B 4.0
data
arizona.trees <- structure(list(group = c("A", "A", "B", "B", "C"),
arizona.redwoods = c(23L,
24L, 9L, 10L, 88L), arizona.oaks = c(11L, 12L, 8L, 7L, 22L)),
.Names = c("group",
"arizona.redwoods", "arizona.oaks"), class = "data.frame",
row.names = c(NA, -5L))
california.trees <- structure(list(group = c("A", "A", "B", "B", "C"),
california.redwoods = c(25L,
11L, 90L, 77L, 90L), california.oaks = c(50L, 33L, 5L, 3L, 35L
)), .Names = c("group", "california.redwoods", "california.oaks"
), class = "data.frame", row.names = c(NA, -5L))
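On current dplyr releases the same idea can be written without parse_quosure() and UQ(). The following is a minimal sketch rather than the answer's original method. It assumes the groups and the tree type are passed as strings; the function name foo2 and its groups argument are hypothetical. The .data pronoun looks up a column by the name stored in a string.

```r
library(dplyr)

california.trees <- data.frame(
  group = c("A", "A", "B", "B", "C"),
  california.redwoods = c(25L, 11L, 90L, 77L, 90L),
  california.oaks = c(50L, 33L, 5L, 3L, 35L)
)

# foo2 and its `groups` argument are hypothetical names for this sketch
foo2 <- function(dataset, groups, tree.type) {
  # find the one column whose name contains the tree type
  col <- grep(tree.type, names(dataset), fixed = TRUE, value = TRUE)
  stopifnot(length(col) == 1)
  dataset %>%
    filter(group %in% groups) %>%
    group_by(group) %>%
    # .data[[col]] retrieves the column named by the string in `col`
    summarise(mean = mean(.data[[col]]))
}

res <- foo2(california.trees, c("A", "B"), "redwoods")
res
```

Passing strings avoids quoting and unquoting entirely, at the cost of the bare-name call style foo(california.trees, A, B, redwoods) used above.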
I'm working with a dataframe with the following structure:
ID origin value1 value2
1 A 100 50
1 A 200 100
2 B 10 2
2 B 150 30
So each row can have different origins and I need to make some calculations by ID, but the value variable I'm using depends on the origin variable. So if origin == 'A' I should use value1 and if it's B I should use value2. My code without taking this last condition into account looks like this:
df2 <- df %>%
group_by(ID) %>%
mutate(mean_value = mean(value1, na.rm = TRUE),
sd_value = sd(value1, na.rm = TRUE),
median_value = median(value1, na.rm = TRUE),
cv_value = sd_value/mean_value,
p25_value = quantile(value1, 0.25, na.rm = TRUE),
p75_value = quantile(value1, 0.75, na.rm = TRUE))
I know I could add an if_else statement to each line, but I think my code will lose some readability (In my actual data there's multiple origins, which makes this a bit more cumbersome). So, I was thinking of creating a custom function, maybe using map or maybe something using group_by origin, but I'm not finding a good way to implement these options. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):
ID origin value1 value2 mean_value
1 A 100 50 150
1 A 200 100 150
2 B 10 2 16
2 B 150 30 16
So the first mean value is (100 + 200) / 2 (from value1) and the second is (30 + 2) / 2 (from value2).
Thanks!
We could create a temporary column first and then compute the mean afterwards. This way, we need to use ifelse/case_when only once.
library(dplyr)
df %>%
mutate(valuenew = case_when(origin == 'A' ~ value1,
TRUE ~ value2)) %>%
group_by(ID) %>%
mutate(mean_value = mean(valuenew, na.rm = TRUE), .keep = "unused") %>%
ungroup
-output
# A tibble: 4 × 5
ID origin value1 value2 mean_value
<int> <chr> <int> <int> <dbl>
1 1 A 100 50 150
2 1 A 200 100 150
3 2 B 10 2 16
4 2 B 150 30 16
data
df <- structure(list(ID = c(1L, 1L, 2L, 2L), origin = c("A", "A", "B",
"B"), value1 = c(100L, 200L, 10L, 150L), value2 = c(50L, 100L,
2L, 30L)), class = "data.frame", row.names = c(NA, -4L))
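When there are many origins, the temporary column can also be built from a lookup vector instead of one case_when branch per origin. This is a base R sketch under that assumption; the origin_col lookup and the picked variable are hypothetical names, and only the A/B mapping comes from the question.

```r
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which value column each origin should use
origin_col <- c(A = "value1", B = "value2")

# Pick, for every row, the value from the column assigned to its origin
value_cols <- unique(unname(origin_col))
m <- as.matrix(df[value_cols])
picked <- as.numeric(
  m[cbind(seq_len(nrow(df)), match(origin_col[df$origin], value_cols))]
)

# Per-ID statistics, repeated on every row of the group (like mutate)
df$mean_value <- ave(picked, df$ID, FUN = function(x) mean(x, na.rm = TRUE))
df$sd_value   <- ave(picked, df$ID, FUN = function(x) sd(x, na.rm = TRUE))
df$cv_value   <- df$sd_value / df$mean_value
df
```

Adding another origin then only requires one more entry in origin_col.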
I want to extract the index of the minimum number for every Group
Group <- c("A","A","A","A","A","B","B","C","C","C","C")
Number <- c(12,45,15,65,54,21,23,12,3,5,6)
data.frame(Group,Number)
Group Number
1 A 12
2 A 45
3 A 15
4 A 65
5 A 54
6 B 21
7 B 23
8 C 12
9 C 3
10 C 5
11 C 6
The result should be a vector that contains the indices:
Answer
vector <- c(1,6,9)
Create a sequence column, grouped by 'Group', summarise by returning the corresponding row number based on the index of min value of 'Number' (which.min) and pull the column as a vector
library(dplyr)
df1 %>%
mutate(rn = row_number()) %>%
group_by(Group) %>%
summarise(n = rn[which.min(Number)]) %>%
pull(n)
#[1] 1 6 9
data
df1 <- structure(list(Group = c("A", "A", "A", "A", "A", "B", "B", "C",
"C", "C", "C"), Number = c(12L, 45L, 15L, 65L, 54L, 21L, 23L,
12L, 3L, 5L, 6L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11"))
Does this work for you?
library(dplyr)
df %>%
mutate(row_n = row_number()) %>%
group_by(Group) %>%
slice_min(Number)
# A tibble: 3 x 3
# Groups: Group [3]
Group Number row_n
<chr> <dbl> <int>
1 A 12 1
2 B 12 7
3 C 3 8
The row numbers are in column row_n. If you want only the row numbers in the output, add %>% ungroup() %>% select(-c(1:2)) like so:
df %>%
mutate(row_n = row_number()) %>%
group_by(Group) %>%
slice_min(Number) %>%
ungroup() %>%
select(-c(1:2))
# A tibble: 3 x 1
row_n
<int>
1 1
2 7
3 8
Data:
Group <- c("A","A","A","A","A","B","B","C","C","C","C")
Number <- c(12,45,65,54,21,23,12,3,5,6,34)
df <- data.frame(Group,Number)
This function returns the index i of the smallest value in v
FUN = function(v, i) i[which.min(v)]
Here are the values by group
v = split(df$Number, df$Group)
and the index into the original data.frame by group
i = split(seq_along(df$Number), df$Group)
Apply our function to each group
mapply(FUN, v, i)
In one go:
FUN = function(v, i) i[which.min(v)]
v = split(df$Number, df$Group)
i = split(seq_along(df$Number), df$Group)
mapply(FUN, v, i)
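For reference, this is how the approach behaves on the 11-row data printed in the question, where the per-group minima sit at rows 1, 6 and 9. The data frame is rebuilt here so the snippet is self-contained; unname() is added only to drop the group labels.

```r
df <- data.frame(
  Group = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Number = c(12, 45, 15, 65, 54, 21, 23, 12, 3, 5, 6)
)

FUN <- function(v, i) i[which.min(v)]
v <- split(df$Number, df$Group)             # values by group
i <- split(seq_along(df$Number), df$Group)  # original row indices by group

idx <- mapply(FUN, v, i)  # named by group: A = 1, B = 6, C = 9
unname(idx)
```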
I'm trying to merge two data frames together which are related to each other via a specific variable named patient. The second data frame has multiple entries for the same patient column. I don't want to create duplicate patient entries upon merging, but I want to retain unique information in the second data frame by concatenating the values under one column.
I tried manually concatenating certain variables using group_by which works. I have several variables, however, and manually specifying all of them is not feasible
I can also concatenate every variable in the data frame by using dplyr as seen below. The problem in the second case is that duplicate values are also concatenated making the data frame unnecessarily big and difficult to deal with. Please see the reprex below.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- data.frame(patient=c("a", "b", "c"),
var1 = 1:3,
var2=11:13)
df1
#> patient var1 var2
#> 1 a 1 11
#> 2 b 2 12
#> 3 c 3 13
df2 <- data.frame(patient=c("a","a", "b", "b", "c", "c" ),
treatment= rep(c("drug1", "drug2"), 3),
time= rep(c("time1", "time2"), 3),
var3= "constant")
df2
#> patient treatment time var3
#> 1 a drug1 time1 constant
#> 2 a drug2 time2 constant
#> 3 b drug1 time1 constant
#> 4 b drug2 time2 constant
#> 5 c drug1 time1 constant
#> 6 c drug2 time2 constant
df_merged <- left_join(df1, df2)
#> Joining, by = "patient"
# Don't want duplicates like this
df_merged
#> patient var1 var2 treatment time var3
#> 1 a 1 11 drug1 time1 constant
#> 2 a 1 11 drug2 time2 constant
#> 3 b 2 12 drug1 time1 constant
#> 4 b 2 12 drug2 time2 constant
#> 5 c 3 13 drug1 time1 constant
#> 6 c 3 13 drug2 time2 constant
df_merged2 <- df_merged %>%
group_by(patient) %>%
mutate(treatment = paste(treatment, collapse = "_"),
time=paste(time, collapse = "_")) %>%
filter(!duplicated(patient))
# I can manually edit a few variables like this
df_merged2
#> # A tibble: 3 x 6
#> # Groups: patient [3]
#> patient var1 var2 treatment time var3
#> <fct> <int> <int> <chr> <chr> <fct>
#> 1 a 1 11 drug1_drug2 time1_time2 constant
#> 2 b 2 12 drug1_drug2 time1_time2 constant
#> 3 c 3 13 drug1_drug2 time1_time2 constant
df_merged3 <- df_merged %>%
group_by(patient) %>%
mutate_at(vars(-group_cols()), .funs = ~paste(., collapse ="_")) %>%
filter(!duplicated(patient))
# I have many variables I can't specify manually
# I can create this merged data frame, but I don't want to
# concatenate duplicated values such as var1, var2, and var3
df_merged3
#> # A tibble: 3 x 6
#> # Groups: patient [3]
#> patient var1 var2 treatment time var3
#> <fct> <chr> <chr> <chr> <chr> <chr>
#> 1 a 1_1 11_11 drug1_drug2 time1_time2 constant_constant
#> 2 b 2_2 12_12 drug1_drug2 time1_time2 constant_constant
#> 3 c 3_3 13_13 drug1_drug2 time1_time2 constant_constant
Created on 2019-10-23 by the reprex package (v0.3.0)
I'd like to see if there is a way of concatenating variables containing only unique values to retain information from the second data frame without duplicating the rows in the df_merged.
I would be happy to hear if you have recommendations other than dplyr. A data.table solution may also be suitable for me as well, since my real data frames are quite large.
Thanks!
We can use summarise_at and unique
library(dplyr)
df_merged %>%
group_by(patient) %>%
summarise_at(vars(-group_cols()), .funs = ~paste(unique(.), collapse ="_"))
Or we can do the merge/join directly, instead of adding an intermediate data frame to the global environment.
left_join(df1,
df2 %>% group_by(patient) %>%
summarise_at(vars(-group_cols()), .funs = ~paste(unique(.), collapse ="_")) %>%
ungroup()
)
Joining, by = "patient"
patient var1 var2 treatment time var3
1 a 1 11 drug1_drug2 time1_time2 constant
2 b 2 12 drug1_drug2 time1_time2 constant
3 c 3 13 drug1_drug2 time1_time2 constant
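Since the question also asked about options other than dplyr, the same collapse-then-join idea can be sketched in base R with aggregate() and merge(). The helper name collapse_unique is hypothetical, and this has not been benchmarked on large data.

```r
df1 <- data.frame(patient = c("a", "b", "c"),
                  var1 = 1:3,
                  var2 = 11:13,
                  stringsAsFactors = FALSE)
df2 <- data.frame(patient = c("a", "a", "b", "b", "c", "c"),
                  treatment = rep(c("drug1", "drug2"), 3),
                  time = rep(c("time1", "time2"), 3),
                  var3 = "constant",
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep each distinct value once, joined by "_"
collapse_unique <- function(x) paste(unique(x), collapse = "_")

# One row per patient in df2, then a left join onto df1
df2_collapsed <- aggregate(df2[setdiff(names(df2), "patient")],
                           by = df2["patient"],
                           FUN = collapse_unique)
df_merged <- merge(df1, df2_collapsed, by = "patient", all.x = TRUE)
df_merged
```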
Update
#Here is a toy example to experiment with; add a browser() call inside the function to see how it works inside Reduce,
#also see ?Reduce for more info
paste_mod <- function(x) Reduce(function(u, v){
u <- ifelse(!grepl('_',u) & is.na(u),'.',u)
v <- ifelse(is.na(v),'.',v)
if(v=='.' | !grepl(v,u)) paste0(u,'_',v) else u
}, x)
paste_mod(c("drug1",NA,NA,"drug2","drug1","drug2"))
[1] "drug1_._._drug2"
paste_mod(c(NA,NA,"drug2","drug1","drug2"))
[1] "._._drug2_drug1"
#replace NA with . then apply Reduce
df2 %>%
mutate_if(is.factor,as.character) %>% mutate_all(~replace(.,is.na(.),'.')) %>%
group_by(patient) %>%
summarise_at(vars(-group_cols()), .funs = ~Reduce(function(u, v) if(v=='.' | !grepl(v,u)) paste0(u,'_',v) else u, .)) %>%
ungroup()
# A tibble: 2 x 4
patient treatment time var3
<chr> <chr> <chr> <chr>
1 a drug1_._._drug2 time1_time2 constant
2 c drug1_drug2 time1_time2 constant
New df2 for testing the updated solution
df2 <- structure(list(patient = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("a",
"c"), class = "factor"), treatment = structure(c(1L, NA, NA,
2L, 1L, 2L), .Label = c("drug1", "drug2"), class = "factor"),
time = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("time1",
"time2"), class = "factor"), var3 = structure(c(1L, 1L, 1L,
1L, 1L, 1L), class = "factor", .Label = "constant")), class = "data.frame", row.names = c(NA,
-6L))
Setup:
I have a tibble (named data) with an embedded list of data.frames.
df1 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("yes", 1L, 0L),
stringsAsFactors = F)
df2 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("no", 1L, 1L),
stringsAsFactors = F)
df3 <- data.frame(name = c("columnName1","columnName2","columnName3"),
value = c("yes", 0L, 0L),
stringsAsFactors = F)
responses = list(df1,
df2,
df3)
data <- tibble(ids = c(23L, 42L, 84L),
responses = responses)
Note this is a simplified example of the data. The original data is from a flat json file and loaded with jsonlite::stream_in() function.
Objective:
My goal is to convert this tibble to another tibble where the embedded data.frames are spread (transposed) as columns; for example, my goal tibble is:
goal <- tibble(ids = c(23L, 42L, 84L),
columnName1 = c("yes","no","yes"),
columnName2 = c(1L, 1L, 0L),
columnName3 = c(0L, 1L, 0L))
# goal tibble
> goal
# A tibble: 3 x 4
ids columnName1 columnName2 columnName3
<int> <chr> <int> <int>
1 23 yes 1 0
2 42 no 1 1
3 84 yes 0 0
My inelegant solution:
Use dplyr::bind_rows() and tidyr::spread():
rdf <- dplyr::bind_rows(data$responses, .id = "id") %>%
tidyr::spread(key = "name", -id)
goal2 <- cbind(ids = data$ids, rdf[,-1]) %>%
as.tibble()
Comparing my solution to the goal:
# produced tibble
> goal2
# A tibble: 3 x 4
ids columnName1 columnName2 columnName3
* <int> <chr> <chr> <chr>
1 23 yes 1 0
2 42 no 1 1
3 84 yes 0 0
Overall, my solution works but has a few problems:
I don't know how to pass the unique ids through bind_rows() which forces me to create a dummy id ("id") which can't match to the original id ("ids"). This forces me to use a cbind() (which I don't like) and manually remove the dummy id (using -1 slicing on rdf).
The column types are lost, as my approach converts the integer columns to characters.
Any suggestions on how to improve my solution (especially using tidyverse based packages like tidyjson or tidyr)?
We can loop over the 'responses' column with map, spread each data.frame to 'wide' format with convert = TRUE so that the column types are converted, store the result as a column with transmute, and then unnest
library(tidyverse)
data %>%
transmute(ids, ind = map(responses, ~.x %>%
spread(name, value, convert = TRUE))) %>%
unnest(ind)
# A tibble: 3 x 4
# ids columnName1 columnName2 columnName3
# <int> <chr> <int> <int>
#1 23 yes 1 0
#2 42 no 1 1
#3 84 yes 0 0
Or using the OP's code, we set the names of the list with 'ids' column, do the bind_rows and then spread
bind_rows(setNames(data$responses, data$ids), .id = 'ids') %>%
spread(name, value, convert = TRUE)
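The reshaping can also be sketched in base R, which makes each step explicit: turn every name/value data.frame into a one-row data.frame, stack the rows, and then let type.convert() recover the column types. This is an illustration only; it assumes the nested column is available as a plain list (responses) next to an ids vector, and make_df is a hypothetical helper for building the example data.

```r
ids <- c(23L, 42L, 84L)

# Hypothetical helper to rebuild the example's name/value data.frames
make_df <- function(v) {
  data.frame(name = c("columnName1", "columnName2", "columnName3"),
             value = v,
             stringsAsFactors = FALSE)
}
responses <- list(make_df(c("yes", "1", "0")),
                  make_df(c("no", "1", "1")),
                  make_df(c("yes", "0", "0")))

# One row per response, with the `name` entries as column names
rows <- lapply(responses, function(r) {
  as.data.frame(as.list(setNames(r$value, r$name)), stringsAsFactors = FALSE)
})
wide <- do.call(rbind, rows)

# Everything is character at this point; recover integer columns
wide[] <- lapply(wide, type.convert, as.is = TRUE)

goal <- cbind(ids = ids, wide)
goal
```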
I have two data frames. dfOne is made like this:
X Y Z T J
3 4 5 6 1
1 2 3 4 1
5 1 2 5 1
and dfTwo is made like this
C.1 C.2
X Z
Y T
I want to obtain a new dataframe containing only the rows where the X, Y, Z and T values are all simultaneously greater than their specific thresholds.
Example. I need simultaneously (in the same row):
X, Y > 2
Z, T > 4
I need to use the second data frame to reach my objective, I expect something like:
dfTwo$C.1>2
so the result would be a new dataframe with this structure:
X Y Z T J
3 4 5 6 1
How could I do it?
Here is a base R method with Map and Reduce.
# build lookup table of thresholds relative to variable name
vals <- setNames(c(2, 2, 4, 4), unlist(dat2))
# subset data.frame
dat[Reduce("&", Map(">", dat[names(vals)], vals)), ]
X Y Z T J
1 3 4 5 6 1
Here, Map returns a list of length 4 with logical variables corresponding to each comparison. This list is passed to Reduce which returns a single logical vector with length corresponding to the number of rows in the data.frame, dat. This logical vector is used to subset dat.
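The threshold vector does not have to be typed out per column. As a sketch, assuming one threshold per column of the second data frame (the thresholds vector below is a hypothetical input holding the question's values 2 and 4), stack() can build the lookup before the same Map/Reduce step:

```r
dfOne <- data.frame(X = c(3, 1, 5), Y = c(4, 2, 1), Z = c(5, 3, 2),
                    T = c(6, 4, 5), J = c(1, 1, 1))
dfTwo <- data.frame(C.1 = c("X", "Y"), C.2 = c("Z", "T"),
                    stringsAsFactors = FALSE)

# Hypothetical input: one threshold per column of dfTwo
thresholds <- c(C.1 = 2, C.2 = 4)

# stack() gives one row per (variable name, dfTwo column) pair
lookup <- stack(dfTwo)
vals <- setNames(thresholds[as.character(lookup$ind)], lookup$values)

# One logical vector per variable, combined row-wise with &
keep <- Reduce(`&`, Map(`>`, dfOne[names(vals)], vals))
result <- dfOne[keep, ]
result
```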
data
dat <-
structure(list(X = c(3L, 1L, 5L), Y = c(4L, 2L, 1L), Z = c(5L,
3L, 2L), T = c(6L, 4L, 5L), J = c(1L, 1L, 1L)), .Names = c("X",
"Y", "Z", "T", "J"), class = "data.frame", row.names = c(NA,
-3L))
dat2 <-
structure(list(C.1 = structure(1:2, .Label = c("X", "Y"), class = "factor"),
C.2 = structure(c(2L, 1L), .Label = c("T", "Z"), class = "factor")), .Names = c("C.1",
"C.2"), class = "data.frame", row.names = c(NA, -2L))
We can use the purrr package
Here is the input data.
# Data frame from lmo's solution
dat <-
structure(list(X = c(3L, 1L, 5L), Y = c(4L, 2L, 1L), Z = c(5L,
3L, 2L), T = c(6L, 4L, 5L), J = c(1L, 1L, 1L)), .Names = c("X",
"Y", "Z", "T", "J"), class = "data.frame", row.names = c(NA,
-3L))
# A numeric vector to show the threshold values
# Notice that columns without any requirements need NA
vals <- c(X = 2, Y = 2, Z = 4, T = 4, J = NA)
Here is the implementation
library(purrr)
map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) %>% na.omit()
# A tibble: 1 x 5
X Y Z T J
<int> <int> <int> <int> <int>
1 3 4 5 6 1
map2_dfc loops through each column in dat and each value in vals, pair by pair, applying the given function. ~ifelse(.x > .y | is.na(.y), .x, NA) means that if the number in a column is larger than the corresponding value in vals, or that value in vals is NA, the output is the original value from the column. Otherwise, the value is replaced with NA. The output of map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) is a data frame with NA values in the rows where a condition is not met. Finally, na.omit removes those rows.
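To make the intermediate result visible, the same masking step can be reproduced with base R's Map(). This is an equivalent sketch rather than the purrr code itself, and masked is a hypothetical variable name.

```r
dat <- data.frame(X = c(3L, 1L, 5L), Y = c(4L, 2L, 1L), Z = c(5L, 3L, 2L),
                  T = c(6L, 4L, 5L), J = c(1L, 1L, 1L))
vals <- c(X = 2, Y = 2, Z = 4, T = 4, J = NA)

# Keep a value only if it exceeds its threshold (or has no threshold)
masked <- as.data.frame(
  Map(function(x, y) ifelse(x > y | is.na(y), x, NA), dat, vals)
)
masked               # rows 2 and 3 now contain NA where a condition failed

result <- na.omit(masked)  # only the first row survives
result
```

Note that na.omit() would also drop rows that already contained NA in the original data, so this approach assumes dat has no missing values.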
Update
Here I demonstrate how to convert the dfTwo data frame to the vals vector in my example.
First, let's create the dfTwo data frame.
dfTwo <- read.table(text = "C.1 C.2
X Z
Y T",
header = TRUE, stringsAsFactors = FALSE)
dfTwo
C.1 C.2
1 X Z
2 Y T
To complete the task, I load the dplyr and tidyr packages.
library(dplyr)
library(tidyr)
Now I begin the transformation of dfTwo. The first step is to use stack function to convert the format.
dfTwo2 <- dfTwo %>%
stack() %>%
setNames(c("Col", "Group")) %>%
mutate(Group = as.character(Group))
dfTwo2
Col Group
1 X C.1
2 Y C.1
3 Z C.2
4 T C.2
The second step is to add the threshold information. One way to do this is to create a look-up table showing the association between Group and Value
threshold_df <- data.frame(Group = c("C.1", "C.2"),
Value = c(2, 4),
stringsAsFactors = FALSE)
threshold_df
Group Value
1 C.1 2
2 C.2 4
And then we can use the left_join function to combine the data frame.
dfTwo3 <- dfTwo2 %>% left_join(threshold_df, by = "Group")
dfTwo3
Col Group Value
1 X C.1 2
2 Y C.1 2
3 Z C.2 4
4 T C.2 4
Now it is the third step. Notice that there is a column called J which does not need any threshold. So we need to add this information to dfTwo3. We can use the complete function from tidyr. The following code completes the data frame by adding Col in dat but not in dfTwo3 and NA to the Value.
dfTwo4 <- dfTwo3 %>% complete(Col = colnames(dat))
dfTwo4
# A tibble: 5 x 3
Col Group Value
<chr> <chr> <dbl>
1 J <NA> NA
2 T C.2 4
3 X C.1 2
4 Y C.1 2
5 Z C.2 4
The fourth step is to arrange dfTwo4 in the right order. We can achieve this by turning Col into a factor and assigning the levels based on the order of the column names in dat.
dfTwo5 <- dfTwo4 %>%
mutate(Col = factor(Col, levels = colnames(dat))) %>%
arrange(Col) %>%
mutate(Col = as.character(Col))
dfTwo5
# A tibble: 5 x 3
Col Group Value
<chr> <chr> <dbl>
1 X C.1 2
2 Y C.1 2
3 Z C.2 4
4 T C.2 4
5 J <NA> NA
We are almost there. Now we can create vals from dfTwo5.
vals <- dfTwo5$Value
names(vals) <- dfTwo5$Col
vals
X Y Z T J
2 2 4 4 NA
Now we are ready to use the purrr package to filter the data.
The above is a breakdown of the steps. We can combine all of them into the following code for simplicity.
library(dplyr)
library(tidyr)
threshold_df <- data.frame(Group = c("C.1", "C.2"),
Value = c(2, 4),
stringsAsFactors = FALSE)
dfTwo2 <- dfTwo %>%
stack() %>%
setNames(c("Col", "Group")) %>%
mutate(Group = as.character(Group)) %>%
left_join(threshold_df, by = "Group") %>%
complete(Col = colnames(dat)) %>%
mutate(Col = factor(Col, levels = colnames(dat))) %>%
arrange(Col) %>%
mutate(Col = as.character(Col))
vals <- dfTwo2$Value
names(vals) <- dfTwo2$Col
dfOne[Reduce(intersect, list(which(dfOne["X"] > 2),
which(dfOne["Y"] > 2),
which(dfOne["Z"] > 4),
which(dfOne["T"] > 4))),]
# X Y Z T J
#1 3 4 5 6 1
Or iteratively (so fewer inequalities are tested):
vals = c(X = 2, Y = 2, Z = 4, T = 4) # from @lmo's answer
dfOne[Reduce(intersect, lapply(names(vals), function(x) which(dfOne[x] > vals[x]))),]
# X Y Z T J
#1 3 4 5 6 1
I'm writing this assuming that the second DF is meant to categorize the fields in the first DF. It's way simpler if you don't need to use the second one to define the conditions:
dfNew = dfOne[dfOne$X > 2 & dfOne$Y > 2 & dfOne$Z > 4 & dfOne$T > 4, ]
Or, using dplyr:
library(dplyr)
dfNew = dfOne %>% filter(X > 2 & Y > 2 & Z > 4 & T > 4)
In case that's all you need, I'll save this comment while I poke at the more complicated version of the question.