Exact age matched match with Matchit doesn't work. - r
I am trying to perform a case-control exact matching by age.
My database is composed of 139 eyes of 75 patients divided into 2 group by a dichotomy variable (G6PDcarente= 0/1).
I am trying to perform the matching with the code:
match.it <- matchit(G6PDcarente~age, data = newdata, method="exact",ratio=1,replace=FALSE)
match.it
The problem is that the results are:
Exact Subclasses: 14
Sample sizes:
Control Treated
All 43 85
Matched 31 42
Unmatched 12 43
Why is the sample size of the matched pairs so different?
Should not it be the same for the control and treat matched sample (eg:31-31)?
How can I obtain an exact match on age with the same sample size in the two group?
I have also tried the code:
match.it <- matchit(G6PDcarente~age, data = newdata, method="nearest",exact="age",ratio=1, replace=FALSE)
But I have the following error message:
Error in Ops.data.frame(exact[itert, k], exact[clabels, k]) :
‘!=’ only defined for equally-sized data frames
Inoltre: Warning message:
In matchit2nearest(c(`1` = 0, `2` = 0, `3` = 0, `4` = 0, `5` = 0, :
Fewer control than treated units and matching without replacement. Not all treated units will receive a match. Treated units will be matched in the order specified by m.order: largest
Can someone help me?
Thanks
Here is the code that reproduces a sample of my data:
newdata <- structure(list(NumeroProgressivo = c(43, 44, 137, 138, 129, 130,
65, 111, 148, 149, 35, 36, 83, 84, 37, 38, 127, 128, 160, 161,
75, 76, 53, 54, 119, 120, 109, 110, 57, 58, 39, 51, 52, 29, 30,
71, 72, 154, 155, 77, 78, 1, 2, 61, 62, 158, 101, 102, 27, 28,
73, 103, 104, 121, 122, 152, 153, 107, 108, 45, 46, 81, 82, 139,
140, 59, 60, 95, 96, 33, 34, 91, 92, 26, 49, 50, 79, 6, 63, 64,
15, 16, 31, 32, 143, 144, 69, 70, 89, 90, 41, 42, 17, 18, 67,
68, 115, 116, 150, 151, 97, 98, 93, 94, 135, 136, 55, 56, 131,
132, 162, 163, 21, 22, 23, 24, 156, 157, 133, 166, 174, 175,
164, 165, 172, 173, 176, 177), IDpaziente = c(22, 22, 67, 67,
63, 63, 33, 56, 73, 73, 18, 18, 42, 42, 19, 19, 62, 62, 79, 79,
38, 38, 27, 27, 60, 60, 55, 55, 29, 29, 20, 26, 26, 15, 15, 36,
36, 76, 76, 39, 39, 1, 1, 31, 31, 78, 51, 51, 14, 14, 37, 52,
52, 61, 61, 75, 75, 54, 54, 23, 23, 41, 41, 68, 68, 30, 30, 48,
48, 17, 17, 46, 46, 13, 25, 25, 40, 3, 32, 32, 8, 8, 16, 16,
70, 70, 35, 35, 45, 45, 21, 21, 9, 9, 34, 34, 58, 58, 74, 74,
49, 49, 47, 47, 66, 66, 28, 28, 64, 64, 80, 80, 11, 11, 12, 12,
77, 77, 65, 82, 86, 86, 81, 81, 85, 85, 87, 87), Occhio = c("OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD",
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS",
"OD", "OS", "OD", "OS", "OD", "OS"), G6PDcarente = c(0, 0, 0,
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
age = c(70, 70, 38, 38, 54, 54, 41, 74, 31, 31, 27, 27, 36,
36, 36, 36, 49, 49, 34, 34, 49, 49, 34, 34, 33, 33, 34, 34,
38, 38, 62, 30, 30, 38, 38, 53, 53, 27, 27, 57, 57, 84, 84,
25, 25, 26, 57, 57, 47, 47, 29, 31, 31, 26, 26, 23, 23, 34,
34, 48, 48, 34, 34, 34, 34, 40, 40, 45, 45, 33, 33, 61, 61,
73, 32, 32, 67, 80, 39, 39, 67, 67, 37, 37, 28, 28, 26, 26,
32, 32, 24, 24, 61, 61, 36, 36, 66, 66, 26, 26, 35, 35, 39,
39, 32, 32, 39, 39, 39, 39, 42, 42, 35, 35, 64, 64, 34, 34,
37, 61, 80, 80, 74, 74, 62, 62, 71, 71)), row.names = c(NA,
-128L), class = c("tbl_df", "tbl", "data.frame"))
The number of observations assigned to the Control / Treatment groups is exactly what they should be, since the assignment is based on the values in the G6PDcarente variable.
From the help file ?matchit:
(For the first argument in the function, formula) This argument
takes the usual syntax of R formula, treat ~ x1 + x2, where treat
is a binary treatment indicator and x1 and x2 are the
pre-treatment covariates.
In your case, the formula corresponds to G6PDcarente~age, and the number of observations where G6PDcarente == 1 is different from the number where G6PDcarente == 0.
We can verify that directly with a manual inspection, since the dataset is not very large:
library(dplyr)
library(tidyr)
new.data.check <- newdata %>%
count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
na.omit() # remove NA rows, where a specific age only has G6PDCarente == 0
# OR G6PDCarente == 1, but not both (i.e. unmatched samples)
> new.data.check
# A tibble: 14 x 3
age `0` `1`
<dbl> <int> <int>
1 26 3 4
2 27 2 2
3 31 2 2
4 32 2 4
5 34 6 8
6 37 1 2
7 38 2 4
8 39 2 6
9 49 2 2
10 61 1 4
11 62 2 1
12 67 2 1
13 74 2 1
14 80 2 1
For age values with both G6PDcarente == 0 and == 1, there are 31 observations for which G6PDcarente == 0 and 42 observations for which G6PDcarente == 1:
> colSums(new.data.check)
age 0 1
657 31 42
Not knowing your exact use case, I guess if you really want the same number for treatment vs. control, you can always drop a few observations...
Thanks to #Z.Lin reply I have figured out how to resolve my issues.
Here the code I have used following the instruction of this tutorial:
OCTA.Filtered = as.data.frame(na.omit(OCTA.Filtered))
m.out.test = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, ratio = 1)
test_data = match.data(m.out.test)
ps.sd = sd(test_data$distance)
# matching is performed below using propensity scores given the covariates mentioned below
# caliper = 0.25 times sd of propensity scores (optimal)
m.out = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, caliper = 0.25*ps.sd)
# check the sample sizes (below)
m.out
# Final matched data saved as final_data
final_data = match.data(m.out)
# (here distance = propensity score)
new.data.check <- final_data %>%
+ count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
+ spread(G6PDcarente, n) %>% # create separate columns for G6PDcarente == 0 / == 1
+ na.omit()
> new.data.check
# A tibble: 14 x 3
age `0` `1`
<dbl> <int> <int>
1 26 3 3
2 27 2 2
3 31 2 2
4 32 2 2
5 34 6 6
6 37 1 1
7 38 2 2
8 39 2 2
9 49 2 2
10 61 1 1
11 62 1 1
12 67 1 1
13 74 1 1
14 80 1 1
Related
r loop only returns one dataframe
I have a data structure like so: daynum year1 year2 year3 1jan 1 3 4 2jan 2 4 6 I want to coalesce the column titles to the day numbers into a new column, then stack the first 70 rows of each column into 1 new column. daynum = c("day1", "day2", "day3", "day4") year1 = c(1,2,3,4) year2 = c(5,6,7,8) year3 = c(7,8,9,10) df <- data.frame(daynum, year1, year2, year3) print(df) #create output data outside of loop summerdata = data.frame() for (i in names(df[,-1])){ yeardate = data.frame(paste(i, df$daynum, sep="-")) sdata = data.frame(df[[i]]) sur = sdata[1:70,] sy = yeardate[1:70,] endsummers = cbind(sy,sur) summerdata = rbind(summerdata,endsummers) } colnames(endsummers)=c("yeardate","summertemp") print(endsummers) all that returns in the first year1, and it's storing the numbers as text. output should look like this: yeardate summertemp [1,] "year1-day1" 85 [2,] "year1-day2" 87 [3,] "year1-day3" 79
library(tidyverse) df %>% pivot_longer(-daynum) %>% group_by(name) %>% slice(1:70) %>% ungroup() %>% # only first 70 rows per year unite("daynum", 2:1, sep = "-") %>% arrange(daynum) Result # A tibble: 12 × 2 daynum value <chr> <dbl> 1 year1-day1 1 2 year1-day2 2 3 year1-day3 3 4 year1-day4 4 5 year2-day1 5 6 year2-day2 6 7 year2-day3 7 8 year2-day4 8 9 year3-day1 7 10 year3-day2 8 11 year3-day3 9 12 year3-day4 10
So, there is actually nothing wrong with your loop, you are just changing the names of and printing endsummers, rather than summerdata. Also, to change to numeric, we can just add in as.numeric for sur when using cbind. #create output data outside of loop summerdata = data.frame() for (i in colnames(df[,-1])){ yeardate = data.frame(paste(i, df$daynum, sep="-")) sdata = data.frame(df[[i]]) sur = sdata[1:70,] sy = yeardate[1:70,] endsummers = cbind(sy,as.numeric(sur)) summerdata = rbind(summerdata,endsummers) } colnames(summerdata) <- c("yeardate","summertemp") print(summerdata) Output head(summerdata, 15) yeardate summertemp 1 year1-day1 1 2 year1-day2 2 3 year1-day3 3 4 year1-day4 4 5 year1-day1 5 6 year1-day2 6 7 year1-day3 7 8 year1-day4 8 9 year1-day1 9 10 year1-day2 10 11 year1-day3 11 12 year1-day4 12 13 year1-day1 13 14 year1-day2 14 15 year1-day3 15 Data df <- structure(list(daynum = c("day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4", "day1", "day2", "day3", "day4"), year1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100), year2 = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), year3 = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110)), class = "data.frame", row.names = c(NA, -100L))
tidyverse and dplyr: Conditional replacement of values in a column based on other column [duplicate]
This question already has answers here: Can dplyr package be used for conditional mutating? (5 answers) Closed 2 years ago. I want to mutate a column A4 by A3 but reducing value of A3 by 1 if Total == 63. What am I doing wrong here? tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1)) The complete code with data is here library(tidyverse) tb1 <- structure( list( A1 = c(16, 11, 16, 18, 20, 19, 16, 18, 20, 15, 17, 19, 19, 19, 16, 19, 16, 15, 19, 19, 16, 18, 18, 19, 19, 18, 20, 18, 19, 19, 19, 19, 17, 19, 17, 16, 18, 19, 16, 18, 17, 19, 19, 20, 17, 16, 18, 16, 15, 19, 19, 17, 20, 18, 16, 19, 19, 15, 17, 17, 19, 19, 16, 17, 18, 19, 17, 19, 17, 15, 19, 16, 17 ) , A2 = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8 ) , A3 = c(33, 34, 38, 36, 36, 34, 41, 36, 40, 38, 38, 41, 38, 34, 33, 36, 41, 40, 41, 38, 41, 33, 40, 38, 40, 38, 41, 41, 40, 41, 40, 38, 34, 40, 36, 41, 40, 40, 33, 38, 36, 41, 40, 40, 28, 41, 40, 41, 33, 41, 36, 36, 40, 34, 41, 41, 38, 38, 41, 38, 41, 41, 36, 40, 38, 38, 40, 41, 38, 22, 36, 34, 38 ) , Total = c(57, 53, 62, 62, 64, 61, 65, 62, 68, 61, 63, 68, 65, 61, 57, 63, 65, 63, 68, 65, 65, 59, 66, 65, 67, 64, 69, 67, 67, 68, 67, 65, 59, 67, 61, 65, 66, 67, 57, 64, 61, 68, 67, 68, 53, 65, 66, 65, 56, 68, 63, 61, 68, 60, 65, 68, 65, 61, 66, 63, 68, 68, 60, 65, 64, 65, 65, 68, 63, 45, 63, 58, 63 ) ) , class = "data.frame" , row.names = c(NA, -73L) ) tb1 %>% filter(Total == 63) #> A1 A2 A3 Total #> 1 17 8 38 63 #> 2 19 8 36 63 #> 3 15 8 40 63 #> 4 19 8 36 63 #> 5 17 8 38 63 #> 6 17 8 38 63 #> 7 19 8 36 63 #> 8 17 8 38 63 tb2 <- tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1)) %>% mutate(Total = A1 + A2 + A3) #> Warning: Problem with `mutate()` input `A4`. #> x number of items to replace is not a multiple of replacement length #> ℹ Input `A4` is `replace(A3, Total == 63, A3 - 1)`. tb2 %>% filter(Total == 62) #> A1 A2 A3 Total #> 1 16 8 38 62 #> 2 18 8 36 62 #> 3 18 8 36 62
You are better using ifelse here : library(dplyr) tb1 %>% mutate(A4 = ifelse(Total == 63, A3 -1, A3)) As far as why replace does not work if you check the source code of replace : replace function (x, list, values) { x[list] <- values x } It assigns values to x after subsetting for list. When you use : tb1 %>% mutate(A4 = replace(A3, Total == 63, A3-1)) your values is of length length(tb1$A3) but list is of length sum(tb1$Total == 63) which do not match hence you get the warning of number of items to replace is not a multiple of replacement length, since it tries recycling those values but still the length is unequal. If you want to make replace work you can try : tb1 %>% mutate(A4 = replace(A3, Total == 63, A3[Total == 63] -1)) but again as I mentioned it is easier to just use ifelse here.
For loops with a dataframe in R
I got a dataframe (merged_df) with 52 columns (I show here only the first 4): Row.names node_demand Node 1 Node 2 1 Node 1 3 0 87 2 Node 10 6 58 52 3 Node 11 10 43 70 4 Node 12 18 94 8 5 Node 13 3 44 63 6 Node 14 6 21 98 7 Node 15 20 31 64 8 Node 16 4 35 76 9 Node 17 14 58 52 10 Node 18 11 19 71 11 Node 19 19 62 38 12 Node 2 14 87 0 13 Node 20 15 102 19 14 Node 21 15 16 76 15 Node 22 4 54 51 16 Node 23 13 59 75 17 Node 24 13 73 28 18 Node 25 5 82 33 19 Node 26 16 62 72 20 Node 27 3 59 30 21 Node 28 7 73 32 22 Node 29 14 45 48 23 Node 3 1 43 78 24 Node 30 17 69 44 25 Node 31 3 70 43 26 Node 32 3 15 87 27 Node 33 12 38 72 28 Node 34 14 62 81 29 Node 35 20 104 17 30 Node 36 13 18 77 31 Node 37 10 70 22 32 Node 38 9 65 46 33 Node 39 6 24 64 34 Node 4 14 68 23 35 Node 40 18 85 8 36 Node 41 7 20 95 37 Node 42 20 55 82 38 Node 43 9 94 16 39 Node 44 1 10 79 40 Node 45 8 62 63 41 Node 46 5 50 88 42 Node 47 1 70 50 43 Node 48 7 54 73 44 Node 49 9 52 43 45 Node 5 19 57 48 46 Node 50 2 4 86 47 Node 6 2 76 22 48 Node 7 14 79 60 49 Node 8 6 108 25 50 Node 9 7 101 18 The columns Node 1, Node 2 .....Node 45....Node 46 show the distance from the Node indicated on the column respect all the other nodes. I want to pick the closest nodes, and then to select all the nodes under which cumsum() node_demand is less than 120, starting from the first row. Since the first value is the distance between the main Node and itself I don't consider the first row. To do that for Node 1 I would do: test <- merged_df[,c(1,2,3)] # Columns 1 and 2 are fixed test <- test[(order(test[3])),][2:50,] # to get the closest distances first test<- test[cumsum(test$node_demand)< 120,] I then need to create a new variable for each node with the last value of the cumsum() node_1 <- tail(cumsum(test$`Node 1`), n=1) # 381 The output for node_1 would be 381 To do the same for node_2: test <- merged_df[,c(1,2,4)] #c(1,2,**4**) 4 instead of 3 as before test <- test[(order(test[3])),][2:50,] test<- test[cumsum(test$node_demand)< 120,] node_2 <- tail(cumsum(test$`Node 2`), n=1) The output for node_2 is 178 Since this process is very repetitive I guess a loop could do it but I am not sure how to create the different variables I need for(i in 3:52){ test <- merged_df[,c(1,2,i)] test <- merged_df[order(test[3]),][2:50] test<- test[cumsum(test$node_demand)< 120,] } node_1 <- tail(cumsum(test$test$`Node 1`), n=1) # should return 381 #I'm not sure how to create the variables node_1, node_2....node_50 The process to follow would be: Create a subset of the dataframe using columns 1, 2 and i (representing the number from Node 1 to Node 50. Sort the subset by the column Node i so the smallest distances are placed first. I need to select rows until cumsum(node_demand) < 120. (which is what I do using test<- test[cumsum(test$node_demand)< 120,]) Then I need to calculate cumsum(test$Node 1). This will give me the cumulative distance from all the nodes under the condition of cumsum(node_demand) < 120 Anybody could give me a hand? Many thanks! The output of dput() is : structure(list(Row.names = structure(c("Node 1", "Node 10", "Node 11", "Node 12", "Node 13", "Node 14", "Node 15", "Node 16", "Node 17", "Node 18", "Node 19", "Node 2", "Node 20", "Node 21", "Node 22", "Node 23", "Node 24", "Node 25", "Node 26", "Node 27", "Node 28", "Node 29", "Node 3", "Node 30", "Node 31", "Node 32", "Node 33", "Node 34", "Node 35", "Node 36", "Node 37", "Node 38", "Node 39", "Node 4", "Node 40", "Node 41", "Node 42", "Node 43", "Node 44", "Node 45", "Node 46", "Node 47", "Node 48", "Node 49", "Node 5", "Node 50", "Node 6", "Node 7", "Node 8", "Node 9"), class = "AsIs"), node_demand = c(3L, 6L, 10L, 18L, 3L, 6L, 20L, 4L, 14L, 11L, 19L, 14L, 15L, 15L, 4L, 13L, 13L, 5L, 16L, 3L, 7L, 14L, 1L, 17L, 3L, 3L, 12L, 14L, 20L, 13L, 10L, 9L, 6L, 14L, 18L, 7L, 20L, 9L, 1L, 8L, 5L, 1L, 7L, 9L, 19L, 2L, 2L, 14L, 6L, 7L ), `Node 1` = c(0, 58, 43, 94, 44, 21, 31, 35, 58, 19, 62, 87, 102, 16, 54, 59, 73, 82, 62, 59, 73, 45, 43, 69, 70, 15, 38, 62, 104, 18, 70, 65, 24, 68, 85, 20, 55, 94, 10, 62, 50, 70, 54, 52, 57, 4, 76, 79, 108, 101), `Node 2` = c(87, 52, 70, 8, 63, 98, 64, 76, 52, 71, 38, 0, 19, 76, 51, 75, 28, 33, 72, 30, 32, 48, 78, 44, 43, 87, 72, 81, 17, 77, 22, 46, 64, 23, 8, 95, 82, 16, 79, 63, 88, 50, 73, 43, 48, 86, 22, 60, 25, 18), `Node 3` = c(43, 28, 11, 84, 15, 35, 52, 68, 30, 45, 73, 78, 97, 43, 72, 20, 78, 57, 91, 58, 80, 58, 0, 42, 83, 29, 69, 94, 91, 51, 70, 36, 41, 70, 79, 33, 22, 78, 34, 25, 13, 86, 84, 35, 73, 46, 60, 43, 101, 94), `Node 4` = c(68, 50, 62, 30, 56, 82, 43, 53, 49, 51, 16, 23, 34, 57, 29, 71, 10, 44, 50, 15, 15, 26, 70, 46, 25, 71, 49, 58, 39, 57, 5, 47, 45, 0, 19, 79, 76, 37, 62, 61, 81, 31, 50, 36, 25, 67, 30, 65, 41, 35), `Node 5` = c(57, 62, 66, 54, 62, 74, 30, 33, 61, 39, 10, 48, 55, 45, 15, 79, 23, 66, 26, 30, 24, 16, 73, 62, 22, 64, 27, 34, 63, 42, 28, 62, 37, 25, 42, 71, 80, 62, 55, 73, 84, 24, 28, 48, 0, 55, 53, 81, 63, 58), `Node 6` = c(76, 34, 54, 27, 46, 84, 60, 73, 36, 63, 45, 22, 40, 68, 53, 58, 38, 16, 77, 25, 41, 46, 60, 29, 50, 73, 71, 85, 34, 71, 28, 29, 55, 30, 27, 82, 67, 20, 67, 44, 70, 56, 75, 28, 53, 76, 0, 44, 41, 35), `Node 7` = c(79, 24, 38, 62, 36, 75, 72, 93, 23, 72, 75, 60, 79, 72, 84, 28, 75, 30, 105, 58, 79, 69, 43, 20, 88, 69, 91, 112, 66, 80, 66, 20, 66, 65, 64, 73, 38, 53, 70, 18, 45, 93, 102, 35, 81, 82, 44, 0, 81, 75 ), `Node 8` = c(108, 75, 94, 23, 86, 120, 85, 91, 76, 91, 54, 25, 11, 97, 64, 98, 40, 52, 83, 49, 41, 67, 101, 67, 51, 109, 88, 91, 21, 98, 39, 69, 84, 41, 27, 117, 106, 28, 101, 85, 111, 56, 85, 66, 63, 107, 41, 81, 0, 7), `Node 9` = c(101, 68, 87, 17, 79, 113, 78, 86, 69, 85, 48, 18, 9, 90, 58, 91, 35, 46, 78, 42, 36, 60, 94, 60, 47, 102, 83, 87, 18, 91, 32, 62, 78, 35, 20, 110, 99, 23, 94, 79, 104, 52, 80, 59, 58, 100, 35, 75, 7, 0), `Node 10` = c(58, 0, 23, 57, 16, 58, 51, 70, 8, 50, 58, 52, 71, 51, 63, 24, 60, 29, 85, 40, 62, 48, 28, 16, 69, 49, 69, 91, 64, 59, 50, 8, 43, 50, 55, 56, 34, 50, 48, 12, 37, 73, 80, 14, 62, 60, 34, 24, 75, 68 ), `Node 11` = c(43, 23, 0, 76, 10, 37, 45, 65, 22, 41, 65, 70, 89, 39, 67, 17, 71, 52, 85, 51, 73, 51, 11, 34, 77, 31, 64, 89, 83, 47, 63, 30, 36, 62, 71, 35, 18, 72, 34, 21, 19, 81, 79, 27, 66, 46, 54, 38, 94, 87), `Node 12` = c(94, 57, 76, 0, 69, 104, 71, 83, 56, 78, 44, 8, 17, 83, 58, 79, 34, 36, 78, 38, 38, 55, 84, 47, 50, 94, 79, 87, 9, 85, 29, 50, 71, 30, 12, 101, 86, 14, 87, 67, 93, 56, 80, 49, 54, 94, 27, 62, 23, 17), `Node 13` = c(44, 16, 10, 69, 0, 43, 45, 63, 18, 40, 60, 63, 82, 40, 61, 22, 65, 44, 82, 44, 67, 47, 15, 30, 71, 34, 62, 86, 77, 48, 56, 24, 34, 56, 65, 41, 27, 64, 34, 19, 26, 74, 76, 21, 62, 47, 46, 36, 86, 79), `Node 14` = c(21, 58, 37, 104, 43, 0, 46, 56, 58, 35, 78, 98, 114, 30, 73, 51, 88, 86, 82, 71, 89, 60, 35, 70, 88, 12, 58, 82, 113, 36, 84, 66, 39, 82, 97, 3, 44, 103, 22, 58, 37, 89, 74, 57, 74, 25, 84, 75, 120, 113), `Node 15` = c(31, 51, 45, 71, 45, 46, 0, 27, 49, 12, 35, 64, 77, 16, 34, 61, 47, 68, 42, 39, 49, 19, 52, 56, 48, 37, 22, 45, 80, 16, 46, 55, 16, 43, 60, 43, 58, 74, 29, 59, 61, 49, 38, 40, 30, 30, 60, 72, 85, 78), `Node 16` = c(35, 70, 65, 83, 63, 56, 27, 0, 70, 26, 42, 76, 85, 32, 28, 82, 53, 85, 29, 48, 51, 30, 68, 77, 44, 48, 9, 28, 93, 23, 55, 74, 29, 53, 72, 54, 80, 87, 38, 79, 79, 42, 19, 59, 33, 31, 73, 93, 91, 86), `Node 17` = c(58, 8, 22, 56, 18, 58, 49, 70, 0, 49, 56, 52, 71, 50, 63, 23, 59, 31, 84, 41, 63, 47, 30, 12, 70, 49, 68, 90, 63, 57, 50, 10, 43, 49, 54, 55, 31, 51, 48, 14, 37, 74, 80, 14, 61, 60, 36, 23, 76, 69), `Node 18` = c(19, 50, 41, 78, 40, 35, 12, 26, 49, 0, 44, 71, 85, 8, 39, 58, 55, 71, 48, 43, 56, 26, 45, 58, 54, 27, 25, 50, 87, 8, 53, 55, 9, 51, 68, 33, 55, 80, 17, 57, 55, 55, 41, 41, 39, 18, 63, 72, 91, 85), `Node 19` = c(62, 58, 65, 44, 60, 78, 35, 42, 56, 44, 0, 38, 46, 50, 19, 77, 14, 58, 35, 24, 18, 18, 73, 56, 21, 68, 37, 44, 53, 48, 19, 57, 40, 16, 32, 75, 79, 53, 58, 69, 84, 25, 37, 44, 10, 60, 45, 75, 54, 48), `Node 20` = c(102, 71, 89, 17, 82, 114, 77, 85, 71, 85, 46, 19, 0, 90, 58, 94, 33, 51, 75, 44, 35, 60, 97, 63, 46, 103, 81, 84, 18, 91, 32, 66, 78, 34, 19, 111, 101, 28, 95, 82, 107, 51, 78, 62, 55, 101, 40, 79, 11, 9), `Node 21` = c(16, 51, 39, 83, 40, 30, 16, 32, 50, 8, 50, 76, 90, 0, 46, 55, 62, 74, 54, 49, 63, 33, 43, 59, 61, 22, 30, 56, 92, 10, 59, 57, 15, 57, 73, 27, 51, 84, 16, 57, 51, 62, 48, 44, 45, 17, 68, 72, 97, 90), `Node 22` = c(54, 63, 67, 58, 61, 73, 34, 28, 63, 39, 19, 51, 58, 46, 0, 81, 26, 67, 26, 28, 23, 19, 72, 65, 16, 63, 26, 33, 67, 42, 29, 64, 35, 29, 46, 70, 83, 64, 52, 74, 84, 16, 23, 49, 15, 52, 53, 84, 64, 58), `Node 23` = c(59, 24, 17, 79, 22, 51, 61, 82, 23, 58, 77, 75, 94, 55, 81, 0, 81, 50, 100, 62, 84, 65, 20, 33, 90, 46, 80, 105, 85, 64, 73, 29, 53, 71, 77, 49, 13, 73, 51, 14, 19, 94, 95, 35, 79, 63, 58, 28, 98, 91), `Node 24` = c(73, 60, 71, 34, 65, 88, 47, 53, 59, 55, 14, 28, 33, 62, 26, 81, 0, 53, 44, 22, 7, 29, 78, 56, 18, 77, 49, 53, 43, 60, 10, 57, 50, 10, 22, 85, 85, 43, 68, 71, 90, 24, 47, 46, 23, 71, 38, 75, 40, 35), `Node 25` = c(82, 29, 52, 36, 44, 86, 68, 85, 31, 71, 58, 33, 51, 74, 67, 50, 53, 0, 91, 39, 56, 58, 57, 22, 66, 76, 82, 99, 40, 79, 43, 22, 62, 44, 39, 84, 61, 25, 72, 36, 65, 72, 89, 30, 66, 83, 16, 30, 52, 46), `Node 26` = c(62, 85, 85, 78, 82, 82, 42, 29, 84, 48, 35, 72, 75, 54, 26, 100, 44, 91, 0, 53, 43, 37, 91, 86, 34, 74, 25, 9, 86, 46, 51, 86, 50, 50, 66, 80, 99, 86, 64, 95, 101, 30, 12, 71, 26, 59, 77, 105, 83, 78), `Node 27` = c(59, 40, 51, 38, 44, 71, 39, 48, 41, 43, 24, 30, 44, 49, 28, 62, 22, 39, 53, 0, 23, 23, 58, 40, 30, 60, 46, 60, 48, 50, 14, 39, 35, 15, 29, 69, 67, 39, 51, 52, 69, 34, 50, 27, 30, 58, 25, 58, 49, 42), `Node 28` = c(73, 62, 73, 38, 67, 89, 49, 51, 63, 56, 18, 32, 35, 63, 23, 84, 7, 56, 43, 23, 0, 31, 80, 60, 12, 78, 48, 51, 46, 61, 14, 60, 51, 15, 27, 86, 89, 46, 68, 74, 92, 18, 44, 49, 24, 71, 41, 79, 41, 36), `Node 29` = c(45, 48, 51, 55, 47, 60, 19, 30, 47, 26, 18, 48, 60, 33, 19, 65, 29, 58, 37, 23, 31, 0, 58, 50, 31, 50, 26, 43, 64, 31, 28, 50, 23, 26, 44, 57, 66, 59, 40, 59, 69, 33, 34, 35, 16, 43, 46, 69, 67, 60), `Node 30` = c(69, 16, 34, 47, 30, 70, 56, 77, 12, 58, 56, 44, 63, 59, 65, 33, 56, 22, 86, 40, 60, 50, 42, 0, 69, 61, 74, 93, 53, 66, 47, 9, 51, 46, 46, 67, 41, 41, 59, 21, 49, 74, 84, 19, 62, 70, 29, 20, 67, 60), `Node 31` = c(70, 69, 77, 50, 71, 88, 48, 44, 70, 54, 21, 43, 46, 61, 16, 90, 18, 66, 34, 30, 12, 31, 83, 69, 0, 77, 42, 42, 58, 58, 24, 68, 50, 25, 39, 86, 94, 57, 67, 81, 95, 6, 34, 55, 22, 68, 50, 88, 51, 47), `Node 32` = c(15, 49, 31, 94, 34, 12, 37, 48, 49, 27, 68, 87, 103, 22, 63, 46, 77, 76, 74, 60, 78, 50, 29, 61, 77, 0, 50, 75, 102, 29, 73, 57, 28, 71, 86, 9, 42, 92, 11, 51, 35, 78, 66, 47, 64, 19, 73, 69, 109, 102), `Node 33` = c(38, 69, 64, 79, 62, 58, 22, 9, 68, 25, 37, 72, 81, 30, 26, 80, 49, 82, 25, 46, 48, 26, 69, 74, 42, 50, 0, 26, 88, 22, 51, 72, 28, 49, 67, 55, 79, 84, 40, 78, 79, 40, 18, 57, 27, 34, 71, 91, 88, 83), `Node 34` = c(62, 91, 89, 87, 86, 82, 45, 28, 90, 50, 44, 81, 84, 56, 33, 105, 53, 99, 9, 60, 51, 43, 94, 93, 42, 75, 26, 0, 96, 47, 60, 93, 53, 58, 75, 80, 103, 95, 65, 101, 104, 37, 12, 77, 34, 58, 85, 112, 91, 87), `Node 35` = c(104, 64, 83, 9, 77, 113, 80, 93, 63, 87, 53, 17, 18, 92, 67, 85, 43, 40, 86, 48, 46, 64, 91, 53, 58, 102, 88, 96, 0, 94, 38, 57, 80, 39, 21, 110, 93, 17, 96, 73, 100, 64, 89, 57, 63, 103, 34, 66, 21, 18), `Node 36` = c(18, 59, 47, 85, 48, 36, 16, 23, 57, 8, 48, 77, 91, 10, 42, 64, 60, 79, 46, 50, 61, 31, 51, 66, 58, 29, 22, 47, 94, 0, 59, 64, 18, 57, 74, 34, 61, 87, 21, 65, 60, 57, 40, 49, 42, 16, 71, 80, 98, 91), `Node 37` = c(70, 50, 63, 29, 56, 84, 46, 55, 50, 53, 19, 22, 32, 59, 29, 73, 10, 43, 51, 14, 14, 28, 70, 47, 24, 73, 51, 60, 38, 59, 0, 48, 47, 5, 18, 81, 78, 35, 64, 62, 82, 30, 52, 37, 28, 69, 28, 66, 39, 32), `Node 38` = c(65, 8, 30, 50, 24, 66, 55, 74, 10, 55, 57, 46, 66, 57, 64, 29, 57, 22, 86, 39, 60, 50, 36, 9, 68, 57, 72, 93, 57, 64, 48, 0, 48, 47, 50, 63, 39, 44, 55, 16, 44, 73, 83, 16, 62, 66, 29, 20, 69, 62), `Node 39` = c(24, 43, 36, 71, 34, 39, 16, 29, 43, 9, 40, 64, 78, 15, 35, 53, 50, 62, 50, 35, 51, 23, 41, 51, 50, 28, 28, 53, 80, 18, 47, 48, 0, 45, 62, 36, 53, 72, 18, 51, 52, 51, 43, 33, 37, 23, 55, 66, 84, 78), `Node 40` = c(85, 55, 71, 12, 65, 97, 60, 72, 54, 68, 32, 8, 19, 73, 46, 77, 22, 39, 66, 29, 27, 44, 79, 46, 39, 86, 67, 75, 21, 74, 18, 50, 62, 19, 0, 94, 83, 23, 78, 66, 89, 45, 68, 44, 42, 84, 27, 64, 27, 20), `Node 41` = c(20, 56, 35, 101, 41, 3, 43, 54, 55, 33, 75, 95, 111, 27, 70, 49, 85, 84, 80, 69, 86, 57, 33, 67, 86, 9, 55, 80, 110, 34, 81, 63, 36, 79, 94, 0, 42, 100, 20, 56, 36, 86, 72, 55, 71, 24, 82, 73, 117, 110), `Node 42` = c(55, 34, 18, 86, 27, 44, 58, 80, 31, 55, 79, 82, 101, 51, 83, 13, 85, 61, 99, 67, 89, 66, 22, 41, 94, 42, 79, 103, 93, 61, 78, 39, 53, 76, 83, 42, 0, 82, 48, 26, 16, 97, 95, 42, 80, 59, 67, 38, 106, 99), `Node 43` = c(94, 50, 72, 14, 64, 103, 74, 87, 51, 80, 53, 16, 28, 84, 64, 73, 43, 25, 86, 39, 46, 59, 78, 41, 57, 92, 84, 95, 17, 87, 35, 44, 72, 37, 23, 100, 82, 0, 86, 59, 87, 63, 87, 45, 62, 94, 20, 53, 28, 23), `Node 44` = c(10, 48, 34, 87, 34, 22, 29, 38, 48, 17, 58, 79, 95, 16, 52, 51, 68, 72, 64, 51, 68, 40, 34, 59, 67, 11, 40, 65, 96, 21, 64, 55, 18, 62, 78, 20, 48, 86, 0, 52, 43, 67, 55, 43, 55, 13, 67, 70, 101, 94), `Node 45` = c(62, 12, 21, 67, 19, 58, 59, 79, 14, 57, 69, 63, 82, 57, 74, 14, 71, 36, 95, 52, 74, 59, 25, 21, 81, 51, 78, 101, 73, 65, 62, 16, 51, 61, 66, 56, 26, 59, 52, 0, 30, 85, 91, 26, 73, 65, 44, 18, 85, 79), `Node 46` = c(50, 37, 19, 93, 26, 37, 61, 79, 37, 55, 84, 88, 107, 51, 84, 19, 90, 65, 101, 69, 92, 69, 13, 49, 95, 35, 79, 104, 100, 60, 82, 44, 52, 81, 89, 36, 16, 87, 43, 30, 0, 98, 95, 45, 84, 54, 70, 45, 111, 104), `Node 47` = c(70, 73, 81, 56, 74, 89, 49, 42, 74, 55, 25, 50, 51, 62, 16, 94, 24, 72, 30, 34, 18, 33, 86, 74, 6, 78, 40, 37, 64, 57, 30, 73, 51, 31, 45, 86, 97, 63, 67, 85, 98, 0, 30, 60, 24, 67, 56, 93, 56, 52), `Node 48` = c(54, 80, 79, 80, 76, 74, 38, 19, 80, 41, 37, 73, 78, 48, 23, 95, 47, 89, 12, 50, 44, 34, 84, 84, 34, 66, 18, 12, 89, 40, 52, 83, 43, 50, 68, 72, 95, 87, 55, 91, 95, 30, 0, 67, 28, 50, 75, 102, 85, 80), `Node 49` = c(52, 14, 27, 49, 21, 57, 40, 59, 14, 41, 44, 43, 62, 44, 49, 35, 46, 30, 71, 27, 49, 35, 35, 19, 55, 47, 57, 77, 57, 49, 37, 16, 33, 36, 44, 55, 42, 45, 43, 26, 45, 60, 67, 0, 48, 53, 28, 35, 66, 59), `Node 50` = c(4, 60, 46, 94, 47, 25, 30, 31, 60, 18, 60, 86, 101, 17, 52, 63, 71, 83, 59, 58, 71, 43, 46, 70, 68, 19, 34, 58, 103, 16, 69, 66, 23, 67, 84, 24, 59, 94, 13, 65, 54, 67, 50, 53, 55, 0, 76, 82, 107, 100 )), .Names = c("Row.names", "node_demand", "Node 1", "Node 2", "Node 3", "Node 4", "Node 5", "Node 6", "Node 7", "Node 8", "Node 9", "Node 10", "Node 11", "Node 12", "Node 13", "Node 14", "Node 15", "Node 16", "Node 17", "Node 18", "Node 19", "Node 20", "Node 21", "Node 22", "Node 23", "Node 24", "Node 25", "Node 26", "Node 27", "Node 28", "Node 29", "Node 30", "Node 31", "Node 32", "Node 33", "Node 34", "Node 35", "Node 36", "Node 37", "Node 38", "Node 39", "Node 40", "Node 41", "Node 42", "Node 43", "Node 44", "Node 45", "Node 46", "Node 47", "Node 48", "Node 49", "Node 50"), class = "data.frame", row.names = c(NA, -50L))
You can try a tidyverse library(tidyverse) d %>% as.tibble() %>% gather(k,v, -node_demand, -Row.names) %>% arrange(k, v) %>% group_by(k) %>% filter(Row.names != k) %>% filter(cumsum(node_demand)<120) %>% summarise(sum(v)) # A tibble: 50 x 2 k `sum(v)` <chr> <dbl> 1 Node 1 381 2 Node 10 202 3 Node 11 332 4 Node 12 186 5 Node 13 262 6 Node 14 419 7 Node 15 282 8 Node 16 279 9 Node 17 272 10 Node 18 302 # ... with 40 more rows Prove result for Node 1 and 2: .Last.value %>% filter(k %in% c("Node 1", "Node 2")) # A tibble: 2 x 2 k `sum(v)` <chr> <dbl> 1 Node 1 381 2 Node 2 178 The idea is to transform the data from long to wide. After arranging, we group by Node (column k) and filter 1) "self-nodes" and 2) cumsum<120. Finally calculate the sum for each Node.
creating named vector from a csv file did not work
Creating named vector where names are associated to GO id from a csv file did not work. > head(read.delim("~/GOmapping.tsv", sep = '\t')) V1 V14 1 sp0000005 GO:0003723 2 sp0000006 GO:0016021 3 sp0000007 GO:0003700,GO:0006355,GO:0043565 4 sp0000016 GO:0046983 5 sp0000017 GO:0004672,GO:0005524,GO:0006468 6 sp0000022 GO:0003677,GO:0046983 > head(read.delim("~/GOmapping.tsv", sep = '\t'))[1] V1 1 sp0000005 2 sp0000006 3 sp0000007 4 sp0000016 5 sp0000017 6 sp0000022 > head(read.delim("~/GOmapping.tsv", sep = '\t'))[2] V14 1 GO:0003723 2 GO:0016021 3 GO:0003700,GO:0006355,GO:0043565 4 GO:0046983 5 GO:0004672,GO:0005524,GO:0006468 6 GO:0003677,GO:0046983 > geneID2GO <- read.delim("~/GOmapping.tsv", sep = '\t'))[2] > geneID2GO <- read.delim("~/GOmapping.tsv", sep = '\t')[2] > names(geneID2GO) <- read.delim("~/GOmapping.tsv", sep = '\t')[1] > head(geneID2GO) c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 57, 58, 59, 60, 6 ... 1 GO:0003723 2 GO:0016021 3 GO:0003700,GO:0006355,GO:0043565 4 GO:0046983 5 GO:0004672,GO:0005524,GO:0006468 6 GO:0003677,GO:0046983 What did I miss? Thank you in advance.
If you want a vector as result, maybe you could try to coerce your values and names (column 1) to character. data <- read.delim("~/GOmapping.tsv", sep = '\t') geneID2GO <- as.character(data[,2]) names(geneID2GO) <- as.character(data[,1]) head(geneID2GO) sp0000005 sp0000006 sp0000007 "GO:0003723" "GO:0016021" "GO:0003700,GO:0006355,GO:0043565" sp0000016 "GO:0046983" Alternatively, you can display the result as follows: cbind(geneID2GO) geneID2GO sp0000005 "GO:0003723" sp0000006 "GO:0016021" sp0000007 "GO:0003700,GO:0006355,GO:0043565" sp0000016 "GO:0046983"
groupby + rowMean with dplyr
I have a large data frame, which includes impoTand nlc as key (ignore t all together), and other columns which each hold a number. I want to find, for each impoTand nlc pair, the average of all the other columns, or basically a rowMean. A subset of my data, which only includes one nlc, is given at the end. The last thing I tried was : avg <- data.frame(a %>% group_by(impoT, nlc) %>% select(-c(1:3)) %>% mutate(r= rowMeans(.) )) stds = (a %>% group_by(impoT, nlc) %>% select(-c(1:3)) %>% apply( 1, sd)) #wrong dput(a) structure(list(impoT = 1:18, nlc = c(669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L), t = c(102L, 118L, 134L, 150L, 166L, 182L, 198L, 214L, 230L, 246L, 262L, 278L, 294L, 310L, 326L, 342L, 358L, 374L), X11950 = c(6, 14, 40, 53, 59, 70, 118, 119, 111, 114, 103, 220, 278, 94, 28, 13, 5, 8), X11951 = c(4, 18, 41, 64, 78, 87, 140, 112, 113, 129, 112, 245, 322, 102, 52, 20, 15, 7), X11952 = c(8, 13, 30, 42, 52, 86, 126, 118, 52, 87, 116, 251, 262, 101, 35, 21, 15, 21), X11955 = c(9, 11, 47, 38, 39, 70, 95, 82, 80, 77, 77, 142, 192, 78, 13, 13, 5, 0), X11956 = c(14, 13, 44, 65, 65, 72, 125, 138, 117, 111, 104, 175, 282, 93, 28, 14, 8, 4), X11957 = c(10, 7, 45, 42, 50, 83, 123, 102, 104, 82, 102, 234, 265, 101, 23, 13, 7, 6), X11958 = c(10, 13, 42, 60, 68, 69, 106, 125, 104, 103, 112, 233, 310, 128, 50, 22, 10, 5), X11959 = c(7, 11, 32, 45, 63, 74, 119, 87, 121, 108, 104, 229, 266, 111, 46, 26, 22, 11), X11962 = c(8, 12, 38, 35, 49, 58, 96, 66, 73, 109, 82, 161, 192, 75, 22, 4, 2, 3), X11963 = c(8, 9, 39, 40, 56, 50, 142, 98, 102, 78, 79, 220, 229, 87, 25, 5, 7, 2), X11964 = c(10, 9, 42, 60, 53, 52, 105, 114, 96, 94, 95, 180, 268, 114, 23, 10, 7, 10), X11965 = c(9, 9, 41, 40, 61, 81, 150, 102, 102, 121, 125, 222, 347, 116, 37, 18, 3, 4), X11966 = c(10, 9, 34, 43, 49, 73, 112, 123, 102, 92, 107, 207, 239, 115, 60, 18, 15, 5), X11969 = c(8, 9, 31, 34, 41, 51, 93, 92, 68, 103, 76, 166, 182, 63, 24, 14, 6, 4), X11970 = c(7, 12, 33, 48, 56, 59, 102, 88, 99, 86, 103, 194, 233, 90, 25, 13, 7, 3), X11971 = c(9, 16, 37, 60, 78, 62, 114, 106, 129, 107, 91, 212, 272, 88, 31, 10, 3, 3), X12088 = c(6, 11, 41, 44, 56, 70, 106, 97, 64, 73, 75, 161, 186, 76, 17, 8, 2, 2), X12089 = c(0, 11, 53, 59, 62, 64, 114, 109, 109, 100, 66, 222, 241, 88, 19, 8, 8, 3), X12090 = c(4, 12, 57, 52, 65, 73, 132, 109, 120, 101, 104, 227, 238, 99, 17, 8, 10, 8), X12091 = c(4, 16, 54, 167, 74, 62, 111, 95, 120, 102, 92, 227, 317, 106, 44, 16, 10, 4), X12092 = c(9, 10, 50, 55, 63, 64, 130, 103, 98, 116, 83, 249, 279, 88, 35, 36, 22, 15), X12095 = c(5, 15, 39, 44, 53, 58, 95, 92, 67, 63, 69, 163, 182, 69, 20, 8, 4, 2), X12096 = c(3, 14, 49, 53, 71, 70, 107, 130, 90, 89, 101, 214, 253, 100, 30, 10, 3, 3), X12097 = c(2, 16, 53, 61, 82, 83, 123, 124, 125, 98, 89, 220, 274, 107, 20, 17, 7, 5), X12098 = c(6, 17, 56, 59, 51, 77, 102, 115, 93, 98, 83, 221, 288, 97, 36, 16, 9, 10), X12099 = c(2, 16, 39, 49, 60, 84, 112, 91, 102, 103, 108, 246, 261, 131, 49, 24, 18, 14), X12102 = c(4, 12, 29, 47, 64, 69, 104, 111, 92, 72, 105, 174, 179, 64, 16, 10, 2, 1)), .Names = c("impoT", "nlc", "t", "X11950", "X11951", "X11952", "X11955", "X11956", "X11957", "X11958", "X11959", "X11962", "X11963", "X11964", "X11965", "X11966", "X11969", "X11970", "X11971", "X12088", "X12089", "X12090", "X12091", "X12092", "X12095", "X12096", "X12097", "X12098", "X12099", "X12102" ), row.names = c(NA, -18L), class = "data.frame")
It's easiest if you split your means into two steps, as you're actually taking the mean of irregular groups: first each row, and second each group. This means you're taking the means of means, but given each of the row means is of the same amount of numbers, they should be fine that way, although you should consider that the grouping means may be means of different amounts of rows. You also need to nest the select so you don't lose your grouping variables, and use summarise to collapse the groups. All told, a %>% mutate(r = rowMeans(select(a, -c(1:3)))) %>% group_by(impoT, nlc) %>% summarise(r = mean(r)) produces Source: local data frame [18 x 3] Groups: impoT [?] impoT nlc r (int) (int) (dbl) 1 1 669 6.740741 2 2 669 12.407407 3 3 669 42.074074 4 4 669 54.037037 5 5 669 59.925926 6 6 669 69.296296 7 7 669 114.888889 8 8 669 105.481481 9 9 669 98.259259 10 10 669 96.888889 11 11 669 94.925926 12 12 669 207.962963 13 13 669 253.222222 14 14 669 95.592593 15 15 669 30.555556 16 16 669 14.629630 17 17 669 8.592593 18 18 669 6.037037