This question already has answers here:
All combinations of all sizes?
(2 answers)
Unordered combinations of all lengths
(3 answers)
Closed 4 years ago.
I would like to build a dataframe that lists all possible combinations of 6 numbers.
I realised that I can use combn(), but with only one value for m. With a bit of playing around I got the desired result by going through step-by-step with the following code -
combi1 <- data.frame(c(1:6))
colnames(combi1) <- 'X1'
combi2 <- data.frame(t(combn(c(1:6), 2)))
combi3 <- data.frame(t(combn(c(1:6), 3)))
combi4 <- data.frame(t(combn(c(1:6), 4)))
combi5 <- data.frame(t(combn(c(1:6), 5)))
combi6 <- data.frame(t(combn(c(1:6), 6)))
Combi <- rbind.fill(combi1, combi2, combi3, combi4, combi5, combi6)
I had to transpose each of the DFs to get them in the right shape.
My problem is that this seems to be quite an in-efficient method. Maybe a bit simplistic. I thought there must surely be some quicker way to code this, but haven't found any solution online that gives me what I'd like.
Possibly build it into a function or a loop somehow? I'm fairly new to R though and haven't had a great deal of practice writing functions.
Is it what you want ?
combis <- vector("list", 6)
combi1 <- data.frame(c(1:6))
colnames(combi1) <- 'X1'
combis[[1]] <- combi1
combis[2:6] <- lapply(2:6, function(n) data.frame(t(combn(c(1:6), n))))
do.call(plyr::rbind.fill, combis)
Result:
X1 X2 X3 X4 X5 X6
1 1 NA NA NA NA NA
2 2 NA NA NA NA NA
3 3 NA NA NA NA NA
4 4 NA NA NA NA NA
5 5 NA NA NA NA NA
6 6 NA NA NA NA NA
7 1 2 NA NA NA NA
8 1 3 NA NA NA NA
9 1 4 NA NA NA NA
10 1 5 NA NA NA NA
11 1 6 NA NA NA NA
12 2 3 NA NA NA NA
13 2 4 NA NA NA NA
14 2 5 NA NA NA NA
15 2 6 NA NA NA NA
16 3 4 NA NA NA NA
17 3 5 NA NA NA NA
18 3 6 NA NA NA NA
19 4 5 NA NA NA NA
20 4 6 NA NA NA NA
21 5 6 NA NA NA NA
22 1 2 3 NA NA NA
23 1 2 4 NA NA NA
24 1 2 5 NA NA NA
25 1 2 6 NA NA NA
26 1 3 4 NA NA NA
27 1 3 5 NA NA NA
28 1 3 6 NA NA NA
29 1 4 5 NA NA NA
30 1 4 6 NA NA NA
31 1 5 6 NA NA NA
32 2 3 4 NA NA NA
33 2 3 5 NA NA NA
34 2 3 6 NA NA NA
35 2 4 5 NA NA NA
36 2 4 6 NA NA NA
37 2 5 6 NA NA NA
38 3 4 5 NA NA NA
39 3 4 6 NA NA NA
40 3 5 6 NA NA NA
41 4 5 6 NA NA NA
42 1 2 3 4 NA NA
43 1 2 3 5 NA NA
44 1 2 3 6 NA NA
45 1 2 4 5 NA NA
46 1 2 4 6 NA NA
47 1 2 5 6 NA NA
48 1 3 4 5 NA NA
49 1 3 4 6 NA NA
50 1 3 5 6 NA NA
51 1 4 5 6 NA NA
52 2 3 4 5 NA NA
53 2 3 4 6 NA NA
54 2 3 5 6 NA NA
55 2 4 5 6 NA NA
56 3 4 5 6 NA NA
57 1 2 3 4 5 NA
58 1 2 3 4 6 NA
59 1 2 3 5 6 NA
60 1 2 4 5 6 NA
61 1 3 4 5 6 NA
62 2 3 4 5 6 NA
63 1 2 3 4 5 6
Related
I have generated random data like this.
data <- replicate(10,sample(0:9,10,rep=FALSE))
ind <- which(data %in% sample(data, 5))
#now replace those indices in data with NA
data[ind]<-NA
#here is our vector with 15 random NAs
data = as.data.frame(data)
rownames(data) = 1:10
colnames(data) = 1:10
data
which results in a data frame like this. How can I reorder the entry value such that if the entry is numeric, then the value will be placed in a (row number - 1), and NA will be put in any rows where there is no value matching the (row number -1). The data I want, for example, the first column, should look like this
.
How can I do this? I have no clue at all. We can order decreasing or increasing and put NA in the last order, but that is not what I want.
You can make a helper function to assign values to indices at (values + 1), then apply the function over all columns:
fx <- function(x) {
vals <- x[!is.na(x)]
pos <- vals + 1
out <- rep(NA, length(x))
out[pos] <- vals
out
}
as.data.frame(sapply(data, fx))
1 2 3 4 5 6 7 8 9 10
1 NA 0 NA 0 0 0 0 NA 0 0
2 NA NA NA 1 1 NA NA NA NA NA
3 2 NA 2 2 NA NA NA NA 2 NA
4 3 NA 3 3 NA NA 3 NA 3 3
5 4 4 4 4 NA 4 NA 4 4 NA
6 5 5 NA 5 NA NA 5 5 5 NA
7 NA 6 6 NA 6 NA NA 6 NA NA
8 7 NA 7 7 NA 7 7 NA NA 7
9 NA NA NA NA 8 8 8 8 8 8
10 9 9 NA NA 9 NA NA 9 NA 9
Starting data:
set.seed(13)
data <- replicate(10, sample(
c(0:9, rep(NA, 10)),
10,
replace = FALSE
))
data <- as.data.frame(data)
colnames(data) <- 1:10
data
1 2 3 4 5 6 7 8 9 10
1 2 NA NA 2 NA NA 0 NA 3 7
2 4 NA NA 4 NA NA NA NA 2 9
3 9 9 NA 3 9 4 NA 6 4 0
4 NA NA NA 1 6 NA NA 4 NA NA
5 5 6 3 0 NA NA 5 8 8 NA
6 NA NA 7 NA NA NA 7 NA 5 3
7 3 4 6 NA 1 0 NA 5 NA NA
8 NA NA NA 7 0 7 NA NA 0 NA
9 NA 0 4 NA 8 8 8 9 NA 8
10 7 5 2 5 NA NA 3 NA NA NA
I am trying to store all rows with NA for my columns Math_G1, Math_G2 and Math_G3 into a dataset variable. However when I do this, there are additional rows that pops up which have NA as values throughout all its attributes including its row number (eg. NA.1, NA.2 ...) How do I fix this?
I have already tried to use the c() function to attempt to filter out all these results but these rows are still there, in addition to this, i have also used the which() function but they are still there.
Here is my code :
dat <- read.csv(file = "final merged.csv", stringsAsFactors=FALSE, na.strings=c("NA", "NULL"))
dat_small <- dat[c("age","traveltime","studytime",
"failures","famrel","freetime","goout","Dalc","Walc",
"health","absences","Math_G1","Math_G2","Math_G3","Por_G1","Por_G2","Por_G3","DoubleSub")]
sample_size <- 500
all_set <- sample(1:length(dat[,1]),sample_size,replace = F)
dat <- dat_small[all_set,]
index_na_math <- which(is.na(c(dat$Math_G1,dat$Math_G2,dat$Math_G3)))
index_na_por <- which(is.na(c(dat$Por_G1,dat$Por_G2,dat$Por_G3)))
index_na_both <- c(index_na_math,index_na_por)
#each row of my dataset helps define a specific student
#portugese and math are subjects that students within the dataset takes
dat_purepor <- dat[which(index_na_math),] #students who takes only portugese
dat_puremath <- dat[c(index_na_por),] # students who takes only math
dat_math <- dat[c(-index_na_math),] #students who takes math + students who take both
dat_por <- dat[c(-index_na_por),] #students who take portugese + students who take both
dat_both <- dat[c(-index_na_both),] #students who takes both math and portugese
dat_purepor
dat_puremath
I expected the output to be filtered according to my conditions but without any rows with NA as the values for all its columns so I don't understand why the final results return NA.
Here is a preview of the dataset dat_small:
> dat_small
age traveltime studytime failures famrel freetime goout Dalc Walc health absences Math_G1 Math_G2 Math_G3 Por_G1 Por_G2 Por_G3 DoubleSub
1 18 2 2 0 4 3 4 1 1 3 6 5 6 6 13 13 13 1
2 17 1 2 0 5 3 3 1 1 3 4 5 5 6 15 15 15 1
3 15 1 2 3 4 3 2 2 3 3 10 7 8 10 10 12 13 1
4 15 1 3 0 3 2 2 1 1 5 2 15 14 15 14 14 14 1
5 16 1 2 0 4 3 2 1 2 5 4 6 10 10 13 13 13 1
6 16 1 2 0 5 4 2 1 2 5 10 15 15 15 10 13 13 1
7 16 1 2 0 4 4 4 1 1 3 0 12 12 11 14 14 16 1
8 17 2 2 0 4 1 4 1 1 1 6 6 5 6 12 13 13 1
9 15 1 2 0 4 2 2 1 1 1 0 16 18 19 13 17 17 1
10 15 1 2 0 5 5 1 1 1 5 0 14 15 15 9 10 11 1
11 15 1 2 0 3 3 3 1 2 2 0 10 8 9 15 15 15 1
12 15 3 3 0 5 2 2 1 1 4 4 10 12 12 10 12 13 1
13 15 1 1 0 4 3 3 1 3 5 2 14 14 14 13 14 15 1
14 15 2 2 0 5 4 3 1 2 3 2 10 10 11 14 14 14 1
15 15 1 3 0 4 5 2 1 1 3 0 14 16 16 11 12 14 1
16 16 1 1 0 4 4 4 1 2 2 4 14 14 14 9 8 9 1
17 16 1 3 0 3 2 3 1 2 2 6 13 14 14 10 10 16 1
18 16 3 2 0 5 3 2 1 1 4 4 8 10 10 11 11 11 1
19 17 1 1 3 5 5 5 2 4 5 16 6 5 5 10 13 13 1
20 16 1 1 0 3 1 3 1 3 5 4 8 10 10 14 14 14 1
21 15 1 2 0 4 4 1 1 1 1 0 13 14 15 9 8 10 1
22 15 1 1 0 5 4 2 1 1 5 0 12 15 15 10 13 13 1
23 16 1 2 0 4 5 1 1 3 5 2 15 15 16 11 10 11 1
24 16 2 2 0 5 4 4 2 4 5 0 13 13 12 14 14 14 1
25 15 1 3 0 4 3 2 1 1 5 2 10 9 8 10 11 10 1
26 16 1 1 2 1 2 2 1 3 5 14 6 9 8 13 13 13 1
27 15 1 1 0 4 2 2 1 2 5 2 12 12 11 12 11 12 1
28 15 1 1 0 2 2 4 2 4 1 4 15 16 15 14 12 12 1
29 16 1 2 0 5 3 3 1 1 5 4 11 11 11 10 10 1 1
30 16 1 2 0 4 4 5 5 5 5 16 10 12 11 9 12 12 1
31 15 1 2 0 5 4 2 3 4 5 0 9 11 12 9 10 11 1
32 15 2 2 0 4 3 1 1 1 5 0 17 16 17 14 14 16 1
33 15 1 2 0 4 5 2 1 1 5 0 17 16 16 14 14 16 1
34 15 1 2 0 5 3 2 1 1 2 0 8 10 12 10 13 13 1
35 16 1 1 0 5 4 3 1 1 5 0 12 14 15 9 12 12 1
36 15 2 1 0 3 5 1 1 1 5 0 8 7 6 14 13 12 1
37 15 1 3 0 5 4 3 1 1 4 2 15 16 18 14 14 16 1
38 16 2 3 0 2 4 3 1 1 5 7 15 16 15 9 9 8 1
39 15 1 3 0 4 3 2 1 1 5 2 12 12 11 14 13 12 1
40 15 1 1 0 4 3 1 1 1 2 8 14 13 13 14 13 12 1
41 16 2 2 1 3 3 3 1 2 3 25 7 10 11 13 13 13 1
42 15 1 1 0 5 4 3 2 4 5 8 12 12 12 10 13 13 1
43 15 1 2 0 4 3 3 1 1 5 2 19 18 18 9 12 12 1
44 15 1 1 0 5 4 1 1 1 1 0 8 8 11 10 13 13 1
45 16 2 2 1 4 3 3 2 2 5 14 10 10 9 11 11 11 1
46 15 1 2 0 5 2 2 1 1 5 8 8 8 6 12 11 12 1
47 16 1 2 0 2 3 5 1 4 3 12 11 12 11 10 11 11 1
48 16 1 4 0 4 2 2 1 1 2 4 19 19 20 14 14 16 1
49 15 1 2 0 4 3 3 2 2 5 2 15 15 14 10 13 13 1
50 15 1 2 1 4 4 4 1 1 3 2 7 7 7 15 15 15 1
51 16 3 2 0 4 3 3 2 3 4 2 12 13 13 13 13 13 1
52 15 1 2 0 4 3 3 1 1 5 2 11 13 13 16 14 16 1
53 15 2 1 1 5 5 5 3 4 5 6 11 11 10 14 14 16 1
54 15 1 1 0 3 3 4 2 3 5 0 8 10 11 11 12 13 1
55 15 1 1 0 5 3 4 4 4 1 6 10 13 13 13 12 13 1
[ reached getOption("max.print") -- omitted 889 rows ]
Here is a preview of what happens when i run the dat_puremath dataset.
> dat_puremath
age traveltime studytime failures famrel freetime goout Dalc Walc health absences Math_G1 Math_G2 Math_G3 Por_G1 Por_G2 Por_G3 DoubleSub
918 15 2 4 0 4 4 2 2 3 3 12 16 16 16 NA NA NA 0
931 16 1 2 3 2 3 3 2 2 4 5 7 7 7 NA NA NA 0
933 16 1 2 0 3 3 4 1 1 4 0 12 13 14 NA NA NA 0
935 16 1 1 0 4 5 2 1 1 5 20 13 12 12 NA NA NA 0
927 16 2 2 0 3 4 4 1 4 5 2 13 13 11 NA NA NA 0
929 17 1 2 0 5 3 3 1 1 3 0 8 8 9 NA NA NA 0
942 17 1 3 0 3 3 2 2 2 3 3 11 11 11 NA NA NA 0
928 16 1 2 0 1 2 2 1 2 1 14 12 13 12 NA NA NA 0
936 17 1 3 0 3 2 3 1 1 4 4 10 9 9 NA NA NA 0
939 17 1 4 0 5 2 2 1 2 5 0 17 17 18 NA NA NA 0
941 17 1 2 0 4 2 2 1 1 3 12 11 9 9 NA NA NA 0
937 17 1 2 0 5 4 5 1 2 5 4 10 9 11 NA NA NA 0
925 16 1 2 0 4 4 2 1 1 3 0 14 14 14 NA NA NA 0
938 17 1 3 0 4 3 3 1 1 3 6 13 12 12 NA NA NA 0
921 15 1 3 0 4 2 2 1 1 5 2 9 11 11 NA NA NA 0
943 17 1 3 0 4 4 3 1 1 5 7 12 14 14 NA NA NA 0
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.11 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.12 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.16 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.17 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.18 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.19 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.20 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.21 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.22 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.23 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.24 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.25 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.26 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.27 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.28 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.29 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.30 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.31 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Can someone explain why this happens and how I can fix it? Thank you!
When indexing, using is.na(c(dat$Math_G1,dat$Math_G2,dat$Math_G3)) creates an array of length 3*nrow(dat), so when applying the indices it does not behave as expected once past the index number nrow(dat).
Try the following
index_na_math <- (is.na(dat$Math_G1) | is.na(dat$Math_G2) | is.na(dat$Math_G3))
similarly for the other one, and then
index_na_both <- index_na_math | index_na_por
# or depending what you mean by 'both'
index_na_both <- index_na_math & index_na_por
The subsetting with dat_math <- dat[!index_na_math,] will yield the expected result (accordingly for the others).
I have a large dataframe, 300+ columns (time series) with about 2600 observations. The columns are filled with a lot of NA's and then a short time series, and then typically NA's again. I would like to find the first non-NA value in each column and replace it with NA.
This is what I'm hoping to achieve, only with a much bigger dataframe:
Before:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 1 1 NA NA
4 2 2 1 1
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
After:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 2 2 NA NA
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
I've searched around and found a way to do this for each column, but my efforts to apply it to the whole dataframe has proven difficult.
I have created an example dataframe to reproduce my original dataframe:
#Dataframe with NA
x1=x2=c(NA,NA,1:10,NA,NA)
x3=x4=c(NA,NA,NA,1:7,NA,NA,NA,NA)
df=data.frame(x1,x2,x3,x4)
I have used this to replace the first value with NA in 1 column (provided by #Joshua Ulrich here), however I would like to apply it to all columns without manually changing 300+ codes:
NonNAindex <- which(!is.na(df[,1]))
firstNonNA <- min(NonNAindex)
is.na(df[,1]) <- seq(firstNonNA, length.out=1)
I have tried to set the above as a function and run it for all columns with apply/lapply, as well as a for loop, but haven't really figured out how to apply the changes to my dataframe. I'm sure there is something I've completely overlooked as I'm just taking my first small steps in R.
All suggestions would be highly appreciated!
We can use base R
df1[] <- lapply(df1, function(x) replace(x, which(!is.na(x))[1], NA))
df1
# x1 x2 x3 x4
#1 NA NA NA NA
#2 NA NA NA NA
#3 NA NA NA NA
#4 2 2 NA NA
#5 3 3 2 2
#6 4 4 3 3
#7 5 5 4 4
#8 6 6 5 5
#9 7 7 6 6
#10 8 8 7 7
#11 9 9 NA NA
#12 10 10 NA NA
#13 NA NA NA NA
#14 NA NA NA NA
Or as #thelatemail suggested
df1[] <- lapply(df1, function(x) replace(x, Position(Negate(is.na), x), NA))
Since you would like to do this for all columns, you could use the mutate_all function from dplyr. See http://dplyr.tidyverse.org/ for more information. In particular, you may want to look at some of the examples shown here.
library(dplyr)
mutate_all(df, funs(if_else(row_number() == min(which(!is.na(.))), NA_integer_, .)))
#> x1 x2 x3 x4
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 2 2 NA NA
#> 5 3 3 2 2
#> 6 4 4 3 3
#> 7 5 5 4 4
#> 8 6 6 5 5
#> 9 7 7 6 6
#> 10 8 8 7 7
#> 11 9 9 NA NA
#> 12 10 10 NA NA
#> 13 NA NA NA NA
#> 14 NA NA NA NA
I need some help with subset/filter of data.frame. Below is the code for my random dataset.
A <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
B <- c(3,3,3,3,4,4,4,4,1,1,1,1,2,2,2,2)
C <- c(1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4)
Fakey <- data.frame(A, B, C)
Filter_Fakey <- subset(Fakey, (Fakey>1 & Fakey<4))
That last line of coode results in the following:
> Filter_Fakey
A B C
5 2 4 3
6 2 4 3
7 2 4 3
8 2 4 3
9 3 1 2
10 3 1 2
11 3 1 2
12 3 1 2
NA NA NA NA
NA.1 NA NA NA
NA.2 NA NA NA
NA.3 NA NA NA
NA.4 NA NA NA
NA.5 NA NA NA
NA.6 NA NA NA
NA.7 NA NA NA
NA.8 NA NA NA
NA.9 NA NA NA
NA.10 NA NA NA
NA.11 NA NA NA
NA.12 NA NA NA
NA.13 NA NA NA
NA.14 NA NA NA
NA.15 NA NA NA
But What I really want is this,
> Filter_Fakey
A B C
5 2 3 3
6 2 3 3
7 2 3 3
8 2 3 3
9 3 2 2
10 3 2 2
11 3 2 2
12 3 2 2
NA NA NA NA
NA.1 NA NA NA
NA.2 NA NA NA
NA.3 NA NA NA
NA.4 NA NA NA
NA.5 NA NA NA
NA.6 NA NA NA
NA.7 NA NA NA
NA.8 NA NA NA
NA.9 NA NA NA
NA.10 NA NA NA
NA.11 NA NA NA
NA.12 NA NA NA
NA.13 NA NA NA
NA.14 NA NA NA
NA.15 NA NA NA
I've tried subset(), subset(with a negation condition), filter{dplyr}, and the different bracket notations ('[' and '[['). Thanks for helping me out.
Use lapply to loop through columns of the data frame, and set values out of conditions to be NA if that is what you are after. Use order(is.na(...)) to arrange NA values to the last positions:
do.call(cbind, lapply(Fakey, function(col) {
col[col <= 1 | col >= 4] <- NA; col[order(is.na(col))]
}))
A B C
1 2 3 3
2 2 3 3
3 2 3 3
4 2 3 3
5 3 2 2
6 3 2 2
7 3 2 2
8 3 2 2
9 NA NA NA
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 NA NA NA
14 NA NA NA
15 NA NA NA
16 NA NA NA
Another option is using length<- to pad NA at the end after subsetting each of the columns using the logical condition.
data.frame(lapply(Fakey, function(x) `length<-`(x[x > 1 & x <4], nrow(Fakey))))
# A B C
#1 2 3 3
#2 2 3 3
#3 2 3 3
#4 2 3 3
#5 3 2 2
#6 3 2 2
#7 3 2 2
#8 3 2 2
#9 NA NA NA
#10 NA NA NA
#11 NA NA NA
#12 NA NA NA
#13 NA NA NA
#14 NA NA NA
#15 NA NA NA
#16 NA NA NA
I have four vectors (columns)
x y z t
1 1 1 10
1 1 1 15
2 4 1 14
2 3 1 15
2 2 1 17
2 1 2 19
1 4 2 18
1 4 2 NA
2 2 2 45
3 3 2 NA
3 1 3 59
4 3 3 23
1 4 3 45
4 4 4 74
2 1 4 86
How can I calculate mean and median of vector t, for each value of vector y (from 1 to 4) where x=1, z=1, using aggregate function in R?
It was discussed how to do it with 3 parameters (Multiple Aggregation in R) but it`s a little unclear how to do it with 4 parameters.
Thank you.
You could try something like this in data.table
data <- data.table(yourdataframe)
bar <- data[,.N,by=y]
foo <- data[x==1 & z==1,list(mean.t=mean(t,na.rm=T),median.t=median(t,na.rm=T)),by=y]
merge(bar[,list(y)],foo,by="y",all.x=T)
y mean.t median.t
1: 1 12.5 12.5
2: 2 NA NA
3: 3 NA NA
4: 4 NA NA
You probably could do the same in aggregate, but I am not sure you can do it in one easy step.
An answer to to an additional request in the comments...
bar <- data.table(expand.grid(y=unique(data$y),z=unique(data[z %in% c(1,2,3,4),z])))
foo <- data[x==1 & z %in% c(1,2,3,4),list(
mean.t=mean(t,na.rm=T),
median.t=median(t,na.rm=T),
Q25.t=quantile(t,0.25,na.rm=T),
Q75.t=quantile(t,0.75,na.rm=T)
),by=list(y,z)]
merge(bar[,list(y,z)],foo,by=c("y","z"),all.x=T)
y z mean.t median.t Q25.t Q75.t
1: 1 1 12.5 12.5 11.25 13.75
2: 1 2 NA NA NA NA
3: 1 3 NA NA NA NA
4: 1 4 NA NA NA NA
5: 2 1 NA NA NA NA
6: 2 2 NA NA NA NA
7: 2 3 NA NA NA NA
8: 2 4 NA NA NA NA
9: 3 1 NA NA NA NA
10: 3 2 NA NA NA NA
11: 3 3 NA NA NA NA
12: 3 4 NA NA NA NA
13: 4 1 NA NA NA NA
14: 4 2 18.0 18.0 18.00 18.00
15: 4 3 45.0 45.0 45.00 45.00
16: 4 4 NA NA NA NA