I think this will have a simple answer, but I can't work it out! Here is an example using the iris dataset:
a <- table(iris[,2])
b <- table(iris[,3])
How do I add these two tables together? For example, the value 3 would have a count of 27 (26 + 1) and the value 3.3 a count of 8 (6 + 2) in the new output table.
Any help much appreciated.
This will work if you want to use the variables which are present in both a and b:
n <- intersect(names(a), names(b))
a[n] + b[n]
# 3 3.3 3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.4
# 27 8 8 5 4 7 5 6 4 5 5
If you want to use all variables:
n <- intersect(names(a), names(b))
res <- c(a[!(names(a) %in% n)], b[!(names(b) %in% n)], a[n] + b[n])
res[order(names(res))] # sort the results
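For completeness, here is a base-R sketch of another way to keep values present in either table: convert both tables to data frames, stack them, and re-tabulate with xtabs, which sums Freq over the combined Var1 levels:

```r
# using a and b from the question
a <- table(iris[, 2])
b <- table(iris[, 3])

# stack the (Var1, Freq) data frames and sum Freq by Var1;
# rbind() unions the factor levels, so values in only one table survive
res <- xtabs(Freq ~ Var1, data = rbind(as.data.frame(a), as.data.frame(b)))
res["3"]    # 27
res["3.3"]  # 8
```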
temp<-merge(a,b,by='Var1')
temp$sum<-temp$Freq.x + temp$Freq.y
Var1 Freq.x Freq.y sum
1 3 26 1 27
2 3.3 6 2 8
3 3.5 6 2 8
4 3.6 4 1 5
5 3.7 3 1 4
6 3.8 6 1 7
7 3.9 2 3 5
8 4 1 5 6
9 4.1 1 3 4
10 4.2 1 4 5
11 4.4 1 4 5
Here is another one:
transform(merge(a,b, by="Var1"), sum=Freq.x + Freq.y)
Var1 Freq.x Freq.y sum
1 3 26 1 27
2 3.3 6 2 8
3 3.5 6 2 8
4 3.6 4 1 5
5 3.7 3 1 4
6 3.8 6 1 7
7 3.9 2 3 5
8 4 1 5 6
9 4.1 1 3 4
10 4.2 1 4 5
11 4.4 1 4 5
Here's a slightly tortured one-liner version of the merge() solution:
do.call(function(Var1, Freq.x, Freq.y) data.frame(Var1=Var1, Freq=rowSums(cbind(Freq.x, Freq.y))), merge(a, b, by="Var1"))
And here's the version if you want to use all variables:
do.call(function(Var1, Freq.x, Freq.y) data.frame(Var1=Var1, Freq=rowSums(cbind(Freq.x, Freq.y), na.rm=TRUE)), merge(a, b, by="Var1", all=TRUE))
Unlike the transform() one-liner, it doesn't accumulate .x and .y so it can be used iteratively.
The merge function of the data.table package may be what you want: https://rpubs.com/ronasta/join_data_tables
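A sketch of that idea with data.table (assuming a and b from the question; in data.table, as.data.table() on a one-dimensional table produces a value column V1 and a count column N):

```r
library(data.table)

a <- table(iris[, 2])
b <- table(iris[, 3])

# full outer join on the value column, then add the counts,
# treating a missing count (value absent from one table) as 0
m <- merge(as.data.table(a), as.data.table(b), by = "V1", all = TRUE)
m[, total := rowSums(cbind(N.x, N.y), na.rm = TRUE)]
```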
Assuming the following data frame:
set.seed(2409)
df <- data.frame(group = rep(1:4, each=5), value = round(runif(20, 1, 10),0))
df
group value
1 1 4
2 1 9
3 1 7
4 1 1
5 1 6
6 2 5
7 2 8
8 2 5
9 2 5
10 2 3
11 3 6
12 3 1
13 3 4
14 3 4
15 3 9
16 4 6
17 4 5
18 4 7
19 4 7
20 4 4
I'm now interested in calculating the mean of the value column based on the first three (or n) rows for each group.
So, what I want to achieve is:
group value mean
1 1 4 6.666667
2 1 9 6.666667
3 1 7 6.666667
4 1 1 6.666667
5 1 6 6.666667
6 2 5 6.000000
7 2 8 6.000000
8 2 5 6.000000
9 2 5 6.000000
10 2 3 6.000000
11 3 6 3.666667
12 3 1 3.666667
13 3 4 3.666667
14 3 4 3.666667
15 3 9 3.666667
16 4 6 6.000000
17 4 5 6.000000
18 4 7 6.000000
19 4 7 6.000000
20 4 4 6.000000
I can get the values in the mean column e.g. by running:
sapply(split(df, df$group),
function(x) mean(x[1:3,]$value))
1 2 3 4
6.666667 6.000000 3.666667 6.000000
But I am pretty sure there has to be a more elegant way to get these values, maybe by using dplyr. It's easy to calculate the overall mean for each group:
library(dplyr)
df <- df %>%
group_by(group) %>%
mutate(mean = mean(value))
df
group value mean
<int> <dbl> <dbl>
1 1 4 5.4
2 1 9 5.4
3 1 7 5.4
4 1 1 5.4
5 1 6 5.4
6 2 5 5.2
7 2 8 5.2
8 2 5 5.2
9 2 5 5.2
10 2 3 5.2
11 3 6 4.8
12 3 1 4.8
13 3 4 4.8
14 3 4 4.8
15 3 9 4.8
16 4 6 5.8
17 4 5 5.8
18 4 7 5.8
19 4 7 5.8
20 4 4 5.8
But how do I consider only the first 3 rows here?
Thank you very much for your help!
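With dplyr, one way (a minimal sketch, using the df built above) is to pass only head(value, 3) to mean() inside mutate():

```r
library(dplyr)

set.seed(2409)
df <- data.frame(group = rep(1:4, each = 5),
                 value = round(runif(20, 1, 10), 0))

# mean of the first 3 rows per group, recycled across the whole group
df <- df %>%
  group_by(group) %>%
  mutate(mean = mean(head(value, 3))) %>%
  ungroup()
```

head(value, n) generalizes this to the first n rows of each group.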
If you need to do it repeatedly (programmatically), you can do
means <- c(2,3,5)
df %>%
group_by(group) %>%
mutate(as.data.frame(lapply(setNames(means, paste0("mean", means)),
function(z) mean(head(value,z))))) %>%
ungroup()
# # A tibble: 20 x 5
# group value mean2 mean3 mean5
# <int> <dbl> <dbl> <dbl> <dbl>
# 1 1 4 6.5 6.67 5.4
# 2 1 9 6.5 6.67 5.4
# 3 1 7 6.5 6.67 5.4
# 4 1 1 6.5 6.67 5.4
# 5 1 6 6.5 6.67 5.4
# 6 2 5 6.5 6 5.2
# 7 2 8 6.5 6 5.2
# 8 2 5 6.5 6 5.2
# 9 2 5 6.5 6 5.2
# 10 2 3 6.5 6 5.2
# 11 3 6 3.5 3.67 4.8
# 12 3 1 3.5 3.67 4.8
# 13 3 4 3.5 3.67 4.8
# 14 3 4 3.5 3.67 4.8
# 15 3 9 3.5 3.67 4.8
# 16 4 6 5.5 6 5.8
# 17 4 5 5.5 6 5.8
# 18 4 7 5.5 6 5.8
# 19 4 7 5.5 6 5.8
# 20 4 4 5.5 6 5.8
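For completeness, the same "first n rows per group" mean is available in base R via ave(), which recycles each group-wise result over that group's rows (a sketch, using the df from the question):

```r
set.seed(2409)
df <- data.frame(group = rep(1:4, each = 5),
                 value = round(runif(20, 1, 10), 0))

# mean of the first 3 values within each group, repeated for every row
df$mean <- ave(df$value, df$group, FUN = function(v) mean(head(v, 3)))
```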
I have 2 data frames of different sizes, but with common data in the first column, like this:
x <- data.frame(cbind(c(1,2,3,4,5,6,7,8,9,10),c(1,4,3,2,5,4,6,7,1,3)))
y <- data.frame(cbind(c(0,2,4,6,8,10),c(6,5,4,7,5,4)))
> x
X1 X2
1 1 1
2 2 4
3 3 3
4 4 2
5 5 5
6 6 4
7 7 6
8 8 7
9 9 1
10 10 3
> y
X1 X2
1 0 6
2 2 5
3 4 4
4 6 7
5 8 5
6 10 4
I've been trying to use the approx function to interpolate X2 in y, but I haven't been able to find examples where the two data frames have a different number of rows.
You can pass the points of y directly to approx and evaluate the interpolation at the X1 values of x via xout:
data.frame(X1=x$X1, X2=approx(y$X1, y$X2, xout=x$X1)$y)
#    X1  X2
# 1   1 5.5
# 2   2 5.0
# 3   3 4.5
# 4   4 4.0
# 5   5 5.5
# 6   6 7.0
# 7   7 6.0
# 8   8 5.0
# 9   9 4.5
# 10 10 4.0
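If you need the interpolation more than once, approxfun() builds a reusable linear interpolator from the same points:

```r
# using y from the question
y <- data.frame(X1 = c(0, 2, 4, 6, 8, 10), X2 = c(6, 5, 4, 7, 5, 4))

f <- approxfun(y$X1, y$X2)  # linear interpolation by default
f(5)   # 5.5, halfway between X2 at X1 = 4 and X1 = 6
```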
> sleep
extra group ID
1 0.7 1 1
2 -1.6 1 2
3 -0.2 1 3
4 -1.2 1 4
5 -0.1 1 5
6 3.4 1 6
7 3.7 1 7
8 0.8 1 8
9 0.0 1 9
10 2.0 1 10
11 1.9 2 1
12 0.8 2 2
13 1.1 2 3
14 0.1 2 4
15 -0.1 2 5
16 4.4 2 6
17 5.5 2 7
18 1.6 2 8
19 4.6 2 9
20 3.4 2 10
I have this dataset and I'm supposed to compare the effect that group has on the different subjects using two boxplots. But as you can see, group 1 and group 2 are both stored in the same column (group), so I don't know how to divide the data into group 1 and group 2. Can you help me with this?
You don't need to divide the data to put it into a boxplot:
boxplot(extra~group,data=sleep)
You can explore the different options available by using ?boxplot.
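If you really do want the data divided into the two groups (say, for other analyses), base R's split() does it, and boxplot() also accepts the resulting list directly:

```r
# one list element per group level of the built-in sleep dataset
groups <- split(sleep$extra, sleep$group)
boxplot(groups)  # draws the same pair of boxes
```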
Some people like to use the ggplot2 package:
library(ggplot2)
ggplot(sleep,aes(x=group,y=extra,group=group))+geom_boxplot()
Others prefer lattice:
library(lattice)
bwplot(group~extra,data=sleep)
This is a good dataset to use ggplot2 with.
library(ggplot2)
ggplot(sleep, aes(x=factor(group), y=extra)) + geom_boxplot()
I have a data frame in R that can be approximated as:
df <- data.frame(x = rep(1:5, each = 4), y = rep(2:6, each = 4), z = rep(3:7, each = 4))
> df
x y z
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 2 3 4
6 2 3 4
7 2 3 4
8 2 3 4
9 3 4 5
10 3 4 5
11 3 4 5
12 3 4 5
13 4 5 6
14 4 5 6
15 4 5 6
16 4 5 6
17 5 6 7
18 5 6 7
19 5 6 7
20 5 6 7
I'd like to compute colwise means at intervals of 5, and then collapse these means into a new data frame. For example, I'd like to compute the colwise means of df[1:5,], df[6:10,], df[11:15,], and df[16:20,], and return a df that looks as follows:
[,1] [,2] [,3]
[1,] 1.2 2.2 3.2
[2,] 2.4 3.4 4.4
[3,] 3.6 4.6 5.6
[4,] 4.8 5.8 6.8
I'm currently using a for-loop as such (where temp.coeff would correspond to the "5" specified above):
my.means <- NULL
for (j in 1:baseFreq) {
temp.mean <- colMeans(temp.df[(temp.coeff*(j-1)+1):(temp.coeff*j),])
my.means <- rbind(my.means, temp.mean)
}
my.means <- t(my.means)
collapsed.df <- t(data.frame(colMeans(my.means)))
But I feel like there's an apply statement that could do the job a lot more efficiently. In addition, while the above data frame only has 20 rows, the ones I'll be working on will have several thousand. Thoughts?
Many thanks in advance SO.
aggregate can do this if you aggregate against an appropriate running index. You do end up with another column in the result (which can be removed).
aggregate(. ~ rep(seq(nrow(df)/5), each=5), data=df, FUN=mean)
## rep(seq(nrow(df)/5), each = 5) x y z
## 1 1 1.2 2.2 3.2
## 2 2 2.4 3.4 4.4
## 3 3 3.6 4.6 5.6
## 4 4 4.8 5.8 6.8
I really think data.table works great for situations like this. It is fast and easy.
require("data.table")
dt <- data.table(df)
dt[,row.num:=.I]
dt[,lapply(.SD,mean),by=list(interval=cut(row.num,seq(0,nrow(dt),by=5)))]
# interval x y z
# 1: (0,5] 1.2 2.2 3.2
# 2: (5,10] 2.4 3.4 4.4
# 3: (10,15] 3.6 4.6 5.6
# 4: (15,20] 4.8 5.8 6.8
This is a possible solution with a combination of apply and sapply:
apply(df, 2, function(x) sapply(seq(1,nrow(df),5), function(y) mean(x[y:(y+4)])))
# x y z
#[1,] 1.2 2.2 3.2
#[2,] 2.4 3.4 4.4
#[3,] 3.6 4.6 5.6
#[4,] 4.8 5.8 6.8
Edit after comment by #jbaums: depending on the desired behavior, you might want to add na.rm=TRUE to the mean calculation:
apply(df, 2, function(x) sapply(seq(1,nrow(df),5), function(y) mean(x[y:(y+4)], na.rm = TRUE)))
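A compact base-R alternative (a sketch, assuming nrow(df) is a multiple of 5): rowsum() sums the rows within each block of 5, and dividing by the block size turns the sums into means:

```r
df <- data.frame(x = rep(1:5, each = 4), y = rep(2:6, each = 4), z = rep(3:7, each = 4))

block <- rep(seq_len(nrow(df) / 5), each = 5)  # 1 1 1 1 1 2 2 ...
res <- rowsum(as.matrix(df), block) / 5
res
#     x   y   z
# 1 1.2 2.2 3.2
# 2 2.4 3.4 4.4
# 3 3.6 4.6 5.6
# 4 4.8 5.8 6.8
```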
How can I subset data with logical conditions?
Assume I have the data below. I would like to subset it in two steps: first select all animals that have an FCR (Feed) record, and then bring all animals housed in the same pen as those animals into the new data set.
animal Feed Litter Pen
1 0.2 5 3
2 NA 5 3
3 0.2 5 3
4 0.2 6 4
5 0.3 5 4
6 0.3 4 4
7 0.3 5 3
8 0.3 5 3
9 NA 5 5
10 NA 3 5
11 NA 3 3
12 NA 3 5
13 0.4 7 3
14 0.4 7 3
15 NA 7 5
I'm assuming that "FCR record" (in your question) relates to "Feed". Then, if I understand the question correctly, you can do this:
split(df[complete.cases(df),], df[complete.cases(df), 4])
# $`3`
# animal Feed Litter Pen
# 1 1 0.2 5 3
# 3 3 0.2 5 3
# 7 7 0.3 5 3
# 8 8 0.3 5 3
# 13 13 0.4 7 3
# 14 14 0.4 7 3
#
# $`4`
# animal Feed Litter Pen
# 4 4 0.2 6 4
# 5 5 0.3 5 4
# 6 6 0.3 4 4
In the above, complete.cases drops any of the incomplete observations. If you needed to match the argument on a specific variable, you can use something like df[!is.na(df$Feed), ] instead of complete.cases. Then, split creates a list of data.frames split by Pen.
# all animals with Feed data
df[!is.na(df$Feed), ]
# all animals from pens with at least one animal with feed data in the pen
df[ave(!is.na(df$Feed), df$Pen, FUN = any), ]
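The same two subsets can also be written with dplyr (a sketch, assuming the data sits in a data frame df with the columns shown above):

```r
library(dplyr)

# data frame reconstructed from the question
df <- data.frame(
  animal = 1:15,
  Feed   = c(0.2, NA, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, NA, NA, NA, NA, 0.4, 0.4, NA),
  Litter = c(5, 5, 5, 6, 5, 4, 5, 5, 5, 3, 3, 3, 7, 7, 7),
  Pen    = c(3, 3, 3, 4, 4, 4, 3, 3, 5, 5, 3, 5, 3, 3, 5)
)

# all animals with Feed data
with_feed <- df %>% filter(!is.na(Feed))

# all animals from pens where at least one animal has Feed data
pen_has_feed <- df %>% group_by(Pen) %>% filter(any(!is.na(Feed))) %>% ungroup()
```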