Apply function over consecutive groups in vector [duplicate] - r

This question already has answers here:
Calculate the mean of every 13 rows in data frame
(4 answers)
Closed 1 year ago.
I want to calculate the mean of every three consecutive values in a vector.
For example:
Vec <- 1:10
I would like the output to be like the screenshot below:

You can create the following function to calculate means by groups of 3 (or any other number):
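f <- function(x, k = 3) {
  # i runs over the last index of each complete group: k, 2k, 3k, ...
  # (assumes length(x) >= k, otherwise seq() fails with "wrong sign in 'by'")
  for (i in seq(k, length(x), k))
    # store the group mean in position i/k; this never overwrites
    # elements that a later iteration still needs to read
    x[(i/k)] <- mean(x[(i-k+1):i])
  # keep one mean per complete group; a trailing partial group is dropped
  return(x[1:(length(x)/k)])
}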
f(1:15)
[1] 2 5 8 11 14
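A loop-free alternative, sketched under the assumption that a trailing partial group should still get its own mean (the function above drops it: `f(1:10)` returns only `2 5 8`). The helper name `group_means` is hypothetical; it builds a group index with integer division and lets `tapply()` compute one mean per group.

```r
# Hypothetical helper: mean of each run of k consecutive elements.
group_means <- function(x, k = 3) {
  # Elements 1..k get index 0, k+1..2k get index 1, and so on;
  # a trailing partial group keeps its own index.
  groups <- (seq_along(x) - 1) %/% k
  # tapply() returns a 1-d array; as.vector() drops the dim and names
  as.vector(tapply(x, groups, mean))
}

group_means(1:15)  # 2 5 8 11 14
group_means(1:10)  # 2 5 8 10
```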

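We can create a grouping variable using gl and then get the mean with ave. Note that ave returns a vector the same length as Vec, repeating each group's mean for every element in that group: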
ave(Vec, as.numeric(gl(length(Vec), 3, length(Vec))))
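To make the difference between the two outputs concrete, here is a small sketch on `Vec <- 1:10` from the question. The name `grp` and the `ceiling()` call are additions for illustration: sizing the number of levels to the number of groups avoids empty factor levels, which `ave()` tolerates but which would show up as `NA` entries in `tapply()`.

```r
Vec <- 1:10
# Group labels 1 1 1 2 2 2 3 3 3 4, built as gl(n_levels, k, length)
grp <- gl(ceiling(length(Vec) / 3), 3, length(Vec))
ave(Vec, grp)                      # 2 2 2 5 5 5 8 8 8 10 (one value per element)
as.vector(tapply(Vec, grp, mean))  # 2 5 8 10 (one value per group)
```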

Related

Finding the maximum value for each row and extract column names [duplicate]

This question already has answers here:
R Create column which holds column name of maximum value for each row
(4 answers)
Closed 1 year ago.
Say we have the following matrix,
x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
What I'm trying to do is:
1- Find the maximum value of each row. For this part, I'm doing the following,
df <- apply(X=x, MARGIN=1, FUN=max)
2- Then, I want to extract the column names of the maximum values and put them next to the values. Following the reproducible example, it would be "C" for the three rows.
Any assistance would be wonderful.
You can use apply like
maxColumnNames <- apply(x,1,function(row) colnames(x)[which.max(row)])
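For larger matrices, base R's `max.col()` returns the column index of each row's maximum in one vectorized call instead of calling a function per row. A sketch on the matrix from the question; `ties.method = "first"` is chosen here to match `which.max()`, since the default (`"random"`) breaks ties at random. Unlike the `apply()` version, the result is not named by row.

```r
x <- matrix(1:9, nrow = 3, dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))
# Column index of each row's maximum; "first" mimics which.max() on ties
idx <- max.col(x, ties.method = "first")
maxColumnNames <- colnames(x)[idx]
maxColumnNames  # "C" "C" "C"
```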
Since x is a numeric matrix, you can't add the names as an extra column without the whole matrix being converted to a character matrix.
Instead, you can convert it to a data.frame and do:
resDf <- cbind(data.frame(x),data.frame(maxColumnNames = maxColumnNames))
resulting in
resDf
A B C maxColumnNames
X 1 4 7 C
Y 2 5 8 C
Z 3 6 9 C

How to add cells based off of a specific integer? [duplicate]

This question already has answers here:
Sum elements of a vector beween zeros in R
(3 answers)
Closed 2 years ago.
I want to sum values from a column. They come in a sequence like:
0,225,2352,34234,23442,23456,0,123,...
I want to sum the values from each 0 up to, but not including, the next 0.
For example, I want an output of
(0+225+2352+34234+23442+23456),(0+123+,...,),...
and I want to store these totals as a new column.
One simple solution in base R is
sapply(split(x, cumsum(x == 0)), sum)
split groups the elements that should be summed together (a new group starts at every 0, because cumsum(x == 0) increases by one there), and sapply then sums each group. The result is a named numeric vector.
Sample data
x <- c(0,225,2352,34234,23442,23456,0,123,2,0,1,42)
sapply(split(x, cumsum(x == 0)), sum)
# 1 2 3
# 83709 125 43
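The question also asks to store the totals as a new column. A minimal sketch with `ave()`, which returns a vector the same length as its input, so every row gets the total of its group; the data frame name `df` and the column name `total` are hypothetical. As with the `split()` answer, any values before the first 0 would form a group of their own.

```r
x <- c(0, 225, 2352, 34234, 23442, 23456, 0, 123, 2, 0, 1, 42)
df <- data.frame(x = x)
# Group id increases at every 0; ave() repeats each group's sum on its rows
df$total <- ave(df$x, cumsum(df$x == 0), FUN = sum)
df$total
```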

Split dataframe into 20 groups based on column values [duplicate]

This question already has answers here:
Splitting a continuous variable into equal sized groups
(11 answers)
How to categorize a continuous variable in 4 groups of the same size in R?
(1 answer)
R divide data into groups
(1 answer)
Closed 2 years ago.
I am fairly new to R and can't find a concise solution to this problem.
I have a data frame in R called df that looks like this. It contains a column called value, holding values from 0 to 1 in ascending order, and a binary column called flag that contains either 0 or 1.
df
value flag
0.033 0
0.139 0
0.452 1
0.532 0
0.687 1
0.993 1
I wish to split this data frame into X groups covering 0 to 1. For example, with 4 groups the data would be split into 0-0.25, 0.25-0.5, 0.5-0.75 and 0.75-1, and each group would keep the flag values of its rows.
I want the solution to be scalable, so that I can split the data into more groups if needed. I am also limited to the tidyverse packages.
Does anyone have a solution for this? Thanks
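If n is the number of partitions:
n <- 4
GroupedList <- lapply(seq_len(n), function(i) {
  lower <- (i - 1) / n
  upper <- i / n
  # half-open interval [lower, upper); the last group also includes 1,
  # so values lying exactly on a boundary are not dropped
  df[df$value >= lower & (df$value < upper | i == n), ]
})
This produces a list of data frames, one per interval, each keeping the flag column alongside value.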
You can use cut to divide the data into n groups and pass the result to split to get a list of data frames.
n <- 4
list_df <- split(df, cut(df$value, breaks = n))
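cut(df$value, breaks = n) divides the observed range of value into n equal-width intervals. If you want to split the fixed range 0-1 into n groups instead, you can do:
list_df <- split(df, cut(df$value, seq(0, 1, length.out = n + 1), include.lowest = TRUE))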
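A runnable sketch on the sample data from the question, counting rows and set flags per interval. The data frame is rebuilt here from the printed table; `cut()` and `split()` are base R, so they are available alongside the tidyverse.

```r
df <- data.frame(
  value = c(0.033, 0.139, 0.452, 0.532, 0.687, 0.993),
  flag  = c(0, 0, 1, 0, 1, 1)
)
n <- 4
# n equal-width intervals on [0, 1]; include.lowest keeps a value of exactly 0
groups <- cut(df$value, seq(0, 1, length.out = n + 1), include.lowest = TRUE)
list_df <- split(df, groups)
sapply(list_df, nrow)                     # rows per interval: 2 1 2 1
sapply(list_df, function(d) sum(d$flag))  # flags set per interval: 0 1 1 1
```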

R Data-Frame: Get Maximum of Variable B conditional on Variable A [duplicate]

This question already has answers here:
Extract the maximum value within each group in a dataframe [duplicate]
(3 answers)
Closed 7 years ago.
I am searching for an efficient and fast way to do the following:
I have a data frame with, say, 2 variables, A and B, where the values for A can occur several times:
mat<-data.frame('VarA'=rep(seq(1,10),2),'VarB'=rnorm(20))
VarA VarB
1 0.95848233
2 -0.07477916
3 2.08189370
4 0.46523827
5 0.53500190
6 0.52605101
7 -0.69587974
8 -0.21772252
9 0.29429577
10 3.30514605
1 0.84938361
2 1.13650996
3 1.25143046
Now I want to get a vector giving me for every unique value of VarA
unique(mat$VarA)
the maximum of VarB conditional on VarA.
In the example here that would be
1 0.95848233
2 1.13650996
3 2.08189370
etc...
My data-frame is very big so I want to avoid the use of loops.
Try this:
library(dplyr)
mat %>% group_by(VarA) %>%
summarise(max=max(VarB))
Try the data.table package:
library(data.table)
mat <- data.table(mat)
result <- mat[,max(VarB),VarA]
print(result)
Try this:
library(plyr)
ddply(mat, .(VarA), summarise, VarB=max(VarB))
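If you would rather avoid add-on packages, base R's `aggregate()` and `tapply()` also compute a per-group maximum without an explicit loop. A sketch using the `mat` construction from the question; the `set.seed()` call is added only to make the random data reproducible.

```r
set.seed(42)  # added only so the random example is reproducible
mat <- data.frame(VarA = rep(seq(1, 10), 2), VarB = rnorm(20))

# Data frame with one row per unique VarA, sorted by VarA
res <- aggregate(VarB ~ VarA, data = mat, FUN = max)

# Named vector: names are the VarA values, entries the group maxima
maxB <- tapply(mat$VarB, mat$VarA, max)
```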

R: How to compute the mean by ID in a given data frame? [duplicate]

This question already has answers here:
Calculating statistics on subsets of data [duplicate]
(3 answers)
Closed 7 years ago.
I have the following data:
ID Value
1 3
1 5
How can I compute the mean by ID, and put the mean in the data frame as a new variable such that it is repeated for the same ID. The result should look like this:
ID Value Mean
1 3 4
1 5 4
Thanks.
You can compute the mean by group using ave(). Assuming your data frame is called df, you can do the following:
df$Mean <- with(df, ave(Value, ID, FUN=mean))
This adds Mean as another column in your data frame.
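The example in the question has only one ID, so the grouping is not visible. A sketch with a second, made-up ID shows that `ave()` repeats each group's own mean on that group's rows.

```r
# Made-up data with two IDs
df <- data.frame(ID = c(1, 1, 2, 2, 2), Value = c(3, 5, 10, 20, 30))
df$Mean <- with(df, ave(Value, ID, FUN = mean))
df$Mean  # 4 4 20 20 20
```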
You can use the 'ave' function from 'base' R:
df=data.frame(ID=c(1,1), value=c(3,5))
df['mean'] <- ave(df$value, df$ID, FUN=mean)
> df
### ID value mean
### 1 1 3 4
### 2 1 5 4
