I have a vector vec <- c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75) and an interval size that I want to use to break the vector's entries into groups.
In this example I want to break it into 9 different vectors based on the size of each entry.
In my case I want vector number 1 to hold the entries in the interval [1,9], vector 2 the entries in [10,18], etc.
In other words:
vec1: 2 3 4 8
vec2: 10 12 15
vec3: 19 20 23 27
etc.
I have tried using the split function but I do not know how to set a ratio that will work.
Maybe the following will do what you want.
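# Round the last break up to a multiple of 9 so that max(vec) is not dropped as NA
f <- cut(vec, seq(0, ceiling(max(vec) / 9) * 9, by = 9), include.lowest = TRUE)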
sp <- split(vec, f)
sp <- sp[sapply(sp, function(x) length(x) != 0)]
sp
Use integer division %/% to return a vector of which group each value belongs in. Then split into separate vectors. Use (vec-1) to be "inclusive", i.e. 27 goes with group 3, not group 4.
split(vec,(vec-1) %/% 9)
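To make the grouping step explicit, here is a minimal sketch of the integer-division approach, assuming an interval width of 9 as in the question. The names `width`, `group`, and `groups` are illustrative and do not come from the answer above.

```r
vec <- c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
width <- 9  # assumed interval size: [1,9], [10,18], ...

# Subtracting 1 first makes the upper bound inclusive, so 9, 18, 27, ...
# stay in the lower group; adding 1 gives 1-based group numbers.
group <- (vec - 1) %/% width + 1

# One list element per non-empty group, named vec1, vec2, ...
groups <- split(vec, group)
names(groups) <- paste0("vec", names(groups))
groups
```

With this data every one of the 9 intervals contains at least one value, so the list has 9 elements. An empty interval would simply be missing from the result.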
Edit:
Another way, using dplyr and cut, which explicitly tags each interval:
library(dplyr)
df2 <- data.frame(vec = vec)
df2 %>% mutate(interval = cut(vec, breaks = seq(0, ((max(vec) %/% 9) + 1) * 9, 9), include.lowest = TRUE, right = TRUE))
vec interval
1 2 [0,9]
2 3 [0,9]
3 4 [0,9]
4 8 [0,9]
5 10 (9,18]
6 12 (9,18]
7 15 (9,18]
8 19 (18,27]
9 20 (18,27]
10 23 (18,27]
11 27 (18,27]
Maybe this (one keep() call per interval; only the first two are shown):
library(purrr)
vec <- c(2, 3, 4, 8, 10 ,12, 15 ,19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
vec1 <- keep(vec, function(x) x >= 1 & (x) <= 9)
vec2 <- keep(vec, function(x) x >= 10 & (x) <= 18)
I have a data frame, and for each row I want to sum the variables listed in a vector and store the sum in a new variable. I also want the name of the new variable to be built from the names of the variables in the vector.
For example:
data
Name A_12 B_12 C_12 D_12 E_12
r1 1 5 12 21 15
r2 2 4 7 10 9
r3 5 15 16 9 6
r4 7 8 0 7 18
Let's say I have two vectors:
vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")
The result I want is:
New_data >
Name A_12 B_12 C_12 ABC_12 D_12 E_12 BCDE_12
r1 1 5 12 18 21 15 54
r2 2 4 7 13 10 9 32
r3 5 15 16 36 9 6 45
r4 7 8 0 15 7 18 40
I wrote a for loop to get the row sums for the columns in a vector, but I didn't get the correct result.
Please tell me if you need any more information or clarification.
Thank you
You can use rowSums and simple column-subsetting:
dat$ABC_12 <- rowSums(dat[,vector_1])
dat$BCDE_12 <- rowSums(dat[,vector_2])
dat
# Name A_12 B_12 C_12 D_12 E_12 ABC_12 BCDE_12
# 1 r1 1 5 12 21 15 18 53
# 2 r2 2 4 7 10 9 13 30
# 3 r3 5 15 16 9 6 36 46
# 4 r4 7 8 0 7 18 15 33
Note that if your frames inherit from data.table, then you'll need to use either subset(dat, select=vector_1) or dat[,..vector_1] instead of simply dat[,vector_1]; if you aren't already using data.table, then you can safely ignore this paragraph.
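If there are many column groups, the same rowSums idea can be applied to all of them at once. The following is a minimal base R sketch, assuming the example data from the question; the named list `col_groups` is a hypothetical helper in which each name is the new column and each value is the set of columns to sum.

```r
# Example data from the question
dat <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A_12 = c(1, 2, 5, 7),
  B_12 = c(5, 4, 15, 8),
  C_12 = c(12, 7, 16, 0),
  D_12 = c(21, 10, 9, 7),
  E_12 = c(15, 9, 6, 18)
)

# Hypothetical named list: name = new column, value = columns to sum
col_groups <- list(
  ABC_12  = c("A_12", "B_12", "C_12"),
  BCDE_12 = c("B_12", "C_12", "D_12", "E_12")
)

# One rowSums() call per group, assigned to the new columns in one step
dat[names(col_groups)] <- lapply(col_groups, function(cols) rowSums(dat[, cols]))
dat
```

The new columns are appended at the end of the data frame rather than placed next to their source columns; reorder afterwards if the position matters.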
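Like this (using dplyr/tidyverse):
library(dplyr)
df %>%
  rowwise() %>%
  mutate(
    # all_of() makes it explicit that vector_1 / vector_2 hold column names
    ABC_12 = sum(c_across(all_of(vector_1))),
    BCDE_12 = sum(c_across(all_of(vector_2)))
  )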
Though I'm not sure the sums are correct in your example
Edit:
Here's a function to help with the naming.
ex_fun <- function(vec, n_len){
paste0(paste(substr(vec,1,n_len), collapse = ""), substr(vec[1],n_len+1,nchar(vec[1])))
}
Which can then be implemented like so.
df %>%
  rowwise() %>%
  mutate(
    # !! and := let the computed string become the new column name
    !!ex_fun(vector_1, 1) := sum(c_across(all_of(vector_1))),
    !!ex_fun(vector_2, 1) := sum(c_across(all_of(vector_2)))
  )
Extra note:
If you put your vectors in a list, you can combine this with r2evans's answer and use a loop if you prefer.
vectors = list(vector_1, vector_2)
for (v in vectors){
df[ex_fun(v, 1)] <- rowSums(df[,v])
}
I believe this might work, so long as the column names differ only in their first character:
library("tidyverse")
#Input dataframe.
data <- data.frame(Name =c("r1", "r2", "r3", "r4"), A_12 = c(1, 2, 5, 7), B_12 = c(5, 4, 15, 8),
C_12 = c(12, 7, 16, 0), D_12 = c(21, 10, 9, 7), E_12 = c(15, 9, 6, 18))
#add all vectors to the "vectors" list. I have added vector_1 and vector_2, but
#there can be as many vectors as needed, they just need to be put in the list.
vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")
vector_list<-list(vector_1, vector_2)
vector_sum <- function(data, vector_list){
output <- data |>
dplyr::select(1, all_of(vector_list[[1]]))
for (i in vector_list) {
name1 <- substring(as.character(i), 1,1) |> paste(collapse = '')
name2 <- substring(as.character(i[1]), 2)
input_temp <- dplyr::select(data, all_of(i))
input_temp <- mutate(input_temp, temp=rowSums(input_temp))
names(input_temp)[names(input_temp) == "temp"] <- paste0(name1, name2) # paste0 avoids a space in the new name
output = cbind(output, input_temp)
}
output[, !duplicated(colnames(output))]
}
vector_sum(data, vector_list)
I have a dataframe like this:
V1 = paste0("AB", seq(1:48))
V2 = seq(1:48)
test = data.frame(name = V1, value = V2)
I want to calculate the means of the value-column and specific rows.
The pattern of the rows is pretty complicated:
Rows of MeanA1: 1, 5, 9
Rows of MeanA2: 2, 6, 10
Rows of MeanA3: 3, 7, 11
Rows of MeanA4: 4, 8, 12
Rows of MeanB1: 13, 17, 21
Rows of MeanB2: 14, 18, 22
Rows of MeanB3: 15, 19, 23
Rows of MeanB4: 16, 20, 24
Rows of MeanC1: 25, 29, 33
Rows of MeanC2: 26, 30, 34
Rows of MeanC3: 27, 31, 35
Rows of MeanC4: 28, 32, 36
Rows of MeanD1: 37, 41, 45
Rows of MeanD2: 38, 42, 46
Rows of MeanD3: 39, 43, 47
Rows of MeanD4: 40, 44, 48
As you can see, it starts at 4 different points (1, 13, 25, 37) and then always steps by +4; for each of the following 4 means it just moves one more row down.
I would like to have an output of all these means in one list.
Any ideas? NOTE: In this example the mean is of course always the middle number, but my real df is different.
I'm not quite sure about the output format you require, but the following code calculates what you want.
calc_mean1 <- function(x) mean(test$value[seq(x, by = 4, length.out = 3)])
calc_mean2 <- function(x){sapply(x:(x+3), calc_mean1)}
output <- lapply(seq(1, 37, 12), calc_mean2)
names(output) <- paste0('Mean', LETTERS[seq_along(output)]) # remove this line if more than 26 groups.
output
## $MeanA
## [1] 5 6 7 8
## $MeanB
## [1] 17 18 19 20
## $MeanC
## [1] 29 30 31 32
## $MeanD
## [1] 41 42 43 44
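Because the row pattern in the question is completely regular (offset 1 to 4, then steps of 4, then blocks of 12), another option is to reshape the column into a 3-dimensional array and average over one dimension. This is a different technique from the answer above, sketched under the assumption that the data frame has exactly 48 rows in that layout; the names `arr`, `means`, and `mean_vec` are illustrative.

```r
test <- data.frame(name = paste0("AB", 1:48), value = 1:48)

# Arrays fill column-major, so:
#   dim 1 = offset within a block (1..4)
#   dim 2 = the three rows spaced 4 apart
#   dim 3 = the block of 12 rows (A..D)
arr <- array(test$value, dim = c(4, 3, 4))

# Average over dim 2, keeping offset (rows) and block (columns)
means <- apply(arr, c(1, 3), mean)
dimnames(means) <- list(as.character(1:4), LETTERS[1:4])

# Flatten to a named vector: MeanA1, MeanA2, ..., MeanD4
mean_vec <- setNames(
  as.vector(means),
  paste0("Mean", rep(LETTERS[1:4], each = 4), 1:4)
)
mean_vec
```

`as.list(mean_vec)` turns the result into a list if that is the format needed.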
An idea via base R is to create a grouping variable for every 4 rows, split the data every 12 rows (nrow(test) / 4) and aggregate to find the mean, i.e.
test$new = rep(1:4, nrow(test)%/%4)
lapply(split(test, rep(1:4, each = nrow(test) %/% 4)), function(i)
aggregate(value ~ new, i, mean))
# $`1`
# new value
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
# $`2`
# new value
# 1 1 17
# 2 2 18
# 3 3 19
# 4 4 20
# $`3`
# new value
# 1 1 29
# 2 2 30
# 3 3 31
# 4 4 32
# $`4`
# new value
# 1 1 41
# 2 2 42
# 3 3 43
# 4 4 44
And yet another way.
fun <- function(DF, col, step = 4){
run <- nrow(DF)/step^2
res <- lapply(seq_len(step), function(inc){
inx <- seq_len(run*step) + (inc - 1)*run*step
dftmp <- DF[inx, ]
tapply(dftmp[[col]], rep(seq_len(step), run), mean, na.rm = TRUE)
})
names(res) <- sprintf("Mean%s", LETTERS[seq_len(step)])
res
}
fun(test, 2, 4)
#$MeanA
#1 2 3 4
#5 6 7 8
#
#$MeanB
# 1 2 3 4
#17 18 19 20
#
#$MeanC
# 1 2 3 4
#29 30 31 32
#
#$MeanD
# 1 2 3 4
#41 42 43 44
Since you said you wanted a long list of the means, I assumed it could also be a vector where you just have all these values. You would get that like this:
V1 = paste0("AB", seq(1:48))
V2 = seq(1:48)
test = data.frame(name = V1, value = V2)
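# Only start at the first 4 rows of each block of 12 (1:4, 13:16, 25:28, 37:40);
# looping over every row would mix rows from different blocks.
starts <- as.vector(outer(0:3, seq(1, nrow(test) - 11, by = 12), "+"))
meanVector <- NULL
for (i in starts) {
  x <- c(test$value[i], test$value[i+4], test$value[i+8])
  m <- mean(x)
  meanVector <- c(meanVector, m)
}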
First, let me simplify my question. I want to extract certain ranges from a numeric vector. For example, extracting 3 ranges from 1:20 at the same time:
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I tried to use the function findInterval() with its arguments rightmost.closed and left.open, but no combination of these arguments achieves the goal.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be given as a matrix like this:
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use a for loop if it's not necessary.
I am grateful for any help.
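I'd probably do it using purrr::map2 or Map, passing your lower bounds and upper bounds as arguments and filtering your data with a custom function. Note that map2 returns a list with one element per range; wrap the call in unlist() to get a single vector.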
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
You may use data.table::inrange and its incbounds argument. Assuming ranges are in a matrix 'm', as shown in your question:
m <- matrix(v, ncol = 2, byrow = TRUE)
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
You were on the right path, and left.open indeed helps, but rightmost.closed actually concerns only the last interval rather than the right "side" of each interval. Hence, we need to call findInterval twice, once with the defaults and once with left.open = TRUE, and keep only the values that pass both tests. As you yourself figured out, it looks like an optimal way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
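The difference between the three settings is easiest to see on the boundary values themselves. The following is a small sketch using the breakpoints from the question; `probe` and the three result names are illustrative.

```r
v <- c(1, 5, 8, 12, 17, 20)
probe <- c(5, 12, 20)  # the upper bound of each wanted range

# Default: intervals are [v[i], v[i+1]), so each bound opens the next interval
default_idx <- findInterval(probe, v)

# rightmost.closed only changes the result for the very last breakpoint (20)
rightmost_idx <- findInterval(probe, v, rightmost.closed = TRUE)

# left.open: intervals are (v[i], v[i+1]], so each bound closes its own interval
leftopen_idx <- findInterval(probe, v, left.open = TRUE)

rbind(default_idx, rightmost_idx, leftopen_idx)
```

An odd index means the value counts as inside a wanted range. Each boundary is odd under one setting and even under the other, which is why requiring both tests removes the boundaries on both sides.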
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19
df <- data.frame(x = seq(1:10))
I want this:
df$y <- c(1, 2, 3, 4, 5, 15, 20 , 25, 30, 35)
i.e. each y is the sum of previous five x values. This implies the first
five y will be same as x
What I get is this:
df$y1 <- c(df$x[1:4], RcppRoll::roll_sum(df$x, 5))
x y y1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 15
6 15 20
7 20 25
8 25 30
9 30 35
10 35 40
In summary, I need y but I am only able to achieve y1
1) enhanced sum function. Define a function Sum which sums its first 5 values if it receives 6 values and returns the last value otherwise. Then use it with partial=TRUE in rollapplyr (from the zoo package; x is the input vector, here df$x):
library(zoo)
x <- df$x
Sum <- function(x) if (length(x) < 6) tail(x, 1) else sum(head(x, -1))
rollapplyr(x, 6, Sum, partial = TRUE)
## [1] 1 2 3 4 5 15 20 25 30 35
2) sum 6 and subtract off original. Another possibility is to take the running sum of 6 elements, filling in the first 5 elements with NA, and subtract off the original vector. Finally fill in the first 5.
replace(rollsumr(x, 6, fill = NA) - x, 1:5, head(x, 5))
## [1] 1 2 3 4 5 15 20 25 30 35
3) specify offsets. A third possibility is to use the offset form of width to specify the prior 5 elements:
c(head(x, 5), rollapplyr(x, list(-(1:5)), sum))
## [1] 1 2 3 4 5 15 20 25 30 35
4) alternative specification of offsets. In this alternative we specify an offset of 0 for each of the first 5 elements and offsets of -(1:5) for the rest.
width <- replace(rep(list(-(1:5)), length(x)), 1:5, list(0))
rollapply(x, width, sum)
## [1] 1 2 3 4 5 15 20 25 30 35
Note
The scheme for filling in the first 5 elements seems quite unusual. You might consider using partial sums for the first 5 instead, with NA or 0 for the first one, since there are no prior elements for that one:
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = NA)
## [1] NA 1 3 6 10 15 20 25 30 35
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = 0)
## [1] 0 1 3 6 10 15 20 25 30 35
rollapplyr(x, 6, sum, partial = TRUE) - x
## [1] 0 1 3 6 10 15 20 25 30 35
A simple approach would be:
df <- data.frame(x = seq(1:10))
mysum <- function(x, k = 5) {
res <- rep(NA, length(x))
for (i in seq_along(x)) {
if (i <= k) { # edited ;-)
res[i] <- x[i]
} else {
res[i] <- sum(x[(i-k):(i-1)])
}
}
res
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35
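The loop above can also be written without an explicit loop by taking differences of a cumulative sum. This is a minimal base R sketch of that alternative, not part of the original answer; the function name `prev_sum` is hypothetical.

```r
prev_sum <- function(x, k = 5) {
  n <- length(x)
  y <- x                      # the first k values are kept as they are
  if (n > k) {
    cs <- c(0, cumsum(x))     # cs[i + 1] holds sum(x[1:i])
    idx <- (k + 1):n
    # cs[i] - cs[i - k] equals sum(x[(i - k):(i - 1)]), the k previous values
    y[idx] <- cs[idx] - cs[idx - k]
  }
  y
}

prev_sum(1:10)
# [1]  1  2  3  4  5 15 20 25 30 35
```

Note that differences of cumulative sums can accumulate floating-point error on long non-integer vectors, so the loop may be preferable when exactness matters.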
mysum <- function(x, k = 5) {
  res <- x[1:k]
  # window i covers x[i:(i+k-1)], i.e. the k values before position i+k
  append <- sapply(1:(length(x)-k), function(i) sum(x[i:(i+k-1)]))
  return(c(res, append))
}
mysum(df$x)
I want to do a simple subtraction in R, but I don't know how to solve it. I would like to know if I have to write a loop or if there is a function for it.
I have a column of numeric values, and I would like to subtract element n-1 from element n.
Time_Day Diff
10 10
15 5
45 30
60 15
Thus, I would like to find the variable "Diff".
You can also try the dplyr package:
library(dplyr)
mutate(df, dif=Time_Day-lag(Time_Day))
# Time_Day Diff dif
# 1 10 10 NA
# 2 15 5 5
# 3 45 30 30
# 4 60 15 15
Does this do what you need?
Here we save the column as a variable:
c <- c(10, 15, 45, 60)
Now we add a 0 to the beginning and then cut off the last element:
cm1 <- c(0, c)[1:length(c)]
Now we subtract the two:
dif <- c - cm1
If we print that out, we get what you're looking for:
dif # 10 5 30 15
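The shift-and-subtract steps above can be sketched in a slightly more compact form, assuming the first element should be kept as its own difference. The variable names are illustrative; they also avoid reusing `c` as a variable name, which works but shadows the base function.

```r
time_day <- c(10, 15, 45, 60)

# Shift right by one: put 0 in front and drop the last element
shifted <- c(0, head(time_day, -1))
by_shift <- time_day - shifted

# diff() returns only the n - 1 pairwise differences, so the first
# value has to be put back in front to get the same result
by_diff <- c(time_day[1], diff(time_day))

by_shift
# [1] 10  5 30 15
```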
With diff :
df <- data.frame(Time_Day = c(10, 15, 45, 60))
df$Diff <- c(df$Time_Day[1], diff(df$Time_Day))
df
## Time_Day Diff
##1 10 10
##2 15 5
##3 45 30
##4 60 15
It works fine in dplyr too :
library("dplyr")
df <- data.frame(Time_Day = c(10, 15, 45, 60))
df %>% mutate(Diff = c(Time_Day[1], diff(Time_Day)))