I am trying to calculate the sum of this sequence in R.
The sequence will have two Inputs (1 and 11) in the below case,
1 + (1 * 2/3 ) + (1 * 2/3 * 4/5) + ( 1 * 2/3 * 4/5 * 6/7) + ....................(1 *......10/11)
I think defining my own function is the way to go here.
You could try just using old-fashioned loops here:
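total <- 0  # named total rather than sum, to avoid confusion with the built-in sum()
num_terms <- 6
for (i in 1:num_terms) {
  y <- 1
  if (i > 1) {
    # multiply in 2/3, 4/5, ... up to the (i-1)th fraction
    for (j in 1:(i - 1)) {
      y <- y * (j * 2 / (j * 2 + 1))
    }
  }
  total <- total + y
}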
You can set num_terms to any positive integer. I use 6 here because the series in your question has six terms.
Someone will probably come along and reduce the entire code snippet above to one line, but in my mind an explicit loop is justified here.
My approach:
# input
start <- 1
n <- 5 # number of fractions multiplied in (the sum has n + 1 terms)
end <- start + n*2
result <- start
to_add <- start
# step by 2 so that the factors are 2/3, 4/5, ..., 10/11
for (i in seq(start + 1, end - 1, by = 2)) {
  to_add <- to_add * (i / (i + 1))
  result <- result + to_add
}
which gives:
> result
[1] 3.4329
Another base R alternative using cumprod to generate the inner terms is
sum(cumprod(c(1, seq(2, 10, 2)) / c(1, seq(3, 11, 2))))
[1] 3.4329
Here, c(1, seq(2, 10, 2)) / c(1, seq(3, 11, 2)) generates the sequence 1, 2/3, 4/5, 6/7, 8/9, 10/11 and cumprod takes the cumulative product. This result is summed with sum. The returned result is identical to the one in the accepted answer.
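To reuse the cumprod idea for other end points, here is a minimal sketch. The function name series_sum and its argument last_denominator are made up for illustration, and it assumes the last denominator is an odd number of at least 3.

```r
# Hypothetical helper: sum of 1 + 2/3 + (2/3 * 4/5) + ... up to last_denominator
series_sum <- function(last_denominator = 11) {
  stopifnot(last_denominator >= 3, last_denominator %% 2 == 1)
  denominators <- seq(3, last_denominator, by = 2)  # 3, 5, ..., last_denominator
  numerators <- denominators - 1                    # 2, 4, ...
  # the leading 1 is the first term; cumprod builds each later term
  sum(cumprod(c(1, numerators / denominators)))
}

series_sum(11)  # 3.4329
series_sum(3)   # 1 + 2/3
```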
You can try:
library(tidyverse)
Result <- tibble(a=seq(1, 11, 2)) %>%
  mutate(b=lag(a, default = 0)+1) %>%
  mutate(Prod=cumprod(b)/cumprod(a)) %>%
  mutate(Sum=cumsum(Prod))
Result
# A tibble: 6 x 4
      a     b      Prod      Sum
  <dbl> <dbl>     <dbl>    <dbl>
1     1     1 1.0000000 1.000000
2     3     2 0.6666667 1.666667
3     5     4 0.5333333 2.200000
4     7     6 0.4571429 2.657143
5     9     8 0.4063492 3.063492
6    11    10 0.3694084 3.432900
# and some graphical analysis
Result %>%
  ggplot(aes(as.factor(a), Prod, group=1)) +
  geom_col(aes(as.factor(a), Sum), alpha=0.4)+
  geom_point() +
  geom_line()
So I am basically looking for a more efficient way to do this:
c(seq(1, 5, 2), seq(2, 6, 2))
Is there a simpler function built in R or some of the packages that would allow me to specify just one interval (from 1 to 6; instead of having to specify from 1 to 5 and from 2 to 6), but to sort the numbers so that all the odd numbers appear before the even ones?
You can use sequence. The first argument of the function is the length of each sequence, from is the starting point, and by is the interval.
sequence(c(3, 3), from = c(1, 2), by = 2)
#[1] 1 3 5 2 4 6
Or, as a function that fits your request:
seqOrdered <- function(from = 1, to){
  len <- to - from + 1
  # the first run gets the extra element when the range has an odd length
  sequence(c(ceiling(len / 2), floor(len / 2)), from = c(from, from + 1), by = 2)
}
seqOrdered(1, 6)
#[1] 1 3 5 2 4 6
Just concatenate the subset of the data that contains the odd numbers with the subset that contains the even numbers.
In the example below, x1 holds 10 integers drawn from a Poisson distribution with mean 8 (rpois(n = 10, lambda = 8)); the odd values (x1[x1 %% 2 == 1]) are placed before the even values (x1[x1 %% 2 == 0]).
## To prepare data
x1 <- rpois(n = 10, lambda = 8)
x1
## To sort the data so that odd numbers come earlier
c(x1[x1 %% 2 == 1], x1[x1 %% 2 == 0])
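The same odd-before-even arrangement can also be written as a single reordering by parity. This is only a sketch: odd_first is a made-up helper name, and it relies on order() leaving tied elements in their original order.

```r
# Hypothetical helper: put odd values first, keeping the original order within each group
odd_first <- function(x) {
  x[order(x %% 2 == 0)]  # FALSE (odd) sorts before TRUE (even)
}

odd_first(1:6)             # 1 3 5 2 4 6
odd_first(c(8, 3, 10, 7))  # 3 7 8 10
```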
I'm trying to write a for loop over a dataset. To keep it simple, here is an example:
Two variables, X and Y.
X = 3, 6, 9
Y = 4, 8, 12
I want to make a loop that does this:
(Xi - Yi)^2, so first (3-4)^2, then
(6-8)^2 and so on.
Then, after that is done, multiply by this:
((1/2)/(n*(n-1))).
In this example, it would be:
(3-4)^2 + (6-8)^2 + (9-12)^2 = 1 + 4 + 9 = 14
1/2 / (3*(3-1)) = 0.5 / 6 = 0.0833.
0.0833 * 14 = 1.166.
result <- 0
sum <- rep(NA, n)
for (i in (1:n)) {
for(j in (1:n)) {
sum <- ((gathered$X[i] - gathered$X[j])^2)
}
}
In R you can usually avoid for loops. In your case you can do
sum((X - Y)^2) * (1/2)/(length(X) * (length(X) - 1))
#[1] 1.166666667
However, if you do want a for loop, a single loop is enough, since you want to access X[i] and Y[i] together.
sum <- 0
n <- 3
for (i in (1:n)) {
sum <- sum + (X[i] - Y[i])^2
}
sum * (1/2)/(n*(n-1))
#[1] 1.1667
data
X = c(3, 6, 9)
Y = c(4, 8, 12)
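For reuse, the vectorised expression can be wrapped in a small function. This is a sketch; the name scaled_sq_diff is made up here, and the length checks are an added assumption (the scaling factor divides by n - 1, so at least two points are needed).

```r
# Hypothetical wrapper around sum((X - Y)^2) * (1/2) / (n * (n - 1))
scaled_sq_diff <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  n <- length(x)
  sum((x - y)^2) * (1 / 2) / (n * (n - 1))
}

X <- c(3, 6, 9)
Y <- c(4, 8, 12)
scaled_sq_diff(X, Y)  # 14 / 12, about 1.1667
```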
How about this? I think outer fits your problem.
CASE 1 ( X-Y )
sum(diag(outer(X,Y,function(X,Y)(X-Y)^2))) *
(1/2)/(length(X) * (length(X) - 1))
1.166667
CASE 2 ( all X and Y calculation )
sum(outer(X,Y,function(X,Y)(X-Y)^2)) *
(1/2)/(length(X) * (length(X) - 1))
15.5
I have a large table with timestamps from several nights. The columns are an id for the night, an id for the timestamp within that night, and the heart rate at that timestamp. It looks like this:
library(data.table)
allData <- data.table(nightNo=c(1,1,1,1,1,1,2,2,2,2), withinNightNo=c(1,2,3,4,5,6,1,2,3,4), HR=c(1:10))
nightNo withinNightNo HR
1 1 1
1 2 2
1 3 3
1 4 4
1 5 5
1 6 6
2 1 7
2 2 8
2 3 9
2 4 10
I'd like to add two new columns to the table: the slope and the cumsum of HR over up to the last 10 rows of the same night. I calculate the slope using linear regression and define the cumsum as: CUMSUM_n = max(CUMSUM_(n-1), 0) + (value_n - mean(value_1, ..., value_n)). The result should look like this:
nightNo withinNightNo HR HRSlope HRCumsum
1 1 1 NaN 0.0
1 2 2 1 0.5
1 3 3 1 1.5
1 4 4 1 3.0
1 5 5 1 5.0
1 6 6 1 7.5
2 1 7 NaN 0.0
2 2 8 1 0.5
2 3 9 1 1.5
2 4 10 1 3.0
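To pin down the cumsum definition, the recurrence can be sketched in base R as follows. The helper name hr_cumsum is made up, and this version takes the mean over the whole night so far (which is what the expected output shows), not over a 10-row window.

```r
# Hypothetical helper: the recurrence for one night's HR values
hr_cumsum <- function(hr) {
  deviation <- hr - cumsum(hr) / seq_along(hr)  # value_n - mean(value_1, ..., value_n)
  # carry the previous result forward, floored at 0, and add the current deviation
  Reduce(function(prev, d) max(prev, 0) + d, deviation, accumulate = TRUE)
}

nightNo <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
HR <- as.numeric(1:10)

# ave() applies the helper within each night and keeps the row order
HRCumsum <- ave(HR, nightNo, FUN = hr_cumsum)
HRCumsum  # 0.0 0.5 1.5 3.0 5.0 7.5 0.0 0.5 1.5 3.0
```

Reduce still walks the values one by one, but it does so once per night on a plain vector rather than once per row of the table.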
I've created code for both of these functions using for loops. They work, but my table is so large that it takes a long time to even calculate the slope/cumsum of a single value. My code looks like this:
# Add HRSlope column
allData$HRSlope <- 0
for(i in 1:nrow(allData)){
# Get points from up to last 10 seconds of the same night
start <- ifelse(i < 11, 1, (i-10))
points <- filter(allData[start:i,], nightNo == allData[i,]$nightNo)[, c("withinNightNo", "HR")]
# Calculate necessary values
meanX <- mean(points$withinNightNo)
meanY <- mean(points$HR)
meanXY <- mean(points$withinNightNo * points$HR)
meanX2 <- mean(points$withinNightNo^2)
# Calculate slope and add to table
allData[i,]$HRSlope <- (meanX * meanY - meanXY) / (meanX^2 - meanX2)
cat(i, "\n")
}
# Add cumsum column, and add first value to sum
allData$HRCumsum <- 0
Sum <- allData[1,]$HR
for(i in 2:nrow(allData)){
# Get sum and average of HR in night so far, reset Sum if new night started
Sum <- allData[i,]$HR + ifelse(allData[i,]$nightNo != allData[i-1,]$nightNo, 0 , Sum )
Average <- Sum / allData[i,]$withinNightNo
# Get previous cumsum, if available
pCumsum <- ifelse(allData[i,]$nightNo != allData[i-1,]$nightNo, 0 , allData[i-1,]$HRCumsum )
# Calculate current cumsum
allData[i,]$HRCumsum <- max(pCumsum, 0) + (allData[i,]$HR - Average)
cat(i, "\n")
}
Is there a more efficient way to do this, presumably without for loops?
EDIT:
I've been able to increase the speed of my slope function somewhat. However, it still uses a for loop, and it writes a wrong value into a field up to 9 times before writing the correct one. Any thoughts on how to fix these two issues?
getSlope <- function(x, y) {
# Calculate necessary values
meanX <- mean(x)
meanY <- mean(y)
meanXY <- mean(x * y)
meanX2 <- mean(x^2)
# Calculate slope
return((meanX * meanY - meanXY) / (meanX^2 - meanX2))
}
# Loop back to 1
for(i in max(allData$withinNightNo):1){
# Prevent i<=0
low <- ifelse(i < 10, 0, i-10)
# Grab up to last 10 points and calculate slope
allData[with(allData, withinNightNo > i-10 & withinNightNo <= i), slope := getSlope(withinNightNo, HR), by= nightNo]
}
EDIT2:
I've also been able to improve my cumsum a little, but it suffers from the same problems as the slope. In addition, it takes larger chunks of the table, because it needs the average, and it has to loop over all the data twice. Any thoughts on improving this would also be highly appreciated.
# Calculate part of the cumsum
getCumsumPart <- function(x){
return(x-mean(x))
}
# Calculate valueN - mean(value1:N)
for(i in max(allData$withinNightNo):1){
allData[with(allData, withinNightNo <= i), cumsumPart:=
getCumsumPart(HR), by=nightNo]
}
# Calculate + max(cumsumN-1, 0)
for(i in max(allData$withinNightNo):1){
allData[with(allData, withinNightNo <= i & cumsumPart > 0), cumsum:= sum(cumsumPart), by=nightNo]
}
# Remove part table
allData$cumsumPart <- NULL
# Set NA values to 0
allData[with(allData, is.na(cumsum)), cumsum := 0]
Try this approach
library(dplyr)
library(caTools)
allData <- data.frame(nightNo=c(1,1,1,1,1,1,2,2,2,2),
withinNightNo=c(1,2,3,4,5,6,1,2,3,4),
HR=c(1:10))
group_fun <- function(grouped_df, window=10L) {
# slope
mean_x <- runmean(grouped_df$withinNightNo, window, align="right")
mean_y <- runmean(grouped_df$HR, window, align="right")
mean_xy <- runmean(grouped_df$withinNightNo * grouped_df$HR, window, align="right")
mean_xx <- runmean(grouped_df$withinNightNo * grouped_df$withinNightNo, window, align="right")
grouped_df$slope <- (mean_x * mean_y - mean_xy) / (mean_x^2 - mean_xx)
# cumsum
partial <- grouped_df$HR - mean_y # from above
# the "loop" is unavoidable here, I think
cumsum <- 0
grouped_df$cumsum <- sapply(partial, function(val) {
cumsum <<- max(cumsum, 0) + val
cumsum
})
grouped_df
}
out <- allData %>%
group_by(nightNo) %>%
do(group_fun(., window=3L)) # change window as desired
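As a cross-check for the slope column that needs no extra packages, here is a base R sketch. The names get_slope and rolling_slope are illustrative, and the window here counts the current row plus the rows before it, which is an assumption about what "up to the last 10 rows" means.

```r
# Same closed-form slope as in the question
get_slope <- function(x, y) {
  (mean(x) * mean(y) - mean(x * y)) / (mean(x)^2 - mean(x^2))
}

# Slope over a trailing window ending at each row (fewer rows at the start)
rolling_slope <- function(x, y, window = 10) {
  sapply(seq_along(x), function(i) {
    idx <- max(1, i - window + 1):i
    get_slope(x[idx], y[idx])
  })
}

allData <- data.frame(nightNo = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                      withinNightNo = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
                      HR = 1:10)

# Compute the slopes night by night, then put them back in row order
per_night <- lapply(split(allData, allData$nightNo), function(d) {
  rolling_slope(d$withinNightNo, d$HR)
})
allData$HRSlope <- unsplit(per_night, allData$nightNo)
allData$HRSlope  # NaN on the first row of each night, 1 elsewhere (up to rounding)
```

This still loops over rows inside sapply, so it is meant for checking results on a small sample rather than as the fast path.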
Is there a method to generate random integers in R such that any two consecutive numbers are different? It is probably along the lines of x[k+1] != x[k] but I can't work out how to put it all together.
Not sure if there is a function available for that. Maybe this function can do what you want:
# n = number of elements
# sample_from = draw random numbers from this range
random_non_consecutive <- function(n = 10, sample_from = seq(1, 5)) {
  y <- c()
  while (length(y) != n) {
    # top up with fresh draws, then drop every value equal to its predecessor
    y <- c(y, sample(sample_from, n - length(y), replace = TRUE))
    y <- y[!c(FALSE, diff(y) == 0)]
  }
  return(y)
}
Example:
random_non_consecutive(20,c(2,4,6,8))
[1] 6 4 6 2 6 4 2 8 4 2 6 2 8 2 8 2 8 4 8 6
Hope this helps.
The function above may need many passes through its while loop in the worst case. We can bound the work at exactly n draws with, for example, the following implementation:
# n = number of elements
# sample_from = draw random numbers from this range
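random_non_consecutive <- function(n = 10, sample_from = seq(1, 5)) {
  y <- rep(NA, n)
  prev <- NA # no previous value yet, so nothing is excluded on the first draw
  for (i in seq_len(n)) {
    candidates <- setdiff(sample_from, prev)
    # index into candidates: sample(x, 1) would draw from 1:x if x were a single number
    y[i] <- candidates[sample.int(length(candidates), 1)]
    prev <- y[i]
  }
  return(y)
}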
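A loop-free variant is also possible. The sketch below (the name random_no_repeat is made up) picks a random starting position and then moves by a random non-zero step around the set of values, so a value can never follow itself. It assumes at least two distinct values in sample_from.

```r
# Hypothetical vectorised sampler: consecutive values always differ
random_no_repeat <- function(n = 10, sample_from = seq(1, 5)) {
  values <- unique(sample_from)
  k <- length(values)
  stopifnot(n >= 1, k >= 2)
  first <- sample.int(k, 1)                          # random starting position
  steps <- sample.int(k - 1, n - 1, replace = TRUE)  # each step is 1..(k - 1), never 0
  idx <- (cumsum(c(first, steps)) - 1) %% k + 1      # wrap positions into 1..k
  values[idx]
}

set.seed(1)
y <- random_no_repeat(20, c(2, 4, 6, 8))
all(diff(y) != 0)  # TRUE
```

Because every step is between 1 and k - 1 modulo k, each new value is drawn uniformly from the k - 1 values other than the previous one.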
Another approach is to over-sample and remove the disqualifying ones as follows:
# assumptions
n <- 5 # population size
sample_size <- 1000
# answer
mu <- sample_size * 1/n
vr <- sample_size * 1/n * (1 - 1/n)
addl_draws <- round(mu + vr, 0)
index <- seq(1:n)
sample_index <- sample(index, sample_size + addl_draws, replace = TRUE)
qualified_sample_index <- sample_index[which(diff(sample_index) != 0)]
qualified_sample_index <- qualified_sample_index[1:sample_size]
# In the very unlikely event the number of qualified samples < sample size,
# NA's will fill the vector. This will print those N/A's
print(which(is.na(qualified_sample_index) == TRUE))
I am doing this in R. I know there is a shortcut for the following, but I want to work through the loop by hand to check it:
x <- c(7,6,8,7)
y <- 1
n <- length(x)
p=1
s = 0
for(i in 1:n){
s = s + (x^p * y^p)
}
s
Since I have not specified the index number of x in the for loop, I guessed that only the first element of x will be used. So I made a table:
i s= s + (x^p * y^p)
1 s= 0 + (7^1 * 1^1)=7
2 s= 7 + (7^1 * 1^1)=14
3 s= 14 + (7^1 * 1^1)=21
4 s= 21 + (7^1 * 1^1)=28
But the result was:
s
[1] 28 24 32 28
I couldn't match this result in any way. How does it work?
x is not a single number but a vector of numbers. When you write x without an index, R does not use just the first element; it uses all of them, so x^p * y^p is itself a vector and s becomes a vector too. That assumption is what is throwing off your table. To get the values in your table, subset x explicitly:
x <- c(7,6,8,7)
y <- 1
n <- length(x)
p <- 1
s <- 0
for(i in 1:n){
s <- s + (x[1]^p * y^p)
message(s)
}
7
14
21
28
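To see where 28 24 32 28 comes from, compare the unindexed loop with an indexed one. This is a small sketch; the names s_vec and s_sum are made up for the comparison.

```r
x <- c(7, 6, 8, 7)
y <- 1
p <- 1

# Unindexed: every iteration adds the whole vector x^p * y^p
s_vec <- 0
for (i in seq_along(x)) {
  s_vec <- s_vec + (x^p * y^p)
}
s_vec  # 28 24 32 28, i.e. 4 * x

# Indexed: every iteration adds one element
s_sum <- 0
for (i in seq_along(x)) {
  s_sum <- s_sum + (x[i]^p * y^p)
}
s_sum  # 28

# The shortcut without a loop
sum(x^p * y^p)  # 28
```

In the unindexed loop, the scalar 0 is recycled to the length of x on the first addition, and each of the four iterations then adds x again.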