I am currently trying to simplify this summation. I am new to R.
Data
Lx = c(5050.0, 65.0, 25.0, 19.0, 17.5, 16.5, 15.5, 14.5, 13.5, 12.5, 6.0, 0.0)
Summation series
Tx = c(sum(Lx[1:12]),sum(Lx[2:12]),sum(Lx[3:12]),sum(Lx[4:12]),
sum(Lx[5:12]),sum(Lx[6:12]),sum(Lx[7:12]),sum(Lx[8:12]),
sum(Lx[9:12]),sum(Lx[10:12]),sum(Lx[11:12]),sum(Lx[12:12]))
You can do:
rev(cumsum(rev(Lx)))
[1] 5255.0 205.0 140.0 115.0 96.0 78.5 62.0 46.5 32.0 18.5 6.0 0.0
Alternatively, using Reduce():
Reduce(`+`, Lx, right = TRUE, accumulate = TRUE)
[1] 5255.0 205.0 140.0 115.0 96.0 78.5 62.0 46.5 32.0 18.5 6.0 0.0
Using a for loop:
Tx_new <- numeric(length(Lx))
for (i in seq_along(Lx)) {
  Tx_new[i] <- sum(Lx[i:length(Lx)])
}
A possible solution, using sapply:
sapply(1:12, function(x) sum(Lx[x:12]))
#> [1] 5255.0 205.0 140.0 115.0 96.0 78.5 62.0 46.5 32.0 18.5
#> [11] 6.0 0.0
Package spatstat.utils provides a fast version ("under certain conditions") of the reverse cumulative sum, revcumsum(), which is based on computing sum(x[i:n]) with n = length(x) (essentially @Jan Brederecke's answer):
Lx = c(5050.0, 65.0, 25.0, 19.0, 17.5, 16.5, 15.5, 14.5, 13.5, 12.5, 6.0, 0.0)
# install.packages("spatstat.utils")
spatstat.utils::revcumsum(Lx)
# [1] 5255.0 205.0 140.0 115.0 96.0 78.5 62.0 46.5 32.0 18.5 6.0 0.0
Benchmark
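The benchmark below calls wrapper functions (fRev(), fReduce(), and so on) that are not defined in the answer. A plausible set of definitions, sketched from the approaches above (the mapping of wrapper names to individual answers is my assumption), is:
library(microbenchmark)
library(ggplot2)  # provides the autoplot() method used below

# Wrappers around the approaches shown above (names guessed from the benchmark call)
fRev       <- function(x) rev(cumsum(rev(x)))
fReduce    <- function(x) Reduce(`+`, x, right = TRUE, accumulate = TRUE)
fJan       <- function(x) {
  # the explicit for loop
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- sum(x[i:length(x)])
  out
}
fEshita    <- function(x) {
  # the list-append loop
  aa <- list()
  for (i in seq_along(x)) aa <- append(aa, sum(x[i:length(x)]))
  unlist(aa)
}
fsapply    <- function(x) sapply(seq_along(x), function(i) sum(x[i:length(x)]))
fRevcumsum <- function(x) spatstat.utils::revcumsum(x)
With those wrappers and libraries loaded, the benchmark itself is: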
x = c(5050.0, 65.0, 25.0, 19.0, 17.5, 16.5, 15.5, 14.5, 13.5, 12.5, 6.0, 0.0)
bm <- microbenchmark(
fRev(x),
fReduce(x),
fJan(x),
fEshita(x),
fsapply(x),
fRevcumsum(x),
times = 100L
)
autoplot(bm)
rev(cumsum(rev(Lx))) and spatstat.utils::revcumsum(Lx) seem like the fastest solutions.
Please try the following code:
Lx <- c(5050.0, 65.0, 25.0, 19.0, 17.5, 16.5, 15.5, 14.5, 13.5, 12.5, 6.0, 0.0)
l <- length(Lx)
aa <- list()
for (i in 1:l) {
  x <- sum(Lx[i:l])
  aa <- append(aa, x)
}
All of the summed values will be in the list "aa".
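Whichever approach you use, the result should match the original Tx; a quick sanity check (a sketch using the objects defined above):
all.equal(Tx, rev(cumsum(rev(Lx))))
# [1] TRUE
all.equal(Tx, unlist(aa))
# [1] TRUE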
Related
I have a simple question. I have a numeric vector in increasing order.
e.g.
[1] 13.5 13.9 14.2 14.5 14.8 15.2 16.0 16.9 17.4 17.8 18.3 18.7 19.4
and I want to find the relative position of a specific number within this vector.
e.g.
f(13)
' < 13.5 '
f(15.0)
' 14.8 <= x < 15.2 '
You can use cut():
vec <- c(13.5, 13.9, 14.2, 14.5, 14.8, 15.2, 16.0, 16.9, 17.4, 17.8, 18.3, 18.7, 19.4)
cut(13, c(-Inf, vec, Inf))
# [1] (-Inf,13.5]
cut(15, c(-Inf, vec, Inf))
# [1] (14.8,15.2]
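If you want something closer to the f() in the question, a small wrapper around cut() can return the interval label as a character string (a sketch; the function name f and the use of cut()'s default labels are my choices):
f <- function(value, vec) as.character(cut(value, c(-Inf, vec, Inf)))
f(13, vec)
# [1] "(-Inf,13.5]"
f(15, vec)
# [1] "(14.8,15.2]"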
I have a data frame; one of the columns is Id, and some of the values were messed up during the recording of the data.
Here's an example of the type of data:
dput(df)
structure(list(Id = c("'110171786'", "'1103fbfd5'", "'0700edf6dc'",
"'1103fad09'", "'01103fc9bb'", "''", "''", "0000fba2b'", "'01103fb169'",
"'01103fd723'", "'01103f9c34'", "''", "''", "''", "'01103fc088'",
"'01103fa6d8'", "'01103fb374'", "'01103fce8c'", "'01103f955d'",
"'011016e633'", "'01103fa0da'", "''", "''", "''", "'01103fa4bd'",
"'01103fb5c4'", "'01103fd0d7'", "'01103f9e2e'", "'01103fc657'",
"'01103fd4d1'", "'011016e78e'", "'01103fbda2'", "'01103fbae7'",
"'011016ee23'", "'01103fc847'", "'01103fbfbb'", "''", "'01103fb8bb'",
"'01103fc853'", "''", "'01103fbcd5'", "'011016e690'", "'01103fb253'",
"'01103fcb19'", "'01103fb446'", "'01103fa4fa'", "'011016cfbd'",
"'01103fd250'", "'01103fac7d'", "'011016a86e'"), Weight = c(11.5,
11.3, 11.3, 10.6, 10.6, 8.9, 18.7, 10.9, 11.3, 18.9, 18.9, 8.6,
8.8, 8.4, 11, 10.4, 10.4, 10.8, 11.2, 11, 10.3, 9.5, 8.1, 9.3,
10.2, 10.5, 11.2, 21.9, 18, 17.8, 11.3, 11.5, 10.8, 10.5, 12.8,
10.9, 8.9, 10.3, 10.8, 8.9, 10.9, 9.9, 19, 11.6, 11.3, 11.7,
10.9, 12.1, 11.3, 10.6)), class = "data.frame", row.names = c(NA,
-50L))
What I would like to do is search through the Id column and fix the following mistakes:
Some of the values are missing a zero off the front; all of these now start with a 1, which makes them easy to find. So basically anything that is 9 characters long and starts with a 1 needs a 0 as the first character.
Some of the values are fewer than 10 characters long; these need to be removed.
Some have more than one leading 0; these also need to be removed.
# Add the missing leading 0 to (quoted) 9-character ids that start with a 1
df$Id <- gsub("^('?)(1.{8}')$", "\\10\\2", df$Id)
# Drop rows whose Id is empty or still starts with more than one 0
df[ !grepl("^'?(00|'$)", df$Id),]
# Id Weight
# 1 '0110171786' 11.5
# 2 '01103fbfd5' 11.3
# 3 '0700edf6dc' 11.3
# 4 '01103fad09' 10.6
# 5 '01103fc9bb' 10.6
# 9 '01103fb169' 11.3
# 10 '01103fd723' 18.9
# 11 '01103f9c34' 18.9
# 15 '01103fc088' 11.0
# 16 '01103fa6d8' 10.4
# 17 '01103fb374' 10.4
# 18 '01103fce8c' 10.8
# 19 '01103f955d' 11.2
# 20 '011016e633' 11.0
# 21 '01103fa0da' 10.3
# 25 '01103fa4bd' 10.2
# 26 '01103fb5c4' 10.5
# 27 '01103fd0d7' 11.2
# 28 '01103f9e2e' 21.9
# 29 '01103fc657' 18.0
# 30 '01103fd4d1' 17.8
# 31 '011016e78e' 11.3
# 32 '01103fbda2' 11.5
# 33 '01103fbae7' 10.8
# 34 '011016ee23' 10.5
# 35 '01103fc847' 12.8
# 36 '01103fbfbb' 10.9
# 38 '01103fb8bb' 10.3
# 39 '01103fc853' 10.8
# 41 '01103fbcd5' 10.9
# 42 '011016e690' 9.9
# 43 '01103fb253' 19.0
# 44 '01103fcb19' 11.6
# 45 '01103fb446' 11.3
# 46 '01103fa4fa' 11.7
# 47 '011016cfbd' 10.9
# 48 '01103fd250' 12.1
# 49 '01103fac7d' 11.3
# 50 '011016a86e' 10.6
There are three companies, A, B, and C. They have rates stored in vectors.
A <- c(19, 19, 19, 20, 12)
B <- c(19, 19, 20, 20, 20, 20, 19, 19, 19, 11)
C <- c(13, 13)
In this example there are 17 rates but only five unique rates.
Each rate corresponds to a schedule of monthly fees over 12 months, which is the same across companies.
> str(fees)
List of 5
$ 19: num [1:12] 8.5 24.8 40.9 56.6 72.1 ...
$ 20: num [1:12] 8.9 26.1 42.9 59.4 75.7 ...
$ 12: num [1:12] 5.5 16.1 26.6 37 47.2 57.4 67.4 77.3 87.1 96.8 ...
$ 11: num [1:12] 4.8 13.9 23 31.9 40.8 49.6 58.3 66.9 75.5 83.9 ...
$ 13: num [1:12] 5.9 17.4 28.6 39.8 50.8 61.7 72.4 83.1 93.6 104 ...
The goal is to build the fee matrix for each company. So for A, the fee matrix is:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 8.5 24.8 40.9 56.6 72.1 87.4 102.4 117.1 131.6 145.9 159.9 173.1
[2,] 8.5 24.8 40.9 56.6 72.1 87.4 102.4 117.1 131.6 145.9 159.9 173.1
[3,] 8.5 24.8 40.9 56.6 72.1 87.4 102.4 117.1 131.6 145.9 159.9 173.1
[4,] 8.9 26.1 42.9 59.4 75.7 91.6 107.3 122.7 137.9 152.7 167.3 181.3
[5,] 5.5 16.1 26.6 37.0 47.2 57.4 67.4 77.3 87.1 96.8 106.4 114.6
My guess is that the best way to build these matrices is with a lookup table; however, I am unsure how to do that.
R Script
A <- c(19, 19, 19, 20, 12)
B <- c(19, 19, 20, 20, 20, 20, 19, 19, 19, 11)
C <- c(13, 13)
all_comp <- c(A, B, C)
unique_comp <- unique(all_comp)
# Only 5 unique rates
# 19 20 12 11 13
fees <- list('19' = c(8.5, 24.8, 40.9, 56.6, 72.1, 87.4, 102.4, 117.1, 131.6, 145.9, 159.9, 173.1),
'20' = c(8.9, 26.1, 42.9, 59.4, 75.7, 91.6, 107.3, 122.7, 137.9, 152.7, 167.3, 181.3),
'12' = c(5.5, 16.1, 26.6, 37.0, 47.2, 57.4, 67.4, 77.3, 87.1, 96.8, 106.4, 114.6),
'11' = c(4.8, 13.9, 23.0, 31.9, 40.8, 49.6, 58.3, 66.9, 75.5, 83.9, 92.3, 99.3),
'13' = c(5.9, 17.4, 28.6, 39.8, 50.8, 61.7, 72.4, 83.1, 93.6, 104.0, 114.2, 123.1))
# Desired result
A_m <- matrix(c(rep(fees[['19']], 3), fees[['20']], fees[['12']]), 5, 12, byrow = TRUE)
Update:
Thanks to the comment below, the function might be as short as:
desired_result <- function(input_vec, fees) fees[as.character(input_vec)]
result <- desired_result(A, fees)
Then you can easily convert it to a matrix:
result_matrix <- matrix(unlist(result),
                        nrow = length(result),
                        ncol = length(result[[1]]),
                        byrow = TRUE)  # fill row-wise so each rate's fees form one row
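With A and the fees list above, this should reproduce the desired matrix from the question:
identical(result_matrix, A_m)
# [1] TRUE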
Old answer
How about wrapping it up as a separate function:
desired_result <- function(input_vec, fees) {
N <- length(input_vec)
result <- list()
for (i in 1:N) {
result[[i]] <- fees[[as.character(input_vec[i])]]
}
return(result)
}
You should store the company rates in a named list and the fees in a matrix whose row names correspond to the unique rate values.
rates <- list(A = A, B = B, C = C)
lookup <- matrix(unlist(fees), ncol = length(fees[[1L]]), byrow = TRUE, dimnames = list(names(fees), NULL))
Then you can use the matrix as a lookup table as follows:
res <- lapply(rates, function(x) lookup[as.character(x), ])
The result is a named list of company fee matrices, with res[["A"]] extracting the fee matrix for company A, and so on.
FYI, you can learn about the different ways to index vectors and arrays by reading the help page on the subset operator, accessible with ?`[`. Here, we are indexing the rows of lookup with a character vector whose elements are a subset of the row names of the matrix (with duplicates).
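For example, with the data above, res[["A"]] is the 5 x 12 fee matrix shown in the question (its rows additionally carry the rate values as row names), and individual entries can be read off directly:
res[["A"]][4, 12]   # company A's 4th rate (20), fee in month 12
# [1] 181.3
dim(res[["B"]])     # 10 rates x 12 months
# [1] 10 12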
I'm relatively new to R and am having trouble processing my data into a more workable form. If I had continuous x and y vectors, some with multiple x values for the same y value, how would I go about writing a script that could automatically average those multiple x values and create a new data set with the average x values and y values of the same length? An example is included below.
X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2, 5.6)
I think this does what you want. The aggregate function will group x by y in this case and take the mean.
x <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2, 5.6)
df <- data.frame(x = x, y = y)
# average the x values within each unique y value
df2 <- aggregate(x ~ y, data = df, FUN = mean)
df2
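With the example data this gives one row per unique y value (values shown below; exact print spacing may differ):
#      y     x
# 1  5.6 34.10
# 2 22.2 34.10
# 3 45.1 34.70
# 4 63.1 33.00
# 5 72.5 32.10
# 6 90.1 34.75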
I assume you want the average for each Y value. Try this:
X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2, 5.6)
xy <- cbind(X, Y)
xy <- as.data.frame(xy)
tapply(X = xy$X, INDEX = list(xy$Y), FUN = mean)
If I understand you correctly, you want a new dataset in which for every Y value, you have the average of the corresponding X values. Using the fact that an average of a vector of length 1 is just that value to handle singletons, this can be done easily with dplyr.
X <- c(34.2, 35.3, 32.1, 33.0, 34.7, 34.2, 34.1, 34.0, 34.1)
Y <- c(90.1, 90.1, 72.5, 63.1, 45.1, 22.2, 22.2, 22.2, 5.6)
Df <- data.frame(X, Y)
> Df
X Y
1 34.2 90.1
2 35.3 90.1
3 32.1 72.5
4 33.0 63.1
5 34.7 45.1
6 34.2 22.2
7 34.1 22.2
8 34.0 22.2
9 34.1 5.6
Now:
library(dplyr)
Df2 <- Df %>% group_by(Y) %>% summarize(X = mean(X))
> Df2
Source: local data frame [6 x 2]
Y X
1 5.6 34.10
2 22.2 34.10
3 45.1 34.70
4 63.1 33.00
5 72.5 32.10
6 90.1 34.75
I've asked many questions about this and all the answers were really helpful... but once again my data is weird and I need help. Basically, what I want to do is find the average speed over a certain range of intervals; let's say from 6 s to 40 s my average speed would be 5 m/s, and so on.
So it was pointed out to me to use this code...
library(IRanges)
idx <- seq(1, ncol(data), by=2)
# idx is now 1, 3, 5. It will be passed one value at a time to `i`.
# that is, `i` will take values 1 first, then 3 and then 5 and each time
# the code within is executed.
o <- lapply(idx, function(i) {
ir1 <- IRanges(start=seq(0, max(data[[i]]), by=401), width=401)
ir2 <- IRanges(start=data[[i]], width=1)
t <- findOverlaps(ir1, ir2)
d <- data.frame(mean=tapply(data[[i+1]], queryHits(t), mean))
cbind(as.data.frame(ir1), d)
})
which gives this output
# > o
# [[1]]
# start end width mean
# 1 0 400 401 1.05
#
# [[2]]
# start end width mean
# 1 0 400 401 1.1
#
# [[3]]
# start end width mean
# 1 0 400 401 1.383333
So if I wanted it to be every 100 s, I'd just change by = 401 in the ir1 <- ..... line to by = 100.
But my data is weird because of a few things:
My data doesn't always start at 0 s; sometimes it starts at 20 s, depending on the specimen and whether it moves.
My data collection does not happen every 1 s, 2 s, or 3 s. Hence I sometimes get data for 1-20 s but it skips over 20-40 s, simply because the specimen does not move.
I think the findOverlaps portion of the code affects my output. How can I get rid of that without disturbing the output?
Here is some data to illustrate my troubles, but all of my real data ends around 2000 s.
Time Speed Time Speed Time Speed
6.3 1.6 3.1 1.7 0.3 2.4
11.3 1.3 5.1 2.2 1.3 1.3
13.8 1.3 6.3 3.4 3.1 1.5
14.1 1.0 7.0 2.3 4.5 2.7
47.4 2.9 11.3 1.2 5.1 0.5
49.2 0.7 26.5 3.3 5.9 1.7
50.5 0.9 27.3 3.4 9.7 2.4
57.1 1.3 36.6 2.5 11.8 1.3
72.9 2.9 40.3 1.1 13.1 1.0
86.6 2.4 44.3 3.2 13.8 0.6
88.5 3.4 50.9 2.6 14.0 2.4
89.0 3.0 62.6 1.5 14.8 2.2
94.8 2.9 66.8 0.5 15.5 2.6
117.4 0.5 67.3 1.1 16.4 3.2
123.7 3.2 67.7 0.6 26.5 0.9
124.5 1.0 68.2 3.2 44.7 3.0
126.1 2.8 72.1 2.2 45.1 0.8
As you can see from the data, it doesn't necessarily end at 60 s; sometimes it only goes up to about 57 s.
EDIT: added dput of the data
structure(list(Time = c(6.3, 11.3, 13.8, 14.1, 47.4, 49.2, 50.5,
57.1, 72.9, 86.6, 88.5, 89, 94.8, 117.4, 123.7, 124.5, 126.1),
Speed = c(1.6, 1.3, 1.3, 1, 2.9, 0.7, 0.9, 1.3, 2.9, 2.4,
3.4, 3, 2.9, 0.5, 3.2, 1, 2.8), Time.1 = c(3.1, 5.1, 6.3,
7, 11.3, 26.5, 27.3, 36.6, 40.3, 44.3, 50.9, 62.6, 66.8,
67.3, 67.7, 68.2, 72.1), Speed.1 = c(1.7, 2.2, 3.4, 2.3,
1.2, 3.3, 3.4, 2.5, 1.1, 3.2, 2.6, 1.5, 0.5, 1.1, 0.6, 3.2,
2.2), Time.2 = c(0.3, 1.3, 3.1, 4.5, 5.1, 5.9, 9.7, 11.8,
13.1, 13.8, 14, 14.8, 15.5, 16.4, 26.5, 44.7, 45.1), Speed.2 = c(2.4,
1.3, 1.5, 2.7, 0.5, 1.7, 2.4, 1.3, 1, 0.6, 2.4, 2.2, 2.6,
3.2, 0.9, 3, 0.8)), .Names = c("Time", "Speed", "Time.1",
"Speed.1", "Time.2", "Speed.2"), class = "data.frame", row.names = c(NA,
-17L))
Sorry if I don't understand your question entirely; could you explain why this example doesn't do what you're trying to do?
# use a pre-loaded data set
mtcars
# choose which variable to cut
var <- 'mpg'
# define groups, whether that be time or something else
# and choose how to cut it.
x <- cut( mtcars[ , var ] , c( -Inf , seq( 15 , 25 , by = 2.5 ) , Inf ) )
# look at your cut points, for every record
x
# you can merge them back on to the mtcars data frame if you like..
mtcars$cutpoints <- x
# ..but that's not necessary
# find the mean within those groups
tapply(
mtcars[ , var ] ,
x ,
mean
)
# find the mean within groups, using a different variable
tapply(
mtcars[ , 'wt' ] ,
x ,
mean
)
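Applied to the questioner's data, the same pattern might look like this (a sketch; dat is assumed to hold the dput() data frame above, and only the first Time/Speed pair is used):
# 100 s bins starting at 0; works even if the observations start late or have gaps
breaks <- seq(0, max(dat$Time) + 100, by = 100)
grp <- cut(dat$Time, breaks, right = FALSE)
# mean speed within each bin; bins with no observations come back as NA
tapply(dat$Speed, grp, mean)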