Finding value of a series in R without for-loop

I am a newbie in R and I came across this problem:
Calculate the following sum using R:
1+(2/3)+(2/3)(4/5)+...+(2/3)(4/5)...(38/39)
I was keen to find out how to solve this without using a for loop, using only vector operations.
My thoughts and what I've tried till now:
Suppose I create two vectors such as
x<-2*(1:19)
y<-2*(1:19)+1
Then, x consists of all the numerators in the question and y has all the denominators. Now
z<-x/y
will create a vector of length 19 containing the values 2/3, 4/5, ..., 38/39.
I was thinking of using the prod function in R to compute the required products. So, I created an index vector:
i<-1:19
In hopes of traversing z from the first element to the last, I wrote:
prod(z[1:i])
But it failed miserably, giving me the result:
[1] 0.6666667
Warning message:
In 1:i : numerical expression has 19 elements: only the first used
What I wanted to do:
I expected to store the values of (2/3), (2/3)(4/5), ..., (2/3)(4/5)...(38/39) individually in another vector (say p), which would thus have 19 elements in it. I then intended to use the sum function to find the sum of all those values.
Where am I stuck:
As described in the R documentation, the prod function returns the product of all the values present in its arguments. So,
prod(z[1:1])
prod(z[1:2])
prod(z[1:3])
will return the values of (2/3), (2/3)(4/5), (2/3)(4/5)(6/7) respectively which it does:
> prod(z[1:1])
[1] 0.6666667
> prod(z[1:2])
[1] 0.5333333
> prod(z[1:3])
[1] 0.4571429
But it's not possible to go on like this for all 19 elements of the vector z. I am stuck here, wondering what could be done. I wanted to iterate over the elements of z one by one, which is why I created the vector i described above, but it didn't go as I had thought. Any help, suggestions, or hints on how this can be done would be great; I seem to have run out of ideas here.
More Information:
Here, I am providing with all the outputs in a systematic manner for others to understand my problem better:
> x
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
> y
[1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
> z
[1] 0.6666667 0.8000000 0.8571429 0.8888889 0.9090909 0.9230769 0.9333333
[8] 0.9411765 0.9473684 0.9523810 0.9565217 0.9600000 0.9629630 0.9655172
[15] 0.9677419 0.9696970 0.9714286 0.9729730 0.9743590
> i
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Short Note (controversial statement ahead): This post would really have benefited from LaTeX, but as mentioned in several posts about including LaTeX on Stack Overflow (like this), the dependencies are considered too heavy, so it is still not supported.

You can use cumprod to get the cumulative product of a vector, which is what you are after:
p <- cumprod(z)
p
# [1] 0.6666667 0.5333333 0.4571429 0.4063492 0.3694084 0.3409923 0.3182595
# [8] 0.2995384 0.2837732 0.2702602 0.2585097 0.2481694 0.2389779 0.2307373
# [15] 0.2232941 0.2165276 0.2103411 0.2046562 0.1994087
A less efficient but more general alternative to cumprod would be
p <- sapply(i, function(x) prod(z[1:x]))
Here sapply takes the place of the loop, passing a different ending index to each prod() call.
Then you can do
1 + sum(p)

Related

R triangular numbers function

While working on a small program to calculate the right triangular number satisfying an equation, I stumbled upon a page that documents the function Triangular():
Triangular function
When I tried to use this, RStudio says it couldn't find it, and I can't seem to find any other information about which library it could be in.
Does this function even exist and/or are there other ways to fill a vector with triangular numbers?
Here is a base R solution defining a custom triangular-number generator:
myTriangular <- function(n) choose(seq(n),2)
or
myTriangular <- function(n) cumsum(seq(n)-1)
such that
> myTriangular(10)
[1] 0 1 3 6 10 15 21 28 36 45
If you would like to use Triangular() from package Zseq, then please try
Zseq::Triangular(10)
such that
> Zseq::Triangular(10)
Big Integer ('bigz') object of length 10:
[1] 0 1 3 6 10 15 21 28 36 45
It's pretty easy to do it yourself:
triangular <- function(n) sapply(1:n, function(x) sum(1:x))
So you can do:
triangular(10)
# [1] 1 3 6 10 15 21 28 36 45 55

Kolmogorov-Smirnov using R

Long story short, I want to write the code for the Kolmogorov-Smirnov one-sample statistic manually instead of using ks.test() in R. From what I understand, the K-S statistic can be broken down into a ratio of a numerator and a denominator. I am interested in writing out the numerator, which, as I understand it, is the maximal absolute difference between a sample of observations and the theoretical assumption. Let's use the case below as an example:
Data Expected
1 0.01052632 0.008864266
2 0.02105263 0.010969529
13 0.05263158 0.018282548
20 0.06315789 0.031689751
22 0.09473684 0.046315789
24 0.26315789 0.210526316
26 0.27368421 0.220387812
27 0.29473684 0.236232687
28 0.30526316 0.252520776
3 0.42105263 0.365650970
4 0.42105263 0.372299169
5 0.45263158 0.398781163
6 0.49473684 0.452853186
7 0.50526316 0.460277008
8 0.73684211 0.656842105
9 0.74736842 0.665484765
10 0.75789474 0.691523546
11 0.77894737 0.718005540
12 0.80000000 0.735955679
14 0.84210526 0.791135734
15 0.86315789 0.809972299
16 0.88421053 0.838559557
17 0.89473684 0.857950139
18 0.96842105 0.958337950
19 0.97894737 0.968642659
21 0.97894737 0.979058172
23 0.98947368 0.989473684
25 1.00000000 1.000000000
Here, I want to obtain the maximal absolute difference (Data - Expected).
Anyone have an idea? I can rephrase this question, if necessary. Thanks!
I was looking for an answer along the lines of this code:
> A <- with(df, max(abs(Data-Expected)))
where df is the data frame.
Here, I take the difference between each Data and Expected value, convert the values to absolute values, and select the maximum from the vector of absolute differences. Thus, the answer is:
> A
0.082

Is there a way to refer to the end of a vector?

I don't want to save the huge intermediate results of some of my calculations, and hence want to run some tests without saving these memory-expensive vectors.
Say, during the computation I have a vector of arbitrary length l.
But I don't know what l is, and I can't save the vector in the memory.
Is there a way I can refer to the end (i.e. the length) of the vector, something like
vec[100:END] or vec[100:-1] or vec[100:last]
Please note that vec here is not a variable, and it only refers to an intermediate expression which will output a vector.
I know the length, head, and tail functions, and that vec[-(1:99)] is an equivalent expression.
But I actually want to know if there is some reference that will run from a specified index to the 'end' of the vector.
Thanks!!
I'm probably not understanding your question. If this isn't useful let me know and I'll delete it.
I gather you want to extract the elements from a vector of arbitrary length, from element N to the end, without explicitly storing the vector (which is required if you want to use, e.g. length(vec)). Here are two ways:
N <- 5 # grab element 5 to the end.
set.seed(12)
(1:sample(N:100,1))[-(1:(N-1))]
# [1] 5 6 7 8 9 10 11
set.seed(12)
tail(1:sample(N:100,1),-(N-1))
# [1] 5 6 7 8 9 10 11
Both of these create (temporarily) a sequence of integers of random length (>=5), and extract the elements from 5 to the end without self-referencing.
You mentioned memory a couple of times. If you're concerned about memory and assigning large objects, you should take a look at the Memory-limits documentation and the related links. First, there are ways to operate on the language in R. Here I only assign one object, the function f, and use it without making any other assignments.
> f <- function(x, y) x:y ## actually, g <- ":" is only 96 bytes
> object.size(f)
# 1560 bytes
> f(5, 20)[3:7]
# [1] 7 8 9 10 11
> object.size(f)
# 1560 bytes
> f(5, 20)[3:length(f(5, 20))]
# [1] 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> object.size(f)
# 1560 bytes
You can also use an expression to hold an unevaluated function call.
> e <- expression(f(5, 20)) ## but again, g <- ":" is better
> eval(e)
# [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> eval(e)[6:9]
# [1] 10 11 12 13
> eval(e)[6:length(eval(e))]
# [1] 10 11 12 13 14 15 16 17 18 19 20
> rev(eval(e))
# [1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5
Depending on the type of data you're working with, there are ways to avoid using large amounts of memory during a session. Here are a few related to your question.
memory.profile()
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
# Ncells 274711 14.7 531268 28.4 531268 28.4
# Vcells 502886 3.9 1031040 7.9 881084 6.8
?gc is good knowledge to have, though I can't really explain it; it's best to read about it. I also just learned about memCompress() and memDecompress() for in-memory compression/storage; here's a look below. Also, if you're working with integer values, telling R so can help save memory. That's what the L suffix is for at the end of the rep.int() call.
x <- rep.int(5L, 1e4L)
y <- as.raw(x)
z1 <- memCompress(y)
z2 <- memCompress(y, "b")
z3 <- memCompress(y, "x")
mapply(function(a) object.size(get(a)), c('x','y','z1','z2','z3'))
# x y z1 z2 z3
# 40040 10040 88 88 168
And there is also
delayedAssign("p", rep.int(5L, 1e5L))
which is a promise object that takes up 0 bytes of memory until it is first evaluated.

Need help pairing data

I'm looking for what I'm sure is a quick answer. I'm working with a data set that looks like this:
Week Game.ID VTm VPts HTm HPts Differential HomeWin
1 NFL_20050908_OAK#NE OAK 20 NE 30 10 TRUE
1 NFL_20050911_ARI#NYG ARI 19 NYG 42 23 TRUE
1 NFL_20050911_CHI#WAS CHI 7 WAS 9 2 TRUE
1 NFL_20050911_CIN#CLE CIN 27 CLE 13 -14 FALSE
1 NFL_20050911_DAL#SD DAL 28 SD 24 -4 FALSE
1 NFL_20050911_DEN#MIA DEN 10 MIA 34 24 TRUE
NFL data. I want to come up with a way to pair each HTm with its Differential and store these values in another table. I know it's easy to do, but all the methods I'm coming up with involve handling each team individually via a for loop that searches for [i,5]=="NE", [i,5]=="NYG", and so on. I'm wondering if there's a way to do this systematically for all 32 teams. I would then use the same method to pair the VTm of the same team code ("NYG" or "NE") with VPts and a VDifferential.
Thanks for the help.
I'm not sure if I understood your question correctly (do you need something like a database select?), but:
cbind(matr[,x], matr[,y])
selects columns x and y and creates a new matrix.
It sounds like you'd like to perform operations on your data frame based on a grouping variable. For that, there are many functions, among which is tapply(). For example, if your data is in a data.frame object named nflDF, you could get the maximum Differential for each home team HTm by
tapply(nflDF$Differential, nflDF$HTm, FUN = max)
Which would return (with your sample data)
CLE MIA NE NYG SD WAS
-14 24 10 23 -4 2
Alternatively, you could use by:
by(nflDF, nflDF$HTm, FUN = function(x) max(x$Differential))
HTm: CLE
[1] -14
------------------------------------------------------------
HTm: MIA
[1] 24
------------------------------------------------------------
HTm: NE
[1] 10
------------------------------------------------------------
HTm: NYG
[1] 23
------------------------------------------------------------
HTm: SD
[1] -4
------------------------------------------------------------
HTm: WAS
[1] 2
To perform more complicated operations, change the values supplied to the FUN arguments in the appropriate function.

Combining vectors of unequal length into a data frame

I have a list of vectors which are time series of unequal length. My ultimate goal is to plot the time series in a ggplot2 graph. I guess I am better off first merging the vectors into a data frame (where the shorter vectors will be padded with NAs), also because I want to export the data in a tabular format such as .csv to be perused by other people.
I have a list that contains the names of all the vectors. It is fine that the column titles be set by the first vector, which is the longest. E.g.:
> mylist
[[1]]
[1] "vector1"
[[2]]
[1] "vector2"
[[3]]
[1] "vector3"
etc.
I know the way to go is to use Hadley's plyr package but I guess the problem is that my list contains the names of the vectors, not the vectors themselves, so if I type:
do.call(rbind, mylist)
I get a one-column data frame containing the names of the vectors I wanted to merge.
> do.call(rbind, actives)
[,1]
[1,] "vector1"
[2,] "vector2"
[3,] "vector3"
[4,] "vector4"
[5,] "vector5"
[6,] "vector6"
[7,] "vector7"
[8,] "vector8"
[9,] "vector9"
[10,] "vector10"
etc.
Even if I create a list with the objects themselves, I get an empty data frame:
mylist <- list(vector1, vector2)
mylist
[[1]]
1 2 3 4 5 6 7 8 9 10 11 12
0.1875000 0.2954545 0.3295455 0.2840909 0.3011364 0.3863636 0.3863636 0.3295455 0.2954545 0.3295455 0.3238636 0.2443182
13 14 15 16 17 18 19 20 21 22 23 24
0.2386364 0.2386364 0.3238636 0.2784091 0.3181818 0.3238636 0.3693182 0.3579545 0.2954545 0.3125000 0.3068182 0.3125000
25 26 27 28 29 30 31 32 33 34 35 36
0.2727273 0.2897727 0.2897727 0.2727273 0.2840909 0.3352273 0.3181818 0.3181818 0.3409091 0.3465909 0.3238636 0.3125000
37 38 39 40 41 42 43 44 45 46 47 48
0.3125000 0.3068182 0.2897727 0.2727273 0.2840909 0.3011364 0.3181818 0.2329545 0.3068182 0.2386364 0.2556818 0.2215909
49 50 51 52 53 54 55 56 57 58 59 60
0.2784091 0.2784091 0.2613636 0.2329545 0.2443182 0.2727273 0.2784091 0.2727273 0.2556818 0.2500000 0.2159091 0.2329545
61
0.2556818
[[2]]
1 2 3 4 5 6 7 8 9 10 11 12
0.2824427 0.3664122 0.3053435 0.3091603 0.3435115 0.3244275 0.3320611 0.3129771 0.3091603 0.3129771 0.2519084 0.2557252
13 14 15 16 17 18 19 20 21 22 23 24
0.2595420 0.2671756 0.2748092 0.2633588 0.2862595 0.3549618 0.2786260 0.2633588 0.2938931 0.2900763 0.2480916 0.2748092
25 26 27 28 29 30 31 32 33 34 35 36
0.2786260 0.2862595 0.2862595 0.2709924 0.2748092 0.3396947 0.2977099 0.2977099 0.2824427 0.3053435 0.3129771 0.2977099
37 38 39 40 41 42 43 44 45 46 47 48
0.3320611 0.3053435 0.2709924 0.2671756 0.2786260 0.3015267 0.2824427 0.2786260 0.2595420 0.2595420 0.2442748 0.2099237
49 50 51 52 53 54 55 56 57 58 59 60
0.2022901 0.2251908 0.2099237 0.2213740 0.2213740 0.2480916 0.2366412 0.2251908 0.2442748 0.2022901 0.1793893 0.2022901
but
do.call(rbind.fill, mylist)
data frame with 0 columns and 0 rows
I have tried converting the vectors to data frames, but there is no cbind.fill function, so plyr complains that the data frames are of different lengths.
So my questions are:
Is this the best approach? Keep in mind that the goals are a) a ggplot2 graph and b) a table with the time series, to be viewed outside of R
What is the best way to get a list of objects starting with a list of the names of those objects?
What is the best type of graph to highlight the patterns of 60 time series? The scale is the same, but I predict there'll be a lot of overplotting. Since this is a cohort analysis, it might be useful to use color to highlight the different cohorts in terms of recency (as a continuous variable). But how do I avoid overplotting? The differences will be minimal, so faceting might leave the viewer unable to grasp the difference.
I think that you may be approaching this the wrong way:
If you have time series of unequal length then the absolute best thing to do is to keep them as time series and merge them. Most time series packages allow this. So you will end up with a multi-variate time series and each value will be properly associated with the same date.
So put your time series into zoo objects, merge them, then use my qplot.zoo function to plot them. That will deal with switching from zoo into a long data frame.
Here's an example:
> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for(i in 1:length(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
z1 z2 z3
1 1 NA NA
2 2 2 NA
3 3 3 NA
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
>
> x.df <- data.frame(dates=index(z), coredata(z))
> x.df <- melt(x.df, id="dates", variable="val")
> ggplot(na.omit(x.df), aes(x=dates, y=value, group=val, colour=val)) + geom_line() + theme(legend.position = "none")
If you're doing it just because ggplot2 (as well as many other things) likes data frames, then what you're missing is that you need the data in long-format data frames. Yes, you just put all of your response variables in one column, concatenated together. Then you have one or more other columns that identify what makes those responses different. That's the best way to set the data up for things like ggplot.
You can't. A data.frame() has to be rectangular, but recycling rules ensure that the shorter vectors get expanded.
So you may have a different error here -- the data that you want to rbind is not suitable, maybe? -- but it is hard to tell, as you did not supply a reproducible example.
Edit: Given your update, you get precisely what you asked for: a list of names gets combined by rbind. If you want the underlying data to appear, you need to involve get() or another data accessor.
