Count blocks in a series - r

This is a simple problem, but I cannot find an elegant solution for it.
Given the following vector series:
series=c(1,2,4,5,6,1,2,4,5,6,7,8,2,4)
I now want to count blocks of this vector in the same vector; e.g. if I have a block size of 2, I would like to count the pairs 1&2, 2&4, 4&5 and so on (in total 8 unique blocks if I did the counting right).
Can you think of an easy way to program that so that I receive an output matrix with a column for the "unique block number" and a corresponding column for the counts?

One idea is to use rollapply from zoo:
library(zoo)
nrow(unique(rollapply(series, 2, by = 1, paste0)))
#[1] 8
You can change the 2 to get other block sizes (3, 4, etc.). Note that this returns only the number of unique blocks, not the count for each block.
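The question also asks for a count per block. A minimal base R sketch of one way to get that, assuming the blocks are overlapping windows as in the rollapply call; the names k, windows, blocks and counts are illustrative and not part of the original answer:

```r
series <- c(1, 2, 4, 5, 6, 1, 2, 4, 5, 6, 7, 8, 2, 4)
k <- 2  # block size

# embed() returns one row per sliding window, with each window reversed,
# so the columns are flipped back into the original order
windows <- embed(series, k)[, k:1, drop = FALSE]

# collapse each window into a label such as "1&2"
blocks <- apply(windows, 1, paste, collapse = "&")

# one row per unique block, with its number of occurrences
counts <- as.data.frame(table(block = blocks), responseName = "count")
print(counts)
```

Changing k gives the counts for other block sizes.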

Related

Why matrices are [rows, columns] and not [columns, rows]?

For the sake of simplicity, I will refer to column as col.
Why are matrices defined as [rows, columns] and not [columns, rows]?
It has just caused me a ton of headaches and confusions.
My thinking goes this way:
A regular array:
[1, 3, 5, 2, 4]
is like a matrix with one row and multiple cols, and it is indexed like this: arr[n].
And so if we had another dimension:
[1, 3, 5, 2, 4]
[1, 3, 6, 3, 6]
there are now rows. So it would seem natural to write the rows after the 'n', as arr[n, rows], but reality shows us otherwise.
In addition, a matrix of 2 dimensions can be looked at as a Cartesian coordinate system (where the direction of the y axis is flipped and the origin is element [0,0]). In the plane, we write points as (X,Y).
It looks like the cols sit on the x axis and the rows on the y axis, so why not index the elements of matrices as [Cols, Rows]?
Sorry if I've confused you, and sorry for my ignorance.
IMO, this is bound to Latin typographical conventions.
Latin script is written from left to right, then top to bottom.
Following these conventions, a matrix is decomposed into nRow rows.
Each row is then decomposed into nColumn elements, much like you would decompose a text into sentences, and sentences into words.
Information is thus organized with a most significant arrangement (rows) and a least significant one (columns).
Following the same convention as Latin (err, Arabic) number notation, we have the most significant nRow on the left (first, if you read left to right) and the least significant nColumn on the right, thus (nRow,nColumn) for describing the layout.
Naturally, accessing a single element at row iRow and column jCol follows the same convention: (iRow,jCol).
Note that the information can be arranged completely differently in the underlying software. For example, for multi-dimensional arrays in FORTRAN and Matlab, the first index varies fastest, and the sequence in memory is x(1,1) x(2,1) x(3,1) ... x(1,2) x(2,2) x(3,2) ..., a sort of column-wise order if we consider that the left (first) index is the row index and the right (last) is the column index. Or maybe some optimized library will have arranged a block layout for the matrix.
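R behaves the same way as the FORTRAN and Matlab examples: elements are addressed as [row, column], while the values are stored column by column. A small sketch to illustrate the difference between the indexing convention and the storage order (the variable names are illustrative):

```r
# matrix() fills column by column unless byrow = TRUE is given
m <- matrix(1:6, nrow = 2, ncol = 3)

dims <- dim(m)       # number of rows first, then number of columns
elem <- m[2, 3]      # row 2, column 3
flat <- as.vector(m) # underlying storage order: column by column

# the same values laid out row by row
m_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
flat_by_row <- as.vector(m_by_row)

print(m)
print(flat)
print(flat_by_row)
```

The index order [row, column] stays the same in both cases; only the order in which the values are laid out in memory differs.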

Print all values greater than some value without using booleans in R?

Does anyone have an idea how to print the elements of a set that are greater than some value without using booleans?
E.g., suppose I have the set x, which includes the elements (1, 4, 6, 3, 5, 2, 9).
Obviously, we can print all the values of x greater than 5 using the following code:
x[x>5]
But this way of coding uses booleans (the comparison x>5 produces TRUE, FALSE, etc.).
Is there any way to do this using solely integers?
I was thinking about some sort of loop that would run over the number of elements, and then do
x[c(variable)]
but I don't really know.
Please help.
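One possible approach, sketched under the assumption that "without booleans" means no logical vector is used for indexing: sign(x - threshold) is 1 exactly where an element exceeds the threshold, pmax(..., 0) turns the remaining -1 values into 0, and rep() then repeats each element that many times (0 or 1). The names threshold, keep and result are hypothetical.

```r
x <- c(1, 4, 6, 3, 5, 2, 9)
threshold <- 5

# 1 where x > threshold, 0 otherwise (numeric values, not TRUE/FALSE)
keep <- pmax(sign(x - threshold), 0)

# repeat each element 0 or 1 times, which drops the unwanted ones
result <- rep(x, times = keep)
print(result)
```

This keeps the original order of the elements; whether it meets the intent of the exercise depends on how strictly "no booleans" is meant.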

Series vector for approximating pi

I've been set a question about Madhava's approximation of pi. The first part of it is to create a vector which contains the first 20 terms of the series. I know I could just type the first 20 terms into a vector, but that seems like a really long-winded way of doing things. Is there an easier way to create the vector?
Currently I have the vector
g = c((-3)^(-0)/(2*0+1), (-3)^(-1)/(2*1+1), (-3)^(-2)/(2*2+1), (-3)^(-3)/(2*3+1), (-3)^(-4)/(2*4+1), (-3)^(-5)/(2*5+1), (-3)^(-6)/(2*6+1), (-3)^(-7)/(2*7+1), (-3)^(-8)/(2*8+1), (-3)^(-9)/(2*9+1), (-3)^(-10)/(2*10+1), (-3)^(-11)/(2*11+1), (-3)^(-12)/(2*12+1), (-3)^(-13)/(2*13+1), (-3)^(-14)/(2*14+1), (-3)^(-15)/(2*15+1), (-3)^(-16)/(2*16+1), (-3)^(-17)/(2*17+1), (-3)^(-18)/(2*18+1), (-3)^(-19)/(2*19+1), (-3)^(-20)/(2*20+1))
And
h = sqrt(12)
So I have done g*h to get the approximation of pi. Surely there's an easier way of doing this?
Apologies if this is relatively basic; I am very new to R and still learning how to use Stack Overflow properly.
Thanks.
One of the best features of R is that it is vectorised. This means that we can do operations element-wise on entire vectors rather than having to type out the operation for each element. For example, if we wanted to find the squares of the first five natural numbers (starting at one), we could do this:
(1:5)^2
which results in the output
[1] 1 4 9 16 25
instead of having to do this:
c(1^2, 2^2, 3^2, 4^2, 5^2)
which gives the same output.
Applying this amazing property of R to your situation, instead of having to manually construct the whole vector, we can just do this:
series <- sqrt(12) * c(1, -1) / 3^(0:19) / seq(from=1, by=2, length.out=20)
sum(series)
which gives the following output:
[1] 3.141593
and we can see more decimal places by doing this:
sprintf("%0.20f", sum(series))
[1] "3.14159265357140338182"
To explain a little further what I did in that line of code to generate the series:
We want to multiply the entire thing by the square root of 12, hence the sqrt(12), which is applied to every element of the resulting vector.
We need the signs of the series to alternate, which is accomplished via * c(1, -1). This works because of recycling, where R reuses the elements of the shorter vector when doing vector operations: it multiplies the first element by 1, the second element by -1, then recycles and multiplies the third element by 1, the fourth by -1, etc.
We need to divide each element by 1, 3, 9, etc., which is accomplished by / 3^(0:19), which gives / c(3^0, 3^1, ...).
Lastly, we also need to divide by 1, 3, 5, 7, etc., which is accomplished by / seq(from=1, by=2, length.out=20) (see help(seq)).
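The formula from the question, (-3)^(-k)/(2k+1), can also be vectorised directly over an index vector. A minimal sketch; the names k, terms and approx are illustrative, and k runs over 0:19 so that exactly 20 terms are produced:

```r
k <- 0:19                                   # indices of the first 20 terms
terms <- sqrt(12) * (-3)^(-k) / (2 * k + 1) # one term of the series per index
approx <- sum(terms)                        # the approximation is the sum of the terms
print(approx)
```

Note that the approximation of pi is the sum of the terms; multiplying the vector by sqrt(12) on its own only scales each term.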

Using head() to print n ordered rows in dataframe from random starting position

I know I can use
head(sample(x),m)
to print a random selection of m rows from my dataset, but in this case each new draw is randomized. What if, instead of randomizing every draw, I wanted to randomize only the starting position for the first draw, while preserving the order of subsequent rows?
To illustrate, imagine we have a dataset of n rows and I want to print m of them (m < n) in order, starting from a random position. If the randomly drawn starting position is 5, my desired function would print rows 5, 6, 7, ... until m rows have been printed.
This is more of a theoretical question, not a diagnostic one, so I don't believe an MWE is needed. Please let me know if you think it is and I will be happy to provide one.
We can create a numeric index by sampling a single starting row and adding a sequence for the rows that should follow it (0:3 below gives four rows in total). If the sampled start is near the end, say the last row, some indices fall beyond the last row, so we add a condition to drop them:
i1 <- sample(nrow(df1), 1)+ 0:3
df1[ i1[i1 <= nrow(df1)], ]
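If exactly m rows are always wanted, one variation is to restrict the random start so that the block cannot run past the end. A minimal sketch; the function name random_block and the example data frame are hypothetical, not part of the original answer:

```r
# Return m consecutive rows of df, starting at a random position
# chosen so that the block always fits inside the data frame.
random_block <- function(df, m) {
  stopifnot(m >= 1, m <= nrow(df))
  start <- sample.int(nrow(df) - m + 1, 1)  # latest valid start is n - m + 1
  df[start:(start + m - 1), , drop = FALSE]
}

df1 <- data.frame(id = 1:10, value = letters[1:10])
set.seed(42)  # only to make the example reproducible
block <- random_block(df1, 4)
print(block)
```

The trade-off is that the last m - 1 rows can never be a starting position, whereas the filtering approach above allows any start but may return fewer than m rows.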

R -- frequencies within a variable for repeating values

I've got a column A, which has several values, some of them repeating. For example: A = c(5, 9, 6, 5, 5). I need to go through A and count the frequency of each of the values in A. So, in this example, there are 3 occurrences of 5 in A. I need to save these frequencies so I can use them in another calculation. By the way, I have several other variables in this dataset.
How do I do this?
Thanks.
You can try:
library(data.table) # v1.9.4+
setDT(yourdf)[, .N, by = A]
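A base R alternative, sketched with a small made-up data frame (yourdf and the column B stand in for the real data): table() gives the frequency of each value, and ave() attaches the frequency to every row so it can be used in a later calculation.

```r
yourdf <- data.frame(A = c(5, 9, 6, 5, 5), B = c("a", "b", "c", "d", "e"))

# frequency of each distinct value of A
freq <- table(yourdf$A)
print(freq)

# add the frequency of each row's A value as a new column
yourdf$A_count <- ave(yourdf$A, yourdf$A, FUN = length)
print(yourdf)
```

The values in freq can be looked up by name, for example freq[["5"]].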

Resources