R: create a data frame out of a rolling window

Let's say I have a data frame with the following structure:
DF <- data.frame(x = 0:4, y = 5:9)
> DF
x y
1 0 5
2 1 6
3 2 7
4 3 8
5 4 9
What is the most efficient way to turn 'DF' into a data frame with the following structure:
w x y
1 0 5
1 1 6
2 1 6
2 2 7
3 2 7
3 3 8
4 3 8
4 4 9
Here w numbers each position of a length-2 window rolling through the data frame 'DF'. The length of the window should be arbitrary, i.e. a length of 3 yields
w x y
1 0 5
1 1 6
1 2 7
2 1 6
2 2 7
2 3 8
3 2 7
3 3 8
3 4 9
I am a bit stumped by this problem, because the data frame can also contain an arbitrary number of columns, i.e. w,x,y,z etc.
/edit 2: I've realized edit 1 is a bit unreasonable, as xts doesn't seem to deal with multiple observations per data point

My approach would be to use the embed function. The first thing to do is to create a rolling sequence of indices into a vector. Take a data-frame:
df <- data.frame(x = 0:4, y = 5:9)
nr <- nrow(df)
w <- 3 # window size
i <- 1:nr # indices of the rows
iw <- embed(i,w)[, w:1, drop = FALSE] # matrix of rolling-window indices of length w (drop = FALSE keeps it a matrix when there is one window or w is 1)
> iw
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
wnum <- rep(1:nrow(iw),each=w) # window number
inds <- i[c(t(iw))] # the indices flattened, to use below
dfw <- sapply(df, '[', inds)
dfw <- transform(data.frame(dfw), w = wnum)
> dfw
x y w
1 0 5 1
2 1 6 1
3 2 7 1
4 1 6 2
5 2 7 2
6 3 8 2
7 2 7 3
8 3 8 3
9 4 9 3
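The same idea can be wrapped in a small helper that takes any window length and any number of columns. This is only a sketch: the function name rolling_window_df is made up for illustration, and it builds the row indices with outer() rather than embed(), which is a swapped-in way of getting the same index sequence.

```r
# Hypothetical helper (name is illustrative, not from the answer above).
# Stacks every length-w window of consecutive rows of `df` and labels each
# window with its number in a leading column `w`.
rolling_window_df <- function(df, w) {
  n_win <- nrow(df) - w + 1              # number of complete windows
  stopifnot(w >= 1, n_win >= 1)
  starts <- seq_len(n_win)               # first row of each window
  # offsets 0..w-1 added to each start; as.vector() reads column by column,
  # so the indices come out window by window: 1,2,..,w, 2,3,..,w+1, ...
  inds <- as.vector(outer(0:(w - 1), starts, "+"))
  out <- cbind(w = rep(starts, each = w), df[inds, , drop = FALSE])
  rownames(out) <- NULL                  # discard the duplicated row names
  out
}

DF <- data.frame(x = 0:4, y = 5:9)
win2 <- rolling_window_df(DF, 2)
win3 <- rolling_window_df(DF, 3)
```

Because the rows are picked with df[inds, , drop = FALSE], extra columns (z and so on) are carried along without any changes to the helper.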

Related

how to remove part of a phrase from all variables in a column in R

Say I have a data frame
g <- c("Smember_1", "Smember_1", "Smember_1", "Smember_2", "Smember_2", "Smember_2", "Smember_3", "Smember_3", "Smember_3")
m <- c(1,2,1,3,4,1,3,5,6)
df <- data.frame(g, m)
g m
1 Smember_1 1
2 Smember_1 2
3 Smember_1 1
4 Smember_2 3
5 Smember_2 4
6 Smember_2 1
7 Smember_3 3
8 Smember_3 5
9 Smember_3 6
I would like to remove Smember_ from all the values in the g column so that the data frame df looks like
> df
g m
1 1 1
2 1 2
3 1 1
4 2 3
5 2 4
6 2 1
7 3 3
8 3 5
9 3 6
I think you want
df$g <- gsub("^\\D*(\\d+)$", "\\1", df$g) # \\D* stops at the first digit, so multi-digit ids are kept whole
df2$variable <- gsub("Smember_","", df2$variable)
worked!
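For comparison, here is a small sketch of two ways to strip the prefix, run on a made-up vector that also contains a two-digit id ("Smember_10" is an assumption added for illustration, it is not in the question's data). A literal replacement with fixed = TRUE avoids regular expressions entirely, while the pattern "^\\D*" removes any leading run of non-digits.

```r
# Illustrative input; "Smember_10" is hypothetical and only there to show
# that multi-digit ids survive both approaches.
g <- c("Smember_1", "Smember_2", "Smember_10")

# 1. Literal prefix removal, no regular expression involved
stripped_fixed <- sub("Smember_", "", g, fixed = TRUE)

# 2. Regular expression: drop every leading non-digit character
stripped_regex <- sub("^\\D*", "", g)

# Convert to integer if the ids should be numeric rather than character
ids <- as.integer(stripped_fixed)
```

Both leave the column as character, so the as.integer() step is only needed if numeric ids are wanted.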

Compare 2 values of the same row of a matrix with the row and column index of another matrix in R

I have a matrix, matrix1, with 11217 rows and 2 columns, and a second matrix, matrix2, with 10 rows and 10 columns. I want to compare the two values in each row of matrix1 with the row and column indices of matrix2: whenever a row of matrix1 matches an index pair, the value at that position in matrix2 (initially 0) should be incremented by 1.
c1 <- x[2:11218] #these values go from 1 to 10
#second column from index 3 to N
c2 <- x[3:11219] #these values also go from 1 to 10
#matrix with column c1 and c2
m1 <- as.matrix(cbind(c1 = c1, c2 = c2))
#empty matrix which will count the frequencies
m2 <- matrix(0, nrow = 10, ncol = 10)
#change row and column names of m2 to the numbers of 1 to 10
dimnames(m2) <-list(c(1:10), c(1:10))
#go through every row of the matrix m1 and look which rotation appears, add 1 to m2 if the rotation
#equals the corresponding index
r <- c(1:10)
c <- c(1:10)
for (i in 1:nrow(m1)) {
if(m1[i,1] == r & m1[i,2] == c)
m2[r,c]+1
}
No frequencies were calculated, and I don't understand why.
It appears that you are trying to replicate the behavior of table. I'd recommend just using it instead.
Simpler data (it appears you did not include variable x):
m1 <- matrix(round(runif(20, 1, 10)), ncol = 2)
Then, use table. Here, I am setting the values of each column to be a factor to ensure that the right columns are generated:
table(factor(m1[,1], 1:10), factor(m1[,2], 1:10))
gives:
1 2 3 4 5 6 7 8 9 10
1 3 4 0 4 2 0 5 3 2 0
2 3 7 9 7 4 5 3 4 5 2
3 4 6 3 10 8 9 4 2 7 3
4 5 2 14 3 7 13 8 11 3 3
5 2 13 2 5 8 5 7 7 8 6
6 1 10 7 4 5 6 8 5 8 5
7 3 3 6 5 4 5 4 8 7 7
8 5 5 8 7 6 10 5 4 3 4
9 2 5 8 4 7 4 4 6 4 2
10 3 1 2 3 3 5 3 5 1 0
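As for why the loop in the question counted nothing: m2[r,c]+1 computes a value but never assigns it back, and m1[i,1] == r compares one number against the whole vector 1:10, so the if condition is a length-10 vector rather than a single TRUE or FALSE. A minimal sketch of a corrected loop, checked against table, might look like this. The data are random stand-ins because the original x was not posted, and the variable names are illustrative.

```r
set.seed(42)  # illustrative data only; the question's x is not available
m1 <- cbind(c1 = sample(1:10, 200, replace = TRUE),
            c2 = sample(1:10, 200, replace = TRUE))

# empty 10 x 10 count matrix
m2 <- matrix(0, nrow = 10, ncol = 10, dimnames = list(1:10, 1:10))

for (i in seq_len(nrow(m1))) {
  r <- m1[i, 1]                  # row index taken from the first column
  k <- m1[i, 2]                  # column index taken from the second column
  m2[r, k] <- m2[r, k] + 1       # assign the incremented count back
}

# The same counts from table(), as recommended in the answer
tab <- table(factor(m1[, 1], 1:10), factor(m1[, 2], 1:10))
same <- all(as.vector(m2) == as.vector(tab))
```

The loop is shown only to explain the bug; table does the same work in one call.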

R - how to aggregate by group values listed in this way

Given DF, how do I aggregate it to list the number of rows with successive matching Type?
For example, I would like to generate out1 or out2 from DF.
DF <- data.frame("Ages"=c(1,2,3,4,5,6,7,8),"Type"=c("R","N","N","N","R","R","N","R"))
DF
Ages Type
1 1 R
2 2 N
3 3 N
4 4 N
5 5 R
6 6 R
7 7 N
8 8 R
out1 <- data.frame("Counts"=c(1,3,2,1,1))
out1
Counts
1 1
2 3
3 2
4 1
5 1
out2 <- data.frame("Ages"=c(1,4,6,7,8),"Counts"=c(1,3,2,1,1))
out2
Ages Counts
1 1 1
2 4 3
3 6 2
4 7 1
5 8 1
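One possible approach, offered as a sketch rather than as an answer from the original thread, is base R's rle, which returns the lengths of runs of equal consecutive values. The cumulative sum of those lengths gives the last row of each run, which is where the Ages values in out2 come from.

```r
DF <- data.frame(Ages = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Type = c("R", "N", "N", "N", "R", "R", "N", "R"))

# Run-length encoding of Type; as.character() guards against a factor column
runs <- rle(as.character(DF$Type))

# out1: the length of each run of identical consecutive Type values
out1 <- data.frame(Counts = runs$lengths)

# out2: the same counts, plus Ages taken from the last row of each run
last_row <- cumsum(runs$lengths)
out2 <- data.frame(Ages = DF$Ages[last_row], Counts = runs$lengths)
```

runs$values also holds the Type of each run, in case it should be kept as a further column.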

Reduce columns of a matrix by a function in R

I have a matrix sort of like:
data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melting the data then using group_by from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
We can use rowMins from the matrixStats package, applied to each group of columns that share a name:
library(matrixStats)
res <- vapply(split(1:ncol(values), colnames(values)),
function(i) rowMins(values[,i,drop=FALSE]), rep(0, nrow(values)))
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
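If installing matrixStats is not an option, a base R variant of the same split-by-column-name idea can use pmin, which takes the element-wise minimum across several vectors. This is an untested-for-speed sketch: it has not been benchmarked on data of the size described in the question, and the values matrix below simply reproduces the numbers printed in the question's comment.

```r
# The example matrix from the question, typed in column by column
values <- matrix(c(5, 6, 1, 6, 6,
                   4, 9, 2, 5, 4,
                   9, 9, 5, 1, 5,
                   6, 1, 3, 8, 9,
                   7, 2, 10, 6, 4,
                   8, 5, 1, 4, 4),
                 ncol = 6,
                 dimnames = list(as.character(1:5),
                                 c("1", "2", "3", "2", "3", "2")))

# Column positions grouped by column name
groups <- split(seq_len(ncol(values)), colnames(values))

# For each group, take the element-wise minimum over its columns
res_base <- vapply(groups,
                   function(idx) do.call(pmin, lapply(idx, function(j) values[, j])),
                   numeric(nrow(values)))
```

The result has one column per distinct column name and keeps the row names of values.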

Extract a data frame using model.frame and formula

I want to extract a data frame using a formula that specifies which columns to select and which columns to cross with each other (interaction terms).
I know the model.frame function. However, it does not give me the crossed columns:
For example:
df <- data.frame(x = c(1,2,3,4), y = c(2,3,4,7), z = c(5,6, 9, 1))
f <- formula('z~x*y')
model.frame(f, df)
output:
> df
x y z
1 1 2 5
2 2 3 6
3 3 4 9
4 4 7 1
> f <- formula('z~x*y')
> model.frame(f, df)
z x y
1 5 1 2
2 6 2 3
3 9 3 4
4 1 4 7
I hope to get:
z x y x*y
1 5 1 2 2
2 6 2 3 6
3 9 3 4 12
4 1 4 7 28
Is there a package that could achieve this functionality? (It would be perfect if I can get the resulting matrix as a sparse matrix because the crossed columns will be highly sparse)
You can use model.matrix:
> model.matrix(f, df)
(Intercept) x y x:y
1 1 1 2 2
2 1 2 3 6
3 1 3 4 12
4 1 4 7 28
attr(,"assign")
[1] 0 1 2 3
If you want to save the result as a sparse matrix, you can use the Matrix package:
> mat <- model.matrix(f, df)
> library(Matrix)
> Matrix(mat, sparse = TRUE)
4 x 4 sparse Matrix of class "dgCMatrix"
(Intercept) x y x:y
1 1 1 2 2
2 1 2 3 6
3 1 3 4 12
4 1 4 7 28
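Since the question mentions that the crossed columns will be highly sparse, it may be worth building the sparse matrix directly instead of converting a dense one. A minimal sketch, assuming the Matrix package's sparse.model.matrix function fits the use case; the "- 1" in the formula is an added assumption that drops the intercept column so the result is closer to the layout asked for.

```r
library(Matrix)

df <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 4, 7), z = c(5, 6, 9, 1))

# Build the design matrix in sparse form without a dense intermediate.
# "- 1" removes the (Intercept) column; omit it to keep the intercept.
sm <- sparse.model.matrix(z ~ x * y - 1, data = df)

# The response z is not part of a model matrix; keep it separately if needed
z <- df$z
```

Note that, as with model.matrix, the response z does not appear in the result, so it has to be carried alongside the matrix.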
