How do I automatically populate a matrix with intervals given the size and number of the intervals? - r

bucket_size <- 30
bucket_amount <- 24
matrix(???, bucket_amount, 2)
I'm trying to populate a (bucket_amount x 2) matrix using the interval size given by bucket_size. Here is what it would look like with the current given values of bucket_size and bucket_amount.
[1 30]
[31 60]
[61 90]
[91 120]
.
.
.
[691 720]
I can obviously hard code this specific example out, but I'm wondering how I can do this for different values of bucket_size and bucket_amount and have the matrix populate automatically.

We can use seq(), passing 'bucket_size' as both from and by and 'bucket_amount' as length.out, to create the vector of upper bounds ('v1'). The lower bounds ('v2') are 1 followed by every element of 'v1' except the last, each plus 1. cbind() then joins the two vectors into a matrix.
v1 <- seq(bucket_size, length.out = bucket_amount , by = bucket_size)
v2 <- c(1, v1[-length(v1)] + 1)
m1 <- cbind(v2, v1)
-output
> head(m1)
v2 v1
[1,] 1 30
[2,] 31 60
[3,] 61 90
[4,] 91 120
[5,] 121 150
[6,] 151 180
> tail(m1)
v2 v1
[19,] 541 570
[20,] 571 600
[21,] 601 630
[22,] 631 660
[23,] 661 690
[24,] 691 720
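The same idea can be wrapped in a small function so that other sizes and counts work without editing the code. This is only a sketch: the name make_buckets and the column names lower/upper are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: build a (bucket_amount x 2) matrix of 1-based intervals.
make_buckets <- function(bucket_size, bucket_amount) {
  upper <- seq_len(bucket_amount) * bucket_size  # 30, 60, 90, ...
  lower <- upper - bucket_size + 1               # 1, 31, 61, ...
  cbind(lower = lower, upper = upper)
}

m <- make_buckets(bucket_size = 30, bucket_amount = 24)
head(m, 3)
tail(m, 1)
```

Computing the lower bound directly from the upper bound avoids the shift-by-one step, at the cost of assuming that every bucket has exactly the same width.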

Related

How to automatically multiply and add some coefficient to a data frame in R?

I have this data set
obs <- data.frame(replicate(8,rnorm(10, 0, 1)))
and this coefficients
coeff <- data.frame(replicate(8,rnorm(2, 0, 1)))
For each column of obs, I need to multiply by the first element of the corresponding column of coeff and then add the second element of that column. I need to do the same for all 8 columns. I read somewhere that if you copy and paste code more than once you are doing something wrong... and that's exactly what I did.
obs.transformed.X1 <-(obs[1]*coeff[1,1])+coeff[2,1]
obs.transformed.X2 <-(obs[2]*coeff[1,2])+coeff[2,2]
.
.
.
.
.
obs.transformed.X8 <-(obs[8]*coeff[1,8])+coeff[2,8]
I know there is a smarter way to do this (loop?), but I just couldn't figure it out. Any help will be appreciated.
This is what I've tried but I am only getting the last column
for (i in 1:length(obs)) {
results=(obs[i]*coeff[1,i])+coeff[2,i]
}
If you coerce to matrix class, you can apply the sweep function twice in sequence, first multiplying the columns by the first row of coeff and then adding the second row, again column-wise:
obs <- data.frame(matrix(1:60, 10)) # I find checking with random numbers difficult
coeff <- data.frame(matrix(1:12,2))
sweep(
sweep(as.matrix(obs), 2, as.matrix(coeff)[1,], "*"), # first operation is "*"
2, as.matrix(coeff)[2,], "+" ) # arguments for the addition
#--------------------------------
X1 X2 X3 X4 X5 X6
[1,] 3 37 111 225 379 573
[2,] 4 40 116 232 388 584
[3,] 5 43 121 239 397 595
[4,] 6 46 126 246 406 606
[5,] 7 49 131 253 415 617
[6,] 8 52 136 260 424 628
[7,] 9 55 141 267 433 639
[8,] 10 58 146 274 442 650
[9,] 11 61 151 281 451 661
[10,] 12 64 156 288 460 672
I decreased the number of columns because your original code was too wide for my RStudio console, but this should be very general. I suspect there's an equivalent matrix operator method, but it didn't come to me.
I came up with this solution..
results = list()
for (i in 1:length(obs)) {
results[[i]]=(obs[i]*coeff[1,i])+coeff[2,i]
}
results <- as.data.frame(results)
Is there any efficient way to do this?
I used Map
results <- as.data.frame(Map(`+`, Map(`*`, obs, coeff[1,]), coeff[2,]))
This should also give what you are looking for.
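One more way to express the same per-column transformation, shown here only as a sketch, relies on R's recycling rules: transpose the matrix so that each column becomes a row, multiply and add the coefficient vectors, and transpose back. The names slope, intercept and res are made up for this example; the data is the small integer example from the sweep answer.

```r
obs   <- data.frame(matrix(1:60, 10))
coeff <- data.frame(matrix(1:12, 2))

slope     <- unlist(coeff[1, ])  # one multiplier per column of obs
intercept <- unlist(coeff[2, ])  # one offset per column of obs

# After t(), row i holds column i of obs, so a vector with one entry per
# row is recycled down each column, which applies entry i to row i.
res <- t(t(as.matrix(obs)) * slope + intercept)
res
```

This assumes that all columns are numeric and that coeff has exactly one column per column of obs.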

Loop over matrix using n consecutive rows in R

I have a matrix that consists of two columns and a number (n) of rows, while each row represents a point with the coordinates x and y (the two columns).
This is what it looks like:
V1 V2
146 17
151 19
153 24
156 30
158 36
163 39
168 42
173 44
...
Now I would like to take a subset of three consecutive points, starting from row 1, do some fitting, save the values from this fit in another list, and then go on to the next three points, and the next three, ... until the list is finished. Something like this:
Data_Fit_Kasa_1 <- CircleFitByKasa(Data[1:3,])
Data_Fit_Kasa_2 <- CircleFitByKasa(Data[3:6,])
....
Data_Fit_Kasa_n <- CircleFitByKasa(Data[i:i+2,])
I have tried to construct a loop, but I can't make it work. R either tells me that there's an "unexpected '}' in "}"" or that the "subscript is out of bounds". This is what I've tried:
minimal runnable code
install.packages("conicfit")
library(conicfit)
CFKasa <- NULL
Data.Fit <- NULL
for (i in 1:length(Data)) {
row <- Data[i:(i+2),]
CFKasa <- CircleFitByKasa(row)
Data.Fit[i] <- CFKasa[3]
}
RStudio Version 0.99.902 – © 2009-2016 RStudio, Inc.; Win10 Edu.
The third element of the fitted circle (CFKasa[3]) represents the radius, which is what I am really interested in. I am really stuck here, please help.
Many thanks in advance!
Best, David
Turn your data into a 3D array and use apply:
DF <- read.table(text = "V1 V2
146 17
151 19
153 24
156 30
158 36
163 39", header = TRUE)
a <- t(DF)
dim(a) <-c(nrow(a), 3, ncol(a) / 3)
a <- aperm(a, c(2, 1, 3))
# , , 1
#
# [,1] [,2]
# [1,] 146 17
# [2,] 151 19
# [3,] 153 24
#
# , , 2
#
# [,1] [,2]
# [1,] 156 30
# [2,] 158 36
# [3,] 163 39
center <- function(m) c(mean(m[,1]), mean(m[,2]))
t(apply(a, 3, center))
# [,1] [,2]
#[1,] 150 20
#[2,] 159 35
center(DF[1:3,])
#[1] 150 20
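If the number of rows is not a multiple of three, the array reshaping above will not work as written. A possible alternative, sketched here under that assumption, is to label each row with a chunk number and split on it. The names chunk_id and fit_chunk are hypothetical, and fit_chunk is only a stand-in that returns the mean point; a real fitting function such as the question's CircleFitByKasa could presumably be called in its place, which is untested here.

```r
DF <- data.frame(V1 = c(146, 151, 153, 156, 158, 163),
                 V2 = c(17, 19, 24, 30, 36, 39))

# Rows 1-3 get chunk 0, rows 4-6 get chunk 1, and so on.
chunk_id <- (seq_len(nrow(DF)) - 1) %/% 3
chunks   <- split(DF, chunk_id)

# Stand-in for a real fit: the mean x and y of the points in the chunk.
fit_chunk <- function(m) c(x = mean(m[, 1]), y = mean(m[, 2]))

res <- t(sapply(chunks, fit_chunk))
res
```

With split(), a trailing chunk of fewer than three rows is kept as a shorter piece instead of raising an error, so the fitting function has to cope with it or the chunk has to be dropped beforehand.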

R: reshape data by chunks - more elegant way

I stumbled upon the following problem. I read the reshape manual, but I'm still lost.
Is there an efficient and more elegant way to reshape a matrix made up of even chunks?
The code to generate the matrix and the reshaped matrix is below.
# current matrix
x <- matrix(sample(20*9), 20, 9)
colnames(x) <- c(paste("time",c(1:3),sep="_"),
paste("SGNL", 1, c(1:3), sep="_"),
paste("SGNL", 2, c(1:3), sep="_"))
# reshaped matrix
x.reshaped <- rbind( x[,c(1,4,7)], x[,c(2,5,8)], x[,c(3,6,9)] )
colnames(x.reshaped) <- sub("\\_1$", "", colnames(x.reshaped))
Thanks!
If you want to use an approach that is name-based and not position-based, then you should look at melt from "data.table":
library(data.table)
melt(as.data.table(x), measure.vars = patterns("time", "SGNL_1", "SGNL_2"))
Example output:
head(melt(as.data.table(x), measure.vars = patterns("time", "SGNL_1", "SGNL_2")))
# variable value1 value2 value3
# 1: 1 48 110 155
# 2: 1 67 35 140
# 3: 1 102 55 72
# 4: 1 161 39 66
# 5: 1 36 137 99
# 6: 1 158 169 85
Or, in base R:
patts <- c("time", "SGNL_1", "SGNL_2")
sapply(patts, function(y) c(x[, grep(y, colnames(x))]))
# time SGNL_1 SGNL_2
# [1,] 48 110 155
# [2,] 67 35 140
# [3,] 102 55 72
# [4,] 161 39 66
# [5,] 36 137 99
# .
# .
# .
# .
# [56,] 13 1 84
# [57,] 40 46 95
# [58,] 152 7 178
# [59,] 81 79 123
# [60,] 50 101 146
Data generated with set.seed(1).
We could create the submatrices (based on the column index generated by seq) in a list and then rbind them together.
do.call(rbind, lapply(1:3, function(i) x[,seq(i, length.out=3, by=3)]))
Or using a for loop
m2 <- c()
for(i in 1:3) { m2 <- rbind(m2, x[,seq(i, length.out=3, by=3)])}
x[,c(matrix(1:9, 3, byrow=TRUE))] # or shorter:
x[,matrix(1:9, 3, byrow=TRUE)]
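For this particular column layout, where the three "time" columns come first, then the three SGNL_1 columns, then the three SGNL_2 columns, a purely positional shortcut also appears to work: R stores matrices column by column, so re-dimensioning the data into three columns stacks each block of three. The sketch below checks that against the rbind version from the question; x.redim is a name made up for the example.

```r
set.seed(1)
x <- matrix(sample(20 * 9), 20, 9)

# Version from the question: stack the 1st, 2nd and 3rd column of each block.
x.reshaped <- rbind(x[, c(1, 4, 7)], x[, c(2, 5, 8)], x[, c(3, 6, 9)])

# Positional shortcut: elements 1-60 are columns 1-3, 61-120 are columns 4-6, ...
x.redim <- matrix(x, ncol = 3)

all(x.reshaped == x.redim)
```

This drops the column names and silently gives wrong results if the columns are ordered differently, so the name-based approaches above are safer for real data.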

Subset columns in R with specific values

Given a large data frame e.g. df, with 500 columns and 100 rows, how do I just subset columns exceeding a specific threshold e.g. 1 ?
Thanks
Assuming you mean the columns whose values are all greater than 1, you could do something like this. The same approach works for any other condition; just change the test inside the if:
Example Data
a <- data.frame(matrix(runif(90),ncol=3))
> a
X1 X2 X3
1 0.33341130 0.09307143 0.51932506
2 0.78014395 0.30378432 0.67309736
3 0.19967771 0.30829771 0.60144888
4 0.77736355 0.42504910 0.23880491
5 0.60631868 0.55198423 0.29565519
6 0.24246456 0.57945721 0.17882712
7 0.10499677 0.48768998 0.54931955
8 0.92288335 0.29290491 0.72885160
9 0.85246128 0.87564673 0.60069170
10 0.39931205 0.29895856 0.83249469
11 0.33674259 0.85618041 0.62940935
12 0.27816980 0.51508938 0.76079354
13 0.19121182 0.27586235 0.21273823
14 0.66337625 0.18631150 0.67762964
15 0.00923405 0.84753915 0.08386400
16 0.33209371 0.54919903 0.49128825
17 0.97685675 0.25564765 0.56439142
18 0.26710042 0.75852884 0.88706946
19 0.32422355 0.58971620 0.84070049
20 0.73000898 0.09068726 0.92541277
21 0.80547283 0.93723241 0.31050230
22 0.28897215 0.80679092 0.06080124
23 0.32190269 0.12254342 0.42506740
24 0.52569405 0.68506407 0.68302356
25 0.31098388 0.66225007 0.08565480
26 0.67546897 0.08123716 0.58419470
27 0.29501987 0.17836528 0.79322116
28 0.20736102 0.81145297 0.44078101
29 0.75165829 0.51865202 0.36653840
30 0.63375066 0.03804626 0.69949846
Solution
Just a single lapply is enough. I use 0.05 as threshold here because it is easier to demonstrate how to use it according to my random data set. Change that to whatever you want in your dataset.
b <- do.call(cbind, (lapply(a, function(x) if(all(x>0.05)) return(x) )))
Output
> b
X3
[1,] 0.51932506
[2,] 0.67309736
[3,] 0.60144888
[4,] 0.23880491
[5,] 0.29565519
[6,] 0.17882712
[7,] 0.54931955
[8,] 0.72885160
[9,] 0.60069170
[10,] 0.83249469
[11,] 0.62940935
[12,] 0.76079354
[13,] 0.21273823
[14,] 0.67762964
[15,] 0.08386400
[16,] 0.49128825
[17,] 0.56439142
[18,] 0.88706946
[19,] 0.84070049
[20,] 0.92541277
[21,] 0.31050230
[22,] 0.06080124
[23,] 0.42506740
[24,] 0.68302356
[25,] 0.08565480
[26,] 0.58419470
[27,] 0.79322116
[28,] 0.44078101
[29,] 0.36653840
[30,] 0.69949846
Only column 3 satisfied the condition on this occasion, so it was returned.
Or another option (based on @LyzandeR's data): get the colSums of the logical condition (a <= 0.05) and negate it (!). If no value in a particular column meets that condition, the column sum is 0 and negation turns it into TRUE. The result can be used to select the columns.
a[,!colSums(a<=0.05), drop=FALSE]
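A variant that keeps the result as a data frame and spells out the per-column test is sketched below. The small data frame and the names threshold, keep and b are invented for the example; vapply is used so that the test is guaranteed to return one logical value per column.

```r
a <- data.frame(X1 = c(0.20, 0.01, 0.90),
                X2 = c(0.30, 0.40, 0.50),
                X3 = c(0.06, 0.70, 0.80))
threshold <- 0.05

# TRUE for a column only if every value in it exceeds the threshold.
keep <- vapply(a, function(col) all(col > threshold), logical(1))

b <- a[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
names(b)
```

Swapping all() for any() would instead keep columns in which at least one value exceeds the threshold, which may be closer to what the question intends.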

Cumulative sums, moving averages, and SQL "group by" equivalents in R

What's the most efficient way to create a moving average or rolling sum in R? How do you do the rolling function along with a "group by"?
While zoo is great, sometimes there are simpler ways. If your data behaves nicely and is evenly spaced, the embed() function effectively lets you create multiple lagged versions of a time series. If you look inside the vars package for vector autoregression, you will see that the package author chose this route.
For example, to calculate the 3 period rolling average of x, where x = (1 -> 20)^2:
> x <- (1:20)^2
> embed (x, 3)
[,1] [,2] [,3]
[1,] 9 4 1
[2,] 16 9 4
[3,] 25 16 9
[4,] 36 25 16
[5,] 49 36 25
[6,] 64 49 36
[7,] 81 64 49
[8,] 100 81 64
[9,] 121 100 81
[10,] 144 121 100
[11,] 169 144 121
[12,] 196 169 144
[13,] 225 196 169
[14,] 256 225 196
[15,] 289 256 225
[16,] 324 289 256
[17,] 361 324 289
[18,] 400 361 324
> apply (embed (x, 3), 1, mean)
[1] 4.666667 9.666667 16.666667 25.666667 36.666667 49.666667
[7] 64.666667 81.666667 100.666667 121.666667 144.666667 169.666667
[13] 196.666667 225.666667 256.666667 289.666667 324.666667 361.666667
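For the "group by" part of the question, one base R possibility is ave(), which applies a function within each group and returns the results in the original row order. The sketch below combines it with the embed() idea; the helper roll_mean, the data frame df and its column names are all made up for illustration.

```r
# Hypothetical helper: trailing mean over k values, NA until k values exist.
roll_mean <- function(v, k) {
  if (length(v) < k) return(rep(NA_real_, length(v)))
  c(rep(NA_real_, k - 1), rowMeans(embed(v, k)))
}

df <- data.frame(group = rep(c("a", "b"), each = 5),
                 value = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50))

# Cumulative sum and 3-period rolling mean, each restarted per group.
df$cum_sum <- ave(df$value, df$group, FUN = cumsum)
df$roll3   <- ave(df$value, df$group, FUN = function(v) roll_mean(v, 3))
df
```

rowMeans(embed(v, k)) gives the same values as apply(embed(v, k), 1, mean) above. Like the embed() approach itself, this assumes the rows within each group are already sorted and evenly spaced.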
I dug up a good answer from Achim Zeileis over on the R-help list. Here's what he said:
library(zoo)
## create data
x <- rnorm(365)
## transform to regular zoo series with "Date" index
x <- zooreg(x, start = as.Date("2004-01-01"))
plot(x)
## add rolling/running/moving average with window size 7
lines(rollmean(x, 7), col = 2, lwd = 2)
## if you don't want the rolling mean but rather a weekly
## time series of means you can do
nextfri <- function(x) 7 * ceiling(as.numeric(x - 1)/7) + as.Date(1)
xw <- aggregate(x, nextfri, mean)
## nextfri is a function which computes for a certain "Date"
## the next friday. xw is then the weekly series.
lines(xw, col = 4)
Achim went on to say:
Note that the difference between the
rolling mean and the aggregated series
is due to different alignments. This
can be changed by changing the 'align'
argument in rollmean() or the
nextfri() function in the aggregate
call.
All this came from Achim, not from me:
http://tolstoy.newcastle.edu.au/R/help/05/06/6785.html
