Operation with a column vector without using a loop in R - r

I want to perform an operation over a vector without using a loop. The operation is the following:
This is how I am coding in R
meanx <- mean(rankx)
Numerador <- (rankx[] - meanx)*(rankx[+1] - meanx)
This is the input:
> dput(rankx)
c(15, 11, 12, 30, 58, 14, 41, 10, 57, 32, 28, 52, 61, 18, 54,
37, 19, 7, 29, 66, 5, 47, 25, 6, 50, 65, 62, 23, 40, 63, 42,
64, 38, 56, 45, 17, 8, 59, 55, 67, 24, 60, 2, 35, 44, 20, 3,
39, 4, 31, 26, 51, 21, 22, 53, 33, 46, 9, 16, 36, 13, 27, 34,
48, 1, 49, 43)
For example for the first case it will be: (15 - mean(rankx))(11 - mean(rankx))
For the next: (11 - mean(rankx))(12 - mean(rankx))
I am not sure how to refer to the second element and my error is in rankx[+1]
Any idea in how to solve this operation without using a loop?

You can use dplyr::lead
rankx[+1] is equivalent to rankx[1], which is 15.
If you want a copy of rankx that's displaced by one unit, use dplyr::lead(rankx) - like this:
rankx <- c(15, 11, 12, 30, 58, 14, 41)
dplyr::lead(rankx)
#> [1] 11 12 30 58 14 41 NA
meanx <- mean(rankx)
Numerador <- (rankx - meanx)*(dplyr::lead(rankx) - meanx)
Numerador
#> [1] 161.30612 205.87755 -57.40816 133.16327 -381.12245 -179.55102 NA
Created on 2021-04-20 by the reprex package (v1.0.0)

Related

Add vertical lines to time-series plot

I have the code below which plots two time-series. I'd like to add a vertical line every say 10 units on the x-axis to basically divide the plot up into like 5 squares. Any tips are very much appreciated.
Code:
## Plot Forecast & Actual
ts.plot(ts(CompareDf$stuff1),ts(CompareDf$stuff2),col=1:2,xlab="Hour",ylab="Minu tes",main='testVar')
legend("topleft", legend = c("Actual","Forecast"), col = 1:2, lty = 1)
Data:
dput(CompareDf)
structure(list(stuff1 = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55), stuff2 = c(8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57)), .Names = c("stuff1",
"stuff2"), row.names = c(NA, -50L), class = "data.frame")
After plotting timeseries data, use abline to draw vertical lines.
abline(v = seq(10, 50, 10))

Discretize the columns first - Apriori in R

I extracted this data from a file:
forests<-read.table("~/Desktop/f.txt", header = FALSE, sep = " ", fill = TRUE)
library(arules) //after installing the package
I get an error when I type
forests<- apriori(forests, parameter = list(support=0.3))
Error in asMethod(object) : column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85 not logical or a factor. Discretize the
columns first.
I tried discritize(forests). It still doesn't work.
itemFrequencyPlot(forests) also gives an error saying unable to find inherited method.
Have I imported the data incorrectly?

Decomposition of time series data

Can anybody help be decipher the output of ucm. My main objective is to check if the ts data is seasonal or not. But i cannot plot and look and every time. I need to automate the entire process and provide an indicator for the seasonality.
I want to understand the following output
ucmxmodel$s.season
# Time Series:
# Start = c(1, 1)
# End = c(4, 20)
# Frequency = 52
# [1] -2.391635076 -2.127871717 -0.864021134 0.149851212 -0.586660213 -0.697838635 -0.933982269 0.954491859 -1.531715424 -1.267769820 -0.504165631
# [12] -1.990792301 1.273673437 1.786860414 0.050859315 -0.685677002 -0.921831488 -1.283081922 -1.144376739 -0.964042949 -1.510837956 1.391991657
# [23] -0.261175626 5.419494363 0.543898305 0.002548125 1.126895943 1.474427901 2.154721023 2.501352782 0.515453691 -0.470886132 1.209419689
ucmxmodel$vs.season
# [1] 1.375832 1.373459 1.371358 1.369520 1.367945 1.366632 1.365582 1.364795 1.364270 1.364007 1.364007 1.364270 1.364795 1.365582 1.366632 1.367945
# [17] 1.369520 1.371358 1.373459 1.375816 1.784574 1.784910 1.785223 1.785514 1.785784 1.786032 1.786258 1.786461 1.786643 1.786802 1.786938 1.787052
# [33] 1.787143 1.787212 1.787257 1.787280 1.787280 1.787257 1.787212 1.787143 1.787052 1.786938 1.786802 1.786643 1.786461 1.786258 1.786032 1.785784
# [49] 1.785514 1.785223 1.784910 1.784578 1.375641 1.373276 1.371175 1.369337 1.367762 1.366449 1.365399 1.364612 1.364087 1.363824 1.363824 1.364087
# [65] 1.364612 1.365399 1.366449 1.367762 1.369337 1.371175 1.373276 1.375636 1.784453 1.784788 1.785101 1.785392 1.785662 1.785910 1.786136 1.786339
ucmxmodel$est.var.season
# Season_Variance
# 0.0001831373
How can i use the above info without looking at the plots to determine the seasonality and at what level ( weekly, monthly, quarterly or yearly)?
In addition, i am getting NULL in est
ucmxmodel$est
# NULL
Data
The data for a test is:
structure(c(44, 81, 99, 25, 69, 42, 6, 25, 75, 90, 73, 65, 55,
9, 53, 43, 19, 28, 48, 71, 36, 1, 66, 46, 55, 56, 100, 89, 29,
93, 55, 56, 35, 87, 77, 88, 18, 32, 6, 2, 15, 36, 48, 80, 48,
2, 22, 2, 97, 14, 31, 54, 98, 43, 62, 94, 53, 17, 45, 92, 98,
7, 19, 84, 74, 28, 11, 65, 26, 97, 67, 4, 25, 62, 9, 5, 76, 96,
2, 55, 46, 84, 11, 62, 54, 99, 84, 7, 13, 26, 18, 42, 72, 1,
83, 10, 6, 32, 3, 21, 100, 100, 98, 91, 89, 18, 88, 90, 54, 49,
5, 95, 22), .Tsp = c(1, 3.15384615384615, 52), class = "ts")
and
structure(c(40, 68, 50, 64, 26, 44, 108, 90, 62, 60, 90, 64, 120, 82, 68, 60,
26, 32, 60, 74, 34, 16, 22, 44, 50, 16, 34, 26, 42, 14, 36, 24, 14, 16, 6, 6,
12, 20, 10, 34, 12, 24, 46, 30, 30, 46, 54, 42, 44, 42, 12, 52, 42, 66, 40,
60, 42, 44, 64, 96, 70, 52, 66, 44, 64, 62, 42, 86, 40, 56, 50, 50, 62, 22,
24, 14, 14, 18, 18, 10, 20, 10, 4, 18, 10, 10, 14, 20, 10, 32, 12, 22, 20, 20,
26, 30, 36, 28, 56, 34, 14, 54, 40, 30, 42, 36, 52, 30, 32, 52, 42, 62, 46,
64, 70, 48, 40, 64, 40, 120, 58, 36, 40, 34, 36, 26, 18, 28, 16, 32, 18, 12,
20), .Tsp = c(1, 4.36, 52), class = "ts")
I think the most straightforward approach would be to follow Rob Hyndman's approach (he is the author of many time series packages in R). For your data it would work as follows,
require(fma)
# Create a model with multiplicative errors (see https://www.otexts.org/fpp/7/7).
fit1 <- stlf(test2)
# Create a model with additive errors.
fit2 <- stlf(data, etsmodel = "ANN")
deviance <- 2 * c(logLik(fit1$model) - logLik(fit2$model))
df <- attributes(logLik(fit1$model))$df - attributes(logLik(fit2$model))$df
# P-value
1 - pchisq(deviance, df)
# [1] 1
Based on this analysis we find the p-value of 1 which would lead us to conclude there is no seasonality.
I quite like the stl() function provided in R. Try this minimal example:
# some random data
x <- rnorm(200)
# as a time series object
xt <- ts(x, frequency = 10)
# do the decomposition
xts <- stl(xt, s.window = "periodic")
# plot the results
plot(xts)
Now you can get an estimate of the 'seasonality' by comparing the variances.
vars <- apply(xts$time.series, 2, var)
vars['seasonal'] / sum(vars)
You now have the seasonal variance as a proportion of sum of variances after decomposition.
I highly recommend reading the original paper so that you understand whats happening under the hood here. Its very accessible and I like this method as it is quite intuitive.

Shuffle deck of cards without built-in random functon

My friend suggested me to try to solve this problem before interview, but I have no idea on how to approach it.
I need to write a code to shuffle a deck of 52 cards without using a built-in standard random function.
Update
Thanks to Yifei Wu, his answer was very helpful.
Here is a link for my github project where I executed the given algorithm
https://github.com/Dantsj16/Shuffle-Without-Random.git
Your question does not say it must be a random shuffle of 52 cards. There is such a thing as a perfect shuffle, where a riffle shuffle is done with the top card remaining on the top and every other card comes from the other half of the deck. Many magicians and card sharks can do this shuffle as desired. It is well known that eight perfect shuffles in a row of a standard 52-card deck returns the cards to their original order, if the top card remains on top for each shuffle.
Here are 8 perfect shuffles in python Note that this shuffle is done differently than an actual manual shuffle would be done, to simplify the code.
In [1]: d0=[x for x in range(1,53)] # the card deck
In [2]: print(d0)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [3]: d1=d0[::2]+d0[1::2] # a perfect shuffle
In [4]: print(d1)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52]
In [5]: d2=d1[::2]+d1[1::2]
In [6]: d3=d2[::2]+d2[1::2]
In [7]: d4=d3[::2]+d3[1::2]
In [8]: d5=d4[::2]+d4[1::2]
In [9]: d6=d5[::2]+d5[1::2]
In [10]: d7=d6[::2]+d6[1::2]
In [11]: d8=d7[::2]+d7[1::2]
In [12]: print(d8)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [13]: print(d0 == d8)
True
If you want the perfect shuffle as done by hand, use
d1=[None]*52
d1[::2]=d0[:26]
d1[1::2]=d0[26:]
This gives, for d1,
[1, 27, 2, 28, 3, 29, 4, 30, 5, 31, 6, 32, 7, 33, 8, 34, 9, 35, 10, 36, 11, 37, 12, 38, 13, 39, 14, 40, 15, 41, 16, 42, 17, 43, 18, 44, 19, 45, 20, 46, 21, 47, 22, 48, 23, 49, 24, 50, 25, 51, 26, 52]
Let me know if you really need a random shuffle. I can adapt my Borland Delphi code into python if you need it.

How do you include data frame output inside warnings and errors?

How can I include array or data frame output in a message, warning or error?
By default, the output is collapsed by deparseing each column, which isn't useful. Here's an example, using the cars dataset.
message(cars)
## c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25)c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)
Print the output, recapture it using capture.output(), and collapse into a single string separated by newlines.
print_and_capture <- function(x)
{
paste(capture.output(print(x)), collapse = "\n")
}
message(print_and_capture(cars))
## speed dist
## 1 4 2
## 2 4 10
## # etc.
stop("An error was found in the cars dataset:\n", print_and_capture(cars))
## Error: An error was found in the cars dataset:
## speed dist
## 1 4 2
## 2 4 10
## # etc.
print_and_capture() is now available in assertive.base.

Resources