rollapply mean over 5 previous years - r

I have a zoo object called z that looks like this:
> z["2013-12",1]
Allerona
2013-12-01 0.0
2013-12-02 0.0
2013-12-03 0.0
2013-12-04 0.0
2013-12-05 0.2
2013-12-06 0.0
2013-12-07 0.0
2013-12-08 0.2
2013-12-09 0.0
....
It stores daily rainfall values.
I'm able to compute a rolling accumulation (a 3-day sum in this example) using rollapply:
m=rollapply(z, width=3, FUN=sum, by=1, by.column=TRUE, fill=NA, align="right")
It looks OK:
> m["2013-12",1]
Allerona
2013-12-01 0.0
2013-12-02 0.0
2013-12-03 0.0
2013-12-04 0.0
2013-12-05 0.2
2013-12-06 0.2
2013-12-07 0.2
2013-12-08 0.2
2013-12-09 0.2
...
How can I calculate, for each day, the mean over the previous 5 years?
Thanks

Doesn't SMA(x, n = 5*365) do the trick?

I solved my problem.
The solution was to pass a list to the width parameter of rollapply.
Here is the code:
mean5year <- rollapply(as.zoo(m), width = list(-365*5:1),
                       FUN = function(x) mean(x, na.rm = TRUE), fill = NA)
where
list(-365*5:1)
selects the same calendar day in each of the previous 5 years. I also use mean with na.rm = TRUE so that the mean is still computed when NAs appear in the sequence.
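For reference, the offset vector inside that list expands to the same position in each of the previous five years (in R the : operator binds tighter than *, so -365*5:1 means -365 * (5:1)):
-365 * 5:1
# [1] -1825 -1460 -1095  -730  -365
Note that this treats a year as exactly 365 days, so the match drifts by one day across leap years.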

Related

R only ever runs on a certain CPU in Linux

I have an 8-core RHEL Linux machine running R 4.0.2.
If I ask R for the number of cores, I can confirm that 8 are available.
> print(future::availableWorkers())
[1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[7] "localhost" "localhost"
> print(parallel::detectCores())
[1] 8
However, if I run this simple example
f <- function(out=0) {
for (i in 1:1e10) out <- out + 1
}
output <- parallel::mclapply(1:8, f, mc.cores = 8)
top indicates that only 1 core is being used (so each worker is getting 1/8 of that core, or 1/64 of the entire machine).
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 2.0 us, 0.0 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32684632 total, 28211076 free, 2409992 used, 2063564 buff/cache
KiB Swap: 16449532 total, 11475052 free, 4974480 used. 29213180 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3483 user 20 0 493716 57980 948 R 1.8 0.2 0:18.09 R
3479 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
3480 user 20 0 493716 57980 948 R 1.5 0.2 0:18.08 R
3481 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
3482 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
3484 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
3485 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
3486 user 20 0 493716 57980 948 R 1.5 0.2 0:18.09 R
Does anyone know what might be going on here? Another StackOverflow question that documents similar behavior is here. It's clear that I messed up the install somehow. I followed these install instructions for RHEL 7. I'm guessing there is a dependency missing, but I have no idea where to look. If anyone has any ideas of diagnostics to run, etc., they would be most appreciated.
For further context, I have R 3.4.1 also installed on my machine, and when I run this code, everything works fine. (I installed that version through yum.)
I also installed R 4.0.3 yesterday using the same instructions linked above, and it suffers from the same problem.
First run
system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()))
then your simple example
f <- function(out=0) { for (i in 1:1e10) out <- out + 1 }
output <- parallel::mclapply(1:8, f, mc.cores = 8)
works on all 8 cores.
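If you want the fix applied automatically, one option (a sketch, not part of the original answer) is to reset the affinity mask at session startup from your ~/.Rprofile:
# ~/.Rprofile: reset this R session's CPU affinity mask so that workers
# forked by parallel::mclapply() can be scheduled on every core.
if (.Platform$OS.type == "unix") {
  system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()), ignore.stdout = TRUE)
}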

Create xts object from CSV

I'm trying to generate an xts object from a CSV file. The data reads in fine as a data frame, i.e. the Date and Value columns are character and numeric, respectively.
However, when I try to make it into an xts, the output seems dubious.
What is the column on the far left of the xts output?
> test <- read.csv("Test.csv", header = TRUE, as.is = TRUE)
> test
Date Value
1 1/12/2014 1.5
2 2/12/2014 0.9
3 1/12/2015 -0.1
4 2/12/2015 -0.3
5 1/12/2016 -0.7
6 2/12/2016 0.2
7 7/12/2016 -1.0
8 8/12/2016 -0.2
9 9/12/2016 -1.1
> xts(test, order.by = as.POSIXct(test$Date), format = "%d/%m/%Y")
Date Value
0001-12-20 "1/12/2014" " 1.5"
0001-12-20 "1/12/2015" "-0.1"
0001-12-20 "1/12/2016" "-0.7"
0002-12-20 "2/12/2014" " 0.9"
0002-12-20 "2/12/2015" "-0.3"
0002-12-20 "2/12/2016" " 0.2"
0007-12-20 "7/12/2016" "-1.0"
0008-12-20 "8/12/2016" "-0.2"
0009-12-20 "9/12/2016" "-1.1"
I'd simply like an xts indexed by Date, rather than by the mystery column on the left. I've tried as.Date for the xts as well, but I get the same result.
I recommend you use read.zoo to read the data from CSV, then convert the result to xts using as.xts.
Text <- "Date,Value
1/12/2014,1.5
2/12/2014,0.9
1/12/2015,-0.1
2/12/2015,-0.3
1/12/2016,-0.7
2/12/2016,0.2
7/12/2016,-1.0
8/12/2016,-0.2
9/12/2016,-1.1"
z <- read.zoo(text=Text, sep=",", header=TRUE, format="%m/%d/%Y", drop=FALSE)
x <- as.xts(z)
# Value
# 2014-01-12 1.5
# 2014-02-12 0.9
# 2015-01-12 -0.1
# 2015-02-12 -0.3
# 2016-01-12 -0.7
# 2016-02-12 0.2
# 2016-07-12 -1.0
# 2016-08-12 -0.2
# 2016-09-12 -1.1
Note that you will need to omit text = Text from your actual call, and replace it with file = "your_file_name.csv".
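With an actual file, the call would look something like this (the file name is the placeholder from the note above):
z <- read.zoo("your_file_name.csv", sep = ",", header = TRUE,
              format = "%m/%d/%Y", drop = FALSE)
x <- as.xts(z)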
The issue appears to be twofold. One, there is a misplaced parenthesis in your call, so format ends up being passed to xts() rather than to as.POSIXct(). Two, the leftmost column is the index, which makes the Date column superfluous.
df <- read.table(text="
Date Value
1/12/2014 1.5
2/12/2014 0.9
1/12/2015 -0.1
2/12/2015 -0.3
1/12/2016 -0.7
2/12/2016 0.2
7/12/2016 -1.0
8/12/2016 -0.2
9/12/2016 -1.1",
header=TRUE)
df$Date <- as.Date(df$Date, format="%d/%m/%Y")
library(xts)
xts(df[-1], order.by=df[,1])
# Value
# 2014-12-01 1.5
# 2014-12-02 0.9
# 2015-12-01 -0.1
# 2015-12-02 -0.3
# 2016-12-01 -0.7
# 2016-12-02 0.2
# 2016-12-07 -1.0
# 2016-12-08 -0.2
# 2016-12-09 -1.1
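For completeness, here is a sketch of the original call with the misplaced parenthesis corrected, so that format is passed to the date conversion and the character Date column is dropped before building the xts:
library(xts)
# format now belongs to as.POSIXct(); keeping only the Value column avoids
# coercion of the whole object to character.
x <- xts(test["Value"], order.by = as.POSIXct(test$Date, format = "%d/%m/%Y"))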

extract irregular numeric data from strings

I have data like below. I wish to extract the first and last year from each string, here called my.string. Some strings contain only one year, some strings contain no years, and no string contains more than two years. I have provided the desired result in the object named desired.result below the example data set. I am using R.
When a string contains two years, those years are contained in a portion of the string that looks like this: ga49.51 or ea22.24
When a string contains only one year, that year is contained in a portion of the string that looks like this: time11
I know a bit about regex, but this problem seems too irregular and complex for me to figure out. I am not even sure where to begin. Thank you for any advice.
EDIT
Perhaps: delete the numbers before the first colon (:), and the remaining numbers are what I want.
my.data <- read.table(text = '
my.string cov1 cov2
42:Alpha:ga6.8 -0.1 2.2
43:Alpha:ga9.11 -2.5 0.6
44:Alpha:ga30.32 -1.3 0.5
45:Alpha:ga49.51 -2.5 0.6
50:Alpha:time1:ga.time -1.7 0.9
51:Alpha:time2:ga.time -1.5 0.8
52:Alpha:time3:ga.time -1.0 1.0
2:Beta:ea2.9 -1.7 0.6
3:Beta:ea17.19 -5.0 0.8
4:Beta:ea22.24 -6.4 1.0
8:Beta:as 0.2 0.6
9:Beta:sd 1.7 0.4
12:Beta:time1:ea.tim -2.6 1.8
13:Beta:time10:ea.ti -3.6 1.1
14:Beta:time11:ea.ti -3.1 0.7
', header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
desired.result <- read.table(text = '
my.string cov1 cov2 time1 time2
42:Alpha:ga6.8 -0.1 2.2 6 8
43:Alpha:ga9.11 -2.5 0.6 9 11
44:Alpha:ga30.32 -1.3 0.5 30 32
45:Alpha:ga49.51 -2.5 0.6 49 51
50:Alpha:time1:ga.time -1.7 0.9 1 NA
51:Alpha:time2:ga.time -1.5 0.8 2 NA
52:Alpha:time3:ga.time -1.0 1.0 3 NA
2:Beta:ea2.9 -1.7 0.6 2 9
3:Beta:ea17.19 -5.0 0.8 17 19
4:Beta:ea22.24 -6.4 1.0 22 24
8:Beta:as 0.2 0.6 NA NA
9:Beta:sd 1.7 0.4 NA NA
12:Beta:time1:ea.tim -2.6 1.8 1 NA
13:Beta:time10:ea.ti -3.6 1.1 10 NA
14:Beta:time11:ea.ti -3.1 0.7 11 NA
', header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")
I suggest using the stringr library to extract the data you need, since it handles NA values better and also allows a constrained-width lookbehind:
> library(stringr)
> my.data$time1 <- str_extract(my.data$my.string, "(?<=time)\\d+|(?<=\\b[ge]a)\\d+")
> my.data$time2 <- str_extract(my.data$my.string, "(?<=\\b[ge]a\\d{1,100}\\.)\\d+")
> my.data
my.string cov1 cov2 time1 time2
1 42:Alpha:ga6.8 -0.1 2.2 6 8
2 43:Alpha:ga9.11 -2.5 0.6 9 11
3 44:Alpha:ga30.32 -1.3 0.5 30 32
4 45:Alpha:ga49.51 -2.5 0.6 49 51
5 50:Alpha:time1:ga.time -1.7 0.9 1 <NA>
6 51:Alpha:time2:ga.time -1.5 0.8 2 <NA>
7 52:Alpha:time3:ga.time -1.0 1.0 3 <NA>
8 2:Beta:ea2.9 -1.7 0.6 2 9
9 3:Beta:ea17.19 -5.0 0.8 17 19
10 4:Beta:ea22.24 -6.4 1.0 22 24
11 8:Beta:as 0.2 0.6 <NA> <NA>
12 9:Beta:sd 1.7 0.4 <NA> <NA>
13 12:Beta:time1:ea.tim -2.6 1.8 1 <NA>
14 13:Beta:time10:ea.ti -3.6 1.1 10 <NA>
15 14:Beta:time11:ea.ti -3.1 0.7 11 <NA>
The first regex matches:
(?<=time)\\d+ - 1+ digits that have time before them
| - or
(?<=\\b[ge]a)\\d+ - 1+ digits that have ga or ea as a whole word in front
The second regex matches:
(?<=\\b[ge]a\\d{1,100}\\.) - check that the current position is preceded by ga or ea as a whole word, followed by 1 to 100 digits (I believe that should be enough for your scenario, 100-digit chunks are hardly expected here, so you may even decrease the value), and then a .
\\d+ - 1+ digits
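One optional extra step, not part of the answer itself: str_extract() returns character vectors, so if you want the numeric time1 and time2 columns shown in desired.result, convert them afterwards:
my.data$time1 <- as.numeric(my.data$time1)
my.data$time2 <- as.numeric(my.data$time2)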
Here's a regex that will extract either of the two types, and output them to different columns at the end of the lines:
Search: .*(?:time(\d+)|(?:[ge]a)(\d+)\.(\d+)).*
Replace: $0\t$1\t$2\t$3
Breakdown:
.*(?: ... ).* ensures that the whole line is matched, and uses a non-capturing group for the main alternation
time(\d+): this is the first half of the alternation, capturing any digits after a "time"
(?:[ge]a)(\d+)\.(\d+): the second half of the alternation matches "ga" or "ea" followed by two sets of digits, each in its own capture group
Replacement: $0 puts the whole line back. Each of the other capture groups is then appended, with tabs in between.
See regex101 example
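If you prefer to run the same search-and-replace inside R rather than in a regex tool, a rough base-R translation (a sketch, not part of the answer) is to wrap the whole pattern in one extra capture group, since sub() has no $0 backreference:
# Strings without any year information are returned unchanged by sub().
pat <- "(.*(?:time(\\d+)|(?:[ge]a)(\\d+)\\.(\\d+)).*)"
sub(pat, "\\1\t\\2\t\\3\t\\4", my.data$my.string, perl = TRUE)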

What are the closeness and shortest.paths functions in the igraph package actually calculating?

I found a weird result in some data I am working on and decided to test the closeness and shortest.paths functions with the following matrix.
test<-c(0,0.3,0.7,0.9,0.3,0,0,0,0.7,0,0,0.5,0.9,0,0.5,0)
test<-matrix(test,nrow=4)
colnames(test) <- c("A","B","C","D")
rownames(test) <- c("A","B","C","D")
test
A B C D
A 0.0 0.3 0.7 0.9
B 0.3 0.0 0.0 0.0
C 0.7 0.0 0.0 0.5
D 0.9 0.0 0.5 0.0
grafo <- graph.adjacency(abs(test), mode = "undirected", weighted = TRUE, diag = FALSE)
When I measure closeness() I get this:
> closeness(grafo)
A B C D
0.5263158 0.4000000 0.4545455 0.3846154
This is merely the reciprocal of the sum of the weights, NOT of the distances (1 - weights):
> 1/(0.7+(0.7+0.3)+0.5)
[1] 0.4545455
When I define distance as 1-weight, I get this
> 1/((1-0.7)+((1-0.7)+(1-0.3))+(1-0.5))
[1] 0.5555556
In the igraph manual, the formula says it is the sum of distances. My question is: does the function actually treat the weights themselves as the distances (and is that therefore a bug), or should we convert our graphs' edge weights to distances before running this function?
The same issue occurs with the shortest.paths function, by the way: it gives me a sum of the weights, not of distances.
> shortest.paths(grafo)
A B C D
A 0.0 0.3 0.7 0.9
B 0.3 0.0 1.0 1.2
C 0.7 1.0 0.0 0.5
D 0.9 1.2 0.5 0.0
Thanks.
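As a sketch of the workaround the question itself suggests (not an authoritative answer): since igraph treats edge weights as distances, you can rescale the existing edge weights to 1 - weight before calling these functions.
# Rescale similarity weights to distances; closeness() and shortest.paths()
# then sum these 1 - weight values along the shortest routes (note that an
# indirect route can now be shorter than a direct edge, e.g. C-A-D vs. C-D).
E(grafo)$weight <- 1 - E(grafo)$weight
closeness(grafo)
shortest.paths(grafo)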

R round to nearest .5 or .1

I have a data set of stock prices that have already been rounded to 2 decimal places (e.g. 1234.56). I am now trying to round each price to a specific increment, which is different for each stock. Here are some examples:
Current Stock Price   Minimum Tick Increment   Desired Output
123.45                0.50                     123.50
155.03                0.10                     155.00
138.24                0.50                     138.00
129.94                0.10                     129.90
...                   ...                      ...
I'm not really sure how to do this but am open to suggestions.
Probably,
round(a/b)*b
will do the trick.
> a <- seq(.1,1,.13)
> b <- c(.1,.1,.1,.2,.3,.3,.7)
> data.frame(a, b, out = round(a/b)*b)
a b out
1 0.10 0.1 0.1
2 0.23 0.1 0.2
3 0.36 0.1 0.4
4 0.49 0.2 0.4
5 0.62 0.3 0.6
6 0.75 0.3 0.6
7 0.88 0.7 0.7
I'm not familiar with R as a language, but my method should work in any language that has a ceiling function. I assume the value should be rounded UP to the nearest 0.5:
a = ceiling(a*2) / 2
if a = 0.4, a = ceiling(0.4*2)/2 = ceiling(0.8)/2 = 1/2 = 0.5
if a = 0.9, a = ceiling(0.9*2)/2 = ceiling(1.8)/2 = 2/2 = 1
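The same ceiling idea in R, generalised to an arbitrary tick size (a sketch; a and b are just placeholder names):
a <- c(123.45, 155.03, 138.24, 129.94)  # prices from the question
b <- c(0.50, 0.10, 0.50, 0.10)          # minimum tick increments
ceiling(a / b) * b                      # round UP to the nearest tick
# [1] 123.5 155.1 138.5 130.0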
As JoshO'Brien said in the comments, round_any in the plyr package works very well!
> library(plyr)
> stocks <- c(123.45, 155.03, 138.24, 129.94)
> round_any(stocks,0.1)
[1] 123.4 155.0 138.2 129.9
>
> round_any(stocks,0.5)
[1] 123.5 155.0 138.0 130.0
>
> round_any(stocks,0.1,f = ceiling)
[1] 123.5 155.1 138.3 130.0
>
> round_any(stocks,0.5,f = floor)
[1] 123.0 155.0 138.0 129.5
Read more here:
https://www.rdocumentation.org/packages/plyr/versions/1.8.4/topics/round_any
The taRifx package has just such a function:
> library(taRifx)
> roundnear( seq(.1,1,.13), c(.1,.1,.1,.2,.3,.3,.7) )
[1] 0.1 0.2 0.3 0.4 0.6 0.6 0.7
In your case, just feed it the stock price and the minimum tick increment as its first and second arguments, and it should work its magic.
N.B. This has now been deprecated. See comment.
