I use quantmod to calculate moving averages over 2000 data frames in a loop, where price is an xts object:
price <- cbind(price, SMA(price, 5), SMA(price, 10),
SMA(price, 20), SMA(price, 60), SMA(price, 120),
SMA(price, 180), SMA(price, 240))
But some series have fewer rows than the window width, so the loop stops with an error partway through. In that case, I just want those columns filled with NA.
I need some help solving this problem.
If I need another package to solve it, please let me know.
Thanks
Moving average functions give an error when the chosen period is longer than the available data. As @RuiBarradas mentions in the comments, for an SMA zoo::rollmean could work. As you need to loop over quite a few data.frames, a function is easier. The function below can be used in lapply or in a plain loop.
I created a helper function inside the bigger function to check whether the chosen period is larger than the number of rows supplied. If so, it returns a vector of NAs; otherwise it returns the SMA. After that, the function loops over the periods and returns a data.frame with the supplied price column and all the SMA columns, each named so you can see which SMA is in which column.
Note that there is no error handling in case of incorrect inputs. Sample data below.
# periods for the SMA
periods <- c(5, 10, 20, 60, 120, 180, 240)
get_smas <- function(price, n) {
  my_sma <- function(x, n = 10) {
    if (n < 1 || n > NROW(x)) {
      out <- rep(NA_real_, NROW(x))
    } else {
      # change SMA for EMA if you want the EMAs
      out <- TTR::SMA(x, n = n)
    }
    out
  }
  # combine the price column with the MAs. price is passed as Reduce's
  # init value (last argument), so it ends up as the first column
  price_combined <- Reduce(cbind, lapply(n, function(x) my_sma(price, n = x)), price)
  # turn matrix into data.frame
  price_combined <- data.frame(price_combined)
  # rename columns, assuming price column has a column name.
  # change paste0 value from SMA to EMA if EMA is used.
  names(price_combined) <- c(names(price_combined)[1], paste0("SMA_", n))
  price_combined
}
# supply a price and a vector of periods
my_prices <- get_smas(price, periods)
head(my_prices, 2)
   Close SMA_5 SMA_10 SMA_20 SMA_60 SMA_120 SMA_180 SMA_240
1 182.01    NA     NA     NA     NA      NA      NA      NA
2 179.70    NA     NA     NA     NA      NA      NA      NA
tail(my_prices, 2)
     Close   SMA_5  SMA_10  SMA_20   SMA_60  SMA_120 SMA_180 SMA_240
142 156.79 154.156 152.053 147.475 145.4393 156.1770      NA      NA
143 157.35 154.556 152.941 148.381 145.4292 156.0474      NA      NA
data:
# close prices of aapl from 2022-01-03 to 2022-07-28
price <- structure(list(Close = c(182.009995, 179.699997, 174.919998,
172, 172.169998, 172.190002, 175.080002, 175.529999, 172.190002,
173.070007, 169.800003, 166.229996, 164.509995, 162.410004, 161.619995,
159.779999, 159.690002, 159.220001, 170.330002, 174.779999, 174.610001,
175.839996, 172.899994, 172.389999, 171.660004, 174.830002, 176.279999,
172.119995, 168.639999, 168.880005, 172.789993, 172.550003, 168.880005,
167.300003, 164.320007, 160.070007, 162.740005, 164.850006, 165.119995,
163.199997, 166.559998, 166.229996, 163.169998, 159.300003, 157.440002,
162.949997, 158.520004, 154.729996, 150.619995, 155.089996, 159.589996,
160.619995, 163.979996, 165.380005, 168.820007, 170.210007, 174.070007,
174.720001, 175.600006, 178.960007, 177.770004, 174.610001, 174.309998,
178.440002, 175.059998, 171.830002, 172.139999, 170.089996, 165.75,
167.660004, 170.399994, 165.289993, 165.070007, 167.399994, 167.229996,
166.419998, 161.789993, 162.880005, 156.800003, 156.570007, 163.639999,
157.649994, 157.960007, 159.479996, 166.020004, 156.770004, 157.279999,
152.059998, 154.509995, 146.5, 142.559998, 147.110001, 145.539993,
149.240005, 140.820007, 137.350006, 137.589996, 143.110001, 140.360001,
140.520004, 143.779999, 149.639999, 148.839996, 148.710007, 151.210007,
145.380005, 146.139999, 148.710007, 147.960007, 142.639999, 137.130005,
131.880005, 132.759995, 135.429993, 130.059998, 131.559998, 135.869995,
135.350006, 138.270004, 141.660004, 141.660004, 137.440002, 139.229996,
136.720001, 138.929993, 141.559998, 142.919998, 146.350006, 147.039993,
144.869995, 145.860001, 145.490005, 148.470001, 150.169998, 147.070007,
151, 153.039993, 155.350006, 154.089996, 152.949997, 151.600006,
156.789993, 157.350006)), class = "data.frame", row.names = c(NA,
-143L))
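For readers who want to avoid a package dependency, the length check described in this answer can also be sketched in base R. This is only an illustration, not the TTR implementation: safe_sma and series_list are hypothetical names, the toy prices are made up, and the helper assumes the price vector contains no missing values.

```r
# Hypothetical helper: simple moving average that returns all NA
# when the window n is longer than the series (or smaller than 1).
# Assumes x is a plain numeric vector without missing values.
safe_sma <- function(x, n) {
  len <- length(x)
  if (n < 1 || n > len) {
    return(rep(NA_real_, len))
  }
  cs <- cumsum(c(0, x))                 # running totals, with a leading 0
  out <- rep(NA_real_, len)             # first n - 1 entries stay NA
  out[n:len] <- (cs[(n + 1):(len + 1)] - cs[1:(len - n + 1)]) / n
  out
}

periods <- c(2, 3, 10)

# Made-up series of different lengths, standing in for the 2000 data frames
series_list <- list(short = c(1, 2, 3), long = c(2, 4, 6, 8, 10))

smas <- lapply(series_list, function(p) {
  cols <- lapply(periods, function(n) safe_sma(p, n))
  names(cols) <- paste0("SMA_", periods)
  data.frame(price = p, cols)
})

smas$short
```

Because the too-long windows simply yield NA columns, the lapply call runs to the end instead of stopping at the first short series.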
rollmeanr and rollapplyr can handle the situation with fewer data items than width.
library(zoo)
price <- 1:6
rollmeanr(price, 10, fill = NA)
## [1] NA NA NA NA NA NA
w <- c(5, 10, 20, 60, 120, 180, 240)
sapply(setNames(w, w), rollmeanr, x = price, fill = NA)
##       5 10 20 60 120 180 240
## [1,] NA NA NA NA  NA  NA  NA
## [2,] NA NA NA NA  NA  NA  NA
## [3,] NA NA NA NA  NA  NA  NA
## [4,] NA NA NA NA  NA  NA  NA
## [5,]  3 NA NA NA  NA  NA  NA
## [6,]  4 NA NA NA  NA  NA  NA
Given an uncertain number of columns containing source values for the same variable, I would like to create a column that holds the final value to be selected, depending on source importance and availability.
Reproducible data:
set.seed(123)
actuals = runif(10, 500, 1000)
get_rand_vector <- function(){return (runif(10, 0.95, 1.05))}
get_na_rand_ixs <- function(){return (round(runif(5,0,10),0))}
df = data.frame("source_1" = actuals*get_rand_vector(),
"source_2" = actuals*get_rand_vector(),
"source_n" = actuals*get_rand_vector())
df[["source_1"]][get_na_rand_ixs()] <- NA
df[["source_2"]][get_na_rand_ixs()] <- NA
df[["source_n"]][get_na_rand_ixs()] <- NA
My manual solution is as follows:
df$available <- ifelse(
  !is.na(df$source_1),
  df$source_1,
  ifelse(
    !is.na(df$source_2),
    df$source_2,
    df$source_n
  )
)
The desired result is:
source_1 source_2 source_n available
1 NA NA NA NA
2 NA NA 930.1242 930.1242
3 716.9981 NA 717.9234 716.9981
4 NA 988.0446 NA 988.0446
5 931.7081 NA 924.1101 931.7081
6 543.6802 533.6798 NA 543.6802
7 744.6525 767.4196 783.8004 744.6525
8 902.8788 955.1173 NA 902.8788
9 762.3690 NA 761.6135 762.3690
10 761.4092 702.6064 708.7615 761.4092
How can I automatically iterate over the available sources to pick the value to use? The number of sources can be anywhere from 1 to 7, and priority follows the natural order (1 > 2 > ...).
Once you have all of the candidate vectors in order and in an appropriate data structure (e.g., data.frame or matrix), you can use apply to apply a function over the rows. In this case, we just look for the first non-NA value. Thus, after the first block of code above, you only need the following line:
df$available <- apply(df, 1, FUN = function(x) x[which(!is.na(x))[1]])
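If you would rather avoid the row-wise apply, the same "first non-NA in priority order" rule can be sketched with Reduce, which folds over however many source columns exist. The toy data and the src_cols name are assumptions for illustration, and the columns are assumed to already be in priority order.

```r
# Toy data (made up): three sources in priority order, with gaps
df <- data.frame(
  source_1 = c(NA, NA, 3, NA),
  source_2 = c(NA, 20, 30, NA),
  source_n = c(NA, 200, NA, 400)
)

# Pick up every source column, however many there are
src_cols <- grep("^source_", names(df), value = TRUE)

# Fold left to right: keep the value found so far, fall back to the
# next source only where the running result is still NA
df$available <- Reduce(function(a, b) ifelse(is.na(a), b, a), df[src_cols])

df
```

Selecting the columns by name pattern first also keeps the new available column out of the calculation if the code is run twice.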
coalesce() from dplyr is designed for this:
library(dplyr)
df %>%
mutate(available = coalesce(!!!.))
source_1 source_2 source_n available
1 NA NA NA NA
2 NA NA 930.1242 930.1242
3 716.9981 NA 717.9234 716.9981
4 NA 988.0446 NA 988.0446
5 931.7081 NA 924.1101 931.7081
6 543.6802 533.6798 NA 543.6802
7 744.6525 767.4196 783.8004 744.6525
8 902.8788 955.1173 NA 902.8788
9 762.3690 NA 761.6135 762.3690
10 761.4092 702.6064 708.7615 761.4092
I am attempting to create a matrix of response probabilities by looping over the elements of a vector (theta) and the rows of a separate data frame (tmp). I keep receiving the error message "incorrect number of subscripts on matrix" and am not sure what I am doing wrong. Any help would be appreciated!
theta = seq(from=-4, to=4, by=.01)
ID = c(1:10)
a = c(1.11,1.03,1.03,1.62,1.23,1.16,1.46,0.91,0.78,0.85)
b = c(-0.33,0.05,-1.25,-0.18,0.47,-1.11,-0.17,-0.57,-0.18,0.45)
c = c(0.16,0.18,0.17,0.24,0.12,NA,NA,NA,0.29,NA)
tmp = data.frame(ID,a,b,c)
for (j in 1:nrow(tmp)) {
  for (k in 1:length(theta)){
    RP[k,j] = tmp$c[j] + ((1-tmp$c[j])/
      (1+exp(-1.7 * tmp$a[j]*theta - tmp$b[j])))
  }
}
The desired result is a matrix with as many rows as theta has elements and one column per row of the tmp data frame. It should look like this:
head(tmp2)
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
1 0.1603182 0.1807822 0.1702159 0.2400104 0.1203281 NA NA NA 0.2929362 NA
2 0.1603243 0.1807960 0.1702197 0.2400107 0.1203350 NA NA NA 0.2929752 NA
3 0.1603305 0.1808100 0.1702236 0.2400110 0.1203421 NA NA NA 0.2930148 NA
4 0.1603368 0.1808243 0.1702276 0.2400113 0.1203493 NA NA NA 0.2930549 NA
5 0.1603432 0.1808389 0.1702316 0.2400116 0.1203567 NA NA NA 0.2930955 NA
6 0.1603497 0.1808537 0.1702357 0.2400120 0.1203642 NA NA NA 0.2931366 NA
In the last line of the for loop, you are using the whole of vector theta as a multiplier:
(1+exp(-1.7 * tmp$a[j]*theta - tmp$b[j])))
Presumably you meant to use the kth element:
(1+exp(-1.7 * tmp$a[j]*theta[k] - tmp$b[j])))
I can't test this as you've left out the definition of the matrix RP, but I'm sure you didn't mean to return 801 elements every pass through the loop.
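As a sketch of how the corrected loop might look end to end: a likely cause of the "incorrect number of subscripts on matrix" message is that RP was not a two-dimensional matrix before the loop, so the version below preallocates it. The formula is kept exactly as written in the question; the name c_par (used instead of c to avoid masking the base function) and the vectorised alternative RP2 are my own additions.

```r
theta <- seq(from = -4, to = 4, by = 0.01)
a <- c(1.11, 1.03, 1.03, 1.62, 1.23, 1.16, 1.46, 0.91, 0.78, 0.85)
b <- c(-0.33, 0.05, -1.25, -0.18, 0.47, -1.11, -0.17, -0.57, -0.18, 0.45)
c_par <- c(0.16, 0.18, 0.17, 0.24, 0.12, NA, NA, NA, 0.29, NA)

# Preallocate: one row per theta value, one column per item
RP <- matrix(NA_real_, nrow = length(theta), ncol = length(a))

for (j in seq_along(a)) {
  for (k in seq_along(theta)) {
    # theta[k], not the whole theta vector
    RP[k, j] <- c_par[j] +
      (1 - c_par[j]) / (1 + exp(-1.7 * a[j] * theta[k] - b[j]))
  }
}

# Alternative without the inner loop: the formula is already
# vectorised over theta, so each item yields a full column at once
RP2 <- sapply(seq_along(a), function(j) {
  c_par[j] + (1 - c_par[j]) / (1 + exp(-1.7 * a[j] * theta - b[j]))
})

dim(RP)
```

Items whose c value is NA produce all-NA columns in both versions, which is consistent with the NA columns in the desired output.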
I have just started learning R. I'm trying to read data from a .csv file, but R keeps adding extra rows and columns with NA values. Does anyone know why this might be happening? Any advice on removing these NAs would be greatly appreciated. I have used the following code:
> no_col <- max(count.fields("6%AA_comp.csv", sep=","))
> mydata <- read.csv(file="6%AA_comp.csv", fill=TRUE, header=TRUE, col.names = 1:no_col-1)
> mydata
X0 X1 X2 X3 X4
1 206428 152160 122080 111940 NA
2 183620 148300 118820 107260 NA
3 169100 164480 151420 146200 NA
4 179000 135920 107340 93540 NA
5 213820 146640 113040 109140 NA
6 150920 141400 133600 132000 NA
7 185645 154000 124510 128900 NA
8 176102 139100 141000 110300 NA
9 159045 154350 121050 153500 NA
10 198610 161000 119000 105600 NA
11 183100 138900 141500 129550 NA
12 211050 142550 136700 113500 NA
13 167000 150100 120000 102540 NA
14 NA NA NA NA NA
15 NA NA NA NA NA
16 NA NA NA NA NA
Well, data cleansing is always half the job or more. You can read the file as it is and then clean it by indexing only the rows and columns you are interested in. In your case this would be:
mydata <- read.csv(file="6%AA_comp.csv", fill=TRUE, header=TRUE)
mydata <- mydata[1:13, 1:5]
This typically happens when you delete some rows from your CSV file and then import it again.
If it's a one-off, the easiest solution is to open the CSV in Excel and delete all the rows below the last data row.
To address the comment below, we can do something like this:
NA.Count = function(x)
{
return(sum(is.na(x)))
}
Row.NA.Count = apply(MAT,1,NA.Count)
Idx = Row.NA.Count == ncol(MAT)
MAT = MAT[!Idx,]
where MAT is the imported matrix.
The above code will take care of all the empty rows. You can do a similar thing for the columns.
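The same idea can be written without a helper function by counting NAs per row and per column with rowSums and colSums. The small data frame below is made up to mimic the shape of the problem (one empty trailing row and one empty trailing column); the variable names are illustrative only.

```r
# Made-up stand-in for the imported data
mydata <- data.frame(
  X0 = c(206428, 183620, NA),
  X1 = c(152160, 148300, NA),
  X2 = c(NA_real_, NA_real_, NA_real_)
)

# TRUE where every value in the row / column is NA
all_na_rows <- rowSums(is.na(mydata)) == ncol(mydata)
all_na_cols <- colSums(is.na(mydata)) == nrow(mydata)

# drop = FALSE keeps a data frame even if only one column survives
cleaned <- mydata[!all_na_rows, !all_na_cols, drop = FALSE]

cleaned
```

Rows or columns that are only partly NA are left untouched, so genuine missing values inside the data are not lost.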
Hope this helps.
I have a large dataframe called dualbeta which contains 2 rows and 6080 columns. Here is a sample:
  row.names  A.Close  AA.Close AADR.Close AAIT.Close AAL.Close
1    upside 1.253929 0.9869027  0.6169613  0.6353903 0.1782124
2  downside 1.027412 1.1936236  0.5915299  0.5697878 0.1702382
I am trying to extract only those columns with an upside >= 1.00 and a downside <= 1.00. I used combinations <- subset(dualbeta, upside>=1.00 & downside<=1.00) but I get the following:
row.names A.Close AA.Close AADR.Close AAIT.Close
1 NA NA NA NA NA
2 NA.1 NA NA NA NA
3 NA.2 NA NA NA NA
4 NA.3 NA NA NA NA
5 NA.4 NA NA NA NA
...
It should just return a 2 by x table, where x is the number of combinations found. I do not know why I am getting a bunch of rows. Additionally, I thought I had NA values in dualbeta, so I used na.omit(dualbeta)->dualbeta, but it deleted everything and turned dualbeta into a 0 by 6080 data frame. I also used which(is.na(dualbeta)), which returned 3307 and 3308, but when I checked those columns, they did not contain NAs.
You might work on the transpose of the data in order to select rows with the proper characteristics (which are columns in the transpose):
# Fix up the data, use proper row names
rownames(x) <- x$row.names
# Remove old row name column
x <- x[-1]
# transpose and subset
subset(data.frame(t(x)), upside > 1 & downside < 1)
This expression returns a zero-length result with your example data. Changing the parameters shows what is returned:
subset(data.frame(t(x)), upside > .6 & downside < .6)
## upside downside
## AADR.Close 0.6169613 0.5915299
## AAIT.Close 0.6353903 0.5697878
You can extract the data with simple indexing.
Let's say this is your data
dualbeta<-data.frame(matrix(runif(24,0,2),
nrow=2,
dimnames=list(c("upside","downside"), letters[1:12])))
then you can extract with
dualbeta[, dualbeta[1,]>=1.00 & dualbeta[2,]<=1.00]
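A deterministic variant of the same indexing, using the five sample columns from the question, might look like the sketch below. It assumes upside and downside are real row names rather than a row.names column, and select_cols is a hypothetical helper name. Adding drop = FALSE keeps the result a data frame even when zero or one column qualifies.

```r
# Sample values copied from the question, with proper row names
dualbeta <- data.frame(
  A.Close    = c(1.253929, 1.027412),
  AA.Close   = c(0.9869027, 1.1936236),
  AADR.Close = c(0.6169613, 0.5915299),
  AAIT.Close = c(0.6353903, 0.5697878),
  AAL.Close  = c(0.1782124, 0.1702382),
  row.names  = c("upside", "downside")
)

# Hypothetical helper: keep columns whose upside is at least up_min
# and whose downside is at most down_max
select_cols <- function(df, up_min, down_max) {
  up   <- unlist(df["upside", ])     # named numeric vector, one per column
  down <- unlist(df["downside", ])
  df[, up >= up_min & down <= down_max, drop = FALSE]
}

strict <- select_cols(dualbeta, 1, 1)      # no sample column qualifies
loose  <- select_cols(dualbeta, 0.6, 0.6)  # AADR.Close and AAIT.Close

loose
```

With the thresholds from the question, none of the five sample columns qualifies, so strict is a data frame with 2 rows and 0 columns rather than a block of NA rows.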