I'm simulating some data in R, plotting Y (a rate) against x, and I want Y to increase linearly with x up to a point and then level off. That is, Y is a function of x between, say, 0.1 and 5, but constant from 5.01 to 10. Is there a simple command that lets Y depend on x differently over different ranges? I'm sure my lecturer told me about one but I can't remember it...
Any help or thoughts would be greatly appreciated!
You could use ifelse:
> f <- function(x) ifelse(x < 5, x**2, 25)
> x <- seq(1, 10, .1)
> plot(x, f(x), type='l')
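Since the question asks for a response that rises linearly and then stays constant, a minimal sketch of that exact shape could use pmin(), which caps x at the breakpoint. The breakpoint of 5 comes from the question; the slope of 1, the noise level, and the names f_lin and y_sim are assumptions for illustration.

```r
# Piecewise response: linear in x up to the breakpoint, constant afterwards.
# The defaults for `breakpoint` and `slope` are illustrative assumptions.
f_lin <- function(x, breakpoint = 5, slope = 1) slope * pmin(x, breakpoint)

set.seed(1)
x <- seq(0.1, 10, by = 0.1)
y_sim <- f_lin(x) + rnorm(length(x), sd = 0.2)  # simulated noisy rate

plot(x, y_sim)
lines(x, f_lin(x), col = "red")
```

Unlike ifelse, pmin needs no explicit condition, but it only covers the "linear, then flat" case; ifelse stays the more general tool.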
I am trying to make a 3D surface plot in R. In this the values for the z and x axes should be within a given range, and the value of y should depend on both x and z as described in the function.
z <- maxGiraffeNumber <- c(100:2400)
x <- Tourism <- c(1:100)
y <- Rain <- 100/(17/((z/365)*0.3))*(100-y)
surface3d(x,y,z, col=colors)
running this code gives me the following error
Error in rgl.surface(x = 1:100, y = 100:2400, z = c(14.7789550384904, :
'y' length != 'x' rows * 'z' cols
Thank you for your help
The problem is stated fairly well in the error. To produce a surface plot you need a vector of x co-ordinates of any length and a vector of y co-ordinates of any length. However, your z vector needs to have a value at every (x, y) co-ordinate, which means you need to be sure that length(z) == length(x) * length(y). However, what you have is x of length 100, y of length 2301 and z of length 2301.
If you have a function you want to apply to every possible combination of x and y, you can use outer.
I'll give an example of producing a surface with something similar to the code you have created here, but it's probably not exactly what you were looking for, since it's not clear exactly what you are trying to do.
library(rgl)
f <- function(x, y) 100 / (17/((x / 365) * 0.3)) * (100 - y)
y <- Rain <- c(1:100)
x <- Tourism <- c(1:100)
z <- maxGiraffeNumber <- outer(Rain, Tourism, f)
surface3d(Tourism, Rain, maxGiraffeNumber, col = "red")
This opens a rotatable 3D surface in a pop-up window.
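To see why outer resolves the length mismatch, here is a small sketch (the toy grid sizes and the function g are assumptions, not the asker's model) that checks the shape of the matrix it returns:

```r
# Toy grid: 4 x-values and 3 y-values (sizes chosen only for illustration)
x <- 1:4
y <- 1:3
g <- function(x, y) x + 10 * y  # hypothetical surface function

z <- outer(x, y, g)  # z[i, j] equals g(x[i], y[j])
dim(z)               # one row per x value, one column per y value
length(z) == length(x) * length(y)
```

The function passed to outer has to be vectorised, because outer calls it once with all (x, y) pairs expanded into two long vectors.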
I'm attempting to create 1000 samples of a certain variable Z, in which first I generate 12 uniform RV's Ui, and then have Z = ∑ (Ui-6) from i=1 to 12. I can generate one Z from
u <- runif(12)
Z <- sum(u-6)
However I am not sure how to go about repeating that 1000x. In the end, the desire is to plot out the histogram of the Z's, and ideally it to resemble the normal curve. Sorry, clearly I am as beginner as you can get in this realm. Thank you!
If I understand the question, this is a pretty straightforward way to do it -- use replicate() to perform the calculation as many times as you want.
# number of values to draw per iteration
n_samples <- 12
# number of iterations
n_iters <- 1000
# get samples, subtract 6 from each element, sum them (1000x)
Zs <- replicate(n_iters, sum(runif(n_samples) - 6))
# print a histogram
hist(Zs)
Is this what you're after?
set.seed(2017);
n <- 1000;
u <- matrix(runif(12 * n), ncol = 12);
z <- apply(u, 1, function(x) sum(x - 6));
# Density plot
require(ggplot2);
ggplot(data.frame(z = z), aes(x = z)) + geom_density();
Explanation: Draw 12 * 1000 uniform samples in one go, store in a 1000 x 12 matrix, and then sum row entries x - 6.
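A loop-free variant of the same idea, as a rough sketch: rowSums() replaces apply(), and a normal density is overlaid for comparison. Each uniform has mean 1/2 and variance 1/12, so the sum of 12 of them has mean 6 and variance 1. With the formula as written in the question, sum(u - 6), the values are therefore centred near 6 - 72 = -66 with standard deviation about 1; subtracting 6 once from the sum would centre them at 0 instead. The seed and variable names are illustrative.

```r
set.seed(2017)
n <- 1000
u <- matrix(runif(12 * n), ncol = 12)

z <- rowSums(u - 6)  # same as rowSums(u) - 72, following the question's formula

# Histogram on the density scale with the matching normal curve on top
hist(z, freq = FALSE, breaks = 30)
curve(dnorm(x, mean = -66, sd = 1), add = TRUE, col = "red")
```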
I'd first like to describe my problem:
What I want to do is calculate the number of price spikes in a 24-hour window, while I have half-hourly data.
I have seen all Stackoverflow posts like e.g. this one:
Rollapply for time series
(If there are more relevant ones, please let me know ;) )
As I cannot and probably also should not upload my data, here's a minimal example:
I simulate a random variable, convert it to an xts object, and use a user defined function to detect "spikes" (of course pretty ridiculous in this case, but illustrates the error).
library(xts)
##########Simulate y as a random variable
y <- rnorm(n=100)
##########Add a date variable so I can convert it to an xts object later on
yDate <- as.Date(1:100, origin = "1970-01-01")
##########bind both variables together and convert to a xts object
z <- cbind(yDate,y)
z <- xts(x=z, order.by=yDate)
##########use the rollapply function on the xts object:
x <- rollapply(z, width=10, FUN=mean)
The function works as it is supposed to: it takes the 10 preceding values and calculates the mean.
Then I defined my own function to find peaks: a peak is a local maximum (higher than m points around it) AND is at least as big as the mean of the time series + h.
This leads to:
find_peaks <- function (x, m,h){
shape <- diff(sign(diff(x, na.pad = FALSE)))
pks <- sapply(which(shape < 0), FUN = function(i){
z <- i - m + 1
z <- ifelse(z > 0, z, 1)
w <- i + m + 1
w <- ifelse(w < length(x), w, length(x))
if(all(x[c(z : i, (i + 2) : w)] <= x[i + 1])&x[i+1]>mean(x)+h) return(i + 1) else return(numeric(0))
})
pks <- unlist(pks)
pks
}
And works fine: Back to the example:
plot(yDate,y)
#Is supposed to find the points which are higher than 3 points around them
#and higher than the average:
#Does so, so works.
points(yDate[find_peaks(y,3,0)],y[find_peaks(y,3,0)],col="red")
However, using the rollapply() function leads to:
x <- rollapply(z,width = 10,FUN=function(x) find_peaks(x,3,0))
#Error in `[.xts`(x, c(z:i, (i + 2):w)) : subscript out of bounds
I first thought that the error might occur because it runs into a negative index for the first points, because of the m parameter. Sadly, setting m to zero does not change the error.
I have tried to trace this error too, but do not find the source.
Can anyone help me out here?
Edit: A picture of spikes on the Australian electricity market (image not included). find_peaks(20,50) determines the red points to be spikes; find_peaks(0,50) additionally finds the blue ones to be spikes (therefore, the second parameter h is important, because the blue points are clearly not what we want to analyse when we talk about spikes).
I'm still not entirely sure what it is that you are after. On the assumption that given a window of data you want to identify whether its center is greater than the rest of the window at the same time as being greater than the mean of the window + h then you could do the following:
peakfinder = function(x,h = 0){
xdat = as.numeric(x)
meandat = mean(xdat)
center = xdat[ceiling(length(xdat)/2)]
ifelse(all(center >= xdat) & center >= (meandat + h),center,NA)
}
y <- rnorm(n=100)
z = xts(y, order.by = as.Date(1:100, origin = "1970-01-01"))
plot(z)
points(rollapply(z,width = 7, FUN = peakfinder, align = "center"), col = "red", pch = 19)
Although it would appear to me that if the center point is greater than its neighbours, it is necessarily at least as large as the local mean too, so this part of the function is only needed when h > 0. If you want to use the global mean of the time series, just replace the calculation of meandat with a pre-calculated global mean passed as an argument to peakfinder.
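The global-mean variant might look like the following sketch. The function name peakfinder_global, its gmean argument, and the base R window loop are assumptions for illustration; with an xts object the same function could be handed to rollapply through its FUN argument, with gmean passed along as an extra argument.

```r
# Return the window's center value if it is the window maximum and also
# at least `gmean + h`; otherwise NA. `gmean` is a precomputed global mean.
peakfinder_global <- function(x, gmean, h = 0) {
  xdat <- as.numeric(x)
  center <- xdat[ceiling(length(xdat) / 2)]
  if (all(center >= xdat) && center >= gmean + h) center else NA_real_
}

set.seed(42)
y <- rnorm(100)
width <- 7
half <- (width - 1) / 2

# Slide a centered window over y; the first and last `half` positions stay NA
peaks <- rep(NA_real_, length(y))
for (i in (half + 1):(length(y) - half)) {
  peaks[i] <- peakfinder_global(y[(i - half):(i + half)], gmean = mean(y))
}

sum(!is.na(peaks))  # number of detected peaks
```

Counting the non-NA entries inside each 24-hour stretch would then give the number of spikes per window that the question asks about.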
I have a vector and for each point in that vector I would like to compute the difference between the average for some range of the points immediately before that point minus the average for some range of the points immediately after that point. I have done this with a for loop because filter does not seem to have an option to apply exclusively to points after a vector point (parameter sides = only 1 or 2) and because I did not know how to shoehorn this into an apply statement since I need a function that operates on each point using its position within the vector and not just its own value. Can someone show me the way?
Here's how I did it with a for loop:
x = rep(c(1,1,1,1,1,10), 20)
x = x + 100
x = x - c(1:length(x))
lookahead = 4
y = x
for(i in (lookahead):(length(x)-lookahead))
{
y[i] = mean(x[(i-lookahead):i]) - mean(x[i:(i+lookahead)])
}
plot(x)
lines(y, col="red")
You can see from the plot what the objective is: to identify spikes (but no I don't want to be told about other ways to find spikes, I want to use my simple boxcar moving average method).
There must be a better way to calculate this vector. Thank you for any suggestions.
P.S. I see someone wants to flag this as a repeat of "Calculating moving average in R". However, my question is different, as the answers to that question (use roll_mean or filter) don't apply here without modification. If there is a way to use roll_mean or filter, I can't tell from the docs and would appreciate someone telling me how I can use either of these to calculate forward-looking moving averages instead of backward-looking moving averages. Thanks again.
The problem with your procedure is that it starts at i = 4 and subsets x[0:4]; R silently drops the 0 index, so the first mean is taken over four values instead of five.
y1 = RcppRoll::roll_mean(x, 5)
y1 = c(rep(NA, 4), y1) - c(y1, rep(NA, 4)) # you can use y1 = lag(y1, 4) - y1 instead if you have dplyr
# fill NA positions
y1[1:4] = x[1:4]
y1[117:120] = x[117:120]
y1 differs from y only at position 4, where your loop averages four values instead of five.
If you have no access to RcppRoll, you can use embed (faster than zoo::rollmean).
y1 = rowMeans(embed(x, 5)) # slightly slower than roll_mean
y1 = c(rep(NA, 4), y1) - c(y1, rep(NA, 4)) # you can use y1 = lag(y1, 4) - y1 instead if you have dplyr
# fill NA positions
y1[1:4] = x[1:4]
y1[117:120] = x[117:120]
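Since the question asks whether filter can produce forward-looking averages, here is one possible sketch using only base R: stats::filter with sides = 1 gives the backward mean, and applying it to the reversed vector and reversing the result gives the forward mean. The names k, w, back, fwd and d are illustrative.

```r
# Test vector built the same way as in the question
x <- rep(c(1, 1, 1, 1, 1, 10), 20) + 100
x <- x - seq_along(x)

k <- 5               # window length (lookahead + 1)
w <- rep(1 / k, k)   # equal weights, i.e. a boxcar average

# Backward-looking mean: back[i] is mean(x[(i - 4):i]), NA for i < 5
back <- as.numeric(stats::filter(x, w, sides = 1))

# Forward-looking mean: fwd[i] is mean(x[i:(i + 4)]), NA for the last 4 points
fwd <- rev(as.numeric(stats::filter(rev(x), w, sides = 1)))

d <- back - fwd      # NA at the first and last 4 positions

plot(x)
lines(d, col = "red")
```

This leaves the edges as NA rather than copying values of x into them, which may or may not be what is wanted.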
OK, I have one solution. However, I've modified your code so that the loop goes from (lookahead+1):(length(x)-lookahead). This is so that the very first mean is a mean of 5 values like all the rest.
Calculate a vector of averages of 5 values:
lastIndexInY <- length(x)-lookahead
Y_ave <- (x[ 1:lastIndexInY ] + x[ 1:lastIndexInY +1] + x[ 1:lastIndexInY +2] + x[ 1:lastIndexInY +3]+ x[ 1:lastIndexInY +4] )/5
Then your result y is the same as:
y_vec <- c(x[1:4], Y_ave[1:(length(Y_ave)-4)] - Y_ave[5:length(Y_ave) ] , x[-3:0 + length(x)] )
all(y - y_vec == 0 )
[1] TRUE
(Are you sure you need to retain the first 4 and last 4 values of x?)
Hello everybody.
I am writing this question because I need your help with a graphical representation in R. I would like to draw a plot with two variables as the coordinates and a third variable represented by the size (volume) of the points.
You can see an example here: pxfolioplotbcgmatrix
Thank you!
Try the symbols() function (see ?symbols):
x <- 1:5
y <- 5:1
r <- seq(2, 10, 2)
symbols(x, y, circles=r, fg="white", bg="red")
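If the third variable should be read as the area of each bubble rather than its radius, one possible sketch is to take the square root before passing it to symbols. The data frame d and its columns are hypothetical data made up for illustration.

```r
# Hypothetical data: two coordinates plus a third "volume" variable
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(5, 3, 4, 1, 2),
  volume = c(10, 40, 90, 160, 250)
)

# Radius chosen so that the circle area is proportional to volume
r <- sqrt(d$volume / pi)

# A numeric `inches` rescales the symbols so the largest has a fixed size
symbols(d$x, d$y, circles = r, inches = 0.4,
        fg = "white", bg = "red", xlab = "x", ylab = "y")
text(d$x, d$y, labels = d$volume, cex = 0.7)
```

Scaling by area tends to give a fairer visual comparison, because a circle with twice the radius covers four times the area.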