Related
What are the alternatives for drawing a simple curve for a function like
eq = function(x){x*x}
in R?
It sounds such an obvious question, but I could only find these related questions on stackoverflow, but they are all more specific
Plot line function in R
Plotting functions on top of datapoints in R
How can I plot a function in R with complex numbers?
How to plot a simple piecewise linear function?
Draw more than one function curves in the same plot
I hope I didn't write a duplicate question.
I did some searching on the web, and this are some ways that I found:
The easiest way is using curve without predefined function
curve(x^2, from=1, to=50, , xlab="x", ylab="y")
You can also use curve when you have a predfined function
eq = function(x){x*x}
curve(eq, from=1, to=50, xlab="x", ylab="y")
If you want to use ggplot,
library("ggplot2")
eq = function(x){x*x}
ggplot(data.frame(x=c(1, 50)), aes(x=x)) +
stat_function(fun=eq)
You mean like this?
> eq = function(x){x*x}
> plot(eq(1:1000), type='l')
(Or whatever range of values is relevant to your function)
plot has a plot.function method
plot(eq, 1, 1000)
Or
curve(eq, 1, 1000)
Here is a lattice version:
library(lattice)
eq<-function(x) {x*x}
X<-1:1000
xyplot(eq(X)~X,type="l")
Lattice solution with additional settings which I needed:
library(lattice)
distribution<-function(x) {2^(-x*2)}
X<-seq(0,10,0.00001)
xyplot(distribution(X)~X,type="l", col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255), cex.lab = 3.5, cex.axis = 3.5, lwd=2 )
If you need your range of values for x plotted in increments different from 1, e.g. 0.00001 you can use:
X<-seq(0,10,0.00001)
You can change the colour of your line by defining a rgb value:
col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255)
You can change the width of the plotted line by setting:
lwd = 2
You can change the size of the labels by scaling them:
cex.lab = 3.5, cex.axis = 3.5
As sjdh also mentioned, ggplot2 comes to the rescue. A more intuitive way without making a dummy data set is to use xlim:
library(ggplot2)
eq <- function(x){sin(x)}
base <- ggplot() + xlim(0, 30)
base + geom_function(fun=eq)
Additionally, for a smoother graph we can set the number of points over which the graph is interpolated using n:
base + geom_function(fun=eq, n=10000)
Function containing parameters
I had a function (emax()) involving 3 parameters (a, b & h) whose line I wanted to plot:
emax = function(x, a, b, h){
(a * x^h)/(b + x^h)
}
curve(emax, from = 1, to = 40, n=40 a = 1, b = 2, h = 3)
which errored with Error in emax(x) : argument "a" is missing, with no default error.
This is fixed by putting the named arguments within the function using this syntax:
curve(emax(x, a = 1, b = 2, h = 3), from = 1, to = 40, n = 40)
which is contrary to the documentation which writes curve(expr, from, to, n, ...) rather than curve(expr(x,...), from, to, n).
I want to integrate a kernel density estimate in order to get a kernel estimate of the cdf.
This is my code:
set.seed(1)
z <- rnorm(250)
pdf <- approxfun(density(z, bw = "SJ"), yleft = 0, yright = 0)
cdf <- function(b) {
integrate(pdf, -Inf, b)$value
}
x <- seq(-20, 20, 0.1)
plot(x, sapply(x, cdf), type = "l", xlab = "x", ylab = "density", ylim= c(0, 1))
Which produces the following plot
As you can see, the cdf drops to zero at ~18, which clearly should not happen.
Why does this happen and how can I avoid it?
Use a large finite number for your left integration endpoint, instead of -infinity.
cdf <- function(b)
{
integrate(pdf, -20, b)$value
}
x <- seq(-20, 20, 0.1)
plot(x, sapply(x, cdf), type="l", xlab="x", ylab="density", ylim=c(0, 1))
The reason is basically because R's numerical integration routine isn't that sophisticated, and sometimes fails when infinite endpoints are supplied. (The help says that using explicit infinite intervals can be better than large finite endpoints. In this case, that advice doesn't work.)
What are the alternatives for drawing a simple curve for a function like
eq = function(x){x*x}
in R?
It sounds such an obvious question, but I could only find these related questions on stackoverflow, but they are all more specific
Plot line function in R
Plotting functions on top of datapoints in R
How can I plot a function in R with complex numbers?
How to plot a simple piecewise linear function?
Draw more than one function curves in the same plot
I hope I didn't write a duplicate question.
I did some searching on the web, and this are some ways that I found:
The easiest way is using curve without predefined function
curve(x^2, from=1, to=50, , xlab="x", ylab="y")
You can also use curve when you have a predfined function
eq = function(x){x*x}
curve(eq, from=1, to=50, xlab="x", ylab="y")
If you want to use ggplot,
library("ggplot2")
eq = function(x){x*x}
ggplot(data.frame(x=c(1, 50)), aes(x=x)) +
stat_function(fun=eq)
You mean like this?
> eq = function(x){x*x}
> plot(eq(1:1000), type='l')
(Or whatever range of values is relevant to your function)
plot has a plot.function method
plot(eq, 1, 1000)
Or
curve(eq, 1, 1000)
Here is a lattice version:
library(lattice)
eq<-function(x) {x*x}
X<-1:1000
xyplot(eq(X)~X,type="l")
Lattice solution with additional settings which I needed:
library(lattice)
distribution<-function(x) {2^(-x*2)}
X<-seq(0,10,0.00001)
xyplot(distribution(X)~X,type="l", col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255), cex.lab = 3.5, cex.axis = 3.5, lwd=2 )
If you need your range of values for x plotted in increments different from 1, e.g. 0.00001 you can use:
X<-seq(0,10,0.00001)
You can change the colour of your line by defining a rgb value:
col = rgb(red = 255, green = 90, blue = 0, maxColorValue = 255)
You can change the width of the plotted line by setting:
lwd = 2
You can change the size of the labels by scaling them:
cex.lab = 3.5, cex.axis = 3.5
As sjdh also mentioned, ggplot2 comes to the rescue. A more intuitive way without making a dummy data set is to use xlim:
library(ggplot2)
eq <- function(x){sin(x)}
base <- ggplot() + xlim(0, 30)
base + geom_function(fun=eq)
Additionally, for a smoother graph we can set the number of points over which the graph is interpolated using n:
base + geom_function(fun=eq, n=10000)
Function containing parameters
I had a function (emax()) involving 3 parameters (a, b & h) whose line I wanted to plot:
emax = function(x, a, b, h){
(a * x^h)/(b + x^h)
}
curve(emax, from = 1, to = 40, n=40 a = 1, b = 2, h = 3)
which errored with Error in emax(x) : argument "a" is missing, with no default error.
This is fixed by putting the named arguments within the function using this syntax:
curve(emax(x, a = 1, b = 2, h = 3), from = 1, to = 40, n = 40)
which is contrary to the documentation which writes curve(expr, from, to, n, ...) rather than curve(expr(x,...), from, to, n).
I am using the following code to create a standard normal distribution in R:
x <- seq(-4, 4, length=200)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=2)
I need the x-axis to be labeled at the mean and at points three standard deviations above and below the mean. How can I add these labels?
The easiest (but not general) way is to restrict the limits of the x axis. The +/- 1:3 sigma will be labeled as such, and the mean will be labeled as 0 - indicating 0 deviations from the mean.
plot(x,y, type = "l", lwd = 2, xlim = c(-3.5,3.5))
Another option is to use more specific labels:
plot(x,y, type = "l", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
Using the code in this answer, you could skip creating x and just use curve() on the dnorm function:
curve(dnorm, -3.5, 3.5, lwd=2, axes = FALSE, xlab = "", ylab = "")
axis(1, at = -3:3, labels = c("-3s", "-2s", "-1s", "mean", "1s", "2s", "3s"))
But this doesn't use the given code anymore.
If you like hard way of doing something without using R built in function or you want to do this outside R, you can use the following formula.
x<-seq(-4,4,length=200)
s = 1
mu = 0
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2))
plot(x,y, type="l", lwd=2, col = "blue", xlim = c(-3.5,3.5))
An extremely inefficient and unusual, but beautiful solution, which works based on the ideas of Monte Carlo simulation, is this:
simulate many draws (or samples) from a given distribution (say the normal).
plot the density of these draws using rnorm. The rnorm function takes as arguments (A,B,C) and returns a vector of A samples from a normal distribution centered at B, with standard deviation C.
Thus to take a sample of size 50,000 from a standard normal (i.e, a normal with mean 0 and standard deviation 1), and plot its density, we do the following:
x = rnorm(50000,0,1)
plot(density(x))
As the number of draws goes to infinity this will converge in distribution to the normal. To illustrate this, see the image below which shows from left to right and top to bottom 5000,50000,500000, and 5 million samples.
In general case, for example: Normal(2, 1)
f <- function(x) dnorm(x, 2, 1)
plot(f, -1, 5)
This is a very general, f can be defined freely, with any given parameters, for example:
f <- function(x) dbeta(x, 0.1, 0.1)
plot(f, 0, 1)
I particularly love Lattice for this goal. It easily implements graphical information such as specific areas under a curve, the one you usually require when dealing with probabilities problems such as find P(a < X < b) etc.
Please have a look:
library(lattice)
e4a <- seq(-4, 4, length = 10000) # Data to set up out normal
e4b <- dnorm(e4a, 0, 1)
xyplot(e4b ~ e4a, # Lattice xyplot
type = "l",
main = "Plot 2",
panel = function(x,y, ...){
panel.xyplot(x,y, ...)
panel.abline( v = c(0, 1, 1.5), lty = 2) #set z and lines
xx <- c(1, x[x>=1 & x<=1.5], 1.5) #Color area
yy <- c(0, y[x>=1 & x<=1.5], 0)
panel.polygon(xx,yy, ..., col='red')
})
In this example I make the area between z = 1 and z = 1.5 stand out. You can move easily this parameters according to your problem.
Axis labels are automatic.
This is how to write it in functions:
normalCriticalTest <- function(mu, s) {
x <- seq(-4, 4, length=200) # x extends from -4 to 4
y <- (1/(s * sqrt(2*pi))) * exp(-((x-mu)^2)/(2*s^2)) # y follows the formula
of the normal distribution: f(Y)
plot(x,y, type="l", lwd=2, xlim = c(-3.5,3.5))
abline(v = c(-1.96, 1.96), col="red") # draw the graph, with 2.5% surface to
either side of the mean
}
normalCriticalTest(0, 1) # draw a normal distribution with vertical lines.
Final result:
I am a beginner in R and I have tried to find information about the following without finding anything.
The green graph in the picture is composed by the red and yellow graphs. But let's say that I only have the data points of something like the green graph. How do I extract the low/high frequencies (i.e. approximately the red/yellow graphs) using a low pass/high pass filter?
Update: The graph was generated with
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
Update 2: Using the Butterworth filter in the signal package suggested I get the following:
library(signal)
bf <- butter(2, 1/50, type="low")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)
bf <- butter(2, 1/25, type="high")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)
The calculations was a bit work, signal.pdf gave next to no hints about what values W should have, but the original octave documentation at least mentioned radians which got me going. The values in my original graph was not chosen with any specific frequency in mind, so I ended up with the following not so simple frequencies: f_low = 1/500 * 2 = 1/250, f_high = 1/500 * 2*10 = 1/25 and the sampling frequency f_s = 500/500 = 1. Then I chose a f_c somewhere inbetween the low and high frequencies for the low/high pass filters (1/100 and 1/50 respectively).
I bumped into similar problem recently and did not find the answers here particularly helpful. Here is an alternative approach.
Let´s start by defining the example data from the question:
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
y <- y + noise1
plot(x, y, type="l", ylim=range(-1.5*max_y,1.5*max_y,5), lwd = 5, col = "green")
So the green line is the dataset we want to low-pass and high-pass filter.
Side note: The line in this case could be expressed as a function by using cubic spline (spline(x,y, n = length(x))), but with real world data this would rarely be the case, so let's assume that it is not possible to express the dataset as a function.
The easiest way to smooth such data I have came across is to use loess or smooth.spline with appropriate span/spar. According to statisticians loess/smooth.spline is probably not the right approach here, as it does not really present a defined model of the data in that sense. An alternative is to use Generalized Additive Models (gam() function from package mgcv). My argument for using loess or smoothed spline here is that it is easier and does not make a difference as we are interested in the visible resulting pattern. Real world datasets are more complicated than in this example and finding a defined function for filtering several similar datasets might be difficult. If the visible fit is good, why to make it more complicated with R2 and p values? To me the application is visual for which loess/smoothed splines are appropriate methods. Both of the methods assume polynomial relationships with the difference that loess is more flexible also using higher degree polynomials, while cubic spline is always cubic (x^2). Which one to use depends on trends in a dataset. That said, the next step is to apply a low-pass filter on the dataset by using loess() or smooth.spline():
lowpass.spline <- smooth.spline(x,y, spar = 0.6) ## Control spar for amount of smoothing
lowpass.loess <- loess(y ~ x, data = data.frame(x = x, y = y), span = 0.3) ## control span to define the amount of smoothing
lines(predict(lowpass.spline, x), col = "red", lwd = 2)
lines(predict(lowpass.loess, x), col = "blue", lwd = 2)
Red line is the smoothed spline filter and blue the loess filter. As you see results differ slightly. I guess one argument of using GAM would be to find the best fit, if the trends really were this clear and consistent among datasets, but for this application both of these fits are good enough for me.
After finding a fitting low-pass filter, the high-pass filtering is as simple as subtracting the low-pass filtered values from y:
highpass <- y - predict(lowpass.loess, x)
lines(x, highpass, lwd = 2)
This answer comes late, but I hope it helps someone else struggling with similar problem.
Use filtfilt function instead of filter (package signal) to get rid of signal shift.
library(signal)
bf <- butter(2, 1/50, type="low")
b1 <- filtfilt(bf, y+noise1)
points(x, b1, col="red", pch=20)
One method is using the fast fourier transform implemented in R as fft. Here is an example of a high pass filter. From the plots above, the idea implemented in this example is to get the serie in yellow starting from the serie in green (your real data).
# I've changed the data a bit so it's easier to see in the plots
par(mfrow = c(1, 1))
number_of_cycles = 2
max_y = 40
N <- 256
x = 0:(N-1)
a = number_of_cycles * 2 * pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
### Apply the fft to the noisy data
y_noise = y + noise1
fft.y_noise = fft(y_noise)
# Plot the series and spectrum
par(mfrow = c(1, 2))
plot(x, y_noise, type='l', main='original serie', col='green4')
plot(Mod(fft.y_noise), type='l', main='Raw serie - fft spectrum')
### The following code removes the first spike in the spectrum
### This would be the high pass filter
inx_filter = 15
FDfilter = rep(1, N)
FDfilter[1:inx_filter] = 0
FDfilter[(N-inx_filter):N] = 0
fft.y_noise_filtered = FDfilter * fft.y_noise
par(mfrow = c(2, 1))
plot(x, noise1, type='l', main='original noise')
plot(x, y=Re( fft( fft.y_noise_filtered, inverse=TRUE) / N ) , type='l',
main = 'filtered noise')
Per request of OP:
The signal package contains all kinds of filters for signal processing. Most of it is comparable to / compatible with the signal processing functions in Matlab/Octave.
Check out this link where there's R code for filtering (medical signals). It's by Matt Shotwell and the site is full of interesting R/stats info with a medical bent:
biostattmat.com
The package fftfilt contains lots of filtering algorithms that should help too.
I also struggled to figure out how the W parameter in the butter function maps on to the filter cut-off, in part because the documentation for filter and filtfilt is incorrect as of posting (it suggests that W = .1 would result in a 10 Hz lp filter when combined with filtfilt when signal sampling rate Fs = 100, but actually, it's only a 5 Hz lp filter -- the half-amplitude cut-off is 5 Hz when use filtfilt, but the half-power cut-off is 5 Hz when you only apply the filter once, using the filter function). I'm posting some demo code I wrote below that helped me confirm how this is all working, and that you could use to check a filter is doing what you want.
#Example usage of butter, filter, and filtfilt functions
#adapted from https://rdrr.io/cran/signal/man/filtfilt.html
library(signal)
Fs <- 100; #sampling rate
bf <- butter(3, 0.1);
#when apply twice with filtfilt,
#results in a 0 phase shift
#5 Hz half-amplitude cut-off LP filter
#
#W * (Fs/2) == half-amplitude cut-off when combined with filtfilt
#
#when apply only one time, using the filter function (non-zero phase shift),
#W * (Fs/2) == half-power cut-off
t <- seq(0, .99, len = 100) # 1 second sample
#generate a 5 Hz sine wave
x <- sin(2*pi*t*5)
#filter it with filtfilt
y <- filtfilt(bf, x)
#filter it with filter
z <- filter(bf, x)
#plot original and filtered signals
plot(t, x, type='l')
lines(t, y, col="red")
lines(t,z,col="blue")
#estimate signal attenuation (proportional reduction in signal amplitude)
1 - mean(abs(range(y[t > .2 & t < .8]))) #~50% attenuation at 5 Hz using filtfilt
1 - mean(abs(range(z[t > .2 & t < .8]))) #~30% attenuation at 5 Hz using filter
#demonstration that half-amplitude cut-off is 6 Hz when apply filter only once
x6hz <- sin(2*pi*t*6)
z6hz <- filter(bf, x6hz)
1 - mean(abs(range(z6hz[t > .2 & t < .8]))) #~50% attenuation at 6 Hz using filter
#plot the filter attenuation profile (for when apply one time, as with "filter" function):
hf <- freqz(bf, Fs = Fs);
plot(c(0, 20, 20, 0, 0), c(0, 0, 1, 1, 0), type = "l",
xlab = "Frequency (Hz)", ylab = "Attenuation (abs)")
lines(hf$f[hf$f<=20], abs(hf$h)[hf$f<=20])
plot(c(0, 20, 20, 0, 0), c(0, 0, -50, -50, 0),
type = "l", xlab = "Frequency (Hz)", ylab = "Attenuation (dB)")
lines(hf$f[hf$f<=20], 20*log10(abs(hf$h))[hf$f<=20])
hf$f[which(abs(hf$h) - .5 < .001)[1]] #half-amplitude cutoff, around 6 Hz
hf$f[which(20*log10(abs(hf$h))+6 < .2)[1]] #half-amplitude cutoff, around 6 Hz
hf$f[which(20*log10(abs(hf$h))+3 < .2)[1]] #half-power cutoff, around 5 Hz
there is a package on CRAN named FastICA, this computes the approximation of the independent source signals, however in order to compute both signals you need a matrix of at least 2xn mixed observations (for this example), this algorithm can't determine the two indpendent signals with just 1xn vector. See the example below. hope this can help you.
number_of_cycles = 2
max_y = 40
x = 1:500
a = number_of_cycles * 2*pi/length(x)
y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)
######################################################
library(fastICA)
S <- cbind(y,noise1)#Assuming that "y" source1 and "noise1" is source2
A <- matrix(c(0.291, 0.6557, -0.5439, 0.5572), 2, 2) #This is a mixing matrix
X <- S %*% A
a <- fastICA(X, 2, alg.typ = "parallel", fun = "logcosh", alpha = 1,
method = "R", row.norm = FALSE, maxit = 200,
tol = 0.0001, verbose = TRUE)
par(mfcol = c(2, 3))
plot(S[,1 ], type = "l", main = "Original Signals",
xlab = "", ylab = "")
plot(S[,2 ], type = "l", xlab = "", ylab = "")
plot(X[,1 ], type = "l", main = "Mixed Signals",
xlab = "", ylab = "")
plot(X[,2 ], type = "l", xlab = "", ylab = "")
plot(a$S[,1 ], type = "l", main = "ICA source estimates",
xlab = "", ylab = "")
plot(a$S[, 2], type = "l", xlab = "", ylab = "")
I am not sure if any filter is the best way for You. More useful instrument for that aim is the fast Fourier transformation.