So, I've spent the last four hours trying to find an efficient way of plotting the curve(s) of a function with two variables - to no avail. The only answer that I could actually put into practice didn't produce a multiple-line graph as I expected.
I created a function with two variables, x and y, and it returns a continuous numeric value. I wanted to plot in a single screen the result of this function with certain values of x and all possible values of y within a given range (y is also a continuous variable).
Something like that:
These two questions did help a little, but I still can't get there:
Plotting a function curve in R with 2 or more variables
How to plot function of multiple variables in R by initializing all variables but one
I also used the mosaic package and plotFun function, but the results were rather unappealing and not very readable: https://www.youtube.com/watch?v=Y-s7EEsOg1E.
Maybe the problem is my lack of proficiency with R - though I've been using it for months so I'm not such a noob. Please enlighten me.
Say we have a simple function with two arguments:
fun <- function(x, y) 0.5*x - 0.01*x^2 + sqrt(abs(y)/2)
And we want to evaluate it on the following x and y values:
xs <- seq(-100, 100, by=1)
ys <- c(0, 100, 300)
This line below might be a bit hard to understand but it does all of the work:
res <- mapply(fun, list(xs), ys)
mapply lets us call a function of several arguments across a range of values. Here we give it only one value for the "x" argument (note that xs is a long vector, but since it is wrapped in a list it counts as a single element, which mapply recycles). We also give it multiple values for the "y" argument. So the function runs 3 times, each time with the same x vector and a different value of y.
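To see what that call returns, here is a small check. It is only a sketch: res2 is a name introduced here, and the outer() comparison assumes fun is vectorised over both arguments, which holds for the arithmetic in this particular function.

```r
fun <- function(x, y) 0.5*x - 0.01*x^2 + sqrt(abs(y)/2)
xs <- seq(-100, 100, by = 1)
ys <- c(0, 100, 300)

# One call per y value; each call receives the whole xs vector
res <- mapply(fun, list(xs), ys)
dim(res)   # rows correspond to xs, columns to ys

# Because fun is vectorised, outer() should build the same matrix
res2 <- outer(xs, ys, fun)
all.equal(res, res2)
```

If fun were not vectorised, outer() would not be a safe replacement and mapply (or Vectorize) would be the way to go.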
Results are arranged column-wise so in the end we have 3 columns. Now we only have to plot:
cols <- c("black", "cornflowerblue", "orange")
matplot(xs, res, col=cols, type="l", lty=1, lwd=2, xlab="x", ylab="result")
legend("bottomright", legend=ys, title="value of y", lwd=2, col=cols)
Here the matplot function does all the work - it plots a line for every column in the provided matrix. Everything else is decoration.
Here is the result:
I'd like to plot a dataset that consists of two vectors of length 100. The mean difference of the vectors being high and the variance of each of them being considerably smaller, it is quite difficult to plot both vectors and still be able to see the variation within each vector.
What I'd like is to be able to manually set the breaks so that we could see both the difference between the vectors and the variation within them.
Consider this data set
a=rnorm(100,sd=0.005)+1
b=rnorm(100,sd=0.005)+10
vec = c(a,b)
Neither plot(vec) nor plot(vec,log="y") gives satisfying results, as it is not possible to distinguish the variation within the vector (see picture).
I'd like the breaks on the y-axis to be (min(a), max(a), 5, min(b), max(b)) (and get equal distance between them). How could one achieve that?
Depending on exactly what you are trying to do, a simple transformation of the data in each part of the vector might be enough:
vec2 <- c( (a - min(a))/ (max(a)-min(a)) , 3 + (b - min(b))/ (max(b)-min(b)) )
plot(vec2, axes=F)
box()
axis(1)
axis(2, at=c(0,1,2,3,4), labels = round(c(min(a), max(a), 5, min(b), max(b)),2))
Alternative approaches might be a custom transformation in ggplot, a secondary axis in ggplot, breaking the graph into facets, or using ggbreak.
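For the "facets" idea, a base-graphics version could look roughly like the sketch below. It stacks two panels, each with its own y-axis, instead of using ggplot; the fixed seed and the margin values are assumptions made only so the example is reproducible and compact.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility
a <- rnorm(100, sd = 0.005) + 1
b <- rnorm(100, sd = 0.005) + 10

# Two stacked panels, each scaled to its own vector
op <- par(mfrow = c(2, 1), mar = c(2, 4, 1, 1))
plot(b, ylab = "b")   # upper panel: values around 10
plot(a, ylab = "a")   # lower panel: values around 1
par(op)               # restore the previous graphics settings
```

The gap between the vectors is then read from the axis labels rather than from the distance on the page.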
Here is some data to work with.
df <- data.frame(x1=c(234,543,342,634,123,453,456,542,765,141,636,3000),x2=c(645,123,246,864,134,975,341,573,145,468,413,636))
If I plot these data, it will produce a simple scatter plot with an obvious outlier:
plot(df$x2,df$x1)
Then I can always write the code below to remove the y-axis outlier(s).
plot(df$x2,df$x1,ylim=c(0,800))
So my question is: Is there a way to exclude obvious outliers in scatterplots automatically? Like outline=F would do if I were to plot, say, boxplots. To my knowledge, outline=F doesn't work with scatterplots.
This is relevant because I have hundreds of scatterplots and I want to exclude all obvious outlying data points without setting ylim(...) for each individual scatterplot.
You could write a function that returns the index of what you define as an obvious outlier. Then use that function to subset your data before plotting.
Here all observations with "a" exceeding 5 * median of "a" are excluded.
df <- data.frame(a = c(1,3,4,2,100), b=c(1,3,2,4,2))
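f <- function(x){
  # TRUE for every row where "a" exceeds 5 times the median of "a"
  x$a > 5*median(x$a)
}
# Logical indexing still works when nothing is flagged; df[-which(...),] would then drop every row
with(df[!f(df),], plot(b, a))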
There is no easy yes/no option to do what you are looking for (the question of defining what is an "obvious outlier" for a generic scatterplot is potentially quite problematic).
That said, it should not be too difficult to write a reasonable function to give y-axis limits from a set of data points. If we take "obvious outlier" to mean a point with y value significantly above or below the bulk of the sample (which could be justified assuming a sufficient distribution of x values), then you could use something like:
ybounds <- function(y){ # y is the response variable in the dataframe
  bounds = quantile(y, probs=c(0.05, 0.95), type=3, names=FALSE)
  return(bounds + c(-1,1) * 0.1 * (bounds[2]-bounds[1]) )
}
Then plot each dataframe with plot(df$x, df$y, ylim=ybounds(df$y))
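As a rough usage sketch with the data from the question (columns x1 and x2): the helper below is written so that it reads its own argument y, and yl is just a name introduced here for the computed limits.

```r
df <- data.frame(x1 = c(234,543,342,634,123,453,456,542,765,141,636,3000),
                 x2 = c(645,123,246,864,134,975,341,573,145,468,413,636))

# y-limits from the 5% and 95% quantiles, padded by 10% of their range
ybounds <- function(y) {
  bounds <- quantile(y, probs = c(0.05, 0.95), type = 3, names = FALSE)
  bounds + c(-1, 1) * 0.1 * (bounds[2] - bounds[1])
}

yl <- ybounds(df$x1)
plot(df$x2, df$x1, ylim = yl)   # the point at 3000 falls outside the limits
```

The same call can be wrapped in a loop or lapply over a list of data frames when there are many plots to produce.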
I am attempting to plot discrete functions in R for a flow model equation. I have to plot the original function u(x) = tanh(x - 0.1), with u(x) on the Y-axis and x on the X-axis. I then must plot a discrete function that describes the slope.
u <- array(0,dim=c(21))
#Plot the original function u(x)=tanh(ax-x0)
curve(tanh(x-0.1), from=0, to=5, n=100, col="red", xlab="x", ylab = "u(x)")
grid (NULL,NULL, col = "lightgray", lty="dotted")
x = seq(0, 5, by=0.25)
for (i in 1:21){
u[i] = tanh(x[i]-0.1)
}
x1 = seq(0, 4.75, by=0.25)
du1 <- array(0,dim=c(20))
for (i in 1:20){
du1[i] = (u[i+1]-u[i])/0.25
}
plot(x1, du1, xlab = "x", ylab = "du/dx")
So per the definition of my derivative function, my du/dx vector will only have 20 vector points, but my x vector still has 21 points. I must then repeat giving defined du/dx vectors that have 19 and 18 vector points. Is there any way I can plot the du/dx vs. x functions all on the same graph without having to redefine x every time?
I'm not sure I'm totally clear on what you're asking, but here's code that saves you from writing out 18 individual code blocks (using the "diff" function in base).
derivs <- matrix(NA, nrow=21, ncol=18)
x <- seq(0, 5, by=0.25)
orig <- tanh(x-0.1)
derivs[,1] <- c(diff(orig)/.25, NA)
for(col in 2:18) {
print(col)
derivs[,col] <- c((diff(derivs[,col-1])/.25), NA)
}
The resulting matrix (here called "derivs") has a column for each derivative (first column is the first derivative, second is the second derivative, etc.).
One reason I'm a bit confused about what you're trying for is that, if you were to plot all these on one graph, it would be a really weird graph, because the orders of magnitude are really different between the first few and the last few derivatives.
The dimensions aren't really different for each derivative; I've simply padded it with NAs, which won't appear on a graph.
Also note that you can use the diff function to get second-order differences and so forth.
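For instance, a minimal sketch of the second-order case using the differences argument of diff (h, d1 and d2 are names introduced here for the step size and the two estimates):

```r
x <- seq(0, 5, by = 0.25)
h <- 0.25
orig <- tanh(x - 0.1)

d1 <- diff(orig) / h                       # first derivative estimate, 20 values
d2 <- diff(orig, differences = 2) / h^2    # second derivative estimate, 19 values

# Same numbers as differencing the first derivative once more
all.equal(d2, diff(d1) / h)
```

Unlike the padded matrix above, these vectors get one element shorter with each order.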
PS. The graph will probably look more reasonable if, rather than assigning each difference to the first x value of its interval (as you did, and as I did to emulate you), you center it. E.g. every other derivative would actually be plotted at .125, .375, etc.
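To make the centering point concrete for the first derivative, here is a sketch that compares both placements against the exact derivative of tanh(x - 0.1), which is 1 - tanh(x - 0.1)^2. The names x_mid, exact, err_left and err_mid are introduced here for illustration.

```r
x <- seq(0, 5, by = 0.25)
h <- 0.25
u <- tanh(x - 0.1)
du <- diff(u) / h                      # 20 finite differences

x_left <- head(x, -1)                  # 0, 0.25, ...  (left end of each interval)
x_mid  <- x_left + h / 2               # 0.125, 0.375, ... (interval midpoints)

exact <- function(x) 1 - tanh(x - 0.1)^2
err_left <- max(abs(du - exact(x_left)))
err_mid  <- max(abs(du - exact(x_mid)))

# Plot the centered estimate on top of the exact derivative
curve(exact(x), from = 0, to = 5, col = "red", xlab = "x", ylab = "du/dx")
points(x_mid, du)
```

Comparing err_left and err_mid shows how much closer the centered placement sits to the true curve.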
I want to break the x-axis of a plot of a cumulative distribution function for which I use the function plot.stepfun, but don't seem to be able to figure out how.
Here's some example data:
set.seed(1)
x <- sample(seq(1,20,0.01),300,replace=TRUE)
Then I use the function ecdf to get the empirical cumulative distribution function of x:
x.cdf <- ecdf(x)
And I change the class of x.cdf to stepfun, because I prefer to call plot.stepfun directly over using plot.ecdf (which also uses plot.stepfun, but has fewer possibilities to customize the plot).
class(x.cdf) <- "stepfun"
Then I am able to create a plot as follows:
plot(x.cdf, do.point=FALSE)
But now I want to break up the x-axis between 12 and 20, e.g. using axis.break [plotrix-library] such as here, but since I have no ordinary x and y-argument for plotting, I don't know how to do this.
Any help would be very much appreciated!
"Breaking the axis between 12 and 20" doesn't make a lot of sense to me since 20 is the end of the x range, so I will exemplify breaking it between 12 and 15. The axis.break function from plotrix doesn't actually do very much (as can be seen if you step through that example). All it does is put a couple of slashes at a particular location, the "breakpos". All the rest of the work needs to be done with regular plotting functions, and plot.stepfun isn't really set up to do it, so I'm using regular plot.default with the type="s" argument. You need to do the offsetting yourself in three places: the x values, the arguments to the ecdf function, and the labels in the axis arguments.
library(plotrix) # provides axis.break
png()
plot( c(seq(1,12,0.1), seq(15,20,0.1)-3), # Supply the range, shifted
      x.cdf(c(seq(1,12,0.1), seq(15,20,0.1))), # calc domain values, not shifted
      type="s", xaxt="n", xlab="X", ylab="Quantile")
axis(1, at=c( 1:12, (16:20)-3), labels=c(1:12, (16:20)) ) #shift x's, labels unshifted
axis.break(breakpos=12)
dev.off()
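The offsetting can also be wrapped in a small helper so the shift is written only once. This is just a sketch: shift_x and its arguments lo and hi are hypothetical names, not part of plotrix, and the helper is only meant for values outside the removed gap.

```r
# Hypothetical helper: values above the gap (lo, hi) move left by its width
shift_x <- function(x, lo = 12, hi = 15) ifelse(x > lo, x - (hi - lo), x)

set.seed(1)
x <- sample(seq(1, 20, 0.01), 300, replace = TRUE)
x.cdf <- ecdf(x)

grid_x <- c(seq(1, 12, 0.1), seq(15, 20, 0.1))   # true x values, gap left out
plot(shift_x(grid_x), x.cdf(grid_x), type = "s", xaxt = "n",
     xlab = "X", ylab = "Fn(x)")

ticks <- c(1:12, 16:20)
axis(1, at = shift_x(ticks), labels = ticks)     # shifted positions, true labels
```

The slashes from axis.break(breakpos=12) can then be added on top exactly as above.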
I was wondering if it was possible to graph three lines in R using functions. For instance, how could I get the functions:
3x+1
4x+2
x+1
to show up on the same graph in R?
First decide the bounds, say 0 to 100, and make an empty plot including those points:
plot(c(0,100), c(0,100), type="n")
possibly of course with optional parameters such as axes=, xlab=, ylab=, and so on, to control various details of the axes and titling/labeling; then, add each line with abline(a, b) where b is the slope and a is the intercept, so, in your examples:
abline(1, 3)
abline(2, 4)
abline(1, 1)
Of course there are many more details you can control such as color (col= optional parameter), line type (lty=) and width (lwd=), etc, but this is the gist of it.
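Putting those options together, a sketch might look like this. The colours, the line types and the choice of y-limits are arbitrary picks for illustration, and a, b, cols and ymax are names introduced here.

```r
# Intercepts (a) and slopes (b) of 3x+1, 4x+2 and x+1
a <- c(1, 2, 1)
b <- c(3, 4, 1)
cols <- c("black", "cornflowerblue", "orange")

xlim <- c(0, 100)
ymax <- max(a + b * xlim[2])   # highest value any line reaches at the right edge

# Empty plot that is tall enough for all three lines
plot(xlim, c(0, ymax), type = "n", xlab = "x", ylab = "y")
for (i in seq_along(a)) abline(a[i], b[i], col = cols[i], lty = i, lwd = 2)
legend("topleft", legend = c("3x+1", "4x+2", "x+1"),
       col = cols, lty = 1:3, lwd = 2)
```

Computing ymax from the lines keeps the steepest one from running off the top of the plot.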
You can also use the curve function. For example:
curve(3*x+1, from=-5, to=5)
curve(4*x+2, add=T)
curve(x+1, add=T)
Here the add parameter causes the plots to be put on the same graph.
Here's another way using matplot:
x <- 0:10
matplot(cbind(x, x, x), cbind(3*x+1, 4*x+2, x+1),
        type='l', xlab='x', ylab='y')
matplot(X, Y, ...) takes two matrix arguments. Each column of X is plotted against the corresponding column of Y.
In our case, X is an 11 x 3 matrix with each column a sequence of 0 to 10 (our x-values for each line). Y is an 11 x 3 matrix with each column computed from the x vector (per your line equations).
xlab and ylab just label the x and y axes. The type='l' specifies that lines are to be drawn (see other options by typing ?matplot or ?plot at the R prompt).
One nice thing about matplot is that the defaults can be nice for plotting multiple lines -- it chooses different colors and styles per line. These can also be modified: see ?matplot (and lty for more detail).