Line graph in R

I have a dataset like this:
maximum <- c(10) #for each time
minim <- c(2) #for each time
Quantity <- c(4, 2, 10, 2, 10, 6, 2)
How can I structure my dataset to create a line graph like this? Time advances by one constant step between consecutive points, except when the value goes from 2 to 10 (those two points are at the same instant).

Using base R plot functions, just create a Time vector (x-axis):
maximum <- c(10) #for each time
minim <- c(2) #for each time
Time <- c(1, 2, 2, 3, 3, 4, 5)
Quantity <- c(4, 2, 10, 2, 10, 6, 2)
plot(Time, Quantity, type = "l", col = "lightblue", xaxt = "n")
abline(h = maximum, col = "darkblue")
abline(h = minim, col = "orange")
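If the goal is to structure the data rather than type Time by hand, one option is to keep Time and Quantity together in a data frame and derive Time from the rule stated in the question. The helper make_time() below is hypothetical (it is not part of base R) and assumes the only same-instant jumps are from minim straight to maximum; it reproduces the hand-written Time vector and also adds back an x-axis with one tick per instant, since xaxt = "n" suppresses the default one.

```r
maximum <- 10
minim <- 2
Quantity <- c(4, 2, 10, 2, 10, 6, 2)

# Hypothetical helper: time advances by one step, except when the value jumps
# straight from `lo` to `hi`, in which case both points share the same instant
make_time <- function(quantity, lo, hi) {
  n <- length(quantity)
  same_instant <- quantity[-n] == lo & quantity[-1] == hi
  cumsum(c(1, ifelse(same_instant, 0, 1)))
}

Time <- make_time(Quantity, lo = minim, hi = maximum)
print(Time)  # 1 2 2 3 3 4 5

dat <- data.frame(Time, Quantity)  # one row per plotted point
plot(dat$Time, dat$Quantity, type = "l", col = "lightblue", xaxt = "n",
     xlab = "Time", ylab = "Quantity")
axis(1, at = unique(dat$Time))  # label each distinct instant once
abline(h = maximum, col = "darkblue")
abline(h = minim, col = "orange")
```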


Multiply probability distributions in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 months ago.
I'm trying to multiply some probability density functions in order to update the probability given certain factors. I've tried several things using the pdqr and bayesmeta packages, but none of them work out the way I intend. What am I missing?
Below is a reproducible example with two different distributions, a and b, which I want to multiply. I want to multiply them because, as you can see, b has no measurements at the low values, so its probability there is 0. This should be reflected in the updated distribution.
library(tidyverse)
library(pdqr)
library(bayesmeta)
#measurements
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)
#create probability distribution functions
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
#try to combine distributions
summarized <- distr_a + distr_b
multiplied <- distr_a * distr_b
mixture <- form_mix(list(distr_a, distr_b))
convolution <- convolve(distr_a, distr_b)
The resulting PDFs are plotted like this:
bayesmeta::convolve() gives the same result as adding the two pdqr PDFs: both oddly shift the distribution to the right and make it lower than it is supposed to be.
Simply multiplying the pdqr PDFs gives a very low probability overall.
pdqr::form_mix() seems to average the PDFs out, but it leaves probabilities above 0 for the lower x-values.
To get some insight into what I want, I used the PDFs of a and b to compute densities at each x value and multiplied them:
#multiply distributions manually
x <- 1:10
manual <- data.frame(x) %>%
  mutate(a = distr_a(x),
         b = distr_b(x),
         multiplied = a * b)
This indeed gives the shape I am after, but (logically) the probabilities are too low:
I would like to multiply (multiple) PDFs. What am I doing wrong? Are my statistics wrong, or am I missing a useful function?
UPDATE:
It seems I am a stats noob on this subject, but I would like to achieve something like the distribution below. Given that both situations a and b are true, I would expect the distribution to be something like the dotted line. Is that possible?
multiplied is the correct one. One can check with log-normal distributions: the product of two independent log-normal random variables is log-normal with μ = μ_a + μ_b and σ² = σ²_a + σ²_b.
library(pdqr)
a <- rlnorm(25000, meanlog = 0, sdlog = 1)
b <- rlnorm(25000, meanlog = 1, sdlog = 1)
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
distr_ab <- form_trans(
list(distr_a, distr_b), trans = function(x, y) x*y
)
# or: distr_ab <- distr_a * distr_b
plot(distr_ab, xlim = c(0, 40))
curve(dlnorm(x, meanlog = 1, sdlog = sqrt(2)), add = TRUE, col = "red")
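The same claim can be checked without pdqr. A minimal base-R simulation sketch (the seed and the sample size of 1e5 are arbitrary choices): the log of the product a * b is log(a) + log(b), a sum of independent normals, so its mean should be close to 0 + 1 = 1 and its standard deviation close to sqrt(1 + 1), which is the meanlog and sdlog used in the curve() call above.

```r
set.seed(123)
a <- rlnorm(1e5, meanlog = 0, sdlog = 1)
b <- rlnorm(1e5, meanlog = 1, sdlog = 1)

# log(a * b) = log(a) + log(b): a sum of two independent normal variables
log_prod <- log(a * b)
mean(log_prod)  # close to 0 + 1 = 1
sd(log_prod)    # close to sqrt(1^2 + 1^2), about 1.414
```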
As demonstrated here:
https://www.r-bloggers.com/2019/05/bayesian-models-in-r-2/
# Example distributions
probs <- seq(0,1,length.out= 100)
prior <- dbinom(x = 8, prob = probs, size = 10)
lik <- dnorm(x = probs, mean = .5, sd = .1)
# Multiply distributions
unstdPost <- lik * prior
# If you wanted to get an actual posterior, it must be a probability
# distribution (sum to 1 over the grid), so we can divide by the sum:
stdPost <- unstdPost / sum(unstdPost)
# Plot
plot(probs, prior, col = "black",
     type = "l", xlab = "P(Black)", ylab = "Density")
lines(probs, lik / 15, col = "red") # rescaled to fit the plot
lines(probs, unstdPost, col = "green")
lines(probs, stdPost, col = "blue")
legend("topleft", legend = c("Prior", "Lik (rescaled)", "Unstd Post", "Post"),
       text.col = c("black", "red", "green", "blue"), bty = "n")
Created on 2022-08-06 by the reprex package (v2.0.1)
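The same normalization step can be applied to the measurements from the question. A minimal base-R sketch, assuming Gaussian kernel density estimates from density() rather than pdqr's new_d(), evaluated on a shared grid from 0 to 12 (the grid limits and size are arbitrary choices): the pointwise product has the desired shape, and dividing by its Riemann-sum integral rescales it to a proper density. Because the kernel is Gaussian, b's density at low x is small but not exactly 0.

```r
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)

# Evaluate both kernel density estimates on the same grid
da <- density(a, from = 0, to = 12, n = 512)
db <- density(b, from = 0, to = 12, n = 512)
dx <- da$x[2] - da$x[1]             # grid spacing

unnorm <- da$y * db$y               # pointwise product: right shape, wrong scale
post <- unnorm / sum(unnorm * dx)   # rescale so the curve integrates to 1

sum(post * dx)                      # 1 (Riemann sum over the grid)

plot(da$x, post, type = "l", col = "blue", xlab = "x", ylab = "Density")
lines(da, col = "black")
lines(db, col = "red")
legend("topleft", legend = c("a", "b", "normalized a * b"),
       col = c("black", "red", "blue"), lty = 1, bty = "n")
```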

Standard deviation (and percentiles) of multiple arrays of the same dimension in R

I have several different arrays of the same dimension. Is there a way to find the element-wise standard deviation, mean, and some percentiles across all the arrays? My final result should be one array with the same dimension as each of the individual arrays.
I tried the following, but it clearly doesn't work:
m1 <- array(runif(8), dim = c(2, 2, 2))
m2 <- array(runif(8), dim = c(2, 2, 2))
m3 <- array(runif(8), dim = c(2, 2, 2))
sd(m1, m2, m3)
Consider combining the arrays into a single 4-D array and using apply() over the first three dimensions to get the sd:
out <- apply(array(c(m1, m2, m3), dim = c(2, 2, 2, 3)), c(1, 2, 3), sd)
Checking the output:
> sd(c(m1[1], m2[1], m3[1]))
[1] 0.1623589
> out[1]
[1] 0.1623589
Use the same method for the mean:
out2 <- apply(array(c(m1, m2, m3), dim = c(2, 2, 2, 3)), c(1, 2, 3), mean)
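The question also asks for percentiles. A minimal sketch using the same stacking approach: quantile() returns one value per requested probability, so apply() puts those values in a new first dimension, which can then be indexed to get an array shaped like the inputs. The seed and the probabilities 0.1, 0.5 and 0.9 are arbitrary choices.

```r
set.seed(1)
m1 <- array(runif(8), dim = c(2, 2, 2))
m2 <- array(runif(8), dim = c(2, 2, 2))
m3 <- array(runif(8), dim = c(2, 2, 2))

# The 4th dimension indexes the original arrays
stacked <- array(c(m1, m2, m3), dim = c(dim(m1), 3))

# quantile() returns 3 values per cell, so apply() adds a new FIRST dimension
q <- apply(stacked, c(1, 2, 3), quantile, probs = c(0.1, 0.5, 0.9))
dim(q)  # 3 2 2 2

q90 <- q[3, , , ]  # 90th percentile, same shape as m1
```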

How to customize color and scale of y axis in each multiple plot using plot.zoo?

This is a reproducible example of my data:
library(zoo)
dat <- data.frame(
  prec = rnorm(650, mean = 300),
  temp = rnorm(650, mean = 22),
  pet = rnorm(650, mean = 79),
  bal = rnorm(650, mean = 225))
dat<-ts(dat,start = c(1965,1),frequency = 12)
#splines
fit1<-smooth.spline(time(dat),dat[,1],df=25)
fit2<-smooth.spline(time(dat),dat[,2],df=25)
fit3<-smooth.spline(time(dat),dat[,3],df=25)
fit4<-smooth.spline(time(dat),dat[,4],df=25)
dat2 <- cbind(dat, fitted(fit1), fitted(fit2), fitted(fit3), fitted(fit4))
plot.zoo(window(dat2, start = 1965), xlab = "", screen = 1:4,
col = c(1:4, 1, 2, 3, 4),yax.flip = TRUE, bty="n")
How can I modify the color and the scale of the y axis in each panel to match the color of its time series?
Create dat2, which contains both the series and the smoothing splines; use window() to start it at 1965; specify in screen = 1:4 that the columns go into panels 1 to 4 (it is recycled for the last 4 columns); and make the last 4 columns black, i.e. 1, or modify the colors to suit.
dat2 <- cbind(dat, fitted(fit1), fitted(fit2), fitted(fit3), fitted(fit4))
plot.zoo(window(dat2, start = 1965), xlab = "", screen = 1:4,
col = c(1:4, 1, 1, 1, 1))
Regarding the comment: to me it seems easier to read if the ticks, labels and axes are black, but if you want to do that anyway, use the mfrow= graphical parameter with a for loop and specify col.axis and col.lab in the plot.zoo call:
nc <- ncol(dat)
cols <- 1:nc # specify desired colors
opar <- par(mfrow = c(nc, 1), oma = c(6, 0, 5, 0), mar = c(0, 5.1, 0, 2.1))
for(i in 1:nc) {
dat1965 <- window(dat[, i], start = 1965)
plot(as.zoo(dat1965), col = cols[i], ylab = colnames(dat)[i], col.axis = cols[i],
col.lab = cols[i])
fit <- smooth.spline(time(dat1965), dat1965, df = 25)
lines(cbind(dat1965, fitted(fit))[, 2]) # coerce fitted() to ts
}
par(opar)
mtext("4 plots", line = -2, font = 2, outer = TRUE)
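col.axis colours only the tick labels; the axis line and ticks keep the default colour. A minimal sketch of one way to colour those as well, assuming base ts plotting instead of plot.zoo (a swapped-in technique) and random data like the question's: each panel is drawn with axes = FALSE, then axis(2, col = , col.axis = ) adds a y axis whose line, ticks and labels all use the series colour. The colour vector cols is an arbitrary choice.

```r
set.seed(1)
dat <- ts(cbind(prec = rnorm(650, mean = 300), temp = rnorm(650, mean = 22),
                pet = rnorm(650, mean = 79), bal = rnorm(650, mean = 225)),
          start = c(1965, 1), frequency = 12)
cols <- c("black", "red", "green3", "blue")  # one colour per series

opar <- par(mfrow = c(ncol(dat), 1), oma = c(6, 0, 5, 0), mar = c(0, 5.1, 0, 2.1))
for (i in seq_len(ncol(dat))) {
  x <- dat[, i]
  # Draw the series without default axes so the y axis can be added in colour
  plot(x, col = cols[i], axes = FALSE, xlab = "", ylab = "")
  box()
  # col colours the axis line and ticks, col.axis the tick labels
  axis(2, col = cols[i], col.axis = cols[i], las = 1)
  mtext(colnames(dat)[i], side = 2, line = 3.5, col = cols[i])
  # Overlay the smoothing spline evaluated at the series' own time points
  tt <- as.numeric(time(x))
  fit <- smooth.spline(tt, as.numeric(x), df = 25)
  lines(predict(fit, tt))
}
axis(1)  # time axis under the last panel only
par(opar)
```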

Looping with ggplot: how to use the for loop variable i inside the loop

I have a data frame (its exact contents are not important).
I am using it to plot some point curves, like below:
#EXP <- 3 (example)
#EXP_VEC <- c(1:EXP)
for (i in 1:EXP)
{
gg2_plot[i] <- ggplot(subset(gg2,Ei == EXP_VEC[i] ),aes(x=hours, y=variable, fill = Mi)) + geom_point(aes(fill = Mi,color = Mi),size = 3)
}
As you can see, EXP_VEC = c(1, 2, 3, ...) depending on user input (e.g. if the user inputs 2, then EXP_VEC = c(1, 2)).
The data frame has Ei = 1, 2, 3, 4, ...
Now I have to do the plotting for all these Ei values depending on the user input.
Consider EXP = 3, i.e. EXP_VEC = c(1, 2, 3).
Now the for loop should produce three plots, for Ei = 1, Ei = 2 and Ei = 3.
If the for loop I have written worked, I would be done.
But the for loop is not working. I can't use aes_string because the variable i is outside aes().
For example, consider the following dataset:
dd<-data.frame(
Ei = c(1L, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
Mi = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
hours = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
variable = c(0.1023488, 0.1254325, 0.1523245, 0.1225425, 0.1452354,
0.1853324, 0.1452369, 0.1241241, 0.0542232, 0.8542154, 0.021542,
0.2541254))
As you can see, I have two values of Ei. I want to make the first plot for Ei = 1 and then, beside it, a second plot for Ei = 2.
So I thought of saving the plots for Ei = 1 and Ei = 2 in two separate variables and then combining them with some kind of arrangement function, which I have yet to find.
How do I do it?
Is there an easy way to do this using just ggplot without any loop?
If not, how can I use the value of i inside my for loop?
I would do something like this:
plot_exp <- function(i) {
  dat <- subset(gg2, Ei == i)
  if (nrow(dat) > 0)
    ggplot(dat, aes(x = hours, y = variable, fill = Mi)) +
      geom_point(aes(color = Mi), size = 3)
}
ll <- lapply(seq_len(EXP), plot_exp)
ll is a list of ggplot objects (NULL for any i with no matching rows).
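On the question of avoiding the loop: a sketch using ggplot2's facet_wrap(), which draws one panel per Ei side by side, with a corrected loop for comparison. It assumes the dd example data from the question stands in for gg2, EXP <- 2 is a hypothetical user input, and Mi is mapped to colour as a factor (an assumption, so the legend shows discrete groups). In the loop, each plot is stored with [[i]] in a pre-allocated list; with [i], as in the question's loop, R tries to fit the whole plot object into a single list slot, which does not store it as intended.

```r
library(ggplot2)

dd <- data.frame(
  Ei = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Mi = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  hours = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  variable = c(0.1023488, 0.1254325, 0.1523245, 0.1225425, 0.1452354,
               0.1853324, 0.1452369, 0.1241241, 0.0542232, 0.8542154,
               0.021542, 0.2541254))
EXP <- 2  # hypothetical user input

# Loop version: pre-allocate a list and store each plot with [[i]]
plots <- vector("list", EXP)
for (i in seq_len(EXP)) {
  plots[[i]] <- ggplot(subset(dd, Ei == i),
                       aes(x = hours, y = variable, color = factor(Mi))) +
    geom_point(size = 3)
}

# Loop-free version: one plot with a panel per Ei, side by side
p <- ggplot(subset(dd, Ei %in% seq_len(EXP)),
            aes(x = hours, y = variable, color = factor(Mi))) +
  geom_point(size = 3) +
  facet_wrap(~ Ei, nrow = 1)
print(p)
```

Because subset() runs when ggplot() is called and aes() refers only to data columns, the loop does not depend on i being evaluated lazily.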

Line in R plot should start at a different timepoint

I have the following example data set:
date<-c(1,2,3,4,5,6,7,8)
valuex<-c(2,1,2,1,2,3,4,2)
valuey<-c(2,3,4,5,6)
Now I plot date against valuex:
plot(date,valuex,type="l")
Now I want to add a line for the valuey variable, but it should start on the 4th day rather than at the beginning, so I pad it with NA values:
valuexmod<-c(rep(NA,3),valuex)
and I add the line with:
lines(date,valuexmod,type="l",col="red")
But this does not work: R seems to ignore the NA values, and the valuexmod line starts on the first day instead of the 4th day. Why?
Given that date and valuex have the same length, I assume you have a typo above and meant to pad valuey rather than valuex.
Try this instead:
date <- c(1, 2, 3, 4, 5, 6, 7, 8)
valuex <- c(2, 1, 2, 1, 2, 3, 4, 2)
valuey <- c(2, 3, 4, 5, 6)
valueymod <- c(rep(NA, 3), valuey)
plot(date, valuex, type = "l", ylim = range(c(valuex, valuey)))
lines(date, valueymod, type = "l", col = "red")
Here's the resulting plot:
Related to your question is a point made in help("lines")...
The coordinates can contain NA values. If a point contains NA in either its x or y value, it is omitted from the plot, and lines are not drawn to or from such points. Thus missing values can be used to achieve breaks in lines.
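An alternative that avoids padding is to pass lines() only the x values that valuey covers. A minimal sketch, where start is a hypothetical variable holding the first day of valuey; the second lines() call (with made-up values in valuegap) illustrates the help("lines") note above, since the NA inside the series splits it into two separate segments.

```r
date <- c(1, 2, 3, 4, 5, 6, 7, 8)
valuex <- c(2, 1, 2, 1, 2, 3, 4, 2)
valuey <- c(2, 3, 4, 5, 6)
start <- 4  # hypothetical: day on which valuey begins

plot(date, valuex, type = "l", ylim = range(c(valuex, valuey + 1)))

# Without NA padding: use only the days that valuey covers
ydays <- date[start:(start + length(valuey) - 1)]
lines(ydays, valuey, col = "red")

# An NA in the middle breaks the line into two segments
valuegap <- c(3, 4, NA, 6, 7)
lines(ydays, valuegap, col = "blue", lty = 2)
```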
