How to fit an appropriate smooth curve over bar plot/histogram?

My imported data set consists of predetermined ranges and their probability density values. I have plotted it as a bar chart in R, so the plot looks like a histogram, but to R it's just a bar plot. I now need to overlay a smooth curve on this bar chart for visualization purposes, using the same data.
The code I have used so far produces a strange-looking curve that doesn't line up with the bars. Any help would be hugely appreciated!
Code used so far:
barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")
fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = 0.2)
lines(fit1,col="blue", lwd=3)
Data:
Data10 <- structure(list(
ï..Weight.Range = c(0, 0.5, 1, 1.5, 2, 2.5, 3,
3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5,
11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17,
17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21, 21.5, 22, 22.5, 23, 23.5,
24, 24.5, 25, 25.5, 26, 26.5, 27, 27.5, 28, 28.5, 29, 29.5, 30,
30.5, 31, 31.5, 32, 32.5, 33, 33.5, 34, 34.5, 35, 35.5, 36, 36.5,
37, 37.5, 38, 38.5, 39, 39.5, 40, 40.5, 41, 41.5, 42, 42.5, 43,
43.5, 44, 44.5, 45, 45.5, 46, 46.5, 47, 47.5, 48), pdf = c(0.012697609,
0.015237131, 0.017776653, 0.019046414, 0.020694512, 0.022575831,
0.024457151, 0.02633847, 0.028219789, 0.030101109, 0.031982428,
0.033863747, 0.035745066, 0.037626386, 0.039507705, 0.041389024,
0.043270343, 0.045151663, 0.042420729, 0.03688759, 0.033198831,
0.029510072, 0.026374627, 0.023976934, 0.02264407, 0.021614794,
0.020585518, 0.019556242, 0.018526967, 0.017497691, 0.016468415,
0.015439139, 0.014409863, 0.013380587, 0.012351311, 0.011322035,
0.009839476, 0.008433837, 0.007731017, 0.007028197, 0.005622558,
0.004919738, 0.004568328, 0.004498046, 0.004427764, 0.004357482,
0.0042872, 0.004216918, 0.004146636, 0.004076354, 0.004006072,
0.00393579, 0.003865508, 0.003795226, 0.003724944, 0.003654663,
0.003584381, 0.003514099, 0.003443817, 0.003373535, 0.003303253,
0.003232971, 0.003162689, 0.003092407, 0.003022125, 0.002951843,
0.002881561, 0.002811279, 0.002740997, 0.002670715, 0.002600433,
0.002530151, 0.002459869, 0.002389587, 0.002319305, 0.002249023,
0.002178741, 0.002108459, 0.002038177, 0.001967895, 0.001897613,
0.001827331, 0.001757049, 0.001686767, 0.001616485, 0.001546203,
0.001475921, 0.001405639, 0.001335357, 0.001265075, 0.001194794,
0.001124512, 0.00105423, 0.000983948, 0.000913666, 0.000843384,
0.000773102)
), class = "data.frame", row.names = c(NA, -97L))

You need to capture the bar positions returned by the initial barplot call and use them as the x coordinates when drawing the line.
my_bar <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")
fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = .2)
lines(my_bar, fit1$y,col="blue",type="l",lwd=3)

The barplot function is meant to be used with a categorical variable, so it treats your x values as categories rather than as continuous numbers. When barplot runs, it calculates an x position (the bar midpoint) for each category and returns those positions invisibly. You can use the returned values together with the result of your smooth spline to draw the line. For example:
xx <- barplot(Data10$pdf, names = Data10$ï..Weight.Range, xlab = "Weight", ylab = "Probability Density", ylim = c(0.00,0.05), main = "Histogram")
fit1<-smooth.spline(Data10$ï..Weight.Range, Data10$pdf, df=12, spar = 0.2)
lines(xx[,1], fit1$y,col="blue", lwd=3)
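
To see what barplot actually returns, here is a minimal sketch with made-up numbers (the vectors weight and dens are illustrative and are not the asker's Data10). It assumes the default bar width and spacing, and shows that the returned midpoints are not the weight values themselves:

```r
# Toy data (hypothetical values, not the question's Data10)
weight <- c(0, 0.5, 1, 1.5)
dens   <- c(0.010, 0.020, 0.030, 0.025)

# barplot() draws the bars and invisibly returns their midpoints
mids <- as.vector(barplot(dens, names.arg = weight,
                          xlab = "Weight", ylab = "Probability Density"))

# With the default width (1) and spacing (0.2) the midpoints are
# 0.7, 1.9, ... rather than the weight values 0, 0.5, ...
print(mids)

# Drawing at the midpoints places the line over the centre of each bar
lines(mids, dens, col = "blue", lwd = 3)
```

This is why passing the original x values to lines() puts the curve in the wrong place: the plot's x axis is in bar coordinates, not in weight units.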

Related

R/Open air Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

I am trying to use the summaryPlot function from the openair package in R. Every time I use it with the following data (the code below recreates it):
structure(list(Fecha = structure(c(1577840400, 1577844000, 1577847600,
1577851200, 1577854800, 1577858400, 1577862000, 1577865600, 1577869200,
1577872800, 1577876400, 1577880000, 1577883600, 1577887200, 1577890800,
1577894400, 1577898000, 1577901600, 1577905200, 1577908800, 1577912400,
1577916000, 1577919600, 1577923200, 1577926800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), PM10_CDAR = c(11.4, 8.3, 13.3, 16,
39.5, 35.4, 31, 48.7, 41, 34, 23.3, 16.5, 21.8, 15.7, 17.8, 12.7,
12.8, 16, 11.3, 7.9, 8.1, 10, 10.4, 7.7, 6.1), PM10_KEN = c(49.7,
72.4, 34.5, 50.3, 65.2, 59, 25.5, 19.6, 17.4, 14.3, 48.2, 34.8,
25.3, 56.7, 26, 45.6, 29, 30.5, 24.1, 22, 26.9, 22.2, 17.3, 19.1,
15.5), PM10_LAF = c(28.8, 69, 72.3, 35.1, 82, 44, 69, 73, 46,
43, 29.9, 25.1, 21.4, 15.8, 11.7, 16, 15, 12, 9, 10.8, 10.1,
11.9, 12.9, 12.4, 11.8), PM10_TUN = c(45, 57, 93, 69, 73, 60,
45, 69, 61, 46, 28, 20, 33, 54, 44, 27, 39, 37, 36, 41, 30, 29,
18, 4, 7), PM2.5_CDAR = c(9, 8, 10, 16, 34, 30, 33, 42, 33, 34,
6, 10, 9, 9, 15, 10, 9, 7, 9, 5, 5, 10, 6, 4, 2), PM2.5_KEN = c(49,
81, 110, 83, 63, 59, 79, 68, 84, 76, 48, 19, 22, 34, 36, 33,
29, 19, 13, 22, 3, 16, 16, 6, 9), PM2.5_LAF = c(35, 65, 53, 30,
60, 62, 64, 67, 36, 43, 21, 16, 11, 11, 10, 15, 15, 12, 9, 6,
6, 10, 10, 9, 10), PM2.5_TUN = c(39, 42, 66, 54, 52, 39, 33,
40, 42, 33, 21, 11, 13, 27, 22, 17, 21, 15, 17, 15, 13, 10, 6,
4, 2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-25L))
the following error appears:
> summaryPlot(date.zoo_2, pollutant = "Kennedy_PM10")
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
In addition: Warning messages:
1: In min.default(numeric(0), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
2: In max.default(numeric(0), na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
I have tried everything I could think of to convert the date column into a date-time, for example idx <- as.POSIXct(datos_meterologicos$Fecha) and datos_meterologicos$Fecha <- read.zoo(datos_meterologicos, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC"). Frankly, I don't know what else to do, because the same error still appears.
The whole code is:
date.matrix_2 <- as.data.frame(datos_meterologicos[,-1])
idx_2 <- as.POSIXct(datos_meterologicos$Fecha)
date.xts_2 <- as.xts(date.matrix_2,order.by=idx_2)
date.zoo_2 <- as.zoo(date.xts_2)

Why does lm's fixed intercept not work with poly (raw = FALSE)

Why does a fixed intercept lead to a huge negative shift? See the red line.
From the docs (?poly):
Returns or evaluates orthogonal polynomials of degree 1 to degree over
the specified set of points x: these are all orthogonal to the
constant polynomial of degree 0.
Thus, I would expect the polynomial of degree 0 to be the intercept. What am I missing?
plot(df$t, df$y)
# this is working as expected
model1 <- lm(y ~ -1 + poly(t, 10, raw = TRUE), data = df)
model2 <- lm(y ~ -1 + poly(t, 10, raw = FALSE), data = df)
model3 <- lm(y ~ poly(t, 10, raw = TRUE), data = df) # raw = FALSE gives similar results
nsamples <- 1000
new_df <- data.frame(t = seq(0, 96, length.out = nsamples))
new_df$y1 <- predict(model1, newdata = new_df)
new_df$y2 <- predict(model2, newdata = new_df)
new_df$y3 <- predict(model3, newdata = new_df)
plot(new_df$t, new_df$y1, type = "l", ylim = c(-0.5, 1))
lines(new_df$t, new_df$y2, col = "red")
lines(new_df$t, new_df$y3 + 0.05, col = "blue") # offset added for visibility
lines(c(0, 96), -c(mean(df$y), mean(df$y)), col = "red")
Edit: I think the question is equivalent to "Which orthogonal polynomials are used (formula)?". The reference in the docs is a really old book that I can't get hold of, and there are many different families of orthogonal polynomials; see e.g. Wikipedia.
Data:
df <- structure(list(t = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5,
8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5,
19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5,
30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5,
41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 51.5,
52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5,
63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5,
74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5,
85.5, 86.5, 87.5, 88.5, 89.5, 90.5, 91.5, 92.5, 93.5, 94.5, 95.5),
y = c(0.00561299852289513, 0.0117183653372723, 0.0171836533727228,
0.0234367306745446, 0.0280157557853274, 0.0331856228458887, 0.0391432791728213,
0.0438700147710487, 0.048793697685869, 0.0539635647464303, 0.0586903003446578,
0.0630723781388479, 0.0681437715411128, 0.0732151649433777, 0.0780403741999015,
0.0813884785819793, 0.085425898572132, 0.0896110290497292, 0.0934022648941408,
0.0968980797636632, 0.0996061053668144, 0.103495814869522, 0.107631708517971,
0.111176760216642, 0.115017232890202, 0.119350073855244, 0.124766125061546,
0.131216149679961, 0.139586410635155, 0.148153618906942, 0.156080748399803,
0.166814377154111, 0.177006400787789, 0.189118660758247, 0.202412604628262,
0.217577548005908, 0.234318069916297, 0.249089118660758, 0.267355982274741,
0.284539635647464, 0.301477104874446, 0.316100443131462, 0.332151649433776,
0.346873461349089, 0.361792220580995, 0.376366322008863, 0.392220580994584,
0.408173313638602, 0.424224519940916, 0.439192516001969, 0.454849827671098,
0.471196454948301, 0.485622845888725, 0.500443131462334, 0.514869522402757,
0.529148202855736, 0.544559330379124, 0.559773510585918, 0.576218611521418,
0.593303791235844, 0.609010339734121, 0.623929098966027, 0.6397341211226,
0.655489906450025, 0.669768586903003, 0.68493353028065, 0.698867552929591,
0.713244707040867, 0.726095519448548, 0.74027572624323, 0.752584933530281,
0.76903003446578, 0.781486952240276, 0.794091580502216, 0.804726735598227,
0.818217626784835, 0.832742491383555, 0.845691777449532, 0.856179222058099,
0.866075824716888, 0.875923190546529, 0.886952240275726, 0.896898079763663,
0.906203840472674, 0.915755785327425, 0.923879862136878, 0.932693254554407,
0.940768094534712, 0.949187592319055, 0.956523879862137, 0.964204825209257,
0.971344165435746, 0.978532742491384, 0.986558345642541, 0.993205317577548, 1)),
class = "data.frame", row.names = c(NA, -96L))

Just think about a regression line. For (x, y) data, let xx = mean(x) and yy = mean(y). Fitting
y = b * (x - xx)
is different from fitting
y = a + b * (x - xx)
Here a (the intercept) measures the vertical shift. Furthermore, it can be shown that a = yy, so dropping the intercept shifts the fitted line down by mean(y).
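
The same argument can be checked numerically for poly(raw = FALSE). The following is a minimal sketch on simulated data (x, y and the cubic degree are illustrative assumptions, not the asker's df): the orthogonal polynomial columns sum to zero, the fitted intercept equals mean(y), and removing the intercept shifts every fitted value down by exactly that amount.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.1)

# Columns of the orthogonal basis are orthogonal to the constant,
# i.e. each column sums to (numerically) zero
P <- poly(x, 3)
col_sums <- colSums(P)

with_int <- lm(y ~ poly(x, 3))        # intercept included
no_int   <- lm(y ~ -1 + poly(x, 3))   # intercept removed

intercept <- unname(coef(with_int)[1])
shift     <- unname(fitted(with_int) - fitted(no_int))

print(c(intercept = intercept, mean_y = mean(y)))
print(range(shift))   # constant shift, equal to mean(y)
```

So the degree-0 term is not part of what poly returns; it has to be supplied by the model's intercept.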

Partly change outlier styles in boxplot

Suppose I have the following data set
data <- c(
9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4,
25.2, 31.1, 34.7, 42, 29.1, 32.5, 30.3, 33, 33.8, 41.1, 34.5, 62)
When I draw the boxplot in R with
boxplot(data)
I got three outliers 7.8, 9.5, and 62, that are illustrated in the diagram with three small circles.
Here I want to change the pch of the largest outlier, i.e., 62, to a filled circle, but not that of the two smaller outliers.
The following is what I've tried, but it doesn't work:
boxplot(data, outpch = ifelse(data >= 60, 16, 1))
Is there a way to achieve this?
Thanks

I don't think you can do this directly in the boxplot function, since the outpch parameter in boxplot doesn't take one value per outlier, but we can suppress the default outlier symbols and use the points function to draw the outliers with different symbols.
bp <- boxplot(data, outpch = NA)
with(bp, points(group, out, pch = ifelse(out >=60, 16, 1)))
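
As a cross-check, the outliers can also be computed without drawing anything, using boxplot.stats from grDevices. This is a sketch of the same idea, assuming a single box at x position 1; the variable pch_out is introduced here for illustration:

```r
data <- c(
  9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4,
  25.2, 31.1, 34.7, 42, 29.1, 32.5, 30.3, 33, 33.8, 41.1, 34.5, 62)

# Points beyond 1.5 * IQR from the hinges, as used by boxplot()
out <- boxplot.stats(data)$out
print(out)

# One plotting symbol per outlier: filled circle for values >= 60
pch_out <- ifelse(out >= 60, 16, 1)

# Draw the box without outliers, then add them with individual symbols
boxplot(data, outpch = NA)
points(rep(1, length(out)), out, pch = pch_out)
```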

Manipulate Chart Select Multiple Time-Series

I have created the plot below, so that I can easily compare multiple timeseries over different sections of time. I would next like to add a selector so that I can pick multiple timeseries to view on the plot at the same time, and uncheck the ones I don't want to see on the plot.
code:
y<-series1
r<-series2
s<-series3
require(graphics)
manipulate(
ts.plot(ts(y[x:(x+100)]),ts(r[x:(x+100)]),ts(s[x:(x+100)]),ts(t[x: (x+100)]),ts(h[x:(x+100)]), gpars=list(col = c("red","green","gray")))
,
x=slider(1,length(y)))
data:
dput(series1[1:10])
c(9.5, 9, 14.5, 22.5, 13, 16, 22, 31.5, 51, 43)
dput(series2[1:10])
c(20.3368220204774, 18.0733372276398, 16.61695493123, 15.6798824136643,
15.0769466973063, 14.6890028922692, 14.4393902191708, 14.2787832298018,
14.175444706505, 14.1089541357078)
dput(series3[1:10])
c(17.8189147557743, 22.3815592342001, 16.108169527143, 21.0654757276344,
16.3878646132368, 18.9345933680916, 16.634277276197, 15.4322081636797,
20.2884280389731, 12.2089595405668)

You could do it like this:
y<-c(9.5, 9, 14.5, 22.5, 13, 16, 22, 31.5, 51, 43)
r<-c(20, 18, 17, 16, 15, 15, 14, 14, 14, 14)
require(graphics)
require(manipulate)
manipulate({
lines <- list(if (chk.y) ts(y), if(chk.r) ts(r))
cols <- c(if (chk.y) "red", if (chk.r) "green")
do.call(ts.plot, c(lines, list(gpars=list(col=cols, xlab="t", ylab="y"))))
},
chk.y=checkbox(TRUE, "y"),
chk.r=checkbox(TRUE, "r")
)
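
One thing to watch for with this approach: when a box is unchecked, the list passed to ts.plot contains a NULL entry, which ts.plot may not accept. A possible variant is sketched below; the helper plot_selected and the names all_series and all_cols are hypothetical, introduced only for illustration. It subsets the series before plotting and draws an empty plot when nothing is selected.

```r
y <- c(9.5, 9, 14.5, 22.5, 13, 16, 22, 31.5, 51, 43)
r <- c(20, 18, 17, 16, 15, 15, 14, 14, 14, 14)
all_series <- list(y = y, r = r)
all_cols   <- c(y = "red", r = "green")

# Plot only the series flagged TRUE in `keep`; returns their names invisibly
plot_selected <- function(series, cols, keep) {
  keep <- as.logical(keep)
  if (!any(keep)) {
    plot.new()                       # nothing selected: empty canvas
    return(invisible(character(0)))
  }
  args <- c(unname(lapply(series[keep], ts)),
            list(gpars = list(col = unname(cols[keep]),
                              xlab = "t", ylab = "y")))
  do.call(ts.plot, args)
  invisible(names(series)[keep])
}

shown <- plot_selected(all_series, all_cols, c(TRUE, FALSE))
print(shown)
```

Inside manipulate, the expression would then be plot_selected(all_series, all_cols, c(chk.y, chk.r)), with the same checkbox controls as above.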

Area plot with missing values in base R

I want to draw an area plot in which the base of the polygon is zero and the data line is connected to the base by vertical segments at every break in the data (that is, at the beginning, at the end, and at any NA/NaN values).
To force vertical downward segments where the series is interrupted by NAs, I replaced the NAs with 0s. However, that doesn't produce vertical segments; it produces sloping polygon edges that run to the following 0s. I solved the problem at the beginning and the end of the series by adding a point with y = 0 at each end of the series.
But this doesn't fix the problem when the NAs are inside the series.
Any ideas?
Here is some example code (it produces a different image):
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
plot.ts(y = pollen, x = weeks.vec, col = 'red', ylab = 'Pollen', xlab = 'Weeks', lwd = 3, xy.labels = F, xy.lines = T)
pollen[is.na(pollen)] <- 0
poly.y <- c(0,pollen,0)
poly.x <- c(weeks.vec[1], weeks.vec, weeks.vec[length(weeks.vec)])
polygon(y = poly.y, x = poly.x, density = NA,border = NA, col = rgb(1,0,0, .3))

I'd use ggplot2:
pollen <- c(45, 257.4, 24.67, 54.6, 89.4, 297, 471.25, 1256.5, 312.25, 969.2, 787.5, 425, NaN, 76.6, 42.67, 38.5, 20.2, 5.67, 15.8, 13.2, 11, 6.25, 6.67, 2.3, 0.5, 30.8, 3.75, 3, 2, 2.2, 3.25, 4.5, 9.6, 15.8, 200.2, NaN)
weeks.vec <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
DF <- data.frame(pollen, weeks.vec)
library(ggplot2)
ggplot(DF, aes(x = weeks.vec, y = pollen)) +
geom_ribbon(aes(ymin = 0, ymax = pollen),
colour = NA, fill = "red", alpha = 0.3) +
geom_line(colour = "red") +
geom_point(colour = "red", size = 3) +
xlab("Week") + ylab("Pollen") +
theme_bw()
But if you must use base plots:
plot.ts(y = pollen, x = weeks.vec, col = 'red',
ylab = 'Pollen', xlab = 'Weeks', lwd = 3,
xy.labels = F, xy.lines = T)
g <- cumsum(!is.finite(pollen))
for (i in unique(g)) {
y <- pollen[g == i]
x <- weeks.vec[g == i]
x <- x[is.finite(y)]
y <- y[is.finite(y)]
x <- c(x, rev(x))
y <- c(y, y * 0)
polygon(y = y, x = x, density = NA,border = NA, col = rgb(1,0,0, .3))
}
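
The grouping step in the loop above can also be written with split, which makes the contiguous runs explicit. The following is a minimal sketch on a shortened, made-up series (pollen and weeks here are illustrative values, and run_id and runs are names chosen for this example):

```r
pollen <- c(45, 257.4, NaN, 76.6, 42.67, 38.5, NaN)
weeks  <- 5:11

ok     <- is.finite(pollen)
run_id <- cumsum(!ok)[ok]   # increases by one after every gap
runs   <- split(data.frame(x = weeks[ok], y = pollen[ok]), run_id)

# The line is broken automatically at NaN; ylim starts at 0 so the
# base of each polygon is visible
plot(weeks, pollen, type = "l", col = "red", lwd = 3,
     xlab = "Weeks", ylab = "Pollen",
     ylim = c(0, max(pollen, na.rm = TRUE)))

# One polygon per run: along the data, then back along y = 0,
# which gives vertical edges at both ends of every run
for (run in runs) {
  polygon(x = c(run$x, rev(run$x)), y = c(run$y, 0 * run$y),
          border = NA, col = rgb(1, 0, 0, 0.3))
}
```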
