Format x-axis dates as mm/dd/yy using lattice in R

I want the dates on the x-axis to appear as mm/dd/yy using lattice. Only one tick per month needs to be shown, e.g. 10/1/2022, 11/1/2022, etc.
I think I may have to use format(pretty(df$date), "%d-%m-%y") based on another post but when I try I get an error:
Error in multiple && !outer : invalid 'x' type in 'x && y'
Here is my code:
library(lattice)
library(latticeExtra)
obj1 <- xyplot(Nconc + Sconc ~ date, df, type = "l", main="Title", lwd=2, col= c("red", "steelblue"))
obj2 <- xyplot(CountyCounts ~ date, df, type = "l", lwd=2, col="green")
p <- doubleYscale(obj1, obj2, use.style=FALSE, add.ylab2 = TRUE)
p
Here is a sample of my data:
structure(list(date = structure(c(1634688000, 1634774400, 1634860800,
1635033600, 1635120000, 1635206400, 1635379200, 1635465600, 1635638400,
1635811200, 1635897600, 1636070400, 1636416000, 1636502400, 1636588800,
1636675200, 1636934400, 1637020800, 1637107200, 1637280000, 1637366400,
1637625600, 1637712000, 1637971200, 1638057600, 1638144000, 1638230400,
1638316800, 1638403200), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Nconc = c(2.63e-05, 2.4e-05, 1.83e-05, 1.47e-05, 2.11e-05,
2.09e-05, 3.99e-05, 5.22e-05, 9.17e-06, 2.89e-05, 4.79e-05,
1.9e-05, 3.12e-05, 1.81e-05, 5.79e-06, 4.49e-05, 7.61e-06,
1.51e-05, 3.61e-05, 1.46e-05, 1.16e-05, 3.53e-05, 0.000232,
3.09e-05, 1.31e-06, 1.24e-05, 4.99e-05, 7.76e-05, 0.00016
), Sconc = c(2.98e-05, 2.79e-05, 1.75e-05, 2.2e-05, 2.62e-05,
2.56e-05, 4.27e-05, 5.6e-05, 1.26e-05, 3.92e-05, 6.11e-05,
1.42e-05, 2.53e-05, 2.12e-05, 4.31e-06, 4.57e-05, 9.83e-06,
1.07e-05, 3.65e-05, 2.51e-05, 1.23e-05, 2.63e-05, 0.000212,
3.99e-05, 1.42e-06, 1.08e-05, 4.09e-05, 7.89e-05, 0.000113
), CountyCounts = c(46, 45, 47, 17, 49, 49, 29, 38, 19, 20,
30, 25, 38, 24, 19, 27, 27, 30, 39, 17, 11, 23, 21, 19, 21,
43, 45, 42, 28)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -29L))

You could add
scales = list(x = list(format = "%d-%m-%y"))
to the call creating obj1 to set the label format (use "%m/%d/%y" if you want month-first labels with slashes). To control the tick locations exactly, however, you will need to specify them via at, e.g.,
scales = list(x = list(at = seq(as.POSIXct("2021-11-01"), by = "1 month", length.out = 5),
format = "%d-%m-%y"))
You could try playing with tick.number = 5 (default) to control the locations, and you might get lucky, but there are no guarantees that you will get exactly 1 tick per month that way.
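As a minimal sketch of the at approach, the monthly tick positions can be derived from the data instead of being typed in by hand. The data frame demo, its columns, and the helper variables first_month and month_ticks are hypothetical stand-ins for the question's df, and the timestamps are assumed to be in UTC as in the sample data.

```r
library(lattice)

# Hypothetical stand-in for the question's df: daily UTC timestamps
demo <- data.frame(
  date = seq(as.POSIXct("2021-10-20", tz = "UTC"), by = "1 day", length.out = 60),
  value = seq_len(60)
)

# First day of the first month in the data, then one tick per month
first_month <- as.POSIXct(format(min(demo$date), "%Y-%m-01"), tz = "UTC")
month_ticks <- seq(first_month, max(demo$date), by = "1 month")

# Place ticks at the month starts and label them as mm/dd/yy
p <- xyplot(value ~ date, demo, type = "l",
            scales = list(x = list(at = month_ticks, format = "%m/%d/%y")))
print(p)
```

The same scales argument should be usable in the call that creates obj1, since doubleYscale takes its x-axis from the first plot.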

Related

Create mean value plot without missing values count to total

Using a dataframe with missing values:
structure(list(id = c("id1", "test", "rew", "ewt"), total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8), total_frq_4 = c(36, NA, 104, NA)), row.names = c(NA, 4L), class = "data.frame")
How is it possible to create a bar plot with the mean of every column except the id column, without filling the missing values with 0 but instead leaving out the rows with missing values? For example, for total_frq_3: (24 + 25 + 8) / 3 = 19.
You can use the colMeans() function with na.rm = TRUE to ignore the NA values.
library(ggplot2)
xy <- structure(list(id = c("id1", "test", "rew", "ewt"),
total_frq_1 = c(54, 87, 10, 36), total_frq_2 = c(45, 24, 202, 43), total_frq_3 = c(24, NA, 25, 8),
total_frq_4 = c(36, NA, 104, NA)),
row.names = c(NA, 4L),
class = "data.frame")
xy.means <- colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE)
xy.means <- as.data.frame(xy.means)
xy.means$total <- rownames(xy.means)
ggplot(xy.means, aes(x = total, y = xy.means)) +
theme_bw() +
geom_col()
Or just use base graphics:
barplot(height = colMeans(x = xy[, 2:ncol(xy)], na.rm = TRUE))
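To see why na.rm = TRUE gives the mean asked for in the question, here is a small check on the total_frq_3 values. The variable names frq3, mean_skip_na and mean_zero_fill are only illustrative.

```r
# Values of total_frq_3 from the question
frq3 <- c(24, NA, 25, 8)

# Dropping the NA: (24 + 25 + 8) / 3
mean_skip_na <- mean(frq3, na.rm = TRUE)

# Filling the NA with 0 instead: (24 + 0 + 25 + 8) / 4
mean_zero_fill <- mean(replace(frq3, is.na(frq3), 0))

print(c(skip_na = mean_skip_na, zero_fill = mean_zero_fill))
```

colMeans(..., na.rm = TRUE) applies the first calculation to every column, so each mean is divided by the number of non-missing values in that column.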

Manually draw boxplot using ggplot

I think my question is very similar to this one, the only difference being that I'd love to use ggplot (and the answer with ggplot was missing a tiny bit of detail). I have data like this:
show<-structure(list(Median = c(20, 39, 21, 52, 45.5, 24, 36, 20, 134,
27, 44, 43), IQR = c(4, 74, 28, 51.5, 73.5, 18, 47.5, 26.5, 189.5,
46, 54, 61), FirstQuartile = c(`25%` = 19, `25%` = 24, `25%` = 12,
`25%` = 30.5, `25%` = 36.5, `25%` = 18, `25%` = 16.5, `25%` = 13,
`25%` = 53.5, `25%` = 15, `25%` = 24.5, `25%` = 27), ThirdQuartile = c(`75%` = 23,
`75%` = 98, `75%` = 40, `75%` = 82, `75%` = 110, `75%` = 36,
`75%` = 64, `75%` = 39.5, `75%` = 243, `75%` = 61, `75%` = 78.5,
`75%` = 88), Group = c("Program Director", "Editor", "Everyone",
"Board Director", "Board Director", "Program Director", "Editor",
"Everyone", "Board Director", "Everyone", "Editor", "Program Director"
), Decade = c("1980's", "1980's", "1980's", "1980's", "1990's",
"1990's", "1990's", "1990's", "2000's", "2000's", "2000's", "2000's"
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
))
And I would like to draw a graph like this:
With "group" as the color, instead of "fellowship". The problem is, that graph was drawn from "complete" data (with 800ish rows), and I clearly only have summary data above. I realize it won't be able to draw outliers but that is ok. Any help would be appreciated! I'm specifically struggling with how I would draw the ymin/max and the edges of the notch. Thank you
You can use geom_boxplot() with stat = "identity" and fill in the five boxplot numbers as aesthetics.
library(ggplot2)
# show <- structure(...) # omitted for brevity
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = FirstQuartile - 1.5 * IQR, # optional
ymax = ThirdQuartile + 1.5 * IQR) # optional
)
As pointed out by jpsmith in the comments below, the 1.5 * IQR rule becomes hairy if you don't have the range of the data. However, if you have information about the data extrema or the data domain, you can limit the whiskers as follows:
# Dummy values assuming data is >= 0 up to infinity
show$min <- 0
show$max <- Inf
ggplot(show, aes(Decade, fill = Group)) +
geom_boxplot(
stat = "identity",
aes(lower = FirstQuartile,
upper = ThirdQuartile,
middle = Median,
ymin = pmax(FirstQuartile - 1.5 * IQR, min),
ymax = pmin(ThirdQuartile + 1.5 * IQR, max))
)

can't add labels to my graph

I have this graph:
I just need to add labels to each colored line.
I need to add to the blue one Forecast Sales and for the red one Historical Sales.
I tried to adapt these examples here, but I get many errors. Also, I cannot plot the graph above using just this code.
To make it reproducible:
dput(df1)
structure(list(Semaine = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31), M = c(5649.96284329564, 7400.19639744335, 6948.61488673139,
5043.28209277238, 7171.29719525351, 7151.04746494067, 5492.96601941748,
6796.1160130719, 5532.95496473142, 7371.33061889251, 5462.73861171367,
7156.01570964247, 5558.63194819212, 9329.49289405685, 5770.02903225806,
7348.68497576737, 5261.26655896607, 8536.11304909561, 7463.97630586968,
6133.49774339136, 7252.69089929995, 6258.54674403611, 8167.67766497462,
5644.66612816371, 7512.5169628433, 5407.84275713516, 7795.63220247711,
5596.75282714055, 7264.37264404954, 5516.98492191707, 8188.80776699029
)), .Names = c("Semaine", "M"), row.names = c(NA, -31L), class = "data.frame")
dput(df2)
structure(list(Semaine = c(32, 33.2, 34.4, 35.6, 36.8, 38), M = c(5820.32304669441,
6296.32038834951, 7313.24757281553, 7589.714214588, 8992.35922330097,
9664.95469255663)), .Names = c("Semaine", "M"), row.names = c(NA,
-6L), class = "data.frame")
ggplot() + geom_line(data=df1, aes(x = Semaine, y = M),color = "red") +
stat_smooth(data=df2, aes(x = Semaine, y = M),color = "blue")+
scale_x_continuous(breaks = seq(0,40,1))
Thank you!
cols <- c("A"="red", "B"="blue")
ggplot() + geom_line(data=df1, aes(x = Semaine, y = M,color = "A")) +
stat_smooth(data=df2, aes(x = Semaine, y = M,color = "B"), method = 'loess')+
scale_x_continuous(breaks = seq(0,40,1)) +
scale_color_manual(name="Title", values=cols)
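The same idea can carry the labels the question asks for: map color to the label text inside aes() and give scale_color_manual() a vector named with those labels. This is only a sketch; hist_df and fc_df are small hypothetical stand-ins for df1 and df2, and formula = y ~ x is spelled out just to silence the smoothing message.

```r
library(ggplot2)

# Small hypothetical stand-ins for df1 (history) and df2 (forecast)
hist_df <- data.frame(Semaine = 1:10,
                      M = c(5650, 7400, 6949, 5043, 7171, 7151, 5493, 6796, 5533, 7371))
fc_df <- data.frame(Semaine = c(11, 12.2, 13.4, 14.6, 15.8, 17),
                    M = c(5820, 6296, 7313, 7590, 8992, 9665))

# Names are the legend labels, values are the line colors
cols <- c("Historical Sales" = "red", "Forecast Sales" = "blue")

p <- ggplot() +
  geom_line(data = hist_df, aes(x = Semaine, y = M, color = "Historical Sales")) +
  stat_smooth(data = fc_df, aes(x = Semaine, y = M, color = "Forecast Sales"),
              method = "loess", formula = y ~ x) +
  scale_color_manual(name = NULL, values = cols)
```

Printing p draws the plot with a legend entry for each line. Because the colors are looked up by name, the order of the layers does not affect which label gets which color.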

Subsetting and plotting data by TimeStamp

I have a data.frame P1 (5000rows x 4cols) and would like to save the subset of data in columns 2,3 and 4 when the time-stamp in column 1 falls into a set range determined by a vector TimeStamp (in seconds).
E.g. put all values in columns 2, 3, and 4 into a new data.frame and call each section of data: Condition.1.P1, Condition.2.P1, etc.
I'd like to label each section separately because I have 35 versions of P1 (P2, P3, P33, etc.) and need to be able to melt them together to plot them.
dput(TimeStamp)
c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dput(head(P1))
structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781,
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70,
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time",
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA,
6L), class = "data.frame")
Do you want to separate the data by the timestamp range and put it in a list? Then this might be what you are looking for:
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
dat <- structure(list(Time = c(0, 5, 100, 200, 500, 1200), SkinTemp = c(27.781,
27.78, 27.779, 27.779, 27.778, 27.777), HeartRate = c(70, 70,
70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("Time",
"SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA,
6L), class = "data.frame")
dat$Segment <- cut(dat$Time,c(-Inf,TimeStamp))
split(dat,dat$Segment)
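Building on the cut() and split() approach, the segments can also be given the names asked for in the question. This is a sketch under the assumption that intervals are closed on the right, which is the cut() default, so a Time of exactly 18 falls in the first condition; the names segments and Condition are illustrative.

```r
TimeStamp <- c(18, 138, 438, 678, 798, 1278, 1578, 1878, 2178)
P1 <- data.frame(Time = c(0, 5, 100, 200, 500, 1200),
                 SkinTemp = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777),
                 HeartRate = 70,
                 RespirationRate = 10)

# Label each row with the interval it falls in: (-Inf, 18] is Condition.1, and so on
P1$Condition <- cut(P1$Time, breaks = c(-Inf, TimeStamp),
                    labels = paste0("Condition.", seq_along(TimeStamp)))

# Keep columns 2 to 4, split by condition, and drop conditions with no rows
segments <- split(P1[, c("SkinTemp", "HeartRate", "RespirationRate")],
                  P1$Condition, drop = TRUE)
names(segments) <- paste0(names(segments), ".P1")
print(names(segments))
```

Rows with a Time beyond the last time stamp get NA as their condition and are left out by split(). Keeping the pieces in a named list, rather than as 35 sets of separate variables, makes it easier to bind them back together for plotting.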
ts <- TimeStamp # Time-stamp boundaries (in seconds) from the question
P2 = data.frame(NA, NA, NA, NA) # Create empty data.frame
for (i in seq_along(ts)){
P3 = data.frame() # Create empty changing data.frame
if (i == 1) {ts1 = 0} else {ts1 = ts[i-1]} #First time stamp starts at 0
ts2 = ts[i]
P3 = subset(P1, P1$Time > ts1 & P1$Time < ts2)[,c(2,3,4)] #Subset the columns and assign to P3
if (nrow(P3) == 0){P3 = data.frame(NA, NA, NA)} #If the subset is empty, assign NA
P3$TimeStamp = paste(ts1,ts2,sep="-") # Append TimeStamp to the P3
colnames(P3) = colnames(P2) #Make sure column names are same to allow rbind
P2 = rbind(P2,P3) #Append P3 to P2
}
P2 = P2[c(2:nrow(P2)),] #Remove the first row (that has NA)
colnames(P2) = c("SkinTemp", "HeartRate", "RespirationRate", "TimeStamp") #Provide column names
rm(P3); rm(i); rm(ts1); rm(ts2) #Cleanup

Compute and save the r-squared value of bootstrap objects in a new dataframe in R

I have a dataframe df
dput(df)
structure(list(x = c(49, 50, 51, 52, 53, 54, 55, 56, 1, 2, 3,
4, 5, 14, 15, 16, 17, 2, 3, 4, 5, 6, 10, 11, 3, 30, 64, 66, 67,
68, 69, 34, 35, 37, 39, 2, 17, 18, 99, 100, 102, 103, 67, 70,
72), y = c(2268.14043972082, 2147.62290922552, 2269.1387550775,
2247.31983098201, 1903.39138268307, 2174.78291538358, 2359.51909126411,
2488.39004804939, 212.851575751527, 461.398994384333, 567.150629704352,
781.775113821961, 918.303706148872, 1107.37695799186, 1160.80594193377,
1412.61328924168, 1689.48879626486, 260.737164468854, 306.72700499362,
283.410379620422, 366.813913489692, 387.570173754128, 388.602676983443,
477.858510450125, 128.198042456082, 535.519377609133, 1028.8780498564,
1098.54431357711, 1265.26965941035, 1129.58344809909, 820.922447928053,
749.343583476846, 779.678206156474, 646.575242339517, 733.953282899613,
461.156280127354, 906.813018662913, 798.186995701282, 831.365377249207,
764.519073183124, 672.076289062505, 669.879217186302, 1341.47673353751,
1401.44881976186, 1640.27575962036)), .Names = c("x", "y"), row.names = c(NA,
-45L), class = "data.frame")
I have created two non-linear regressions (nls1 and nls2) based on my dataset.
library(stats)
nls1 <- nls(y~A*(x^B)*(exp(k*x)),
data = df,
start = list(A = 1000, B = 0.170, k = -0.00295))
nls2<-nls(y~A*x^3+B*x^2+C*x+D, data=df,
start = list(A=0.02, B=-0.6, C= 50, D=200))
I then computed bootstrap objects for these two functions to get multiple sets of parameters (A,B and k for nls1 and A, B, C and D for nls2).
library(nlstools)
Boo1 <- nlsBoot(nls1, niter = 200)
Boo2 <- nlsBoot(nls2, niter = 200)
Based on these bootstrap objects, I would like to compute the r-squared of each combination of parameters and save the min, max and median of my r-squared values for each bootstrap object into one new dataframe. The dataframe could look like new.df.
structure(list(Median = c(NA, NA), Max = c(NA, NA), Min = c(NA,
NA)), .Names = c("Median", "Max", "Min"), row.names = c("nls1",
"nls2"), class = "data.frame")
The idea is then to do some box plots with the median, min and max values for each non-linear model based on bootstrapping to compare them. Can someone help me out with that? Thanks in advance.
Answer from #bunk
library(boot)
# Refit both models on each bootstrap resample and return their AICs (NA if a fit fails)
stat <- function(dat, inds) {
  fit <- try(nls(y~A*(x^B)*(exp(k*x)), data = dat[inds,], start = list(A = 1000, B = 0.170, k = -0.00295)), silent=TRUE)
  f1 <- if (inherits(fit, "nls")) AIC(fit) else NA
  fit2 <- try(nls(y~A*x^3+B*x^2+C*x+D, data = dat[inds,], start = list(A=0.02, B=-0.6, C= 50, D=200)), silent=TRUE)
  f2 <- if (inherits(fit2, "nls")) AIC(fit2) else NA
  c(f1, f2)
}
res <- boot(df, stat, R=200)
Then, to get the medians, for example:
apply(res$t, 2, median, na.rm=TRUE)
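The answer above compares the models by AIC. If an r-squared-like summary is wanted, as in the question, one rough option is a pseudo R-squared, 1 - SS_res / SS_tot, computed for each bootstrap parameter set. The sketch below uses made-up data and a hand-written coefficient matrix; it assumes that Boo1$coefboot from nlsBoot has the same shape (one row per replicate, columns named A, B and k), and pseudo_r2 is a hypothetical helper, not part of nlstools.

```r
# Pseudo R-squared (1 - SS_res / SS_tot); only a rough guide for non-linear fits
pseudo_r2 <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

# Hypothetical data and bootstrap estimates standing in for df and Boo1$coefboot
x <- 1:50
y <- 1000 * x^0.17 * exp(-0.003 * x) + 5 * sin(x)
coefboot <- rbind(c(A = 1000, B = 0.170, k = -0.0030),
                  c(A = 990,  B = 0.175, k = -0.0032),
                  c(A = 1010, B = 0.165, k = -0.0028))

# One pseudo R-squared per bootstrap parameter set (model form of nls1)
r2 <- apply(coefboot, 1, function(p) {
  pseudo_r2(y, p[["A"]] * x^p[["B"]] * exp(p[["k"]] * x))
})

# Summary row in the layout of new.df from the question
new.df <- data.frame(Median = median(r2), Max = max(r2), Min = min(r2),
                     row.names = "nls1")
print(new.df)
```

A second row for nls2 could be built the same way from its own coefficient matrix and model formula and appended with rbind().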
