How to get rollingMean function in R package Openair to work? - r

I'm trying to compute the 8-hour rolling mean for o3 with the rollingMean function, using this code:
mydata <- rollingMean(mydata, pollutant = "o3", hours = 8)
breaks <- c(0, 34, 66, 100, 121, 141, 160, 188, 214, 240, 500)
labels <- c("Low.1", "Low.2", "Low.3", "Moderate.4", "Moderate.5", "Moderate.6",
"High.7", "High.8", "High.9", "Very High.10")
calendarPlot(mydata, pollutant = "rolling8o3", breaks = breaks, labels = labels, cols = "jet", statistic = "max")
But I keep getting this error message:
Error in strsplit(interval, split = " ") : non-character argument
In addition: Warning message:
In diff(as.numeric(unique(dates[order(dates)]))) :
NAs introduced by coercion
Any help with this would be greatly appreciated; I'm very new to R. Thanks.

If you attach mydata from the package using data(mydata), it should work.
library(openair)
data(mydata)
mydata <- rollingMean(mydata, pollutant = "o3", hours = 8)
breaks <- c(0, 34, 66, 100, 121, 141, 160, 188, 214, 240, 500)
labels <- c("Low.1", "Low.2", "Low.3", "Moderate.4", "Moderate.5", "Moderate.6",
"High.7", "High.8", "High.9", "Very High.10")
calendarPlot(mydata, pollutant = "rolling8o3",breaks = breaks, labels = labels, cols = "jet", statistic = "max")
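If the error shows up with your own data rather than the packaged mydata, a likely cause is that the date column is stored as text. This is an assumption based on the "NAs introduced by coercion" warning, not something confirmed in the question. openair expects a column named date of class POSIXct or Date. A minimal sketch with a made-up data frame raw and an assumed timestamp format:

```r
# Hypothetical raw data: dates stored as text, as read.csv() would typically return them
raw <- data.frame(
  date = c("2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"),
  o3 = c(30, 32, 35),
  stringsAsFactors = FALSE
)

class(raw$date)  # character: not usable as a time axis

# Convert to POSIXct; the format string is an assumption and must match your file
raw$date <- as.POSIXct(raw$date, format = "%Y-%m-%d %H:%M", tz = "GMT")

class(raw$date)    # now POSIXct
anyNA(raw$date)    # TRUE here would mean the format string did not match
```

Once the date column has the right class, the rollingMean and calendarPlot calls can be tried again on the converted data.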

Related

SVD function in R. I want to get the singular values $d from a list of datasets. I want to put it in a table form

I want to use the svd function to get the singular values of each dataset in a large list.
When I use the svd function in a single matrix, I am able to use $d and get the values, but for the list I cannot get the output.
Here is the code for a matrix and the output.
tb = matrix(c(64, 112, 59, 174, 111, 37,
39, 135, 115, 92, 161, 70,
93, 119, 50, 142, 20, 114,
149, 191, 62, 17, 145, 21,
60, 37, 29, 74, 42, 242), nrow = 5, ncol = 6, byrow = TRUE)
## Compute SVD of tb
#
my_svd = svd(tb)
## Retrieve (save) singular values
#
sv = my_svd$d
## Compute ratio between "1st/2nd" & "2nd/3rd" singular values
#
ratios = matrix(c(sv[1]/sv[2], sv[2]/sv[3]), nrow = 1)
colnames(ratios) = c("sv1/sv2", "sv2/sv3")
## Print ratios
ratios
How do I apply this to the list of datasets?
My current code:
svdresult <- lapply(d1,svd)
svdresult
d1 is my list of datasets.
How do I get svdresult$d for each dataset in the list?
Thanks in advance
Maybe something like the following?
get_svd_ratios <- function(data) {
  sv = svd(data)$d
  n = length(sv)
  ratios = matrix(sv[1:(n - 1)] / sv[2:n] , nrow = 1)
  names = paste(
    paste0("sv", 1:(n - 1)),
    paste0("sv", 2:n),
    sep = "/"
  )
  colnames(ratios) = names
  return(ratios)
}
lapply(list(tb), get_svd_ratios)
# [[1]]
# sv1/sv2 sv2/sv3 sv3/sv4 sv4/sv5
# [1,] 2.261771 1.680403 1.29854 2.682195
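If what you are after is the singular values themselves in table form, something along these lines might also work. It is only a sketch: the list d1 below is a made-up stand-in for your data, and binding the rows assumes every dataset yields the same number of singular values.

```r
# Hypothetical stand-in for the asker's list d1: two small numeric matrices
d1 <- list(
  a = matrix(c(2, 0, 0, 1), nrow = 2),
  b = matrix(c(3, 0, 0, 4), nrow = 2)
)

svdresult <- lapply(d1, svd)

# Pull the $d component out of every svd() result
sv_list <- lapply(svdresult, function(s) s$d)

# One row per dataset; assumes all datasets give the same number of singular values
sv_table <- do.call(rbind, sv_list)
colnames(sv_table) <- paste0("sv", seq_len(ncol(sv_table)))
sv_table
```

The row names of sv_table come from the names of the list, so naming the list elements keeps the table readable.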

Replicating a repeated measures compound symmetry structure from SAS to R using lme

I'm trying to replicate an analysis in a paper by Milliken (https://sci-hub.tw/10.1016/s0169-7161(03)22007-1, section 8) from SAS code to R. I'm quite stumped, to be honest. It's a split-plot repeated measures design where the correlation structure is compound symmetry. Below are the data, the SAS code and its results.
Data
library(magrittr)
library(tidyr)
library(dplyr)
dta <- data.frame(
tmp = c(rep(900, 3), rep(1000, 3), rep(1100, 3)),
posit = rep(c("top", "mid", "bot"), 3),
lot_1 = c(189, 211, 178, 213, 220, 197, 194, 212, 189),
lot_2 = c(195, 206, 162, 199, 230, 198, 215, 208, 193),
lot_3 = c(183, 210, 173, 189, 228, 202, 194, 201, 180),
lot_4 = c(187, 223, 181, 183, 221, 168, 232, 215, 192),
lot_5 = c(173, 191, 149, 202, 213, 151, 190, 198, 182)
)
dta <- dta %>%
tidyr::pivot_longer(., cols = c(lot_1, lot_2, lot_3, lot_4, lot_5),
names_to = "Lot") %>%
dplyr::mutate(Lot = as.factor(Lot),
tmp = as.factor(tmp),
lot_tmp = as.factor(paste0(Lot, "-", tmp)))
SAS Code
proc mixed data = dta cl covtest ic;
class Posit temp lot;
model thick = temp Posit Posit*temp/ddfm = kr;
random lot;
repeated posit/type = cs subject = lot*temp r rcorr;
run;
Output from SAS
R code attempt
## this works but isn't doing the same thing as above
library(nlme)
# column names follow the dta data frame defined above (tmp, Lot)
m1 <- lme(
  value ~ tmp + posit + tmp:posit,
  random = ~ 1 | Lot,
  correlation = corCompSymm(form = ~ 1 | Lot),
  data = dta, method = "REML"
)
I'm stuck at this point on how to add a repeated structure to the posit factor.
Thank you for the help!

How do I draw a plotly boxplot with calculated values?

I have the following code for creating a boxplot in ggplot2:
throughput <- c(1, 2, 3, 4, 5)
response_time_min <- c(9, 19, 29, 39, 49)
response_time_10 <- c(50, 55, 60, 60, 61)
response_time_med <- c(100, 100, 100, 100, 100)
response_time_90 <- c(201, 201, 250, 200, 230)
response_time_max <- c(401, 414, 309, 402, 311)
df <- data.frame(throughput, response_time_min, response_time_10, response_time_med,response_time_90, response_time_max)
df
library(ggplot2)
g <- ggplot(df) +
geom_boxplot(aes(x=factor(throughput),ymax = response_time_max,upper = response_time_90,
y = response_time_med,
middle = response_time_med,
lower = response_time_10,
ymin = response_time_min), stat = "identity")
g
But now when I want to apply ggplotly(g) the graph does not render correctly. What can I do to make this work?
I don't think plotly's box trace can take the 10th and 90th percentiles directly. Treating them as q1 and q3, respectively, the code below
library(plotly)
# uses the response_time_* vectors defined in the question
bp <- plot_ly(color = c("orange")) %>%
  add_trace(lowerfence = response_time_min, q1 = response_time_10,
            median = response_time_med, q3 = response_time_90,
            upperfence = response_time_max, type = "box") %>%
  layout(xaxis = list(title = "throughput"),
         yaxis = list(title = "response_time"))
bp
gives the following output:

How to create a factor variable from a data.frame and plot the columns on a side-by-side plot

For some of you this could be an easy question.
I have 2 data frames:
dput(head(Activitieslessthan35))
structure(list(`Main job: Working time in main job` = c(470,
440, 430, 430, 410, 150), Sleep = c(420, 450, 450, 420, 450,
460), `Unspecified TV video or DVD watching` = c(60, 40, 210,
190, 60, 0), Eating = c(80, 60, 40, 70, 60, 130), `Other personal care:Wash and dress` = c(60,
60, 50, 50, 70, 50), `Travel to work from home and back only` = c(60,
60, 50, 90, 90, 30), `Unspecified radio listening` = c(140, 180,
50, 90, 140, 160), `Other specified social life` = c(350, 270,
310, 330, 710, 440), `Socialising with family` = c(350, 270,
360, 330, 730, 540), `Food preparation and baking` = c(410, 310,
420, 380, 1000, 950)), row.names = c(NA, 6L), class = "data.frame")
and
dput(head(ActivitiesMoreOrEqual35))
structure(list(`Main job: Working time in main job` = c(360,
420, 390, 490, 540, 390), Sleep = c(590, 480, 310, 560, 280,
370), `Unspecified TV video or DVD watching` = c(100, 60, 130,
120, 60, 30), Eating = c(70, 100, 70, 40, 190, 80), `Other personal care:Wash and dress` = c(10,
30, 100, 60, 270, 90), `Travel to work from home and back only` = c(0,
50, 260, 50, 0, 0), `Unspecified radio listening` = c(50, 80,
260, 80, 210, 200), `Other specified social life` = c(190, 320,
790, 250, 580, 420), `Travel in the course of work` = c(50, 80,
260, 70, 120, 200), `Food preparation and baking` = c(440, 570,
820, 570, 820, 590)), row.names = c(NA, 6L), class = "data.frame")
I would like to convert the data frames into factors, for example to have a factor variable called Activitieslessthan35 with the columns of the data frame used as levels, such as 'Main job: Working time in main job', 'Sleep', etc. Later I would also like to plot the sum for each level of the factors on a side-by-side bar plot.
I don't know whether you can transform a data.frame into a factor variable, or how to change the format of the data frames to create the plot.
Any suggestion is welcome.
If I understand correctly, you want to put both data frames into a long format with two columns, one holding the column names and the other holding the values, then sum the values for each activity, merge the two summaries, and plot them together in a single plot. Is that right?
Here is a way to do it. I called the data frame Activitieslessthan35 df and the data frame ActivitiesMoreOrEqual35 df2.
First, we reshape each data frame to a long format using pivot_longer:
library(tidyr)
library(dplyr)
df <- df %>% pivot_longer(everything(), names_to = "Activities", values_to = "Values_less_than35")
df2 <- df2 %>% pivot_longer(everything(),names_to = "Activities", values_to = "Values_More_than_35")
Then, we calculate the sum of the values for each activity in each data frame:
df_sum = df%>% group_by(Activities) %>% summarise(Values_less_than35 = sum(Values_less_than35))
df2_sum = df2 %>% group_by(Activities) %>% summarise(Values_More_than_35 = sum(Values_More_than_35))
Then, we merge both data frames into a single one, using "Activities" as the merging column:
final_df = merge(df_sum,df2_sum, by.x = "Activities", by.y = "Activities", all = TRUE)
Finally, we are transposing one last time final_df in order to have values in the correct shape for plotting them with ggplot2
final_df <- final_df %>% pivot_longer(., -Activities, names_to = "Variable", values_to = "Value")
And now we can plot your final dataframe using ggplot2
library(ggplot2)
ggplot(final_df, aes(x = stringr::str_wrap(Activities, 15), y = Value, fill = Variable)) +
geom_col(position = position_dodge()) +
coord_flip()+
xlab("")
And you get the following plot:
Does it look like what you are expecting?

R: How Plot an Excel Table(Matrix) with R

I have a problem that I still haven't found out how to solve. I want to plot all the values MW1, MW2 and MW3 as a function of "DHT + Procymidone". How can I plot all these values in one graph so that I get 3 different curves (in different colors and numbered like curve 1, 2, ...)? I also want the labels of the x-values ("DHT + Procymidone") to be like -10, -9, ... , -4 instead of 1,00E-10, ...
DHT + Procymidone   MW 1               MW 2               MW 3
1,00E-10            114,259526780335   111,022461066274   213,212408408682
1,00E-09            115,024187788314   111,083316791613   114,529425136628
1,00E-08            110,517449986348   107,867941606743   125,10230718665
1,00E-07            100,961311263444   98,4219995773135   116,045168653416
1,00E-06            71,2383604211297   73,539659636842    50,3213799775309
1,00E-05            20,3553333652104   36,1345771905088   15,42260866106
1,00E-04            4,06189509055904   18,1246447874679   10,1988107887318
I have shortened your data frame for convenience reasons, so here's an example:
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
library(tidyr)
library(ggplot2)
mydf <- gather(mydat, "grp", "MW", 2:4)
ggplot(mydf, aes(x = DHT_Procymidone, y = MW, colour = grp)) + geom_line()
which gives following plot:
To use ggplot, your data needs to be in long format. gather does this for you, stacking columns MW1-MW3 into one column, while the column names are added as values in the new grp column. This grouping column lets ggplot distinguish the groups, i.e. draw differently colored lines in the plot.
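To see what "long format" means without any extra packages, here is a small base R sketch using stack(). It is not what gather does internally, just an illustration of the resulting shape, and it uses a shortened, made-up version of the data with only two MW columns:

```r
# Shortened example data (values are illustrative)
mydat_small <- data.frame(DHT_Procymidone = c(-10, -9, -8),
                          MW1 = c(114, 115, 110),
                          MW2 = c(111, 111, 107))

# stack() puts the MW columns on top of each other and records
# the original column name in a factor column called ind
stacked <- stack(mydat_small[, c("MW1", "MW2")])

# Repeat the x-values once per stacked column and rename to match the answer
long <- data.frame(DHT_Procymidone = rep(mydat_small$DHT_Procymidone, times = 2),
                   stacked)
names(long)[2:3] <- c("MW", "grp")
long
```

Each row of long now holds one x-value, one measurement and the name of the series it came from, which is the layout ggplot works with.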
Depending on the type of the DHT + Procymidone column, you can, for example, use format(..., scientific = FALSE) to print numeric values without scientific notation; however, this gives 0.0000000001 (and not -10).
However, if this data column is a character vector (you can coerce with as.character), this may work:
a <- "1,00E-10"
sub("1,00E", "", a, fixed = TRUE)
> [1] "-10"
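Another option, if you want the exponent as a number rather than as text, might be to parse the values and take log10. This is a sketch that assumes the column was read in as character with a decimal comma, as shown in the question; x_raw is a made-up vector holding a few of those values:

```r
# Values as they appear in the question, with a decimal comma
x_raw <- c("1,00E-10", "1,00E-09", "1,00E-04")

# Replace the decimal comma with a point so as.numeric() can parse the text
x_num <- as.numeric(sub(",", ".", x_raw, fixed = TRUE))

# The power of ten is the base-10 logarithm; round() guards against floating-point noise
exponents <- round(log10(x_num))
exponents
```

Unlike the sub() approach, this also works when the mantissa is not exactly 1,00, although the result is then rounded to the nearest whole exponent.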
Here is an alternative to @Daniel's answer that doesn't rely on ggplot (thanks, Daniel, for providing the reproducible data).
mydat <- data.frame(DHT_Procymidone = c(-10, -9, -8, -7, -6, -5, -4),
MW1 = c(114, 115, 110, 100, 72, 20, 4),
MW2 = c(111, 111, 107, 98, 73, 36, 18),
MW3 = c(213, 114, 123, 116, 50, 15, 10))
plot(mydat[,2] ~ mydat[,1], type = "l", ylim = c(0,220), xlim = c(-10,-2), xlab = "DHT Procymidone", ylab = "MW")
lines(mydat[,3] ~ mydat[,1], col = "blue")
lines(mydat[,4] ~ mydat[,1], col = "red")
legend(x = -4, y = 200, legend = c("MW1","MW2","MW3"), lty = 1, bty = "n", col = c("black","blue","red"))
To change axis labels see the text in xlab and ylab. To change axis limits see xlim and ylim.
