Calculate moving geometric mean by water sampling station in R

I need to calculate the moving geometric mean of fecal coliform over time (at each value I want the geomean of that value and the previous 29 values), by individual sampling station. When I download the data from our database the column headers are:
Station SampleDate FecalColiform
Depending on the growing area there are a few to over a dozen stations.
I tried to adapt some code that I found here:
#File: Fecal
Fecal <- group_by(Fecal, Station) %>%
  arrange(SampleDate) %>%
  mutate(logres = log10(ResultValue)) %>%
  mutate(mgm = stats::filter(logres, rep(1/24, 24), sides = 1))
This worked, but the problem is that I don't want the resulting log values; I want the regular geomean so that I can plot it and everyone can easily understand the values. I tried to sneak the geometric.mean function from the psych package in there, but I could not make that work.
There are resources for calculating a moving average, and code for calculating a geometric mean, and I have tried to combine several of them, but I can't find an example of a moving geometric mean.
Eventually I would like to graph all of the geomeans by station, similar to the example in the link above.
> dput(ByStationRGMData[1:10,])
structure(list(Station = c(114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 114L), Classification = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c(" Approved ", " Conditionally Approved ",
" Prohibited "), class = "factor"), SampleDate = c(19890103L,
19890103L, 19890209L, 19890316L, 19890413L, 19890511L, 19890615L,
19890713L, 19890817L, 19890914L), SWTemp = c(NA, NA, 5L, 8L,
NA, 13L, 15L, 18L, NA, 18L), Salinity = c(NA, NA, 22L, 18L, NA,
26L, 22L, 24L, NA, 32L), FecalColiform = c(180, 49, 2, 17, 7.9,
1.8, 4.5, 11, 33, 1.8), RGM = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
)), .Names = c("Station", "Classification", "SampleDate", "SWTemp",
"Salinity", "FecalColiform", "RGM"), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), vars = list(
Station), drop = TRUE, indices = list(0:9), group_sizes = 10L, biggest_group_size = 10L, labels = structure(list(
Station = 114L), class = "data.frame", row.names = c(NA,
-1L), vars = list(Station), drop = TRUE, .Names = "Station"))
I would also like to add a moving 90th percentile to the dataframe and the graphs. I tried the following:
ByStationRGMData <- RawData %>%
  group_by(Station) %>%
  arrange(SampleDate) %>%
  mutate(RGM = as.numeric(rollapply(FecalColiform, 30, geometric.mean, fill=NA, align="right"))) +
  mutate(F90 = as.numeric(rollapply(FecalColiform, 30, quantile, p=0.90, fill=NA, align="right")))
This gives me the error:
Error in mutate_(.data, .dots = lazyeval::lazy_dots(...)) : argument ".data" is missing, with no default
I can't seem to figure out what I'm missing.

You can use rollapply from the zoo package (illustrated here using the built-in mtcars data frame). I've used a window of 3 values, but you can set that to 30 in your actual data. align="right" uses the current value and the n-1 previous values, where n is the window width:
library(psych)
library(dplyr)
library(zoo)
mtcars %>%
  mutate(mpgGM = rollapply(mpg, 3, geometric.mean, fill=NA, align="right"))
Include a grouping variable to get rolling geometric means separately for each group. As for the error in your own attempt: the two mutate() calls are chained with + rather than %>%, which is what produces the 'argument ".data" is missing' message.
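If you'd rather avoid the psych dependency, note that the geometric mean is exp(mean(log(x))), so a rolling geometric mean is just the exponential of a rolling mean of logs; that is the same quantity the stats::filter() code above computes, back-transformed. A minimal base-R sketch (the roll_geomean name is my own, not from any package):

```r
# Rolling geometric mean: exponentiate a rolling mean of logs.
# Right-aligned: each output uses the current value and the
# previous n - 1 values, matching rollapply(..., align = "right").
roll_geomean <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] <- exp(mean(log(x[(i - n + 1):i])))
  }
  out
}

roll_geomean(c(1, 10, 100, 1000), 2)
# NA 3.162278 31.622777 316.227766 (each is the sqrt of a pairwise product)
```

The window positions before the first full window stay NA, just as with fill=NA in rollapply.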

Related

How to skip and disregard a row in a loop when it can't be processed or produces an error?

structure(list(`total primary - yes RS` = c(0L, 138L, 101L, 86L,
118L), `total primary - no RS` = c(0L, 29L, 39L, 35L, 38L), `total secondary- yes rs` = c(0L,
6L, 15L, 3L, 15L), `total secondary- no rs` = c(0L, 0L, 7L, 1L,
2L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I had previously asked for a line of code that could run a chi-square test for each of the four rows included:
https://stackoverflow.com/questions/66750999/with-r-i-would-like-to-loop-through-each-row-and-create-corresponding-chisquare/66751018#66751018
Though the script worked, it only worked because all four rows were able to run through it.
library(broom)
library(dplyr)
apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2)))) %>%
bind_rows
I now have a row that contains zeros, and when I run the same script I get:
Error in stats::chisq.test(x, y, ...) :
at least one entry of 'x' must be positive
I tried to do something using tryCatch(), this way:
tryCatch(apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2))))) %>%
bind_rows
but it did not work. Ultimately the dataset has a bunch of rows like this; I would like the script to recognize that the problem isn't only in row 1 but can be in multiple rows, like 5, 23, 67, and so on.
I am not sure I am following your code/data exactly, but what if you move your tryCatch statement inside the apply statement and give it an error handler, like so: apply(df, 1, function(x) tryCatch(tidy(chisq.test(matrix(x, ncol = 2))), error = function(e) NULL)) %>% bind_rows? Without an error handler, tryCatch() simply re-signals the error. Does that help at all?
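To make the row-skipping concrete, here is a self-contained base-R sketch (toy data modeled on yours, without broom, so only the stats package is involved): the error handler returns NULL for any row chisq.test() rejects, and Filter() then drops those entries.

```r
# Toy data: the first row is all zeros, which makes chisq.test() error.
df <- data.frame(a = c(0, 138, 101), b = c(0, 29, 39),
                 c = c(0, 6, 15),   d = c(0, 0, 7))

results <- apply(df, 1, function(x) {
  tryCatch(chisq.test(matrix(x, ncol = 2)),
           error = function(e) NULL)  # NULL marks a skipped row
})
results <- Filter(Negate(is.null), results)
length(results)  # 2 tests survive; the all-zero row was skipped
```

With broom loaded you would wrap the chisq.test() call in tidy() the same way and bind_rows() the surviving results.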

How to drop multiple columns which have categorical values in R?

I know how to drop columns by name, but I am not quite sure how to drop the columns which have categorical values. It can be done manually by looking at which columns have categorical values, but that is not intuitive for me in R code. How do I detect columns which have categorical values? Is there any way to make this happen?
Minimal data:
mydf=structure(list(taken_time = c(15L, 5L, 39L, -21L, 46L, 121L),
ap6xl = c(203.2893857, 4.858269406, 2, 14220, 218.2215352,
115.5227706), pct5 = c(732.074484, 25.67901235, 1.01, 120.0477168,
3621.328567, 79.30561111), crp4 = c(196115424.7, 1073624.455,
1.23, 1457496.474, 10343851.7, 81288042.73), age = c(52L,
74L, 52L, 67L, 82L, 67L), gender = structure(c(2L, 2L, 2L,
1L, 2L, 1L), .Label = c("F", "M"), class = "factor"), inpatient_readmission_time_rtd = c(79.78819444,
57.59068053, 57.59068053, 57.59068053, 57.59068053, 9.893055556
), infection_flag = c(0L, 0L, 1L, 1L, 0L, 1L), temperature_value = c(98.9,
98.9, 98, 101.3, 99.5, 98.1), heartrate_value = c(106, 61,
78, 91, 120, 68), pH_result_time_rta = c(11, 85.50402145,
85.50402145, 85.50402145, 85.50402145, 85.50402145), gcst_value = c(15,
15, 15, 14.63769293, 15, 14.63769293)), row.names = c(NA,
6L), class = "data.frame")
Instead of manually typing the names of the columns which have categorical values, is there any way we can detect categorical columns and drop them?
I am concerned about cases where the dataframe might have more than 10 categorical columns; doing this by hand is sort of a pain, so I am curious if it is possible using R. Any thoughts?
For example, I can do this for the above dataframe by manually looking at which ones are categorical columns:
mydf <- mydf[!names(mydf) %in% c("gender", "infection_flag")]
Is there any way we can detect which ones are categorical columns and drop them for numerical calculation purposes? Any ideas?
An option with base R:
i1 <- sapply(mydf, is.numeric)
mydf[i1]
You can use dplyr and select all the numerical columns:
library(dplyr)
mydf %>% select_if(is.numeric)
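One caveat with either approach: is.numeric() keeps integer-coded categoricals such as infection_flag, since R has no way to know those 0/1 codes are categories. A small self-contained illustration (toy columns of my own) using base R's Filter(), which keeps the columns for which the predicate returns TRUE:

```r
mydf <- data.frame(age = c(52, 74),
                   gender = factor(c("M", "F")),
                   infection_flag = c(0L, 1L))

# Filter() subsets a data frame column-wise by a predicate.
num_only <- Filter(is.numeric, mydf)
names(num_only)  # "age" "infection_flag" -- the 0/1 flag survives
```

If dummy-coded flags should be dropped too, you have to name them explicitly, as in the manual approach above.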

Group variables depending on a defined circular area, with circle centers having variable radii

I have a data table object:
> dput(head(trackdatacompvar))
structure(list(wellvid = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("A4-009",
"B3-006", "B4-015", "C2-009", "C2-034", "C3-017", "C4-014", "C4-016",
"C4-026", "C4-036"), class = "factor"), TRACK_ID = c(0L, 0L,
0L, 0L, 0L, 0L), treatment = structure(c(2L, 2L, 2L, 2L, 2L,
2L), .Label = c("Si_induced", "Si_notinduced"), class = "factor"),
A = c(0L, 0L, 0L, 0L, 0L, 0L), X = c(50.216, 50.216, 50.091,
50.091, 50.216, 50.216), Y = c(295.609, 295.609, 295.477,
295.477, 295.609, 295.609), T = 0:5, V = c(0, 0, 0.181793839279557,
0, 0.181793839279557, 0), x_grpA = c(641.67, 641.67, 641.67,
641.67, 641.67, 641.67), y_grpA = c(625, 625, 625, 625, 625,
625), rad_grpA = c(50L, 50L, 50L, 50L, 50L, 50L), x_grpB = c(889.58,
889.58, 889.58, 889.58, 889.58, 889.58), y_grpB = c(377.08,
377.08, 377.08, 377.08, 377.08, 377.08), rad_grpB = c(20L,
20L, 20L, 20L, 20L, 20L)), .Names = c("wellvid", "TRACK_ID",
"treatment", "A", "X", "Y", "T", "V", "x_grpA", "y_grpA", "rad_grpA",
"x_grpB", "y_grpB", "rad_grpB"), sorted = "wellvid", class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000000210788>)
I want to define 4 groups of data depending on circular area. Groups A and B depend on the x,y origins of 2 beads (labelled x_grpA, y_grpA and x_grpB, y_grpB), group C is the outside area, and group D is the area where groups A and B overlap (but this area is sometimes not there). The 2 circular groups should be inside a circular area with a radius of 115 µm. This 115 µm depends on the size of the bead, so I also have 2 radii in my data (rad_grpA and rad_grpB). To understand it visually, here are 2 pictures:
My original idea was to reuse the awesome script I was given before. So I tried defining the center of each data point and the corresponding length of the whole area of group A as:
center_grpA <- c(trackdatacompvar$x_grpA, trackdatacompvar$y_grpA)
circle_grpA <- (trackdatacompvar$rad_grpA)*2 + 115
But after this I am lost.
In the end I want to put inside my dataframe their grouping as one variable.
Would appreciate any help! Thanks :)
We can use a little convenience function from a package of mine here:
check_if_in_circle <- function(points, x, y, r) {
(points[, 1] - x) ^ 2 + (points[, 2] - y) ^ 2 < r ^ 2
}
Now we check, for each point, whether it's in circle A and in circle B, then use ifelse to figure out whether to assign A, B, C or D. I use within to avoid typing that long data name.
trackdatacompvar <- within(trackdatacompvar,
{
grpA <- check_if_in_circle(points = cbind(X, Y),
x_grpA, y_grpA, rad_grpA + 115)
grpB <- check_if_in_circle(points = cbind(X, Y),
x_grpB, y_grpB, rad_grpB + 115)
grp <- ifelse(grpA, ifelse(grpB, 'D', 'A'),
ifelse(grpB, 'B', 'C'))
} )
For the few rows you gave us, all are in group C.
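As a quick self-contained sanity check of the helper (coordinates made up to mimic bead A): a point near the bead's center should test TRUE, a far-away point FALSE.

```r
# Squared-distance test: is each point strictly inside the circle
# centered at (x, y) with radius r?
check_if_in_circle <- function(points, x, y, r) {
  (points[, 1] - x) ^ 2 + (points[, 2] - y) ^ 2 < r ^ 2
}

pts <- rbind(c(641, 625),  # close to bead A's center
             c(0, 0))      # far outside
check_if_in_circle(pts, x = 641.67, y = 625, r = 50 + 115)
# TRUE FALSE
```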

R: How to create multiple maps (rworldmap) using apply?

I want to create multiple maps (similar to this example) using the apply family. Here is a small sample of my data (~200 rows x 150 cols; UN and ISO3 are codes for rworldmap):
df <- structure(list(BLUE.fruits = c(12803543,
3745797, 19947613, 0, 130, 4), BLUE.nuts = c(21563867, 533665,
171984, 0, 0, 0), BLUE.veggies = c(92690, 188940, 34910, 0, 0,
577), GREEN.fruits = c(3389314, 15773576, 8942278, 0, 814, 87538
), GREEN.nuts = c(6399474, 1640804, 464688, 0, 0, 0), GREEN.veggies = c(15508,
174504, 149581, 0, 0, 6190), UN = structure(c(4L, 5L, 1L, 6L,
2L, 3L), .Label = c("12", "24", "28", "4", "8", "n/a"), class = "factor"),
ISO3 = structure(c(1L, 3L, 6L, 4L, 2L, 5L), .Label = c("AFG",
"AGO", "ALB", "ASM", "ATG", "DZA"), class = "factor")), .Names = c("BLUE.fruits", "BLUE.nuts", "BLUE.veggies", "GREEN.fruits", "GREEN.nuts",
"GREEN.veggies", "UN", "ISO3"), row.names = c(97L, 150L, 159L,
167L, 184L, 191L), class = "data.frame")
and the code I used before to plot one single map:
library(rworldmap)
mapDevice('x11')
spdf <- joinCountryData2Map(df, joinCode="ISO3", nameJoinColumn="ISO3")
mapWF <- mapCountryData(spdf, nameColumnToPlot="BLUE.nuts",
catMethod="quantiles")
Note: in mapCountryData() I used the name of a single column (in this case "BLUE.nuts"). My question is: is there a way to apply this mapping code to the different columns, creating six different maps? Either in one multi-panel figure using layout(), or even better, creating six different plots that get saved according to their colnames. Ideas? Thanks a lot in advance.
You are close.
Add this to save one plot per column.
# put column names to plot in a vector
col_names <- names(df)[1:6]

lapply(col_names, function(x) {
  # opens device to store pdf
  pdf(paste0(x, '.pdf'))
  # plots map
  mapCountryData(spdf, nameColumnToPlot = x)
  # closes created pdf
  dev.off()
})
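For the multi-panel variant mentioned in the question: mapCountryData() draws with base graphics, so par(mfrow = c(2, 3)) can arrange all six maps on one device. This is a sketch assuming the spdf object built in the question exists; it is guarded so it is a no-op if spdf or the rworldmap package is unavailable.

```r
col_names <- c("BLUE.fruits", "BLUE.nuts", "BLUE.veggies",
               "GREEN.fruits", "GREEN.nuts", "GREEN.veggies")

if (exists("spdf") && requireNamespace("rworldmap", quietly = TRUE)) {
  op <- par(mfrow = c(2, 3))  # 2 x 3 grid of panels
  for (nm in col_names) {
    rworldmap::mapCountryData(spdf, nameColumnToPlot = nm,
                              catMethod = "quantiles", mapTitle = nm)
  }
  par(op)                     # restore the previous layout
}
```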

Using summarise function to make sumIF with the dplyr package [duplicate]

This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I am using dplyr to make a sumIF function on my data frame. However, it does not give me the desired output:
> dput(sys)
structure(list(NUMERIC = c(244L, 24L, 1L, 2L, 4L, 111L, 23L,
2L, 3L, 4L, 24L), VAL = c("FALSE", "FALSE", "TES", "TEST", "TRUE",
"TRUE", "TRUE", "asdfs", "asdfs", "safd", "sd"), IDENTIFIER = c(99L,
99L, 98L, 98L, 99L, 99L, 99L, 13L, 13L, 99L, 12L)), .Names = c("NUMERIC",
"VAL", "IDENTIFIER"), row.names = c(NA, 11L), class = c("grouped_dt",
"tbl_dt", "tbl", "grouped_dt", "tbl_dt", "tbl", "data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000000100788>, sorted = c("VAL",
"IDENTIFIER"), vars = list(VAL, IDENTIFIER))
>
>
> sys <- group_by(sys, VAL, IDENTIFIER)
> df.summary <- summarise(sys,
+ numeric = sum(NUMERIC)
+ )
>
> (df.summary)
numeric
1 442
My desired result should look like this (one NUMERIC sum per VAL/IDENTIFIER group):
Any recommendation as to what I am doing wrong?
This can occur when you have plyr loaded along with dplyr: plyr's summarise() masks dplyr's version. You can either start a new R session or call the dplyr function explicitly:
dplyr::summarise(sys,
numeric = sum(NUMERIC)
)
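If you want to double-check the grouped sums without worrying about masking at all, base R's aggregate() computes the same "sumIF" (data re-typed from the dput above):

```r
sys <- data.frame(
  NUMERIC = c(244L, 24L, 1L, 2L, 4L, 111L, 23L, 2L, 3L, 4L, 24L),
  VAL = c("FALSE", "FALSE", "TES", "TEST", "TRUE", "TRUE", "TRUE",
          "asdfs", "asdfs", "safd", "sd"),
  IDENTIFIER = c(99L, 99L, 98L, 98L, 99L, 99L, 99L, 13L, 13L, 99L, 12L))

# Sum NUMERIC within each VAL/IDENTIFIER group.
out <- aggregate(NUMERIC ~ VAL + IDENTIFIER, data = sys, FUN = sum)
out[out$VAL == "FALSE", "NUMERIC"]  # 268 (= 244 + 24)
```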
