assign to is.na(clinical.trial$age) - r

I am looking at the code from here which has this at the beginning:
## generate data for medical example
clinical.trial <-
data.frame(patient = 1:100,
age = rnorm(100, mean = 60, sd = 6),
treatment = gl(2, 50,
labels = c("Treatment", "Control")),
center = sample(paste("Center", LETTERS[1:5]), 100, replace =
TRUE))
## set some ages to NA (missing)
is.na(clinical.trial$age) <- sample(1:100, 20)
I cannot understand this last line.
The LHS is a vector of all FALSE values. The RHS is a vector of 20 numbers selected from the vector 1:100.
I don't understand this kind of assignment. How does it result in clinical.trial$age getting some NA values? Does this kind of assignment have a name? At best I would say that the boolean vector on the LHS gets numbers assigned to it with recycling.

is.na(x) <- value is translated as x <- 'is.na<-'(x, value).
You can think of 'is.na<-'(x, value) as 'return x with NA at the positions given by value'; the result is then assigned back to x.
A perhaps more intuitive (hypothetical) phrasing would be x <- assign_NA(to = x, pos = value).
Regarding other similar functions, we can find them in the base package:
x <- as.character(lsf.str("package:base"))
x[grep('<-', x)]
#> [1] "$<-" "$<-.data.frame"
#> [3] "@<-" "[[<-"
#> [5] "[[<-.data.frame" "[[<-.factor"
#> [7] "[[<-.numeric_version" "[<-"
#> [9] "[<-.data.frame" "[<-.Date"
#> [11] "[<-.factor" "[<-.numeric_version"
#> [13] "[<-.POSIXct" "[<-.POSIXlt"
#> [15] "<-" "<<-"
#> [17] "attr<-" "attributes<-"
#> [19] "body<-" "class<-"
#> [21] "colnames<-" "comment<-"
#> [23] "diag<-" "dim<-"
#> [25] "dimnames<-" "dimnames<-.data.frame"
#> [27] "Encoding<-" "environment<-"
#> [29] "formals<-" "is.na<-"
#> [31] "is.na<-.default" "is.na<-.factor"
#> [33] "is.na<-.numeric_version" "length<-"
#> [35] "length<-.factor" "levels<-"
#> [37] "levels<-.factor" "mode<-"
#> [39] "mostattributes<-" "names<-"
#> [41] "names<-.POSIXlt" "oldClass<-"
#> [43] "parent.env<-" "regmatches<-"
#> [45] "row.names<-" "row.names<-.data.frame"
#> [47] "row.names<-.default" "rownames<-"
#> [49] "split<-" "split<-.data.frame"
#> [51] "split<-.default" "storage.mode<-"
#> [53] "substr<-" "substring<-"
#> [55] "units<-" "units<-.difftime"
They all work the same way, in the sense that fun(x) <- val is equivalent to x <- 'fun<-'(x, val). Beyond that, they behave like any normal function.
R manuals: 3.4.4 Subset assignment
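To make the translation concrete, here is a minimal sketch. The replacement function `second<-` is hypothetical and defined only for illustration; the second half checks the same equivalence for is.na<-.

```r
# Hypothetical replacement function: sets the second element of x.
# A replacement function must return the whole modified object.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:5
second(x) <- 99L           # replacement-call syntax

y <- 1:5
y <- `second<-`(y, 99L)    # what R evaluates behind the scenes

identical(x, y)            # TRUE

# The same equivalence for is.na<-
z <- c(10, 20, 30, 40)
is.na(z) <- c(2, 4)

w <- c(10, 20, 30, 40)
w <- `is.na<-`(w, c(2, 4))

identical(z, w)            # TRUE
```

The key point is that the function returns the modified object and R assigns it back to the variable named on the left-hand side.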

The help tells us that:
(xx <- c(0:4))
is.na(xx) <- c(2, 4)
xx #> 0 NA 2 NA 4
So,
is.na(xx) <- 1
behaves more like
set NA at position 1 on variable xx

@matt, to respond to your question asked above in the comments, here's an alternative way to do the same assignment that I think is easier to follow :-)
clinical.trial$age[sample(1:100, 20)] <- NA
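As a quick check that the two forms are interchangeable, the sketch below applies both to copies of the same vector using one shared index vector. The names age, idx, a and b are made up for this example.

```r
set.seed(1)
age <- rnorm(100, mean = 60, sd = 6)
idx <- sample(1:100, 20)   # draw the positions once so both forms use the same ones

a <- age
is.na(a) <- idx            # replacement-function form

b <- age
b[idx] <- NA               # plain subset assignment

identical(a, b)            # TRUE
sum(is.na(a))              # 20
```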

Related

Get the mean for every iteration

I'm new in R. Hoping someone could help me.
I am trying to get the mean of the first n values of i at the nth iteration (for example, the first value on the first iteration, then the first two values on the second iteration).
How do I go about doing this?
Here is the sample data:
set.seed(1234)
i <- sample(200,100)
An alternative, perhaps simpler, solution:
set.seed(1234)
i <- sample(200,100)
cumsum(i)/(1:100)
#> [1] 28.00000 54.00000 86.00000 89.75000 94.00000 101.16667 105.71429
#> [8] 113.25000 116.66667 118.20000 116.36364 115.25000 113.30769 110.21429
#> [15] 108.13333 108.62500 103.05882 104.33333 102.10526 97.20000 101.66667
#> [22] 103.81818 101.04348 100.70833 101.56000 105.11538 103.66667 105.96429
#> [29] 106.55172 104.60000 104.70968 105.53125 104.96970 103.08824 103.42857
#> [36] 102.55556 104.10811 102.47368 100.94872 98.47500 98.92683 101.00000
#> [43] 99.79070 99.84091 98.75556 99.52174 100.76596 101.87500 100.95918
#> [50] 101.66000 100.17647 101.03846 102.37736 100.62963 100.54545 99.14286
#> [57] 98.01754 99.20690 100.38983 100.15000 101.00000 99.53226 99.68254
#> [64] 100.34375 100.07692 101.39394 100.17910 99.75000 99.18841 99.85714
#> [71] 100.35211 100.72222 102.04110 101.02703 100.69333 101.53947 102.44156
#> [78] 101.89744 101.43038 100.61250 100.83951 102.04878 101.04819 99.95238
#> [85] 99.12941 98.70930 97.77011 98.44318 98.92135 98.46667 97.45055
#> [92] 97.31522 97.75269 97.05319 96.84211 97.02083 97.81443 97.93878
#> [99] 98.92929 99.55000
Created on 2022-03-04 by the reprex package (v2.0.1)
Here's a one-liner to get the result:
sapply(1:100, function(x) mean(i[seq(x)]))
#> [1] 28.00000 54.00000 86.00000 89.75000 94.00000 101.16667 105.71429
#> [8] 113.25000 116.66667 118.20000 116.36364 115.25000 113.30769 110.21429
#> [15] 108.13333 108.62500 103.05882 104.33333 102.10526 97.20000 101.66667
#> [22] 103.81818 101.04348 100.70833 101.56000 105.11538 103.66667 105.96429
#> [29] 106.55172 104.60000 104.70968 105.53125 104.96970 103.08824 103.42857
#> [36] 102.55556 104.10811 102.47368 100.94872 98.47500 98.92683 101.00000
#> [43] 99.79070 99.84091 98.75556 99.52174 100.76596 101.87500 100.95918
#> [50] 101.66000 100.17647 101.03846 102.37736 100.62963 100.54545 99.14286
#> [57] 98.01754 99.20690 100.38983 100.15000 101.00000 99.53226 99.68254
#> [64] 100.34375 100.07692 101.39394 100.17910 99.75000 99.18841 99.85714
#> [71] 100.35211 100.72222 102.04110 101.02703 100.69333 101.53947 102.44156
#> [78] 101.89744 101.43038 100.61250 100.83951 102.04878 101.04819 99.95238
#> [85] 99.12941 98.70930 97.77011 98.44318 98.92135 98.46667 97.45055
#> [92] 97.31522 97.75269 97.05319 96.84211 97.02083 97.81443 97.93878
#> [99] 98.92929 99.55000
Created on 2022-03-04 by the reprex package (v2.0.1)
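The two answers compute the same running mean. A small sketch to confirm this, using seq_along() so the length is not hard-coded (the names running_a and running_b are just for this example):

```r
set.seed(1234)
i <- sample(200, 100)

# Vectorised: cumulative sum divided by the number of values seen so far
running_a <- cumsum(i) / seq_along(i)

# Explicit: mean of the first k values, for each k
running_b <- sapply(seq_along(i), function(k) mean(i[seq_len(k)]))

all.equal(running_a, running_b)   # TRUE
```

The cumsum() version does a single pass over the data, whereas the sapply() version recomputes each mean from scratch, so the former should scale better on long vectors.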

Import a package with only local side-effect

When writing tests, I sometimes want to check how R would react to conflicts.
For instance, my package contains a compact() function that conflicts with purrr::compact(), and I wrote some code so that the latter is still used on regular lists.
In my tests, I want to check that purrr::compact() will still work on regular lists if my package is loaded.
Therefore, I wrote a unit-test that looks a bit like this:
test_that("Test A", {
library(purrr, include.only="compact", warn.conflicts=FALSE)
compact = crosstable::compact
x = list(a = "a", b = NULL, c = integer(0), d = NA, e = list())
expect_identical(compact(x), list(a="a",d=NA))
})
However, the library() call has a global effect that interferes with some other unrelated tests.
Is there a way to import a library locally?
I'm thinking about something like rlang::local_options().
My first idea is the withr package, which helps with all kinds of temporary-state problems. Keep in mind that the package's namespace will still be loaded afterwards (see loadedNamespaces()); only the attachment to the search path is undone.
Example of usage from .GlobalEnv:
search()
#> [1] ".GlobalEnv" "package:stats" "package:graphics"
#> [4] "package:grDevices" "package:utils" "package:datasets"
#> [7] "package:methods" "Autoloads" "tools:callr"
#> [10] "package:base"
withr::with_package("dplyr", {airquality %>% mutate(n = 2) %>% head()})
#> Ozone Solar.R Wind Temp Month Day n
#> 1 41 190 7.4 67 5 1 2
#> 2 36 118 8.0 72 5 2 2
#> 3 12 149 12.6 74 5 3 2
#> 4 18 313 11.5 62 5 4 2
#> 5 NA NA 14.3 56 5 5 2
#> 6 28 NA 14.9 66 5 6 2
mutate
#> Error in eval(expr, envir, enclos): object 'mutate' not found
search()
#> [1] ".GlobalEnv" "package:stats" "package:graphics"
#> [4] "package:grDevices" "package:utils" "package:datasets"
#> [7] "package:methods" "Autoloads" "tools:callr"
#> [10] "package:base"
Created on 2021-06-21 by the reprex package (v2.0.0)
Another idea is to use utils::getFromNamespace:
fun <- utils::getFromNamespace("fun", "pkg")
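A minimal sketch of that second idea, using the tools package (shipped with R but not attached by default) as a stand-in for "pkg". The function is bound inside local(), so nothing is attached and the search path is left untouched; the namespace does get loaded.

```r
before <- search()

ext <- local({
  # Bind one function from a namespace to a local name; no library() call needed
  file_ext <- utils::getFromNamespace("file_ext", "tools")
  file_ext("report.csv")
})

ext                                # "csv"
identical(before, search())        # TRUE: nothing was attached
"tools" %in% loadedNamespaces()    # TRUE: the namespace is loaded, though
```

For an exported function, pkg::fun achieves the same thing; getFromNamespace() also reaches unexported objects.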

skimr: how to remove histogram?

I want to use the function skim from the R package skimr on Windows. Unfortunately, in many situations the column hist is printed incorrectly (with many <U+2587>-like symbols), as in the example below.
Question: is there an easy way to either disable column "hist" and prevent it from being printed or prevent it from being calculated at all? Is there an option like hist = FALSE?
capture.output(skimr::skim(iris))
#> [1] "Skim summary statistics"
#> [2] " n obs: 150 "
#> [3] " n variables: 5 "
#> [4] ""
#> [5] "-- Variable type:factor ------------------------------------------------------------------------"
#> [6] " variable missing complete n n_unique top_counts"
#> [7] " Species 0 150 150 3 set: 50, ver: 50, vir: 50, NA: 0"
#> [8] " ordered"
#> [9] " FALSE"
#> [10] ""
#> [11] "-- Variable type:numeric -----------------------------------------------------------------------"
#> [12] " variable missing complete n mean sd p0 p25 p50 p75 p100"
#> [13] " Petal.Length 0 150 150 3.76 1.77 1 1.6 4.35 5.1 6.9"
#> [14] " Petal.Width 0 150 150 1.2 0.76 0.1 0.3 1.3 1.8 2.5"
#> [15] " Sepal.Length 0 150 150 5.84 0.83 4.3 5.1 5.8 6.4 7.9"
#> [16] " Sepal.Width 0 150 150 3.06 0.44 2 2.8 3 3.3 4.4"
#> [17] " hist"
#> [18] " <U+2587><U+2581><U+2581><U+2582><U+2585><U+2585><U+2583><U+2581>"
#> [19] " <U+2587><U+2581><U+2581><U+2585><U+2583><U+2583><U+2582><U+2582>"
#> [20] " <U+2582><U+2587><U+2585><U+2587><U+2586><U+2585><U+2582><U+2582>"
#> [21] " <U+2581><U+2582><U+2585><U+2587><U+2583><U+2582><U+2581><U+2581>"
Changing the locale to Chinese (as in this answer) does not solve the problem, but makes it worse:
Sys.setlocale(locale = "Lithuanian")
df <- data.frame(x = 1:5, y = c("Ą", "Č", "Ę", "ū", "ž"))
Sys.setlocale(locale = "Chinese")
capture.output(skimr::skim(df))
#> Error in substr(names(x), 1, options$formats$.levels$max_char) : invalid multibyte string at '<c0>'
skim_with(numeric = list(hist = NULL))
This is in the "Using Skimr" vignette.
You could also use skim_without_charts instead of skim.
More details in the docs here:
https://www.rdocumentation.org/packages/skimr/versions/2.0.2/topics/skim
Also keep in mind that the output from skimr is a dataframe so you can do:
# I'm using tidyverse here
iris %>%
skim() %>%
select(-numeric.hist)
The catch is that the name of the column is not hist but numeric.hist.
I actually got to this question because I wanted to do the opposite: keep only the histograms.

Scientific notation with Rmpfr in R

I am testing some calculation based on the example code from Rmpfr in R.
The test is as follows:
ns <- mpfr(1:24, 120) ; factorial(ns)
However, in my output the result is:
[1] 1e0 2e0 6e0
[4] 2.4e1 1.2e2 7.2e2
[7] 5.04e3 4.032e4 3.6288e5
[10] 3.6288e6 3.99168e7 4.790016e8
[13] 6.2270208e9 8.71782912e10 1.307674368e12
[16] 2.0922789888e13 3.55687428096e14 6.402373705728e15
[19] 1.21645100408832e17 2.43290200817664e18 5.109094217170944e19
[22] 1.12400072777760768e21 2.585201673888497664e22 6.2044840173323943936e23
While in the example output, the scientific notation is gone:
[1] 1 2
[3] 6 24
[5] 120 720
[7] 5040 40320
[9] 362880 3628800
[11] 39916800 479001600
[13] 6227020800 87178291200
[15] 1307674368000 20922789888000
[17] 355687428096000 6402373705728000
[19] 121645100408832000 2432902008176640000
[21] 51090942171709440000 1124000727777607680000
[23] 25852016738884976640000 620448401733239439360000
How could I turn off the scientific notation in this case?
I have tried:
options(scipen=999)
mpfr(1:24, 120,scientific=FALSE)
But neither of them works.
Use scipen = 0:
library(Rmpfr)
options(scipen = 999)
ns <- mpfr(1:24, 120) ; factorial(ns)
# 24 'mpfr' numbers of precision 120 bits
# [1] 1e0 2e0 6e0
# [4] 2.e1 1.e2 7.e2
# [7] 5.0e3 4.03e4 3.628e5
# [10] 3.628e6 3.9916e7 4.79001e8
# [13] 6.227020e9 8.7178291e10 1.30767436e12
# [16] 2.092278988e13 3.5568742809e14 6.40237370572e15
# [19] 1.2164510040883e17 2.4329020081766e18 5.10909421717094e19
# [22] 1.1240007277776076e21 2.58520167388849766e22 6.204484017332394393e23
options(scipen = 0)
ns <- mpfr(1:24, 120) ; factorial(ns)
# 24 'mpfr' numbers of precision 120 bits
# [1] 1 2 6
# [4] 24 120 720
# [7] 5040 40320 362880
# [10] 3628800 39916800 479001600
# [13] 6227020800 87178291200 1307674368000
# [16] 20922789888000 355687428096000 6402373705728000
# [19] 121645100408832000 2432902008176640000 51090942171709440000
# [22] 1124000727777607680000 25852016738884976640000 620448401733239439360000
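For comparison, here is how scipen affects ordinary doubles in base R, where a large positive value favours fixed notation. This sketch deliberately uses only base R and says nothing about Rmpfr's own print method, which, judging by the output above, reacts differently.

```r
old <- options(scipen = 0)      # R's default penalty
a <- format(1e15)               # "1e+15"

options(scipen = 999)           # strongly prefer fixed notation
b <- format(1e15)               # "1000000000000000"

options(old)                    # restore the previous setting

c(a, b)
```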

How to enumerate all S4 methods implemented by a package?

I'm looking for a way to query all S4 methods implemented by a particular package (given through its namespace environment). I think I could enumerate all objects that start with .__T__, but I'd prefer a documented and/or less hackish way.
> ls(asNamespace("RSQLite"), all.names = TRUE, pattern = "^[.]__T__")
[1] ".__T__dbBegin:DBI" ".__T__dbBeginTransaction:RSQLite"
[3] ".__T__dbBind:DBI" ".__T__dbClearResult:DBI"
[5] ".__T__dbColumnInfo:DBI" ".__T__dbCommit:DBI"
[7] ".__T__dbConnect:DBI" ".__T__dbDataType:DBI"
[9] ".__T__dbDisconnect:DBI" ".__T__dbExistsTable:DBI"
[11] ".__T__dbFetch:DBI" ".__T__dbGetException:DBI"
[13] ".__T__dbGetInfo:DBI" ".__T__dbGetPreparedQuery:RSQLite"
[15] ".__T__dbGetQuery:DBI" ".__T__dbGetRowCount:DBI"
[17] ".__T__dbGetRowsAffected:DBI" ".__T__dbGetStatement:DBI"
[19] ".__T__dbHasCompleted:DBI" ".__T__dbIsValid:DBI"
[21] ".__T__dbListFields:DBI" ".__T__dbListResults:DBI"
[23] ".__T__dbListTables:DBI" ".__T__dbReadTable:DBI"
[25] ".__T__dbRemoveTable:DBI" ".__T__dbRollback:DBI"
[27] ".__T__dbSendPreparedQuery:RSQLite" ".__T__dbSendQuery:DBI"
[29] ".__T__dbUnloadDriver:DBI" ".__T__dbWriteTable:DBI"
[31] ".__T__fetch:DBI" ".__T__isSQLKeyword:DBI"
[33] ".__T__make.db.names:DBI" ".__T__show:methods"
[35] ".__T__sqlData:DBI" ".__T__SQLKeywords:DBI"
I think showMethods is the only thing available in methods, but it does not actually return the functions as an object, just prints them to the screen.
The following will return a list of the methods defined in an environment. Adapted from covr::replacements_S4(), which is used to modify all methods in a package to track coverage.
S4_methods <- function(env) {
generics <- methods::getGenerics(env)
res <- Map(generics@.Data, generics@package, USE.NAMES = FALSE,
f = function(name, package) {
what <- methods::methodsPackageMetaName("T", paste(name, package, sep = ":"))
table <- get(what, envir = env)
mget(ls(table, all.names = TRUE), envir = table)
})
res[lengths(res) > 0]
}
m <- S4_methods(asNamespace("DBI"))
length(m)
#> [1] 21
m[1:3]
#> [[1]]
#> [[1]]$DBIObject
#> function(dbObj, obj, ...) {
#> dbiDataType(obj)
#> }
#> <environment: namespace:DBI>
#> attr(,"target")
#> An object of class "signature"
#> dbObj
#> "DBIObject"
#> attr(,"defined")
#> An object of class "signature"
#> dbObj
#> "DBIObject"
#> attr(,"generic")
#> [1] "dbDataType"
#> attr(,"generic")attr(,"package")
#> [1] "DBI"
#> attr(,"class")
#> [1] "MethodDefinition"
#> attr(,"class")attr(,"package")
#> [1] "methods"
#>
#>
#> [[2]]
#> [[2]]$character
#> function(drvName, ...) {
#> findDriver(drvName)(...)
#> }
#> <environment: namespace:DBI>
#> attr(,"target")
#> An object of class "signature"
#> drvName
#> "character"
#> attr(,"defined")
#> An object of class "signature"
#> drvName
#> "character"
#> attr(,"generic")
#> [1] "dbDriver"
#> attr(,"generic")attr(,"package")
#> [1] "DBI"
#> attr(,"class")
#> [1] "MethodDefinition"
#> attr(,"class")attr(,"package")
#> [1] "methods"
#>
#>
#> [[3]]
#> [[3]]$`DBIConnection#character`
#> function(conn, statement, ...) {
#> rs <- dbSendStatement(conn, statement, ...)
#> on.exit(dbClearResult(rs))
#> dbGetRowsAffected(rs)
#> }
#> <environment: namespace:DBI>
#> attr(,"target")
#> An object of class "signature"
#> conn statement
#> "DBIConnection" "character"
#> attr(,"defined")
#> An object of class "signature"
#> conn statement
#> "DBIConnection" "character"
#> attr(,"generic")
#> [1] "dbExecute"
#> attr(,"generic")attr(,"package")
#> [1] "DBI"
#> attr(,"class")
#> [1] "MethodDefinition"
#> attr(,"class")attr(,"package")
#> [1] "methods"
I think you want the showMethods function, as in:
showMethods(where=asNamespace("RSQLite"))
The output is:
Function: dbBegin (package DBI)
conn="SQLiteConnection"
Function: dbBeginTransaction (package RSQLite)
conn="ANY"
Function: dbClearResult (package DBI)
res="SQLiteConnection"
res="SQLiteResult"
Function: dbColumnInfo (package DBI)
res="SQLiteResult"
and this goes on for many more rows. ?showMethods documents some additional arguments for tailoring the results.
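If you need the methods as objects rather than printed text, the methods package also offers findMethods() and existsMethod(). The sketch below is a toy, self-contained illustration: the generic area and the class Circle are made up for this example and are not part of any package discussed here.

```r
library(methods)

# Toy class, generic and method, defined only for this illustration
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape, ...) standardGeneric("area"))
setMethod("area", "Circle", function(shape, ...) pi * shape@r^2)

# findMethods() returns the method definitions as a list-like object,
# named by signature
ms <- findMethods("area")
names(ms)

# existsMethod() checks for a method defined for exactly this signature
existsMethod("area", "Circle")

area(new("Circle", r = 1))
```

findMethods() also takes a where argument, so it can presumably be pointed at a namespace environment in the same way as showMethods(where = ...) above, one generic at a time.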

Resources