Use of $ and %% operators in R - r

I have been working with R for about 2 months and have had a little bit of trouble getting a hold of how the $ and %% terms.
I understand I can use the $ term to pull a certain value from a function (e.g. t.test(x)$p.value), but I'm not sure if this is a universal definition. I also know it is possible to use this to specify to pull certain data.
I'm also curious about the use of the %% term, in particular, if I am placing a value in between it (e.g. %x%) I am aware of using it as a modulator or remainder e.g. 7 %% 5 returns 2. Perhaps I am being ignorant and this is not real?
Any help or links to literature would be greatly appreciated.
Note: I have been searching for this for a couple hours so excuse me if I couldn't find it!

You are not really pulling a value from a function but rather from the list object that the function returns. $ is actually an infix that takes two arguments, the values preceding and following it. It is a convenience function designed that uses non-standard evaluation of its second argument. It's called non-standard because the unquoted characters following $ are first quoted before being used to extract a named element from the first argument.
t.test # is the function
t.test(x) # is a named list with one of the names being "p.value"
The value can be pulled in one of three ways:
t.test(x)$p.value
t.test(x)[['p.value']] # numeric vector
t.test(x)['p.value'] # a list with one item
my.name.for.p.val <- 'p.value'
t.test(x)[[ my.name.for.p.val ]]
When you surround a set of characters with flanking "%"-signs you can create your own vectorized infix function. If you wanted a pmax for which the defautl was na.rm=TRUE do this:
'%mypmax%' <- function(x,y) pmax(x,y, na.rm=TRUE)
And then use it without quotes:
> c(1:10, NA) %mypmax% c(NA,10:1)
[1] 1 10 9 8 7 6 7 8 9 10 1

First, the $ operator is for selecting an element of a list. See help('$').
The %% operator is the modulo operator. See help('%%').

The '$' operator is used to select particular element from a list or any other data component which contains sub data components.
For example: data is a list which contains a matrix named MATRIX and other things too.
But to get the matrix we write,
Print(data$MATRIX)
The %% operator is a modulus operator ; which provides the remainder.
For example: print(7%%3)
Will print 1 as an output

Related

Stringr str_which first compare 1st row with whole column than to next row

I am trying to match DNA sequences in a column. I am trying to find the longer version of itself, but also in this column it has the same sequence.
I am trying to use Str_which for which I know it works, since if I manually put the search pattern in it finds the rows which include the sequence.
As a preview of the data I have:
SNID type seqs2
9584818 seqs TCTTTCTTTAAGACACTGTCCCAAGCTGAAAGGGAACCTACCAAAGAAACTTCTTCATCTRAGGAATCTACTTATATGTGAGTGCAATGAACTTGTAGATTCTGCTCCTGGGGCCACAGAA
9584818 reversed TTCTGTGGCCCCAGGAGCAGAATCTACAAGTTCATTGCACTCACATATAAGTAGATTCCTYAGATGAAGAAGTTTCTTTGGTAGGTTCCCTTTCAGCTTGGGACAGTGTCTTAAAGAAAGA
9562505 seqs GTCTTCAGCATCTTTCTTTAAGACACTGTCCCAAGCTGAAAGGGAACCTACCAAAGAAACTTCTTCATCTRAGGAATCTACTTATATGTGAGTGCAATGAACTTGTAGATTCTGCTCCTGGGGCCACAGAACTTTGTGAAT
9562505 reversed ATTCACAAAGTTCTGTGGCCCCAGGAGCAGAATCTACAAGTTCATTGCACTCACATATAAGTAGATTCCTYAGATGAAGAAGTTTCTTTGGTAGGTTCCCTTTCAGCTTGGGACAGTGTCTTAAAGAAAGATGCTGAAGAC
Using a simple search of row one as x
x <- "TCTTTCTTTAAGACACTGTCCCAAGCTGAAAGGGAACCTACCAAAGAAACTTCTTCATCTRAGGAATCTACTTATATGTGAGTGCAATGAACTTGTAGATTCTGCTCCTGGGGCCACAGAA"
str_which(df$seqs2, x)
I get the answer I expect:
> str_which(df$seqs3, x)
[1] 1 3
But when I try to search as a whole column, I just get the result of the rows finding itself. And not the other rows in which it is also stated.
> str_which(df$seqs2, df$seqs2)
[1] 1 2 3 4
Since my data set is quite large, I do not want to do this manually, and rather use the column as input, and not just state "x" first.
Anybody any idea how to solve this? I have tried most Stringr cmds by now, but by mistake I might have did it wrongly or skipped some important ones.
Thanks in advance
You may need lapply :
lapply(df$seqs2, function(x) stringr::str_which(df$seqs2, x))
You can also use grep to keep this in base R :
lapply(df$seqs2, function(x) grep(x, df$seqs2))

Recoding a discrete variable

I have a discrete variable with scores from 1-3. I would like to change it so 1=2, 2=1, 3=3.
I have tried
recode(Data$GEB43, "c(1=2; 2=1; 3=3")
But that doesn't work.
I know this is an overly stupid question that can be solved in excel within seconds but trying to learn how to do basics like this in R.
We should always provide a minimal reproducible example:
df <- data.frame(x=c(1,1,2,2,3,3))
You didn't specifiy the package for recode so I assumed dplyr. ?dplyr::recode tells us how the arguments should be passed to the function. In the original question "c(1=2; 2=1; 3=3" is a string (i.e. not an R expression but a character string "c(1=2; 2=1; 3=3"). To make it an R expression we have to get rid of the double quotes and replace the ; with ,. Additionally, we need a closing bracket i.e. c(1=2, 2=1, 3=3). But still, as ?dplyr::recode tells us, this is not the way to pass this information to recode:
Solution using dplyr::recode:
dplyr::recode(df$x, "1"=2, "2"=1, "3"=3)
Returns:
[1] 2 2 1 1 3 3
Assuming, you mean dplyr::recode, the syntax is
recode(.x, ..., .default = NULL, .missing = NULL)
From the documentation it says
.x - A vector to modify
... - Replacements. For character and factor .x, these should be named and replacement is based only on their name. For numeric .x, these can be named or not. If not named, the replacement is done based on position i.e. .x represents positions to look for in replacements
So when you have numeric value you can replace based on position directly
recode(1:3, 2, 1, 3)
#[1] 2 1 3

Understanding List Sub-setting in R

Abstract.
I am having trouble understanding a unit of code regarding
the sub-setting of lists. I am applying an index to a list.
The problem is that when I apply the index to a list inside
a custom function, the list behaves like a table, returning
only the first column, but for every row (4 rows in total).
If I apply the same index to the same list outside of that
custom function, the output is only the first element of the
list, displaying both elements of the character vector contained
in the first element of the list. I need to know why there is a
difference in outputs.
How have I tried to resolve my issue by myself?
I performed a Google search on the following search term:
[Indexing Lists in R](Indexing Lists https://stackoverflow.com/questions/tagged/r).
The closest article was this one: How to correctly use lists in R.
But, it failed to answer my question.
Introduction.
I am citing the code that I am using before stating my question
because it is too confusing a matter to explain in the absolute
abstract.
In the below, there are four instructions that students are told
to follow. Each one is enumerated.
# Instruction 1:
# Create a character vector containing the names of the top four
# mathematicians that contributed to the field of statistics and
# list their birth years, with the name and year separated by a
# colon.
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
# The above code creates a character vector with four elements.
# Instruction 2: Next, use the strsplit() function to split the person's
# last name from his birth year.
split_name_and_year_born <- strsplit(mathematicians, split = ":")
# The variable split_name_and_year_born must be a list because
# strsplit only returns lists (according to the documentation).
# Instruction 3: Write a function that accepts a list or vector
# object and returns only the first element of that object.
first <- function(x) {
x[1]
}
# This is a fairly straightforward function. If x is a list then
# x[1] should be the first element of that list. The same is true
# for vectors.
# Instruction 4: apply the first function to the list split_name_and_year_born
lapply(split_name_and_year_born, first)
# [[1]]
# [1] "GAUSS"
#
# [[2]]
# [1] "BAYES"
#
# [[3]]
# [1] "PASCAL"
#
# [[4]]
# [1] "PEARSON"
My commentary: If you consider split_name_and_year_born as a list of vectors, of length = 2, we could imagine the list behaving somewhat like a table, wherein the first element is the first column in the table. This interpretation of the above code makes sense given the output. However, if I enter the following line of code, I get only the first element of the list.
split_name_and_year_born[1]
[[1]]
[1] "GAUSS" "1777"
My question is, why is there a difference in the output? I am using the same data structure, with the same data. I am only applying the indexing operator in different places. Why is there a difference in outputs? The function must be doing something implicit. I just do not know what.

"Named tuples" in r

If you load the pracma package into the r console and type
gammainc(2,2)
you get
lowinc uppinc reginc
0.5939942 0.4060058 0.5939942
This looks like some kind of a named tuple or something.
But, I can't work out how to extract the number below the lowinc, namely 0.5939942. The code (gammainc(2,2))[1] doesn't work, we just get
lowinc
0.5939942
which isn't a number.
How is this done?
As can be checked with str(gammainc(2,2)[1]) and class(gammainc(2,2)[1]), the output mentioned in the OP is in fact a number. It is just a named number. The names used as attributes of the vector are supposed to make the output easier to understand.
The function unname() can be used to obtain the numerical vector without names:
unname(gammainc(2,2))
#[1] 0.5939942 0.4060058 0.5939942
To select the first entry, one can use:
unname(gammainc(2,2))[1]
#[1] 0.5939942
In this specific case, a clearer version of the same might be:
unname(gammainc(2,2)["lowinc"])
Double brackets will strip the dimension names
gammainc(2,2)[[1]]
gammainc(2,2)[["lowinc"]]
I don't claim it to be intuitive, or obvious, but it is mentioned in the manual:
For vectors and matrices the [[ forms are rarely used, although they
have some slight semantic differences from the [ form (e.g. it drops
any names or dimnames attribute, and that partial matching is used for
character indices).
The partial matching can be employed like this
gammainc(2, 2)[["low", exact=FALSE]]
In R vectors may have names() attribute. This is an example:
vector <- c(1, 2, 3)
names(vector) <- c("first", "second", "third")
If you display vector, you should probably get desired output:
vector
> vector
first second third
1 2 3
To ensure what type of output you get after the function you can use:
class(your_function())
I hope this helps.

how to use R language $ symbol to extract column from a matrix

I am new to R language, if I use us_stocks$"LNC" I could get the corresponding data zoo. resB is a list with following elements.The library is zoo, which is the type of us_stocks
resB
# [[1]] LNC 7
# [[2]] GAM 62
# [[3]] CMA 7
class(resB)
# [1] "list"
names(resB[[1]])
# [1] "LNC"
but when use us_stocks$names(resB[[1]]) I could not get the zoo series? How to fix this?
It often takes a while to understand what is meant by " ... $ is a function which does not evaluate its second argument." Most R functions would take names(resB[[1]]) and eval;uate it and then act on the value. But not $. It expects the second argument to be an actual column name but given as an unquoted string. This is an example of "non-standard evaluation". You will also see it operating in the functions library and help, as well as many functions in what is known perhaps flippantly as the hadleyverse, which includes the packages 'ggplot2' and 'dplyr'. The names of dataframe columns or the nodes of R lists are character literals, however, they are not really R names in the sense that their values cannot be accessed with an unquoted sequence of letters typed to the console at the toplevel of R.
So as already stated you should be using d[[ names(resB[[1]]) ]]. This is also much safer to use in programming, since there are often problems with scoping involved with the use of the $-function in anything other than interactive console use.

Resources