vector - character/integer class (under the hood) - r

I'm starting to learn R, and I would appreciate some help understanding how R decides the class of different vectors. I initialize vec <- c(1:6), and when I run class(vec) I get 'integer'. Why is it not 'numeric'? I thought integers in R looked like this: 4L
Also with vec2 <- c(1,'a',2,TRUE), why is class(vec2) 'character'? I'm guessing R picks up on the characters and automatically assigns everything else to be characters...so then it actually looks like c('1','a','2','TRUE') am I correct?

Type the following to see the help page for the colon operator:
?`:`
Here is the relevant paragraph:
For numeric arguments, a numeric vector. This will be of type integer
if from is integer-valued and the result is representable in the R
integer type, otherwise of type "double" (aka mode "numeric").
So, in your example c(1:6), the from argument is 1, which is integer-valued, and the result is representable in the R integer type, so the resulting sequence is of type integer.
By the way, c is not needed to create a vector in this case.
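A quick way to check this at the console (a minimal sketch; the name dbl is just illustrative):

```r
vec <- 1:6                      # the colon operator alone builds the vector
typeof(vec)                     # "integer"
identical(vec, c(1:6))          # TRUE: wrapping the sequence in c() changes nothing

dbl <- c(1, 2, 3, 4, 5, 6)      # plain numeric literals are stored as doubles
typeof(dbl)                     # "double"
class(dbl)                      # "numeric"

typeof(4L)                      # "integer": the L suffix applies to literals
typeof(4)                       # "double"
```

So the L suffix is only one way to get an integer; a sequence built with : is another.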
For the second question: all the elements of a vector have to be of the same type, so R automatically converts them to a common type. In this case, everything can be converted to character, but "a" cannot be converted to numeric, so the result is a character vector. And yes, it ends up as c('1','a','2','TRUE').
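The coercion described above can be observed directly. This is only a sketch of the behaviour, using the vector from the question plus a few small made-up vectors to show the order of the type hierarchy:

```r
vec2 <- c(1, 'a', 2, TRUE)
identical(vec2, c("1", "a", "2", "TRUE"))   # TRUE: every element became a string

# Coercion goes toward the most general type present in the vector:
typeof(c(TRUE, 2L))     # "integer"
typeof(c(2L, 3.5))      # "double"
typeof(c(3.5, "a"))     # "character"

# Converting back fails for the non-numeric strings: they become NA
# (R also emits a warning, suppressed here to keep the output clean).
back <- suppressWarnings(as.numeric(vec2))
back                    # 1 NA 2 NA
```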

Related

How do I convert data from integer and dbl to numeric in R

As stated above, I'm trying to convert data in my dataframe from integer/dbl to numeric but I end up with dbl for both columns.
Original dataset
Code I'm using to convert to numeric:
data$price <- as.numeric(data$price)
data$lot_size <- as.numeric(data$lot_size)
The dataframe I end up with:
Dataset I have been working with: https://dasl.datadescription.com/datafile/housing-prices-ge19
"numeric is identical to double"
https://stat.ethz.ch/R-manual/R-devel/library/base/html/numeric.html
> typeof(as.numeric(3L))
[1] "double"
> typeof(as.integer(3L))
[1] "integer"
The stuff with types in R is a bit confusing. I would say that numeric is not really a data type at all in R. You will never get the answer numeric from the typeof function.
Both integers and doubles are considered to be numeric, and the function is.numeric will return TRUE for either.
On the other hand, numeric is more often a synonym for double.
The functions numeric and as.numeric are the same as double and as.double.
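To make the points above concrete, here is a small sketch. The data frame values are made up for illustration; only the column names price and lot_size come from the question. If the data is printed as a tibble, dbl is simply the abbreviation shown for double columns, so seeing dbl after as.numeric is the expected result.

```r
x_int <- 3L
x_dbl <- 3

typeof(x_int)               # "integer"
typeof(x_dbl)               # "double"
is.numeric(x_int)           # TRUE
is.numeric(x_dbl)           # TRUE
class(x_dbl)                # "numeric": class() says numeric, typeof() says double
typeof(as.numeric(x_int))   # "double"

# Hypothetical stand-in for the housing data
df <- data.frame(price = c(100L, 200L), lot_size = c(0.5, 1.25))
df$price <- as.numeric(df$price)
col_types <- sapply(df, typeof)
col_types                   # both columns are "double"
```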
Edit:
With a bit more research under my belt let me rephrase it like this:
'numeric' is the virtual superclass of both integer and double.
See for example getClass("numeric") and help(UseMethod) (first paragraph in the Details section).
Hadley says it better: Advanced R

Object in R is integer but has length of 8364

I have a data.frame from which I extracted a column called Volume. The code is as follows:
volume = aapl.us$Volume
In the console, I am told the following:
typeof(volume)
# "integer"
length(volume)
# 8364
How is this possible?
What you are seeing is not strange behavior in R. It may seem unintuitive at first to users of other programming languages, where there is a distinction between a scalar (a single number) and a vector (a one-dimensional array).
R does not have "scalar" data. The simplest data structure in R is a vector, and it can be a numeric, character, factor, integer, logical, or complex-valued vector. A single number in R is a "vector of length one", not a "scalar". A vector must contain data of the same type.
typeof() returns the type of a variable (see the link for further information). In your case, Volume is a vector that contains integers, and that vector has length 8364.
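A short sketch of the same idea, using a small made-up vector in place of aapl.us$Volume (the values are hypothetical):

```r
single <- 42L
typeof(single)        # "integer"
length(single)        # 1: a single number is a vector of length one
is.vector(single)     # TRUE

volume <- c(1200L, 3400L, 560L)   # hypothetical stand-in for the Volume column
typeof(volume)        # "integer": the type of the elements
length(volume)        # 3: how many elements there are

length(volume[2])     # 1: extracting one element still gives a vector
```

typeof() describes what the elements are, while length() counts them, so the two answers do not conflict.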

What are character vectors made of?

"Alice" is a character vector of length 1. "Bob" is also a character vector of length 1, but it's clearly shorter. At face value, it appears that R's character are made out of something smaller than characters, but if you try to subset them, say "Alice"[1], you'll just get the original vector back. How does R internally make sense of this? What are character vectors actually made of?
You're mistaking vector length for string length.
In R, ordinary values are all vectors, so "Alice" and "Bob" are each a vector containing one string, whether or not you assign them to a variable.
If you want to check the size of each string, use the nchar function:
nchar("Alice")
[1] 5
nchar("Bob")
[1] 3
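The difference between vector length and string length, and one way to get at the individual characters, can be sketched like this (substr and strsplit are base R functions; the variable names are just illustrative):

```r
s <- "Alice"
length(s)                       # 1: one string in the vector
nchar(s)                        # 5: five characters in that string
identical(s[1], s)              # TRUE: [1] selects the first string, not a character

substr(s, 1, 1)                 # "A": extract characters by position
chars <- strsplit(s, "")[[1]]   # split the string into one-character strings
chars                           # "A" "l" "i" "c" "e"
length(chars)                   # 5

nchar(c("Alice", "Bob"))        # 5 3: nchar is vectorised over the strings
```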

"Named tuples" in r

If you load the pracma package into the R console and type
gammainc(2,2)
you get
   lowinc    uppinc    reginc
0.5939942 0.4060058 0.5939942
This looks like some kind of a named tuple or something.
But, I can't work out how to extract the number below the lowinc, namely 0.5939942. The code (gammainc(2,2))[1] doesn't work, we just get
lowinc
0.5939942
which isn't a number.
How is this done?
As can be checked with str(gammainc(2,2)[1]) and class(gammainc(2,2)[1]), the output mentioned in the OP is in fact a number. It is just a named number. The names used as attributes of the vector are supposed to make the output easier to understand.
The function unname() can be used to obtain the numerical vector without names:
unname(gammainc(2,2))
#[1] 0.5939942 0.4060058 0.5939942
To select the first entry, one can use:
unname(gammainc(2,2))[1]
#[1] 0.5939942
In this specific case, a clearer version of the same might be:
unname(gammainc(2,2)["lowinc"])
Double brackets will strip the names:
gammainc(2,2)[[1]]
gammainc(2,2)[["lowinc"]]
I don't claim it to be intuitive, or obvious, but it is mentioned in the manual:
For vectors and matrices the [[ forms are rarely used, although they
have some slight semantic differences from the [ form (e.g. it drops
any names or dimnames attribute, and that partial matching is used for
character indices).
The partial matching can be employed like this
gammainc(2, 2)[["low", exact=FALSE]]
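The difference between the two bracket forms can be tried without installing pracma. The sketch below uses a hand-built named vector as a stand-in for the output of gammainc(2,2); the name res and the hard-coded values are assumptions copied from the printed output in the question.

```r
# Hypothetical stand-in for gammainc(2, 2)
res <- c(lowinc = 0.5939942, uppinc = 0.4060058, reginc = 0.5939942)

is.numeric(res[1])           # TRUE: a named number is still a number
names(res[1])                # "lowinc": single brackets keep the name
res[1] * 2                   # arithmetic works, and the name is carried along

res[[1]]                     # 0.5939942, without the name
is.null(names(res[[1]]))     # TRUE
res[["lowinc"]]              # same value, selected by name
unname(res["lowinc"])        # same value again, via unname()
```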
In R, vectors may have a names attribute. Here is an example:
vector <- c(1, 2, 3)
names(vector) <- c("first", "second", "third")
If you display vector, you get output in the same form:
> vector
 first second  third
     1      2      3
To check what type of output a function returns, you can use:
class(your_function())
I hope this helps.

Integer vs Numeric Datatype in R

This is a question I had while going through R programming course from Coursera. I had asked this question in their forums, but didn't get any answer.
So I thought, I should ask it here.
As I understood the professor in that lecture, when we store a number in a variable as shown below, it is numeric by default:
x <- 1
x
# prints numeric
class(x)
But when we store a vector as shown below (note: still without the 'L' suffix to force it to be an integer), the class is different:
x <- 1:10
x
# prints "integer", but why?
class(x)
I thought it should give me a numeric vector, but it is not the case.
Can anybody please explain what is happening here?
This has been discussed before; see http://r.789695.n4.nabble.com/Integer-vs-numeric-td847329.html
From help(":")
Value:
For numeric arguments, a numeric vector. This will be of type
'integer' if 'from' is integer-valued and the result is
representable in the R integer type, otherwise of type '"double"'
(aka 'mode' '"numeric"').
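Both conditions in that help text can be checked at the console. This is only a sketch; the name big is illustrative, and .Machine$integer.max is the largest value the R integer type can hold.

```r
x <- 1
class(x)                  # "numeric"
typeof(x)                 # "double": a bare literal is stored as a double

y <- 1:10
class(y)                  # "integer": from is integer-valued and the result fits

typeof(1.5:10)            # "double": from is not integer-valued

big <- .Machine$integer.max
typeof(big:(big + 1))     # "double": the result is not representable as integer
```

So the colon operator returns integers whenever it can and falls back to doubles otherwise, regardless of whether the L suffix was used.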
