Coercing a character to a numeric in R - r

I'm a newbie to R and I've learnt that a character string like "12.5" can be coerced to a numeric using the as.numeric() function, which gives me the following result.
> as.numeric("12.5")
[1] 12.5
But when I try the following, the result doesn't contain the fractional part.
> as.numeric("12.0")
[1] 12
Is there a way to keep the fractional part in the result?
Thanks in advance...

Is it really necessary to print whole numbers that way? If you're worried about how it will appear with other numbers, say in a vector or data frame, not to worry. If you have at least one decimal number in the vector, the whole number will appear as a decimal as well.
> as.numeric(c("12.0", "12.1"))
## [1] 12.0 12.1
> data.frame(x = as.numeric(c("12.0", "12.1")))
## x
## 1 12.0
## 2 12.1
If it's simply for appearance purposes, there are a few functions that can print "12.0" without quotes so it looks numeric. Keep in mind, however, that this does not coerce to numeric, even though it looks like it does.
> noquote("12.0")
## [1] 12.0
> cat("12.0")
## 12.0
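If the trailing zero matters only for display, base R's formatting helpers can produce it explicitly. A minimal sketch (note that both return character strings, not numerics):

```r
x <- as.numeric("12.0")   # x holds the numeric value 12
format(x, nsmall = 1)     # "12.0" -- pads to at least 1 decimal place
sprintf("%.1f", x)        # "12.0" -- C-style fixed-precision formatting
```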

Related

Converting character matrix with vectors of numerics and plain numerics into numeric

This in-theory-simple task drove me crazy today. I'm rather new to R, but have gotten along quite well until now. Maybe one of you will have an easier time solving it.
In short: How do I get the maximum values per observation out of a somehow 'mixed' character matrix similar to this one?
dummy = as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))
so that my result says (in numeric): c(3, 2, 1.5, 2.9)
The longer story:
I'm coming from a
stri_match_all_regex(somestring, regexp)
to get some numbers from plain text. This returns a character matrix (by definition of the stri_match_all_regex function),
which, after stripping out some stray characters, looks something like this:
dummy = as.matrix(c("c(1.5,2.6,3)","2","1.5","c(1.8,2.9)"))
You already see the complication of the strings instead of vectors in my matrix here. My desired state is to identify the maximum value of each row.
Usually nothing simpler as that, I'd e.g. run
lapply(dummy, max)
But applying numerical functions obviously won't work on these characters disguised as numerics. (Until this point I hadn't even realized that these are all characters and not numbers, as they show up without quotation marks in RStudio's View(dummy).) Turning it into numerics with
as.numeric(dummy)
turns the vector-like strings into NAs, losing them entirely. Not what I want. I want each "c(1.2,5)" interpreted as if it were a 'real', quotation-mark-less c(1.2,5), and the plain numbers as numbers too, of course.
I even tried to strsplit / gsub the columns but that doesn't seem fruitful either or I'm just doing it wrong.
gsub( ",|c\\(|\\)", ",", dummy)
leaves me with NAs, since the leftover commas aren't interpreted as separators, and
as.numeric(strsplit(dummy, ",|.\\(|\\)"))
won't allow me to coerce the list object it returns to numeric.
Hence the straightforward question:
How do I turn a character matrix similar to dummy into a "usable" form so I can apply numeric functions to both the plain numbers and the vectors of numbers?
Thanks for your help! I feel like this should be easy... but I've been stuck on it for quite a while now.
You can use eval/parse to get the numeric values.
result <- apply(dummy, 1, function(s) {
  eval(parse(text = s))
})
result
#[[1]]
#[1] 1.5 2.6 3.0
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] 1.5
#
#[[4]]
#[1] 1.8 2.9
If you'd like a tidyverse solution, here's one that makes use of purrr and stringr. Mapping along the items in dummy, I remove any "c" and parentheses from each entry, split it by commas and (optionally) space, flatten into a single-level list, and convert to numeric.
library(tidyverse)
dummy <- as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))
map(dummy, ~ str_remove_all(., "[c\\(\\)]") %>%
      str_split(",\\s?") %>%
      flatten_chr() %>%
      as.numeric()
)
#> [[1]]
#> [1] 1.5 2.6 3.0
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 1.5
#>
#> [[4]]
#> [1] 1.8 2.9
Created on 2018-07-10 by the reprex package (v0.2.0).
You can use this:
apply(dummy, 1, function(x) max(eval(parse(text=x))))
Result:
[1] 3.0 2.0 1.5 2.9
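Since eval(parse()) executes arbitrary code, it's worth noting a regex-based alternative for untrusted strings: pull the numeric tokens out directly. A sketch assuming the same dummy matrix and non-negative values (the pattern ignores minus signs):

```r
dummy <- as.matrix(c("c(1.5,2.6,3)", "2", "1.5", "c(1.8, 2.9)"))

# Extract every run of digits/dots from each string, coerce, take the max
vapply(dummy, function(s) {
  nums <- as.numeric(regmatches(s, gregexpr("[0-9.]+", s))[[1]])
  max(nums)
}, numeric(1), USE.NAMES = FALSE)
# [1] 3.0 2.0 1.5 2.9
```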

Why do $ and [ on a data frame column give different output presentation and data types?

I'm new to R. Just learning via online tutorials. My question is:
1) Why does accessing the same column with different syntaxes have different output presentation?
Vertical Display:
> airquality["Ozone"]
Ozone
1 41
2 36
3 12
Horizontal Display:
airquality$Ozone
[1] 41 36 12 18 NA 28 23 19 8
[46] NA 21 37 20 12 13 NA NA NA
[91] 64 59 39 9 16 78 35 66 122
2) Why do the following have different data types?
> class(airquality["Ozone"])
[1] "data.frame"
> class(airquality$Ozone)
[1] "integer"
> class(airquality[["Ozone"]])
[1] "integer"
Same reason for both: airquality["Ozone"] returns a data frame, whereas airquality$Ozone returns a vector. class() shows you their object types. str() is also good for succinctly summarizing an object.
See the help on the '[' operator, which is also known as 'extracting', or the function getElement(). In R, you can call help() on a special character or operator, just surround it with quotes: ?'[' or ?'$' (In Python/C++/Java or most other languages we'd call this 'slicing').
As to why they print differently: print(obj) in R dispatches an object-specific print method under the hood. In this case that's print.data.frame, which prints the data frame column(s) vertically with row indices, vs. print.default for a vector, which prints the contents horizontally, prefixing each line with the bracketed position of its first element.
Now back to extraction with the '[' vs '$' operators:
The most important distinction between ‘[’, ‘[[’ and ‘$’ is that the ‘[’ can select more than one element whereas the other two ’[[’ and ‘$’ select a single element.
There's also a '[[' extract syntax, which will do like '$' does in selecting a single element (vector):
airquality[["Ozone"]]
[1] 41 36 12 18
The difference between [["colname"]] and $colname is that in the former, the column-name can come from a variable, but in the latter, it must be a string. So [[varname]] would allow you to index different columns depending on value of varname.
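A small sketch of that difference, using the built-in airquality data:

```r
colname <- "Ozone"
head(airquality[[colname]], 3)  # 41 36 12 -- '[[' resolves the variable
head(airquality$colname, 3)     # NULL -- '$' looks for a column literally named "colname"
```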
Read the doc about the exact = TRUE and drop = TRUE arguments to '['. Note drop = TRUE only works on arrays/matrices, not data frames, where it's ignored with a warning:
airquality["Ozone", drop=TRUE]
In `[.data.frame`(airquality, "Ozone", drop = TRUE) :
'drop' argument will be ignored
It's all kind of confusing and off-putting at first: eccentrically different and quirkily non-self-explanatory. But once you learn the syntax, it makes sense. Until then, it feels like banging your head against a wall of symbols.
Please take a very brief skim of R-intro and R-lang#Indexing HTML or in PDF. Bookmark them and come back to them regularly. Read them on the bus or plane...
PS: as @Henry mentioned, strictly speaking, when accessing a data frame we should insert a comma to disambiguate that the name applies to columns, not rows: airquality[, "Ozone"]. With numeric indices, airquality[, 1] and airquality[1] both extract the Ozone column, whereas airquality[1, ] extracts the first row. R applies some cleverness here, since strings usually aren't row indices.
Anyway, it's all in the doc... not necessarily all contiguous or clearly-explained... welcome to R :-)

as.numeric is rounding positive values / outputing NA for negative values [duplicate]

This question already has answers here:
How to convert a factor to integer/numeric without loss of information?
(12 answers)
Closed 4 years ago.
I am trying to do something in R which should be extremely simple: convert values in a data.frame to numbers, as I need to test their values and R does not recognize them as numbers.
When I convert a decimal number to numeric, I get the correct value:
> a <- as.numeric(1.2)
> a
[1] 1.2
However, when I extract a positive value from the data.frame then use as.numeric, the number is rounded up:
> class(slices2drop)
[1] "data.frame"
> slices2drop[2,1]
[1] 1.2
Levels: 1 1.2
> a <- as.numeric(slices2drop[2,1])
> a
[1] 2
Just in case:
> a*100
[1] 200
So this is not a problem with display, the data itself is not properly handled.
Also, when the number is negative, I get NA:
> slices2drop[2,1] <- -1
> a <- as.numeric(slices2drop[2,1])
> a
[1] NA
Any idea as to what may be happening?
This problem has to do with factors. To solve your problem, first coerce your factor variable to be character and then apply as.numeric to get what you want.
> x <- factor(c(1, 1.2, 1.3)) # a factor variable
> as.numeric(x)
[1] 1 2 3
Integers are returned, one per level. There are 3 levels: 1, 1.2 and 1.3, so 1, 2, 3 is returned.
> as.numeric(as.character(x)) # this is what you're looking for
[1] 1.0 1.2 1.3
Actually, as.numeric is not rounding your numbers; it returns the underlying integer code for each level of your factor variable.
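The Warning section of ?factor also suggests a slightly more efficient idiom than as.numeric(as.character(x)): index the levels directly, so each level is converted only once. A quick sketch:

```r
x <- factor(c(1, 1.2, 1.3))
as.numeric(as.character(x))  # 1.0 1.2 1.3
as.numeric(levels(x))[x]     # 1.0 1.2 1.3 -- converts each level only once
```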
I faced a similar situation where the conversion of factor into numeric would generate incorrect results.
When you type: ?factor
The Warning mentioned with the factor variables explains this complexity very well and provides the solution to this problem as well.
It's a good place to start working with...
Another thing to consider is that such a conversion will turn NULLs into NAs.

Transposing a large dataframe / matrix in R

I'm encountering a strange issue transposing a large dataset. I want to get a list of non-linear flight routes (i.e. sub-lists of vectors with 30 vertices each) into a data frame (with 32 columns for vertices). The list coerces into a data.frame no problem, but then produces unexpected results when (1) transposing with t(x) and (2) converting to a matrix.
To illustrate:
> class(gc)
[1] "list"
> length(gc)
[1] 58278
> gc[[1]][1:30]
[1] 147.2200 147.1606 147.1012 147.0418 146.9824 146.9231 146.8638
[8] 146.8046 146.7454 146.6862 146.6270 146.5679 146.5088 146.4498
[15] 146.3908 146.3318 146.2728 146.2139 146.1550 146.0961 146.0373
[22] 145.9785 145.9197 145.8610 145.8022 145.7435 145.6849 145.6262
[29] 145.5676 145.5090
> gc2 <- data.frame(gc)
> nrow(gc2)
[1] 32
> length(gc2)
[1] 116556
> gc2[1:5,1:5]
lon lat lon.1 lat.1 lon.2
1 147.2200 -9.443383 -80.37861 43.46083 -87.90484
2 147.1606 -9.335072 -80.23135 43.52385 -87.53193
3 147.1012 -9.226751 -80.08379 43.58667 -87.15751
4 147.0418 -9.118420 -79.93591 43.64931 -86.78161
5 146.9824 -9.010080 -79.78773 43.71175 -86.40421
> gc3 <- t(gc2)
> nrow(gc3)
[1] 116556
> length(gc3)
[1] 3729792
> gc3 <- as.matrix(gc2)
> nrow(gc3)
[1] 32
> length(gc3)
[1] 3729792
The 3729792 figure is 116556 * 32.
Grateful for any assistance!
3729792 figure is 116556*32
That is correct. length() for a matrix tells you the number of elements the matrix holds (which you have verified). length() for a data.frame tells you the number of columns it has.
If you want to compare apples to apples in your data.frame vs. matrix comparison, use nrow() and ncol()
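A quick sketch of that apples-to-apples comparison on a toy data frame and its matrix counterpart:

```r
df <- data.frame(a = 1:3, b = 4:6)
m  <- as.matrix(df)

length(df)  # 2  -- columns, for a data.frame
length(m)   # 6  -- elements, for a matrix
dim(df)     # 3 2
dim(m)      # 3 2 -- nrow()/ncol() agree for both
```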
I'm guessing a little at your data structure, but you've hinted that it's a list of numeric vectors.
n_routes <- 5
gc <- replicate(n_routes, runif(30), simplify = FALSE)
names(gc) <- letters[seq_len(n_routes)]
You can convert this list to a data frame with as.data.frame(gc), but note that data frames aren't meant to be transposed (it doesn't make sense if columns have different types).
This means that you need to convert to data frame and then to matrix before transposing.
gc2 <- t(as.matrix(as.data.frame(gc)))
Since all your columns are numeric, you may want to leave it as a matrix. Alternatively, use as.data.frame again to make it a data frame.
as.data.frame(gc2)
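If every route really has the same number of vertices, the data-frame detour can be skipped entirely with do.call(rbind, ...), which binds the equal-length vectors into a matrix with one route per row. A sketch using the simulated gc list from above:

```r
n_routes <- 5
gc <- replicate(n_routes, runif(30), simplify = FALSE)

gc2 <- do.call(rbind, gc)  # one row per route, one column per vertex
dim(gc2)                   # 5 30
```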
As others have pointed out, length has different meanings for matrices and data frames. The definition for data frames – the number of columns – is unintuitive, and a legacy of S compatibility. Use ncol instead, since it gives the same answer, but with more readable code.

Delete rows with negative values

In R I am trying to delete rows within a dataframe (ants) which have a negative value under the column heading Turbidity. I have tried
ants<-ants[ants$Turbidity<0,]
but it returns the following error:
Warning message:
In Ops.factor(ants$Turbidity, 0) : < not meaningful for factors
Any ideas why this may be? Perhaps I need to make the negative values
NA before I then delete all NAs?
Any ideas much appreciated, thank you!
@Joris: result is
str(ants$Turbidity)
num [1:291] 0 0 -0.1 -0.2 -0.2 -0.5 0.1 -0.4 0 -0.2 ...
Marek is right, it's a data problem. Now be careful if you use as.numeric(ants$Turbidity), as that will always be positive: it gives the internal level codes (1 to nlevels(ants$Turbidity)), not the numeric values.
Try this :
tt <- as.numeric(as.character(ants$Turbidity))
which(is.na(tt))
It will give you the indices where the value was not numeric in the first place. This should enable you to first clean up your data.
eg:
> Turbidity <- factor(c(1,2,3,4,5,6,7,8,9,0,"a"))
> tt <- as.numeric(as.character(Turbidity))
Warning message:
NAs introduced by coercion
> which(is.na(tt))
[1] 11
You shouldn't use the as.numeric(as.character(...)) structure to convert problematic data as-is, since it will generate NAs that will mess with the rest. E.g.:
> Turbidity[tt > 5]
[1] 6 7 8 9 <NA>
Levels: 0 1 2 3 4 5 6 7 8 9 a
Always do summary(ants) after reading in data, and check if you get what you expect.
It will save you lots of problems. Numeric data is prone to magic conversion to character or factor types.
EDIT: I forgot about the as.character conversion (see Joris' comment).
The message means that ants$Turbidity is a factor. It will work when you do
ants <- ants[as.numeric(as.character(ants$Turbidity)) > 0,]
or
ants <- subset(ants, as.numeric(as.character(Turbidity)) > 0)
But the real problem is that your data are not prepared for analysis. Such conversion should be done at the beginning. You should also be careful, because there could be non-numeric values.
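Putting those pieces together, a base-R sketch on a hypothetical ants data frame: convert once, inspect the failures, then filter NA-safely:

```r
# Hypothetical data: one value is not numeric at all
ants <- data.frame(Turbidity = factor(c("0.1", "-0.2", "oops", "0.3")))

tt <- suppressWarnings(as.numeric(as.character(ants$Turbidity)))
which(is.na(tt))  # 3 -- the row that wasn't numeric to begin with

# Drop non-numeric and negative rows in one NA-safe step
ants_clean <- ants[!is.na(tt) & tt >= 0, , drop = FALSE]
nrow(ants_clean)  # 2
```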
This should also work using the tidyverse (assuming the column has already been converted to the correct data type):
ants %>% dplyr::filter(Turbidity >= 0)
