Convert character formulas to numeric in R

How can I convert the vector y into a numeric vector?
y <- c("1+2", "0101", "5*5")
When I use
as.numeric(y)
the output is
NA 101 NA

The following code
sapply(y, function(txt) eval(parse(text=txt)))
should do the work.
The underlying issue runs fairly deep and touches on metaprogramming.
The problem with as.numeric is that it only converts a string when the whole string is a single number literal, such as "0101" or "3.14". It never evaluates expressions, so everything else is converted to NA. In your case, "1+2" contains a plus sign, hence NA, and "5*5" contains a multiplication sign, hence NA. To tell R that it should "perform the operation given by a string", you need parse, which turns the text into an unevaluated expression, and eval, which evaluates that expression.
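As a minimal sketch of the same idea, the version below uses vapply instead of sapply so that each string must evaluate to exactly one number, and drops the names that sapply would attach. The helper name evaluate_string is made up for illustration. Note that eval(parse(...)) runs whatever code the string contains, so this is only reasonable for input you trust.

```r
y <- c("1+2", "0101", "5*5")

# Hypothetical helper: parse the text into an expression, then evaluate it.
evaluate_string <- function(txt) eval(parse(text = txt))

# vapply() fails loudly if any element does not yield a single numeric value.
result <- vapply(y, evaluate_string, numeric(1), USE.NAMES = FALSE)
print(result)
# [1]   3 101  25
```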

An option with map
library(purrr)
map_dbl(y, ~ eval(rlang::parse_expr(.x)))
#[1] 3 101 25

Related

Converting String to Integers R

I am doing something which should be quite simple. I would like to convert a vector of character strings to unique integers. From researching how to do this I found the strtoi function, which claims to convert strings to integers. However, I am getting weird results and I cannot understand why. When I run the code from the documentation below it works fine:
strtoi(c("ffff", "FFFF"), 16L)
[1] 65535 65535
However, when I apply this function to actual data I get a vector of NAs. Consider the following example:
strtoi(c('spy','spx'),16L)
[1] NA NA
Why does it return NAs in this example? Is there a way to get strtoi to work or do I need to write my own function?
In
strtoi(c("ffff", "FFFF"), 16L)
[1] 65535 65535
you are converting hexadecimal strings to numbers.
In this case
strtoi(c('spy','spx'),16L)
[1] NA NA
s, p, y and x are not valid hexadecimal digits (base 16 only uses 0-9 and a-f),
which is why you get NA.
If you try another base it might work. For instance, base 36 accepts 0-9 and a-z:
strtoi(c('spy','spx'),36L)
[1] 37222 37221
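If the goal is only to give each distinct string its own integer, as the question suggests, a base conversion is not strictly needed. The sketch below is one possible alternative, not what the answer above proposes: it numbers the strings in order of first appearance with match and unique. The vector name tickers is an assumption made for the example.

```r
tickers <- c("spy", "spx", "spy")

# Each distinct string gets the position of its first occurrence
# among the unique values, so repeated strings share an id.
ids <- match(tickers, unique(tickers))
print(ids)
# [1] 1 2 1
```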

How do I extract ints in R vectors?

I'm trying to extract a specific index in a vector, and I keep getting a strange output. I'm using R-Studio and it works fine with string vectors, but I get strange numbers with an "L" after them when I input integers. The same thing happens when I define all_numbers using c(), :, and seq(). Am I doing something incorrectly? I thought I was doing it exactly as my textbook describes it.
# Extracts "Anne" correctly
all_names <- c("Sally", "Pedro", "Anne", "Molly")
extract <- all_names [3]
# Extracts "3L" not 3
all_numbers <- 1:30
extract <- all_numbers[3]
# Extracts "7L" not 7
all_numbers <- 5:30
extract <- all_numbers[3]
# Extracts "12L" not 12
all_numbers <- 10:30
extract <- all_numbers[3]
The L suffix is how R marks a value as an integer.
class(1L)
#[1] "integer"
class(1)
#[1] "numeric"
In R, indexing starts at 1, so all_numbers[3] is 7 in the second case and 12 in the third, which is the expected result.
I can't find the relevant documentation at the moment, but if I remember correctly integer takes up less space than numeric: R stores an integer in 4 bytes and a double in 8.
If you don't want the L in the output, convert all_numbers to the numeric class.
all_numbers <- as.numeric(all_numbers)
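A short sketch to confirm this, reusing the variable names from the question: the extracted value really is 7, it is just stored as an integer, and it still compares equal to the plain number 7.

```r
all_numbers <- 5:30
extract <- all_numbers[3]

print(extract)         # prints the value without the L suffix
is.integer(extract)    # TRUE: the colon operator produces integers
extract == 7           # TRUE: 7L and 7 compare as equal

# Converting to numeric (double) changes the storage type, not the values.
as_double <- as.numeric(all_numbers)
is.double(as_double)   # TRUE
```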

What is the best way in R to identify the first character in a string?

I am trying to find a way to loop through some data in R that contains both numbers and characters and, from the point where the first letter is found, return that letter and everything after it. For example:
column
000HU89
87YU899
902JUK8
result
HU89
YU899
JUK8
I have tried str_detect / grepl, but the value of the first character is by nature unknown, so I am having difficulty pulling it out.
We could use str_extract
stringr::str_extract(x, "[A-Z].*")
#[1] "HU89" "YU899" "JUK8"
data
x <- c("000HU89", "87YU899", "902JUK8")
Ronak's answer is simple.
Though I would also like to provide another method:
column <- c("000HU89", "87YU899", "902JUK8")
# Find the location of the first non-digit character in each string
loc <- regexpr("[^[:digit:]]", column)
# Extract everything from that character to the right
substring(column, loc)
We can use sub from base R to match one or more digits (\\d+) at the start (^) of the string and replace with blank ("")
sub("^\\d+", "", x)
#[1] "HU89" "YU899" "JUK8"
data
x <- c("000HU89", "87YU899", "902JUK8")
In base R we can do
x <- c("000HU89", "87YU899", "902JUK8")
regmatches(x, regexpr("\\D.+", x))
# [1] "HU89" "YU899" "JUK8"
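The base R answers above behave differently when a string contains no letters at all, which may matter on real data. The sketch below adds a made-up all-digit value, "12345", to show the difference: sub keeps one result per input, while regmatches silently drops elements that have no match.

```r
x <- c("000HU89", "87YU899", "902JUK8", "12345")

# sub() returns one result per input; the all-digit string becomes "".
by_sub <- sub("^\\d+", "", x)
print(by_sub)
# [1] "HU89"  "YU899" "JUK8"  ""

# regmatches() only returns the matches, so the result is shorter than x.
by_regmatches <- regmatches(x, regexpr("\\D.+", x))
print(by_regmatches)
# [1] "HU89"  "YU899" "JUK8"
```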

Sprintf Function and Character Dates

I have a data set in which I want to pad zeroes in front of a set of dates that don't have six characters. For example, I have a date that reads 91003 (October 3rd, 2009) and I want it to read 091003, as well as any other date that is missing a zero in front. When I use the sprintf function, the code is:
data1$entrydate <- sprintf("%06d", data1$entrydate)
But what it spits out is something like 000127, or some other seemingly random number for all the other dates in the problem. I don't understand what's going on, and I would appreciate some help on the issue. Thanks.
PS. I am sometimes also getting an error message that sprintf is only for character values; I don't know if there is any code for numerical values.
I guess you got different results than expected because the column class was factor. You can convert the column to numeric either by as.numeric(as.character(datacolumn)) or as.numeric(levels(datacolumn))[datacolumn]. According to ?factor:
To transform a factor ‘f’ to approximately its
original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
and slightly more efficient than ‘as.numeric(as.character(f))’.
So, you can use
levels(data1$entrydate) <- sprintf('%06d', as.numeric(levels(data1$entrydate)))
Example
Here is an example that shows the problem
v1 <- factor(c(91003, 91104,90103))
sprintf('%06d', v1)
#[1] "000002" "000003" "000001"
Or, it is equivalent to
sprintf('%06d', as.numeric(v1)) #the formatted numbers are
# the numeric index of factor levels.
#[1] "000002" "000003" "000001"
When you convert the levels back to numeric, it works as expected (note that the result is in level order, not in the original element order)
sprintf('%06d', as.numeric(levels(v1)))
#[1] "090103" "091003" "091104"
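To get the padded values in the original element order rather than in level order, one option is to go through as.character first. This is a sketch built on the v1 example above; the name padded is made up for illustration.

```r
v1 <- factor(c(91003, 91104, 90103))

# as.character() recovers the labels element by element,
# as.integer() turns them into numbers that "%06d" can format.
padded <- sprintf("%06d", as.integer(as.character(v1)))
print(padded)
# [1] "091003" "091104" "090103"
```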

R - as.numeric matrix

I am new to R and I am trying to convert a dataframe to a numeric matrix using the below code
expData <- read.table("GSM469176.txt",header = F)
expVec <- as.numeric(as.matrix(expData))
When I use as.matrix, without as.numeric, it returns some numbers (as below)
0.083531 0.083496 0.083464 0.083435 0.083406 0.083377 0.083348"
[9975] "-0.00285 -0.0028274 -0.0028046 -0.0027814 -0.0027574 -0.0027319 -0.0027042
but when I put in the as.numeric, they are all converted to "NA"
I apologize if someone has asked this question before but I can't find a post that solves my problem.
Thanks in advance
You have 2 issues. First, if you examine the structure of the data frame, you'll note that the first column is characters:
head(expData)[, 1:4]
V1 V2 V3 V4
1 YAL002W(cer) 6.1497e-02 6.2814e-02 6.4130e-02
2 YAL002W(par) 7.1352e-02 7.3262e-02 7.5171e-02
3 YAL003W(cer) 2.2428e-02 3.8252e-02 5.4078e-02
4 YAL003W(par) 2.6548e-02 3.6747e-02 4.6947e-02
5 YAL005C(cer) 2.4023e-05 2.3243e-05 2.2462e-05
6 YAL005C(par) 2.0252e-02 2.0346e-02 2.0440e-02
Therefore, trying to convert the complete data frame to numeric will not work as expected.
Second, you are running as.numeric() after as.matrix(), which is converting the matrix to a vector:
x <- as.numeric(as.matrix(expData))
# Warning message:
# NAs introduced by coercion
class(x)
# [1] "numeric"
dim(x)
# NULL not a matrix
length(x)
# [1] 14261302
I suggest you try this:
rownames(expData) <- expData$V1
expData$V1 <- NULL
expData <- as.matrix(expData)
dim(expData)
# [1] 7502 1900
class(expData[, 1])
# [1] "numeric"
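The same steps can be tried on a tiny stand-in for the real file. The two-row data frame below is made up from the first rows shown above, and the name mat is an assumption for the example.

```r
# Small stand-in for the data read from the file: one label column, two numeric columns.
expData <- data.frame(
  V1 = c("YAL002W(cer)", "YAL002W(par)"),
  V2 = c(6.1497e-02, 7.1352e-02),
  V3 = c(6.2814e-02, 7.3262e-02),
  stringsAsFactors = FALSE
)

# Drop the label column before converting, then reuse the labels as row names.
mat <- as.matrix(expData[, -1])
rownames(mat) <- as.character(expData$V1)

is.numeric(mat)   # TRUE: no character column was mixed in
dim(mat)          # 2 rows, 2 columns
```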
You get the NAs when R doesn't know how to convert something to a number.
Specifically, the quotation mark in your output tells me that you have one (or several) long strings of numbers. To see why this is bad, try: as.numeric("-0.00285 -0.0028274")
I don't know what your raw data is like, but as @alexwhan mentioned, the culprit is probably in your call to read.table.
To fix it, try explicitly setting the sep argument (i.e., next to where you have header).
I would suggest opening up the raw file in a simple text editor (TextEdit.app or Notepad, not Word) and seeing how the values are separated. My guess is that
..., sep="\t"
should do the trick.
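A small sketch of both points, using made-up inline data instead of the real file: a string that holds several numbers converts to NA as a whole, and passing an explicit sep to read.table yields proper numeric columns. The tab-separated layout is an assumption, as in the guess above.

```r
# Several numbers in one string: as.numeric() cannot convert it as a whole.
line <- "-0.00285 -0.0028274 -0.0028046"
whole <- suppressWarnings(as.numeric(line))
is.na(whole)   # TRUE

# Splitting on the separator first gives one number per piece.
values <- as.numeric(strsplit(line, " ", fixed = TRUE)[[1]])
length(values) # 3

# Hypothetical tab-separated input, read with an explicit sep argument.
raw <- "a\t1.5\t2.5\nb\t3.5\t4.5"
df <- read.table(text = raw, header = FALSE, sep = "\t")
is.numeric(df$V2)  # TRUE
```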

Resources