How do I extract ints in R vectors? - r

I'm trying to extract the element at a specific index of a vector, and I keep getting strange output. I'm using RStudio and it works fine with string vectors, but I get numbers with an "L" after them when the vector holds integers. The same thing happens whether I define all_numbers using c(), :, or seq(). Am I doing something incorrectly? I thought I was doing it exactly as my textbook describes.
# Extracts "Anne" correctly
all_names <- c("Sally", "Pedro", "Anne", "Molly")
extract <- all_names [3]
# Extracts "3L" not 3
all_numbers <- 1:30
extract <- all_numbers[3]
# Extracts "7L" not 7
all_numbers <- 5:30
extract <- all_numbers[3]
# Extracts "12L" not 12
all_numbers <- 10:30
extract <- all_numbers[3]

The L suffix is how R marks a value as an integer rather than a double. 3L and 3 are the same number, just stored as different types.
class(1L)
#[1] "integer"
class(1)
#[1] "numeric"
In R, indexing starts at 1, so all_numbers[3] in the 2nd and 3rd cases is 7 and 12 respectively, which is exactly what you got.
An integer also takes up less space than a numeric (double): 4 bytes per element versus 8.
If you don't want the L in the output, convert all_numbers to the numeric class.
all_numbers <- as.numeric(all_numbers)
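A small sketch to check this yourself, reusing the 5:30 vector from the question (extract_num is just a name I made up for illustration):

```r
all_numbers <- 5:30          # the : operator produces an integer vector
extract <- all_numbers[3]

class(extract)               # "integer"
extract == 7                 # TRUE: same value, different storage type
identical(extract, 7L)       # TRUE
identical(extract, 7)        # FALSE: a plain 7 is a double

extract_num <- as.numeric(extract)   # hypothetical name for the converted value
class(extract_num)           # "numeric"
```

So the extraction itself is correct; the L only tells you how the value is stored.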

Related

R order list based on multiple characters from each item

I'd like to sort a list based on more than the first character of each item in that list. The list contains chr data though some of those characters are digits. I've been trying to use a combination of substr() and order but to no avail.
For example:
mylist <- c('0_times','3-10_times','11_20_times','1-2_times','more_than_20_times')
mylist[order(substr(mylist,1,2))]
However, this results in 11_20_times being placed prior to 3-10_times:
[1] "0_times" "1-2_times" "11_20_times" "3-10_times" "more_than_20_times"
Update
To provide further detail on the use case.
My data is similar to the following:
mydf <- data.frame(X1=c("0_times","3-10_times", "11-20_times", "1-2_times","3-10_times",
"0_times","3-10_times", "11-20_times", "1-2_times","3-10_times" ),
X2=c('a','b','c','d','e','a','b','c','d','e'))
mydf2 <- data.frame(names = colnames(mydf))
mydf2$vals <- lapply(mydf, unique)
It is the vectors in mydf2$vals that I would like to sort. While the solution from @AllanCameron works perfectly on a single vector, I'd like to apply it to each vector contained within mydf2$vals but cannot figure out how.
I have attempted to use unlist to access the lists contained but again can only do this on an individual row basis:
unlist(mydf2[1,'vals'], use.names=FALSE)
My inexperience is evident here, but I've been struggling with this all day.
This requires a bit of string parsing and converting to numeric:
o <- sapply(strsplit(mylist, '\\D+'), function(x) min(as.numeric(x[nzchar(x)])))
mylist[order(o)]
#> [1] "0_times" "1-2_times" "3-10_times"
#> [4] "11_20_times" "more_than_20_times"
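For the update, one possible approach is to wrap the same parsing in a function and lapply() it over the list column. This is only a sketch: sort_by_first_number is a hypothetical helper name, and sending vectors without any digits to the end (key Inf, original order kept) is my assumption about what you'd want.

```r
# Hypothetical helper: order strings by the smallest number they contain.
sort_by_first_number <- function(x) {
  x <- as.character(x)                      # also copes with factor columns
  keys <- sapply(strsplit(x, "\\D+"), function(p) {
    nums <- as.numeric(p[nzchar(p)])
    if (length(nums) == 0) Inf else min(nums)   # no digits: sort last
  })
  x[order(keys)]
}

mydf <- data.frame(
  X1 = c("0_times", "3-10_times", "11-20_times", "1-2_times", "3-10_times",
         "0_times", "3-10_times", "11-20_times", "1-2_times", "3-10_times"),
  X2 = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e")
)
mydf2 <- data.frame(names = colnames(mydf))
mydf2$vals <- lapply(mydf, unique)

# Apply the helper to every vector in the list column
mydf2$vals <- lapply(mydf2$vals, sort_by_first_number)
mydf2$vals[[1]]
#> [1] "0_times"     "1-2_times"   "3-10_times"  "11-20_times"
```

Because mydf2$vals is a list, lapply() hands each element to the function as a plain vector, so no unlist() is needed.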

Is there a way to use an inverse pattern for stringr in R?

I'm currently working with a large data column of times.
Time in this format is 00h00.
'\d\dh\d\d' is the regex equivalent I believe.
However, many of the cells contain terms like "morning" or other text that can't be used.
I'm trying to use the str_replace_all() function with no success.
As a follow-up question, would I be able to plot these times on a histogram, counting each occurrence? That is the end goal here.
Thank you for your suggestions.
If I understand correctly, you just want to filter out non-matching entries in your time column, something like this:
df <- df[grepl("^\\d{2}h\\d{2}$", df$time), ]
As an alternative (a guess), perhaps you mean to extract 00h00 from each string, removing any other non-compliant portion. Strings with no match come back as NA.
vec <- c("01h23", "02h34 ", "03h45 morning", "morning")
stringr::str_extract(vec, "\\d\\d[Hh]\\d\\d")
# [1] "01h23" "02h34" "03h45" NA
or with base R,
out <- strcapture("(\\d\\d[Hh]\\d\\d)", vec, list(tm = ""))
out
# tm
# 1 01h23
# 2 02h34
# 3 03h45
# 4 <NA>
This returns a data.frame, from which the column can easily be extracted as a vector. If you need the non-compliant strings to be empty strings instead of NA, then
out$tm[is.na(out$tm)] <- ""
out
# tm
# 1 01h23
# 2 02h34
# 3 03h45
# 4
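For the histogram follow-up, here is a rough base R sketch. It assumes you want one bar per hour of the day; the extra "14h05" entry and the names tm, parts and hours are made up for illustration.

```r
vec <- c("01h23", "02h34 ", "03h45 morning", "morning", "14h05")

# Keep only the entries that contain a time; non-matches are dropped
tm <- regmatches(vec, regexpr("\\d\\d[Hh]\\d\\d", vec))

# Convert "HHhMM" to decimal hours, e.g. "01h23" -> 1 + 23/60
parts <- strsplit(tm, "[Hh]")
hours <- sapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60)

# One bin per hour; remove plot = FALSE to draw the histogram directly
h <- hist(hours, breaks = 0:24, plot = FALSE)
h$counts
```

Calling plot(h) afterwards (or hist() without plot = FALSE) draws the chart.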

Convert Vector From Dataframe

I have one vector created using the following code:
vectorA<-c(1.125,2.250,3.501)
I have another vector stored in a data frame:
vectordf<-data.frame()
vectordf[1,1]<-'1.125,2.250,3.501'
vectorB<-vectordf[1,1]
I need vectorB to be the same as vectorA so I can use it in another function. Right now the two vectors are different as shown below:
printerA<-paste("vectorA=",vectorA)
printerB<-paste("vectorB=",vectorB)
print(printerA)
print(printerB)
dput(vectorA)
dput(vectorB)
[1] "vectorA= 1.125" "vectorA= 2.25" "vectorA= 3.501"
[1] "vectorB= 1.125,2.250,3.501"
c(1.125, 2.25, 3.501)
"1.125,2.250,3.501"
How can I get vectorB into the same format as vectorA? I have tried using as.numeric, as.list, as.array, as.matrix.
This can be done with scan.
printerB<-paste("vectorB=", scan(text = vectordf[1,1], sep = ','))
And now printerA and printerB are
printerA
#[1] "vectorA= 1.125" "vectorA= 2.25" "vectorA= 3.501"
printerB
#[1] "vectorB= 1.125" "vectorB= 2.25" "vectorB= 3.501"
The problem is that what you've called "vectorB" isn't quite a vector as you imagine it: it's a character vector of length 1 whose single element is a string of numbers separated by commas.
Your idea to use as.numeric() is good, but as.numeric() doesn't know how to parse a string with commas as a vector of distinct numbers. So, you first want to split the string:
vectorB <- unlist(strsplit(vectorB, ",", fixed = TRUE))
The strsplit() call will chop up vectorB into different vector sub-parts based on where it finds the commas. The data structure it returns is a list, so we flatten it back down to a vector with unlist(). Then, your as.numeric() idea will work:
vectorB <- as.numeric(vectorB)
Obviously you can clean that up into a single line if you wish, but I wanted to clearly illustrate where the hole in your strategy was.
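For reference, the single-line version could look something like this (vectorB_num is just an illustrative name, and the data frame is built slightly differently from the question to keep the sketch short):

```r
vectorA <- c(1.125, 2.250, 3.501)
vectordf <- data.frame(V1 = "1.125,2.250,3.501", stringsAsFactors = FALSE)
vectorB <- vectordf[1, 1]

# Split on the literal comma, take the single list element, convert to numeric
vectorB_num <- as.numeric(strsplit(vectorB, ",", fixed = TRUE)[[1]])

all.equal(vectorA, vectorB_num)
#> [1] TRUE
```

Using [[1]] instead of unlist() works here because strsplit() got a single string and therefore returns a list with exactly one element.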
To make the answer more complete: the reason why this mismatch happened at all was in this line early on in your code:
vectordf[1,1]<-'1.125,2.250,3.501'
The object on the right side of <- is a character vector of length 1. To fix this, you could have used
vectordf[1:3, 1] <- c(1.125, 2.25, 3.501)
because the type of the object on the right is now a numeric vector of length 3. Note that we had to adjust the indexing on the left side by changing the row index to be 1:3.

Convert in R character formulas to numeric

How can I convert the y vector into a numeric vector?
y <- c("1+2", "0101", "5*5")
When I use
as.numeric(y)
OUTPUT
NA 101 NA
The following code
sapply(y, function(txt) eval(parse(text=txt)))
should do the work.
The problem is quite deep and you need to know about metaprogramming.
The problem with as.numeric is that it only converts a string to a number if the whole string is a valid number literal (digits with, for example, an optional sign, decimal point or exponent). Everything else is converted to NA. In your case, "1+2" contains a plus, hence NA, and "5*5" contains a multiplication, hence NA. To tell R that it should "perform the operation given by a string", you need eval and parse.
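As a sketch of the eval/parse idea, here is a variant with vapply(), which I'm using instead of sapply() only so the result is guaranteed to be an unnamed numeric vector (res is an illustrative name). Keep in mind that eval(parse()) runs whatever code the string contains, so I'd only use it on input you trust.

```r
y <- c("1+2", "0101", "5*5")

# parse() turns each string into an expression, eval() computes its value
res <- vapply(
  y,
  function(txt) eval(parse(text = txt)),
  numeric(1),          # each string must evaluate to exactly one number
  USE.NAMES = FALSE
)
res
#> [1]   3 101  25
```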
An option with map
library(purrr)
map_dbl(y, ~ eval(rlang::parse_expr(.x)))
#[1] 3 101 25

Return the character associated with the specified Ascii code in R

Good afternoon,
I'm trying to create a cartesian product in R with the letters of the alphabet.
What I'm actually trying is this:
First I create a matrix with the letters
a <- as.matrix(seq(97,122,by=1))
Then I create a data frame with 2 columns with all the combinations
b <- expand.grid(a, a)
Finally I combine the 2 columns
apply(b,1,paste,collapse=" ")
The problem I have is that I can't find a way to transform those "decimals" to its Ascii character.
I have tried several things like rawToChar and gsub unsuccessfully.
Can somebody point me in the right direction?
Thanks
A very easy way to return a character based on its ASCII code is the function intToUtf8. It also works for vectors including multiple integers and returns the corresponding characters as one string.
vec <- 97:122
intToUtf8(vec)
# [1] "abcdefghijklmnopqrstuvwxyz"
intToUtf8(65)
# [1] "A"
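If you need one character per code rather than a single string, intToUtf8() has a multiple argument. A sketch of how that could feed into the cartesian product from the question (res is an illustrative name):

```r
# multiple = TRUE returns one string per code point instead of one long string
a <- intToUtf8(97:122, multiple = TRUE)

# All ordered pairs; the first column varies fastest
b <- expand.grid(a, a, stringsAsFactors = FALSE)
res <- paste(b$Var1, b$Var2)

head(res, 3)
#> [1] "a a" "b a" "c a"
length(res)
#> [1] 676
```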
First direct method:
res <- do.call(paste, expand.grid(letters, letters))
If you've some other ascii values and you want to get equivalent characters:
val <- 65:96 # whatever values you want the equivalent characters for
mode(val) <- "raw" # set mode to raw
# alternatively, val <- as.raw(65:96)
a <- sapply(val, rawToChar)
res <- do.call(paste, expand.grid(a, a))
To print an ASCII char in R you can use the print function with a backslash \ before the character's code written in octal (not decimal). For example, print("\150") prints "h", because octal 150 is decimal 104, the ASCII code of "h".
Or for your example above you could try:
a <- sapply(97:122,function(x) rawToChar(as.raw(x)))
b <- expand.grid(a,a)
c <- t(apply(b,1,function(x) paste(x[1],x[2])))
