Taking characters to the left of a character [duplicate] - r

This question already has answers here:
Splitting a file name into name,extension
(3 answers)
substring of a path variable
(2 answers)
Closed 9 years ago.
Given some data
hello <- c('13.txt','12.txt','14.txt')
I want to just take the numbers and convert to numeric, i.e. remove the .txt

You want file_path_sans_ext from the tools package
library(tools)
hello <- c('13.txt','12.txt','14.txt')
file_path_sans_ext(hello)
## [1] "13" "12" "14"
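The output above is still a character vector, while the question asks for numbers. A minimal sketch of the remaining step, assuming every stripped name is a plain number (nums is just an illustrative variable name):

```r
library(tools)

hello <- c('13.txt', '12.txt', '14.txt')

# Strip the extension, then coerce the remaining text to numeric.
nums <- as.numeric(file_path_sans_ext(hello))
nums
```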

You can do this with regular expressions using the function gsub on the "hello" object in your original post.
hello <- c('13.txt','12.txt','14.txt')
as.numeric(gsub("([0-9]+).*","\\1",hello))
#[1] 13 12 14

Another regex solution
hello <- c("13.txt", "12.txt", "14.txt")
as.numeric(regmatches(hello, gregexpr("[0-9]+", hello)))
## [1] 13 12 14

If you know your extensions are all .txt (four characters, counting the dot), then you can use substr()
> hello <- c('13.txt','12.txt','14.txt')
> as.numeric(substr(hello, 1, nchar(hello) - 4))
#[1] 13 12 14
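The fixed-width approach depends on the suffix length: ".txt" is four characters, so four characters have to be dropped. A small sketch that takes the length from the suffix itself instead of hard-coding it (strip_suffix is a hypothetical helper, not a base R function, and it assumes every element really ends in that suffix):

```r
# Hypothetical helper: drop a known, fixed suffix from each string.
strip_suffix <- function(x, suffix) {
  substr(x, 1, nchar(x) - nchar(suffix))
}

hello <- c('13.txt', '12.txt', '14.txt')
stems <- strip_suffix(hello, ".txt")
nums <- as.numeric(stems)
nums
```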

Related

Print a result without a preceding square bracket in R [duplicate]

This question already has an answer here:
R how to not display the number into brackets of the row count in output
(1 answer)
Closed 2 years ago.
x <- 5+2
print(x)
[1] 7
How to suppress [1] and only print 7?
Similarly for characters:
y <- "comp"
print(y)
[1] "comp"
I want to remove both [1] and " ". Any help is appreciated!
Thanks!
With cat, it is possible
cat(x, '\n')
7
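Note that cat separates its arguments with a space by default, so cat(x, '\n') writes a space before the newline. A small sketch of two variants that avoid this, assuming plain unquoted output is what is wanted:

```r
x <- 5 + 2
y <- "comp"

# sep = "" stops cat from inserting a space before the newline.
cat(x, "\n", sep = "")

# writeLines prints a character vector without the [1] index or quotes.
writeLines(y)
```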
Or for characters
cat(dQuote(letters[1], FALSE), '\n')
"a"

Count the number of words without white spaces [duplicate]

This question already has answers here:
Count the number of all words in a string
(19 answers)
Closed 2 years ago.
I have the following string:
str1<-" india hit milestone electricity wind solar"
Number of words contained in it is:
>sapply(strsplit(str1, " "), length)
[1] 7
This is wrong because there is a space at the beginning of str1. I tried to trim the white space, but
> stripWhitespace(str1) # from the tm package
returns the same string:
[1] " india hit milestone electricity wind solar"
Why?
You can just use the base function trimws
sapply(strsplit(trimws(str1), " "), length)
[1] 6
Maybe you can try
lengths(gregexpr("\\b\\w+\\b",str1))
such that
> lengths(gregexpr("\\b\\w+\\b",str1))
[1] 6
You could try using stringr::str_trim and stringr::str_split like this:
length(stringr::str_split(stringr::str_trim(str1), pattern = " ", simplify = TRUE))
We can use str_count
library(stringr)
str_count(str1, '\\w+')
#[1] 6
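As for the "Why?" in the question: as far as I know, tm's stripWhitespace collapses runs of whitespace into a single blank rather than trimming the ends, so a single leading space survives. A base R sketch that copes with leading, trailing and repeated spaces (count_words is a made-up helper name):

```r
# Hypothetical helper: trim the ends, then split on runs of whitespace.
count_words <- function(x) {
  lengths(strsplit(trimws(x), "\\s+"))
}

str1 <- " india hit milestone electricity wind solar"
count_words(str1)

# Repeated spaces inside the string do not inflate the count.
count_words("  india   hit  ")
```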

Extracting a number of a string of varying lengths [duplicate]

This question already has answers here:
Extracting numbers from vectors of strings
(12 answers)
Closed 6 years ago.
Pretend I have a vector:
testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")
Is there a way to go about parsing this vector, so I can store just the numerical values:
10, 6, 4, 15
If the problem were just "15 cars" and "6 cars", I know how to parse that, but I'm having difficulty with the strings that have text in front too! Any help is greatly appreciated.
For this particular common task, there's a nice helper function in tidyr called extract_numeric (since deprecated in favor of readr::parse_number):
library(tidyr)
extract_numeric(testVector)
## [1] 10 6 4 15
We can use str_extract with the pattern \\d+, which matches one or more digits. It can also be written as [0-9]+.
library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10 6 4 15
If there are multiple numbers in a string, we can use str_extract_all, which will return a list.
This can also be done with base R (no external packages needed):
as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10 6 4 15
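Two caveats with the regexpr version, as a hedged sketch: regmatches drops elements that have no match at all, and only the first number per string is returned. Using gregexpr instead keeps one list element per input string (the mixed vector below is invented for illustration):

```r
mixed <- c("3 cats and 12 dogs", "15 cars", "no digits here")

# One list element per input; strings without digits give numeric(0).
all_nums <- lapply(regmatches(mixed, gregexpr("\\d+", mixed)), as.numeric)
all_nums

# regexpr + regmatches silently skips the string with no match.
first_nums <- as.numeric(regmatches(mixed, regexpr("\\d+", mixed)))
first_nums
```

If the result has to line up with the input (for example as a data frame column), the list form is the safer starting point.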
Or using gsub from base R
as.numeric(gsub("\\D+", "", testVector))
#[1] 10 6 4 15
BTW, some helper functions are just thin wrappers around gsub. For example, this is the source of extract_numeric:
function (x)
{
    as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
}
So, if we need a function, we can create one (without using any external packages)
ext_num <- function(x) {
    as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10 6 4 15
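One thing to keep in mind with the \\D+ version: it removes everything that is not a digit, so signs and decimal points are lost and several numbers in one string get glued together. A quick sketch comparing it with the pattern from the extract_numeric source quoted above (ext_num_signed and the inputs are invented for illustration):

```r
ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}

# Keeps digits, dots and minus signs, like the extract_numeric source.
ext_num_signed <- function(x) {
  as.numeric(gsub("[^0-9.-]+", "", x))
}

ext_num("2 cars and 3 bikes")    # digits are concatenated
ext_num("-1.5 degrees")          # sign and decimal point are dropped
ext_num_signed("-1.5 degrees")   # sign and decimal point are kept
```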
This might also come in handy.
testVector <- gsub("[[:alpha:]]","",testVector)
testVector <- gsub(" ","",testVector)
> testVector
[1] "10" "6" "4" "15"

strsplit in R not working for $ as split character [duplicate]

This question already has answers here:
How do I strip dollar signs ($) from data/ escape special characters in R?
(4 answers)
Closed 7 years ago.
> str = "a$b$c"
> astr <- strsplit(str,"$")
> astr
[[1]]
[1] "a$b$c"
Still trying to figure the answer out!
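You need to escape it, because strsplit treats the split argument as a regular expression and $ is the end-of-string anchor: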
strsplit(str,"\\$")
Another option is to use the fixed = TRUE argument:
strsplit(str,"$",fixed=TRUE)
## [1] "a" "b" "c"
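To make the difference concrete, here is a small side-by-side sketch. The variable names are only illustrative, and the bracket form [$] is a third spelling that is not mentioned in the answers above:

```r
s <- "a$b$c"

# "$" alone is the end-of-string anchor, so nothing is split.
unsplit <- strsplit(s, "$")[[1]]

# Three ways to treat "$" as a literal character.
escaped <- strsplit(s, "\\$")[[1]]
bracket <- strsplit(s, "[$]")[[1]]
literal <- strsplit(s, "$", fixed = TRUE)[[1]]
```

fixed = TRUE is usually the least error-prone choice when the separator is a plain string rather than a pattern.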

R: extract directory out of a path [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How do I extract a file/folder_name only from a path?
May I ask how I can get the last subdirectory of a path?
For example I want to get the subdirectory "7" and the following code fails:
Path <- "123\\456\\7"
Split <- strsplit(Path, "\\") # Fails because of 'Trailing backslash'
LastElement <- Split[[1]][length(Split[[1]])]
Thank you in advance
You could also use the built-in function basename (on Windows, where the backslash counts as a path separator):
basename(Path)
[1] "7"
You have to add a second pair of \\ so that the regex itself receives an escaped backslash:
> Path <- "123\\456\\7"
> Split <- strsplit(Path, "\\\\")
> Split[[1]][length(Split[[1]])]
[1] "7"
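The regex escaping can also be sidestepped entirely. A sketch, assuming the path uses backslashes as in the question: either split on a fixed (non-regex) string, or convert the separators to forward slashes first, which should let basename give the same answer on any platform. parts, last_dir and last_dir2 are just illustrative names:

```r
Path <- "123\\456\\7"

# fixed = TRUE makes "\\" (one literal backslash) a plain string, not a regex.
parts <- strsplit(Path, "\\", fixed = TRUE)[[1]]
last_dir <- tail(parts, 1)

# Alternative: normalise to forward slashes, then use basename.
last_dir2 <- basename(gsub("\\\\", "/", Path))
```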

Resources