I have a character vector of time values. How can I remove the colons and convert each value from text to numeric, i.e. from "10:01:02" (character) to 100102 (numeric)? All I could find is shown below.
> x <- c("10:01:02", "11:01:02")
> strsplit(x, split = ":")
[[1]]
[1] "10" "01" "02"
[[2]]
[1] "11" "01" "02"
If you want to do everything in one line, you can use the destring() function from taRifx to remove everything that isn't a number and convert the result to numeric.
taRifx::destring(x)
This will also work if some of your data is formatted differently, such as "10-01-02", though you may have to set the value of keep.
destring("10-10-10", keep = "0-9")
And if you don't want to have to install the taRifx package you can define the destring() function locally.
destring <- function(x, keep = "0-9.-") {
  # drop every character not listed in keep, then convert to numeric
  as.numeric(gsub(paste("[^", keep, "]+", sep = ""), "", x))
}
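To see why keep matters, here is a small sketch using the local definition above (repeated so the snippet runs on its own). The default keep = "0-9.-" preserves hyphens, so a hyphen-separated value cannot be parsed unless keep is narrowed to digits only. The variable names are illustrative.

```r
# Local stand-in for taRifx::destring(), as defined above
destring <- function(x, keep = "0-9.-") {
  as.numeric(gsub(paste0("[^", keep, "]+"), "", x))
}

x <- c("10:01:02", "11:01:02")
colon_result <- destring(x)  # colons are not in keep, so they are dropped

# Hyphens survive the default keep, so as.numeric() yields NA (with a warning)
hyphen_default <- suppressWarnings(destring("10-01-02"))

# Restricting keep to digits removes the hyphens as well
hyphen_digits <- destring("10-01-02", keep = "0-9")
```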
We can use gsub to replace : with "". After that, use as.numeric to do the conversion.
x <- as.numeric(gsub(":", "", x, fixed = TRUE))
Or we can use the regex suggested by Soto:
x <- as.numeric(gsub('\\D+', '', x))
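One caveat with any of these conversions, shown as a minimal sketch: a numeric value cannot carry a leading zero, so a time before 10:00:00 loses a digit. If the fixed six-digit layout matters, the number can be formatted back to text with sprintf(). The second input value is an added example, not from the question.

```r
x <- c("10:01:02", "09:01:02")

# Strip the colons literally, then convert
num <- as.numeric(gsub(":", "", x, fixed = TRUE))
# "09:01:02" becomes 90102, because numbers do not store leading zeros

# Restore the six-digit layout as text when it is needed for display
padded <- sprintf("%06d", as.integer(num))
```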
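Once the colons have been removed (as.numeric() returns NA for a raw string such as "10:01:02"), try
x <- as.numeric(x)
and then check the result with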
class(x)
Related
I have the following code:
for (fileName in fileNames) {
  index <- "0"
  if (grepl("_01", fileName, fixed = TRUE)) {
    index <- "01"
  }
  if (grepl("_02", fileName, fixed = TRUE)) {
    index <- "02"
  }
}
and so on.
My filename is like "31231_sad_01.csv" or "31231_happy_01.csv".
All of my filenames are stored in a character vector fileNames. I loop through each file.
How can I find the last part of the filename, i.e. 01 or 02 in this case?
I tried the code above, and it always returns 1 for every value.
Try the following:
library(stringr)
# suppose you have your file names in a character vector
fnames <- c("31231_sad_01.csv", "31231_happy_02.csv")
# extract every run of digits, then take the second run from each name
unlist(lapply(str_extract_all(fnames, "\\d+"), '[', 2))
This returns the vector
[1] "01" "02"
Vectorized alternatives exist, so there is no need for a loop.
To check whether the last numeric part of a filename ends with a specific number, here 01, we can first extract that part and then run endsWith.
string <- c("31231_sad_01.csv", "bla_215.csv", "test_05.csv")
endsWith(stringr::str_extract(string, "[^_]*(?=\\.csv)"), "01")
#> [1] TRUE FALSE FALSE
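A base R sketch of the same idea, assuming every name ends in an underscore, a run of digits and .csv (the pattern and the variable names are illustrative). sub() is vectorized, so the whole character vector is handled in one call:

```r
fileNames <- c("31231_sad_01.csv", "31231_happy_02.csv", "bla_215.csv")

# Capture the digits between the last "_" and the ".csv" extension
index <- sub("^.*_([0-9]+)\\.csv$", "\\1", fileNames)

# Logical test for one particular index
is_01 <- index == "01"
```

Note that sub() returns its input unchanged when the pattern does not match, so names with a different layout would need a separate check.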
An alternative way is to use sub to extract parts of the strings. Your examples show that the targeted index in each file name is always located after _ and before .csv. We can use this pattern in sub:
library(magrittr)
findex <- function(filename){
  filename %>%
    sub("\\.csv.*", "", .) %>% # keep the part before ".csv"
    sub(".*_", "", .)          # keep the part after the last "_"
}
This method works for an index of any length.
Test:
findex("31231_sad_01.csv")
#[1] "01"
findex("31231_happy_02.csv")
#[1] "02"
findex("31231_happy_213.csv")
#[1] "213"
findex("31231_happy_15213.csv")
#[1] "15213"
Then you can apply lapply or vapply to the vector that contains all the names:
names <- c("31231_happy_1032.csv", "31231_happy_02.csv", "31231_happy_213.csv", "31231_happy_15213.csv")
lapply(names, findex)
#[[1]]
#[1] "1032"
#[[2]]
#[1] "02"
#[[3]]
#[1] "213"
#[[4]]
#[1] "15213"
vapply(names, findex, character(1))
# 31231_happy_1032.csv    31231_happy_02.csv   31231_happy_213.csv
#               "1032"                  "02"                 "213"
# 31231_happy_15213.csv
#               "15213"
In case you want to use only base R, this should work:
findex1 <- function(filename) sub(".*_", "", sub("\\.csv.*", "", filename))
vapply(names, findex1, character(1))
# 31231_happy_1032.csv 31231_happy_02.csv 31231_happy_213.csv
# "1032" "02" "213"
#31231_happy_15213.csv
# "15213"
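Because sub() is itself vectorized, the vapply() wrapper is optional. A sketch, with the dot escaped so that only a literal ".csv" is stripped; file_names is a name chosen here to avoid reusing the name of the base function names():

```r
findex1 <- function(filename) sub(".*_", "", sub("\\.csv.*", "", filename))

file_names <- c("31231_happy_1032.csv", "31231_happy_02.csv",
                "31231_happy_213.csv", "31231_happy_15213.csv")

# One call handles the whole vector and returns an unnamed character vector
idx <- findex1(file_names)
```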
I want to write a function that takes a character vector (whose strings include numbers) as input and left-pads the numbers in it with zeros. For example, this could be an input vector:
x<- c("abc124.kk", "77kk-tt", "r5mm")
x
[1] "abc124.kk" "77kk-tt" "r5mm"
Each string of the input vector contains only one number, but the numbers are all in different positions (some at the end, some in the middle...).
I want the output to look like this:
"abc124.kk" "077kk-tt" "r005mm"
That means adding as many leading zeros to the number in each string as it takes to give it as many digits as the longest number.
But I want a function that does this for any input vector of strings, not only my example (the x vector).
I have already started extracting the numbers and letters and have padded the numbers the way I want them, but how can I put them back together in the right position?
my_function<- function(x){
letters<- str_extract_all(x,"[a-z]+")
numbers<- str_extract_all(x, "[0-9]+")
digit_width<-max(nchar(numbers))
numbers_correct<- str_pad(numbers, width=digit_width, pad="0")
}
And what if I have a vector that includes some strings without numbers? How can I leave those out and get them back unchanged?
For example, if the input were
y<- c("12ab", "cd", "ef345")
the numbers variable looks like this:
[[1]]
[1] "12"
[[2]]
character(0)
In this case I would want the output to look like this:
"012ab" "cd" "ef345"
One option is to use gsubfn to capture the digits, convert them to numeric and pass the result to sprintf for formatting (the width of 3 is hard-coded to match the longest number in this example):
library(gsubfn)
gsubfn("([0-9]+)", ~ sprintf("%03d", as.numeric(x)), x)
#[1] "abc124.kk" "077kk-tt" "r005mm"
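The "%03d" above fixes the width at three digits. A base R sketch that derives the width from the data instead, assuming (as in the question) at most one number per string; pad_numbers is a hypothetical helper name, and strings without digits are returned unchanged:

```r
pad_numbers <- function(x) {
  m <- regexpr("[0-9]+", x)     # first run of digits per string; -1 where there is none
  digits <- regmatches(x, m)    # the matched runs only
  if (length(digits) == 0) return(x)
  width <- max(nchar(digits))   # width of the longest number
  # Prepend the missing zeros as text, so long numbers are never coerced
  regmatches(x, m) <- paste0(strrep("0", width - nchar(digits)), digits)
  x
}

padded_x <- pad_numbers(c("abc124.kk", "77kk-tt", "r5mm"))
padded_y <- pad_numbers(c("12ab", "cd", "ef345"))
```

Working on the digits as text rather than with as.numeric() avoids any loss of precision for very long numbers.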
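# Base R alternative: pad each number to the width of the longest one
x <- c("12ab", "cd", "ef345")
s <- gsub("\\D", "", x)  # digits only; "" where a string has no number
n <- nchar(s)
max_n <- max(n)
sapply(seq_along(x), function(i) {
  if (n[i] < max_n) {
    # build the missing zeros and substitute the padded number in place
    zeroes <- paste(rep(0, max_n - n[i]), collapse = "")
    gsub("\\d+", paste0(zeroes, s[i]), x[i])
  } else {
    x[i]
  }
})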
#[1] "012ab" "cd" "ef345"
I was manipulating my count data (fcm) and had my barcode IDs as column names in the format TCGA.BH.A0DQ.11A.12R.A089.07, etc.
I proceeded to use:
CountCol= colnames(fcm)
Barcode = strsplit(as.character(CountCol), ".", fixed=TRUE)
giving me a list of all the split character strings such as :
head(Barcode,2)
[[1]]
[1] "TCGA" "3C" "AAAU" "01A" "11R" "A41B" "07"
[[2]]
[1] "TCGA" "3C" "AALI" "01A" "11R" "A41B" "07"
My question now is: how do I put only the first three elements together to make new column names separated by a "-" (i.e. TCGA-3C-AAAU for the first, and so forth for the next ~1200 values)?
I hope this was clear.
I tried a few methods but keep coming short of the correct solution.
Try sapply:
sapply(Barcode,function(x){paste(x[1:3],collapse="-")})
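A self-contained sketch of this approach, using vapply() so the result is guaranteed to be a character vector with one element per barcode. The two column names are reconstructed from the split output in the question and stand in for colnames(fcm); short_ids is an illustrative name.

```r
# Hypothetical column names in the same layout as the question's barcodes
CountCol <- c("TCGA.3C.AAAU.01A.11R.A41B.07", "TCGA.3C.AALI.01A.11R.A41B.07")

# Split on the literal dot, then glue the first three fields with "-"
Barcode <- strsplit(CountCol, ".", fixed = TRUE)
short_ids <- vapply(Barcode, function(parts) paste(parts[1:3], collapse = "-"),
                    character(1))

# colnames(fcm) <- short_ids   # fcm is the question's count data, not defined here
```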
You could also use the purrr library for simpler code:
library(purrr)
x <- c("TCGA", "3C", "AAAU", "01A", "11R", "A41B", "07" )
y <- c("TCGA", "3C", "AALI", "01A", "11R", "A41B", "07" )
z <- list(x, y)
purrr::map(z, ~paste(.[1:3], collapse = "-"))
[[1]]
[1] "TCGA-3C-AAAU"
[[2]]
[1] "TCGA-3C-AALI"
I ran into this issue with some numeric columns in R. Some of the negative values in some columns are wrapped in brackets, so the column is converted to a factor.
How can I remove the brackets in R and make the value negative? E.g. "(265)" to -265.
How can I use the gsub function in R to do this? If another method is available, please suggest it.
Here is an alternative. Regex match is made on values that start and end with a round bracket, and contain one or more numeric characters between, returning the middle-group (numeric characters) with a minus-sign in front. The whole lot is then cast to numeric:
as.numeric(gsub("^\\(([0-9]+)\\)$","-\\1",x))
Just in case there is something else going on with numbers:
convert.brackets <- function(x){
  if(grepl("\\(.*\\)", x)){
    # bracketed value: strip the brackets and prepend a minus sign
    paste0("-", gsub("\\(|\\)", "", x))
  } else {
    x
  }
}
x <- c("123", "(456)", "789")
sapply(x, convert.brackets, USE.NAMES = FALSE)
[1] "123" "-456" "789"
Otherwise, if every value is wrapped in brackets, simply:
paste0("-", gsub("\\(|\\)", "", x))
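Since the question mentions that the column has become a factor, here is a sketch of the full conversion under that assumption. The example values are made up for illustration. The labels are converted with as.character() first, because as.numeric() applied directly to a factor returns the internal level codes rather than the values.

```r
# Example factor column with bracketed negatives (illustrative data)
x <- factor(c("123", "(265)", "789", "(1050)"))

# Work on the labels, not the level codes
txt <- as.character(x)

# Rewrite "(digits)" as "-digits"; other values pass through unchanged
num <- as.numeric(sub("^\\(([0-9.]+)\\)$", "-\\1", txt))
```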
I have a data frame comprising two columns of words. For each row I'd like to identify any letters that occur only in the word in the second column, e.g.
carpet carpelt #return 'l'
bag flag #return 'f' & 'l'
dog dig #return 'i'
I'd like to use R to do this automatically as I have 6126 rows.
As an R newbie, the best I've got so far is this, which gives me the unique letters across both words (and is obviously very clumsy):
x<-(strsplit("carpet", ""))
y<-(strsplit("carpelt", ""))
z<-list(l1=x, l2=y)
unique(unlist(z))
Any help would be much appreciated.
The function you’re searching for is setdiff:
chars_for = function (str)
strsplit(str, '')[[1]]
result = setdiff(chars_for(word2), chars_for(word1))
(Note the inverted order of the arguments in setdiff.)
To apply it to the whole data.frame, called x:
apply(x, 1, function (words) setdiff(chars_for(words[2]), chars_for(words[1])))
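A self-contained sketch of the row-wise version, assuming a data frame with two character columns (the column names first, second and extra are made up here). mapply() walks both columns in parallel, and the letters are pasted together so the result fits in a new column:

```r
words <- data.frame(first  = c("carpet", "bag", "dog"),
                    second = c("carpelt", "flag", "dig"),
                    stringsAsFactors = FALSE)

# Split a single word into its characters
chars_for <- function(str) strsplit(str, "")[[1]]

# Letters of the second word that are absent from the first word
words$extra <- mapply(
  function(a, b) paste(setdiff(chars_for(b), chars_for(a)), collapse = ""),
  words$first, words$second, USE.NAMES = FALSE
)
```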
Use regex :) Paste your word with brackets [] and then use replace function for regex. This regex finds any letter from those in brackets and replaces it with empty string (you can say that it "removes" these letters).
require(stringi)
x <- c("carpet","bag","dog")
y <- c("carplet", "flag", "smog")
pattern <- stri_paste("[",x,"]")
pattern
## [1] "[carpet]" "[bag]" "[dog]"
stri_replace_all_regex(y, pattern, "")
## [1] "l" "fl" "sm"
x <- c("carpet","bag","dog")
y <- c("carpelt", "flag", "dig")
Following (somewhat) with what you were going for with strsplit, you could do
> sx <- strsplit(x, "")
> sy <- strsplit(y, "")
> lapply(seq_along(sx), function(i) sy[[i]][ !sy[[i]] %in% sx[[i]] ])
#[[1]]
#[1] "l"
#
#[[2]]
#[1] "f" "l"
#
#[[3]]
#[1] "i"
This uses %in% to logically match the characters in y with the characters in x. I negate the matching with ! to determine those characters that are in y but not in x.
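One difference from setdiff is worth a small sketch: the %in% approach keeps repeated letters, while setdiff returns each letter only once. The word pair below is an added example, not from the question.

```r
x <- "bat"
y <- "ballot"
sx <- strsplit(x, "")[[1]]
sy <- strsplit(y, "")[[1]]

# Keeps every occurrence of a letter that is missing from x
with_in <- sy[!sy %in% sx]

# Returns each missing letter once
with_setdiff <- setdiff(sy, sx)
```

Which behaviour is preferable depends on whether repeated letters should be reported once or every time they occur.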