Changing a column of a dataframe in R - r

I have a dataframe in R with a column containing values such as "s1-112", "s10-112", "s3656-112", etc. I want to reduce each value to the part after "s" and before "-112", that is, the number after the s. Is there a way to do this?

You could use gsub here
x<-c("s1-112", "s10-112", "s3656-112")
gsub("s(.*)-112", "\\1", x)
# [1] "1" "10" "3656"

Or (using @MrFlick's data)
library(stringr)
# perl() is no longer needed in current stringr; its default regex engine supports lookaheads
str_extract(x, "\\d+(?=-)")
#[1] "1" "10" "3656"
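Both answers above work on a plain character vector, while the question asks about a data frame column. A minimal sketch of the same substitution applied to a column, assuming a hypothetical data frame `df` with a character column named `id` (neither name comes from the question):

```r
# Hypothetical data frame; the names `df` and `id` are assumptions for illustration
df <- data.frame(id = c("s1-112", "s10-112", "s3656-112"),
                 stringsAsFactors = FALSE)

# Keep only the digits between the leading "s" and the trailing "-112"
df$id <- sub("^s([0-9]+)-112$", "\\1", df$id)
df$id
```

The result is still a character column; wrapping the right-hand side in as.integer() would convert it to numbers if that is what is needed.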

Related

Specific Permutations in R

I am trying to create permutations of the alphabet {0,1,2,3} using combinat::permn.
The thing is that I want each of the permutations to be converted to the form '%s-%s-%s' etc. and stored in a list. For example,
> library(combinat)
> permn(numbers[1:4])
[[1]]
[1] "0" "1" "2" "3"
[[2]]
[1] "0" "1" "3" "2"
.
.
. and so on
But I want to convert the output for all permutations into a list of string sequences in my specific format, i.e. '0-1-2-3', '0-1-3-2', etc.
Use lapply to apply paste on each of the vectors and collapse them with the delimiter you want (in this case "-").
lapply(permn(0:3), paste, collapse = "-")
If you just want the output as a vector instead of a list, you could use sapply in place of lapply.
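As a small sketch of the difference, the snippet below uses a hand-written list of two permutations (an assumption, standing in for the output of permn so that it runs in base R without combinat):

```r
# Stand-in for permn(0:3): two of the permutations, written out by hand
perm_list <- list(c(0, 1, 2, 3), c(0, 1, 3, 2))

# lapply keeps the list structure, one string per element
as_list <- lapply(perm_list, paste, collapse = "-")

# sapply simplifies the same result to a character vector
as_vector <- sapply(perm_list, paste, collapse = "-")
as_vector
```

With the full permn(0:3) output the same calls apply unchanged; only the input list is longer.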

Ascending order of vector of numeric characters

I have a vector of numbers of type character.
x = c("5","-.5","-.1",".01",".1","1","3")
Is there a quick and easy way to order this character vector using the numeric value of each character? I can't find a clean way to do this.
So for instance, I want a function
x <- characterOrder(x)
With output:
c("-.5","-.1",".01",".1","1","3", "5")
Thank you!
You can do this in base R: convert the values with as.numeric, pass the result to order, and use the resulting index to reorder the original character vector.
x = c("5","-.5","-.1",".01",".1","1","3")
x[order(as.numeric(x))]
[1] "-.5" "-.1" ".01" ".1" "1" "3" "5"
If you want this in a function:
characterOrder <- function(x) {
return(x[order(as.numeric(x))])
}
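A short usage sketch of that helper on the vector from the question. One thing to keep in mind: as.numeric turns any string that is not a valid number into NA (with a warning), and order places those entries last by default.

```r
# Same helper as above, written without an explicit return()
characterOrder <- function(x) x[order(as.numeric(x))]

x <- c("5", "-.5", "-.1", ".01", ".1", "1", "3")
sorted <- characterOrder(x)
sorted
```

The elements stay character strings; only their order changes.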
You could try mixedsort from gtools
library(gtools)
mixedsort(x)
#[1] "-.5" "-.1" ".01" ".1" "1" "3" "5"

R: Using gsub to replace a digit matched by pattern (n) with (n-1) in character vector

I am trying to match the last digit in a character vector and replace it with the matched digit - 1. I believe gsub is what I need to use, but I cannot figure out what to use as the 'replace' argument. I can match the last number using:
gsub('[0-9]$', ???, chrvector)
But I am not sure how to replace the matched number with itself - 1.
Any help would be much appreciated.
Thank you.
We can do this easily with gsubfn
library(gsubfn)
gsubfn("([0-9]+)", ~as.numeric(x)-1, chrvector)
#[1] "str97" "v197exdf"
Or for the last digit
gsubfn("([0-9])([^0-9]*)$", ~paste0(as.numeric(x)-1, y), chrvector2)
#[1] "str97" "v197exdf" "v33chr138d"
data
chrvector <- c("str98", "v198exdf")
chrvector2 <- c("str98", "v198exdf", "v33chr139d")
Assuming the last digit is not zero,
chrvector <- as.character(1:5)
chrvector
#[1] "1" "2" "3" "4" "5"
chrvector <- paste(chrvector, collapse='') # convert to character string
chrvector <- paste0(substring(chrvector,1, nchar(chrvector)-1), as.integer(gsub('.*([0-9])$', '\\1', chrvector))-1)
unlist(strsplit(chrvector, split=''))
# [1] "1" "2" "3" "4" "4"
This works even if you have the last digit zero:
chrvector <- c(as.character(1:4), '0') # [1] "1" "2" "3" "4" "0"
chrvector <- paste(chrvector, collapse='')
chrvector <- as.character(as.integer(chrvector)-1)
unlist(strsplit(chrvector, split=''))
# [1] "1" "2" "3" "3" "9"
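For comparison, a base-R sketch that is not taken from the answers above: it locates the trailing digit with regexpr and overwrites it through the replacement form of regmatches. The input vector is hypothetical, and the sketch assumes every element ends in a non-zero digit (a trailing 0 would become -1).

```r
# Hypothetical input; each element is assumed to end in a non-zero digit
chrvector <- c("str98", "v33chr139")

# Locate the single digit at the end of each string
m <- regexpr("[0-9]$", chrvector)

# Replace each matched digit with (digit - 1), converted back to character
regmatches(chrvector, m) <- as.character(as.integer(regmatches(chrvector, m)) - 1L)
chrvector
```

This avoids the extra package, at the cost of not handling a borrow when the last digit is zero.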

extract numerical suffixes from strings in R

I have this character vector:
variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5", "vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
Desired output:
> suffixes(variables)
[1] 1 1 4 4 5 6 11 12 2
In other words, I need a function that will return a numeric vector showing the suffixes (each of which may be 1 or 2 digits long). Note, I need something that can work with a much larger number of strings, which may or may not have numbers somewhere in the middle. The numerical suffixes range from 1 to 99.
Many thanks
Just use gsub:
> gsub(".*?([0-9]+)$", "\\1", variables)
[1] "1" "1" "4" "4" "5" "6" "11" "12" "2"
Wrap it in as.numeric if you want the result as a number.
You could use sub function.
> variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5" ,"vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
> sub(".*\\D", "", variables)
[1] "1" "1" "4" "4" "5" "6" "11" "12" "2"
.*\\D matches all the characters from the start up to and including the last non-digit character. Replacing those matched characters with an empty string gives you the desired output.
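Putting the pieces together, a minimal sketch of the `suffixes` function named in the question, built on the sub pattern above and wrapped in as.numeric. It assumes every string ends in digits, as in the example data.

```r
# Strip everything up to the last non-digit, then convert what remains to numeric
suffixes <- function(x) as.numeric(sub(".*\\D", "", x))

variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5",
               "vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
result <- suffixes(variables)
result
```

Digits in the middle of a name, such as the 2 in "us2yy.l2", are discarded because the greedy .* runs to the last non-digit character.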

How to edit "row.names" after split and cut2 in R?

I want to edit out some information from row.names that are created automatically once split and cut2 were used. See following code:
#Mock data
date_time <- as.factor(c('8/24/07 17:30','8/24/07 18:00','8/24/07 18:30',
'8/24/07 19:00','8/24/07 19:30','8/24/07 20:00',
'8/24/07 20:30','8/24/07 21:00','8/24/07 21:30',
'8/24/07 22:00','8/24/07 22:30','8/24/07 23:00',
'8/24/07 23:30','8/25/07 00:00','8/25/07 00:30'))
U. <- as.numeric(c('0.2355','0.2602','0.2039','0.2571','0.1419','0.0778','0.3557',
'0.3065','0.1559','0.0943','0.1519','0.1498','0.1574','0.1929'
,'0.1407'))
#Mock data frame
test_data <- data.frame(date_time,U.)
#To use cut2
library(Hmisc)
#Splitting the data into categories
sub_data <- split(test_data,cut2(test_data$U.,c(0,0.1,0.2)))
new_data <- do.call("rbind",sub_data)
test_data <- new_data
You will see that "test_data" would have an extra column "row.names" with values such as "[0.000,0.100).6", "[0.000,0.100).10", etc.
How do I remove "[0.000,0.100)" and keep the number after the "." such as 6 and 10 so that I can reference these rows by their original row number later?
Is there a better method to do this?
You could also set the names of sub_data to NULL.
names(sub_data) <- NULL
test_data <- do.call('rbind', sub_data)
row.names(test_data)
#[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"
You could use a Regular Expression (Regex), as follows:
rownames(test_data) = gsub(".*[]\\)]\\.", "", rownames(test_data))
It's cryptic if you're not familiar with regular expressions, but it basically says: match any sequence of characters (.*) that is followed by either a closing square bracket or a closing parenthesis ([]\\)]) and then by a period (\\.), and remove all of it.
The double backslashes are "escapes" indicating that the character following the double-backslash should be interpreted literally, rather than in its special Regex meaning (e.g., . means "match any single character", but \\. means "this is really just a period").
Just for fun, you can also use regmatches
> Names <- rownames(test_data)
> ( rownames(test_data) <- regmatches(Names, regexpr("[0-9]+$", Names)) )
[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"
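If the goal is to index back into the original data by row number, the cleaned names can also be converted to integers. A sketch on a few hand-written row names (an assumption, so that it runs without Hmisc), using a simpler pattern than the answers above: it removes everything up to the last period, which works here because the original row number always follows the last period.

```r
# Hand-written examples of the row names produced by split + rbind
rn <- c("[0.000,0.100).6", "[0.000,0.100).10", "[0.100,0.200).5")

# Drop everything up to and including the last period, then convert to integer
orig_rows <- as.integer(sub(".*\\.", "", rn))
orig_rows
```

The resulting integers could then be used to subset the original data frame by row position.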
