I am trying to use ifelse to populate a new column in a data frame.
I want to extract the last digit of a character string in a column if the string is longer than 3 characters. If the character string is shorter, I just want it to give -1.
I already figured out how to extract the last characters of the string if the string is longer than 3 characters.
x<- c("ABCD1", "ABCD2", "ABCD3", "ABCD4", "BC5", "BC6", "BC7")
y<-NULL
dat<-cbind(x,y)
ifelse (nchar(x>3), y=substr(x, 5,5), y=-1)
dat<-cbind(x,y)
view(dat)
When I run this, I get the following error:
Error in ifelse(nchar(x > 3), y = substr(x, 4, 5), y = substr(x, 3)) :
formal argument "yes" matched by multiple actual arguments
What I want is for the vector "y" to contain the numbers 1, 2, 3, 4, -1, -1, -1
so that I can bind both columns later. If you have a better way of doing this, I would appreciate it.
You're almost there! This will work as long as the strings with more than 3 characters have the digit you want in position 5 (as in "ABCD1").
ifelse(nchar(x) > 3, substr(x, 5, 5), -1)
If your strings might be longer, so that the digit is not always in position 5:
ifelse(nchar(x) > 3, sub(".*([0-9]).*", "\\1", x), -1)
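For context on the original error: ifelse() takes its arguments in the order test, yes, no, and R partially matches the name y to yes, so passing y = twice supplies yes two times. The closing parenthesis of nchar() also has to come before the > 3 comparison. A minimal sketch of the corrected call, assuming the digit is always the last character of the string (true for the sample data, but an assumption for any other data):

```r
x <- c("ABCD1", "ABCD2", "ABCD3", "ABCD4", "BC5", "BC6", "BC7")
n <- nchar(x)

# Positional arguments: test, value if TRUE, value if FALSE.
# The result is assigned outside ifelse(), not with y = inside it.
y <- ifelse(n > 3, as.numeric(substr(x, n, n)), -1)
y
```

Converting with as.numeric() inside the call keeps y numeric; without it, ifelse() would return a character vector here.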
I am guessing you need a data frame. Here's what you probably need:
x <- c("ABCD1", "ABCD2", "ABCD3", "ABCD4", "BC5", "BC6", "BC7")
dat <- data.frame(x, stringsAsFactors = FALSE)
dat$y <- ifelse(nchar(dat$x) > 3, as.numeric(substr(dat$x, 5,5)), -1)
x y
1 ABCD1 1
2 ABCD2 2
3 ABCD3 3
4 ABCD4 4
5 BC5 -1
6 BC6 -1
7 BC7 -1
So I have a relatively simple problem (and I think that there may be some duplicates of my question out there) but I just can't seem to figure it out and I would really appreciate any and all help.
I have a dataset and in one column, I have multiple rows of different 11-digit numbers. I hope to obtain the last 6 digits of each number and I hope to be able to create a new column in my dataset with the results.
Below is an example:
random_num <- c(11001100100, 11001100300, 11001100400,
11001100501, 11001100502, 11001100600)
random_stuff <- c(2, 5, 6, 2, 5, 3)
data_frame <- cbind(random_num, random_stuff)
And I hope to get an output that shows something like this:
So far, this is what I have:
conversion <- function (x) {
for (i in nrow(x))
{
c <- as.character[i]
be <- substring(c, seq(1, nchar(c), 1), seq(1, nchar(c), 1))
ad <- paste(be[6], be[7], be[8], be[9], be[10], be[11], sep = "")
final <- as.numeric(ad)
return(final)
}
}
finalr <- conversion(data_frame)
finalr
But I either get the error message saying that
'Error in as.character[i]: object of type 'builtin' is not subsettable' or 'Error in mutate_impl(.data, dots) : Evaluation error: 'to' must be of length 1.'
Will really appreciate any advice. Thank you!
From what I can tell, the result column/vector is just the last six digits of each value in the random_num vector. So, we can use the modulus operator to compute this:
random_num <- c(1100100100, 1100100300, 1100100400,
1100100501, 1100100502, 1100100600)
result <- random_num %% 1000000
result
[1] 100100 100300 100400 100501 100502 100600
This answer avoids a potentially unnecessary cast from numeric to character.
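A short sketch of how the modulus approach could be applied to the data frame from the question. One caveat: a numeric result drops leading zeros, so if the six digits must keep their width they need to be formatted as text. The value 11001000042 below is a made-up example, not part of the original data.

```r
random_num <- c(11001100100, 11001100300, 11001100400,
                11001100501, 11001100502, 11001100600)
random_stuff <- c(2, 5, 6, 2, 5, 3)
data_frame <- data.frame(random_num, random_stuff)

# Last six digits as a number: the remainder after dividing by 10^6
data_frame$result <- data_frame$random_num %% 1e6

# Hypothetical value whose last six digits start with zeros:
# the numeric remainder is 42, so pad it back to width 6 as text
padded <- sprintf("%06.0f", 11001000042 %% 1e6)
padded
```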
I've converted your dataset into a data frame:
random_num <- c(11001100100, 11001100300, 11001100400,
11001100501, 11001100502, 11001100600)
random_stuff <- c(2, 5, 6, 2, 5, 3)
data_frame <- data.frame(random_num, random_stuff)
data_frame$result <- substring(data_frame$random_num, nchar(data_frame$random_num)-6+1)
> data_frame
random_num random_stuff result
1 11001100100 2 100100
2 11001100300 5 100300
3 11001100400 6 100400
4 11001100501 2 100501
5 11001100502 5 100502
6 11001100600 3 100600
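The substring() call above relies on R converting the numbers to text implicitly, and large round values may be converted in scientific notation. A hedged sketch that formats the number explicitly first; last_n_digits is a hypothetical helper name, not a base R function, and it assumes non-negative whole numbers:

```r
# Hypothetical helper: last n digits of whole numbers, returned as text
last_n_digits <- function(num, n) {
  s <- sprintf("%.0f", num)        # fixed notation, no decimals
  substring(s, nchar(s) - n + 1)   # from position (length - n + 1) to the end
}

last_n_digits(c(11001100100, 10000000000), 6)
```

The result is a character vector; wrap it in as.numeric() if numbers are needed, at the cost of losing leading zeros.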
I'm trying to learn R and a sample problem is asking to only reverse part of a string that is in alphabetical order:
String: "abctextdefgtext"
StringNew: "cbatextgfedtext"
Is there a way to identify alphabetical patterns to do this?
Here is one approach with base R, based on the pattern shown in the example. We split the string into individual characters ('v1'), use match to find the position of each character in the alphabet (letters), take the difference between adjacent positions and check whether it is equal to 1. From that logical vector we build a grouping variable ('i1') with cumsum, so that each run of consecutive letters gets its own group, and reverse (rev) the characters within each group. Finally, we paste the characters together to get the expected output.
v1 <- strsplit(str1, "")[[1]]
i1 <- cumsum(c(TRUE, diff(match(v1, letters)) != 1L))
paste(ave(v1, i1, FUN = rev), collapse="")
#[1] "cbatextgfedtext"
Or, as @alexislaz mentioned in the comments:
v1 = as.integer(charToRaw(str1))
rawToChar(as.raw(ave(v1, cumsum(c(TRUE, diff(v1) != 1L)), FUN = rev)))
#[1] "cbatextgfedtext"
EDIT:
1) A mistake was corrected based on @alexislaz's comments
2) Updated with another method suggested by @alexislaz in the comments
data
str1 <- "abctextdefgtext"
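To make the grouping step easier to follow, here is the same technique wrapped in a function with the intermediate values spelled out. The name reverse_runs is made up for this sketch, and it assumes the input contains only lower-case letters a to z:

```r
# Hypothetical wrapper around the split / match / cumsum / ave steps
reverse_runs <- function(s) {
  chars <- strsplit(s, "")[[1]]
  pos <- match(chars, letters)                 # alphabet position of each character
  # A new group starts wherever the position is not previous + 1
  run_id <- cumsum(c(TRUE, diff(pos) != 1L))
  paste(ave(chars, run_id, FUN = rev), collapse = "")
}

# For "abctextdefgtext":
# pos    is 1 2 3 20 5 24 20 4 5 6 7 20 5 24 20
# run_id is 1 1 1  2 3  4  5 6 6 6 6  7 8  9 10
reverse_runs("abctextdefgtext")
```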
You could do this in base R
vec <- match(unlist(strsplit(s, "")), letters)
x <- c(0, which(diff(vec) != 1), length(vec))
newvec <- unlist(sapply(seq(length(x) - 1), function(i) rev(vec[(x[i]+1):x[i+1]])))
paste0(letters[newvec], collapse = "")
#[1] "cbatextgfedtext"
Where s <- "abctextdefgtext"
First you find the positions of each letter in the sequence of letters ([1] 1 2 3 20 5 24 20 4 5 6 7 20 5 24 20)
Having the positions in hand, you look for consecutive numbers and, when found, reverse that sequence. ([1] 3 2 1 20 5 24 20 7 6 5 4 20 5 24 20)
Finally, you get the letters back in the last line.
I have a vector:
lst <- c("2,1","7,10","11,0","7,0","10,0","1,1","1,0","4,0","4,1","0,1","6,0")
Each element contains two numbers, separated by ",". I would like to get the indexes of the elements in which either number is exactly 1.
So the index list is expected:
1, 6, 7, 9, 10
grep() will work nicely for this. By default, it returns the indices of the matched pattern.
grep("^1,|,1$", lst)
# [1] 1 6 7 9 10
The regular expression ^1,|,1$ looks to match a string that
^1, = starts with 1,
| OR
,1$ = ends with ,1
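A quick comparison shows why the anchors matter: an unanchored pattern also matches elements such as "7,10" and "11,0". The word-boundary variant at the end is an alternative sketch, not part of the answer above.

```r
lst <- c("2,1","7,10","11,0","7,0","10,0","1,1","1,0","4,0","4,1","0,1","6,0")

# Unanchored: matches any element with the character 1 anywhere in it
loose <- grep("1", lst)

# Anchored: 1 must be the whole first number or the whole second number
anchored <- grep("^1,|,1$", lst)

# Alternative: 1 surrounded by word boundaries (Perl-style regex)
bounded <- grep("\\b1\\b", lst, perl = TRUE)

loose
anchored
bounded
```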
Each element contains two numbers. My answer is not ideal, but I got what I need.
m <- as.numeric(unlist(lapply(strsplit(as.character(lst), "\\,"),"[[",1)))
n <- as.numeric(unlist(lapply(strsplit(as.character(lst), "\\,"),"[[",2)))
sort(unique(c(which(m==1),which(n==1))))
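The same split-and-compare idea can be written with a single strsplit() call. This is only a sketch of one possible simplification; it compares the pieces as text, so it assumes the numbers are written without spaces or leading zeros:

```r
lst <- c("2,1","7,10","11,0","7,0","10,0","1,1","1,0","4,0","4,1","0,1","6,0")

# Split each element once, then test whether "1" is one of the two pieces
parts <- strsplit(lst, ",", fixed = TRUE)
idx <- which(vapply(parts, function(p) "1" %in% p, logical(1)))
idx
```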
Depending on background and context of this task it might be prudent to turn this vector into a data.frame:
lst <- c("2,1","7,10","11,0","7,0","10,0","1,1","1,0","4,0","4,1","0,1","6,0")
DF <- read.table(text = do.call(paste, list(lst, collapse = "\n")), sep = ",")
which(DF$V1 == 1L | DF$V2 == 1L)
#[1] 1 6 7 9 10
I am importing a key in which each row is an argument setting for a function I have programmed. The goal is to batch test my function by producing outputs for all sets of arguments. That's not terribly important. What is important is that I import a column that contains in each row a value for a range. For instance, "1:5" is meant to be entered into an argument as the value 1:5. I try to coerce using as.numeric("1:5"), but R is not happy with this. Is there a way to coerce the character value "1:5" to the vector c(1,2,3,4,5)?
Your text is valid R code, so you can parse and evaluate it with eval(parse(text = ...)):
dat$parsed <- lapply(dat$key, function(x) eval(parse(text=x)))
# key parsed
# 1 1:5 1, 2, 3, 4, 5
# 2 1:6 1, 2, 3, 4, 5, 6
# 3 1:4 1, 2, 3, 4
Data
dat <- read.table(text="key
1:5
1:6
1:4", stringsAsFactors = FALSE, header = TRUE)
Reduce(':', strsplit(x,":")[[1]])
[1] 1 2 3 4 5
If x = "1:5", we can use strsplit to separate the two numbers. We can then use Reduce to execute the operator : on the split.
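If the keys come from a file, it may be preferable not to evaluate them as code at all. A minimal sketch along the lines of the strsplit approach, with explicit integer conversion; parse_range is a hypothetical name, and it assumes every key has the form "from:to" with whole numbers:

```r
# Hypothetical helper: turn "from:to" text into an integer sequence
parse_range <- function(s) {
  bounds <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  bounds[1]:bounds[2]
}

keys <- c("1:5", "1:6", "1:4")
parsed <- lapply(keys, parse_range)
parsed
```

Because the text is only split and converted, a malformed key produces NA values or an error instead of running arbitrary code.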
I have a data frame with several columns. One of those contains Plotids like AEG1, AEG2,..., AEG50, HEG1, HEG2,..., HEG50, SEG1, SEG2,..., SEG50. So, the data frame has 150 rows. Now I want to change only some of these Plotids, so that there is AEG01, AEG02,... instead of AEG1, AEG2, ... So, I just want to add a "0" to some of the column entries. I tried it by using lapply, a for loop, writing a function,... but nothing did the job. There was always the error message:
In if (nchar(as.character(dat_merge$EP_Plotid)) == 4)
paste(substr(dat_merge$EP_Plotid, ... :
the condition has length > 1 and only the first element will be used
So, this was my last try:
Plotid_func <- function(x) {
if(nchar(as.character(dat_merge$EP_Plotid))==4)
paste(substr(dat_merge$EP_Plotid, 1, 3), "0", substr(dat_merge$EP_Plotid, 4, 4), sep="")
}
dat_merge$Plotid <- sapply(dat_merge$EP_Plotid, Plotid_func)
With this, I wanted to select only those column entries with four characters, and add a 0 to only those selected entries. Can anybody help me? dat_merge is the name of my data frame and EP_Plotid is the column I want to edit. Thanks in advance.
Just extract the "string" portion and the "numeric" portion and paste them back together after using sprintf on the numeric portion.
An example:
## "x" is the "column" of plot ids. Here I go up to 12
## to demonstrate the zero padding that it sounds like
## you're looking for
x <- c(paste0("AEG", 1:12), paste0("HEG", 1:12))
## Extract the string values
Strings <- gsub("([A-Z]+)(.*)", "\\1", x)
## Extract the numeric values
Nums <- gsub("([A-Z]+)(.*)", "\\2", x)
## Put them back together
paste0(Strings, sprintf("%02d", as.numeric(Nums)))
# [1] "AEG01" "AEG02" "AEG03" "AEG04" "AEG05" "AEG06"
# [7] "AEG07" "AEG08" "AEG09" "AEG10" "AEG11" "AEG12"
# [13] "HEG01" "HEG02" "HEG03" "HEG04" "HEG05" "HEG06"
# [19] "HEG07" "HEG08" "HEG09" "HEG10" "HEG11" "HEG12"
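The extract-and-pad steps can also be collected into one reusable function. This is a sketch only: pad_plotid and its width argument are made-up names, and it assumes every id is a non-numeric prefix followed by digits.

```r
# Hypothetical helper: zero-pad the numeric suffix of each id
pad_plotid <- function(ids, width = 2) {
  ids <- as.character(ids)                        # also handles factor columns
  prefix <- sub("[0-9]+$", "", ids)               # everything before the trailing digits
  number <- as.integer(sub("^[^0-9]+", "", ids))  # the trailing digits as an integer
  paste0(prefix, formatC(number, width = width, flag = "0"))
}

pad_plotid(c("AEG1", "AEG50", "HEG2", "SEG10"))
```

With the data frame from the question this could be used as dat_merge$Plotid <- pad_plotid(dat_merge$EP_Plotid).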
Or you can just modify your function to actually use the input variable x (which is not happening in your original function)
dat_merge <- data.frame(EP_Plotid = c("AEG1", "AEG2", "AEG50", "HEG1", "HEG2", "HEG50", "SEG1", "SEG2", "SEG50"))
Plotid_func <- function(x) {
if(nchar(as.character(x)) == 4){
paste(substr(x, 1, 3), "0", substr(x, 4, 4), sep="")
} else as.character(x)
}
dat_merge$Plotid <- sapply(dat_merge$EP_Plotid, Plotid_func)
dat_merge
# EP_Plotid Plotid
# 1 AEG1 AEG01
# 2 AEG2 AEG02
# 3 AEG50 AEG50
# 4 HEG1 HEG01
# 5 HEG2 HEG02
# 6 HEG50 HEG50
# 7 SEG1 SEG01
# 8 SEG2 SEG02
# 9 SEG50 SEG50
A vectorized version of your function (which is much better than using sapply, which is essentially a for loop) would be:
dat_merge$Plotid <- ifelse(nchar(as.character(dat_merge$EP_Plotid)) == 4,
                           paste(substr(dat_merge$EP_Plotid, 1, 3), "0", substr(dat_merge$EP_Plotid, 4, 4), sep = ""),
                           as.character(dat_merge$EP_Plotid))
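The warning in the question comes from giving if() a condition with more than one element; from R 4.2.0 onwards this is an error rather than a warning. A small sketch on a made-up three-element vector shows the difference between the two constructs:

```r
ids <- c("AEG1", "AEG50", "HEG2")

# One TRUE/FALSE per element, so this cannot be used directly in if()
is_short <- nchar(ids) == 4

# ifelse() picks between the two alternatives element by element
padded <- ifelse(is_short,
                 paste0(substr(ids, 1, 3), "0", substr(ids, 4, 4)),
                 ids)
padded
```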
Or use a combination of formatC with str_extract from library(stringr)
library(stringr)
x is taken from Ananda's post.
Extract the letters and the numbers separately.
Pad the numbers with leading zeros using formatC.
Paste the pieces together.
paste0(str_extract(x, "[[:alpha:]]+"), formatC(as.numeric(str_extract(x, "\\d+")), width = 2, flag = "0"))
#[1] "AEG01" "AEG02" "AEG03" "AEG04" "AEG05" "AEG06" "AEG07" "AEG08" "AEG09"
#[10] "AEG10" "AEG11" "AEG12" "HEG01" "HEG02" "HEG03" "HEG04" "HEG05" "HEG06"
#[19] "HEG07" "HEG08" "HEG09" "HEG10" "HEG11" "HEG12"