I am currently using R to convert data from an experiment into a high-quality dataset. One feature of my code detects repetitions of the experiment and labels them accordingly. I have written the following code for this:
DAYREP<-function(a){
  CAPS<-c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P",
          "Q","R","S","T","U","V","W","X","Y","Z")
  # length() is checked first so && only ever sees single TRUE/FALSE values
  # (R >= 4.3 raises an error when either side of && has length > 1)
  if (length(unique(table(a)))==1 && unique(table(a))==1){
    return(a)
  }
  else{
    for (i in a){
      if (table(a)[[i]]>=2){
        CAPS.sum<-CAPS[1:as.vector(table(a)[[i]])-1]
        val<-c(i,paste0(i,CAPS.sum))
        del<-a[!a %in% i]
        vec<-append(del,val,after=i-1)
        return(vec)
      }
    }
  }
}
I have used the following vectors of day numbers for testing and they highlight every possible outcome known so far.
a<-c(1,2,3,4,5,6,7,8,9)
b<-c(1,2,3,4,5,6,7,8,8)
c<-c(1,2,3,3,4,5,6)
d<-c(1,1,1,1,1,1)
e<-c(1,2,2,3,4,5,6,6,7)
f<-c(2,7,8,10,11,11,14)
It produces the following output:
> DAYREP(a)
[1] 1 2 3 4 5 6 7 8 9
> DAYREP(b)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "8A"
> DAYREP(c)
[1] "1" "2" "3" "3A" "4" "5" "6"
> DAYREP(d)
[1] "1" "1A" "1B" "1C" "1D" "1E"
> DAYREP(e)
[1] "1" "2" "2A" "3" "4" "5" "6" "6" "7"
> DAYREP(f)
Error in table(a)[[i]] : subscript out of bounds
The function works on all the tests except e and f: with e it only converts the first set of repeated values, and with f it returns an error.
I am aware that the problem is caused by table(a)[[i]] returning a frequency value from the table; however, I am unsure whether there is a way to access the values being tabulated instead. E.g.
> table(e)
e
1 2 3 4 5 6 7
1 2 1 1 1 2 1
The method I am using accesses the bottom line (the counts), but I want to access the top line (the values). Does anybody know of a solution to this?
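For reference, a minimal illustration of the two lines of a table, using test vector f from above: the top line is the table's names (character strings) and the bottom line is the counts. A numeric [[ ]] index is positional, while a character index looks up the tabulated value. The object name tab_f is just for illustration.

```r
f <- c(2, 7, 8, 10, 11, 11, 14)
tab_f <- table(f)

names(tab_f)               # top line, as character strings
as.vector(tab_f)           # bottom line, the counts

tab_f[[5]]                 # by position: the 5th entry, which is the count for "11"
tab_f[["11"]]              # by name: the count for the value 11
tab_f[[as.character(11)]]  # same lookup when the value is held as a number

# Positional lookup with a day number is what fails in DAYREP(f):
# tab_f[[11]] gives "subscript out of bounds" because the table has only 6 entries.
```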
cr1msonB1ade has kindly suggested the make.unique function, which does what the above function does with a slight variation (it expects a character vector, hence the as.character):
> make.unique(as.character(e))
[1] "1" "2" "2.1" "3" "4" "5" "6" "6.1" "7"
Thank you!
As stated in my comment, I think what you want is the built-in function make.unique, but there are also some issues with how you are using the table, so I would like to address those as well. When you index a table with a number such as i in your for loop, [[ (and likewise [) looks up entries by position, not by the tabulated value. Because table converts the values to factor levels, its names are character strings, so you have to index with as.character(i), e.g. table(a)[as.character(i)]. I don't think this completely fixes your script, but it might get you close enough.
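Along the lines of make.unique, but with the letter suffixes the question asks for, one possible vectorized sketch is below. dayrep_sketch is a hypothetical helper, not the original DAYREP: it numbers each value's occurrences with ave() and appends LETTERS. It assumes no value occurs more than 27 times (LETTERS has 26 entries), and it keeps values in their original order rather than moving repeats, and always returns character.

```r
# Hypothetical helper: label repeated day numbers with A, B, C, ... in order of appearance.
dayrep_sketch <- function(a) {
  # Running count per value: 1 for the first occurrence, 2 for the second, ...
  occurrence <- ave(seq_along(a), a, FUN = seq_along)
  # First occurrence keeps the plain number; the 2nd gets "A", the 3rd "B", ...
  suffix <- ifelse(occurrence == 1, "", LETTERS[pmax(occurrence - 1, 1)])
  paste0(a, suffix)
}

dayrep_sketch(c(1, 2, 2, 3, 4, 5, 6, 6, 7))
# c("1", "2", "2A", "3", "4", "5", "6", "6A", "7")
dayrep_sketch(c(2, 7, 8, 10, 11, 11, 14))
# c("2", "7", "8", "10", "11", "11A", "14")
```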
toString seems to convert a whole vector to a single string:
toString(c(1,2))
[1] "1, 2"
How does one map the string conversion over each element, i.e. for the above example, obtain ("1", "2")?
> as.character(c(1,2))
[1] "1" "2"
That is the output I get in the R console.
Since the result of toString is a character vector with a single element, applying as.character to it will have no effect. You need to split it again, for example with scan:
> scan(text = toString(0:11), sep="," )
Read 12 items
[1] 0 1 2 3 4 5 6 7 8 9 10 11
Then you can use as.character if that is needed:
> res <- scan(text = toString(0:11), sep="," )
Read 12 items
> as.character(res)
[1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
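If character output is what you want in the end, a possible alternative is to split on the ", " separator that toString inserts by default, which skips the numeric round trip. This sketch assumes the original elements do not themselves contain ", ".

```r
s <- toString(0:11)   # a single string: "0, 1, 2, ..., 11"
# toString joins with ", " by default, so splitting on it recovers the pieces
strsplit(s, ", ", fixed = TRUE)[[1]]
```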
I prefer paste0 since it's shorter and (from what I can tell) accomplishes the same thing as as.character:
> paste0(1:2)
[1] "1" "2"
> identical(paste0(1:2),as.character(1:2))
[1] TRUE
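To literally map toString over each element, as the question phrases it, vapply can apply it element by element. This is usually unnecessary because as.character is already vectorized, but it shows the general mapping pattern.

```r
x <- c(1, 2)
# Call toString on each element separately; character(1) asserts that
# every call returns exactly one string
vapply(x, toString, character(1))
```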
I have this character vector:
variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5", "vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
Desired output:
> suffixes(variables)
[1] 1 1 4 4 5 6 11 12 2
In other words, I need a function that returns a numeric vector of the suffixes (each of which is 1 or 2 digits long). Note that I need something that works with a much larger number of strings, which may or may not have numbers somewhere in the middle. The numerical suffixes range from 1 to 99.
Many thanks
Just use gsub:
> gsub(".*?([0-9]+)$", "\\1", variables)
[1] "1" "1" "4" "4" "5" "6" "11" "12" "2"
Wrap it in as.numeric if you want the result as a number.
You could use the sub function:
> variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5" ,"vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
> sub(".*\\D", "", variables)
[1] "1" "1" "4" "4" "5" "6" "11" "12" "2"
.*\\D matches all the characters from the start up to and including the last non-digit character. Replacing the matched characters with an empty string leaves only the trailing digits, which is the desired output.
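The question asks for a function called suffixes returning numbers; a minimal sketch wrapping the sub answer in as.numeric could look like this. The function body is an assumption built from the answers above; note that a string with no trailing digits becomes "" and therefore NA.

```r
# Hypothetical helper named after the call in the question
suffixes <- function(x) {
  # Drop everything up to and including the last non-digit, then convert
  as.numeric(sub(".*\\D", "", x))
}

variables <- c("ret.SMB.l1", "ret.mkt.l1", "ret.mkt.l4", "vix.l4", "ret.mkt.l5",
               "vix.l6", "slope.l11", "slope.l12", "us2yy.l2")
suffixes(variables)
```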
Could somebody explain to me why this does not print all the numbers separately in R?
numberstring <- "0123456789"
for (number in numberstring) {
print(number)
}
Aren't strings just arrays of chars? What's the way to do it in R?
In R "0123456789" is a character vector of length 1.
If you want to iterate over the characters, you have to split the string into
a vector of single characters using strsplit.
numberstring <- "0123456789"
numberstring_split <- strsplit(numberstring, "")[[1]]
for (number in numberstring_split) {
print(number)
}
# [1] "0"
# [1] "1"
# [1] "2"
# [1] "3"
# [1] "4"
# [1] "5"
# [1] "6"
# [1] "7"
# [1] "8"
# [1] "9"
Just for fun, here are a few other ways to split a string at each character.
x <- "0123456789"
substring(x, 1:nchar(x), 1:nchar(x))
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
regmatches(x, gregexpr(".", x))[[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
scan(text = gsub("(.)", "\\1 ", x), what = character())
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
Possible with stringr::str_split (stringr is part of the tidyverse):
library(stringr)
numberstring <- "0123456789"
str_split(numberstring, boundary("character"))
# [[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
Here's a naive approach for iterating over a string using a for loop and substring. It isn't any better than the existing answers for the common case, but it might be useful if you want to break out of the loop early instead of always splitting the entire string up front, as str_split/scan/substring(x, 1:nchar(x), 1:nchar(x))/regmatches require.
s <- "0123456789"
if (s != "") {
for (i in 1:nchar(s)) {
print(substring(s, i, i))
}
}
The if is needed to avoid looping backwards from 1 to 0, inclusive of both ends.
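A possible variation uses seq_len(nchar(s)), which is empty for an empty string, so the if guard is not needed; it also shows the early exit with break. The stop character "5" and the vector name seen are arbitrary choices for illustration.

```r
s <- "0123456789"
seen <- character(0)
# seq_len(nchar("")) is integer(0), so an empty s simply skips the loop
for (i in seq_len(nchar(s))) {
  ch <- substring(s, i, i)
  if (ch == "5") break  # arbitrary stop condition, to show leaving early
  seen <- c(seen, ch)
}
print(seen)
```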
Your question is not 100% clear as to the desired outcome (print each character individually from a string, or store each number in a way that the given print loop will result in each number being produced on its own line).
To store numberstring such that it prints using the loop you included:
numberstring<-c(0,1,2,3,4,5,6,7,8,9)
for(number in numberstring){print(number);}
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
I want to remove some information from the row names that are created automatically when split and cut2 are used. See the following code:
#Mock data
date_time <- as.factor(c('8/24/07 17:30','8/24/07 18:00','8/24/07 18:30',
'8/24/07 19:00','8/24/07 19:30','8/24/07 20:00',
'8/24/07 20:30','8/24/07 21:00','8/24/07 21:30',
'8/24/07 22:00','8/24/07 22:30','8/24/07 23:00',
'8/24/07 23:30','8/25/07 00:00','8/25/07 00:30'))
U. <- as.numeric(c('0.2355','0.2602','0.2039','0.2571','0.1419','0.0778','0.3557',
'0.3065','0.1559','0.0943','0.1519','0.1498','0.1574','0.1929'
,'0.1407'))
#Mock data frame
test_data <- data.frame(date_time,U.)
#To use cut2
library(Hmisc)
#Splitting the data into categories
sub_data <- split(test_data,cut2(test_data$U.,c(0,0.1,0.2)))
new_data <- do.call("rbind",sub_data)
test_data <- new_data
You will see that the row names of test_data now have values such as "[0.000,0.100).6", "[0.000,0.100).10", etc.
How do I remove "[0.000,0.100)" and keep the number after the "." such as 6 and 10 so that I can reference these rows by their original row number later?
Is there a better way to do this?
You could also set the names of sub_data to NULL.
names(sub_data) <- NULL
test_data <- do.call('rbind', sub_data)
row.names(test_data)
#[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"
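If the goal is only to reference rows by their original position later, another option is to store that position as an ordinary column before splitting, so no row-name parsing is needed. This sketch uses a hypothetical column name orig_row, four of the U. values from the question, and base R cut() in place of Hmisc's cut2 so that it runs without extra packages.

```r
test_data <- data.frame(U. = c(0.2355, 0.2602, 0.0778, 0.1559))
# Hypothetical column: remember each row's position before splitting
test_data$orig_row <- seq_len(nrow(test_data))

# cut() with right = FALSE gives [0,0.1), [0.1,0.2), [0.2,Inf) bins,
# standing in for cut2(test_data$U., c(0, 0.1, 0.2))
groups <- cut(test_data$U., c(0, 0.1, 0.2, Inf), right = FALSE)
sub_data <- split(test_data, groups)
names(sub_data) <- NULL
new_data <- do.call("rbind", sub_data)
new_data$orig_row
```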
You could use a Regular Expression (Regex), as follows:
rownames(test_data) = gsub(".*[]\\)]\\.", "", rownames(test_data))
It's cryptic if you're not familiar with Regular Expressions, but it basically says: match any sequence of characters (.*) followed by either a closing square bracket or a closing parenthesis ([]\\)]) and then a period (\\.), and remove all of it.
The double backslashes are "escapes" indicating that the character following the double-backslash should be interpreted literally, rather than in its special Regex meaning (e.g., . means "match any single character", but \\. means "this is really just a period").
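To see what the pattern removes, it can be tried on a few row names of the same form. The first two come from the question; the third is an assumed example following the same format.

```r
rn <- c("[0.000,0.100).6", "[0.000,0.100).10", "[0.100,0.200).5")
# Strip everything up to and including the "]" or ")" that precedes the final period
gsub(".*[]\\)]\\.", "", rn)
```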
Just for fun, you can also use regmatches
> Names <- rownames(test_data)
> ( rownames(test_data) <- regmatches(Names, regexpr("[0-9]+$", Names)) )
[1] "6" "10" "5" "9" "11" "12" "13" "14" "15" "1" "2" "3" "4" "7" "8"