Modifying the suffix of a specific range of columns in R

I have a data frame with N columns (30 in this case) and would like to add a suffix only to columns 15 to 30.
t0 is the data frame with the 30 columns.
For all columns 1 to 30, the following command works perfectly:
t0<-data.frame(a=c(1),b=c(1),c=c(1),d=c(1),e=c(1),f=c(1),g=c(1),h=c(1))
colnames(t0) <- paste( colnames(t0), "Sub",sep = "_")
names(t0)
[1] "a_Sub" "b_Sub" "c_Sub" "d_Sub" "e_Sub"
[6] "f_Sub" "g_Sub" "h_Sub" "i_Sub" "ii_Sub"
[15] "j_Sub" "k_Sub" "l_Sub" "m_Sub" "n_Sub"
Desired output:
names(t0)
[1] "a" "b" "c" "d" "e"
[6] "f" "g" "h" "i" "ii"
[15] "j_Sub" "k_Sub" "l_Sub" "m_Sub" "n_Sub"
Any idea how to get this done in R?
Thanks,
Albit

It didn't work because the data frame was subset first and the column names were then taken from that subset. Instead, take the column names of the entire data frame and subset them with a numeric index:
colnames(t0)[15:30] <- paste(colnames(t0)[15:30], "Sub", sep="_")
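To see the indexed assignment in action, here is a minimal sketch on a made-up 30-column data frame. The data frame and its default column names V1 to V30 are assumptions for illustration, not the asker's data.

```r
# Hypothetical 30-column data frame; as.data.frame() names the columns V1..V30
t0 <- as.data.frame(matrix(1, nrow = 1, ncol = 30))

# Index the name vector itself, so only positions 15 to 30 are overwritten
idx <- 15:30
colnames(t0)[idx] <- paste(colnames(t0)[idx], "Sub", sep = "_")

colnames(t0)[c(1, 14, 15, 30)]
# [1] "V1"      "V14"     "V15_Sub" "V30_Sub"
```

Because the assignment targets `colnames(t0)[idx]`, the names outside the range are left untouched.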

Related

Return the position of a specific element of a vector based on its name [duplicate]

I need to return the position of an element in a vector based on the element's name. Let's say I have a vector of letters:
myLetters=letters[1:26]
> myLetters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
and what I intend to do is to create or find a function that returns the position of the element when called, for example:
myFunction(myLetters["b"])
[1] 2
myFunction(myLetters["z"])
[1] 26
In summary, I need a way to refer to Excel columns by writing the letters of a column (A, B, C, later maybe even AA or further) and get the column number.
If you want to refer to Excel column names, you could create a reference vector with all possible one-, two-, and three-letter column names:
eg1 <- expand.grid(LETTERS, LETTERS)
eg2 <- expand.grid(LETTERS, LETTERS, LETTERS)
excelcols <- c(LETTERS, paste0(eg1[[2]], eg1[[1]]), paste0(eg2[[3]], eg2[[2]], eg2[[1]]))
After which you can use which:
> which(excelcols == 'A')
[1] 1
> which(excelcols == 'AB')
[1] 28
> which(excelcols == 'ABC')
[1] 731
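As an alternative to a lookup vector, the column number can also be computed directly by treating the letters as base-26 digits. This is a minimal sketch, not part of the original answer: the function name excel_col_to_num is hypothetical, and it assumes the input contains only the letters A to Z.

```r
# Hypothetical helper: convert an Excel-style column label to its number.
# Assumes the label consists only of letters A-Z (case is ignored).
excel_col_to_num <- function(col) {
  chars  <- strsplit(toupper(col), "")[[1]]  # "AB" -> c("A", "B")
  digits <- match(chars, LETTERS)            # A = 1, ..., Z = 26
  powers <- rev(seq_along(digits)) - 1       # highest power of 26 first
  sum(digits * 26^powers)
}

# For a single lowercase letter, match() alone gives the position
match("b", letters)
# [1] 2

excel_col_to_num("A")
# [1] 1
excel_col_to_num("AB")
# [1] 28
excel_col_to_num("ABC")
# [1] 731
```

This avoids building the full vector of names, at the cost of not validating the input.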
If you need to find the number of times a specific letter occurs, then the following should work:
myLetters = c("a","a", "b")
myFunction = function(myLetters, findLetter){
length(which(myLetters==findLetter))
}
Let's find how many times "a" occurs in myLetters:
myFunction(myLetters, "a")
# [1] 2

How to retain character strings using positional indexing?

What I need to do is very similar to what the function below (tstrsplit from data.table) does:
library(data.table)
x = c("abcde", "ghij", "klmnopq")
tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5), names=c('first','second','third'))
However, I would like to be able to return strings using ranges of values. For example, I would like to specify that in first I want to have the first two letters for each element.
Thus instead of having:
$first
[1] "a" "g" "k"
$second
[1] "c" "i" "m"
$third
[1] "e" NA "o"
The output should look like
$first
[1] "ab" "gh" "kl"
$second
[1] "c" "i" "m"
$third
[1] "e" NA "o"
Background:
I have a large .txt file of records and a lookup table that gives, for each attribute, its start position and its expected maximum width. The .txt file looks like:
James Brown M 01-01-1970
And then in a separate file I have a lookup table that says:
Field Start width
Name 1 7
FamilyN 9 7
Gender 11 1
Incidentally, I would appreciate any feedback on the best way to import this type of large .txt file. I feel like read.table is inappropriate since it tries to reduce to a dataframe format which is not what these files really are.
Something like this maybe:
x = c("abcde", "ghij", "klmnopq")
library(tidyverse)
list(c(1,3,5), c(2,1,1)) %>%
pmap(~ substr(x, .x, .x + .y - 1) %>% replace(., .=="", NA))
[[1]]
[1] "ab" "gh" "kl"
[[2]]
[1] "c" "i" "m"
[[3]]
[1] "e" NA "o"
I've hardcoded the positions. Per @MrFlick's comment, if you have a large number of strings, you'll need some strategy for deciding on the character positions so that you can automate it, rather than hardcoding it.
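One way to avoid hardcoding is to drive the extraction from a lookup table of start positions and widths, as described in the question. The following is a base R sketch under that assumption. The lookup data frame and its column names (field, start, width) are made up for illustration.

```r
x <- c("abcde", "ghij", "klmnopq")

# Hypothetical lookup table: one row per field, with start position and width
lookup <- data.frame(
  field = c("first", "second", "third"),
  start = c(1, 3, 5),
  width = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# Extract each field from every string; empty results become NA
pieces <- Map(function(s, w) {
  out <- substring(x, s, s + w - 1)
  out[out == ""] <- NA
  out
}, lookup$start, lookup$width)
names(pieces) <- lookup$field

pieces$first
# [1] "ab" "gh" "kl"
pieces$third
# [1] "e" NA  "o"
```

For reading a whole fixed-width file, base R also provides read.fwf(), which takes a vector of field widths. Whether it is fast enough for a large file would need to be checked on the actual data.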

Determine all characters present in a vector of strings

Say I have the following dataframe consisting of two vectors containing character strings:
df <- data.frame(
"ID"= c("1a", "1b", "1c", "1d"),
"Codes" = c("BX.MX|GX.WX", "MX.RX|BX.YX", "MX.OX|GX.GX", "MX.OX|YX.OX"),
stringsAsFactors = FALSE)
I'd like a simple way to determine which characters have been used in a given vector. In other words, the output of such a function would reveal:
find.characters(df$Codes) # hypothetical function
[1] "B" "G" "M" "W" "X" "R" "Y" "O" "|" "."
find.characters(df$ID) # hypothetical function
[1] "1" "a" "b" "c" "d"
You can create a custom function to do this. The idea is to split the strings into individual characters with strsplit(v1, ''), which returns a list. We unlist it to get a vector and then take the unique elements. The result is not sorted yet. Based on the example shown, you may want to sort the letters and the other characters separately. So we use grepl to flag the uppercase letters, sort the two subsets separately, and concatenate them with c().
find.characters <- function(v1){
x1 <- unique(unlist(strsplit(v1, '')))
indx <- grepl('[A-Z]', x1)
c(sort(x1[indx]), sort(x1[!indx]))
}
find.characters(df$Codes)
#[1] "B" "G" "M" "O" "R" "W" "X" "Y" "|" "."
find.characters(df$ID)
#[1] "1" "a" "b" "c" "d"
NOTE: Generally, I would use grepl('[A-Za-z]', x1), but I didn't do that because the expected result for the 'ID' column is different.
find.characters <- function(x){
  # c(..., recursive = TRUE) flattens the list returned by strsplit into one vector
  unique(c(strsplit(split = "", x), recursive = TRUE))
}
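To make the split, flatten, and deduplicate steps concrete, here is a short walk-through on the ID values from the question. The intermediate variable names are only for illustration.

```r
ids <- c("1a", "1b", "1c", "1d")      # same values as df$ID in the question

chars_by_string <- strsplit(ids, "")  # list with one character vector per string
all_chars <- unlist(chars_by_string)  # "1" "a" "1" "b" "1" "c" "1" "d"
unique(all_chars)                     # keeps the first occurrence of each character
# [1] "1" "a" "b" "c" "d"
```

Note that unique() preserves the order of first appearance. Any sorting has to be added separately.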

Selecting and matching multiple vectors in a list in R

I have a list of vectors like this:
>list
[[1]]
[1] "a" "m" "l" "s" "t" "o"
[[2]]
[1] "m" "y" "o" "t" "e"
[[3]]
[1] "n" "a" "s"
[[4]]
[1] "b" "u" "z" "u" "l" "a"
[[5]]
[1] "c" "m" "u" "s" "r" "i" "x" "t"
1- First, I want to select the vector in the list with the highest number of elements (in this case the 5th vector, with 8 elements). This is easy.
2- Second, I want to select all vectors in the list with length equal to or immediately lower than the previous one, and intersect them with the previous vector.
Another possibility I have is selecting by the first element. In this case this would be equivalent to selecting the vectors starting with "a" or "b", the first and fourth in the list. What I do not know is how to select multiple vectors in a list knowing their first element.
3- Finally, I want to keep just the intersection with the minimum number of matches.
In this case that is the fourth vector in the list, starting with "b". Then the process starts again for the rest of the vectors, but already taking the 4th and 5th vectors into account when intersecting. In this case that would mean picking the second element and intersecting it with a unique() combination of the 4th and 5th.
I hope I have explained myself! Is there a way to do this in R without 3-4 "for" and "if" loops? In other words, is there a clever way to do it using lapply or similar?
This should do it?
# strsplit() needs a character vector, so build the input with c()
vecs <- strsplit(c("amlsto", "myote", "nas", "buzula", "cmusrixt"), "")
# find the longest vector (step 1 of the question)
lens <- lengths(vecs)
which.max(lens)
# which are the same length as, or 1 shorter than, the previous vector in the list
prev <- c(-1, head(lens, -1))
inds <- which(lens == prev | lens == prev - 1)
# get the intersections with the previous vector
inters <- mapply(intersect, vecs[inds], vecs[inds - 1], SIMPLIFY = FALSE)
# get items whose first element is in the target set
target <- c("a", "b")
isTarget <- sapply(vecs, "[[", 1) %in% target
# index of the intersection with the fewest overlaps
which.min(lengths(inters))
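The selection by first element and the "fewest matches" step from the question can be combined without explicit loops. This is a minimal sketch of one reading of the question, not a complete solution to the iterative part. The variable names are illustrative, and the data is rebuilt from the vectors printed in the question.

```r
# The five vectors from the question, built by splitting strings into characters
vecs <- strsplit(c("amlsto", "myote", "nas", "buzula", "cmusrixt"), "")

# Step 1: the longest vector (the 5th, with 8 elements)
longest <- vecs[[which.max(lengths(vecs))]]

# Select every vector whose first element is "a" or "b"
first_elems <- vapply(vecs, `[`, character(1), 1)
selected <- vecs[first_elems %in% c("a", "b")]

# Intersect each selected vector with the longest one and count the matches
overlaps <- lapply(selected, intersect, longest)
n_common <- lengths(overlaps)
n_common
# [1] 3 1

# Step 3: keep the selected vector with the fewest matches
selected[[which.min(n_common)]]
# [1] "b" "u" "z" "u" "l" "a"
```

Repeating the process for the remaining vectors would still need some form of iteration, for example with Reduce() or a small loop, since each round depends on the previous result.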

How to add a "1" to end of each variable name in R?

I have a dataframe "data" with 50 variables. For analysis purposes, I want to rename all these variables by adding 1 at the end of each variable name. The following is the procedure that I followed (for a dataframe "datasample" of 10 variables):
# original colnames for 10 variables
names(datasample)
[1] "a" "z" "y" "b" "bb" "ca" "a3"
[8] "b2" "as" "ask"
#rename 10 variables
names(datasample)<-c("a1","z1","y1","b1","bb1","ca1","a31","b21","as1","ask1")
I was wondering whether there is an efficient way of renaming these multiple variables. Thanks in advance.
names(datasample) <- paste(names(datasample), "1", sep="")
Or, equivalently,
names(datasample) <- paste0(names(datasample), "1")
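As a quick check, the one-liner can be run on the ten names from the question and compared with the manually typed vector. The data frame contents below are placeholders. Only the column names come from the question.

```r
# Placeholder data frame with the ten column names from the question
datasample <- as.data.frame(matrix(0, nrow = 1, ncol = 10))
names(datasample) <- c("a", "z", "y", "b", "bb", "ca", "a3", "b2", "as", "ask")

# Append "1" to every name in one vectorised call
names(datasample) <- paste0(names(datasample), "1")

# Compare with the names that were typed by hand in the question
manual <- c("a1", "z1", "y1", "b1", "bb1", "ca1", "a31", "b21", "as1", "ask1")
identical(names(datasample), manual)
# [1] TRUE
```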
