R: Replicating the row names in a data.frame

I have a data.frame with dimensions [6587, 37], and the row names must repeat after every 18 rows. How can I do this in RStudio?

If your 18 row names are:
mynames <- c("a", "b", "c", "d", "e", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s")
You can get what you want with:
paste0(rep(mynames,length.out=6587),rep(1:366,each=18,length.out=6587))
The numeric suffix keeps the names unique; you can paste something other than the block number if you prefer a different suffix.

Row names in data.frames have to be unique.
> df <- data.frame(x = 1:2)
> rownames(df) <- c("a", "a")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘a’
You could use make.names to make the names unique, but still carry some repeating information.
> make.names(c("a","a"), unique = TRUE)
[1] "a" "a.1"
Rows that share a base name could then be identified with help from grep.
Or you could add a column to df, or keep a second data.frame, that holds the repeating information.
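As a rough sketch of the column-based idea, the repeating label can live in an ordinary column while the row names stay unique. The toy sizes below (3 labels, 7 rows) and the column names label and block are made up for illustration; they are not from the question.

```r
# Toy stand-in for the real data: 7 rows, labels repeating every 3 rows.
mynames <- c("a", "b", "c")
df <- data.frame(x = 1:7)

# Repeating label, recycled to the number of rows (duplicates are fine in a column).
df$label <- rep(mynames, length.out = nrow(df))

# Block counter: 1 for the first 3 rows, 2 for the next 3, and so on.
n_blocks <- ceiling(nrow(df) / length(mynames))
df$block <- rep(seq_len(n_blocks), each = length(mynames), length.out = nrow(df))

# Unique row names built from label + block.
rownames(df) <- paste0(df$label, df$block)
rownames(df)
# [1] "a1" "b1" "c1" "a2" "b2" "c2" "a3"
```

Filtering on df$label == "a" then gives every repetition of that label without any pattern matching on the row names.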

Related

Can I use a vector as a regex pattern parameter in R?

I want to search a phonetic dictionary (tsv with two columns, one for words, another for phonetic transcription: IPA) for certain consonant clusters according to the type combination (e.g. fricative+plosive, plosive+fricative, plosive+liquid, etc.). I created a vector concatenating the corresponding phonemes:
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")
The point of writing these vectors in the first place is to have a shorthand for quickly referencing each consonant type when writing different regexes. I want to search for all two-consonant combinations of these two types (FP, PF, PP, FF). How can I write a regex in R using these vectors as pattern parameters?
I know crossing(fricatives, plosives) gives me all the combinations, but I get an error when using it in: CC.all <- str_extract_all(ruphondict$IPA, crossing(fricatives, plosives), simplify = T)
A base R way to form a regex.
paste(
apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""),
collapse = "|"
)
Note that this is in fact a one-liner.
paste(apply(expand.grid(plosives, fricatives), 1, paste0, collapse = ""),collapse = "|")
You need to make a |-delimited string to use as a regular expression:
library(dplyr) # mutate(), pull()
library(tidyr) # crossing()
plosives <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")
my_regex <- (crossing(plosives, fricatives)
|> mutate(comb = paste0(plosives, fricatives))
|> pull(comb)
|> paste(collapse = "|")
)
my_regex
[1] "bf|bs|bʂ|bv|bx|bz|bʐ|df|ds|dʂ|dv|dx|dz|dʐ|gf|gs|gʂ|gv|gx|gz|gʐ|kf|ks|kʂ|kv|kx|kz|kʐ|pf|ps|pʂ|pv|px|pz|pʐ|tf|ts|tʂ|tv|tx|tz|tʐ"
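Both answers above build the plosive+fricative alternation only. Since every phoneme in the two vectors is a single character, one possible shortcut for the other combinations the question mentions (FP, PP, FF, or any of the four) is to turn each vector into a regex character class. This is a minimal base R sketch; the helper char_class and the sample words are assumptions made for illustration, and the approach would not work for multi-character phonemes.

```r
plosives   <- c("p", "b", "t", "d", "k", "g")
fricatives <- c("f", "v", "s", "z", "ʂ", "ʐ", "x")

# Hypothetical helper: collapse single-character phonemes into a class like "[pbtdkg]".
char_class <- function(x) paste0("[", paste(x, collapse = ""), "]")

pl <- char_class(plosives)
fr <- char_class(fricatives)

pf_regex <- paste0(pl, fr)   # plosive followed by fricative
fp_regex <- paste0(fr, pl)   # fricative followed by plosive
any_cc   <- paste0(char_class(c(plosives, fricatives)), "{2}")  # FP, PF, PP or FF

# Made-up sample words, not taken from the dictionary in the question.
words <- c("stop", "apt", "mama")
unlist(regmatches(words, gregexpr(any_cc, words)))
# [1] "st" "pt"
```

The same pattern strings can be passed to stringr::str_extract_all() in place of the crossing() call that raised the error.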

Removing list items based on presence of a sub-list

I have a list and I would like to remove any list object with a sublist. In the example below, I would like to remove ob2 and ob5 and keep all other objects.
dat <- list(ob1 = c("a", "b", "c"),
ob2 = list(a = c("d")),
ob3 = c("e", "f", "g"),
ob4 = c("h", "i", "j"),
ob5 = list(1:3))
Can anyone offer a solution of how to do this?
We can create a condition with sapply (from base R)
dat[!sapply(dat, is.list)]
Or with Filter from base R
Filter(Negate(is.list), dat)
Or with discard from purrr
library(purrr)
discard(dat, is.list)
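For reference, here is a small base R check of what survives the filter, using the list from the question. The vapply variant is a swapped-in alternative to sapply that always returns a logical vector, even for an empty list; it is a sketch, not part of the original answer.

```r
dat <- list(ob1 = c("a", "b", "c"),
            ob2 = list(a = c("d")),
            ob3 = c("e", "f", "g"),
            ob4 = c("h", "i", "j"),
            ob5 = list(1:3))

# TRUE for elements that are themselves lists (ob2 and ob5).
has_sublist <- vapply(dat, is.list, logical(1))

flat <- dat[!has_sublist]
names(flat)
# [1] "ob1" "ob3" "ob4"
```

Note that is.list() is also TRUE for data frames, so a data.frame element would be dropped by any of these approaches.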

search for next closest element not in a list

I am trying to replace 2 letters (repeats) in a vector of the 26 letters of the alphabet.
I already have 13 of the 26 letters in my table (keys), so the replacement letters should not be among those 13 'keys'.
I am trying to write code that replaces C and S with the next letter in the alphabet that is not part of 'keys'.
The following code replaces the repeat C with D and S with T, but both of those letters are in my 'keys'. Does anyone know how I can add a condition so that the loop keeps looking when the replacement letter is already present in 'keys'?
alphabets <- toupper(letters)
keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
index_of_repeat_in_26 <- which(alphabets %in% repeats)
# index_of_repeat_in_26 is 3, 19
available_keys <- setdiff(alphabets, keys)
# available_keys is "B" "F" "G" "K" "M" "N" "O" "Q" "U" "V" "W" "Y" "Z"
index_available_keys <- which(alphabets %in% available_keys)
# 2 6 7 11 13 14 15 17 21 22 23 25 26
char_to_replace_in_key <- character(length(repeats))
for (i in seq_along(repeats)){
for(j in 1:(26-sort(index_of_repeat_in_26)[1])){
if(index_of_repeat_in_26[i]+j %in% index_available_keys){
char_to_replace_in_key[i] <- alphabets[index_of_repeat_in_26[i]+1]
}
else{
cat("\n keys not available to replace \n")
}
}
}
keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
y = sort(setdiff(LETTERS, keys)) # get the letters not present in 'keys'
y = factor(y, levels = LETTERS) # make them factor so that we can do numeric comparisons with the levels
y1 = as.numeric(y) # keep them numeric to compare
z = factor(repeats, levels = LETTERS)
z1 = as.numeric(z)
func <- function(x) { # in each iteration, one index into 'repeats' (here 1 or 2) gets passed
xx = y1 - z1[x] # difference between every 'non-key' and the current 'repeat' element
xx = which(xx>0)[1] # choose the smallest positive difference; 'y1' is already sorted, so the first match is the nearest following non-key
r = y[xx] # extract the corresponding 'non-key' element
y <<- y[-xx] # after getting the closest letter, remove it from the global vector so that it doesn't get picked the next time
y1 <<- y1[-xx] # similarly remove it from the equivalent numeric vector
r # return the extracted 'closest non-key' character
}
# sapply is itself a loop in which a single element gets passed to func at a time.
# Here 'seq_along' is used to pass the index, i.e. 1 for 'C', 2 for 'S', etc.
ans = sapply(seq_along(repeats), func)
if (any(is.na(ans))){
cat("\n",paste0("keys not available to replace for ",
paste0(repeats[which(is.na(ans))], collapse = ",")) ,
"\n")
ans <- ans[!is.na(ans)]
}
# example 2 with :
repeats <- c("Y", "Z")
# output :
# keys not available to replace for Z
# ans
# [1] Z
Note: to understand how each iteration of sapply() works, run debug(func) and then run the sapply() call. You can then check in the console how the variables xx and r are evaluated. Hope this helps!
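The same idea can also be sketched without factors or global assignment (<<-) by keeping the pool of free letters inside one function. The name next_free_letters and its arguments are hypothetical, chosen for illustration; this is one possible variant, not the answer's own code.

```r
# For each element of 'repeats', pick the nearest later letter that is not in 'keys'
# and has not already been handed out. Returns NA where no such letter exists.
next_free_letters <- function(repeats, keys, alphabet = LETTERS) {
  free <- setdiff(alphabet, keys)            # unused letters, still in alphabet order
  out <- character(length(repeats))
  for (i in seq_along(repeats)) {
    pos <- match(repeats[i], alphabet)       # position of the letter to replace
    candidates <- free[match(free, alphabet) > pos]
    out[i] <- if (length(candidates) > 0) candidates[1] else NA_character_
    free <- setdiff(free, out[i])            # do not reuse the chosen letter
  }
  out
}

keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
next_free_letters(c("C", "S"), keys)
# [1] "F" "U"
next_free_letters(c("Y", "Z"), keys)
# [1] "Z" NA
```

Because the pool is local to the function, the second call starts from the full set of free letters again, which the <<- version does not do unless y and y1 are rebuilt first.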

R - combining columns by specific conditions

I currently have a data frame as follows:
groups <- data.frame(name=paste("person",c(1:27),sep=""),
assignment1 = c("F","A","B","H", "A", "E", "D", "G", "I", "I", "E", "A", "D", "C", "F", "C", "D", "H", "F", "H", "G", "I", "G", "C", "B", "E", "B"),
assignment2 = c("H", "F", "F", "D", "E", "G", "A", "E", "I", "C", "A", "H", "G", "B", "I", "C", "E", "I", "C", "A", "B", "B", "G", "D", "H", "F", "D"),stringsAsFactors = FALSE)
I would like to create a list for each person that contains only the people they have already worked with. For example, person1 is in group F and group H for the 1st and 2nd assignments respectively, and
The members of group F on the 1st assignment are {"person1","person15", "person19"}.
The members of group H on the 2nd assignment are {"person1","person12", "person25"}.
I would like to create a vector for person1 like
{"person15", "person19", "person12", "person25"}.
Does anyone know a convenient way to do this in R?
Any help will be appreciated. Thanks in advance.
You could do this:
teammates <- lapply(1:nrow(groups), function(i) {
assig1 <- subset(groups, assignment1 == groups$assignment1[i])$name
assig2 <- subset(groups, assignment2 == groups$assignment2[i])$name
unq_set <- unique(c(assig1, assig2))
return(setdiff(unq_set, groups$name[i]))
})
This takes a vector of row indices and, for each one, applies a function that a) gets the names of everyone whose assignment 1 or assignment 2 group matches the given row, b) takes the unique union of these, and c) returns that, minus the name of the person around whom the group is built.
The output is a list like this:
[[1]]
[1] "person15" "person19" "person12" "person25"
[[2]]
[1] "person5" "person12" "person3" "person26"
[[3]]
[1] "person25" "person27" "person2" "person26"
...and so on
For more brevity, the following is equivalent (though the order inside list items may be different). It uses the same subsetting logic as @user5219763's answer, but the setdiff part is important because it removes the person themselves.
teammates <- lapply(1:nrow(groups), function(i) {
setdiff(
with(groups, name[assignment1 == assignment1[i] |
assignment2 == assignment2[i] ]),
groups$name[i])
})
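To make the result easier to query, the list can be named by person. The sketch below uses a made-up six-person data frame (toy) rather than the 27-person data from the question, so the output is small enough to check by hand.

```r
# Toy data, invented for illustration: 6 people, two assignments.
toy <- data.frame(name = paste0("person", 1:6),
                  assignment1 = c("A", "A", "B", "B", "C", "C"),
                  assignment2 = c("X", "Y", "X", "Y", "X", "Y"),
                  stringsAsFactors = FALSE)

# Same logic as the concise version: anyone sharing either group, minus the person themselves.
teammates <- lapply(seq_len(nrow(toy)), function(i) {
  with(toy, setdiff(name[assignment1 == assignment1[i] |
                         assignment2 == assignment2[i]],
                    name[i]))
})

# Name each list element after the person it belongs to.
teammates <- setNames(teammates, toy$name)
teammates$person1
# [1] "person2" "person3" "person5"
```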
Here's a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
groups %>%
gather(var, val, -name) %>%
unite(comb, var, val) %>%
left_join(.,., by = 'comb') %>%
group_by(name.x) %>%
summarise(out = list(name.y))
The heavy lifting is done by the left_join. Before that, we combine the columns so that we can merge on, e.g., assignment1_F. Each person's output contains the person themselves and is not corrected for duplicates; that is up to you.
However, as @akrun says, if you are doing a lot of this stuff, use igraph.
You could use is.element() (note that each result still includes the person themselves):
workedWith <- function(index,data=groups){
data[is.element(data[,2],data[index,2]) | is.element(data[,3],data[index,3]),1]
}
lapply(X = seq_len(nrow(groups)),FUN = workedWith)

Collapse vector to string of characters with respective numbers of consecutive occurrences

I would like to collapse a CIGAR vector to a CIGAR string. By CIGAR vector to String I mean the following:
I want a function that converts:
cigar.vector = c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")
to this:
cigar.string = "2M2I1M1I1M2D1M1I1D1M1I"
and vice versa.
Note that there is a "" (empty string) that does not count. Thanks!
rle seems the obvious choice here:
rcv <- rle(cigar.vector[cigar.vector!=""])
paste0(rcv$lengths,rcv$values,collapse="")
#[1] "2M2I1M1I1M2D1M1I1D1M1I"
If you want to get fancy, you could also exploit the fact that rle gives a list of length 2:
paste(do.call(rbind,rle(cigar.vector[cigar.vector!=""])),collapse="")
#[1] "2M2I1M1I1M2D1M1I1D1M1I"
Going backwards exactly is impossible given only the result (assign the string above to result), because the information about the "" entries has been lost. Excluding those entries, you can get close enough with something like:
backwards <- rep(
unlist(strsplit(result,"\\d+"))[-1],
as.numeric(unlist(strsplit(result,"[^0-9]")))
)
identical(cigar.vector[cigar.vector!=""],backwards)
#[1] TRUE
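The two directions can be wrapped into a pair of helper functions. This is a minimal sketch: the names cigar_collapse and cigar_expand are made up here, and the expansion assumes each operation is a single uppercase letter (or "="), which holds for the example but is an assumption about the input.

```r
# Vector -> string: run-length encode, ignoring "" entries.
cigar_collapse <- function(v) {
  r <- rle(v[v != ""])
  paste0(r$lengths, r$values, collapse = "")
}

# String -> vector: pull out the counts and the operation letters, then repeat.
cigar_expand <- function(s) {
  counts <- as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])
  ops    <- regmatches(s, gregexpr("[A-Z=]", s))[[1]]
  rep(ops, counts)
}

cigar.vector <- c("M", "M", "I", "I", "M", "I", "", "M", "D", "D", "M", "I", "D", "M", "I")
cigar.string <- cigar_collapse(cigar.vector)
cigar.string
# [1] "2M2I1M1I1M2D1M1I1D1M1I"
identical(cigar_expand(cigar.string), cigar.vector[cigar.vector != ""])
# [1] TRUE
```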
