I have a huge data set. The columns contain values like A, B, C, D, E, F, G, H, and I need to replace them with 1, 2, 3, 4, ...
[1] "C" "C" "C" "C" "C" "A" "H" "G" "G" "G" "G" "G" "G" "G" "C" "C" "C" "C" "C"
[20] "C" "B" "B" "B" "H" "H" "H" "H" "H" "H" "G" "C" "A" "A" "A" "A" "A" "A" "A"
[30]----
A similar problem: one column contains more than 1000 different values, and I need to replace each of them with a unique number.
Try replace. Its second argument is an index (for example a logical mask), not the value to look for, so in your case, e.g.:
replace(df, df == "A", 1)
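With many distinct values, replacing them one at a time gets tedious. A minimal sketch, assuming the goal is one consistent integer code per distinct value (the vector x below is made-up example data, not the asker's): match() looks each value up in a reference vector, and as.integer(factor()) numbers the distinct values in sorted order.

```r
# Made-up example data standing in for one column of the data set
x <- c("C", "C", "A", "H", "G", "B")

# Position of each value in the built-in LETTERS vector: A -> 1, B -> 2, ...
codes_by_letter <- match(x, LETTERS)
print(codes_by_letter)

# For arbitrary values (e.g. more than 1000 distinct strings), number the
# sorted distinct values instead
codes_by_level <- as.integer(factor(x))
print(codes_by_level)
```

The second form does not need a predefined lookup vector, which may suit the column with more than 1000 values; the codes then depend on the sort order of the values present.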
I'm trying to make fasta files for each variation of a gene using a CSV file extracted from gnomAD. In this function, x is a list with coordinates for each variation, y is a fasta file opened using the read.fasta function from the seqinr library, and data is the file I downloaded from gnomAD. I'm having trouble with the last if statement, which is supposed to handle SNVs. For some reason, instead of inserting the nucleotide at the position specified, the value is concatenated at the end of the fasta file.
I've read the documentation for the library but haven't found anything about the internal representation for the fasta files.
Example of output:
"t" "t" "g" "c" "t" "c" "a" "c" "a" "g" "t" "g" "t" "t" "t" "g"
"a" "g" "c" "a" "g" "t" "g" "c" "t" "g" "a" "g" "c" "a" "c" "a" "a" "a" "g" "c"
"a" "g" "a" "c" "a" "c" "t" "c" "a" "a" "t" "a" "a" "a" "t" "g" "c" "t" "a" "g"
9
"a" "t" "t" "t" "a" "c" "a" "c" "a" "c" "t" "c" "C"
The C labelled 9 should be in the ninth position of the sequence.
files<-function(x,y,data){
  test<-str_detect(data[ ,"Consequence"],"[del]")
  names<-paste(data[ ,"Chromosome"],data[ ,"Position"],data[ ,"Reference"],data[ ,"Alternate"],"ACE2",sep="-")
  for (j in 1:length(x)){
    copy<-y
    if(length(x[[j]])!=1 && test[j]==TRUE){
      for(i in x[[j]][1]:x[[j]][2]){
        copy[[1]][i]<-NA
      }
      copy<-copy[[1]][!is.na(copy[[1]])]
    }
    if(length(x[[j]])==1 && test[j]==TRUE){
      copy[[1]][x[[j]][1]]<-NA
      copy<-copy[[1]][!is.na(copy[[1]])]
    }
    if(test[j]==FALSE){
      n<-x[[j]][1]
      copy[[1]][n]<-complementary(data[j,"Alternate"])
      print(copy[[1]][n])
    }
    putz<-paste(names[j],"fasta",sep=".")
    write.fasta(copy,names[j],putz)
  }
}
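One possible cause, judging from the 9 printed above the final C in the example output: R prints element names that way, which suggests the position reached the assignment as the character string "9" rather than the number 9. A character index selects by name, and assigning to a name that does not exist appends a new element. A minimal sketch with a made-up sequence (that x really holds character coordinates is an assumption):

```r
# Made-up short sequence; the real one would come from read.fasta
s <- c("a", "t", "g", "c", "a", "t", "g", "c", "a", "t")

# Character index: no element is named "9", so a new one is appended
by_name <- s
by_name["9"] <- "C"

# Numeric index: position 9 is overwritten in place
by_position <- s
by_position[as.integer("9")] <- "C"

print(length(by_name))      # one element longer than s
print(by_position[9])
```

If that is what happens here, converting before indexing, e.g. n <- as.integer(x[[j]][1]), would keep the substitution in place. Separately, "[del]" as a regular expression is a character class that matches any single d, e, or l, so it may flag more rows than intended.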
Suppose I have a vector in R:
x<-c("a", "b", "c;d", "e", "f;g;h;i;j")
My question is how to expand x by the separator ";", so that the desired output would be:
x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
With strsplit:
unlist(strsplit(x, split = ";"))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
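Note that strsplit treats split as a regular expression by default. That makes no difference for ";", but a separator such as "." or "|" needs escaping or fixed = TRUE. A small sketch with a made-up vector y:

```r
# Made-up vector using "." as the separator
y <- c("a", "b.c", "d.e.f")

# fixed = TRUE matches the separator literally instead of as a regex
parts <- unlist(strsplit(y, split = ".", fixed = TRUE))
print(parts)
```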
Is there an efficient way of programming to solve the following task?
Imagine the following vector:
A<-[a,b,c...k]
I would like to spread it the following way:
Let's start with e.g. n=2
B<-[a,a,b,b,c...,k,k]
And now n=4 or any number greater than 1
C<-[a,a,a,a,b,...,k,k,k,k]
Solving it with loops seems fairly easy, but is there a function or vectorized operation I missed or could use? A tidyverse solution (for use in a pipe) would be best for me.
(It is hard to research this task, as I am a newbie in R and don't know the correct terms to search for. Any help would be appreciated.)
Let
A <- letters[1:11]
A
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
If you use the function rep with the argument each, you get what you want:
rep(A, each=2)
[1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j"
[20] "j" "k" "k"
rep(A, each=3)
[1] "a" "a" "a" "b" "b" "b" "c" "c" "c" "d" "d" "d" "e" "e" "e" "f" "f" "f" "g"
[20] "g" "g" "h" "h" "h" "i" "i" "i" "j" "j" "j" "k" "k" "k"
One option is to use rep with the argument times = 2 (or 4) and then sort the result; this only gives the desired order when A is already sorted. Another option is to use mapply and then flatten the result with c.
c(mapply(rep, A, 2)) # OR sort(rep(A, times = 2))
#[1] "a" "a" "b" "b" "c" "c" "d" "d" "e" "e" "f" "f" "g" "g" "h" "h" "i" "i" "j" "j"
#[21] "k" "k"
c(mapply(rep, A, 4)) # OR sort(rep(A, times = 4))
#[1] "a" "a" "a" "a" "b" "b" "b" "b" "c" "c" "c" "c" "d" "d" "d" "d" "e" "e" "e" "e"
#[21] "f" "f" "f" "f" "g" "g" "g" "g" "h" "h" "h" "h" "i" "i" "i" "i" "j" "j" "j" "j"
#[41] "k" "k" "k" "k"
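To illustrate the difference between the two approaches, here is a small sketch with a made-up vector B that is deliberately not in alphabetical order. It also shows rep in a pipe, as the question asked; the native pipe |> assumes R 4.1 or later.

```r
# Made-up vector that is deliberately not in alphabetical order
B <- c("k", "a", "c")

# each = keeps the original order of the elements
kept_order <- rep(B, each = 2)

# times = followed by sort() reorders the elements alphabetically
resorted <- sort(rep(B, times = 2))

# rep() also works in a pipe (native pipe, assumes R >= 4.1)
piped <- B |> rep(each = 2)

print(kept_order)
print(resorted)
```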
I am trying to convert my_dnabin1, a DNAbin file of 55 samples, to fasta format. I am using the following code to convert it into a fasta file.
dnabin_to_fasta <- lapply(my_dnabin1, function(x) as.character(x[1:length(x)]))
This generates a list of 55 samples which looks like:
$SS.11.01
[1] "t" "t" "a" "c" "c" "t" "a" "a" "a" "a" "a" "g" "c" "c" "g" "c" "t" "t" "c" "c" "c" "t" "c" "c" "a" "a"
[27] "c" "c" "c" "t" "a" "g" "a" "a" "g" "c" "a" "a" "a" "c" "c" "t" "t" "t" "c" "a" "a" "c" "c" "c" "c" "a"
$SS.11.02
[1] "t" "t" "a" "c" "c" "t" "a" "a" "a" "a" "a" "g" "c" "c" "g" "c" "t" "t" "c" "c" "c" "t" "c" "c" "a" "a"
[27] "c" "c" "c" "t" "a" "g" "a" "a" "g" "c" "a" "a" "a" "c" "c" "t" "t" "t" "c" "a" "a" "c" "c" "c" "c" "a"
and so on...
However, I want a fasta formatted file as the output that may look something like:
>SS.11.01 ttacctga
>SS.11.02 ttacctga
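You can collapse each character vector into a single string, for example on the dnabin_to_fasta list from above:
lapply(dnabin_to_fasta, function(x) paste0(x, collapse = ''))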
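To get from the collapsed strings to a file, the names and sequences still have to be written out. A minimal base-R sketch, assuming a named list of character vectors like the one shown above (seqs and out_file are made-up names for illustration). It writes the conventional FASTA layout with the sequence on the line after the ">" header, rather than on the same line as in the question's example.

```r
# Made-up stand-in for the list of character vectors shown above
seqs <- list(SS.11.01 = c("t", "t", "a", "c"),
             SS.11.02 = c("t", "t", "a", "g"))

# Collapse each vector into one string per sample
collapsed <- vapply(seqs, paste0, character(1), collapse = "")

# Interleave ">name" header lines with the sequence lines
fasta_lines <- c(rbind(paste0(">", names(collapsed)), collapsed))

# Write to a temporary file (path chosen only for illustration)
out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
print(fasta_lines)
```

The ape package also provides write.FASTA for DNAbin objects, which may be simpler if the data is still in that form.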
I have searched for this but in vain.
The problem is that I have two lists. The first contains the elements to be repeated, for example:
my.list<-list(c('a','b','c','d'), c('g','h'))
The second list gives the number of times each element is to be repeated:
repeat.list<-list(c(5,7,6,1), c(2,3))
I would like to create a new list in which each element in my.list is repeated based on repeat.list
i.e.
result:
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
Thank you in advance for your help
Use mapply:
mapply(rep, my.list, repeat.list)
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
lapply also does the trick, but is more verbose:
lapply(seq_along(my.list), function(i)rep(my.list[[i]], repeat.list[[i]]))
[[1]]
[1] "a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c" "c" "d"
[[2]]
[1] "g" "g" "h" "h" "h"
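One caveat: mapply simplifies its result when all pieces have the same length, so it returns a list here only because the two results have different lengths (19 and 5). A sketch with made-up inputs (items and counts are illustrative names) showing the difference, and how SIMPLIFY = FALSE or Map always returns a list:

```r
# Made-up inputs whose repeated results both have length 3
items  <- list(c("a", "b"), c("g", "h"))
counts <- list(c(1, 2), c(2, 1))

# Equal-length results are simplified into a character matrix
simplified <- mapply(rep, items, counts)

# SIMPLIFY = FALSE, or the Map wrapper, keeps the list structure
as_list <- mapply(rep, items, counts, SIMPLIFY = FALSE)
via_map <- Map(rep, items, counts)

print(is.matrix(simplified))
print(as_list)
```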