Matching the first and last characters in a fasta file

I have FASTA sequences like the following:
fasta_sequences
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_2
"GTRFKJDAIUETZUQOIHHASJKKJHPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"JUZGFNBGTFCKAJDASEJIJAS"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq2_3
"RTZIIASDPLKLKLKLLJHGATRF"
seq3_1
"HMTFLKBNCYXBASHDGWPQWKOP"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I would like to retain only those sequences that start with MTF and end with either KOP, TRF or PAL. At the end it should look like this:
seq1_1
"MTFJKASDKASWQHBFDDFAHJKLDPAL"
seq1_3
"MTFJHAZOQIIREUUBSDFHGTRF"
seq2_1
"MTFHJHJASBBCMASDOEQSDPAL"
seq3_2
"MTFJKASDJLKIOOIEOPWEIOKOP"
I tried the following code in R, but the result it gave me contains nothing:
new_fasta=grep("^MTF.*(PAL|TRF|KOP)$")
Could anyone help me get the desired output? Thanks in advance.

I think this is one way to do it.
Loop over every element of fasta_sequences (assuming fasta_sequences is a vector containing the sequences):
newseq = list()
it=1
for (i in fasta_sequences){
    # i is one sequence string (the sequence of seq1_1, seq1_2 etc.)
    a=substr(i,1,3)                          # first three characters
    if (a=="MTF"){
        x=substr(i,(nchar(i)-2),nchar(i))    # last three characters
        if (x %in% c("PAL","KOP","TRF")){
            newseq[[it]]=i
            it=it+1
        }
    }
}
Hope it helps

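new_fasta=grep("^MTF.*(PAL|TRF|KOP)$",fasta_sequences,perl=TRUE)
Pass fasta_sequences as the second argument and add the perl=TRUE option (R spells the logical constant TRUE, in upper case). Note that grep returns the indices of the matching elements; add value=TRUE if you want the matching sequences themselves.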
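For what it's worth, here is a minimal sketch of the same filter using grepl. It assumes fasta_sequences is a named character vector; the names and the shortened data set below are only for illustration and are not taken from a real FASTA file.

```r
# Hypothetical named character vector standing in for fasta_sequences
fasta_sequences <- c(
  seq1_1 = "MTFJKASDKASWQHBFDDFAHJKLDPAL",
  seq1_2 = "GTRFKJDAIUETZUQOIHHASJKKJHPAL",
  seq1_3 = "MTFJHAZOQIIREUUBSDFHGTRF",
  seq3_1 = "HMTFLKBNCYXBASHDGWPQWKOP",
  seq3_2 = "MTFJKASDJLKIOOIEOPWEIOKOP"
)

# TRUE for sequences that start with MTF and end with PAL, TRF or KOP
keep <- grepl("^MTF.*(PAL|TRF|KOP)$", fasta_sequences)

# Logical subsetting keeps the names of the retained sequences
new_fasta <- fasta_sequences[keep]
print(new_fasta)
```

Subsetting with a logical vector keeps the sequence names attached, which a plain grep with value=TRUE should do as well.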

Related

R function to check each element and its related children elements to add a result to a list

Suppose we are given a data frame a in R. The notation 0--7 means that the entry covers the integer values 0 to 7, i.e. 0,1,2,3,4,5,6,7.
I am interested in writing a function such that,
if a[1,1]>alpha, it goes on to check its children, i.e. 0--7 consists of a[1,2] and a[2,2].
So,
{a[2,1]>alpha
{a[4,1]>alpha
{a[5,1]>alpha
ps=list.append(0)
else ps=list.append(1)
}}}
Here, alpha is a threshold. The list ps is appended to for the values 0 to 15 based on this criterion.
My code is
{for (i in 1:2)
{ if (a[j,i]>alpha)
{if (i%%2==1}
{j=j*2
if (a[j,i]>alpha
###here i want to go recursively i think and where and how should i add append values to the list
if a[j,i+1]>alpha}
if{i%%2==0}
{}
}}
I am stuck and confused at the same time. Any help or advice would be greatly appreciated.
Thanks

Assigning Values within a dynamically named matrix in R

I am struggling with a loop in R where I have to use dynamic variable names (which I am told is a bad idea in other posts about dynamic variable names, but I am pretty sure that I need them because of my file structure). Each folder that the loop enters contains a different number of files.
The dynamic variable names contain matrices and I need to look in each row/column of the matrix and output a new matrix.
Streamlined example:
var 1 is a matrix(0,40,40)
var 2 is a matrix(0,45,45)
var 3 is a matrix(0,40,40)
For (f in 1:(length of var3s)) # the number of files in the folder, in each folder:
For (g in 1: ncol(var1)) {
For (h in 1: nrow(var1)) {
if (var 1[g,h]>4 & var 2[g,h]<1)
{ var3[f] [g,h]<-1} # <- you cannot do this, but this is ultimately what I want
}
}
I want to take the f-th variable matrix from variable 3's list and assign a value to the location at [g,h]
I've done this before with real variable names, but I am struggling with adding the dynamic element. This is what it looks like and the errors I'm getting.
for (f in 1:(length(LD139_040))){
assign(paste0("LD139_040s",f),
matrix(0,nrow(eval(parse(text=paste0("B139_040",f)))),
ncol(eval(parse(text=paste0("B139_040",f)))))) # this effectively creates my new matrix (var3 above) the size I need based on the files above
for (g in 1:(ncol(eval(parse(text=paste0("B139_040",f)))))){
for (h in 1:(nrow(eval(parse(text=paste0("B139_040",f)))))){
if (S139_040[g,h]>10 &
(assign(paste0("LD139_040",f), as.matrix(raster(LD139_040[f]))))[g,h]>.295 &
(assign(paste0("LD139_040",f), as.matrix(raster(LD139_040[f]))))[g,h]<.33 &
(assign(paste0("B139_040",f), as.matrix(raster(Blue139_040[f]))))[g,h]<180)
# this section also works and will give me a t/f at each location [g,h]
# if true, assign the value 1 to the new matrix LD139_040 at f
{assign(paste0("LD139_040s", f)[g,h], 1)}
}
}
}
I have tried a variety of combinations of eval and assign to organize the last statement, and I get errors such as 'invalid first assignment', incorrect number of dimensions, and target of assignment expands to non-language object.
Thanks for your help!
R version 3.1.1 "Sock it to Me" with library(raster)

This did not require dynamic variable names. At each iteration of the loop, all of the names change at the same time.
For example, this is how I solved the section in code block 2:
for (f in 1:(length(LD139_040))){
    # read the f-th files as matrices, as in the question
    currenttile <- as.matrix(raster(LD139_040[f]))
    Blue <- as.matrix(raster(Blue139_040[f]))
    newmatrix <- matrix(0,nrow(Blue),ncol(Blue))
    for (g in 1:nrow(Blue)){        # g indexes rows
        for (h in 1:ncol(Blue)){    # h indexes columns
            if (S139_040[g,h]>10 & currenttile[g,h]>.295 & currenttile[g,h]<.33 & Blue[g,h]<180)
            {newmatrix[g,h]<-1}
        }
    }
}
Even more simply, since I learned that as long as the matrices have the same dimensions, you do not have to loop through each location:
for (f in 1:(length(LD139_040))){
    # read the f-th files as matrices, as in the question
    currenttile <- as.matrix(raster(LD139_040[f]))
    Blue <- as.matrix(raster(Blue139_040[f]))
    newmatrix <- matrix(0,nrow(Blue),ncol(Blue))
    currenttile[currenttile >.295 & currenttile <.33] <- 1
    Blue[Blue<180] <- 1
    newmatrix[Blue==1 & currenttile==1] <- 1
}
So thanks to everyone who tried to decipher this. It was a confusing problem for me, so it took a while to figure out how best to approach it (and obviously how to explain it). I hope this helps someone!
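As a rough illustration of the vectorised idea, the sketch below combines all of the conditions into a single logical mask instead of overwriting the input matrices. The small matrices S, tile and blue are made-up stand-ins for S139_040 and the raster-derived tiles; only the thresholds come from the question.

```r
# Hypothetical 2x2 matrices (filled column by column) standing in for the real tiles
S    <- matrix(c(12, 5, 20, 11), nrow = 2)
tile <- matrix(c(0.30, 0.31, 0.50, 0.32), nrow = 2)
blue <- matrix(c(100, 100, 100, 200), nrow = 2)

# Element-wise comparison yields a logical matrix of the same shape
mask <- S > 10 & tile > 0.295 & tile < 0.33 & blue < 180

# Start from zeros and set the cells where every condition holds
newmatrix <- matrix(0, nrow(blue), ncol(blue))
newmatrix[mask] <- 1
print(newmatrix)
```

Because the inputs are left untouched, a cell whose original value happens to equal 1 cannot be mistaken for a match. Inside the loop over f, each newmatrix could be stored in a list (for example results[[f]] <- newmatrix) so that it is not overwritten on the next iteration.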

How to condense a file: uniq occurrences and sum another field

I have a very large file that looks something like this:
1,22,A
2,10,A
3,4,B
4,3,B
5,20,B
The second column tells me how many instances of the third column there are. So I want to collapse the third column (so that it is effectively uniqued), but add up the second column values. Desired output would be something like:
32,A
27,B
I can come up with some rather complicated ways to do this, but it seems like it ought to be rather simple...

I'm not sure what kind of "math" answer you would expect...
Given you have a file input.txt with the following content:
1,22,A
2,10,A
3,4,B
4,3,B
5,20,B
Create a new file with the following script in Ruby, put in the same directory as your input.txt, and run ruby script.rb from the console:
File.open('output.txt', 'w+') do |file|
  result = {}
  File.readlines("input.txt").each do |line|
    # chomp removes the trailing newline so it does not end up in the letter
    values = line.chomp.split(',')
    letter = values[2]
    letter_value = values[1].to_i
    result[letter] ||= 0
    result[letter] += letter_value
  end
  result.each do |letter, value|
    # one "sum,letter" line per distinct letter, e.g. 32,A
    file.puts [value, letter].join(',')
  end
end
Then, look for your result in output.txt in the same directory.
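If R is an option (it is the language used in the other threads on this page), the same grouping can be sketched with tapply. This is only an illustration: the column names id, count and key are invented, and the sample data is read from an inline string rather than from input.txt.

```r
# Read the sample data; the column names are hypothetical labels
input <- read.csv(
  text = "1,22,A\n2,10,A\n3,4,B\n4,3,B\n5,20,B",
  header = FALSE,
  col.names = c("id", "count", "key")
)

# Sum the second column within each distinct value of the third column
totals <- tapply(input$count, input$key, sum)

# Build "sum,key" lines in the requested output format
out_lines <- paste(totals, names(totals), sep = ",")
print(out_lines)
```

For a real file, read.csv("input.txt", header = FALSE, ...) and writeLines(out_lines, "output.txt") would take the place of the inline text and the print call.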

R - create iterable list/dataframe from unique()

I'd like to get the unique elements from a column. That seems straightforward. Both of these work, but I'm not getting the object type I'd like:
userlist <- as.list(somebigdf$username)
userlist <- unique(userlist)
or
userlist <- unique(somebigdf$username)
When I iterate through, I'm not getting the names:
for(i in 1:length(userlist)){
cat(names(userlist[i]), '\n')
}
Returns blank spaces.
for(i in userlist){
cat(i, '\n')
}
Returns integers.
The above function is just an example. I'll be using that but also matching the returned name in an if-else function.
The object types seem to be integers or an extended data.frame with lots of values for each name - which isn't what I want. I would really just like a list of strings something along the lines of userlist = c( the results from unique).
Edit -
This code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}

I'm accepting my own answer. Namely, a working solution - this code will iterate correctly through the names:
for(name in unique(somebigdf$username)){
cat(name, '\n')
}
If someone at a later date has a better answer that seems more in keeping with the question, I will be happy to accept that as the answer.
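One possible explanation for the integers is that username is stored as a factor; this is an assumption, since the question does not show the column type. Under that assumption, converting to character before or after unique() gives a plain vector of strings. A minimal sketch, with a made-up data frame standing in for somebigdf:

```r
# Hypothetical data frame; username is deliberately created as a factor
somebigdf <- data.frame(
  username = factor(c("alice", "bob", "alice", "carol"))
)

# Convert to character first, then drop duplicates
userlist <- unique(as.character(somebigdf$username))

# userlist is now a character vector, so the loop prints the names
for (name in userlist) {
  cat(name, "\n")
}
```

The resulting userlist can also be used directly in comparisons such as if (name == "bob") inside the loop.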

pattern matching and delete all the lines except the last occurrence

I have a txt file with 100+ lines. I want to search for a pattern and delete all the matching lines except the last occurrence.
My search patterns are "string1=", "string2=", "string3=", "string4=" and "string5=".
Here are the lines from the txt file:
string1=hi
string2=hello
string3=welcome
string3=welcome1
string3=
string4=hi
string5=hello
I want to go through each line, keep the empty "string3=" line in the file, and remove the "string3=welcome" and "string3=welcome1" lines.
Please help me.

For a single pattern, you can start with something like this:
grep "string3" input | tail -1

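#!/usr/bin/perl
use strict;
use warnings;
my %h;
while (<STDIN>) {
    # split on the first "=" only, so values may themselves contain "="
    my ($k, $v) = split /=/, $_, 2;
    next unless defined $v;    # skip lines without "="
    # a later line with the same key overwrites the earlier one
    $h{$k} = $v;
}
foreach my $k ( sort keys %h ) {
    # the value still carries its trailing newline
    print "$k=$h{$k}";
}
The Perl script here takes your list on stdin and produces the output you describe. It assumes you want the keys (string*) in sorted order.
If you only want the lines that start with string1 to string5, you can put a match at the beginning of your while loop like so:
next if ! /^string[1-5]=/;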
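For comparison, the same "keep only the last occurrence of each key" idea can be sketched in R (the language used in the other threads on this page) with duplicated() and fromLast = TRUE. The lines are given inline here; reading them with readLines from a file is left out, and the variable names are just for illustration.

```r
# Sample lines from the question
lines <- c(
  "string1=hi", "string2=hello", "string3=welcome",
  "string3=welcome1", "string3=", "string4=hi", "string5=hello"
)

# The key is everything before the first "="
keys <- sub("=.*$", "", lines)

# duplicated(..., fromLast = TRUE) flags every occurrence except the last one
kept <- lines[!duplicated(keys, fromLast = TRUE)]
print(kept)
```

Unlike the hash-based script, this keeps the surviving lines in their original file order rather than sorting them by key.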
