I want to write an R program that creates the vector 0.1^3, 0.2^1, 0.1^6, 0.2^4, ..., 0.1^36, 0.2^34.
v=c(seq(3,36,3))
w=c(seq(1,34,3))
x=c(0.1^v)
y=c(0.2^w)
z=c(x,y)
Please help.
Use rbind to build a two-row matrix, then convert it back to a vector:
c(rbind(x, y))
Or more directly:
rep(c(0.1, 0.2), 12)^c(rbind(seq(3,36,3), seq(1,34,3)))
You can use matrix to create the desired vector.
c(matrix(z, 2, byrow=TRUE))
# [1] 1.000000e-03 2.000000e-01 1.000000e-06 1.600000e-03 1.000000e-09
# [6] 1.280000e-05 1.000000e-12 1.024000e-07 1.000000e-15 8.192000e-10
#[11] 1.000000e-18 6.553600e-12 1.000000e-21 5.242880e-14 1.000000e-24
#[16] 4.194304e-16 1.000000e-27 3.355443e-18 1.000000e-30 2.684355e-20
#[21] 1.000000e-33 2.147484e-22 1.000000e-36 1.717987e-24
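As a quick check that the approaches above agree, here is a minimal sketch in base R. The names a1, a2 and a3 are only illustrative and do not appear in the answers.

```r
# Exponents for each base, as in the question
v <- seq(3, 36, 3)   # exponents of 0.1
w <- seq(1, 34, 3)   # exponents of 0.2
x <- 0.1^v
y <- 0.2^w

# rbind() stacks x over y as a 2-row matrix; c() then reads it column by column,
# which yields x[1], y[1], x[2], y[2], ...
a1 <- c(rbind(x, y))

# Same idea, building the 2-row matrix from the concatenated vector
a2 <- c(matrix(c(x, y), nrow = 2, byrow = TRUE))

# Interleave the exponents first, then raise the recycled bases to them
a3 <- rep(c(0.1, 0.2), 12)^c(rbind(v, w))

stopifnot(isTRUE(all.equal(a1, a2)), isTRUE(all.equal(a1, a3)))
length(a1)  # 24
```

All three rely on the same fact: R stores matrices column-major, so flattening a two-row matrix alternates between its rows.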
In bioinformatics/microbial ecology literature a fairly common practice is to concatenate multiple sequence alignments of multiple genes prior to building phylogenetic trees. In R terminology it may be clearer to say 'merge' these sequences by the organism they came from, but I'm sure examples are better.
Say these are two multiple sequence alignments.
library(Biostrings)
set1<-AAStringSet(c("IVR", "RDG", "LKS"))
names(set1)<-paste("org", 1:3, sep="_")
set2<-AAStringSet(c("VRT", "RKG", "AST"))
names(set2)<-paste("org", 2:4, sep="_")
set1
A AAStringSet instance of length 3
width seq names
[1] 3 IVR org_1
[2] 3 RDG org_2
[3] 3 LKS org_3
set2
A AAStringSet instance of length 3
width seq names
[1] 3 VRT org_2
[2] 3 RKG org_3
[3] 3 AST org_4
The correct concatenation of these sequences would be
A AAStringSet instance of length 4
width seq names
[1] 6 IVR--- org_1
[2] 6 RDGVRT org_2
[3] 6 LKSRKG org_3
[4] 6 ---AST org_4
The "-" denotes a 'gap' (lack of an amino acid) in that position, or in this case the lack of a gene to concatenate.
I thought there would be a function to do this in Biostrings, MSA, DECIPHER, or other related packages, but have been unable to find one.
I found the following Q&As, but none of them gives the desired output described above.
1: https://support.bioconductor.org/p/38955/
output
A AAStringSet instance of length 6
width seq names
[1] 3 IVR org_1
[2] 3 RDG org_2
[3] 3 LKS org_3
[4] 3 VRT org_2
[5] 3 RKG org_3
[6] 3 AST org_4
May be better described as 'appending' the sequences (joins the two sets vertically).
2: https://support.bioconductor.org/p/39878/
output
A AAStringSet instance of length 2
width seq
[1] 9 IVRRDGLKS
[2] 9 VRTRKGAST
Concatenates sequences in each set, a complete chimera of each set (certainly not desired).
3: How to concatenate two DNAStringSet sequences per sample in R?
output
A AAStringSet instance of length 3
width seq
[1] 6 IVRVRT
[2] 6 RDGRKG
[3] 6 LKSAST
Creates chimeras by pairing sequences in the order they appear. Even worse when the sets have different numbers of sequences (it loops over the shorter set and concatenates it again...)
4: https://www.biostars.org/p/115192/
Output
A AAStringSet instance of length 2
width seq
[1] 3 IVR
[2] 3 VRT
Only appends the first sequence from each set, not sure why anyone wants this...
I would normally think these kinds of processes would be done with some combination of bash and Python, but I'm using the DECIPHER multiple sequence aligner in R, so it makes sense to do the rest of the processing in R. In the process of writing up this question I came up with an answer that I will post, but I'm kind of expecting someone to point me to the manual I missed that describes the function that does this. Thanks!
I am a somewhat fanatical user of data.table in R; among many other things, it is great for merging datasets by name. I found that Biostrings AAStringSet objects can be converted to matrices using as.matrix, and these can be converted to data.tables and merged.
library(data.table)
set1.dt<-data.table(as.matrix(set1), keep.rownames = TRUE)
set2.dt<-data.table(as.matrix(set2), keep.rownames = TRUE)
set12.dt<-merge(set1.dt, set2.dt, by="rn", all=TRUE)
set12.dt
rn V1.x V2.x V3.x V1.y V2.y V3.y
1: org_1 I V R <NA> <NA> <NA>
2: org_2 R D G V R T
3: org_3 L K S R K G
4: org_4 <NA> <NA> <NA> A S T
This is the correct merge, but needs more work to get the final result.
The NAs need to be replaced with "-". I always have to look up this question to remember the best way to do it with a data.table:
Fastest way to replace NAs in a large data.table
#slightly modified from original, added arg "x" (the replacement value)
f_dowle = function(dt, x) { # replaces NA with x in every column of dt, by reference
na.replace = function(v,value=x) { v[is.na(v)] = value; v }
for (i in names(dt))
eval(parse(text=paste("dt[,",i,":=na.replace(",i,")]")))
}
f_dowle(set12.dt, "-")
Concatenate the sequences (excluding the names column with !"rn")
set12<-apply(set12.dt[ ,!"rn"], 1, paste, collapse="")
Convert back to AAStringSet and add back names
set12<-AAStringSet(set12)
names(set12)<-set12.dt$rn
Desired output
set12
A AAStringSet instance of length 4
width seq names
[1] 6 IVR--- org_1
[2] 6 RDGVRT org_2
[3] 6 LKSRKG org_3
[4] 6 ---AST org_4
This works, but seems quite cumbersome, especially the conversion between different data formats. Obviously I can wrap it into a function for easier use, but again it seems like this should already be a function in some Bioconductor package...
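For comparison, the same merge-by-name idea can be sketched in base R on plain named character vectors, without the data.table round trip. This is only a sketch under assumptions: concat_by_name is a hypothetical helper (it is not a Biostrings or DECIPHER function), and every sequence within a set is assumed to have the same width, as in an alignment.

```r
# Hypothetical helper: concatenate two aligned sets by organism name,
# padding with gap characters where an organism is missing from a set.
concat_by_name <- function(s1, s2, gap = "-") {
  all_names <- union(names(s1), names(s2))
  pad <- function(s) {
    width <- nchar(s[[1]])                 # assumes equal widths within a set
    out <- s[all_names]                    # NA where the name is absent
    out[is.na(out)] <- strrep(gap, width)  # fill missing organisms with gaps
    out
  }
  res <- paste0(pad(s1), pad(s2))
  names(res) <- all_names
  res
}

set1 <- c(org_1 = "IVR", org_2 = "RDG", org_3 = "LKS")
set2 <- c(org_2 = "VRT", org_3 = "RKG", org_4 = "AST")
concat_by_name(set1, set2)
```

With the Biostrings objects from the question, the inputs could presumably be obtained via as.character() and the result passed back to AAStringSet(), though that step is not shown here.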
I am trying to read a TSV file in R using the read.table function.
myTable <- read.table("file_path", sep='\t', header=T)
But when I try the command
names(myTable)
it appears to list only the odd-numbered column names, with the even-numbered ones merged into them:
[1] "GeneSymbol" "GSM480304_JK_C_05.07.mas5.chp"
[3] "GSM480355_JK_C_05.07.mas5.chp" "GSM480480_JK_C_05.07.mas5.chp"
[5] "GSM480555_JK_C_05.07.mas5.chp" "GSM480634_JK_C_05.07.mas5.chp"
These are the exact column names. You can see that each line holds two column names separated by a space, while only the odd-numbered indices are listed.
The output should be like this:
[1] "GeneSymbol"
[2] "GSM480304_JK_C_05.07.mas5.chp"
[3] "GSM480355_JK_C_05.07.mas5.chp"
[4] "GSM480480_JK_C_05.07.mas5.chp"
[5] "GSM480555_JK_C_05.07.mas5.chp"
[6] "GSM480634_JK_C_05.07.mas5.chp"
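This is creating a problem when assigning names to another table where I want to use these column names. Any suggestions?
As noted in the comments, R is displaying all the columns, but not in the format you expect: the bracketed number at the start of each printed line is the index of the first element on that line, so the line starting with [3] also contains element 4. To list one name per row, wrap the result of names() in as.data.frame() as follows: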
rawData <- "
Number,Name,Type1,Type2,Total,HP,Attack,Defense,SpecialAtk,SpecialDef,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False
7,Squirtle,Water,,314,44,48,65,50,64,43,1,False
8,Wartortle,Water,,405,59,63,80,65,80,58,1,False
9,Blastoise,Water,,530,79,83,100,85,105,78,1,False"
gen01 <- read.csv(text=rawData,header=TRUE)
as.data.frame(names(gen01))
...and the output:
> as.data.frame(names(gen01))
names(gen01)
1 Number
2 Name
3 Type1
4 Type2
5 Total
6 HP
7 Attack
8 Defense
9 SpecialAtk
10 SpecialDef
11 Speed
12 Generation
13 Legendary
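A few base R alternatives for checking the same thing, as a small sketch. The data frame df and its column names are made up for illustration.

```r
# Toy data frame with six columns (names are placeholders)
df <- data.frame(GeneSymbol = "A", s1 = 1, s2 = 2, s3 = 3, s4 = 4, s5 = 5)

length(names(df))      # number of columns actually present
names(df)[2]           # any single name can be retrieved by its index
writeLines(names(df))  # print one name per line, without index prefixes
```

Because the printed layout is only a display choice that depends on the console width, assigning names(df) to another table carries over all the names regardless of how they were shown.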
My question is a more general version of the one asked in "How to remove time-field string from a date-as-character variable?".
In fact, suppose I have this date type variable:
> head(DataDia$Date)
[1] "2016-09-13 15:56:30.827" "2016-12-12 13:39:17.537" "2016-09-16 21:57:24.977" "2016-09-23 11:19:22.010"
[5] "2017-01-11 20:06:58.490" "2016-10-21 23:40:43.927"
How do I delete all the time-field strings and keep just the date, so that I get this:
> head(DataDia$Date)
[1] "2016-09-13" "2016-12-12" "2016-09-16" "2016-09-23"
[5] "2017-01-11" "2016-10-21"
Please note that I am working on a data.table, so I need a way to do this using data.table operations.
Just use as.Date(DataDia$Date).
You can use:
as.POSIXct(Df$Date,format='%Y-%m-%d',tz= "UTC")
Combining as.Date and as.character
x = c("2016-09-13 15:56:30.827", "2016-12-12 13:39:17.537", "2016-09-16 21:57:24.977", "2016-09-23 11:19:22.010",
"2017-01-11 20:06:58.490", "2016-10-21 23:40:43.927")
y = as.character(as.Date(x, format = "%Y-%m-%d"))
y
[1] "2016-09-13" "2016-12-12" "2016-09-16" "2016-09-23" "2017-01-11" "2016-10-21"
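Since the question asks for data.table operations, the conversion can also be done by reference with `:=`. A minimal sketch, assuming DataDia is a data.table whose Date column is character, as in the question (the two-row table here is made up for illustration):

```r
library(data.table)

# Toy stand-in for the asker's table
DataDia <- data.table(Date = c("2016-09-13 15:56:30.827",
                               "2016-12-12 13:39:17.537"))

# as.Date() parses the leading "%Y-%m-%d" part and ignores the trailing time;
# := overwrites the column in place
DataDia[, Date := as.Date(Date)]
DataDia$Date
```

If character strings are wanted rather than Date objects, the right-hand side could be wrapped in as.character(), as in the answer above.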
I have a vector of strings that contain both character and numeric values. For example:
a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480","ILLUMINA:420:C2D7UACXX:1:1102:14592:3881","ILLUMINA:420:C2D7UACXX:1:1102:14592:37103","ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")
I'd like to order the vector so that the characters are sorted alphabetically and the numbers numerically. The structure of the strings is always of the format:
"ILLUMINA:420:C2D7UACXX:1:<number>:<number>:<number>", so actually the order only applies to the last three colon separated numbers.
I did try mixedsort {gtools} but the result was the same as using sort and
sort.int, which is:
> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
Clearly the right order should be:
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
Is there any immediate solution?
EDIT: completely changed the solution after the OP's clarification.
You can extract the last three fields into a data.frame:
dat = read.table(text=sub('.*:1:([0-9]+):([0-9]+):([0-9]+)','\\1|\\2|\\3',a),sep='|')
dat
V1 V2 V3
1 1102 14591 91480
2 1102 14592 3881
3 1102 14592 37103
4 1102 14592 37356
Then you order using 3 columns:
a[with(dat,order(V1,V2,V3))]
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
gtools::mixedsort does work in your case, actually:
> a=c("ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
"ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
"ILLUMINA:420:C2D7UACXX:1:1102:14592:37103",
"ILLUMINA:420:C2D7UACXX:1:1102:14592:37356")
>
> mixedsort(a)
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480"
[2] "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881"
[3] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
I am using gtools_3.4.2 and R-3.2.0
Here's a faster solution:
library(data.table)
# split each string on ":" and keep fields 5 to 7 as numeric sort keys
fields.list = strsplit(a,split=":")
sort.dt = data.table(t(sapply(fields.list,function(x) as.numeric(c(x[5],x[6],x[7])))))
sorted.a = a[with(sort.dt,order(V1,V2,V3))]
> sorted.a
[1] "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480" "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881" "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103"
[4] "ILLUMINA:420:C2D7UACXX:1:1102:14592:37356"
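The split-and-order approach can also be sketched in base R alone. The input below is the vector from the question, deliberately shuffled so that the effect of the sort is visible; the names fields, keys and sorted_a are illustrative.

```r
a <- c("ILLUMINA:420:C2D7UACXX:1:1102:14592:37356",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:3881",
       "ILLUMINA:420:C2D7UACXX:1:1102:14591:91480",
       "ILLUMINA:420:C2D7UACXX:1:1102:14592:37103")

# Split on ":" and bind the pieces into a character matrix, one row per string
fields <- do.call(rbind, strsplit(a, ":", fixed = TRUE))

# Convert fields 5 to 7 to numbers so they compare numerically, not as text
keys <- apply(fields[, 5:7], 2, as.numeric)

# order() breaks ties in the first key using the second, then the third
sorted_a <- a[order(keys[, 1], keys[, 2], keys[, 3])]
sorted_a
```

This assumes every string has at least seven colon-separated fields, as stated in the question.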