Analysis of DEXSeq count table - r

I am using the "Bcbio" platform to process RNA-seq FASTQ files. At the end of the run it generates a number of files, such as count tables, Sailfish raw data and so on. There is also a file called "combined.dexseq", which looks like this:
id Control_rep1_1 Control_rep1_2 50ng_1 50ng_2 250ng_1 250ng_2
ENSG00000000003:001 458 495 688 643 619 622
ENSG00000000003:002 143 140 204 153 166 163
ENSG00000000003:003 93 65 117 101 80 112
ENSG00000000003:004 50 47 68 73 54 89
ENSG00000000003:005 66 62 85 109 71 104
ENSG00000000003:006 97 93 152 163 131 153
I want to run a DEXSeq analysis following the vignette, but the vignette only arrives at data in the form I have at the very end, when the featureCounts() function is used.
Can anyone help me estimate exon fold changes and use the other important analysis functions, starting from the file format that I have?
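One possible starting point, offered as a sketch rather than a tested pipeline: read the table in base R, split each id into a gene part and an exon part, and pass the pieces to the DEXSeqDataSet() constructor. The condition labels are an assumption (the trailing _1/_2 is taken to be a replicate number), and run_dexseq is a hypothetical helper name. It is defined here but not called, and it needs library(DEXSeq) to be loaded first.

```r
# Rows copied from the question; with the real file, pass its path to
# read.table() instead of `text = txt`.
txt <- "id Control_rep1_1 Control_rep1_2 50ng_1 50ng_2 250ng_1 250ng_2
ENSG00000000003:001 458 495 688 643 619 622
ENSG00000000003:002 143 140 204 153 166 163
ENSG00000000003:003 93 65 117 101 80 112"

# check.names = FALSE keeps column names such as "50ng_1" unchanged
counts <- read.table(text = txt, header = TRUE, row.names = 1,
                     check.names = FALSE)

# Split "gene:exon" row names into the two identifiers DEXSeq expects
parts   <- strsplit(rownames(counts), ":", fixed = TRUE)
gene_id <- vapply(parts, `[`, character(1), 1)
exon_id <- vapply(parts, `[`, character(1), 2)

# Assumption: the trailing "_1"/"_2" is a replicate number, so stripping it
# leaves the experimental condition
sample_data <- data.frame(
  row.names = colnames(counts),
  condition = factor(sub("_[0-9]+$", "", colnames(counts)))
)

# Hypothetical wrapper around the standard DEXSeq steps; call library(DEXSeq)
# before using it
run_dexseq <- function(counts, sample_data, gene_id, exon_id) {
  dxd <- DEXSeqDataSet(
    countData  = as.matrix(counts),
    sampleData = sample_data,
    design     = ~ sample + exon + condition:exon,
    featureID  = exon_id,
    groupID    = gene_id
  )
  dxd <- estimateSizeFactors(dxd)
  dxd <- estimateDispersions(dxd)
  dxd <- testForDEU(dxd)
  dxd <- estimateExonFoldChanges(dxd, fitExpToVar = "condition")
  DEXSeqResults(dxd)
}
```

With the full count table loaded, run_dexseq(counts, sample_data, gene_id, exon_id) would return the results object, including the fold change columns added by estimateExonFoldChanges().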

Related

How do I extend the y-axis range in an autocorrelation plot?

I have fit a linear regression model to some data in Stata and now I want to generate the Residual Autocorrelation Plot with respect to the variable id.
Below you can find the variables generated from the regression:
clear
input id response pred_response stud_res
101 72 57.55613 1.512287
102 61 51.24638 1.010817
103 49 56.94838 -0.8237054
104 48 43.1188 0.5078933
105 51 60.35182 -0.9997848
106 49 43.1188 0.6123365
107 50 43.60501 0.6678697
108 58 67.50063 -1.00277
109 50 45.17883 0.5053187
110 51 45.66593 0.5525671
111 59 62.28483 -0.3425483
112 65 52.94175 1.259024
113 57 59.49549 -0.2584414
114 53 59.00929 -0.6238151
115 74 68.10928 0.6212816
116 50 54.2797 -0.4418168
117 84 68.35238 1.671826
118 46 50.27308 -0.4435438
119 52 48.0915 0.4033695
120 64 58.04234 0.6188389
121 59 45.17972 1.444254
122 55 54.51646 0.0500989
124 46 44.33432 0.1745929
125 52 51.48948 0.0526441
126 63 64.71586 -0.1833892
127 52 51.00238 0.1038181
128 42 43.84811 -0.1929091
129 57 63.62279 -0.6922547
130 23 42.75415 -2.098808
131 65 58.88685 0.6355278
132 38 48.45526 -1.100601
133 59 54.77137 0.4510341
134 26 43.72021 -1.880954
135 53 60.46791 -0.7770496
136 50 40.68689 0.9796554
137 56 51.9748 0.4227943
138 49 65.43971 -1.751305
139 76 68.83858 0.7565064
140 68 66.53456 0.1536334
141 60 49.66532 1.077015
142 46 43.72021 0.2374953
143 57 59.85926 -0.2981544
144 45 48.45615 -0.3568231
145 46 45.42282 0.0596576
146 64 67.13597 -0.3291895
147 40 41.9024 -0.1997022
148 62 64.7104 -0.283202
149 13 45.78748 -3.629334
150 79 63.25813 1.66337
151 61 59.86015 0.1180355
152 46 42.02484 0.4124526
153 50 45.66593 0.4487194
154 48 51.61103 -0.3727813
155 65 59.37306 0.5858857
156 62 69.08168 -0.748562
157 56 54.5228 0.1524598
158 54 52.09724 0.196739
159 72 60.46156 1.209799
160 57 60.83167 -0.4032753
161 50 41.6593 0.8780965
162 65 55.97507 0.9392686
163 56 66.28511 -1.086957
201 54 49.5392 0.4779044
202 57 50.02451 0.7322617
203 48 49.18 -0.1222386
204 41 41.66019 -0.0684602
205 34 38.38376 -0.4576099
206 54 54.511 -0.0545433
207 38 40.68777 -0.2798446
208 49 41.77539 0.7603746
209 58 54.63255 0.3589811
210 14 47.24063 -3.676064
211 40 39.47226 0.0554914
212 13 39.71537 -2.931103
213 51 45.17426 0.611295
214 44 54.39491 -1.084383
216 42 48.08604 -0.6381954
217 55 46.38978 0.8958285
301 62 63.86043 -0.1954589
302 37 43.23401 -0.6509517
303 46 44.57196 0.147607
304 59 59.8538 -0.0890346
305 35 41.66019 -0.6924483
306 70 66.77221 0.3416052
307 56 58.15843 -0.2244185
308 45 46.99207 -0.2117317
309 50 47.47739 0.2635025
310 52 46.87598 0.5302449
311 52 59.84834 -0.8546749
312 83 49.78776 3.674294
313 57 54.03025 0.3084902
314 38 44.57196 -0.680949
315 40 48.81446 -0.9177504
410 48 39.59927 0.8789283
415 50 40.92999 0.9539063
605 42 36.31649 0.6024827
end
When I generate this graph, the default range for the vertical axis is set to encompass the estimated autocorrelation values. However, I want to extend this axis range over all allowable correlation values (i.e., from negative one to positive one). Unfortunately, when I do this, the axis labels do not adjust to the new range, and the labels get squashed.
Below is my code and output:
* Generate the residual autocorrelation plot
* (taken with respect to id variable)
tsset id
ac stud_res, lags(12) yscale(r(-1,1)) ///
    title("Residual Autocorrelation Plot") ///
    ytitle("Estimated Autocorrelation")
How can I get a plot with the desired extension to the vertical axis, but without having the labels squashed only onto the range of the plot values?
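You have two choices, and both involve adjusting the ylabel() option while removing yscale(). The yscale(range()) option only widens the axis and does not move the labels, whereas ylabel() controls where the labels are placed: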
ac stud_res, lags(12) ylabel(-1(0.4)1) title("Residual Autocorrelation Plot") ///
ytitle("Estimated Autocorrelation")
and
ac stud_res, lags(12) ylabel(#5) title("Residual Autocorrelation Plot") ///
ytitle("Estimated Autocorrelation")

Error with sm.density.compare

I am an R novice and currently analyzing my data, so forgive me if my error is basic.
I am trying to use the sm.density.compare function in the sm package to compare the abundance and diversity of parasites across host species and regions.
My data is structured like the iris dataset. The function works on iris, but when I run it on my own data I get the error "Error in x * w : non-numeric argument to binary operator".
Here is my code:
sm.density.compare(Data_Sheets_FINAL$Total.Endos, Data_Sheets_FINAL$Species)
The species data is broken into three groups (AS, CS, and TSE). Here is my Total.Endos data:
[1] 221 46 413 477 29 294 196 298 592 331 20 339 36 123 119 158 34 258 264 160 224 184 452
[24] 103 17 133 128 311 13 98 387 152 74 1058 13 110 66 9 17 5 22 530 146 73 44 277
[47] 75 27 68 49 115 67 104 108 256 762 93 21 1604 47 13 79 213 32 15 10 38 369 108
[70] 270 70 432 246 14 72 12 34 79 167
Any ideas?
This is the error message that you get if Species is stored as character strings rather than as a factor. Try converting it first:
Data_Sheets_FINAL$Species = factor(Data_Sheets_FINAL$Species)
sm.density.compare(Data_Sheets_FINAL$Total.Endos, Data_Sheets_FINAL$Species)
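To confirm that this is the cause before plotting, it may help to inspect the column types. The sketch below uses a small hypothetical stand-in for Data_Sheets_FINAL: the counts are the first nine Total.Endos values from the question, but the species assignment is invented purely for illustration.

```r
# Hypothetical stand-in for Data_Sheets_FINAL; Species is stored as text,
# and the pairing of species with counts is made up for this example
dat <- data.frame(
  Total.Endos = c(221, 46, 413, 477, 29, 294, 196, 298, 592),
  Species     = rep(c("AS", "CS", "TSE"), each = 3),
  stringsAsFactors = FALSE
)

# Check what the columns actually contain
species_class_before <- class(dat$Species)       # "character"
values_are_numeric   <- is.numeric(dat$Total.Endos)

# Convert the grouping column to a factor, as suggested above
dat$Species <- factor(dat$Species)
species_levels <- levels(dat$Species)

# After the conversion, the call from the question would be:
# sm::sm.density.compare(dat$Total.Endos, dat$Species)
```

If class() already reports "factor" for the real Species column, the grouping variable is not the problem and it would be worth checking that Total.Endos is numeric instead.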

How do I create SHA256 HMAC using Ironclad in Common Lisp?

There is a python function I am trying to port to Common Lisp:
HEX(HMAC_SHA256(apiSecret, 'stupidstupid'))
How do I go about this with Ironclad?
The closest I've come is:
(ironclad:make-hmac apiSecret :sha256)
But it's not working; it rejects apiSecret with the following error:
The value "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI"
is not of type
(SIMPLE-ARRAY (UNSIGNED-BYTE 8) (*)).
Ironclad internally works with arrays of bytes.
But it provides tools to convert from ASCII strings to such arrays, and from byte arrays to "hex" strings. Here is an interactive session (note that I don't know much about crypto algorithms):
CL-USER> (in-package :ironclad)
#<PACKAGE "IRONCLAD">
Converting the secret:
CRYPTO> (ascii-string-to-byte-array "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI")
#(86 48 109 110 49 76 76 81 73 99 54 71 78 83 105 66 112 68 102 68 109 82 111
51 74 105 56 108 101 66 90 87 113 77 73 111 108 78 66 115 110 97 107 108 83
99 103 73)
Building the HMAC from previous value:
CRYPTO> (make-hmac * :sha256)
#<HMAC(SHA256) {1006214D93}>
Now, I am not sure this is what you want, but according to the documentation, you are supposed to update the hmac with one or more sequences:
CRYPTO> (update-hmac * (ascii-string-to-byte-array "stupidstupid"))
#<HMAC(SHA256) {1006214D93}>
... and then compute a digest:
CRYPTO> (hmac-digest *)
#(178 90 228 244 244 45 109 163 51 222 77 235 244 173 249 208 144 43 116 130
210 188 62 247 145 153 100 198 119 86 207 163)
The resulting array can be converted to a hex string:
CRYPTO> (byte-array-to-hex-string *)
"b25ae4f4f42d6da333de4debf4adf9d0902b7482d2bc3ef7919964c67756cfa3"
For completeness, here is how you could wrap those functions to replicate the original code, assuming you are in a package that imports the right symbols:
(defun hex (bytes)
  (byte-array-to-hex-string bytes))
(defun hmac_sha256 (secret text)
  (let ((hmac (make-hmac (ascii-string-to-byte-array secret) :sha256)))
    (update-hmac hmac (ascii-string-to-byte-array text))
    (hmac-digest hmac)))
Finally:
(HEX (HMAC_SHA256 "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI"
                  "stupidstupid"))
=> "b25ae4f4f42d6da333de4debf4adf9d0902b7482d2bc3ef7919964c67756cfa3"

Return id numbers if missing over a set of variables

I have a large database that includes an 'id' variable. I want to list the variables of interest and get back, for each one, the ids of the rows where that variable is missing.
#Fake Data:
set.seed(11100)
missdata<-data.frame(id<-1:1000,C1<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)), C2<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)))
names(missdata)<-c("id","v1","v2")
#One variable solution:
missdatatest<-subset(missdata, is.na(v1),select=id)
missdatatest[1:10,]
> missdatatest[1:10,]
[1] 5 30 44 47 48 49 57 65 68 74
#Looking to build a function...
FindMissings<-function(indata,varslist,printvar){
  printonevar<-function(var){
    missdatalist<-subset(indata, is.na(var),select=printvar)
    print(missdatalist)
  }
  lapply(vars,printonevar)
}
#Run function:
vars<-c("v1","v2")
FindMissings(missdata,vars,id)
#Error:
> FindMissings(missdata,vars,id)
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
Any help would be appreciated. I originally wrote a function to do this in SAS, and it works perfectly fine, but I'm trying to move a lot of my work into R.
There's no need for such a function. Just use lapply:
> lapply(missdata[-1], function(x) which(is.na(x)))
$v1
[1] 5 30 44 47 48 49 57 65 68 74 89 103 107 110 115 119 152 167
[19] 175 176 194 197 199 202 204 212 215 223 231 232 233 239 245 280 281 293...
<<SNIP>>
$v2
[1] 3 6 18 19 22 23 27 28 33 38 41 50 51 55 60 66 68 77
[19] 81 84 86 96 97 99 109 116 117 134 139 141 143 146 148 153 165 168...
<<SNIP>>
If you specifically wanted to return the values from your "id" column (not just the position of the NA values), you can modify the statement to be:
lapply(missdata[-1], function(x) missdata$id[which(is.na(x))])
If your concern is how to use this approach for specific variables, it's pretty straightforward:
vars <- c("v1","v2")
lapply(missdata[vars], function(x) which(is.na(x)))
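If a reusable function is still wanted, the same idea can be wrapped up as below. This is a minimal sketch: find_missings and the toy data frame are illustrative names, not part of the original code. It differs from the attempt in the question in three ways. Columns are looked up by name with [[ ]], because is.na("v1") tests the string itself and is always FALSE. The id column is passed as a string instead of relying on subset()'s non-standard evaluation. The function iterates over its own varslist argument rather than the global vars.

```r
# Return, for each variable in `varslist`, the ids of the rows where it is NA.
# `idvar` is the name of the id column, given as a string.
find_missings <- function(indata, varslist, idvar = "id") {
  out <- lapply(varslist, function(v) indata[[idvar]][is.na(indata[[v]])])
  names(out) <- varslist
  out
}

# Small toy data set so the result is easy to check by eye
toy <- data.frame(
  id = 101:106,
  v1 = c(1, NA, 1, NA, 1, 1),
  v2 = c(NA, 1, 1, 1, 1, NA)
)

res <- find_missings(toy, c("v1", "v2"))
res$v1  # ids 102 and 104
res$v2  # ids 101 and 106
```

On the data from the question, the equivalent call would be find_missings(missdata, c("v1", "v2")).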

Create a for loop which prints every number that is x%%3=0 between 1-200

Like the title says I need a for loop which will write every number from 1 to 200 that is evenly divided by 3.
Every other method posted so far generates the 1:200 vector then throws away two thirds of it. What a waste. In an attempt to be eco-conscious, this method does not waste any electrons:
seq(3,200,by=3)
You don't need a for loop; use the which function instead, as in:
which(1:200 %% 3 == 0)
[1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81
[28] 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
[55] 165 168 171 174 177 180 183 186 189 192 195 198
Two other alternatives:
(1:200)[c(FALSE, FALSE, TRUE)]
(1:200)[1:200 %% 3 == 0]
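For completeness, since the question asks specifically for a for loop, here is a minimal sketch of one. The divisible_by_3 vector is an illustrative addition that collects the values as well as printing them, so the result can be compared with the vectorised answers.

```r
divisible_by_3 <- integer(0)

for (i in 1:200) {
  # %% is the modulo operator: a remainder of 0 means i is a multiple of 3
  if (i %% 3 == 0) {
    print(i)
    divisible_by_3 <- c(divisible_by_3, i)
  }
}
```

The loop visits all 200 numbers, so the vectorised versions remain the more idiomatic choice in R.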
