Strange Cook's Values with DESeq2 in R

I'm trying to estimate fold changes between two sample types with the DESeq2 package, and I'm getting strange Cook's distance values that are causing major problems.
The two sample types have different numbers of replicates (6 vs 5), which might be the reason for these results: when I remove one sample, the results are no longer strange.
First, the sample table (condition) I'm using is:
name Type
total_1 t_24
total_2 t_24
total_3 t_24
total_4 t_24
total_5 t_24
nuc_1 n_24
nuc_2 n_24
nuc_3 n_24
nuc_4 n_24
nuc_5 n_24
nuc_6 n_24
Since the total fraction is the control, I listed it first so that the LFC is positive for genes up-regulated in the test (nuclear) fraction.
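A note on the reference level, based on the DESeq2 vignette rather than on this data: row order in the sample table does not decide which group is the reference. DESeqDataSetFromMatrix converts a character condition column into a factor, and R orders factor levels alphabetically by default, so "n_24" becomes the reference level whichever rows come first; results() then reports the last level against the reference (t_24 vs n_24). A minimal base-R sketch of the level ordering and of setting the reference explicitly with relevel():

```r
# Condition labels in the order of the sample table above: total first, then nuclear
type <- c(rep("t_24", 5), rep("n_24", 6))

# factor() sorts levels alphabetically, so "n_24" is first (the reference)
# even though the "t_24" rows come first
cond <- factor(type)
levels(cond)  # "n_24" "t_24"

# Set the total fraction as the reference explicitly
cond <- relevel(cond, ref = "t_24")
levels(cond)  # "t_24" "n_24"
```

The same relevel() call (or factor(..., levels = c("t_24", "n_24"))) can be applied to the condition column before building the DESeqDataSet; reordering the rows alone leaves the levels unchanged.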
The code I'm using is:
library(DESeq2)
# Build the sample table from the name/Type table shown above
df_conditions <- data.frame(
sample = df_conditions$name,
condition = df_conditions$Type
)
# Only part of "data_combined" is used: columns 9-14 are the nuclear samples, 23-27 the total samples
DEA_matrix <- DESeqDataSetFromMatrix(data_combined[c(9,10,11,12,13,14,23,24,25,26,27)],
df_conditions,
~condition)
DEA <- DESeq(DEA_matrix)
DEA_results <- results(DEA)
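One thing worth checking, hedged because the real data_combined is not shown: DESeqDataSetFromMatrix pairs the columns of the count matrix with the rows of the sample table by position, not by name, and the DESeq2 vignette stresses that both must already be in the same order. In the code above the count columns start with the nuclear samples (columns 9-14), while the sample table lists the total samples first. The sketch below uses a hypothetical placeholder matrix with that same column order to show which condition each column would receive, and how to reorder the sample table with match():

```r
# Placeholder count matrix (hypothetical zeros) whose columns follow the
# order produced by data_combined[c(9:14, 23:27)]: nuclear first, then total
counts <- matrix(0L, nrow = 2, ncol = 11,
                 dimnames = list(c("gene_a", "gene_b"),
                                 c(paste0("nuc_", 1:6), paste0("total_", 1:5))))

# Sample table in the order listed in the question: total first, then nuclear
coldata <- data.frame(sample = c(paste0("total_", 1:5), paste0("nuc_", 1:6)),
                      condition = c(rep("t_24", 5), rep("n_24", 6)))

# DESeq2 pairs count column i with sample-table row i, by position only
identical(colnames(counts), coldata$sample)  # FALSE: the orders disagree

# Condition each count column would actually receive with this pairing
assigned <- setNames(coldata$condition, colnames(counts))
assigned[c("nuc_5", "nuc_6", "total_1")]  # nuc_5 -> t_24, nuc_6 -> n_24, total_1 -> n_24

# Reorder the sample table so that its rows follow the count columns
coldata <- coldata[match(colnames(counts), coldata$sample), ]
rownames(coldata) <- coldata$sample
identical(colnames(counts), coldata$sample)  # TRUE
```

If the real columns are in this order, nuc_6 would be labelled n_24 together with the five total samples, which would be consistent with nuc_6 being the only sample with a large Cook's distance.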
When I run this, about 350 genes get NA for both pvalue and padj. From the manual and what I found online, this means those genes were flagged by the Cook's distance outlier check.
Checking the Cook's distances with assays(DEA)[["cooks"]], I found that one gene with NAs, for example, has these values:
nuc_1 nuc_2 nuc_3 nuc_4 nuc_5 nuc_6 total_1 total_2 total_3 total_4 total_5
1.479392e-04 1.548888e-02 1.557630e-04 2.008926e-01 1.012255e-01 2.222557e+01 5.412296e-01 1.156913e+00 9.553727e-01 1.107007e+00 1.146971e+00
This flags nuc_6 as a clear outlier, but the normalized counts do not suggest one (these are approximate values, not the exact ones):
nuc_1 nuc_2 nuc_3 nuc_4 nuc_5 nuc_6 total_1 total_2 total_3 total_4 total_5
400 350 400 450 300 400 50 30 30 30 30
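For context, the results() documentation describes the default cooksCutoff as the 0.99 quantile of the F(p, m - p) distribution, where p is the number of coefficients fitted and m the number of samples; if any sample of a gene exceeds it, that gene's p-values are set to NA (samples in groups of only two are excluded from the test). DESeq() replaces outliers instead of flagging them only when a group has at least 7 replicates (minReplicatesForReplace = 7), so with 6 and 5 replicates the genes are flagged. A base-R sketch of that threshold for this design, assuming p = 2 (intercept plus one condition coefficient) and m = 11:

```r
# Cook's distances reported for one gene (values copied from the question)
cooks <- c(nuc_1 = 1.479392e-04, nuc_2 = 1.548888e-02, nuc_3 = 1.557630e-04,
           nuc_4 = 2.008926e-01, nuc_5 = 1.012255e-01, nuc_6 = 2.222557e+01,
           total_1 = 5.412296e-01, total_2 = 1.156913e+00,
           total_3 = 9.553727e-01, total_4 = 1.107007e+00,
           total_5 = 1.146971e+00)

m <- length(cooks)  # number of samples (11)
p <- 2              # coefficients in ~condition: intercept + one log2 fold change
cutoff <- qf(0.99, df1 = p, df2 = m - p)  # about 8.02

# Samples whose Cook's distance exceeds the default cutoff
names(cooks)[cooks > cutoff]  # "nuc_6"
```

Only nuc_6 crosses the cutoff. results(DEA, cooksCutoff = FALSE) switches the flagging off, but that hides the symptom rather than explaining why the model sees that sample as extreme.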
Does anyone have an idea why the Cook's distance is so large and points to an outlier when there clearly isn't one? Could it be the difference in the number of samples? I've already tried inverting the order of the condition data frame (putting the nuclear samples first), but the results remain the same, so I don't think it's wrongly assigning nuc_6 to the total group.
Any ideas?
So far I've tried removing sample nuc_6, which solved the issue.
I've also tried changing the order of the conditions (nuclear first, then total), but the problem remains.
sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_1.0.0 scales_1.2.1 AnnotationDbi_1.60.0 enrichplot_1.18.3
[5] GGally_2.1.2 clusterProfiler_4.6.0 preprocessCore_1.60.0 DESeq2_1.38.1
[9] SummarizedExperiment_1.28.0 Biobase_2.58.0 MatrixGenerics_1.10.0 matrixStats_0.63.0
[13] GenomicRanges_1.49.0 GenomeInfoDb_1.34.4 IRanges_2.32.0 S4Vectors_0.36.1
[17] BiocGenerics_0.44.0 EnhancedVolcano_1.16.0 ggrepel_0.9.2 ggplot2_3.4.0
[21] gplots_3.1.3 WGCNA_1.72-1 fastcluster_1.2.3 dynamicTreeCut_1.63-1
[25] readxl_1.4.1
loaded via a namespace (and not attached):
[1] backports_1.4.1 shadowtext_0.1.2 Hmisc_4.7-2 fastmatch_1.1-3 plyr_1.8.8 igraph_1.3.5
[7] lazyeval_0.2.2 splines_4.2.2 BiocParallel_1.32.4 digest_0.6.30 htmltools_0.5.4 foreach_1.5.2
[13] yulab.utils_0.0.6 GOSemSim_2.24.0 viridis_0.6.2 GO.db_3.16.0 fansi_1.0.3 magrittr_2.0.3
[19] checkmate_2.1.0 memoise_2.0.1 cluster_2.1.4 doParallel_1.0.17 Biostrings_2.66.0 annotate_1.76.0
[25] graphlayouts_0.8.4 jpeg_0.1-10 colorspace_2.0-3 blob_1.2.3 xfun_0.35 dplyr_1.0.10
[31] crayon_1.5.2 RCurl_1.98-1.9 jsonlite_1.8.4 scatterpie_0.1.8 impute_1.72.3 survival_3.4-0
[37] iterators_1.0.14 ape_5.6-2 glue_1.6.2 polyclip_1.10-4 gtable_0.3.1 zlibbioc_1.44.0
[43] XVector_0.38.0 DelayedArray_0.23.2 DOSE_3.24.2 DBI_1.1.3 Rcpp_1.0.9 viridisLite_0.4.1
[49] xtable_1.8-4 htmlTable_2.4.1 gridGraphics_0.5-1 tidytree_0.4.2 foreign_0.8-83 bit_4.0.5
[55] Formula_1.2-4 htmlwidgets_1.6.1 httr_1.4.4 fgsea_1.24.0 RColorBrewer_1.1-3 reshape_0.8.9
[61] pkgconfig_2.0.3 XML_3.99-0.13 farver_2.1.1 nnet_7.3-18 deldir_1.0-6 locfit_1.5-9.6
[67] utf8_1.2.2 ggplotify_0.1.0 tidyselect_1.2.0 rlang_1.0.6 reshape2_1.4.4 munsell_0.5.0
[73] cellranger_1.1.0 tools_4.2.2 cachem_1.0.6 downloader_0.4 cli_3.4.1 generics_0.1.3
[79] RSQLite_2.2.19 gson_0.0.9 stringr_1.5.0 fastmap_1.1.0 ggtree_3.6.2 knitr_1.42
[85] bit64_4.0.5 tidygraph_1.2.2 caTools_1.18.2 purrr_0.3.5 KEGGREST_1.38.0 ggraph_2.1.0
[91] nlme_3.1-160 aplot_0.1.9 compiler_4.2.2 rstudioapi_0.14 png_0.1-8 treeio_1.22.0
[97] tibble_3.1.8 tweenr_2.0.2 geneplotter_1.76.0 stringi_1.7.8 lattice_0.20-45 Matrix_1.5-1
[103] vctrs_0.5.1 pillar_1.8.1 lifecycle_1.0.3 data.table_1.14.6 cowplot_1.1.1 bitops_1.0-7
[109] patchwork_1.1.2 qvalue_2.30.0 R6_2.5.1 latticeExtra_0.6-30 KernSmooth_2.23-20 gridExtra_2.3
[115] codetools_0.2-18 gtools_3.9.4 MASS_7.3-58.1 assertthat_0.2.1 withr_2.5.0 GenomeInfoDbData_1.2.9
[121] parallel_4.2.2 grid_4.2.2 rpart_4.1.19 ggfun_0.0.9 tidyr_1.2.1 HDO.db_0.99.1
[127] ggforce_0.4.1 base64enc_0.1-3 interp_1.1-3

Related

Tidyr's crossing() function not producing expected names in output

I apologize in advance that I could not create a reproducible example, but when I do tidyr::crossing on some dataframes, I get a crossed tibble where the x variables have the form: x$col_name and the y variables have y$col_name. If I do:
crossing(iris,mtcars)
I get names without the x$ prefix, as desired. I checked that my input data frames have the same class as in the example above, and there are no duplicate names in the data I'm working with. I can't share the data for the usual privacy reasons. I realize there is not much to work with here, but I'm hoping someone experienced enough with tidyr will understand this issue.
Here is some session info:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] writexl_1.4.0 readxl_1.4.0 lubridate_1.8.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[7] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] cellranger_1.1.0 pillar_1.7.0 compiler_4.2.0 dbplyr_2.2.0 tools_4.2.0
[6] jsonlite_1.8.0 lifecycle_1.0.1 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2
[11] reprex_2.0.1 rstudioapi_0.13 DBI_1.1.3 cli_3.3.0 haven_2.5.0
[16] xml2_1.3.3 withr_2.5.0 httr_1.4.3 fs_1.5.2 generics_0.1.2
[21] vctrs_0.4.1 hms_1.1.1 grid_4.2.0 tidyselect_1.1.2 glue_1.6.2
[26] R6_2.5.1 fansi_1.0.3 tzdb_0.3.0 modelr_0.1.8 magrittr_2.0.3
[31] backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 rvest_1.0.2 assertthat_0.2.1
[36] colorspace_2.0-3 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 broom_0.8.0
[41] crayon_1.5.1

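Okay, the answer is simple. Passing the data frames unnamed gives the expected column names:
crossing(iris,mtcars)
whereas passing them as named arguments gives the x$col_name and y$col_name columns, apparently because each named data frame is kept as a packed data-frame column called x or y:
crossing(x = iris,y = mtcars)
This is rather odd behavior in my opinion.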

dimdesc() error from FactoMineR package in the building of PCA

Using the data available with the FactoMineR package (http://factominer.free.fr/book/orange.csv), I ran a PCA and then a second PCA with supplementary variables. When I call dimdesc() on the second one, I get an error that I haven't been able to solve after searching online.
I'm posting here in case someone can help. My code is below. Thanks in advance for any comments or hints.
library(FactoMineR)
data_orange <- read.delim("orange.csv", header = TRUE, sep = ";")
data_orange_subset <- data_orange[, 1:8]
res.pca <- PCA(data_orange_subset, graph = FALSE)
dimdesc(res.pca, axes = 1:2)
This works.
When considering supplementary information in the PCA:
data_orange_2 <- data_orange[,-c(16,17)]
res.pca.all <- PCA(data_orange_2, graph = F,
quanti.sup = 9:15,
quali.sup = 1)
dimdesc(res.pca.all, axes = 1:2)
Error in if (sum(tabF[, 2] <= proba) > 0) resF <- tabF[tabF[, 2] <= proba, :
missing value where TRUE/FALSE needed
I've checked the data frame for NA values, and there are none.
sessionInfo()
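One possible explanation, offered as an assumption rather than a diagnosis: the failing line compares a column of p-values (tabF[, 2]) with proba inside if(), so the error appears when a p-value is NA, even if the input data has no NAs. Such a p-value arises if the variable passed as quali.sup has a different category for every individual, because a one-way ANOVA of the dimension coordinates on that factor then has zero residual degrees of freedom. The base-R sketch below reproduces this with made-up coordinates; whether dimdesc() uses exactly this test internally is not confirmed here.

```r
# Made-up coordinates of 6 individuals on one PCA dimension (hypothetical)
coord <- c(-2.1, 0.4, 1.3, -0.8, 2.0, -0.6)

# Case 1: a "qualitative" variable with a distinct level for every individual.
# The fit is perfect with zero residual df, so the F-test p-value is NaN
one_per_level <- factor(c("a", "b", "c", "d", "e", "f"))
p_unique <- suppressWarnings(anova(lm(coord ~ one_per_level)))[["Pr(>F)"]][1]

# Case 2: the same coordinates with replicated levels give a usable p-value
replicated <- factor(c("a", "a", "b", "b", "c", "c"))
p_repl <- anova(lm(coord ~ replicated))[["Pr(>F)"]][1]

proba <- 0.05
# The same kind of check as in the error message, applied to one p-value
check <- function(p) {
  tryCatch(
    if (sum(p <= proba) > 0) "significant" else "not significant",
    error = function(e) conditionMessage(e)
  )
}

check(p_unique)  # "missing value where TRUE/FALSE needed"
check(p_repl)
```

If this is what happens here, it may be worth checking what column 1 of data_orange_2 actually contains, since a numeric variable with a distinct value per product would behave like case 1 when declared as quali.sup.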
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.1
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] factoextra_1.0.7.999 ggplot2_3.3.6 FactoMineR_2.4
loaded via a namespace (and not attached):
[1] ggrepel_0.9.1 Rcpp_1.0.8.3 lattice_0.20-45 tidyr_1.2.0
[5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1
[9] backports_1.4.1 evaluate_0.15 pillar_1.7.0 rlang_1.0.3
[13] rstudioapi_0.13 minqa_1.2.4 car_3.1-0 nloptr_2.0.3
[17] Matrix_1.4-1 DT_0.23 rmarkdown_2.13 labeling_0.4.2
[21] splines_4.1.2 lme4_1.1-29 htmlwidgets_1.5.4 munsell_0.5.0
[25] broom_0.8.0 compiler_4.1.2 xfun_0.31 pkgconfig_2.0.3
[29] faraway_1.0.7 htmltools_0.5.2 flashClust_1.01-2 tidyselect_1.1.2
[33] tibble_3.1.7 gridExtra_2.3 dendextend_1.15.2 viridisLite_0.4.0
[37] fansi_1.0.3 crayon_1.5.1 dplyr_1.0.9 withr_2.5.0
[41] ggpubr_0.4.0 MASS_7.3-56 leaps_3.1 grid_4.1.2
[45] nlme_3.1-157 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2
[49] magrittr_2.0.3 scales_1.2.0 cli_3.3.0 carData_3.0-5
[53] farver_2.1.0 ggsignif_0.6.3 viridis_0.6.2 scatterplot3d_0.3-41
[57] ellipsis_0.3.2 generics_0.1.2 vctrs_0.4.1 boot_1.3-28
[61] ggsci_2.9 tools_4.1.2 glue_1.6.2 purrr_0.3.4
[65] abind_1.4-5 fastmap_1.1.0 yaml_2.3.5 colorspace_2.0-3
[69] cluster_2.1.3 rstatix_0.7.0 knitr_1.39

Using variables inside function R. Error in is_data_pronoun(expr)

I created a function in R and have been using it in my research for well over a month. The function uses the embrace operator {{ }}. Its purpose isn't really important, but at the beginning of the function I create a tibble from the supplied data, which I then use throughout the function.
f <- function(data, x, y, z){
tb <- data %>%
transmute("var1" = {{x}},
"var2" = {{y}},
"var3" = {{z}})
# Do some stuff with tb
return(tb)
}
My data, let's call it df, already has columns named x, y, and z, so I have been able to call the function by passing only the data, as shown below.
df <- tibble("x" = 1:3,
"y" = 4:6,
"z" = 7:9)
f(data = df)
> output
However, today I installed tidymodels and right after I installed it I have been getting an error.
f(data = df)
>Error in is_data_pronoun(expr) :
argument "expr" is missing, with no default
To fix the error, all I have to do is pass the column names to the function explicitly, as shown below.
f(data = df, x = x, y = y, z = z)
># A tibble: 3 x 3
var1 var2 var3
<int> <int> <int>
1 1 4 7
2 2 5 8
3 3 6 9
This is annoying, because I would have to go through my whole file and add x = x, y = y, z = z to every call. Does anyone know why I am getting this error, why it has suddenly come up, and how to fix it? I am planning to publish the function for others to use, which is why I'm using {{ }}. I have also completely uninstalled R and all my packages and reinstalled everything I was using except tidymodels, and I still get the error. My guess is that it has something to do with an updated version of dplyr?
Here is my session info
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tseries_0.10-50 knitr_1.38
[3] tidyquant_1.0.3 quantmod_0.4.18
[5] TTR_0.24.3 PerformanceAnalytics_2.0.4
[7] xts_0.12.1 zoo_1.8-9
[9] lubridate_1.8.0 forecast_8.16
[11] timetk_2.8.0 forcats_0.5.1
[13] stringr_1.4.0 dplyr_1.0.8
[15] purrr_0.3.4 readr_2.1.2
[17] tidyr_1.2.0 tibble_3.1.6
[19] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ggsignif_0.6.3 ellipsis_0.3.2 class_7.3-20
[5] fs_1.5.2 rstudioapi_0.13 ggpubr_0.4.0 listenv_0.8.0
[9] furrr_0.2.3 prodlim_2019.11.13 fansi_1.0.3 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.3 jsonlite_1.8.0 broom_0.7.12
[17] dbplyr_2.1.1 compiler_4.1.3 httr_1.4.2 backports_1.4.1
[21] assertthat_0.2.1 Matrix_1.4-0 fastmap_1.1.0 cli_3.2.0
[25] htmltools_0.5.2 tools_4.1.3 gtable_0.3.0 glue_1.6.2
[29] Rcpp_1.0.8.3 carData_3.0-5 cellranger_1.1.0 fracdiff_1.5-1
[33] vctrs_0.4.0 urca_1.3-0 nlme_3.1-155 lmtest_0.9-40
[37] timeDate_3043.102 gower_1.0.0 xfun_0.30 globals_0.14.0
[41] rvest_1.0.2 lifecycle_1.0.1 rstatix_0.7.0 future_1.24.0
[45] MASS_7.3-55 scales_1.1.1 ipred_0.9-12 hms_1.1.1
[49] parallel_4.1.3 yaml_2.3.5 curl_4.3.2 rpart_4.1.16
[53] stringi_1.7.6 hardhat_0.2.0 lava_1.6.10 rlang_1.0.2
[57] pkgconfig_2.0.3 rsample_0.1.1 evaluate_0.15 lattice_0.20-45
[61] recipes_0.2.0 tidyselect_1.1.2 parallelly_1.31.0 magrittr_2.0.3
[65] R6_2.5.1 generics_0.1.2 DBI_1.1.2 pillar_1.7.0
[69] haven_2.4.3 withr_2.5.0 survival_3.2-13 abind_1.4-5
[73] nnet_7.3-17 future.apply_1.8.1 modelr_0.1.8 crayon_1.5.1
[77] car_3.0-12 Quandl_2.11.0 utf8_1.2.2 tzdb_0.3.0
[81] rmarkdown_2.13 grid_4.1.3 readxl_1.4.0 reprex_2.0.1
[85] digest_0.6.29 munsell_0.5.0 quadprog_1.5-8

case_when doesn't run for factor columns [duplicate]

This question already has answers here:
case_when in mutate pipe
(7 answers)
Closed 2 years ago.
This seems simple enough; maybe it's a bug or I am missing something very basic. I'm trying to convert Species to a binary variable, so I'm using case_when for a simple operation, but I get an error that I don't think should arise.
iris %>%
dplyr::mutate(Species=as.factor(Species),
Species=case_when(Species=="setosa"~"virginica",
TRUE~Species))
Error: Problem with `mutate()` input `Species`.
x must be a character vector, not a `factor` object.
i Input `Species` is `case_when(Species == "setosa" ~ "virginica", TRUE ~ Species)`.
Details on session info
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] conflicted_1.0.4 extrafontdb_1.0 extrafont_0.17 forcats_0.5.0
[5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
[9] tidyverse_1.3.0 ggplot2_3.3.2 dplyr_1.0.2 stringr_1.4.0
loaded via a namespace (and not attached):
[1] qpdf_1.1 xfun_0.19 tidyselect_1.1.0
[4] haven_2.3.1 snakecase_0.11.0 colorspace_1.4-1
[7] vctrs_0.3.4 generics_0.1.0 usethis_1.6.3
[10] htmltools_0.5.0 yaml_2.2.1 utf8_1.1.4
[13] rlang_0.4.8 pillar_1.4.6 glue_1.4.2
[16] withr_2.3.0 DBI_1.1.0 dbplyr_2.0.0
[19] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0
[22] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[25] rvest_0.3.6 memoise_1.1.0 evaluate_0.14
[28] knitr_1.30 curl_4.3 fansi_0.4.1
[31] Rttf2pt1_1.3.8 broom_0.7.2 pdftools_2.3.1
[34] Rcpp_1.0.5 scales_1.1.1 backports_1.2.0
[37] jsonlite_1.7.1 fs_1.5.0 hms_0.5.3
[40] askpass_1.1 digest_0.6.27 stringi_1.5.3
[43] grid_4.0.3 cli_2.1.0 tools_4.0.3
[46] magrittr_1.5 crayon_1.3.4 pkgconfig_2.0.3
[49] ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0
[52] lubridate_1.7.9 tidytuesdayR_1.0.1 assertthat_0.2.1
[55] rmarkdown_2.5 httr_1.4.2 rstudioapi_0.12
[58] R6_2.5.0 compiler_4.0.3

Using case_when on factor variables is a bit tricky.
case_when is type-strict, meaning all the right-hand-side values must evaluate to the same type. Your first value is of type character ("virginica") while the TRUE value is of type factor, hence the type mismatch error. All the values should also be factors with the same levels as your original data. Incorporating these changes, you could do:
library(dplyr)
iris %>%
mutate(Species=case_when(Species == "setosa" ~
factor("virginica", levels = levels(Species)),
TRUE ~ Species))

In iris, Species is already a factor, so as.factor() changes nothing. You want character type here, so:
iris %>%
dplyr::mutate(Species=as.character(Species),
Species=case_when(Species=="setosa" ~ "virginica", TRUE ~ Species))
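As a further alternative that is not from either answer, base R can do this recode while keeping Species a factor: assigning a duplicated label through levels<- merges the two levels (the ?levels help page shows this for combining levels). A minimal sketch:

```r
# Copy the Species factor from the built-in iris data
species <- iris$Species

# Rename the "setosa" level to "virginica"; the duplicated label is merged
# into a single level, leaving a two-level factor
levels(species)[levels(species) == "setosa"] <- "virginica"

levels(species)  # "virginica" "versicolor"
table(species)
```

This avoids the type mismatch entirely, since no character values are mixed with factor values.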

Error in extracting several sequences from a .fasta file using a genomic range object as reference

I have a fasta file that corresponds to my reference genome and a vcf file that corresponds to the SNP call of my data. I would like to get the sequence of each one of the SNPs from my fasta.
To do that, I loaded the vcf file in R and extracted a GRanges object from it with these commands:
library(VariantAnnotation)
vcf.fn <- "SNPsAcrossAlltheIndividuals.vcf"
vcf <- readVcf(vcf.fn, verbose=FALSE)
SNPrange <- rowRanges(vcf)
I also extended each SNP position by one base on each side, but I'll leave that out here because it would only complicate the question.
Then, also in R, I used the Rsamtools package to read the fasta with the following commands:
library(Rsamtools)
file_path <- "F1.fasta"
indexFa(file_path)
fa <- FaFile(file_path)
I checked that none of the SNPs (or extended windows) fall outside the boundaries of my fasta with this command:
gr = as(seqinfo(fa), "GRanges")
findOverlaps(gr, SNPrange)
Finally, I ran getSeq() to extract the sequences for SNPrange from my fasta file, but got the following error:
seq_ <-getSeq(fa, SNPrange)
Error in value[[3L]](cond) :
record 12177 (chr7:88167221-88167221) failed
file: F1.fasta
I noticed that other people have had the same problem, but none found a solution, so I tried to work around it. I ran getSeq for each chromosome separately:
chr1 <- SNPrange[seqnames(SNPrange) == "chr1"]
chr2 <- SNPrange[seqnames(SNPrange) == "chr2"]
chr3 <- SNPrange[seqnames(SNPrange) == "chr3"]
...
seq1 <- getSeq(fa, chr1)
seq2 <- getSeq(fa, chr2)
seq3 <- getSeq(fa, chr3)
...
This worked, but some chromosomes show exactly the same problem:
seq7 <-getSeq(fa, chr7)
Error in value[[3L]](cond) : record 993 (chr7:88167220-88167222) failed
file: F1.fasta
I suspected the problem could be that the positions I want to extract are "N" in my fasta file, so I looked up one of the positions where R reported an error. To my surprise, they were not Ns but heterozygous bases. However, when I subset my data by chromosome, getSeq was able to return a Y (C/T) and other heterozygous bases, so it has no problem with degenerate bases as such. So I think the problem is with the algorithm and not with my data. I used the following command in bash to extract the sequence at the desired position of the fasta file:
samtools faidx F1.fasta chr7:88167220-88167222
>chr7:88167220-88167222
CRA
And here is my sessionInfo:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] readr_1.3.1 limma_3.44.3
[3] ggplot2_3.3.2 stringr_1.4.0
[5] vcfR_1.12.0 adegenet_2.1.3
[7] ape_5.4-1 ade4_1.7-15
[9] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 GenomicFeatures_1.40.1
[11] AnnotationDbi_1.50.3 VariantAnnotation_1.34.0
[13] Rsamtools_2.4.0 Biostrings_2.56.0
[15] XVector_0.28.0 SummarizedExperiment_1.18.2
[17] DelayedArray_0.14.1 matrixStats_0.56.0
[19] Biobase_2.48.0 GenomicRanges_1.40.0
[21] GenomeInfoDb_1.24.2 IRanges_2.22.2
[23] S4Vectors_0.26.0 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 seqinr_3.6-1 deldir_0.1-29 ellipsis_0.3.1
[5] class_7.3-17 rstudioapi_0.11 farver_2.0.3 bit64_4.0.5
[9] fansi_0.4.1 xml2_1.3.2 codetools_0.2-16 splines_4.0.2
[13] memuse_4.1-0 cluster_2.1.0 dbplyr_2.0.0 shiny_1.5.0
[17] compiler_4.0.2 httr_1.4.2 assertthat_0.2.1 Matrix_1.2-18
[21] fastmap_1.0.1 cli_2.1.0 later_1.1.0.1 htmltools_0.5.0
[25] prettyunits_1.1.1 tools_4.0.2 igraph_1.2.5 coda_0.19-4
[29] gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.3 reshape2_1.4.4
[33] dplyr_1.0.2 rappdirs_0.3.1 gmodels_2.18.1 Rcpp_1.0.5
[37] raster_3.3-13 vctrs_0.3.4 spdep_1.1-5 gdata_2.18.0
[41] nlme_3.1-148 rtracklayer_1.48.0 pinfsc50_1.2.0 mime_0.9
[45] lifecycle_0.2.0 gtools_3.8.2 XML_3.99-0.3 LearnBayes_2.15.1
[49] zlibbioc_1.34.0 MASS_7.3-51.6 scales_1.1.1 BSgenome_1.56.0
[53] hms_0.5.3 promises_1.1.1 expm_0.999-5 curl_4.3
[57] memoise_1.1.0 biomaRt_2.44.4 stringi_1.5.3 RSQLite_2.2.1
[61] e1071_1.7-3 permute_0.9-5 boot_1.3-25 BiocParallel_1.22.0
[65] spData_0.3.8 rlang_0.4.7 pkgconfig_2.0.3 bitops_1.0-6
[69] lattice_0.20-41 purrr_0.3.4 sf_0.9-6 labeling_0.4.2
[73] GenomicAlignments_1.24.0 bit_4.0.4 tidyselect_1.1.0 plyr_1.8.6
[77] magrittr_1.5 R6_2.5.0 generics_0.1.0 DBI_1.1.0
[81] withr_2.3.0 mgcv_1.8-31 pillar_1.4.6 units_0.6-7
[85] RCurl_1.98-1.2 sp_1.4-2 tibble_3.0.3 crayon_1.3.4
[89] KernSmooth_2.23-17 BiocFileCache_1.12.1 progress_1.2.2 grid_4.0.2
[93] blob_1.2.1 vegan_2.5-6 digest_0.6.25 classInt_0.4-3
[97] xtable_1.8-4 httpuv_1.5.4 openssl_1.4.3 munsell_0.5.0
[101] viridisLite_0.3.0 askpass_1.1
