How to obtain SNP dataset from UCSC (customProDB) - bioconductor

I need to obtain a SNP dataset for proteogenomics. Supposedly the latest version is called 'snpedia' and the data is downloaded from the UCSC table browser. When I try to download it I get an error saying the table 'snpediaCodingDbSnp' is unavailable.
Does anyone know how to obtain the data for hg38?
Code
library(customProDB)
ensembl <- useMart(
"ENSEMBL_MART_ENSEMBL",
dataset="hsapiens_gene_ensembl",
host="https://aug2017.archive.ensembl.org",
path="/biomart/martservice",
archive=FALSE
)
annotation_path <- "<your_path>"
PrepareAnnotationEnsembl(
mart=ensembl,
annotation_path=annotation_path,
splice_matrix=FALSE,
dbsnp='snpedia',
COSMIC=FALSE
)
Output
Prepare gene/transcript/protein id mapping information (ids.RData) ... done
Build TranscriptDB object (txdb.sqlite) ...
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
done
Prepare exon annotation information (exon_anno.RData) ... done
done
done
Prepare dbSNP information (dbsnpinCoding.RData) ... Error in normArgTable(value, x) :
Table 'snpediaCodingDbSnp' is unavailable
In addition: Warning message:
Ensembl will soon enforce the use of https.
Ensure the 'host' argument includes "https://"

Related

R packages not connecting to peer for data download

I am working with a couple of R packages for genetic pathway enrichment analyses and the two packages that I am using are now throwing errors when trying to connect to each package's respective server for downloading the reference data for the analysis.
In the first package gage, I am getting the following error when attempting to download:
library(gage)
> kg.ko = kegg.gsets("ko") # ("ko" is KEGG ortholog pathway)
Error in curl::curl_fetch_memory(url, handle = handle) :
Failure when receiving data from the peer
In the second package clusterProfiler, I am getting the following error:
library(clusterProfiler)
# the data
dput(head(de_kegg_chr))
c("K14847", "K19009", "K00078", "K21407", "K23285", "K06972")
# KEGG enrichment (which will pull relevant reference data during this step)
# over-representation analysis (fisher's)
> enrich <- enrichKEGG(gene = de_kegg_chr,
+ organism = "ko",
+ keyType='kegg',
+ pvalueCutoff = 0.01)
Reading KEGG annotation online:
fail to download KEGG data...
Error in download.KEGG.Path(species) :
'species' should be one of organisms listed in 'http://www.genome.jp/kegg/catalog/org_list.html'...
In addition: Warning message:
In utils::download.file(url, quiet = TRUE, method = method, ...) :
URL 'https://rest.kegg.jp/link/ko/pathway': status was 'Failure when receiving data from the peer'
After the first error, I thought it was something specific to the gage package and found a simple work-around because these data are downloaded from the server prior to the analysis function.
This is more of a problem with the second package because the reference data are downloaded within the function that conducts the analysis.
Now that this is happening with more than one package (both of these scripts were working perfectly before yesterday), I'm thinking it is something systemic within R or RStudio.
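One way to separate a package problem from a connection problem is to request the failing URL directly from base R, outside either package. The sketch below is only an illustration: `can_fetch` is a hypothetical helper name, and it does nothing more than report whether `download.file()` completes without an error.

```r
# Hypothetical helper: TRUE if the URL can be downloaded from this R session.
can_fetch <- function(url) {
  dest <- tempfile()
  on.exit(unlink(dest), add = TRUE)
  status <- tryCatch(
    suppressWarnings(download.file(url, dest, quiet = TRUE)),
    error = function(e) 1L  # treat any download error as a failure
  )
  # download.file() returns 0 on success
  identical(as.integer(status), 0L)
}

# Local sanity check that needs no network access
local_file <- tempfile(fileext = ".txt")
writeLines("ok", local_file)
can_fetch(paste0("file://", local_file))
```

Calling `can_fetch("https://rest.kegg.jp/link/ko/pathway")` (the URL from the warning above) would then show whether the KEGG REST endpoint is reachable at all. If it is not, the cause is more likely the network, a proxy, or the server than gage or clusterProfiler.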

Difficulty opening a package data file of unknown type

I am trying to load the state map from the maps package into an R object. I am hoping it is a SpatialPolygonsDataFrame or something I can turn into one after I have inspected it. However I am failing at the first step – getting it into an R object. I do not know the file type.
I first tried to assign the map() output to an R object directly:
st_m <- maps::map(database = "state")
This draws the map, but str(st_m) appears to do nothing, unless it is redrawing the same map.
Then I tried loading it as a dataset: st_m <- data("stateMapEnv", package="maps") but this just returns a string:
> str(stateMapEnv)
chr "R_MAP_DATA_DIR"
I opened the maps directory win-library/3.4/maps/mapdata/ and found what I think is the map file, “state.L”.
I tried reading it with scan and got an error message I do not understand:
scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in scan(file = "D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
scan() expected 'a real', got '#'
I then opened the file with Notepad++. It appears to be a binary or compressed file.
So I thought it might be an R data file with an unusual extension. But my attempt to load it returned a “bad magic number” error:
st_m <- load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L")
Error in load("D:/Documents/R/win-library/3.4/maps/mapdata/state.L") :
bad restore file magic number (file may be corrupted) -- no data loaded
Observing that these responses have progressed from the unhelpful through the incomprehensible to the occult, I thought it best to seek assistance from the wizards of stackoverflow.
This should be able to export the 'state' or any other maps dataset for you:
library(ggplot2)
state_dataset <- map_data("state")
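As a complementary sketch that stays within the maps package, `map()` can be asked to return its data instead of drawing it. The variable names below are illustrative only.

```r
library(maps)

# plot = FALSE returns the map data without drawing;
# fill = TRUE requests closed polygons rather than line segments
st_m <- map(database = "state", plot = FALSE, fill = TRUE)

class(st_m)  # an object of class "map"
str(st_m)    # a list with x, y, range and names components

# Flatten the coordinates into a data frame;
# NA rows separate the individual polygons
st_df <- data.frame(long = st_m$x, lat = st_m$y)
head(st_df)
```

The `names` component holds one label per polygon, which is what a later conversion to a spatial class would typically use as polygon IDs.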

Error in fstRead R

I have been using the new 'fst' package in R for a few weeks to write and read tables in the .fst format. Sometimes I cannot read a table that I have just written, and I get the following message:
> tab=read.fst("Tables R/tab.fst",as.data.table=TRUE)
Error in fstRead(fileName, columns, from, to) :
Unknown type found in column.
Do you know why this happens? Is there another way to retrieve the table?

cummeRbund Create Gene Set Error

I am having trouble creating a Gene Set using cummeRbund (R software used to analyze cufflinks, cuffdiff output).
I have been working from the cummeRbund manual. The directions have worked up until the point of creating the gene sets.
Before creating the gene sets you need to create a vector of gene_ids to include. In the example they enclose each item in this list in quotation marks. I have created a gene_ids .txt file named OtoSCOPE_v7_list_oneline.txt; the first 4 entries in this list are shown below.
“Adcy1” “Bdp1” “Bsnd” “Cabp2”
Here is the create gene sets portion of the script that I have been using.
###################################
# Creating Gene Sets
###################################
#first created a vector of gene_ids that you want included in your gene set
base_dir <- "/Users/paulranum/Documents/cummeRbund"
otoscope_genes <- read.table(file.path(base_dir, "OtoSCOPE_v7_list_oneline.txt"), stringsAsFactors=FALSE)
data(cuff)
myGeneIds<-otoscope_genes
myGeneIds
myGenes<-getGenes(cuff, myGeneIds)
myGenes
When I run this I get the following output and errors.
> data(cuff)
Warning message:
In data(cuff) : data set 'cuff' not found
> myGeneIds<-otoscope_genes
> myGeneIds
V1 V2 V3 V4
1 “Adcy1” “Bdp1” “Bsnd” “Cabp2”
> myGenes<-getGenes(cuff, myGeneIds)
Error in rsqlite_send_query(conn@ptr, statement) :
cannot start a transaction within a transaction
> myGenes
Error: object 'myGenes' not found
From what I can tell there are two main issues going on.
1. It is not recognizing my data(cuff) command. cuff is the name of my CuffSet data file, and this file has worked for everything else. Is this not the correct data file?
2. The error after the myGenes<-getGenes(cuff, myGeneIds) command:
Error in rsqlite_send_query(conn@ptr, statement) : cannot start a transaction within a transaction
Thanks for reading. Any help would be very much appreciated.
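Two details in the output above are worth checking independently of the transaction error. First, `data()` only loads data sets bundled with installed packages, so it will not find a CuffSet object created in the session. Second, `read.table()` returns a one-row data frame whose values still contain the curly quotes from the text file, not a character vector of gene IDs. The sketch below shows one way to build a clean character vector with base R. The temporary file stands in for OtoSCOPE_v7_list_oneline.txt, and the whitelist of allowed characters is an assumption about what gene symbols contain.

```r
# Stand-in for the gene list file (hypothetical temporary copy)
ids_file <- tempfile(fileext = ".txt")
writeLines("\u201cAdcy1\u201d \u201cBdp1\u201d \u201cBsnd\u201d \u201cCabp2\u201d",
           ids_file, useBytes = TRUE)

# Read whitespace-separated tokens as plain strings; quote = "" disables quote parsing
raw_ids <- scan(ids_file, what = character(), quote = "",
                quiet = TRUE, encoding = "UTF-8")

# Assumption: IDs contain only letters, digits, '_', '.' and '-';
# everything else (including curly quotes) is stripped
myGeneIds <- gsub("[^A-Za-z0-9_.-]", "", raw_ids)
print(myGeneIds)

# With cummeRbund loaded, the CuffSet would normally come from readCufflinks(),
# not data(). The directory name below is hypothetical.
# cuff    <- readCufflinks(dir = "cuffdiff_output")
# myGenes <- getGenes(cuff, myGeneIds)
```

Passing a plain character vector at least rules out the input format as a cause. Whether it also resolves the "cannot start a transaction within a transaction" error is not something this sketch can confirm.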

Trouble providing data sets with package

I have two data sets full and raw that I placed in the data/ directory of my package. However, when I load my package, they are not available. I tried looking for them using the data function, but did not see them.
data(raw, package = "pkg")
Warning message:
In data(raw, package = "pkg") : data set 'raw' not found
Do I have to export them somehow?
I noticed when I tried to open the file using load from another computer, it read in as a string. Maybe I'm not writing the data frame properly? I used:
save(df.full, file = "full.RData")
save(df.raw, file = "raw.RData")
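The "read in as a string" behaviour can be reproduced with base R alone: `load()` restores objects under the names they were saved with and returns only those names as a character vector. The sketch below uses a small stand-in data frame and a temporary file. Only the object name `df.full` and the file name `full.RData` come from the question.

```r
# Stand-in data frame; the real df.full is not shown in the question
df.full <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

rdata_file <- file.path(tempdir(), "full.RData")
save(df.full, file = rdata_file)
rm(df.full)

# load() recreates the saved objects and returns their names, not the data
loaded_names <- load(rdata_file)
print(loaded_names)     # "df.full"
is.data.frame(df.full)  # the data frame is back under its saved name
```

The same naming applies to `data()`: a file called full.RData is listed as data set `full`, but loading it creates an object called `df.full`. Saving the objects under the names `full` and `raw` would make the data set names and object names agree.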
