Convert bed file to vcf with bed2vcf function - r

I am trying to convert .bed files to vcf by using the function bed2vcf from bedr R package.
I tried the following code:
cromXvcf <-
bed2vcf("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed",
filename = cromXmerged, zero.based = 1, header = NULL, fasta = "/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
and it throws the following error:
VALIDATE REGIONS * Checking input type... FAIL ERROR: Not sure what
the input format is! Error in is.valid.region(x) :
Can anybody tell what could be wrong? Any other suggestion of how could I do this conversion without using Perl?

I solved it by loading bed file to variable and changing datatypes for column 1 and 4.
Afterwards I also checked that my reference has chromosomes as chr1..22 not just 1...22.
The other thing I checked that my bed file is sorted.
x <- read.table("cromXmerged2_pruned_removed_sex_mr_hh_sex_pop.bed")
x$V1 <- as.character(x$V1)
x$V4 <- as.character(x$V4)
sapply(x, mode)
y <- bed2vcf(x, zero.based=True, header=NULL, fasta="/media/iriel/Cosmos/Doctorado/Proyectos/Cromosoma X/Bases dedatos/human_g1k_v37.fasta")
And it worked fine for me.

Related

Write stata dataframe in R [duplicate]

I am getting an error while converting R file into Stata format. I am able to convert the numbers into
Stata file but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have few strings like location, name etc. which have missing values which is probably causing this problem. Is there a way to handle this? .
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function allows to save already in Stata 15/16 MP file format (experimental), which is the next update after Stata 13 format.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

Problem with XLS files with R's package readxl

I need to read a XLS file in R, but I'm having a problem regarding the way my file is generated and the R function readxl. I do not have this issue with Python, and this is my hope that it's possible to solve this problem inside R.
An application we use at my company exports reports in XLS format (not XLSX). This report is generated daily. What I need is to sum the total value of the rows in each file, in order to create a new report containing each day followed by this total value.
When I try to read these files in R using the readxl package, the program returns this error:
Erro: Can't subset columns that don't exist.
x Location 5 doesn't exist.
i There are only 0 columns.
Run rlang::last_error() to see where the error occurred.
Now, the weird thing is that, when I open the XLS file on Excel before running my script, R is able to run properly.
I guesses this was an error caused by something like the file only being completed when I open it... but the same python script does give me the correct result.
I am now assuming this is a bug in the readxl package. Is there another package I could use to run XLS (and not XLSX)? One that does not depend on Java installed on my computer, I mean.
my readxl script:
if (!require("readxl")) {install.packages("readxl"); library("readxl")}
"%,%" <- function(x,y) paste0(x,"\\",y)
year = "2021"
month = "Aug"
column = 5 # VL_COVAR
path <- "F:\\variancia" %,% year %,% month
tiposDF = c("date","numeric","list","numeric","numeric","numeric","list")
file.names <- dir(path, pattern =".xls")
vari <- c()
for (i in 1:length(file.names)){
file <- paste(path,sep="\\",file.names[i])
print(paste("Reading ", file))
dados <- read_excel(file, col_types = tiposDF)
somaVar <- sum(dados[column])
vari <- append(vari,c(somaVar))
}
vari
file <- paste(path,sep="\\",'Covariância.xls_02082021.xls')
print(paste("Reading ", file))
dados <- read_excel(file, col_types = tiposDF)
somaVar <- sum(dados[column])
vari <- append(vari,c(somaVar))
x <- import(file)
View(x)
Thanks everyone!

how to modify multiple cells in a loop in R without losing formatting?

Disclaimer: R noob here!
On a high level, I am trying to convert pdf to xls ;) The pdf is well formatted, no surprises expected. At one point I am trying to modify multiple cells using xlsx package in a loop. I've got a variable list of 3 - 5 elements and want to change the content of 7th column in .xls file, starting with 14th row. The list is coming from a PDF file (src.pdf below).
Here's the code:
library(xlsx)
library(pdftools)
library(stringr)
library(tabulizer)
library(tidyverse)
# for example data, separately download src.xls from https://file-examples-com.github.io/uploads/2017/02/file_example_XLS_100.xls, i. e. using wget -O src.xls https://file-examples-com.github.io/uploads/2017/02/file_example_XLS_100.xls
src <- xlsx::loadWorkbook(file = "src.xls")
sheets <- getSheets(src)
rows <- getRows(sheets$List1)
cc <- getCells(rows)
pdf_path <- "src.pdf"
# dest <- extract_tables("src.pdf", output="data.frame", area = list(c(163, 315, 217, 459)), guess = FALSE, header = FALSE)
dest <- extract_tables("https://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf", output="data.frame", area=list(c(195,103,376,515)), guess = FALSE, header = FALSE)
#use as
#dest[[c(1,1)]][1]
#dest[[c(1,1)]][2]
#...
row = 0
for (i in 1:length(dest[[1]]$V1))
{
row = i+13
setCellValue(paste0("cc$`",row,".7`"), value = dest[[c(1,1)]][i])
}
This returns
Error in .jcall(cell, "V", "setCellValue", value) :
RcallMethod: cannot determine object class
Any ideas how to use setCellValue in a loop? I am open to using different modules as well, as long as they keep the formatting of the source .xls.
Thank you!

Error while trying to read .data file in R

I am trying to read car.data file at this location - https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data using read.table as below. Tried various solutions listed earlier, but did not work. I am using Windows 8, R version 3.2.3. I can save this file as txt file and then read, but not able to read the .data file directly from URL or even after saving using read.table
t <- read.table(
"https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
fileEncoding="UTF-16",
sep = ",",
header=F
)
Here is the error I am getting and is resulting in an empty dataframe with single cell with "?" in it:
Warning messages:
1: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", : invalid input found on input connection 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
2: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", :
incomplete final line found by readTableHeader on 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
Please help!
Don't use read.table when the data is not stored in a table. Data at that link is clearly presented in comma-separated format. Use the RCurl package instead and read the data as CSV:
library(RCurl)
x <- getURL("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data")
y <- read.csv(text = x)
Now y contains your data.
Thanks to cory, here is the solution - just use read.csv directly:
x <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data")

Converting R file to Stata with missing string values

I am getting an error while converting R file into Stata format. I am able to convert the numbers into
Stata file but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have few strings like location, name etc. which have missing values which is probably causing this problem. Is there a way to handle this? .
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
if (is.character(test[[colname]])) {
test[[colname]] <- as.factor(test[[colname]])
}
}
Another is to change the empty strings to something else and change them back in Stata.
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function allows to save already in Stata 15/16 MP file format (experimental), which is the next update after Stata 13 format.
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

Resources