How can I write a PED file from R for EPACTS - r

Is there a package which allows me to write a .ped file, with an appropriate header, from my R dataset for use with EPACTS?
Googling only turns up ways to read such a file, not write one.

A web search reveals that there is no tool to do this. You may want to consider using VCF format, as EPACTS seems to accept this:
http://genome.sph.umich.edu/wiki/EPACTS#VCF_file_for_Genotypes
You can convert PED to VCF using plink like so:
plink --file prefix --recode vcf --out prefix
You may need to fiddle with additional options to get it to produce the output you want; see https://www.cog-genomics.org/plink2/data#recode, specifically:
The 'vcf', 'vcf-fid', and 'vcf-iid' modifiers result in production of a
VCFv4.2 file. 'vcf-fid' and 'vcf-iid' cause family IDs and within-family IDs
respectively to be used for the sample IDs in the last header row, while
'vcf' merges both IDs and puts an underscore between them (in this case, a
warning will be given if an ID already contains an underscore).
If the 'bgz' modifier is added, the VCF file is block-gzipped. (Gzipping
of other --recode output files is not currently supported.)
The A2 allele is saved as the reference and normally flagged as not
based on a real reference genome ('PR' INFO field value). When it is
important for reference alleles to be correct, you'll usually also want to
include --a2-allele and --real-ref-alleles in your command.

EPACTS needs both a VCF and PED file as input for association analysis. Unlike the PED file described in the PLINK documentation, the PED file used in EPACTS does not contain genotype data. Its purpose is to hold your phenotype data and covariates, and it needs a .ped extension to be recognized by EPACTS.
To export a data frame from R as a PED file, you just need to write it to a file with a .ped extension; you can use the following command:
write.table(df, "filename.ped", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)
EPACTS also requires that the header line containing the column names be commented out. I usually just do this step manually since adding in the '#' is very quick, and I always open my file to check it anyway. Alternatively you could set col.names=F and use a .dat file as shown in the EPACTS documentation here: https://genome.sph.umich.edu/wiki/EPACTS#PED_file_for_Phenotypes_and_Covariates
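Or, if you'd like to keep the .ped header, here is a minimal sketch that adds the '#' directly from R (df and pheno.ped are placeholders for your data frame and output file):
# Write the column names as a single '#'-prefixed header line
header <- paste0("#", paste(colnames(df), collapse = "\t"))
writeLines(header, "pheno.ped")
# Append the data rows without a header
write.table(df, "pheno.ped", sep = "\t", row.names = FALSE,
            col.names = FALSE, quote = FALSE, append = TRUE)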

Related

How can I read a semi-colon delimited log file in R?

I am currently working with a raw data set that, when downloaded from our device, is output as a log file with values delimited by semi-colons.
I am simply trying to load this data into R so I can put it into a data frame and analyze it from there. However, as it is a log file, I can't use read_csv or read_delim. When I use read_log, there is no argument where I can define the delimiter, and as such my columns are being misread and I am receiving error messages, since R does not recognize ';' as a delimiter in the file.
I have been unable to find any other instances of people using delimited log files with R, but I am trying to make the code work before I resign myself to uploading the data into Excel (I don't want to do this, both because the files have a lot of associated data and because my computer runs Excel very slowly). Does anyone have suggestions for functions I could use to load a semi-colon-delimited log file?
Thank you!
You could use data.table::fread(). fread automatically recognizes most delimiters very reliably and reads most file types (*.csv, *.txt, etc.).
If you're facing a situation where it doesn't guess the right delimiter, you can set it explicitly with fread(your_file, sep=";"), but that shouldn't be necessary in your case.
I've created a file named your_file, without any extension, containing:
Text1;Text2;Text3;Text4
And then imported it into R:
library(data.table)
df = fread("your_file", header=FALSE)
Output:
> df
V1 V2 V3 V4
1: Text1 Text2 Text3 Text4
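As an aside, base R can read semicolon-delimited files too; read.csv2() uses ';' as its default separator, so something like the following should give a similar result:
df <- read.csv2("your_file", header = FALSE)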

Vcorpus Rstudio combining .txt files

I have a directory of .txt files and need to combine them into one file, where each file would be a separate line. I tried:
new_corpus <-VCorpus(DirSource("Downloads/data/"))
The data is in the directory, but I get an error:
Error in DirSource(directory = "Downloads/data/") :
empty directory
This is a bit basic, but I was only given this information on how to create the corpus. What I need to do is take these files and create one factor that holds the .txt content and another with an ID, in the form of:
ID .txt
ID .txt
.......
EDIT: To clarify, following emilliman5's comment:
I need both a data frame and a corpus. The example I am working from used a csv file with the data already tagged for a Naive Bayes problem. I can work through that example and all the steps. The data I have is in a different format: it is 2 directories (/ham and /spam) of short .txt files. I was able to create a corpus when I changed my command to:
new_corpus <- VCorpus(DirSource("~/Downloads/data/"))
I have cleaned the raw data and can make a DTM, but at the end I will need to create a crossTable with the labels spam and ham. I do not understand how to insert that information into the corpus.
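One way to attach the labels, sketched under the assumption that the two directories are ~/Downloads/data/ham and ~/Downloads/data/spam as described: keep each document's text and label together in a data frame, build the corpus from the text column, and keep the label column for the cross table later.
library(tm)
ham_files  <- list.files("~/Downloads/data/ham",  full.names = TRUE)
spam_files <- list.files("~/Downloads/data/spam", full.names = TRUE)
# Read each file into a single string
texts <- vapply(c(ham_files, spam_files),
                function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                character(1))
# Label each document by the directory it came from
labels <- factor(rep(c("ham", "spam"),
                     c(length(ham_files), length(spam_files))))
df <- data.frame(ID = seq_along(texts), text = texts, label = labels,
                 stringsAsFactors = FALSE)
new_corpus <- VCorpus(VectorSource(df$text))
# df$label is then available for the spam/ham cross table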

How to reference a file path from another file in r

I have a series of R scripts which all do very different things to the same .txt file. For various reasons I don't want to combine them into a single file. The name of the input text file changes from time to time, which means I have to change the file path in all the scripts by hand. Is there a way of telling R to look for the path name in a text file, so I only have to change the text file rather than all the scripts? In other words, going from:
df <- read.delim("~/Desktop/Sequ/Blabla.txt", header=TRUE)
to
df <- get the path to read the text file from here
OK. Sorted this one in about 5 seconds. Oops
just use source("myfile.txt") (the text file must contain the path as a quoted R string, and source() returns a list, so you need its $value element)
as in:
df <- read.delim(source("~/Desktop/Sequ/Plots/Path.txt")$value)
Easy
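A simpler variant, if you prefer: store the bare path (no quotes) as the only line of Path.txt and read it with readLines() instead (file locations here are just illustrative):
path <- readLines("~/Desktop/Sequ/Plots/Path.txt", n = 1)
df <- read.delim(path, header = TRUE)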

How to read a file into a data frame and print some columns in R

I have a question about reading a file into a data frame using R.
I don't understand "getwd" and "setwd"; must we call these before reading the files?
I also need to print some of the columns in the data frame, and only 1 to 30. How do I do this?
Kind regards
getwd tells you what your current working directory is. setwd is used to change your working directory to a specified path. See the relevant documentation here or by typing ?getwd or ?setwd in your R console.
Using these allows you to shorten what you type into, e.g., read.csv, by specifying just a filename without its full path, like:
setwd('C:/Users/Me/Documents')
read.csv('myfile.csv')
instead of:
read.csv('C:/Users/Me/Documents/myfile.csv')
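For the second part of the question, subset the data frame after reading it; for example (the file and column names here are made up):
df <- read.csv('myfile.csv')
df[1:30, ]                    # first 30 rows, all columns
df[1:30, c('col1', 'col2')]   # first 30 rows of selected columns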

X12 seasonal adjustment program from census, problem with input file extensions

I downloaded the X12 seasonal adjustment program located here: http://www.census.gov/srd/www/x12a/x12downv03_pc.html
I followed the setup and got the settings correct. When I go to select a file to input, I have four options for file extensions to import, which are ".spc", ".mta", ".dta" and "."
The problem is that I have data in Excel, and I have searched extensively through search engines but cannot figure out a way to get data from Excel into one of these formats so that I can do a seasonal adjustment on my data. Thanks
ADDED: After converting to a .dta file (using R, thanks to the comments left below), it looks like the program also makes you convert it to a .spc file. Anyone have a lead on how to do this? Thanks
My first reaction is to:
(1) export the data from excel in something simple like csv.
(2) import that data into R
(3) use the R library "foreign" to export the data in .dta format.
So with the file "test.csv" containing:
V1,V2
1,2
3,4
5,6
you could do the following to produce "test.dta":
library(foreign)
testdata <- read.csv("test.csv")
write.dta(testdata,"test.dta")
Voila, data in .dta format. Would this work for what you have?
I've only ever used the command-line version of X12, but it sounds like you may be using the Windows interface instead? If so, the following might not be entirely accurate, but it should be close enough (I hope!).
The .dta and .mta files you refer to are just metafiles containing text lists of either spec files or data files to be processed; in particular the .dta files X12 uses are NOT Stata data format files like those produced by Nathan's R-based answer. It's probably best to ignore using metafiles until you are comfortable enough using the software to adjust a single time series.
You can export your data in tab separated variable format (year month/quarter value) without headings and use that as your data file. You can also use a simple list of data values separated by spaces, tabs, or newlines and then tell X12ARIMA what the start and end dates of the series are in the .spc file.
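For instance, a quick sketch of writing that kind of data file from R (the data frame and filename are invented for illustration):
# year, period, value; no header row
ts_data <- data.frame(year  = rep(2001:2002, each = 12),
                      month = rep(1:12, times = 2),
                      value = rnorm(24, mean = 100, sd = 5))
write.table(ts_data, "series.dat", sep = "\t",
            row.names = FALSE, col.names = FALSE)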
The .spc file doesn't contain the input data, it's a specification file telling X12 where to find the data file and how you want those data to be processed -- you'll have to write them yourself or create them in Win X-12.
Ideally you should write a separate .spc file for each time series to be adjusted; while you can write a .spc file which invokes many of X12's autoselection and identification procedures, it's usually not a good idea to treat the process as a black box, and a bit of manual intervention in the .spc is often necessary to get a good quality adjustment (and essential if there's a seasonal break involved). I find it helpful to start with a fairly generic skeleton .spc file suitable for your computing environment to begin with and then tweak it from there as appropriate for each series.
If you really want to use a single .spc file to adjust multiple series, then you can provide a list of data files in a .dta file and a single .spc file instructing X12ARIMA how to adjust them, but take care to ensure this is appropriate for your data!
The "Getting started with X-12-ARIMA input files on your PC" document on that site is probably a good place to start reading, but you'll probably end up having to consult the complete reference documentation (in particular Chapters 3 and 7) as well.
Edit postscript:
The UK Office for National Statistics have a draft of their guide to seasonal adjustment with X12ARIMA available online here (archive.org), and it is worth a look. It's a good bit easier to work through than the Census Bureau documentation.
Ryan,
This is not elegant, but it might work for you. In this example I'm trying to replicate the spec file from the Example 3.2 in the Census documentation.
Concatenate the data into one text string, then save this single text string using the MS-DOS (TXT) format under the SAVE AS command. To make the text string, first insert two cells above your column header, and in the second one type the following text:
series{title=
Next, insert double quotation marks before and after the text in your column header, like this:
"Monthly Retail Sales of Household Appliance Stores"
Directly below the last data row, insert rows of texts that list the model specifications, like the following:
)
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
So you should have something like the following:
<blank row>
series{title=
"Monthly Retail Sales of Household Appliance Stores"
530
529
...
592
590
start= 1972.jul}
transform{function = log}
regression{variables=td}
identify{diff=(0,1) sdiff=(0,1)}
For the next instructions I am assuming that the text series{title= appears in cell A2 and that cell B1 is empty. In cell B2, insert the following:
=CONCATENATE(B1,A2," ")
Then copy this formula into every cell down the column to concatenate all of the text in column A into a single cell at the end of column B. Finally, copy the final cell to a new spreadsheet's cell A1 using PASTE SPECIAL/VALUE, and save this spreadsheet using SAVE AS: TXT (MS-DOS), but change the extension to ".spc".
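If you'd rather avoid the spreadsheet steps entirely, a rough R equivalent that writes the same block shown above with writeLines() (the values vector is a stand-in for your full series; check the output against Example 3.2 in the Census documentation):
values <- c(530, 529, 592, 590)  # replace with your full series, in order
spc <- c("series{title=",
         '"Monthly Retail Sales of Household Appliance Stores"',
         values,
         "start= 1972.jul}",
         "transform{function = log}",
         "regression{variables=td}",
         "identify{diff=(0,1) sdiff=(0,1)}")
writeLines(spc, "example.spc")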
Good luck (and from the little I read of the Census documentation - you'll need it).
