I have a series of R scripts which all do very different things to the same .txt file. For various reasons I don't want to combine them into a single file. The name of the input text file changes from time to time, which means I have to change the file path in all the scripts by hand. Is there a way of telling R to look up the path name in a text file, so that I only have to change the text file rather than all the scripts? In other words, going from:
df <- read.delim("~/Desktop/Sequ/Blabla.txt", header=TRUE)
to
df <- get the path to read the text file from here
OK. Sorted this one in about 5 seconds. Oops
just use source("myfile.txt") and take the $value element of what it returns (the file must contain the quoted path as its only expression)
as in:
df <- read.delim(source("~/Desktop/Sequ/Plots/Path.txt")$value)
Easy
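If Path.txt instead just holds the bare path on its own line (no quotes), readLines() is a simpler sketch that avoids putting R syntax in the file:
path <- readLines("~/Desktop/Sequ/Plots/Path.txt", n = 1)
df <- read.delim(path, header = TRUE)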
I have an Excel file containing a large number of hyperlinks, and I want to write a program that extracts the URLs and matches them with the displayed text. I can import the URLs with the solution to a previous question, which uses the following code:
library(XML)
# my.excel.file holds the path to the .xlsx workbook, e.g. set earlier with
# my.excel.file <- "myfile.xlsx"
# rename a copy of the file to .zip
my.zip.file <- sub("xlsx", "zip", my.excel.file)
file.copy(from = my.excel.file, to = my.zip.file)
# unzip the file
unzip(my.zip.file)
# unzipping produces a bunch of files which we can read using the XML package
# assume sheet1 has our data
xml <- xmlParse("xl/worksheets/sheet1.xml")
# finally grab the hyperlinks' display text (attributes are addressed with @)
hyperlinks <- xpathApply(xml, "//x:hyperlink/@display", namespaces = "x")
However, this ignores rows without any links, so the imported dataset is several thousand rows shorter than it should be. I can get the displayed text with read.xlsx, but I don't know how to match it with the URLs. I've tried looking for ways to find out which rows have links, or to change the code so it adds NAs in the right places, but I haven't had any success.
Having the same use case today, I dug around a bit and wrote an R function to extract all hyperlinks beneath the cells/text. My code snippet is posted here: Extract hyperlink from Excel file in R, which I believe is a similar topic.
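In the same spirit, here is a minimal sketch (not the linked function; it assumes the sheet1.xml extracted above) that keys each hyperlink to its cell reference, so rows without links come out as NA:
library(XML)

xml <- xmlParse("xl/worksheets/sheet1.xml")
# one data frame row per <hyperlink> node: cell reference plus display text
links <- do.call(rbind, xpathApply(xml, "//x:hyperlink",
  function(node) data.frame(ref     = xmlGetAttr(node, "ref"),
                            display = xmlGetAttr(node, "display", default = NA),
                            stringsAsFactors = FALSE),
  namespaces = "x"))
# the row number is the numeric part of the cell reference, e.g. "B12" -> 12
links$row <- as.integer(gsub("[A-Z]+", "", links$ref))
# 'full' is hypothetical here: the whole sheet read with read.xlsx; the +1
# skips the header row, and rows with no hyperlink get NA
# full$url <- links$display[match(seq_len(nrow(full)) + 1, links$row)]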
I have five files that are all part of the same thing, and intermingled in them are occasional exports of CSV files. Currently I have the full path written out in each of them, but I had to change a folder and then go through all of the files, find every reference, and change them too.
Is there a way in R to store text, like "C:\R\Folder\New", so that I can just reference an object or something and then only have to change the path once?
In SAS I'd use a %let statement.
Maybe something like??
path <- "C:/R/Folder/New"  # forward slashes; in a quoted string "\R" is an invalid escape
And then:
write.table(grouped_ageck, file="??PATH??\\qc\\ageck.csv", sep=",", row.names=F)
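Yes - the usual trick is to store the path once and build each file name with file.path(). A sketch (grouped_ageck is from the question above):
path <- "C:/R/Folder/New"

write.table(grouped_ageck,
            file = file.path(path, "qc", "ageck.csv"),
            sep = ",", row.names = FALSE)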
I've got a folder full of .doc files and I want to merge them all into R to create a dataframe with filename as one column and content as another column (which would include all content from the .doc file).
Is this even possible? If so, could you provide me with an overview of how to go about doing this?
I tried starting out by converting all the files to .txt format with readtext(), using the following code:
DATA_DIR <- system.file("C:/Users/MyFiles/Desktop")
readtext(paste0(DATA_DIR, "/files/*.doc"))
I also tried:
setwd("C:/Users/My Files/Desktop")
I couldn't get either to work (the output from R was Error in list_files(file, ignore_missing, TRUE, verbosity) : File '' does not exist.), but I'm not sure if this is necessary for what I want to do.
Sorry that this is quite vague; I guess I want to know first and foremost if what I want to do can be done. Many thanks!
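It can be done. One gotcha in the attempt above: system.file() only locates files inside installed R packages and returns "" for anything else, which is why readtext was handed an empty path (File '' does not exist). A minimal sketch that passes the directory straight to readtext(), which can read .doc files directly (no .txt conversion needed; the path is the one from the question):
library(readtext)

# returns a data frame with doc_id (the file name) and text (the content)
df <- readtext("C:/Users/MyFiles/Desktop/files/*.doc")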
Is there a package which allows me to write a .ped file, with an appropriate header, from my R dataset to use with EPACTS?
I can't find one by googling; searches only turn up ways to read the format.
A web search suggests that there is no tool to do this. You may want to consider using the VCF format instead, as EPACTS seems to accept it:
http://genome.sph.umich.edu/wiki/EPACTS#VCF_file_for_Genotypes
You can convert PED to VCF using plink like so:
plink --file prefix --recode vcf --out prefix
You may need to fiddle with additional options to get it to produce the output you want; see https://www.cog-genomics.org/plink2/data#recode, specifically:
The 'vcf', 'vcf-fid', and 'vcf-iid' modifiers result in production of a
VCFv4.2 file. 'vcf-fid' and 'vcf-iid' cause family IDs and within-family IDs
respectively to be used for the sample IDs in the last header row, while
'vcf' merges both IDs and puts an underscore between them (in this case, a
warning will be given if an ID already contains an underscore).
If the 'bgz' modifier is added, the VCF file is block-gzipped. (Gzipping
of other --recode output files is not currently supported.)
The A2 allele is saved as the reference and normally flagged as not
based on a real reference genome ('PR' INFO field value). When it is
important for reference alleles to be correct, you'll usually also want to
include --a2-allele and --real-ref-alleles in your command.
EPACTS needs both a VCF and PED file as input for association analysis. Unlike the PED file described in the PLINK documentation, the PED file used in EPACTS does not contain genotype data. Its purpose is to hold your phenotype data and covariates, and it needs a .ped extension to be recognized by EPACTS.
To export a data frame from R as a PED file, you just need to write it out with the .ped extension; you can use the following command:
write.table(df, "filename.ped", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)
EPACTS also requires that the header line containing the column names be commented out. I usually do this step manually, since adding the '#' is very quick and I always open my file to check it anyway. Alternatively, you could set col.names=FALSE and use a .dat file, as shown in the EPACTS documentation here: https://genome.sph.umich.edu/wiki/EPACTS#PED_file_for_Phenotypes_and_Covariates
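If you would rather automate the commented header, a small sketch (df and the file name are placeholders from above):
con <- file("filename.ped", "w")
# write the column names as a '#'-prefixed comment line, then the data rows
writeLines(paste0("#", paste(names(df), collapse = "\t")), con)
write.table(df, con, sep = "\t", row.names = FALSE, col.names = FALSE,
            quote = FALSE)
close(con)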
I've got a tab-delimited text file, which I generated by pasting a table from an Excel sheet into a text file, and I'm trying to read the data into R on a Mac. I get the following output:
system.file("path/to/file.txt")
[1]""
no lines available in input
If I try loading the text file using the 'Source script or load data in R' button, I get:
1: col1 col2
^
/path/to/file: unexpected symbol
I thought this might be the tabs but then I added
sep='\t'
to my read.table line and that still doesn't work - any suggestions?
The data is in the format of a matrix whose row names are in the first column; the header row has no entry above them.
The easiest way I've found to figure out this path stuff is to mess about with getwd() and setwd(). First, type
getwd()
in your R terminal. This will give your working directory, and it also gives you an idea of how to specify the path to your file! The function setwd() sets the working directory.
Once you have the correct path in the correct format, you just need to use:
##For csv files
read.csv(....)
##For tab-delimited files
read.delim(....)
##For other files - you can set `sep = "\t"` if you wish.
read.table(....)
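Putting it together for this particular file, a sketch (the folder name is hypothetical; note that system.file() is only for files shipped inside packages, which is why it returned ""). With row names in the first column, you can say so explicitly with row.names = 1:
setwd("~/Desktop/data")   # hypothetical folder containing file.txt
df <- read.delim("file.txt", header = TRUE, row.names = 1)
str(df)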