r download url file with partial name - r

I am programming in R. I need to download a set of files from an http: address. The naming format of the file refers to a date/time period but also contains additional numbering that is not recognizable. For example for the file below the first set of numbers refers to the date 2014/10/24 at 05:10am but the second batch of numbers is not recognizable. All files on the webpage follow this standard format.
http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240510_0000000258279329.zip
My question is: How do I download the file with only partial name information?
For example if I wanted to download the file relating to the 6:30 time period I know that the url prefix is as below, but would not know the numbers that follow after: http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240630_??????????????.zip

You're actually in luck, because you have a directory listing. Essentially, you have to download the list of links and then grep through them. Here's how you would go about doing that.
library(XML)
url <- "http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/"
parsed <- htmlParse(url)
links <- xpathSApply(parsed, "//a/@href")
Now you have a list of URLs that you can search through and choose the one that's appropriate.
Hint: grep("pattern",links)
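The hint can be fleshed out roughly as follows. This is only a sketch: the `links` vector is a hard-coded stand-in for what the XPath query returns (its second entry is invented for illustration), and `stamp`, `pattern`, `hits`, `base_url` and `file_url` are hypothetical names.

```r
# Stand-in for the hrefs scraped from the directory listing.
# The first entry is the file from the question; the second is invented.
links <- c(
  "/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240510_0000000258279329.zip",
  "/REPORTS/CURRENT/MCCDispatch/PUBLIC_MCCDISPATCH_201410240630_0000000000000000.zip"
)

# The part of the name that is known in advance: the date and time period.
stamp <- "201410240630"

# Match the known prefix, then any run of digits, then ".zip".
pattern <- paste0("PUBLIC_MCCDISPATCH_", stamp, "_[0-9]+\\.zip$")
hits <- grep(pattern, links, value = TRUE)

# Rebuild the full URL from the bare file name, so this works whether the
# listing returns relative or absolute hrefs.
base_url <- "http://www.nemweb.com.au/REPORTS/CURRENT/MCCDispatch/"
file_url <- paste0(base_url, basename(hits[1]))

# Uncomment to download; mode = "wb" because a zip is a binary file.
# download.file(file_url, destfile = basename(file_url), mode = "wb")
```

Using `value = TRUE` makes grep() return the matching strings rather than their positions, which saves an indexing step.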

Related

How to download json data from multiple url's in r

I would like to download multiple data files in R from multiple URLs, where only the number changes. I successfully downloaded one file with the code below, but I need to download a series of numbers (e.g. 29208, 49510, 54604, 62759, 62760, 7002, 38175), each of which has to replace 29208 in the URL. I am a total newbie, and even though I have read and seen some examples, I cannot seem to write the right code.
install.packages("jsonlite")
library(jsonlite)
df <- fromJSON('https://api.euroinvestor.dk/instruments/29208/closeprices?fromDate=1970-1-1')
You cannot do this in a single call. Instead, use a for loop: on each iteration, change the number in the URL, looping over all the values that you want to use.
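A loop along those lines might look like the sketch below. The IDs and URL template come from the question; `ids`, `urls` and the helper `fetch_all` are hypothetical names, and the actual fetch is left commented out because it needs network access and the jsonlite package.

```r
# Instrument IDs from the question.
ids <- c(29208, 49510, 54604, 62759, 62760, 7002, 38175)

# Build one URL per ID; sprintf() is vectorised over `ids`.
urls <- sprintf(
  "https://api.euroinvestor.dk/instruments/%d/closeprices?fromDate=1970-1-1",
  ids
)
names(urls) <- as.character(ids)

# Hypothetical helper: fetch every URL and return a list named by ID.
fetch_all <- function(urls) {
  results <- vector("list", length(urls))
  names(results) <- names(urls)
  for (i in seq_along(urls)) {
    results[[i]] <- jsonlite::fromJSON(urls[[i]])
  }
  results
}

# Uncomment to run (needs network access and jsonlite installed).
# prices <- fetch_all(urls)
# prices[["29208"]]
```

Keeping the results in a list named by ID makes it easy to look up one instrument later or to combine them all afterwards.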

Import information from .doc files into R

I've got a folder full of .doc files and I want to read them all into R to create a data frame with the filename as one column and the content as another column (which would include all content from the .doc file).
Is this even possible? If so, could you provide me with an overview of how to go about doing this?
I tried starting out by converting all the files to .txt format using readtext() using the following code:
DATA_DIR <- system.file("C:/Users/MyFiles/Desktop")
readtext(paste0(DATA_DIR, "/files/*.doc"))
I also tried:
setwd("C:/Users/My Files/Desktop")
I couldn't get either to work (output from R was Error in list_files(file, ignore_missing, TRUE, verbosity) : File '' does not exist.) but I'm not sure if this is necessary for what I want to do.
Sorry that this is quite vague; I guess I want to know first and foremost if what I want to do can be done. Many thanks!
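One likely cause of the error is that system.file() only looks for files inside installed packages, so for an ordinary folder it returns an empty string; passing the folder path directly as a string avoids that. The sketch below shows one way the filename/content data frame could be assembled. The helper `docs_to_df` and its `reader` argument are hypothetical, and the commented-out reader assumes that readtext can parse .doc files on your system, which may depend on additional packages being installed.

```r
# Hypothetical helper: build a data frame with one row per .doc file.
# `reader` is any function that takes a file path and returns the file's
# text as a single string.
docs_to_df <- function(dir, reader) {
  files <- list.files(dir, pattern = "\\.doc$", full.names = TRUE)
  data.frame(
    filename = basename(files),
    content = vapply(files, reader, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Possible use, with the folder given as a plain string rather than via
# system.file(). Assumes readtext handles .doc on your machine.
# docs <- docs_to_df(
#   "C:/Users/MyFiles/Desktop/files",
#   function(p) paste(readtext::readtext(p)$text, collapse = "\n")
# )
```

Passing the reader in as an argument keeps the folder-walking logic separate from the .doc parsing, so a different parser can be swapped in if readtext does not work for these files.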

Importing to R an Excel file saved as web-page

I would like to open an Excel file saved as webpage using R and I keep getting error messages.
The desired steps are:
1) Upload the file into RStudio
2) Change the format into a data frame / tibble
3) Save the file as an xls
The message I get when I open the file in Excel is that the file format (excel webpage format) and extension format (xls) differ. I have tried the steps in this answer, but to no avail. I would be grateful for any help!
I don't expect anybody will be able to give you a definitive answer without a link to the actual file. The complication is that many services will write files as .xls or .xlsx without them being valid Excel format. This is done because Excel is so common and some non-technical people feel more confident working with Excel files than a csv file. Now, the files will have been stored in a format that Excel can deal with (hence your warning message), but R's libraries are more strict and don't see the actual file type they were expecting, so they fail.
That said, the below steps worked for me when I last encountered this problem. A service was outputting .xls files which were actually just HTML tables saved with an .xls file extension.
1) Download the file to work with it locally. You can script this of course, e.g. with download.file(), but this step helps eliminate other errors involved in working directly with a webpage or connection.
2) Load the full file with readHTMLTable() from the XML package
library(XML)
dTemp = readHTMLTable([filename], stringsAsFactors = FALSE)
This will return a list of dataframes. Your result set will quite likely be the second element or later (see ?readHTMLTable for an example with explanation). You will probably need to experiment here and explore the list structure as it may have nested lists.
3) Extract the relevant list element, e.g.
df = dTemp[[2]]
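When extracting the element, note that single brackets return a sub-list while double brackets return the data frame itself. The sketch below illustrates this with a made-up `dTemp` standing in for the output of readHTMLTable(); the variable names `as_list`, `n_rows` and `biggest` are hypothetical, and picking the table with the most rows is just one heuristic for finding the result set.

```r
# Stand-in for the list returned by readHTMLTable(): two small data frames.
dTemp <- list(
  data.frame(note = "header table"),
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
)

# Quick overview of what the list contains.
str(dTemp, max.level = 1)

as_list <- dTemp[2]    # single brackets: still a list, of length 1
df <- dTemp[[2]]       # double brackets: the data frame itself

# One heuristic for finding the result set: take the table with most rows.
n_rows <- vapply(dTemp, nrow, integer(1))
biggest <- dTemp[[which.max(n_rows)]]
```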
You also mention writing out the final data frame as an xls file which suggests you want the old-style format. I would suggest the package WriteXLS for this purpose.
I seriously doubt the Excel file is 'saved as a web page'. I'm pretty sure the file just sits on a server and all you have to do is go fetch it. Some kinds of files (in particular Excel and h5) are binary rather than text files. Downloading them needs an added setting to tell R that the file is binary and should be handled appropriately.
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download.file(url=myurl, destfile="localcopy.xlsx", mode="wb")
Or use the downloader package and try something like this:
library(downloader)
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download(myurl, destfile="localcopy.xlsx", mode="wb")

How to reference a file path from another file in r

I have a series of R scripts which all do very different things to the same .txt file. For various reasons I don't want to combine them into a single file. The name of the input text file changes from time to time, which means I have to change the file path in all the scripts by hand. Is there a way of telling R to look for the path name in a text file, so I only have to change the text file rather than all the scripts? In other words, going from:
df <- read.delim("~/Desktop/Sequ/Blabla.txt", header=TRUE)
to
df <- get the path to read the text file from here
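OK. Sorted this one in about 5 seconds. Oops.
Just use source("myfile.txt"), where the text file contains the path as a quoted string. source() returns a list, and its value element holds the path,
as in:
df <- read.delim(source("~/Desktop/Sequ/Plots/Path.txt")$value)
Easy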
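An alternative that avoids evaluating the text file as R code is to read the path with readLines(). The sketch below is self-contained: it creates stand-in files in a temporary folder, and `data_file`, `path_file` and `input_path` are hypothetical names. In real use only the last two lines would appear in each script, with `path_file` pointing at the shared text file.

```r
# Stand-ins for the real files, created in a temporary folder.
data_file <- file.path(tempdir(), "Blabla.txt")
write.table(data.frame(a = 1:2, b = c("x", "y")), data_file,
            sep = "\t", row.names = FALSE, quote = FALSE)

# Path.txt holds a single line: the path of the data file (no quotes).
path_file <- file.path(tempdir(), "Path.txt")
writeLines(data_file, path_file)

# In each script: read the path from the text file, then read the data.
input_path <- trimws(readLines(path_file, n = 1))
df <- read.delim(input_path, header = TRUE)
```

With this layout the text file holds the bare path with no quotes, and changing the input only means editing that one line.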

Request user to identify file location and auto-extract variable name from file location in R

I am EXTREMELY new to R, and programming in general, so thank you for your patience.
I am trying to write a script which reads values from a .txt file and after some manipulation plots the results. I have two questions which are somewhat coupled.
First, is there a function which asks the user to identify the location of a file? i.e. User runs script. Script opens up file navigation prompt and requests user to navigate to and select relevant file.
Currently, I have to manually identify the file and location in R. e.g.
spectra.raw <- read.table("C:/Users/.../file1.txt", row.names=NULL, header = TRUE)
I'd rather have the user identify the file location each time the script is run. This will be used by non-tech people, and I don't trust them to copy/paste file locations into R.
The second question I've been struggling with is, is it possible to create a variable name based off the file selected? For example, if the user selects "file1.txt" I'd like R to assign the output of read.table() to a variable named "file1.raw" much like the above "spectra.raw"
If it helps, all the file names will have the exact same number of characters, so if it's possible to select the last say 5 characters from the file location, that would work.
Thank you very much, and please excuse my ignorance.
See file.choose. Though I believe it behaves slightly differently on different platforms, so beware of that.
See assign, i.e. assign("fileName",value). You'll want to parse the file path that file.choose spits back using string manipulation functions like substr or strsplit.
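Put together, the two suggestions might look like the sketch below. It uses a hypothetical hard-coded path so that it runs unattended; in an interactive session `path` would come from file.choose(). The names `stem`, `var_name`, `value` and `spectra` are illustrative, and the small data frame stands in for the output of read.table().

```r
# In an interactive session this would be: path <- file.choose()
# A hypothetical path is used here so the sketch runs unattended.
path <- "C:/Users/someone/Documents/file1.txt"

# "file1.txt" -> "file1" -> "file1.raw"
stem <- tools::file_path_sans_ext(basename(path))
var_name <- paste0(stem, ".raw")

# Stand-in for read.table(path, row.names = NULL, header = TRUE).
value <- data.frame(x = 1:3)

# Option 1: create a variable with the computed name.
assign(var_name, value)

# Option 2: store the result in a named list instead.
spectra <- list()
spectra[[stem]] <- value
```

The named list is often easier to work with afterwards, since the rest of the script can refer to `spectra[[stem]]` without needing get() to look up a variable whose name is only known at run time.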
Try
file.choose
I think it can do what you want.
For example,
myfile <- file.choose()
Enter file name: adataset.Rdata
load(myfile)
myfile contains the name of the file so you don't have to do anything special.
