Downloading MCX data using R

I'm trying to download data on margin requirements from the MCX website using R.
However, I am unable to identify the appropriate URL to use to download this data.
The link is here
Files for different dates have seemingly different URLs, for instance:
DailyMargin_20170919223427.csv
DailyMargin_20170919223104.csv
DailyMargin_20170919223039.csv
They seem to be of the form
DailyMargin_2017091922****.csv
(20170919 is the date on which I'm trying to download the data)
My code has the line:
myURL = paste("https://www.mcxindia.com/market-operations/clearing-settlement/daily-margin",
              "DailyMargin_2017091922", "****", ".csv", sep = "")
The **** part seems to be random.

From what I can tell, the remaining **** appears to be a timestamp generated when the data are created by the web page using JavaScript. You will probably not be able to download the file directly, as it will not exist until it is created. That said, you might be able to use a package like rvest to do the scraping for you.
https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
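For instance, a minimal sketch along those lines (assuming the generated CSV link appears in an <a> tag in the served page; if it is injected purely by client-side JavaScript, a headless-browser tool such as RSelenium would be needed instead):
library(rvest)

# Read the daily-margin page and collect every link on it
page  <- read_html("https://www.mcxindia.com/market-operations/clearing-settlement/daily-margin")
links <- html_attr(html_elements(page, "a"), "href")

# Keep the DailyMargin CSV link for the date of interest and download it
# (if the link is relative, prefix it with https://www.mcxindia.com)
csv_link <- links[grepl("DailyMargin_20170919.*\\.csv", links)][1]
margins  <- read.csv(csv_link)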

Related

Download CSV file from website that changes name every month

Pretty new to R and just trying to get my head around web scraping.
I want to download a CSV file into R (easy enough, I can do that). The issue I am having is that every month the CSV file name changes on the website, so the URL also changes.
So is there a way in R to tell it to download the CSV file without the exact file URL?
Here is the website I am practicing on:
https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/
Thanks :)
So I tried the basic stuff like below; the issue is that the URL will change when the file is updated.
dat <- read.csv(
  "https://www.police.uk/contentassets/069a2c11fcb444bbbeb519f69875577e/2022/sept/999-data-nov-21---sep-22.csv",
  header = TRUE
)
View(dat)
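One possible approach, as a sketch (assuming the page links the current CSV directly in an <a> tag rather than generating it with JavaScript), is to scrape the page for the link each month instead of hard-coding it:
library(rvest)

# Pull every link from the page, then keep the first one pointing at a .csv file
page  <- read_html("https://www.police.uk/pu/your-area/police-scotland/performance/999-data-performance/")
links <- html_attr(html_elements(page, "a"), "href")
csv_link <- links[grepl("\\.csv$", links)][1]

# If the link is relative, it will need to be prefixed with https://www.police.uk
dat <- read.csv(csv_link, header = TRUE)
View(dat)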

How to download JSON data from multiple URLs in R

I would like to download multiple data files in R from multiple URLs, where only the number changes. I successfully downloaded data with the code below, but I need to download for a set of numbers (e.g. 29208, 49510, 54604, 62759, 62760, 7002, 38175), each of which has to replace 29208 in the URL. I am a total newbie, and even though I have read and seen some examples, I cannot seem to write the right code.
install.packages("jsonlite")
library(jsonlite)
df <- fromJSON('https://api.euroinvestor.dk/instruments/29208/closeprices?fromDate=1970-1-1')
You cannot do this in a single call. Use a for loop and change the URL parameters on each iteration, so the loop runs over all the values you want to substitute into the URL.
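For example, a sketch of that loop, using the URL from the question and the instrument numbers listed above:
library(jsonlite)

ids <- c(29208, 49510, 54604, 62759, 62760, 7002, 38175)
results <- list()

# Rebuild the URL for each instrument number and fetch its JSON
for (id in ids) {
  url <- sprintf("https://api.euroinvestor.dk/instruments/%s/closeprices?fromDate=1970-1-1", id)
  results[[as.character(id)]] <- fromJSON(url)
}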

Why won't the HTML function in R actually write the HTML?

So I recently helped write a script for my lab which takes our processed data and makes a merged data frame of it. To keep the lab updated, we keep our data tables on a secure wiki, so I need an HTML file made so I can easily upload the data frame to the wiki. It has worked before: all I did was basically copy what was already written and working and edit it to work for a different time point in our data collection. I get no errors back and the data looks how I want it to look. As far as I know this script is written logically and working well, and so far it is, except for one issue: R will make a file for the HTML, but there is no HTML written in the text document.
I have HTML files written from the other data time points which are written exactly the same as this one, so I don't think it is a script construction issue.
Any ideas as to why this could be happening? I just need to know where to triage.
The package used for HTML is R2HTML, included in my package list at the top of the script. For HTML(..., file=paste(...)), you will need to use your own directory to see whether the HTML is written as a text file.
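For reference, a minimal R2HTML call has this shape (the data frame name and output path below are placeholders, not the actual script):
library(R2HTML)

# Write the data frame as an HTML table; HTML() appends to the target file
target <- file.path("C:/path/to/wiki/exports", "dataframe_table.html")
HTML(my_merged_dataframe, file = target)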
If I am not wrong, you are trying to get the data frame in HTML format.
In that case you need to use the xtable package in R.
Just add the code below at the bottom of the script:
## install the xtable package before importing it
library("xtable")
print(xtable(ChildSRPtotsFU_wiki), type="html", file="check_stack_overflow.html")

Is there a way to accelerate formatted table writing from R to Excel?

I have a 174,603-row by 178-column data frame, which I'm exporting to Excel using openxlsx::saveWorkbook (using this package to obtain the aforementioned cell formatting, with colors, header styles and so on). But the process is extremely slow (depending on the amount of memory used by the machine it can take from 7 to 17 minutes!!) and I need a way to reduce this significantly (it doesn't need to be seconds, but anything below 5 minutes would be OK).
I've already searched other questions, but they all seem to focus either on importing into R (I have no problem with this) or on writing unformatted files from R (using write.csv and other options of the like).
Apparently I can't use the xlsx package because of the settings on my computer (industrial computer; see the comments on this question).
Any suggestions regarding packages or other functionalities inside this package to make this run faster would be highly appreciated.
This question has been around for some time, but I had the same problem as you and came up with a solution worth mentioning.
There is a package called writexl that implements exporting a data frame to Excel via the C library libxlsxwriter. You can export to Excel using the following code:
library(writexl)
writexl::write_xlsx(df, "Excel.xlsx", format_headers = TRUE)
The parameter format_headers only applies centered and bold titles, but I edited the C code of its source in the writexl GitHub repository maintained by rOpenSci.
You can download it or clone it. Inside the src folder you can edit the write_xlsx.c file.
For example, in the part where it inserts the header format
//how to format headers (bold + center)
lxw_format * title = workbook_add_format(workbook);
format_set_bold(title);
format_set_align(title, LXW_ALIGN_CENTER);
you can add these lines to give the header a background color:
format_set_pattern(title, LXW_PATTERN_SOLID);
format_set_bg_color(title, 0x8DC4E4);
There are lots of formatting options you can find by searching the libxlsxwriter documentation.
When you have finished editing that file, and given that you have the source code in a folder called writexl, you can build and install the edited package with
shell("R CMD build writexl")
install.packages("writexl_1.2.tar.gz", repos = NULL)
Exporting again using the first chunk of code will generate the Excel file with formats, and faster than any other library I know of.
Hope this helps.
Have you tried:
write.table(GroupsAlldata, file = 'Groupsalldata.txt')
in order to obtain it in txt format?
Then in Excel, you can simply use 'Text to Columns' to put your data into a table.
Good luck!

Importing into R an Excel file saved as a web page

I would like to open an Excel file saved as a webpage using R, and I keep getting error messages.
The desired steps are:
1) Upload the file into RStudio
2) Change the format into a data frame / tibble
3) Save the file as an xls
The message I get when I open the file in Excel is that the file format (Excel web-page format) and the file extension (xls) differ. I have tried the steps in this answer, but to no avail. I would be grateful for any help!
I don't expect anybody will be able to give you a definitive answer without a link to the actual file. The complication is that many services will write files as .xls or .xlsx without them being valid Excel format. This is done because Excel is so common and some non-technical people feel more confident working with Excel files than a csv file. Now, the files will have been stored in a format that Excel can deal with (hence your warning message), but R's libraries are more strict and don't see the actual file type they were expecting, so they fail.
That said, the below steps worked for me when I last encountered this problem. A service was outputting .xls files which were actually just HTML tables saved with an .xls file extension.
1) Download the file to work with it locally. You can script this of course, e.g. with download.file(), but this step helps eliminate other errors involved in working directly with a webpage or connection.
2) Load the full file with readHTMLTable() from the XML package
library(XML)
dTemp = readHTMLTable([filename], stringsAsFactors = FALSE)
This will return a list of dataframes. Your result set will quite likely be the second element or later (see ?readHTMLTable for an example with explanation). You will probably need to experiment here and explore the list structure as it may have nested lists.
3) Extract the relevant list element with double brackets (single brackets would return a one-element list rather than the data frame itself), e.g.
df = dTemp[[2]]
You also mention writing out the final data frame as an xls file which suggests you want the old-style format. I would suggest the package WriteXLS for this purpose.
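As a sketch, writing the extracted data frame back out in the old .xls format would look like this (note that WriteXLS relies on a working Perl installation; the file name is a placeholder):
library(WriteXLS)

# Write df to an old-style .xls workbook
WriteXLS(df, ExcelFileName = "extracted_tables.xls")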
I seriously doubt Excel is 'saved as a web page'. I'm pretty sure the file just sits on a server and all you have to do is go fetch it. Some kinds of files (in particular Excel and h5) are binary rather than text files. This needs an added setting to warn R that it is a binary file so that it is handled appropriately.
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download.file(url=myurl, destfile="localcopy.xlsx", mode="wb")
Or, to use the downloader package, try something like this:
myurl <- "http://127.0.0.1/imaginary/file.xlsx"
download(myurl, destfile="localcopy.xlsx", mode="wb")
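Once the binary copy is on disk, and assuming it really is a valid Excel workbook rather than an HTML table with an .xls extension, a sketch of the next step (getting it into a data frame / tibble) could use readxl:
library(readxl)

# Read the locally downloaded workbook into a tibble
dat <- read_excel("localcopy.xlsx")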
