How to convert doc to docx file using R code - r

I tried to read a .doc file using readdoc(), but when the file contains tables, it is not read properly.
Therefore I want to convert the .doc file to a .docx file so that I can extract the tables using the docxtractr package available in R.
How can I convert a .doc file to a .docx file using R code?

Related

Read htm file and a table within the file in R

I have some .htm files. They are actually HTML files that contain a data table.
I know how to process them in Python, using pandas to read the HTML file:
df = pd.read_html("file")[0]
pandas returns a list of tables, and the one I need is at position [0].
How can I do that in R?

How do I download and extract a list of papers in LaTeX format from arXiv?

I have a list of papers that I'd like to extract from arXiv (I have the arXiv links / names of the arXiv files), but in LaTeX format. How can I do this in Python?
If we go to this page: https://arxiv.org/format/2010.11645
We can read the following text:
Source:
Delivered as a gzipped tar (.tar.gz) file if there are multiple files, otherwise as a PDF file, or a gzipped TeX, DVI, PostScript or HTML (.gz, .dvi.gz, .ps.gz or .html.gz) file depending on submission format. [ Download source ]
We can download the file by clicking on [ Download source ], but I have no idea what type of file I'm getting back. The filename is simply 2010.11645.
I'd like to download the file in LaTeX format (which I believe is .tex) and then convert it to .txt using pandoc. I believe I'd need to download the files via requests somehow?
How can I do this? Thanks!
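One way to find out what kind of file came back is to inspect its leading bytes. The following is a minimal sketch, assuming the bytes behind the [ Download source ] link have already been fetched (for example with requests, not shown here). The function names sniff_arxiv_source and list_tex_members are hypothetical helpers, not part of any arXiv API.

```python
import gzip
import io
import tarfile


def sniff_arxiv_source(data: bytes) -> str:
    """Guess the payload type: "pdf", "tar.gz", "gz" or "unknown"."""
    if data.startswith(b"%PDF"):
        return "pdf"
    if data.startswith(b"\x1f\x8b"):  # gzip magic number
        inner = gzip.decompress(data)
        try:
            # A gzipped tar holds several files (e.g. .tex plus figures).
            with tarfile.open(fileobj=io.BytesIO(inner), mode="r:"):
                return "tar.gz"
        except tarfile.ReadError:
            # Not a tar archive: a single gzipped file, e.g. one .tex source.
            return "gz"
    return "unknown"


def list_tex_members(data: bytes) -> list:
    """Return the names of the .tex files inside a gzipped tar payload."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.name.endswith(".tex")]
```

For a "gz" result, gzip.decompress(data) yields the single source file directly. For a "tar.gz" result, the .tex members can be extracted and passed to pandoc.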

r Remove xlsx sheet using officer package

I am developing an R shiny app that will generate a sequence of editable charts in an Excel file using the officer package. However, when creating a new .xlsx file using the read_xlsx() function, the file contains a blank first sheet entitled "Feuil1" (apparently the template is in French).
WkBook <- read_xlsx()
WkBook
xlsx document with 1 sheet(s):
[1] "Feuil1"
After adding other appropriately named worksheets to the destination .xlsx file, is it possible to program the app to delete this extraneous "Feuil1" sheet? Officer has an add_sheet() function but doesn't appear to have a function to remove or rename a sheet.
add_sheet(WkBook, "newChart")
xlsx document with 2 sheet(s):
[1] "Feuil1" "newChart"

Scilab unable to correctly read text and csv file

I wish to open and read the following text file in Scilab (version 6.0.2).
The original file is an .xlsx that I have converted to both .txt and .csv through Excel to facilitate opening & working with it in Scilab.
Using both fscanfMat and csvRead, Scilab only reads the first column, as Nan. I understand why the first column is read as Nan, but I do not see why the rest of the document isn't read. Columns 2 and 3 are of particular interest to me.
For csvRead, I used :
M=csvRead(chemin+filename," ",",",[],[],[],[],7);
to skip the 7-row header.
Could it be something to do with the way in which the file has been formatted?
For anyone able to help, I will try to upload an example of a .txt file and also the original .xlsx file
Files available for download, here: Excel and Text files
If you convert your xlsx file into an xls one with Excel, you can read it with the readxls function.
Your separator is a tabulation character (ascii code 9). Use the following command:
M=csvRead("Probe1_350N_2S.txt",ascii(9),",",[],[],[],[],7);

Importing .xls file that is saved as *.htm, *.html as it is saved on the backend

I have a requirement to import an .xls file that is actually saved as *.htm, *.html.
How do we load this into a data frame in R? The data is in Sheet1, starting from row 5. I have been struggling to load it with the xlsx and readxl packages, but neither worked, because the native format of the file is different.
I can't edit and re-save the file manually as .xlsx, because that cannot be automated.
Note that if I save it manually as a .xlsx file, it loads fine, but that's not what I need.
Kindly help me with this.
Try the openxlsx package and its function read.xlsx. If that doesn't work, you could programmatically rename the file as described for example here, and then open it using one of these excel packages.
Your file could be in xls format instead of xlsx; have you tried the read_xls() function from readxl? It could also be in a text format, in which case read.table() or fread() from data.table should work. The fact that it works after saving the file as xlsx strongly suggests that it is not formatted as xlsx to begin with.
Hope this helps.
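To check which format the file really has before choosing a reader, its first bytes can be inspected. Below is a minimal sketch in Python, for illustration only (the same check could be done in R by reading the leading bytes with readBin). The function name sniff_spreadsheet_format is hypothetical, and the list of HTML markers is an assumption that may need extending for a particular export.

```python
def sniff_spreadsheet_format(head: bytes) -> str:
    """Classify a file from its first bytes: "xlsx", "xls", "html" or "unknown"."""
    if head.startswith(b"PK\x03\x04"):
        # ZIP container, which is what a genuine .xlsx file is.
        return "xlsx"
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file, used by the legacy binary .xls format.
        return "xls"
    # Skip a UTF-8 byte order mark and leading whitespace, then look for markup.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<html", b"<!doctype html", b"<table")):
        return "html"
    return "unknown"
```

In practice the argument would be something like open(path, "rb").read(512). A result of "html" means the file should be parsed as an HTML table rather than with an Excel reader.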
