I need to load a CSV dataset from Kaggle directly into R without downloading the CSV file first. I found a way to unzip a CSV file using the GET() function from the httr package, but that requires the download URL. On Kaggle I can only see the Download button link '.../download', not the exact address of the zipped CSV file. Is there any way to get the link to the zipped CSV file on Kaggle?
I also found the kaggler package, but it seems to work only for datasets under my own account, whereas I would like to load datasets uploaded by other Kaggle users into R. Is there any way to do this in R?
Here is an example of a Kaggle dataset. I need to load Placement_Data_Full_Class.csv directly into R.
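One possible route is Kaggle's public API rather than the site's Download button. The sketch below assumes that the API serves a dataset archive at /api/v1/datasets/download/<owner>/<dataset> and accepts HTTP basic authentication with the username and key from a Kaggle API token; the helper names and the owner/dataset slugs are hypothetical placeholders, and the httr package is assumed to be installed.

```r
# Build the (assumed) Kaggle API download URL for a dataset.
kaggle_dataset_url <- function(owner, dataset) {
  sprintf("https://www.kaggle.com/api/v1/datasets/download/%s/%s", owner, dataset)
}

# Download the dataset archive to a temporary file, extract one CSV, and read it.
# `username` and `key` are assumed to come from your Kaggle API token.
read_kaggle_csv <- function(owner, dataset, csv_name, username, key) {
  zip_path <- tempfile(fileext = ".zip")
  on.exit(unlink(zip_path), add = TRUE)
  resp <- httr::GET(
    kaggle_dataset_url(owner, dataset),
    httr::authenticate(username, key),
    httr::write_disk(zip_path, overwrite = TRUE)
  )
  httr::stop_for_status(resp)  # fail loudly on 401/403/404
  out_dir <- tempfile("kaggle_")
  extracted <- utils::unzip(zip_path, files = csv_name, exdir = out_dir)
  utils::read.csv(extracted, stringsAsFactors = FALSE)
}

# Usage with placeholder slugs (replace with the dataset's real owner and name):
# placement <- read_kaggle_csv("<owner>", "<dataset>",
#                              "Placement_Data_Full_Class.csv",
#                              Sys.getenv("KAGGLE_USERNAME"),
#                              Sys.getenv("KAGGLE_KEY"))
```

Nothing is kept on disk beyond the session's temporary directory, which matches the goal of not pre-downloading the file by hand.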
Related
I'm trying to save two CSV files that can be accessed through this website:
https://www.cenace.gob.mx/Paginas/SIM/Reportes/CapacidadTransferencia.aspx
I only want to save both CSV files as data frames in RStudio. I've tried to do this with rvest, but the data seems to be served from the website's back end, so this approach is not working.
I'm trying to write a table into a macro-enabled Excel file (.xlsm) from R. The write.xlsx (openxlsx) and writeWorksheetToFile (XLConnect) functions don't work.
When I used the openxlsx package, as seen below, the resulting .xlsm files ended up getting corrupted.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
write.xlsx(Input_Files[[i]], Inputs[i], sheetName="Input_Sheet")
}
#Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm file
#Inputs[i] are the Excel files into which the tables should be written
Corrupted .xlsm file error message after write.xlsx:
Excel cannot open the file 'xxxxx.xslm' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file
After researching this problem extensively, I found that the XLConnect package offers the writeWorksheetToFile function, which works with .xlsm files. However, after running it a few times it fails with an error message saying there is no more free space, and it runs for 20+ minutes for tables with approximately 10,000 rows. I tried adding xlcFreeMemory() at the beginning of the for loop, but it doesn't solve the issue.
Code:
library(XLConnect)
library(openxlsx)
for (i in 1:3){
xlcFreeMemory()
writeWorksheetToFile(Inputs[i], Input_Files[[i]], "Input_Sheet")
}
#Input_Files[[i]] are the R data.frames which need to be inserted into the .xlsm file
#Inputs[i] are the Excel files into which the tables should be written
Could anyone recommend a way to easily and quickly transfer an R table into an xlsm file without corrupting it?
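The corruption is consistent with write.xlsx() creating a brand-new plain workbook and saving it under an .xlsm name, rather than editing the existing macro-enabled file. A minimal sketch of an alternative with XLConnect follows: it raises the Java heap before the package is loaded, opens each workbook once, writes into the existing sheet, and saves. It assumes Input_Files and Inputs are as in the question, that "Input_Sheet" already exists in each workbook, and that the helper name and the heap size are illustrative choices only.

```r
# The JVM heap size must be set before XLConnect (and rJava) is loaded.
# "-Xmx4g" is an arbitrary example value; adjust it to the memory available.
options(java.parameters = "-Xmx4g")

# Write each data frame into a sheet of its existing macro-enabled workbook.
write_tables_to_xlsm <- function(tables, paths, sheet = "Input_Sheet") {
  stopifnot(length(tables) == length(paths))
  for (i in seq_along(paths)) {
    # create = FALSE: open the existing .xlsm instead of making a new file
    wb <- XLConnect::loadWorkbook(paths[i], create = FALSE)
    XLConnect::writeWorksheet(wb, tables[[i]], sheet = sheet)
    XLConnect::saveWorkbook(wb)
    rm(wb)
    XLConnect::xlcFreeMemory()  # release Java memory between files
  }
  invisible(paths)
}

# Usage with the objects from the question:
# write_tables_to_xlsm(Input_Files, Inputs)
```

Whether this is fast enough for 10,000-row tables depends on the machine and the workbook, so it is worth timing on one file first.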
I need to import an .xls file whose underlying format is actually HTML (.htm, .html).
How do we load this into a data frame in R? The data is in Sheet1, starting from row 5. I have been struggling with this, trying to load it with the xlsx and readxl packages, but neither of them worked, because the native format of the file is different.
I can't edit and re-save the file manually as .xlsx, because that cannot be automated.
Note that when I save it as an .xlsx file, it loads fine, but that's not what I need.
Kindly help me with this.
Try the openxlsx package and its function read.xlsx. If that doesn't work, you could programmatically rename the file as described for example here, and then open it using one of these excel packages.
Your file could be in xls format instead of xlsx; have you tried the read_xls() function from readxl? It could also be in text format, in which case read.table() or fread() from data.table should work. The fact that it works after saving the file as xlsx strongly suggests that it is not formatted as xlsx to begin with.
Hope this helps.
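If the file really is an HTML table with an .xls extension, as the question describes, another option is to parse it as HTML. The following is a minimal sketch, assuming the rvest package is installed, that the first table in the file is the one of interest, and that row 5 holds the column names with data below it; both helper names are hypothetical.

```r
# Promote one row of a raw, header-less table to column names and drop
# everything above it. Works on any data frame, so it can be checked offline.
promote_header <- function(raw, header_row) {
  raw <- as.data.frame(raw, stringsAsFactors = FALSE)
  header <- as.character(unlist(raw[header_row, ], use.names = FALSE))
  body <- raw[seq_len(nrow(raw)) > header_row, , drop = FALSE]
  names(body) <- header
  rownames(body) <- NULL
  # Re-guess column types now that the header text is out of the data
  utils::type.convert(body, as.is = TRUE)
}

# Read the first HTML table from a file that only carries an .xls extension.
read_html_xls <- function(path, header_row = 5) {
  doc <- rvest::read_html(path)
  tables <- rvest::html_table(doc, header = FALSE)
  if (length(tables) == 0) stop("No HTML table found in ", path)
  promote_header(tables[[1]], header_row)
}

# Usage with a hypothetical file name:
# df <- read_html_xls("report.xls")
```

This avoids renaming or re-saving the file, so it can run unattended.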
I've created my first package, hosted on github. I'm trying to include three data frames in the package. Per the Writing R Extensions guide, I saved each data frame as a separate .RData file in the data subdirectory. However, I can't seem to access the data when I load the package.
When I install the package from a clean R session using
library(devtools)
# current devtools expects the "username/repo" form
install_github("adsteen/Lloyd.et.al.Cell.abundance.metaanalysis")
the package functions, documentation, and vignette seem to load correctly. Better yet,
data(package="Lloyd.et.al.Cell.abundance.metaanalysis")
shows the three data frames that are encoded as .RData files, named all_data, corrected_seds, and corrected_sw.
The problem is, I can't seem to actually access the data. I have LazyData: true at line 21 in the DESCRIPTION file, so I would expect head(all_data) to show the data frame, but it returns an error: object 'all_data' not found. I can't seem to find a way to use load() to load the data either.
What am I doing wrong?
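One thing worth checking, offered as a guess since the package source is not shown here: each file in data/ should contain an object with exactly the name you expect to call, and the package needs to be rebuilt and reinstalled for LazyData to take effect. The sketch below uses base R only; the helper names are hypothetical. It saves a data frame into a package's data directory and lists which object names a saved file actually holds.

```r
# Save the object called `obj_name` into <pkg_dir>/data/<obj_name>.rda.
save_package_data <- function(obj_name, pkg_dir, envir = parent.frame()) {
  data_dir <- file.path(pkg_dir, "data")
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  out <- file.path(data_dir, paste0(obj_name, ".rda"))
  save(list = obj_name, file = out, envir = envir)
  out
}

# Return the names of the objects stored in a saved .rda/.RData file.
objects_in <- function(path) {
  e <- new.env()
  load(path, envir = e)  # load() returns the loaded object names
}

# Usage with a hypothetical package directory:
# all_data <- data.frame(x = 1:3)
# p <- save_package_data("all_data", "path/to/package")
# objects_in(p)
```

If objects_in() reports a different name than the file name suggests, head(all_data) would fail even though data(package = ...) lists the data set.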
I am making my first attempts to write an R package. I am loading one CSV file from my hard drive, and I am hoping to bundle my R code and my CSV files into one package later.
My question is: how can I load my CSV file once my package is built? Right now my file address is something like c:\R\mydirectory....\myfile.csv, but after I send the package to someone else, how can I refer to that file with a relative address?
Feel free to correct this question if it is not clear to others!
You can put your csv files in the data directory or in inst/extdata.
See the Writing R Extensions manual - Section 1.1.5 Data in packages.
To import the data you can use, e.g.,
R> data("achieve", package="flexclust")
or
R> read.table(system.file("data/achieve.txt", package = "flexclust"))
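For a CSV placed in inst/extdata, the lookup could be sketched as follows. Files under inst/ are copied to the top level of the installed package, so inst/extdata/myfile.csv is found under "extdata". The helper name pkg_file and the names myfile.csv and mypkg are hypothetical placeholders.

```r
# Return the installed path of a file shipped with a package, or stop if it
# is missing (system.file() itself silently returns "" in that case).
pkg_file <- function(..., package) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop("File not found in package '", package, "'", call. = FALSE)
  }
  path
}

# Usage with hypothetical names:
# my_data <- read.csv(pkg_file("extdata", "myfile.csv", package = "mypkg"))
```

Because the path is resolved at run time on the user's machine, no absolute address such as c:\R\... needs to appear in the package code.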
Look at the R help for package.skeleton: this function
automates some of the setup for a new source package. It creates directories, saves functions, data, and R code files to appropriate places, and creates skeleton help files and a ‘Read-and-delete-me’ file describing further steps in packaging.
The directory structure created by package.skeleton includes a data directory. If you put your data here it will be distributed with the package.