I am using the topmodel package in R with the huagrahuma dataset that comes with the package.
I would like to bring all these variables into excel, edit those as per my requirement & then use the base in R.
package & data: https://rdrr.io/cran/topmodel/man/huagrahuma.html
You can save datasets using a command like write.csv() or one of the writing functions from the readr, readxl or xlsx packages (for example). Using:
?write.csv()
Will show you how to use the function. Once you have it saved as a .csv file on your computer, you can open it with Excel and do what you need with it.
Edit: Following G5W's comment below, you could try extracting elements of this list and saving those, depending on what you actually want to change. To be honest, with a list structure, you are better off changing the data in R, using any of the apply family of functions, or the purrr package. R is much better than Excel for transforming/tidying data, so why not use it? :-)
Related
I am trying to export a dataframe from R to excel. I am using the 'writexl' package but it does not seem to work.
The code is as following:
install.packages('writexl')
library(writexl)
write_xlsx(data_frame, "H:\\folder1.xlsx")
There does not seem to be any error produced and the code appears to have run, however when I look in 'folder1' the data_frame is not there.
Is there anything I am doing incorrectly?
I've found the openxlsx package to be easier to use than the xlsx package. It also doesn't have a java dependency. The main command for directly writing a data frame to an Excel file is write.xlsx. You can also create worksheets, do lots of fancy formatting and write multiple tables to a worksheet (see the vignettes here for some examples), but start with write.xlsx for the basic creation of Excel files.
I have some excel file with simple formulas like =SUM(A1:A3).
I need to import the file into R, but before that I need to refresh the formulas. Is there a way to refresh the file from within R? There are good packages for importing the data in a R dataframe (eg. the R xslx package) but I need to refresh my formulas first.
Any suggestions?
Thanks!
You should be able to do this with RDCOMClient:
library(RDCOMClient)
ex = COMCreate("Excel.Application")
book = ex$Workbooks()$Open("my_file.xlsx")
book$Worksheets("Sheet1")$Calculate() # if you have many sheets you could loop through them or use apply functions based on their actual names
book$Save()
book$Close()
Here's another thread on the underlying VBA
I would appreciate it if someone can help me import this EndNote .enl file into R Dataframe.
When I came across this issue I tried a number of different packages, including bibliometrix (which is a very extensive package but a bit too complicated for this task I thought), bibtex and RISmed, without success.
But then I found the RefManageR package which worked great for this.
I first exported the Endnote library to Bibtex, there are pretty good directions for that step at:
https://libguides.usask.ca/c.php?g=218034&p=1458583
Then I imported the file that I had just creating using the following code in R:
# Basic importing of BibTex files
library(RefManageR)
# User RefManageR to import bibtex references and convert to tibble
bib = ReadBib("MyEndnoteBibtexExportFile.bibtex")
d.bibEQL = as.data.frame(bibEQL)
(I wanted a data frame for some custom analysis and plotting, seems like once you are here you are flexible to do whatever you want with the data, since it's just a a standard R data structure. RefManageR also has functions for further processing if that's what you want.)
I want to export a dataset in the MASS package to SPSS for further investigation. I'm looking for the EuStockMarkets data set in the package.
As described in http://www.statmethods.net/input/exportingdata.html, I did:
library(foreign)
write.foreign(EuStockMarkets, "c:/mydata.txt", "c:/mydata.sps", package="SPSS")
I got a text file but the sps file is not a valid SPSS file. I'm really looking for a way to export the dataset to something that a SPSS can open.
As Thomas has mentioned in the comments, write.foreign doesn't generate native SPSS datafiles (.sav). What it does generate is the data in a comma delimited format (the .txt file) and a basic syntax file for reading that data into SPSS (the .sps file). The EuStockMarkets data object class is multivariate time series (mts) so when it's exported the metadata is lost and the resulting .sps file, lacking variable names, throws an error when you try to run it in SPSS. To get around this you can export it as a data frame instead:
write.foreign(as.data.frame(EuStockMarkets), "c:/mydata.txt", "c:/mydata.sps", package="SPSS")
Now you just need to open mydata.sps as a syntax file (NOT as a datafile) in SPSS and run it to read in the datafile.
Rather than exporting it, use the STATS GET R extension command. It will take a specified data frame from an R workspace/dataset and convert it into a Statistics dataset. You need the R Essentials for Statistics and the extension command, which are available via the SPSS Community site (www.ibm.com/developerworks/spssdevcentral)
I'm not trying to answer a question that has been answered. I just think there is something else to complement for other users looking for this.
On your SPSS window, you just need to find the first line of code and edit it. It should be something like this:
"file-name.txt"
You need to find the folder path where you're keeping your file:
"C:\Users\DELL\Google Drive\Folder-With-Your-File"
Then you just need to add this path to your file's name:
"C:\Users\DELL\Google Drive\Folder-With-Your-File\file-name.txt"
Otherwise SPSS will not recognize the .txt file.
Sorry if I'm repeating some information here, I just wanted to make it easier to understand.
I suppose that EuStockMarkets is a (labelled) data frame.
This should work and even keep the variable and value labels:
require(sjlabelled)
write_spss(EuStockMarkets, "mydata.sav")
Or you try rio:
rio::export(EuStockMarkets, "mydata.sav")
I am creating my own R package and I was wondering what are the possible methods that I can use to add (time-series) datasets to my package. Here are the specifics:
I have created a package subdirectory called data and I am aware that this is the location where I should save the datasets that I want to add to my package. I am also cognizant of the fact that the files containing the data may be .rda, .txt, or .csv files.
Each series of data that I want to add to the package consists of a single column of numbers (eg. of the form 340 or 4.5) and each series of data differs in length.
So far, I have saved all of the datasets into a .txt file. I have also successfully loaded the data using the data() function. Problem not solved, however.
The problem is that each series of data loads as a factor except for the series greatest in length. The series that load as factors contain missing values (of the form '.'). I had to add these missing values in order to make each column of data the same in length. I tried saving the data as unequal columns, but I received an error message after calling data().
A consequence of adding missing values to get the data to load is that once the data is loaded, I need to remove the NA's in order to get on with my analysis of the data! So, this clearly is not a good way of doing things.
Ideally (I suppose), I would like the data to load as numeric vectors or as a list. In this way, I wouldn't need the NA's appended to the end of each series.
How do I solve this problem? Should I save all of the data into one single file? If so, in what format should I do it? Perhaps I should save the datasets into a number of files? Again, in which format? What is the best practical way of doing this? Any tips would greatly be appreciated.
I'm not sure if I understood your question correctly. But, if you edit your data in your favorite format and save with
save(myediteddata, file="data.rda")
The data should be loaded exactly the way you saw it in R.
To load all files in data directory you should add
LazyData: true
To your DESCRIPTION file, in your package.
If this don't help you could post one of your files and a print of the format you want, this will help us to help you ;)
In addition to saving as rda files you could also choose to load them as numeric with:
read.table( ... , colClasses="numeric")
Or as non-factor-text:
read.table( ..., as.is=TRUE) # which does pretty much the same as stringsAsFactors=FALSE
read.table( ..., colClasses="character")
It also appears that the data function would accept these arguments sinc it is documented to be a simple wrapper for read.table(..., header=TRUE).
Preferred saving location of your data depends on its format.
As Hadley suggested:
If you want to store binary data and make it available to the user,
put it in data/. This is the best place to put example datasets.
If you want to store parsed data, but not make it available to the
user, put it in R/sysdata.rda. This is the best place to put data
that your functions need.
If you want to store raw data, put it in inst/extdata.
I suggest you have a look at the linked chapter as it goes into detail about working with data when developing R packages.
You'll need to create the data file and include it in the R package, and you may want to also document it. Here's how to do both.
Create the data file and include it in R package
Create a directory inside the package called /data and place any data in it. Use only .rda and .RData files.
When creating the rda/RData file from an R object, make sure the R object is named what you want it to be named when it's used in the package and use save() to create it. Example:
save(river_fish, file = "data/river_fish.rda", version = 2)
Add this on a new line in the file called DESCRIPTION:
LazyData: true
Documenting the dataset
Document the dataset by placing a string with the dataset name after the documentation:
#' This is data to be included in my package
#'
#' #author My Name \email{blahblah##roxygen.org}
#' #references \url{data_blah.com}
"data-name"
Here and here are some nice examples from dplyr.
Notes
To access the data in the package, run river_fish or whatever the name of the dataset is. Nothing more is needed.
Using version = 2 when calling save() ensures your data object is available for older R versions (i.e. prior to 3.5.0) i.e. it will prevent this warning:
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R.
No need to use load() in the R package (just call the object directly instead e.g. river_fish will be enough to yield the data from data/river_fish.rda), but in the event you do wish to load an rda/RData file for some reason (e.g. playing around or testing), this will do it:
load("data/river_fish.rda")
Informative sources here and here