Export data from R to Excel - r

I am writing code to export a data set from R into Excel. I have been trying other people's code, including:
write.table(ALBERTA1, "D:/ALBERTA1.txt", sep="\t")
write.csv(ALBERTA1,":\ALBERTA1.csv")
your_filename_in_R = read.csv("ALBERTA1.csv")
your_filename_in_R = read.csv("ALBERTA1.csv")
write.csv(df, file = "ALBERTA1.csv")
your_filename_in_R = read.csv("ALBERTA1.csv")
write.csv(ALBERTA1, "ALBERTA1.csv")
write.table(ALBERTA1, 'clipboard', sep='\t')
write.table(ALBERTA1,"ALBERTA1.txt")
write.table(as.matrix(ALBERTA2),"ALBERTA2.txt")
write.table(as.matrix(vecm.pred$fcst$Alberta_Females[,1]), "vecm.pred$fcst$Alberta_Females[,1].txt")
write.table(as.matrix(foo),"foo.txt")
write.xlsx(ALBERTA2, "/ALBERTA2.xlsx")
write.table(ALBERTA1, "D:/ALBERTA1.txt", sep="\t").
Other users of this forum advised me this:
write.csv2(ALBERTA1, "ALBERTA1.csv")
write.table(kt, "D:/kt.txt", sep="\t", row.names=FALSE)
You can see in the pictures the outcome I got from the code above. But these numbers can't be used for any further operations, such as addition with other matrices.
Has anyone experienced this kind of problem?

Another option is the openxlsx package. It doesn't depend on Java and can read, edit and write Excel files. From the package description:
openxlsx simplifies the process of writing and styling Excel xlsx files from R and removes the dependency on Java
Example usage:
library(openxlsx)
# read data from an Excel file or Workbook object into a data.frame
df <- read.xlsx('name-of-your-excel-file.xlsx')
# for writing a data.frame or list of data.frames to an xlsx file
write.xlsx(df, 'name-of-your-excel-file.xlsx')
Besides these two basic functions, the openxlsx-package has a host of other functions for manipulating Excel-files.
For example, with the writeDataTable-function you can create formatted tables in an Excel-file.
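As a rough illustration of the writeDataTable function mentioned above, here is a minimal sketch. It assumes the openxlsx package is installed; the sheet name "iris" and the temporary output path are placeholders chosen only for this example.

```r
library(openxlsx)

# Hypothetical output path; tempfile() keeps the example self-contained.
out_file <- tempfile(fileext = ".xlsx")

wb <- createWorkbook()
addWorksheet(wb, "iris")
# writeDataTable() writes the data frame as a formatted Excel table.
writeDataTable(wb, sheet = "iris", x = iris)
saveWorkbook(wb, out_file, overwrite = TRUE)

# Read the sheet back to check that the numeric columns are still numeric.
check <- read.xlsx(out_file, sheet = "iris")
```

Reading the file back like this is a quick way to confirm that the exported numbers can still be used in calculations.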

Recently used xlsx package, works well.
library(xlsx)
write.xlsx(x, file, sheetName="Sheet1")
where x is a data.frame

writexl, without Java requirement:
# install.packages("writexl")
library(writexl)
tempfile <- write_xlsx(iris)

The WriteXLS function from the WriteXLS package can write data to Excel.
Alternatively, write.xlsx from the xlsx package will also work.

One could also use the readODS package. Granted it doesn't produce an .xlsx, but Excel can read Open Document Spreadsheet (ODS) / LibreOffice files too.
require(readODS)
tmp = file.path(tempdir(), 'iris.ods')
write_ods(iris, tmp)

If I might offer an alternative, you could also save your dataframe in a regular csv file, and then use the "get data" function within Excel to import the dataframe. This worked like a charm for me, and you need not bother with any excel packages in R.
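The CSV route described above could look roughly like this in base R. The data frame and file name are made up for the example; write.csv2 (mentioned in the question) is the variant for locales where Excel expects semicolons and decimal commas.

```r
# Hypothetical data frame standing in for the data to export.
df <- data.frame(region = c("Alberta", "Ontario"), value = c(1.5, 2.25))

csv_path <- file.path(tempdir(), "export.csv")

# row.names = FALSE avoids an extra unnamed first column in Excel.
write.csv(df, csv_path, row.names = FALSE)
# write.csv2(df, csv_path, row.names = FALSE)  # ";" separator and "," decimal mark

# Reading the file back shows the numbers survive as numbers.
round_trip <- read.csv(csv_path)
```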

Here is a way to write data from a data frame into Excel files, one file per top-level ID and one tab (sheet) per sub-ID associated with it. Imagine you have a data frame with email_address as one column for a number of different users, and each email has a number of 'sub-ids' that hold the data.
library(tidyverse)  # assumed: provides tibble(), %>%, filter(), arrange(), str_c(), str_extract()
library(xlsx)       # assumed: provides write.xlsx() with sheetName and append
data <- tibble(id = c(1,2,3,4,5,6,7,8,9), email_address = c(rep('aaa@aaa.com',3), rep('bbb@bbb.com', 3), rep('ccc@ccc.com', 3)))
So ids 1, 2, 3 are associated with aaa@aaa.com. The following code splits the data by email and then puts ids 1, 2, 3 into different tabs. The important thing is to set append = TRUE when writing the .xlsx file.
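temp_dir <- tempdir()
for (i in unique(data$email_address)) {
  # All rows belonging to the current email address
  subset_data <- data %>%
    filter(email_address == i) %>%
    arrange(id)
  # The name part of the address becomes part of the file name
  file_name <- str_c(temp_dir, "/your_filename_",
                     str_extract(i, pattern = "\\b[A-Za-z0-9._%+-]+"),
                     "_", Sys.Date(), ".xlsx")
  for (j in unique(subset_data$id)) {
    # One sheet per id; append = TRUE adds the sheet to the existing file
    write.xlsx(subset_data %>% filter(id == j),
               file = file_name,
               sheetName = as.character(j),
               append = TRUE)
  }
}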
The regex gets the name from the email address and puts it into the file-name.
Hope somebody finds this useful. I'm sure there's more elegant ways of doing this but it works.
Btw, here is a way to then send these individual files to the various email addresses in the data.frame. Code goes into second loop [j]
# send.mail() as used here matches the mailR package (assumed; the post does not name it)
send.mail(from = "sender@sender.com",
          to = i,
          subject = paste("Your report for", str_extract(i, pattern = "\\b[A-Za-z0-9._%+-]+"), 'on', Sys.Date()),
          body = "Your email body",
          authenticate = TRUE,
          smtp = list(host.name = "XXX", port = XXX,
                      user.name = Sys.getenv("XXX"), passwd = Sys.getenv("XXX")),
          attach.files = str_c(temp_dir, "/your_filename_", str_extract(i, pattern = "\\b[A-Za-z0-9._%+-]+"),'_', Sys.Date(), '.xlsx'))

I have been trying out the different packages including the function:
install.packages ("prettyR")
library (prettyR)
delimit.table (Corrvar,"Name the csv.csv") ## Corrvar is a name of an object from an output I had on scaled variables to run a regression.
However, when I tried the same code on the output of another analysis (model selection output from occupancy models), it did not work. After many attempts and some exploration, this is what I did:
Copy the output from R (Ctrl+C)
Paste it into an Excel sheet (Ctrl+V)
Select the first column, where the data is
In the "Data" tab, click on "Text to Columns"
Select the "Delimited" option, click Next
Tick the "Space" box under the separators, click Next
Click Finish
Your output should now be in a form you can easily manipulate in Excel. It is perhaps not the fanciest option, but it does the trick if you just want to explore your data in another way.
PS. If the labels in Excel are not exactly these, it is because I'm translating them from my Spanish version of Excel.

Related

Is there a way to report the file name of an imported dataset?

I'm trying to set up an export to an .xlsx file that includes the name of the dataset in its title.
I have the functions to add objects into the title and export it working fine, but I don't know how to get the name of my original dataset as an object that I can then add into the function.
(Using Rstudio 1.3)
Before analysing my data, I import the dataset, "DS". I then call this "input".
input <- DS
data("input")
After all analysis is done, I set up the name I want to append and call it "name". I made it to include the row name, column name, and then a .xlsx at the end to save it as a .xlsx file (it was just saving without file extension before that)
name <- paste(analysis.score$pairs$row,
analysis.score$pairs$column,
".xlsx", sep = "_")
write.xlsx(analysis.score, name)
My resulting file will be something like "row_column_.xlsx"
What I need is a command that reports the file name of the dataset (in this example DS), so that I can include it in the name pasted onto the file.
I've tried using names(input), but it returns the names of all the columns in the file.
I have a number of datasets to analyse, and would like to only have to put each dataset title in once at the beginning of the script.
Sorry if this doesn't make sense, I'm very new to this (started Monday)
Thanks!
I don't know of an importing function that saves as an attribute to the data the file name that holds the data outside of R. That said, it would be pretty easy to make one.
my_import <- function(filename, ...){
require(rio)
require(stringr)
x <- import(filename, ...)
## strip off leading absolute or relative path information
attr(x, "filename") <- str_extract(filename, "[\\w\\d\\.\\_\\-]*$")
return(x)
}
Then, as long as you have the rio and stringr packages installed, you would be able to do the following:
df <- my_import("my_file.xlsx")
attr(df, "filename")
# [1] "my_file.xlsx"
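If the goal is only to type the dataset's file name once at the top of the script and reuse it in the output name, a base R sketch along these lines may be enough. The path "data/DS.csv" and the labels "row" and "column" are placeholders for this example, not values from the question.

```r
# Hypothetical input path; only the string is used here, no file is read.
input_path <- "data/DS.csv"

# basename() drops the directory part, file_path_sans_ext() drops the extension.
dataset_name <- tools::file_path_sans_ext(basename(input_path))

# Build the output file name from the dataset name plus other labels.
# Adding the extension with paste0() avoids a trailing "_" before ".xlsx".
out_name <- paste0(paste(dataset_name, "row", "column", sep = "_"), ".xlsx")
```

Here dataset_name is "DS" and out_name is "DS_row_column.xlsx", which can then be passed to write.xlsx as the file name.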

Add file names as row name based on condition with R

I have found some variations of this question and tried all the possibilities, but none of them helped. I have been able to extract the content, but I would also like to have the file name associated with each row in a CSV file: if the content ("Flash point") is found in the ".txt" file, extract the content and use the ".txt" file name as the associated row name in the CSV. If the content is not found, skip both the content and the file and go on to the next extraction. The issue here is that the row names are given based on a specific condition. Here is the initial code. Any help would be greatly appreciated. Thanks a lot for your help
for (i in 1:length(txt)){
doc<-readLines(txt[i])
doc<-doc[grepl("Flash point",doc)]
lst[[txt[[i]]]]<-doc %>% stringr::str_extract("(\\d|>).*")
results<-lst[[txt[[i]]]]
write.table(results,file = "outputestrod.csv",row.names = FALSE,col.names = FALSE,sep = ",", append = TRUE)
}
I am adding an example here:
[Image: content extracted]
[Image: content extracted with file names as row names if the specific content value is found]
[Image: result of the suggested results<-paste(txt[i],lst[[txt[[i]]]])]
It sounds like you need to use a paste() command to combine two strings, the file name, and the contents of the file.
Try changing the line
results<-lst[[txt[[i]]]]
to this:
results <- paste(txt[i], lst[[txt[[i]]]])
Here is a tidyverse version of what I think you are trying to do. Consider the resource http://r4ds.had.co.nz/ if you want to learn to code like this. Your loop does not take advantage of R's vector operations.
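library(tidyverse)
# Replace "your_folder" with your directory; full.names = TRUE keeps the path so readLines() can find each file
filenames <- dir("your_folder", full.names = TRUE)
file_and_content_with_string <- function(filename, string) {
  doc <- readLines(filename)
  doc <- doc[grepl(string, doc)]
  file_text <- stringr::str_extract(doc, "(\\d|>).*")
  # rep() yields zero rows, not an error, when the file has no matching line
  data.frame(filename = rep(filename, length(file_text)), content = file_text)
}
all_results <- map_df(filenames, function(x) file_and_content_with_string(x, "Flash point"))
write_csv(all_results, "outputestrod.csv")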

R Session Aborted When Reading Large Dataset

I need to read ~20,000 csv files (~500GB), then filter the data and bind them together. My code works when I only read ~15,000 files, but it prompts 'R session aborted' when I read ~20,000 files.
memory.limit(80000)
ReadCustomer = function(x)
fread(x, encoding = "UTF-8", select = c("customer_sysno", "event_cat2")) %>%
filter(event_cat2 == "***") %>%
select(customer_sysno) %>%
rename(CustomerSysNo = customer_sysno) %>%
mutate(CustomerSysNo = as.numeric(CustomerSysNo)) %>%
filter(CustomerSysNo > 0)
CustomerData = rbindlist(lapply(FileList, ReadCustomer))
I tried replacing fread(x, encoding = "UTF-8", select = c("customer_sysno", "event_cat2")) by spark_read_csv(sc, "Data", x), but sparkR still didn't work.
How can I read all the files? Will Rcpp help?
Do you know how many rows you get back from each file? You don't say.
You're essentially posing this problem as a straightforward filtering exercise; you want only the customer_sysno column where certain conditions are met. What you then want to do with this will influence whether you even want to merge them all together.
I propose opening an output file and appending each new output to it. Then you've got a local file containing all your desired customer_sysno values. You can then walk through or sample that as suits your use case.
If the rows where your event_cat2 condition is met are actually a small subset of each file, and each file is big, then another approach would be to use readLines to work your way through them, maybe in conjunction with appending results to an output file. This is basically asking R to do a job that (g)awk is awesome at, so awk might be a useful preprocessing step to get you the desired data.
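The "append each new output to a file" idea could be sketched as follows. This is only an illustration under stated assumptions: two tiny generated CSV files stand in for the real ~20,000, the filter value "a" stands in for the masked "***", and base read.csv is used so the example runs without extra packages (fread with select could be swapped in).

```r
# Hypothetical stand-ins: two small CSV files instead of the real data.
in_dir <- file.path(tempdir(), "customer_files")
dir.create(in_dir, showWarnings = FALSE)
write.csv(data.frame(customer_sysno = c(1, -2, 3), event_cat2 = c("a", "a", "b")),
          file.path(in_dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(customer_sysno = c(4, 5), event_cat2 = c("a", "b")),
          file.path(in_dir, "f2.csv"), row.names = FALSE)

file_list <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
out_path <- file.path(tempdir(), "customer_sysno.csv")
if (file.exists(out_path)) file.remove(out_path)

for (f in file_list) {
  x <- read.csv(f)
  # Keep only the ids that pass both conditions.
  keep <- x$customer_sysno[x$event_cat2 == "a" & x$customer_sysno > 0]
  # Append the filtered ids; x is overwritten on the next pass,
  # so only one file is held in memory at a time.
  write.table(data.frame(CustomerSysNo = keep), out_path, sep = ",",
              row.names = FALSE,
              col.names = !file.exists(out_path),  # header only on first write
              append = file.exists(out_path))
}

result <- read.csv(out_path)
```

Because nothing is accumulated in a list, memory use stays roughly at the size of one input file plus its filtered ids.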

Read excel file with formulas in cells into R

I was trying to read an Excel spreadsheet into an R data frame. However, some of the columns have formulas or are linked to other external spreadsheets. Whenever I read the spreadsheet into R, many cells become NA. Is there a good way to fix this problem so that I can get the original values of those cells?
The R script I used to do the import is like the following:
options(java.parameters = "-Xmx8g")
library(XLConnect)
# Step 1 import the "raw" tab
path_cost = "..."
wb = loadWorkbook(...)
raw = readWorksheet(wb, sheet = '...', header = TRUE, useCachedValues = FALSE)
UPDATE: read_excel from the readxl package looks like a better solution. It's very fast (0.14 sec in the 1400 x 6 file I mentioned in the comments) and it imports the values of formula cells rather than the formulas themselves. It doesn't use Java, so there is no need to set any Java options.
library(readxl)
# sheet can be a string (name of sheet) or integer (position of sheet)
raw = read_excel(file, sheet=sheet)
For more information and examples, see the short vignette.
ORIGINAL ANSWER: Try read.xlsx from the xlsx package. The help file implies that by default it evaluates formulas before importing (see the keepFormulas parameter). I checked this on a small test file and it worked for me. Formula results were imported correctly, including formulas that depend on other sheets in the same workbook and formulas that depend on other workbooks in the same directory.
One caveat: If an externally linked sheet has changed since the last time you updated the links on the file you're reading into R, then any values read into R that depend on external links will be the old values, not the latest ones.
The code in your case would be:
options(java.parameters = "-Xmx8g") # xlsx also uses Java; set this before loading the package so the JVM picks it up
library(xlsx)
# Replace file and sheetName with appropriate values for your file
# keepFormulas=FALSE and header=TRUE are the defaults. I added them only for illustration.
raw = read.xlsx(file, sheetName=sheetName, header=TRUE, keepFormulas=FALSE)

Appending r output in a single sheet of xlsx file

How can I append my R outputs to a single sheet of an xlsx file? I am currently working on web crawling, where I need to scrape the user reviews from a website and save them on my desktop in xlsx format. Every time, I need to change the website URL in my code (as the user reviews are on different pages) and save the output in one sheet of the xlsx file.
Can you please help me with the code for appending outputs to a single sheet of an xlsx file? Below is the code I am using. Every time, I change the website URL, run the same code, and need to save the corresponding output in a single sheet of mydata.xlsx.
library("rvest")
htmlpage <- html("http://www.glassdoor.com/GD/Reviews/Symphony-Teleca-Reviews-E28614_P2.htm?sort.sortType=RD&sort.ascending=false&filter.employmentStatus=REGULAR&filter.employmentStatus=PART_TIME&filter.employmentStatus=UNKNOWN")
proshtml <- html_nodes(htmlpage, ".pros")
pros <- html_text(proshtml)
pros
data=data.frame(pros)
library(xlsx)
write.xlsx(data, "D:/mydata.xlsx", append=TRUE)
A trivial, but super-slow way:
If you only need to add (a few) row(s) to an existing Excel file, and it only has one sheet to which you want to append, you can just do a simple read => overwrite step:
SHEET.NAME <- '...' # fill in with yours
existing.data <- read.xlsx(file, sheetName = SHEET.NAME)
new.data <- rbind(existing.data, data)
write.xlsx(new.data, file, sheetName = SHEET.NAME, row.names = FALSE, append = FALSE)
Note:
It's quite slow in general and will only work at small scale.
read.xlsx is a slow function. Try read.xlsx2 to make it much faster (see the difference in the docs).
If your R process is started once and keeps running for a long time, obviously don't do it this way (repeatedly reading and overwriting a file is ridiculous in that case).
Look at the xlsx package.
?write.xlsx will show you what you want. append=TRUE is the key.
========= EDIT TO CORRECT =========
As @Jakub pointed out, append=TRUE adds another worksheet to the file.
========= EDIT TO ADD: ANOTHER METHOD ==========
Another method is to save the data to a .csv file, which can easily be opened in Excel. In this case, append=TRUE works as expected (adding to the existing sheet):
write.table(df,"D:/MyFile.csv",append=TRUE,sep=",")
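One detail worth handling when appending with write.table is the header row, which would otherwise be written again on every call. A minimal sketch of one way to deal with it follows; the helper append_rows, the file name and the review texts are invented for this example.

```r
csv_path <- file.path(tempdir(), "mydata.csv")
if (file.exists(csv_path)) file.remove(csv_path)

# Hypothetical helper: writes the header only when the file is first created.
append_rows <- function(df, path) {
  exists_already <- file.exists(path)
  write.table(df, path, sep = ",", row.names = FALSE,
              col.names = !exists_already, append = exists_already)
}

# Each call stands in for one scraped page of reviews.
append_rows(data.frame(pros = c("Good team", "Flexible hours")), csv_path)
append_rows(data.frame(pros = "Nice office"), csv_path)

combined <- read.csv(csv_path)
```

After the two calls, the file holds one header line followed by the rows from both pages, so it opens in Excel as a single sheet.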

Resources