Excel Exporting Multiple Data Sets to a Single Spreadsheet File - r

I am trying to output multiple small data frames to an Excel file. The data frames are residuals and predicted from mgcv models run from a loop. Each is a separate small data set that I am trying to output to a separate worksheet in the same Excel spreadsheet file.
As far as I can tell, the line causing the error is:
write.xlsx(resid_pred, parfilename, sheetName = parsheetname, append = TRUE)
where resid_pred is the residuals/predicted data frame, parfilename is the file path and name, and parsheetname is the sheet name.
The error message is:
Error in saveWorkbook(wb, file = file, overwrite = overwrite) : File already exists!
This makes no sense, since the file would HAVE to exist if I am appending to it. Does anyone have a clue?

Amazingly, the following code works:
write.xlsx2(resid_pred, file = parfilename, sheetName = parsheetname,
            col.names = TRUE, row.names = FALSE, append = TRUE)
The only difference is that it calls write.xlsx2 instead of write.xlsx.
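The error text itself is a clue: saveWorkbook(wb, file = file, overwrite = overwrite) matches openxlsx's internals, not xlsx's, which suggests the call was dispatched to openxlsx::write.xlsx (which ignores append= and defaults to overwrite = FALSE), likely because both packages were loaded. If openxlsx is what you end up with, it can sidestep the append loop entirely: passing a named list writes each element to its own worksheet in a single call. A minimal sketch, assuming openxlsx is installed; the list names and toy data are placeholders for the per-model results from the loop:

```r
library(openxlsx)

# Hypothetical stand-in for the residuals/predicted frames built in the loop
resid_pred_list <- list(
  model_a = data.frame(resid = rnorm(5), pred = rnorm(5)),
  model_b = data.frame(resid = rnorm(5), pred = rnorm(5))
)

# openxlsx::write.xlsx() writes each element of a named list to a
# separate worksheet (named after the list element) in one call
write.xlsx(resid_pred_list, file = "residuals_predicted.xlsx", overwrite = TRUE)
```

Building the whole workbook in one call also avoids the append/overwrite ambiguity between the two packages.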

Related

How to assign a sheet name when exporting from R

I am using the code below to export a data frame from RStudio to an Excel file. I want to set the sheet name while exporting, but despite writing what I believe is correct code, the exported file contains the default sheet name "Sheet1". Where am I going wrong? I am using the "xlsx" package.
write.xlsx(SLIDE6FVOLT, "C:\\Users\\I0510906\\Desktop\\RAuto\\SLIDE6FVOLT.xlsx",
sheetname = "SLIDE6FVOLT", colnames = TRUE, rownames = FALSE)
The parameter is called sheetName=, not sheetname= (and likewise col.names= and row.names=, not colnames= and rownames=). Unmatched arguments fall through to ..., so the defaults are used silently.
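With the argument names corrected, the same call would be:

```r
library(xlsx)

# sheetName (capital N) is the argument xlsx::write.xlsx actually matches;
# col.names/row.names use dots, not camelCase
write.xlsx(SLIDE6FVOLT, "C:\\Users\\I0510906\\Desktop\\RAuto\\SLIDE6FVOLT.xlsx",
           sheetName = "SLIDE6FVOLT", col.names = TRUE, row.names = FALSE)
```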

Adding a .CSV file to excel as a sheet in R

I'm able to add a dataframe to excel as a separate sheet. However, I want to be able to add a .CSV file that is already created as a sheet.
Code that works to add dataframe as a sheet:
library(xlsx)
write.xlsx(dataframe, file = excelFileName,
           sheetName = excelsheetname, append = TRUE, row.names = FALSE)
I need to be able to replicate the same thing as above. However, instead of a dataframe, it is a .CSV file. Is there a solution?
Thanks
It seems like the only step missing in your solution is to first read the CSV file into a dataframe using read.csv or read.table:
library(xlsx)
dataframe <- read.csv(csv)
write.xlsx(dataframe, file = excelFileName,
           sheetName = excelsheetname, append = TRUE, row.names = FALSE)
If you specifically want to add a csv to an excel sheet without first reading it in then that is another story, and you should clarify it in your question.
The following works and suits my needs.
csvDF <- read.csv(file = csvFileName, as.is = 1, stringsAsFactors = FALSE, header = FALSE)
write.xlsx(csvDF, file = excelFileName, sheetName = sheetNameInfo,
           append = TRUE, row.names = FALSE, col.names = FALSE)
To start, here is a template that you can use to import an Excel file into R:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")
And if you want to import a specific sheet within the Excel file, then you may use this template:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx",sheet = "Your sheet name")
Note: In the R Console, type the following command to install the readxl package:
install.packages("readxl")
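If you do not know the sheet names in advance, readxl also provides excel_sheets(), which returns the worksheet names so you can loop over all of them. A short sketch; the file path is a placeholder:

```r
library(readxl)

path <- "File Name.xlsx"         # placeholder path to your workbook
sheets <- excel_sheets(path)     # character vector of worksheet names

# Read every sheet into a named list of data frames
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets
```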

r format csv with random line breaks and export

I have csv files that for some reason have random line breaks after some codes:
I can read this file fine in R, but I was wondering if there is a way to write the output without the random line breaks. Importing this file into other programs creates issues where the 416 starts a new line.
id,Abuse,AbuseHistoryOfAbuse,AbuseCurrentlyInAbusive,AbuseHistoryOfCPS,AbuseImminentRisk,AbuseInterventionCodes,Alcohol,AlcoholCurrentlyInTreatment,AlcoholSuspectUse,AlcoholAdmitsUse,AlcoholInterventionCodes,Asthma,AsthmaHistory,AsthmaInterventionCodes,BarriersToService,BarriersExperiencing,BarriersHistoryOf,BarriersToServiceInterventionCodes,BasicNeeds,BasicFood,BasicFoodLimitedAccess,BasicFoodNoWIC,BasicFoodNoDHS,BasicHousing,BasicHousingHasRegular,BasicHousingHomelessWith,BasicHousingHomelessWithout,BasicTransportation,BasicTransportationNoneLimited,BasicOther,BasicNeedsInterventionCodes,Breastfeeding,BreastfeedingPrenatal,BreastfeedingInterventionCodes,BreastHealth,BreastHealthInterventionCodes,Diabetes,DiabetesHistoryGestational,DiabetesHistoryDiabetes,DiabetesInterventionCodes,Drugs,DrugsType,DrugsUse,DrugsInterventionCodes,FamilyPlanning,FamilyPlanningNoPlans,FamilyPlanningInterventionCodes,Hypertension,HypertensionHistoryHypertension,HypertensionHistoryPreeclampsia,HypertensionInterventionCodes,Nutrition,NutritionInterventionCodes,ChronicDisease,ChronicDiseaseHistoryOther,ChronicDiseaseInterventionCodes,Periodontal,PeriodontalNoVisit,PeriodontalInterventionCodes,PersonalGoals,PersonalGoalsInterventionCodes,Smoking,SmokingUse,SmokingInterventionCodes,SocialSupport,SocialSupportInterventionCodes,STD,STDDiscloseSTD,STDDiscloseHIV,STDInterventionCodes,Stress,PrenatalEDSScore,PostnatalEDSScore,StressScore,StressAll,StressModerate,StressHistoryMentalHealth,StressHistoryBabyBlues,StressReportsStress,StressCurrentlyTreated,StressNotFollowing,StressEndorsesSuicidal,StressInterventionCodes,WomensHealth,WomensHealthInterventionCodes
0001,FALSE,FALSE,FALSE,FALSE,FALSE,NA,FALSE,FALSE,FALSE,FALSE,NA,FALSE,FALSE,NA,TRUE,FALSE,TRUE,411
416 ,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,5F11
5T42
,TRUE,Not breastfeeding,NA,FALSE,NA,FALSE,FALSE,FALSE,NA,FALSE,NA,NA,NA,FALSE,FALSE,NA,FALSE,FALSE,FALSE,NA,FALSE,NA,FALSE,FALSE,NA,FALSE,FALSE,NA,FALSE,NA,FALSE,NA,NA,TRUE,2041,FALSE,FALSE,FALSE,NA,FALSE,NA,NA,NA,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,NA,FALSE,NA
As you mentioned, R already knows how to handle these stray breaks when reading CSV files, so the resulting data frame contains no unexpected line breaks. If you simply write it back out to CSV, the output will likewise be free of them:
temp <- read.csv('table_with_line_breaks.csv')
write.csv(temp, 'table_without_line_breaks.csv', row.names = FALSE, quote = FALSE)

Why does creating a CSV file in sparklyr R show an error?

Introduction
I have written the following R code by referring to Link-1. Here, the sparklyr package is used in R to read a large amount of data from a JSON file. But while writing the CSV file, it shows the error below.
R code
sc <- spark_connect(master = "local", config = conf, version = '2.2.0')
sample_tbl <- spark_read_json(sc, name = "example", path = "example.json",
                              header = TRUE, memory = FALSE, overwrite = TRUE)
sdf_schema_viewer(sample_tbl)                       # to view the schema
sample_tbl %>% spark_write_csv(path = "data.csv")   # to write the CSV file
The last line produces the following error. The dataset contains different data types; if required, I can show the schema. It contains nested columns.
Error
Error: java.lang.UnsupportedOperationException: CSV data source does not support struct,media:array,display_url:string,expanded_url:string,id:bigint,id_str:string,indices:array,media......
Question
How can I resolve this error? Is it due to the different data types, or to the columns being nested two to three levels deep? Any help would be appreciated.
It seems that your dataframe has array-typed columns, which are NOT supported by the CSV writer; a CSV file cannot represent arrays or other nested structures in this scenario.
Therefore, if you want your data to be human-readable text, write it out as an Excel file instead.
Note that Excel's CSV dialect (a very special case) can represent embedded "\n" inside quoted fields, but then you must use "\r\n" (the Windows EOL) as the row terminator.
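Two practical workarounds, sketched below with hypothetical column names (id, created_at, text stand in for whatever atomic columns your schema actually has): either select only the flat columns before writing CSV, or keep the nested structure by writing to a format that supports it, such as JSON or Parquet.

```r
library(sparklyr)
library(dplyr)

# Option 1: keep only atomic (string/numeric) columns, then CSV works.
# Column names here are placeholders -- check sdf_schema(sample_tbl).
flat_tbl <- sample_tbl %>%
  select(id, created_at, text)
spark_write_csv(flat_tbl, path = "data_flat_csv")

# Option 2: preserve the nested struct/array columns by writing a
# format that supports nesting
spark_write_json(sample_tbl, path = "data_json")
spark_write_parquet(sample_tbl, path = "data_parquet")
```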

read.csv.folder - quickly pulling pieces of data from one folder

I am currently trying to combine many data files using map_df. I have downloaded my dataset [https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data] and placed it within my working directory. It is a folder containing many separate smaller files. I am hoping to import the dataset quickly using map_df instead of having to name every single file in code. However, when I try to pull the data from that folder:
namedata.df <- read.csv.folder(Namedata, x = TRUE, y = TRUE, header = TRUE, dec = ".", sep = ";", pattern = "csv", addSpec = NULL, back = TRUE)
I get a return of: Error in substr(folder, start = nchar(folder), stop = nchar(folder)) :
object 'Namedata' not found
Why might it be missing the folder? Is there a better way to pull in a folder of data?
Try the ProjectTemplate package. When you run load.project(), it loads all csv and xls files in the project's data directory as data frames, named after the files.
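Since the question mentions map_df, a more direct route is to build the file list with list.files() and row-bind everything with purrr. A sketch, assuming the folder is named "Namedata" and the files follow the SSA national-data layout (yobYYYY.txt, three columns, no header row); adjust the pattern and column names to your actual files:

```r
library(purrr)

# Full paths to every data file in the folder
files <- list.files("Namedata", pattern = "\\.txt$", full.names = TRUE)

# Read each file and row-bind into one data frame; extra arguments
# are passed through to read.csv. Column names are assumptions.
namedata.df <- map_df(files, read.csv,
                      header = FALSE,
                      col.names = c("name", "sex", "count"))
```

This avoids naming each file by hand, and the pattern= argument keeps stray non-data files out of the merge.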
