I'm able to add a data frame to an Excel workbook as a separate sheet. However, I want to add an existing .CSV file as a sheet.
Code that works to add dataframe as a sheet:
library(xlsx)
write.xlsx(dataframe, file = excelFileName,
           sheetName = excelsheetname, append = TRUE, row.names = FALSE)
I need to replicate the same thing as above, but starting from a .CSV file instead of a data frame. Is there a solution?
Thanks
It seems like the only step missing in your solution is to first read the CSV file into a dataframe using read.csv or read.table:
library(xlsx)
dataframe <- read.csv(csv)
write.xlsx(dataframe, file = excelFileName,
           sheetName = excelsheetname, append = TRUE, row.names = FALSE)
If you specifically want to add a CSV to an Excel sheet without first reading it into R, that is another story, and you should clarify it in your question.
The following works and suits my needs.
csvDF <- read.csv(file = csvFileName, as.is = 1, stringsAsFactors = FALSE, header = FALSE)
write.xlsx(csvDF, file = excelFileName, sheetName = sheetNameInfo, append = TRUE,
           row.names = FALSE, col.names = FALSE)
To start, here is a template that you can use to import an Excel file into R:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx")
And if you want to import a specific sheet within the Excel file, then you may use this template:
library("readxl")
read_excel("Path where your Excel file is stored\\File Name.xlsx",sheet = "Your sheet name")
Note: In the R Console, type the following command to install the readxl package:
install.packages("readxl")
I have csv files that for some reason have random line breaks after some codes:
I can read this file fine in R, but is there a way to write the output without the random line breaks? Importing this file into other programs causes issues because the 416 starts a new line.
id,Abuse,AbuseHistoryOfAbuse,AbuseCurrentlyInAbusive,AbuseHistoryOfCPS,AbuseImminentRisk,AbuseInterventionCodes,Alcohol,AlcoholCurrentlyInTreatment,AlcoholSuspectUse,AlcoholAdmitsUse,AlcoholInterventionCodes,Asthma,AsthmaHistory,AsthmaInterventionCodes,BarriersToService,BarriersExperiencing,BarriersHistoryOf,BarriersToServiceInterventionCodes,BasicNeeds,BasicFood,BasicFoodLimitedAccess,BasicFoodNoWIC,BasicFoodNoDHS,BasicHousing,BasicHousingHasRegular,BasicHousingHomelessWith,BasicHousingHomelessWithout,BasicTransportation,BasicTransportationNoneLimited,BasicOther,BasicNeedsInterventionCodes,Breastfeeding,BreastfeedingPrenatal,BreastfeedingInterventionCodes,BreastHealth,BreastHealthInterventionCodes,Diabetes,DiabetesHistoryGestational,DiabetesHistoryDiabetes,DiabetesInterventionCodes,Drugs,DrugsType,DrugsUse,DrugsInterventionCodes,FamilyPlanning,FamilyPlanningNoPlans,FamilyPlanningInterventionCodes,Hypertension,HypertensionHistoryHypertension,HypertensionHistoryPreeclampsia,HypertensionInterventionCodes,Nutrition,NutritionInterventionCodes,ChronicDisease,ChronicDiseaseHistoryOther,ChronicDiseaseInterventionCodes,Periodontal,PeriodontalNoVisit,PeriodontalInterventionCodes,PersonalGoals,PersonalGoalsInterventionCodes,Smoking,SmokingUse,SmokingInterventionCodes,SocialSupport,SocialSupportInterventionCodes,STD,STDDiscloseSTD,STDDiscloseHIV,STDInterventionCodes,Stress,PrenatalEDSScore,PostnatalEDSScore,StressScore,StressAll,StressModerate,StressHistoryMentalHealth,StressHistoryBabyBlues,StressReportsStress,StressCurrentlyTreated,StressNotFollowing,StressEndorsesSuicidal,StressInterventionCodes,WomensHealth,WomensHealthInterventionCodes
0001,FALSE,FALSE,FALSE,FALSE,FALSE,NA,FALSE,FALSE,FALSE,FALSE,NA,FALSE,FALSE,NA,TRUE,FALSE,TRUE,411
416 ,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,5F11
5T42
,TRUE,Not breastfeeding,NA,FALSE,NA,FALSE,FALSE,FALSE,NA,FALSE,NA,NA,NA,FALSE,FALSE,NA,FALSE,FALSE,FALSE,NA,FALSE,NA,FALSE,FALSE,NA,FALSE,FALSE,NA,FALSE,NA,FALSE,NA,NA,TRUE,2041,FALSE,FALSE,FALSE,NA,FALSE,NA,NA,NA,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,NA,FALSE,NA
I added a screenshot that helps show more:
As you mentioned, R handles these stray line breaks when reading the CSV file, so the resulting data frame contains no unexpected line breaks. If you simply write it back out to CSV, the output will likewise be free of them:
# Read the original file; read.csv reassembles the broken records
temp <- read.csv('table_with_line_breaks.csv')
# Write it back out as a clean, one-record-per-line CSV
write.csv(temp, 'table_without_line_breaks.csv', row.names = FALSE, quote = FALSE)
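If some of the breaks are embedded inside character values and survive the round trip, here is a minimal sketch that strips them before writing. It assumes the same file names as above; the cleaning step is an extra precaution, not something read.csv does by itself:
temp <- read.csv('table_with_line_breaks.csv', stringsAsFactors = FALSE)

# Replace any newline characters that remain inside character columns
temp[] <- lapply(temp, function(col) {
  if (is.character(col)) gsub("[\r\n]+", " ", col) else col
})

write.csv(temp, 'table_without_line_breaks.csv', row.names = FALSE, quote = FALSE)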
Introduction
I have written the following R code by referring to Link-1. The sparklyr package is used in R to read a large amount of data from a JSON file, but writing the data to a CSV file produces the error shown below.
R code
library(sparklyr)

sc <- spark_connect(master = "local", config = conf, version = '2.2.0')
sample_tbl <- spark_read_json(sc, name = "example", path = "example.json",
                              header = TRUE, memory = FALSE, overwrite = TRUE)
sdf_schema_viewer(sample_tbl)                       # to view the schema
sample_tbl %>% spark_write_csv(path = "data.csv")   # to write the CSV file
The last line produces the following error. The dataset contains different data types and has nested data columns; if required, I can show the database schema.
Error
Error: java.lang.UnsupportedOperationException: CSV data source does not support struct,media:array,display_url:string,expanded_url:string,id:bigint,id_str:string,indices:array,media......
Question
How can I resolve this error? Is it caused by the different data types or by the columns that are nested two to three levels deep? Any help would be appreciated.
Your Spark data frame contains struct and array columns, which the CSV data source does not support; a CSV file cannot represent arrays or other nested structures in this scenario.
Therefore, if you want your data as human-readable text, write it out as an Excel file instead.
Note that Excel's CSV dialect (a very special case) can hold multi-line values by placing "\n"
inside quoted fields, but you then have to use "\r\n" (the Windows EOL) as the row terminator.
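If you do need a CSV, one common workaround is to write only the flat columns (or flatten the nested ones first). Here is a minimal sketch; the column names id, id_str, and text are hypothetical and should be replaced by the flat columns in your own schema:
library(sparklyr)
library(dplyr)

# Keep only flat (non-struct, non-array) columns, since the CSV writer
# cannot serialize nested types. The column names here are hypothetical.
flat_tbl <- sample_tbl %>%
  select(id, id_str, text)

spark_write_csv(flat_tbl, path = "data.csv")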
I am currently trying to merge two data files using map_df. I have downloaded my dataset [https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data] and placed it in my working directory. It is a folder containing many separate smaller files. I am hoping to import the dataset quickly using map_df instead of having to name every single file in code. However, when I try to pull the data from that folder:
namedata.df <- read.csv.folder(Namedata, x = TRUE, y = TRUE, header = TRUE, dec = ".", sep = ";", pattern = "csv", addSpec = NULL, back = TRUE)
I get a return of: Error in substr(folder, start = nchar(folder), stop = nchar(folder)) :
object 'Namedata' not found
Why might it be missing the folder? Is there a better way to pull in a folder of data?
Try ProjectTemplate. When you run the load.project() command, it loads all CSV and XLS files as data frames. The data frame names are the same as the file names.
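As a side note, the original call fails because Namedata is passed as an unquoted object name rather than a quoted folder path, which is why R reports object 'Namedata' not found. Since the question mentions map_df, here is a minimal sketch of that approach with purrr; it assumes the files are semicolon-delimited CSVs sitting in a folder named Namedata under the working directory, so adjust the path, pattern, and separator to match your files:
library(purrr)

# List every CSV file in the folder (folder name and pattern are assumptions)
files <- list.files("Namedata", pattern = "\\.csv$", full.names = TRUE)

# Read each file and row-bind the results into one data frame
namedata.df <- map_df(files, read.csv, header = TRUE, sep = ";",
                      stringsAsFactors = FALSE)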