How to remove the first row in every sheet in R?

I have an Excel document with multiple sheets. I was curious whether there is a way to remove the first row of every sheet, as a header row that isn't necessary automatically appears on each sheet. I would rather not open the file and remove the rows by hand, as this could introduce data errors.

"Many functions that read-in data will have a "skip" argument (under some name). E.g., for readxl::read_excel() you can use skip = 1 to skip one line. If you check out the documentation for the function you are using it should help clarify things."
- Andrew
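For example, a minimal sketch of that approach applied to every sheet at once (the workbook name "data.xlsx" is an assumption; adjust the path to yours):

library(readxl)
# "data.xlsx" is a placeholder file name.
# List every sheet in the workbook, then read each one,
# skipping the unwanted first row of each.
path <- "data.xlsx"
sheets <- excel_sheets(path)
all_data <- lapply(sheets, function(s) read_excel(path, sheet = s, skip = 1))
names(all_data) <- sheets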

Related

RStudio - weird formatting to append new row to .csv file

I am trying to append a new row of data to an existing .csv file, but every time I do so, it doesn't add it on a new row. Instead, it just appends to the last row, like so:
Also, the first column is supposed to show the date, but I'm not sure why it shows up as hashtags. And in the column where it shows 33882020-09-24, it should end at 3388, and everything after that should be in its respective column below.
Here is what I have for my code. I've followed multiple forums on how to append to .csv files and did exactly what was shown, so I am at a loss.
Any suggestions would be greatly appreciated! Thank you in advance!!
For the date, it is possible that RStudio doesn't recognise the value as a date (because you use -, which R treats as an operator). Try not using that symbol (use _ instead, for example), or try writing 24sept2020.
For adding a new row, I found this info:
How can I add a row to a data frame in R?
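For reference, a hedged sketch of appending a single row to an existing CSV (the file name "log.csv" and the column names are assumptions): write.table() with append = TRUE adds the row after the existing content, and col.names = FALSE keeps the header from being written a second time.

# Append one row to an existing CSV without repeating the header.
# Assumes the existing file ends with a newline, as files written
# by write.csv() do.
new_row <- data.frame(date = "2020-09-24", value = 3388)
write.table(new_row, "log.csv",
            append = TRUE, sep = ",",
            row.names = FALSE, col.names = FALSE)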

How to delete a row in a for loop function?

I want to delete the second row of every sheet in an Excel file. Actually, I have done this:
for (i in 1:5) {
  bulan <- read_excel(file, sheet = sheet[i], skip = 2)
}
but it deletes the first 2 rows. How do I delete only the second one? Thanks
It depends on whether you really want to delete a row in your Excel sheet, or just read an Excel sheet without a certain row.
What you are doing here is actually not deleting rows, but skipping the first two rows when reading the sheets. The function read_excel returns a dataframe.
If you want the dataframe without the second row, what you could do is:
for (i in 1:5) {
  bulan <- read_excel(file, sheet = sheet[i])
  bulan <- bulan[-2, ]
}
However, this would not make much sense as is, since bulan gets overwritten in every step of the for loop.
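If the goal is to keep all five cleaned sheets, one sketch (with the same assumptions about file and sheet as in the question; the container name hasil is hypothetical) is to collect the results in a list instead:

hasil <- vector("list", 5)    # hypothetical container for the cleaned sheets
for (i in 1:5) {
  bulan <- read_excel(file, sheet = sheet[i])
  hasil[[i]] <- bulan[-2, ]   # drop only the second row
}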
If you would like to delete rows in your Excel file using R, you could read the file, delete the corresponding row of the dataframe in R, and write the dataframe back to an Excel file. Apparently there is an R package called "xlsx" for writing Excel files.
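A sketch of that round trip with the "xlsx" package (it reuses the hypothetical hasil list from the sketch above; the output file name "cleaned.xlsx" is illustrative):

library(xlsx)
for (i in 1:5) {
  # append = (i > 1): the first pass creates the file,
  # each later pass adds another sheet to it
  write.xlsx(hasil[[i]], "cleaned.xlsx",
             sheetName = sheet[i],
             row.names = FALSE, append = (i > 1))
}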

How do I get EXCEL to interpret character variable without scientific notation in R using fwrite?

I have a relatively simple issue: when writing out in R with fwrite from the data.table package, I am getting a character vector interpreted as scientific notation by Excel. You can run the following code to reproduce the issue:
#create example
library(data.table)
samp = data.table(id = c("7E39", "7G32", "5D99999"))
fwrite(samp, "test.csv", row.names = FALSE)
When you read this back into R, you get the values back no problem if you have scientific notation disabled. My less code-capable colleagues work with the csv directly in Excel, and they see the IDs converted to scientific notation.
They can attempt to change the variable to text, but Excel then keeps the converted number with all its zeros. I want them to see the original "7E39" from the data table created. Any ideas how to avoid this issue?
PS: I'm working with millions of rows, so write.csv is not really an option.
EDIT:
One workaround I've found is to just create a mock variable with quotes:
samp = data.table(id = c("7E39", "7G32","5D99999"))[,id2:=shQuote(id)]
I prefer a tidyr solution (pun intended), as I hate unnecessary columns
EDIT2:
Following r2evans's solution, I adapted it to data.table with the following (factoring in another numerical column, to see if any changes occurred):
#create example
samp = data.table(id = c("7E39", "7G32", "5D99999"))[, second_var := c(1, 2, 3)]
fwrite(samp[, id := sprintf("=%s", shQuote(id))],
       "foo.csv", row.names = FALSE)
It's a kludge, and dang-it for Excel to force this (I've dealt with it before).
write.csv(data.frame(id = sprintf("=%s", shQuote(c("7E39", "7G32", "5D99999")))),
          "foo.csv", row.names = FALSE)
This is forcing Excel to consider that column a formula, and interpret it as such. You'll see that in Excel, it is a literal formula that assigns a static string.
This is obviously not portable and prone to all sorts of problems, but that is Excel's way in this regard.
(BTW: I used write.csv here, but frankly it doesn't matter which function you use, as long as it passes the string through.)
Another option, but one that your consumers will need to do, not you.
If you export the file "as is", meaning the cell content is just "7E39", then an auto-import within Excel will always try to be smart about that cell's content. However, you can manually import the data.
Using Excel 2016 (32bit, on win10_64bit, if it matters):
1. Open Excel (first), with an (optionally empty) worksheet already open.
2. On the ribbon: Data > Get External Data > From Text.
3. Navigate to the appropriate file (CSV).
4. Select "Delimited" (file type), click Next, select "Comma" (and optionally deselect any others that may default to selected), then Next.
5. Click on the specific column(s) and set the "Default data format" to "Text" (this will need to be done for any/all columns where this is a problem). Multiple columns can be Shift-selected (for a range of columns), but not Ctrl-selected. Finish.
6. Choose the top-left cell to import/paste the data (or a new worksheet).
7. Select Properties..., and deselect "Save query definition". Without this step, the data is considered a query into an external data source, which may not be a problem but makes some things a little annoying. (For example, try to highlight all data and delete it ... Excel really wants to make sure you know what you're doing there.)
This method provides a portable solution. It "punishes" the Excel users, but anybody/anything else will still be able to consume the files directly without change. The biggest disadvantage with this method is that you won't know if somebody loads it incorrectly unless/until they get odd results when they try to use the data and some fields have been silently converted.

"arules" library's "read.transaction()" reads in CSV files with an additional, blank column for every transaction

When you attempt to read CSV files that aren't the default groceries.csv, every transaction has an additional entry in it (a blank item), which will mess up all of the calculations for analysis (and can even crash R if your CSV file is big enough). I've tried inserting NAs into all of the blank cells in my CSV file, but I cannot find a way to remove all of them within the read.transactions() call (removing duplicates leaves a single NA). I haven't found a trustworthy way to fix this in any of the other questions on Stack Overflow, nor anywhere else on the internet.
Example entry:
> inspect(trans[1:5])
items
1 {,
FACEBOOK.COM,
Google,
Google Web Search}
It is hard to say. I assume you read the data with read.transactions(). Does your CSV file have leading white spaces in some/all lines? You could try to use the cols parameter in read.transactions() to fix the problem.
An example with data and the code to replicate the problem would help.
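One workaround sketch, assuming the blank items come from trailing or doubled separators in the file (the file name "baskets.csv" is a placeholder): parse the basket lines yourself, drop the empty tokens, and coerce the list to transactions instead of calling read.transactions() directly.

library(arules)
# "baskets.csv" is an assumed file: one comma-separated basket per line.
lines <- readLines("baskets.csv")
items <- strsplit(lines, ",")
items <- lapply(items, function(x) x[trimws(x) != ""])  # drop blank entries
trans <- as(items, "transactions")
inspect(trans[1:5])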

Limited count of columns in XLSX file

I need to generate a big Excel spreadsheet with XLConnect. I am filling each column in this spreadsheet with my calculation, and at the end I write my calculation to the spreadsheet:
writeWorksheetToFile(file = FileName, mtr, startRow = 1, startCol = strcol,
                     sheet = "Sheet1", header = FALSE, rownames = FALSE)
but if I open the Excel file I can only see up to column AMJ. Is there a way to see all of my columns, or is the number of columns in an XLSX file limited?
I am not sure about this, Kaja, because I use a convenience wrapper for XLConnect, but all I need to provide as arguments are the object I want to print to file (for you, "mtr") and the filename, such as perhaps "mtr.print.xlsx".
Why do you need to specify startRow? Also, perhaps your startCol argument only leaves room up to the AMJ column? Have you tried omitting it?
In short, supply only the R object and the filename and see what happens.
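For instance, a minimal call sketch (mtr is the object from the question; the file name is illustrative), letting XLConnect place the data at the default position:

library(XLConnect)
# Write the whole object starting at the default A1 position.
writeWorksheetToFile("mtr.print.xlsx", data = mtr, sheet = "Sheet1")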
