I export my CSV file with Python, and numbers are wrapped as ="10000000000" in cells, for example:
name,price
"something expensive",="10000000000",
In order to display the number correctly, I prefer to wrap big numbers or strings of digits (such as order IDs) in this format, so someone can open the file directly without reformatting the column.
It displays correctly in Excel or Numbers, but when I import it into R using read.csv, the cells' values show as =10000000000.
Is there any solution to this?
Thank you
How about:
yourcsv <- read.csv("yourcsv.csv")
yourcsv$price <- gsub("=", "", yourcsv$price)
Also, in my experience read_csv() from the readr package (part of the tidyverse) reads data much faster than read.csv(), and I think it also has more logic built in for non-ideal cases, so it may be worth trying.
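For example, here is a minimal readr-based sketch, assuming the file and column names from the example above; the character class strips both the = and any leftover quotes from the ="..." wrapper:
library(readr)

yourcsv <- read_csv("yourcsv.csv")
# strip the ="..." wrapper and convert the cleaned strings to numeric
yourcsv$price <- as.numeric(gsub('[="]', "", yourcsv$price))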
I want to read a pretty large CSV file from S3 that includes entries like Hawaii 21"" pizza. However, I noticed that if I use fread (which I prefer as it's faster), entries containing doubled double quotes turn into Hawaii 21"""" pizza. This kind of issue does not occur if I use read.csv.
I noticed the warning message recommends adding quote="" to avoid the issue. But how can I pass that through the s3read_using function?
I can use gsub to make the extra quotes disappear, but I'm still wondering whether there's a direct solution.
And below is my read-in code:
table <- s3read_using(FUN=fread, object='mytable.csv', bucket="mybucket/tables")
table <- s3read_using(FUN=read.csv, object='mytable.csv', bucket="mybucket/tables")
Thanks in advance!
Try:
table <- s3read_using(FUN=fread, quote="\"", object='mytable.csv', bucket="mybucket/tables")
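Extra arguments to s3read_using() are passed through to FUN, so fread's quote argument can be supplied directly. If you instead want to disable quote processing entirely, as the warning message suggests, something like this should also work (a sketch, assuming the same object and bucket names):
table <- s3read_using(FUN = fread, quote = "", object = 'mytable.csv', bucket = "mybucket/tables")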
I'm using openxlsx's read.xlsx to import a data frame from a mixed-type column. The desired result is to import all values as strings, exactly as they're represented in Excel. However, some decimals are represented as very long floats.
Sample data is simply an Excel file with a column containing the following rows:
abc123,
556.1,
556.12,
556.123,
556.1234,
556.12345
require(openxlsx)
df <- read.xlsx('testnumbers.xlsx')
Using the above R code to read the file results in df containing these string values:
abc123,
556.1,
556.12,
556.12300000000005,
556.12339999999995,
556.12345000000005
The Excel file provided in production has the column formatted as "General". If I format the column as Text, there is no change unless I explicitly double-click each cell in Excel and hit enter; in that case, the number is correctly displayed as a string. Unfortunately, clicking each cell isn't an option in the production environment. Any solution (Excel, R, or otherwise) is appreciated.
*Edit: I've read through the question "Why Are Floating Point Numbers Inaccurate?" and believe I understand the math behind what's going on. At this point, I suppose I'm looking for a workaround: how can I get a float from Excel into an R data frame as text without changing the representation?
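As a quick illustration of the underlying issue (not specific to Excel), R prints the same representation once enough significant digits are requested:
format(556.123, digits = 17)
# [1] "556.12300000000005"  (the closest IEEE-754 double to 556.123)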
I was able to get the correct formats into a data frame using pandas in Python.
import pandas as pd
test = pd.read_excel('testnumbers.xlsx', dtype = str)
This will suffice as a workaround, but I'd like to see a solution in R.
Here is a workaround in R using openxlsx that I used to solve a similar issue. I think it will solve your question, or at least allow you to format cells as text in the Excel files programmatically.
I use it to reformat specific cells in a large number of files (in my case I'm converting from general to 'scientific', as an example of how you might adapt this for another format).
This uses functions from the openxlsx package that you reference in the OP.
First, load the xlsx file in as a workbook (stored in memory, which preserves all the xlsx formatting etc.; slightly different from the method shown in the question, which pulls in only the data):
testnumbers <- loadWorkbook(here::here("test_data/testnumbers.xlsx"))
Then create a "style" to apply which converts the numbers to "text" and apply it to the virtual worksheet (in memory).
numbersAsText <- createStyle(numFmt = "TEXT")
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1, rows = 1:10)
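If the real data span more rows and columns, addStyle's gridExpand argument applies the style to every combination of the rows and cols given; a sketch with hypothetical dimensions:
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1:3, rows = 1:1000, gridExpand = TRUE)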
Finally, save it back to the original file:
saveWorkbook(testnumbers,
             file = here::here("test_data/testnumbers_formatted.xlsx"),
             overwrite = TRUE)
When you open the Excel file, the numbers will be stored as "text".
I've got a CSV file that I am reading into an R script using fread. The resulting variable is a vector, which is what I need for the next step in my process. There are values in my CSV file such as 'Energy \nElectricity', and the intention is that these will be labels for a chart, with a line break between (in this case) 'Energy' and 'Electricity' for formatting reasons.
When I manually code the vector to be
myVec <- c('Energy \nElectricity'), this works fine and the line break is maintained.
When I read the data in using fread, however, the resulting vector is effectively c('Energy \\nElectricity'), i.e. the process has inserted an extra escape character and the formatting is lost.
My question is as follows:
Is there a way to use fread to maintain these line breaks at all?
If not, can I format them differently in my csv file?
If not, can I use gsub or similar to remove the extra line break once the file has been read into a vector?
I have tried all manner of ways to implement gsub (and sub), but they either get rid of both characters (for example, gsub("\\\\", "\\", myVec) gives
[1] "Energy nElectricity") or they throw an error. I think I am missing something obvious. Any help appreciated.
If nobody comes up with a better solution, this is how you would clean it using gsub:
gsub("\\n", "\n", "Energy \\nElectricity", fixed = TRUE)
With fixed = TRUE the pattern is treated as a literal string rather than a regular expression, which also makes it considerably faster than fixed = FALSE.
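Applied to the vector read in by fread, that would look something like this (a sketch, assuming the vector is called myVec as above):
myVec <- gsub("\\n", "\n", myVec, fixed = TRUE)
cat(myVec[1])  # prints "Energy" and "Electricity" on separate lines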
I am trying to import data in xls format into R, but it reads the header incorrectly; instead of
X1
R interprets the data as
`X1 `
The trailing space makes referring to the column in R code needlessly complicated.
How can this issue be resolved?
You can skip the header record and supply your own column names with any number of R packages that read Excel data. Here is an example with readxl::read_excel().
library(readxl)
# skip the bad header row; col_names = FALSE lets read_excel assign default names
data <- read_excel("./data/anExcelWorksheet.xlsx",
                   col_names = FALSE,
                   skip = 1)
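Alternatively, if the only problem is stray whitespace in the header, you could keep the header row and just trim the names after reading (a sketch using base R's trimws()):
data <- read_excel("./data/anExcelWorksheet.xlsx")
names(data) <- trimws(names(data))  # "X1 " becomes "X1"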
This code works; however, I wonder if there is a more efficient way. I have a CSV file that has a single column of ticker symbols. I then read this CSV into R and apply functions to each ticker using a for loop.
I read in the CSV, and then go into the data frame and pull out the character vector that the for loop needs to run properly.
SymbolListDataFrame <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors = FALSE)
SymbolList <- SymbolListDataFrame[[1]]
for (Symbol in SymbolList) {...}
Is there a way to combine the first two lines I have written into one? Maybe read.csv is not the best command for this?
Thank you.
UPDATE
I am using the readLines method suggested by Jake and Bartek. There is a warning ("incomplete final line found on" the CSV file), but I ignore it since the data is read correctly.
SymbolList <- readLines("DJIA.csv")
SymbolList <- read.csv("DJIA.csv", header = FALSE, stringsAsFactors=F)[[1]]
The readLines function is the best solution here.
Note that read.csv is not only for reading files with a .csv extension; it is simply the read.table function with parameters like header and sep set differently. Check the documentation for more info.
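If you prefer a single base R call that skips the data frame entirely, scan() also reads a one-column file straight into a character vector (a sketch; quiet = TRUE suppresses the item-count message):
SymbolList <- scan("DJIA.csv", what = character(), sep = "\n", quiet = TRUE)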