readxl and openxlsx add extra characters to numbers from an Excel file

I have some numbers in an Excel file that I want to read into R as characters. When I import the file with either readxl or openxlsx, the imported values have two extra characters that are not in the Excel file. The Excel sheet looks like this:
The example file is here
I have tried changing the format within the Excel file, but this messes up the numbers. My current workaround is to concatenate the number with ' in a separate column in Excel and then read that column into R. For some reason, this works.
library(readxl)
boo <- read_excel("./boo.xlsx",
col_types = c("text"))
boo
Reading the Excel file gives the following (note the last two characters in the Example numbers column). The concatNum column shows the concatenated version.
# A tibble: 6 x 2
`Example numbers` concatNum
<chr> <chr>
1 985.12002779568002 '985.12002779568
2 985.12002826159505 '985.120028261595
3 985.12002780627301 '985.120027806273
4 985.12002780627301 '985.120027806273
5 985.12002780724401 '985.120027807244
6 985.12002780291402 '985.120027802914
Any reasons why this would be happening? Does anyone have a better way of fixing it than my current work-around?
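A likely explanation (not confirmed in the thread): Excel stores numeric cells as IEEE 754 doubles and appears to write them into the .xlsx file with up to 17 significant digits, while its grid shows at most 15. Forcing such a cell to text exposes the 17-digit form. The sketch below assumes the 15-significant-digit string is the one you want; `excel_text()` is a hypothetical helper name, not part of readxl or openxlsx. It uses base R `sprintf("%.15g", ...)`, which reproduces the concatNum values for these rows.

```r
# Hypothetical helper: format doubles with 15 significant digits,
# which is the precision Excel shows in a cell (an assumption about intent).
excel_text <- function(x) {
  out <- sprintf("%.15g", x)       # round to 15 significant digits; %g drops trailing zeros
  out[is.na(x)] <- NA_character_   # keep empty cells as NA instead of the string "NA"
  out
}

# Values as returned by read_excel(col_types = "text") in the question
raw <- c("985.12002779568002", "985.12002826159505", "985.12002780627301")

excel_text(as.numeric(raw))
```

One way to apply it with readxl would be to import that column as numeric instead of text (e.g. `col_types = c("numeric", "text")` for this two-column sheet) and then call `excel_text()` on it.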

Related

Opening csv file correctly

I am trying to use this dataset: wine_quality_dataset
I am running the following function:
data2 <- read.table("C:/Users/Magda/Downloads/winewhite.csv")
And here is what I got:
head(data2)
V1
1 fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality
2 7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
3 6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6
4 8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
5 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
6 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
What command should I use to read csv file correctly?
Try
readr::read_csv("C:/Users/Magda/Downloads/winewhite.csv")
readr is part of the tidyverse, a collection of packages that help you tidy data.
If the file is a European-style CSV with a semicolon (;) separator and a comma (,) decimal mark, use
readr::read_csv2("C:/Users/Magda/Downloads/winewhite.csv")
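Note that this particular file uses ; as the separator but a dot as the decimal mark (0.27, 0.994, ...), so a reader that assumes comma decimals may misparse it. A base R alternative is `read.csv()` with `sep = ";"`, which keeps the default `dec = "."` and `header = TRUE`. The sketch inlines the first lines shown in the question via `text =` so it runs without the file; for the real file, pass the path instead.

```r
# First lines of winewhite.csv from the question:
# ";" separates fields and "." is the decimal mark.
csv_text <- "fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality
7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6"

# header = TRUE and dec = "." are the read.csv() defaults; only sep changes.
wine <- read.csv(text = csv_text, sep = ";")

# For the real file:
# wine <- read.csv("C:/Users/Magda/Downloads/winewhite.csv", sep = ";")
str(wine)
```

read.csv() converts the spaces in the header to dots, so "fixed acidity" becomes `fixed.acidity`.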

R read_xlsx Adds Trailing Digit to Character

I am reading an Excel file into R using the read_xlsx function from the readxl package. Some of the columns could be "numerics" in Excel, but I convert everything to a character as I read things in. This solves a lot of downstream problems for me because really none of the data from Excel is actually numeric in practice. Things that look like numerics are really identification numbers of some sort.
Here is my issue. I am trying to read in the following data:
You can see that the first column is a numeric in Excel. When I read this in, I get:
library(readxl)
xl <- read_xlsx("C:/test/test.xlsx", col_types = c("text"))
xl
#> # A tibble: 1 x 3
#> some_id_number some_name some_other_name
#> <chr> <chr> <chr>
#> 1 310.16000000000003 name name_Descriptions
Where is that trailing 3 coming from? I have tried to adjust the digits option per this question without any luck.
Any thoughts?
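The trailing 3 is consistent with binary floating point: 310.16 has no exact double representation, and the nearest double printed to 17 significant digits is 310.16000000000003, which appears to be what Excel stored in the file. A quick check in base R (this illustrates the representation; it is not readxl's own code):

```r
x <- 310.16

sprintf("%.17g", x)  # 17 significant digits: enough to pin down the exact double
#> [1] "310.16000000000003"
sprintf("%.15g", x)  # 15 significant digits: what Excel displays in the cell
#> [1] "310.16"
```

This is also why changing R's `digits` option has no effect here: the extra digits are already part of the text string that comes back from the file.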

read.xlsx file with one column consisting of "numbers as text"

I have an Excel file that contains numeric variables, but the first column (the index column) uses custom formatting: these are numbers that should be presented as text (or something similar) and always have a fixed number of digits, padded with leading zeros. Here is my example table from Excel:
And here is formatting for bad_col1 (rest are numbers or general):
When I try to import my data by using read.xlsx function from either openxlsx or xlsx package it produces something like this:
read.xlsx(file_dir,sheet=1)#for openxlsx
bad_col1 col2 col3
1 5 11 974
2 230 15 719
3 10250 6 944
4 2340 7 401
As you can see, the leading zeros are gone. Is there any way to read the first column as text and the others as numeric? I cannot convert it to text afterwards, because the leading zeros are already gone. I can think of a workaround, but for my project it would be more practical to have them converted while importing.
Thank you in Advance
You can pass a vector of column types to the col_types argument of read_xlsx() from readxl:
library(readxl)
types <- c('text', 'numeric', 'numeric')
the_file <- read_xlsx("sample.xlsx", col_types = types)
If you have many columns, you can also drop some of them by putting 'skip' at the corresponding positions in the vector.
Regards
As described in https://readxl.tidyverse.org/reference/read_excel.html, you can use the col_types parameter so that the first column is read as character.
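If the leading zeros in bad_col1 come only from a custom number format (so the cells themselves store 5, 230, ...), reading the column as text may still give "5" rather than "00005", since the stored value has no zeros. Because the column has a fixed number of digits, the zeros can be restored after import. The sketch below assumes a width of 5 (the widest value shown is 10250); `pad_id()` is a hypothetical helper, and `formatC()` is base R.

```r
# Hypothetical helper: left-pad whole numbers with zeros to a fixed width.
# width = 5 is an assumption based on the example table; use your real width.
pad_id <- function(x, width = 5) {
  formatC(as.integer(x), width = width, format = "d", flag = "0")
}

bad_col1 <- c(5, 230, 10250, 2340)  # values as imported in the question
pad_id(bad_col1)
```

`as.integer()` also accepts the character strings that `col_types = "text"` returns, so the same call works whichever way the column was read.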

Read csv but skip escaped commas in strings

I have a csv file like this:
id,name,value
1,peter,5
2,peter\,paul,3
How can I read this file and tell R that "\," does not start a new column, and only an unescaped "," does?
I should add that the file is 400 MB.
Thanks
You can use readLines() to read the file into memory and then pre-process it. If you're willing to convert the escaped commas into something else, you can do something like:
> read.csv(text = gsub("\\\\,", "-", readLines("dat.csv")))
id name value
1 1 peter 5
2 2 peter-paul 3
Another option is to use the fact that fread() from data.table can run a shell command and read its output (via its cmd argument). You can then apply a sed substitution to the file before reading it in, which may or may not be faster:
> data.table::fread(cmd = "sed -e 's/\\\\\\,/-/g' dat.csv")
id name value
1: 1 peter 5
2: 2 peter-paul 3
You can then use gsub() to convert the temporary - placeholder back into a comma.
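A minimal sketch of that round trip with base R, assuming only the name column can contain escaped commas. It uses a placeholder that is unlikely to occur in real data (the string "@@COMMA@@" is a hypothetical choice), because "-" cannot be mapped back unambiguously if the data already contains dashes.

```r
# Sample lines from the question; "\\," in R source is the two characters \ and ,
lines <- c("id,name,value", "1,peter,5", "2,peter\\,paul,3")
# For the real file: lines <- readLines("dat.csv")

# Hypothetical placeholder, assumed never to appear in the data.
placeholder <- "@@COMMA@@"

# Swap escaped commas for the placeholder, parse, then restore real commas.
dat <- read.csv(text = gsub("\\,", placeholder, lines, fixed = TRUE),
                stringsAsFactors = FALSE)
dat$name <- gsub(placeholder, ",", dat$name, fixed = TRUE)
dat
```

`fixed = TRUE` treats the pattern as a literal backslash followed by a comma, which avoids the extra regex escaping seen in the gsub() call above.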

Creating a vector from a file in R

I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing words and their occurrence counts. For that purpose I am using the snippets package.
As shown at the bottom of the link, I first have to create a vector (is it right that words is a vector?) like below:
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is doing the same thing, but creating the vector from a file that contains the words and their occurrence counts. I would be very happy if you could give me some hints.
To understand what format the input file should have, I wrote the vector words to a file:
> write(words, file="words.txt")
However, the file words.txt contains only the values but not the names(apple, pie etc.).
$ cat words.txt
10 14 5 4
Thanks.
words is a named vector; the distinction matters for the cloud() function, if I read its help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word-occurrence file in the same format as the words.txt written above. When you read it back into R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, then subsetting it to take the one and only column (variable), which we store in words. The last step takes the row names from the file read in and applies them as the "names" of the words vector, using the names() function.
Yes, 'vector' is the proper term.
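If the word file is instead a plain two-column list (one word and its count per line, no header), which is an assumed layout rather than the one written by write.table(), the named vector can be built in one step with base R `setNames()`. The sketch inlines the data via `text =`; for a real file, pass its path to read.table().

```r
# Assumed file layout: "word count" per line, no header row.
counts <- read.table(text = "apple 10
pie 14
orange 5
fruit 4", col.names = c("word", "n"), stringsAsFactors = FALSE)

# Values are the counts; names are the words.
words <- setNames(counts$n, counts$word)
words
```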
EDIT:
A better method than write.table would be to use save() and load():
save(words, file="svwrd.rda")
load(file="svwrd.rda")
The save/load combination preserves all the structure rather than coercing it. The write.table() route followed by names()<- is something of a hassle, as you can see in both Gavin's answer here and my answer on R-help.
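A related base R pair is saveRDS()/readRDS(), which serialize a single object. Unlike load(), which recreates the object under its original name, readRDS() returns the object, so you choose the name it gets. A round-trip sketch using a temporary file:

```r
words <- c(apple = 10, pie = 14, orange = 5, fruit = 4)

path <- tempfile(fileext = ".rds")  # temporary file, just for this demo
saveRDS(words, path)

restored <- readRDS(path)  # values and names come back unchanged
identical(restored, words)
```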
Initial answer:
I suggest using as.data.frame() to coerce to a data frame and then write.table() to write it to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4
