Loading an Excel sheet into R, some strings in the cells of the dataframe appear to be bold and in a different format. For example, like so:
πππ’πππ«π
And when I copy paste this string into the R console, it appears like this:
Anyone know how to fix this (revert these strings into the standard format) in R?
Want to avoid going back into Excel to fix it.
Thanks!
These are actually UTF-8 encoded letters in the Mathematical Alphanumeric Symbols block in Unicode, and they don't map nicely back on to 'standard' ASCII letters in R unless you have a pre-existing mapping function such as utf8_normalize from the utf8 package:
library(utf8)
utf8_normalize('πππ’πππ«π', map_compat = TRUE)
#> [1] "Haidara"
However, I would strongly recommend that you fix your Excel file before importing to avoid having to do this; it works with the example you have given us here, but there may be unwelcome surprises in converting some of your other strings.
Usuaully I use Tidyverse to read in excel files with the read_excel command, however I encountered the dreaded "Unknown or uninitialised column" bug that refers to a non existent column and then warns about said not existent column from then on through the workflow.
So I decided to use openxlsx instead to read in the excel files. All was going well until I realised that openxlsx sees column names with white space as not syntactically correct and it adds a . to replace the whitespace. So 'Customer Name' becomes 'Customer.Name'.
I tried using the check.names=FALSE command to leave the headers in tact, but the package seems to ignore this command.
Many of the headers might have more than a single space between the words and the format has to stay the same. I cannot use an excel package that relies on Java as our company has blocked it.
How can I force openxlsx to leave the header alone?
Example of the code I am using is here: IMACS <- read.xlsx("//zfsstdscun001a.rz.ch.com/UKGI_Pricing/Bus_Insights/R_Scripts/IMACS.xlsx",check.names=FALSE, sheet = "IMACS")
All credit to #Matt on this.
Using readxl and read_excel together worked a treat.
IMACS <- readxl::read_excel("//zfsstdscun001a.rz.com/UKGI_Pricing/Bus_Insights/R_Scripts/CAT Risks/IMACSV2.xlsx",
sheet = "IMACS")
I am trying to parse a text log file like this, I can use the default read.csv to parse this file.
test <- read.csv("test.txt", header=FALSE)
It separated all comma parts, though not perfectly put in a dataframe, further manipulation can be done to improve.
However, I can not seem to do so using readr package
test <- read_csv("test.txt", header=FALSE)
All observations turn into 1 row, no separation between commas.
I am learning this package so any help would be great.
{"dev_id":"f8:f0:05:xx:db:xx","data":[{"dist":[7270,7269,7269,7275,7270,7271,7265,7270,7274,7267,7271,7271,7266,7263,7268,7271,7266,7265,7270,7268,7264,7270,7261,7260]},{"temp":0},{"hum":0},{"vin":448}],"time":4485318,"transmit_time":4495658,"version":"1.0"}
{"dev_id":"f8:xx:05:xx:d9:xx","data":[{"dist":[6869,6868,6867,6871,6866,6867,6863,6865,6868,6869,6868,6860,6865,6866,6870,6861,6865,6868,6866,6864,6866,6866,6865,6872]},{"temp":0},{"hum":0},{"vin":449}],"time":4405316,"transmit_time":4413715,"version":"1.0"}
{"dev_id":"xx:f0:05:e8:da:xx","data":[{"dist":[5775,5775,5777,5772,5777,5770,5779,5773,5776,5777,5772,5768,5782,5772,5765,5770,5770,5767,5767,5777,5766,5763,5773,5776]},{"temp":0},{"hum":0},{"vin":447}],"time":4461316,"transmit_time":4473307,"version":"1.0"}
{"dev_id":"xx:f0:xx:e8:xx:0a","data":[{"dist":[4358,4361,4355,4358,4359,4359,4361,4358,4359,4360,4360,4361,4361,4359,4359,4356,4357,4361,4359,4360,4358,4358,4362,4359]},{"temp":0},{"hum":0},{"vin":424}],"time":5190320,"transmit_time":5198748,"version":"1.0"}
Thanks to #Dave2e pointing out that this file is in JSON format, I found the way to parse it using ndjson::stream_in.
I am a novice in R and I have been having some trouble trying to get R and Excel to cooperate.
I have written a code that makes it able to compare two vectors with each other and determine the differences between them:
data.x<-read.csv(file.choose(), header=T)
data.y<-read.csv(file.choose(), header=T)
newdata.x<-grep("DAG36|G379",data.x,value=TRUE,invert=TRUE)
newdata.x
newdata.y<-grep("DAG36|G379",data.y,value=TRUE,invert=TRUE)
newdata.y
setdiff(newdata.x,newdata.y)
setdiff(newdata.y,newdata.x)
The data I want to transfer from Excel to R is a long row of numbers placed as so:
β312334-2056β, β457689-0932β, β857384-9857β,β¦.,
There are about 350 of these numbers placed in their own separate cell along a single row.
I used the command: = """" & A1 & """" To put double quotes around every number in order for R to read it properly.
At first I tried to simply copy/paste the data directly into a vector in R, but it's as if R wonβt read it as a single row of data and therefore splits it up.
I also tried to save the excel file as a CSV file but that didnβt work either.
Lastly I tried to open it directly in to R using the command:
data.x<- read.csv(file.choose(), header=T)
But as I type in: data.x and press enter it simply says:
<0 rows> (or 0-lenghts row.names)
I simply canβt figure out what Iβm doing wrong. Any help would be greatly appreciated.
It's hard to access without a reproducible example, but you should be able to transpose the Excel file into a single column. Then import using read_csv from the readr package. Take a look at the tidyverse package, which will contain some great tools to import and work with this type of data.
I use https://github.com/tidyverse/readxl/. It makes it easy to maintain formatting from excel into type safe tibbles.
If you can share some sample data a working solution can be generated.
I have an Excel file that I am trying to load into R using the odbcConnectExcel and sqlQuery commands from RODBC package. One of the columns has numerical values with plus or minus signs, such as '5+ or '3-. However, if i do something like,
conn <- odbcConnectExcel("file.xls")
sqlQuery(conn, "SELECT * FROM `Sheet1$`")
then the column with the plus and minus signs will be returned as a numerical column with those symbols stripped. Is there a way to have this column read in as a factor in which the signs are maintained? I would prefer to not have to convert the file to another format first.
Thanks.
Data like this becomes a factor if you use the xlsReadWrite (http://www.swissr.org/software/xlsreadwrite) package to read the file:
library(xlsReadWrite)
x <- read.xls(file="file.xls")
However, note that you need to do something more than just install.packages("xlsReadWrite") to get this package to run. You need another file or so, I forgot.
This doesn't directly address your question, but hopefully it will help:
This is the best summary of options for connecting to Excel that I have seen: Export Data Frames To Multi-worksheet Excel File. While it deals generally with exporting, importing is also possible with most of these approaches.
My favorite is actually the RDCOMClient because it provides total control over Excel as an application.