I want to import an excel file in R. The file however has columns such as Jan-13, Jan14 and so on. These are the column headers. When I import the data using the readxl package, it by default converts the date into numbers. So my columns which should be dates are now numbers.
I am using the code :
library(readxl)
data = read_excel("FileName", col_names = TRUE, skip = 0)
Can someone please help?
The date information is still there. It's just in the wrong format. This should work:
names(data) <- format(as.Date(as.numeric(names(data), origin="1899-01-01")), "%b%d")
Related
So I have a bunch of excel files I want to loop through and read specific, discontinuous columns into a data frame. Using the readxl works for the basic stuff like this:
library(readxl)
library(plyr)
wb <- list.files(pattern = "*.xls")
dflist <- list()
for (i in wb){
dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}
# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)
This works (barely) but the problem is my excel files have about 114 columns and I only want specific ones. Also I do not want to allow R to guess the col_types because it messes some of them up (eg for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: How do I specify specific, discontinuous columns to read? The range argument uses the cell_ranger package which does not allow for reading discontinuous columns. So any alternative?
.xlsx >>> you can use library openxlsx
The read.xlsx function from library openxlsx has an optional parameter cols that takes a numeric index, specifying which columns to read.
It seems it reads all columns as characters if at least one column contains characters.
openxlsx::read.xlsx("test.xlsx", cols = c(2,3,6))
.xls >>> you can use library XLConnect
The potential problem is that library XLConnect requires library rJava, which might be tricky to install on some systems. If you can get it running, the keep and drop parameters of readWorksheet() accept both column names and indices. Parameter colTypes deals with column types. This way it works for me:
options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\") #path to jdk
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))
Edit:
Library readxl works well for both .xls and .xlsx if you want to read a range (rectangle) from your excel file. E.g.
readxl::read_xls("test.xls", range = "B3:D8")
readxl::read_xls("test.xls", sheet = "Sheet1", range = cell_cols("B:E"))
readxl::read_xlsx("test.xlsx", sheet = 2, range = cell_cols(2:5))
I am having two date(formatted as dd-mm-yyyy in excel) columns in my data in excel sheet.
Date Delivery Date Collection
06-08-17 15-08-17
11-04-17 15-04-17
24-01-17 24-01-17
11-08-16 14-08-16
There are multiple issues.
Currently I am reading a subset of data(manually made of top 100 rows in another excel sheet.).
The dates in same format in excel are shown differently in R.
They all look like as in Data.Collection when I read the whole data set.
data <- read.xlsx("file.xlsx", sheetName='subset', startRow=1)
The data output shown in R is
.
I need them all to be shown as in Data.Delivery because I need to write the result back after analysis.
I am also trying to make it Date in R using
dates <- data$Date.Delivery
as.Date(dates, origin = "30-12-1899",format="%d-%m-%y")
To format Date.Collection as in Data.Delivery after reading your file, try
# see the str of your data
str(data)
# if Date.Collection is characher
data$Date.Collection <- as.numeric(data$Date.Collection)
# if Date.Collection is factor
data$Date.Collection <- as.numeric(levels(data$Date.Collection))[data$Date.Collection]
# conversion
data$Date.Collection <- as.Date(data$Date.Collection - 25569, origin = "1970-01-01")
or you can read the file using "gdata" or "XLConnect" packages to read the column as factor.
then use ymd() from lubridate to convert it into date
require(gdata)
data = read.xls (path, sheet = 1, header = TRUE)
data$Date.Collection <- ymd(data$Date.Collection)
I have an excel file which has date information in some cells. like :
I read this file into R by the following command :
library(xlsx)
data.files = list.files(pattern = "*.xlsx")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 9,header = T))
Everything is correct except the cells with date! instead of having the xlsx information into those cell, I always have 42948 as a date :
Does anybody know how could I fix this ?
As you can see, after importing your files, dates are represented as numeric values (here 42948). They are actually the internal representation of the date information in Excel. Those values are the ones that R presents instead of the “real” dates.
You can get those dates in R with as.Date(42948 - 25569, origin = "1970-01-01")
Notice that you can also use a vector containing the internal representation of the dates, so this should also work
vect <- c(42948, 42949, 42950)
as.Date(vect - 25569, origin = "1970-01-01")
PS: To convert an Excel datetime colum, see this (p.31)
So I have a bunch of excel files I want to loop through and read specific, discontinuous columns into a data frame. Using the readxl works for the basic stuff like this:
library(readxl)
library(plyr)
wb <- list.files(pattern = "*.xls")
dflist <- list()
for (i in wb){
dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}
# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)
This works (barely) but the problem is my excel files have about 114 columns and I only want specific ones. Also I do not want to allow R to guess the col_types because it messes some of them up (eg for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: How do I specify specific, discontinuous columns to read? The range argument uses the cell_ranger package which does not allow for reading discontinuous columns. So any alternative?
.xlsx >>> you can use library openxlsx
The read.xlsx function from library openxlsx has an optional parameter cols that takes a numeric index, specifying which columns to read.
It seems it reads all columns as characters if at least one column contains characters.
openxlsx::read.xlsx("test.xlsx", cols = c(2,3,6))
.xls >>> you can use library XLConnect
The potential problem is that library XLConnect requires library rJava, which might be tricky to install on some systems. If you can get it running, the keep and drop parameters of readWorksheet() accept both column names and indices. Parameter colTypes deals with column types. This way it works for me:
options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\") #path to jdk
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))
Edit:
Library readxl works well for both .xls and .xlsx if you want to read a range (rectangle) from your excel file. E.g.
readxl::read_xls("test.xls", range = "B3:D8")
readxl::read_xls("test.xls", sheet = "Sheet1", range = cell_cols("B:E"))
readxl::read_xlsx("test.xlsx", sheet = 2, range = cell_cols(2:5))
I'm using RStudio and I wanted to import csv data.
This data has 3 columns and they are separated by ",".
Now I type test <- read.csv("data1.csv", sep=",")
Data is imported but its imported as just ONE column.
Headers are okay, but also the headers (actually 3) are combined together in just one column.
If I set header=F, there was V1 as heading. So there really is just one column.
Why is my separator not working?
try read_csv() from readr package
install.packages("devtools")
devtools::install_github("hadley/readr")
with your sample input
library(readr)
file <- 'Alter.des.Hauses,"Quadratfuß","Marktwert"\n33,1.812,"$90.000,00"\n'
read_csv(file) # for the actual use read_csv("file.csv") ...
read_csv2(file)