Error in pandas from_csv datetime parsing

I have a csv file, where the first 3 columns of each row represent a date, like:
2013,1,1,... (first row)
I want to automatically convert the first three columns of each row in the csv into a python datetime object using the following code:
parseDate = lambda y,m,d: datetime.datetime(y,m,d)
df = pandas.DataFrame.from_csv(csvPath, index_col=False, header=None, parse_dates=[0,1,2], date_parser=parseDate)
But I get an error in the date_parser part.
However, just doing
dtime = parseDate(2003,1,1)
works as expected, so my lambda expression actually seems to be correct.
Can anyone help?
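One likely cause, stated as an assumption because the traceback is not shown: values read from a CSV arrive as strings, while datetime.datetime requires integers. That would explain why parseDate(2003,1,1) works when called by hand but fails on parsed input. Below is a minimal sketch of the conversion using only the standard library. The sample rows and the helper name parse_row are illustrative and do not come from the original post.

```python
import csv
import datetime
import io

# Illustrative sample data; only the "2013,1,1" prefix comes from the question.
SAMPLE = "2013,1,1,10.5\n2013,1,2,11.0\n"


def parse_row(row):
    """Build a datetime from the first three fields and return it with the rest."""
    # csv.reader yields strings, so convert explicitly before calling datetime.
    year, month, day = (int(field) for field in row[:3])
    return datetime.datetime(year, month, day), row[3:]


rows = [parse_row(row) for row in csv.reader(io.StringIO(SAMPLE))]
dates = [date for date, _ in rows]
```

The same idea should carry over to a pandas parser callback: it would need to accept strings (or arrays of strings) rather than integers, and the three columns would have to be passed to it together rather than one at a time.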

Related

When writing a DateTime value using openxlsx, an "x" is written followed by the DateTime value in the next row instead of just the DateTime value

I would like to write DateTime values to an excel sheet using openxlsx. When I try to do this, instead of just the DateTime value, I get a lowercase "x" on one row followed by the DateTime in the subsequent row. This occurs whether I use write.xlsx or writeData. I also tried converting the DateTime using as.POSIXlt or as.POSIXct, converting the date with timezone specified or not, and get the same result.
The UTC DateTime values are coming from a PerkinElmer microplate reader file.
Below is a code snippet that gives me this result. Any advice or help is appreciated, Thanks!
library(openxlsx)
library(lubridate)
date <- as_datetime("2022-04-07T22:15:08+0000", tz = "America/Los_Angeles")
options(openxlsx.datetimeFormat = "yyyy-mm-dd hh:mm:ss")
write.xlsx(date,"test.xlsx",overwrite = TRUE)
The documentation of write.xlsx says in section Arguments that x is (my emphasis)
A data.frame or a (named) list of objects that can be handled by writeData() or writeDataTable() to write to file.
So apparently an atomic vector is first coerced to data.frame and since the data argument name is x, so is its column header.
This also happens when writing a named list date_list <- list(date = date). A workbook with a sheet named date is created and the data in it has a column header x.

R changing Excel Date columns Datatype to Logical during read_excel() if Date column is empty in excel

I have two Excel files, VIN1.xlsx and VIN2.xlsx, that I need to compare.
VIN1 has a date column OUTGATE_DT which is populated for at least one row.
VIN2 has a date column OUTGATE_DT which is completely null for all rows.
When I import VIN1.xlsx using read_excel, it creates the object, and when I check the OUTGATE_DT column, its data type is POSIXct[1:4] (which I assume is correct for a date column).
But when I import VIN2.xlsx using read_excel and check the OUTGATE_DT column, its data type is logical[1:4] (because the column is entirely empty).
That is why my compare_df(vin1, vin2) call is failing with:
Error in rbindlist(l, use.names, fill, idcol) :
Class attribute on column 80 of item 2 does not match with column 80 of item 1.
I am completely new to R, so your help would be highly appreciated. TIA
Please check the screenshot for reference.
You should call read_excel() as follows: read_excel(, col_types = "text")
All your columns will then be read as text, so the column classes will match when you compare the two files.
Or, if you want to keep the column types in your original df, you can do something like this:
library(dplyr)
library(readxl)
VIN2 <- read_excel("VIN2.xlsx") %>%
  mutate(OUTGATE_DT = as.Date(OUTGATE_DT))
then you shouldn't have a problem using rbind or bind_rows from dplyr.

A cell in a CSV is (wrongly) read as a character vector of length 2 in R

I have a data frame like this I read in from a .csv (or .xlsx, I've tried both), and one of the variables in the data frame is a vector of dates.
Generate the data with this
Name <- rep("Date", 15)
num <- seq(1:15)
Name <- paste(Name, num, sep = "_")
data1 <- data.frame(
  Name,
  Due.Date = seq(as.Date("2020/09/24", origin = "1900-01-01"),
                 as.Date("2020/10/08", origin = "1900-01-01"), "days")
)
When I reference one of the cells specifically, like this: str(project_dates$Due.Date[241]) it reads the date as normal.
However, the exact position of the important dates varies from project to project, so I wrote a command that identifies where the important dates are in the sheet, like this: str(project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"])
This code worked on a few projects, but on the current project it now returns a character vector of length 2. One of the values is the date, and the other value is NA. And to make matters worse, the location of the date and the NA is not fixed across dates--the date is the first value in some cells and the second in others (otherwise I would just reference, e.g., the first item in the vector).
What is going on here, but more importantly, how do I fix this?!
Clarification on the second command:
When I was originally reading from an Excel file, the command was project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"]$Due.Date because it was returning a 1x1 tibble, and I needed the value in the tibble.
When I switched to reading in data as a csv, I had to remove the $Due.Date because the command was now reading the value as an atomic vector, so the $ operator was no longer valid.
Help me, Oh Blessed 1's (with) Knowledge! You're my only hope!
Edited to include an image of the data like the one that generates the error
I feel sheepish.
I was able to remove the NAs with
data1 <- data1[!is.na(data1$Due.Date), ]
I assumed that command would listwise delete the rows with any missing values, so if the cell contained the 2-length vector, then I would lose the whole row of data. Instead, it removed the NA from the cell, leaving only the date.
Thank you to everyone who commented and offered help!

filtering while downloading a dataset R

There is a large dataset that I need to download over the web using R, and I would like to learn how to filter it to the dates I need while downloading. Right now, I have it set up to download and unzip the file, and then I create another data set with a filter. The file is a ";"-delimited text file.
There is a Date column with the format 1/1/2009 and I only need to select two dates, 3/1/2009 and 3/2/2009. How do I do that in R?
When I import it, R reads the column as a factor. Since I only need those two dates, there is no need for a "between" comparison; I can just select the two factor levels and call it a day.
Thanks!
I don't think you can filter while downloading. To select only these dates you can use the subset function:
# do not convert string to factors
d.all = read.csv(file, ..., stringsAsFactors = FALSE, sep = ';')
# Date column is called DATE:
d.filter = subset(d.all, DATE %in% c("3/1/2009", "3/2/2009"))

RODBC sqlQuery as.is returning bad results

I'm trying to import an excel worksheet into R. I want to retrieve a (character) ID column and a couple of date columns from the worksheet. The following code works fine but brings one column in as a date and not another. I think it has something to do with more leading columns being empty in the second date field.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query)
So my plan now is to bring in the date columns as character fields and convert them myself using as.POSIXct. However, the following code produces only a single row in idsAndDates.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query,as.is=TRUE,TRUE,TRUE)
What am I doing wrong?
I had to move on and ended up using the gdata library (which worked). I'd still be interested in an answer for this though.

Resources