> df <- read.csv("C:\\Users\\Vikas Kumar Dwivedi\\Desktop\\Yahoo.csv")
> df
Date Open High Low Close Adj.Close Volume
1 01-03-2013 null null null null null null
2 01-04-2013 1569.180054 1597.569946 1536.030029 1597.569946 1597.569946 77098000000
3 01-05-2013 1597.550049 1687.180054 1581.280029 1630.73999 1630.73999 76447250000
> df$Date <- as.Date(df$Date, format("%m/%d/%Y"))
> df <- df[order(df$Date), ]
> df<- as.xts(df[, 2], order.by = df$Date)
Error in UseMethod("as.xts") :
no applicable method for 'as.xts' applied to an object of class "factor"
I am not able to convert the data frame into an xts object. Could you please help me?
The problem is that the columns in your CSV contain both numbers and the character string "null", so read.csv() interprets them as factors. You need to do what quantmod::getSymbols.yahoo() does and set na.strings = "null". That tells read.csv() to treat the character string "null" as an NA value.
csv <- "Date,Open,High,Low,Close,Adj.Close,Volume
01-03-2013,null,null,null,null,null,null
01-04-2013,1569.180054,1597.569946,1536.030029,1597.569946,1597.569946,77098000000
01-05-2013,1597.550049,1687.180054,1581.280029,1630.73999,1630.73999,76447250000"
d <- read.csv(text = csv, na.strings = "null")
# also note that your date format was wrong, and there is no need to wrap a character
# string in `format()`
d$Date <- as.Date(d$Date, format = "%m-%d-%Y")
#d <- d[order(d$Date), ] # this isn't necessary, xts() will do it for you
(x <- xts(d[, 2], order.by = d$Date))
# [,1]
# 2013-01-03 NA
# 2013-01-04 1569.18
# 2013-01-05 1597.55
Or you can do all of this with a call to read.csv.zoo() and wrap it in as.xts() if you prefer an xts object.
(x <- as.xts(read.csv.zoo(text = csv, format = "%m-%d-%Y", na.strings = "null")))
# Open High Low Close Adj.Close Volume
# 2013-01-03 NA NA NA NA NA NA
# 2013-01-04 1569.18 1597.57 1536.03 1597.57 1597.57 77098000000
# 2013-01-05 1597.55 1687.18 1581.28 1630.74 1630.74 76447250000
I have imported an SPSS file, which contains several date/time variables of the following class:
[1] "POSIXct" "POSIXt"
The user-defined missing value for these variables is 8888-08-08 00:00:00. How can I convert this value to NA for the set of relevant date/time variables in R?
I tried running df$datetime[df$datetime == "8888-08-08"] <- NA as well as df$datetime[df$datetime == as.Date("8888-08-08")] <- NA to no avail.
As these are POSIXct, use the same type in the comparison and assign NA:
df$datetime[df$datetime == as.POSIXct("8888-08-08 00:00:00")] <- NA
data
set.seed(24)
df <- data.frame(datetime = sample(c(Sys.time(), Sys.time() + 1:5,
                                     as.POSIXct("8888-08-08 00:00:00")), 20, replace = TRUE))
I'm trying to convert all the NULL values in my dataset to NA. In short:
Explanation of question
My data set looks like the one below.
One thing I noticed, though, is that when I try to count the empty values it reports only the NA values in my dataset, not the NULL values. I would like to convert the NULL values to NA in order to remove them.
So I counted the number of missing values in my complete dataset then in the columns as
> dim(raw_data)
[1] 80983 16
> # Count missing values in entire data set
> table(is.na(raw_data))
FALSE TRUE
1247232 48496
> # Count na 's column wise
> na_count <-sapply(raw_data, function(y) sum(length(which(is.na(y)))))
> na_count <- data.frame(na_count)
> na_count
na_count
Merchant_Id 1
Tran_Date 1
Military_Time 1
Terminal_Id_Key 1
Amount 1
Card_Amount_Paid 1
Merchant_Name 1
Town 1
Area_Code 1
Client_ID 48481
Age_Band 1
Gender_code 1
Province 1
Avg_Income_3M 1
Value_Spent 1
Number_Spent 1
As you can see, it does not show the NULL values as NA, so I tried to convert them:
> # Turn Null to NA
> temp_data <- raw_data
>
> temp_data[temp_data == ''] = NA
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I also tried
> # Turn Null to NA
> temp_data <- raw_data
> temp_data[temp_data == 'NULL'] = NA
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
But I am getting the error above. This was followed by the last attempt below (which was better because I did not get an error, but I still had NULL values in my data set):
> raw_data[is.null(raw_data)] <- NA
> table(is.na(raw_data))
FALSE TRUE
1247232 48496
Could you perhaps suggest ways to deal with this error?
I also tried to get rid of the date and got this different error when I once again tried to remove the NULL values:
> df <- raw_data
>
> df1 <- transform(df, date = as.Date(df$Tran_Date), time = format(df$Tran_Date, "%T"))
>
> df1[df1 == NULL] = NA
Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, :
length of 'dimnames' [2] not equal to array extent
This solved my issue. Instead of changing the NULL values to NA after the fact, I imported the values from the GitHub account as NA values.
I added
na = c("", "NA", "NULL")
to my import call (the argument is na.strings for read.table, or na for read_tsv from the readr package). This did the trick and read my NULL values in as NA.
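As a minimal sketch of that import-time approach (the column names and values here are made up for illustration), base read.csv() maps every string listed in na.strings to NA while reading:

```r
# Hypothetical CSV text: "NULL" and empty fields should become NA on import.
csv_text <- "id,value
1,NULL
2,3.5
3,"
d <- read.csv(text = csv_text, na.strings = c("", "NA", "NULL"))
d
#   id value
# 1  1    NA
# 2  2   3.5
# 3  3    NA
```

Because the non-numeric strings are converted to NA during parsing, the value column comes back numeric instead of factor/character, so no post-hoc replacement is needed.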
I have several .csv files containing hourly data. Each file represents data from a point in space. The start and end date is different in each file.
The data can be read into R using:
lstf1<- list.files(pattern=".csv")
lst2<- lapply(lstf1,function(x) read.csv(x,header = TRUE,stringsAsFactors=FALSE,sep = ",",fill=TRUE, dec = ".",quote = "\""))
head(lst2[[800]])
datetime precip code
1 2003-12-30 00:00:00 NA M
2 2003-12-30 01:00:00 NA M
3 2003-12-30 02:00:00 NA M
4 2003-12-30 03:00:00 NA M
5 2003-12-30 04:00:00 NA M
6 2003-12-30 05:00:00 NA M
datetime is YYYY-MM-DD HH:MM:SS, precip is the data value, and code can be ignored.
For each dataframe (df) in lst2 I want to select data for the period 2015-04-01 to 2015-11-30 based on the following conditions:
1) If precip in a df contains all NAs within this period, delete it (do not select it)
2) If precip is not all NAs, select it.
The desired output (lst3) contains the subsetted data for the period 2015-04-01 to 2015-11-30.
All data frames in lst3 should have equal length, with days and hours without precip denoted as NA.
Then I can write the files in lst3 to my directory using something like:
sapply(names(lst2),function (x) write.csv(lst3[[x]],file = paste0(names(lst2[x]), ".csv"),row.names = FALSE))
The link to a sample file can be found here (~200 KB)
It's a little hard to understand exactly what you are trying to do, but this example (using dplyr, which has nice filter syntax) on the file you provided should get you close:
library(dplyr)
df <- read.csv ("L112FN0M.262.csv")
df$datetime <- as.POSIXct(df$datetime, format="%d/%m/%Y %H:%M")
# Get the required date range and delete the NAs
df.sub <- filter(df, !is.na(precip),
datetime >= as.POSIXct("2015-04-01"),
datetime < as.POSIXct("2015-12-01"))
# Check if the subset has any rows left (it will be empty if it was full of NA for precip)
if (nrow(df.sub) > 0) {
df.result <- filter(df, datetime >= as.POSIXct("2015-04-01"),
datetime < as.POSIXct("2015-12-01"))
# Then add df.result to your list of data frames...
} # else, don't add it to your list
I think you are saying that you want to retain NAs in the data frame if there are also valid precip values--you only want to discard if there are NAs for the entire period. If you just want to strip all NAs, then just use the first filter statement and you are done. You obviously don't need to use POSIXct if you've already got your dates encoded correctly another way.
EDIT: w/ function wrapper so you can use lapply:
library(dplyr)
# Get some example data
df <- read.csv ("L112FN0M.262.csv")
df$datetime <- as.POSIXct(df$datetime, format="%d/%m/%Y %H:%M")
dfnull <- df
dfnull$precip <- NA
# list of 3 input data frames to test, 2nd one has precip all NA
df.list <- list(df, dfnull, df)
# Function to do the filtering; returns list of data frames to keep or null
filterprecip <- function(d) {
if (nrow(filter(d, !is.na(precip), datetime >= as.POSIXct("2015-04-01"), datetime < as.POSIXct("2015-12-01"))) >
0) {
return(filter(d, datetime >= as.POSIXct("2015-04-01"), datetime < as.POSIXct("2015-12-01")))
}
}
# Function to remove NULLS in returned list
# (Credit to Hadley Wickham: http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8102.html)
compact <- function(x) Filter(Negate(is.null), x)
# Filter the list
results <- compact(lapply(df.list, filterprecip))
# Check that you got a list of 2 data frames in the right date range
str(results)
Based on what you've written, it sounds like you're just interested in subsetting your list of files if data exists in the precip column for this specific date range.
> valuesExist <- function(df, start="2015-04-01 0:00:00", end="2015-11-30 23:59:59"){
+   sub.df <- df[df$datetime>=start & df$datetime<=end,]
+   if(sum(is.na(sub.df$precip))==nrow(sub.df)){return(FALSE)}else{return(TRUE)}
+ }
> lst2.bool <- sapply(lst2, valuesExist)
> lst2 <- lst2[lst2.bool]
> lst3 <- lapply(lst2, function(x) {x[x$datetime>="2015-04-01 0:00:00" & x$datetime<="2015-11-30 23:59:59",]})
> sapply(names(lst2), function (x) write.csv(lst3[[x]],file = paste0(names(lst2[x]), ".csv"),row.names = FALSE))
If you want to have a dynamic start and end time, pass variables with these values into the valuesExist function and replace the string timestamps in the lst3 assignment with those same variables.
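A sketch of that parameterization (the two-element toy lst2 below is made up purely so the snippet is self-contained; with real data you would use the list read from your CSV files):

```r
# Window boundaries defined once, reused everywhere.
start <- "2015-04-01 00:00:00"
end   <- "2015-11-30 23:59:59"

# Toy stand-in for lst2: frame "a" has data in the window, "b" is all NA.
mk <- function(vals) data.frame(
  datetime = as.POSIXct(c("2015-05-01 00:00:00", "2016-01-01 00:00:00")),
  precip   = vals)
lst2 <- list(a = mk(c(1.2, 0.4)), b = mk(c(NA, NA)))

# TRUE if any non-NA precip falls inside [start, end].
valuesExist <- function(df, start, end) {
  sub.df <- df[df$datetime >= start & df$datetime <= end, ]
  !all(is.na(sub.df$precip))
}

lst2.bool <- sapply(lst2, valuesExist, start = start, end = end)
lst3 <- lapply(lst2[lst2.bool],
               function(x) x[x$datetime >= start & x$datetime <= end, ])
names(lst3)  # only "a" survives; "b" is all NA within the window
```

Changing the analysis period then only requires editing the two variables at the top, rather than hunting down every hard-coded timestamp.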
If you wanted to combine the two lapply loops into one, be my guest, but I prefer having a boolean variable when I'm subsetting.
I stumbled across a peculiar behavior in the lubridate package: dmy(NA) throws an error instead of just returning NA. This causes me problems when I want to convert a column in which some elements are NAs and some are date strings that are normally converted without problems.
Here is the minimal example:
library(lubridate)
df <- data.frame(ID=letters[1:5],
Datum=c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))
df_copy <- df
#Question 1: Why does dmy(NA) not return NA, but throws an error?
df$Datum <- dmy(df$Datum)
Error in function (..., sep = " ", collapse = NULL) : invalid separator
df <- df_copy
#Question 2: What's a work around?
#1. Idea: Only convert those elements that are not NAs
#RHS works, but assigning it to the LHS doesn't work (most likely problem:
#column "Datum" is still of class factor, while the RHS is of class POSIXct)
df[!is.na(df$Datum), "Datum"] <- dmy(df[!is.na(df$Datum), "Datum"])
Using date format %d.%m.%Y.
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c(NA_integer_, NA_integer_, :
invalid factor level, NAs generated
df #Only NAs, apparently problem with class of column "Datum"
ID Datum
1 a <NA>
2 b <NA>
3 c <NA>
4 d <NA>
5 e <NA>
df <- df_copy
#2. Idea: Use mapply and apply dmy only to those elements that are not NA
df[, "Datum"] <- mapply(function(x) {if (is.na(x)) {
return(NA)
} else {
return(dmy(x))
}}, df$Datum)
df #Meaningless numbers returned instead of date-objects
ID Datum
1 a 631152000
2 b NA
3 c 632016000
4 d NA
5 e 633830400
To summarize, I have two questions: 1) Why does dmy(NA) not work? Based on most other functions, I would assume it is good programming practice that every transformation (such as dmy()) of NA returns NA again (just as 2 + NA does). 2) If this behavior is intended, how do I convert a data.frame column that includes NAs via the dmy() function?
The Error in function (..., sep = " ", collapse = NULL) : invalid separator is being caused by the lubridate:::guess_format() function. The NA is being passed as sep in a call to paste(), specifically at fmts <- unlist(mlply(with_seps, paste)). You can have a go at improving the lubridate:::guess_format() to fix this.
Otherwise, could you just change the NA to characters ("NA")?
require(lubridate)
df <- data.frame(ID=letters[1:5],
Datum=c("01.01.1990", "NA", "11.01.1990", "NA", "01.02.1990")) #NAs are quoted
df_copy <- df
df$Datum <- dmy(df$Datum)
Since your dates are in a reasonably straight-forward format, it might be much simpler to just use as.Date and specify the appropriate format argument:
df$Date <- as.Date(df$Datum, format="%d.%m.%Y")
df
ID Datum Date
1 a 01.01.1990 1990-01-01
2 b <NA> <NA>
3 c 11.01.1990 1990-01-11
4 d <NA> <NA>
5 e 01.02.1990 1990-02-01
To see a list of the formatting codes used by as.Date, see ?strptime
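For instance, a few of the common codes from ?strptime, illustrated on made-up date strings:

```r
# %d = day of month, %m = month number, %Y = 4-digit year, %y = 2-digit year
as.Date("01.02.1990", format = "%d.%m.%Y")  # "1990-02-01"
as.Date("1990/02/01", format = "%Y/%m/%d")  # "1990-02-01"
as.Date("02-01-90",   format = "%m-%d-%y")  # "1990-02-01"
```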