I am very new to R. I watched a YouTube video on various time series analyses, but it downloaded its data from Yahoo, and my data is in Excel. I wanted to follow the same analysis with data from a .csv file exported from Excel. I spent two days finding out that the dates must be in US style. Now I am stuck again on a basic step: loading the data so it can be analysed, which seems to be the biggest hurdle with R. Can someone please explain why the commands shown below do not compute returns for the complete set of columns? I tried the zoo format, but it didn't work; then I tried xts and it worked only partially. I suspect the original import from Excel is the main problem.
> AllPrices <- as.zoo(AllPrices)
> head(AllPrices)
Index1 Index2 Index3 Index4 Index5 Index6 Index7 Index8 Index9 Index10
> AllRets <- dailyReturn(AllPrices)
Error in NextMethod("[<-") : incorrect number of subscripts on matrix
> AllPrices<- as.xts(AllPrices)
> AllRets <- dailyReturn(AllPrices)
> head(AllRets)
daily.returns
2012-11-06 0.000000e+00
2012-11-07 -2.220249e-02
2012-11-08 1.379504e-05
2012-11-09 2.781961e-04
2012-11-12 -2.411128e-03
2012-11-13 7.932869e-03
Try to load your data using the readr package.
library(readr)
Then, look at the documentation by running ?read_csv in the console.
I recommend reading in your data this way. Specify the column types. For instance, if your first column is the date, read it in as a character "c" and if your other columns are numeric use "n".
data <- read_csv('YOUR_DATA.csv', col_types = "cnnnnn") # date in left column, 5 numeric columns
# convert the date column to class Date; replace "Dates" with the name of your date column and adjust the format if needed
data$Dates <- as.Date(data$Dates, format = "%Y-%m-%d")
data <- as.data.frame(data) # turn the result into a plain data frame
library(xts)
data <- xts(data[,-1], order.by = data[,1]) # the xts data is everything but the date column; order.by is the date column
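On the returns themselves: as far as I can tell, dailyReturn() is meant for a single price series, which would explain why only one daily.returns column came back even after the xts conversion. Here is a minimal sketch of computing simple returns for every column at once, using a made-up price matrix (the values and column names are illustrative, not the asker's data):

```r
# Hypothetical prices: 4 days, 2 indices (illustrative values only)
prices <- matrix(c(100, 102, 101, 103,
                   50,  51,  49,  50),
                 ncol = 2,
                 dimnames = list(NULL, c("Index1", "Index2")))

# Simple return for each column: price[t] / price[t-1] - 1
n <- nrow(prices)
rets <- prices[-1, , drop = FALSE] / prices[-n, , drop = FALSE] - 1

print(rets)
```

The result has one row fewer than the input, because the first day has no previous price to compare against.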
Please do not smash me before reading. Before asking, I spent three hours trying to figure out a couple of things. I have tried different approaches, but none of them work. If you are here to tell me to go google it, please don't. Thanks.
I had an .xls file and tried to read it into R the way my lecturer told us to. He basically wants us to use an rJava-based library to read the .xls file with read_xlsx(). I tried, but it does not work; it cannot even read the file.
(As usual for me, I converted the .xls into .csv, used read_csv(), and could finally read the data in my R script.
And hurrah, it worked!)
Here is the tricky problem. I am explaining all the steps because I want you to feel my pain.
When I split the column into two and try to convert them, the function runs, but all the data under the Date or Time or Date...Timestamp column disappears; all I see is NA all the way down.
I would appreciate any help, thanks a lot.
These are my column names:
sapply(myDataFrame, class)
Date...Timestamp              DO1              DO2   Controlling.DO               pH
     "character"        "numeric"        "numeric"        "numeric"        "numeric"
         Biomass    Titre..mg.mL.      Base.Buffer      Media.Batch
       "numeric"        "numeric"        "integer"        "integer"
As you can see, I got a very well-thought-out column name (Date...Timestamp). Anyway, I need to convert the character column into a Date, but every value in the column is formatted like "28/03/2020 17:28".
What did I do?
#Split Date / Timestamp column by character
myDataFrame <- myDataFrame %>%
mutate(Date = str_sub(Date...Timestamp, 1,11)) %>%
mutate(Time = str_sub(Date...Timestamp, 11))
#DataFrame manipulation, drop the old Data Timestamp column
myDataFrame <- myDataFrame %>%
select(-Date...Timestamp)
The column was split into two as wanted, date and time, cool! But when I try to change the type of the columns, I get a different error from each function.
I tried as.Date(), as.date.numeric(), and as.POSIXct().
# converting to datetime object
myDataFrame[["Date"]] <- as.POSIXct(myDataFrame[["Date"]], format = "%D/%M/%Y")
myDataFrame[["Time"]] <- as.POSIXct(myDataFrame[["Time"]], format = "%H:%M")
myDataFrame$Date <- as.Date(myDataFrame$Date)
myDataFrame$Time <- as.Date(myDataFrame$Time)
# tried before splitting and splitting afterwards, still error.
myDataFrame$Date...Timestamp <- as.Date(myDataFrame$Date...Timestamp)
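A few things in the attempts above look like likely causes of the NAs: in strptime-style formats %D is shorthand for a whole %m/%d/%y date and %M means minutes, so "%D/%M/%Y" cannot match "28/03/2020"; str_sub(..., 1, 11) also keeps the trailing space; and as.Date() cannot parse a bare time such as "17:28". One possible approach, sketched here on made-up values in the same "dd/mm/yyyy HH:MM" layout, is to parse the whole string once and derive the date and time parts afterwards. The UTC time zone is an assumption; use whatever zone the timestamps were recorded in.

```r
# Hypothetical sample values in the same layout as Date...Timestamp
stamp <- c("28/03/2020 17:28", "29/03/2020 08:05")

# Parse date and time in one go: %d = day, %m = month, %Y = 4-digit year,
# %H = hour, %M = minute. tz = "UTC" is an assumption about the data.
parsed <- as.POSIXct(stamp, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Derive the two pieces from the parsed value
date_part <- as.Date(parsed, tz = "UTC")   # class Date
time_part <- format(parsed, "%H:%M")       # character, e.g. for display

print(data.frame(parsed, date_part, time_part))
```

Keeping the single parsed date-time column is usually enough for plotting or building a time series; the separate Date and time columns are only needed if something downstream asks for them.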
I have two date columns (formatted as dd-mm-yyyy in Excel) in my Excel sheet.
Date Delivery    Date Collection
06-08-17         15-08-17
11-04-17         15-04-17
24-01-17         24-01-17
11-08-16         14-08-16
There are multiple issues.
Currently I am reading a subset of the data (the top 100 rows, copied manually into another Excel sheet).
Dates that have the same format in Excel are shown differently in R.
When I read the whole data set, they all look like the values in Date.Collection.
data <- read.xlsx("file.xlsx", sheetName='subset', startRow=1)
The data output shown in R is: (screenshot not preserved)
I need them all to be shown as in Date.Delivery, because I need to write the result back after analysis.
I am also trying to make it Date in R using
dates <- data$Date.Delivery
as.Date(dates, origin = "30-12-1899",format="%d-%m-%y")
To format Date.Collection like Date.Delivery after reading your file, try
# see the str of your data
str(data)
# if Date.Collection is character
data$Date.Collection <- as.numeric(data$Date.Collection)
# if Date.Collection is factor
data$Date.Collection <- as.numeric(levels(data$Date.Collection))[data$Date.Collection]
# conversion
data$Date.Collection <- as.Date(data$Date.Collection - 25569, origin = "1970-01-01")
Alternatively, read the file with the "gdata" or "XLConnect" package so that the column comes in as a factor,
then use ymd() from lubridate to convert it into a date:
library(gdata)
library(lubridate) # provides ymd()
data <- read.xls(path, sheet = 1, header = TRUE) # path is the location of your Excel file
data$Date.Collection <- ymd(data$Date.Collection)
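To make the serial-number conversion above concrete: Excel stores dates as day counts, and the offset of 25569 used above is the serial for 1970-01-01, which is the same as counting from an origin of 1899-12-30. Here is a small sketch with hypothetical serial values (chosen to correspond to two of the collection dates in the question) that converts them and formats them back to dd-mm-yy. Note that origin should be written in ISO form ("1899-12-30"), and the format argument of as.Date() only applies to character input.

```r
# Hypothetical Excel serial numbers, as they might appear after a raw import
serials <- c(42962, 42840)

# Convert to Date: day counts from Excel's effective origin
dates <- as.Date(serials, origin = "1899-12-30")

# Format back to the dd-mm-yy text used in the spreadsheet
as_text <- format(dates, "%d-%m-%y")

print(dates)
print(as_text)
```

Keeping the column as class Date during the analysis and calling format() only when writing the result back tends to avoid most of these display mismatches.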
I have an issue that I'm running into on Windows. Suppose that I have the following data stored in a text file dat.csv:
timestamp, demand
2011-05-27 15:50:04, 38874
2016-03-27 01:30:03, 25107
This data is originally from a much larger csv file detailing the energy market in the UK. I attempt to create an xts time series object from this file as follows:
> library(xts)
> dat <- read.csv('dat.csv', sep=',', header=T, stringsAsFactors=F)
> dat.xts <- xts(dat[, 2],
> order.by = strptime(dat$timestamp, format="%Y-%m-%d %H:%M:%S"))
However, when I attempt to view the resulting xts object, this is what happens:
> dat.xts
[,1]
2011-05-27 15:50:04 38874
<NA> 25107
As you can see, while the index for the first line was parsed correctly, that for the second line has resulted in an NA.
Interestingly, the same code appears to run correctly on Ubuntu 16.04. I suspect it's something to do with how timezones work on Windows but I'm not entirely certain about that. Can somebody explain how I can avoid this problem when running on Windows?
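A likely cause, though not confirmed in this thread: UK clocks went forward on 27 March 2016, so 01:30:03 does not exist as a local time in the Europe/London zone, and strptime() without a tz argument parses in the session's local time zone. How a non-existent local time is handled appears to differ between platforms, which would fit the Windows versus Ubuntu difference. If the timestamps are actually recorded in UTC (an assumption about the data), passing tz explicitly sidesteps the problem:

```r
# The two timestamps from dat.csv
stamps <- c("2011-05-27 15:50:04", "2016-03-27 01:30:03")

# Parse with an explicit time zone instead of the session default.
# tz = "UTC" is an assumption about how the data were recorded.
idx <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(idx)
print(anyNA(idx))
```

The resulting idx vector can then be passed as order.by when building the xts object.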
I am trying to create an xts object, but after formatting the data on load, I have a different number of rows for the dates than for the data. My data has many columns with varying numbers of rows, anywhere from 20 to 200. I want to create a separate variable after loading, and that variable will depend on the composite I want to look at, so I want a full data.frame with NAs first; I will then na.omit the chosen variable to reduce the dimensions.
Here is the code:
#load file with desired composite
allcomposites <- read.csv("Composites 2014.08.31.csv", header = T)
compositebench <- allcomposites[1, 2:ncol(allcomposites)]
dates1 <- as.Date(allcomposites$Name, format = "%m/%d/%Y")
allcomposites <- as.data.frame(lapply(allcomposites[2:nrow(allcomposites),2:ncol(allcomposites)], as.numeric))
allcomposites <- as.xts(allcomposites, order.by = dates1)
## Error in xts(x, order.by = order.by, frequency = frequency, ...) :
## NROW(x) must match length(order.by)
Edit to show what allcomposites looks like:
Name        Composite1  Composite2  Composite3  Composite4  Composite5
Bmark       229         229         982         612         995
8/31/2014   0.9979      0.9404      4.3808      3.9296
7/31/2014   -0.4563     -0.3038     -1.7817     -1.7248
6/30/2014   0.205       0.2234      2.2184      2.7304
5/31/2014   1.311       1.5771      3.4824      1.7601
4/30/2014   0.9096      1.0187      -1.9195     1.2964
You need to be more careful: when you remove the first row from allcomposites, you have to remove it from dates1 as well.
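A minimal sketch of the mismatch, using a cut-down, made-up version of the data (one composite, two dates): the date vector is built from every row, including "Bmark", while the values drop that row, so the lengths differ by one until the same row is dropped from both.

```r
# Cut-down stand-in for allcomposites (illustrative values only)
allcomposites <- data.frame(
  Name = c("Bmark", "8/31/2014", "7/31/2014"),
  Composite1 = c(229, 0.9979, -0.4563),
  stringsAsFactors = FALSE
)

# Dates built from ALL rows: "Bmark" cannot be parsed and becomes NA
dates_all <- as.Date(allcomposites$Name, format = "%m/%d/%Y")

# Values with the benchmark row and the Name column removed
values <- allcomposites[-1, -1, drop = FALSE]

# 3 dates vs 2 rows of values: this is what xts() complains about
print(c(length(dates_all), nrow(values)))

# Drop the same first row from the dates so the two line up
dates1 <- dates_all[-1]
print(c(length(dates1), nrow(values)))
```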
Here's another way to accomplish your goal:
Lines <- "Name Composite1 Composite2 Composite3 Composite4 Composite5
Bmark 229 229 982 612 995
8/31/2014 0.9979 0.9404 4.3808 3.9296
7/31/2014 -0.4563 -0.3038 -1.7817 -1.7248
6/30/2014 0.205 0.2234 2.2184 2.7304
5/31/2014 1.311 1.5771 3.4824 1.7601
4/30/2014 0.9096 1.0187 -1.9195 1.2964"
library(xts)
# use fill=TRUE because you only provided data for 4 composites
allcomp <- read.table(text=Lines, header=TRUE, fill=TRUE)
# remove the first row that contains "Bmark"
allcomp <- allcomp[-1,]
# create an xts object from the remaining data
allcomp_xts <- xts(allcomp[,-1], as.Date(allcomp[,1], "%m/%d/%Y"))
## Error in xts(x, order.by = order.by, frequency = frequency, ...) :
## NROW(x) must match length(order.by)
I wasted hours running into this error. Whether or not I had exactly the same problem, I'll show how I got rid of this error message in case it saves you the pain I had.
I imported an Excel or CSV file (I tried both) through several import functions, then tried to convert my data (as either a data.frame or a zoo object) into an xts object, and kept getting errors, this one included.
I tried creating a vector of dates separately to pass in as the order.by parameter. I tried making sure the date vector and the rows of the data.frame were the same length. Sometimes it worked and sometimes it didn't, for reasons I can't explain. Even when it did work, R had "coerced" all my numeric data into character data. (This caused me endless problems later. Watch for coercion, I learned.)
These errors kept happening until I did the following:
For the xts conversion, I used the date column from the imported Excel sheet as the order.by parameter, wrapped in as.Date(), AND I *dropped the date column during the conversion to xts.*
Here's the working code:
library(readxl)
library(xts)
xl_sheet <- read_excel("../path/to/my_excel_file.xlsx")
sheet_xts <- xts(xl_sheet[-1], order.by = as.Date(xl_sheet$date))
Note my date column was the first column, so the xl_sheet[-1] removed the first column.
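The coercion mentioned above has a plausible explanation: an xts object keeps its data in a matrix, and a matrix can hold only one type, so leaving a date (or any non-numeric) column in the data turns every value into text. The effect can be seen with base R alone; the data frame below is a made-up stand-in for the imported sheet.

```r
# Hypothetical stand-in for the imported sheet: a date column plus one numeric column
sheet <- data.frame(date = as.Date(c("2020-01-01", "2020-01-02")),
                    price = c(1.5, 2.5))

# Converting everything, date column included: the whole matrix becomes character
with_date <- as.matrix(sheet)

# Dropping the date column first keeps the data numeric
without_date <- as.matrix(sheet[-1])

print(typeof(with_date))
print(typeof(without_date))
```

That is why dropping the date column from the data and supplying it only through order.by keeps the numbers numeric.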
I'm getting an error using smartbind to append two datasets. First, I'm pretty sure the error I'm getting:
> Error in as.vector(x, mode) : invalid 'mode' argument
is coming from the date variable in both datasets. The date variable in its raw format looks like this: month/day/year. I transformed the variable after importing the data using as.Date and format
> rs.month$xdeeddt <- as.Date(rs.month$xdeeddt, "%m/%d/%Y")
> rs.month$deed.year <- as.numeric(format(rs.month$xdeeddt, format = "%Y"))
> rs.month$deed.day <- as.numeric(format(rs.month$xdeeddt, format = "%d"))
> rs.month$deed.month <- as.numeric(format(rs.month$xdeeddt, format = "%m"))
The resulting date variable is as such:
> [1] "2014-03-01" "2014-03-13" "2014-01-09" "2013-10-09"
The same date transformation was applied to both datasets (the format of the raw data was identical in both). When I try to use smartbind, from the gtools package, to append the two datasets, it returns the error above. When I removed the date, month, day, and year variables from both datasets, I was able to append them successfully with smartbind.
Any suggestions on how I can append the datasets with the date variables included?
I came here after googling for the same error message during a smartbind of two data frames. The discussion above, while not so conclusive about a solution, definitely helped me move through this error.
Both my data frames contain POSIXct date objects. Those are just a numeric vector of UNIXy seconds-since-epoch, along with a couple of attributes that provide the structure needed to interpret the vector as a date object. The solution is simply to strip the attributes from that variable, perform the smartbind, and then restore the attributes:
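library(gtools) # provides smartbind()
these.atts <- attributes(df1$date) # save the class and time zone attributes
attributes(df1$date) <- NULL # leaves a plain numeric vector
attributes(df2$date) <- NULL
df1 <- smartbind(df1,df2)
attributes(df1$date) <- these.atts # restore the date-time interpretation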
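The strip-and-restore idea can be checked in isolation with base R. This sketch uses made-up timestamps and plain c() in place of smartbind(), so it only illustrates the attribute handling, not the bind itself:

```r
# Hypothetical POSIXct value; tz = "UTC" chosen for a reproducible example
d1 <- as.POSIXct("2014-03-01 00:00:00", tz = "UTC")

# Save the attributes (class and tzone), then strip them
saved_atts <- attributes(d1)
raw1 <- d1
attributes(raw1) <- NULL           # now a plain number: seconds since the epoch

# Combine plain numbers (standing in for the bind step); add one day in seconds
combined <- c(raw1, raw1 + 86400)

# Restore the attributes so the numbers are read as date-times again
attributes(combined) <- saved_atts

print(combined)
```

This relies on both columns sharing the same class and time zone; if they differ, restoring one set of attributes onto the combined column would silently misread the other half.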
I hope this helps someone, sometime.
-Andy