r convert nested dataframe to xts - r

I am trying to secure proper convert from a dataframe (with nested dataframes & lists) into xts. I know that the best startpoint is to use a matrix as a base for creating an xts, but I am trying to simulate how I get the original downloaded data.
My question: If I need to use the base setup with nested dataframes and list, what steps other than in the below script, do I need to secure to get the data into xts?
Below you find my code. Note that it works to get the xts produced if uncomment complete section "3."
# 1.creates the "top-level" df
date <- c("2016-10-10 21:32:00", "2016-10-10 21:33:00") # vector for creating the df
volume <- c(1,2) # vector for creating the df
df2 <- data.frame(date = date, volume = volume) # creating the df
# 2.change class of 2 columns
df2$date <- as.POSIXct(df2$date) #change from character to POSIX
df2$volume <- as.numeric(df2$volume) #change from character to numeric
# 3.create openPrices (set: base/bid/ask) < - Potential problem for creating xts
df2$openPrice <- data.frame(no2 = c(1,2)) # creating a nested df with 2 temp column/values.
df2$openPrice$bid <- list(1.1, 1.2) # creating a list within the nested df
df2$openPrice$ask <- list(2.2, 2.3) # creating a list within the nested df
df2$openPrice$no2 <- NULL # remove the 2 temp column/values
# 4.create an xts(myxts1), based on a dataframe (df2) with nested dataframes and lists.
myxts1 <- xts(df2[,-1],order.by = df2$date)
! Note. The result needs to be stripped off from nested, since it is
the nested that provokes problem for the step of creating the xts.

Related

Unnest a ts class

My data has multiple customers data with different start and end dates along with their sales data.So I did simple exponential smoothing.
I applied the following code to apply ses
library(zoo)
library(forecast)
z <- read.zoo(data_set,FUN = function(x) as.Date(x) + seq_along(x) / 10^10 , index = "Date", split = "customer_id")
L <- lapply(as.list(z), function(x) ts(na.omit(x),frequency = 52))
HW <- lapply(L, ses)
Now my output class is list with uneven lengths.Can someone help me how to unnest or unlist the output in to a data frame and get the fitted values,actuals,residuals along with their dates,sales and customer_id.
Note : the reson I post my input data rather than data of HW is,the HW data is too large.
Can someone help me in R.
I would use tidyverse package to handle this problem.
map(HW, ~ .x %>%
as.data.frame %>% # convert each element of the list to data.frame
rownames_to_column) %>% # add row names as columns within each element
bind_rows(.id = "customer_id") # bind all elements and add customer ID
I am not sure how to relate dates and actual sales to your output (HW). If you explain it I might provide solution to that part of the problem too.
Firstly took all the unique customer_id into a variable called 'k'
k <- unique(data_set$customer_id)
Created a empty data frame
b <- data.frame()
extracted all the fitted values using a for loop and stored in 'a'.Using the rbind function attached all the fitted values to data frame 'b'
for(key in k){
print(a <- as.data.frame((as.numeric(HW_ses[[key]]$model$fitted))))
b <- rbind(b,a)
}
Finally using column bind function attached the input data set with data frame 'b'
data_set_final <- cbind(data_set,b)

Vector gets stored as a dataframe instead of being a vector

I am new to r and rstudio and I need to create a vector that stores the first 100 rows of the csv file the programme reads . However , despite all my attempts my variable v1 ends up becoming a dataframe instead of an int vector . May I know what I can do to solve this? Here's my code:
library(readr)
library(readr)
cup_data <- read_csv("C:/Users/Asus.DESKTOP-BTB81TA/Desktop/STUDY/YEAR 2/
YEAR 2 SEM 2/PREDICTIVE ANALYTICS(1_PA_011763)/Week 1 (Intro to PA)/
Practical/cup98lrn variable subset small.csv")
# Retrieve only the selected columns
cup_data_small <- cup_data[c("AGE", "RAMNTALL", "NGIFTALL", "LASTGIFT",
"GENDER", "TIMELAG", "AVGGIFT", "TARGET_B", "TARGET_D")]
str(cup_data_small)
cup_data_small
#get the number of columns and rows
ncol(cup_data_small)
nrow(cup_data_small)
cat("No of column",ncol(cup_data_small),"\nNo of Row :",nrow(cup_data_small))
#cat
#Concatenate and print
#Outputs the objects, concatenating the representations.
#cat performs much less conversion than print.
#Print the first 10 rows of cup_data_small
head(cup_data_small, n=10)
#Create a vector V1 by selecting first 100 rows of AGE
v1 <- cup_data_small[1:100,"AGE",]
Here's what my environment says:
cup_data_small is a tibble, a slightly modified version of a dataframe that has slightly different rules to try to avoid some common quirks/inconsistencies in standard dataframes. E.g. in a standard dataframe, df[, c("a")] gives you a vector, and df[, c("a", "b")] gives you a dataframe - you're using the same syntax so arguably they should give the same type of result.
To get just a vector from a tibble, you have to explicitly pass drop = TRUE, e.g.:
library(dplyr)
# Standard dataframe
iris[, "Species"]
iris_tibble = iris %>%
as_tibble()
# Remains a tibble/dataframe
iris_tibble[, "Species"]
# This gives you just the vector
iris_tibble[, "Species", drop = TRUE]

Why add column data to data frame not working with a function in r

I am curious that why the following code doesn't work for adding column data to a data frame.
a <- c(1:3)
b <- c(4:6)
df <- data.frame(a,b) # create a data frame example
add <- function(df, vector){
df[[3]] <- vector
} # create a function to add column data to a data frame
d <- c(7:9) # a new vector to be added to the data frame
add(df,d) # execute the function
If you run the code in R, the new vector doesn't add to the data frame and no error also.
R passes parameters to functions by value - not by reference - that means inside the function you work on a copy of the data.frame df and when returning from the function the modified data.frame "dies" and the original data.frame outside the function is still unchanged.
This is why #RichScriven proposed to store the return value of your function in the data.frame df again.
Credits go to #RichScriven please...
PS: You should use cbind ("column bind") to extend your data.frame independently of how many columns already exist and ensure unique column names:
add <- function(df, vector){
res <- cbind(df, vector)
names(res) <- make.names(names(res), unique = T)
res # return value
}
PS2: You could use a data.table instead of a data.frame which is passed by reference (not by value).

Manipulating a dataset by separating variables

I have a data set that looks similar to the image shown below. Total, it is over a 1000 observations long. I want to create a new data frame that separates the single variable into 3 variables. Each variable is separated by a "+" in each observation, so it will need to be separated by using that as a factor.
Here is a solution using data.table:
library(data.table)
# Data frame
df <- data.frame(MovieId.Title.Genres = c("yyyy+xxxx+wwww", "zzzz+aaaa+aaaa"))
# Data frame to data table.
df <- data.table(df)
# Split column into parts.
df[, c("MovieId", "Title", "Genres") := tstrsplit(MovieId.Title.Genres, "\\+")]
# Print data table
df
I'll assume that your movieData object is a single column data.frame object.
If you want to split a single element from your data set, use strsplit using the character + (which R wants to see written as "\\+"):
# split the first element of movieData into a vector of strings:
strsplit(as.character(movieData[1,1]), "\\+")
Use lapply to apply this to the entire column, then massage the resulting list into a nice, usable data.frame:
# convert to a list of vectors:
step1 = lapply(movieData[,1], function(x) strsplit(as.character(x), "\\+"))
# step1 is a list, so make it into a data.frame:
step2 = as.data.frame(step1)
# step2 is a nice data.frame, but its names are garbage. Fix it:
movieDataWithColumns = setNames(step2, c("MovieId", "Title", "Genres"))

How to define 1000 dataframes in a single one?

My problem is the following. Suppose I have 1000 dataframes in R with the names eq1.1, eq1.2, ..., eq1.1000. I would like a single dataframe containing my 1000 dataframes. Normally, if I have only two dataframes, say eq1.1 and eq1.2 then I could define
df <- data.frame(eq1.1,eq1.2)
and I'm good. However, I can't follow this procedure because I have 1000 dataframes.
I was able to define a list containing the names of my 1000 dataframes using the code
names <- c()
for (i in 1:1000){names[i]<- paste0("eq1.",i)}
However, the elements of my list are recognized as strings and not as the dataframes that I previously defined.
Any help is appreciated!
How about
df.names <- ls(pattern = "^eq1\\.\\d")
eq1.dat <- do.call(cbind,
lapply(df.names,
get))
rm(list = df.names)
library(stringi)
library(dplyr)
# recreate dummy data
lapply(1:1000,function(i){
assign(sprintf("eq1.%s",i),
as.data.frame(matrix(ncol = 12, nrow = 13, sample(1:15))),
envir = .GlobalEnv)
})
# Now have 1000 data frames in my working environment named eq1.[1:1000]
> str(ls(pattern = "eq1.\\d+"))
> chr [1:1000] "eq1.1" "eq1.10" "eq1.100" "eq1.1000" "eq1.101" "eq1.102" "eq1.103" ...
1) create a holding data frame from the ep1.1 data frame that will be appended
each iteration in the following loop
empty_df <- eq1.1
2) im going to search for all the data frame named by convention and
create a data frame from the returned characters which represent our data frame
objects, but are nothing more than a character string.
3) mutate that data frame to hold an indexing column so that I can order the data frames properly from 1:1000 as the character representation will not be in numeric order from the step above
4) Drop the indexing column once the data frame names are in proper sequence
and then unlist the dfs column back into a character sequence and slice
the first value out, since it is stored already to our empty_df
5) loop through that sequence and for each iteration globally assign and
bind the preceding data frame into place. So for example on iteration 1,
the empty_df is now the same as data.frame(ep1.1, ep1.2) and for the
second iteration the empty_df is the same as data.frame(ep1.1, ep1.2, ep1.3)
NOTE: the get function takes the character representation and calls the data object from it. see ?get for details
lapply(
data.frame(dfs = ls(pattern = 'eq1\\.\\d+'))%>%
mutate(nth = as.numeric(stri_extract_last_regex(dfs,'\\d+'))) %>%
arrange(nth) %>% select(-nth) %>% slice(-1) %>% .$dfs, function(i){
empty_df <<- data.frame(empty_df, get(i))
}
)
All done, all the dataframes are bound to the empty_df and to check
> dim(empty_df)
[1] 13 12000

Resources